* [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery
@ 2017-07-28  8:06 Peter Xu
  2017-07-28  8:06 ` [Qemu-devel] [RFC 01/29] migration: fix incorrect postcopy recved_bitmap Peter Xu
                   ` (30 more replies)
  0 siblings, 31 replies; 116+ messages in thread
From: Peter Xu @ 2017-07-28  8:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

As we all know, postcopy migration carries the risk of losing the VM if
the network breaks during the migration. This series tries to solve the
problem by allowing the migration to pause at the failure point, and to
recover after the link is reconnected.

There was existing work on this issue from Md Haris Iqbal:

https://lists.nongnu.org/archive/html/qemu-devel/2016-08/msg03468.html

This series is a complete rework of the issue, based on Alexey
Perevalov's recved bitmap v8 series:

https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg06401.html

Two new states are added to support the recovery (used on both
sides):

  MIGRATION_STATUS_POSTCOPY_PAUSED
  MIGRATION_STATUS_POSTCOPY_RECOVER

The MIGRATION_STATUS_POSTCOPY_PAUSED state is entered when a network
failure is detected. It is a phase we may stay in for a long time: we
remain there from the moment the failure is detected until a recovery
is triggered. In this state, all the migration threads (on source:
send thread, return-path thread; on destination: ram-load thread,
page-fault thread) are halted.

The MIGRATION_STATUS_POSTCOPY_RECOVER state is short-lived. When a
recovery is triggered, both the source and destination VM jump into
this state and do whatever is needed to prepare for the recovery
(currently the most important step is synchronizing the dirty bitmap;
please see the commit messages for more information). Once the
preparation is done, the source performs a final handshake with the
destination, and both sides switch back to
MIGRATION_STATUS_POSTCOPY_ACTIVE again.

New commands/messages are defined as well to meet these needs:

MIG_CMD_RECV_BITMAP & MIG_RP_MSG_RECV_BITMAP are introduced for
delivering the received bitmaps.

MIG_CMD_RESUME & MIG_RP_MSG_RESUME_ACK are introduced to do the final
handshake of postcopy recovery.

Here are some more details on how the whole failure/recovery routine
works:

- start migration
- ... (switch from precopy to postcopy)
- both sides are in "postcopy-active" state
- ... (failure happened, e.g., network unplugged)
- both sides switch to "postcopy-paused" state
  - all the migration threads are stopped on both sides
- ... (both VMs hanged)
- ... (user triggers recovery using "migrate -r -d tcp:HOST:PORT" on
  source side, "-r" means "recover")
- both sides switch to "postcopy-recover" state
  - on source: send-thread and return-path-thread will be woken up
  - on dest: ram-load-thread woken up, fault-thread still paused
- source calls the new savevm handler hook resume_prepare() (currently
  only ram provides the hook):
  - ram_resume_prepare(): for each ramblock, fetch recved bitmap by:
    - src sends MIG_CMD_RECV_BITMAP to dst
    - dst replies MIG_RP_MSG_RECV_BITMAP to src, with bitmap data
      - src uses the recved bitmap to rebuild the dirty bitmap
- source does the final handshake with destination
  - src sends MIG_CMD_RESUME to dst, telling "src is ready"
    - when dst receives the command, the fault thread is woken up;
      meanwhile, dst switches back to "postcopy-active"
  - dst sends MIG_RP_MSG_RESUME_ACK to src, telling "dst is ready"
    - when src receives the ack, its state switches to "postcopy-active"
- postcopy migration continues

Testing:

As mentioned, the testing is still extremely simple. I used socat to
create a socket bridge:

  socat tcp-listen:6666 tcp-connect:localhost:5555 &

Then I did the migration via the bridge. I emulated a network failure
by killing the socat process (bridge down), then tried to recover the
migration using the other channel (the default dst channel). It looks
like:

        port:6666    +------------------+
        +----------> | socat bridge [1] |-------+
        |            +------------------+       |
        |         (Original channel)            |
        |                                       | port: 5555
     +---------+  (Recovery channel)            +--->+---------+
     | src VM  |------------------------------------>| dst VM  |
     +---------+                                     +---------+

Known issues/notes:

- currently the destination listening port cannot change, i.e., the
  recovery must use the same port on the destination, for
  simplicity. (on the source, we can specify a new URL)

- the patch "migration: let dst listen on port always" is still
  hacky; for now it just keeps the incoming accept open forever...

- some migration numbers might still be inaccurate, like the total
  migration time, etc. (but I don't think that matters much for now)

- the patches are very lightly tested.

- Dave reported a problem that may hang the destination main loop
  thread (when a vcpu thread holds the BQL) and the rest. I haven't
  encountered it yet, but that does not mean this series is immune to
  it.

- other potential issues that I may have forgotten or not yet noticed...

Anyway, the work is still at a preliminary stage. Any suggestions and
comments are greatly welcomed.  Thanks.

Peter Xu (29):
  migration: fix incorrect postcopy recved_bitmap
  migration: fix comment disorder in RAMState
  io: fix qio_channel_socket_accept err handling
  bitmap: introduce bitmap_invert()
  bitmap: introduce bitmap_count_one()
  migration: dump str in migrate_set_state trace
  migration: better error handling with QEMUFile
  migration: reuse mis->userfault_quit_fd
  migration: provide postcopy_fault_thread_notify()
  migration: new property "x-postcopy-fast"
  migration: new postcopy-pause state
  migration: allow dst vm pause on postcopy
  migration: allow src return path to pause
  migration: allow send_rq to fail
  migration: allow fault thread to pause
  qmp: hmp: add migrate "resume" option
  migration: rebuild channel on source
  migration: new state "postcopy-recover"
  migration: let dst listen on port always
  migration: wakeup dst ram-load-thread for recover
  migration: new cmd MIG_CMD_RECV_BITMAP
  migration: new message MIG_RP_MSG_RECV_BITMAP
  migration: new cmd MIG_CMD_POSTCOPY_RESUME
  migration: new message MIG_RP_MSG_RESUME_ACK
  migration: introduce SaveVMHandlers.resume_prepare
  migration: synchronize dirty bitmap for resume
  migration: setup ramstate for resume
  migration: final handshake for the resume
  migration: reset migrate thread vars when resumed

 hmp-commands.hx              |   7 +-
 hmp.c                        |   4 +-
 include/migration/register.h |   2 +
 include/qemu/bitmap.h        |  20 ++
 io/channel-socket.c          |   1 +
 migration/exec.c             |   2 +-
 migration/fd.c               |   2 +-
 migration/migration.c        | 465 ++++++++++++++++++++++++++++++++++++++++---
 migration/migration.h        |  25 ++-
 migration/postcopy-ram.c     | 109 +++++++---
 migration/postcopy-ram.h     |   2 +
 migration/ram.c              | 209 ++++++++++++++++++-
 migration/ram.h              |   4 +
 migration/savevm.c           | 189 +++++++++++++++++-
 migration/savevm.h           |   3 +
 migration/socket.c           |   4 +-
 migration/trace-events       |  16 +-
 qapi-schema.json             |  12 +-
 util/bitmap.c                |  28 +++
 19 files changed, 1024 insertions(+), 80 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 116+ messages in thread

* [Qemu-devel] [RFC 01/29] migration: fix incorrect postcopy recved_bitmap
  2017-07-28  8:06 [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
@ 2017-07-28  8:06 ` Peter Xu
  2017-07-31 16:34   ` Dr. David Alan Gilbert
  2017-07-28  8:06 ` [Qemu-devel] [RFC 02/29] migration: fix comment disorder in RAMState Peter Xu
                   ` (29 subsequent siblings)
  30 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-07-28  8:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

The bitmap setup during postcopy is incorrect when the pages are huge
pages. Fix it.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/postcopy-ram.c | 2 +-
 migration/ram.c          | 8 ++++++++
 migration/ram.h          | 2 ++
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 276ce12..952b73a 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -578,7 +578,7 @@ static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr,
         ret = ioctl(userfault_fd, UFFDIO_ZEROPAGE, &zero_struct);
     }
     if (!ret) {
-        ramblock_recv_bitmap_set(host_addr, rb);
+        ramblock_recv_bitmap_set_range(rb, host_addr, pagesize / getpagesize());
     }
     return ret;
 }
diff --git a/migration/ram.c b/migration/ram.c
index 107ee9d..c93973c 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -168,6 +168,14 @@ void ramblock_recv_bitmap_set(void *host_addr, RAMBlock *rb)
     set_bit_atomic(ramblock_recv_bitmap_offset(host_addr, rb), rb->receivedmap);
 }
 
+void ramblock_recv_bitmap_set_range(RAMBlock *rb, void *host_addr,
+                                    size_t len)
+{
+    bitmap_set(rb->receivedmap,
+               ramblock_recv_bitmap_offset(host_addr, rb),
+               len);
+}
+
 void ramblock_recv_bitmap_clear(void *host_addr, RAMBlock *rb)
 {
     clear_bit(ramblock_recv_bitmap_offset(host_addr, rb), rb->receivedmap);
diff --git a/migration/ram.h b/migration/ram.h
index b711552..84e8623 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -55,6 +55,8 @@ void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
 
 int ramblock_recv_bitmap_test(void *host_addr, RAMBlock *rb);
 void ramblock_recv_bitmap_set(void *host_addr, RAMBlock *rb);
+void ramblock_recv_bitmap_set_range(RAMBlock *rb, void *host_addr,
+                                    size_t len);
 void ramblock_recv_bitmap_clear(void *host_addr, RAMBlock *rb);
 
 #endif
-- 
2.7.4


* [Qemu-devel] [RFC 02/29] migration: fix comment disorder in RAMState
  2017-07-28  8:06 [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
  2017-07-28  8:06 ` [Qemu-devel] [RFC 01/29] migration: fix incorrect postcopy recved_bitmap Peter Xu
@ 2017-07-28  8:06 ` Peter Xu
  2017-07-31 16:39   ` Dr. David Alan Gilbert
  2017-07-28  8:06 ` [Qemu-devel] [RFC 03/29] io: fix qio_channel_socket_accept err handling Peter Xu
                   ` (28 subsequent siblings)
  30 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-07-28  8:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Comments for "migration_dirty_pages" and "bitmap_mutex" are switched.
Fix it.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/ram.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index c93973c..c12358d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -222,9 +222,9 @@ struct RAMState {
     uint64_t iterations_prev;
     /* Iterations since start */
     uint64_t iterations;
-    /* protects modification of the bitmap */
-    uint64_t migration_dirty_pages;
     /* number of dirty bits in the bitmap */
+    uint64_t migration_dirty_pages;
+    /* protects modification of the bitmap */
     QemuMutex bitmap_mutex;
     /* The RAMBlock used in the last src_page_requests */
     RAMBlock *last_req_rb;
-- 
2.7.4


* [Qemu-devel] [RFC 03/29] io: fix qio_channel_socket_accept err handling
  2017-07-28  8:06 [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
  2017-07-28  8:06 ` [Qemu-devel] [RFC 01/29] migration: fix incorrect postcopy recved_bitmap Peter Xu
  2017-07-28  8:06 ` [Qemu-devel] [RFC 02/29] migration: fix comment disorder in RAMState Peter Xu
@ 2017-07-28  8:06 ` Peter Xu
  2017-07-31 16:53   ` Dr. David Alan Gilbert
  2017-07-28  8:06 ` [Qemu-devel] [RFC 04/29] bitmap: introduce bitmap_invert() Peter Xu
                   ` (27 subsequent siblings)
  30 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-07-28  8:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

When accept fails, we should set errp with the reason. More
importantly, the caller may assume errp is non-NULL when an error
happens, and not setting it may crash QEMU.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 io/channel-socket.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/io/channel-socket.c b/io/channel-socket.c
index 53386b7..7bc308e 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -344,6 +344,7 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
         if (errno == EINTR) {
             goto retry;
         }
+        error_setg_errno(errp, errno, "Unable to accept connection");
         goto error;
     }
 
-- 
2.7.4


* [Qemu-devel] [RFC 04/29] bitmap: introduce bitmap_invert()
  2017-07-28  8:06 [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
                   ` (2 preceding siblings ...)
  2017-07-28  8:06 ` [Qemu-devel] [RFC 03/29] io: fix qio_channel_socket_accept err handling Peter Xu
@ 2017-07-28  8:06 ` Peter Xu
  2017-07-31 17:11   ` Dr. David Alan Gilbert
  2017-07-28  8:06 ` [Qemu-devel] [RFC 05/29] bitmap: introduce bitmap_count_one() Peter Xu
                   ` (26 subsequent siblings)
  30 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-07-28  8:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

It is used to invert the whole bitmap.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/qemu/bitmap.h | 10 ++++++++++
 util/bitmap.c         | 13 +++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
index c318da1..460d899 100644
--- a/include/qemu/bitmap.h
+++ b/include/qemu/bitmap.h
@@ -82,6 +82,7 @@ int slow_bitmap_andnot(unsigned long *dst, const unsigned long *bitmap1,
                        const unsigned long *bitmap2, long bits);
 int slow_bitmap_intersects(const unsigned long *bitmap1,
                            const unsigned long *bitmap2, long bits);
+void slow_bitmap_invert(unsigned long *bitmap, long nbits);
 
 static inline unsigned long *bitmap_try_new(long nbits)
 {
@@ -216,6 +217,15 @@ static inline int bitmap_intersects(const unsigned long *src1,
     }
 }
 
+static inline void bitmap_invert(unsigned long *bitmap, long nbits)
+{
+    if (small_nbits(nbits)) {
+        *bitmap ^= BITMAP_LAST_WORD_MASK(nbits);
+    } else {
+        slow_bitmap_invert(bitmap, nbits);
+    }
+}
+
 void bitmap_set(unsigned long *map, long i, long len);
 void bitmap_set_atomic(unsigned long *map, long i, long len);
 void bitmap_clear(unsigned long *map, long start, long nr);
diff --git a/util/bitmap.c b/util/bitmap.c
index efced9a..9b7408c 100644
--- a/util/bitmap.c
+++ b/util/bitmap.c
@@ -355,3 +355,16 @@ int slow_bitmap_intersects(const unsigned long *bitmap1,
     }
     return 0;
 }
+
+void slow_bitmap_invert(unsigned long *bitmap, long nbits)
+{
+    long k, lim = nbits/BITS_PER_LONG;
+
+    for (k = 0; k < lim; k++) {
+        bitmap[k] ^= ULONG_MAX;
+    }
+
+    if (nbits % BITS_PER_LONG) {
+        bitmap[k] ^= BITMAP_LAST_WORD_MASK(nbits);
+    }
+}
-- 
2.7.4


* [Qemu-devel] [RFC 05/29] bitmap: introduce bitmap_count_one()
  2017-07-28  8:06 [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
                   ` (3 preceding siblings ...)
  2017-07-28  8:06 ` [Qemu-devel] [RFC 04/29] bitmap: introduce bitmap_invert() Peter Xu
@ 2017-07-28  8:06 ` Peter Xu
  2017-07-31 17:58   ` Dr. David Alan Gilbert
  2017-07-28  8:06 ` [Qemu-devel] [RFC 06/29] migration: dump str in migrate_set_state trace Peter Xu
                   ` (25 subsequent siblings)
  30 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-07-28  8:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Count how many bits are set in the bitmap.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/qemu/bitmap.h | 10 ++++++++++
 util/bitmap.c         | 15 +++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
index 460d899..9c18da0 100644
--- a/include/qemu/bitmap.h
+++ b/include/qemu/bitmap.h
@@ -83,6 +83,7 @@ int slow_bitmap_andnot(unsigned long *dst, const unsigned long *bitmap1,
 int slow_bitmap_intersects(const unsigned long *bitmap1,
                            const unsigned long *bitmap2, long bits);
 void slow_bitmap_invert(unsigned long *bitmap, long nbits);
+long slow_bitmap_count_one(const unsigned long *bitmap, long nbits);
 
 static inline unsigned long *bitmap_try_new(long nbits)
 {
@@ -226,6 +227,15 @@ static inline void bitmap_invert(unsigned long *bitmap, long nbits)
     }
 }
 
+static inline long bitmap_count_one(const unsigned long *bitmap, long nbits)
+{
+    if (small_nbits(nbits)) {
+        return (ctpopl(*bitmap & BITMAP_LAST_WORD_MASK(nbits)));
+    } else {
+        return slow_bitmap_count_one(bitmap, nbits);
+    }
+}
+
 void bitmap_set(unsigned long *map, long i, long len);
 void bitmap_set_atomic(unsigned long *map, long i, long len);
 void bitmap_clear(unsigned long *map, long start, long nr);
diff --git a/util/bitmap.c b/util/bitmap.c
index 9b7408c..73a1063 100644
--- a/util/bitmap.c
+++ b/util/bitmap.c
@@ -368,3 +368,18 @@ void slow_bitmap_invert(unsigned long *bitmap, long nbits)
         bitmap[k] ^= BITMAP_LAST_WORD_MASK(nbits);
     }
 }
+
+long slow_bitmap_count_one(const unsigned long *bitmap, long nbits)
+{
+    long k, lim = nbits/BITS_PER_LONG, result = 0;
+
+    for (k = 0; k < lim; k++) {
+        result += ctpopl(bitmap[k]);
+    }
+
+    if (nbits % BITS_PER_LONG) {
+        result += ctpopl(bitmap[k] & BITMAP_LAST_WORD_MASK(nbits));
+    }
+
+    return result;
+}
-- 
2.7.4


* [Qemu-devel] [RFC 06/29] migration: dump str in migrate_set_state trace
  2017-07-28  8:06 [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
                   ` (4 preceding siblings ...)
  2017-07-28  8:06 ` [Qemu-devel] [RFC 05/29] bitmap: introduce bitmap_count_one() Peter Xu
@ 2017-07-28  8:06 ` Peter Xu
  2017-07-31 18:27   ` Dr. David Alan Gilbert
  2017-07-28  8:06 ` [Qemu-devel] [RFC 07/29] migration: better error handling with QEMUFile Peter Xu
                   ` (24 subsequent siblings)
  30 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-07-28  8:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Strings are more readable for debugging.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c  | 3 ++-
 migration/trace-events | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 6803187..bdc4445 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -914,8 +914,9 @@ void qmp_migrate_start_postcopy(Error **errp)
 
 void migrate_set_state(int *state, int old_state, int new_state)
 {
+    assert(new_state < MIGRATION_STATUS__MAX);
     if (atomic_cmpxchg(state, old_state, new_state) == old_state) {
-        trace_migrate_set_state(new_state);
+        trace_migrate_set_state(MigrationStatus_lookup[new_state]);
         migrate_generate_event(new_state);
     }
 }
diff --git a/migration/trace-events b/migration/trace-events
index cb2c4b5..08d00fa 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -80,7 +80,7 @@ ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: %
 # migration/migration.c
 await_return_path_close_on_source_close(void) ""
 await_return_path_close_on_source_joining(void) ""
-migrate_set_state(int new_state) "new state %d"
+migrate_set_state(const char *new_state) "new state %s"
 migrate_fd_cleanup(void) ""
 migrate_fd_error(const char *error_desc) "error=%s"
 migrate_fd_cancel(void) ""
-- 
2.7.4


* [Qemu-devel] [RFC 07/29] migration: better error handling with QEMUFile
  2017-07-28  8:06 [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
                   ` (5 preceding siblings ...)
  2017-07-28  8:06 ` [Qemu-devel] [RFC 06/29] migration: dump str in migrate_set_state trace Peter Xu
@ 2017-07-28  8:06 ` Peter Xu
  2017-07-31 18:39   ` Dr. David Alan Gilbert
  2017-07-28  8:06 ` [Qemu-devel] [RFC 08/29] migration: reuse mis->userfault_quit_fd Peter Xu
                   ` (23 subsequent siblings)
  30 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-07-28  8:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

If postcopy goes down for some reason, we always see this on dst:

  qemu-system-x86_64: RP: Received invalid message 0x0000 length 0x0000

However, in most cases that's not the real issue. The problem is that
qemu_get_be16() has no way to indicate whether the returned data is
valid, and we _always_ assume it is. That's possibly not wise.

The best way to solve this would be to refactor the QEMUFile interface
so the APIs can return errors, but that needs quite a bit of work and
testing. For now, let's explicitly check the validity before using the
data, in all the places that call qemu_get_*().

This patch tries to fix most of the cases I can see. Only with this can
we make sure we are processing valid data, and also capture the
channel-down events correctly.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c |  5 +++++
 migration/ram.c       | 22 ++++++++++++++++++----
 migration/savevm.c    | 29 +++++++++++++++++++++++++++--
 3 files changed, 50 insertions(+), 6 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index bdc4445..5b2602e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1543,6 +1543,11 @@ static void *source_return_path_thread(void *opaque)
         header_type = qemu_get_be16(rp);
         header_len = qemu_get_be16(rp);
 
+        if (qemu_file_get_error(rp)) {
+            mark_source_rp_bad(ms);
+            goto out;
+        }
+
         if (header_type >= MIG_RP_MSG_MAX ||
             header_type == MIG_RP_MSG_INVALID) {
             error_report("RP: Received invalid message 0x%04x length 0x%04x",
diff --git a/migration/ram.c b/migration/ram.c
index c12358d..7f4cb0f 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2416,7 +2416,7 @@ static int ram_load_postcopy(QEMUFile *f)
     void *last_host = NULL;
     bool all_zero = false;
 
-    while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
+    while (!(flags & RAM_SAVE_FLAG_EOS)) {
         ram_addr_t addr;
         void *host = NULL;
         void *page_buffer = NULL;
@@ -2425,6 +2425,16 @@ static int ram_load_postcopy(QEMUFile *f)
         uint8_t ch;
 
         addr = qemu_get_be64(f);
+
+        /*
+         * If qemu file error, we should stop here, and then "addr"
+         * may be invalid
+         */
+        if (qemu_file_get_error(f)) {
+            ret = qemu_file_get_error(f);
+            break;
+        }
+
         flags = addr & ~TARGET_PAGE_MASK;
         addr &= TARGET_PAGE_MASK;
 
@@ -2505,6 +2515,13 @@ static int ram_load_postcopy(QEMUFile *f)
             error_report("Unknown combination of migration flags: %#x"
                          " (postcopy mode)", flags);
             ret = -EINVAL;
+            break;
+        }
+
+        /* Detect for any possible file errors */
+        if (qemu_file_get_error(f)) {
+            ret = qemu_file_get_error(f);
+            break;
         }
 
         if (place_needed) {
@@ -2519,9 +2536,6 @@ static int ram_load_postcopy(QEMUFile *f)
                                           place_source, block);
             }
         }
-        if (!ret) {
-            ret = qemu_file_get_error(f);
-        }
     }
 
     return ret;
diff --git a/migration/savevm.c b/migration/savevm.c
index fdd15fa..13ae9d6 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1720,6 +1720,11 @@ static int loadvm_process_command(QEMUFile *f)
     cmd = qemu_get_be16(f);
     len = qemu_get_be16(f);
 
+    /* Check validity before continue processing of cmds */
+    if (qemu_file_get_error(f)) {
+        return qemu_file_get_error(f);
+    }
+
     trace_loadvm_process_command(cmd, len);
     if (cmd >= MIG_CMD_MAX || cmd == MIG_CMD_INVALID) {
         error_report("MIG_CMD 0x%x unknown (len 0x%x)", cmd, len);
@@ -1855,6 +1860,11 @@ qemu_loadvm_section_start_full(QEMUFile *f, MigrationIncomingState *mis)
         return -EINVAL;
     }
 
+    /* Check validity before load the vmstate */
+    if (qemu_file_get_error(f)) {
+        return qemu_file_get_error(f);
+    }
+
     ret = vmstate_load(f, se);
     if (ret < 0) {
         error_report("error while loading state for instance 0x%x of"
@@ -1888,6 +1898,11 @@ qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis)
         return -EINVAL;
     }
 
+    /* Check validity before load the vmstate */
+    if (qemu_file_get_error(f)) {
+        return qemu_file_get_error(f);
+    }
+
     ret = vmstate_load(f, se);
     if (ret < 0) {
         error_report("error while loading state section id %d(%s)",
@@ -1944,8 +1959,14 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
     uint8_t section_type;
     int ret = 0;
 
-    while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
-        ret = 0;
+    while (true) {
+        section_type = qemu_get_byte(f);
+
+        if (qemu_file_get_error(f)) {
+            ret = qemu_file_get_error(f);
+            break;
+        }
+
         trace_qemu_loadvm_state_section(section_type);
         switch (section_type) {
         case QEMU_VM_SECTION_START:
@@ -1969,6 +1990,10 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
                 goto out;
             }
             break;
+        case QEMU_VM_EOF:
+            /* This is the end of migration */
+            goto out;
+            break;
         default:
             error_report("Unknown savevm section type %d", section_type);
             ret = -EINVAL;
-- 
2.7.4


* [Qemu-devel] [RFC 08/29] migration: reuse mis->userfault_quit_fd
  2017-07-28  8:06 [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
                   ` (6 preceding siblings ...)
  2017-07-28  8:06 ` [Qemu-devel] [RFC 07/29] migration: better error handling with QEMUFile Peter Xu
@ 2017-07-28  8:06 ` Peter Xu
  2017-07-31 18:42   ` Dr. David Alan Gilbert
  2017-07-28  8:06 ` [Qemu-devel] [RFC 09/29] migration: provide postcopy_fault_thread_notify() Peter Xu
                   ` (22 subsequent siblings)
  30 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-07-28  8:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

It was previously only used to quit the page fault thread. Make it
something more useful: now it can be used to wake the page fault thread
for any reason, and it only means "quit" when fault_thread_quit is set.

Since we changed what it does, rename it to userfault_event_fd.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.h    |  6 ++++--
 migration/postcopy-ram.c | 24 ++++++++++++++++--------
 2 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 148c9fa..70e3094 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -35,6 +35,8 @@ struct MigrationIncomingState {
     bool           have_fault_thread;
     QemuThread     fault_thread;
     QemuSemaphore  fault_thread_sem;
+    /* Set this when we want the fault thread to quit */
+    bool           fault_thread_quit;
 
     bool           have_listen_thread;
     QemuThread     listen_thread;
@@ -42,8 +44,8 @@ struct MigrationIncomingState {
 
     /* For the kernel to send us notifications */
     int       userfault_fd;
-    /* To tell the fault_thread to quit */
-    int       userfault_quit_fd;
+    /* To notify the fault_thread to wake, e.g., when need to quit */
+    int       userfault_event_fd;
     QEMUFile *to_src_file;
     QemuMutex rp_mutex;    /* We send replies from multiple threads */
     void     *postcopy_tmp_page;
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 952b73a..4278fe7 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -305,7 +305,8 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
          * currently be at 0, we're going to increment it to 1
          */
         tmp64 = 1;
-        if (write(mis->userfault_quit_fd, &tmp64, 8) == 8) {
+        atomic_set(&mis->fault_thread_quit, 1);
+        if (write(mis->userfault_event_fd, &tmp64, 8) == 8) {
             trace_postcopy_ram_incoming_cleanup_join();
             qemu_thread_join(&mis->fault_thread);
         } else {
@@ -315,7 +316,7 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
         }
         trace_postcopy_ram_incoming_cleanup_closeuf();
         close(mis->userfault_fd);
-        close(mis->userfault_quit_fd);
+        close(mis->userfault_event_fd);
         mis->have_fault_thread = false;
     }
 
@@ -438,7 +439,7 @@ static void *postcopy_ram_fault_thread(void *opaque)
         pfd[0].fd = mis->userfault_fd;
         pfd[0].events = POLLIN;
         pfd[0].revents = 0;
-        pfd[1].fd = mis->userfault_quit_fd;
+        pfd[1].fd = mis->userfault_event_fd;
         pfd[1].events = POLLIN; /* Waiting for eventfd to go positive */
         pfd[1].revents = 0;
 
@@ -448,8 +449,15 @@ static void *postcopy_ram_fault_thread(void *opaque)
         }
 
         if (pfd[1].revents) {
-            trace_postcopy_ram_fault_thread_quit();
-            break;
+            uint64_t tmp64 = 0;
+
+            /* Consume the signal */
+            read(mis->userfault_event_fd, &tmp64, 8);
+
+            if (atomic_read(&mis->fault_thread_quit)) {
+                trace_postcopy_ram_fault_thread_quit();
+                break;
+            }
         }
 
         ret = read(mis->userfault_fd, &msg, sizeof(msg));
@@ -528,9 +536,9 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
     }
 
     /* Now an eventfd we use to tell the fault-thread to quit */
-    mis->userfault_quit_fd = eventfd(0, EFD_CLOEXEC);
-    if (mis->userfault_quit_fd == -1) {
-        error_report("%s: Opening userfault_quit_fd: %s", __func__,
+    mis->userfault_event_fd = eventfd(0, EFD_CLOEXEC);
+    if (mis->userfault_event_fd == -1) {
+        error_report("%s: Opening userfault_event_fd: %s", __func__,
                      strerror(errno));
         close(mis->userfault_fd);
         return -1;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [Qemu-devel] [RFC 09/29] migration: provide postcopy_fault_thread_notify()
  2017-07-28  8:06 [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
                   ` (7 preceding siblings ...)
  2017-07-28  8:06 ` [Qemu-devel] [RFC 08/29] migration: reuse mis->userfault_quit_fd Peter Xu
@ 2017-07-28  8:06 ` Peter Xu
  2017-07-31 18:45   ` Dr. David Alan Gilbert
  2017-07-28  8:06 ` [Qemu-devel] [RFC 10/29] migration: new property "x-postcopy-fast" Peter Xu
                   ` (21 subsequent siblings)
  30 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-07-28  8:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

A general helper to notify the fault thread.
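
The notification is a plain Linux eventfd handshake. Below is a minimal
standalone sketch (Linux-only, not QEMU code; all names are hypothetical)
of the notify/consume pattern the helper builds on: the notifier writes an
8-byte value to bump the eventfd counter, and the fault thread reads 8
bytes to consume (and reset) it.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>

/* Wake whoever polls the eventfd; returns 0 on success, -1 on error. */
static int fault_thread_notify(int event_fd)
{
    uint64_t tmp64 = 1;

    /* eventfd reads/writes must be exactly 8 bytes */
    return write(event_fd, &tmp64, 8) == 8 ? 0 : -1;
}

/* Consume the pending signal; returns the accumulated counter value. */
static uint64_t fault_thread_consume(int event_fd)
{
    uint64_t tmp64 = 0;

    /* Without EFD_SEMAPHORE, one read returns the counter and resets it */
    if (read(event_fd, &tmp64, 8) != 8) {
        return 0;
    }
    return tmp64;
}
```

Note that multiple notifications before a single read coalesce into one
counter value, which is why the fault thread only needs one read per poll
wakeup.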

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/postcopy-ram.c | 35 ++++++++++++++++++++---------------
 migration/postcopy-ram.h |  2 ++
 2 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 4278fe7..9ce391d 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -287,6 +287,21 @@ int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages)
     return 0;
 }
 
+void postcopy_fault_thread_notify(MigrationIncomingState *mis)
+{
+    uint64_t tmp64 = 1;
+
+    /*
+     * Tell the fault_thread to exit, it's an eventfd that should
+     * currently be at 0, we're going to increment it to 1
+     */
+    if (write(mis->userfault_event_fd, &tmp64, 8) != 8) {
+        /* Not much we can do here, but may as well report it */
+        error_report("%s: incrementing userfault_event_fd: %s", __func__,
+                     strerror(errno));
+    }
+}
+
 /*
  * At the end of a migration where postcopy_ram_incoming_init was called.
  */
@@ -295,25 +310,15 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
     trace_postcopy_ram_incoming_cleanup_entry();
 
     if (mis->have_fault_thread) {
-        uint64_t tmp64;
-
         if (qemu_ram_foreach_block(cleanup_range, mis)) {
             return -1;
         }
-        /*
-         * Tell the fault_thread to exit, it's an eventfd that should
-         * currently be at 0, we're going to increment it to 1
-         */
-        tmp64 = 1;
+        /* Let the fault thread quit */
         atomic_set(&mis->fault_thread_quit, 1);
-        if (write(mis->userfault_event_fd, &tmp64, 8) == 8) {
-            trace_postcopy_ram_incoming_cleanup_join();
-            qemu_thread_join(&mis->fault_thread);
-        } else {
-            /* Not much we can do here, but may as well report it */
-            error_report("%s: incrementing userfault_quit_fd: %s", __func__,
-                         strerror(errno));
-        }
+        postcopy_fault_thread_notify(mis);
+        trace_postcopy_ram_incoming_cleanup_join();
+        qemu_thread_join(&mis->fault_thread);
+
         trace_postcopy_ram_incoming_cleanup_closeuf();
         close(mis->userfault_fd);
         close(mis->userfault_event_fd);
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index 78a3591..4a7644d 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -114,4 +114,6 @@ PostcopyState postcopy_state_get(void);
 /* Set the state and return the old state */
 PostcopyState postcopy_state_set(PostcopyState new_state);
 
+void postcopy_fault_thread_notify(MigrationIncomingState *mis);
+
 #endif
-- 
2.7.4


* [Qemu-devel] [RFC 10/29] migration: new property "x-postcopy-fast"
  2017-07-28  8:06 [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
                   ` (8 preceding siblings ...)
  2017-07-28  8:06 ` [Qemu-devel] [RFC 09/29] migration: provide postcopy_fault_thread_notify() Peter Xu
@ 2017-07-28  8:06 ` Peter Xu
  2017-07-31 18:52   ` Dr. David Alan Gilbert
  2017-07-28  8:06 ` [Qemu-devel] [RFC 11/29] migration: new postcopy-pause state Peter Xu
                   ` (20 subsequent siblings)
  30 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-07-28  8:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

This provides a way to start postcopy as soon as the migration starts. To
do this, we need both:

  -global migration.x-postcopy-ram=on \
  -global migration.x-postcopy-fast=on

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 9 ++++++++-
 migration/migration.h | 2 ++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 5b2602e..efee87e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1936,6 +1936,11 @@ bool migrate_colo_enabled(void)
     return s->enabled_capabilities[MIGRATION_CAPABILITY_X_COLO];
 }
 
+static bool postcopy_should_start(MigrationState *s)
+{
+    return atomic_read(&s->start_postcopy) || s->start_postcopy_fast;
+}
+
 /*
  * Master migration thread on the source VM.
  * It drives the migration and pumps the data down the outgoing channel.
@@ -2013,7 +2018,7 @@ static void *migration_thread(void *opaque)
                 if (migrate_postcopy_ram() &&
                     s->state != MIGRATION_STATUS_POSTCOPY_ACTIVE &&
                     pend_nonpost <= threshold_size &&
-                    atomic_read(&s->start_postcopy)) {
+                    postcopy_should_start(s)) {
 
                     if (!postcopy_start(s, &old_vm_running)) {
                         current_active_state = MIGRATION_STATUS_POSTCOPY_ACTIVE;
@@ -2170,6 +2175,8 @@ static Property migration_properties[] = {
                      send_configuration, true),
     DEFINE_PROP_BOOL("send-section-footer", MigrationState,
                      send_section_footer, true),
+    DEFINE_PROP_BOOL("x-postcopy-fast", MigrationState,
+                     start_postcopy_fast, false),
 
     /* Migration parameters */
     DEFINE_PROP_INT64("x-compress-level", MigrationState,
diff --git a/migration/migration.h b/migration/migration.h
index 70e3094..e902bae 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -113,6 +113,8 @@ struct MigrationState
 
     /* Flag set once the migration has been asked to enter postcopy */
     bool start_postcopy;
+    /* Set the flag if we want to start postcopy ASAP when migration starts */
+    bool start_postcopy_fast;
     /* Flag set after postcopy has sent the device state */
     bool postcopy_after_devices;
 
-- 
2.7.4


* [Qemu-devel] [RFC 11/29] migration: new postcopy-pause state
  2017-07-28  8:06 [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
                   ` (9 preceding siblings ...)
  2017-07-28  8:06 ` [Qemu-devel] [RFC 10/29] migration: new property "x-postcopy-fast" Peter Xu
@ 2017-07-28  8:06 ` Peter Xu
  2017-07-28 15:53   ` Eric Blake
  2017-07-31 19:06   ` Dr. David Alan Gilbert
  2017-07-28  8:06 ` [Qemu-devel] [RFC 12/29] migration: allow dst vm pause on postcopy Peter Xu
                   ` (19 subsequent siblings)
  30 siblings, 2 replies; 116+ messages in thread
From: Peter Xu @ 2017-07-28  8:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Introduce a new state "postcopy-paused", which can be used to pause a
postcopy migration. It is targeted at surviving network failures during
postcopy migration: when the network goes down during postcopy, the
source side will not fail the migration. Instead, we convert the status
into this new paused state and wait for a recovery attempt.
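
As a rough sketch of the new error classification (hypothetical
standalone code, not the QEMU implementation): only an -EIO seen while in
postcopy-active is treated as recoverable; any other error still fails
the migration immediately.

```c
#include <assert.h>
#include <errno.h>

enum mig_state {
    MIG_ACTIVE,
    MIG_POSTCOPY_ACTIVE,
    MIG_POSTCOPY_PAUSED,
    MIG_FAILED,
};

/* Map (current state, qemu_file error) to the next state. */
static enum mig_state classify_error(enum mig_state state, int err)
{
    if (err == 0) {
        return state;                 /* no error: keep the current state */
    }
    if (state == MIG_POSTCOPY_ACTIVE && err == -EIO) {
        return MIG_POSTCOPY_PAUSED;   /* network hiccup: wait for recovery */
    }
    return MIG_FAILED;                /* precopy or non-IO error: fail now */
}
```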

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c  | 78 +++++++++++++++++++++++++++++++++++++++++++++++---
 migration/migration.h  |  3 ++
 migration/trace-events |  1 +
 qapi-schema.json       |  5 +++-
 4 files changed, 82 insertions(+), 5 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index efee87e..0bc70c8 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -470,6 +470,7 @@ static bool migration_is_setup_or_active(int state)
     switch (state) {
     case MIGRATION_STATUS_ACTIVE:
     case MIGRATION_STATUS_POSTCOPY_ACTIVE:
+    case MIGRATION_STATUS_POSTCOPY_PAUSED:
     case MIGRATION_STATUS_SETUP:
         return true;
 
@@ -545,6 +546,7 @@ MigrationInfo *qmp_query_migrate(Error **errp)
     case MIGRATION_STATUS_ACTIVE:
     case MIGRATION_STATUS_CANCELLING:
     case MIGRATION_STATUS_POSTCOPY_ACTIVE:
+    case MIGRATION_STATUS_POSTCOPY_PAUSED:
          /* TODO add some postcopy stats */
         info->has_status = true;
         info->has_total_time = true;
@@ -991,6 +993,8 @@ static void migrate_fd_cleanup(void *opaque)
 
     notifier_list_notify(&migration_state_notifiers, s);
     block_cleanup_parameters(s);
+
+    qemu_sem_destroy(&s->postcopy_pause_sem);
 }
 
 void migrate_fd_error(MigrationState *s, const Error *error)
@@ -1134,6 +1138,7 @@ MigrationState *migrate_init(void)
     s->migration_thread_running = false;
     error_free(s->error);
     s->error = NULL;
+    qemu_sem_init(&s->postcopy_pause_sem, 0);
 
     migrate_set_state(&s->state, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP);
 
@@ -1942,6 +1947,69 @@ static bool postcopy_should_start(MigrationState *s)
 }
 
 /*
+ * We don't return until we are in a safe state to continue current
+ * postcopy migration.  Returns true to continue the migration, or
+ * false to terminate current migration.
+ */
+static bool postcopy_pause(MigrationState *s)
+{
+    assert(s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
+    migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+                      MIGRATION_STATUS_POSTCOPY_PAUSED);
+
+    /* Current channel is possibly broken. Release it. */
+    assert(s->to_dst_file);
+    qemu_file_shutdown(s->to_dst_file);
+    qemu_fclose(s->to_dst_file);
+    s->to_dst_file = NULL;
+
+    /*
+     * We wait until things are fixed up. Then someone will set the
+     * state back for us.
+     */
+    while (s->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
+        qemu_sem_wait(&s->postcopy_pause_sem);
+    }
+
+    trace_postcopy_pause_continued();
+
+    return true;
+}
+
+/* Return true if we want to stop the migration, otherwise false. */
+static bool migration_detect_error(MigrationState *s)
+{
+    int ret;
+
+    /* Try to detect any file errors */
+    ret = qemu_file_get_error(s->to_dst_file);
+
+    if (!ret) {
+        /* Everything is fine */
+        return false;
+    }
+
+    if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE && ret == -EIO) {
+        /*
+         * For postcopy, we allow the network to be down for a
+         * while. After that, it can be continued by a
+         * recovery phase.
+         */
+        return !postcopy_pause(s);
+    } else {
+        /*
+         * For precopy (or postcopy with error outside IO), we fail
+         * with no time.
+         */
+        migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
+        trace_migration_thread_file_err();
+
+        /* Time to stop the migration, now. */
+        return true;
+    }
+}
+
+/*
  * Master migration thread on the source VM.
  * It drives the migration and pumps the data down the outgoing channel.
  */
@@ -2037,12 +2105,14 @@ static void *migration_thread(void *opaque)
             }
         }
 
-        if (qemu_file_get_error(s->to_dst_file)) {
-            migrate_set_state(&s->state, current_active_state,
-                              MIGRATION_STATUS_FAILED);
-            trace_migration_thread_file_err();
+        /*
+         * Try to detect any kind of failures, and see whether we
+         * should stop the migration now.
+         */
+        if (migration_detect_error(s)) {
             break;
         }
+
         current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
         if (current_time >= initial_time + BUFFER_DELAY) {
             uint64_t transferred_bytes = qemu_ftell(s->to_dst_file) -
diff --git a/migration/migration.h b/migration/migration.h
index e902bae..24cdaf6 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -151,6 +151,9 @@ struct MigrationState
     bool send_configuration;
     /* Whether we send section footer during migration */
     bool send_section_footer;
+
+    /* Needed by postcopy-pause state */
+    QemuSemaphore postcopy_pause_sem;
 };
 
 void migrate_set_state(int *state, int old_state, int new_state);
diff --git a/migration/trace-events b/migration/trace-events
index 08d00fa..2211acc 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -98,6 +98,7 @@ migration_thread_setup_complete(void) ""
 open_return_path_on_source(void) ""
 open_return_path_on_source_continue(void) ""
 postcopy_start(void) ""
+postcopy_pause_continued(void) ""
 postcopy_start_set_run(void) ""
 source_return_path_thread_bad_end(void) ""
 source_return_path_thread_end(void) ""
diff --git a/qapi-schema.json b/qapi-schema.json
index 9c6c3e1..2a36b80 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -667,6 +667,8 @@
 #
 # @postcopy-active: like active, but now in postcopy mode. (since 2.5)
 #
+# @postcopy-paused: during postcopy but paused. (since 2.10)
+#
 # @completed: migration is finished.
 #
 # @failed: some error occurred during migration process.
@@ -679,7 +681,8 @@
 ##
 { 'enum': 'MigrationStatus',
   'data': [ 'none', 'setup', 'cancelling', 'cancelled',
-            'active', 'postcopy-active', 'completed', 'failed', 'colo' ] }
+            'active', 'postcopy-active', 'postcopy-paused',
+            'completed', 'failed', 'colo' ] }
 
 ##
 # @MigrationInfo:
-- 
2.7.4


* [Qemu-devel] [RFC 12/29] migration: allow dst vm pause on postcopy
  2017-07-28  8:06 [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
                   ` (10 preceding siblings ...)
  2017-07-28  8:06 ` [Qemu-devel] [RFC 11/29] migration: new postcopy-pause state Peter Xu
@ 2017-07-28  8:06 ` Peter Xu
  2017-08-01  9:47   ` Dr. David Alan Gilbert
  2017-07-28  8:06 ` [Qemu-devel] [RFC 13/29] migration: allow src return path to pause Peter Xu
                   ` (18 subsequent siblings)
  30 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-07-28  8:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

When there is an IO error on the incoming channel (e.g., network down),
instead of bailing out immediately, we allow the dst vm to switch to the
new POSTCOPY_PAUSE state. Currently it is still simple: it waits on the
new semaphore until someone pokes it for another attempt.
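
The pause/kick handshake can be sketched with POSIX semaphores
(hypothetical standalone names, not QEMU code): the load thread loops on
the semaphore while the state stays paused, and a recovery path flips the
state before posting, so a wakeup without a state change just goes back
to waiting.

```c
#include <assert.h>
#include <semaphore.h>

enum inc_state { INC_POSTCOPY_ACTIVE, INC_POSTCOPY_PAUSED };

struct incoming {
    enum inc_state state;
    sem_t pause_sem;
};

/* Recovery path: set the state first, then wake the waiter. */
static void incoming_kick(struct incoming *mis)
{
    mis->state = INC_POSTCOPY_ACTIVE;
    sem_post(&mis->pause_sem);
}

/* Load thread: block until some recovery path kicks us. */
static void incoming_pause(struct incoming *mis)
{
    mis->state = INC_POSTCOPY_PAUSED;
    /* Re-check the state: a wakeup without a state change is spurious */
    while (mis->state == INC_POSTCOPY_PAUSED) {
        sem_wait(&mis->pause_sem);
    }
}
```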

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c  |  1 +
 migration/migration.h  |  3 +++
 migration/savevm.c     | 45 +++++++++++++++++++++++++++++++++++++++++++++
 migration/trace-events |  2 ++
 4 files changed, 51 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 0bc70c8..c729c5a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -146,6 +146,7 @@ MigrationIncomingState *migration_incoming_get_current(void)
         memset(&mis_current, 0, sizeof(MigrationIncomingState));
         qemu_mutex_init(&mis_current.rp_mutex);
         qemu_event_init(&mis_current.main_thread_load_event, false);
+        qemu_sem_init(&mis_current.postcopy_pause_sem_dst, 0);
         once = true;
     }
     return &mis_current;
diff --git a/migration/migration.h b/migration/migration.h
index 24cdaf6..08b90e8 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -60,6 +60,9 @@ struct MigrationIncomingState {
     /* The coroutine we should enter (back) after failover */
     Coroutine *migration_incoming_co;
     QemuSemaphore colo_incoming_sem;
+
+    /* notify PAUSED postcopy incoming migrations to try to continue */
+    QemuSemaphore postcopy_pause_sem_dst;
 };
 
 MigrationIncomingState *migration_incoming_get_current(void);
diff --git a/migration/savevm.c b/migration/savevm.c
index 13ae9d6..1f62268 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1954,11 +1954,41 @@ void qemu_loadvm_state_cleanup(void)
     }
 }
 
+/* Return true if we should continue the migration, or false. */
+static bool postcopy_pause_incoming(MigrationIncomingState *mis)
+{
+    trace_postcopy_pause_incoming();
+
+    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+                      MIGRATION_STATUS_POSTCOPY_PAUSED);
+
+    assert(mis->from_src_file);
+    qemu_file_shutdown(mis->from_src_file);
+    qemu_fclose(mis->from_src_file);
+    mis->from_src_file = NULL;
+
+    assert(mis->to_src_file);
+    qemu_mutex_lock(&mis->rp_mutex);
+    qemu_file_shutdown(mis->to_src_file);
+    qemu_fclose(mis->to_src_file);
+    mis->to_src_file = NULL;
+    qemu_mutex_unlock(&mis->rp_mutex);
+
+    while (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
+        qemu_sem_wait(&mis->postcopy_pause_sem_dst);
+    }
+
+    trace_postcopy_pause_incoming_continued();
+
+    return true;
+}
+
 static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
 {
     uint8_t section_type;
     int ret = 0;
 
+retry:
     while (true) {
         section_type = qemu_get_byte(f);
 
@@ -2004,6 +2034,21 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
 out:
     if (ret < 0) {
         qemu_file_set_error(f, ret);
+
+        /*
+         * Detect whether it is:
+         *
+         * 1. postcopy running
+         * 2. network failure (-EIO)
+         *
+         * If so, we try to wait for a recovery.
+         */
+        if (mis->state == MIGRATION_STATUS_POSTCOPY_ACTIVE &&
+            ret == -EIO && postcopy_pause_incoming(mis)) {
+            /* Reset f to point to the newly created channel */
+            f = mis->from_src_file;
+            goto retry;
+        }
     }
     return ret;
 }
diff --git a/migration/trace-events b/migration/trace-events
index 2211acc..22a629e 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -99,6 +99,8 @@ open_return_path_on_source(void) ""
 open_return_path_on_source_continue(void) ""
 postcopy_start(void) ""
 postcopy_pause_continued(void) ""
+postcopy_pause_incoming(void) ""
+postcopy_pause_incoming_continued(void) ""
 postcopy_start_set_run(void) ""
 source_return_path_thread_bad_end(void) ""
 source_return_path_thread_end(void) ""
-- 
2.7.4


* [Qemu-devel] [RFC 13/29] migration: allow src return path to pause
  2017-07-28  8:06 [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
                   ` (11 preceding siblings ...)
  2017-07-28  8:06 ` [Qemu-devel] [RFC 12/29] migration: allow dst vm pause on postcopy Peter Xu
@ 2017-07-28  8:06 ` Peter Xu
  2017-08-01 10:01   ` Dr. David Alan Gilbert
  2017-07-28  8:06 ` [Qemu-devel] [RFC 14/29] migration: allow send_rq to fail Peter Xu
                   ` (17 subsequent siblings)
  30 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-07-28  8:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Let the return path thread pause on network failures.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c  | 35 +++++++++++++++++++++++++++++++++--
 migration/migration.h  |  1 +
 migration/trace-events |  2 ++
 3 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index c729c5a..d0b9a86 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -996,6 +996,7 @@ static void migrate_fd_cleanup(void *opaque)
     block_cleanup_parameters(s);
 
     qemu_sem_destroy(&s->postcopy_pause_sem);
+    qemu_sem_destroy(&s->postcopy_pause_rp_sem);
 }
 
 void migrate_fd_error(MigrationState *s, const Error *error)
@@ -1140,6 +1141,7 @@ MigrationState *migrate_init(void)
     error_free(s->error);
     s->error = NULL;
     qemu_sem_init(&s->postcopy_pause_sem, 0);
+    qemu_sem_init(&s->postcopy_pause_rp_sem, 0);
 
     migrate_set_state(&s->state, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP);
 
@@ -1527,6 +1529,18 @@ static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
     }
 }
 
+/* Return true to retry, false to quit */
+static bool postcopy_pause_return_path_thread(MigrationState *s)
+{
+    trace_postcopy_pause_return_path();
+
+    qemu_sem_wait(&s->postcopy_pause_rp_sem);
+
+    trace_postcopy_pause_return_path_continued();
+
+    return true;
+}
+
 /*
  * Handles messages sent on the return path towards the source VM
  *
@@ -1543,6 +1557,8 @@ static void *source_return_path_thread(void *opaque)
     int res;
 
     trace_source_return_path_thread_entry();
+
+retry:
     while (!ms->rp_state.error && !qemu_file_get_error(rp) &&
            migration_is_setup_or_active(ms->state)) {
         trace_source_return_path_thread_loop_top();
@@ -1634,13 +1650,28 @@ static void *source_return_path_thread(void *opaque)
             break;
         }
     }
-    if (qemu_file_get_error(rp)) {
+
+out:
+    res = qemu_file_get_error(rp);
+    if (res) {
+        if (res == -EIO) {
+            /*
+             * Maybe there is something we can do: it looks like a
+             * network down issue, and we pause for a recovery.
+             */
+            if (postcopy_pause_return_path_thread(ms)) {
+                /* Reload rp, reset the rest */
+                rp = ms->rp_state.from_dst_file;
+                ms->rp_state.error = false;
+                goto retry;
+            }
+        }
+
         trace_source_return_path_thread_bad_end();
         mark_source_rp_bad(ms);
     }
 
     trace_source_return_path_thread_end();
-out:
     ms->rp_state.from_dst_file = NULL;
     qemu_fclose(rp);
     return NULL;
diff --git a/migration/migration.h b/migration/migration.h
index 08b90e8..7aaab13 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -157,6 +157,7 @@ struct MigrationState
 
     /* Needed by postcopy-pause state */
     QemuSemaphore postcopy_pause_sem;
+    QemuSemaphore postcopy_pause_rp_sem;
 };
 
 void migrate_set_state(int *state, int old_state, int new_state);
diff --git a/migration/trace-events b/migration/trace-events
index 22a629e..a269eec 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -98,6 +98,8 @@ migration_thread_setup_complete(void) ""
 open_return_path_on_source(void) ""
 open_return_path_on_source_continue(void) ""
 postcopy_start(void) ""
+postcopy_pause_return_path(void) ""
+postcopy_pause_return_path_continued(void) ""
 postcopy_pause_continued(void) ""
 postcopy_pause_incoming(void) ""
 postcopy_pause_incoming_continued(void) ""
-- 
2.7.4


* [Qemu-devel] [RFC 14/29] migration: allow send_rq to fail
  2017-07-28  8:06 [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
                   ` (12 preceding siblings ...)
  2017-07-28  8:06 ` [Qemu-devel] [RFC 13/29] migration: allow src return path to pause Peter Xu
@ 2017-07-28  8:06 ` Peter Xu
  2017-08-01 10:30   ` Dr. David Alan Gilbert
  2017-07-28  8:06 ` [Qemu-devel] [RFC 15/29] migration: allow fault thread to pause Peter Xu
                   ` (16 subsequent siblings)
  30 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-07-28  8:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Previously we did not allow failures when sending data from the
destination to the source via the return path. However, errors can
happen along the way.  This patch allows migrate_send_rp_message() to
return an error when that happens, and extends the same to
migrate_send_rp_req_pages().
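
For reference, the page-request wire format built here can be sketched as
standalone code (hypothetical names; byte layout taken from the patch):
an 8-byte big-endian start offset, a 4-byte big-endian length, and, when
a ramblock name is given, a 1-byte name length followed by the name.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <endian.h>   /* htobe64/htobe32 (glibc) */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pack a page request into buf; returns the message length in bytes.
 * buf must hold at least 12 + 1 + 255 bytes. */
static size_t pack_req_pages(uint8_t *buf, uint64_t start, uint32_t len,
                             const char *rbname)
{
    size_t msglen = 12;                 /* start (8) + len (4) */
    uint64_t be_start = htobe64(start);
    uint32_t be_len = htobe32(len);

    memcpy(buf, &be_start, 8);
    memcpy(buf + 8, &be_len, 4);

    if (rbname) {
        size_t rbname_len = strlen(rbname);

        assert(rbname_len < 256);       /* length must fit in one byte */
        buf[msglen++] = (uint8_t)rbname_len;
        memcpy(buf + msglen, rbname, rbname_len);
        msglen += rbname_len;
    }
    return msglen;
}
```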

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 38 ++++++++++++++++++++++++++++++--------
 migration/migration.h |  2 +-
 2 files changed, 31 insertions(+), 9 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index d0b9a86..9a0b5b0 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -195,17 +195,35 @@ static void deferred_incoming_migration(Error **errp)
  * Send a message on the return channel back to the source
  * of the migration.
  */
-static void migrate_send_rp_message(MigrationIncomingState *mis,
-                                    enum mig_rp_message_type message_type,
-                                    uint16_t len, void *data)
+static int migrate_send_rp_message(MigrationIncomingState *mis,
+                                   enum mig_rp_message_type message_type,
+                                   uint16_t len, void *data)
 {
+    int ret = 0;
+
     trace_migrate_send_rp_message((int)message_type, len);
     qemu_mutex_lock(&mis->rp_mutex);
+
+    /*
+     * It's possible that the file handle got lost due to network
+     * failures.
+     */
+    if (!mis->to_src_file) {
+        ret = -EIO;
+        goto error;
+    }
+
     qemu_put_be16(mis->to_src_file, (unsigned int)message_type);
     qemu_put_be16(mis->to_src_file, len);
     qemu_put_buffer(mis->to_src_file, data, len);
     qemu_fflush(mis->to_src_file);
+
+    /* It's possible that the qemu file got an error during sending */
+    ret = qemu_file_get_error(mis->to_src_file);
+
+error:
     qemu_mutex_unlock(&mis->rp_mutex);
+    return ret;
 }
 
 /* Request a range of pages from the source VM at the given
@@ -215,26 +233,30 @@ static void migrate_send_rp_message(MigrationIncomingState *mis,
  *   Start: Address offset within the RB
  *   Len: Length in bytes required - must be a multiple of pagesize
  */
-void migrate_send_rp_req_pages(MigrationIncomingState *mis, const char *rbname,
-                               ram_addr_t start, size_t len)
+int migrate_send_rp_req_pages(MigrationIncomingState *mis, const char *rbname,
+                              ram_addr_t start, size_t len)
 {
     uint8_t bufc[12 + 1 + 255]; /* start (8), len (4), rbname up to 256 */
     size_t msglen = 12; /* start + len */
+    int rbname_len;
+    enum mig_rp_message_type msg_type;
 
     *(uint64_t *)bufc = cpu_to_be64((uint64_t)start);
     *(uint32_t *)(bufc + 8) = cpu_to_be32((uint32_t)len);
 
     if (rbname) {
-        int rbname_len = strlen(rbname);
+        rbname_len = strlen(rbname);
         assert(rbname_len < 256);
 
         bufc[msglen++] = rbname_len;
         memcpy(bufc + msglen, rbname, rbname_len);
         msglen += rbname_len;
-        migrate_send_rp_message(mis, MIG_RP_MSG_REQ_PAGES_ID, msglen, bufc);
+        msg_type = MIG_RP_MSG_REQ_PAGES_ID;
     } else {
-        migrate_send_rp_message(mis, MIG_RP_MSG_REQ_PAGES, msglen, bufc);
+        msg_type = MIG_RP_MSG_REQ_PAGES;
     }
+
+    return migrate_send_rp_message(mis, msg_type, msglen, bufc);
 }
 
 void qemu_start_incoming_migration(const char *uri, Error **errp)
diff --git a/migration/migration.h b/migration/migration.h
index 7aaab13..047872b 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -201,7 +201,7 @@ void migrate_send_rp_shut(MigrationIncomingState *mis,
                           uint32_t value);
 void migrate_send_rp_pong(MigrationIncomingState *mis,
                           uint32_t value);
-void migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* rbname,
+int migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* rbname,
                               ram_addr_t start, size_t len);
 
 #endif
-- 
2.7.4


* [Qemu-devel] [RFC 15/29] migration: allow fault thread to pause
  2017-07-28  8:06 [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
                   ` (13 preceding siblings ...)
  2017-07-28  8:06 ` [Qemu-devel] [RFC 14/29] migration: allow send_rq to fail Peter Xu
@ 2017-07-28  8:06 ` Peter Xu
  2017-08-01 10:41   ` Dr. David Alan Gilbert
  2017-07-28  8:06 ` [Qemu-devel] [RFC 16/29] qmp: hmp: add migrate "resume" option Peter Xu
                   ` (15 subsequent siblings)
  30 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-07-28  8:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Allow the fault thread to stop handling page faults temporarily. When a
network failure happens (and if we expect a recovery afterwards), we
should not let the fault thread continue sending requests to the source;
instead, it should halt until the connection is rebuilt.

When the dest main thread notices the failure, it kicks the fault thread
to switch to the pause state.
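
The retry flow can be illustrated with a small standalone sketch
(hypothetical helpers, not QEMU code): a page request that fails with
-EIO pauses and retries once the channel is rebuilt, while any other
error is fatal.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in sender: fails with -EIO the first '*failures' times. */
static int fake_send_req_pages(int *failures)
{
    if (*failures > 0) {
        (*failures)--;
        return -EIO;
    }
    return 0;
}

/* Returns 0 once the request went through, or the fatal error code. */
static int request_page_with_retry(int *failures, int *pauses)
{
    int ret;

retry:
    ret = fake_send_req_pages(failures);
    if (ret == -EIO) {
        /* Stand-in for postcopy_pause_fault_thread(): wait for recovery */
        (*pauses)++;
        goto retry;
    }
    return ret;
}
```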

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c    |  1 +
 migration/migration.h    |  1 +
 migration/postcopy-ram.c | 50 ++++++++++++++++++++++++++++++++++++++++++++----
 migration/savevm.c       |  3 +++
 migration/trace-events   |  2 ++
 5 files changed, 53 insertions(+), 4 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 9a0b5b0..9d93836 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -147,6 +147,7 @@ MigrationIncomingState *migration_incoming_get_current(void)
         qemu_mutex_init(&mis_current.rp_mutex);
         qemu_event_init(&mis_current.main_thread_load_event, false);
         qemu_sem_init(&mis_current.postcopy_pause_sem_dst, 0);
+        qemu_sem_init(&mis_current.postcopy_pause_sem_fault, 0);
         once = true;
     }
     return &mis_current;
diff --git a/migration/migration.h b/migration/migration.h
index 047872b..574fedd 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -63,6 +63,7 @@ struct MigrationIncomingState {
 
     /* notify PAUSED postcopy incoming migrations to try to continue */
     QemuSemaphore postcopy_pause_sem_dst;
+    QemuSemaphore postcopy_pause_sem_fault;
 };
 
 MigrationIncomingState *migration_incoming_get_current(void);
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 9ce391d..ba53155 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -418,6 +418,17 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
     return 0;
 }
 
+static bool postcopy_pause_fault_thread(MigrationIncomingState *mis)
+{
+    trace_postcopy_pause_fault_thread();
+
+    qemu_sem_wait(&mis->postcopy_pause_sem_fault);
+
+    trace_postcopy_pause_fault_thread_continued();
+
+    return true;
+}
+
 /*
  * Handle faults detected by the USERFAULT markings
  */
@@ -465,6 +476,22 @@ static void *postcopy_ram_fault_thread(void *opaque)
             }
         }
 
+        if (!mis->to_src_file) {
+            /*
+             * Possibly someone tells us that the return path is
+             * broken already using the event. We should hold until
+             * the channel is rebuilt.
+             */
+            if (postcopy_pause_fault_thread(mis)) {
+                last_rb = NULL;
+                /* Continue to read the userfaultfd */
+            } else {
+                error_report("%s: paused but not allowed to continue",
+                             __func__);
+                break;
+            }
+        }
+
         ret = read(mis->userfault_fd, &msg, sizeof(msg));
         if (ret != sizeof(msg)) {
             if (errno == EAGAIN) {
@@ -504,18 +531,33 @@ static void *postcopy_ram_fault_thread(void *opaque)
                                                 qemu_ram_get_idstr(rb),
                                                 rb_offset);
 
+retry:
         /*
          * Send the request to the source - we want to request one
          * of our host page sizes (which is >= TPS)
          */
         if (rb != last_rb) {
             last_rb = rb;
-            migrate_send_rp_req_pages(mis, qemu_ram_get_idstr(rb),
-                                     rb_offset, qemu_ram_pagesize(rb));
+            ret = migrate_send_rp_req_pages(mis, qemu_ram_get_idstr(rb),
+                                            rb_offset, qemu_ram_pagesize(rb));
         } else {
             /* Save some space */
-            migrate_send_rp_req_pages(mis, NULL,
-                                     rb_offset, qemu_ram_pagesize(rb));
+            ret = migrate_send_rp_req_pages(mis, NULL,
+                                            rb_offset, qemu_ram_pagesize(rb));
+        }
+
+        if (ret) {
+            /* Possibly a network failure; wait for recovery */
+            if (ret == -EIO && postcopy_pause_fault_thread(mis)) {
+                /* We got reconnected somehow, try to continue */
+                last_rb = NULL;
+                goto retry;
+            } else {
+                /* This is an unavoidable fault */
+                error_report("%s: migrate_send_rp_req_pages() returned %d",
+                             __func__, ret);
+                break;
+            }
         }
     }
     trace_postcopy_ram_fault_thread_exit();
diff --git a/migration/savevm.c b/migration/savevm.c
index 1f62268..386788d 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1974,6 +1974,9 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
     mis->to_src_file = NULL;
     qemu_mutex_unlock(&mis->rp_mutex);
 
+    /* Notify the fault thread of the invalidated file handle */
+    postcopy_fault_thread_notify(mis);
+
     while (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
         qemu_sem_wait(&mis->postcopy_pause_sem_dst);
     }
diff --git a/migration/trace-events b/migration/trace-events
index a269eec..dbb4971 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -100,6 +100,8 @@ open_return_path_on_source_continue(void) ""
 postcopy_start(void) ""
 postcopy_pause_return_path(void) ""
 postcopy_pause_return_path_continued(void) ""
+postcopy_pause_fault_thread(void) ""
+postcopy_pause_fault_thread_continued(void) ""
 postcopy_pause_continued(void) ""
 postcopy_pause_incoming(void) ""
 postcopy_pause_incoming_continued(void) ""
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [Qemu-devel] [RFC 16/29] qmp: hmp: add migrate "resume" option
  2017-07-28  8:06 [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
                   ` (14 preceding siblings ...)
  2017-07-28  8:06 ` [Qemu-devel] [RFC 15/29] migration: allow fault thread to pause Peter Xu
@ 2017-07-28  8:06 ` Peter Xu
  2017-07-28 15:57   ` Eric Blake
                     ` (2 more replies)
  2017-07-28  8:06 ` [Qemu-devel] [RFC 17/29] migration: rebuild channel on source Peter Xu
                   ` (14 subsequent siblings)
  30 siblings, 3 replies; 116+ messages in thread
From: Peter Xu @ 2017-07-28  8:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

It will be used when we want to resume a paused migration.
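With the new flag, resuming via QMP might look like the following (the URI and address are placeholders, not values from this series):

```json
{ "execute": "migrate",
  "arguments": { "uri": "tcp:192.168.1.1:4444", "resume": true } }
```

The HMP equivalent added below would be "migrate -r tcp:192.168.1.1:4444".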

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hmp-commands.hx       | 7 ++++---
 hmp.c                 | 4 +++-
 migration/migration.c | 2 +-
 qapi-schema.json      | 5 ++++-
 4 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 1941e19..7adb029 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -928,13 +928,14 @@ ETEXI
 
     {
         .name       = "migrate",
-        .args_type  = "detach:-d,blk:-b,inc:-i,uri:s",
-        .params     = "[-d] [-b] [-i] uri",
+        .args_type  = "detach:-d,blk:-b,inc:-i,resume:-r,uri:s",
+        .params     = "[-d] [-b] [-i] [-r] uri",
         .help       = "migrate to URI (using -d to not wait for completion)"
 		      "\n\t\t\t -b for migration without shared storage with"
 		      " full copy of disk\n\t\t\t -i for migration without "
 		      "shared storage with incremental copy of disk "
-		      "(base image shared between src and destination)",
+		      "(base image shared between src and destination)"
+                      "\n\t\t\t -r to resume a paused migration",
         .cmd        = hmp_migrate,
     },
 
diff --git a/hmp.c b/hmp.c
index fd80dce..ebc1563 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1891,10 +1891,12 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
     bool detach = qdict_get_try_bool(qdict, "detach", false);
     bool blk = qdict_get_try_bool(qdict, "blk", false);
     bool inc = qdict_get_try_bool(qdict, "inc", false);
+    bool resume = qdict_get_try_bool(qdict, "resume", false);
     const char *uri = qdict_get_str(qdict, "uri");
     Error *err = NULL;
 
-    qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false, &err);
+    qmp_migrate(uri, !!blk, blk, !!inc, inc,
+                false, false, true, resume, &err);
     if (err) {
         error_report_err(err);
         return;
diff --git a/migration/migration.c b/migration/migration.c
index 9d93836..36ff8c3 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1238,7 +1238,7 @@ bool migration_is_blocked(Error **errp)
 
 void qmp_migrate(const char *uri, bool has_blk, bool blk,
                  bool has_inc, bool inc, bool has_detach, bool detach,
-                 Error **errp)
+                 bool has_resume, bool resume, Error **errp)
 {
     Error *local_err = NULL;
     MigrationState *s = migrate_get_current();
diff --git a/qapi-schema.json b/qapi-schema.json
index 2a36b80..27b7c4c 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3208,6 +3208,8 @@
 # @detach: this argument exists only for compatibility reasons and
 #          is ignored by QEMU
 #
+# @resume: resume one paused migration
+#
 # Returns: nothing on success
 #
 # Since: 0.14.0
@@ -3229,7 +3231,8 @@
 #
 ##
 { 'command': 'migrate',
-  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' } }
+  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool',
+           '*detach': 'bool', '*resume': 'bool' } }
 
 ##
 # @migrate-incoming:
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [Qemu-devel] [RFC 17/29] migration: rebuild channel on source
  2017-07-28  8:06 [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
                   ` (15 preceding siblings ...)
  2017-07-28  8:06 ` [Qemu-devel] [RFC 16/29] qmp: hmp: add migrate "resume" option Peter Xu
@ 2017-07-28  8:06 ` Peter Xu
  2017-08-01 10:59   ` Dr. David Alan Gilbert
  2017-07-28  8:06 ` [Qemu-devel] [RFC 18/29] migration: new state "postcopy-recover" Peter Xu
                   ` (13 subsequent siblings)
  30 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-07-28  8:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

This patch detects the "resume" flag of the migration command, and
rebuilds the channels only if the flag is set.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 52 ++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 41 insertions(+), 11 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 36ff8c3..64de0ee 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1244,6 +1244,15 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     MigrationState *s = migrate_get_current();
     const char *p;
 
+    if (has_resume && resume) {
+        if (s->state != MIGRATION_STATUS_POSTCOPY_PAUSED) {
+            error_setg(errp, "Cannot resume if there is no "
+                       "paused migration");
+            return;
+        }
+        goto do_resume;
+    }
+
     if (migration_is_setup_or_active(s->state) ||
         s->state == MIGRATION_STATUS_CANCELLING ||
         s->state == MIGRATION_STATUS_COLO) {
@@ -1279,6 +1288,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 
     s = migrate_init();
 
+do_resume:
     if (strstart(uri, "tcp:", &p)) {
         tcp_start_outgoing_migration(s, p, &local_err);
 #ifdef CONFIG_RDMA
@@ -1700,7 +1710,8 @@ out:
     return NULL;
 }
 
-static int open_return_path_on_source(MigrationState *ms)
+static int open_return_path_on_source(MigrationState *ms,
+                                      bool create_thread)
 {
 
     ms->rp_state.from_dst_file = qemu_file_get_return_path(ms->to_dst_file);
@@ -1709,6 +1720,12 @@ static int open_return_path_on_source(MigrationState *ms)
     }
 
     trace_open_return_path_on_source();
+
+    if (!create_thread) {
+        /* We're done */
+        return 0;
+    }
+
     qemu_thread_create(&ms->rp_state.rp_thread, "return path",
                        source_return_path_thread, ms, QEMU_THREAD_JOINABLE);
 
@@ -2249,15 +2266,24 @@ static void *migration_thread(void *opaque)
 
 void migrate_fd_connect(MigrationState *s)
 {
-    s->expected_downtime = s->parameters.downtime_limit;
-    s->cleanup_bh = qemu_bh_new(migrate_fd_cleanup, s);
+    int64_t rate_limit;
+    bool resume = s->state == MIGRATION_STATUS_POSTCOPY_PAUSED;
 
-    qemu_file_set_blocking(s->to_dst_file, true);
-    qemu_file_set_rate_limit(s->to_dst_file,
-                             s->parameters.max_bandwidth / XFER_LIMIT_RATIO);
+    if (resume) {
+        /* This is a resumed migration */
+        rate_limit = INT64_MAX;
+    } else {
+        /* This is a fresh new migration */
+        rate_limit = s->parameters.max_bandwidth / XFER_LIMIT_RATIO;
+        s->expected_downtime = s->parameters.downtime_limit;
+        s->cleanup_bh = qemu_bh_new(migrate_fd_cleanup, s);
 
-    /* Notify before starting migration thread */
-    notifier_list_notify(&migration_state_notifiers, s);
+        /* Notify before starting migration thread */
+        notifier_list_notify(&migration_state_notifiers, s);
+    }
+
+    qemu_file_set_rate_limit(s->to_dst_file, rate_limit);
+    qemu_file_set_blocking(s->to_dst_file, true);
 
     /*
      * Open the return path. For postcopy, it is used exclusively. For
@@ -2265,15 +2291,19 @@ void migrate_fd_connect(MigrationState *s)
      * QEMU uses the return path.
      */
     if (migrate_postcopy_ram() || migrate_use_return_path()) {
-        if (open_return_path_on_source(s)) {
+        if (open_return_path_on_source(s, !resume)) {
             error_report("Unable to open return-path for postcopy");
-            migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
-                              MIGRATION_STATUS_FAILED);
+            migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
             migrate_fd_cleanup(s);
             return;
         }
     }
 
+    if (resume) {
+        /* TODO: do the resume logic */
+        return;
+    }
+
     qemu_thread_create(&s->thread, "live_migration", migration_thread, s,
                        QEMU_THREAD_JOINABLE);
     s->migration_thread_running = true;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [Qemu-devel] [RFC 18/29] migration: new state "postcopy-recover"
  2017-07-28  8:06 [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
                   ` (16 preceding siblings ...)
  2017-07-28  8:06 ` [Qemu-devel] [RFC 17/29] migration: rebuild channel on source Peter Xu
@ 2017-07-28  8:06 ` Peter Xu
  2017-08-01 11:36   ` Dr. David Alan Gilbert
  2017-07-28  8:06 ` [Qemu-devel] [RFC 19/29] migration: let dst listen on port always Peter Xu
                   ` (12 subsequent siblings)
  30 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-07-28  8:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Introduce the new migration state "postcopy-recover". If a migration
procedure is paused and the connection is rebuilt successfully
afterwards, we switch the source VM state from "postcopy-paused" to
the new state "postcopy-recover", then do the resume logic in the
migration thread (along with the return path thread).

This patch only does the state switch on the source side. A follow-up
patch will handle the state switch on the destination side using the
same status bit.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 45 +++++++++++++++++++++++++++++++++++++++++----
 qapi-schema.json      |  4 +++-
 2 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 64de0ee..3aabe11 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -495,6 +495,7 @@ static bool migration_is_setup_or_active(int state)
     case MIGRATION_STATUS_ACTIVE:
     case MIGRATION_STATUS_POSTCOPY_ACTIVE:
     case MIGRATION_STATUS_POSTCOPY_PAUSED:
+    case MIGRATION_STATUS_POSTCOPY_RECOVER:
     case MIGRATION_STATUS_SETUP:
         return true;
 
@@ -571,6 +572,7 @@ MigrationInfo *qmp_query_migrate(Error **errp)
     case MIGRATION_STATUS_CANCELLING:
     case MIGRATION_STATUS_POSTCOPY_ACTIVE:
     case MIGRATION_STATUS_POSTCOPY_PAUSED:
+    case MIGRATION_STATUS_POSTCOPY_RECOVER:
          /* TODO add some postcopy stats */
         info->has_status = true;
         info->has_total_time = true;
@@ -2018,6 +2020,13 @@ static bool postcopy_should_start(MigrationState *s)
     return atomic_read(&s->start_postcopy) || s->start_postcopy_fast;
 }
 
+/* Return zero if success, or <0 for error */
+static int postcopy_do_resume(MigrationState *s)
+{
+    /* TODO: do the resume logic */
+    return 0;
+}
+
 /*
  * We don't return until we are in a safe state to continue current
  * postcopy migration.  Returns true to continue the migration, or
@@ -2026,7 +2035,9 @@ static bool postcopy_should_start(MigrationState *s)
 static bool postcopy_pause(MigrationState *s)
 {
     assert(s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
-    migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+
+do_pause:
+    migrate_set_state(&s->state, s->state,
                       MIGRATION_STATUS_POSTCOPY_PAUSED);
 
     /* Current channel is possibly broken. Release it. */
@@ -2043,9 +2054,32 @@ static bool postcopy_pause(MigrationState *s)
         qemu_sem_wait(&s->postcopy_pause_sem);
     }
 
-    trace_postcopy_pause_continued();
+    if (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
+        /* We were woken up by a recovery procedure. Give it a shot */
 
-    return true;
+        /*
+         * Firstly, let's wake up the return path now, with a new
+         * return path channel.
+         */
+        qemu_sem_post(&s->postcopy_pause_rp_sem);
+
+        /* Do the resume logic */
+        if (postcopy_do_resume(s) == 0) {
+            /* Let's continue! */
+            trace_postcopy_pause_continued();
+            return true;
+        } else {
+            /*
+             * Something went wrong during recovery, so pause
+             * again. Pausing is always better than throwing data
+             * away.
+             */
+            goto do_pause;
+        }
+    } else {
+        /* This is not right... Time to quit. */
+        return false;
+    }
 }
 
 /* Return true if we want to stop the migration, otherwise false. */
@@ -2300,7 +2334,10 @@ void migrate_fd_connect(MigrationState *s)
     }
 
     if (resume) {
-        /* TODO: do the resume logic */
+        /* Wake up the main migration thread to do the recovery */
+        migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_PAUSED,
+                          MIGRATION_STATUS_POSTCOPY_RECOVER);
+        qemu_sem_post(&s->postcopy_pause_sem);
         return;
     }
 
diff --git a/qapi-schema.json b/qapi-schema.json
index 27b7c4c..10f1f60 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -669,6 +669,8 @@
 #
 # @postcopy-paused: during postcopy but paused. (since 2.10)
 #
+# @postcopy-recover: trying to recover from a paused postcopy. (since 2.11)
+#
 # @completed: migration is finished.
 #
 # @failed: some error occurred during migration process.
@@ -682,7 +684,7 @@
 { 'enum': 'MigrationStatus',
   'data': [ 'none', 'setup', 'cancelling', 'cancelled',
             'active', 'postcopy-active', 'postcopy-paused',
-            'completed', 'failed', 'colo' ] }
+            'postcopy-recover', 'completed', 'failed', 'colo' ] }
 
 ##
 # @MigrationInfo:
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [Qemu-devel] [RFC 19/29] migration: let dst listen on port always
  2017-07-28  8:06 [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
                   ` (17 preceding siblings ...)
  2017-07-28  8:06 ` [Qemu-devel] [RFC 18/29] migration: new state "postcopy-recover" Peter Xu
@ 2017-07-28  8:06 ` Peter Xu
  2017-08-01 10:56   ` Daniel P. Berrange
  2017-07-28  8:06 ` [Qemu-devel] [RFC 20/29] migration: wakeup dst ram-load-thread for recover Peter Xu
                   ` (11 subsequent siblings)
  30 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-07-28  8:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/exec.c   | 2 +-
 migration/fd.c     | 2 +-
 migration/socket.c | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/migration/exec.c b/migration/exec.c
index 08b599e..b4412db 100644
--- a/migration/exec.c
+++ b/migration/exec.c
@@ -49,7 +49,7 @@ static gboolean exec_accept_incoming_migration(QIOChannel *ioc,
 {
     migration_channel_process_incoming(ioc);
     object_unref(OBJECT(ioc));
-    return FALSE; /* unregister */
+    return TRUE; /* keep it registered */
 }
 
 void exec_start_incoming_migration(const char *command, Error **errp)
diff --git a/migration/fd.c b/migration/fd.c
index 30f5258..865277a 100644
--- a/migration/fd.c
+++ b/migration/fd.c
@@ -49,7 +49,7 @@ static gboolean fd_accept_incoming_migration(QIOChannel *ioc,
 {
     migration_channel_process_incoming(ioc);
     object_unref(OBJECT(ioc));
-    return FALSE; /* unregister */
+    return TRUE; /* keep it registered */
 }
 
 void fd_start_incoming_migration(const char *infd, Error **errp)
diff --git a/migration/socket.c b/migration/socket.c
index 757d382..f2c2d01 100644
--- a/migration/socket.c
+++ b/migration/socket.c
@@ -153,8 +153,8 @@ static gboolean socket_accept_incoming_migration(QIOChannel *ioc,
 
 out:
     /* Close listening socket as its no longer needed */
-    qio_channel_close(ioc, NULL);
-    return FALSE; /* unregister */
+    /* qio_channel_close(ioc, NULL); */
+    return TRUE; /* keep it registered */
 }
 
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [Qemu-devel] [RFC 20/29] migration: wakeup dst ram-load-thread for recover
  2017-07-28  8:06 [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
                   ` (18 preceding siblings ...)
  2017-07-28  8:06 ` [Qemu-devel] [RFC 19/29] migration: let dst listen on port always Peter Xu
@ 2017-07-28  8:06 ` Peter Xu
  2017-08-03  9:28   ` Dr. David Alan Gilbert
  2017-07-28  8:06 ` [Qemu-devel] [RFC 21/29] migration: new cmd MIG_CMD_RECV_BITMAP Peter Xu
                   ` (10 subsequent siblings)
  30 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-07-28  8:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

On the destination side, we cannot wake up all the threads when we get
reconnected. The first thing to do is to wake up the main load thread,
so that we can continue to receive valid messages from the source again
and reply when needed.

At this point, we switch the destination VM state from postcopy-paused
to postcopy-recover.

Now we are finally ready to do the resume logic.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 34 +++++++++++++++++++++++++++++++---
 1 file changed, 31 insertions(+), 3 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 3aabe11..e498fa4 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -389,10 +389,38 @@ static void process_incoming_migration_co(void *opaque)
 
 void migration_fd_process_incoming(QEMUFile *f)
 {
-    Coroutine *co = qemu_coroutine_create(process_incoming_migration_co, f);
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    Coroutine *co;
+
+    mis->from_src_file = f;
+
+    if (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
+        /* Resumed migration to postcopy state */
+
+        /* Postcopy has standalone thread to do vm load */
+        qemu_file_set_blocking(f, true);
+
+        /* Re-configure the return path */
+        mis->to_src_file = qemu_file_get_return_path(f);
 
-    qemu_file_set_blocking(f, false);
-    qemu_coroutine_enter(co);
+        /* Switch the migration status to postcopy-recover */
+        migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_PAUSED,
+                          MIGRATION_STATUS_POSTCOPY_RECOVER);
+
+        /*
+         * Here, we only wake up the main loading thread (while the
+         * fault thread will still be waiting), so that we can receive
+         * commands from the source now, and answer them if needed. The
+         * fault thread will only be woken up once we are sure that the
+         * source is ready to reply to page requests.
+         */
+        qemu_sem_post(&mis->postcopy_pause_sem_dst);
+    } else {
+        /* New incoming migration */
+        qemu_file_set_blocking(f, false);
+        co = qemu_coroutine_create(process_incoming_migration_co, f);
+        qemu_coroutine_enter(co);
+    }
 }
 
 /*
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [Qemu-devel] [RFC 21/29] migration: new cmd MIG_CMD_RECV_BITMAP
  2017-07-28  8:06 [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
                   ` (19 preceding siblings ...)
  2017-07-28  8:06 ` [Qemu-devel] [RFC 20/29] migration: wakeup dst ram-load-thread for recover Peter Xu
@ 2017-07-28  8:06 ` Peter Xu
  2017-08-03  9:49   ` Dr. David Alan Gilbert
  2017-07-28  8:06 ` [Qemu-devel] [RFC 22/29] migration: new message MIG_RP_MSG_RECV_BITMAP Peter Xu
                   ` (9 subsequent siblings)
  30 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-07-28  8:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Add a new VM command MIG_CMD_RECV_BITMAP to request the received
bitmap of one ramblock.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/savevm.c     | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++
 migration/savevm.h     |  1 +
 migration/trace-events |  1 +
 3 files changed, 61 insertions(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index 386788d..0ab13c0 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -78,6 +78,7 @@ enum qemu_vm_cmd {
                                       were previously sent during
                                       precopy but are dirty. */
     MIG_CMD_PACKAGED,          /* Send a wrapped stream within this stream */
+    MIG_CMD_RECV_BITMAP,       /* Request for recved bitmap on dst */
     MIG_CMD_MAX
 };
 
@@ -95,6 +96,7 @@ static struct mig_cmd_args {
     [MIG_CMD_POSTCOPY_RAM_DISCARD] = {
                                    .len = -1, .name = "POSTCOPY_RAM_DISCARD" },
     [MIG_CMD_PACKAGED]         = { .len =  4, .name = "PACKAGED" },
+    [MIG_CMD_RECV_BITMAP]      = { .len = -1, .name = "RECV_BITMAP" },
     [MIG_CMD_MAX]              = { .len = -1, .name = "MAX" },
 };
 
@@ -929,6 +931,19 @@ void qemu_savevm_send_postcopy_run(QEMUFile *f)
     qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_RUN, 0, NULL);
 }
 
+void qemu_savevm_send_recv_bitmap(QEMUFile *f, char *block_name)
+{
+    size_t len;
+    char buf[512];
+
+    trace_savevm_send_recv_bitmap(block_name);
+
+    buf[0] = len = strlen(block_name);
+    memcpy(buf + 1, block_name, len);
+
+    qemu_savevm_command_send(f, MIG_CMD_RECV_BITMAP, len + 1, (uint8_t *)buf);
+}
+
 bool qemu_savevm_state_blocked(Error **errp)
 {
     SaveStateEntry *se;
@@ -1705,6 +1720,47 @@ static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis)
 }
 
 /*
+ * Handle a request from the source for the recved_bitmap of a
+ * ramblock on the destination. Payload format:
+ *
+ * len (1 byte) + ramblock_name (<255 bytes)
+ */
+static int loadvm_handle_recv_bitmap(MigrationIncomingState *mis,
+                                     uint16_t len)
+{
+    QEMUFile *file = mis->from_src_file;
+    RAMBlock *rb;
+    char block_name[256];
+    size_t cnt;
+
+    cnt = qemu_get_counted_string(file, block_name);
+    if (!cnt) {
+        error_report("%s: failed to read block name", __func__);
+        return -EINVAL;
+    }
+
+    /* Validate before using the data */
+    if (qemu_file_get_error(file)) {
+        return qemu_file_get_error(file);
+    }
+
+    if (len != cnt + 1) {
+        error_report("%s: invalid payload length (%d)", __func__, len);
+        return -EINVAL;
+    }
+
+    rb = qemu_ram_block_by_name(block_name);
+    if (!rb) {
+        error_report("%s: block '%s' not found", __func__, block_name);
+        return -EINVAL;
+    }
+
+    /* TODO: send the bitmap back to source */
+
+    return 0;
+}
+
+/*
  * Process an incoming 'QEMU_VM_COMMAND'
  * 0           just a normal return
  * LOADVM_QUIT All good, but exit the loop
@@ -1777,6 +1833,9 @@ static int loadvm_process_command(QEMUFile *f)
 
     case MIG_CMD_POSTCOPY_RAM_DISCARD:
         return loadvm_postcopy_ram_handle_discard(mis, len);
+
+    case MIG_CMD_RECV_BITMAP:
+        return loadvm_handle_recv_bitmap(mis, len);
     }
 
     return 0;
diff --git a/migration/savevm.h b/migration/savevm.h
index 295c4a1..8126b1c 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -46,6 +46,7 @@ int qemu_savevm_send_packaged(QEMUFile *f, const uint8_t *buf, size_t len);
 void qemu_savevm_send_postcopy_advise(QEMUFile *f);
 void qemu_savevm_send_postcopy_listen(QEMUFile *f);
 void qemu_savevm_send_postcopy_run(QEMUFile *f);
+void qemu_savevm_send_recv_bitmap(QEMUFile *f, char *block_name);
 
 void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
                                            uint16_t len,
diff --git a/migration/trace-events b/migration/trace-events
index dbb4971..ca7b43f 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -34,6 +34,7 @@ savevm_send_open_return_path(void) ""
 savevm_send_ping(uint32_t val) "%x"
 savevm_send_postcopy_listen(void) ""
 savevm_send_postcopy_run(void) ""
+savevm_send_recv_bitmap(char *name) "%s"
 savevm_state_setup(void) ""
 savevm_state_header(void) ""
 savevm_state_iterate(void) ""
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [Qemu-devel] [RFC 22/29] migration: new message MIG_RP_MSG_RECV_BITMAP
  2017-07-28  8:06 [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
                   ` (20 preceding siblings ...)
  2017-07-28  8:06 ` [Qemu-devel] [RFC 21/29] migration: new cmd MIG_CMD_RECV_BITMAP Peter Xu
@ 2017-07-28  8:06 ` Peter Xu
  2017-08-03 10:50   ` Dr. David Alan Gilbert
  2017-07-28  8:06 ` [Qemu-devel] [RFC 23/29] migration: new cmd MIG_CMD_POSTCOPY_RESUME Peter Xu
                   ` (8 subsequent siblings)
  30 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-07-28  8:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Introduce the new return path message MIG_RP_MSG_RECV_BITMAP to send
the received bitmap of a ramblock back to the source.

This is the reply message to MIG_CMD_RECV_BITMAP. It contains not only
the header (including the ramblock name) but also the whole received
bitmap of that ramblock on the destination side.

When the source receives such a reply message (MIG_RP_MSG_RECV_BITMAP),
it parses it and converts it to the dirty bitmap by inverting the bits.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c  | 62 ++++++++++++++++++++++++++++++++++++++++++
 migration/migration.h  |  2 ++
 migration/ram.c        | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++
 migration/ram.h        |  2 ++
 migration/savevm.c     |  2 +-
 migration/trace-events |  2 ++
 6 files changed, 143 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index e498fa4..c2b85ac 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -92,6 +92,7 @@ enum mig_rp_message_type {
 
     MIG_RP_MSG_REQ_PAGES_ID, /* data (start: be64, len: be32, id: string) */
     MIG_RP_MSG_REQ_PAGES,    /* data (start: be64, len: be32) */
+    MIG_RP_MSG_RECV_BITMAP,  /* send recved_bitmap back to source */
 
     MIG_RP_MSG_MAX
 };
@@ -450,6 +451,39 @@ void migrate_send_rp_pong(MigrationIncomingState *mis,
     migrate_send_rp_message(mis, MIG_RP_MSG_PONG, sizeof(buf), &buf);
 }
 
+void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
+                                 char *block_name)
+{
+    char buf[512];
+    int len;
+    int64_t res;
+
+    /*
+     * First, we send the header part. It contains only the length
+     * of the idstr, and the idstr itself.
+     */
+    len = strlen(block_name);
+    buf[0] = len;
+    memcpy(buf + 1, block_name, len);
+
+    migrate_send_rp_message(mis, MIG_RP_MSG_RECV_BITMAP, len + 1, buf);
+
+    /*
+     * Next, we dump the received bitmap to the stream.
+     *
+     * TODO: currently we are safe since we are the only one that is
+     * using the to_src_file handle (fault thread is still paused),
+     * and it's OK even without taking the mutex. However, the best
+     * way would be to take the lock before sending the message
+     * header, and release it after sending the bitmap.
+     */
+    qemu_mutex_lock(&mis->rp_mutex);
+    res = ramblock_recv_bitmap_send(mis->to_src_file, block_name);
+    qemu_mutex_unlock(&mis->rp_mutex);
+
+    trace_migrate_send_rp_recv_bitmap(block_name, res);
+}
+
 MigrationCapabilityStatusList *qmp_query_migrate_capabilities(Error **errp)
 {
     MigrationCapabilityStatusList *head = NULL;
@@ -1560,6 +1594,7 @@ static struct rp_cmd_args {
     [MIG_RP_MSG_PONG]           = { .len =  4, .name = "PONG" },
     [MIG_RP_MSG_REQ_PAGES]      = { .len = 12, .name = "REQ_PAGES" },
     [MIG_RP_MSG_REQ_PAGES_ID]   = { .len = -1, .name = "REQ_PAGES_ID" },
+    [MIG_RP_MSG_RECV_BITMAP]    = { .len = -1, .name = "RECV_BITMAP" },
     [MIG_RP_MSG_MAX]            = { .len = -1, .name = "MAX" },
 };
 
@@ -1604,6 +1639,19 @@ static bool postcopy_pause_return_path_thread(MigrationState *s)
     return true;
 }
 
+static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name)
+{
+    RAMBlock *block = qemu_ram_block_by_name(block_name);
+
+    if (!block) {
+        error_report("%s: invalid block name '%s'", __func__, block_name);
+        return -EINVAL;
+    }
+
+    /* Fetch the received bitmap and refresh the dirty bitmap */
+    return ram_dirty_bitmap_reload(s, block);
+}
+
 /*
  * Handles messages sent on the return path towards the source VM
  *
@@ -1709,6 +1757,20 @@ retry:
             migrate_handle_rp_req_pages(ms, (char *)&buf[13], start, len);
             break;
 
+        case MIG_RP_MSG_RECV_BITMAP:
+            if (header_len < 1) {
+                error_report("%s: missing block name", __func__);
+                mark_source_rp_bad(ms);
+                goto out;
+            }
+            /* Format: len (1B) + idstr (<255B). This ends the idstr. */
+            buf[buf[0] + 1] = '\0';
+            if (migrate_handle_rp_recv_bitmap(ms, (char *)(buf + 1))) {
+                mark_source_rp_bad(ms);
+                goto out;
+            }
+            break;
+
         default:
             break;
         }
diff --git a/migration/migration.h b/migration/migration.h
index 574fedd..4d38308 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -204,5 +204,7 @@ void migrate_send_rp_pong(MigrationIncomingState *mis,
                           uint32_t value);
 int migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* rbname,
                               ram_addr_t start, size_t len);
+void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
+                                 char *block_name);
 
 #endif
diff --git a/migration/ram.c b/migration/ram.c
index 7f4cb0f..d543483 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -182,6 +182,32 @@ void ramblock_recv_bitmap_clear(void *host_addr, RAMBlock *rb)
 }
 
 /*
+ * Format: bitmap_size (8 bytes) + whole_bitmap (N bytes).
+ *
+ * Returns >0 if success with sent bytes, or <0 if error.
+ */
+int64_t ramblock_recv_bitmap_send(QEMUFile *file, char *block_name)
+{
+    RAMBlock *block = qemu_ram_block_by_name(block_name);
+    uint64_t size;
+
+    /* We should have made sure that the block exists */
+    assert(block);
+
+    /* Size of the bitmap, in bytes */
+    size = (block->max_length >> TARGET_PAGE_BITS) / 8;
+    qemu_put_be64(file, size);
+    qemu_put_buffer(file, (const uint8_t *)block->receivedmap, size);
+    qemu_fflush(file);
+
+    if (qemu_file_get_error(file)) {
+        return qemu_file_get_error(file);
+    }
+
+    return sizeof(size) + size;
+}
+
+/*
  * An outstanding page request, on the source, having been received
  * and queued
  */
@@ -2705,6 +2731,54 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
     return ret;
 }
 
+/*
+ * Read the received bitmap, revert it as the initial dirty bitmap.
+ * This is only used when the postcopy migration is paused but wants
+ * to resume from a middle point.
+ */
+int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
+{
+    QEMUFile *file = s->rp_state.from_dst_file;
+    uint64_t local_size = (block->max_length >> TARGET_PAGE_BITS) / 8;
+    uint64_t size;
+
+    if (s->state != MIGRATION_STATUS_POSTCOPY_RECOVER) {
+        error_report("%s: incorrect state %s", __func__,
+                     MigrationStatus_lookup[s->state]);
+        return -EINVAL;
+    }
+
+    size = qemu_get_be64(file);
+
+    /* The size of the bitmap should match with our ramblock */
+    if (size != local_size) {
+        error_report("%s: ramblock '%s' bitmap size mismatch "
+                     "(0x%lx != 0x%lx)", __func__, block->idstr,
+                     size, local_size);
+        return -EINVAL;
+    }
+
+    /*
+     * We are still during migration (though paused). The dirty bitmap
+     * won't change.  We can directly modify it.
+     */
+    size = qemu_get_buffer(file, (uint8_t *)block->bmap, local_size);
+
+    if (qemu_file_get_error(file)) {
+        return qemu_file_get_error(file);
+    }
+
+    /*
+     * What we received is "received bitmap". Revert it as the initial
+     * dirty bitmap for this ramblock.
+     */
+    bitmap_invert(block->bmap, block->max_length >> TARGET_PAGE_BITS);
+
+    trace_ram_dirty_bitmap_reload(block->idstr);
+
+    return 0;
+}
+
 static SaveVMHandlers savevm_ram_handlers = {
     .save_setup = ram_save_setup,
     .save_live_iterate = ram_save_iterate,
diff --git a/migration/ram.h b/migration/ram.h
index 84e8623..86eb973 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -58,5 +58,7 @@ void ramblock_recv_bitmap_set(void *host_addr, RAMBlock *rb);
 void ramblock_recv_bitmap_set_range(RAMBlock *rb, void *host_addr,
                                     size_t len);
 void ramblock_recv_bitmap_clear(void *host_addr, RAMBlock *rb);
+int64_t ramblock_recv_bitmap_send(QEMUFile *file, char *block_name);
+int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb);
 
 #endif
diff --git a/migration/savevm.c b/migration/savevm.c
index 0ab13c0..def9213 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1755,7 +1755,7 @@ static int loadvm_handle_recv_bitmap(MigrationIncomingState *mis,
         return -EINVAL;
     }
 
-    /* TODO: send the bitmap back to source */
+    migrate_send_rp_recv_bitmap(mis, block_name);
 
     return 0;
 }
diff --git a/migration/trace-events b/migration/trace-events
index ca7b43f..ed69551 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -77,6 +77,7 @@ ram_load_postcopy_loop(uint64_t addr, int flags) "@%" PRIx64 " %x"
 ram_postcopy_send_discard_bitmap(void) ""
 ram_save_page(const char *rbname, uint64_t offset, void *host) "%s: offset: %" PRIx64 " host: %p"
 ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: %zx len: %zx"
+ram_dirty_bitmap_reload(char *str) "%s"
 
 # migration/migration.c
 await_return_path_close_on_source_close(void) ""
@@ -88,6 +89,7 @@ migrate_fd_cancel(void) ""
 migrate_handle_rp_req_pages(const char *rbname, size_t start, size_t len) "in %s at %zx len %zx"
 migrate_pending(uint64_t size, uint64_t max, uint64_t post, uint64_t nonpost) "pending size %" PRIu64 " max %" PRIu64 " (post=%" PRIu64 " nonpost=%" PRIu64 ")"
 migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
+migrate_send_rp_recv_bitmap(char *name, int64_t size) "block '%s' size 0x%"PRIi64
 migration_completion_file_err(void) ""
 migration_completion_postcopy_end(void) ""
 migration_completion_postcopy_end_after_complete(void) ""
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [Qemu-devel] [RFC 23/29] migration: new cmd MIG_CMD_POSTCOPY_RESUME
  2017-07-28  8:06 [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
                   ` (21 preceding siblings ...)
  2017-07-28  8:06 ` [Qemu-devel] [RFC 22/29] migration: new message MIG_RP_MSG_RECV_BITMAP Peter Xu
@ 2017-07-28  8:06 ` Peter Xu
  2017-08-03 11:05   ` Dr. David Alan Gilbert
  2017-07-28  8:06 ` [Qemu-devel] [RFC 24/29] migration: new message MIG_RP_MSG_RESUME_ACK Peter Xu
                   ` (7 subsequent siblings)
  30 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-07-28  8:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Introduce this new command, sent when the source VM is ready to
resume the paused migration.  What the destination does on receipt is
basically release the fault thread so it can continue servicing page
faults.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/savevm.c     | 27 +++++++++++++++++++++++++++
 migration/savevm.h     |  1 +
 migration/trace-events |  1 +
 3 files changed, 29 insertions(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index def9213..2e330bc 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -77,6 +77,7 @@ enum qemu_vm_cmd {
     MIG_CMD_POSTCOPY_RAM_DISCARD,  /* A list of pages to discard that
                                       were previously sent during
                                       precopy but are dirty. */
+    MIG_CMD_POSTCOPY_RESUME,       /* resume postcopy on dest */
     MIG_CMD_PACKAGED,          /* Send a wrapped stream within this stream */
     MIG_CMD_RECV_BITMAP,       /* Request for recved bitmap on dst */
     MIG_CMD_MAX
@@ -95,6 +96,7 @@ static struct mig_cmd_args {
     [MIG_CMD_POSTCOPY_RUN]     = { .len =  0, .name = "POSTCOPY_RUN" },
     [MIG_CMD_POSTCOPY_RAM_DISCARD] = {
                                    .len = -1, .name = "POSTCOPY_RAM_DISCARD" },
+    [MIG_CMD_POSTCOPY_RESUME]  = { .len =  0, .name = "POSTCOPY_RESUME" },
     [MIG_CMD_PACKAGED]         = { .len =  4, .name = "PACKAGED" },
     [MIG_CMD_RECV_BITMAP]      = { .len = -1, .name = "RECV_BITMAP" },
     [MIG_CMD_MAX]              = { .len = -1, .name = "MAX" },
@@ -931,6 +933,12 @@ void qemu_savevm_send_postcopy_run(QEMUFile *f)
     qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_RUN, 0, NULL);
 }
 
+void qemu_savevm_send_postcopy_resume(QEMUFile *f)
+{
+    trace_savevm_send_postcopy_resume();
+    qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_RESUME, 0, NULL);
+}
+
 void qemu_savevm_send_recv_bitmap(QEMUFile *f, char *block_name)
 {
     size_t len;
@@ -1671,6 +1679,22 @@ static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
     return LOADVM_QUIT;
 }
 
+static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
+{
+    /*
+     * This means source VM is ready to resume the postcopy migration.
+     * It's time to switch state and release the fault thread to
+     * continue service page faults.
+     */
+    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_RECOVER,
+                      MIGRATION_STATUS_POSTCOPY_ACTIVE);
+    qemu_sem_post(&mis->postcopy_pause_sem_fault);
+
+    /* TODO: Tell source that "we are ready" */
+
+    return 0;
+}
+
 /**
  * Immediately following this command is a blob of data containing an embedded
  * chunk of migration stream; read it and load it.
@@ -1834,6 +1858,9 @@ static int loadvm_process_command(QEMUFile *f)
     case MIG_CMD_POSTCOPY_RAM_DISCARD:
         return loadvm_postcopy_ram_handle_discard(mis, len);
 
+    case MIG_CMD_POSTCOPY_RESUME:
+        return loadvm_postcopy_handle_resume(mis);
+
     case MIG_CMD_RECV_BITMAP:
         return loadvm_handle_recv_bitmap(mis, len);
     }
diff --git a/migration/savevm.h b/migration/savevm.h
index 8126b1c..a5f3879 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -46,6 +46,7 @@ int qemu_savevm_send_packaged(QEMUFile *f, const uint8_t *buf, size_t len);
 void qemu_savevm_send_postcopy_advise(QEMUFile *f);
 void qemu_savevm_send_postcopy_listen(QEMUFile *f);
 void qemu_savevm_send_postcopy_run(QEMUFile *f);
+void qemu_savevm_send_postcopy_resume(QEMUFile *f);
 void qemu_savevm_send_recv_bitmap(QEMUFile *f, char *block_name);
 
 void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
diff --git a/migration/trace-events b/migration/trace-events
index ed69551..04dd9d8 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -34,6 +34,7 @@ savevm_send_open_return_path(void) ""
 savevm_send_ping(uint32_t val) "%x"
 savevm_send_postcopy_listen(void) ""
 savevm_send_postcopy_run(void) ""
+savevm_send_postcopy_resume(void) ""
 savevm_send_recv_bitmap(char *name) "%s"
 savevm_state_setup(void) ""
 savevm_state_header(void) ""
-- 
2.7.4

* [Qemu-devel] [RFC 24/29] migration: new message MIG_RP_MSG_RESUME_ACK
  2017-07-28  8:06 [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
                   ` (22 preceding siblings ...)
  2017-07-28  8:06 ` [Qemu-devel] [RFC 23/29] migration: new cmd MIG_CMD_POSTCOPY_RESUME Peter Xu
@ 2017-07-28  8:06 ` Peter Xu
  2017-08-03 11:21   ` Dr. David Alan Gilbert
  2017-07-28  8:06 ` [Qemu-devel] [RFC 25/29] migration: introduce SaveVMHandlers.resume_prepare Peter Xu
                   ` (6 subsequent siblings)
  30 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-07-28  8:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Create a new message to reply to MIG_CMD_POSTCOPY_RESUME.  A single
uint32_t payload lets the source know whether the destination is
ready to continue the migration.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c  | 37 +++++++++++++++++++++++++++++++++++++
 migration/migration.h  |  1 +
 migration/savevm.c     |  3 ++-
 migration/trace-events |  1 +
 4 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index c2b85ac..62f91ce 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -93,6 +93,7 @@ enum mig_rp_message_type {
     MIG_RP_MSG_REQ_PAGES_ID, /* data (start: be64, len: be32, id: string) */
     MIG_RP_MSG_REQ_PAGES,    /* data (start: be64, len: be32) */
     MIG_RP_MSG_RECV_BITMAP,  /* send recved_bitmap back to source */
+    MIG_RP_MSG_RESUME_ACK,   /* tell source that we are ready to resume */
 
     MIG_RP_MSG_MAX
 };
@@ -484,6 +485,14 @@ void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
     trace_migrate_send_rp_recv_bitmap(block_name, res);
 }
 
+void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value)
+{
+    uint32_t buf;
+
+    buf = cpu_to_be32(value);
+    migrate_send_rp_message(mis, MIG_RP_MSG_RESUME_ACK, sizeof(buf), &buf);
+}
+
 MigrationCapabilityStatusList *qmp_query_migrate_capabilities(Error **errp)
 {
     MigrationCapabilityStatusList *head = NULL;
@@ -1595,6 +1604,7 @@ static struct rp_cmd_args {
     [MIG_RP_MSG_REQ_PAGES]      = { .len = 12, .name = "REQ_PAGES" },
     [MIG_RP_MSG_REQ_PAGES_ID]   = { .len = -1, .name = "REQ_PAGES_ID" },
     [MIG_RP_MSG_RECV_BITMAP]    = { .len = -1, .name = "RECV_BITMAP" },
+    [MIG_RP_MSG_RESUME_ACK]     = { .len =  4, .name = "RESUME_ACK" },
     [MIG_RP_MSG_MAX]            = { .len = -1, .name = "MAX" },
 };
 
@@ -1652,6 +1662,25 @@ static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name)
     return ram_dirty_bitmap_reload(s, block);
 }
 
+static int migrate_handle_rp_resume_ack(MigrationState *s, uint32_t value)
+{
+    trace_source_return_path_thread_resume_ack(value);
+
+    /*
+     * Currently value will always be one. It can be used in the
+     * future to notify source that destination cannot continue.
+     */
+    assert(value == 1);
+
+    /* Now both sides are active. */
+    migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_RECOVER,
+                      MIGRATION_STATUS_POSTCOPY_ACTIVE);
+
+    /* TODO: notify send thread that time to continue send pages */
+
+    return 0;
+}
+
 /*
  * Handles messages sent on the return path towards the source VM
  *
@@ -1771,6 +1800,14 @@ retry:
             }
             break;
 
+        case MIG_RP_MSG_RESUME_ACK:
+            tmp32 = ldl_be_p(buf);
+            if (migrate_handle_rp_resume_ack(ms, tmp32)) {
+                mark_source_rp_bad(ms);
+                goto out;
+            }
+            break;
+
         default:
             break;
         }
diff --git a/migration/migration.h b/migration/migration.h
index 4d38308..2a3f905 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -206,5 +206,6 @@ int migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* rbname,
                               ram_addr_t start, size_t len);
 void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
                                  char *block_name);
+void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value);
 
 #endif
diff --git a/migration/savevm.c b/migration/savevm.c
index 2e330bc..02a67ac 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1690,7 +1690,8 @@ static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
                       MIGRATION_STATUS_POSTCOPY_ACTIVE);
     qemu_sem_post(&mis->postcopy_pause_sem_fault);
 
-    /* TODO: Tell source that "we are ready" */
+    /* Tell source that "we are ready" */
+    migrate_send_rp_resume_ack(mis, 1);
 
     return 0;
 }
diff --git a/migration/trace-events b/migration/trace-events
index 04dd9d8..0b43fec 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -116,6 +116,7 @@ source_return_path_thread_entry(void) ""
 source_return_path_thread_loop_top(void) ""
 source_return_path_thread_pong(uint32_t val) "%x"
 source_return_path_thread_shut(uint32_t val) "%x"
+source_return_path_thread_resume_ack(uint32_t v) "%"PRIu32
 migrate_global_state_post_load(const char *state) "loaded state: %s"
 migrate_global_state_pre_save(const char *state) "saved state: %s"
 migration_thread_low_pending(uint64_t pending) "%" PRIu64
-- 
2.7.4

* [Qemu-devel] [RFC 25/29] migration: introduce SaveVMHandlers.resume_prepare
  2017-07-28  8:06 [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
                   ` (23 preceding siblings ...)
  2017-07-28  8:06 ` [Qemu-devel] [RFC 24/29] migration: new message MIG_RP_MSG_RESUME_ACK Peter Xu
@ 2017-07-28  8:06 ` Peter Xu
  2017-08-03 11:38   ` Dr. David Alan Gilbert
  2017-07-28  8:06 ` [Qemu-devel] [RFC 26/29] migration: synchronize dirty bitmap for resume Peter Xu
                   ` (5 subsequent siblings)
  30 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-07-28  8:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

This is a hook function to be called when a postcopy migration wants
to resume from a failure.  Each module should provide its own
recovery logic here before we switch to the postcopy-active state.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/migration/register.h |  2 ++
 migration/migration.c        | 20 +++++++++++++++++++-
 migration/savevm.c           | 25 +++++++++++++++++++++++++
 migration/savevm.h           |  1 +
 migration/trace-events       |  1 +
 5 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/include/migration/register.h b/include/migration/register.h
index a0f1edd..b669362 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -41,6 +41,8 @@ typedef struct SaveVMHandlers {
     LoadStateHandler *load_state;
     int (*load_setup)(QEMUFile *f, void *opaque);
     int (*load_cleanup)(void *opaque);
+    /* Called when postcopy migration wants to resume from failure */
+    int (*resume_prepare)(MigrationState *s, void *opaque);
 } SaveVMHandlers;
 
 int register_savevm_live(DeviceState *dev,
diff --git a/migration/migration.c b/migration/migration.c
index 62f91ce..6cb0ad3 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2150,7 +2150,25 @@ static bool postcopy_should_start(MigrationState *s)
 /* Return zero if success, or <0 for error */
 static int postcopy_do_resume(MigrationState *s)
 {
-    /* TODO: do the resume logic */
+    int ret;
+
+    /*
+     * Call all the resume_prepare() hooks, so that modules can be
+     * ready for the migration resume.
+     */
+    ret = qemu_savevm_state_resume_prepare(s);
+    if (ret) {
+        error_report("%s: resume_prepare() failure detected: %d",
+                     __func__, ret);
+        return ret;
+    }
+
+    /*
+     * TODO: handshake with dest using MIG_CMD_RESUME,
+     * MIG_RP_MSG_RESUME_ACK, then switch source state to
+     * "postcopy-active"
+     */
+
     return 0;
 }
 
diff --git a/migration/savevm.c b/migration/savevm.c
index 02a67ac..08a4712 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1004,6 +1004,31 @@ void qemu_savevm_state_setup(QEMUFile *f)
     }
 }
 
+int qemu_savevm_state_resume_prepare(MigrationState *s)
+{
+    SaveStateEntry *se;
+    int ret;
+
+    trace_savevm_state_resume_prepare();
+
+    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+        if (!se->ops || !se->ops->resume_prepare) {
+            continue;
+        }
+        if (se->ops && se->ops->is_active) {
+            if (!se->ops->is_active(se->opaque)) {
+                continue;
+            }
+        }
+        ret = se->ops->resume_prepare(s, se->opaque);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
+    return 0;
+}
+
 /*
  * this function has three return values:
  *   negative: there was one error, and we have -errno.
diff --git a/migration/savevm.h b/migration/savevm.h
index a5f3879..3193f04 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -31,6 +31,7 @@
 
 bool qemu_savevm_state_blocked(Error **errp);
 void qemu_savevm_state_setup(QEMUFile *f);
+int qemu_savevm_state_resume_prepare(MigrationState *s);
 void qemu_savevm_state_header(QEMUFile *f);
 int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy);
 void qemu_savevm_state_cleanup(void);
diff --git a/migration/trace-events b/migration/trace-events
index 0b43fec..0fb2d1e 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -37,6 +37,7 @@ savevm_send_postcopy_run(void) ""
 savevm_send_postcopy_resume(void) ""
 savevm_send_recv_bitmap(char *name) "%s"
 savevm_state_setup(void) ""
+savevm_state_resume_prepare(void) ""
 savevm_state_header(void) ""
 savevm_state_iterate(void) ""
 savevm_state_cleanup(void) ""
-- 
2.7.4

* [Qemu-devel] [RFC 26/29] migration: synchronize dirty bitmap for resume
  2017-07-28  8:06 [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
                   ` (24 preceding siblings ...)
  2017-07-28  8:06 ` [Qemu-devel] [RFC 25/29] migration: introduce SaveVMHandlers.resume_prepare Peter Xu
@ 2017-07-28  8:06 ` Peter Xu
  2017-08-03 11:56   ` Dr. David Alan Gilbert
  2017-07-28  8:06 ` [Qemu-devel] [RFC 27/29] migration: setup ramstate " Peter Xu
                   ` (4 subsequent siblings)
  30 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-07-28  8:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

This patch implements the first part of core RAM resume logic for
postcopy. ram_resume_prepare() is provided for the work.

When the migration is interrupted by a network failure, the dirty
bitmap on the source side becomes meaningless: even if a dirty bit
has been cleared, the page that was sent may still have been lost on
its way to the destination.  So instead of continuing the migration
with the stale dirty bitmap on the source, we ask the destination to
send back its received bitmap, then invert it to form our initial
dirty bitmap.

The source send thread issues the MIG_CMD_RECV_BITMAP requests, once
per ramblock, to ask for the received bitmap.  On the destination
side, MIG_RP_MSG_RECV_BITMAP is issued in reply, along with the
requested bitmap.  The data is received on the source's return-path
thread, and the main migration thread is notified when all the
ramblock bitmaps are synchronized.

One issue to solve here is how to synchronize the source send thread
with the return-path thread.  A semaphore cannot really work here,
since we cannot guarantee the order of wait/post (the reply may come
back so quickly that it arrives before the send thread even starts to
wait).  So a condition variable is used instead to make sure the
ordering is always correct.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c  |  4 +++
 migration/migration.h  |  4 +++
 migration/ram.c        | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++
 migration/trace-events |  1 +
 4 files changed, 77 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 6cb0ad3..93fbc96 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1093,6 +1093,8 @@ static void migrate_fd_cleanup(void *opaque)
 
     qemu_sem_destroy(&s->postcopy_pause_sem);
     qemu_sem_destroy(&s->postcopy_pause_rp_sem);
+    qemu_mutex_destroy(&s->resume_lock);
+    qemu_cond_destroy(&s->resume_cond);
 }
 
 void migrate_fd_error(MigrationState *s, const Error *error)
@@ -1238,6 +1240,8 @@ MigrationState *migrate_init(void)
     s->error = NULL;
     qemu_sem_init(&s->postcopy_pause_sem, 0);
     qemu_sem_init(&s->postcopy_pause_rp_sem, 0);
+    qemu_mutex_init(&s->resume_lock);
+    qemu_cond_init(&s->resume_cond);
 
     migrate_set_state(&s->state, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP);
 
diff --git a/migration/migration.h b/migration/migration.h
index 2a3f905..c270f4c 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -159,6 +159,10 @@ struct MigrationState
     /* Needed by postcopy-pause state */
     QemuSemaphore postcopy_pause_sem;
     QemuSemaphore postcopy_pause_rp_sem;
+
+    /* Used to sync-up between main send thread and rp-thread */
+    QemuMutex resume_lock;
+    QemuCond resume_cond;
 };
 
 void migrate_set_state(int *state, int old_state, int new_state);
diff --git a/migration/ram.c b/migration/ram.c
index d543483..c695b13 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -46,6 +46,7 @@
 #include "exec/ram_addr.h"
 #include "qemu/rcu_queue.h"
 #include "migration/colo.h"
+#include "savevm.h"
 
 /***********************************************************/
 /* ram save/restore */
@@ -256,6 +257,8 @@ struct RAMState {
     RAMBlock *last_req_rb;
     /* Queue of outstanding page requests from the destination */
     QemuMutex src_page_req_mutex;
+    /* Ramblock counts to sync dirty bitmap. Only used for recovery */
+    int ramblock_to_sync;
     QSIMPLEQ_HEAD(src_page_requests, RAMSrcPageRequest) src_page_requests;
 };
 typedef struct RAMState RAMState;
@@ -2731,6 +2734,57 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
     return ret;
 }
 
+/* Sync all the dirty bitmap with destination VM.  */
+static int ram_dirty_bitmap_sync_all(MigrationState *s, RAMState *rs)
+{
+    RAMBlock *block;
+    QEMUFile *file = s->to_dst_file;
+    int ramblock_count = 0;
+
+    trace_ram_dirty_bitmap_sync("start");
+
+    /*
+     * We need to take the resume lock to make sure that the send
+     * thread (current thread) and the rp-thread will do their work in
+     * order.
+     */
+    qemu_mutex_lock(&s->resume_lock);
+
+    /* Request for receive-bitmap for each block */
+    RAMBLOCK_FOREACH(block) {
+        ramblock_count++;
+        qemu_savevm_send_recv_bitmap(file, block->idstr);
+    }
+
+    /* Init the ramblock count to total */
+    atomic_set(&rs->ramblock_to_sync, ramblock_count);
+
+    trace_ram_dirty_bitmap_sync("wait-bitmap");
+
+    /* Wait until all the ramblocks' dirty bitmap synced */
+    while (rs->ramblock_to_sync) {
+        qemu_cond_wait(&s->resume_cond, &s->resume_lock);
+    }
+
+    trace_ram_dirty_bitmap_sync("completed");
+
+    qemu_mutex_unlock(&s->resume_lock);
+
+    return 0;
+}
+
+static void ram_dirty_bitmap_reload_notify(MigrationState *s)
+{
+    qemu_mutex_lock(&s->resume_lock);
+    atomic_dec(&ram_state->ramblock_to_sync);
+    if (ram_state->ramblock_to_sync == 0) {
+        /* Make sure the other thread gets the latest */
+        trace_ram_dirty_bitmap_sync("notify-send");
+        qemu_cond_signal(&s->resume_cond);
+    }
+    qemu_mutex_unlock(&s->resume_lock);
+}
+
 /*
  * Read the received bitmap, revert it as the initial dirty bitmap.
  * This is only used when the postcopy migration is paused but wants
@@ -2776,9 +2830,22 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
 
     trace_ram_dirty_bitmap_reload(block->idstr);
 
+    /*
+     * We succeeded to sync bitmap for current ramblock. If this is
+     * the last one to sync, we need to notify the main send thread.
+     */
+    ram_dirty_bitmap_reload_notify(s);
+
     return 0;
 }
 
+static int ram_resume_prepare(MigrationState *s, void *opaque)
+{
+    RAMState *rs = *(RAMState **)opaque;
+
+    return ram_dirty_bitmap_sync_all(s, rs);
+}
+
 static SaveVMHandlers savevm_ram_handlers = {
     .save_setup = ram_save_setup,
     .save_live_iterate = ram_save_iterate,
@@ -2789,6 +2856,7 @@ static SaveVMHandlers savevm_ram_handlers = {
     .save_cleanup = ram_save_cleanup,
     .load_setup = ram_load_setup,
     .load_cleanup = ram_load_cleanup,
+    .resume_prepare = ram_resume_prepare,
 };
 
 void ram_mig_init(void)
diff --git a/migration/trace-events b/migration/trace-events
index 0fb2d1e..15ff1bf 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -80,6 +80,7 @@ ram_postcopy_send_discard_bitmap(void) ""
 ram_save_page(const char *rbname, uint64_t offset, void *host) "%s: offset: %" PRIx64 " host: %p"
 ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: %zx len: %zx"
 ram_dirty_bitmap_reload(char *str) "%s"
+ram_dirty_bitmap_sync(const char *str) "%s"
 
 # migration/migration.c
 await_return_path_close_on_source_close(void) ""
-- 
2.7.4

* [Qemu-devel] [RFC 27/29] migration: setup ramstate for resume
  2017-07-28  8:06 [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
                   ` (25 preceding siblings ...)
  2017-07-28  8:06 ` [Qemu-devel] [RFC 26/29] migration: synchronize dirty bitmap for resume Peter Xu
@ 2017-07-28  8:06 ` Peter Xu
  2017-08-03 12:37   ` Dr. David Alan Gilbert
  2017-07-28  8:06 ` [Qemu-devel] [RFC 28/29] migration: final handshake for the resume Peter Xu
                   ` (3 subsequent siblings)
  30 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-07-28  8:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

After updating the dirty bitmaps of the ramblocks, we also need to
update the critical fields in RAMState to make sure it is ready for a
resume.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/ram.c | 35 ++++++++++++++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/migration/ram.c b/migration/ram.c
index c695b13..427bf6e 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1947,6 +1947,31 @@ static int ram_state_init(RAMState **rsp)
     return 0;
 }
 
+static void ram_state_resume_prepare(RAMState *rs)
+{
+    RAMBlock *block;
+    long pages = 0;
+
+    /*
+     * Postcopy is not using xbzrle/compression, so no need for that.
+     * Also, since source are already halted, we don't need to care
+     * about dirty page logging as well.
+     */
+
+    RAMBLOCK_FOREACH(block) {
+        pages += bitmap_count_one(block->bmap,
+                                  block->max_length >> TARGET_PAGE_BITS);
+    }
+
+    /* This may not be aligned with current bitmaps. Recalculate. */
+    rs->migration_dirty_pages = pages;
+
+    rs->last_seen_block = NULL;
+    rs->last_sent_block = NULL;
+    rs->last_page = 0;
+    rs->last_version = ram_list.version;
+}
+
 /*
  * Each of ram_save_setup, ram_save_iterate and ram_save_complete has
  * long-running RCU critical section.  When rcu-reclaims in the code
@@ -2842,8 +2867,16 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
 static int ram_resume_prepare(MigrationState *s, void *opaque)
 {
     RAMState *rs = *(RAMState **)opaque;
+    int ret;
 
-    return ram_dirty_bitmap_sync_all(s, rs);
+    ret = ram_dirty_bitmap_sync_all(s, rs);
+    if (ret) {
+        return ret;
+    }
+
+    ram_state_resume_prepare(rs);
+
+    return 0;
 }
 
 static SaveVMHandlers savevm_ram_handlers = {
-- 
2.7.4

* [Qemu-devel] [RFC 28/29] migration: final handshake for the resume
  2017-07-28  8:06 [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
                   ` (26 preceding siblings ...)
  2017-07-28  8:06 ` [Qemu-devel] [RFC 27/29] migration: setup ramstate " Peter Xu
@ 2017-07-28  8:06 ` Peter Xu
  2017-08-03 13:47   ` Dr. David Alan Gilbert
  2017-07-28  8:06 ` [Qemu-devel] [RFC 29/29] migration: reset migrate thread vars when resumed Peter Xu
                   ` (2 subsequent siblings)
  30 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-07-28  8:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Finish the last step of the handshake for the recovery.

First, the source sends one MIG_CMD_RESUME to the destination,
telling it that the source is ready to resume.

Then, the destination replies with MIG_RP_MSG_RESUME_ACK, telling the
source that the destination is ready to resume (after switching to
the postcopy-active state).

When the source receives the RESUME_ACK, it switches its state to
postcopy-active as well, and the recovery is complete.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 39 +++++++++++++++++++++++++++++++++++----
 1 file changed, 35 insertions(+), 4 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 93fbc96..ecebe30 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1666,6 +1666,13 @@ static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name)
     return ram_dirty_bitmap_reload(s, block);
 }
 
+static void postcopy_resume_handshake_ack(MigrationState *s)
+{
+    qemu_mutex_lock(&s->resume_lock);
+    qemu_cond_signal(&s->resume_cond);
+    qemu_mutex_unlock(&s->resume_lock);
+}
+
 static int migrate_handle_rp_resume_ack(MigrationState *s, uint32_t value)
 {
     trace_source_return_path_thread_resume_ack(value);
@@ -1680,7 +1687,8 @@ static int migrate_handle_rp_resume_ack(MigrationState *s, uint32_t value)
     migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_RECOVER,
                       MIGRATION_STATUS_POSTCOPY_ACTIVE);
 
-    /* TODO: notify send thread that time to continue send pages */
+    /* Notify the send thread that it's time to continue sending pages */
+    postcopy_resume_handshake_ack(s);
 
     return 0;
 }
@@ -2151,6 +2159,25 @@ static bool postcopy_should_start(MigrationState *s)
     return atomic_read(&s->start_postcopy) || s->start_postcopy_fast;
 }
 
+static int postcopy_resume_handshake(MigrationState *s)
+{
+    qemu_mutex_lock(&s->resume_lock);
+
+    qemu_savevm_send_postcopy_resume(s->to_dst_file);
+
+    while (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
+        qemu_cond_wait(&s->resume_cond, &s->resume_lock);
+    }
+
+    qemu_mutex_unlock(&s->resume_lock);
+
+    if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        return 0;
+    }
+
+    return -1;
+}
+
 /* Return zero if success, or <0 for error */
 static int postcopy_do_resume(MigrationState *s)
 {
@@ -2168,10 +2195,14 @@ static int postcopy_do_resume(MigrationState *s)
     }
 
     /*
-     * TODO: handshake with dest using MIG_CMD_RESUME,
-     * MIG_RP_MSG_RESUME_ACK, then switch source state to
-     * "postcopy-active"
+     * Last handshake with destination on the resume (destination will
+     * switch to postcopy-active afterwards)
      */
+    ret = postcopy_resume_handshake(s);
+    if (ret) {
+        error_report("%s: handshake failed: %d", __func__, ret);
+        return ret;
+    }
 
     return 0;
 }
-- 
2.7.4


* [Qemu-devel] [RFC 29/29] migration: reset migrate thread vars when resumed
  2017-07-28  8:06 [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
                   ` (27 preceding siblings ...)
  2017-07-28  8:06 ` [Qemu-devel] [RFC 28/29] migration: final handshake for the resume Peter Xu
@ 2017-07-28  8:06 ` Peter Xu
  2017-08-03 13:54   ` Dr. David Alan Gilbert
  2017-07-28 10:06 ` [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
  2017-08-03 15:57 ` Dr. David Alan Gilbert
  30 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-07-28  8:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

First, a MigThrError enumeration is introduced to better describe the
errors in migration_detect_error(). This gives migration_thread() a
chance to know whether a recovery has happened.

Then, if a recovery is detected, migration_thread() resets its local
variables to prepare for continuing the migration.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 40 +++++++++++++++++++++++++++++-----------
 1 file changed, 29 insertions(+), 11 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index ecebe30..439bc22 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2159,6 +2159,15 @@ static bool postcopy_should_start(MigrationState *s)
     return atomic_read(&s->start_postcopy) || s->start_postcopy_fast;
 }
 
+typedef enum MigThrError {
+    /* No error detected */
+    MIG_THR_ERR_NONE = 0,
+    /* Detected error, but resumed successfully */
+    MIG_THR_ERR_RECOVERED = 1,
+    /* Detected fatal error, need to exit */
+    MIG_THR_ERR_FATAL = 2,
+} MigThrError;
+
 static int postcopy_resume_handshake(MigrationState *s)
 {
     qemu_mutex_lock(&s->resume_lock);
@@ -2209,10 +2218,10 @@ static int postcopy_do_resume(MigrationState *s)
 
 /*
  * We don't return until we are in a safe state to continue current
- * postcopy migration.  Returns true to continue the migration, or
- * false to terminate current migration.
+ * postcopy migration.  Returns MIG_THR_ERR_RECOVERED if recovered, or
+ * MIG_THR_ERR_FATAL if an unrecoverable failure happened.
  */
-static bool postcopy_pause(MigrationState *s)
+static MigThrError postcopy_pause(MigrationState *s)
 {
     assert(s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
 
@@ -2247,7 +2256,7 @@ do_pause:
         if (postcopy_do_resume(s) == 0) {
             /* Let's continue! */
             trace_postcopy_pause_continued();
-            return true;
+            return MIG_THR_ERR_RECOVERED;
         } else {
             /*
              * Something wrong happened during the recovery, let's
@@ -2258,12 +2267,11 @@ do_pause:
         }
     } else {
         /* This is not right... Time to quit. */
-        return false;
+        return MIG_THR_ERR_FATAL;
     }
 }
 
-/* Return true if we want to stop the migration, otherwise false. */
-static bool migration_detect_error(MigrationState *s)
+static MigThrError migration_detect_error(MigrationState *s)
 {
     int ret;
 
@@ -2272,7 +2280,7 @@ static bool migration_detect_error(MigrationState *s)
 
     if (!ret) {
         /* Everything is fine */
-        return false;
+        return MIG_THR_ERR_NONE;
     }
 
     if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE && ret == -EIO) {
@@ -2281,7 +2289,7 @@ static bool migration_detect_error(MigrationState *s)
          * while. After that, it can be continued by a
          * recovery phase.
          */
-        return !postcopy_pause(s);
+        return postcopy_pause(s);
     } else {
         /*
          * For precopy (or postcopy with error outside IO), we fail
@@ -2291,7 +2299,7 @@ static bool migration_detect_error(MigrationState *s)
         trace_migration_thread_file_err();
 
         /* Time to stop the migration, now. */
-        return true;
+        return MIG_THR_ERR_FATAL;
     }
 }
 
@@ -2319,6 +2327,7 @@ static void *migration_thread(void *opaque)
     /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */
     enum MigrationStatus current_active_state = MIGRATION_STATUS_ACTIVE;
     bool enable_colo = migrate_colo_enabled();
+    MigThrError thr_error;
 
     rcu_register_thread();
 
@@ -2395,8 +2404,17 @@ static void *migration_thread(void *opaque)
          * Try to detect any kind of failures, and see whether we
          * should stop the migration now.
          */
-        if (migration_detect_error(s)) {
+        thr_error = migration_detect_error(s);
+        if (thr_error == MIG_THR_ERR_FATAL) {
+            /* Stop migration */
             break;
+        } else if (thr_error == MIG_THR_ERR_RECOVERED) {
+            /*
+             * Just recovered from an error (e.g. a network failure); reset all
+             * the local variables.
+             */
+            initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+            initial_bytes = 0;
         }
 
         current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
-- 
2.7.4


* Re: [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery
  2017-07-28  8:06 [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
                   ` (28 preceding siblings ...)
  2017-07-28  8:06 ` [Qemu-devel] [RFC 29/29] migration: reset migrate thread vars when resumed Peter Xu
@ 2017-07-28 10:06 ` Peter Xu
  2017-08-03 15:57 ` Dr. David Alan Gilbert
  30 siblings, 0 replies; 116+ messages in thread
From: Peter Xu @ 2017-07-28 10:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert

On Fri, Jul 28, 2017 at 04:06:09PM +0800, Peter Xu wrote:
> As we all know, postcopy migration carries the risk of losing the
> VM if the network breaks during the migration. This series
> tries to solve the problem by allowing the migration to pause at the
> failure point, and to recover after the link is reconnected.
> 
> There was existing work on this issue from Md Haris Iqbal:
> 
> https://lists.nongnu.org/archive/html/qemu-devel/2016-08/msg03468.html
> 
> This series is a totally re-work of the issue, based on Alexey
> Perevalov's recved bitmap v8 series:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg06401.html
> 
> Two new states are added to support the migration (used on both
> sides):
> 
>   MIGRATION_STATUS_POSTCOPY_PAUSED
>   MIGRATION_STATUS_POSTCOPY_RECOVER
> 
> The MIGRATION_STATUS_POSTCOPY_PAUSED state will be set when a
> network failure is detected. It is a phase we may stay in for a long
> time after the failure is detected, and we'll remain there until a
> recovery is triggered.  In this state, all the threads (on source:
> send thread, return-path thread; destination: ram-load thread,
> page-fault thread) will be halted.
> 
> The MIGRATION_STATUS_POSTCOPY_RECOVER state is short. If we trigger
> a recovery, both the source and destination VMs jump into this stage and do
> whatever is needed to prepare the recovery (e.g., currently the most
> important thing is to synchronize the dirty bitmap; please see the commit
> messages for more information). After the preparation is done, the
> source does the final handshake with the destination, then both sides
> switch back to MIGRATION_STATUS_POSTCOPY_ACTIVE again.
> 
> New commands/messages are defined as well to satisfy the need:
> 
> MIG_CMD_RECV_BITMAP & MIG_RP_MSG_RECV_BITMAP are introduced for
> delivering received bitmaps
> 
> MIG_CMD_RESUME & MIG_RP_MSG_RESUME_ACK are introduced to do the final
> handshake of postcopy recovery.
> 
> Here are some more details on how the whole failure/recovery routine
> happens:
> 
> - start migration
> - ... (switch from precopy to postcopy)
> - both sides are in "postcopy-active" state
> - ... (failure happened, e.g., network unplugged)
> - both sides switch to "postcopy-paused" state
>   - all the migration threads are stopped on both sides
> - ... (both VMs hanged)
> - ... (user triggers recovery using "migrate -r -d tcp:HOST:PORT" on
>   source side, "-r" means "recover")
> - both sides switch to "postcopy-recover" state
>   - on source: send-thread and return-path-thread will be woken up
>   - on dest: ram-load-thread woken up, fault-thread still paused
> - source calls new savevmhandler hook resume_prepare() (currently,
>   only ram is providing the hook):
>   - ram_resume_prepare(): for each ramblock, fetch recved bitmap by:
>     - src sends MIG_CMD_RECV_BITMAP to dst
>     - dst replies MIG_RP_MSG_RECV_BITMAP to src, with bitmap data
>       - src uses the recved bitmap to rebuild dirty bitmap
> - source does the final handshake with destination
>   - src sends MIG_CMD_RESUME to dst, telling "src is ready"
>     - when dst receives the command, the fault thread will be woken up,
>       meanwhile, dst switches back to "postcopy-active"
>   - dst sends MIG_RP_MSG_RESUME_ACK to src, telling "dst is ready"
>     - when src receives the ack, state switch to "postcopy-active"
> - postcopy migration continued
> 
> Testing:
> 
> As I said, it's still an extremely simple test. I used socat to create
> a socket bridge:
> 
>   socat tcp-listen:6666 tcp-connect:localhost:5555 &
> 
> Then do the migration via the bridge. I emulated the network failure
> by killing the socat process (bridge down), then tried to recover the
> migration using the other channel (the default dst channel). It looks
> like:
> 
>         port:6666    +------------------+
>         +----------> | socat bridge [1] |-------+
>         |            +------------------+       |
>         |         (Original channel)            |
>         |                                       | port: 5555
>      +---------+  (Recovery channel)            +--->+---------+
>      | src VM  |------------------------------------>| dst VM  |
>      +---------+                                     +---------+
> 
> Known issues/notes:
> 
> - currently the destination listening port cannot change. I.e., the
>   recovery has to use the same port on the destination, for
>   simplicity. (on the source, we can specify a new URL)
> 
> - the patch: "migration: let dst listen on port always" is still
>   hacky; it just keeps the incoming accept open forever for now...
> 
> - some migration numbers might still be inaccurate, like total
>   migration time, etc. (But I don't really think that matters much
>   now)
> 
> - the patches are very lightly tested.
> 
> - Dave reported one problem that may hang the destination main loop thread
>   (one vcpu thread holds the BQL) and the rest. I haven't encountered
>   it yet, but that does not mean this series is immune to it.
> 
> - other potential issues that I may have forgotten or unnoticed...
> 
> Anyway, the work is still in preliminary stage. Any suggestions and
> comments are greatly welcomed.  Thanks.

I pushed the series to github in case needed:

https://github.com/xzpeter/qemu/tree/postcopy-recovery-support

Thanks!

-- 
Peter Xu


* Re: [Qemu-devel] [RFC 11/29] migration: new postcopy-pause state
  2017-07-28  8:06 ` [Qemu-devel] [RFC 11/29] migration: new postcopy-pause state Peter Xu
@ 2017-07-28 15:53   ` Eric Blake
  2017-07-31  7:02     ` Peter Xu
  2017-07-31 19:06   ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 116+ messages in thread
From: Eric Blake @ 2017-07-28 15:53 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Laurent Vivier, Andrea Arcangeli, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert


On 07/28/2017 03:06 AM, Peter Xu wrote:
> Introducing a new state "postcopy-paused", which can be used to pause a
> postcopy migration. It is targeted at supporting network failures during
> postcopy migration. Now, when the network goes down during postcopy, the
> source side will not fail the migration. Instead we convert the status
> into this new paused state, and we will wait for a rescue in the future.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---

You might want to use scripts/git.orderfile to put .json changes early
in your diffs (interface before implementation makes for easier reviews).

> +++ b/qapi-schema.json
> @@ -667,6 +667,8 @@
>  #
>  # @postcopy-active: like active, but now in postcopy mode. (since 2.5)
>  #
> +# @postcopy-paused: during postcopy but paused. (since 2.10)
> +#

You've missed 2.10; this should be 2.11.  Can this state occur without
any explicit request (i.e. old clients may be confused by it), or do you
have to opt in to a specific migration parameter to inform qemu that you
are aware of how to handle this state?

>  # @completed: migration is finished.
>  #
>  # @failed: some error occurred during migration process.
> @@ -679,7 +681,8 @@
>  ##
>  { 'enum': 'MigrationStatus',
>    'data': [ 'none', 'setup', 'cancelling', 'cancelled',
> -            'active', 'postcopy-active', 'completed', 'failed', 'colo' ] }
> +            'active', 'postcopy-active', 'postcopy-paused',
> +            'completed', 'failed', 'colo' ] }
>  
>  ##
>  # @MigrationInfo:
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




* Re: [Qemu-devel] [RFC 16/29] qmp: hmp: add migrate "resume" option
  2017-07-28  8:06 ` [Qemu-devel] [RFC 16/29] qmp: hmp: add migrate "resume" option Peter Xu
@ 2017-07-28 15:57   ` Eric Blake
  2017-07-31  7:05     ` Peter Xu
  2017-08-01 10:42   ` Dr. David Alan Gilbert
  2017-08-01 11:03   ` Daniel P. Berrange
  2 siblings, 1 reply; 116+ messages in thread
From: Eric Blake @ 2017-07-28 15:57 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Laurent Vivier, Andrea Arcangeli, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert


On 07/28/2017 03:06 AM, Peter Xu wrote:
> It will be used when we want to resume one paused migration.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hmp-commands.hx       | 7 ++++---
>  hmp.c                 | 4 +++-
>  migration/migration.c | 2 +-
>  qapi-schema.json      | 5 ++++-
>  4 files changed, 12 insertions(+), 6 deletions(-)
> 

> +++ b/qapi-schema.json
> @@ -3208,6 +3208,8 @@
>  # @detach: this argument exists only for compatibility reasons and
>  #          is ignored by QEMU
>  #
> +# @resume: resume one paused migration

Mention default false, and that it is since 2.11.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




* Re: [Qemu-devel] [RFC 11/29] migration: new postcopy-pause state
  2017-07-28 15:53   ` Eric Blake
@ 2017-07-31  7:02     ` Peter Xu
  0 siblings, 0 replies; 116+ messages in thread
From: Peter Xu @ 2017-07-31  7:02 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, Laurent Vivier, Andrea Arcangeli, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert

On Fri, Jul 28, 2017 at 10:53:00AM -0500, Eric Blake wrote:
> On 07/28/2017 03:06 AM, Peter Xu wrote:
> > Introducing a new state "postcopy-paused", which can be used to pause a
> > postcopy migration. It is targeted at supporting network failures during
> > postcopy migration. Now, when the network goes down during postcopy, the
> > source side will not fail the migration. Instead we convert the status
> > into this new paused state, and we will wait for a rescue in the future.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> 
> You might want to use scripts/git.orderfile to put .json changes early
> in your diffs (interface before implementation makes for easier reviews).

Will do.

> 
> > +++ b/qapi-schema.json
> > @@ -667,6 +667,8 @@
> >  #
> >  # @postcopy-active: like active, but now in postcopy mode. (since 2.5)
> >  #
> > +# @postcopy-paused: during postcopy but paused. (since 2.10)
> > +#
> 
> You've missed 2.10; this should be 2.11.

Definitely. It should be for 2.11.

> Can this state occur without
> any explicit request (ie. old clients may be confused by it), or do you
> have to opt-in to a specific migration parameter to inform qemu that you
> are aware of how to handle this state?

Yes, it can occur automatically, without any operation from the user's
side. And it does not have a tunable to switch it on/off - it'll
always be on (that's my plan, anyway), since IMHO holding at a paused
state is always better than crashing the source directly (that's what
we do now - when postcopy encounters a network failure, the VM
crashes and data is lost).

Thanks,

-- 
Peter Xu


* Re: [Qemu-devel] [RFC 16/29] qmp: hmp: add migrate "resume" option
  2017-07-28 15:57   ` Eric Blake
@ 2017-07-31  7:05     ` Peter Xu
  0 siblings, 0 replies; 116+ messages in thread
From: Peter Xu @ 2017-07-31  7:05 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, Laurent Vivier, Andrea Arcangeli, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert

On Fri, Jul 28, 2017 at 10:57:12AM -0500, Eric Blake wrote:
> On 07/28/2017 03:06 AM, Peter Xu wrote:
> > It will be used when we want to resume one paused migration.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  hmp-commands.hx       | 7 ++++---
> >  hmp.c                 | 4 +++-
> >  migration/migration.c | 2 +-
> >  qapi-schema.json      | 5 ++++-
> >  4 files changed, 12 insertions(+), 6 deletions(-)
> > 
> 
> > +++ b/qapi-schema.json
> > @@ -3208,6 +3208,8 @@
> >  # @detach: this argument exists only for compatibility reasons and
> >  #          is ignored by QEMU
> >  #
> > +# @resume: resume one paused migration
> 
> Mention default false, and that it is since 2.11.

Will fix.  Thanks,

-- 
Peter Xu


* Re: [Qemu-devel] [RFC 01/29] migration: fix incorrect postcopy recved_bitmap
  2017-07-28  8:06 ` [Qemu-devel] [RFC 01/29] migration: fix incorrect postcopy recved_bitmap Peter Xu
@ 2017-07-31 16:34   ` Dr. David Alan Gilbert
  2017-08-01  2:11     ` Peter Xu
  0 siblings, 1 reply; 116+ messages in thread
From: Dr. David Alan Gilbert @ 2017-07-31 16:34 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> The bitmap setup during postcopy is incorrect when the pages are huge
> pages. Fix it.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/postcopy-ram.c | 2 +-
>  migration/ram.c          | 8 ++++++++
>  migration/ram.h          | 2 ++
>  3 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 276ce12..952b73a 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -578,7 +578,7 @@ static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr,
>          ret = ioctl(userfault_fd, UFFDIO_ZEROPAGE, &zero_struct);
>      }
>      if (!ret) {
> -        ramblock_recv_bitmap_set(host_addr, rb);
> +        ramblock_recv_bitmap_set_range(rb, host_addr, pagesize / getpagesize());

isn't that   pagesize / qemu_target_page_size() ?

Other than that it looks OK.

>      }
>      return ret;
>  }
> diff --git a/migration/ram.c b/migration/ram.c
> index 107ee9d..c93973c 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -168,6 +168,14 @@ void ramblock_recv_bitmap_set(void *host_addr, RAMBlock *rb)
>      set_bit_atomic(ramblock_recv_bitmap_offset(host_addr, rb), rb->receivedmap);
>  }
>  
> +void ramblock_recv_bitmap_set_range(RAMBlock *rb, void *host_addr,
> +                                    size_t len)
> +{
> +    bitmap_set(rb->receivedmap,
> +               ramblock_recv_bitmap_offset(host_addr, rb),
> +               len);
> +}
> +
>  void ramblock_recv_bitmap_clear(void *host_addr, RAMBlock *rb)
>  {
>      clear_bit(ramblock_recv_bitmap_offset(host_addr, rb), rb->receivedmap);
> diff --git a/migration/ram.h b/migration/ram.h
> index b711552..84e8623 100644
> --- a/migration/ram.h
> +++ b/migration/ram.h
> @@ -55,6 +55,8 @@ void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
>  
>  int ramblock_recv_bitmap_test(void *host_addr, RAMBlock *rb);
>  void ramblock_recv_bitmap_set(void *host_addr, RAMBlock *rb);
> +void ramblock_recv_bitmap_set_range(RAMBlock *rb, void *host_addr,
> +                                    size_t len);
>  void ramblock_recv_bitmap_clear(void *host_addr, RAMBlock *rb);
>  
>  #endif
> -- 
> 2.7.4
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC 02/29] migration: fix comment disorder in RAMState
  2017-07-28  8:06 ` [Qemu-devel] [RFC 02/29] migration: fix comment disorder in RAMState Peter Xu
@ 2017-07-31 16:39   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 116+ messages in thread
From: Dr. David Alan Gilbert @ 2017-07-31 16:39 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> Comments for "migration_dirty_pages" and "bitmap_mutex" are switched.
> Fix it.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Split this out, it can go in a trivial patch probably sooner.


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/ram.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index c93973c..c12358d 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -222,9 +222,9 @@ struct RAMState {
>      uint64_t iterations_prev;
>      /* Iterations since start */
>      uint64_t iterations;
> -    /* protects modification of the bitmap */
> -    uint64_t migration_dirty_pages;
>      /* number of dirty bits in the bitmap */
> +    uint64_t migration_dirty_pages;
> +    /* protects modification of the bitmap */
>      QemuMutex bitmap_mutex;
>      /* The RAMBlock used in the last src_page_requests */
>      RAMBlock *last_req_rb;
> -- 
> 2.7.4
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC 03/29] io: fix qio_channel_socket_accept err handling
  2017-07-28  8:06 ` [Qemu-devel] [RFC 03/29] io: fix qio_channel_socket_accept err handling Peter Xu
@ 2017-07-31 16:53   ` Dr. David Alan Gilbert
  2017-08-01  2:25     ` Peter Xu
  0 siblings, 1 reply; 116+ messages in thread
From: Dr. David Alan Gilbert @ 2017-07-31 16:53 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli, berrange, stefanha

* Peter Xu (peterx@redhat.com) wrote:
> When accept failed, we should setup errp with the reason. More
> importantly, the caller may assume errp be non-NULL when error happens,
> and not setting the errp may crash QEMU.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  io/channel-socket.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/io/channel-socket.c b/io/channel-socket.c
> index 53386b7..7bc308e 100644
> --- a/io/channel-socket.c
> +++ b/io/channel-socket.c
> @@ -344,6 +344,7 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
>          if (errno == EINTR) {
>              goto retry;
>          }
> +        error_setg_errno(errp, errno, "Unable to accept connection");
>          goto error;

OK, but this code actually has a bigger problem as well:

the original is:

    cioc->fd = qemu_accept(ioc->fd, (struct sockaddr *)&cioc->remoteAddr,
                           &cioc->remoteAddrLen);
    if (cioc->fd < 0) {
        trace_qio_channel_socket_accept_fail(ioc);
        if (errno == EINTR) {
            goto retry;
        }
        goto error;
    }

Stefan confirmed that trace_ doesn't preserve errno; so the if
following it is wrong.  It needs to preserve errno.

(Again this patch can go on its own)

Dave

>      }
>  
> -- 
> 2.7.4
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC 04/29] bitmap: introduce bitmap_invert()
  2017-07-28  8:06 ` [Qemu-devel] [RFC 04/29] bitmap: introduce bitmap_invert() Peter Xu
@ 2017-07-31 17:11   ` Dr. David Alan Gilbert
  2017-08-01  2:43     ` Peter Xu
  0 siblings, 1 reply; 116+ messages in thread
From: Dr. David Alan Gilbert @ 2017-07-31 17:11 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> It is used to invert the whole bitmap.

Would it be easier to change bitmap_complement to use ^
in its macro and slow_bitmap_complement, and then you could call it
with src==dst  to do the same thing with just that small change?

Dave

> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  include/qemu/bitmap.h | 10 ++++++++++
>  util/bitmap.c         | 13 +++++++++++++
>  2 files changed, 23 insertions(+)
> 
> diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
> index c318da1..460d899 100644
> --- a/include/qemu/bitmap.h
> +++ b/include/qemu/bitmap.h
> @@ -82,6 +82,7 @@ int slow_bitmap_andnot(unsigned long *dst, const unsigned long *bitmap1,
>                         const unsigned long *bitmap2, long bits);
>  int slow_bitmap_intersects(const unsigned long *bitmap1,
>                             const unsigned long *bitmap2, long bits);
> +void slow_bitmap_invert(unsigned long *bitmap, long nbits);
>  
>  static inline unsigned long *bitmap_try_new(long nbits)
>  {
> @@ -216,6 +217,15 @@ static inline int bitmap_intersects(const unsigned long *src1,
>      }
>  }
>  
> +static inline void bitmap_invert(unsigned long *bitmap, long nbits)
> +{
> +    if (small_nbits(nbits)) {
> +        *bitmap ^= BITMAP_LAST_WORD_MASK(nbits);
> +    } else {
> +        slow_bitmap_invert(bitmap, nbits);
> +    }
> +}
> +
>  void bitmap_set(unsigned long *map, long i, long len);
>  void bitmap_set_atomic(unsigned long *map, long i, long len);
>  void bitmap_clear(unsigned long *map, long start, long nr);
> diff --git a/util/bitmap.c b/util/bitmap.c
> index efced9a..9b7408c 100644
> --- a/util/bitmap.c
> +++ b/util/bitmap.c
> @@ -355,3 +355,16 @@ int slow_bitmap_intersects(const unsigned long *bitmap1,
>      }
>      return 0;
>  }
> +
> +void slow_bitmap_invert(unsigned long *bitmap, long nbits)
> +{
> +    long k, lim = nbits/BITS_PER_LONG;
> +
> +    for (k = 0; k < lim; k++) {
> +        bitmap[k] ^= ULONG_MAX;
> +    }
> +
> +    if (nbits % BITS_PER_LONG) {
> +        bitmap[k] ^= BITMAP_LAST_WORD_MASK(nbits);
> +    }
> +}
> -- 
> 2.7.4
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC 05/29] bitmap: introduce bitmap_count_one()
  2017-07-28  8:06 ` [Qemu-devel] [RFC 05/29] bitmap: introduce bitmap_count_one() Peter Xu
@ 2017-07-31 17:58   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 116+ messages in thread
From: Dr. David Alan Gilbert @ 2017-07-31 17:58 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> Count how many bits set in the bitmap.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  include/qemu/bitmap.h | 10 ++++++++++
>  util/bitmap.c         | 15 +++++++++++++++
>  2 files changed, 25 insertions(+)
> 
> diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
> index 460d899..9c18da0 100644
> --- a/include/qemu/bitmap.h
> +++ b/include/qemu/bitmap.h
> @@ -83,6 +83,7 @@ int slow_bitmap_andnot(unsigned long *dst, const unsigned long *bitmap1,
>  int slow_bitmap_intersects(const unsigned long *bitmap1,
>                             const unsigned long *bitmap2, long bits);
>  void slow_bitmap_invert(unsigned long *bitmap, long nbits);
> +long slow_bitmap_count_one(const unsigned long *bitmap, long nbits);
>  
>  static inline unsigned long *bitmap_try_new(long nbits)
>  {
> @@ -226,6 +227,15 @@ static inline void bitmap_invert(unsigned long *bitmap, long nbits)
>      }
>  }
>  
> +static inline long bitmap_count_one(const unsigned long *bitmap, long nbits)
> +{
> +    if (small_nbits(nbits)) {
> +        return (ctpopl(*bitmap & BITMAP_LAST_WORD_MASK(nbits)));
> +    } else {
> +        return slow_bitmap_count_one(bitmap, nbits);
> +    }
> +}
> +
>  void bitmap_set(unsigned long *map, long i, long len);
>  void bitmap_set_atomic(unsigned long *map, long i, long len);
>  void bitmap_clear(unsigned long *map, long start, long nr);
> diff --git a/util/bitmap.c b/util/bitmap.c
> index 9b7408c..73a1063 100644
> --- a/util/bitmap.c
> +++ b/util/bitmap.c
> @@ -368,3 +368,18 @@ void slow_bitmap_invert(unsigned long *bitmap, long nbits)
>          bitmap[k] ^= BITMAP_LAST_WORD_MASK(nbits);
>      }
>  }
> +
> +long slow_bitmap_count_one(const unsigned long *bitmap, long nbits)
> +{
> +    long k, lim = nbits/BITS_PER_LONG, result = 0;
> +
> +    for (k = 0; k < lim; k++) {
> +        result += ctpopl(bitmap[k]);
> +    }
> +
> +    if (nbits % BITS_PER_LONG) {
> +        result += ctpopl(bitmap[k] & BITMAP_LAST_WORD_MASK(nbits));
> +    }
> +
> +    return result;
> +}

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

(I checked what happens with BITMAP_LAST_WORD_MASK(0); interestingly it's
all 1s - so you do need that if)

> -- 
> 2.7.4
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC 06/29] migration: dump str in migrate_set_state trace
  2017-07-28  8:06 ` [Qemu-devel] [RFC 06/29] migration: dump str in migrate_set_state trace Peter Xu
@ 2017-07-31 18:27   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 116+ messages in thread
From: Dr. David Alan Gilbert @ 2017-07-31 18:27 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> Strings are more readable for debugging.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/migration.c  | 3 ++-
>  migration/trace-events | 2 +-
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 6803187..bdc4445 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -914,8 +914,9 @@ void qmp_migrate_start_postcopy(Error **errp)
>  
>  void migrate_set_state(int *state, int old_state, int new_state)
>  {
> +    assert(new_state < MIGRATION_STATUS__MAX);
>      if (atomic_cmpxchg(state, old_state, new_state) == old_state) {
> -        trace_migrate_set_state(new_state);
> +        trace_migrate_set_state(MigrationStatus_lookup[new_state]);
>          migrate_generate_event(new_state);
>      }
>  }
> diff --git a/migration/trace-events b/migration/trace-events
> index cb2c4b5..08d00fa 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -80,7 +80,7 @@ ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: %
>  # migration/migration.c
>  await_return_path_close_on_source_close(void) ""
>  await_return_path_close_on_source_joining(void) ""
> -migrate_set_state(int new_state) "new state %d"
> +migrate_set_state(const char *new_state) "new state %s"
>  migrate_fd_cleanup(void) ""
>  migrate_fd_error(const char *error_desc) "error=%s"
>  migrate_fd_cancel(void) ""
> -- 
> 2.7.4
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC 07/29] migration: better error handling with QEMUFile
  2017-07-28  8:06 ` [Qemu-devel] [RFC 07/29] migration: better error handling with QEMUFile Peter Xu
@ 2017-07-31 18:39   ` Dr. David Alan Gilbert
  2017-08-01  5:49     ` Peter Xu
  0 siblings, 1 reply; 116+ messages in thread
From: Dr. David Alan Gilbert @ 2017-07-31 18:39 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> If postcopy goes down for some reason, we can always see this on dst:
> 
>   qemu-system-x86_64: RP: Received invalid message 0x0000 length 0x0000
> 
> However, in most cases that's not the real issue. The problem is that
> qemu_get_be16() has no way to show whether the returned data is valid,
> and we _always_ assume it is. That's possibly not wise.
>
> The best approach would be to refactor the QEMUFile interface so the
> APIs can return errors. However, that needs quite a bit of work and
> testing. For now, let's explicitly check validity before using the
> data from qemu_get_*() in all places.
>
> This patch tries to fix most of the cases I can see. Only with this
> can we make sure we are processing valid data, and also that we can
> capture channel-down events correctly.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/migration.c |  5 +++++
>  migration/ram.c       | 22 ++++++++++++++++++----
>  migration/savevm.c    | 29 +++++++++++++++++++++++++++--
>  3 files changed, 50 insertions(+), 6 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index bdc4445..5b2602e 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1543,6 +1543,11 @@ static void *source_return_path_thread(void *opaque)
>          header_type = qemu_get_be16(rp);
>          header_len = qemu_get_be16(rp);
>  
> +        if (qemu_file_get_error(rp)) {
> +            mark_source_rp_bad(ms);
> +            goto out;
> +        }
> +
>          if (header_type >= MIG_RP_MSG_MAX ||
>              header_type == MIG_RP_MSG_INVALID) {
>              error_report("RP: Received invalid message 0x%04x length 0x%04x",
> diff --git a/migration/ram.c b/migration/ram.c
> index c12358d..7f4cb0f 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2416,7 +2416,7 @@ static int ram_load_postcopy(QEMUFile *f)
>      void *last_host = NULL;
>      bool all_zero = false;
>  
> -    while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
> +    while (!(flags & RAM_SAVE_FLAG_EOS)) {
>          ram_addr_t addr;
>          void *host = NULL;
>          void *page_buffer = NULL;
> @@ -2425,6 +2425,16 @@ static int ram_load_postcopy(QEMUFile *f)
>          uint8_t ch;
>  
>          addr = qemu_get_be64(f);
> +
> +        /*
> +         * If qemu file error, we should stop here, and then "addr"
> +         * may be invalid
> +         */
> +        if (qemu_file_get_error(f)) {
> +            ret = qemu_file_get_error(f);
> +            break;
> +        }

I'd prefer:
    ret = qemu_file_get_error(f);
    if (ret) {
        break;
    }


> +
>          flags = addr & ~TARGET_PAGE_MASK;
>          addr &= TARGET_PAGE_MASK;
>  
> @@ -2505,6 +2515,13 @@ static int ram_load_postcopy(QEMUFile *f)
>              error_report("Unknown combination of migration flags: %#x"
>                           " (postcopy mode)", flags);
>              ret = -EINVAL;
> +            break;
> +        }
> +
> +        /* Detect for any possible file errors */
> +        if (qemu_file_get_error(f)) {
> +            ret = qemu_file_get_error(f);
> +            break;
>          }
>  
>          if (place_needed) {
> @@ -2519,9 +2536,6 @@ static int ram_load_postcopy(QEMUFile *f)
>                                            place_source, block);
>              }
>          }
> -        if (!ret) {
> -            ret = qemu_file_get_error(f);
> -        }

I think we've lost an error check here; the code before this does:
   ret = postcopy_place_page.....

and if that failed it used to be detected by the !ret check in the
while condition, but with that gone, we need to add a check for ret
after the place page.
      
>      }
>  
>      return ret;
> diff --git a/migration/savevm.c b/migration/savevm.c
> index fdd15fa..13ae9d6 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1720,6 +1720,11 @@ static int loadvm_process_command(QEMUFile *f)
>      cmd = qemu_get_be16(f);
>      len = qemu_get_be16(f);
>  
> +    /* Check validity before continue processing of cmds */
> +    if (qemu_file_get_error(f)) {
> +        return qemu_file_get_error(f);
> +    }
> +
>      trace_loadvm_process_command(cmd, len);
>      if (cmd >= MIG_CMD_MAX || cmd == MIG_CMD_INVALID) {
>          error_report("MIG_CMD 0x%x unknown (len 0x%x)", cmd, len);
> @@ -1855,6 +1860,11 @@ qemu_loadvm_section_start_full(QEMUFile *f, MigrationIncomingState *mis)
>          return -EINVAL;
>      }
>  
> +    /* Check validity before load the vmstate */
> +    if (qemu_file_get_error(f)) {
> +        return qemu_file_get_error(f);
> +    }
> +

Do you need a check after the instance_id/version_id read and in
check_section_footer?


>      ret = vmstate_load(f, se);
>      if (ret < 0) {
>          error_report("error while loading state for instance 0x%x of"
> @@ -1888,6 +1898,11 @@ qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis)
>          return -EINVAL;
>      }
>  
> +    /* Check validity before load the vmstate */
> +    if (qemu_file_get_error(f)) {
> +        return qemu_file_get_error(f);
> +    }
> +

Similar question; do you need the check after the section_id =   read ?

>      ret = vmstate_load(f, se);
>      if (ret < 0) {
>          error_report("error while loading state section id %d(%s)",
> @@ -1944,8 +1959,14 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
>      uint8_t section_type;
>      int ret = 0;
>  
> -    while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
> -        ret = 0;
> +    while (true) {
> +        section_type = qemu_get_byte(f);
> +
> +        if (qemu_file_get_error(f)) {
> +            ret = qemu_file_get_error(f);
> +            break;
> +        }
> +
>          trace_qemu_loadvm_state_section(section_type);
>          switch (section_type) {
>          case QEMU_VM_SECTION_START:
> @@ -1969,6 +1990,10 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
>                  goto out;
>              }
>              break;
> +        case QEMU_VM_EOF:
> +            /* This is the end of migration */
> +            goto out;
> +            break;

Just the goto is sufficient there.

>          default:
>              error_report("Unknown savevm section type %d", section_type);
>              ret = -EINVAL;
> -- 
> 2.7.4

Dave

> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC 08/29] migration: reuse mis->userfault_quit_fd
  2017-07-28  8:06 ` [Qemu-devel] [RFC 08/29] migration: reuse mis->userfault_quit_fd Peter Xu
@ 2017-07-31 18:42   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 116+ messages in thread
From: Dr. David Alan Gilbert @ 2017-07-31 18:42 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> Previously it was only used for quitting the page fault thread. Let it
> be something more useful - now we can use it to notify the page fault
> thread to wake (for any reason), and it only means "quit" if
> fault_thread_quit is set.
>
> Since we changed what it does, rename it to userfault_event_fd.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/migration.h    |  6 ++++--
>  migration/postcopy-ram.c | 24 ++++++++++++++++--------
>  2 files changed, 20 insertions(+), 10 deletions(-)
> 
> diff --git a/migration/migration.h b/migration/migration.h
> index 148c9fa..70e3094 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -35,6 +35,8 @@ struct MigrationIncomingState {
>      bool           have_fault_thread;
>      QemuThread     fault_thread;
>      QemuSemaphore  fault_thread_sem;
> +    /* Set this when we want the fault thread to quit */
> +    bool           fault_thread_quit;
>  
>      bool           have_listen_thread;
>      QemuThread     listen_thread;
> @@ -42,8 +44,8 @@ struct MigrationIncomingState {
>  
>      /* For the kernel to send us notifications */
>      int       userfault_fd;
> -    /* To tell the fault_thread to quit */
> -    int       userfault_quit_fd;
> +    /* To notify the fault_thread to wake, e.g., when need to quit */
> +    int       userfault_event_fd;
>      QEMUFile *to_src_file;
>      QemuMutex rp_mutex;    /* We send replies from multiple threads */
>      void     *postcopy_tmp_page;
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 952b73a..4278fe7 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -305,7 +305,8 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
>           * currently be at 0, we're going to increment it to 1
>           */
>          tmp64 = 1;
> -        if (write(mis->userfault_quit_fd, &tmp64, 8) == 8) {
> +        atomic_set(&mis->fault_thread_quit, 1);
> +        if (write(mis->userfault_event_fd, &tmp64, 8) == 8) {
>              trace_postcopy_ram_incoming_cleanup_join();
>              qemu_thread_join(&mis->fault_thread);
>          } else {
> @@ -315,7 +316,7 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
>          }
>          trace_postcopy_ram_incoming_cleanup_closeuf();
>          close(mis->userfault_fd);
> -        close(mis->userfault_quit_fd);
> +        close(mis->userfault_event_fd);
>          mis->have_fault_thread = false;
>      }
>  
> @@ -438,7 +439,7 @@ static void *postcopy_ram_fault_thread(void *opaque)
>          pfd[0].fd = mis->userfault_fd;
>          pfd[0].events = POLLIN;
>          pfd[0].revents = 0;
> -        pfd[1].fd = mis->userfault_quit_fd;
> +        pfd[1].fd = mis->userfault_event_fd;
>          pfd[1].events = POLLIN; /* Waiting for eventfd to go positive */
>          pfd[1].revents = 0;
>  
> @@ -448,8 +449,15 @@ static void *postcopy_ram_fault_thread(void *opaque)
>          }
>  
>          if (pfd[1].revents) {
> -            trace_postcopy_ram_fault_thread_quit();
> -            break;
> +            uint64_t tmp64 = 0;
> +
> +            /* Consume the signal */
> +            read(mis->userfault_event_fd, &tmp64, 8);
> +
> +            if (atomic_read(&mis->fault_thread_quit)) {
> +                trace_postcopy_ram_fault_thread_quit();
> +                break;
> +            }
>          }
>  
>          ret = read(mis->userfault_fd, &msg, sizeof(msg));
> @@ -528,9 +536,9 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
>      }
>  
>      /* Now an eventfd we use to tell the fault-thread to quit */
> -    mis->userfault_quit_fd = eventfd(0, EFD_CLOEXEC);
> -    if (mis->userfault_quit_fd == -1) {
> -        error_report("%s: Opening userfault_quit_fd: %s", __func__,
> +    mis->userfault_event_fd = eventfd(0, EFD_CLOEXEC);
> +    if (mis->userfault_event_fd == -1) {
> +        error_report("%s: Opening userfault_event_fd: %s", __func__,
>                       strerror(errno));
>          close(mis->userfault_fd);
>          return -1;
> -- 
> 2.7.4
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC 09/29] migration: provide postcopy_fault_thread_notify()
  2017-07-28  8:06 ` [Qemu-devel] [RFC 09/29] migration: provide postcopy_fault_thread_notify() Peter Xu
@ 2017-07-31 18:45   ` Dr. David Alan Gilbert
  2017-08-01  3:01     ` Peter Xu
  0 siblings, 1 reply; 116+ messages in thread
From: Dr. David Alan Gilbert @ 2017-07-31 18:45 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> A general helper to notify the fault thread.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/postcopy-ram.c | 35 ++++++++++++++++++++---------------
>  migration/postcopy-ram.h |  2 ++
>  2 files changed, 22 insertions(+), 15 deletions(-)
> 
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 4278fe7..9ce391d 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -287,6 +287,21 @@ int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages)
>      return 0;
>  }
>  
> +void postcopy_fault_thread_notify(MigrationIncomingState *mis)
> +{
> +    uint64_t tmp64 = 1;
> +
> +    /*
> +     * Tell the fault_thread to exit, it's an eventfd that should
> +     * currently be at 0, we're going to increment it to 1
> +     */
> +    if (write(mis->userfault_event_fd, &tmp64, 8) != 8) {
> +        /* Not much we can do here, but may as well report it */
> +        error_report("%s: incrementing userfault_quit_fd: %s", __func__,

minor; that error message needs updating with the new name - or, since
it's a standalone function, 'incrementing failed:' would work.
Other than that:


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> +                     strerror(errno));
> +    }
> +}
> +
>  /*
>   * At the end of a migration where postcopy_ram_incoming_init was called.
>   */
> @@ -295,25 +310,15 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
>      trace_postcopy_ram_incoming_cleanup_entry();
>  
>      if (mis->have_fault_thread) {
> -        uint64_t tmp64;
> -
>          if (qemu_ram_foreach_block(cleanup_range, mis)) {
>              return -1;
>          }
> -        /*
> -         * Tell the fault_thread to exit, it's an eventfd that should
> -         * currently be at 0, we're going to increment it to 1
> -         */
> -        tmp64 = 1;
> +        /* Let the fault thread quit */
>          atomic_set(&mis->fault_thread_quit, 1);
> -        if (write(mis->userfault_event_fd, &tmp64, 8) == 8) {
> -            trace_postcopy_ram_incoming_cleanup_join();
> -            qemu_thread_join(&mis->fault_thread);
> -        } else {
> -            /* Not much we can do here, but may as well report it */
> -            error_report("%s: incrementing userfault_quit_fd: %s", __func__,
> -                         strerror(errno));
> -        }
> +        postcopy_fault_thread_notify(mis);
> +        trace_postcopy_ram_incoming_cleanup_join();
> +        qemu_thread_join(&mis->fault_thread);
> +
>          trace_postcopy_ram_incoming_cleanup_closeuf();
>          close(mis->userfault_fd);
>          close(mis->userfault_event_fd);
> diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
> index 78a3591..4a7644d 100644
> --- a/migration/postcopy-ram.h
> +++ b/migration/postcopy-ram.h
> @@ -114,4 +114,6 @@ PostcopyState postcopy_state_get(void);
>  /* Set the state and return the old state */
>  PostcopyState postcopy_state_set(PostcopyState new_state);
>  
> +void postcopy_fault_thread_notify(MigrationIncomingState *mis);
> +
>  #endif
> -- 
> 2.7.4
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC 10/29] migration: new property "x-postcopy-fast"
  2017-07-28  8:06 ` [Qemu-devel] [RFC 10/29] migration: new property "x-postcopy-fast" Peter Xu
@ 2017-07-31 18:52   ` Dr. David Alan Gilbert
  2017-08-01  3:13     ` Peter Xu
  0 siblings, 1 reply; 116+ messages in thread
From: Dr. David Alan Gilbert @ 2017-07-31 18:52 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> This provides a way to start postcopy ASAP when migration starts. To do
> this, we need both:
> 
>   -global migration.x-postcopy-ram=on \
>   -global migration.x-postcopy-fast=on

Can you explain why this is necessary?  Both sides already know
they're doing a postcopy recovery, don't they?

Dave

> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/migration.c | 9 ++++++++-
>  migration/migration.h | 2 ++
>  2 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 5b2602e..efee87e 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1936,6 +1936,11 @@ bool migrate_colo_enabled(void)
>      return s->enabled_capabilities[MIGRATION_CAPABILITY_X_COLO];
>  }
>  
> +static bool postcopy_should_start(MigrationState *s)
> +{
> +    return atomic_read(&s->start_postcopy) || s->start_postcopy_fast;
> +}
> +
>  /*
>   * Master migration thread on the source VM.
>   * It drives the migration and pumps the data down the outgoing channel.
> @@ -2013,7 +2018,7 @@ static void *migration_thread(void *opaque)
>                  if (migrate_postcopy_ram() &&
>                      s->state != MIGRATION_STATUS_POSTCOPY_ACTIVE &&
>                      pend_nonpost <= threshold_size &&
> -                    atomic_read(&s->start_postcopy)) {
> +                    postcopy_should_start(s)) {
>  
>                      if (!postcopy_start(s, &old_vm_running)) {
>                          current_active_state = MIGRATION_STATUS_POSTCOPY_ACTIVE;
> @@ -2170,6 +2175,8 @@ static Property migration_properties[] = {
>                       send_configuration, true),
>      DEFINE_PROP_BOOL("send-section-footer", MigrationState,
>                       send_section_footer, true),
> +    DEFINE_PROP_BOOL("x-postcopy-fast", MigrationState,
> +                     start_postcopy_fast, false),
>  
>      /* Migration parameters */
>      DEFINE_PROP_INT64("x-compress-level", MigrationState,
> diff --git a/migration/migration.h b/migration/migration.h
> index 70e3094..e902bae 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -113,6 +113,8 @@ struct MigrationState
>  
>      /* Flag set once the migration has been asked to enter postcopy */
>      bool start_postcopy;
> +    /* Set the flag if we want to start postcopy ASAP when migration starts */
> +    bool start_postcopy_fast;
>      /* Flag set after postcopy has sent the device state */
>      bool postcopy_after_devices;
>  
> -- 
> 2.7.4
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC 11/29] migration: new postcopy-pause state
  2017-07-28  8:06 ` [Qemu-devel] [RFC 11/29] migration: new postcopy-pause state Peter Xu
  2017-07-28 15:53   ` Eric Blake
@ 2017-07-31 19:06   ` Dr. David Alan Gilbert
  2017-08-01  6:28     ` Peter Xu
  1 sibling, 1 reply; 116+ messages in thread
From: Dr. David Alan Gilbert @ 2017-07-31 19:06 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> Introduce a new state "postcopy-paused", which can be used to pause a
> postcopy migration. It is targeted at supporting network failures during
> postcopy migration. Now when the network goes down during postcopy, the
> source side will not fail the migration. Instead we convert the status
> into this new paused state, and we will wait for a rescue in the future.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

I think this should probably be split into:
   a) A patch that adds a new state and the entries in query_migrate etc
   b) A patch that wires up the semaphore and the use of the state.

> ---
>  migration/migration.c  | 78 +++++++++++++++++++++++++++++++++++++++++++++++---
>  migration/migration.h  |  3 ++
>  migration/trace-events |  1 +
>  qapi-schema.json       |  5 +++-
>  4 files changed, 82 insertions(+), 5 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index efee87e..0bc70c8 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -470,6 +470,7 @@ static bool migration_is_setup_or_active(int state)
>      switch (state) {
>      case MIGRATION_STATUS_ACTIVE:
>      case MIGRATION_STATUS_POSTCOPY_ACTIVE:
> +    case MIGRATION_STATUS_POSTCOPY_PAUSED:
>      case MIGRATION_STATUS_SETUP:
>          return true;
>  
> @@ -545,6 +546,7 @@ MigrationInfo *qmp_query_migrate(Error **errp)
>      case MIGRATION_STATUS_ACTIVE:
>      case MIGRATION_STATUS_CANCELLING:
>      case MIGRATION_STATUS_POSTCOPY_ACTIVE:
> +    case MIGRATION_STATUS_POSTCOPY_PAUSED:
>           /* TODO add some postcopy stats */
>          info->has_status = true;
>          info->has_total_time = true;
> @@ -991,6 +993,8 @@ static void migrate_fd_cleanup(void *opaque)
>  
>      notifier_list_notify(&migration_state_notifiers, s);
>      block_cleanup_parameters(s);
> +
> +    qemu_sem_destroy(&s->postcopy_pause_sem);
>  }
>  
>  void migrate_fd_error(MigrationState *s, const Error *error)
> @@ -1134,6 +1138,7 @@ MigrationState *migrate_init(void)
>      s->migration_thread_running = false;
>      error_free(s->error);
>      s->error = NULL;
> +    qemu_sem_init(&s->postcopy_pause_sem, 0);
>  
>      migrate_set_state(&s->state, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP);
>  
> @@ -1942,6 +1947,69 @@ static bool postcopy_should_start(MigrationState *s)
>  }
>  
>  /*
> + * We don't return until we are in a safe state to continue current
> + * postcopy migration.  Returns true to continue the migration, or
> + * false to terminate current migration.
> + */
> +static bool postcopy_pause(MigrationState *s)
> +{
> +    assert(s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);

I never like asserts on the sending side.

> +    migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> +                      MIGRATION_STATUS_POSTCOPY_PAUSED);
> +
> +    /* Current channel is possibly broken. Release it. */
> +    assert(s->to_dst_file);
> +    qemu_file_shutdown(s->to_dst_file);
> +    qemu_fclose(s->to_dst_file);
> +    s->to_dst_file = NULL;

That does scare me a little; I think it's OK, but I'm not sure what
happens to the ->from_dst_file fd and the return-path processing.

> +    /*
> +     * We wait until things fixed up. Then someone will setup the
> +     * status back for us.
> +     */
> +    while (s->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
> +        qemu_sem_wait(&s->postcopy_pause_sem);
> +    }

Something should get written to stderr prior to this, so when we
find a migration apparently stuck we can tell why.

Dave

> +
> +    trace_postcopy_pause_continued();
> +
> +    return true;
> +}
> +
> +/* Return true if we want to stop the migration, otherwise false. */
> +static bool migration_detect_error(MigrationState *s)
> +{
> +    int ret;
> +
> +    /* Try to detect any file errors */
> +    ret = qemu_file_get_error(s->to_dst_file);
> +
> +    if (!ret) {
> +        /* Everything is fine */
> +        return false;
> +    }
> +
> +    if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE && ret == -EIO) {
> +        /*
> +         * For postcopy, we allow the network to be down for a
> +         * while. After that, it can be continued by a
> +         * recovery phase.
> +         */
> +        return !postcopy_pause(s);
> +    } else {
> +        /*
> +         * For precopy (or postcopy with error outside IO), we fail
> +         * with no time.
> +         */
> +        migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
> +        trace_migration_thread_file_err();
> +
> +        /* Time to stop the migration, now. */
> +        return true;
> +    }
> +}
> +
> +/*
>   * Master migration thread on the source VM.
>   * It drives the migration and pumps the data down the outgoing channel.
>   */
> @@ -2037,12 +2105,14 @@ static void *migration_thread(void *opaque)
>              }
>          }
>  
> -        if (qemu_file_get_error(s->to_dst_file)) {
> -            migrate_set_state(&s->state, current_active_state,
> -                              MIGRATION_STATUS_FAILED);
> -            trace_migration_thread_file_err();
> +        /*
> +         * Try to detect any kind of failures, and see whether we
> +         * should stop the migration now.
> +         */
> +        if (migration_detect_error(s)) {
>              break;
>          }
> +
>          current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>          if (current_time >= initial_time + BUFFER_DELAY) {
>              uint64_t transferred_bytes = qemu_ftell(s->to_dst_file) -
> diff --git a/migration/migration.h b/migration/migration.h
> index e902bae..24cdaf6 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -151,6 +151,9 @@ struct MigrationState
>      bool send_configuration;
>      /* Whether we send section footer during migration */
>      bool send_section_footer;
> +
> +    /* Needed by postcopy-pause state */
> +    QemuSemaphore postcopy_pause_sem;
>  };
>  
>  void migrate_set_state(int *state, int old_state, int new_state);
> diff --git a/migration/trace-events b/migration/trace-events
> index 08d00fa..2211acc 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -98,6 +98,7 @@ migration_thread_setup_complete(void) ""
>  open_return_path_on_source(void) ""
>  open_return_path_on_source_continue(void) ""
>  postcopy_start(void) ""
> +postcopy_pause_continued(void) ""
>  postcopy_start_set_run(void) ""
>  source_return_path_thread_bad_end(void) ""
>  source_return_path_thread_end(void) ""
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 9c6c3e1..2a36b80 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -667,6 +667,8 @@
>  #
>  # @postcopy-active: like active, but now in postcopy mode. (since 2.5)
>  #
> +# @postcopy-paused: during postcopy but paused. (since 2.10)
> +#
>  # @completed: migration is finished.
>  #
>  # @failed: some error occurred during migration process.
> @@ -679,7 +681,8 @@
>  ##
>  { 'enum': 'MigrationStatus',
>    'data': [ 'none', 'setup', 'cancelling', 'cancelled',
> -            'active', 'postcopy-active', 'completed', 'failed', 'colo' ] }
> +            'active', 'postcopy-active', 'postcopy-paused',
> +            'completed', 'failed', 'colo' ] }
>  
>  ##
>  # @MigrationInfo:
> -- 
> 2.7.4
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC 01/29] migration: fix incorrect postcopy recved_bitmap
  2017-07-31 16:34   ` Dr. David Alan Gilbert
@ 2017-08-01  2:11     ` Peter Xu
  2017-08-01  5:48       ` Alexey Perevalov
  0 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-08-01  2:11 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

On Mon, Jul 31, 2017 at 05:34:14PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > The bitmap setup during postcopy is incorrect when the pages are huge
> > pages. Fix it.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  migration/postcopy-ram.c | 2 +-
> >  migration/ram.c          | 8 ++++++++
> >  migration/ram.h          | 2 ++
> >  3 files changed, 11 insertions(+), 1 deletion(-)
> > 
> > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > index 276ce12..952b73a 100644
> > --- a/migration/postcopy-ram.c
> > +++ b/migration/postcopy-ram.c
> > @@ -578,7 +578,7 @@ static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr,
> >          ret = ioctl(userfault_fd, UFFDIO_ZEROPAGE, &zero_struct);
> >      }
> >      if (!ret) {
> > -        ramblock_recv_bitmap_set(host_addr, rb);
> > +        ramblock_recv_bitmap_set_range(rb, host_addr, pagesize / getpagesize());
> 
> isn't that   pagesize / qemu_target_page_size() ?
> 
> Other than that it looks OK.

Yes, I should have fixed this before.

I guess Alexey will handle this change (along with the copied bitmap
series)?  Anyway, I'll fix it in my series as well, until Alexey posts
the new version that I can rebase to.  Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC 03/29] io: fix qio_channel_socket_accept err handling
  2017-07-31 16:53   ` Dr. David Alan Gilbert
@ 2017-08-01  2:25     ` Peter Xu
  2017-08-01  8:32       ` Daniel P. Berrange
  0 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-08-01  2:25 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli, berrange, stefanha

On Mon, Jul 31, 2017 at 05:53:39PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > When accept failed, we should setup errp with the reason. More
> > importantly, the caller may assume errp be non-NULL when error happens,
> > and not setting the errp may crash QEMU.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  io/channel-socket.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/io/channel-socket.c b/io/channel-socket.c
> > index 53386b7..7bc308e 100644
> > --- a/io/channel-socket.c
> > +++ b/io/channel-socket.c
> > @@ -344,6 +344,7 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
> >          if (errno == EINTR) {
> >              goto retry;
> >          }
> > +        error_setg_errno(errp, errno, "Unable to accept connection");
> >          goto error;
> 
> OK, but this code actually has a bigger problem as well:
> 
> the original is:
> 
>     cioc->fd = qemu_accept(ioc->fd, (struct sockaddr *)&cioc->remoteAddr,
>                            &cioc->remoteAddrLen);
>     if (cioc->fd < 0) {
>         trace_qio_channel_socket_accept_fail(ioc);
>         if (errno == EINTR) {
>             goto retry;
>         }
>         goto error;
>     }
> 
> Stefan confirmed that trace_ doesn't preserve errno; so the if
> following it is wrong.  It needs to preserve errno.

Ah... If so, I am not sure whether we can preserve errno in the trace
code in general?

For this one, I can just move the trace_*() below the errno check.
After all, if we get EINTR it's not really a failure, so imho we
should not trace it as "accept fail".

> 
> (Again this patch can go on it's own)

Yes. For these patches, I intentionally put them at the beginning of
the series (so they are easier to pick up standalone). Do you (or Juan?) want
me to repost these patches separately?

-- 
Peter Xu


* Re: [Qemu-devel] [RFC 04/29] bitmap: introduce bitmap_invert()
  2017-07-31 17:11   ` Dr. David Alan Gilbert
@ 2017-08-01  2:43     ` Peter Xu
  2017-08-01  8:40       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-08-01  2:43 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

On Mon, Jul 31, 2017 at 06:11:56PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > It is used to invert the whole bitmap.
> 
> Would it be easier to change bitmap_complement to use ^
> in it's macro and slow_bitmap_complement, and then you could call it
> with src==dst  to do the same thing with just that small change?

Or, I can directly use that and drop this patch. :-)

(I didn't really notice that one before)

Thanks,

-- 
Peter Xu


* Re: [Qemu-devel] [RFC 09/29] migration: provide postcopy_fault_thread_notify()
  2017-07-31 18:45   ` Dr. David Alan Gilbert
@ 2017-08-01  3:01     ` Peter Xu
  0 siblings, 0 replies; 116+ messages in thread
From: Peter Xu @ 2017-08-01  3:01 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

On Mon, Jul 31, 2017 at 07:45:38PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > A general helper to notify the fault thread.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  migration/postcopy-ram.c | 35 ++++++++++++++++++++---------------
> >  migration/postcopy-ram.h |  2 ++
> >  2 files changed, 22 insertions(+), 15 deletions(-)
> > 
> > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > index 4278fe7..9ce391d 100644
> > --- a/migration/postcopy-ram.c
> > +++ b/migration/postcopy-ram.c
> > @@ -287,6 +287,21 @@ int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages)
> >      return 0;
> >  }
> >  
> > +void postcopy_fault_thread_notify(MigrationIncomingState *mis)
> > +{
> > +    uint64_t tmp64 = 1;
> > +
> > +    /*
> > +     * Tell the fault_thread to exit, it's an eventfd that should
> > +     * currently be at 0, we're going to increment it to 1
> > +     */
> > +    if (write(mis->userfault_event_fd, &tmp64, 8) != 8) {
> > +        /* Not much we can do here, but may as well report it */
> > +        error_report("%s: incrementing userfault_quit_fd: %s", __func__,
> 
> minor; that error message needs updating with the new name, or since
> it's a standalone function, 'incrementing failed:'  would work.
> Other than that:

Will fix (possibly should be in previous patch since that patch did
the name change).  Also, I think I need to touch up the comment as well
with s/exit/wake/.

> 
> 
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Thanks!

-- 
Peter Xu


* Re: [Qemu-devel] [RFC 10/29] migration: new property "x-postcopy-fast"
  2017-07-31 18:52   ` Dr. David Alan Gilbert
@ 2017-08-01  3:13     ` Peter Xu
  2017-08-01  8:50       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-08-01  3:13 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

On Mon, Jul 31, 2017 at 07:52:24PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > This provides a way to start postcopy ASAP when migration starts. To do
> > this, we need both:
> > 
> >   -global migration.x-postcopy-ram=on \
> >   -global migration.x-postcopy-fast=on
> 
> Can you explain why this is necessary?  Both sides already know
> they're doing a postcopy recovery don't they?

What I wanted to do here is to provide a way to start postcopy at the
very beginning (actually it'll possibly start postcopy at the first
loop in migration_thread), instead of starting postcopy only when we
trigger it with the "migrate_start_postcopy" command.

I used it for easier debugging (so I don't need to type
"migrate_start_postcopy" every time I trigger a postcopy migration);
meanwhile I think it can also be used when someone really wants to
start postcopy from the very beginning.

Would such a new parameter make sense?

-- 
Peter Xu


* Re: [Qemu-devel] [RFC 01/29] migration: fix incorrect postcopy recved_bitmap
  2017-08-01  2:11     ` Peter Xu
@ 2017-08-01  5:48       ` Alexey Perevalov
  2017-08-01  6:02         ` Peter Xu
  0 siblings, 1 reply; 116+ messages in thread
From: Alexey Perevalov @ 2017-08-01  5:48 UTC (permalink / raw)
  To: Peter Xu, Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Juan Quintela, Andrea Arcangeli

On 08/01/2017 05:11 AM, Peter Xu wrote:
> On Mon, Jul 31, 2017 at 05:34:14PM +0100, Dr. David Alan Gilbert wrote:
>> * Peter Xu (peterx@redhat.com) wrote:
>>> The bitmap setup during postcopy is incorrect when the pages are huge
>>> pages. Fix it.
>>>
>>> Signed-off-by: Peter Xu <peterx@redhat.com>
>>> ---
>>>   migration/postcopy-ram.c | 2 +-
>>>   migration/ram.c          | 8 ++++++++
>>>   migration/ram.h          | 2 ++
>>>   3 files changed, 11 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
>>> index 276ce12..952b73a 100644
>>> --- a/migration/postcopy-ram.c
>>> +++ b/migration/postcopy-ram.c
>>> @@ -578,7 +578,7 @@ static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr,
>>>           ret = ioctl(userfault_fd, UFFDIO_ZEROPAGE, &zero_struct);
>>>       }
>>>       if (!ret) {
>>> -        ramblock_recv_bitmap_set(host_addr, rb);
>>> +        ramblock_recv_bitmap_set_range(rb, host_addr, pagesize / getpagesize());
>> isn't that   pagesize / qemu_target_page_size() ?
>>
>> Other than that it looks OK.
> Yes, I should have fixed this before.
>
> I guess Alexey will handle this change (along with the copied bitmap
> series)?  Anyway, I'll fix it as well in my series, until Alexey post
> the new version that I can rebase to.  Thanks,
>
I'll squash it, and I'll resend it today.
Do you agree to add

Signed-off-by: Peter Xu <peterx@redhat.com>

to my patch?


-- 
Best regards,
Alexey Perevalov


* Re: [Qemu-devel] [RFC 07/29] migration: better error handling with QEMUFile
  2017-07-31 18:39   ` Dr. David Alan Gilbert
@ 2017-08-01  5:49     ` Peter Xu
  0 siblings, 0 replies; 116+ messages in thread
From: Peter Xu @ 2017-08-01  5:49 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

On Mon, Jul 31, 2017 at 07:39:24PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:

[...]

> > @@ -2425,6 +2425,16 @@ static int ram_load_postcopy(QEMUFile *f)
> >          uint8_t ch;
> >  
> >          addr = qemu_get_be64(f);
> > +
> > +        /*
> > +         * If qemu file error, we should stop here, and then "addr"
> > +         * may be invalid
> > +         */
> > +        if (qemu_file_get_error(f)) {
> > +            ret = qemu_file_get_error(f);
> > +            break;
> > +        }
> 
> I'd prefer:
>     ret = qemu_file_get_error(f);
>     if (ret) {
>         break;
>     }

Sure.  Fixing up.

> 
> 
> > +
> >          flags = addr & ~TARGET_PAGE_MASK;
> >          addr &= TARGET_PAGE_MASK;
> >  
> > @@ -2505,6 +2515,13 @@ static int ram_load_postcopy(QEMUFile *f)
> >              error_report("Unknown combination of migration flags: %#x"
> >                           " (postcopy mode)", flags);
> >              ret = -EINVAL;
> > +            break;
> > +        }
> > +
> > +        /* Detect for any possible file errors */
> > +        if (qemu_file_get_error(f)) {
> > +            ret = qemu_file_get_error(f);
> > +            break;
> >          }
> >  
> >          if (place_needed) {
> > @@ -2519,9 +2536,6 @@ static int ram_load_postcopy(QEMUFile *f)
> >                                            place_source, block);
> >              }
> >          }
> > -        if (!ret) {
> > -            ret = qemu_file_get_error(f);
> > -        }
> 
> I think we've lost an error check here; the code before this does:
>    ret = postcopy_place_page.....
> 
> and if that failed it used to be detected by the !ret check in the
> while condition, but with that gone, we need to add a check for ret
> after the place page.

Hmm yes, I should check the ret code here in case place_page failed.

I was thinking it was checking only for file errors, but obviously it
was not.

>       
> >      }
> >  
> >      return ret;
> > diff --git a/migration/savevm.c b/migration/savevm.c
> > index fdd15fa..13ae9d6 100644
> > --- a/migration/savevm.c
> > +++ b/migration/savevm.c
> > @@ -1720,6 +1720,11 @@ static int loadvm_process_command(QEMUFile *f)
> >      cmd = qemu_get_be16(f);
> >      len = qemu_get_be16(f);
> >  
> > +    /* Check validity before continue processing of cmds */
> > +    if (qemu_file_get_error(f)) {
> > +        return qemu_file_get_error(f);
> > +    }
> > +
> >      trace_loadvm_process_command(cmd, len);
> >      if (cmd >= MIG_CMD_MAX || cmd == MIG_CMD_INVALID) {
> >          error_report("MIG_CMD 0x%x unknown (len 0x%x)", cmd, len);
> > @@ -1855,6 +1860,11 @@ qemu_loadvm_section_start_full(QEMUFile *f, MigrationIncomingState *mis)
> >          return -EINVAL;
> >      }
> >  
> > +    /* Check validity before load the vmstate */
> > +    if (qemu_file_get_error(f)) {
> > +        return qemu_file_get_error(f);
> > +    }
> > +
> 
> Do you need a check after the instance_id/version_id read and

I was trying to avoid checking it too many times, so I only checked
it before vmstate_load(). However, yes, it'll be good to check there
as well.

Generally speaking this patch is not fixing every caller of
qemu_get*(), but only those that affect the network failure detection
for postcopy.

> in
> check_section_footer?

Yes. Will do.

> 
> 
> >      ret = vmstate_load(f, se);
> >      if (ret < 0) {
> >          error_report("error while loading state for instance 0x%x of"
> > @@ -1888,6 +1898,11 @@ qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis)
> >          return -EINVAL;
> >      }
> >  
> > +    /* Check validity before load the vmstate */
> > +    if (qemu_file_get_error(f)) {
> > +        return qemu_file_get_error(f);
> > +    }
> > +
> 
> Similar question; do you need the check after the section_id =   read ?

Will do.

> 
> >      ret = vmstate_load(f, se);
> >      if (ret < 0) {
> >          error_report("error while loading state section id %d(%s)",
> > @@ -1944,8 +1959,14 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
> >      uint8_t section_type;
> >      int ret = 0;
> >  
> > -    while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
> > -        ret = 0;
> > +    while (true) {
> > +        section_type = qemu_get_byte(f);
> > +
> > +        if (qemu_file_get_error(f)) {
> > +            ret = qemu_file_get_error(f);
> > +            break;
> > +        }
> > +
> >          trace_qemu_loadvm_state_section(section_type);
> >          switch (section_type) {
> >          case QEMU_VM_SECTION_START:
> > @@ -1969,6 +1990,10 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
> >                  goto out;
> >              }
> >              break;
> > +        case QEMU_VM_EOF:
> > +            /* This is the end of migration */
> > +            goto out;
> > +            break;
> 
> Just the goto is sufficient there.

Will fix.  Thanks,

-- 
Peter Xu


* Re: [Qemu-devel] [RFC 01/29] migration: fix incorrect postcopy recved_bitmap
  2017-08-01  5:48       ` Alexey Perevalov
@ 2017-08-01  6:02         ` Peter Xu
  2017-08-01  6:12           ` Alexey Perevalov
  0 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-08-01  6:02 UTC (permalink / raw)
  To: Alexey Perevalov
  Cc: Dr. David Alan Gilbert, qemu-devel, Laurent Vivier,
	Juan Quintela, Andrea Arcangeli

On Tue, Aug 01, 2017 at 08:48:18AM +0300, Alexey Perevalov wrote:
> On 08/01/2017 05:11 AM, Peter Xu wrote:
> >On Mon, Jul 31, 2017 at 05:34:14PM +0100, Dr. David Alan Gilbert wrote:
> >>* Peter Xu (peterx@redhat.com) wrote:
> >>>The bitmap setup during postcopy is incorrect when the pages are huge
> >>>pages. Fix it.
> >>>
> >>>Signed-off-by: Peter Xu <peterx@redhat.com>
> >>>---
> >>>  migration/postcopy-ram.c | 2 +-
> >>>  migration/ram.c          | 8 ++++++++
> >>>  migration/ram.h          | 2 ++
> >>>  3 files changed, 11 insertions(+), 1 deletion(-)
> >>>
> >>>diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> >>>index 276ce12..952b73a 100644
> >>>--- a/migration/postcopy-ram.c
> >>>+++ b/migration/postcopy-ram.c
> >>>@@ -578,7 +578,7 @@ static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr,
> >>>          ret = ioctl(userfault_fd, UFFDIO_ZEROPAGE, &zero_struct);
> >>>      }
> >>>      if (!ret) {
> >>>-        ramblock_recv_bitmap_set(host_addr, rb);
> >>>+        ramblock_recv_bitmap_set_range(rb, host_addr, pagesize / getpagesize());
> >>isn't that   pagesize / qemu_target_page_size() ?
> >>
> >>Other than that it looks OK.
> >Yes, I should have fixed this before.
> >
> >I guess Alexey will handle this change (along with the copied bitmap
> >series)?  Anyway, I'll fix it as well in my series, until Alexey post
> >the new version that I can rebase to.  Thanks,
> >
> I'll squash it, and I'll resend it today.
> Are you agree to add
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> 
> to my patch?

Firstly, if you are squashing the patch, fixing the issue that Dave
has pointed out, please feel free to add my R-b on the patch.

I don't know whether it would be suitable to add my S-o-b here - since
most of the patch content is written by you, not me. But I'm totally
fine if you want to include that (btw, thanks for the offer :).

So either one R-b or S-o-b is okay to me.  Thanks,

-- 
Peter Xu


* Re: [Qemu-devel] [RFC 01/29] migration: fix incorrect postcopy recved_bitmap
  2017-08-01  6:02         ` Peter Xu
@ 2017-08-01  6:12           ` Alexey Perevalov
  0 siblings, 0 replies; 116+ messages in thread
From: Alexey Perevalov @ 2017-08-01  6:12 UTC (permalink / raw)
  To: Peter Xu
  Cc: Dr. David Alan Gilbert, qemu-devel, Laurent Vivier,
	Juan Quintela, Andrea Arcangeli

On 08/01/2017 09:02 AM, Peter Xu wrote:
> On Tue, Aug 01, 2017 at 08:48:18AM +0300, Alexey Perevalov wrote:
>> On 08/01/2017 05:11 AM, Peter Xu wrote:
>>> On Mon, Jul 31, 2017 at 05:34:14PM +0100, Dr. David Alan Gilbert wrote:
>>>> * Peter Xu (peterx@redhat.com) wrote:
> >>>>> The bitmap setup during postcopy is incorrect when the pages are huge
>>>>> pages. Fix it.
>>>>>
>>>>> Signed-off-by: Peter Xu <peterx@redhat.com>
>>>>> ---
>>>>>   migration/postcopy-ram.c | 2 +-
>>>>>   migration/ram.c          | 8 ++++++++
>>>>>   migration/ram.h          | 2 ++
>>>>>   3 files changed, 11 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
>>>>> index 276ce12..952b73a 100644
>>>>> --- a/migration/postcopy-ram.c
>>>>> +++ b/migration/postcopy-ram.c
>>>>> @@ -578,7 +578,7 @@ static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr,
>>>>>           ret = ioctl(userfault_fd, UFFDIO_ZEROPAGE, &zero_struct);
>>>>>       }
>>>>>       if (!ret) {
>>>>> -        ramblock_recv_bitmap_set(host_addr, rb);
>>>>> +        ramblock_recv_bitmap_set_range(rb, host_addr, pagesize / getpagesize());
>>>> isn't that   pagesize / qemu_target_page_size() ?
>>>>
>>>> Other than that it looks OK.
>>> Yes, I should have fixed this before.
>>>
>>> I guess Alexey will handle this change (along with the copied bitmap
>>> series)?  Anyway, I'll fix it as well in my series, until Alexey post
>>> the new version that I can rebase to.  Thanks,
>>>
>> I'll squash it, and I'll resend it today.
>> Are you agree to add
>>
>> Signed-off-by: Peter Xu <peterx@redhat.com>
>>
>> to my patch?
> Firstly, if you are squashing the patch, fixing the issue that Dave
> has pointed out, please feel free to add my R-b on the patch.
Of course I'll take into account David's suggestion.
>
> I don't know whether it would be suitable to add my S-o-b here - since
> most of the patch content is written by you, not me. But I'm totally
> fine if you want to include that (btw, thanks for the offer :).
>
> So either one R-b or S-o-b is okay to me.  Thanks,
>

-- 
Best regards,
Alexey Perevalov


* Re: [Qemu-devel] [RFC 11/29] migration: new postcopy-pause state
  2017-07-31 19:06   ` Dr. David Alan Gilbert
@ 2017-08-01  6:28     ` Peter Xu
  0 siblings, 0 replies; 116+ messages in thread
From: Peter Xu @ 2017-08-01  6:28 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

On Mon, Jul 31, 2017 at 08:06:18PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > Introducing a new state "postcopy-paused", which can be used to pause a
> > postcopy migration. It is targeted to support network failures during
> > postcopy migration. Now when network down for postcopy, the source side
> > will not fail the migration. Instead we convert the status into this new
> > paused state, and we will try to wait for a rescue in the future.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> 
> I think this should probably be split into:
>    a) A patch that adds a new state and the entries in query_migrate etc
>    b) A patch that wires up the semaphore and the use of the state.

Reasonable.  Let me split it.

> 
> > ---
> >  migration/migration.c  | 78 +++++++++++++++++++++++++++++++++++++++++++++++---
> >  migration/migration.h  |  3 ++
> >  migration/trace-events |  1 +
> >  qapi-schema.json       |  5 +++-
> >  4 files changed, 82 insertions(+), 5 deletions(-)
> > 
> > diff --git a/migration/migration.c b/migration/migration.c
> > index efee87e..0bc70c8 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -470,6 +470,7 @@ static bool migration_is_setup_or_active(int state)
> >      switch (state) {
> >      case MIGRATION_STATUS_ACTIVE:
> >      case MIGRATION_STATUS_POSTCOPY_ACTIVE:
> > +    case MIGRATION_STATUS_POSTCOPY_PAUSED:
> >      case MIGRATION_STATUS_SETUP:
> >          return true;
> >  
> > @@ -545,6 +546,7 @@ MigrationInfo *qmp_query_migrate(Error **errp)
> >      case MIGRATION_STATUS_ACTIVE:
> >      case MIGRATION_STATUS_CANCELLING:
> >      case MIGRATION_STATUS_POSTCOPY_ACTIVE:
> > +    case MIGRATION_STATUS_POSTCOPY_PAUSED:
> >           /* TODO add some postcopy stats */
> >          info->has_status = true;
> >          info->has_total_time = true;
> > @@ -991,6 +993,8 @@ static void migrate_fd_cleanup(void *opaque)
> >  
> >      notifier_list_notify(&migration_state_notifiers, s);
> >      block_cleanup_parameters(s);
> > +
> > +    qemu_sem_destroy(&s->postcopy_pause_sem);
> >  }
> >  
> >  void migrate_fd_error(MigrationState *s, const Error *error)
> > @@ -1134,6 +1138,7 @@ MigrationState *migrate_init(void)
> >      s->migration_thread_running = false;
> >      error_free(s->error);
> >      s->error = NULL;
> > +    qemu_sem_init(&s->postcopy_pause_sem, 0);
> >  
> >      migrate_set_state(&s->state, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP);
> >  
> > @@ -1942,6 +1947,69 @@ static bool postcopy_should_start(MigrationState *s)
> >  }
> >  
> >  /*
> > + * We don't return until we are in a safe state to continue current
> > + * postcopy migration.  Returns true to continue the migration, or
> > + * false to terminate current migration.
> > + */
> > +static bool postcopy_pause(MigrationState *s)
> > +{
> > +    assert(s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
> 
> I never like asserts on the sending side.

Indeed aborting on the source side is dangerous (e.g., the source may
lose data). However this is definitely a "valid assertion" - if the
current state is not "postcopy-active", we are in a very strange
state. If we just continue to run the later code, imho it is as
dangerous as assert()ing here and stopping the program. It may even be
more dangerous, considering that we don't really know what will happen
next...

> 
> > +    migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> > +                      MIGRATION_STATUS_POSTCOPY_PAUSED);
> > +
> > +    /* Current channel is possibly broken. Release it. */
> > +    assert(s->to_dst_file);
> > +    qemu_file_shutdown(s->to_dst_file);
> > +    qemu_fclose(s->to_dst_file);
> > +    s->to_dst_file = NULL;
> 
> That does scare me a little; I think it's OK, I'm not sure what happens
> to the ->from_dst_file fd and the return-path processing.

For sockets: I think the QIOChannelSocket.fd will be set to -1 during
close, then the return path code will not be able to read from that
channel any more (it'll get -EIO then as well), and it'll pause as
well. If it was blocking at recvmsg(), it should return with a
failure.

But yes, I think there may indeed be race conditions between the
to/from QEMUFiles, considering they are sharing the same channel...
Maybe that is a separate problem of "whether the QIO channel code is
thread safe"? I am not sure about that yet; otherwise we may need some
locking mechanism.

> 
> > +    /*
> > +     * We wait until things fixed up. Then someone will setup the
> > +     * status back for us.
> > +     */
> > +    while (s->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
> > +        qemu_sem_wait(&s->postcopy_pause_sem);
> > +    }
> 
> Something should get written to stderr prior to this, so when we
> find a migration apparently stuck we can tell why.

Yes I think so.  Thanks,

-- 
Peter Xu


* Re: [Qemu-devel] [RFC 03/29] io: fix qio_channel_socket_accept err handling
  2017-08-01  2:25     ` Peter Xu
@ 2017-08-01  8:32       ` Daniel P. Berrange
  2017-08-01  8:55         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 116+ messages in thread
From: Daniel P. Berrange @ 2017-08-01  8:32 UTC (permalink / raw)
  To: Peter Xu
  Cc: Dr. David Alan Gilbert, qemu-devel, Laurent Vivier,
	Alexey Perevalov, Juan Quintela, Andrea Arcangeli, stefanha

On Tue, Aug 01, 2017 at 10:25:19AM +0800, Peter Xu wrote:
> On Mon, Jul 31, 2017 at 05:53:39PM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > When accept fails, we should set up errp with the reason. More
> > > importantly, the caller may assume errp is non-NULL when an error
> > > happens, and not setting errp may crash QEMU.
> > > 
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > ---
> > >  io/channel-socket.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > > 
> > > diff --git a/io/channel-socket.c b/io/channel-socket.c
> > > index 53386b7..7bc308e 100644
> > > --- a/io/channel-socket.c
> > > +++ b/io/channel-socket.c
> > > @@ -344,6 +344,7 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
> > >          if (errno == EINTR) {
> > >              goto retry;
> > >          }
> > > +        error_setg_errno(errp, errno, "Unable to accept connection");
> > >          goto error;
> > 
> > OK, but this code actually has a bigger problem as well:
> > 
> > the original is:
> > 
> >     cioc->fd = qemu_accept(ioc->fd, (struct sockaddr *)&cioc->remoteAddr,
> >                            &cioc->remoteAddrLen);
> >     if (cioc->fd < 0) {
> >         trace_qio_channel_socket_accept_fail(ioc);
> >         if (errno == EINTR) {
> >             goto retry;
> >         }
> >         goto error;
> >     }
> > 
> > Stefan confirmed that trace_ doesn't preserve errno; so the if
> > following it is wrong.  It needs to preserve errno.
> 
> Ah... If so, I am not sure whether we can preserve errno in the trace
> code in general?
> 
> For this one, I can just move the trace_*() below the errno check.
> After all, if we get EINTR it's not really a failure, so imho we
> should not trace it as "accept fail".

Agreed, we just need to move the trace below the if.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


* Re: [Qemu-devel] [RFC 04/29] bitmap: introduce bitmap_invert()
  2017-08-01  2:43     ` Peter Xu
@ 2017-08-01  8:40       ` Dr. David Alan Gilbert
  2017-08-02  3:20         ` Peter Xu
  0 siblings, 1 reply; 116+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-01  8:40 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> On Mon, Jul 31, 2017 at 06:11:56PM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > It is used to invert the whole bitmap.
> > 
> > Would it be easier to change bitmap_complement to use ^
> > in it's macro and slow_bitmap_complement, and then you could call it
> > with src==dst  to do the same thing with just that small change?
> 
> Or, I can directly use that and drop this patch. :-)

Yes, that's fine - note the only difference I see is what happens to the
bits in the last word after the end of the count; your code leaves them
as is, the complement code will zero them on the destination I think.

Dave

> (I didn't really notice that one before)
> 
> Thanks,
> 
> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC 10/29] migration: new property "x-postcopy-fast"
  2017-08-01  3:13     ` Peter Xu
@ 2017-08-01  8:50       ` Dr. David Alan Gilbert
  2017-08-02  3:31         ` Peter Xu
  0 siblings, 1 reply; 116+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-01  8:50 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> On Mon, Jul 31, 2017 at 07:52:24PM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > This provides a way to start postcopy ASAP when migration starts. To do
> > > this, we need both:
> > > 
> > >   -global migration.x-postcopy-ram=on \
> > >   -global migration.x-postcopy-fast=on
> > 
> > Can you explain why this is necessary?  Both sides already know
> > they're doing a postcopy recovery don't they?
> 
> What I wanted to do here is to provide a way to start postcopy at the
> very beginning (actually it'll possibly start postcopy at the first
> loop in migration_thread), instead of start postcopy until we trigger
> it using "migrate_start_postcopy" command.
> 
> I used it for easier debugging (so I don't need to type
> "migrate_start_postcopy" every time when I trigger postcopy
> migration), meanwhile I think it can also be used when someone really
> want to start postcopy from the very beginning.
> 
> Would such a new parameter makes sense?

Other than debugging, I don't think there's a real use for it; the
slight delay between starting migration and triggering postcopy has
very little cost.

Dave

> 
> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC 03/29] io: fix qio_channel_socket_accept err handling
  2017-08-01  8:32       ` Daniel P. Berrange
@ 2017-08-01  8:55         ` Dr. David Alan Gilbert
  2017-08-02  3:21           ` Peter Xu
  0 siblings, 1 reply; 116+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-01  8:55 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Peter Xu, qemu-devel, Laurent Vivier, Alexey Perevalov,
	Juan Quintela, Andrea Arcangeli, stefanha

* Daniel P. Berrange (berrange@redhat.com) wrote:
> On Tue, Aug 01, 2017 at 10:25:19AM +0800, Peter Xu wrote:
> > On Mon, Jul 31, 2017 at 05:53:39PM +0100, Dr. David Alan Gilbert wrote:
> > > * Peter Xu (peterx@redhat.com) wrote:
> > > > When accept fails, we should set up errp with the reason. More
> > > > importantly, the caller may assume errp is non-NULL when an error
> > > > happens, and not setting errp may crash QEMU.
> > > > 
> > > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > > ---
> > > >  io/channel-socket.c | 1 +
> > > >  1 file changed, 1 insertion(+)
> > > > 
> > > > diff --git a/io/channel-socket.c b/io/channel-socket.c
> > > > index 53386b7..7bc308e 100644
> > > > --- a/io/channel-socket.c
> > > > +++ b/io/channel-socket.c
> > > > @@ -344,6 +344,7 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
> > > >          if (errno == EINTR) {
> > > >              goto retry;
> > > >          }
> > > > +        error_setg_errno(errp, errno, "Unable to accept connection");
> > > >          goto error;
> > > 
> > > OK, but this code actually has a bigger problem as well:
> > > 
> > > the original is:
> > > 
> > >     cioc->fd = qemu_accept(ioc->fd, (struct sockaddr *)&cioc->remoteAddr,
> > >                            &cioc->remoteAddrLen);
> > >     if (cioc->fd < 0) {
> > >         trace_qio_channel_socket_accept_fail(ioc);
> > >         if (errno == EINTR) {
> > >             goto retry;
> > >         }
> > >         goto error;
> > >     }
> > > 
> > > Stefan confirmed that trace_ doesn't preserve errno; so the if
> > > following it is wrong.  It needs to preserve errno.
> > 
> > Ah... If so, I am not sure whether we can preserve errno in the trace
> > code in general?
> > 
> > For this one, I can just move the trace_*() below the errno check.
> > After all, if we get EINTR it's not really a failure, so imho we
> > should not trace it as "accept fail".
> 
> Agreed, we just need to move the trace below the if.

Peter: Can you split this as a separate patch and it seems OK to try and
put this in 2.10 since it's a strict bug fix.

Dave

> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC 12/29] migration: allow dst vm pause on postcopy
  2017-07-28  8:06 ` [Qemu-devel] [RFC 12/29] migration: allow dst vm pause on postcopy Peter Xu
@ 2017-08-01  9:47   ` Dr. David Alan Gilbert
  2017-08-02  5:06     ` Peter Xu
  0 siblings, 1 reply; 116+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-01  9:47 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> When there is an IO error on the incoming channel (e.g., network down),
> instead of bailing out immediately, we allow the dst vm to switch to the
> new POSTCOPY_PAUSE state. Currently it is still simple - it waits on the
> new semaphore until someone pokes it for another attempt.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/migration.c  |  1 +
>  migration/migration.h  |  3 +++
>  migration/savevm.c     | 45 +++++++++++++++++++++++++++++++++++++++++++++
>  migration/trace-events |  2 ++
>  4 files changed, 51 insertions(+)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 0bc70c8..c729c5a 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -146,6 +146,7 @@ MigrationIncomingState *migration_incoming_get_current(void)
>          memset(&mis_current, 0, sizeof(MigrationIncomingState));
>          qemu_mutex_init(&mis_current.rp_mutex);
>          qemu_event_init(&mis_current.main_thread_load_event, false);
> +        qemu_sem_init(&mis_current.postcopy_pause_sem_dst, 0);
>          once = true;
>      }
>      return &mis_current;
> diff --git a/migration/migration.h b/migration/migration.h
> index 24cdaf6..08b90e8 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -60,6 +60,9 @@ struct MigrationIncomingState {
>      /* The coroutine we should enter (back) after failover */
>      Coroutine *migration_incoming_co;
>      QemuSemaphore colo_incoming_sem;
> +
> +    /* notify PAUSED postcopy incoming migrations to try to continue */
> +    QemuSemaphore postcopy_pause_sem_dst;
>  };
>  
>  MigrationIncomingState *migration_incoming_get_current(void);
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 13ae9d6..1f62268 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1954,11 +1954,41 @@ void qemu_loadvm_state_cleanup(void)
>      }
>  }
>  
> +/* Return true if we should continue the migration, or false. */
> +static bool postcopy_pause_incoming(MigrationIncomingState *mis)
> +{
> +    trace_postcopy_pause_incoming();
> +
> +    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> +                      MIGRATION_STATUS_POSTCOPY_PAUSED);
> +
> +    assert(mis->from_src_file);
> +    qemu_file_shutdown(mis->from_src_file);
> +    qemu_fclose(mis->from_src_file);
> +    mis->from_src_file = NULL;
> +
> +    assert(mis->to_src_file);
> +    qemu_mutex_lock(&mis->rp_mutex);
> +    qemu_file_shutdown(mis->to_src_file);
> +    qemu_fclose(mis->to_src_file);
> +    mis->to_src_file = NULL;
> +    qemu_mutex_unlock(&mis->rp_mutex);

Hmm is that safe?  If we look at migrate_send_rp_message we have:

    static void migrate_send_rp_message(MigrationIncomingState *mis,
                                        enum mig_rp_message_type message_type,
                                        uint16_t len, void *data)
    {
        trace_migrate_send_rp_message((int)message_type, len);
        qemu_mutex_lock(&mis->rp_mutex);
        qemu_put_be16(mis->to_src_file, (unsigned int)message_type);
        qemu_put_be16(mis->to_src_file, len);
        qemu_put_buffer(mis->to_src_file, data, len);
        qemu_fflush(mis->to_src_file);
        qemu_mutex_unlock(&mis->rp_mutex);
    }

If we came into postcopy_pause_incoming at about the same time
migrate_send_rp_message was being called and pause_incoming took the
lock first, then once it releases the lock, send_rp_message carries on
and uses mis->to_src_file, which is now NULL.

One solution here is to just call qemu_file_shutdown() but leave the
files open at this point, but clean the files up sometime later.

> +
> +    while (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
> +        qemu_sem_wait(&mis->postcopy_pause_sem_dst);
> +    }
> +
> +    trace_postcopy_pause_incoming_continued();
> +
> +    return true;
> +}
> +
>  static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
>  {
>      uint8_t section_type;
>      int ret = 0;
>  
> +retry:
>      while (true) {
>          section_type = qemu_get_byte(f);
>  
> @@ -2004,6 +2034,21 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
>  out:
>      if (ret < 0) {
>          qemu_file_set_error(f, ret);
> +
> +        /*
> +         * Detect whether it is:
> +         *
> +         * 1. postcopy running
> +         * 2. network failure (-EIO)
> +         *
> +         * If so, we try to wait for a recovery.
> +         */
> +        if (mis->state == MIGRATION_STATUS_POSTCOPY_ACTIVE &&
> +            ret == -EIO && postcopy_pause_incoming(mis)) {
> +            /* Reset f to point to the newly created channel */
> +            f = mis->from_src_file;
> +            goto retry;
> +        }

I wonder if:

           if (mis->state == MIGRATION_STATUS_POSTCOPY_ACTIVE &&
               ret == -EIO && postcopy_pause_incoming(mis)) {
               /* Try again after postcopy recovery */
               return qemu_loadvm_state_main(mis->from_src_file, mis);
           }
would be nicer; it avoids the goto loop.

Dave

>      }
>      return ret;
>  }
> diff --git a/migration/trace-events b/migration/trace-events
> index 2211acc..22a629e 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -99,6 +99,8 @@ open_return_path_on_source(void) ""
>  open_return_path_on_source_continue(void) ""
>  postcopy_start(void) ""
>  postcopy_pause_continued(void) ""
> +postcopy_pause_incoming(void) ""
> +postcopy_pause_incoming_continued(void) ""
>  postcopy_start_set_run(void) ""
>  source_return_path_thread_bad_end(void) ""
>  source_return_path_thread_end(void) ""
> -- 
> 2.7.4
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC 13/29] migration: allow src return path to pause
  2017-07-28  8:06 ` [Qemu-devel] [RFC 13/29] migration: allow src return path to pause Peter Xu
@ 2017-08-01 10:01   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 116+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-01 10:01 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> Let the thread pause for network issues.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/migration.c  | 35 +++++++++++++++++++++++++++++++++--
>  migration/migration.h  |  1 +
>  migration/trace-events |  2 ++
>  3 files changed, 36 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index c729c5a..d0b9a86 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -996,6 +996,7 @@ static void migrate_fd_cleanup(void *opaque)
>      block_cleanup_parameters(s);
>  
>      qemu_sem_destroy(&s->postcopy_pause_sem);
> +    qemu_sem_destroy(&s->postcopy_pause_rp_sem);
>  }
>  
>  void migrate_fd_error(MigrationState *s, const Error *error)
> @@ -1140,6 +1141,7 @@ MigrationState *migrate_init(void)
>      error_free(s->error);
>      s->error = NULL;
>      qemu_sem_init(&s->postcopy_pause_sem, 0);
> +    qemu_sem_init(&s->postcopy_pause_rp_sem, 0);
>  
>      migrate_set_state(&s->state, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP);
>  
> @@ -1527,6 +1529,18 @@ static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
>      }
>  }
>  
> +/* Return true to retry, false to quit */
> +static bool postcopy_pause_return_path_thread(MigrationState *s)
> +{
> +    trace_postcopy_pause_return_path();
> +
> +    qemu_sem_wait(&s->postcopy_pause_rp_sem);
> +
> +    trace_postcopy_pause_return_path_continued();
> +
> +    return true;
> +}
> +
>  /*
>   * Handles messages sent on the return path towards the source VM
>   *
> @@ -1543,6 +1557,8 @@ static void *source_return_path_thread(void *opaque)
>      int res;
>  
>      trace_source_return_path_thread_entry();
> +
> +retry:
>      while (!ms->rp_state.error && !qemu_file_get_error(rp) &&
>             migration_is_setup_or_active(ms->state)) {
>          trace_source_return_path_thread_loop_top();
> @@ -1634,13 +1650,28 @@ static void *source_return_path_thread(void *opaque)
>              break;
>          }
>      }
> -    if (qemu_file_get_error(rp)) {
> +
> +out:
> +    res = qemu_file_get_error(rp);
> +    if (res) {
> +        if (res == -EIO) {
> +            /*
> +             * Maybe there is something we can do: it looks like a
> +             * network down issue, and we pause for a recovery.
> +             */
> +            if (postcopy_pause_return_path_thread(ms)) {
> +                /* Reload rp, reset the rest */
> +                rp = ms->rp_state.from_dst_file;
> +                ms->rp_state.error = false;
> +                goto retry;

The recursion trick I suggested in the previous patch might
also work here.

but it's OK, so

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> +            }
> +        }
> +
>          trace_source_return_path_thread_bad_end();
>          mark_source_rp_bad(ms);
>      }
>  
>      trace_source_return_path_thread_end();
> -out:
>      ms->rp_state.from_dst_file = NULL;
>      qemu_fclose(rp);
>      return NULL;
> diff --git a/migration/migration.h b/migration/migration.h
> index 08b90e8..7aaab13 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -157,6 +157,7 @@ struct MigrationState
>  
>      /* Needed by postcopy-pause state */
>      QemuSemaphore postcopy_pause_sem;
> +    QemuSemaphore postcopy_pause_rp_sem;
>  };
>  
>  void migrate_set_state(int *state, int old_state, int new_state);
> diff --git a/migration/trace-events b/migration/trace-events
> index 22a629e..a269eec 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -98,6 +98,8 @@ migration_thread_setup_complete(void) ""
>  open_return_path_on_source(void) ""
>  open_return_path_on_source_continue(void) ""
>  postcopy_start(void) ""
> +postcopy_pause_return_path(void) ""
> +postcopy_pause_return_path_continued(void) ""
>  postcopy_pause_continued(void) ""
>  postcopy_pause_incoming(void) ""
>  postcopy_pause_incoming_continued(void) ""
> -- 
> 2.7.4
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC 14/29] migration: allow send_rq to fail
  2017-07-28  8:06 ` [Qemu-devel] [RFC 14/29] migration: allow send_rq to fail Peter Xu
@ 2017-08-01 10:30   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 116+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-01 10:30 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> We do not expect failures when sending data from destination to source
> via the return path, yet errors can still happen along the way.  This
> patch allows migrate_send_rp_message() to return an error when that
> happens, and further extends it to migrate_send_rp_req_pages().
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/migration.c | 38 ++++++++++++++++++++++++++++++--------
>  migration/migration.h |  2 +-
>  2 files changed, 31 insertions(+), 9 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index d0b9a86..9a0b5b0 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -195,17 +195,35 @@ static void deferred_incoming_migration(Error **errp)
>   * Send a message on the return channel back to the source
>   * of the migration.
>   */
> -static void migrate_send_rp_message(MigrationIncomingState *mis,
> -                                    enum mig_rp_message_type message_type,
> -                                    uint16_t len, void *data)
> +static int migrate_send_rp_message(MigrationIncomingState *mis,
> +                                   enum mig_rp_message_type message_type,
> +                                   uint16_t len, void *data)
>  {
> +    int ret = 0;
> +
>      trace_migrate_send_rp_message((int)message_type, len);
>      qemu_mutex_lock(&mis->rp_mutex);
> +
> +    /*
> +     * It's possible that the file handle got lost due to network
> +     * failures.
> +     */
> +    if (!mis->to_src_file) {
> +        ret = -EIO;
> +        goto error;
> +    }
> +

Right, and this answers one of my questions from the previous patches.



Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

>      qemu_put_be16(mis->to_src_file, (unsigned int)message_type);
>      qemu_put_be16(mis->to_src_file, len);
>      qemu_put_buffer(mis->to_src_file, data, len);
>      qemu_fflush(mis->to_src_file);
> +
> +    /* It's possible that qemu file got error during sending */
> +    ret = qemu_file_get_error(mis->to_src_file);
> +
> +error:
>      qemu_mutex_unlock(&mis->rp_mutex);
> +    return ret;
>  }
>  
>  /* Request a range of pages from the source VM at the given
> @@ -215,26 +233,30 @@ static void migrate_send_rp_message(MigrationIncomingState *mis,
>   *   Start: Address offset within the RB
>   *   Len: Length in bytes required - must be a multiple of pagesize
>   */
> -void migrate_send_rp_req_pages(MigrationIncomingState *mis, const char *rbname,
> -                               ram_addr_t start, size_t len)
> +int migrate_send_rp_req_pages(MigrationIncomingState *mis, const char *rbname,
> +                              ram_addr_t start, size_t len)
>  {
>      uint8_t bufc[12 + 1 + 255]; /* start (8), len (4), rbname up to 256 */
>      size_t msglen = 12; /* start + len */
> +    int rbname_len;
> +    enum mig_rp_message_type msg_type;
>  
>      *(uint64_t *)bufc = cpu_to_be64((uint64_t)start);
>      *(uint32_t *)(bufc + 8) = cpu_to_be32((uint32_t)len);
>  
>      if (rbname) {
> -        int rbname_len = strlen(rbname);
> +        rbname_len = strlen(rbname);
>          assert(rbname_len < 256);
>  
>          bufc[msglen++] = rbname_len;
>          memcpy(bufc + msglen, rbname, rbname_len);
>          msglen += rbname_len;
> -        migrate_send_rp_message(mis, MIG_RP_MSG_REQ_PAGES_ID, msglen, bufc);
> +        msg_type = MIG_RP_MSG_REQ_PAGES_ID;
>      } else {
> -        migrate_send_rp_message(mis, MIG_RP_MSG_REQ_PAGES, msglen, bufc);
> +        msg_type = MIG_RP_MSG_REQ_PAGES;
>      }
> +
> +    return migrate_send_rp_message(mis, msg_type, msglen, bufc);
>  }
>  
>  void qemu_start_incoming_migration(const char *uri, Error **errp)
> diff --git a/migration/migration.h b/migration/migration.h
> index 7aaab13..047872b 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -201,7 +201,7 @@ void migrate_send_rp_shut(MigrationIncomingState *mis,
>                            uint32_t value);
>  void migrate_send_rp_pong(MigrationIncomingState *mis,
>                            uint32_t value);
> -void migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* rbname,
> +int migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* rbname,
>                                ram_addr_t start, size_t len);
>  
>  #endif
> -- 
> 2.7.4
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC 15/29] migration: allow fault thread to pause
  2017-07-28  8:06 ` [Qemu-devel] [RFC 15/29] migration: allow fault thread to pause Peter Xu
@ 2017-08-01 10:41   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 116+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-01 10:41 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> Allow the fault thread to stop handling page faults temporarily. When a
> network failure happens (and if we expect a recovery afterwards), we
> should not allow the fault thread to continue sending requests to the
> source; instead, it should halt for a while until the connection is
> rebuilt.
> 
> When the dest main thread notices the failure, it kicks the fault thread
> to switch to the pause state.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/migration.c    |  1 +
>  migration/migration.h    |  1 +
>  migration/postcopy-ram.c | 50 ++++++++++++++++++++++++++++++++++++++++++++----
>  migration/savevm.c       |  3 +++
>  migration/trace-events   |  2 ++
>  5 files changed, 53 insertions(+), 4 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 9a0b5b0..9d93836 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -147,6 +147,7 @@ MigrationIncomingState *migration_incoming_get_current(void)
>          qemu_mutex_init(&mis_current.rp_mutex);
>          qemu_event_init(&mis_current.main_thread_load_event, false);
>          qemu_sem_init(&mis_current.postcopy_pause_sem_dst, 0);
> +        qemu_sem_init(&mis_current.postcopy_pause_sem_fault, 0);
>          once = true;
>      }
>      return &mis_current;
> diff --git a/migration/migration.h b/migration/migration.h
> index 047872b..574fedd 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -63,6 +63,7 @@ struct MigrationIncomingState {
>  
>      /* notify PAUSED postcopy incoming migrations to try to continue */
>      QemuSemaphore postcopy_pause_sem_dst;
> +    QemuSemaphore postcopy_pause_sem_fault;
>  };
>  
>  MigrationIncomingState *migration_incoming_get_current(void);
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 9ce391d..ba53155 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -418,6 +418,17 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
>      return 0;
>  }
>  
> +static bool postcopy_pause_fault_thread(MigrationIncomingState *mis)
> +{
> +    trace_postcopy_pause_fault_thread();
> +
> +    qemu_sem_wait(&mis->postcopy_pause_sem_fault);
> +
> +    trace_postcopy_pause_fault_thread_continued();
> +
> +    return true;
> +}
> +
>  /*
>   * Handle faults detected by the USERFAULT markings
>   */
> @@ -465,6 +476,22 @@ static void *postcopy_ram_fault_thread(void *opaque)
>              }
>          }
>  
> +        if (!mis->to_src_file) {
> +            /*
> +             * Possibly someone tells us that the return path is
> +             * broken already using the event. We should hold until
> +             * the channel is rebuilt.
> +             */
> +            if (postcopy_pause_fault_thread(mis)) {
> +                last_rb = NULL;
> +                /* Continue to read the userfaultfd */
> +            } else {
> +                error_report("%s: paused but don't allow to continue",
> +                             __func__);
> +                break;
> +            }
> +        }
> +
>          ret = read(mis->userfault_fd, &msg, sizeof(msg));
>          if (ret != sizeof(msg)) {
>              if (errno == EAGAIN) {
> @@ -504,18 +531,33 @@ static void *postcopy_ram_fault_thread(void *opaque)
>                                                  qemu_ram_get_idstr(rb),
>                                                  rb_offset);
>  
> +retry:
>          /*
>           * Send the request to the source - we want to request one
>           * of our host page sizes (which is >= TPS)
>           */
>          if (rb != last_rb) {
>              last_rb = rb;
> -            migrate_send_rp_req_pages(mis, qemu_ram_get_idstr(rb),
> -                                     rb_offset, qemu_ram_pagesize(rb));
> +            ret = migrate_send_rp_req_pages(mis, qemu_ram_get_idstr(rb),
> +                                            rb_offset, qemu_ram_pagesize(rb));
>          } else {
>              /* Save some space */
> -            migrate_send_rp_req_pages(mis, NULL,
> -                                     rb_offset, qemu_ram_pagesize(rb));
> +            ret = migrate_send_rp_req_pages(mis, NULL,
> +                                            rb_offset, qemu_ram_pagesize(rb));
> +        }
> +
> +        if (ret) {
> +            /* May be network failure, try to wait for recovery */
> +            if (ret == -EIO && postcopy_pause_fault_thread(mis)) {
> +                /* We got reconnected somehow, try to continue */
> +                last_rb = NULL;
> +                goto retry;
> +            } else {
> +                /* This is an unavoidable fault */
> +                error_report("%s: migrate_send_rp_req_pages() get %d",
> +                             __func__, ret);
> +                break;
> +            }
>          }
>      }
>      trace_postcopy_ram_fault_thread_exit();
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 1f62268..386788d 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1974,6 +1974,9 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
>      mis->to_src_file = NULL;
>      qemu_mutex_unlock(&mis->rp_mutex);
>  
> +    /* Notify the fault thread for the invalidated file handle */
> +    postcopy_fault_thread_notify(mis);
> +
>      while (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
>          qemu_sem_wait(&mis->postcopy_pause_sem_dst);
>      }
> diff --git a/migration/trace-events b/migration/trace-events
> index a269eec..dbb4971 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -100,6 +100,8 @@ open_return_path_on_source_continue(void) ""
>  postcopy_start(void) ""
>  postcopy_pause_return_path(void) ""
>  postcopy_pause_return_path_continued(void) ""
> +postcopy_pause_fault_thread(void) ""
> +postcopy_pause_fault_thread_continued(void) ""
>  postcopy_pause_continued(void) ""
>  postcopy_pause_incoming(void) ""
>  postcopy_pause_incoming_continued(void) ""
> -- 
> 2.7.4
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC 16/29] qmp: hmp: add migrate "resume" option
  2017-07-28  8:06 ` [Qemu-devel] [RFC 16/29] qmp: hmp: add migrate "resume" option Peter Xu
  2017-07-28 15:57   ` Eric Blake
@ 2017-08-01 10:42   ` Dr. David Alan Gilbert
  2017-08-01 11:03   ` Daniel P. Berrange
  2 siblings, 0 replies; 116+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-01 10:42 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> It will be used when we want to resume one paused migration.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Other than Eric's comments:


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  hmp-commands.hx       | 7 ++++---
>  hmp.c                 | 4 +++-
>  migration/migration.c | 2 +-
>  qapi-schema.json      | 5 ++++-
>  4 files changed, 12 insertions(+), 6 deletions(-)
> 
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index 1941e19..7adb029 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -928,13 +928,14 @@ ETEXI
>  
>      {
>          .name       = "migrate",
> -        .args_type  = "detach:-d,blk:-b,inc:-i,uri:s",
> -        .params     = "[-d] [-b] [-i] uri",
> +        .args_type  = "detach:-d,blk:-b,inc:-i,resume:-r,uri:s",
> +        .params     = "[-d] [-b] [-i] [-r] uri",
>          .help       = "migrate to URI (using -d to not wait for completion)"
>  		      "\n\t\t\t -b for migration without shared storage with"
>  		      " full copy of disk\n\t\t\t -i for migration without "
>  		      "shared storage with incremental copy of disk "
> -		      "(base image shared between src and destination)",
> +		      "(base image shared between src and destination)"
> +                      "\n\t\t\t -r to resume a paused migration",
>          .cmd        = hmp_migrate,
>      },
>  
> diff --git a/hmp.c b/hmp.c
> index fd80dce..ebc1563 100644
> --- a/hmp.c
> +++ b/hmp.c
> @@ -1891,10 +1891,12 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
>      bool detach = qdict_get_try_bool(qdict, "detach", false);
>      bool blk = qdict_get_try_bool(qdict, "blk", false);
>      bool inc = qdict_get_try_bool(qdict, "inc", false);
> +    bool resume = qdict_get_try_bool(qdict, "resume", false);
>      const char *uri = qdict_get_str(qdict, "uri");
>      Error *err = NULL;
>  
> -    qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false, &err);
> +    qmp_migrate(uri, !!blk, blk, !!inc, inc,
> +                false, false, true, resume, &err);
>      if (err) {
>          error_report_err(err);
>          return;
> diff --git a/migration/migration.c b/migration/migration.c
> index 9d93836..36ff8c3 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1238,7 +1238,7 @@ bool migration_is_blocked(Error **errp)
>  
>  void qmp_migrate(const char *uri, bool has_blk, bool blk,
>                   bool has_inc, bool inc, bool has_detach, bool detach,
> -                 Error **errp)
> +                 bool has_resume, bool resume, Error **errp)
>  {
>      Error *local_err = NULL;
>      MigrationState *s = migrate_get_current();
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 2a36b80..27b7c4c 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -3208,6 +3208,8 @@
>  # @detach: this argument exists only for compatibility reasons and
>  #          is ignored by QEMU
>  #
> +# @resume: resume one paused migration
> +#
>  # Returns: nothing on success
>  #
>  # Since: 0.14.0
> @@ -3229,7 +3231,8 @@
>  #
>  ##
>  { 'command': 'migrate',
> -  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' } }
> +  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool',
> +           '*detach': 'bool', '*resume': 'bool' } }
>  
>  ##
>  # @migrate-incoming:
> -- 
> 2.7.4
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC 19/29] migration: let dst listen on port always
  2017-07-28  8:06 ` [Qemu-devel] [RFC 19/29] migration: let dst listen on port always Peter Xu
@ 2017-08-01 10:56   ` Daniel P. Berrange
  2017-08-02  7:02     ` Peter Xu
  0 siblings, 1 reply; 116+ messages in thread
From: Daniel P. Berrange @ 2017-08-01 10:56 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Andrea Arcangeli, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert

On Fri, Jul 28, 2017 at 04:06:28PM +0800, Peter Xu wrote:
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/exec.c   | 2 +-
>  migration/fd.c     | 2 +-
>  migration/socket.c | 4 ++--
>  3 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/migration/exec.c b/migration/exec.c
> index 08b599e..b4412db 100644
> --- a/migration/exec.c
> +++ b/migration/exec.c
> @@ -49,7 +49,7 @@ static gboolean exec_accept_incoming_migration(QIOChannel *ioc,
>  {
>      migration_channel_process_incoming(ioc);
>      object_unref(OBJECT(ioc));
> -    return FALSE; /* unregister */
> +    return TRUE; /* keep it registered */
>  }
>  
>  void exec_start_incoming_migration(const char *command, Error **errp)
> diff --git a/migration/fd.c b/migration/fd.c
> index 30f5258..865277a 100644
> --- a/migration/fd.c
> +++ b/migration/fd.c
> @@ -49,7 +49,7 @@ static gboolean fd_accept_incoming_migration(QIOChannel *ioc,
>  {
>      migration_channel_process_incoming(ioc);
>      object_unref(OBJECT(ioc));
> -    return FALSE; /* unregister */
> +    return TRUE; /* keep it registered */
>  }
>  
>  void fd_start_incoming_migration(const char *infd, Error **errp)
> diff --git a/migration/socket.c b/migration/socket.c
> index 757d382..f2c2d01 100644
> --- a/migration/socket.c
> +++ b/migration/socket.c
> @@ -153,8 +153,8 @@ static gboolean socket_accept_incoming_migration(QIOChannel *ioc,
>  
>  out:
>      /* Close listening socket as its no longer needed */
> -    qio_channel_close(ioc, NULL);
> -    return FALSE; /* unregister */
> +    // qio_channel_close(ioc, NULL);
> +    return TRUE; /* keep it registered */
>  }


This is not a very desirable approach IMHO.

There are two separate things at play - first we have the listener socket,
and second we have the I/O watch that monitors for incoming clients.

The current code here closes the listener, and returns FALSE to unregister
the event loop watch.

You're reversing both of these so that we keep the listener open and we
keep monitoring for incoming clients. Ignoring migration resume for a
minute, this means that the destination QEMU will now accept arbitrarily
many incoming clients and keep trying to start a new incoming migration.

The behaviour we need is different. We *want* to unregister the event
loop watch once we've accepted a client. We should keep only the socket
listener in existence, but *not* accept any more clients. Only once we
have hit a problem and want to accept a new client for migration
recovery should we re-add the event loop watch.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


* Re: [Qemu-devel] [RFC 17/29] migration: rebuild channel on source
  2017-07-28  8:06 ` [Qemu-devel] [RFC 17/29] migration: rebuild channel on source Peter Xu
@ 2017-08-01 10:59   ` Dr. David Alan Gilbert
  2017-08-02  6:14     ` Peter Xu
  0 siblings, 1 reply; 116+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-01 10:59 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> This patch detects the "resume" flag of the migration command, and
> rebuilds the channels only if the flag is set.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/migration.c | 52 ++++++++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 41 insertions(+), 11 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 36ff8c3..64de0ee 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1244,6 +1244,15 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
>      MigrationState *s = migrate_get_current();
>      const char *p;
>  
> +    if (has_resume && resume) {
> +        if (s->state != MIGRATION_STATUS_POSTCOPY_PAUSED) {
> +            error_setg(errp, "Cannot resume if there is no "
> +                       "paused migration");
> +            return;
> +        }
> +        goto do_resume;
> +    }
> +
>      if (migration_is_setup_or_active(s->state) ||
>          s->state == MIGRATION_STATUS_CANCELLING ||
>          s->state == MIGRATION_STATUS_COLO) {
> @@ -1279,6 +1288,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
>  
>      s = migrate_init();
>  
> +do_resume:

Can we find a way to avoid this label?
Perhaps split the bottom half of this function out into a separate
function?

Dave

>      if (strstart(uri, "tcp:", &p)) {
>          tcp_start_outgoing_migration(s, p, &local_err);
>  #ifdef CONFIG_RDMA
> @@ -1700,7 +1710,8 @@ out:
>      return NULL;
>  }
>  
> -static int open_return_path_on_source(MigrationState *ms)
> +static int open_return_path_on_source(MigrationState *ms,
> +                                      bool create_thread)
>  {
>  
>      ms->rp_state.from_dst_file = qemu_file_get_return_path(ms->to_dst_file);
> @@ -1709,6 +1720,12 @@ static int open_return_path_on_source(MigrationState *ms)
>      }
>  
>      trace_open_return_path_on_source();
> +
> +    if (!create_thread) {
> +        /* We're done */
> +        return 0;
> +    }
> +
>      qemu_thread_create(&ms->rp_state.rp_thread, "return path",
>                         source_return_path_thread, ms, QEMU_THREAD_JOINABLE);
>  
> @@ -2249,15 +2266,24 @@ static void *migration_thread(void *opaque)
>  
>  void migrate_fd_connect(MigrationState *s)
>  {
> -    s->expected_downtime = s->parameters.downtime_limit;
> -    s->cleanup_bh = qemu_bh_new(migrate_fd_cleanup, s);
> +    int64_t rate_limit;
> +    bool resume = s->state == MIGRATION_STATUS_POSTCOPY_PAUSED;
>  
> -    qemu_file_set_blocking(s->to_dst_file, true);
> -    qemu_file_set_rate_limit(s->to_dst_file,
> -                             s->parameters.max_bandwidth / XFER_LIMIT_RATIO);
> +    if (resume) {
> +        /* This is a resumed migration */
> +        rate_limit = INT64_MAX;
> +    } else {
> +        /* This is a fresh new migration */
> +        rate_limit = s->parameters.max_bandwidth / XFER_LIMIT_RATIO;
> +        s->expected_downtime = s->parameters.downtime_limit;
> +        s->cleanup_bh = qemu_bh_new(migrate_fd_cleanup, s);
>  
> -    /* Notify before starting migration thread */
> -    notifier_list_notify(&migration_state_notifiers, s);
> +        /* Notify before starting migration thread */
> +        notifier_list_notify(&migration_state_notifiers, s);
> +    }
> +
> +    qemu_file_set_rate_limit(s->to_dst_file, rate_limit);
> +    qemu_file_set_blocking(s->to_dst_file, true);
>  
>      /*
>       * Open the return path. For postcopy, it is used exclusively. For
> @@ -2265,15 +2291,19 @@ void migrate_fd_connect(MigrationState *s)
>       * QEMU uses the return path.
>       */
>      if (migrate_postcopy_ram() || migrate_use_return_path()) {
> -        if (open_return_path_on_source(s)) {
> +        if (open_return_path_on_source(s, !resume)) {
>              error_report("Unable to open return-path for postcopy");
> -            migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
> -                              MIGRATION_STATUS_FAILED);
> +            migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
>              migrate_fd_cleanup(s);
>              return;
>          }
>      }
>  
> +    if (resume) {
> +        /* TODO: do the resume logic */
> +        return;
> +    }
> +
>      qemu_thread_create(&s->thread, "live_migration", migration_thread, s,
>                         QEMU_THREAD_JOINABLE);
>      s->migration_thread_running = true;
> -- 
> 2.7.4
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC 16/29] qmp: hmp: add migrate "resume" option
  2017-07-28  8:06 ` [Qemu-devel] [RFC 16/29] qmp: hmp: add migrate "resume" option Peter Xu
  2017-07-28 15:57   ` Eric Blake
  2017-08-01 10:42   ` Dr. David Alan Gilbert
@ 2017-08-01 11:03   ` Daniel P. Berrange
  2017-08-02  5:56     ` Peter Xu
  2 siblings, 1 reply; 116+ messages in thread
From: Daniel P. Berrange @ 2017-08-01 11:03 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Andrea Arcangeli, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert

On Fri, Jul 28, 2017 at 04:06:25PM +0800, Peter Xu wrote:
> It will be used when we want to resume a paused migration.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hmp-commands.hx       | 7 ++++---
>  hmp.c                 | 4 +++-
>  migration/migration.c | 2 +-
>  qapi-schema.json      | 5 ++++-
>  4 files changed, 12 insertions(+), 6 deletions(-)

I'm not seeing explicit info about how we handle the original failure
and how it relates to this resume command, but this feels like a
potentially racy approach to me.

If we have a network problem between source & target, we could see
two results. Either the TCP stream will simply hang (it'll still
appear open to QEMU but no traffic will be flowing), or the connection
may actually break such that we get EOF and end up closing the file
descriptor.

In the latter case, we're ok because the original channel is now
gone and we can safely establish the new one by issuing the new
'migrate --resume URI' command.

In the former case, however, there is the possibility that the
hang may come back to life at some point, concurrently with us
trying to do 'migrate --resume URI' and I'm unclear on the
semantics if that happens.

Should the original connection carry on, and thus cause the
'migrate --resume' command to fail, or will we forcibly terminate
the original connection no matter what and use the new "resumed"
connection?

There's also synchronization with the target host - at the time we
want to recover, we need to be able to tell the target to accept
new incoming clients again, but we don't want to do that if the
original connection comes back to life.

It feels to me that if the mgmt app or admin believes the migration
is in a stuck state, we should be able to explicitly terminate the
existing connection via a monitor command. Then set up the target
host to accept new clients, and then issue this migrate resume on
the source.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


* Re: [Qemu-devel] [RFC 18/29] migration: new state "postcopy-recover"
  2017-07-28  8:06 ` [Qemu-devel] [RFC 18/29] migration: new state "postcopy-recover" Peter Xu
@ 2017-08-01 11:36   ` Dr. David Alan Gilbert
  2017-08-02  6:42     ` Peter Xu
  0 siblings, 1 reply; 116+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-01 11:36 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> Introducing a new migration state, "postcopy-recover". If a migration
> procedure is paused and the connection is rebuilt successfully
> afterward, we'll switch the source VM state from "postcopy-paused" to
> the new state "postcopy-recover", then we'll do the resume logic in the
> migration thread (along with the return path thread).
> 
> This patch only does the state switch on the source side. A follow-up
> patch will handle the state switching on the destination side using the
> same status bit.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/migration.c | 45 +++++++++++++++++++++++++++++++++++++++++----
>  qapi-schema.json      |  4 +++-
>  2 files changed, 44 insertions(+), 5 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 64de0ee..3aabe11 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -495,6 +495,7 @@ static bool migration_is_setup_or_active(int state)
>      case MIGRATION_STATUS_ACTIVE:
>      case MIGRATION_STATUS_POSTCOPY_ACTIVE:
>      case MIGRATION_STATUS_POSTCOPY_PAUSED:
> +    case MIGRATION_STATUS_POSTCOPY_RECOVER:
>      case MIGRATION_STATUS_SETUP:
>          return true;
>  
> @@ -571,6 +572,7 @@ MigrationInfo *qmp_query_migrate(Error **errp)
>      case MIGRATION_STATUS_CANCELLING:
>      case MIGRATION_STATUS_POSTCOPY_ACTIVE:
>      case MIGRATION_STATUS_POSTCOPY_PAUSED:
> +    case MIGRATION_STATUS_POSTCOPY_RECOVER:
>           /* TODO add some postcopy stats */
>          info->has_status = true;
>          info->has_total_time = true;
> @@ -2018,6 +2020,13 @@ static bool postcopy_should_start(MigrationState *s)
>      return atomic_read(&s->start_postcopy) || s->start_postcopy_fast;
>  }
>  
> +/* Return zero if success, or <0 for error */
> +static int postcopy_do_resume(MigrationState *s)
> +{
> +    /* TODO: do the resume logic */
> +    return 0;
> +}
> +
>  /*
>   * We don't return until we are in a safe state to continue current
>   * postcopy migration.  Returns true to continue the migration, or
> @@ -2026,7 +2035,9 @@ static bool postcopy_should_start(MigrationState *s)
>  static bool postcopy_pause(MigrationState *s)
>  {
>      assert(s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
> -    migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> +
> +do_pause:
> +    migrate_set_state(&s->state, s->state,
>                        MIGRATION_STATUS_POSTCOPY_PAUSED);
>  
>      /* Current channel is possibly broken. Release it. */
> @@ -2043,9 +2054,32 @@ static bool postcopy_pause(MigrationState *s)
>          qemu_sem_wait(&s->postcopy_pause_sem);
>      }
>  
> -    trace_postcopy_pause_continued();
> +    if (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
> +        /* We were waken up by a recover procedure. Give it a shot */
>  
> -    return true;
> +        /*
> +         * Firstly, let's wake up the return path now, with a new
> +         * return path channel.
> +         */
> +        qemu_sem_post(&s->postcopy_pause_rp_sem);
> +
> +        /* Do the resume logic */
> +        if (postcopy_do_resume(s) == 0) {
> +            /* Let's continue! */
> +            trace_postcopy_pause_continued();
> +            return true;
> +        } else {
> +            /*
> +             * Something wrong happened during the recovery, let's
> +             * pause again. Pause is always better than throwing data
> +             * away.
> +             */
> +            goto do_pause;

You should be able to turn this around into a do {} while or similar
rather than goto.
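
The do {} while shape being suggested could look roughly like this — a self-contained sketch where the states, wait_for_state_change(), and postcopy_do_resume_stub() are simplified stand-ins for the QEMU ones, not the actual implementation:

```c
#include <stdbool.h>

enum { POSTCOPY_PAUSED, POSTCOPY_RECOVER, POSTCOPY_FAILED };

int fake_state;          /* state observed after the pause-sem wait */
int resume_attempts;     /* how many times recovery must fail first */

/* Stand-in for the qemu_sem_wait() loop on postcopy_pause_sem. */
int wait_for_state_change(void)
{
    return fake_state;
}

/* Stand-in for postcopy_do_resume(): fail N times, then succeed. */
int postcopy_do_resume_stub(void)
{
    return resume_attempts-- > 0 ? -1 : 0;
}

bool postcopy_pause_sketch(void)
{
    do {
        /* enter PAUSED, release the broken channel, wait ... */
        if (wait_for_state_change() != POSTCOPY_RECOVER) {
            return false;             /* told to quit */
        }
        /* wake the return path, then try to resume; on error, loop
         * and pause again instead of using "goto do_pause" */
    } while (postcopy_do_resume_stub() != 0);

    return true;                      /* recovered, continue migration */
}
```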

Dave

> +        }
> +    } else {
> +        /* This is not right... Time to quit. */
> +        return false;
> +    }
>  }
>  
>  /* Return true if we want to stop the migration, otherwise false. */
> @@ -2300,7 +2334,10 @@ void migrate_fd_connect(MigrationState *s)
>      }
>  
>      if (resume) {
> -        /* TODO: do the resume logic */
> +        /* Wakeup the main migration thread to do the recovery */
> +        migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_PAUSED,
> +                          MIGRATION_STATUS_POSTCOPY_RECOVER);
> +        qemu_sem_post(&s->postcopy_pause_sem);
>          return;
>      }
>  
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 27b7c4c..10f1f60 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -669,6 +669,8 @@
>  #
>  # @postcopy-paused: during postcopy but paused. (since 2.10)
>  #
> +# @postcopy-recover: trying to recover from a paused postcopy. (since 2.11)
> +#
>  # @completed: migration is finished.
>  #
>  # @failed: some error occurred during migration process.
> @@ -682,7 +684,7 @@
>  { 'enum': 'MigrationStatus',
>    'data': [ 'none', 'setup', 'cancelling', 'cancelled',
>              'active', 'postcopy-active', 'postcopy-paused',
> -            'completed', 'failed', 'colo' ] }
> +            'postcopy-recover', 'completed', 'failed', 'colo' ] }
>  
>  ##
>  # @MigrationInfo:
> -- 
> 2.7.4
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC 04/29] bitmap: introduce bitmap_invert()
  2017-08-01  8:40       ` Dr. David Alan Gilbert
@ 2017-08-02  3:20         ` Peter Xu
  0 siblings, 0 replies; 116+ messages in thread
From: Peter Xu @ 2017-08-02  3:20 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

On Tue, Aug 01, 2017 at 09:40:09AM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Mon, Jul 31, 2017 at 06:11:56PM +0100, Dr. David Alan Gilbert wrote:
> > > * Peter Xu (peterx@redhat.com) wrote:
> > > > It is used to invert the whole bitmap.
> > > 
> > > Would it be easier to change bitmap_complement to use ^
> > > in it's macro and slow_bitmap_complement, and then you could call it
> > > with src==dst  to do the same thing with just that small change?
> > 
> > Or, I can directly use that and drop this patch. :-)
> 
> Yes, that's fine - note the only difference I see is what happens to the
> bits in the last word after the end of the count; your code leaves them
> as is, the complement code will zero them on the destination I think.

I see.  I believe both should work, since bitmap users should not use
those bits anyway (those bits are outside the range of valid bits
declared for the bitmap).  Thanks,
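
The whole-bitmap inversion under discussion can be sketched in a self-contained way like this (simplified stand-ins for QEMU's bitmap helpers; the point is the last-word mask, which is what zeroes the spare bits Dave mentions):

```c
#include <limits.h>

#define BITS_PER_LONG (CHAR_BIT * (int)sizeof(unsigned long))
#define BITMAP_LAST_WORD_MASK(nbits) \
    (((nbits) % BITS_PER_LONG) ? (1UL << ((nbits) % BITS_PER_LONG)) - 1 : ~0UL)

/* Invert src into dst (src == dst is fine, giving in-place inversion).
 * Whole words are simply complemented; the partial last word is masked
 * so bits beyond nbits are left zero, matching complement-style
 * semantics rather than leaving them untouched. */
void bitmap_invert(unsigned long *dst, const unsigned long *src, long nbits)
{
    long k, lim = nbits / BITS_PER_LONG;

    for (k = 0; k < lim; k++) {
        dst[k] = ~src[k];                  /* same as src[k] ^ ~0UL */
    }
    if (nbits % BITS_PER_LONG) {
        dst[k] = ~src[k] & BITMAP_LAST_WORD_MASK(nbits);
    }
}
```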

-- 
Peter Xu


* Re: [Qemu-devel] [RFC 03/29] io: fix qio_channel_socket_accept err handling
  2017-08-01  8:55         ` Dr. David Alan Gilbert
@ 2017-08-02  3:21           ` Peter Xu
  0 siblings, 0 replies; 116+ messages in thread
From: Peter Xu @ 2017-08-02  3:21 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Daniel P. Berrange, qemu-devel, Laurent Vivier, Alexey Perevalov,
	Juan Quintela, Andrea Arcangeli, stefanha

On Tue, Aug 01, 2017 at 09:55:08AM +0100, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrange (berrange@redhat.com) wrote:
> > On Tue, Aug 01, 2017 at 10:25:19AM +0800, Peter Xu wrote:
> > > On Mon, Jul 31, 2017 at 05:53:39PM +0100, Dr. David Alan Gilbert wrote:
> > > > * Peter Xu (peterx@redhat.com) wrote:
> > > > > When accept failed, we should setup errp with the reason. More
> > > > > importantly, the caller may assume errp be non-NULL when error happens,
> > > > > and not setting the errp may crash QEMU.
> > > > > 
> > > > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > > > ---
> > > > >  io/channel-socket.c | 1 +
> > > > >  1 file changed, 1 insertion(+)
> > > > > 
> > > > > diff --git a/io/channel-socket.c b/io/channel-socket.c
> > > > > index 53386b7..7bc308e 100644
> > > > > --- a/io/channel-socket.c
> > > > > +++ b/io/channel-socket.c
> > > > > @@ -344,6 +344,7 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
> > > > >          if (errno == EINTR) {
> > > > >              goto retry;
> > > > >          }
> > > > > +        error_setg_errno(errp, errno, "Unable to accept connection");
> > > > >          goto error;
> > > > 
> > > > OK, but this code actually has a bigger problem as well:
> > > > 
> > > > the original is:
> > > > 
> > > >     cioc->fd = qemu_accept(ioc->fd, (struct sockaddr *)&cioc->remoteAddr,
> > > >                            &cioc->remoteAddrLen);
> > > >     if (cioc->fd < 0) {
> > > >         trace_qio_channel_socket_accept_fail(ioc);
> > > >         if (errno == EINTR) {
> > > >             goto retry;
> > > >         }
> > > >         goto error;
> > > >     }
> > > > 
> > > > Stefan confirmed that trace_ doesn't preserve errno; so the if
> > > > following it is wrong.  It needs to preserve errno.
> > > 
> > > Ah... If so, I am not sure whether we can preserve errno in the trace
> > > code in general?
> > > 
> > > For this one, I can just move the trace_*() below the errno check.
> > > After all, if EINTR is got, it's not really a fail, so imho we should
> > > not trace it with "accept fail".
> > 
> > Agreed, we just need to move the trace below the if.
> 
> Peter: Can you split this as a separate patch and it seems OK to try and
> put this in 2.10 since it's a strict bug fix.

Sure!  Then I'll possibly include the comment fix patch as well.
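
The agreed fix — check errno before any call that may clobber it — can be sketched self-contained like this (fake_accept and trace_accept_fail are stand-ins for qemu_accept() and the trace call, purely for illustration):

```c
#include <errno.h>

int eintr_budget;    /* how many times the stub "syscall" is interrupted */
int trace_calls;

/* Stand-in for qemu_accept(): interrupted eintr_budget times, then ok. */
int fake_accept(void)
{
    if (eintr_budget > 0) {
        eintr_budget--;
        errno = EINTR;
        return -1;
    }
    return 42;                  /* a valid fd */
}

/* Stand-in for trace_qio_channel_socket_accept_fail(): like the real
 * trace machinery, it may clobber errno. */
void trace_accept_fail(void)
{
    trace_calls++;
    errno = 0;
}

int accept_with_retry(void)
{
    int fd;

retry:
    fd = fake_accept();
    if (fd < 0) {
        if (errno == EINTR) {   /* test errno first ... */
            goto retry;
        }
        trace_accept_fail();    /* ... then trace the real failure */
        return -1;
    }
    return fd;
}
```

With the trace call moved below the EINTR check, an interrupted accept is retried silently and never logged as a failure.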

-- 
Peter Xu


* Re: [Qemu-devel] [RFC 10/29] migration: new property "x-postcopy-fast"
  2017-08-01  8:50       ` Dr. David Alan Gilbert
@ 2017-08-02  3:31         ` Peter Xu
  0 siblings, 0 replies; 116+ messages in thread
From: Peter Xu @ 2017-08-02  3:31 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

On Tue, Aug 01, 2017 at 09:50:02AM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Mon, Jul 31, 2017 at 07:52:24PM +0100, Dr. David Alan Gilbert wrote:
> > > * Peter Xu (peterx@redhat.com) wrote:
> > > > This provides a way to start postcopy ASAP when migration starts. To do
> > > > this, we need both:
> > > > 
> > > >   -global migration.x-postcopy-ram=on \
> > > >   -global migration.x-postcopy-fast=on
> > > 
> > > Can you explain why this is necessary?  Both sides already know
> > > they're doing a postcopy recovery don't they?
> > 
> > What I wanted to do here is to provide a way to start postcopy at the
> > very beginning (it will possibly start postcopy at the first
> > loop in migration_thread), instead of starting postcopy only when we
> > trigger it using the "migrate_start_postcopy" command.
> > 
> > I used it for easier debugging (so I don't need to type
> > "migrate_start_postcopy" every time I trigger a postcopy
> > migration); meanwhile I think it can also be used when someone really
> > wants to start postcopy from the very beginning.
> > 
> > Would such a new parameter makes sense?
> 
> Other than debugging, I don't think there's a real use for it; the
> slight delay between starting migration and triggering postcopy has
> very little cost.

Then let me drop this patch in the next version. I do think I should
avoid introducing too many things "for debugging only"...

-- 
Peter Xu


* Re: [Qemu-devel] [RFC 12/29] migration: allow dst vm pause on postcopy
  2017-08-01  9:47   ` Dr. David Alan Gilbert
@ 2017-08-02  5:06     ` Peter Xu
  2017-08-03 14:03       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-08-02  5:06 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

On Tue, Aug 01, 2017 at 10:47:16AM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:

[...]

> > +/* Return true if we should continue the migration, or false. */
> > +static bool postcopy_pause_incoming(MigrationIncomingState *mis)
> > +{
> > +    trace_postcopy_pause_incoming();
> > +
> > +    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> > +                      MIGRATION_STATUS_POSTCOPY_PAUSED);
> > +
> > +    assert(mis->from_src_file);
> > +    qemu_file_shutdown(mis->from_src_file);
> > +    qemu_fclose(mis->from_src_file);
> > +    mis->from_src_file = NULL;
> > +
> > +    assert(mis->to_src_file);
> > +    qemu_mutex_lock(&mis->rp_mutex);
> > +    qemu_file_shutdown(mis->to_src_file);
> > +    qemu_fclose(mis->to_src_file);
> > +    mis->to_src_file = NULL;
> > +    qemu_mutex_unlock(&mis->rp_mutex);
> 
> Hmm is that safe?  If we look at migrate_send_rp_message we have:
> 
>     static void migrate_send_rp_message(MigrationIncomingState *mis,
>                                         enum mig_rp_message_type message_type,
>                                         uint16_t len, void *data)
>     {
>         trace_migrate_send_rp_message((int)message_type, len);
>         qemu_mutex_lock(&mis->rp_mutex);
>         qemu_put_be16(mis->to_src_file, (unsigned int)message_type);
>         qemu_put_be16(mis->to_src_file, len);
>         qemu_put_buffer(mis->to_src_file, data, len);
>         qemu_fflush(mis->to_src_file);
>         qemu_mutex_unlock(&mis->rp_mutex);
>     }
> 
> If we came into postcopy_pause_incoming at about the same time
> migrate_send_rp_message was being called and pause_incoming took the
> lock first, then once it releases the lock, send_rp_message carries on
> and uses mis->to_src_file, which is now NULL.
> 
> One solution here is to just call qemu_file_shutdown() but leave the
> files open at this point, but clean the files up sometime later.
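
The "shutdown but don't free yet" idea above can be sketched self-contained like this (the struct and helpers are simplified stand-ins for QEMUFile and migrate_send_rp_message, only to show the race-safety argument): the pause path merely shuts the channel down, so a late sender still sees a valid object and fails gracefully instead of dereferencing NULL.

```c
#include <stdbool.h>
#include <stddef.h>

struct fake_qemu_file {
    bool shut_down;     /* writes fail once set, but the object stays alive */
};

/* Pause path: shut the underlying channel, keep the object around. */
void qemu_file_shutdown_stub(struct fake_qemu_file *f)
{
    f->shut_down = true;
}

/* Late sender: safe even if it races with the pause path, because
 * f is never set to NULL while it may still be in use. */
int send_rp_message_stub(struct fake_qemu_file *f)
{
    if (f == NULL || f->shut_down) {
        return -1;      /* write fails, but no NULL dereference */
    }
    return 0;
}

int pause_then_send(struct fake_qemu_file *f)
{
    qemu_file_shutdown_stub(f);     /* pause path: shutdown only */
    return send_rp_message_stub(f); /* late sender: fails gracefully */
}
```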

I see the comment on patch 14 as well - yeah, we need patch 14 to
cooperate here, and as long as we have patch 14, we should be ok.

> 
> > +
> > +    while (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
> > +        qemu_sem_wait(&mis->postcopy_pause_sem_dst);
> > +    }
> > +
> > +    trace_postcopy_pause_incoming_continued();
> > +
> > +    return true;
> > +}
> > +
> >  static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
> >  {
> >      uint8_t section_type;
> >      int ret = 0;
> >  
> > +retry:
> >      while (true) {
> >          section_type = qemu_get_byte(f);
> >  
> > @@ -2004,6 +2034,21 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
> >  out:
> >      if (ret < 0) {
> >          qemu_file_set_error(f, ret);
> > +
> > +        /*
> > +         * Detect whether it is:
> > +         *
> > +         * 1. postcopy running
> > +         * 2. network failure (-EIO)
> > +         *
> > +         * If so, we try to wait for a recovery.
> > +         */
> > +        if (mis->state == MIGRATION_STATUS_POSTCOPY_ACTIVE &&
> > +            ret == -EIO && postcopy_pause_incoming(mis)) {
> > +            /* Reset f to point to the newly created channel */
> > +            f = mis->from_src_file;
> > +            goto retry;
> > +        }
> 
> I wonder if:
> 
>            if (mis->state == MIGRATION_STATUS_POSTCOPY_ACTIVE &&
>                ret == -EIO && postcopy_pause_incoming(mis)) {
>                /* Try again after postcopy recovery */
>                return qemu_loadvm_state_main(mis->from_src_file, mis);
>            }
> would be nicer; it avoids the goto loop.

I agree we should avoid using goto loops. However, I do see many usages
of goto like this one when we want to redo part of the procedure of a
function (or, of course, when handling errors in "C style").

Calling qemu_loadvm_state_main() inside itself is ok as well, but it
also has a defect: stack usage would be unbounded, or even controllable
by malicious users. E.g., if someone used a program to periodically
stop/start a network endpoint along the migration path, QEMU could go
into a paused -> recovery -> active -> paused ... loop, and stack usage
would just grow with time. I'd say it's an extreme example though...

(Another way besides above two: maybe we can just return in
 qemu_loadvm_state_main with something like -EAGAIN, then the caller
 of qemu_loadvm_state_main can re-call it when necessary, though I
 would prefer "goto is okay here"... :)
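
That -EAGAIN alternative could look roughly like this — a hypothetical shape with stub names, not what the series implements: the load function returns -EAGAIN after a successful pause/recover, and the caller loops, so stack depth stays constant across repeated recoveries.

```c
#include <errno.h>

int failures_left;      /* how many recoverable network errors to simulate */

/* Stand-in for qemu_loadvm_state_main(): on a recoverable error it
 * would pause, wait for recovery, then report -EAGAIN to its caller. */
int loadvm_state_main_stub(void)
{
    if (failures_left > 0) {
        failures_left--;
        return -EAGAIN; /* paused and recovered; caller should retry */
    }
    return 0;           /* stream fully consumed */
}

int loadvm_state_with_retry(void)
{
    int ret;

    do {
        ret = loadvm_state_main_stub();
        /* on -EAGAIN, the new from_src_file would be picked up here */
    } while (ret == -EAGAIN);

    return ret;
}
```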

-- 
Peter Xu


* Re: [Qemu-devel] [RFC 16/29] qmp: hmp: add migrate "resume" option
  2017-08-01 11:03   ` Daniel P. Berrange
@ 2017-08-02  5:56     ` Peter Xu
  2017-08-02  9:28       ` Daniel P. Berrange
  0 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-08-02  5:56 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: qemu-devel, Laurent Vivier, Andrea Arcangeli, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert

On Tue, Aug 01, 2017 at 12:03:48PM +0100, Daniel P. Berrange wrote:
> On Fri, Jul 28, 2017 at 04:06:25PM +0800, Peter Xu wrote:
> > It will be used when we want to resume a paused migration.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  hmp-commands.hx       | 7 ++++---
> >  hmp.c                 | 4 +++-
> >  migration/migration.c | 2 +-
> >  qapi-schema.json      | 5 ++++-
> >  4 files changed, 12 insertions(+), 6 deletions(-)
> 
> I'm not seeing explicit info about how we handle the original failure
> and how it relates to this resume command, but this feels like a
> potentially racy approach to me.
> 
> If we have a network problem between source & target, we could see
> two results. Either the TCP stream will simply hang (it'll still
> appear open to QEMU but no traffic will be flowing),

(let's say this is the "1st condition")

> or the connection
> may actually break such that we get EOF and end up closing the file
> descriptor.

(let's say this is the "2nd condition")

> 
> In the latter case, we're ok because the original channel is now
> gone and we can safely establish the new one by issuing the new
> 'migrate --resume URI' command.
> 
> In the former case, however, there is the possibility that the
> hang may come back to life at some point, concurrently with us
> trying to do 'migrate --resume URI' and I'm unclear on the
> semantics if that happens.
> 
> Should the original connection carry on, and thus cause the
> 'migrate --resume' command to fail, or will we forcably terminate
> the original connection no matter what and use the new "resumed"
> connection.

Hmm yes, this is a good question. Currently this series only handles
the 2nd condition, that is, when we can detect the error via system
calls (IIUC we know nothing when the 1st condition is encountered; we
just e.g. block in the system calls as usual when reading the file
handle). And currently the "resume" command is only allowed if the 2nd
condition is detected (so it will never destroy an existing channel).

If you see the next following patch, there is something like:

    if (has_resume && resume) {
        if (s->state != MIGRATION_STATUS_POSTCOPY_PAUSED) {
            error_setg(errp, "Cannot resume if there is no "
                       "paused migration");
            return;
        }
        goto do_resume;
    }

And here MIGRATION_STATUS_POSTCOPY_PAUSED will only be set when the
2nd condition is met.

> 
> There's also synchronization with the target host - at the time we
> want to recover, we need to be able to tell the target to accept
> new incoming clients again, but we don't want to do that if the
> original connection comes back to life.

Yeah, I hacked this part in this v1 series (as you may have seen) to
keep the ports open forever. I am not sure whether that is acceptable,
but it looks like it is not. :)

How about this: when the destination detects the 2nd condition, it
first switches to the "postcopy-pause" state, then re-opens the accept
channels. It can turn the accept channels off again when the state
moves out of "postcopy-pause".

> 
> It feels to me that if the mgmt app or admin believes the migration
> is in a stuck state, we should be able to explicitly terminate the
> existing connection via a monitor command. Then setup the target
> host to accept new client, and then issue this migrate resume on
> the source.

Totally agree. That should be the only way to handle the 1st condition
well. However, would you mind if I postpone it a bit? IMHO as long as
we can solve the 2nd condition nicely (which is the goal of this
series), it won't be too hard to continue supporting the 1st
condition.

Since we are at here discussing the usage model... maybe I can further
extend it a bit to gain more input.

IMHO in general there are two phases for the recovery (assume we are
always talking about postcopy):

  active --> paused --> recovery --> active
               [1]         [2]

For [1]: the 1st condition we discussed above can be seen as a "manual
pause" - the user can issue a command to forcibly discard the existing
migration channel. The 2nd condition is the "automatic pause" (what
this series does): when QEMU detects a network problem, it
automatically switches to the paused state.

For [2]: we are always doing it in the "manual" way: we need a command
to trigger the recovery.

What I am wondering is whether it would make sense in the future to do
the "automatic" thing for [2] as well. In that mode, the source
periodically probes the connectivity of the existing (broken) migration
channel, and auto-reconnects once it finds the network has recovered.
We can add a new capability bit for it (e.g.,
"postcopy-auto-recovery"), showing whether we would like the
"automatic recovery" to happen.

If we put these into a matrix:

|------------+---------------+----------------------------------------|
| Pause mode | Recovery mode | Use case                               |
|------------+---------------+----------------------------------------|
| manual     | manual        | 1st condition mentioned above          |
|            | auto          | (I *guess* we don't need this one)     |
|------------+---------------+----------------------------------------|
| auto       | manual        | 2nd condition mentioned above          |
|            | auto          | (will we want this one in the future?) |
|------------+---------------+----------------------------------------|

Thanks,

-- 
Peter Xu


* Re: [Qemu-devel] [RFC 17/29] migration: rebuild channel on source
  2017-08-01 10:59   ` Dr. David Alan Gilbert
@ 2017-08-02  6:14     ` Peter Xu
  0 siblings, 0 replies; 116+ messages in thread
From: Peter Xu @ 2017-08-02  6:14 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

On Tue, Aug 01, 2017 at 11:59:07AM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > This patch detects the "resume" flag of the migration command, and
> > rebuilds the channels only if the flag is set.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  migration/migration.c | 52 ++++++++++++++++++++++++++++++++++++++++-----------
> >  1 file changed, 41 insertions(+), 11 deletions(-)
> > 
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 36ff8c3..64de0ee 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -1244,6 +1244,15 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
> >      MigrationState *s = migrate_get_current();
> >      const char *p;
> >  
> > +    if (has_resume && resume) {
> > +        if (s->state != MIGRATION_STATUS_POSTCOPY_PAUSED) {
> > +            error_setg(errp, "Cannot resume if there is no "
> > +                       "paused migration");
> > +            return;
> > +        }
> > +        goto do_resume;
> > +    }
> > +
> >      if (migration_is_setup_or_active(s->state) ||
> >          s->state == MIGRATION_STATUS_CANCELLING ||
> >          s->state == MIGRATION_STATUS_COLO) {
> > @@ -1279,6 +1288,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
> >  
> >      s = migrate_init();
> >  
> > +do_resume:
> 
> Can we find a way to avoid this label?
> Perhaps split the bottom half of this function out into a separate
> function?

Yes this label can indeed be avoided (sorry for my laziness). Will
take the suggestion.  Thanks,

-- 
Peter Xu


* Re: [Qemu-devel] [RFC 18/29] migration: new state "postcopy-recover"
  2017-08-01 11:36   ` Dr. David Alan Gilbert
@ 2017-08-02  6:42     ` Peter Xu
  0 siblings, 0 replies; 116+ messages in thread
From: Peter Xu @ 2017-08-02  6:42 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

On Tue, Aug 01, 2017 at 12:36:22PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:

[...]
> > @@ -2043,9 +2054,32 @@ static bool postcopy_pause(MigrationState *s)
> >          qemu_sem_wait(&s->postcopy_pause_sem);
> >      }
> >  
> > -    trace_postcopy_pause_continued();
> > +    if (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
> > +        /* We were waken up by a recover procedure. Give it a shot */
> >  
> > -    return true;
> > +        /*
> > +         * Firstly, let's wake up the return path now, with a new
> > +         * return path channel.
> > +         */
> > +        qemu_sem_post(&s->postcopy_pause_rp_sem);
> > +
> > +        /* Do the resume logic */
> > +        if (postcopy_do_resume(s) == 0) {
> > +            /* Let's continue! */
> > +            trace_postcopy_pause_continued();
> > +            return true;
> > +        } else {
> > +            /*
> > +             * Something wrong happened during the recovery, let's
> > +             * pause again. Pause is always better than throwing data
> > +             * away.
> > +             */
> > +            goto do_pause;
> 
> You should be able to turn this around into a do {} while or similar
> rather than goto.

Indeed. Fixing up.  Thanks,

-- 
Peter Xu


* Re: [Qemu-devel] [RFC 19/29] migration: let dst listen on port always
  2017-08-01 10:56   ` Daniel P. Berrange
@ 2017-08-02  7:02     ` Peter Xu
  2017-08-02  9:26       ` Daniel P. Berrange
  0 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-08-02  7:02 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: qemu-devel, Laurent Vivier, Andrea Arcangeli, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert

On Tue, Aug 01, 2017 at 11:56:10AM +0100, Daniel P. Berrange wrote:
> On Fri, Jul 28, 2017 at 04:06:28PM +0800, Peter Xu wrote:
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  migration/exec.c   | 2 +-
> >  migration/fd.c     | 2 +-
> >  migration/socket.c | 4 ++--
> >  3 files changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/migration/exec.c b/migration/exec.c
> > index 08b599e..b4412db 100644
> > --- a/migration/exec.c
> > +++ b/migration/exec.c
> > @@ -49,7 +49,7 @@ static gboolean exec_accept_incoming_migration(QIOChannel *ioc,
> >  {
> >      migration_channel_process_incoming(ioc);
> >      object_unref(OBJECT(ioc));
> > -    return FALSE; /* unregister */
> > +    return TRUE; /* keep it registered */
> >  }
> >  
> >  void exec_start_incoming_migration(const char *command, Error **errp)
> > diff --git a/migration/fd.c b/migration/fd.c
> > index 30f5258..865277a 100644
> > --- a/migration/fd.c
> > +++ b/migration/fd.c
> > @@ -49,7 +49,7 @@ static gboolean fd_accept_incoming_migration(QIOChannel *ioc,
> >  {
> >      migration_channel_process_incoming(ioc);
> >      object_unref(OBJECT(ioc));
> > -    return FALSE; /* unregister */
> > +    return TRUE; /* keep it registered */
> >  }
> >  
> >  void fd_start_incoming_migration(const char *infd, Error **errp)
> > diff --git a/migration/socket.c b/migration/socket.c
> > index 757d382..f2c2d01 100644
> > --- a/migration/socket.c
> > +++ b/migration/socket.c
> > @@ -153,8 +153,8 @@ static gboolean socket_accept_incoming_migration(QIOChannel *ioc,
> >  
> >  out:
> >      /* Close listening socket as its no longer needed */
> > -    qio_channel_close(ioc, NULL);
> > -    return FALSE; /* unregister */
> > +    // qio_channel_close(ioc, NULL);
> > +    return TRUE; /* keep it registered */
> >  }
> 
> 
> This is not a very desirable approach IMHO.
> 
> There are two separate things at play - first we have the listener socket,
> and second we have the I/O watch that monitors for incoming clients.
> 
> The current code here closes the listener, and returns FALSE to unregister
> the event loop watch.
> 
> You're reversing both of these so that we keep the listener open and we
> keep monitoring for incoming clients. Ignoring migration resume for a
> minute, this means that the destination QEMU will now accept arbitrarily
> many incoming clients and keep trying to start a new incoming migration.
> 
> The behaviour we need is different. We *want* to unregister the event
> loop watch once we've accepted a client. We should only keep the socket
> listener in existence, but *not* accept any more clients. Only once we
> have hit a problem and want to accept a new client to do migration
> recovery, should we be re-adding the event loop watch.

I replied with another approach in the other thread: how about we
re-enable the listen port during the "postcopy-pause" state, and
disable the listen port when we get out of that migration state?

By "listen port" I mean both the IO watch and the socket object. What
I can think of is: we keep these objects around permanently, and
meanwhile introduce a new bit for migration, say "accept_incoming", to
decide whether we will really accept a connection. Then we drop any
new connection while that bit is not set.

(Or a new QIOChannelFeature to temporarily refuse incoming
 connection? E.g., QIO_CHANNEL_FEATURE_LISTEN_REFUSE?)

Thanks,

-- 
Peter Xu


* Re: [Qemu-devel] [RFC 19/29] migration: let dst listen on port always
  2017-08-02  7:02     ` Peter Xu
@ 2017-08-02  9:26       ` Daniel P. Berrange
  2017-08-02 11:02         ` Peter Xu
  0 siblings, 1 reply; 116+ messages in thread
From: Daniel P. Berrange @ 2017-08-02  9:26 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Andrea Arcangeli, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert

On Wed, Aug 02, 2017 at 03:02:24PM +0800, Peter Xu wrote:
> On Tue, Aug 01, 2017 at 11:56:10AM +0100, Daniel P. Berrange wrote:
> > On Fri, Jul 28, 2017 at 04:06:28PM +0800, Peter Xu wrote:
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > ---
> > >  migration/exec.c   | 2 +-
> > >  migration/fd.c     | 2 +-
> > >  migration/socket.c | 4 ++--
> > >  3 files changed, 4 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/migration/exec.c b/migration/exec.c
> > > index 08b599e..b4412db 100644
> > > --- a/migration/exec.c
> > > +++ b/migration/exec.c
> > > @@ -49,7 +49,7 @@ static gboolean exec_accept_incoming_migration(QIOChannel *ioc,
> > >  {
> > >      migration_channel_process_incoming(ioc);
> > >      object_unref(OBJECT(ioc));
> > > -    return FALSE; /* unregister */
> > > +    return TRUE; /* keep it registered */
> > >  }
> > >  
> > >  void exec_start_incoming_migration(const char *command, Error **errp)
> > > diff --git a/migration/fd.c b/migration/fd.c
> > > index 30f5258..865277a 100644
> > > --- a/migration/fd.c
> > > +++ b/migration/fd.c
> > > @@ -49,7 +49,7 @@ static gboolean fd_accept_incoming_migration(QIOChannel *ioc,
> > >  {
> > >      migration_channel_process_incoming(ioc);
> > >      object_unref(OBJECT(ioc));
> > > -    return FALSE; /* unregister */
> > > +    return TRUE; /* keep it registered */
> > >  }
> > >  
> > >  void fd_start_incoming_migration(const char *infd, Error **errp)
> > > diff --git a/migration/socket.c b/migration/socket.c
> > > index 757d382..f2c2d01 100644
> > > --- a/migration/socket.c
> > > +++ b/migration/socket.c
> > > @@ -153,8 +153,8 @@ static gboolean socket_accept_incoming_migration(QIOChannel *ioc,
> > >  
> > >  out:
> > >      /* Close listening socket as its no longer needed */
> > > -    qio_channel_close(ioc, NULL);
> > > -    return FALSE; /* unregister */
> > > +    // qio_channel_close(ioc, NULL);
> > > +    return TRUE; /* keep it registered */
> > >  }
> > 
> > 
> > This is not a very desirable approach IMHO.
> > 
> > There are two separate things at play - first we have the listener socket,
> > and second we have the I/O watch that monitors for incoming clients.
> > 
> > The current code here closes the listener, and returns FALSE to unregister
> > the event loop watch.
> > 
> > You're reversing both of these so that we keep the listener open and we
> > keep monitoring for incoming clients. Ignoring migration resume for a
> > minute, this means that the destination QEMU will now accept arbitrarily
> > many incoming clients and keep trying to start a new incoming migration.
> > 
> > The behaviour we need is different. We *want* to unregister the event
> > loop watch once we've accepted a client. We should only keep the socket
> > listener in existence, but *not* accept any more clients. Only once we
> > have hit a problem and want to accept a new client to do migration
> > recovery, should we be re-adding the event loop watch.
> 
> I replied with another approach in the other thread: how about we
> re-enable the listen port during the "postcopy-pause" state, and
> disable the listen port when we get out of that migration state?

Thinking about this again, I realize I only considered the socket
migration backend.  If we are using the 'fd' backend, then we
*must* have an explicit monitor command invoked on the target
host to obtain the new client connection. This is because the
'fd' that QEMU has is the actual client connection, not a listener
socket, so libvirt needs to be able to pass in a new fd.


> By "listen port" I mean both the IO watch and the socket object. What
> I can think of is: we keep these objects around permanently, and
> meanwhile introduce a new bit for migration, say "accept_incoming", to
> decide whether we will really accept a connection. Then we drop any
> new connection while that bit is not set.
> 
> (Or a new QIOChannelFeature to temporarily refuse incoming
>  connection? E.g., QIO_CHANNEL_FEATURE_LISTEN_REFUSE?)

That feature already exists - you just unregister the event loop
watch and re-register it when you're ready again. The chardev
socket code works this way already, since it only allows a single
client at a time.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


* Re: [Qemu-devel] [RFC 16/29] qmp: hmp: add migrate "resume" option
  2017-08-02  5:56     ` Peter Xu
@ 2017-08-02  9:28       ` Daniel P. Berrange
  0 siblings, 0 replies; 116+ messages in thread
From: Daniel P. Berrange @ 2017-08-02  9:28 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Andrea Arcangeli, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert

On Wed, Aug 02, 2017 at 01:56:46PM +0800, Peter Xu wrote:
> On Tue, Aug 01, 2017 at 12:03:48PM +0100, Daniel P. Berrange wrote:
> > On Fri, Jul 28, 2017 at 04:06:25PM +0800, Peter Xu wrote:
> > > It will be used when we want to resume one paused migration.
> > > 
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > ---
> > >  hmp-commands.hx       | 7 ++++---
> > >  hmp.c                 | 4 +++-
> > >  migration/migration.c | 2 +-
> > >  qapi-schema.json      | 5 ++++-
> > >  4 files changed, 12 insertions(+), 6 deletions(-)
> > 
> > I'm not seeing explicit info about how we handle the original failure
> > and how it relates to this resume command, but this feels like a
> > potentially racy approach to me.
> > 
> > If we have a network problem between source & target, we could see
> > two results. Either the TCP stream will simply hang (it'll still
> > appear open to QEMU but no traffic will be flowing),
> 
> (let's say this is the "1st condition")
> 
> > or the connection
> > may actually break such that we get EOF and end up closing the file
> > descriptor.
> 
> (let's say this is the "2nd condition")
> 
> > 
> > In the latter case, we're ok because the original channel is now
> > gone and we can safely establish the new one by issuing the new
> > 'migrate --resume URI' command.
> > 
> > In the former case, however, there is the possibility that the
> > hang may come back to life at some point, concurrently with us
> > trying to do 'migrate --resume URI' and I'm unclear on the
> > semantics if that happens.
> > 
> > Should the original connection carry on, and thus cause the
> > 'migrate --resume' command to fail, or will we forcably terminate
> > the original connection no matter what and use the new "resumed"
> > connection.
> 
> Hmm yes, this is a good question. Currently this series only handles
> the 2nd condition, i.e. when we can detect the error via system calls
> (IIUC we know nothing when the 1st condition is encountered; we just
> block in the system calls as usual when reading the file handle). And
> currently the "resume" command is only allowed if the 2nd condition is
> detected (so it will never destroy an existing channel).
> 
> If you see the next following patch, there is something like:
> 
>     if (has_resume && resume) {
>         if (s->state != MIGRATION_STATUS_POSTCOPY_PAUSED) {
>             error_setg(errp, "Cannot resume if there is no "
>                        "paused migration");
>             return;
>         }
>         goto do_resume;
>     }
> 
> And here MIGRATION_STATUS_POSTCOPY_PAUSED will only be set when the
> 2nd condition is met.
> 
> > 
> > There's also synchronization with the target host - at the time we
> > want to recover, we need to be able to tell the target to accept
> > new incoming clients again, but we don't want to do that if the
> > original connection comes back to life.
> 
> Yeah, I hacked this part in this v1 series (as you may have seen) to
> keep the ports open forever. I am not sure whether that is acceptable,
> but it looks like it is not. :)
> 
> How about this: when the destination detects the 2nd condition, it
> first switches to the "postcopy-pause" state, then re-opens the accept
> channels. It can turn the accept channels off again when the state
> moves out of "postcopy-pause".
> 
> > 
> > It feels to me that if the mgmt app or admin believes the migration
> > is in a stuck state, we should be able to explicitly terminate the
> > existing connection via a monitor command. Then setup the target
> > host to accept new client, and then issue this migrate resume on
> > the source.
> 
> Totally agree. That should be the only way to handle the 1st condition
> well. However, would you mind if I postpone it a bit? IMHO as long as
> we can solve the 2nd condition nicely (which is the goal of this
> series), it won't be too hard to then add support for the 1st
> condition.

Sure, the 1st scenario is an easy bolt-on to the 2nd scenario. I just
wanted to be clear about what the target of these patches is, because
I think the 1st scenario is probably the more common one.

I guess if you have TCP keepalives enabled with a reasonably short
timeout, the 1st scenario will turn into the 2nd scenario fairly
quickly.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


* Re: [Qemu-devel] [RFC 19/29] migration: let dst listen on port always
  2017-08-02  9:26       ` Daniel P. Berrange
@ 2017-08-02 11:02         ` Peter Xu
  0 siblings, 0 replies; 116+ messages in thread
From: Peter Xu @ 2017-08-02 11:02 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: qemu-devel, Laurent Vivier, Andrea Arcangeli, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert

On Wed, Aug 02, 2017 at 10:26:48AM +0100, Daniel P. Berrange wrote:
> On Wed, Aug 02, 2017 at 03:02:24PM +0800, Peter Xu wrote:
> > On Tue, Aug 01, 2017 at 11:56:10AM +0100, Daniel P. Berrange wrote:
> > > On Fri, Jul 28, 2017 at 04:06:28PM +0800, Peter Xu wrote:
> > > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > > ---
> > > >  migration/exec.c   | 2 +-
> > > >  migration/fd.c     | 2 +-
> > > >  migration/socket.c | 4 ++--
> > > >  3 files changed, 4 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/migration/exec.c b/migration/exec.c
> > > > index 08b599e..b4412db 100644
> > > > --- a/migration/exec.c
> > > > +++ b/migration/exec.c
> > > > @@ -49,7 +49,7 @@ static gboolean exec_accept_incoming_migration(QIOChannel *ioc,
> > > >  {
> > > >      migration_channel_process_incoming(ioc);
> > > >      object_unref(OBJECT(ioc));
> > > > -    return FALSE; /* unregister */
> > > > +    return TRUE; /* keep it registered */
> > > >  }
> > > >  
> > > >  void exec_start_incoming_migration(const char *command, Error **errp)
> > > > diff --git a/migration/fd.c b/migration/fd.c
> > > > index 30f5258..865277a 100644
> > > > --- a/migration/fd.c
> > > > +++ b/migration/fd.c
> > > > @@ -49,7 +49,7 @@ static gboolean fd_accept_incoming_migration(QIOChannel *ioc,
> > > >  {
> > > >      migration_channel_process_incoming(ioc);
> > > >      object_unref(OBJECT(ioc));
> > > > -    return FALSE; /* unregister */
> > > > +    return TRUE; /* keep it registered */
> > > >  }
> > > >  
> > > >  void fd_start_incoming_migration(const char *infd, Error **errp)
> > > > diff --git a/migration/socket.c b/migration/socket.c
> > > > index 757d382..f2c2d01 100644
> > > > --- a/migration/socket.c
> > > > +++ b/migration/socket.c
> > > > @@ -153,8 +153,8 @@ static gboolean socket_accept_incoming_migration(QIOChannel *ioc,
> > > >  
> > > >  out:
> > > >      /* Close listening socket as its no longer needed */
> > > > -    qio_channel_close(ioc, NULL);
> > > > -    return FALSE; /* unregister */
> > > > +    // qio_channel_close(ioc, NULL);
> > > > +    return TRUE; /* keep it registered */
> > > >  }
> > > 
> > > 
> > > This is not a very desirable approach IMHO.
> > > 
> > > There are two separate things at play - first we have the listener socket,
> > > and second we have the I/O watch that monitors for incoming clients.
> > > 
> > > The current code here closes the listener, and returns FALSE to unregister
> > > the event loop watch.
> > > 
> > > You're reversing both of these so that we keep the listener open and we
> > > keep monitoring for incoming clients. Ignoring migration resume for a
> > > minute, this means that the destination QEMU will now accept arbitrarily
> > > many incoming clients and keep trying to start a new incoming migration.
> > > 
> > > The behaviour we need is different. We *want* to unregister the event
> > > loop watch once we've accepted a client. We should only keep the socket
> > > listener in existence, but *not* accept any more clients. Only once we
> > > have hit a problem and want to accept a new client to do migration
> > > recovery, should we be re-adding the event loop watch.
> > 
> > I replied with another approach in the other thread: how about we
> > re-enable the listen port during the "postcopy-pause" state, and
> > disable the listen port when we get out of that migration state?
> 
> Thinking about this again, I realize I only considered the socket
> migration backend.  If we are using the 'fd' backend, then we
> *must* have an explicit monitor command invoked on the target
> host to obtain the new client connection. This is because the
> 'fd' that QEMU has is the actual client connection, not a listener
> socket, so libvirt needs to be able to pass in a new fd.

Hmm right... So it looks like I cannot really ignore this issue even
for the first version (I thought I could).

I am not sure whether we can do this: just allow "migrate_incoming
URL" to work even for non-"delayed" cases? Then when we receive that
command, we'll do:

- if there is no existing listening port (when the old URL is one of
  "defer", "exec", "fd"), we create a new listening port using the new
  URL provided

- if there is an existing listening port (when the old URL is one of
  "tcp", "sock", "rdma"), we first release the old listening port and
  resources, then create a new port using the new URL

> 
> 
> > By "listen port" I mean both the IO watch and the socket object. What
> > I can think of is: we keep these objects around permanently, and
> > meanwhile introduce a new bit for migration, say "accept_incoming", to
> > decide whether we will really accept a connection. Then we drop any
> > new connection while that bit is not set.
> > 
> > (Or a new QIOChannelFeature to temporarily refuse incoming
> >  connection? E.g., QIO_CHANNEL_FEATURE_LISTEN_REFUSE?)
> 
> That feature already exists - you just unregister the event loop
> watch and re-register it when you're ready again. The chardev
> socket code works this way already, since it only allows a single
> client at a time.

Ah I see.  Thanks!

-- 
Peter Xu


* Re: [Qemu-devel] [RFC 20/29] migration: wakeup dst ram-load-thread for recover
  2017-07-28  8:06 ` [Qemu-devel] [RFC 20/29] migration: wakeup dst ram-load-thread for recover Peter Xu
@ 2017-08-03  9:28   ` Dr. David Alan Gilbert
  2017-08-04  5:46     ` Peter Xu
  0 siblings, 1 reply; 116+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-03  9:28 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> On the destination side, we cannot wake up all the threads when we got
> reconnected. The first thing to do is to wake up the main load thread,
> so that we can continue to receive valid messages from source again and
> reply when needed.
> 
> At this point, we switch the destination VM state from postcopy-paused
> back to postcopy-recover.
> 
> Now we are finally ready to do the resume logic.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/migration.c | 34 +++++++++++++++++++++++++++++++---
>  1 file changed, 31 insertions(+), 3 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 3aabe11..e498fa4 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -389,10 +389,38 @@ static void process_incoming_migration_co(void *opaque)
>  
>  void migration_fd_process_incoming(QEMUFile *f)
>  {
> -    Coroutine *co = qemu_coroutine_create(process_incoming_migration_co, f);
> +    MigrationIncomingState *mis = migration_incoming_get_current();
> +    Coroutine *co;
> +
> +    mis->from_src_file = f;
> +
> +    if (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
> +        /* Resumed migration to postcopy state */
> +
> +        /* Postcopy has standalone thread to do vm load */
> +        qemu_file_set_blocking(f, true);
> +
> +        /* Re-configure the return path */
> +        mis->to_src_file = qemu_file_get_return_path(f);
>  
> -    qemu_file_set_blocking(f, false);
> -    qemu_coroutine_enter(co);
> +        /* Reset the migration status to postcopy-active */
> +        migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_PAUSED,
> +                          MIGRATION_STATUS_POSTCOPY_RECOVER);

The comment doesn't match the code.

Other than that;


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> +
> +        /*
> +         * Here, we only wake up the main loading thread (while the
> +         * fault thread will still be waiting), so that we can receive
> +         * commands from source now, and answer it if needed. The
> +         * fault thread will be waked up afterwards until we are sure
> +         * that source is ready to reply to page requests.
> +         */
> +        qemu_sem_post(&mis->postcopy_pause_sem_dst);
> +    } else {
> +        /* New incoming migration */
> +        qemu_file_set_blocking(f, false);
> +        co = qemu_coroutine_create(process_incoming_migration_co, f);
> +        qemu_coroutine_enter(co);
> +    }
>  }
>  
>  /*
> -- 
> 2.7.4
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC 21/29] migration: new cmd MIG_CMD_RECV_BITMAP
  2017-07-28  8:06 ` [Qemu-devel] [RFC 21/29] migration: new cmd MIG_CMD_RECV_BITMAP Peter Xu
@ 2017-08-03  9:49   ` Dr. David Alan Gilbert
  2017-08-04  6:08     ` Peter Xu
  0 siblings, 1 reply; 116+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-03  9:49 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> Add a new vm command MIG_CMD_RECV_BITMAP to request received bitmap for
> one ramblock.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/savevm.c     | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  migration/savevm.h     |  1 +
>  migration/trace-events |  1 +
>  3 files changed, 61 insertions(+)
> 
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 386788d..0ab13c0 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -78,6 +78,7 @@ enum qemu_vm_cmd {
>                                        were previously sent during
>                                        precopy but are dirty. */
>      MIG_CMD_PACKAGED,          /* Send a wrapped stream within this stream */
> +    MIG_CMD_RECV_BITMAP,       /* Request for recved bitmap on dst */
>      MIG_CMD_MAX
>  };
>  
> @@ -95,6 +96,7 @@ static struct mig_cmd_args {
>      [MIG_CMD_POSTCOPY_RAM_DISCARD] = {
>                                     .len = -1, .name = "POSTCOPY_RAM_DISCARD" },
>      [MIG_CMD_PACKAGED]         = { .len =  4, .name = "PACKAGED" },
> +    [MIG_CMD_RECV_BITMAP]      = { .len = -1, .name = "RECV_BITMAP" },
>      [MIG_CMD_MAX]              = { .len = -1, .name = "MAX" },
>  };
>  
> @@ -929,6 +931,19 @@ void qemu_savevm_send_postcopy_run(QEMUFile *f)
>      qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_RUN, 0, NULL);
>  }
>  
> +void qemu_savevm_send_recv_bitmap(QEMUFile *f, char *block_name)
> +{
> +    size_t len;
> +    char buf[512];

Only needs to be 256 bytes?

> +    trace_savevm_send_recv_bitmap(block_name);
> +
> +    buf[0] = len = strlen(block_name);
> +    memcpy(buf + 1, block_name, len);
> +
> +    qemu_savevm_command_send(f, MIG_CMD_RECV_BITMAP, len + 1, (uint8_t *)buf);
> +}
> +
>  bool qemu_savevm_state_blocked(Error **errp)
>  {
>      SaveStateEntry *se;
> @@ -1705,6 +1720,47 @@ static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis)
>  }
>  
>  /*
> + * Handle request that source requests for recved_bitmap on
> + * destination. Payload format:
> + *
> + * len (1 byte) + ramblock_name (<255 bytes)
> + */
> +static int loadvm_handle_recv_bitmap(MigrationIncomingState *mis,
> +                                     uint16_t len)
> +{
> +    QEMUFile *file = mis->from_src_file;
> +    RAMBlock *rb;
> +    char block_name[256];
> +    size_t cnt;
> +
> +    cnt = qemu_get_counted_string(file, block_name);
> +    if (!cnt) {
> +        error_report("%s: failed to read block name", __func__);
> +        return -EINVAL;
> +    }
> +
> +    /* Validate before using the data */
> +    if (qemu_file_get_error(file)) {
> +        return qemu_file_get_error(file);
> +    }
> +
> +    if (len != cnt + 1) {
> +        error_report("%s: invalid payload length (%d)", __func__, len);
> +        return -EINVAL;
> +    }
> +
> +    rb = qemu_ram_block_by_name(block_name);
> +    if (!rb) {
> +        error_report("%s: block '%s' not found", __func__, block_name);
> +        return -EINVAL;
> +    }
> +
> +    /* TODO: send the bitmap back to source */

Probably worth adding a trace in this function somewhere.

Other than that;


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> +    return 0;
> +}
> +
> +/*
>   * Process an incoming 'QEMU_VM_COMMAND'
>   * 0           just a normal return
>   * LOADVM_QUIT All good, but exit the loop
> @@ -1777,6 +1833,9 @@ static int loadvm_process_command(QEMUFile *f)
>  
>      case MIG_CMD_POSTCOPY_RAM_DISCARD:
>          return loadvm_postcopy_ram_handle_discard(mis, len);
> +
> +    case MIG_CMD_RECV_BITMAP:
> +        return loadvm_handle_recv_bitmap(mis, len);
>      }
>  
>      return 0;
> diff --git a/migration/savevm.h b/migration/savevm.h
> index 295c4a1..8126b1c 100644
> --- a/migration/savevm.h
> +++ b/migration/savevm.h
> @@ -46,6 +46,7 @@ int qemu_savevm_send_packaged(QEMUFile *f, const uint8_t *buf, size_t len);
>  void qemu_savevm_send_postcopy_advise(QEMUFile *f);
>  void qemu_savevm_send_postcopy_listen(QEMUFile *f);
>  void qemu_savevm_send_postcopy_run(QEMUFile *f);
> +void qemu_savevm_send_recv_bitmap(QEMUFile *f, char *block_name);
>  
>  void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
>                                             uint16_t len,
> diff --git a/migration/trace-events b/migration/trace-events
> index dbb4971..ca7b43f 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -34,6 +34,7 @@ savevm_send_open_return_path(void) ""
>  savevm_send_ping(uint32_t val) "%x"
>  savevm_send_postcopy_listen(void) ""
>  savevm_send_postcopy_run(void) ""
> +savevm_send_recv_bitmap(char *name) "%s"
>  savevm_state_setup(void) ""
>  savevm_state_header(void) ""
>  savevm_state_iterate(void) ""
> -- 
> 2.7.4
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC 22/29] migration: new message MIG_RP_MSG_RECV_BITMAP
  2017-07-28  8:06 ` [Qemu-devel] [RFC 22/29] migration: new message MIG_RP_MSG_RECV_BITMAP Peter Xu
@ 2017-08-03 10:50   ` Dr. David Alan Gilbert
  2017-08-04  6:59     ` Peter Xu
  0 siblings, 1 reply; 116+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-03 10:50 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> Introducing new return path message MIG_RP_MSG_RECV_BITMAP to send
> received bitmap of ramblock back to source.
> 
> This is the reply message of MIG_CMD_RECV_BITMAP: it contains not only
> the header (including the ramblock name), but is also appended with the
> whole received bitmap of the ramblock on the destination side.
> 
> When the source receives such a reply message (MIG_RP_MSG_RECV_BITMAP),
> it parses it, convert it to the dirty bitmap by reverting the bits.

Inverting not reverting?

> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/migration.c  | 62 ++++++++++++++++++++++++++++++++++++++++++
>  migration/migration.h  |  2 ++
>  migration/ram.c        | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  migration/ram.h        |  2 ++
>  migration/savevm.c     |  2 +-
>  migration/trace-events |  2 ++
>  6 files changed, 143 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index e498fa4..c2b85ac 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -92,6 +92,7 @@ enum mig_rp_message_type {
>  
>      MIG_RP_MSG_REQ_PAGES_ID, /* data (start: be64, len: be32, id: string) */
>      MIG_RP_MSG_REQ_PAGES,    /* data (start: be64, len: be32) */
> +    MIG_RP_MSG_RECV_BITMAP,  /* send recved_bitmap back to source */
>  
>      MIG_RP_MSG_MAX
>  };
> @@ -450,6 +451,39 @@ void migrate_send_rp_pong(MigrationIncomingState *mis,
>      migrate_send_rp_message(mis, MIG_RP_MSG_PONG, sizeof(buf), &buf);
>  }
>  
> +void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
> +                                 char *block_name)
> +{
> +    char buf[512];
> +    int len;
> +    int64_t res;
> +
> +    /*
> +     * First, we send the header part. It contains only the len of
> +     * idstr, and the idstr itself.
> +     */
> +    len = strlen(block_name);
> +    buf[0] = len;
> +    memcpy(buf + 1, block_name, len);
> +
> +    migrate_send_rp_message(mis, MIG_RP_MSG_RECV_BITMAP, len + 1, buf);
> +
> +    /*
> +     * Next, we dump the received bitmap to the stream.
> +     *
> +     * TODO: currently we are safe since we are the only one that is
> +     * using the to_src_file handle (fault thread is still paused),
> +     * and it's ok even not taking the mutex. However the best way is
> +     * to take the lock before sending the message header, and release
> +     * the lock after sending the bitmap.
> +     */

Should we be checking the state?

> +    qemu_mutex_lock(&mis->rp_mutex);
> +    res = ramblock_recv_bitmap_send(mis->to_src_file, block_name);
> +    qemu_mutex_unlock(&mis->rp_mutex);
> +
> +    trace_migrate_send_rp_recv_bitmap(block_name, res);

OK, that's a little unusual - I don't think we've got anywhere else
where the data for the rp_ message isn't in the call to
migrate_send_rp_message.
(Another way to structure it would be to make each message send a chunk
of bitmap; but lets stick with this structure for now)

Can you add, either here or in ramblock_recv_bitmap_send, an 'end
marker' on the bitmap data; just a (non-0) known value byte that would
help us check if we had a mess where things got misaligned.

> +}
> +
>  MigrationCapabilityStatusList *qmp_query_migrate_capabilities(Error **errp)
>  {
>      MigrationCapabilityStatusList *head = NULL;
> @@ -1560,6 +1594,7 @@ static struct rp_cmd_args {
>      [MIG_RP_MSG_PONG]           = { .len =  4, .name = "PONG" },
>      [MIG_RP_MSG_REQ_PAGES]      = { .len = 12, .name = "REQ_PAGES" },
>      [MIG_RP_MSG_REQ_PAGES_ID]   = { .len = -1, .name = "REQ_PAGES_ID" },
> +    [MIG_RP_MSG_RECV_BITMAP]    = { .len = -1, .name = "RECV_BITMAP" },
>      [MIG_RP_MSG_MAX]            = { .len = -1, .name = "MAX" },
>  };
>  
> @@ -1604,6 +1639,19 @@ static bool postcopy_pause_return_path_thread(MigrationState *s)
>      return true;
>  }
>  
> +static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name)
> +{
> +    RAMBlock *block = qemu_ram_block_by_name(block_name);
> +
> +    if (!block) {
> +        error_report("%s: invalid block name '%s'", __func__, block_name);
> +        return -EINVAL;
> +    }
> +
> +    /* Fetch the received bitmap and refresh the dirty bitmap */
> +    return ram_dirty_bitmap_reload(s, block);
> +}
> +
>  /*
>   * Handles messages sent on the return path towards the source VM
>   *
> @@ -1709,6 +1757,20 @@ retry:
>              migrate_handle_rp_req_pages(ms, (char *)&buf[13], start, len);
>              break;
>  
> +        case MIG_RP_MSG_RECV_BITMAP:
> +            if (header_len < 1) {
> +                error_report("%s: missing block name", __func__);
> +                mark_source_rp_bad(ms);
> +                goto out;
> +            }
> +            /* Format: len (1B) + idstr (<255B). This ends the idstr. */
> +            buf[buf[0] + 1] = '\0';
> +            if (migrate_handle_rp_recv_bitmap(ms, (char *)(buf + 1))) {
> +                mark_source_rp_bad(ms);
> +                goto out;
> +            }
> +            break;
> +
>          default:
>              break;
>          }
> diff --git a/migration/migration.h b/migration/migration.h
> index 574fedd..4d38308 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -204,5 +204,7 @@ void migrate_send_rp_pong(MigrationIncomingState *mis,
>                            uint32_t value);
>  int migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* rbname,
>                                ram_addr_t start, size_t len);
> +void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
> +                                 char *block_name);
>  
>  #endif
> diff --git a/migration/ram.c b/migration/ram.c
> index 7f4cb0f..d543483 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -182,6 +182,32 @@ void ramblock_recv_bitmap_clear(void *host_addr, RAMBlock *rb)
>  }
>  
>  /*
> + * Format: bitmap_size (8 bytes) + whole_bitmap (N bytes).
> + *
> + * Returns >0 if success with sent bytes, or <0 if error.
> + */
> +int64_t ramblock_recv_bitmap_send(QEMUFile *file, char *block_name)
> +{
> +    RAMBlock *block = qemu_ram_block_by_name(block_name);
> +    uint64_t size;
> +
> +    /* We should have made sure that the block exists */
> +    assert(block);

Best not to make it assert; just make it fail - the block name is
coming off the wire anyway.
(Also, can we make it a const char *block_name?)

> +    /* Size of the bitmap, in bytes */
> +    size = (block->max_length >> TARGET_PAGE_BITS) / 8;
> +    qemu_put_be64(file, size);
> +    qemu_put_buffer(file, (const uint8_t *)block->receivedmap, size);

Do we need to be careful about endianness and the length of long here?
The migration stream can (theoretically) migrate between hosts of
different endianness, e.g. a Power LE and a Power BE host; it can also
migrate between a 32-bit and a 64-bit host, where the 'long' used in our
bitmap is a different length.
I think that means you have to save it as a series of longs,
and also make sure 'size' is a multiple of sizeof(long) - otherwise
you lose the last few bytes, which on a big-endian system would
be a problem.
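A host-independent encoding along those lines would emit the bitmap as big-endian 64-bit words rather than raw host longs. A self-contained sketch (put_be64()/get_be64() stand in for QEMU's qemu_put_be64()/qemu_get_be64(); the helper names are illustrative):

```c
#include <stdint.h>
#include <stddef.h>

/* Write one 64-bit word in big-endian byte order. */
static void put_be64(uint8_t *out, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}

static uint64_t get_be64(const uint8_t *in)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | in[i];
    }
    return v;
}

/* Encode 'nr_words' 64-bit bitmap words into 'out' (8 bytes each),
 * independent of host endianness and of sizeof(long). */
static void bitmap_encode_be(uint8_t *out, const uint64_t *bitmap,
                             size_t nr_words)
{
    for (size_t i = 0; i < nr_words; i++) {
        put_be64(out + 8 * i, bitmap[i]);
    }
}

static void bitmap_decode_be(uint64_t *bitmap, const uint8_t *in,
                             size_t nr_words)
{
    for (size_t i = 0; i < nr_words; i++) {
        bitmap[i] = get_be64(in + 8 * i);
    }
}
```

A BE and an LE host running this agree on the wire bytes, which is the property the raw qemu_put_buffer() of block->receivedmap lacks.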

Also, should we be using 'max_length' or 'used_length'? ram_save_setup
stores the used_length, and I don't think we should be accessing outside
the used_length.  That might also make the rounding of 'size' to a
'long' more interesting; maybe you need to check that you don't use
the bits outside the used_length.

> +    qemu_fflush(file);
> +
> +    if (qemu_file_get_error(file)) {
> +        return qemu_file_get_error(file);
> +    }
> +
> +    return sizeof(size) + size;

I think since 'size' is always sent as 64 bits, that's size + 8.

> +}
> +
> +/*
>   * An outstanding page request, on the source, having been received
>   * and queued
>   */
> @@ -2705,6 +2731,54 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>      return ret;
>  }
>  
> +/*
> + * Read the received bitmap, revert it as the initial dirty bitmap.
> + * This is only used when the postcopy migration is paused but wants
> + * to resume from a middle point.
> + */
> +int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
> +{
> +    QEMUFile *file = s->rp_state.from_dst_file;
> +    uint64_t local_size = (block->max_length >> TARGET_PAGE_BITS) / 8;
> +    uint64_t size;
> +
> +    if (s->state != MIGRATION_STATUS_POSTCOPY_RECOVER) {
> +        error_report("%s: incorrect state %s", __func__,
> +                     MigrationStatus_lookup[s->state]);
> +        return -EINVAL;
> +    }
> +
> +    size = qemu_get_be64(file);
> +
> +    /* The size of the bitmap should match with our ramblock */
> +    if (size != local_size) {
> +        error_report("%s: ramblock '%s' bitmap size mismatch "
> +                     "(0x%lx != 0x%lx)", __func__, block->idstr,
> +                     size, local_size);
> +        return -EINVAL;
> +    }

Coming back to the used_length point above: again I think the rule
is that the used_length has to match, not the max_length.
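Putting the two points together (used_length, and rounding up to whole long words so trailing bits survive), the size computation could look like this hedged sketch; TARGET_PAGE_BITS is fixed at 12 here only for illustration, and the function name is invented:

```c
#include <stdint.h>

#define TARGET_PAGE_BITS 12  /* illustrative; target-dependent in QEMU */

/* Bitmap size in bytes for a block of 'used_length' bytes, rounded up
 * to whole 'long' words so that the trailing bits are not dropped when
 * the page count is not a multiple of bits-per-long. */
static uint64_t recv_bitmap_size(uint64_t used_length)
{
    uint64_t pages = used_length >> TARGET_PAGE_BITS;
    uint64_t bits_per_long = 8 * sizeof(long);
    uint64_t words = (pages + bits_per_long - 1) / bits_per_long;

    return words * sizeof(long);
}
```

Both sender and receiver would derive the size from used_length this way, so the size-match check in ram_dirty_bitmap_reload stays valid.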

> +    /*
> +     * We are still during migration (though paused). The dirty bitmap
> +     * won't change.  We can directly modify it.
> +     */
> +    size = qemu_get_buffer(file, (uint8_t *)block->bmap, local_size);
> +
> +    if (qemu_file_get_error(file)) {
> +        return qemu_file_get_error(file);
> +    }
> +
> +    /*
> +     * What we received is "received bitmap". Revert it as the initial
> +     * dirty bitmap for this ramblock.
> +     */
> +    bitmap_invert(block->bmap, block->max_length >> TARGET_PAGE_BITS);
> +
> +    trace_ram_dirty_bitmap_reload(block->idstr);
> +
> +    return 0;
> +}
> +
>  static SaveVMHandlers savevm_ram_handlers = {
>      .save_setup = ram_save_setup,
>      .save_live_iterate = ram_save_iterate,
> diff --git a/migration/ram.h b/migration/ram.h
> index 84e8623..86eb973 100644
> --- a/migration/ram.h
> +++ b/migration/ram.h
> @@ -58,5 +58,7 @@ void ramblock_recv_bitmap_set(void *host_addr, RAMBlock *rb);
>  void ramblock_recv_bitmap_set_range(RAMBlock *rb, void *host_addr,
>                                      size_t len);
>  void ramblock_recv_bitmap_clear(void *host_addr, RAMBlock *rb);
> +int64_t ramblock_recv_bitmap_send(QEMUFile *file, char *block_name);
> +int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb);
>  
>  #endif
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 0ab13c0..def9213 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1755,7 +1755,7 @@ static int loadvm_handle_recv_bitmap(MigrationIncomingState *mis,
>          return -EINVAL;
>      }
>  
> -    /* TODO: send the bitmap back to source */
> +    migrate_send_rp_recv_bitmap(mis, block_name);
>  
>      return 0;
>  }
> diff --git a/migration/trace-events b/migration/trace-events
> index ca7b43f..ed69551 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -77,6 +77,7 @@ ram_load_postcopy_loop(uint64_t addr, int flags) "@%" PRIx64 " %x"
>  ram_postcopy_send_discard_bitmap(void) ""
>  ram_save_page(const char *rbname, uint64_t offset, void *host) "%s: offset: %" PRIx64 " host: %p"
>  ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: %zx len: %zx"
> +ram_dirty_bitmap_reload(char *str) "%s"
>  
>  # migration/migration.c
>  await_return_path_close_on_source_close(void) ""
> @@ -88,6 +89,7 @@ migrate_fd_cancel(void) ""
>  migrate_handle_rp_req_pages(const char *rbname, size_t start, size_t len) "in %s at %zx len %zx"
>  migrate_pending(uint64_t size, uint64_t max, uint64_t post, uint64_t nonpost) "pending size %" PRIu64 " max %" PRIu64 " (post=%" PRIu64 " nonpost=%" PRIu64 ")"
>  migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
> +migrate_send_rp_recv_bitmap(char *name, int64_t size) "block '%s' size 0x%"PRIi64
>  migration_completion_file_err(void) ""
>  migration_completion_postcopy_end(void) ""
>  migration_completion_postcopy_end_after_complete(void) ""
> -- 
> 2.7.4

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC 23/29] migration: new cmd MIG_CMD_POSTCOPY_RESUME
  2017-07-28  8:06 ` [Qemu-devel] [RFC 23/29] migration: new cmd MIG_CMD_POSTCOPY_RESUME Peter Xu
@ 2017-08-03 11:05   ` Dr. David Alan Gilbert
  2017-08-04  7:04     ` Peter Xu
  0 siblings, 1 reply; 116+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-03 11:05 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> Introducing this new command to be sent when the source VM is ready to
> resume the paused migration.  What the destination does here is
> basically release the fault thread to continue service page faults.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/savevm.c     | 27 +++++++++++++++++++++++++++
>  migration/savevm.h     |  1 +
>  migration/trace-events |  1 +
>  3 files changed, 29 insertions(+)
> 
> diff --git a/migration/savevm.c b/migration/savevm.c
> index def9213..2e330bc 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -77,6 +77,7 @@ enum qemu_vm_cmd {
>      MIG_CMD_POSTCOPY_RAM_DISCARD,  /* A list of pages to discard that
>                                        were previously sent during
>                                        precopy but are dirty. */
> +    MIG_CMD_POSTCOPY_RESUME,       /* resume postcopy on dest */
>      MIG_CMD_PACKAGED,          /* Send a wrapped stream within this stream */
>      MIG_CMD_RECV_BITMAP,       /* Request for recved bitmap on dst */
>      MIG_CMD_MAX
> @@ -95,6 +96,7 @@ static struct mig_cmd_args {
>      [MIG_CMD_POSTCOPY_RUN]     = { .len =  0, .name = "POSTCOPY_RUN" },
>      [MIG_CMD_POSTCOPY_RAM_DISCARD] = {
>                                     .len = -1, .name = "POSTCOPY_RAM_DISCARD" },
> +    [MIG_CMD_POSTCOPY_RESUME]  = { .len =  0, .name = "POSTCOPY_RESUME" },
>      [MIG_CMD_PACKAGED]         = { .len =  4, .name = "PACKAGED" },
>      [MIG_CMD_RECV_BITMAP]      = { .len = -1, .name = "RECV_BITMAP" },
>      [MIG_CMD_MAX]              = { .len = -1, .name = "MAX" },
> @@ -931,6 +933,12 @@ void qemu_savevm_send_postcopy_run(QEMUFile *f)
>      qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_RUN, 0, NULL);
>  }
>  
> +void qemu_savevm_send_postcopy_resume(QEMUFile *f)
> +{
> +    trace_savevm_send_postcopy_resume();
> +    qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_RESUME, 0, NULL);
> +}
> +
>  void qemu_savevm_send_recv_bitmap(QEMUFile *f, char *block_name)
>  {
>      size_t len;
> @@ -1671,6 +1679,22 @@ static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
>      return LOADVM_QUIT;
>  }
>  
> +static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
> +{
> +    /*
> +     * This means source VM is ready to resume the postcopy migration.
> +     * It's time to switch state and release the fault thread to
> +     * continue service page faults.
> +     */
> +    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_RECOVER,
> +                      MIGRATION_STATUS_POSTCOPY_ACTIVE);
> +    qemu_sem_post(&mis->postcopy_pause_sem_fault);

Is it worth sanity checking that you were in RECOVER at this point?
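A minimal shape for that sanity check, sketched with stand-in state values (QEMU would use the MIGRATION_STATUS_* enum and migrate_set_state(); the names here are illustrative):

```c
#include <errno.h>

/* Illustrative stand-ins for MIGRATION_STATUS_POSTCOPY_RECOVER/ACTIVE. */
enum {
    STATUS_POSTCOPY_RECOVER = 1,
    STATUS_POSTCOPY_ACTIVE  = 2,
};

/* Only accept a RESUME command while we are actually in the recover
 * state; a stray command arriving in any other state is rejected
 * instead of silently flipping the state machine. */
static int handle_resume_checked(int *state)
{
    if (*state != STATUS_POSTCOPY_RECOVER) {
        return -EINVAL;
    }
    *state = STATUS_POSTCOPY_ACTIVE;
    return 0;
}
```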

Dave

> +
> +    /* TODO: Tell source that "we are ready" */
> +
> +    return 0;
> +}
> +
>  /**
>   * Immediately following this command is a blob of data containing an embedded
>   * chunk of migration stream; read it and load it.
> @@ -1834,6 +1858,9 @@ static int loadvm_process_command(QEMUFile *f)
>      case MIG_CMD_POSTCOPY_RAM_DISCARD:
>          return loadvm_postcopy_ram_handle_discard(mis, len);
>  
> +    case MIG_CMD_POSTCOPY_RESUME:
> +        return loadvm_postcopy_handle_resume(mis);
> +
>      case MIG_CMD_RECV_BITMAP:
>          return loadvm_handle_recv_bitmap(mis, len);
>      }
> diff --git a/migration/savevm.h b/migration/savevm.h
> index 8126b1c..a5f3879 100644
> --- a/migration/savevm.h
> +++ b/migration/savevm.h
> @@ -46,6 +46,7 @@ int qemu_savevm_send_packaged(QEMUFile *f, const uint8_t *buf, size_t len);
>  void qemu_savevm_send_postcopy_advise(QEMUFile *f);
>  void qemu_savevm_send_postcopy_listen(QEMUFile *f);
>  void qemu_savevm_send_postcopy_run(QEMUFile *f);
> +void qemu_savevm_send_postcopy_resume(QEMUFile *f);
>  void qemu_savevm_send_recv_bitmap(QEMUFile *f, char *block_name);
>  
>  void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
> diff --git a/migration/trace-events b/migration/trace-events
> index ed69551..04dd9d8 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -34,6 +34,7 @@ savevm_send_open_return_path(void) ""
>  savevm_send_ping(uint32_t val) "%x"
>  savevm_send_postcopy_listen(void) ""
>  savevm_send_postcopy_run(void) ""
> +savevm_send_postcopy_resume(void) ""
>  savevm_send_recv_bitmap(char *name) "%s"
>  savevm_state_setup(void) ""
>  savevm_state_header(void) ""
> -- 
> 2.7.4
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC 24/29] migration: new message MIG_RP_MSG_RESUME_ACK
  2017-07-28  8:06 ` [Qemu-devel] [RFC 24/29] migration: new message MIG_RP_MSG_RESUME_ACK Peter Xu
@ 2017-08-03 11:21   ` Dr. David Alan Gilbert
  2017-08-04  7:23     ` Peter Xu
  0 siblings, 1 reply; 116+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-03 11:21 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> Creating new message to reply for MIG_CMD_POSTCOPY_RESUME. One uint32_t
> is used as payload to let the source know whether destination is ready
> to continue the migration.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/migration.c  | 37 +++++++++++++++++++++++++++++++++++++
>  migration/migration.h  |  1 +
>  migration/savevm.c     |  3 ++-
>  migration/trace-events |  1 +
>  4 files changed, 41 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index c2b85ac..62f91ce 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -93,6 +93,7 @@ enum mig_rp_message_type {
>      MIG_RP_MSG_REQ_PAGES_ID, /* data (start: be64, len: be32, id: string) */
>      MIG_RP_MSG_REQ_PAGES,    /* data (start: be64, len: be32) */
>      MIG_RP_MSG_RECV_BITMAP,  /* send recved_bitmap back to source */
> +    MIG_RP_MSG_RESUME_ACK,   /* tell source that we are ready to resume */
>  
>      MIG_RP_MSG_MAX
>  };
> @@ -484,6 +485,14 @@ void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
>      trace_migrate_send_rp_recv_bitmap(block_name, res);
>  }
>  
> +void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value)
> +{
> +    uint32_t buf;
> +
> +    buf = cpu_to_be32(value);
> +    migrate_send_rp_message(mis, MIG_RP_MSG_RESUME_ACK, sizeof(buf), &buf);
> +}
> +
>  MigrationCapabilityStatusList *qmp_query_migrate_capabilities(Error **errp)
>  {
>      MigrationCapabilityStatusList *head = NULL;
> @@ -1595,6 +1604,7 @@ static struct rp_cmd_args {
>      [MIG_RP_MSG_REQ_PAGES]      = { .len = 12, .name = "REQ_PAGES" },
>      [MIG_RP_MSG_REQ_PAGES_ID]   = { .len = -1, .name = "REQ_PAGES_ID" },
>      [MIG_RP_MSG_RECV_BITMAP]    = { .len = -1, .name = "RECV_BITMAP" },
> +    [MIG_RP_MSG_RESUME_ACK]     = { .len =  4, .name = "RESUME_ACK" },
>      [MIG_RP_MSG_MAX]            = { .len = -1, .name = "MAX" },
>  };
>  
> @@ -1652,6 +1662,25 @@ static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name)
>      return ram_dirty_bitmap_reload(s, block);
>  }
>  
> +static int migrate_handle_rp_resume_ack(MigrationState *s, uint32_t value)
> +{
> +    trace_source_return_path_thread_resume_ack(value);
> +
> +    /*
> +     * Currently value will always be one. It can be used in the
> +     * future to notify source that destination cannot continue.
> +     */
> +    assert(value == 1);

Again I'd prefer the routine to fail rather than assert.
Maybe it's worth having a named constant rather than the magic 1.
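Both suggestions combined might look like this sketch; the constant name is invented, and error_report() is replaced by fprintf() so the fragment is self-contained:

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative name; 1 is the only value the series sends today. */
#define MIGRATION_RESUME_ACK_VALUE 1

/* The value comes off the wire, so a corrupt or malicious stream must
 * not be able to abort the source VM: reject it instead of asserting. */
static int check_resume_ack(uint32_t value)
{
    if (value != MIGRATION_RESUME_ACK_VALUE) {
        fprintf(stderr, "unexpected resume ack value %u\n", value);
        return -EINVAL;
    }
    return 0;
}
```

migrate_handle_rp_resume_ack() would then return this error and let the caller mark_source_rp_bad().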


> +    /* Now both sides are active. */
> +    migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_RECOVER,
> +                      MIGRATION_STATUS_POSTCOPY_ACTIVE);
> +
> +    /* TODO: notify send thread that time to continue send pages */
> +
> +    return 0;
> +}
> +
>  /*
>   * Handles messages sent on the return path towards the source VM
>   *
> @@ -1771,6 +1800,14 @@ retry:
>              }
>              break;
>  
> +        case MIG_RP_MSG_RESUME_ACK:
> +            tmp32 = ldl_be_p(buf);
> +            if (migrate_handle_rp_resume_ack(ms, tmp32)) {
> +                mark_source_rp_bad(ms);
> +                goto out;
> +            }
> +            break;
> +
>          default:
>              break;
>          }
> diff --git a/migration/migration.h b/migration/migration.h
> index 4d38308..2a3f905 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -206,5 +206,6 @@ int migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* rbname,
>                                ram_addr_t start, size_t len);
>  void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
>                                   char *block_name);
> +void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value);
>  
>  #endif
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 2e330bc..02a67ac 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1690,7 +1690,8 @@ static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
>                        MIGRATION_STATUS_POSTCOPY_ACTIVE);
>      qemu_sem_post(&mis->postcopy_pause_sem_fault);
>  
> -    /* TODO: Tell source that "we are ready" */
> +    /* Tell source that "we are ready" */
> +    migrate_send_rp_resume_ack(mis, 1);
>  
>      return 0;
>  }
> diff --git a/migration/trace-events b/migration/trace-events
> index 04dd9d8..0b43fec 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -116,6 +116,7 @@ source_return_path_thread_entry(void) ""
>  source_return_path_thread_loop_top(void) ""
>  source_return_path_thread_pong(uint32_t val) "%x"
>  source_return_path_thread_shut(uint32_t val) "%x"
> +source_return_path_thread_resume_ack(uint32_t v) "%"PRIu32
>  migrate_global_state_post_load(const char *state) "loaded state: %s"
>  migrate_global_state_pre_save(const char *state) "saved state: %s"
>  migration_thread_low_pending(uint64_t pending) "%" PRIu64
> -- 
> 2.7.4

Dave

> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC 25/29] migration: introduce SaveVMHandlers.resume_prepare
  2017-07-28  8:06 ` [Qemu-devel] [RFC 25/29] migration: introduce SaveVMHandlers.resume_prepare Peter Xu
@ 2017-08-03 11:38   ` Dr. David Alan Gilbert
  2017-08-04  7:39     ` Peter Xu
  0 siblings, 1 reply; 116+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-03 11:38 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> This is hook function to be called when a postcopy migration wants to
> resume from a failure. For each module, it should provide its own
> recovery logic before we switch to the postcopy-active state.

Would a change-state handler be able to do this, or perhaps
the notifier chain I have in my shared memory world:
 https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg06459.html

Dave

> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  include/migration/register.h |  2 ++
>  migration/migration.c        | 20 +++++++++++++++++++-
>  migration/savevm.c           | 25 +++++++++++++++++++++++++
>  migration/savevm.h           |  1 +
>  migration/trace-events       |  1 +
>  5 files changed, 48 insertions(+), 1 deletion(-)
> 
> diff --git a/include/migration/register.h b/include/migration/register.h
> index a0f1edd..b669362 100644
> --- a/include/migration/register.h
> +++ b/include/migration/register.h
> @@ -41,6 +41,8 @@ typedef struct SaveVMHandlers {
>      LoadStateHandler *load_state;
>      int (*load_setup)(QEMUFile *f, void *opaque);
>      int (*load_cleanup)(void *opaque);
> +    /* Called when postcopy migration wants to resume from failure */
> +    int (*resume_prepare)(MigrationState *s, void *opaque);
>  } SaveVMHandlers;
>  
>  int register_savevm_live(DeviceState *dev,
> diff --git a/migration/migration.c b/migration/migration.c
> index 62f91ce..6cb0ad3 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2150,7 +2150,25 @@ static bool postcopy_should_start(MigrationState *s)
>  /* Return zero if success, or <0 for error */
>  static int postcopy_do_resume(MigrationState *s)
>  {
> -    /* TODO: do the resume logic */
> +    int ret;
> +
> +    /*
> +     * Call all the resume_prepare() hooks, so that modules can be
> +     * ready for the migration resume.
> +     */
> +    ret = qemu_savevm_state_resume_prepare(s);
> +    if (ret) {
> +        error_report("%s: resume_prepare() failure detected: %d",
> +                     __func__, ret);
> +        return ret;
> +    }
> +
> +    /*
> +     * TODO: handshake with dest using MIG_CMD_RESUME,
> +     * MIG_RP_MSG_RESUME_ACK, then switch source state to
> +     * "postcopy-active"
> +     */
> +
>      return 0;
>  }
>  
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 02a67ac..08a4712 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1004,6 +1004,31 @@ void qemu_savevm_state_setup(QEMUFile *f)
>      }
>  }
>  
> +int qemu_savevm_state_resume_prepare(MigrationState *s)
> +{
> +    SaveStateEntry *se;
> +    int ret;
> +
> +    trace_savevm_state_resume_prepare();
> +
> +    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> +        if (!se->ops || !se->ops->resume_prepare) {
> +            continue;
> +        }
> +        if (se->ops && se->ops->is_active) {
> +            if (!se->ops->is_active(se->opaque)) {
> +                continue;
> +            }
> +        }
> +        ret = se->ops->resume_prepare(s, se->opaque);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +    }
> +
> +    return 0;
> +}
> +
>  /*
>   * this function has three return values:
>   *   negative: there was one error, and we have -errno.
> diff --git a/migration/savevm.h b/migration/savevm.h
> index a5f3879..3193f04 100644
> --- a/migration/savevm.h
> +++ b/migration/savevm.h
> @@ -31,6 +31,7 @@
>  
>  bool qemu_savevm_state_blocked(Error **errp);
>  void qemu_savevm_state_setup(QEMUFile *f);
> +int qemu_savevm_state_resume_prepare(MigrationState *s);
>  void qemu_savevm_state_header(QEMUFile *f);
>  int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy);
>  void qemu_savevm_state_cleanup(void);
> diff --git a/migration/trace-events b/migration/trace-events
> index 0b43fec..0fb2d1e 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -37,6 +37,7 @@ savevm_send_postcopy_run(void) ""
>  savevm_send_postcopy_resume(void) ""
>  savevm_send_recv_bitmap(char *name) "%s"
>  savevm_state_setup(void) ""
> +savevm_state_resume_prepare(void) ""
>  savevm_state_header(void) ""
>  savevm_state_iterate(void) ""
>  savevm_state_cleanup(void) ""
> -- 
> 2.7.4
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC 26/29] migration: synchronize dirty bitmap for resume
  2017-07-28  8:06 ` [Qemu-devel] [RFC 26/29] migration: synchronize dirty bitmap for resume Peter Xu
@ 2017-08-03 11:56   ` Dr. David Alan Gilbert
  2017-08-04  7:49     ` Peter Xu
  0 siblings, 1 reply; 116+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-03 11:56 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> This patch implements the first part of core RAM resume logic for
> postcopy. ram_resume_prepare() is provided for the work.
> 
> When the migration is interrupted by network failure, the dirty bitmap
> on the source side will be meaningless, because even the dirty bit is
> cleared, it is still possible that the sent page was lost along the way
> to destination. Here instead of continue the migration with the old
> dirty bitmap on source, we ask the destination side to send back its
> received bitmap, then invert it to be our initial dirty bitmap.
> 
> The source side send thread will issue the MIG_CMD_RECV_BITMAP requests,
> once per ramblock, to ask for the received bitmap. On destination side,
> MIG_RP_MSG_RECV_BITMAP will be issued, along with the requested bitmap.
> Data will be received on the return-path thread of source, and the main
> migration thread will be notified when all the ramblock bitmaps are
> synchronized.
> 
> One issue to be solved here is how to synchronize the source send thread
> and return-path thread. Semaphore cannot really work here since we
> cannot guarantee the order of wait/post (it's possible that the reply is
> very fast, even before send thread starts to wait). So conditional
> variable is used to make sure the ordering is always correct.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/migration.c  |  4 +++
>  migration/migration.h  |  4 +++
>  migration/ram.c        | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  migration/trace-events |  1 +
>  4 files changed, 77 insertions(+)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 6cb0ad3..93fbc96 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1093,6 +1093,8 @@ static void migrate_fd_cleanup(void *opaque)
>  
>      qemu_sem_destroy(&s->postcopy_pause_sem);
>      qemu_sem_destroy(&s->postcopy_pause_rp_sem);
> +    qemu_mutex_destroy(&s->resume_lock);
> +    qemu_cond_destroy(&s->resume_cond);
>  }
>  
>  void migrate_fd_error(MigrationState *s, const Error *error)
> @@ -1238,6 +1240,8 @@ MigrationState *migrate_init(void)
>      s->error = NULL;
>      qemu_sem_init(&s->postcopy_pause_sem, 0);
>      qemu_sem_init(&s->postcopy_pause_rp_sem, 0);
> +    qemu_mutex_init(&s->resume_lock);
> +    qemu_cond_init(&s->resume_cond);
>  
>      migrate_set_state(&s->state, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP);
>  
> diff --git a/migration/migration.h b/migration/migration.h
> index 2a3f905..c270f4c 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -159,6 +159,10 @@ struct MigrationState
>      /* Needed by postcopy-pause state */
>      QemuSemaphore postcopy_pause_sem;
>      QemuSemaphore postcopy_pause_rp_sem;
> +
> +    /* Used to sync-up between main send thread and rp-thread */
> +    QemuMutex resume_lock;
> +    QemuCond resume_cond;
>  };
>  
>  void migrate_set_state(int *state, int old_state, int new_state);
> diff --git a/migration/ram.c b/migration/ram.c
> index d543483..c695b13 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -46,6 +46,7 @@
>  #include "exec/ram_addr.h"
>  #include "qemu/rcu_queue.h"
>  #include "migration/colo.h"
> +#include "savevm.h"
>  
>  /***********************************************************/
>  /* ram save/restore */
> @@ -256,6 +257,8 @@ struct RAMState {
>      RAMBlock *last_req_rb;
>      /* Queue of outstanding page requests from the destination */
>      QemuMutex src_page_req_mutex;
> +    /* Ramblock counts to sync dirty bitmap. Only used for recovery */
> +    int ramblock_to_sync;
>      QSIMPLEQ_HEAD(src_page_requests, RAMSrcPageRequest) src_page_requests;
>  };
>  typedef struct RAMState RAMState;
> @@ -2731,6 +2734,57 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>      return ret;
>  }
>  
> +/* Sync all the dirty bitmap with destination VM.  */
> +static int ram_dirty_bitmap_sync_all(MigrationState *s, RAMState *rs)
> +{
> +    RAMBlock *block;
> +    QEMUFile *file = s->to_dst_file;
> +    int ramblock_count = 0;
> +
> +    trace_ram_dirty_bitmap_sync("start");

Most (but not all) of our trace_ uses have separate trace_ entries for
each step, e.g. trace_ram_dirty_bitmap_sync_start().

> +    /*
> +     * We need to take the resume lock to make sure that the send
> +     * thread (current thread) and the rp-thread will do their work in
> +     * order.
> +     */
> +    qemu_mutex_lock(&s->resume_lock);
> +
> +    /* Request for receive-bitmap for each block */
> +    RAMBLOCK_FOREACH(block) {
> +        ramblock_count++;
> +        qemu_savevm_send_recv_bitmap(file, block->idstr);
> +    }
> +
> +    /* Init the ramblock count to total */
> +    atomic_set(&rs->ramblock_to_sync, ramblock_count);
> +
> +    trace_ram_dirty_bitmap_sync("wait-bitmap");
> +
> +    /* Wait until all the ramblocks' dirty bitmap synced */
> +    while (rs->ramblock_to_sync) {
> +        qemu_cond_wait(&s->resume_cond, &s->resume_lock);
> +    }

Does the locking here get simpler if you:
  a) count the number of RAMBlocks 'n'
  b) Initialise a semaphore to -(n-1)
  c) Call qemu_savevm_send_recv_bitmap for each bitmap
  d) sem_wait on the semaphore - which is waiting for the semaphore to
     be >0

as you receive each bitmap do a sem_post; on the last one
it should go from 0->1 and the sem_wait should wake up?
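One wrinkle: POSIX sem_init() (and hence QEMU's qemu_sem_init()) only takes an unsigned initial count, so the -(n-1) scheme is usually expressed as "init to 0, post once per bitmap, wait n times", which gives the same rendezvous. A self-contained Linux sketch with pthreads (all names invented; the worker body stands in for the per-block reload done on the return-path thread):

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>

static sem_t bitmap_done;
static atomic_int reloaded;

/* Stand-in for reloading one block's dirty bitmap on the rp-thread. */
static void *bitmap_reload_worker(void *arg)
{
    (void)arg;
    atomic_fetch_add(&reloaded, 1);
    sem_post(&bitmap_done);   /* report this block as synced */
    return NULL;
}

/* Main send thread: request all bitmaps, then wait once per block.
 * No mutex/condvar pair is needed, and wait/post ordering is safe
 * even if every reply arrives before we start waiting. */
static void sync_all_bitmaps(int nr_blocks)
{
    pthread_t th[16];

    sem_init(&bitmap_done, 0, 0);
    for (int i = 0; i < nr_blocks; i++) {
        pthread_create(&th[i], NULL, bitmap_reload_worker, NULL);
    }
    for (int i = 0; i < nr_blocks; i++) {
        sem_wait(&bitmap_done);
    }
    for (int i = 0; i < nr_blocks; i++) {
        pthread_join(th[i], NULL);
    }
    sem_destroy(&bitmap_done);
}
```

Because the semaphore counts posts even when nobody is waiting yet, this sidesteps the wait/post ordering problem the commit message solves with a condition variable.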

Dave

> +    trace_ram_dirty_bitmap_sync("completed");
> +
> +    qemu_mutex_unlock(&s->resume_lock);
> +
> +    return 0;
> +}
> +
> +static void ram_dirty_bitmap_reload_notify(MigrationState *s)
> +{
> +    qemu_mutex_lock(&s->resume_lock);
> +    atomic_dec(&ram_state->ramblock_to_sync);
> +    if (ram_state->ramblock_to_sync == 0) {
> +        /* Make sure the other thread gets the latest */
> +        trace_ram_dirty_bitmap_sync("notify-send");
> +        qemu_cond_signal(&s->resume_cond);
> +    }
> +    qemu_mutex_unlock(&s->resume_lock);
> +}
> +
>  /*
>   * Read the received bitmap, revert it as the initial dirty bitmap.
>   * This is only used when the postcopy migration is paused but wants
> @@ -2776,9 +2830,22 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
>  
>      trace_ram_dirty_bitmap_reload(block->idstr);
>  
> +    /*
> +     * We succeeded to sync bitmap for current ramblock. If this is
> +     * the last one to sync, we need to notify the main send thread.
> +     */
> +    ram_dirty_bitmap_reload_notify(s);
> +
>      return 0;
>  }
>  
> +static int ram_resume_prepare(MigrationState *s, void *opaque)
> +{
> +    RAMState *rs = *(RAMState **)opaque;
> +
> +    return ram_dirty_bitmap_sync_all(s, rs);
> +}
> +
>  static SaveVMHandlers savevm_ram_handlers = {
>      .save_setup = ram_save_setup,
>      .save_live_iterate = ram_save_iterate,
> @@ -2789,6 +2856,7 @@ static SaveVMHandlers savevm_ram_handlers = {
>      .save_cleanup = ram_save_cleanup,
>      .load_setup = ram_load_setup,
>      .load_cleanup = ram_load_cleanup,
> +    .resume_prepare = ram_resume_prepare,
>  };
>  
>  void ram_mig_init(void)
> diff --git a/migration/trace-events b/migration/trace-events
> index 0fb2d1e..15ff1bf 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -80,6 +80,7 @@ ram_postcopy_send_discard_bitmap(void) ""
>  ram_save_page(const char *rbname, uint64_t offset, void *host) "%s: offset: %" PRIx64 " host: %p"
>  ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: %zx len: %zx"
>  ram_dirty_bitmap_reload(char *str) "%s"
> +ram_dirty_bitmap_sync(const char *str) "%s"
>  
>  # migration/migration.c
>  await_return_path_close_on_source_close(void) ""
> -- 
> 2.7.4
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC 27/29] migration: setup ramstate for resume
  2017-07-28  8:06 ` [Qemu-devel] [RFC 27/29] migration: setup ramstate " Peter Xu
@ 2017-08-03 12:37   ` Dr. David Alan Gilbert
  2017-08-04  8:39     ` Peter Xu
  0 siblings, 1 reply; 116+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-03 12:37 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> After updating the dirty bitmaps of the ramblocks, we also need to update
> the critical fields in RAMState to make sure it is ready for a resume.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/ram.c | 35 ++++++++++++++++++++++++++++++++++-
>  1 file changed, 34 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index c695b13..427bf6e 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1947,6 +1947,31 @@ static int ram_state_init(RAMState **rsp)
>      return 0;
>  }
>  
> +static void ram_state_resume_prepare(RAMState *rs)
> +{
> +    RAMBlock *block;
> +    long pages = 0;
> +
> +    /*
> +     * Postcopy is not using xbzrle/compression, so no need for that.
> +     * Also, since the source is already halted, we don't need to care
> +     * about dirty page logging either.
> +     */
> +
> +    RAMBLOCK_FOREACH(block) {
> +        pages += bitmap_count_one(block->bmap,
> +                                  block->max_length >> TARGET_PAGE_BITS);

Again I think that needs to be block->used_length (see
migration_bitmap_sync).
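For reference, the used/max distinction matters because the count masks off bits beyond nbits — a standalone sketch of the counting (simplified; uses the gcc popcount builtin; the real bitmap_count_one() from this series lives in util/bitmap.c):

```c
/* Simplified stand-in for bitmap_count_one(): count the set bits in
 * the first nbits of bmap. Bits beyond nbits in the last word are
 * masked off, which is why the length passed in (used_length vs
 * max_length, in page units) changes the result when they differ. */
#define BITS_PER_LONG ((long)(8 * sizeof(unsigned long)))

static long count_one_bits(const unsigned long *bmap, long nbits)
{
    long count = 0;
    long full_words = nbits / BITS_PER_LONG;
    long rem = nbits % BITS_PER_LONG;

    for (long i = 0; i < full_words; i++) {
        count += __builtin_popcountl(bmap[i]);
    }
    if (rem) {
        /* Partial last word: only the low rem bits are in range */
        count += __builtin_popcountl(bmap[full_words] & ((1UL << rem) - 1));
    }
    return count;
}
```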

> +    }
> +
> +    /* This may not be aligned with current bitmaps. Recalculate. */
> +    rs->migration_dirty_pages = pages;
> +
> +    rs->last_seen_block = NULL;
> +    rs->last_sent_block = NULL;
> +    rs->last_page = 0;
> +    rs->last_version = ram_list.version;

A trace at this point with the pages count might be worthwhile.

> +}
> +
>  /*
>   * Each of ram_save_setup, ram_save_iterate and ram_save_complete has
>   * long-running RCU critical section.  When rcu-reclaims in the code
> @@ -2842,8 +2867,16 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
>  static int ram_resume_prepare(MigrationState *s, void *opaque)
>  {
>      RAMState *rs = *(RAMState **)opaque;
> +    int ret;
>  
> -    return ram_dirty_bitmap_sync_all(s, rs);
> +    ret = ram_dirty_bitmap_sync_all(s, rs);

Interesting; I'd assumed you'd load directly into this
bitmap rather than loading into the bitmap on each block.
Do we ever get the case where a bitmap is set on the source
bitmap but not in the loaded bitmap?

Dave

> +    if (ret) {
> +        return ret;
> +    }
> +
> +    ram_state_resume_prepare(rs);
> +
> +    return 0;
>  }
>  
>  static SaveVMHandlers savevm_ram_handlers = {
> -- 
> 2.7.4
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC 28/29] migration: final handshake for the resume
  2017-07-28  8:06 ` [Qemu-devel] [RFC 28/29] migration: final handshake for the resume Peter Xu
@ 2017-08-03 13:47   ` Dr. David Alan Gilbert
  2017-08-04  9:05     ` Peter Xu
  0 siblings, 1 reply; 116+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-03 13:47 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> Finish the last step to do the final handshake for the recovery.
> 
> First, the source sends a MIG_CMD_RESUME to dst, telling it that the
> source is ready to resume.
> 
> Then, dest replies with MIG_RP_MSG_RESUME_ACK to the source, telling it
> that dest is ready to resume (after switching to postcopy-active state).
> 
> When the source receives the RESUME_ACK, it switches its state to
> postcopy-active, and the recovery is finally complete.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/migration.c | 39 +++++++++++++++++++++++++++++++++++----
>  1 file changed, 35 insertions(+), 4 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 93fbc96..ecebe30 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1666,6 +1666,13 @@ static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name)
>      return ram_dirty_bitmap_reload(s, block);
>  }
>  
> +static void postcopy_resume_handshake_ack(MigrationState *s)
> +{
> +    qemu_mutex_lock(&s->resume_lock);
> +    qemu_cond_signal(&s->resume_cond);
> +    qemu_mutex_unlock(&s->resume_lock);
> +}
> +
>  static int migrate_handle_rp_resume_ack(MigrationState *s, uint32_t value)
>  {
>      trace_source_return_path_thread_resume_ack(value);
> @@ -1680,7 +1687,8 @@ static int migrate_handle_rp_resume_ack(MigrationState *s, uint32_t value)
>      migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_RECOVER,
>                        MIGRATION_STATUS_POSTCOPY_ACTIVE);
>  
> -    /* TODO: notify send thread that time to continue send pages */
> +    /* Notify send thread that time to continue send pages */
> +    postcopy_resume_handshake_ack(s);
>  
>      return 0;
>  }
> @@ -2151,6 +2159,25 @@ static bool postcopy_should_start(MigrationState *s)
>      return atomic_read(&s->start_postcopy) || s->start_postcopy_fast;
>  }
>  
> +static int postcopy_resume_handshake(MigrationState *s)
> +{
> +    qemu_mutex_lock(&s->resume_lock);
> +
> +    qemu_savevm_send_postcopy_resume(s->to_dst_file);
> +
> +    while (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
> +        qemu_cond_wait(&s->resume_cond, &s->resume_lock);
> +    }
> +
> +    qemu_mutex_unlock(&s->resume_lock);
> +
> +    if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
> +        return 0;
> +    }

That feels a little racy - couldn't that validly become
MIGRATION_STATUS_COMPLETED before that check?

I wonder if we need to change migrate_fd_cancel to be able to
cause a cancel in this case?
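One way to shrink that window is to sample the state before dropping the mutex, so the value we act on is the one that actually ended the wait - a minimal sketch with pthread stand-ins (hypothetical names, not the real MigrationState locking, and it doesn't answer the migrate_fd_cancel question by itself):

```c
#include <pthread.h>

typedef enum { ST_RECOVER, ST_ACTIVE, ST_COMPLETED } HandshakeState;

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    HandshakeState state;
} Handshake;

/* Returns 0 on success. The state is read before the lock is
 * released, so a later transition (e.g. to ST_COMPLETED) cannot make
 * the check disagree with the condition we waited for. */
static int handshake_wait(Handshake *h)
{
    int ret;
    pthread_mutex_lock(&h->lock);
    while (h->state == ST_RECOVER) {
        pthread_cond_wait(&h->cond, &h->lock);
    }
    ret = (h->state == ST_ACTIVE) ? 0 : -1;
    pthread_mutex_unlock(&h->lock);
    return ret;
}
```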

Dave

> +    return -1;
> +}
> +
>  /* Return zero if success, or <0 for error */
>  static int postcopy_do_resume(MigrationState *s)
>  {
> @@ -2168,10 +2195,14 @@ static int postcopy_do_resume(MigrationState *s)
>      }
>  
>      /*
> -     * TODO: handshake with dest using MIG_CMD_RESUME,
> -     * MIG_RP_MSG_RESUME_ACK, then switch source state to
> -     * "postcopy-active"
> +     * Last handshake with destination on the resume (destination will
> +     * switch to postcopy-active afterwards)
>       */
> +    ret = postcopy_resume_handshake(s);
> +    if (ret) {
> +        error_report("%s: handshake failed: %d", __func__, ret);
> +        return ret;
> +    }
>  
>      return 0;
>  }
> -- 
> 2.7.4
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC 29/29] migration: reset migrate thread vars when resumed
  2017-07-28  8:06 ` [Qemu-devel] [RFC 29/29] migration: reset migrate thread vars when resumed Peter Xu
@ 2017-08-03 13:54   ` Dr. David Alan Gilbert
  2017-08-04  8:52     ` Peter Xu
  0 siblings, 1 reply; 116+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-03 13:54 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> Firstly, MigThrError enumeration is introduced to describe the error in
> migration_detect_error() better. This gives the migration_thread() a
> chance to know whether a recovery has happened.
> 
> Then, if a recovery is detected, migration_thread() will reset its local
> variables to prepare for that.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/migration.c | 40 +++++++++++++++++++++++++++++-----------
>  1 file changed, 29 insertions(+), 11 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index ecebe30..439bc22 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2159,6 +2159,15 @@ static bool postcopy_should_start(MigrationState *s)
>      return atomic_read(&s->start_postcopy) || s->start_postcopy_fast;
>  }
>  
> +typedef enum MigThrError {
> +    /* No error detected */
> +    MIG_THR_ERR_NONE = 0,
> +    /* Detected error, but resumed successfully */
> +    MIG_THR_ERR_RECOVERED = 1,
> +    /* Detected fatal error, need to exit */
> +    MIG_THR_ERR_FATAL = 2,
> +} MigThrError;
> +

Could you move this patch earlier to when postcopy_pause is created
so it's created with this enum?

>  static int postcopy_resume_handshake(MigrationState *s)
>  {
>      qemu_mutex_lock(&s->resume_lock);
> @@ -2209,10 +2218,10 @@ static int postcopy_do_resume(MigrationState *s)
>  
>  /*
>   * We don't return until we are in a safe state to continue current
> - * postcopy migration.  Returns true to continue the migration, or
> - * false to terminate current migration.
> + * postcopy migration.  Returns MIG_THR_ERR_RECOVERED if recovered, or
> + * MIG_THR_ERR_FATAL if an unrecoverable failure happened.
>   */
> -static bool postcopy_pause(MigrationState *s)
> +static MigThrError postcopy_pause(MigrationState *s)
>  {
>      assert(s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
>  
> @@ -2247,7 +2256,7 @@ do_pause:
>          if (postcopy_do_resume(s) == 0) {
>              /* Let's continue! */
>              trace_postcopy_pause_continued();
> -            return true;
> +            return MIG_THR_ERR_RECOVERED;
>          } else {
>              /*
>               * Something wrong happened during the recovery, let's
> @@ -2258,12 +2267,11 @@ do_pause:
>          }
>      } else {
>          /* This is not right... Time to quit. */
> -        return false;
> +        return MIG_THR_ERR_FATAL;
>      }
>  }
>  
> -/* Return true if we want to stop the migration, otherwise false. */
> -static bool migration_detect_error(MigrationState *s)
> +static MigThrError migration_detect_error(MigrationState *s)
>  {
>      int ret;
>  
> @@ -2272,7 +2280,7 @@ static bool migration_detect_error(MigrationState *s)
>  
>      if (!ret) {
>          /* Everything is fine */
> -        return false;
> +        return MIG_THR_ERR_NONE;
>      }
>  
>      if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE && ret == -EIO) {
> @@ -2281,7 +2289,7 @@ static bool migration_detect_error(MigrationState *s)
>           * while. After that, it can be continued by a
>           * recovery phase.
>           */
> -        return !postcopy_pause(s);
> +        return postcopy_pause(s);
>      } else {
>          /*
>           * For precopy (or postcopy with error outside IO), we fail
> @@ -2291,7 +2299,7 @@ static bool migration_detect_error(MigrationState *s)
>          trace_migration_thread_file_err();
>  
>          /* Time to stop the migration, now. */
> -        return true;
> +        return MIG_THR_ERR_FATAL;
>      }
>  }
>  
> @@ -2319,6 +2327,7 @@ static void *migration_thread(void *opaque)
>      /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */
>      enum MigrationStatus current_active_state = MIGRATION_STATUS_ACTIVE;
>      bool enable_colo = migrate_colo_enabled();
> +    MigThrError thr_error;
>  
>      rcu_register_thread();
>  
> @@ -2395,8 +2404,17 @@ static void *migration_thread(void *opaque)
>           * Try to detect any kind of failures, and see whether we
>           * should stop the migration now.
>           */
> -        if (migration_detect_error(s)) {
> +        thr_error = migration_detect_error(s);
> +        if (thr_error == MIG_THR_ERR_FATAL) {
> +            /* Stop migration */
>              break;
> +        } else if (thr_error == MIG_THR_ERR_RECOVERED) {
> +            /*
> +             * Just recovered from, e.g., a network failure, reset all
> +             * the local variables.
> +             */
> +            initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +            initial_bytes = 0;

They don't seem that important to reset?

Dave

>          }
>  
>          current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> -- 
> 2.7.4
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC 12/29] migration: allow dst vm pause on postcopy
  2017-08-02  5:06     ` Peter Xu
@ 2017-08-03 14:03       ` Dr. David Alan Gilbert
  2017-08-04  3:43         ` Peter Xu
  0 siblings, 1 reply; 116+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-03 14:03 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> On Tue, Aug 01, 2017 at 10:47:16AM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> 
> [...]
> 
> > > +/* Return true if we should continue the migration, or false. */
> > > +static bool postcopy_pause_incoming(MigrationIncomingState *mis)
> > > +{
> > > +    trace_postcopy_pause_incoming();
> > > +
> > > +    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> > > +                      MIGRATION_STATUS_POSTCOPY_PAUSED);
> > > +
> > > +    assert(mis->from_src_file);
> > > +    qemu_file_shutdown(mis->from_src_file);
> > > +    qemu_fclose(mis->from_src_file);
> > > +    mis->from_src_file = NULL;
> > > +
> > > +    assert(mis->to_src_file);
> > > +    qemu_mutex_lock(&mis->rp_mutex);
> > > +    qemu_file_shutdown(mis->to_src_file);
> > > +    qemu_fclose(mis->to_src_file);
> > > +    mis->to_src_file = NULL;
> > > +    qemu_mutex_unlock(&mis->rp_mutex);
> > 
> > Hmm is that safe?  If we look at migrate_send_rp_message we have:
> > 
> >     static void migrate_send_rp_message(MigrationIncomingState *mis,
> >                                         enum mig_rp_message_type message_type,
> >                                         uint16_t len, void *data)
> >     {
> >         trace_migrate_send_rp_message((int)message_type, len);
> >         qemu_mutex_lock(&mis->rp_mutex);
> >         qemu_put_be16(mis->to_src_file, (unsigned int)message_type);
> >         qemu_put_be16(mis->to_src_file, len);
> >         qemu_put_buffer(mis->to_src_file, data, len);
> >         qemu_fflush(mis->to_src_file);
> >         qemu_mutex_unlock(&mis->rp_mutex);
> >     }
> > 
> > If we came into postcopy_pause_incoming at about the same time
> > migrate_send_rp_message was being called and pause_incoming took the
> > lock first, then once it release the lock, send_rp_message carries on
> > and uses mis->to_src_file that's now NULL.
> > 
> > One solution here is to just call qemu_file_shutdown() but leave the
> > files open at this point, but clean the files up sometime later.
> 
> I see the comment on patch 14 as well - yeah, we need patch 14 to
> co-op here, and as long as we are with patch 14, we should be ok.
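(For reference, the "shutdown now, close later" idea looks roughly like this - stand-in types rather than the real QEMUFile, and the quiescent point is assumed, not shown:)

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for QEMUFile: shutdown makes further I/O fail but keeps
 * the object valid for concurrent users still holding a pointer. */
typedef struct FileStub {
    bool shut_down;
} FileStub;

typedef struct Incoming {
    pthread_mutex_t rp_mutex;
    FileStub *to_src_file;
} Incoming;

/* On network failure: only shut down under the lock; keep the
 * pointer, so a racing sender never dereferences NULL. */
static void pause_shutdown(Incoming *mis)
{
    pthread_mutex_lock(&mis->rp_mutex);
    if (mis->to_src_file) {
        mis->to_src_file->shut_down = true;
    }
    pthread_mutex_unlock(&mis->rp_mutex);
}

/* Later, at a quiescent point where no sender can race: free it. */
static void cleanup_file(Incoming *mis)
{
    pthread_mutex_lock(&mis->rp_mutex);
    free(mis->to_src_file);
    mis->to_src_file = NULL;
    pthread_mutex_unlock(&mis->rp_mutex);
}
```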
> 
> > 
> > > +
> > > +    while (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
> > > +        qemu_sem_wait(&mis->postcopy_pause_sem_dst);
> > > +    }
> > > +
> > > +    trace_postcopy_pause_incoming_continued();
> > > +
> > > +    return true;
> > > +}
> > > +
> > >  static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
> > >  {
> > >      uint8_t section_type;
> > >      int ret = 0;
> > >  
> > > +retry:
> > >      while (true) {
> > >          section_type = qemu_get_byte(f);
> > >  
> > > @@ -2004,6 +2034,21 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
> > >  out:
> > >      if (ret < 0) {
> > >          qemu_file_set_error(f, ret);
> > > +
> > > +        /*
> > > +         * Detect whether it is:
> > > +         *
> > > +         * 1. postcopy running
> > > +         * 2. network failure (-EIO)
> > > +         *
> > > +         * If so, we try to wait for a recovery.
> > > +         */
> > > +        if (mis->state == MIGRATION_STATUS_POSTCOPY_ACTIVE &&
> > > +            ret == -EIO && postcopy_pause_incoming(mis)) {
> > > +            /* Reset f to point to the newly created channel */
> > > +            f = mis->from_src_file;
> > > +            goto retry;
> > > +        }
> > 
> > I wonder if:
> > 
> >            if (mis->state == MIGRATION_STATUS_POSTCOPY_ACTIVE &&
> >                ret == -EIO && postcopy_pause_incoming(mis)) {
> >                /* Try again after postcopy recovery */
> >                return qemu_loadvm_state_main(mis->from_src_file, mis);
> >            }
> > would be nicer; it avoids the goto loop.
> 
> I agree we should avoid using goto loops. However, I do see many uses
> of goto like this one, where we want to redo part of a function's
> procedure (or, of course, when handling errors C-style).

We mostly use them to jump forward to an error exit; only rarely do
we do loops with them;  so if we can sensibly avoid them it's best.

> Calling qemu_loadvm_state_main() inside itself is ok as well, but it
> also has a defect: stack usage would be out of control, or could even
> be controlled by malicious users. E.g., if someone used a program to
> periodically stop/start any network endpoint along the migration
> network, QEMU could go into a paused -> recovery -> active -> paused ...
> loop, and stack usage would just grow over time. I'd say it's an
> extreme example though...

I think it's safe because it's a tail-call so a new stack frame isn't
needed.

> (Another way besides above two: maybe we can just return in
>  qemu_loadvm_state_main with something like -EAGAIN, then the caller
>  of qemu_loadvm_state_main can re-call it when necessary, though I
>  would prefer "goto is okay here"... :)

Dave

> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery
  2017-07-28  8:06 [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
                   ` (29 preceding siblings ...)
  2017-07-28 10:06 ` [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
@ 2017-08-03 15:57 ` Dr. David Alan Gilbert
  2017-08-21  7:47   ` Peter Xu
  30 siblings, 1 reply; 116+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-03 15:57 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli, berrange

* Peter Xu (peterx@redhat.com) wrote:
> As we all know, postcopy migration carries a potential risk of losing
> the VM if the network breaks during the migration. This series
> tries to solve the problem by allowing the migration to pause at the
> failure point, and do recovery after the link is reconnected.
> 
> There was existing work on this issue from Md Haris Iqbal:
> 
> https://lists.nongnu.org/archive/html/qemu-devel/2016-08/msg03468.html
> 
> This series is a totally re-work of the issue, based on Alexey
> Perevalov's recved bitmap v8 series:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg06401.html


Hi Peter,
  See my comments on the individual patches; but at a top level I think
it looks pretty good.

  I still worry about two related things, one I see is similar to what
you discussed with Dan.

  1) What happens if we end up hanging on a missing page with the bql
  taken, so we can't use the monitor.
  Checking my notes from when I was chatting to Harris last year,
    'info cpu' was pretty good at doing this because it needed the vcpus
  to come out of their loops, so if any vcpu was blocked on memory we'd
  block waiting.  The other case is where an emulated IO device accesses
  it, and that's easiest by doing a migrate with inbound network
  traffic.
  In this case, will your 'accept' still work?

  2) Similar to Dan's question: what happens if the network just hangs
  as opposed to giving an error?  It should eventually sort itself out
  with TCP timeouts - eventually.  Perhaps the easiest way to test this
  is just to add a iptables -j DROP  for the migration port - it's
  probably easier to trigger (1).


  Solving (1) is tricky - I'm not sure that it needs solving for a first
attempt as long as we have some ideas.

Dave

> Two new status are added to support the migration (used on both
> sides):
> 
>   MIGRATION_STATUS_POSTCOPY_PAUSED
>   MIGRATION_STATUS_POSTCOPY_RECOVER
> 
> The MIGRATION_STATUS_POSTCOPY_PAUSED state will be set when the
> network failure is detected. It is a phase that we'll be in for a long
> time as long as the failure is detected, and we'll be there until a
> recovery is triggered.  In this state, all the threads (on source:
> send thread, return-path thread; destination: ram-load thread,
> page-fault thread) will be halted.
> 
> The MIGRATION_STATUS_POSTCOPY_RECOVER state is short. If we triggered
> a recovery, both source/destination VM will jump into this stage, do
> whatever it needs to prepare the recovery (e.g., currently the most
> important thing is to synchronize the dirty bitmap, please see commit
> messages for more information). After the preparation is ready, the
> source will do the final handshake with destination, then both sides
> will switch back to MIGRATION_STATUS_POSTCOPY_ACTIVE again.
> 
> New commands/messages are defined as well to satisfy the need:
> 
> MIG_CMD_RECV_BITMAP & MIG_RP_MSG_RECV_BITMAP are introduced for
> delivering received bitmaps
> 
> MIG_CMD_RESUME & MIG_RP_MSG_RESUME_ACK are introduced to do the final
> handshake of postcopy recovery.
> 
> Here's some more details on how the whole failure/recovery routine is
> happened:
> 
> - start migration
> - ... (switch from precopy to postcopy)
> - both sides are in "postcopy-active" state
> - ... (failure happened, e.g., network unplugged)
> - both sides switch to "postcopy-paused" state
>   - all the migration threads are stopped on both sides
> - ... (both VMs hanged)
> - ... (user triggers recovery using "migrate -r -d tcp:HOST:PORT" on
>   source side, "-r" means "recover")
> - both sides switch to "postcopy-recover" state
>   - on source: send-thread, return-path-thread will be woken up
>   - on dest: ram-load-thread woken up, fault-thread still paused
> - source calls new savevmhandler hook resume_prepare() (currently,
>   only ram is providing the hook):
>   - ram_resume_prepare(): for each ramblock, fetch recved bitmap by:
>     - src sends MIG_CMD_RECV_BITMAP to dst
>     - dst replies MIG_RP_MSG_RECV_BITMAP to src, with bitmap data
>       - src uses the recved bitmap to rebuild dirty bitmap
> - source do final handshake with destination
>   - src sends MIG_CMD_RESUME to dst, telling "src is ready"
>     - when dst receives the command, the fault thread will be woken up,
>       meanwhile, dst switches back to "postcopy-active"
>   - dst sends MIG_RP_MSG_RESUME_ACK to src, telling "dst is ready"
>     - when src receives the ack, state switch to "postcopy-active"
> - postcopy migration continued
> 
> Testing:
> 
> As I said, it's still an extremely simple test. I used socat to create
> a socket bridge:
> 
>   socat tcp-listen:6666 tcp-connect:localhost:5555 &
> 
> Then do the migration via the bridge. I emulated the network failure
> by killing the socat process (bridge down), then tries to recover the
> migration using the other channel (default dst channel). It looks
> like:
> 
>         port:6666    +------------------+
>         +----------> | socat bridge [1] |-------+
>         |            +------------------+       |
>         |         (Original channel)            |
>         |                                       | port: 5555
>      +---------+  (Recovery channel)            +--->+---------+
>      | src VM  |------------------------------------>| dst VM  |
>      +---------+                                     +---------+
> 
> Known issues/notes:
> 
> - currently the destination listening port still cannot change, i.e.,
>   the recovery must use the same port on the destination for
>   simplicity. (on the source, we can specify a new URL)
> 
> - the patch: "migration: let dst listen on port always" is still
>   hacky, it just kept the incoming accept open forever for now...
> 
> - some migration numbers might still be inaccurate, like total
>   migration time, etc. (But I don't really think that matters much
>   now)
> 
> - the patches are very lightly tested.
> 
> - Dave reported one problem that may hang the destination main loop
>   thread (one vcpu thread holds the BQL) and the rest. I haven't
>   encountered it yet, but that does not mean this series can survive it.
> 
> - other potential issues that I may have forgotten or unnoticed...
> 
> Anyway, the work is still in preliminary stage. Any suggestions and
> comments are greatly welcomed.  Thanks.
> 
> Peter Xu (29):
>   migration: fix incorrect postcopy recved_bitmap
>   migration: fix comment disorder in RAMState
>   io: fix qio_channel_socket_accept err handling
>   bitmap: introduce bitmap_invert()
>   bitmap: introduce bitmap_count_one()
>   migration: dump str in migrate_set_state trace
>   migration: better error handling with QEMUFile
>   migration: reuse mis->userfault_quit_fd
>   migration: provide postcopy_fault_thread_notify()
>   migration: new property "x-postcopy-fast"
>   migration: new postcopy-pause state
>   migration: allow dst vm pause on postcopy
>   migration: allow src return path to pause
>   migration: allow send_rq to fail
>   migration: allow fault thread to pause
>   qmp: hmp: add migrate "resume" option
>   migration: rebuild channel on source
>   migration: new state "postcopy-recover"
>   migration: let dst listen on port always
>   migration: wakeup dst ram-load-thread for recover
>   migration: new cmd MIG_CMD_RECV_BITMAP
>   migration: new message MIG_RP_MSG_RECV_BITMAP
>   migration: new cmd MIG_CMD_POSTCOPY_RESUME
>   migration: new message MIG_RP_MSG_RESUME_ACK
>   migration: introduce SaveVMHandlers.resume_prepare
>   migration: synchronize dirty bitmap for resume
>   migration: setup ramstate for resume
>   migration: final handshake for the resume
>   migration: reset migrate thread vars when resumed
> 
>  hmp-commands.hx              |   7 +-
>  hmp.c                        |   4 +-
>  include/migration/register.h |   2 +
>  include/qemu/bitmap.h        |  20 ++
>  io/channel-socket.c          |   1 +
>  migration/exec.c             |   2 +-
>  migration/fd.c               |   2 +-
>  migration/migration.c        | 465 ++++++++++++++++++++++++++++++++++++++++---
>  migration/migration.h        |  25 ++-
>  migration/postcopy-ram.c     | 109 +++++++---
>  migration/postcopy-ram.h     |   2 +
>  migration/ram.c              | 209 ++++++++++++++++++-
>  migration/ram.h              |   4 +
>  migration/savevm.c           | 189 +++++++++++++++++-
>  migration/savevm.h           |   3 +
>  migration/socket.c           |   4 +-
>  migration/trace-events       |  16 +-
>  qapi-schema.json             |  12 +-
>  util/bitmap.c                |  28 +++
>  19 files changed, 1024 insertions(+), 80 deletions(-)
> 
> -- 
> 2.7.4
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
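The failure/recovery flow quoted above is essentially a three-state machine around postcopy-active. A minimal sketch of the legal transitions it describes (hypothetical names, not QEMU's MigrationStatus values; the real checks live in migrate_set_state()):

```c
#include <stdbool.h>

typedef enum {
    PC_ACTIVE,   /* postcopy-active */
    PC_PAUSED,   /* postcopy-paused: network failure detected */
    PC_RECOVER,  /* postcopy-recover: handshake in progress */
} PCState;

/* Only the transitions described in the recovery flow are legal:
 * active -> paused (failure), paused -> recover (user triggers
 * recovery), recover -> active (final handshake succeeds). */
static bool transition_ok(PCState from, PCState to)
{
    switch (from) {
    case PC_ACTIVE:  return to == PC_PAUSED;
    case PC_PAUSED:  return to == PC_RECOVER;
    case PC_RECOVER: return to == PC_ACTIVE;
    }
    return false;
}
```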


* Re: [Qemu-devel] [RFC 12/29] migration: allow dst vm pause on postcopy
  2017-08-03 14:03       ` Dr. David Alan Gilbert
@ 2017-08-04  3:43         ` Peter Xu
  2017-08-04  9:33           ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-08-04  3:43 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

On Thu, Aug 03, 2017 at 03:03:57PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Tue, Aug 01, 2017 at 10:47:16AM +0100, Dr. David Alan Gilbert wrote:
> > > * Peter Xu (peterx@redhat.com) wrote:
> > 
> > [...]
> > 
> > > > +/* Return true if we should continue the migration, or false. */
> > > > +static bool postcopy_pause_incoming(MigrationIncomingState *mis)
> > > > +{
> > > > +    trace_postcopy_pause_incoming();
> > > > +
> > > > +    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> > > > +                      MIGRATION_STATUS_POSTCOPY_PAUSED);
> > > > +
> > > > +    assert(mis->from_src_file);
> > > > +    qemu_file_shutdown(mis->from_src_file);
> > > > +    qemu_fclose(mis->from_src_file);
> > > > +    mis->from_src_file = NULL;
> > > > +
> > > > +    assert(mis->to_src_file);
> > > > +    qemu_mutex_lock(&mis->rp_mutex);
> > > > +    qemu_file_shutdown(mis->to_src_file);
> > > > +    qemu_fclose(mis->to_src_file);
> > > > +    mis->to_src_file = NULL;
> > > > +    qemu_mutex_unlock(&mis->rp_mutex);
> > > 
> > > Hmm is that safe?  If we look at migrate_send_rp_message we have:
> > > 
> > >     static void migrate_send_rp_message(MigrationIncomingState *mis,
> > >                                         enum mig_rp_message_type message_type,
> > >                                         uint16_t len, void *data)
> > >     {
> > >         trace_migrate_send_rp_message((int)message_type, len);
> > >         qemu_mutex_lock(&mis->rp_mutex);
> > >         qemu_put_be16(mis->to_src_file, (unsigned int)message_type);
> > >         qemu_put_be16(mis->to_src_file, len);
> > >         qemu_put_buffer(mis->to_src_file, data, len);
> > >         qemu_fflush(mis->to_src_file);
> > >         qemu_mutex_unlock(&mis->rp_mutex);
> > >     }
> > > 
> > > If we came into postcopy_pause_incoming at about the same time
> > > migrate_send_rp_message was being called and pause_incoming took the
> > > lock first, then once it release the lock, send_rp_message carries on
> > > and uses mis->to_src_file that's now NULL.
> > > 
> > > One solution here is to just call qemu_file_shutdown() but leave the
> > > files open at this point, but clean the files up sometime later.
> > 
> > I see the comment on patch 14 as well - yeah, we need patch 14 to
> > co-op here, and as long as we are with patch 14, we should be ok.
> > 
> > > 
> > > > +
> > > > +    while (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
> > > > +        qemu_sem_wait(&mis->postcopy_pause_sem_dst);
> > > > +    }
> > > > +
> > > > +    trace_postcopy_pause_incoming_continued();
> > > > +
> > > > +    return true;
> > > > +}
> > > > +
> > > >  static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
> > > >  {
> > > >      uint8_t section_type;
> > > >      int ret = 0;
> > > >  
> > > > +retry:
> > > >      while (true) {
> > > >          section_type = qemu_get_byte(f);
> > > >  
> > > > @@ -2004,6 +2034,21 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
> > > >  out:
> > > >      if (ret < 0) {
> > > >          qemu_file_set_error(f, ret);
> > > > +
> > > > +        /*
> > > > +         * Detect whether it is:
> > > > +         *
> > > > +         * 1. postcopy running
> > > > +         * 2. network failure (-EIO)
> > > > +         *
> > > > +         * If so, we try to wait for a recovery.
> > > > +         */
> > > > +        if (mis->state == MIGRATION_STATUS_POSTCOPY_ACTIVE &&
> > > > +            ret == -EIO && postcopy_pause_incoming(mis)) {
> > > > +            /* Reset f to point to the newly created channel */
> > > > +            f = mis->from_src_file;
> > > > +            goto retry;
> > > > +        }
> > > 
> > > I wonder if:
> > > 
> > >            if (mis->state == MIGRATION_STATUS_POSTCOPY_ACTIVE &&
> > >                ret == -EIO && postcopy_pause_incoming(mis)) {
> > >                /* Try again after postcopy recovery */
> > >                return qemu_loadvm_state_main(mis->from_src_file, mis);
> > >            }
> > > would be nicer; it avoids the goto loop.
> > 
> > I agree we should avoid using goto loops. However, I do see many uses
> > of goto like this one, where we want to redo part of a function's
> > procedure (or, of course, when handling errors C-style).
> 
> We mostly use them to jump forward to an error exit; only rarely do
> we do loops with them;  so if we can sensibly avoid them it's best.
> 
> > Calling qemu_loadvm_state_main() inside itself is ok as well, but it
> > also has a defect: stack usage would be unbounded, and could even be
> > controlled by malicious users. E.g., if someone used a program to
> > periodically stop/start a network endpoint along the migration path,
> > QEMU could go into a paused -> recovery -> active -> paused ...
> > loop, and stack usage would just grow over time. I'd say it's an
> > extreme example though...
> 
> I think it's safe because it's a tail-call so a new stack frame isn't
> needed.

I tried it and dumped the assembly; it looks like even with the tail
call we didn't really avoid the "callq":

(gdb) disassemble qemu_loadvm_state_main
Dump of assembler code for function qemu_loadvm_state_main:
   0x00000000005d9ff8 <+0>:     push   %rbp
   0x00000000005d9ff9 <+1>:     mov    %rsp,%rbp
   0x00000000005d9ffc <+4>:     sub    $0x20,%rsp
   0x00000000005da000 <+8>:     mov    %rdi,-0x18(%rbp)
   0x00000000005da004 <+12>:    mov    %rsi,-0x20(%rbp)
   0x00000000005da008 <+16>:    movl   $0x0,-0x4(%rbp)
   0x00000000005da00f <+23>:    mov    -0x18(%rbp),%rax
   0x00000000005da013 <+27>:    mov    %rax,%rdi
   0x00000000005da016 <+30>:    callq  0x5e185e <qemu_get_byte>

[...]

   0x00000000005da135 <+317>:   jne    0x5da165 <qemu_loadvm_state_main+365>
   0x00000000005da137 <+319>:   cmpl   $0xfffffffb,-0x4(%rbp)
   0x00000000005da13b <+323>:   jne    0x5da165 <qemu_loadvm_state_main+365>
   0x00000000005da13d <+325>:   mov    -0x20(%rbp),%rax
   0x00000000005da141 <+329>:   mov    %rax,%rdi
   0x00000000005da144 <+332>:   callq  0x5d9eb4 <postcopy_pause_incoming>
   0x00000000005da149 <+337>:   test   %al,%al
   0x00000000005da14b <+339>:   je     0x5da165 <qemu_loadvm_state_main+365>
   0x00000000005da14d <+341>:   mov    -0x20(%rbp),%rax
   0x00000000005da151 <+345>:   mov    (%rax),%rax
   0x00000000005da154 <+348>:   mov    -0x20(%rbp),%rdx
   0x00000000005da158 <+352>:   mov    %rdx,%rsi
   0x00000000005da15b <+355>:   mov    %rax,%rdi
   0x00000000005da15e <+358>:   callq  0x5d9ff8 <qemu_loadvm_state_main>
                                ^^^^^^^^^^^^^^^ (this one)
   0x00000000005da163 <+363>:   jmp    0x5da168 <qemu_loadvm_state_main+368>
   0x00000000005da165 <+365>:   mov    -0x4(%rbp),%eax
   0x00000000005da168 <+368>:   leaveq
   0x00000000005da169 <+369>:   retq

Do we need extra compilation flags to get gcc to apply the tail-call
optimization? My gcc version is 6.1.1 20160621.

(Even with extra flags, I am still a bit worried about whether it will
 work on other compilers, though.)

And the "label way" to retry is indeed widely used in both QEMU and
the Linux kernel. I tried to grep for "^retry:" directly (so we are
ignoring the same usage under different label names); there are ~30
usages in QEMU and hundreds in the Linux kernel. So I am not sure
whether this can be seen as another "legal" way to use C labels...
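The retry-label versus recursion trade-off above can be illustrated with a
small self-contained sketch. The helpers below merely simulate the migration
stream (they are stand-ins, not QEMU code), but they show how an explicit
loop keeps stack depth constant no matter how often the link flaps:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the real migration machinery. */
static int fail_budget;            /* how many simulated -EIO errors remain */

static int load_section(void)      /* models the qemu_loadvm_state_main body */
{
    if (fail_budget > 0) {
        fail_budget--;
        return -5;                 /* stands in for -EIO */
    }
    return 0;
}

static bool pause_and_recover(void)/* models postcopy_pause_incoming() */
{
    return true;                   /* assume the channel always comes back */
}

/*
 * The goto-retry pattern from the patch, written as an explicit loop:
 * unlike the recursive variant, stack usage stays constant even if the
 * link flaps forever.
 */
static int load_with_recovery(void)
{
    for (;;) {
        int ret = load_section();

        if (ret == -5 && pause_and_recover()) {
            continue;              /* re-read from the re-attached channel */
        }
        return ret;
    }
}

int run_demo(void)
{
    fail_budget = 3;               /* three simulated network failures */
    return load_with_recovery();
}
```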

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC 20/29] migration: wakeup dst ram-load-thread for recover
  2017-08-03  9:28   ` Dr. David Alan Gilbert
@ 2017-08-04  5:46     ` Peter Xu
  0 siblings, 0 replies; 116+ messages in thread
From: Peter Xu @ 2017-08-04  5:46 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

On Thu, Aug 03, 2017 at 10:28:20AM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On the destination side, we cannot wake up all the threads when we got
> > reconnected. The first thing to do is to wake up the main load thread,
> > so that we can continue to receive valid messages from source again and
> > reply when needed.
> > 
> > At this point, we switch the destination VM state from postcopy-paused
> > back to postcopy-recover.
> > 
> > Now we are finally ready to do the resume logic.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  migration/migration.c | 34 +++++++++++++++++++++++++++++++---
> >  1 file changed, 31 insertions(+), 3 deletions(-)
> > 
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 3aabe11..e498fa4 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -389,10 +389,38 @@ static void process_incoming_migration_co(void *opaque)
> >  
> >  void migration_fd_process_incoming(QEMUFile *f)
> >  {
> > -    Coroutine *co = qemu_coroutine_create(process_incoming_migration_co, f);
> > +    MigrationIncomingState *mis = migration_incoming_get_current();
> > +    Coroutine *co;
> > +
> > +    mis->from_src_file = f;
> > +
> > +    if (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
> > +        /* Resumed migration to postcopy state */

(I guess here it should be "Resumed from a paused postcopy migration")

> > +
> > +        /* Postcopy has standalone thread to do vm load */
> > +        qemu_file_set_blocking(f, true);
> > +
> > +        /* Re-configure the return path */
> > +        mis->to_src_file = qemu_file_get_return_path(f);
> >  
> > -    qemu_file_set_blocking(f, false);
> > -    qemu_coroutine_enter(co);
> > +        /* Reset the migration status to postcopy-active */
> > +        migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_PAUSED,
> > +                          MIGRATION_STATUS_POSTCOPY_RECOVER);
> 
> The comment doesn't match the code.

Indeed. I'll remove the comment since the code explains itself.

> 
> Other than that;
> 
> 
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Thanks,

-- 
Peter Xu


* Re: [Qemu-devel] [RFC 21/29] migration: new cmd MIG_CMD_RECV_BITMAP
  2017-08-03  9:49   ` Dr. David Alan Gilbert
@ 2017-08-04  6:08     ` Peter Xu
  2017-08-04  6:15       ` Peter Xu
  0 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-08-04  6:08 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

On Thu, Aug 03, 2017 at 10:49:02AM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > Add a new vm command MIG_CMD_RECV_BITMAP to request received bitmap for
> > one ramblock.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  migration/savevm.c     | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >  migration/savevm.h     |  1 +
> >  migration/trace-events |  1 +
> >  3 files changed, 61 insertions(+)
> > 
> > diff --git a/migration/savevm.c b/migration/savevm.c
> > index 386788d..0ab13c0 100644
> > --- a/migration/savevm.c
> > +++ b/migration/savevm.c
> > @@ -78,6 +78,7 @@ enum qemu_vm_cmd {
> >                                        were previously sent during
> >                                        precopy but are dirty. */
> >      MIG_CMD_PACKAGED,          /* Send a wrapped stream within this stream */
> > +    MIG_CMD_RECV_BITMAP,       /* Request for recved bitmap on dst */
> >      MIG_CMD_MAX
> >  };
> >  
> > @@ -95,6 +96,7 @@ static struct mig_cmd_args {
> >      [MIG_CMD_POSTCOPY_RAM_DISCARD] = {
> >                                     .len = -1, .name = "POSTCOPY_RAM_DISCARD" },
> >      [MIG_CMD_PACKAGED]         = { .len =  4, .name = "PACKAGED" },
> > +    [MIG_CMD_RECV_BITMAP]      = { .len = -1, .name = "RECV_BITMAP" },
> >      [MIG_CMD_MAX]              = { .len = -1, .name = "MAX" },
> >  };
> >  
> > @@ -929,6 +931,19 @@ void qemu_savevm_send_postcopy_run(QEMUFile *f)
> >      qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_RUN, 0, NULL);
> >  }
> >  
> > +void qemu_savevm_send_recv_bitmap(QEMUFile *f, char *block_name)
> > +{
> > +    size_t len;
> > +    char buf[512];
> 
> Only needs to be 256 bytes?

Yes, it is.

Actually, I guess I should use dynamic allocation, since 256 bakes in
an assumption about the block_name size.

[...]

> > +static int loadvm_handle_recv_bitmap(MigrationIncomingState *mis,
> > +                                     uint16_t len)
> > +{
> > +    QEMUFile *file = mis->from_src_file;
> > +    RAMBlock *rb;
> > +    char block_name[256];
> > +    size_t cnt;
> > +
> > +    cnt = qemu_get_counted_string(file, block_name);
> > +    if (!cnt) {
> > +        error_report("%s: failed to read block name", __func__);
> > +        return -EINVAL;
> > +    }
> > +
> > +    /* Validate before using the data */
> > +    if (qemu_file_get_error(file)) {
> > +        return qemu_file_get_error(file);
> > +    }
> > +
> > +    if (len != cnt + 1) {
> > +        error_report("%s: invalid payload length (%d)", __func__, len);
> > +        return -EINVAL;
> > +    }
> > +
> > +    rb = qemu_ram_block_by_name(block_name);
> > +    if (!rb) {
> > +        error_report("%s: block '%s' not found", __func__, block_name);
> > +        return -EINVAL;
> > +    }
> > +
> > +    /* TODO: send the bitmap back to source */
> 
> Probably worth adding a trace in this function somewhere.

Will do.

> 
> Other than that;
> 
> 
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Thanks!

-- 
Peter Xu


* Re: [Qemu-devel] [RFC 21/29] migration: new cmd MIG_CMD_RECV_BITMAP
  2017-08-04  6:08     ` Peter Xu
@ 2017-08-04  6:15       ` Peter Xu
  0 siblings, 0 replies; 116+ messages in thread
From: Peter Xu @ 2017-08-04  6:15 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

On Fri, Aug 04, 2017 at 02:08:33PM +0800, Peter Xu wrote:
> On Thu, Aug 03, 2017 at 10:49:02AM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > Add a new vm command MIG_CMD_RECV_BITMAP to request received bitmap for
> > > one ramblock.
> > > 
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > ---
> > >  migration/savevm.c     | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  migration/savevm.h     |  1 +
> > >  migration/trace-events |  1 +
> > >  3 files changed, 61 insertions(+)
> > > 
> > > diff --git a/migration/savevm.c b/migration/savevm.c
> > > index 386788d..0ab13c0 100644
> > > --- a/migration/savevm.c
> > > +++ b/migration/savevm.c
> > > @@ -78,6 +78,7 @@ enum qemu_vm_cmd {
> > >                                        were previously sent during
> > >                                        precopy but are dirty. */
> > >      MIG_CMD_PACKAGED,          /* Send a wrapped stream within this stream */
> > > +    MIG_CMD_RECV_BITMAP,       /* Request for recved bitmap on dst */
> > >      MIG_CMD_MAX
> > >  };
> > >  
> > > @@ -95,6 +96,7 @@ static struct mig_cmd_args {
> > >      [MIG_CMD_POSTCOPY_RAM_DISCARD] = {
> > >                                     .len = -1, .name = "POSTCOPY_RAM_DISCARD" },
> > >      [MIG_CMD_PACKAGED]         = { .len =  4, .name = "PACKAGED" },
> > > +    [MIG_CMD_RECV_BITMAP]      = { .len = -1, .name = "RECV_BITMAP" },
> > >      [MIG_CMD_MAX]              = { .len = -1, .name = "MAX" },
> > >  };
> > >  
> > > @@ -929,6 +931,19 @@ void qemu_savevm_send_postcopy_run(QEMUFile *f)
> > >      qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_RUN, 0, NULL);
> > >  }
> > >  
> > > +void qemu_savevm_send_recv_bitmap(QEMUFile *f, char *block_name)
> > > +{
> > > +    size_t len;
> > > +    char buf[512];
> > 
> > Only needs to be 256 bytes?
> 
> Yes, it is.
> 
> Even, I guess I should use dynamic allocation, since 256 has the
> assumption of block_name size.

Oh wait - we will put the length into the first byte, which cannot be
bigger than 255... I'll use 256 directly, then I guess I can keep the
r-b.   Thanks,
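For illustration, the counted-string framing discussed here (one length
byte, hence the 255-character cap) can be sketched as follows; these
helpers are illustrative stand-ins, not QEMU's actual functions:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Encode: one length byte followed by the string bytes (no NUL).
 * Because the length lives in a single byte, names are capped at 255. */
static size_t put_counted_string(uint8_t *buf, const char *s)
{
    size_t len = strlen(s);

    assert(len <= 255);
    buf[0] = (uint8_t)len;
    memcpy(buf + 1, s, len);
    return len + 1;                 /* bytes written to the wire */
}

/* Decode into a caller-provided buffer of at least 256 bytes. */
static size_t get_counted_string(char *out, const uint8_t *buf)
{
    size_t len = buf[0];

    memcpy(out, buf + 1, len);
    out[len] = '\0';
    return len;
}
```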

-- 
Peter Xu


* Re: [Qemu-devel] [RFC 22/29] migration: new message MIG_RP_MSG_RECV_BITMAP
  2017-08-03 10:50   ` Dr. David Alan Gilbert
@ 2017-08-04  6:59     ` Peter Xu
  2017-08-04  9:49       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-08-04  6:59 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

On Thu, Aug 03, 2017 at 11:50:22AM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > Introducing new return path message MIG_RP_MSG_RECV_BITMAP to send
> > received bitmap of ramblock back to source.
> > 
> > This is the reply message of MIG_CMD_RECV_BITMAP, it contains not only
> > the header (including the ramblock name), and it was appended with the
> > whole ramblock received bitmap on the destination side.
> > 
> > When the source receives such a reply message (MIG_RP_MSG_RECV_BITMAP),
> > it parses it, convert it to the dirty bitmap by reverting the bits.
> 
> Inverting not reverting?

Oops.  Sorry for my poor English!

[...]

> > +void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
> > +                                 char *block_name)
> > +{
> > +    char buf[512];
> > +    int len;
> > +    int64_t res;
> > +
> > +    /*
> > +     * First, we send the header part. It contains only the len of
> > +     * idstr, and the idstr itself.
> > +     */
> > +    len = strlen(block_name);
> > +    buf[0] = len;
> > +    memcpy(buf + 1, block_name, len);
> > +
> > +    migrate_send_rp_message(mis, MIG_RP_MSG_RECV_BITMAP, len + 1, buf);
> > +
> > +    /*
> > +     * Next, we dump the received bitmap to the stream.
> > +     *
> > +     * TODO: currently we are safe since we are the only one that is
> > +     * using the to_src_file handle (fault thread is still paused),
> > +     * and it's ok even not taking the mutex. However the best way is
> > +     * to take the lock before sending the message header, and release
> > +     * the lock after sending the bitmap.
> > +     */
> 
> Should we be checking the state?

Sure.  I can add an assertion.

> 
> > +    qemu_mutex_lock(&mis->rp_mutex);
> > +    res = ramblock_recv_bitmap_send(mis->to_src_file, block_name);
> > +    qemu_mutex_unlock(&mis->rp_mutex);
> > +
> > +    trace_migrate_send_rp_recv_bitmap(block_name, res);
> 
> OK, that's a little unusual - I don't think we've got anywhere else
> where the data for the rp_ message isn't in the call to
> migrate_send_rp_message.
> (Another way to structure it would be to make each message send a chunk
> of bitmap; but lets stick with this structure for now)

Yeah, but I thought it was unnecessary complexity, so I didn't do that.

> 
> Can you add, either here or in ramblock_recv_bitmap_send an 'end marker'
> on the bitmap data; just a (non-0) known value byte that would help us
> check if we had a mess where things got misaligned.

Of course. Yes, the length at the beginning may not be enough; an end
marker looks safer.
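A minimal sketch of that suggestion (a known non-zero byte after the bitmap
payload so a misaligned stream is caught early); the sentinel value and
helper names here are illustrative, not what QEMU eventually uses:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BITMAP_END_MARK 0x5a    /* illustrative sentinel, non-zero */

/* Append the bitmap bytes plus the end marker; returns bytes written. */
static size_t send_bitmap(uint8_t *wire, const uint8_t *bmap, size_t size)
{
    memcpy(wire, bmap, size);
    wire[size] = BITMAP_END_MARK;
    return size + 1;
}

/* Returns 0 on success, -1 if the stream looks misaligned. */
static int recv_bitmap(uint8_t *bmap, const uint8_t *wire, size_t size)
{
    memcpy(bmap, wire, size);
    return wire[size] == BITMAP_END_MARK ? 0 : -1;
}
```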

[...]

> >  /*
> > + * Format: bitmap_size (8 bytes) + whole_bitmap (N bytes).
> > + *
> > + * Returns >0 if success with sent bytes, or <0 if error.
> > + */
> > +int64_t ramblock_recv_bitmap_send(QEMUFile *file, char *block_name)
> > +{
> > +    RAMBlock *block = qemu_ram_block_by_name(block_name);
> > +    uint64_t size;
> > +
> > +    /* We should have made sure that the block exists */
> > +    assert(block);
> 
> Best not to make it assert; just make it fail - the block name is
> coming off the wire anyway.
> (Also can we make it a const char *block_name)

Okay.

> 
> > +    /* Size of the bitmap, in bytes */
> > +    size = (block->max_length >> TARGET_PAGE_BITS) / 8;
> > +    qemu_put_be64(file, size);
> > +    qemu_put_buffer(file, (const uint8_t *)block->receivedmap, size);
> 
> Do we need to be careful about endianness and length of long here?
> The migration stream can (theoretically) migrate between hosts of
> different endianness, e.g. a Power LE and Power BE host it can also
> migrate between a 32bit and 64bit host where the 'long' used in our
> bitmap is a different length.

Ah, good catch...

I feel like we had better provide new bitmap helpers for cases where a
bitmap will be sent elsewhere, like:

  void bitmap_to_le(unsigned long *dst, const unsigned long *src,
                    long nbits);
  void bitmap_from_le(unsigned long *dst, const unsigned long *src,
                      long nbits);

I used little endian since I *think* that should work even across
32/64-bit machines (while I think big endian would not).

> I think that means you have to save it as a series of long's;
> and also just make sure 'size' is a multiple of 'long' - otherwise
> you lose the last few bytes, which on a big endian system would
> be a problem.

Yeah, then the size should possibly be pre-cooked with
BITS_TO_LONGS(). However, that's slightly tricky as well; maybe I
should provide another bitmap helper:

  static inline long bitmap_size(long nbits)
  {
      return BITS_TO_LONGS(nbits) * sizeof(unsigned long);
  }

Since the whole thing should be part of the bitmap APIs, IMHO.
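For illustration, a byte-granular little-endian serialization along these
lines would give identical wire output on 32-bit/64-bit and big/little-endian
hosts alike; the helper names are hypothetical, not the actual QEMU API:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BITS_PER_LONG ((long)(8 * sizeof(unsigned long)))

/* Serialize bit i of the bitmap into bit (i % 8) of wire byte (i / 8),
 * so the wire format is independent of host endianness and word size.
 * The caller must zero `dst` first. */
static void bitmap_to_le_bytes(uint8_t *dst, const unsigned long *src,
                               long nbits)
{
    for (long i = 0; i < nbits; i++) {
        if (src[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG))) {
            dst[i / 8] |= (uint8_t)(1u << (i % 8));
        }
    }
}

/* Inverse transform; the caller must zero `dst` first. */
static void bitmap_from_le_bytes(unsigned long *dst, const uint8_t *src,
                                 long nbits)
{
    for (long i = 0; i < nbits; i++) {
        if (src[i / 8] & (1u << (i % 8))) {
            dst[i / BITS_PER_LONG] |= 1UL << (i % BITS_PER_LONG);
        }
    }
}
```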

> 
> Also, should we be using 'max_length' or 'used_length' - ram_save_setup
> stores the used_length.  I don't think we should be accessing outside
> the used_length?  That might also make the thing about 'size' being
> rounded to a 'long' more interesting; maybe need to check you don't use
> the bits outside the used_length.

Yes. AFAIU max_length and used_length are currently always the same in
our code. I used max_length since in ram_state_init() we initialized
block->bmap and block->unsentmap with it. I can switch to used_length
though.

> 
> > +    qemu_fflush(file);
> > +
> > +    if (qemu_file_get_error(file)) {
> > +        return qemu_file_get_error(file);
> > +    }
> > +
> > +    return sizeof(size) + size;
> 
> I think since size is always sent as a 64bit that's  size + 8.

Yes. I "offloaded" the calculation of sizeof(size) to the compiler (in
case I had a brain fart when writing the code...). So you prefer
literal digits in these cases? It would just be fragile if we changed
the type of "size" someday (though I guess we won't).

Let me use "size + 8".

> 
> > +}
> > +
> > +/*
> >   * An outstanding page request, on the source, having been received
> >   * and queued
> >   */
> > @@ -2705,6 +2731,54 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> >      return ret;
> >  }
> >  
> > +/*
> > + * Read the received bitmap, revert it as the initial dirty bitmap.
> > + * This is only used when the postcopy migration is paused but wants
> > + * to resume from a middle point.
> > + */
> > +int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
> > +{
> > +    QEMUFile *file = s->rp_state.from_dst_file;
> > +    uint64_t local_size = (block->max_length >> TARGET_PAGE_BITS) / 8;
> > +    uint64_t size;
> > +
> > +    if (s->state != MIGRATION_STATUS_POSTCOPY_RECOVER) {
> > +        error_report("%s: incorrect state %s", __func__,
> > +                     MigrationStatus_lookup[s->state]);
> > +        return -EINVAL;
> > +    }
> > +
> > +    size = qemu_get_be64(file);
> > +
> > +    /* The size of the bitmap should match with our ramblock */
> > +    if (size != local_size) {
> > +        error_report("%s: ramblock '%s' bitmap size mismatch "
> > +                     "(0x%lx != 0x%lx)", __func__, block->idstr,
> > +                     size, local_size);
> > +        return -EINVAL;
> > +    }
> 
> Coming back to the used_length thing above;  again I think the rule
> is that the used_length has to match not the max_length.

Yeah. Will switch.

Thanks,

-- 
Peter Xu


* Re: [Qemu-devel] [RFC 23/29] migration: new cmd MIG_CMD_POSTCOPY_RESUME
  2017-08-03 11:05   ` Dr. David Alan Gilbert
@ 2017-08-04  7:04     ` Peter Xu
  2017-08-04  7:09       ` Peter Xu
  0 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-08-04  7:04 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

On Thu, Aug 03, 2017 at 12:05:41PM +0100, Dr. David Alan Gilbert wrote:

[...]

> > +static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
> > +{
> > +    /*
> > +     * This means source VM is ready to resume the postcopy migration.
> > +     * It's time to switch state and release the fault thread to
> > +     * continue service page faults.
> > +     */
> > +    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_RECOVER,
> > +                      MIGRATION_STATUS_POSTCOPY_ACTIVE);
> > +    qemu_sem_post(&mis->postcopy_pause_sem_fault);
> 
> Is it worth sanity checking that you were in RECOVER at this point?

Yeah, it never hurts.  Will do.

-- 
Peter Xu


* Re: [Qemu-devel] [RFC 23/29] migration: new cmd MIG_CMD_POSTCOPY_RESUME
  2017-08-04  7:04     ` Peter Xu
@ 2017-08-04  7:09       ` Peter Xu
  2017-08-04  8:30         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-08-04  7:09 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

On Fri, Aug 04, 2017 at 03:04:19PM +0800, Peter Xu wrote:
> On Thu, Aug 03, 2017 at 12:05:41PM +0100, Dr. David Alan Gilbert wrote:
> 
> [...]
> 
> > > +static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
> > > +{
> > > +    /*
> > > +     * This means source VM is ready to resume the postcopy migration.
> > > +     * It's time to switch state and release the fault thread to
> > > +     * continue service page faults.
> > > +     */
> > > +    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_RECOVER,
> > > +                      MIGRATION_STATUS_POSTCOPY_ACTIVE);
> > > +    qemu_sem_post(&mis->postcopy_pause_sem_fault);
> > 
> > Is it worth sanity checking that you were in RECOVER at this point?
> 
> Yeah, it never hurts.  Will do.

Not sure whether this would be good (note: I returned 0 in the if):

diff --git a/migration/savevm.c b/migration/savevm.c
index b7843c2..b34f59b 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1709,6 +1709,12 @@ static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
 
 static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
 {
+    if (mis->state != MIGRATION_STATUS_POSTCOPY_RECOVER) {
+        error_report("%s: illegal resume received", __func__);
+        /* Don't fail the load, only for this. */
+        return 0;
+    }
+
     /*
      * This means source VM is ready to resume the postcopy migration.
      * It's time to switch state and release the fault thread to

Basically I just don't want to crash the dest VM (it holds hot dirty
pages) even if it receives a faulty RESUME command.

-- 
Peter Xu


* Re: [Qemu-devel] [RFC 24/29] migration: new message MIG_RP_MSG_RESUME_ACK
  2017-08-03 11:21   ` Dr. David Alan Gilbert
@ 2017-08-04  7:23     ` Peter Xu
  0 siblings, 0 replies; 116+ messages in thread
From: Peter Xu @ 2017-08-04  7:23 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

On Thu, Aug 03, 2017 at 12:21:41PM +0100, Dr. David Alan Gilbert wrote:

[...]

> > +static int migrate_handle_rp_resume_ack(MigrationState *s, uint32_t value)
> > +{
> > +    trace_source_return_path_thread_resume_ack(value);
> > +
> > +    /*
> > +     * Currently value will always be one. It can be used in the
> > +     * future to notify source that destination cannot continue.
> > +     */
> > +    assert(value == 1);
> 
> Again I prefer the routine to fail than to assert.
> Maybe it's worth having a constant rather than the magic 1.

Will do.  Thanks,

-- 
Peter Xu


* Re: [Qemu-devel] [RFC 25/29] migration: introduce SaveVMHandlers.resume_prepare
  2017-08-03 11:38   ` Dr. David Alan Gilbert
@ 2017-08-04  7:39     ` Peter Xu
  0 siblings, 0 replies; 116+ messages in thread
From: Peter Xu @ 2017-08-04  7:39 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

On Thu, Aug 03, 2017 at 12:38:01PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > This is hook function to be called when a postcopy migration wants to
> > resume from a failure. For each module, it should provide its own
> > recovery logic before we switch to the postcopy-active state.
> 
> Would a change-state handler be able to do this,

We don't have such a change-state handler, do we?

> or perhaps
> the notifier chain I have in my shared memory world:
>  https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg06459.html

postcopy_notify() can do this as well; I just need a way to hook up
all the system modules for migration. In our case it's only RAM, but
I think one day we may need block support too. So as long as the
mechanism (either the current SaveVMHandlers interface or
postcopy_notify) can do the notification, IMHO it will be fine.

-- 
Peter Xu


* Re: [Qemu-devel] [RFC 26/29] migration: synchronize dirty bitmap for resume
  2017-08-03 11:56   ` Dr. David Alan Gilbert
@ 2017-08-04  7:49     ` Peter Xu
  0 siblings, 0 replies; 116+ messages in thread
From: Peter Xu @ 2017-08-04  7:49 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

On Thu, Aug 03, 2017 at 12:56:31PM +0100, Dr. David Alan Gilbert wrote:

[...]

> > @@ -256,6 +257,8 @@ struct RAMState {
> >      RAMBlock *last_req_rb;
> >      /* Queue of outstanding page requests from the destination */
> >      QemuMutex src_page_req_mutex;
> > +    /* Ramblock counts to sync dirty bitmap. Only used for recovery */
> > +    int ramblock_to_sync;
> >      QSIMPLEQ_HEAD(src_page_requests, RAMSrcPageRequest) src_page_requests;
> >  };
> >  typedef struct RAMState RAMState;
> > @@ -2731,6 +2734,57 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> >      return ret;
> >  }
> >  
> > +/* Sync all the dirty bitmap with destination VM.  */
> > +static int ram_dirty_bitmap_sync_all(MigrationState *s, RAMState *rs)
> > +{
> > +    RAMBlock *block;
> > +    QEMUFile *file = s->to_dst_file;
> > +    int ramblock_count = 0;
> > +
> > +    trace_ram_dirty_bitmap_sync("start");
> 
> Most (but not all) of our trace_ uses have separate trace_ entries for
> each step; e.g.   trace_ram_dirty_bitmap_sync_start9)

Okay.

> 
> > +    /*
> > +     * We need to take the resume lock to make sure that the send
> > +     * thread (current thread) and the rp-thread will do their work in
> > +     * order.
> > +     */
> > +    qemu_mutex_lock(&s->resume_lock);
> > +
> > +    /* Request for receive-bitmap for each block */
> > +    RAMBLOCK_FOREACH(block) {
> > +        ramblock_count++;
> > +        qemu_savevm_send_recv_bitmap(file, block->idstr);
> > +    }
> > +
> > +    /* Init the ramblock count to total */
> > +    atomic_set(&rs->ramblock_to_sync, ramblock_count);
> > +
> > +    trace_ram_dirty_bitmap_sync("wait-bitmap");
> > +
> > +    /* Wait until all the ramblocks' dirty bitmap synced */
> > +    while (rs->ramblock_to_sync) {
> > +        qemu_cond_wait(&s->resume_cond, &s->resume_lock);
> > +    }
> 
> Does the locking here get simpler if you:
>   a) count the number of RAMBlocks 'n'
>   b) Initialise a sempahore to -(n-1)
>   c) Call qemu_savevm_send_recv_bitmap for each bitmap
>   d) sem_wait on the semaphore - which is waiting for the semaphore to
>      be >0
> 
> as you receive each bitmap do a sem_post; on the last one
> it should go from 0->1 and the sem_wait should wake up?

I think you are right. :-) A single semaphore suffices here (and also
for the follow-up handshake).

I will touch up the commit message as well.  Thanks,
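One portable variant of this counting idea can be sketched as below. POSIX
(and QEMU) semaphores cannot be initialized to a negative value, so an
atomic counter stands in for the "-(n-1)" initial count, with a single post
on the last decrement; the names are illustrative, not QEMU's:

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stddef.h>

/* Count of ramblock bitmaps still outstanding, and the semaphore the
 * sender thread parks on. */
static atomic_int blocks_to_sync;
static sem_t all_synced;

/* Called once per received bitmap (e.g. from the return-path thread):
 * only the final decrement wakes the waiter. */
static void bitmap_received(void)
{
    if (atomic_fetch_sub(&blocks_to_sync, 1) == 1) {
        sem_post(&all_synced);      /* last ramblock: wake the sender */
    }
}

static void *rp_thread(void *arg)
{
    int n = *(int *)arg;

    for (int i = 0; i < n; i++) {
        bitmap_received();          /* simulate each bitmap arriving */
    }
    return NULL;
}

/* Request bitmaps for `nblocks` ramblocks, then block until all of
 * them have been received; returns the final outstanding count. */
static int sync_all(int nblocks)
{
    pthread_t t;

    atomic_store(&blocks_to_sync, nblocks);
    sem_init(&all_synced, 0, 0);
    pthread_create(&t, NULL, rp_thread, &nblocks);
    sem_wait(&all_synced);          /* wait for the last bitmap only */
    pthread_join(t, NULL);
    sem_destroy(&all_synced);
    return atomic_load(&blocks_to_sync);
}
```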

-- 
Peter Xu


* Re: [Qemu-devel] [RFC 23/29] migration: new cmd MIG_CMD_POSTCOPY_RESUME
  2017-08-04  7:09       ` Peter Xu
@ 2017-08-04  8:30         ` Dr. David Alan Gilbert
  2017-08-04  9:22           ` Peter Xu
  0 siblings, 1 reply; 116+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-04  8:30 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> On Fri, Aug 04, 2017 at 03:04:19PM +0800, Peter Xu wrote:
> > On Thu, Aug 03, 2017 at 12:05:41PM +0100, Dr. David Alan Gilbert wrote:
> > 
> > [...]
> > 
> > > > +static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
> > > > +{
> > > > +    /*
> > > > +     * This means source VM is ready to resume the postcopy migration.
> > > > +     * It's time to switch state and release the fault thread to
> > > > +     * continue service page faults.
> > > > +     */
> > > > +    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_RECOVER,
> > > > +                      MIGRATION_STATUS_POSTCOPY_ACTIVE);
> > > > +    qemu_sem_post(&mis->postcopy_pause_sem_fault);
> > > 
> > > Is it worth sanity checking that you were in RECOVER at this point?
> > 
> > Yeah, it never hurts.  Will do.
> 
> Not sure whether this would be good (note: I returned 0 in the if):
> 
> diff --git a/migration/savevm.c b/migration/savevm.c
> index b7843c2..b34f59b 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1709,6 +1709,12 @@ static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
>  
>  static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
>  {
> +    if (mis->state != MIGRATION_STATUS_POSTCOPY_RECOVER) {
> +        error_report("%s: illegal resume received", __func__);
> +        /* Don't fail the load, only for this. */
> +        return 0;
> +    }
> +
>      /*
>       * This means source VM is ready to resume the postcopy migration.
>       * It's time to switch state and release the fault thread to
> 
> Basically I just don't want to crash the dest VM (it holds hot dirty
> pages) even if it receives a faulty RESUME command.

Yes, so now that's a fun problem; effectively you then have 3 valid
failure modes:
    a) An IO failure so we need to go into POSTCOPY_PAUSE
    b) A fatal migration stream problem to quit
    c) A non-fatal migration stream problem to go .. back into PAUSE?

Dave

> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC 27/29] migration: setup ramstate for resume
  2017-08-03 12:37   ` Dr. David Alan Gilbert
@ 2017-08-04  8:39     ` Peter Xu
  0 siblings, 0 replies; 116+ messages in thread
From: Peter Xu @ 2017-08-04  8:39 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

On Thu, Aug 03, 2017 at 01:37:04PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > After we updated the dirty bitmaps of ramblocks, we also need to update
> > the critical fields in RAMState to make sure it is ready for a resume.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  migration/ram.c | 35 ++++++++++++++++++++++++++++++++++-
> >  1 file changed, 34 insertions(+), 1 deletion(-)
> > 
> > diff --git a/migration/ram.c b/migration/ram.c
> > index c695b13..427bf6e 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -1947,6 +1947,31 @@ static int ram_state_init(RAMState **rsp)
> >      return 0;
> >  }
> >  
> > +static void ram_state_resume_prepare(RAMState *rs)
> > +{
> > +    RAMBlock *block;
> > +    long pages = 0;
> > +
> > +    /*
> > +     * Postcopy is not using xbzrle/compression, so no need for that.
> > +     * Also, since source are already halted, we don't need to care
> > +     * about dirty page logging as well.
> > +     */
> > +
> > +    RAMBLOCK_FOREACH(block) {
> > +        pages += bitmap_count_one(block->bmap,
> > +                                  block->max_length >> TARGET_PAGE_BITS);
> 
> Again I think that needs to be block->used_length (see
> migration_bitmap_sync).

Fixing.

> 
> > +    }
> > +
> > +    /* This may not be aligned with current bitmaps. Recalculate. */
> > +    rs->migration_dirty_pages = pages;
> > +
> > +    rs->last_seen_block = NULL;
> > +    rs->last_sent_block = NULL;
> > +    rs->last_page = 0;
> > +    rs->last_version = ram_list.version;
> 
> A trace at this point with the pages count might be worthwhile.

Added.

> 
> > +}
> > +
> >  /*
> >   * Each of ram_save_setup, ram_save_iterate and ram_save_complete has
> >   * long-running RCU critical section.  When rcu-reclaims in the code
> > @@ -2842,8 +2867,16 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
> >  static int ram_resume_prepare(MigrationState *s, void *opaque)
> >  {
> >      RAMState *rs = *(RAMState **)opaque;
> > +    int ret;
> >  
> > -    return ram_dirty_bitmap_sync_all(s, rs);
> > +    ret = ram_dirty_bitmap_sync_all(s, rs);
> 
> Interesting; I'd assumed you'd load directly into this
> bitmap rather than loading into the bitmap on each block.
> Do we ever get the case where a bit is set in the source
> bitmap but not in the loaded bitmap?

(confirmed with Dave offlist that this is not a problem; blame myself
 for using a poor function name "ram_dirty_bitmap_sync_all", which is
 too close to the existing "migration_bitmap_sync")

-- 
Peter Xu


* Re: [Qemu-devel] [RFC 29/29] migration: reset migrate thread vars when resumed
  2017-08-03 13:54   ` Dr. David Alan Gilbert
@ 2017-08-04  8:52     ` Peter Xu
  2017-08-04  9:52       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-08-04  8:52 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

On Thu, Aug 03, 2017 at 02:54:35PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > Firstly, MigThrError enumeration is introduced to describe the error in
> > migration_detect_error() better. This gives the migration_thread() a
> > chance to know whether a recovery has happened.
> > 
> > Then, if a recovery is detected, migration_thread() will reset its local
> > variables to prepare for that.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  migration/migration.c | 40 +++++++++++++++++++++++++++++-----------
> >  1 file changed, 29 insertions(+), 11 deletions(-)
> > 
> > diff --git a/migration/migration.c b/migration/migration.c
> > index ecebe30..439bc22 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -2159,6 +2159,15 @@ static bool postcopy_should_start(MigrationState *s)
> >      return atomic_read(&s->start_postcopy) || s->start_postcopy_fast;
> >  }
> >  
> > +typedef enum MigThrError {
> > +    /* No error detected */
> > +    MIG_THR_ERR_NONE = 0,
> > +    /* Detected error, but resumed successfully */
> > +    MIG_THR_ERR_RECOVERED = 1,
> > +    /* Detected fatal error, need to exit */
> > +    MIG_THR_ERR_FATAL = 2,
> > +} MigThrError;
> > +
> 
> Could you move this patch earlier to when postcopy_pause is created
> so it's created with this enum?

Sure.

[...]

> > @@ -2319,6 +2327,7 @@ static void *migration_thread(void *opaque)
> >      /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */
> >      enum MigrationStatus current_active_state = MIGRATION_STATUS_ACTIVE;
> >      bool enable_colo = migrate_colo_enabled();
> > +    MigThrError thr_error;
> >  
> >      rcu_register_thread();
> >  
> > @@ -2395,8 +2404,17 @@ static void *migration_thread(void *opaque)
> >           * Try to detect any kind of failures, and see whether we
> >           * should stop the migration now.
> >           */
> > -        if (migration_detect_error(s)) {
> > +        thr_error = migration_detect_error(s);
> > +        if (thr_error == MIG_THR_ERR_FATAL) {
> > +            /* Stop migration */
> >              break;
> > +        } else if (thr_error == MIG_THR_ERR_RECOVERED) {
> > +            /*
> > +             * Just recovered from a e.g. network failure, reset all
> > +             * the local variables.
> > +             */
> > +            initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > +            initial_bytes = 0;
> 
> They don't seem that important to reset?

The problem is that we have this in migration_thread():

        if (current_time >= initial_time + BUFFER_DELAY) {
            uint64_t transferred_bytes = qemu_ftell(s->to_dst_file) -
                                         initial_bytes;
            uint64_t time_spent = current_time - initial_time;
            double bandwidth = (double)transferred_bytes / time_spent;
            threshold_size = bandwidth * s->parameters.downtime_limit;
            ...
        }

Here qemu_ftell() would possibly be very small since we have just
resumed... and then transferred_bytes will be extremely huge, since
"qemu_ftell(s->to_dst_file) - initial_bytes" would be negative and
wraps around as an unsigned value. Then, with luck, we'll get an
extremely huge "bandwidth" as well.

-- 
Peter Xu


* Re: [Qemu-devel] [RFC 28/29] migration: final handshake for the resume
  2017-08-03 13:47   ` Dr. David Alan Gilbert
@ 2017-08-04  9:05     ` Peter Xu
  2017-08-04  9:53       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-08-04  9:05 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

On Thu, Aug 03, 2017 at 02:47:44PM +0100, Dr. David Alan Gilbert wrote:

[...]

> > +static int postcopy_resume_handshake(MigrationState *s)
> > +{
> > +    qemu_mutex_lock(&s->resume_lock);
> > +
> > +    qemu_savevm_send_postcopy_resume(s->to_dst_file);
> > +
> > +    while (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
> > +        qemu_cond_wait(&s->resume_cond, &s->resume_lock);
> > +    }
> > +
> > +    qemu_mutex_unlock(&s->resume_lock);
> > +
> > +    if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
> > +        return 0;
> > +    }
> 
> That feels to be a small racy - couldn't that validly become a
> MIGRATION_STATUS_COMPLETED before that check?

Since postcopy_resume_handshake() is called in the migration_thread()
context, the state won't change to COMPLETED at this point (confirmed
with Dave offlist on the question).

> 
> I wonder if we need to change migrate_fd_cancel to be able to
> cause a cancel in this case?

Yeah, that's important, but it hasn't been considered in the current
series. Do you mind if we postpone it as a TODO as well (along with
the work to allow the user to manually switch to the PAUSED state, as
Dan suggested)?

-- 
Peter Xu


* Re: [Qemu-devel] [RFC 23/29] migration: new cmd MIG_CMD_POSTCOPY_RESUME
  2017-08-04  8:30         ` Dr. David Alan Gilbert
@ 2017-08-04  9:22           ` Peter Xu
  0 siblings, 0 replies; 116+ messages in thread
From: Peter Xu @ 2017-08-04  9:22 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

On Fri, Aug 04, 2017 at 09:30:01AM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Fri, Aug 04, 2017 at 03:04:19PM +0800, Peter Xu wrote:
> > > On Thu, Aug 03, 2017 at 12:05:41PM +0100, Dr. David Alan Gilbert wrote:
> > > 
> > > [...]
> > > 
> > > > > +static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
> > > > > +{
> > > > > +    /*
> > > > > +     * This means source VM is ready to resume the postcopy migration.
> > > > > +     * It's time to switch state and release the fault thread to
> > > > > +     * continue service page faults.
> > > > > +     */
> > > > > +    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_RECOVER,
> > > > > +                      MIGRATION_STATUS_POSTCOPY_ACTIVE);
> > > > > +    qemu_sem_post(&mis->postcopy_pause_sem_fault);
> > > > 
> > > > Is it worth sanity checking that you were in RECOVER at this point?
> > > 
> > > Yeah, it never hurts.  Will do.
> > 
> > Not sure whether this would be good (note: I returned 0 in the if):
> > 
> > diff --git a/migration/savevm.c b/migration/savevm.c
> > index b7843c2..b34f59b 100644
> > --- a/migration/savevm.c
> > +++ b/migration/savevm.c
> > @@ -1709,6 +1709,12 @@ static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
> >  
> >  static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
> >  {
> > +    if (mis->state != MIGRATION_STATUS_POSTCOPY_RECOVER) {
> > +        error_report("%s: illegal resume received", __func__);
> > +        /* Don't fail the load, only for this. */
> > +        return 0;
> > +    }
> > +
> >      /*
> >       * This means source VM is ready to resume the postcopy migration.
> >       * It's time to switch state and release the fault thread to
> > 
> > Basically I just don't want to crash the dest VM (it holds hot dirty
> > pages) even if it receives a faulty RESUME command.
> 
> Yes, so now that's a fun problem; effectively you then have 3 valid
> failure modes:
>     a) An IO failure so we need to go into POSTCOPY_PAUSE
>     b) A fatal migration stream problem to quit
>     c) A non-fatal migration stream problem to go .. back into PAUSE?

Hmm yes...

So I have at least three TODO items now:

- support manually switching the source into the PAUSED state
- support migrate_cancel during the PAUSED/RECOVER states
- when anything goes wrong during PAUSED/RECOVER, switch back to the
  PAUSED state on both sides

It just depends on whether we would like to postpone this work, or
whether any of it is essential even for the first version.

IMHO we can postpone the 3rd one as well.

-- 
Peter Xu


* Re: [Qemu-devel] [RFC 12/29] migration: allow dst vm pause on postcopy
  2017-08-04  3:43         ` Peter Xu
@ 2017-08-04  9:33           ` Dr. David Alan Gilbert
  2017-08-04  9:44             ` Peter Xu
  0 siblings, 1 reply; 116+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-04  9:33 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> On Thu, Aug 03, 2017 at 03:03:57PM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > On Tue, Aug 01, 2017 at 10:47:16AM +0100, Dr. David Alan Gilbert wrote:
> > > > * Peter Xu (peterx@redhat.com) wrote:
> > > 
> > > [...]
> > > 
> > > > > +/* Return true if we should continue the migration, or false. */
> > > > > +static bool postcopy_pause_incoming(MigrationIncomingState *mis)
> > > > > +{
> > > > > +    trace_postcopy_pause_incoming();
> > > > > +
> > > > > +    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> > > > > +                      MIGRATION_STATUS_POSTCOPY_PAUSED);
> > > > > +
> > > > > +    assert(mis->from_src_file);
> > > > > +    qemu_file_shutdown(mis->from_src_file);
> > > > > +    qemu_fclose(mis->from_src_file);
> > > > > +    mis->from_src_file = NULL;
> > > > > +
> > > > > +    assert(mis->to_src_file);
> > > > > +    qemu_mutex_lock(&mis->rp_mutex);
> > > > > +    qemu_file_shutdown(mis->to_src_file);
> > > > > +    qemu_fclose(mis->to_src_file);
> > > > > +    mis->to_src_file = NULL;
> > > > > +    qemu_mutex_unlock(&mis->rp_mutex);
> > > > 
> > > > Hmm is that safe?  If we look at migrate_send_rp_message we have:
> > > > 
> > > >     static void migrate_send_rp_message(MigrationIncomingState *mis,
> > > >                                         enum mig_rp_message_type message_type,
> > > >                                         uint16_t len, void *data)
> > > >     {
> > > >         trace_migrate_send_rp_message((int)message_type, len);
> > > >         qemu_mutex_lock(&mis->rp_mutex);
> > > >         qemu_put_be16(mis->to_src_file, (unsigned int)message_type);
> > > >         qemu_put_be16(mis->to_src_file, len);
> > > >         qemu_put_buffer(mis->to_src_file, data, len);
> > > >         qemu_fflush(mis->to_src_file);
> > > >         qemu_mutex_unlock(&mis->rp_mutex);
> > > >     }
> > > > 
> > > > If we came into postcopy_pause_incoming at about the same time
> > > > migrate_send_rp_message was being called and pause_incoming took the
> > > > lock first, then once it release the lock, send_rp_message carries on
> > > > and uses mis->to_src_file that's now NULL.
> > > > 
> > > > One solution here is to just call qemu_file_shutdown() but leave the
> > > > files open at this point, but clean the files up sometime later.
> > > 
> > > I see the comment on patch 14 as well - yeah, we need patch 14 to
> > > cooperate here, and as long as we are with patch 14, we should be ok.
> > > 
> > > > 
> > > > > +
> > > > > +    while (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
> > > > > +        qemu_sem_wait(&mis->postcopy_pause_sem_dst);
> > > > > +    }
> > > > > +
> > > > > +    trace_postcopy_pause_incoming_continued();
> > > > > +
> > > > > +    return true;
> > > > > +}
> > > > > +
> > > > >  static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
> > > > >  {
> > > > >      uint8_t section_type;
> > > > >      int ret = 0;
> > > > >  
> > > > > +retry:
> > > > >      while (true) {
> > > > >          section_type = qemu_get_byte(f);
> > > > >  
> > > > > @@ -2004,6 +2034,21 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
> > > > >  out:
> > > > >      if (ret < 0) {
> > > > >          qemu_file_set_error(f, ret);
> > > > > +
> > > > > +        /*
> > > > > +         * Detect whether it is:
> > > > > +         *
> > > > > +         * 1. postcopy running
> > > > > +         * 2. network failure (-EIO)
> > > > > +         *
> > > > > +         * If so, we try to wait for a recovery.
> > > > > +         */
> > > > > +        if (mis->state == MIGRATION_STATUS_POSTCOPY_ACTIVE &&
> > > > > +            ret == -EIO && postcopy_pause_incoming(mis)) {
> > > > > +            /* Reset f to point to the newly created channel */
> > > > > +            f = mis->from_src_file;
> > > > > +            goto retry;
> > > > > +        }
> > > > 
> > > > I wonder if:
> > > > 
> > > >            if (mis->state == MIGRATION_STATUS_POSTCOPY_ACTIVE &&
> > > >                ret == -EIO && postcopy_pause_incoming(mis)) {
> > > >                /* Try again after postcopy recovery */
> > > >                return qemu_loadvm_state_main(mis->from_src_file, mis);
> > > >            }
> > > > would be nicer; it avoids the goto loop.
> > > 
> > > I agree we should avoid using goto loops. However, I do see many usages
> > > of goto like this one, where we want to redo part of a function's
> > > procedure (or, of course, when handling errors "C-style").
> > 
> > We mostly use them to jump forward to an error exit; only rarely do
> > we do loops with them;  so if we can sensibly avoid them it's best.
> > 
> > > Calling qemu_loadvm_state_main() inside itself is ok as well, but it
> > > also has a defect: stack usage would be out of control, or it could
> > > even be controlled by malicious users. E.g., if someone used a
> > > program to periodically stop/start any network endpoint along the
> > > migration network, QEMU may go into a paused -> recovery -> active ->
> > > paused ... loop, and stack usage will just grow over time. I'd say
> > > it's an extreme example though...
> > 
> > I think it's safe because it's a tail-call so a new stack frame isn't
> > needed.
> 
> I tried it and dumped the assembly, looks like even with tail-call, we
> didn't really avoid the "callq":
> 
> (gdb) disassemble qemu_loadvm_state_main
> Dump of assembler code for function qemu_loadvm_state_main:
>    0x00000000005d9ff8 <+0>:     push   %rbp
>    0x00000000005d9ff9 <+1>:     mov    %rsp,%rbp
>    0x00000000005d9ffc <+4>:     sub    $0x20,%rsp
>    0x00000000005da000 <+8>:     mov    %rdi,-0x18(%rbp)
>    0x00000000005da004 <+12>:    mov    %rsi,-0x20(%rbp)
>    0x00000000005da008 <+16>:    movl   $0x0,-0x4(%rbp)
>    0x00000000005da00f <+23>:    mov    -0x18(%rbp),%rax
>    0x00000000005da013 <+27>:    mov    %rax,%rdi
>    0x00000000005da016 <+30>:    callq  0x5e185e <qemu_get_byte>
> 
> [...]
> 
>    0x00000000005da135 <+317>:   jne    0x5da165 <qemu_loadvm_state_main+365>
>    0x00000000005da137 <+319>:   cmpl   $0xfffffffb,-0x4(%rbp)
>    0x00000000005da13b <+323>:   jne    0x5da165 <qemu_loadvm_state_main+365>
>    0x00000000005da13d <+325>:   mov    -0x20(%rbp),%rax
>    0x00000000005da141 <+329>:   mov    %rax,%rdi
>    0x00000000005da144 <+332>:   callq  0x5d9eb4 <postcopy_pause_incoming>
>    0x00000000005da149 <+337>:   test   %al,%al
>    0x00000000005da14b <+339>:   je     0x5da165 <qemu_loadvm_state_main+365>
>    0x00000000005da14d <+341>:   mov    -0x20(%rbp),%rax
>    0x00000000005da151 <+345>:   mov    (%rax),%rax
>    0x00000000005da154 <+348>:   mov    -0x20(%rbp),%rdx
>    0x00000000005da158 <+352>:   mov    %rdx,%rsi
>    0x00000000005da15b <+355>:   mov    %rax,%rdi
>    0x00000000005da15e <+358>:   callq  0x5d9ff8 <qemu_loadvm_state_main>
>                                 ^^^^^^^^^^^^^^^ (this one)
>    0x00000000005da163 <+363>:   jmp    0x5da168 <qemu_loadvm_state_main+368>
>    0x00000000005da165 <+365>:   mov    -0x4(%rbp),%eax
>    0x00000000005da168 <+368>:   leaveq
>    0x00000000005da169 <+369>:   retq
> 
> Do we need extra compilation parameters to achieve the tail-call
> optimization for gcc? My gcc version is: v6.1.1 20160621.
> 
> (even with extra flags, I am still a bit worried on whether it'll work
>  on the other compilers though)

Huh, I'd expected it to be smarter than that; not sure why it didn't!
Anyway, tbh I wouldn't worry about the stack depth in this case.

> And, the "label-way" to retry is indeed used widely at least in both
> QEMU and the Linux kernel. I tried to directly grep "^retry:" (thus
> ignoring the same usage under different label names); there are ~30
> usages in QEMU and hundreds in the Linux kernel. So I'm not sure
> whether this can be seen as another "legal" way to use C labels...

OK, my distaste for gotos is perhaps a bit stronger than others';
it's OK though.

Dave

> Thanks,
> 
> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC 12/29] migration: allow dst vm pause on postcopy
  2017-08-04  9:33           ` Dr. David Alan Gilbert
@ 2017-08-04  9:44             ` Peter Xu
  0 siblings, 0 replies; 116+ messages in thread
From: Peter Xu @ 2017-08-04  9:44 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

On Fri, Aug 04, 2017 at 10:33:19AM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Thu, Aug 03, 2017 at 03:03:57PM +0100, Dr. David Alan Gilbert wrote:
> > > * Peter Xu (peterx@redhat.com) wrote:
> > > > On Tue, Aug 01, 2017 at 10:47:16AM +0100, Dr. David Alan Gilbert wrote:
> > > > > * Peter Xu (peterx@redhat.com) wrote:
> > > > 
> > > > [...]
> > > > 
> > > > > > +/* Return true if we should continue the migration, or false. */
> > > > > > +static bool postcopy_pause_incoming(MigrationIncomingState *mis)
> > > > > > +{
> > > > > > +    trace_postcopy_pause_incoming();
> > > > > > +
> > > > > > +    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> > > > > > +                      MIGRATION_STATUS_POSTCOPY_PAUSED);
> > > > > > +
> > > > > > +    assert(mis->from_src_file);
> > > > > > +    qemu_file_shutdown(mis->from_src_file);
> > > > > > +    qemu_fclose(mis->from_src_file);
> > > > > > +    mis->from_src_file = NULL;
> > > > > > +
> > > > > > +    assert(mis->to_src_file);
> > > > > > +    qemu_mutex_lock(&mis->rp_mutex);
> > > > > > +    qemu_file_shutdown(mis->to_src_file);
> > > > > > +    qemu_fclose(mis->to_src_file);
> > > > > > +    mis->to_src_file = NULL;
> > > > > > +    qemu_mutex_unlock(&mis->rp_mutex);
> > > > > 
> > > > > Hmm is that safe?  If we look at migrate_send_rp_message we have:
> > > > > 
> > > > >     static void migrate_send_rp_message(MigrationIncomingState *mis,
> > > > >                                         enum mig_rp_message_type message_type,
> > > > >                                         uint16_t len, void *data)
> > > > >     {
> > > > >         trace_migrate_send_rp_message((int)message_type, len);
> > > > >         qemu_mutex_lock(&mis->rp_mutex);
> > > > >         qemu_put_be16(mis->to_src_file, (unsigned int)message_type);
> > > > >         qemu_put_be16(mis->to_src_file, len);
> > > > >         qemu_put_buffer(mis->to_src_file, data, len);
> > > > >         qemu_fflush(mis->to_src_file);
> > > > >         qemu_mutex_unlock(&mis->rp_mutex);
> > > > >     }
> > > > > 
> > > > > If we came into postcopy_pause_incoming at about the same time
> > > > > migrate_send_rp_message was being called and pause_incoming took the
> > > > > lock first, then once it release the lock, send_rp_message carries on
> > > > > and uses mis->to_src_file that's now NULL.
> > > > > 
> > > > > One solution here is to just call qemu_file_shutdown() but leave the
> > > > > files open at this point, but clean the files up sometime later.
> > > > 
> > > > I see the comment on patch 14 as well - yeah, we need patch 14 to
> > > > cooperate here, and as long as we are with patch 14, we should be ok.
> > > > 
> > > > > 
> > > > > > +
> > > > > > +    while (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
> > > > > > +        qemu_sem_wait(&mis->postcopy_pause_sem_dst);
> > > > > > +    }
> > > > > > +
> > > > > > +    trace_postcopy_pause_incoming_continued();
> > > > > > +
> > > > > > +    return true;
> > > > > > +}
> > > > > > +
> > > > > >  static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
> > > > > >  {
> > > > > >      uint8_t section_type;
> > > > > >      int ret = 0;
> > > > > >  
> > > > > > +retry:
> > > > > >      while (true) {
> > > > > >          section_type = qemu_get_byte(f);
> > > > > >  
> > > > > > @@ -2004,6 +2034,21 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
> > > > > >  out:
> > > > > >      if (ret < 0) {
> > > > > >          qemu_file_set_error(f, ret);
> > > > > > +
> > > > > > +        /*
> > > > > > +         * Detect whether it is:
> > > > > > +         *
> > > > > > +         * 1. postcopy running
> > > > > > +         * 2. network failure (-EIO)
> > > > > > +         *
> > > > > > +         * If so, we try to wait for a recovery.
> > > > > > +         */
> > > > > > +        if (mis->state == MIGRATION_STATUS_POSTCOPY_ACTIVE &&
> > > > > > +            ret == -EIO && postcopy_pause_incoming(mis)) {
> > > > > > +            /* Reset f to point to the newly created channel */
> > > > > > +            f = mis->from_src_file;
> > > > > > +            goto retry;
> > > > > > +        }
> > > > > 
> > > > > I wonder if:
> > > > > 
> > > > >            if (mis->state == MIGRATION_STATUS_POSTCOPY_ACTIVE &&
> > > > >                ret == -EIO && postcopy_pause_incoming(mis)) {
> > > > >                /* Try again after postcopy recovery */
> > > > >                return qemu_loadvm_state_main(mis->from_src_file, mis);
> > > > >            }
> > > > > would be nicer; it avoids the goto loop.
> > > > 
> > > > I agree we should avoid using goto loops. However, I do see many usages
> > > > of goto like this one, where we want to redo part of a function's
> > > > procedure (or, of course, when handling errors "C-style").
> > > 
> > > We mostly use them to jump forward to an error exit; only rarely do
> > > we do loops with them;  so if we can sensibly avoid them it's best.
> > > 
> > > > Calling qemu_loadvm_state_main() inside itself is ok as well, but it
> > > > also has a defect: stack usage would be out of control, or it could
> > > > even be controlled by malicious users. E.g., if someone used a
> > > > program to periodically stop/start any network endpoint along the
> > > > migration network, QEMU may go into a paused -> recovery -> active ->
> > > > paused ... loop, and stack usage will just grow over time. I'd say
> > > > it's an extreme example though...
> > > 
> > > I think it's safe because it's a tail-call so a new stack frame isn't
> > > needed.
> > 
> > I tried it and dumped the assembly, looks like even with tail-call, we
> > didn't really avoid the "callq":
> > 
> > (gdb) disassemble qemu_loadvm_state_main
> > Dump of assembler code for function qemu_loadvm_state_main:
> >    0x00000000005d9ff8 <+0>:     push   %rbp
> >    0x00000000005d9ff9 <+1>:     mov    %rsp,%rbp
> >    0x00000000005d9ffc <+4>:     sub    $0x20,%rsp
> >    0x00000000005da000 <+8>:     mov    %rdi,-0x18(%rbp)
> >    0x00000000005da004 <+12>:    mov    %rsi,-0x20(%rbp)
> >    0x00000000005da008 <+16>:    movl   $0x0,-0x4(%rbp)
> >    0x00000000005da00f <+23>:    mov    -0x18(%rbp),%rax
> >    0x00000000005da013 <+27>:    mov    %rax,%rdi
> >    0x00000000005da016 <+30>:    callq  0x5e185e <qemu_get_byte>
> > 
> > [...]
> > 
> >    0x00000000005da135 <+317>:   jne    0x5da165 <qemu_loadvm_state_main+365>
> >    0x00000000005da137 <+319>:   cmpl   $0xfffffffb,-0x4(%rbp)
> >    0x00000000005da13b <+323>:   jne    0x5da165 <qemu_loadvm_state_main+365>
> >    0x00000000005da13d <+325>:   mov    -0x20(%rbp),%rax
> >    0x00000000005da141 <+329>:   mov    %rax,%rdi
> >    0x00000000005da144 <+332>:   callq  0x5d9eb4 <postcopy_pause_incoming>
> >    0x00000000005da149 <+337>:   test   %al,%al
> >    0x00000000005da14b <+339>:   je     0x5da165 <qemu_loadvm_state_main+365>
> >    0x00000000005da14d <+341>:   mov    -0x20(%rbp),%rax
> >    0x00000000005da151 <+345>:   mov    (%rax),%rax
> >    0x00000000005da154 <+348>:   mov    -0x20(%rbp),%rdx
> >    0x00000000005da158 <+352>:   mov    %rdx,%rsi
> >    0x00000000005da15b <+355>:   mov    %rax,%rdi
> >    0x00000000005da15e <+358>:   callq  0x5d9ff8 <qemu_loadvm_state_main>
> >                                 ^^^^^^^^^^^^^^^ (this one)
> >    0x00000000005da163 <+363>:   jmp    0x5da168 <qemu_loadvm_state_main+368>
> >    0x00000000005da165 <+365>:   mov    -0x4(%rbp),%eax
> >    0x00000000005da168 <+368>:   leaveq
> >    0x00000000005da169 <+369>:   retq
> > 
> > Do we need extra compilation parameters to achieve the tail-call
> > optimization for gcc? My gcc version is: v6.1.1 20160621.
> > 
> > (even with extra flags, I am still a bit worried on whether it'll work
> >  on the other compilers though)
> 
> Huh, I'd expected it to be smarter than that; not sure why it didn't!
> Anyway, tbh I wouldn't worry about the stack depth in this case.

(I agree I was harsh...)

> 
> > And, the "label-way" to retry is indeed used widely at least in both
> > QEMU and the Linux kernel. I tried to directly grep "^retry:" (thus
> > ignoring the same usage under different label names); there are ~30
> > usages in QEMU and hundreds in the Linux kernel. So I'm not sure
> > whether this can be seen as another "legal" way to use C labels...
> 
> OK, my distaste for gotos is perhaps a bit stronger than others';
> it's OK though.

So my laziness "struggled" and won; I'll keep those labels... Thanks! :-P

-- 
Peter Xu


* Re: [Qemu-devel] [RFC 22/29] migration: new message MIG_RP_MSG_RECV_BITMAP
  2017-08-04  6:59     ` Peter Xu
@ 2017-08-04  9:49       ` Dr. David Alan Gilbert
  2017-08-07  6:11         ` Peter Xu
  0 siblings, 1 reply; 116+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-04  9:49 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> On Thu, Aug 03, 2017 at 11:50:22AM +0100, Dr. David Alan Gilbert wrote:
> > > +    /* Size of the bitmap, in bytes */
> > > +    size = (block->max_length >> TARGET_PAGE_BITS) / 8;
> > > +    qemu_put_be64(file, size);
> > > +    qemu_put_buffer(file, (const uint8_t *)block->receivedmap, size);
> > 
> > Do we need to be careful about endianness and length of long here?
> > The migration stream can (theoretically) migrate between hosts of
> > different endianness, e.g. a Power LE and Power BE host it can also
> > migrate between a 32bit and 64bit host where the 'long' used in our
> > bitmap is a different length.
> 
> Ah, good catch...
> 
> I feel like we'd better provide a new bitmap helper for this when the
> bitmap will be sent to somewhere else, like:
> 
>   void bitmap_to_le(unsigned long *dst, const unsigned long *src,
>                     long nbits);
>   void bitmap_from_le(unsigned long *dst, const unsigned long *src,
>                       long nbits);
> 
> I used little endian since I *think* that should work even across
> 32/64-bit machines (and I think big endian would not work).

Let's think about some combinations:

64 bit LE  G0,G1...G7
64 bit BE  G7,G6...G0
32 bit LE  A0,A1,A2,A3, B0,B1,B2,B3
32 bit BE  A3,A2,A1,A0  B3,B2,B1,B0

considering a 64bit BE src to a 32bit LE dest:
  64 bit BE  G7,G6...G0
  bitmap_to_le swaps that to
             G0,G1,..G7

destination reads two 32bit chunks:
  G0,G1,G2,G3    G4,G5,G6,G7

dest is LE so no byteswap is needed.

Yes, I _think_ that's OK.

> > I think that means you have to save it as a series of long's;
> > and also just make sure 'size' is a multiple of 'long' - otherwise
> > you lose the last few bytes, which on a big endian system would
> > be a problem.
> 
> Yeah, then the size should possibly be pre-cooked with
> BITS_TO_LONGS(). However that's slightly tricky as well, maybe I
> should provide another bitmap helper:
> 
>   static inline long bitmap_size(long nbits)
>   {
>       return BITS_TO_LONGS(nbits);
>   }
> 
> Since the whole thing should be part of bitmap APIs imho.

The macro is enough I think.

> > 
> > Also, should we be using 'max_length' or 'used_length' - ram_save_setup
> > stores the used_length.  I don't think we should be accessing outside
> > the used_length?  That might also make the thing about 'size' being
> > rounded to a 'long' more interesting; maybe need to check you don't use
> > the bits outside the used_length.
> 
> Yes. AFAIU max_length and used_length are always the same currently in
> our code. I used max_length since in ram_state_init() we initialized
> block->bmap and block->unsentmap with it. I can switch to used_length
> though.

I remember it went in a couple of years ago because there were cases it
was different; they're rare though - I think it was an ACPI case.

> > > +    qemu_fflush(file);
> > > +
> > > +    if (qemu_file_get_error(file)) {
> > > +        return qemu_file_get_error(file);
> > > +    }
> > > +
> > > +    return sizeof(size) + size;
> > 
> > I think since size is always sent as a 64bit that's  size + 8.
> 
> Yes. I "offloaded" the calculation of sizeof(size) to the compiler (in
> case I had a brain fart when writing the code...). So you prefer
> digits directly in these cases? It might just be fragile if we changed
> the type of "size" someday (though I guess we won't).
> 
> Let me use "size + 8".

Let's stick with what you have, actually; it's OK since size is a
uint64_t.

Dave

> 
> > 
> > > +}
> > > +
> > > +/*
> > >   * An outstanding page request, on the source, having been received
> > >   * and queued
> > >   */
> > > @@ -2705,6 +2731,54 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> > >      return ret;
> > >  }
> > >  
> > > +/*
> > > + * Read the received bitmap, revert it as the initial dirty bitmap.
> > > + * This is only used when the postcopy migration is paused but wants
> > > + * to resume from a middle point.
> > > + */
> > > +int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
> > > +{
> > > +    QEMUFile *file = s->rp_state.from_dst_file;
> > > +    uint64_t local_size = (block->max_length >> TARGET_PAGE_BITS) / 8;
> > > +    uint64_t size;
> > > +
> > > +    if (s->state != MIGRATION_STATUS_POSTCOPY_RECOVER) {
> > > +        error_report("%s: incorrect state %s", __func__,
> > > +                     MigrationStatus_lookup[s->state]);
> > > +        return -EINVAL;
> > > +    }
> > > +
> > > +    size = qemu_get_be64(file);
> > > +
> > > +    /* The size of the bitmap should match with our ramblock */
> > > +    if (size != local_size) {
> > > +        error_report("%s: ramblock '%s' bitmap size mismatch "
> > > +                     "(0x%lx != 0x%lx)", __func__, block->idstr,
> > > +                     size, local_size);
> > > +        return -EINVAL;
> > > +    }
> > 
> > Coming back to the used_length thing above;  again I think the rule
> > is that the used_length has to match not the max_length.
> 
> Yeah. Will switch.
> 
> Thanks,
> 
> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [Qemu-devel] [RFC 29/29] migration: reset migrate thread vars when resumed
  2017-08-04  8:52     ` Peter Xu
@ 2017-08-04  9:52       ` Dr. David Alan Gilbert
  2017-08-07  6:57         ` Peter Xu
  0 siblings, 1 reply; 116+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-04  9:52 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> On Thu, Aug 03, 2017 at 02:54:35PM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > Firstly, MigThrError enumeration is introduced to describe the error in
> > > migration_detect_error() better. This gives the migration_thread() a
> > > chance to know whether a recovery has happened.
> > > 
> > > Then, if a recovery is detected, migration_thread() will reset its local
> > > variables to prepare for that.
> > > 
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > ---
> > >  migration/migration.c | 40 +++++++++++++++++++++++++++++-----------
> > >  1 file changed, 29 insertions(+), 11 deletions(-)
> > > 
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index ecebe30..439bc22 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -2159,6 +2159,15 @@ static bool postcopy_should_start(MigrationState *s)
> > >      return atomic_read(&s->start_postcopy) || s->start_postcopy_fast;
> > >  }
> > >  
> > > +typedef enum MigThrError {
> > > +    /* No error detected */
> > > +    MIG_THR_ERR_NONE = 0,
> > > +    /* Detected error, but resumed successfully */
> > > +    MIG_THR_ERR_RECOVERED = 1,
> > > +    /* Detected fatal error, need to exit */
> > > +    MIG_THR_ERR_FATAL = 2,
> > > +} MigThrError;
> > > +
> > 
> > Could you move this patch earlier to when postcopy_pause is created
> > so it's created with this enum?
> 
> Sure.
> 
> [...]
> 
> > > @@ -2319,6 +2327,7 @@ static void *migration_thread(void *opaque)
> > >      /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */
> > >      enum MigrationStatus current_active_state = MIGRATION_STATUS_ACTIVE;
> > >      bool enable_colo = migrate_colo_enabled();
> > > +    MigThrError thr_error;
> > >  
> > >      rcu_register_thread();
> > >  
> > > @@ -2395,8 +2404,17 @@ static void *migration_thread(void *opaque)
> > >           * Try to detect any kind of failures, and see whether we
> > >           * should stop the migration now.
> > >           */
> > > -        if (migration_detect_error(s)) {
> > > +        thr_error = migration_detect_error(s);
> > > +        if (thr_error == MIG_THR_ERR_FATAL) {
> > > +            /* Stop migration */
> > >              break;
> > > +        } else if (thr_error == MIG_THR_ERR_RECOVERED) {
> > > +            /*
> > > +             * Just recovered from a e.g. network failure, reset all
> > > +             * the local variables.
> > > +             */
> > > +            initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > > +            initial_bytes = 0;
> > 
> > They don't seem that important to reset?
> 
> The problem is that we have this in migration_thread():
> 
>         if (current_time >= initial_time + BUFFER_DELAY) {
>             uint64_t transferred_bytes = qemu_ftell(s->to_dst_file) -
>                                          initial_bytes;
>             uint64_t time_spent = current_time - initial_time;
>             double bandwidth = (double)transferred_bytes / time_spent;
>             threshold_size = bandwidth * s->parameters.downtime_limit;
>             ...
>         }
> 
> Here qemu_ftell() would possibly be very small since we have just
> resumed... and then transferred_bytes will be extremely huge since
> "qemu_ftell(s->to_dst_file) - initial_bytes" is actually negative...
> Then, with luck, we'll get an extremely huge "bandwidth" as well.

Ah yes that's a good reason to reset it then; add a comment like
'important to avoid breaking transferred_bytes and bandwidth
calculation'
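For the record, the hazard being avoided is plain unsigned wraparound; a tiny
standalone illustration (not QEMU code, function name is made up):

```c
#include <assert.h>
#include <stdint.h>

/* Standalone illustration of why initial_bytes must be reset: after a
 * resume the new QEMUFile position restarts near zero, so subtracting
 * the old (large) offset wraps around in uint64_t and produces an
 * absurdly large "transferred_bytes", which then inflates the computed
 * bandwidth. */
static uint64_t transferred_bytes(uint64_t ftell_now, uint64_t initial_bytes)
{
    return ftell_now - initial_bytes; /* wraps when ftell_now < initial_bytes */
}
```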

Dave

> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC 28/29] migration: final handshake for the resume
  2017-08-04  9:05     ` Peter Xu
@ 2017-08-04  9:53       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 116+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-04  9:53 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> On Thu, Aug 03, 2017 at 02:47:44PM +0100, Dr. David Alan Gilbert wrote:
> 
> [...]
> 
> > > +static int postcopy_resume_handshake(MigrationState *s)
> > > +{
> > > +    qemu_mutex_lock(&s->resume_lock);
> > > +
> > > +    qemu_savevm_send_postcopy_resume(s->to_dst_file);
> > > +
> > > +    while (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
> > > +        qemu_cond_wait(&s->resume_cond, &s->resume_lock);
> > > +    }
> > > +
> > > +    qemu_mutex_unlock(&s->resume_lock);
> > > +
> > > +    if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
> > > +        return 0;
> > > +    }
> > 
> > That feels a bit racy - couldn't that validly become
> > MIGRATION_STATUS_COMPLETED before that check?
> 
> Since postcopy_resume_handshake() is called in the migration_thread()
> context, it won't change to COMPLETED at this point (confirmed with
> Dave offlist).

Yes.

> > 
> > I wonder if we need to change migrate_fd_cancel to be able to
> > cause a cancel in this case?
> 
> Yeah, that's important, but it hasn't been considered in the current
> series. Do you mind postponing it as a TODO as well (along with the
> work to allow the user to manually switch to the PAUSED state, as Dan
> suggested)?

Yes, I don't think the cancel is that important in that case; it's
already recovering from a bad situation.

Dave

> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC 22/29] migration: new message MIG_RP_MSG_RECV_BITMAP
  2017-08-04  9:49       ` Dr. David Alan Gilbert
@ 2017-08-07  6:11         ` Peter Xu
  2017-08-07  9:04           ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 116+ messages in thread
From: Peter Xu @ 2017-08-07  6:11 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

On Fri, Aug 04, 2017 at 10:49:42AM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Thu, Aug 03, 2017 at 11:50:22AM +0100, Dr. David Alan Gilbert wrote:
> > > > +    /* Size of the bitmap, in bytes */
> > > > +    size = (block->max_length >> TARGET_PAGE_BITS) / 8;
> > > > +    qemu_put_be64(file, size);
> > > > +    qemu_put_buffer(file, (const uint8_t *)block->receivedmap, size);
> > > 
> > > Do we need to be careful about endianness and length of long here?
> > > The migration stream can (theoretically) migrate between hosts of
> > > different endianness, e.g. a Power LE and Power BE host it can also
> > > migrate between a 32bit and 64bit host where the 'long' used in our
> > > bitmap is a different length.
> > 
> > Ah, good catch...
> > 
> > I feel like we'd better provide new bitmap helpers for when the
> > bitmap will be sent somewhere else, like:
> > 
> >   void bitmap_to_le(unsigned long *dst, const unsigned long *src,
> >                     long nbits);
> >   void bitmap_from_le(unsigned long *dst, const unsigned long *src,
> >                       long nbits);
> > 
> > I used little endian since I *think* that should work even across
> > 32/64 bit machines (and I think big endian would not).
> 
> Let's think about some combinations:
> 
> 64 bit LE  G0,G1...G7
> 64 bit BE  G7,G6...G0
> 32 bit LE  A0,A1,A2,A3, B0,B1,B2,B3
> 32 bit BE  A3,A2,A1,A0  B3,B2,B1,B0
> 
> considering a 64bit BE src to a 32bit LE dest:
>   64 bit BE  G7,G6...G0
>   bitmap_to_le swaps that to
>              G0,G1,..G7
> 
> destination reads two 32bit chunks:
>   G0,G1,G2,G3    G4,G5,G6,G7
> 
> dest is LE so no byteswap is needed.
> 
> Yes, I _think_ that's OK.

Hmm, I thought it over again and found another problem, which makes it
more interesting...

The size of the bitmap can actually be different on hosts that use
different word sizes (or rather, long sizes, which match the word
size). E.g., when we allocate a bitmap that covers nbits=6*32+1, we'll
get a 28-byte (6*4+4) bitmap on 32bit machines, and a 32-byte (3*8+8)
bitmap on 64bit machines.

I really wish we had a typedef for the current bitmap type, like:

  typedef long *Bitmap;

Then I could simply consider switching the "long" to uint64_t, but
sadly we don't have one. Also, switching the type would be slightly
overkill (maybe still doable, but I'd need to touch lots of files).

One possible solution: always send the bitmap in 8-byte chunks, even
on 32bit machines. If the bitmap size is not aligned (which can of
course only happen on 32bit machines), align it up to 8 bytes and fill
the rest with zeros. I'll try to do this hack only in the migration
code for now.
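A minimal standalone sketch of that packing idea (names and shapes are
illustrative only, not the eventual QEMU helpers):

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of "send in 8-byte chunks": pack a native
 * 'unsigned long' bitmap into a little-endian byte buffer whose length
 * is rounded up to 8 bytes, zero-filling the tail.  Bit n of the
 * bitmap lives in bit (n % BITS_PER_LONG) of word (n / BITS_PER_LONG),
 * so extracting the logical bytes of each word yields the same byte
 * stream regardless of host word size or endianness. */
static size_t bitmap_pack_le(uint8_t *dst, const unsigned long *src,
                             long nbits)
{
    size_t src_bytes = (nbits + CHAR_BIT - 1) / CHAR_BIT;
    size_t dst_bytes = (src_bytes + 7) & ~(size_t)7; /* align up to 8 */
    size_t i;

    memset(dst, 0, dst_bytes);
    for (i = 0; i < src_bytes; i++) {
        unsigned long word = src[i / sizeof(unsigned long)];
        dst[i] = (word >> ((i % sizeof(unsigned long)) * CHAR_BIT)) & 0xff;
    }
    return dst_bytes;
}
```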

-- 
Peter Xu


* Re: [Qemu-devel] [RFC 29/29] migration: reset migrate thread vars when resumed
  2017-08-04  9:52       ` Dr. David Alan Gilbert
@ 2017-08-07  6:57         ` Peter Xu
  0 siblings, 0 replies; 116+ messages in thread
From: Peter Xu @ 2017-08-07  6:57 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

On Fri, Aug 04, 2017 at 10:52:27AM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Thu, Aug 03, 2017 at 02:54:35PM +0100, Dr. David Alan Gilbert wrote:

[...]

> > > > @@ -2319,6 +2327,7 @@ static void *migration_thread(void *opaque)
> > > >      /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */
> > > >      enum MigrationStatus current_active_state = MIGRATION_STATUS_ACTIVE;
> > > >      bool enable_colo = migrate_colo_enabled();
> > > > +    MigThrError thr_error;
> > > >  
> > > >      rcu_register_thread();
> > > >  
> > > > @@ -2395,8 +2404,17 @@ static void *migration_thread(void *opaque)
> > > >           * Try to detect any kind of failures, and see whether we
> > > >           * should stop the migration now.
> > > >           */
> > > > -        if (migration_detect_error(s)) {
> > > > +        thr_error = migration_detect_error(s);
> > > > +        if (thr_error == MIG_THR_ERR_FATAL) {
> > > > +            /* Stop migration */
> > > >              break;
> > > > +        } else if (thr_error == MIG_THR_ERR_RECOVERED) {
> > > > +            /*
> > > > +             * Just recovered from a e.g. network failure, reset all
> > > > +             * the local variables.
> > > > +             */
> > > > +            initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > > > +            initial_bytes = 0;
> > > 
> > > They don't seem that important to reset?
> > 
> > The problem is that we have this in migration_thread():
> > 
> >         if (current_time >= initial_time + BUFFER_DELAY) {
> >             uint64_t transferred_bytes = qemu_ftell(s->to_dst_file) -
> >                                          initial_bytes;
> >             uint64_t time_spent = current_time - initial_time;
> >             double bandwidth = (double)transferred_bytes / time_spent;
> >             threshold_size = bandwidth * s->parameters.downtime_limit;
> >             ...
> >         }
> > 
> > Here qemu_ftell() would possibly be very small since we have just
> > resumed... and then transferred_bytes will be extremely huge since
> > "qemu_ftell(s->to_dst_file) - initial_bytes" is actually negative...
> > Then, with luck, we'll get an extremely huge "bandwidth" as well.
> 
> Ah yes that's a good reason to reset it then; add a comment like
> 'important to avoid breaking transferred_bytes and bandwidth
> calculation'

Will do.

-- 
Peter Xu


* Re: [Qemu-devel] [RFC 22/29] migration: new message MIG_RP_MSG_RECV_BITMAP
  2017-08-07  6:11         ` Peter Xu
@ 2017-08-07  9:04           ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 116+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-07  9:04 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> On Fri, Aug 04, 2017 at 10:49:42AM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > On Thu, Aug 03, 2017 at 11:50:22AM +0100, Dr. David Alan Gilbert wrote:
> > > > > +    /* Size of the bitmap, in bytes */
> > > > > +    size = (block->max_length >> TARGET_PAGE_BITS) / 8;
> > > > > +    qemu_put_be64(file, size);
> > > > > +    qemu_put_buffer(file, (const uint8_t *)block->receivedmap, size);
> > > > 
> > > > Do we need to be careful about endianness and length of long here?
> > > > The migration stream can (theoretically) migrate between hosts of
> > > > different endianness, e.g. a Power LE and Power BE host it can also
> > > > migrate between a 32bit and 64bit host where the 'long' used in our
> > > > bitmap is a different length.
> > > 
> > > Ah, good catch...
> > > 
> > > I feel like we'd better provide new bitmap helpers for when the
> > > bitmap will be sent somewhere else, like:
> > > 
> > >   void bitmap_to_le(unsigned long *dst, const unsigned long *src,
> > >                     long nbits);
> > >   void bitmap_from_le(unsigned long *dst, const unsigned long *src,
> > >                       long nbits);
> > > 
> > > I used little endian since I *think* that should work even across
> > > 32/64 bit machines (and I think big endian would not).
> > 
> > Let's think about some combinations:
> > 
> > 64 bit LE  G0,G1...G7
> > 64 bit BE  G7,G6...G0
> > 32 bit LE  A0,A1,A2,A3, B0,B1,B2,B3
> > 32 bit BE  A3,A2,A1,A0  B3,B2,B1,B0
> > 
> > considering a 64bit BE src to a 32bit LE dest:
> >   64 bit BE  G7,G6...G0
> >   bitmap_to_le swaps that to
> >              G0,G1,..G7
> > 
> > destination reads two 32bit chunks:
> >   G0,G1,G2,G3    G4,G5,G6,G7
> > 
> > dest is LE so no byteswap is needed.
> > 
> > Yes, I _think_ that's OK.
> 
> Hmm, I thought it over again and found another problem, which makes it
> more interesting...
> 
> The size of the bitmap can actually be different on hosts that use
> different word sizes (or rather, long sizes, which match the word
> size). E.g., when we allocate a bitmap that covers nbits=6*32+1, we'll
> get a 28-byte (6*4+4) bitmap on 32bit machines, and a 32-byte (3*8+8)
> bitmap on 64bit machines.
> 
> I really wish we had a typedef for the current bitmap type, like:
> 
>   typedef long *Bitmap;
> 
> Then I could simply consider switching the "long" to uint64_t, but
> sadly we don't have one. Also, switching the type would be slightly
> overkill (maybe still doable, but I'd need to touch lots of files).

Yes, it's messy.

> One possible solution: always send the bitmap in 8-byte chunks, even
> on 32bit machines. If the bitmap size is not aligned (which can of
> course only happen on 32bit machines), align it up to 8 bytes and fill
> the rest with zeros. I'll try to do this hack only in the migration
> code for now.

Yes, although you may find it easier to send them in 4-byte chunks and
pad on reception.
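A standalone sketch of what pad-on-reception could look like (illustrative
names, not the eventual QEMU API); since the wire format is plain
little-endian bytes, the sender's chunk size (4-byte or 8-byte) doesn't
matter to the receiver:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical receive-side inverse: rebuild a native 'unsigned long'
 * bitmap from a stream of little-endian bytes.  Any zero padding past
 * nbits simply decodes to clear bits, so the sender's alignment choice
 * is invisible here. */
static void bitmap_unpack_le(unsigned long *dst, long nbits,
                             const uint8_t *src, size_t src_bytes)
{
    size_t bits_per_long = sizeof(unsigned long) * CHAR_BIT;
    size_t nlongs = (nbits + bits_per_long - 1) / bits_per_long;
    size_t i;

    memset(dst, 0, nlongs * sizeof(unsigned long));
    for (i = 0; i < src_bytes && i < nlongs * sizeof(unsigned long); i++) {
        dst[i / sizeof(unsigned long)] |=
            (unsigned long)src[i] << ((i % sizeof(unsigned long)) * CHAR_BIT);
    }
}
```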

Dave

> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery
  2017-08-03 15:57 ` Dr. David Alan Gilbert
@ 2017-08-21  7:47   ` Peter Xu
  0 siblings, 0 replies; 116+ messages in thread
From: Peter Xu @ 2017-08-21  7:47 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Laurent Vivier, Alexey Perevalov, Juan Quintela,
	Andrea Arcangeli, berrange

On Thu, Aug 03, 2017 at 04:57:54PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > As we all know, postcopy migration has a potential risk of losing
> > the VM if the network breaks during the migration. This series
> > tries to solve the problem by allowing the migration to pause at the
> > failure point, and to recover after the link is reconnected.
> > 
> > There was existing work on this issue from Md Haris Iqbal:
> > 
> > https://lists.nongnu.org/archive/html/qemu-devel/2016-08/msg03468.html
> > 
> > This series is a totally re-work of the issue, based on Alexey
> > Perevalov's recved bitmap v8 series:
> > 
> > https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg06401.html
> 
> 
> Hi Peter,
>   See my comments on the individual patches; but at a top level I think
> it looks pretty good.
> 
>   I still worry about two related things; one is similar to what
> you discussed with Dan.
> 
>   1) What happens if we end up hanging on a missing page with the BQL
>   taken, so we can't use the monitor?
>   Checking my notes from when I was chatting to Harris last year,
>   'info cpu' was pretty good at triggering this because it needed the
>   vcpus to come out of their loops, so if any vcpu was blocked on
>   memory we'd block waiting.  The other case is where an emulated IO
>   device accesses it, and that's easiest to hit by doing a migrate
>   with inbound network traffic.
>   In this case, will your 'accept' still work?

It will not work.

To solve this problem, I posted the series:

  [RFC 0/6] monitor: allow per-monitor thread

Let's see whether that is acceptable.

> 
>   2) Similar to Dan's question of what happens if the network just hangs
>   as opposed to giving an error; it should sort itself out with TCP
>   timeouts - eventually.  Perhaps the easiest way to test this is just
>   to add an iptables -j DROP rule for the migration port - it probably
>   makes (1) easier to trigger.

Yeah, so I think I'll just avoid considering this for now.  Thanks,

-- 
Peter Xu


end of thread, other threads:[~2017-08-21  7:47 UTC | newest]

Thread overview: 116+ messages
2017-07-28  8:06 [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 01/29] migration: fix incorrect postcopy recved_bitmap Peter Xu
2017-07-31 16:34   ` Dr. David Alan Gilbert
2017-08-01  2:11     ` Peter Xu
2017-08-01  5:48       ` Alexey Perevalov
2017-08-01  6:02         ` Peter Xu
2017-08-01  6:12           ` Alexey Perevalov
2017-07-28  8:06 ` [Qemu-devel] [RFC 02/29] migration: fix comment disorder in RAMState Peter Xu
2017-07-31 16:39   ` Dr. David Alan Gilbert
2017-07-28  8:06 ` [Qemu-devel] [RFC 03/29] io: fix qio_channel_socket_accept err handling Peter Xu
2017-07-31 16:53   ` Dr. David Alan Gilbert
2017-08-01  2:25     ` Peter Xu
2017-08-01  8:32       ` Daniel P. Berrange
2017-08-01  8:55         ` Dr. David Alan Gilbert
2017-08-02  3:21           ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 04/29] bitmap: introduce bitmap_invert() Peter Xu
2017-07-31 17:11   ` Dr. David Alan Gilbert
2017-08-01  2:43     ` Peter Xu
2017-08-01  8:40       ` Dr. David Alan Gilbert
2017-08-02  3:20         ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 05/29] bitmap: introduce bitmap_count_one() Peter Xu
2017-07-31 17:58   ` Dr. David Alan Gilbert
2017-07-28  8:06 ` [Qemu-devel] [RFC 06/29] migration: dump str in migrate_set_state trace Peter Xu
2017-07-31 18:27   ` Dr. David Alan Gilbert
2017-07-28  8:06 ` [Qemu-devel] [RFC 07/29] migration: better error handling with QEMUFile Peter Xu
2017-07-31 18:39   ` Dr. David Alan Gilbert
2017-08-01  5:49     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 08/29] migration: reuse mis->userfault_quit_fd Peter Xu
2017-07-31 18:42   ` Dr. David Alan Gilbert
2017-07-28  8:06 ` [Qemu-devel] [RFC 09/29] migration: provide postcopy_fault_thread_notify() Peter Xu
2017-07-31 18:45   ` Dr. David Alan Gilbert
2017-08-01  3:01     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 10/29] migration: new property "x-postcopy-fast" Peter Xu
2017-07-31 18:52   ` Dr. David Alan Gilbert
2017-08-01  3:13     ` Peter Xu
2017-08-01  8:50       ` Dr. David Alan Gilbert
2017-08-02  3:31         ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 11/29] migration: new postcopy-pause state Peter Xu
2017-07-28 15:53   ` Eric Blake
2017-07-31  7:02     ` Peter Xu
2017-07-31 19:06   ` Dr. David Alan Gilbert
2017-08-01  6:28     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 12/29] migration: allow dst vm pause on postcopy Peter Xu
2017-08-01  9:47   ` Dr. David Alan Gilbert
2017-08-02  5:06     ` Peter Xu
2017-08-03 14:03       ` Dr. David Alan Gilbert
2017-08-04  3:43         ` Peter Xu
2017-08-04  9:33           ` Dr. David Alan Gilbert
2017-08-04  9:44             ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 13/29] migration: allow src return path to pause Peter Xu
2017-08-01 10:01   ` Dr. David Alan Gilbert
2017-07-28  8:06 ` [Qemu-devel] [RFC 14/29] migration: allow send_rq to fail Peter Xu
2017-08-01 10:30   ` Dr. David Alan Gilbert
2017-07-28  8:06 ` [Qemu-devel] [RFC 15/29] migration: allow fault thread to pause Peter Xu
2017-08-01 10:41   ` Dr. David Alan Gilbert
2017-07-28  8:06 ` [Qemu-devel] [RFC 16/29] qmp: hmp: add migrate "resume" option Peter Xu
2017-07-28 15:57   ` Eric Blake
2017-07-31  7:05     ` Peter Xu
2017-08-01 10:42   ` Dr. David Alan Gilbert
2017-08-01 11:03   ` Daniel P. Berrange
2017-08-02  5:56     ` Peter Xu
2017-08-02  9:28       ` Daniel P. Berrange
2017-07-28  8:06 ` [Qemu-devel] [RFC 17/29] migration: rebuild channel on source Peter Xu
2017-08-01 10:59   ` Dr. David Alan Gilbert
2017-08-02  6:14     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 18/29] migration: new state "postcopy-recover" Peter Xu
2017-08-01 11:36   ` Dr. David Alan Gilbert
2017-08-02  6:42     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 19/29] migration: let dst listen on port always Peter Xu
2017-08-01 10:56   ` Daniel P. Berrange
2017-08-02  7:02     ` Peter Xu
2017-08-02  9:26       ` Daniel P. Berrange
2017-08-02 11:02         ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 20/29] migration: wakeup dst ram-load-thread for recover Peter Xu
2017-08-03  9:28   ` Dr. David Alan Gilbert
2017-08-04  5:46     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 21/29] migration: new cmd MIG_CMD_RECV_BITMAP Peter Xu
2017-08-03  9:49   ` Dr. David Alan Gilbert
2017-08-04  6:08     ` Peter Xu
2017-08-04  6:15       ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 22/29] migration: new message MIG_RP_MSG_RECV_BITMAP Peter Xu
2017-08-03 10:50   ` Dr. David Alan Gilbert
2017-08-04  6:59     ` Peter Xu
2017-08-04  9:49       ` Dr. David Alan Gilbert
2017-08-07  6:11         ` Peter Xu
2017-08-07  9:04           ` Dr. David Alan Gilbert
2017-07-28  8:06 ` [Qemu-devel] [RFC 23/29] migration: new cmd MIG_CMD_POSTCOPY_RESUME Peter Xu
2017-08-03 11:05   ` Dr. David Alan Gilbert
2017-08-04  7:04     ` Peter Xu
2017-08-04  7:09       ` Peter Xu
2017-08-04  8:30         ` Dr. David Alan Gilbert
2017-08-04  9:22           ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 24/29] migration: new message MIG_RP_MSG_RESUME_ACK Peter Xu
2017-08-03 11:21   ` Dr. David Alan Gilbert
2017-08-04  7:23     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 25/29] migration: introduce SaveVMHandlers.resume_prepare Peter Xu
2017-08-03 11:38   ` Dr. David Alan Gilbert
2017-08-04  7:39     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 26/29] migration: synchronize dirty bitmap for resume Peter Xu
2017-08-03 11:56   ` Dr. David Alan Gilbert
2017-08-04  7:49     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 27/29] migration: setup ramstate " Peter Xu
2017-08-03 12:37   ` Dr. David Alan Gilbert
2017-08-04  8:39     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 28/29] migration: final handshake for the resume Peter Xu
2017-08-03 13:47   ` Dr. David Alan Gilbert
2017-08-04  9:05     ` Peter Xu
2017-08-04  9:53       ` Dr. David Alan Gilbert
2017-07-28  8:06 ` [Qemu-devel] [RFC 29/29] migration: reset migrate thread vars when resumed Peter Xu
2017-08-03 13:54   ` Dr. David Alan Gilbert
2017-08-04  8:52     ` Peter Xu
2017-08-04  9:52       ` Dr. David Alan Gilbert
2017-08-07  6:57         ` Peter Xu
2017-07-28 10:06 ` [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
2017-08-03 15:57 ` Dr. David Alan Gilbert
2017-08-21  7:47   ` Peter Xu
