* [Qemu-devel] [PATCH 00/46] Postcopy implementation
@ 2014-07-04 17:41 Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 01/46] qemu_ram_foreach_block: pass up error value, and down the ramblock name Dr. David Alan Gilbert (git)
                   ` (46 more replies)
  0 siblings, 47 replies; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Hi,
  This is the 1st cut of my version of postcopy; it is designed for use with
the Linux kernel additions recently posted by Andrea Arcangeli here:

   http://lists.gnu.org/archive/html/qemu-devel/2014-07/msg00525.html

The current status is:
  1) It works - I've done testing on large/busy VMs using
     google-stress app test, and it completes cleanly even over a 1Gbps
     link.

  2) It's still rather rough around the corner cases and exit code,
     so it's probably not ready for review yet, but I thought I'd put
     it out for people working in the area and for people to see the
     general shape.

I've taken some ideas from Isaku Yamahata's work, but the code is
structured a bit differently; however, I'm grateful for the understanding
I got from Isaku's code of the problems postcopy has to deal with.

Note, the last patch in the series is a chunk of documentation,
which it's probably useful to read for detail.

Some points:
   a) I've tried to keep the pieces I've added general rather than postcopy
      specific.

   b) Precopy is done using the main precopy code (with very few modifications)
      and it automatically switches to postcopy after a timeout.

   c) A 'command' section type is added that controls the progress of postcopy
      and can be used for anything else people need to control that's not actually
      guest state.

   d) There is a 'return path' mechanism for sending data from the destination
      back to the source; postcopy uses this for the page requests, but I think
      it's also useful for things like the failover implementations, and can
      be used to signal failure from the destination to the source.

   e) I've added a 'migration_set_parameter' command as somewhere to put integer
      parameters associated with migration.
      e.1) And I use that initially for the length of precopy to try.

   f) I've added a 'MigrationIncomingState' where postcopy keeps its data;
      it seems a good idea to move other incoming state there as well.

c & d together provide a 'reqack' that's basically a ping, quite useful
for debugging

Current TODO:
   1) It's not bisectable yet
   2) There are no testsuite additions (although I have a virt-test modification
      I've been using).
   3) End-of-migration cleanup is unfinished, as is some of the error handling.
   4) Not all the code is there for systems with hostpagesize!=qemupagesize
   5) xbzrle needs disabling once in postcopy
   6) RDMA needs some rework
   7) I moved QEMUFile into a public header, it's a bad idea and I'll undo that.
   8) My latency measurements on page requests are generally pretty good, but I'm
      seeing a few high spikes, I need to investigate.
   9) Andrea has suggestions on ways to avoid some of the huge-page splitting
      that occurs during the discard phase after precopy.
  10) I'd like to format the data on the return path in a more structured way
      (i.e. maybe using stuff from my BER world).

I'm hoping to get that lot cleared up by 2.2

How to use:
   migrate_set_capability x-postcopy-ram on
   migrate_set_parameter x-postcopy-start-time 500            <--- that's in ms
   migrate -d tcp:whereever:port

Other ideas:
  I think the returnpath+command could be used to send qmp commands to
  the destination to allow the src/destination to coordinate a hotplug
  or to set up the destination state.

Dave
Dr. David Alan Gilbert (46):
  qemu_ram_foreach_block: pass up error value, and down the ramblock
    name
  Move QEMUFile structure to qemu-file.h
  QEMUSizedBuffer/QEMUFile
  improve DPRINTF macros, add to savevm
  Add qemu_get_counted_string to read a string prefixed by a count byte
  Create MigrationIncomingState
  Return path: Open a return path on QEMUFile for sockets
  Return path: socket_writev_buffer: Block even on non-blocking fd's
  Migration commands
  Return path: Control commands
  Return path: Send responses from destination to source
  Return path: Source handling of return path
  qemu_loadvm debug
  ram_debug_dump_bitmap: Dump a migration bitmap as text
  Rework loadvm path for subloops
  Add migration-capability boolean for postcopy-ram.
  Add wrappers and handlers for sending/receiving the postcopy-ram
    migration messages.
  QEMU_VM_CMD_PACKAGED: Send a packaged chunk of migration stream
  migrate_init: Call from savevm
  Allow savevm handlers to state whether they could go into postcopy
  postcopy: OS support test
  Migration parameters: Add qmp/hmp commands for setting/viewing
  MIG_STATE_POSTCOPY_ACTIVE: Add new migration state
  qemu_savevm_state_complete: Postcopy changes
  Postcopy: Maintain sentmap during postcopy pre phase
  Postcopy page-map-incoming (PMI) structure
  postcopy: Add incoming_init/cleanup functions
  postcopy: Incoming initialisation
  postcopy: ram_enable_notify to switch on userfault
  Postcopy: postcopy_start
  Postcopy: Rework migration thread for postcopy mode
  mig fd_connect: open return path
  Postcopy: Create a fault handler thread before marking the ram as
    userfault
  Page request:  Add MIG_RPCOMM_REQPAGES reverse command
  Page request: Process incoming page request
  Page request: Consume pages off the post-copy queue
  Add assertion to check migration_dirty_pages doesn't go -ve; have seen
    it happen once but not sure why
  postcopy_ram.c: place_page and helpers
  Postcopy: Use helpers to map pages during migration
  qemu_ram_block_from_host
  Handle userfault requests (although userfaultfd not done yet)
  Start up a postcopy/listener thread ready for incoming page data
  postcopy: Wire up loadvm_postcopy_ram_handle_{run,end} commands
  postcopy: Use userfaultfd
  End of migration for postcopy
  Start documenting how postcopy works.

 Makefile.objs                    |   2 +-
 arch_init.c                      | 477 ++++++++++++++++++++--
 docs/migration.txt               | 148 +++++++
 exec.c                           |  60 ++-
 hmp-commands.hx                  |  17 +
 hmp.c                            |  54 +++
 hmp.h                            |   4 +
 include/exec/cpu-common.h        |   7 +-
 include/migration/migration.h    | 115 ++++++
 include/migration/postcopy-ram.h |  83 ++++
 include/migration/qemu-file.h    |  62 +++
 include/migration/vmstate.h      |   2 +-
 include/qemu/typedefs.h          |   8 +-
 include/sysemu/sysemu.h          |  41 +-
 migration-rdma.c                 |   4 +-
 migration.c                      | 695 ++++++++++++++++++++++++++++++--
 monitor.c                        |  25 ++
 postcopy-ram.c                   | 833 +++++++++++++++++++++++++++++++++++++++
 qapi-schema.json                 |  56 ++-
 qemu-file.c                      | 578 +++++++++++++++++++++++++--
 qmp-commands.hx                  |  23 ++
 savevm.c                         | 822 +++++++++++++++++++++++++++++++++++---
 22 files changed, 3933 insertions(+), 183 deletions(-)
 create mode 100644 include/migration/postcopy-ram.h
 create mode 100644 postcopy-ram.c

-- 
1.9.3


* [Qemu-devel] [PATCH 01/46] qemu_ram_foreach_block: pass up error value, and down the ramblock name
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-07 15:46   ` Eric Blake
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 02/46] Move QEMUFile structure to qemu-file.h Dr. David Alan Gilbert (git)
                   ` (45 subsequent siblings)
  46 siblings, 1 reply; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Check the return value of the function it calls and return an error if
it's non-zero.
Fix up qemu_rdma_init_one_block, which is the only current caller,
  and __qemu_rdma_add_block, the only function it calls using it.

Pass the name of the ramblock to the function; helps in debugging.
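
The calling convention described above can be sketched in standalone C. This
is only an illustration under assumed names (DemoBlock, DemoIterFunc,
demo_foreach_block and demo_visit are made up here); the real iterator walks
ram_list.blocks and passes more arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a RAM block; not QEMU's RAMBlock. */
typedef struct {
    const char *name;
    size_t length;
} DemoBlock;

/* The callback returns 0 to continue, non-zero to stop with that error. */
typedef int (DemoIterFunc)(const char *block_name, size_t length,
                           void *opaque);

static int demo_foreach_block(const DemoBlock *blocks, size_t n,
                              DemoIterFunc func, void *opaque)
{
    size_t i;
    int ret;

    assert(func != NULL);
    for (i = 0; i < n; i++) {
        ret = func(blocks[i].name, blocks[i].length, opaque);
        if (ret) {
            return ret;   /* pass the first error up, skip the rest */
        }
    }
    return 0;
}

/* Example callback: counts visits and fails on a block named "bad". */
static int demo_visit(const char *block_name, size_t length, void *opaque)
{
    int *visited = opaque;

    (*visited)++;
    if (strcmp(block_name, "bad") == 0) {
        /* Having the name available makes the failure easy to report. */
        fprintf(stderr, "block '%s' (%zu bytes) failed\n",
                block_name, length);
        return -1;
    }
    return 0;
}
```

Iteration stops at the first failing block, so later blocks are never
visited and the caller sees the callback's own error value.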

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 exec.c                    | 10 ++++++++--
 include/exec/cpu-common.h |  4 ++--
 migration-rdma.c          |  4 ++--
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/exec.c b/exec.c
index 5a2a25e..a9ad052 100644
--- a/exec.c
+++ b/exec.c
@@ -2786,12 +2786,18 @@ bool cpu_physical_memory_is_io(hwaddr phys_addr)
              memory_region_is_romd(mr));
 }
 
-void qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque)
+int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque)
 {
     RAMBlock *block;
+    int ret;
 
     QTAILQ_FOREACH(block, &ram_list.blocks, next) {
-        func(block->host, block->offset, block->length, opaque);
+        ret = func(block->idstr, block->host, block->offset, block->length,
+                   opaque);
+        if (ret) {
+            return ret;
+        }
     }
+    return 0;
 }
 #endif
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index e3ec4c8..8042f50 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -118,10 +118,10 @@ void cpu_flush_icache_range(hwaddr start, int len);
 extern struct MemoryRegion io_mem_rom;
 extern struct MemoryRegion io_mem_notdirty;
 
-typedef void (RAMBlockIterFunc)(void *host_addr,
+typedef int (RAMBlockIterFunc)(const char *block_name, void *host_addr,
     ram_addr_t offset, ram_addr_t length, void *opaque);
 
-void qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque);
+int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque);
 
 #endif
 
diff --git a/migration-rdma.c b/migration-rdma.c
index d99812c..666c052 100644
--- a/migration-rdma.c
+++ b/migration-rdma.c
@@ -595,10 +595,10 @@ static int __qemu_rdma_add_block(RDMAContext *rdma, void *host_addr,
  * in advanced before the migration starts. This tells us where the RAM blocks
  * are so that we can register them individually.
  */
-static void qemu_rdma_init_one_block(void *host_addr,
+static int qemu_rdma_init_one_block(const char *block_name, void *host_addr,
     ram_addr_t block_offset, ram_addr_t length, void *opaque)
 {
-    __qemu_rdma_add_block(opaque, host_addr, block_offset, length);
+    return __qemu_rdma_add_block(opaque, host_addr, block_offset, length);
 }
 
 /*
-- 
1.9.3


* [Qemu-devel] [PATCH 02/46] Move QEMUFile structure to qemu-file.h
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 01/46] qemu_ram_foreach_block: pass up error value, and down the ramblock name Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 03/46] QEMUSizedBuffer/QEMUFile Dr. David Alan Gilbert (git)
                   ` (44 subsequent siblings)
  46 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

This is mostly as an easy way to get to the MigrationIncomingState
that I'm hanging off the file.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/qemu-file.h | 22 ++++++++++++++++++++++
 qemu-file.c                   | 40 +++++++++-------------------------------
 2 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index c90f529..6e797bf 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -82,6 +82,9 @@ typedef size_t (QEMURamSaveFunc)(QEMUFile *f, void *opaque,
                                size_t size,
                                int *bytes_sent);
 
+#define QEMUFILE_IO_BUF_SIZE 32768
+#define QEMUFILE_MAX_IOV_SIZE MIN(IOV_MAX, 64)
+
 typedef struct QEMUFileOps {
     QEMUFilePutBufferFunc *put_buffer;
     QEMUFileGetBufferFunc *get_buffer;
@@ -94,6 +97,25 @@ typedef struct QEMUFileOps {
     QEMURamSaveFunc *save_page;
 } QEMUFileOps;
 
+struct QEMUFile {
+    const QEMUFileOps *ops;
+    void *opaque;
+
+    int64_t bytes_xfer;
+    int64_t xfer_limit;
+
+    int64_t pos; /* start of buffer when writing, end of buffer
+                    when reading */
+    int buf_index;
+    int buf_size; /* 0 when writing */
+    uint8_t buf[QEMUFILE_IO_BUF_SIZE];
+
+    struct iovec iov[QEMUFILE_MAX_IOV_SIZE];
+    unsigned int iovcnt;
+
+    int last_error;
+};
+
 QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops);
 QEMUFile *qemu_fopen(const char *filename, const char *mode);
 QEMUFile *qemu_fdopen(int fd, const char *mode);
diff --git a/qemu-file.c b/qemu-file.c
index a8e3912..b4f0c73 100644
--- a/qemu-file.c
+++ b/qemu-file.c
@@ -6,28 +6,6 @@
 #include "migration/qemu-file.h"
 #include "trace.h"
 
-#define IO_BUF_SIZE 32768
-#define MAX_IOV_SIZE MIN(IOV_MAX, 64)
-
-struct QEMUFile {
-    const QEMUFileOps *ops;
-    void *opaque;
-
-    int64_t bytes_xfer;
-    int64_t xfer_limit;
-
-    int64_t pos; /* start of buffer when writing, end of buffer
-                    when reading */
-    int buf_index;
-    int buf_size; /* 0 when writing */
-    uint8_t buf[IO_BUF_SIZE];
-
-    struct iovec iov[MAX_IOV_SIZE];
-    unsigned int iovcnt;
-
-    int last_error;
-};
-
 typedef struct QEMUFileStdio {
     FILE *stdio_file;
     QEMUFile *file;
@@ -553,7 +531,7 @@ static ssize_t qemu_fill_buffer(QEMUFile *f)
     f->buf_size = pending;
 
     len = f->ops->get_buffer(f->opaque, f->buf + pending, f->pos,
-                        IO_BUF_SIZE - pending);
+                        QEMUFILE_IO_BUF_SIZE - pending);
     if (len > 0) {
         f->buf_size += len;
         f->pos += len;
@@ -621,7 +599,7 @@ static void add_to_iovec(QEMUFile *f, const uint8_t *buf, int size)
         f->iov[f->iovcnt++].iov_len = size;
     }
 
-    if (f->iovcnt >= MAX_IOV_SIZE) {
+    if (f->iovcnt >= QEMUFILE_MAX_IOV_SIZE) {
         qemu_fflush(f);
     }
 }
@@ -650,7 +628,7 @@ void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size)
     }
 
     while (size > 0) {
-        l = IO_BUF_SIZE - f->buf_index;
+        l = QEMUFILE_IO_BUF_SIZE - f->buf_index;
         if (l > size) {
             l = size;
         }
@@ -660,7 +638,7 @@ void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size)
             add_to_iovec(f, f->buf + f->buf_index, l);
         }
         f->buf_index += l;
-        if (f->buf_index == IO_BUF_SIZE) {
+        if (f->buf_index == QEMUFILE_IO_BUF_SIZE) {
             qemu_fflush(f);
         }
         if (qemu_file_get_error(f)) {
@@ -683,7 +661,7 @@ void qemu_put_byte(QEMUFile *f, int v)
         add_to_iovec(f, f->buf + f->buf_index, 1);
     }
     f->buf_index++;
-    if (f->buf_index == IO_BUF_SIZE) {
+    if (f->buf_index == QEMUFILE_IO_BUF_SIZE) {
         qemu_fflush(f);
     }
 }
@@ -709,8 +687,8 @@ int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset)
     int index;
 
     assert(!qemu_file_is_writable(f));
-    assert(offset < IO_BUF_SIZE);
-    assert(size <= IO_BUF_SIZE - offset);
+    assert(offset < QEMUFILE_IO_BUF_SIZE);
+    assert(size <= QEMUFILE_IO_BUF_SIZE - offset);
 
     /* The 1st byte to read from */
     index = f->buf_index + offset;
@@ -759,7 +737,7 @@ int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size)
     while (pending > 0) {
         int res;
 
-        res = qemu_peek_buffer(f, buf, MIN(pending, IO_BUF_SIZE), 0);
+        res = qemu_peek_buffer(f, buf, MIN(pending, QEMUFILE_IO_BUF_SIZE), 0);
         if (res == 0) {
             return done;
         }
@@ -780,7 +758,7 @@ int qemu_peek_byte(QEMUFile *f, int offset)
     int index = f->buf_index + offset;
 
     assert(!qemu_file_is_writable(f));
-    assert(offset < IO_BUF_SIZE);
+    assert(offset < QEMUFILE_IO_BUF_SIZE);
 
     if (index >= f->buf_size) {
         qemu_fill_buffer(f);
-- 
1.9.3


* [Qemu-devel] [PATCH 03/46] QEMUSizedBuffer/QEMUFile
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 01/46] qemu_ram_foreach_block: pass up error value, and down the ramblock name Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 02/46] Move QEMUFile structure to qemu-file.h Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 04/46] improve DPRINTF macros, add to savevm Dr. David Alan Gilbert (git)
                   ` (43 subsequent siblings)
  46 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Stefan Berger's patch to create a QEMUFile that goes to a memory buffer;
from:

http://lists.gnu.org/archive/html/qemu-devel/2013-03/msg05036.html

Using the QEMUFile interface, this patch adds support functions for
operating on in-memory sized buffers that can be written to or read from.
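
For readers new to the idea, here is a minimal standalone sketch of a
growable buffer made of fixed-size chunks that can be written to and read
from at arbitrary offsets. It is not the qsb_* code from this patch: the
names (DemoBuf, demo_write_at, demo_read_at, demo_free) and the 1024-byte
chunk size are assumptions for illustration, and it uses plain
malloc/realloc rather than glib.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_CHUNK_SIZE 1024   /* assumed chunk size for this sketch */

typedef struct {
    uint8_t **chunks;   /* array of DEMO_CHUNK_SIZE-byte chunks */
    size_t n_chunks;
    size_t used;        /* one past the highest byte written so far */
} DemoBuf;

/* Make room for new_size bytes; returns 0, or -1 if allocation fails. */
static int demo_grow(DemoBuf *b, size_t new_size)
{
    size_t need = (new_size + DEMO_CHUNK_SIZE - 1) / DEMO_CHUNK_SIZE;
    uint8_t **p;

    if (need <= b->n_chunks) {
        return 0;
    }
    p = realloc(b->chunks, need * sizeof(*p));
    if (!p) {
        return -1;
    }
    b->chunks = p;
    while (b->n_chunks < need) {
        b->chunks[b->n_chunks] = calloc(1, DEMO_CHUNK_SIZE);
        if (!b->chunks[b->n_chunks]) {
            return -1;
        }
        b->n_chunks++;
    }
    return 0;
}

/* Copy count bytes from src to offset pos, growing the buffer as needed. */
static int demo_write_at(DemoBuf *b, const uint8_t *src, size_t pos,
                         size_t count)
{
    size_t done = 0;

    assert(b && (src || count == 0));
    if (demo_grow(b, pos + count) < 0) {
        return -1;
    }
    while (done < count) {
        size_t off = (pos + done) % DEMO_CHUNK_SIZE;
        size_t n = DEMO_CHUNK_SIZE - off;   /* room left in this chunk */

        if (n > count - done) {
            n = count - done;
        }
        memcpy(b->chunks[(pos + done) / DEMO_CHUNK_SIZE] + off,
               src + done, n);
        done += n;
    }
    if (pos + count > b->used) {
        b->used = pos + count;
    }
    return 0;
}

/* Copy up to count bytes starting at pos into dst; returns bytes copied. */
static size_t demo_read_at(const DemoBuf *b, uint8_t *dst, size_t pos,
                           size_t count)
{
    size_t done = 0;

    if (pos >= b->used) {
        return 0;
    }
    if (count > b->used - pos) {
        count = b->used - pos;   /* clamp to the bytes actually used */
    }
    while (done < count) {
        size_t off = (pos + done) % DEMO_CHUNK_SIZE;
        size_t n = DEMO_CHUNK_SIZE - off;

        if (n > count - done) {
            n = count - done;
        }
        memcpy(dst + done,
               b->chunks[(pos + done) / DEMO_CHUNK_SIZE] + off, n);
        done += n;
    }
    return count;
}

static void demo_free(DemoBuf *b)
{
    size_t i;

    for (i = 0; i < b->n_chunks; i++) {
        free(b->chunks[i]);
    }
    free(b->chunks);
    memset(b, 0, sizeof(*b));
}
```

Because every chunk in the sketch has the same size, a byte offset maps to a
chunk by division. The patch instead walks the iovec list, which allows
chunks of differing lengths.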

Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Signed-off-by: Joel Schopp <jschopp@linux.vnet.ibm.com>

For minor tweaks/rebase I've done to it:
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/qemu-file.h |  28 +++
 include/qemu/typedefs.h       |   1 +
 qemu-file.c                   | 410 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 439 insertions(+)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index 6e797bf..1ce3702 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -25,6 +25,8 @@
 #define QEMU_FILE_H 1
 #include "exec/cpu-common.h"
 
+#include <stdint.h>
+
 /* This function writes a chunk of data to a file at the given position.
  * The pos argument can be ignored if the file is only being used for
  * streaming.  The handler should try to write all of the data it can.
@@ -116,11 +118,21 @@ struct QEMUFile {
     int last_error;
 };
 
+struct QEMUSizedBuffer {
+    struct iovec *iov;
+    size_t n_iov;
+    size_t size; /* total allocated size in all iov's */
+    size_t used; /* number of used bytes */
+};
+
+typedef struct QEMUSizedBuffer QEMUSizedBuffer;
+
 QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops);
 QEMUFile *qemu_fopen(const char *filename, const char *mode);
 QEMUFile *qemu_fdopen(int fd, const char *mode);
 QEMUFile *qemu_fopen_socket(int fd, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
+QEMUFile *qemu_bufopen(const char *mode, QEMUSizedBuffer *input);
 int qemu_get_fd(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
 int64_t qemu_ftell(QEMUFile *f);
@@ -133,6 +145,22 @@ void qemu_put_byte(QEMUFile *f, int v);
 void qemu_put_buffer_async(QEMUFile *f, const uint8_t *buf, int size);
 bool qemu_file_mode_is_not_valid(const char *mode);
 
+QEMUSizedBuffer *qsb_create(const uint8_t *buffer, size_t len);
+QEMUSizedBuffer *qsb_clone(const QEMUSizedBuffer *);
+void qsb_free(QEMUSizedBuffer *);
+size_t qsb_set_length(QEMUSizedBuffer *qsb, size_t length);
+size_t qsb_get_length(const QEMUSizedBuffer *qsb);
+ssize_t qsb_get_buffer(const QEMUSizedBuffer *, off_t start, size_t count,
+                       uint8_t **buf);
+ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t *buf,
+                     off_t pos, size_t count);
+
+
+/*
+ * For use on files opened with qemu_bufopen
+ */
+const QEMUSizedBuffer *qemu_buf_get(QEMUFile *f);
+
 static inline void qemu_put_ubyte(QEMUFile *f, unsigned int v)
 {
     qemu_put_byte(f, (int)v);
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 5f20b0e..db1153a 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -60,6 +60,7 @@ typedef struct PCIEAERLog PCIEAERLog;
 typedef struct PCIEAERErr PCIEAERErr;
 typedef struct PCIEPort PCIEPort;
 typedef struct PCIESlot PCIESlot;
+typedef struct QEMUSizedBuffer QEMUSizedBuffer;
 typedef struct MSIMessage MSIMessage;
 typedef struct SerialState SerialState;
 typedef struct PCMCIACardState PCMCIACardState;
diff --git a/qemu-file.c b/qemu-file.c
index b4f0c73..69479f1 100644
--- a/qemu-file.c
+++ b/qemu-file.c
@@ -856,3 +856,413 @@ uint64_t qemu_get_be64(QEMUFile *f)
     v |= qemu_get_be32(f);
     return v;
 }
+
+#define QSB_CHUNK_SIZE      (1 << 10)
+#define QSB_MAX_CHUNK_SIZE  (10 * QSB_CHUNK_SIZE)
+
+/**
+ * Create a QEMUSizedBuffer
+ * This type of buffer uses scatter-gather lists internally and
+ * can grow to any size. Any data array in the scatter-gather list
+ * can hold a different number of bytes.
+ *
+ * @buffer: Optional buffer to copy into the QSB
+ * @len: size of initial buffer; if @buffer is given, buffer must
+ *       hold at least len bytes
+ *
+ * Returns a pointer to a QEMUSizedBuffer
+ */
+QEMUSizedBuffer *qsb_create(const uint8_t *buffer, size_t len)
+{
+    QEMUSizedBuffer *qsb;
+    size_t alloc_len, num_chunks, i, to_copy;
+    size_t chunk_size = (len > QSB_MAX_CHUNK_SIZE)
+                        ? QSB_MAX_CHUNK_SIZE
+                        : QSB_CHUNK_SIZE;
+
+    if (len == 0) {
+        /* we want to allocate at least one chunk */
+        len = QSB_CHUNK_SIZE;
+    }
+
+    num_chunks = DIV_ROUND_UP(len, chunk_size);
+    alloc_len = num_chunks * chunk_size;
+
+    qsb = g_new0(QEMUSizedBuffer, 1);
+    qsb->iov = g_new0(struct iovec, num_chunks);
+    qsb->n_iov = num_chunks;
+
+    for (i = 0; i < num_chunks; i++) {
+        qsb->iov[i].iov_base = g_malloc0(chunk_size);
+        qsb->iov[i].iov_len = chunk_size;
+        if (buffer) {
+            to_copy = (len - qsb->used) > chunk_size
+                      ? chunk_size : (len - qsb->used);
+            memcpy(qsb->iov[i].iov_base, &buffer[qsb->used], to_copy);
+            qsb->used += to_copy;
+        }
+    }
+
+    qsb->size = alloc_len;
+
+    return qsb;
+}
+
+/**
+ * Free the QEMUSizedBuffer
+ *
+ * @qsb: The QEMUSizedBuffer to free
+ */
+void qsb_free(QEMUSizedBuffer *qsb)
+{
+    size_t i;
+
+    if (!qsb) {
+        return;
+    }
+
+    for (i = 0; i < qsb->n_iov; i++) {
+        g_free(qsb->iov[i].iov_base);
+    }
+    g_free(qsb->iov);
+    g_free(qsb);
+}
+
+/**
+ * Get the number of used bytes in the QEMUSizedBuffer
+ *
+ * @qsb: A QEMUSizedBuffer
+ *
+ * Returns the number of bytes currently used in this buffer
+ */
+size_t qsb_get_length(const QEMUSizedBuffer *qsb)
+{
+    return qsb->used;
+}
+
+/**
+ * Set the length of the buffer; The primary usage of this
+ * function is to truncate the number of used bytes in the buffer.
+ * The size will not be extended beyond the current  number of
+ * allocated bytes in the QEMUSizedBuffer.
+ *
+ * @qsb: A QEMUSizedBuffer
+ * @new_len : The new length of bytes in the buffer
+ *
+ * Returns the number of bytes the buffer was truncated or extended
+ * to.
+ */
+size_t qsb_set_length(QEMUSizedBuffer *qsb, size_t new_len)
+{
+    if (new_len <= qsb->size) {
+        qsb->used = new_len;
+    } else {
+        qsb->used = qsb->size;
+    }
+    return qsb->used;
+}
+
+/**
+ * Get the iovec that holds the data for a given position @pos.
+ *
+ * @qsb: A QEMUSizedBuffer
+ * @pos: The index of a byte in the buffer
+ * @d_off: Pointer to an offset that this function will indicate
+ *         at what position within the returned iovec the byte
+ *         is to be found
+ *
+ * Returns the index of the iovec that holds the byte at the given
+ * index @pos in the byte stream; a negative number if the iovec
+ * for the given position @pos does not exist.
+ */
+static ssize_t qsb_get_iovec(const QEMUSizedBuffer *qsb,
+                             off_t pos, off_t *d_off)
+{
+    ssize_t i;
+    off_t curr = 0;
+
+    if (pos > qsb->used) {
+        return -1;
+    }
+
+    for (i = 0; i < qsb->n_iov; i++) {
+        if (curr + qsb->iov[i].iov_len > pos) {
+            *d_off = pos - curr;
+            return i;
+        }
+        curr += qsb->iov[i].iov_len;
+    }
+    return -1;
+}
+
+/*
+ * Convert the QEMUSizedBuffer into a flat buffer.
+ *
+ * Note: If at all possible, try to avoid this function since it
+ *       may unnecessarily copy memory around.
+ *
+ * @qsb: pointer to QEMUSizedBuffer
+ * @start : offset to start at
+ * @count: number of bytes to copy
+ * @buf: a pointer to an optional buffer to write into; the pointer may
+ *       point to NULL in which case the buffer will be allocated;
+ *       if buffer is provided, it must be large enough to hold @count bytes
+ *
+ * Returns the number of bytes  copied into the output buffer
+ */
+ssize_t qsb_get_buffer(const QEMUSizedBuffer *qsb, off_t start,
+                       size_t count, uint8_t **buf)
+{
+    uint8_t *buffer;
+    const struct iovec *iov;
+    size_t to_copy, all_copy;
+    ssize_t index;
+    off_t s_off;
+    off_t d_off = 0;
+    char *s;
+
+    if (start > qsb->used) {
+        return 0;
+    }
+
+    all_copy = qsb->used - start;
+    if (all_copy > count) {
+        all_copy = count;
+    } else {
+        count = all_copy;
+    }
+
+    if (*buf == NULL) {
+        *buf = g_malloc(all_copy);
+    }
+    buffer = *buf;
+
+    index = qsb_get_iovec(qsb, start, &s_off);
+    if (index < 0) {
+        return 0;
+    }
+
+    while (all_copy > 0) {
+        iov = &qsb->iov[index];
+
+        s = iov->iov_base;
+
+        to_copy = iov->iov_len - s_off;
+        if (to_copy > all_copy) {
+            to_copy = all_copy;
+        }
+        memcpy(&buffer[d_off], &s[s_off], to_copy);
+
+        d_off += to_copy;
+        all_copy -= to_copy;
+
+        s_off = 0;
+        index++;
+    }
+
+    return count;
+}
+
+/**
+ * Grow the QEMUSizedBuffer to the given size and allocate
+ * memory for it.
+ *
+ * @qsb: A QEMUSizedBuffer
+ * @new_size: The new size of the buffer
+ *
+ * Returns an error code in case of memory allocation failure
+ * or the new size of the buffer otherwise. The returned size
+ * may be greater than or equal to @new_size.
+ */
+static ssize_t qsb_grow(QEMUSizedBuffer *qsb, size_t new_size)
+{
+    size_t needed_chunks, i;
+    size_t chunk_size = QSB_CHUNK_SIZE;
+
+    if (qsb->size < new_size) {
+        needed_chunks = DIV_ROUND_UP(new_size - qsb->size,
+                                     chunk_size);
+
+        qsb->iov = g_realloc_n(qsb->iov, qsb->n_iov + needed_chunks,
+                               sizeof(struct iovec));
+        if (qsb->iov == NULL) {
+            return -ENOMEM;
+        }
+
+        for (i = qsb->n_iov; i < qsb->n_iov + needed_chunks; i++) {
+            qsb->iov[i].iov_base = g_malloc0(chunk_size);
+            qsb->iov[i].iov_len = chunk_size;
+        }
+
+        qsb->n_iov += needed_chunks;
+        qsb->size += (needed_chunks * chunk_size);
+    }
+
+    return qsb->size;
+}
+
+/**
+ * Write into the QEMUSizedBuffer at a given position and a given
+ * number of bytes. This function will automatically grow the
+ * QEMUSizedBuffer.
+ *
+ * @qsb: A QEMUSizedBuffer
+ * @source: A byte array to copy data from
+ * @pos: The position within the @qsb to write data to
+ * @count: The number of bytes to copy into the @qsb
+ *
+ * Returns an error code in case of memory allocation failure,
+ * @count otherwise.
+ */
+ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t *source,
+                     off_t pos, size_t count)
+{
+    ssize_t rc = qsb_grow(qsb, pos + count);
+    size_t to_copy;
+    size_t all_copy = count;
+    const struct iovec *iov;
+    ssize_t index;
+    char *dest;
+    off_t d_off, s_off = 0;
+
+    if (rc < 0) {
+        return rc;
+    }
+
+    if (pos + count > qsb->used) {
+        qsb->used = pos + count;
+    }
+
+    index = qsb_get_iovec(qsb, pos, &d_off);
+    if (index < 0) {
+        return 0;
+    }
+
+    while (all_copy > 0) {
+        iov = &qsb->iov[index];
+
+        dest = iov->iov_base;
+
+        to_copy = iov->iov_len - d_off;
+        if (to_copy > all_copy) {
+            to_copy = all_copy;
+        }
+
+        memcpy(&dest[d_off], &source[s_off], to_copy);
+
+        s_off += to_copy;
+        all_copy -= to_copy;
+
+        d_off = 0;
+        index++;
+    }
+
+    return count;
+}
+
+/**
+ * Create an exact copy of the given QEMUSizedBuffer.
+ *
+ * @qsb : A QEMUSizedBuffer
+ *
+ * Returns a clone of @qsb
+ */
+QEMUSizedBuffer *qsb_clone(const QEMUSizedBuffer *qsb)
+{
+    QEMUSizedBuffer *out = qsb_create(NULL, qsb_get_length(qsb));
+    size_t i;
+    off_t pos = 0;
+
+    for (i = 0; i < qsb->n_iov; i++) {
+        pos += qsb_write_at(out, qsb->iov[i].iov_base,
+                            pos, qsb->iov[i].iov_len);
+    }
+
+    return out;
+}
+
+typedef struct QEMUBuffer {
+    QEMUSizedBuffer *qsb;
+    QEMUFile *file;
+} QEMUBuffer;
+
+static int buf_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
+{
+    QEMUBuffer *s = opaque;
+    ssize_t len = qsb_get_length(s->qsb) - pos;
+
+    if (len <= 0) {
+        return 0;
+    }
+
+    if (len > size) {
+        len = size;
+    }
+    return qsb_get_buffer(s->qsb, pos, len, &buf);
+}
+
+static int buf_put_buffer(void *opaque, const uint8_t *buf,
+                          int64_t pos, int size)
+{
+    QEMUBuffer *s = opaque;
+
+    return qsb_write_at(s->qsb, buf, pos, size);
+}
+
+static int buf_close(void *opaque)
+{
+    QEMUBuffer *s = opaque;
+
+    qsb_free(s->qsb);
+
+    g_free(s);
+
+    return 0;
+}
+
+const QEMUSizedBuffer *qemu_buf_get(QEMUFile *f)
+{
+    QEMUBuffer *p;
+
+    qemu_fflush(f);
+
+    p = (QEMUBuffer *)f->opaque;
+
+    return p->qsb;
+}
+
+static const QEMUFileOps buf_read_ops = {
+    .get_buffer = buf_get_buffer,
+    .close =      buf_close
+};
+
+static const QEMUFileOps buf_write_ops = {
+    .put_buffer = buf_put_buffer,
+    .close =      buf_close
+};
+
+QEMUFile *qemu_bufopen(const char *mode, QEMUSizedBuffer *input)
+{
+    QEMUBuffer *s;
+
+    if (mode == NULL || (mode[0] != 'r' && mode[0] != 'w') || mode[1] != 0) {
+        fprintf(stderr, "qemu_bufopen: Argument validity check failed\n");
+        return NULL;
+    }
+
+    s = g_malloc0(sizeof(QEMUBuffer));
+    if (mode[0] == 'r') {
+        s->qsb = input;
+    }
+
+    if (s->qsb == NULL) {
+        s->qsb = qsb_create(NULL, 0);
+    }
+
+    if (mode[0] == 'r') {
+        s->file = qemu_fopen_ops(s, &buf_read_ops);
+    } else {
+        s->file = qemu_fopen_ops(s, &buf_write_ops);
+    }
+    return s->file;
+}
-- 
1.9.3


* [Qemu-devel] [PATCH 04/46] improve DPRINTF macros, add to savevm
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (2 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 03/46] QEMUSizedBuffer/QEMUFile Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 05/46] Add qemu_get_counted_string to read a string prefixed by a count byte Dr. David Alan Gilbert (git)
                   ` (42 subsequent siblings)
  46 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Improve the existing DPRINTF macros in migration.c and arch_init.c
by:
  1) Making them go to stderr rather than stdout (so you can run with
-nographic and redirect your debug output to a file)
  2) Making them print the time in ms with each message - useful for
debugging latency issues

Add the same macro to savevm.c.
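
As a rough illustration of the two changes above, here is a minimal
standalone sketch of a timestamped debug macro that writes to stderr.
It is not the QEMU code: demo_clock_get_ms() is a hypothetical stand-in
for qemu_clock_get_ms(QEMU_CLOCK_REALTIME), the "demo" prefix and
DEBUG_DEMO switch are made up, and the macro takes the format as part
of __VA_ARGS__ so that it stays within standard C99.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Wall-clock time in milliseconds; a hypothetical stand-in for
 * qemu_clock_get_ms(QEMU_CLOCK_REALTIME) in this sketch. */
static int64_t demo_clock_get_ms(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_REALTIME, &ts);

    assert(rc == 0);
    (void)rc;
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Comment this out to compile the debug output away. */
#define DEBUG_DEMO

#ifdef DEBUG_DEMO
/* Prefix each message with "demo@<ms>", append a newline, use stderr. */
#define DPRINTF(...) \
    do { \
        fprintf(stderr, "demo@%" PRId64 " ", demo_clock_get_ms()); \
        fprintf(stderr, __VA_ARGS__); \
        fputc('\n', stderr); \
    } while (0)
#else
#define DPRINTF(...) \
    do { } while (0)
#endif
```

Because the macro appends the newline itself, callers would pass format
strings without a trailing "\n".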

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c |  5 ++++-
 migration.c | 12 ++++++++++++
 savevm.c    | 10 ++++++++++
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/arch_init.c b/arch_init.c
index 8ddaf35..eb6455a 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -53,9 +53,12 @@
 #include "hw/acpi/acpi.h"
 #include "qemu/host-utils.h"
 
+// #define DEBUG_ARCH_INIT
 #ifdef DEBUG_ARCH_INIT
 #define DPRINTF(fmt, ...) \
-    do { fprintf(stdout, "arch_init: " fmt, ## __VA_ARGS__); } while (0)
+    do { fprintf(stderr,  "arch_init@%" PRId64 " " fmt "\n", \
+                          qemu_clock_get_ms(QEMU_CLOCK_REALTIME), \
+                          ## __VA_ARGS__); } while (0)
 #else
 #define DPRINTF(fmt, ...) \
     do { } while (0)
diff --git a/migration.c b/migration.c
index 8d675b3..e241370 100644
--- a/migration.c
+++ b/migration.c
@@ -26,6 +26,18 @@
 #include "qmp-commands.h"
 #include "trace.h"
 
+//#define DEBUG_MIGRATION
+
+#ifdef DEBUG_MIGRATION
+#define DPRINTF(fmt, ...) \
+    do { fprintf(stderr, "migration@%" PRId64 " " fmt "\n", \
+                          qemu_clock_get_ms(QEMU_CLOCK_REALTIME), \
+                          ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+    do { } while (0)
+#endif
+
 enum {
     MIG_STATE_ERROR = -1,
     MIG_STATE_NONE,
diff --git a/savevm.c b/savevm.c
index e19ae0a..c3a1f68 100644
--- a/savevm.c
+++ b/savevm.c
@@ -43,6 +43,16 @@
 #include "block/snapshot.h"
 #include "block/qapi.h"
 
+#ifdef DEBUG_SAVEVM
+#define DPRINTF(fmt, ...) \
+    do { fprintf(stderr, "savevm@%" PRId64 " " fmt "\n", \
+                          qemu_clock_get_ms(QEMU_CLOCK_REALTIME), \
+                          ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+    do { } while (0)
+#endif
+
 
 #ifndef ETH_P_RARP
 #define ETH_P_RARP 0x8035
-- 
1.9.3

* [Qemu-devel] [PATCH 05/46] Add qemu_get_counted_string to read a string prefixed by a count byte
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (3 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 04/46] improve DPRINTF macros, add to savevm Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 06/46] Create MigrationIncomingState Dr. David Alan Gilbert (git)
                   ` (41 subsequent siblings)
  46 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add qemu_get_counted_string and use it in qemu_loadvm_state.
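
For readers unfamiliar with the format, the following is a small
standalone sketch of reading a count-prefixed string. It works on an
in-memory buffer rather than a QEMUFile; DemoReader, demo_get_buffer
and demo_get_counted_string are hypothetical names used only for this
illustration. A single count byte can describe at most 255 characters,
which is why a 256 byte buffer is enough once the terminator is added.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a QEMUFile read cursor. */
typedef struct {
    const uint8_t *data;
    size_t len;
    size_t pos;
} DemoReader;

/* Copy up to 'want' bytes; returns the number actually copied. */
static size_t demo_get_buffer(DemoReader *r, uint8_t *buf, size_t want)
{
    size_t avail = r->len - r->pos;
    size_t n = want < avail ? want : avail;

    memcpy(buf, r->data + r->pos, n);
    r->pos += n;
    return n;
}

/*
 * Read a string whose length is given by one preceding byte.
 * 'buf' must hold at least 256 bytes (255 characters + terminator).
 * Returns 0 on success, non-zero if the input ended early.
 */
static int demo_get_counted_string(DemoReader *r, char buf[256])
{
    uint8_t len;
    size_t got;

    assert(r != NULL && buf != NULL);
    if (demo_get_buffer(r, &len, 1) != 1) {
        buf[0] = 0;
        return 1;
    }
    got = demo_get_buffer(r, (uint8_t *)buf, len);
    buf[got] = 0;   /* always leave a terminated string behind */
    return got != len;
}
```

A short read is reported to the caller instead of being ignored, which
matches the intent of the error check added to the load loop.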

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/qemu-file.h |  2 ++
 qemu-file.c                   | 15 +++++++++++++++
 savevm.c                      | 18 ++++++++++--------
 3 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index 1ce3702..e6d3a5c 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -322,4 +322,6 @@ static inline void qemu_get_sbe64s(QEMUFile *f, int64_t *pv)
 {
     qemu_get_be64s(f, (uint64_t *)pv);
 }
+
+int qemu_get_counted_string(QEMUFile *f, uint8_t *buf);
 #endif
diff --git a/qemu-file.c b/qemu-file.c
index 69479f1..88cacc7 100644
--- a/qemu-file.c
+++ b/qemu-file.c
@@ -857,6 +857,21 @@ uint64_t qemu_get_be64(QEMUFile *f)
     return v;
 }
 
+/*
+ * Get a string whose length is determined by a single preceding byte
+ * A preallocated 256 byte buffer must be passed in.
+ * Returns: 0 on success and a 0 terminated string in the buffer
+ */
+int qemu_get_counted_string(QEMUFile *f, uint8_t *buf)
+{
+    unsigned int len = qemu_get_byte(f);
+    int res = qemu_get_buffer(f, buf, len);
+
+    buf[len] = 0;
+
+    return res != len;
+}
+
 #define QSB_CHUNK_SIZE      (1 << 10)
 #define QSB_MAX_CHUNK_SIZE  (10 * QSB_CHUNK_SIZE)
 
diff --git a/savevm.c b/savevm.c
index c3a1f68..cb6f0de 100644
--- a/savevm.c
+++ b/savevm.c
@@ -908,7 +908,7 @@ int qemu_loadvm_state(QEMUFile *f)
 
     v = qemu_get_be32(f);
     if (v == QEMU_VM_FILE_VERSION_COMPAT) {
-        fprintf(stderr, "SaveVM v2 format is obsolete and don't work anymore\n");
+        error_report("SaveVM v2 format is obsolete and don't work anymore");
         return -ENOTSUP;
     }
     if (v != QEMU_VM_FILE_VERSION) {
@@ -918,31 +918,33 @@ int qemu_loadvm_state(QEMUFile *f)
     while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
         uint32_t instance_id, version_id, section_id;
         SaveStateEntry *se;
-        char idstr[257];
-        int len;
+        char idstr[256];
 
         switch (section_type) {
         case QEMU_VM_SECTION_START:
         case QEMU_VM_SECTION_FULL:
             /* Read section start */
             section_id = qemu_get_be32(f);
-            len = qemu_get_byte(f);
-            qemu_get_buffer(f, (uint8_t *)idstr, len);
-            idstr[len] = 0;
+            if (qemu_get_counted_string(f, (uint8_t *)idstr)) {
+                error_report("Unable to read ID string for section %u",
+                            section_id);
+                return -EINVAL;
+            }
             instance_id = qemu_get_be32(f);
             version_id = qemu_get_be32(f);
 
             /* Find savevm section */
             se = find_se(idstr, instance_id);
             if (se == NULL) {
-                fprintf(stderr, "Unknown savevm section or instance '%s' %d\n", idstr, instance_id);
+                error_report("Unknown savevm section or instance '%s' %d",
+                             idstr, instance_id);
                 ret = -EINVAL;
                 goto out;
             }
 
             /* Validate version */
             if (version_id > se->version_id) {
-                fprintf(stderr, "savevm: unsupported version %d for '%s' v%d\n",
+                error_report("savevm: unsupported version %d for '%s' v%d",
                         version_id, idstr, se->version_id);
                 ret = -EINVAL;
                 goto out;
-- 
1.9.3

* [Qemu-devel] [PATCH 06/46] Create MigrationIncomingState
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (4 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 05/46] Add qemu_get_counted_string to read a string prefixed by a count byte Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 07/46] Return path: Open a return path on QEMUFile for sockets Dr. David Alan Gilbert (git)
                   ` (40 subsequent siblings)
  46 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Incoming migration state is currently scattered around in lots of
pieces, and postcopy adds more, so it seems better to keep it all
together in one structure.

Allocate the MigrationIncomingState (MIS) in process_incoming_migration_co.
Add a pointer to the MIS to QEMUFile.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  8 ++++++++
 include/migration/qemu-file.h |  2 ++
 include/qemu/typedefs.h       |  2 ++
 migration.c                   | 22 ++++++++++++++++++++++
 savevm.c                      |  2 ++
 5 files changed, 36 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 3cb5ba8..4103460 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -41,6 +41,14 @@ struct MigrationParams {
 
 typedef struct MigrationState MigrationState;
 
+/* State for the incoming migration */
+struct MigrationIncomingState {
+    QEMUFile *file;
+};
+
+MigrationIncomingState *migration_incoming_state_init(QEMUFile *f);
+void migration_incoming_state_destroy(MigrationIncomingState *mis);
+
 struct MigrationState
 {
     int64_t bandwidth_limit;
diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index e6d3a5c..df38646 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -116,6 +116,8 @@ struct QEMUFile {
     unsigned int iovcnt;
 
     int last_error;
+
+    MigrationIncomingState *mis;
 };
 
 struct QEMUSizedBuffer {
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index db1153a..0f79b5c 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -14,6 +14,7 @@ typedef struct Visitor Visitor;
 
 struct Monitor;
 typedef struct Monitor Monitor;
+typedef struct MigrationIncomingState MigrationIncomingState;
 typedef struct MigrationParams MigrationParams;
 
 typedef struct Property Property;
@@ -44,6 +45,7 @@ typedef struct PixelFormat PixelFormat;
 typedef struct QemuConsole QemuConsole;
 typedef struct CharDriverState CharDriverState;
 typedef struct MACAddr MACAddr;
+typedef struct MigrationIncomingState MigrationIncomingState;
 typedef struct NetClientState NetClientState;
 typedef struct I2CBus I2CBus;
 typedef struct ISABus ISABus;
diff --git a/migration.c b/migration.c
index e241370..c5b133d 100644
--- a/migration.c
+++ b/migration.c
@@ -100,15 +100,37 @@ void qemu_start_incoming_migration(const char *uri, Error **errp)
     }
 }
 
+MigrationIncomingState *migration_incoming_state_init(QEMUFile* f)
+{
+    MigrationIncomingState *mis = g_malloc0(sizeof(MigrationIncomingState));
+    mis->file = f;
+
+    return mis;
+}
+
+void migration_incoming_state_destroy(MigrationIncomingState *mis)
+{
+    g_free(mis);
+}
+
 static void process_incoming_migration_co(void *opaque)
 {
     QEMUFile *f = opaque;
     Error *local_err = NULL;
+    MigrationIncomingState *mis;
     int ret;
 
+    mis = migration_incoming_state_init(f);
+    f->mis = mis;
+
     ret = qemu_loadvm_state(f);
+
+    f->mis = NULL;
     qemu_fclose(f);
     free_xbzrle_decoded_buf();
+    migration_incoming_state_destroy(mis);
+    mis = NULL;
+
     if (ret < 0) {
         error_report("load of migration failed: %s", strerror(-ret));
         exit(EXIT_FAILURE);
diff --git a/savevm.c b/savevm.c
index cb6f0de..46cb9b0 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1244,9 +1244,11 @@ int load_vmstate(const char *name)
     }
 
     qemu_system_reset(VMRESET_SILENT);
+    f->mis = migration_incoming_state_init(f);
     ret = qemu_loadvm_state(f);
 
+    migration_incoming_state_destroy(f->mis);
     qemu_fclose(f);
     if (ret < 0) {
         error_report("Error %d while loading VM state", ret);
         return ret;
-- 
1.9.3

* [Qemu-devel] [PATCH 07/46] Return path: Open a return path on QEMUFile for sockets
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (5 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 06/46] Create MigrationIncomingState Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-05 10:06   ` Paolo Bonzini
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 08/46] Return path: socket_writev_buffer: Block even on non-blocking fd's Dr. David Alan Gilbert (git)
                   ` (39 subsequent siblings)
  46 siblings, 1 reply; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Postcopy needs a method to send messages from the destination back to
the source; this is the 'return path'.

Wire it up for 'socket' QEMUFiles using a dup'd fd.
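
The dup'd-fd idea can be sketched outside QEMU roughly as follows.
This is an illustration under stated assumptions, not the patch's
code: demo_open_return_fd is a hypothetical helper, and plain fcntl()
is used where QEMU would call qemu_set_nonblock().

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Duplicate a connected socket fd for use in the reverse direction
 * and mark the duplicate non-blocking.
 * Returns the new fd, or -1 on error.
 */
static int demo_open_return_fd(int fd)
{
    int revfd, flags;

    assert(fd >= 0);
    revfd = dup(fd);
    if (revfd == -1) {
        return -1;
    }
    flags = fcntl(revfd, F_GETFL);
    if (flags == -1 || fcntl(revfd, F_SETFL, flags | O_NONBLOCK) == -1) {
        close(revfd);
        return -1;
    }
    return revfd;
}
```

One point worth keeping in mind: under POSIX, file status flags such
as O_NONBLOCK belong to the open file description, which dup() shares
between the two descriptors. Setting the flag on the duplicate
therefore makes the original fd non-blocking as well, so the write
side may need to cope with EAGAIN.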

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/qemu-file.h |  8 +++++
 qemu-file.c                   | 74 +++++++++++++++++++++++++++++++++++++++----
 2 files changed, 76 insertions(+), 6 deletions(-)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index df38646..ec1a342 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -87,6 +87,11 @@ typedef size_t (QEMURamSaveFunc)(QEMUFile *f, void *opaque,
 #define QEMUFILE_IO_BUF_SIZE 32768
 #define QEMUFILE_MAX_IOV_SIZE MIN(IOV_MAX, 64)
 
+/*
+ * Return a QEMUFile for comms in the opposite direction
+ */
+typedef QEMUFile *(QEMURetPathFunc)(void *opaque);
+
 typedef struct QEMUFileOps {
     QEMUFilePutBufferFunc *put_buffer;
     QEMUFileGetBufferFunc *get_buffer;
@@ -97,6 +102,7 @@ typedef struct QEMUFileOps {
     QEMURamHookFunc *after_ram_iterate;
     QEMURamHookFunc *hook_ram_load;
     QEMURamSaveFunc *save_page;
+    QEMURetPathFunc *get_return_path;
 } QEMUFileOps;
 
 struct QEMUFile {
@@ -117,6 +123,7 @@ struct QEMUFile {
 
     int last_error;
 
+    struct QEMUFile *return_path;
     MigrationIncomingState *mis;
 };
 
@@ -202,6 +209,7 @@ void qemu_file_set_rate_limit(QEMUFile *f, int64_t new_rate);
 int64_t qemu_file_get_rate_limit(QEMUFile *f);
 int qemu_file_get_error(QEMUFile *f);
 void qemu_file_set_error(QEMUFile *f, int ret);
+QEMUFile *qemu_file_get_return_path(QEMUFile *f);
 void qemu_fflush(QEMUFile *f);
 
 static inline void qemu_put_be64s(QEMUFile *f, const uint64_t *pv)
diff --git a/qemu-file.c b/qemu-file.c
index 88cacc7..98a6d2a 100644
--- a/qemu-file.c
+++ b/qemu-file.c
@@ -16,6 +16,54 @@ typedef struct QEMUFileSocket {
     QEMUFile *file;
 } QEMUFileSocket;
 
+/* Give a QEMUFile* off the same socket but data in the opposite
+ * direction.
+ * qemu_fopen_socket marks write fd's as blocking, but doesn't
+ * touch read fd's status, so we dup the fd just to keep settings
+ * separate. [TBD: Do I need to explicitly mark as non-block on read?]
+ */
+static QEMUFile *socket_dup_return_path(void *opaque)
+{
+    QEMUFileSocket *qfs = opaque;
+    int revfd;
+    bool this_is_read;
+    QEMUFile *result;
+
+    /* If it's already open, return it */
+    if (qfs->file->return_path) {
+        return qfs->file->return_path;
+    }
+
+    if (qemu_file_get_error(qfs->file)) {
+        /* If the forward file is in error, don't try and open a return */
+        return NULL;
+    }
+
+    /* I don't think there's a better way to tell which direction 'this' is */
+    this_is_read = qfs->file->ops->get_buffer != NULL;
+
+    revfd = dup(qfs->fd);
+    if (revfd == -1) {
+        error_report("Error duplicating fd for return path: %s",
+                      strerror(errno));
+        return NULL;
+    }
+
+    qemu_set_nonblock(revfd);
+    result = qemu_fopen_socket(revfd, this_is_read ? "wb" : "rb");
+    qfs->file->return_path = result;
+
+    if (result) {
+        /* We are the reverse path of our reverse path (although I don't
+           expect this to be used, it would stop another dup if it was) */
+        result->return_path = qfs->file;
+    } else {
+        close(revfd);
+    }
+
+    return result;
+}
+
 static ssize_t socket_writev_buffer(void *opaque, struct iovec *iov, int iovcnt,
                                     int64_t pos)
 {
@@ -313,17 +361,31 @@ QEMUFile *qemu_fdopen(int fd, const char *mode)
 }
 
 static const QEMUFileOps socket_read_ops = {
-    .get_fd =     socket_get_fd,
-    .get_buffer = socket_get_buffer,
-    .close =      socket_close
+    .get_fd          = socket_get_fd,
+    .get_buffer      = socket_get_buffer,
+    .close           = socket_close,
+    .get_return_path = socket_dup_return_path
 };
 
 static const QEMUFileOps socket_write_ops = {
-    .get_fd =     socket_get_fd,
-    .writev_buffer = socket_writev_buffer,
-    .close =      socket_close
+    .get_fd          = socket_get_fd,
+    .writev_buffer   = socket_writev_buffer,
+    .close           = socket_close,
+    .get_return_path = socket_dup_return_path
 };
 
+/*
+ * Result: QEMUFile* for a 'return path' for comms in the opposite direction
+ *         NULL if not available
+ */
+QEMUFile *qemu_file_get_return_path(QEMUFile *f)
+{
+    if (!f->ops->get_return_path) {
+        return NULL;
+    }
+    return f->ops->get_return_path(f->opaque);
+}
+
 bool qemu_file_mode_is_not_valid(const char *mode)
 {
     if (mode == NULL ||
-- 
1.9.3

* [Qemu-devel] [PATCH 08/46] Return path: socket_writev_buffer: Block even on non-blocking fd's
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (6 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 07/46] Return path: Open a return path on QEMUFile for sockets Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-05 10:07   ` Paolo Bonzini
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 09/46] Migration commands Dr. David Alan Gilbert (git)
                   ` (38 subsequent siblings)
  46 siblings, 1 reply; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The return path uses a non-blocking fd so as not to block waiting
for the (possibly broken) destination to finish returning a message;
however, we still want outbound data to behave as before and block
until it has all been sent.
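
The 'block even on a non-blocking fd' behaviour can be sketched in
isolation like this. It is a simplified illustration, not the patch:
it uses write() and poll() where the patch uses iov_send() and
g_poll(), and demo_set_nonblock and demo_write_all are hypothetical
names.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Mark 'fd' non-blocking. Returns 0 on success, -1 on error. */
static int demo_set_nonblock(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1) {
        return -1;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/*
 * Write all 'len' bytes to 'fd', waiting in poll() whenever a
 * non-blocking fd reports EAGAIN.
 * Returns 'len' on success or -errno on a real error.
 */
static ssize_t demo_write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t offset = 0;

    assert(fd >= 0 && (buf != NULL || len == 0));
    while (offset < len) {
        struct pollfd pfd = { .fd = fd, .events = POLLOUT, .revents = 0 };
        ssize_t n = write(fd, p + offset, len - offset);

        if (n >= 0) {
            offset += (size_t)n;
            continue;
        }
        if (errno == EINTR) {
            continue;
        }
        if (errno != EAGAIN && errno != EWOULDBLOCK) {
            return -errno;   /* flag the error immediately */
        }
        /* Emulate blocking: sleep until the fd is writable again. */
        if (poll(&pfd, 1, -1) < 0 && errno != EINTR) {
            return -errno;
        }
    }
    return (ssize_t)offset;
}
```

As in the patch, a hard error is reported straight away even if some
bytes were already sent, rather than returning a short count first.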

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 qemu-file.c | 39 +++++++++++++++++++++++++++++++++++----
 1 file changed, 35 insertions(+), 4 deletions(-)

diff --git a/qemu-file.c b/qemu-file.c
index 98a6d2a..9809428 100644
--- a/qemu-file.c
+++ b/qemu-file.c
@@ -70,12 +70,43 @@ static ssize_t socket_writev_buffer(void *opaque, struct iovec *iov, int iovcnt,
     QEMUFileSocket *s = opaque;
     ssize_t len;
     ssize_t size = iov_size(iov, iovcnt);
+    ssize_t offset = 0;
+    int     err;
 
-    len = iov_send(s->fd, iov, iovcnt, 0, size);
-    if (len < size) {
-        len = -socket_error();
+    while (size > 0) {
+        len = iov_send(s->fd, iov, iovcnt, offset, size);
+
+        if (len > 0) {
+            size -= len;
+            offset += len;
+        }
+
+        if (size > 0) {
+            err = socket_error();
+
+            if (err != EAGAIN) {
+                error_report("socket_writev_buffer: Got err=%d for (%zd/%zd)",
+                             err, size, len);
+                /*
+                 * If I've already sent some but only just got the error, I
+                 * could return the amount validly sent so far and wait for the
+                 * next call to report the error, but I'd rather flag the error
+                 * immediately.
+                 */
+                return -err;
+            }
+
+            /* Emulate blocking */
+            GPollFD pfd;
+
+            pfd.fd = s->fd;
+            pfd.events = G_IO_OUT | G_IO_ERR;
+            pfd.revents = 0;
+            g_poll(&pfd, 1 /* 1 fd */, -1 /* no timeout */);
+        }
     }
-    return len;
+
+    return offset;
 }
 
 static int socket_get_fd(void *opaque)
-- 
1.9.3

* [Qemu-devel] [PATCH 09/46] Migration commands
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (7 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 08/46] Return path: socket_writev_buffer: Block even on non-blocking fd's Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 10/46] Return path: Control commands Dr. David Alan Gilbert (git)
                   ` (37 subsequent siblings)
  46 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Create QEMU_VM_COMMAND section type for sending commands from
source to destination.  These commands are not intended to convey
guest state but to control the migration process.

For use in postcopy.
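
On the wire, a command section as sent by qemu_savevm_command_send is
one section-type byte (0x06), a big-endian 16 bit command, a
big-endian 16 bit length, and then the payload. A standalone sketch of
that layout, writing into a plain byte buffer instead of a QEMUFile
(demo_encode_command and DEMO_VM_COMMAND are hypothetical names for
this illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_VM_COMMAND 0x06  /* same value as QEMU_VM_COMMAND */

/*
 * Serialise one command section into 'out':
 *   1 byte section type, be16 command, be16 payload length, payload.
 * 'out' must have room for 5 + len bytes.
 * Returns the number of bytes written.
 */
static size_t demo_encode_command(uint8_t *out, uint16_t cmd,
                                  uint16_t len, const uint8_t *data)
{
    assert(out != NULL && (data != NULL || len == 0));
    out[0] = DEMO_VM_COMMAND;
    out[1] = (uint8_t)(cmd >> 8);
    out[2] = (uint8_t)(cmd & 0xff);
    out[3] = (uint8_t)(len >> 8);
    out[4] = (uint8_t)(len & 0xff);
    if (len) {
        memcpy(out + 5, data, len);
    }
    return (size_t)5 + len;
}
```

The receiver has already consumed the section-type byte in the main
load loop, so the command handler only reads the two 16 bit fields
before looking at the payload.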

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  1 +
 include/sysemu/sysemu.h       |  7 +++++
 savevm.c                      | 61 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 69 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 4103460..728b93e 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -33,6 +33,7 @@
 #define QEMU_VM_SECTION_END          0x03
 #define QEMU_VM_SECTION_FULL         0x04
 #define QEMU_VM_SUBSECTION           0x05
+#define QEMU_VM_COMMAND              0x06
 
 struct MigrationParams {
     bool blk;
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index d8539fd..66821eb 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -81,6 +81,13 @@ void do_info_snapshots(Monitor *mon, const QDict *qdict);
 
 void qemu_announce_self(void);
 
+/* Subcommands for QEMU_VM_COMMAND */
+enum qemu_vm_cmd {
+    QEMU_VM_CMD_INVALID = 0,   /* Must be 0 */
+
+    QEMU_VM_CMD_AFTERLASTVALID
+};
+
 bool qemu_savevm_state_blocked(Error **errp);
 void qemu_savevm_state_begin(QEMUFile *f,
                              const MigrationParams *params);
diff --git a/savevm.c b/savevm.c
index 46cb9b0..0c7d537 100644
--- a/savevm.c
+++ b/savevm.c
@@ -592,6 +592,25 @@ static void vmstate_save(QEMUFile *f, SaveStateEntry *se)
     vmstate_save_state(f, se->vmsd, se->opaque);
 }
 
+
+/* Send a 'QEMU_VM_COMMAND' type element with the command
+ * and associated data.
+ */
+void qemu_savevm_command_send(QEMUFile *f,
+                              enum qemu_vm_cmd command,
+                              uint16_t len,
+                              uint8_t *data)
+{
+    uint32_t tmp = (uint16_t)command;
+    qemu_put_byte(f, QEMU_VM_COMMAND);
+    qemu_put_be16(f, tmp);
+    qemu_put_be16(f, len);
+    if (len) {
+        qemu_put_buffer(f, data, len);
+    }
+    qemu_fflush(f);
+}
+
 bool qemu_savevm_state_blocked(Error **errp)
 {
     SaveStateEntry *se;
@@ -881,6 +900,42 @@ static SaveStateEntry *find_se(const char *idstr, int instance_id)
     return NULL;
 }
 
+static int loadvm_process_command_simple_lencheck(const char *name,
+                                                  unsigned int actual,
+                                                  unsigned int expected)
+{
+    if (actual != expected) {
+        error_report("%s received with bad length - expecting %d, got %d",
+                     name, expected, actual);
+        return -1;
+    }
+
+    return 0;
+}
+
+/* Process an incoming 'QEMU_VM_COMMAND'
+ * -ve return on error (will issue error message)
+ */
+static int loadvm_process_command(QEMUFile *f)
+{
+    uint16_t com;
+    uint16_t len;
+    uint32_t tmp32;
+
+    com = qemu_get_be16(f);
+    len = qemu_get_be16(f);
+
+    /* fprintf(stderr,"loadvm_process_command: com=0x%x len=%d\n", com,len); */
+    switch (com) {
+
+    default:
+        error_report("VM_COMMAND 0x%x unknown (len 0x%x)", com, len);
+        return -1;
+    }
+
+    return 0;
+}
+
 typedef struct LoadStateEntry {
     QLIST_ENTRY(LoadStateEntry) entry;
     SaveStateEntry *se;
@@ -987,6 +1042,12 @@ int qemu_loadvm_state(QEMUFile *f)
                 goto out;
             }
             break;
+        case QEMU_VM_COMMAND:
+            ret = loadvm_process_command(f);
+            if (ret < 0) {
+                goto out;
+            }
+            break;
         default:
             fprintf(stderr, "Unknown savevm section type %d\n", section_type);
             ret = -EINVAL;
-- 
1.9.3

* [Qemu-devel] [PATCH 10/46] Return path: Control commands
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (8 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 09/46] Migration commands Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 11/46] Return path: Send responses from destination to source Dr. David Alan Gilbert (git)
                   ` (36 subsequent siblings)
  46 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add two src->dest commands:
   * OPENRP - Request that the destination open the return path
   * REQACK - Request an acknowledgement from the destination
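
The handling of the two commands on the destination can be sketched
roughly as below. This is a simplified model, not the patch: the
DemoIncoming structure and demo_process_command are hypothetical, the
payload is passed in as a buffer instead of being read from a
QEMUFile, and 'sending' the ack is reduced to recording the value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum demo_vm_cmd {
    DEMO_CMD_INVALID = 0,
    DEMO_CMD_OPENRP,   /* open the return path; no payload */
    DEMO_CMD_REQACK    /* request an ack; payload is a be32 value */
};

/* Hypothetical destination-side state for this sketch. */
typedef struct {
    bool rp_open;
    uint32_t last_ack;    /* value that would be echoed to the source */
    unsigned acks_sent;
} DemoIncoming;

/* Returns 0 on success, -1 on a malformed or out-of-order command. */
static int demo_process_command(DemoIncoming *mis, uint16_t cmd,
                                const uint8_t *payload, uint16_t len)
{
    assert(mis != NULL);
    switch (cmd) {
    case DEMO_CMD_OPENRP:
        if (len != 0) {
            return -1;
        }
        mis->rp_open = true;   /* opening twice is tolerated */
        return 0;

    case DEMO_CMD_REQACK:
        if (len != 4 || payload == NULL || !mis->rp_open) {
            return -1;         /* no return path to answer on */
        }
        mis->last_ack = ((uint32_t)payload[0] << 24) |
                        ((uint32_t)payload[1] << 16) |
                        ((uint32_t)payload[2] << 8) |
                        (uint32_t)payload[3];
        mis->acks_sent++;
        return 0;

    default:
        return -1;
    }
}
```

The ordering constraint is the interesting part: a REQACK that arrives
before OPENRP is treated as an error because there is nowhere to send
the reply.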

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  2 ++
 include/sysemu/sysemu.h       |  2 ++
 savevm.c                      | 43 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 47 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 728b93e..0e21c5d 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -45,6 +45,8 @@ typedef struct MigrationState MigrationState;
 /* State for the incoming migration */
 struct MigrationIncomingState {
     QEMUFile *file;
+
+    QEMUFile *return_path;
 };
 
 MigrationIncomingState *migration_incoming_state_init(QEMUFile *f);
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 66821eb..b25e938 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -84,6 +84,8 @@ void qemu_announce_self(void);
 /* Subcommands for QEMU_VM_COMMAND */
 enum qemu_vm_cmd {
     QEMU_VM_CMD_INVALID = 0,   /* Must be 0 */
+    QEMU_VM_CMD_OPENRP,        /* Tell the dest to open the Return path */
+    QEMU_VM_CMD_REQACK,        /* Request an ACK on the RP */
 
     QEMU_VM_CMD_AFTERLASTVALID
 };
diff --git a/savevm.c b/savevm.c
index 0c7d537..16b672b 100644
--- a/savevm.c
+++ b/savevm.c
@@ -611,6 +611,19 @@ void qemu_savevm_command_send(QEMUFile *f,
     qemu_fflush(f);
 }
 
+void qemu_savevm_send_reqack(QEMUFile *f, uint32_t value)
+{
+    uint32_t buf;
+
+    DPRINTF("send_reqack %d", value);
+    buf = cpu_to_be32(value);
+    qemu_savevm_command_send(f, QEMU_VM_CMD_REQACK, 4, (uint8_t *)&buf);
+}
+
+void qemu_savevm_send_openrp(QEMUFile *f)
+{
+    qemu_savevm_command_send(f, QEMU_VM_CMD_OPENRP, 0, NULL);
+}
 bool qemu_savevm_state_blocked(Error **errp)
 {
     SaveStateEntry *se;
@@ -918,6 +931,7 @@ static int loadvm_process_command_simple_lencheck(const char *name,
  */
 static int loadvm_process_command(QEMUFile *f)
 {
+    MigrationIncomingState *mis = f->mis;
     uint16_t com;
     uint16_t len;
     uint32_t tmp32;
@@ -927,6 +941,35 @@ static int loadvm_process_command(QEMUFile *f)
 
     /* fprintf(stderr,"loadvm_process_command: com=0x%x len=%d\n", com,len); */
     switch (com) {
+    case QEMU_VM_CMD_OPENRP:
+        if (loadvm_process_command_simple_lencheck("CMD_OPENRP", len, 0)) {
+            return -1;
+        }
+        if (mis->return_path) {
+            error_report("CMD_OPENRP called when RP already open");
+            /* Not really a problem, so don't give up */
+            return 0;
+        }
+        mis->return_path = qemu_file_get_return_path(f);
+        if (!mis->return_path) {
+            error_report("CMD_OPENRP failed - could not open return path");
+            return -1;
+        }
+        break;
+
+    case QEMU_VM_CMD_REQACK:
+        if (loadvm_process_command_simple_lencheck("CMD_REQACK", len, 4)) {
+            return -1;
+        }
+        tmp32 = qemu_get_be32(f);
+        DPRINTF("Received REQACK 0x%x", tmp32);
+        if (!mis->return_path) {
+            error_report("CMD_REQACK (0x%x) received with no open return path",
+                         tmp32);
+            return -1;
+        }
+        migrate_send_rp_ack(mis, tmp32);
+        break;
 
     default:
         error_report("VM_COMMAND 0x%x unknown (len 0x%x)", com, len);
-- 
1.9.3

* [Qemu-devel] [PATCH 11/46] Return path: Send responses from destination to source
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (9 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 10/46] Return path: Control commands Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 12/46] Return path: Source handling of return path Dr. David Alan Gilbert (git)
                   ` (35 subsequent siblings)
  46 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add migrate_send_rp_message to send a message from destination to
source along the return path. (It takes a mutex so that it can be
called from multiple threads.)
Add migrate_send_rp_ack to send an 'ack' message, and use it in the
CMD_REQACK handler.
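
A return-path message is a big-endian 16 bit command, a big-endian
16 bit length and the payload, written while holding a mutex so that
messages from different threads cannot interleave. A minimal sketch of
that idea, assuming a pthread mutex in place of QemuMutex and a byte
buffer in place of the return-path QEMUFile (all demo_ and Demo names
are hypothetical):

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum demo_rp_cmd {
    DEMO_RP_INVALID = 0,
    DEMO_RP_ACK          /* payload: be32 value */
};

/* Hypothetical return channel: a byte buffer guarded by a mutex. */
typedef struct {
    pthread_mutex_t lock;
    uint8_t buf[256];
    size_t used;
} DemoReturnPath;

/*
 * Append one message (be16 cmd, be16 len, payload) while holding the
 * lock. Returns 0 on success, -1 if the buffer has no room left.
 */
static int demo_send_rp_message(DemoReturnPath *rp, uint16_t cmd,
                                uint16_t len, const uint8_t *data)
{
    int ret = -1;

    assert(rp != NULL && (data != NULL || len == 0));
    pthread_mutex_lock(&rp->lock);
    if (sizeof(rp->buf) - rp->used >= (size_t)4 + len) {
        uint8_t *p = rp->buf + rp->used;

        p[0] = (uint8_t)(cmd >> 8);
        p[1] = (uint8_t)(cmd & 0xff);
        p[2] = (uint8_t)(len >> 8);
        p[3] = (uint8_t)(len & 0xff);
        if (len) {
            memcpy(p + 4, data, len);
        }
        rp->used += (size_t)4 + len;
        ret = 0;
    }
    pthread_mutex_unlock(&rp->lock);
    return ret;
}

/* Send an 'ack' carrying 'value' as a big-endian 32 bit payload. */
static int demo_send_rp_ack(DemoReturnPath *rp, uint32_t value)
{
    const uint8_t be[4] = {
        (uint8_t)(value >> 24), (uint8_t)(value >> 16),
        (uint8_t)(value >> 8), (uint8_t)value
    };

    return demo_send_rp_message(rp, DEMO_RP_ACK, 4, be);
}
```

Holding the lock across the header and the payload is what keeps each
message contiguous on the channel.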

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h | 15 +++++++++++++++
 migration.c                   | 27 +++++++++++++++++++++++++++
 2 files changed, 42 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 0e21c5d..375efec 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -40,6 +40,12 @@ struct MigrationParams {
     bool shared;
 };
 
+/* Commands sent on the return path from destination to source*/
+enum mig_rpcomm_cmd {
+    MIG_RPCOMM_INVALID = 0,  /* Must be 0 */
+    MIG_RPCOMM_ACK,          /* data (seq: be32 ) */
+    MIG_RPCOMM_AFTERLASTVALID
+};
 typedef struct MigrationState MigrationState;
 
 /* State for the incoming migration */
@@ -47,6 +53,7 @@ struct MigrationIncomingState {
     QEMUFile *file;
 
     QEMUFile *return_path;
+    QemuMutex      rp_mutex;    /* We send replies from multiple threads */
 };
 
 MigrationIncomingState *migration_incoming_state_init(QEMUFile *f);
@@ -167,6 +174,14 @@ int64_t migrate_xbzrle_cache_size(void);
 
 int64_t xbzrle_cache_resize(int64_t new_size);
 
+/* Sending on the return path - generic and then for each message type */
+void migrate_send_rp_message(MigrationIncomingState *mis,
+                             enum mig_rpcomm_cmd cmd,
+                             uint16_t len, uint8_t *data);
+void migrate_send_rp_ack(MigrationIncomingState *mis,
+                         uint32_t value);
+
+
 void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_load_hook(QEMUFile *f, uint64_t flags);
diff --git a/migration.c b/migration.c
index c5b133d..74bffbc 100644
--- a/migration.c
+++ b/migration.c
@@ -77,6 +77,32 @@ MigrationState *migrate_get_current(void)
     return &current_migration;
 }
 
+/* Send a message on the return channel back to the source
+ * of the migration.
+ */
+void migrate_send_rp_message(MigrationIncomingState *mis,
+                             enum mig_rpcomm_cmd cmd,
+                             uint16_t len, uint8_t *data)
+{
+    DPRINTF("migrate_send_rp_message: cmd=%d, len=%d", (int)cmd, len);
+    qemu_mutex_lock(&mis->rp_mutex);
+    qemu_put_be16(mis->return_path, (unsigned int)cmd);
+    qemu_put_be16(mis->return_path, len);
+    qemu_put_buffer(mis->return_path, data, len);
+    qemu_fflush(mis->return_path);
+    qemu_mutex_unlock(&mis->rp_mutex);
+}
+
+/* Send an 'ACK' message on the return channel with the given value */
+void migrate_send_rp_ack(MigrationIncomingState *mis,
+                         uint32_t value)
+{
+    uint32_t buf;
+
+    buf = cpu_to_be32(value);
+    migrate_send_rp_message(mis, MIG_RPCOMM_ACK, 4, (uint8_t *)&buf);
+}
+
 void qemu_start_incoming_migration(const char *uri, Error **errp)
 {
     const char *p;
@@ -104,6 +130,7 @@ MigrationIncomingState *migration_incoming_state_init(QEMUFile* f)
 {
     MigrationIncomingState *mis = g_malloc0(sizeof(MigrationIncomingState));
     mis->file = f;
+    qemu_mutex_init(&mis->rp_mutex);
 
     return mis;
 }
-- 
1.9.3
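
For readers following the wire format, here is a minimal standalone sketch of the framing that migrate_send_rp_message() produces: a be16 command, a be16 length, then the payload. It works on plain byte buffers rather than a QEMUFile, and the names rp_frame_message, rp_frame_ack, put_be16, put_be32 and RP_CMD_ACK are hypothetical illustrations, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for MIG_RPCOMM_ACK (first value after INVALID = 0) */
enum { RP_CMD_ACK = 1 };

/* Store a 16-bit value in big-endian order */
static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

/* Store a 32-bit value in big-endian order */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)(v & 0xff);
}

/*
 * Frame one return-path message into 'out':
 *   be16 command, be16 payload length, then 'len' payload bytes.
 * Returns the number of bytes written, or 0 if 'out' is too small.
 */
static size_t rp_frame_message(uint8_t *out, size_t out_size,
                               uint16_t cmd, uint16_t len,
                               const uint8_t *data)
{
    size_t total = 4 + (size_t)len;

    if (out_size < total) {
        return 0;
    }
    put_be16(out, cmd);
    put_be16(out + 2, len);
    if (len) {
        assert(data != NULL);
        memcpy(out + 4, data, len);
    }
    return total;
}

/* Frame an ACK carrying a 32-bit big-endian sequence value */
static size_t rp_frame_ack(uint8_t *out, size_t out_size, uint32_t value)
{
    uint8_t payload[4];

    put_be32(payload, value);
    return rp_frame_message(out, out_size, RP_CMD_ACK,
                            sizeof(payload), payload);
}
```

An ACK therefore occupies 8 bytes on the wire: 4 bytes of header followed by the 4-byte value.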


* [Qemu-devel] [PATCH 12/46] Return path: Source handling of return path
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (10 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 11/46] Return path: Send responses from destination to source Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 13/46] qemu_loadvm debug Dr. David Alan Gilbert (git)
                   ` (34 subsequent siblings)
  46 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Open a return path, and handle messages that are received upon it.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  10 +++
 migration.c                   | 142 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 151 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 375efec..f722f06 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -46,6 +46,14 @@ enum mig_rpcomm_cmd {
     MIG_RPCOMM_ACK,          /* data (seq: be32 ) */
     MIG_RPCOMM_AFTERLASTVALID
 };
+
+struct MigrationRetPathState {
+    uint16_t header_com;   /* Headers of last (partially?) received command */
+    uint16_t header_len;
+    uint32_t latest_ack;
+    bool     error; /* True if something bad happened on the RP */
+};
+
 typedef struct MigrationState MigrationState;
 
 /* State for the incoming migration */
@@ -67,9 +75,11 @@ struct MigrationState
     QemuThread thread;
     QEMUBH *cleanup_bh;
     QEMUFile *file;
+    QEMUFile *return_path;
 
     int state;
     MigrationParams params;
+    struct MigrationRetPathState rp_state;
     double mbps;
     int64_t total_time;
     int64_t downtime;
diff --git a/migration.c b/migration.c
index 74bffbc..e69a49e 100644
--- a/migration.c
+++ b/migration.c
@@ -351,6 +351,17 @@ static void migrate_set_state(MigrationState *s, int old_state, int new_state)
     }
 }
 
+static void migrate_fd_cleanup_src_rp(MigrationState *ms)
+{
+    if (ms->return_path) {
+        DPRINTF("cleaning up return path\n");
+        int return_fd = qemu_get_fd(ms->return_path);
+        qemu_set_fd_handler2(return_fd, NULL, NULL, NULL, ms);
+        qemu_fclose(ms->return_path);
+        ms->return_path = NULL;
+    }
+}
+
 static void migrate_fd_cleanup(void *opaque)
 {
     MigrationState *s = opaque;
@@ -358,6 +369,8 @@ static void migrate_fd_cleanup(void *opaque)
     qemu_bh_delete(s->cleanup_bh);
     s->cleanup_bh = NULL;
 
+    migrate_fd_cleanup_src_rp(s);
+
     if (s->file) {
         trace_migrate_fd_cleanup();
         qemu_mutex_unlock_iothread();
@@ -635,8 +648,135 @@ int64_t migrate_xbzrle_cache_size(void)
     return s->xbzrle_cache_size;
 }
 
-/* migration thread support */
+/*
+ * Something bad happened to the RP stream, mark an error
+ * The caller shall print something to indicate why
+ */
+static void source_return_path_bad(MigrationState *s)
+{
+    s->rp_state.error = true;
+    migrate_fd_cleanup_src_rp(s);
+}
+
+/*
+ * 'can read handler' for the fd callback.
+ * It stops the data handler from being called once the return path
+ * has gone into error.
+ * Note: Currently we don't provide a recovery mechanism;
+ * if we add one, we'll have to call qemu_notify_event.
+ */
+static int source_return_path_can_read_handler(void *opaque)
+{
+    MigrationState *s = opaque;
+    return !s->rp_state.error;
+}
+
+/*
+ * Handles messages sent on the return path towards the source VM
+ *
+ * We might be called multiple times for the same command if the data
+ * wasn't fully available when we were first called. When that
+ * happens we stash the command/length as soon as we get it.
+ */
+static void source_return_path_handler(void *opaque)
+{
+    MigrationState *s = opaque;
+    QEMUFile *rp = qemu_file_get_return_path(s->file);
+    const int max_len = 512;
+    uint8_t buf[max_len];
+    uint32_t tmp32;
+    int res;
+
+    DPRINTF("RP: Receive\n");
+    if (!rp || qemu_file_get_error(rp)) {
+        source_return_path_bad(s);
+        return;
+    }
+
+    if (s->rp_state.header_com == MIG_RPCOMM_INVALID) {
+        uint16_t expected_len;
+
+        /* No command stored, so we're expecting a new header */
+        res = qemu_peek_buffer(rp, buf, 4, 0);
+
+        /* If we couldn't get all of our header, give up and
+         * we should be called back again when the rest arrives
+         */
+        if (res != 4) {
+            return;
+        }
+        qemu_file_skip(rp, 4);
 
+        s->rp_state.header_com = be16_to_cpup((uint16_t *)buf);
+        s->rp_state.header_len = be16_to_cpup((uint16_t *)(buf + 2));
+
+        switch (s->rp_state.header_com) {
+        case MIG_RPCOMM_ACK:
+            expected_len = 4;
+            break;
+
+        default:
+            DPRINTF("RP: Received invalid cmd 0x%04x length 0x%04x\n",
+                    s->rp_state.header_com, s->rp_state.header_len);
+            source_return_path_bad(s);
+            return;
+        }
+
+        if (s->rp_state.header_len > expected_len) {
+            DPRINTF("RP: Received command 0x%04x with "
+                    "incorrect length %d expecting %d\n",
+                    s->rp_state.header_com, s->rp_state.header_len,
+                    expected_len);
+            source_return_path_bad(s);
+            return;
+        }
+    }
+
+    /* We know we've got a valid header by this point */
+    res = qemu_peek_buffer(rp, buf, s->rp_state.header_len, 0);
+
+    /* If we haven't got all the data yet, just try again later */
+    if (res != s->rp_state.header_len) {
+        return;
+    }
+    qemu_file_skip(rp, s->rp_state.header_len);
+
+    /* OK, we have the command and the data */
+    switch (s->rp_state.header_com) {
+    case MIG_RPCOMM_ACK:
+        tmp32 = be32_to_cpup((uint32_t *)buf);
+        DPRINTF("RP: Received ACK 0x%x\n", tmp32);
+        atomic_xchg(&s->rp_state.latest_ack, tmp32);
+        break;
+
+    default:
+        /* This shouldn't happen because we should catch this above */
+        DPRINTF("RP: Bad header_com in dispatch\n");
+    }
+    /* Latest command processed, now leave a gap for the next one */
+    s->rp_state.header_com = MIG_RPCOMM_INVALID;
+}
+
+static int open_outgoing_return_path(MigrationState *ms)
+{
+    int return_fd;
+
+    ms->return_path = qemu_file_get_return_path(ms->file);
+    if (!ms->return_path) {
+        return -1;
+    }
+
+    return_fd = qemu_get_fd(ms->return_path);
+    qemu_set_fd_handler2(return_fd, source_return_path_can_read_handler,
+                         source_return_path_handler, NULL, ms);
+
+    return 0;
+}
+
+/*
+ * Master migration thread on the source VM.
+ * It drives the migration and pumps the data down the outgoing channel.
+ */
 static void *migration_thread(void *opaque)
 {
     MigrationState *s = opaque;
-- 
1.9.3
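
The resumable parsing done by source_return_path_handler() can be sketched in isolation as below. This is only an illustration under stated assumptions: struct rp_stream is a hypothetical stand-in for the QEMUFile (a buffer of which only 'avail' bytes have arrived so far), the names rp_parser and rp_parse_step are invented here, and the length check is a strict equality, which is stricter than the '>' test in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { RP_INVALID = 0, RP_ACK = 1 };

/* Hypothetical stream: only the first 'avail' bytes of 'data' have arrived */
struct rp_stream {
    const uint8_t *data;
    size_t avail;
    size_t pos;
};

/* Parser state kept between calls, like MigrationRetPathState */
struct rp_parser {
    uint16_t header_com;   /* RP_INVALID means "expecting a new header" */
    uint16_t header_len;
    uint32_t latest_ack;
    bool error;
};

static uint16_t get_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Try to consume one message. Returns true when a complete message was
 * processed; false when more data is needed or the stream is in error.
 * A header is consumed as soon as it is complete and stashed in 'ps',
 * so a later call only has to wait for the payload.
 */
static bool rp_parse_step(struct rp_parser *ps, struct rp_stream *st)
{
    if (ps->error) {
        return false;
    }
    if (ps->header_com == RP_INVALID) {
        if (st->avail - st->pos < 4) {
            return false;               /* header not complete yet */
        }
        ps->header_com = get_be16(st->data + st->pos);
        ps->header_len = get_be16(st->data + st->pos + 2);
        st->pos += 4;
        if (ps->header_com != RP_ACK || ps->header_len != 4) {
            ps->error = true;           /* unknown command or bad length */
            return false;
        }
    }
    if (st->avail - st->pos < ps->header_len) {
        return false;                   /* payload not complete yet */
    }
    ps->latest_ack = get_be32(st->data + st->pos);
    st->pos += ps->header_len;
    ps->header_com = RP_INVALID;        /* ready for the next header */
    return true;
}
```

Calling rp_parse_step() repeatedly as 'avail' grows models the fd handler being invoked each time more data becomes readable.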


* [Qemu-devel] [PATCH 13/46] qemu_loadvm debug
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (11 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 12/46] Return path: Source handling of return path Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 14/46] ram_debug_dump_bitmap: Dump a migration bitmap as text Dr. David Alan Gilbert (git)
                   ` (33 subsequent siblings)
  46 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add lots of DPRINTF debug in qemu_loadvm*

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 savevm.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/savevm.c b/savevm.c
index 16b672b..662a910 100644
--- a/savevm.c
+++ b/savevm.c
@@ -719,6 +719,8 @@ int qemu_savevm_state_iterate(QEMUFile *f)
         trace_savevm_section_end(se->idstr, se->section_id);
 
         if (ret < 0) {
+            DPRINTF("%s: setting error state after iterate on id=%d/%s",
+                    __func__, se->section_id, se->idstr);
             qemu_file_set_error(f, ret);
         }
         if (ret <= 0) {
@@ -1018,6 +1020,7 @@ int qemu_loadvm_state(QEMUFile *f)
         SaveStateEntry *se;
         char idstr[256];
 
+        DPRINTF("qemu_loadvm_state loop: section_type=%d", section_type);
         switch (section_type) {
         case QEMU_VM_SECTION_START:
         case QEMU_VM_SECTION_FULL:
@@ -1031,6 +1034,9 @@ int qemu_loadvm_state(QEMUFile *f)
             instance_id = qemu_get_be32(f);
             version_id = qemu_get_be32(f);
 
+            DPRINTF("qemu_loadvm_state loop START/FULL: id=%d(%s)",
+                    section_id, idstr);
+
             /* Find savevm section */
             se = find_se(idstr, instance_id);
             if (se == NULL) {
@@ -1058,8 +1064,9 @@ int qemu_loadvm_state(QEMUFile *f)
 
             ret = vmstate_load(f, le->se, le->version_id);
             if (ret < 0) {
-                fprintf(stderr, "qemu: warning: error while loading state for instance 0x%x of device '%s'\n",
-                        instance_id, idstr);
+                error_report("qemu: error while loading state for "
+                             "instance 0x%x of device '%s'",
+                             instance_id, idstr);
                 goto out;
             }
             break;
@@ -1067,23 +1074,25 @@ int qemu_loadvm_state(QEMUFile *f)
         case QEMU_VM_SECTION_END:
             section_id = qemu_get_be32(f);
 
+            DPRINTF("QEMU_VM_SECTION_PART/END entry for id=%d", section_id);
             QLIST_FOREACH(le, &loadvm_handlers, entry) {
                 if (le->section_id == section_id) {
                     break;
                 }
             }
             if (le == NULL) {
-                fprintf(stderr, "Unknown savevm section %d\n", section_id);
+                error_report("Unknown savevm section %d", section_id);
                 ret = -EINVAL;
                 goto out;
             }
 
             ret = vmstate_load(f, le->se, le->version_id);
             if (ret < 0) {
-                fprintf(stderr, "qemu: warning: error while loading state section id %d\n",
-                        section_id);
+                error_report("qemu: error while loading state section"
+                             " id %d (%s)", section_id, le->se->idstr);
                 goto out;
             }
+            DPRINTF("QEMU_VM_SECTION_PART/END done for id=%d", section_id);
             break;
         case QEMU_VM_COMMAND:
             ret = loadvm_process_command(f);
@@ -1092,11 +1101,12 @@ int qemu_loadvm_state(QEMUFile *f)
             }
             break;
         default:
-            fprintf(stderr, "Unknown savevm section type %d\n", section_type);
+            error_report("Unknown savevm section type %d", section_type);
             ret = -EINVAL;
             goto out;
         }
     }
+    DPRINTF("qemu_loadvm_state loop: exited loop");
 
     cpu_synchronize_all_post_init();
 
@@ -1112,6 +1122,7 @@ out:
         ret = qemu_file_get_error(f);
     }
 
+    DPRINTF("qemu_loadvm_state out: ret=%d", ret);
     return ret;
 }
 
-- 
1.9.3


* [Qemu-devel] [PATCH 14/46] ram_debug_dump_bitmap: Dump a migration bitmap as text
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (12 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 13/46] qemu_loadvm debug Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 15/46] Rework loadvm path for subloops Dr. David Alan Gilbert (git)
                   ` (32 subsequent siblings)
  46 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Lines whose bits all match the expected value are skipped, so the
output can be quite compact depending on the circumstances.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c                   | 39 +++++++++++++++++++++++++++++++++++++++
 include/migration/migration.h |  1 +
 2 files changed, 40 insertions(+)

diff --git a/arch_init.c b/arch_init.c
index eb6455a..a3c468e 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -767,6 +767,45 @@ static void reset_ram_globals(void)
 
 #define MAX_WAIT 50 /* ms, half buffered_file limit */
 
+/*
+ * 'expected' is the value the bitmap is expected to be mostly full of;
+ * lines consisting entirely of this value are not printed.
+ * If 'todump' is NULL, the migration bitmap is dumped.
+ */
+void ram_debug_dump_bitmap(unsigned long *todump, bool expected)
+{
+    int64_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
+
+    int64_t cur;
+    int64_t linelen = 128l;
+    char linebuf[129];
+
+    if (!todump) {
+        todump = migration_bitmap;
+    }
+
+    for (cur = 0; cur < ram_pages; cur += linelen) {
+        int64_t curb;
+        bool found = false;
+        /*
+         * Last line; catch the case where the line length
+         * is longer than remaining ram
+         */
+        if (cur+linelen > ram_pages) {
+            linelen = ram_pages - cur;
+        }
+        for (curb = 0; curb < linelen; curb++) {
+            bool thisbit = test_bit(cur+curb, todump);
+            linebuf[curb] = thisbit ? '1' : '.';
+            found |= (thisbit ^ expected);
+        }
+        if (found) {
+            linebuf[curb] = '\0';
+            fprintf(stderr, "0x%08" PRIx64 " : %s\n", (uint64_t)cur, linebuf);
+        }
+    }
+}
+
 static int ram_save_setup(QEMUFile *f, void *opaque)
 {
     RAMBlock *block;
diff --git a/include/migration/migration.h b/include/migration/migration.h
index f722f06..a1ed7a3 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -155,6 +155,7 @@ uint64_t xbzrle_mig_pages_cache_miss(void);
 double xbzrle_mig_cache_miss_rate(void);
 
 void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
+void ram_debug_dump_bitmap(unsigned long *todump, bool expected);
 
 /**
  * @migrate_add_blocker - prevent migration from proceeding
-- 
1.9.3
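
The row-skipping idea in ram_debug_dump_bitmap() can be tried outside QEMU with a sketch like the one below. It is an approximation under assumptions: rows are 64 bits wide rather than 128, the bitmap is a plain array of unsigned long, and the names bit_is_set and dump_bitmap_rows are hypothetical rather than taken from the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Test bit 'nr' in a bitmap stored as an array of unsigned long */
static bool bit_is_set(const unsigned long *map, size_t nr)
{
    return (map[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL;
}

/*
 * Print the bitmap in rows of up to 64 bits ('1' for set, '.' for clear),
 * skipping any row in which every bit equals 'expected'.
 * Returns the number of rows actually printed.
 */
static size_t dump_bitmap_rows(FILE *out, const unsigned long *map,
                               size_t nbits, bool expected)
{
    enum { ROW = 64 };
    char row[ROW + 1];
    size_t printed = 0;

    assert(out != NULL && map != NULL);
    for (size_t start = 0; start < nbits; start += ROW) {
        /* The last row may be shorter than ROW */
        size_t n = nbits - start < ROW ? nbits - start : ROW;
        bool interesting = false;

        for (size_t i = 0; i < n; i++) {
            bool bit = bit_is_set(map, start + i);
            row[i] = bit ? '1' : '.';
            interesting |= (bit != expected);
        }
        if (interesting) {
            row[n] = '\0';
            fprintf(out, "0x%08zx : %s\n", start, row);
            printed++;
        }
    }
    return printed;
}
```

With a mostly-clear bitmap and expected = false, only the rows containing a set bit are printed; flipping 'expected' to true prints every row that contains at least one clear bit.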


* [Qemu-devel] [PATCH 15/46] Rework loadvm path for subloops
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (13 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 14/46] ram_debug_dump_bitmap: Dump a migration bitmap as text Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-05 10:26   ` Paolo Bonzini
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 16/46] Add migration-capability boolean for postcopy-ram Dr. David Alan Gilbert (git)
                   ` (31 subsequent siblings)
  46 siblings, 1 reply; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Postcopy needs to have two migration streams loading concurrently;
one from memory (with the device state) and the other from the fd
with the memory transactions.

Split the core of qemu_loadvm_state out so we can use it for both.

Allow the inner loadvm loop to quit and signal whether the parent
should quit as well.

loadvm_handlers is made static since its lifetime is greater
than that of the outer qemu_loadvm_state.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 savevm.c | 136 +++++++++++++++++++++++++++++++++++++++------------------------
 1 file changed, 84 insertions(+), 52 deletions(-)

diff --git a/savevm.c b/savevm.c
index 662a910..c277d77 100644
--- a/savevm.c
+++ b/savevm.c
@@ -915,6 +915,26 @@ static SaveStateEntry *find_se(const char *idstr, int instance_id)
     return NULL;
 }
 
+/* These are ORable flags */
+const int LOADVM_EXITCODE_QUITLOOP     =  1;
+const int LOADVM_EXITCODE_QUITPARENT   =  2;
+const int LOADVM_EXITCODE_KEEPHANDLERS =  4;
+
+typedef struct LoadStateEntry {
+    QLIST_ENTRY(LoadStateEntry) entry;
+    SaveStateEntry *se;
+    int section_id;
+    int version_id;
+} LoadStateEntry;
+
+typedef QLIST_HEAD(, LoadStateEntry) LoadStateEntry_Head;
+
+static LoadStateEntry_Head loadvm_handlers =
+ QLIST_HEAD_INITIALIZER(loadvm_handlers);
+
+static int qemu_loadvm_state_main(QEMUFile *f,
+                                  LoadStateEntry_Head *loadvm_handlers);
+
 static int loadvm_process_command_simple_lencheck(const char *name,
                                                   unsigned int actual,
                                                   unsigned int expected)
@@ -930,8 +950,11 @@ static int loadvm_process_command_simple_lencheck(const char *name,
 
 /* Process an incoming 'QEMU_VM_COMMAND'
  * -ve return on error (will issue error message)
+ * 0   just a normal return
+ * 1   All good, but exit the loop
  */
-static int loadvm_process_command(QEMUFile *f)
+static int loadvm_process_command(QEMUFile *f,
+                                  LoadStateEntry_Head *loadvm_handlers)
 {
     MigrationIncomingState *mis = f->mis;
     uint16_t com;
@@ -981,39 +1004,13 @@ static int loadvm_process_command(QEMUFile *f)
     return 0;
 }
 
-typedef struct LoadStateEntry {
-    QLIST_ENTRY(LoadStateEntry) entry;
-    SaveStateEntry *se;
-    int section_id;
-    int version_id;
-} LoadStateEntry;
-
-int qemu_loadvm_state(QEMUFile *f)
+static int qemu_loadvm_state_main(QEMUFile *f,
+                                  LoadStateEntry_Head *loadvm_handlers)
 {
-    QLIST_HEAD(, LoadStateEntry) loadvm_handlers =
-        QLIST_HEAD_INITIALIZER(loadvm_handlers);
-    LoadStateEntry *le, *new_le;
+    LoadStateEntry *le;
     uint8_t section_type;
-    unsigned int v;
     int ret;
-
-    if (qemu_savevm_state_blocked(NULL)) {
-        return -EINVAL;
-    }
-
-    v = qemu_get_be32(f);
-    if (v != QEMU_VM_FILE_MAGIC) {
-        return -EINVAL;
-    }
-
-    v = qemu_get_be32(f);
-    if (v == QEMU_VM_FILE_VERSION_COMPAT) {
-        error_report("SaveVM v2 format is obsolete and don't work anymore");
-        return -ENOTSUP;
-    }
-    if (v != QEMU_VM_FILE_VERSION) {
-        return -ENOTSUP;
-    }
+    int exitcode = 0;
 
     while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
         uint32_t instance_id, version_id, section_id;
@@ -1042,16 +1039,14 @@ int qemu_loadvm_state(QEMUFile *f)
             if (se == NULL) {
                 error_report("Unknown savevm section or instance '%s' %d",
                              idstr, instance_id);
-                ret = -EINVAL;
-                goto out;
+                return -EINVAL;
             }
 
             /* Validate version */
             if (version_id > se->version_id) {
                 error_report("savevm: unsupported version %d for '%s' v%d",
                         version_id, idstr, se->version_id);
-                ret = -EINVAL;
-                goto out;
+                return -EINVAL;
             }
 
             /* Add entry */
@@ -1060,14 +1055,14 @@ int qemu_loadvm_state(QEMUFile *f)
             le->se = se;
             le->section_id = section_id;
             le->version_id = version_id;
-            QLIST_INSERT_HEAD(&loadvm_handlers, le, entry);
+            QLIST_INSERT_HEAD(loadvm_handlers, le, entry);
 
             ret = vmstate_load(f, le->se, le->version_id);
             if (ret < 0) {
                 error_report("qemu: error while loading state for "
                              "instance 0x%x of device '%s'",
                              instance_id, idstr);
-                goto out;
+                return ret;
             }
             break;
         case QEMU_VM_SECTION_PART:
@@ -1075,47 +1070,84 @@ int qemu_loadvm_state(QEMUFile *f)
             section_id = qemu_get_be32(f);
 
             DPRINTF("QEMU_VM_SECTION_PART/END entry for id=%d", section_id);
-            QLIST_FOREACH(le, &loadvm_handlers, entry) {
+            QLIST_FOREACH(le, loadvm_handlers, entry) {
                 if (le->section_id == section_id) {
                     break;
                 }
             }
             if (le == NULL) {
                 error_report("Unknown savevm section %d", section_id);
-                ret = -EINVAL;
-                goto out;
+                return -EINVAL;
             }
 
             ret = vmstate_load(f, le->se, le->version_id);
             if (ret < 0) {
                 error_report("qemu: error while loading state section"
                              " id %d (%s)", section_id, le->se->idstr);
-                goto out;
+                return ret;
             }
             DPRINTF("QEMU_VM_SECTION_PART/END done for id=%d", section_id);
             break;
         case QEMU_VM_COMMAND:
-            ret = loadvm_process_command(f);
-            if (ret < 0) {
-                goto out;
+            ret = loadvm_process_command(f, loadvm_handlers);
+            DPRINTF("%s QEMU_VM_COMMAND ret: %d", __func__, ret);
+            if ((ret < 0) || (ret & LOADVM_EXITCODE_QUITLOOP)) {
+                return ret;
             }
+            exitcode |= ret; /* Lets us pass flags up to the parent */
             break;
         default:
             error_report("Unknown savevm section type %d", section_type);
-            ret = -EINVAL;
-            goto out;
+            return -EINVAL;
         }
     }
     DPRINTF("qemu_loadvm_state loop: exited loop");
 
-    cpu_synchronize_all_post_init();
+    if (exitcode & LOADVM_EXITCODE_QUITPARENT) {
+        DPRINTF("qemu_loadvm_state_main: End of loop with QUITPARENT");
+        exitcode &= ~LOADVM_EXITCODE_QUITPARENT;
+        exitcode |= LOADVM_EXITCODE_QUITLOOP;
+    }
+
+    return exitcode;
+}
+
+int qemu_loadvm_state(QEMUFile *f)
+{
+    LoadStateEntry *le, *new_le;
+    unsigned int v;
+    int ret;
+
+    if (qemu_savevm_state_blocked(NULL)) {
+        return -EINVAL;
+    }
+
+    v = qemu_get_be32(f);
+    if (v != QEMU_VM_FILE_MAGIC) {
+        return -EINVAL;
+    }
 
-    ret = 0;
+    v = qemu_get_be32(f);
+    if (v == QEMU_VM_FILE_VERSION_COMPAT) {
+        error_report("SaveVM v2 format is obsolete and don't work anymore");
+        return -ENOTSUP;
+    }
+    if (v != QEMU_VM_FILE_VERSION) {
+        return -ENOTSUP;
+    }
+
+    QLIST_INIT(&loadvm_handlers);
+    ret = qemu_loadvm_state_main(f, &loadvm_handlers);
 
-out:
-    QLIST_FOREACH_SAFE(le, &loadvm_handlers, entry, new_le) {
-        QLIST_REMOVE(le, entry);
-        g_free(le);
+    if (ret == 0) {
+        cpu_synchronize_all_post_init();
+    }
+
+    if ((ret < 0) || !(ret & LOADVM_EXITCODE_KEEPHANDLERS)) {
+        QLIST_FOREACH_SAFE(le, &loadvm_handlers, entry, new_le) {
+            QLIST_REMOVE(le, entry);
+            g_free(le);
+        }
     }
 
     if (ret == 0) {
-- 
1.9.3
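
As a rough illustration of how the ORable exit flags might propagate, here is a standalone sketch. It assumes the intent is that a QUITPARENT seen by an inner loop is handed to its caller as QUITLOOP; that reading is an interpretation of the commit message, not something the patch states outright. The names LOOP_* and run_inner_loop are hypothetical, and an array of integers stands in for the results of successive loadvm_process_command() calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirrors of the LOADVM_EXITCODE_* flags */
enum {
    LOOP_QUITLOOP     = 1,  /* stop the loop that sees this flag */
    LOOP_QUITPARENT   = 2,  /* ask the caller's loop to stop as well */
    LOOP_KEEPHANDLERS = 4,  /* caller must not free the handler list */
};

/*
 * Run an "inner loop" over precomputed command results.
 * - A negative result is an error and is returned immediately.
 * - A result with QUITLOOP set ends this loop immediately.
 * - Other flags are accumulated and passed up when the loop ends.
 * - An accumulated QUITPARENT is converted to QUITLOOP for the caller
 *   (assumed behaviour, see the note above).
 */
static int run_inner_loop(const int *cmd_results, size_t ncmds)
{
    int exitcode = 0;

    assert(cmd_results != NULL || ncmds == 0);
    for (size_t i = 0; i < ncmds; i++) {
        int ret = cmd_results[i];

        if (ret < 0 || (ret & LOOP_QUITLOOP)) {
            return ret;
        }
        exitcode |= ret;
    }
    if (exitcode & LOOP_QUITPARENT) {
        exitcode &= ~LOOP_QUITPARENT;
        exitcode |= LOOP_QUITLOOP;
    }
    return exitcode;
}
```

A caller that receives QUITLOOP from a nested run then leaves its own loop through the same check it applies to its own commands, while KEEPHANDLERS survives the conversion so the outer cleanup can honour it.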


* [Qemu-devel] [PATCH 16/46] Add migration-capability boolean for postcopy-ram.
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (14 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 15/46] Rework loadvm path for subloops Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-07 19:41   ` Eric Blake
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 17/46] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages Dr. David Alan Gilbert (git)
                   ` (30 subsequent siblings)
  46 siblings, 1 reply; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h | 1 +
 migration.c                   | 9 +++++++++
 qapi-schema.json              | 6 +++++-
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index a1ed7a3..35ad1f6 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -171,6 +171,7 @@ void migrate_add_blocker(Error *reason);
  */
 void migrate_del_blocker(Error *reason);
 
+bool migrate_postcopy_ram(void);
 bool migrate_rdma_pin_all(void);
 bool migrate_zero_blocks(void);
 
diff --git a/migration.c b/migration.c
index e69a49e..67cdfd6 100644
--- a/migration.c
+++ b/migration.c
@@ -612,6 +612,15 @@ bool migrate_rdma_pin_all(void)
     return s->enabled_capabilities[MIGRATION_CAPABILITY_RDMA_PIN_ALL];
 }
 
+bool migrate_postcopy_ram(void)
+{
+    MigrationState *s;
+
+    s = migrate_get_current();
+
+    return s->enabled_capabilities[MIGRATION_CAPABILITY_X_POSTCOPY_RAM];
+}
+
 bool migrate_auto_converge(void)
 {
     MigrationState *s;
diff --git a/qapi-schema.json b/qapi-schema.json
index b11aad2..eac3739 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -491,10 +491,14 @@
 # @auto-converge: If enabled, QEMU will automatically throttle down the guest
 #          to speed up convergence of RAM migration. (since 1.6)
 #
+# @x-postcopy-ram: Start executing on the migration target before all of RAM has been
+#          migrated, pulling the remaining pages along as needed. NOTE: If the
+#          migration fails during postcopy the VM will fail.  (since 2.2)
+#
 # Since: 1.2
 ##
 { 'enum': 'MigrationCapability',
-  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks'] }
+  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks', 'x-postcopy-ram'] }
 
 ##
 # @MigrationCapabilityStatus
-- 
1.9.3


* [Qemu-devel] [PATCH 17/46] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages.
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (15 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 16/46] Add migration-capability boolean for postcopy-ram Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 18/46] QEMU_VM_CMD_PACKAGED: Send a packaged chunk of migration stream Dr. David Alan Gilbert (git)
                   ` (29 subsequent siblings)
  46 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add state variable showing current incoming postcopy state.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |   8 ++
 include/sysemu/sysemu.h       |  23 ++++
 savevm.c                      | 313 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 344 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 35ad1f6..6c0e990 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -60,6 +60,14 @@ typedef struct MigrationState MigrationState;
 struct MigrationIncomingState {
     QEMUFile *file;
 
+    enum {
+        POSTCOPY_RAM_INCOMING_NONE = 0,  /* Initial state - no postcopy */
+        POSTCOPY_RAM_INCOMING_ADVISE,
+        POSTCOPY_RAM_INCOMING_LISTENING,
+        POSTCOPY_RAM_INCOMING_RUNNING,
+        POSTCOPY_RAM_INCOMING_END
+    } postcopy_ram_state;
+
     QEMUFile *return_path;
     QemuMutex      rp_mutex;    /* We send replies from multiple threads */
 };
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index b25e938..0641cc2 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -87,6 +87,16 @@ enum qemu_vm_cmd {
     QEMU_VM_CMD_OPENRP,        /* Tell the dest to open the Return path */
     QEMU_VM_CMD_REQACK,        /* Request an ACK on the RP */
 
+    QEMU_VM_CMD_POSTCOPY_RAM_ADVISE = 20,  /* Prior to any page transfers, just
+                                              warn we might want to do PC */
+    QEMU_VM_CMD_POSTCOPY_RAM_DISCARD,      /* A list of pages to discard that
+                                              were previously sent during
+                                              precopy but are dirty. */
+    QEMU_VM_CMD_POSTCOPY_RAM_LISTEN,       /* Start listening for incoming
+                                              pages as it's running. */
+    QEMU_VM_CMD_POSTCOPY_RAM_RUN,          /* Start execution */
+    QEMU_VM_CMD_POSTCOPY_RAM_END,          /* Postcopy is finished. */
+
     QEMU_VM_CMD_AFTERLASTVALID
 };
 
@@ -97,6 +107,19 @@ int qemu_savevm_state_iterate(QEMUFile *f);
 void qemu_savevm_state_complete(QEMUFile *f);
 void qemu_savevm_state_cancel(void);
 uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size);
+void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
+                              uint16_t len, uint8_t *data);
+void qemu_savevm_send_reqack(QEMUFile *f, uint32_t value);
+void qemu_savevm_send_openrp(QEMUFile *f);
+void qemu_savevm_send_postcopy_ram_advise(QEMUFile *f);
+void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
+                                           uint16_t len, uint8_t offset,
+                                           uint64_t *pagelist);
+
+void qemu_savevm_send_postcopy_ram_listen(QEMUFile *f);
+void qemu_savevm_send_postcopy_ram_run(QEMUFile *f);
+void qemu_savevm_send_postcopy_ram_end(QEMUFile *f, uint8_t status);
+
 int qemu_loadvm_state(QEMUFile *f);
 
 /* SLIRP */
diff --git a/savevm.c b/savevm.c
index c277d77..a67e23d 100644
--- a/savevm.c
+++ b/savevm.c
@@ -33,12 +33,14 @@
 #include "qemu/timer.h"
 #include "audio/audio.h"
 #include "migration/migration.h"
+#include "migration/postcopy-ram.h"
 #include "qemu/sockets.h"
 #include "qemu/queue.h"
 #include "sysemu/cpus.h"
 #include "exec/memory.h"
 #include "qmp-commands.h"
 #include "trace.h"
+#include "qemu/bitops.h"
 #include "qemu/iov.h"
 #include "block/snapshot.h"
 #include "block/qapi.h"
@@ -624,6 +626,83 @@ void qemu_savevm_send_openrp(QEMUFile *f)
 {
     qemu_savevm_command_send(f, QEMU_VM_CMD_OPENRP, 0, NULL);
 }
+
+/* Send prior to any RAM transfer */
+void qemu_savevm_send_postcopy_ram_advise(QEMUFile *f)
+{
+    DPRINTF("send postcopy-ram-advise");
+    qemu_savevm_command_send(f, QEMU_VM_CMD_POSTCOPY_RAM_ADVISE, 0, NULL);
+}
+
+/* Prior to running, to cause pages that have been dirtied after precopy
+ * started to be discarded on the destination.
+ * CMD_POSTCOPY_RAM_DISCARD consists of:
+ *  3 byte header (filled in by qemu_savevm_send_postcopy_ram_discard)
+ *      byte   version (0)
+ *      byte   offset into the 1st data word containing 1st page of RAMBlock
+ *      byte   Length of name field
+ *  n x byte   RAM block name (NOT 0 terminated)
+ *  n x
+ *      be64   Page addresses for start of an invalidation range
+ *      be64   mask of 64 pages, '1' to discard
+ *
+ *  Hopefully this is pretty sparse so we don't get too many entries,
+ *  and using the mask should deal with most pagesize differences
+ *  just ending up as a single full mask
+ *
+ *  The mask is always 64bits irrespective of the long size
+ *
+ *  Note the destination is free to discard *more* than we've asked
+ *  (e.g. rounding up to some convenient page size)
+ *
+ *  name:  RAMBlock name that these entries are part of
+ *  len: Number of page entries
+ *  pagelist: one 8byte header word (empty) then len*(start,mask) pairs
+ *            The caller must have already put these into be64 format
+ */
+void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
+                                           uint16_t len, uint8_t offset,
+                                           uint64_t *pagelist)
+{
+    uint8_t *buf;
+    uint16_t tmplen;
+
+    DPRINTF("send postcopy-ram-discard");
+    buf = g_malloc0(len*16 + strlen(name) + 3);
+    buf[0] = 0; /* Version */
+    buf[1] = offset;
+    assert(strlen(name) < 256);
+    buf[2] = strlen(name);
+    memcpy(buf+3, name, strlen(name));
+    tmplen = 3+strlen(name);
+    memcpy(buf + tmplen, pagelist, len*16);
+
+    qemu_savevm_command_send(f, QEMU_VM_CMD_POSTCOPY_RAM_DISCARD,
+                             tmplen + len*16, buf);
+    g_free(buf);
+}
+
+/* Get the destination into a state where it can receive page data. */
+void qemu_savevm_send_postcopy_ram_listen(QEMUFile *f)
+{
+    DPRINTF("send postcopy-ram-listen");
+    qemu_savevm_command_send(f, QEMU_VM_CMD_POSTCOPY_RAM_LISTEN, 0, NULL);
+}
+
+/* Kick the destination into running */
+void qemu_savevm_send_postcopy_ram_run(QEMUFile *f)
+{
+    DPRINTF("send postcopy-ram-run");
+    qemu_savevm_command_send(f, QEMU_VM_CMD_POSTCOPY_RAM_RUN, 0, NULL);
+}
+
+/* End of postcopy - with a status byte; 0 is good, anything else is a fail */
+void qemu_savevm_send_postcopy_ram_end(QEMUFile *f, uint8_t status)
+{
+    DPRINTF("send postcopy-ram-end");
+    qemu_savevm_command_send(f, QEMU_VM_CMD_POSTCOPY_RAM_END, 1, &status);
+}
+
 bool qemu_savevm_state_blocked(Error **errp)
 {
     SaveStateEntry *se;
@@ -935,6 +1014,209 @@ static LoadStateEntry_Head loadvm_handlers =
 static int qemu_loadvm_state_main(QEMUFile *f,
                                   LoadStateEntry_Head *loadvm_handlers);
 
+/* ------ incoming postcopy-ram messages ------ */
+/* 'advise' arrives before any RAM transfers just to tell us that a postcopy
+ * *might* happen - it might be skipped if precopy transferred everything
+ * quickly.
+ */
+static int loadvm_postcopy_ram_handle_advise(MigrationIncomingState *mis)
+{
+    DPRINTF("%s", __func__);
+    if (mis->postcopy_ram_state != POSTCOPY_RAM_INCOMING_NONE) {
+        error_report("CMD_POSTCOPY_RAM_ADVISE in wrong postcopy state (%d)",
+                     mis->postcopy_ram_state);
+        return -1;
+    }
+
+    /* Check this host can do it */
+    if (postcopy_ram_hosttest()) {
+        return -1;
+    }
+
+    if (ram_postcopy_incoming_init(mis)) {
+        return -1;
+    }
+
+    mis->postcopy_ram_state = POSTCOPY_RAM_INCOMING_ADVISE;
+
+    return 0;
+}
+
+/* After postcopy we will be told to throw some pages away since they're
+ * dirty and will have to be demand fetched.  Must happen before CPU is
+ * started.
+ * There can be 0..many of these messages, each encoding multiple pages.
+ * Bits set in the message represent a page in the source VMs bitmap, but
+ * since the guest/target page sizes can be different on s/d then we have
+ * to convert.
+ */
+static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
+                                              uint16_t len)
+{
+    int tmp;
+    const int source_target_page_bits = 12; /* TODO */
+    unsigned int first_bit_offset;
+    char ramid[256];
+
+    DPRINTF("%s", __func__);
+
+    if (mis->postcopy_ram_state != POSTCOPY_RAM_INCOMING_ADVISE) {
+        error_report("CMD_POSTCOPY_RAM_DISCARD in wrong postcopy state (%d)",
+                     mis->postcopy_ram_state);
+        return -1;
+    }
+    /* We're expecting a
+     *    3 byte header,
+     *    a RAM ID string
+     *    then at least one 16 byte (2x8 byte) chunk
+     */
+    if (len < 19) {
+        error_report("CMD_POSTCOPY_RAM_DISCARD invalid length (%d)", len);
+        return -1;
+    }
+
+    tmp = qemu_get_byte(mis->file);
+    if (tmp != 0) {
+        error_report("CMD_POSTCOPY_RAM_DISCARD invalid version (%d)", tmp);
+        return -1;
+    }
+    first_bit_offset = qemu_get_byte(mis->file);
+
+    if (qemu_get_counted_string(mis->file, (uint8_t *)ramid)) {
+        error_report("CMD_POSTCOPY_RAM_DISCARD Failed to read RAMBlock ID");
+        return -1;
+    }
+
+    len -= 3+strlen(ramid);
+    if (len & 15) {
+        error_report("CMD_POSTCOPY_RAM_DISCARD invalid length (%d)", len);
+        return -1;
+    }
+    while (len) {
+        uint64_t startaddr, mask;
+        /*
+         * We now have pairs of address, mask
+         *   The address is in multiples of 64bit chunks in the source bitmask
+         *     ie multiply by 64 and then source-target-page-size to get bytes
+         *     '0' represents the chunk in which the RAMBlock starts for the
+         *     source and 'first_bit_offset' (see above) represents which bit in
+         *     that first word corresponds to the first page of the RAMBlock
+         *   The mask is 64 bits of bitmask starting at that offset into the
+         *   RAMBlock.
+         *
+         *   For example:
+         *      an address of 1 with a first_bit_offset of 12 indicates
+         *      page 1*64 - 12 = page 52 for bit 0 of the mask
+         *      Source guarantees that for address 0, bits <first_bit_offset
+         *      shall be 0
+         */
+        startaddr = qemu_get_be64(mis->file) * 64;
+        mask = qemu_get_be64(mis->file);
+
+        len -= 16;
+
+        while (mask) {
+            /* mask= .....?10...0 */
+            /*             ^fs    */
+            int firstset = ctz64(mask);
+
+            /* tmp64=.....?11...1 */
+            /*             ^fs    */
+            uint64_t tmp64 = mask | ((((uint64_t)1)<<firstset)-1);
+
+            /* mask= .?01..10...0 */
+            /*         ^fz ^fs    */
+            int firstzero = cto64(tmp64);
+
+            if ((startaddr == 0) && (firstset < first_bit_offset)) {
+                error_report("CMD_POSTCOPY_RAM_DISCARD bad data; bit set"
+                               " prior to block; block=%s offset=%d"
+                               " firstset=%d", ramid, first_bit_offset,
+                               firstset);
+                return -1;
+            }
+            /*
+             * we know there must be at least 1 bit set due to the loop entry
+             * If there is no 0 firstzero will be 64
+             */
+            /* TODO - ram_discard_range gets added in a later patch
+            int ret = ram_discard_range(mis, ramid, source_target_page_bits,
+                                startaddr + firstset - first_bit_offset,
+                                startaddr + (firstzero - 1) - first_bit_offset);
+             */
+            int ret = -1; /* TODO */
+            if (ret) {
+                return ret;
+            }
+
+            /* mask= .?0000000000 */
+            /*         ^fz ^fs    */
+            if (firstzero != 64) {
+                mask &= (((uint64_t)-1) << firstzero);
+            } else {
+                mask = 0;
+            }
+        }
+    }
+    DPRINTF("%s finished", __func__);
+
+    return 0;
+}
+
+/* After this message we must be able to immediately receive page data */
+static int loadvm_postcopy_ram_handle_listen(MigrationIncomingState *mis)
+{
+    DPRINTF("%s", __func__);
+    if (mis->postcopy_ram_state != POSTCOPY_RAM_INCOMING_ADVISE) {
+        error_report("CMD_POSTCOPY_RAM_LISTEN in wrong postcopy state (%d)",
+                     mis->postcopy_ram_state);
+        return -1;
+    }
+
+    mis->postcopy_ram_state = POSTCOPY_RAM_INCOMING_LISTENING;
+
+    /*
+     * Sensitise RAM - can now generate requests for blocks that don't exist
+     * However, at this point the CPU shouldn't be running, and the IO
+     * shouldn't be doing anything yet so don't actually expect requests
+     */
+    if (postcopy_ram_enable_notify(mis)) {
+        return -1;
+    }
+
+    /* TODO start up the postcopy listening thread */
+    return 0;
+}
+
+/* After all discards we can start running and asking for pages */
+static int loadvm_postcopy_ram_handle_run(MigrationIncomingState *mis)
+{
+    DPRINTF("%s", __func__);
+    if (mis->postcopy_ram_state != POSTCOPY_RAM_INCOMING_LISTENING) {
+        error_report("CMD_POSTCOPY_RAM_RUN in wrong postcopy state (%d)",
+                     mis->postcopy_ram_state);
+        return -1;
+    }
+
+    mis->postcopy_ram_state = POSTCOPY_RAM_INCOMING_RUNNING;
+    /* Hold onto your hats, starting the CPU */
+    vm_start();
+
+    return 0;
+}
+
+/* The end - with a byte from the source which can tell us to fail. */
+static int loadvm_postcopy_ram_handle_end(MigrationIncomingState *mis)
+{
+    DPRINTF("%s", __func__);
+    if (mis->postcopy_ram_state == POSTCOPY_RAM_INCOMING_NONE) {
+        error_report("CMD_POSTCOPY_RAM_END in wrong postcopy state (%d)",
+                     mis->postcopy_ram_state);
+        return -1;
+    }
+    return -1; /* TODO - expecting 1 byte good/fail */
+}
+
 static int loadvm_process_command_simple_lencheck(const char *name,
                                                   unsigned int actual,
                                                   unsigned int expected)
@@ -996,6 +1278,37 @@ static int loadvm_process_command(QEMUFile *f,
         migrate_send_rp_ack(mis, tmp32);
         break;
 
+    case QEMU_VM_CMD_POSTCOPY_RAM_ADVISE:
+        if (loadvm_process_command_simple_lencheck("CMD_POSTCOPY_RAM_ADVISE",
+                                                   len, 0)) {
+            return -1;
+        }
+        return loadvm_postcopy_ram_handle_advise(mis);
+
+    case QEMU_VM_CMD_POSTCOPY_RAM_DISCARD:
+        return loadvm_postcopy_ram_handle_discard(mis, len);
+
+    case QEMU_VM_CMD_POSTCOPY_RAM_LISTEN:
+        if (loadvm_process_command_simple_lencheck("CMD_POSTCOPY_RAM_LISTEN",
+                                                   len, 0)) {
+            return -1;
+        }
+        return loadvm_postcopy_ram_handle_listen(mis);
+
+    case QEMU_VM_CMD_POSTCOPY_RAM_RUN:
+        if (loadvm_process_command_simple_lencheck("CMD_POSTCOPY_RAM_RUN",
+                                                   len, 0)) {
+            return -1;
+        }
+        return loadvm_postcopy_ram_handle_run(mis);
+
+    case QEMU_VM_CMD_POSTCOPY_RAM_END:
+        if (loadvm_process_command_simple_lencheck("CMD_POSTCOPY_RAM_END",
+                                                   len, 1)) {
+            return -1;
+        }
+        return loadvm_postcopy_ram_handle_end(mis);
+
     default:
         error_report("VM_COMMAND 0x%x unknown (len 0x%x)", com, len);
         return -1;
-- 
1.9.3
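
A standalone sketch of the run-splitting loop in
loadvm_postcopy_ram_handle_discard, for checking the bit arithmetic outside
QEMU. The names here (count_trailing_zeros64, count_trailing_ones64,
discard_mask_to_ranges, range_log, log_range) are hypothetical; they stand in
for ctz64/cto64 and for ram_discard_range, which only arrives in a later
patch. It assumes GCC or Clang for __builtin_ctzll.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for QEMU's ctz64()/cto64(); both return 64 when
 * there is no zero/one bit to find. Assumes GCC/Clang builtins. */
static int count_trailing_zeros64(uint64_t v)
{
    return v ? __builtin_ctzll(v) : 64;
}

static int count_trailing_ones64(uint64_t v)
{
    return count_trailing_zeros64(~v);
}

/* Hypothetical recorder used in place of ram_discard_range() */
struct range_log {
    uint64_t first[32];   /* a 64 bit mask holds at most 32 separate runs */
    uint64_t last[32];
    int n;
};

static void log_range(uint64_t first, uint64_t last, void *opaque)
{
    struct range_log *log = opaque;

    if (log->n < 32) {
        log->first[log->n] = first;
        log->last[log->n] = last;
        log->n++;
    }
}

/*
 * Split one (chunk address, mask) pair into runs of consecutive set bits and
 * report each run as an inclusive page range relative to the RAMBlock start.
 * Returns the number of runs, or -1 if a bit is set before the block starts.
 */
static int discard_mask_to_ranges(uint64_t chunk, uint64_t mask,
                                  unsigned int first_bit_offset,
                                  void (*fn)(uint64_t, uint64_t, void *),
                                  void *opaque)
{
    uint64_t base = chunk * 64;
    int runs = 0;

    assert(first_bit_offset < 64);
    while (mask) {
        int firstset = count_trailing_zeros64(mask);
        /* Fill the bits below the run so the first zero marks its end */
        uint64_t filled = mask | ((UINT64_C(1) << firstset) - 1);
        int firstzero = count_trailing_ones64(filled);

        if (chunk == 0 && (unsigned int)firstset < first_bit_offset) {
            return -1;
        }
        fn(base + firstset - first_bit_offset,
           base + (firstzero - 1) - first_bit_offset, opaque);
        runs++;
        /* Clear the run just reported; a shift by 64 would be undefined */
        mask = (firstzero == 64) ? 0 : mask & (UINT64_MAX << firstzero);
    }
    return runs;
}
```

With a chunk address of 1, a mask of 0x0F and a first_bit_offset of 12, this
reports the single range of pages 52..55, which agrees with the worked example
in the handler's comment.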


* [Qemu-devel] [PATCH 18/46] QEMU_VM_CMD_PACKAGED: Send a packaged chunk of migration stream
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (16 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 17/46] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 19/46] migrate_init: Call from savevm Dr. David Alan Gilbert (git)
                   ` (28 subsequent siblings)
  46 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

QEMU_VM_CMD_PACKAGED is a migration command that allows a chunk
of migration stream to be sent in one go, and be received by
a separate instance of the loadvm loop while not interacting
with the migration stream.

This is used by postcopy to load device state (from the package)
while loading memory pages from the main stream.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
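For anyone reading along, the framing described above can be sketched on its
own: a 4-byte big-endian length, then that many bytes of embedded stream. This
is only an illustration under assumptions. package_encode, package_decode and
PACKAGE_MAX_SIZE are hypothetical names, it works on flat buffers rather than
a QEMUFile, and it leaves out the command header that
qemu_savevm_command_send writes in front of the length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cap, mirroring MAX_VM_CMD_PACKAGED_SIZE in the patch */
#define PACKAGE_MAX_SIZE (1ul << 24)

/*
 * Write a be32 length followed by the payload into 'out'.
 * Returns the number of bytes written, or 0 if the payload is too large
 * or does not fit in the output buffer.
 */
static size_t package_encode(uint8_t *out, size_t out_cap,
                             const uint8_t *payload, uint32_t len)
{
    if (len > PACKAGE_MAX_SIZE || out_cap < 4 || out_cap - 4 < len) {
        return 0;
    }
    out[0] = (uint8_t)(len >> 24);
    out[1] = (uint8_t)(len >> 16);
    out[2] = (uint8_t)(len >> 8);
    out[3] = (uint8_t)len;
    if (len) {
        memcpy(out + 4, payload, len);
    }
    return 4 + (size_t)len;
}

/*
 * Parse a be32 length and return a pointer to the payload that follows it.
 * Returns NULL if the input is truncated or the length exceeds the cap,
 * so the receiver never allocates or reads based on an unchecked length.
 */
static const uint8_t *package_decode(const uint8_t *in, size_t in_len,
                                     uint32_t *len_out)
{
    uint32_t len;

    assert(len_out != NULL);
    if (in_len < 4) {
        return NULL;
    }
    len = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
          ((uint32_t)in[2] << 8) | (uint32_t)in[3];
    if (len > PACKAGE_MAX_SIZE || in_len - 4 < len) {
        return NULL;
    }
    *len_out = len;
    return in + 4;
}
```

Checking the length against the cap before touching the payload is the same
ordering the receive side in this patch uses before it allocates its buffer.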
 include/sysemu/sysemu.h |  4 +++
 savevm.c                | 80 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 84 insertions(+)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 0641cc2..abf0d63 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -86,6 +86,7 @@ enum qemu_vm_cmd {
     QEMU_VM_CMD_INVALID = 0,   /* Must be 0 */
     QEMU_VM_CMD_OPENRP,        /* Tell the dest to open the Return path */
     QEMU_VM_CMD_REQACK,        /* Request an ACK on the RP */
+    QEMU_VM_CMD_PACKAGED,      /* Send a wrapped stream within this stream */
 
     QEMU_VM_CMD_POSTCOPY_RAM_ADVISE = 20,  /* Prior to any page transfers, just
                                               warn we might want to do PC */
@@ -100,6 +101,8 @@ enum qemu_vm_cmd {
     QEMU_VM_CMD_AFTERLASTVALID
 };
 
+#define MAX_VM_CMD_PACKAGED_SIZE (1ul << 24)
+
 bool qemu_savevm_state_blocked(Error **errp);
 void qemu_savevm_state_begin(QEMUFile *f,
                              const MigrationParams *params);
@@ -111,6 +114,7 @@ void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
                               uint16_t len, uint8_t *data);
 void qemu_savevm_send_reqack(QEMUFile *f, uint32_t value);
 void qemu_savevm_send_openrp(QEMUFile *f);
+void qemu_savevm_send_packaged(QEMUFile *f, const QEMUSizedBuffer *qsb);
 void qemu_savevm_send_postcopy_ram_advise(QEMUFile *f);
 void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
                                            uint16_t len, uint8_t offset,
diff --git a/savevm.c b/savevm.c
index a67e23d..7c5cdba 100644
--- a/savevm.c
+++ b/savevm.c
@@ -627,6 +627,38 @@ void qemu_savevm_send_openrp(QEMUFile *f)
     qemu_savevm_command_send(f, QEMU_VM_CMD_OPENRP, 0, NULL);
 }
 
+/* We have a buffer of data to send; we don't want that all to be loaded
+ * by the command itself, so the command contains just the length of the
+ * extra buffer that we then send straight after it.
+ * TODO: Must be a better way to organise that
+ */
+void qemu_savevm_send_packaged(QEMUFile *f, const QEMUSizedBuffer *qsb)
+{
+    size_t cur_iov;
+    size_t len = qsb_get_length(qsb);
+    uint32_t tmp;
+
+    tmp = cpu_to_be32(len);
+
+    DPRINTF("send_packaged");
+    qemu_savevm_command_send(f, QEMU_VM_CMD_PACKAGED, 4, (uint8_t *)&tmp);
+
+    /* all the data follows (concatenating the iov's) */
+    for (cur_iov = 0; cur_iov < qsb->n_iov; cur_iov++) {
+        /* The iov entries are partially filled */
+        size_t towrite = (qsb->iov[cur_iov].iov_len > len) ?
+                              len :
+                              qsb->iov[cur_iov].iov_len;
+        len -= towrite;
+
+        if (!towrite) {
+            break;
+        }
+
+        qemu_put_buffer(f, qsb->iov[cur_iov].iov_base, towrite);
+    }
+}
+
 /* Send prior to any RAM transfer */
 void qemu_savevm_send_postcopy_ram_advise(QEMUFile *f)
 {
@@ -1230,6 +1262,46 @@ static int loadvm_process_command_simple_lencheck(const char *name,
     return 0;
 }
 
+/* Immediately following this command is a blob of data containing an embedded
+ * chunk of migration stream; read it and load it.
+ */
+static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis,
+                                      uint32_t length,
+                                      LoadStateEntry_Head *loadvm_handlers)
+{
+    int ret;
+    uint8_t *buffer;
+    QEMUSizedBuffer *qsb;
+
+    DPRINTF("loadvm_handle_cmd_packaged: length=%u", length);
+
+    if (length > MAX_VM_CMD_PACKAGED_SIZE) {
+        error_report("Unreasonably large packaged state: %u", length);
+        return -1;
+    }
+    buffer = g_malloc0(length);
+    ret = qemu_get_buffer(mis->file, buffer, (int)length);
+    if (ret != length) {
+        g_free(buffer);
+        error_report("CMD_PACKAGED: Buffer receive fail ret=%d length=%u",
+                ret, length);
+        return (ret < 0) ? ret : -EAGAIN;
+    }
+    DPRINTF("%s: Received %d package, going to load", __func__, ret);
+
+    /* Setup a dummy QEMUFile that actually reads from the buffer */
+    qsb = qsb_create(buffer, length);
+    g_free(buffer); /* Because qsb_create copies */
+    QEMUFile *packf = qemu_bufopen("r", qsb);
+    packf->mis = mis;
+
+    ret = qemu_loadvm_state_main(packf, loadvm_handlers);
+    DPRINTF("%s: qemu_loadvm_state_main returned %d", __func__, ret);
+    qemu_fclose(packf); /* also frees the qsb */
+
+    return ret;
+}
+
 /* Process an incoming 'QEMU_VM_COMMAND'
  * -ve return on error (will issue error message)
  * 0   just a normal return
@@ -1278,6 +1350,14 @@ static int loadvm_process_command(QEMUFile *f,
         migrate_send_rp_ack(mis, tmp32);
         break;
 
+    case QEMU_VM_CMD_PACKAGED:
+        if (loadvm_process_command_simple_lencheck("CMD_PACKAGED",
+                                                   len, 4)) {
+            return -1;
+        }
+        tmp32 = qemu_get_be32(f);
+        return loadvm_handle_cmd_packaged(mis, tmp32, loadvm_handlers);
+
     case QEMU_VM_CMD_POSTCOPY_RAM_ADVISE:
         if (loadvm_process_command_simple_lencheck("CMD_POSTCOPY_RAM_ADVISE",
                                                    len, 0)) {
-- 
1.9.3


* [Qemu-devel] [PATCH 19/46] migrate_init: Call from savevm
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (17 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 18/46] QEMU_VM_CMD_PACKAGED: Send a packaged chunk of migration stream Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 20/46] Allow savevm handlers to state whether they could go into postcopy Dr. David Alan Gilbert (git)
                   ` (27 subsequent siblings)
  46 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Suspend to file is very much like a migrate, and it makes life
easier if we have the Migration state available, so initialise it
in the savevm.c code for suspending.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h | 1 +
 include/qemu/typedefs.h       | 1 +
 migration.c                   | 2 +-
 savevm.c                      | 2 ++
 4 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 6c0e990..cf66921 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -138,6 +138,7 @@ int migrate_fd_close(MigrationState *s);
 
 void add_migration_state_change_notifier(Notifier *notify);
 void remove_migration_state_change_notifier(Notifier *notify);
+MigrationState *migrate_init(const MigrationParams *params);
 bool migration_in_setup(MigrationState *);
 bool migration_has_finished(MigrationState *);
 bool migration_has_failed(MigrationState *);
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 0f79b5c..8539de6 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -16,6 +16,7 @@ struct Monitor;
 typedef struct Monitor Monitor;
 typedef struct MigrationIncomingState MigrationIncomingState;
 typedef struct MigrationParams MigrationParams;
+typedef struct MigrationState MigrationState;
 
 typedef struct Property Property;
 typedef struct PropertyInfo PropertyInfo;
diff --git a/migration.c b/migration.c
index 67cdfd6..eac12ab 100644
--- a/migration.c
+++ b/migration.c
@@ -442,7 +442,7 @@ bool migration_has_failed(MigrationState *s)
             s->state == MIG_STATE_ERROR);
 }
 
-static MigrationState *migrate_init(const MigrationParams *params)
+MigrationState *migrate_init(const MigrationParams *params)
 {
     MigrationState *s = migrate_get_current();
     int64_t bandwidth_limit = s->bandwidth_limit;
diff --git a/savevm.c b/savevm.c
index 7c5cdba..843443f 100644
--- a/savevm.c
+++ b/savevm.c
@@ -941,6 +941,8 @@ static int qemu_savevm_state(QEMUFile *f)
         .blk = 0,
         .shared = 0
     };
+    MigrationState *ms = migrate_init(&params);
+    ms->file = f;
 
     if (qemu_savevm_state_blocked(NULL)) {
         return -EINVAL;
-- 
1.9.3


* [Qemu-devel] [PATCH 20/46] Allow savevm handlers to state whether they could go into postcopy
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (18 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 19/46] migrate_init: Call from savevm Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 21/46] postcopy: OS support test Dr. David Alan Gilbert (git)
                   ` (26 subsequent siblings)
  46 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add a can_postcopy handler to SaveVMHandlers and use it to split the
qemu_savevm_state_pending counts into postcopiable and non-postcopiable
amounts.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
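A minimal sketch of the split, assuming a handler that does not provide
can_postcopy should be counted as non-postcopiable. struct pending_source,
split_pending, fixed_pending and always_postcopy are hypothetical names used
only for this illustration; they are a cut-down stand-in for SaveVMHandlers
and the savevm_handlers list, not the real types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a savevm handler entry */
struct pending_source {
    uint64_t (*pending)(void *opaque);  /* estimate of bytes still to send */
    bool (*can_postcopy)(void *opaque); /* may be NULL */
    void *opaque;
};

/* Example callbacks: report the value stored behind 'opaque' */
static uint64_t fixed_pending(void *opaque)
{
    return *(uint64_t *)opaque;
}

static bool always_postcopy(void *opaque)
{
    (void)opaque;
    return true;
}

/*
 * Sum the pending estimates into two buckets. A source with no
 * can_postcopy callback is assumed (in this sketch) to be non-postcopiable.
 */
static void split_pending(const struct pending_source *src, size_t n,
                          uint64_t *non_postcopiable, uint64_t *postcopiable)
{
    uint64_t nonpc = 0;
    uint64_t pc = 0;
    size_t i;

    assert(non_postcopiable != NULL && postcopiable != NULL);
    for (i = 0; i < n; i++) {
        uint64_t amount = src[i].pending(src[i].opaque);

        if (src[i].can_postcopy && src[i].can_postcopy(src[i].opaque)) {
            pc += amount;
        } else {
            nonpc += amount;
        }
    }
    *non_postcopiable = nonpc;
    *postcopiable = pc;
}
```

A caller that only wants the old single total can add the two results
together, which is what migration_thread does in this patch.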
 arch_init.c                 |  7 +++++++
 include/migration/vmstate.h |  2 +-
 include/sysemu/sysemu.h     |  4 +++-
 migration.c                 |  9 ++++++++-
 savevm.c                    | 23 +++++++++++++++++++----
 5 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index a3c468e..aeeaf37 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -1190,6 +1190,12 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
     return ret;
 }
 
+/* RAM's always up for postcopying */
+static bool ram_can_postcopy(void *opaque)
+{
+    return true;
+}
+
 static SaveVMHandlers savevm_ram_handlers = {
     .save_live_setup = ram_save_setup,
     .save_live_iterate = ram_save_iterate,
@@ -1197,6 +1203,7 @@ static SaveVMHandlers savevm_ram_handlers = {
     .save_live_pending = ram_save_pending,
     .load_state = ram_load,
     .cancel = ram_migration_cancel,
+    .can_postcopy = ram_can_postcopy,
 };
 
 void ram_mig_init(void)
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 9a001bd..4991935 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -54,7 +54,7 @@ typedef struct SaveVMHandlers {
     /* This runs outside the iothread lock!  */
     int (*save_live_setup)(QEMUFile *f, void *opaque);
     uint64_t (*save_live_pending)(QEMUFile *f, void *opaque, uint64_t max_size);
-
+    bool (*can_postcopy)(void *opaque);
     LoadStateHandler *load_state;
 } SaveVMHandlers;
 
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index abf0d63..dc53580 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -109,7 +109,9 @@ void qemu_savevm_state_begin(QEMUFile *f,
 int qemu_savevm_state_iterate(QEMUFile *f);
 void qemu_savevm_state_complete(QEMUFile *f);
 void qemu_savevm_state_cancel(void);
-uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size);
+void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size,
+                               uint64_t *res_non_postcopiable,
+                               uint64_t *res_postcopiable);
 void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
                               uint16_t len, uint8_t *data);
 void qemu_savevm_send_reqack(QEMUFile *f, uint32_t value);
diff --git a/migration.c b/migration.c
index eac12ab..8343679 100644
--- a/migration.c
+++ b/migration.c
@@ -806,8 +806,15 @@ static void *migration_thread(void *opaque)
         uint64_t pending_size;
 
         if (!qemu_file_rate_limit(s->file)) {
-            pending_size = qemu_savevm_state_pending(s->file, max_size);
+            uint64_t pend_post, pend_nonpost;
+            DPRINTF("iterate\n");
+            qemu_savevm_state_pending(s->file, max_size, &pend_nonpost,
+                                      &pend_post);
+            pending_size = pend_nonpost + pend_post;
             trace_migrate_pending(pending_size, max_size);
+            DPRINTF("pending size %" PRIu64 " max %" PRIu64 " (post=%" PRIu64
+                    " nonpost=%" PRIu64 ")\n",
+                    pending_size, max_size, pend_post, pend_nonpost);
             if (pending_size && pending_size >= max_size) {
                 qemu_savevm_state_iterate(s->file);
             } else {
diff --git a/savevm.c b/savevm.c
index 843443f..d8af526 100644
--- a/savevm.c
+++ b/savevm.c
@@ -903,10 +903,18 @@ void qemu_savevm_state_complete(QEMUFile *f)
     qemu_fflush(f);
 }
 
-uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size)
+/* Give an estimate of the amount left to be transferred,
+ * the result is split into the amount for units that can and
+ * for units that can't do postcopy.
+ */
+void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size,
+                               uint64_t *res_non_postcopiable,
+                               uint64_t *res_postcopiable)
 {
     SaveStateEntry *se;
-    uint64_t ret = 0;
+    uint64_t res_nonpc = 0;
+    uint64_t res_pc = 0;
+    uint64_t tmp;
 
     QTAILQ_FOREACH(se, &savevm_handlers, entry) {
         if (!se->ops || !se->ops->save_live_pending) {
@@ -917,9 +925,16 @@ uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size)
                 continue;
             }
         }
-        ret += se->ops->save_live_pending(f, se->opaque, max_size);
+        tmp = se->ops->save_live_pending(f, se->opaque, max_size);
+
+        if (se->ops->can_postcopy && se->ops->can_postcopy(se->opaque)) {
+            res_pc += tmp;
+        } else {
+            res_nonpc += tmp;
+        }
     }
-    return ret;
+    *res_non_postcopiable = res_nonpc;
+    *res_postcopiable = res_pc;
 }
 
 void qemu_savevm_state_cancel(void)
-- 
1.9.3


* [Qemu-devel] [PATCH 21/46] postcopy: OS support test
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (19 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 20/46] Allow savevm handlers to state whether they could go into postcopy Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 22/46] Migration parameters: Add qmp/hmp commands for setting/viewing Dr. David Alan Gilbert (git)
                   ` (25 subsequent siblings)
  46 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Provide a check to see if the OS we're running on has all the bits
needed for postcopy.

Creates postcopy-ram.c which will get most of the other helpers we need.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
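The probing pattern can be tried in isolation. MADV_USERFAULT and
remap_anon_pages come from kernel patches that are not in a released kernel,
so this sketch takes the advice value as a parameter and can be run with a
standard one such as MADV_DONTNEED. probe_madvise is a hypothetical name, not
part of this series. Note that mmap reports failure with MAP_FAILED rather
than NULL, and that the scratch page is unmapped on every path.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map one scratch anonymous page, try the given madvise() advice on it and
 * unmap it again. Returns 0 if the kernel accepted the advice, or a
 * negative errno value otherwise.
 */
static int probe_madvise(int advice)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    void *area;
    int ret = 0;

    if (pagesize <= 0) {
        return -EINVAL;
    }

    area = mmap(NULL, (size_t)pagesize, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (area == MAP_FAILED) {   /* mmap never signals failure with NULL */
        return -errno;
    }
    assert(((uintptr_t)area & (uintptr_t)(pagesize - 1)) == 0);

    if (madvise(area, (size_t)pagesize, advice)) {
        ret = -errno;           /* save errno before munmap can change it */
    }

    munmap(area, (size_t)pagesize);
    return ret;
}
```

An unsupported advice value normally comes back as -EINVAL from this helper,
which is the condition a host test like the one in this patch is looking for.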
 Makefile.objs                    |   2 +-
 include/migration/postcopy-ram.h |  19 ++++++
 postcopy-ram.c                   | 129 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 149 insertions(+), 1 deletion(-)
 create mode 100644 include/migration/postcopy-ram.h
 create mode 100644 postcopy-ram.c

diff --git a/Makefile.objs b/Makefile.objs
index 1f76cea..a7ad235 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -55,7 +55,7 @@ common-obj-y += qemu-file.o
 common-obj-$(CONFIG_RDMA) += migration-rdma.o
 common-obj-y += qemu-char.o #aio.o
 common-obj-y += block-migration.o
-common-obj-y += page_cache.o xbzrle.o
+common-obj-y += page_cache.o xbzrle.o postcopy-ram.o
 
 common-obj-$(CONFIG_POSIX) += migration-exec.o migration-unix.o migration-fd.o
 
diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
new file mode 100644
index 0000000..dcd1afa
--- /dev/null
+++ b/include/migration/postcopy-ram.h
@@ -0,0 +1,19 @@
+/*
+ * Postcopy migration for RAM
+ *
+ * Copyright 2013 Red Hat, Inc. and/or its affiliates
+ *
+ * Authors:
+ *  Dave Gilbert  <dgilbert@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+#ifndef QEMU_POSTCOPY_RAM_H
+#define QEMU_POSTCOPY_RAM_H
+
+/* Return 0 if the host supports everything we need to do postcopy-ram */
+int postcopy_ram_hosttest(void);
+
+#endif
diff --git a/postcopy-ram.c b/postcopy-ram.c
new file mode 100644
index 0000000..1f3e6ea
--- /dev/null
+++ b/postcopy-ram.c
@@ -0,0 +1,129 @@
+/*
+ * Postcopy migration for RAM
+ *
+ * Copyright 2013-2014 Red Hat, Inc. and/or its affiliates
+ *
+ * Authors:
+ *  Dave Gilbert  <dgilbert@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+/*
+ * Postcopy is a migration technique where the execution flips from the
+ * source to the destination before all the data has been copied.
+ */
+
+#include <glib.h>
+#include <stdio.h>
+#include <unistd.h>
+
+#include "qemu-common.h"
+#include "migration/migration.h"
+#include "migration/postcopy-ram.h"
+
+//#define DEBUG_POSTCOPY
+
+#ifdef DEBUG_POSTCOPY
+#define DPRINTF(fmt, ...) \
+    do { fprintf(stderr, "postcopy@%" PRId64 " " fmt "\n", \
+                          qemu_clock_get_ms(QEMU_CLOCK_REALTIME), \
+                          ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+    do { } while (0)
+#endif
+
+/* Postcopy needs to detect accesses to pages that haven't yet been copied
+ * across, and efficiently map new pages in, the techniques for doing this
+ * are target OS specific.
+ */
+#if defined(__linux__)
+
+/* On Linux we use:
+ *    madvise MADV_USERFAULT - to mark an area of anonymous memory such
+ *                             that userspace is notified of accesses to
+ *                             unallocated areas.
+ *    userfaultfd      - opens a socket to receive USERFAULT messages
+ *    remap_anon_pages - to shuffle mapped pages into previously unallocated
+ *                       areas without creating loads of VMAs.
+ */
+
+#include <sys/mman.h>
+#include <sys/types.h>
+
+/* TODO remove once we have libc defs
+ * NOTE: These are x86-64 numbers for Andrea's 3.15.0 world */
+#ifndef MADV_USERFAULT
+#define MADV_USERFAULT   18
+#define MADV_NOUSERFAULT 19
+#endif
+
+#ifndef __NR_remap_anon_pages
+#define __NR_remap_anon_pages 317
+#endif
+
+int postcopy_ram_hosttest(void)
+{
+    /* TODO: Needs guarding with CONFIG_ once we have libc's that have the defs
+     *
+     * Try each syscall we need, but this isn't a testbench,
+     * just enough to see that we have the calls
+     */
+    void *testarea, *testarea2;
+    long pagesize = getpagesize();
+
+    testarea = mmap(NULL, pagesize, PROT_READ | PROT_WRITE, MAP_PRIVATE |
+                                    MAP_ANONYMOUS, -1, 0);
+    if (testarea == MAP_FAILED) {
+        perror("postcopy_ram_hosttest: Failed to map test area");
+        return -1;
+    }
+    g_assert(((size_t)testarea & (pagesize-1)) == 0);
+
+    if (madvise(testarea, pagesize, MADV_USERFAULT)) {
+        perror("postcopy_ram_hosttest: MADV_USERFAULT not available");
+        munmap(testarea, pagesize);
+        return -1;
+    }
+
+    if (madvise(testarea, pagesize, MADV_NOUSERFAULT)) {
+        perror("postcopy_ram_hosttest: MADV_NOUSERFAULT not available");
+        munmap(testarea, pagesize);
+        return -1;
+    }
+
+    testarea2 = mmap(NULL, pagesize, PROT_READ | PROT_WRITE, MAP_PRIVATE |
+                                     MAP_ANONYMOUS, -1, 0);
+    if (testarea2 == MAP_FAILED) {
+        perror("postcopy_ram_hosttest: Failed to map second test area");
+        return -1;
+    }
+    g_assert(((size_t)testarea2 & (pagesize-1)) == 0);
+    *(char *)testarea = 0; /* Force the map of the new page */
+    if (syscall(__NR_remap_anon_pages, testarea2, testarea, pagesize, 0) !=
+        pagesize) {
+        perror("postcopy_ram_hosttest: remap_anon_pages not available");
+        munmap(testarea, pagesize);
+        munmap(testarea2, pagesize);
+        return -1;
+    }
+
+    munmap(testarea, pagesize);
+    munmap(testarea2, pagesize);
+    return 0;
+}
+
+#else
+/* No target OS support, stubs just fail */
+
+int postcopy_ram_hosttest(void)
+{
+    error_report("postcopy_ram_hosttest: No OS support");
+    return -1;
+}
+
+#endif
+
-- 
1.9.3

* [Qemu-devel] [PATCH 22/46] Migration parameters: Add qmp/hmp commands for setting/viewing
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (20 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 21/46] postcopy: OS support test Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-07 19:50   ` Eric Blake
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 23/46] MIG_STATE_POSTCOPY_ACTIVE: Add new migration state Dr. David Alan Gilbert (git)
                   ` (24 subsequent siblings)
  46 siblings, 1 reply; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add a single command for setting the various migration parameters,
as suggested in the thread:
http://lists.gnu.org/archive/html/qemu-devel/2012-11/msg00243.html

There are many existing migration parameters that are scattered over
many individual commands; moving those to this scheme would probably break
things for others, so I've left them be.

Preserve the migration tunable values across the reinit of the migration
status in the same way that the capability flags are preserved.

Add a completion routine for it.

Use the postcopy timeout setting as the first parameter.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
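To illustrate the "preserve the tunables across the reinit" point in the
commit message, here is a minimal standalone sketch. It is not the QEMU
code: SketchMigrationState, the PARAM_* enum and both function names are
hypothetical stand-ins for MigrationState, the generated
MigrationParameterName enum, and the set/init paths touched below.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the generated QAPI enum and MigrationState. */
enum { PARAM_X_POSTCOPY_START_TIME, PARAM_MAX };

typedef struct {
    int state;                    /* per-migration status, reset on init */
    int64_t total_time;           /* per-migration status, reset on init */
    int64_t tunables[PARAM_MAX];  /* user settings, must survive init */
} SketchMigrationState;

/* Store one tunable, as the set command does for each list entry. */
static void sketch_set_parameter(SketchMigrationState *s, int param,
                                 int64_t value)
{
    assert(param >= 0 && param < PARAM_MAX);
    s->tunables[param] = value;
}

/* Clear the per-migration state but carry the tunables across the memset. */
static void sketch_migrate_init(SketchMigrationState *s)
{
    int64_t tunables[PARAM_MAX];

    memcpy(tunables, s->tunables, sizeof(tunables));
    memset(s, 0, sizeof(*s));
    memcpy(s->tunables, tunables, sizeof(tunables));
}
```

Saving to a local copy and restoring after the memset keeps the "zero
everything" reset simple; the cost is that each new user-visible setting
needs its own save/restore pair, as the capability flags already do.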
 hmp-commands.hx               | 17 ++++++++++
 hmp.c                         | 54 ++++++++++++++++++++++++++++++
 hmp.h                         |  4 +++
 include/migration/migration.h |  4 +++
 migration.c                   | 78 ++++++++++++++++++++++++++++++++++---------
 monitor.c                     | 25 ++++++++++++++
 qapi-schema.json              | 50 +++++++++++++++++++++++++++
 qmp-commands.hx               | 23 +++++++++++++
 8 files changed, 239 insertions(+), 16 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index d0943b1..4098a52 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -987,6 +987,21 @@ Enable/Disable the usage of a capability @var{capability} for migration.
 ETEXI
 
     {
+        .name       = "migrate_set_parameter",
+        .args_type  = "parameter:s,value:l",
+        .params     = "parameter value",
+        .help       = "Change the value of the given parameter",
+        .mhandler.cmd = hmp_migrate_set_parameter,
+        .command_completion = migrate_set_parameter_completion,
+    },
+
+STEXI
+@item migrate_set_parameter @var{parameter} @var{value}
+@findex migrate_set_parameter
+Change the value of a migration parameter @var{parameter}.
+ETEXI
+
+    {
         .name       = "client_migrate_info",
         .args_type  = "protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?",
         .params     = "protocol hostname port tls-port cert-subject",
@@ -1770,6 +1785,8 @@ show migration status
 show current migration capabilities
 @item info migrate_cache_size
 show current migration XBZRLE cache size
+@item info migrate_parameters
+show current migration parameters
 @item info balloon
 show balloon information
 @item info qtree
diff --git a/hmp.c b/hmp.c
index 4d1838e..1d14aa4 100644
--- a/hmp.c
+++ b/hmp.c
@@ -156,6 +156,9 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
         monitor_printf(mon, "\n");
     }
 
+    if (info->has_status)
+        hmp_info_migrate_parameters(mon, NULL);
+
     if (info->has_status) {
         monitor_printf(mon, "Migration status: %s\n", info->status);
         monitor_printf(mon, "total time: %" PRIu64 " milliseconds\n",
@@ -252,6 +255,25 @@ void hmp_info_migrate_cache_size(Monitor *mon, const QDict *qdict)
                    qmp_query_migrate_cache_size(NULL) >> 10);
 }
 
+void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
+{
+    MigrationParameterList *params, *param;
+
+    params = qmp_query_migrate_parameters(NULL);
+
+    if (params) {
+        monitor_printf(mon, "parameters: ");
+        for (param = params; param; param = param->next) {
+            monitor_printf(mon, "%s: %" PRIu64 " ",
+                       MigrationParameterName_lookup[param->value->parameter],
+                       param->value->value);
+        }
+        monitor_printf(mon, "\n");
+    }
+
+    qapi_free_MigrationParameterList(params);
+}
+
 void hmp_info_cpus(Monitor *mon, const QDict *qdict)
 {
     CpuInfoList *cpu_list, *cpu;
@@ -1077,6 +1099,38 @@ void hmp_migrate_set_capability(Monitor *mon, const QDict *qdict)
     }
 }
 
+void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
+{
+    const char *param_name = qdict_get_str(qdict, "parameter");
+    int64_t value = qdict_get_int(qdict, "value");
+    Error *err = NULL;
+    MigrationParameterList *params = g_malloc0(sizeof(*params));
+    int i;
+
+    for (i = 0; i < MIGRATION_PARAMETER_NAME_MAX; i++) {
+        if (strcmp(param_name, MigrationParameterName_lookup[i]) == 0) {
+            params->value = g_malloc0(sizeof(*params->value));
+            params->value->parameter = i;
+            params->value->value = value;
+            params->next = NULL;
+            qmp_migrate_set_parameters(params, &err);
+            break;
+        }
+    }
+
+    if (i == MIGRATION_PARAMETER_NAME_MAX) {
+        error_set(&err, QERR_INVALID_PARAMETER, param_name);
+    }
+
+    qapi_free_MigrationParameterList(params);
+
+    if (err) {
+        monitor_printf(mon, "migrate_set_parameter: %s\n",
+                       error_get_pretty(err));
+        error_free(err);
+    }
+}
+
 void hmp_set_password(Monitor *mon, const QDict *qdict)
 {
     const char *protocol  = qdict_get_str(qdict, "protocol");
diff --git a/hmp.h b/hmp.h
index 4fd3c4a..609241c 100644
--- a/hmp.h
+++ b/hmp.h
@@ -29,6 +29,7 @@ void hmp_info_mice(Monitor *mon, const QDict *qdict);
 void hmp_info_migrate(Monitor *mon, const QDict *qdict);
 void hmp_info_migrate_capabilities(Monitor *mon, const QDict *qdict);
 void hmp_info_migrate_cache_size(Monitor *mon, const QDict *qdict);
+void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict);
 void hmp_info_cpus(Monitor *mon, const QDict *qdict);
 void hmp_info_block(Monitor *mon, const QDict *qdict);
 void hmp_info_blockstats(Monitor *mon, const QDict *qdict);
@@ -64,6 +65,7 @@ void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_capability(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_cache_size(Monitor *mon, const QDict *qdict);
+void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict);
 void hmp_set_password(Monitor *mon, const QDict *qdict);
 void hmp_expire_password(Monitor *mon, const QDict *qdict);
 void hmp_eject(Monitor *mon, const QDict *qdict);
@@ -110,6 +112,8 @@ void watchdog_action_completion(ReadLineState *rs, int nb_args,
                                 const char *str);
 void migrate_set_capability_completion(ReadLineState *rs, int nb_args,
                                        const char *str);
+void migrate_set_parameter_completion(ReadLineState *rs, int nb_args,
+                                      const char *str);
 void host_net_add_completion(ReadLineState *rs, int nb_args, const char *str);
 void host_net_remove_completion(ReadLineState *rs, int nb_args,
                                 const char *str);
diff --git a/include/migration/migration.h b/include/migration/migration.h
index cf66921..d2754f7 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -35,6 +35,7 @@
 #define QEMU_VM_SUBSECTION           0x05
 #define QEMU_VM_COMMAND              0x06
 
+/* This is state currently only used by block */
 struct MigrationParams {
     bool blk;
     bool shared;
@@ -75,6 +76,7 @@ struct MigrationIncomingState {
 MigrationIncomingState *migration_incoming_state_init(QEMUFile *f);
 void migration_incoming_state_destroy(MigrationIncomingState *mis);
 
+/* State for the outgoing migration */
 struct MigrationState
 {
     int64_t bandwidth_limit;
@@ -95,6 +97,8 @@ struct MigrationState
     int64_t dirty_pages_rate;
     int64_t dirty_bytes_rate;
     bool enabled_capabilities[MIGRATION_CAPABILITY_MAX];
+    /* For migrate_set_parameters command */
+    int64_t tunables[MIGRATION_PARAMETER_NAME_MAX];
     int64_t xbzrle_cache_size;
     int64_t setup_time;
     int64_t dirty_sync_count;
diff --git a/migration.c b/migration.c
index 8343679..20ed6fa 100644
--- a/migration.c
+++ b/migration.c
@@ -226,6 +226,64 @@ MigrationCapabilityStatusList *qmp_query_migrate_capabilities(Error **errp)
     return head;
 }
 
+void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
+                                  Error **errp)
+{
+    MigrationState *s = migrate_get_current();
+    MigrationCapabilityStatusList *cap;
+
+    if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP) {
+        error_set(errp, QERR_MIGRATION_ACTIVE);
+        return;
+    }
+
+    for (cap = params; cap; cap = cap->next) {
+        s->enabled_capabilities[cap->value->capability] = cap->value->state;
+    }
+}
+
+void qmp_migrate_set_parameters(MigrationParameterList *params,
+                                  Error **errp)
+{
+    MigrationState *s = migrate_get_current();
+    MigrationParameterList *parm;
+
+    if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP) {
+        error_set(errp, QERR_MIGRATION_ACTIVE);
+        return;
+    }
+
+    for (parm = params; parm; parm = parm->next) {
+        s->tunables[parm->value->parameter] = parm->value->value;
+    }
+}
+
+MigrationParameterList *qmp_query_migrate_parameters(Error **errp)
+{
+    MigrationParameterList *head = NULL;
+    MigrationParameterList *parms;
+    MigrationState *s = migrate_get_current();
+    int i;
+
+    parms = NULL;
+    for (i = 0; i < MIGRATION_PARAMETER_NAME_MAX; i++) {
+        if (head == NULL) {
+            head = g_malloc0(sizeof(*parms));
+            parms = head;
+        } else {
+            parms->next = g_malloc0(sizeof(*parms));
+            parms = parms->next;
+        }
+        parms->value =
+            g_malloc(sizeof(*parms->value));
+        parms->value->parameter = i;
+        parms->value->value = s->tunables[i];
+    }
+
+    return head;
+}
+
+
 static void get_xbzrle_cache_stats(MigrationInfo *info)
 {
     if (migrate_use_xbzrle()) {
@@ -326,22 +384,6 @@ MigrationInfo *qmp_query_migrate(Error **errp)
     return info;
 }
 
-void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
-                                  Error **errp)
-{
-    MigrationState *s = migrate_get_current();
-    MigrationCapabilityStatusList *cap;
-
-    if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP) {
-        error_set(errp, QERR_MIGRATION_ACTIVE);
-        return;
-    }
-
-    for (cap = params; cap; cap = cap->next) {
-        s->enabled_capabilities[cap->value->capability] = cap->value->state;
-    }
-}
-
 /* shared migration helpers */
 
 static void migrate_set_state(MigrationState *s, int old_state, int new_state)
@@ -447,15 +489,19 @@ MigrationState *migrate_init(const MigrationParams *params)
     MigrationState *s = migrate_get_current();
     int64_t bandwidth_limit = s->bandwidth_limit;
     bool enabled_capabilities[MIGRATION_CAPABILITY_MAX];
+    int64_t tunables[MIGRATION_PARAMETER_NAME_MAX];
     int64_t xbzrle_cache_size = s->xbzrle_cache_size;
 
+    /* Preserve user settings across this clear */
     memcpy(enabled_capabilities, s->enabled_capabilities,
            sizeof(enabled_capabilities));
+    memcpy(tunables, s->tunables, sizeof(tunables));
 
     memset(s, 0, sizeof(*s));
     s->params = *params;
     memcpy(s->enabled_capabilities, enabled_capabilities,
            sizeof(enabled_capabilities));
+    memcpy(s->tunables, tunables, sizeof(tunables));
     s->xbzrle_cache_size = xbzrle_cache_size;
 
     s->bandwidth_limit = bandwidth_limit;
diff --git a/monitor.c b/monitor.c
index 5bc70a6..f354b7b 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2869,6 +2869,13 @@ static mon_cmd_t info_cmds[] = {
         .mhandler.cmd = hmp_info_migrate_cache_size,
     },
     {
+        .name       = "migrate_parameters",
+        .args_type  = "",
+        .params     = "",
+        .help       = "show current migration parameters",
+        .mhandler.cmd = hmp_info_migrate_parameters,
+    },
+    {
         .name       = "balloon",
         .args_type  = "",
         .params     = "",
@@ -4553,6 +4560,24 @@ void migrate_set_capability_completion(ReadLineState *rs, int nb_args,
     }
 }
 
+void migrate_set_parameter_completion(ReadLineState *rs, int nb_args,
+                                      const char *str)
+{
+    size_t len;
+
+    len = strlen(str);
+    readline_set_completion_index(rs, len);
+    if (nb_args == 2) {
+        int i;
+        for (i = 0; i < MIGRATION_PARAMETER_NAME_MAX; i++) {
+            const char *name = MigrationParameterName_lookup[i];
+            if (!strncmp(str, name, len)) {
+                readline_add_completion(rs, name);
+            }
+        }
+    }
+}
+
 void host_net_add_completion(ReadLineState *rs, int nb_args, const char *str)
 {
     int i;
diff --git a/qapi-schema.json b/qapi-schema.json
index eac3739..678ad26 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -538,6 +538,56 @@
 { 'command': 'query-migrate-capabilities', 'returns':   ['MigrationCapabilityStatus']}
 
 ##
+# @MigrationParameterName
+#
+# Migration parameter enumeration
+#     Most existing parameters have separate commands/entries but it seems
+#     better to group them in the same way as the capability flags
+#
+# @x-postcopy-start-time: Time (in ms) after the start of migration to consider
+#                         switching to postcopy mode
+#
+# Since: 2.0
+##
+{ 'enum': 'MigrationParameterName',
+  'data': ['x-postcopy-start-time'] }
+
+##
+# @MigrationParameter
+# @parameter: parameter enum
+#
+# @value: value int
+#
+# Since: 2.0
+##
+{ 'type': 'MigrationParameter',
+  'data': {'parameter': 'MigrationParameterName', 'value': 'int' } }
+
+##
+# @migrate-set-parameters
+#
+# Change the given migration parameter
+#
+# @parameters: json array of parameters to be changed
+#
+# Since: 2.0
+##
+{ 'command': 'migrate-set-parameters',
+   'data': { 'parameters': ['MigrationParameter']}}
+
+##
+# @query-migrate-parameters
+#
+# Returns the current settings of the migration parameters
+#
+# Returns: @MigrationParameter
+#
+# Since: 2.0
+##
+{ 'command': 'query-migrate-parameters',
+  'returns': ['MigrationParameter']}
+
+##
 # @MouseInfo:
 #
 # Information about a mouse device.
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 4be4765..7931c6a 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -3202,6 +3202,29 @@ EQMP
 	.mhandler.cmd_new = qmp_marshal_input_migrate_set_capabilities,
     },
 SQMP
+migrate-set-parameters
+------------------------
+
+Set migration parameters
+
+- "x-postcopy-start-time": Postcopy start time
+
+Arguments:
+
+Example:
+
+-> { "execute": "migrate-set-parameters" , "arguments":
+     { "parameters": [ { "parameter": "x-postcopy-start-time", "value": 30 } ] } }
+
+EQMP
+
+    {
+        .name       = "migrate-set-parameters",
+        .args_type  = "parameters:O",
+        .params     = "parameter:s,value:i",
+	.mhandler.cmd_new = qmp_marshal_input_migrate_set_parameters,
+    },
+SQMP
 query-migrate-capabilities
 --------------------------
 
-- 
1.9.3

* [Qemu-devel] [PATCH 23/46] MIG_STATE_POSTCOPY_ACTIVE: Add new migration state
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (21 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 22/46] Migration parameters: Add qmp/hmp commands for setting/viewing Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 24/46] qemu_savevm_state_complete: Postcopy changes Dr. David Alan Gilbert (git)
                   ` (23 subsequent siblings)
  46 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

'MIG_STATE_POSTCOPY_ACTIVE' is entered after the precopy time limit
has expired and migration switches to postcopy.

'migration_postcopy_phase' is provided for other sections to know if
they're in postcopy.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
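As a rough illustration of how the new state feeds the "already active"
and "in postcopy" checks, here is a standalone sketch. The SKETCH_* enum
mirrors the ordering in the patch, but the names and the
sketch_set_tunable() helper are hypothetical and exist only for this
example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the enum in the patch; the SKETCH_ prefix marks it illustrative. */
enum SketchMigState {
    SKETCH_ERROR = -1,
    SKETCH_NONE,
    SKETCH_SETUP,
    SKETCH_CANCELLING,
    SKETCH_CANCELLED,
    SKETCH_ACTIVE,
    SKETCH_POSTCOPY_ACTIVE,
    SKETCH_COMPLETED,
};

/* True for any state in which a migration is set up or running. */
static bool sketch_already_active(enum SketchMigState st)
{
    switch (st) {
    case SKETCH_SETUP:
    case SKETCH_ACTIVE:
    case SKETCH_POSTCOPY_ACTIVE:
        return true;
    default:
        return false;
    }
}

/* True only once the migration has switched to postcopy. */
static bool sketch_postcopy_phase(enum SketchMigState st)
{
    return st == SKETCH_POSTCOPY_ACTIVE;
}

/* Refuse to change a setting mid-migration: 0 on success, -1 if busy. */
static int sketch_set_tunable(enum SketchMigState st, long *slot, long value)
{
    assert(slot != NULL);
    if (sketch_already_active(st)) {
        return -1;
    }
    *slot = value;
    return 0;
}
```

Putting the state test in one helper means a future state only has to be
added in one place rather than at every open-coded comparison.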
 include/migration/migration.h |  2 ++
 migration.c                   | 76 +++++++++++++++++++++++++++++++++++++++----
 2 files changed, 71 insertions(+), 7 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index d2754f7..71442d8 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -146,6 +146,8 @@ MigrationState *migrate_init(const MigrationParams *params);
 bool migration_in_setup(MigrationState *);
 bool migration_has_finished(MigrationState *);
 bool migration_has_failed(MigrationState *);
+/* True if outgoing migration has entered postcopy phase */
+bool migration_postcopy_phase(MigrationState *);
 MigrationState *migrate_get_current(void);
 
 uint64_t ram_bytes_remaining(void);
diff --git a/migration.c b/migration.c
index 20ed6fa..d9a9e5b 100644
--- a/migration.c
+++ b/migration.c
@@ -38,13 +38,14 @@
     do { } while (0)
 #endif
 
-enum {
+enum MigrationPhase {
     MIG_STATE_ERROR = -1,
     MIG_STATE_NONE,
     MIG_STATE_SETUP,
     MIG_STATE_CANCELLING,
     MIG_STATE_CANCELLED,
     MIG_STATE_ACTIVE,
+    MIG_STATE_POSTCOPY_ACTIVE,
     MIG_STATE_COMPLETED,
 };
 
@@ -226,13 +227,30 @@ MigrationCapabilityStatusList *qmp_query_migrate_capabilities(Error **errp)
     return head;
 }
 
+/* Return true if we're already in the middle of a migration
+ * (i.e. any of the active or setup states)
+ */
+static bool migration_already_active(MigrationState *ms)
+{
+    switch (ms->state) {
+    case MIG_STATE_ACTIVE:
+    case MIG_STATE_POSTCOPY_ACTIVE:
+    case MIG_STATE_SETUP:
+        return true;
+
+    default:
+        return false;
+
+    }
+}
+
 void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
                                   Error **errp)
 {
     MigrationState *s = migrate_get_current();
     MigrationCapabilityStatusList *cap;
 
-    if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP) {
+    if (migration_already_active(s)) {
         error_set(errp, QERR_MIGRATION_ACTIVE);
         return;
     }
@@ -248,7 +266,7 @@ void qmp_migrate_set_parameters(MigrationParameterList *params,
     MigrationState *s = migrate_get_current();
     MigrationParameterList *parm;
 
-    if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP) {
+    if (migration_already_active(s)) {
         error_set(errp, QERR_MIGRATION_ACTIVE);
         return;
     }
@@ -347,6 +365,40 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 
         get_xbzrle_cache_stats(info);
         break;
+    case MIG_STATE_POSTCOPY_ACTIVE:
+        /* Mostly the same as active; TODO add some postcopy stats */
+        info->has_status = true;
+        info->status = g_strdup("postcopy-active");
+        info->has_total_time = true;
+        info->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME)
+            - s->total_time;
+        info->has_expected_downtime = true;
+        info->expected_downtime = s->expected_downtime;
+        info->has_setup_time = true;
+        info->setup_time = s->setup_time;
+
+        info->has_ram = true;
+        info->ram = g_malloc0(sizeof(*info->ram));
+        info->ram->transferred = ram_bytes_transferred();
+        info->ram->remaining = ram_bytes_remaining();
+        info->ram->total = ram_bytes_total();
+        info->ram->duplicate = dup_mig_pages_transferred();
+        info->ram->skipped = skipped_mig_pages_transferred();
+        info->ram->normal = norm_mig_pages_transferred();
+        info->ram->normal_bytes = norm_mig_bytes_transferred();
+        info->ram->dirty_pages_rate = s->dirty_pages_rate;
+        info->ram->mbps = s->mbps;
+
+        if (blk_mig_active()) {
+            info->has_disk = true;
+            info->disk = g_malloc0(sizeof(*info->disk));
+            info->disk->transferred = blk_mig_bytes_transferred();
+            info->disk->remaining = blk_mig_bytes_remaining();
+            info->disk->total = blk_mig_bytes_total();
+        }
+
+        get_xbzrle_cache_stats(info);
+        break;
     case MIG_STATE_COMPLETED:
         get_xbzrle_cache_stats(info);
 
@@ -423,7 +475,8 @@ static void migrate_fd_cleanup(void *opaque)
         s->file = NULL;
     }
 
-    assert(s->state != MIG_STATE_ACTIVE);
+    assert((s->state != MIG_STATE_ACTIVE) &&
+           (s->state != MIG_STATE_POSTCOPY_ACTIVE));
 
     if (s->state != MIG_STATE_COMPLETED) {
         qemu_savevm_state_cancel();
@@ -451,7 +504,8 @@ static void migrate_fd_cancel(MigrationState *s)
 
     do {
         old_state = s->state;
-        if (old_state != MIG_STATE_SETUP && old_state != MIG_STATE_ACTIVE) {
+        if (old_state != MIG_STATE_SETUP && old_state != MIG_STATE_ACTIVE &&
+            old_state != MIG_STATE_POSTCOPY_ACTIVE) {
             break;
         }
         migrate_set_state(s, old_state, MIG_STATE_CANCELLING);
@@ -484,6 +538,11 @@ bool migration_has_failed(MigrationState *s)
             s->state == MIG_STATE_ERROR);
 }
 
+bool migration_postcopy_phase(MigrationState *s)
+{
+    return (s->state == MIG_STATE_POSTCOPY_ACTIVE);
+}
+
 MigrationState *migrate_init(const MigrationParams *params)
 {
     MigrationState *s = migrate_get_current();
@@ -536,7 +595,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     params.blk = has_blk && blk;
     params.shared = has_inc && inc;
 
-    if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP ||
+    if (migration_already_active(s) ||
         s->state == MIG_STATE_CANCELLING) {
         error_set(errp, QERR_MIGRATION_ACTIVE);
         return;
@@ -847,7 +906,10 @@ static void *migration_thread(void *opaque)
     s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
     migrate_set_state(s, MIG_STATE_SETUP, MIG_STATE_ACTIVE);
 
-    while (s->state == MIG_STATE_ACTIVE) {
+    DPRINTF("setup complete\n");
+
+    while (s->state == MIG_STATE_ACTIVE ||
+           s->state == MIG_STATE_POSTCOPY_ACTIVE) {
         int64_t current_time;
         uint64_t pending_size;
 
-- 
1.9.3

* [Qemu-devel] [PATCH 24/46] qemu_savevm_state_complete: Postcopy changes
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (22 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 23/46] MIG_STATE_POSTCOPY_ACTIVE: Add new migration state Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 25/46] Postcopy: Maintain sentmap during postcopy pre phase Dr. David Alan Gilbert (git)
                   ` (22 subsequent siblings)
  46 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

When postcopy calls qemu_savevm_state_complete it's not really
the end of migration, so skip:
   a) Finishing postcopiable iterative devices - they'll carry on
   b) The termination byte on the end of the stream.

We then also add:
  qemu_savevm_state_postcopy_complete
which is called at the end of a postcopy migration to call the
complete methods on devices skipped in the _complete call.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
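The split described in the commit message can be sketched as a toy model.
Everything here is an assumption made for illustration: the Sketch* types,
the sk_* functions and the one-byte 'S'/'E' markers stand in for the real
handler list, QEMU_VM_SECTION_END and QEMU_VM_EOF, and the sketch leaves
out the postcopy-ram-end command that the real code sends before the EOF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical one-byte markers standing in for the real section types. */
enum { SK_SECTION_END = 'S', SK_EOF = 'E' };

typedef struct {
    char id;            /* one-letter section id written after the marker */
    bool can_postcopy;  /* true for handlers that carry on during postcopy */
} SketchHandler;

typedef struct {
    char buf[64];       /* kept NUL-terminated so tests can use strcmp */
    size_t len;
} SketchStream;

static void sk_put(SketchStream *f, char c)
{
    assert(f->len < sizeof(f->buf) - 1);
    f->buf[f->len++] = c;
    f->buf[f->len] = '\0';
}

/* Plain completion: in postcopy, skip postcopiable handlers and the EOF. */
static void sk_state_complete(SketchStream *f, const SketchHandler *h,
                              size_t n, bool in_postcopy)
{
    for (size_t i = 0; i < n; i++) {
        if (in_postcopy && h[i].can_postcopy) {
            continue;
        }
        sk_put(f, SK_SECTION_END);
        sk_put(f, h[i].id);
    }
    if (!in_postcopy) {
        sk_put(f, SK_EOF);
    }
}

/* End of postcopy: finish the handlers skipped above, then terminate. */
static void sk_postcopy_complete(SketchStream *f, const SketchHandler *h,
                                 size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!h[i].can_postcopy) {
            continue;
        }
        sk_put(f, SK_SECTION_END);
        sk_put(f, h[i].id);
    }
    sk_put(f, SK_EOF);
}
```

With one postcopiable handler 'r' and one ordinary handler 'd', the
postcopy path writes the 'd' section at the switch and the 'r' section
plus the terminator at the very end, whereas a plain precopy completion
writes both sections and the terminator in a single call.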
 include/sysemu/sysemu.h |  1 +
 savevm.c                | 52 ++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index dc53580..ce52c0a 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -112,6 +112,7 @@ void qemu_savevm_state_cancel(void);
 void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size,
                                uint64_t *res_non_postcopiable,
                                uint64_t *res_postcopiable);
+void qemu_savevm_state_postcopy_complete(QEMUFile *f);
 void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
                               uint16_t len, uint8_t *data);
 void qemu_savevm_send_reqack(QEMUFile *f, uint32_t value);
diff --git a/savevm.c b/savevm.c
index d8af526..a2c5fc8 100644
--- a/savevm.c
+++ b/savevm.c
@@ -845,10 +845,51 @@ int qemu_savevm_state_iterate(QEMUFile *f)
     return ret;
 }
 
+/*
+ * Calls the complete routines just for those devices that are postcopiable;
+ * causing the last few pages to be sent immediately and doing any associated
+ * cleanup.
+ * Note postcopy also calls the plain qemu_savevm_state_complete to complete
+ * all the other devices, but that happens at the point we switch to postcopy.
+ */
+void qemu_savevm_state_postcopy_complete(QEMUFile *f)
+{
+    SaveStateEntry *se;
+    int ret;
+
+    QTAILQ_FOREACH(se, &savevm_handlers, entry) {
+        if (!se->ops || !se->ops->save_live_complete ||
+            !se->ops->can_postcopy) {
+            continue;
+        }
+        if (se->ops && se->ops->is_active) {
+            if (!se->ops->is_active(se->opaque)) {
+                continue;
+            }
+        }
+        trace_savevm_section_start(se->idstr, se->section_id);
+        /* Section type */
+        qemu_put_byte(f, QEMU_VM_SECTION_END);
+        qemu_put_be32(f, se->section_id);
+
+        ret = se->ops->save_live_complete(f, se->opaque);
+        trace_savevm_section_end(se->idstr, se->section_id);
+        if (ret < 0) {
+            qemu_file_set_error(f, ret);
+            return;
+        }
+    }
+
+    qemu_savevm_send_postcopy_ram_end(f, 0 /* Good */);
+    qemu_put_byte(f, QEMU_VM_EOF);
+    qemu_fflush(f);
+}
+
 void qemu_savevm_state_complete(QEMUFile *f)
 {
     SaveStateEntry *se;
     int ret;
+    bool in_postcopy = migration_postcopy_phase(migrate_get_current());
 
     trace_savevm_state_complete();
 
@@ -863,6 +904,11 @@ void qemu_savevm_state_complete(QEMUFile *f)
                 continue;
             }
         }
+        if (in_postcopy && se->ops &&  se->ops->can_postcopy &&
+            se->ops->can_postcopy(se->opaque)) {
+            DPRINTF("%s: Skipping %s in postcopy", __func__, se->idstr);
+            continue;
+        }
         trace_savevm_section_start(se->idstr, se->section_id);
         /* Section type */
         qemu_put_byte(f, QEMU_VM_SECTION_END);
@@ -899,7 +945,11 @@ void qemu_savevm_state_complete(QEMUFile *f)
         trace_savevm_section_end(se->idstr, se->section_id);
     }
 
-    qemu_put_byte(f, QEMU_VM_EOF);
+    if (!in_postcopy) {
+        /* Postcopy stream will still be going */
+        qemu_put_byte(f, QEMU_VM_EOF);
+    }
+
     qemu_fflush(f);
 }
 
-- 
1.9.3

* [Qemu-devel] [PATCH 25/46] Postcopy: Maintain sentmap during postcopy pre phase
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (23 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 24/46] qemu_savevm_state_complete: Postcopy changes Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 26/46] Postcopy page-map-incoming (PMI) structure Dr. David Alan Gilbert (git)
                   ` (21 subsequent siblings)
  46 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Where postcopy is preceded by a period of precopy, the destination will
have received pages that may have been dirtied on the source after the
page was sent.  The destination must throw these pages away before
starting its CPUs.

Maintain a 'sentmap' of pages that have already been sent.
Calculate the list of sent & dirty pages.
Provide helpers on the destination side to discard these.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
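The "sent & dirty" calculation amounts to ANDing the sentmap with the
dirty bitmap. The following standalone sketch shows the idea; the sk_*
helpers are hypothetical stand-ins for QEMU's bitmap functions, and unlike
ram_mask_postcopy_bitmap() in the patch (which returns the bitmap length)
this version returns the number of pages left to discard.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SK_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SK_BITMAP_LONGS(nbits) \
    (((nbits) + SK_BITS_PER_LONG - 1) / SK_BITS_PER_LONG)

/* Mark page 'nr' in a bitmap (one bit per target page). */
static void sk_set_bit(size_t nr, unsigned long *map)
{
    map[nr / SK_BITS_PER_LONG] |= 1UL << (nr % SK_BITS_PER_LONG);
}

/* Return 1 if page 'nr' is marked in the bitmap, else 0. */
static int sk_test_bit(size_t nr, const unsigned long *map)
{
    return (int)((map[nr / SK_BITS_PER_LONG] >> (nr % SK_BITS_PER_LONG)) & 1UL);
}

/*
 * sentmap &= dirty.  Afterwards a set bit means "sent during precopy and
 * dirtied on the source since", i.e. a stale page that the destination
 * has to discard.  Returns how many such pages there are.
 */
static size_t sk_mask_sentmap(unsigned long *sentmap,
                              const unsigned long *dirty, size_t npages)
{
    size_t count = 0;

    for (size_t i = 0; i < SK_BITMAP_LONGS(npages); i++) {
        sentmap[i] &= dirty[i];
    }
    for (size_t page = 0; page < npages; page++) {
        count += (size_t)sk_test_bit(page, sentmap);
    }
    return count;
}
```

Pages that are dirty but were never sent drop out of the result: the
destination has no copy of them, so there is nothing to discard and they
simply get transferred during postcopy.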
 arch_init.c                      | 162 ++++++++++++++++++++++++++++++++++++++-
 include/migration/migration.h    |   5 ++
 include/migration/postcopy-ram.h |  20 +++++
 migration.c                      |   2 +
 postcopy-ram.c                   | 156 +++++++++++++++++++++++++++++++++++++
 savevm.c                         |   3 -
 6 files changed, 342 insertions(+), 6 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index aeeaf37..134ea7e 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -40,6 +40,7 @@
 #include "hw/audio/audio.h"
 #include "sysemu/kvm.h"
 #include "migration/migration.h"
+#include "migration/postcopy-ram.h"
 #include "hw/i386/smbios.h"
 #include "exec/address-spaces.h"
 #include "hw/audio/pcspk.h"
@@ -413,9 +414,15 @@ static int save_xbzrle_page(QEMUFile *f, uint8_t **current_data,
     return bytes_sent;
 }
 
+/* mr: The region to search for dirty pages in
+ * start: Start address (typically so we can continue from previous page)
+ * bitoffset: Pointer into which to store the offset into the dirty map
+ *            at which the bit was found.
+ */
 static inline
 ram_addr_t migration_bitmap_find_and_reset_dirty(MemoryRegion *mr,
-                                                 ram_addr_t start)
+                                                 ram_addr_t start,
+                                                 unsigned long *bitoffset)
 {
     unsigned long base = mr->ram_addr >> TARGET_PAGE_BITS;
     unsigned long nr = base + (start >> TARGET_PAGE_BITS);
@@ -434,6 +441,7 @@ ram_addr_t migration_bitmap_find_and_reset_dirty(MemoryRegion *mr,
         clear_bit(next, migration_bitmap);
         migration_dirty_pages--;
     }
+    *bitoffset = next;
     return (next - base) << TARGET_PAGE_BITS;
 }
 
@@ -562,6 +570,19 @@ static void migration_bitmap_sync(void)
     }
 }
 
+static RAMBlock *ram_find_block(const char *id)
+{
+    RAMBlock *block;
+
+    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+        if (!strcmp(id, block->idstr)) {
+            return block;
+        }
+    }
+
+    return NULL;
+}
+
 /*
  * ram_save_page: Send the given page to the stream
  *
@@ -650,13 +671,14 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage)
     bool complete_round = false;
     int bytes_sent = 0;
     MemoryRegion *mr;
+    unsigned long bitoffset;
 
     if (!block)
         block = QTAILQ_FIRST(&ram_list.blocks);
 
     while (true) {
         mr = block->mr;
-        offset = migration_bitmap_find_and_reset_dirty(mr, offset);
+        offset = migration_bitmap_find_and_reset_dirty(mr, offset, &bitoffset);
         if (complete_round && block == last_seen_block &&
             offset >= last_offset) {
             break;
@@ -674,6 +696,11 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage)
 
             /* if page is unmodified, continue to the next */
             if (bytes_sent > 0) {
+                MigrationState *s = migrate_get_current();
+                if (s->sentmap) {
+                    set_bit(bitoffset, s->sentmap);
+                }
+
                 last_sent_block = block;
                 break;
             }
@@ -733,12 +760,19 @@ void free_xbzrle_decoded_buf(void)
 
 static void migration_end(void)
 {
+    MigrationState *s = migrate_get_current();
+
     if (migration_bitmap) {
         memory_global_dirty_log_stop();
         g_free(migration_bitmap);
         migration_bitmap = NULL;
     }
 
+    if (s->sentmap) {
+        g_free(s->sentmap);
+        s->sentmap = NULL;
+    }
+
     XBZRLE_cache_lock();
     if (XBZRLE.cache) {
         cache_fini(XBZRLE.cache);
@@ -806,6 +840,123 @@ void ram_debug_dump_bitmap(unsigned long *todump, bool expected)
     }
 }
 
+/*
+ * Utility for the outgoing postcopy code; this performs
+ * sentmap &= migration_bitmap
+ * returning the length of the bitmap
+ */
+int64_t ram_mask_postcopy_bitmap(MigrationState *ms)
+{
+    int64_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
+
+    migration_bitmap_sync();
+    bitmap_and(ms->sentmap, ms->sentmap, migration_bitmap, ram_pages);
+    return ram_pages;
+}
+
+/*
+ * Utility for the outgoing postcopy code.
+ *   Calls postcopy_send_discard_bm_ram for each RAMBlock
+ *   passing it bitmap indexes and name.
+ * Returns: 0 on success
+ * (qemu_ram_foreach_block ends up passing unscaled lengths
+ *  which would mean postcopy code would have to deal with target page)
+ */
+int ram_postcopy_each_ram_discard(MigrationState *ms)
+{
+    struct RAMBlock *block;
+    int ret;
+
+    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+        /*
+         * Postcopy sends chunks of the bitmap over the wire, but at this
+         * point it only needs bit indexes; passing those keeps
+         * target-page-specific code out of postcopy-ram.c.
+         */
+        unsigned long first, last;
+        first = block->offset >> TARGET_PAGE_BITS;
+        last = (block->offset + (block->length-1)) >> TARGET_PAGE_BITS;
+        ret = postcopy_send_discard_bm_ram(ms, block->idstr, first, last);
+        if (ret) {
+            return ret;
+        }
+    }
+
+    return 0;
+}
+
+/*
+ * At the start of the postcopy phase of migration, any now-dirty
+ * precopied pages are discarded.
+ *
+ * start..end is an inclusive range of bits indexed in the source
+ *    VM's bitmap for this RAMBlock; source_target_page_bits tells
+ *    us what one of those bits represents.
+ *
+ * start/end are offsets from the start of the bitmap for RAMBlock 'block_name'
+ *
+ * Returns 0 on success.
+ */
+int ram_discard_range(MigrationIncomingState *mis,
+                      const char *block_name,
+                      int source_target_page_bits,
+                      uint64_t start, uint64_t end)
+{
+    assert(end >= start);
+    unsigned int bitdif;
+
+    RAMBlock *rb = ram_find_block(block_name);
+
+    if (!rb) {
+        error_report("ram_discard_range: Failed to find block '%s'",
+                     block_name);
+        return -1;
+    }
+
+    if (source_target_page_bits != TARGET_PAGE_BITS) {
+        if (source_target_page_bits < TARGET_PAGE_BITS) {
+            /*
+             * e.g. source is 4K and we're 64k - we'll have to discard
+             * on the larger boundary
+             * e.g. a range of  70K...132K we would discard from
+             * 64K..192K, so round start down, and end up
+             */
+            bitdif = TARGET_PAGE_BITS - source_target_page_bits;
+            start = start >> bitdif;
+            if (end & ((1<<bitdif)-1)) {
+                end = end >> bitdif;
+                end++;
+            } else {
+                end = end >> bitdif;
+            }
+
+        } else {
+            /* e.g. source is 64K and we're 4K - scale the indexes up */
+            bitdif = source_target_page_bits - TARGET_PAGE_BITS;
+
+            start = start << bitdif;
+            end = ((end + 1) << bitdif) - 1; /* last small page of 'end' */
+        }
+    }
+
+    uint64_t index_offset = rb->offset >> TARGET_PAGE_BITS;
+    postcopy_pmi_discard_range(mis, start + index_offset, (end - start) + 1);
+
+    /* +1 gives the byte after the end of the last page to be discarded */
+    ram_addr_t end_offset = (end+1) << TARGET_PAGE_BITS;
+    uint8_t *host_startaddr = rb->host + (start << TARGET_PAGE_BITS);
+    uint8_t *host_endaddr;
+
+    if (end_offset <= rb->length) {
+        host_endaddr   = rb->host + (end_offset-1);
+        return postcopy_ram_discard_range(mis, host_startaddr, host_endaddr);
+    } else {
+        error_report("ram_discard_range: Overrun block '%s' (%zu/%zu/%zu)",
+                     block_name, start, end, rb->length);
+        return -1;
+    }
+}
+
 static int ram_save_setup(QEMUFile *f, void *opaque)
 {
     RAMBlock *block;
@@ -844,7 +995,6 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 
         acct_clear();
     }
-
     qemu_mutex_lock_iothread();
     qemu_mutex_lock_ramlist();
     bytes_transferred = 0;
@@ -854,6 +1004,12 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     migration_bitmap = bitmap_new(ram_bitmap_pages);
     bitmap_set(migration_bitmap, 0, ram_bitmap_pages);
 
+    if (migrate_postcopy_ram()) {
+        MigrationState *s = migrate_get_current();
+        s->sentmap = bitmap_new(ram_bitmap_pages);
+        bitmap_clear(s->sentmap, 0, ram_bitmap_pages);
+    }
+
     /*
      * Count the total number of pages used by ram blocks not including any
      * gaps due to alignment or unplugs.
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 71442d8..2289254 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -171,6 +171,11 @@ double xbzrle_mig_cache_miss_rate(void);
 
 void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
 void ram_debug_dump_bitmap(unsigned long *todump, bool expected);
+int64_t ram_mask_postcopy_bitmap(MigrationState *ms);
+int ram_postcopy_each_ram_discard(MigrationState *ms);
+int ram_discard_range(MigrationIncomingState *mis, const char *block_name,
+                      int source_target_page_bits,
+                      uint64_t start, uint64_t end);
 
 /**
  * @migrate_add_blocker - prevent migration from proceeding
diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index dcd1afa..fe89a3c 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -13,7 +13,27 @@
 #ifndef QEMU_POSTCOPY_RAM_H
 #define QEMU_POSTCOPY_RAM_H
 
+#include "migration/migration.h"
+
 /* Return 0 if the host supports everything we need to do postcopy-ram */
 int postcopy_ram_hosttest(void);
 
+/* Send the list of sent-but-dirty pages */
+int postcopy_send_discard_bitmap(MigrationState *ms);
+
+/*
+ * Discard the contents of memory start..end inclusive.
+ * We can assume that, if we've been called, postcopy_ram_hosttest returned 0.
+ */
+int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
+                               uint8_t *end);
+
+
+/*
+ * Called back from arch_init's ram_postcopy_each_ram_discard to handle
+ * discarding one RAMBlock's pre-postcopy dirty pages
+ */
+int postcopy_send_discard_bm_ram(MigrationState *ms, const char *name,
+                                 unsigned long start, unsigned long end);
+
 #endif
diff --git a/migration.c b/migration.c
index d9a9e5b..ca0fd7b 100644
--- a/migration.c
+++ b/migration.c
@@ -22,6 +22,7 @@
 #include "block/block.h"
 #include "qemu/sockets.h"
 #include "migration/block.h"
+#include "migration/postcopy-ram.h"
 #include "qemu/thread.h"
 #include "qmp-commands.h"
 #include "trace.h"
@@ -928,6 +929,7 @@ static void *migration_thread(void *opaque)
             } else {
                 int ret;
 
+                DPRINTF("done iterating\n");
                 qemu_mutex_lock_iothread();
                 start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
                 qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
diff --git a/postcopy-ram.c b/postcopy-ram.c
index 1f3e6ea..ff6bdd6 100644
--- a/postcopy-ram.c
+++ b/postcopy-ram.c
@@ -23,6 +23,7 @@
 #include "qemu-common.h"
 #include "migration/migration.h"
 #include "migration/postcopy-ram.h"
+#include "sysemu/sysemu.h"
 
 //#define DEBUG_POSTCOPY
 
@@ -116,6 +117,21 @@ int postcopy_ram_hosttest(void)
     return 0;
 }
 
+/*
+ * Discard the contents of memory start..end inclusive.
+ * We can assume that, if we've been called, postcopy_ram_hosttest returned 0.
+ */
+int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
+                               uint8_t *end)
+{
+    if (madvise(start, (end-start)+1, MADV_DONTNEED)) {
+        perror("postcopy_ram_discard_range MADV_DONTNEED");
+        return -1;
+    }
+
+    return 0;
+}
+
 #else
 /* No target OS support, stubs just fail */
 
@@ -125,5 +141,145 @@ int postcopy_ram_hosttest(void)
     return -1;
 }
 
+int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
+                               uint8_t *end)
+{
+    error_report("postcopy_ram_discard_range: No OS support");
+    return -1;
+}
+#endif
+
+/* ------------------------------------------------------------------------- */
+/*
+ * A helper to get 64 bits from the sentmap; trivial for HOST_LONG_BITS=64
+ * messier for other sizes; pads with 0's at end if an unaligned end
+ *   check2nd32: True if it's safe to read the upper 32bits in a 32bit long
+ *               map
+ */
+static uint64_t get_64bits_sentmap(unsigned long *sentmap, bool check2nd32,
+                                   int64_t start)
+{
+    uint64_t result;
+#if HOST_LONG_BITS == 64
+    result = sentmap[start / 64];
+#elif HOST_LONG_BITS == 32
+    /*
+     * Irrespective of host endianness, sentmap[n] is for pages earlier
+     * than sentmap[n+1] so we can't just cast up
+     */
+    uint32_t sm0, sm1;
+    sm0 = sentmap[start / 32];
+    sm1 = check2nd32 ? sentmap[(start / 32) + 1] : 0;
+    result = sm0 | ((uint64_t)sm1) << 32;
+#else
+#error "Host long other than 64/32 not supported"
+#endif
+
+    return result;
+}
+
+/*
+ * Callback from ram_postcopy_each_ram_discard for each RAMBlock
+ * start,end: Indexes into the bitmap for the first and last bit
+ *            representing the named block
+ */
+int postcopy_send_discard_bm_ram(MigrationState *ms, const char *name,
+                                 unsigned long start, unsigned long end)
+{
+    /* Keeps command under 256 bytes - but arbitrary */
+    const unsigned int max_entries_per_command = 12;
+    uint16_t cur_entry;
+    uint64_t buffer[2*max_entries_per_command];
+    unsigned int nsentwords = 0;
+    unsigned int nsentcmds = 0;
+
+    /*
+     * There is no guarantee that start, end are on convenient 64bit multiples
+     * (We always send 64bit chunks over the wire, irrespective of long size)
+     */
+    unsigned long first64, last64, cur64;
+    first64 = start / 64;
+    last64 = end / 64;
+
+    cur_entry = 0;
+    for (cur64 = first64; cur64 <= last64; cur64++) {
+        /* Deal with start/end not on alignment */
+        uint64_t mask;
+        mask = ~(uint64_t)0;
+
+        if ((cur64 == first64) && (start & 63)) {
+            /* e.g. (start & 63) = 3
+             *         1 << .    -> 2^3
+             *         . - 1     -> 2^3 - 1 i.e. mask 2..0
+             *         ~.        -> mask 63..3
+             */
+            mask &= ~((((uint64_t)1) << (start & 63)) - 1);
+        }
+
+        if ((cur64 == last64) && ((end & 63) != 63)) {
+            /* e.g. (end & 63) = 3
+             *            .   +1 -> 4
+             *         1 << .    -> 2^4
+             *         . -1      -> 2^4 - 1
+             *                   = mask set 3..0
+             */
+            mask &= (((uint64_t)1) << ((end & 63) + 1)) - 1;
+        }
+
+        uint64_t data = get_64bits_sentmap(ms->sentmap, (cur64 != last64) ||
+                                           ((end & 63) >= 32), cur64 * 64);
+        data &= mask;
+
+        if (data) {
+            cpu_to_be64w(buffer+2*cur_entry, (cur64-first64));
+            cpu_to_be64w(buffer+1+2*cur_entry, data);
+            cur_entry++;
+            nsentwords++;
+
+            if (cur_entry == max_entries_per_command) {
+                /* Full set, ship it! */
+                qemu_savevm_send_postcopy_ram_discard(ms->file, name,
+                                                      cur_entry,
+                                                      start & 63,
+                                                      buffer);
+                nsentcmds++;
+                cur_entry = 0;
+            }
+        }
+    }
+
+    /* Anything unsent? */
+    if (cur_entry) {
+        qemu_savevm_send_postcopy_ram_discard(ms->file, name, cur_entry,
+                                              start & 63, buffer);
+        nsentcmds++;
+    }
+
+    /*fprintf(stderr, "postcopy_send_discard_bm_ram: '%s' mask words"
+                      " sent=%d in %d commands.\n",
+            name, nsentwords, nsentcmds);*/
+
+    return 0;
+}
+
+/*
+ * Transmit the set of pages to be discarded after precopy to the target;
+ * these are pages that have been sent previously but have since been dirtied.
+ * Hopefully this is pretty sparse.
+ */
+int postcopy_send_discard_bitmap(MigrationState *ms)
+{
+    /*
+     * Update the sentmap to be  sentmap&=dirty
+     * (arch_init gives us the full size as a return)
+     */
+    ram_mask_postcopy_bitmap(ms);
+
+    DPRINTF("Dumping merged sentmap");
+#ifdef DEBUG_POSTCOPY
+    ram_debug_dump_bitmap(ms->sentmap, false);
 #endif
 
+    return ram_postcopy_each_ram_discard(ms);
+}
+
diff --git a/savevm.c b/savevm.c
index a2c5fc8..1d5375c 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1238,12 +1238,9 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
              * we know there must be at least 1 bit set due to the loop entry
              * If there is no 0 firstzero will be 64
              */
-            /* TODO - ram_discard_range gets added in a later patch
             int ret = ram_discard_range(mis, ramid, source_target_page_bits,
                                 startaddr + firstset - first_bit_offset,
                                 startaddr + (firstzero - 1) - first_bit_offset);
-             */
-            ret = -1; /* TODO */
             if (ret) {
                 return ret;
             }
-- 
1.9.3

* [Qemu-devel] [PATCH 26/46] Postcopy page-map-incoming (PMI) structure
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (24 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 25/46] Postcopy: Maintain sentmap during postcopy pre phase Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 27/46] postcopy: Add incoming_init/cleanup functions Dr. David Alan Gilbert (git)
                   ` (20 subsequent siblings)
  46 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The PMI holds the state of each page on the incoming side,
so that we can tell if the page is missing, already received
or there is a request outstanding for it.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h    |  18 ++++++
 include/migration/postcopy-ram.h |   4 ++
 include/qemu/typedefs.h          |   1 +
 postcopy-ram.c                   | 118 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 141 insertions(+)
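
Note (not part of the commit): the three page states are derived from the
two bitmaps, and a state change only happens when the page is in the state
the caller expects. The following is a minimal standalone sketch of that
scheme under simplifying assumptions: it is single-threaded (no mutex), uses
a fixed-size map, and every name in it (PMISketch, pmi_get_state,
pmi_change_state, ...) is hypothetical rather than taken from this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGES 128                 /* arbitrary size for the sketch */
#define SKETCH_WORDS (SKETCH_PAGES / 64)

typedef enum { PMI_MISSING, PMI_REQUESTED, PMI_RECEIVED } PMIState;

typedef struct {
    uint64_t received[SKETCH_WORDS];     /* pages we have received */
    uint64_t requested[SKETCH_WORDS];    /* pages with a request outstanding */
} PMISketch;

static bool get_bit(const uint64_t *map, size_t i)
{
    return (map[i / 64] >> (i % 64)) & 1;
}

static void put_bit(uint64_t *map, size_t i, bool value)
{
    if (value) {
        map[i / 64] |= (uint64_t)1 << (i % 64);
    } else {
        map[i / 64] &= ~((uint64_t)1 << (i % 64));
    }
}

/* Derive the state of page i from the two maps */
static PMIState pmi_get_state(const PMISketch *p, size_t i)
{
    if (get_bit(p->received, i)) {
        assert(!get_bit(p->requested, i));
        return PMI_RECEIVED;
    }
    return get_bit(p->requested, i) ? PMI_REQUESTED : PMI_MISSING;
}

/*
 * Move page i to new_state only if it is currently in 'expected'.
 * Returns the state the page was in before the call.
 */
static PMIState pmi_change_state(PMISketch *p, size_t i,
                                 PMIState expected, PMIState new_state)
{
    PMIState old = pmi_get_state(p, i);

    assert(new_state != PMI_MISSING);    /* going back needs a discard */
    if (old != expected) {
        return old;                      /* lost the race, nothing changed */
    }

    if (new_state == PMI_REQUESTED) {
        assert(old == PMI_MISSING);
        put_bit(p->requested, i, true);
    } else {
        put_bit(p->received, i, true);
        put_bit(p->requested, i, false);
    }
    return old;
}
```

A caller can tell whether its transition took effect by comparing the return
value with the state it expected, which is the same contract an atomic
compare-exchange would give.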

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 2289254..722c846 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -57,6 +57,23 @@ struct MigrationRetPathState {
 
 typedef struct MigrationState MigrationState;
 
+/* Postcopy page-map-incoming - data about each page on the inbound side */
+
+typedef enum {
+   POSTCOPY_PMI_MISSING,   /* page hasn't yet been received */
+   POSTCOPY_PMI_REQUESTED, /* Kernel asked for a page, but we've not got it */
+   POSTCOPY_PMI_RECEIVED   /* We've got the page */
+} PostcopyPMIState;
+
+struct PostcopyPMI {
+    /* TODO: I'm expecting to rework this using some atomic compare-exchange
+     * thing, which will require merging the maps together
+     */
+    QemuMutex      mutex;
+    unsigned long *received_map;  /* Pages that we have received */
+    unsigned long *requested_map; /* Pages that we're sending a request for */
+};
+
 /* State for the incoming migration */
 struct MigrationIncomingState {
     QEMUFile *file;
@@ -71,6 +88,7 @@ struct MigrationIncomingState {
 
     QEMUFile *return_path;
     QemuMutex      rp_mutex;    /* We send replies from multiple threads */
+    PostcopyPMI    postcopy_pmi;
 };
 
 MigrationIncomingState *migration_incoming_state_init(QEMUFile *f);
diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index fe89a3c..75ca0fd 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -36,4 +36,8 @@ int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
 int postcopy_send_discard_bm_ram(MigrationState *ms, const char *name,
                                  unsigned long start, unsigned long end);
 
+void postcopy_pmi_destroy(MigrationIncomingState *mis);
+void postcopy_pmi_discard_range(MigrationIncomingState *mis,
+                                size_t start, size_t npages);
+
 #endif
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 8539de6..61b330c 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -77,6 +77,7 @@ typedef struct QEMUSGList QEMUSGList;
 typedef struct SHPCDevice SHPCDevice;
 typedef struct FWCfgState FWCfgState;
 typedef struct PcGuestInfo PcGuestInfo;
+typedef struct PostcopyPMI PostcopyPMI;
 typedef struct Range Range;
 typedef struct AdapterInfo AdapterInfo;
 
diff --git a/postcopy-ram.c b/postcopy-ram.c
index ff6bdd6..f92f516 100644
--- a/postcopy-ram.c
+++ b/postcopy-ram.c
@@ -23,6 +23,8 @@
 #include "qemu-common.h"
 #include "migration/migration.h"
 #include "migration/postcopy-ram.h"
+#include "qemu/bitmap.h"
+#include "qemu/error-report.h"
 #include "sysemu/sysemu.h"
 
 //#define DEBUG_POSTCOPY
@@ -66,6 +68,122 @@
 #define __NR_remap_anon_pages 317
 #endif
 
+/* ---------------------------------------------------------------------- */
+/* Postcopy page-map-incoming (PMI) - data structures that record the     */
+/* state of each page used by the inbound postcopy                        */
+
+static void postcopy_pmi_init(MigrationIncomingState *mis, size_t ram_pages)
+{
+    qemu_mutex_init(&mis->postcopy_pmi.mutex);
+    mis->postcopy_pmi.received_map = bitmap_new(ram_pages);
+    mis->postcopy_pmi.requested_map = bitmap_new(ram_pages);
+    bitmap_clear(mis->postcopy_pmi.received_map, 0, ram_pages);
+    bitmap_clear(mis->postcopy_pmi.requested_map, 0, ram_pages);
+}
+
+void postcopy_pmi_destroy(MigrationIncomingState *mis)
+{
+    if (mis->postcopy_pmi.received_map) {
+        g_free(mis->postcopy_pmi.received_map);
+        mis->postcopy_pmi.received_map = NULL;
+    }
+    if (mis->postcopy_pmi.requested_map) {
+        g_free(mis->postcopy_pmi.requested_map);
+        mis->postcopy_pmi.requested_map = NULL;
+    }
+    qemu_mutex_destroy(&mis->postcopy_pmi.mutex);
+}
+
+/*
+ * Mark a set of pages in the PMI as being clear; this is used by the discard
+ * at the start of postcopy, and before the postcopy stream starts.
+ */
+void postcopy_pmi_discard_range(MigrationIncomingState *mis,
+                                size_t start, size_t npages)
+{
+    bitmap_clear(mis->postcopy_pmi.received_map, start, npages);
+}
+
+/*
+ * Retrieve the state of the given page
+ * Note: This version for use by callers already holding the lock
+ */
+static PostcopyPMIState postcopy_pmi_get_state_nolock(
+                            MigrationIncomingState *mis,
+                            size_t bitmap_index)
+{
+    bool received, requested;
+
+    received = test_bit(bitmap_index, mis->postcopy_pmi.received_map);
+    requested = test_bit(bitmap_index, mis->postcopy_pmi.requested_map);
+
+    if (received) {
+        assert(!requested);
+        return POSTCOPY_PMI_RECEIVED;
+    } else {
+        return requested ? POSTCOPY_PMI_REQUESTED : POSTCOPY_PMI_MISSING;
+    }
+}
+
+/* Retrieve the state of the given page */
+static PostcopyPMIState postcopy_pmi_get_state(MigrationIncomingState *mis,
+                                               size_t bitmap_index)
+{
+    PostcopyPMIState ret;
+    qemu_mutex_lock(&mis->postcopy_pmi.mutex);
+    ret = postcopy_pmi_get_state_nolock(mis, bitmap_index);
+    qemu_mutex_unlock(&mis->postcopy_pmi.mutex);
+
+    return ret;
+}
+
+/*
+ * Set the page state to the given state if the previous state was as expected
+ * Return the actual previous state.
+ */
+static PostcopyPMIState postcopy_pmi_change_state(MigrationIncomingState *mis,
+                                           size_t bitmap_index,
+                                           PostcopyPMIState expected_state,
+                                           PostcopyPMIState new_state)
+{
+    PostcopyPMIState old_state;
+
+    qemu_mutex_lock(&mis->postcopy_pmi.mutex);
+    old_state = postcopy_pmi_get_state_nolock(mis, bitmap_index);
+
+    if (old_state == expected_state) {
+        switch (new_state) {
+        case POSTCOPY_PMI_MISSING:
+          assert(0); /* This shouldn't actually happen - use discard_range */
+          break;
+
+        case POSTCOPY_PMI_REQUESTED:
+          assert(old_state == POSTCOPY_PMI_MISSING);
+          set_bit(bitmap_index, mis->postcopy_pmi.requested_map);
+          break;
+
+        case POSTCOPY_PMI_RECEIVED:
+          assert(old_state == POSTCOPY_PMI_MISSING ||
+                 old_state == POSTCOPY_PMI_REQUESTED);
+          set_bit(bitmap_index, mis->postcopy_pmi.received_map);
+          clear_bit(bitmap_index, mis->postcopy_pmi.requested_map);
+          break;
+        }
+    }
+
+    qemu_mutex_unlock(&mis->postcopy_pmi.mutex);
+    return old_state;
+}
+
+static void postcopy_pmi_dump(MigrationIncomingState *mis)
+{
+    fprintf(stderr, "postcopy_pmi_dump: requested\n");
+    ram_debug_dump_bitmap(mis->postcopy_pmi.requested_map, false);
+    fprintf(stderr, "postcopy_pmi_dump: received\n");
+    ram_debug_dump_bitmap(mis->postcopy_pmi.received_map, true);
+}
+
+/* ---------------------------------------------------------------------- */
 int postcopy_ram_hosttest(void)
 {
     /* TODO: Needs guarding with CONFIG_ once we have libc's that have the defs
-- 
1.9.3

* [Qemu-devel] [PATCH 27/46] postcopy: Add incoming_init/cleanup functions
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (25 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 26/46] Postcopy page-map-incoming (PMI) structure Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 28/46] postcopy: Incoming initialisation Dr. David Alan Gilbert (git)
                   ` (19 subsequent siblings)
  46 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Provide functions to be called before the start of a postcopy
enabled migration (even if it's not eventually used) and
at the end.

During the init we must disable huge pages in the RAM that
we will receive postcopy data into, since if they start off
as hugepage and get a 4k page written to them, the rest of
the hugepage won't get userfault'd and won't work as a destination
for remap.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/postcopy-ram.h |  12 +++++
 postcopy-ram.c                   | 100 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 112 insertions(+)
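
Note (not part of the commit): the two madvise() steps that prepare an area
can be tried outside QEMU. The following is a minimal standalone sketch for
Linux, assuming a private anonymous mapping stands in for a RAMBlock; the
sketch_* names are hypothetical and the error handling is reduced to
perror().

```c
#define _DEFAULT_SOURCE     /* for madvise() and MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>

/* Drop the contents of the inclusive byte range start..end */
int sketch_discard_range(uint8_t *start, uint8_t *end)
{
    assert(end >= start);
    if (madvise(start, (size_t)(end - start) + 1, MADV_DONTNEED)) {
        perror("sketch_discard_range: MADV_DONTNEED");
        return -1;
    }
    return 0;
}

/* Empty the area, then ask the kernel not to back it with huge pages */
int sketch_init_area(void *host_addr, size_t length)
{
    uint8_t *p = host_addr;

    if (sketch_discard_range(p, p + length - 1)) {
        return -1;
    }
#ifdef MADV_NOHUGEPAGE
    /* May fail with EINVAL on kernels built without THP support */
    if (madvise(host_addr, length, MADV_NOHUGEPAGE)) {
        perror("sketch_init_area: MADV_NOHUGEPAGE");
        return -1;
    }
#endif
    return 0;
}
```

On a private anonymous mapping, pages dropped with MADV_DONTNEED read back
as zeroes on the next access, which is the "truly empty" state the init code
relies on.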

diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index 75ca0fd..b5f0cb5 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -30,6 +30,18 @@ int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
 
 
 /*
+ * Initialise postcopy-ram, setting the RAM to a state where we can go into
+ * postcopy later; must be called prior to any precopy.
+ * Called from arch_init's similarly named ram_postcopy_incoming_init.
+ */
+int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages);
+
+/*
+ * At the end of a migration where postcopy_ram_incoming_init was called.
+ */
+int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis);
+
+/*
  * Called back from arch_init's ram_postcopy_each_ram_discard to handle
  * discarding one RAMBlock's pre-postcopy dirty pages
  */
diff --git a/postcopy-ram.c b/postcopy-ram.c
index f92f516..2159c60 100644
--- a/postcopy-ram.c
+++ b/postcopy-ram.c
@@ -250,6 +250,93 @@ int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
     return 0;
 }
 
+/*
+ * Setup an area of RAM so that it *can* be used for postcopy later; this
+ * must be done right at the start prior to pre-copy.
+ * opaque should be the MIS.
+ */
+static int init_area(const char *block_name, void *host_addr,
+                     ram_addr_t offset, ram_addr_t length, void *opaque)
+{
+    MigrationIncomingState *mis = opaque;
+
+    DPRINTF("init_area: %s: %p offset=%zx length=%zd(%zx)",
+            block_name, host_addr, offset, length, length);
+    /*
+     * We need the whole of RAM to be truly empty for postcopy, so things
+     * like ROMs and any data tables built during init must be zero'd
+     * - we're going to get the copy from the source anyway.
+     */
+    if (postcopy_ram_discard_range(mis, host_addr, (host_addr + length - 1))) {
+        return -1;
+    }
+
+    /*
+     * We also need the area to be normal 4k pages, not huge pages
+     * (otherwise we can't be sure we can use remap_anon_pages to put
+     * a 4k page in later).  THP might come along and map a 2MB page
+     * and when it's partially accessed in precopy it might not break
+     * it down, but leave a 2MB zero'd page.
+     */
+    if (madvise(host_addr, length, MADV_NOHUGEPAGE)) {
+        perror("init_area: NOHUGEPAGE");
+        return -1;
+    }
+
+    return 0;
+}
+
+/*
+ * At the end of migration, undo the effects of init_area
+ * opaque should be the MIS.
+ */
+static int cleanup_area(const char *block_name, void *host_addr,
+                        ram_addr_t offset, ram_addr_t length, void *opaque)
+{
+    /* Turn off userfault here as well? */
+
+    DPRINTF("cleanup_area: %s: %p offset=%zx length=%zd(%zx)",
+            block_name, host_addr, offset, length, length);
+    /*
+     * We turned off hugepages for the precopy stage when postcopy is
+     * enabled; we can turn them back on now.
+     */
+    if (madvise(host_addr, length, MADV_HUGEPAGE)) {
+        perror("cleanup_area: HUGEPAGE");
+        return -1;
+    }
+
+    return 0;
+}
+
+/*
+ * Initialise postcopy-ram, setting the RAM to a state where we can go into
+ * postcopy later; must be called prior to any precopy.
+ * Called from arch_init's similarly named ram_postcopy_incoming_init.
+ */
+int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages)
+{
+    postcopy_pmi_init(mis, ram_pages);
+
+    if (qemu_ram_foreach_block(init_area, mis)) {
+        return -1;
+    }
+
+    return 0;
+}
+
+/*
+ * At the end of a migration where postcopy_ram_incoming_init was called.
+ */
+int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
+{
+    if (qemu_ram_foreach_block(cleanup_area, mis)) {
+        return -1;
+    }
+
+    return 0;
+}
+
 #else
 /* No target OS support, stubs just fail */
 
@@ -265,6 +352,19 @@ int postcopy_ram_discard_range(MigrationIncomingState *mis, void *start,
     error_report("postcopy_ram_discard_range: No OS support");
     return -1;
 }
+
+int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages)
+{
+    error_report("postcopy_ram_incoming_init: No OS support");
+    return -1;
+}
+
+int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
+{
+    error_report("postcopy_ram_incoming_cleanup: No OS support");
+    return -1;
+}
+
 #endif
 
 /* ------------------------------------------------------------------------- */
-- 
1.9.3

* [Qemu-devel] [PATCH 28/46] postcopy: Incoming initialisation
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (26 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 27/46] postcopy: Add incoming_init/cleanup functions Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 29/46] postcopy: ram_enable_notify to switch on userfault Dr. David Alan Gilbert (git)
                   ` (18 subsequent siblings)
  46 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c                   | 11 +++++++++++
 include/migration/migration.h |  1 +
 migration.c                   |  2 ++
 3 files changed, 14 insertions(+)

diff --git a/arch_init.c b/arch_init.c
index 134ea7e..fd7399c 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -1234,6 +1234,17 @@ void ram_handle_compressed(void *host, uint8_t ch, uint64_t size)
     }
 }
 
+/*
+ * Allocate data structures etc needed by incoming migration with postcopy-ram
+ * postcopy-ram's similarly named postcopy_ram_incoming_init does the work
+ */
+int ram_postcopy_incoming_init(MigrationIncomingState *mis)
+{
+    size_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
+
+    return postcopy_ram_incoming_init(mis, ram_pages);
+}
+
 static int ram_load(QEMUFile *f, void *opaque, int version_id)
 {
     ram_addr_t addr;
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 722c846..397f41c 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -194,6 +194,7 @@ int ram_postcopy_each_ram_discard(MigrationState *ms);
 int ram_discard_range(MigrationIncomingState *mis, const char *block_name,
                       int source_target_page_bits,
                       uint64_t start, uint64_t end);
+int ram_postcopy_incoming_init(MigrationIncomingState *mis);
 
 /**
  * @migrate_add_blocker - prevent migration from proceeding
diff --git a/migration.c b/migration.c
index ca0fd7b..55bb767 100644
--- a/migration.c
+++ b/migration.c
@@ -139,6 +139,8 @@ MigrationIncomingState *migration_incoming_state_init(QEMUFile* f)
 
 void migration_incoming_state_destroy(MigrationIncomingState *mis)
 {
+    postcopy_pmi_destroy(mis);
+
     g_free(mis);
 }
 
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [Qemu-devel] [PATCH 29/46] postcopy: ram_enable_notify to switch on userfault
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (27 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 28/46] postcopy: Incoming initialisation Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 30/46] Postcopy: postcopy_start Dr. David Alan Gilbert (git)
                   ` (17 subsequent siblings)
  46 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
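The hunk below drives a per-block callback through qemu_ram_foreach_block
and turns any non-zero result into a failure. As a minimal standalone
sketch of that callback pattern, assuming the iterator stops at the first
failing block and passes its return value up: every name here
(demo_block, demo_block_fn, demo_foreach_block, demo_mark_block) is
hypothetical and none of it is QEMU code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a RAM block; not QEMU's RAMBlock. */
struct demo_block {
    const char *name;
    void       *host_addr;
    size_t      offset;
    size_t      length;
};

/* Callback shape modelled on postcopy_ram_sensitise_area(). */
typedef int (*demo_block_fn)(const char *name, void *host_addr,
                             size_t offset, size_t length, void *opaque);

/*
 * Call fn on each block in order.  Stops at the first non-zero return
 * and passes that value up to the caller; returns 0 if every call
 * succeeded.
 */
static int demo_foreach_block(const struct demo_block *blocks, size_t count,
                              demo_block_fn fn, void *opaque)
{
    assert(fn != NULL);
    for (size_t i = 0; i < count; i++) {
        int ret = fn(blocks[i].name, blocks[i].host_addr,
                     blocks[i].offset, blocks[i].length, opaque);
        if (ret) {
            return ret;
        }
    }
    return 0;
}

/*
 * Example callback: counts the blocks it accepted (via opaque) and
 * fails on a zero-length block.  A real callback would do the madvise.
 */
static int demo_mark_block(const char *name, void *host_addr,
                           size_t offset, size_t length, void *opaque)
{
    size_t *accepted = opaque;

    (void)name;
    (void)host_addr;
    (void)offset;
    if (length == 0) {
        return -1;
    }
    (*accepted)++;
    return 0;
}
```

With this shape the caller only has to test the iterator's result, which
is what postcopy_ram_enable_notify does in the hunk below.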
 include/migration/postcopy-ram.h |  5 +++++
 postcopy-ram.c                   | 36 +++++++++++++++++++++++++++++++++++-
 2 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index b5f0cb5..383b1e8 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -28,6 +28,11 @@ int postcopy_send_discard_bitmap(MigrationState *ms);
 int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
                                uint8_t *end);
 
+/*
+ * Make all of RAM sensitive to accesses to areas that haven't yet been written
+ * and wire up anything necessary to deal with it.
+ */
+int postcopy_ram_enable_notify(MigrationIncomingState *mis);
 
 /*
  * Initialise postcopy-ram, setting the RAM to a state where we can go into
diff --git a/postcopy-ram.c b/postcopy-ram.c
index 2159c60..c605dd3 100644
--- a/postcopy-ram.c
+++ b/postcopy-ram.c
@@ -337,9 +337,38 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
     return 0;
 }
 
+/*
+ * Mark the given area of RAM as requiring notification of accesses to
+ * unwritten areas.  Used as a callback on qemu_ram_foreach_block.
+ *   host_addr: Base of area to mark
+ *   offset: Offset in the whole ram arena
+ *   length: Length of the section
+ *   opaque: Unused
+ * Returns 0 on success
+ */
+static int postcopy_ram_sensitise_area(const char *block_name, void *host_addr,
+                                       ram_addr_t offset, ram_addr_t length,
+                                       void *opaque)
+{
+    if (madvise(host_addr, length, MADV_USERFAULT)) {
+        perror("postcopy_ram_sensitise_area");
+        return -1;
+    }
+    return 0;
+}
+
+int postcopy_ram_enable_notify(MigrationIncomingState *mis)
+{
+    /* Mark so that we get notified of accesses to unwritten areas */
+    if (qemu_ram_foreach_block(postcopy_ram_sensitise_area, NULL)) {
+        return -1;
+    }
+
+    return 0;
+}
+
 #else
 /* No target OS support, stubs just fail */
-
 int postcopy_ram_hosttest(void)
 {
     error_report("postcopy_ram_hosttest: No OS support");
@@ -365,6 +394,11 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
     return -1;
 }
 
+int postcopy_ram_enable_notify(MigrationIncomingState *mis)
+{
+    fprintf(stderr, "postcopy_ram_enable_notify: No OS support\n");
+    return -1;
+}
 #endif
 
 /* ------------------------------------------------------------------------- */
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [Qemu-devel] [PATCH 30/46] Postcopy: postcopy_start
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (28 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 29/46] postcopy: ram_enable_notify to switch on userfault Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 31/46] Postcopy: Rework migration thread for postcopy mode Dr. David Alan Gilbert (git)
                   ` (16 subsequent siblings)
  46 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

postcopy_start:
  Perform all the initialisation associated with starting up postcopy
mode from the source.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration.c | 85 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 85 insertions(+)

diff --git a/migration.c b/migration.c
index 55bb767..0d567ef 100644
--- a/migration.c
+++ b/migration.c
@@ -890,6 +890,91 @@ static int open_outgoing_return_path(MigrationState *ms)
     return 0;
 }
 
+/* Switch from normal iteration to postcopy
+ * Returns non-0 on error
+ */
+static int postcopy_start(MigrationState *ms)
+{
+    int ret;
+    const QEMUSizedBuffer *qsb;
+    migrate_set_state(ms, MIG_STATE_ACTIVE, MIG_STATE_POSTCOPY_ACTIVE);
+
+    DPRINTF("postcopy_start\n");
+    qemu_mutex_lock_iothread();
+    DPRINTF("postcopy_start: setting run state\n");
+    ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
+
+    if (ret < 0) {
+        migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
+        qemu_mutex_unlock_iothread();
+        return -1;
+    }
+
+    /*
+     * in Finish migrate and with the io-lock held everything should
+     * be quiet, but we've potentially still got dirty pages and we
+     * need to tell the destination to throw away any pages it's already
+     * received that are dirty
+     */
+    if (postcopy_send_discard_bitmap(ms)) {
+        DPRINTF("postcopy send discard bitmap failed\n");
+        migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
+        qemu_mutex_unlock_iothread();
+        return -1;
+    }
+
+    DPRINTF("postcopy_start: sending req 2\n");
+    qemu_savevm_send_reqack(ms->file, 2);
+    /*
+     * send rest of state - note things that are doing postcopy
+     * will notice we're in MIG_STATE_POSTCOPY_ACTIVE and not actually
+     * wrap their state up here
+     */
+    qemu_file_set_rate_limit(ms->file, INT64_MAX);
+    DPRINTF("postcopy_start: do state_complete\n");
+
+    /*
+     * We need to leave the fd free for page transfers during the
+     * loading of the device state, so wrap all the remaining
+     * commands and state into a package that gets sent in one go
+     */
+    QEMUFile *fb = qemu_bufopen("w", NULL);
+
+    qemu_savevm_state_complete(fb);
+    DPRINTF("postcopy_start: sending req 3\n");
+    qemu_savevm_send_reqack(fb, 3);
+
+    qemu_savevm_send_postcopy_ram_run(fb);
+
+    /* <><> end of stuff going into the package */
+    qsb = qemu_buf_get(fb);
+
+    /* Now send that blob */
+    if (qsb_get_length(qsb) > MAX_VM_CMD_PACKAGED_SIZE) {
+        DPRINTF("postcopy_start: Unreasonably large packaged state: %lu\n",
+                (unsigned long)(qsb_get_length(qsb)));
+        migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
+        qemu_mutex_unlock_iothread();
+        qemu_fclose(fb);
+        return -1;
+    }
+    qemu_savevm_send_packaged(ms->file, qsb);
+    qemu_fclose(fb);
+
+    qemu_mutex_unlock_iothread();
+
+    DPRINTF("postcopy_start not finished sending ack\n");
+    qemu_savevm_send_reqack(ms->file, 4);
+
+    ret = qemu_file_get_error(ms->file);
+    if (ret) {
+        error_report("postcopy_start: Migration stream errored");
+        migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
+    }
+
+    return ret;
+}
+
 /*
  * Master migration thread on the source VM.
  * It drives the migration and pumps the data down the outgoing channel.
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [Qemu-devel] [PATCH 31/46] Postcopy: Rework migration thread for postcopy mode
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (29 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 30/46] Postcopy: postcopy_start Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-05 10:19   ` Paolo Bonzini
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 32/46] mig fd_connect: open return path Dr. David Alan Gilbert (git)
                   ` (15 subsequent siblings)
  46 siblings, 1 reply; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Switch to postcopy if:
   1) There's still a significant amount to transfer
   2) Postcopy is enabled
   3) It's taken longer than the time set by the parameter.

and change the cleanup at the end of migration to match.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
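The conditions listed above, together with the extra checks in the hunk
below (not already in postcopy, and no pending data that cannot be
postcopied), can be read as one predicate. A minimal sketch under those
assumptions follows; struct switch_inputs and should_start_postcopy are
hypothetical names used only for illustration, and the fields stand in
for values the migration thread reads from its own state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of what the migration thread consults. */
struct switch_inputs {
    bool     postcopy_enabled;  /* stands in for migrate_postcopy_ram() */
    bool     already_postcopy;  /* state is MIG_STATE_POSTCOPY_ACTIVE */
    uint64_t pending_size;      /* total bytes still to send */
    uint64_t max_size;          /* bytes sendable within the downtime */
    uint64_t pend_nonpost;      /* pending bytes that can't be postcopied */
    int64_t  now_ms;            /* current time */
    int64_t  started_ms;        /* time the migration thread started */
    int64_t  start_delay_ms;    /* the postcopy start-time parameter */
};

/* Returns true when the source should switch from precopy to postcopy. */
static bool should_start_postcopy(const struct switch_inputs *in)
{
    assert(in != NULL);

    /* 1) There's still a significant amount to transfer */
    if (in->pending_size == 0 || in->pending_size < in->max_size) {
        return false;
    }
    /* 2) Postcopy is enabled, and we haven't switched already */
    if (!in->postcopy_enabled || in->already_postcopy) {
        return false;
    }
    /* Everything still pending must be something postcopy can carry */
    if (in->pend_nonpost != 0) {
        return false;
    }
    /* 3) It's taken longer than the configured time */
    return in->now_ms >= in->started_ms + in->start_delay_ms;
}
```

When the predicate is false the thread either runs another precopy
iteration or, once little enough remains, completes in the usual way.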
 migration.c | 92 ++++++++++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 73 insertions(+), 19 deletions(-)

diff --git a/migration.c b/migration.c
index 0d567ef..c73fcfa 100644
--- a/migration.c
+++ b/migration.c
@@ -982,16 +982,40 @@ static int postcopy_start(MigrationState *ms)
 static void *migration_thread(void *opaque)
 {
     MigrationState *s = opaque;
+    /* Used by the bandwidth calcs, updated later */
     int64_t initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+    /* Really, the time we started */
+    const int64_t initial_time_fixed = initial_time;
     int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
     int64_t initial_bytes = 0;
     int64_t max_size = 0;
     int64_t start_time = initial_time;
+    int64_t pc_start_time;
+
     bool old_vm_running = false;
+    pc_start_time = s->tunables[MIGRATION_PARAMETER_NAME_X_POSTCOPY_START_TIME];
+
+    /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */
+    enum MigrationPhase current_active_type = MIG_STATE_ACTIVE;
 
     qemu_savevm_state_begin(s->file, &s->params);
 
+    if (migrate_postcopy_ram()) {
+        /* Now tell the dest that it should open its end so it can reply */
+        qemu_savevm_send_openrp(s->file);
+
+        /* And ask it to send an ack that will make stuff easier to debug */
+        qemu_savevm_send_reqack(s->file, 1);
+
+        /* Tell the destination that we *might* want to do postcopy later;
+         * if the other end can't do postcopy it should fail now, nice and
+         * early.
+         */
+        qemu_savevm_send_postcopy_ram_advise(s->file);
+    }
+
     s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
+    current_active_type = MIG_STATE_ACTIVE;
     migrate_set_state(s, MIG_STATE_SETUP, MIG_STATE_ACTIVE);
 
     DPRINTF("setup complete\n");
@@ -1012,37 +1036,66 @@ static void *migration_thread(void *opaque)
                     " nonpost=%" PRIu64 ")\n",
                     pending_size, max_size, pend_post, pend_nonpost);
             if (pending_size && pending_size >= max_size) {
-                qemu_savevm_state_iterate(s->file);
+                /* Still a significant amount to transfer */
+
+                current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+                if (migrate_postcopy_ram() &&
+                    s->state != MIG_STATE_POSTCOPY_ACTIVE &&
+                    pend_nonpost == 0 &&
+                    (current_time >= initial_time_fixed + pc_start_time)) {
+
+                    if (!postcopy_start(s)) {
+                        current_active_type = MIG_STATE_POSTCOPY_ACTIVE;
+                    }
+
+                    continue;
+                } else {
+                    /* Just another iteration step */
+                    qemu_savevm_state_iterate(s->file);
+                }
             } else {
                 int ret;
 
-                DPRINTF("done iterating\n");
-                qemu_mutex_lock_iothread();
-                start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
-                qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
-                old_vm_running = runstate_is_running();
-
-                ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
-                if (ret >= 0) {
-                    qemu_file_set_rate_limit(s->file, INT64_MAX);
-                    qemu_savevm_state_complete(s->file);
-                }
-                qemu_mutex_unlock_iothread();
-
-                if (ret < 0) {
-                    migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_ERROR);
-                    break;
+                DPRINTF("done iterating pending size %" PRIu64 "\n",
+                        pending_size);
+
+                if (s->state == MIG_STATE_ACTIVE) {
+                    qemu_mutex_lock_iothread();
+                    start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+                    qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
+                    old_vm_running = runstate_is_running();
+
+                    ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
+                    if (ret >= 0) {
+                        qemu_file_set_rate_limit(s->file, INT64_MAX);
+                        qemu_savevm_state_complete(s->file);
+                    }
+                    qemu_mutex_unlock_iothread();
+
+                    if (ret < 0) {
+                        migrate_set_state(s, current_active_type,
+                                          MIG_STATE_ERROR);
+                        break;
+                    }
+                } else {
+                    assert(s->state == MIG_STATE_POSTCOPY_ACTIVE);
+                    DPRINTF("postcopy end\n");
+
+                    qemu_savevm_state_postcopy_complete(s->file);
+                    DPRINTF("postcopy end after complete\n");
                 }
 
                 if (!qemu_file_get_error(s->file)) {
-                    migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_COMPLETED);
+                    migrate_set_state(s, current_active_type,
+                                      MIG_STATE_COMPLETED);
                     break;
                 }
             }
         }
 
         if (qemu_file_get_error(s->file)) {
-            migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_ERROR);
+            migrate_set_state(s, current_active_type, MIG_STATE_ERROR);
+            DPRINTF("migration_thread: file is in error state\n");
             break;
         }
         current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
@@ -1073,6 +1126,7 @@ static void *migration_thread(void *opaque)
         }
     }
 
+    DPRINTF("migration_thread: Hit error: case\n");
     qemu_mutex_lock_iothread();
     if (s->state == MIG_STATE_COMPLETED) {
         int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [Qemu-devel] [PATCH 32/46] mig fd_connect: open return path
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (30 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 31/46] Postcopy: Rework migration thread for postcopy mode Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 33/46] Postcopy: Create a fault handler thread before marking the ram as userfault Dr. David Alan Gilbert (git)
                   ` (14 subsequent siblings)
  46 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/migration.c b/migration.c
index c73fcfa..c7ba6a1 100644
--- a/migration.c
+++ b/migration.c
@@ -1164,6 +1164,18 @@ void migrate_fd_connect(MigrationState *s)
     /* Notify before starting migration thread */
     notifier_list_notify(&migration_state_notifiers, s);
 
+    /* Open the return path; currently for postcopy but other things might
+     * also want it.
+     */
+    if (migrate_postcopy_ram()) {
+        if (open_outgoing_return_path(s)) {
+            error_report("Unable to open return-path for postcopy");
+            migrate_set_state(s, MIG_STATE_SETUP, MIG_STATE_ERROR);
+            migrate_fd_cleanup(s);
+            return;
+        }
+    }
+
     qemu_thread_create(&s->thread, "migration", migration_thread, s,
                        QEMU_THREAD_JOINABLE);
 }
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [Qemu-devel] [PATCH 33/46] Postcopy: Create a fault handler thread before marking the ram as userfault
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (31 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 32/46] mig fd_connect: open return path Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 34/46] Page request: Add MIG_RPCOMM_REQPAGES reverse command Dr. David Alan Gilbert (git)
                   ` (13 subsequent siblings)
  46 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
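The ordering in the subject line (thread first, userfault marking second)
relies on a start-up handshake: the creator blocks on a semaphore until
the new thread has posted it. A minimal sketch of that handshake using
plain POSIX threads and semaphores rather than QEMU's QemuThread and
QemuSemaphore wrappers; struct fault_handler and the fault_handler_*
functions are hypothetical names, and unlike the placeholder thread in
the hunk below this one returns straight after signalling so that it can
be joined.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical handler state; not QEMU's MigrationIncomingState. */
struct fault_handler {
    pthread_t thread;
    sem_t     ready;    /* posted once the thread is able to take faults */
    bool      started;  /* written by the thread before it posts */
};

static void *fault_thread_fn(void *opaque)
{
    struct fault_handler *fh = opaque;

    fh->started = true;
    sem_post(&fh->ready);
    /* A real handler would now loop, servicing faults. */
    return NULL;
}

/* Create the thread and wait until it reports ready.  Returns 0 on success. */
static int fault_handler_start(struct fault_handler *fh)
{
    assert(fh != NULL);
    fh->started = false;
    if (sem_init(&fh->ready, 0, 0) != 0) {
        return -1;
    }
    if (pthread_create(&fh->thread, NULL, fault_thread_fn, fh) != 0) {
        sem_destroy(&fh->ready);
        return -1;
    }
    /* sem_wait can be interrupted by a signal; retry in that case */
    while (sem_wait(&fh->ready) != 0) {
        if (errno != EINTR) {
            return -1;
        }
    }
    return 0;
}

/* Wait for the thread to exit and release the semaphore. */
static void fault_handler_join(struct fault_handler *fh)
{
    pthread_join(fh->thread, NULL);
    sem_destroy(&fh->ready);
}
```

Only after fault_handler_start returns would the caller go on to mark
RAM, so no fault can arrive before something exists to handle it.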
 include/migration/migration.h |  3 +++
 postcopy-ram.c                | 23 +++++++++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 397f41c..67e5528 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -86,6 +86,9 @@ struct MigrationIncomingState {
         POSTCOPY_RAM_INCOMING_END
     } postcopy_ram_state;
 
+    QemuThread     fault_thread;
+    QemuSemaphore  fault_thread_sem;
+
     QEMUFile *return_path;
     QemuMutex      rp_mutex;    /* We send replies from multiple threads */
     PostcopyPMI    postcopy_pmi;
diff --git a/postcopy-ram.c b/postcopy-ram.c
index c605dd3..fb7b02b 100644
--- a/postcopy-ram.c
+++ b/postcopy-ram.c
@@ -357,8 +357,31 @@ static int postcopy_ram_sensitise_area(const char *block_name, void *host_addr,
     return 0;
 }
 
+/*
+ * Handle faults detected by the USERFAULT markings
+ */
+static void *postcopy_ram_fault_thread(void *opaque)
+{
+    MigrationIncomingState *mis = (MigrationIncomingState *)opaque;
+
+    fprintf(stderr, "postcopy_ram_fault_thread\n");
+    /* TODO: In later patch */
+    qemu_sem_post(&mis->fault_thread_sem);
+    while (1) {
+        /* TODO: In later patch */
+    }
+
+    return NULL;
+}
+
 int postcopy_ram_enable_notify(MigrationIncomingState *mis)
 {
+    /* Create the fault handler thread and wait for it to be ready */
+    qemu_sem_init(&mis->fault_thread_sem, 0);
+    qemu_thread_create(&mis->fault_thread, "postcopy/fault",
+                       postcopy_ram_fault_thread, mis, QEMU_THREAD_JOINABLE);
+    qemu_sem_wait(&mis->fault_thread_sem);
+
     /* Mark so that we get notified of accesses to unwritten areas */
     if (qemu_ram_foreach_block(postcopy_ram_sensitise_area, NULL)) {
         return -1;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [Qemu-devel] [PATCH 34/46] Page request: Add MIG_RPCOMM_REQPAGES reverse command
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (32 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 33/46] Postcopy: Create a fault handler thread before marking the ram as userfault Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 35/46] Page request: Process incoming page request Dr. David Alan Gilbert (git)
                   ` (12 subsequent siblings)
  46 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add a MIG_RPCOMM_REQPAGES command on the return path so that the postcopy
destination can request pages from the source.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
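The payload built in the hunk below is: start (be64), len (be64, with bit
0 set when a RAMBlock name follows), then optionally a one-byte name
length and the name itself without a terminator. A minimal standalone
sketch of encoding and decoding that layout follows; the reqpages_*
helpers and REQPAGES_* macros are hypothetical names. Two choices here
are assumptions of the sketch rather than behaviour of the patch: the
decoder checks the payload length before it touches the name bytes, and
it masks the flag bit out of the returned length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REQPAGES_FIXED_LEN 16u   /* start (8) + len (8) */
#define REQPAGES_MAX_NAME  255u  /* name length has to fit in one byte */
#define REQPAGES_MAX_LEN   (REQPAGES_FIXED_LEN + 1u + REQPAGES_MAX_NAME)

static void put_be64(uint8_t *p, uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        p[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

static uint64_t get_be64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Fill out (at least REQPAGES_MAX_LEN bytes); returns the payload length. */
static size_t reqpages_encode(uint8_t *out, const char *rbname,
                              uint64_t start, uint64_t len)
{
    size_t msglen = REQPAGES_FIXED_LEN;

    assert(!(len & 1));          /* bit 0 is reserved for the name flag */
    if (rbname) {
        size_t n = strlen(rbname);

        assert(n <= REQPAGES_MAX_NAME);
        len |= 1;                /* flag: a name follows */
        out[msglen++] = (uint8_t)n;
        memcpy(out + msglen, rbname, n);
        msglen += n;
    }
    put_be64(out, start);
    put_be64(out + 8, len);
    return msglen;
}

/*
 * Decode a payload of buflen bytes.  name needs REQPAGES_MAX_NAME + 1
 * bytes and is set to "" when no name was sent.  Returns 0 on success,
 * -1 if the payload length does not match its contents.
 */
static int reqpages_decode(const uint8_t *buf, size_t buflen,
                           uint64_t *start, uint64_t *len,
                           char *name, int *has_name)
{
    uint64_t raw;

    if (buflen < REQPAGES_FIXED_LEN) {
        return -1;
    }
    *start = get_be64(buf);
    raw = get_be64(buf + 8);
    *has_name = (int)(raw & 1);
    *len = raw & ~(uint64_t)1;
    name[0] = '\0';
    if (*has_name) {
        size_t n;

        if (buflen < REQPAGES_FIXED_LEN + 1) {
            return -1;
        }
        n = buf[REQPAGES_FIXED_LEN];
        if (buflen != REQPAGES_FIXED_LEN + 1 + n) {
            return -1;
        }
        memcpy(name, buf + REQPAGES_FIXED_LEN + 1, n);
        name[n] = '\0';
        return 0;
    }
    return buflen == REQPAGES_FIXED_LEN ? 0 : -1;
}
```

Using bit 0 of len as the flag works because a request length is always
a multiple of the page size, which is what the assert in the sender
relies on.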
 include/migration/migration.h |  3 ++
 migration.c                   | 75 ++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 77 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 67e5528..f53add7 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -45,6 +45,7 @@ struct MigrationParams {
 enum mig_rpcomm_cmd {
     MIG_RPCOMM_INVALID = 0,  /* Must be 0 */
     MIG_RPCOMM_ACK,          /* data (seq: be32 ) */
+    MIG_RPCOMM_REQPAGES,     /* data (start: be64, len: be64) */
     MIG_RPCOMM_AFTERLASTVALID
 };
 
@@ -234,6 +235,8 @@ void migrate_send_rp_message(MigrationIncomingState *mis,
                              uint16_t len, uint8_t *data);
 void migrate_send_rp_ack(MigrationIncomingState *mis,
                          uint32_t value);
+void migrate_send_rp_reqpages(MigrationIncomingState *mis, const char* rbname,
+                              ram_addr_t start, ram_addr_t len);
 
 
 void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
diff --git a/migration.c b/migration.c
index c7ba6a1..efad18f 100644
--- a/migration.c
+++ b/migration.c
@@ -105,6 +105,38 @@ void migrate_send_rp_ack(MigrationIncomingState *mis,
     migrate_send_rp_message(mis, MIG_RPCOMM_ACK, 4, (uint8_t *)&buf);
 }
 
+/* Request a range of pages from the source VM at the given
+ * start address.
+ *   rbname: Name of the RAMBlock to request the page in, if NULL it's the same
+ *           as the last request (a name must have been given previously)
+ *   start: Address offset within the RB
+ *   len: Length in bytes required - must be a multiple of pagesize
+ */
+void migrate_send_rp_reqpages(MigrationIncomingState *mis, const char *rbname,
+                              ram_addr_t start, ram_addr_t len)
+{
+    uint8_t bufc[16+1+255]; /* start (8), len (8), namelen (1), name (<= 255) */
+    uint64_t *buf64 = (uint64_t *)bufc;
+    size_t msglen = 16; /* start + len */
+
+    assert(!(len & 1));
+    if (rbname) {
+        int rbname_len = strlen(rbname);
+        assert(rbname_len < 256);
+
+        len |= 1; /* Flag to say we've got a name */
+        bufc[msglen++] = rbname_len;
+        memcpy(bufc + msglen, rbname, rbname_len);
+        msglen += rbname_len;
+    }
+
+    buf64[0] = (uint64_t)start;
+    buf64[0] = cpu_to_be64(buf64[0]);
+    buf64[1] = (uint64_t)len;
+    buf64[1] = cpu_to_be64(buf64[1]);
+    migrate_send_rp_message(mis, MIG_RPCOMM_REQPAGES, msglen, bufc);
+}
+
 void qemu_start_incoming_migration(const char *uri, Error **errp)
 {
     const char *p;
@@ -776,6 +808,17 @@ static void source_return_path_bad(MigrationState *s)
 }
 
 /*
+ * Process a request for pages received on the return path.
+ * We're allowed to send more than requested (e.g. to round to our page size)
+ * and we don't need to send pages that have already been sent.
+ */
+static void migrate_handle_rp_reqpages(MigrationState *ms, const char* rbname,
+                                       ram_addr_t start, ram_addr_t len)
+{
+    DPRINTF("migrate_handle_rp_reqpages: at %zx for len %zx", start, len);
+}
+
+/*
  * 'can read handler' for the fd callback
  * stops the data handler being called if it's gone into
  * error.
@@ -799,9 +842,12 @@ static void source_return_path_handler(void *opaque)
 {
     MigrationState *s = opaque;
     QEMUFile *rp = qemu_file_get_return_path(s->file);
+    uint16_t expected_len;
     const int max_len = 512;
     uint8_t buf[max_len];
     uint32_t tmp32;
+    uint64_t tmp64a, tmp64b;
+    char *tmpstr;
     int res;
 
     DPRINTF("RP: Receive\n");
@@ -811,7 +857,6 @@ static void source_return_path_handler(void *opaque)
     }
 
     if (s->rp_state.header_com == MIG_RPCOMM_INVALID) {
-        uint16_t expected_len;
 
         /* No command stored, so we're expecting a new header */
         res = qemu_peek_buffer(rp, buf, 4, 0);
@@ -832,6 +877,11 @@ static void source_return_path_handler(void *opaque)
             expected_len = 4;
             break;
 
+        case MIG_RPCOMM_REQPAGES:
+            /* 16 byte start/len _possibly_ plus an id str */
+            expected_len = 16 + 256;
+            break;
+
         default:
             DPRINTF("RP: Received invalid cmd 0x%04x length 0x%04x\n",
                     s->rp_state.header_com, s->rp_state.header_len);
@@ -866,6 +916,29 @@ static void source_return_path_handler(void *opaque)
         atomic_xchg(&s->rp_state.latest_ack, tmp32);
         break;
 
+    case MIG_RPCOMM_REQPAGES:
+        tmp64a = be64_to_cpup((uint64_t *)buf);  /* Start */
+        tmp64b = be64_to_cpup(((uint64_t *)buf)+1); /* Len */
+        tmpstr = NULL;
+        if (tmp64b & 1) {
+            /* Now we expect an idstr */
+            tmp32 = buf[16]; /* Length of the following idstr */
+            tmpstr = (char *)&buf[17];
+            buf[17+tmp32] = '\0';
+            expected_len = 16+1+tmp32;
+        } else {
+            expected_len = 16;
+        }
+        if (s->rp_state.header_len != expected_len) {
+            error_report("RP: Received ReqPage with length %d expecting %d",
+                    s->rp_state.header_len, expected_len);
+            source_return_path_bad(s);
+        }
+        migrate_handle_rp_reqpages(s, tmpstr,
+                                      (ram_addr_t)tmp64a,
+                                      (ram_addr_t)tmp64b);
+        break;
+
     default:
         /* This shouldn't happen because we should catch this above */
         DPRINTF("RP: Bad header_com in dispatch\n");
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [Qemu-devel] [PATCH 35/46] Page request: Process incoming page request
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (33 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 34/46] Page request: Add MIG_RPCOMM_REQPAGES reverse command Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 36/46] Page request: Consume pages off the post-copy queue Dr. David Alan Gilbert (git)
                   ` (11 subsequent siblings)
  46 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

On receiving MIG_RPCOMM_REQPAGES look up the address and
queue the page.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
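Two small pieces of arithmetic sit behind the hunks below: checking that
a requested range lies inside its RAMBlock, and widening a request to
host page boundaries. A minimal sketch of both follows, with
hypothetical helper names (range_in_block, round_range_to_pages). The
bounds check is written so that a very large start or len cannot wrap
around, and the rounding expands the range outward on the assumption
that sending more than was requested is acceptable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True when [start, start + len) lies inside a block of block_len
 * bytes.  Comparing len against the space left after start avoids
 * computing start + len, which could overflow.
 */
static bool range_in_block(uint64_t start, uint64_t len, uint64_t block_len)
{
    return start <= block_len && len <= block_len - start;
}

/*
 * Expand [*start, *start + *len) outward to page boundaries.
 * page_size must be a power of two.  Overflow of the rounded length is
 * not handled in this sketch.
 */
static void round_range_to_pages(uint64_t *start, uint64_t *len,
                                 uint64_t page_size)
{
    uint64_t mask, head;

    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    mask = page_size - 1;

    /* Move start down to a page boundary, growing len to compensate */
    head = *start & mask;
    *start -= head;
    *len += head;

    /* Round len up to a whole number of pages */
    *len = (*len + mask) & ~mask;
}
```

For example, with 4 KiB pages a request for 0x10 bytes at offset 0x1234
becomes one whole page at offset 0x1000.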
 arch_init.c                   | 52 +++++++++++++++++++++++++++++++++++++++++++
 include/migration/migration.h | 26 ++++++++++++++++++++++
 include/qemu/typedefs.h       |  3 ++-
 migration.c                   | 34 +++++++++++++++++++++++++++-
 4 files changed, 113 insertions(+), 2 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index fd7399c..cc4acea 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -658,6 +658,58 @@ static int ram_save_page(QEMUFile *f, RAMBlock* block, ram_addr_t offset,
 }
 
 /*
+ * Queue the pages for transmission, e.g. a request from postcopy destination
+ *   ms: MigrationState in which the queue is held
+ *   rbname: The RAMBlock the request is for - may be NULL (to mean reuse last)
+ *   start: Offset from the start of the RAMBlock
+ *   len: Length (in bytes) to send
+ *   Return: 0 on success
+ */
+int ram_save_queue_pages(MigrationState *ms, const char *rbname,
+                         ram_addr_t start, ram_addr_t len)
+{
+    RAMBlock *ramblock;
+
+    if (!rbname) {
+        /* Reuse last RAMBlock */
+        ramblock = ms->last_req_rb;
+
+        if (!ramblock) {
+            error_report("ram_save_queue_pages no previous block");
+            return -1;
+        }
+    } else {
+        ramblock = ram_find_block(rbname);
+
+        if (!ramblock) {
+            error_report("ram_save_queue_pages no block '%s'", rbname);
+            return -1;
+        }
+    }
+    DPRINTF("ram_save_queue_pages: Block %s start %zx len %zx",
+                    ramblock->idstr, start, len);
+
+    if (start+len > ramblock->length) {
+        error_report("%s request overrun start=%zx len=%zx blocklen=%zx",
+                     __func__, start, len, ramblock->length);
+        return -1;
+    }
+
+    struct MigrationSrcPageRequest *new_entry =
+        g_malloc0(sizeof(struct MigrationSrcPageRequest));
+    new_entry->rb = ramblock;
+    new_entry->offset = start;
+    new_entry->len = len;
+    ms->last_req_rb = ramblock;
+
+    qemu_mutex_lock(&ms->src_page_req_mutex);
+    QSIMPLEQ_INSERT_TAIL(&ms->src_page_requests, new_entry, next_req);
+    qemu_mutex_unlock(&ms->src_page_req_mutex);
+
+    return 0;
+}
+
+/*
  * ram_find_and_save_block: Finds a page to send and sends it to f
  *
  * Returns:  The number of bytes written.
diff --git a/include/migration/migration.h b/include/migration/migration.h
index f53add7..fe639b4 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -97,6 +97,16 @@ struct MigrationIncomingState {
 
 MigrationIncomingState *migration_incoming_state_init(QEMUFile *f);
 void migration_incoming_state_destroy(MigrationIncomingState *mis);
+/* An outstanding page request, on the source, having been received
+ * and queued
+ */
+struct MigrationSrcPageRequest {
+    RAMBlock *rb;
+    hwaddr    offset;
+    hwaddr    len;
+
+    QSIMPLEQ_ENTRY(MigrationSrcPageRequest) next_req;
+};
 
 /* State for the outgoing migration */
 struct MigrationState
@@ -124,6 +134,19 @@ struct MigrationState
     int64_t xbzrle_cache_size;
     int64_t setup_time;
     int64_t dirty_sync_count;
+
+    /* bitmap of pages that have been sent at least once
+     * only maintained and used in postcopy at the moment
+     *    Where it's used to send the dirtymap at the start
+     *    of the postcopy phase, then cleared
+     */
+    unsigned long *sentmap;
+
+    /* Queue of outstanding page requests from the destination */
+    QemuMutex src_page_req_mutex;
+    QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest) src_page_requests;
+    /* The RAMBlock used in the last src_page_request */
+    RAMBlock *last_req_rb;
 };
 
 void process_incoming_migration(QEMUFile *f);
@@ -257,4 +280,7 @@ size_t ram_control_save_page(QEMUFile *f, ram_addr_t block_offset,
                              ram_addr_t offset, size_t size,
                              int *bytes_sent);
 
+int ram_save_queue_pages(MigrationState *ms, const char *rbname,
+                         ram_addr_t start, ram_addr_t len);
+
 #endif
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 61b330c..d57acc5 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -8,6 +8,7 @@ typedef struct QEMUTimerListGroup QEMUTimerListGroup;
 typedef struct QEMUFile QEMUFile;
 typedef struct QEMUBH QEMUBH;
 
+typedef struct AdapterInfo AdapterInfo;
 typedef struct AioContext AioContext;
 
 typedef struct Visitor Visitor;
@@ -79,6 +80,6 @@ typedef struct FWCfgState FWCfgState;
 typedef struct PcGuestInfo PcGuestInfo;
 typedef struct PostcopyPMI PostcopyPMI;
 typedef struct Range Range;
-typedef struct AdapterInfo AdapterInfo;
+typedef struct RAMBlock RAMBlock;
 
 #endif /* QEMU_TYPEDEFS_H */
diff --git a/migration.c b/migration.c
index efad18f..66d281b 100644
--- a/migration.c
+++ b/migration.c
@@ -26,6 +26,8 @@
 #include "qemu/thread.h"
 #include "qmp-commands.h"
 #include "trace.h"
+#include "exec/memory.h"
+#include "exec/address-spaces.h"
 
 //#define DEBUG_MIGRATION
 
@@ -500,6 +502,15 @@ static void migrate_fd_cleanup(void *opaque)
 
     migrate_fd_cleanup_src_rp(s);
 
+    /* This queue should normally be empty, but after a failed migration
+     * it might still hold some unserviced requests.
+     */
+    struct MigrationSrcPageRequest *mspr, *next_mspr;
+    QSIMPLEQ_FOREACH_SAFE(mspr, &s->src_page_requests, next_req, next_mspr) {
+        QSIMPLEQ_REMOVE_HEAD(&s->src_page_requests, next_req);
+        g_free(mspr);
+    }
+
     if (s->file) {
         trace_migrate_fd_cleanup();
         qemu_mutex_unlock_iothread();
@@ -602,6 +613,9 @@ MigrationState *migrate_init(const MigrationParams *params)
     s->state = MIG_STATE_SETUP;
     trace_migrate_set_state(MIG_STATE_SETUP);
 
+    qemu_mutex_init(&s->src_page_req_mutex);
+    QSIMPLEQ_INIT(&s->src_page_requests);
+
     s->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
     return s;
 }
@@ -815,7 +829,25 @@ static void source_return_path_bad(MigrationState *s)
 static void migrate_handle_rp_reqpages(MigrationState *ms, const char* rbname,
                                        ram_addr_t start, ram_addr_t len)
 {
-    DPRINTF("migrate_handle_rp_reqpages: at %zx for len %zx", start, len);
+    DPRINTF("migrate_handle_rp_reqpages: in %s start %zx len %zx",
+            rbname, start, len);
+
+    /* Round start down and len up to whole host pages */
+    long our_host_ps = sysconf(_SC_PAGESIZE);
+    if (start & (our_host_ps-1)) {
+        long roundings = start & (our_host_ps-1);
+        start -= roundings;
+        len += roundings;
+    }
+    if (len & (our_host_ps-1)) {
+        long roundings = len & (our_host_ps-1);
+        len -= roundings;
+        len += our_host_ps;
+    }
+
+    if (ram_save_queue_pages(ms, rbname, start, len)) {
+        source_return_path_bad(ms);
+    }
 }
 
 /*
-- 
1.9.3


* [Qemu-devel] [PATCH 36/46] Page request: Consume pages off the post-copy queue
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (34 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 35/46] Page request: Process incoming page request Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 37/46] Add assertion to check migration_dirty_pages doesn't go -ve; have seen it happen once but not sure why Dr. David Alan Gilbert (git)
                   ` (10 subsequent siblings)
  46 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

When transmitting RAM pages, consume pages that have been queued by
MIG_RPCOMM_REQPAGE commands and send them ahead of normal page scanning.

Note:
  a) After a queued page has been sent, the linear walk carries on from
the page after it; there is a reasonable chance that the destination
was about to ask for other nearby pages anyway.

  b) We have to be careful about any assumptions that the page walking
code makes; in particular it takes some shortcuts on its first linear
walk that break as soon as we send a queued page.
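
For readers following along, the ordering described above can be
sketched in isolation. This is only an illustration under stated
assumptions, not the code in this patch: struct sender, queue_request,
unqueue_page and next_page_to_send are hypothetical names, the FIFO is
a fixed-capacity array that does not wrap, and a 4096-byte page is
assumed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u   /* assumed page size for this sketch */
#define NPAGES    8u
#define QUEUE_CAP 4u

/* One queued request: a byte range that is consumed a page at a time. */
struct page_req {
    uint64_t offset;
    uint64_t len;
};

/* Hypothetical sender state; a stand-in for the migration bitmap + queue. */
struct sender {
    bool dirty[NPAGES];
    struct page_req queue[QUEUE_CAP];
    size_t q_head, q_tail;
    size_t scan_pos;          /* where the linear walk resumes */
};

static bool queue_request(struct sender *s, uint64_t offset, uint64_t len)
{
    if (s->q_tail == QUEUE_CAP) {
        return false;         /* sketch only: no wrap-around */
    }
    s->queue[s->q_tail++] = (struct page_req){ offset, len };
    return true;
}

/* Pop one page from the head request; false if the queue is empty. */
static bool unqueue_page(struct sender *s, size_t *page)
{
    if (s->q_head == s->q_tail) {
        return false;
    }
    struct page_req *r = &s->queue[s->q_head];
    *page = (size_t)(r->offset / PAGE_SIZE);
    if (r->len > PAGE_SIZE) {
        r->len -= PAGE_SIZE;  /* more pages left in this request */
        r->offset += PAGE_SIZE;
    } else {
        s->q_head++;          /* request fully consumed */
    }
    return true;
}

/* Returns the next page index to send, or -1 if nothing is dirty. */
static long next_page_to_send(struct sender *s)
{
    size_t page;

    /* Queued requests first; skip pages that are no longer dirty. */
    while (unqueue_page(s, &page)) {
        if (page < NPAGES && s->dirty[page]) {
            s->dirty[page] = false;
            s->scan_pos = page + 1;  /* linear walk carries on after it */
            return (long)page;
        }
    }
    /* Queue empty: one circular scan for any dirty page. */
    for (size_t i = 0; i < NPAGES; i++) {
        size_t p = (s->scan_pos + i) % NPAGES;
        if (s->dirty[p]) {
            s->dirty[p] = false;
            s->scan_pos = p + 1;
            return (long)p;
        }
    }
    return -1;
}
```

With every page dirty and a two-page request queued at page 5, the
sketch hands out pages 5 and 6 from the queue and then resumes the
linear walk at page 7, which is the behaviour note (a) describes.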

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c | 130 +++++++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 106 insertions(+), 24 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index cc4acea..c006d21 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -458,6 +458,19 @@ static inline bool migration_bitmap_set_dirty(ram_addr_t addr)
     return ret;
 }
 
+static inline bool migration_bitmap_clear_dirty(ram_addr_t addr)
+{
+    bool ret;
+    int nr = addr >> TARGET_PAGE_BITS;
+
+    ret = test_and_clear_bit(nr, migration_bitmap);
+
+    if (ret) {
+        migration_dirty_pages--;
+    }
+    return ret;
+}
+
 static void migration_bitmap_sync_range(ram_addr_t start, ram_addr_t length)
 {
     ram_addr_t addr;
@@ -658,6 +671,39 @@ static int ram_save_page(QEMUFile *f, RAMBlock* block, ram_addr_t offset,
 }
 
 /*
+ * Unqueue a page from the queue fed by postcopy page requests
+ *
+ * Returns:   The RAMBlock* to transmit from (or NULL if the queue is empty)
+ *      ms:   MigrationState in
+ *  offset:   the byte offset within the RAMBlock for the start of the page
+ * bitoffset: global offset in the dirty/sent bitmaps
+ */
+static RAMBlock *ram_save_unqueue_page(MigrationState *ms, ram_addr_t *offset,
+                                       unsigned long *bitoffset)
+{
+    RAMBlock *result = NULL;
+    qemu_mutex_lock(&ms->src_page_req_mutex);
+    if (!QSIMPLEQ_EMPTY(&ms->src_page_requests)) {
+        struct MigrationSrcPageRequest *entry =
+                                    QSIMPLEQ_FIRST(&ms->src_page_requests);
+        result = entry->rb;
+        *offset = entry->offset;
+        *bitoffset = (entry->offset + entry->rb->offset) >> TARGET_PAGE_BITS;
+
+        if (entry->len > TARGET_PAGE_SIZE) {
+            entry->len -= TARGET_PAGE_SIZE;
+            entry->offset += TARGET_PAGE_SIZE;
+        } else {
+            QSIMPLEQ_REMOVE_HEAD(&ms->src_page_requests, next_req);
+            g_free(entry);
+        }
+    }
+    qemu_mutex_unlock(&ms->src_page_req_mutex);
+
+    return result;
+}
+
+/*
  * Queue the pages for transmission, e.g. a request from postcopy destination
  *   ms: MigrationStatus in which the queue is held
  *   rbname: The RAMBlock the request is for - may be NULL (to mean reuse last)
@@ -718,44 +764,80 @@ int ram_save_queue_pages(MigrationState *ms, const char *rbname,
 
 static int ram_find_and_save_block(QEMUFile *f, bool last_stage)
 {
+    MigrationState *ms = migrate_get_current();
     RAMBlock *block = last_seen_block;
+    RAMBlock *tmpblock;
     ram_addr_t offset = last_offset;
+    ram_addr_t tmpoffset;
     bool complete_round = false;
     int bytes_sent = 0;
-    MemoryRegion *mr;
     unsigned long bitoffset;
 
     if (!block)
         block = QTAILQ_FIRST(&ram_list.blocks);
 
-    while (true) {
-        mr = block->mr;
-        offset = migration_bitmap_find_and_reset_dirty(mr, offset, &bitoffset);
-        if (complete_round && block == last_seen_block &&
-            offset >= last_offset) {
-            break;
-        }
-        if (offset >= block->length) {
-            offset = 0;
-            block = QTAILQ_NEXT(block, next);
-            if (!block) {
-                block = QTAILQ_FIRST(&ram_list.blocks);
-                complete_round = true;
-                ram_bulk_stage = false;
+    while (true) { /* Until we send a block or run out of stuff to send */
+        tmpblock = ram_save_unqueue_page(ms, &tmpoffset, &bitoffset);
+        if (tmpblock) {
+            /* We've got a block from the postcopy queue */
+            DPRINTF("%s: Got postcopy item '%s' offset=%zx bitoffset=%zx",
+                    __func__, tmpblock->idstr, tmpoffset, bitoffset);
+            /* We're sending this page, and since it's postcopy nothing else
+             * will dirty it, and we must make sure it doesn't get sent again.
+             */
+            if (!migration_bitmap_clear_dirty(bitoffset << TARGET_PAGE_BITS)) {
+                DPRINTF("%s: Not dirty for postcopy %s/%zx bito=%zx (sent=%d)",
+                        __func__, tmpblock->idstr, tmpoffset, bitoffset,
+                        test_bit(bitoffset, ms->sentmap));
+                continue;
             }
+            /*
+             * As soon as we start servicing pages out of order we have to
+             * end the bulk stage: migration_bitmap_find_and_reset_dirty
+             * assumes during the bulk stage that every page is dirty,
+             * and that is no longer true.
+             */
+            ram_bulk_stage = false;
+            /*
+             * We mustn't change block/offset unless it's to a valid one
+             * otherwise we can go down some of the exit cases in the normal
+             * path.
+             */
+            block = tmpblock;
+            offset = tmpoffset;
         } else {
-            bytes_sent = ram_save_page(f, block, offset, last_stage);
-
-            /* if page is unmodified, continue to the next */
-            if (bytes_sent > 0) {
-                MigrationState *s = migrate_get_current();
-                if (s->sentmap) {
-                    set_bit(bitoffset, s->sentmap);
+            MemoryRegion *mr;
+            /* priority queue empty, so just search for something dirty */
+            mr = block->mr;
+            offset = migration_bitmap_find_and_reset_dirty(mr, offset,
+                                                           &bitoffset);
+            if (complete_round && block == last_seen_block &&
+                offset >= last_offset) {
+                break;
+            }
+            if (offset >= block->length) {
+                offset = 0;
+                block = QTAILQ_NEXT(block, next);
+                if (!block) {
+                    block = QTAILQ_FIRST(&ram_list.blocks);
+                    complete_round = true;
+                    ram_bulk_stage = false;
                 }
+                continue; /* pick an offset in the new block */
+            }
+        }
 
-                last_sent_block = block;
-                break;
+        /* We have a page to send, so send it */
+        bytes_sent = ram_save_page(f, block, offset, last_stage);
+
+        /* if page is unmodified, continue to the next */
+        if (bytes_sent > 0) {
+            if (ms->sentmap) {
+                set_bit(bitoffset, ms->sentmap);
             }
+
+            last_sent_block = block;
+            break;
         }
     }
     last_seen_block = block;
-- 
1.9.3


* [Qemu-devel] [PATCH 37/46] Add assertion to check migration_dirty_pages doesn't go -ve; have seen it happen once but not sure why
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (35 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 36/46] Page request: Consume pages off the post-copy queue Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-11 15:20   ` Eric Blake
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 38/46] postcopy_ram.c: place_page and helpers Dr. David Alan Gilbert (git)
                   ` (9 subsequent siblings)
  46 siblings, 1 reply; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch_init.c b/arch_init.c
index c006d21..58eccc1 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -439,6 +439,7 @@ ram_addr_t migration_bitmap_find_and_reset_dirty(MemoryRegion *mr,
 
     if (next < size) {
         clear_bit(next, migration_bitmap);
+        assert(migration_dirty_pages > 0);
         migration_dirty_pages--;
     }
     *bitoffset = next;
-- 
1.9.3


* [Qemu-devel] [PATCH 38/46] postcopy_ram.c: place_page and helpers
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (36 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 37/46] Add assertion to check migration_dirty_pages doesn't go -ve; have seen it happen once but not sure why Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 39/46] Postcopy: Use helpers to map pages during migration Dr. David Alan Gilbert (git)
                   ` (8 subsequent siblings)
  46 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

postcopy_place_page (etc) provide a way for postcopy to place a page
into the guest's memory atomically (using the new remap_anon_pages
syscall).
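
As a rough illustration of the staging-page half of this, the sketch
below lazily maps one anonymous page and reuses it. It is a minimal
sketch assuming Linux/glibc; struct incoming, get_tmp_page and
release_tmp_page are hypothetical stand-ins rather than the functions
in this patch, and the remap_anon_pages step itself is not shown.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical holder for the staging page. */
struct incoming {
    void *tmp_page;
};

/* Lazily map one anonymous page; returns NULL on failure. */
static void *get_tmp_page(struct incoming *in)
{
    if (!in->tmp_page) {
        size_t ps = (size_t)sysconf(_SC_PAGESIZE);
        void *p = mmap(NULL, ps, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
            /* mmap reports failure with MAP_FAILED, not NULL */
            return NULL;
        }
        in->tmp_page = p;
    }
    return in->tmp_page;
}

/* Unmap the staging page, if any, and forget the stale pointer. */
static void release_tmp_page(struct incoming *in)
{
    if (in->tmp_page) {
        munmap(in->tmp_page, (size_t)sysconf(_SC_PAGESIZE));
        in->tmp_page = NULL;
    }
}
```

Keeping the pointer NULL whenever no mapping exists means a later
cleanup pass can test it safely.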

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h    |   1 +
 include/migration/postcopy-ram.h |  23 +++++++++
 postcopy-ram.c                   | 105 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 129 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index fe639b4..1a33b05 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -93,6 +93,7 @@ struct MigrationIncomingState {
     QEMUFile *return_path;
     QemuMutex      rp_mutex;    /* We send replies from multiple threads */
     PostcopyPMI    postcopy_pmi;
+    void          *postcopy_tmp_page;
 };
 
 MigrationIncomingState *migration_incoming_state_init(QEMUFile *f);
diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index 383b1e8..57a74f0 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -53,6 +53,29 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis);
 int postcopy_send_discard_bm_ram(MigrationState *ms, const char *name,
                                  unsigned long start, unsigned long end);
 
+/*
+ * Place a zero'd page of memory at *host
+ * returns 0 on success
+ */
+int postcopy_place_zero_page(MigrationIncomingState *mis, void *host,
+                             long bitmap_offset);
+
+/*
+ * Place a page (from) at (host) efficiently
+ *    There are restrictions on how 'from' must be mapped, in general best
+ *    to use other postcopy_ routines to allocate.
+ * returns 0 on success
+ */
+int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
+                        long bitmap_offset);
+
+/*
+ * Allocate a page of memory that can be mapped at a later point in time
+ * using postcopy_place_page
+ * Returns: Pointer to allocated page
+ */
+void *postcopy_get_tmp_page(MigrationIncomingState *mis);
+
 void postcopy_pmi_destroy(MigrationIncomingState *mis);
 void postcopy_pmi_discard_range(MigrationIncomingState *mis,
                                 size_t start, size_t npages);
diff --git a/postcopy-ram.c b/postcopy-ram.c
index fb7b02b..de3534f 100644
--- a/postcopy-ram.c
+++ b/postcopy-ram.c
@@ -334,6 +334,10 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
         return -1;
     }
 
+    if (mis->postcopy_tmp_page) {
+        munmap(mis->postcopy_tmp_page, getpagesize());
+        mis->postcopy_tmp_page = NULL;
+    }
     return 0;
 }
 
@@ -390,6 +394,88 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
     return 0;
 }
 
+/*
+ * Place a zero'd page of memory at *host
+ * returns 0 on success
+ * bitmap_offset: Index into the migration bitmaps
+ */
+int postcopy_place_zero_page(MigrationIncomingState *mis, void *host,
+                             long bitmap_offset)
+{
+    void *tmp = postcopy_get_tmp_page(mis);
+    if (!tmp) {
+        return -ENOMEM;
+    }
+    *(char *)tmp = 0; /* touch the page so it has a (zeroed) backing page */
+    return postcopy_place_page(mis, host, tmp, bitmap_offset);
+}
+
+/*
+ * Place a page (from) at (host) efficiently
+ *    There are restrictions on how 'from' must be mapped, in general best
+ *    to use other postcopy_ routines to allocate.
+ * returns 0 on success
+ * bitmap_offset: Index into the migration bitmaps
+ */
+int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
+                        long bitmap_offset)
+{
+    PostcopyPMIState old_state, tmp_state;
+
+    if (syscall(__NR_remap_anon_pages, host, from, getpagesize(), 0) !=
+            getpagesize()) {
+        int e = errno; /* perror/fprintf below may clobber errno */
+        perror("remap_anon_pages in postcopy_place_page");
+        fprintf(stderr, "host: %p from: %p pmi=%d\n", host, from,
+                postcopy_pmi_get_state(mis, bitmap_offset));
+        return -e;
+    }
+
+    tmp_state = postcopy_pmi_get_state(mis, bitmap_offset);
+    do {
+        old_state = tmp_state;
+        tmp_state = postcopy_pmi_change_state(mis, bitmap_offset, old_state,
+                                              POSTCOPY_PMI_RECEIVED);
+
+    } while (old_state != tmp_state);
+
+
+    if (old_state == POSTCOPY_PMI_REQUESTED) {
+        /* TODO: Notify kernel */
+    }
+
+    /* TODO: hostpagesize!=targetpagesize case */
+    return 0;
+}
+
+/*
+ * Returns a page of memory that can be mapped at a later point in time
+ * using postcopy_place_page
+ * The same address is used repeatedly, postcopy_place_page just takes the
+ * backing page away.
+ * Returns: Pointer to allocated page
+ */
+void *postcopy_get_tmp_page(MigrationIncomingState *mis)
+{
+    if (!mis->postcopy_tmp_page) {
+        mis->postcopy_tmp_page = mmap(NULL, getpagesize(),
+                             PROT_READ | PROT_WRITE, MAP_PRIVATE |
+                             MAP_ANONYMOUS, -1, 0);
+        if (mis->postcopy_tmp_page == MAP_FAILED) {
+            mis->postcopy_tmp_page = NULL;
+            perror("mapping postcopy tmp page");
+            return NULL;
+        }
+        if (madvise(mis->postcopy_tmp_page, getpagesize(), MADV_DONTFORK)) {
+            perror("postcopy tmp page DONTFORK");
+            munmap(mis->postcopy_tmp_page, getpagesize());
+            mis->postcopy_tmp_page = NULL;
+            return NULL;
+        }
+    }
+    return mis->postcopy_tmp_page;
+}
+
 #else
 /* No target OS support, stubs just fail */
 int postcopy_ram_hosttest(void)
@@ -422,6 +508,27 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
     fprintf(stderr, "postcopy_ram_enable_notify: No OS support\n");
     return -1;
 }
+
+int postcopy_place_zero_page(MigrationIncomingState *mis, void *host,
+                             long bitmap_offset)
+{
+    error_report("postcopy_place_zero_page: No OS support");
+    return -1;
+}
+
+int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
+                        long bitmap_offset)
+{
+    error_report("postcopy_place_page: No OS support");
+    return -1;
+}
+
+void *postcopy_get_tmp_page(MigrationIncomingState *mis)
+{
+    error_report("postcopy_get_tmp_page: No OS support");
+    return NULL;
+}
+
 #endif
 
 /* ------------------------------------------------------------------------- */
-- 
1.9.3


* [Qemu-devel] [PATCH 39/46] Postcopy: Use helpers to map pages during migration
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (37 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 38/46] postcopy_ram.c: place_page and helpers Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 40/46] qemu_ram_block_from_host Dr. David Alan Gilbert (git)
                   ` (7 subsequent siblings)
  46 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

In postcopy, the destination guest is running at the same time
as it's receiving pages; as we receive new pages we must put
them into the guest's address space atomically to avoid a running
CPU accessing a partially written page.

Use the helpers in postcopy-ram.c to map these pages.
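
The shape of the change can be sketched as follows. This is an
illustration under assumptions, not the ram_load code: load_page,
place_fn and place_by_copy are hypothetical names, a 4096-byte page is
assumed, and place_by_copy is a plain memcpy stand-in that is NOT
atomic; in the series that role is played by postcopy_place_page.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096u   /* assumed page size for this sketch */

/* Hypothetical placement hook: moves a staged page to 'host'. */
typedef int (*place_fn)(void *host, void *staged);

/* Stand-in so the sketch runs; a memcpy is NOT an atomic placement. */
static int place_by_copy(void *host, void *staged)
{
    memcpy(host, staged, PAGE_SIZE);
    return 0;
}

/* Load one page of payload destined for guest memory at 'host'. */
static int load_page(void *host, const void *payload, bool postcopy_running,
                     void *staging, place_fn place)
{
    if (!postcopy_running) {
        /* Guest is not running yet: writing in place is safe. */
        memcpy(host, payload, PAGE_SIZE);
        return 0;
    }
    /* Guest is running: build the page where it cannot be seen... */
    memcpy(staging, payload, PAGE_SIZE);
    /* ...then hand it to the placement step in one go. */
    return place(host, staging);
}
```

The point of the indirection is that the guest-visible address is only
ever touched by the placement step once the staged page is complete.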

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 73 insertions(+), 7 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 58eccc1..b971f47 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -1328,9 +1328,18 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
     return 0;
 }
 
+/*
+ * Read a RAMBlock ID from the stream f, find the host address of the
+ * start of that block and add on 'offset'
+ *
+ * f: Stream to read from
+ * offset: Offset within the block
+ * flags: Page flags (mostly to see if it's a continuation of previous block)
+ * rb: Pointer to RAMBlock* that gets filled in with the RB we find
+ */
 static inline void *host_from_stream_offset(QEMUFile *f,
                                             ram_addr_t offset,
-                                            int flags)
+                                            int flags, RAMBlock **rb)
 {
     static RAMBlock *block = NULL;
     char id[256];
@@ -1341,6 +1350,9 @@ static inline void *host_from_stream_offset(QEMUFile *f,
             error_report("Ack, bad migration stream!");
             return NULL;
         }
+        if (rb) {
+            *rb = block;
+        }
 
         return memory_region_get_ram_ptr(block->mr) + offset;
     }
@@ -1350,8 +1362,12 @@ static inline void *host_from_stream_offset(QEMUFile *f,
     id[len] = 0;
 
     QTAILQ_FOREACH(block, &ram_list.blocks, next) {
-        if (!strncmp(id, block->idstr, sizeof(id)))
+        if (!strncmp(id, block->idstr, sizeof(id))) {
+            if (rb) {
+                *rb = block;
+            }
             return memory_region_get_ram_ptr(block->mr) + offset;
+        }
     }
 
     error_report("Can't find block %s!", id);
@@ -1385,6 +1401,12 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
     ram_addr_t addr;
     int flags, ret = 0;
     static uint64_t seq_iter;
+    /*
+     * System is running in postcopy mode, page inserts to host memory must be
+     * atomic
+     */
+    bool postcopy_running = f->mis->postcopy_ram_state >=
+                            POSTCOPY_RAM_INCOMING_LISTENING;
 
     seq_iter++;
 
@@ -1439,8 +1461,9 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
         } else if (flags & RAM_SAVE_FLAG_COMPRESS) {
             void *host;
             uint8_t ch;
+            RAMBlock *rb;
 
-            host = host_from_stream_offset(f, addr, flags);
+            host = host_from_stream_offset(f, addr, flags, &rb);
             if (!host) {
                 error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
                 ret = -EINVAL;
@@ -1448,20 +1471,63 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
             }
 
             ch = qemu_get_byte(f);
-            ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
+            if (!postcopy_running) {
+                ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
+            } else {
+                if (!ch) {
+                    ret = postcopy_place_zero_page(f->mis, host,
+                              (addr + rb->offset) >> TARGET_PAGE_BITS);
+                } else {
+                    void *tmp;
+                    tmp = postcopy_get_tmp_page(f->mis);
+                    if (!tmp) {
+                        return -ENOMEM;
+                    }
+                    memset(tmp, ch, TARGET_PAGE_SIZE);
+                    ret = postcopy_place_page(f->mis, host, tmp,
+                              (addr + rb->offset) >> TARGET_PAGE_BITS);
+                }
+                if (ret) {
+                    error_report("ram_load: Failure in postcopy compress @"
+                                 "%zx/%p;%s+%zx",
+                                 addr, host, rb->idstr, rb->offset);
+                    return ret;
+                }
+            }
         } else if (flags & RAM_SAVE_FLAG_PAGE) {
             void *host;
+            RAMBlock *rb;
 
-            host = host_from_stream_offset(f, addr, flags);
+            host = host_from_stream_offset(f, addr, flags, &rb);
             if (!host) {
                 error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
                 ret = -EINVAL;
                 break;
             }
 
-            qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
+            if (!postcopy_running) {
+                qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
+            } else {
+                void *tmp = postcopy_get_tmp_page(f->mis);
+                if (!tmp) {
+                    return -ENOMEM;
+                }
+                qemu_get_buffer(f, tmp, TARGET_PAGE_SIZE);
+                ret = postcopy_place_page(f->mis, host, tmp,
+                          (addr + rb->offset) >> TARGET_PAGE_BITS);
+                if (ret) {
+                    error_report("ram_load: Failure in postcopy simple "
+                                 "@%zx/%p;%s+%zx",
+                                 addr, host, rb->idstr, rb->offset);
+                    return ret;
+                }
+            }
         } else if (flags & RAM_SAVE_FLAG_XBZRLE) {
-            void *host = host_from_stream_offset(f, addr, flags);
+            if (postcopy_running) {
+                error_report("XBZRLE RAM block in postcopy mode @%zx", addr);
+                return -EINVAL;
+            }
+            void *host = host_from_stream_offset(f, addr, flags, NULL);
             if (!host) {
                 error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
                 ret = -EINVAL;
-- 
1.9.3


* [Qemu-devel] [PATCH 40/46] qemu_ram_block_from_host
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (38 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 39/46] Postcopy: Use helpers to map pages during migration Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 41/46] Handle userfault requests (although userfaultfd not done yet) Dr. David Alan Gilbert (git)
                   ` (6 subsequent siblings)
  46 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Postcopy sends RAMBlock names and offsets over the wire (since it can't
rely on the order of ramaddr being the same), and it starts out with
HVA fault addresses from the kernel.

qemu_ram_block_from_host translates an HVA into a RAMBlock, an offset
in the RAMBlock, the global ram_addr_t value and its bitmap position.

Rewrite qemu_ram_addr_from_host to use qemu_ram_block_from_host.

Provide qemu_ram_get_idstr since it's the actual name text sent on the
wire.
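
The translation itself can be sketched on a simplified block list.
Everything here is an assumption made for illustration: struct block is
a cut-down stand-in for RAMBlock, block_from_host is a hypothetical
name, and 4 KiB target pages are assumed.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_BITS 12   /* assumed 4 KiB target pages */

/* Simplified stand-in for a RAMBlock. */
struct block {
    const char *idstr;  /* name sent on the wire */
    uint8_t *host;      /* host virtual address of the block's start */
    uint64_t offset;    /* block's start in the global ram address space */
    uint64_t length;    /* size of the block in bytes */
};

/* Translate a host pointer; returns NULL if it is outside every block. */
static const struct block *block_from_host(const struct block *blocks,
                                           size_t n, const void *ptr,
                                           uint64_t *ram_addr,
                                           uint64_t *offset,
                                           unsigned long *bm_index)
{
    uintptr_t p = (uintptr_t)ptr;

    for (size_t i = 0; i < n; i++) {
        uintptr_t base = (uintptr_t)blocks[i].host;
        if (p >= base && p - base < blocks[i].length) {
            *offset = p - base;                      /* within the block */
            *ram_addr = blocks[i].offset + *offset;  /* global address */
            *bm_index = (unsigned long)(*ram_addr >> PAGE_BITS);
            return &blocks[i];
        }
    }
    return NULL;
}
```

The block name plus the in-block offset is what would go on the wire,
while the bitmap index is only meaningful locally.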

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 exec.c                    | 50 ++++++++++++++++++++++++++++++++++++++++++-----
 include/exec/cpu-common.h |  3 +++
 2 files changed, 48 insertions(+), 5 deletions(-)

diff --git a/exec.c b/exec.c
index a9ad052..cc446ec 100644
--- a/exec.c
+++ b/exec.c
@@ -1176,6 +1176,11 @@ static RAMBlock *find_ram_block(ram_addr_t addr)
     return NULL;
 }
 
+const char *qemu_ram_get_idstr(RAMBlock *rb)
+{
+    return rb->idstr;
+}
+
 void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev)
 {
     RAMBlock *new_block = find_ram_block(addr);
@@ -1515,16 +1520,32 @@ static void *qemu_ram_ptr_length(ram_addr_t addr, hwaddr *size)
     }
 }
 
-/* Some of the softmmu routines need to translate from a host pointer
-   (typically a TLB entry) back to a ram offset.  */
-MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
+/*
+ * Translates a host ptr back to a RAMBlock, a ram_addr and an offset
+ * in that RAMBlock.
+ *
+ * ptr: Host pointer to look up
+ * *ram_addr: set to result ram_addr
+ * *offset: set to result offset within the RAMBlock
+ * *bm_index: bitmap index (i.e. scaled ram_addr for use where the scale
+ *                          isn't available)
+ *
+ * Returns: RAMBlock (or NULL if not found)
+ */
+RAMBlock *qemu_ram_block_from_host(void *ptr, ram_addr_t *ram_addr,
+                                   ram_addr_t *offset, unsigned long *bm_index)
 {
     RAMBlock *block;
     uint8_t *host = ptr;
 
     if (xen_enabled()) {
         *ram_addr = xen_ram_addr_from_mapcache(ptr);
-        return qemu_get_ram_block(*ram_addr)->mr;
+        block = qemu_get_ram_block(*ram_addr);
+        if (!block) {
+            return NULL;
+        }
+        *offset = (host - block->host);
+        return block;
     }
 
     block = ram_list.mru_block;
@@ -1545,7 +1566,26 @@ MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
     return NULL;
 
 found:
-    *ram_addr = block->offset + (host - block->host);
+    *offset = (host - block->host);
+    *ram_addr = block->offset + *offset;
+    *bm_index = *ram_addr >> TARGET_PAGE_BITS;
+    return block;
+}
+
+/* Some of the softmmu routines need to translate from a host pointer
+   (typically a TLB entry) back to a ram offset.  */
+MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
+{
+    RAMBlock *block;
+    ram_addr_t offset; /* Not used */
+    unsigned long index; /* Not used */
+
+    block = qemu_ram_block_from_host(ptr, ram_addr, &offset, &index);
+
+    if (!block) {
+        return NULL;
+    }
+
     return block->mr;
 }
 
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 8042f50..117bf21 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -55,8 +55,11 @@ typedef uint32_t CPUReadMemoryFunc(void *opaque, hwaddr addr);
 void qemu_ram_remap(ram_addr_t addr, ram_addr_t length);
 /* This should not be used by devices.  */
 MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr);
+RAMBlock *qemu_ram_block_from_host(void *ptr, ram_addr_t *ram_addr,
+                                   ram_addr_t *offset, unsigned long *bm_index);
 void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev);
 void qemu_ram_unset_idstr(ram_addr_t addr);
+const char *qemu_ram_get_idstr(RAMBlock *rb);
 
 void cpu_physical_memory_rw(hwaddr addr, uint8_t *buf,
                             int len, int is_write);
-- 
1.9.3


* [Qemu-devel] [PATCH 41/46] Handle userfault requests (although userfaultfd not done yet)
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (39 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 40/46] qemu_ram_block_from_host Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 42/46] Start up a postcopy/listener thread ready for incoming page data Dr. David Alan Gilbert (git)
                   ` (5 subsequent siblings)
  46 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  1 +
 postcopy-ram.c                | 93 +++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 90 insertions(+), 4 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 1a33b05..46fc37b 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -89,6 +89,7 @@ struct MigrationIncomingState {
 
     QemuThread     fault_thread;
     QemuSemaphore  fault_thread_sem;
+    int            userfault_fd;
 
     QEMUFile *return_path;
     QemuMutex      rp_mutex;    /* We send replies from multiple threads */
diff --git a/postcopy-ram.c b/postcopy-ram.c
index de3534f..8d0a225 100644
--- a/postcopy-ram.c
+++ b/postcopy-ram.c
@@ -183,7 +183,6 @@ static void postcopy_pmi_dump(MigrationIncomingState *mis)
     ram_debug_dump_bitmap(mis->postcopy_pmi.received_map, true);
 }
 
-/* ---------------------------------------------------------------------- */
 int postcopy_ram_hosttest(void)
 {
     /* TODO: Needs guarding with CONFIG_ once we have libc's that have the defs
@@ -367,12 +366,97 @@ static int postcopy_ram_sensitise_area(const char *block_name, void *host_addr,
 static void *postcopy_ram_fault_thread(void *opaque)
 {
     MigrationIncomingState *mis = (MigrationIncomingState *)opaque;
+    void *hostaddr;
+    int ret;
+    size_t hostpagesize = getpagesize();
+    RAMBlock *rb = NULL;
+    RAMBlock *last_rb = NULL;
 
-    fprintf(stderr, "postcopy_ram_fault_thread\n");
-    /* TODO: In later patch */
+    DPRINTF("%s", __func__);
     qemu_sem_post(&mis->fault_thread_sem);
     while (1) {
-        /* TODO: In later patch */
+        PostcopyPMIState old_state, tmp_state;
+        ram_addr_t rb_offset;
+        ram_addr_t in_raspace;
+        unsigned long bitmap_index;
+
+        /* Read a faulting HVA from the kernel */
+        ret = read(mis->userfault_fd, &hostaddr, sizeof(hostaddr));
+        if (ret != sizeof(hostaddr)) {
+            if (ret < 0) {
+                perror("Failed to read full userfault hostaddr");
+                break;
+            } else {
+                error_report("%s: Read %d bytes from userfaultfd expected %zu",
+                             __func__, ret, sizeof(hostaddr));
+                break; /* Lost alignment, don't know what we'd read next */
+            }
+        }
+
+        /* TODO: We want to be marking host-page-size areas of the bitmaps? */
+        last_rb = rb;
+        rb = qemu_ram_block_from_host(hostaddr, &in_raspace, &rb_offset,
+                                      &bitmap_index);
+        if (!rb) {
+            error_report("postcopy_ram_fault_thread: Fault outside guest: %p",
+                         hostaddr);
+            break;
+        }
+
+        DPRINTF("%s: Request for HVA=%p index=%lx rb=%s offset=%zx",
+                __func__, hostaddr, bitmap_index, qemu_ram_get_idstr(rb),
+                rb_offset);
+
+        tmp_state = postcopy_pmi_get_state(mis, bitmap_index);
+        do {
+            old_state = tmp_state;
+
+            switch (old_state) {
+            case POSTCOPY_PMI_REQUESTED:
+                /* Do nothing - it's already requested */
+                break;
+
+            case POSTCOPY_PMI_RECEIVED:
+                /* Already arrived - no state change, just kick the kernel */
+                DPRINTF("postcopy_ram_fault_thread: notify pre of %p",
+                        hostaddr);
+                /* TODO! Send ack
+                if (ack_userfault(mis, hostaddr, hostpagesize)) {
+                    assert(0);
+                } */
+                break;
+
+            case POSTCOPY_PMI_MISSING:
+
+                tmp_state = postcopy_pmi_change_state(mis, bitmap_index,
+                                           old_state, POSTCOPY_PMI_REQUESTED);
+                if (tmp_state == POSTCOPY_PMI_MISSING) {
+                    /*
+                     * Send the request to the source - we want to request one
+                     * of our host page sizes (which is >= TPS)
+                     */
+                    if (rb != last_rb) {
+                        migrate_send_rp_reqpages(mis, qemu_ram_get_idstr(rb),
+                                                 rb_offset, hostpagesize);
+                    } else {
+                        /* Save some space */
+                        migrate_send_rp_reqpages(mis, NULL,
+                                                 rb_offset, hostpagesize);
+                    }
+
+                    if (mis->postcopy_ram_state == POSTCOPY_RAM_INCOMING_END) {
+                        /* This shouldn't happen - the command to close the
+                         * postcopy stream should be after the last page of RAM
+                         * so we're not going to get an answer
+                         */
+                        error_report("postcopy_ram_fault_thread: UF after end");
+                        postcopy_pmi_dump(mis);
+                        assert(0);
+                    }
+                }
+                break;
+           }
+        } while (tmp_state != old_state);
     }
 
     return NULL;
@@ -381,6 +465,7 @@ static void *postcopy_ram_fault_thread(void *opaque)
 int postcopy_ram_enable_notify(MigrationIncomingState *mis)
 {
     /* Create the fault handler thread and wait for it to be ready */
+    mis->userfault_fd = -1; /* TODO */
     qemu_sem_init(&mis->fault_thread_sem, 0);
     qemu_thread_create(&mis->fault_thread, "postcopy/fault",
                        postcopy_ram_fault_thread, mis, QEMU_THREAD_JOINABLE);
-- 
1.9.3

* [Qemu-devel] [PATCH 42/46] Start up a postcopy/listener thread ready for incoming page data
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (40 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 41/46] Handle userfault requests (although userfaultfd not done yet) Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 43/46] postcopy: Wire up loadvm_postcopy_ram_handle_{run, end} commands Dr. David Alan Gilbert (git)
                   ` (4 subsequent siblings)
  46 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The loading of a device state (during postcopy) may access guest
memory that's still on the source machine and thus might need
a page fill; split off a separate thread that handles the incoming
page data so that the original incoming migration code can finish
off the device data.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
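The listen handler below returns a combination of exit-code flags so that the
loop which parsed the command stops and leaves the stream to the new thread.
A rough sketch of how a command loop could interpret such flags follows. The
flag values and their exact meaning are assumptions made for illustration
(the real LOADVM_EXITCODE_* definitions come from an earlier patch), and
run_commands is a hypothetical helper, not a function in this series.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed flag values, for illustration only */
#define SK_QUITLOOP      1   /* stop this loop */
#define SK_QUITPARENT    2   /* stop this loop and the one that called it */
#define SK_KEEPHANDLERS  4   /* do not free the handler list on exit */

struct loop_result {
    int  ret;            /* 0, or the negative error from a handler */
    bool parent_quits;   /* the caller's loop must stop reading too */
    bool free_handlers;  /* handler list may be torn down */
};

typedef int (*cmd_handler)(void);

/* Run handlers in order until one fails or asks the loop to stop */
static struct loop_result run_commands(cmd_handler *cmds, int n)
{
    struct loop_result r = { 0, false, true };

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        int rc = cmds[i]();

        if (rc < 0) {
            r.ret = rc;
            break;
        }
        if (rc & SK_KEEPHANDLERS) {
            r.free_handlers = false;
        }
        if (rc & SK_QUITPARENT) {
            r.parent_quits = true;
        }
        if (rc & (SK_QUITLOOP | SK_QUITPARENT)) {
            break;
        }
    }
    return r;
}

/* Stand-ins for real command handlers */
static int cmd_ok(void)     { return 0; }
static int cmd_listen(void) { return SK_QUITPARENT | SK_KEEPHANDLERS; }
static int cmd_fail(void)   { return -1; }
```

With the sequence { cmd_ok, cmd_listen, cmd_fail } the loop stops at
cmd_listen, so the failing handler never runs on this thread.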
 include/migration/migration.h |  4 +++
 migration.c                   |  6 +++++
 savevm.c                      | 63 +++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 46fc37b..3313b3c 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -89,6 +89,10 @@ struct MigrationIncomingState {
 
     QemuThread     fault_thread;
     QemuSemaphore  fault_thread_sem;
+    bool           have_listen_thread;
+    QemuThread     listen_thread;
+    QemuSemaphore  listen_thread_sem;
+
     int            userfault_fd;
 
     QEMUFile *return_path;
diff --git a/migration.c b/migration.c
index 66d281b..fc8911d 100644
--- a/migration.c
+++ b/migration.c
@@ -1045,6 +1045,12 @@ static int postcopy_start(MigrationState *ms)
      */
     QEMUFile *fb = qemu_bufopen("w", NULL);
 
+    /*
+     * Make sure the receiver can get incoming pages before we send the rest
+     * of the state
+     */
+    qemu_savevm_send_postcopy_ram_listen(fb);
+
     qemu_savevm_state_complete(fb);
     DPRINTF("postcopy_start: sending req 3\n");
     qemu_savevm_send_reqack(fb, 3);
diff --git a/savevm.c b/savevm.c
index 1d5375c..f4907db 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1259,9 +1259,46 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
     return 0;
 }
 
+typedef struct ram_listen_thread_data {
+    QEMUFile *f;
+    LoadStateEntry_Head *lh;
+} ram_listen_thread_data;
+
+/*
+ * Triggered by a postcopy_listen command; this thread takes over reading
+ * the input stream, leaving the main thread free to carry on loading the rest
+ * of the device state (from RAM).
+ * (TODO:This could do with being in a postcopy file - but there again it's
+ * just another input loop, not that postcopy specific)
+ */
+static void *postcopy_ram_listen_thread(void *opaque)
+{
+    ram_listen_thread_data *rltd = opaque;
+    int load_res;
+
+    qemu_sem_post(&rltd->f->mis->listen_thread_sem);
+    DPRINTF("postcopy_ram_listen_thread start");
+
+    load_res = qemu_loadvm_state_main(rltd->f, rltd->lh);
+
+    DPRINTF("postcopy_ram_listen_thread exiting");
+    if (load_res) {
+        error_report("%s: loadvm failed: %d", __func__, load_res);
+        qemu_file_set_error(rltd->f, load_res);
+    }
+    /* TODO: Find somewhere better for this! */
+    close(rltd->f->mis->userfault_fd);
+    postcopy_ram_incoming_cleanup(rltd->f->mis);
+    g_free(rltd);
+
+    return NULL;
+}
+
 /* After this message we must be able to immediately receive page data */
 static int loadvm_postcopy_ram_handle_listen(MigrationIncomingState *mis)
 {
+    ram_listen_thread_data *rltd = g_malloc(sizeof(ram_listen_thread_data));
+
     DPRINTF("%s", __func__);
     if (mis->postcopy_ram_state != POSTCOPY_RAM_INCOMING_ADVISE) {
         error_report("CMD_POSTCOPY_RAM_LISTEN in wrong postcopy state (%d)",
@@ -1280,8 +1317,25 @@ static int loadvm_postcopy_ram_handle_listen(MigrationIncomingState *mis)
         return -1;
     }
 
-    /* TODO start up the postcopy listening thread */
-    return 0;
+    if (mis->have_listen_thread) {
+        error_report("CMD_POSTCOPY_RAM_LISTEN already has a listen thread");
+        return -1;
+    }
+
+    mis->have_listen_thread = true;
+    /* Start up the listening thread and wait for it to signal ready */
+    qemu_sem_init(&mis->listen_thread_sem, 0);
+    rltd->f = mis->file;
+    rltd->lh = &loadvm_handlers;
+    qemu_thread_create(&mis->listen_thread, "postcopy/listen",
+                       postcopy_ram_listen_thread, rltd, QEMU_THREAD_JOINABLE);
+    qemu_sem_wait(&mis->listen_thread_sem);
+
+    /*
+     * all good - cause the loop that handled this command to exit because
+     * the new thread is taking over
+     */
+    return LOADVM_EXITCODE_QUITPARENT | LOADVM_EXITCODE_KEEPHANDLERS;
 }
 
 /* After all discards we can start running and asking for pages */
@@ -1596,6 +1650,11 @@ int qemu_loadvm_state(QEMUFile *f)
     QLIST_INIT(&loadvm_handlers);
     ret = qemu_loadvm_state_main(f, &loadvm_handlers);
 
+    if (f->mis->have_listen_thread) {
+        /* Listen thread still going, can't clean up yet */
+        return ret;
+    }
+
     if (ret == 0) {
         cpu_synchronize_all_post_init();
     }
-- 
1.9.3

* [Qemu-devel] [PATCH 43/46] postcopy: Wire up loadvm_postcopy_ram_handle_{run, end} commands
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (41 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 42/46] Start up a postcopy/listener thread ready for incoming page data Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 44/46] postcopy: Use userfaultfd Dr. David Alan Gilbert (git)
                   ` (3 subsequent siblings)
  46 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Wire up more of the handlers for the commands on the destination side,
in particular loadvm_postcopy_ram_handle_run now has enough to start the
guest running.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 savevm.c | 47 ++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 42 insertions(+), 5 deletions(-)

diff --git a/savevm.c b/savevm.c
index f4907db..8c79d37 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1341,6 +1341,8 @@ static int loadvm_postcopy_ram_handle_listen(MigrationIncomingState *mis)
 /* After all discards we can start running and asking for pages */
 static int loadvm_postcopy_ram_handle_run(MigrationIncomingState *mis)
 {
+    Error *local_err = NULL;
+
     DPRINTF("%s", __func__);
     if (mis->postcopy_ram_state != POSTCOPY_RAM_INCOMING_LISTENING) {
         error_report("CMD_POSTCOPY_RAM_RUN in wrong postcopy state (%d)",
@@ -1349,14 +1351,39 @@ static int loadvm_postcopy_ram_handle_run(MigrationIncomingState *mis)
     }
 
     mis->postcopy_ram_state = POSTCOPY_RAM_INCOMING_RUNNING;
+
+    /* TODO we should move all of this lot into postcopy_ram.c or a shared code
+     * in migration.c
+     */
+    cpu_synchronize_all_post_init();
+
+    qemu_announce_self();
+    bdrv_clear_incoming_migration_all();
+
+    /* Make sure all file formats flush their mutable metadata */
+    bdrv_invalidate_cache_all(&local_err);
+    if (local_err) {
+        qerror_report_err(local_err);
+        error_free(local_err);
+        return -1;
+    }
+
+    DPRINTF("loadvm_postcopy_ram_handle_run: cpu_synchronize_all_post_init");
+    cpu_synchronize_all_post_init();
+
+    DPRINTF("loadvm_postcopy_ram_handle_run: vm_start");
     /* Hold onto your hats, starting the CPU */
     vm_start();
 
-    return 0;
+    return LOADVM_EXITCODE_QUITLOOP;
 }
 
-/* The end - with a byte from the source which can tell us to fail. */
-static int loadvm_postcopy_ram_handle_end(MigrationIncomingState *mis)
+/* The end - with a byte from the source which can tell us to fail.
+ * The source sends this either if there is a failure, or if it believes it's
+ * sent everything
+ */
+static int loadvm_postcopy_ram_handle_end(MigrationIncomingState *mis,
+                                          uint8_t status)
 {
     DPRINTF("%s", __func__);
     if (mis->postcopy_ram_state == POSTCOPY_RAM_INCOMING_NONE) {
@@ -1364,7 +1391,17 @@ static int loadvm_postcopy_ram_handle_end(MigrationIncomingState *mis)
                      mis->postcopy_ram_state);
         return -1;
     }
-    return -1; /* TODO - expecting 1 byte good/fail */
+
+    DPRINTF("loadvm_postcopy_ram_handle_end status=%d", status);
+
+    /* TODO: Give up on non-0 status
+     * TODO: If 0 status, check we've received everything (all outstanding
+     * requests should already have been completed)
+     * TODO: Indicate completion back to source
+     */
+    /* TODO: Or should this be none? */
+    mis->postcopy_ram_state = POSTCOPY_RAM_INCOMING_END;
+    return LOADVM_EXITCODE_QUITLOOP;
 }
 
 static int loadvm_process_command_simple_lencheck(const char *name,
@@ -1505,7 +1542,7 @@ static int loadvm_process_command(QEMUFile *f,
                                                    len, 1)) {
             return -1;
         }
-        return loadvm_postcopy_ram_handle_end(mis);
+        return loadvm_postcopy_ram_handle_end(mis, qemu_get_byte(f));
 
     default:
         error_report("VM_COMMAND 0x%x unknown (len 0x%x)", com, len);
-- 
1.9.3

* [Qemu-devel] [PATCH 44/46] postcopy: Use userfaultfd
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (42 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 43/46] postcopy: Wire up loadvm_postcopy_ram_handle_{run, end} commands Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 45/46] End of migration for postcopy Dr. David Alan Gilbert (git)
                   ` (2 subsequent siblings)
  46 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

userfaultfd is a Linux syscall that gives an fd that receives a stream
of notifications of accesses to pages marked as MADV_USERFAULT, and
allows the program to acknowledge those stalls and tell the accessing
thread to carry on.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
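ack_userfault() below writes an inclusive address range to the fd and maps a
failure to a negative errno. A small standalone sketch of those two pieces is
given here, assuming only POSIX write(). make_ack_range and send_ack are
hypothetical names, and treating a short write as EIO is a choice made for
this sketch rather than something the patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Fill msg with the inclusive byte range [start, start + len - 1] */
static void make_ack_range(const void *start, size_t len, uint64_t msg[2])
{
    assert(len > 0);
    msg[0] = (uint64_t)(uintptr_t)start;
    msg[1] = msg[0] + (uint64_t)(len - 1);
}

/*
 * Write the range to fd.  Returns 0 on success or a negative errno value.
 * errno is copied straight after write() so that later library calls
 * (logging, for instance) cannot overwrite it before it is returned.
 */
static int send_ack(int fd, const void *start, size_t len)
{
    uint64_t msg[2];
    ssize_t n;

    make_ack_range(start, len, msg);
    n = write(fd, msg, sizeof(msg));
    if (n != (ssize_t)sizeof(msg)) {
        int e = (n < 0) ? errno : EIO;  /* short write reported as EIO */
        return -e;
    }
    return 0;
}
```

For a 4096 byte page at 0x1000 the message is { 0x1000, 0x1fff }; the end
address is the last byte of the page, not one past it.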
 postcopy-ram.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 89 insertions(+), 6 deletions(-)

diff --git a/postcopy-ram.c b/postcopy-ram.c
index 8d0a225..466c42b 100644
--- a/postcopy-ram.c
+++ b/postcopy-ram.c
@@ -68,6 +68,14 @@
 #define __NR_remap_anon_pages 317
 #endif
 
+#ifndef __NR_userfaultfd
+#define __NR_userfaultfd 318
+#endif
+
+#ifndef USERFAULTFD_PROTOCOL
+#define USERFAULTFD_PROTOCOL (uint64_t)0xaa
+#endif
+
 /* ---------------------------------------------------------------------- */
 /* Postcopy pagemap-inbound (pmi) - data structures that record the       */
 /* state of each page used by the inbound postcopy                        */
@@ -192,6 +200,7 @@ int postcopy_ram_hosttest(void)
      */
     void *testarea, *testarea2;
     long pagesize = getpagesize();
+    int ufd;
 
     testarea = mmap(NULL, pagesize, PROT_READ | PROT_WRITE, MAP_PRIVATE |
                                     MAP_ANONYMOUS, -1, 0);
@@ -201,15 +210,24 @@ int postcopy_ram_hosttest(void)
     }
     g_assert(((size_t)testarea & (pagesize-1)) == 0);
 
+    ufd = syscall(__NR_userfaultfd, O_CLOEXEC);
+    if (ufd == -1) {
+        perror("postcopy_ram_hosttest: userfaultfd not available");
+        munmap(testarea, pagesize);
+        return -1;
+    }
+
     if (madvise(testarea, pagesize, MADV_USERFAULT)) {
         perror("postcopy_ram_hosttest: MADV_USERFAULT not available");
         munmap(testarea, pagesize);
+        close(ufd);
         return -1;
     }
 
     if (madvise(testarea, pagesize, MADV_NOUSERFAULT)) {
         perror("postcopy_ram_hosttest: MADV_NOUSERFAULT not available");
         munmap(testarea, pagesize);
+        close(ufd);
         return -1;
     }
 
@@ -226,11 +244,13 @@ int postcopy_ram_hosttest(void)
         perror("postcopy_ram_hosttest: remap_anon_pages not available");
         munmap(testarea, pagesize);
         munmap(testarea2, pagesize);
+        close(ufd);
         return -1;
     }
 
     munmap(testarea, pagesize);
     munmap(testarea2, pagesize);
+    close(ufd);
     return 0;
 }
 
@@ -361,6 +381,39 @@ static int postcopy_ram_sensitise_area(const char *block_name, void *host_addr,
 }
 
 /*
+ * Tell the kernel that we've now got some memory it previously asked for.
+ * Note: We're not allowed to ack a page which wasn't requested.
+ */
+static int ack_userfault(MigrationIncomingState *mis, void *start, size_t len)
+{
+    uint64_t tmp[2];
+
+    /* Kernel wants the range that's now safe to access */
+    tmp[0] = (uint64_t)start;
+    tmp[1] = (uint64_t)start + (uint64_t)(len-1);
+
+    if (write(mis->userfault_fd, tmp, 16) != 16) {
+        int e = errno;
+
+        if (e == ENOENT) {
+            /* Kernel said it wasn't waiting - one case where this can
+             * happen is where two threads triggered the userfault
+             * and we receive the page and ack it just after we received
+             * the 2nd request and that ends up deciding it should ack it
+             * We could optimise it out, but it's rare.
+             */
+            /*fprintf(stderr, "ack_userfault: %p/%zx ENOENT\n", start, len); */
+            return 0;
+        }
+        error_report("postcopy_ram: Failed to notify kernel for %p/%zx (%d)",
+                     start, len, e);
+        return -e;
+    }
+
+    return 0;
+}
+
+/*
  * Handle faults detected by the USERFAULT markings
  */
 static void *postcopy_ram_fault_thread(void *opaque)
@@ -420,10 +473,9 @@ static void *postcopy_ram_fault_thread(void *opaque)
                 /* Already arrived - no state change, just kick the kernel */
                 DPRINTF("postcopy_ram_fault_thread: notify pre of %p",
                         hostaddr);
-                /* TODO! Send ack
                 if (ack_userfault(mis, hostaddr, hostpagesize)) {
                     assert(0);
-                } */
+                }
                 break;
 
             case POSTCOPY_PMI_MISSING:
@@ -464,8 +516,33 @@ static void *postcopy_ram_fault_thread(void *opaque)
 
 int postcopy_ram_enable_notify(MigrationIncomingState *mis)
 {
-    /* Create the fault handler thread and wait for it to be ready */
-    mis->userfault_fd = -1; /* TODO */
+    uint64_t tmp64;
+
+    /* Open the fd for the kernel to give us userfaults */
+    mis->userfault_fd = syscall(__NR_userfaultfd, O_CLOEXEC);
+    if (mis->userfault_fd == -1) {
+        perror("Failed to open userfault fd");
+        return -1;
+    }
+
+    /*
+     * Version handshake, we send it the version we want and expect to get the
+     * same back.
+     */
+    tmp64 = USERFAULTFD_PROTOCOL;
+    if (write(mis->userfault_fd, &tmp64, sizeof(tmp64)) != sizeof(tmp64)) {
+        perror("Writing userfaultfd version");
+        return -1;
+    }
+    if (read(mis->userfault_fd, &tmp64, sizeof(tmp64)) != sizeof(tmp64)) {
+        perror("Reading userfaultfd version");
+        return -1;
+    }
+    if (tmp64 != USERFAULTFD_PROTOCOL) {
+        error_report("Mismatched userfaultfd version, expected %zx, got %zx",
+                     (size_t)USERFAULTFD_PROTOCOL, (size_t)tmp64);
+    }
+
     qemu_sem_init(&mis->fault_thread_sem, 0);
     qemu_thread_create(&mis->fault_thread, "postcopy/fault",
                        postcopy_ram_fault_thread, mis, QEMU_THREAD_JOINABLE);
@@ -476,6 +553,8 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
         return -1;
     }
 
+    DPRINTF("postcopy_ram_enable_notify: Sensitised");
+
     return 0;
 }
 
@@ -509,11 +588,12 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
 
     if (syscall(__NR_remap_anon_pages, host, from, getpagesize(), 0) !=
             getpagesize()) {
+        int e = errno;
         perror("remap_anon_pages in postcopy_place_page");
         fprintf(stderr, "host: %p from: %p pmi=%d\n", host, from,
                 postcopy_pmi_get_state(mis, bitmap_offset));
 
-        return -errno;
+        return -e;
     }
 
     tmp_state = postcopy_pmi_get_state(mis, bitmap_offset);
@@ -526,7 +606,10 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
 
 
     if (old_state == POSTCOPY_PMI_REQUESTED) {
-        /* TODO: Notify kernel */
+        /* Send the kernel the host address that should now be accessible */
+        DPRINTF("%s: Notifying kernel bitmap_offset=0x%lx host=%p",
+                __func__, bitmap_offset, host);
+        return ack_userfault(mis, host, getpagesize());
     }
 
     /* TODO: hostpagesize!=targetpagesize case */
-- 
1.9.3

* [Qemu-devel] [PATCH 45/46] End of migration for postcopy
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (43 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 44/46] postcopy: Use userfaultfd Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 46/46] Start documenting how postcopy works Dr. David Alan Gilbert (git)
  2014-07-05 10:28 ` [Qemu-devel] [PATCH 00/46] Postcopy implementation Paolo Bonzini
  46 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Tweak the end of migration cleanup; we don't want to close stuff down
at the end of the main stream, since the postcopy is still sending pages
on the other thread.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
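The two checks added to process_incoming_migration_co() below can be read as
a pair of small predicates. The sketch assumes the incoming states are ordered
NONE < ADVISE < LISTENING < RUNNING < END, which is what the '>' comparison in
the patch relies on. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed ordering of the incoming postcopy states */
enum incoming_state {
    INC_NONE, INC_ADVISE, INC_LISTENING, INC_RUNNING, INC_END
};

/* Postcopy was enabled but the migration finished during precopy */
static bool needs_postcopy_cleanup(enum incoming_state st)
{
    return st == INC_ADVISE;
}

/*
 * Should the main incoming path close the stream itself?  If the load
 * succeeded and postcopy really started, the listen thread owns the
 * stream and does the cleanup when it finishes.
 */
static bool main_path_cleans_up(int ret, enum incoming_state st)
{
    assert(st <= INC_END);
    if (ret >= 0 && st > INC_ADVISE) {
        return false;
    }
    return true;
}
```

Note that a failed load (ret < 0) still goes through the normal cleanup path
even if postcopy had started, which matches the condition in the patch.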
 migration.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/migration.c b/migration.c
index fc8911d..5a85d03 100644
--- a/migration.c
+++ b/migration.c
@@ -190,6 +190,26 @@ static void process_incoming_migration_co(void *opaque)
 
     ret = qemu_loadvm_state(f);
 
+    if (mis->postcopy_ram_state == POSTCOPY_RAM_INCOMING_ADVISE) {
+        /*
+         * Where a migration had postcopy enabled (and thus went to advise)
+         * but managed to complete within the precopy period
+         */
+        postcopy_ram_incoming_cleanup(mis);
+    }
+
+    DPRINTF("%s: ret=%d postcopy_ram_state=%d", __func__, ret,
+            mis->postcopy_ram_state);
+    if ((ret >= 0) &&
+        (mis->postcopy_ram_state > POSTCOPY_RAM_INCOMING_ADVISE)) {
+        /*
+         * Postcopy was started, cleanup should happen at the end of the
+         * postcopy thread.
+         */
+        DPRINTF("process_incoming_migration_co: exiting main branch");
+        return;
+    }
+
     f->mis = NULL;
     qemu_fclose(f);
     free_xbzrle_decoded_buf();
-- 
1.9.3

* [Qemu-devel] [PATCH 46/46] Start documenting how postcopy works.
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (44 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 45/46] End of migration for postcopy Dr. David Alan Gilbert (git)
@ 2014-07-04 17:41 ` Dr. David Alan Gilbert (git)
  2014-07-05 10:28 ` [Qemu-devel] [PATCH 00/46] Postcopy implementation Paolo Bonzini
  46 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-07-04 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
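The documentation in this patch describes masking the 'sent map' against the
migration bitmap (sentmap &= migrationbitmap) to get the set of pages the
destination must discard. A minimal sketch of that step, assuming both maps
are plain arrays of unsigned long with one bit per page; mask_sent_map is a
hypothetical name, not a function in the series.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Turn the sent map into a discard map in place: a bit stays set only if
 * the page was sent earlier AND is dirty again in the migration bitmap.
 */
static void mask_sent_map(unsigned long *sentmap,
                          const unsigned long *migration_bitmap,
                          size_t nwords)
{
    assert(sentmap && migration_bitmap);
    for (size_t i = 0; i < nwords; i++) {
        sentmap[i] &= migration_bitmap[i];
    }
}
```

For example, if pages 0-3 were sent (0x0f) and pages 2-5 are dirty (0x3c),
the masked map is 0x0c: pages 2 and 3 are stale on the destination and have
to be discarded, while pages 4 and 5 were never sent and need no discard.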
 docs/migration.txt | 148 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 148 insertions(+)

diff --git a/docs/migration.txt b/docs/migration.txt
index 0492a45..dbd5e5f 100644
--- a/docs/migration.txt
+++ b/docs/migration.txt
@@ -294,3 +294,151 @@ save/send this state when we are in the middle of a pio operation
 (that is what ide_drive_pio_state_needed() checks).  If DRQ_STAT is
 not enabled, the values on that fields are garbage and don't need to
 be sent.
+
+= Return path =
+
+In most migration scenarios there is only a single data path that runs
+from the source VM to the destination, typically along a single fd (although
+possibly with another fd or similar for some fast way of throwing pages across).
+
+However, some uses need two way comms; in particular the Postcopy destination
+needs to be able to request pages on demand from the source.
+
+For these scenarios there is a 'return path' from the destination to the source;
+qemu_file_get_return_path(QEMUFile* fwdpath) gives the QEMUFile* for the return
+path.
+
+  Source side
+     Forward path - written by migration thread
+     Return path  - opened by main thread, read by fd_handler on main thread
+
+  Destination side
+     Forward path - read by main thread
+     Return path  - opened by main thread, written by main thread AND postcopy
+                    thread (protected by rp_mutex)
+
+Opening the return path generally sets the fd to be non-blocking so that a
+failed destination can't block the source; and since the non-blockingness seems
+to follow both directions it does alter the semantics of the forward path.
+
+= Postcopy =
+'Postcopy' migration is a way to deal with migrations that refuse to converge;
+its plus side is that there is an upper bound on the amount of migration traffic
+and time it takes; the down side is that during the postcopy phase, a failure of
+*either* side or the network connection causes the guest to be lost.
+
+In postcopy the destination CPUs are started before all the memory has been
+transferred, and accesses to pages that are yet to be transferred cause
+a fault that's translated by QEMU into a request to the source QEMU.
+
+Postcopy can be combined with precopy (i.e. normal migration) so that if precopy
+doesn't finish in a given time the switch is automatically made to postcopy.
+
+=== Enabling postcopy ===
+
+To enable pure postcopy:
+
+migrate_set_capability x-postcopy-ram on
+
+To add a period of precopy:
+
+migrate_set_parameter x-postcopy-start-time 500
+
+(time in ms)
+
+=== Postcopy states ===
+Postcopy moves through a series of states (see postcopy_ram_state)
+from ADVISE->LISTEN->RUNNING->END
+
+  Advise: Set at the start of migration if postcopy is enabled, even
+          if it hasn't passed the start-time threshold; here the destination
+          checks its OS has the support needed for postcopy, and performs
+          setup to ensure the RAM mappings are suitable for later postcopy.
+          (Triggered by reception of POSTCOPY_RAM_ADVISE command)
+
+Normal precopy now carries on as normal, until the point that the source
+hits the start-time threshold and transitions to postcopy.  The source
+stops its CPUs and transmits a 'discard bitmap' indicating pages that
+have been previously sent but are now dirty again and hence are out of
+date on the destination.
+
+The migration stream now contains a 'package' containing its own chunk
+of migration stream, followed by a return to a normal stream containing
+page data.  The package (sent as CMD_PACKAGED) contains the commands to
+cycle the states on the destination, followed by all of the device
+state excluding RAM.  This lets the destination request pages from the
+source in parallel with loading device state; this is required since
+some devices (virtio) access guest memory during device initialisation.
+
+  Listen: The first command in the package, POSTCOPY_RAM_LISTEN, switches
+          the destination state to Listen, and starts a new thread
+          (the 'listen thread') which takes over the job of receiving
+          pages off the migration stream, while the main thread carries
+          on processing the blob.  With this thread able to process page
+          reception, the destination now 'sensitises' the RAM to detect
+          any access to missing pages (on Linux using the 'userfault'
+          system).
+
+The package now contains all the remaining state data and the command
+to transition to the next state.
+
+  Running: POSTCOPY_RAM_RUN causes the destination to synchronise all
+          state and start the CPUs and IO devices running.  The main
+          thread now finishes processing the migration package and
+          now carries on as it would for normal precopy migration
+          (although it can't do the cleanup it would do as it
+          finishes a normal migration).
+
+Page data is sent from the source to the destination both as part of a
+linear scan (like normal migration) and on request, and is received by the
+'listen thread'.  When the destination tries to use a page it hasn't got, it
+requests it from the source (down the return path) and the source sends this
+page in the same stream.  When the source has transmitted all pages
+it sends a POSTCOPY_RAM_END command to transition to
+
+  End: The listen thread can now quit, and perform the cleanup of migration
+state; the migration is now complete.
+
+=== Source side page maps ===
+The source side keeps two bitmaps during postcopy; 'the migration bitmap'
+and 'sent map'.  The 'migration bitmap' is basically the same as in
+the precopy case, and holds a bit to indicate that page is 'dirty' -
+i.e. needs sending.  During the precopy phase this is updated as the CPU
+dirties pages, however during postcopy the CPUs are stopped and nothing
+should dirty anything any more.
+
+The 'sent map' is used for the transition to postcopy. It is a bitmap that
+has a bit set whenever a page is sent to the destination, however during
+the transition to postcopy mode it is masked against the migration bitmap
+(sentmap &= migrationbitmap) to generate a bitmap recording pages that
+have previously been sent but are now dirty again.  This masked
+sentmap is sent to the destination which discards those now dirty pages
+before starting the CPUs.
+
+Note that once in postcopy mode, the sent map is still updated, however its
+contents are not consistent as a local view of what's been sent since it's
+only got the masked result.
+
+=== Destination side page maps ===
+(Needs to be changed so we can update both easily - at the moment updates are done
+ with a lock)
+The destination keeps a 'requested map' and a 'received map'.
+Both maps are initially 0; as pages are received the bits are set in the 'received map'.
+Incoming requests from the kernel cause the bit to be set in the 'requested map'.
+When a page is received that is marked as 'requested' the kernel is notified.
+If the kernel requests a page that has already been 'received' the kernel is notified
+without re-requesting.
+
+This leads to three valid page states:
+    missing (!rc,!rq)  - page not yet received or requested
+    received (rc,!rq)  - page received
+    requested (!rc,rq) - page requested but not yet received
+
+state transitions:
+      received -> missing   (only during setup/discard)
+
+      missing -> received   (normal incoming page)
+      requested -> received (incoming page previously requested)
+      missing -> requested  (userfault request)
+
-- 
1.9.3
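
The destination-side page states in the documentation patch above can be
sketched in C as follows.  This is only an illustration under stated
assumptions: the patch tracks the 'received' and 'requested' bits in two
bitmaps, whereas the sketch uses a hypothetical per-page struct, and the names
PageState, page_on_fault, page_on_receive and page_discard are invented here.
The text does not say what happens on a repeated fault for a page that is
already 'requested'; the sketch assumes no duplicate request is sent.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-page record; the patch keeps two bitmaps instead. */
typedef struct {
    bool received;   /* 'rc' bit from the received map  */
    bool requested;  /* 'rq' bit from the requested map */
} PageState;

typedef enum {
    PAGE_MISSING,    /* (!rc,!rq) */
    PAGE_RECEIVED,   /* (rc,!rq)  */
    PAGE_REQUESTED,  /* (!rc,rq)  */
    PAGE_INVALID     /* (rc,rq) is not a valid state */
} PageKind;

PageKind page_kind(PageState p)
{
    if (p.received) {
        return p.requested ? PAGE_INVALID : PAGE_RECEIVED;
    }
    return p.requested ? PAGE_REQUESTED : PAGE_MISSING;
}

/*
 * Userfault request from the kernel.  Returns true if a request has to be
 * sent to the source (missing -> requested).  For a page that is already
 * received the caller just notifies the kernel; for a page that is already
 * requested this sketch assumes nothing more is sent.
 */
bool page_on_fault(PageState *p)
{
    if (p->received || p->requested) {
        return false;
    }
    p->requested = true;
    return true;
}

/*
 * Incoming page from the source.  Returns true if the kernel must be
 * notified because the page had been requested (requested -> received).
 */
bool page_on_receive(PageState *p)
{
    bool notify = p->requested;

    p->received = true;
    p->requested = false;
    return notify;
}

/* Discard during setup: received -> missing. */
void page_discard(PageState *p)
{
    p->received = false;
    p->requested = false;
}
```

The (rc,rq) combination is never produced by these helpers, which matches the
three valid states listed in the text.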


* Re: [Qemu-devel] [PATCH 07/46] Return path: Open a return path on QEMUFile for sockets
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 07/46] Return path: Open a return path on QEMUFile for sockets Dr. David Alan Gilbert (git)
@ 2014-07-05 10:06   ` Paolo Bonzini
  2014-07-16  9:37     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 83+ messages in thread
From: Paolo Bonzini @ 2014-07-05 10:06 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, lilei, quintela

Il 04/07/2014 19:41, Dr. David Alan Gilbert (git) ha scritto:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Postcopy needs a method to send messages from the destination back to
> the source, this is the 'return path'.
>
> Wire it up for 'socket' QEMUFile's using a dup'd fd.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  include/migration/qemu-file.h |  8 +++++
>  qemu-file.c                   | 74 +++++++++++++++++++++++++++++++++++++++----
>  2 files changed, 76 insertions(+), 6 deletions(-)
>
> diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
> index df38646..ec1a342 100644
> --- a/include/migration/qemu-file.h
> +++ b/include/migration/qemu-file.h
> @@ -87,6 +87,11 @@ typedef size_t (QEMURamSaveFunc)(QEMUFile *f, void *opaque,
>  #define QEMUFILE_IO_BUF_SIZE 32768
>  #define QEMUFILE_MAX_IOV_SIZE MIN(IOV_MAX, 64)
>
> +/*
> + * Return a QEMUFile for comms in the opposite direction
> + */
> +typedef QEMUFile *(QEMURetPathFunc)(void *opaque);
> +
>  typedef struct QEMUFileOps {
>      QEMUFilePutBufferFunc *put_buffer;
>      QEMUFileGetBufferFunc *get_buffer;
> @@ -97,6 +102,7 @@ typedef struct QEMUFileOps {
>      QEMURamHookFunc *after_ram_iterate;
>      QEMURamHookFunc *hook_ram_load;
>      QEMURamSaveFunc *save_page;
> +    QEMURetPathFunc *get_return_path;
>  } QEMUFileOps;
>
>  struct QEMUFile {
> @@ -117,6 +123,7 @@ struct QEMUFile {
>
>      int last_error;
>
> +    struct QEMUFile *return_path;
>      MigrationIncomingState *mis;
>  };
>
> @@ -202,6 +209,7 @@ void qemu_file_set_rate_limit(QEMUFile *f, int64_t new_rate);
>  int64_t qemu_file_get_rate_limit(QEMUFile *f);
>  int qemu_file_get_error(QEMUFile *f);
>  void qemu_file_set_error(QEMUFile *f, int ret);
> +QEMUFile *qemu_file_get_return_path(QEMUFile *f);
>  void qemu_fflush(QEMUFile *f);
>
>  static inline void qemu_put_be64s(QEMUFile *f, const uint64_t *pv)
> diff --git a/qemu-file.c b/qemu-file.c
> index 88cacc7..98a6d2a 100644
> --- a/qemu-file.c
> +++ b/qemu-file.c
> @@ -16,6 +16,54 @@ typedef struct QEMUFileSocket {
>      QEMUFile *file;
>  } QEMUFileSocket;
>
> +/* Give a QEMUFile* off the same socket but data in the opposite
> + * direction.
> + * qemu_fopen_socket marks write fd's as blocking, but doesn't
> + * touch read fd's status, so we dup the fd just to keep settings
> + * separate. [TBD: Do I need to explicitly mark as non-block on read?]
> + */
> +static QEMUFile *socket_dup_return_path(void *opaque)
> +{
> +    QEMUFileSocket *qfs = opaque;
> +    int revfd;
> +    bool this_is_read;
> +    QEMUFile *result;
> +
> +    /* If it's already open, return it */
> +    if (qfs->file->return_path) {
> +        return qfs->file->return_path;

Wouldn't this leave a dangling file descriptor if you call 
socket_dup_return_path twice, and then close the original QEMUFile?

> +    }
> +
> +    if (qemu_file_get_error(qfs->file)) {
> +        /* If the forward file is in error, don't try and open a return */
> +        return NULL;
> +    }
> +
> +    /* I don't think there's a better way to tell which direction 'this' is */
> +    this_is_read = qfs->file->ops->get_buffer != NULL;
> +
> +    revfd = dup(qfs->fd);
> +    if (revfd == -1) {
> +        error_report("Error duplicating fd for return path: %s",
> +                      strerror(errno));
> +        return NULL;
> +    }
> +
> +    qemu_set_nonblock(revfd);

Blocking/nonblocking is per-file *description*, not descriptor.  So 
you're making the original QEMUFile nonblocking as well.  Can you 
explain why this is needed before I reach the meat of the patch series?

In other words, can you draw a table with source/dest and read/write, 
and whether it should be blocking or non-blocking?
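
The point about descriptions versus descriptors can be checked with a small
POSIX program.  This is a minimal sketch, not QEMU code: it uses a pipe rather
than a socket to stay self-contained, and the helper names
(fd_is_nonblocking, fd_set_nonblocking, dup_shares_nonblock) are made up for
the illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* True if O_NONBLOCK is set on fd; false otherwise or on error. */
bool fd_is_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    return flags != -1 && (flags & O_NONBLOCK) != 0;
}

/* Set O_NONBLOCK on fd; returns 0 on success, -1 on error. */
int fd_set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1) {
        return -1;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/*
 * dup() the read end of a pipe, set O_NONBLOCK on the duplicate only, and
 * report whether the original descriptor became non-blocking as well.
 * Returns 1 if the flag is shared, 0 if it is not, -1 on error.
 */
int dup_shares_nonblock(void)
{
    int fds[2];
    int copy;
    int result = -1;

    if (pipe(fds) == -1) {
        return -1;
    }
    copy = dup(fds[0]);
    if (copy != -1) {
        /* The original starts out blocking; only the copy is changed. */
        if (!fd_is_nonblocking(fds[0]) && fd_set_nonblocking(copy) == 0) {
            result = fd_is_nonblocking(fds[0]) ? 1 : 0;
        }
        close(copy);
    }
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

dup_shares_nonblock() returns 1 because file status flags such as O_NONBLOCK
belong to the open file description, which both descriptors refer to.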

Paolo

> +    result = qemu_fopen_socket(revfd, this_is_read ? "wb" : "rb");
> +    qfs->file->return_path = result;
> +
> +    if (result) {
> +        /* We are the reverse path of our reverse path (although I don't
> +           expect this to be used, it would stop another dup if it was */
> +        result->return_path = qfs->file;
> +    } else {
> +        close(revfd);
> +    }
> +
> +    return result;
> +}
> +
>  static ssize_t socket_writev_buffer(void *opaque, struct iovec *iov, int iovcnt,
>                                      int64_t pos)
>  {
> @@ -313,17 +361,31 @@ QEMUFile *qemu_fdopen(int fd, const char *mode)
>  }
>
>  static const QEMUFileOps socket_read_ops = {
> -    .get_fd =     socket_get_fd,
> -    .get_buffer = socket_get_buffer,
> -    .close =      socket_close
> +    .get_fd          = socket_get_fd,
> +    .get_buffer      = socket_get_buffer,
> +    .close           = socket_close,
> +    .get_return_path = socket_dup_return_path
>  };
>
>  static const QEMUFileOps socket_write_ops = {
> -    .get_fd =     socket_get_fd,
> -    .writev_buffer = socket_writev_buffer,
> -    .close =      socket_close
> +    .get_fd          = socket_get_fd,
> +    .writev_buffer   = socket_writev_buffer,
> +    .close           = socket_close,
> +    .get_return_path = socket_dup_return_path
>  };
>
> +/*
> + * Result: QEMUFile* for a 'return path' for comms in the opposite direction
> + *         NULL if not available
> + */
> +QEMUFile *qemu_file_get_return_path(QEMUFile *f)
> +{
> +    if (!f->ops->get_return_path) {
> +        return NULL;
> +    }
> +    return f->ops->get_return_path(f->opaque);
> +}
> +
>  bool qemu_file_mode_is_not_valid(const char *mode)
>  {
>      if (mode == NULL ||
>


* Re: [Qemu-devel] [PATCH 08/46] Return path: socket_writev_buffer: Block even on non-blocking fd's
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 08/46] Return path: socket_writev_buffer: Block even on non-blocking fd's Dr. David Alan Gilbert (git)
@ 2014-07-05 10:07   ` Paolo Bonzini
  0 siblings, 0 replies; 83+ messages in thread
From: Paolo Bonzini @ 2014-07-05 10:07 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, lilei, quintela

Il 04/07/2014 19:41, Dr. David Alan Gilbert (git) ha scritto:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> The return path uses a non-blocking fd so as not to block waiting
> for the (possibly broken) destination to finish returning a message,
> however we still want outbound data to behave in the same way and block.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  qemu-file.c | 39 +++++++++++++++++++++++++++++++++++----
>  1 file changed, 35 insertions(+), 4 deletions(-)
>
> diff --git a/qemu-file.c b/qemu-file.c
> index 98a6d2a..9809428 100644
> --- a/qemu-file.c
> +++ b/qemu-file.c
> @@ -70,12 +70,43 @@ static ssize_t socket_writev_buffer(void *opaque, struct iovec *iov, int iovcnt,
>      QEMUFileSocket *s = opaque;
>      ssize_t len;
>      ssize_t size = iov_size(iov, iovcnt);
> +    ssize_t offset = 0;
> +    int     err;
>
> -    len = iov_send(s->fd, iov, iovcnt, 0, size);
> -    if (len < size) {
> -        len = -socket_error();
> +    while (size > 0) {
> +        len = iov_send(s->fd, iov, iovcnt, offset, size);
> +
> +        if (len > 0) {
> +            size -= len;
> +            offset += len;
> +        }
> +
> +        if (size > 0) {
> +            err = socket_error();
> +
> +            if (err != EAGAIN) {
> +                error_report("socket_writev_buffer: Got err=%d for (%zd/%zd)",
> +                             err, size, len);
> +                /*
> +                 * If I've already sent some but only just got the error, I
> +                 * could return the amount validly sent so far and wait for the
> +                 * next call to report the error, but I'd rather flag the error
> +                 * immediately.
> +                 */
> +                return -err;
> +            }
> +
> +            /* Emulate blocking */
> +            GPollFD pfd;
> +
> +            pfd.fd = s->fd;
> +            pfd.events = G_IO_OUT | G_IO_ERR;
> +            pfd.revents = 0;
> +            g_poll(&pfd, 1 /* 1 fd */, -1 /* no timeout */);
> +        }
>      }
> -    return len;
> +
> +    return offset;
>  }
>
>  static int socket_get_fd(void *opaque)
>

I guess the table I just asked about would help clarify this as well.
Also note that g_poll doesn't work on sockets on Windows, but I hope
we can avoid it altogether. :)
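
For comparison, the 'emulate blocking' loop in the patch above can be written
with plain POSIX calls.  The following is a minimal sketch, not the patch's
code: it uses write(2) and poll(2) on a single buffer instead of iov_send()
and g_poll(), and the name write_all_blocking is made up for illustration.
It looks at errno only when the write call itself fails, so a short write
with no error simply loops again.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write all len bytes to fd, waiting for writability whenever a
 * non-blocking fd reports EAGAIN.  Returns len on success or a negative
 * errno value on failure.
 */
ssize_t write_all_blocking(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t done = 0;

    while (done < len) {
        ssize_t n = write(fd, p + done, len - done);

        if (n > 0) {
            done += (size_t)n;      /* partial or full write: keep going */
            continue;
        }
        if (n == -1 && errno == EINTR) {
            continue;               /* interrupted: retry the write */
        }
        if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            /* Emulate blocking: sleep until the fd is writable again. */
            struct pollfd pfd = { .fd = fd, .events = POLLOUT };

            if (poll(&pfd, 1, -1) == -1 && errno != EINTR) {
                return -errno;
            }
            continue;
        }
        return n == -1 ? -errno : -EIO;
    }
    return (ssize_t)done;
}
```

Like g_poll, poll(2) is not available for sockets on Windows, so this sketch
shares the portability limit mentioned above.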

Paolo


* Re: [Qemu-devel] [PATCH 31/46] Postcopy: Rework migration thread for postcopy mode
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 31/46] Postcopy: Rework migration thread for postcopy mode Dr. David Alan Gilbert (git)
@ 2014-07-05 10:19   ` Paolo Bonzini
  2014-08-28 11:04     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 83+ messages in thread
From: Paolo Bonzini @ 2014-07-05 10:19 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, lilei, quintela

Il 04/07/2014 19:41, Dr. David Alan Gilbert (git) ha scritto:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Switch to postcopy if:
>    1) There's still a significant amount to transfer
>    2) Postcopy is enabled
>    3) It's taken longer than the time set by the parameter.
>
> and change the cleanup at the end of migration to match.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  migration.c | 92 ++++++++++++++++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 73 insertions(+), 19 deletions(-)
>
> diff --git a/migration.c b/migration.c
> index 0d567ef..c73fcfa 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -982,16 +982,40 @@ static int postcopy_start(MigrationState *ms)
>  static void *migration_thread(void *opaque)
>  {
>      MigrationState *s = opaque;
> +    /* Used by the bandwidth calcs, updated later */
>      int64_t initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +    /* Really, the time we started */
> +    const int64_t initial_time_fixed = initial_time;
>      int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>      int64_t initial_bytes = 0;
>      int64_t max_size = 0;
>      int64_t start_time = initial_time;
> +    int64_t pc_start_time;
> +
>      bool old_vm_running = false;
> +    pc_start_time = s->tunables[MIGRATION_PARAMETER_NAME_X_POSTCOPY_START_TIME];
> +
> +    /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */
> +    enum MigrationPhase current_active_type = MIG_STATE_ACTIVE;
>
>      qemu_savevm_state_begin(s->file, &s->params);
>
> +    if (migrate_postcopy_ram()) {
> +        /* Now tell the dest that it should open it's end so it can reply */
> +        qemu_savevm_send_openrp(s->file);
> +
> +        /* And ask it to send an ack that will make stuff easier to debug */
> +        qemu_savevm_send_reqack(s->file, 1);
> +
> +        /* Tell the destination that we *might* want to do postcopy later;
> +         * if the other end can't do postcopy it should fail now, nice and
> +         * early.
> +         */
> +        qemu_savevm_send_postcopy_ram_advise(s->file);
> +    }
> +
>      s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
> +    current_active_type = MIG_STATE_ACTIVE;
>      migrate_set_state(s, MIG_STATE_SETUP, MIG_STATE_ACTIVE);
>
>      DPRINTF("setup complete\n");
> @@ -1012,37 +1036,66 @@ static void *migration_thread(void *opaque)
>                      " nonpost=%" PRIu64 ")\n",
>                      pending_size, max_size, pend_post, pend_nonpost);
>              if (pending_size && pending_size >= max_size) {
> -                qemu_savevm_state_iterate(s->file);
> +                /* Still a significant amount to transfer */
> +
> +                current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +                if (migrate_postcopy_ram() &&
> +                    s->state != MIG_STATE_POSTCOPY_ACTIVE &&
> +                    pend_nonpost == 0 &&
> +                    (current_time >= initial_time_fixed + pc_start_time)) {
> +
> +                    if (!postcopy_start(s)) {
> +                        current_active_type = MIG_STATE_POSTCOPY_ACTIVE;
> +                    }
> +
> +                    continue;
> +                } else {

You don't really need the "else" if you have a continue.  However, do
you need _any_ of the "else" and "continue"?  Would the next iteration
of the "while" loop do anything other than invoke qemu_savevm_state_iterate?

> +                    /* Just another iteration step */
> +                    qemu_savevm_state_iterate(s->file);
> +                }
>              } else {
>                  int ret;
>
> -                DPRINTF("done iterating\n");
> -                qemu_mutex_lock_iothread();
> -                start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> -                qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
> -                old_vm_running = runstate_is_running();
> -
> -                ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> -                if (ret >= 0) {
> -                    qemu_file_set_rate_limit(s->file, INT64_MAX);
> -                    qemu_savevm_state_complete(s->file);
> -                }
> -                qemu_mutex_unlock_iothread();
> -
> -                if (ret < 0) {
> -                    migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_ERROR);
> -                    break;
> +                DPRINTF("done iterating pending size %" PRIu64 "\n",
> +                        pending_size);
> +
> +                if (s->state == MIG_STATE_ACTIVE) {
> +                    qemu_mutex_lock_iothread();
> +                    start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +                    qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
> +                    old_vm_running = runstate_is_running();
> +
> +                    ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> +                    if (ret >= 0) {
> +                        qemu_file_set_rate_limit(s->file, INT64_MAX);
> +                        qemu_savevm_state_complete(s->file);
> +                    }
> +                    qemu_mutex_unlock_iothread();
> +                    if (ret < 0) {
> +                        migrate_set_state(s, current_active_type,
> +                                          MIG_STATE_ERROR);
> +                        break;
> +                    }

I think all this code applies to postcopy as well.  Only the body of the 
first "if" must be replaced by qemu_savevm_state_postcopy_complete for 
postcopy.

> +                } else {
> +                    assert(s->state == MIG_STATE_POSTCOPY_ACTIVE);

This can fail if you get a cancel in the meantime.  You can replace the
"if (s->state == MIG_STATE_ACTIVE)" by "if (current_active_type ==
MIG_STATE_ACTIVE)" and remove the assert here.  Alternatively:

    if (migrate_postcopy_ram()) {
        assert(current_active_type == MIG_STATE_ACTIVE);
        ...
    } else {
        assert(current_active_type == MIG_STATE_POSTCOPY_ACTIVE);
        ...
    }
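
The first suggestion can be sketched as follows, assuming (as its
old-state/new-state arguments suggest) that migrate_set_state only performs
the transition when the state still equals the expected old state.  The types
and names here (MigState, Migration, mig_set_state, mig_finish) are
hypothetical stand-ins written with C11 atomics, not QEMU's definitions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, reduced set of migration states. */
typedef enum {
    ST_ACTIVE,
    ST_POSTCOPY_ACTIVE,
    ST_CANCELLING,
    ST_COMPLETED
} MigState;

typedef struct {
    _Atomic int state;   /* may be changed by other threads (e.g. cancel) */
} Migration;

/*
 * Move to new_state only if the state is still old_state.
 * Returns true if the transition happened.
 */
bool mig_set_state(Migration *m, MigState old_state, MigState new_state)
{
    int expected = old_state;

    return atomic_compare_exchange_strong(&m->state, &expected, new_state);
}

/*
 * Completion path of the migration thread.  'expected_active' is the
 * thread's own record of the phase it entered (ACTIVE or POSTCOPY_ACTIVE),
 * so no assert on the shared state is needed: if a cancel changed the
 * state in the meantime, the transition is simply not taken.
 */
bool mig_finish(Migration *m, MigState expected_active)
{
    return mig_set_state(m, expected_active, ST_COMPLETED);
}
```

With this shape a concurrent cancel makes mig_finish() return false and leaves
the cancelling state untouched, instead of tripping an assertion.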

> +                    DPRINTF("postcopy end\n");
> +
> +                    qemu_savevm_state_postcopy_complete(s->file);
> +                    DPRINTF("postcopy end after complete\n");
>                  }
>
>                  if (!qemu_file_get_error(s->file)) {
> -                    migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_COMPLETED);
> +                    migrate_set_state(s, current_active_type,
> +                                      MIG_STATE_COMPLETED);
>                      break;
>                  }
>              }
>          }
>
>          if (qemu_file_get_error(s->file)) {
> -            migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_ERROR);
> +            migrate_set_state(s, current_active_type, MIG_STATE_ERROR);
> +            DPRINTF("migration_thread: file is in error state\n");
>              break;
>          }
>          current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> @@ -1073,6 +1126,7 @@ static void *migration_thread(void *opaque)
>          }
>      }
>
> +    DPRINTF("migration_thread: Hit error: case\n");

This dprintf looks weird.

Paolo

>      qemu_mutex_lock_iothread();
>      if (s->state == MIG_STATE_COMPLETED) {
>          int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>


* Re: [Qemu-devel] [PATCH 15/46] Rework loadvm path for subloops
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 15/46] Rework loadvm path for subloops Dr. David Alan Gilbert (git)
@ 2014-07-05 10:26   ` Paolo Bonzini
  2014-07-07 14:35     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 83+ messages in thread
From: Paolo Bonzini @ 2014-07-05 10:26 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, lilei, quintela

Il 04/07/2014 19:41, Dr. David Alan Gilbert (git) ha scritto:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Postcopy needs to have two migration streams loading concurrently;
> one from memory (with the device state) and the other from the fd
> with the memory transactions.

Can you explain this?

I would have thought the order is

     precopy RAM and everything
     prepare postcopy RAM ("sent && dirty" bitmap)
     finish precopy non-RAM
     finish devices
     postcopy RAM

Why do you need to have all the packaging stuff and a separate 
memory-based migration stream for devices?  I'm sure I'm missing 
something. :)
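
For reference, the "sent && dirty" step in the list above corresponds to the
masking that the series' documentation writes as sentmap &= migrationbitmap.
A minimal sketch, assuming both maps are plain arrays of 64-bit words with one
bit per page (the function name mask_sentmap and this layout are assumptions,
not the series' code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Mask the sent map against the dirty (migration) bitmap in place, so that
 * afterwards a set bit means "already sent, but dirtied again since".
 * Returns the number of such pages, i.e. the pages the destination has to
 * discard before the CPUs are started.
 */
size_t mask_sentmap(uint64_t *sentmap, const uint64_t *dirty, size_t nwords)
{
    size_t count = 0;

    for (size_t i = 0; i < nwords; i++) {
        sentmap[i] &= dirty[i];
        /* Count set bits by clearing the lowest one until none remain. */
        for (uint64_t w = sentmap[i]; w != 0; w &= w - 1) {
            count++;
        }
    }
    return count;
}
```

For example, with pages 0-3 sent (0x0F) and pages 2-5 dirty (0x3C), the masked
map is 0x0C: pages 2 and 3 must be discarded on the destination.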

Paolo


* Re: [Qemu-devel] [PATCH 00/46] Postcopy implementation
  2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (45 preceding siblings ...)
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 46/46] Start documenting how postcopy works Dr. David Alan Gilbert (git)
@ 2014-07-05 10:28 ` Paolo Bonzini
  2014-07-07 14:02   ` Dr. David Alan Gilbert
  46 siblings, 1 reply; 83+ messages in thread
From: Paolo Bonzini @ 2014-07-05 10:28 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, lilei, quintela

Il 04/07/2014 19:41, Dr. David Alan Gilbert (git) ha scritto:
>    e) I've added a 'migration_set_parameter' command as somewhere to put integer
>       parameters associated with migration.
>       e.1) And I use that initially for the length of precopy to try.

Could you have instead a "migrate_start_postcopy" command, and leave the 
policy to management instead?

We could also (later) add an event for the end of the migration bulk
phase, that can help management decide when to switch?

Paolo


* Re: [Qemu-devel] [PATCH 00/46] Postcopy implementation
  2014-07-05 10:28 ` [Qemu-devel] [PATCH 00/46] Postcopy implementation Paolo Bonzini
@ 2014-07-07 14:02   ` Dr. David Alan Gilbert
  2014-07-07 14:35     ` Paolo Bonzini
  0 siblings, 1 reply; 83+ messages in thread
From: Dr. David Alan Gilbert @ 2014-07-07 14:02 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: aarcange, yamahata, quintela, qemu-devel, lilei

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> Il 04/07/2014 19:41, Dr. David Alan Gilbert (git) ha scritto:
> >   e) I've added a 'migration_set_parameter' command as somewhere to put integer
> >      parameters associated with migration.
> >      e.1) And I use that initially for the length of precopy to try.
> 
> Could you have instead a "migrate_start_postcopy" command, and leave the
> policy to management instead?

Hmm; yes that is probably possible - although with the migration_set_parameter
configuration you get the best of both worlds:
   1) You can set the parameter to say a few seconds and let QEMU handle it
   2) You can set the parameter really large, but (I need to check) you could
      drop the parameter later and then cause it to kick in.

I also did it this way because it was similar to the way the auto-throttling
mechanism works.

> We could also (later) add an event for the end of the migration bulk phase,
> that can help management deciding when to switch?

Yeh, the libvirt people want an event to know when (the existing) migration completes,
and this would easily fit onto that.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH 15/46] Rework loadvm path for subloops
  2014-07-05 10:26   ` Paolo Bonzini
@ 2014-07-07 14:35     ` Dr. David Alan Gilbert
  2014-07-07 14:53       ` Paolo Bonzini
  0 siblings, 1 reply; 83+ messages in thread
From: Dr. David Alan Gilbert @ 2014-07-07 14:35 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: aarcange, yamahata, quintela, qemu-devel, lilei

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> Il 04/07/2014 19:41, Dr. David Alan Gilbert (git) ha scritto:
> >From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> >Postcopy needs to have two migration streams loading concurrently;
> >one from memory (with the device state) and the other from the fd
> >with the memory transactions.
> 
> Can you explain this?
> 
> I would have though the order is
> 
>     precopy RAM and everything
>     prepare postcopy RAM ("sent && dirty" bitmap)
>     finish precopy non-RAM
>     finish devices
>     postcopy RAM
> 
> Why do you need to have all the packaging stuff and a separate memory-based
> migration stream for devices?  I'm sure I'm missing something. :)

The thing you're missing is the details of 'finish devices'.
The device emulation may access guest memory as part of loading its
state, so you can't successfully complete 'finish devices' without
having the 'postcopy RAM' available to provide pages.
Thus you need to be able to start up 'postcopy RAM' before 'finish devices'
has completed, and you can't do that if 'finish devices' is still stuffing
data down the fd.

Now, if hypothetically you had:
  1) A migration format that let you separate out device state so that you
could load all the state of the device off the fd without calling the device
IO code.
  2) All devices were good and didn't touch guest memory while loading their
state.

then you could avoid this complexity.  However, if you look at how Stefan's
BER code tried to do 1 (which I don't do in my way of doing it), it was by
using the same trick of stuffing the device data into a dummy memory file
to find out the size of the data.   And I'm not convinced (2) will happen
this century.
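
The packaging idea can be sketched roughly as below: the destination pulls the
whole device-state blob off the fd into memory in one go, which leaves the fd
free to carry page data while the devices are loaded from the in-memory copy.
This is only an illustration; the wire layout assumed here (a 32-bit
big-endian length followed by the payload), the size cap and the names
read_exact and read_package are assumptions, not the format used by the
series.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical upper bound on a package, to reject a corrupt length. */
#define MAX_PACKAGE_LEN (64u * 1024u * 1024u)

/* Read exactly len bytes from fd; 0 on success, -1 on EOF or error. */
int read_exact(int fd, void *buf, size_t len)
{
    char *p = buf;

    while (len > 0) {
        ssize_t n = read(fd, p, len);

        if (n == -1 && errno == EINTR) {
            continue;
        }
        if (n <= 0) {
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Read one package (assumed layout: be32 length, then payload) into a
 * malloc'd buffer.  Returns the buffer and stores its size in *len_out,
 * or returns NULL on error.  The caller frees the buffer.
 */
uint8_t *read_package(int fd, uint32_t *len_out)
{
    uint8_t hdr[4];
    uint32_t len;
    uint8_t *buf;

    if (read_exact(fd, hdr, sizeof hdr) != 0) {
        return NULL;
    }
    len = ((uint32_t)hdr[0] << 24) | ((uint32_t)hdr[1] << 16) |
          ((uint32_t)hdr[2] << 8) | (uint32_t)hdr[3];
    if (len > MAX_PACKAGE_LEN) {
        return NULL;
    }
    buf = malloc(len ? len : 1);
    if (buf == NULL) {
        return NULL;
    }
    if (read_exact(fd, buf, len) != 0) {
        free(buf);
        return NULL;
    }
    *len_out = len;
    return buf;
}
```

Once read_package() returns, whatever follows on the fd (page data, in the
scheme described above) can be consumed by another thread while the device
loader works from the returned buffer.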

> Paolo

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH 00/46] Postcopy implementation
  2014-07-07 14:02   ` Dr. David Alan Gilbert
@ 2014-07-07 14:35     ` Paolo Bonzini
  2014-07-07 14:58       ` Dr. David Alan Gilbert
  2014-07-10 11:29       ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 83+ messages in thread
From: Paolo Bonzini @ 2014-07-07 14:35 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: aarcange, yamahata, quintela, qemu-devel, lilei

Il 07/07/2014 16:02, Dr. David Alan Gilbert ha scritto:
>> > Could you have instead a "migrate_start_postcopy" command, and leave the
>> > policy to management instead?
> Hmm; yes that is probably possible - although with the migration_set_parameter
> configuration you get the best of both worlds:
>    1) You can set the parameter to say a few seconds and let QEMU handle it
>    2) You can set the parameter really large, but (I need to check) you could
>       drop the parameter later and then cause it to kick in.
>
> I also did it this way because it was similar to the way the auto-throttling
> mechanism.

Auto-throttling doesn't let you configure when it kicks in (it doesn't 
even need support from the destination side).  For postcopy you would 
still have a capability, like auto-throttling, just not the argument.

The reason why I prefer a manual step from management, is because 
postcopy is a one-way street.  Suppose a newer version of management 
software has started migration with postcopy configured, and then an 
older version is started.  It is probably an invalid thing to do, but 
the confusion in the older version could be fatal and it's nice if 
there's an easy way to prevent it.

Paolo


* Re: [Qemu-devel] [PATCH 15/46] Rework loadvm path for subloops
  2014-07-07 14:35     ` Dr. David Alan Gilbert
@ 2014-07-07 14:53       ` Paolo Bonzini
  2014-07-07 15:04         ` Dr. David Alan Gilbert
  2014-07-16  9:25         ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 83+ messages in thread
From: Paolo Bonzini @ 2014-07-07 14:53 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: aarcange, yamahata, quintela, qemu-devel, lilei

Il 07/07/2014 16:35, Dr. David Alan Gilbert ha scritto:
> * Paolo Bonzini (pbonzini@redhat.com) wrote:
>> Il 04/07/2014 19:41, Dr. David Alan Gilbert (git) ha scritto:
>>> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>>
>>> Postcopy needs to have two migration streams loading concurrently;
>>> one from memory (with the device state) and the other from the fd
>>> with the memory transactions.
>>
>> Can you explain this?
>>
>> I would have though the order is
>>
>>     precopy RAM and everything
>>     prepare postcopy RAM ("sent && dirty" bitmap)
>>     finish precopy non-RAM
>>     finish devices
>>     postcopy RAM
>>
>> Why do you need to have all the packaging stuff and a separate memory-based
>> migration stream for devices?  I'm sure I'm missing something. :)
>
> The thing you're missing is the details of 'finish devices'.
> The device emulation may access guest memory as part of loading it's
> state, so you can't successfully complete 'finish devices' without
> having the 'postcopy RAM' available to provide pages.

I see.  Can you document the flow (preferably as a reply to this email
_and_ in docs/ when you send v2 of the code :))?

From my cursory read of the code it is something like this on the source:

     finish precopy non-RAM
     start RAM postcopy
     for each device
         pack up data
         send it to destination

and on the destination:

     while source sends packet
         pick up packet atomically
         pass the packet to device loader
             (while the loader works, userfaultfd does background magic)

But something is missing still, either some kind of ack is needed 
between device data sends or userfaultfd needs to be able to process 
device data packets.

Paolo

> Thus you need to be able to start up 'postcopy RAM' before 'finish devices'
> has completed, and you can't do that if 'finish devices' is still stuffing
> data down the fd.
>
> Now, if hypothetically you had:
>   1) A migration format that let you separate out device state so that you
> could load all the state of the device off the fd without calling the device
> IO code.
>   2) All devices were good and didn't touch guest memory while loading their
> state.
>
> then you could avoid this complexity.  However, if you look at how Stefan's
> BER code tried to do 1 (which I don't do in my way of doing it), it was by
> using the same trick of stuffing the device data into a dummy memory file
> to find out the size of the data.   And I'm not convinced (2) will happen
> this century.
>
>> Paolo
>
> Dave
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>


* Re: [Qemu-devel] [PATCH 00/46] Postcopy implementation
  2014-07-07 14:35     ` Paolo Bonzini
@ 2014-07-07 14:58       ` Dr. David Alan Gilbert
  2014-07-10 11:29       ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert @ 2014-07-07 14:58 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: aarcange, yamahata, quintela, qemu-devel, lilei

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> Il 07/07/2014 16:02, Dr. David Alan Gilbert ha scritto:
> >>> Could you have instead a "migrate_start_postcopy" command, and leave the
> >>> policy to management instead?
> >Hmm; yes that is probably possible - although with the migration_set_parameter
> >configuration you get the best of both worlds:
> >   1) You can set the parameter to say a few seconds and let QEMU handle it
> >   2) You can set the parameter really large, but (I need to check) you could
> >      drop the parameter later and then cause it to kick in.
> >
> >I also did it this way because it was similar to the way the auto-throttling
> >mechanism.
> 
> Auto-throttling doesn't let you configure when it kicks in (it doesn't even
> need support from the destination side).  For postcopy you would still have
> a capability, like auto-throttling, just not the argument.

But auto-throttling is handled automatically by qemu rather than management;
and it seems right that it should be a configurable threshold.

> The reason why I prefer a manual step from management, is because postcopy
> is a one-way street.  Suppose a newer version of management software has
> started migration with postcopy configured, and then an older version is
> started.  It is probably an invalid thing to do, but the confusion in the
> older version could be fatal and it's nice if there's an easy way to prevent
> it.

Hmm, the way it's set up at the moment, the new version would have
had to start the VMs down the one-way street; I'm not sure what harm the
old software would do to it.

Having said that, I can see some advantages - in particular knowing when
it's safe to cancel.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH 15/46] Rework loadvm path for subloops
  2014-07-07 14:53       ` Paolo Bonzini
@ 2014-07-07 15:04         ` Dr. David Alan Gilbert
  2014-07-16  9:25         ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert @ 2014-07-07 15:04 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: aarcange, yamahata, quintela, qemu-devel, lilei

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> Il 07/07/2014 16:35, Dr. David Alan Gilbert ha scritto:
> >* Paolo Bonzini (pbonzini@redhat.com) wrote:
> >>Il 04/07/2014 19:41, Dr. David Alan Gilbert (git) ha scritto:
> >>>From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >>>
> >>>Postcopy needs to have two migration streams loading concurrently;
> >>>one from memory (with the device state) and the other from the fd
> >>>with the memory transactions.
> >>
> >>Can you explain this?
> >>
> >>I would have though the order is
> >>
> >>    precopy RAM and everything
> >>    prepare postcopy RAM ("sent && dirty" bitmap)
> >>    finish precopy non-RAM
> >>    finish devices
> >>    postcopy RAM
> >>
> >>Why do you need to have all the packaging stuff and a separate memory-based
> >>migration stream for devices?  I'm sure I'm missing something. :)
> >
> >The thing you're missing is the details of 'finish devices'.
> >The device emulation may access guest memory as part of loading it's
> >state, so you can't successfully complete 'finish devices' without
> >having the 'postcopy RAM' available to provide pages.
> 
> I see.  Can you document the flow (preferrably as a reply to this email
> _and_ in docs/ when you send v2 of the code :))?

Yep, will do; I need to go through and check it before I write the
detailed reply.

Dave

> 
> From my cursory read of the code it is something like this on the source:
> 
>     finish precopy non-RAM
>     start RAM postcopy
>     for each device
>         pack up data
>         send it to destination
> 
> and on the destination:
> 
>     while source sends packet
>         pick up packet atomically
>         pass the packet to device loader
>             (while the loader works, userfaultfd does background magic)
> 
> But something is missing still, either some kind of ack is needed between
> device data sends or userfaultfd needs to be able to process device data
> packets.
> 
> Paolo
> 
> >Thus you need to be able to start up 'postcopy RAM' before 'finish devices'
> >has completed, and you can't do that if 'finish devices' is still stuffing
> >data down the fd.
> >
> >Now, if hypothetically you had:
> >  1) A migration format that let you separate out device state so that you
> >could load all the state of the device off the fd without calling the device
> >IO code.
> >  2) All devices were good and didn't touch guest memory while loading their
> >state.
> >
> >then you could avoid this complexity.  However, if you look at how Stefan's
> >BER code tried to do 1 (which I don't do in my way of doing it), it was by
> >using the same trick of stuffing the device data into a dummy memory file
> >to find out the size of the data.   And I'm not convinced (2) will happen
> >this century.
> >
> >>Paolo
> >
> >Dave
> >--
> >Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] [PATCH 01/46] qemu_ram_foreach_block: pass up error value, and down the ramblock name
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 01/46] qemu_ram_foreach_block: pass up error value, and down the ramblock name Dr. David Alan Gilbert (git)
@ 2014-07-07 15:46   ` Eric Blake
  2014-07-07 15:48     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 83+ messages in thread
From: Eric Blake @ 2014-07-07 15:46 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, lilei, quintela

[-- Attachment #1: Type: text/plain, Size: 1040 bytes --]

On 07/04/2014 11:41 AM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> check the return value of the function it calls and error if it's none-0
> Fixup qemu_rdma_init_one_block that is the only current caller,
>   and __qemu_rdma_add_block the only function it calls using it.

As long as you are at it, is it worth renaming the use of __qemu prefix
in a function name?  That namespace is reserved for the compiler/libc,
and we shouldn't be stomping on it.  Probably best as a separate patch.

> 
> Pass the name of the ramblock to the function; helps in debugging.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  exec.c                    | 10 ++++++++--
>  include/exec/cpu-common.h |  4 ++--
>  migration-rdma.c          |  4 ++--
>  3 files changed, 12 insertions(+), 6 deletions(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] [PATCH 01/46] qemu_ram_foreach_block: pass up error value, and down the ramblock name
  2014-07-07 15:46   ` Eric Blake
@ 2014-07-07 15:48     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert @ 2014-07-07 15:48 UTC (permalink / raw)
  To: Eric Blake; +Cc: aarcange, yamahata, quintela, qemu-devel, lilei

* Eric Blake (eblake@redhat.com) wrote:
> On 07/04/2014 11:41 AM, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > check the return value of the function it calls and error if it's none-0
> > Fixup qemu_rdma_init_one_block that is the only current caller,
> >   and __qemu_rdma_add_block the only function it calls using it.
> 
> As long as you are at it, is it worth renaming the use of __qemu prefix
> in a function name?  That namespace is reserved for the compiler/libc,
> and we shouldn't be stomping on it.  Probably best as a separate patch.

Yes, it sounds like a separate patch and worth doing, but this was already
getting to be a large enough set of patches.

Dave

> 
> > 
> > Pass the name of the ramblock to the function; helps in debugging.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  exec.c                    | 10 ++++++++--
> >  include/exec/cpu-common.h |  4 ++--
> >  migration-rdma.c          |  4 ++--
> >  3 files changed, 12 insertions(+), 6 deletions(-)
> 
> Reviewed-by: Eric Blake <eblake@redhat.com>
> 
> -- 
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] [PATCH 16/46] Add migration-capability boolean for postcopy-ram.
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 16/46] Add migration-capability boolean for postcopy-ram Dr. David Alan Gilbert (git)
@ 2014-07-07 19:41   ` Eric Blake
  2014-07-07 20:23     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 83+ messages in thread
From: Eric Blake @ 2014-07-07 19:41 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, lilei, quintela

[-- Attachment #1: Type: text/plain, Size: 1385 bytes --]

On 07/04/2014 11:41 AM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  include/migration/migration.h | 1 +
>  migration.c                   | 9 +++++++++
>  qapi-schema.json              | 6 +++++-
>  3 files changed, 15 insertions(+), 1 deletion(-)
> 

> +++ b/qapi-schema.json
> @@ -491,10 +491,14 @@
>  # @auto-converge: If enabled, QEMU will automatically throttle down the guest
>  #          to speed up convergence of RAM migration. (since 1.6)
>  #
> +# @x-postcopy-ram: Start executing on the migration target before all of RAM has been
> +#          migrated, pulling the remaining pages along as needed. NOTE: If the
> +#          migration fails during postcopy the VM will fail.  (since 2.2)

How does this work with libvirt's current insistence that it manually
resumes the guest on the destination in order to give feedback to the
source on whether it was successful? I'm not sure if enabling this bool
is the right thing to do, or if we just need more visibility (such as
events rather than the current state of polling) for libvirt to know
that it is time to resume the destination and start the post-copy phase.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] [PATCH 22/46] Migration parameters: Add qmp/hmp commands for setting/viewing
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 22/46] Migration parameters: Add qmp/hmp commands for setting/viewing Dr. David Alan Gilbert (git)
@ 2014-07-07 19:50   ` Eric Blake
  0 siblings, 0 replies; 83+ messages in thread
From: Eric Blake @ 2014-07-07 19:50 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, lilei, quintela

[-- Attachment #1: Type: text/plain, Size: 2825 bytes --]

On 07/04/2014 11:41 AM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Add somewhere for the various migration parameters to be set with
> one command;
> 
> As suggested in the thread:
> http://lists.gnu.org/archive/html/qemu-devel/2012-11/msg00243.html

That's a bit old; I had a newer suggestion here:
https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg02274.html

with positive feedback here and in further followups:
https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg00653.html

> 
> There are many existing migration parameters that are scattered over
> many individual commands; moving those to this scheme would probably break
> things for others, so I've left them be.

I don't mind having information duplicated into two places.  I'd much
rather have all the old commands continue to work one item at a time,
but have the new command (or new extension to existing command) globally
show all tunables, than trying to force management to mix both old and
new calls.  That is, if the new approach works, libvirt would rather use
JUST the new approach instead of a mix of old and new, when managing all
the tunables.

> 
> Preserve the migration tunable values across the reinit of the migration
> status in the same way that the capability flags are preserved.
> 
> Add completion routine for it.
> 
> Use the postcopy time out setting as the first parameter.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  hmp-commands.hx               | 17 ++++++++++
>  hmp.c                         | 54 ++++++++++++++++++++++++++++++
>  hmp.h                         |  4 +++
>  include/migration/migration.h |  4 +++
>  migration.c                   | 78 ++++++++++++++++++++++++++++++++++---------
>  monitor.c                     | 25 ++++++++++++++
>  qapi-schema.json              | 50 +++++++++++++++++++++++++++
>  qmp-commands.hx               | 23 +++++++++++++
>  8 files changed, 239 insertions(+), 16 deletions(-)
> 

> +++ b/qapi-schema.json
> @@ -538,6 +538,56 @@
>  { 'command': 'query-migrate-capabilities', 'returns':   ['MigrationCapabilityStatus']}
>  
>  ##
> +# @MigrationParameterName
> +#
> +# Migration parameter enumeration
> +#     Most existing parameters have separate commands/entries but it seems
> +#     better to group them in the same way as the capability flags
> +#
> +# @x-postcopy-start-time: Time (in ms) after the start of migration to consider
> +#                         switching to postcopy mode
> +#
> +# Since: 2.0

Not even close to 2.0 material.  This is 2.2 at the earliest. (Several
more instances in this patch)

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] [PATCH 16/46] Add migration-capability boolean for postcopy-ram.
  2014-07-07 19:41   ` Eric Blake
@ 2014-07-07 20:23     ` Dr. David Alan Gilbert
  2014-07-10 16:17       ` Paolo Bonzini
  0 siblings, 1 reply; 83+ messages in thread
From: Dr. David Alan Gilbert @ 2014-07-07 20:23 UTC (permalink / raw)
  To: Eric Blake; +Cc: aarcange, yamahata, quintela, qemu-devel, lilei

* Eric Blake (eblake@redhat.com) wrote:
> On 07/04/2014 11:41 AM, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  include/migration/migration.h | 1 +
> >  migration.c                   | 9 +++++++++
> >  qapi-schema.json              | 6 +++++-
> >  3 files changed, 15 insertions(+), 1 deletion(-)
> > 
> 
> > +++ b/qapi-schema.json
> > @@ -491,10 +491,14 @@
> >  # @auto-converge: If enabled, QEMU will automatically throttle down the guest
> >  #          to speed up convergence of RAM migration. (since 1.6)
> >  #
> > +# @x-postcopy-ram: Start executing on the migration target before all of RAM has been
> > +#          migrated, pulling the remaining pages along as needed. NOTE: If the
> > +#          migration fails during postcopy the VM will fail.  (since 2.2)
> 
> How does this work with libvirt's current insistence that it manually
> resumes the guest on the destination in order to give feedback to the
> source on whether it was successful? I'm not sure if enabling this bool
> is the right thing to do, or if we just need more visibility (such as
> events rather than the current state of polling) for libvirt to know
> that it is time to resume the destination and start the post-copy phase.

That's an interesting overlap with Paolo's question.
(and approximately the same answer)

I think what I need to do for that is:
   1) As for precopy add the option not to start the destination CPU on entry to postcopy;
      I think that's OK, because we can carry on in postcopy mode even if the destination
      CPU isn't running, we just won't generate page requests.
   2) Finally fix up the old request libvirt has for events based on migration state.

Admittedly I don't quite understand how (1) is supposed to interact with device
state.

Dave

> -- 
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] [PATCH 00/46] Postcopy implementation
  2014-07-07 14:35     ` Paolo Bonzini
  2014-07-07 14:58       ` Dr. David Alan Gilbert
@ 2014-07-10 11:29       ` Dr. David Alan Gilbert
  2014-07-10 12:48         ` Eric Blake
  1 sibling, 1 reply; 83+ messages in thread
From: Dr. David Alan Gilbert @ 2014-07-10 11:29 UTC (permalink / raw)
  To: Paolo Bonzini, eblake; +Cc: aarcange, yamahata, quintela, qemu-devel, lilei

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> Il 07/07/2014 16:02, Dr. David Alan Gilbert ha scritto:
> >>> Could you have instead a "migrate_start_postcopy" command, and leave the
> >>> policy to management instead?
> >Hmm; yes that is probably possible - although with the migration_set_parameter
> >configuration you get the best of both worlds:
> >   1) You can set the parameter to say a few seconds and let QEMU handle it
> >   2) You can set the parameter really large, but (I need to check) you could
> >      drop the parameter later and then cause it to kick in.
> >
> >I also did it this way because it was similar to the way the auto-throttling
> >mechanism.
> 
> Auto-throttling doesn't let you configure when it kicks in (it doesn't even
> need support from the destination side).  For postcopy you would still have
> a capability, like auto-throttling, just not the argument.
> 
> The reason why I prefer a manual step from management, is because postcopy
> is a one-way street.  Suppose a newer version of management software has
> started migration with postcopy configured, and then an older version is
> started.  It is probably an invalid thing to do, but the confusion in the
> older version could be fatal and it's nice if there's an easy way to prevent
> it.

Actually the 'migrate_start_postcopy' idea is growing on me; Eric is that
also your preferred way of doing it?

If we did this I'd:
   1) Remove the migration_set_parameter code I added
   2) and the x-postcopy_ram_start_time parameter
   3) Add a new command migrate_start_postcopy that just sets a flag
      which is tested in the same place as I currently check the timeout.
      If it's issued after a migration has finished it doesn't fail because
      that would be racy.  If issued before a migration starts that's OK
      as long as postcopy is enabled and means to start postcopy mode
      immediately.
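
As a rough sketch of (3), the flag behaviour could be modelled like this in
Python.  The names (MigrationSketch, request_postcopy, should_switch) are
hypothetical and not taken from the patch set; this only illustrates the
semantics described above, it is not QEMU code.

```python
from dataclasses import dataclass


@dataclass
class MigrationSketch:
    """Toy model of the proposed start-postcopy flag; not QEMU code."""
    postcopy_enabled: bool = False   # the postcopy capability
    state: str = "none"              # "none", "active" or "completed"
    start_postcopy: bool = False     # set by the management command

    def request_postcopy(self) -> None:
        """Model of migrate_start_postcopy: it only sets a flag."""
        if self.state == "completed":
            # Issued after migration finished: succeed silently, since
            # failing here would race with migration completion.
            return
        if not self.postcopy_enabled:
            raise RuntimeError("postcopy capability is not enabled")
        # Before or during migration: remember the request.
        self.start_postcopy = True

    def should_switch(self) -> bool:
        """Checked in the place where the timeout is currently tested."""
        return self.state == "active" and self.start_postcopy
```

Setting the flag before the migration starts means the first check after the
migration becomes active already returns true, which matches "start postcopy
mode immediately".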

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] [PATCH 00/46] Postcopy implementation
  2014-07-10 11:29       ` Dr. David Alan Gilbert
@ 2014-07-10 12:48         ` Eric Blake
  2014-07-10 13:37           ` Dr. David Alan Gilbert
  2014-08-11 15:31           ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 83+ messages in thread
From: Eric Blake @ 2014-07-10 12:48 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Paolo Bonzini
  Cc: aarcange, yamahata, quintela, qemu-devel, lilei

[-- Attachment #1: Type: text/plain, Size: 2942 bytes --]

On 07/10/2014 05:29 AM, Dr. David Alan Gilbert wrote:
> * Paolo Bonzini (pbonzini@redhat.com) wrote:
>> Il 07/07/2014 16:02, Dr. David Alan Gilbert ha scritto:
>>>>> Could you have instead a "migrate_start_postcopy" command, and leave the
>>>>> policy to management instead?
>>> Hmm; yes that is probably possible - although with the migration_set_parameter
>>> configuration you get the best of both worlds:
>>>   1) You can set the parameter to say a few seconds and let QEMU handle it
>>>   2) You can set the parameter really large, but (I need to check) you could
>>>      drop the parameter later and then cause it to kick in.
>>>
>>> I also did it this way because it was similar to the way the auto-throttling
>>> mechanism.
>>
>> Auto-throttling doesn't let you configure when it kicks in (it doesn't even
>> need support from the destination side).  For postcopy you would still have
>> a capability, like auto-throttling, just not the argument.
>>
>> The reason why I prefer a manual step from management, is because postcopy
>> is a one-way street.  Suppose a newer version of management software has
>> started migration with postcopy configured, and then an older version is
>> started.  It is probably an invalid thing to do, but the confusion in the
>> older version could be fatal and it's nice if there's an easy way to prevent
>> it.
> 
> Actually the 'migrate_start_postcopy' idea is growing on me; Eric is that
> also your preferred way of doing it?
> 
> If we did this I'd:
>    1) Remove the migration_set_parameter code I added
>    2) and the x-postcopy_ram_start_time parameter
>    3) Add a new command migrate_start_postcopy that just sets a flag
>       which is tested in the same place as I currently check the timeout.
>       If it's issued after a migration has finished it doesn't fail because
>       that would be racy.  If issued before a migration starts that's OK
>       as long as postcopy is enabled and means to start postcopy mode
>       immediately.

So to make sure I understand, the idea is that the management starts
migration as normal, then after enough time has elapsed, issues a
migrate_start_postcopy to tell qemu that it is okay to switch to
postcopy at the next convenient opportunity?  Is there any need for an
event telling libvirt that enough pre-copy has occurred to make a
postcopy worthwhile?  And of course, I _still_ want an event for when
normal precopy migration is ready (instead of the current solution of
libvirt having to poll to track progress).

But in answer to your question, yes, it sounds like adding a new command
(actually, per QMP conventions it should probably be
migrate-start-postcopy with dashes instead of underscore) for management
to determine if/when to allow postcopy to kick in seems okay.
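
For illustration only, assuming the command keeps the dashed name and takes
no arguments (both are assumptions at this stage of the discussion), the
request on the QMP wire could be built like this; qmp_command is a
hypothetical helper, not an existing API:

```python
import json


def qmp_command(name: str, **arguments) -> str:
    """Serialise a QMP request; 'arguments' is omitted when empty."""
    msg = {"execute": name}
    if arguments:
        msg["arguments"] = arguments
    return json.dumps(msg)


# Hypothetical command name, following the dashed QMP convention.
print(qmp_command("migrate-start-postcopy"))
```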

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] [PATCH 00/46] Postcopy implementation
  2014-07-10 12:48         ` Eric Blake
@ 2014-07-10 13:37           ` Dr. David Alan Gilbert
  2014-07-10 15:33             ` Andrea Arcangeli
  2014-08-11 15:31           ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 83+ messages in thread
From: Dr. David Alan Gilbert @ 2014-07-10 13:37 UTC (permalink / raw)
  To: Eric Blake; +Cc: aarcange, yamahata, lilei, quintela, qemu-devel, Paolo Bonzini

* Eric Blake (eblake@redhat.com) wrote:
> On 07/10/2014 05:29 AM, Dr. David Alan Gilbert wrote:
> > * Paolo Bonzini (pbonzini@redhat.com) wrote:
> >> Il 07/07/2014 16:02, Dr. David Alan Gilbert ha scritto:
> >>>>> Could you have instead a "migrate_start_postcopy" command, and leave the
> >>>>> policy to management instead?
> >>> Hmm; yes that is probably possible - although with the migration_set_parameter
> >>> configuration you get the best of both worlds:
> >>>   1) You can set the parameter to say a few seconds and let QEMU handle it
> >>>   2) You can set the parameter really large, but (I need to check) you could
> >>>      drop the parameter later and then cause it to kick in.
> >>>
> >>> I also did it this way because it was similar to the way the auto-throttling
> >>> mechanism.
> >>
> >> Auto-throttling doesn't let you configure when it kicks in (it doesn't even
> >> need support from the destination side).  For postcopy you would still have
> >> a capability, like auto-throttling, just not the argument.
> >>
> >> The reason why I prefer a manual step from management, is because postcopy
> >> is a one-way street.  Suppose a newer version of management software has
> >> started migration with postcopy configured, and then an older version is
> >> started.  It is probably an invalid thing to do, but the confusion in the
> >> older version could be fatal and it's nice if there's an easy way to prevent
> >> it.
> > 
> > Actually the 'migrate_start_postcopy' idea is growing on me; Eric is that
> > also your preferred way of doing it?
> > 
> > If we did this I'd:
> >    1) Remove the migration_set_parameter code I added
> >    2) and the x-postcopy_ram_start_time parameter
> >    3) Add a new command migrate_start_postcopy that just sets a flag
> >       which is tested in the same place as I currently check the timeout.
> >       If it's issued after a migration has finished it doesn't fail because
> >       that would be racy.  If issued before a migration starts that's OK
> >       as long as postcopy is enabled and means to start postcopy mode
> >       immediately.
> 
> So to make sure I understand, the idea is that the management starts
> migration as normal, then after enough time has elapsed, issues a
> migrate_start_postcopy to tell qemu that it is okay to switch to
> postcopy at the next convenient opportunity? 

Yep, that's the idea.

> Is there any need for an
> event telling libvirt that enough pre-copy has occurred to make a
> postcopy worthwhile?

I'm not sure that qemu knows much more than management does at that
point; you can make any such decision based on an arbitrary cut-off
(e.g. migration is taking too long), or you could consider something
based on some of the other stats that migration already exposes
(like the dirty pages stats); if there are any more stats that you
need we can always expose them.

> And of course, I _still_ want an event for when
> normal precopy migration is ready (instead of the current solution of
> libvirt having to poll to track progress).

Agreed; although we can just do that independently of this big patch set.

> But in answer to your question, yes, it sounds like adding a new command
> (actually, per QMP conventions it should probably be
> migrate-start-postcopy with dashes instead of underscore) for management
> to determine if/when to allow postcopy to kick in seems okay.

OK, I'll do it.

Dave

> 
> -- 
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] [PATCH 00/46] Postcopy implementation
  2014-07-10 13:37           ` Dr. David Alan Gilbert
@ 2014-07-10 15:33             ` Andrea Arcangeli
  2014-07-10 15:49               ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 83+ messages in thread
From: Andrea Arcangeli @ 2014-07-10 15:33 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: yamahata, lilei, quintela, qemu-devel, Paolo Bonzini

On Thu, Jul 10, 2014 at 02:37:43PM +0100, Dr. David Alan Gilbert wrote:
> * Eric Blake (eblake@redhat.com) wrote:
> > Is there any need for an
> > event telling libvirt that enough pre-copy has occurred to make a
> > postcopy worthwhile?
> 
> I'm not sure that qemu knows much more than management does at that
> point; any such decision you can make based on an arbitrary cut off
> (i.e. migration is taking too long) or you could consider something
> based on some of the other stats that migration already exposes
> (like the dirty pages stats); if we've got any more stats that you
> need we can always expose them.
>
> Agreed; although we can just do that independently of this big patch set.

It can be independent, yes, but I think such an event is needed (and once
we add such an event I hope we can get rid of the polling libvirt is
doing for pure precopy too).

I think for very large guests what should happen is a single _lazy_
pass of precopy and then immediately postcopy.

That's why I think an event that notifies libvirt when it should issue
the postcopy command is good, to be able to implement the single
_lazy_ pass and nothing more than that.

qemu should stop precopy and the source guest just before sending the
event, so then libvirt can assign all storage to the destination just
before issuing the postcopy command. By the time the event has been
raised by qemu, the guest in the source qemu must never run
anymore. So it is actually the same event needed in pure precopy too
(except when using precopy+postcopy the "precopy complete" event will
fire much sooner). We'll still need a parameter to precopy to tell
qemu when precopy should stop.

The single precopy lazy pass would consist of clearing the dirty
bitmap, starting precopy, then if any page is found dirty by the time
precopy tries to send it, we skip it. We only send those pages in
precopy that haven't been modified yet by the time we reach them in
precopy.

Pages heavily modified will be sent purely through
postcopy. Ultimately postcopy will be a page sorting feature to
massively decrease the downtime latency, and to reduce to 2*ramsize
the maximum amount of data transferred on the network without having
to slow down the guest artificially. We'll also know exactly the
maximum time in advance that it takes to migrate a large host no
matter the load in it (2*ramsize divided by the network bandwidth
available at the migration time). It'll be totally deterministic, no
black magic slowdowns anymore.
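
A minimal sketch of the single lazy pass and of the 2*ramsize bound, in
Python.  The function names and the example figures (8 pages, a 64 GiB guest,
a 10 Gbit/s link) are assumptions made up for illustration; the bound simply
counts each page at most once in precopy and at most once in postcopy.

```python
def lazy_precopy_pass(num_pages, dirtied_before_visit):
    """Single lazy pass: send a page only if it is still clean when reached.

    dirtied_before_visit is the set of page numbers the guest wrote to
    between clearing the dirty bitmap and the pass reaching that page.
    Returns (pages sent in precopy, pages skipped and left to postcopy).
    """
    sent, skipped = [], []
    for page in range(num_pages):
        if page in dirtied_before_visit:
            skipped.append(page)
        else:
            sent.append(page)
    return sent, skipped


def worst_case_seconds(ram_bytes, bandwidth_bytes_per_sec):
    """Upper bound: every page sent once in precopy and once in postcopy."""
    return 2 * ram_bytes / bandwidth_bytes_per_sec


sent, skipped = lazy_precopy_pass(8, dirtied_before_visit={2, 5})
print(sent, skipped)

# Assumed figures: 64 GiB guest, 10 Gbit/s link (1.25e9 bytes/s).
print(worst_case_seconds(64 * 2**30, 1.25e9))
```

With those assumed figures the bound comes out at roughly 110 seconds,
independent of how fast the guest dirties memory.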

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] [PATCH 00/46] Postcopy implementation
  2014-07-10 15:33             ` Andrea Arcangeli
@ 2014-07-10 15:49               ` Dr. David Alan Gilbert
  2014-07-11  4:05                 ` Sanidhya Kashyap
  0 siblings, 1 reply; 83+ messages in thread
From: Dr. David Alan Gilbert @ 2014-07-10 15:49 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: yamahata, lilei, quintela, qemu-devel, Paolo Bonzini

* Andrea Arcangeli (aarcange@redhat.com) wrote:
> On Thu, Jul 10, 2014 at 02:37:43PM +0100, Dr. David Alan Gilbert wrote:
> > * Eric Blake (eblake@redhat.com) wrote:
> > > Is there any need for an
> > > event telling libvirt that enough pre-copy has occurred to make a
> > > postcopy worthwhile?
> > 
> > I'm not sure that qemu knows much more than management does at that
> > point; any such decision you can make based on an arbitrary cut off
> > (i.e. migration is taking too long) or you could consider something
> > based on some of the other stats that migration already exposes
> > (like the dirty pages stats); if we've got any more stats that you
> > need we can always expose them.
> >
> > Agreed; although we can just do that independently of this big patch set.
> 
> It can be independent yes, but I think such event is needed (and once
> we add such event I hope we can get rid of the polling libvirt is
> doing for pure precopy too).
> 
> I think for very large guests what should happen is a single _lazy_
> pass of precopy and then immediately postcopy.
> 
> That's why I think an event that notifies libvirt when it should issue
> the postcopy command is good, to be able to implement the single
> _lazy_ pass and nothing more than that.
> 
> qemu should stop precopy and the source guest just before sending the
> event, so then libvirt can assign all storage to the destination just
> before issuing the postcopy command. By the time the event has been
> raised by qemu, the guest in the source qemu must never run
> anymore. So it is actually the same event needed in pure precopy too
> (except when using precopy+postcopy the "precopy complete" event will
> fire much sooner). We'll still need a parameter to precopy to tell
> qemu when precopy should stop.

That's an interesting different type of event; I think we probably
have that first pass information but it's not part of the 'state'
(i.e. whether it's started/completed/cancelled enum).

> The single precopy lazy pass would consist of clearing the dirty
> bitmap, starting precopy, then if any page is found dirty by the time
> precopy tries to send it, we skip it. We only send those pages in
> precopy that haven't been modified yet by the time we reach them in
> precopy.
> 
> Pages heavily modified will be sent purely through
> postcopy. Ultimately postcopy will be a page sorting feature to
> massively decrease the downtime latency, and to reduce to 2*ramsize
> the maximum amount of data transferred on the network without having
> to slow down the guest artificially. We'll also know exactly the
> maximum time in advance that it takes to migrate a large host no
> matter the load in it (2*ramsize divided by the network bandwidth
> available at the migration time). It'll be totally deterministic, no
> black magic slowdowns anymore.

There is a trade-off: killing the precopy early does reduce network
bandwidth use, but the other side of it is that you would incur more
postcopy round trips, so your average latency will probably increase.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] [PATCH 16/46] Add migration-capability boolean for postcopy-ram.
  2014-07-07 20:23     ` Dr. David Alan Gilbert
@ 2014-07-10 16:17       ` Paolo Bonzini
  2014-07-10 19:02         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 83+ messages in thread
From: Paolo Bonzini @ 2014-07-10 16:17 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Eric Blake
  Cc: aarcange, yamahata, lilei, qemu-devel, quintela

Il 07/07/2014 22:23, Dr. David Alan Gilbert ha scritto:
> I think what I need to do for that is:
>    1) As for precopy add the option not to start the destination CPU on entry to postcopy;
>       I think that's OK, because we can carry on in postcopy mode even if the destination
>       CPU isn't running, we just won't generate page requests.
>
> Admittedly I don't quite understand how (1) is supposed to interact with device
> state.

This is just passing "-S" on the destination side.  Device state is 
treated the same as without "-S" and can still generate page requests. 
The only difference is whether you have a vm_start() or not.

I think it should be possible to restart the VM on the source side after 
postcopy migration, as long as migration has failed or has been 
canceled.  Whether that makes sense or causes dire disk corruption 
depends on the particular scenario, but then the same holds for precopy 
and we don't try at all to prevent "cont" at the end of migration.  It 
makes it much easier for libvirt to restart the source if it cannot 
continue on the destination.

Paolo

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] [PATCH 16/46] Add migration-capability boolean for postcopy-ram.
  2014-07-10 16:17       ` Paolo Bonzini
@ 2014-07-10 19:02         ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert @ 2014-07-10 19:02 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: aarcange, yamahata, lilei, quintela, qemu-devel

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> Il 07/07/2014 22:23, Dr. David Alan Gilbert ha scritto:
> >I think what I need to do for that is:
> >   1) As for precopy add the option not to start the destination CPU on entry to postcopy;
> >      I think that's OK, because we can carry on in postcopy mode even if the destination
> >      CPU isn't running, we just won't generate page requests.
> >
> >Admittedly I don't quite understand how (1) is supposed to interact with device
> >state.
> 
> This is just passing "-S" on the destination side.  Device state is treated
> the same as without "-S" and can still generate page requests. The only
> difference is whether you have a vm_start() or not.

Good, that sounds easy enough.

> I think it should be possible to restart the VM on the source side after
> postcopy migration, as long as migration has failed or has been canceled.
> Whether that makes sense or causes dire disk corruption depends on the
> particular scenario, but then the same holds for precopy and we don't try at
> all to prevent "cont" at the end of migration.  It makes it much easier for
> libvirt to restart the source if it cannot continue on the destination.

Interesting; Andrea fell into accidentally starting his source and
was somewhat surprised.
I was just going to add the RUN_STATE_MEMORY_STALE that Lei Li added
in the exec-migration patchset.

Dave

> 
> Paolo
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] [PATCH 00/46] Postcopy implementation
  2014-07-10 15:49               ` Dr. David Alan Gilbert
@ 2014-07-11  4:05                 ` Sanidhya Kashyap
  0 siblings, 0 replies; 83+ messages in thread
From: Sanidhya Kashyap @ 2014-07-11  4:05 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Andrea Arcangeli, yamahata, lilei, quintela, qemu-devel, Paolo Bonzini


>> The single precopy lazy pass would consist of clearing the dirty
>> bitmap, starting precopy, then if any page is found dirty by the time
>> precopy tries to send it, we skip it. We only send those pages in
>> precopy that haven't been modified yet by the time we reach them in
>> precopy.
>>
>> Pages heavily modified will be sent purely through
>> postcopy. Ultimately postcopy will be a page sorting feature to
>> massively decrease the downtime latency, and to reduce to 2*ramsize
>> the maximum amount of data transferred on the network without having
>> to slow down the guest artificially. We'll also know exactly the
>> maximum time in advance that it takes to migrate a large host no
>> matter the load in it (2*ramsize divided by the network bandwidth
>> available at the migration time). It'll be totally deterministic, no
>> black magic slowdowns anymore.
> 
> There is a trade off;  killing the precopy does reduce network bandwidth,
> but the other side of it is that you would incur more postcopy round trips,
> so your average latency will probably increase.
> 

I agree with David on the latency issue. My colleague and I have tried
single-iteration precopy followed by postcopy (with our own pre+post
implementation). For workloads with a large writable working set, the VM
remains somewhat unresponsive while pages are being transferred. We coined
a term for this, perceivable downtime, which can be measured for workloads
running network-intensive tasks.

The multiple postcopy round trips will certainly worsen the performance
of a memory-intensive workload such as mcf from SPEC CPU 2006, or even a
memcached-based guest, while it is being migrated (these are some of the
workloads on which we tested our prototype).

I don't yet know how David's postcopy implementation handles multiple
pages; I will try to investigate that at some point.

--

Sanidhya

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] [PATCH 37/46] Add assertion to check migration_dirty_pages doesn't go -ve; have seen it happen once but not sure why
  2014-07-04 17:41 ` [Qemu-devel] [PATCH 37/46] Add assertion to check migration_dirty_pages doesn't go -ve; have seen it happen once but not sure why Dr. David Alan Gilbert (git)
@ 2014-07-11 15:20   ` Eric Blake
  2014-07-11 15:41     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 83+ messages in thread
From: Eric Blake @ 2014-07-11 15:20 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, lilei, quintela

[-- Attachment #1: Type: text/plain, Size: 855 bytes --]

On 07/04/2014 11:41 AM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Long subject line. Also, spell out "negative" instead of abbreviating "-ve"

> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  arch_init.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch_init.c b/arch_init.c
> index c006d21..58eccc1 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -439,6 +439,7 @@ ram_addr_t migration_bitmap_find_and_reset_dirty(MemoryRegion *mr,
>  
>      if (next < size) {
>          clear_bit(next, migration_bitmap);
> +        assert(migration_dirty_pages > 0);
>          migration_dirty_pages--;
>      }
>      *bitoffset = next;
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] [PATCH 37/46] Add assertion to check migration_dirty_pages doesn't go -ve; have seen it happen once but not sure why
  2014-07-11 15:20   ` Eric Blake
@ 2014-07-11 15:41     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert @ 2014-07-11 15:41 UTC (permalink / raw)
  To: Eric Blake; +Cc: aarcange, yamahata, quintela, qemu-devel, lilei

* Eric Blake (eblake@redhat.com) wrote:
> On 07/04/2014 11:41 AM, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Long subject line. Also, spell out "negative" instead of abbreviating "-ve"

Fixed.

Dave

> 
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  arch_init.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/arch_init.c b/arch_init.c
> > index c006d21..58eccc1 100644
> > --- a/arch_init.c
> > +++ b/arch_init.c
> > @@ -439,6 +439,7 @@ ram_addr_t migration_bitmap_find_and_reset_dirty(MemoryRegion *mr,
> >  
> >      if (next < size) {
> >          clear_bit(next, migration_bitmap);
> > +        assert(migration_dirty_pages > 0);
> >          migration_dirty_pages--;
> >      }
> >      *bitoffset = next;
> > 
> 
> -- 
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] [PATCH 15/46] Rework loadvm path for subloops
  2014-07-07 14:53       ` Paolo Bonzini
  2014-07-07 15:04         ` Dr. David Alan Gilbert
@ 2014-07-16  9:25         ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert @ 2014-07-16  9:25 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: aarcange, yamahata, quintela, qemu-devel, lilei

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> Il 07/07/2014 16:35, Dr. David Alan Gilbert ha scritto:
> >* Paolo Bonzini (pbonzini@redhat.com) wrote:
> >>Il 04/07/2014 19:41, Dr. David Alan Gilbert (git) ha scritto:
> >>>From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >>>
> >>>Postcopy needs to have two migration streams loading concurrently;
> >>>one from memory (with the device state) and the other from the fd
> >>>with the memory transactions.
> >>
> >>Can you explain this?
> >>
> >>I would have though the order is
> >>
> >>    precopy RAM and everything
> >>    prepare postcopy RAM ("sent && dirty" bitmap)
> >>    finish precopy non-RAM
> >>    finish devices
> >>    postcopy RAM
> >>
> >>Why do you need to have all the packaging stuff and a separate memory-based
> >>migration stream for devices?  I'm sure I'm missing something. :)
> >
> >The thing you're missing is the details of 'finish devices'.
> >The device emulation may access guest memory as part of loading it's
> >state, so you can't successfully complete 'finish devices' without
> >having the 'postcopy RAM' available to provide pages.
> 
> I see.  Can you document the flow (preferrably as a reply to this email
> _and_ in docs/ when you send v2 of the code :))?

I thought I had documented enough in the docs/migration.txt changes in the last
patch (see the 'Postcopy states' section); however, let's see if the following
is better:

----
Postcopy stream

Loading of device data may cause the device emulation to access guest RAM,
which may trigger faults that have to be resolved by the source.  As such,
the migration stream has to be able to respond with page data *during* the
device load, and hence the device data has to be read from the stream completely
before the device load begins, to free the stream up.  This is achieved by
'packaging' the device data into a blob that's read in one go.

Source behaviour

Until postcopy is entered the migration stream is identical to normal precopy,
except for the addition of a 'postcopy advise' command at the beginning to
let the destination know that postcopy might happen.  When postcopy starts
the source sends the page discard data and then forms the 'package' containing:

   Command: 'postcopy ram listen'
   The device state
      A series of sections, identical to the precopy stream's device state stream
      containing everything except postcopiable devices (i.e. RAM)
   Command: 'postcopy ram run'

The 'package' is sent as the data part of a Command: 'CMD_PACKAGED', and the
contents are formatted in the same way as the main migration stream.

Destination behaviour

Initially the destination looks the same as precopy, with a single thread
reading the migration stream; the 'postcopy advise' and 'discard' commands
are processed to change the way RAM is managed, but don't affect the stream
processing.

------------------------------------------------------------------------------
                        1      2   3     4 5                      6   7
main -----DISCARD-CMD_PACKAGED ( LISTEN  DEVICE     DEVICE DEVICE RUN )
thread                             |       |
                                   |     (page request)   
                                   |        \___
                                   v            \
listen thread:                     --- page -- page -- page -- page -- page --

                                   a   b        c
------------------------------------------------------------------------------

On receipt of CMD_PACKAGED (1)
   All the data associated with the package - the ( ... ) section in the
diagram - is read into memory (into a QEMUSizedBuffer), and the main thread
recurses into qemu_loadvm_state_main to process the contents of the package (2)
which contains commands (3,6) and devices (4...)

On receipt of 'postcopy ram listen' (3), i.e. the 1st command in the package,
a new thread (a) is started that takes over servicing the migration stream,
while the main thread carries on loading the package.  It loads normal
background page data (b), but if a fault happens during a device load (5) the
returned page (c) is loaded by the listen thread, allowing the main thread's
device load to carry on.

The last thing in the CMD_PACKAGED is a 'RUN' command (6) letting the destination
CPUs start running.
At the end of the CMD_PACKAGED (7) the main thread returns to normal running behaviour
and is no longer used by migration, while the listen thread carries
on servicing page data until the end of migration.
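
The "read the whole package in one go" step can be sketched roughly as
follows.  This is only an illustration, not the code in the series: the
4-byte big-endian length prefix and the names read_full() and read_package()
are assumptions made for the example.  The point it shows is that once the
blob is in memory, the fd is free again for page data.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Read exactly len bytes, retrying short reads; 0 on success, -1 on error/EOF */
static int read_full(int fd, void *buf, size_t len)
{
    uint8_t *p = buf;

    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n == -1 && errno == EINTR) {
            continue;
        }
        if (n <= 0) {
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical framing: a 32-bit big-endian length followed by the payload.
 * Returns a malloc'd copy of the payload (caller frees), or NULL on error.
 */
static uint8_t *read_package(int fd, uint32_t *len_out)
{
    uint8_t hdr[4];
    uint32_t len;
    uint8_t *blob;

    if (read_full(fd, hdr, sizeof(hdr)) == -1) {
        return NULL;
    }
    len = ((uint32_t)hdr[0] << 24) | ((uint32_t)hdr[1] << 16) |
          ((uint32_t)hdr[2] << 8) | (uint32_t)hdr[3];

    blob = malloc(len ? len : 1);
    if (!blob) {
        return NULL;
    }
    if (read_full(fd, blob, len) == -1) {
        free(blob);
        return NULL;
    }
    *len_out = len;
    return blob;
}
```

After read_package() returns, whatever follows the package on the fd (page
data, in the postcopy case) can be read by another thread while the blob is
parsed from memory.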

----

Is that any better?

Dave
P.S. I know of at least one bug in this code at the moment, it happens
on a VM that doesn't have many dirty pages where all the pages are
transmitted, and hence the listen thread finishes, before the main thread
gets to 'run'.

> From my cursory read of the code it is something like this on the source:
> 
>     finish precopy non-RAM
>     start RAM postcopy
>     for each device
>         pack up data
>         send it to destination
> 
> and on the destination:
> 
>     while source sends packet
>         pick up packet atomically
>         pass the packet to device loader
>             (while the loader works, userfaultfd does background magic)
> 
> But something is missing still, either some kind of ack is needed between
> device data sends or userfaultfd needs to be able to process device data
> packets.
> 
> Paolo
> 
> >Thus you need to be able to start up 'postcopy RAM' before 'finish devices'
> >has completed, and you can't do that if 'finish devices' is still stuffing
> >data down the fd.
> >
> >Now, if hypothetically you had:
> >  1) A migration format that let you separate out device state so that you
> >could load all the state of the device off the fd without calling the device
> >IO code.
> >  2) All devices were good and didn't touch guest memory while loading their
> >state.
> >
> >then you could avoid this complexity.  However, if you look at how Stefan's
> >BER code tried to do 1 (which I don't do in my way of doing it), it was by
> >using the same trick of stuffing the device data into a dummy memory file
> >to find out the size of the data.   And I'm not convinced (2) will happen
> >this century.
> >
> >>Paolo
> >
> >Dave
> >--
> >Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] [PATCH 07/46] Return path: Open a return path on QEMUFile for sockets
  2014-07-05 10:06   ` Paolo Bonzini
@ 2014-07-16  9:37     ` Dr. David Alan Gilbert
  2014-07-16  9:50       ` Paolo Bonzini
  0 siblings, 1 reply; 83+ messages in thread
From: Dr. David Alan Gilbert @ 2014-07-16  9:37 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: aarcange, yamahata, quintela, qemu-devel, lilei

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> Il 04/07/2014 19:41, Dr. David Alan Gilbert (git) ha scritto:
> >From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> >Postcopy needs a method to send messages from the destination back to
> >the source, this is the 'return path'.
> >

<snip>

> >+/* Give a QEMUFile* off the same socket but data in the opposite
> >+ * direction.
> >+ * qemu_fopen_socket marks write fd's as blocking, but doesn't
> >+ * touch read fd's status, so we dup the fd just to keep settings
> >+ * separate. [TBD: Do I need to explicitly mark as non-block on read?]
> >+ */
> >+static QEMUFile *socket_dup_return_path(void *opaque)
> >+{
> >+    QEMUFileSocket *qfs = opaque;
> >+    int revfd;
> >+    bool this_is_read;
> >+    QEMUFile *result;
> >+
> >+    /* If it's already open, return it */
> >+    if (qfs->file->return_path) {
> >+        return qfs->file->return_path;
> 
> Wouldn't this leave a dangling file descriptor if you call
> socket_dup_return_path twice, and then close the original QEMUFile?

Hmm - how?

> >+    }
> >+
> >+    if (qemu_file_get_error(qfs->file)) {
> >+        /* If the forward file is in error, don't try and open a return */
> >+        return NULL;
> >+    }
> >+
> >+    /* I don't think there's a better way to tell which direction 'this' is */
> >+    this_is_read = qfs->file->ops->get_buffer != NULL;
> >+
> >+    revfd = dup(qfs->fd);
> >+    if (revfd == -1) {
> >+        error_report("Error duplicating fd for return path: %s",
> >+                      strerror(errno));
> >+        return NULL;
> >+    }
> >+
> >+    qemu_set_nonblock(revfd);
> 
> Blocking/nonblocking is per-file *description*, not descriptor.  So you're
> making the original QEMUFile nonblocking as well.  Can you explain why this
> is needed before I reach the meat of the patch series?

Yes, I went through that a few times until I understood that it is per file
description rather than per fd; it still makes life easier for the rest of the
QEMUFile code to have a separate fd for it (hence still worth the dup).
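
The per-description behaviour is easy to check in isolation.  The following
is a minimal standalone sketch (it uses a socketpair rather than the
migration socket, and dup_shares_nonblock() is a name made up for the
example): it sets O_NONBLOCK through a dup()ed descriptor and reads the flag
back through the original.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Returns 1 if setting O_NONBLOCK on a dup()ed descriptor also makes the
 * original descriptor non-blocking, 0 if it does not, -1 on error.
 */
static int dup_shares_nonblock(void)
{
    int sv[2];
    int revfd, flags, shared = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) {
        return -1;
    }

    /* A freshly created socket starts out blocking */
    flags = fcntl(sv[0], F_GETFL);
    assert(flags != -1 && !(flags & O_NONBLOCK));

    revfd = dup(sv[0]);
    if (revfd != -1) {
        /* Set O_NONBLOCK through the duplicate only... */
        flags = fcntl(revfd, F_GETFL);
        if (flags != -1 &&
            fcntl(revfd, F_SETFL, flags | O_NONBLOCK) != -1) {
            /* ...and read the flag back through the original */
            flags = fcntl(sv[0], F_GETFL);
            if (flags != -1) {
                shared = (flags & O_NONBLOCK) ? 1 : 0;
            }
        }
        close(revfd);
    }

    close(sv[0]);
    close(sv[1]);
    return shared;
}
```

On a POSIX system this should return 1, since file status flags live in the
open file description that both descriptors refer to.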

> In other words, can you draw a table with source/dest and read/write, and
> whether it should be blocking or non-blocking?

Sure; the non-blocking behaviour is mostly on the source side;
modifying the table in the docs patch a little:

  Source side
     Forward path - written by migration thread
            : It's OK for this to be blocking, but we end up with it being
              non-blocking, and modify the socket code to emulate blocking.

     Return path  - opened by main thread, read by fd_handler on main thread
            : Must be non-blocking so as not to block the main thread while
              waiting for a partially sent command.

  Destination side
     Forward path - read by main thread
     Return path  - opened by main thread, written by main thread AND postcopy
                    thread (protected by rp_mutex)

     I think I'm OK with both these being blocking.

Dave

> 
> Paolo
> 
> >+    result = qemu_fopen_socket(revfd, this_is_read ? "wb" : "rb");
> >+    qfs->file->return_path = result;
> >+
> >+    if (result) {
> >+        /* We are the reverse path of our reverse path (although I don't
> >+           expect this to be used, it would stop another dup if it was */
> >+        result->return_path = qfs->file;
> >+    } else {
> >+        close(revfd);
> >+    }
> >+
> >+    return result;
> >+}
> >+
> > static ssize_t socket_writev_buffer(void *opaque, struct iovec *iov, int iovcnt,
> >                                     int64_t pos)
> > {
> >@@ -313,17 +361,31 @@ QEMUFile *qemu_fdopen(int fd, const char *mode)
> > }
> >
> > static const QEMUFileOps socket_read_ops = {
> >-    .get_fd =     socket_get_fd,
> >-    .get_buffer = socket_get_buffer,
> >-    .close =      socket_close
> >+    .get_fd          = socket_get_fd,
> >+    .get_buffer      = socket_get_buffer,
> >+    .close           = socket_close,
> >+    .get_return_path = socket_dup_return_path
> > };
> >
> > static const QEMUFileOps socket_write_ops = {
> >-    .get_fd =     socket_get_fd,
> >-    .writev_buffer = socket_writev_buffer,
> >-    .close =      socket_close
> >+    .get_fd          = socket_get_fd,
> >+    .writev_buffer   = socket_writev_buffer,
> >+    .close           = socket_close,
> >+    .get_return_path = socket_dup_return_path
> > };
> >
> >+/*
> >+ * Result: QEMUFile* for a 'return path' for comms in the opposite direction
> >+ *         NULL if not available
> >+ */
> >+QEMUFile *qemu_file_get_return_path(QEMUFile *f)
> >+{
> >+    if (!f->ops->get_return_path) {
> >+        return NULL;
> >+    }
> >+    return f->ops->get_return_path(f->opaque);
> >+}
> >+
> > bool qemu_file_mode_is_not_valid(const char *mode)
> > {
> >     if (mode == NULL ||
> >
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] [PATCH 07/46] Return path: Open a return path on QEMUFile for sockets
  2014-07-16  9:37     ` Dr. David Alan Gilbert
@ 2014-07-16  9:50       ` Paolo Bonzini
  2014-07-16 11:52         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 83+ messages in thread
From: Paolo Bonzini @ 2014-07-16  9:50 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: aarcange, yamahata, quintela, qemu-devel, lilei

Il 16/07/2014 11:37, Dr. David Alan Gilbert ha scritto:
>>>
>>> +
>>> +    /* If it's already open, return it */
>>> +    if (qfs->file->return_path) {
>>> +        return qfs->file->return_path;
>>
>> Wouldn't this leave a dangling file descriptor if you call
>> socket_dup_return_path twice, and then close the original QEMUFile?
>
> Hmm - how?

The problem is that there is no reference count on QEMUFile, so if you do

   f1 = open_return_path(f0);
   f2 = open_return_path(f0);
   /* now f1 == f2 */
   qemu_fclose(f1);
   /* now f2 is dangling */

The remark about "closing the original QEMUFile" is also related to this 
part:

     if (result) {
         /* We are the reverse path of our reverse path (although I don't
            expect this to be used, it would stop another dup if it was */
         result->return_path = qfs->file;

which has a similar bug

   f1 = open_return_path(f0);
   f2 = open_return_path(f1);
   /* now f2 == f0 */
   qemu_fclose(f0);
   /* now f2 is dangling */

If this is correct, the simplest fix is to drop the optimization.
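
For comparison, the alternative to dropping the cache would be reference
counting.  A rough sketch on a toy type follows; ToyFile and the toy_*
functions are hypothetical and are not QEMUFile API.  Every caller of
toy_open_return_path() gets its own reference, so each can call toy_unref()
without caring who else holds the file.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a file object with a cached return path */
typedef struct ToyFile {
    int refcount;
    struct ToyFile *return_path;   /* cached reverse-direction file */
} ToyFile;

static int toy_live_files;         /* objects allocated and not yet freed */

static ToyFile *toy_new(void)
{
    ToyFile *f = calloc(1, sizeof(*f));

    if (f) {
        f->refcount = 1;
        toy_live_files++;
    }
    return f;
}

static void toy_unref(ToyFile *f)
{
    assert(f->refcount > 0);
    if (--f->refcount == 0) {
        if (f->return_path) {
            toy_unref(f->return_path);   /* drop the cache's reference */
        }
        toy_live_files--;
        free(f);
    }
}

static ToyFile *toy_open_return_path(ToyFile *f)
{
    if (!f->return_path) {
        f->return_path = toy_new();      /* this reference is owned by f */
        if (!f->return_path) {
            return NULL;
        }
    }
    f->return_path->refcount++;          /* this one is owned by the caller */
    return f->return_path;
}
```

The sketch deliberately has no back-pointer from the return path to the
forward file; with plain reference counts that would form a cycle and
neither object would ever be freed.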

>
>   Source side
>      Forward path - written by migration thread
>             : It's OK for this to be blocking, but we end up with it being
>               non-blocking, and modify the socket code to emulate blocking.

This likely has a performance impact though.  The first migration thread 
code drop from Juan already improved throughput a lot, even if it kept 
the iothread all the time and only converted from nonblocking writes to 
blocking.

>      Return path  - opened by main thread, read by fd_handler on main thread
>             : Must be non-blocking so as not to block the main thread while
>               waiting for a partially sent command.

Why can't you handle this in the migration thread (or a new postcopy 
thread on the source side)?  Then it can stay blocking.

>   Destination side
>      Forward path - read by main thread

This must be nonblocking so that the monitor keeps responding.

>      Return path  - opened by main thread, written by main thread AND postcopy
>                     thread (protected by rp_mutex)

When does the main thread need to write?

If it doesn't need that, you can just switch to blocking when you 
process the listen command (i.e. when the postcopy thread starts).

Paolo

>      I think I'm OK with both these being blocking.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] [PATCH 07/46] Return path: Open a return path on QEMUFile for sockets
  2014-07-16  9:50       ` Paolo Bonzini
@ 2014-07-16 11:52         ` Dr. David Alan Gilbert
  2014-07-16 12:31           ` Paolo Bonzini
  0 siblings, 1 reply; 83+ messages in thread
From: Dr. David Alan Gilbert @ 2014-07-16 11:52 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: aarcange, yamahata, lilei, qemu-devel, quintela

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> Il 16/07/2014 11:37, Dr. David Alan Gilbert ha scritto:
> >>>
> >>>+
> >>>+    /* If it's already open, return it */
> >>>+    if (qfs->file->return_path) {
> >>>+        return qfs->file->return_path;
> >>
> >>Wouldn't this leave a dangling file descriptor if you call
> >>socket_dup_return_path twice, and then close the original QEMUFile?
> >
> >Hmm - how?
> 
> The problem is that there is no reference count on QEMUFile, so if you do
> 
>   f1 = open_return_path(f0);
>   f2 = open_return_path(f0);
>   /* now f1 == f2 */
>   qemu_fclose(f1);
>   /* now f2 is dangling */

I think from the way I'm using it, I can remove the optimisation, but I do
need to check.

I'm not too sure what your worry is about 'f2' in this case; I guess the caller
needs to know that it should only close the return path once - is that
your worry?

> The remark about "closing the original QEMUFile" is also related to this
> part:
> 
>     if (result) {
>         /* We are the reverse path of our reverse path (although I don't
>            expect this to be used, it would stop another dup if it was */
>         result->return_path = qfs->file;
> 
> which has a similar bug
> 
>   f1 = open_return_path(f0);
>   f2 = open_return_path(f1);
>   /* now f2 == f0 */
>   qemu_fclose(f0);
>   /* now f2 is dangling */
> 
> If this is correct, the simplest fix is to drop the optimization.

I'm more nervous about dropping that one, because the current scheme
does provide a clean way of finding the forward path if you've got the
reverse (although I don't think I make use of it).

> >  Source side
> >     Forward path - written by migration thread
> >            : It's OK for this to be blocking, but we end up with it being
> >              non-blocking, and modify the socket code to emulate blocking.
> 
> This likely has a performance impact though.  The first migration thread
> code drop from Juan already improved throughput a lot, even if it kept the
> iothread all the time and only converted from nonblocking writes to
> blocking.

Can you give some more reasoning as to why you think this will hit
performance so much?  I thought the output buffers were quite big anyway.

> >     Return path  - opened by main thread, read by fd_handler on main thread
> >            : Must be non-blocking so as not to block the main thread while
> >              waiting for a partially sent command.
> 
> Why can't you handle this in the migration thread (or a new postcopy thread
> on the source side)?  Then it can stay blocking.

Handling it within the migration thread would make it much more complicated
(which would be bad since it's already complex enough);

A return path thread on the source side, hmm yes that could do it  - especially
since the migration thread is already a separate thread from the main thread
handling this and thus already needs locking paraphernalia.

> >  Destination side
> >     Forward path - read by main thread
> 
> This must be nonblocking so that the monitor keeps responding.

Interesting, I suspect none of the code in there is set up for that at the
moment, so how does that work during migration at the moment?

Actually, I see I missed something here; this should be:

   Destination side
         Forward path - read by main thread, and listener thread (see the
             separate mail that described that listener thread)

and that means that once postcopy is going (and the listener thread started)
it can't block the monitor.

> >     Return path  - opened by main thread, written by main thread AND postcopy
> >                    thread (protected by rp_mutex)
> 
> When does the main thread needs to write?

Not much; the only things the main thread currently responds to are the
ReqAck (ping like) requests; those are turning out to be very useful during debug;
I've also got the ability for the destination to send a migration result back to the
source which seems useful to be able to 'fail' early.

> If it doesn't need that, you can just switch to blocking when you process
> the listen command (i.e. when the postcopy thread starts).

Why don't I just do it anyway? Prior to postcopy starting we're in the same
situation as we're in with precopy today, so can already get mainblock threading.

Dave

> Paolo
> 
> >     I think I'm OK with both these being blocking.
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] [PATCH 07/46] Return path: Open a return path on QEMUFile for sockets
  2014-07-16 11:52         ` Dr. David Alan Gilbert
@ 2014-07-16 12:31           ` Paolo Bonzini
  2014-07-16 17:10             ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 83+ messages in thread
From: Paolo Bonzini @ 2014-07-16 12:31 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: aarcange, yamahata, lilei, qemu-devel, quintela

Il 16/07/2014 13:52, Dr. David Alan Gilbert ha scritto:
> * Paolo Bonzini (pbonzini@redhat.com) wrote:
>> Il 16/07/2014 11:37, Dr. David Alan Gilbert ha scritto:
>>>>>
>>>>> +
>>>>> +    /* If it's already open, return it */
>>>>> +    if (qfs->file->return_path) {
>>>>> +        return qfs->file->return_path;
>>>>
>>>> Wouldn't this leave a dangling file descriptor if you call
>>>> socket_dup_return_path twice, and then close the original QEMUFile?
>>>
>>> Hmm - how?
>>
>> The problem is that there is no reference count on QEMUFile, so if you do
>>
>>   f1 = open_return_path(f0);
>>   f2 = open_return_path(f0);
>>   /* now f1 == f2 */
>>   qemu_fclose(f1);
>>   /* now f2 is dangling */
>
> I think from the way I'm using it, I can remove the optimisation, but I do
> need to check.
>
> I'm not too sure what your worry is about 'f2' in this case; I guess the caller
> needs to know that it should only close the return path once - is that
> your worry?

Yes.  The API is not well encapsulated; a "random" caller of 
open_return_path does not know (and cannot know) whether it should close 
the returned file or not.

> I'm more nervous about dropping that one, because the current scheme
> does provide a clean way of finding the forward path if you've got the
> reverse (although I don't think I make use of it).

If it isn't used, why keep it?

>>>  Source side
>>>     Forward path - written by migration thread
>>>            : It's OK for this to be blocking, but we end up with it being
>>>              non-blocking, and modify the socket code to emulate blocking.
>>
>> This likely has a performance impact though.  The first migration thread
>> code drop from Juan already improved throughput a lot, even if it kept the
>> iothread all the time and only converted from nonblocking writes to
>> blocking.
>
> Can you give some more reasoning as to why you think this will hit the
> performance so much; I thought the output buffers were quite big anyway.

I don't really know, it's
>>>     Return path  - opened by main thread, read by fd_handler on main thread
>>>            : Must be non-blocking so as not to block the main thread while
>>>              waiting for a partially sent command.
>>
>> Why can't you handle this in the migration thread (or a new postcopy thread
>> on the source side)?  Then it can stay blocking.
>
> Handling it within the migration thread would make it much more complicated
> (which would be bad since it's already complex enough);

Ok.  I'm not sure why it is more complicated since migration is 
essentially two-phase, one where the source drives it and one where the 
source just waits for requests, but I'll trust you on this. :)

>>>  Destination side
>>>     Forward path - read by main thread
>>
>> This must be nonblocking so that the monitor keeps responding.
>
> Interesting, I suspect none of the code in there is set up for that at the
> moment, so how does that work during migration at the moment?

It sure is. :)

On the destination side, migration is done in a coroutine (see 
process_incoming_migration) so it's all transparent.  Only 
socket_get_buffer has to know about this:

         len = qemu_recv(s->fd, buf, size, 0);
         if (len != -1) {
             break;
         }
         if (socket_error() == EAGAIN) {
             yield_until_fd_readable(s->fd);
         } else if (socket_error() != EINTR) {
             break;
         }

If the socket is put in blocking mode recv will never return EAGAIN, so 
this code will only run if the socket is nonblocking.
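
The same retry pattern can be sketched outside QEMU as below.  This is an
approximation only: poll() stands in for yield_until_fd_readable() (so it
blocks the calling thread instead of yielding a coroutine), and recv_wait()
and set_nonblocking() are names invented for the example.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Put fd into non-blocking mode; 0 on success, -1 on error */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1) {
        return -1;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/*
 * Emulate a blocking read on a non-blocking socket: on EAGAIN wait until
 * the fd is readable and retry; on EINTR just retry.
 * Returns the byte count from recv() (0 at EOF) or -1 on a real error.
 */
static ssize_t recv_wait(int fd, void *buf, size_t size)
{
    for (;;) {
        ssize_t len = recv(fd, buf, size, 0);

        if (len != -1) {
            return len;
        }
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            struct pollfd pfd = { .fd = fd, .events = POLLIN };

            if (poll(&pfd, 1, -1) == -1 && errno != EINTR) {
                return -1;
            }
        } else if (errno != EINTR) {
            return -1;
        }
    }
}
```

With a blocking socket the EAGAIN branch is never taken, which is the
behaviour described above.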

> Actually, I see I missed something here; this should be:
>
>    Destination side
>          Forward path - read by main thread, and listener thread (see the
>              separate mail that described that listner thread)
>
> and that means that once postcopy is going (and the listener thread started)
> it can't block the monitor.

Ok, so the listener thread can do socket_set_block(qemu_get_fd(file)) 
once it gets its hands on the QEMUFile.

>>>     Return path  - opened by main thread, written by main thread AND postcopy
>>>                    thread (protected by rp_mutex)
>>
>> When does the main thread need to write?
>
> Not much; the only things the main thread currently responds to are the
> ReqAck (ping like) requests; those are turning out to be very useful during debug;
> I've also got the ability for the destination to send a migration result back to the
> source which seems useful to be able to 'fail' early.

Why can't this be done in the listener thread?  (Thus transforming it 
into a more general postcopy migration thread; later we could even 
change incoming migration from a coroutine to a thread).

>> If it doesn't need that, you can just switch to blocking when you process
>> the listen command (i.e. when the postcopy thread starts).
>
> Why don't I just do it anyway? Prior to postcopy starting we're in the same
> situation as we're in with precopy today, so can already get mainblock threading.

See above for the explanation.

Paolo


* Re: [Qemu-devel] [PATCH 07/46] Return path: Open a return path on QEMUFile for sockets
  2014-07-16 12:31           ` Paolo Bonzini
@ 2014-07-16 17:10             ` Dr. David Alan Gilbert
  2014-07-17  6:25               ` Paolo Bonzini
  0 siblings, 1 reply; 83+ messages in thread
From: Dr. David Alan Gilbert @ 2014-07-16 17:10 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: aarcange, yamahata, lilei, qemu-devel, quintela

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> Il 16/07/2014 13:52, Dr. David Alan Gilbert ha scritto:
> >* Paolo Bonzini (pbonzini@redhat.com) wrote:
> >>Il 16/07/2014 11:37, Dr. David Alan Gilbert ha scritto:
> >>>>>
> >>>>>+
> >>>>>+    /* If it's already open, return it */
> >>>>>+    if (qfs->file->return_path) {
> >>>>>+        return qfs->file->return_path;
> >>>>
> >>>>Wouldn't this leave a dangling file descriptor if you call
> >>>>socket_dup_return_path twice, and then close the original QEMUFile?
> >>>
> >>>Hmm - how?
> >>
> >>The problem is that there is no reference count on QEMUFile, so if you do
> >>
> >>  f1 = open_return_path(f0);
> >>  f2 = open_return_path(f0);
> >>  /* now f1 == f2 */
> >>  qemu_fclose(f1);
> >>  /* now f2 is dangling */
> >
> >I think from the way I'm using it, I can remove the optimisation, but I do
> >need to check.
> >
> >I'm not too sure what your worry is about 'f2' in this case; I guess the caller
> >needs to know that it should only close the return path once - is that
> >your worry?
> 
> Yes.  The API is not well encapsulated; a "random" caller of
> open_return_path does not know (and cannot know) whether it should close the
> returned file or not.

OK, then yes I'll give that a go taking out those optimisations.
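
For comparison, the other way out of the double-open problem would be a
reference count on the handle.  The sketch below is only an illustration
of that idea; RpHandle, Stream, rp_open() and rp_close() are hypothetical
names and nothing here is QEMUFile code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical return-path handle carrying a reference count. */
typedef struct RpHandle {
    unsigned refs;
} RpHandle;

/* Hypothetical forward stream that caches its return path. */
typedef struct Stream {
    RpHandle *return_path;
    int fd_closes;      /* how often the underlying fd would be closed */
} Stream;

/* Every successful open hands out one reference, cached or not. */
static RpHandle *rp_open(Stream *s)
{
    if (!s->return_path) {
        s->return_path = calloc(1, sizeof(*s->return_path));
        if (!s->return_path) {
            return NULL;
        }
    }
    s->return_path->refs++;
    return s->return_path;
}

/* Every open is paired with exactly one close; only the last close
 * releases the descriptor, so no caller is left with a dangling handle. */
static void rp_close(Stream *s, RpHandle *h)
{
    assert(h->refs > 0);
    if (--h->refs == 0) {
        if (s->return_path == h) {
            s->return_path = NULL;
        }
        s->fd_closes++;     /* stands in for closing the real fd */
        free(h);
    }
}
```

With this shape a "random" caller always closes what it opened, at the
cost of carrying the count around; dropping the cache avoids that cost.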

> >I'm more nervous about dropping that one, because the current scheme
> >does provide a clean way of finding the forward path if you've got the
> >reverse (although I don't think I make use of it).
> 
> If it isn't used, why keep it?

It just felt pleasantly symmetric being able to get the forward path
by asking for the return path on the return path; but I can remove it.

> >>> Source side
> >>>    Forward path - written by migration thread
> >>>           : It's OK for this to be blocking, but we end up with it being
> >>>             non-blocking, and modify the socket code to emulate blocking.
> >>
> >>This likely has a performance impact though.  The first migration thread
> >>code drop from Juan already improved throughput a lot, even if it kept the
> >>iothread all the time and only converted from nonblocking writes to
> >>blocking.
> >
> >Can you give some more reasoning as to why you think this will hit the
> >performance so much; I thought the output buffers were quite big anyway.
> 
> I don't really know, it's
> >>>    Return path  - opened by main thread, read by fd_handler on main thread
> >>>           : Must be non-blocking so as not to block the main thread while
> >>>             waiting for a partially sent command.
> >>
> >>Why can't you handle this in the migration thread (or a new postcopy thread
> >>on the source side)?  Then it can stay blocking.
> >
> >Handling it within the migration thread would make it much more complicated
> >(which would be bad since it's already complex enough);
> 
> Ok.  I'm not sure why it is more complicated since migration is essentially
> two-phase, one where the source drives it and one where the source just
> waits for requests, but I'll trust you on this. :)

It's not as clean a split as that; during the postcopy phase we still do the linear
page scan to send pages before they're requested, so the main migration thread
code keeps going.
(There's an 'interesting' balance here: send too many linear pages and they get
in the way of the postcopy requests and increase the latency, but sending them
means that you get a lot of the pages without having to request them, which is 0
latency.)
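
A toy model of that interleaving, assuming requested pages always take
strict priority over the linear scan (that policy is an assumption made
for the sketch; this mail doesn't spell out the one the series uses).
Sender, request_page() and next_page() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8
#define QLEN   8

typedef struct Sender {
    bool   sent[NPAGES];   /* pages already transmitted */
    size_t scan_pos;       /* next candidate for the linear scan */
    size_t queue[QLEN];    /* ring buffer of requested pages */
    size_t q_head, q_count;
} Sender;

/* Queue a page requested by the destination; false if it can't be queued. */
static bool request_page(Sender *s, size_t page)
{
    if (page >= NPAGES || s->q_count == QLEN) {
        return false;
    }
    s->queue[(s->q_head + s->q_count) % QLEN] = page;
    s->q_count++;
    return true;
}

/* Pick the next page to send, or -1 once every page has gone out. */
static long next_page(Sender *s)
{
    /* Requested pages first: they have a stalled guest waiting on them. */
    while (s->q_count > 0) {
        size_t page = s->queue[s->q_head];

        s->q_head = (s->q_head + 1) % QLEN;
        s->q_count--;
        if (!s->sent[page]) {
            s->sent[page] = true;
            return (long)page;
        }
        /* already sent by the linear scan: drop the request */
    }
    /* Otherwise carry on with the linear scan, skipping sent pages. */
    while (s->scan_pos < NPAGES) {
        size_t page = s->scan_pos++;

        if (!s->sent[page]) {
            s->sent[page] = true;
            return (long)page;
        }
    }
    return -1;
}
```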

> >>> Destination side
> >>>    Forward path - read by main thread
> >>
> >>This must be nonblocking so that the monitor keeps responding.
> >
> >Interesting, I suspect none of the code in there is set up for that at the
> >moment, so how does that work during migration at the moment?
> 
> It sure is. :)

Oh so it is; I missed the 'qemu_set_nonblock(fd);' in process_incoming_migration

> On the destination side, migration is done in a coroutine (see
> process_incoming_migration) so it's all transparent.  Only socket_get_buffer
> has to know about this:
> 
>         len = qemu_recv(s->fd, buf, size, 0);
>         if (len != -1) {
>             break;
>         }
>         if (socket_error() == EAGAIN) {
>             yield_until_fd_readable(s->fd);
>         } else if (socket_error() != EINTR) {
>             break;
>         }
> 
> If the socket is put in blocking mode recv will never return EAGAIN, so this
> code will only run if the socket is nonblocking.

OK, yes.

> >Actually, I see I missed something here; this should be:
> >
> >   Destination side
> >         Forward path - read by main thread, and listener thread (see the
> >             separate mail that described that listener thread)
> >
> >and that means that once postcopy is going (and the listener thread started)
> >it can't block the monitor.
> 
> Ok, so the listener thread can do socket_set_block(qemu_get_fd(file)) once
> it gets its hands on the QEMUFile.
> 
> >>>    Return path  - opened by main thread, written by main thread AND postcopy
> >>>                   thread (protected by rp_mutex)
> >>
> >>When does the main thread need to write?
> >
> >Not much; the only things the main thread currently responds to are the
> >ReqAck (ping like) requests; those are turning out to be very useful during debug;
> >I've also got the ability for the destination to send a migration result back to the
> >source which seems useful to be able to 'fail' early.
> 
> Why can't this be done in the listener thread?  (Thus transforming it into a
> more general postcopy migration thread; later we could even change incoming
> migration from a coroutine to a thread).

It depends when the ReqAck is sent; i.e. if it's received when the listener
thread is running and processing the stream then it's the listener thread that
sends the reply.

However, that's not necessarily a big issue now that you've pointed out that
the destination fd is already running non-blocking.   If the worst comes to the
worst I could just disable the ReqAcks in non-debug builds.

> >>If it doesn't need that, you can just switch to blocking when you process
> >>the listen command (i.e. when the postcopy thread starts).
> >
> >Why don't I just do it anyway? Prior to postcopy starting we're in the same
> >situation as we're in with precopy today, so can already get mainblock threading.
> 
> See above for the explanation.
> 
> Paolo
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH 07/46] Return path: Open a return path on QEMUFile for sockets
  2014-07-16 17:10             ` Dr. David Alan Gilbert
@ 2014-07-17  6:25               ` Paolo Bonzini
  0 siblings, 0 replies; 83+ messages in thread
From: Paolo Bonzini @ 2014-07-17  6:25 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: aarcange, yamahata, quintela, lilei, qemu-devel

Il 16/07/2014 19:10, Dr. David Alan Gilbert ha scritto:
>>> > >
>>> > >Handling it within the migration thread would make it much more complicated
>>> > >(which would be bad since it's already complex enough);
>> >
>> > Ok.  I'm not sure why it is more complicated since migration is essentially
>> > two-phase, one where the source drives it and one where the source just
>> > waits for requests, but I'll trust you on this. :)
> It's not as clean a split as that; during the postcopy phase we still do the linear
> page scan to send pages before they're requested, so the main migration thread
> code keeps going.

Ah, right.  As I said, I trusted you! ;)

>>>>> > >>>    Return path  - opened by main thread, written by main thread AND postcopy
>>>>> > >>>                   thread (protected by rp_mutex)
>>>> > >>
>>>> > >>When does the main thread need to write?
>>> > >
>>> > >Not much; the only things the main thread currently responds to are the
>>> > >ReqAck (ping like) requests; those are turning out to be very useful during debug;
>>> > >I've also got the ability for the destination to send a migration result back to the
>>> > >source which seems useful to be able to 'fail' early.
>> >
>> > Why can't this be done in the listener thread?  (Thus transforming it into a
>> > more general postcopy migration thread; later we could even change incoming
>> > migration from a coroutine to a thread).
> It depends when the ReqAck is sent; i.e. if it's received when the listener
> thread is running and processing the stream then it's the listener thread that
> sends the reply.

Could the start of the listener thread basically coincide with a 
handover from the main thread to the listener thread?  If so, perhaps 
you can avoid sending ReqAck packets before the "start destination" 
command (which I suppose should move processing of incoming page data 
from the main thread to the userfaultfd thread).  Then the "start 
destination" command can also change the socket back to blocking.
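
For reference, flipping a descriptor between the two modes is just a matter
of the O_NONBLOCK flag.  A minimal sketch using plain fcntl(); the helper
names fd_set_blocking() and fd_is_nonblocking() are hypothetical and this
is not the body of socket_set_block().

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical helper: clear O_NONBLOCK when blocking is true, set it
 * otherwise.  Returns 0 on success, -1 on error (errno set by fcntl). */
static int fd_set_blocking(int fd, bool blocking)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1) {
        return -1;
    }
    if (blocking) {
        flags &= ~O_NONBLOCK;
    } else {
        flags |= O_NONBLOCK;
    }
    return fcntl(fd, F_SETFL, flags) == -1 ? -1 : 0;
}

/* Returns 1 if O_NONBLOCK is set, 0 if not, -1 on error. */
static int fd_is_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1) {
        return -1;
    }
    return (flags & O_NONBLOCK) ? 1 : 0;
}
```

The flag lives on the open file description, so it affects every thread
(and every dup of the descriptor) at once; hence the handover has to
happen at a point where only one thread is reading.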

Paolo

> However, that's not necessarily a big issue now that you've pointed out that
> the destination fd is already running non-blocking.   If the worst comes to the
> worst I could just disable the ReqAcks in non-debug builds.
>


* Re: [Qemu-devel] [PATCH 00/46] Postcopy implementation
  2014-07-10 12:48         ` Eric Blake
  2014-07-10 13:37           ` Dr. David Alan Gilbert
@ 2014-08-11 15:31           ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 83+ messages in thread
From: Dr. David Alan Gilbert @ 2014-08-11 15:31 UTC (permalink / raw)
  To: Eric Blake; +Cc: aarcange, yamahata, lilei, quintela, qemu-devel, Paolo Bonzini

* Eric Blake (eblake@redhat.com) wrote:
> On 07/10/2014 05:29 AM, Dr. David Alan Gilbert wrote:
> > * Paolo Bonzini (pbonzini@redhat.com) wrote:
> >> Il 07/07/2014 16:02, Dr. David Alan Gilbert ha scritto:
> >>>>> Could you have instead a "migrate_start_postcopy" command, and leave the
> >>>>> policy to management instead?
> >>> Hmm; yes that is probably possible - although with the migration_set_parameter
> >>> configuration you get the best of both worlds:
> >>>   1) You can set the parameter to say a few seconds and let QEMU handle it
> >>>   2) You can set the parameter really large, but (I need to check) you could
> >>>      drop the parameter later and then cause it to kick in.
> >>>
> >>> I also did it this way because it was similar to the way the auto-throttling
> >>> mechanism works.
> >>
> >> Auto-throttling doesn't let you configure when it kicks in (it doesn't even
> >> need support from the destination side).  For postcopy you would still have
> >> a capability, like auto-throttling, just not the argument.
> >>
> >> The reason why I prefer a manual step from management, is because postcopy
> >> is a one-way street.  Suppose a newer version of management software has
> >> started migration with postcopy configured, and then an older version is
> >> started.  It is probably an invalid thing to do, but the confusion in the
> >> older version could be fatal and it's nice if there's an easy way to prevent
> >> it.
> > 
> > Actually the 'migrate_start_postcopy' idea is growing on me; Eric is that
> > also your preferred way of doing it?
> > 
> > If we did this I'd:
> >    1) Remove the migration_set_parameter code I added
> >    2) and the x-postcopy_ram_start_time parameter
> >    3) Add a new command migrate_start_postcopy that just sets a flag
> >       which is tested in the same place as I currently check the timeout.
> >       If it's issued after a migration has finished it doesn't fail because
> >       that would be racy.  If issued before a migration starts that's OK
> >       as long as postcopy is enabled and means to start postcopy mode
> >       immediately.
> 
> So to make sure I understand, the idea is that the management starts
> migration as normal, then after enough time has elapsed, issues a
> migrate_start_postcopy to tell qemu that it is okay to switch to
> postcopy at the next convenient opportunity?  Is there any need for an
> event telling libvirt that enough pre-copy has occurred to make a
> postcopy worthwhile?  And of course, I _still_ want an event for when
> normal precopy migration is ready (instead of the current solution of
> libvirt having to poll to track progress).
> 
> But in answer to your question, yes, it sounds like adding a new command
> (actually, per QMP conventions it should probably be
> migrate-start-postcopy with dashes instead of underscore) for management
> to determine if/when to allow postcopy to kick in seems okay.

That's implemented in the v2 I just posted.
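
Roughly, the shape discussed above is a flag set by the command and tested
where the timeout used to be checked.  The following is a self-contained
sketch of that idea, not the v2 code; MigSketch and the function names
are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, cut-down migration state. */
typedef struct MigSketch {
    atomic_bool start_postcopy;  /* set by the management command */
    bool postcopy_enabled;       /* the postcopy capability */
    bool in_postcopy;            /* already switched over */
} MigSketch;

static void mig_init(MigSketch *m, bool postcopy_enabled)
{
    atomic_init(&m->start_postcopy, false);
    m->postcopy_enabled = postcopy_enabled;
    m->in_postcopy = false;
}

/* Command handler: only records the request.  It never fails, so it is
 * harmless whether issued before, during or after a migration. */
static void cmd_start_postcopy(MigSketch *m)
{
    atomic_store(&m->start_postcopy, true);
}

/* Called from the migration loop in place of the timeout test.
 * Returns true on the iteration that switches to postcopy. */
static bool maybe_switch_to_postcopy(MigSketch *m, bool nonpostcopiable_pending)
{
    if (m->postcopy_enabled && !m->in_postcopy &&
        !nonpostcopiable_pending && atomic_load(&m->start_postcopy)) {
        m->in_postcopy = true;
        return true;
    }
    return false;
}
```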

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH 31/46] Postcopy: Rework migration thread for postcopy mode
  2014-07-05 10:19   ` Paolo Bonzini
@ 2014-08-28 11:04     ` Dr. David Alan Gilbert
  2014-08-28 11:23       ` Paolo Bonzini
  0 siblings, 1 reply; 83+ messages in thread
From: Dr. David Alan Gilbert @ 2014-08-28 11:04 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: aarcange, yamahata, quintela, qemu-devel, lilei

* Paolo Bonzini (pbonzini@redhat.com) wrote:

Hi Paolo,
  Apologies, I realised I hadn't dug into this comment.

> Il 04/07/2014 19:41, Dr. David Alan Gilbert (git) ha scritto:
> >From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> >Switch to postcopy if:
> >   1) There's still a significant amount to transfer
> >   2) Postcopy is enabled
> >   3) It's taken longer than the time set by the parameter.
> >
> >and change the cleanup at the end of migration to match.
> >
> >Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >---
> > migration.c | 92 ++++++++++++++++++++++++++++++++++++++++++++++++-------------
> > 1 file changed, 73 insertions(+), 19 deletions(-)
> >
> >diff --git a/migration.c b/migration.c
> >index 0d567ef..c73fcfa 100644
> >--- a/migration.c
> >+++ b/migration.c
> >@@ -982,16 +982,40 @@ static int postcopy_start(MigrationState *ms)
> > static void *migration_thread(void *opaque)
> > {
> >     MigrationState *s = opaque;
> >+    /* Used by the bandwidth calcs, updated later */
> >     int64_t initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> >+    /* Really, the time we started */
> >+    const int64_t initial_time_fixed = initial_time;
> >     int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> >     int64_t initial_bytes = 0;
> >     int64_t max_size = 0;
> >     int64_t start_time = initial_time;
> >+    int64_t pc_start_time;
> >+
> >     bool old_vm_running = false;
> >+    pc_start_time = s->tunables[MIGRATION_PARAMETER_NAME_X_POSTCOPY_START_TIME];
> >+
> >+    /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */
> >+    enum MigrationPhase current_active_type = MIG_STATE_ACTIVE;
> >
> >     qemu_savevm_state_begin(s->file, &s->params);
> >
> >+    if (migrate_postcopy_ram()) {
> >+        /* Now tell the dest that it should open its end so it can reply */
> >+        qemu_savevm_send_openrp(s->file);
> >+
> >+        /* And ask it to send an ack that will make stuff easier to debug */
> >+        qemu_savevm_send_reqack(s->file, 1);
> >+
> >+        /* Tell the destination that we *might* want to do postcopy later;
> >+         * if the other end can't do postcopy it should fail now, nice and
> >+         * early.
> >+         */
> >+        qemu_savevm_send_postcopy_ram_advise(s->file);
> >+    }
> >+
> >     s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
> >+    current_active_type = MIG_STATE_ACTIVE;
> >     migrate_set_state(s, MIG_STATE_SETUP, MIG_STATE_ACTIVE);
> >
> >     DPRINTF("setup complete\n");
> >@@ -1012,37 +1036,66 @@ static void *migration_thread(void *opaque)
> >                     " nonpost=%" PRIu64 ")\n",
> >                     pending_size, max_size, pend_post, pend_nonpost);
> >             if (pending_size && pending_size >= max_size) {
> >-                qemu_savevm_state_iterate(s->file);
> >+                /* Still a significant amount to transfer */
> >+
> >+                current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> >+                if (migrate_postcopy_ram() &&
> >+                    s->state != MIG_STATE_POSTCOPY_ACTIVE &&
> >+                    pend_nonpost == 0 &&
> >+                    (current_time >= initial_time_fixed + pc_start_time)) {
> >+
> >+                    if (!postcopy_start(s)) {
> >+                        current_active_type = MIG_STATE_POSTCOPY_ACTIVE;
> >+                    }
> >+
> >+                    continue;
> >+                } else {
> 
> You don't really need the "else" if you have a continue.  However, do you
> need _any_ of the "else" and "continue"?  Would the next iteration of the
> "while" loop do anything other than invoke qemu_savevm_state_iterate?

Yes, I've dropped that 'else'; however, I've kept the continue - we're about
3 if's deep here inside the loop and there's a bunch of stuff at the end of
the if's but still inside the loop that I'm not 100% sure I want to run
again at this point (although it's probably OK).

> >+                    /* Just another iteration step */
> >+                    qemu_savevm_state_iterate(s->file);
> >+                }
> >             } else {
> >                 int ret;
> >
> >-                DPRINTF("done iterating\n");
> >-                qemu_mutex_lock_iothread();
> >-                start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> >-                qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
> >-                old_vm_running = runstate_is_running();
> >-
> >-                ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> >-                if (ret >= 0) {
> >-                    qemu_file_set_rate_limit(s->file, INT64_MAX);
> >-                    qemu_savevm_state_complete(s->file);
> >-                }
> >-                qemu_mutex_unlock_iothread();
> >-
> >-                if (ret < 0) {
> >-                    migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_ERROR);
> >-                    break;
> >+                DPRINTF("done iterating pending size %" PRIu64 "\n",
> >+                        pending_size);
> >+
> >+                if (s->state == MIG_STATE_ACTIVE) {
> >+                    qemu_mutex_lock_iothread();
> >+                    start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> >+                    qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
> >+                    old_vm_running = runstate_is_running();
> >+
> >+                    ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> >+                    if (ret >= 0) {
> >+                        qemu_file_set_rate_limit(s->file, INT64_MAX);
> >+                        qemu_savevm_state_complete(s->file);
> >+                    }
> >+                    qemu_mutex_unlock_iothread();
> >+                    if (ret < 0) {
> >+                        migrate_set_state(s, current_active_type,
> >+                                          MIG_STATE_ERROR);
> >+                        break;
> >+                    }
> 
> I think all this code applies to postcopy as well.  Only the body of the
> first "if" must be replaced by qemu_savevm_state_postcopy_complete for
> postcopy.

A lot of this stuff is done, but it's done at the point we transition into
postcopy, not at the end (see postcopy_start).  However, I've not
got the wakeup_request and old_vm_running check, so I probably need to
think about where they should go.  What's the purpose of the
qemu_system_wakeup_request there?  It seems to be getting the guest into the
running state, which is where I'd assumed it was already.

> >+                } else {
> >+                    assert(s->state == MIG_STATE_POSTCOPY_ACTIVE);
> 
> This can fail if you get a cancel in the meanwhile.  You can replace the "if
> (s->state == MIG_STATE_ACTIVE)" by "if (current_active_type ==
> MIG_STATE_ACTIVE)" and remove the assert here.  Alternatively:

Ah, thanks - fixed in the next version.

>    if (migrate_postcopy_ram()) {
>        assert(current_active_type == MIG_STATE_ACTIVE);
>        ...
>    } else {
>        assert(current_active_type == MIG_STATE_POSTCOPY_ACTIVE);
>        ...
>    }
> 
> >+                    DPRINTF("postcopy end\n");
> >+
> >+                    qemu_savevm_state_postcopy_complete(s->file);
> >+                    DPRINTF("postcopy end after complete\n");
> >                 }
> >
> >                 if (!qemu_file_get_error(s->file)) {
> >-                    migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_COMPLETED);
> >+                    migrate_set_state(s, current_active_type,
> >+                                      MIG_STATE_COMPLETED);
> >                     break;
> >                 }
> >             }
> >         }
> >
> >         if (qemu_file_get_error(s->file)) {
> >-            migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_ERROR);
> >+            migrate_set_state(s, current_active_type, MIG_STATE_ERROR);
> >+            DPRINTF("migration_thread: file is in error state\n");
> >             break;
> >         }
> >         current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> >@@ -1073,6 +1126,7 @@ static void *migration_thread(void *opaque)
> >         }
> >     }
> >
> >+    DPRINTF("migration_thread: Hit error: case\n");
> 
> This dprintf looks weird.

Fixed.

Dave

> 
> Paolo
> 
> >     qemu_mutex_lock_iothread();
> >     if (s->state == MIG_STATE_COMPLETED) {
> >         int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> >
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH 31/46] Postcopy: Rework migration thread for postcopy mode
  2014-08-28 11:04     ` Dr. David Alan Gilbert
@ 2014-08-28 11:23       ` Paolo Bonzini
  0 siblings, 0 replies; 83+ messages in thread
From: Paolo Bonzini @ 2014-08-28 11:23 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: aarcange, yamahata, quintela, qemu-devel, lilei

Il 28/08/2014 13:04, Dr. David Alan Gilbert ha scritto:
>> You don't really need the "else" if you have a continue.  However, do you
>> need _any_ of the "else" and "continue"?  Would the next iteration of the
>> "while" loop do anything other than invoke qemu_savevm_state_iterate?
> 
> Yes, I've dropped that 'else'; however, I've kept the continue - we're about
> 3 if's deep here inside the loop and there's a bunch of stuff at the end of
> the if's but still inside the loop that I'm not 100% sure I want to run
> again at this point (although it's probably OK).

My point is that the next iteration would start exactly with the else
(calling qemu_savevm_state_iterate) and then do the stuff at the end of
the if's (bandwidth calculation and all that).  So why not do that
immediately, without the "continue"?

> A lot of this stuff is done, but it's done at the point we transition into
> postcopy, not at the end (see postcopy_start).

Perhaps you can move the common parts to a separate function instead of
cut-and-paste? ;)

> However, I've not
> got the wakeup_request and old_vm_running check; so I probably need to
> think where they should go; what's the purpose of the qemu_system_wakeup_request
> there? It seems to be getting the guest into running state - which is
> where I'd assumed it was already.

No, it doesn't have to be; you could be doing non-live migration.
old_vm_running makes sure that if migration fails the VM restarts.  You
need to grab the state just before force-stopping the VM.
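
The old_vm_running pattern in miniature, as a sketch with hypothetical
names (VmSketch, stop_for_completion(), finish_migration()) rather than
the real runstate API:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical VM with nothing but a run state. */
typedef struct VmSketch {
    bool running;
} VmSketch;

/* Sample the run state immediately before force-stopping, and return it.
 * Sampling any earlier could miss a pause or resume issued in between. */
static bool stop_for_completion(VmSketch *vm)
{
    bool old_vm_running = vm->running;

    vm->running = false;
    return old_vm_running;
}

/* On failure, restart the guest only if it was running before the stop;
 * a VM that was paused (non-live migration) stays paused. */
static void finish_migration(VmSketch *vm, bool old_vm_running, bool succeeded)
{
    if (!succeeded && old_vm_running) {
        vm->running = true;
    }
}
```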

Regarding qemu_system_wakeup_request, it only does something if the
virtual machine is suspended-to-RAM; the call is meant to handle
migration of such a VM.  The idea is that since we don't transmit the
runstate, we just wake up the VM on the destination.  But you need to
prepare the VM for that (which is basically a reset plus setting a
couple of ACPI registers).

The handling of the request is done here:

    if (qemu_wakeup_requested()) {
        pause_all_vcpus();
        cpu_synchronize_all_states();
        qemu_system_reset(VMRESET_SILENT);
        notifier_list_notify(&wakeup_notifiers, &wakeup_reason);
        wakeup_reason = QEMU_WAKEUP_REASON_NONE;
        resume_all_vcpus();
        qapi_event_send_wakeup(&error_abort);
    }


However, it might be broken or might be working by chance only.

Paolo


end of thread, other threads:[~2014-08-28 11:24 UTC | newest]

Thread overview: 83+ messages
2014-07-04 17:41 [Qemu-devel] [PATCH 00/46] Postcopy implementation Dr. David Alan Gilbert (git)
2014-07-04 17:41 ` [Qemu-devel] [PATCH 01/46] qemu_ram_foreach_block: pass up error value, and down the ramblock name Dr. David Alan Gilbert (git)
2014-07-07 15:46   ` Eric Blake
2014-07-07 15:48     ` Dr. David Alan Gilbert
2014-07-04 17:41 ` [Qemu-devel] [PATCH 02/46] Move QEMUFile structure to qemu-file.h Dr. David Alan Gilbert (git)
2014-07-04 17:41 ` [Qemu-devel] [PATCH 03/46] QEMUSizedBuffer/QEMUFile Dr. David Alan Gilbert (git)
2014-07-04 17:41 ` [Qemu-devel] [PATCH 04/46] improve DPRINTF macros, add to savevm Dr. David Alan Gilbert (git)
2014-07-04 17:41 ` [Qemu-devel] [PATCH 05/46] Add qemu_get_counted_string to read a string prefixed by a count byte Dr. David Alan Gilbert (git)
2014-07-04 17:41 ` [Qemu-devel] [PATCH 06/46] Create MigrationIncomingState Dr. David Alan Gilbert (git)
2014-07-04 17:41 ` [Qemu-devel] [PATCH 07/46] Return path: Open a return path on QEMUFile for sockets Dr. David Alan Gilbert (git)
2014-07-05 10:06   ` Paolo Bonzini
2014-07-16  9:37     ` Dr. David Alan Gilbert
2014-07-16  9:50       ` Paolo Bonzini
2014-07-16 11:52         ` Dr. David Alan Gilbert
2014-07-16 12:31           ` Paolo Bonzini
2014-07-16 17:10             ` Dr. David Alan Gilbert
2014-07-17  6:25               ` Paolo Bonzini
2014-07-04 17:41 ` [Qemu-devel] [PATCH 08/46] Return path: socket_writev_buffer: Block even on non-blocking fd's Dr. David Alan Gilbert (git)
2014-07-05 10:07   ` Paolo Bonzini
2014-07-04 17:41 ` [Qemu-devel] [PATCH 09/46] Migration commands Dr. David Alan Gilbert (git)
2014-07-04 17:41 ` [Qemu-devel] [PATCH 10/46] Return path: Control commands Dr. David Alan Gilbert (git)
2014-07-04 17:41 ` [Qemu-devel] [PATCH 11/46] Return path: Send responses from destination to source Dr. David Alan Gilbert (git)
2014-07-04 17:41 ` [Qemu-devel] [PATCH 12/46] Return path: Source handling of return path Dr. David Alan Gilbert (git)
2014-07-04 17:41 ` [Qemu-devel] [PATCH 13/46] qemu_loadvm debug Dr. David Alan Gilbert (git)
2014-07-04 17:41 ` [Qemu-devel] [PATCH 14/46] ram_debug_dump_bitmap: Dump a migration bitmap as text Dr. David Alan Gilbert (git)
2014-07-04 17:41 ` [Qemu-devel] [PATCH 15/46] Rework loadvm path for subloops Dr. David Alan Gilbert (git)
2014-07-05 10:26   ` Paolo Bonzini
2014-07-07 14:35     ` Dr. David Alan Gilbert
2014-07-07 14:53       ` Paolo Bonzini
2014-07-07 15:04         ` Dr. David Alan Gilbert
2014-07-16  9:25         ` Dr. David Alan Gilbert
2014-07-04 17:41 ` [Qemu-devel] [PATCH 16/46] Add migration-capability boolean for postcopy-ram Dr. David Alan Gilbert (git)
2014-07-07 19:41   ` Eric Blake
2014-07-07 20:23     ` Dr. David Alan Gilbert
2014-07-10 16:17       ` Paolo Bonzini
2014-07-10 19:02         ` Dr. David Alan Gilbert
2014-07-04 17:41 ` [Qemu-devel] [PATCH 17/46] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages Dr. David Alan Gilbert (git)
2014-07-04 17:41 ` [Qemu-devel] [PATCH 18/46] QEMU_VM_CMD_PACKAGED: Send a packaged chunk of migration stream Dr. David Alan Gilbert (git)
2014-07-04 17:41 ` [Qemu-devel] [PATCH 19/46] migrate_init: Call from savevm Dr. David Alan Gilbert (git)
2014-07-04 17:41 ` [Qemu-devel] [PATCH 20/46] Allow savevm handlers to state whether they could go into postcopy Dr. David Alan Gilbert (git)
2014-07-04 17:41 ` [Qemu-devel] [PATCH 21/46] postcopy: OS support test Dr. David Alan Gilbert (git)
2014-07-04 17:41 ` [Qemu-devel] [PATCH 22/46] Migration parameters: Add qmp/hmp commands for setting/viewing Dr. David Alan Gilbert (git)
2014-07-07 19:50   ` Eric Blake
2014-07-04 17:41 ` [Qemu-devel] [PATCH 23/46] MIG_STATE_POSTCOPY_ACTIVE: Add new migration state Dr. David Alan Gilbert (git)
2014-07-04 17:41 ` [Qemu-devel] [PATCH 24/46] qemu_savevm_state_complete: Postcopy changes Dr. David Alan Gilbert (git)
2014-07-04 17:41 ` [Qemu-devel] [PATCH 25/46] Postcopy: Maintain sentmap during postcopy pre phase Dr. David Alan Gilbert (git)
2014-07-04 17:41 ` [Qemu-devel] [PATCH 26/46] Postcopy page-map-incoming (PMI) structure Dr. David Alan Gilbert (git)
2014-07-04 17:41 ` [Qemu-devel] [PATCH 27/46] postcopy: Add incoming_init/cleanup functions Dr. David Alan Gilbert (git)
2014-07-04 17:41 ` [Qemu-devel] [PATCH 28/46] postcopy: Incoming initialisation Dr. David Alan Gilbert (git)
2014-07-04 17:41 ` [Qemu-devel] [PATCH 29/46] postcopy: ram_enable_notify to switch on userfault Dr. David Alan Gilbert (git)
2014-07-04 17:41 ` [Qemu-devel] [PATCH 30/46] Postcopy: postcopy_start Dr. David Alan Gilbert (git)
2014-07-04 17:41 ` [Qemu-devel] [PATCH 31/46] Postcopy: Rework migration thread for postcopy mode Dr. David Alan Gilbert (git)
2014-07-05 10:19   ` Paolo Bonzini
2014-08-28 11:04     ` Dr. David Alan Gilbert
2014-08-28 11:23       ` Paolo Bonzini
2014-07-04 17:41 ` [Qemu-devel] [PATCH 32/46] mig fd_connect: open return path Dr. David Alan Gilbert (git)
2014-07-04 17:41 ` [Qemu-devel] [PATCH 33/46] Postcopy: Create a fault handler thread before marking the ram as userfault Dr. David Alan Gilbert (git)
2014-07-04 17:41 ` [Qemu-devel] [PATCH 34/46] Page request: Add MIG_RPCOMM_REQPAGES reverse command Dr. David Alan Gilbert (git)
2014-07-04 17:41 ` [Qemu-devel] [PATCH 35/46] Page request: Process incoming page request Dr. David Alan Gilbert (git)
2014-07-04 17:41 ` [Qemu-devel] [PATCH 36/46] Page request: Consume pages off the post-copy queue Dr. David Alan Gilbert (git)
2014-07-04 17:41 ` [Qemu-devel] [PATCH 37/46] Add assertion to check migration_dirty_pages doesn't go -ve; have seen it happen once but not sure why Dr. David Alan Gilbert (git)
2014-07-11 15:20   ` Eric Blake
2014-07-11 15:41     ` Dr. David Alan Gilbert
2014-07-04 17:41 ` [Qemu-devel] [PATCH 38/46] postcopy_ram.c: place_page and helpers Dr. David Alan Gilbert (git)
2014-07-04 17:41 ` [Qemu-devel] [PATCH 39/46] Postcopy: Use helpers to map pages during migration Dr. David Alan Gilbert (git)
2014-07-04 17:41 ` [Qemu-devel] [PATCH 40/46] qemu_ram_block_from_host Dr. David Alan Gilbert (git)
2014-07-04 17:41 ` [Qemu-devel] [PATCH 41/46] Handle userfault requests (although userfaultfd not done yet) Dr. David Alan Gilbert (git)
2014-07-04 17:41 ` [Qemu-devel] [PATCH 42/46] Start up a postcopy/listener thread ready for incoming page data Dr. David Alan Gilbert (git)
2014-07-04 17:41 ` [Qemu-devel] [PATCH 43/46] postcopy: Wire up loadvm_postcopy_ram_handle_{run, end} commands Dr. David Alan Gilbert (git)
2014-07-04 17:41 ` [Qemu-devel] [PATCH 44/46] postcopy: Use userfaultfd Dr. David Alan Gilbert (git)
2014-07-04 17:41 ` [Qemu-devel] [PATCH 45/46] End of migration for postcopy Dr. David Alan Gilbert (git)
2014-07-04 17:41 ` [Qemu-devel] [PATCH 46/46] Start documenting how postcopy works Dr. David Alan Gilbert (git)
2014-07-05 10:28 ` [Qemu-devel] [PATCH 00/46] Postcopy implementation Paolo Bonzini
2014-07-07 14:02   ` Dr. David Alan Gilbert
2014-07-07 14:35     ` Paolo Bonzini
2014-07-07 14:58       ` Dr. David Alan Gilbert
2014-07-10 11:29       ` Dr. David Alan Gilbert
2014-07-10 12:48         ` Eric Blake
2014-07-10 13:37           ` Dr. David Alan Gilbert
2014-07-10 15:33             ` Andrea Arcangeli
2014-07-10 15:49               ` Dr. David Alan Gilbert
2014-07-11  4:05                 ` Sanidhya Kashyap
2014-08-11 15:31           ` Dr. David Alan Gilbert
