* [Qemu-devel] [PATCH v3 00/47] Postcopy implementation
@ 2014-08-28 15:03 Dr. David Alan Gilbert (git)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 01/47] QEMUSizedBuffer/QEMUFile Dr. David Alan Gilbert (git)
                   ` (46 more replies)
  0 siblings, 47 replies; 52+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Hi,
  This is the 3rd cut of my postcopy implementation.  

The two largest changes since V2 are:
    * The source-side return path handler is now a thread, which
      allows the source-side fd to be blocking (I'm using shutdown(2)
      to make that thread joinable at the end).  I've not changed the
      destination side yet; I'm still concerned that we could end
      up blocking the HMP/QMP if we're not careful.

    * The dirty bitmaps are no longer sync'ed after we switch into postcopy.
      We no longer need to do it, because in principle nothing should change,
      and if it does change, the dirtying would cause the page to be resent
      and the destination would reject it (as it should, since the destination
      could have been changing the page itself).  In practice I'm seeing
      occasional cases where dirtying was happening; I suspect this might be
      related to the networking code still being live (see the current
      patches on the list to stop that), but I haven't dug too deep on this.
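The first change above relies on a common POSIX pattern: a thread blocked in read(2) on a blocking socket can be woken by calling shutdown(2) on that socket, after which read() returns 0 and the thread can exit and be joined. Below is a minimal standalone sketch of that pattern, not the code from this series; it assumes a socketpair(2) stands in for the migration socket, and rp_thread() and run_return_path_demo() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for the source-side return-path thread: it
 * blocks in read() on a *blocking* fd and handles whatever arrives. */
static void *rp_thread(void *opaque)
{
    int fd = *(int *)opaque;
    char buf[64];
    ssize_t n;

    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        /* a real handler would parse return-path messages here */
    }
    /* 0 = orderly end (e.g. after shutdown), -1 = error */
    return (void *)(intptr_t)n;
}

/* Returns the thread's final read() result: 0 if shutdown(2)
 * unblocked it cleanly, -1 on any error. */
static long run_return_path_demo(void)
{
    int sv[2];
    pthread_t tid;
    void *ret;
    struct timespec delay = { 0, 50 * 1000 * 1000 }; /* 50 ms */

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    if (pthread_create(&tid, NULL, rp_thread, &sv[0]) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* Let the thread block in read(); nothing is ever sent, so without
     * the shutdown below pthread_join() would wait forever. */
    nanosleep(&delay, NULL);

    /* Wake the blocked reader: its read() now returns 0 */
    shutdown(sv[0], SHUT_RDWR);
    pthread_join(tid, &ret);

    close(sv[0]);
    close(sv[1]);
    return (long)(intptr_t)ret;
}
```

The shutdown() call is what makes the final pthread_join() safe; on Linux, simply calling close() on the fd from another thread does not reliably wake a thread already blocked in read() on it.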

There are also a bunch of other fixes:
  Docs update (Paolo's comments, and Eric's)
  postcopy test: fix error path (zhanghailiang's comment)
  More cleanup of exit cases; far fewer TODOs
  More bisectable - but not there yet

I've also included the QEMUSizedBuffer/QEMUFile patches in this set for those
who want to apply the series from the mail, but I still intend to update them
separately.

This code is also now in a public GIT repo; see

https://github.com/orbitfp7/qemu/tree/wp3-postcopy

This version is tagged wp3-postcopy-v3 on the wp3-postcopy branch.

Postcopy requires the kernel modifications from Andrea posted here:
  http://lists.gnu.org/archive/html/qemu-devel/2014-07/msg00525.html

Current TODO:
   1) It's not bisectable yet
   2) There are no testsuite additions (although I have a virt-test modification
      I've been using).
   3) Not all the code is there for systems with hostpagesize!=qemupagesize
   4) xbzrle needs disabling once in postcopy
   5) RDMA needs some rework
   6) The latency measurements are now fairly consistent, with no very large
      spikes, but they're a bit higher than expected; I need to look at
      rate limiting just the background scan.
   7) Conversion of destination side return-path to blocking fd needs investigation
      (as per discussion with Paolo)
   8) Andrea has suggestions on ways to avoid some of the huge-page splitting
      that occurs during the discard phase after precopy.
   9) I'd like to format the data on the return path in a more structured way
      (i.e. maybe using stuff from my BER world).
  10) The ACPI fix in 2.1 that allowed migrating RAMBlocks to be larger than
      the source feels like it needs looking at for postcopy.
  11) I've got an occasional (1/100 ~ 1/500ish) failure on migration of idle VMs

Dave



Dr. David Alan Gilbert (47):
  QEMUSizedBuffer/QEMUFile
  Tests: QEMUSizedBuffer/QEMUBuffer
  Start documenting how postcopy works.
  qemu_ram_foreach_block: pass up error value, and down the ramblock
    name
  improve DPRINTF macros, add to savevm
  Add qemu_get_counted_string to read a string prefixed by a count byte
  Create MigrationIncomingState
  socket shutdown
  Return path: Open a return path on QEMUFile for sockets
  Return path: socket_writev_buffer: Block even on non-blocking fd's
  Migration commands
  Return path: Control commands
  Return path: Send responses from destination to source
  Return path: Source handling of return path
  qemu_loadvm errors and debug
  ram_debug_dump_bitmap: Dump a migration bitmap as text
  Rework loadvm path for subloops
  Add migration-capability boolean for postcopy-ram.
  Add wrappers and handlers for sending/receiving the postcopy-ram
    migration messages.
  QEMU_VM_CMD_PACKAGED: Send a packaged chunk of migration stream
  migrate_init: Call from savevm
  Allow savevm handlers to state whether they could go into postcopy
  postcopy: OS support test
  migrate_start_postcopy: Command to trigger transition to postcopy
  MIG_STATE_POSTCOPY_ACTIVE: Add new migration state
  qemu_savevm_state_complete: Postcopy changes
  Postcopy: Maintain sentmap during postcopy pre phase
  Postcopy page-map-incoming (PMI) structure
  postcopy: Add incoming_init/cleanup functions
  postcopy: Incoming initialisation
  postcopy: ram_enable_notify to switch on userfault
  Postcopy: postcopy_start
  Postcopy: Rework migration thread for postcopy mode
  mig fd_connect: open return path
  Postcopy: Create a fault handler thread before marking the ram as
    userfault
  Page request:  Add MIG_RPCOMM_REQPAGES reverse command
  Page request: Process incoming page request
  Page request: Consume pages off the post-copy queue
  Add assertion to check migration_dirty_pages
  postcopy_ram.c: place_page and helpers
  Postcopy: Use helpers to map pages during migration
  qemu_ram_block_from_host
  Don't sync dirty bitmaps in postcopy
  Postcopy; Handle userfault requests
  Start up a postcopy/listener thread ready for incoming page data
  postcopy: Wire up loadvm_postcopy_ram_handle_{run,end} commands
  End of migration for postcopy

 Makefile.objs                    |   2 +-
 arch_init.c                      | 499 +++++++++++++++++++--
 docs/migration.txt               | 188 ++++++++
 exec.c                           |  66 ++-
 hmp-commands.hx                  |  15 +
 hmp.c                            |   7 +
 hmp.h                            |   1 +
 include/exec/cpu-common.h        |   8 +-
 include/migration/migration.h    | 127 ++++++
 include/migration/postcopy-ram.h |  89 ++++
 include/migration/qemu-file.h    |  47 ++
 include/migration/vmstate.h      |   2 +-
 include/qemu/sockets.h           |   1 +
 include/qemu/typedefs.h          |   8 +-
 include/sysemu/sysemu.h          |  41 +-
 migration-rdma.c                 |   4 +-
 migration.c                      | 680 ++++++++++++++++++++++++++--
 postcopy-ram.c                   | 924 +++++++++++++++++++++++++++++++++++++++
 qapi-schema.json                 |  14 +-
 qemu-file.c                      | 552 ++++++++++++++++++++++-
 qmp-commands.hx                  |  19 +
 savevm.c                         | 848 ++++++++++++++++++++++++++++++++---
 tests/Makefile                   |   2 +-
 tests/test-vmstate.c             |  73 ++--
 util/qemu-sockets.c              |  28 ++
 25 files changed, 4069 insertions(+), 176 deletions(-)
 create mode 100644 include/migration/postcopy-ram.h
 create mode 100644 postcopy-ram.c

-- 
1.9.3


* [Qemu-devel] [PATCH v3 01/47] QEMUSizedBuffer/QEMUFile
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

** Intended to merge separately; see the separate qemu-devel thread for
   latest version
**

This is based on Stefan Berger's patch that creates a QEMUFile that goes
to a memory buffer; from:

http://lists.gnu.org/archive/html/qemu-devel/2013-03/msg05036.html

Using the QEMUFile interface, this patch adds support functions for
operating on in-memory sized buffers that can be written to or read from.
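To illustrate how a byte offset maps onto the chunked (scatter-gather) layout described here, a simplified standalone model follows. It is not the patch's code: ChunkBuf, cb_init() and the other helpers are hypothetical, the buffer never grows, and all chunks have the same size. Only the sizing rule is copied from qsb_create() in the patch (1 KiB chunks, or 10 KiB chunks when the initial length exceeds 10 KiB, and always at least one chunk), so 3000 bytes gives three 1 KiB chunks and 20000 bytes gives two 10 KiB chunks.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Same sizing constants as the patch: 1 KiB chunks, 10 KiB for big buffers */
#define CHUNK_SIZE      (1 << 10)
#define MAX_CHUNK_SIZE  (10 * CHUNK_SIZE)

/* Hypothetical, simplified stand-in for QEMUSizedBuffer: equal-size
 * chunks and no growth, just enough to show offset-to-chunk mapping. */
typedef struct {
    unsigned char **chunk;
    size_t n_chunks;
    size_t chunk_size;
    size_t used;            /* bytes written so far (high-water mark) */
} ChunkBuf;

/* Chunk sizing as in qsb_create(): choose a chunk size, then round up */
static size_t chunks_needed(size_t len, size_t *chunk_size)
{
    *chunk_size = len > MAX_CHUNK_SIZE ? MAX_CHUNK_SIZE : CHUNK_SIZE;
    if (len == 0) {
        len = CHUNK_SIZE;   /* always allocate at least one chunk */
    }
    return (len + *chunk_size - 1) / *chunk_size;
}

static void cb_free(ChunkBuf *b)
{
    for (size_t i = 0; i < b->n_chunks; i++) {
        free(b->chunk[i]);
    }
    free(b->chunk);
}

/* Returns 0 on success; on failure the caller should still cb_free() */
static int cb_init(ChunkBuf *b, size_t len)
{
    b->used = 0;
    b->n_chunks = chunks_needed(len, &b->chunk_size);
    b->chunk = calloc(b->n_chunks, sizeof(*b->chunk));
    if (!b->chunk) {
        b->n_chunks = 0;
        return -1;
    }
    for (size_t i = 0; i < b->n_chunks; i++) {
        b->chunk[i] = calloc(1, b->chunk_size);
        if (!b->chunk[i]) {
            return -1;
        }
    }
    return 0;
}

/* Copy count bytes starting at pos, crossing chunk boundaries.
 * Exactly one of src (write into buffer) or dst (read out) is set. */
static void cb_walk(ChunkBuf *b, size_t pos, size_t count,
                    const unsigned char *src, unsigned char *dst)
{
    size_t done = 0;

    while (done < count) {
        size_t idx = (pos + done) / b->chunk_size;  /* which chunk */
        size_t off = (pos + done) % b->chunk_size;  /* offset within it */
        size_t n = b->chunk_size - off;             /* room left in chunk */

        if (n > count - done) {
            n = count - done;
        }
        if (src) {
            memcpy(b->chunk[idx] + off, src + done, n);
        } else {
            memcpy(dst + done, b->chunk[idx] + off, n);
        }
        done += n;
    }
}

/* Unlike qsb_write_at(), this sketch clips at capacity instead of growing */
static size_t cb_write(ChunkBuf *b, size_t pos, const unsigned char *src,
                       size_t count)
{
    size_t cap = b->n_chunks * b->chunk_size;

    if (pos > cap) {
        return 0;
    }
    if (count > cap - pos) {
        count = cap - pos;
    }
    cb_walk(b, pos, count, src, NULL);
    if (pos + count > b->used) {
        b->used = pos + count;
    }
    return count;
}

/* Reads are limited to the used part of the buffer */
static size_t cb_read(ChunkBuf *b, size_t pos, unsigned char *dst,
                      size_t count)
{
    if (pos >= b->used) {
        return 0;
    }
    if (count > b->used - pos) {
        count = b->used - pos;
    }
    cb_walk(b, pos, count, NULL, dst);
    return count;
}
```

Because every chunk in this model has the same size, the chunk index is a simple division. The patch's qsb_get_iovec() instead walks the iovec array linearly, which also copes with a buffer whose initial 10 KiB chunks have later been extended by qsb_grow() with 1 KiB chunks.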

Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Signed-off-by: Joel Schopp <jschopp@linux.vnet.ibm.com>

For minor tweaks/rebase I've done to it:
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/qemu-file.h |  28 +++
 include/qemu/typedefs.h       |   1 +
 qemu-file.c                   | 410 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 439 insertions(+)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index c90f529..80af3ff 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -25,6 +25,8 @@
 #define QEMU_FILE_H 1
 #include "exec/cpu-common.h"
 
+#include <stdint.h>
+
 /* This function writes a chunk of data to a file at the given position.
  * The pos argument can be ignored if the file is only being used for
  * streaming.  The handler should try to write all of the data it can.
@@ -94,11 +96,21 @@ typedef struct QEMUFileOps {
     QEMURamSaveFunc *save_page;
 } QEMUFileOps;
 
+struct QEMUSizedBuffer {
+    struct iovec *iov;
+    size_t n_iov;
+    size_t size; /* total allocated size in all iov's */
+    size_t used; /* number of used bytes */
+};
+
+typedef struct QEMUSizedBuffer QEMUSizedBuffer;
+
 QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops);
 QEMUFile *qemu_fopen(const char *filename, const char *mode);
 QEMUFile *qemu_fdopen(int fd, const char *mode);
 QEMUFile *qemu_fopen_socket(int fd, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
+QEMUFile *qemu_bufopen(const char *mode, QEMUSizedBuffer *input);
 int qemu_get_fd(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
 int64_t qemu_ftell(QEMUFile *f);
@@ -111,6 +123,22 @@ void qemu_put_byte(QEMUFile *f, int v);
 void qemu_put_buffer_async(QEMUFile *f, const uint8_t *buf, int size);
 bool qemu_file_mode_is_not_valid(const char *mode);
 
+QEMUSizedBuffer *qsb_create(const uint8_t *buffer, size_t len);
+QEMUSizedBuffer *qsb_clone(const QEMUSizedBuffer *);
+void qsb_free(QEMUSizedBuffer *);
+size_t qsb_set_length(QEMUSizedBuffer *qsb, size_t length);
+size_t qsb_get_length(const QEMUSizedBuffer *qsb);
+ssize_t qsb_get_buffer(const QEMUSizedBuffer *, off_t start, size_t count,
+                       uint8_t **buf);
+ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t *buf,
+                     off_t pos, size_t count);
+
+
+/*
+ * For use on files opened with qemu_bufopen
+ */
+const QEMUSizedBuffer *qemu_buf_get(QEMUFile *f);
+
 static inline void qemu_put_ubyte(QEMUFile *f, unsigned int v)
 {
     qemu_put_byte(f, (int)v);
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 5f20b0e..db1153a 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -60,6 +60,7 @@ typedef struct PCIEAERLog PCIEAERLog;
 typedef struct PCIEAERErr PCIEAERErr;
 typedef struct PCIEPort PCIEPort;
 typedef struct PCIESlot PCIESlot;
+typedef struct QEMUSizedBuffer QEMUSizedBuffer;
 typedef struct MSIMessage MSIMessage;
 typedef struct SerialState SerialState;
 typedef struct PCMCIACardState PCMCIACardState;
diff --git a/qemu-file.c b/qemu-file.c
index a8e3912..d64bee2 100644
--- a/qemu-file.c
+++ b/qemu-file.c
@@ -878,3 +878,413 @@ uint64_t qemu_get_be64(QEMUFile *f)
     v |= qemu_get_be32(f);
     return v;
 }
+
+#define QSB_CHUNK_SIZE      (1 << 10)
+#define QSB_MAX_CHUNK_SIZE  (10 * QSB_CHUNK_SIZE)
+
+/**
+ * Create a QEMUSizedBuffer
+ * This type of buffer uses scatter-gather lists internally and
+ * can grow to any size. Any data array in the scatter-gather list
+ * can hold a different number of bytes.
+ *
+ * @buffer: Optional buffer to copy into the QSB
+ * @len: size of initial buffer; if @buffer is given, buffer must
+ *       hold at least len bytes
+ *
+ * Returns a pointer to a QEMUSizedBuffer
+ */
+QEMUSizedBuffer *qsb_create(const uint8_t *buffer, size_t len)
+{
+    QEMUSizedBuffer *qsb;
+    size_t alloc_len, num_chunks, i, to_copy;
+    size_t chunk_size = (len > QSB_MAX_CHUNK_SIZE)
+                        ? QSB_MAX_CHUNK_SIZE
+                        : QSB_CHUNK_SIZE;
+
+    if (len == 0) {
+        /* we want to allocate at least one chunk */
+        len = QSB_CHUNK_SIZE;
+    }
+
+    num_chunks = DIV_ROUND_UP(len, chunk_size);
+    alloc_len = num_chunks * chunk_size;
+
+    qsb = g_new0(QEMUSizedBuffer, 1);
+    qsb->iov = g_new0(struct iovec, num_chunks);
+    qsb->n_iov = num_chunks;
+
+    for (i = 0; i < num_chunks; i++) {
+        qsb->iov[i].iov_base = g_malloc0(chunk_size);
+        qsb->iov[i].iov_len = chunk_size;
+        if (buffer) {
+            to_copy = (len - qsb->used) > chunk_size
+                      ? chunk_size : (len - qsb->used);
+            memcpy(qsb->iov[i].iov_base, &buffer[qsb->used], to_copy);
+            qsb->used += to_copy;
+        }
+    }
+
+    qsb->size = alloc_len;
+
+    return qsb;
+}
+
+/**
+ * Free the QEMUSizedBuffer
+ *
+ * @qsb: The QEMUSizedBuffer to free
+ */
+void qsb_free(QEMUSizedBuffer *qsb)
+{
+    size_t i;
+
+    if (!qsb) {
+        return;
+    }
+
+    for (i = 0; i < qsb->n_iov; i++) {
+        g_free(qsb->iov[i].iov_base);
+    }
+    g_free(qsb->iov);
+    g_free(qsb);
+}
+
+/**
+ * Get the number of used bytes in the QEMUSizedBuffer
+ *
+ * @qsb: A QEMUSizedBuffer
+ *
+ * Returns the number of bytes currently used in this buffer
+ */
+size_t qsb_get_length(const QEMUSizedBuffer *qsb)
+{
+    return qsb->used;
+}
+
+/**
+ * Set the length of the buffer; the primary usage of this
+ * function is to truncate the number of used bytes in the buffer.
+ * The size will not be extended beyond the current number of
+ * allocated bytes in the QEMUSizedBuffer.
+ *
+ * @qsb: A QEMUSizedBuffer
+ * @new_len : The new length of bytes in the buffer
+ *
+ * Returns the number of bytes the buffer was truncated or extended
+ * to.
+ */
+size_t qsb_set_length(QEMUSizedBuffer *qsb, size_t new_len)
+{
+    if (new_len <= qsb->size) {
+        qsb->used = new_len;
+    } else {
+        qsb->used = qsb->size;
+    }
+    return qsb->used;
+}
+
+/**
+ * Get the iovec that holds the data for a given position @pos.
+ *
+ * @qsb: A QEMUSizedBuffer
+ * @pos: The index of a byte in the buffer
+ * @d_off: Pointer to an offset that this function sets to the
+ *         position within the returned iovec at which the byte
+ *         is to be found
+ *
+ * Returns the index of the iovec that holds the byte at the given
+ * index @pos in the byte stream; a negative number if the iovec
+ * for the given position @pos does not exist.
+ */
+static ssize_t qsb_get_iovec(const QEMUSizedBuffer *qsb,
+                             off_t pos, off_t *d_off)
+{
+    ssize_t i;
+    off_t curr = 0;
+
+    if (pos > qsb->used) {
+        return -1;
+    }
+
+    for (i = 0; i < qsb->n_iov; i++) {
+        if (curr + qsb->iov[i].iov_len > pos) {
+            *d_off = pos - curr;
+            return i;
+        }
+        curr += qsb->iov[i].iov_len;
+    }
+    return -1;
+}
+
+/*
+ * Convert the QEMUSizedBuffer into a flat buffer.
+ *
+ * Note: If at all possible, try to avoid this function since it
+ *       may unnecessarily copy memory around.
+ *
+ * @qsb: pointer to QEMUSizedBuffer
+ * @start : offset to start at
+ * @count: number of bytes to copy
+ * @buf: a pointer to an optional buffer to write into; the pointer may
+ *       point to NULL in which case the buffer will be allocated;
+ *       if buffer is provided, it must be large enough to hold @count bytes
+ *
+ * Returns the number of bytes copied into the output buffer
+ */
+ssize_t qsb_get_buffer(const QEMUSizedBuffer *qsb, off_t start,
+                       size_t count, uint8_t **buf)
+{
+    uint8_t *buffer;
+    const struct iovec *iov;
+    size_t to_copy, all_copy;
+    ssize_t index;
+    off_t s_off;
+    off_t d_off = 0;
+    char *s;
+
+    if (start > qsb->used) {
+        return 0;
+    }
+
+    all_copy = qsb->used - start;
+    if (all_copy > count) {
+        all_copy = count;
+    } else {
+        count = all_copy;
+    }
+
+    if (*buf == NULL) {
+        *buf = g_malloc(all_copy);
+    }
+    buffer = *buf;
+
+    index = qsb_get_iovec(qsb, start, &s_off);
+    if (index < 0) {
+        return 0;
+    }
+
+    while (all_copy > 0) {
+        iov = &qsb->iov[index];
+
+        s = iov->iov_base;
+
+        to_copy = iov->iov_len - s_off;
+        if (to_copy > all_copy) {
+            to_copy = all_copy;
+        }
+        memcpy(&buffer[d_off], &s[s_off], to_copy);
+
+        d_off += to_copy;
+        all_copy -= to_copy;
+
+        s_off = 0;
+        index++;
+    }
+
+    return count;
+}
+
+/**
+ * Grow the QEMUSizedBuffer to the given size and allocate
+ * memory for it.
+ *
+ * @qsb: A QEMUSizedBuffer
+ * @new_size: The new size of the buffer
+ *
+ * Returns an error code in case of memory allocation failure
+ * or the new size of the buffer otherwise. The returned size
+ * may be greater or equal to @new_size.
+ */
+static ssize_t qsb_grow(QEMUSizedBuffer *qsb, size_t new_size)
+{
+    size_t needed_chunks, i;
+    size_t chunk_size = QSB_CHUNK_SIZE;
+
+    if (qsb->size < new_size) {
+        needed_chunks = DIV_ROUND_UP(new_size - qsb->size,
+                                     chunk_size);
+
+        qsb->iov = g_realloc_n(qsb->iov, qsb->n_iov + needed_chunks,
+                               sizeof(struct iovec));
+        if (qsb->iov == NULL) {
+            return -ENOMEM;
+        }
+
+        for (i = qsb->n_iov; i < qsb->n_iov + needed_chunks; i++) {
+            qsb->iov[i].iov_base = g_malloc0(chunk_size);
+            qsb->iov[i].iov_len = chunk_size;
+        }
+
+        qsb->n_iov += needed_chunks;
+        qsb->size += (needed_chunks * chunk_size);
+    }
+
+    return qsb->size;
+}
+
+/**
+ * Write into the QEMUSizedBuffer at a given position and a given
+ * number of bytes. This function will automatically grow the
+ * QEMUSizedBuffer.
+ *
+ * @qsb: A QEMUSizedBuffer
+ * @source: A byte array to copy data from
+ * @pos: The position within the @qsb to write data to
+ * @count: The number of bytes to copy into the @qsb
+ *
+ * Returns an error code in case of memory allocation failure,
+ * @count otherwise.
+ */
+ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t *source,
+                     off_t pos, size_t count)
+{
+    ssize_t rc = qsb_grow(qsb, pos + count);
+    size_t to_copy;
+    size_t all_copy = count;
+    const struct iovec *iov;
+    ssize_t index;
+    char *dest;
+    off_t d_off, s_off = 0;
+
+    if (rc < 0) {
+        return rc;
+    }
+
+    if (pos + count > qsb->used) {
+        qsb->used = pos + count;
+    }
+
+    index = qsb_get_iovec(qsb, pos, &d_off);
+    if (index < 0) {
+        return 0;
+    }
+
+    while (all_copy > 0) {
+        iov = &qsb->iov[index];
+
+        dest = iov->iov_base;
+
+        to_copy = iov->iov_len - d_off;
+        if (to_copy > all_copy) {
+            to_copy = all_copy;
+        }
+
+        memcpy(&dest[d_off], &source[s_off], to_copy);
+
+        s_off += to_copy;
+        all_copy -= to_copy;
+
+        d_off = 0;
+        index++;
+    }
+
+    return count;
+}
+
+/**
+ * Create an exact copy of the given QEMUSizedBuffer.
+ *
+ * @qsb : A QEMUSizedBuffer
+ *
+ * Returns a clone of @qsb
+ */
+QEMUSizedBuffer *qsb_clone(const QEMUSizedBuffer *qsb)
+{
+    QEMUSizedBuffer *out = qsb_create(NULL, qsb_get_length(qsb));
+    size_t i;
+    off_t pos = 0;
+
+    for (i = 0; i < qsb->n_iov; i++) {
+        pos += qsb_write_at(out, qsb->iov[i].iov_base,
+                            pos, qsb->iov[i].iov_len);
+    }
+
+    return out;
+}
+
+typedef struct QEMUBuffer {
+    QEMUSizedBuffer *qsb;
+    QEMUFile *file;
+} QEMUBuffer;
+
+static int buf_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
+{
+    QEMUBuffer *s = opaque;
+    ssize_t len = qsb_get_length(s->qsb) - pos;
+
+    if (len <= 0) {
+        return 0;
+    }
+
+    if (len > size) {
+        len = size;
+    }
+    return qsb_get_buffer(s->qsb, pos, len, &buf);
+}
+
+static int buf_put_buffer(void *opaque, const uint8_t *buf,
+                          int64_t pos, int size)
+{
+    QEMUBuffer *s = opaque;
+
+    return qsb_write_at(s->qsb, buf, pos, size);
+}
+
+static int buf_close(void *opaque)
+{
+    QEMUBuffer *s = opaque;
+
+    qsb_free(s->qsb);
+
+    g_free(s);
+
+    return 0;
+}
+
+const QEMUSizedBuffer *qemu_buf_get(QEMUFile *f)
+{
+    QEMUBuffer *p;
+
+    qemu_fflush(f);
+
+    p = (QEMUBuffer *)f->opaque;
+
+    return p->qsb;
+}
+
+static const QEMUFileOps buf_read_ops = {
+    .get_buffer = buf_get_buffer,
+    .close =      buf_close
+};
+
+static const QEMUFileOps buf_write_ops = {
+    .put_buffer = buf_put_buffer,
+    .close =      buf_close
+};
+
+QEMUFile *qemu_bufopen(const char *mode, QEMUSizedBuffer *input)
+{
+    QEMUBuffer *s;
+
+    if (mode == NULL || (mode[0] != 'r' && mode[0] != 'w') || mode[1] != 0) {
+        fprintf(stderr, "qemu_bufopen: Argument validity check failed\n");
+        return NULL;
+    }
+
+    s = g_malloc0(sizeof(QEMUBuffer));
+    if (mode[0] == 'r') {
+        s->qsb = input;
+    }
+
+    if (s->qsb == NULL) {
+        s->qsb = qsb_create(NULL, 0);
+    }
+
+    if (mode[0] == 'r') {
+        s->file = qemu_fopen_ops(s, &buf_read_ops);
+    } else {
+        s->file = qemu_fopen_ops(s, &buf_write_ops);
+    }
+    return s->file;
+}
-- 
1.9.3


* [Qemu-devel] [PATCH v3 02/47] Tests: QEMUSizedBuffer/QEMUBuffer
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

** Intended to merge separately; see the separate qemu-devel thread for
   latest version
**

Modify some of tests/test-vmstate.c to use the in-memory file based
on QEMUSizedBuffer to provide basic testing of QEMUSizedBuffer and
the associated memory-backed QEMUFile type.

Only some of the tests are changed, so that the fd-backed QEMUFile is
still tested.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 tests/Makefile       |  2 +-
 tests/test-vmstate.c | 73 ++++++++++++++++++++++++++--------------------------
 2 files changed, 38 insertions(+), 37 deletions(-)

diff --git a/tests/Makefile b/tests/Makefile
index 837e9c8..e574a91 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -253,7 +253,7 @@ tests/test-qdev-global-props$(EXESUF): tests/test-qdev-global-props.o \
 	libqemuutil.a libqemustub.a
 tests/test-vmstate$(EXESUF): tests/test-vmstate.o \
 	vmstate.o qemu-file.o \
-	libqemuutil.a
+	libqemuutil.a libqemustub.a
 
 tests/test-qapi-types.c tests/test-qapi-types.h :\
 $(SRC_PATH)/tests/qapi-schema/qapi-schema-test.json $(SRC_PATH)/scripts/qapi-types.py
diff --git a/tests/test-vmstate.c b/tests/test-vmstate.c
index d72c64c..716d034 100644
--- a/tests/test-vmstate.c
+++ b/tests/test-vmstate.c
@@ -43,6 +43,12 @@ void yield_until_fd_readable(int fd)
     select(fd + 1, &fds, NULL, NULL, NULL);
 }
 
+/*
+ * Some tests use 'open_test_file' to work on a real fd, some use
+ * an in memory file (QEMUSizedBuffer+qemu_bufopen); we could pick one
+ * but this way we test both.
+ */
+
 /* Duplicate temp_fd and seek to the beginning of the file */
 static QEMUFile *open_test_file(bool write)
 {
@@ -54,6 +60,29 @@ static QEMUFile *open_test_file(bool write)
     return qemu_fdopen(fd, write ? "wb" : "rb");
 }
 
+/* Open a read-only qemu-file from an existing memory block */
+static QEMUFile *open_mem_file_read(const void *data, size_t len)
+{
+    /* The qsb gets freed by qemu_fclose */
+    QEMUSizedBuffer *qsb = qsb_create(data, len);
+
+    return qemu_bufopen("r", qsb);
+}
+
+/*
+ * Check that the contents of the memory-buffered file f match
+ * the given size/data.
+ */
+static void check_mem_file(QEMUFile *f, void *data, size_t size)
+{
+    uint8_t *result = NULL; /* qsb_get_buffer allocs a buffer */
+    const QEMUSizedBuffer *qsb = qemu_buf_get(f);
+    g_assert_cmpint(qsb_get_length(qsb), ==, size);
+    g_assert_cmpint(qsb_get_buffer(qsb, 0, size, &result), ==, size);
+    g_assert_cmpint(memcmp(result, data, size), ==, 0);
+    g_free(result);
+}
+
 #define SUCCESS(val) \
     g_assert_cmpint((val), ==, 0)
 
@@ -371,14 +400,12 @@ static const VMStateDescription vmstate_skipping = {
 
 static void test_save_noskip(void)
 {
-    QEMUFile *fsave = open_test_file(true);
+    QEMUFile *fsave = qemu_bufopen("w", NULL);
     TestStruct obj = { .a = 1, .b = 2, .c = 3, .d = 4, .e = 5, .f = 6,
                        .skip_c_e = false };
     vmstate_save_state(fsave, &vmstate_skipping, &obj);
     g_assert(!qemu_file_get_error(fsave));
-    qemu_fclose(fsave);
 
-    QEMUFile *loading = open_test_file(false);
     uint8_t expected[] = {
         0, 0, 0, 1,             /* a */
         0, 0, 0, 2,             /* b */
@@ -387,52 +414,31 @@ static void test_save_noskip(void)
         0, 0, 0, 5,             /* e */
         0, 0, 0, 0, 0, 0, 0, 6, /* f */
     };
-    uint8_t result[sizeof(expected)];
-    g_assert_cmpint(qemu_get_buffer(loading, result, sizeof(result)), ==,
-                    sizeof(result));
-    g_assert(!qemu_file_get_error(loading));
-    g_assert_cmpint(memcmp(result, expected, sizeof(result)), ==, 0);
-
-    /* Must reach EOF */
-    qemu_get_byte(loading);
-    g_assert_cmpint(qemu_file_get_error(loading), ==, -EIO);
-
-    qemu_fclose(loading);
+    check_mem_file(fsave, expected, sizeof(expected));
+    qemu_fclose(fsave);
 }
 
 static void test_save_skip(void)
 {
-    QEMUFile *fsave = open_test_file(true);
+    QEMUFile *fsave = qemu_bufopen("w", NULL);
     TestStruct obj = { .a = 1, .b = 2, .c = 3, .d = 4, .e = 5, .f = 6,
                        .skip_c_e = true };
     vmstate_save_state(fsave, &vmstate_skipping, &obj);
     g_assert(!qemu_file_get_error(fsave));
-    qemu_fclose(fsave);
 
-    QEMUFile *loading = open_test_file(false);
     uint8_t expected[] = {
         0, 0, 0, 1,             /* a */
         0, 0, 0, 2,             /* b */
         0, 0, 0, 0, 0, 0, 0, 4, /* d */
         0, 0, 0, 0, 0, 0, 0, 6, /* f */
     };
-    uint8_t result[sizeof(expected)];
-    g_assert_cmpint(qemu_get_buffer(loading, result, sizeof(result)), ==,
-                    sizeof(result));
-    g_assert(!qemu_file_get_error(loading));
-    g_assert_cmpint(memcmp(result, expected, sizeof(result)), ==, 0);
-
-
-    /* Must reach EOF */
-    qemu_get_byte(loading);
-    g_assert_cmpint(qemu_file_get_error(loading), ==, -EIO);
+    check_mem_file(fsave, expected, sizeof(expected));
 
-    qemu_fclose(loading);
+    qemu_fclose(fsave);
 }
 
 static void test_load_noskip(void)
 {
-    QEMUFile *fsave = open_test_file(true);
     uint8_t buf[] = {
         0, 0, 0, 10,             /* a */
         0, 0, 0, 20,             /* b */
@@ -442,10 +448,8 @@ static void test_load_noskip(void)
         0, 0, 0, 0, 0, 0, 0, 60, /* f */
         QEMU_VM_EOF, /* just to ensure we won't get EOF reported prematurely */
     };
-    qemu_put_buffer(fsave, buf, sizeof(buf));
-    qemu_fclose(fsave);
 
-    QEMUFile *loading = open_test_file(false);
+    QEMUFile *loading = open_mem_file_read(buf, sizeof(buf));
     TestStruct obj = { .skip_c_e = false };
     vmstate_load_state(loading, &vmstate_skipping, &obj, 2);
     g_assert(!qemu_file_get_error(loading));
@@ -460,7 +464,6 @@ static void test_load_noskip(void)
 
 static void test_load_skip(void)
 {
-    QEMUFile *fsave = open_test_file(true);
     uint8_t buf[] = {
         0, 0, 0, 10,             /* a */
         0, 0, 0, 20,             /* b */
@@ -468,10 +471,8 @@ static void test_load_skip(void)
         0, 0, 0, 0, 0, 0, 0, 60, /* f */
         QEMU_VM_EOF, /* just to ensure we won't get EOF reported prematurely */
     };
-    qemu_put_buffer(fsave, buf, sizeof(buf));
-    qemu_fclose(fsave);
 
-    QEMUFile *loading = open_test_file(false);
+    QEMUFile *loading = open_mem_file_read(buf, sizeof(buf));
     TestStruct obj = { .skip_c_e = true, .c = 300, .e = 500 };
     vmstate_load_state(loading, &vmstate_skipping, &obj, 2);
     g_assert(!qemu_file_get_error(loading));
-- 
1.9.3


* [Qemu-devel] [PATCH v3 03/47] Start documenting how postcopy works.
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 docs/migration.txt | 188 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 188 insertions(+)

diff --git a/docs/migration.txt b/docs/migration.txt
index 0492a45..7f0fdc4 100644
--- a/docs/migration.txt
+++ b/docs/migration.txt
@@ -294,3 +294,191 @@ save/send this state when we are in the middle of a pio operation
 (that is what ide_drive_pio_state_needed() checks).  If DRQ_STAT is
 not enabled, the values on that fields are garbage and don't need to
 be sent.
+
+= Return path =
+
+In most migration scenarios there is only a single data path that runs
+from the source VM to the destination, typically along a single fd (although
+possibly with another fd or similar for some fast way of throwing pages across).
+
However, some uses need two-way communication; in particular, the postcopy
destination needs to be able to request pages on demand from the source.
+
+For these scenarios there is a 'return path' from the destination to the source;
+qemu_file_get_return_path(QEMUFile* fwdpath) gives the QEMUFile* for the return
+path.
+
+  Source side
+     Forward path - written by migration thread
+     Return path  - opened by main thread, read by return-path thread
+
+  Destination side
+     Forward path - read by main thread
+     Return path  - opened by main thread, written by main thread AND postcopy
+                    thread (protected by rp_mutex)
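On the destination side, both the main thread and the postcopy thread write to the same return path, so each message must be written as a unit with respect to the other writer. A minimal sketch of that locking follows, assuming a pipe stands in for the return-path fd; ReturnPath, rp_send() and the 2-byte type / 2-byte length framing are hypothetical and are not the message format used by this series.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <unistd.h>

#define MSGS_PER_THREAD 100

/* Hypothetical destination-side return path shared by two writers */
typedef struct {
    int fd;
    pthread_mutex_t rp_mutex;
} ReturnPath;

/* Sketch-only framing: 2-byte type, 2-byte length, payload (host order).
 * Header and payload are separate write() calls, so without rp_mutex
 * another thread's message could land between them. */
static int rp_send(ReturnPath *rp, uint16_t type, const void *data,
                   uint16_t len)
{
    uint16_t hdr[2] = { type, len };
    int ret = 0;

    pthread_mutex_lock(&rp->rp_mutex);
    if (write(rp->fd, hdr, sizeof(hdr)) != (ssize_t)sizeof(hdr) ||
        write(rp->fd, data, len) != (ssize_t)len) {
        ret = -1;
    }
    pthread_mutex_unlock(&rp->rp_mutex);
    return ret;
}

typedef struct {
    ReturnPath *rp;
    uint16_t type;      /* 1 = "main thread", 2 = "postcopy thread" */
} Sender;

static void *sender(void *opaque)
{
    Sender *s = opaque;

    for (uint64_t i = 0; i < MSGS_PER_THREAD; i++) {
        rp_send(s->rp, s->type, &i, sizeof(i));
    }
    return NULL;
}

/* Runs two writers, then parses the stream; returns the number of
 * intact, in-order messages, or -1 if anything was interleaved. */
static int rp_demo(void)
{
    int pfd[2];
    ReturnPath rp;
    pthread_t t1, t2;
    Sender a, b;
    uint64_t next[3] = { 0, 0, 0 };  /* next expected counter per type */
    uint16_t hdr[2];
    uint64_t val;
    int good = 0;

    /* 2 * 100 * 12 bytes fits in a Linux pipe buffer, so no writer blocks */
    if (pipe(pfd) < 0) {
        return -1;
    }
    rp.fd = pfd[1];
    pthread_mutex_init(&rp.rp_mutex, NULL);
    a.rp = &rp;
    a.type = 1;
    b.rp = &rp;
    b.type = 2;
    pthread_create(&t1, NULL, sender, &a);
    pthread_create(&t2, NULL, sender, &b);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    close(pfd[1]);

    while (read(pfd[0], hdr, sizeof(hdr)) == (ssize_t)sizeof(hdr)) {
        if (hdr[0] < 1 || hdr[0] > 2 || hdr[1] != sizeof(val) ||
            read(pfd[0], &val, sizeof(val)) != (ssize_t)sizeof(val) ||
            val != next[hdr[0]]) {
            good = -1;
            break;
        }
        next[hdr[0]]++;
        good++;
    }
    close(pfd[0]);
    pthread_mutex_destroy(&rp.rp_mutex);
    return good;
}
```

Without the mutex the stream may still parse on a lucky run; holding the lock across both write() calls is what guarantees it.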
+
+= Postcopy =
'Postcopy' migration is a way to deal with migrations that refuse to converge.
Its plus side is that there is an upper bound on the amount of migration traffic
and the time it takes; the down side is that during the postcopy phase, a failure
of *either* side or of the network connection causes the guest to be lost.
+
+In postcopy the destination CPUs are started before all the memory has been
+transferred, and accesses to pages that are yet to be transferred cause
+a fault that's translated by QEMU into a request to the source QEMU.
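The translation step can be sketched as follows: find the RAM block containing the faulting host address, round the offset down to a page boundary, and build a request naming the block and offset. This is a hypothetical model, not QEMU's code; the Block and PageRequest types and fault_to_request() are illustrative, and the sketch assumes each block's host mapping starts on a page boundary and that page_size is a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified RAM block table entry */
typedef struct {
    const char *name;
    uintptr_t host;     /* start of the block's host mapping */
    size_t length;
} Block;

/* Hypothetical page request as the destination might send it back */
typedef struct {
    const char *block_name;
    uint64_t offset;    /* page-aligned offset within the block */
    uint64_t len;
} PageRequest;

/* Translate a faulting host address into a request for the whole page
 * containing it. Returns 0, or -1 if the address is in no block. */
static int fault_to_request(const Block *blocks, size_t n, uintptr_t addr,
                            uint64_t page_size, PageRequest *req)
{
    for (size_t i = 0; i < n; i++) {
        if (addr >= blocks[i].host &&
            addr - blocks[i].host < blocks[i].length) {
            uint64_t off = addr - blocks[i].host;

            req->block_name = blocks[i].name;
            req->offset = off & ~(page_size - 1);   /* round down */
            req->len = page_size;
            return 0;
        }
    }
    return -1;
}
```

The page size is a parameter rather than a constant because the host page size and the page size the migration code works in need not be the same.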
+
Postcopy can be combined with precopy (i.e. normal migration) so that if precopy
doesn't finish in a given time, the switch can be made to postcopy.
+
+=== Enabling postcopy ===
+
+To enable postcopy (prior to the start of migration):
+
+migrate_set_capability x-postcopy-ram on
+
+The migration will still start in precopy mode, however issuing:
+
+migrate_start_postcopy
+
+will now cause the transition from precopy to postcopy.
+It can be issued immediately after migration is started or any
+time later on.  Issuing it after the end of a migration is harmless.
+
+=== Postcopy device transfer ===
+
+Loading device data may cause the device emulation to access guest RAM,
+which may trigger faults that have to be resolved by the source.  The
+migration stream must therefore be able to deliver page data *during* the
+device load, so the device data has to be read from the stream completely
+before the device load begins, freeing the stream up.  This is achieved by
+'packaging' the device data into a blob that's read in one go.
+
+Source behaviour
+
+Until postcopy is entered the migration stream is identical to normal precopy,
+except for the addition of a 'postcopy advise' command at the beginning to
+let the destination know that postcopy might happen.  When postcopy starts
+the source sends the page discard data and then forms the 'package' containing:
+
+   Command: 'postcopy ram listen'
+   The device state
+      A series of sections, identical to the precopy stream's device state stream
+      containing everything except postcopiable devices (i.e. RAM)
+   Command: 'postcopy ram run'
+
+The 'package' is sent as the data part of a Command: 'CMD_PACKAGED', and the
+contents are formatted in the same way as the main migration stream.
+
+Destination behaviour
+
+Initially the destination looks the same as precopy, with a single thread
+reading the migration stream; the 'postcopy advise' and 'discard' commands
+are processed to change the way RAM is managed, but don't affect the stream
+processing.
+
+------------------------------------------------------------------------------
+                        1      2   3     4 5                      6   7
+main -----DISCARD-CMD_PACKAGED ( LISTEN  DEVICE     DEVICE DEVICE RUN )
+thread                             |       |
+                                   |     (page request)
+                                   |        \___
+                                   v            \
+listen thread:                     --- page -- page -- page -- page -- page --
+
+                                   a   b        c
+------------------------------------------------------------------------------
+
+On receipt of CMD_PACKAGED (1)
+   All the data associated with the package - the ( ... ) section in the
+diagram - is read into memory (into a QEMUSizedBuffer), and the main thread
+recurses into qemu_loadvm_state_main to process the contents of the package (2)
+which contains commands (3,6) and devices (4...)
+
+On receipt of 'postcopy ram listen' (3), i.e. the first command in the package,
+a new thread (a) is started that takes over servicing the migration stream,
+while the main thread carries on loading the package.  The listen thread loads
+normal background page data (b), but if a fault happens during a device load (5)
+the returned page (c) is loaded by the listen thread, allowing the main thread's
+device load to carry on.
+
+The last thing in the CMD_PACKAGED is a 'RUN' command (6) letting the destination
+CPUs start running.
+At the end of the CMD_PACKAGED (7) the main thread returns to normal running behaviour
+and is no longer used by migration, while the listen thread carries
+on servicing page data until the end of migration.
+
+=== Postcopy states ===
+
+Postcopy moves through a series of states (see postcopy_ram_state)
+from ADVISE->LISTEN->RUNNING->END
+
+  Advise: Set at the start of migration if postcopy is enabled, even
+          if it hasn't had the start command; here the destination
+          checks that its OS has the support needed for postcopy, and performs
+          setup to ensure the RAM mappings are suitable for later postcopy.
+          (Triggered by reception of POSTCOPY_RAM_ADVISE command)
+
+  Listen: The first command in the package, POSTCOPY_RAM_LISTEN, switches
+          the destination state to Listen, and starts a new thread
+          (the 'listen thread') which takes over the job of receiving
+          pages off the migration stream, while the main thread carries
+          on processing the blob.  With this thread able to process page
+          reception, the destination now 'sensitises' the RAM to detect
+          any access to missing pages (on Linux using the 'userfault'
+          system).
+
+  Running: POSTCOPY_RAM_RUN causes the destination to synchronise all
+          state and start the CPUs and IO devices running.  The main
+          thread now finishes processing the migration package and
+          then carries on as it would for normal precopy migration
+          (although it can't do the cleanup it would do as it
+          finishes a normal migration).
+
+  End: The listen thread can now quit and perform the cleanup of migration
+          state; the migration is now complete.
+
+=== Source side page maps ===
+
+The source side keeps two bitmaps during postcopy: the 'migration bitmap'
+and the 'sent map'.  The 'migration bitmap' is basically the same as in
+the precopy case, and holds a bit to indicate that a page is 'dirty',
+i.e. needs sending.  During the precopy phase this is updated as the CPUs
+dirty pages; however, during postcopy the source CPUs are stopped and
+nothing should dirty anything any more.
+
+The 'sent map' is used for the transition to postcopy. It is a bitmap that
+has a bit set whenever a page is sent to the destination; during the
+transition to postcopy mode it is masked against the migration bitmap
+(sentmap &= migrationbitmap) to generate a bitmap recording pages that
+have previously been sent but are now dirty again.  This masked
+sentmap is sent to the destination, which discards those now-dirty pages
+before starting the CPUs.
+
+Note that once in postcopy mode, the sent map is still updated; however,
+its contents are not necessarily consistent with the pages already sent
+due to the masking with the migration bitmap.
+
+=== Destination side page maps ===
+
+(This needs to be changed so that both maps can be updated easily; at the
+ moment updates are done with a lock.)
+The destination keeps a 'requested map' and a 'received map'.
+Both maps are initially 0; as pages are received the bits are set in the 'received map'.
+Incoming requests from the kernel cause the bit to be set in the 'requested map'.
+When a page is received that is marked as 'requested' the kernel is notified.
+If the kernel requests a page that has already been 'received' the kernel is notified
+without re-requesting.
+
+This leads to three valid page states:
+
+    missing (!rc,!rq)  - page not yet received or requested
+    received (rc,!rq)  - page received
+    requested (!rc,rq) - page requested but not yet received
+
+state transitions:
+      received -> missing   (only during setup/discard)
+
+      missing -> received   (normal incoming page)
+      requested -> received (incoming page previously requested)
+      missing -> requested  (userfault request)
+
-- 
1.9.3
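The ADVISE->LISTEN->RUNNING->END ordering described in the postcopy states section can be sketched as a tiny state machine. This is a minimal, hypothetical illustration: the enum names and `postcopy_state_advance()` are made up for this sketch (the real code uses `postcopy_ram_state`). The mapping from commands to states is taken from the state descriptions. Rejecting an out-of-order command is an assumption of the sketch.

```c
#include <assert.h>

/* Hypothetical mirror of the documented ADVISE->LISTEN->RUNNING->END order */
typedef enum {
    PC_NONE,        /* postcopy not (yet) advised */
    PC_ADVISE,
    PC_LISTEN,
    PC_RUNNING,
    PC_END,
} PostcopyState;

/* Hypothetical events that drive the transitions */
typedef enum {
    EV_RAM_ADVISE,          /* POSTCOPY_RAM_ADVISE command */
    EV_RAM_LISTEN,          /* POSTCOPY_RAM_LISTEN, first command in the package */
    EV_RAM_RUN,             /* POSTCOPY_RAM_RUN, last command in the package */
    EV_LISTEN_THREAD_DONE,  /* listen thread has received all pages */
} PostcopyEvent;

/*
 * Returns the new state, or -1 if the event arrives out of order
 * (e.g. RUN before LISTEN); in this sketch that is treated as an error.
 */
int postcopy_state_advance(PostcopyState cur, PostcopyEvent ev)
{
    assert(cur <= PC_END);

    switch (ev) {
    case EV_RAM_ADVISE:         return cur == PC_NONE    ? PC_ADVISE  : -1;
    case EV_RAM_LISTEN:         return cur == PC_ADVISE  ? PC_LISTEN  : -1;
    case EV_RAM_RUN:            return cur == PC_LISTEN  ? PC_RUNNING : -1;
    case EV_LISTEN_THREAD_DONE: return cur == PC_RUNNING ? PC_END     : -1;
    }
    return -1;
}
```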

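The source-side masking step (sentmap &= migrationbitmap) amounts to a word-wise AND followed by a scan for set bits. A minimal sketch follows, assuming both maps are plain arrays of 64-bit words with one bit per target page. `mask_sentmap()` and `collect_discards()` are hypothetical names, not QEMU functions. A real implementation might send contiguous runs of pages rather than individual indices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_WORD 64

/*
 * sentmap &= migration_bitmap: afterwards a set bit means "sent earlier,
 * but dirtied again since", i.e. the destination must discard that page.
 */
void mask_sentmap(uint64_t *sentmap, const uint64_t *migration_bitmap,
                  size_t nwords)
{
    for (size_t i = 0; i < nwords; i++) {
        sentmap[i] &= migration_bitmap[i];
    }
}

/*
 * Walk the masked sent map and record the page indices that would be
 * sent to the destination as discards.  Returns the number recorded.
 */
size_t collect_discards(const uint64_t *sentmap, size_t nwords,
                        size_t *pages, size_t max_pages)
{
    size_t n = 0;

    for (size_t i = 0; i < nwords; i++) {
        for (unsigned bit = 0; bit < BITS_PER_WORD; bit++) {
            if (sentmap[i] & (UINT64_C(1) << bit)) {
                assert(n < max_pages);  /* caller must size the array */
                pages[n++] = i * BITS_PER_WORD + bit;
            }
        }
    }
    return n;
}
```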

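The destination's three page states and their transitions can be modelled with two bitmaps. The single-threaded sketch below makes several assumptions:
- the guest has at most 64 pages, so each map fits in one `uint64_t`;
- there is no locking;
- all names (`PageMaps`, `on_fault()`, `on_page_received()`, `on_discard()`) are hypothetical.

The `FAULT_ALREADY_PENDING` case, a second fault on a page that is already requested, is not described in the destination page map text and is also an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NPAGES 64  /* hypothetical: one 64-bit word covers the whole guest */

typedef enum { PAGE_MISSING, PAGE_RECEIVED, PAGE_REQUESTED } PageState;

typedef enum {
    FAULT_WAKE_NOW,         /* already received: notify kernel, no request */
    FAULT_SEND_REQUEST,     /* missing: mark requested, ask the source */
    FAULT_ALREADY_PENDING,  /* requested earlier: keep waiting */
} FaultAction;

typedef struct {
    uint64_t received;   /* the 'received map' (rc) */
    uint64_t requested;  /* the 'requested map' (rq) */
} PageMaps;

static inline uint64_t page_bit(unsigned page)
{
    assert(page < NPAGES);
    return UINT64_C(1) << page;
}

PageState page_state(const PageMaps *m, unsigned page)
{
    bool rc = m->received & page_bit(page);
    bool rq = m->requested & page_bit(page);

    assert(!(rc && rq));    /* (rc,rq) is not one of the valid states */
    if (rc) {
        return PAGE_RECEIVED;
    }
    return rq ? PAGE_REQUESTED : PAGE_MISSING;
}

/* missing->received or requested->received; true if the kernel must be notified */
bool on_page_received(PageMaps *m, unsigned page)
{
    bool was_requested = m->requested & page_bit(page);

    m->requested &= ~page_bit(page);
    m->received |= page_bit(page);
    return was_requested;
}

/* userfault request from the kernel: missing->requested */
FaultAction on_fault(PageMaps *m, unsigned page)
{
    if (m->received & page_bit(page)) {
        return FAULT_WAKE_NOW;
    }
    if (m->requested & page_bit(page)) {
        return FAULT_ALREADY_PENDING;
    }
    m->requested |= page_bit(page);
    return FAULT_SEND_REQUEST;
}

/* received->missing, only during setup/discard */
void on_discard(PageMaps *m, unsigned page)
{
    m->received &= ~page_bit(page);
}
```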
* [Qemu-devel] [PATCH v3 04/47] qemu_ram_foreach_block: pass up error value, and down the ramblock name
  2014-08-28 15:03 [Qemu-devel] [PATCH v3 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (2 preceding siblings ...)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 03/47] Start documenting how postcopy works Dr. David Alan Gilbert (git)
@ 2014-08-28 15:03 ` Dr. David Alan Gilbert (git)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 05/47] improve DPRINTF macros, add to savevm Dr. David Alan Gilbert (git)
                   ` (42 subsequent siblings)
  46 siblings, 0 replies; 52+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Check the return value of the function it calls, and return an error if
it's non-0.  Fix up qemu_rdma_init_one_block, the only current caller,
and __qemu_rdma_add_block, the only function it calls using it.

Also pass the name of the ramblock to the function; this helps with debugging.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 exec.c                    | 10 ++++++++--
 include/exec/cpu-common.h |  4 ++--
 migration-rdma.c          |  4 ++--
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/exec.c b/exec.c
index 5f9857c..8b95502 100644
--- a/exec.c
+++ b/exec.c
@@ -2774,12 +2774,18 @@ bool cpu_physical_memory_is_io(hwaddr phys_addr)
              memory_region_is_romd(mr));
 }
 
-void qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque)
+int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque)
 {
     RAMBlock *block;
+    int ret;
 
     QTAILQ_FOREACH(block, &ram_list.blocks, next) {
-        func(block->host, block->offset, block->length, opaque);
+        ret = func(block->idstr, block->host, block->offset, block->length,
+                   opaque);
+        if (ret) {
+            return ret;
+        }
     }
+    return 0;
 }
 #endif
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index e3ec4c8..8042f50 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -118,10 +118,10 @@ void cpu_flush_icache_range(hwaddr start, int len);
 extern struct MemoryRegion io_mem_rom;
 extern struct MemoryRegion io_mem_notdirty;
 
-typedef void (RAMBlockIterFunc)(void *host_addr,
+typedef int (RAMBlockIterFunc)(const char *block_name, void *host_addr,
     ram_addr_t offset, ram_addr_t length, void *opaque);
 
-void qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque);
+int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque);
 
 #endif
 
diff --git a/migration-rdma.c b/migration-rdma.c
index d99812c..666c052 100644
--- a/migration-rdma.c
+++ b/migration-rdma.c
@@ -595,10 +595,10 @@ static int __qemu_rdma_add_block(RDMAContext *rdma, void *host_addr,
  * in advanced before the migration starts. This tells us where the RAM blocks
  * are so that we can register them individually.
  */
-static void qemu_rdma_init_one_block(void *host_addr,
+static int qemu_rdma_init_one_block(const char *block_name, void *host_addr,
     ram_addr_t block_offset, ram_addr_t length, void *opaque)
 {
-    __qemu_rdma_add_block(opaque, host_addr, block_offset, length);
+    return __qemu_rdma_add_block(opaque, host_addr, block_offset, length);
 }
 
 /*
-- 
1.9.3

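The new calling convention can be shown in isolation: the callback receives the block name and returns an int, and the iterator stops at the first non-zero return and passes it up. The sketch uses hypothetical types. `Block`, `foreach_block()` and `count_block()` stand in for RAMBlock, qemu_ram_foreach_block() and a caller such as qemu_rdma_init_one_block(); they are not QEMU code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for an entry in the RAMBlock list */
typedef struct {
    const char *idstr;
    size_t offset;
    size_t length;
} Block;

typedef int (BlockIterFunc)(const char *block_name, size_t offset,
                            size_t length, void *opaque);

/* Stop at the first callback that returns non-zero and pass its value up */
int foreach_block(const Block *blocks, size_t n, BlockIterFunc func,
                  void *opaque)
{
    assert(func != NULL);

    for (size_t i = 0; i < n; i++) {
        int ret = func(blocks[i].idstr, blocks[i].offset, blocks[i].length,
                       opaque);
        if (ret) {
            return ret;
        }
    }
    return 0;
}

/* Example callback: count blocks, rejecting any zero-length block */
int count_block(const char *block_name, size_t offset, size_t length,
                void *opaque)
{
    int *count = opaque;

    (void)offset;
    if (length == 0) {
        /* the block name makes the error message useful */
        fprintf(stderr, "block '%s' has zero length\n", block_name);
        return -EINVAL;
    }
    (*count)++;
    return 0;
}
```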

* [Qemu-devel] [PATCH v3 05/47] improve DPRINTF macros, add to savevm
  2014-08-28 15:03 [Qemu-devel] [PATCH v3 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (3 preceding siblings ...)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 04/47] qemu_ram_foreach_block: pass up error value, and down the ramblock name Dr. David Alan Gilbert (git)
@ 2014-08-28 15:03 ` Dr. David Alan Gilbert (git)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 06/47] Add qemu_get_counted_string to read a string prefixed by a count byte Dr. David Alan Gilbert (git)
                   ` (41 subsequent siblings)
  46 siblings, 0 replies; 52+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Improve the existing DPRINTF macros in migration.c and arch_init.c
by:
  1) Making them go to stderr rather than stdout (so you can run with
-nographic and redirect your debug output to a file)
  2) Making them print the time in ms with each message - useful for
debugging latency issues

Add the same macro to savevm.c

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c |  5 ++++-
 migration.c | 12 ++++++++++++
 savevm.c    | 10 ++++++++++
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/arch_init.c b/arch_init.c
index 28ece76..d2b565e 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -53,9 +53,12 @@
 #include "hw/acpi/acpi.h"
 #include "qemu/host-utils.h"
 
+// #define DEBUG_ARCH_INIT
 #ifdef DEBUG_ARCH_INIT
 #define DPRINTF(fmt, ...) \
-    do { fprintf(stdout, "arch_init: " fmt, ## __VA_ARGS__); } while (0)
+    do { fprintf(stderr,  "arch_init@%" PRId64 " " fmt "\n", \
+                          qemu_clock_get_ms(QEMU_CLOCK_REALTIME), \
+                          ## __VA_ARGS__); } while (0)
 #else
 #define DPRINTF(fmt, ...) \
     do { } while (0)
diff --git a/migration.c b/migration.c
index 8d675b3..e241370 100644
--- a/migration.c
+++ b/migration.c
@@ -26,6 +26,18 @@
 #include "qmp-commands.h"
 #include "trace.h"
 
+//#define DEBUG_MIGRATION
+
+#ifdef DEBUG_MIGRATION
+#define DPRINTF(fmt, ...) \
+    do { fprintf(stderr, "migration@%" PRId64 " " fmt "\n", \
+                          qemu_clock_get_ms(QEMU_CLOCK_REALTIME), \
+                          ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+    do { } while (0)
+#endif
+
 enum {
     MIG_STATE_ERROR = -1,
     MIG_STATE_NONE,
diff --git a/savevm.c b/savevm.c
index e19ae0a..c3a1f68 100644
--- a/savevm.c
+++ b/savevm.c
@@ -43,6 +43,16 @@
 #include "block/snapshot.h"
 #include "block/qapi.h"
 
+#ifdef DEBUG_SAVEVM
+#define DPRINTF(fmt, ...) \
+    do { fprintf(stderr, "savevm@%" PRId64 " " fmt "\n", \
+                          qemu_clock_get_ms(QEMU_CLOCK_REALTIME), \
+                          ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+    do { } while (0)
+#endif
+
 
 #ifndef ETH_P_RARP
 #define ETH_P_RARP 0x8035
-- 
1.9.3

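A self-contained sketch of the same macro pattern lets you try it outside QEMU. It rests on these assumptions:
- `realtime_ms()` is a stand-in for `qemu_clock_get_ms(QEMU_CLOCK_REALTIME)`;
- `DPRINTF_TO` and `DEBUG_DEMO` are hypothetical names;
- the macro takes an explicit stream only so that its output can be captured.

Like the patch, it relies on `fmt` being a string literal, since it is concatenated with the prefix and `"\n"`. It also relies on the GNU `## __VA_ARGS__` extension to drop the trailing comma when there are no arguments.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for qemu_clock_get_ms(QEMU_CLOCK_REALTIME) */
static int64_t realtime_ms(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_REALTIME, &ts);

    assert(rc == 0);
    (void)rc;
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

#define DEBUG_DEMO 1

#if DEBUG_DEMO
/* 'fmt' must be a string literal: it is pasted between the prefix and "\n".
 * '## __VA_ARGS__' (GNU extension) swallows the comma when there are no args.
 * do { } while (0) makes the macro behave as a single statement. */
#define DPRINTF_TO(stream, fmt, ...) \
    do { fprintf(stream, "demo@%" PRId64 " " fmt "\n", \
                 realtime_ms(), ## __VA_ARGS__); } while (0)
#else
#define DPRINTF_TO(stream, fmt, ...) \
    do { } while (0)
#endif
```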

* [Qemu-devel] [PATCH v3 06/47] Add qemu_get_counted_string to read a string prefixed by a count byte
  2014-08-28 15:03 [Qemu-devel] [PATCH v3 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (4 preceding siblings ...)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 05/47] improve DPRINTF macros, add to savevm Dr. David Alan Gilbert (git)
@ 2014-08-28 15:03 ` Dr. David Alan Gilbert (git)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 07/47] Create MigrationIncomingState Dr. David Alan Gilbert (git)
                   ` (40 subsequent siblings)
  46 siblings, 0 replies; 52+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

and use it in loadvm_state.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/qemu-file.h |  2 ++
 qemu-file.c                   | 15 +++++++++++++++
 savevm.c                      | 18 ++++++++++--------
 3 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index 80af3ff..e50d696 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -300,4 +300,6 @@ static inline void qemu_get_sbe64s(QEMUFile *f, int64_t *pv)
 {
     qemu_get_be64s(f, (uint64_t *)pv);
 }
+
+int qemu_get_counted_string(QEMUFile *f, uint8_t *buf);
 #endif
diff --git a/qemu-file.c b/qemu-file.c
index d64bee2..f6d64ce 100644
--- a/qemu-file.c
+++ b/qemu-file.c
@@ -879,6 +879,21 @@ uint64_t qemu_get_be64(QEMUFile *f)
     return v;
 }
 
+/*
+ * Get a string whose length is determined by a single preceding byte
+ * A preallocated 256 byte buffer must be passed in.
+ * Returns: 0 on success and a 0 terminated string in the buffer
+ */
+int qemu_get_counted_string(QEMUFile *f, uint8_t *buf)
+{
+    unsigned int len = qemu_get_byte(f);
+    int res = qemu_get_buffer(f, buf, len);
+
+    buf[len] = 0;
+
+    return res != len;
+}
+
 #define QSB_CHUNK_SIZE      (1 << 10)
 #define QSB_MAX_CHUNK_SIZE  (10 * QSB_CHUNK_SIZE)
 
diff --git a/savevm.c b/savevm.c
index c3a1f68..cb6f0de 100644
--- a/savevm.c
+++ b/savevm.c
@@ -908,7 +908,7 @@ int qemu_loadvm_state(QEMUFile *f)
 
     v = qemu_get_be32(f);
     if (v == QEMU_VM_FILE_VERSION_COMPAT) {
-        fprintf(stderr, "SaveVM v2 format is obsolete and don't work anymore\n");
+        error_report("SaveVM v2 format is obsolete and doesn't work anymore");
         return -ENOTSUP;
     }
     if (v != QEMU_VM_FILE_VERSION) {
@@ -918,31 +918,33 @@ int qemu_loadvm_state(QEMUFile *f)
     while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
         uint32_t instance_id, version_id, section_id;
         SaveStateEntry *se;
-        char idstr[257];
-        int len;
+        char idstr[256];
 
         switch (section_type) {
         case QEMU_VM_SECTION_START:
         case QEMU_VM_SECTION_FULL:
             /* Read section start */
             section_id = qemu_get_be32(f);
-            len = qemu_get_byte(f);
-            qemu_get_buffer(f, (uint8_t *)idstr, len);
-            idstr[len] = 0;
+            if (qemu_get_counted_string(f, (uint8_t *)idstr)) {
+                error_report("Unable to read ID string for section %u",
+                            section_id);
+                return -EINVAL;
+            }
             instance_id = qemu_get_be32(f);
             version_id = qemu_get_be32(f);
 
             /* Find savevm section */
             se = find_se(idstr, instance_id);
             if (se == NULL) {
-                fprintf(stderr, "Unknown savevm section or instance '%s' %d\n", idstr, instance_id);
+                error_report("Unknown savevm section or instance '%s' %d",
+                             idstr, instance_id);
                 ret = -EINVAL;
                 goto out;
             }
 
             /* Validate version */
             if (version_id > se->version_id) {
-                fprintf(stderr, "savevm: unsupported version %d for '%s' v%d\n",
+                error_report("savevm: unsupported version %d for '%s' v%d",
                         version_id, idstr, se->version_id);
                 ret = -EINVAL;
                 goto out;
-- 
1.9.3

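The counted-string format reads one length byte, then that many bytes, and the caller supplies a 256-byte buffer. The sketch below shows this format against an in-memory reader. `Reader`, `reader_get_buffer()` and `get_counted_string()` are hypothetical stand-ins for QEMUFile, qemu_get_buffer() and qemu_get_counted_string(). One small difference from the patch: this version terminates the string at the number of bytes actually read, so a short read never leaves uninitialised bytes inside the string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory reader standing in for a QEMUFile */
typedef struct {
    const uint8_t *data;
    size_t len;
    size_t pos;
} Reader;

/* Copy up to 'size' bytes; returns the number actually copied (short at EOF) */
static size_t reader_get_buffer(Reader *r, uint8_t *buf, size_t size)
{
    assert(r->pos <= r->len);
    size_t avail = r->len - r->pos;
    size_t n = size < avail ? size : avail;

    memcpy(buf, r->data + r->pos, n);
    r->pos += n;
    return n;
}

/*
 * 'buf' must hold 256 bytes (up to 255 payload bytes plus the NUL).
 * Returns 0 on success, non-zero on a short read.
 */
int get_counted_string(Reader *r, uint8_t buf[256])
{
    uint8_t len;

    if (reader_get_buffer(r, &len, 1) != 1) {
        buf[0] = 0;
        return 1;
    }
    size_t got = reader_get_buffer(r, buf, len);
    buf[got] = 0;   /* got <= 255, so this stays inside the buffer */
    return got != len;
}
```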

* [Qemu-devel] [PATCH v3 07/47] Create MigrationIncomingState
  2014-08-28 15:03 [Qemu-devel] [PATCH v3 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (5 preceding siblings ...)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 06/47] Add qemu_get_counted_string to read a string prefixed by a count byte Dr. David Alan Gilbert (git)
@ 2014-08-28 15:03 ` Dr. David Alan Gilbert (git)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 08/47] socket shutdown Dr. David Alan Gilbert (git)
                   ` (39 subsequent siblings)
  46 siblings, 0 replies; 52+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

There are currently lots of pieces of incoming migration state scattered
around, and postcopy is adding more; it seems better to try and keep
it together.

Allocate the MigrationIncomingState (MIS) in process_incoming_migration_co
and load_vmstate.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  9 +++++++++
 include/qemu/typedefs.h       |  2 ++
 migration.c                   | 28 ++++++++++++++++++++++++++++
 savevm.c                      |  2 ++
 4 files changed, 41 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 3cb5ba8..8a36255 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -41,6 +41,15 @@ struct MigrationParams {
 
 typedef struct MigrationState MigrationState;
 
+/* State for the incoming migration */
+struct MigrationIncomingState {
+    QEMUFile *file;
+};
+
+MigrationIncomingState *migration_incoming_get_current(void);
+MigrationIncomingState *migration_incoming_state_init(QEMUFile *f);
+void migration_incoming_state_destroy(void);
+
 struct MigrationState
 {
     int64_t bandwidth_limit;
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index db1153a..0f79b5c 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -14,6 +14,7 @@ typedef struct Visitor Visitor;
 
 struct Monitor;
 typedef struct Monitor Monitor;
+typedef struct MigrationIncomingState MigrationIncomingState;
 typedef struct MigrationParams MigrationParams;
 
 typedef struct Property Property;
@@ -44,6 +45,7 @@ typedef struct PixelFormat PixelFormat;
 typedef struct QemuConsole QemuConsole;
 typedef struct CharDriverState CharDriverState;
 typedef struct MACAddr MACAddr;
+typedef struct MigrationIncomingState MigrationIncomingState;
 typedef struct NetClientState NetClientState;
 typedef struct I2CBus I2CBus;
 typedef struct ISABus ISABus;
diff --git a/migration.c b/migration.c
index e241370..ac46ddb 100644
--- a/migration.c
+++ b/migration.c
@@ -65,6 +65,7 @@ static NotifierList migration_state_notifiers =
    migrations at once.  For now we don't need to add
    dynamic creation of migration */
 
+/* For outgoing */
 MigrationState *migrate_get_current(void)
 {
     static MigrationState current_migration = {
@@ -77,6 +78,28 @@ MigrationState *migrate_get_current(void)
     return &current_migration;
 }
 
+/* For incoming */
+static MigrationIncomingState *mis_current;
+
+MigrationIncomingState *migration_incoming_get_current(void)
+{
+    return mis_current;
+}
+
+MigrationIncomingState *migration_incoming_state_init(QEMUFile* f)
+{
+    mis_current = g_malloc0(sizeof(MigrationIncomingState));
+    mis_current->file = f;
+
+    return mis_current;
+}
+
+void migration_incoming_state_destroy(void)
+{
+    g_free(mis_current);
+    mis_current = NULL;
+}
+
 void qemu_start_incoming_migration(const char *uri, Error **errp)
 {
     const char *p;
@@ -106,9 +129,14 @@ static void process_incoming_migration_co(void *opaque)
     Error *local_err = NULL;
     int ret;
 
+    migration_incoming_state_init(f);
+
     ret = qemu_loadvm_state(f);
+
     qemu_fclose(f);
     free_xbzrle_decoded_buf();
+    migration_incoming_state_destroy();
+
     if (ret < 0) {
         error_report("load of migration failed: %s", strerror(-ret));
         exit(EXIT_FAILURE);
diff --git a/savevm.c b/savevm.c
index cb6f0de..a0c3b40 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1244,9 +1244,11 @@ int load_vmstate(const char *name)
     }
 
     qemu_system_reset(VMRESET_SILENT);
+    migration_incoming_state_init(f);
     ret = qemu_loadvm_state(f);
 
     qemu_fclose(f);
+    migration_incoming_state_destroy();
     if (ret < 0) {
         error_report("Error %d while loading VM state", ret);
         return ret;
-- 
1.9.3

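The lifetime pattern used here can be shown in a standalone sketch. A single file-scope pointer is allocated before qemu_loadvm_state() runs and is freed afterwards. All names are hypothetical. `calloc()` stands in for glib's `g_malloc0()`, which aborts on failure rather than returning NULL. The assert that no state already exists is an extra check added for illustration; the patch does not perform it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical placeholder for QEMUFile */
typedef struct QEMUFileStub {
    int fd;
} QEMUFileStub;

typedef struct {
    QEMUFileStub *file;
} IncomingState;

/* One incoming migration at a time, so a single global is enough */
static IncomingState *incoming_current;

IncomingState *incoming_get_current(void)
{
    return incoming_current;
}

IncomingState *incoming_state_init(QEMUFileStub *f)
{
    assert(!incoming_current);  /* illustrative extra check */
    incoming_current = calloc(1, sizeof(*incoming_current));
    if (incoming_current) {
        incoming_current->file = f;
    }
    return incoming_current;
}

void incoming_state_destroy(void)
{
    free(incoming_current);
    incoming_current = NULL;
}
```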

* [Qemu-devel] [PATCH v3 08/47] socket shutdown
  2014-08-28 15:03 [Qemu-devel] [PATCH v3 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (6 preceding siblings ...)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 07/47] Create MigrationIncomingState Dr. David Alan Gilbert (git)
@ 2014-08-28 15:03 ` Dr. David Alan Gilbert (git)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 09/47] Return path: Open a return path on QEMUFile for sockets Dr. David Alan Gilbert (git)
                   ` (38 subsequent siblings)
  46 siblings, 0 replies; 52+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add a QEMUFile interface to allow a socket to be 'shut down', i.e. any
reads/writes will fail (and any blocking read/write will be woken).

Add a socket_shutdown() wrapper in qemu-sockets.c so that the OS
dependencies are kept out of the QEMUFile code.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/qemu-file.h | 10 ++++++++++
 include/qemu/sockets.h        |  1 +
 qemu-file.c                   | 27 +++++++++++++++++++++++++--
 util/qemu-sockets.c           | 28 ++++++++++++++++++++++++++++
 4 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index e50d696..ca8fbdc 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -84,6 +84,14 @@ typedef size_t (QEMURamSaveFunc)(QEMUFile *f, void *opaque,
                                size_t size,
                                int *bytes_sent);
 
+/*
+ * Stop any read or write (depending on flags) on the underlying
+ * transport on the QEMUFile.
+ * Existing blocking reads/writes must be woken
+ * Returns 0 on success, -err on error
+ */
+typedef int (QEMUFileShutdownFunc)(void *opaque, bool rd, bool wr);
+
 typedef struct QEMUFileOps {
     QEMUFilePutBufferFunc *put_buffer;
     QEMUFileGetBufferFunc *get_buffer;
@@ -94,6 +102,7 @@ typedef struct QEMUFileOps {
     QEMURamHookFunc *after_ram_iterate;
     QEMURamHookFunc *hook_ram_load;
     QEMURamSaveFunc *save_page;
+    QEMUFileShutdownFunc *shut_down;
 } QEMUFileOps;
 
 struct QEMUSizedBuffer {
@@ -178,6 +187,7 @@ void qemu_file_set_rate_limit(QEMUFile *f, int64_t new_rate);
 int64_t qemu_file_get_rate_limit(QEMUFile *f);
 int qemu_file_get_error(QEMUFile *f);
 void qemu_file_set_error(QEMUFile *f, int ret);
+void qemu_file_shutdown(QEMUFile *f);
 void qemu_fflush(QEMUFile *f);
 
 static inline void qemu_put_be64s(QEMUFile *f, const uint64_t *pv)
diff --git a/include/qemu/sockets.h b/include/qemu/sockets.h
index fdbb196..ea8ffc6 100644
--- a/include/qemu/sockets.h
+++ b/include/qemu/sockets.h
@@ -41,6 +41,7 @@ int socket_set_nodelay(int fd);
 void qemu_set_block(int fd);
 void qemu_set_nonblock(int fd);
 int socket_set_fast_reuse(int fd);
+int socket_shutdown(int fd, bool rd, bool wr);
 int send_all(int fd, const void *buf, int len1);
 int recv_all(int fd, void *buf, int len1, bool single_read);
 
diff --git a/qemu-file.c b/qemu-file.c
index f6d64ce..d7401fa 100644
--- a/qemu-file.c
+++ b/qemu-file.c
@@ -90,6 +90,14 @@ static int socket_close(void *opaque)
     return 0;
 }
 
+/* qemufile_ to disambiguate from the qemu-sockets.c code which it uses */
+static int qemufile_socket_shutdown(void *opaque, bool rd, bool wr)
+{
+    QEMUFileSocket *s = opaque;
+
+    return socket_shutdown(s->fd, rd, wr);
+}
+
 static int stdio_get_fd(void *opaque)
 {
     QEMUFileStdio *s = opaque;
@@ -337,15 +345,30 @@ QEMUFile *qemu_fdopen(int fd, const char *mode)
 static const QEMUFileOps socket_read_ops = {
     .get_fd =     socket_get_fd,
     .get_buffer = socket_get_buffer,
-    .close =      socket_close
+    .close =      socket_close,
+    .shut_down       = qemufile_socket_shutdown
+
 };
 
 static const QEMUFileOps socket_write_ops = {
     .get_fd =     socket_get_fd,
     .writev_buffer = socket_writev_buffer,
-    .close =      socket_close
+    .close =      socket_close,
+    .shut_down       = qemufile_socket_shutdown
+
 };
 
+/*
+ * Stop a file from being read/written - not all backing files can do this;
+ * typically only sockets can.  The caller should make sure they only
+ * call this for things that can.
+ */
+void qemu_file_shutdown(QEMUFile *f)
+{
+    assert(f->ops->shut_down);
+    f->ops->shut_down(f, true, true);
+}
+
 bool qemu_file_mode_is_not_valid(const char *mode)
 {
     if (mode == NULL ||
diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
index 5d38395..c68cb52 100644
--- a/util/qemu-sockets.c
+++ b/util/qemu-sockets.c
@@ -982,3 +982,31 @@ int socket_dgram(SocketAddress *remote, SocketAddress *local, Error **errp)
     qemu_opts_del(opts);
     return fd;
 }
+
+int socket_shutdown(int fd, bool rd, bool wr)
+{
+    int how = 0;
+
+#ifndef WIN32
+    if (rd) {
+        how = SHUT_RD;
+    }
+
+    if (wr) {
+        how = rd ? SHUT_RDWR : SHUT_WR;
+    }
+
+#else
+    /* Untested */
+    if (rd) {
+        how = SD_RECEIVE;
+    }
+
+    if (wr) {
+        how = rd ? SD_BOTH : SD_SEND;
+    }
+
+#endif
+
+    return shutdown(fd, how);
+}
-- 
1.9.3

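A standalone POSIX sketch shows the two pieces added here: the mapping from the rd/wr flags to shutdown(2)'s `how` argument, and dispatch through an ops table. `FileOps`, `File`, `SocketBackend`, `rdwr_to_how()` and `file_shutdown()` are hypothetical names.

Points to note about the sketch:
- The callback is handed the backend's opaque pointer, the same way qemu_file_get_return_path() calls get_return_path, because that is what the socket callback casts its argument to.
- It asserts that at least one of rd/wr is set instead of relying on the numeric value of SHUT_RD.
- A socketpair shows that after one end is shut down, a read on the other end returns 0 (EOF) rather than blocking.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

/* rd/wr -> 'how', as socket_shutdown() does on POSIX hosts */
int rdwr_to_how(bool rd, bool wr)
{
    assert(rd || wr);
    if (rd && wr) {
        return SHUT_RDWR;
    }
    return rd ? SHUT_RD : SHUT_WR;
}

typedef struct {
    int (*shut_down)(void *opaque, bool rd, bool wr);
} FileOps;

typedef struct {
    const FileOps *ops;
    void *opaque;           /* backend state, here a SocketBackend */
} File;

typedef struct {
    int fd;
} SocketBackend;

static int socket_backend_shutdown(void *opaque, bool rd, bool wr)
{
    SocketBackend *s = opaque;  /* expects the backend, not the File */

    return shutdown(s->fd, rdwr_to_how(rd, wr));
}

const FileOps socket_file_ops = { .shut_down = socket_backend_shutdown };

int file_shutdown(File *f)
{
    assert(f->ops->shut_down);
    return f->ops->shut_down(f->opaque, true, true);
}
```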

* [Qemu-devel] [PATCH v3 09/47] Return path: Open a return path on QEMUFile for sockets
  2014-08-28 15:03 [Qemu-devel] [PATCH v3 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (7 preceding siblings ...)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 08/47] socket shutdown Dr. David Alan Gilbert (git)
@ 2014-08-28 15:03 ` Dr. David Alan Gilbert (git)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 10/47] Return path: socket_writev_buffer: Block even on non-blocking fd's Dr. David Alan Gilbert (git)
                   ` (37 subsequent siblings)
  46 siblings, 0 replies; 52+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Postcopy needs a method to send messages from the destination back to
the source; this is the 'return path'.

Wire it up for 'socket' QEMUFiles using a dup'd fd.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/qemu-file.h |  7 +++++
 qemu-file.c                   | 73 +++++++++++++++++++++++++++++++++++++------
 2 files changed, 70 insertions(+), 10 deletions(-)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index ca8fbdc..fd07af5 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -85,6 +85,11 @@ typedef size_t (QEMURamSaveFunc)(QEMUFile *f, void *opaque,
                                int *bytes_sent);
 
 /*
+ * Return a QEMUFile for comms in the opposite direction
+ */
+typedef QEMUFile *(QEMURetPathFunc)(void *opaque);
+
+/*
  * Stop any read or write (depending on flags) on the underlying
  * transport on the QEMUFile.
  * Existing blocking reads/writes must be woken
@@ -102,6 +107,7 @@ typedef struct QEMUFileOps {
     QEMURamHookFunc *after_ram_iterate;
     QEMURamHookFunc *hook_ram_load;
     QEMURamSaveFunc *save_page;
+    QEMURetPathFunc *get_return_path;
     QEMUFileShutdownFunc *shut_down;
 } QEMUFileOps;
 
@@ -188,6 +194,7 @@ int64_t qemu_file_get_rate_limit(QEMUFile *f);
 int qemu_file_get_error(QEMUFile *f);
 void qemu_file_set_error(QEMUFile *f, int ret);
 void qemu_file_shutdown(QEMUFile *f);
+QEMUFile *qemu_file_get_return_path(QEMUFile *f);
 void qemu_fflush(QEMUFile *f);
 
 static inline void qemu_put_be64s(QEMUFile *f, const uint64_t *pv)
diff --git a/qemu-file.c b/qemu-file.c
index d7401fa..694e887 100644
--- a/qemu-file.c
+++ b/qemu-file.c
@@ -26,6 +26,8 @@ struct QEMUFile {
     unsigned int iovcnt;
 
     int last_error;
+
+    struct QEMUFile *return_path;
 };
 
 typedef struct QEMUFileStdio {
@@ -38,6 +40,45 @@ typedef struct QEMUFileSocket {
     QEMUFile *file;
 } QEMUFileSocket;
 
+/*
+ * Give a QEMUFile* on the same socket, but for data in the opposite
+ * direction.
+ */
+static QEMUFile *socket_dup_return_path(void *opaque)
+{
+    QEMUFileSocket *qfs = opaque;
+    int revfd;
+    bool this_is_read;
+    QEMUFile *result;
+
+    /* We should only be called once to get a RP on a file */
+    assert(!qfs->file->return_path);
+
+    if (qemu_file_get_error(qfs->file)) {
+        /* If the forward file is in error, don't try to open a return path */
+        return NULL;
+    }
+
+    /* I don't think there's a better way to tell which direction 'this' is */
+    this_is_read = qfs->file->ops->get_buffer != NULL;
+
+    revfd = dup(qfs->fd);
+    if (revfd == -1) {
+        error_report("Error duplicating fd for return path: %s",
+                      strerror(errno));
+        return NULL;
+    }
+
+    result = qemu_fopen_socket(revfd, this_is_read ? "wb" : "rb");
+    qfs->file->return_path = result;
+
+    if (!result) {
+        close(revfd);
+    }
+
+    return result;
+}
+
 static ssize_t socket_writev_buffer(void *opaque, struct iovec *iov, int iovcnt,
                                     int64_t pos)
 {
@@ -343,19 +384,19 @@ QEMUFile *qemu_fdopen(int fd, const char *mode)
 }
 
 static const QEMUFileOps socket_read_ops = {
-    .get_fd =     socket_get_fd,
-    .get_buffer = socket_get_buffer,
-    .close =      socket_close,
-    .shut_down       = qemufile_socket_shutdown
-
+    .get_fd          = socket_get_fd,
+    .get_buffer      = socket_get_buffer,
+    .close           = socket_close,
+    .shut_down       = qemufile_socket_shutdown,
+    .get_return_path = socket_dup_return_path
 };
 
 static const QEMUFileOps socket_write_ops = {
-    .get_fd =     socket_get_fd,
-    .writev_buffer = socket_writev_buffer,
-    .close =      socket_close,
-    .shut_down       = qemufile_socket_shutdown
-
+    .get_fd          = socket_get_fd,
+    .writev_buffer   = socket_writev_buffer,
+    .close           = socket_close,
+    .shut_down       = qemufile_socket_shutdown,
+    .get_return_path = socket_dup_return_path
 };
 
 /*
@@ -369,6 +410,18 @@ void qemu_file_shutdown(QEMUFile *f)
     f->ops->shut_down(f, true, true);
 }
 
+/*
+ * Result: QEMUFile* for a 'return path' for comms in the opposite direction
+ *         NULL if not available
+ */
+QEMUFile *qemu_file_get_return_path(QEMUFile *f)
+{
+    if (!f->ops->get_return_path) {
+        return NULL;
+    }
+    return f->ops->get_return_path(f->opaque);
+}
+
 bool qemu_file_mode_is_not_valid(const char *mode)
 {
     if (mode == NULL ||
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH v3 10/47] Return path: socket_writev_buffer: Block even on non-blocking fd's
  2014-08-28 15:03 [Qemu-devel] [PATCH v3 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (8 preceding siblings ...)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 09/47] Return path: Open a return path on QEMUFile for sockets Dr. David Alan Gilbert (git)
@ 2014-08-28 15:03 ` Dr. David Alan Gilbert (git)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 11/47] Migration commands Dr. David Alan Gilbert (git)
                   ` (36 subsequent siblings)
  46 siblings, 0 replies; 52+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The return path uses a non-blocking fd so that we don't block waiting
for a (possibly broken) destination to finish returning a message;
however, we still want outbound data to behave as before and block.
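
As an illustration only, the same "emulate blocking" idea can be
sketched with plain POSIX send(2) and poll(2) on a flat buffer, instead
of the iov_send()/g_poll() calls the patch uses. The helper name
send_all_blocking() is hypothetical, and unlike the patch it also
retries on EINTR:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send all 'size' bytes of 'buf' on 'fd', even when 'fd' has O_NONBLOCK
 * set: on EAGAIN/EWOULDBLOCK, wait in poll(2) until the socket is
 * writable again and retry from the current offset.
 * Returns the number of bytes sent, or -errno on a real error.
 */
static ssize_t send_all_blocking(int fd, const void *buf, size_t size)
{
    const char *p = buf;
    size_t offset = 0;

    while (offset < size) {
        ssize_t len = send(fd, p + offset, size - offset, 0);

        if (len > 0) {
            offset += (size_t)len;
            continue;
        }
        if (len < 0 && errno == EINTR) {
            continue;                       /* interrupted: just retry */
        }
        if (len < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            /* Emulate blocking: sleep until the fd is writable (or errors) */
            struct pollfd pfd = { .fd = fd, .events = POLLOUT };

            if (poll(&pfd, 1, -1 /* no timeout */) < 0 && errno != EINTR) {
                return -errno;
            }
            continue;
        }
        /* Any other error is reported immediately, as in the patch */
        return len < 0 ? -errno : -EIO;
    }
    return (ssize_t)offset;
}
```

The design choice matches the patch: a real error is returned at once,
even if part of the buffer had already been sent.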

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 qemu-file.c | 39 +++++++++++++++++++++++++++++++++++----
 1 file changed, 35 insertions(+), 4 deletions(-)

diff --git a/qemu-file.c b/qemu-file.c
index 694e887..2ba59a5 100644
--- a/qemu-file.c
+++ b/qemu-file.c
@@ -85,12 +85,43 @@ static ssize_t socket_writev_buffer(void *opaque, struct iovec *iov, int iovcnt,
     QEMUFileSocket *s = opaque;
     ssize_t len;
     ssize_t size = iov_size(iov, iovcnt);
+    ssize_t offset = 0;
+    int     err;
 
-    len = iov_send(s->fd, iov, iovcnt, 0, size);
-    if (len < size) {
-        len = -socket_error();
+    while (size > 0) {
+        len = iov_send(s->fd, iov, iovcnt, offset, size);
+
+        if (len > 0) {
+            size -= len;
+            offset += len;
+        }
+
+        if (size > 0) {
+            err = socket_error();
+
+            if (err != EAGAIN) {
+                error_report("socket_writev_buffer: Got err=%d for (%zd/%zd)",
+                             err, size, len);
+                /*
+                 * If I've already sent some but only just got the error, I
+                 * could return the amount validly sent so far and wait for the
+                 * next call to report the error, but I'd rather flag the error
+                 * immediately.
+                 */
+                return -err;
+            }
+
+            /* Emulate blocking */
+            GPollFD pfd;
+
+            pfd.fd = s->fd;
+            pfd.events = G_IO_OUT | G_IO_ERR;
+            pfd.revents = 0;
+            g_poll(&pfd, 1 /* 1 fd */, -1 /* no timeout */);
+        }
     }
-    return len;
+
+    return offset;
 }
 
 static int socket_get_fd(void *opaque)
-- 
1.9.3

* [Qemu-devel] [PATCH v3 11/47] Migration commands
  2014-08-28 15:03 [Qemu-devel] [PATCH v3 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (9 preceding siblings ...)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 10/47] Return path: socket_writev_buffer: Block even on non-blocking fd's Dr. David Alan Gilbert (git)
@ 2014-08-28 15:03 ` Dr. David Alan Gilbert (git)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 12/47] Return path: Control commands Dr. David Alan Gilbert (git)
                   ` (35 subsequent siblings)
  46 siblings, 0 replies; 52+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Create QEMU_VM_COMMAND section type for sending commands from
source to destination.  These commands are not intended to convey
guest state but to control the migration process.

For use in postcopy.
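
A minimal sketch of the byte layout that qemu_savevm_command_send()
produces, and of reading it back in the same order as
loadvm_process_command(). The helper names are hypothetical, and the
sketch works on an in-memory buffer rather than a QEMUFile:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QEMU_VM_COMMAND 0x06   /* section type added by this patch */

/*
 * Hypothetical helper: lay out one QEMU_VM_COMMAND element the way
 * qemu_savevm_command_send() writes it to the stream:
 *   u8 0x06 | be16 command | be16 len | len bytes of data
 * Returns the number of bytes written, or 0 if 'out' is too small.
 */
static size_t encode_vm_command(uint8_t *out, size_t out_size,
                                uint16_t command, uint16_t len,
                                const uint8_t *data)
{
    size_t total = 5 + (size_t)len;

    if (out_size < total) {
        return 0;
    }
    out[0] = QEMU_VM_COMMAND;
    out[1] = (uint8_t)(command >> 8);
    out[2] = (uint8_t)(command & 0xff);
    out[3] = (uint8_t)(len >> 8);
    out[4] = (uint8_t)(len & 0xff);
    if (len) {
        memcpy(out + 5, data, len);
    }
    return total;
}

/*
 * Inverse: 'in' starts just after the section-type byte, which the
 * qemu_loadvm_state() loop has already consumed.
 * Returns the number of bytes consumed, or 0 if the input is truncated.
 */
static size_t decode_vm_command(const uint8_t *in, size_t in_size,
                                uint16_t *command, uint16_t *len,
                                const uint8_t **data)
{
    if (in_size < 4) {
        return 0;
    }
    *command = (uint16_t)((in[0] << 8) | in[1]);
    *len = (uint16_t)((in[2] << 8) | in[3]);
    if (in_size - 4 < *len) {
        return 0;
    }
    *data = in + 4;
    return 4 + (size_t)*len;
}
```

Because the length is always sent, a receiver could in principle skip a
command it does not understand; in this patch, however, an unknown
command is treated as an error.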

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  1 +
 include/sysemu/sysemu.h       |  9 ++++++++
 savevm.c                      | 48 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 58 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 8a36255..e23947a 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -33,6 +33,7 @@
 #define QEMU_VM_SECTION_END          0x03
 #define QEMU_VM_SECTION_FULL         0x04
 #define QEMU_VM_SUBSECTION           0x05
+#define QEMU_VM_COMMAND              0x06
 
 struct MigrationParams {
     bool blk;
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index d8539fd..a0a91e3 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -81,6 +81,13 @@ void do_info_snapshots(Monitor *mon, const QDict *qdict);
 
 void qemu_announce_self(void);
 
+/* Subcommands for QEMU_VM_COMMAND */
+enum qemu_vm_cmd {
+    QEMU_VM_CMD_INVALID = 0,   /* Must be 0 */
+
+    QEMU_VM_CMD_AFTERLASTVALID
+};
+
 bool qemu_savevm_state_blocked(Error **errp);
 void qemu_savevm_state_begin(QEMUFile *f,
                              const MigrationParams *params);
@@ -88,6 +95,8 @@ int qemu_savevm_state_iterate(QEMUFile *f);
 void qemu_savevm_state_complete(QEMUFile *f);
 void qemu_savevm_state_cancel(void);
 uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size);
+void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
+                              uint16_t len, uint8_t *data);
 int qemu_loadvm_state(QEMUFile *f);
 
 /* SLIRP */
diff --git a/savevm.c b/savevm.c
index a0c3b40..3cae292 100644
--- a/savevm.c
+++ b/savevm.c
@@ -592,6 +592,25 @@ static void vmstate_save(QEMUFile *f, SaveStateEntry *se)
     vmstate_save_state(f, se->vmsd, se->opaque);
 }
 
+
+/* Send a 'QEMU_VM_COMMAND' type element with the command
+ * and associated data.
+ */
+void qemu_savevm_command_send(QEMUFile *f,
+                              enum qemu_vm_cmd command,
+                              uint16_t len,
+                              uint8_t *data)
+{
+    uint32_t tmp = (uint16_t)command;
+    qemu_put_byte(f, QEMU_VM_COMMAND);
+    qemu_put_be16(f, tmp);
+    qemu_put_be16(f, len);
+    if (len) {
+        qemu_put_buffer(f, data, len);
+    }
+    qemu_fflush(f);
+}
+
 bool qemu_savevm_state_blocked(Error **errp)
 {
     SaveStateEntry *se;
@@ -881,6 +900,29 @@ static SaveStateEntry *find_se(const char *idstr, int instance_id)
     return NULL;
 }
 
+/*
+ * Process an incoming 'QEMU_VM_COMMAND'
+ * negative return on error (will issue error message)
+ */
+static int loadvm_process_command(QEMUFile *f)
+{
+    uint16_t com;
+    uint16_t len;
+
+    com = qemu_get_be16(f);
+    len = qemu_get_be16(f);
+
+    /* fprintf(stderr,"loadvm_process_command: com=0x%x len=%d\n", com,len); */
+    switch (com) {
+
+    default:
+        error_report("VM_COMMAND 0x%x unknown (len 0x%x)", com, len);
+        return -1;
+    }
+
+    return 0;
+}
+
 typedef struct LoadStateEntry {
     QLIST_ENTRY(LoadStateEntry) entry;
     SaveStateEntry *se;
@@ -987,6 +1029,12 @@ int qemu_loadvm_state(QEMUFile *f)
                 goto out;
             }
             break;
+        case QEMU_VM_COMMAND:
+            ret = loadvm_process_command(f);
+            if (ret < 0) {
+                goto out;
+            }
+            break;
         default:
             fprintf(stderr, "Unknown savevm section type %d\n", section_type);
             ret = -EINVAL;
-- 
1.9.3

* [Qemu-devel] [PATCH v3 12/47] Return path: Control commands
  2014-08-28 15:03 [Qemu-devel] [PATCH v3 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (10 preceding siblings ...)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 11/47] Migration commands Dr. David Alan Gilbert (git)
@ 2014-08-28 15:03 ` Dr. David Alan Gilbert (git)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 13/47] Return path: Send responses from destination to source Dr. David Alan Gilbert (git)
                   ` (34 subsequent siblings)
  46 siblings, 0 replies; 52+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add two src->dest commands:
   * OPENRP - To request that the destination open the return path
   * REQACK - Request an acknowledge from the destination
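
The destination-side rules for these two commands (as checked in
loadvm_process_command(), with the ACK itself wired up by the following
patch) can be sketched as a pure function. struct dest_state and
handle_vm_command() are hypothetical stand-ins, not QEMU APIs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same values as enum qemu_vm_cmd in this patch */
enum vm_cmd { VM_CMD_INVALID = 0, VM_CMD_OPENRP, VM_CMD_REQACK };

/* Hypothetical stand-in for MigrationIncomingState */
struct dest_state {
    bool rp_open;
};

/*
 * Destination-side rules for the two commands:
 *  - OPENRP carries no data and opens the return path (a second OPENRP
 *    is tolerated, as in the patch);
 *  - REQACK carries a be32 value and needs an open return path; the
 *    value is what the destination echoes back in its ACK.
 * Returns 0 on success, -1 on error; *send_ack/*ack_value say whether
 * (and with what value) an ACK should be sent.
 */
static int handle_vm_command(struct dest_state *st, uint16_t cmd,
                             uint16_t len, const uint8_t *data,
                             bool *send_ack, uint32_t *ack_value)
{
    *send_ack = false;

    switch (cmd) {
    case VM_CMD_OPENRP:
        if (len != 0) {
            return -1;                      /* bad length */
        }
        st->rp_open = true;
        return 0;

    case VM_CMD_REQACK:
        if (len != 4 || !st->rp_open) {
            return -1;                      /* bad length or no RP yet */
        }
        *ack_value = ((uint32_t)data[0] << 24) | ((uint32_t)data[1] << 16) |
                     ((uint32_t)data[2] << 8) | (uint32_t)data[3];
        *send_ack = true;
        return 0;

    default:
        return -1;                          /* unknown command */
    }
}
```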

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  2 ++
 include/sysemu/sysemu.h       |  4 +++
 savevm.c                      | 57 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 63 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index e23947a..173775b 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -45,6 +45,8 @@ typedef struct MigrationState MigrationState;
 /* State for the incoming migration */
 struct MigrationIncomingState {
     QEMUFile *file;
+
+    QEMUFile *return_path;
 };
 
 MigrationIncomingState *migration_incoming_get_current(void);
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index a0a91e3..4dd6ba0 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -84,6 +84,8 @@ void qemu_announce_self(void);
 /* Subcommands for QEMU_VM_COMMAND */
 enum qemu_vm_cmd {
     QEMU_VM_CMD_INVALID = 0,   /* Must be 0 */
+    QEMU_VM_CMD_OPENRP,        /* Tell the dest to open the Return path */
+    QEMU_VM_CMD_REQACK,        /* Request an ACK on the RP */
 
     QEMU_VM_CMD_AFTERLASTVALID
 };
@@ -97,6 +99,8 @@ void qemu_savevm_state_cancel(void);
 uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size);
 void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
                               uint16_t len, uint8_t *data);
+void qemu_savevm_send_reqack(QEMUFile *f, uint32_t value);
+void qemu_savevm_send_openrp(QEMUFile *f);
 int qemu_loadvm_state(QEMUFile *f);
 
 /* SLIRP */
diff --git a/savevm.c b/savevm.c
index 3cae292..793384a 100644
--- a/savevm.c
+++ b/savevm.c
@@ -611,6 +611,19 @@ void qemu_savevm_command_send(QEMUFile *f,
     qemu_fflush(f);
 }
 
+void qemu_savevm_send_reqack(QEMUFile *f, uint32_t value)
+{
+    uint32_t buf;
+
+    DPRINTF("send_reqack %d", value);
+    buf = cpu_to_be32(value);
+    qemu_savevm_command_send(f, QEMU_VM_CMD_REQACK, 4, (uint8_t *)&buf);
+}
+
+void qemu_savevm_send_openrp(QEMUFile *f)
+{
+    qemu_savevm_command_send(f, QEMU_VM_CMD_OPENRP, 0, NULL);
+}
 bool qemu_savevm_state_blocked(Error **errp)
 {
     SaveStateEntry *se;
@@ -900,20 +913,64 @@ static SaveStateEntry *find_se(const char *idstr, int instance_id)
     return NULL;
 }
 
+static int loadvm_process_command_simple_lencheck(const char *name,
+                                                  unsigned int actual,
+                                                  unsigned int expected)
+{
+    if (actual != expected) {
+        error_report("%s received with bad length - expecting %d, got %d",
+                     name, expected, actual);
+        return -1;
+    }
+
+    return 0;
+}
+
 /*
  * Process an incoming 'QEMU_VM_COMMAND'
  * negative return on error (will issue error message)
  */
 static int loadvm_process_command(QEMUFile *f)
 {
+    MigrationIncomingState *mis = migration_incoming_get_current();
     uint16_t com;
     uint16_t len;
+    uint32_t tmp32;
 
     com = qemu_get_be16(f);
     len = qemu_get_be16(f);
 
     /* fprintf(stderr,"loadvm_process_command: com=0x%x len=%d\n", com,len); */
     switch (com) {
+    case QEMU_VM_CMD_OPENRP:
+        if (loadvm_process_command_simple_lencheck("CMD_OPENRP", len, 0)) {
+            return -1;
+        }
+        if (mis->return_path) {
+            error_report("CMD_OPENRP called when RP already open");
+            /* Not really a problem, so don't give up */
+            return 0;
+        }
+        mis->return_path = qemu_file_get_return_path(f);
+        if (!mis->return_path) {
+            error_report("CMD_OPENRP failed - could not open return path");
+            return -1;
+        }
+        break;
+
+    case QEMU_VM_CMD_REQACK:
+        if (loadvm_process_command_simple_lencheck("CMD_REQACK", len, 4)) {
+            return -1;
+        }
+        tmp32 = qemu_get_be32(f);
+        DPRINTF("Received REQACK 0x%x", tmp32);
+        if (!mis->return_path) {
+            error_report("CMD_REQACK (0x%x) received with no open return path",
+                         tmp32);
+            return -1;
+        }
+        /* migrate_send_rp_ack(mis, tmp32); TODO: gets added later */
+        break;
 
     default:
         error_report("VM_COMMAND 0x%x unknown (len 0x%x)", com, len);
-- 
1.9.3

* [Qemu-devel] [PATCH v3 13/47] Return path: Send responses from destination to source
  2014-08-28 15:03 [Qemu-devel] [PATCH v3 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (11 preceding siblings ...)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 12/47] Return path: Control commands Dr. David Alan Gilbert (git)
@ 2014-08-28 15:03 ` Dr. David Alan Gilbert (git)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 14/47] Return path: Source handling of return path Dr. David Alan Gilbert (git)
                   ` (33 subsequent siblings)
  46 siblings, 0 replies; 52+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add migrate_send_rp_message to send a message from destination to
  source along the return path (it uses a mutex so that it can be
  called from multiple threads).
Add migrate_send_rp_shut to send a 'shut' message indicating that
  the destination has finished with the RP.
Add migrate_send_rp_ack to send an 'ack' message, and use it in
  the CMD_REQACK handler.
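
Each return-path message uses the framing that migrate_send_rp_message()
writes: a be16 command, a be16 length, then the data. For SHUT and ACK
the data is a single be32 value. A sketch (helper names hypothetical,
locking left out) of building such a frame in memory:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same values as enum mig_rpcomm_cmd in this patch */
enum rp_cmd { RP_INVALID = 0, RP_SHUT, RP_ACK };

static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
}

static void put_be32(uint8_t *p, uint32_t v)
{
    put_be16(p, (uint16_t)(v >> 16));
    put_be16(p + 2, (uint16_t)v);
}

/*
 * Hypothetical helper: build the frame that migrate_send_rp_shut() or
 * migrate_send_rp_ack() puts on the return path:
 *   be16 cmd | be16 len (= 4) | be32 value
 * Locking is omitted; in the patch, migrate_send_rp_message() holds
 * rp_mutex while writing and flushing a frame, so frames sent from
 * different threads cannot interleave.
 */
static size_t build_rp_u32_frame(uint8_t out[8], enum rp_cmd cmd,
                                 uint32_t value)
{
    put_be16(out, (uint16_t)cmd);
    put_be16(out + 2, 4);
    put_be32(out + 4, value);
    return 8;
}
```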

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h | 18 ++++++++++++++++++
 migration.c                   | 41 +++++++++++++++++++++++++++++++++++++++++
 savevm.c                      |  2 +-
 3 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 173775b..12e640d 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -40,6 +40,13 @@ struct MigrationParams {
     bool shared;
 };
 
+/* Commands sent on the return path from destination to source */
+enum mig_rpcomm_cmd {
+    MIG_RPCOMM_INVALID = 0,  /* Must be 0 */
+    MIG_RPCOMM_SHUT,         /* sibling will not send any more RP messages */
+    MIG_RPCOMM_ACK,          /* data (seq: be32 ) */
+    MIG_RPCOMM_AFTERLASTVALID
+};
 typedef struct MigrationState MigrationState;
 
 /* State for the incoming migration */
@@ -47,6 +54,7 @@ struct MigrationIncomingState {
     QEMUFile *file;
 
     QEMUFile *return_path;
+    QemuMutex      rp_mutex;    /* We send replies from multiple threads */
 };
 
 MigrationIncomingState *migration_incoming_get_current(void);
@@ -168,6 +176,16 @@ int64_t migrate_xbzrle_cache_size(void);
 
 int64_t xbzrle_cache_resize(int64_t new_size);
 
+/* Sending on the return path - generic and then for each message type */
+void migrate_send_rp_message(MigrationIncomingState *mis,
+                             enum mig_rpcomm_cmd cmd,
+                             uint16_t len, uint8_t *data);
+void migrate_send_rp_shut(MigrationIncomingState *mis,
+                          uint32_t value);
+void migrate_send_rp_ack(MigrationIncomingState *mis,
+                         uint32_t value);
+
+
 void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_load_hook(QEMUFile *f, uint64_t flags);
diff --git a/migration.c b/migration.c
index ac46ddb..5ba8f3e 100644
--- a/migration.c
+++ b/migration.c
@@ -90,6 +90,7 @@ MigrationIncomingState *migration_incoming_state_init(QEMUFile* f)
 {
     mis_current = g_malloc0(sizeof(MigrationIncomingState));
     mis_current->file = f;
+    qemu_mutex_init(&mis_current->rp_mutex);
 
     return mis_current;
 }
@@ -100,6 +101,46 @@ void migration_incoming_state_destroy(void)
     mis_current = NULL;
 }
 
+/* Send a message on the return channel back to the source
+ * of the migration.
+ */
+void migrate_send_rp_message(MigrationIncomingState *mis,
+                             enum mig_rpcomm_cmd cmd,
+                             uint16_t len, uint8_t *data)
+{
+    DPRINTF("migrate_send_rp_message: cmd=%d, len=%d\n", (int)cmd, len);
+    qemu_mutex_lock(&mis->rp_mutex);
+    qemu_put_be16(mis->return_path, (unsigned int)cmd);
+    qemu_put_be16(mis->return_path, len);
+    qemu_put_buffer(mis->return_path, data, len);
+    qemu_fflush(mis->return_path);
+    qemu_mutex_unlock(&mis->rp_mutex);
+}
+
+/*
+ * Send a 'SHUT' message on the return channel with the given value
+ * to indicate that we've finished with the RP.  A non-zero value
+ * indicates an error.
+ */
+void migrate_send_rp_shut(MigrationIncomingState *mis,
+                          uint32_t value)
+{
+    uint32_t buf;
+
+    buf = cpu_to_be32(value);
+    migrate_send_rp_message(mis, MIG_RPCOMM_SHUT, 4, (uint8_t *)&buf);
+}
+
+/* Send an 'ACK' message on the return channel with the given value */
+void migrate_send_rp_ack(MigrationIncomingState *mis,
+                         uint32_t value)
+{
+    uint32_t buf;
+
+    buf = cpu_to_be32(value);
+    migrate_send_rp_message(mis, MIG_RPCOMM_ACK, 4, (uint8_t *)&buf);
+}
+
 void qemu_start_incoming_migration(const char *uri, Error **errp)
 {
     const char *p;
diff --git a/savevm.c b/savevm.c
index 793384a..8eebbfd 100644
--- a/savevm.c
+++ b/savevm.c
@@ -969,7 +969,7 @@ static int loadvm_process_command(QEMUFile *f)
                          tmp32);
             return -1;
         }
-        /* migrate_send_rp_ack(mis, tmp32); TODO: gets added later */
+        migrate_send_rp_ack(mis, tmp32);
         break;
 
     default:
-- 
1.9.3

* [Qemu-devel] [PATCH v3 14/47] Return path: Source handling of return path
  2014-08-28 15:03 [Qemu-devel] [PATCH v3 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (12 preceding siblings ...)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 13/47] Return path: Send responses from destination to source Dr. David Alan Gilbert (git)
@ 2014-08-28 15:03 ` Dr. David Alan Gilbert (git)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 15/47] qemu_loadvm errors and debug Dr. David Alan Gilbert (git)
                   ` (32 subsequent siblings)
  46 siblings, 0 replies; 52+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Open a return path, and handle messages that are received upon it.
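
The parsing done by source_return_path_thread() can be sketched over an
in-memory buffer instead of a QEMUFile. This hypothetical
process_rp_stream() requires each payload to be exactly the expected
length, since a shorter one would leave part of the be32 value unread:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same values as enum mig_rpcomm_cmd */
enum rp_cmd { RP_INVALID = 0, RP_SHUT, RP_ACK };

static uint16_t get_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical, buffer-based version of the source's return-path loop:
 * read a be16 command and a be16 length, insist on the exact payload
 * length for that command, then dispatch.  ACK updates *latest_ack;
 * SHUT ends the stream, and a non-zero SHUT value means the destination
 * failed.  Returns 0 on a clean SHUT, -1 on any error or early EOF.
 */
static int process_rp_stream(const uint8_t *in, size_t n,
                             uint32_t *latest_ack)
{
    size_t pos = 0;

    while (n - pos >= 4) {
        uint16_t cmd = get_be16(in + pos);
        uint16_t len = get_be16(in + pos + 2);
        uint32_t value;

        pos += 4;
        if (cmd != RP_SHUT && cmd != RP_ACK) {
            return -1;                  /* unknown command */
        }
        if (len != 4 || n - pos < len) {
            return -1;                  /* wrong or truncated payload */
        }
        value = get_be32(in + pos);
        pos += len;

        if (cmd == RP_ACK) {
            *latest_ack = value;
        } else {
            return value ? -1 : 0;      /* SHUT: 0 means a clean finish */
        }
    }
    return -1;                          /* stream ended without a SHUT */
}
```

In the patch the same loop runs in its own joinable thread, and an ACK
is published with atomic_xchg() into rp_state.latest_ack so that the
migration thread can read it.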

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  10 +++
 migration.c                   | 156 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 165 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 12e640d..b87c289 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -47,6 +47,14 @@ enum mig_rpcomm_cmd {
     MIG_RPCOMM_ACK,          /* data (seq: be32 ) */
     MIG_RPCOMM_AFTERLASTVALID
 };
+
+/* Source side RP state */
+struct MigrationRetPathState {
+    uint32_t      latest_ack;
+    QemuThread    rp_thread;
+    bool          error;
+};
+
 typedef struct MigrationState MigrationState;
 
 /* State for the incoming migration */
@@ -69,9 +77,11 @@ struct MigrationState
     QemuThread thread;
     QEMUBH *cleanup_bh;
     QEMUFile *file;
+    QEMUFile *return_path;
 
     int state;
     MigrationParams params;
+    struct MigrationRetPathState rp_state;
     double mbps;
     int64_t total_time;
     int64_t downtime;
diff --git a/migration.c b/migration.c
index 5ba8f3e..1754b67 100644
--- a/migration.c
+++ b/migration.c
@@ -371,6 +371,15 @@ static void migrate_set_state(MigrationState *s, int old_state, int new_state)
     }
 }
 
+static void migrate_fd_cleanup_src_rp(MigrationState *ms)
+{
+    if (ms->return_path) {
+        DPRINTF("cleaning up return path\n");
+        qemu_fclose(ms->return_path);
+        ms->return_path = NULL;
+    }
+}
+
 static void migrate_fd_cleanup(void *opaque)
 {
     MigrationState *s = opaque;
@@ -378,6 +387,8 @@ static void migrate_fd_cleanup(void *opaque)
     qemu_bh_delete(s->cleanup_bh);
     s->cleanup_bh = NULL;
 
+    migrate_fd_cleanup_src_rp(s);
+
     if (s->file) {
         trace_migrate_fd_cleanup();
         qemu_mutex_unlock_iothread();
@@ -414,6 +425,11 @@ static void migrate_fd_cancel(MigrationState *s)
     int old_state ;
     trace_migrate_fd_cancel();
 
+    if (s->return_path) {
+        /* Shut down the RP socket, causing the RP thread to exit */
+        qemu_file_shutdown(s->return_path);
+    }
+
     do {
         old_state = s->state;
         if (old_state != MIG_STATE_SETUP && old_state != MIG_STATE_ACTIVE) {
@@ -655,8 +671,146 @@ int64_t migrate_xbzrle_cache_size(void)
     return s->xbzrle_cache_size;
 }
 
-/* migration thread support */
+/*
+ * Something bad happened to the RP stream; mark an error.
+ * The caller should print something to indicate why.
+ */
+static void source_return_path_bad(MigrationState *s)
+{
+    s->rp_state.error = true;
+    migrate_fd_cleanup_src_rp(s);
+}
+
+/*
+ * Handles messages sent on the return path towards the source VM
+ *
+ */
+static void *source_return_path_thread(void *opaque)
+{
+    MigrationState *ms = opaque;
+    QEMUFile *rp = ms->return_path;
+    uint16_t expected_len, header_len, header_com;
+    const int max_len = 512;
+    uint8_t buf[max_len];
+    uint32_t tmp32;
+    int res;
+
+    DPRINTF("RP: %s entry", __func__);
+    while (rp && !qemu_file_get_error(rp) &&
+        migration_already_active(ms)) {
+        DPRINTF("RP: %s top of loop", __func__);
+        header_com = qemu_get_be16(rp);
+        header_len = qemu_get_be16(rp);
+
+        switch (header_com) {
+        case MIG_RPCOMM_SHUT:
+        case MIG_RPCOMM_ACK:
+            expected_len = 4;
+            break;
+
+        default:
+            error_report("RP: Received invalid cmd 0x%04x length 0x%04x",
+                    header_com, header_len);
+            source_return_path_bad(ms);
+            goto out;
+        }
+
+        if (header_len != expected_len) {
+            error_report("RP: Received command 0x%04x with "
+                    "incorrect length %d expecting %d",
+                    header_com, header_len,
+                    expected_len);
+            source_return_path_bad(ms);
+            goto out;
+        }
+
+        /* We know we've got a valid header by this point */
+        res = qemu_get_buffer(rp, buf, header_len);
+        if (res != header_len) {
+            DPRINTF("RP: Failed to read command data");
+            source_return_path_bad(ms);
+            goto out;
+        }
+
+        /* OK, we have the command and the data */
+        switch (header_com) {
+        case MIG_RPCOMM_SHUT:
+            tmp32 = be32_to_cpup((uint32_t *)buf);
+            if (tmp32) {
+                error_report("RP: Sibling indicated error %d", tmp32);
+                source_return_path_bad(ms);
+            } else {
+                DPRINTF("RP: SHUT received");
+            }
+            /*
+             * We'll let the main thread deal with closing the RP;
+             * we could do a shutdown(2) on it, but we're the only user
+             * anyway, so there's nothing to be gained.
+             */
+            goto out;
+
+        case MIG_RPCOMM_ACK:
+            tmp32 = be32_to_cpup((uint32_t *)buf);
+            DPRINTF("RP: Received ACK 0x%x", tmp32);
+            atomic_xchg(&ms->rp_state.latest_ack, tmp32);
+            break;
+
+        default:
+            /* This shouldn't happen because we should catch this above */
+            DPRINTF("RP: Bad header_com in dispatch");
+        }
+        /* Latest command processed, now leave a gap for the next one */
+        header_com = MIG_RPCOMM_INVALID;
+    }
+    if (rp && qemu_file_get_error(rp)) {
+        DPRINTF("%s: rp bad at end", __func__);
+        source_return_path_bad(ms);
+    }
+
+    DPRINTF("%s: Bottom exit", __func__);
+
+out:
+    return NULL;
+}
+
+static int open_outgoing_return_path(MigrationState *ms)
+{
+
+    ms->return_path = qemu_file_get_return_path(ms->file);
+    if (!ms->return_path) {
+        return -1;
+    }
+
+    DPRINTF("%s: starting thread", __func__);
+    qemu_thread_create(&ms->rp_state.rp_thread, "return path",
+                       source_return_path_thread, ms, QEMU_THREAD_JOINABLE);
+
+    DPRINTF("%s: continuing", __func__);
+
+    return 0;
+}
+
+static void await_outgoing_return_path_close(MigrationState *ms)
+{
+    /*
+     * If this is a normal exit, the destination will send a SHUT and the
+     * rp_thread will exit; however, if there's an error we need to force
+     * it to exit, which we can do with a shutdown.
+     * (canceling must also shutdown to stop us getting stuck here if
+     * the destination died at just the wrong place)
+     */
+    if (qemu_file_get_error(ms->file) && ms->return_path) {
+        qemu_file_shutdown(ms->return_path);
+    }
+    DPRINTF("%s: Joining", __func__);
+    qemu_thread_join(&ms->rp_state.rp_thread);
+    DPRINTF("%s: Exit", __func__);
+}
 
+/*
+ * Master migration thread on the source VM.
+ * It drives the migration and pumps the data down the outgoing channel.
+ */
 static void *migration_thread(void *opaque)
 {
     MigrationState *s = opaque;
-- 
1.9.3

* [Qemu-devel] [PATCH v3 15/47] qemu_loadvm errors and debug
  2014-08-28 15:03 [Qemu-devel] [PATCH v3 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (13 preceding siblings ...)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 14/47] Return path: Source handling of return path Dr. David Alan Gilbert (git)
@ 2014-08-28 15:03 ` Dr. David Alan Gilbert (git)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 16/47] ram_debug_dump_bitmap: Dump a migration bitmap as text Dr. David Alan Gilbert (git)
                   ` (31 subsequent siblings)
  46 siblings, 0 replies; 52+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Convert many fprintf() calls to error_report().
Add lots of DPRINTF debug output in qemu_loadvm*.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 savevm.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/savevm.c b/savevm.c
index 8eebbfd..2c0d61a 100644
--- a/savevm.c
+++ b/savevm.c
@@ -719,6 +719,8 @@ int qemu_savevm_state_iterate(QEMUFile *f)
         trace_savevm_section_end(se->idstr, se->section_id);
 
         if (ret < 0) {
+            DPRINTF("%s: setting error state after iterate on id=%d/%s",
+                    __func__, se->section_id, se->idstr);
             qemu_file_set_error(f, ret);
         }
         if (ret <= 0) {
@@ -1019,6 +1021,7 @@ int qemu_loadvm_state(QEMUFile *f)
         SaveStateEntry *se;
         char idstr[256];
 
+        DPRINTF("qemu_loadvm_state loop: section_type=%d", section_type);
         switch (section_type) {
         case QEMU_VM_SECTION_START:
         case QEMU_VM_SECTION_FULL:
@@ -1032,6 +1035,9 @@ int qemu_loadvm_state(QEMUFile *f)
             instance_id = qemu_get_be32(f);
             version_id = qemu_get_be32(f);
 
+            DPRINTF("qemu_loadvm_state loop START/FULL: id=%d(%s)",
+                    section_id, idstr);
+
             /* Find savevm section */
             se = find_se(idstr, instance_id);
             if (se == NULL) {
@@ -1059,8 +1065,9 @@ int qemu_loadvm_state(QEMUFile *f)
 
             ret = vmstate_load(f, le->se, le->version_id);
             if (ret < 0) {
-                fprintf(stderr, "qemu: warning: error while loading state for instance 0x%x of device '%s'\n",
-                        instance_id, idstr);
+                error_report("qemu: error while loading state for "
+                             "instance 0x%x of device '%s'",
+                             instance_id, idstr);
                 goto out;
             }
             break;
@@ -1068,23 +1075,25 @@ int qemu_loadvm_state(QEMUFile *f)
         case QEMU_VM_SECTION_END:
             section_id = qemu_get_be32(f);
 
+            DPRINTF("QEMU_VM_SECTION_PART/END entry for id=%d", section_id);
             QLIST_FOREACH(le, &loadvm_handlers, entry) {
                 if (le->section_id == section_id) {
                     break;
                 }
             }
             if (le == NULL) {
-                fprintf(stderr, "Unknown savevm section %d\n", section_id);
+                error_report("Unknown savevm section %d", section_id);
                 ret = -EINVAL;
                 goto out;
             }
 
             ret = vmstate_load(f, le->se, le->version_id);
             if (ret < 0) {
-                fprintf(stderr, "qemu: warning: error while loading state section id %d\n",
-                        section_id);
+                error_report("qemu: error while loading state section"
+                             " id %d (%s)", section_id, le->se->idstr);
                 goto out;
             }
+            DPRINTF("QEMU_VM_SECTION_PART/END done for id=%d", section_id);
             break;
         case QEMU_VM_COMMAND:
             ret = loadvm_process_command(f);
@@ -1093,11 +1102,12 @@ int qemu_loadvm_state(QEMUFile *f)
             }
             break;
         default:
-            fprintf(stderr, "Unknown savevm section type %d\n", section_type);
+            error_report("Unknown savevm section type %d", section_type);
             ret = -EINVAL;
             goto out;
         }
     }
+    DPRINTF("qemu_loadvm_state loop: exited loop");
 
     cpu_synchronize_all_post_init();
 
@@ -1113,6 +1123,7 @@ out:
         ret = qemu_file_get_error(f);
     }
 
+    DPRINTF("qemu_loadvm_state out: ret=%d", ret);
     return ret;
 }
 
-- 
1.9.3


* [Qemu-devel] [PATCH v3 16/47] ram_debug_dump_bitmap: Dump a migration bitmap as text
  2014-08-28 15:03 [Qemu-devel] [PATCH v3 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (14 preceding siblings ...)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 15/47] qemu_loadvm errors and debug Dr. David Alan Gilbert (git)
@ 2014-08-28 15:03 ` Dr. David Alan Gilbert (git)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 17/47] Rework loadvm path for subloops Dr. David Alan Gilbert (git)
                   ` (30 subsequent siblings)
  46 siblings, 0 replies; 52+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Skips lines whose bits all have the expected value, so the output
can be quite compact, depending on how sparse the bitmap is.
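
For illustration only, here is a standalone sketch of the same idea outside
QEMU. It assumes the bitmap is a plain array of unsigned long, and it uses a
hypothetical bit_is_set() helper in place of QEMU's test_bit().
dump_bitmap() is also a made-up name. It returns the number of rows it
printed, which makes the row-skipping easy to check:

```c
#include <assert.h>
#include <inttypes.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define LINE_LEN 128

/* Hypothetical stand-in for QEMU's test_bit(): is bit 'nr' set? */
static bool bit_is_set(const unsigned long *map, int64_t nr)
{
    return (map[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL;
}

/*
 * Print the bitmap as rows of up to LINE_LEN characters ('1' set, '.' clear),
 * skipping rows whose bits all equal 'expected'.  Returns rows printed.
 */
static int dump_bitmap(FILE *out, const unsigned long *map, int64_t nbits,
                       bool expected)
{
    char linebuf[LINE_LEN + 1];
    int printed = 0;

    for (int64_t cur = 0; cur < nbits; cur += LINE_LEN) {
        /* The last row may be shorter than LINE_LEN */
        int64_t len = (nbits - cur < LINE_LEN) ? nbits - cur : LINE_LEN;
        bool interesting = false;

        for (int64_t i = 0; i < len; i++) {
            bool bit = bit_is_set(map, cur + i);
            linebuf[i] = bit ? '1' : '.';
            interesting |= (bit != expected);
        }
        if (interesting) {
            linebuf[len] = '\0';
            fprintf(out, "0x%08" PRIx64 " : %s\n", (uint64_t)cur, linebuf);
            printed++;
        }
    }
    return printed;
}
```

With a 300-bit map that has only bit 130 set and expected == false, only the
row starting at 0x00000080 is printed.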

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c                   | 39 +++++++++++++++++++++++++++++++++++++++
 include/migration/migration.h |  1 +
 2 files changed, 40 insertions(+)

diff --git a/arch_init.c b/arch_init.c
index d2b565e..2568369 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -767,6 +767,45 @@ static void reset_ram_globals(void)
 
 #define MAX_WAIT 50 /* ms, half buffered_file limit */
 
+/*
+ * 'expected' is the value you expect most of the bitmap to hold;
+ * lines consisting entirely of that value are not printed.
+ * If 'todump' is NULL, the migration bitmap is dumped.
+ */
+void ram_debug_dump_bitmap(unsigned long *todump, bool expected)
+{
+    int64_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
+
+    int64_t cur;
+    int64_t linelen = 128l;
+    char linebuf[129];
+
+    if (!todump) {
+        todump = migration_bitmap;
+    }
+
+    for (cur = 0; cur < ram_pages; cur += linelen) {
+        int64_t curb;
+        bool found = false;
+        /*
+         * Last line; catch the case where the line length
+         * is longer than remaining ram
+         */
+        if (cur+linelen > ram_pages) {
+            linelen = ram_pages - cur;
+        }
+        for (curb = 0; curb < linelen; curb++) {
+            bool thisbit = test_bit(cur+curb, todump);
+            linebuf[curb] = thisbit ? '1' : '.';
+            found |= (thisbit ^ expected);
+        }
+        if (found) {
+            linebuf[curb] = '\0';
+            fprintf(stderr, "0x%08" PRIx64 " : %s\n", cur, linebuf);
+        }
+    }
+}
+
 static int ram_save_setup(QEMUFile *f, void *opaque)
 {
     RAMBlock *block;
diff --git a/include/migration/migration.h b/include/migration/migration.h
index b87c289..ff47987 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -157,6 +157,7 @@ uint64_t xbzrle_mig_pages_cache_miss(void);
 double xbzrle_mig_cache_miss_rate(void);
 
 void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
+void ram_debug_dump_bitmap(unsigned long *todump, bool expected);
 
 /**
  * @migrate_add_blocker - prevent migration from proceeding
-- 
1.9.3


* [Qemu-devel] [PATCH v3 17/47] Rework loadvm path for subloops
  2014-08-28 15:03 [Qemu-devel] [PATCH v3 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (15 preceding siblings ...)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 16/47] ram_debug_dump_bitmap: Dump a migration bitmap as text Dr. David Alan Gilbert (git)
@ 2014-08-28 15:03 ` Dr. David Alan Gilbert (git)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 18/47] Add migration-capability boolean for postcopy-ram Dr. David Alan Gilbert (git)
                   ` (29 subsequent siblings)
  46 siblings, 0 replies; 52+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Postcopy needs to have two migration streams loading concurrently;
one from memory (with the device state) and the other from the fd
with the memory transactions.

Split the core of qemu_loadvm_state out so we can use it for both.

Allow the inner loadvm loop to quit and signal whether the parent
should.

loadvm_handlers is made static since its lifetime is greater
than that of the outer qemu_loadvm_state.
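
Here is a minimal model of how the ORable exit codes might propagate. It
assumes that a QUITPARENT flag seen by the inner loop is meant to reach the
parent as QUITLOOP. The EXIT_* names and inner_loop() are hypothetical
stand-ins for LOADVM_EXITCODE_* and qemu_loadvm_state_main(). An array of
integers stands in for the return values of loadvm_process_command():

```c
#include <assert.h>

/* Hypothetical mirror of the LOADVM_EXITCODE_* values; ORable flags */
enum {
    EXIT_QUITLOOP     = 1,
    EXIT_QUITPARENT   = 2,
    EXIT_KEEPHANDLERS = 4,
};

/*
 * Fold the flags returned by each processed command into one exit code.
 * Negative values are errors and are returned immediately.
 */
static int inner_loop(const int *cmd_results, int n)
{
    int exitcode = 0;

    for (int i = 0; i < n; i++) {
        int ret = cmd_results[i];

        if (ret < 0 || (ret & EXIT_QUITLOOP)) {
            return ret;          /* error, or this loop was told to stop */
        }
        exitcode |= ret;         /* remember flags for the parent */
    }
    /* This loop is done; ask the parent to quit its own loop too */
    if (exitcode & EXIT_QUITPARENT) {
        exitcode &= ~EXIT_QUITPARENT;
        exitcode |= EXIT_QUITLOOP;
    }
    return exitcode;
}
```

KEEPHANDLERS passes through untouched, so the outermost caller can still
decide whether to free the handler list.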

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 savevm.c | 136 +++++++++++++++++++++++++++++++++++++++------------------------
 1 file changed, 84 insertions(+), 52 deletions(-)

diff --git a/savevm.c b/savevm.c
index 2c0d61a..7236232 100644
--- a/savevm.c
+++ b/savevm.c
@@ -915,6 +915,26 @@ static SaveStateEntry *find_se(const char *idstr, int instance_id)
     return NULL;
 }
 
+/* These are ORable flags */
+const int LOADVM_EXITCODE_QUITLOOP     =  1;
+const int LOADVM_EXITCODE_QUITPARENT   =  2;
+const int LOADVM_EXITCODE_KEEPHANDLERS =  4;
+
+typedef struct LoadStateEntry {
+    QLIST_ENTRY(LoadStateEntry) entry;
+    SaveStateEntry *se;
+    int section_id;
+    int version_id;
+} LoadStateEntry;
+
+typedef QLIST_HEAD(, LoadStateEntry) LoadStateEntry_Head;
+
+static LoadStateEntry_Head loadvm_handlers =
+ QLIST_HEAD_INITIALIZER(loadvm_handlers);
+
+static int qemu_loadvm_state_main(QEMUFile *f,
+                                  LoadStateEntry_Head *loadvm_handlers);
+
 static int loadvm_process_command_simple_lencheck(const char *name,
                                                   unsigned int actual,
                                                   unsigned int expected)
@@ -931,8 +951,11 @@ static int loadvm_process_command_simple_lencheck(const char *name,
 /*
  * Process an incoming 'QEMU_VM_COMMAND'
  * negative return on error (will issue error message)
+ * 0   just a normal return
+ * 1   All good, but exit the loop
  */
-static int loadvm_process_command(QEMUFile *f)
+static int loadvm_process_command(QEMUFile *f,
+                                  LoadStateEntry_Head *loadvm_handlers)
 {
     MigrationIncomingState *mis = migration_incoming_get_current();
     uint16_t com;
@@ -982,39 +1005,13 @@ static int loadvm_process_command(QEMUFile *f)
     return 0;
 }
 
-typedef struct LoadStateEntry {
-    QLIST_ENTRY(LoadStateEntry) entry;
-    SaveStateEntry *se;
-    int section_id;
-    int version_id;
-} LoadStateEntry;
-
-int qemu_loadvm_state(QEMUFile *f)
+static int qemu_loadvm_state_main(QEMUFile *f,
+                                  LoadStateEntry_Head *loadvm_handlers)
 {
-    QLIST_HEAD(, LoadStateEntry) loadvm_handlers =
-        QLIST_HEAD_INITIALIZER(loadvm_handlers);
-    LoadStateEntry *le, *new_le;
+    LoadStateEntry *le;
     uint8_t section_type;
-    unsigned int v;
     int ret;
-
-    if (qemu_savevm_state_blocked(NULL)) {
-        return -EINVAL;
-    }
-
-    v = qemu_get_be32(f);
-    if (v != QEMU_VM_FILE_MAGIC) {
-        return -EINVAL;
-    }
-
-    v = qemu_get_be32(f);
-    if (v == QEMU_VM_FILE_VERSION_COMPAT) {
-        error_report("SaveVM v2 format is obsolete and don't work anymore");
-        return -ENOTSUP;
-    }
-    if (v != QEMU_VM_FILE_VERSION) {
-        return -ENOTSUP;
-    }
+    int exitcode = 0;
 
     while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
         uint32_t instance_id, version_id, section_id;
@@ -1043,16 +1040,14 @@ int qemu_loadvm_state(QEMUFile *f)
             if (se == NULL) {
                 error_report("Unknown savevm section or instance '%s' %d",
                              idstr, instance_id);
-                ret = -EINVAL;
-                goto out;
+                return -EINVAL;
             }
 
             /* Validate version */
             if (version_id > se->version_id) {
                 error_report("savevm: unsupported version %d for '%s' v%d",
                         version_id, idstr, se->version_id);
-                ret = -EINVAL;
-                goto out;
+                return -EINVAL;
             }
 
             /* Add entry */
@@ -1061,14 +1056,14 @@ int qemu_loadvm_state(QEMUFile *f)
             le->se = se;
             le->section_id = section_id;
             le->version_id = version_id;
-            QLIST_INSERT_HEAD(&loadvm_handlers, le, entry);
+            QLIST_INSERT_HEAD(loadvm_handlers, le, entry);
 
             ret = vmstate_load(f, le->se, le->version_id);
             if (ret < 0) {
                 error_report("qemu: error while loading state for"
                              "instance 0x%x of device '%s'",
                              instance_id, idstr);
-                goto out;
+                return ret;
             }
             break;
         case QEMU_VM_SECTION_PART:
@@ -1076,47 +1071,84 @@ int qemu_loadvm_state(QEMUFile *f)
             section_id = qemu_get_be32(f);
 
             DPRINTF("QEMU_VM_SECTION_PART/END entry for id=%d", section_id);
-            QLIST_FOREACH(le, &loadvm_handlers, entry) {
+            QLIST_FOREACH(le, loadvm_handlers, entry) {
                 if (le->section_id == section_id) {
                     break;
                 }
             }
             if (le == NULL) {
                 error_report("Unknown savevm section %d", section_id);
-                ret = -EINVAL;
-                goto out;
+                return -EINVAL;
             }
 
             ret = vmstate_load(f, le->se, le->version_id);
             if (ret < 0) {
                 error_report("qemu: error while loading state section"
                              " id %d (%s)", section_id, le->se->idstr);
-                goto out;
+                return ret;
             }
             DPRINTF("QEMU_VM_SECTION_PART/END done for id=%d", section_id);
             break;
         case QEMU_VM_COMMAND:
-            ret = loadvm_process_command(f);
-            if (ret < 0) {
-                goto out;
+            ret = loadvm_process_command(f, loadvm_handlers);
+            DPRINTF("%s QEMU_VM_COMMAND ret: %d", __func__, ret);
+            if ((ret < 0) || (ret & LOADVM_EXITCODE_QUITLOOP)) {
+                return ret;
             }
+            exitcode |= ret; /* Lets us pass flags up to the parent */
             break;
         default:
             error_report("Unknown savevm section type %d", section_type);
-            ret = -EINVAL;
-            goto out;
+            return -EINVAL;
         }
     }
     DPRINTF("qemu_loadvm_state loop: exited loop");
 
-    cpu_synchronize_all_post_init();
+    if (exitcode & LOADVM_EXITCODE_QUITPARENT) {
+        DPRINTF("%s: End of loop with QUITPARENT", __func__);
+        exitcode &= ~LOADVM_EXITCODE_QUITPARENT;
+        exitcode |= LOADVM_EXITCODE_QUITLOOP;
+    }
+
+    return exitcode;
+}
+
+int qemu_loadvm_state(QEMUFile *f)
+{
+    LoadStateEntry *le, *new_le;
+    unsigned int v;
+    int ret;
+
+    if (qemu_savevm_state_blocked(NULL)) {
+        return -EINVAL;
+    }
+
+    v = qemu_get_be32(f);
+    if (v != QEMU_VM_FILE_MAGIC) {
+        return -EINVAL;
+    }
 
-    ret = 0;
+    v = qemu_get_be32(f);
+    if (v == QEMU_VM_FILE_VERSION_COMPAT) {
+        error_report("SaveVM v2 format is obsolete and don't work anymore");
+        return -ENOTSUP;
+    }
+    if (v != QEMU_VM_FILE_VERSION) {
+        return -ENOTSUP;
+    }
+
+    QLIST_INIT(&loadvm_handlers);
+    ret = qemu_loadvm_state_main(f, &loadvm_handlers);
 
-out:
-    QLIST_FOREACH_SAFE(le, &loadvm_handlers, entry, new_le) {
-        QLIST_REMOVE(le, entry);
-        g_free(le);
+    if (ret == 0) {
+        cpu_synchronize_all_post_init();
+    }
+
+    if ((ret < 0) || !(ret & LOADVM_EXITCODE_KEEPHANDLERS)) {
+        QLIST_FOREACH_SAFE(le, &loadvm_handlers, entry, new_le) {
+            QLIST_REMOVE(le, entry);
+            g_free(le);
+        }
     }
 
     if (ret == 0) {
-- 
1.9.3


* [Qemu-devel] [PATCH v3 18/47] Add migration-capability boolean for postcopy-ram.
  2014-08-28 15:03 [Qemu-devel] [PATCH v3 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (16 preceding siblings ...)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 17/47] Rework loadvm path for subloops Dr. David Alan Gilbert (git)
@ 2014-08-28 15:03 ` Dr. David Alan Gilbert (git)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 19/47] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages Dr. David Alan Gilbert (git)
                   ` (28 subsequent siblings)
  46 siblings, 0 replies; 52+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 include/migration/migration.h | 1 +
 migration.c                   | 9 +++++++++
 qapi-schema.json              | 6 +++++-
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index ff47987..0d9f62d 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -173,6 +173,7 @@ void migrate_add_blocker(Error *reason);
  */
 void migrate_del_blocker(Error *reason);
 
+bool migrate_postcopy_ram(void);
 bool migrate_rdma_pin_all(void);
 bool migrate_zero_blocks(void);
 
diff --git a/migration.c b/migration.c
index 1754b67..2fa128f 100644
--- a/migration.c
+++ b/migration.c
@@ -635,6 +635,15 @@ bool migrate_rdma_pin_all(void)
     return s->enabled_capabilities[MIGRATION_CAPABILITY_RDMA_PIN_ALL];
 }
 
+bool migrate_postcopy_ram(void)
+{
+    MigrationState *s;
+
+    s = migrate_get_current();
+
+    return s->enabled_capabilities[MIGRATION_CAPABILITY_X_POSTCOPY_RAM];
+}
+
 bool migrate_auto_converge(void)
 {
     MigrationState *s;
diff --git a/qapi-schema.json b/qapi-schema.json
index 341f417..77e57bd 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -491,10 +491,14 @@
 # @auto-converge: If enabled, QEMU will automatically throttle down the guest
 #          to speed up convergence of RAM migration. (since 1.6)
 #
+# @x-postcopy-ram: Start executing on the migration target before all of RAM has been
+#          migrated, pulling the remaining pages along as needed. NOTE: If the
+#          migration fails during postcopy the VM will fail.  (since 2.2)
+#
 # Since: 1.2
 ##
 { 'enum': 'MigrationCapability',
-  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks'] }
+  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks', 'x-postcopy-ram'] }
 
 ##
 # @MigrationCapabilityStatus
-- 
1.9.3


* [Qemu-devel] [PATCH v3 19/47] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages.
  2014-08-28 15:03 [Qemu-devel] [PATCH v3 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (17 preceding siblings ...)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 18/47] Add migration-capability boolean for postcopy-ram Dr. David Alan Gilbert (git)
@ 2014-08-28 15:03 ` Dr. David Alan Gilbert (git)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 20/47] QEMU_VM_CMD_PACKAGED: Send a packaged chunk of migration stream Dr. David Alan Gilbert (git)
                   ` (27 subsequent siblings)
  46 siblings, 0 replies; 52+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add state variable showing current incoming postcopy state.
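
The handlers added below enforce a small set of state transitions. The
following sketch models them with hypothetical names (PC_*, CMD_*,
postcopy_step()). One part is an assumption: the move to END on CMD_END,
because the END handler in this patch is still a TODO:

```c
#include <assert.h>

typedef enum {
    PC_NONE = 0, PC_ADVISE, PC_LISTENING, PC_RUNNING, PC_END
} PostcopyState;

typedef enum {
    CMD_ADVISE, CMD_DISCARD, CMD_LISTEN, CMD_RUN, CMD_END
} PostcopyCmd;

/* Returns 0 and updates *state on a legal transition, -1 otherwise. */
static int postcopy_step(PostcopyState *state, PostcopyCmd cmd)
{
    PostcopyState next;

    switch (cmd) {
    case CMD_ADVISE:            /* must be the first postcopy command */
        if (*state != PC_NONE) {
            return -1;
        }
        next = PC_ADVISE;
        break;
    case CMD_DISCARD:           /* 0..many, only between ADVISE and LISTEN */
        if (*state != PC_ADVISE) {
            return -1;
        }
        next = PC_ADVISE;
        break;
    case CMD_LISTEN:
        if (*state != PC_ADVISE) {
            return -1;
        }
        next = PC_LISTENING;
        break;
    case CMD_RUN:
        if (*state != PC_LISTENING) {
            return -1;
        }
        next = PC_RUNNING;
        break;
    case CMD_END:               /* assumption: legal once ADVISE was seen */
        if (*state == PC_NONE) {
            return -1;
        }
        next = PC_END;
        break;
    default:
        return -1;
    }
    *state = next;
    return 0;
}
```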

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
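This standalone sketch shows how one (address, mask) pair of a
CMD_POSTCOPY_RAM_DISCARD message could be turned into inclusive page ranges.
It follows the ctz/cto walk in loadvm_postcopy_ram_handle_discard().
decode_discard_word() and the loop-based ctz/cto helpers are hypothetical
substitutes for QEMU's ctz64()/cto64(). Page numbers are in source target
pages, relative to the start of the RAMBlock:

```c
#include <assert.h>
#include <stdint.h>

/* Count trailing zero bits; v must be non-zero. */
static int ctz64_loop(uint64_t v)
{
    int n = 0;

    while (!(v & 1)) {
        v >>= 1;
        n++;
    }
    return n;
}

/* Count trailing one bits; returns 64 if v is all ones. */
static int cto64_loop(uint64_t v)
{
    int n = 0;

    while (n < 64 && (v & 1)) {
        v >>= 1;
        n++;
    }
    return n;
}

/*
 * Write each run of set bits in 'mask' as an inclusive [first, last] page
 * range.  Returns the number of ranges, or -1 on bad data / full arrays.
 */
static int decode_discard_word(uint64_t addr, uint64_t mask,
                               unsigned int first_bit_offset,
                               uint64_t *first, uint64_t *last, int max)
{
    uint64_t base = addr * 64;   /* addr counts 64-bit words of the bitmap */
    int n = 0;

    while (mask) {
        int firstset = ctz64_loop(mask);
        /* Set the bits below firstset so cto finds the end of the run */
        uint64_t filled = mask | ((UINT64_C(1) << firstset) - 1);
        int firstzero = cto64_loop(filled);

        if (addr == 0 && (unsigned int)firstset < first_bit_offset) {
            return -1;   /* bit set before the first page of the RAMBlock */
        }
        if (n == max) {
            return -1;   /* caller's arrays are full */
        }
        first[n] = base + firstset - first_bit_offset;
        last[n] = base + firstzero - 1 - first_bit_offset;
        n++;

        /* Clear the run just reported; firstzero == 64 means none left */
        mask = (firstzero == 64) ? 0 : mask & (~UINT64_C(0) << firstzero);
    }
    return n;
}
```

The code comment in this patch gives an example: address 1 with a
first_bit_offset of 12 maps bit 0 to page 52. Under that mapping, a mask of
0x407 yields the ranges 52..54 and 62..62.
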
 include/migration/migration.h |   8 ++
 include/sysemu/sysemu.h       |  19 +++
 savevm.c                      | 324 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 351 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 0d9f62d..2c078c4 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -61,6 +61,14 @@ typedef struct MigrationState MigrationState;
 struct MigrationIncomingState {
     QEMUFile *file;
 
+    volatile enum {
+        POSTCOPY_RAM_INCOMING_NONE = 0,  /* Initial state - no postcopy */
+        POSTCOPY_RAM_INCOMING_ADVISE,
+        POSTCOPY_RAM_INCOMING_LISTENING,
+        POSTCOPY_RAM_INCOMING_RUNNING,
+        POSTCOPY_RAM_INCOMING_END
+    } postcopy_ram_state;
+
     QEMUFile *return_path;
     QemuMutex      rp_mutex;    /* We send replies from multiple threads */
 };
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 4dd6ba0..0641cc2 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -87,6 +87,16 @@ enum qemu_vm_cmd {
     QEMU_VM_CMD_OPENRP,        /* Tell the dest to open the Return path */
     QEMU_VM_CMD_REQACK,        /* Request an ACK on the RP */
 
+    QEMU_VM_CMD_POSTCOPY_RAM_ADVISE = 20,  /* Prior to any page transfers, just
+                                              warn we might want to do PC */
+    QEMU_VM_CMD_POSTCOPY_RAM_DISCARD,      /* A list of pages to discard that
+                                              were previously sent during
+                                              precopy but are dirty. */
+    QEMU_VM_CMD_POSTCOPY_RAM_LISTEN,       /* Start listening for incoming
+                                              pages as it's running. */
+    QEMU_VM_CMD_POSTCOPY_RAM_RUN,          /* Start execution */
+    QEMU_VM_CMD_POSTCOPY_RAM_END,          /* Postcopy is finished. */
+
     QEMU_VM_CMD_AFTERLASTVALID
 };
 
@@ -101,6 +111,15 @@ void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
                               uint16_t len, uint8_t *data);
 void qemu_savevm_send_reqack(QEMUFile *f, uint32_t value);
 void qemu_savevm_send_openrp(QEMUFile *f);
+void qemu_savevm_send_postcopy_ram_advise(QEMUFile *f);
+void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
+                                           uint16_t len, uint8_t offset,
+                                           uint64_t *pagelist);
+
+void qemu_savevm_send_postcopy_ram_listen(QEMUFile *f);
+void qemu_savevm_send_postcopy_ram_run(QEMUFile *f);
+void qemu_savevm_send_postcopy_ram_end(QEMUFile *f, uint8_t status);
+
 int qemu_loadvm_state(QEMUFile *f);
 
 /* SLIRP */
diff --git a/savevm.c b/savevm.c
index 7236232..13d975d 100644
--- a/savevm.c
+++ b/savevm.c
@@ -33,12 +33,14 @@
 #include "qemu/timer.h"
 #include "audio/audio.h"
 #include "migration/migration.h"
+#include "migration/postcopy-ram.h"
 #include "qemu/sockets.h"
 #include "qemu/queue.h"
 #include "sysemu/cpus.h"
 #include "exec/memory.h"
 #include "qmp-commands.h"
 #include "trace.h"
+#include "qemu/bitops.h"
 #include "qemu/iov.h"
 #include "block/snapshot.h"
 #include "block/qapi.h"
@@ -624,6 +626,83 @@ void qemu_savevm_send_openrp(QEMUFile *f)
 {
     qemu_savevm_command_send(f, QEMU_VM_CMD_OPENRP, 0, NULL);
 }
+
+/* Send prior to any RAM transfer */
+void qemu_savevm_send_postcopy_ram_advise(QEMUFile *f)
+{
+    DPRINTF("send postcopy-ram-advise");
+    qemu_savevm_command_send(f, QEMU_VM_CMD_POSTCOPY_RAM_ADVISE, 0, NULL);
+}
+
+/* Prior to running, to cause pages that have been dirtied after precopy
+ * started to be discarded on the destination.
+ * CMD_POSTCOPY_RAM_DISCARD consists of:
+ *  3 byte header (filled in by qemu_savevm_send_postcopy_ram_discard)
+ *      byte   version (0)
+ *      byte   offset into the 1st data word containing 1st page of RAMBlock
+ *      byte   Length of name field
+ *  n x byte   RAM block name (NOT 0 terminated)
+ *  n x
+ *      be64   Page addresses for start of an invalidation range
+ *      be64   mask of 64 pages, '1' to discard
+ *
+ *  Hopefully this is pretty sparse so we don't get too many entries,
+ *  and using the mask should deal with most pagesize differences
+ *  just ending up as a single full mask
+ *
+ *  The mask is always 64bits irrespective of the long size
+ *
+ *  Note the destination is free to discard *more* than we've asked
+ *  (e.g. rounding up to some convenient page size)
+ *
+ *  name:  RAMBlock name that these entries are part of
+ *  len: Number of page entries
+ *  pagelist: one 8byte header word (empty) then len*(start,mask) pairs
+ *            The caller must have already put these into be64 format
+ */
+void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
+                                           uint16_t len, uint8_t offset,
+                                           uint64_t *pagelist)
+{
+    uint8_t *buf;
+    uint16_t tmplen;
+
+    DPRINTF("send postcopy-ram-discard");
+    buf = g_malloc0(len*16 + strlen(name) + 3);
+    buf[0] = 0; /* Version */
+    buf[1] = offset;
+    assert(strlen(name) < 256);
+    buf[2] = strlen(name);
+    memcpy(buf+3, name, strlen(name));
+    tmplen = 3+strlen(name);
+    memcpy(buf + tmplen, pagelist, len*16);
+
+    qemu_savevm_command_send(f, QEMU_VM_CMD_POSTCOPY_RAM_DISCARD,
+                             tmplen + len*16, buf);
+    g_free(buf);
+}
+
+/* Get the destination into a state where it can receive page data. */
+void qemu_savevm_send_postcopy_ram_listen(QEMUFile *f)
+{
+    DPRINTF("send postcopy-ram-listen");
+    qemu_savevm_command_send(f, QEMU_VM_CMD_POSTCOPY_RAM_LISTEN, 0, NULL);
+}
+
+/* Kick the destination into running */
+void qemu_savevm_send_postcopy_ram_run(QEMUFile *f)
+{
+    DPRINTF("send postcopy-ram-run");
+    qemu_savevm_command_send(f, QEMU_VM_CMD_POSTCOPY_RAM_RUN, 0, NULL);
+}
+
+/* End of postcopy - with a status byte; 0 is good, anything else is a fail */
+void qemu_savevm_send_postcopy_ram_end(QEMUFile *f, uint8_t status)
+{
+    DPRINTF("send postcopy-ram-end");
+    qemu_savevm_command_send(f, QEMU_VM_CMD_POSTCOPY_RAM_END, 1, &status);
+}
+
 bool qemu_savevm_state_blocked(Error **errp)
 {
     SaveStateEntry *se;
@@ -935,6 +1014,220 @@ static LoadStateEntry_Head loadvm_handlers =
 static int qemu_loadvm_state_main(QEMUFile *f,
                                   LoadStateEntry_Head *loadvm_handlers);
 
+/* ------ incoming postcopy-ram messages ------ */
+/* 'advise' arrives before any RAM transfers just to tell us that a postcopy
+ * *might* happen - it might be skipped if precopy transferred everything
+ * quickly.
+ */
+static int loadvm_postcopy_ram_handle_advise(MigrationIncomingState *mis)
+{
+    DPRINTF("%s", __func__);
+    if (mis->postcopy_ram_state != POSTCOPY_RAM_INCOMING_NONE) {
+        error_report("CMD_POSTCOPY_RAM_ADVISE in wrong postcopy state (%d)",
+                     mis->postcopy_ram_state);
+        return -1;
+    }
+
+    /* Check this host can do it */
+    if (postcopy_ram_hosttest()) {
+        return -1;
+    }
+
+    if (ram_postcopy_incoming_init(mis)) {
+        return -1;
+    }
+
+    mis->postcopy_ram_state = POSTCOPY_RAM_INCOMING_ADVISE;
+
+    /*
+     * Postcopy will be sending lots of small messages along the return path
+     * that it needs quick answers to.
+     */
+    socket_set_nodelay(qemu_get_fd(mis->return_path));
+
+    return 0;
+}
+
+/* After precopy we will be told to throw some pages away since they're
+ * dirty and will have to be demand fetched.  Must happen before the CPU is
+ * started.
+ * There can be 0..many of these messages, each encoding multiple pages.
+ * Bits set in the message represent a page in the source VM's bitmap, but
+ * since the guest/target page sizes can differ between source and
+ * destination, we have to convert.
+ */
+static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
+                                              uint16_t len)
+{
+    int tmp;
+    const int source_target_page_bits = 12; /* TODO */
+    unsigned int first_bit_offset;
+    char ramid[256];
+
+    DPRINTF("%s", __func__);
+
+    if (mis->postcopy_ram_state != POSTCOPY_RAM_INCOMING_ADVISE) {
+        error_report("CMD_POSTCOPY_RAM_DISCARD in wrong postcopy state (%d)",
+                     mis->postcopy_ram_state);
+        return -1;
+    }
+    /* We're expecting a
+     *    3 byte header,
+     *    a RAM ID string,
+     *    then at least one 16 byte (2 x be64) address/mask pair
+     */
+    if (len < 19) {
+        error_report("CMD_POSTCOPY_RAM_DISCARD invalid length (%d)", len);
+        return -1;
+    }
+
+    tmp = qemu_get_byte(mis->file);
+    if (tmp != 0) {
+        error_report("CMD_POSTCOPY_RAM_DISCARD invalid version (%d)", tmp);
+        return -1;
+    }
+    first_bit_offset = qemu_get_byte(mis->file);
+
+    if (qemu_get_counted_string(mis->file, (uint8_t *)ramid)) {
+        error_report("CMD_POSTCOPY_RAM_DISCARD Failed to read RAMBlock ID");
+        return -1;
+    }
+
+    len -= 3+strlen(ramid);
+    if (len & 15) {
+        error_report("CMD_POSTCOPY_RAM_DISCARD invalid length (%d)", len);
+        return -1;
+    }
+    while (len) {
+        uint64_t startaddr, mask;
+        /*
+         * We now have pairs of address, mask
+         *   The address is in multiples of 64bit chunks in the source bitmask
+         *     ie multiply by 64 and then source-target-page-size to get bytes
+         *     '0' represents the chunk in which the RAMBlock starts for the
+         *     source and 'first_bit_offset' (see above) represents which bit in
+         *     that first word corresponds to the first page of the RAMBlock
+         *   The mask is 64 bits of bitmask starting at that offset into the
+         *   RAMBlock.
+         *
+         *   For example:
+         *      an address of 1 with a first_bit_offset of 12 indicates
+         *      page 1*64 - 12 = page 52 for bit 0 of the mask
+         *      Source guarantees that for address 0, bits <first_bit_offset
+         *      shall be 0
+         */
+        startaddr = qemu_get_be64(mis->file) * 64;
+        mask = qemu_get_be64(mis->file);
+
+        len -= 16;
+
+        while (mask) {
+            /* mask= .....?10...0 */
+            /*             ^fs    */
+            int firstset = ctz64(mask);
+
+            /* tmp64=.....?11...1 */
+            /*             ^fs    */
+            uint64_t tmp64 = mask | ((((uint64_t)1)<<firstset)-1);
+
+            /* mask= .?01..10...0 */
+            /*         ^fz ^fs    */
+            int firstzero = cto64(tmp64);
+
+            if ((startaddr == 0) && (firstset < first_bit_offset)) {
+                error_report("CMD_POSTCOPY_RAM_DISCARD bad data; bit set"
+                             " prior to block; block=%s offset=%u"
+                             " firstset=%d", ramid, first_bit_offset,
+                             firstset);
+                return -1;
+            }
+            /*
+             * We know at least 1 bit is set due to the loop condition;
+             * if there is no 0 bit above it, firstzero will be 64.
+             */
+            /* TODO - ram_discard_range gets added in a later patch
+            int ret = ram_discard_range(mis, ramid, source_target_page_bits,
+                                startaddr + firstset - first_bit_offset,
+                                startaddr + (firstzero - 1) - first_bit_offset);
+             */
+            int ret = -1; /* TODO: until ram_discard_range() exists */
+            if (ret) {
+                return ret;
+            }
+
+            /* mask= .?0000000000 */
+            /*         ^fz ^fs    */
+            if (firstzero != 64) {
+                mask &= (((uint64_t)-1) << firstzero);
+            } else {
+                mask = 0;
+            }
+        }
+    }
+    DPRINTF("%s finished", __func__);
+
+    return 0;
+}
+
+/* After this message we must be able to immediately receive page data */
+static int loadvm_postcopy_ram_handle_listen(MigrationIncomingState *mis)
+{
+    DPRINTF("%s", __func__);
+    if (mis->postcopy_ram_state != POSTCOPY_RAM_INCOMING_ADVISE) {
+        error_report("CMD_POSTCOPY_RAM_LISTEN in wrong postcopy state (%d)",
+                     mis->postcopy_ram_state);
+        return -1;
+    }
+
+    mis->postcopy_ram_state = POSTCOPY_RAM_INCOMING_LISTENING;
+
+    /*
+     * Sensitise RAM - can now generate requests for blocks that don't exist
+     * However, at this point the CPU shouldn't be running, and the IO
+     * shouldn't be doing anything yet so don't actually expect requests
+     */
+    if (postcopy_ram_enable_notify(mis)) {
+        return -1;
+    }
+
+    /* TODO start up the postcopy listening thread */
+    return 0;
+}
+
+/* After all discards we can start running and asking for pages */
+static int loadvm_postcopy_ram_handle_run(MigrationIncomingState *mis)
+{
+    DPRINTF("%s", __func__);
+    if (mis->postcopy_ram_state != POSTCOPY_RAM_INCOMING_LISTENING) {
+        error_report("CMD_POSTCOPY_RAM_RUN in wrong postcopy state (%d)",
+                     mis->postcopy_ram_state);
+        return -1;
+    }
+
+    mis->postcopy_ram_state = POSTCOPY_RAM_INCOMING_RUNNING;
+    if (autostart) {
+        /* Hold onto your hats, starting the CPU */
+        vm_start();
+    } else {
+        /* leave it paused and let management decide when to start the CPU */
+        runstate_set(RUN_STATE_PAUSED);
+    }
+
+    return 0;
+}
+
+/* The end - with a byte from the source which can tell us to fail. */
+static int loadvm_postcopy_ram_handle_end(MigrationIncomingState *mis)
+{
+    DPRINTF("%s", __func__);
+    if (mis->postcopy_ram_state == POSTCOPY_RAM_INCOMING_NONE) {
+        error_report("CMD_POSTCOPY_RAM_END in wrong postcopy state (%d)",
+                     mis->postcopy_ram_state);
+        return -1;
+    }
+    return -1; /* TODO - expecting 1 byte good/fail */
+}
+
 static int loadvm_process_command_simple_lencheck(const char *name,
                                                   unsigned int actual,
                                                   unsigned int expected)
@@ -997,6 +1290,37 @@ static int loadvm_process_command(QEMUFile *f,
         migrate_send_rp_ack(mis, tmp32);
         break;
 
+    case QEMU_VM_CMD_POSTCOPY_RAM_ADVISE:
+        if (loadvm_process_command_simple_lencheck("CMD_POSTCOPY_RAM_ADVISE",
+                                                   len, 0)) {
+            return -1;
+        }
+        return loadvm_postcopy_ram_handle_advise(mis);
+
+    case QEMU_VM_CMD_POSTCOPY_RAM_DISCARD:
+        return loadvm_postcopy_ram_handle_discard(mis, len);
+
+    case QEMU_VM_CMD_POSTCOPY_RAM_LISTEN:
+        if (loadvm_process_command_simple_lencheck("CMD_POSTCOPY_RAM_LISTEN",
+                                                   len, 0)) {
+            return -1;
+        }
+        return loadvm_postcopy_ram_handle_listen(mis);
+
+    case QEMU_VM_CMD_POSTCOPY_RAM_RUN:
+        if (loadvm_process_command_simple_lencheck("CMD_POSTCOPY_RAM_RUN",
+                                                   len, 0)) {
+            return -1;
+        }
+        return loadvm_postcopy_ram_handle_run(mis);
+
+    case QEMU_VM_CMD_POSTCOPY_RAM_END:
+        if (loadvm_process_command_simple_lencheck("CMD_POSTCOPY_RAM_END",
+                                                   len, 1)) {
+            return -1;
+        }
+        return loadvm_postcopy_ram_handle_end(mis);
+
     default:
         error_report("VM_COMMAND 0x%x unknown (len 0x%x)", com, len);
         return -1;
-- 
1.9.3


* [Qemu-devel] [PATCH v3 20/47] QEMU_VM_CMD_PACKAGED: Send a packaged chunk of migration stream
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

QEMU_VM_CMD_PACKAGED is a migration command that allows a chunk
of migration stream to be sent in one go and then loaded by a
separate instance of the loadvm loop, without that loop reading
from the main migration stream.

This is used by postcopy to load device state (from the package)
while loading memory pages from the main stream.
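The packaged framing can be illustrated with a small, self-contained sketch: a 4-byte big-endian length followed by the payload, gathered from partially filled iovecs, plus a receiver that rejects oversized or truncated frames before using them. It works on an in-memory byte array rather than a QEMUFile and omits the QEMU_VM_COMMAND header that precedes the length in the real stream; pkg_encode(), pkg_decode() and PKG_MAX_SIZE are hypothetical names, not QEMU APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical cap playing the role of MAX_VM_CMD_PACKAGED_SIZE. */
#define PKG_MAX_SIZE (1ul << 24)

/* Encode 'len' bytes gathered from the iovec array as a 4-byte
 * big-endian length followed by the payload.  Entries may be only
 * partially used, so copying stops once 'len' bytes are emitted.
 * 'len' must not exceed the total iov length, and 'out' must have
 * room for 4 + len bytes.  Returns the number of bytes written. */
static size_t pkg_encode(uint8_t *out, const struct iovec *iov,
                         size_t n_iov, uint32_t len)
{
    size_t pos = 0;

    out[pos++] = (uint8_t)(len >> 24);
    out[pos++] = (uint8_t)(len >> 16);
    out[pos++] = (uint8_t)(len >> 8);
    out[pos++] = (uint8_t)len;

    for (size_t i = 0; i < n_iov && len > 0; i++) {
        size_t towrite = iov[i].iov_len < len ? iov[i].iov_len : len;

        memcpy(out + pos, iov[i].iov_base, towrite);
        pos += towrite;
        len -= (uint32_t)towrite;
    }
    return pos;
}

/* Decode a frame produced by pkg_encode.  Returns the payload length and
 * points *payload at it, or -1 if the header is truncated, the length
 * exceeds the cap, or fewer than 'length' payload bytes are present. */
static long pkg_decode(const uint8_t *in, size_t in_len,
                       const uint8_t **payload)
{
    uint32_t len;

    if (in_len < 4) {
        return -1;
    }
    len = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
          ((uint32_t)in[2] << 8) | (uint32_t)in[3];
    if (len > PKG_MAX_SIZE || in_len - 4 < len) {
        return -1;
    }
    *payload = in + 4;
    return (long)len;
}
```

Checking the length against a fixed cap before allocating or reading mirrors the guard in loadvm_handle_cmd_packaged(), which refuses an unreasonably large package rather than trusting the sender.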

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/sysemu/sysemu.h |  4 +++
 savevm.c                | 79 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 83 insertions(+)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 0641cc2..abf0d63 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -86,6 +86,7 @@ enum qemu_vm_cmd {
     QEMU_VM_CMD_INVALID = 0,   /* Must be 0 */
     QEMU_VM_CMD_OPENRP,        /* Tell the dest to open the Return path */
     QEMU_VM_CMD_REQACK,        /* Request an ACK on the RP */
+    QEMU_VM_CMD_PACKAGED,      /* Send a wrapped stream within this stream */
 
     QEMU_VM_CMD_POSTCOPY_RAM_ADVISE = 20,  /* Prior to any page transfers, just
                                               warn we might want to do PC */
@@ -100,6 +101,8 @@ enum qemu_vm_cmd {
     QEMU_VM_CMD_AFTERLASTVALID
 };
 
+#define MAX_VM_CMD_PACKAGED_SIZE (1ul << 24)
+
 bool qemu_savevm_state_blocked(Error **errp);
 void qemu_savevm_state_begin(QEMUFile *f,
                              const MigrationParams *params);
@@ -111,6 +114,7 @@ void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
                               uint16_t len, uint8_t *data);
 void qemu_savevm_send_reqack(QEMUFile *f, uint32_t value);
 void qemu_savevm_send_openrp(QEMUFile *f);
+void qemu_savevm_send_packaged(QEMUFile *f, const QEMUSizedBuffer *qsb);
 void qemu_savevm_send_postcopy_ram_advise(QEMUFile *f);
 void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
                                            uint16_t len, uint8_t offset,
diff --git a/savevm.c b/savevm.c
index 13d975d..171676d 100644
--- a/savevm.c
+++ b/savevm.c
@@ -627,6 +627,38 @@ void qemu_savevm_send_openrp(QEMUFile *f)
     qemu_savevm_command_send(f, QEMU_VM_CMD_OPENRP, 0, NULL);
 }
 
+/* We have a buffer of data to send; we don't want that all to be loaded
+ * by the command itself, so the command contains just the length of the
+ * extra buffer that we then send straight after it.
+ * TODO: Must be a better way to organise that
+ */
+void qemu_savevm_send_packaged(QEMUFile *f, const QEMUSizedBuffer *qsb)
+{
+    size_t cur_iov;
+    size_t len = qsb_get_length(qsb);
+    uint32_t tmp;
+
+    tmp = cpu_to_be32(len);
+
+    DPRINTF("send_packaged");
+    qemu_savevm_command_send(f, QEMU_VM_CMD_PACKAGED, 4, (uint8_t *)&tmp);
+
+    /* All the data follows (concatenating the iovs) */
+    for (cur_iov = 0; cur_iov < qsb->n_iov; cur_iov++) {
+        /* The iov entries are partially filled */
+        size_t towrite = (qsb->iov[cur_iov].iov_len > len) ?
+                              len :
+                              qsb->iov[cur_iov].iov_len;
+        len -= towrite;
+
+        if (!towrite) {
+            break;
+        }
+
+        qemu_put_buffer(f, qsb->iov[cur_iov].iov_base, towrite);
+    }
+}
+
 /* Send prior to any RAM transfer */
 void qemu_savevm_send_postcopy_ram_advise(QEMUFile *f)
 {
@@ -1241,6 +1273,45 @@ static int loadvm_process_command_simple_lencheck(const char *name,
     return 0;
 }
 
+/* Immediately following this command is a blob of data containing an embedded
+ * chunk of migration stream; read it and load it.
+ */
+static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis,
+                                      uint32_t length,
+                                      LoadStateEntry_Head *loadvm_handlers)
+{
+    int ret;
+    uint8_t *buffer;
+    QEMUSizedBuffer *qsb;
+
+    DPRINTF("loadvm_handle_cmd_packaged: length=%u", length);
+
+    if (length > MAX_VM_CMD_PACKAGED_SIZE) {
+        error_report("Unreasonably large packaged state: %u", length);
+        return -1;
+    }
+    buffer = g_malloc0(length);
+    ret = qemu_get_buffer(mis->file, buffer, (int)length);
+    if (ret != length) {
+        g_free(buffer);
+        error_report("CMD_PACKAGED: Buffer receive fail ret=%d length=%u",
+                ret, length);
+        return (ret < 0) ? ret : -EAGAIN;
+    }
+    DPRINTF("%s: Received %d package, going to load", __func__, ret);
+
+    /* Setup a dummy QEMUFile that actually reads from the buffer */
+    qsb = qsb_create(buffer, length);
+    g_free(buffer); /* Because qsb_create copies */
+    QEMUFile *packf = qemu_bufopen("r", qsb);
+
+    ret = qemu_loadvm_state_main(packf, loadvm_handlers);
+    DPRINTF("%s: qemu_loadvm_state_main returned %d", __func__, ret);
+    qemu_fclose(packf); /* also frees the qsb */
+
+    return ret;
+}
+
 /*
  * Process an incoming 'QEMU_VM_COMMAND'
  * negative return on error (will issue error message)
@@ -1290,6 +1361,14 @@ static int loadvm_process_command(QEMUFile *f,
         migrate_send_rp_ack(mis, tmp32);
         break;
 
+    case QEMU_VM_CMD_PACKAGED:
+        if (loadvm_process_command_simple_lencheck("CMD_PACKAGED",
+                                                   len, 4)) {
+            return -1;
+        }
+        tmp32 = qemu_get_be32(f);
+        return loadvm_handle_cmd_packaged(mis, tmp32, loadvm_handlers);
+
     case QEMU_VM_CMD_POSTCOPY_RAM_ADVISE:
         if (loadvm_process_command_simple_lencheck("CMD_POSTCOPY_RAM_ADVISE",
                                                    len, 0)) {
-- 
1.9.3

* [Qemu-devel] [PATCH v3 21/47] migrate_init: Call from savevm
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Suspending to a file is very much like a migration, and it makes life
easier if the MigrationState is available, so initialise it in the
savevm.c code used for suspending.
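A minimal sketch of the pattern this relies on, assuming a heavily simplified state structure: one global migration state that is reset per run while user-tuned settings survive (the diff shows migrate_init() saving bandwidth_limit before reinitialising), after which the savevm path attaches its own file. MigState, mig_get_current() and mig_init() are hypothetical stand-ins for MigrationState, migrate_get_current() and migrate_init().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for MigrationState. */
typedef struct {
    int state;                 /* 0 = none, 1 = setup */
    int64_t bandwidth_limit;   /* user-tuned; must survive re-init */
    void *file;                /* output stream, set by the caller */
} MigState;

/* One global instance, in the style of migrate_get_current(). */
static MigState *mig_get_current(void)
{
    static MigState current;
    return &current;
}

/* Reset the per-migration fields but keep the user's settings, then mark
 * the state as being set up.  A savevm path can call this and then attach
 * its own file, as the patch does with ms->file = f. */
static MigState *mig_init(void)
{
    MigState *s = mig_get_current();
    int64_t bandwidth_limit = s->bandwidth_limit;

    memset(s, 0, sizeof(*s));
    s->bandwidth_limit = bandwidth_limit;
    s->state = 1;
    return s;
}
```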

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h | 1 +
 include/qemu/typedefs.h       | 1 +
 migration.c                   | 2 +-
 savevm.c                      | 2 ++
 4 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 2c078c4..3aeae47 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -140,6 +140,7 @@ int migrate_fd_close(MigrationState *s);
 
 void add_migration_state_change_notifier(Notifier *notify);
 void remove_migration_state_change_notifier(Notifier *notify);
+MigrationState *migrate_init(const MigrationParams *params);
 bool migration_in_setup(MigrationState *);
 bool migration_has_finished(MigrationState *);
 bool migration_has_failed(MigrationState *);
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 0f79b5c..8539de6 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -16,6 +16,7 @@ struct Monitor;
 typedef struct Monitor Monitor;
 typedef struct MigrationIncomingState MigrationIncomingState;
 typedef struct MigrationParams MigrationParams;
+typedef struct MigrationState MigrationState;
 
 typedef struct Property Property;
 typedef struct PropertyInfo PropertyInfo;
diff --git a/migration.c b/migration.c
index 2fa128f..6097c3c 100644
--- a/migration.c
+++ b/migration.c
@@ -465,7 +465,7 @@ bool migration_has_failed(MigrationState *s)
             s->state == MIG_STATE_ERROR);
 }
 
-static MigrationState *migrate_init(const MigrationParams *params)
+MigrationState *migrate_init(const MigrationParams *params)
 {
     MigrationState *s = migrate_get_current();
     int64_t bandwidth_limit = s->bandwidth_limit;
diff --git a/savevm.c b/savevm.c
index 171676d..809bd04 100644
--- a/savevm.c
+++ b/savevm.c
@@ -941,6 +941,8 @@ static int qemu_savevm_state(QEMUFile *f)
         .blk = 0,
         .shared = 0
     };
+    MigrationState *ms = migrate_init(&params);
+    ms->file = f;
 
     if (qemu_savevm_state_blocked(NULL)) {
         return -EINVAL;
-- 
1.9.3

* [Qemu-devel] [PATCH v3 22/47] Allow savevm handlers to state whether they could go into postcopy
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add a can_postcopy hook to SaveVMHandlers and use it to split the
qemu_savevm_state_pending counts into postcopiable and non-postcopiable
amounts.
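The split can be sketched outside QEMU as below, assuming a cut-down handler table; Handler, state_pending() and the example callbacks are hypothetical. One choice is deliberate: a handler without a can_postcopy hook is treated as non-postcopiable instead of having its NULL hook called, since only the RAM handlers gain the hook in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down handler table entry. */
typedef struct {
    uint64_t (*pending)(void *opaque);   /* bytes still to send */
    bool (*can_postcopy)(void *opaque);  /* optional: may be NULL */
    void *opaque;
} Handler;

/* Sum the pending bytes, split by whether each handler can be postcopied.
 * A handler without a can_postcopy hook counts as non-postcopiable. */
static void state_pending(const Handler *h, size_t n,
                          uint64_t *res_non_postcopiable,
                          uint64_t *res_postcopiable)
{
    uint64_t nonpc = 0, pc = 0;

    for (size_t i = 0; i < n; i++) {
        if (!h[i].pending) {
            continue;
        }
        uint64_t tmp = h[i].pending(h[i].opaque);

        if (h[i].can_postcopy && h[i].can_postcopy(h[i].opaque)) {
            pc += tmp;
        } else {
            nonpc += tmp;
        }
    }
    *res_non_postcopiable = nonpc;
    *res_postcopiable = pc;
}

/* Example callbacks: the opaque pointer holds a byte count. */
static uint64_t pending_from_opaque(void *opaque)
{
    return *(const uint64_t *)opaque;
}

static bool always_postcopy(void *opaque)
{
    (void)opaque;
    return true;
}
```

The caller can still add the two results together, as migration_thread() does, when it only needs the total to compare against max_size.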

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c                 |  7 +++++++
 include/migration/vmstate.h |  2 +-
 include/sysemu/sysemu.h     |  4 +++-
 migration.c                 |  9 ++++++++-
 savevm.c                    | 23 +++++++++++++++++++----
 5 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 2568369..315b88e 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -1190,6 +1190,12 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
     return ret;
 }
 
+/* RAM's always up for postcopying */
+static bool ram_can_postcopy(void *opaque)
+{
+    return true;
+}
+
 static SaveVMHandlers savevm_ram_handlers = {
     .save_live_setup = ram_save_setup,
     .save_live_iterate = ram_save_iterate,
@@ -1197,6 +1203,7 @@ static SaveVMHandlers savevm_ram_handlers = {
     .save_live_pending = ram_save_pending,
     .load_state = ram_load,
     .cancel = ram_migration_cancel,
+    .can_postcopy = ram_can_postcopy,
 };
 
 void ram_mig_init(void)
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 9a001bd..4991935 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -54,7 +54,7 @@ typedef struct SaveVMHandlers {
     /* This runs outside the iothread lock!  */
     int (*save_live_setup)(QEMUFile *f, void *opaque);
     uint64_t (*save_live_pending)(QEMUFile *f, void *opaque, uint64_t max_size);
-
+    bool (*can_postcopy)(void *opaque);
     LoadStateHandler *load_state;
 } SaveVMHandlers;
 
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index abf0d63..dc53580 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -109,7 +109,9 @@ void qemu_savevm_state_begin(QEMUFile *f,
 int qemu_savevm_state_iterate(QEMUFile *f);
 void qemu_savevm_state_complete(QEMUFile *f);
 void qemu_savevm_state_cancel(void);
-uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size);
+void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size,
+                               uint64_t *res_non_postcopiable,
+                               uint64_t *res_postcopiable);
 void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
                               uint16_t len, uint8_t *data);
 void qemu_savevm_send_reqack(QEMUFile *f, uint32_t value);
diff --git a/migration.c b/migration.c
index 6097c3c..6c6ec03 100644
--- a/migration.c
+++ b/migration.c
@@ -840,8 +840,15 @@ static void *migration_thread(void *opaque)
         uint64_t pending_size;
 
         if (!qemu_file_rate_limit(s->file)) {
-            pending_size = qemu_savevm_state_pending(s->file, max_size);
+            uint64_t pend_post, pend_nonpost;
+            DPRINTF("iterate\n");
+            qemu_savevm_state_pending(s->file, max_size, &pend_nonpost,
+                                      &pend_post);
+            pending_size = pend_nonpost + pend_post;
             trace_migrate_pending(pending_size, max_size);
+            DPRINTF("pending size %" PRIu64 " max %" PRIu64 " (post=%" PRIu64
+                    " nonpost=%" PRIu64 ")\n",
+                    pending_size, max_size, pend_post, pend_nonpost);
             if (pending_size && pending_size >= max_size) {
                 qemu_savevm_state_iterate(s->file);
             } else {
diff --git a/savevm.c b/savevm.c
index 809bd04..412bf9d 100644
--- a/savevm.c
+++ b/savevm.c
@@ -903,10 +903,18 @@ void qemu_savevm_state_complete(QEMUFile *f)
     qemu_fflush(f);
 }
 
-uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size)
+/* Give an estimate of the amount left to be transferred,
+ * the result is split into the amount for units that can and
+ * for units that can't do postcopy.
+ */
+void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size,
+                               uint64_t *res_non_postcopiable,
+                               uint64_t *res_postcopiable)
 {
     SaveStateEntry *se;
-    uint64_t ret = 0;
+    uint64_t res_nonpc = 0;
+    uint64_t res_pc = 0;
+    uint64_t tmp;
 
     QTAILQ_FOREACH(se, &savevm_handlers, entry) {
         if (!se->ops || !se->ops->save_live_pending) {
@@ -917,9 +925,16 @@ uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size)
                 continue;
             }
         }
-        ret += se->ops->save_live_pending(f, se->opaque, max_size);
+        tmp = se->ops->save_live_pending(f, se->opaque, max_size);
+
+        if (se->ops->can_postcopy && se->ops->can_postcopy(se->opaque)) {
+            res_pc += tmp;
+        } else {
+            res_nonpc += tmp;
+        }
     }
-    return ret;
+    *res_non_postcopiable = res_nonpc;
+    *res_postcopiable = res_pc;
 }
 
 void qemu_savevm_state_cancel(void)
-- 
1.9.3

* [Qemu-devel] [PATCH v3 23/47] postcopy: OS support test
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Provide a check to see if the OS we're running on has all the bits
needed for postcopy.

Create postcopy-ram.c, which will hold most of the other postcopy helpers.
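A sketch of the probe-by-trying approach postcopy_ram_hosttest() takes: map a scratch page, attempt the call, and treat failure as lack of support. It uses MADV_DONTNEED purely as a stand-in advice value defined by ordinary Linux headers; the MADV_USERFAULT and remap_anon_pages numbers in the patch are defined locally for an out-of-tree kernel and are not exercised here. probe_madvise() is a hypothetical helper. Note that mmap() signals failure with MAP_FAILED rather than NULL.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <sys/mman.h>
#include <unistd.h>

/* Probe whether the running kernel accepts a given madvise() advice on
 * anonymous memory, by trying it on a single scratch page. */
static bool probe_madvise(int advice)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    bool ok;
    void *area = mmap(NULL, (size_t)pagesize, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    /* mmap() reports failure with MAP_FAILED, not NULL. */
    if (area == MAP_FAILED) {
        return false;
    }
    ok = madvise(area, (size_t)pagesize, advice) == 0;
    munmap(area, (size_t)pagesize);
    return ok;
}
```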

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 Makefile.objs                    |   2 +-
 include/migration/postcopy-ram.h |  19 +++++
 postcopy-ram.c                   | 151 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 171 insertions(+), 1 deletion(-)
 create mode 100644 include/migration/postcopy-ram.h
 create mode 100644 postcopy-ram.c

diff --git a/Makefile.objs b/Makefile.objs
index 97db978..fa0a3a0 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -54,7 +54,7 @@ common-obj-y += qemu-file.o
 common-obj-$(CONFIG_RDMA) += migration-rdma.o
 common-obj-y += qemu-char.o #aio.o
 common-obj-y += block-migration.o
-common-obj-y += page_cache.o xbzrle.o
+common-obj-y += page_cache.o xbzrle.o postcopy-ram.o
 
 common-obj-$(CONFIG_POSIX) += migration-exec.o migration-unix.o migration-fd.o
 
diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
new file mode 100644
index 0000000..dcd1afa
--- /dev/null
+++ b/include/migration/postcopy-ram.h
@@ -0,0 +1,19 @@
+/*
+ * Postcopy migration for RAM
+ *
+ * Copyright 2013 Red Hat, Inc. and/or its affiliates
+ *
+ * Authors:
+ *  Dave Gilbert  <dgilbert@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+#ifndef QEMU_POSTCOPY_RAM_H
+#define QEMU_POSTCOPY_RAM_H
+
+/* Return 0 if the host supports everything we need to do postcopy-ram */
+int postcopy_ram_hosttest(void);
+
+#endif
diff --git a/postcopy-ram.c b/postcopy-ram.c
new file mode 100644
index 0000000..6017b4d
--- /dev/null
+++ b/postcopy-ram.c
@@ -0,0 +1,151 @@
+/*
+ * Postcopy migration for RAM
+ *
+ * Copyright 2013-2014 Red Hat, Inc. and/or its affiliates
+ *
+ * Authors:
+ *  Dave Gilbert  <dgilbert@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+/*
+ * Postcopy is a migration technique where the execution flips from the
+ * source to the destination before all the data has been copied.
+ */
+
+#include <glib.h>
+#include <stdio.h>
+#include <unistd.h>
+
+#include "qemu-common.h"
+#include "migration/migration.h"
+#include "migration/postcopy-ram.h"
+
+//#define DEBUG_POSTCOPY
+
+#ifdef DEBUG_POSTCOPY
+#define DPRINTF(fmt, ...) \
+    do { fprintf(stderr, "postcopy@%" PRId64 " " fmt "\n", \
+                          qemu_clock_get_ms(QEMU_CLOCK_REALTIME), \
+                          ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+    do { } while (0)
+#endif
+
+/* Postcopy needs to detect accesses to pages that haven't yet been copied
+ * across, and efficiently map new pages in; the techniques for doing this
+ * are target OS specific.
+ */
+#if defined(__linux__)
+
+/* On Linux we use:
+ *    madvise MADV_USERFAULT - to mark an area of anonymous memory such
+ *                             that userspace is notified of accesses to
+ *                             unallocated areas.
+ *    userfaultfd      - opens a socket to receive USERFAULT messages
+ *    remap_anon_pages - to shuffle mapped pages into previously unallocated
+ *                       areas without creating loads of VMAs.
+ */
+
+#include <sys/mman.h>
+#include <sys/types.h>
+
+/* TODO remove once we have libc defs
+ * NOTE: These are x86-64 numbers for Andrea's 3.15.0 world */
+#ifndef MADV_USERFAULT
+#define MADV_USERFAULT   18
+#define MADV_NOUSERFAULT 19
+#endif
+
+#ifndef __NR_remap_anon_pages
+#define __NR_remap_anon_pages 317
+#endif
+
+#ifndef __NR_userfaultfd
+#define __NR_userfaultfd 318
+#endif
+
+#ifndef USERFAULTFD_PROTOCOL
+#define USERFAULTFD_PROTOCOL (uint64_t)0xaa
+#endif
+
+int postcopy_ram_hosttest(void)
+{
+    /* TODO: Needs guarding with CONFIG_ once we have libc's that have the defs
+     *
+     * Try each syscall we need, but this isn't a testbench,
+     * just enough to see that we have the calls
+     */
+    void *testarea = NULL, *testarea2 = NULL;
+    long pagesize = getpagesize();
+    int ufd = -1;
+    int ret = -1; /* Error unless we change it */
+
+    testarea = mmap(NULL, pagesize, PROT_READ | PROT_WRITE, MAP_PRIVATE |
+                                    MAP_ANONYMOUS, -1, 0);
+    if (testarea == MAP_FAILED) {
+        perror("postcopy_ram_hosttest: Failed to map test area");
+        goto out;
+    }
+    g_assert(((size_t)testarea & (pagesize-1)) == 0);
+
+    ufd = syscall(__NR_userfaultfd, O_CLOEXEC);
+    if (ufd == -1) {
+        perror("postcopy_ram_hosttest: userfaultfd not available");
+        goto out;
+    }
+
+    if (madvise(testarea, pagesize, MADV_USERFAULT)) {
+        perror("postcopy_ram_hosttest: MADV_USERFAULT not available");
+        goto out;
+    }
+
+    if (madvise(testarea, pagesize, MADV_NOUSERFAULT)) {
+        perror("postcopy_ram_hosttest: MADV_NOUSERFAULT not available");
+        goto out;
+    }
+
+    testarea2 = mmap(NULL, pagesize, PROT_READ | PROT_WRITE, MAP_PRIVATE |
+                                     MAP_ANONYMOUS, -1, 0);
+    if (testarea2 == MAP_FAILED) {
+        perror("postcopy_ram_hosttest: Failed to map second test area");
+        goto out;
+    }
+    g_assert(((size_t)testarea2 & (pagesize-1)) == 0);
+    *(char *)testarea = 0; /* Force the map of the new page */
+    if (syscall(__NR_remap_anon_pages, testarea2, testarea, pagesize, 0) !=
+        pagesize) {
+        perror("postcopy_ram_hosttest: remap_anon_pages not available");
+        goto out;
+    }
+
+    /* Success! */
+    ret = 0;
+out:
+    if (testarea) {
+        munmap(testarea, pagesize);
+    }
+    if (testarea2) {
+        munmap(testarea2, pagesize);
+    }
+    if (ufd != -1) {
+        close(ufd);
+    }
+    return ret;
+}
+
+#else
+/* No target OS support, stubs just fail */
+
+int postcopy_ram_hosttest(void)
+{
+    error_report("postcopy_ram_hosttest: No OS support");
+    return -1;
+}
+
+#endif
+
-- 
1.9.3

* [Qemu-devel] [PATCH v3 24/47] migrate_start_postcopy: Command to trigger transition to postcopy
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Once postcopy is enabled (with migrate_set_capability), the migration
will still start in precopy mode.  To trigger the transition into
postcopy, the

  migrate_start_postcopy

command must be issued.  Postcopy will start some time after this,
when the request is next checked in the migration loop.

Issuing the command before migration has started results in an error;
issuing it after migration has finished is ignored.
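A sketch of this request/poll split, assuming a small subset of the migration states and a hypothetical Mig structure: the command handler only validates and sets a flag, and the migration loop performs the transition the next time it checks. The patch declares start_postcopy as volatile bool; this sketch swaps in a C11 atomic_bool, which is the portable way to share a flag between threads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical subset of the migration states. */
enum { ST_NONE, ST_ACTIVE, ST_POSTCOPY_ACTIVE, ST_COMPLETED };

typedef struct {
    int state;
    atomic_bool start_postcopy;  /* written by the monitor, read by the loop */
} Mig;

/* Monitor side: refuse if postcopy was never enabled or no migration has
 * started; otherwise only record the request.  A request arriving after
 * completion is accepted and simply never acted upon. */
static int cmd_start_postcopy(Mig *m, bool postcopy_enabled)
{
    if (!postcopy_enabled || m->state == ST_NONE) {
        return -1;
    }
    atomic_store(&m->start_postcopy, true);
    return 0;
}

/* Migration-thread side: one pass of the loop.  The switch happens only
 * when the loop next looks at the flag, not when the command runs. */
static void mig_iterate(Mig *m)
{
    if (m->state == ST_ACTIVE && atomic_load(&m->start_postcopy)) {
        m->state = ST_POSTCOPY_ACTIVE;
    }
}
```

Not failing after completion avoids a race: the monitor cannot know whether the migration thread finished just before the command arrived.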

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 hmp-commands.hx               | 15 +++++++++++++++
 hmp.c                         |  7 +++++++
 hmp.h                         |  1 +
 include/migration/migration.h |  3 +++
 migration.c                   | 22 ++++++++++++++++++++++
 qapi-schema.json              |  8 ++++++++
 qmp-commands.hx               | 19 +++++++++++++++++++
 7 files changed, 75 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index d0943b1..53362b3 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -987,6 +987,21 @@ Enable/Disable the usage of a capability @var{capability} for migration.
 ETEXI
 
     {
+        .name       = "migrate_start_postcopy",
+        .args_type  = "",
+        .params     = "",
+        .help       = "Switch migration to postcopy mode",
+        .mhandler.cmd = hmp_migrate_start_postcopy,
+    },
+
+STEXI
+@item migrate_start_postcopy
+@findex migrate_start_postcopy
+Switch an in-progress migration to postcopy mode. Ignored after the end of
+migration (or once already in postcopy).
+ETEXI
+
+    {
         .name       = "client_migrate_info",
         .args_type  = "protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?",
         .params     = "protocol hostname port tls-port cert-subject",
diff --git a/hmp.c b/hmp.c
index 4d1838e..e4a6189 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1077,6 +1077,13 @@ void hmp_migrate_set_capability(Monitor *mon, const QDict *qdict)
     }
 }
 
+void hmp_migrate_start_postcopy(Monitor *mon, const QDict *qdict)
+{
+    Error *err = NULL;
+    qmp_migrate_start_postcopy(&err);
+    hmp_handle_error(mon, &err);
+}
+
 void hmp_set_password(Monitor *mon, const QDict *qdict)
 {
     const char *protocol  = qdict_get_str(qdict, "protocol");
diff --git a/hmp.h b/hmp.h
index 4fd3c4a..4d59e6e 100644
--- a/hmp.h
+++ b/hmp.h
@@ -64,6 +64,7 @@ void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_capability(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_cache_size(Monitor *mon, const QDict *qdict);
+void hmp_migrate_start_postcopy(Monitor *mon, const QDict *qdict);
 void hmp_set_password(Monitor *mon, const QDict *qdict);
 void hmp_expire_password(Monitor *mon, const QDict *qdict);
 void hmp_eject(Monitor *mon, const QDict *qdict);
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 3aeae47..b74121e 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -100,6 +100,9 @@ struct MigrationState
     int64_t xbzrle_cache_size;
     int64_t setup_time;
     int64_t dirty_sync_count;
+
+    /* Flag set once the migration has been asked to enter postcopy */
+    volatile bool start_postcopy;
 };
 
 void process_incoming_migration(QEMUFile *f);
diff --git a/migration.c b/migration.c
index 6c6ec03..fb758d6 100644
--- a/migration.c
+++ b/migration.c
@@ -362,6 +362,28 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
     }
 }
 
+void qmp_migrate_start_postcopy(Error **errp)
+{
+    MigrationState *s = migrate_get_current();
+
+    if (!migrate_postcopy_ram()) {
+        error_setg(errp, "Enable postcopy with migrate_set_capability before"
+                         " the start of migration");
+        return;
+    }
+
+    if (s->state == MIG_STATE_NONE) {
+        error_setg(errp, "Postcopy must be started after migration has been"
+                         " started");
+        return;
+    }
+    /*
+     * We don't error if migration has finished, since that would race
+     * with the issuing of this command.
+     */
+    s->start_postcopy = true;
+}
+
 /* shared migration helpers */
 
 static void migrate_set_state(MigrationState *s, int old_state, int new_state)
diff --git a/qapi-schema.json b/qapi-schema.json
index 77e57bd..d639a78 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -538,6 +538,14 @@
 { 'command': 'query-migrate-capabilities', 'returns':   ['MigrationCapabilityStatus']}
 
 ##
+# @migrate-start-postcopy
+#
+# Switch migration to postcopy mode
+#
+# Since: 2.2
+{ 'command': 'migrate-start-postcopy' }
+
+##
 # @MouseInfo:
 #
 # Information about a mouse device.
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 4be4765..96eb854 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -686,6 +686,25 @@ Example:
 
 EQMP
     {
+        .name       = "migrate-start-postcopy",
+        .args_type  = "",
+        .mhandler.cmd_new = qmp_marshal_input_migrate_start_postcopy,
+    },
+
+SQMP
+migrate-start-postcopy
+----------------------
+
+Switch an in-progress migration to postcopy mode. Ignored after the end of
+migration (or once already in postcopy).
+
+Example:
+-> { "execute": "migrate-start-postcopy" }
+<- { "return": {} }
+
+EQMP
+
+    {
         .name       = "query-migrate-cache-size",
         .args_type  = "",
         .mhandler.cmd_new = qmp_marshal_input_query_migrate_cache_size,
-- 
1.9.3

* [Qemu-devel] [PATCH v3 25/47] MIG_STATE_POSTCOPY_ACTIVE: Add new migration state
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

'MIG_STATE_POSTCOPY_ACTIVE' is entered after migrate_start_postcopy.

'migration_postcopy_phase' is provided so that other sections can tell
whether they're in postcopy.
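A minimal sketch of how the new state slots into the existing state checks, using shortened, hypothetical names for the enum values from the patch: already_active() follows the grouping in migration_already_active(), and status_name() shows the string query-migrate reports for a few states.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Mirrors the state list in the patch, with shortened names. */
enum MigPhase {
    ST_ERROR = -1,
    ST_NONE,
    ST_SETUP,
    ST_CANCELLING,
    ST_CANCELLED,
    ST_ACTIVE,
    ST_POSTCOPY_ACTIVE,
    ST_COMPLETED,
};

/* Same grouping as migration_already_active(): setup, precopy and
 * postcopy all count as "a migration is in progress". */
static bool already_active(enum MigPhase s)
{
    switch (s) {
    case ST_SETUP:
    case ST_ACTIVE:
    case ST_POSTCOPY_ACTIVE:
        return true;
    default:
        return false;
    }
}

/* Status string as reported by query-migrate; only a few states are
 * listed here, everything else maps to "other". */
static const char *status_name(enum MigPhase s)
{
    switch (s) {
    case ST_ACTIVE:
        return "active";
    case ST_POSTCOPY_ACTIVE:
        return "postcopy-active";
    case ST_COMPLETED:
        return "completed";
    default:
        return "other";
    }
}
```

Listing the postcopy state explicitly keeps code such as the capability setter and the cancel path from treating a migration in postcopy as idle.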

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  2 ++
 migration.c                   | 74 +++++++++++++++++++++++++++++++++++++++----
 2 files changed, 70 insertions(+), 6 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index b74121e..2ff9d35 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -147,6 +147,8 @@ MigrationState *migrate_init(const MigrationParams *params);
 bool migration_in_setup(MigrationState *);
 bool migration_has_finished(MigrationState *);
 bool migration_has_failed(MigrationState *);
+/* True if outgoing migration has entered postcopy phase */
+bool migration_postcopy_phase(MigrationState *);
 MigrationState *migrate_get_current(void);
 
 uint64_t ram_bytes_remaining(void);
diff --git a/migration.c b/migration.c
index fb758d6..eafd72a 100644
--- a/migration.c
+++ b/migration.c
@@ -38,13 +38,14 @@
     do { } while (0)
 #endif
 
-enum {
+enum MigrationPhase {
     MIG_STATE_ERROR = -1,
     MIG_STATE_NONE,
     MIG_STATE_SETUP,
     MIG_STATE_CANCELLING,
     MIG_STATE_CANCELLED,
     MIG_STATE_ACTIVE,
+    MIG_STATE_POSTCOPY_ACTIVE,
     MIG_STATE_COMPLETED,
 };
 
@@ -246,6 +247,23 @@ MigrationCapabilityStatusList *qmp_query_migrate_capabilities(Error **errp)
     return head;
 }
 
+/* Return true if we're already in the middle of a migration
+ * (i.e. any of the active or setup states)
+ */
+static bool migration_already_active(MigrationState *ms)
+{
+    switch (ms->state) {
+    case MIG_STATE_ACTIVE:
+    case MIG_STATE_POSTCOPY_ACTIVE:
+    case MIG_STATE_SETUP:
+        return true;
+
+    default:
+        return false;
+
+    }
+}
+
 static void get_xbzrle_cache_stats(MigrationInfo *info)
 {
     if (migrate_use_xbzrle()) {
@@ -309,6 +327,40 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 
         get_xbzrle_cache_stats(info);
         break;
+    case MIG_STATE_POSTCOPY_ACTIVE:
+        /* Mostly the same as active; TODO add some postcopy stats */
+        info->has_status = true;
+        info->status = g_strdup("postcopy-active");
+        info->has_total_time = true;
+        info->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME)
+            - s->total_time;
+        info->has_expected_downtime = true;
+        info->expected_downtime = s->expected_downtime;
+        info->has_setup_time = true;
+        info->setup_time = s->setup_time;
+
+        info->has_ram = true;
+        info->ram = g_malloc0(sizeof(*info->ram));
+        info->ram->transferred = ram_bytes_transferred();
+        info->ram->remaining = ram_bytes_remaining();
+        info->ram->total = ram_bytes_total();
+        info->ram->duplicate = dup_mig_pages_transferred();
+        info->ram->skipped = skipped_mig_pages_transferred();
+        info->ram->normal = norm_mig_pages_transferred();
+        info->ram->normal_bytes = norm_mig_bytes_transferred();
+        info->ram->dirty_pages_rate = s->dirty_pages_rate;
+        info->ram->mbps = s->mbps;
+
+        if (blk_mig_active()) {
+            info->has_disk = true;
+            info->disk = g_malloc0(sizeof(*info->disk));
+            info->disk->transferred = blk_mig_bytes_transferred();
+            info->disk->remaining = blk_mig_bytes_remaining();
+            info->disk->total = blk_mig_bytes_total();
+        }
+
+        get_xbzrle_cache_stats(info);
+        break;
     case MIG_STATE_COMPLETED:
         get_xbzrle_cache_stats(info);
 
@@ -352,7 +404,7 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
     MigrationState *s = migrate_get_current();
     MigrationCapabilityStatusList *cap;
 
-    if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP) {
+    if (migration_already_active(s)) {
         error_set(errp, QERR_MIGRATION_ACTIVE);
         return;
     }
@@ -421,7 +473,8 @@ static void migrate_fd_cleanup(void *opaque)
         s->file = NULL;
     }
 
-    assert(s->state != MIG_STATE_ACTIVE);
+    assert((s->state != MIG_STATE_ACTIVE) &&
+           (s->state != MIG_STATE_POSTCOPY_ACTIVE));
 
     if (s->state != MIG_STATE_COMPLETED) {
         qemu_savevm_state_cancel();
@@ -454,7 +507,8 @@ static void migrate_fd_cancel(MigrationState *s)
 
     do {
         old_state = s->state;
-        if (old_state != MIG_STATE_SETUP && old_state != MIG_STATE_ACTIVE) {
+        if (old_state != MIG_STATE_SETUP && old_state != MIG_STATE_ACTIVE &&
+            old_state != MIG_STATE_POSTCOPY_ACTIVE) {
             break;
         }
         migrate_set_state(s, old_state, MIG_STATE_CANCELLING);
@@ -487,6 +541,11 @@ bool migration_has_failed(MigrationState *s)
             s->state == MIG_STATE_ERROR);
 }
 
+bool migration_postcopy_phase(MigrationState *s)
+{
+    return (s->state == MIG_STATE_POSTCOPY_ACTIVE);
+}
+
 MigrationState *migrate_init(const MigrationParams *params)
 {
     MigrationState *s = migrate_get_current();
@@ -535,7 +594,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     params.blk = has_blk && blk;
     params.shared = has_inc && inc;
 
-    if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP ||
+    if (migration_already_active(s) ||
         s->state == MIG_STATE_CANCELLING) {
         error_set(errp, QERR_MIGRATION_ACTIVE);
         return;
@@ -857,7 +916,10 @@ static void *migration_thread(void *opaque)
     s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
     migrate_set_state(s, MIG_STATE_SETUP, MIG_STATE_ACTIVE);
 
-    while (s->state == MIG_STATE_ACTIVE) {
+    DPRINTF("setup complete\n");
+
+    while (s->state == MIG_STATE_ACTIVE ||
+           s->state == MIG_STATE_POSTCOPY_ACTIVE) {
         int64_t current_time;
         uint64_t pending_size;
 
-- 
1.9.3
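A minimal standalone sketch of how the new state fits into the existing checks in this patch. Only the enum values come from migration.c. The `phase_*` helper names are hypothetical and exist only for illustration. The grouping mirrors `migration_already_active()`, the extra `MIG_STATE_CANCELLING` test in `qmp_migrate()`, and the loop condition in `migration_thread()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Values copied from the MigrationPhase enum in migration.c */
enum MigrationPhase {
    MIG_STATE_ERROR = -1,
    MIG_STATE_NONE,
    MIG_STATE_SETUP,
    MIG_STATE_CANCELLING,
    MIG_STATE_CANCELLED,
    MIG_STATE_ACTIVE,
    MIG_STATE_POSTCOPY_ACTIVE,
    MIG_STATE_COMPLETED,
};

/* Same grouping as migration_already_active(); this is also the set of
 * states that migrate_fd_cancel() is willing to move to CANCELLING. */
static bool phase_already_active(enum MigrationPhase state)
{
    switch (state) {
    case MIG_STATE_SETUP:
    case MIG_STATE_ACTIVE:
    case MIG_STATE_POSTCOPY_ACTIVE:
        return true;
    default:
        return false;
    }
}

/* qmp_migrate() additionally refuses to start while a cancel is in flight */
static bool phase_rejects_new_migrate(enum MigrationPhase state)
{
    return phase_already_active(state) || state == MIG_STATE_CANCELLING;
}

/* migration_thread() keeps iterating in both the precopy and postcopy phases */
static bool phase_thread_iterates(enum MigrationPhase state)
{
    return state == MIG_STATE_ACTIVE || state == MIG_STATE_POSTCOPY_ACTIVE;
}
```

Funnelling the checks through one predicate means a future state only has to be added in one place rather than in every `||` chain.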


* [Qemu-devel] [PATCH v3 26/47] qemu_savevm_state_complete: Postcopy changes
  2014-08-28 15:03 [Qemu-devel] [PATCH v3 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (24 preceding siblings ...)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 25/47] MIG_STATE_POSTCOPY_ACTIVE: Add new migration state Dr. David Alan Gilbert (git)
@ 2014-08-28 15:03 ` Dr. David Alan Gilbert (git)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 27/47] Postcopy: Maintain sentmap during postcopy pre phase Dr. David Alan Gilbert (git)
                   ` (20 subsequent siblings)
  46 siblings, 0 replies; 52+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

When postcopy calls qemu_savevm_state_complete, it's not really the
end of the migration, so skip:
   a) Finishing postcopiable iterative devices - they'll carry on
   b) The termination byte on the end of the stream.

We then also add:
  qemu_savevm_state_postcopy_complete
which is called at the end of a postcopy migration to call the
complete methods on devices skipped in the _complete call.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/sysemu/sysemu.h |  1 +
 savevm.c                | 52 ++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 52 insertions(+), 1 deletion(-)
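A simplified model of the two completion passes described above, assuming a hypothetical `Handler` struct that stands in for `SaveStateEntry` and keeps only the `can_postcopy` result and a completion counter. It is meant to show the invariant that both passes should preserve: when they test the same predicate, every device is completed exactly once. Stream writes are reduced to a return value and comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a SaveStateEntry */
typedef struct {
    const char *idstr;
    bool can_postcopy;   /* result of ops->can_postcopy(opaque) */
    int completions;     /* how many times save_live_complete ran */
} Handler;

/* Models qemu_savevm_state_complete(): while in postcopy, postcopiable
 * devices are skipped. Returns true if QEMU_VM_EOF would be written. */
static bool complete_pass(Handler *h, size_t n, bool in_postcopy)
{
    for (size_t i = 0; i < n; i++) {
        if (in_postcopy && h[i].can_postcopy) {
            continue;    /* finished later by the postcopy pass */
        }
        h[i].completions++;
    }
    return !in_postcopy; /* the postcopy stream is still going */
}

/* Models qemu_savevm_state_postcopy_complete(): postcopiable devices only.
 * The real function then sends the RAM-end command and QEMU_VM_EOF. */
static void postcopy_complete_pass(Handler *h, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (h[i].can_postcopy) {
            h[i].completions++;
        }
    }
}
```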

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index dc53580..ce52c0a 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -112,6 +112,7 @@ void qemu_savevm_state_cancel(void);
 void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size,
                                uint64_t *res_non_postcopiable,
                                uint64_t *res_postcopiable);
+void qemu_savevm_state_postcopy_complete(QEMUFile *f);
 void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
                               uint16_t len, uint8_t *data);
 void qemu_savevm_send_reqack(QEMUFile *f, uint32_t value);
diff --git a/savevm.c b/savevm.c
index 412bf9d..d54286f 100644
--- a/savevm.c
+++ b/savevm.c
@@ -845,10 +845,51 @@ int qemu_savevm_state_iterate(QEMUFile *f)
     return ret;
 }
 
+/*
+ * Call the complete routines only for the devices that are postcopiable,
+ * causing their last few pages to be sent immediately and doing any
+ * associated cleanup.
+ * Note: postcopy also calls plain qemu_savevm_state_complete to complete
+ * all the other devices, but that happens at the point we switch to postcopy.
+ */
+void qemu_savevm_state_postcopy_complete(QEMUFile *f)
+{
+    SaveStateEntry *se;
+    int ret;
+
+    QTAILQ_FOREACH(se, &savevm_handlers, entry) {
+        if (!se->ops || !se->ops->save_live_complete ||
+            !se->ops->can_postcopy || !se->ops->can_postcopy(se->opaque)) {
+            continue;
+        }
+        if (se->ops->is_active) {
+            if (!se->ops->is_active(se->opaque)) {
+                continue;
+            }
+        }
+        trace_savevm_section_start(se->idstr, se->section_id);
+        /* Section type */
+        qemu_put_byte(f, QEMU_VM_SECTION_END);
+        qemu_put_be32(f, se->section_id);
+
+        ret = se->ops->save_live_complete(f, se->opaque);
+        trace_savevm_section_end(se->idstr, se->section_id);
+        if (ret < 0) {
+            qemu_file_set_error(f, ret);
+            return;
+        }
+    }
+
+    qemu_savevm_send_postcopy_ram_end(f, 0 /* Good */);
+    qemu_put_byte(f, QEMU_VM_EOF);
+    qemu_fflush(f);
+}
+
 void qemu_savevm_state_complete(QEMUFile *f)
 {
     SaveStateEntry *se;
     int ret;
+    bool in_postcopy = migration_postcopy_phase(migrate_get_current());
 
     trace_savevm_state_complete();
 
@@ -863,6 +904,11 @@ void qemu_savevm_state_complete(QEMUFile *f)
                 continue;
             }
         }
+        if (in_postcopy && se->ops && se->ops->can_postcopy &&
+            se->ops->can_postcopy(se->opaque)) {
+            DPRINTF("%s: Skipping %s in postcopy\n", __func__, se->idstr);
+            continue;
+        }
         trace_savevm_section_start(se->idstr, se->section_id);
         /* Section type */
         qemu_put_byte(f, QEMU_VM_SECTION_END);
@@ -899,7 +945,11 @@ void qemu_savevm_state_complete(QEMUFile *f)
         trace_savevm_section_end(se->idstr, se->section_id);
     }
 
-    qemu_put_byte(f, QEMU_VM_EOF);
+    if (!in_postcopy) {
+        /* Postcopy stream will still be going */
+        qemu_put_byte(f, QEMU_VM_EOF);
+    }
+
     qemu_fflush(f);
 }
 
-- 
1.9.3


* [Qemu-devel] [PATCH v3 27/47] Postcopy: Maintain sentmap during postcopy pre phase
  2014-08-28 15:03 [Qemu-devel] [PATCH v3 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (25 preceding siblings ...)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 26/47] qemu_savevm_state_complete: Postcopy changes Dr. David Alan Gilbert (git)
@ 2014-08-28 15:03 ` Dr. David Alan Gilbert (git)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 28/47] Postcopy page-map-incoming (PMI) structure Dr. David Alan Gilbert (git)
                   ` (19 subsequent siblings)
  46 siblings, 0 replies; 52+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Where postcopy is preceded by a period of precopy, the destination will
have received pages that may have been dirtied on the source after the
page was sent.  The destination must throw these pages away before
starting its CPUs.

Maintain a 'sentmap' of pages that have already been sent.
Calculate the list of sent & dirty pages.
Provide helpers on the destination side to discard these.
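The "sent & dirty" calculation is a plain bitwise AND of two page bitmaps. A page that is dirty but was never sent needs no discard, because the destination has no stale copy of it. The sketch below is a simplified stand-in for `bitmap_and()` as used by `ram_mask_postcopy_bitmap()`. `MAP_LONG_BITS` and the helper names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define MAP_LONG_BITS (sizeof(unsigned long) * CHAR_BIT)

/* sent &= dirty over nbits bits, one long at a time */
static void mask_sent_with_dirty(unsigned long *sent,
                                 const unsigned long *dirty, size_t nbits)
{
    size_t nwords = (nbits + MAP_LONG_BITS - 1) / MAP_LONG_BITS;

    for (size_t i = 0; i < nwords; i++) {
        sent[i] &= dirty[i];
    }
}

/* Nonzero if 'page' is set in the map */
static int map_test(const unsigned long *map, size_t page)
{
    return (map[page / MAP_LONG_BITS] >> (page % MAP_LONG_BITS)) & 1UL;
}
```

For example, if pages 0..3 were sent and pages 1, 3 and 5 are now dirty, only pages 1 and 3 need discarding. Page 5 is simply sent later.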

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c                      | 162 ++++++++++++++++++++++++++++++++++++++-
 include/migration/migration.h    |   5 ++
 include/migration/postcopy-ram.h |  20 +++++
 migration.c                      |   2 +
 postcopy-ram.c                   | 156 +++++++++++++++++++++++++++++++++++++
 savevm.c                         |   3 -
 6 files changed, 342 insertions(+), 6 deletions(-)
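On the destination, `ram_discard_range()` has to translate the source's page indexes when the two sides use different target page sizes. The sketch below assumes, as that function's comment states, that `end` is an inclusive index. If the destination pages are larger, a plain right shift of both ends selects every large page touched. If they are smaller, the end must be stretched to the last small page of the last large page. `scale_discard_range()` is a hypothetical helper; the bit counts are log2 of the page sizes (12 for 4K, 16 for 64K).

```c
#include <assert.h>
#include <stdint.h>

/* Rescale an inclusive page-index range from src_bits pages to dst_bits pages */
static void scale_discard_range(int src_bits, int dst_bits,
                                uint64_t *start, uint64_t *end)
{
    if (src_bits < dst_bits) {
        /* Larger destination pages: the shift rounds start down, and the
         * inclusive end lands on the large page that contains it */
        int shift = dst_bits - src_bits;
        *start >>= shift;
        *end >>= shift;
    } else if (src_bits > dst_bits) {
        /* Smaller destination pages: cover every small page of the range */
        int shift = src_bits - dst_bits;
        *start <<= shift;
        *end = ((*end + 1) << shift) - 1;
    }
    /* equal page sizes: nothing to do */
}
```

With 4K source pages 17..33 (bytes 68K..136K) and 64K destination pages, the sketch yields large pages 1..2, i.e. 64K..192K, which matches the example in the code comment.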

diff --git a/arch_init.c b/arch_init.c
index 315b88e..2c587aa 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -40,6 +40,7 @@
 #include "hw/audio/audio.h"
 #include "sysemu/kvm.h"
 #include "migration/migration.h"
+#include "migration/postcopy-ram.h"
 #include "hw/i386/smbios.h"
 #include "exec/address-spaces.h"
 #include "hw/audio/pcspk.h"
@@ -413,9 +414,15 @@ static int save_xbzrle_page(QEMUFile *f, uint8_t **current_data,
     return bytes_sent;
 }
 
+/* mr: The region to search for dirty pages in
+ * start: Start address (typically so we can continue from previous page)
+ * bitoffset: Pointer into which to store the offset into the dirty map
+ *            at which the bit was found.
+ */
 static inline
 ram_addr_t migration_bitmap_find_and_reset_dirty(MemoryRegion *mr,
-                                                 ram_addr_t start)
+                                                 ram_addr_t start,
+                                                 unsigned long *bitoffset)
 {
     unsigned long base = mr->ram_addr >> TARGET_PAGE_BITS;
     unsigned long nr = base + (start >> TARGET_PAGE_BITS);
@@ -434,6 +441,7 @@ ram_addr_t migration_bitmap_find_and_reset_dirty(MemoryRegion *mr,
         clear_bit(next, migration_bitmap);
         migration_dirty_pages--;
     }
+    *bitoffset = next;
     return (next - base) << TARGET_PAGE_BITS;
 }
 
@@ -562,6 +570,19 @@ static void migration_bitmap_sync(void)
     }
 }
 
+static RAMBlock *ram_find_block(const char *id)
+{
+    RAMBlock *block;
+
+    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+        if (!strcmp(id, block->idstr)) {
+            return block;
+        }
+    }
+
+    return NULL;
+}
+
 /*
  * ram_save_page: Send the given page to the stream
  *
@@ -650,13 +671,14 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage)
     bool complete_round = false;
     int bytes_sent = 0;
     MemoryRegion *mr;
+    unsigned long bitoffset;
 
     if (!block)
         block = QTAILQ_FIRST(&ram_list.blocks);
 
     while (true) {
         mr = block->mr;
-        offset = migration_bitmap_find_and_reset_dirty(mr, offset);
+        offset = migration_bitmap_find_and_reset_dirty(mr, offset, &bitoffset);
         if (complete_round && block == last_seen_block &&
             offset >= last_offset) {
             break;
@@ -674,6 +696,11 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage)
 
             /* if page is unmodified, continue to the next */
             if (bytes_sent > 0) {
+                MigrationState *s = migrate_get_current();
+                if (s->sentmap) {
+                    set_bit(bitoffset, s->sentmap);
+                }
+
                 last_sent_block = block;
                 break;
             }
@@ -733,12 +760,19 @@ void free_xbzrle_decoded_buf(void)
 
 static void migration_end(void)
 {
+    MigrationState *s = migrate_get_current();
+
     if (migration_bitmap) {
         memory_global_dirty_log_stop();
         g_free(migration_bitmap);
         migration_bitmap = NULL;
     }
 
+    if (s->sentmap) {
+        g_free(s->sentmap);
+        s->sentmap = NULL;
+    }
+
     XBZRLE_cache_lock();
     if (XBZRLE.cache) {
         cache_fini(XBZRLE.cache);
@@ -806,6 +840,123 @@ void ram_debug_dump_bitmap(unsigned long *todump, bool expected)
     }
 }
 
+/*
+ * Utility for the outgoing postcopy code; this performs
+ * sentmap &= migration_bitmap
+ * and returns the length of the bitmap in pages
+ */
+int64_t ram_mask_postcopy_bitmap(MigrationState *ms)
+{
+    int64_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
+
+    migration_bitmap_sync();
+    bitmap_and(ms->sentmap, ms->sentmap, migration_bitmap, ram_pages);
+    return ram_pages;
+}
+
+/*
+ * Utility for the outgoing postcopy code.
+ *   Calls postcopy_send_discard_bm_ram for each RAMBlock
+ *   passing it bitmap indexes and name.
+ * Returns: 0 on success
+ * (qemu_ram_foreach_block isn't used: it passes unscaled byte lengths,
+ *  so the postcopy code would have to deal with target page sizes)
+ */
+int ram_postcopy_each_ram_discard(MigrationState *ms)
+{
+    struct RAMBlock *block;
+    int ret;
+
+    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+        /*
+         * Postcopy sends chunks of bitmap over the wire, but at this point
+         * it just needs the bitmap indexes; passing indexes avoids it
+         * needing target-page-specific code.
+         */
+        unsigned long first, last;
+        first = block->offset >> TARGET_PAGE_BITS;
+        last = (block->offset + (block->length-1)) >> TARGET_PAGE_BITS;
+        ret = postcopy_send_discard_bm_ram(ms, block->idstr, first, last);
+        if (ret) {
+            return ret;
+        }
+    }
+
+    return 0;
+}
+
+/*
+ * At the start of the postcopy phase of migration, any now-dirty
+ * precopied pages are discarded.
+ *
+ * start..end is an inclusive range of bits indexed in the source
+ *    VM's bitmap for this RAMBlock; source_target_page_bits tells
+ *    us what one of those bits represents.
+ *
+ * start/end are offsets from the start of the bitmap for RAMBlock 'block_name'
+ *
+ * Returns 0 on success.
+ */
+int ram_discard_range(MigrationIncomingState *mis,
+                      const char *block_name,
+                      int source_target_page_bits,
+                      uint64_t start, uint64_t end)
+{
+    assert(end >= start);
+    unsigned int bitdif;
+
+    RAMBlock *rb = ram_find_block(block_name);
+
+    if (!rb) {
+        error_report("ram_discard_range: Failed to find block '%s'",
+                     block_name);
+        return -1;
+    }
+
+    if (source_target_page_bits != TARGET_PAGE_BITS) {
+        if (source_target_page_bits < TARGET_PAGE_BITS) {
+            /*
+             * e.g. source is 4K and we're 64K - we'll have to discard
+             * on the larger boundary, e.g. for a range of 70K..132K we
+             * would discard 64K..192K.  Both indexes just shift down:
+             * start rounds down, and because end is inclusive its
+             * shifted value is the large page containing it.
+             */
+            bitdif = TARGET_PAGE_BITS - source_target_page_bits;
+            start = start >> bitdif;
+            end = end >> bitdif;
+
+        } else {
+            /*
+             * e.g. source is 64K and we're 4K - scale the indexes; the
+             * inclusive end must cover every small page of the last
+             * large page.
+             */
+            bitdif = source_target_page_bits - TARGET_PAGE_BITS;
+
+            start = start << bitdif;
+            end = ((end + 1) << bitdif) - 1;
+        }
+    }
+
+    uint64_t index_offset = rb->offset >> TARGET_PAGE_BITS;
+    postcopy_pmi_discard_range(mis, start + index_offset, (end - start) + 1);
+
+    /* +1 gives the byte after the end of the last page to be discarded */
+    ram_addr_t end_offset = (end+1) << TARGET_PAGE_BITS;
+    uint8_t *host_startaddr = rb->host + (start << TARGET_PAGE_BITS);
+    uint8_t *host_endaddr;
+
+    if (end_offset <= rb->length) {
+        host_endaddr   = rb->host + (end_offset-1);
+        return postcopy_ram_discard_range(mis, host_startaddr, host_endaddr);
+    } else {
+        error_report("ram_discard_range: Overrun block '%s' (%" PRIu64
+                     "/%" PRIu64 "/%" PRIu64 ")",
+                     block_name, start, end, (uint64_t)rb->length);
+        return -1;
+    }
+}
+
 static int ram_save_setup(QEMUFile *f, void *opaque)
 {
     RAMBlock *block;
@@ -844,7 +995,6 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 
         acct_clear();
     }
-
     qemu_mutex_lock_iothread();
     qemu_mutex_lock_ramlist();
     bytes_transferred = 0;
@@ -854,6 +1004,12 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     migration_bitmap = bitmap_new(ram_bitmap_pages);
     bitmap_set(migration_bitmap, 0, ram_bitmap_pages);
 
+    if (migrate_postcopy_ram()) {
+        MigrationState *s = migrate_get_current();
+        s->sentmap = bitmap_new(ram_bitmap_pages);
+        bitmap_clear(s->sentmap, 0, ram_bitmap_pages);
+    }
+
     /*
      * Count the total number of pages used by ram blocks not including any
      * gaps due to alignment or unplugs.
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 2ff9d35..d307d1c 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -172,6 +172,11 @@ double xbzrle_mig_cache_miss_rate(void);
 
 void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
 void ram_debug_dump_bitmap(unsigned long *todump, bool expected);
+int64_t ram_mask_postcopy_bitmap(MigrationState *ms);
+int ram_postcopy_each_ram_discard(MigrationState *ms);
+int ram_discard_range(MigrationIncomingState *mis, const char *block_name,
+                      int source_target_page_bits,
+                      uint64_t start, uint64_t end);
 
 /**
  * @migrate_add_blocker - prevent migration from proceeding
diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index dcd1afa..fe89a3c 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -13,7 +13,27 @@
 #ifndef QEMU_POSTCOPY_RAM_H
 #define QEMU_POSTCOPY_RAM_H
 
+#include "migration/migration.h"
+
 /* Return 0 if the host supports everything we need to do postcopy-ram */
 int postcopy_ram_hosttest(void);
 
+/* Send the list of sent-but-dirty pages */
+int postcopy_send_discard_bitmap(MigrationState *ms);
+
+/*
+ * Discard the contents of memory start..end inclusive.
+ * We can assume that, if we've been called, postcopy_ram_hosttest succeeded
+ */
+int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
+                               uint8_t *end);
+
+
+/*
+ * Called back from arch_init's ram_postcopy_each_ram_discard to handle
+ * discarding one RAMBlock's pre-postcopy dirty pages
+ */
+int postcopy_send_discard_bm_ram(MigrationState *ms, const char *name,
+                                 unsigned long start, unsigned long end);
+
 #endif
diff --git a/migration.c b/migration.c
index eafd72a..7a80e7c 100644
--- a/migration.c
+++ b/migration.c
@@ -22,6 +22,7 @@
 #include "block/block.h"
 #include "qemu/sockets.h"
 #include "migration/block.h"
+#include "migration/postcopy-ram.h"
 #include "qemu/thread.h"
 #include "qmp-commands.h"
 #include "trace.h"
@@ -938,6 +939,7 @@ static void *migration_thread(void *opaque)
             } else {
                 int ret;
 
+                DPRINTF("done iterating\n");
                 qemu_mutex_lock_iothread();
                 start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
                 qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
diff --git a/postcopy-ram.c b/postcopy-ram.c
index 6017b4d..ba6cf17 100644
--- a/postcopy-ram.c
+++ b/postcopy-ram.c
@@ -23,6 +23,7 @@
 #include "qemu-common.h"
 #include "migration/migration.h"
 #include "migration/postcopy-ram.h"
+#include "sysemu/sysemu.h"
 
 //#define DEBUG_POSTCOPY
 
@@ -138,6 +139,21 @@ out:
     return ret;
 }
 
+/*
+ * Discard the contents of memory start..end inclusive.
+ * We can assume that, if we've been called, postcopy_ram_hosttest succeeded
+ */
+int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
+                               uint8_t *end)
+{
+    if (madvise(start, (end-start)+1, MADV_DONTNEED)) {
+        perror("postcopy_ram_discard_range MADV_DONTNEED");
+        return -1;
+    }
+
+    return 0;
+}
+
 #else
 /* No target OS support, stubs just fail */
 
@@ -147,5 +163,145 @@ int postcopy_ram_hosttest(void)
     return -1;
 }
 
+int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
+                               uint8_t *end)
+{
+    error_report("postcopy_ram_discard_range: No OS support");
+    return -1;
+}
+#endif
+
+/* ------------------------------------------------------------------------- */
+/*
+ * A helper to get 64 bits from the sentmap: trivial for HOST_LONG_BITS=64,
+ * messier for other sizes; pads with 0s at the end if the end is unaligned.
+ *   check2nd32: True if it's safe to read the upper 32 bits in a 32-bit
+ *               long map
+ */
+static uint64_t get_64bits_sentmap(unsigned long *sentmap, bool check2nd32,
+                                   int64_t start)
+{
+    uint64_t result;
+#if HOST_LONG_BITS == 64
+    result = sentmap[start / 64];
+#elif HOST_LONG_BITS == 32
+    /*
+     * Irrespective of host endianness, sentmap[n] is for pages earlier
+     * than sentmap[n+1] so we can't just cast up
+     */
+    uint32_t sm0, sm1;
+    sm0 = sentmap[start / 32];
+    sm1 = check2nd32 ? sentmap[(start / 32) + 1] : 0;
+    result = sm0 | ((uint64_t)sm1) << 32;
+#else
+#error "Host long other than 64/32 not supported"
 #endif
 
+    return result;
+}
+
+/*
+ * Callback from ram_postcopy_each_ram_discard for each RAMBlock
+ * start,end: Indexes into the bitmap for the first and last bit
+ *            representing the named block
+ */
+int postcopy_send_discard_bm_ram(MigrationState *ms, const char *name,
+                                 unsigned long start, unsigned long end)
+{
+    /* Keeps command under 256 bytes - but arbitrary */
+    const unsigned int max_entries_per_command = 12;
+    uint16_t cur_entry;
+    uint64_t buffer[2*max_entries_per_command];
+    unsigned int nsentwords = 0;
+    unsigned int nsentcmds = 0;
+
+    /*
+     * There is no guarantee that start, end are on convenient 64bit multiples
+     * (We always send 64bit chunks over the wire, irrespective of long size)
+     */
+    unsigned long first64, last64, cur64;
+    first64 = start / 64;
+    last64 = end / 64;
+
+    cur_entry = 0;
+    for (cur64 = first64; cur64 <= last64; cur64++) {
+        /* Deal with start/end not on alignment */
+        uint64_t mask;
+        mask = ~(uint64_t)0;
+
+        if ((cur64 == first64) && (start & 63)) {
+            /* e.g. (start & 63) = 3
+             *         1 << .    -> 2^3
+             *         . - 1     -> 2^3 - 1 i.e. mask 2..0
+             *         ~.        -> mask 63..3
+             */
+            mask &= ~((((uint64_t)1) << (start & 63)) - 1);
+        }
+
+        if ((cur64 == last64) && ((end & 63) != 63)) {
+            /* e.g. (end & 63) = 3
+             *            .   +1 -> 4
+             *         1 << .    -> 2^4
+             *         . -1      -> 2^4 - 1
+             *                   = mask set 3..0
+             */
+            mask &= (((uint64_t)1) << ((end & 63) + 1)) - 1;
+        }
+
+        uint64_t data = get_64bits_sentmap(ms->sentmap, (cur64 != last64) ||
+                                           ((end & 63) >= 32), cur64 * 64);
+        data &= mask;
+
+        if (data) {
+            cpu_to_be64w(buffer+2*cur_entry, (cur64-first64));
+            cpu_to_be64w(buffer+1+2*cur_entry, data);
+            cur_entry++;
+            nsentwords++;
+
+            if (cur_entry == max_entries_per_command) {
+                /* Full set, ship it! */
+                qemu_savevm_send_postcopy_ram_discard(ms->file, name,
+                                                      cur_entry,
+                                                      start & 63,
+                                                      buffer);
+                nsentcmds++;
+                cur_entry = 0;
+            }
+        }
+    }
+
+    /* Anything unsent? */
+    if (cur_entry) {
+        qemu_savevm_send_postcopy_ram_discard(ms->file, name, cur_entry,
+                                              start & 63, buffer);
+        nsentcmds++;
+    }
+
+    /*fprintf(stderr, "postcopy_send_discard_bm_ram: '%s' mask words"
+                      " sent=%d in %d commands.\n",
+            name, nsentwords, nsentcmds);*/
+
+    return 0;
+}
+
+/*
+ * Transmit to the destination the set of pages to be discarded after
+ * precopy: pages that were sent previously but have since been dirtied.
+ * Hopefully this set is pretty sparse.
+ */
+int postcopy_send_discard_bitmap(MigrationState *ms)
+{
+    /*
+     * Update the sentmap to be sentmap &= dirty
+     * (arch_init gives us the full size as a return)
+     */
+    ram_mask_postcopy_bitmap(ms);
+
+    DPRINTF("Dumping merged sentmap");
+#ifdef DEBUG_POSTCOPY
+    ram_debug_dump_bitmap(ms->sentmap, false);
+#endif
+
+    return ram_postcopy_each_ram_discard(ms);
+}
+
diff --git a/savevm.c b/savevm.c
index d54286f..0cae88b 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1244,12 +1244,9 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
              * we know there must be at least 1 bit set due to the loop entry
              * If there is no 0 firstzero will be 64
              */
-            /* TODO - ram_discard_range gets added in a later patch
             int ret = ram_discard_range(mis, ramid, source_target_page_bits,
                                 startaddr + firstset - first_bit_offset,
                                 startaddr + (firstzero - 1) - first_bit_offset);
-             */
-            ret = -1; /* TODO */
             if (ret) {
                 return ret;
             }
-- 
1.9.3
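`postcopy_send_discard_bm_ram()` always ships whole 64-bit words, so the first and last words of a RAMBlock's range have to be masked down to the bits that really belong to that block. For the upper bound, the position within the word is the low six bits of `end` (`end & 63`). The hypothetical helper below computes the mask for one word of an inclusive bit range and reproduces the arithmetic from the comments in the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Mask of the bits of 64-bit word 'word' (covering bit indexes
 * word*64 .. word*64+63) that lie inside the inclusive range start..end.
 */
static uint64_t range_mask_for_word(uint64_t start, uint64_t end,
                                    uint64_t word)
{
    uint64_t mask = ~(uint64_t)0;

    if (word == start / 64 && (start & 63)) {
        /* drop bits below start: (start & 63) == 3 keeps bits 63..3 */
        mask &= ~(((uint64_t)1 << (start & 63)) - 1);
    }
    if (word == end / 64 && (end & 63) != 63) {
        /* drop bits above end: (end & 63) == 3 keeps bits 3..0 */
        mask &= ((uint64_t)1 << ((end & 63) + 1)) - 1;
    }
    return mask;
}
```

The `(end & 63) != 63` guard is what keeps the shift below 64. Shifting a 64-bit value by 64 is undefined in C, so a fully used last word must be left with the all-ones mask.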


* [Qemu-devel] [PATCH v3 28/47] Postcopy page-map-incoming (PMI) structure
  2014-08-28 15:03 [Qemu-devel] [PATCH v3 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (26 preceding siblings ...)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 27/47] Postcopy: Maintain sentmap during postcopy pre phase Dr. David Alan Gilbert (git)
@ 2014-08-28 15:03 ` Dr. David Alan Gilbert (git)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 29/47] postcopy: Add incoming_init/cleanup functions Dr. David Alan Gilbert (git)
                   ` (18 subsequent siblings)
  46 siblings, 0 replies; 52+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The PMI holds the state of each page on the incoming side,
so that we can tell if the page is missing, already received
or there is a request outstanding for it.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h    |  18 +++++
 include/migration/postcopy-ram.h |  10 +++
 include/qemu/typedefs.h          |   1 +
 postcopy-ram.c                   | 144 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 173 insertions(+)
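The PMI encodes three page states in two bits per page. Below is a single-threaded sketch of that encoding and of a compare-and-set transition, using hypothetical names (`PMIState`, `pmi_state`, `pmi_change`) and plain `bool`s instead of the two bitmaps. The real code holds `postcopy_pmi.mutex` around the same steps. The transition rule is one plausible reading, consistent with the assertion in `postcopy_pmi_get_state_nolock()` that a received page is never also marked as requested.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    PMI_MISSING,    /* neither bit set */
    PMI_REQUESTED,  /* requested bit set */
    PMI_RECEIVED,   /* received bit set */
} PMIState;

/* Combine the two per-page bits into a state */
static PMIState pmi_state(bool received, bool requested)
{
    if (received) {
        assert(!requested);  /* a received page can't still be requested */
        return PMI_RECEIVED;
    }
    return requested ? PMI_REQUESTED : PMI_MISSING;
}

/* Move to new_state only if the current state is 'expected';
 * always return the state that was found */
static PMIState pmi_change(bool *received, bool *requested,
                           PMIState expected, PMIState new_state)
{
    PMIState old = pmi_state(*received, *requested);

    if (old == expected) {
        /* going back to MISSING is done by a range discard, not here */
        assert(new_state != PMI_MISSING);
        *received = (new_state == PMI_RECEIVED);
        *requested = (new_state == PMI_REQUESTED);
    }
    return old;
}
```

Returning the old state lets a caller that loses a race (for example, a page arriving just as a fault requests it) see what actually happened without taking the lock a second time.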

diff --git a/include/migration/migration.h b/include/migration/migration.h
index d307d1c..b6e8cda 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -57,6 +57,23 @@ struct MigrationRetPathState {
 
 typedef struct MigrationState MigrationState;
 
+/* Postcopy page-map-incoming - data about each page on the inbound side */
+
+typedef enum {
+    POSTCOPY_PMI_MISSING,   /* page hasn't yet been received */
+    POSTCOPY_PMI_REQUESTED, /* Kernel asked for a page, but we've not got it */
+    POSTCOPY_PMI_RECEIVED   /* We've got the page */
+} PostcopyPMIState;
+
+struct PostcopyPMI {
+    /* TODO: I'm expecting to rework this using some atomic compare-exchange
+     * thing, which will require merging the maps together
+     */
+    QemuMutex      mutex;
+    unsigned long *received_map;  /* Pages that we have received */
+    unsigned long *requested_map; /* Pages that we're sending a request for */
+};
+
 /* State for the incoming migration */
 struct MigrationIncomingState {
     QEMUFile *file;
@@ -71,6 +88,7 @@ struct MigrationIncomingState {
 
     QEMUFile *return_path;
     QemuMutex      rp_mutex;    /* We send replies from multiple threads */
+    PostcopyPMI    postcopy_pmi;
 };
 
 MigrationIncomingState *migration_incoming_get_current(void);
diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index fe89a3c..04c2ec8 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -36,4 +36,14 @@ int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
 int postcopy_send_discard_bm_ram(MigrationState *ms, const char *name,
                                  unsigned long start, unsigned long end);
 
+/*
+ * In 'advise' mode, record that a page has been received.
+ */
+void postcopy_hook_early_receive(MigrationIncomingState *mis,
+                                 size_t bitmap_index);
+
+void postcopy_pmi_destroy(MigrationIncomingState *mis);
+void postcopy_pmi_discard_range(MigrationIncomingState *mis,
+                                size_t start, size_t npages);
+void postcopy_pmi_dump(MigrationIncomingState *mis);
 #endif
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 8539de6..61b330c 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -77,6 +77,7 @@ typedef struct QEMUSGList QEMUSGList;
 typedef struct SHPCDevice SHPCDevice;
 typedef struct FWCfgState FWCfgState;
 typedef struct PcGuestInfo PcGuestInfo;
+typedef struct PostcopyPMI PostcopyPMI;
 typedef struct Range Range;
 typedef struct AdapterInfo AdapterInfo;
 
diff --git a/postcopy-ram.c b/postcopy-ram.c
index ba6cf17..86ce68c 100644
--- a/postcopy-ram.c
+++ b/postcopy-ram.c
@@ -23,6 +23,8 @@
 #include "qemu-common.h"
 #include "migration/migration.h"
 #include "migration/postcopy-ram.h"
+#include "qemu/bitmap.h"
+#include "qemu/error-report.h"
 #include "sysemu/sysemu.h"
 
 //#define DEBUG_POSTCOPY
@@ -74,6 +76,140 @@
 #define USERFAULTFD_PROTOCOL (uint64_t)0xaa
 #endif
 
+/* ---------------------------------------------------------------------- */
+/* Postcopy pagemap-inbound (pmi) - data structures that record the       */
+/* state of each page used by the inbound postcopy                        */
+
+static void postcopy_pmi_init(MigrationIncomingState *mis, size_t ram_pages)
+{
+    qemu_mutex_init(&mis->postcopy_pmi.mutex);
+    mis->postcopy_pmi.received_map = bitmap_new(ram_pages);
+    mis->postcopy_pmi.requested_map = bitmap_new(ram_pages);
+    bitmap_clear(mis->postcopy_pmi.received_map, 0, ram_pages);
+    bitmap_clear(mis->postcopy_pmi.requested_map, 0, ram_pages);
+}
+
+void postcopy_pmi_destroy(MigrationIncomingState *mis)
+{
+    if (mis->postcopy_pmi.received_map) {
+        g_free(mis->postcopy_pmi.received_map);
+        mis->postcopy_pmi.received_map = NULL;
+    }
+    if (mis->postcopy_pmi.requested_map) {
+        g_free(mis->postcopy_pmi.requested_map);
+        mis->postcopy_pmi.requested_map = NULL;
+    }
+    qemu_mutex_destroy(&mis->postcopy_pmi.mutex);
+}
+
+/*
+ * Mark a set of pages in the PMI as being clear; this is used by the discard
+ * at the start of postcopy, and before the postcopy stream starts.
+ */
+void postcopy_pmi_discard_range(MigrationIncomingState *mis,
+                                size_t start, size_t npages)
+{
+    bitmap_clear(mis->postcopy_pmi.received_map, start, npages);
+}
+
+/*
+ * Retrieve the state of the given page
+ * Note: this version is for callers that already hold the lock
+ */
+static PostcopyPMIState postcopy_pmi_get_state_nolock(
+                            MigrationIncomingState *mis,
+                            size_t bitmap_index)
+{
+    bool received, requested;
+
+    received = test_bit(bitmap_index, mis->postcopy_pmi.received_map);
+    requested = test_bit(bitmap_index, mis->postcopy_pmi.requested_map);
+
+    if (received) {
+        assert(!requested);
+        return POSTCOPY_PMI_RECEIVED;
+    } else {
+        return requested ? POSTCOPY_PMI_REQUESTED : POSTCOPY_PMI_MISSING;
+    }
+}
+
+/* Retrieve the state of the given page */
+static PostcopyPMIState postcopy_pmi_get_state(MigrationIncomingState *mis,
+                                               size_t bitmap_index)
+{
+    PostcopyPMIState ret;
+    qemu_mutex_lock(&mis->postcopy_pmi.mutex);
+    ret = postcopy_pmi_get_state_nolock(mis, bitmap_index);
+    qemu_mutex_unlock(&mis->postcopy_pmi.mutex);
+
+    return ret;
+}
+
+/*
+ * Set the page state to the given state if the previous state was as expected
+ * Return the actual previous state.
+ */
+static PostcopyPMIState postcopy_pmi_change_state(MigrationIncomingState *mis,
+                                           size_t bitmap_index,
+                                           PostcopyPMIState expected_state,
+                                           PostcopyPMIState new_state)
+{
+    PostcopyPMIState old_state;
+
+    qemu_mutex_lock(&mis->postcopy_pmi.mutex);
+    old_state = postcopy_pmi_get_state_nolock(mis, bitmap_index);
+
+    if (old_state == expected_state) {
+        switch (new_state) {
+        case POSTCOPY_PMI_MISSING:
+            assert(0); /* This shouldn't actually happen - use discard_range */
+            break;
+
+        case POSTCOPY_PMI_REQUESTED:
+            assert(old_state == POSTCOPY_PMI_MISSING);
+            set_bit(bitmap_index, mis->postcopy_pmi.requested_map);
+            break;
+
+        case POSTCOPY_PMI_RECEIVED:
+            assert(old_state == POSTCOPY_PMI_MISSING ||
+                   old_state == POSTCOPY_PMI_REQUESTED);
+            set_bit(bitmap_index, mis->postcopy_pmi.received_map);
+            clear_bit(bitmap_index, mis->postcopy_pmi.requested_map);
+            break;
+        }
+    }
+
+    qemu_mutex_unlock(&mis->postcopy_pmi.mutex);
+    return old_state;
+}
+
+/*
+ * Useful when debugging postcopy, although if it failed early the
+ * received map can be quite sparse and thus big when dumped.
+ */
+void postcopy_pmi_dump(MigrationIncomingState *mis)
+{
+    fprintf(stderr, "postcopy_pmi_dump: requested\n");
+    ram_debug_dump_bitmap(mis->postcopy_pmi.requested_map, false);
+    fprintf(stderr, "postcopy_pmi_dump: received\n");
+    ram_debug_dump_bitmap(mis->postcopy_pmi.received_map, true);
+    fprintf(stderr, "postcopy_pmi_dump: end\n");
+}
+
+/* Called by ram_load prior to mapping the page */
+void postcopy_hook_early_receive(MigrationIncomingState *mis,
+                                 size_t bitmap_index)
+{
+    if (mis->postcopy_ram_state == POSTCOPY_RAM_INCOMING_ADVISE) {
+        /*
+         * If we're in precopy-advise mode we need to track received pages even
+         * though we don't need to place pages atomically yet.
+         * In advise mode there's only a single thread, so we don't need locks
+         */
+        set_bit(bitmap_index, mis->postcopy_pmi.received_map);
+    }
+}
+
 int postcopy_ram_hosttest(void)
 {
     /* TODO: Needs guarding with CONFIG_ once we have libc's that have the defs
@@ -169,6 +305,14 @@ int postcopy_ram_discard_range(MigrationIncomingState *mis, void *start,
     error_report("postcopy_ram_discard_range: No OS support");
     return -1;
 }
+
+/* Called by ram_load prior to mapping the page */
+void postcopy_hook_early_receive(MigrationIncomingState *mis,
+                                 size_t bitmap_index)
+{
+    /* We don't support postcopy so don't care */
+}
+
 #endif
 
 /* ------------------------------------------------------------------------- */
-- 
1.9.3
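The PMI in the patch above encodes three per-page states in two bitmaps: RECEIVED when the received_map bit is set, REQUESTED when only the requested_map bit is set, and MISSING when neither is. Below is a standalone sketch of that encoding and of the compare-then-change update done by postcopy_pmi_change_state, assuming plain `unsigned long` arrays and a pthread mutex in place of QEMU's bitmap helpers and QemuMutex. All `pmi_*` and `bit_*` names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for QEMU's bitmap helpers and QemuMutex. */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

typedef enum { PMI_MISSING, PMI_REQUESTED, PMI_RECEIVED } pmi_state;

typedef struct {
    pthread_mutex_t mutex;
    unsigned long *received_map;   /* bit set: page has arrived */
    unsigned long *requested_map;  /* bit set: page asked for, not arrived */
} pmi;

static bool bit_test(const unsigned long *map, size_t i)
{
    return (map[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1UL;
}

static void bit_set(unsigned long *map, size_t i)
{
    map[i / BITS_PER_WORD] |= 1UL << (i % BITS_PER_WORD);
}

static void bit_clear(unsigned long *map, size_t i)
{
    map[i / BITS_PER_WORD] &= ~(1UL << (i % BITS_PER_WORD));
}

/* calloc() zeroes both maps, so every page starts out MISSING. */
static int pmi_init(pmi *p, size_t pages)
{
    size_t words = (pages + BITS_PER_WORD - 1) / BITS_PER_WORD;

    p->received_map = calloc(words, sizeof(unsigned long));
    p->requested_map = calloc(words, sizeof(unsigned long));
    if (!p->received_map || !p->requested_map) {
        free(p->received_map);
        free(p->requested_map);
        return -1;
    }
    pthread_mutex_init(&p->mutex, NULL);
    return 0;
}

static void pmi_destroy(pmi *p)
{
    free(p->received_map);
    free(p->requested_map);
    p->received_map = p->requested_map = NULL;
    pthread_mutex_destroy(&p->mutex);
}

/* Decode the two bits for one page; caller must hold p->mutex. */
static pmi_state pmi_get_state_nolock(const pmi *p, size_t i)
{
    if (bit_test(p->received_map, i)) {
        assert(!bit_test(p->requested_map, i)); /* never both set */
        return PMI_RECEIVED;
    }
    return bit_test(p->requested_map, i) ? PMI_REQUESTED : PMI_MISSING;
}

/* Move page i to new_state only if it is currently 'expected'. */
static pmi_state pmi_change_state(pmi *p, size_t i, pmi_state expected,
                                  pmi_state new_state)
{
    pthread_mutex_lock(&p->mutex);
    pmi_state old = pmi_get_state_nolock(p, i);

    if (old == expected) {
        assert(new_state != PMI_MISSING);   /* use a discard for that */
        if (new_state == PMI_REQUESTED) {
            assert(old == PMI_MISSING);
            bit_set(p->requested_map, i);
        } else {                            /* PMI_RECEIVED */
            assert(old != PMI_RECEIVED);
            bit_set(p->received_map, i);
            bit_clear(p->requested_map, i);
        }
    }
    pthread_mutex_unlock(&p->mutex);
    return old;
}
```

Because the update only happens when the current state matches the expected one, two threads racing to request the same page get different return values: only the one that sees PMI_MISSING returned actually changed the state, which a caller could use to avoid sending duplicate page requests.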


* [Qemu-devel] [PATCH v3 29/47] postcopy: Add incoming_init/cleanup functions
  2014-08-28 15:03 [Qemu-devel] [PATCH v3 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (27 preceding siblings ...)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 28/47] Postcopy page-map-incoming (PMI) structure Dr. David Alan Gilbert (git)
@ 2014-08-28 15:03 ` Dr. David Alan Gilbert (git)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 30/47] postcopy: Incoming initialisation Dr. David Alan Gilbert (git)
                   ` (17 subsequent siblings)
  46 siblings, 0 replies; 52+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Provide functions to be called before the start of a postcopy-enabled
migration (even if postcopy is not eventually used) and at its end.

During init we must disable huge pages in the RAM that we will receive
postcopy data into: if an area starts off as a huge page and a 4k page
is written to it, the rest of the huge page won't get userfault'd and
won't work as a destination for remap.
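init_area does this with postcopy_ram_discard_range plus madvise(MADV_NOHUGEPAGE), and cleanup_area turns huge pages back on at the end. Below is a standalone Linux sketch of those two steps on an anonymous private mapping standing in for a RAMBlock. It assumes MADV_DONTNEED as the discard (on such a mapping the next access then sees zero-filled pages). The helper names are hypothetical, and MADV_USERFAULT/MADV_NOUSERFAULT are left out because they are not defined by mainline kernel headers.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* MAP_ANONYMOUS and MADV_* under strict -std= */
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: make a page-aligned range usable for postcopy later.
 * Returns 0 on success, -1 on failure.
 */
static int prepare_postcopy_area(void *host, size_t length)
{
    /* Drop anything already populated; the next access sees zero pages. */
    if (madvise(host, length, MADV_DONTNEED)) {
        perror("prepare_postcopy_area: DONTNEED");
        return -1;
    }
    /*
     * Keep the range on small pages so a single 4k page can be placed later.
     * Assumption: EINVAL means the kernel has no THP support, which is fine.
     */
    if (madvise(host, length, MADV_NOHUGEPAGE) && errno != EINVAL) {
        perror("prepare_postcopy_area: NOHUGEPAGE");
        return -1;
    }
    return 0;
}

/* Hypothetical helper: lift the huge page restriction once all pages arrived. */
static int finish_postcopy_area(void *host, size_t length)
{
    if (madvise(host, length, MADV_HUGEPAGE) && errno != EINVAL) {
        perror("finish_postcopy_area: HUGEPAGE");
        return -1;
    }
    return 0;
}
```

In the series the equivalent work is applied to every RAM block through qemu_ram_foreach_block before any precopy data arrives; the sketch only shows a single range.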

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/postcopy-ram.h |  12 +++++
 postcopy-ram.c                   | 110 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 122 insertions(+)

diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index 04c2ec8..c2d2e64 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -30,6 +30,18 @@ int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
 
 
 /*
+ * Initialise postcopy-ram, setting the RAM to a state where we can go into
+ * postcopy later; must be called prior to any precopy.
+ * Called from arch_init's similarly named ram_postcopy_incoming_init.
+ */
+int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages);
+
+/*
+ * Clean up after a migration where postcopy_ram_incoming_init was called.
+ */
+int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis);
+
+/*
  * Called back from arch_init's ram_postcopy_each_ram_discard to handle
  * discarding one RAMBlock's pre-postcopy dirty pages
  */
diff --git a/postcopy-ram.c b/postcopy-ram.c
index 86ce68c..b1917e7 100644
--- a/postcopy-ram.c
+++ b/postcopy-ram.c
@@ -290,6 +290,104 @@ int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
     return 0;
 }
 
+/*
+ * Set up an area of RAM so that it *can* be used for postcopy later; this
+ * must be done right at the start prior to pre-copy.
+ * opaque should be the MIS.
+ */
+static int init_area(const char *block_name, void *host_addr,
+                     ram_addr_t offset, ram_addr_t length, void *opaque)
+{
+    MigrationIncomingState *mis = opaque;
+
+    DPRINTF("init_area: %s: %p offset=%zx length=%zd(%zx)",
+            block_name, host_addr, offset, length, length);
+    /*
+     * We need the whole of RAM to be truly empty for postcopy, so things
+     * like ROMs and any data tables built during init must be zero'd
+     * - we're going to get the copy from the source anyway.
+     */
+    if (postcopy_ram_discard_range(mis, host_addr, (host_addr + length - 1))) {
+        return -1;
+    }
+
+    /*
+     * We also need the area to be backed by normal 4k pages, not huge
+     * pages (otherwise we can't be sure we can use remap_anon_pages to
+     * put a 4k page in later).  THP might map a 2MB page, and when that
+     * page is partially accessed in precopy it might not be broken down,
+     * leaving a 2MB zeroed page.
+     */
+    if (madvise(host_addr, length, MADV_NOHUGEPAGE)) {
+        perror("init_area: NOHUGEPAGE");
+        return -1;
+    }
+
+    return 0;
+}
+
+/*
+ * At the end of migration, undo the effects of init_area
+ * opaque should be the MIS.
+ */
+static int cleanup_area(const char *block_name, void *host_addr,
+                        ram_addr_t offset, ram_addr_t length, void *opaque)
+{
+    /* Turn off userfault here as well? */
+
+    DPRINTF("cleanup_area: %s: %p offset=%zx length=%zd(%zx)",
+            block_name, host_addr, offset, length, length);
+    /*
+     * We turned off huge pages for the precopy stage with postcopy enabled;
+     * we can turn them back on now.
+     */
+    if (madvise(host_addr, length, MADV_HUGEPAGE)) {
+        perror("cleanup_area: HUGEPAGE");
+        return -1;
+    }
+
+    /*
+     * We can also turn off userfault now since we should have all the
+     * pages.  It can be useful to leave it on to debug postcopy
+     * if you're not sure it's always getting every page.
+     */
+    if (madvise(host_addr, length, MADV_NOUSERFAULT)) {
+        perror("cleanup_area: NOUSERFAULT");
+        return -1;
+    }
+
+    return 0;
+}
+
+/*
+ * Initialise postcopy-ram, setting the RAM to a state where we can go into
+ * postcopy later; must be called prior to any precopy.
+ * Called from arch_init's similarly named ram_postcopy_incoming_init.
+ */
+int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages)
+{
+    postcopy_pmi_init(mis, ram_pages);
+
+    if (qemu_ram_foreach_block(init_area, mis)) {
+        return -1;
+    }
+
+    return 0;
+}
+
+/*
+ * Clean up after a migration where postcopy_ram_incoming_init was called.
+ */
+int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
+{
+    /* TODO: Join the fault thread once we're sure it will exit */
+    if (qemu_ram_foreach_block(cleanup_area, mis)) {
+        return -1;
+    }
+
+    return 0;
+}
+
 #else
 /* No target OS support, stubs just fail */
 
@@ -313,6 +411,18 @@ void postcopy_hook_early_receive(MigrationIncomingState *mis,
     /* We don't support postcopy so don't care */
 }
 
+int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages)
+{
+    error_report("postcopy_ram_incoming_init: No OS support");
+    return -1;
+}
+
+int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
+{
+    error_report("postcopy_ram_incoming_cleanup: No OS support");
+    return -1;
+}
+
 #endif
 
 /* ------------------------------------------------------------------------- */
-- 
1.9.3


* [Qemu-devel] [PATCH v3 30/47] postcopy: Incoming initialisation
  2014-08-28 15:03 [Qemu-devel] [PATCH v3 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (28 preceding siblings ...)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 29/47] postcopy: Add incoming_init/cleanup functions Dr. David Alan Gilbert (git)
@ 2014-08-28 15:03 ` Dr. David Alan Gilbert (git)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 31/47] postcopy: ram_enable_notify to switch on userfault Dr. David Alan Gilbert (git)
                   ` (16 subsequent siblings)
  46 siblings, 0 replies; 52+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c                   | 11 +++++++++++
 include/migration/migration.h |  1 +
 migration.c                   |  1 +
 3 files changed, 13 insertions(+)

diff --git a/arch_init.c b/arch_init.c
index 2c587aa..d4144e4 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -1234,6 +1234,17 @@ void ram_handle_compressed(void *host, uint8_t ch, uint64_t size)
     }
 }
 
+/*
+ * Allocate data structures etc needed by incoming migration with postcopy-ram
+ * postcopy-ram's similarly named postcopy_ram_incoming_init does the work
+ */
+int ram_postcopy_incoming_init(MigrationIncomingState *mis)
+{
+    size_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
+
+    return postcopy_ram_incoming_init(mis, ram_pages);
+}
+
 static int ram_load(QEMUFile *f, void *opaque, int version_id)
 {
     ram_addr_t addr;
diff --git a/include/migration/migration.h b/include/migration/migration.h
index b6e8cda..91269c8 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -195,6 +195,7 @@ int ram_postcopy_each_ram_discard(MigrationState *ms);
 int ram_discard_range(MigrationIncomingState *mis, const char *block_name,
                       int source_target_page_bits,
                       uint64_t start, uint64_t end);
+int ram_postcopy_incoming_init(MigrationIncomingState *mis);
 
 /**
  * @migrate_add_blocker - prevent migration from proceeding
diff --git a/migration.c b/migration.c
index 7a80e7c..f5d9d02 100644
--- a/migration.c
+++ b/migration.c
@@ -99,6 +99,7 @@ MigrationIncomingState *migration_incoming_state_init(QEMUFile* f)
 
 void migration_incoming_state_destroy(void)
 {
+    postcopy_pmi_destroy(mis_current);
     g_free(mis_current);
     mis_current = NULL;
 }
-- 
1.9.3


* [Qemu-devel] [PATCH v3 31/47] postcopy: ram_enable_notify to switch on userfault
  2014-08-28 15:03 [Qemu-devel] [PATCH v3 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (29 preceding siblings ...)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 30/47] postcopy: Incoming initialisation Dr. David Alan Gilbert (git)
@ 2014-08-28 15:03 ` Dr. David Alan Gilbert (git)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 32/47] Postcopy: postcopy_start Dr. David Alan Gilbert (git)
                   ` (15 subsequent siblings)
  46 siblings, 0 replies; 52+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/postcopy-ram.h |  5 +++++
 postcopy-ram.c                   | 36 +++++++++++++++++++++++++++++++++++-
 2 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index c2d2e64..18fc4d9 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -28,6 +28,11 @@ int postcopy_send_discard_bitmap(MigrationState *ms);
 int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
                                uint8_t *end);
 
+/*
+ * Make all of RAM sensitive to accesses to areas that haven't yet been written
+ * and wire up anything necessary to deal with it.
+ */
+int postcopy_ram_enable_notify(MigrationIncomingState *mis);
 
 /*
  * Initialise postcopy-ram, setting the RAM to a state where we can go into
diff --git a/postcopy-ram.c b/postcopy-ram.c
index b1917e7..18b27b4 100644
--- a/postcopy-ram.c
+++ b/postcopy-ram.c
@@ -388,9 +388,38 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
     return 0;
 }
 
+/*
+ * Mark the given area of RAM as requiring notification of accesses to
+ * unwritten areas.  Used as a callback on qemu_ram_foreach_block.
+ *   host_addr: Base of area to mark
+ *   offset: Offset in the whole ram arena
+ *   length: Length of the section
+ *   opaque: Unused
+ * Returns 0 on success
+ */
+static int postcopy_ram_sensitise_area(const char *block_name, void *host_addr,
+                                       ram_addr_t offset, ram_addr_t length,
+                                       void *opaque)
+{
+    if (madvise(host_addr, length, MADV_USERFAULT)) {
+        perror("postcopy_ram_sensitise_area");
+        return -1;
+    }
+    return 0;
+}
+
+int postcopy_ram_enable_notify(MigrationIncomingState *mis)
+{
+    /* Mark so that we get notified of accesses to unwritten areas */
+    if (qemu_ram_foreach_block(postcopy_ram_sensitise_area, NULL)) {
+        return -1;
+    }
+
+    return 0;
+}
+
 #else
 /* No target OS support, stubs just fail */
-
 int postcopy_ram_hosttest(void)
 {
     error_report("postcopy_ram_hosttest: No OS support");
@@ -423,6 +452,11 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
     return -1;
 }
 
+int postcopy_ram_enable_notify(MigrationIncomingState *mis)
+{
+    error_report("postcopy_ram_enable_notify: No OS support");
+    return -1;
+}
 #endif
 
 /* ------------------------------------------------------------------------- */
-- 
1.9.3


* [Qemu-devel] [PATCH v3 32/47] Postcopy: postcopy_start
  2014-08-28 15:03 [Qemu-devel] [PATCH v3 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (30 preceding siblings ...)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 31/47] postcopy: ram_enable_notify to switch on userfault Dr. David Alan Gilbert (git)
@ 2014-08-28 15:03 ` Dr. David Alan Gilbert (git)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 33/47] Postcopy: Rework migration thread for postcopy mode Dr. David Alan Gilbert (git)
                   ` (14 subsequent siblings)
  46 siblings, 0 replies; 52+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

postcopy_start:
  Perform all the initialisation associated with starting up postcopy
mode from the source.
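postcopy_start buffers the remaining device state into one package (through qemu_bufopen) so the main stream stays free for page transfers while that state loads. It also refuses to send the package if qsb_get_length reports more than MAX_VM_CMD_PACKAGED_SIZE. Below is a minimal sketch of that buffer-then-check pattern using POSIX open_memstream(). It assumes a hypothetical MAX_PACKAGED_SIZE and hypothetical package_* helpers in place of QEMUSizedBuffer and qemu_savevm_send_packaged.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* expose open_memstream() under strict -std= */
#endif
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical cap standing in for MAX_VM_CMD_PACKAGED_SIZE. */
#define MAX_PACKAGED_SIZE (1u << 16)

/*
 * Start an in-memory package.  *buf and *len only become valid after
 * package_finish(), because open_memstream() updates them on fclose().
 */
static FILE *package_open(char **buf, size_t *len)
{
    *buf = NULL;
    *len = 0;
    return open_memstream(buf, len);
}

/*
 * Close the package and decide whether it may be sent as one blob.
 * Returns 0 if *buf and *len are ready to send (the caller frees *buf);
 * otherwise frees the buffer, sets *buf to NULL and returns -1.
 */
static int package_finish(FILE *fb, char **buf, size_t *len)
{
    assert(fb && buf && len);
    int closed = fclose(fb);

    if (closed == 0 && *len <= MAX_PACKAGED_SIZE) {
        return 0;
    }
    if (closed == 0) {
        fprintf(stderr, "Unreasonably large packaged state: %zu\n", *len);
    }
    free(*buf);
    *buf = NULL;
    return -1;
}
```

Writing the package completely before sending it means the size is known up front, so an oversized package can be rejected before any of it reaches the destination.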

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration.c | 85 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 85 insertions(+)

diff --git a/migration.c b/migration.c
index f5d9d02..1ae5b7d 100644
--- a/migration.c
+++ b/migration.c
@@ -899,6 +899,91 @@ static void await_outgoing_return_path_close(MigrationState *ms)
     DPRINTF("%s: Exit", __func__);
 }
 
+/* Switch from normal iteration to postcopy
+ * Returns non-0 on error
+ */
+static int postcopy_start(MigrationState *ms)
+{
+    int ret;
+    const QEMUSizedBuffer *qsb;
+    migrate_set_state(ms, MIG_STATE_ACTIVE, MIG_STATE_POSTCOPY_ACTIVE);
+
+    DPRINTF("postcopy_start\n");
+    qemu_mutex_lock_iothread();
+    DPRINTF("postcopy_start: setting run state\n");
+    ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
+
+    if (ret < 0) {
+        migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
+        qemu_mutex_unlock_iothread();
+        return -1;
+    }
+
+    /*
+     * In FINISH_MIGRATE with the iothread lock held, everything should
+     * be quiet, but we may still have dirty pages, and we need to tell
+     * the destination to discard any pages it has already received that
+     * are dirty.
+     */
+    if (postcopy_send_discard_bitmap(ms)) {
+        DPRINTF("postcopy send discard bitmap failed\n");
+        migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
+        qemu_mutex_unlock_iothread();
+        return -1;
+    }
+
+    DPRINTF("postcopy_start: sending req 2\n");
+    qemu_savevm_send_reqack(ms->file, 2);
+    /*
+     * Send the rest of the state; note that anything doing postcopy
+     * will notice we're in MIG_STATE_POSTCOPY_ACTIVE and won't
+     * actually wrap its state up here.
+     */
+    qemu_file_set_rate_limit(ms->file, INT64_MAX);
+    DPRINTF("postcopy_start: do state_complete\n");
+
+    /*
+     * We need to leave the fd free for page transfers during the
+     * loading of the device state, so wrap all the remaining
+     * commands and state into a package that gets sent in one go
+     */
+    QEMUFile *fb = qemu_bufopen("w", NULL);
+
+    qemu_savevm_state_complete(fb);
+    DPRINTF("postcopy_start: sending req 3\n");
+    qemu_savevm_send_reqack(fb, 3);
+
+    qemu_savevm_send_postcopy_ram_run(fb);
+
+    /* End of the data going into the package */
+    qsb = qemu_buf_get(fb);
+
+    /* Now send that blob */
+    if (qsb_get_length(qsb) > MAX_VM_CMD_PACKAGED_SIZE) {
+        error_report("postcopy_start: Unreasonably large packaged state: %lu",
+                     (unsigned long)(qsb_get_length(qsb)));
+        migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
+        qemu_mutex_unlock_iothread();
+        qemu_fclose(fb);
+        return -1;
+    }
+    qemu_savevm_send_packaged(ms->file, qsb);
+    qemu_fclose(fb);
+
+    qemu_mutex_unlock_iothread();
+
+    DPRINTF("postcopy_start not finished sending ack\n");
+    qemu_savevm_send_reqack(ms->file, 4);
+
+    ret = qemu_file_get_error(ms->file);
+    if (ret) {
+        error_report("postcopy_start: Migration stream errored");
+        migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
+    }
+
+    return ret;
+}
+
 /*
  * Master migration thread on the source VM.
  * It drives the migration and pumps the data down the outgoing channel.
-- 
1.9.3


* [Qemu-devel] [PATCH v3 33/47] Postcopy: Rework migration thread for postcopy mode
  2014-08-28 15:03 [Qemu-devel] [PATCH v3 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (31 preceding siblings ...)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 32/47] Postcopy: postcopy_start Dr. David Alan Gilbert (git)
@ 2014-08-28 15:03 ` Dr. David Alan Gilbert (git)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 34/47] mig fd_connect: open return path Dr. David Alan Gilbert (git)
                   ` (13 subsequent siblings)
  46 siblings, 0 replies; 52+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Switch to postcopy if:
   1) There's still a significant amount to transfer
   2) Postcopy is enabled
   3) migrate_postcopy_start has been issued

Also change the cleanup at the end of migration to match.
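The switch test that migration_thread applies in the diff below also requires that the migration is not already in postcopy and that all remaining pending data can be postcopied (pend_nonpost == 0). Below is a sketch of the full test as a pure function. It uses a hypothetical name and plain parameters instead of MigrationState and the migrate_* helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical predicate mirroring the switch test in migration_thread():
 * every condition must hold before postcopy_start() is attempted.
 */
static bool should_start_postcopy(uint64_t pending_size, uint64_t max_size,
                                  uint64_t pend_nonpost, bool postcopy_enabled,
                                  bool in_postcopy, bool start_requested)
{
    return pending_size && pending_size >= max_size && /* much left to send */
           postcopy_enabled &&                /* postcopy capability set */
           !in_postcopy &&                    /* not switched over yet */
           pend_nonpost == 0 &&   /* everything left can be postcopied */
           start_requested;       /* migrate_postcopy_start was issued */
}
```

When the predicate is false but pending_size is still at or above max_size, the thread simply runs another precopy iteration (qemu_savevm_state_iterate).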

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration.c | 90 ++++++++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 74 insertions(+), 16 deletions(-)

diff --git a/migration.c b/migration.c
index 1ae5b7d..623a056 100644
--- a/migration.c
+++ b/migration.c
@@ -991,16 +991,36 @@ static int postcopy_start(MigrationState *ms)
 static void *migration_thread(void *opaque)
 {
     MigrationState *s = opaque;
+    /* Used by the bandwidth calcs, updated later */
     int64_t initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
     int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
     int64_t initial_bytes = 0;
     int64_t max_size = 0;
     int64_t start_time = initial_time;
+
     bool old_vm_running = false;
 
+    /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */
+    enum MigrationPhase current_active_type = MIG_STATE_ACTIVE;
+
     qemu_savevm_state_begin(s->file, &s->params);
 
+    if (migrate_postcopy_ram()) {
+        /* Now tell the dest that it should open its end so it can reply */
+        qemu_savevm_send_openrp(s->file);
+
+        /* And ask it to send an ack that will make stuff easier to debug */
+        qemu_savevm_send_reqack(s->file, 1);
+
+        /* Tell the destination that we *might* want to do postcopy later;
+         * if the other end can't do postcopy it should fail now, nice and
+         * early.
+         */
+        qemu_savevm_send_postcopy_ram_advise(s->file);
+    }
+
     s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
+    current_active_type = MIG_STATE_ACTIVE;
     migrate_set_state(s, MIG_STATE_SETUP, MIG_STATE_ACTIVE);
 
     DPRINTF("setup complete\n");
@@ -1021,37 +1041,74 @@ static void *migration_thread(void *opaque)
                     " nonpost=%" PRIu64 ")\n",
                     pending_size, max_size, pend_post, pend_nonpost);
             if (pending_size && pending_size >= max_size) {
+                /* Still a significant amount to transfer */
+
+                current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+                if (migrate_postcopy_ram() &&
+                    s->state != MIG_STATE_POSTCOPY_ACTIVE &&
+                    pend_nonpost == 0 && s->start_postcopy) {
+
+                    if (!postcopy_start(s)) {
+                        current_active_type = MIG_STATE_POSTCOPY_ACTIVE;
+                    }
+
+                    continue;
+                }
+                /* Just another iteration step */
                 qemu_savevm_state_iterate(s->file);
             } else {
                 int ret;
 
-                DPRINTF("done iterating\n");
-                qemu_mutex_lock_iothread();
-                start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
-                qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
-                old_vm_running = runstate_is_running();
+                DPRINTF("done iterating pending size %" PRIu64 "\n",
+                        pending_size);
+
+                if (s->state == MIG_STATE_ACTIVE) {
+                    qemu_mutex_lock_iothread();
+                    start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+                    qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
+                    old_vm_running = runstate_is_running();
+
+                    ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
+                    if (ret >= 0) {
+                        qemu_file_set_rate_limit(s->file, INT64_MAX);
+                        qemu_savevm_state_complete(s->file);
+                    }
+                    qemu_mutex_unlock_iothread();
+
+                    if (ret < 0) {
+                        migrate_set_state(s, current_active_type,
+                                          MIG_STATE_ERROR);
+                        break;
+                    }
+                } else if (s->state == MIG_STATE_POSTCOPY_ACTIVE) {
+                    DPRINTF("postcopy end\n");
+
+                    qemu_savevm_state_postcopy_complete(s->file);
+                    DPRINTF("postcopy end after complete\n");
 
-                ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
-                if (ret >= 0) {
-                    qemu_file_set_rate_limit(s->file, INT64_MAX);
-                    qemu_savevm_state_complete(s->file);
                 }
-                qemu_mutex_unlock_iothread();
 
-                if (ret < 0) {
-                    migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_ERROR);
-                    break;
+                /*
+                 * If the return path (rp) was opened, we must clean up
+                 * its thread before cleaning everything else up.  Postcopy
+                 * opens the rp if enabled (even if it's not activated).
+                 */
+                if (migrate_postcopy_ram()) {
+                    DPRINTF("before rp close");
+                    await_outgoing_return_path_close(s);
+                    DPRINTF("after rp close");
                 }
-
                 if (!qemu_file_get_error(s->file)) {
-                    migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_COMPLETED);
+                    migrate_set_state(s, current_active_type,
+                                      MIG_STATE_COMPLETED);
                     break;
                 }
             }
         }
 
         if (qemu_file_get_error(s->file)) {
-            migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_ERROR);
+            migrate_set_state(s, current_active_type, MIG_STATE_ERROR);
+            DPRINTF("migration_thread: file is in error state\n");
             break;
         }
         current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
@@ -1082,6 +1139,7 @@ static void *migration_thread(void *opaque)
         }
     }
 
+    DPRINTF("migration_thread: After loop");
     qemu_mutex_lock_iothread();
     if (s->state == MIG_STATE_COMPLETED) {
         int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
-- 
1.9.3


* [Qemu-devel] [PATCH v3 34/47] mig fd_connect: open return path
  2014-08-28 15:03 [Qemu-devel] [PATCH v3 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (32 preceding siblings ...)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 33/47] Postcopy: Rework migration thread for postcopy mode Dr. David Alan Gilbert (git)
@ 2014-08-28 15:03 ` Dr. David Alan Gilbert (git)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 35/47] Postcopy: Create a fault handler thread before marking the ram as userfault Dr. David Alan Gilbert (git)
                   ` (12 subsequent siblings)
  46 siblings, 0 replies; 52+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Open the return path before creating the migration thread.
Since opening it can fail, guard the thread join in the fd cleanup
so it doesn't try to join a migration thread that was never created.
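The guard follows a common pattern: record that a joinable thread exists only after it has been created, and have the shared cleanup path join it only when that flag is set. Below is a standalone pthread sketch. It assumes hypothetical mig_* names in place of MigrationState, migrate_fd_connect and migrate_fd_cleanup, and a boolean argument that simulates open_outgoing_return_path failing.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cut-down state: only what the guard needs. */
struct mig_state {
    pthread_t thread;
    bool started_thread;   /* set only after pthread_create() succeeds */
};

static void *mig_thread_fn(void *opaque)
{
    (void)opaque;          /* the real thread would drive the migration */
    return NULL;
}

/* Mirrors migrate_fd_cleanup(): join only a thread that really exists. */
static void mig_cleanup(struct mig_state *s)
{
    if (s->started_thread) {
        pthread_join(s->thread, NULL);
        s->started_thread = false;
    }
}

/* Mirrors migrate_fd_connect(): on early failure, clean up without a thread. */
static int mig_connect(struct mig_state *s, bool return_path_ok)
{
    if (!return_path_ok) {
        mig_cleanup(s);    /* safe: started_thread is still false */
        return -1;
    }
    if (pthread_create(&s->thread, NULL, mig_thread_fn, s) != 0) {
        return -1;
    }
    s->started_thread = true;
    return 0;
}
```

Clearing the flag after the join also makes a second cleanup call harmless.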

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  3 +++
 migration.c                   | 18 +++++++++++++++++-
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 91269c8..dbdf785 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -121,6 +121,9 @@ struct MigrationState
 
     /* Flag set once the migration has been asked to enter postcopy */
     volatile bool start_postcopy;
+    /* Flag set once the migration thread is running (and needs joining) */
+    volatile bool started_migration_thread;
+
 };
 
 void process_incoming_migration(QEMUFile *f);
diff --git a/migration.c b/migration.c
index 623a056..8ab378f 100644
--- a/migration.c
+++ b/migration.c
@@ -468,7 +468,10 @@ static void migrate_fd_cleanup(void *opaque)
     if (s->file) {
         trace_migrate_fd_cleanup();
         qemu_mutex_unlock_iothread();
-        qemu_thread_join(&s->thread);
+        if (s->started_migration_thread) {
+            qemu_thread_join(&s->thread);
+            s->started_migration_thread = false;
+        }
         qemu_mutex_lock_iothread();
 
         qemu_fclose(s->file);
@@ -1177,6 +1180,19 @@ void migrate_fd_connect(MigrationState *s)
     /* Notify before starting migration thread */
     notifier_list_notify(&migration_state_notifiers, s);
 
+    /* Open the return path; currently for postcopy but other things might
+     * also want it.
+     */
+    if (migrate_postcopy_ram()) {
+        if (open_outgoing_return_path(s)) {
+            error_report("Unable to open return-path for postcopy");
+            migrate_set_state(s, MIG_STATE_SETUP, MIG_STATE_ERROR);
+            migrate_fd_cleanup(s);
+            return;
+        }
+    }
+
     qemu_thread_create(&s->thread, "migration", migration_thread, s,
                        QEMU_THREAD_JOINABLE);
+    s->started_migration_thread = true;
 }
-- 
1.9.3


* [Qemu-devel] [PATCH v3 35/47] Postcopy: Create a fault handler thread before marking the ram as userfault
  2014-08-28 15:03 [Qemu-devel] [PATCH v3 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (33 preceding siblings ...)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 34/47] mig fd_connect: open return path Dr. David Alan Gilbert (git)
@ 2014-08-28 15:03 ` Dr. David Alan Gilbert (git)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 36/47] Page request: Add MIG_RPCOMM_REQPAGES reverse command Dr. David Alan Gilbert (git)
                   ` (11 subsequent siblings)
  46 siblings, 0 replies; 52+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  3 +++
 postcopy-ram.c                | 23 +++++++++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index dbdf785..07e15d7 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -86,6 +86,9 @@ struct MigrationIncomingState {
         POSTCOPY_RAM_INCOMING_END
     } postcopy_ram_state;
 
+    QemuThread     fault_thread;
+    QemuSemaphore  fault_thread_sem;
+
     QEMUFile *return_path;
     QemuMutex      rp_mutex;    /* We send replies from multiple threads */
     PostcopyPMI    postcopy_pmi;
diff --git a/postcopy-ram.c b/postcopy-ram.c
index 18b27b4..4f0d233 100644
--- a/postcopy-ram.c
+++ b/postcopy-ram.c
@@ -408,8 +408,31 @@ static int postcopy_ram_sensitise_area(const char *block_name, void *host_addr,
     return 0;
 }
 
+/*
+ * Handle faults detected by the USERFAULT markings
+ */
+static void *postcopy_ram_fault_thread(void *opaque)
+{
+    MigrationIncomingState *mis = (MigrationIncomingState *)opaque;
+
+    fprintf(stderr, "postcopy_ram_fault_thread\n");
+    /* TODO: In later patch */
+    qemu_sem_post(&mis->fault_thread_sem);
+    while (1) {
+        /* TODO: In later patch */
+    }
+
+    return NULL;
+}
+
 int postcopy_ram_enable_notify(MigrationIncomingState *mis)
 {
+    /* Create the fault handler thread and wait for it to be ready */
+    qemu_sem_init(&mis->fault_thread_sem, 0);
+    qemu_thread_create(&mis->fault_thread, "postcopy/fault",
+                       postcopy_ram_fault_thread, mis, QEMU_THREAD_JOINABLE);
+    qemu_sem_wait(&mis->fault_thread_sem);
+
     /* Mark so that we get notified of accesses to unwritten areas */
     if (qemu_ram_foreach_block(postcopy_ram_sensitise_area, NULL)) {
         return -1;
-- 
1.9.3

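Patch 35 starts the fault handler thread and waits on fault_thread_sem before any RAM is marked for userfault. That ordering guarantees a fault cannot arrive before anything is ready to service it. Below is a sketch of that startup handshake using POSIX threads and semaphores. The quit_sem used to stop the thread and all the names are hypothetical; in the patch itself, the thread body is still a placeholder loop.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the MigrationIncomingState fields. */
struct incoming_state {
    pthread_t fault_thread;
    sem_t fault_thread_sem;     /* posted once the thread is ready */
    sem_t quit_sem;             /* hypothetical: posted to stop the thread */
    bool thread_ready;
};

static void *fault_thread_fn(void *opaque)
{
    struct incoming_state *mis = opaque;

    /* Per-thread setup would go here, before signalling readiness. */
    mis->thread_ready = true;
    sem_post(&mis->fault_thread_sem);

    /* Block (instead of spinning) until asked to quit; a real handler
     * would block waiting for fault notifications here. */
    sem_wait(&mis->quit_sem);
    return NULL;
}

static int start_fault_thread(struct incoming_state *mis)
{
    if (sem_init(&mis->fault_thread_sem, 0, 0) ||
        sem_init(&mis->quit_sem, 0, 0)) {
        return -1;
    }
    if (pthread_create(&mis->fault_thread, NULL, fault_thread_fn, mis)) {
        return -1;
    }
    /* Only after this returns is it safe to arm userfault on guest RAM. */
    sem_wait(&mis->fault_thread_sem);
    return 0;
}

static void stop_fault_thread(struct incoming_state *mis)
{
    sem_post(&mis->quit_sem);
    pthread_join(mis->fault_thread, NULL);
    sem_destroy(&mis->fault_thread_sem);
    sem_destroy(&mis->quit_sem);
}
```

Blocking on a semaphore rather than spinning keeps the thread idle until there is work to do.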

* [Qemu-devel] [PATCH v3 36/47] Page request: Add MIG_RPCOMM_REQPAGES reverse command
  2014-08-28 15:03 [Qemu-devel] [PATCH v3 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (34 preceding siblings ...)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 35/47] Postcopy: Create a fault handler thread before marking the ram as userfault Dr. David Alan Gilbert (git)
@ 2014-08-28 15:03 ` Dr. David Alan Gilbert (git)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 37/47] Page request: Process incoming page request Dr. David Alan Gilbert (git)
                   ` (10 subsequent siblings)
  46 siblings, 0 replies; 52+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add a MIG_RPCOMM_REQPAGES command on the return path so that the postcopy
destination can request a range of pages from the source.
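The message body consists of start (be64) followed by len (be64). Lengths are whole pages, so bit 0 of len is always free. When set, it flags that a length byte and a RAMBlock name follow. Below is a self-contained sketch of encoding and decoding that layout. The helper names are hypothetical, and the byte-order helpers stand in for QEMU's cpu_to_be64()/be64_to_cpup().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store/load a 64-bit value big-endian, independent of host byte order. */
static void put_be64(uint8_t *p, uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        p[i] = v & 0xff;
        v >>= 8;
    }
}

static uint64_t get_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/*
 * start (be64), len (be64); if name != NULL, bit 0 of len is set and a
 * length byte plus the name (no NUL) follow. buf must hold 16 + 1 + 255
 * bytes. Returns the message length.
 */
static size_t reqpages_encode(uint8_t *buf, const char *name,
                              uint64_t start, uint64_t len)
{
    size_t msglen = 16;

    assert(!(len & 1));          /* page-sized lengths leave bit 0 free */
    if (name) {
        size_t n = strlen(name);
        assert(n < 256);
        len |= 1;
        buf[msglen++] = (uint8_t)n;
        memcpy(buf + msglen, name, n);
        msglen += n;
    }
    put_be64(buf, start);
    put_be64(buf + 8, len);
    return msglen;
}

/* Decode; name_out gets a NUL-terminated copy, or "" if none was sent. */
static int reqpages_decode(const uint8_t *buf, size_t msglen,
                           char name_out[256], uint64_t *start, uint64_t *len)
{
    if (msglen < 16) {
        return -1;
    }
    *start = get_be64(buf);
    *len = get_be64(buf + 8);
    name_out[0] = '\0';
    if (*len & 1) {
        *len &= ~(uint64_t)1;    /* strip the "name follows" flag */
        if (msglen < 17 || msglen != 17 + (size_t)buf[16]) {
            return -1;
        }
        memcpy(name_out, buf + 17, buf[16]);
        name_out[buf[16]] = '\0';
    } else if (msglen != 16) {
        return -1;
    }
    return 0;
}
```

This decoder returns an error on any length mismatch, so a malformed message is never acted on.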

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  3 ++
 migration.c                   | 74 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 77 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 07e15d7..8e8fdf2 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -45,6 +45,7 @@ enum mig_rpcomm_cmd {
     MIG_RPCOMM_INVALID = 0,  /* Must be 0 */
     MIG_RPCOMM_SHUT,         /* sibling will not send any more RP messages */
     MIG_RPCOMM_ACK,          /* data (seq: be32 ) */
+    MIG_RPCOMM_REQPAGES,     /* data (start: be64, len: be64) */
     MIG_RPCOMM_AFTERLASTVALID
 };
 
@@ -240,6 +241,8 @@ void migrate_send_rp_shut(MigrationIncomingState *mis,
                           uint32_t value);
 void migrate_send_rp_ack(MigrationIncomingState *mis,
                          uint32_t value);
+void migrate_send_rp_reqpages(MigrationIncomingState *mis, const char* rbname,
+                              ram_addr_t start, ram_addr_t len);
 
 
 void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
diff --git a/migration.c b/migration.c
index 8ab378f..b8df458 100644
--- a/migration.c
+++ b/migration.c
@@ -144,6 +144,38 @@ void migrate_send_rp_ack(MigrationIncomingState *mis,
     migrate_send_rp_message(mis, MIG_RPCOMM_ACK, 4, (uint8_t *)&buf);
 }
 
+/* Request a range of pages from the source VM at the given
+ * start address.
+ *   rbname: Name of the RAMBlock to request the page in, if NULL it's the same
+ *           as the last request (a name must have been given previously)
+ *   Start: Address offset within the RB
+ *   Len: Length in bytes required - must be a multiple of pagesize
+ */
+void migrate_send_rp_reqpages(MigrationIncomingState *mis, const char *rbname,
+                              ram_addr_t start, ram_addr_t len)
+{
+    uint8_t bufc[16+1+255]; /* start (8), len (8), name len (1), rbname (<= 255) */
+    uint64_t *buf64 = (uint64_t *)bufc;
+    size_t msglen = 16; /* start + len */
+
+    assert(!(len & 1));
+    if (rbname) {
+        int rbname_len = strlen(rbname);
+        assert(rbname_len < 256);
+
+        len |= 1; /* Flag to say we've got a name */
+        bufc[msglen++] = rbname_len;
+        memcpy(bufc + msglen, rbname, rbname_len);
+        msglen += rbname_len;
+    }
+
+    buf64[0] = (uint64_t)start;
+    buf64[0] = cpu_to_be64(buf64[0]);
+    buf64[1] = (uint64_t)len;
+    buf64[1] = cpu_to_be64(buf64[1]);
+    migrate_send_rp_message(mis, MIG_RPCOMM_REQPAGES, msglen, bufc);
+}
+
 void qemu_start_incoming_migration(const char *uri, Error **errp)
 {
     const char *p;
@@ -777,6 +809,17 @@ static void source_return_path_bad(MigrationState *s)
 }
 
 /*
+ * Process a request for pages received on the return path.
+ * We're allowed to send more than requested (e.g. to round to our page size)
+ * and we don't need to send pages that have already been sent.
+ */
+static void migrate_handle_rp_reqpages(MigrationState *ms, const char* rbname,
+                                       ram_addr_t start, ram_addr_t len)
+{
+    DPRINTF("migrate_handle_rp_reqpages: at %zx for len %zx", start, len);
+}
+
+/*
  * Handles messages sent on the return path towards the source VM
  *
  */
@@ -788,6 +831,8 @@ static void *source_return_path_thread(void *opaque)
     const int max_len = 512;
     uint8_t buf[max_len];
     uint32_t tmp32;
+    uint64_t tmp64a, tmp64b;
+    char *tmpstr;
     int res;
 
     DPRINTF("RP: %s entry", __func__);
@@ -803,6 +848,11 @@ static void *source_return_path_thread(void *opaque)
             expected_len = 4;
             break;
 
+        case MIG_RPCOMM_REQPAGES:
+            /* 16 byte start/len _possibly_ plus an id str */
+            expected_len = 16 + 256;
+            break;
+
         default:
             error_report("RP: Received invalid cmd 0x%04x length 0x%04x",
                     header_com, header_len);
@@ -850,6 +900,30 @@ static void *source_return_path_thread(void *opaque)
             atomic_xchg(&ms->rp_state.latest_ack, tmp32);
             break;
 
+        case MIG_RPCOMM_REQPAGES:
+            tmp64a = be64_to_cpup((uint64_t *)buf);  /* Start */
+            tmp64b = be64_to_cpup(((uint64_t *)buf)+1); /* Len */
+            tmpstr = NULL;
+            if (tmp64b & 1) {
+                tmp64b -= 1; /* Remove the flag */
+                /* Now we expect an idstr */
+                tmp32 = buf[16]; /* Length of the following idstr */
+                tmpstr = (char *)&buf[17];
+                buf[17+tmp32] = '\0';
+                expected_len = 16+1+tmp32;
+            } else {
+                expected_len = 16;
+            }
+            if (header_len != expected_len) {
+                error_report("RP: Received ReqPage with length %d expecting %d",
+                        header_len, expected_len);
+                source_return_path_bad(ms);
+            }
+            migrate_handle_rp_reqpages(ms, tmpstr,
+                                          (ram_addr_t)tmp64a,
+                                          (ram_addr_t)tmp64b);
+            break;
+
         default:
             /* This shouldn't happen because we should catch this above */
             DPRINTF("RP: Bad header_com in dispatch");
-- 
1.9.3


* [Qemu-devel] [PATCH v3 37/47] Page request: Process incoming page request
  2014-08-28 15:03 [Qemu-devel] [PATCH v3 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (35 preceding siblings ...)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 36/47] Page request: Add MIG_RPCOMM_REQPAGES reverse command Dr. David Alan Gilbert (git)
@ 2014-08-28 15:03 ` Dr. David Alan Gilbert (git)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 38/47] Page request: Consume pages off the post-copy queue Dr. David Alan Gilbert (git)
                   ` (9 subsequent siblings)
  46 siblings, 0 replies; 52+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

On receiving MIG_RPCOMM_REQPAGES, look up the RAMBlock and address and
queue the requested pages.
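Before queueing, migrate_handle_rp_reqpages() widens the request to the source's host page size, as reported by sysconf(_SC_PAGESIZE). It rounds start down and extends len so that the original end is still covered, then rounds len up to a whole page. Below is the same arithmetic as a standalone helper. This is a sketch that assumes a power-of-two page size; round_to_host_pages() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Widen [start, start + len) so it begins and ends on a host page
 * boundary. page_size must be a power of two.
 */
static void round_to_host_pages(uint64_t *start, uint64_t *len,
                                uint64_t page_size)
{
    uint64_t mask = page_size - 1;
    uint64_t head = *start & mask;

    *start -= head;                   /* move start down to its page */
    *len += head;                     /* keep the original end covered */
    *len = (*len + mask) & ~mask;     /* round the length up */
}
```

Masking with page_size - 1 is only valid for powers of two, which holds for host page sizes on Linux.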

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c                   | 52 +++++++++++++++++++++++++++++++++++++++++++
 include/migration/migration.h | 27 ++++++++++++++++++++++
 include/qemu/typedefs.h       |  3 ++-
 migration.c                   | 34 +++++++++++++++++++++++++++-
 4 files changed, 114 insertions(+), 2 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index d4144e4..9401648 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -658,6 +658,58 @@ static int ram_save_page(QEMUFile *f, RAMBlock* block, ram_addr_t offset,
 }
 
 /*
+ * Queue the pages for transmission, e.g. a request from postcopy destination
+ *   ms: MigrationStatus in which the queue is held
+ *   rbname: The RAMBlock the request is for - may be NULL (to mean reuse last)
+ *   start: Offset from the start of the RAMBlock
+ *   len: Length (in bytes) to send
+ *   Return: 0 on success
+ */
+int ram_save_queue_pages(MigrationState *ms, const char *rbname,
+                         ram_addr_t start, ram_addr_t len)
+{
+    RAMBlock *ramblock;
+
+    if (!rbname) {
+        /* Reuse last RAMBlock */
+        ramblock = ms->last_req_rb;
+
+        if (!ramblock) {
+            error_report("ram_save_queue_pages no previous block");
+            return -1;
+        }
+    } else {
+        ramblock = ram_find_block(rbname);
+
+        if (!ramblock) {
+            error_report("ram_save_queue_pages no block '%s'", rbname);
+            return -1;
+        }
+    }
+    DPRINTF("ram_save_queue_pages: Block %s start %zx len %zx",
+                    ramblock->idstr, start, len);
+
+    if (start+len > ramblock->length) {
+        error_report("%s request overrun start=%zx len=%zx blocklen=%zx",
+                     __func__, start, len, ramblock->length);
+        return -1;
+    }
+
+    struct MigrationSrcPageRequest *new_entry =
+        g_malloc0(sizeof(struct MigrationSrcPageRequest));
+    new_entry->rb = ramblock;
+    new_entry->offset = start;
+    new_entry->len = len;
+    ms->last_req_rb = ramblock;
+
+    qemu_mutex_lock(&ms->src_page_req_mutex);
+    QSIMPLEQ_INSERT_TAIL(&ms->src_page_requests, new_entry, next_req);
+    qemu_mutex_unlock(&ms->src_page_req_mutex);
+
+    return 0;
+}
+
+/*
  * ram_find_and_save_block: Finds a page to send and sends it to f
  *
  * Returns:  The number of bytes written.
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 8e8fdf2..472767f 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -99,6 +99,18 @@ MigrationIncomingState *migration_incoming_get_current(void);
 MigrationIncomingState *migration_incoming_state_init(QEMUFile *f);
 void migration_incoming_state_destroy(void);
 
+/*
+ * An outstanding page request, on the source, having been received
+ * and queued
+ */
+struct MigrationSrcPageRequest {
+    RAMBlock *rb;
+    hwaddr    offset;
+    hwaddr    len;
+
+    QSIMPLEQ_ENTRY(MigrationSrcPageRequest) next_req;
+};
+
 struct MigrationState
 {
     int64_t bandwidth_limit;
@@ -128,6 +140,18 @@ struct MigrationState
     /* Flag set once the migration thread is running (and needs joining) */
     volatile bool started_migration_thread;
 
+    /* bitmap of pages that have been sent at least once
+     * only maintained and used in postcopy at the moment
+     *    Where it's used to send the dirtymap at the start
+     *    of the postcopy phase, then cleared
+     */
+    unsigned long *sentmap;
+
+    /* Queue of outstanding page requests from the destination */
+    QemuMutex src_page_req_mutex;
+    QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest) src_page_requests;
+    /* The RAMBlock used in the last src_page_request */
+    RAMBlock *last_req_rb;
 };
 
 void process_incoming_migration(QEMUFile *f);
@@ -263,4 +287,7 @@ size_t ram_control_save_page(QEMUFile *f, ram_addr_t block_offset,
                              ram_addr_t offset, size_t size,
                              int *bytes_sent);
 
+int ram_save_queue_pages(MigrationState *ms, const char *rbname,
+                         ram_addr_t start, ram_addr_t len);
+
 #endif
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 61b330c..d57acc5 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -8,6 +8,7 @@ typedef struct QEMUTimerListGroup QEMUTimerListGroup;
 typedef struct QEMUFile QEMUFile;
 typedef struct QEMUBH QEMUBH;
 
+typedef struct AdapterInfo AdapterInfo;
 typedef struct AioContext AioContext;
 
 typedef struct Visitor Visitor;
@@ -79,6 +80,6 @@ typedef struct FWCfgState FWCfgState;
 typedef struct PcGuestInfo PcGuestInfo;
 typedef struct PostcopyPMI PostcopyPMI;
 typedef struct Range Range;
-typedef struct AdapterInfo AdapterInfo;
+typedef struct RAMBlock RAMBlock;
 
 #endif /* QEMU_TYPEDEFS_H */
diff --git a/migration.c b/migration.c
index b8df458..9c0f926 100644
--- a/migration.c
+++ b/migration.c
@@ -26,6 +26,8 @@
 #include "qemu/thread.h"
 #include "qmp-commands.h"
 #include "trace.h"
+#include "exec/memory.h"
+#include "exec/address-spaces.h"
 
 //#define DEBUG_MIGRATION
 
@@ -497,6 +499,15 @@ static void migrate_fd_cleanup(void *opaque)
 
     migrate_fd_cleanup_src_rp(s);
 
+    /* This queue generally should be empty - but in the case of a failed
+     * migration might have some droppings in.
+     */
+    struct MigrationSrcPageRequest *mspr, *next_mspr;
+    QSIMPLEQ_FOREACH_SAFE(mspr, &s->src_page_requests, next_req, next_mspr) {
+        QSIMPLEQ_REMOVE_HEAD(&s->src_page_requests, next_req);
+        g_free(mspr);
+    }
+
     if (s->file) {
         trace_migrate_fd_cleanup();
         qemu_mutex_unlock_iothread();
@@ -603,6 +614,9 @@ MigrationState *migrate_init(const MigrationParams *params)
     s->state = MIG_STATE_SETUP;
     trace_migrate_set_state(MIG_STATE_SETUP);
 
+    qemu_mutex_init(&s->src_page_req_mutex);
+    QSIMPLEQ_INIT(&s->src_page_requests);
+
     s->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
     return s;
 }
@@ -816,7 +830,25 @@ static void source_return_path_bad(MigrationState *s)
 static void migrate_handle_rp_reqpages(MigrationState *ms, const char* rbname,
                                        ram_addr_t start, ram_addr_t len)
 {
-    DPRINTF("migrate_handle_rp_reqpages: at %zx for len %zx", start, len);
+    DPRINTF("migrate_handle_rp_reqpages: in %s start %zx len %zx",
+            rbname, start, len);
+
+    /* Round everything up to our host page size */
+    long our_host_ps = sysconf(_SC_PAGESIZE);
+    if (start & (our_host_ps-1)) {
+        long roundings = start & (our_host_ps-1);
+        start -= roundings;
+        len += roundings;
+    }
+    if (len & (our_host_ps-1)) {
+        long roundings = len & (our_host_ps-1);
+        len -= roundings;
+        len += our_host_ps;
+    }
+
+    if (ram_save_queue_pages(ms, rbname, start, len)) {
+        source_return_path_bad(ms);
+    }
 }
 
 /*
-- 
1.9.3


* [Qemu-devel] [PATCH v3 38/47] Page request: Consume pages off the post-copy queue
  2014-08-28 15:03 [Qemu-devel] [PATCH v3 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (36 preceding siblings ...)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 37/47] Page request: Process incoming page request Dr. David Alan Gilbert (git)
@ 2014-08-28 15:03 ` Dr. David Alan Gilbert (git)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 39/47] Add assertion to check migration_dirty_pages Dr. David Alan Gilbert (git)
                   ` (8 subsequent siblings)
  46 siblings, 0 replies; 52+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

When transmitting RAM pages, consume pages that have been queued by
MIG_RPCOMM_REQPAGES commands and send them ahead of normal page scanning.

Note:
  a) After a queued page, the linear walk carries on from just after the
     unqueued page; there is a reasonable chance that the destination
     was about to ask for other nearby pages anyway.

  b) We have to be careful of any assumptions that the page-walking
     code makes; in particular, it takes some shortcuts on its first
     linear walk (the bulk stage) that break as soon as we send a
     queued page.
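The ordering described above can be modelled with a small queue. Requests are handed out one page at a time from the head, and the linear walk continues from just after whichever page was sent last. This is a simplified sketch that uses a hypothetical fixed-size FIFO in place of the QSIMPLEQ and its mutex. It also ignores the dirty bitmap, which the patch uses (via migration_bitmap_clear_dirty()) to skip pages that were already sent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* stands in for TARGET_PAGE_SIZE */
#define SKETCH_MAX_REQS  16

/* Hypothetical fixed-size FIFO of (offset, len) page requests. */
struct sketch_req { uint64_t offset, len; };
struct sketch_queue {
    struct sketch_req r[SKETCH_MAX_REQS];
    int head, count;
};

static bool queue_push(struct sketch_queue *q, uint64_t offset, uint64_t len)
{
    if (q->count == SKETCH_MAX_REQS) {
        return false;
    }
    q->r[(q->head + q->count) % SKETCH_MAX_REQS] =
        (struct sketch_req){ offset, len };
    q->count++;
    return true;
}

/*
 * Take one page from the head request. A multi-page request stays at the
 * head with its offset advanced, so it is handed out one page per call.
 */
static bool queue_unqueue_page(struct sketch_queue *q, uint64_t *offset)
{
    if (q->count == 0) {
        return false;
    }
    struct sketch_req *e = &q->r[q->head];
    *offset = e->offset;
    if (e->len > SKETCH_PAGE_SIZE) {
        e->len -= SKETCH_PAGE_SIZE;
        e->offset += SKETCH_PAGE_SIZE;
    } else {
        q->head = (q->head + 1) % SKETCH_MAX_REQS;
        q->count--;
    }
    return true;
}

/*
 * Pick the next page to send: queued requests first; otherwise continue
 * the linear walk, which resumes just after the last page handed out.
 */
static uint64_t next_page(struct sketch_queue *q, uint64_t *scan_pos)
{
    uint64_t off;

    if (!queue_unqueue_page(q, &off)) {
        off = *scan_pos;
    }
    *scan_pos = off + SKETCH_PAGE_SIZE;
    return off;
}
```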

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c | 130 +++++++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 106 insertions(+), 24 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 9401648..d0ee627 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -458,6 +458,19 @@ static inline bool migration_bitmap_set_dirty(ram_addr_t addr)
     return ret;
 }
 
+static inline bool migration_bitmap_clear_dirty(ram_addr_t addr)
+{
+    bool ret;
+    int nr = addr >> TARGET_PAGE_BITS;
+
+    ret = test_and_clear_bit(nr, migration_bitmap);
+
+    if (ret) {
+        migration_dirty_pages--;
+    }
+    return ret;
+}
+
 static void migration_bitmap_sync_range(ram_addr_t start, ram_addr_t length)
 {
     ram_addr_t addr;
@@ -658,6 +671,39 @@ static int ram_save_page(QEMUFile *f, RAMBlock* block, ram_addr_t offset,
 }
 
 /*
+ * Unqueue a page from the queue fed by postcopy page requests
+ *
+ * Returns:   The RAMBlock* to transmit from (or NULL if the queue is empty)
+ *      ms:   MigrationState in
+ *  offset:   the byte offset within the RAMBlock for the start of the page
+ * bitoffset: global offset in the dirty/sent bitmaps
+ */
+static RAMBlock *ram_save_unqueue_page(MigrationState *ms, ram_addr_t *offset,
+                                       unsigned long *bitoffset)
+{
+    RAMBlock *result = NULL;
+    qemu_mutex_lock(&ms->src_page_req_mutex);
+    if (!QSIMPLEQ_EMPTY(&ms->src_page_requests)) {
+        struct MigrationSrcPageRequest *entry =
+                                    QSIMPLEQ_FIRST(&ms->src_page_requests);
+        result = entry->rb;
+        *offset = entry->offset;
+        *bitoffset = (entry->offset + entry->rb->offset) >> TARGET_PAGE_BITS;
+
+        if (entry->len > TARGET_PAGE_SIZE) {
+            entry->len -= TARGET_PAGE_SIZE;
+            entry->offset += TARGET_PAGE_SIZE;
+        } else {
+            QSIMPLEQ_REMOVE_HEAD(&ms->src_page_requests, next_req);
+            g_free(entry);
+        }
+    }
+    qemu_mutex_unlock(&ms->src_page_req_mutex);
+
+    return result;
+}
+
+/*
  * Queue the pages for transmission, e.g. a request from postcopy destination
  *   ms: MigrationStatus in which the queue is held
  *   rbname: The RAMBlock the request is for - may be NULL (to mean reuse last)
@@ -718,44 +764,80 @@ int ram_save_queue_pages(MigrationState *ms, const char *rbname,
 
 static int ram_find_and_save_block(QEMUFile *f, bool last_stage)
 {
+    MigrationState *ms = migrate_get_current();
     RAMBlock *block = last_seen_block;
+    RAMBlock *tmpblock;
     ram_addr_t offset = last_offset;
+    ram_addr_t tmpoffset;
     bool complete_round = false;
     int bytes_sent = 0;
-    MemoryRegion *mr;
     unsigned long bitoffset;
 
     if (!block)
         block = QTAILQ_FIRST(&ram_list.blocks);
 
-    while (true) {
-        mr = block->mr;
-        offset = migration_bitmap_find_and_reset_dirty(mr, offset, &bitoffset);
-        if (complete_round && block == last_seen_block &&
-            offset >= last_offset) {
-            break;
-        }
-        if (offset >= block->length) {
-            offset = 0;
-            block = QTAILQ_NEXT(block, next);
-            if (!block) {
-                block = QTAILQ_FIRST(&ram_list.blocks);
-                complete_round = true;
-                ram_bulk_stage = false;
+    while (true) { /* Until we send a block or run out of stuff to send */
+        tmpblock = ram_save_unqueue_page(ms, &tmpoffset, &bitoffset);
+        if (tmpblock) {
+            /* We've got a block from the postcopy queue */
+            DPRINTF("%s: Got postcopy item '%s' offset=%zx bitoffset=%zx",
+                    __func__, tmpblock->idstr, tmpoffset, bitoffset);
+            /* We're sending this page, and since it's postcopy nothing else
+             * will dirty it, and we must make sure it doesn't get sent again.
+             */
+            if (!migration_bitmap_clear_dirty(bitoffset << TARGET_PAGE_BITS)) {
+                DPRINTF("%s: Not dirty for postcopy %s/%zx bito=%zx (sent=%d)",
+                        __func__, tmpblock->idstr, tmpoffset, bitoffset,
+                        test_bit(bitoffset, ms->sentmap));
+                continue;
             }
+            /*
+             * As soon as we start servicing pages out of order we have to
+             * kill the bulk stage: migration_bitmap_find_and_reset_dirty()
+             * assumes during the bulk stage that every page is dirty, and
+             * that's no longer true.
+             */
+            ram_bulk_stage = false;
+            /*
+             * We mustn't change block/offset unless it's to a valid one
+             * otherwise we can go down some of the exit cases in the normal
+             * path.
+             */
+            block = tmpblock;
+            offset = tmpoffset;
         } else {
-            bytes_sent = ram_save_page(f, block, offset, last_stage);
-
-            /* if page is unmodified, continue to the next */
-            if (bytes_sent > 0) {
-                MigrationState *s = migrate_get_current();
-                if (s->sentmap) {
-                    set_bit(bitoffset, s->sentmap);
+            MemoryRegion *mr;
+            /* priority queue empty, so just search for something dirty */
+            mr = block->mr;
+            offset = migration_bitmap_find_and_reset_dirty(mr, offset,
+                                                           &bitoffset);
+            if (complete_round && block == last_seen_block &&
+                offset >= last_offset) {
+                break;
+            }
+            if (offset >= block->length) {
+                offset = 0;
+                block = QTAILQ_NEXT(block, next);
+                if (!block) {
+                    block = QTAILQ_FIRST(&ram_list.blocks);
+                    complete_round = true;
+                    ram_bulk_stage = false;
                 }
+                continue; /* pick an offset in the new block */
+            }
+        }
 
-                last_sent_block = block;
-                break;
+        /* We have a page to send, so send it */
+        bytes_sent = ram_save_page(f, block, offset, last_stage);
+
+        /* if page is unmodified, continue to the next */
+        if (bytes_sent > 0) {
+            if (ms->sentmap) {
+                set_bit(bitoffset, ms->sentmap);
             }
+
+            last_sent_block = block;
+            break;
         }
     }
     last_seen_block = block;
-- 
1.9.3


* [Qemu-devel] [PATCH v3 39/47] Add assertion to check migration_dirty_pages
  2014-08-28 15:03 [Qemu-devel] [PATCH v3 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (37 preceding siblings ...)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 38/47] Page request: Consume pages off the post-copy queue Dr. David Alan Gilbert (git)
@ 2014-08-28 15:03 ` Dr. David Alan Gilbert (git)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 40/47] postcopy_ram.c: place_page and helpers Dr. David Alan Gilbert (git)
                   ` (7 subsequent siblings)
  46 siblings, 0 replies; 52+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

I've seen it go negative once during development; it shouldn't
happen.
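The counter stays consistent only if every decrement is paired with a bit that actually changed from 1 to 0. This is what migration_bitmap_clear_dirty() in patch 38 does with test_and_clear_bit(). Below is a single-threaded sketch of that pairing with the same assertion. The bitmap size and helper names are hypothetical, and unlike QEMU's bitmap helpers, these are not atomic.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

static unsigned long dirty_bitmap[4];   /* hypothetical small bitmap */
static long dirty_pages;                /* number of set bits */

static void set_dirty(size_t nr)
{
    unsigned long bit = 1UL << (nr % BITS_PER_WORD);
    unsigned long *w = &dirty_bitmap[nr / BITS_PER_WORD];

    if (!(*w & bit)) {
        *w |= bit;
        dirty_pages++;
    }
}

/* Decrement the counter only if this call actually cleared the bit. */
static bool clear_dirty(size_t nr)
{
    unsigned long bit = 1UL << (nr % BITS_PER_WORD);
    unsigned long *w = &dirty_bitmap[nr / BITS_PER_WORD];

    if (!(*w & bit)) {
        return false;            /* already clean: leave the count alone */
    }
    *w &= ~bit;
    assert(dirty_pages > 0);     /* the invariant this patch asserts */
    dirty_pages--;
    return true;
}
```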

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch_init.c b/arch_init.c
index d0ee627..d592579 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -439,6 +439,7 @@ ram_addr_t migration_bitmap_find_and_reset_dirty(MemoryRegion *mr,
 
     if (next < size) {
         clear_bit(next, migration_bitmap);
+        assert(migration_dirty_pages > 0);
         migration_dirty_pages--;
     }
     *bitoffset = next;
-- 
1.9.3


* [Qemu-devel] [PATCH v3 40/47] postcopy_ram.c: place_page and helpers
  2014-08-28 15:03 [Qemu-devel] [PATCH v3 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (38 preceding siblings ...)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 39/47] Add assertion to check migration_dirty_pages Dr. David Alan Gilbert (git)
@ 2014-08-28 15:03 ` Dr. David Alan Gilbert (git)
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 41/47] Postcopy: Use helpers to map pages during migration Dr. David Alan Gilbert (git)
                   ` (6 subsequent siblings)
  46 siblings, 0 replies; 52+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

postcopy_place_page (etc.) provide a way for postcopy to place a page
into the guest's memory atomically (using the new remap_anon_pages syscall).
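postcopy_get_tmp_page() maps one anonymous page, marks it MADV_DONTFORK, and reuses it on every call. Below is a sketch of that allocation pattern using only mmap() and madvise(); remap_anon_pages itself was a proposed syscall and is not used here. Note that mmap() reports failure with MAP_FAILED, not NULL. The names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

static void *tmp_page;   /* hypothetical cache, like mis->postcopy_tmp_page */

/* Lazily map one anonymous page; reuse it on later calls. */
static void *get_tmp_page(void)
{
    if (!tmp_page) {
        long ps = sysconf(_SC_PAGESIZE);
        void *p = mmap(NULL, ps, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {           /* mmap never returns NULL on error */
            perror("mmap tmp page");
            return NULL;
        }
        /* Keep this range out of any fork()ed child's address space. */
        if (madvise(p, ps, MADV_DONTFORK)) {
            perror("madvise MADV_DONTFORK");   /* before munmap clobbers errno */
            munmap(p, ps);
            return NULL;
        }
        tmp_page = p;                    /* only cache a fully set-up page */
    }
    return tmp_page;
}

static void put_tmp_page(void)
{
    if (tmp_page) {
        munmap(tmp_page, sysconf(_SC_PAGESIZE));
        tmp_page = NULL;
    }
}
```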

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h    |   1 +
 include/migration/postcopy-ram.h |  23 +++++++++
 postcopy-ram.c                   | 105 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 129 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 472767f..437f783 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -93,6 +93,7 @@ struct MigrationIncomingState {
     QEMUFile *return_path;
     QemuMutex      rp_mutex;    /* We send replies from multiple threads */
     PostcopyPMI    postcopy_pmi;
+    void          *postcopy_tmp_page;
 };
 
 MigrationIncomingState *migration_incoming_get_current(void);
diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index 18fc4d9..4459f73 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -59,6 +59,29 @@ int postcopy_send_discard_bm_ram(MigrationState *ms, const char *name,
 void postcopy_hook_early_receive(MigrationIncomingState *mis,
                                  size_t bitmap_index);
 
+/*
+ * Place a zero'd page of memory at *host
+ * returns 0 on success
+ */
+int postcopy_place_zero_page(MigrationIncomingState *mis, void *host,
+                             long bitmap_offset);
+
+/*
+ * Place a page (from) at (host) efficiently
+ *    There are restrictions on how 'from' must be mapped, in general best
+ *    to use other postcopy_ routines to allocate.
+ * returns 0 on success
+ */
+int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
+                        long bitmap_offset);
+
+/*
+ * Allocate a page of memory that can be mapped at a later point in time
+ * using postcopy_place_page
+ * Returns: Pointer to allocated page
+ */
+void *postcopy_get_tmp_page(MigrationIncomingState *mis);
+
 void postcopy_pmi_destroy(MigrationIncomingState *mis);
 void postcopy_pmi_discard_range(MigrationIncomingState *mis,
                                 size_t start, size_t npages);
diff --git a/postcopy-ram.c b/postcopy-ram.c
index 4f0d233..739403c 100644
--- a/postcopy-ram.c
+++ b/postcopy-ram.c
@@ -385,6 +385,10 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
         return -1;
     }
 
+    if (mis->postcopy_tmp_page) {
+        munmap(mis->postcopy_tmp_page, getpagesize());
+        mis->postcopy_tmp_page = NULL;
+    }
     return 0;
 }
 
@@ -441,6 +445,88 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
     return 0;
 }
 
+/*
+ * Place a zero'd page of memory at *host
+ * returns 0 on success
+ * bitmap_offset: Index into the migration bitmaps
+ */
+int postcopy_place_zero_page(MigrationIncomingState *mis, void *host,
+                             long bitmap_offset)
+{
+    void *tmp = postcopy_get_tmp_page(mis);
+    if (!tmp) {
+        return -ENOMEM;
+    }
+    *(char *)tmp = 0;
+    return postcopy_place_page(mis, host, tmp, bitmap_offset);
+}
+
+/*
+ * Place a page (from) at (host) efficiently
+ *    There are restrictions on how 'from' must be mapped, in general best
+ *    to use other postcopy_ routines to allocate.
+ * returns 0 on success
+ * bitmap_offset: Index into the migration bitmaps
+ */
+int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
+                        long bitmap_offset)
+{
+    PostcopyPMIState old_state, tmp_state;
+
+    if (syscall(__NR_remap_anon_pages, host, from, getpagesize(), 0) !=
+            getpagesize()) {
+        perror("remap_anon_pages in postcopy_place_page");
+        fprintf(stderr, "host: %p from: %p pmi=%d\n", host, from,
+                postcopy_pmi_get_state(mis, bitmap_offset));
+
+        return -errno;
+    }
+
+    tmp_state = postcopy_pmi_get_state(mis, bitmap_offset);
+    do {
+        old_state = tmp_state;
+        tmp_state = postcopy_pmi_change_state(mis, bitmap_offset, old_state,
+                                              POSTCOPY_PMI_RECEIVED);
+
+    } while (old_state != tmp_state);
+
+
+    if (old_state == POSTCOPY_PMI_REQUESTED) {
+        /* TODO: Notify kernel */
+    }
+
+    /* TODO: hostpagesize!=targetpagesize case */
+    return 0;
+}
+
+/*
+ * Returns a page of memory that can be mapped at a later point in time
+ * using postcopy_place_page
+ * The same address is used repeatedly, postcopy_place_page just takes the
+ * backing page away.
+ * Returns: Pointer to allocated page
+ */
+void *postcopy_get_tmp_page(MigrationIncomingState *mis)
+{
+
+    if (!mis->postcopy_tmp_page) {
+        void *page = mmap(NULL, getpagesize(), PROT_READ | PROT_WRITE,
+                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+        if (page == MAP_FAILED) {
+            perror("mapping postcopy tmp page");
+            return NULL;
+        }
+        if (madvise(page, getpagesize(), MADV_DONTFORK)) {
+            perror("postcopy tmp page DONTFORK");
+            munmap(page, getpagesize());
+            return NULL;
+        }
+        mis->postcopy_tmp_page = page;
+    }
+
+    return mis->postcopy_tmp_page;
+}
+
 #else
 /* No target OS support, stubs just fail */
 int postcopy_ram_hosttest(void)
@@ -480,6 +566,25 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
     fprintf(stderr, "postcopy_ram_enable_notify: No OS support\n");
     return -1;
 }
+
+int postcopy_place_zero_page(MigrationIncomingState *mis, void *host)
+{
+    error_report("postcopy_place_zero_page: No OS support");
+    return -1;
+}
+
+int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from)
+{
+    error_report("postcopy_place_page: No OS support");
+    return -1;
+}
+
+void *postcopy_get_tmp_page(MigrationIncomingState *mis)
+{
+    error_report("postcopy_get_tmp_page: No OS support");
+    return NULL;
+}
+
 #endif
 
 /* ------------------------------------------------------------------------- */
-- 
1.9.3

* [Qemu-devel] [PATCH v3 41/47] Postcopy: Use helpers to map pages during migration
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

In postcopy, the destination guest is running at the same time
as it's receiving pages; as we receive new pages we must put
them into the guest's address space atomically to avoid a running
CPU accessing a partially written page.

Use the helpers in postcopy-ram.c to map these pages.
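
A minimal, self-contained sketch of the idea (not the series' code): fill a private scratch page first, then swap it into place in one step, so a running vCPU sees either the old mapping or the complete new page, never a partial one. The series performs the swap with the proposed remap_anon_pages syscall; this sketch substitutes the standard Linux mremap(2) with MREMAP_FIXED, and the helper name place_page_atomically() is hypothetical. It assumes dest is page-aligned and len is a multiple of the page size.

```c
#define _GNU_SOURCE                 /* for mremap() and MREMAP_FIXED */
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy len bytes from data into a scratch mapping,
 * then move that mapping over dest with a single mremap() call.
 * dest must be page-aligned and len a multiple of the page size.
 * Returns 0 on success, -1 on failure.
 */
static int place_page_atomically(void *dest, const void *data, size_t len)
{
    void *tmp = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (tmp == MAP_FAILED) {
        return -1;
    }
    memcpy(tmp, data, len);     /* no other thread can see tmp yet */

    /* Replaces whatever was mapped at dest; tmp's old range is unmapped. */
    if (mremap(tmp, len, len, MREMAP_MAYMOVE | MREMAP_FIXED, dest) ==
        MAP_FAILED) {
        munmap(tmp, len);
        return -1;
    }
    return 0;
}
```

The copy happens entirely in memory that nothing else references, which is why the patch below fills a temporary page (postcopy_get_tmp_page) before handing it to postcopy_place_page rather than writing into the guest address directly.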

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c | 93 +++++++++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 84 insertions(+), 9 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index d592579..b449f6d 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -1328,9 +1328,20 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
     return 0;
 }
 
+/*
+ * Read a RAMBlock ID from the stream f, find the host address of the
+ * start of that block and add on 'offset'
+ *
+ * f: Stream to read from
+ * mis: MigrationIncomingState
+ * offset: Offset within the block
+ * flags: Page flags (mostly to see if it's a continuation of previous block)
+ * rb: Pointer to RAMBlock* that gets filled in with the RB we find
+ */
 static inline void *host_from_stream_offset(QEMUFile *f,
+                                            MigrationIncomingState *mis,
                                             ram_addr_t offset,
-                                            int flags)
+                                            int flags, RAMBlock **rb)
 {
     static RAMBlock *block = NULL;
     char id[256];
@@ -1341,8 +1352,11 @@ static inline void *host_from_stream_offset(QEMUFile *f,
             error_report("Ack, bad migration stream!");
             return NULL;
         }
+        if (rb) {
+            *rb = block;
+        }
 
-        return memory_region_get_ram_ptr(block->mr) + offset;
+        goto gotit;
     }
 
     len = qemu_get_byte(f);
@@ -1350,12 +1364,22 @@ static inline void *host_from_stream_offset(QEMUFile *f,
     id[len] = 0;
 
     QTAILQ_FOREACH(block, &ram_list.blocks, next) {
-        if (!strncmp(id, block->idstr, sizeof(id)))
-            return memory_region_get_ram_ptr(block->mr) + offset;
+        if (!strncmp(id, block->idstr, sizeof(id))) {
+            if (rb) {
+                *rb = block;
+            }
+            goto gotit;
+        }
     }
 
     error_report("Can't find block %s!", id);
     return NULL;
+
+gotit:
+    postcopy_hook_early_receive(mis,
+        (offset + block->offset) >> TARGET_PAGE_BITS);
+    return memory_region_get_ram_ptr(block->mr) + offset;
+
 }
 
 /*
@@ -1385,6 +1409,13 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
     ram_addr_t addr;
     int flags, ret = 0;
     static uint64_t seq_iter;
+    /*
+     * If the system is running in postcopy mode, page inserts to host
+     * memory must be
+     */
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    bool postcopy_running = mis->postcopy_ram_state >=
+                            POSTCOPY_RAM_INCOMING_LISTENING;
 
     seq_iter++;
 
@@ -1439,8 +1470,9 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
         } else if (flags & RAM_SAVE_FLAG_COMPRESS) {
             void *host;
             uint8_t ch;
+            RAMBlock *rb;
 
-            host = host_from_stream_offset(f, addr, flags);
+            host = host_from_stream_offset(f, mis, addr, flags, &rb);
             if (!host) {
                 error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
                 ret = -EINVAL;
@@ -1448,20 +1480,63 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
             }
 
             ch = qemu_get_byte(f);
-            ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
+            if (!postcopy_running) {
+                ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
+            } else {
+                if (!ch) {
+                    ret = postcopy_place_zero_page(mis, host,
+                              (addr + rb->offset) >> TARGET_PAGE_BITS);
+                } else {
+                    void *tmp;
+                    tmp = postcopy_get_tmp_page(mis);
+                    if (!tmp) {
+                        return -ENOMEM;
+                    }
+                    memset(tmp, ch, TARGET_PAGE_SIZE);
+                    ret = postcopy_place_page(mis, host, tmp,
+                              (addr + rb->offset) >> TARGET_PAGE_BITS);
+                }
+                if (ret) {
+                    error_report("ram_load: Failure in postcopy compress @"
+                                 RAM_ADDR_FMT "/%p;%s+" RAM_ADDR_FMT,
+                                 addr, host, rb->idstr, rb->offset);
+                    return ret;
+                }
+            }
         } else if (flags & RAM_SAVE_FLAG_PAGE) {
             void *host;
+            RAMBlock *rb;
 
-            host = host_from_stream_offset(f, addr, flags);
+            host = host_from_stream_offset(f, mis, addr, flags, &rb);
             if (!host) {
                 error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
                 ret = -EINVAL;
                 break;
             }
 
-            qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
+            if (!postcopy_running) {
+                qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
+            } else {
+                void *tmp = postcopy_get_tmp_page(mis);
+                if (!tmp) {
+                    return -ENOMEM;
+                }
+                qemu_get_buffer(f, tmp, TARGET_PAGE_SIZE);
+                ret = postcopy_place_page(mis, host, tmp,
+                          (addr + rb->offset) >> TARGET_PAGE_BITS);
+                if (ret) {
+                    error_report("ram_load: Failure in postcopy simple @"
+                                 RAM_ADDR_FMT "/%p;%s+" RAM_ADDR_FMT,
+                                 addr, host, rb->idstr, rb->offset);
+                    return ret;
+                }
+            }
         } else if (flags & RAM_SAVE_FLAG_XBZRLE) {
-            void *host = host_from_stream_offset(f, addr, flags);
+            if (postcopy_running) {
+                error_report("XBZRLE RAM block in postcopy mode @" RAM_ADDR_FMT,
+                             addr);
+                return -EINVAL;
+            }
+            void *host = host_from_stream_offset(f, mis, addr, flags, NULL);
             if (!host) {
                 error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
                 ret = -EINVAL;
-- 
1.9.3

* [Qemu-devel] [PATCH v3 42/47] qemu_ram_block_from_host
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Postcopy sends RAMBlock names and offsets over the wire (since it can't
rely on the order of ramaddr being the same), and it starts out with
HVA fault addresses from the kernel.

qemu_ram_block_from_host translates an HVA into a RAMBlock, an offset
in the RAMBlock, the global ram_addr_t value and its bitmap position.

Rewrite qemu_ram_addr_from_host to use qemu_ram_block_from_host.

Provide qemu_ram_get_idstr since it's the actual name text sent on the
wire.
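
The translation can be pictured with the simplified sketch below. SketchBlock, block_from_host() and the fixed 4 KiB page size are hypothetical stand-ins for RAMBlock, qemu_ram_block_from_host() and TARGET_PAGE_BITS; the real function also tries the most-recently-used block first and has a separate Xen path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_BITS 12                          /* assumed 4 KiB pages */
#define SKETCH_PAGE_SIZE ((uint64_t)1 << SKETCH_PAGE_BITS)
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Hypothetical, much-simplified stand-in for RAMBlock. */
typedef struct {
    const char *idstr;      /* block name, as sent on the wire */
    uint8_t *host;          /* host virtual address of the block start */
    uint64_t offset;        /* block start within the global ram_addr_t space */
    uint64_t length;        /* size in bytes */
} SketchBlock;

/* Returns the block containing ptr (or NULL) and fills in the out-params. */
static const SketchBlock *block_from_host(const SketchBlock *blocks, size_t n,
                                          const void *ptr, int round_offset,
                                          uint64_t *ram_addr, uint64_t *offset,
                                          unsigned long *bm_index)
{
    uintptr_t h = (uintptr_t)ptr;

    for (size_t i = 0; i < n; i++) {
        uintptr_t start = (uintptr_t)blocks[i].host;

        /* Unsigned wrap-around makes h < start fail this test as well. */
        if (h - start < blocks[i].length) {
            uint64_t off = h - start;

            if (round_offset) {
                off &= SKETCH_PAGE_MASK;      /* round down to a page */
            }
            *offset = off;
            *ram_addr = blocks[i].offset + off;
            *bm_index = (unsigned long)(*ram_addr >> SKETCH_PAGE_BITS);
            return &blocks[i];
        }
    }
    return NULL;
}
```

For example, with a block at ram_addr_t 0x10000, a fault 0x1064 bytes into it rounds to offset 0x1000, giving ram_addr 0x11000 and bitmap index 0x11.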

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 exec.c                    | 56 ++++++++++++++++++++++++++++++++++++++++++-----
 include/exec/cpu-common.h |  4 ++++
 2 files changed, 55 insertions(+), 5 deletions(-)

diff --git a/exec.c b/exec.c
index 8b95502..7c43485 100644
--- a/exec.c
+++ b/exec.c
@@ -1176,6 +1176,11 @@ static RAMBlock *find_ram_block(ram_addr_t addr)
     return NULL;
 }
 
+const char *qemu_ram_get_idstr(RAMBlock *rb)
+{
+    return rb->idstr;
+}
+
 void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev)
 {
     RAMBlock *new_block = find_ram_block(addr);
@@ -1515,16 +1520,35 @@ static void *qemu_ram_ptr_length(ram_addr_t addr, hwaddr *size)
     }
 }
 
-/* Some of the softmmu routines need to translate from a host pointer
-   (typically a TLB entry) back to a ram offset.  */
-MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
+/*
+ * Translates a host ptr back to a RAMBlock, a ram_addr and an offset
+ * in that RAMBlock.
+ *
+ * ptr: Host pointer to look up
+ * round_offset: If true, round the resulting offset down to a page boundary
+ * *ram_addr: set to result ram_addr
+ * *offset: set to result offset within the RAMBlock
+ * *bm_index: bitmap index (i.e. scaled ram_addr for use where the scale
+ *                          isn't available)
+ *
+ * Returns: RAMBlock (or NULL if not found)
+ */
+RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
+                                   ram_addr_t *ram_addr,
+                                   ram_addr_t *offset,
+                                   unsigned long *bm_index)
 {
     RAMBlock *block;
     uint8_t *host = ptr;
 
     if (xen_enabled()) {
         *ram_addr = xen_ram_addr_from_mapcache(ptr);
-        return qemu_get_ram_block(*ram_addr)->mr;
+        block = qemu_get_ram_block(*ram_addr);
+        if (!block) {
+            return NULL;
+        }
+        *offset = *ram_addr - block->offset;
+        return block;
     }
 
     block = ram_list.mru_block;
@@ -1545,7 +1569,29 @@ MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
     return NULL;
 
 found:
-    *ram_addr = block->offset + (host - block->host);
+    *offset = (host - block->host);
+    if (round_offset) {
+        *offset &= TARGET_PAGE_MASK;
+    }
+    *ram_addr = block->offset + *offset;
+    *bm_index = *ram_addr >> TARGET_PAGE_BITS;
+    return block;
+}
+
+/* Some of the softmmu routines need to translate from a host pointer
+   (typically a TLB entry) back to a ram offset.  */
+MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
+{
+    RAMBlock *block;
+    ram_addr_t offset; /* Not used */
+    unsigned long index; /* Not used */
+
+    block = qemu_ram_block_from_host(ptr, false, ram_addr, &offset, &index);
+
+    if (!block) {
+        return NULL;
+    }
+
     return block->mr;
 }
 
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 8042f50..ae25407 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -55,8 +55,12 @@ typedef uint32_t CPUReadMemoryFunc(void *opaque, hwaddr addr);
 void qemu_ram_remap(ram_addr_t addr, ram_addr_t length);
 /* This should not be used by devices.  */
 MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr);
+RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
+                                   ram_addr_t *ram_addr, ram_addr_t *offset,
+                                   unsigned long *bm_index);
 void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev);
 void qemu_ram_unset_idstr(ram_addr_t addr);
+const char *qemu_ram_get_idstr(RAMBlock *rb);
 
 void cpu_physical_memory_rw(hwaddr addr, uint8_t *buf,
                             int len, int is_write);
-- 
1.9.3

* [Qemu-devel] [PATCH v3 43/47] Don't sync dirty bitmaps in postcopy
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Once we're in postcopy, the source processors are stopped and memory
shouldn't change any more, so there's no need to look at the dirty
map.

There are two notes to this:
  1) If we did resync and a page had changed, the page would get
     sent again, which the destination wouldn't allow (since it might
     have also modified the page)
  2) Before disabling this I'd seen very rare cases where a page had been
     marked dirty even though its contents were apparently unchanged
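
Note 1 relies on the destination tracking a state for every page. A minimal sketch of that idea using C11 atomics, assuming one atomic int per page (the series instead packs PostcopyPMIState values into bitmaps) and hypothetical PAGE_* names modelled on POSTCOPY_PMI_*: the fault path moves MISSING to REQUESTED, and the receive path refuses a page that has already been RECEIVED.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical per-page states, modelled on POSTCOPY_PMI_*. */
enum { PAGE_MISSING = 0, PAGE_REQUESTED = 1, PAGE_RECEIVED = 2 };

/*
 * Fault path: MISSING -> REQUESTED.
 * Returns 1 if the caller should ask the source for the page, 0 if it has
 * already been requested or has already arrived.
 */
static int page_request(_Atomic int *state)
{
    int expected = PAGE_MISSING;

    return atomic_compare_exchange_strong(state, &expected, PAGE_REQUESTED);
}

/*
 * Receive path: MISSING or REQUESTED -> RECEIVED.
 * Returns the previous state (REQUESTED means a faulting thread may be
 * waiting and must be woken), or -1 if the page was already received,
 * i.e. a resend that must be refused because the guest may have changed it.
 */
static int page_receive(_Atomic int *state)
{
    int old = atomic_load(state);

    do {
        if (old == PAGE_RECEIVED) {
            return -1;
        }
        /* On failure, old is reloaded with the current value and rechecked. */
    } while (!atomic_compare_exchange_weak(state, &old, PAGE_RECEIVED));
    return old;
}
```

Using compare-and-swap loops means the fault thread and the page-receiving thread can race on the same page without a lock, in the same spirit as the postcopy_pmi_change_state() loops in this series.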

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index b449f6d..0694394 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -984,6 +984,7 @@ int64_t ram_mask_postcopy_bitmap(MigrationState *ms)
 {
     int64_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
 
+    /* This should be our last sync; the source is now paused */
     migration_bitmap_sync();
     bitmap_and(ms->sentmap, ms->sentmap, migration_bitmap, ram_pages);
     return ram_pages;
@@ -1251,7 +1252,10 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 static int ram_save_complete(QEMUFile *f, void *opaque)
 {
     qemu_mutex_lock_ramlist();
-    migration_bitmap_sync();
+
+    if (!migration_postcopy_phase(migrate_get_current())) {
+        migration_bitmap_sync();
+    }
 
     ram_control_before_iterate(f, RAM_CONTROL_FINISH);
 
@@ -1284,7 +1288,8 @@ static uint64_t ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size)
 
     remaining_size = ram_save_remaining() * TARGET_PAGE_SIZE;
 
-    if (remaining_size < max_size) {
+    if (!migration_postcopy_phase(migrate_get_current()) &&
+        remaining_size < max_size) {
         qemu_mutex_lock_iothread();
         migration_bitmap_sync();
         qemu_mutex_unlock_iothread();
-- 
1.9.3

* [Qemu-devel] [PATCH v3 44/47] Postcopy; Handle userfault requests
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

userfaultfd is a Linux syscall that gives an fd that receives a stream
of notifications of accesses to pages marked as MADV_USERFAULT, and
allows the program to acknowledge those stalls and tell the accessing
thread to carry on.
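
The fault thread added below waits on two fds at once: the userfaultfd, which delivers faulting addresses, and an eventfd used only to ask the thread to quit. This sketch shows that pattern in isolation. A pipe stands in for the userfaultfd (whose protocol in this series is a proposal, not a merged kernel API), and FaultCtx, fault_thread() and stop_fault_thread() are hypothetical names.

```c
#include <assert.h>
#include <poll.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical context; fault_fd stands in for the userfaultfd. */
typedef struct {
    int fault_fd;                   /* yields one host address per fault */
    int quit_fd;                    /* eventfd, readable once quit is requested */
    _Atomic unsigned long faults_seen;
} FaultCtx;

static void *fault_thread(void *opaque)
{
    FaultCtx *ctx = opaque;

    for (;;) {
        struct pollfd pfd[2] = {
            { .fd = ctx->fault_fd, .events = POLLIN },
            { .fd = ctx->quit_fd,  .events = POLLIN },
        };
        void *hostaddr;

        if (poll(pfd, 2, -1 /* wait forever */) == -1) {
            break;                  /* real code might retry on EINTR */
        }
        if (pfd[1].revents) {
            break;                  /* quit requested */
        }
        if (!(pfd[0].revents & POLLIN)) {
            break;                  /* error or hang-up on the fault fd */
        }
        if (read(ctx->fault_fd, &hostaddr, sizeof(hostaddr)) !=
            (ssize_t)sizeof(hostaddr)) {
            break;                  /* short read: lost alignment in the stream */
        }
        /* Real code would map hostaddr to a page and request it here. */
        atomic_fetch_add(&ctx->faults_seen, 1);
    }
    return NULL;
}

/* Wake the thread through the eventfd and wait for it to exit. */
static int stop_fault_thread(FaultCtx *ctx, pthread_t tid)
{
    uint64_t one = 1;

    if (write(ctx->quit_fd, &one, sizeof(one)) != (ssize_t)sizeof(one)) {
        return -1;
    }
    return pthread_join(tid, NULL);
}
```

An eventfd is a convenient quit signal because writing 1 makes it readable, which wakes the blocking poll() without needing a timeout or a signal.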

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |   6 ++
 postcopy-ram.c                | 219 ++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 216 insertions(+), 9 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 437f783..1adbcf0 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -87,9 +87,15 @@ struct MigrationIncomingState {
         POSTCOPY_RAM_INCOMING_END
     } postcopy_ram_state;
 
+    bool           have_fault_thread;
     QemuThread     fault_thread;
     QemuSemaphore  fault_thread_sem;
 
+    /* For the kernel to send us notifications */
+    int            userfault_fd;
+    /* To tell the fault_thread to quit */
+    int            userfault_quit_fd;
+
     QEMUFile *return_path;
     QemuMutex      rp_mutex;    /* We send replies from multiple threads */
     PostcopyPMI    postcopy_pmi;
diff --git a/postcopy-ram.c b/postcopy-ram.c
index 739403c..4cf2c68 100644
--- a/postcopy-ram.c
+++ b/postcopy-ram.c
@@ -54,6 +54,8 @@
  *                       areas without creating loads of VMAs.
  */
 
+#include <poll.h>
+#include <sys/eventfd.h>
 #include <sys/mman.h>
 #include <sys/types.h>
 
@@ -380,7 +382,31 @@ int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages)
  */
 int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
 {
-    /* TODO: Join the fault thread once we're sure it will exit */
+    DPRINTF("%s: entry", __func__);
+    if (mis->have_fault_thread) {
+        uint64_t tmp64;
+        /*
+         * Tell the fault_thread to exit, it's an eventfd that should
+         * currently be at 0, we're going to inc it to 1
+         */
+        tmp64 = 1;
+        if (write(mis->userfault_quit_fd, &tmp64, 8) == 8) {
+            DPRINTF("%s: Joining fault thread", __func__);
+            qemu_thread_join(&mis->fault_thread);
+        } else {
+            /* Not much we can do here, but may as well report it */
+            perror("incrementing userfault_quit_fd");
+        }
+
+        DPRINTF("%s: closing uf", __func__);
+        close(mis->userfault_fd);
+        close(mis->userfault_quit_fd);
+        mis->have_fault_thread = false;
+    }
+
+    mis->postcopy_ram_state = POSTCOPY_RAM_INCOMING_END;
+    migrate_send_rp_shut(mis, qemu_file_get_error(mis->file) != 0);
+
     if (qemu_ram_foreach_block(cleanup_area, mis)) {
         return -1;
     }
@@ -389,6 +415,7 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
         munmap(mis->postcopy_tmp_page, getpagesize());
         mis->postcopy_tmp_page = NULL;
     }
+    DPRINTF("%s: exit", __func__);
     return 0;
 }
 
@@ -413,35 +440,205 @@ static int postcopy_ram_sensitise_area(const char *block_name, void *host_addr,
 }
 
 /*
+ * Tell the kernel that we've now got some memory it previously asked for.
+ * Note: We're not allowed to ack a page which wasn't requested.
+ */
+static int ack_userfault(MigrationIncomingState *mis, void *start, size_t len)
+{
+    uint64_t tmp[2];
+
+    /* Kernel wants the range that's now safe to access */
+    tmp[0] = (uint64_t)(uintptr_t)start;
+    tmp[1] = (uint64_t)(uintptr_t)start + (uint64_t)(len - 1);
+
+    if (write(mis->userfault_fd, tmp, 16) != 16) {
+        int e = errno;
+
+        if (e == ENOENT) {
+            /*
+             * The kernel says it wasn't waiting. One way this can happen is
+             * when two threads trigger the same userfault: we receive and ack
+             * the page just after the fault thread has read the 2nd request,
+             * and handling that request then decides to ack it too.
+             * We could optimise this out, but it's rare.
+             */
+            /*fprintf(stderr, "ack_userfault: %p/%zx ENOENT\n", start, len); */
+            return 0;
+        }
+        error_report("postcopy_ram: Failed to notify kernel for %p/%zx (%d)",
+                     start, len, e);
+        return -e;
+    }
+
+    return 0;
+}
+
+/*
  * Handle faults detected by the USERFAULT markings
  */
 static void *postcopy_ram_fault_thread(void *opaque)
 {
     MigrationIncomingState *mis = (MigrationIncomingState *)opaque;
+    void *hostaddr;
+    int ret;
+    size_t hostpagesize = getpagesize();
+    RAMBlock *rb = NULL;
+    RAMBlock *last_rb = NULL;
 
-    fprintf(stderr, "postcopy_ram_fault_thread\n");
-    /* TODO: In later patch */
+    DPRINTF("%s", __func__);
     qemu_sem_post(&mis->fault_thread_sem);
-    while (1) {
-        /* TODO: In later patch */
-    }
+    while (true) {
+        PostcopyPMIState old_state, tmp_state;
+        ram_addr_t rb_offset;
+        ram_addr_t in_raspace;
+        unsigned long bitmap_index;
+        struct pollfd pfd[2];
+
+        /*
+         * We're mainly waiting for the kernel to give us a faulting HVA,
+         * however we can be told to quit via userfault_quit_fd which is
+         * an eventfd
+         */
+        pfd[0].fd = mis->userfault_fd;
+        pfd[0].events = POLLIN;
+        pfd[0].revents = 0;
+        pfd[1].fd = mis->userfault_quit_fd;
+        pfd[1].events = POLLIN; /* Waiting for eventfd to go positive */
+        pfd[1].revents = 0;
+
+        if (poll(pfd, 2, -1 /* Wait forever */) == -1) {
+            perror("userfault poll");
+            break;
+        }
 
+        if (pfd[1].revents) {
+            DPRINTF("%s got quit event", __func__);
+            break;
+        }
+
+        ret = read(mis->userfault_fd, &hostaddr, sizeof(hostaddr));
+        if (ret != sizeof(hostaddr)) {
+            if (ret < 0) {
+                perror("Failed to read full userfault hostaddr");
+                break;
+            } else {
+                error_report("%s: Read %d bytes from userfaultfd, expected %zu",
+                             __func__, ret, sizeof(hostaddr));
+                break; /* Lost alignment, don't know what we'd read next */
+            }
+        }
+
+        /* TODO: We want to be marking host-page-size areas of the bitmaps? */
+        last_rb = rb;
+        rb = qemu_ram_block_from_host(hostaddr, true, &in_raspace, &rb_offset,
+                                      &bitmap_index);
+        if (!rb) {
+            error_report("postcopy_ram_fault_thread: Fault outside guest: %p",
+                         hostaddr);
+            break;
+        }
+
+        DPRINTF("%s: Request for HVA=%p index=%lx rb=%s offset=%zx",
+                __func__, hostaddr, bitmap_index, qemu_ram_get_idstr(rb),
+                rb_offset);
+
+        tmp_state = postcopy_pmi_get_state(mis, bitmap_index);
+        do {
+            old_state = tmp_state;
+
+            switch (old_state) {
+            case POSTCOPY_PMI_REQUESTED:
+                /* Do nothing - it's already requested */
+                break;
+
+            case POSTCOPY_PMI_RECEIVED:
+                /* Already arrived - no state change, just kick the kernel */
+                DPRINTF("postcopy_ram_fault_thread: notify pre of %p",
+                        hostaddr);
+                if (ack_userfault(mis, hostaddr, hostpagesize)) {
+                    assert(0);
+                }
+                break;
+
+            case POSTCOPY_PMI_MISSING:
+
+                tmp_state = postcopy_pmi_change_state(mis, bitmap_index,
+                                           old_state, POSTCOPY_PMI_REQUESTED);
+                if (tmp_state == POSTCOPY_PMI_MISSING) {
+                    /*
+                     * Send the request to the source - we want to request one
+                     * of our host page sizes (which is >= TPS)
+                     */
+                    if (rb != last_rb) {
+                        migrate_send_rp_reqpages(mis, qemu_ram_get_idstr(rb),
+                                                 rb_offset, hostpagesize);
+                    } else {
+                        /* Save some space */
+                        migrate_send_rp_reqpages(mis, NULL,
+                                                 rb_offset, hostpagesize);
+                    }
+                }
+                break;
+            }
+        } while (tmp_state != old_state);
+    }
+    DPRINTF("%s: exit", __func__);
     return NULL;
 }
 
 int postcopy_ram_enable_notify(MigrationIncomingState *mis)
 {
-    /* Create the fault handler thread and wait for it to be ready */
+    uint64_t tmp64;
+
+    /* Open the fd for the kernel to give us userfaults */
+    mis->userfault_fd = syscall(__NR_userfaultfd, O_CLOEXEC);
+    if (mis->userfault_fd == -1) {
+        perror("Failed to open userfault fd");
+        return -1;
+    }
+
+    /*
+     * Version handshake, we send it the version we want and expect to get the
+     * same back.
+     */
+    tmp64 = USERFAULTFD_PROTOCOL;
+    if (write(mis->userfault_fd, &tmp64, sizeof(tmp64)) != sizeof(tmp64)) {
+        perror("Writing userfaultfd version");
+        close(mis->userfault_fd);
+        return -1;
+    }
+    if (read(mis->userfault_fd, &tmp64, sizeof(tmp64)) != sizeof(tmp64)) {
+        perror("Reading userfaultfd version");
+        close(mis->userfault_fd);
+        return -1;
+    }
+    if (tmp64 != USERFAULTFD_PROTOCOL) {
+        error_report("Mismatched userfaultfd version, expected %zx, got %zx",
+                     (size_t)USERFAULTFD_PROTOCOL, (size_t)tmp64);
+        close(mis->userfault_fd);
+        return -1;
+    }
+
+    /* Now an eventfd we use to tell the fault-thread to quit */
+    mis->userfault_quit_fd = eventfd(0, EFD_CLOEXEC);
+    if (mis->userfault_quit_fd == -1) {
+        perror("Opening userfault_quit_fd");
+        close(mis->userfault_fd);
+        return -1;
+    }
+
     qemu_sem_init(&mis->fault_thread_sem, 0);
     qemu_thread_create(&mis->fault_thread, "postcopy/fault",
                        postcopy_ram_fault_thread, mis, QEMU_THREAD_JOINABLE);
     qemu_sem_wait(&mis->fault_thread_sem);
+    mis->have_fault_thread = true;
 
     /* Mark so that we get notified of accesses to unwritten areas */
     if (qemu_ram_foreach_block(postcopy_ram_sensitise_area, NULL)) {
         return -1;
     }
 
+    DPRINTF("postcopy_ram_enable_notify: Sensitised");
+
     return 0;
 }
 
@@ -475,11 +672,12 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
 
     if (syscall(__NR_remap_anon_pages, host, from, getpagesize(), 0) !=
             getpagesize()) {
+        int e = errno;
         perror("remap_anon_pages in postcopy_place_page");
         fprintf(stderr, "host: %p from: %p pmi=%d\n", host, from,
                 postcopy_pmi_get_state(mis, bitmap_offset));
 
-        return -errno;
+        return -e;
     }
 
     tmp_state = postcopy_pmi_get_state(mis, bitmap_offset);
@@ -492,7 +690,10 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
 
 
     if (old_state == POSTCOPY_PMI_REQUESTED) {
-        /* TODO: Notify kernel */
+        /* Send the kernel the host address that should now be accessible */
+        DPRINTF("%s: Notifying kernel bitmap_offset=0x%lx host=%p",
+                __func__, bitmap_offset, host);
+        return ack_userfault(mis, host, getpagesize());
     }
 
     /* TODO: hostpagesize!=targetpagesize case */
-- 
1.9.3

* [Qemu-devel] [PATCH v3 45/47] Start up a postcopy/listener thread ready for incoming page data
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The loading of a device state (during postcopy) may access guest
memory that's still on the source machine and thus might need
a page fill; split off a separate thread that handles the incoming
page data so that the original incoming migration code can finish
off the device data.
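
The start-up handshake used below (create a thread, then block on a semaphore until the thread reports that it is running) can be sketched with POSIX threads and semaphores. Listener, start_listener() and demo_load_loop() are hypothetical; in the patch the thread body is qemu_loadvm_state_main() and the semaphore is listen_thread_sem.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

/* Hypothetical stand-in for the listen-thread fields of MigrationIncomingState. */
typedef struct {
    int (*load_loop)(void *stream);  /* loop that consumes the stream */
    void *stream;
    int result;
    int have_thread;
    pthread_t thread;
    sem_t ready;                     /* posted once the thread is running */
} Listener;

static void *listen_thread(void *opaque)
{
    Listener *l = opaque;

    sem_post(&l->ready);                  /* tell the starter we are up */
    l->result = l->load_loop(l->stream);  /* take over reading the stream */
    return NULL;
}

/*
 * Start the listener and wait until it is running. On success the caller's
 * own read loop should stop, because the new thread now owns the stream.
 */
static int start_listener(Listener *l)
{
    if (l->have_thread) {
        return -1;                        /* only one listener at a time */
    }
    if (sem_init(&l->ready, 0, 0) != 0) {
        return -1;
    }
    if (pthread_create(&l->thread, NULL, listen_thread, l) != 0) {
        sem_destroy(&l->ready);
        return -1;
    }
    l->have_thread = 1;
    sem_wait(&l->ready);    /* a robust version would retry on EINTR */
    return 0;
}

/* Demo loop for testing: "consumes" a counter standing in for page data. */
static int demo_load_loop(void *stream)
{
    int *pages_left = stream;

    while (*pages_left > 0) {
        (*pages_left)--;
    }
    return 0;
}
```

Waiting on the semaphore guarantees the listener exists before the parent loop returns, so no incoming page data can arrive with nobody reading it.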

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  4 +++
 migration.c                   |  6 +++++
 savevm.c                      | 62 +++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 70 insertions(+), 2 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 1adbcf0..a83e5fa 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -91,6 +91,10 @@ struct MigrationIncomingState {
     QemuThread     fault_thread;
     QemuSemaphore  fault_thread_sem;
 
+    bool           have_listen_thread;
+    QemuThread     listen_thread;
+    QemuSemaphore  listen_thread_sem;
+
     /* For the kernel to send us notifications */
     int            userfault_fd;
     /* To tell the fault_thread to quit */
diff --git a/migration.c b/migration.c
index 9c0f926..21b419e 100644
--- a/migration.c
+++ b/migration.c
@@ -1058,6 +1058,12 @@ static int postcopy_start(MigrationState *ms)
      */
     QEMUFile *fb = qemu_bufopen("w", NULL);
 
+    /*
+     * Make sure the receiver can get incoming pages before we send the rest
+     * of the state
+     */
+    qemu_savevm_send_postcopy_ram_listen(fb);
+
     qemu_savevm_state_complete(fb);
     DPRINTF("postcopy_start: sending req 3\n");
     qemu_savevm_send_reqack(fb, 3);
diff --git a/savevm.c b/savevm.c
index 0cae88b..4960a21 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1265,9 +1265,45 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
     return 0;
 }
 
+typedef struct ram_listen_thread_data {
+    QEMUFile *f;
+    LoadStateEntry_Head *lh;
+} ram_listen_thread_data;
+
+/*
+ * Triggered by a postcopy_listen command; this thread takes over reading
+ * the input stream, leaving the main thread free to carry on loading the rest
+ * of the device state (from RAM).
+ * (TODO: This could do with being in a postcopy file - but there again it's
+ * just another input loop, not that postcopy specific)
+ */
+static void *postcopy_ram_listen_thread(void *opaque)
+{
+    ram_listen_thread_data *rltd = opaque;
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    int load_res;
+
+    qemu_sem_post(&mis->listen_thread_sem);
+    DPRINTF("postcopy_ram_listen_thread start");
+
+    load_res = qemu_loadvm_state_main(rltd->f, rltd->lh);
+
+    DPRINTF("postcopy_ram_listen_thread exiting");
+    if (load_res < 0) {
+        error_report("%s: loadvm failed: %d", __func__, load_res);
+        qemu_file_set_error(rltd->f, load_res);
+    }
+    postcopy_ram_incoming_cleanup(mis);
+    g_free(rltd);
+
+    return NULL;
+}
+
 /* After this message we must be able to immediately receive page data */
 static int loadvm_postcopy_ram_handle_listen(MigrationIncomingState *mis)
 {
+    ram_listen_thread_data *rltd = g_malloc(sizeof(ram_listen_thread_data));
+
     DPRINTF("%s", __func__);
     if (mis->postcopy_ram_state != POSTCOPY_RAM_INCOMING_ADVISE) {
         error_report("CMD_POSTCOPY_RAM_LISTEN in wrong postcopy state (%d)",
@@ -1286,8 +1322,25 @@ static int loadvm_postcopy_ram_handle_listen(MigrationIncomingState *mis)
         return -1;
     }
 
-    /* TODO start up the postcopy listening thread */
-    return 0;
+    if (mis->have_listen_thread) {
+        error_report("CMD_POSTCOPY_RAM_LISTEN already has a listen thread");
+        return -1;
+    }
+
+    mis->have_listen_thread = true;
+    /* Start up the listening thread and wait for it to signal ready */
+    qemu_sem_init(&mis->listen_thread_sem, 0);
+    rltd->f = mis->file;
+    rltd->lh = &loadvm_handlers;
+    qemu_thread_create(&mis->listen_thread, "postcopy/listen",
+                       postcopy_ram_listen_thread, rltd, QEMU_THREAD_JOINABLE);
+    qemu_sem_wait(&mis->listen_thread_sem);
+
+    /*
+     * all good - cause the loop that handled this command to exit because
+     * the new thread is taking over
+     */
+    return LOADVM_EXITCODE_QUITPARENT | LOADVM_EXITCODE_KEEPHANDLERS;
 }
 
 /* After all discards we can start running and asking for pages */
@@ -1607,6 +1660,11 @@ int qemu_loadvm_state(QEMUFile *f)
     QLIST_INIT(&loadvm_handlers);
     ret = qemu_loadvm_state_main(f, &loadvm_handlers);
 
+    if (migration_incoming_get_current()->have_listen_thread) {
+        /* Listen thread still going, can't clean up yet */
+        return ret;
+    }
+
     if (ret == 0) {
         cpu_synchronize_all_post_init();
     }
-- 
1.9.3


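Patch 45 performs a handoff: the main thread creates the listen thread, blocks until that thread posts listen_thread_sem, then returns so that its own command loop exits. That handoff can be sketched with plain POSIX threads. This is a minimal illustration under stated assumptions, not QEMU code. listen_ctx, listen_thread_fn() and start_listen_thread() are hypothetical names, a mutex/condition-variable pair stands in for QemuSemaphore, and the stream-reading loop is reduced to a counter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical state shared by the main thread and the listen thread. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool started;        /* set once the listen thread is running */
    int  pages_to_load;  /* stands in for the rest of the stream */
    int  pages_loaded;   /* only read by others after pthread_join() */
} listen_ctx;

static void *listen_thread_fn(void *opaque)
{
    listen_ctx *ctx = opaque;

    /* Plays the role of qemu_sem_post(&mis->listen_thread_sem). */
    pthread_mutex_lock(&ctx->lock);
    ctx->started = true;
    pthread_cond_signal(&ctx->cond);
    pthread_mutex_unlock(&ctx->lock);

    /* Plays the role of qemu_loadvm_state_main(): consume the stream. */
    for (int i = 0; i < ctx->pages_to_load; i++) {
        ctx->pages_loaded++;
    }
    return NULL;
}

/* Start the listener and block until it reports that it is running. */
static int start_listen_thread(listen_ctx *ctx, pthread_t *tid)
{
    assert(ctx != NULL && tid != NULL);
    if (pthread_create(tid, NULL, listen_thread_fn, ctx) != 0) {
        return -1;
    }
    /* Plays the role of qemu_sem_wait(); the loop handles spurious wakeups. */
    pthread_mutex_lock(&ctx->lock);
    while (!ctx->started) {
        pthread_cond_wait(&ctx->cond, &ctx->lock);
    }
    pthread_mutex_unlock(&ctx->lock);
    return 0;
}
```

Waiting for the readiness signal before returning mirrors the qemu_sem_wait() in the patch. It guarantees the listener is actually running before the parent loop stops reading the stream. Whoever owns the thread later joins it, since the patch creates it QEMU_THREAD_JOINABLE.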
* [Qemu-devel] [PATCH v3 46/47] postcopy: Wire up loadvm_postcopy_ram_handle_{run, end} commands
  2014-08-28 15:03 [Qemu-devel] [PATCH v3 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (44 preceding siblings ...)
  2014-08-28 15:04 ` [Qemu-devel] [PATCH v3 45/47] Start up a postcopy/listener thread ready for incoming page data Dr. David Alan Gilbert (git)
@ 2014-08-28 15:04 ` Dr. David Alan Gilbert (git)
  2014-08-28 15:04 ` [Qemu-devel] [PATCH v3 47/47] End of migration for postcopy Dr. David Alan Gilbert (git)
  46 siblings, 0 replies; 52+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:04 UTC
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Wire up more of the handlers for the commands on the destination side;
in particular, loadvm_postcopy_ram_handle_run now does enough to start the
guest running.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 savevm.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 58 insertions(+), 5 deletions(-)

diff --git a/savevm.c b/savevm.c
index 4960a21..e227689 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1346,6 +1346,8 @@ static int loadvm_postcopy_ram_handle_listen(MigrationIncomingState *mis)
 /* After all discards we can start running and asking for pages */
 static int loadvm_postcopy_ram_handle_run(MigrationIncomingState *mis)
 {
+    Error *local_err = NULL;
+
     DPRINTF("%s", __func__);
     if (mis->postcopy_ram_state != POSTCOPY_RAM_INCOMING_LISTENING) {
         error_report("CMD_POSTCOPY_RAM_RUN in wrong postcopy state (%d)",
@@ -1354,6 +1356,28 @@ static int loadvm_postcopy_ram_handle_run(MigrationIncomingState *mis)
     }
 
     mis->postcopy_ram_state = POSTCOPY_RAM_INCOMING_RUNNING;
+
+    /* TODO we should move all of this into postcopy_ram.c or into shared code
+     * in migration.c
+     */
+    cpu_synchronize_all_post_init();
+
+    qemu_announce_self();
+    bdrv_clear_incoming_migration_all();
+
+    /* Make sure all file formats flush their mutable metadata */
+    bdrv_invalidate_cache_all(&local_err);
+    if (local_err) {
+        qerror_report_err(local_err);
+        error_free(local_err);
+        return -1;
+    }
+
+    DPRINTF("loadvm_postcopy_ram_handle_run: cpu_synchronize_all_post_init");
+    cpu_synchronize_all_post_init();
+
+    DPRINTF("loadvm_postcopy_ram_handle_run: vm_start");
+
     if (autostart) {
         /* Hold onto your hats, starting the CPU */
         vm_start();
@@ -1362,11 +1386,15 @@ static int loadvm_postcopy_ram_handle_run(MigrationIncomingState *mis)
         runstate_set(RUN_STATE_PAUSED);
     }
 
-    return 0;
+    return LOADVM_EXITCODE_QUITLOOP;
 }
 
-/* The end - with a byte from the source which can tell us to fail. */
-static int loadvm_postcopy_ram_handle_end(MigrationIncomingState *mis)
+/* The end - with a byte from the source which can tell us to fail.
+ * The source sends this either if there is a failure, or if it believes it's
+ * sent everything
+ */
+static int loadvm_postcopy_ram_handle_end(MigrationIncomingState *mis,
+                                          uint8_t status)
 {
     DPRINTF("%s", __func__);
     if (mis->postcopy_ram_state == POSTCOPY_RAM_INCOMING_NONE) {
@@ -1374,7 +1402,32 @@ static int loadvm_postcopy_ram_handle_end(MigrationIncomingState *mis)
                      mis->postcopy_ram_state);
         return -1;
     }
-    return -1; /* TODO - expecting 1 byte good/fail */
+
+    DPRINTF("loadvm_postcopy_ram_handle_end status=%d", status);
+
+    if (!status) {
+        bool one_message = false;
+        /* This looks good, but it's possible that the device loading in the
+         * main thread hasn't finished yet, and so we might not be in 'RUN'
+         * state yet.
+         * TODO: Using an atomic_xchg or something for this
+         */
+        while (mis->postcopy_ram_state == POSTCOPY_RAM_INCOMING_LISTENING) {
+            if (!one_message) {
+                DPRINTF("%s: Waiting for RUN", __func__);
+                one_message = true;
+            }
+        }
+    }
+
+    if (status) {
+        error_report("CMD_POSTCOPY_RAM_END: error on source host (%d)",
+                     status);
+        qemu_file_set_error(mis->file, -EPIPE);
+    }
+
+    /* This will cause the listen thread to exit and call cleanup */
+    return LOADVM_EXITCODE_QUITLOOP;
 }
 
 static int loadvm_process_command_simple_lencheck(const char *name,
@@ -1515,7 +1568,7 @@ static int loadvm_process_command(QEMUFile *f,
                                                    len, 1)) {
             return -1;
         }
-        return loadvm_postcopy_ram_handle_end(mis);
+        return loadvm_postcopy_ram_handle_end(mis, qemu_get_byte(f));
 
     default:
         error_report("VM_COMMAND 0x%x unknown (len 0x%x)", com, len);
-- 
1.9.3


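In loadvm_postcopy_ram_handle_end() above, the listen thread spins on mis->postcopy_ram_state while the main thread finishes loading devices, and the TODO suggests an atomic. A plain-variable busy loop like that is a data race in C11: the compiler may hoist the load out of the loop, and the loop burns a CPU. One possible alternative is sketched below, as an assumption about how it could be done rather than what the series does. It guards the state with a mutex and has the waiter sleep on a condition variable. state_box, set_state(), wait_while_listening() and finish_device_load() are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical mirror of the destination-side postcopy states. */
typedef enum { PC_NONE, PC_ADVISE, PC_LISTENING, PC_RUNNING, PC_END } pc_state;

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    pc_state        state;
} state_box;

/* Writer: what the RUN handler on the main thread would call. */
static void set_state(state_box *b, pc_state s)
{
    pthread_mutex_lock(&b->lock);
    assert(s >= b->state);            /* this sketch only moves forward */
    b->state = s;
    pthread_cond_broadcast(&b->changed);
    pthread_mutex_unlock(&b->lock);
}

/* Reader: the END handler sleeps here instead of spinning. */
static pc_state wait_while_listening(state_box *b)
{
    pthread_mutex_lock(&b->lock);
    while (b->state == PC_LISTENING) {
        pthread_cond_wait(&b->changed, &b->lock);
    }
    pc_state s = b->state;
    pthread_mutex_unlock(&b->lock);
    return s;
}

/* Hypothetical stand-in for the main thread finishing the device load. */
static void *finish_device_load(void *opaque)
{
    set_state(opaque, PC_RUNNING);
    return NULL;
}
```

An atomic load alone (for example C11 atomic_load) would fix the visibility problem but would still spin. The condition variable lets the listen thread sleep until the RUN handler changes the state.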
* [Qemu-devel] [PATCH v3 47/47] End of migration for postcopy
  2014-08-28 15:03 [Qemu-devel] [PATCH v3 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (45 preceding siblings ...)
  2014-08-28 15:04 ` [Qemu-devel] [PATCH v3 46/47] postcopy: Wire up loadvm_postcopy_ram_handle_{run, end} commands Dr. David Alan Gilbert (git)
@ 2014-08-28 15:04 ` Dr. David Alan Gilbert (git)
  46 siblings, 0 replies; 52+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-28 15:04 UTC
  To: qemu-devel; +Cc: aarcange, yamahata, amit.shah, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Tweak the end-of-migration cleanup; we don't want to close stuff down
at the end of the main stream, since postcopy pages are still arriving
on the listen thread.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/migration.c b/migration.c
index 21b419e..55ee54d 100644
--- a/migration.c
+++ b/migration.c
@@ -205,12 +205,33 @@ static void process_incoming_migration_co(void *opaque)
 {
     QEMUFile *f = opaque;
     Error *local_err = NULL;
+    MigrationIncomingState *mis;
     int ret;
 
-    migration_incoming_state_init(f);
+    mis = migration_incoming_state_init(f);
 
     ret = qemu_loadvm_state(f);
 
+    DPRINTF("%s: ret=%d postcopy_ram_state=%d", __func__, ret,
+            mis->postcopy_ram_state);
+    if (mis->postcopy_ram_state == POSTCOPY_RAM_INCOMING_ADVISE) {
+        /*
+         * Where a migration had postcopy enabled (and thus went to advise)
+         * but managed to complete within the precopy period
+         */
+        postcopy_ram_incoming_cleanup(mis);
+    } else {
+        if ((ret >= 0) &&
+            (mis->postcopy_ram_state > POSTCOPY_RAM_INCOMING_ADVISE)) {
+            /*
+             * Postcopy was started, cleanup should happen at the end of the
+             * postcopy thread.
+             */
+            DPRINTF("process_incoming_migration_co: exiting main branch");
+            return;
+        }
+    }
+
     qemu_fclose(f);
     free_xbzrle_decoded_buf();
     migration_incoming_state_destroy();
-- 
1.9.3


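Patch 47 adds a branch to process_incoming_migration_co() that decides who tears down the incoming state. The hypothetical helper below reduces that decision to a pure function of the load result and the postcopy state, which makes the three cases easy to see and test. The names and enum values are illustrative only; the real code performs the cleanup inline rather than returning a verdict.

```c
#include <assert.h>

/* Hypothetical mirror of the destination-side postcopy states. */
typedef enum { PC_NONE, PC_ADVISE, PC_LISTENING, PC_RUNNING, PC_END } pc_state;

typedef enum {
    CLEANUP_BY_MAIN,             /* precopy only, or the load failed */
    CLEANUP_POSTCOPY_THEN_MAIN,  /* advised, but completed within precopy */
    CLEANUP_BY_LISTEN_THREAD,    /* postcopy started: the listener tears down */
} cleanup_owner;

/* Mirrors the branch structure added to process_incoming_migration_co(). */
static cleanup_owner who_cleans_up(int ret, pc_state state)
{
    assert(state <= PC_END);
    if (state == PC_ADVISE) {
        /* postcopy_ram_incoming_cleanup(), then the normal teardown */
        return CLEANUP_POSTCOPY_THEN_MAIN;
    }
    if (ret >= 0 && state > PC_ADVISE) {
        /* return early; the listen thread cleans up when it exits */
        return CLEANUP_BY_LISTEN_THREAD;
    }
    /* qemu_fclose() and migration_incoming_state_destroy() on this path */
    return CLEANUP_BY_MAIN;
}
```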
* Re: [Qemu-devel] [PATCH v3 03/47] Start documenting how postcopy works.
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 03/47] Start documenting how postcopy works Dr. David Alan Gilbert (git)
@ 2014-09-09  3:34   ` Hongyang Yang
  2014-09-09  3:46     ` Hongyang Yang
  2014-09-09  3:39   ` Hongyang Yang
  1 sibling, 1 reply; 52+ messages in thread
From: Hongyang Yang @ 2014-09-09  3:34 UTC
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, Jiang Yunhong, Dong Eddie,
	amit.shah, Lai Jiangshan

Hi

   I've read your documentation about Postcopy; it's interesting.
It occurred to me that COLO might gain some improvements from
Postcopy.
   My first thought was whether we could use the back channel that
requests dirty pages from the source, so that we do not need to manage
a RAM snapshot of the source. That is, when we enter a COLO checkpoint,
we just request from the source the pages that were dirtied on the
destination. I'm not sure how it would affect performance, but it may
be worth a try.

On 08/28/2014 11:03 PM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>   docs/migration.txt | 188 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 188 insertions(+)
>
> diff --git a/docs/migration.txt b/docs/migration.txt
> index 0492a45..7f0fdc4 100644
> --- a/docs/migration.txt
> +++ b/docs/migration.txt
> @@ -294,3 +294,191 @@ save/send this state when we are in the middle of a pio operation
>   (that is what ide_drive_pio_state_needed() checks).  If DRQ_STAT is
>   not enabled, the values on that fields are garbage and don't need to
>   be sent.
> +
> += Return path =
> +
> +In most migration scenarios there is only a single data path that runs
> +from the source VM to the destination, typically along a single fd (although
> +possibly with another fd or similar for some fast way of throwing pages across).
> +
> +However, some uses need two-way communication; in particular the Postcopy destination
> +needs to be able to request pages on demand from the source.
> +
> +For these scenarios there is a 'return path' from the destination to the source;
> +qemu_file_get_return_path(QEMUFile* fwdpath) gives the QEMUFile* for the return
> +path.
> +
> +  Source side
> +     Forward path - written by migration thread
> +     Return path  - opened by main thread, read by return-path thread
> +
> +  Destination side
> +     Forward path - read by main thread
> +     Return path  - opened by main thread, written by main thread AND postcopy
> +                    thread (protected by rp_mutex)
> +
> += Postcopy =
> +'Postcopy' migration is a way to deal with migrations that refuse to converge;
> +its plus side is that there is an upper bound on the amount of migration traffic
> +and the time it takes; the downside is that during the postcopy phase, a failure of
> +*either* side or the network connection causes the guest to be lost.
> +
> +In postcopy the destination CPUs are started before all the memory has been
> +transferred, and accesses to pages that are yet to be transferred cause
> +a fault that's translated by QEMU into a request to the source QEMU.
> +
> +Postcopy can be combined with precopy (i.e. normal migration) so that if precopy
> +doesn't finish in a given time the switch is automatically made to postcopy.
> +
> +=== Enabling postcopy ===
> +
> +To enable postcopy (prior to the start of migration):
> +
> +migrate_set_capability x-postcopy-ram on
> +
> +The migration will still start in precopy mode, however issuing:
> +
> +migrate_start_postcopy
> +
> +will now cause the transition from precopy to postcopy.
> +It can be issued immediately after migration is started or any
> +time later on.  Issuing it after the end of a migration is harmless.
> +
> +=== Postcopy device transfer ===
> +
> +Loading device data may cause the device emulation to access guest RAM,
> +which may trigger faults that have to be resolved by the source.  The
> +migration stream therefore has to be able to respond with page data *during*
> +the device load, so the device data must be read completely from the stream
> +before the device load begins, freeing the stream up.  This is achieved by
> +'packaging' the device data into a blob that's read in one go.
> +
> +Source behaviour
> +
> +Until postcopy is entered, the migration stream is identical to normal precopy,
> +except for the addition of a 'postcopy advise' command at the beginning to
> +let the destination know that postcopy might happen.  When postcopy starts,
> +the source sends the page discard data and then forms the 'package' containing:
> +
> +   Command: 'postcopy ram listen'
> +   The device state
> +      A series of sections, identical to the precopy stream's device state,
> +      containing everything except postcopiable devices (i.e. RAM)
> +   Command: 'postcopy ram run'
> +
> +The 'package' is sent as the data part of a Command: 'CMD_PACKAGED', and the
> +contents are formatted in the same way as the main migration stream.
> +
> +Destination behaviour
> +
> +Initially the destination looks the same as precopy, with a single thread
> +reading the migration stream; the 'postcopy advise' and 'discard' commands
> +are processed to change the way RAM is managed, but don't affect the stream
> +processing.
> +
> +------------------------------------------------------------------------------
> +                        1      2   3     4 5                      6   7
> +main -----DISCARD-CMD_PACKAGED ( LISTEN  DEVICE     DEVICE DEVICE RUN )
> +thread                             |       |
> +                                   |     (page request)
> +                                   |        \___
> +                                   v            \
> +listen thread:                     --- page -- page -- page -- page -- page --
> +
> +                                   a   b        c
> +------------------------------------------------------------------------------
> +
> +On receipt of CMD_PACKAGED (1)
> +   All the data associated with the package - the ( ... ) section in the
> +diagram - is read into memory (into a QEMUSizedBuffer), and the main thread
> +recurses into qemu_loadvm_state_main to process the contents of the package (2)
> +which contains commands (3,6) and devices (4...)
> +
> +On receipt of 'postcopy ram listen' (3), i.e. the first command in the package,
> +a new thread (a) is started that takes over servicing the migration stream,
> +while the main thread carries on loading the package.  The new thread loads
> +normal background page data (b), but if a fault happens during a device load (5),
> +the returned page (c) is loaded by the listen thread, allowing the main thread's
> +device load to carry on.
> +
> +The last thing in the CMD_PACKAGED is a 'RUN' command (6) letting the destination
> +CPUs start running.
> +At the end of the CMD_PACKAGED (7) the main thread returns to normal running behaviour
> +and is no longer used by migration, while the listen thread carries
> +on servicing page data until the end of migration.
> +
> +=== Postcopy states ===
> +
> +Postcopy moves through a series of states (see postcopy_ram_state)
> +from ADVISE->LISTEN->RUNNING->END
> +
> +  Advise: Set at the start of migration if postcopy is enabled, even
> +          if it hasn't had the start command; here the destination
> +          checks that its OS has the support needed for postcopy, and performs
> +          setup to ensure the RAM mappings are suitable for later postcopy.
> +          (Triggered by reception of POSTCOPY_RAM_ADVISE command)
> +
> +  Listen: The first command in the package, POSTCOPY_RAM_LISTEN, switches
> +          the destination state to Listen, and starts a new thread
> +          (the 'listen thread') which takes over the job of receiving
> +          pages off the migration stream, while the main thread carries
> +          on processing the blob.  With this thread able to process page
> +          reception, the destination now 'sensitises' the RAM to detect
> +          any access to missing pages (on Linux using the 'userfault'
> +          system).
> +
> +  Running: POSTCOPY_RAM_RUN causes the destination to synchronise all
> +          state and start the CPUs and IO devices running.  The main
> +          thread now finishes processing the migration package and
> +          now carries on as it would for normal precopy migration
> +          (although it can't do the cleanup it would do as it
> +          finishes a normal migration).
> +
> +  End: The listen thread can now quit and perform the cleanup of migration
> +          state; the migration is now complete.
> +
> +=== Source side page maps ===
> +
> +The source side keeps two bitmaps during postcopy: the 'migration bitmap'
> +and the 'sent map'.  The 'migration bitmap' is basically the same as in
> +the precopy case, and holds a bit to indicate that a page is 'dirty' -
> +i.e. needs sending.  During the precopy phase this is updated as the CPU
> +dirties pages, however during postcopy the CPUs are stopped and nothing
> +should dirty anything any more.
> +
> +The 'sent map' is used for the transition to postcopy. It is a bitmap that
> +has a bit set whenever a page is sent to the destination, however during
> +the transition to postcopy mode it is masked against the migration bitmap
> +(sentmap &= migrationbitmap) to generate a bitmap recording pages that
> +have previously been sent but are now dirty again.  This masked
> +sentmap is sent to the destination which discards those now dirty pages
> +before starting the CPUs.
> +
> +Note that once in postcopy mode, the sent map is still updated; however,
> +its contents are not necessarily consistent with the pages already sent
> +due to the masking with the migration bitmap.
> +
> +=== Destination side page maps ===
> +
> +(Needs to be changed so we can update both easily - at the moment updates are done
> + with a lock)
> +The destination keeps a 'requested map' and a 'received map'.
> +Both maps are initially 0; as pages are received, bits are set in the 'received map'.
> +Incoming requests from the kernel cause the bit to be set in the 'requested map'.
> +When a page is received that is marked as 'requested' the kernel is notified.
> +If the kernel requests a page that has already been 'received' the kernel is notified
> +without re-requesting.
> +
> +This leads to three valid page states
> +(rc = 'received map' bit, rq = 'requested map' bit):
> +    missing (!rc,!rq)  - page not yet received or requested
> +    received (rc,!rq)  - Page received
> +    requested (!rc,rq) - page requested but not yet received
> +
> +state transitions:
> +      received -> missing   (only during setup/discard)
> +
> +      missing -> received   (normal incoming page)
> +      requested -> received (incoming page previously requested)
> +      missing -> requested  (userfault request)
> +
>

-- 
Thanks,
Yang.

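The postcopy states in the documentation patch quoted above (ADVISE->LISTEN->RUNNING->END), together with the 'wrong postcopy state' checks in the handlers from patches 45 and 46, amount to a small state machine. This is a minimal sketch with hypothetical enum and function names. LISTEN requires ADVISE and RUN requires LISTENING, as in the handlers. END is rejected only from NONE, as in loadvm_postcopy_ram_handle_end(). Two choices are assumptions: requiring ADVISE to arrive in NONE, since that handler is not shown here, and recording END as a state, which follows the document rather than the handler code.

```c
#include <assert.h>

/* Hypothetical mirror of the destination-side postcopy states. */
typedef enum { PC_NONE, PC_ADVISE, PC_LISTENING, PC_RUNNING, PC_END } pc_state;
/* Hypothetical names for the four postcopy commands. */
typedef enum { CMD_ADVISE, CMD_LISTEN, CMD_RUN, CMD_END } pc_cmd;

/* Return the state after applying cmd in state cur, or -1 if out of order. */
static int pc_apply(pc_state cur, pc_cmd cmd)
{
    assert(cur <= PC_END);
    switch (cmd) {
    case CMD_ADVISE:   /* assumption: only valid before anything else */
        return cur == PC_NONE ? (int)PC_ADVISE : -1;
    case CMD_LISTEN:   /* as in loadvm_postcopy_ram_handle_listen() */
        return cur == PC_ADVISE ? (int)PC_LISTENING : -1;
    case CMD_RUN:      /* as in loadvm_postcopy_ram_handle_run() */
        return cur == PC_LISTENING ? (int)PC_RUNNING : -1;
    case CMD_END:      /* as in loadvm_postcopy_ram_handle_end() */
        return cur != PC_NONE ? (int)PC_END : -1;
    }
    return -1;         /* unknown command */
}
```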

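The destination page maps in the documentation patch quoted above can be sketched as two plain bitmaps. There is a 'received map' and a 'requested map', giving the states missing, received and requested. Several details here are hypothetical: the fixed NPAGES size, the helper names, and the REQ_ALREADY_PENDING result for a second request of an in-flight page, which the document does not specify. The sketch is single-threaded; the document notes that the real maps are currently updated under a lock.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 64   /* hypothetical guest RAM size, in pages */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define NWORDS ((NPAGES + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Bit set in 'received' = page has arrived; in 'requested' = fault sent. */
typedef struct {
    unsigned long received[NWORDS];
    unsigned long requested[NWORDS];
} page_maps;

static bool pm_test(const unsigned long *map, size_t i)
{
    assert(i < NPAGES);
    return (map[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1UL;
}

static void pm_set(unsigned long *map, size_t i)
{
    assert(i < NPAGES);
    map[i / BITS_PER_WORD] |= 1UL << (i % BITS_PER_WORD);
}

static void pm_clear(unsigned long *map, size_t i)
{
    assert(i < NPAGES);
    map[i / BITS_PER_WORD] &= ~(1UL << (i % BITS_PER_WORD));
}

typedef enum { REQ_SENT, REQ_ALREADY_PENDING, REQ_ALREADY_RECEIVED } req_result;

/* Kernel fault on page i: missing -> requested, or just wake if received. */
static req_result page_request(page_maps *pm, size_t i)
{
    if (pm_test(pm->received, i)) {
        return REQ_ALREADY_RECEIVED;  /* notify the kernel, don't re-request */
    }
    if (pm_test(pm->requested, i)) {
        return REQ_ALREADY_PENDING;   /* assumption: no duplicate request */
    }
    pm_set(pm->requested, i);
    return REQ_SENT;                  /* caller asks the source for the page */
}

/* Page i arrives: missing/requested -> received; true if a waiter needs waking. */
static bool page_receive(page_maps *pm, size_t i)
{
    bool was_requested = pm_test(pm->requested, i);
    pm_set(pm->received, i);
    pm_clear(pm->requested, i);       /* never leave (rc, rq) both set */
    return was_requested;
}

/* received -> missing; only valid during setup/discard, before any faults. */
static void page_discard(page_maps *pm, size_t i)
{
    assert(!pm_test(pm->requested, i));
    pm_clear(pm->received, i);
}
```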
* Re: [Qemu-devel] [PATCH v3 03/47] Start documenting how postcopy works.
  2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 03/47] Start documenting how postcopy works Dr. David Alan Gilbert (git)
  2014-09-09  3:34   ` Hongyang Yang
@ 2014-09-09  3:39   ` Hongyang Yang
  2014-09-12 11:23     ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 52+ messages in thread
From: Hongyang Yang @ 2014-09-09  3:39 UTC
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, lilei



On 08/28/2014 11:03 PM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>   docs/migration.txt | 188 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 188 insertions(+)
>
> diff --git a/docs/migration.txt b/docs/migration.txt
> index 0492a45..7f0fdc4 100644
> --- a/docs/migration.txt
> +++ b/docs/migration.txt
> @@ -294,3 +294,191 @@ save/send this state when we are in the middle of a pio operation
>   (that is what ide_drive_pio_state_needed() checks).  If DRQ_STAT is
>   not enabled, the values on that fields are garbage and don't need to
>   be sent.
> +
> += Return path =
> +
> +In most migration scenarios there is only a single data path that runs
> +from the source VM to the destination, typically along a single fd (although
> +possibly with another fd or similar for some fast way of throwing pages across).
> +
> +However, some uses need two-way communication; in particular the Postcopy destination
> +needs to be able to request pages on demand from the source.
> +
> +For these scenarios there is a 'return path' from the destination to the source;
> +qemu_file_get_return_path(QEMUFile* fwdpath) gives the QEMUFile* for the return
> +path.
> +
> +  Source side
> +     Forward path - written by migration thread
> +     Return path  - opened by main thread, read by return-path thread
> +
> +  Destination side
> +     Forward path - read by main thread
> +     Return path  - opened by main thread, written by main thread AND postcopy
> +                    thread (protected by rp_mutex)
> +
> += Postcopy =
> +'Postcopy' migration is a way to deal with migrations that refuse to converge;
> +its plus side is that there is an upper bound on the amount of migration traffic
> +and the time it takes; the downside is that during the postcopy phase, a failure of
> +*either* side or the network connection causes the guest to be lost.
> +
> +In postcopy the destination CPUs are started before all the memory has been
> +transferred, and accesses to pages that are yet to be transferred cause
> +a fault that's translated by QEMU into a request to the source QEMU.
> +
> +Postcopy can be combined with precopy (i.e. normal migration) so that if precopy
> +doesn't finish in a given time the switch is automatically made to precopy.

I think you mean "automatically made to postcopy" here?

> +
> +=== Enabling postcopy ===
> +
> +To enable postcopy (prior to the start of migration):
> +
> +migrate_set_capability x-postcopy-ram on
> +
> +The migration will still start in precopy mode, however issuing:
> +
> +migrate_start_postcopy
> +
> +will now cause the transition from precopy to postcopy.
> +It can be issued immediately after migration is started or any
> +time later on.  Issuing it after the end of a migration is harmless.
> +
> +=== Postcopy device transfer ===
> +
> +Loading device data may cause the device emulation to access guest RAM,
> +which may trigger faults that have to be resolved by the source.  The
> +migration stream therefore has to be able to respond with page data *during*
> +the device load, so the device data must be read completely from the stream
> +before the device load begins, freeing the stream up.  This is achieved by
> +'packaging' the device data into a blob that's read in one go.
> +
> +Source behaviour
> +
> +Until postcopy is entered the migration stream is identical to normal postcopy,
> +except for the addition of a 'postcopy advise' command at the beginning to
> +let the destination know that postcopy might happen.  When postcopy starts

A comma here?

> +the source sends the page discard data and then forms the 'package' containing:
> +
> +   Command: 'postcopy ram listen'
> +   The device state
> +      A series of sections, identical to the precopy stream's device state,
> +      containing everything except postcopiable devices (i.e. RAM)
> +   Command: 'postcopy ram run'
> +
> +The 'package' is sent as the data part of a Command: 'CMD_PACKAGED', and the
> +contents are formatted in the same way as the main migration stream.
> +
> +Destination behaviour
> +
> +Initially the destination looks the same as precopy, with a single thread
> +reading the migration stream; the 'postcopy advise' and 'discard' commands
> +are processed to change the way RAM is managed, but don't affect the stream
> +processing.
> +
> +------------------------------------------------------------------------------
> +                        1      2   3     4 5                      6   7
> +main -----DISCARD-CMD_PACKAGED ( LISTEN  DEVICE     DEVICE DEVICE RUN )
> +thread                             |       |
> +                                   |     (page request)
> +                                   |        \___
> +                                   v            \
> +listen thread:                     --- page -- page -- page -- page -- page --
> +
> +                                   a   b        c
> +------------------------------------------------------------------------------
> +
> +On receipt of CMD_PACKAGED (1)
> +   All the data associated with the package - the ( ... ) section in the
> +diagram - is read into memory (into a QEMUSizedBuffer), and the main thread
> +recurses into qemu_loadvm_state_main to process the contents of the package (2)
> +which contains commands (3,6) and devices (4...)
> +
> +On receipt of 'postcopy ram listen' (3), i.e. the first command in the package,
> +a new thread (a) is started that takes over servicing the migration stream,
> +while the main thread carries on loading the package.  The new thread loads
> +normal background page data (b), but if a fault happens during a device load (5),
> +the returned page (c) is loaded by the listen thread, allowing the main thread's
> +device load to carry on.
> +
> +The last thing in the CMD_PACKAGED is a 'RUN' command (6) letting the destination
> +CPUs start running.
> +At the end of the CMD_PACKAGED (7) the main thread returns to normal running behaviour
> +and is no longer used by migration, while the listen thread carries
> +on servicing page data until the end of migration.
> +
> +=== Postcopy states ===
> +
> +Postcopy moves through a series of states (see postcopy_ram_state)
> +from ADVISE->LISTEN->RUNNING->END
> +
> +  Advise: Set at the start of migration if postcopy is enabled, even
> +          if it hasn't had the start command; here the destination
> +          checks that its OS has the support needed for postcopy, and performs
> +          setup to ensure the RAM mappings are suitable for later postcopy.
> +          (Triggered by reception of POSTCOPY_RAM_ADVISE command)
> +
> +  Listen: The first command in the package, POSTCOPY_RAM_LISTEN, switches
> +          the destination state to Listen, and starts a new thread
> +          (the 'listen thread') which takes over the job of receiving
> +          pages off the migration stream, while the main thread carries
> +          on processing the blob.  With this thread able to process page
> +          reception, the destination now 'sensitises' the RAM to detect
> +          any access to missing pages (on Linux using the 'userfault'
> +          system).
> +
> +  Running: POSTCOPY_RAM_RUN causes the destination to synchronise all
> +          state and start the CPUs and IO devices running.  The main
> +          thread now finishes processing the migration package and
> +          now carries on as it would for normal precopy migration
> +          (although it can't do the cleanup it would do as it
> +          finishes a normal migration).
> +
> +  End: The listen thread can now quit and perform the cleanup of migration
> +          state; the migration is now complete.
> +
> +=== Source side page maps ===
> +
> +The source side keeps two bitmaps during postcopy: the 'migration bitmap'
> +and the 'sent map'.  The 'migration bitmap' is basically the same as in
> +the precopy case, and holds a bit to indicate that a page is 'dirty' -
> +i.e. needs sending.  During the precopy phase this is updated as the CPU
> +dirties pages, however during postcopy the CPUs are stopped and nothing
> +should dirty anything any more.
> +
> +The 'sent map' is used for the transition to postcopy. It is a bitmap that
> +has a bit set whenever a page is sent to the destination, however during
> +the transition to postcopy mode it is masked against the migration bitmap
> +(sentmap &= migrationbitmap) to generate a bitmap recording pages that
> +have previously been sent but are now dirty again.  This masked
> +sentmap is sent to the destination which discards those now dirty pages
> +before starting the CPUs.
> +
> +Note that once in postcopy mode, the sent map is still updated; however,
> +its contents are not necessarily consistent with the pages already sent
> +due to the masking with the migration bitmap.
> +
> +=== Destination side page maps ===
> +
> +(Needs to be changed so we can update both easily - at the moment updates are done
> + with a lock)
> +The destination keeps a 'requested map' and a 'received map'.
> +Both maps are initially 0; as pages are received, bits are set in the 'received map'.
> +Incoming requests from the kernel cause the bit to be set in the 'requested map'.
> +When a page is received that is marked as 'requested' the kernel is notified.
> +If the kernel requests a page that has already been 'received' the kernel is notified
> +without re-requesting.
> +
> +This leads to three valid page states
> +(rc = 'received map' bit, rq = 'requested map' bit):
> +    missing (!rc,!rq)  - page not yet received or requested
> +    received (rc,!rq)  - Page received
> +    requested (!rc,rq) - page requested but not yet received
> +
> +state transitions:
> +      received -> missing   (only during setup/discard)
> +
> +      missing -> received   (normal incoming page)
> +      requested -> received (incoming page previously requested)
> +      missing -> requested  (userfault request)
> +
>

-- 
Thanks,
Yang.

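The documentation quoted above describes a 'package' mechanism: the device state is read from the stream in one go, so that the stream is free to carry page data while devices load. This can be illustrated with a length-prefixed blob. The sketch assumes a 4-byte big-endian length header purely for illustration. It does not reproduce the real CMD_PACKAGED encoding or the QEMUSizedBuffer API, and byte_stream, stream_read() and read_package() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory byte stream standing in for the migration QEMUFile. */
typedef struct {
    const uint8_t *data;
    size_t len;
    size_t pos;
} byte_stream;

static int stream_read(byte_stream *s, void *buf, size_t n)
{
    assert(s->pos <= s->len);
    if (s->len - s->pos < n) {
        return -1;                       /* short read: stream truncated */
    }
    memcpy(buf, s->data + s->pos, n);
    s->pos += n;
    return 0;
}

/*
 * Read a package framed as a 4-byte big-endian length followed by that many
 * bytes, returning it as a heap buffer that the caller parses at leisure.
 * Once this returns, the stream itself is free for the listen thread.
 */
static uint8_t *read_package(byte_stream *s, uint32_t *out_len)
{
    uint8_t hdr[4];

    if (stream_read(s, hdr, sizeof(hdr)) < 0) {
        return NULL;
    }
    uint32_t len = ((uint32_t)hdr[0] << 24) | ((uint32_t)hdr[1] << 16) |
                   ((uint32_t)hdr[2] << 8) | (uint32_t)hdr[3];
    uint8_t *buf = malloc(len ? len : 1);   /* avoid malloc(0) ambiguity */
    if (!buf) {
        return NULL;
    }
    if (stream_read(s, buf, len) < 0) {
        free(buf);
        return NULL;
    }
    *out_len = len;
    return buf;
}
```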

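The documentation describes a source-side transition step: mask the sent map with the migration bitmap (sentmap &= migrationbitmap), then send the result as the set of pages to discard. That step might look like the following. The word-wise AND is the operation the document describes. mask_sentmap(), collect_discards(), and emitting discards as an array of page indices are assumptions made for the sketch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* sentmap &= migration bitmap: keep pages that were sent but are dirty again. */
static void mask_sentmap(unsigned long *sentmap, const unsigned long *dirtymap,
                         size_t nwords)
{
    for (size_t w = 0; w < nwords; w++) {
        sentmap[w] &= dirtymap[w];
    }
}

/*
 * Collect the page indices still set in the masked sent map; these are the
 * pages the destination must discard.  Returns the total count, writing at
 * most max indices into out.
 */
static size_t collect_discards(const unsigned long *map, size_t nwords,
                               size_t *out, size_t max)
{
    size_t n = 0;

    assert(out != NULL || max == 0);
    for (size_t w = 0; w < nwords; w++) {
        for (size_t b = 0; b < BITS_PER_WORD; b++) {
            if ((map[w] >> b) & 1UL) {
                if (n < max) {
                    out[n] = w * BITS_PER_WORD + b;
                }
                n++;
            }
        }
    }
    return n;
}
```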
* Re: [Qemu-devel] [PATCH v3 03/47] Start documenting how postcopy works.
  2014-09-09  3:34   ` Hongyang Yang
@ 2014-09-09  3:46     ` Hongyang Yang
  0 siblings, 0 replies; 52+ messages in thread
From: Hongyang Yang @ 2014-09-09  3:46 UTC
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, Jiang Yunhong, Dong Eddie,
	amit.shah, Lai Jiangshan

On 09/09/2014 11:34 AM, Hongyang Yang wrote:
> Hi
>
>    I've read your documentation about Postcopy; it's interesting.
> It occurred to me that COLO might gain some improvements from
> Postcopy.
>    My first thought was whether we could use the back channel that
> requests dirty pages from the source, so that we do not need to manage
> a RAM snapshot of the source. That is, when we enter a COLO checkpoint,
> we just request from the source the pages that were dirtied on the
> destination. I'm not sure how it would affect performance, but it may
> be worth a try.

Or we could reuse your kernel-side patch that generates a page fault when
the destination accesses a page that was dirtied, and then load the page
from the RAM snapshot we manage, so that we do not need to flush the dirty
pages at checkpoint time. That may reduce the checkpoint duration.

>
> On 08/28/2014 11:03 PM, Dr. David Alan Gilbert (git) wrote:
>> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>
>> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> ---
>>   docs/migration.txt | 188 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 188 insertions(+)
>>
>> diff --git a/docs/migration.txt b/docs/migration.txt
>> index 0492a45..7f0fdc4 100644
>> --- a/docs/migration.txt
>> +++ b/docs/migration.txt
>> @@ -294,3 +294,191 @@ save/send this state when we are in the middle of a pio operation
>>   (that is what ide_drive_pio_state_needed() checks).  If DRQ_STAT is
>>   not enabled, the values on that fields are garbage and don't need to
>>   be sent.
>> +
>> += Return path =
>> +
>> +In most migration scenarios there is only a single data path that runs
>> +from the source VM to the destination, typically along a single fd (although
>> +possibly with another fd or similar for some fast way of throwing pages across).
>> +
>> +However, some uses need two way communication; in particular the Postcopy destination
>> +needs to be able to request pages on demand from the source.
>> +
>> +For these scenarios there is a 'return path' from the destination to the source;
>> +qemu_file_get_return_path(QEMUFile* fwdpath) gives the QEMUFile* for the return
>> +path.
>> +
>> +  Source side
>> +     Forward path - written by migration thread
>> +     Return path  - opened by main thread, read by return-path thread
>> +
>> +  Destination side
>> +     Forward path - read by main thread
>> +     Return path  - opened by main thread, written by main thread AND postcopy
>> +                    thread (protected by rp_mutex)
>> +
>> += Postcopy =
>> +'Postcopy' migration is a way to deal with migrations that refuse to converge;
>> +its plus side is that there is an upper bound on the amount of migration traffic
>> +and time it takes, the down side is that during the postcopy phase, a failure of
>> +*either* side or the network connection causes the guest to be lost.
>> +
>> +In postcopy the destination CPUs are started before all the memory has been
>> +transferred, and accesses to pages that are yet to be transferred cause
>> +a fault that's translated by QEMU into a request to the source QEMU.
>> +
>> +Postcopy can be combined with precopy (i.e. normal migration) so that if precopy
>> +doesn't finish in a given time the switch is automatically made to precopy.
>> +
>> +=== Enabling postcopy ===
>> +
>> +To enable postcopy (prior to the start of migration):
>> +
>> +migrate_set_capability x-postcopy-ram on
>> +
>> +The migration will still start in precopy mode, however issuing:
>> +
>> +migrate_start_postcopy
>> +
>> +will now cause the transition from precopy to postcopy.
>> +It can be issued immediately after migration is started or any
>> +time later on.  Issuing it after the end of a migration is harmless.
>> +
>> +=== Postcopy device transfer ===
>> +
>> +Loading of device data may cause the device emulation to access guest RAM
>> +that may trigger faults that have to be resolved by the source, as such
>> +the migration stream has to be able to respond with page data *during* the
>> +device load, and hence the device data has to be read from the stream completely
>> +before the device load begins to free the stream up.  This is achieved by
>> +'packaging' the device data into a blob that's read in one go.
>> +
>> +Source behaviour
>> +
>> +Until postcopy is entered the migration stream is identical to normal postcopy,
>> +except for the addition of a 'postcopy advise' command at the beginning to
>> +let the destination know that postcopy might happen.  When postcopy starts
>> +the source sends the page discard data and then forms the 'package' containing:
>> +
>> +   Command: 'postcopy ram listen'
>> +   The device state
>> +      A series of sections, identical to the precopy streams device state stream
>> +      containing everything except postcopiable devices (i.e. RAM)
>> +   Command: 'postcopy ram run'
>> +
>> +The 'package' is sent as the data part of a Command: 'CMD_PACKAGED', and the
>> +contents are formatted in the same way as the main migration stream.
>> +
>> +Destination behaviour
>> +
>> +Initially the destination looks the same as precopy, with a single thread
>> +reading the migration stream; the 'postcopy advise' and 'discard' commands
>> +are processed to change the way RAM is managed, but don't affect the stream
>> +processing.
>> +
>> +------------------------------------------------------------------------------
>> +                        1      2   3     4 5                      6   7
>> +main -----DISCARD-CMD_PACKAGED ( LISTEN  DEVICE     DEVICE DEVICE RUN )
>> +thread                             |       |
>> +                                   |     (page request)
>> +                                   |        \___
>> +                                   v            \
>> +listen thread:                     --- page -- page -- page -- page -- page --
>> +
>> +                                   a   b        c
>> +------------------------------------------------------------------------------
>> +
>> +On receipt of CMD_PACKAGED (1)
>> +   All the data associated with the package - the ( ... ) section in the
>> +diagram - is read into memory (into a QEMUSizedBuffer), and the main thread
>> +recurses into qemu_loadvm_state_main to process the contents of the package (2)
>> +which contains commands (3,6) and devices (4...)
>> +
>> +On receipt of 'postcopy ram listen' - 3 -(i.e. the 1st command in the package)
>> +a new thread (a) is started that takes over servicing the migration stream,
>> +while the main thread carries on loading the package.   It loads normal
>> +background page data (b) but if during a device load a fault happens (5) the
>> +returned page (c) is loaded by the listen thread allowing the main threads
>> +device load to carry on.
>> +
>> +The last thing in the CMD_PACKAGED is a 'RUN' command (6) letting the destination
>> +CPUs start running.
>> +At the end of the CMD_PACKAGED (7) the main thread returns to normal running behaviour
>> +and is no longer used by migration, while the listen thread carries
>> +on servicing page data until the end of migration.
>> +
>> +=== Postcopy states ===
>> +
>> +Postcopy moves through a series of states (see postcopy_ram_state)
>> +from ADVISE->LISTEN->RUNNING->END
>> +
>> +  Advise: Set at the start of migration if postcopy is enabled, even
>> +          if it hasn't had the start command; here the destination
>> +          checks that its OS has the support needed for postcopy, and performs
>> +          setup to ensure the RAM mappings are suitable for later postcopy.
>> +          (Triggered by reception of POSTCOPY_RAM_ADVISE command)
>> +
>> +  Listen: The first command in the package, POSTCOPY_RAM_LISTEN, switches
>> +          the destination state to Listen, and starts a new thread
>> +          (the 'listen thread') which takes over the job of receiving
>> +          pages off the migration stream, while the main thread carries
>> +          on processing the blob.  With this thread able to process page
>> +          reception, the destination now 'sensitises' the RAM to detect
>> +          any access to missing pages (on Linux using the 'userfault'
>> +          system).
>> +
>> +  Running: POSTCOPY_RAM_RUN causes the destination to synchronise all
>> +          state and start the CPUs and IO devices running.  The main
>> +          thread now finishes processing the migration package and
>> +          now carries on as it would for normal precopy migration
>> +          (although it can't do the cleanup it would do as it
>> +          finishes a normal migration).
>> +
>> +  End: The listen thread can now quit, and perform the cleanup of migration
>> +          state, the migration is now complete.
>> +
>> +=== Source side page maps ===
>> +
>> +The source side keeps two bitmaps during postcopy; 'the migration bitmap'
>> +and 'sent map'.  The 'migration bitmap' is basically the same as in
>> +the precopy case, and holds a bit to indicate that page is 'dirty' -
>> +i.e. needs sending.  During the precopy phase this is updated as the CPU
>> +dirties pages, however during postcopy the CPUs are stopped and nothing
>> +should dirty anything any more.
>> +
>> +The 'sent map' is used for the transition to postcopy. It is a bitmap that
>> +has a bit set whenever a page is sent to the destination, however during
>> +the transition to postcopy mode it is masked against the migration bitmap
>> +(sentmap &= migrationbitmap) to generate a bitmap recording pages that
>> +have been previously been sent but are now dirty again.  This masked
>> +sentmap is sent to the destination which discards those now dirty pages
>> +before starting the CPUs.
>> +
>> +Note that once in postcopy mode, the sent map is still updated; however,
>> +its contents are not necessarily consistent with the pages already sent
>> +due to the masking with the migration bitmap.
>> +
>> +=== Destination side page maps ===
>> +
>> +(Needs to be changed so we can update both easily - at the moment updates are done
>> + with a lock)
>> +The destination keeps a 'requested map' and a 'received map'.
>> +Both maps are initially 0, as pages are received the bits are set in 'received map'.
>> +Incoming requests from the kernel cause the bit to be set in the 'requested map'.
>> +When a page is received that is marked as 'requested' the kernel is notified.
>> +If the kernel requests a page that has already been 'received' the kernel is notified
>> +without re-requesting.
>> +
>> +This leads to three valid page states:
>> +page states:
>> +    missing (!rc,!rq)  - page not yet received or requested
>> +    received (rc,!rq)  - Page received
>> +    requested (!rc,rq) - page requested but not yet received
>> +
>> +state transitions:
>> +      received -> missing   (only during setup/discard)
>> +
>> +      missing -> received   (normal incoming page)
>> +      requested -> received (incoming page previously requested)
>> +      missing -> requested  (userfault request)
>> +
>>
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 52+ messages in thread
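
On the destination, the quoted text says the return path is written by both the main thread and the postcopy thread, protected by rp_mutex. The sketch below shows why that lock has to cover a whole message. Only the name `rp_mutex` comes from the document. The in-memory `return_path` type stands in for the real QEMUFile/socket and is hypothetical. The framing (a 16-bit big-endian type and length, then the payload) is likewise an assumption made for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the destination's return path; a real one would
 * presumably write to the QEMUFile from qemu_file_get_return_path(). */
typedef struct {
    pthread_mutex_t rp_mutex;   /* serialises the main and postcopy threads */
    uint8_t buf[4096];
    size_t used;
} return_path;

static void rp_init(return_path *rp)
{
    pthread_mutex_init(&rp->rp_mutex, NULL);
    rp->used = 0;
}

/* Queue one message: 16-bit type, 16-bit length (both big-endian), payload.
 * Header and payload are written under a single lock, so a message from one
 * thread can never be split by a message from the other.
 * Returns 0 on success, -1 if the buffer is full. */
static int rp_send(return_path *rp, uint16_t type, const void *data, uint16_t len)
{
    int ret = -1;

    pthread_mutex_lock(&rp->rp_mutex);
    if (rp->used + 4u + len <= sizeof(rp->buf)) {
        uint8_t *p = rp->buf + rp->used;

        p[0] = (uint8_t)(type >> 8);
        p[1] = (uint8_t)(type & 0xff);
        p[2] = (uint8_t)(len >> 8);
        p[3] = (uint8_t)(len & 0xff);
        if (len) {
            memcpy(p + 4, data, len);
        }
        rp->used += 4u + len;
        ret = 0;
    }
    pthread_mutex_unlock(&rp->rp_mutex);
    return ret;
}
```

Taking the lock separately for the header and for the payload would let one thread's message be interleaved with the other's on the wire.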

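The source-side masking step quoted above (sentmap &= migrationbitmap) can be sketched over plain `unsigned long` bitmaps as follows. The helper names `sentmap_mask()` and `sentmap_discard_list()` are hypothetical. The discard list is returned as bare page indexes for simplicity. The encoding the patch series actually uses to send discard data to the destination is not shown.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Mask the sent map with the migration (dirty) bitmap in place. Afterwards
 * a set bit means "sent earlier, but dirtied since", i.e. the destination
 * must discard its stale copy before the CPUs start. */
static void sentmap_mask(unsigned long *sentmap,
                         const unsigned long *migration_bitmap, size_t nlongs)
{
    for (size_t i = 0; i < nlongs; i++) {
        sentmap[i] &= migration_bitmap[i];
    }
}

/* Collect the indexes of set bits (pages to discard) into 'out';
 * returns how many were written, at most 'max_out'. */
static size_t sentmap_discard_list(const unsigned long *sentmap, size_t npages,
                                   size_t *out, size_t max_out)
{
    size_t n = 0;

    for (size_t page = 0; page < npages && n < max_out; page++) {
        if (sentmap[page / BITS_PER_LONG] & (1UL << (page % BITS_PER_LONG))) {
            out[n++] = page;
        }
    }
    return n;
}
```

For example, suppose pages 0, 2, 3 and 5 were sent, and pages 1, 2, 4 and 5 are dirty at the switch. Only pages 2 and 5 need discarding. Pages 1 and 4 were never sent, so there is nothing stale to drop, and pages 0 and 3 are still clean.
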
* Re: [Qemu-devel] [PATCH v3 03/47] Start documenting how postcopy works.
  2014-09-09  3:39   ` Hongyang Yang
@ 2014-09-12 11:23     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 52+ messages in thread
From: Dr. David Alan Gilbert @ 2014-09-12 11:23 UTC (permalink / raw)
  To: Hongyang Yang; +Cc: aarcange, yamahata, lilei, quintela, qemu-devel, amit.shah

* Hongyang Yang (yanghy@cn.fujitsu.com) wrote:
> 
> 
> On 08/28/2014 11:03 PM, Dr. David Alan Gilbert (git) wrote:
> >From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> >Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >---


> >+Postcopy can be combined with precopy (i.e. normal migration) so that if precopy
> >+doesn't finish in a given time the switch is automatically made to precopy.
> 
> I think you mean "automatically made to postcopy" here?

Thanks!

> >+Source behaviour
> >+
> >+Until postcopy is entered the migration stream is identical to normal postcopy,
> >+except for the addition of a 'postcopy advise' command at the beginning to
> >+let the destination know that postcopy might happen.  When postcopy starts
> 
> A comma here?

Yes, thanks.

Dave

> 
> >+the source sends the page discard data and then forms the 'package' containing:
> >+
> >+   Command: 'postcopy ram listen'
> >+   The device state
> >+      A series of sections, identical to the precopy streams device state stream
> >+      containing everything except postcopiable devices (i.e. RAM)
> >+   Command: 'postcopy ram run'
> >+
> >+The 'package' is sent as the data part of a Command: 'CMD_PACKAGED', and the
> >+contents are formatted in the same way as the main migration stream.
> >+
> >+Destination behaviour
> >+
> >+Initially the destination looks the same as precopy, with a single thread
> >+reading the migration stream; the 'postcopy advise' and 'discard' commands
> >+are processed to change the way RAM is managed, but don't affect the stream
> >+processing.
> >+
> >+------------------------------------------------------------------------------
> >+                        1      2   3     4 5                      6   7
> >+main -----DISCARD-CMD_PACKAGED ( LISTEN  DEVICE     DEVICE DEVICE RUN )
> >+thread                             |       |
> >+                                   |     (page request)
> >+                                   |        \___
> >+                                   v            \
> >+listen thread:                     --- page -- page -- page -- page -- page --
> >+
> >+                                   a   b        c
> >+------------------------------------------------------------------------------
> >+
> >+On receipt of CMD_PACKAGED (1)
> >+   All the data associated with the package - the ( ... ) section in the
> >+diagram - is read into memory (into a QEMUSizedBuffer), and the main thread
> >+recurses into qemu_loadvm_state_main to process the contents of the package (2)
> >+which contains commands (3,6) and devices (4...)
> >+
> >+On receipt of 'postcopy ram listen' - 3 -(i.e. the 1st command in the package)
> >+a new thread (a) is started that takes over servicing the migration stream,
> >+while the main thread carries on loading the package.   It loads normal
> >+background page data (b) but if during a device load a fault happens (5) the
> >+returned page (c) is loaded by the listen thread allowing the main threads
> >+device load to carry on.
> >+
> >+The last thing in the CMD_PACKAGED is a 'RUN' command (6) letting the destination
> >+CPUs start running.
> >+At the end of the CMD_PACKAGED (7) the main thread returns to normal running behaviour
> >+and is no longer used by migration, while the listen thread carries
> >+on servicing page data until the end of migration.
> >+
> >+=== Postcopy states ===
> >+
> >+Postcopy moves through a series of states (see postcopy_ram_state)
> >+from ADVISE->LISTEN->RUNNING->END
> >+
> >+  Advise: Set at the start of migration if postcopy is enabled, even
> >+          if it hasn't had the start command; here the destination
> >+          checks that its OS has the support needed for postcopy, and performs
> >+          setup to ensure the RAM mappings are suitable for later postcopy.
> >+          (Triggered by reception of POSTCOPY_RAM_ADVISE command)
> >+
> >+  Listen: The first command in the package, POSTCOPY_RAM_LISTEN, switches
> >+          the destination state to Listen, and starts a new thread
> >+          (the 'listen thread') which takes over the job of receiving
> >+          pages off the migration stream, while the main thread carries
> >+          on processing the blob.  With this thread able to process page
> >+          reception, the destination now 'sensitises' the RAM to detect
> >+          any access to missing pages (on Linux using the 'userfault'
> >+          system).
> >+
> >+  Running: POSTCOPY_RAM_RUN causes the destination to synchronise all
> >+          state and start the CPUs and IO devices running.  The main
> >+          thread now finishes processing the migration package and
> >+          now carries on as it would for normal precopy migration
> >+          (although it can't do the cleanup it would do as it
> >+          finishes a normal migration).
> >+
> >+  End: The listen thread can now quit, and perform the cleanup of migration
> >+          state, the migration is now complete.
> >+
> >+=== Source side page maps ===
> >+
> >+The source side keeps two bitmaps during postcopy; 'the migration bitmap'
> >+and 'sent map'.  The 'migration bitmap' is basically the same as in
> >+the precopy case, and holds a bit to indicate that page is 'dirty' -
> >+i.e. needs sending.  During the precopy phase this is updated as the CPU
> >+dirties pages, however during postcopy the CPUs are stopped and nothing
> >+should dirty anything any more.
> >+
> >+The 'sent map' is used for the transition to postcopy. It is a bitmap that
> >+has a bit set whenever a page is sent to the destination, however during
> >+the transition to postcopy mode it is masked against the migration bitmap
> >+(sentmap &= migrationbitmap) to generate a bitmap recording pages that
> >+have been previously been sent but are now dirty again.  This masked
> >+sentmap is sent to the destination which discards those now dirty pages
> >+before starting the CPUs.
> >+
> >+Note that once in postcopy mode, the sent map is still updated; however,
> >+its contents are not necessarily consistent with the pages already sent
> >+due to the masking with the migration bitmap.
> >+
> >+=== Destination side page maps ===
> >+
> >+(Needs to be changed so we can update both easily - at the moment updates are done
> >+ with a lock)
> >+The destination keeps a 'requested map' and a 'received map'.
> >+Both maps are initially 0, as pages are received the bits are set in 'received map'.
> >+Incoming requests from the kernel cause the bit to be set in the 'requested map'.
> >+When a page is received that is marked as 'requested' the kernel is notified.
> >+If the kernel requests a page that has already been 'received' the kernel is notified
> >+without re-requesting.
> >+
> >+This leads to three valid page states:
> >+page states:
> >+    missing (!rc,!rq)  - page not yet received or requested
> >+    received (rc,!rq)  - Page received
> >+    requested (!rc,rq) - page requested but not yet received
> >+
> >+state transitions:
> >+      received -> missing   (only during setup/discard)
> >+
> >+      missing -> received   (normal incoming page)
> >+      requested -> received (incoming page previously requested)
> >+      missing -> requested  (userfault request)
> >+
> >
> 
> -- 
> Thanks,
> Yang.
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 52+ messages in thread
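
The ADVISE->LISTEN->RUNNING->END progression quoted above can be sketched as a small forward-only state machine. The enum is illustrative and is not the patch's postcopy_ram_state. The sketch makes two assumptions: an extra NONE state (postcopy never advised), and a rule that each command may advance the state by exactly one step.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative states; names are hypothetical, not postcopy_ram_state's. */
typedef enum {
    PC_STATE_NONE,     /* postcopy not advised (plain precopy) */
    PC_STATE_ADVISE,   /* POSTCOPY_RAM_ADVISE: OS support checked, RAM prepared */
    PC_STATE_LISTEN,   /* POSTCOPY_RAM_LISTEN: listen thread owns the stream */
    PC_STATE_RUNNING,  /* POSTCOPY_RAM_RUN: CPUs and devices started */
    PC_STATE_END,      /* listen thread finished and cleaned up */
} pc_state;

/* Move to 'next' only if it is the immediate successor of the current
 * state; anything else (e.g. RUN before LISTEN) is rejected unchanged. */
static bool pc_advance(pc_state *cur, pc_state next)
{
    if ((int)next != (int)*cur + 1) {
        return false;
    }
    *cur = next;
    return true;
}
```

Rejecting out-of-order commands keeps the destination from, for instance, starting the CPUs before the listen thread is available to service page faults.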

end of thread, other threads:[~2014-09-12 11:23 UTC | newest]

Thread overview: 52+ messages
2014-08-28 15:03 [Qemu-devel] [PATCH v3 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 01/47] QEMUSizedBuffer/QEMUFile Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 02/47] Tests: QEMUSizedBuffer/QEMUBuffer Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 03/47] Start documenting how postcopy works Dr. David Alan Gilbert (git)
2014-09-09  3:34   ` Hongyang Yang
2014-09-09  3:46     ` Hongyang Yang
2014-09-09  3:39   ` Hongyang Yang
2014-09-12 11:23     ` Dr. David Alan Gilbert
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 04/47] qemu_ram_foreach_block: pass up error value, and down the ramblock name Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 05/47] improve DPRINTF macros, add to savevm Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 06/47] Add qemu_get_counted_string to read a string prefixed by a count byte Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 07/47] Create MigrationIncomingState Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 08/47] socket shutdown Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 09/47] Return path: Open a return path on QEMUFile for sockets Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 10/47] Return path: socket_writev_buffer: Block even on non-blocking fd's Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 11/47] Migration commands Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 12/47] Return path: Control commands Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 13/47] Return path: Send responses from destination to source Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 14/47] Return path: Source handling of return path Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 15/47] qemu_loadvm errors and debug Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 16/47] ram_debug_dump_bitmap: Dump a migration bitmap as text Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 17/47] Rework loadvm path for subloops Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 18/47] Add migration-capability boolean for postcopy-ram Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 19/47] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 20/47] QEMU_VM_CMD_PACKAGED: Send a packaged chunk of migration stream Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 21/47] migrate_init: Call from savevm Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 22/47] Allow savevm handlers to state whether they could go into postcopy Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 23/47] postcopy: OS support test Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 24/47] migrate_start_postcopy: Command to trigger transition to postcopy Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 25/47] MIG_STATE_POSTCOPY_ACTIVE: Add new migration state Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 26/47] qemu_savevm_state_complete: Postcopy changes Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 27/47] Postcopy: Maintain sentmap during postcopy pre phase Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 28/47] Postcopy page-map-incoming (PMI) structure Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 29/47] postcopy: Add incoming_init/cleanup functions Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 30/47] postcopy: Incoming initialisation Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 31/47] postcopy: ram_enable_notify to switch on userfault Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 32/47] Postcopy: postcopy_start Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 33/47] Postcopy: Rework migration thread for postcopy mode Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 34/47] mig fd_connect: open return path Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 35/47] Postcopy: Create a fault handler thread before marking the ram as userfault Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 36/47] Page request: Add MIG_RPCOMM_REQPAGES reverse command Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 37/47] Page request: Process incoming page request Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 38/47] Page request: Consume pages off the post-copy queue Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 39/47] Add assertion to check migration_dirty_pages Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 40/47] postcopy_ram.c: place_page and helpers Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 41/47] Postcopy: Use helpers to map pages during migration Dr. David Alan Gilbert (git)
2014-08-28 15:03 ` [Qemu-devel] [PATCH v3 42/47] qemu_ram_block_from_host Dr. David Alan Gilbert (git)
2014-08-28 15:04 ` [Qemu-devel] [PATCH v3 43/47] Don't sync dirty bitmaps in postcopy Dr. David Alan Gilbert (git)
2014-08-28 15:04 ` [Qemu-devel] [PATCH v3 44/47] Postcopy; Handle userfault requests Dr. David Alan Gilbert (git)
2014-08-28 15:04 ` [Qemu-devel] [PATCH v3 45/47] Start up a postcopy/listener thread ready for incoming page data Dr. David Alan Gilbert (git)
2014-08-28 15:04 ` [Qemu-devel] [PATCH v3 46/47] postcopy: Wire up loadvm_postcopy_ram_handle_{run, end} commands Dr. David Alan Gilbert (git)
2014-08-28 15:04 ` [Qemu-devel] [PATCH v3 47/47] End of migration for postcopy Dr. David Alan Gilbert (git)
