* [Qemu-devel] [PATCH v4 00/47] Postcopy implementation
@ 2014-10-03 17:47 Dr. David Alan Gilbert (git)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 01/47] QEMUSizedBuffer based QEMUFile Dr. David Alan Gilbert (git)
                   ` (48 more replies)
  0 siblings, 49 replies; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Hi,
  This is the 4th cut of my version of postcopy; it is designed for use with
the Linux kernel additions just posted by Andrea Arcangeli here:

http://marc.info/?l=linux-kernel&m=141235633015100&w=2

(Note: the kernel side is a new version compared to the one used with my
previous postcopy patchset; you'll need to update the kernel to the new version.)

Other than the new kernel ABI (which is only a small change to the userspace side),
the major changes are:

  a) Code for host page size != target page size
  b) Support for migration over fd
     From Cristian Klein; this is needed for the libvirt support that Cristian
     recently posted to the libvirt list.
  c) It's now build-bisectable and builds on 32-bit
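A rough illustration of why item a) needs code of its own. It assumes, as with the userfault-based design, that the destination has to place incoming pages a whole host page at a time; a fault on one target page then has to be widened to every target page in the enclosing host page. This is a minimal sketch, assuming power-of-two page sizes. The names `host_page_start` and `hp_range_for_target_page`, and the 4 KiB/64 KiB sizes, are hypothetical and not taken from the patches:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizes: 4 KiB target pages inside 64 KiB host pages. */
#define TP_SIZE 4096u
#define HP_SIZE 65536u

/* Offset of the start of the host page containing byte offset 'off'. */
static uint64_t host_page_start(uint64_t off)
{
    /* Masking is only valid for power-of-two page sizes. */
    assert((HP_SIZE & (HP_SIZE - 1)) == 0);
    return off & ~((uint64_t)HP_SIZE - 1);
}

/*
 * Widen a fault on target page 'tp_index' to the whole enclosing host
 * page: *first_tp receives the first target page of that host page and
 * *n_tp the number of target pages the host page holds.
 */
static void hp_range_for_target_page(uint64_t tp_index,
                                     uint64_t *first_tp, uint64_t *n_tp)
{
    uint64_t start = host_page_start(tp_index * TP_SIZE);

    *first_tp = start / TP_SIZE;
    *n_tp = HP_SIZE / TP_SIZE;
}
```

With these sizes, a fault on target page 37 becomes a request for target pages 32 to 47. When the two sizes are equal, the range collapses to just the faulting page.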

Testing-wise, I've now done many thousands of postcopy migrations without
failure (of both idle and busy guests), so it seems pretty solid.

Must-TODOs:
  1) A partially repeatable migration_cancel failure
  2) virt_test's migrate.with_reboot test is failing
  3) The ACPI fix in 2.1 that allowed migrating RAMBlocks to be larger than
    the source feels like it needs looking at for postcopy.
  4) Paolo's comments with respect to the wakeup_request/is_running code
     in the migration thread
  5) xbzrle needs disabling once in postcopy

Later-TODOs:
  1) Control the rate of background page transfers during postcopy to
     reduce their impact on the latency of postcopy requests.
  2) Work with RDMA
  3) Could the destination RP be made blocking? (As per discussion with Paolo;
     I'm still worried that it changes too many assumptions.)



V4:
  Initial support for host page size != target page size
    - tested heavily on hps==tps
    - only partially tested on hps!=tps systems
    - This involved quite a bit of rework around the discard code
  Updated to new kernel userfault ABI
    - It won't work with the previous version
  Fix mis-optimisation that sent a postcopy request for the wrong RAMBlock:
     request for block A offset n
     un-needed fault for block B/m (already received - no req sent)
     request for block B/l  - wrongly sent as request for A/l
  Fix thinko in discard bitmap processing (missed last word of bitmap)
     Symptom: remap failures near the top of RAM if postcopy started late
  Fix bug that caused kernel page acknowledgments to be misaligned
     May have meant the guest was paused for longer than required
  Fix a potential crash when cleaning up a failed RP
  Fixes in docs (from Yang)
  Treat migration fds as sockets when they are sockets
  Build tested on 32bit
  Fully build bisectable (x86-64)
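The RAMBlock mis-optimisation fix in the list above can be modelled as follows. This is a hypothetical sketch, not the patch code. It assumes that a page request omits the RAMBlock name when it targets the same block as the previous request that was actually *sent*, so the source reuses the last name it received. The bug pattern is updating the remembered block on every fault, including faults for already-received pages that send nothing. The sketch only updates it when a request goes out. `PageRequest`, `RequestState`, `handle_fault` and `source_resolve` are all made-up names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical request: block == NULL means "same block as last time". */
typedef struct {
    const char *block;   /* RAMBlock name, or NULL if elided */
    size_t offset;
} PageRequest;

typedef struct {
    const char *last_sent_block; /* block named in the last request sent */
} RequestState;

/*
 * Destination side: handle a fault on (block, offset).  If the page was
 * already received, no request is sent and last_sent_block is left alone.
 * Returns true and fills *req when a request should be sent.
 */
static bool handle_fault(RequestState *st, const char *block, size_t offset,
                         bool already_received, PageRequest *req)
{
    if (already_received) {
        return false;            /* nothing sent, so don't touch state */
    }
    req->offset = offset;
    if (st->last_sent_block && strcmp(st->last_sent_block, block) == 0) {
        req->block = NULL;       /* elide the name; source reuses its last */
    } else {
        req->block = block;
        st->last_sent_block = block;
    }
    return true;
}

/* Source side: resolve a possibly elided block name. */
static const char *source_resolve(const char **src_last,
                                  const PageRequest *req)
{
    if (req->block) {
        *src_last = req->block;
    }
    assert(*src_last != NULL);   /* first request must name its block */
    return *src_last;
}
```

Replaying the sequence from the changelog with this sketch (A/n sent, B/m already received, B/l sent) makes the third request name block B explicitly, so the source resolves it to B rather than A.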


Dave

Cristian Klein (1):
  Handle bi-directional communication for fd migration

Dr. David Alan Gilbert (46):
  QEMUSizedBuffer based QEMUFile
  Tests: QEMUSizedBuffer/QEMUBuffer
  Start documenting how postcopy works.
  qemu_ram_foreach_block: pass up error value, and down the ramblock
    name
  improve DPRINTF macros, add to savevm
  Add qemu_get_counted_string to read a string prefixed by a count byte
  Create MigrationIncomingState
  socket shutdown
  Provide runtime Target page information
  Return path: Open a return path on QEMUFile for sockets
  Return path: socket_writev_buffer: Block even on non-blocking fd's
  Migration commands
  Return path: Control commands
  Return path: Send responses from destination to source
  Return path: Source handling of return path
  qemu_loadvm errors and debug
  ram_debug_dump_bitmap: Dump a migration bitmap as text
  Rework loadvm path for subloops
  Add migration-capability boolean for postcopy-ram.
  Add wrappers and handlers for sending/receiving the postcopy-ram
    migration messages.
  QEMU_VM_CMD_PACKAGED: Send a packaged chunk of migration stream
  migrate_init: Call from savevm
  Allow savevm handlers to state whether they could go into postcopy
  postcopy: OS support test
  migrate_start_postcopy: Command to trigger transition to postcopy
  MIG_STATE_POSTCOPY_ACTIVE: Add new migration state
  qemu_savevm_state_complete: Postcopy changes
  Postcopy page-map-incoming (PMI) structure
  Postcopy: Maintain sentmap and calculate discard
  postcopy: Incoming initialisation
  postcopy: ram_enable_notify to switch on userfault
  Postcopy: Postcopy startup in migration thread
  Postcopy: Create a fault handler thread before marking the ram as
    userfault
  Page request:  Add MIG_RPCOMM_REQPAGES reverse command
  Page request: Process incoming page request
  Page request: Consume pages off the post-copy queue
  Add assertion to check migration_dirty_pages
  postcopy_ram.c: place_page and helpers
  Postcopy: Use helpers to map pages during migration
  qemu_ram_block_from_host
  Don't sync dirty bitmaps in postcopy
  Host page!=target page: Cleanup bitmaps
  Postcopy; Handle userfault requests
  Start up a postcopy/listener thread ready for incoming page data
  postcopy: Wire up loadvm_postcopy_ram_handle_{run,end} commands
  End of migration for postcopy

 Makefile.objs                    |    2 +-
 arch_init.c                      |  739 +++++++++++++++++++++++++--
 docs/migration.txt               |  189 +++++++
 exec.c                           |   76 ++-
 hmp-commands.hx                  |   15 +
 hmp.c                            |    7 +
 hmp.h                            |    1 +
 include/exec/cpu-common.h        |    8 +-
 include/migration/migration.h    |  130 +++++
 include/migration/postcopy-ram.h |  106 ++++
 include/migration/qemu-file.h    |   47 ++
 include/migration/vmstate.h      |    2 +-
 include/qemu/sockets.h           |    1 +
 include/qemu/typedefs.h          |    9 +-
 include/sysemu/sysemu.h          |   43 +-
 migration-fd.c                   |   24 +-
 migration-rdma.c                 |    4 +-
 migration.c                      |  693 +++++++++++++++++++++++++-
 postcopy-ram.c                   | 1016 ++++++++++++++++++++++++++++++++++++++
 qapi-schema.json                 |   14 +-
 qemu-file.c                      |  598 +++++++++++++++++++++-
 qmp-commands.hx                  |   19 +
 savevm.c                         |  881 +++++++++++++++++++++++++++++++--
 tests/Makefile                   |    2 +-
 tests/test-vmstate.c             |   74 +--
 util/qemu-sockets.c              |   28 ++
 26 files changed, 4550 insertions(+), 178 deletions(-)
 create mode 100644 include/migration/postcopy-ram.h
 create mode 100644 postcopy-ram.c

-- 
1.9.3


* [Qemu-devel] [PATCH v4 01/47] QEMUSizedBuffer based QEMUFile
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-10-08  2:10   ` zhanghailiang
  2014-11-03  0:53   ` David Gibson
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 02/47] Tests: QEMUSizedBuffer/QEMUBuffer Dr. David Alan Gilbert (git)
                   ` (47 subsequent siblings)
  48 siblings, 2 replies; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

* Please comment on separate thread for this QEMUSizedBuffer patch *

This is based on Stefan and Joel's patch that creates a QEMUFile that goes
to a memory buffer; from:

http://lists.gnu.org/archive/html/qemu-devel/2013-03/msg05036.html

Using the QEMUFile interface, this patch adds support functions for
operating on in-memory sized buffers that can be written to or read from.
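For illustration only: the real code is `qsb_get_iovec()` in the diff below. Locating a byte in such a scatter-gather buffer means walking the chunk lengths until you reach the chunk that contains the position. Chunks may differ in size. The standalone sketch below works over a plain POSIX `struct iovec` array. The name `sg_locate` is hypothetical, and unlike the patch, the sketch does not check the position against a "used" byte count:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Find which iovec holds byte 'pos' of a scatter-gather buffer.
 * Returns the iovec index and stores the offset within that iovec in
 * *off, or returns -1 if 'pos' lies beyond the last chunk.
 */
static ssize_t sg_locate(const struct iovec *iov, size_t n_iov,
                         size_t pos, size_t *off)
{
    size_t start = 0;   /* byte position at which chunk i begins */

    assert(off != NULL);
    for (size_t i = 0; i < n_iov; i++) {
        if (pos < start + iov[i].iov_len) {
            *off = pos - start;
            return (ssize_t)i;
        }
        start += iov[i].iov_len;
    }
    return -1;
}
```

With chunks of 1024, 1024 and 16384 bytes, position 1500 maps to chunk 1 at offset 476.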

Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Signed-off-by: Joel Schopp <jschopp@linux.vnet.ibm.com>

For fixes/tweaks I've done:
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Eric Blake <eblake@redhat.com>
---
 include/migration/qemu-file.h |  28 +++
 include/qemu/typedefs.h       |   1 +
 qemu-file.c                   | 456 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 485 insertions(+)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index c90f529..6ef8ebc 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -25,6 +25,8 @@
 #define QEMU_FILE_H 1
 #include "exec/cpu-common.h"
 
+#include <stdint.h>
+
 /* This function writes a chunk of data to a file at the given position.
  * The pos argument can be ignored if the file is only being used for
  * streaming.  The handler should try to write all of the data it can.
@@ -94,11 +96,21 @@ typedef struct QEMUFileOps {
     QEMURamSaveFunc *save_page;
 } QEMUFileOps;
 
+struct QEMUSizedBuffer {
+    struct iovec *iov;
+    size_t n_iov;
+    size_t size; /* total allocated size in all iov's */
+    size_t used; /* number of used bytes */
+};
+
+typedef struct QEMUSizedBuffer QEMUSizedBuffer;
+
 QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops);
 QEMUFile *qemu_fopen(const char *filename, const char *mode);
 QEMUFile *qemu_fdopen(int fd, const char *mode);
 QEMUFile *qemu_fopen_socket(int fd, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
+QEMUFile *qemu_bufopen(const char *mode, QEMUSizedBuffer *input);
 int qemu_get_fd(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
 int64_t qemu_ftell(QEMUFile *f);
@@ -111,6 +123,22 @@ void qemu_put_byte(QEMUFile *f, int v);
 void qemu_put_buffer_async(QEMUFile *f, const uint8_t *buf, int size);
 bool qemu_file_mode_is_not_valid(const char *mode);
 
+QEMUSizedBuffer *qsb_create(const uint8_t *buffer, size_t len);
+QEMUSizedBuffer *qsb_clone(const QEMUSizedBuffer *);
+void qsb_free(QEMUSizedBuffer *);
+size_t qsb_set_length(QEMUSizedBuffer *qsb, size_t length);
+size_t qsb_get_length(const QEMUSizedBuffer *qsb);
+ssize_t qsb_get_buffer(const QEMUSizedBuffer *, off_t start, size_t count,
+                       uint8_t *buf);
+ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t *buf,
+                     off_t pos, size_t count);
+
+
+/*
+ * For use on files opened with qemu_bufopen
+ */
+const QEMUSizedBuffer *qemu_buf_get(QEMUFile *f);
+
 static inline void qemu_put_ubyte(QEMUFile *f, unsigned int v)
 {
     qemu_put_byte(f, (int)v);
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 5f20b0e..db1153a 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -60,6 +60,7 @@ typedef struct PCIEAERLog PCIEAERLog;
 typedef struct PCIEAERErr PCIEAERErr;
 typedef struct PCIEPort PCIEPort;
 typedef struct PCIESlot PCIESlot;
+typedef struct QEMUSizedBuffer QEMUSizedBuffer;
 typedef struct MSIMessage MSIMessage;
 typedef struct SerialState SerialState;
 typedef struct PCMCIACardState PCMCIACardState;
diff --git a/qemu-file.c b/qemu-file.c
index a8e3912..ccc516c 100644
--- a/qemu-file.c
+++ b/qemu-file.c
@@ -878,3 +878,459 @@ uint64_t qemu_get_be64(QEMUFile *f)
     v |= qemu_get_be32(f);
     return v;
 }
+
+#define QSB_CHUNK_SIZE      (1 << 10)
+#define QSB_MAX_CHUNK_SIZE  (16 * QSB_CHUNK_SIZE)
+
+/**
+ * Create a QEMUSizedBuffer
+ * This type of buffer uses scatter-gather lists internally and
+ * can grow to any size. Each data array in the scatter-gather list
+ * can hold a different number of bytes.
+ *
+ * @buffer: Optional buffer to copy into the QSB
+ * @len: size of initial buffer; if @buffer is given, buffer must
+ *       hold at least len bytes
+ *
+ * Returns a pointer to a QEMUSizedBuffer or NULL on allocation failure
+ */
+QEMUSizedBuffer *qsb_create(const uint8_t *buffer, size_t len)
+{
+    QEMUSizedBuffer *qsb;
+    size_t alloc_len, num_chunks, i, to_copy;
+    size_t chunk_size = (len > QSB_MAX_CHUNK_SIZE)
+                        ? QSB_MAX_CHUNK_SIZE
+                        : QSB_CHUNK_SIZE;
+
+    num_chunks = DIV_ROUND_UP(len ? len : QSB_CHUNK_SIZE, chunk_size);
+    alloc_len = num_chunks * chunk_size;
+
+    qsb = g_try_new0(QEMUSizedBuffer, 1);
+    if (!qsb) {
+        return NULL;
+    }
+
+    qsb->iov = g_try_new0(struct iovec, num_chunks);
+    if (!qsb->iov) {
+        g_free(qsb);
+        return NULL;
+    }
+
+    qsb->n_iov = num_chunks;
+
+    for (i = 0; i < num_chunks; i++) {
+        qsb->iov[i].iov_base = g_try_malloc0(chunk_size);
+        if (!qsb->iov[i].iov_base) {
+            /* qsb_free is safe since g_free can cope with NULL */
+            qsb_free(qsb);
+            return NULL;
+        }
+
+        qsb->iov[i].iov_len = chunk_size;
+        if (buffer) {
+            to_copy = (len - qsb->used) > chunk_size
+                      ? chunk_size : (len - qsb->used);
+            memcpy(qsb->iov[i].iov_base, &buffer[qsb->used], to_copy);
+            qsb->used += to_copy;
+        }
+    }
+
+    qsb->size = alloc_len;
+
+    return qsb;
+}
+
+/**
+ * Free the QEMUSizedBuffer
+ *
+ * @qsb: The QEMUSizedBuffer to free
+ */
+void qsb_free(QEMUSizedBuffer *qsb)
+{
+    size_t i;
+
+    if (!qsb) {
+        return;
+    }
+
+    for (i = 0; i < qsb->n_iov; i++) {
+        g_free(qsb->iov[i].iov_base);
+    }
+    g_free(qsb->iov);
+    g_free(qsb);
+}
+
+/**
+ * Get the number of used bytes in the QEMUSizedBuffer
+ *
+ * @qsb: A QEMUSizedBuffer
+ *
+ * Returns the number of bytes currently used in this buffer
+ */
+size_t qsb_get_length(const QEMUSizedBuffer *qsb)
+{
+    return qsb->used;
+}
+
+/**
+ * Set the length of the buffer; the primary usage of this
+ * function is to truncate the number of used bytes in the buffer.
+ * The size will not be extended beyond the current number of
+ * allocated bytes in the QEMUSizedBuffer.
+ *
+ * @qsb: A QEMUSizedBuffer
+ * @new_len: The new length of bytes in the buffer
+ *
+ * Returns the number of bytes the buffer was truncated or extended
+ * to.
+ */
+size_t qsb_set_length(QEMUSizedBuffer *qsb, size_t new_len)
+{
+    if (new_len <= qsb->size) {
+        qsb->used = new_len;
+    } else {
+        qsb->used = qsb->size;
+    }
+    return qsb->used;
+}
+
+/**
+ * Get the iovec that holds the data for a given position @pos.
+ *
+ * @qsb: A QEMUSizedBuffer
+ * @pos: The index of a byte in the buffer
+ * @d_off: Pointer through which this function returns the offset
+ *         within the returned iovec at which the byte
+ *         is to be found
+ *
+ * Returns the index of the iovec that holds the byte at the given
+ * index @pos in the byte stream; a negative number if the iovec
+ * for the given position @pos does not exist.
+ */
+static ssize_t qsb_get_iovec(const QEMUSizedBuffer *qsb,
+                             off_t pos, off_t *d_off)
+{
+    ssize_t i;
+    off_t curr = 0;
+
+    if (pos > qsb->used) {
+        return -1;
+    }
+
+    for (i = 0; i < qsb->n_iov; i++) {
+        if (curr + qsb->iov[i].iov_len > pos) {
+            *d_off = pos - curr;
+            return i;
+        }
+        curr += qsb->iov[i].iov_len;
+    }
+    return -1;
+}
+
+/*
+ * Convert the QEMUSizedBuffer into a flat buffer.
+ *
+ * Note: If at all possible, try to avoid this function since it
+ *       may unnecessarily copy memory around.
+ *
+ * @qsb: pointer to QEMUSizedBuffer
+ * @start: offset to start at
+ * @count: number of bytes to copy
+ * @buffer: a pointer to a buffer to write into (at least @count bytes)
+ *
+ * Returns the number of bytes copied into the output buffer
+ */
+ssize_t qsb_get_buffer(const QEMUSizedBuffer *qsb, off_t start,
+                       size_t count, uint8_t *buffer)
+{
+    const struct iovec *iov;
+    size_t to_copy, all_copy;
+    ssize_t index;
+    off_t s_off;
+    off_t d_off = 0;
+    char *s;
+
+    if (start > qsb->used) {
+        return 0;
+    }
+
+    all_copy = qsb->used - start;
+    if (all_copy > count) {
+        all_copy = count;
+    } else {
+        count = all_copy;
+    }
+
+    index = qsb_get_iovec(qsb, start, &s_off);
+    if (index < 0) {
+        return 0;
+    }
+
+    while (all_copy > 0) {
+        iov = &qsb->iov[index];
+
+        s = iov->iov_base;
+
+        to_copy = iov->iov_len - s_off;
+        if (to_copy > all_copy) {
+            to_copy = all_copy;
+        }
+        memcpy(&buffer[d_off], &s[s_off], to_copy);
+
+        d_off += to_copy;
+        all_copy -= to_copy;
+
+        s_off = 0;
+        index++;
+    }
+
+    return count;
+}
+
+/**
+ * Grow the QEMUSizedBuffer to the given size and allocate
+ * memory for it.
+ *
+ * @qsb: A QEMUSizedBuffer
+ * @new_size: The new size of the buffer
+ *
+ * Return:
+ *    a negative error code in case of memory allocation failure
+ * or
+ *    the new size of the buffer. The returned size may be greater than or
+ *    equal to @new_size.
+ */
+static ssize_t qsb_grow(QEMUSizedBuffer *qsb, size_t new_size)
+{
+    size_t needed_chunks, i;
+
+    if (qsb->size < new_size) {
+        struct iovec *new_iov;
+        size_t size_diff = new_size - qsb->size;
+        size_t chunk_size = (size_diff > QSB_MAX_CHUNK_SIZE)
+                             ? QSB_MAX_CHUNK_SIZE : QSB_CHUNK_SIZE;
+
+        needed_chunks = DIV_ROUND_UP(size_diff, chunk_size);
+
+        new_iov = g_try_malloc_n(qsb->n_iov + needed_chunks,
+                                 sizeof(struct iovec));
+        if (new_iov == NULL) {
+            return -ENOMEM;
+        }
+
+        /* Allocate new chunks as needed into new_iov */
+        for (i = qsb->n_iov; i < qsb->n_iov + needed_chunks; i++) {
+            new_iov[i].iov_base = g_try_malloc0(chunk_size);
+            new_iov[i].iov_len = chunk_size;
+            if (!new_iov[i].iov_base) {
+                size_t j;
+
+                /* Free previously allocated new chunks */
+                for (j = qsb->n_iov; j < i; j++) {
+                    g_free(new_iov[j].iov_base);
+                }
+                g_free(new_iov);
+
+                return -ENOMEM;
+            }
+        }
+
+        /*
+         * Now we can't get any allocation errors, copy over to new iov
+         * and switch.
+         */
+        for (i = 0; i < qsb->n_iov; i++) {
+            new_iov[i] = qsb->iov[i];
+        }
+
+        qsb->n_iov += needed_chunks;
+        g_free(qsb->iov);
+        qsb->iov = new_iov;
+        qsb->size += (needed_chunks * chunk_size);
+    }
+
+    return qsb->size;
+}
+
+/**
+ * Write into the QEMUSizedBuffer at a given position and a given
+ * number of bytes. This function will automatically grow the
+ * QEMUSizedBuffer.
+ *
+ * @qsb: A QEMUSizedBuffer
+ * @source: A byte array to copy data from
+ * @pos: The position within the @qsb to write data to
+ * @count: The number of bytes to copy into the @qsb
+ *
+ * Returns @count or a negative error code in case of memory allocation
+ *           failure or an invalid @pos
+ */
+ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t *source,
+                     off_t pos, size_t count)
+{
+    ssize_t rc = qsb_grow(qsb, pos + count);
+    size_t to_copy;
+    size_t all_copy = count;
+    const struct iovec *iov;
+    ssize_t index;
+    char *dest;
+    off_t d_off, s_off = 0;
+
+    if (rc < 0) {
+        return rc;
+    }
+
+    if (pos + count > qsb->used) {
+        qsb->used = pos + count;
+    }
+
+    index = qsb_get_iovec(qsb, pos, &d_off);
+    if (index < 0) {
+        return -EINVAL;
+    }
+
+    while (all_copy > 0) {
+        iov = &qsb->iov[index];
+
+        dest = iov->iov_base;
+
+        to_copy = iov->iov_len - d_off;
+        if (to_copy > all_copy) {
+            to_copy = all_copy;
+        }
+
+        memcpy(&dest[d_off], &source[s_off], to_copy);
+
+        s_off += to_copy;
+        all_copy -= to_copy;
+
+        d_off = 0;
+        index++;
+    }
+
+    return count;
+}
+
+/**
+ * Create a deep copy of the given QEMUSizedBuffer.
+ *
+ * @qsb: A QEMUSizedBuffer
+ *
+ * Returns a clone of @qsb or NULL on allocation failure
+ */
+QEMUSizedBuffer *qsb_clone(const QEMUSizedBuffer *qsb)
+{
+    QEMUSizedBuffer *out = qsb_create(NULL, qsb_get_length(qsb));
+    size_t i;
+    ssize_t res;
+    off_t pos = 0;
+
+    if (!out) {
+        return NULL;
+    }
+
+    for (i = 0; i < qsb->n_iov; i++) {
+        res =  qsb_write_at(out, qsb->iov[i].iov_base,
+                            pos, qsb->iov[i].iov_len);
+        if (res < 0) {
+            qsb_free(out);
+            return NULL;
+        }
+        pos += res;
+    }
+
+    return out;
+}
+
+typedef struct QEMUBuffer {
+    QEMUSizedBuffer *qsb;
+    QEMUFile *file;
+} QEMUBuffer;
+
+static int buf_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
+{
+    QEMUBuffer *s = opaque;
+    ssize_t len = qsb_get_length(s->qsb) - pos;
+
+    if (len <= 0) {
+        return 0;
+    }
+
+    if (len > size) {
+        len = size;
+    }
+    return qsb_get_buffer(s->qsb, pos, len, buf);
+}
+
+static int buf_put_buffer(void *opaque, const uint8_t *buf,
+                          int64_t pos, int size)
+{
+    QEMUBuffer *s = opaque;
+
+    return qsb_write_at(s->qsb, buf, pos, size);
+}
+
+static int buf_close(void *opaque)
+{
+    QEMUBuffer *s = opaque;
+
+    qsb_free(s->qsb);
+
+    g_free(s);
+
+    return 0;
+}
+
+const QEMUSizedBuffer *qemu_buf_get(QEMUFile *f)
+{
+    QEMUBuffer *p;
+
+    qemu_fflush(f);
+
+    p = f->opaque;
+
+    return p->qsb;
+}
+
+static const QEMUFileOps buf_read_ops = {
+    .get_buffer = buf_get_buffer,
+    .close =      buf_close,
+};
+
+static const QEMUFileOps buf_write_ops = {
+    .put_buffer = buf_put_buffer,
+    .close =      buf_close,
+};
+
+QEMUFile *qemu_bufopen(const char *mode, QEMUSizedBuffer *input)
+{
+    QEMUBuffer *s;
+
+    if (mode == NULL || (mode[0] != 'r' && mode[0] != 'w') ||
+        mode[1] != '\0') {
+        error_report("qemu_bufopen: Argument validity check failed");
+        return NULL;
+    }
+
+    s = g_malloc0(sizeof(QEMUBuffer));
+    if (mode[0] == 'r') {
+        s->qsb = input;
+    }
+
+    if (s->qsb == NULL) {
+        s->qsb = qsb_create(NULL, 0);
+    }
+    if (!s->qsb) {
+        g_free(s);
+        error_report("qemu_bufopen: qsb_create failed");
+        return NULL;
+    }
+
+
+    if (mode[0] == 'r') {
+        s->file = qemu_fopen_ops(s, &buf_read_ops);
+    } else {
+        s->file = qemu_fopen_ops(s, &buf_write_ops);
+    }
+    return s->file;
+}
-- 
1.9.3


* [Qemu-devel] [PATCH v4 02/47] Tests: QEMUSizedBuffer/QEMUBuffer
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 01/47] QEMUSizedBuffer based QEMUFile Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-11-03  1:02   ` David Gibson
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 03/47] Start documenting how postcopy works Dr. David Alan Gilbert (git)
                   ` (46 subsequent siblings)
  48 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

* Please comment on separate thread for this QEMUSizedBuffer patch *

Modify some of the tests in tests/test-vmstate.c to use the in-memory file
based on QEMUSizedBuffer, to provide basic testing of QEMUSizedBuffer and
the associated memory-backed QEMUFile type.

Only some of the tests are changed, so that the fd-backed QEMUFile is
still tested.
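The memory-backed save tests compare the saved stream byte for byte against an `expected[]` array. Each field in that array is stored big-endian, so a 32-bit 1 is `0, 0, 0, 1`. Below is a standalone sketch of that pattern. The fixed-size `MemBuf` is a hypothetical stand-in for QEMUSizedBuffer, and the hypothetical `put_be32`/`put_be64` helpers stand in for QEMUFile's `qemu_put_be32`/`qemu_put_be64`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-capacity stand-in for an in-memory QEMUFile. */
typedef struct {
    uint8_t data[64];
    size_t used;
} MemBuf;

static void put_byte(MemBuf *m, uint8_t v)
{
    assert(m->used < sizeof(m->data));
    m->data[m->used++] = v;
}

/* Store v most-significant byte first (big-endian). */
static void put_be32(MemBuf *m, uint32_t v)
{
    for (int shift = 24; shift >= 0; shift -= 8) {
        put_byte(m, (uint8_t)(v >> shift));
    }
}

static void put_be64(MemBuf *m, uint64_t v)
{
    put_be32(m, (uint32_t)(v >> 32));
    put_be32(m, (uint32_t)v);
}

/* Like check_mem_file(): lengths must match as well as contents. */
static int mem_matches(const MemBuf *m, const uint8_t *expected, size_t len)
{
    return m->used == len && memcmp(m->data, expected, len) == 0;
}
```

The length comparison means a stray trailing field fails the check even when the prefix matches. In the fd-based version of these tests, the "Must reach EOF" read did that job.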

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 tests/Makefile       |  2 +-
 tests/test-vmstate.c | 74 +++++++++++++++++++++++++++-------------------------
 2 files changed, 39 insertions(+), 37 deletions(-)

diff --git a/tests/Makefile b/tests/Makefile
index 834279c..d004618 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -260,7 +260,7 @@ tests/test-qdev-global-props$(EXESUF): tests/test-qdev-global-props.o \
 	libqemuutil.a libqemustub.a
 tests/test-vmstate$(EXESUF): tests/test-vmstate.o \
 	vmstate.o qemu-file.o \
-	libqemuutil.a
+	libqemuutil.a libqemustub.a
 
 tests/test-qapi-types.c tests/test-qapi-types.h :\
 $(SRC_PATH)/tests/qapi-schema/qapi-schema-test.json $(SRC_PATH)/scripts/qapi-types.py
diff --git a/tests/test-vmstate.c b/tests/test-vmstate.c
index d72c64c..5e0fd13 100644
--- a/tests/test-vmstate.c
+++ b/tests/test-vmstate.c
@@ -43,6 +43,12 @@ void yield_until_fd_readable(int fd)
     select(fd + 1, &fds, NULL, NULL, NULL);
 }
 
+/*
+ * Some tests use 'open_test_file' to work on a real fd, some use
+ * an in-memory file (QEMUSizedBuffer+qemu_bufopen); we could pick one
+ * but this way we test both.
+ */
+
 /* Duplicate temp_fd and seek to the beginning of the file */
 static QEMUFile *open_test_file(bool write)
 {
@@ -54,6 +60,30 @@ static QEMUFile *open_test_file(bool write)
     return qemu_fdopen(fd, write ? "wb" : "rb");
 }
 
+/* Open a read-only qemu-file from an existing memory block */
+static QEMUFile *open_mem_file_read(const void *data, size_t len)
+{
+    /* The qsb gets freed by qemu_fclose */
+    QEMUSizedBuffer *qsb = qsb_create(data, len);
+    g_assert(qsb);
+
+    return qemu_bufopen("r", qsb);
+}
+
+/*
+ * Check that the contents of the memory-buffered file f match
+ * the given size/data.
+ */
+static void check_mem_file(QEMUFile *f, void *data, size_t size)
+{
+    uint8_t *result = g_malloc(size);
+    const QEMUSizedBuffer *qsb = qemu_buf_get(f);
+    g_assert_cmpint(qsb_get_length(qsb), ==, size);
+    g_assert_cmpint(qsb_get_buffer(qsb, 0, size, result), ==, size);
+    g_assert_cmpint(memcmp(result, data, size), ==, 0);
+    g_free(result);
+}
+
 #define SUCCESS(val) \
     g_assert_cmpint((val), ==, 0)
 
@@ -371,14 +401,12 @@ static const VMStateDescription vmstate_skipping = {
 
 static void test_save_noskip(void)
 {
-    QEMUFile *fsave = open_test_file(true);
+    QEMUFile *fsave = qemu_bufopen("w", NULL);
     TestStruct obj = { .a = 1, .b = 2, .c = 3, .d = 4, .e = 5, .f = 6,
                        .skip_c_e = false };
     vmstate_save_state(fsave, &vmstate_skipping, &obj);
     g_assert(!qemu_file_get_error(fsave));
-    qemu_fclose(fsave);
 
-    QEMUFile *loading = open_test_file(false);
     uint8_t expected[] = {
         0, 0, 0, 1,             /* a */
         0, 0, 0, 2,             /* b */
@@ -387,52 +415,31 @@ static void test_save_noskip(void)
         0, 0, 0, 5,             /* e */
         0, 0, 0, 0, 0, 0, 0, 6, /* f */
     };
-    uint8_t result[sizeof(expected)];
-    g_assert_cmpint(qemu_get_buffer(loading, result, sizeof(result)), ==,
-                    sizeof(result));
-    g_assert(!qemu_file_get_error(loading));
-    g_assert_cmpint(memcmp(result, expected, sizeof(result)), ==, 0);
-
-    /* Must reach EOF */
-    qemu_get_byte(loading);
-    g_assert_cmpint(qemu_file_get_error(loading), ==, -EIO);
-
-    qemu_fclose(loading);
+    check_mem_file(fsave, expected, sizeof(expected));
+    qemu_fclose(fsave);
 }
 
 static void test_save_skip(void)
 {
-    QEMUFile *fsave = open_test_file(true);
+    QEMUFile *fsave = qemu_bufopen("w", NULL);
     TestStruct obj = { .a = 1, .b = 2, .c = 3, .d = 4, .e = 5, .f = 6,
                        .skip_c_e = true };
     vmstate_save_state(fsave, &vmstate_skipping, &obj);
     g_assert(!qemu_file_get_error(fsave));
-    qemu_fclose(fsave);
 
-    QEMUFile *loading = open_test_file(false);
     uint8_t expected[] = {
         0, 0, 0, 1,             /* a */
         0, 0, 0, 2,             /* b */
         0, 0, 0, 0, 0, 0, 0, 4, /* d */
         0, 0, 0, 0, 0, 0, 0, 6, /* f */
     };
-    uint8_t result[sizeof(expected)];
-    g_assert_cmpint(qemu_get_buffer(loading, result, sizeof(result)), ==,
-                    sizeof(result));
-    g_assert(!qemu_file_get_error(loading));
-    g_assert_cmpint(memcmp(result, expected, sizeof(result)), ==, 0);
-
-
-    /* Must reach EOF */
-    qemu_get_byte(loading);
-    g_assert_cmpint(qemu_file_get_error(loading), ==, -EIO);
+    check_mem_file(fsave, expected, sizeof(expected));
 
-    qemu_fclose(loading);
+    qemu_fclose(fsave);
 }
 
 static void test_load_noskip(void)
 {
-    QEMUFile *fsave = open_test_file(true);
     uint8_t buf[] = {
         0, 0, 0, 10,             /* a */
         0, 0, 0, 20,             /* b */
@@ -442,10 +449,8 @@ static void test_load_noskip(void)
         0, 0, 0, 0, 0, 0, 0, 60, /* f */
         QEMU_VM_EOF, /* just to ensure we won't get EOF reported prematurely */
     };
-    qemu_put_buffer(fsave, buf, sizeof(buf));
-    qemu_fclose(fsave);
 
-    QEMUFile *loading = open_test_file(false);
+    QEMUFile *loading = open_mem_file_read(buf, sizeof(buf));
     TestStruct obj = { .skip_c_e = false };
     vmstate_load_state(loading, &vmstate_skipping, &obj, 2);
     g_assert(!qemu_file_get_error(loading));
@@ -460,7 +465,6 @@ static void test_load_noskip(void)
 
 static void test_load_skip(void)
 {
-    QEMUFile *fsave = open_test_file(true);
     uint8_t buf[] = {
         0, 0, 0, 10,             /* a */
         0, 0, 0, 20,             /* b */
@@ -468,10 +472,8 @@ static void test_load_skip(void)
         0, 0, 0, 0, 0, 0, 0, 60, /* f */
         QEMU_VM_EOF, /* just to ensure we won't get EOF reported prematurely */
     };
-    qemu_put_buffer(fsave, buf, sizeof(buf));
-    qemu_fclose(fsave);
 
-    QEMUFile *loading = open_test_file(false);
+    QEMUFile *loading = open_mem_file_read(buf, sizeof(buf));
     TestStruct obj = { .skip_c_e = true, .c = 300, .e = 500 };
     vmstate_load_state(loading, &vmstate_skipping, &obj, 2);
     g_assert(!qemu_file_get_error(loading));
-- 
1.9.3


* [Qemu-devel] [PATCH v4 03/47] Start documenting how postcopy works.
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 01/47] QEMUSizedBuffer based QEMUFile Dr. David Alan Gilbert (git)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 02/47] Tests: QEMUSizedBuffer/QEMUBuffer Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-11-03  1:31   ` David Gibson
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 04/47] qemu_ram_foreach_block: pass up error value, and down the ramblock name Dr. David Alan Gilbert (git)
                   ` (45 subsequent siblings)
  48 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 docs/migration.txt | 189 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 189 insertions(+)

diff --git a/docs/migration.txt b/docs/migration.txt
index 0492a45..a07b744 100644
--- a/docs/migration.txt
+++ b/docs/migration.txt
@@ -294,3 +294,192 @@ save/send this state when we are in the middle of a pio operation
 (that is what ide_drive_pio_state_needed() checks).  If DRQ_STAT is
 not enabled, the values on that fields are garbage and don't need to
 be sent.
+
+= Return path =
+
+In most migration scenarios there is only a single data path that runs
+from the source VM to the destination, typically along a single fd (although
+possibly with another fd or similar for some fast way of throwing pages across).
+
+However, some uses need two-way communication; in particular, the postcopy destination
+needs to be able to request pages on demand from the source.
+
+For these scenarios there is a 'return path' from the destination to the source;
+qemu_file_get_return_path(QEMUFile* fwdpath) gives the QEMUFile* for the return
+path.
+
+  Source side
+     Forward path - written by migration thread
+     Return path  - opened by main thread, read by return-path thread
+
+  Destination side
+     Forward path - read by main thread
+     Return path  - opened by main thread, written by main thread AND postcopy
+                    thread (protected by rp_mutex)
+
+= Postcopy =
+'Postcopy' migration is a way to deal with migrations that refuse to converge.
+Its upside is that there is an upper bound on the amount of migration traffic
+and the time it takes; the downside is that during the postcopy phase a failure
+of *either* side or of the network connection causes the guest to be lost.
+
+In postcopy the destination CPUs are started before all the memory has been
+transferred, and accesses to pages that are yet to be transferred cause
+a fault that's translated by QEMU into a request to the source QEMU.
+
+Postcopy can be combined with precopy (i.e. normal migration) so that if precopy
+doesn't finish in a given time the switch is made to postcopy.
+
+=== Enabling postcopy ===
+
+To enable postcopy (prior to the start of migration):
+
+migrate_set_capability x-postcopy-ram on
+
+The migration will still start in precopy mode; however, issuing:
+
+migrate_start_postcopy
+
+will now cause the transition from precopy to postcopy.
+It can be issued immediately after migration is started or any
+time later on.  Issuing it after the end of a migration is harmless.
+
+=== Postcopy device transfer ===
+
+Loading of device data may cause the device emulation to access guest RAM,
+which may trigger faults that have to be resolved by the source.  As such,
+the migration stream has to be able to deliver page data *during* the
+device load, so the device data has to be read from the stream completely
+before the device load begins, freeing the stream up.  This is achieved by
+'packaging' the device data into a blob that's read in one go.
+
+Source behaviour
+
+Until postcopy is entered, the migration stream is identical to normal
+precopy, except for the addition of a 'postcopy advise' command at
+the beginning to tell the destination that postcopy might happen.
+When postcopy starts, the source sends the page discard data and then
+forms the 'package' containing:
+
+   Command: 'postcopy ram listen'
+   The device state
+      A series of sections, identical to the precopy stream's device state
+      stream, containing everything except postcopiable devices (i.e. RAM)
+   Command: 'postcopy ram run'
+
+The 'package' is sent as the data part of a Command: 'CMD_PACKAGED', and the
+contents are formatted in the same way as the main migration stream.
+
+Destination behaviour
+
+Initially the destination looks the same as precopy, with a single thread
+reading the migration stream; the 'postcopy advise' and 'discard' commands
+are processed to change the way RAM is managed, but don't affect the stream
+processing.
+
+------------------------------------------------------------------------------
+                        1      2   3     4 5                      6   7
+main -----DISCARD-CMD_PACKAGED ( LISTEN  DEVICE     DEVICE DEVICE RUN )
+thread                             |       |
+                                   |     (page request)
+                                   |        \___
+                                   v            \
+listen thread:                     --- page -- page -- page -- page -- page --
+
+                                   a   b        c
+------------------------------------------------------------------------------
+
+On receipt of CMD_PACKAGED (1)
+   All the data associated with the package - the ( ... ) section in the
+diagram - is read into memory (into a QEMUSizedBuffer), and the main thread
+recurses into qemu_loadvm_state_main to process the contents of the package (2),
+which contains commands (3,6) and devices (4...).
+
+On receipt of 'postcopy ram listen' (3), i.e. the 1st command in the package,
+a new thread (a) is started that takes over servicing the migration stream,
+while the main thread carries on loading the package.  The listen thread loads
+normal background page data (b), but if a fault happens during a device load (5),
+the returned page (c) is loaded by the listen thread, allowing the main thread's
+device load to carry on.
+
+The last thing in the CMD_PACKAGED is a 'RUN' command (6) letting the destination
+CPUs start running.
+At the end of the CMD_PACKAGED (7) the main thread returns to normal running behaviour
+and is no longer used by migration, while the listen thread carries
+on servicing page data until the end of migration.
+
+=== Postcopy states ===
+
+Postcopy moves through a series of states (see postcopy_ram_state)
+from ADVISE->LISTEN->RUNNING->END
+
+  Advise: Set at the start of migration if postcopy is enabled, even
+          if it hasn't had the start command; here the destination
+          checks that its OS has the support needed for postcopy, and performs
+          setup to ensure the RAM mappings are suitable for later postcopy.
+          (Triggered by reception of POSTCOPY_RAM_ADVISE command)
+
+  Listen: The first command in the package, POSTCOPY_RAM_LISTEN, switches
+          the destination state to Listen, and starts a new thread
+          (the 'listen thread') which takes over the job of receiving
+          pages off the migration stream, while the main thread carries
+          on processing the blob.  With this thread able to process page
+          reception, the destination now 'sensitises' the RAM to detect
+          any access to missing pages (on Linux using the 'userfault'
+          system).
+
+  Running: POSTCOPY_RAM_RUN causes the destination to synchronise all
+          state and start the CPUs and IO devices running.  The main
+          thread now finishes processing the migration package and
+          then carries on as it would for normal precopy migration
+          (although it can't do the cleanup it would do as it
+          finishes a normal migration).
+
+  End: The listen thread can now quit and perform the cleanup of migration
+          state; the migration is now complete.
+
+=== Source side page maps ===
+
+The source side keeps two bitmaps during postcopy: 'the migration bitmap'
+and the 'sent map'.  The 'migration bitmap' is basically the same as in
+the precopy case, and holds a bit to indicate that a page is 'dirty' -
+i.e. needs sending.  During the precopy phase this is updated as the CPU
+dirties pages; however, during postcopy the source CPUs are stopped and
+nothing should dirty anything any more.
+
+The 'sent map' is used for the transition to postcopy.  It is a bitmap that
+has a bit set whenever a page is sent to the destination; however, during
+the transition to postcopy mode it is masked against the migration bitmap
+(sentmap &= migrationbitmap) to generate a bitmap recording pages that
+have previously been sent but are now dirty again.  This masked
+sentmap is sent to the destination, which discards those now-dirty pages
+before starting the CPUs.
+
+Note that once in postcopy mode, the sent map is still updated; however,
+its contents are not necessarily consistent with the pages already sent
+due to the masking with the migration bitmap.
+
+=== Destination side page maps ===
+
+(This needs changing so that both maps can be updated easily; at the moment
+ updates are done with a lock.)
+The destination keeps a 'requested map' and a 'received map'.
+Both maps are initially 0; as pages are received, their bits are set in the 'received map'.
+Incoming requests from the kernel cause the bit to be set in the 'requested map'.
+When a page that is marked as 'requested' is received, the kernel is notified.
+If the kernel requests a page that has already been 'received', the kernel is notified
+without re-requesting.
+
+This leads to three valid page states
+(rc = bit set in 'received map', rq = bit set in 'requested map'):
+    missing (!rc,!rq)  - page not yet received or requested
+    received (rc,!rq)  - page received
+    requested (!rc,rq) - page requested but not yet received
+
+state transitions:
+      received -> missing   (only during setup/discard)
+
+      missing -> received   (normal incoming page)
+      requested -> received (incoming page previously requested)
+      missing -> requested  (userfault request)
+
-- 
1.9.3
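
The 'Source side page maps' text in the migration.txt patch above describes masking the sent map against the migration bitmap (sentmap &= migrationbitmap) when entering postcopy. A minimal sketch of just that step, assuming the bitmaps are plain arrays of unsigned long; all demo_* functions and DEMO_* macros are hypothetical and are not QEMU's bitmap API:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Number of bits in one bitmap word. */
#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
/* Number of words needed to hold 'bits' bits. */
#define DEMO_BITS_TO_LONGS(bits) \
    (((bits) + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

/* Set bit 'nr' in 'map'. */
static void demo_set_bit(unsigned long *map, size_t nr)
{
    map[nr / DEMO_BITS_PER_LONG] |= 1UL << (nr % DEMO_BITS_PER_LONG);
}

/* Return 1 if bit 'nr' is set in 'map', else 0. */
static int demo_test_bit(const unsigned long *map, size_t nr)
{
    return (map[nr / DEMO_BITS_PER_LONG] >> (nr % DEMO_BITS_PER_LONG)) & 1;
}

/*
 * sentmap &= migration_bitmap: afterwards a set bit marks a page that was
 * sent during precopy but has been dirtied again since, i.e. a page the
 * destination must discard before its CPUs start.
 */
static void demo_mask_sentmap(unsigned long *sentmap,
                              const unsigned long *migration_bitmap,
                              size_t nr_pages)
{
    size_t i;

    for (i = 0; i < DEMO_BITS_TO_LONGS(nr_pages); i++) {
        sentmap[i] &= migration_bitmap[i];
    }
}
```

Only pages present in both the old sent map and the dirty bitmap survive the AND, which is the set the destination is told to discard.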

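The destination-side rules from the same text (the 'requested map', the 'received map' and the listed transitions) can be modelled as a small state machine. This is a sketch only: it keeps one flag byte per page instead of two separate bitmaps, ignores the locking the text mentions, and all names are hypothetical. The text does not say what happens when an already-requested page faults again, so returning DEMO_ACT_WAIT there is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Per-page flags mirroring the two destination bitmaps. */
#define DEMO_RC 0x1 /* bit set in the 'received map' */
#define DEMO_RQ 0x2 /* bit set in the 'requested map' */

typedef struct {
    unsigned char *flags;  /* one entry per page; 0 means 'missing' */
    size_t nr_pages;
} DemoDestPages;

typedef enum {
    DEMO_ACT_REQUEST,  /* missing -> requested: ask the source for it */
    DEMO_ACT_WAKE,     /* already received: notify the kernel at once */
    DEMO_ACT_WAIT,     /* already requested: wait for it (assumption) */
} DemoFaultAction;

/* A userfault request from the kernel for 'page'. */
static DemoFaultAction demo_page_request(DemoDestPages *d, size_t page)
{
    unsigned char *f;

    assert(page < d->nr_pages);
    f = &d->flags[page];
    if (*f & DEMO_RC) {
        return DEMO_ACT_WAKE;
    }
    if (*f & DEMO_RQ) {
        return DEMO_ACT_WAIT;
    }
    *f |= DEMO_RQ;
    return DEMO_ACT_REQUEST;
}

/*
 * A page arrives from the source: missing/requested -> received (rc,!rq).
 * Returns 1 if it had been requested, i.e. the kernel must be notified.
 */
static int demo_page_received(DemoDestPages *d, size_t page)
{
    int was_requested;

    assert(page < d->nr_pages);
    was_requested = (d->flags[page] & DEMO_RQ) != 0;
    d->flags[page] = DEMO_RC;
    return was_requested;
}

/* Discard during setup: received -> missing. */
static void demo_page_discard(DemoDestPages *d, size_t page)
{
    assert(page < d->nr_pages);
    d->flags[page] = 0;
}
```
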

* [Qemu-devel] [PATCH v4 04/47] qemu_ram_foreach_block: pass up error value, and down the ramblock name
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (2 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 03/47] Start documenting how postcopy works Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-11-03  2:34   ` David Gibson
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 05/47] improve DPRINTF macros, add to savevm Dr. David Alan Gilbert (git)
                   ` (44 subsequent siblings)
  48 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

qemu_ram_foreach_block: check the return value of the function it calls,
and return it if it's non-0 (stopping the iteration).
Fix up qemu_rdma_init_one_block, the only current callback, so that it
returns the result of __qemu_rdma_add_block.

Pass the name of the ramblock to the function; this helps in debugging.
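
The new iteration contract (stop at the first callback that returns non-zero and pass that value up, while handing the block name down) can be sketched outside QEMU as follows. DemoBlock, demo_foreach_block and the callback are hypothetical stand-ins for RAMBlock, qemu_ram_foreach_block and qemu_rdma_init_one_block, and a plain array replaces the QTAILQ list:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    const char *idstr;   /* block name, passed down for debugging */
    void *host;
    uint64_t offset;
    uint64_t length;
} DemoBlock;

/* Same shape as the patched RAMBlockIterFunc: non-zero means "stop". */
typedef int (DemoBlockIterFunc)(const char *block_name, void *host_addr,
                                uint64_t offset, uint64_t length,
                                void *opaque);

static int demo_foreach_block(const DemoBlock *blocks, size_t n,
                              DemoBlockIterFunc func, void *opaque)
{
    size_t i;
    int ret;

    for (i = 0; i < n; i++) {
        ret = func(blocks[i].idstr, blocks[i].host, blocks[i].offset,
                   blocks[i].length, opaque);
        if (ret) {
            return ret;  /* pass the first error up; skip remaining blocks */
        }
    }
    return 0;
}

/* Example callback: counts blocks, fails on a block named "bad". */
static int demo_count_until_bad(const char *block_name, void *host_addr,
                                uint64_t offset, uint64_t length,
                                void *opaque)
{
    int *count = opaque;

    (void)host_addr;
    (void)offset;
    (void)length;
    if (strcmp(block_name, "bad") == 0) {
        return -EINVAL;
    }
    (*count)++;
    return 0;
}
```

Because the name is available inside the callback, an error path can report which block failed rather than just an address.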

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 exec.c                    | 10 ++++++++--
 include/exec/cpu-common.h |  4 ++--
 migration-rdma.c          |  4 ++--
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/exec.c b/exec.c
index 759055d..7989d19 100644
--- a/exec.c
+++ b/exec.c
@@ -2867,12 +2867,18 @@ bool cpu_physical_memory_is_io(hwaddr phys_addr)
              memory_region_is_romd(mr));
 }
 
-void qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque)
+int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque)
 {
     RAMBlock *block;
+    int ret;
 
     QTAILQ_FOREACH(block, &ram_list.blocks, next) {
-        func(block->host, block->offset, block->length, opaque);
+        ret = func(block->idstr, block->host, block->offset, block->length,
+                   opaque);
+        if (ret) {
+            return ret;
+        }
     }
+    return 0;
 }
 #endif
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index e3ec4c8..8042f50 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -118,10 +118,10 @@ void cpu_flush_icache_range(hwaddr start, int len);
 extern struct MemoryRegion io_mem_rom;
 extern struct MemoryRegion io_mem_notdirty;
 
-typedef void (RAMBlockIterFunc)(void *host_addr,
+typedef int (RAMBlockIterFunc)(const char *block_name, void *host_addr,
     ram_addr_t offset, ram_addr_t length, void *opaque);
 
-void qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque);
+int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque);
 
 #endif
 
diff --git a/migration-rdma.c b/migration-rdma.c
index b32dbdf..c0e52ed 100644
--- a/migration-rdma.c
+++ b/migration-rdma.c
@@ -595,10 +595,10 @@ static int __qemu_rdma_add_block(RDMAContext *rdma, void *host_addr,
  * in advanced before the migration starts. This tells us where the RAM blocks
  * are so that we can register them individually.
  */
-static void qemu_rdma_init_one_block(void *host_addr,
+static int qemu_rdma_init_one_block(const char *block_name, void *host_addr,
     ram_addr_t block_offset, ram_addr_t length, void *opaque)
 {
-    __qemu_rdma_add_block(opaque, host_addr, block_offset, length);
+    return __qemu_rdma_add_block(opaque, host_addr, block_offset, length);
 }
 
 /*
-- 
1.9.3


* [Qemu-devel] [PATCH v4 05/47] improve DPRINTF macros, add to savevm
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (3 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 04/47] qemu_ram_foreach_block: pass up error value, and down the ramblock name Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-11-03  2:35   ` David Gibson
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 06/47] Add qemu_get_counted_string to read a string prefixed by a count byte Dr. David Alan Gilbert (git)
                   ` (43 subsequent siblings)
  48 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Improve the existing DPRINTF macro in arch_init.c by:
  1) Making it go to stderr rather than stdout (so you can run with
-nographic and redirect your debug to a file)
  2) Making it print the time in ms with each debug message - useful for
debugging latency issues

Add the same macro to migration.c and savevm.c.
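
A self-contained sketch of the macro pattern, assuming clock_gettime(CLOCK_REALTIME) as a stand-in for QEMU's qemu_clock_get_ms(QEMU_CLOCK_REALTIME). The DEBUG_DEMO switch, demo_clock_get_ms and the demo_debug_stream redirect are hypothetical additions so the output can be captured:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for qemu_clock_get_ms(QEMU_CLOCK_REALTIME). */
static int64_t demo_clock_get_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_REALTIME, &ts);
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Where debug output goes; NULL means stderr. */
static FILE *demo_debug_stream;

/* Comment this out and every DPRINTF compiles to nothing. */
#define DEBUG_DEMO

#ifdef DEBUG_DEMO
/*
 * Prints "demo@<ms> <message>\n".  '## __VA_ARGS__' (a GNU extension)
 * drops the trailing comma when DPRINTF has no arguments after fmt.
 */
#define DPRINTF(fmt, ...) \
    do { fprintf(demo_debug_stream ? demo_debug_stream : stderr, \
                 "demo@%" PRId64 " " fmt "\n", \
                 demo_clock_get_ms(), ## __VA_ARGS__); } while (0)
#else
#define DPRINTF(fmt, ...) \
    do { } while (0)
#endif
```

The do { } while (0) wrapper keeps DPRINTF usable as a single statement (for example after an unbraced if) in both the enabled and disabled forms.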

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c |  5 ++++-
 migration.c | 12 ++++++++++++
 savevm.c    | 10 ++++++++++
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/arch_init.c b/arch_init.c
index c974f3f..772de36 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -53,9 +53,12 @@
 #include "hw/acpi/acpi.h"
 #include "qemu/host-utils.h"
 
+/* #define DEBUG_ARCH_INIT */
 #ifdef DEBUG_ARCH_INIT
 #define DPRINTF(fmt, ...) \
-    do { fprintf(stdout, "arch_init: " fmt, ## __VA_ARGS__); } while (0)
+    do { fprintf(stderr,  "arch_init@%" PRId64 " " fmt "\n", \
+                          qemu_clock_get_ms(QEMU_CLOCK_REALTIME), \
+                          ## __VA_ARGS__); } while (0)
 #else
 #define DPRINTF(fmt, ...) \
     do { } while (0)
diff --git a/migration.c b/migration.c
index 8d675b3..e241370 100644
--- a/migration.c
+++ b/migration.c
@@ -26,6 +26,18 @@
 #include "qmp-commands.h"
 #include "trace.h"
 
+/* #define DEBUG_MIGRATION */
+
+#ifdef DEBUG_MIGRATION
+#define DPRINTF(fmt, ...) \
+    do { fprintf(stderr, "migration@%" PRId64 " " fmt "\n", \
+                          qemu_clock_get_ms(QEMU_CLOCK_REALTIME), \
+                          ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+    do { } while (0)
+#endif
+
 enum {
     MIG_STATE_ERROR = -1,
     MIG_STATE_NONE,
diff --git a/savevm.c b/savevm.c
index e19ae0a..c3a1f68 100644
--- a/savevm.c
+++ b/savevm.c
@@ -43,6 +43,16 @@
 #include "block/snapshot.h"
 #include "block/qapi.h"
 
+#ifdef DEBUG_SAVEVM
+#define DPRINTF(fmt, ...) \
+    do { fprintf(stderr, "savevm@%" PRId64 " " fmt "\n", \
+                          qemu_clock_get_ms(QEMU_CLOCK_REALTIME), \
+                          ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+    do { } while (0)
+#endif
+
 
 #ifndef ETH_P_RARP
 #define ETH_P_RARP 0x8035
-- 
1.9.3


* [Qemu-devel] [PATCH v4 06/47] Add qemu_get_counted_string to read a string prefixed by a count byte
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (4 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 05/47] improve DPRINTF macros, add to savevm Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-11-03  2:39   ` David Gibson
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 07/47] Create MigrationIncomingState Dr. David Alan Gilbert (git)
                   ` (42 subsequent siblings)
  48 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

and use it in loadvm_state.
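
A minimal sketch of reading a count-prefixed string from an in-memory buffer. DemoReader and demo_get_buffer are hypothetical stand-ins for QEMUFile and qemu_get_buffer. One deliberate difference from the patch: on a short read this sketch terminates the string after the bytes actually read, whereas the patch writes the terminator at buf[len].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a QEMUFile being read. */
typedef struct {
    const uint8_t *data;
    size_t size;
    size_t pos;
} DemoReader;

/* Copy up to 'len' bytes; returns how many were actually available. */
static size_t demo_get_buffer(DemoReader *r, uint8_t *buf, size_t len)
{
    size_t avail = r->size - r->pos;

    if (len > avail) {
        len = avail;
    }
    memcpy(buf, r->data + r->pos, len);
    r->pos += len;
    return len;
}

/*
 * Read a string whose length is given by one preceding byte.
 * 'buf' must hold 256 bytes: up to 255 characters plus the terminator.
 * Returns 0 on success, non-zero on a short read or missing count byte.
 */
static int demo_get_counted_string(DemoReader *r, uint8_t buf[256])
{
    size_t len, got;

    if (r->pos >= r->size) {
        buf[0] = 0;
        return 1;            /* no count byte available */
    }
    len = r->data[r->pos++];
    got = demo_get_buffer(r, buf, len);
    buf[got] = 0;            /* always terminate, even on a short read */
    return got != len;
}
```

The 256-byte buffer is why the patch can shrink idstr from 257 to 256: a one-byte count can never exceed 255.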

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/qemu-file.h |  2 ++
 qemu-file.c                   | 15 +++++++++++++++
 savevm.c                      | 18 ++++++++++--------
 3 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index 6ef8ebc..a8cac7a 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -300,4 +300,6 @@ static inline void qemu_get_sbe64s(QEMUFile *f, int64_t *pv)
 {
     qemu_get_be64s(f, (uint64_t *)pv);
 }
+
+int qemu_get_counted_string(QEMUFile *f, uint8_t *buf);
 #endif
diff --git a/qemu-file.c b/qemu-file.c
index ccc516c..a057b3e 100644
--- a/qemu-file.c
+++ b/qemu-file.c
@@ -879,6 +879,21 @@ uint64_t qemu_get_be64(QEMUFile *f)
     return v;
 }
 
+/*
+ * Get a string whose length is determined by a single preceding byte
+ * A preallocated 256 byte buffer must be passed in.
+ * Returns: 0 on success and a 0 terminated string in the buffer
+ */
+int qemu_get_counted_string(QEMUFile *f, uint8_t *buf)
+{
+    unsigned int len = qemu_get_byte(f);
+    int res = qemu_get_buffer(f, buf, len);
+
+    buf[len] = 0;
+
+    return res != len;
+}
+
 #define QSB_CHUNK_SIZE      (1 << 10)
 #define QSB_MAX_CHUNK_SIZE  (16 * QSB_CHUNK_SIZE)
 
diff --git a/savevm.c b/savevm.c
index c3a1f68..cb6f0de 100644
--- a/savevm.c
+++ b/savevm.c
@@ -908,7 +908,7 @@ int qemu_loadvm_state(QEMUFile *f)
 
     v = qemu_get_be32(f);
     if (v == QEMU_VM_FILE_VERSION_COMPAT) {
-        fprintf(stderr, "SaveVM v2 format is obsolete and don't work anymore\n");
+        error_report("SaveVM v2 format is obsolete and doesn't work anymore");
         return -ENOTSUP;
     }
     if (v != QEMU_VM_FILE_VERSION) {
@@ -918,31 +918,33 @@ int qemu_loadvm_state(QEMUFile *f)
     while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
         uint32_t instance_id, version_id, section_id;
         SaveStateEntry *se;
-        char idstr[257];
-        int len;
+        char idstr[256];
 
         switch (section_type) {
         case QEMU_VM_SECTION_START:
         case QEMU_VM_SECTION_FULL:
             /* Read section start */
             section_id = qemu_get_be32(f);
-            len = qemu_get_byte(f);
-            qemu_get_buffer(f, (uint8_t *)idstr, len);
-            idstr[len] = 0;
+            if (qemu_get_counted_string(f, (uint8_t *)idstr)) {
+                error_report("Unable to read ID string for section %u",
+                            section_id);
+                return -EINVAL;
+            }
             instance_id = qemu_get_be32(f);
             version_id = qemu_get_be32(f);
 
             /* Find savevm section */
             se = find_se(idstr, instance_id);
             if (se == NULL) {
-                fprintf(stderr, "Unknown savevm section or instance '%s' %d\n", idstr, instance_id);
+                error_report("Unknown savevm section or instance '%s' %d",
+                             idstr, instance_id);
                 ret = -EINVAL;
                 goto out;
             }
 
             /* Validate version */
             if (version_id > se->version_id) {
-                fprintf(stderr, "savevm: unsupported version %d for '%s' v%d\n",
+                error_report("savevm: unsupported version %d for '%s' v%d",
                         version_id, idstr, se->version_id);
                 ret = -EINVAL;
                 goto out;
-- 
1.9.3


* [Qemu-devel] [PATCH v4 07/47] Create MigrationIncomingState
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (5 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 06/47] Add qemu_get_counted_string to read a string prefixed by a count byte Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-11-03  2:45   ` David Gibson
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 08/47] socket shutdown Dr. David Alan Gilbert (git)
                   ` (41 subsequent siblings)
  48 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

There are currently lots of pieces of incoming migration state scattered
around, and postcopy is adding more; it seems better to try and keep
it together.

Allocate the MigrationIncomingState (MIS) in process_incoming_migration_co
and load_vmstate.
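
The init/get/destroy lifecycle can be sketched with a single static pointer. DemoFile and the demo_* functions are hypothetical stand-ins for QEMUFile and the migration_incoming_* functions; calloc replaces g_malloc0, so unlike the patch this sketch has to cope with a NULL return:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct DemoFile DemoFile;   /* opaque stand-in for QEMUFile */

typedef struct {
    DemoFile *file;                 /* the incoming migration stream */
} DemoIncomingState;

static DemoIncomingState *demo_mis_current;

static DemoIncomingState *demo_incoming_get_current(void)
{
    return demo_mis_current;
}

static DemoIncomingState *demo_incoming_state_init(DemoFile *f)
{
    demo_mis_current = calloc(1, sizeof(*demo_mis_current));
    if (demo_mis_current) {
        demo_mis_current->file = f;
    }
    return demo_mis_current;
}

static void demo_incoming_state_destroy(void)
{
    free(demo_mis_current);
    demo_mis_current = NULL;       /* later lookups see "no migration" */
}
```

As in the patch, callers bracket the load with init before qemu_loadvm_state and destroy afterwards, so code deep inside the load can reach the state without it being threaded through every call.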

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  9 +++++++++
 include/qemu/typedefs.h       |  2 ++
 migration.c                   | 28 ++++++++++++++++++++++++++++
 savevm.c                      |  2 ++
 4 files changed, 41 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 3cb5ba8..8a36255 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -41,6 +41,15 @@ struct MigrationParams {
 
 typedef struct MigrationState MigrationState;
 
+/* State for the incoming migration */
+struct MigrationIncomingState {
+    QEMUFile *file;
+};
+
+MigrationIncomingState *migration_incoming_get_current(void);
+MigrationIncomingState *migration_incoming_state_init(QEMUFile *f);
+void migration_incoming_state_destroy(void);
+
 struct MigrationState
 {
     int64_t bandwidth_limit;
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index db1153a..0f79b5c 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -14,6 +14,7 @@ typedef struct Visitor Visitor;
 
 struct Monitor;
 typedef struct Monitor Monitor;
+typedef struct MigrationIncomingState MigrationIncomingState;
 typedef struct MigrationParams MigrationParams;
 
 typedef struct Property Property;
@@ -44,6 +45,7 @@ typedef struct PixelFormat PixelFormat;
 typedef struct QemuConsole QemuConsole;
 typedef struct CharDriverState CharDriverState;
 typedef struct MACAddr MACAddr;
+typedef struct MigrationIncomingState MigrationIncomingState;
 typedef struct NetClientState NetClientState;
 typedef struct I2CBus I2CBus;
 typedef struct ISABus ISABus;
diff --git a/migration.c b/migration.c
index e241370..ac46ddb 100644
--- a/migration.c
+++ b/migration.c
@@ -65,6 +65,7 @@ static NotifierList migration_state_notifiers =
    migrations at once.  For now we don't need to add
    dynamic creation of migration */
 
+/* For outgoing */
 MigrationState *migrate_get_current(void)
 {
     static MigrationState current_migration = {
@@ -77,6 +78,28 @@ MigrationState *migrate_get_current(void)
     return &current_migration;
 }
 
+/* For incoming */
+static MigrationIncomingState *mis_current;
+
+MigrationIncomingState *migration_incoming_get_current(void)
+{
+    return mis_current;
+}
+
+MigrationIncomingState *migration_incoming_state_init(QEMUFile *f)
+{
+    mis_current = g_malloc0(sizeof(MigrationIncomingState));
+    mis_current->file = f;
+
+    return mis_current;
+}
+
+void migration_incoming_state_destroy(void)
+{
+    g_free(mis_current);
+    mis_current = NULL;
+}
+
 void qemu_start_incoming_migration(const char *uri, Error **errp)
 {
     const char *p;
@@ -106,9 +129,14 @@ static void process_incoming_migration_co(void *opaque)
     Error *local_err = NULL;
     int ret;
 
+    migration_incoming_state_init(f);
+
     ret = qemu_loadvm_state(f);
+
     qemu_fclose(f);
     free_xbzrle_decoded_buf();
+    migration_incoming_state_destroy();
+
     if (ret < 0) {
         error_report("load of migration failed: %s", strerror(-ret));
         exit(EXIT_FAILURE);
diff --git a/savevm.c b/savevm.c
index cb6f0de..a0c3b40 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1244,9 +1244,11 @@ int load_vmstate(const char *name)
     }
 
     qemu_system_reset(VMRESET_SILENT);
+    migration_incoming_state_init(f);
     ret = qemu_loadvm_state(f);
 
     qemu_fclose(f);
+    migration_incoming_state_destroy();
     if (ret < 0) {
         error_report("Error %d while loading VM state", ret);
         return ret;
-- 
1.9.3


* [Qemu-devel] [PATCH v4 08/47] socket shutdown
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (6 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 07/47] Create MigrationIncomingState Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-10-04 18:09   ` Paolo Bonzini
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 09/47] Provide runtime Target page information Dr. David Alan Gilbert (git)
                   ` (40 subsequent siblings)
  48 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add QEMUFile interface to allow a socket to be 'shut down' - i.e. any
reads/writes will fail (and any blocking read/write will be woken).

Add a socket_shutdown() wrapper in util/qemu-sockets.c so that the OS
dependencies (SHUT_* vs SD_* constants) are kept out of the QEMUFile code.
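
A POSIX-only sketch of the flag mapping and its effect: once a socket's read side is shut down, a read returns 0 (EOF) instead of blocking. demo_socket_shutdown is hypothetical; unlike the patch, it rejects rd == wr == false and returns -errno on failure to match the "-err on error" wording of the QEMUFileShutdownFunc comment.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

/* Map the (rd, wr) flags onto shutdown(2)'s 'how' argument. */
static int demo_socket_shutdown(int fd, bool rd, bool wr)
{
    int how;

    if (rd && wr) {
        how = SHUT_RDWR;
    } else if (rd) {
        how = SHUT_RD;
    } else if (wr) {
        how = SHUT_WR;
    } else {
        return -EINVAL;      /* nothing to shut down */
    }
    return shutdown(fd, how) == 0 ? 0 : -errno;
}
```
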

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/qemu-file.h | 10 ++++++++++
 include/qemu/sockets.h        |  1 +
 qemu-file.c                   | 27 +++++++++++++++++++++++++--
 util/qemu-sockets.c           | 28 ++++++++++++++++++++++++++++
 4 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index a8cac7a..935cf42 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -84,6 +84,14 @@ typedef size_t (QEMURamSaveFunc)(QEMUFile *f, void *opaque,
                                size_t size,
                                int *bytes_sent);
 
+/*
+ * Stop any read or write (depending on flags) on the underlying
+ * transport on the QEMUFile.
+ * Existing blocking reads/writes must be woken
+ * Returns 0 on success, -err on error
+ */
+typedef int (QEMUFileShutdownFunc)(void *opaque, bool rd, bool wr);
+
 typedef struct QEMUFileOps {
     QEMUFilePutBufferFunc *put_buffer;
     QEMUFileGetBufferFunc *get_buffer;
@@ -94,6 +102,7 @@ typedef struct QEMUFileOps {
     QEMURamHookFunc *after_ram_iterate;
     QEMURamHookFunc *hook_ram_load;
     QEMURamSaveFunc *save_page;
+    QEMUFileShutdownFunc *shut_down;
 } QEMUFileOps;
 
 struct QEMUSizedBuffer {
@@ -178,6 +187,7 @@ void qemu_file_set_rate_limit(QEMUFile *f, int64_t new_rate);
 int64_t qemu_file_get_rate_limit(QEMUFile *f);
 int qemu_file_get_error(QEMUFile *f);
 void qemu_file_set_error(QEMUFile *f, int ret);
+void qemu_file_shutdown(QEMUFile *f);
 void qemu_fflush(QEMUFile *f);
 
 static inline void qemu_put_be64s(QEMUFile *f, const uint64_t *pv)
diff --git a/include/qemu/sockets.h b/include/qemu/sockets.h
index fdbb196..ea8ffc6 100644
--- a/include/qemu/sockets.h
+++ b/include/qemu/sockets.h
@@ -41,6 +41,7 @@ int socket_set_nodelay(int fd);
 void qemu_set_block(int fd);
 void qemu_set_nonblock(int fd);
 int socket_set_fast_reuse(int fd);
+int socket_shutdown(int fd, bool rd, bool wr);
 int send_all(int fd, const void *buf, int len1);
 int recv_all(int fd, void *buf, int len1, bool single_read);
 
diff --git a/qemu-file.c b/qemu-file.c
index a057b3e..14dcf34 100644
--- a/qemu-file.c
+++ b/qemu-file.c
@@ -90,6 +90,14 @@ static int socket_close(void *opaque)
     return 0;
 }
 
+/* qemufile_ to disambiguate from the qemu-sockets.c code which it uses */
+static int qemufile_socket_shutdown(void *opaque, bool rd, bool wr)
+{
+    QEMUFileSocket *s = opaque;
+
+    return socket_shutdown(s->fd, rd, wr);
+}
+
 static int stdio_get_fd(void *opaque)
 {
     QEMUFileStdio *s = opaque;
@@ -337,15 +345,30 @@ QEMUFile *qemu_fdopen(int fd, const char *mode)
 static const QEMUFileOps socket_read_ops = {
     .get_fd =     socket_get_fd,
     .get_buffer = socket_get_buffer,
-    .close =      socket_close
+    .close =      socket_close,
+    .shut_down       = qemufile_socket_shutdown
+
 };
 
 static const QEMUFileOps socket_write_ops = {
     .get_fd =     socket_get_fd,
     .writev_buffer = socket_writev_buffer,
-    .close =      socket_close
+    .close =      socket_close,
+    .shut_down       = qemufile_socket_shutdown
+
 };
 
+/*
+ * Stop a file from being read/written - not all backing files can do this
+ * typically only sockets can.  The caller should make sure they only
+ * call this for things that can.
+ */
+void qemu_file_shutdown(QEMUFile *f)
+{
+    assert(f->ops->shut_down);
+    f->ops->shut_down(f->opaque, true, true);
+}
+
 bool qemu_file_mode_is_not_valid(const char *mode)
 {
     if (mode == NULL ||
diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
index 1eef590..830e6d7 100644
--- a/util/qemu-sockets.c
+++ b/util/qemu-sockets.c
@@ -981,3 +981,31 @@ int socket_dgram(SocketAddress *remote, SocketAddress *local, Error **errp)
     qemu_opts_del(opts);
     return fd;
 }
+
+int socket_shutdown(int fd, bool rd, bool wr)
+{
+    int how = 0;
+
+#ifndef WIN32
+    if (rd) {
+        how = SHUT_RD;
+    }
+
+    if (wr) {
+        how = rd ? SHUT_RDWR : SHUT_WR;
+    }
+
+#else
+    /* Untested */
+    if (rd) {
+        how = SD_RECEIVE;
+    }
+
+    if (wr) {
+        how = rd ? SD_BOTH : SD_SEND;
+    }
+
+#endif
+
+    return shutdown(fd, how);
+}
-- 
1.9.3


* [Qemu-devel] [PATCH v4 09/47] Provide runtime Target page information
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (7 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 08/47] socket shutdown Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 10/47] Return path: Open a return path on QEMUFile for sockets Dr. David Alan Gilbert (git)
                   ` (39 subsequent siblings)
  48 siblings, 0 replies; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The migration code is generally built target-independent; however,
there are a few places where knowing the target page size would
avoid artificially moving code into arch_init.

Provide 'qemu_target_page_bits()' that returns TARGET_PAGE_BITS
to other bits of code so that they can stay target-independent.
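
A sketch of the kind of target-independent arithmetic this enables for per-page migration bitmaps. demo_target_page_bits is a hypothetical stand-in for qemu_target_page_bits() on a target with 4 KiB pages (TARGET_PAGE_BITS of 12); the helper names are also hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: a target with 4 KiB pages. */
static size_t demo_target_page_bits(void)
{
    return 12;
}

/* Page size derived at run time instead of from a compile-time macro. */
static uint64_t demo_target_page_size(void)
{
    return (uint64_t)1 << demo_target_page_bits();
}

/* Index of the bit covering 'offset' in a per-page bitmap. */
static uint64_t demo_page_index(uint64_t offset)
{
    return offset >> demo_target_page_bits();
}

/* Round 'offset' down to the start of its target page. */
static uint64_t demo_page_align_down(uint64_t offset)
{
    return offset & ~(demo_target_page_size() - 1);
}
```
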

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 exec.c                  | 10 ++++++++++
 include/sysemu/sysemu.h |  1 +
 2 files changed, 11 insertions(+)

diff --git a/exec.c b/exec.c
index 7989d19..65ee612 100644
--- a/exec.c
+++ b/exec.c
@@ -2881,4 +2881,14 @@ int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque)
     }
     return 0;
 }
+
+/*
+ * Allows code that needs to deal with migration bitmaps etc to still be built
+ * target independent.
+ */
+size_t qemu_target_page_bits(void)
+{
+    return TARGET_PAGE_BITS;
+}
+
 #endif
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index d8539fd..6e5953d 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -68,6 +68,7 @@ int qemu_reset_requested_get(void);
 void qemu_system_killed(int signal, pid_t pid);
 void qemu_devices_reset(void);
 void qemu_system_reset(bool report);
+size_t qemu_target_page_bits(void);
 
 void qemu_add_exit_notifier(Notifier *notify);
 void qemu_remove_exit_notifier(Notifier *notify);
-- 
1.9.3


* [Qemu-devel] [PATCH v4 10/47] Return path: Open a return path on QEMUFile for sockets
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (8 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 09/47] Provide runtime Target page information Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-11-03  3:05   ` David Gibson
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 11/47] Return path: socket_writev_buffer: Block even on non-blocking fd's Dr. David Alan Gilbert (git)
                   ` (38 subsequent siblings)
  48 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Postcopy needs a method to send messages from the destination back to
the source; this is the 'return path'.

Wire it up for 'socket' QEMUFiles using a dup'd fd.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/qemu-file.h |  7 +++++
 qemu-file.c                   | 73 +++++++++++++++++++++++++++++++++++++------
 2 files changed, 70 insertions(+), 10 deletions(-)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index 935cf42..210e9c3 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -85,6 +85,11 @@ typedef size_t (QEMURamSaveFunc)(QEMUFile *f, void *opaque,
                                int *bytes_sent);
 
 /*
+ * Return a QEMUFile for comms in the opposite direction
+ */
+typedef QEMUFile *(QEMURetPathFunc)(void *opaque);
+
+/*
  * Stop any read or write (depending on flags) on the underlying
  * transport on the QEMUFile.
  * Existing blocking reads/writes must be woken
@@ -102,6 +107,7 @@ typedef struct QEMUFileOps {
     QEMURamHookFunc *after_ram_iterate;
     QEMURamHookFunc *hook_ram_load;
     QEMURamSaveFunc *save_page;
+    QEMURetPathFunc *get_return_path;
     QEMUFileShutdownFunc *shut_down;
 } QEMUFileOps;
 
@@ -188,6 +194,7 @@ int64_t qemu_file_get_rate_limit(QEMUFile *f);
 int qemu_file_get_error(QEMUFile *f);
 void qemu_file_set_error(QEMUFile *f, int ret);
 void qemu_file_shutdown(QEMUFile *f);
+QEMUFile *qemu_file_get_return_path(QEMUFile *f);
 void qemu_fflush(QEMUFile *f);
 
 static inline void qemu_put_be64s(QEMUFile *f, const uint64_t *pv)
diff --git a/qemu-file.c b/qemu-file.c
index 14dcf34..7393415 100644
--- a/qemu-file.c
+++ b/qemu-file.c
@@ -26,6 +26,8 @@ struct QEMUFile {
     unsigned int iovcnt;
 
     int last_error;
+
+    struct QEMUFile *return_path;
 };
 
 typedef struct QEMUFileStdio {
@@ -38,6 +40,45 @@ typedef struct QEMUFileSocket {
     QEMUFile *file;
 } QEMUFileSocket;
 
+/*
+ * Give a QEMUFile* off the same socket but data in the opposite
+ * direction.
+ */
+static QEMUFile *socket_dup_return_path(void *opaque)
+{
+    QEMUFileSocket *qfs = opaque;
+    int revfd;
+    bool this_is_read;
+    QEMUFile *result;
+
+    /* We should only be called once to get a RP on a file */
+    assert(!qfs->file->return_path);
+
+    if (qemu_file_get_error(qfs->file)) {
+        /* If the forward file is in error, don't try and open a return */
+        return NULL;
+    }
+
+    /* I don't think there's a better way to tell which direction 'this' is */
+    this_is_read = qfs->file->ops->get_buffer != NULL;
+
+    revfd = dup(qfs->fd);
+    if (revfd == -1) {
+        error_report("Error duplicating fd for return path: %s",
+                      strerror(errno));
+        return NULL;
+    }
+
+    result = qemu_fopen_socket(revfd, this_is_read ? "wb" : "rb");
+    qfs->file->return_path = result;
+
+    if (!result) {
+        close(revfd);
+    }
+
+    return result;
+}
+
 static ssize_t socket_writev_buffer(void *opaque, struct iovec *iov, int iovcnt,
                                     int64_t pos)
 {
@@ -343,19 +384,19 @@ QEMUFile *qemu_fdopen(int fd, const char *mode)
 }
 
 static const QEMUFileOps socket_read_ops = {
-    .get_fd =     socket_get_fd,
-    .get_buffer = socket_get_buffer,
-    .close =      socket_close,
-    .shut_down       = qemufile_socket_shutdown
-
+    .get_fd          = socket_get_fd,
+    .get_buffer      = socket_get_buffer,
+    .close           = socket_close,
+    .shut_down       = qemufile_socket_shutdown,
+    .get_return_path = socket_dup_return_path
 };
 
 static const QEMUFileOps socket_write_ops = {
-    .get_fd =     socket_get_fd,
-    .writev_buffer = socket_writev_buffer,
-    .close =      socket_close,
-    .shut_down       = qemufile_socket_shutdown
-
+    .get_fd          = socket_get_fd,
+    .writev_buffer   = socket_writev_buffer,
+    .close           = socket_close,
+    .shut_down       = qemufile_socket_shutdown,
+    .get_return_path = socket_dup_return_path
 };
 
 /*
@@ -369,6 +410,18 @@ void qemu_file_shutdown(QEMUFile *f)
     f->ops->shut_down(f, true, true);
 }
 
+/*
+ * Result: QEMUFile* for a 'return path' for comms in the opposite direction
+ *         NULL if not available
+ */
+QEMUFile *qemu_file_get_return_path(QEMUFile *f)
+{
+    if (!f->ops->get_return_path) {
+        return NULL;
+    }
+    return f->ops->get_return_path(f->opaque);
+}
+
 bool qemu_file_mode_is_not_valid(const char *mode)
 {
     if (mode == NULL ||
-- 
1.9.3
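
Below is a minimal sketch of the idea behind 'socket_dup_return_path()', using plain POSIX calls on a socketpair() instead of QEMUFile. dup() gives a second descriptor on the same socket. The destination can reply over that descriptor while the original one keeps carrying the forward stream. The names 'open_reverse_fd()' and 'return_path_roundtrip()' are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a second descriptor on the same socket for the opposite
 * direction; returns -1 on failure, like dup(). */
static int open_reverse_fd(int fd)
{
    return dup(fd);
}

/* Send "page" forward on the original descriptors and "ack" back over
 * the dup'd one; returns 0 if both arrive intact, -1 otherwise. */
static int return_path_roundtrip(void)
{
    int sv[2];          /* sv[0]: source end, sv[1]: destination end */
    int dest_rp;
    char fwd[8] = {0};
    char back[8] = {0};
    int ok;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) {
        return -1;
    }
    dest_rp = open_reverse_fd(sv[1]);
    if (dest_rp == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    ok = write(sv[0], "page", 4) == 4 &&   /* forward stream */
         read(sv[1], fwd, 4) == 4 &&
         write(dest_rp, "ack", 3) == 3 &&  /* return path */
         read(sv[0], back, 3) == 3;

    close(dest_rp);
    close(sv[0]);
    close(sv[1]);

    return (ok && strcmp(fwd, "page") == 0 && strcmp(back, "ack") == 0) ? 0 : -1;
}
```

return_path_roundtrip() returns 0 when "page" reaches the destination end and "ack" comes back to the source end over the duplicated descriptor.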

* [Qemu-devel] [PATCH v4 11/47] Return path: socket_writev_buffer: Block even on non-blocking fd's
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (9 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 10/47] Return path: Open a return path on QEMUFile for sockets Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-11-03  3:10   ` David Gibson
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 12/47] Handle bi-directional communication for fd migration Dr. David Alan Gilbert (git)
                   ` (37 subsequent siblings)
  48 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The return path uses a non-blocking fd so as not to block waiting
for the (possibly broken) destination to finish returning a message;
however, we still want outbound data to behave as before and block.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 qemu-file.c | 39 +++++++++++++++++++++++++++++++++++----
 1 file changed, 35 insertions(+), 4 deletions(-)

diff --git a/qemu-file.c b/qemu-file.c
index 7393415..57eabd8 100644
--- a/qemu-file.c
+++ b/qemu-file.c
@@ -85,12 +85,43 @@ static ssize_t socket_writev_buffer(void *opaque, struct iovec *iov, int iovcnt,
     QEMUFileSocket *s = opaque;
     ssize_t len;
     ssize_t size = iov_size(iov, iovcnt);
+    ssize_t offset = 0;
+    int     err;
 
-    len = iov_send(s->fd, iov, iovcnt, 0, size);
-    if (len < size) {
-        len = -socket_error();
+    while (size > 0) {
+        len = iov_send(s->fd, iov, iovcnt, offset, size);
+
+        if (len > 0) {
+            size -= len;
+            offset += len;
+        }
+
+        if (size > 0) {
+            err = socket_error();
+
+            if (err != EAGAIN) {
+                error_report("socket_writev_buffer: Got err=%d for (%zd/%zd)",
+                             err, size, len);
+                /*
+                 * If I've already sent some but only just got the error, I
+                 * could return the amount validly sent so far and wait for the
+                 * next call to report the error, but I'd rather flag the error
+                 * immediately.
+                 */
+                return -err;
+            }
+
+            /* Emulate blocking */
+            GPollFD pfd;
+
+            pfd.fd = s->fd;
+            pfd.events = G_IO_OUT | G_IO_ERR;
+            pfd.revents = 0;
+            g_poll(&pfd, 1 /* 1 fd */, -1 /* no timeout */);
+        }
     }
-    return len;
+
+    return offset;
 }
 
 static int socket_get_fd(void *opaque)
-- 
1.9.3
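
Here is a minimal sketch of the same 'emulate blocking' loop on a single buffer. It uses POSIX send() and poll() rather than QEMU's iov_send() and GLib's g_poll(). MSG_NOSIGNAL is passed so that writing to a closed peer returns EPIPE instead of raising SIGPIPE. That flag choice is an assumption of this sketch, as are the names 'send_all_blocking()' and 'demo_send()'. The loop matters because a dup()ed descriptor shares its file status flags, including O_NONBLOCK, with the original. A non-blocking return path therefore also makes the forward fd non-blocking.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send all 'len' bytes on 'fd' even if it is O_NONBLOCK: on EAGAIN, wait
 * in poll() until the socket is writable again. Returns the number of
 * bytes sent, or -errno on the first hard error (reported immediately,
 * as in the patch, even if some bytes were already sent). */
static ssize_t send_all_blocking(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t offset = 0;

    while (offset < len) {
        ssize_t n = send(fd, p + offset, len - offset, MSG_NOSIGNAL);

        if (n > 0) {
            offset += (size_t)n;
        } else if (n < 0 && errno == EINTR) {
            continue;
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            /* Emulate blocking: no timeout, like g_poll(..., -1) */
            struct pollfd pfd = { .fd = fd, .events = POLLOUT };
            if (poll(&pfd, 1, -1) < 0 && errno != EINTR) {
                return -errno;
            }
        } else {
            return n < 0 ? -errno : -EIO;
        }
    }
    return (ssize_t)offset;
}

/* Demo: send 'len' zero bytes from a non-blocking socketpair end; if
 * 'close_peer' is set, the receiving end is closed first. */
static ssize_t demo_send(size_t len, int close_peer)
{
    char buf[256] = {0};
    int sv[2];
    ssize_t r;

    if (len > sizeof(buf) || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) {
        return -EINVAL;
    }
    fcntl(sv[0], F_SETFL, fcntl(sv[0], F_GETFL) | O_NONBLOCK);
    if (close_peer) {
        close(sv[1]);
    }
    r = send_all_blocking(sv[0], buf, len);
    close(sv[0]);
    if (!close_peer) {
        close(sv[1]);
    }
    return r;
}
```

demo_send(64, 0) sends all 64 bytes. With the peer closed, the loop stops at the first hard error and returns a negative errno instead of retrying.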

* [Qemu-devel] [PATCH v4 12/47] Handle bi-directional communication for fd migration
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (10 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 11/47] Return path: socket_writev_buffer: Block even on non-blocking fd's Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-11-03  3:12   ` David Gibson
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 13/47] Migration commands Dr. David Alan Gilbert (git)
                   ` (36 subsequent siblings)
  48 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: Cristian Klein <cristian.klein@cs.umu.se>

Signed-off-by: Cristian Klein <cristian.klein@cs.umu.se>
---
 migration-fd.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/migration-fd.c b/migration-fd.c
index d2e523a..129da99 100644
--- a/migration-fd.c
+++ b/migration-fd.c
@@ -31,13 +31,29 @@
     do { } while (0)
 #endif
 
+static bool fd_is_socket(int fd)
+{
+    struct stat stat;
+    int ret = fstat(fd, &stat);
+    if (ret == -1) {
+        /* When in doubt say no */
+        return false;
+    }
+    return S_ISSOCK(stat.st_mode);
+}
+
 void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp)
 {
     int fd = monitor_get_fd(cur_mon, fdname, errp);
     if (fd == -1) {
         return;
     }
-    s->file = qemu_fdopen(fd, "wb");
+
+    if (fd_is_socket(fd)) {
+        s->file = qemu_fopen_socket(fd, "wb");
+    } else {
+        s->file = qemu_fdopen(fd, "wb");
+    }
 
     migrate_fd_connect(s);
 }
@@ -58,7 +74,11 @@ void fd_start_incoming_migration(const char *infd, Error **errp)
     DPRINTF("Attempting to start an incoming migration via fd\n");
 
     fd = strtol(infd, NULL, 0);
-    f = qemu_fdopen(fd, "rb");
+    if (fd_is_socket(fd)) {
+        f = qemu_fopen_socket(fd, "rb");
+    } else {
+        f = qemu_fdopen(fd, "rb");
+    }
     if(f == NULL) {
         error_setg_errno(errp, errno, "failed to open the source descriptor");
         return;
-- 
1.9.3
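
Below is a small self-contained sketch of the fd_is_socket() check from this patch, exercised against a socketpair() end and a pipe() end. Only the socket would be opened with qemu_fopen_socket() and so gain a return path. A pipe would still go through qemu_fdopen(). The helper 'check_socket_vs_pipe()' is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <unistd.h>

/* Same check as the patch: true only if fstat() succeeds and reports a
 * socket; any error is treated as "not a socket". */
static bool fd_is_socket(int fd)
{
    struct stat st;

    if (fstat(fd, &st) == -1) {
        return false;
    }
    return S_ISSOCK(st.st_mode);
}

/* Returns 1 if a socketpair() end is classified as a socket while a pipe()
 * end and an invalid fd are not; 0 if not; -1 if setup fails. */
static int check_socket_vs_pipe(void)
{
    int sv[2];
    int pfd[2];
    int ok;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) {
        return -1;
    }
    if (pipe(pfd) == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    ok = fd_is_socket(sv[0]) && !fd_is_socket(pfd[0]) && !fd_is_socket(-1);
    close(sv[0]);
    close(sv[1]);
    close(pfd[0]);
    close(pfd[1]);
    return ok;
}
```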

* [Qemu-devel] [PATCH v4 13/47] Migration commands
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (11 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 12/47] Handle bi-directional communication for fd migration Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-11-03  3:14   ` David Gibson
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 14/47] Return path: Control commands Dr. David Alan Gilbert (git)
                   ` (35 subsequent siblings)
  48 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Create QEMU_VM_COMMAND section type for sending commands from
source to destination.  These commands are not intended to convey
guest state but to control the migration process.

For use in postcopy.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  1 +
 include/sysemu/sysemu.h       |  9 ++++++++
 savevm.c                      | 48 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 58 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 8a36255..e23947a 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -33,6 +33,7 @@
 #define QEMU_VM_SECTION_END          0x03
 #define QEMU_VM_SECTION_FULL         0x04
 #define QEMU_VM_SUBSECTION           0x05
+#define QEMU_VM_COMMAND              0x06
 
 struct MigrationParams {
     bool blk;
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 6e5953d..eed7e77 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -82,6 +82,13 @@ void do_info_snapshots(Monitor *mon, const QDict *qdict);
 
 void qemu_announce_self(void);
 
+/* Subcommands for QEMU_VM_COMMAND */
+enum qemu_vm_cmd {
+    QEMU_VM_CMD_INVALID = 0,   /* Must be 0 */
+
+    QEMU_VM_CMD_AFTERLASTVALID
+};
+
 bool qemu_savevm_state_blocked(Error **errp);
 void qemu_savevm_state_begin(QEMUFile *f,
                              const MigrationParams *params);
@@ -89,6 +96,8 @@ int qemu_savevm_state_iterate(QEMUFile *f);
 void qemu_savevm_state_complete(QEMUFile *f);
 void qemu_savevm_state_cancel(void);
 uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size);
+void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
+                              uint16_t len, uint8_t *data);
 int qemu_loadvm_state(QEMUFile *f);
 
 /* SLIRP */
diff --git a/savevm.c b/savevm.c
index a0c3b40..3cae292 100644
--- a/savevm.c
+++ b/savevm.c
@@ -592,6 +592,25 @@ static void vmstate_save(QEMUFile *f, SaveStateEntry *se)
     vmstate_save_state(f, se->vmsd, se->opaque);
 }
 
+
+/* Send a 'QEMU_VM_COMMAND' type element with the command
+ * and associated data.
+ */
+void qemu_savevm_command_send(QEMUFile *f,
+                              enum qemu_vm_cmd command,
+                              uint16_t len,
+                              uint8_t *data)
+{
+    uint32_t tmp = (uint16_t)command;
+    qemu_put_byte(f, QEMU_VM_COMMAND);
+    qemu_put_be16(f, tmp);
+    qemu_put_be16(f, len);
+    if (len) {
+        qemu_put_buffer(f, data, len);
+    }
+    qemu_fflush(f);
+}
+
 bool qemu_savevm_state_blocked(Error **errp)
 {
     SaveStateEntry *se;
@@ -881,6 +900,29 @@ static SaveStateEntry *find_se(const char *idstr, int instance_id)
     return NULL;
 }
 
+/*
+ * Process an incoming 'QEMU_VM_COMMAND'
+ * negative return on error (will issue error message)
+ */
+static int loadvm_process_command(QEMUFile *f)
+{
+    uint16_t com;
+    uint16_t len;
+
+    com = qemu_get_be16(f);
+    len = qemu_get_be16(f);
+
+    /* fprintf(stderr,"loadvm_process_command: com=0x%x len=%d\n", com,len); */
+    switch (com) {
+
+    default:
+        error_report("VM_COMMAND 0x%x unknown (len 0x%x)", com, len);
+        return -1;
+    }
+
+    return 0;
+}
+
 typedef struct LoadStateEntry {
     QLIST_ENTRY(LoadStateEntry) entry;
     SaveStateEntry *se;
@@ -987,6 +1029,12 @@ int qemu_loadvm_state(QEMUFile *f)
                 goto out;
             }
             break;
+        case QEMU_VM_COMMAND:
+            ret = loadvm_process_command(f);
+            if (ret < 0) {
+                goto out;
+            }
+            break;
         default:
             fprintf(stderr, "Unknown savevm section type %d\n", section_type);
             ret = -EINVAL;
-- 
1.9.3
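
Following the order of the qemu_put_* calls in qemu_savevm_command_send(), a QEMU_VM_COMMAND element has four parts: one section-type byte (0x06), a big-endian 16-bit command, a big-endian 16-bit length, and then 'len' bytes of data. In the patch, the type byte is consumed by qemu_loadvm_state() before loadvm_process_command() reads the rest. Below is a minimal buffer-based sketch of that framing. 'encode_vm_command()' and 'decode_vm_command()' are hypothetical names, and the decoder assumes the whole element is already in memory rather than being read from a QEMUFile.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QEMU_VM_COMMAND 0x06

/* Frame one command: type byte, be16 command, be16 length, data.
 * Returns the number of bytes written, or 0 if 'cap' is too small. */
static size_t encode_vm_command(uint8_t *out, size_t cap, uint16_t cmd,
                                uint16_t len, const uint8_t *data)
{
    size_t total = 5 + (size_t)len;

    if (cap < total) {
        return 0;
    }
    out[0] = QEMU_VM_COMMAND;
    out[1] = (uint8_t)(cmd >> 8);
    out[2] = (uint8_t)(cmd & 0xff);
    out[3] = (uint8_t)(len >> 8);
    out[4] = (uint8_t)(len & 0xff);
    if (len) {
        memcpy(out + 5, data, len);
    }
    return total;
}

/* Parse a framed command held entirely in 'in'. Returns 0 and fills the
 * outputs on success, -1 if the type byte is wrong or data is missing. */
static int decode_vm_command(const uint8_t *in, size_t n, uint16_t *cmd,
                             uint16_t *len, const uint8_t **payload)
{
    if (n < 5 || in[0] != QEMU_VM_COMMAND) {
        return -1;
    }
    *cmd = (uint16_t)((in[1] << 8) | in[2]);
    *len = (uint16_t)((in[3] << 8) | in[4]);
    if (n - 5 < *len) {
        return -1;
    }
    *payload = in + 5;
    return 0;
}
```

Using the REQACK value (2) that the next patch assigns, a REQACK carrying the 32-bit value 42 would be the nine bytes 06 00 02 00 04 00 00 00 2a.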

* [Qemu-devel] [PATCH v4 14/47] Return path: Control commands
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (12 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 13/47] Migration commands Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-10-04 18:08   ` Paolo Bonzini
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 15/47] Return path: Send responses from destination to source Dr. David Alan Gilbert (git)
                   ` (34 subsequent siblings)
  48 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add two src->dest commands:
   * OPENRP - To request that the destination open the return path
   * REQACK - Request an acknowledgement from the destination

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  2 ++
 include/sysemu/sysemu.h       |  4 +++
 savevm.c                      | 57 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 63 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index e23947a..173775b 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -45,6 +45,8 @@ typedef struct MigrationState MigrationState;
 /* State for the incoming migration */
 struct MigrationIncomingState {
     QEMUFile *file;
+
+    QEMUFile *return_path;
 };
 
 MigrationIncomingState *migration_incoming_get_current(void);
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index eed7e77..ad96f2a 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -85,6 +85,8 @@ void qemu_announce_self(void);
 /* Subcommands for QEMU_VM_COMMAND */
 enum qemu_vm_cmd {
     QEMU_VM_CMD_INVALID = 0,   /* Must be 0 */
+    QEMU_VM_CMD_OPENRP,        /* Tell the dest to open the Return path */
+    QEMU_VM_CMD_REQACK,        /* Request an ACK on the RP */
 
     QEMU_VM_CMD_AFTERLASTVALID
 };
@@ -98,6 +100,8 @@ void qemu_savevm_state_cancel(void);
 uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size);
 void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
                               uint16_t len, uint8_t *data);
+void qemu_savevm_send_reqack(QEMUFile *f, uint32_t value);
+void qemu_savevm_send_openrp(QEMUFile *f);
 int qemu_loadvm_state(QEMUFile *f);
 
 /* SLIRP */
diff --git a/savevm.c b/savevm.c
index 3cae292..793384a 100644
--- a/savevm.c
+++ b/savevm.c
@@ -611,6 +611,19 @@ void qemu_savevm_command_send(QEMUFile *f,
     qemu_fflush(f);
 }
 
+void qemu_savevm_send_reqack(QEMUFile *f, uint32_t value)
+{
+    uint32_t buf;
+
+    DPRINTF("send_reqack %d", value);
+    buf = cpu_to_be32(value);
+    qemu_savevm_command_send(f, QEMU_VM_CMD_REQACK, 4, (uint8_t *)&buf);
+}
+
+void qemu_savevm_send_openrp(QEMUFile *f)
+{
+    qemu_savevm_command_send(f, QEMU_VM_CMD_OPENRP, 0, NULL);
+}
 bool qemu_savevm_state_blocked(Error **errp)
 {
     SaveStateEntry *se;
@@ -900,20 +913,64 @@ static SaveStateEntry *find_se(const char *idstr, int instance_id)
     return NULL;
 }
 
+static int loadvm_process_command_simple_lencheck(const char *name,
+                                                  unsigned int actual,
+                                                  unsigned int expected)
+{
+    if (actual != expected) {
+        error_report("%s received with bad length - expecting %d, got %d",
+                     name, expected, actual);
+        return -1;
+    }
+
+    return 0;
+}
+
 /*
  * Process an incoming 'QEMU_VM_COMMAND'
  * negative return on error (will issue error message)
  */
 static int loadvm_process_command(QEMUFile *f)
 {
+    MigrationIncomingState *mis = migration_incoming_get_current();
     uint16_t com;
     uint16_t len;
+    uint32_t tmp32;
 
     com = qemu_get_be16(f);
     len = qemu_get_be16(f);
 
     /* fprintf(stderr,"loadvm_process_command: com=0x%x len=%d\n", com,len); */
     switch (com) {
+    case QEMU_VM_CMD_OPENRP:
+        if (loadvm_process_command_simple_lencheck("CMD_OPENRP", len, 0)) {
+            return -1;
+        }
+        if (mis->return_path) {
+            error_report("CMD_OPENRP called when RP already open");
+            /* Not really a problem, so don't give up */
+            return 0;
+        }
+        mis->return_path = qemu_file_get_return_path(f);
+        if (!mis->return_path) {
+            error_report("CMD_OPENRP failed - could not open return path");
+            return -1;
+        }
+        break;
+
+    case QEMU_VM_CMD_REQACK:
+        if (loadvm_process_command_simple_lencheck("CMD_REQACK", len, 4)) {
+            return -1;
+        }
+        tmp32 = qemu_get_be32(f);
+        DPRINTF("Received REQACK 0x%x", tmp32);
+        if (!mis->return_path) {
+            error_report("CMD_REQACK (0x%x) received with no open return path",
+                         tmp32);
+            return -1;
+        }
+        /* migrate_send_rp_ack(mis, tmp32); TODO: gets added later */
+        break;
 
     default:
         error_report("VM_COMMAND 0x%x unknown (len 0x%x)", com, len);
-- 
1.9.3
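
Below is a minimal model of the destination-side rules added here:
- OPENRP must carry no data. A repeat is reported but not treated as fatal.
- REQACK must carry exactly 4 bytes and is rejected unless the return path is already open.
- Unknown commands are errors.

'struct dest_state' and 'process_command()' are hypothetical. Instead of opening a real return path or sending an ACK, the sketch only records what would happen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum qemu_vm_cmd {
    QEMU_VM_CMD_INVALID = 0,
    QEMU_VM_CMD_OPENRP,
    QEMU_VM_CMD_REQACK,
    QEMU_VM_CMD_AFTERLASTVALID
};

/* Hypothetical destination state: records what the real code would do. */
struct dest_state {
    bool rp_open;          /* return path has been opened */
    bool ack_pending;      /* an ACK would now be sent on the RP */
    uint32_t ack_value;    /* value carried by the last REQACK */
};

/* Returns 0 on success, -1 on error, like loadvm_process_command(). */
static int process_command(struct dest_state *s, uint16_t com, uint16_t len,
                           const uint8_t *data)
{
    switch (com) {
    case QEMU_VM_CMD_OPENRP:
        if (len != 0) {
            return -1;
        }
        /* A repeated OPENRP is reported in the patch but not fatal. */
        s->rp_open = true;
        return 0;

    case QEMU_VM_CMD_REQACK:
        if (len != 4 || !s->rp_open) {
            return -1;
        }
        s->ack_value = ((uint32_t)data[0] << 24) | ((uint32_t)data[1] << 16) |
                       ((uint32_t)data[2] << 8) | (uint32_t)data[3];
        s->ack_pending = true;
        return 0;

    default:
        return -1;
    }
}
```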

* [Qemu-devel] [PATCH v4 15/47] Return path: Send responses from destination to source
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (13 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 14/47] Return path: Control commands Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-11-03  3:22   ` David Gibson
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 16/47] Return path: Source handling of return path Dr. David Alan Gilbert (git)
                   ` (33 subsequent siblings)
  48 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add migrate_send_rp_message to send a message from destination to source
  along the return path (it uses a mutex so that it can be called from
  multiple threads).
Add migrate_send_rp_shut to send a 'shut' message to indicate
  that the destination is finished with the RP.
Add migrate_send_rp_ack to send an 'ack' message, and
  use it in the CMD_REQACK handler.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h | 18 ++++++++++++++++++
 migration.c                   | 41 +++++++++++++++++++++++++++++++++++++++++
 savevm.c                      |  2 +-
 3 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 173775b..12e640d 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -40,6 +40,13 @@ struct MigrationParams {
     bool shared;
 };
 
+/* Commands sent on the return path from destination to source*/
+enum mig_rpcomm_cmd {
+    MIG_RPCOMM_INVALID = 0,  /* Must be 0 */
+    MIG_RPCOMM_SHUT,         /* sibling will not send any more RP messages */
+    MIG_RPCOMM_ACK,          /* data (seq: be32 ) */
+    MIG_RPCOMM_AFTERLASTVALID
+};
 typedef struct MigrationState MigrationState;
 
 /* State for the incoming migration */
@@ -47,6 +54,7 @@ struct MigrationIncomingState {
     QEMUFile *file;
 
     QEMUFile *return_path;
+    QemuMutex      rp_mutex;    /* We send replies from multiple threads */
 };
 
 MigrationIncomingState *migration_incoming_get_current(void);
@@ -168,6 +176,16 @@ int64_t migrate_xbzrle_cache_size(void);
 
 int64_t xbzrle_cache_resize(int64_t new_size);
 
+/* Sending on the return path - generic and then for each message type */
+void migrate_send_rp_message(MigrationIncomingState *mis,
+                             enum mig_rpcomm_cmd cmd,
+                             uint16_t len, uint8_t *data);
+void migrate_send_rp_shut(MigrationIncomingState *mis,
+                          uint32_t value);
+void migrate_send_rp_ack(MigrationIncomingState *mis,
+                         uint32_t value);
+
+
 void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_load_hook(QEMUFile *f, uint64_t flags);
diff --git a/migration.c b/migration.c
index ac46ddb..5ba8f3e 100644
--- a/migration.c
+++ b/migration.c
@@ -90,6 +90,7 @@ MigrationIncomingState *migration_incoming_state_init(QEMUFile* f)
 {
     mis_current = g_malloc0(sizeof(MigrationIncomingState));
     mis_current->file = f;
+    qemu_mutex_init(&mis_current->rp_mutex);
 
     return mis_current;
 }
@@ -100,6 +101,46 @@ void migration_incoming_state_destroy(void)
     mis_current = NULL;
 }
 
+/* Send a message on the return channel back to the source
+ * of the migration.
+ */
+void migrate_send_rp_message(MigrationIncomingState *mis,
+                             enum mig_rpcomm_cmd cmd,
+                             uint16_t len, uint8_t *data)
+{
+    DPRINTF("migrate_send_rp_message: cmd=%d, len=%d\n", (int)cmd, len);
+    qemu_mutex_lock(&mis->rp_mutex);
+    qemu_put_be16(mis->return_path, (unsigned int)cmd);
+    qemu_put_be16(mis->return_path, len);
+    qemu_put_buffer(mis->return_path, data, len);
+    qemu_fflush(mis->return_path);
+    qemu_mutex_unlock(&mis->rp_mutex);
+}
+
+/*
+ * Send a 'SHUT' message on the return channel with the given value
+ * to indicate that we've finished with the RP.  Non-0 value indicates
+ * error.
+ */
+void migrate_send_rp_shut(MigrationIncomingState *mis,
+                          uint32_t value)
+{
+    uint32_t buf;
+
+    buf = cpu_to_be32(value);
+    migrate_send_rp_message(mis, MIG_RPCOMM_SHUT, 4, (uint8_t *)&buf);
+}
+
+/* Send an 'ACK' message on the return channel with the given value */
+void migrate_send_rp_ack(MigrationIncomingState *mis,
+                         uint32_t value)
+{
+    uint32_t buf;
+
+    buf = cpu_to_be32(value);
+    migrate_send_rp_message(mis, MIG_RPCOMM_ACK, 4, (uint8_t *)&buf);
+}
+
 void qemu_start_incoming_migration(const char *uri, Error **errp)
 {
     const char *p;
diff --git a/savevm.c b/savevm.c
index 793384a..8eebbfd 100644
--- a/savevm.c
+++ b/savevm.c
@@ -969,7 +969,7 @@ static int loadvm_process_command(QEMUFile *f)
                          tmp32);
             return -1;
         }
-        /* migrate_send_rp_ack(mis, tmp32); TODO: gets added later */
+        migrate_send_rp_ack(mis, tmp32);
         break;
 
     default:
-- 
1.9.3
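
Below is a minimal sketch of the return-path framing (be16 command, be16 length, data). It takes a mutex around the whole message, so that with several sending threads each header stays next to its payload. An in-memory buffer with a POSIX mutex stands in for the QEMUFile and QemuMutex. 'struct rp_buffer', 'rp_send_message()' and 'rp_send_u32()' are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum mig_rpcomm_cmd {
    MIG_RPCOMM_INVALID = 0,
    MIG_RPCOMM_SHUT,
    MIG_RPCOMM_ACK,
    MIG_RPCOMM_AFTERLASTVALID
};

/* Hypothetical in-memory stand-in for the return-path QEMUFile. */
struct rp_buffer {
    pthread_mutex_t lock;  /* serialises whole messages between senders */
    uint8_t data[256];
    size_t used;
};

static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Append header and payload under one lock so concurrent senders cannot
 * interleave them. Returns 0, or -1 if the buffer is full. */
static int rp_send_message(struct rp_buffer *rp, enum mig_rpcomm_cmd cmd,
                           uint16_t len, const uint8_t *data)
{
    int ret = -1;

    pthread_mutex_lock(&rp->lock);
    if (sizeof(rp->data) - rp->used >= 4u + len) {
        put_be16(rp->data + rp->used, (uint16_t)cmd);
        put_be16(rp->data + rp->used + 2, len);
        if (len) {
            memcpy(rp->data + rp->used + 4, data, len);
        }
        rp->used += 4u + len;
        ret = 0;
    }
    pthread_mutex_unlock(&rp->lock);
    return ret;
}

/* SHUT and ACK both carry a single be32 value. */
static int rp_send_u32(struct rp_buffer *rp, enum mig_rpcomm_cmd cmd,
                       uint32_t value)
{
    uint8_t buf[4];

    put_be32(buf, value);
    return rp_send_message(rp, cmd, sizeof(buf), buf);
}
```

An ACK followed by a SHUT thus occupies 16 bytes: two 4-byte headers, each followed by its 4-byte value.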

* [Qemu-devel] [PATCH v4 16/47] Return path: Source handling of return path
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (14 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 15/47] Return path: Send responses from destination to source Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-10-04 18:14   ` Paolo Bonzini
                     ` (2 more replies)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 17/47] qemu_loadvm errors and debug Dr. David Alan Gilbert (git)
                   ` (32 subsequent siblings)
  48 siblings, 3 replies; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Open a return path, and handle messages that are received upon it.
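
Below is a minimal sketch of the source-side parsing loop. It runs over an in-memory buffer instead of a QEMUFile read from a thread, and 'rp_process()' is a hypothetical name. The patch rejects only lengths greater than expected. This sketch instead requires the length to be exactly 4, so that the be32 value is always fully present before it is decoded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum mig_rpcomm_cmd {
    MIG_RPCOMM_INVALID = 0,
    MIG_RPCOMM_SHUT,
    MIG_RPCOMM_ACK,
    MIG_RPCOMM_AFTERLASTVALID
};

static uint16_t get_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Walk the messages in 'in', storing each ACK value in *latest_ack.
 * Returns 1 after a SHUT with value 0, 0 if the input ends cleanly
 * without a SHUT, and -1 on an unknown command, a bad length, truncated
 * data, or a SHUT carrying a non-zero (error) value. */
static int rp_process(const uint8_t *in, size_t n, uint32_t *latest_ack)
{
    size_t pos = 0;

    while (n - pos >= 4) {
        uint16_t com = get_be16(in + pos);
        uint16_t len = get_be16(in + pos + 2);
        uint32_t value;

        if (com != MIG_RPCOMM_SHUT && com != MIG_RPCOMM_ACK) {
            return -1;                 /* invalid command */
        }
        if (len != 4) {
            return -1;                 /* both commands carry one be32 */
        }
        pos += 4;
        if (n - pos < len) {
            return -1;                 /* payload truncated */
        }
        value = get_be32(in + pos);
        pos += len;

        if (com == MIG_RPCOMM_SHUT) {
            return value ? -1 : 1;     /* non-zero means the sibling failed */
        }
        *latest_ack = value;
    }
    return pos == n ? 0 : -1;          /* leftover bytes: partial header */
}
```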

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  10 +++
 migration.c                   | 181 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 190 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 12e640d..b87c289 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -47,6 +47,14 @@ enum mig_rpcomm_cmd {
     MIG_RPCOMM_ACK,          /* data (seq: be32 ) */
     MIG_RPCOMM_AFTERLASTVALID
 };
+
+/* Source side RP state */
+struct MigrationRetPathState {
+    uint32_t      latest_ack;
+    QemuThread    rp_thread;
+    bool          error;
+};
+
 typedef struct MigrationState MigrationState;
 
 /* State for the incoming migration */
@@ -69,9 +77,11 @@ struct MigrationState
     QemuThread thread;
     QEMUBH *cleanup_bh;
     QEMUFile *file;
+    QEMUFile *return_path;
 
     int state;
     MigrationParams params;
+    struct MigrationRetPathState rp_state;
     double mbps;
     int64_t total_time;
     int64_t downtime;
diff --git a/migration.c b/migration.c
index 5ba8f3e..ee6db1d 100644
--- a/migration.c
+++ b/migration.c
@@ -246,6 +246,23 @@ MigrationCapabilityStatusList *qmp_query_migrate_capabilities(Error **errp)
     return head;
 }
 
+/*
+ * Return true if we're already in the middle of a migration
+ * (i.e. any of the active or setup states)
+ */
+static bool migration_already_active(MigrationState *ms)
+{
+    switch (ms->state) {
+    case MIG_STATE_ACTIVE:
+    case MIG_STATE_SETUP:
+        return true;
+
+    default:
+        return false;
+
+    }
+}
+
 static void get_xbzrle_cache_stats(MigrationInfo *info)
 {
     if (migrate_use_xbzrle()) {
@@ -371,6 +388,21 @@ static void migrate_set_state(MigrationState *s, int old_state, int new_state)
     }
 }
 
+static void migrate_fd_cleanup_src_rp(MigrationState *ms)
+{
+    QEMUFile *rp = ms->return_path;
+
+    /*
+     * When stuff goes wrong (e.g. failing destination) on the rp, it can get
+     * cleaned up from a few threads; make sure not to do it twice in parallel
+     */
+    rp = atomic_cmpxchg(&ms->return_path, rp, NULL);
+    if (rp) {
+        DPRINTF("cleaning up return path\n");
+        qemu_fclose(rp);
+    }
+}
+
 static void migrate_fd_cleanup(void *opaque)
 {
     MigrationState *s = opaque;
@@ -378,6 +410,8 @@ static void migrate_fd_cleanup(void *opaque)
     qemu_bh_delete(s->cleanup_bh);
     s->cleanup_bh = NULL;
 
+    migrate_fd_cleanup_src_rp(s);
+
     if (s->file) {
         trace_migrate_fd_cleanup();
         qemu_mutex_unlock_iothread();
@@ -414,6 +448,11 @@ static void migrate_fd_cancel(MigrationState *s)
     int old_state ;
     trace_migrate_fd_cancel();
 
+    if (s->return_path) {
+        /* shutdown the rp socket, so causing the rp thread to shutdown */
+        qemu_file_shutdown(s->return_path);
+    }
+
     do {
         old_state = s->state;
         if (old_state != MIG_STATE_SETUP && old_state != MIG_STATE_ACTIVE) {
@@ -655,8 +694,148 @@ int64_t migrate_xbzrle_cache_size(void)
     return s->xbzrle_cache_size;
 }
 
-/* migration thread support */
+/*
+ * Something bad happened to the RP stream, mark an error
+ * The caller shall print something to indicate why
+ */
+static void source_return_path_bad(MigrationState *s)
+{
+    s->rp_state.error = true;
+    migrate_fd_cleanup_src_rp(s);
+}
 
+/*
+ * Handles messages sent on the return path towards the source VM
+ *
+ */
+static void *source_return_path_thread(void *opaque)
+{
+    MigrationState *ms = opaque;
+    QEMUFile *rp = ms->return_path;
+    uint16_t expected_len, header_len, header_com;
+    const int max_len = 512;
+    uint8_t buf[max_len];
+    uint32_t tmp32;
+    int res;
+
+    DPRINTF("RP: %s entry", __func__);
+    while (rp && !qemu_file_get_error(rp) &&
+        migration_already_active(ms)) {
+        DPRINTF("RP: %s top of loop", __func__);
+        header_com = qemu_get_be16(rp);
+        header_len = qemu_get_be16(rp);
+
+        switch (header_com) {
+        case MIG_RPCOMM_SHUT:
+        case MIG_RPCOMM_ACK:
+            expected_len = 4;
+            break;
+
+        default:
+            error_report("RP: Received invalid cmd 0x%04x length 0x%04x",
+                    header_com, header_len);
+            source_return_path_bad(ms);
+            goto out;
+        }
+
+        if (header_len > expected_len) {
+            error_report("RP: Received command 0x%04x with "
+                    "incorrect length %d expecting %d",
+                    header_com, header_len,
+                    expected_len);
+            source_return_path_bad(ms);
+            goto out;
+        }
+
+        /* We know we've got a valid header by this point */
+        res = qemu_get_buffer(rp, buf, header_len);
+        if (res != header_len) {
+            DPRINTF("RP: Failed to read command data");
+            source_return_path_bad(ms);
+            goto out;
+        }
+
+        /* OK, we have the command and the data */
+        switch (header_com) {
+        case MIG_RPCOMM_SHUT:
+            tmp32 = be32_to_cpup((uint32_t *)buf);
+            if (tmp32) {
+                error_report("RP: Sibling indicated error %d", tmp32);
+                source_return_path_bad(ms);
+            } else {
+                DPRINTF("RP: SHUT received");
+            }
+            /*
+             * We'll let the main thread deal with closing the RP;
+             * we could do a shutdown(2) on it, but we're the only user
+             * anyway, so there's nothing to be gained.
+             */
+            goto out;
+
+        case MIG_RPCOMM_ACK:
+            tmp32 = be32_to_cpup((uint32_t *)buf);
+            DPRINTF("RP: Received ACK 0x%x", tmp32);
+            atomic_xchg(&ms->rp_state.latest_ack, tmp32);
+            break;
+
+        default:
+            /* This shouldn't happen because we should catch this above */
+            DPRINTF("RP: Bad header_com in dispatch");
+        }
+        /* Latest command processed, now leave a gap for the next one */
+        header_com = MIG_RPCOMM_INVALID;
+    }
+    if (rp && qemu_file_get_error(rp)) {
+        DPRINTF("%s: rp bad at end", __func__);
+        source_return_path_bad(ms);
+    }
+
+    DPRINTF("%s: Bottom exit", __func__);
+
+out:
+    return NULL;
+}
+
+__attribute__ (( unused )) /* Until later in patch series */
+static int open_outgoing_return_path(MigrationState *ms)
+{
+
+    ms->return_path = qemu_file_get_return_path(ms->file);
+    if (!ms->return_path) {
+        return -1;
+    }
+
+    DPRINTF("%s: starting thread", __func__);
+    qemu_thread_create(&ms->rp_state.rp_thread, "return path",
+                       source_return_path_thread, ms, QEMU_THREAD_JOINABLE);
+
+    DPRINTF("%s: continuing", __func__);
+
+    return 0;
+}
+
+__attribute__ (( unused )) /* Until later in patch series */
+static void await_outgoing_return_path_close(MigrationState *ms)
+{
+    /*
+     * If this is a normal exit then the destination will send a SHUT and the
+     * rp_thread will exit, however if there's an error we need to cause
+     * it to exit, which we can do by a shutdown.
+     * (canceling must also shutdown to stop us getting stuck here if
+     * the destination died at just the wrong place)
+     */
+    if (qemu_file_get_error(ms->file) && ms->return_path) {
+        qemu_file_shutdown(ms->return_path);
+    }
+    DPRINTF("%s: Joining", __func__);
+    qemu_thread_join(&ms->rp_state.rp_thread);
+    DPRINTF("%s: Exit", __func__);
+}
+
+/*
+ * Master migration thread on the source VM.
+ * It drives the migration and pumps the data down the outgoing channel.
+ */
 static void *migration_thread(void *opaque)
 {
     MigrationState *s = opaque;
-- 
1.9.3
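For anyone reviewing the return-path handling, the framing that source_return_path_thread() consumes (a be16 command, a be16 length, then a 4-byte be32 payload for both SHUT and ACK) can be sketched without QEMUFile. This is a standalone illustration only: RP_CMD_SHUT/RP_CMD_ACK are hypothetical stand-ins for the MIG_RPCOMM_* constants (whose real values aren't shown in this patch), and it parses a memory buffer rather than a stream. It insists the length matches exactly, so a short payload is never decoded from uninitialised bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command values, standing in for MIG_RPCOMM_SHUT/ACK. */
enum { RP_CMD_SHUT = 1, RP_CMD_ACK = 2 };

/* Read big-endian 16/32-bit values from a byte buffer. */
static uint16_t get_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Parse one return-path message from 'buf' ('avail' bytes available).
 * Returns the number of bytes consumed, or -1 for an unknown command,
 * a wrong length, or a truncated message.  On success sets *cmd/*value.
 */
static int rp_parse_msg(const uint8_t *buf, size_t avail,
                        uint16_t *cmd, uint32_t *value)
{
    uint16_t com, len, expected_len;

    if (avail < 4) {
        return -1;                      /* not even a full header */
    }
    com = get_be16(buf);
    len = get_be16(buf + 2);

    switch (com) {
    case RP_CMD_SHUT:
    case RP_CMD_ACK:
        expected_len = 4;               /* both carry a single be32 */
        break;
    default:
        return -1;                      /* unknown command */
    }
    if (len != expected_len || avail < 4u + len) {
        return -1;                      /* wrong length or truncated body */
    }
    *cmd = com;
    *value = get_be32(buf + 4);
    return 4 + len;
}
```

For example, the bytes 00 02 00 04 12 34 56 78 decode as an ACK carrying 0x12345678 and consume 8 bytes.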


* [Qemu-devel] [PATCH v4 17/47] qemu_loadvm errors and debug
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (15 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 16/47] Return path: Source handling of return path Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-11-03  3:49   ` David Gibson
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 18/47] ram_debug_dump_bitmap: Dump a migration bitmap as text Dr. David Alan Gilbert (git)
                   ` (31 subsequent siblings)
  48 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Flip many fprintf calls to error_report
Add lots of DPRINTF debug in qemu_loadvm*

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 savevm.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/savevm.c b/savevm.c
index 8eebbfd..2c0d61a 100644
--- a/savevm.c
+++ b/savevm.c
@@ -719,6 +719,8 @@ int qemu_savevm_state_iterate(QEMUFile *f)
         trace_savevm_section_end(se->idstr, se->section_id);
 
         if (ret < 0) {
+            DPRINTF("%s: setting error state after iterate on id=%d/%s",
+                    __func__, se->section_id, se->idstr);
             qemu_file_set_error(f, ret);
         }
         if (ret <= 0) {
@@ -1019,6 +1021,7 @@ int qemu_loadvm_state(QEMUFile *f)
         SaveStateEntry *se;
         char idstr[256];
 
+        DPRINTF("qemu_loadvm_state loop: section_type=%d", section_type);
         switch (section_type) {
         case QEMU_VM_SECTION_START:
         case QEMU_VM_SECTION_FULL:
@@ -1032,6 +1035,9 @@ int qemu_loadvm_state(QEMUFile *f)
             instance_id = qemu_get_be32(f);
             version_id = qemu_get_be32(f);
 
+            DPRINTF("qemu_loadvm_state loop START/FULL: id=%d(%s)",
+                    section_id, idstr);
+
             /* Find savevm section */
             se = find_se(idstr, instance_id);
             if (se == NULL) {
@@ -1059,8 +1065,9 @@ int qemu_loadvm_state(QEMUFile *f)
 
             ret = vmstate_load(f, le->se, le->version_id);
             if (ret < 0) {
-                fprintf(stderr, "qemu: warning: error while loading state for instance 0x%x of device '%s'\n",
-                        instance_id, idstr);
+                error_report("qemu: error while loading state for "
+                             "instance 0x%x of device '%s'",
+                             instance_id, idstr);
                 goto out;
             }
             break;
@@ -1068,23 +1075,25 @@ int qemu_loadvm_state(QEMUFile *f)
         case QEMU_VM_SECTION_END:
             section_id = qemu_get_be32(f);
 
+            DPRINTF("QEMU_VM_SECTION_PART/END entry for id=%d", section_id);
             QLIST_FOREACH(le, &loadvm_handlers, entry) {
                 if (le->section_id == section_id) {
                     break;
                 }
             }
             if (le == NULL) {
-                fprintf(stderr, "Unknown savevm section %d\n", section_id);
+                error_report("Unknown savevm section %d", section_id);
                 ret = -EINVAL;
                 goto out;
             }
 
             ret = vmstate_load(f, le->se, le->version_id);
             if (ret < 0) {
-                fprintf(stderr, "qemu: warning: error while loading state section id %d\n",
-                        section_id);
+                error_report("qemu: error while loading state section"
+                             " id %d (%s)", section_id, le->se->idstr);
                 goto out;
             }
+            DPRINTF("QEMU_VM_SECTION_PART/END done for id=%d", section_id);
             break;
         case QEMU_VM_COMMAND:
             ret = loadvm_process_command(f);
@@ -1093,11 +1102,12 @@ int qemu_loadvm_state(QEMUFile *f)
             }
             break;
         default:
-            fprintf(stderr, "Unknown savevm section type %d\n", section_type);
+            error_report("Unknown savevm section type %d", section_type);
             ret = -EINVAL;
             goto out;
         }
     }
+    DPRINTF("qemu_loadvm_state loop: exited loop");
 
     cpu_synchronize_all_post_init();
 
@@ -1113,6 +1123,7 @@ out:
         ret = qemu_file_get_error(f);
     }
 
+    DPRINTF("qemu_loadvm_state out: ret=%d", ret);
     return ret;
 }
 
-- 
1.9.3


* [Qemu-devel] [PATCH v4 18/47] ram_debug_dump_bitmap: Dump a migration bitmap as text
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (16 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 17/47] qemu_loadvm errors and debug Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-11-03  3:58   ` David Gibson
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 19/47] Rework loadvm path for subloops Dr. David Alan Gilbert (git)
                   ` (30 subsequent siblings)
  48 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Misses out lines that are all the expected value so the output
can be quite compact depending on the circumstance.
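The skipping rule is easy to check in isolation. Below is a minimal standalone sketch of the same idea; bit_is_set() and dump_bitmap() are hypothetical names, with a local helper standing in for QEMU's test_bit() and an explicit page count standing in for last_ram_offset() >> TARGET_PAGE_BITS. It prints 128 pages per line and skips any line whose bits all equal 'expected', returning how many lines it printed.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define LINE_LEN ((size_t)128)

/* Same semantics as test_bit(): bit 'nr' of a long-based bitmap. */
static bool bit_is_set(const unsigned long *map, size_t nr)
{
    return (map[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1;
}

/*
 * Print 'npages' bits of 'map' to 'out', LINE_LEN per line ('1' or '.').
 * Lines whose bits all equal 'expected' are skipped.
 * Returns the number of lines printed.
 */
static int dump_bitmap(FILE *out, const unsigned long *map,
                       size_t npages, bool expected)
{
    char line[LINE_LEN + 1];
    int printed = 0;

    for (size_t cur = 0; cur < npages; cur += LINE_LEN) {
        /* The last line may be shorter than LINE_LEN */
        size_t len = npages - cur < LINE_LEN ? npages - cur : LINE_LEN;
        bool differs = false;

        for (size_t b = 0; b < len; b++) {
            bool bit = bit_is_set(map, cur + b);
            line[b] = bit ? '1' : '.';
            differs |= (bit != expected);
        }
        if (differs) {
            line[len] = '\0';
            fprintf(out, "0x%08zx : %s\n", cur, line);
            printed++;
        }
    }
    return printed;
}
```

With a 300-page bitmap that is clear except for page 130, dumping with expected=false prints just the line starting at 0x00000080, while expected=true prints all three lines.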

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c                   | 39 +++++++++++++++++++++++++++++++++++++++
 include/migration/migration.h |  1 +
 2 files changed, 40 insertions(+)

diff --git a/arch_init.c b/arch_init.c
index 772de36..6970733 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -769,6 +769,45 @@ static void reset_ram_globals(void)
 
 #define MAX_WAIT 50 /* ms, half buffered_file limit */
 
+/*
+ * 'expected' is the value you expect the bitmap mostly to be full
+ * of; lines that are entirely this value are not printed.
+ * If 'todump' is NULL, the migration bitmap is dumped.
+ */
+void ram_debug_dump_bitmap(unsigned long *todump, bool expected)
+{
+    int64_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
+
+    int64_t cur;
+    int64_t linelen = 128l;
+    char linebuf[129];
+
+    if (!todump) {
+        todump = migration_bitmap;
+    }
+
+    for (cur = 0; cur < ram_pages; cur += linelen) {
+        int64_t curb;
+        bool found = false;
+        /*
+         * Last line; catch the case where the line length
+         * is longer than remaining ram
+         */
+        if (cur+linelen > ram_pages) {
+            linelen = ram_pages - cur;
+        }
+        for (curb = 0; curb < linelen; curb++) {
+            bool thisbit = test_bit(cur+curb, todump);
+            linebuf[curb] = thisbit ? '1' : '.';
+            found |= (thisbit ^ expected);
+        }
+        if (found) {
+            linebuf[curb] = '\0';
+            fprintf(stderr,  "0x%08" PRIx64 " : %s\n", cur, linebuf);
+        }
+    }
+}
+
 static int ram_save_setup(QEMUFile *f, void *opaque)
 {
     RAMBlock *block;
diff --git a/include/migration/migration.h b/include/migration/migration.h
index b87c289..ff47987 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -157,6 +157,7 @@ uint64_t xbzrle_mig_pages_cache_miss(void);
 double xbzrle_mig_cache_miss_rate(void);
 
 void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
+void ram_debug_dump_bitmap(unsigned long *todump, bool expected);
 
 /**
  * @migrate_add_blocker - prevent migration from proceeding
-- 
1.9.3


* [Qemu-devel] [PATCH v4 19/47] Rework loadvm path for subloops
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (17 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 18/47] ram_debug_dump_bitmap: Dump a migration bitmap as text Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-10-04 16:46   ` Paolo Bonzini
  2014-11-03  5:08   ` David Gibson
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 20/47] Add migration-capability boolean for postcopy-ram Dr. David Alan Gilbert (git)
                   ` (29 subsequent siblings)
  48 siblings, 2 replies; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Postcopy needs to have two migration streams loading concurrently;
one from memory (with the device state) and the other from the fd
with the memory transactions.

Split the core of qemu_loadvm_state out so we can use it for both.

Allow the inner loadvm loop to quit and signal whether the parent
should.

loadvm_handlers is made static since its lifetime is greater
than the outer qemu_loadvm_state.
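To make the quit/signal-parent idea concrete, here is a small model of how the ORable LOADVM_EXITCODE_* flags are, as I read the intent, meant to propagate: a negative return or QUITLOOP ends the current loop immediately, other flags accumulate, and when the loop finishes normally a pending QUITPARENT is turned into QUITLOOP so the caller's loop stops next. The EXIT_* names and run_loop() are hypothetical stand-ins that simulate a sequence of command results rather than reading a stream.

```c
#include <assert.h>

/* ORable flags, mirroring LOADVM_EXITCODE_* (hypothetical names). */
enum {
    EXIT_QUITLOOP     = 1,
    EXIT_QUITPARENT   = 2,
    EXIT_KEEPHANDLERS = 4,
};

/*
 * Model of one loadvm loop processing 'n' command results.
 * An error or EXIT_QUITLOOP returns at once; other flags are ORed
 * together.  At the end, EXIT_QUITPARENT is converted into EXIT_QUITLOOP
 * for the caller, and the remaining flags are passed up unchanged.
 */
static int run_loop(const int *results, int n)
{
    int exitcode = 0;

    for (int i = 0; i < n; i++) {
        int ret = results[i];

        if (ret < 0 || (ret & EXIT_QUITLOOP)) {
            return ret;
        }
        exitcode |= ret;
    }
    if (exitcode & EXIT_QUITPARENT) {
        exitcode &= ~EXIT_QUITPARENT;
        exitcode |= EXIT_QUITLOOP;
    }
    return exitcode;
}
```

So an inner loop that saw QUITPARENT|KEEPHANDLERS hands QUITLOOP|KEEPHANDLERS back to its parent, which then stops and keeps its handler list.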

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 savevm.c | 136 +++++++++++++++++++++++++++++++++++++++------------------------
 1 file changed, 84 insertions(+), 52 deletions(-)

diff --git a/savevm.c b/savevm.c
index 2c0d61a..7236232 100644
--- a/savevm.c
+++ b/savevm.c
@@ -915,6 +915,26 @@ static SaveStateEntry *find_se(const char *idstr, int instance_id)
     return NULL;
 }
 
+/* These are ORable flags */
+const int LOADVM_EXITCODE_QUITLOOP     =  1;
+const int LOADVM_EXITCODE_QUITPARENT   =  2;
+const int LOADVM_EXITCODE_KEEPHANDLERS =  4;
+
+typedef struct LoadStateEntry {
+    QLIST_ENTRY(LoadStateEntry) entry;
+    SaveStateEntry *se;
+    int section_id;
+    int version_id;
+} LoadStateEntry;
+
+typedef QLIST_HEAD(, LoadStateEntry) LoadStateEntry_Head;
+
+static LoadStateEntry_Head loadvm_handlers =
+ QLIST_HEAD_INITIALIZER(loadvm_handlers);
+
+static int qemu_loadvm_state_main(QEMUFile *f,
+                                  LoadStateEntry_Head *loadvm_handlers);
+
 static int loadvm_process_command_simple_lencheck(const char *name,
                                                   unsigned int actual,
                                                   unsigned int expected)
@@ -931,8 +951,11 @@ static int loadvm_process_command_simple_lencheck(const char *name,
 /*
  * Process an incoming 'QEMU_VM_COMMAND'
  * negative return on error (will issue error message)
+ * 0   just a normal return
+ * 1   All good, but exit the loop
  */
-static int loadvm_process_command(QEMUFile *f)
+static int loadvm_process_command(QEMUFile *f,
+                                  LoadStateEntry_Head *loadvm_handlers)
 {
     MigrationIncomingState *mis = migration_incoming_get_current();
     uint16_t com;
@@ -982,39 +1005,13 @@ static int loadvm_process_command(QEMUFile *f)
     return 0;
 }
 
-typedef struct LoadStateEntry {
-    QLIST_ENTRY(LoadStateEntry) entry;
-    SaveStateEntry *se;
-    int section_id;
-    int version_id;
-} LoadStateEntry;
-
-int qemu_loadvm_state(QEMUFile *f)
+static int qemu_loadvm_state_main(QEMUFile *f,
+                                  LoadStateEntry_Head *loadvm_handlers)
 {
-    QLIST_HEAD(, LoadStateEntry) loadvm_handlers =
-        QLIST_HEAD_INITIALIZER(loadvm_handlers);
-    LoadStateEntry *le, *new_le;
+    LoadStateEntry *le;
     uint8_t section_type;
-    unsigned int v;
     int ret;
-
-    if (qemu_savevm_state_blocked(NULL)) {
-        return -EINVAL;
-    }
-
-    v = qemu_get_be32(f);
-    if (v != QEMU_VM_FILE_MAGIC) {
-        return -EINVAL;
-    }
-
-    v = qemu_get_be32(f);
-    if (v == QEMU_VM_FILE_VERSION_COMPAT) {
-        error_report("SaveVM v2 format is obsolete and don't work anymore");
-        return -ENOTSUP;
-    }
-    if (v != QEMU_VM_FILE_VERSION) {
-        return -ENOTSUP;
-    }
+    int exitcode = 0;
 
     while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
         uint32_t instance_id, version_id, section_id;
@@ -1043,16 +1040,14 @@ int qemu_loadvm_state(QEMUFile *f)
             if (se == NULL) {
                 error_report("Unknown savevm section or instance '%s' %d",
                              idstr, instance_id);
-                ret = -EINVAL;
-                goto out;
+                return -EINVAL;
             }
 
             /* Validate version */
             if (version_id > se->version_id) {
                 error_report("savevm: unsupported version %d for '%s' v%d",
                         version_id, idstr, se->version_id);
-                ret = -EINVAL;
-                goto out;
+                return -EINVAL;
             }
 
             /* Add entry */
@@ -1061,14 +1056,14 @@ int qemu_loadvm_state(QEMUFile *f)
             le->se = se;
             le->section_id = section_id;
             le->version_id = version_id;
-            QLIST_INSERT_HEAD(&loadvm_handlers, le, entry);
+            QLIST_INSERT_HEAD(loadvm_handlers, le, entry);
 
             ret = vmstate_load(f, le->se, le->version_id);
             if (ret < 0) {
                 error_report("qemu: error while loading state for "
                              "instance 0x%x of device '%s'",
                              instance_id, idstr);
-                goto out;
+                return ret;
             }
             break;
         case QEMU_VM_SECTION_PART:
@@ -1076,47 +1071,84 @@ int qemu_loadvm_state(QEMUFile *f)
             section_id = qemu_get_be32(f);
 
             DPRINTF("QEMU_VM_SECTION_PART/END entry for id=%d", section_id);
-            QLIST_FOREACH(le, &loadvm_handlers, entry) {
+            QLIST_FOREACH(le, loadvm_handlers, entry) {
                 if (le->section_id == section_id) {
                     break;
                 }
             }
             if (le == NULL) {
                 error_report("Unknown savevm section %d", section_id);
-                ret = -EINVAL;
-                goto out;
+                return -EINVAL;
             }
 
             ret = vmstate_load(f, le->se, le->version_id);
             if (ret < 0) {
                 error_report("qemu: error while loading state section"
                              " id %d (%s)", section_id, le->se->idstr);
-                goto out;
+                return ret;
             }
             DPRINTF("QEMU_VM_SECTION_PART/END done for id=%d", section_id);
             break;
         case QEMU_VM_COMMAND:
-            ret = loadvm_process_command(f);
-            if (ret < 0) {
-                goto out;
+            ret = loadvm_process_command(f, loadvm_handlers);
+            DPRINTF("%s QEMU_VM_COMMAND ret: %d", __func__, ret);
+            if ((ret < 0) || (ret & LOADVM_EXITCODE_QUITLOOP)) {
+                return ret;
             }
+            exitcode |= ret; /* Lets us pass flags up to the parent */
             break;
         default:
             error_report("Unknown savevm section type %d", section_type);
-            ret = -EINVAL;
-            goto out;
+            return -EINVAL;
         }
     }
     DPRINTF("qemu_loadvm_state loop: exited loop");
 
-    cpu_synchronize_all_post_init();
+    if (exitcode & LOADVM_EXITCODE_QUITPARENT) {
+        DPRINTF("qemu_loadvm_state_main: End of loop with QUITPARENT");
+        exitcode &= ~LOADVM_EXITCODE_QUITPARENT;
+        exitcode |= LOADVM_EXITCODE_QUITLOOP;
+    }
+
+    return exitcode;
+}
+
+int qemu_loadvm_state(QEMUFile *f)
+{
+    LoadStateEntry *le, *new_le;
+    unsigned int v;
+    int ret;
+
+    if (qemu_savevm_state_blocked(NULL)) {
+        return -EINVAL;
+    }
+
+    v = qemu_get_be32(f);
+    if (v != QEMU_VM_FILE_MAGIC) {
+        return -EINVAL;
+    }
 
-    ret = 0;
+    v = qemu_get_be32(f);
+    if (v == QEMU_VM_FILE_VERSION_COMPAT) {
+        error_report("SaveVM v2 format is obsolete and don't work anymore");
+        return -ENOTSUP;
+    }
+    if (v != QEMU_VM_FILE_VERSION) {
+        return -ENOTSUP;
+    }
+
+    QLIST_INIT(&loadvm_handlers);
+    ret = qemu_loadvm_state_main(f, &loadvm_handlers);
 
-out:
-    QLIST_FOREACH_SAFE(le, &loadvm_handlers, entry, new_le) {
-        QLIST_REMOVE(le, entry);
-        g_free(le);
+    if (ret == 0) {
+        cpu_synchronize_all_post_init();
+    }
+
+    if ((ret < 0) || !(ret & LOADVM_EXITCODE_KEEPHANDLERS)) {
+        QLIST_FOREACH_SAFE(le, &loadvm_handlers, entry, new_le) {
+            QLIST_REMOVE(le, entry);
+            g_free(le);
+        }
     }
 
     if (ret == 0) {
-- 
1.9.3


* [Qemu-devel] [PATCH v4 20/47] Add migration-capability boolean for postcopy-ram.
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (18 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 19/47] Rework loadvm path for subloops Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-10-06 18:59   ` Eric Blake
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 21/47] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages Dr. David Alan Gilbert (git)
                   ` (28 subsequent siblings)
  48 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 include/migration/migration.h | 1 +
 migration.c                   | 9 +++++++++
 qapi-schema.json              | 6 +++++-
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index ff47987..0d9f62d 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -173,6 +173,7 @@ void migrate_add_blocker(Error *reason);
  */
 void migrate_del_blocker(Error *reason);
 
+bool migrate_postcopy_ram(void);
 bool migrate_rdma_pin_all(void);
 bool migrate_zero_blocks(void);
 
diff --git a/migration.c b/migration.c
index ee6db1d..527423e 100644
--- a/migration.c
+++ b/migration.c
@@ -658,6 +658,15 @@ bool migrate_rdma_pin_all(void)
     return s->enabled_capabilities[MIGRATION_CAPABILITY_RDMA_PIN_ALL];
 }
 
+bool migrate_postcopy_ram(void)
+{
+    MigrationState *s;
+
+    s = migrate_get_current();
+
+    return s->enabled_capabilities[MIGRATION_CAPABILITY_X_POSTCOPY_RAM];
+}
+
 bool migrate_auto_converge(void)
 {
     MigrationState *s;
diff --git a/qapi-schema.json b/qapi-schema.json
index 2e9e261..cdf5290 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -494,10 +494,14 @@
 # @auto-converge: If enabled, QEMU will automatically throttle down the guest
 #          to speed up convergence of RAM migration. (since 1.6)
 #
+# @x-postcopy-ram: Start executing on the migration target before all of RAM has been
+#          migrated, pulling the remaining pages along as needed. NOTE: If the
+#          migration fails during postcopy the VM will fail.  (since 2.2)
+#
 # Since: 1.2
 ##
 { 'enum': 'MigrationCapability',
-  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks'] }
+  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks', 'x-postcopy-ram'] }
 
 ##
 # @MigrationCapabilityStatus
-- 
1.9.3


* [Qemu-devel] [PATCH v4 21/47] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages.
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (19 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 20/47] Add migration-capability boolean for postcopy-ram Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-11-03  5:51   ` David Gibson
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 22/47] QEMU_VM_CMD_PACKAGED: Send a packaged chunk of migration stream Dr. David Alan Gilbert (git)
                   ` (27 subsequent siblings)
  48 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add state variable showing current incoming postcopy state.
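The CMD_POSTCOPY_RAM_DISCARD payload documented in this patch (version byte, offset byte, name length byte, the unterminated RAMBlock name, then pairs of be64 start address and be32 mask) can be round-tripped in a standalone sketch. discard_encode() and discard_decode_header() are hypothetical helpers, not part of the series; they write byte by byte so no alignment is assumed. The decoder only validates the header and counts entries, and is laxer than the handler in this patch in that it accepts a payload with zero entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store/load big-endian integers byte by byte (no alignment assumptions). */
static void put_be(uint8_t *p, uint64_t v, int nbytes)
{
    for (int i = nbytes - 1; i >= 0; i--) {
        p[i] = (uint8_t)v;
        v >>= 8;
    }
}

static uint64_t get_be(const uint8_t *p, int nbytes)
{
    uint64_t v = 0;

    for (int i = 0; i < nbytes; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/*
 * Encode a DISCARD payload: version (0), offset, name length, name bytes
 * (not NUL terminated), then 'n' pairs of (be64 start address, be32 mask).
 * 'out' must hold at least 3 + strlen(name) + 12 * n bytes.
 * Returns the payload length.
 */
static size_t discard_encode(uint8_t *out, const char *name, uint8_t offset,
                             const uint64_t *addrs, const uint32_t *masks,
                             uint16_t n)
{
    size_t namelen = strlen(name);
    size_t pos;

    assert(namelen < 256);
    out[0] = 0;                         /* version */
    out[1] = offset;
    out[2] = (uint8_t)namelen;
    memcpy(out + 3, name, namelen);
    pos = 3 + namelen;
    for (uint16_t i = 0; i < n; i++) {
        put_be(out + pos, addrs[i], 8);
        put_be(out + pos + 8, masks[i], 4);
        pos += 12;
    }
    return pos;
}

/*
 * Validate the header of a DISCARD payload of 'len' bytes.
 * Returns the number of (address, mask) pairs, or -1 if malformed.
 * 'name' receives a NUL-terminated copy of the RAMBlock name.
 */
static int discard_decode_header(const uint8_t *in, size_t len,
                                 char name[256], uint8_t *offset)
{
    size_t namelen;

    if (len < 3 || in[0] != 0) {
        return -1;                      /* too short or unknown version */
    }
    *offset = in[1];
    namelen = in[2];
    if (len < 3 + namelen || (len - 3 - namelen) % 12 != 0) {
        return -1;                      /* name overruns or partial entry */
    }
    memcpy(name, in + 3, namelen);
    name[namelen] = '\0';
    return (int)((len - 3 - namelen) / 12);
}
```

For a block named "pc.ram" with two entries the payload is 3 + 6 + 2 * 12 = 33 bytes; dropping the last byte leaves a partial entry, which the decoder rejects.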

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |   8 +
 include/sysemu/sysemu.h       |  20 +++
 savevm.c                      | 335 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 363 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 0d9f62d..2c078c4 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -61,6 +61,14 @@ typedef struct MigrationState MigrationState;
 struct MigrationIncomingState {
     QEMUFile *file;
 
+    volatile enum {
+        POSTCOPY_RAM_INCOMING_NONE = 0,  /* Initial state - no postcopy */
+        POSTCOPY_RAM_INCOMING_ADVISE,
+        POSTCOPY_RAM_INCOMING_LISTENING,
+        POSTCOPY_RAM_INCOMING_RUNNING,
+        POSTCOPY_RAM_INCOMING_END
+    } postcopy_ram_state;
+
     QEMUFile *return_path;
     QemuMutex      rp_mutex;    /* We send replies from multiple threads */
 };
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index ad96f2a..102dd93 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -88,6 +88,16 @@ enum qemu_vm_cmd {
     QEMU_VM_CMD_OPENRP,        /* Tell the dest to open the Return path */
     QEMU_VM_CMD_REQACK,        /* Request an ACK on the RP */
 
+    QEMU_VM_CMD_POSTCOPY_RAM_ADVISE = 20,  /* Prior to any page transfers,
+                                              just warn we might want to
+                                              do postcopy */
+    QEMU_VM_CMD_POSTCOPY_RAM_DISCARD,      /* A list of pages to discard that
+                                              were previously sent during
+                                              precopy but are dirty. */
+    QEMU_VM_CMD_POSTCOPY_RAM_LISTEN,       /* Start listening for incoming
+                                              pages as it's running. */
+    QEMU_VM_CMD_POSTCOPY_RAM_RUN,          /* Start execution */
+    QEMU_VM_CMD_POSTCOPY_RAM_END,          /* Postcopy is finished. */
+
     QEMU_VM_CMD_AFTERLASTVALID
 };
 
@@ -102,6 +112,16 @@ void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
                               uint16_t len, uint8_t *data);
 void qemu_savevm_send_reqack(QEMUFile *f, uint32_t value);
 void qemu_savevm_send_openrp(QEMUFile *f);
+void qemu_savevm_send_postcopy_ram_advise(QEMUFile *f);
+void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
+                                           uint16_t len, uint8_t offset,
+                                           uint64_t *addrlist,
+                                           uint32_t *masklist);
+
+void qemu_savevm_send_postcopy_ram_listen(QEMUFile *f);
+void qemu_savevm_send_postcopy_ram_run(QEMUFile *f);
+void qemu_savevm_send_postcopy_ram_end(QEMUFile *f, uint8_t status);
+
 int qemu_loadvm_state(QEMUFile *f);
 
 /* SLIRP */
diff --git a/savevm.c b/savevm.c
index 7236232..b942e8c 100644
--- a/savevm.c
+++ b/savevm.c
@@ -39,6 +39,7 @@
 #include "exec/memory.h"
 #include "qmp-commands.h"
 #include "trace.h"
+#include "qemu/bitops.h"
 #include "qemu/iov.h"
 #include "block/snapshot.h"
 #include "block/qapi.h"
@@ -624,6 +625,92 @@ void qemu_savevm_send_openrp(QEMUFile *f)
 {
     qemu_savevm_command_send(f, QEMU_VM_CMD_OPENRP, 0, NULL);
 }
+
+/* Send prior to any RAM transfer */
+void qemu_savevm_send_postcopy_ram_advise(QEMUFile *f)
+{
+    DPRINTF("send postcopy-ram-advise");
+    uint64_t tmp[2];
+    tmp[0] = cpu_to_be64(sysconf(_SC_PAGESIZE));
+    tmp[1] = cpu_to_be64(1ul << qemu_target_page_bits());
+
+    qemu_savevm_command_send(f, QEMU_VM_CMD_POSTCOPY_RAM_ADVISE, 16,
+                             (uint8_t *)tmp);
+}
+
+/* Prior to running, to cause pages that have been dirtied after precopy
+ * started to be discarded on the destination.
+ * CMD_POSTCOPY_RAM_DISCARD consists of:
+ *  3 byte header (filled in by qemu_savevm_send_postcopy_ram_discard)
+ *      byte   version (0)
+ *      byte   offset into the 1st data word containing 1st page of RAMBlock
+ *      byte   Length of name field
+ *  n x byte   RAM block name (NOT 0 terminated)
+ *  len x
+ *      be64   Page addresses for start of an invalidation range
+ *      be32   mask of 32 pages, '1' to discard
+ *
+ *  Hopefully this is pretty sparse so we don't get too many entries,
+ *  and using the mask should deal with most pagesize differences
+ *  just ending up as a single full mask
+ *
+ * The mask is always 32 bits irrespective of the long size
+ *
+ *  name:  RAMBlock name that these entries are part of
+ *  len: Number of page entries
+ *  addrlist: 'len' addresses
+ *  masklist: 'len' masks (corresponding to the addresses)
+ */
+void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
+                                           uint16_t len, uint8_t offset,
+                                           uint64_t *addrlist,
+                                           uint32_t *masklist)
+{
+    uint8_t *buf;
+    uint16_t tmplen;
+    uint16_t t;
+
+    DPRINTF("send postcopy-ram-discard");
+    buf = g_malloc0(len*12 + strlen(name) + 3);
+    buf[0] = 0; /* Version */
+    buf[1] = offset;
+    assert(strlen(name) < 256);
+    buf[2] = strlen(name);
+    memcpy(buf+3, name, strlen(name));
+    tmplen = 3+strlen(name);
+
+    for (t = 0; t < len; t++) {
+        cpu_to_be64w((uint64_t *)(buf + tmplen), addrlist[t]);
+        tmplen += 8;
+        cpu_to_be32w((uint32_t *)(buf + tmplen), masklist[t]);
+        tmplen += 4;
+    }
+    qemu_savevm_command_send(f, QEMU_VM_CMD_POSTCOPY_RAM_DISCARD,
+                             tmplen, buf);
+    g_free(buf);
+}
+
+/* Get the destination into a state where it can receive page data. */
+void qemu_savevm_send_postcopy_ram_listen(QEMUFile *f)
+{
+    DPRINTF("send postcopy-ram-listen");
+    qemu_savevm_command_send(f, QEMU_VM_CMD_POSTCOPY_RAM_LISTEN, 0, NULL);
+}
+
+/* Kick the destination into running */
+void qemu_savevm_send_postcopy_ram_run(QEMUFile *f)
+{
+    DPRINTF("send postcopy-ram-run");
+    qemu_savevm_command_send(f, QEMU_VM_CMD_POSTCOPY_RAM_RUN, 0, NULL);
+}
+
+/* End of postcopy - with a status byte; 0 is good, anything else is a fail */
+void qemu_savevm_send_postcopy_ram_end(QEMUFile *f, uint8_t status)
+{
+    DPRINTF("send postcopy-ram-end");
+    qemu_savevm_command_send(f, QEMU_VM_CMD_POSTCOPY_RAM_END, 1, &status);
+}
+
 bool qemu_savevm_state_blocked(Error **errp)
 {
     SaveStateEntry *se;
@@ -935,6 +1022,220 @@ static LoadStateEntry_Head loadvm_handlers =
 static int qemu_loadvm_state_main(QEMUFile *f,
                                   LoadStateEntry_Head *loadvm_handlers);
 
+/* ------ incoming postcopy-ram messages ------ */
+/* 'advise' arrives before any RAM transfers just to tell us that a postcopy
+ * *might* happen - it might be skipped if precopy transferred everything
+ * quickly.
+ */
+static int loadvm_postcopy_ram_handle_advise(MigrationIncomingState *mis,
+                                             uint64_t remote_hps,
+                                             uint64_t remote_tps)
+{
+    DPRINTF("%s", __func__);
+    if (mis->postcopy_ram_state != POSTCOPY_RAM_INCOMING_NONE) {
+        error_report("CMD_POSTCOPY_RAM_ADVISE in wrong postcopy state (%d)",
+                     mis->postcopy_ram_state);
+        return -1;
+    }
+
+    if (remote_hps != sysconf(_SC_PAGESIZE))  {
+        /*
+         * Some combinations of mismatch are probably possible but it gets
+         * a bit more complicated.  In particular we need to place whole
+         * host pages on the dest at once, and we need to ensure that we
+         * handle dirtying to make sure we never end up sending part of
+         * a hostpage on its own.
+         */
+        error_report("Postcopy needs matching host page sizes (s=%d d=%d)",
+                     (int)remote_hps, (int)sysconf(_SC_PAGESIZE));
+        return -1;
+    }
+
+    if (remote_tps != (1ul << qemu_target_page_bits())) {
+        /*
+         * Again, some differences could be dealt with, but for now keep it
+         * simple.
+         */
+        error_report("Postcopy needs matching target page sizes (s=%d d=%d)",
+                     (int)remote_tps, 1 << qemu_target_page_bits());
+        return -1;
+    }
+
+    mis->postcopy_ram_state = POSTCOPY_RAM_INCOMING_ADVISE;
+
+    /*
+     * Postcopy will be sending lots of small messages along the return path
+     * that it needs quick answers to.
+     */
+    socket_set_nodelay(qemu_get_fd(mis->return_path));
+
+    return 0;
+}
+
+/* After postcopy we will be told to throw some pages away since they're
+ * dirty and will have to be demand fetched.  Must happen before CPU is
+ * started.
+ * There can be 0..many of these messages, each encoding multiple pages.
+ * Bits set in the message represent a page in the source VM's bitmap, but
+ * since the guest/target page sizes can be different on s/d then we have
+ * to convert.
+ */
+static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
+                                              uint16_t len)
+{
+    int tmp;
+    unsigned int first_bit_offset;
+    char ramid[256];
+
+    DPRINTF("%s", __func__);
+
+    if (mis->postcopy_ram_state != POSTCOPY_RAM_INCOMING_ADVISE) {
+        error_report("CMD_POSTCOPY_RAM_DISCARD in wrong postcopy state (%d)",
+                     mis->postcopy_ram_state);
+        return -1;
+    }
+    /* We're expecting a
+     *    3 byte header,
+     *    a RAM ID string
+     *    then at least one 12 byte chunk
+     */
+    if (len < 16) {
+        error_report("CMD_POSTCOPY_RAM_DISCARD invalid length (%d)", len);
+        return -1;
+    }
+
+    tmp = qemu_get_byte(mis->file);
+    if (tmp != 0) {
+        error_report("CMD_POSTCOPY_RAM_DISCARD invalid version (%d)", tmp);
+        return -1;
+    }
+    first_bit_offset = qemu_get_byte(mis->file);
+
+    if (qemu_get_counted_string(mis->file, (uint8_t *)ramid)) {
+        error_report("CMD_POSTCOPY_RAM_DISCARD Failed to read RAMBlock ID");
+        return -1;
+    }
+
+    len -= 3+strlen(ramid);
+    if (len % 12) {
+        error_report("CMD_POSTCOPY_RAM_DISCARD invalid length (%d)", len);
+        return -1;
+    }
+    while (len) {
+        uint64_t startaddr;
+        uint32_t mask;
+        /*
+         * We now have pairs of address, mask
+         *   The mask is 32 bits of bitmask starting at 'startaddr'-offset
+         *   RAMBlock; e.g. if the RAMBlock started at 8k where TPS=4k
+         *   then first_bit_offset=2 and the 1st 2 bits of the mask
+         *   aren't relevant to this RAMBlock, and bit 2 corresponds
+         *   to the 1st page of this RAMBlock
+         */
+        startaddr = qemu_get_be64(mis->file);
+        mask = qemu_get_be32(mis->file);
+
+        len -= 12;
+
+        while (mask) {
+            /* mask= .....?10...0 */
+            /*             ^fs    */
+            int firstset = ctz32(mask);
+
+            /* tmp32=.....?11...1 */
+            /*             ^fs    */
+            uint32_t tmp32 = mask | ((((uint32_t)1)<<firstset)-1);
+
+            /* mask= .?01..10...0 */
+            /*         ^fz ^fs    */
+            int firstzero = cto32(tmp32);
+
+            if ((startaddr == 0) && (firstset < first_bit_offset)) {
+                error_report("CMD_POSTCOPY_RAM_DISCARD bad data; bit set"
+                             " prior to block; block=%s offset=%u"
+                             " firstset=%d", ramid, first_bit_offset,
+                             firstset);
+                return -1;
+            }
+
+            /*
+             * We know there must be at least 1 bit set, since the loop
+             * condition tested 'mask'.  If there is no 0 bit above it,
+             * firstzero will be 32.
+             */
+            /* TODO - ram_discard_range gets added in a later patch
+            int ret = ram_discard_range(mis, ramid,
+                                startaddr + firstset - first_bit_offset,
+                                startaddr + (firstzero - 1) - first_bit_offset);
+            ret = -1;
+            if (ret) {
+                return ret;
+            }
+            */
+
+            /* mask= .?0000000000 */
+            /*         ^fz ^fs    */
+            if (firstzero != 32) {
+                mask &= (((uint32_t)-1) << firstzero);
+            } else {
+                mask = 0;
+            }
+        }
+    }
+    DPRINTF("%s finished", __func__);
+
+    return 0;
+}
+
+/* After this message we must be able to immediately receive page data */
+static int loadvm_postcopy_ram_handle_listen(MigrationIncomingState *mis)
+{
+    DPRINTF("%s", __func__);
+    if (mis->postcopy_ram_state != POSTCOPY_RAM_INCOMING_ADVISE) {
+        error_report("CMD_POSTCOPY_RAM_LISTEN in wrong postcopy state (%d)",
+                     mis->postcopy_ram_state);
+        return -1;
+    }
+
+    mis->postcopy_ram_state = POSTCOPY_RAM_INCOMING_LISTENING;
+
+    /* TODO start up the postcopy listening thread */
+    return 0;
+}
+
+/* After all discards we can start running and asking for pages */
+static int loadvm_postcopy_ram_handle_run(MigrationIncomingState *mis)
+{
+    DPRINTF("%s", __func__);
+    if (mis->postcopy_ram_state != POSTCOPY_RAM_INCOMING_LISTENING) {
+        error_report("CMD_POSTCOPY_RAM_RUN in wrong postcopy state (%d)",
+                     mis->postcopy_ram_state);
+        return -1;
+    }
+
+    mis->postcopy_ram_state = POSTCOPY_RAM_INCOMING_RUNNING;
+    if (autostart) {
+        /* Hold onto your hats, starting the CPU */
+        vm_start();
+    } else {
+        /* leave it paused and let management decide when to start the CPU */
+        runstate_set(RUN_STATE_PAUSED);
+    }
+
+    return 0;
+}
+
+/* The end - with a byte from the source which can tell us to fail. */
+static int loadvm_postcopy_ram_handle_end(MigrationIncomingState *mis)
+{
+    DPRINTF("%s", __func__);
+    if (mis->postcopy_ram_state == POSTCOPY_RAM_INCOMING_NONE) {
+        error_report("CMD_POSTCOPY_RAM_END in wrong postcopy state (%d)",
+                     mis->postcopy_ram_state);
+        return -1;
+    }
+    return -1; /* TODO - expecting 1 byte good/fail */
+}
+
 static int loadvm_process_command_simple_lencheck(const char *name,
                                                   unsigned int actual,
                                                   unsigned int expected)
@@ -961,6 +1262,7 @@ static int loadvm_process_command(QEMUFile *f,
     uint16_t com;
     uint16_t len;
     uint32_t tmp32;
+    uint64_t tmp64a, tmp64b;
 
     com = qemu_get_be16(f);
     len = qemu_get_be16(f);
@@ -997,6 +1299,39 @@ static int loadvm_process_command(QEMUFile *f,
         migrate_send_rp_ack(mis, tmp32);
         break;
 
+    case QEMU_VM_CMD_POSTCOPY_RAM_ADVISE:
+        if (loadvm_process_command_simple_lencheck("CMD_POSTCOPY_RAM_ADVISE",
+                                                   len, 16)) {
+            return -1;
+        }
+        tmp64a = qemu_get_be64(f); /* hps */
+        tmp64b = qemu_get_be64(f); /* tps */
+        return loadvm_postcopy_ram_handle_advise(mis, tmp64a, tmp64b);
+
+    case QEMU_VM_CMD_POSTCOPY_RAM_DISCARD:
+        return loadvm_postcopy_ram_handle_discard(mis, len);
+
+    case QEMU_VM_CMD_POSTCOPY_RAM_LISTEN:
+        if (loadvm_process_command_simple_lencheck("CMD_POSTCOPY_RAM_LISTEN",
+                                                   len, 0)) {
+            return -1;
+        }
+        return loadvm_postcopy_ram_handle_listen(mis);
+
+    case QEMU_VM_CMD_POSTCOPY_RAM_RUN:
+        if (loadvm_process_command_simple_lencheck("CMD_POSTCOPY_RAM_RUN",
+                                                   len, 0)) {
+            return -1;
+        }
+        return loadvm_postcopy_ram_handle_run(mis);
+
+    case QEMU_VM_CMD_POSTCOPY_RAM_END:
+        if (loadvm_process_command_simple_lencheck("CMD_POSTCOPY_RAM_END",
+                                                   len, 1)) {
+            return -1;
+        }
+        return loadvm_postcopy_ram_handle_end(mis);
+
     default:
         error_report("VM_COMMAND 0x%x unknown (len 0x%x)", com, len);
         return -1;
-- 
1.9.3
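
The discard handler above walks each 32-bit mask as runs of consecutive set
bits, finding the start of a run with ctz32() and its end with cto32().  The
following is a minimal standalone sketch of that walk, assuming a GCC/Clang
toolchain: __builtin_ctz() stands in for QEMU's ctz32()/cto32(), and the
names decode_discard_mask(), ctz32_sketch() and cto32_sketch() are
hypothetical.  Instead of calling ram_discard_range() (which the patch leaves
as a TODO), it records each run as an inclusive [first, last] page pair.

```c
#include <stdint.h>

/* Count trailing zeros / ones; stand-ins for QEMU's ctz32()/cto32(). */
static int ctz32_sketch(uint32_t v)
{
    return v ? __builtin_ctz(v) : 32;
}

static int cto32_sketch(uint32_t v)
{
    return ctz32_sketch(~v);
}

/*
 * Walk 'mask' as runs of consecutive set bits.  Bit i stands for page
 * (start + i - first_bit_offset); each run is stored in 'runs' as an
 * inclusive [first, last] pair.  Returns the number of runs, or -1 if a
 * bit below first_bit_offset is set in the block's first mask
 * (start == 0) or if 'runs' is too small.
 */
static int decode_discard_mask(uint64_t start, uint32_t mask,
                               unsigned first_bit_offset,
                               uint64_t runs[][2], int max_runs)
{
    int n = 0;

    while (mask) {
        int firstset = ctz32_sketch(mask);
        /* Fill the zeros below firstset so cto finds the end of this run */
        uint32_t filled = mask | ((UINT32_C(1) << firstset) - 1);
        int firstzero = cto32_sketch(filled);

        if (start == 0 && (unsigned)firstset < first_bit_offset) {
            return -1;
        }
        if (n == max_runs) {
            return -1;
        }
        runs[n][0] = start + firstset - first_bit_offset;
        runs[n][1] = start + (firstzero - 1) - first_bit_offset;
        n++;

        /* Clear the run just handled; firstzero == 32 means it hit bit 31 */
        mask = (firstzero == 32) ? 0 : mask & (UINT32_MAX << firstzero);
    }
    return n;
}
```

For example, a mask of 0x76 (bits 1, 2, 4, 5, 6) with first_bit_offset=1 and
start=0 yields the two runs [0, 1] and [3, 5].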


* [Qemu-devel] [PATCH v4 22/47] QEMU_VM_CMD_PACKAGED: Send a packaged chunk of migration stream
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (20 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 21/47] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-11-04  1:28   ` David Gibson
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 23/47] migrate_init: Call from savevm Dr. David Alan Gilbert (git)
                   ` (26 subsequent siblings)
  48 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

QEMU_VM_CMD_PACKAGED is a migration command that allows a chunk
of migration stream to be sent in one go and received by a
separate instance of the loadvm loop, which reads only from the
package and not from the main migration stream.

This is used by postcopy to load device state (from the package)
while loading memory pages from the main stream.
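
On the wire, the command body carries only a 4-byte big-endian length and
the packaged bytes follow immediately after it.  A minimal sketch of that
framing over plain memory buffers (rather than a QEMUFile) might look like
this; package_encode(), package_decode() and SKETCH_MAX_PACKAGED_SIZE are
hypothetical names, with the limit chosen to match MAX_VM_CMD_PACKAGED_SIZE
in the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_PACKAGED_SIZE (1ul << 24)  /* same limit as the patch */

/* Write a 4-byte big-endian length followed by the payload into 'out'.
 * Returns bytes written, or 0 if 'out' is too small or 'len' too large. */
static size_t package_encode(const uint8_t *payload, uint32_t len,
                             uint8_t *out, size_t out_size)
{
    if (len > SKETCH_MAX_PACKAGED_SIZE || out_size < 4 + (size_t)len) {
        return 0;
    }
    out[0] = (uint8_t)(len >> 24);
    out[1] = (uint8_t)(len >> 16);
    out[2] = (uint8_t)(len >> 8);
    out[3] = (uint8_t)len;
    memcpy(out + 4, payload, len);
    return 4 + (size_t)len;
}

/* Read one package from 'in'.  On success return a malloc'd copy of the
 * payload (caller frees) and store its length in *len_out; NULL on error. */
static uint8_t *package_decode(const uint8_t *in, size_t in_size,
                               uint32_t *len_out)
{
    uint32_t len;
    uint8_t *buf;

    if (in_size < 4) {
        return NULL;
    }
    len = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
          ((uint32_t)in[2] << 8) | in[3];
    /* Bound the length before allocating, then require all of it */
    if (len > SKETCH_MAX_PACKAGED_SIZE || in_size - 4 < len) {
        return NULL;
    }
    buf = malloc(len ? len : 1);
    if (!buf) {
        return NULL;
    }
    memcpy(buf, in + 4, len);
    *len_out = len;
    return buf;
}
```

As in loadvm_handle_cmd_packaged(), the receiver checks the length against
the limit before allocating, so a corrupt stream cannot request an arbitrary
allocation, and a short read is treated as a failure.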

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/sysemu/sysemu.h |  4 +++
 savevm.c                | 82 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 86 insertions(+)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 102dd93..ef98fa9 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -87,6 +87,7 @@ enum qemu_vm_cmd {
     QEMU_VM_CMD_INVALID = 0,   /* Must be 0 */
     QEMU_VM_CMD_OPENRP,        /* Tell the dest to open the Return path */
     QEMU_VM_CMD_REQACK,        /* Request an ACK on the RP */
+    QEMU_VM_CMD_PACKAGED,      /* Send a wrapped stream within this stream */
 
     QEMU_VM_CMD_POSTCOPY_RAM_ADVISE = 20,  /* Prior to any page transfers, just
                                               warn we might want to do PC */
@@ -101,6 +102,8 @@ enum qemu_vm_cmd {
     QEMU_VM_CMD_AFTERLASTVALID
 };
 
+#define MAX_VM_CMD_PACKAGED_SIZE (1ul << 24)
+
 bool qemu_savevm_state_blocked(Error **errp);
 void qemu_savevm_state_begin(QEMUFile *f,
                              const MigrationParams *params);
@@ -112,6 +115,7 @@ void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
                               uint16_t len, uint8_t *data);
 void qemu_savevm_send_reqack(QEMUFile *f, uint32_t value);
 void qemu_savevm_send_openrp(QEMUFile *f);
+void qemu_savevm_send_packaged(QEMUFile *f, const QEMUSizedBuffer *qsb);
 void qemu_savevm_send_postcopy_ram_advise(QEMUFile *f);
 void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
                                            uint16_t len, uint8_t offset,
diff --git a/savevm.c b/savevm.c
index b942e8c..bffe890 100644
--- a/savevm.c
+++ b/savevm.c
@@ -626,6 +626,38 @@ void qemu_savevm_send_openrp(QEMUFile *f)
     qemu_savevm_command_send(f, QEMU_VM_CMD_OPENRP, 0, NULL);
 }
 
+/* We have a buffer of data to send; we don't want that all to be loaded
+ * by the command itself, so the command contains just the length of the
+ * extra buffer that we then send straight after it.
+ * TODO: There must be a better way to organise this
+ */
+void qemu_savevm_send_packaged(QEMUFile *f, const QEMUSizedBuffer *qsb)
+{
+    size_t cur_iov;
+    size_t len = qsb_get_length(qsb);
+    uint32_t tmp;
+
+    tmp = cpu_to_be32(len);
+
+    DPRINTF("send_packaged");
+    qemu_savevm_command_send(f, QEMU_VM_CMD_PACKAGED, 4, (uint8_t *)&tmp);
+
+    /* all the data follows (concatenating the iovs) */
+    for (cur_iov = 0; cur_iov < qsb->n_iov; cur_iov++) {
+        /* The iov entries are partially filled */
+        size_t towrite = (qsb->iov[cur_iov].iov_len > len) ?
+                              len :
+                              qsb->iov[cur_iov].iov_len;
+        len -= towrite;
+
+        if (!towrite) {
+            break;
+        }
+
+        qemu_put_buffer(f, qsb->iov[cur_iov].iov_base, towrite);
+    }
+}
+
 /* Send prior to any RAM transfer */
 void qemu_savevm_send_postcopy_ram_advise(QEMUFile *f)
 {
@@ -1249,6 +1281,48 @@ static int loadvm_process_command_simple_lencheck(const char *name,
     return 0;
 }
 
+/* Immediately following this command is a blob of data containing an embedded
+ * chunk of migration stream; read it and load it.
+ */
+static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis,
+                                      uint32_t length,
+                                      LoadStateEntry_Head *loadvm_handlers)
+{
+    int ret;
+    uint8_t *buffer;
+    QEMUSizedBuffer *qsb;
+
+    DPRINTF("loadvm_handle_cmd_packaged: length=%u", length);
+
+    if (length > MAX_VM_CMD_PACKAGED_SIZE) {
+        error_report("Unreasonably large packaged state: %u", length);
+        return -1;
+    }
+    buffer = g_malloc0(length);
+    ret = qemu_get_buffer(mis->file, buffer, (int)length);
+    if (ret != (int)length) {
+        g_free(buffer);
+        error_report("CMD_PACKAGED: Buffer receive fail ret=%d length=%u",
+                     ret, length);
+        return (ret < 0) ? ret : -EAGAIN;
+    }
+    DPRINTF("%s: Received %d package, going to load", __func__, ret);
+
+    /* Set up a dummy QEMUFile that actually reads from the buffer */
+    qsb = qsb_create(buffer, length);
+    g_free(buffer); /* Because qsb_create copies */
+    if (!qsb) {
+        error_report("Unable to create qsb");
+        return -1;
+    }
+    QEMUFile *packf = qemu_bufopen("r", qsb);
+
+    ret = qemu_loadvm_state_main(packf, loadvm_handlers);
+    DPRINTF("%s: qemu_loadvm_state_main returned %d", __func__, ret);
+    qemu_fclose(packf); /* also frees the qsb */
+
+    return ret;
+}
+
 /*
  * Process an incoming 'QEMU_VM_COMMAND'
  * negative return on error (will issue error message)
@@ -1299,6 +1373,14 @@ static int loadvm_process_command(QEMUFile *f,
         migrate_send_rp_ack(mis, tmp32);
         break;
 
+    case QEMU_VM_CMD_PACKAGED:
+        if (loadvm_process_command_simple_lencheck("CMD_PACKAGED",
+                                                   len, 4)) {
+            return -1;
+        }
+        tmp32 = qemu_get_be32(f);
+        return loadvm_handle_cmd_packaged(mis, tmp32, loadvm_handlers);
+
     case QEMU_VM_CMD_POSTCOPY_RAM_ADVISE:
         if (loadvm_process_command_simple_lencheck("CMD_POSTCOPY_RAM_ADVISE",
                                                    len, 16)) {
-- 
1.9.3
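
qemu_savevm_send_packaged() above writes the package by walking the
QEMUSizedBuffer's iov array, where the entries may be only partially filled,
and capping the total at the buffer length.  A minimal sketch of the same
idea using a plain POSIX struct iovec array and a flat output buffer instead
of a QEMUFile; flatten_iov() is a hypothetical name.  Unlike the loop in the
patch, this version skips empty entries and stops once 'len' reaches zero.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Copy the first 'len' bytes held in 'iov' into 'out', walking the entries
 * in order; the last entry used may be only partially consumed.
 * Returns the number of bytes copied (less than 'len' if the iovs run out).
 */
static size_t flatten_iov(const struct iovec *iov, size_t n_iov,
                          size_t len, uint8_t *out)
{
    size_t copied = 0;

    for (size_t i = 0; i < n_iov && len > 0; i++) {
        size_t towrite = iov[i].iov_len < len ? iov[i].iov_len : len;

        memcpy(out + copied, iov[i].iov_base, towrite);
        copied += towrite;
        len -= towrite;
    }
    return copied;
}
```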


* [Qemu-devel] [PATCH v4 23/47] migrate_init: Call from savevm
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (21 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 22/47] QEMU_VM_CMD_PACKAGED: Send a packaged chunk of migration stream Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-10-08  2:28   ` zhanghailiang
  2014-11-04  1:29   ` David Gibson
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 24/47] Allow savevm handlers to state whether they could go into postcopy Dr. David Alan Gilbert (git)
                   ` (25 subsequent siblings)
  48 siblings, 2 replies; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Suspend to file is very much like a migration, and it makes life
easier if we have the MigrationState available, so initialise it
in the savevm.c code for suspending.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h | 1 +
 include/qemu/typedefs.h       | 1 +
 migration.c                   | 2 +-
 savevm.c                      | 2 ++
 4 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 2c078c4..3aeae47 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -140,6 +140,7 @@ int migrate_fd_close(MigrationState *s);
 
 void add_migration_state_change_notifier(Notifier *notify);
 void remove_migration_state_change_notifier(Notifier *notify);
+MigrationState *migrate_init(const MigrationParams *params);
 bool migration_in_setup(MigrationState *);
 bool migration_has_finished(MigrationState *);
 bool migration_has_failed(MigrationState *);
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 0f79b5c..8539de6 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -16,6 +16,7 @@ struct Monitor;
 typedef struct Monitor Monitor;
 typedef struct MigrationIncomingState MigrationIncomingState;
 typedef struct MigrationParams MigrationParams;
+typedef struct MigrationState MigrationState;
 
 typedef struct Property Property;
 typedef struct PropertyInfo PropertyInfo;
diff --git a/migration.c b/migration.c
index 527423e..3a45b2a 100644
--- a/migration.c
+++ b/migration.c
@@ -488,7 +488,7 @@ bool migration_has_failed(MigrationState *s)
             s->state == MIG_STATE_ERROR);
 }
 
-static MigrationState *migrate_init(const MigrationParams *params)
+MigrationState *migrate_init(const MigrationParams *params)
 {
     MigrationState *s = migrate_get_current();
     int64_t bandwidth_limit = s->bandwidth_limit;
diff --git a/savevm.c b/savevm.c
index bffe890..a368a25 100644
--- a/savevm.c
+++ b/savevm.c
@@ -949,6 +949,8 @@ static int qemu_savevm_state(QEMUFile *f)
         .blk = 0,
         .shared = 0
     };
+    MigrationState *ms = migrate_init(&params);
+    ms->file = f;
 
     if (qemu_savevm_state_blocked(NULL)) {
         return -EINVAL;
-- 
1.9.3


* [Qemu-devel] [PATCH v4 24/47] Allow savevm handlers to state whether they could go into postcopy
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (22 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 23/47] migrate_init: Call from savevm Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-11-04  1:33   ` David Gibson
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 25/47] postcopy: OS support test Dr. David Alan Gilbert (git)
                   ` (24 subsequent siblings)
  48 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add a can_postcopy handler to SaveVMHandlers, and use it to split the
qemu_savevm_state_pending counts into postcopiable and non-postcopiable
amounts.
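
The split can be pictured as a single pass over the registered handlers,
adding each handler's pending estimate to one of two totals.  A minimal
sketch with a cut-down, hypothetical handler struct (HandlerSketch,
pending_split() and the example callbacks are not QEMU names).  Since only
RAM sets can_postcopy in this patch, the sketch treats a handler without the
callback as non-postcopiable rather than calling through a NULL pointer;
that choice is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for QEMU's SaveVMHandlers */
typedef struct {
    uint64_t (*pending)(void *opaque);     /* NULL: nothing pending */
    bool (*can_postcopy)(void *opaque);    /* NULL: not postcopiable */
    void *opaque;
} HandlerSketch;

static void pending_split(const HandlerSketch *h, size_t n,
                          uint64_t *res_non_postcopiable,
                          uint64_t *res_postcopiable)
{
    uint64_t nonpc = 0, pc = 0;

    for (size_t i = 0; i < n; i++) {
        if (!h[i].pending) {
            continue;
        }
        uint64_t amount = h[i].pending(h[i].opaque);

        if (h[i].can_postcopy && h[i].can_postcopy(h[i].opaque)) {
            pc += amount;
        } else {
            nonpc += amount;
        }
    }
    *res_non_postcopiable = nonpc;
    *res_postcopiable = pc;
}

/* Example callbacks: the pending amount is read from the opaque pointer */
static uint64_t pending_from_opaque(void *opaque)
{
    return *(const uint64_t *)opaque;
}

static bool always_postcopy(void *opaque)
{
    (void)opaque;
    return true;
}
```

The migration thread can then still compare the sum of the two totals
against max_size, exactly as before the split.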

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c                 |  7 +++++++
 include/migration/vmstate.h |  2 +-
 include/sysemu/sysemu.h     |  4 +++-
 migration.c                 |  9 ++++++++-
 savevm.c                    | 23 +++++++++++++++++++----
 5 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 6970733..44072d8 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -1192,6 +1192,12 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
     return ret;
 }
 
+/* RAM's always up for postcopying */
+static bool ram_can_postcopy(void *opaque)
+{
+    return true;
+}
+
 static SaveVMHandlers savevm_ram_handlers = {
     .save_live_setup = ram_save_setup,
     .save_live_iterate = ram_save_iterate,
@@ -1199,6 +1205,7 @@ static SaveVMHandlers savevm_ram_handlers = {
     .save_live_pending = ram_save_pending,
     .load_state = ram_load,
     .cancel = ram_migration_cancel,
+    .can_postcopy = ram_can_postcopy,
 };
 
 void ram_mig_init(void)
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 9a001bd..4991935 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -54,7 +54,7 @@ typedef struct SaveVMHandlers {
     /* This runs outside the iothread lock!  */
     int (*save_live_setup)(QEMUFile *f, void *opaque);
     uint64_t (*save_live_pending)(QEMUFile *f, void *opaque, uint64_t max_size);
-
+    bool (*can_postcopy)(void *opaque);
     LoadStateHandler *load_state;
 } SaveVMHandlers;
 
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index ef98fa9..e7ff3d0 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -110,7 +110,9 @@ void qemu_savevm_state_begin(QEMUFile *f,
 int qemu_savevm_state_iterate(QEMUFile *f);
 void qemu_savevm_state_complete(QEMUFile *f);
 void qemu_savevm_state_cancel(void);
-uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size);
+void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size,
+                               uint64_t *res_non_postcopiable,
+                               uint64_t *res_postcopiable);
 void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
                               uint16_t len, uint8_t *data);
 void qemu_savevm_send_reqack(QEMUFile *f, uint32_t value);
diff --git a/migration.c b/migration.c
index 3a45b2a..bca397d 100644
--- a/migration.c
+++ b/migration.c
@@ -865,8 +865,15 @@ static void *migration_thread(void *opaque)
         uint64_t pending_size;
 
         if (!qemu_file_rate_limit(s->file)) {
-            pending_size = qemu_savevm_state_pending(s->file, max_size);
+            uint64_t pend_post, pend_nonpost;
+            DPRINTF("iterate\n");
+            qemu_savevm_state_pending(s->file, max_size, &pend_nonpost,
+                                      &pend_post);
+            pending_size = pend_nonpost + pend_post;
             trace_migrate_pending(pending_size, max_size);
+            DPRINTF("pending size %" PRIu64 " max %" PRIu64 " (post=%" PRIu64
+                    " nonpost=%" PRIu64 ")\n",
+                    pending_size, max_size, pend_post, pend_nonpost);
             if (pending_size && pending_size >= max_size) {
                 qemu_savevm_state_iterate(s->file);
             } else {
diff --git a/savevm.c b/savevm.c
index a368a25..1642a59 100644
--- a/savevm.c
+++ b/savevm.c
@@ -911,10 +911,18 @@ void qemu_savevm_state_complete(QEMUFile *f)
     qemu_fflush(f);
 }
 
-uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size)
+/* Give an estimate of the amount left to be transferred,
+ * the result is split into the amount for units that can and
+ * for units that can't do postcopy.
+ */
+void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size,
+                               uint64_t *res_non_postcopiable,
+                               uint64_t *res_postcopiable)
 {
     SaveStateEntry *se;
-    uint64_t ret = 0;
+    uint64_t res_nonpc = 0;
+    uint64_t res_pc = 0;
+    uint64_t tmp;
 
     QTAILQ_FOREACH(se, &savevm_handlers, entry) {
         if (!se->ops || !se->ops->save_live_pending) {
@@ -925,9 +933,16 @@ uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size)
                 continue;
             }
         }
-        ret += se->ops->save_live_pending(f, se->opaque, max_size);
+        tmp = se->ops->save_live_pending(f, se->opaque, max_size);
+
+        if (se->ops->can_postcopy && se->ops->can_postcopy(se->opaque)) {
+            res_pc += tmp;
+        } else {
+            res_nonpc += tmp;
+        }
     }
-    return ret;
+    *res_non_postcopiable = res_nonpc;
+    *res_postcopiable = res_pc;
 }
 
 void qemu_savevm_state_cancel(void)
-- 
1.9.3


* [Qemu-devel] [PATCH v4 25/47] postcopy: OS support test
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (23 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 24/47] Allow savevm handlers to state whether they could go into postcopy Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-11-04  1:40   ` David Gibson
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 26/47] migrate_start_postcopy: Command to trigger transition to postcopy Dr. David Alan Gilbert (git)
                   ` (23 subsequent siblings)
  48 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Provide a check to see if the OS we're running on has all the bits
needed for postcopy.

Creates postcopy-ram.c which will get most of the other helpers we need.
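
The general shape of such a host test is: map an anonymous page, try the
calls, and clean up on every exit path.  A minimal sketch of just the
mmap/userfaultfd part, assuming Linux; it uses SYS_userfaultfd from
<sys/syscall.h> only when the host headers define it (instead of the
hard-coded syscall numbers in the patch) and does not exercise
MADV_USERFAULT or remap_anon_pages.  hosttest_sketch() is a hypothetical
name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Returns 0 if an anonymous page can be mapped and userfaultfd() succeeds,
 * -1 otherwise (printing the reason). */
static int hosttest_sketch(void)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    int ret = -1;
    void *area = mmap(NULL, (size_t)pagesize, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    /* mmap() reports failure with MAP_FAILED, not NULL */
    if (area == MAP_FAILED) {
        perror("hosttest_sketch: mmap");
        return -1;
    }
    assert(((uintptr_t)area & (uintptr_t)(pagesize - 1)) == 0);

#ifdef SYS_userfaultfd
    int ufd = (int)syscall(SYS_userfaultfd, O_CLOEXEC);
    if (ufd >= 0) {
        close(ufd);
        ret = 0;
    } else {
        perror("hosttest_sketch: userfaultfd");
    }
#else
    fprintf(stderr, "hosttest_sketch: no userfaultfd syscall number\n");
#endif

    munmap(area, (size_t)pagesize);
    return ret;
}
```

The result depends on the running kernel and its configuration, so callers
should treat -1 as "postcopy unavailable on this host" rather than as a bug.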

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 Makefile.objs                    |   2 +-
 include/migration/postcopy-ram.h |  19 +++++
 postcopy-ram.c                   | 160 +++++++++++++++++++++++++++++++++++++++
 savevm.c                         |   6 ++
 4 files changed, 186 insertions(+), 1 deletion(-)
 create mode 100644 include/migration/postcopy-ram.h
 create mode 100644 postcopy-ram.c

diff --git a/Makefile.objs b/Makefile.objs
index 97db978..fa0a3a0 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -54,7 +54,7 @@ common-obj-y += qemu-file.o
 common-obj-$(CONFIG_RDMA) += migration-rdma.o
 common-obj-y += qemu-char.o #aio.o
 common-obj-y += block-migration.o
-common-obj-y += page_cache.o xbzrle.o
+common-obj-y += page_cache.o xbzrle.o postcopy-ram.o
 
 common-obj-$(CONFIG_POSIX) += migration-exec.o migration-unix.o migration-fd.o
 
diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
new file mode 100644
index 0000000..dcd1afa
--- /dev/null
+++ b/include/migration/postcopy-ram.h
@@ -0,0 +1,19 @@
+/*
+ * Postcopy migration for RAM
+ *
+ * Copyright 2013 Red Hat, Inc. and/or its affiliates
+ *
+ * Authors:
+ *  Dave Gilbert  <dgilbert@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+#ifndef QEMU_POSTCOPY_RAM_H
+#define QEMU_POSTCOPY_RAM_H
+
+/* Return 0 if the host supports everything we need to do postcopy-ram */
+int postcopy_ram_hosttest(void);
+
+#endif
diff --git a/postcopy-ram.c b/postcopy-ram.c
new file mode 100644
index 0000000..bba5c71
--- /dev/null
+++ b/postcopy-ram.c
@@ -0,0 +1,160 @@
+/*
+ * Postcopy migration for RAM
+ *
+ * Copyright 2013-2014 Red Hat, Inc. and/or its affiliates
+ *
+ * Authors:
+ *  Dave Gilbert  <dgilbert@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+/*
+ * Postcopy is a migration technique where the execution flips from the
+ * source to the destination before all the data has been copied.
+ */
+
+#include <glib.h>
+#include <stdio.h>
+#include <unistd.h>
+
+#include "qemu-common.h"
+#include "migration/migration.h"
+#include "migration/postcopy-ram.h"
+
+//#define DEBUG_POSTCOPY
+
+#ifdef DEBUG_POSTCOPY
+#define DPRINTF(fmt, ...) \
+    do { fprintf(stderr, "postcopy@%" PRId64 " " fmt "\n", \
+                          qemu_clock_get_ms(QEMU_CLOCK_REALTIME), \
+                          ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+    do { } while (0)
+#endif
+
+/* Postcopy needs to detect accesses to pages that haven't yet been copied
+ * across, and to efficiently map new pages in; the techniques for doing
+ * this are host OS specific.
+ */
+#if defined(__linux__)
+
+/* On Linux we use:
+ *    madvise MADV_USERFAULT - to mark an area of anonymous memory such
+ *                             that userspace is notified of accesses to
+ *                             unallocated areas.
+ *    userfaultfd      - opens a socket to receive USERFAULT messages
+ *    remap_anon_pages - to shuffle mapped pages into previously unallocated
+ *                       areas without creating loads of VMAs.
+ */
+
+#include <sys/mman.h>
+#include <sys/types.h>
+
+/* TODO remove once we have libc defs */
+
+#ifdef HOST_X86_64
+ /* NOTE: These values match Andrea's 3.15.0-based kernel tree */
+#ifndef MADV_USERFAULT
+#define MADV_USERFAULT   18
+#define MADV_NOUSERFAULT 19
+#endif
+
+#ifndef __NR_remap_anon_pages
+#define __NR_remap_anon_pages 321
+#endif
+
+#ifndef __NR_userfaultfd
+#define __NR_userfaultfd 322
+#endif
+
+#endif
+
+#ifndef USERFAULTFD_PROTOCOL
+#define USERFAULTFD_PROTOCOL (uint64_t)0xaa
+#endif
+
+#endif
+
+#if defined(__linux__) && defined(MADV_USERFAULT) && \
+                          defined(__NR_remap_anon_pages)
+
+int postcopy_ram_hosttest(void)
+{
+    /* TODO: Needs guarding with CONFIG_ once we have libcs that have the defs
+     *
+     * Try each syscall we need, but this isn't a testbench,
+     * just enough to see that we have the calls
+     */
+    void *testarea = NULL, *testarea2 = NULL;
+    long pagesize = getpagesize();
+    int ufd = -1;
+    int ret = -1; /* Error unless we change it */
+
+    testarea = mmap(NULL, pagesize, PROT_READ | PROT_WRITE, MAP_PRIVATE |
+                                    MAP_ANONYMOUS, -1, 0);
+    if (testarea == MAP_FAILED) {
+        perror("postcopy_ram_hosttest: Failed to map test area");
+        testarea = NULL;
+        goto out;
+    }
+    g_assert(((size_t)testarea & (pagesize-1)) == 0);
+
+    ufd = syscall(__NR_userfaultfd, O_CLOEXEC);
+    if (ufd == -1) {
+        perror("postcopy_ram_hosttest: userfaultfd not available");
+        goto out;
+    }
+
+    if (madvise(testarea, pagesize, MADV_USERFAULT)) {
+        perror("postcopy_ram_hosttest: MADV_USERFAULT not available");
+        goto out;
+    }
+
+    if (madvise(testarea, pagesize, MADV_NOUSERFAULT)) {
+        perror("postcopy_ram_hosttest: MADV_NOUSERFAULT not available");
+        goto out;
+    }
+
+    testarea2 = mmap(NULL, pagesize, PROT_READ | PROT_WRITE, MAP_PRIVATE |
+                                     MAP_ANONYMOUS, -1, 0);
+    if (testarea2 == MAP_FAILED) {
+        perror("postcopy_ram_hosttest: Failed to map second test area");
+        testarea2 = NULL;
+        goto out;
+    }
+    g_assert(((size_t)testarea2 & (pagesize-1)) == 0);
+    *(char *)testarea = 0; /* Force the map of the new page */
+    if (syscall(__NR_remap_anon_pages, testarea2, testarea, pagesize, 0) !=
+        pagesize) {
+        perror("postcopy_ram_hosttest: remap_anon_pages not available");
+        goto out;
+    }
+
+    /* Success! */
+    ret = 0;
+out:
+    if (testarea) {
+        munmap(testarea, pagesize);
+    }
+    if (testarea2) {
+        munmap(testarea2, pagesize);
+    }
+    if (ufd != -1) {
+        close(ufd);
+    }
+    return ret;
+}
+
+#else
+/* No target OS support, stubs just fail */
+
+int postcopy_ram_hosttest(void)
+{
+    error_report("postcopy_ram_hosttest: No OS support");
+    return -1;
+}
+
+#endif
+
diff --git a/savevm.c b/savevm.c
index 1642a59..a0cb88b 100644
--- a/savevm.c
+++ b/savevm.c
@@ -33,6 +33,7 @@
 #include "qemu/timer.h"
 #include "audio/audio.h"
 #include "migration/migration.h"
+#include "migration/postcopy-ram.h"
 #include "qemu/sockets.h"
 #include "qemu/queue.h"
 #include "sysemu/cpus.h"
@@ -1087,6 +1088,11 @@ static int loadvm_postcopy_ram_handle_advise(MigrationIncomingState *mis,
         return -1;
     }
 
+    /* Check this host can do it  */
+    if (postcopy_ram_hosttest()) {
+        return -1;
+    }
+
     if (remote_hps != sysconf(_SC_PAGESIZE))  {
         /*
          * Some combinations of mismatch are probably possible but it gets
-- 
1.9.3


* [Qemu-devel] [PATCH v4 26/47] migrate_start_postcopy: Command to trigger transition to postcopy
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (24 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 25/47] postcopy: OS support test Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-11-04  1:47   ` David Gibson
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 27/47] MIG_STATE_POSTCOPY_ACTIVE: Add new migration state Dr. David Alan Gilbert (git)
                   ` (22 subsequent siblings)
  48 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Once postcopy is enabled (with migrate_set_capability), the migration
will still start in precopy mode.  To cause a transition into postcopy,
the

  migrate_start_postcopy

command must be issued.  Postcopy will start some time after this
(the next time the request is checked in the migration loop).

Issuing the command before migration has started, or without first
enabling the postcopy capability, results in an error; issuing it after
migration has finished is ignored.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 hmp-commands.hx               | 15 +++++++++++++++
 hmp.c                         |  7 +++++++
 hmp.h                         |  1 +
 include/migration/migration.h |  3 +++
 migration.c                   | 22 ++++++++++++++++++++++
 qapi-schema.json              |  8 ++++++++
 qmp-commands.hx               | 19 +++++++++++++++++++
 7 files changed, 75 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 0b1a4f7..63cd23a 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -985,6 +985,21 @@ Enable/Disable the usage of a capability @var{capability} for migration.
 ETEXI
 
     {
+        .name       = "migrate_start_postcopy",
+        .args_type  = "",
+        .params     = "",
+        .help       = "Switch migration to postcopy mode",
+        .mhandler.cmd = hmp_migrate_start_postcopy,
+    },
+
+STEXI
+@item migrate_start_postcopy
+@findex migrate_start_postcopy
+Switch an in-progress migration to postcopy mode. Ignored after the end of
+migration (or once already in postcopy).
+ETEXI
+
+    {
         .name       = "client_migrate_info",
         .args_type  = "protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?",
         .params     = "protocol hostname port tls-port cert-subject",
diff --git a/hmp.c b/hmp.c
index 63d7686..07acb8d 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1079,6 +1079,13 @@ void hmp_migrate_set_capability(Monitor *mon, const QDict *qdict)
     }
 }
 
+void hmp_migrate_start_postcopy(Monitor *mon, const QDict *qdict)
+{
+    Error *err = NULL;
+    qmp_migrate_start_postcopy(&err);
+    hmp_handle_error(mon, &err);
+}
+
 void hmp_set_password(Monitor *mon, const QDict *qdict)
 {
     const char *protocol  = qdict_get_str(qdict, "protocol");
diff --git a/hmp.h b/hmp.h
index 4bb5dca..da1334f 100644
--- a/hmp.h
+++ b/hmp.h
@@ -64,6 +64,7 @@ void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_capability(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_cache_size(Monitor *mon, const QDict *qdict);
+void hmp_migrate_start_postcopy(Monitor *mon, const QDict *qdict);
 void hmp_set_password(Monitor *mon, const QDict *qdict);
 void hmp_expire_password(Monitor *mon, const QDict *qdict);
 void hmp_eject(Monitor *mon, const QDict *qdict);
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 3aeae47..b74121e 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -100,6 +100,9 @@ struct MigrationState
     int64_t xbzrle_cache_size;
     int64_t setup_time;
     int64_t dirty_sync_count;
+
+    /* Flag set once the migration has been asked to enter postcopy */
+    volatile bool start_postcopy;
 };
 
 void process_incoming_migration(QEMUFile *f);
diff --git a/migration.c b/migration.c
index bca397d..7757acc 100644
--- a/migration.c
+++ b/migration.c
@@ -379,6 +379,28 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
     }
 }
 
+void qmp_migrate_start_postcopy(Error **errp)
+{
+    MigrationState *s = migrate_get_current();
+
+    if (!migrate_postcopy_ram()) {
+        error_setg(errp, "Enable postcopy with migrate_set_capability before"
+                         " the start of migration");
+        return;
+    }
+
+    if (s->state == MIG_STATE_NONE) {
+        error_setg(errp, "Postcopy must be started after migration has been"
+                         " started");
+        return;
+    }
+    /*
+     * we don't error if migration has finished since that would be racy
+     * with issuing this command.
+     */
+    s->start_postcopy = true;
+}
+
 /* shared migration helpers */
 
 static void migrate_set_state(MigrationState *s, int old_state, int new_state)
diff --git a/qapi-schema.json b/qapi-schema.json
index cdf5290..792dd63 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -541,6 +541,14 @@
 { 'command': 'query-migrate-capabilities', 'returns':   ['MigrationCapabilityStatus']}
 
 ##
+# @migrate-start-postcopy
+#
+# Switch migration to postcopy mode
+#
+# Since: 2.2
+{ 'command': 'migrate-start-postcopy' }
+
+##
 # @MouseInfo:
 #
 # Information about a mouse device.
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 1abd619..b4dd291 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -685,6 +685,25 @@ Example:
 
 EQMP
     {
+        .name       = "migrate-start-postcopy",
+        .args_type  = "",
+        .mhandler.cmd_new = qmp_marshal_input_migrate_start_postcopy,
+    },
+
+SQMP
+migrate-start-postcopy
+----------------------
+
+Switch an in-progress migration to postcopy mode. Ignored after the end of
+migration (or once already in postcopy).
+
+Example:
+-> { "execute": "migrate-start-postcopy" }
+<- { "return": {} }
+
+EQMP
+
+    {
         .name       = "query-migrate-cache-size",
         .args_type  = "",
         .mhandler.cmd_new = qmp_marshal_input_query_migrate_cache_size,
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 204+ messages in thread

* [Qemu-devel] [PATCH v4 27/47] MIG_STATE_POSTCOPY_ACTIVE: Add new migration state
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (25 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 26/47] migrate_start_postcopy: Command to trigger transition to postcopy Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-11-04  1:49   ` David Gibson
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 28/47] qemu_savevm_state_complete: Postcopy changes Dr. David Alan Gilbert (git)
                   ` (21 subsequent siblings)
  48 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

'MIG_STATE_POSTCOPY_ACTIVE' is entered after migrate_start_postcopy has
been issued.

'migration_postcopy_phase' is provided so that other code can tell
whether the outgoing migration is in the postcopy phase.
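A minimal sketch of how the new state is treated by the predicates this patch touches. The enum is a hypothetical copy of the patch's state list (the real one is private to migration.c) and the sketch_* helpers are illustrative stand-ins for migration_already_active, migration_postcopy_phase and the migration thread's loop condition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical copy of the state list from this patch. */
enum SketchPhase {
    SK_ERROR = -1,
    SK_NONE,
    SK_SETUP,
    SK_CANCELLING,
    SK_CANCELLED,
    SK_ACTIVE,
    SK_POSTCOPY_ACTIVE,
    SK_COMPLETED,
};

/* Like migration_already_active(): starting a new migration or changing
 * capabilities must be refused in any of these states. */
static bool sketch_already_active(enum SketchPhase st)
{
    switch (st) {
    case SK_SETUP:
    case SK_ACTIVE:
    case SK_POSTCOPY_ACTIVE:
        return true;
    default:
        return false;
    }
}

/* Like migration_postcopy_phase(). */
static bool sketch_postcopy_phase(enum SketchPhase st)
{
    return st == SK_POSTCOPY_ACTIVE;
}

/* The migration thread keeps iterating in both active states. */
static bool sketch_thread_keeps_running(enum SketchPhase st)
{
    return st == SK_ACTIVE || st == SK_POSTCOPY_ACTIVE;
}
```

Routing the open-coded `ACTIVE || SETUP` checks through one helper, as the patch does, means a future state only has to be added in one place.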

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  2 ++
 migration.c                   | 58 ++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 54 insertions(+), 6 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index b74121e..2ff9d35 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -147,6 +147,8 @@ MigrationState *migrate_init(const MigrationParams *params);
 bool migration_in_setup(MigrationState *);
 bool migration_has_finished(MigrationState *);
 bool migration_has_failed(MigrationState *);
+/* True if outgoing migration has entered postcopy phase */
+bool migration_postcopy_phase(MigrationState *);
 MigrationState *migrate_get_current(void);
 
 uint64_t ram_bytes_remaining(void);
diff --git a/migration.c b/migration.c
index 7757acc..29ee740 100644
--- a/migration.c
+++ b/migration.c
@@ -38,13 +38,14 @@
     do { } while (0)
 #endif
 
-enum {
+enum MigrationPhase {
     MIG_STATE_ERROR = -1,
     MIG_STATE_NONE,
     MIG_STATE_SETUP,
     MIG_STATE_CANCELLING,
     MIG_STATE_CANCELLED,
     MIG_STATE_ACTIVE,
+    MIG_STATE_POSTCOPY_ACTIVE,
     MIG_STATE_COMPLETED,
 };
 
@@ -254,6 +255,7 @@ static bool migration_already_active(MigrationState *ms)
 {
     switch (ms->state) {
     case MIG_STATE_ACTIVE:
+    case MIG_STATE_POSTCOPY_ACTIVE:
     case MIG_STATE_SETUP:
         return true;
 
@@ -326,6 +328,40 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 
         get_xbzrle_cache_stats(info);
         break;
+    case MIG_STATE_POSTCOPY_ACTIVE:
+        /* Mostly the same as active; TODO add some postcopy stats */
+        info->has_status = true;
+        info->status = g_strdup("postcopy-active");
+        info->has_total_time = true;
+        info->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME)
+            - s->total_time;
+        info->has_expected_downtime = true;
+        info->expected_downtime = s->expected_downtime;
+        info->has_setup_time = true;
+        info->setup_time = s->setup_time;
+
+        info->has_ram = true;
+        info->ram = g_malloc0(sizeof(*info->ram));
+        info->ram->transferred = ram_bytes_transferred();
+        info->ram->remaining = ram_bytes_remaining();
+        info->ram->total = ram_bytes_total();
+        info->ram->duplicate = dup_mig_pages_transferred();
+        info->ram->skipped = skipped_mig_pages_transferred();
+        info->ram->normal = norm_mig_pages_transferred();
+        info->ram->normal_bytes = norm_mig_bytes_transferred();
+        info->ram->dirty_pages_rate = s->dirty_pages_rate;
+        info->ram->mbps = s->mbps;
+
+        if (blk_mig_active()) {
+            info->has_disk = true;
+            info->disk = g_malloc0(sizeof(*info->disk));
+            info->disk->transferred = blk_mig_bytes_transferred();
+            info->disk->remaining = blk_mig_bytes_remaining();
+            info->disk->total = blk_mig_bytes_total();
+        }
+
+        get_xbzrle_cache_stats(info);
+        break;
     case MIG_STATE_COMPLETED:
         get_xbzrle_cache_stats(info);
 
@@ -369,7 +405,7 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
     MigrationState *s = migrate_get_current();
     MigrationCapabilityStatusList *cap;
 
-    if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP) {
+    if (migration_already_active(s)) {
         error_set(errp, QERR_MIGRATION_ACTIVE);
         return;
     }
@@ -444,7 +480,8 @@ static void migrate_fd_cleanup(void *opaque)
         s->file = NULL;
     }
 
-    assert(s->state != MIG_STATE_ACTIVE);
+    assert((s->state != MIG_STATE_ACTIVE) &&
+           (s->state != MIG_STATE_POSTCOPY_ACTIVE));
 
     if (s->state != MIG_STATE_COMPLETED) {
         qemu_savevm_state_cancel();
@@ -477,7 +514,8 @@ static void migrate_fd_cancel(MigrationState *s)
 
     do {
         old_state = s->state;
-        if (old_state != MIG_STATE_SETUP && old_state != MIG_STATE_ACTIVE) {
+        if (old_state != MIG_STATE_SETUP && old_state != MIG_STATE_ACTIVE &&
+            old_state != MIG_STATE_POSTCOPY_ACTIVE) {
             break;
         }
         migrate_set_state(s, old_state, MIG_STATE_CANCELLING);
@@ -510,6 +548,11 @@ bool migration_has_failed(MigrationState *s)
             s->state == MIG_STATE_ERROR);
 }
 
+bool migration_postcopy_phase(MigrationState *s)
+{
+    return (s->state == MIG_STATE_POSTCOPY_ACTIVE);
+}
+
 MigrationState *migrate_init(const MigrationParams *params)
 {
     MigrationState *s = migrate_get_current();
@@ -558,7 +601,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     params.blk = has_blk && blk;
     params.shared = has_inc && inc;
 
-    if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP ||
+    if (migration_already_active(s) ||
         s->state == MIG_STATE_CANCELLING) {
         error_set(errp, QERR_MIGRATION_ACTIVE);
         return;
@@ -882,7 +925,10 @@ static void *migration_thread(void *opaque)
     s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
     migrate_set_state(s, MIG_STATE_SETUP, MIG_STATE_ACTIVE);
 
-    while (s->state == MIG_STATE_ACTIVE) {
+    DPRINTF("setup complete\n");
+
+    while (s->state == MIG_STATE_ACTIVE ||
+           s->state == MIG_STATE_POSTCOPY_ACTIVE) {
         int64_t current_time;
         uint64_t pending_size;
 
-- 
1.9.3

* [Qemu-devel] [PATCH v4 28/47] qemu_savevm_state_complete: Postcopy changes
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (26 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 27/47] MIG_STATE_POSTCOPY_ACTIVE: Add new migration state Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-11-04  2:18   ` David Gibson
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 29/47] Postcopy page-map-incoming (PMI) structure Dr. David Alan Gilbert (git)
                   ` (20 subsequent siblings)
  48 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

When postcopy calls qemu_savevm_state_complete it is not really
the end of the migration, so skip:
   a) Completing postcopiable iterative devices; they carry on sending.
   b) The termination byte (QEMU_VM_EOF) at the end of the stream.

We also add:
  qemu_savevm_state_postcopy_complete
which is called at the end of a postcopy migration to call the
complete methods of the devices skipped in the _complete call.
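A minimal sketch of the resulting two-phase completion, assuming each device is reduced to a section id plus the result of its can_postcopy hook, and the stream is just an array of ints. SketchDevice, SketchStream and SKETCH_EOF are hypothetical stand-ins, and the sketch omits the postcopy RAM end command that qemu_savevm_state_postcopy_complete sends before QEMU_VM_EOF. What it illustrates: each device is completed exactly once, and the EOF marker is written only by the final phase.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_EOF (-1)          /* stands in for QEMU_VM_EOF */
#define SKETCH_STREAM_MAX 32

typedef struct {
    int section_id;
    bool can_postcopy;           /* result of the device's can_postcopy() */
} SketchDevice;

typedef struct {
    int items[SKETCH_STREAM_MAX];
    size_t len;
} SketchStream;

static void sketch_put(SketchStream *f, int v)
{
    assert(f->len < SKETCH_STREAM_MAX);
    f->items[f->len++] = v;
}

/* Phase 1, like qemu_savevm_state_complete: when switching to postcopy,
 * skip postcopiable devices and leave the stream open (no EOF). */
static void sketch_state_complete(SketchStream *f, const SketchDevice *devs,
                                  size_t n, bool in_postcopy)
{
    for (size_t i = 0; i < n; i++) {
        if (in_postcopy && devs[i].can_postcopy) {
            continue;
        }
        sketch_put(f, devs[i].section_id);
    }
    if (!in_postcopy) {
        sketch_put(f, SKETCH_EOF);
    }
}

/* Phase 2, like qemu_savevm_state_postcopy_complete: finish only the
 * devices skipped above, then terminate the stream. */
static void sketch_postcopy_complete(SketchStream *f, const SketchDevice *devs,
                                     size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (devs[i].can_postcopy) {
            sketch_put(f, devs[i].section_id);
        }
    }
    sketch_put(f, SKETCH_EOF);
}
```

Both phases must apply the same "is this device postcopiable?" test, otherwise a device could be completed twice or not at all.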

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/sysemu/sysemu.h |  1 +
 savevm.c                | 52 ++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index e7ff3d0..46665ce 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -113,6 +113,7 @@ void qemu_savevm_state_cancel(void);
 void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size,
                                uint64_t *res_non_postcopiable,
                                uint64_t *res_postcopiable);
+void qemu_savevm_state_postcopy_complete(QEMUFile *f);
 void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
                               uint16_t len, uint8_t *data);
 void qemu_savevm_send_reqack(QEMUFile *f, uint32_t value);
diff --git a/savevm.c b/savevm.c
index a0cb88b..7c4541d 100644
--- a/savevm.c
+++ b/savevm.c
@@ -854,10 +854,51 @@ int qemu_savevm_state_iterate(QEMUFile *f)
     return ret;
 }
 
+/*
+ * Calls the complete routines just for those devices that are postcopiable;
+ * causing the last few pages to be sent immediately and doing any associated
+ * cleanup.
+ * Note postcopy also calls the plain qemu_savevm_state_complete to complete
+ * all the other devices, but that happens at the point we switch to postcopy.
+ */
+void qemu_savevm_state_postcopy_complete(QEMUFile *f)
+{
+    SaveStateEntry *se;
+    int ret;
+
+    QTAILQ_FOREACH(se, &savevm_handlers, entry) {
+        if (!se->ops || !se->ops->save_live_complete ||
+            !se->ops->can_postcopy || !se->ops->can_postcopy(se->opaque)) {
+            continue;
+        }
+        if (se->ops && se->ops->is_active) {
+            if (!se->ops->is_active(se->opaque)) {
+                continue;
+            }
+        }
+        trace_savevm_section_start(se->idstr, se->section_id);
+        /* Section type */
+        qemu_put_byte(f, QEMU_VM_SECTION_END);
+        qemu_put_be32(f, se->section_id);
+
+        ret = se->ops->save_live_complete(f, se->opaque);
+        trace_savevm_section_end(se->idstr, se->section_id);
+        if (ret < 0) {
+            qemu_file_set_error(f, ret);
+            return;
+        }
+    }
+
+    qemu_savevm_send_postcopy_ram_end(f, 0 /* Good */);
+    qemu_put_byte(f, QEMU_VM_EOF);
+    qemu_fflush(f);
+}
+
 void qemu_savevm_state_complete(QEMUFile *f)
 {
     SaveStateEntry *se;
     int ret;
+    bool in_postcopy = migration_postcopy_phase(migrate_get_current());
 
     trace_savevm_state_complete();
 
@@ -872,6 +913,11 @@ void qemu_savevm_state_complete(QEMUFile *f)
                 continue;
             }
         }
+        if (in_postcopy && se->ops && se->ops->can_postcopy &&
+            se->ops->can_postcopy(se->opaque)) {
+            DPRINTF("%s: Skipping %s in postcopy\n", __func__, se->idstr);
+            continue;
+        }
         trace_savevm_section_start(se->idstr, se->section_id);
         /* Section type */
         qemu_put_byte(f, QEMU_VM_SECTION_END);
@@ -908,7 +954,11 @@ void qemu_savevm_state_complete(QEMUFile *f)
         trace_savevm_section_end(se->idstr, se->section_id);
     }
 
-    qemu_put_byte(f, QEMU_VM_EOF);
+    if (!in_postcopy) {
+        /* Postcopy stream will still be going */
+        qemu_put_byte(f, QEMU_VM_EOF);
+    }
+
     qemu_fflush(f);
 }
 
-- 
1.9.3

* [Qemu-devel] [PATCH v4 29/47] Postcopy page-map-incoming (PMI) structure
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (27 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 28/47] qemu_savevm_state_complete: Postcopy changes Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-11-04  3:09   ` David Gibson
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 30/47] Postcopy: Maintain sentmap and calculate discard Dr. David Alan Gilbert (git)
                   ` (19 subsequent siblings)
  48 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The PMI holds the state of each page on the incoming side,
so that we can tell if the page is missing, already received
or there is a request outstanding for it.
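To make the host-page bit arithmetic concrete: with a 64k host page and 4k target pages, host_bits = 65536 / 4096 = 16 and host_mask = 0xffff, so the host page starting at bitmap index 16 occupies bits 16..31 of word 0. The following is a minimal standalone sketch of the test/set/clear helpers and of deriving a page's state from the two maps. The Sketch* names are hypothetical, and unlike the patch it also refuses host_bits equal to the width of a long, since shifting 1ul by the full width is undefined in C.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

typedef enum { PMI_MISSING, PMI_REQUESTED, PMI_RECEIVED } SketchPMIState;

/* host_bits = target pages per host page; must be a power of two that
 * divides the width of a long, so a host page never straddles words. */
static unsigned long sketch_host_mask(unsigned long host_bits)
{
    assert(host_bits && !(host_bits & (host_bits - 1)));
    assert(host_bits < SKETCH_BITS_PER_LONG);
    return (1ul << host_bits) - 1;
}

static bool sketch_test_hp(const unsigned long *map, size_t idx,
                           unsigned long host_bits)
{
    unsigned long mask = sketch_host_mask(host_bits);
    unsigned long masked;

    assert((idx & (host_bits - 1)) == 0);  /* start of a host page */
    masked = (map[idx / SKETCH_BITS_PER_LONG] >>
              (idx % SKETCH_BITS_PER_LONG)) & mask;
    assert(masked == 0 || masked == mask); /* all-or-nothing per host page */
    return masked != 0;
}

static void sketch_set_hp(unsigned long *map, size_t idx,
                          unsigned long host_bits)
{
    assert((idx & (host_bits - 1)) == 0);
    map[idx / SKETCH_BITS_PER_LONG] |=
        sketch_host_mask(host_bits) << (idx % SKETCH_BITS_PER_LONG);
}

static void sketch_clear_hp(unsigned long *map, size_t idx,
                            unsigned long host_bits)
{
    assert((idx & (host_bits - 1)) == 0);
    map[idx / SKETCH_BITS_PER_LONG] &=
        ~(sketch_host_mask(host_bits) << (idx % SKETCH_BITS_PER_LONG));
}

/* A page is never both received and requested. */
static SketchPMIState sketch_state(const unsigned long *received,
                                   const unsigned long *requested,
                                   size_t idx, unsigned long host_bits)
{
    bool rcv = sketch_test_hp(received, idx, host_bits);
    bool req = sketch_test_hp(requested, idx, host_bits);

    if (rcv) {
        assert(!req);
        return PMI_RECEIVED;
    }
    return req ? PMI_REQUESTED : PMI_MISSING;
}
```

Moving a page from requested to received sets its received bits and clears its requested bits together, which is why the real code does both under the PMI mutex.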

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h    |  19 ++++
 include/migration/postcopy-ram.h |  12 +++
 include/qemu/typedefs.h          |   1 +
 postcopy-ram.c                   | 220 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 252 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 2ff9d35..1405a15 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -57,6 +57,24 @@ struct MigrationRetPathState {
 
 typedef struct MigrationState MigrationState;
 
+/* Postcopy page-map-incoming - data about each page on the inbound side */
+
+typedef enum {
+   POSTCOPY_PMI_MISSING,   /* page hasn't yet been received */
+   POSTCOPY_PMI_REQUESTED, /* Kernel asked for a page, but we've not got it */
+   POSTCOPY_PMI_RECEIVED   /* We've got the page */
+} PostcopyPMIState;
+
+struct PostcopyPMI {
+    QemuMutex      mutex;
+    unsigned long *received_map;  /* Pages that we have received */
+    unsigned long *requested_map; /* Pages that we're sending a request for */
+    unsigned long  host_mask;     /* A mask with enough bits set to cover one
+                                     host page in the PMI */
+    unsigned long  host_bits;     /* The number of bits in the map representing
+                                     one host page */
+};
+
 /* State for the incoming migration */
 struct MigrationIncomingState {
     QEMUFile *file;
@@ -71,6 +89,7 @@ struct MigrationIncomingState {
 
     QEMUFile *return_path;
     QemuMutex      rp_mutex;    /* We send replies from multiple threads */
+    PostcopyPMI    postcopy_pmi;
 };
 
 MigrationIncomingState *migration_incoming_get_current(void);
diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index dcd1afa..addb88a 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -13,7 +13,19 @@
 #ifndef QEMU_POSTCOPY_RAM_H
 #define QEMU_POSTCOPY_RAM_H
 
+#include "migration/migration.h"
+
 /* Return 0 if the host supports everything we need to do postcopy-ram */
 int postcopy_ram_hosttest(void);
 
+/*
+ * In 'advise' mode record that a page has been received.
+ */
+void postcopy_hook_early_receive(MigrationIncomingState *mis,
+                                 size_t bitmap_index);
+
+void postcopy_pmi_destroy(MigrationIncomingState *mis);
+void postcopy_pmi_discard_range(MigrationIncomingState *mis,
+                                size_t start, size_t npages);
+void postcopy_pmi_dump(MigrationIncomingState *mis);
 #endif
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 8539de6..61b330c 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -77,6 +77,7 @@ typedef struct QEMUSGList QEMUSGList;
 typedef struct SHPCDevice SHPCDevice;
 typedef struct FWCfgState FWCfgState;
 typedef struct PcGuestInfo PcGuestInfo;
+typedef struct PostcopyPMI PostcopyPMI;
 typedef struct Range Range;
 typedef struct AdapterInfo AdapterInfo;
 
diff --git a/postcopy-ram.c b/postcopy-ram.c
index bba5c71..210585c 100644
--- a/postcopy-ram.c
+++ b/postcopy-ram.c
@@ -23,6 +23,9 @@
 #include "qemu-common.h"
 #include "migration/migration.h"
 #include "migration/postcopy-ram.h"
+#include "sysemu/sysemu.h"
+#include "qemu/bitmap.h"
+#include "qemu/error-report.h"
 
 //#define DEBUG_POSTCOPY
 
@@ -82,6 +85,216 @@
 #if defined(__linux__) && defined(MADV_USERFAULT) && \
                           defined(__NR_remap_anon_pages)
 
+/* ---------------------------------------------------------------------- */
+/* Postcopy pagemap-inbound (pmi) - data structures that record the       */
+/* state of each page used by the inbound postcopy                        */
+/* It's a pair of bitmaps (of the same structure as the migration bitmaps)*/
+/* holding one bit per target-page, although all operations work on host  */
+/* pages.                                                                 */
+__attribute__ (( unused )) /* Until later in patch series */
+static void postcopy_pmi_init(MigrationIncomingState *mis, size_t ram_pages)
+{
+    unsigned int tpb = qemu_target_page_bits();
+    unsigned long host_bits;
+
+    qemu_mutex_init(&mis->postcopy_pmi.mutex);
+    mis->postcopy_pmi.received_map = bitmap_new(ram_pages);
+    mis->postcopy_pmi.requested_map = bitmap_new(ram_pages);
+    bitmap_clear(mis->postcopy_pmi.received_map, 0, ram_pages);
+    bitmap_clear(mis->postcopy_pmi.requested_map, 0, ram_pages);
+    /*
+     * Each bit in the map represents one 'target page' which is no bigger
+     * than a host page but can be smaller.  It's useful to have some
+     * convenience masks for later
+     */
+
+    /*
+     * The number of bits one host page takes up in the bitmap
+     * e.g. on a 64k host page, 4k Target page, host_bits=64/4=16
+     */
+    host_bits = sysconf(_SC_PAGESIZE) / (1ul << tpb);
+    /* Should be a power of 2 */
+    assert(host_bits && !(host_bits & (host_bits - 1)));
+    /*
+     * If host_bits isn't a divisor of the number of bits in a long
+     * then the code gets a lot more complex; disallow for now
+     * (I'm not aware of a system where it's true anyway)
+     */
+    assert(((sizeof(long) * 8) % host_bits) == 0);
+
+    mis->postcopy_pmi.host_bits = host_bits;
+    /* A mask, starting at bit 0, containing host_bits continuous set bits */
+    mis->postcopy_pmi.host_mask =  (1ul << host_bits) - 1;
+
+    assert((ram_pages % host_bits) == 0);
+}
+
+void postcopy_pmi_destroy(MigrationIncomingState *mis)
+{
+    if (mis->postcopy_pmi.received_map) {
+        g_free(mis->postcopy_pmi.received_map);
+        mis->postcopy_pmi.received_map = NULL;
+    }
+    if (mis->postcopy_pmi.requested_map) {
+        g_free(mis->postcopy_pmi.requested_map);
+        mis->postcopy_pmi.requested_map = NULL;
+    }
+    qemu_mutex_destroy(&mis->postcopy_pmi.mutex);
+}
+
+/*
+ * Mark a set of pages in the PMI as being clear; this is used by the discard
+ * at the start of postcopy, and before the postcopy stream starts.
+ */
+void postcopy_pmi_discard_range(MigrationIncomingState *mis,
+                                size_t start, size_t npages)
+{
+    bitmap_clear(mis->postcopy_pmi.received_map, start, npages);
+}
+
+/*
+ * Test a host-page worth of bits in the map starting at bitmap_index
+ * The bits should all be consistent
+ */
+static bool test_hpbits(MigrationIncomingState *mis,
+                        size_t bitmap_index, unsigned long *map)
+{
+    long masked;
+
+    assert((bitmap_index & (mis->postcopy_pmi.host_bits-1)) == 0);
+
+    masked = (map[BIT_WORD(bitmap_index)] >>
+               (bitmap_index % BITS_PER_LONG)) &
+             mis->postcopy_pmi.host_mask;
+
+    assert((masked == 0) || (masked == mis->postcopy_pmi.host_mask));
+    return !!masked;
+}
+
+/*
+ * Set host-page worth of bits in the map starting at bitmap_index
+ */
+static void set_hpbits(MigrationIncomingState *mis,
+                       size_t bitmap_index, unsigned long *map)
+{
+    assert((bitmap_index & (mis->postcopy_pmi.host_bits-1)) == 0);
+
+    map[BIT_WORD(bitmap_index)] |= mis->postcopy_pmi.host_mask <<
+                                    (bitmap_index % BITS_PER_LONG);
+}
+
+/*
+ * Clear host-page worth of bits in the map starting at bitmap_index
+ */
+static void clear_hpbits(MigrationIncomingState *mis,
+                         size_t bitmap_index, unsigned long *map)
+{
+    assert((bitmap_index & (mis->postcopy_pmi.host_bits-1)) == 0);
+
+    map[BIT_WORD(bitmap_index)] &= ~(mis->postcopy_pmi.host_mask <<
+                                    (bitmap_index % BITS_PER_LONG));
+}
+
+/*
+ * Retrieve the state of the given page
+ * Note: This version for use by callers already holding the lock
+ */
+static PostcopyPMIState postcopy_pmi_get_state_nolock(
+                            MigrationIncomingState *mis,
+                            size_t bitmap_index)
+{
+    bool received, requested;
+
+    received = test_hpbits(mis, bitmap_index, mis->postcopy_pmi.received_map);
+    requested = test_hpbits(mis, bitmap_index, mis->postcopy_pmi.requested_map);
+
+    if (received) {
+        assert(!requested);
+        return POSTCOPY_PMI_RECEIVED;
+    } else {
+        return requested ? POSTCOPY_PMI_REQUESTED : POSTCOPY_PMI_MISSING;
+    }
+}
+
+/* Retrieve the state of the given page */
+__attribute__ (( unused )) /* Until later in patch series */
+static PostcopyPMIState postcopy_pmi_get_state(MigrationIncomingState *mis,
+                                               size_t bitmap_index)
+{
+    PostcopyPMIState ret;
+    qemu_mutex_lock(&mis->postcopy_pmi.mutex);
+    ret = postcopy_pmi_get_state_nolock(mis, bitmap_index);
+    qemu_mutex_unlock(&mis->postcopy_pmi.mutex);
+
+    return ret;
+}
+
+/*
+ * Set the page state to the given state if the previous state was as expected
+ * Return the actual previous state.
+ */
+__attribute__ (( unused )) /* Until later in patch series */
+static PostcopyPMIState postcopy_pmi_change_state(MigrationIncomingState *mis,
+                                           size_t bitmap_index,
+                                           PostcopyPMIState expected_state,
+                                           PostcopyPMIState new_state)
+{
+    PostcopyPMIState old_state;
+
+    qemu_mutex_lock(&mis->postcopy_pmi.mutex);
+    old_state = postcopy_pmi_get_state_nolock(mis, bitmap_index);
+
+    if (old_state == expected_state) {
+        switch (new_state) {
+        case POSTCOPY_PMI_MISSING:
+          assert(0); /* This shouldn't actually happen - use discard_range */
+          break;
+
+        case POSTCOPY_PMI_REQUESTED:
+          assert(old_state == POSTCOPY_PMI_MISSING);
+          set_hpbits(mis, bitmap_index, mis->postcopy_pmi.requested_map);
+          break;
+
+        case POSTCOPY_PMI_RECEIVED:
+          assert(old_state == POSTCOPY_PMI_MISSING ||
+                 old_state == POSTCOPY_PMI_REQUESTED);
+          set_hpbits(mis, bitmap_index, mis->postcopy_pmi.received_map);
+          clear_hpbits(mis, bitmap_index, mis->postcopy_pmi.requested_map);
+          break;
+        }
+    }
+
+    qemu_mutex_unlock(&mis->postcopy_pmi.mutex);
+    return old_state;
+}
+
+/*
+ * Useful when debugging postcopy, although if it failed early the
+ * received map can be quite sparse and thus big when dumped.
+ */
+void postcopy_pmi_dump(MigrationIncomingState *mis)
+{
+    fprintf(stderr, "postcopy_pmi_dump: requested\n");
+    ram_debug_dump_bitmap(mis->postcopy_pmi.requested_map, false);
+    fprintf(stderr, "postcopy_pmi_dump: received\n");
+    ram_debug_dump_bitmap(mis->postcopy_pmi.received_map, true);
+    fprintf(stderr, "postcopy_pmi_dump: end\n");
+}
+
+/* Called by ram_load prior to mapping the page */
+void postcopy_hook_early_receive(MigrationIncomingState *mis,
+                                 size_t bitmap_index)
+{
+    if (mis->postcopy_ram_state == POSTCOPY_RAM_INCOMING_ADVISE) {
+        /*
+         * If we're in precopy-advise mode we need to track received pages even
+         * though we don't need to place pages atomically yet.
+         * In advise mode there's only a single thread, so don't need locks
+         */
+        set_bit(bitmap_index, mis->postcopy_pmi.received_map);
+    }
+}
+
 int postcopy_ram_hosttest(void)
 {
     /* TODO: Needs guarding with CONFIG_ once we have libc's that have the defs
@@ -156,5 +369,12 @@ int postcopy_ram_hosttest(void)
     return -1;
 }
 
+/* Called by ram_load prior to mapping the page */
+void postcopy_hook_early_receive(MigrationIncomingState *mis,
+                                 size_t bitmap_index)
+{
+    /* We don't support postcopy so don't care */
+}
+
 #endif
 
-- 
1.9.3

* [Qemu-devel] [PATCH v4 30/47] Postcopy: Maintain sentmap and calculate discard
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (28 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 29/47] Postcopy page-map-incoming (PMI) structure Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-11-05  6:38   ` David Gibson
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 31/47] postcopy: Incoming initialisation Dr. David Alan Gilbert (git)
                   ` (18 subsequent siblings)
  48 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Where postcopy is preceded by a period of precopy, the destination will
have received pages that may have been dirtied on the source after they
were sent.  The destination must discard these pages before starting
its CPUs.

Maintain a 'sentmap' of pages that have already been sent.
Calculate the list of pages that are both sent and dirty.
Provide helpers on the destination side to discard these pages.
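The discard calculation amounts to intersecting the sentmap with the dirty bitmap and coalescing runs of set bits into ranges. Below is a minimal sketch of that idea, not the patch's implementation: bool arrays stand in for the real bitmaps, and SketchRange and sketch_discard_ranges are hypothetical names. Pages that are dirty but were never sent need no discard, since the destination never received them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    size_t start;   /* first page index of the run */
    size_t npages;  /* number of consecutive pages */
} SketchRange;

/* Write runs of pages that were sent during precopy and dirtied again
 * since; returns the number of ranges written (at most max_ranges). */
static size_t sketch_discard_ranges(const bool *sent, const bool *dirty,
                                    size_t npages, SketchRange *out,
                                    size_t max_ranges)
{
    size_t count = 0;
    size_t i = 0;

    while (i < npages && count < max_ranges) {
        if (!(sent[i] && dirty[i])) {
            i++;
            continue;
        }
        size_t start = i;
        while (i < npages && sent[i] && dirty[i]) {
            i++;
        }
        out[count].start = start;
        out[count].npages = i - start;
        count++;
    }
    return count;
}
```

Sending ranges rather than single pages keeps the discard message small when dirty pages are clustered, which is the common case for a busy guest.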

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c                      | 271 ++++++++++++++++++++++++++++++++++++++-
 include/migration/migration.h    |  12 ++
 include/migration/postcopy-ram.h |  34 +++++
 include/qemu/typedefs.h          |   1 +
 migration.c                      |   2 +
 postcopy-ram.c                   | 111 ++++++++++++++++
 savevm.c                         |   3 -
 7 files changed, 428 insertions(+), 6 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 44072d8..030d189 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -40,6 +40,7 @@
 #include "hw/audio/audio.h"
 #include "sysemu/kvm.h"
 #include "migration/migration.h"
+#include "migration/postcopy-ram.h"
 #include "hw/i386/smbios.h"
 #include "exec/address-spaces.h"
 #include "hw/audio/pcspk.h"
@@ -415,9 +416,15 @@ static int save_xbzrle_page(QEMUFile *f, uint8_t **current_data,
     return bytes_sent;
 }
 
+/* mr: The region to search for dirty pages in
+ * start: Start address (typically so we can continue from previous page)
+ * bitoffset: Pointer into which to store the offset into the dirty map
+ *            at which the bit was found.
+ */
 static inline
 ram_addr_t migration_bitmap_find_and_reset_dirty(MemoryRegion *mr,
-                                                 ram_addr_t start)
+                                                 ram_addr_t start,
+                                                 unsigned long *bitoffset)
 {
     unsigned long base = mr->ram_addr >> TARGET_PAGE_BITS;
     unsigned long nr = base + (start >> TARGET_PAGE_BITS);
@@ -436,6 +443,7 @@ ram_addr_t migration_bitmap_find_and_reset_dirty(MemoryRegion *mr,
         clear_bit(next, migration_bitmap);
         migration_dirty_pages--;
     }
+    *bitoffset = next;
     return (next - base) << TARGET_PAGE_BITS;
 }
 
@@ -564,6 +572,19 @@ static void migration_bitmap_sync(void)
     }
 }
 
+static RAMBlock *ram_find_block(const char *id)
+{
+    RAMBlock *block;
+
+    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+        if (!strcmp(id, block->idstr)) {
+            return block;
+        }
+    }
+
+    return NULL;
+}
+
 /*
  * ram_save_page: Send the given page to the stream
  *
@@ -652,13 +673,14 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage)
     bool complete_round = false;
     int bytes_sent = 0;
     MemoryRegion *mr;
+    unsigned long bitoffset;
 
     if (!block)
         block = QTAILQ_FIRST(&ram_list.blocks);
 
     while (true) {
         mr = block->mr;
-        offset = migration_bitmap_find_and_reset_dirty(mr, offset);
+        offset = migration_bitmap_find_and_reset_dirty(mr, offset, &bitoffset);
         if (complete_round && block == last_seen_block &&
             offset >= last_offset) {
             break;
@@ -676,6 +698,11 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage)
 
             /* if page is unmodified, continue to the next */
             if (bytes_sent > 0) {
+                MigrationState *s = migrate_get_current();
+                if (s->sentmap) {
+                    set_bit(bitoffset, s->sentmap);
+                }
+
                 last_sent_block = block;
                 break;
             }
@@ -735,12 +762,19 @@ void free_xbzrle_decoded_buf(void)
 
 static void migration_end(void)
 {
+    MigrationState *s = migrate_get_current();
+
     if (migration_bitmap) {
         memory_global_dirty_log_stop();
         g_free(migration_bitmap);
         migration_bitmap = NULL;
     }
 
+    if (s->sentmap) {
+        g_free(s->sentmap);
+        s->sentmap = NULL;
+    }
+
     XBZRLE_cache_lock();
     if (XBZRLE.cache) {
         cache_fini(XBZRLE.cache);
@@ -808,6 +842,232 @@ void ram_debug_dump_bitmap(unsigned long *todump, bool expected)
     }
 }
 
+/* **** functions for postcopy ***** */
+
+/*
+ * A helper to get 32 bits from a bitmap; trivial for HOST_LONG_BITS=32,
+ * messier for 64, since the bitmaps are arrays of 32- or 64-bit longs
+ */
+static uint32_t get_32bits_map(unsigned long *map, int64_t start)
+{
+#if HOST_LONG_BITS == 64
+    uint64_t tmp64;
+
+    tmp64 = map[start / 64];
+    return (start & 32) ? (tmp64 >> 32) : (tmp64 & 0xffffffffu);
+#elif HOST_LONG_BITS == 32
+    /*
+     * Irrespective of host endianness, sentmap[n] is for pages earlier
+     * than sentmap[n+1] so we can't just cast up
+     */
+    return map[start / 32];
+#else
+#error "Host long other than 64/32 not supported"
+#endif
+}
+
+/*
+ * A helper to put 32 bits into a bitmap; trivial for HOST_LONG_BITS=32,
+ * messier for 64, since the bitmaps are arrays of 32- or 64-bit longs
+ */
+__attribute__ (( unused )) /* Until later in patch series */
+static void put_32bits_map(unsigned long *map, int64_t start,
+                           uint32_t v)
+{
+#if HOST_LONG_BITS == 64
+    uint64_t tmp64 = v;
+    uint64_t mask = 0xffffffffu;
+
+    if (start & 32) {
+        tmp64 = tmp64 << 32;
+        mask =  mask << 32;
+    }
+
+    map[start / 64] = (map[start / 64] & ~mask) | tmp64;
+#elif HOST_LONG_BITS == 32
+    /*
+     * Irrespective of host endianness, sentmap[n] is for pages earlier
+     * than sentmap[n+1] so we can't just cast up
+     */
+    map[start / 32] = v;
+#else
+#error "Host long other than 64/32 not supported"
+#endif
+}
+
+/*
+ * When working on 32bit chunks of a bitmap where the only valid section
+ * is between start..end (inclusive), generate a mask with only those
+ * valid bits set for the current 32bit word within that bitmask.
+ */
+static uint32_t make_32bit_mask(unsigned long start, unsigned long end,
+                                unsigned long cur32)
+{
+    unsigned long first32, last32;
+    uint32_t mask = ~(uint32_t)0;
+    first32 = start / 32;
+    last32 = end / 32;
+
+    if ((cur32 == first32) && (start & 31)) {
+        /* e.g. (start & 31) = 3
+         *         1 << .    -> 2^3
+         *         . - 1     -> 2^3 - 1 i.e. mask 2..0
+         *         ~.        -> mask 31..3
+         */
+        mask &= ~((((uint32_t)1) << (start & 31)) - 1);
+    }
+
+    if ((cur32 == last32) && ((end & 31) != 31)) {
+        /* e.g. (end & 31) = 3
+         *            .   +1 -> 4
+         *         1 << .    -> 2^4
+         *         . -1      -> 2^4 - 1
+         *                   = mask set 3..0
+         */
+        mask &= (((uint32_t)1) << ((end & 31) + 1)) - 1;
+    }
+
+    return mask;
+}
+
+/*
+ * Callback from pc_each_ram_discard for each RAMBlock
+ * start,end: Indexes into the bitmap for the first and last bit
+ *            representing the named block
+ */
+static int pc_send_discard_bm_ram(MigrationState *ms,
+                                  PostcopyDiscardState *pds,
+                                  unsigned long start, unsigned long end)
+{
+    /*
+     * There is no guarantee that start, end are on convenient 32bit multiples
+     * (We always send 32bit chunks over the wire, irrespective of long size)
+     */
+    unsigned long first32, last32, cur32;
+    first32 = start / 32;
+    last32 = end / 32;
+
+    for (cur32 = first32; cur32 <= last32; cur32++) {
+        /* Deal with start/end not on alignment */
+        uint32_t mask = make_32bit_mask(start, end, cur32);
+
+        uint32_t data = get_32bits_map(ms->sentmap, cur32 * 32);
+        data &= mask;
+
+        if (data) {
+            postcopy_discard_send_chunk(ms, pds, (cur32-first32) * 32, data);
+        }
+    }
+
+    return 0;
+}
+
+/*
+ * Utility for the outgoing postcopy code.
+ *   Calls pc_send_discard_bm_ram for each RAMBlock,
+ *   passing it the bitmap indexes and name.
+ * Returns: 0 on success
+ * (qemu_ram_foreach_block isn't used since it passes unscaled lengths,
+ *  which would mean the postcopy code would have to deal with target pages)
+ */
+static int pc_each_ram_discard(MigrationState *ms)
+{
+    struct RAMBlock *block;
+    int ret;
+
+    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+        unsigned long first = block->offset >> TARGET_PAGE_BITS;
+        unsigned long last = (block->offset + (block->length-1))
+                                >> TARGET_PAGE_BITS;
+        PostcopyDiscardState *pds = postcopy_discard_send_init(ms,
+                                                               first & 31,
+                                                               block->idstr);
+
+        /*
+         * Postcopy sends chunks of the bitmap over the wire, but at this
+         * point it only needs the indexes; this avoids it needing
+         * target-page-specific code.
+         */
+        ret = pc_send_discard_bm_ram(ms, pds, first, last);
+        postcopy_discard_send_finish(ms, pds);
+        if (ret) {
+            return ret;
+        }
+    }
+
+    return 0;
+}
+
+/*
+ * Transmit the set of pages to be discarded after precopy to the target;
+ * these are pages that have been sent previously but have since been dirtied.
+ * Hopefully this is pretty sparse.
+ */
+int ram_postcopy_send_discard_bitmap(MigrationState *ms)
+{
+    /* This should be our last sync, the src is now paused */
+    migration_bitmap_sync();
+
+    /*
+     * Update the sentmap to be sentmap &= dirty
+     */
+    bitmap_and(ms->sentmap, ms->sentmap, migration_bitmap,
+               last_ram_offset() >> TARGET_PAGE_BITS);
+
+
+    DPRINTF("Dumping merged sentmap");
+#ifdef DEBUG_POSTCOPY
+    ram_debug_dump_bitmap(ms->sentmap, false);
+#endif
+
+    return pc_each_ram_discard(ms);
+}
+
+/*
+ * At the start of the postcopy phase of migration, any now-dirty
+ * precopied pages are discarded.
+ *
+ * start..end is an inclusive range of bits indexed in the source
+ *    VM's bitmap for this RAMBlock, source_target_page_bits tells
+ *    us what one of those bits represents.
+ *
+ * start/end are offsets from the start of the bitmap for RAMBlock 'block_name'
+ *
+ * Returns 0 on success.
+ */
+int ram_discard_range(MigrationIncomingState *mis,
+                      const char *block_name,
+                      uint64_t start, uint64_t end)
+{
+    assert(end >= start);
+
+    RAMBlock *rb = ram_find_block(block_name);
+
+    if (!rb) {
+        error_report("ram_discard_range: Failed to find block '%s'",
+                     block_name);
+        return -1;
+    }
+
+    uint64_t index_offset = rb->offset >> TARGET_PAGE_BITS;
+    postcopy_pmi_discard_range(mis, start + index_offset, (end - start) + 1);
+
+    /* +1 gives the byte after the end of the last page to be discarded */
+    ram_addr_t end_offset = (end+1) << TARGET_PAGE_BITS;
+    uint8_t *host_startaddr = rb->host + (start << TARGET_PAGE_BITS);
+    uint8_t *host_endaddr;
+
+    if (end_offset <= rb->length) {
+        host_endaddr   = rb->host + (end_offset-1);
+        return postcopy_ram_discard_range(mis, host_startaddr, host_endaddr);
+    } else {
+        error_report("ram_discard_range: Overrun block '%s' (%" PRIu64
+                     "/%" PRIu64 "/%zu)",
+                     block_name, start, end, rb->length);
+        return -1;
+    }
+}
+
 static int ram_save_setup(QEMUFile *f, void *opaque)
 {
     RAMBlock *block;
@@ -846,7 +1106,6 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 
         acct_clear();
     }
-
     qemu_mutex_lock_iothread();
     qemu_mutex_lock_ramlist();
     bytes_transferred = 0;
@@ -856,6 +1115,12 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     migration_bitmap = bitmap_new(ram_bitmap_pages);
     bitmap_set(migration_bitmap, 0, ram_bitmap_pages);
 
+    if (migrate_postcopy_ram()) {
+        MigrationState *s = migrate_get_current();
+        s->sentmap = bitmap_new(ram_bitmap_pages);
+        bitmap_clear(s->sentmap, 0, ram_bitmap_pages);
+    }
+
     /*
      * Count the total number of pages used by ram blocks not including any
      * gaps due to alignment or unplugs.
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 1405a15..73d338e 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -122,6 +122,13 @@ struct MigrationState
 
     /* Flag set once the migration has been asked to enter postcopy */
     volatile bool start_postcopy;
+
+    /* Bitmap of pages that have been sent at least once;
+     * currently only maintained and used by postcopy,
+     * where it's used to send the discard bitmap at the start
+     * of the postcopy phase.
+     */
+    unsigned long *sentmap;
 };
 
 void process_incoming_migration(QEMUFile *f);
@@ -191,6 +198,11 @@ double xbzrle_mig_cache_miss_rate(void);
 
 void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
 void ram_debug_dump_bitmap(unsigned long *todump, bool expected);
+/* For outgoing discard bitmap */
+int ram_postcopy_send_discard_bitmap(MigrationState *ms);
+/* For incoming postcopy discard */
+int ram_discard_range(MigrationIncomingState *mis, const char *block_name,
+                      uint64_t start, uint64_t end);
 
 /**
  * @migrate_add_blocker - prevent migration from proceeding
diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index addb88a..2a39a03 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -28,4 +28,38 @@ void postcopy_pmi_destroy(MigrationIncomingState *mis);
 void postcopy_pmi_discard_range(MigrationIncomingState *mis,
                                 size_t start, size_t npages);
 void postcopy_pmi_dump(MigrationIncomingState *mis);
+
+/*
+ * Discard the contents of memory start..end inclusive.
+ * If we've been called, we can assume postcopy_ram_hosttest returned true
+ */
+int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
+                               uint8_t *end);
+
+
+/*
+ * Called at the start of each RAMBlock by the bitmap code
+ * offset is the bit within the first 32bit chunk of mask
+ * that represents the first page of the RAM Block
+ * Returns a new PDS
+ */
+PostcopyDiscardState *postcopy_discard_send_init(MigrationState *ms,
+                                                 uint8_t offset,
+                                                 const char *name);
+
+/*
+ * Called by the bitmap code for each chunk to discard
+ * May send a discard message, may just leave it queued to
+ * be sent later
+ */
+void postcopy_discard_send_chunk(MigrationState *ms, PostcopyDiscardState *pds,
+                                unsigned long pos, uint32_t bitmap);
+
+/*
+ * Called at the end of each RAMBlock by the bitmap code
+ * Sends any outstanding discard messages, frees the PDS
+ */
+void postcopy_discard_send_finish(MigrationState *ms,
+                                  PostcopyDiscardState *pds);
+
 #endif
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 61b330c..79f57c0 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -63,6 +63,7 @@ typedef struct PCIEAERLog PCIEAERLog;
 typedef struct PCIEAERErr PCIEAERErr;
 typedef struct PCIEPort PCIEPort;
 typedef struct PCIESlot PCIESlot;
+typedef struct PostcopyDiscardState PostcopyDiscardState;
 typedef struct QEMUSizedBuffer QEMUSizedBuffer;
 typedef struct MSIMessage MSIMessage;
 typedef struct SerialState SerialState;
diff --git a/migration.c b/migration.c
index 29ee740..db860c9 100644
--- a/migration.c
+++ b/migration.c
@@ -22,6 +22,7 @@
 #include "block/block.h"
 #include "qemu/sockets.h"
 #include "migration/block.h"
+#include "migration/postcopy-ram.h"
 #include "qemu/thread.h"
 #include "qmp-commands.h"
 #include "trace.h"
@@ -947,6 +948,7 @@ static void *migration_thread(void *opaque)
             } else {
                 int ret;
 
+                DPRINTF("done iterating\n");
                 qemu_mutex_lock_iothread();
                 start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
                 qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
diff --git a/postcopy-ram.c b/postcopy-ram.c
index 210585c..76f992f 100644
--- a/postcopy-ram.c
+++ b/postcopy-ram.c
@@ -39,6 +39,19 @@
     do { } while (0)
 #endif
 
+#define MAX_DISCARDS_PER_COMMAND 12
+
+struct PostcopyDiscardState {
+    const char *name;
+    uint16_t cur_entry;
+    uint64_t addrlist[MAX_DISCARDS_PER_COMMAND];
+    uint32_t masklist[MAX_DISCARDS_PER_COMMAND];
+    uint8_t  offset;  /* Offset within 32bit mask at addr0 representing 1st
+                         page of block */
+    unsigned int nsentwords;
+    unsigned int nsentcmds;
+};
+
 /* Postcopy needs to detect accesses to pages that haven't yet been copied
  * across, and efficiently map new pages in, the techniques for doing this
  * are target OS specific.
@@ -360,6 +373,21 @@ out:
     return ret;
 }
 
+/*
+ * Discard the contents of memory start..end inclusive.
+ * If we've been called, we can assume postcopy_ram_hosttest returned true
+ */
+int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
+                               uint8_t *end)
+{
+    if (madvise(start, (end-start)+1, MADV_DONTNEED)) {
+        perror("postcopy_ram_discard_range MADV_DONTNEED");
+        return -1;
+    }
+
+    return 0;
+}
+
 #else
 /* No target OS support, stubs just fail */
 
@@ -376,5 +404,88 @@ void postcopy_hook_early_receive(MigrationIncomingState *mis,
     /* We don't support postcopy so don't care */
 }
 
+void postcopy_pmi_destroy(MigrationIncomingState *mis)
+{
+    /* Called in normal cleanup path - so it's OK */
+}
+
+void postcopy_pmi_discard_range(MigrationIncomingState *mis,
+                                size_t start, size_t npages)
+{
+    assert(0);
+}
+
+int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
+                               uint8_t *end)
+{
+    assert(0);
+}
 #endif
 
+/* ------------------------------------------------------------------------- */
+
+/*
+ * Called at the start of each RAMBlock by the bitmap code
+ * offset is the bit within the first 32bit chunk of mask
+ * that represents the first page of the RAM Block
+ * Returns a new PDS
+ */
+PostcopyDiscardState *postcopy_discard_send_init(MigrationState *ms,
+                                                 uint8_t offset,
+                                                 const char *name)
+{
+    PostcopyDiscardState *res = g_try_malloc(sizeof(PostcopyDiscardState));
+
+    if (res) {
+        res->name = name;
+        res->cur_entry = 0;
+        res->nsentwords = 0;
+        res->nsentcmds = 0;
+        res->offset = offset;
+    }
+
+    return res;
+}
+
+/*
+ * Called by the bitmap code for each chunk to discard
+ * May send a discard message, may just leave it queued to
+ * be sent later
+ */
+void postcopy_discard_send_chunk(MigrationState *ms, PostcopyDiscardState *pds,
+                                unsigned long pos, uint32_t bitmap)
+{
+    pds->addrlist[pds->cur_entry] = pos;
+    pds->masklist[pds->cur_entry] = bitmap;
+    pds->cur_entry++;
+    pds->nsentwords++;
+
+    if (pds->cur_entry == MAX_DISCARDS_PER_COMMAND) {
+        /* Full set, ship it! */
+        qemu_savevm_send_postcopy_ram_discard(ms->file, pds->name,
+                                              pds->cur_entry, pds->offset,
+                                              pds->addrlist, pds->masklist);
+        pds->nsentcmds++;
+        pds->cur_entry = 0;
+    }
+}
+
+/*
+ * Called at the end of each RAMBlock by the bitmap code
+ * Sends any outstanding discard messages, frees the PDS
+ */
+void postcopy_discard_send_finish(MigrationState *ms, PostcopyDiscardState *pds)
+{
+    /* Anything unsent? */
+    if (pds->cur_entry) {
+        qemu_savevm_send_postcopy_ram_discard(ms->file, pds->name,
+                                              pds->cur_entry, pds->offset,
+                                              pds->addrlist, pds->masklist);
+        pds->nsentcmds++;
+    }
+
+    DPRINTF("%s: '%s' mask words sent=%u in %u commands",
+            __func__, pds->name, pds->nsentwords, pds->nsentcmds);
+
+    g_free(pds);
+}
diff --git a/savevm.c b/savevm.c
index 7c4541d..7f9e0b2 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1267,15 +1267,12 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
              * we know there must be at least 1 bit set due to the loop entry
              * If there is no 0 firstzero will be 32
              */
-            /* TODO - ram_discard_range gets added in a later patch
             int ret = ram_discard_range(mis, ramid,
                                 startaddr + firstset - first_bit_offset,
                                 startaddr + (firstzero - 1) - first_bit_offset);
-            ret = -1;
             if (ret) {
                 return ret;
             }
-            */
 
             /* mask= .?0000000000 */
             /*         ^fz ^fs    */
-- 
1.9.3

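The per-RAMBlock 32-bit chunking in pc_send_discard_bm_ram() relies on
make_32bit_mask() and get_32bits_map(). A standalone sketch of the same
arithmetic, assuming unsigned long is 32 or 64 bits wide and using the
hypothetical names set_page_bit, get_32bits and masked_chunk, shows how
bits belonging to neighbouring blocks are masked off before a chunk is
queued:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_UL (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper: set the bit for 'page' in an unsigned long bitmap. */
static void set_page_bit(unsigned long *map, size_t page)
{
    map[page / BITS_PER_UL] |= 1UL << (page % BITS_PER_UL);
}

/* Return the 32 bits starting at bit 'start' (a multiple of 32). */
static uint32_t get_32bits(const unsigned long *map, size_t start)
{
    assert(start % 32 == 0);
    if (BITS_PER_UL == 32) {
        return (uint32_t)map[start / 32];
    }
    /* 64-bit longs: bit 5 of 'start' selects the high or low half */
    uint64_t word = map[start / BITS_PER_UL];
    return (uint32_t)((start & 32) ? (word >> 32) : word);
}

/* Mask of the bits in 32-bit word 'cur32' that lie within start..end. */
static uint32_t make_32bit_mask(size_t start, size_t end, size_t cur32)
{
    uint32_t mask = UINT32_MAX;

    assert(start <= end);
    if (cur32 == start / 32 && (start & 31)) {
        /* Clear the bits below 'start' in the first word */
        mask &= ~((UINT32_C(1) << (start & 31)) - 1);
    }
    if (cur32 == end / 32 && (end & 31) != 31) {
        /* Clear the bits above 'end' in the last word */
        mask &= (UINT32_C(1) << ((end & 31) + 1)) - 1;
    }
    return mask;
}

/* Bits of chunk 'cur32' that are set in 'map' and lie within start..end. */
static uint32_t masked_chunk(const unsigned long *map, size_t start,
                             size_t end, size_t cur32)
{
    return get_32bits(map, cur32 * 32) & make_32bit_mask(start, end, cur32);
}
```

Returning uint32_t from the mask helper keeps the whole calculation
unsigned; a block spanning bits 3..40 yields masks 0xFFFFFFF8 and 0x1FF
for its two words, so a bit set at index 2 (owned by the previous block)
never leaks into this block's discard message.
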

* [Qemu-devel] [PATCH v4 31/47] postcopy: Incoming initialisation
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (29 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 30/47] Postcopy: Maintain sentmap and calculate discard Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-11-05  6:47   ` David Gibson
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 32/47] postcopy: ram_enable_notify to switch on userfault Dr. David Alan Gilbert (git)
                   ` (17 subsequent siblings)
  48 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

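init_area() below relies on two madvise() effects. A minimal Linux-only
sketch, assuming private anonymous memory (as for ordinary guest RAM) and
using the hypothetical helpers alloc_anon, area_is_zero and
prepare_area_for_postcopy, shows that MADV_DONTNEED leaves the area
reading back as zeroes, followed by MADV_NOHUGEPAGE. Unlike the patch, the
sketch tolerates EINVAL from MADV_NOHUGEPAGE, which kernels built without
transparent huge page support return:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map 'len' bytes of private anonymous memory. */
static unsigned char *alloc_anon(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: return 1 if every byte of the area is zero. */
static int area_is_zero(const unsigned char *p, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (p[i]) {
            return 0;
        }
    }
    return 1;
}

/* Roughly what init_area() does for one page-aligned RAMBlock. */
static int prepare_area_for_postcopy(void *host_addr, size_t length)
{
    /* Drop the current contents; the next access sees zero-filled pages */
    if (madvise(host_addr, length, MADV_DONTNEED)) {
        perror("prepare_area_for_postcopy: MADV_DONTNEED");
        return -1;
    }
#ifdef MADV_NOHUGEPAGE
    /* Keep THP from backing the area with 2MB pages */
    if (madvise(host_addr, length, MADV_NOHUGEPAGE) && errno != EINVAL) {
        perror("prepare_area_for_postcopy: MADV_NOHUGEPAGE");
        return -1;
    }
#endif
    return 0;
}
```
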
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c                      |  11 ++++
 include/migration/migration.h    |   1 +
 include/migration/postcopy-ram.h |  12 +++++
 migration.c                      |   1 +
 postcopy-ram.c                   | 110 ++++++++++++++++++++++++++++++++++++++-
 savevm.c                         |   4 ++
 6 files changed, 138 insertions(+), 1 deletion(-)

diff --git a/arch_init.c b/arch_init.c
index 030d189..4a03171 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -1345,6 +1345,17 @@ void ram_handle_compressed(void *host, uint8_t ch, uint64_t size)
     }
 }
 
+/*
+ * Allocate data structures etc. needed by incoming migration with postcopy-ram;
+ * postcopy-ram's similarly named postcopy_ram_incoming_init does the work.
+ */
+int ram_postcopy_incoming_init(MigrationIncomingState *mis)
+{
+    size_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
+
+    return postcopy_ram_incoming_init(mis, ram_pages);
+}
+
 static int ram_load(QEMUFile *f, void *opaque, int version_id)
 {
     ram_addr_t addr;
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 73d338e..be63c89 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -203,6 +203,7 @@ int ram_postcopy_send_discard_bitmap(MigrationState *ms);
 /* For incoming postcopy discard */
 int ram_discard_range(MigrationIncomingState *mis, const char *block_name,
                       uint64_t start, uint64_t end);
+int ram_postcopy_incoming_init(MigrationIncomingState *mis);
 
 /**
  * @migrate_add_blocker - prevent migration from proceeding
diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index 2a39a03..8f237a2 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -19,6 +19,18 @@
 int postcopy_ram_hosttest(void);
 
 /*
+ * Initialise postcopy-ram, setting the RAM to a state where we can go into
+ * postcopy later; must be called prior to any precopy.
+ * called from arch_init's similarly named ram_postcopy_incoming_init
+ */
+int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages);
+
+/*
+ * Clean up at the end of a migration where postcopy_ram_incoming_init was called.
+ */
+int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis);
+
+/*
  * In 'advise' mode record that a page has been received.
  */
 void postcopy_hook_early_receive(MigrationIncomingState *mis,
diff --git a/migration.c b/migration.c
index db860c9..63d70b6 100644
--- a/migration.c
+++ b/migration.c
@@ -99,6 +99,7 @@ MigrationIncomingState *migration_incoming_state_init(QEMUFile* f)
 
 void migration_incoming_state_destroy(void)
 {
+    postcopy_pmi_destroy(mis_current);
     g_free(mis_current);
     mis_current = NULL;
 }
diff --git a/postcopy-ram.c b/postcopy-ram.c
index 76f992f..8eccf26 100644
--- a/postcopy-ram.c
+++ b/postcopy-ram.c
@@ -104,7 +104,6 @@ struct PostcopyDiscardState {
 /* It's a pair of bitmaps (of the same structure as the migration bitmaps)*/
 /* holding one bit per target-page, although all operations work on host  */
 /* pages.                                                                 */
-__attribute__ (( unused )) /* Until later in patch series */
 static void postcopy_pmi_init(MigrationIncomingState *mis, size_t ram_pages)
 {
     unsigned int tpb = qemu_target_page_bits();
@@ -388,6 +387,104 @@ int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
     return 0;
 }
 
+/*
+ * Set up an area of RAM so that it *can* be used for postcopy later; this
+ * must be done right at the start prior to pre-copy.
+ * opaque should be the MIS.
+ */
+static int init_area(const char *block_name, void *host_addr,
+                     ram_addr_t offset, ram_addr_t length, void *opaque)
+{
+    MigrationIncomingState *mis = opaque;
+
+    DPRINTF("init_area: %s: %p offset=%zx length=%zd(%zx)",
+            block_name, host_addr, offset, length, length);
+    /*
+     * We need the whole of RAM to be truly empty for postcopy, so things
+     * like ROMs and any data tables built during init must be zeroed
+     * - we're going to get the copy from the source anyway.
+     */
+    if (postcopy_ram_discard_range(mis, host_addr, (host_addr + length - 1))) {
+        return -1;
+    }
+
+    /*
+     * We also need the area to be normal 4k pages, not huge pages
+     * (otherwise we can't be sure we can use remap_anon_pages to put
+     * a 4k page in later).  THP might come along and map a 2MB page
+     * and when it's partially accessed in precopy it might not break
+     * it down, but leave a 2MB zeroed page.
+     */
+    if (madvise(host_addr, length, MADV_NOHUGEPAGE)) {
+        perror("init_area: NOHUGEPAGE");
+        return -1;
+    }
+
+    return 0;
+}
+
+/*
+ * At the end of migration, undo the effects of init_area
+ * opaque should be the MIS.
+ */
+static int cleanup_area(const char *block_name, void *host_addr,
+                        ram_addr_t offset, ram_addr_t length, void *opaque)
+{
+    /* Turn off userfault here as well? */
+
+    DPRINTF("cleanup_area: %s: %p offset=%zx length=%zd(%zx)",
+            block_name, host_addr, offset, length, length);
+    /*
+     * We turned off huge pages for the precopy stage with postcopy enabled;
+     * we can turn them back on now.
+     */
+    if (madvise(host_addr, length, MADV_HUGEPAGE)) {
+        perror("cleanup_area: HUGEPAGE");
+        return -1;
+    }
+
+    /*
+     * We can also turn off userfault now since we should have all the
+     * pages.  It can be useful to leave it on when debugging postcopy
+     * if you're not sure it's always getting every page.
+     */
+    if (madvise(host_addr, length, MADV_NOUSERFAULT)) {
+        perror("cleanup_area: NOUSERFAULT");
+        return -1;
+    }
+
+    return 0;
+}
+
+/*
+ * Initialise postcopy-ram, setting the RAM to a state where we can go into
+ * postcopy later; must be called prior to any precopy.
+ * called from arch_init's similarly named ram_postcopy_incoming_init
+ */
+int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages)
+{
+    postcopy_pmi_init(mis, ram_pages);
+
+    if (qemu_ram_foreach_block(init_area, mis)) {
+        return -1;
+    }
+
+    return 0;
+}
+
+/*
+ * Clean up at the end of a migration where postcopy_ram_incoming_init was called.
+ */
+int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
+{
+    /* TODO: Join the fault thread once we're sure it will exit */
+    if (qemu_ram_foreach_block(cleanup_area, mis)) {
+        return -1;
+    }
+
+    return 0;
+}
+
 #else
 /* No target OS support, stubs just fail */
 
@@ -404,6 +501,17 @@ void postcopy_hook_early_receive(MigrationIncomingState *mis,
     /* We don't support postcopy so don't care */
 }
 
+int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages)
+{
+    error_report("postcopy_ram_incoming_init: No OS support");
+    return -1;
+}
+
+int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
+{
+    assert(0);
+}
+
 void postcopy_pmi_destroy(MigrationIncomingState *mis)
 {
     /* Called in normal cleanup path - so it's OK */
diff --git a/savevm.c b/savevm.c
index 7f9e0b2..54bdb26 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1166,6 +1166,10 @@ static int loadvm_postcopy_ram_handle_advise(MigrationIncomingState *mis,
         return -1;
     }
 
+    if (ram_postcopy_incoming_init(mis)) {
+        return -1;
+    }
+
     mis->postcopy_ram_state = POSTCOPY_RAM_INCOMING_ADVISE;
 
     /*
-- 
1.9.3


* [Qemu-devel] [PATCH v4 32/47] postcopy: ram_enable_notify to switch on userfault
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (30 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 31/47] postcopy: Incoming initialisation Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-10-04 16:42   ` Paolo Bonzini
  2014-11-05  6:49   ` David Gibson
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 33/47] Postcopy: Postcopy startup in migration thread Dr. David Alan Gilbert (git)
                   ` (16 subsequent siblings)
  48 siblings, 2 replies; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

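postcopy_ram_sensitise_area() below registers each RAMBlock by writing two
64-bit words to mis->userfault_fd. As a sketch of that encoding only, taken
from the code in this patch and valid only for the experimental kernel ABI
this series targets (the helper name encode_userfault_register is
hypothetical), the first word is the start address with bit 0 set as the
"register" flag, and the second is the exclusive end address:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Build the 16-byte "register this range" request, mirroring tokern[]
 * in postcopy_ram_sensitise_area().
 */
static void encode_userfault_register(uint64_t req[2], const void *host_addr,
                                      uint64_t length)
{
    uint64_t start = (uint64_t)(uintptr_t)host_addr;

    /* Bit 0 carries the flag, so the start address must not use it */
    assert((start & 1) == 0);
    req[0] = start | 1;        /* 1 means "register area" */
    req[1] = start + length;   /* first byte past the area */
}
```

Bit 0 is presumably free because RAMBlock host addresses are page aligned,
which is what the assertion checks.
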
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h    |  2 ++
 include/migration/postcopy-ram.h |  6 +++++
 postcopy-ram.c                   | 49 +++++++++++++++++++++++++++++++++++++++-
 savevm.c                         |  9 ++++++++
 4 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index be63c89..b01cc17 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -87,6 +87,8 @@ struct MigrationIncomingState {
         POSTCOPY_RAM_INCOMING_END
     } postcopy_ram_state;
 
+    /* For the kernel to send us notifications */
+    int            userfault_fd;
     QEMUFile *return_path;
     QemuMutex      rp_mutex;    /* We send replies from multiple threads */
     PostcopyPMI    postcopy_pmi;
diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index 8f237a2..413b670 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -19,6 +19,12 @@
 int postcopy_ram_hosttest(void);
 
 /*
+ * Make all of RAM sensitive to accesses to areas that haven't yet been written
+ * and wire up anything necessary to deal with it.
+ */
+int postcopy_ram_enable_notify(MigrationIncomingState *mis);
+
+/*
  * Initialise postcopy-ram, setting the RAM to a state where we can go into
  * postcopy later; must be called prior to any precopy.
  * called from arch_init's similarly named ram_postcopy_incoming_init
diff --git a/postcopy-ram.c b/postcopy-ram.c
index 8eccf26..925ac77 100644
--- a/postcopy-ram.c
+++ b/postcopy-ram.c
@@ -485,9 +485,51 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
     return 0;
 }
 
+/*
+ * Mark the given area of RAM to notify us of accesses to unwritten pages.
+ * Used as a callback on qemu_ram_foreach_block.
+ *   host_addr: Base of area to mark
+ *   offset: Offset in the whole ram arena
+ *   length: Length of the section
+ *   opaque: The MigrationIncomingState
+ * Returns 0 on success
+ */
+static int postcopy_ram_sensitise_area(const char *block_name, void *host_addr,
+                                       ram_addr_t offset, ram_addr_t length,
+                                       void *opaque)
+{
+    MigrationIncomingState *mis = opaque;
+    uint64_t tokern[2];
+
+    if (madvise(host_addr, length, MADV_USERFAULT)) {
+        perror("postcopy_ram_sensitise_area madvise");
+        return -1;
+    }
+
+    /* Now tell our userfault_fd that it's responsible for this area */
+    tokern[0] = (uint64_t)(uintptr_t)host_addr | 1; /* 1 means register area */
+    tokern[1] = (uint64_t)(uintptr_t)host_addr + length;
+    if (write(mis->userfault_fd, tokern, 16) != 16) {
+        perror("postcopy_ram_sensitise_area write");
+        madvise(host_addr, length, MADV_NOUSERFAULT);
+        return -1;
+    }
+
+    return 0;
+}
+
+int postcopy_ram_enable_notify(MigrationIncomingState *mis)
+{
+    /* Mark so that we get notified of accesses to unwritten areas */
+    if (qemu_ram_foreach_block(postcopy_ram_sensitise_area, mis)) {
+        return -1;
+    }
+
+    return 0;
+}
+
 #else
 /* No target OS support, stubs just fail */
-
 int postcopy_ram_hosttest(void)
 {
     error_report("postcopy_ram_hosttest: No OS support");
@@ -528,6 +570,11 @@ int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
 {
     assert(0);
 }
+
+int postcopy_ram_enable_notify(MigrationIncomingState *mis)
+{
+    assert(0);
+}
 #endif
 
 /* ------------------------------------------------------------------------- */
diff --git a/savevm.c b/savevm.c
index 54bdb26..859c96f 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1304,6 +1304,15 @@ static int loadvm_postcopy_ram_handle_listen(MigrationIncomingState *mis)
 
     mis->postcopy_ram_state = POSTCOPY_RAM_INCOMING_LISTENING;
 
+    /*
+     * Sensitise RAM: accesses to pages not yet received can now generate
+     * requests. However, the CPU shouldn't be running yet and no IO should
+     * be in progress, so we don't actually expect any requests here.
+     */
+    if (postcopy_ram_enable_notify(mis)) {
+        return -1;
+    }
+
     /* TODO start up the postcopy listening thread */
     return 0;
 }
-- 
1.9.3
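For illustration, a minimal sketch of the 16-byte registration message that postcopy_ram_sensitise_area() writes to userfault_fd in this version: the first word is the page-aligned start address with bit 0 set to mean "register", and the second is host_addr + length, i.e. one past the end of the range. The helper names are hypothetical, and the layout belongs to the kernel userfault ABI this series was written against, so treat it as an assumption rather than a stable interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the 16-byte message written to userfault_fd
 * in this patch: word 0 = start | 1 ("register"), word 1 = start + length.
 * This mirrors the patch only; it is not a documented kernel API. */
typedef struct {
    uint64_t words[2];
} uf_range_msg;

static uf_range_msg uf_encode_register(uint64_t start, uint64_t length)
{
    uf_range_msg m;

    /* Page alignment guarantees bit 0 of start is free for the flag */
    assert((start & 1) == 0);
    m.words[0] = start | 1;        /* 1 means register area */
    m.words[1] = start + length;   /* one past the end of the range */
    return m;
}

/* Split a message back into its range; returns true for "register". */
static bool uf_decode(const uf_range_msg *m, uint64_t *start, uint64_t *end)
{
    bool is_register = m->words[0] & 1;

    *start = m->words[0] & ~(uint64_t)1;   /* strip the flag bit */
    *end = m->words[1];
    return is_register;
}
```

Using the low bit as a flag only works because every registered area starts on a page boundary.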

* [Qemu-devel] [PATCH v4 33/47] Postcopy: Postcopy startup in migration thread
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (31 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 32/47] postcopy: ram_enable_notify to switch on userfault Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-10-04 16:27   ` Paolo Bonzini
  2014-11-10  6:05   ` David Gibson
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 34/47] Postcopy: Create a fault handler thread before marking the ram as userfault Dr. David Alan Gilbert (git)
                   ` (15 subsequent siblings)
  48 siblings, 2 replies; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Rework the migration thread to set up and start postcopy.
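Why passing current_active_type (rather than a hard-coded MIG_STATE_ACTIVE) matters: a minimal sketch, assuming migrate_set_state() only performs the transition when the state still equals the expected old value, i.e. a compare-and-swap. Once the thread has moved to POSTCOPY_ACTIVE, an ACTIVE -> ERROR transition would silently do nothing. The enum values and sketch_set_state() are hypothetical stand-ins built on C11 atomics.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for the MIG_STATE_* values used in the patch */
enum sketch_state {
    SK_SETUP, SK_ACTIVE, SK_POSTCOPY_ACTIVE, SK_COMPLETED, SK_ERROR
};

/* Move *state from old_state to new_state only if it currently holds
 * old_state; returns 1 if the transition happened, 0 otherwise. */
static int sketch_set_state(_Atomic int *state, int old_state, int new_state)
{
    int expected = old_state;

    return atomic_compare_exchange_strong(state, &expected, new_state);
}
```

Tracking the active type in one variable keeps the error and completion paths correct whether or not postcopy was entered.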

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |   3 +
 migration.c                   | 201 ++++++++++++++++++++++++++++++++++++++----
 2 files changed, 185 insertions(+), 19 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index b01cc17..f401775 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -125,6 +125,9 @@ struct MigrationState
     /* Flag set once the migration has been asked to enter postcopy */
     volatile bool start_postcopy;
 
+    /* Flag set once the migration thread is running (and needs joining) */
+    volatile bool started_migration_thread;
+
     /* bitmap of pages that have been sent at least once
      * only maintained and used in postcopy at the moment
      * where it's used to send the dirtymap at the start
diff --git a/migration.c b/migration.c
index 63d70b6..1731017 100644
--- a/migration.c
+++ b/migration.c
@@ -475,7 +475,10 @@ static void migrate_fd_cleanup(void *opaque)
     if (s->file) {
         trace_migrate_fd_cleanup();
         qemu_mutex_unlock_iothread();
-        qemu_thread_join(&s->thread);
+        if (s->started_migration_thread) {
+            qemu_thread_join(&s->thread);
+            s->started_migration_thread = false;
+        }
         qemu_mutex_lock_iothread();
 
         qemu_fclose(s->file);
@@ -872,7 +875,6 @@ out:
     return NULL;
 }
 
-__attribute__ (( unused )) /* Until later in patch series */
 static int open_outgoing_return_path(MigrationState *ms)
 {
 
@@ -890,7 +892,6 @@ static int open_outgoing_return_path(MigrationState *ms)
     return 0;
 }
 
-__attribute__ (( unused )) /* Until later in patch series */
 static void await_outgoing_return_path_close(MigrationState *ms)
 {
     /*
@@ -908,6 +909,97 @@ static void await_outgoing_return_path_close(MigrationState *ms)
     DPRINTF("%s: Exit", __func__);
 }
 
+/* Switch from normal iteration to postcopy
+ * Returns non-0 on error
+ */
+static int postcopy_start(MigrationState *ms)
+{
+    int ret;
+    const QEMUSizedBuffer *qsb;
+    migrate_set_state(ms, MIG_STATE_ACTIVE, MIG_STATE_POSTCOPY_ACTIVE);
+
+    DPRINTF("postcopy_start\n");
+    qemu_mutex_lock_iothread();
+    DPRINTF("postcopy_start: setting run state\n");
+    ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
+
+    if (ret < 0) {
+        migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
+        qemu_mutex_unlock_iothread();
+        return -1;
+    }
+
+    /*
+     * In FINISH_MIGRATE and with the iothread lock held, everything should
+     * be quiet, but we potentially still have dirty pages, so we need to
+     * tell the destination to discard any pages it has already received
+     * that are now dirty.
+     */
+    if (ram_postcopy_send_discard_bitmap(ms)) {
+        DPRINTF("postcopy send discard bitmap failed\n");
+        migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
+        qemu_mutex_unlock_iothread();
+        return -1;
+    }
+
+    DPRINTF("postcopy_start: sending req 2\n");
+    qemu_savevm_send_reqack(ms->file, 2);
+    /*
+     * Send the rest of the state; note that anything doing postcopy
+     * will notice we're in MIG_STATE_POSTCOPY_ACTIVE and won't actually
+     * wrap its state up here.
+     */
+    qemu_file_set_rate_limit(ms->file, INT64_MAX);
+    DPRINTF("postcopy_start: do state_complete\n");
+
+    /*
+     * We need to leave the fd free for page transfers during the
+     * loading of the device state, so wrap all the remaining
+     * commands and state into a package that gets sent in one go
+     */
+    QEMUFile *fb = qemu_bufopen("w", NULL);
+    if (!fb) {
+        error_report("Failed to create buffered file");
+        migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
+        qemu_mutex_unlock_iothread();
+        return -1;
+    }
+
+    qemu_savevm_state_complete(fb);
+    DPRINTF("postcopy_start: sending req 3\n");
+    qemu_savevm_send_reqack(fb, 3);
+
+    qemu_savevm_send_postcopy_ram_run(fb);
+
+    /* <><> end of stuff going into the package */
+    qsb = qemu_buf_get(fb);
+
+    /* Now send that blob */
+    if (qsb_get_length(qsb) > MAX_VM_CMD_PACKAGED_SIZE) {
+        DPRINTF("postcopy_start: Unreasonably large packaged state: %lu\n",
+                (unsigned long)(qsb_get_length(qsb)));
+        migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
+        qemu_mutex_unlock_iothread();
+        qemu_fclose(fb);
+        return -1;
+    }
+    qemu_savevm_send_packaged(ms->file, qsb);
+    qemu_fclose(fb);
+
+    qemu_mutex_unlock_iothread();
+
+    DPRINTF("postcopy_start not finished sending ack\n");
+    qemu_savevm_send_reqack(ms->file, 4);
+
+    ret = qemu_file_get_error(ms->file);
+    if (ret) {
+        error_report("postcopy_start: Migration stream errored");
+        migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
+    }
+
+    return ret;
+}
+
 /*
  * Master migration thread on the source VM.
  * It drives the migration and pumps the data down the outgoing channel.
@@ -915,16 +1007,36 @@ static void await_outgoing_return_path_close(MigrationState *ms)
 static void *migration_thread(void *opaque)
 {
     MigrationState *s = opaque;
+    /* Used by the bandwidth calcs, updated later */
     int64_t initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
     int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
     int64_t initial_bytes = 0;
     int64_t max_size = 0;
     int64_t start_time = initial_time;
+
     bool old_vm_running = false;
 
+    /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */
+    enum MigrationPhase current_active_type = MIG_STATE_ACTIVE;
+
     qemu_savevm_state_begin(s->file, &s->params);
 
+    if (migrate_postcopy_ram()) {
+        /* Now tell the dest that it should open its end so it can reply */
+        qemu_savevm_send_openrp(s->file);
+
+        /* And ask it to send an ack that will make stuff easier to debug */
+        qemu_savevm_send_reqack(s->file, 1);
+
+        /* Tell the destination that we *might* want to do postcopy later;
+         * if the other end can't do postcopy it should fail now, nice and
+         * early.
+         */
+        qemu_savevm_send_postcopy_ram_advise(s->file);
+    }
+
     s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
+    current_active_type = MIG_STATE_ACTIVE;
     migrate_set_state(s, MIG_STATE_SETUP, MIG_STATE_ACTIVE);
 
     DPRINTF("setup complete\n");
@@ -945,37 +1057,74 @@ static void *migration_thread(void *opaque)
                     " nonpost=%" PRIu64 ")\n",
                     pending_size, max_size, pend_post, pend_nonpost);
             if (pending_size && pending_size >= max_size) {
+                /* Still a significant amount to transfer */
+
+                current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+                if (migrate_postcopy_ram() &&
+                    s->state != MIG_STATE_POSTCOPY_ACTIVE &&
+                    pend_nonpost == 0 && s->start_postcopy) {
+
+                    if (!postcopy_start(s)) {
+                        current_active_type = MIG_STATE_POSTCOPY_ACTIVE;
+                    }
+
+                    continue;
+                }
+                /* Just another iteration step */
                 qemu_savevm_state_iterate(s->file);
             } else {
                 int ret;
 
-                DPRINTF("done iterating\n");
-                qemu_mutex_lock_iothread();
-                start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
-                qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
-                old_vm_running = runstate_is_running();
+                DPRINTF("done iterating pending size %" PRIu64 "\n",
+                        pending_size);
+
+                if (s->state == MIG_STATE_ACTIVE) {
+                    qemu_mutex_lock_iothread();
+                    start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+                    qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
+                    old_vm_running = runstate_is_running();
+
+                    ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
+                    if (ret >= 0) {
+                        qemu_file_set_rate_limit(s->file, INT64_MAX);
+                        qemu_savevm_state_complete(s->file);
+                    }
+                    qemu_mutex_unlock_iothread();
+
+                    if (ret < 0) {
+                        migrate_set_state(s, current_active_type,
+                                          MIG_STATE_ERROR);
+                        break;
+                    }
+                } else if (s->state == MIG_STATE_POSTCOPY_ACTIVE) {
+                    DPRINTF("postcopy end\n");
+
+                    qemu_savevm_state_postcopy_complete(s->file);
+                    DPRINTF("postcopy end after complete\n");
 
-                ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
-                if (ret >= 0) {
-                    qemu_file_set_rate_limit(s->file, INT64_MAX);
-                    qemu_savevm_state_complete(s->file);
                 }
-                qemu_mutex_unlock_iothread();
 
-                if (ret < 0) {
-                    migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_ERROR);
-                    break;
+                /*
+                 * If rp was opened we must clean up the thread before
+                 * cleaning everything else up.
+                 * Postcopy opens rp if enabled (even if it's not activated)
+                 */
+                if (migrate_postcopy_ram()) {
+                    DPRINTF("before rp close");
+                    await_outgoing_return_path_close(s);
+                    DPRINTF("after rp close");
                 }
-
                 if (!qemu_file_get_error(s->file)) {
-                    migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_COMPLETED);
+                    migrate_set_state(s, current_active_type,
+                                      MIG_STATE_COMPLETED);
                     break;
                 }
             }
         }
 
         if (qemu_file_get_error(s->file)) {
-            migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_ERROR);
+            migrate_set_state(s, current_active_type, MIG_STATE_ERROR);
+            DPRINTF("migration_thread: file is in error state\n");
             break;
         }
         current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
@@ -1006,6 +1155,7 @@ static void *migration_thread(void *opaque)
         }
     }
 
+    DPRINTF("migration_thread: After loop");
     qemu_mutex_lock_iothread();
     if (s->state == MIG_STATE_COMPLETED) {
         int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
@@ -1043,6 +1193,19 @@ void migrate_fd_connect(MigrationState *s)
     /* Notify before starting migration thread */
     notifier_list_notify(&migration_state_notifiers, s);
 
+    /* Open the return path; currently for postcopy but other things might
+     * also want it.
+     */
+    if (migrate_postcopy_ram()) {
+        if (open_outgoing_return_path(s)) {
+            error_report("Unable to open return-path for postcopy");
+            migrate_set_state(s, MIG_STATE_SETUP, MIG_STATE_ERROR);
+            migrate_fd_cleanup(s);
+            return;
+        }
+    }
+
     qemu_thread_create(&s->thread, "migration", migration_thread, s,
                        QEMU_THREAD_JOINABLE);
+    s->started_migration_thread = true;
 }
-- 
1.9.3

* [Qemu-devel] [PATCH v4 34/47] Postcopy: Create a fault handler thread before marking the ram as userfault
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (32 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 33/47] Postcopy: Postcopy startup in migration thread Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-11-10  6:10   ` David Gibson
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 35/47] Page request: Add MIG_RPCOMM_REQPAGES reverse command Dr. David Alan Gilbert (git)
                   ` (14 subsequent siblings)
  48 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  3 +++
 postcopy-ram.c                | 23 +++++++++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index f401775..cdd0e56 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -87,6 +87,9 @@ struct MigrationIncomingState {
         POSTCOPY_RAM_INCOMING_END
     } postcopy_ram_state;
 
+    QemuThread     fault_thread;
+    QemuSemaphore  fault_thread_sem;
+
     /* For the kernel to send us notifications */
     int            userfault_fd;
     QEMUFile *return_path;
diff --git a/postcopy-ram.c b/postcopy-ram.c
index 925ac77..8b2a035 100644
--- a/postcopy-ram.c
+++ b/postcopy-ram.c
@@ -518,8 +518,31 @@ static int postcopy_ram_sensitise_area(const char *block_name, void *host_addr,
     return 0;
 }
 
+/*
+ * Handle faults detected by the USERFAULT markings
+ */
+static void *postcopy_ram_fault_thread(void *opaque)
+{
+    MigrationIncomingState *mis = (MigrationIncomingState *)opaque;
+
+    fprintf(stderr, "postcopy_ram_fault_thread\n");
+    /* TODO: In later patch */
+    qemu_sem_post(&mis->fault_thread_sem);
+    while (1) {
+        /* TODO: In later patch */
+    }
+
+    return NULL;
+}
+
 int postcopy_ram_enable_notify(MigrationIncomingState *mis)
 {
+    /* Create the fault handler thread and wait for it to be ready */
+    qemu_sem_init(&mis->fault_thread_sem, 0);
+    qemu_thread_create(&mis->fault_thread, "postcopy/fault",
+                       postcopy_ram_fault_thread, mis, QEMU_THREAD_JOINABLE);
+    qemu_sem_wait(&mis->fault_thread_sem);
+
     /* Mark so that we get notified of accesses to unwritten areas */
     if (qemu_ram_foreach_block(postcopy_ram_sensitise_area, mis)) {
         return -1;
-- 
1.9.3
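Patch 34 uses a semaphore handshake so that RAM is only marked as userfault once the fault thread is running. A minimal sketch of the same pattern with plain pthreads and POSIX semaphores in place of qemu_thread_create() and qemu_sem_*(); the fault_ctx type and its functions are hypothetical. Unlike the placeholder in the patch, which spins in while (1), the sketch's worker blocks until told to stop so that it can be joined.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

/* Hypothetical state mirroring fault_thread / fault_thread_sem */
typedef struct {
    pthread_t thread;
    sem_t ready_sem;   /* posted by the worker once it is set up */
    sem_t stop_sem;    /* posted by the creator to let the worker exit */
    int worker_ready;  /* written by the worker before posting ready_sem */
} fault_ctx;

static void *fault_worker(void *opaque)
{
    fault_ctx *ctx = opaque;

    /* ... per-thread setup would go here ... */
    ctx->worker_ready = 1;
    sem_post(&ctx->ready_sem);   /* tell the creator we are running */
    /* A real handler would block servicing faults; the sketch just
     * waits until asked to stop so that it can be joined. */
    sem_wait(&ctx->stop_sem);
    return NULL;
}

/* Start the worker and return only once it has signalled readiness;
 * after that it is safe to arm RAM for fault notification. */
static int fault_worker_start(fault_ctx *ctx)
{
    ctx->worker_ready = 0;
    if (sem_init(&ctx->ready_sem, 0, 0) || sem_init(&ctx->stop_sem, 0, 0)) {
        return -1;
    }
    if (pthread_create(&ctx->thread, NULL, fault_worker, ctx)) {
        return -1;
    }
    sem_wait(&ctx->ready_sem);
    return 0;
}

static void fault_worker_stop(fault_ctx *ctx)
{
    sem_post(&ctx->stop_sem);
    pthread_join(ctx->thread, NULL);
    sem_destroy(&ctx->ready_sem);
    sem_destroy(&ctx->stop_sem);
}
```

sem_post()/sem_wait() also order memory, so the creator sees worker_ready == 1 once the wait returns.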

* [Qemu-devel] [PATCH v4 35/47] Page request: Add MIG_RPCOMM_REQPAGES reverse command
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (33 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 34/47] Postcopy: Create a fault handler thread before marking the ram as userfault Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-11-10  6:19   ` David Gibson
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 36/47] Page request: Process incoming page request Dr. David Alan Gilbert (git)
                   ` (13 subsequent siblings)
  48 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add a MIG_RPCOMM_REQPAGES command on the return path so that the postcopy
destination can request pages from the source.
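A minimal sketch of the REQPAGES payload as built by migrate_send_rp_reqpages() and parsed in source_return_path_thread() below: big-endian start and len, with bit 0 of len (free because lengths are page multiples) flagging that a one-byte name length and the RAMBlock name follow. The helpers are hypothetical; QEMU itself uses cpu_to_be64()/be64_to_cpup(). This decoder rejects a payload whose length doesn't match its header, whereas the handler in the patch as posted still calls migrate_handle_rp_reqpages() after source_return_path_bad().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static void put_be64(uint8_t *p, uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        p[i] = v & 0xff;
        v >>= 8;
    }
}

static uint64_t get_be64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Encode start/len plus an optional RAMBlock name (no NUL is sent).
 * buf must hold 16 + 1 + 255 bytes. Returns the payload length. */
static size_t reqpages_encode(uint8_t *buf, const char *rbname,
                              uint64_t start, uint64_t len)
{
    size_t msglen = 16;

    assert(!(len & 1));             /* page-multiple lengths are even */
    if (rbname) {
        size_t n = strlen(rbname);
        assert(n < 256);
        len |= 1;                   /* flag: a RAMBlock name follows */
        buf[msglen++] = (uint8_t)n;
        memcpy(buf + msglen, rbname, n);
        msglen += n;
    }
    put_be64(buf, start);
    put_be64(buf + 8, len);
    return msglen;
}

/* Decode msglen bytes; name (256 bytes) gets the NUL-terminated name or
 * "" if none was sent. Returns 0, or -1 on a length mismatch. */
static int reqpages_decode(const uint8_t *buf, size_t msglen, char *name,
                           uint64_t *start, uint64_t *len)
{
    size_t expected = 16;

    if (msglen < 16) {
        return -1;
    }
    *start = get_be64(buf);
    *len = get_be64(buf + 8);
    name[0] = '\0';
    if (*len & 1) {
        *len &= ~(uint64_t)1;       /* remove the flag */
        if (msglen < 17) {
            return -1;
        }
        expected = 17 + buf[16];
        if (msglen != expected) {
            return -1;
        }
        memcpy(name, buf + 17, buf[16]);
        name[buf[16]] = '\0';
    }
    return msglen == expected ? 0 : -1;
}
```

Omitting the name when it matches the previous request keeps the common case to a fixed 16 bytes.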

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  3 ++
 migration.c                   | 74 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 77 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index cdd0e56..5e0d30d 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -45,6 +45,7 @@ enum mig_rpcomm_cmd {
     MIG_RPCOMM_INVALID = 0,  /* Must be 0 */
     MIG_RPCOMM_SHUT,         /* sibling will not send any more RP messages */
     MIG_RPCOMM_ACK,          /* data (seq: be32 ) */
+    MIG_RPCOMM_REQPAGES,     /* data (start: be64, len: be64) */
     MIG_RPCOMM_AFTERLASTVALID
 };
 
@@ -250,6 +251,8 @@ void migrate_send_rp_shut(MigrationIncomingState *mis,
                           uint32_t value);
 void migrate_send_rp_ack(MigrationIncomingState *mis,
                          uint32_t value);
+void migrate_send_rp_reqpages(MigrationIncomingState *mis, const char* rbname,
+                              ram_addr_t start, ram_addr_t len);
 
 
 void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
diff --git a/migration.c b/migration.c
index 1731017..cfdaa52 100644
--- a/migration.c
+++ b/migration.c
@@ -144,6 +144,38 @@ void migrate_send_rp_ack(MigrationIncomingState *mis,
     migrate_send_rp_message(mis, MIG_RPCOMM_ACK, 4, (uint8_t *)&buf);
 }
 
+/* Request a range of pages from the source VM at the given
+ * start address.
+ *   rbname: Name of the RAMBlock to request the page in; if NULL, it's the
+ *           same as the last request (a name must have been given previously)
+ *   start: Address offset within the RAMBlock
+ *   len: Length in bytes required; must be a multiple of the page size
+ */
+void migrate_send_rp_reqpages(MigrationIncomingState *mis, const char *rbname,
+                              ram_addr_t start, ram_addr_t len)
+{
+    uint8_t bufc[16+1+255]; /* start, len (8 bytes each), rbname (1 + <=255) */
+    uint64_t *buf64 = (uint64_t *)bufc;
+    size_t msglen = 16; /* start + len */
+
+    assert(!(len & 1));
+    if (rbname) {
+        int rbname_len = strlen(rbname);
+        assert(rbname_len < 256);
+
+        len |= 1; /* Flag to say we've got a name */
+        bufc[msglen++] = rbname_len;
+        memcpy(bufc + msglen, rbname, rbname_len);
+        msglen += rbname_len;
+    }
+
+    buf64[0] = (uint64_t)start;
+    buf64[0] = cpu_to_be64(buf64[0]);
+    buf64[1] = (uint64_t)len;
+    buf64[1] = cpu_to_be64(buf64[1]);
+    migrate_send_rp_message(mis, MIG_RPCOMM_REQPAGES, msglen, bufc);
+}
+
 void qemu_start_incoming_migration(const char *uri, Error **errp)
 {
     const char *p;
@@ -784,6 +816,17 @@ static void source_return_path_bad(MigrationState *s)
 }
 
 /*
+ * Process a request for pages received on the return path.
+ * We're allowed to send more than requested (e.g. to round to our page size)
+ * and we don't need to send pages that have already been sent.
+ */
+static void migrate_handle_rp_reqpages(MigrationState *ms, const char* rbname,
+                                       ram_addr_t start, ram_addr_t len)
+{
+    DPRINTF("migrate_handle_rp_reqpages: at %zx for len %zx", start, len);
+}
+
+/*
  * Handles messages sent on the return path towards the source VM
  *
  */
@@ -795,6 +838,8 @@ static void *source_return_path_thread(void *opaque)
     const int max_len = 512;
     uint8_t buf[max_len];
     uint32_t tmp32;
+    uint64_t tmp64a, tmp64b;
+    char *tmpstr;
     int res;
 
     DPRINTF("RP: %s entry", __func__);
@@ -810,6 +855,11 @@ static void *source_return_path_thread(void *opaque)
             expected_len = 4;
             break;
 
+        case MIG_RPCOMM_REQPAGES:
+            /* 16 byte start/len _possibly_ plus an id str */
+            expected_len = 16 + 256;
+            break;
+
         default:
             error_report("RP: Received invalid cmd 0x%04x length 0x%04x",
                     header_com, header_len);
@@ -857,6 +907,30 @@ static void *source_return_path_thread(void *opaque)
             atomic_xchg(&ms->rp_state.latest_ack, tmp32);
             break;
 
+        case MIG_RPCOMM_REQPAGES:
+            tmp64a = be64_to_cpup((uint64_t *)buf);  /* Start */
+            tmp64b = be64_to_cpup(((uint64_t *)buf)+1); /* Len */
+            tmpstr = NULL;
+            if (tmp64b & 1) {
+                tmp64b -= 1; /* Remove the flag */
+                /* Now we expect an idstr */
+                tmp32 = buf[16]; /* Length of the following idstr */
+                tmpstr = (char *)&buf[17];
+                buf[17+tmp32] = '\0';
+                expected_len = 16+1+tmp32;
+            } else {
+                expected_len = 16;
+            }
+            if (header_len != expected_len) {
+                error_report("RP: Received ReqPage with length %d expecting %d",
+                        header_len, expected_len);
+                source_return_path_bad(ms);
+            }
+            migrate_handle_rp_reqpages(ms, tmpstr,
+                                          (ram_addr_t)tmp64a,
+                                          (ram_addr_t)tmp64b);
+            break;
+
         default:
             /* This shouldn't happen because we should catch this above */
             DPRINTF("RP: Bad header_com in dispatch");
-- 
1.9.3

* [Qemu-devel] [PATCH v4 36/47] Page request: Process incoming page request
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (34 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 35/47] Page request: Add MIG_RPCOMM_REQPAGES reverse command Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-10-08  2:31   ` zhanghailiang
  2014-11-10  6:31   ` David Gibson
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 37/47] Page request: Consume pages off the post-copy queue Dr. David Alan Gilbert (git)
                   ` (12 subsequent siblings)
  48 siblings, 2 replies; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

On receiving MIG_RPCOMM_REQPAGES, look up the address and
queue the requested pages.
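The handler in this patch widens each request to whole host pages (sysconf(_SC_PAGESIZE)) before queueing it: start is rounded down and len grows so that the original end is still covered. A minimal sketch of that arithmetic, assuming a power-of-two page size; round_to_host_pages() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Widen [start, start + len) so both ends fall on page_size boundaries.
 * page_size must be a power of two. */
static void round_to_host_pages(uint64_t *start, uint64_t *len,
                                uint64_t page_size)
{
    uint64_t mask = page_size - 1;
    uint64_t lead = *start & mask;

    assert(page_size && !(page_size & mask));  /* power of two */
    *start -= lead;                  /* round start down */
    *len += lead;                    /* keep the original end covered */
    *len = (*len + mask) & ~mask;    /* round length up */
}
```

For example, a 1 KiB request at 0x1800 with 4 KiB pages becomes the single page at 0x1000.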

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c                   | 52 +++++++++++++++++++++++++++++++++++++++++++
 include/migration/migration.h | 21 +++++++++++++++++
 include/qemu/typedefs.h       |  3 ++-
 migration.c                   | 34 +++++++++++++++++++++++++++-
 4 files changed, 108 insertions(+), 2 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 4a03171..72f9e17 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -660,6 +660,58 @@ static int ram_save_page(QEMUFile *f, RAMBlock* block, ram_addr_t offset,
 }
 
 /*
+ * Queue the pages for transmission, e.g. a request from postcopy destination
+ *   ms: MigrationState in which the queue is held
+ *   rbname: The RAMBlock the request is for - may be NULL (to mean reuse last)
+ *   start: Offset from the start of the RAMBlock
+ *   len: Length (in bytes) to send
+ *   Return: 0 on success
+ */
+int ram_save_queue_pages(MigrationState *ms, const char *rbname,
+                         ram_addr_t start, ram_addr_t len)
+{
+    RAMBlock *ramblock;
+
+    if (!rbname) {
+        /* Reuse last RAMBlock */
+        ramblock = ms->last_req_rb;
+
+        if (!ramblock) {
+            error_report("ram_save_queue_pages no previous block");
+            return -1;
+        }
+    } else {
+        ramblock = ram_find_block(rbname);
+
+        if (!ramblock) {
+            error_report("ram_save_queue_pages no block '%s'", rbname);
+            return -1;
+        }
+    }
+    DPRINTF("ram_save_queue_pages: Block %s start %zx len %zx",
+                    ramblock->idstr, start, len);
+
+    if (start+len > ramblock->length) {
+        error_report("%s request overrun start=%zx len=%zx blocklen=%zx",
+                     __func__, start, len, ramblock->length);
+        return -1;
+    }
+
+    struct MigrationSrcPageRequest *new_entry =
+        g_malloc0(sizeof(struct MigrationSrcPageRequest));
+    new_entry->rb = ramblock;
+    new_entry->offset = start;
+    new_entry->len = len;
+    ms->last_req_rb = ramblock;
+
+    qemu_mutex_lock(&ms->src_page_req_mutex);
+    QSIMPLEQ_INSERT_TAIL(&ms->src_page_requests, new_entry, next_req);
+    qemu_mutex_unlock(&ms->src_page_req_mutex);
+
+    return 0;
+}
+
+/*
  * ram_find_and_save_block: Finds a page to send and sends it to f
  *
  * Returns:  The number of bytes written.
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 5e0d30d..5bc01d5 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -102,6 +102,18 @@ MigrationIncomingState *migration_incoming_get_current(void);
 MigrationIncomingState *migration_incoming_state_init(QEMUFile *f);
 void migration_incoming_state_destroy(void);
 
+/*
+ * An outstanding page request, on the source, having been received
+ * and queued
+ */
+struct MigrationSrcPageRequest {
+    RAMBlock *rb;
+    hwaddr    offset;
+    hwaddr    len;
+
+    QSIMPLEQ_ENTRY(MigrationSrcPageRequest) next_req;
+};
+
 struct MigrationState
 {
     int64_t bandwidth_limit;
@@ -138,6 +150,12 @@ struct MigrationState
      * of the postcopy phase
      */
     unsigned long *sentmap;
+
+    /* Queue of outstanding page requests from the destination */
+    QemuMutex src_page_req_mutex;
+    QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest) src_page_requests;
+    /* The RAMBlock used in the last src_page_request */
+    RAMBlock *last_req_rb;
 };
 
 void process_incoming_migration(QEMUFile *f);
@@ -273,4 +291,7 @@ size_t ram_control_save_page(QEMUFile *f, ram_addr_t block_offset,
                              ram_addr_t offset, size_t size,
                              int *bytes_sent);
 
+int ram_save_queue_pages(MigrationState *ms, const char *rbname,
+                         ram_addr_t start, ram_addr_t len);
+
 #endif
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 79f57c0..24c2207 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -8,6 +8,7 @@ typedef struct QEMUTimerListGroup QEMUTimerListGroup;
 typedef struct QEMUFile QEMUFile;
 typedef struct QEMUBH QEMUBH;
 
+typedef struct AdapterInfo AdapterInfo;
 typedef struct AioContext AioContext;
 
 typedef struct Visitor Visitor;
@@ -80,6 +81,6 @@ typedef struct FWCfgState FWCfgState;
 typedef struct PcGuestInfo PcGuestInfo;
 typedef struct PostcopyPMI PostcopyPMI;
 typedef struct Range Range;
-typedef struct AdapterInfo AdapterInfo;
+typedef struct RAMBlock RAMBlock;
 
 #endif /* QEMU_TYPEDEFS_H */
diff --git a/migration.c b/migration.c
index cfdaa52..63d7699 100644
--- a/migration.c
+++ b/migration.c
@@ -26,6 +26,8 @@
 #include "qemu/thread.h"
 #include "qmp-commands.h"
 #include "trace.h"
+#include "exec/memory.h"
+#include "exec/address-spaces.h"
 
 //#define DEBUG_MIGRATION
 
@@ -504,6 +506,15 @@ static void migrate_fd_cleanup(void *opaque)
 
     migrate_fd_cleanup_src_rp(s);
 
+    /* This queue should generally be empty, but after a failed
+     * migration it might still contain some leftover requests.
+     */
+    struct MigrationSrcPageRequest *mspr, *next_mspr;
+    QSIMPLEQ_FOREACH_SAFE(mspr, &s->src_page_requests, next_req, next_mspr) {
+        QSIMPLEQ_REMOVE_HEAD(&s->src_page_requests, next_req);
+        g_free(mspr);
+    }
+
     if (s->file) {
         trace_migrate_fd_cleanup();
         qemu_mutex_unlock_iothread();
@@ -610,6 +621,9 @@ MigrationState *migrate_init(const MigrationParams *params)
     s->state = MIG_STATE_SETUP;
     trace_migrate_set_state(MIG_STATE_SETUP);
 
+    qemu_mutex_init(&s->src_page_req_mutex);
+    QSIMPLEQ_INIT(&s->src_page_requests);
+
     s->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
     return s;
 }
@@ -823,7 +837,25 @@ static void source_return_path_bad(MigrationState *s)
 static void migrate_handle_rp_reqpages(MigrationState *ms, const char* rbname,
                                        ram_addr_t start, ram_addr_t len)
 {
-    DPRINTF("migrate_handle_rp_reqpages: at %zx for len %zx", start, len);
+    DPRINTF("migrate_handle_rp_reqpages: in %s start %zx len %zx",
+            rbname, start, len);
+
+    /* Round start down and len up to whole host pages */
+    long our_host_ps = sysconf(_SC_PAGESIZE);
+    if (start & (our_host_ps-1)) {
+        long roundings = start & (our_host_ps-1);
+        start -= roundings;
+        len += roundings;
+    }
+    if (len & (our_host_ps-1)) {
+        long roundings = len & (our_host_ps-1);
+        len -= roundings;
+        len += our_host_ps;
+    }
+
+    if (ram_save_queue_pages(ms, rbname, start, len)) {
+        source_return_path_bad(ms);
+    }
 }
 
 /*
-- 
1.9.3

* [Qemu-devel] [PATCH v4 37/47] Page request: Consume pages off the post-copy queue
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (35 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 36/47] Page request: Process incoming page request Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-10-04 18:04   ` Paolo Bonzini
  2014-11-11  1:13   ` David Gibson
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 38/47] Add assertion to check migration_dirty_pages Dr. David Alan Gilbert (git)
                   ` (11 subsequent siblings)
  48 siblings, 2 replies; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

When transmitting RAM pages, consume pages that have been queued by
MIG_RPCOMM_REQPAGE commands and send them ahead of normal page scanning.

Note:
  a) After a queued page, the linear walk carries on from just after that
unqueued page; there is a reasonable chance that the destination
was about to ask for other nearby pages anyway.

  b) We have to be careful of any assumptions that the page-walking
code makes; in particular, it takes some shortcuts on its first linear
walk that break as soon as we send a queued page.
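The ordering described above can be sketched roughly as follows: queued pages go first, then the linear walk resumes just after whatever was sent. This is a simplified model, not the QEMU implementation, and it makes these assumptions:

* struct page_req, struct page_queue, next_page() and SKETCH_PAGE_SIZE are hypothetical.
* A single flat address space replaces RAMBlocks.
* The src_page_req_mutex locking, the dirty bitmap and the wrap-around handling of ram_find_and_save_block() are all omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096u   /* hypothetical target page size */

/* Hypothetical stand-in for a queued page request. */
struct page_req {
    uint64_t offset;             /* byte offset of the next page to send */
    uint64_t len;                /* bytes still outstanding */
    struct page_req *next;
};

struct page_queue {
    struct page_req *head, *tail;
};

/* Append a request for len bytes at offset (no locking in this sketch). */
static void queue_push(struct page_queue *q, uint64_t offset, uint64_t len)
{
    struct page_req *r = malloc(sizeof(*r));

    assert(r != NULL);
    r->offset = offset;
    r->len = len;
    r->next = NULL;
    if (q->tail) {
        q->tail->next = r;
    } else {
        q->head = r;
    }
    q->tail = r;
}

/* Take one page off the head request; false if the queue is empty. */
static bool queue_pop_page(struct page_queue *q, uint64_t *offset)
{
    struct page_req *r = q->head;

    if (!r) {
        return false;
    }
    *offset = r->offset;
    if (r->len > SKETCH_PAGE_SIZE) {
        /* More pages remain in this request: advance it in place. */
        r->len -= SKETCH_PAGE_SIZE;
        r->offset += SKETCH_PAGE_SIZE;
    } else {
        /* Last page of the request: unlink and free the entry. */
        q->head = r->next;
        if (!q->head) {
            q->tail = NULL;
        }
        free(r);
    }
    return true;
}

/*
 * Pick the next page to send: a queued page if there is one, otherwise
 * the current linear-walk position.  Either way the walk resumes just
 * after the page that was picked.
 */
static uint64_t next_page(struct page_queue *q, uint64_t *scan_offset)
{
    uint64_t offset;

    if (!queue_pop_page(q, &offset)) {
        offset = *scan_offset;
    }
    *scan_offset = offset + SKETCH_PAGE_SIZE;
    return offset;
}
```

Splitting a multi-page request one page at a time mirrors how ram_save_unqueue_page() shrinks the head entry rather than removing it.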

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c | 149 ++++++++++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 125 insertions(+), 24 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 72f9e17..a945990 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -331,6 +331,7 @@ static RAMBlock *last_seen_block;
 /* This is the last block from where we have sent data */
 static RAMBlock *last_sent_block;
 static ram_addr_t last_offset;
+static bool last_was_from_queue;
 static unsigned long *migration_bitmap;
 static uint64_t migration_dirty_pages;
 static uint32_t last_version;
@@ -460,6 +461,19 @@ static inline bool migration_bitmap_set_dirty(ram_addr_t addr)
     return ret;
 }
 
+static inline bool migration_bitmap_clear_dirty(ram_addr_t addr)
+{
+    bool ret;
+    int nr = addr >> TARGET_PAGE_BITS;
+
+    ret = test_and_clear_bit(nr, migration_bitmap);
+
+    if (ret) {
+        migration_dirty_pages--;
+    }
+    return ret;
+}
+
 static void migration_bitmap_sync_range(ram_addr_t start, ram_addr_t length)
 {
     ram_addr_t addr;
@@ -660,6 +674,39 @@ static int ram_save_page(QEMUFile *f, RAMBlock* block, ram_addr_t offset,
 }
 
 /*
+ * Unqueue a page from the queue fed by postcopy page requests
+ *
+ * Returns:   The RAMBlock* to transmit from (or NULL if the queue is empty)
+ *      ms:   MigrationState in
+ *  offset:   the byte offset within the RAMBlock for the start of the page
+ * bitoffset: global offset in the dirty/sent bitmaps
+ */
+static RAMBlock *ram_save_unqueue_page(MigrationState *ms, ram_addr_t *offset,
+                                       unsigned long *bitoffset)
+{
+    RAMBlock *result = NULL;
+    qemu_mutex_lock(&ms->src_page_req_mutex);
+    if (!QSIMPLEQ_EMPTY(&ms->src_page_requests)) {
+        struct MigrationSrcPageRequest *entry =
+                                    QSIMPLEQ_FIRST(&ms->src_page_requests);
+        result = entry->rb;
+        *offset = entry->offset;
+        *bitoffset = (entry->offset + entry->rb->offset) >> TARGET_PAGE_BITS;
+
+        if (entry->len > TARGET_PAGE_SIZE) {
+            entry->len -= TARGET_PAGE_SIZE;
+            entry->offset += TARGET_PAGE_SIZE;
+        } else {
+            QSIMPLEQ_REMOVE_HEAD(&ms->src_page_requests, next_req);
+            g_free(entry);
+        }
+    }
+    qemu_mutex_unlock(&ms->src_page_req_mutex);
+
+    return result;
+}
+
+/*
  * Queue the pages for transmission, e.g. a request from postcopy destination
  *   ms: MigrationStatus in which the queue is held
  *   rbname: The RAMBlock the request is for - may be NULL (to mean reuse last)
@@ -720,44 +767,97 @@ int ram_save_queue_pages(MigrationState *ms, const char *rbname,
 
 static int ram_find_and_save_block(QEMUFile *f, bool last_stage)
 {
+    MigrationState *ms = migrate_get_current();
     RAMBlock *block = last_seen_block;
+    RAMBlock *tmpblock;
     ram_addr_t offset = last_offset;
+    ram_addr_t tmpoffset;
     bool complete_round = false;
     int bytes_sent = 0;
-    MemoryRegion *mr;
     unsigned long bitoffset;
+    unsigned long hps = sysconf(_SC_PAGESIZE);
 
-    if (!block)
+    if (!block) {
         block = QTAILQ_FIRST(&ram_list.blocks);
+        last_was_from_queue = false;
+    }
 
-    while (true) {
-        mr = block->mr;
-        offset = migration_bitmap_find_and_reset_dirty(mr, offset, &bitoffset);
-        if (complete_round && block == last_seen_block &&
-            offset >= last_offset) {
-            break;
+    while (true) { /* Until we send a block or run out of stuff to send */
+        tmpblock = NULL;
+
+        /*
+         * Don't break host-page chunks up with queue items
+         * so only unqueue if,
+         *   a) The last item came from the queue anyway
+         *   b) The last sent item was the last target-page in a host page
+         */
+        if (last_was_from_queue || (!last_sent_block) ||
+            ((last_offset & (hps - 1)) == (hps - TARGET_PAGE_SIZE))) {
+            tmpblock = ram_save_unqueue_page(ms, &tmpoffset, &bitoffset);
         }
-        if (offset >= block->length) {
-            offset = 0;
-            block = QTAILQ_NEXT(block, next);
-            if (!block) {
-                block = QTAILQ_FIRST(&ram_list.blocks);
-                complete_round = true;
-                ram_bulk_stage = false;
+
+        if (tmpblock) {
+            /* We've got a block from the postcopy queue */
+            DPRINTF("%s: Got postcopy item '%s' offset=%zx bitoffset=%zx",
+                    __func__, tmpblock->idstr, tmpoffset, bitoffset);
+            /* We're sending this page, and since it's postcopy nothing else
+             * will dirty it, and we must make sure it doesn't get sent again.
+             */
+            if (!migration_bitmap_clear_dirty(bitoffset << TARGET_PAGE_BITS)) {
+                DPRINTF("%s: Not dirty for postcopy %s/%zx bito=%zx (sent=%d)",
+                        __func__, tmpblock->idstr, tmpoffset, bitoffset,
+                        test_bit(bitoffset, ms->sentmap));
+                continue;
             }
+            /*
+             * As soon as we start servicing pages out of order, we have
+             * to kill the bulk stage, since the bulk stage assumes
+             * (in migration_bitmap_find_and_reset_dirty) that every page is
+             * dirty, and that's no longer true.
+             */
+            ram_bulk_stage = false;
+            /*
+             * We mustn't change block/offset unless it's to a valid one;
+             * otherwise we can go down some of the exit cases in the normal
+             * path.
+             */
+            block = tmpblock;
+            offset = tmpoffset;
+            last_was_from_queue = true;
         } else {
-            bytes_sent = ram_save_page(f, block, offset, last_stage);
-
-            /* if page is unmodified, continue to the next */
-            if (bytes_sent > 0) {
-                MigrationState *s = migrate_get_current();
-                if (s->sentmap) {
-                    set_bit(bitoffset, s->sentmap);
+            MemoryRegion *mr;
+            /* priority queue empty, so just search for something dirty */
+            mr = block->mr;
+            offset = migration_bitmap_find_and_reset_dirty(mr, offset,
+                                                           &bitoffset);
+            if (complete_round && block == last_seen_block &&
+                offset >= last_offset) {
+                break;
+            }
+            if (offset >= block->length) {
+                offset = 0;
+                block = QTAILQ_NEXT(block, next);
+                if (!block) {
+                    block = QTAILQ_FIRST(&ram_list.blocks);
+                    complete_round = true;
+                    ram_bulk_stage = false;
                 }
+                continue; /* pick an offset in the new block */
+            }
+            last_was_from_queue = false;
+        }
 
-                last_sent_block = block;
-                break;
+        /* We have a page to send, so send it */
+        bytes_sent = ram_save_page(f, block, offset, last_stage);
+
+        /* if page is unmodified, continue to the next */
+        if (bytes_sent > 0) {
+            if (ms->sentmap) {
+                set_bit(bitoffset, ms->sentmap);
             }
+
+            last_sent_block = block;
+            break;
         }
     }
     last_seen_block = block;
@@ -851,6 +951,7 @@ static void reset_ram_globals(void)
     last_offset = 0;
     last_version = ram_list.version;
     ram_bulk_stage = true;
+    last_was_from_queue = false;
 }
 
 #define MAX_WAIT 50 /* ms, half buffered_file limit */
-- 
1.9.3

* [Qemu-devel] [PATCH v4 38/47] Add assertion to check migration_dirty_pages
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (36 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 37/47] Page request: Consume pages off the post-copy queue Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-10-04 18:32   ` Paolo Bonzini
  2014-11-11  1:14   ` David Gibson
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 39/47] postcopy_ram.c: place_page and helpers Dr. David Alan Gilbert (git)
                   ` (10 subsequent siblings)
  48 siblings, 2 replies; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

I've seen it go negative once during development; it shouldn't
happen.
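migration_dirty_pages is a uint64_t, so in practice "going negative" means wrapping around to a huge value. The assert protects an invariant: the counter is only decremented when a bit really goes from set to clear. That invariant can be sketched as below. The helper names are hypothetical, and sketch_test_and_clear_bit() only mimics the behaviour of QEMU's test_and_clear_bit().

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Clear bit nr in map and report whether it was set beforehand. */
static bool sketch_test_and_clear_bit(unsigned long nr, unsigned long *map)
{
    unsigned long *word = &map[nr / SKETCH_BITS_PER_LONG];
    unsigned long mask = 1UL << (nr % SKETCH_BITS_PER_LONG);
    bool was_set = (*word & mask) != 0;

    *word &= ~mask;
    return was_set;
}

/*
 * Clear a page's dirty bit and keep the counter in step: decrement only
 * when the bit really went from set to clear, and assert that the
 * unsigned counter is never decremented past zero.
 */
static void sketch_clear_dirty(unsigned long nr, unsigned long *map,
                               uint64_t *dirty_pages)
{
    if (sketch_test_and_clear_bit(nr, map)) {
        assert(*dirty_pages > 0);
        (*dirty_pages)--;
    }
}
```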

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch_init.c b/arch_init.c
index a945990..2f4345a 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -442,6 +442,7 @@ ram_addr_t migration_bitmap_find_and_reset_dirty(MemoryRegion *mr,
 
     if (next < size) {
         clear_bit(next, migration_bitmap);
+        assert(migration_dirty_pages > 0);
         migration_dirty_pages--;
     }
     *bitoffset = next;
-- 
1.9.3

* [Qemu-devel] [PATCH v4 39/47] postcopy_ram.c: place_page and helpers
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (37 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 38/47] Add assertion to check migration_dirty_pages Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-11-11  1:39   ` David Gibson
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 40/47] Postcopy: Use helpers to map pages during migration Dr. David Alan Gilbert (git)
                   ` (9 subsequent siblings)
  48 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

postcopy_place_page (etc.) provide a way for postcopy to place a page
into the guest's memory atomically (using the new remap_anon_pages syscall).
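When the host page is larger than the target page, postcopy_place_page() in this patch defers the remap. It waits until the last target page of a host page has arrived, then works on the whole host page. The index arithmetic can be sketched as below. Here host_bits plays the role of mis->postcopy_pmi.host_bits (target pages per host page, assumed to be a power of two), and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * host_bits is the number of target pages per host page (a power of two),
 * e.g. 16 for 64KiB host pages and 4KiB target pages; 1 when they match.
 */

/* True if this target page is the last one in its host page. */
static bool is_last_tp_in_hp(unsigned long bitmap_offset,
                             unsigned long host_bits)
{
    assert(host_bits != 0 && (host_bits & (host_bits - 1)) == 0);
    return ((bitmap_offset + 1) & (host_bits - 1)) == 0;
}

/* Bitmap index of the first target page in the same host page. */
static unsigned long first_tp_in_hp(unsigned long bitmap_offset,
                                    unsigned long host_bits)
{
    assert(host_bits != 0 && (host_bits & (host_bits - 1)) == 0);
    return bitmap_offset & ~(host_bits - 1);
}
```

With host_bits == 1 (host and target page sizes equal), the mask is zero, so every page is placed immediately. When placing happens at the last target page, first_tp_in_hp() gives the same result as the patch's `bitmap_offset -= (host_bits - 1)`.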

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h    |   2 +
 include/migration/postcopy-ram.h |  23 +++++++
 postcopy-ram.c                   | 145 ++++++++++++++++++++++++++++++++++++++-
 3 files changed, 168 insertions(+), 2 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 5bc01d5..58ac7bf 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -96,6 +96,8 @@ struct MigrationIncomingState {
     QEMUFile *return_path;
     QemuMutex      rp_mutex;    /* We send replies from multiple threads */
     PostcopyPMI    postcopy_pmi;
+    void          *postcopy_tmp_page;
+    long           postcopy_place_skipped; /* Check for incorrect place ops */
 };
 
 MigrationIncomingState *migration_incoming_get_current(void);
diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index 413b670..0210491 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -80,4 +80,27 @@ void postcopy_discard_send_chunk(MigrationState *ms, PostcopyDiscardState *pds,
 void postcopy_discard_send_finish(MigrationState *ms,
                                   PostcopyDiscardState *pds);
 
+/*
+ * Place a zeroed page of memory at *host
+ * returns 0 on success
+ */
+int postcopy_place_zero_page(MigrationIncomingState *mis, void *host,
+                             long bitmap_offset);
+
+/*
+ * Place a page (from) at (host) efficiently
+ *    There are restrictions on how 'from' must be mapped, in general best
+ *    to use other postcopy_ routines to allocate.
+ * returns 0 on success
+ */
+int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
+                        long bitmap_offset);
+
+/*
+ * Allocate a page of memory that can be mapped at a later point in time
+ * using postcopy_place_page
+ * Returns: Pointer to allocated page
+ */
+void *postcopy_get_tmp_page(MigrationIncomingState *mis, long bitmap_offset);
+
 #endif
diff --git a/postcopy-ram.c b/postcopy-ram.c
index 8b2a035..19d4b20 100644
--- a/postcopy-ram.c
+++ b/postcopy-ram.c
@@ -229,7 +229,6 @@ static PostcopyPMIState postcopy_pmi_get_state_nolock(
 }
 
 /* Retrieve the state of the given page */
-__attribute__ (( unused )) /* Until later in patch series */
 static PostcopyPMIState postcopy_pmi_get_state(MigrationIncomingState *mis,
                                                size_t bitmap_index)
 {
@@ -245,7 +244,6 @@ static PostcopyPMIState postcopy_pmi_get_state(MigrationIncomingState *mis,
  * Set the page state to the given state if the previous state was as expected
  * Return the actual previous state.
  */
-__attribute__ (( unused )) /* Until later in patch series */
 static PostcopyPMIState postcopy_pmi_change_state(MigrationIncomingState *mis,
                                            size_t bitmap_index,
                                            PostcopyPMIState expected_state,
@@ -464,6 +462,7 @@ static int cleanup_area(const char *block_name, void *host_addr,
 int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages)
 {
     postcopy_pmi_init(mis, ram_pages);
+    mis->postcopy_place_skipped = -1;
 
     if (qemu_ram_foreach_block(init_area, mis)) {
         return -1;
@@ -482,6 +481,10 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
         return -1;
     }
 
+    if (mis->postcopy_tmp_page) {
+        munmap(mis->postcopy_tmp_page, getpagesize());
+        mis->postcopy_tmp_page = NULL;
+    }
     return 0;
 }
 
@@ -551,6 +554,126 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
     return 0;
 }
 
+/*
+ * Place a zeroed page of memory at *host
+ * returns 0 on success
+ * bitmap_offset: Index into the migration bitmaps
+ */
+int postcopy_place_zero_page(MigrationIncomingState *mis, void *host,
+                             long bitmap_offset)
+{
+    void *tmp = postcopy_get_tmp_page(mis, bitmap_offset);
+    if (!tmp) {
+        return -ENOMEM;
+    }
+    *(char *)tmp = 0;
+    return postcopy_place_page(mis, host, tmp, bitmap_offset);
+}
+
+/*
+ * Place a target page (from) at (host) efficiently
+ *    There are restrictions on how 'from' must be mapped, in general best
+ *    to use other postcopy_ routines to allocate.
+ * returns 0 on success
+ * bitmap_offset: Index into the migration bitmaps
+ *
+ * Where host page size > target page size, it defers placing until the last
+ * target page of the host page arrives, with (from, host) pointing at it.
+ */
+int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
+                        long bitmap_offset)
+{
+    PostcopyPMIState old_state, tmp_state;
+    size_t hps = sysconf(_SC_PAGESIZE);
+
+    /* Only place the page when the last target page within the hp arrives */
+    if ((bitmap_offset + 1) & (mis->postcopy_pmi.host_bits - 1)) {
+        DPRINTF("%s: Skipping incomplete hp host=%p from=%p bitmap_offset=%lx",
+                __func__, host, from, bitmap_offset);
+        mis->postcopy_place_skipped = bitmap_offset;
+        return 0;
+    }
+
+    /*
+     * If we skip a page (above) we should end up placing that page before
+     * doing anything with other host pages.
+     */
+    if (mis->postcopy_place_skipped != -1) {
+        assert((bitmap_offset & ~(mis->postcopy_pmi.host_bits - 1)) ==
+               (mis->postcopy_place_skipped &
+                ~(mis->postcopy_pmi.host_bits - 1)));
+    }
+    mis->postcopy_place_skipped = -1;
+
+    /* Adjust pointers to point to start of host page */
+    host = (void *)((uintptr_t)host & ~(hps - 1));
+    from = (void *)((uintptr_t)from & ~(hps - 1));
+    bitmap_offset -= (mis->postcopy_pmi.host_bits - 1);
+
+    if (syscall(__NR_remap_anon_pages, host, from, hps, 0) !=
+            getpagesize()) {
+        perror("remap_anon_pages in postcopy_place_page");
+        fprintf(stderr, "host: %p from: %p pmi=%d\n", host, from,
+                postcopy_pmi_get_state(mis, bitmap_offset));
+
+        return -errno;
+    }
+
+    tmp_state = postcopy_pmi_get_state(mis, bitmap_offset);
+    do {
+        old_state = tmp_state;
+        tmp_state = postcopy_pmi_change_state(mis, bitmap_offset, old_state,
+                                              POSTCOPY_PMI_RECEIVED);
+
+    } while (old_state != tmp_state);
+
+
+    if (old_state == POSTCOPY_PMI_REQUESTED) {
+        /* TODO: Notify kernel */
+    }
+
+    return 0;
+}
+
+/*
+ * Returns a target page of memory that can be mapped at a later point in time
+ * using postcopy_place_page
+ * The same address is used repeatedly; postcopy_place_page just takes the
+ * backing page away.
+ * Returns: Pointer to allocated page
+ *
+ * Note this is a target page and uses the bitmap_offset to get an offset
+ * into a hostpage; since there's only one real temporary host page the caller
+ * is expected to not flip around between pages.
+ */
+void *postcopy_get_tmp_page(MigrationIncomingState *mis, long bitmap_offset)
+{
+    ptrdiff_t offset;
+
+    if (!mis->postcopy_tmp_page) {
+        mis->postcopy_tmp_page = mmap(NULL, getpagesize(),
+                             PROT_READ | PROT_WRITE, MAP_PRIVATE |
+                             MAP_ANONYMOUS, -1, 0);
+        if (mis->postcopy_tmp_page == MAP_FAILED) {
+            perror("mapping postcopy tmp page");
+            return NULL;
+        }
+        if (madvise(mis->postcopy_tmp_page, getpagesize(), MADV_DONTFORK)) {
+            munmap(mis->postcopy_tmp_page, getpagesize());
+            perror("postcopy tmp page DONTFORK");
+            return NULL;
+        }
+    }
+
+    /*
+     * Get the offset within the host page based on bitmap_offset.
+     */
+    offset = (bitmap_offset & (mis->postcopy_pmi.host_bits - 1)) <<
+                 qemu_target_page_bits();
+
+    return (void *)((uint8_t *)mis->postcopy_tmp_page + offset);
+}
+
 #else
 /* No target OS support, stubs just fail */
 int postcopy_ram_hosttest(void)
@@ -598,6 +721,24 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
 {
     assert(0);
 }
+
+int postcopy_place_zero_page(MigrationIncomingState *mis, void *host,
+                             long bitmap_offset)
+{
+    assert(0);
+}
+
+int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
+                        long bitmap_offset)
+{
+    assert(0);
+}
+
+void *postcopy_get_tmp_page(MigrationIncomingState *mis, long bitmap_offset)
+{
+    assert(0);
+}
+
 #endif
 
 /* ------------------------------------------------------------------------- */
-- 
1.9.3

* [Qemu-devel] [PATCH v4 40/47] Postcopy: Use helpers to map pages during migration
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (38 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 39/47] postcopy_ram.c: place_page and helpers Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-11-13  2:53   ` David Gibson
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 41/47] qemu_ram_block_from_host Dr. David Alan Gilbert (git)
                   ` (8 subsequent siblings)
  48 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

In postcopy, the destination guest is running at the same time
as it is receiving pages; as we receive new pages we must put
them into the guest's address space atomically to avoid a running
CPU accessing a partially written page.

Use the helpers in postcopy-ram.c to map these pages.
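In the postcopy branches this patch adds to ram_load(), each incoming target page goes through a staging step before it reaches the guest. It is first written into the single temporary host page returned by postcopy_get_tmp_page(), at an offset derived from its bitmap index. Only then does postcopy_place_page() map it into the guest.

A minimal sketch of that staging step follows. It assumes 4KiB target pages and four target pages per host page, and the SKETCH_* constants and helpers are hypothetical. It deliberately leaves out the atomic remap itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_TARGET_PAGE_BITS 12                       /* 4KiB target pages */
#define SKETCH_TARGET_PAGE_SIZE (1u << SKETCH_TARGET_PAGE_BITS)
#define SKETCH_HOST_BITS 4                               /* 16KiB host pages */

/* Byte offset of a target page inside the single staging host page. */
static size_t tmp_page_offset(unsigned long bitmap_offset)
{
    return (size_t)(bitmap_offset & (SKETCH_HOST_BITS - 1))
           << SKETCH_TARGET_PAGE_BITS;
}

/*
 * Copy one received target page into its slot in the staging buffer.
 * In the patch the whole staging host page is then moved into the guest
 * in one step by postcopy_place_page(); here it is only filled.
 */
static void stage_target_page(uint8_t *staging, unsigned long bitmap_offset,
                              const uint8_t *data)
{
    assert(staging != NULL && data != NULL);
    memcpy(staging + tmp_page_offset(bitmap_offset), data,
           SKETCH_TARGET_PAGE_SIZE);
}
```

Because there is only one staging host page, the caller must deliver all the target pages of one host page before moving on to the next. postcopy_get_tmp_page() documents the same restriction.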

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c | 96 +++++++++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 87 insertions(+), 9 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 2f4345a..0ba627b 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -1458,9 +1458,20 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
     return 0;
 }
 
+/*
+ * Read a RAMBlock ID from the stream f, find the host address of the
+ * start of that block and add on 'offset'
+ *
+ * f: Stream to read from
+ * mis: MigrationIncomingState
+ * offset: Offset within the block
+ * flags: Page flags (mostly to see if it's a continuation of previous block)
+ * rb: Pointer to RAMBlock* that gets filled in with the RB we find
+ */
 static inline void *host_from_stream_offset(QEMUFile *f,
+                                            MigrationIncomingState *mis,
                                             ram_addr_t offset,
-                                            int flags)
+                                            int flags, RAMBlock **rb)
 {
     static RAMBlock *block = NULL;
     char id[256];
@@ -1471,8 +1482,11 @@ static inline void *host_from_stream_offset(QEMUFile *f,
             error_report("Ack, bad migration stream!");
             return NULL;
         }
+        if (rb) {
+            *rb = block;
+        }
 
-        return memory_region_get_ram_ptr(block->mr) + offset;
+        goto gotit;
     }
 
     len = qemu_get_byte(f);
@@ -1480,12 +1494,22 @@ static inline void *host_from_stream_offset(QEMUFile *f,
     id[len] = 0;
 
     QTAILQ_FOREACH(block, &ram_list.blocks, next) {
-        if (!strncmp(id, block->idstr, sizeof(id)))
-            return memory_region_get_ram_ptr(block->mr) + offset;
+        if (!strncmp(id, block->idstr, sizeof(id))) {
+            if (rb) {
+                *rb = block;
+            }
+            goto gotit;
+        }
     }
 
     error_report("Can't find block %s!", id);
     return NULL;
+
+gotit:
+    postcopy_hook_early_receive(mis,
+        (offset + block->offset) >> TARGET_PAGE_BITS);
+    return memory_region_get_ram_ptr(block->mr) + offset;
+
 }
 
 /*
@@ -1515,6 +1539,13 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
     ram_addr_t addr;
     int flags, ret = 0;
     static uint64_t seq_iter;
+    /*
+     * If the system is running in postcopy mode, page inserts to host memory
+     * must be atomic
+     */
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    bool postcopy_running = mis->postcopy_ram_state >=
+                            POSTCOPY_RAM_INCOMING_LISTENING;
 
     seq_iter++;
 
@@ -1523,6 +1554,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
     }
 
     while (!ret) {
+        RAMBlock *rb = NULL; /* init needed to silence compiler */
         addr = qemu_get_be64(f);
 
         flags = addr & ~TARGET_PAGE_MASK;
@@ -1570,7 +1602,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
             void *host;
             uint8_t ch;
 
-            host = host_from_stream_offset(f, addr, flags);
+            host = host_from_stream_offset(f, mis, addr, flags, &rb);
             if (!host) {
                 error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
                 ret = -EINVAL;
@@ -1578,20 +1610,66 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
             }
 
             ch = qemu_get_byte(f);
-            ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
+            if (!postcopy_running) {
+                ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
+            } else {
+                if (!ch) {
+                    ret = postcopy_place_zero_page(mis, host,
+                              (addr + rb->offset) >> TARGET_PAGE_BITS);
+                } else {
+                    void *tmp;
+                    tmp = postcopy_get_tmp_page(mis, (addr + rb->offset) >>
+                                                      TARGET_PAGE_BITS);
+
+                    if (!tmp) {
+                        return -ENOMEM;
+                    }
+                    memset(tmp, ch, TARGET_PAGE_SIZE);
+                    ret = postcopy_place_page(mis, host, tmp,
+                              (addr + rb->offset) >> TARGET_PAGE_BITS);
+                }
+                if (ret) {
+                    error_report("ram_load: Failure in postcopy compress @"
+                                 "%zx/%p;%s+%zx",
+                                 addr, host, rb->idstr, rb->offset);
+                    return ret;
+                }
+            }
         } else if (flags & RAM_SAVE_FLAG_PAGE) {
             void *host;
 
-            host = host_from_stream_offset(f, addr, flags);
+            host = host_from_stream_offset(f, mis, addr, flags, &rb);
             if (!host) {
                 error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
                 ret = -EINVAL;
                 break;
             }
 
-            qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
+            if (!postcopy_running) {
+                qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
+            } else {
+                void *tmp = postcopy_get_tmp_page(mis, (addr + rb->offset) >>
+                                                        TARGET_PAGE_BITS);
+
+                if (!tmp) {
+                    return -ENOMEM;
+                }
+                qemu_get_buffer(f, tmp, TARGET_PAGE_SIZE);
+                ret = postcopy_place_page(mis, host, tmp,
+                          (addr + rb->offset) >> TARGET_PAGE_BITS);
+                if (ret) {
+                    error_report("ram_load: Failure in postcopy simple "
+                                 "@%zx/%p;%s+%zx",
+                                 addr, host, rb->idstr, rb->offset);
+                    return ret;
+                }
+            }
         } else if (flags & RAM_SAVE_FLAG_XBZRLE) {
-            void *host = host_from_stream_offset(f, addr, flags);
+            if (postcopy_running) {
+                error_report("XBZRLE RAM block in postcopy mode @%zx", addr);
+                return -EINVAL;
+            }
+            void *host = host_from_stream_offset(f, mis, addr, flags, &rb);
             if (!host) {
                 error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
                 ret = -EINVAL;
-- 
1.9.3

* [Qemu-devel] [PATCH v4 41/47] qemu_ram_block_from_host
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (39 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 40/47] Postcopy: Use helpers to map pages during migration Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-11-13  2:59   ` David Gibson
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 42/47] Don't sync dirty bitmaps in postcopy Dr. David Alan Gilbert (git)
                   ` (7 subsequent siblings)
  48 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Postcopy sends RAMBlock names and offsets over the wire (since it can't
rely on the order of ramaddr being the same), and it starts out with
HVA fault addresses from the kernel.

qemu_ram_block_from_host translates an HVA into a RAMBlock, an offset
in the RAMBlock, the global ram_addr_t value and its bitmap position.

Rewrite qemu_ram_addr_from_host to use qemu_ram_block_from_host.

Provide qemu_ram_get_idstr, since the idstr is the actual name text sent
on the wire.
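The translation that qemu_ram_block_from_host() performs can be modelled as a toy lookup over a fixed array of blocks. This sketch rests on simplifying assumptions:

* struct sketch_block and block_from_host() are hypothetical.
* There is no MRU-block shortcut, no Xen path and no round_offset option.
* Target pages are 4KiB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_TARGET_PAGE_BITS 12   /* 4KiB target pages */

/* Hypothetical, much-reduced stand-in for a RAMBlock. */
struct sketch_block {
    const char *idstr;           /* name sent on the wire */
    uint8_t *host;               /* HVA of the start of the block */
    uint64_t offset;             /* ram_addr_t of the start of the block */
    uint64_t length;             /* size in bytes */
};

/*
 * Translate a host virtual address into (block, offset in block,
 * ram_addr, bitmap index).  Returns NULL if no block contains ptr.
 */
static const struct sketch_block *
block_from_host(const struct sketch_block *blocks, size_t nblocks,
                const void *ptr, uint64_t *offset, uint64_t *ram_addr,
                unsigned long *bm_index)
{
    uintptr_t h = (uintptr_t)ptr;

    for (size_t i = 0; i < nblocks; i++) {
        const struct sketch_block *b = &blocks[i];
        uintptr_t start = (uintptr_t)b->host;

        /* Integer comparison avoids ordering unrelated pointers. */
        if (h >= start && h - start < b->length) {
            *offset = (uint64_t)(h - start);
            *ram_addr = b->offset + *offset;
            *bm_index = (unsigned long)(*ram_addr >> SKETCH_TARGET_PAGE_BITS);
            return b;
        }
    }
    return NULL;
}
```

The caller can then send b->idstr plus *offset on the wire, which is why the patch also exposes qemu_ram_get_idstr.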

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 exec.c                    | 56 ++++++++++++++++++++++++++++++++++++++++++-----
 include/exec/cpu-common.h |  4 ++++
 2 files changed, 55 insertions(+), 5 deletions(-)

diff --git a/exec.c b/exec.c
index 65ee612..07722b3 100644
--- a/exec.c
+++ b/exec.c
@@ -1246,6 +1246,11 @@ static RAMBlock *find_ram_block(ram_addr_t addr)
     return NULL;
 }
 
+const char *qemu_ram_get_idstr(RAMBlock *rb)
+{
+    return rb->idstr;
+}
+
 void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev)
 {
     RAMBlock *new_block = find_ram_block(addr);
@@ -1603,16 +1608,35 @@ static void *qemu_ram_ptr_length(ram_addr_t addr, hwaddr *size)
     }
 }
 
-/* Some of the softmmu routines need to translate from a host pointer
-   (typically a TLB entry) back to a ram offset.  */
-MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
+/*
+ * Translates a host ptr back to a RAMBlock, a ram_addr and an offset
+ * in that RAMBlock.
+ *
+ * ptr: Host pointer to look up
+ * round_offset: If true round the result offset down to a page boundary
+ * *ram_addr: set to result ram_addr
+ * *offset: set to result offset within the RAMBlock
+ * *bm_index: bitmap index (i.e. scaled ram_addr for use where the scale
+ *                          isn't available)
+ *
+ * Returns: RAMBlock (or NULL if not found)
+ */
+RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
+                                   ram_addr_t *ram_addr,
+                                   ram_addr_t *offset,
+                                   unsigned long *bm_index)
 {
     RAMBlock *block;
     uint8_t *host = ptr;
 
     if (xen_enabled()) {
         *ram_addr = xen_ram_addr_from_mapcache(ptr);
-        return qemu_get_ram_block(*ram_addr)->mr;
+        block = qemu_get_ram_block(*ram_addr);
+        if (!block) {
+            return NULL;
+        }
+        *offset = (host - block->host);
+        return block;
     }
 
     block = ram_list.mru_block;
@@ -1633,7 +1657,29 @@ MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
     return NULL;
 
 found:
-    *ram_addr = block->offset + (host - block->host);
+    *offset = (host - block->host);
+    if (round_offset) {
+        *offset &= TARGET_PAGE_MASK;
+    }
+    *ram_addr = block->offset + *offset;
+    *bm_index = *ram_addr >> TARGET_PAGE_BITS;
+    return block;
+}
+
+/* Some of the softmmu routines need to translate from a host pointer
+   (typically a TLB entry) back to a ram offset.  */
+MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
+{
+    RAMBlock *block;
+    ram_addr_t offset; /* Not used */
+    unsigned long index; /* Not used */
+
+    block = qemu_ram_block_from_host(ptr, false, ram_addr, &offset, &index);
+
+    if (!block) {
+        return NULL;
+    }
+
     return block->mr;
 }
 
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 8042f50..ae25407 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -55,8 +55,12 @@ typedef uint32_t CPUReadMemoryFunc(void *opaque, hwaddr addr);
 void qemu_ram_remap(ram_addr_t addr, ram_addr_t length);
 /* This should not be used by devices.  */
 MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr);
+RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
+                                   ram_addr_t *ram_addr, ram_addr_t *offset,
+                                   unsigned long *bm_index);
 void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev);
 void qemu_ram_unset_idstr(ram_addr_t addr);
+const char *qemu_ram_get_idstr(RAMBlock *rb);
 
 void cpu_physical_memory_rw(hwaddr addr, uint8_t *buf,
                             int len, int is_write);
-- 
1.9.3
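To make the out-parameters of qemu_ram_block_from_host() concrete, here is a standalone sketch of the same arithmetic. It is not QEMU code: it assumes a hypothetical SketchBlock array in place of QEMU's RAMBlock list, omits the Xen path, and fixes the target page size at 4KiB purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for illustration only: 4KiB target pages. */
#define SKETCH_TARGET_PAGE_BITS 12
#define SKETCH_TARGET_PAGE_MASK \
    (~(uint64_t)((1u << SKETCH_TARGET_PAGE_BITS) - 1))

/* Hypothetical stand-in for RAMBlock. */
typedef struct {
    const char *idstr;
    uint8_t *host;     /* host virtual address where the block is mapped */
    uint64_t offset;   /* start of the block in ram_addr_t space */
    uint64_t length;   /* size in bytes */
} SketchBlock;

/*
 * Find the block containing ptr and fill in the same three outputs as
 * qemu_ram_block_from_host(): the ram_addr, the offset within the block
 * (optionally rounded down to a target page) and the dirty-bitmap index.
 */
static const SketchBlock *sketch_block_from_host(const SketchBlock *blocks,
                                                 size_t nblocks, void *ptr,
                                                 bool round_offset,
                                                 uint64_t *ram_addr,
                                                 uint64_t *offset,
                                                 unsigned long *bm_index)
{
    uintptr_t h = (uintptr_t)ptr;

    for (size_t i = 0; i < nblocks; i++) {
        const SketchBlock *b = &blocks[i];
        uintptr_t base = (uintptr_t)b->host;

        if (h >= base && h - base < b->length) {
            *offset = h - base;
            if (round_offset) {
                *offset &= SKETCH_TARGET_PAGE_MASK;
            }
            *ram_addr = b->offset + *offset;
            *bm_index = (unsigned long)(*ram_addr >> SKETCH_TARGET_PAGE_BITS);
            return b;
        }
    }
    return NULL;  /* not guest RAM */
}
```

Rounding only affects the offset and ram_addr; the bitmap index is the same either way, because it already discards the bits below the target page size.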


* [Qemu-devel] [PATCH v4 42/47] Don't sync dirty bitmaps in postcopy
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (40 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 41/47] qemu_ram_block_from_host Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-11-13  3:01   ` David Gibson
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 43/47] Host page!=target page: Cleanup bitmaps Dr. David Alan Gilbert (git)
                   ` (6 subsequent siblings)
  48 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Once we're in postcopy, the source CPUs are stopped and guest memory
shouldn't change any more, so there's no need to look at the dirty
bitmap.

There are two notes to this:
  1) If we did resync and a page had changed, that page would get
     sent again, which the destination must not allow (since it may
     already have modified its copy of the page).
  2) Before disabling this, I'd seen very rare cases where a page had been
     marked dirty even though its memory contents were apparently identical.
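As a rough model of the two guards (not QEMU code; SketchSrc and sketch_save_pending() are hypothetical), the pending-size calculation only resyncs the dirty bitmap while still in precopy and close to convergence. Once postcopy has started, the remaining count is reported as-is, so a spurious dirty mark can no longer cause an already-sent page to be queued again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the source-side state used by ram_save_pending(). */
typedef struct {
    bool in_postcopy;        /* stands in for migration_postcopy_phase() */
    uint64_t dirty_pages;    /* pages still to send */
    uint64_t newly_dirtied;  /* dirty marks not yet folded in by a sync */
    unsigned syncs;          /* number of bitmap syncs performed */
} SketchSrc;

/* Stand-in for migration_bitmap_sync(): fold new dirty marks in. */
static void sketch_bitmap_sync(SketchSrc *s)
{
    s->dirty_pages += s->newly_dirtied;
    s->newly_dirtied = 0;
    s->syncs++;
}

static uint64_t sketch_save_pending(SketchSrc *s, uint64_t page_size,
                                    uint64_t max_size)
{
    uint64_t remaining = s->dirty_pages * page_size;

    /* Resync only near convergence, and never once postcopy has begun */
    if (!s->in_postcopy && remaining < max_size) {
        sketch_bitmap_sync(s);
        remaining = s->dirty_pages * page_size;
    }
    return remaining;
}
```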

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 0ba627b..1fe4fab 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -1381,7 +1381,10 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 static int ram_save_complete(QEMUFile *f, void *opaque)
 {
     qemu_mutex_lock_ramlist();
-    migration_bitmap_sync();
+
+    if (!migration_postcopy_phase(migrate_get_current())) {
+        migration_bitmap_sync();
+    }
 
     ram_control_before_iterate(f, RAM_CONTROL_FINISH);
 
@@ -1414,7 +1417,8 @@ static uint64_t ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size)
 
     remaining_size = ram_save_remaining() * TARGET_PAGE_SIZE;
 
-    if (remaining_size < max_size) {
+    if (!migration_postcopy_phase(migrate_get_current()) &&
+        remaining_size < max_size) {
         qemu_mutex_lock_iothread();
         migration_bitmap_sync();
         qemu_mutex_unlock_iothread();
-- 
1.9.3


* [Qemu-devel] [PATCH v4 43/47] Host page!=target page: Cleanup bitmaps
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (41 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 42/47] Don't sync dirty bitmaps in postcopy Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-11-13  3:10   ` David Gibson
  2015-01-27 10:20   ` Peter Maydell
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 44/47] Postcopy; Handle userfault requests Dr. David Alan Gilbert (git)
                   ` (5 subsequent siblings)
  48 siblings, 2 replies; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Prior to the start of postcopy, ensure that everything that will
be transferred later is a whole host-page in size.

This is accomplished by discarding partially transferred host pages
and marking any that are partially dirty as fully dirty.
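The per-host-page rule can be tried on a single 32-bit word of the sent and dirty maps. This sketch mirrors the inner loop of postcopy_chunk_hostpages() in the diff below; the function name is hypothetical, and host_bits is the number of target pages per host page (e.g. 16 for 64KiB host pages with 4KiB target pages).

```c
#include <assert.h>
#include <stdint.h>

/*
 * For one 32-bit word of target-page bits, compute which target pages the
 * destination must discard and which must be (re)marked dirty, so that
 * everything sent during postcopy is a whole host page.
 */
static void sketch_chunk_word(uint32_t sent, uint32_t dirty,
                              unsigned host_bits,
                              uint32_t *discard, uint32_t *redirty)
{
    /* Power of two that divides 32, the same restriction as the patch */
    assert(host_bits && !(host_bits & (host_bits - 1)) && 32 % host_bits == 0);

    /* host_bits contiguous set bits; 1u << 32 would be undefined */
    uint32_t host_mask = (host_bits == 32) ? UINT32_MAX
                                           : (1u << host_bits) - 1;

    *discard = 0;
    *redirty = 0;
    for (unsigned hp = 0; hp < 32; hp += host_bits) {
        uint32_t hsent = (sent >> hp) & host_mask;
        uint32_t hdirty = (dirty >> hp) & host_mask;

        if (hsent && hsent != host_mask) {
            /* Partially sent: drop what arrived, resend the whole page */
            *discard |= host_mask << hp;
            *redirty |= host_mask << hp;
        } else if (hdirty && hdirty != host_mask) {
            /* Partially dirty: mark the whole host page dirty */
            *redirty |= host_mask << hp;
        }
    }
}
```

Unlike the patch, the sketch special-cases host_bits == 32, because shifting 1u by 32 is undefined in C. The patch's count of newly dirtied pages, ctpop32(redirty - (ddata & redirty)), equals the population count of redirty & ~ddata, since the subtracted bits are a subset of redirty.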

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 111 insertions(+), 1 deletion(-)

diff --git a/arch_init.c b/arch_init.c
index 1fe4fab..aac250c 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -1024,7 +1024,6 @@ static uint32_t get_32bits_map(unsigned long *map, int64_t start)
  * A helper to put 32 bits into a bit map; trivial for HOST_LONG_BITS=32
  * messier for 64; the bitmaps are actually long's that are 32 or 64bit
  */
-__attribute__ (( unused )) /* Until later in patch series */
 static void put_32bits_map(unsigned long *map, int64_t start,
                            uint32_t v)
 {
@@ -1153,15 +1152,126 @@ static int pc_each_ram_discard(MigrationState *ms)
 }
 
 /*
+ * Utility for the outgoing postcopy code.
+ *
+ * Discard any partially sent host-page size chunks, mark any partially
+ * dirty host-page size chunks as all dirty.
+ *
+ * Returns: 0 on success
+ */
+static int postcopy_chunk_hostpages(MigrationState *ms)
+{
+    struct RAMBlock *block;
+    unsigned int host_bits = sysconf(_SC_PAGESIZE) / TARGET_PAGE_SIZE;
+    uint32_t host_mask;
+
+    /* Should be a power of 2 */
+    assert(host_bits && !(host_bits & (host_bits - 1)));
+    /*
+     * If host_bits isn't a divisor of 32 (the minimum long size)
+     * then the code gets a lot more complex; disallow it for now
+     * (I'm not aware of a system where that happens anyway)
+     */
+    assert((32 % host_bits) == 0);
+
+    /* A mask, starting at bit 0, containing host_bits continuous set bits */
+    host_mask = (1u << host_bits) - 1;
+
+
+    if (host_bits == 1) {
+        /* Easy case - TPS==HPS - nothing to be done */
+        return 0;
+    }
+
+    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+        unsigned long first32, last32, cur32;
+        unsigned long first = block->offset >> TARGET_PAGE_BITS;
+        unsigned long last = (block->offset + (block->length-1))
+                                >> TARGET_PAGE_BITS;
+        PostcopyDiscardState *pds = postcopy_discard_send_init(ms,
+                                                               first & 31,
+                                                               block->idstr);
+
+        first32 = first / 32;
+        last32 = last / 32;
+        for (cur32 = first32; cur32 <= last32; cur32++) {
+            unsigned int current_hp;
+            /* Deal with start/end not on alignment */
+            uint32_t mask = make_32bit_mask(first, last, cur32);
+
+            /* a chunk of sent pages */
+            uint32_t sdata = get_32bits_map(ms->sentmap, cur32 * 32);
+            /* a chunk of dirty pages */
+            uint32_t ddata = get_32bits_map(migration_bitmap, cur32 * 32);
+            uint32_t discard = 0;
+            uint32_t redirty = 0;
+            sdata &= mask;
+            ddata &= mask;
+
+            for (current_hp = 0; current_hp < 32; current_hp += host_bits) {
+                uint32_t host_sent = (sdata >> current_hp) & host_mask;
+                uint32_t host_dirty = (ddata >> current_hp) & host_mask;
+
+                if (host_sent && (host_sent != host_mask)) {
+                    /* Partially sent host page */
+                    redirty |= host_mask << current_hp;
+                    discard |= host_mask << current_hp;
+
+                } else if (host_dirty && (host_dirty != host_mask)) {
+                    /* Partially dirty host page */
+                    redirty |= host_mask << current_hp;
+                }
+            }
+            if (discard) {
+                /* Tell the destination to discard these pages */
+                postcopy_discard_send_chunk(ms, pds, (cur32-first32) * 32,
+                                            discard);
+                /* And clear them in the sent data structure */
+                sdata = get_32bits_map(ms->sentmap, cur32 * 32);
+                put_32bits_map(ms->sentmap, cur32 * 32, sdata & ~discard);
+            }
+            if (redirty) {
+                /*
+                 * Reread the original dirty bits and OR in the ones we need
+                 * to redirty; we must reread since at the start or end of a
+                 * RAMBlock the original 'mask' will have cleared some bits
+                 * that we must not lose
+                 */
+                ddata = get_32bits_map(migration_bitmap, cur32 * 32);
+                put_32bits_map(migration_bitmap, cur32 * 32,
+                           ddata | redirty);
+                /* Inc the count of dirty pages */
+                migration_dirty_pages += ctpop32(redirty - (ddata & redirty));
+            }
+        }
+
+        postcopy_discard_send_finish(ms, pds);
+    }
+    /* Easiest way to make sure we don't resume in the middle of a host-page */
+    last_seen_block = NULL;
+    last_sent_block = NULL;
+
+    return 0;
+}
+
+/*
  * Transmit the set of pages to be discarded after precopy to the target
  * these are pages that have been sent previously but have been dirtied
  * Hopefully this is pretty sparse
  */
 int ram_postcopy_send_discard_bitmap(MigrationState *ms)
 {
+    int ret;
+
     /* This should be our last sync, the src is now paused */
     migration_bitmap_sync();
 
+    /* Deal with TPS != HPS */
+    ret = postcopy_chunk_hostpages(ms);
+    if (ret) {
+        return ret;
+    }
+
     /*
      * Update the sentmap to be  sentmap&=dirty
      */
-- 
1.9.3
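get_32bits_map() and put_32bits_map() are defined earlier in the series and are not shown here. Assuming they access the 32 bits starting at a 32-aligned bit index of an unsigned long bitmap, a portable version that copes with both 32- and 64-bit longs might look like this (the sketch_ names are hypothetical).

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define SKETCH_LONG_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Read the 32 bits starting at bit 'start' (a multiple of 32). */
static uint32_t sketch_get_32bits(const unsigned long *map, int64_t start)
{
    assert(start >= 0 && start % 32 == 0);
    unsigned long word = map[start / SKETCH_LONG_BITS];
    /* Always 0 with 32-bit longs; 0 or 32 with 64-bit longs */
    unsigned shift = (unsigned)(start % SKETCH_LONG_BITS);

    return (uint32_t)(word >> shift);
}

/* Overwrite the 32 bits starting at bit 'start' with v. */
static void sketch_put_32bits(unsigned long *map, int64_t start, uint32_t v)
{
    assert(start >= 0 && start % 32 == 0);
    unsigned long *word = &map[start / SKETCH_LONG_BITS];
    unsigned shift = (unsigned)(start % SKETCH_LONG_BITS);
    unsigned long mask = (unsigned long)UINT32_MAX << shift;

    *word = (*word & ~mask) | ((unsigned long)v << shift);
}
```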


* [Qemu-devel] [PATCH v4 44/47] Postcopy; Handle userfault requests
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (42 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 43/47] Host page!=target page: Cleanup bitmaps Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-11-13  3:23   ` David Gibson
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 45/47] Start up a postcopy/listener thread ready for incoming page data Dr. David Alan Gilbert (git)
                   ` (4 subsequent siblings)
  48 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

userfaultfd is a (proposed) Linux syscall that returns an fd which
receives a stream of notifications of accesses to pages marked with
MADV_USERFAULT; it also allows the program to acknowledge those stalls
and tell the faulting thread to carry on.
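The fault thread added in this patch waits on two descriptors at once: the userfaultfd for faulting addresses, and an eventfd that is used only as a quit signal. The same pattern can be tried without the out-of-tree syscall; this sketch assumes a pipe as a stand-in for the userfaultfd and a hypothetical sketch_fault_loop() that returns after max_addrs addresses, or early on quit.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Read up to max_addrs 64-bit "fault addresses" from data_fd (a stand-in
 * for the userfaultfd), stopping as soon as quit_fd becomes readable.
 * Returns the number of addresses read, or -1 on error.
 */
static int sketch_fault_loop(int data_fd, int quit_fd,
                             uint64_t *addrs, int max_addrs)
{
    int n = 0;

    while (n < max_addrs) {
        struct pollfd pfd[2] = {
            { .fd = data_fd, .events = POLLIN },
            { .fd = quit_fd, .events = POLLIN },  /* eventfd went positive */
        };

        if (poll(pfd, 2, -1 /* wait forever */) == -1) {
            return -1;
        }
        if (pfd[1].revents) {
            break;  /* quit takes priority, as in the fault thread */
        }
        if (read(data_fd, &addrs[n], sizeof(addrs[n])) !=
            (ssize_t)sizeof(addrs[n])) {
            return -1;  /* short read: we'd lose alignment in the stream */
        }
        n++;
    }
    return n;
}
```

Telling the loop to quit is then a single 8-byte write of 1 to the eventfd, which is what postcopy_ram_incoming_cleanup() does before joining the thread.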

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |   4 +
 postcopy-ram.c                | 224 ++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 219 insertions(+), 9 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 58ac7bf..00255b8 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -88,11 +88,15 @@ struct MigrationIncomingState {
         POSTCOPY_RAM_INCOMING_END
     } postcopy_ram_state;
 
+    bool           have_fault_thread;
     QemuThread     fault_thread;
     QemuSemaphore  fault_thread_sem;
 
     /* For the kernel to send us notifications */
     int            userfault_fd;
+    /* To tell the fault_thread to quit */
+    int            userfault_quit_fd;
+
     QEMUFile *return_path;
     QemuMutex      rp_mutex;    /* We send replies from multiple threads */
     PostcopyPMI    postcopy_pmi;
diff --git a/postcopy-ram.c b/postcopy-ram.c
index 19d4b20..c0ed8c0 100644
--- a/postcopy-ram.c
+++ b/postcopy-ram.c
@@ -67,6 +67,8 @@ struct PostcopyDiscardState {
  *                       areas without creating loads of VMAs.
  */
 
+#include <poll.h>
+#include <sys/eventfd.h>
 #include <sys/mman.h>
 #include <sys/types.h>
 
@@ -476,15 +478,40 @@ int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages)
  */
 int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
 {
-    /* TODO: Join the fault thread once we're sure it will exit */
+    DPRINTF("%s: entry", __func__);
     if (qemu_ram_foreach_block(cleanup_area, mis)) {
         return -1;
     }
 
+    if (mis->have_fault_thread) {
+        uint64_t tmp64;
+        /*
+         * Tell the fault_thread to exit, it's an eventfd that should
+         * currently be at 0, we're going to inc it to 1
+         */
+        tmp64 = 1;
+        if (write(mis->userfault_quit_fd, &tmp64, 8) == 8) {
+            DPRINTF("%s: Joining fault thread", __func__);
+            qemu_thread_join(&mis->fault_thread);
+        } else {
+            /* Not much we can do here, but may as well report it */
+            perror("incing userfault_quit_fd");
+        }
+
+        DPRINTF("%s: closing uf", __func__);
+        close(mis->userfault_fd);
+        close(mis->userfault_quit_fd);
+        mis->have_fault_thread = false;
+    }
+
+    mis->postcopy_ram_state = POSTCOPY_RAM_INCOMING_END;
+    migrate_send_rp_shut(mis, qemu_file_get_error(mis->file) != 0);
+
     if (mis->postcopy_tmp_page) {
         munmap(mis->postcopy_tmp_page, getpagesize());
         mis->postcopy_tmp_page = NULL;
     }
+    DPRINTF("%s: exit", __func__);
     return 0;
 }
 
@@ -522,35 +549,210 @@ static int postcopy_ram_sensitise_area(const char *block_name, void *host_addr,
 }
 
 /*
+ * Tell the kernel that we've now got some memory it previously asked for.
+ * Note: We're not allowed to ack a page which wasn't requested.
+ */
+static int ack_userfault(MigrationIncomingState *mis, void *start, size_t len)
+{
+    uint64_t tmp[2];
+
+    /*
+     * Kernel wants the range that's now safe to access
+     * Note it always takes 64bit values, even on a 32bit host.
+     */
+    tmp[0] = (uint64_t)(uintptr_t)start;
+    tmp[1] = (uint64_t)(uintptr_t)start + (uint64_t)len;
+
+    if (write(mis->userfault_fd, tmp, 16) != 16) {
+        int e = errno;
+
+        if (e == ENOENT) {
+            /* Kernel said it wasn't waiting. One case where this can
+             * happen: two threads fault on the same page, and the page
+             * arrives and is acked just after we read the 2nd fault
+             * notification; handling that notification then acks it again.
+             * We could optimise it out, but it's rare.
+             */
+            /*fprintf(stderr, "ack_userfault: %p/%zx ENOENT\n", start, len); */
+            return 0;
+        }
+        error_report("postcopy_ram: Failed to notify kernel for %p/%zx (%d)",
+                     start, len, e);
+        return -e;
+    }
+
+    return 0;
+}
+
+/*
  * Handle faults detected by the USERFAULT markings
  */
 static void *postcopy_ram_fault_thread(void *opaque)
 {
     MigrationIncomingState *mis = (MigrationIncomingState *)opaque;
+    void *hostaddr;
+    int ret;
+    size_t hostpagesize = getpagesize();
+    RAMBlock *rb = NULL;
+    RAMBlock *last_rb = NULL; /* last RAMBlock we sent part of */
 
-    fprintf(stderr, "postcopy_ram_fault_thread\n");
-    /* TODO: In later patch */
+    DPRINTF("%s", __func__);
     qemu_sem_post(&mis->fault_thread_sem);
-    while (1) {
-        /* TODO: In later patch */
-    }
+    while (true) {
+        PostcopyPMIState old_state, tmp_state;
+        ram_addr_t rb_offset;
+        ram_addr_t in_raspace;
+        unsigned long bitmap_index;
+        struct pollfd pfd[2];
+
+        /*
+         * We're mainly waiting for the kernel to give us a faulting HVA,
+         * however we can be told to quit via userfault_quit_fd which is
+         * an eventfd
+         */
+        pfd[0].fd = mis->userfault_fd;
+        pfd[0].events = POLLIN;
+        pfd[0].revents = 0;
+        pfd[1].fd = mis->userfault_quit_fd;
+        pfd[1].events = POLLIN; /* Waiting for eventfd to go positive */
+        pfd[1].revents = 0;
+
+        if (poll(pfd, 2, -1 /* Wait forever */) == -1) {
+            perror("userfault poll");
+            break;
+        }
 
+        if (pfd[1].revents) {
+            DPRINTF("%s got quit event", __func__);
+            break;
+        }
+
+        ret = read(mis->userfault_fd, &hostaddr, sizeof(hostaddr));
+        if (ret != sizeof(hostaddr)) {
+            if (ret < 0) {
+                perror("Failed to read full userfault hostaddr");
+                break;
+            } else {
+                error_report("%s: Read %d bytes from userfaultfd expected %zd",
+                             __func__, ret, sizeof(hostaddr));
+                break; /* Lost alignment, don't know what we'd read next */
+            }
+        }
+
+        rb = qemu_ram_block_from_host(hostaddr, true, &in_raspace, &rb_offset,
+                                      &bitmap_index);
+        if (!rb) {
+            error_report("postcopy_ram_fault_thread: Fault outside guest: %p",
+                         hostaddr);
+            break;
+        }
+
+        DPRINTF("%s: Request for HVA=%p index=%lx rb=%s offset=%zx",
+                __func__, hostaddr, bitmap_index, qemu_ram_get_idstr(rb),
+                rb_offset);
+
+        tmp_state = postcopy_pmi_get_state(mis, bitmap_index);
+        do {
+            old_state = tmp_state;
+
+            switch (old_state) {
+            case POSTCOPY_PMI_REQUESTED:
+                /* Do nothing - it's already requested */
+                break;
+
+            case POSTCOPY_PMI_RECEIVED:
+                /* Already arrived - no state change, just kick the kernel */
+                DPRINTF("postcopy_ram_fault_thread: notify pre of %p",
+                        hostaddr);
+                if (ack_userfault(mis,
+                                  (void *)((uintptr_t)hostaddr
+                                           & ~(hostpagesize - 1)),
+                                  hostpagesize)) {
+                    assert(0);
+                }
+                break;
+
+            case POSTCOPY_PMI_MISSING:
+
+                tmp_state = postcopy_pmi_change_state(mis, bitmap_index,
+                                           old_state, POSTCOPY_PMI_REQUESTED);
+                if (tmp_state == POSTCOPY_PMI_MISSING) {
+                    /*
+                     * Send the request to the source - we want to request one
+                     * of our host page sizes (which is >= TPS)
+                     */
+                    if (rb != last_rb) {
+                        last_rb = rb;
+                        migrate_send_rp_reqpages(mis, qemu_ram_get_idstr(rb),
+                                                 rb_offset, hostpagesize);
+                    } else {
+                        /* Save some space */
+                        migrate_send_rp_reqpages(mis, NULL,
+                                                 rb_offset, hostpagesize);
+                    }
+                }
+                break;
+            }
+        } while (tmp_state != old_state);
+    }
+    DPRINTF("%s: exit", __func__);
     return NULL;
 }
 
 int postcopy_ram_enable_notify(MigrationIncomingState *mis)
 {
-    /* Create the fault handler thread and wait for it to be ready */
+    uint64_t tmp64;
+
+    /* Open the fd for the kernel to give us userfaults */
+    mis->userfault_fd = syscall(__NR_userfaultfd, O_CLOEXEC);
+    if (mis->userfault_fd == -1) {
+        perror("Failed to open userfault fd");
+        return -1;
+    }
+
+    /*
+     * Version handshake, we send it the version we want and expect to get the
+     * same back.
+     */
+    tmp64 = USERFAULTFD_PROTOCOL;
+    if (write(mis->userfault_fd, &tmp64, sizeof(tmp64)) != sizeof(tmp64)) {
+        perror("Writing userfaultfd version");
+        close(mis->userfault_fd);
+        return -1;
+    }
+    if (read(mis->userfault_fd, &tmp64, sizeof(tmp64)) != sizeof(tmp64)) {
+        perror("Reading userfaultfd version");
+        close(mis->userfault_fd);
+        return -1;
+    }
+    if (tmp64 != USERFAULTFD_PROTOCOL) {
+        error_report("Mismatched userfaultfd version, expected %zx, got %zx",
+                     (size_t)USERFAULTFD_PROTOCOL, (size_t)tmp64);
+        close(mis->userfault_fd);
+        return -1;
+    }
+
+    /* Now an eventfd we use to tell the fault-thread to quit */
+    mis->userfault_quit_fd = eventfd(0, EFD_CLOEXEC);
+    if (mis->userfault_quit_fd == -1) {
+        perror("Opening userfault_quit_fd");
+        close(mis->userfault_fd);
+        return -1;
+    }
+
     qemu_sem_init(&mis->fault_thread_sem, 0);
     qemu_thread_create(&mis->fault_thread, "postcopy/fault",
                        postcopy_ram_fault_thread, mis, QEMU_THREAD_JOINABLE);
     qemu_sem_wait(&mis->fault_thread_sem);
+    mis->have_fault_thread = true;
 
     /* Mark so that we get notified of accesses to unwritten areas */
     if (qemu_ram_foreach_block(postcopy_ram_sensitise_area, mis)) {
         return -1;
     }
 
+    DPRINTF("postcopy_ram_enable_notify: Sensitised");
+
     return 0;
 }
 
@@ -612,11 +814,12 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
 
     if (syscall(__NR_remap_anon_pages, host, from, hps, 0) !=
             getpagesize()) {
+        int e = errno;
         perror("remap_anon_pages in postcopy_place_page");
         fprintf(stderr, "host: %p from: %p pmi=%d\n", host, from,
                 postcopy_pmi_get_state(mis, bitmap_offset));
 
-        return -errno;
+        return -e;
     }
 
     tmp_state = postcopy_pmi_get_state(mis, bitmap_offset);
@@ -629,7 +832,10 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
 
 
     if (old_state == POSTCOPY_PMI_REQUESTED) {
-        /* TODO: Notify kernel */
+        /* Send the kernel the host address that should now be accessible */
+        DPRINTF("%s: Notifying kernel bitmap_offset=0x%lx host=%p",
+                __func__, bitmap_offset, host);
+        return ack_userfault(mis, host, hps);
     }
 
     return 0;
-- 
1.9.3
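The do/while in postcopy_ram_fault_thread() is a compare-and-swap retry loop over a per-page state, and postcopy_place_page() is its counterpart. A minimal sketch of the same transitions using C11 atomics, assuming a hypothetical enum and a single page rather than QEMU's PostcopyPMI bitmaps; each function reports what the caller should do next.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical page states mirroring POSTCOPY_PMI_* */
enum {
    SKETCH_MISSING,    /* not yet on the destination */
    SKETCH_REQUESTED,  /* asked the source for it */
    SKETCH_RECEIVED,   /* placed in guest memory */
};

typedef enum {
    SKETCH_ACT_NONE,         /* already requested by someone else */
    SKETCH_ACT_ACK_KERNEL,   /* page is here: just wake the faulting thread */
    SKETCH_ACT_SEND_REQUEST, /* we won the race: ask the source */
} SketchAction;

/* Fault side: decide what to do for a fault on this page. */
static SketchAction sketch_on_fault(_Atomic int *state)
{
    int old = atomic_load(state);

    for (;;) {
        switch (old) {
        case SKETCH_REQUESTED:
            return SKETCH_ACT_NONE;
        case SKETCH_RECEIVED:
            return SKETCH_ACT_ACK_KERNEL;
        case SKETCH_MISSING:
            /* Only one thread may move MISSING->REQUESTED; on failure
             * 'old' is reloaded with the current state and re-examined. */
            if (atomic_compare_exchange_strong(state, &old, SKETCH_REQUESTED)) {
                return SKETCH_ACT_SEND_REQUEST;
            }
            break;
        default:
            assert(0 && "unknown page state");
            return SKETCH_ACT_NONE;
        }
    }
}

/* Placement side: returns 1 if a fault is waiting and must be acked. */
static int sketch_on_placed(_Atomic int *state)
{
    return atomic_exchange(state, SKETCH_RECEIVED) == SKETCH_REQUESTED;
}
```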


* [Qemu-devel] [PATCH v4 45/47] Start up a postcopy/listener thread ready for incoming page data
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (43 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 44/47] Postcopy; Handle userfault requests Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-11-13  3:29   ` David Gibson
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 46/47] postcopy: Wire up loadvm_postcopy_ram_handle_{run, end} commands Dr. David Alan Gilbert (git)
                   ` (3 subsequent siblings)
  48 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Loading device state during postcopy may access guest memory that is
still on the source machine, and so may need a page to be filled;
split off a separate thread to handle the incoming page data, so that
the original incoming migration code can finish loading the device
data.
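The handoff in loadvm_postcopy_ram_handle_listen() is: create the thread, block on a semaphore until it signals that it is running, then return so the caller's loop can exit. A minimal sketch of that pattern with POSIX threads and semaphores (roughly what QemuThread and QemuSemaphore wrap on Linux hosts); the sketch_ names and the integer array standing in for the incoming stream are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

typedef struct {
    sem_t ready;        /* posted once the listener is running */
    const int *stream;  /* stand-in for the incoming QEMUFile */
    int len;
    long sum;           /* stand-in for "loaded the page data" */
} SketchListen;

static void *sketch_listen_thread(void *opaque)
{
    SketchListen *l = opaque;

    sem_post(&l->ready);  /* tell the creator we've taken over the stream */
    for (int i = 0; i < l->len; i++) {
        l->sum += l->stream[i];
    }
    return NULL;
}

/* Start the listener and wait until it is running; 0 on success. */
static int sketch_start_listener(SketchListen *l, pthread_t *tid)
{
    if (sem_init(&l->ready, 0, 0) != 0) {
        return -1;
    }
    if (pthread_create(tid, NULL, sketch_listen_thread, l) != 0) {
        sem_destroy(&l->ready);
        return -1;
    }
    while (sem_wait(&l->ready) != 0 && errno == EINTR) {
        /* retry if interrupted by a signal */
    }
    return 0;
}
```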

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  4 +++
 migration.c                   |  6 +++++
 savevm.c                      | 62 +++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 70 insertions(+), 2 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 00255b8..69e776c 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -92,6 +92,10 @@ struct MigrationIncomingState {
     QemuThread     fault_thread;
     QemuSemaphore  fault_thread_sem;
 
+    bool           have_listen_thread;
+    QemuThread     listen_thread;
+    QemuSemaphore  listen_thread_sem;
+
     /* For the kernel to send us notifications */
     int            userfault_fd;
     /* To tell the fault_thread to quit */
diff --git a/migration.c b/migration.c
index 63d7699..f0f2e2f 100644
--- a/migration.c
+++ b/migration.c
@@ -1071,6 +1071,12 @@ static int postcopy_start(MigrationState *ms)
         return -1;
     }
 
+    /*
+     * Make sure the receiver can get incoming pages before we send the rest
+     * of the state
+     */
+    qemu_savevm_send_postcopy_ram_listen(fb);
+
     qemu_savevm_state_complete(fb);
     DPRINTF("postcopy_start: sending req 3\n");
     qemu_savevm_send_reqack(fb, 3);
diff --git a/savevm.c b/savevm.c
index 859c96f..53e8a2c 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1292,9 +1292,45 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
     return 0;
 }
 
+typedef struct ram_listen_thread_data {
+    QEMUFile *f;
+    LoadStateEntry_Head *lh;
+} ram_listen_thread_data;
+
+/*
+ * Triggered by a postcopy_listen command; this thread takes over reading
+ * the input stream, leaving the main thread free to carry on loading the rest
+ * of the device state (from RAM).
+ * (TODO: This could live in a postcopy file, but then again it's
+ * just another input loop, not especially postcopy specific)
+ */
+static void *postcopy_ram_listen_thread(void *opaque)
+{
+    ram_listen_thread_data *rltd = opaque;
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    int load_res;
+
+    qemu_sem_post(&mis->listen_thread_sem);
+    DPRINTF("postcopy_ram_listen_thread start");
+
+    load_res = qemu_loadvm_state_main(rltd->f, rltd->lh);
+
+    DPRINTF("postcopy_ram_listen_thread exiting");
+    if (load_res < 0) {
+        error_report("%s: loadvm failed: %d", __func__, load_res);
+        qemu_file_set_error(rltd->f, load_res);
+    }
+    postcopy_ram_incoming_cleanup(mis);
+    g_free(rltd);
+
+    return NULL;
+}
+
 /* After this message we must be able to immediately receive page data */
 static int loadvm_postcopy_ram_handle_listen(MigrationIncomingState *mis)
 {
+    ram_listen_thread_data *rltd = g_malloc(sizeof(ram_listen_thread_data));
+
     DPRINTF("%s", __func__);
     if (mis->postcopy_ram_state != POSTCOPY_RAM_INCOMING_ADVISE) {
         error_report("CMD_POSTCOPY_RAM_LISTEN in wrong postcopy state (%d)",
@@ -1313,8 +1349,25 @@ static int loadvm_postcopy_ram_handle_listen(MigrationIncomingState *mis)
         return -1;
     }
 
-    /* TODO start up the postcopy listening thread */
-    return 0;
+    if (mis->have_listen_thread) {
+        error_report("CMD_POSTCOPY_RAM_LISTEN already has a listen thread");
+        return -1;
+    }
+
+    mis->have_listen_thread = true;
+    /* Start up the listening thread and wait for it to signal ready */
+    qemu_sem_init(&mis->listen_thread_sem, 0);
+    rltd->f = mis->file;
+    rltd->lh = &loadvm_handlers;
+    qemu_thread_create(&mis->listen_thread, "postcopy/listen",
+                       postcopy_ram_listen_thread, rltd, QEMU_THREAD_JOINABLE);
+    qemu_sem_wait(&mis->listen_thread_sem);
+
+    /*
+     * all good - cause the loop that handled this command to exit because
+     * the new thread is taking over
+     */
+    return LOADVM_EXITCODE_QUITPARENT | LOADVM_EXITCODE_KEEPHANDLERS;
 }
 
 /* After all discards we can start running and asking for pages */
@@ -1640,6 +1693,11 @@ int qemu_loadvm_state(QEMUFile *f)
     QLIST_INIT(&loadvm_handlers);
     ret = qemu_loadvm_state_main(f, &loadvm_handlers);
 
+    if (migration_incoming_get_current()->have_listen_thread) {
+        /* Listen thread still going, can't clean up yet */
+        return ret;
+    }
+
     if (ret == 0) {
         cpu_synchronize_all_post_init();
     }
-- 
1.9.3


* [Qemu-devel] [PATCH v4 46/47] postcopy: Wire up loadvm_postcopy_ram_handle_{run, end} commands
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (44 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 45/47] Start up a postcopy/listener thread ready for incoming page data Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-10-04 17:51   ` Paolo Bonzini
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 47/47] End of migration for postcopy Dr. David Alan Gilbert (git)
                   ` (2 subsequent siblings)
  48 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Wire up more of the handlers for the commands on the destination side,
in particular loadvm_postcopy_ram_handle_run now has enough to start the
guest running.
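loadvm_postcopy_ram_handle_end() waits for RUN by spinning on a plain read of postcopy_ram_state, and its TODO suggests an atomic. Purely as a sketch of that suggestion, not what the patch does, a waiter could use C11 acquire loads and yield between polls; the state values and sketch_ names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

enum { SKETCH_LISTENING = 1, SKETCH_RUNNING = 2 };

/* Spin (politely) until another thread moves the state out of LISTENING;
 * returns the state that was finally observed. */
static int sketch_wait_for_run(_Atomic int *state)
{
    int s;

    while ((s = atomic_load_explicit(state, memory_order_acquire)) ==
           SKETCH_LISTENING) {
        sched_yield();
    }
    return s;
}

/* Stand-in for the RUN handler finishing device load on the main thread */
static void *sketch_main_thread(void *opaque)
{
    atomic_store_explicit((_Atomic int *)opaque, SKETCH_RUNNING,
                          memory_order_release);
    return NULL;
}
```

The release/acquire pair also makes anything the main thread wrote before the store visible to the waiter. A semaphore posted by the RUN handler would avoid spinning altogether.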

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 savevm.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 58 insertions(+), 5 deletions(-)

diff --git a/savevm.c b/savevm.c
index 53e8a2c..805bb21 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1373,6 +1373,8 @@ static int loadvm_postcopy_ram_handle_listen(MigrationIncomingState *mis)
 /* After all discards we can start running and asking for pages */
 static int loadvm_postcopy_ram_handle_run(MigrationIncomingState *mis)
 {
+    Error *local_err = NULL;
+
     DPRINTF("%s", __func__);
     if (mis->postcopy_ram_state != POSTCOPY_RAM_INCOMING_LISTENING) {
         error_report("CMD_POSTCOPY_RAM_RUN in wrong postcopy state (%d)",
@@ -1381,6 +1383,28 @@ static int loadvm_postcopy_ram_handle_run(MigrationIncomingState *mis)
     }
 
     mis->postcopy_ram_state = POSTCOPY_RAM_INCOMING_RUNNING;
+
+    /* TODO: we should move all of this into postcopy_ram.c or into shared
+     * code in migration.c
+     */
+    cpu_synchronize_all_post_init();
+
+    qemu_announce_self();
+    bdrv_clear_incoming_migration_all();
+
+    /* Make sure all file formats flush their mutable metadata */
+    bdrv_invalidate_cache_all(&local_err);
+    if (local_err) {
+        qerror_report_err(local_err);
+        error_free(local_err);
+        return -1;
+    }
+
+    DPRINTF("loadvm_postcopy_ram_handle_run: cpu_synchronize_all_post_init");
+    cpu_synchronize_all_post_init();
+
+    DPRINTF("loadvm_postcopy_ram_handle_run: vm_start");
+
     if (autostart) {
         /* Hold onto your hats, starting the CPU */
         vm_start();
@@ -1389,11 +1413,15 @@ static int loadvm_postcopy_ram_handle_run(MigrationIncomingState *mis)
         runstate_set(RUN_STATE_PAUSED);
     }
 
-    return 0;
+    return LOADVM_EXITCODE_QUITLOOP;
 }
 
-/* The end - with a byte from the source which can tell us to fail. */
-static int loadvm_postcopy_ram_handle_end(MigrationIncomingState *mis)
+/* The end - with a byte from the source which can tell us to fail.
+ * The source sends this either if there is a failure, or if it believes it's
+ * sent everything
+ */
+static int loadvm_postcopy_ram_handle_end(MigrationIncomingState *mis,
+                                          uint8_t status)
 {
     DPRINTF("%s", __func__);
     if (mis->postcopy_ram_state == POSTCOPY_RAM_INCOMING_NONE) {
@@ -1401,7 +1429,32 @@ static int loadvm_postcopy_ram_handle_end(MigrationIncomingState *mis)
                      mis->postcopy_ram_state);
         return -1;
     }
-    return -1; /* TODO - expecting 1 byte good/fail */
+
+    DPRINTF("loadvm_postcopy_ram_handle_end status=%d", status);
+
+    if (!status) {
+        bool one_message = false;
+        /* This looks good, but it's possible that the device loading in the
+         * main thread hasn't finished yet, and so we might not be in 'RUN'
+         * state yet.
+         * TODO: Use an atomic_xchg or something for this
+         */
+        while (mis->postcopy_ram_state == POSTCOPY_RAM_INCOMING_LISTENING) {
+            if (!one_message) {
+                DPRINTF("%s: Waiting for RUN", __func__);
+                one_message = true;
+            }
+        }
+    }
+
+    if (status) {
+        error_report("CMD_POSTCOPY_RAM_END: error on source host (%d)",
+                     status);
+        qemu_file_set_error(mis->file, -EPIPE);
+    }
+
+    /* This will cause the listen thread to exit and call cleanup */
+    return LOADVM_EXITCODE_QUITLOOP;
 }
 
 static int loadvm_process_command_simple_lencheck(const char *name,
@@ -1548,7 +1601,7 @@ static int loadvm_process_command(QEMUFile *f,
                                                    len, 1)) {
             return -1;
         }
-        return loadvm_postcopy_ram_handle_end(mis);
+        return loadvm_postcopy_ram_handle_end(mis, qemu_get_byte(f));
 
     default:
         error_report("VM_COMMAND 0x%x unknown (len 0x%x)", com, len);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 204+ messages in thread

* [Qemu-devel] [PATCH v4 47/47] End of migration for postcopy
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (45 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 46/47] postcopy: Wire up loadvm_postcopy_ram_handle_{run, end} commands Dr. David Alan Gilbert (git)
@ 2014-10-03 17:47 ` Dr. David Alan Gilbert (git)
  2014-10-04 17:49   ` Paolo Bonzini
  2014-10-04 18:31   ` Paolo Bonzini
  2014-10-03 19:21 ` [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert
  2014-11-21  3:48 ` zhanghailiang
  48 siblings, 2 replies; 204+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-10-03 17:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Tweak the end-of-migration cleanup: we don't want to close things down
at the end of the main stream, since postcopy is still sending pages
on the other thread.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/migration.c b/migration.c
index f0f2e2f..1ee5b1b 100644
--- a/migration.c
+++ b/migration.c
@@ -205,12 +205,33 @@ static void process_incoming_migration_co(void *opaque)
 {
     QEMUFile *f = opaque;
     Error *local_err = NULL;
+    MigrationIncomingState *mis;
     int ret;
 
-    migration_incoming_state_init(f);
+    mis = migration_incoming_state_init(f);
 
     ret = qemu_loadvm_state(f);
 
+    DPRINTF("%s: ret=%d postcopy_ram_state=%d", __func__, ret,
+            mis->postcopy_ram_state);
+    if (mis->postcopy_ram_state == POSTCOPY_RAM_INCOMING_ADVISE) {
+        /*
+         * Where a migration had postcopy enabled (and thus went to advise)
+         * but managed to complete within the precopy period
+         */
+        postcopy_ram_incoming_cleanup(mis);
+    } else {
+        if ((ret >= 0) &&
+            (mis->postcopy_ram_state > POSTCOPY_RAM_INCOMING_ADVISE)) {
+            /*
+             * Postcopy was started, cleanup should happen at the end of the
+             * postcopy thread.
+             */
+            DPRINTF("process_incoming_migration_co: exiting main branch");
+            return;
+        }
+    }
+
     qemu_fclose(f);
     free_xbzrle_decoded_buf();
     migration_incoming_state_destroy();
-- 
1.9.3

* Re: [Qemu-devel] [PATCH v4 00/47] Postcopy implementation
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (46 preceding siblings ...)
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 47/47] End of migration for postcopy Dr. David Alan Gilbert (git)
@ 2014-10-03 19:21 ` Dr. David Alan Gilbert
  2014-10-07  2:27   ` Cristian Klein
  2014-11-21  3:48 ` zhanghailiang
  48 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-10-03 19:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy


I've updated our GitHub repository at:
https://github.com/orbitfp7/qemu/tree/wp3-postcopy

to contain this version.

It corresponds to the tag:
https://github.com/orbitfp7/qemu/releases/tag/wp3-postcopy-v4

Dave

* Dr. David Alan Gilbert (git) (dgilbert@redhat.com) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Hi,
>   This is the 4th cut of my version of postcopy; it is designed for use with
> the Linux kernel additions just posted by Andrea Arcangeli here:
> 
> http://marc.info/?l=linux-kernel&m=141235633015100&w=2
> 
> (Note: This is a new version compared to my previous postcopy patchset; you'll
> need to update the kernel to the new version.)
> 
> Other than the new kernel ABI (which is only a small change to the userspace side),
> the major changes are:
> 
>   a) Code for host page size != target page size
>   b) Support for migration over fd 
>      From Cristian Klein; this is for libvirt support which Cristian recently
>      posted to the libvirt list.
>   c) It's now build bisectable and builds on 32bit
> 
> Testing-wise, I've now done many thousands of postcopy migrations without
> failure (of both idle and busy guests), so it seems pretty solid.
> 
> Must-TODO's:
>   1) A partially repeatable migration_cancel failure
>   2) virt_test's migrate.with_reboot test is failing
>   3) The ACPI fix in 2.1 that allowed migrating RAMBlocks to be larger than
>     the source feels like it needs looking at for postcopy.
>   4) Paolo's comments with respect to the wakeup_request/is_running code
>      in the migration thread
>   5) xbzrle needs disabling once in postcopy
> 
> Later-TODO's:
>   1) Control the rate of background page transfers during postcopy to
>      reduce their impact on the latency of postcopy requests.
>   2) Work with RDMA
>   3) Could destination RP be made blocking (as per discussion with Paolo;
>      I'm still worried that that changes too many assumptions)
> 
> 
> 
> V4:
>   Initial support for host page size != target page size
>     - tested heavily on hps==tps
>     - only partially tested on hps!=tps systems
>     - This involved quite a bit of rework around the discard code
>   Updated to new kernel userfault ABI
>     - It won't work with the previous version
>   Fix mis-optimisation of postcopy request for wrong RAMBlock
>      request for block A offset n
>      un-needed fault for block B/m (already received - no req sent)
>      request for block B/l  - wrongly sent as request for A/l
>   Fix thinko in discard bitmap processing (missed last word of bitmap)
>      Symptom: remap failures near the top of RAM if postcopy started late
>   Fix bug that caused kernel page acknowledgments to be misaligned
>      May have meant the guest was paused for longer than required
>   Fix potential for crashing cleaning up failed RP
>   Fixes in docs (from Yang)
>   Handle migration by fd as sockets if they are sockets
>   Build tested on 32bit
>   Fully build bisectable (x86-64)
> 
> 
> Dave
> 
> Cristian Klein (1):
>   Handle bi-directional communication for fd migration
> 
> Dr. David Alan Gilbert (46):
>   QEMUSizedBuffer based QEMUFile
>   Tests: QEMUSizedBuffer/QEMUBuffer
>   Start documenting how postcopy works.
>   qemu_ram_foreach_block: pass up error value, and down the ramblock
>     name
>   improve DPRINTF macros, add to savevm
>   Add qemu_get_counted_string to read a string prefixed by a count byte
>   Create MigrationIncomingState
>   socket shutdown
>   Provide runtime Target page information
>   Return path: Open a return path on QEMUFile for sockets
>   Return path: socket_writev_buffer: Block even on non-blocking fd's
>   Migration commands
>   Return path: Control commands
>   Return path: Send responses from destination to source
>   Return path: Source handling of return path
>   qemu_loadvm errors and debug
>   ram_debug_dump_bitmap: Dump a migration bitmap as text
>   Rework loadvm path for subloops
>   Add migration-capability boolean for postcopy-ram.
>   Add wrappers and handlers for sending/receiving the postcopy-ram
>     migration messages.
>   QEMU_VM_CMD_PACKAGED: Send a packaged chunk of migration stream
>   migrate_init: Call from savevm
>   Allow savevm handlers to state whether they could go into postcopy
>   postcopy: OS support test
>   migrate_start_postcopy: Command to trigger transition to postcopy
>   MIG_STATE_POSTCOPY_ACTIVE: Add new migration state
>   qemu_savevm_state_complete: Postcopy changes
>   Postcopy page-map-incoming (PMI) structure
>   Postcopy: Maintain sentmap and calculate discard
>   postcopy: Incoming initialisation
>   postcopy: ram_enable_notify to switch on userfault
>   Postcopy: Postcopy startup in migration thread
>   Postcopy: Create a fault handler thread before marking the ram as
>     userfault
>   Page request:  Add MIG_RPCOMM_REQPAGES reverse command
>   Page request: Process incoming page request
>   Page request: Consume pages off the post-copy queue
>   Add assertion to check migration_dirty_pages
>   postcopy_ram.c: place_page and helpers
>   Postcopy: Use helpers to map pages during migration
>   qemu_ram_block_from_host
>   Don't sync dirty bitmaps in postcopy
>   Host page!=target page: Cleanup bitmaps
>   Postcopy; Handle userfault requests
>   Start up a postcopy/listener thread ready for incoming page data
>   postcopy: Wire up loadvm_postcopy_ram_handle_{run,end} commands
>   End of migration for postcopy
> 
>  Makefile.objs                    |    2 +-
>  arch_init.c                      |  739 +++++++++++++++++++++++++--
>  docs/migration.txt               |  189 +++++++
>  exec.c                           |   76 ++-
>  hmp-commands.hx                  |   15 +
>  hmp.c                            |    7 +
>  hmp.h                            |    1 +
>  include/exec/cpu-common.h        |    8 +-
>  include/migration/migration.h    |  130 +++++
>  include/migration/postcopy-ram.h |  106 ++++
>  include/migration/qemu-file.h    |   47 ++
>  include/migration/vmstate.h      |    2 +-
>  include/qemu/sockets.h           |    1 +
>  include/qemu/typedefs.h          |    9 +-
>  include/sysemu/sysemu.h          |   43 +-
>  migration-fd.c                   |   24 +-
>  migration-rdma.c                 |    4 +-
>  migration.c                      |  693 +++++++++++++++++++++++++-
>  postcopy-ram.c                   | 1016 ++++++++++++++++++++++++++++++++++++++
>  qapi-schema.json                 |   14 +-
>  qemu-file.c                      |  598 +++++++++++++++++++++-
>  qmp-commands.hx                  |   19 +
>  savevm.c                         |  881 +++++++++++++++++++++++++++++++--
>  tests/Makefile                   |    2 +-
>  tests/test-vmstate.c             |   74 +--
>  util/qemu-sockets.c              |   28 ++
>  26 files changed, 4550 insertions(+), 178 deletions(-)
>  create mode 100644 include/migration/postcopy-ram.h
>  create mode 100644 postcopy-ram.c
> 
> -- 
> 1.9.3
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v4 33/47] Postcopy: Postcopy startup in migration thread
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 33/47] Postcopy: Postcopy startup in migration thread Dr. David Alan Gilbert (git)
@ 2014-10-04 16:27   ` Paolo Bonzini
  2014-11-20 11:45     ` Dr. David Alan Gilbert
                       ` (2 more replies)
  2014-11-10  6:05   ` David Gibson
  1 sibling, 3 replies; 204+ messages in thread
From: Paolo Bonzini @ 2014-10-04 16:27 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

On 03/10/2014 19:47, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Rework the migration thread to setup and start postcopy.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  include/migration/migration.h |   3 +
>  migration.c                   | 201 ++++++++++++++++++++++++++++++++++++++----
>  2 files changed, 185 insertions(+), 19 deletions(-)
> 
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index b01cc17..f401775 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -125,6 +125,9 @@ struct MigrationState
>      /* Flag set once the migration has been asked to enter postcopy */
>      volatile bool start_postcopy;
>  
> +    /* Flag set once the migration thread is running (and needs joining) */
> +    volatile bool started_migration_thread;

volatile almost never does what you think it does. :)

In this case, I think only one thread reads/writes the variable so
"volatile" is unnecessary.

Otherwise, you would need to add actual memory barriers, atomic
operations, or synchronization primitives.

For start_postcopy it is okay, because there volatile is just a hint to
the compiler not to cache the value, and the processor will eventually see
the assignment.  For this case QEMU has atomic_read/atomic_set
(corresponding to __ATOMIC_RELAXED in C11/C++11), so you could use those
as well.
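As a sketch of the relaxed-atomic alternative, the snippet below uses C11 <stdatomic.h> in place of QEMU's atomic_read/atomic_set macros (which, per the comment above, correspond to relaxed ordering). The flag name and both helpers are hypothetical stand-ins for MigrationState's start_postcopy, not QEMU code. Relaxed ordering only guarantees that the store is eventually observed; it does not order any other memory accesses, so it suits a standalone hint like this one and nothing that publishes data.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for MigrationState's start_postcopy flag. */
static atomic_bool start_postcopy_flag = false;

/* Writer side, e.g. the monitor command that requests postcopy. */
static void request_postcopy(void)
{
    /* Relaxed: the store only has to become visible eventually; no other
     * data is published through this flag. */
    atomic_store_explicit(&start_postcopy_flag, true, memory_order_relaxed);
}

/* Reader side, polled on each iteration of the migration loop. */
static bool postcopy_requested(void)
{
    return atomic_load_explicit(&start_postcopy_flag, memory_order_relaxed);
}
```

If the flag ever had to make other writes visible to the reader (for example, parameters filled in before setting it), the pair would need release/acquire ordering instead.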

>      /* bitmap of pages that have been sent at least once
>       * only maintained and used in postcopy at the moment
>       * where it's used to send the dirtymap at the start
> diff --git a/migration.c b/migration.c
> index 63d70b6..1731017 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -475,7 +475,10 @@ static void migrate_fd_cleanup(void *opaque)
>      if (s->file) {
>          trace_migrate_fd_cleanup();
>          qemu_mutex_unlock_iothread();
> -        qemu_thread_join(&s->thread);
> +        if (s->started_migration_thread) {
> +            qemu_thread_join(&s->thread);
> +            s->started_migration_thread = false;
> +        }
>          qemu_mutex_lock_iothread();
>  
>          qemu_fclose(s->file);
> @@ -872,7 +875,6 @@ out:
>      return NULL;
>  }
>  
> -__attribute__ (( unused )) /* Until later in patch series */
>  static int open_outgoing_return_path(MigrationState *ms)
>  {
>  
> @@ -890,7 +892,6 @@ static int open_outgoing_return_path(MigrationState *ms)
>      return 0;
>  }
>  
> -__attribute__ (( unused )) /* Until later in patch series */
>  static void await_outgoing_return_path_close(MigrationState *ms)
>  {
>      /*
> @@ -908,6 +909,97 @@ static void await_outgoing_return_path_close(MigrationState *ms)
>      DPRINTF("%s: Exit", __func__);
>  }
>  
> +/* Switch from normal iteration to postcopy
> + * Returns non-0 on error
> + */
> +static int postcopy_start(MigrationState *ms)
> +{
> +    int ret;
> +    const QEMUSizedBuffer *qsb;
> +    migrate_set_state(ms, MIG_STATE_ACTIVE, MIG_STATE_POSTCOPY_ACTIVE);
> +
> +    DPRINTF("postcopy_start\n");
> +    qemu_mutex_lock_iothread();
> +    DPRINTF("postcopy_start: setting run state\n");
> +    ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> +
> +    if (ret < 0) {
> +        migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
> +        qemu_mutex_unlock_iothread();
> +        return -1;

Please use "goto" for error returns, like

fail_locked:
    qemu_mutex_unlock_iothread();
fail:
    migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
    return -1;

> +    }
> +
> +    /*
> +     * in Finish migrate and with the io-lock held everything should
> +     * be quiet, but we've potentially still got dirty pages and we
> +     * need to tell the destination to throw any pages it's already received
> +     * that are dirty
> +     */
> +    if (ram_postcopy_send_discard_bitmap(ms)) {
> +        DPRINTF("postcopy send discard bitmap failed\n");
> +        migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
> +        qemu_mutex_unlock_iothread();
> +        return -1;
> +    }
> +
> +    DPRINTF("postcopy_start: sending req 2\n");
> +    qemu_savevm_send_reqack(ms->file, 2);

Perhaps move it below qemu_file_set_rate_limit, and add
trace_qemu_savevm_send_reqack?

Also what is 2/3/4?  Is this just for debugging or is it part of the
protocol?

> +    /*
> +     * send rest of state - note things that are doing postcopy
> +     * will notice we're in MIG_STATE_POSTCOPY_ACTIVE and not actually
> +     * wrap their state up here
> +     */
> +    qemu_file_set_rate_limit(ms->file, INT64_MAX);
> +    DPRINTF("postcopy_start: do state_complete\n");
> +
> +    /*
> +     * We need to leave the fd free for page transfers during the
> +     * loading of the device state, so wrap all the remaining
> +     * commands and state into a package that gets sent in one go
> +     */

The comments in the code are very nice.  Thanks.  This is a huge
improvement from the last version I received.

> +    QEMUFile *fb = qemu_bufopen("w", NULL);
> +    if (!fb) {
> +        error_report("Failed to create buffered file");
> +        migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
> +        qemu_mutex_unlock_iothread();
> +        return -1;
> +    }
> +
> +    qemu_savevm_state_complete(fb);
> +    DPRINTF("postcopy_start: sending req 3\n");
> +    qemu_savevm_send_reqack(fb, 3);
> +
> +    qemu_savevm_send_postcopy_ram_run(fb);
> +
> +    /* <><> end of stuff going into the package */
> +    qsb = qemu_buf_get(fb);
> +
> +    /* Now send that blob */
> +    if (qsb_get_length(qsb) > MAX_VM_CMD_PACKAGED_SIZE) {
> +        DPRINTF("postcopy_start: Unreasonably large packaged state: %lu\n",
> +                (unsigned long)(qsb_get_length(qsb)));
> +        migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
> +        qemu_mutex_unlock_iothread();
> +        qemu_fclose(fb);

Close fb above migrate_set_state, and use goto as above.  Or just have
three labels.
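A minimal sketch of the three-label variant, assuming a hypothetical fake_mig structure whose booleans stand in for the iothread lock and the buffered QEMUFile; step() simulates a failure at a chosen point. The labels unwind in the reverse order of acquisition, so each error path jumps to the first resource it still holds, and the buffer is closed before the state is set to error.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the state touched by postcopy_start(). */
enum mig_state { ST_POSTCOPY_ACTIVE, ST_ERROR };

struct fake_mig {
    enum mig_state state;
    bool iothread_locked;   /* models qemu_mutex_lock_iothread() */
    bool buffer_open;       /* models the QEMUFile from qemu_bufopen() */
    int fail_at;            /* step that should fail: 0 = none, 1..3 */
};

/* Simulates a call that fails only when it is the selected step. */
static int step(const struct fake_mig *m, int n)
{
    return m->fail_at == n ? -1 : 0;
}

static int postcopy_start_sketch(struct fake_mig *m)
{
    m->iothread_locked = true;
    if (step(m, 1) < 0) {           /* e.g. vm_stop_force_state() */
        goto fail_locked;
    }
    m->buffer_open = true;
    if (step(m, 2) < 0) {           /* e.g. package too large */
        goto fail_closefb;
    }
    m->buffer_open = false;         /* qemu_fclose(fb) on success */
    m->iothread_locked = false;
    if (step(m, 3) < 0) {           /* e.g. stream error after unlocking */
        goto fail;
    }
    return 0;

fail_closefb:
    m->buffer_open = false;
fail_locked:
    m->iothread_locked = false;
fail:
    m->state = ST_ERROR;            /* migrate_set_state(..., ERROR) */
    return -1;
}
```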

> +        return -1;
> +    }
> +    qemu_savevm_send_packaged(ms->file, qsb);
> +    qemu_fclose(fb);
> +
> +    qemu_mutex_unlock_iothread();
> +
> +    DPRINTF("postcopy_start not finished sending ack\n");
> +    qemu_savevm_send_reqack(ms->file, 4);
> +
> +    ret = qemu_file_get_error(ms->file);
> +    if (ret) {
> +        error_report("postcopy_start: Migration stream errored");

This should have been reported already.

> +        migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
> +    }
> +
> +    return ret;
> +}
> +
>  /*
>   * Master migration thread on the source VM.
>   * It drives the migration and pumps the data down the outgoing channel.
> @@ -915,16 +1007,36 @@ static void await_outgoing_return_path_close(MigrationState *ms)
>  static void *migration_thread(void *opaque)
>  {
>      MigrationState *s = opaque;
> +    /* Used by the bandwidth calcs, updated later */
>      int64_t initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>      int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>      int64_t initial_bytes = 0;
>      int64_t max_size = 0;
>      int64_t start_time = initial_time;
> +
>      bool old_vm_running = false;
>  
> +    /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */
> +    enum MigrationPhase current_active_type = MIG_STATE_ACTIVE;
> +
>      qemu_savevm_state_begin(s->file, &s->params);
>  
> +    if (migrate_postcopy_ram()) {
> +        /* Now tell the dest that it should open it's end so it can reply */
> +        qemu_savevm_send_openrp(s->file);
> +
> +        /* And ask it to send an ack that will make stuff easier to debug */
> +        qemu_savevm_send_reqack(s->file, 1);
> +
> +        /* Tell the destination that we *might* want to do postcopy later;
> +         * if the other end can't do postcopy it should fail now, nice and
> +         * early.
> +         */
> +        qemu_savevm_send_postcopy_ram_advise(s->file);
> +    }

Should this be done here or in the save_state_begin function for RAM?
In general, I'm curious if there are parts of postcopy_start that
could/should be changed into new save state functions (with
postcopy_start just iterating on all devices).

>      s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
> +    current_active_type = MIG_STATE_ACTIVE;
>      migrate_set_state(s, MIG_STATE_SETUP, MIG_STATE_ACTIVE);
>  
>      DPRINTF("setup complete\n");
> @@ -945,37 +1057,74 @@ static void *migration_thread(void *opaque)
>                      " nonpost=%" PRIu64 ")\n",
>                      pending_size, max_size, pend_post, pend_nonpost);
>              if (pending_size && pending_size >= max_size) {
> +                /* Still a significant amount to transfer */
> +
> +                current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +                if (migrate_postcopy_ram() &&
> +                    s->state != MIG_STATE_POSTCOPY_ACTIVE &&
> +                    pend_nonpost == 0 && s->start_postcopy) {
> +
> +                    if (!postcopy_start(s)) {
> +                        current_active_type = MIG_STATE_POSTCOPY_ACTIVE;
> +                    }
> +
> +                    continue;
> +                }
> +                /* Just another iteration step */
>                  qemu_savevm_state_iterate(s->file);
>              } else {
>                  int ret;
>  
> -                DPRINTF("done iterating\n");
> -                qemu_mutex_lock_iothread();
> -                start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> -                qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
> -                old_vm_running = runstate_is_running();
> +                DPRINTF("done iterating pending size %" PRIu64 "\n",
> +                        pending_size);
> +
> +                if (s->state == MIG_STATE_ACTIVE) {
> +                    qemu_mutex_lock_iothread();
> +                    start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +                    qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
> +                    old_vm_running = runstate_is_running();
> +
> +                    ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> +                    if (ret >= 0) {
> +                        qemu_file_set_rate_limit(s->file, INT64_MAX);
> +                        qemu_savevm_state_complete(s->file);
> +                    }
> +                    qemu_mutex_unlock_iothread();
> +
> +                    if (ret < 0) {
> +                        migrate_set_state(s, current_active_type,
> +                                          MIG_STATE_ERROR);
> +                        break;
> +                    }
> +                } else if (s->state == MIG_STATE_POSTCOPY_ACTIVE) {
> +                    DPRINTF("postcopy end\n");
> +
> +                    qemu_savevm_state_postcopy_complete(s->file);
> +                    DPRINTF("postcopy end after complete\n");
>  
> -                ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> -                if (ret >= 0) {
> -                    qemu_file_set_rate_limit(s->file, INT64_MAX);
> -                    qemu_savevm_state_complete(s->file);
>                  }
> -                qemu_mutex_unlock_iothread();
>  
> -                if (ret < 0) {
> -                    migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_ERROR);
> -                    break;
> +                /*
> +                 * If rp was opened we must clean up the thread before
> +                 * cleaning everything else up.
> +                 * Postcopy opens rp if enabled (even if it's not avtivated)
> +                 */
> +                if (migrate_postcopy_ram()) {
> +                    DPRINTF("before rp close");
> +                    await_outgoing_return_path_close(s);

Should this be done even if there is an error?  Perhaps move it
altogether out of the big migration thread while() loop?

> +                    DPRINTF("after rp close");
>                  }
> -
>                  if (!qemu_file_get_error(s->file)) {
> -                    migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_COMPLETED);
> +                    migrate_set_state(s, current_active_type,
> +                                      MIG_STATE_COMPLETED);
>                      break;
>                  }

This "else" is huge, can you extract it into its own function?

>              }
>          }
>  
>          if (qemu_file_get_error(s->file)) {
> -            migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_ERROR);
> +            migrate_set_state(s, current_active_type, MIG_STATE_ERROR);
> +            DPRINTF("migration_thread: file is in error state\n");
>              break;
>          }
>          current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> @@ -1006,6 +1155,7 @@ static void *migration_thread(void *opaque)
>          }
>      }
>  
> +    DPRINTF("migration_thread: After loop");
>      qemu_mutex_lock_iothread();
>      if (s->state == MIG_STATE_COMPLETED) {
>          int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> @@ -1043,6 +1193,19 @@ void migrate_fd_connect(MigrationState *s)
>      /* Notify before starting migration thread */
>      notifier_list_notify(&migration_state_notifiers, s);
>  
> +    /* Open the return path; currently for postcopy but other things might
> +     * also want it.
> +     */
> +    if (migrate_postcopy_ram()) {
> +        if (open_outgoing_return_path(s)) {
> +            error_report("Unable to open return-path for postcopy");
> +            migrate_set_state(s, MIG_STATE_SETUP, MIG_STATE_ERROR);
> +            migrate_fd_cleanup(s);
> +            return;
> +        }
> +    }
> +
>      qemu_thread_create(&s->thread, "migration", migration_thread, s,
>                         QEMU_THREAD_JOINABLE);
> +    s->started_migration_thread = true;
>  }
> 

* Re: [Qemu-devel] [PATCH v4 32/47] postcopy: ram_enable_notify to switch on userfault
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 32/47] postcopy: ram_enable_notify to switch on userfault Dr. David Alan Gilbert (git)
@ 2014-10-04 16:42   ` Paolo Bonzini
  2014-10-06 19:00     ` Dr. David Alan Gilbert
  2014-11-05  6:49   ` David Gibson
  1 sibling, 1 reply; 204+ messages in thread
From: Paolo Bonzini @ 2014-10-04 16:42 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

On 03/10/2014 19:47, Dr. David Alan Gilbert (git) wrote:
> +static int postcopy_ram_sensitise_area(const char *block_name, void *host_addr,
> +                                       ram_addr_t offset, ram_addr_t length,
> +                                       void *opaque)

Weird name, and I'm not referring to the British -ise. :)

Perhaps ram_block_enable_userfault or ram_block_enable_notify?  It helps
clarity not to use the "postcopy_ram_" prefix for static functions.

Paolo

* Re: [Qemu-devel] [PATCH v4 19/47] Rework loadvm path for subloops
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 19/47] Rework loadvm path for subloops Dr. David Alan Gilbert (git)
@ 2014-10-04 16:46   ` Paolo Bonzini
  2014-10-07  8:58     ` Dr. David Alan Gilbert
  2014-11-03  5:08   ` David Gibson
  1 sibling, 1 reply; 204+ messages in thread
From: Paolo Bonzini @ 2014-10-04 16:46 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

On 03/10/2014 19:47, Dr. David Alan Gilbert (git) wrote:
>  
> +/* These are ORable flags */

... make them an "enum".

> +const int LOADVM_EXITCODE_QUITLOOP     =  1;
> +const int LOADVM_EXITCODE_QUITPARENT   =  2;

LOADVM_QUIT_ALL, LOADVM_QUIT respectively?

> +const int LOADVM_EXITCODE_KEEPHANDLERS =  4;
> +

Is it more common to drop or keep handlers?

In either case, please add a comment to the three constants that details
how to use them.  In particular, please document why you should drop
(resp. keep) handlers...

Is it by chance that they are only used in savevm.c?  Should they be
moved to a header file?

> 
> +    if (exitcode & LOADVM_EXITCODE_QUITPARENT) {
> +        DPRINTF("loadvm_handlers_state_main: End of loop with QUITPARENT");
> +        exitcode &= ~LOADVM_EXITCODE_QUITPARENT;
> +        exitcode &= LOADVM_EXITCODE_QUITLOOP;

Either you want |=, or the first &= is useless.
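To make the bug concrete: with `&=`, an exit code of QUITPARENT alone becomes 0 (2 & ~2 is 0, and 0 & 1 is still 0), and a KEEPHANDLERS bit would be dropped as well. The sketch below assumes the intent is to turn QUITPARENT into QUITLOOP for the parent loop while preserving the other bits; the enum follows the suggestion above, and the helper name propagate_to_parent is hypothetical.

```c
#include <assert.h>

/* ORable loadvm exit flags as an enum; the comments give our reading of
 * the intended meaning, not a confirmed specification. */
enum {
    LOADVM_EXITCODE_QUITLOOP     = 1,  /* leave the current loadvm loop */
    LOADVM_EXITCODE_QUITPARENT   = 2,  /* leave this loop and its parent */
    LOADVM_EXITCODE_KEEPHANDLERS = 4,  /* don't free the handlers on exit */
};

/* Hypothetical helper: what a nested loop hands back to its parent.
 * QUITPARENT is consumed here and becomes QUITLOOP for the parent;
 * any other bits are passed through unchanged. */
static int propagate_to_parent(int exitcode)
{
    if (exitcode & LOADVM_EXITCODE_QUITPARENT) {
        exitcode &= ~LOADVM_EXITCODE_QUITPARENT;
        exitcode |= LOADVM_EXITCODE_QUITLOOP;   /* |=, not &= */
    }
    return exitcode;
}
```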

Paolo

* Re: [Qemu-devel] [PATCH v4 47/47] End of migration for postcopy
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 47/47] End of migration for postcopy Dr. David Alan Gilbert (git)
@ 2014-10-04 17:49   ` Paolo Bonzini
  2014-10-23 14:24     ` Dr. David Alan Gilbert
  2014-10-04 18:31   ` Paolo Bonzini
  1 sibling, 1 reply; 204+ messages in thread
From: Paolo Bonzini @ 2014-10-04 17:49 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

On 03/10/2014 19:47, Dr. David Alan Gilbert (git) wrote:
> +            mis->postcopy_ram_state);
> +    if (mis->postcopy_ram_state == POSTCOPY_RAM_INCOMING_ADVISE) {
> +        /*
> +         * Where a migration had postcopy enabled (and thus went to advise)
> +         * but managed to complete within the precopy period
> +         */
> +        postcopy_ram_incoming_cleanup(mis);
> +    } else {
> +        if ((ret >= 0) &&
> +            (mis->postcopy_ram_state > POSTCOPY_RAM_INCOMING_ADVISE)) {
> +            /*
> +             * Postcopy was started, cleanup should happen at the end of the
> +             * postcopy thread.
> +             */
> +            DPRINTF("process_incoming_migration_co: exiting main branch");
> +            return;
> +        }

Extra parentheses and extra nesting.
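One way to flatten the decision from patch 47 is to pull it into a small function with early returns. The enum values and their ordering below are assumptions (the patch only relies on ADVISE comparing lower than the states that follow it), and incoming_end_action is a hypothetical name. In process_incoming_migration_co() the same shape would be a plain `if (state == ADVISE) { ... } else if (ret >= 0 && state > ADVISE) { return; }` chain.

```c
#include <assert.h>

/* Hypothetical ordering of the incoming postcopy states. */
enum postcopy_incoming_state {
    PC_INCOMING_NONE,
    PC_INCOMING_ADVISE,
    PC_INCOMING_LISTENING,
    PC_INCOMING_RUNNING,
};

enum end_action {
    END_CLOSE,              /* close the stream and tear down as before */
    END_CLEANUP_AND_CLOSE,  /* postcopy was advised but never started */
    END_LEAVE_TO_THREAD,    /* the postcopy thread owns the cleanup */
};

static enum end_action incoming_end_action(int ret,
                                           enum postcopy_incoming_state st)
{
    if (st == PC_INCOMING_ADVISE) {
        return END_CLEANUP_AND_CLOSE;
    }
    if (ret >= 0 && st > PC_INCOMING_ADVISE) {
        return END_LEAVE_TO_THREAD;
    }
    return END_CLOSE;
}
```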

Paolo

* Re: [Qemu-devel] [PATCH v4 46/47] postcopy: Wire up loadvm_postcopy_ram_handle_{run, end} commands
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 46/47] postcopy: Wire up loadvm_postcopy_ram_handle_{run, end} commands Dr. David Alan Gilbert (git)
@ 2014-10-04 17:51   ` Paolo Bonzini
  2014-10-23 12:18     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 204+ messages in thread
From: Paolo Bonzini @ 2014-10-04 17:51 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

On 03/10/2014 19:47, Dr. David Alan Gilbert (git) wrote:
> +        bool one_message = false;
> +        /* This looks good, but it's possible that the device loading in the
> +         * main thread hasn't finished yet, and so we might not be in 'RUN'
> +         * state yet.
> +         * TODO: Using an atomic_xchg or something for this

This looks like a good match for QemuEvent.  Or just mutex & condvar.
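A sketch of the mutex-and-condvar option, written with plain POSIX threads rather than QEMU's own thread wrappers; the incoming_state structure and the state constants are hypothetical. The waiter sleeps instead of spinning, and re-checks the state in a loop because pthread_cond_wait() may return spuriously. main_thread_sim() stands in for the main thread finishing device loading and moving to RUN.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical state values; only "still listening" vs "not" matters. */
enum { ST_LISTENING = 1, ST_RUNNING = 2 };

struct incoming_state {
    pthread_mutex_t lock;
    pthread_cond_t changed;   /* broadcast on every state change */
    int state;                /* protected by lock */
};

static void incoming_set_state(struct incoming_state *s, int new_state)
{
    pthread_mutex_lock(&s->lock);
    s->state = new_state;
    pthread_cond_broadcast(&s->changed);
    pthread_mutex_unlock(&s->lock);
}

/* Replaces the busy-wait: sleeps until the state leaves LISTENING and
 * returns the state it observed. */
static int incoming_wait_while_listening(struct incoming_state *s)
{
    int st;

    pthread_mutex_lock(&s->lock);
    while (s->state == ST_LISTENING) {
        pthread_cond_wait(&s->changed, &s->lock);
    }
    st = s->state;
    pthread_mutex_unlock(&s->lock);
    return st;
}

/* Simulated main thread: device loading done, switch to RUN. */
static void *main_thread_sim(void *opaque)
{
    incoming_set_state(opaque, ST_RUNNING);
    return NULL;
}
```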

> +         */
> +        while (mis->postcopy_ram_state == POSTCOPY_RAM_INCOMING_LISTENING) {

What if we had postcopy of something else than RAM?  Can you remove the
"ram" part from the symbols that do not directly deal with RAM but just
with the protocol?

Paolo

> +            if (!one_message) {
> +                DPRINTF("%s: Waiting for RUN", __func__);
> +                one_message = true;
> +            }
> +        }
> +    }

* Re: [Qemu-devel] [PATCH v4 37/47] Page request: Consume pages off the post-copy queue
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 37/47] Page request: Consume pages off the post-copy queue Dr. David Alan Gilbert (git)
@ 2014-10-04 18:04   ` Paolo Bonzini
  2014-10-07 11:35     ` Dr. David Alan Gilbert
  2014-11-11  1:13   ` David Gibson
  1 sibling, 1 reply; 204+ messages in thread
From: Paolo Bonzini @ 2014-10-04 18:04 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

On 03/10/2014 19:47, Dr. David Alan Gilbert (git) wrote:
> +        /*
> +         * Don't break host-page chunks up with queue items
> +         * so only unqueue if,
> +         *   a) The last item came from the queue anyway
> +         *   b) The last sent item was the last target-page in a host page
> +         */
> +        if (last_was_from_queue || (!last_sent_block) ||

Extra parentheses.  Is the last_was_from_queue check necessary?  Or
would one of the other checks be true anyway if last_was_from_queue is true?

> +            /* We're sending this page, and since it's postcopy nothing else
> +             * will dirty it, and we must make sure it doesn't get sent again.
> +             */
> +            if (!migration_bitmap_clear_dirty(bitoffset << TARGET_PAGE_BITS)) {
> +                DPRINTF("%s: Not dirty for postcopy %s/%zx bito=%zx (sent=%d)",
> +                        __func__, tmpblock->idstr, tmpoffset, bitoffset,
> +                        test_bit(bitoffset, ms->sentmap));

If a DPRINTF occurs in a loop, please change it to a tracepoint.

This function looks like a candidate for cleaning its logic up and/or
splitting it.  But it can be done later by the poor soul who will touch
it next. :)

Paolo
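One way to pull the unqueue condition out into a small helper, as a first step toward the clean-up suggested above. This is a hypothetical sketch of rules (a) and (b) from the quoted comment only; the function name and parameters are assumptions, last_offset is taken to be the offset of the last target page sent within a host-page-aligned block, and host_page_size is assumed to be a multiple of target_page_size:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: may the sender take the next page from the
 * postcopy request queue without splitting a host page?
 */
static bool may_unqueue(bool last_was_from_queue, const void *last_sent_block,
                        uint64_t last_offset, uint64_t target_page_size,
                        uint64_t host_page_size)
{
    /* (a) already serving the queue, or nothing has been sent yet */
    if (last_was_from_queue || last_sent_block == NULL) {
        return true;
    }
    /* (b) the last target page sent was the final one in its host page */
    return ((last_offset + target_page_size) % host_page_size) == 0;
}
```

When host and target page sizes are equal, (b) is always true, so the queue can be consumed after any page.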


* Re: [Qemu-devel] [PATCH v4 14/47] Return path: Control commands
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 14/47] Return path: Control commands Dr. David Alan Gilbert (git)
@ 2014-10-04 18:08   ` Paolo Bonzini
  2014-10-23 16:23     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 204+ messages in thread
From: Paolo Bonzini @ 2014-10-04 18:08 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

On 03/10/2014 19:47, Dr. David Alan Gilbert (git) wrote:
>      QEMU_VM_CMD_INVALID = 0,   /* Must be 0 */
> +    QEMU_VM_CMD_OPENRP,        /* Tell the dest to open the Return path */

OPEN_RETURN_PATH?

> +    QEMU_VM_CMD_REQACK,        /* Request an ACK on the RP */

SEND_ACK or ACK_REQUESTED?

>      QEMU_VM_CMD_AFTERLASTVALID

Pleaseseparatewords.  Is this enum actually used at all?

Please make the QEMU_VM_CMD and MIG_RPCOMM_ prefixes consistent.

Perhaps MIG_CMD and MIG_RPCMD_?

Paolo
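A hypothetical sketch of what the renaming could look like, picking one of each suggested pair of names and giving the sentinel a separated-words name that is actually used by a range check. None of these identifiers are from the patch:

```c
#include <stdbool.h>

/* Hypothetical forward-stream commands with a single MIG_CMD_ prefix. */
enum MigrationCommand {
    MIG_CMD_INVALID = 0,       /* Must be 0 */
    MIG_CMD_OPEN_RETURN_PATH,  /* Tell the destination to open the return path */
    MIG_CMD_SEND_ACK,          /* Request an ACK on the return path */
    MIG_CMD_MAX                /* Sentinel: one past the last valid command */
};

/* Gives the sentinel a real use: validating a command read off the wire. */
static bool mig_cmd_is_valid(int cmd)
{
    return cmd > MIG_CMD_INVALID && cmd < MIG_CMD_MAX;
}
```

The return-path commands would then follow the same pattern with a MIG_RPCMD_ prefix.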


* Re: [Qemu-devel] [PATCH v4 08/47] socket shutdown
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 08/47] socket shutdown Dr. David Alan Gilbert (git)
@ 2014-10-04 18:09   ` Paolo Bonzini
  2014-10-07 10:00     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 204+ messages in thread
From: Paolo Bonzini @ 2014-10-04 18:09 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

On 03/10/2014 19:47, Dr. David Alan Gilbert (git) wrote:
> +#ifndef WIN32
> +    if (rd) {
> +        how = SHUT_RD;
> +    }
> +
> +    if (wr) {
> +        how = rd ? SHUT_RDWR : SHUT_WR;
> +    }
> +
> +#else
> +    /* Untested */
> +    if (rd) {
> +        how = SD_RECEIVE;
> +    }
> +
> +    if (wr) {
> +        how = rd ? SD_BOTH : SD_SEND;
> +    }
> +
> +#endif
> +


These values are actually the same on Windows and non-Windows hosts.  Just
#define SHUT_* to 0/1/2 and avoid the wrapper.

Paolo
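A minimal sketch of that suggestion. Winsock's SD_RECEIVE/SD_SEND/SD_BOTH are 0/1/2, and SHUT_RD/SHUT_WR/SHUT_RDWR have the same values on Linux and the BSDs (POSIX itself does not fix the values), so the names can be mapped once and the #ifdef around the selection logic disappears. The helper name socket_shutdown_how is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

#ifdef _WIN32
#include <winsock2.h>
/* Same values as Winsock's SD_RECEIVE, SD_SEND and SD_BOTH. */
#ifndef SHUT_RD
#define SHUT_RD   0
#define SHUT_WR   1
#define SHUT_RDWR 2
#endif
#else
#include <sys/socket.h>
#endif

/* Pick the 'how' argument for shutdown(); at least one direction is required. */
static int socket_shutdown_how(bool rd, bool wr)
{
    assert(rd || wr);   /* the quoted code would leave 'how' unset here */
    if (rd && wr) {
        return SHUT_RDWR;
    }
    return rd ? SHUT_RD : SHUT_WR;
}
```

The caller then just does `shutdown(fd, socket_shutdown_how(rd, wr))` on every platform.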


* Re: [Qemu-devel] [PATCH v4 16/47] Return path: Source handling of return path
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 16/47] Return path: Source handling of return path Dr. David Alan Gilbert (git)
@ 2014-10-04 18:14   ` Paolo Bonzini
  2014-10-23 18:00     ` Dr. David Alan Gilbert
  2014-10-16  8:26   ` zhanghailiang
  2014-11-03  3:46   ` David Gibson
  2 siblings, 1 reply; 204+ messages in thread
From: Paolo Bonzini @ 2014-10-04 18:14 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

On 03/10/2014 19:47, Dr. David Alan Gilbert (git) wrote:
> +/* Source side RP state */
> +struct MigrationRetPathState {
> +    uint32_t      latest_ack;
> +    QemuThread    rp_thread;
> +    bool          error;

Should the QEMUFile be in here?

> +};
> +

Also please do not abbrev words, and add a typedef that matches the
struct if it is useful.  If it is not, just embed the struct without
giving the type a name (struct { } rp_state).
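A hypothetical sketch of the unnamed-struct option, with the return-path file kept alongside the rest of its state. FILE * and pthread_t stand in for QEMU's QEMUFile * and QemuThread; the type, field and function names are illustrative only:

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Trimmed-down source-side state; the return-path fields have no type name. */
typedef struct MigrationStateSketch {
    int state;
    struct {
        FILE      *from_dst_file;  /* stands in for QEMUFile * */
        uint32_t   latest_ack;
        pthread_t  thread;         /* stands in for QemuThread */
        bool       error;
    } rp_state;
} MigrationStateSketch;

/* Reset the embedded return-path state and attach its file. */
static void rp_state_init(MigrationStateSketch *ms, FILE *from_dst)
{
    memset(&ms->rp_state, 0, sizeof(ms->rp_state));
    ms->rp_state.from_dst_file = from_dst;
}
```

Because the inner struct has no tag, nothing outside MigrationState can declare one on its own; if that ever becomes necessary, that is the point at which a matching typedef is useful.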

> +static bool migration_already_active(MigrationState *ms)
> +{
> +    switch (ms->state) {
> +    case MIG_STATE_ACTIVE:
> +    case MIG_STATE_SETUP:
> +        return true;
> +
> +    default:
> +        return false;
> +
> +    }
> +}

Should CANCELLING also be considered active?  It is on the source->dest
path.
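If CANCELLING is treated as active, the check becomes a three-case switch. A sketch using a hypothetical subset of the source migration states (the enum here is illustrative, not QEMU's definition):

```c
#include <stdbool.h>

/* Hypothetical subset of the source-side migration states. */
typedef enum {
    MIG_STATE_NONE,
    MIG_STATE_SETUP,
    MIG_STATE_ACTIVE,
    MIG_STATE_CANCELLING,
    MIG_STATE_CANCELLED,
    MIG_STATE_COMPLETED,
    MIG_STATE_ERROR,
} MigState;

/* CANCELLING counts as active: the source->dest stream is still open. */
static bool migration_already_active(MigState state)
{
    switch (state) {
    case MIG_STATE_SETUP:
    case MIG_STATE_ACTIVE:
    case MIG_STATE_CANCELLING:
        return true;
    default:
        return false;
    }
}
```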

> 
> +static void await_outgoing_return_path_close(MigrationState *ms)
> +{
> +    /*
> +     * If this is a normal exit then the destination will send a SHUT and the
> +     * rp_thread will exit, however if there's an error we need to cause
> +     * it to exit, which we can do by a shutdown.
> +     * (canceling must also shutdown to stop us getting stuck here if
> +     * the destination died at just the wrong place)
> +     */
> +    if (qemu_file_get_error(ms->file) && ms->return_path) {
> +        qemu_file_shutdown(ms->return_path);
> +    }

As mentioned earlier, I think it's simpler to let these functions handle
the case where there is no return path themselves, and to call them
unconditionally.

Paolo
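A sketch of that shape, with the NULL check moved inside the callee so callers no longer need to test ms->return_path. FileSketch and MigStateSketch are hypothetical stand-ins for QEMUFile and MigrationState, and the shutdown counter exists only to make the behaviour observable:

```c
#include <stddef.h>

/* Hypothetical stand-ins for QEMUFile and MigrationState. */
typedef struct FileSketch {
    int error;           /* non-zero once an I/O error has been recorded */
    int shutdown_calls;  /* counts shutdowns, for illustration only */
} FileSketch;

typedef struct {
    FileSketch *file;         /* forward stream */
    FileSketch *return_path;  /* NULL when no return path was opened */
} MigStateSketch;

static void file_shutdown(FileSketch *f)
{
    f->shutdown_calls++;
}

/* Safe to call unconditionally: it handles the no-return-path case itself. */
static void await_return_path_close(MigStateSketch *ms)
{
    if (!ms->return_path) {
        return;
    }
    if (ms->file->error) {
        /* Kick the return-path thread out of its blocking read. */
        file_shutdown(ms->return_path);
    }
    /* The real function would then join the return-path thread. */
}
```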


* Re: [Qemu-devel] [PATCH v4 47/47] End of migration for postcopy
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 47/47] End of migration for postcopy Dr. David Alan Gilbert (git)
  2014-10-04 17:49   ` Paolo Bonzini
@ 2014-10-04 18:31   ` Paolo Bonzini
  2014-10-07 10:29     ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 204+ messages in thread
From: Paolo Bonzini @ 2014-10-04 18:31 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

On 03/10/2014 19:47, Dr. David Alan Gilbert (git) wrote:
> +            mis->postcopy_ram_state);
> +    if (mis->postcopy_ram_state == POSTCOPY_RAM_INCOMING_ADVISE) {
> +        /*
> +         * Where a migration had postcopy enabled (and thus went to advise)
> +         * but managed to complete within the precopy period
> +         */
> +        postcopy_ram_incoming_cleanup(mis);
> +    } else {
> +        if ((ret >= 0) &&
> +            (mis->postcopy_ram_state > POSTCOPY_RAM_INCOMING_ADVISE)) {

Instead of the >, perhaps it is nicer to use an outer if that checks for
state != NONE?  In fact this check is for state != NONE, since ADVISE has
already been handled above.

Paolo

> +            /*
> +             * Postcopy was started, cleanup should happen at the end of the
> +             * postcopy thread.
> +             */
> +            DPRINTF("process_incoming_migration_co: exiting main branch");
> +            return;
> +        }
> +    }
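The restructuring suggested above can be sketched as a pure decision function. The state names are shortened, hypothetical versions of the POSTCOPY_RAM_INCOMING_* values, ordered as the patch's '>' comparison implies (NONE, then ADVISE, then the later states), and the returned actions are illustrative labels for what the quoted code does:

```c
/* Hypothetical incoming postcopy states, in the order the patch implies. */
typedef enum {
    PC_NONE,
    PC_ADVISE,
    PC_LISTENING,
    PC_RUNNING,
    PC_END,
} PostcopyState;

typedef enum {
    FINISH_PRECOPY,       /* continue with the normal precopy completion */
    CLEANUP_THEN_FINISH,  /* postcopy was advised but never started */
    LEAVE_TO_POSTCOPY,    /* the postcopy thread does the cleanup */
} EndAction;

/* Same outcome as the quoted code, with an outer != NONE test instead of '>'. */
static EndAction end_of_incoming(PostcopyState state, int ret)
{
    if (state != PC_NONE) {
        if (state == PC_ADVISE) {
            return CLEANUP_THEN_FINISH;
        }
        if (ret >= 0) {
            return LEAVE_TO_POSTCOPY;
        }
    }
    return FINISH_PRECOPY;
}
```

This no longer depends on the numeric ordering of the states beyond NONE and ADVISE being distinct.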


* Re: [Qemu-devel] [PATCH v4 38/47] Add assertion to check migration_dirty_pages
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 38/47] Add assertion to check migration_dirty_pages Dr. David Alan Gilbert (git)
@ 2014-10-04 18:32   ` Paolo Bonzini
  2014-10-06 18:51     ` Dr. David Alan Gilbert
  2014-11-11  1:14   ` David Gibson
  1 sibling, 1 reply; 204+ messages in thread
From: Paolo Bonzini @ 2014-10-04 18:32 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

On 03/10/2014 19:47, Dr. David Alan Gilbert (git) wrote:
> 
> I've seen it go negative once during dev, it shouldn't
> happen.

You can move it earlier, perhaps even as patch 1, since it does not have
any dependency on postcopy and can go in at any time.

Paolo


* Re: [Qemu-devel] [PATCH v4 38/47] Add assertion to check migration_dirty_pages
  2014-10-04 18:32   ` Paolo Bonzini
@ 2014-10-06 18:51     ` Dr. David Alan Gilbert
  2014-10-06 20:30       ` Paolo Bonzini
  0 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-10-06 18:51 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> On 03/10/2014 19:47, Dr. David Alan Gilbert (git) wrote:
> > 
> > I've seen it go negative once during dev, it shouldn't
> > happen.
> 
> You can move it earlier, perhaps even as patch 1, since it does not have
> any dependency on postcopy and can go in at any time.

OK, I moved it to the 2nd patch - just after the docs (Eric previously said
he liked those at the start of a patch set).

Dave

> 
> Paolo
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v4 20/47] Add migration-capability boolean for postcopy-ram.
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 20/47] Add migration-capability boolean for postcopy-ram Dr. David Alan Gilbert (git)
@ 2014-10-06 18:59   ` Eric Blake
  2014-10-06 19:07     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 204+ messages in thread
From: Eric Blake @ 2014-10-06 18:59 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy


On 10/03/2014 11:47 AM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>
> ---
>  include/migration/migration.h | 1 +
>  migration.c                   | 9 +++++++++
>  qapi-schema.json              | 6 +++++-
>  3 files changed, 15 insertions(+), 1 deletion(-)
> 

>  #
> +# @x-postcopy-ram: Start executing on the migration target before all of RAM has been
> +#          migrated, pulling the remaining pages along as needed. NOTE: If the
> +#          migration fails during postcopy the VM will fail.  (since 2.2)
> +#
>  # Since: 1.2
>  ##
>  { 'enum': 'MigrationCapability',
> -  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks'] }
> +  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks', 'x-postcopy-ram'] }

Can we wrap this to keep things in 80 columns?  Also, the question was
raised on the libvirt list on whether the interface is stable enough to
name this 'postcopy-ram' from the get-go (rather than marking the
interface experimental), so that libvirt can start using it sooner.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




* Re: [Qemu-devel] [PATCH v4 32/47] postcopy: ram_enable_notify to switch on userfault
  2014-10-04 16:42   ` Paolo Bonzini
@ 2014-10-06 19:00     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-10-06 19:00 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> On 03/10/2014 19:47, Dr. David Alan Gilbert (git) wrote:
> > +static int postcopy_ram_sensitise_area(const char *block_name, void *host_addr,
> > +                                       ram_addr_t offset, ram_addr_t length,
> > +                                       void *opaque)
> 
> Weird name, and I'm not referring to the British -ise. :)
> 
> Perhaps ram_block_enable_userfault or ram_block_enable_notify?  It helps
> clarity to limit the use of the "postcopy_ram_" prefix for static functions.

Yep, that's fair enough; I'll make it ram_block_enable_notify.

Dave

> Paolo
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v4 20/47] Add migration-capability boolean for postcopy-ram.
  2014-10-06 18:59   ` Eric Blake
@ 2014-10-06 19:07     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-10-06 19:07 UTC (permalink / raw)
  To: Eric Blake
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* Eric Blake (eblake@redhat.com) wrote:
> On 10/03/2014 11:47 AM, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Reviewed-by: Eric Blake <eblake@redhat.com>
> > ---
> >  include/migration/migration.h | 1 +
> >  migration.c                   | 9 +++++++++
> >  qapi-schema.json              | 6 +++++-
> >  3 files changed, 15 insertions(+), 1 deletion(-)
> > 
> 
> >  #
> > +# @x-postcopy-ram: Start executing on the migration target before all of RAM has been
> > +#          migrated, pulling the remaining pages along as needed. NOTE: If the
> > +#          migration fails during postcopy the VM will fail.  (since 2.2)
> > +#
> >  # Since: 1.2
> >  ##
> >  { 'enum': 'MigrationCapability',
> > -  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks'] }
> > +  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks', 'x-postcopy-ram'] }
> 
> Can we wrap this to keep things in 80 columns?

Done.

> Also, the question was
> raised on the libvirt list on whether the interface is stable enough to
> name this 'postcopy-ram' from the get-go (rather than marking the
> interface experimental), so that libvirt can start using it sooner.

I'm still nervous about that; what I intend to do is add one
patch at the end of the series that removes the x- so that it can
be discussed separately.

While I'm confident that the interface to libvirt is stable, removing the x-
declares that the whole thing is stable, and I then have to maintain
migration compatibility; it seemed sensible to let people try it for
a release first. However, if libvirt has no way to support QEMU's ability
to have experimental features, I guess no one is actually going to try it,
which is very disappointing.

Dave

> -- 
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v4 38/47] Add assertion to check migration_dirty_pages
  2014-10-06 18:51     ` Dr. David Alan Gilbert
@ 2014-10-06 20:30       ` Paolo Bonzini
  0 siblings, 0 replies; 204+ messages in thread
From: Paolo Bonzini @ 2014-10-06 20:30 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On 06/10/2014 20:51, Dr. David Alan Gilbert wrote:
> * Paolo Bonzini (pbonzini@redhat.com) wrote:
>> On 03/10/2014 19:47, Dr. David Alan Gilbert (git) wrote:
>>>
>>> I've seen it go negative once during dev, it shouldn't
>>> happen.
>>
>> You can move it earlier, perhaps even as patch 1, since it does not have
>> any dependency on postcopy and can go in at any time.
> 
> OK, I moved it to the 2nd patch - just after the docs (Eric previously said
> he liked those at the start of a patch set).

What about sending it for 2.2?  Might as well package it up with Peter's
flags patch and send a pull request, since Juan is busy and has hardly
written to the list for several months now.

Paolo


* Re: [Qemu-devel] [PATCH v4 00/47] Postcopy implementation
  2014-10-03 19:21 ` [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert
@ 2014-10-07  2:27   ` Cristian Klein
  2014-10-07  8:12     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 204+ messages in thread
From: Cristian Klein @ 2014-10-07  2:27 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Andrea Arcangeli, yamahata, lilei, quintela, qemu-devel,
	amit.shah, yanghy

On 04 Oct 2014, at 4:21 , Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:

> 
> I've updated our github at:
> https://github.com/orbitfp7/qemu/tree/wp3-postcopy
> 
> to have this version.
> 
> and it corresponds to the tag:
> https://github.com/orbitfp7/qemu/releases/tag/wp3-postcopy-v4

Hi Dave,

I just tested this version of post-copy using the libvirt patches I recently posted and it works a lot better. The video streaming VM migrates with a downtime of less than 1 second. Before post-copy finishes, the VM is a bit slow but otherwise running well.

I also tested the patches with a VM doing ‘ping’ and the downtime was around 0.6 seconds. I suspect that this delay could be caused by libvirt and not by qemu. Note that libvirt is a bit special, in the sense that the VM is migrated in a suspended state and resumed only after the network has been set up on the destination. I will investigate and let you know.

Cristian 

> * Dr. David Alan Gilbert (git) (dgilbert@redhat.com) wrote:
>> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>> 
>> Hi,
>>  This is the 4th cut of my version of postcopy; it is designed for use with
>> the Linux kernel additions just posted by Andrea Arcangeli here:
>> 
>> http://marc.info/?l=linux-kernel&m=141235633015100&w=2
>> 
>> (Note: This is a new version compared to my previous postcopy patchset; you'll
>> need to update the kernel to the new version.)
>> 
>> Other than the new kernel ABI (which is only a small change to the userspace side),
>> the major changes are:
>> 
>>  a) Code for host page size != target page size
>>  b) Support for migration over fd 
>>     From Cristian Klein; this is for libvirt support which Cristian recently
>>     posted to the libvirt list.
>>  c) It's now build bisectable and builds on 32bit
>> 
>> Testing-wise, I've now done many thousands of postcopy migrations without
>> failure (of both idle and busy guests), so it seems pretty solid.
>> 
>> Must-TODO's:
>>  1) A partially repeatable migration_cancel failure
>>  2) virt_test's migrate.with_reboot test is failing
>>  3) The ACPI fix in 2.1 that allowed migrating RAMBlocks to be larger than
>>    the source feels like it needs looking at for postcopy.
>>  4) Paolo's comments with respect to the wakeup_request/is_running code
>>     in the migration thread
>>  5) xbzrle needs disabling once in postcopy
>> 
>> Later-TODO's:
>>  1) Control the rate of background page transfers during postcopy to
>>     reduce their impact on the latency of postcopy requests.
>>  2) Work with RDMA
>>  3) Could destination RP be made blocking (as per discussion with Paolo;
>>     I'm still worried that that changes too many assumptions)
>> 
>> 
>> 
>> V4:
>>  Initial support for host page size != target page size
>>    - tested heavily on hps==tps
>>    - only partially tested on hps!=tps systems
>>    - This involved quite a bit of rework around the discard code
>>  Updated to new kernel userfault ABI
>>    - It won't work with the previous version
>>  Fix mis-optimisation of postcopy request for wrong RAMBlock
>>     request for block A offset n
>>     un-needed fault for block B/m (already received - no req sent)
>>     request for block B/l  - wrongly sent as request for A/l
>>  Fix thinko in discard bitmap processing (missed last word of bitmap)
>>     Symptom: remap failures near the top of RAM if postcopy started late
>>  Fix bug that caused kernel page acknowledgments to be misaligned
>>     May have meant the guest was paused for longer than required
>>  Fix potential for crashing cleaning up failed RP
>>  Fixes in docs (from Yang)
>>  Handle migration by fd as sockets if they are sockets
>>  Build tested on 32bit
>>  Fully build bisectable (x86-64)
>> 
>> 
>> Dave
>> 
>> Cristian Klein (1):
>>  Handle bi-directional communication for fd migration
>> 
>> Dr. David Alan Gilbert (46):
>>  QEMUSizedBuffer based QEMUFile
>>  Tests: QEMUSizedBuffer/QEMUBuffer
>>  Start documenting how postcopy works.
>>  qemu_ram_foreach_block: pass up error value, and down the ramblock
>>    name
>>  improve DPRINTF macros, add to savevm
>>  Add qemu_get_counted_string to read a string prefixed by a count byte
>>  Create MigrationIncomingState
>>  socket shutdown
>>  Provide runtime Target page information
>>  Return path: Open a return path on QEMUFile for sockets
>>  Return path: socket_writev_buffer: Block even on non-blocking fd's
>>  Migration commands
>>  Return path: Control commands
>>  Return path: Send responses from destination to source
>>  Return path: Source handling of return path
>>  qemu_loadvm errors and debug
>>  ram_debug_dump_bitmap: Dump a migration bitmap as text
>>  Rework loadvm path for subloops
>>  Add migration-capability boolean for postcopy-ram.
>>  Add wrappers and handlers for sending/receiving the postcopy-ram
>>    migration messages.
>>  QEMU_VM_CMD_PACKAGED: Send a packaged chunk of migration stream
>>  migrate_init: Call from savevm
>>  Allow savevm handlers to state whether they could go into postcopy
>>  postcopy: OS support test
>>  migrate_start_postcopy: Command to trigger transition to postcopy
>>  MIG_STATE_POSTCOPY_ACTIVE: Add new migration state
>>  qemu_savevm_state_complete: Postcopy changes
>>  Postcopy page-map-incoming (PMI) structure
>>  Postcopy: Maintain sentmap and calculate discard
>>  postcopy: Incoming initialisation
>>  postcopy: ram_enable_notify to switch on userfault
>>  Postcopy: Postcopy startup in migration thread
>>  Postcopy: Create a fault handler thread before marking the ram as
>>    userfault
>>  Page request:  Add MIG_RPCOMM_REQPAGES reverse command
>>  Page request: Process incoming page request
>>  Page request: Consume pages off the post-copy queue
>>  Add assertion to check migration_dirty_pages
>>  postcopy_ram.c: place_page and helpers
>>  Postcopy: Use helpers to map pages during migration
>>  qemu_ram_block_from_host
>>  Don't sync dirty bitmaps in postcopy
>>  Host page!=target page: Cleanup bitmaps
>>  Postcopy; Handle userfault requests
>>  Start up a postcopy/listener thread ready for incoming page data
>>  postcopy: Wire up loadvm_postcopy_ram_handle_{run,end} commands
>>  End of migration for postcopy
>> 
>> Makefile.objs                    |    2 +-
>> arch_init.c                      |  739 +++++++++++++++++++++++++--
>> docs/migration.txt               |  189 +++++++
>> exec.c                           |   76 ++-
>> hmp-commands.hx                  |   15 +
>> hmp.c                            |    7 +
>> hmp.h                            |    1 +
>> include/exec/cpu-common.h        |    8 +-
>> include/migration/migration.h    |  130 +++++
>> include/migration/postcopy-ram.h |  106 ++++
>> include/migration/qemu-file.h    |   47 ++
>> include/migration/vmstate.h      |    2 +-
>> include/qemu/sockets.h           |    1 +
>> include/qemu/typedefs.h          |    9 +-
>> include/sysemu/sysemu.h          |   43 +-
>> migration-fd.c                   |   24 +-
>> migration-rdma.c                 |    4 +-
>> migration.c                      |  693 +++++++++++++++++++++++++-
>> postcopy-ram.c                   | 1016 ++++++++++++++++++++++++++++++++++++++
>> qapi-schema.json                 |   14 +-
>> qemu-file.c                      |  598 +++++++++++++++++++++-
>> qmp-commands.hx                  |   19 +
>> savevm.c                         |  881 +++++++++++++++++++++++++++++++--
>> tests/Makefile                   |    2 +-
>> tests/test-vmstate.c             |   74 +--
>> util/qemu-sockets.c              |   28 ++
>> 26 files changed, 4550 insertions(+), 178 deletions(-)
>> create mode 100644 include/migration/postcopy-ram.h
>> create mode 100644 postcopy-ram.c
>> 
>> -- 
>> 1.9.3
>> 
>> 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v4 00/47] Postcopy implementation
  2014-10-07  2:27   ` Cristian Klein
@ 2014-10-07  8:12     ` Dr. David Alan Gilbert
  2014-10-08  8:36       ` Cristian Klein
  0 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-10-07  8:12 UTC (permalink / raw)
  To: Cristian Klein
  Cc: Andrea Arcangeli, yamahata, lilei, quintela, qemu-devel,
	amit.shah, yanghy

* Cristian Klein (cristian.klein@cs.umu.se) wrote:
> On 04 Oct 2014, at 4:21 , Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
> 
> > 
> > I've updated our github at:
> > https://github.com/orbitfp7/qemu/tree/wp3-postcopy
> > 
> > to have this version.
> > 
> > and it corresponds to the tag:
> > https://github.com/orbitfp7/qemu/releases/tag/wp3-postcopy-v4
> 
> Hi Dave,
> 
> I just tested this version of post-copy using the libvirt patches I recently posted and it works a lot better. The video streaming VM migrates with a downtime of less than 1 second. Before post-copy finishes, the VM is a bit slow but otherwise running well.
> 
> I also tested the patches with a VM doing ‘ping’ and the downtime was around 0.6 seconds. I suspect that this delay could be caused by libvirt and not by qemu. Note that libvirt is a bit special, in the sense that the VM is migrated in a suspended state and resumed only after the network has been set up on the destination. I will investigate and let you know.

That's great news, although I'm not quite sure what caused the improvement; there
were quite a few minor bug fixes and the like, but nothing I can think of that
would directly contribute (except the patches I'd sent you, which you'd already tried).

Dave

> 
> Cristian 
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v4 19/47] Rework loadvm path for subloops
  2014-10-04 16:46   ` Paolo Bonzini
@ 2014-10-07  8:58     ` Dr. David Alan Gilbert
  2014-10-07 10:12       ` Paolo Bonzini
  0 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-10-07  8:58 UTC
  To: Paolo Bonzini
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> Il 03/10/2014 19:47, Dr. David Alan Gilbert (git) ha scritto:
> >  
> > +/* These are ORable flags */
> 
> ... make them an "enum".

OK, will do.  I've generally avoided using an enum for ORable flags whose
combinations aren't themselves members of the enum, but I can do that.

> > +const int LOADVM_EXITCODE_QUITLOOP     =  1;
> > +const int LOADVM_EXITCODE_QUITPARENT   =  2;
> 
> LOADVM_QUIT_ALL, LOADVM_QUIT respectively?


> > +const int LOADVM_EXITCODE_KEEPHANDLERS =  4;
> > +
> 
> Is it more common to drop or keep handlers?

It's more common to drop them.

> In either case, please add a comment to the three constants that details
> how to use them.  In particular, please document why you should drop
> (resp. keep) handlers...

Does this make it clearer:

/* ORable flags that control the (potentially nested) loadvm_state loops */
enum LoadVMExitCodes {
    /* Quit the loop level that received this command */
    LOADVM_QUIT_LOOP     =  1,
    /* Quit this loop and our parent */
    LOADVM_QUIT_PARENT   =  2,
    /*
     * Keep the LoadStateEntry handler list after the loop exits,
     * because they're being used in another thread.
     */
    LOADVM_KEEP_HANDLERS =  4,
};

> Is it by chance that they are only used in savevm.c?  Should they be
> moved to a header file?

They're local to savevm.c.

> > +    if (exitcode & LOADVM_EXITCODE_QUITPARENT) {
> > +        DPRINTF("loadvm_handlers_state_main: End of loop with QUITPARENT");
> > +        exitcode &= ~LOADVM_EXITCODE_QUITPARENT;
> > +        exitcode &= LOADVM_EXITCODE_QUITLOOP;
> 
> Either you want |=, or the first &= is useless.

Ooh nicely spotted; yes that should be |=  - now I need to figure out why this
didn't break things.

The idea is we have:
 1   outer loadvm_state loop
 2      receives packaged command
 3        inner_loadvm_state loop
 4          receives handle_listen
 5          < QUITPARENT
 6        < QUITLOOP
 7       < QUITLOOP
 8   exits

so QUITPARENT causes its parent to exit, and to do that
the inner loop transforms QUITPARENT into QUITLOOP as its
exit code.

Dave

> 
> Paolo
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v4 08/47] socket shutdown
  2014-10-04 18:09   ` Paolo Bonzini
@ 2014-10-07 10:00     ` Dr. David Alan Gilbert
  2014-10-07 11:10       ` Paolo Bonzini
  0 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-10-07 10:00 UTC
  To: Paolo Bonzini
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> Il 03/10/2014 19:47, Dr. David Alan Gilbert (git) ha scritto:
> > +#ifndef WIN32
> > +    if (rd) {
> > +        how = SHUT_RD;
> > +    }
> > +
> > +    if (wr) {
> > +        how = rd ? SHUT_RDWR : SHUT_WR;
> > +    }
> > +
> > +#else
> > +    /* Untested */
> > +    if (rd) {
> > +        how = SD_RECEIVE;
> > +    }
> > +
> > +    if (wr) {
> > +        how = rd ? SD_BOTH : SD_SEND;
> > +    }
> > +
> > +#endif
> > +
> 
> 
> These are the same on Windows and non-Windows actually.  Just #define
> SHUT_* to 0/1/2 and avoid the wrapper.

OK, something like this? (the qemu-file.c abstraction is still needed
to cover QEMUFiles that aren't simple sockets, but I've removed the
second layer in util/qemu-sockets.c).


--- a/include/qemu/sockets.h
+++ b/include/qemu/sockets.h
@@ -44,6 +44,13 @@ int socket_set_fast_reuse(int fd);
 int send_all(int fd, const void *buf, int len1);
 int recv_all(int fd, void *buf, int len1, bool single_read);
 
+#ifdef WIN32
+/* Windows has different names for the same constants with the same values */
+#define SHUT_RD   0
+#define SHUT_WR   1
+#define SHUT_RDWR 2
+#endif
+
 /* callback function for nonblocking connect
  * valid fd on success, negative error code on failure
  */

--- a/qemu-file.c
+++ b/qemu-file.c
@@ -90,6 +90,13 @@ static int socket_close(void *opaque)
     return 0;
 }
 
+static int socket_shutdown(void *opaque, bool rd, bool wr)
+{
+    QEMUFileSocket *s = opaque;
+
+    return shutdown(s->fd, rd ? (wr ? SHUT_RDWR : SHUT_RD) : SHUT_WR);
+}
+
 static int stdio_get_fd(void *opaque)
 {
     QEMUFileStdio *s = opaque;
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v4 19/47] Rework loadvm path for subloops
  2014-10-07  8:58     ` Dr. David Alan Gilbert
@ 2014-10-07 10:12       ` Paolo Bonzini
  2014-10-07 10:21         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 204+ messages in thread
From: Paolo Bonzini @ 2014-10-07 10:12 UTC
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

Il 07/10/2014 10:58, Dr. David Alan Gilbert ha scritto:
> 
>>> > > +    if (exitcode & LOADVM_EXITCODE_QUITPARENT) {
>>> > > +        DPRINTF("loadvm_handlers_state_main: End of loop with QUITPARENT");
>>> > > +        exitcode &= ~LOADVM_EXITCODE_QUITPARENT;
>>> > > +        exitcode &= LOADVM_EXITCODE_QUITLOOP;
>> > 
>> > Either you want |=, or the first &= is useless.
> Ooh nicely spotted; yes that should be |=  - now I need to figure out why this
> didn't break things.
> 
> The idea is we have:
>  1   outer loadvm_state loop
>  2      receives packaged command
>  3        inner_loadvm_state loop
>  4          receives handle_listen
>  5          < QUITPARENT
>  6        < QUITLOOP
>  7       < QUITLOOP
>  8   exits
> 
> so QUITPARENT causes its parent to exit, and to do that
> the inner loop transforms QUITPARENT into QUITLOOP as its
> exit code.

Yes, that was my understanding as well.

We have only two nested loops, but if we had three, should it be
QUIT_PARENT or QUIT_ALL?

Paolo

* Re: [Qemu-devel] [PATCH v4 19/47] Rework loadvm path for subloops
  2014-10-07 10:12       ` Paolo Bonzini
@ 2014-10-07 10:21         ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-10-07 10:21 UTC
  To: Paolo Bonzini
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> Il 07/10/2014 10:58, Dr. David Alan Gilbert ha scritto:
> > 
> >>> > > +    if (exitcode & LOADVM_EXITCODE_QUITPARENT) {
> >>> > > +        DPRINTF("loadvm_handlers_state_main: End of loop with QUITPARENT");
> >>> > > +        exitcode &= ~LOADVM_EXITCODE_QUITPARENT;
> >>> > > +        exitcode &= LOADVM_EXITCODE_QUITLOOP;
> >> > 
> >> > Either you want |=, or the first &= is useless.
> > Ooh nicely spotted; yes that should be |=  - now I need to figure out why this
> > didn't break things.
> > 
> > The idea is we have:
> >  1   outer loadvm_state loop
> >  2      receives packaged command
> >  3        inner_loadvm_state loop
> >  4          receives handle_listen
> >  5          < QUITPARENT
> >  6        < QUITLOOP
> >  7       < QUITLOOP
> >  8   exits
> > 
> > so QUITPARENT causes its parent to exit, and to do that
> > the inner loop transforms QUITPARENT into QUITLOOP as its
> > exit code.
> 
> Yes, that was my understanding as well.
> 
> We have only two nested loops, but if we had three, should it be
> QUIT_PARENT or QUIT_ALL?

The answer probably depends on why you've got 3 nested loops; either
way is a bit of guesswork about what some potential future user
wants to do.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v4 47/47] End of migration for postcopy
  2014-10-04 18:31   ` Paolo Bonzini
@ 2014-10-07 10:29     ` Dr. David Alan Gilbert
  2014-10-07 11:12       ` Paolo Bonzini
  0 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-10-07 10:29 UTC
  To: Paolo Bonzini
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> Il 03/10/2014 19:47, Dr. David Alan Gilbert (git) ha scritto:
> > +            mis->postcopy_ram_state);
> > +    if (mis->postcopy_ram_state == POSTCOPY_RAM_INCOMING_ADVISE) {
> > +        /*
> > +         * Where a migration had postcopy enabled (and thus went to advise)
> > +         * but managed to complete within the precopy period
> > +         */
> > +        postcopy_ram_incoming_cleanup(mis);
> > +    } else {
> > +        if ((ret >= 0) &&
> > +            (mis->postcopy_ram_state > POSTCOPY_RAM_INCOMING_ADVISE)) {
> 
> Instead of the >, it is perhaps nicer to use an outer if that checks for
> state != NONE?  Because in fact this check is for state != NONE, since
> ADVISE has been handled above.

You mean something like this (untested)?

  if (mis->postcopy_ram_state != POSTCOPY_RAM_INCOMING_NONE) {
      if (mis->postcopy_ram_state == POSTCOPY_RAM_INCOMING_ADVISE) {
          /*
           * Where a migration had postcopy enabled (and thus went to advise)
           * but managed to complete within the precopy period
           */
          postcopy_ram_incoming_cleanup(mis);
      } else if (ret >= 0) {
          /*
           * Postcopy was started, cleanup should happen at the end of the
           * postcopy thread.
           */
          DPRINTF("process_incoming_migration_co: exiting main branch");
          return;
      }
  }

Dave

> Paolo
> 
> > +            /*
> > +             * Postcopy was started, cleanup should happen at the end of the
> > +             * postcopy thread.
> > +             */
> > +            DPRINTF("process_incoming_migration_co: exiting main branch");
> > +            return;
> > +        }
> > +    }
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v4 08/47] socket shutdown
  2014-10-07 10:00     ` Dr. David Alan Gilbert
@ 2014-10-07 11:10       ` Paolo Bonzini
  0 siblings, 0 replies; 204+ messages in thread
From: Paolo Bonzini @ 2014-10-07 11:10 UTC
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

Il 07/10/2014 12:00, Dr. David Alan Gilbert ha scritto:
> * Paolo Bonzini (pbonzini@redhat.com) wrote:
>> Il 03/10/2014 19:47, Dr. David Alan Gilbert (git) ha scritto:
>>> +#ifndef WIN32
>>> +    if (rd) {
>>> +        how = SHUT_RD;
>>> +    }
>>> +
>>> +    if (wr) {
>>> +        how = rd ? SHUT_RDWR : SHUT_WR;
>>> +    }
>>> +
>>> +#else
>>> +    /* Untested */
>>> +    if (rd) {
>>> +        how = SD_RECEIVE;
>>> +    }
>>> +
>>> +    if (wr) {
>>> +        how = rd ? SD_BOTH : SD_SEND;
>>> +    }
>>> +
>>> +#endif
>>> +
>>
>>
>> These are the same on Windows and non-Windows actually.  Just #define
>> SHUT_* to 0/1/2 and avoid the wrapper.
> 
> OK, something like this? (the qemu-file.c abstraction is still needed
> to cover QEMUFiles that aren't simple sockets, but I've removed the
> second layer in util/qemu-sockets.c).

Yes.  Or just pass SHUT_* directly to socket_shutdown, of course.

Paolo

> 
> --- a/include/qemu/sockets.h
> +++ b/include/qemu/sockets.h
> @@ -44,6 +44,13 @@ int socket_set_fast_reuse(int fd);
>  int send_all(int fd, const void *buf, int len1);
>  int recv_all(int fd, void *buf, int len1, bool single_read);
>  
> +#ifdef WIN32
> +/* Windows has different names for the same constants with the same values */
> +#define SHUT_RD   0
> +#define SHUT_WR   1
> +#define SHUT_RDWR 2
> +#endif
> +
>  /* callback function for nonblocking connect
>   * valid fd on success, negative error code on failure
>   */
> 
> --- a/qemu-file.c
> +++ b/qemu-file.c
> @@ -90,6 +90,13 @@ static int socket_close(void *opaque)
>      return 0;
>  }
>  
> +static int socket_shutdown(void *opaque, bool rd, bool wr)
> +{
> +    QEMUFileSocket *s = opaque;
> +
> +    return shutdown(s->fd, rd ? (wr ? SHUT_RDWR : SHUT_RD) : SHUT_WR);
> +}
> +
>  static int stdio_get_fd(void *opaque)
>  {
>      QEMUFileStdio *s = opaque;
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 

* Re: [Qemu-devel] [PATCH v4 47/47] End of migration for postcopy
  2014-10-07 10:29     ` Dr. David Alan Gilbert
@ 2014-10-07 11:12       ` Paolo Bonzini
  0 siblings, 0 replies; 204+ messages in thread
From: Paolo Bonzini @ 2014-10-07 11:12 UTC
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

Il 07/10/2014 12:29, Dr. David Alan Gilbert ha scritto:
> You mean something like this (untested)?
> 
>   if (mis->postcopy_ram_state != POSTCOPY_RAM_INCOMING_NONE) {
>       if (mis->postcopy_ram_state == POSTCOPY_RAM_INCOMING_ADVISE) {
>           /*
>            * Where a migration had postcopy enabled (and thus went to advise)
>            * but managed to complete within the precopy period
>            */
>           postcopy_ram_incoming_cleanup(mis);
>       } else if (ret >= 0) {
>           /*
>            * Postcopy was started, cleanup should happen at the end of the
>            * postcopy thread.
>            */
>           DPRINTF("process_incoming_migration_co: exiting main branch");
>           return;
>       }
>   }

Yes.  Not sure why postcopy_ram_incoming_cleanup is not needed if ret <
0, but you surely know. :)

Of course, this is subject to my previous comment that I would rename a
lot of postcopy_ram_* symbols to just postcopy_*.

Paolo

* Re: [Qemu-devel] [PATCH v4 37/47] Page request: Consume pages off the post-copy queue
  2014-10-04 18:04   ` Paolo Bonzini
@ 2014-10-07 11:35     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-10-07 11:35 UTC
  To: Paolo Bonzini
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> Il 03/10/2014 19:47, Dr. David Alan Gilbert (git) ha scritto:
> > +        /*
> > +         * Don't break host-page chunks up with queue items
> > +         * so only unqueue if,
> > +         *   a) The last item came from the queue anyway
> > +         *   b) The last sent item was the last target-page in a host page
> > +         */
> > +        if (last_was_from_queue || (!last_sent_block) ||
> 
> Extra parentheses.

Fixed.

> Is the last_was_from_queue check necessary?  Or
> would one of the other checks be true anyway if last_was_from_queue is true?

        if (last_was_from_queue || !last_sent_block ||
            ((last_offset & (hps - 1)) == (hps - TARGET_PAGE_SIZE))) {
            tmpblock = ram_save_unqueue_page(ms, &tmpoffset, &bitoffset);
        }

The last_was_from_queue check is needed.  We go around this loop in
target-page (TP) chunks, each corresponding to one bit in the migration
bitmap, and we want to make sure we don't break into the middle of an
existing host page (HP) with an entry off the queue.
So (reading that if backwards):
   We can take something from the queue if:
         a) We just sent the last TP in an HP
         b) We haven't sent anything yet (unlikely)
         c) The last thing came from the queue anyway

(c) is needed to override (a) when we've just sent a TP from the queue that
is not the last TP in its HP; otherwise we'd send the first TP from the queue
and then resume taking pages from the background scan.

> > +            /* We're sending this page, and since it's postcopy nothing else
> > +             * will dirty it, and we must make sure it doesn't get sent again.
> > +             */
> > +            if (!migration_bitmap_clear_dirty(bitoffset << TARGET_PAGE_BITS)) {
> > +                DPRINTF("%s: Not dirty for postcopy %s/%zx bito=%zx (sent=%d)",
> > +                        __func__, tmpblock->idstr, tmpoffset, bitoffset,
> > +                        test_bit(bitoffset, ms->sentmap));
> 
> If a DPRINTF occurs in a loop, please change it to a tracepoint.

OK, I'll look at that; most of arch_init.c (and the migration code) still uses DPRINTF.

> This function looks like a candidate for cleaning its logic up and/or
> splitting it.  But it can be done later by the poor soul who will touch
> it next. :)

Yep, I've already done it once (see 14bcfdc7f - Split ram_save_block).

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v4 01/47] QEMUSizedBuffer based QEMUFile
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 01/47] QEMUSizedBuffer based QEMUFile Dr. David Alan Gilbert (git)
@ 2014-10-08  2:10   ` zhanghailiang
  2014-11-03  0:53   ` David Gibson
  1 sibling, 0 replies; 204+ messages in thread
From: zhanghailiang @ 2014-10-08  2:10 UTC
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

On 2014/10/4 1:47, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> * Please comment on separate thread for this QEMUSizedBuffer patch *
>
> This is based on Stefan and Joel's patch that creates a QEMUFile that goes
> to a memory buffer; from:
>
> http://lists.gnu.org/archive/html/qemu-devel/2013-03/msg05036.html
>
> Using the QEMUFile interface, this patch adds support functions for
> operating on in-memory sized buffers that can be written to or read from.
>
> Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
> Signed-off-by: Joel Schopp <jschopp@linux.vnet.ibm.com>
>
> For fixes/tweaks I've done:
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>
> Reviewed-by: Eric Blake <eblake@redhat.com>
> ---
>   include/migration/qemu-file.h |  28 +++
>   include/qemu/typedefs.h       |   1 +
>   qemu-file.c                   | 456 ++++++++++++++++++++++++++++++++++++++++++
>   3 files changed, 485 insertions(+)
>
> diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
> index c90f529..6ef8ebc 100644
> --- a/include/migration/qemu-file.h
> +++ b/include/migration/qemu-file.h
> @@ -25,6 +25,8 @@
>   #define QEMU_FILE_H 1
>   #include "exec/cpu-common.h"
>
> +#include <stdint.h>
> +
>   /* This function writes a chunk of data to a file at the given position.
>    * The pos argument can be ignored if the file is only being used for
>    * streaming.  The handler should try to write all of the data it can.
> @@ -94,11 +96,21 @@ typedef struct QEMUFileOps {
>       QEMURamSaveFunc *save_page;
>   } QEMUFileOps;
>
> +struct QEMUSizedBuffer {
> +    struct iovec *iov;
> +    size_t n_iov;
> +    size_t size; /* total allocated size in all iov's */
> +    size_t used; /* number of used bytes */
> +};
> +
> +typedef struct QEMUSizedBuffer QEMUSizedBuffer;
> +

There is a redefinition of typedef 'QEMUSizedBuffer' here; it is also
declared in 'include/qemu/typedefs.h:68', and the compiler complains when I build QEMU. ;)


>   QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops);
>   QEMUFile *qemu_fopen(const char *filename, const char *mode);
>   QEMUFile *qemu_fdopen(int fd, const char *mode);
>   QEMUFile *qemu_fopen_socket(int fd, const char *mode);
>   QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
> +QEMUFile *qemu_bufopen(const char *mode, QEMUSizedBuffer *input);
>   int qemu_get_fd(QEMUFile *f);
>   int qemu_fclose(QEMUFile *f);
>   int64_t qemu_ftell(QEMUFile *f);
> @@ -111,6 +123,22 @@ void qemu_put_byte(QEMUFile *f, int v);
>   void qemu_put_buffer_async(QEMUFile *f, const uint8_t *buf, int size);
>   bool qemu_file_mode_is_not_valid(const char *mode);
>
> +QEMUSizedBuffer *qsb_create(const uint8_t *buffer, size_t len);
> +QEMUSizedBuffer *qsb_clone(const QEMUSizedBuffer *);
> +void qsb_free(QEMUSizedBuffer *);
> +size_t qsb_set_length(QEMUSizedBuffer *qsb, size_t length);
> +size_t qsb_get_length(const QEMUSizedBuffer *qsb);
> +ssize_t qsb_get_buffer(const QEMUSizedBuffer *, off_t start, size_t count,
> +                       uint8_t *buf);
> +ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t *buf,
> +                     off_t pos, size_t count);
> +
> +
> +/*
> + * For use on files opened with qemu_bufopen
> + */
> +const QEMUSizedBuffer *qemu_buf_get(QEMUFile *f);
> +
>   static inline void qemu_put_ubyte(QEMUFile *f, unsigned int v)
>   {
>       qemu_put_byte(f, (int)v);
> diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
> index 5f20b0e..db1153a 100644
> --- a/include/qemu/typedefs.h
> +++ b/include/qemu/typedefs.h
> @@ -60,6 +60,7 @@ typedef struct PCIEAERLog PCIEAERLog;
>   typedef struct PCIEAERErr PCIEAERErr;
>   typedef struct PCIEPort PCIEPort;
>   typedef struct PCIESlot PCIESlot;
> +typedef struct QEMUSizedBuffer QEMUSizedBuffer;

Here! See the comment above. Thanks.

>   typedef struct MSIMessage MSIMessage;
>   typedef struct SerialState SerialState;
>   typedef struct PCMCIACardState PCMCIACardState;
> diff --git a/qemu-file.c b/qemu-file.c
> index a8e3912..ccc516c 100644
> --- a/qemu-file.c
> +++ b/qemu-file.c
> @@ -878,3 +878,459 @@ uint64_t qemu_get_be64(QEMUFile *f)
>       v |= qemu_get_be32(f);
>       return v;
>   }
> +
> +#define QSB_CHUNK_SIZE      (1 << 10)
> +#define QSB_MAX_CHUNK_SIZE  (16 * QSB_CHUNK_SIZE)
> +
> +/**
> + * Create a QEMUSizedBuffer
> + * This type of buffer uses scatter-gather lists internally and
> + * can grow to any size. Any data array in the scatter-gather list
> + * can hold different amount of bytes.
> + *
> + * @buffer: Optional buffer to copy into the QSB
> + * @len: size of initial buffer; if @buffer is given, buffer must
> + *       hold at least len bytes
> + *
> + * Returns a pointer to a QEMUSizedBuffer or NULL on allocation failure
> + */
> +QEMUSizedBuffer *qsb_create(const uint8_t *buffer, size_t len)
> +{
> +    QEMUSizedBuffer *qsb;
> +    size_t alloc_len, num_chunks, i, to_copy;
> +    size_t chunk_size = (len > QSB_MAX_CHUNK_SIZE)
> +                        ? QSB_MAX_CHUNK_SIZE
> +                        : QSB_CHUNK_SIZE;
> +
> +    num_chunks = DIV_ROUND_UP(len ? len : QSB_CHUNK_SIZE, chunk_size);
> +    alloc_len = num_chunks * chunk_size;
> +
> +    qsb = g_try_new0(QEMUSizedBuffer, 1);
> +    if (!qsb) {
> +        return NULL;
> +    }
> +
> +    qsb->iov = g_try_new0(struct iovec, num_chunks);
> +    if (!qsb->iov) {
> +        g_free(qsb);
> +        return NULL;
> +    }
> +
> +    qsb->n_iov = num_chunks;
> +
> +    for (i = 0; i < num_chunks; i++) {
> +        qsb->iov[i].iov_base = g_try_malloc0(chunk_size);
> +        if (!qsb->iov[i].iov_base) {
> +            /* qsb_free is safe since g_free can cope with NULL */
> +            qsb_free(qsb);
> +            return NULL;
> +        }
> +
> +        qsb->iov[i].iov_len = chunk_size;
> +        if (buffer) {
> +            to_copy = (len - qsb->used) > chunk_size
> +                      ? chunk_size : (len - qsb->used);
> +            memcpy(qsb->iov[i].iov_base, &buffer[qsb->used], to_copy);
> +            qsb->used += to_copy;
> +        }
> +    }
> +
> +    qsb->size = alloc_len;
> +
> +    return qsb;
> +}
> +
> +/**
> + * Free the QEMUSizedBuffer
> + *
> + * @qsb: The QEMUSizedBuffer to free
> + */
> +void qsb_free(QEMUSizedBuffer *qsb)
> +{
> +    size_t i;
> +
> +    if (!qsb) {
> +        return;
> +    }
> +
> +    for (i = 0; i < qsb->n_iov; i++) {
> +        g_free(qsb->iov[i].iov_base);
> +    }
> +    g_free(qsb->iov);
> +    g_free(qsb);
> +}
> +
> +/**
> + * Get the number of used bytes in the QEMUSizedBuffer
> + *
> + * @qsb: A QEMUSizedBuffer
> + *
> + * Returns the number of bytes currently used in this buffer
> + */
> +size_t qsb_get_length(const QEMUSizedBuffer *qsb)
> +{
> +    return qsb->used;
> +}
> +
> +/**
> + * Set the length of the buffer; the primary usage of this
> + * function is to truncate the number of used bytes in the buffer.
> + * The size will not be extended beyond the current number of
> + * allocated bytes in the QEMUSizedBuffer.
> + *
> + * @qsb: A QEMUSizedBuffer
> + * @new_len: The new length of bytes in the buffer
> + *
> + * Returns the number of bytes the buffer was truncated or extended
> + * to.
> + */
> +size_t qsb_set_length(QEMUSizedBuffer *qsb, size_t new_len)
> +{
> +    if (new_len <= qsb->size) {
> +        qsb->used = new_len;
> +    } else {
> +        qsb->used = qsb->size;
> +    }
> +    return qsb->used;
> +}
> +
> +/**
> + * Get the iovec that holds the data for a given position @pos.
> + *
> + * @qsb: A QEMUSizedBuffer
> + * @pos: The index of a byte in the buffer
> + * @d_off: Pointer to an offset that this function will indicate
> + *         at what position within the returned iovec the byte
> + *         is to be found
> + *
> + * Returns the index of the iovec that holds the byte at the given
> + * index @pos in the byte stream; a negative number if the iovec
> + * for the given position @pos does not exist.
> + */
> +static ssize_t qsb_get_iovec(const QEMUSizedBuffer *qsb,
> +                             off_t pos, off_t *d_off)
> +{
> +    ssize_t i;
> +    off_t curr = 0;
> +
> +    if (pos > qsb->used) {
> +        return -1;
> +    }
> +
> +    for (i = 0; i < qsb->n_iov; i++) {
> +        if (curr + qsb->iov[i].iov_len > pos) {
> +            *d_off = pos - curr;
> +            return i;
> +        }
> +        curr += qsb->iov[i].iov_len;
> +    }
> +    return -1;
> +}
> +
> +/*
> + * Convert the QEMUSizedBuffer into a flat buffer.
> + *
> + * Note: If at all possible, try to avoid this function since it
> + *       may unnecessarily copy memory around.
> + *
> + * @qsb: pointer to QEMUSizedBuffer
> + * @start: offset to start at
> + * @count: number of bytes to copy
> + * @buf: a pointer to a buffer to write into (at least @count bytes)
> + *
> + * Returns the number of bytes copied into the output buffer
> + */
> +ssize_t qsb_get_buffer(const QEMUSizedBuffer *qsb, off_t start,
> +                       size_t count, uint8_t *buffer)
> +{
> +    const struct iovec *iov;
> +    size_t to_copy, all_copy;
> +    ssize_t index;
> +    off_t s_off;
> +    off_t d_off = 0;
> +    char *s;
> +
> +    if (start > qsb->used) {
> +        return 0;
> +    }
> +
> +    all_copy = qsb->used - start;
> +    if (all_copy > count) {
> +        all_copy = count;
> +    } else {
> +        count = all_copy;
> +    }
> +
> +    index = qsb_get_iovec(qsb, start, &s_off);
> +    if (index < 0) {
> +        return 0;
> +    }
> +
> +    while (all_copy > 0) {
> +        iov = &qsb->iov[index];
> +
> +        s = iov->iov_base;
> +
> +        to_copy = iov->iov_len - s_off;
> +        if (to_copy > all_copy) {
> +            to_copy = all_copy;
> +        }
> +        memcpy(&buffer[d_off], &s[s_off], to_copy);
> +
> +        d_off += to_copy;
> +        all_copy -= to_copy;
> +
> +        s_off = 0;
> +        index++;
> +    }
> +
> +    return count;
> +}
> +
> +/**
> + * Grow the QEMUSizedBuffer to the given size and allocate
> + * memory for it.
> + *
> + * @qsb: A QEMUSizedBuffer
> + * @new_size: The new size of the buffer
> + *
> + * Return:
> + *    a negative error code in case of memory allocation failure
> + * or
> + *    the new size of the buffer. The returned size may be greater or equal
> + *    to @new_size.
> + */
> +static ssize_t qsb_grow(QEMUSizedBuffer *qsb, size_t new_size)
> +{
> +    size_t needed_chunks, i;
> +
> +    if (qsb->size < new_size) {
> +        struct iovec *new_iov;
> +        size_t size_diff = new_size - qsb->size;
> +        size_t chunk_size = (size_diff > QSB_MAX_CHUNK_SIZE)
> +                             ? QSB_MAX_CHUNK_SIZE : QSB_CHUNK_SIZE;
> +
> +        needed_chunks = DIV_ROUND_UP(size_diff, chunk_size);
> +
> +        new_iov = g_try_malloc_n(qsb->n_iov + needed_chunks,
> +                                 sizeof(struct iovec));
> +        if (new_iov == NULL) {
> +            return -ENOMEM;
> +        }
> +
> +        /* Allocate new chunks as needed into new_iov */
> +        for (i = qsb->n_iov; i < qsb->n_iov + needed_chunks; i++) {
> +            new_iov[i].iov_base = g_try_malloc0(chunk_size);
> +            new_iov[i].iov_len = chunk_size;
> +            if (!new_iov[i].iov_base) {
> +                size_t j;
> +
> +                /* Free previously allocated new chunks */
> +                for (j = qsb->n_iov; j < i; j++) {
> +                    g_free(new_iov[j].iov_base);
> +                }
> +                g_free(new_iov);
> +
> +                return -ENOMEM;
> +            }
> +        }
> +
> +        /*
> +         * Now we can't get any allocation errors, copy over to new iov
> +         * and switch.
> +         */
> +        for (i = 0; i < qsb->n_iov; i++) {
> +            new_iov[i] = qsb->iov[i];
> +        }
> +
> +        qsb->n_iov += needed_chunks;
> +        g_free(qsb->iov);
> +        qsb->iov = new_iov;
> +        qsb->size += (needed_chunks * chunk_size);
> +    }
> +
> +    return qsb->size;
> +}
> +
> +/**
> + * Write into the QEMUSizedBuffer at a given position and a given
> + * number of bytes. This function will automatically grow the
> + * QEMUSizedBuffer.
> + *
> + * @qsb: A QEMUSizedBuffer
> + * @source: A byte array to copy data from
> + * @pos: The position within the @qsb to write data to
> + * @size: The number of bytes to copy into the @qsb
> + *
> + * Returns @size or a negative error code in case of memory allocation failure,
> + *           or with an invalid 'pos'
> + */
> +ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t *source,
> +                     off_t pos, size_t count)
> +{
> +    ssize_t rc = qsb_grow(qsb, pos + count);
> +    size_t to_copy;
> +    size_t all_copy = count;
> +    const struct iovec *iov;
> +    ssize_t index;
> +    char *dest;
> +    off_t d_off, s_off = 0;
> +
> +    if (rc < 0) {
> +        return rc;
> +    }
> +
> +    if (pos + count > qsb->used) {
> +        qsb->used = pos + count;
> +    }
> +
> +    index = qsb_get_iovec(qsb, pos, &d_off);
> +    if (index < 0) {
> +        return -EINVAL;
> +    }
> +
> +    while (all_copy > 0) {
> +        iov = &qsb->iov[index];
> +
> +        dest = iov->iov_base;
> +
> +        to_copy = iov->iov_len - d_off;
> +        if (to_copy > all_copy) {
> +            to_copy = all_copy;
> +        }
> +
> +        memcpy(&dest[d_off], &source[s_off], to_copy);
> +
> +        s_off += to_copy;
> +        all_copy -= to_copy;
> +
> +        d_off = 0;
> +        index++;
> +    }
> +
> +    return count;
> +}
> +
> +/**
> + * Create a deep copy of the given QEMUSizedBuffer.
> + *
> + * @qsb: A QEMUSizedBuffer
> + *
> + * Returns a clone of @qsb or NULL on allocation failure
> + */
> +QEMUSizedBuffer *qsb_clone(const QEMUSizedBuffer *qsb)
> +{
> +    QEMUSizedBuffer *out = qsb_create(NULL, qsb_get_length(qsb));
> +    size_t i;
> +    ssize_t res;
> +    off_t pos = 0;
> +
> +    if (!out) {
> +        return NULL;
> +    }
> +
> +    for (i = 0; i < qsb->n_iov; i++) {
> +        res =  qsb_write_at(out, qsb->iov[i].iov_base,
> +                            pos, qsb->iov[i].iov_len);
> +        if (res < 0) {
> +            qsb_free(out);
> +            return NULL;
> +        }
> +        pos += res;
> +    }
> +
> +    return out;
> +}
> +
> +typedef struct QEMUBuffer {
> +    QEMUSizedBuffer *qsb;
> +    QEMUFile *file;
> +} QEMUBuffer;
> +
> +static int buf_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
> +{
> +    QEMUBuffer *s = opaque;
> +    ssize_t len = qsb_get_length(s->qsb) - pos;
> +
> +    if (len <= 0) {
> +        return 0;
> +    }
> +
> +    if (len > size) {
> +        len = size;
> +    }
> +    return qsb_get_buffer(s->qsb, pos, len, buf);
> +}
> +
> +static int buf_put_buffer(void *opaque, const uint8_t *buf,
> +                          int64_t pos, int size)
> +{
> +    QEMUBuffer *s = opaque;
> +
> +    return qsb_write_at(s->qsb, buf, pos, size);
> +}
> +
> +static int buf_close(void *opaque)
> +{
> +    QEMUBuffer *s = opaque;
> +
> +    qsb_free(s->qsb);
> +
> +    g_free(s);
> +
> +    return 0;
> +}
> +
> +const QEMUSizedBuffer *qemu_buf_get(QEMUFile *f)
> +{
> +    QEMUBuffer *p;
> +
> +    qemu_fflush(f);
> +
> +    p = f->opaque;
> +
> +    return p->qsb;
> +}
> +
> +static const QEMUFileOps buf_read_ops = {
> +    .get_buffer = buf_get_buffer,
> +    .close =      buf_close,
> +};
> +
> +static const QEMUFileOps buf_write_ops = {
> +    .put_buffer = buf_put_buffer,
> +    .close =      buf_close,
> +};
> +
> +QEMUFile *qemu_bufopen(const char *mode, QEMUSizedBuffer *input)
> +{
> +    QEMUBuffer *s;
> +
> +    if (mode == NULL || (mode[0] != 'r' && mode[0] != 'w') ||
> +        mode[1] != '\0') {
> +        error_report("qemu_bufopen: Argument validity check failed");
> +        return NULL;
> +    }
> +
> +    s = g_malloc0(sizeof(QEMUBuffer));
> +    if (mode[0] == 'r') {
> +        s->qsb = input;
> +    }
> +
> +    if (s->qsb == NULL) {
> +        s->qsb = qsb_create(NULL, 0);
> +    }
> +    if (!s->qsb) {
> +        g_free(s);
> +        error_report("qemu_bufopen: qsb_create failed");
> +        return NULL;
> +    }
> +
> +
> +    if (mode[0] == 'r') {
> +        s->file = qemu_fopen_ops(s, &buf_read_ops);
> +    } else {
> +        s->file = qemu_fopen_ops(s, &buf_write_ops);
> +    }
> +    return s->file;
> +}
>
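The grow-then-copy loop in qsb_write_at() can be tried outside QEMU. The sketch below is a simplified illustration, not the QEMUSizedBuffer code: ChunkBuf, CHUNK_SIZE and the chunkbuf_* functions are hypothetical names. It also assumes that every chunk has the same size, so the starting chunk is found by division instead of by walking the iovec array as qsb_get_iovec() does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical chunk size; kept tiny so that writes visibly cross chunks. */
#define CHUNK_SIZE 16

/* Hypothetical stand-in for QEMUSizedBuffer: every chunk is CHUNK_SIZE bytes. */
typedef struct {
    struct iovec *iov;  /* array of chunks */
    size_t n_iov;       /* number of chunks */
    size_t size;        /* bytes allocated across all chunks */
    size_t used;        /* highest offset written so far */
} ChunkBuf;

/* Grow until at least new_size bytes are allocated; returns size or -1.
 * Chunks are appended one at a time, so a failure leaves b consistent. */
static ssize_t chunkbuf_grow(ChunkBuf *b, size_t new_size)
{
    if (new_size <= b->size) {
        return (ssize_t)b->size;
    }
    size_t needed = (new_size - b->size + CHUNK_SIZE - 1) / CHUNK_SIZE;
    struct iovec *iov = realloc(b->iov, (b->n_iov + needed) * sizeof(*iov));
    if (!iov) {
        return -1;
    }
    b->iov = iov;
    for (size_t i = 0; i < needed; i++) {
        void *chunk = calloc(1, CHUNK_SIZE);
        if (!chunk) {
            return -1;
        }
        b->iov[b->n_iov].iov_base = chunk;
        b->iov[b->n_iov].iov_len = CHUNK_SIZE;
        b->n_iov++;
        b->size += CHUNK_SIZE;
    }
    return (ssize_t)b->size;
}

/* Copy count bytes from src to offset pos, splitting the copy across chunks. */
static ssize_t chunkbuf_write_at(ChunkBuf *b, const uint8_t *src,
                                 size_t pos, size_t count)
{
    if (chunkbuf_grow(b, pos + count) < 0) {
        return -1;
    }
    size_t index = pos / CHUNK_SIZE;  /* chunk containing pos */
    size_t d_off = pos % CHUNK_SIZE;  /* offset inside that chunk */
    size_t s_off = 0;                 /* bytes of src copied so far */

    while (s_off < count) {
        assert(index < b->n_iov);
        size_t to_copy = CHUNK_SIZE - d_off;
        if (to_copy > count - s_off) {
            to_copy = count - s_off;
        }
        memcpy((uint8_t *)b->iov[index].iov_base + d_off, src + s_off, to_copy);
        s_off += to_copy;
        d_off = 0;  /* later chunks are filled from their start */
        index++;
    }
    if (pos + count > b->used) {
        b->used = pos + count;
    }
    return (ssize_t)count;
}

static void chunkbuf_free(ChunkBuf *b)
{
    for (size_t i = 0; i < b->n_iov; i++) {
        free(b->iov[i].iov_base);
    }
    free(b->iov);
    memset(b, 0, sizeof(*b));
}
```

Growing one chunk at a time keeps the buffer consistent if an allocation fails part-way; qsb_grow() takes the other approach of building a new iovec array and only switching to it once every allocation has succeeded.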

^ permalink raw reply	[flat|nested] 204+ messages in thread

* Re: [Qemu-devel] [PATCH v4 23/47] migrate_init: Call from savevm
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 23/47] migrate_init: Call from savevm Dr. David Alan Gilbert (git)
@ 2014-10-08  2:28   ` zhanghailiang
  2014-11-04  1:29   ` David Gibson
  1 sibling, 0 replies; 204+ messages in thread
From: zhanghailiang @ 2014-10-08  2:28 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

On 2014/10/4 1:47, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Suspend to file is very much like a migrate, and it makes life
> easier if we have the Migration state available, so initialise it
> in the savevm.c code for suspending.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>   include/migration/migration.h | 1 +
>   include/qemu/typedefs.h       | 1 +
>   migration.c                   | 2 +-
>   savevm.c                      | 2 ++
>   4 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index 2c078c4..3aeae47 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -140,6 +140,7 @@ int migrate_fd_close(MigrationState *s);
>
>   void add_migration_state_change_notifier(Notifier *notify);
>   void remove_migration_state_change_notifier(Notifier *notify);
> +MigrationState *migrate_init(const MigrationParams *params);
>   bool migration_in_setup(MigrationState *);
>   bool migration_has_finished(MigrationState *);
>   bool migration_has_failed(MigrationState *);
> diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
> index 0f79b5c..8539de6 100644
> --- a/include/qemu/typedefs.h
> +++ b/include/qemu/typedefs.h
> @@ -16,6 +16,7 @@ struct Monitor;
>   typedef struct Monitor Monitor;
>   typedef struct MigrationIncomingState MigrationIncomingState;
>   typedef struct MigrationParams MigrationParams;
> +typedef struct MigrationState MigrationState;
>

Er, another redefinition: when compiling, gcc complains that there is
a redefinition of typedef ‘MigrationState’ in
'include/migration/migration.h:59'. Is this a problem?

>   typedef struct Property Property;
>   typedef struct PropertyInfo PropertyInfo;
> diff --git a/migration.c b/migration.c
> index 527423e..3a45b2a 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -488,7 +488,7 @@ bool migration_has_failed(MigrationState *s)
>               s->state == MIG_STATE_ERROR);
>   }
>
> -static MigrationState *migrate_init(const MigrationParams *params)
> +MigrationState *migrate_init(const MigrationParams *params)
>   {
>       MigrationState *s = migrate_get_current();
>       int64_t bandwidth_limit = s->bandwidth_limit;
> diff --git a/savevm.c b/savevm.c
> index bffe890..a368a25 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -949,6 +949,8 @@ static int qemu_savevm_state(QEMUFile *f)
>           .blk = 0,
>           .shared = 0
>       };
> +    MigrationState *ms = migrate_init(&params);
> +    ms->file = f;
>
>       if (qemu_savevm_state_blocked(NULL)) {
>           return -EINVAL;
>

^ permalink raw reply	[flat|nested] 204+ messages in thread

* Re: [Qemu-devel] [PATCH v4 36/47] Page request: Process incoming page request
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 36/47] Page request: Process incoming page request Dr. David Alan Gilbert (git)
@ 2014-10-08  2:31   ` zhanghailiang
  2014-10-08  7:49     ` Dr. David Alan Gilbert
  2014-11-10  6:31   ` David Gibson
  1 sibling, 1 reply; 204+ messages in thread
From: zhanghailiang @ 2014-10-08  2:31 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

On 2014/10/4 1:47, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> On receiving MIG_RPCOMM_REQPAGES look up the address and
> queue the page.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>   arch_init.c                   | 52 +++++++++++++++++++++++++++++++++++++++++++
>   include/migration/migration.h | 21 +++++++++++++++++
>   include/qemu/typedefs.h       |  3 ++-
>   migration.c                   | 34 +++++++++++++++++++++++++++-
>   4 files changed, 108 insertions(+), 2 deletions(-)
>
> diff --git a/arch_init.c b/arch_init.c
> index 4a03171..72f9e17 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -660,6 +660,58 @@ static int ram_save_page(QEMUFile *f, RAMBlock* block, ram_addr_t offset,
>   }
>
>   /*
> + * Queue the pages for transmission, e.g. a request from postcopy destination
> + *   ms: MigrationState in which the queue is held
> + *   rbname: The RAMBlock the request is for - may be NULL (to mean reuse last)
> + *   start: Offset from the start of the RAMBlock
> + *   len: Length (in bytes) to send
> + *   Return: 0 on success
> + */
> +int ram_save_queue_pages(MigrationState *ms, const char *rbname,
> +                         ram_addr_t start, ram_addr_t len)
> +{
> +    RAMBlock *ramblock;
> +
> +    if (!rbname) {
> +        /* Reuse last RAMBlock */
> +        ramblock = ms->last_req_rb;
> +
> +        if (!ramblock) {
> +            error_report("ram_save_queue_pages no previous block");
> +            return -1;
> +        }
> +    } else {
> +        ramblock = ram_find_block(rbname);
> +
> +        if (!ramblock) {
> +            error_report("ram_save_queue_pages no block '%s'", rbname);
> +            return -1;
> +        }
> +    }
> +    DPRINTF("ram_save_queue_pages: Block %s start %zx len %zx",
> +                    ramblock->idstr, start, len);
> +
> +    if (start+len > ramblock->length) {
> +        error_report("%s request overrun start=%zx len=%zx blocklen=%zx",
> +                     __func__, start, len, ramblock->length);
> +        return -1;
> +    }
> +
> +    struct MigrationSrcPageRequest *new_entry =
> +        g_malloc0(sizeof(struct MigrationSrcPageRequest));
> +    new_entry->rb = ramblock;
> +    new_entry->offset = start;
> +    new_entry->len = len;
> +    ms->last_req_rb = ramblock;
> +
> +    qemu_mutex_lock(&ms->src_page_req_mutex);
> +    QSIMPLEQ_INSERT_TAIL(&ms->src_page_requests, new_entry, next_req);
> +    qemu_mutex_unlock(&ms->src_page_req_mutex);
> +
> +    return 0;
> +}
> +
> +/*
>    * ram_find_and_save_block: Finds a page to send and sends it to f
>    *
>    * Returns:  The number of bytes written.
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index 5e0d30d..5bc01d5 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -102,6 +102,18 @@ MigrationIncomingState *migration_incoming_get_current(void);
>   MigrationIncomingState *migration_incoming_state_init(QEMUFile *f);
>   void migration_incoming_state_destroy(void);
>
> +/*
> + * An outstanding page request, on the source, having been received
> + * and queued
> + */
> +struct MigrationSrcPageRequest {
> +    RAMBlock *rb;
> +    hwaddr    offset;
> +    hwaddr    len;
> +
> +    QSIMPLEQ_ENTRY(MigrationSrcPageRequest) next_req;
> +};
> +
>   struct MigrationState
>   {
>       int64_t bandwidth_limit;
> @@ -138,6 +150,12 @@ struct MigrationState
>        * of the postcopy phase
>        */
>       unsigned long *sentmap;
> +
> +    /* Queue of outstanding page requests from the destination */
> +    QemuMutex src_page_req_mutex;
> +    QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest) src_page_requests;
> +    /* The RAMBlock used in the last src_page_request */
> +    RAMBlock *last_req_rb;
>   };
>
>   void process_incoming_migration(QEMUFile *f);
> @@ -273,4 +291,7 @@ size_t ram_control_save_page(QEMUFile *f, ram_addr_t block_offset,
>                                ram_addr_t offset, size_t size,
>                                int *bytes_sent);
>
> +int ram_save_queue_pages(MigrationState *ms, const char *rbname,
> +                         ram_addr_t start, ram_addr_t len);
> +
>   #endif
> diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
> index 79f57c0..24c2207 100644
> --- a/include/qemu/typedefs.h
> +++ b/include/qemu/typedefs.h
> @@ -8,6 +8,7 @@ typedef struct QEMUTimerListGroup QEMUTimerListGroup;
>   typedef struct QEMUFile QEMUFile;
>   typedef struct QEMUBH QEMUBH;
>
> +typedef struct AdapterInfo AdapterInfo;
>   typedef struct AioContext AioContext;
>
>   typedef struct Visitor Visitor;
> @@ -80,6 +81,6 @@ typedef struct FWCfgState FWCfgState;
>   typedef struct PcGuestInfo PcGuestInfo;
>   typedef struct PostcopyPMI PostcopyPMI;
>   typedef struct Range Range;
> -typedef struct AdapterInfo AdapterInfo;
> +typedef struct RAMBlock RAMBlock;
>

:( Another redefinition: 'RAMBlock' is also defined in 'include/exec/cpu-all.h:314'.
Am I missing something when compiling QEMU?

>   #endif /* QEMU_TYPEDEFS_H */
> diff --git a/migration.c b/migration.c
> index cfdaa52..63d7699 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -26,6 +26,8 @@
>   #include "qemu/thread.h"
>   #include "qmp-commands.h"
>   #include "trace.h"
> +#include "exec/memory.h"
> +#include "exec/address-spaces.h"
>
>   //#define DEBUG_MIGRATION
>
> @@ -504,6 +506,15 @@ static void migrate_fd_cleanup(void *opaque)
>
>       migrate_fd_cleanup_src_rp(s);
>
> +    /* This queue generally should be empty - but in the case of a failed
> +     * migration might have some droppings in.
> +     */
> +    struct MigrationSrcPageRequest *mspr, *next_mspr;
> +    QSIMPLEQ_FOREACH_SAFE(mspr, &s->src_page_requests, next_req, next_mspr) {
> +        QSIMPLEQ_REMOVE_HEAD(&s->src_page_requests, next_req);
> +        g_free(mspr);
> +    }
> +
>       if (s->file) {
>           trace_migrate_fd_cleanup();
>           qemu_mutex_unlock_iothread();
> @@ -610,6 +621,9 @@ MigrationState *migrate_init(const MigrationParams *params)
>       s->state = MIG_STATE_SETUP;
>       trace_migrate_set_state(MIG_STATE_SETUP);
>
> +    qemu_mutex_init(&s->src_page_req_mutex);
> +    QSIMPLEQ_INIT(&s->src_page_requests);
> +
>       s->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>       return s;
>   }
> @@ -823,7 +837,25 @@ static void source_return_path_bad(MigrationState *s)
>   static void migrate_handle_rp_reqpages(MigrationState *ms, const char* rbname,
>                                          ram_addr_t start, ram_addr_t len)
>   {
> -    DPRINTF("migrate_handle_rp_reqpages: at %zx for len %zx", start, len);
> +    DPRINTF("migrate_handle_rp_reqpages: in %s start %zx len %zx",
> +            rbname, start, len);
> +
> +    /* Round start down, and the end up, to our host page size */
> +    long our_host_ps = sysconf(_SC_PAGESIZE);
> +    if (start & (our_host_ps-1)) {
> +        long roundings = start & (our_host_ps-1);
> +        start -= roundings;
> +        len += roundings;
> +    }
> +    if (len & (our_host_ps-1)) {
> +        long roundings = len & (our_host_ps-1);
> +        len -= roundings;
> +        len += our_host_ps;
> +    }
> +
> +    if (ram_save_queue_pages(ms, rbname, start, len)) {
> +        source_return_path_bad(ms);
> +    }
>   }
>
>   /*
>
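On the source side, this patch rounds each incoming request to the host page size, checks it against the RAMBlock, and appends it to a mutex-protected queue that migrate_fd_cleanup() empties. The sketch below is a hedged, standalone model of that flow, not QEMU code: Block, PageReq, ReqQueue and the functions are hypothetical, a pthread mutex stands in for QemuMutex, a hand-written list stands in for QSIMPLEQ, and the page size is a parameter rather than sysconf(_SC_PAGESIZE).

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a RAMBlock: a name and a length in bytes. */
typedef struct {
    const char *idstr;
    uint64_t length;
} Block;

/* One queued request, modelled on MigrationSrcPageRequest. */
typedef struct PageReq {
    const Block *rb;
    uint64_t offset, len;
    struct PageReq *next;
} PageReq;

typedef struct {
    pthread_mutex_t lock;   /* protects head and tail */
    PageReq *head, *tail;   /* FIFO of outstanding requests */
    const Block *last_rb;   /* block of the last accepted request */
    const Block *blocks;    /* table of known blocks */
    size_t n_blocks;
} ReqQueue;

static void queue_init(ReqQueue *q, const Block *blocks, size_t n_blocks)
{
    memset(q, 0, sizeof(*q));
    pthread_mutex_init(&q->lock, NULL);
    q->blocks = blocks;
    q->n_blocks = n_blocks;
}

static const Block *find_block(const ReqQueue *q, const char *name)
{
    for (size_t i = 0; i < q->n_blocks; i++) {
        if (strcmp(q->blocks[i].idstr, name) == 0) {
            return &q->blocks[i];
        }
    }
    return NULL;
}

/* Round start down and start+len up to page_size (a power of two), validate
 * against the block, then append. A NULL rbname reuses the previous block. */
static int handle_request(ReqQueue *q, const char *rbname, uint64_t start,
                          uint64_t len, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    uint64_t mask = page_size - 1;
    uint64_t end = (start + len + mask) & ~mask;
    start &= ~mask;
    len = end - start;

    const Block *rb = rbname ? find_block(q, rbname) : q->last_rb;
    if (!rb || start > rb->length || len > rb->length - start) {
        return -1;  /* unknown block, or the request overruns it */
    }
    PageReq *r = malloc(sizeof(*r));
    if (!r) {
        return -1;
    }
    *r = (PageReq){ .rb = rb, .offset = start, .len = len, .next = NULL };
    q->last_rb = rb;  /* assumed: only the request-receiving thread uses this */

    pthread_mutex_lock(&q->lock);
    if (q->tail) {
        q->tail->next = r;
    } else {
        q->head = r;
    }
    q->tail = r;
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Pop the oldest request (the caller frees it), or return NULL. */
static PageReq *dequeue(ReqQueue *q)
{
    pthread_mutex_lock(&q->lock);
    PageReq *r = q->head;
    if (r && (q->head = r->next) == NULL) {
        q->tail = NULL;
    }
    pthread_mutex_unlock(&q->lock);
    return r;
}

/* Free leftover requests, e.g. after a failed migration. */
static void drain(ReqQueue *q)
{
    for (PageReq *r = dequeue(q); r != NULL; r = dequeue(q)) {
        free(r);
    }
}
```

Rounding the end up in a single step gives the same result as the patch's two-step adjustment: with 4 KiB pages, start 0x1234 and len 0x100 become start 0x1000 and len 0x1000. The bounds check is written as `len > length - start` so that `start + len` cannot overflow.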

^ permalink raw reply	[flat|nested] 204+ messages in thread

* Re: [Qemu-devel] [PATCH v4 36/47] Page request: Process incoming page request
  2014-10-08  2:31   ` zhanghailiang
@ 2014-10-08  7:49     ` Dr. David Alan Gilbert
  2014-10-08  8:07       ` Paolo Bonzini
  2014-10-08  8:10       ` zhanghailiang
  0 siblings, 2 replies; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-10-08  7:49 UTC (permalink / raw)
  To: zhanghailiang
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:

> >  typedef struct Visitor Visitor;
> >@@ -80,6 +81,6 @@ typedef struct FWCfgState FWCfgState;
> >  typedef struct PcGuestInfo PcGuestInfo;
> >  typedef struct PostcopyPMI PostcopyPMI;
> >  typedef struct Range Range;
> >-typedef struct AdapterInfo AdapterInfo;
> >+typedef struct RAMBlock RAMBlock;
> >
> 
> :( Another redefinition: 'RAMBlock' is also defined in 'include/exec/cpu-all.h:314'.
> Am I missing something when compiling QEMU?

Interesting; I'm not seeing that problem at all (gcc 4.8.3-7)

What compiler and flags are you using?

Dave


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 204+ messages in thread

* Re: [Qemu-devel] [PATCH v4 36/47] Page request: Process incoming page request
  2014-10-08  7:49     ` Dr. David Alan Gilbert
@ 2014-10-08  8:07       ` Paolo Bonzini
  2014-10-08  8:10       ` zhanghailiang
  1 sibling, 0 replies; 204+ messages in thread
From: Paolo Bonzini @ 2014-10-08  8:07 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, zhanghailiang
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

Il 08/10/2014 09:49, Dr. David Alan Gilbert ha scritto:
>> > :( Another redefinition: 'RAMBlock' is also defined in 'include/exec/cpu-all.h:314'.
>> > Am I missing something when compiling QEMU?
> Interesting; I'm not seeing that problem at all (gcc 4.8.3-7)
> 
> What compiler and flags are you using?

I think it is visible with CentOS 6.

Paolo
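Both reports come down to the same C rule: C99 allows a typedef name to be declared only once in a scope, while C11 permits repeating a typedef that names the same type, so a newer gcc may accept a duplicate that gcc 4.3.4 rejects. A minimal sketch of a layout that avoids the problem, assuming the typedef stays in include/qemu/typedefs.h and the defining header switches to a bare struct tag; the two "headers" are collapsed into one file here, and the fields and ram_block_length() are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* "typedefs.h" part: the one place where the typedef name is declared. */
typedef struct RAMBlock RAMBlock;

/* "cpu-all.h" part: the struct is defined using its tag only.
 * Writing "typedef struct RAMBlock { ... } RAMBlock;" here as well would
 * declare the typedef a second time, which a C99 compiler rejects. */
struct RAMBlock {
    const char *idstr;  /* hypothetical fields for illustration */
    size_t length;
};

/* Code that only includes "typedefs.h" can still pass RAMBlock pointers. */
static size_t ram_block_length(const RAMBlock *rb)
{
    assert(rb != NULL);
    return rb->length;
}
```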

^ permalink raw reply	[flat|nested] 204+ messages in thread

* Re: [Qemu-devel] [PATCH v4 36/47] Page request: Process incoming page request
  2014-10-08  7:49     ` Dr. David Alan Gilbert
  2014-10-08  8:07       ` Paolo Bonzini
@ 2014-10-08  8:10       ` zhanghailiang
  2014-10-08  8:18         ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 204+ messages in thread
From: zhanghailiang @ 2014-10-08  8:10 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On 2014/10/8 15:49, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>
>>>   typedef struct Visitor Visitor;
>>> @@ -80,6 +81,6 @@ typedef struct FWCfgState FWCfgState;
>>>   typedef struct PcGuestInfo PcGuestInfo;
>>>   typedef struct PostcopyPMI PostcopyPMI;
>>>   typedef struct Range Range;
>>> -typedef struct AdapterInfo AdapterInfo;
>>> +typedef struct RAMBlock RAMBlock;
>>>
>>
>> :( Another redefinition: 'RAMBlock' is also defined in 'include/exec/cpu-all.h:314'.
>> Am I missing something when compiling QEMU?
>
> Interesting; I'm not seeing that problem at all (gcc 4.8.3-7)
>
> What compiler and flags are you using?
>
> Dave
>

Hi Dave,

My compiler info:
gcc (SUSE Linux) 4.3.4

The configure info is:
#./configure --target-list=x86_64-softmmu --enable-debug --disable-gtk
...
CFLAGS            -pthread -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include   -g
QEMU_CFLAGS       -fPIE -DPIE -m64 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common  -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all    -I/usr/include/libpng12   -I/usr/include/pixman-1
LDFLAGS           -Wl,--warn-common -Wl,-z,relro -Wl,-z,now -pie -m64 -g
...

Maybe it's a gcc limitation, but why is this redefinition needed? After I remove one
of them, it compiles successfully ;)

Thanks,
zhanghailiang


^ permalink raw reply	[flat|nested] 204+ messages in thread

* Re: [Qemu-devel] [PATCH v4 36/47] Page request: Process incoming page request
  2014-10-08  8:10       ` zhanghailiang
@ 2014-10-08  8:18         ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-10-08  8:18 UTC (permalink / raw)
  To: zhanghailiang
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> On 2014/10/8 15:49, Dr. David Alan Gilbert wrote:
> >* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> >
> >>>  typedef struct Visitor Visitor;
> >>>@@ -80,6 +81,6 @@ typedef struct FWCfgState FWCfgState;
> >>>  typedef struct PcGuestInfo PcGuestInfo;
> >>>  typedef struct PostcopyPMI PostcopyPMI;
> >>>  typedef struct Range Range;
> >>>-typedef struct AdapterInfo AdapterInfo;
> >>>+typedef struct RAMBlock RAMBlock;
> >>>
> >>
> >>:( Another redefinition: 'RAMBlock' is also defined in 'include/exec/cpu-all.h:314'.
> >>Am I missing something when compiling QEMU?
> >
> >Interesting; I'm not seeing that problem at all (gcc 4.8.3-7)
> >
> >What compiler and flags are you using?
> >
> >Dave
> >
> 
> Hi Dave,
> 
> My compiler info:
> gcc (SUSE Linux) 4.3.4
> 
> The configure info is:
> #./configure --target-list=x86_64-softmmu --enable-debug --disable-gtk
> ...
> CFLAGS            -pthread -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include   -g
> QEMU_CFLAGS       -fPIE -DPIE -m64 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common  -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all    -I/usr/include/libpng12   -I/usr/include/pixman-1
> LDFLAGS           -Wl,--warn-common -Wl,-z,relro -Wl,-z,now -pie -m64 -g
> ...
> 
> Maybe it's a gcc limitation, but why is this redefinition needed? After I remove one
> of them, it compiles successfully ;)

OK, thanks.  I'll clean them up.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 204+ messages in thread

* Re: [Qemu-devel] [PATCH v4 00/47] Postcopy implementation
  2014-10-07  8:12     ` Dr. David Alan Gilbert
@ 2014-10-08  8:36       ` Cristian Klein
  0 siblings, 0 replies; 204+ messages in thread
From: Cristian Klein @ 2014-10-08  8:36 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Andrea Arcangeli, yamahata, lilei, quintela, qemu-devel,
	amit.shah, yanghy

On 07 Oct 2014, at 17:12 , Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:

> * Cristian Klein (cristian.klein@cs.umu.se) wrote:
>> On 04 Oct 2014, at 4:21 , Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
>> 
>>> 
>>> I've updated our github at:
>>> https://github.com/orbitfp7/qemu/tree/wp3-postcopy
>>> 
>>> to have this version.
>>> 
>>> and it corresponds to the tag:
>>> https://github.com/orbitfp7/qemu/releases/tag/wp3-postcopy-v4
>> 
>> Hi Dave,
>> 
>> I just tested this version of post-copy using the libvirt patches I recently posted and it works a lot better. The video streaming VM migrates with a downtime of less than 1 second. Before post-copy finishes, the VM is a bit slow but otherwise running well.
>> 
>> I also tested the patches with a VM doing "ping" and the downtime was around 0.6 seconds. I suspect that this delay could be caused by libvirt and not by qemu. Note that libvirt is a bit special, in the sense that the VM is migrated in suspended state and resumed only after the network was set up on the destination. I will investigate and let you know.
> 
> That's great news - although I'm not quite sure what caused the improvement, there
> were quite a few minor bug fixes and things but nothing that I can think of that
> would directly contribute (except the patches I'd sent you which you'd already tried).

Unfortunately, I made an error in my experiments (post-copy started too late). I re-ran the experiments a few times: a ping VM observes a downtime of about 2 seconds, whereas a video-streaming VM observes about 4 seconds.

Cristian

>> 
>>> * Dr. David Alan Gilbert (git) (dgilbert@redhat.com) wrote:
>>>> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>>> 
>>>> Hi,
>>>> This is the 4th cut of my version of postcopy; it is designed for use with
>>>> the Linux kernel additions just posted by Andrea Arcangeli here:
>>>> 
>>>> http://marc.info/?l=linux-kernel&m=141235633015100&w=2
>>>> 
>>>> (Note: This is a new version compared to my previous postcopy patchset; you'll
>>>> need to update the kernel to the new version.)
>>>> 
>>>> Other than the new kernel ABI (which is only a small change to the userspace side),
>>>> the major changes are:
>>>> 
>>>> a) Code for host page size != target page size
>>>> b) Support for migration over fd 
>>>>    From Cristian Klein; this is for libvirt support which Cristian recently
>>>>    posted to the libvirt list.
>>>> c) It's now build bisectable and builds on 32bit
>>>> 
>>>> Testing-wise, I've now done many thousands of postcopy migrations without
>>>> failure (both of idle and busy guests); so it seems pretty solid.
>>>> 
>>>> Must-TODO's:
>>>> 1) A partially repeatable migration_cancel failure
>>>> 2) virt_test's migrate.with_reboot test is failing
>>>> 3) The ACPI fix in 2.1 that allowed migrating RAMBlocks to be larger than
>>>>   the source feels like it needs looking at for postcopy.
>>>> 4) Paolo's comments with respect to the wakeup_request/is_running code
>>>>    in the migration thread
>>>> 5) xbzrle needs disabling once in postcopy
>>>> 
>>>> Later-TODO's:
>>>> 1) Control the rate of background page transfers during postcopy to
>>>>    reduce their impact on the latency of postcopy requests.
>>>> 2) Work with RDMA
>>>> 3) Could destination RP be made blocking (as per discussion with Paolo;
>>>>    I'm still worried that that changes too many assumptions)
>>>> 
>>>> 
>>>> 
>>>> V4:
>>>> Initial support for host page size != target page size
>>>>   - tested heavily on hps==tps
>>>>   - only partially tested on hps!=tps systems
>>>>   - This involved quite a bit of rework around the discard code
>>>> Updated to new kernel userfault ABI
>>>>   - It won't work with the previous version
>>>> Fix mis-optimisation of postcopy request for wrong RAMBlock
>>>>    request for block A offset n
>>>>    un-needed fault for block B/m (already received - no req sent)
>>>>    request for block B/l  - wrongly sent as request for A/l
>>>> Fix thinko in discard bitmap processing (missed last word of bitmap)
>>>>    Symptom: remap failures near the top of RAM if postcopy started late
>>>> Fix bug that caused kernel page acknowledgments to be misaligned
>>>>    May have meant the guest was paused for longer than required
>>>> Fix potential for crashing cleaning up failed RP
>>>> Fixes in docs (from Yang)
>>>> Handle migration by fd as sockets if they are sockets
>>>> Build tested on 32bit
>>>> Fully build bisectable (x86-64)
>>>> 
>>>> 
>>>> Dave
>>>> 
>>>> Cristian Klein (1):
>>>> Handle bi-directional communication for fd migration
>>>> 
>>>> Dr. David Alan Gilbert (46):
>>>> QEMUSizedBuffer based QEMUFile
>>>> Tests: QEMUSizedBuffer/QEMUBuffer
>>>> Start documenting how postcopy works.
>>>> qemu_ram_foreach_block: pass up error value, and down the ramblock
>>>>   name
>>>> improve DPRINTF macros, add to savevm
>>>> Add qemu_get_counted_string to read a string prefixed by a count byte
>>>> Create MigrationIncomingState
>>>> socket shutdown
>>>> Provide runtime Target page information
>>>> Return path: Open a return path on QEMUFile for sockets
>>>> Return path: socket_writev_buffer: Block even on non-blocking fd's
>>>> Migration commands
>>>> Return path: Control commands
>>>> Return path: Send responses from destination to source
>>>> Return path: Source handling of return path
>>>> qemu_loadvm errors and debug
>>>> ram_debug_dump_bitmap: Dump a migration bitmap as text
>>>> Rework loadvm path for subloops
>>>> Add migration-capability boolean for postcopy-ram.
>>>> Add wrappers and handlers for sending/receiving the postcopy-ram
>>>>   migration messages.
>>>> QEMU_VM_CMD_PACKAGED: Send a packaged chunk of migration stream
>>>> migrate_init: Call from savevm
>>>> Allow savevm handlers to state whether they could go into postcopy
>>>> postcopy: OS support test
>>>> migrate_start_postcopy: Command to trigger transition to postcopy
>>>> MIG_STATE_POSTCOPY_ACTIVE: Add new migration state
>>>> qemu_savevm_state_complete: Postcopy changes
>>>> Postcopy page-map-incoming (PMI) structure
>>>> Postcopy: Maintain sentmap and calculate discard
>>>> postcopy: Incoming initialisation
>>>> postcopy: ram_enable_notify to switch on userfault
>>>> Postcopy: Postcopy startup in migration thread
>>>> Postcopy: Create a fault handler thread before marking the ram as
>>>>   userfault
>>>> Page request:  Add MIG_RPCOMM_REQPAGES reverse command
>>>> Page request: Process incoming page request
>>>> Page request: Consume pages off the post-copy queue
>>>> Add assertion to check migration_dirty_pages
>>>> postcopy_ram.c: place_page and helpers
>>>> Postcopy: Use helpers to map pages during migration
>>>> qemu_ram_block_from_host
>>>> Don't sync dirty bitmaps in postcopy
>>>> Host page!=target page: Cleanup bitmaps
>>>> Postcopy; Handle userfault requests
>>>> Start up a postcopy/listener thread ready for incoming page data
>>>> postcopy: Wire up loadvm_postcopy_ram_handle_{run,end} commands
>>>> End of migration for postcopy
>>>> 
>>>> Makefile.objs                    |    2 +-
>>>> arch_init.c                      |  739 +++++++++++++++++++++++++--
>>>> docs/migration.txt               |  189 +++++++
>>>> exec.c                           |   76 ++-
>>>> hmp-commands.hx                  |   15 +
>>>> hmp.c                            |    7 +
>>>> hmp.h                            |    1 +
>>>> include/exec/cpu-common.h        |    8 +-
>>>> include/migration/migration.h    |  130 +++++
>>>> include/migration/postcopy-ram.h |  106 ++++
>>>> include/migration/qemu-file.h    |   47 ++
>>>> include/migration/vmstate.h      |    2 +-
>>>> include/qemu/sockets.h           |    1 +
>>>> include/qemu/typedefs.h          |    9 +-
>>>> include/sysemu/sysemu.h          |   43 +-
>>>> migration-fd.c                   |   24 +-
>>>> migration-rdma.c                 |    4 +-
>>>> migration.c                      |  693 +++++++++++++++++++++++++-
>>>> postcopy-ram.c                   | 1016 ++++++++++++++++++++++++++++++++++++++
>>>> qapi-schema.json                 |   14 +-
>>>> qemu-file.c                      |  598 +++++++++++++++++++++-
>>>> qmp-commands.hx                  |   19 +
>>>> savevm.c                         |  881 +++++++++++++++++++++++++++++++--
>>>> tests/Makefile                   |    2 +-
>>>> tests/test-vmstate.c             |   74 +--
>>>> util/qemu-sockets.c              |   28 ++
>>>> 26 files changed, 4550 insertions(+), 178 deletions(-)
>>>> create mode 100644 include/migration/postcopy-ram.h
>>>> create mode 100644 postcopy-ram.c
>>>> 
>>>> -- 
>>>> 1.9.3
>>>> 
>>>> 
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>> 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 204+ messages in thread

* Re: [Qemu-devel] [PATCH v4 16/47] Return path: Source handling of return path
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 16/47] Return path: Source handling of return path Dr. David Alan Gilbert (git)
  2014-10-04 18:14   ` Paolo Bonzini
@ 2014-10-16  8:26   ` zhanghailiang
  2014-10-16  8:35     ` Dr. David Alan Gilbert
  2014-11-03  3:47     ` David Gibson
  2014-11-03  3:46   ` David Gibson
  2 siblings, 2 replies; 204+ messages in thread
From: zhanghailiang @ 2014-10-16  8:26 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

On 2014/10/4 1:47, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Open a return path, and handle messages that are received upon it.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>   include/migration/migration.h |  10 +++
>   migration.c                   | 181 +++++++++++++++++++++++++++++++++++++++++-
>   2 files changed, 190 insertions(+), 1 deletion(-)
>
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index 12e640d..b87c289 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -47,6 +47,14 @@ enum mig_rpcomm_cmd {
>       MIG_RPCOMM_ACK,          /* data (seq: be32 ) */
>       MIG_RPCOMM_AFTERLASTVALID
>   };
> +
> +/* Source side RP state */
> +struct MigrationRetPathState {
> +    uint32_t      latest_ack;
> +    QemuThread    rp_thread;
> +    bool          error;
> +};
> +
>   typedef struct MigrationState MigrationState;
>
>   /* State for the incoming migration */
> @@ -69,9 +77,11 @@ struct MigrationState
>       QemuThread thread;
>       QEMUBH *cleanup_bh;
>       QEMUFile *file;
> +    QEMUFile *return_path;
>
>       int state;
>       MigrationParams params;
> +    struct MigrationRetPathState rp_state;
>       double mbps;
>       int64_t total_time;
>       int64_t downtime;
> diff --git a/migration.c b/migration.c
> index 5ba8f3e..ee6db1d 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -246,6 +246,23 @@ MigrationCapabilityStatusList *qmp_query_migrate_capabilities(Error **errp)
>       return head;
>   }
>
> +/*
> + * Return true if we're already in the middle of a migration
> + * (i.e. any of the active or setup states)
> + */
> +static bool migration_already_active(MigrationState *ms)
> +{
> +    switch (ms->state) {
> +    case MIG_STATE_ACTIVE:
> +    case MIG_STATE_SETUP:
> +        return true;
> +
> +    default:
> +        return false;
> +
> +    }
> +}
> +
>   static void get_xbzrle_cache_stats(MigrationInfo *info)
>   {
>       if (migrate_use_xbzrle()) {
> @@ -371,6 +388,21 @@ static void migrate_set_state(MigrationState *s, int old_state, int new_state)
>       }
>   }
>
> +static void migrate_fd_cleanup_src_rp(MigrationState *ms)
> +{
> +    QEMUFile *rp = ms->return_path;
> +
> +    /*
> +     * When stuff goes wrong (e.g. failing destination) on the rp, it can get
> +     * cleaned up from a few threads; make sure not to do it twice in parallel
> +     */
> +    rp = atomic_cmpxchg(&ms->return_path, rp, NULL);
> +    if (rp) {
> +        DPRINTF("cleaning up return path\n");
> +        qemu_fclose(rp);
> +    }
> +}
> +
>   static void migrate_fd_cleanup(void *opaque)
>   {
>       MigrationState *s = opaque;
> @@ -378,6 +410,8 @@ static void migrate_fd_cleanup(void *opaque)
>       qemu_bh_delete(s->cleanup_bh);
>       s->cleanup_bh = NULL;
>
> +    migrate_fd_cleanup_src_rp(s);
> +
>       if (s->file) {
>           trace_migrate_fd_cleanup();
>           qemu_mutex_unlock_iothread();
> @@ -414,6 +448,11 @@ static void migrate_fd_cancel(MigrationState *s)
>       int old_state ;
>       trace_migrate_fd_cancel();
>
> +    if (s->return_path) {
> +        /* shutdown the rp socket, so causing the rp thread to shutdown */
> +        qemu_file_shutdown(s->return_path);
> +    }
> +
>       do {
>           old_state = s->state;
>           if (old_state != MIG_STATE_SETUP && old_state != MIG_STATE_ACTIVE) {
> @@ -655,8 +694,148 @@ int64_t migrate_xbzrle_cache_size(void)
>       return s->xbzrle_cache_size;
>   }
>
> -/* migration thread support */
> +/*
> + * Something bad happened to the RP stream, mark an error
> + * The caller shall print something to indicate why
> + */
> +static void source_return_path_bad(MigrationState *s)
> +{
> +    s->rp_state.error = true;
> +    migrate_fd_cleanup_src_rp(s);
> +}
>
> +/*
> + * Handles messages sent on the return path towards the source VM
> + *
> + */
> +static void *source_return_path_thread(void *opaque)
> +{
> +    MigrationState *ms = opaque;
> +    QEMUFile *rp = ms->return_path;
> +    uint16_t expected_len, header_len, header_com;
> +    const int max_len = 512;
> +    uint8_t buf[max_len];
> +    uint32_t tmp32;
> +    int res;
> +
> +    DPRINTF("RP: %s entry", __func__);
> +    while (rp && !qemu_file_get_error(rp) &&
> +        migration_already_active(ms)) {
> +        DPRINTF("RP: %s top of loop", __func__);
> +        header_com = qemu_get_be16(rp);
> +        header_len = qemu_get_be16(rp);
> +
> +        switch (header_com) {
> +        case MIG_RPCOMM_SHUT:
> +        case MIG_RPCOMM_ACK:
> +            expected_len = 4;
> +            break;
> +
> +        default:
> +            error_report("RP: Received invalid cmd 0x%04x length 0x%04x",
> +                    header_com, header_len);
> +            source_return_path_bad(ms);
> +            goto out;
> +        }
> +
> +        if (header_len > expected_len) {
> +            error_report("RP: Received command 0x%04x with "
> +                    "incorrect length %d expecting %d",
> +                    header_com, header_len,
> +                    expected_len);
> +            source_return_path_bad(ms);
> +            goto out;
> +        }
> +
> +        /* We know we've got a valid header by this point */
> +        res = qemu_get_buffer(rp, buf, header_len);
> +        if (res != header_len) {
> +            DPRINTF("RP: Failed to read command data");
> +            source_return_path_bad(ms);
> +            goto out;
> +        }
> +
> +        /* OK, we have the command and the data */
> +        switch (header_com) {
> +        case MIG_RPCOMM_SHUT:
> +            tmp32 = be32_to_cpup((uint32_t *)buf);
> +            if (tmp32) {
> +                error_report("RP: Sibling indicated error %d", tmp32);
> +                source_return_path_bad(ms);
> +            } else {
> +                DPRINTF("RP: SHUT received");
> +            }
> +            /*
> +             * We'll let the main thread deal with closing the RP
> +             * we could do a shutdown(2) on it, but we're the only user
> +             * anyway, so there's nothing gained.
> +             */
> +            goto out;
> +
> +        case MIG_RPCOMM_ACK:
> +            tmp32 = be32_to_cpup((uint32_t *)buf);
> +            DPRINTF("RP: Received ACK 0x%x", tmp32);
> +            atomic_xchg(&ms->rp_state.latest_ack, tmp32);

I didn't see *ms->rp_state.latest_ack* being used anywhere else; what's it used for? ;)

> +            break;
> +
> +        default:
> +            /* This shouldn't happen because we should catch this above */
> +            DPRINTF("RP: Bad header_com in dispatch");
> +        }
> +        /* Latest command processed, now leave a gap for the next one */
> +        header_com = MIG_RPCOMM_INVALID;
> +    }
> +    if (rp && qemu_file_get_error(rp)) {
> +        DPRINTF("%s: rp bad at end", __func__);
> +        source_return_path_bad(ms);
> +    }
> +
> +    DPRINTF("%s: Bottom exit", __func__);
> +
> +out:
> +    return NULL;
> +}
> +
> +__attribute__ (( unused )) /* Until later in patch series */
> +static int open_outgoing_return_path(MigrationState *ms)
> +{
> +
> +    ms->return_path = qemu_file_get_return_path(ms->file);
> +    if (!ms->return_path) {
> +        return -1;
> +    }
> +
> +    DPRINTF("%s: starting thread", __func__);
> +    qemu_thread_create(&ms->rp_state.rp_thread, "return path",
> +                       source_return_path_thread, ms, QEMU_THREAD_JOINABLE);
> +
> +    DPRINTF("%s: continuing", __func__);
> +
> +    return 0;
> +}
> +
> +__attribute__ (( unused )) /* Until later in patch series */
> +static void await_outgoing_return_path_close(MigrationState *ms)
> +{
> +    /*
> +     * If this is a normal exit then the destination will send a SHUT and the
> +     * rp_thread will exit, however if there's an error we need to cause
> +     * it to exit, which we can do by a shutdown.
> +     * (canceling must also shutdown to stop us getting stuck here if
> +     * the destination died at just the wrong place)
> +     */
> +    if (qemu_file_get_error(ms->file) && ms->return_path) {
> +        qemu_file_shutdown(ms->return_path);
> +    }
> +    DPRINTF("%s: Joining", __func__);
> +    qemu_thread_join(&ms->rp_state.rp_thread);
> +    DPRINTF("%s: Exit", __func__);
> +}
> +
> +/*
> + * Master migration thread on the source VM.
> + * It drives the migration and pumps the data down the outgoing channel.
> + */
>   static void *migration_thread(void *opaque)
>   {
>       MigrationState *s = opaque;
>

* Re: [Qemu-devel] [PATCH v4 16/47] Return path: Source handling of return path
  2014-10-16  8:26   ` zhanghailiang
@ 2014-10-16  8:35     ` Dr. David Alan Gilbert
  2014-10-16  9:09       ` zhanghailiang
  2014-11-03  3:47     ` David Gibson
  1 sibling, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-10-16  8:35 UTC (permalink / raw)
  To: zhanghailiang
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:

> >+        case MIG_RPCOMM_ACK:
> >+            tmp32 = be32_to_cpup((uint32_t *)buf);
> >+            DPRINTF("RP: Received ACK 0x%x", tmp32);
> >+            atomic_xchg(&ms->rp_state.latest_ack, tmp32);
> 
> I didn't see *ms->rp_state.latest_ack* being used anywhere else; what's it used for? ;)

Nothing currently; I've only used the REQ/ACK for debugging so far. I was thinking
that someone might want to wait for an ack to be received before carrying on, but I haven't
actually needed that in postcopy.
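
For illustration only, a minimal sketch of what waiting on an ack could look like, assuming ACK values are sequence numbers that only ever increase (nothing in this patch guarantees that). It uses plain POSIX threads rather than QEMU's wrappers, and struct ack_tracker, ack_received(), wait_for_ack() and fake_rp_thread() are all hypothetical names; the patch itself just does an atomic_xchg into rp_state.latest_ack with no waiters.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical source-side state: the RP thread publishes the most
 * recent ACK value here and wakes anyone waiting for it. */
struct ack_tracker {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    uint32_t        latest_ack;   /* assumed to only ever increase */
};

#define ACK_TRACKER_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0 }

/* Called from the return-path thread when an ACK arrives. */
static void ack_received(struct ack_tracker *t, uint32_t value)
{
    pthread_mutex_lock(&t->lock);
    t->latest_ack = value;
    pthread_cond_broadcast(&t->cond);   /* wake every waiter to re-check */
    pthread_mutex_unlock(&t->lock);
}

/* Block until an ACK >= 'wanted' has been seen; returns the latest value. */
static uint32_t wait_for_ack(struct ack_tracker *t, uint32_t wanted)
{
    uint32_t got;

    pthread_mutex_lock(&t->lock);
    while (t->latest_ack < wanted) {
        pthread_cond_wait(&t->cond, &t->lock);  /* handles spurious wakeups */
    }
    got = t->latest_ack;
    pthread_mutex_unlock(&t->lock);
    return got;
}

/* Example producer standing in for the return-path thread: posts ACKs 1..3. */
static void *fake_rp_thread(void *opaque)
{
    struct ack_tracker *t = opaque;

    for (uint32_t i = 1; i <= 3; i++) {
        ack_received(t, i);
    }
    return NULL;
}
```

A condition variable is used instead of polling latest_ack so that the waiting thread sleeps until the return-path thread actually publishes a new value.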

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v4 16/47] Return path: Source handling of return path
  2014-10-16  8:35     ` Dr. David Alan Gilbert
@ 2014-10-16  9:09       ` zhanghailiang
  0 siblings, 0 replies; 204+ messages in thread
From: zhanghailiang @ 2014-10-16  9:09 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On 2014/10/16 16:35, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>
>>> +        case MIG_RPCOMM_ACK:
>>> +            tmp32 = be32_to_cpup((uint32_t *)buf);
>>> +            DPRINTF("RP: Received ACK 0x%x", tmp32);
>>> +            atomic_xchg(&ms->rp_state.latest_ack, tmp32);
>>
>> I didn't see *ms->rp_state.latest_ack* being used anywhere else; what's it used for? ;)
>
> Nothing currently; I've used the REQ/ACK as debug at the moment;   I was thinking
> that someone might want to wait on an ack being received before carrying on; but hadn't
> actually needed it in postcopy.
>

OK, I see. Thanks.

> Dave
>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

* Re: [Qemu-devel] [PATCH v4 46/47] postcopy: Wire up loadvm_postcopy_ram_handle_{run, end} commands
  2014-10-04 17:51   ` Paolo Bonzini
@ 2014-10-23 12:18     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-10-23 12:18 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> Il 03/10/2014 19:47, Dr. David Alan Gilbert (git) ha scritto:
> > +        bool one_message = false;
> > +        /* This looks good, but it's possible that the device loading in the
> > +         * main thread hasn't finished yet, and so we might not be in 'RUN'
> > +         * state yet.
> > +         * TODO: Using an atomic_xchg or something for this
> 
> This looks like a good match for QemuEvent.  Or just mutex & condvar.

Done, QemuEvent seems to work nicely.
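
A rough POSIX equivalent of the one-shot event that replaces the busy-wait on the LISTENING state might look like the following. This is a sketch, not QEMU's QemuEvent implementation; struct simple_event, the incoming_state enum and finish_loading_thread() are hypothetical stand-ins:

```c
#include <pthread.h>
#include <stdbool.h>

/* One-shot event built from a mutex and a condition variable. */
struct simple_event {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            is_set;
};

#define SIMPLE_EVENT_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false }

static void simple_event_set(struct simple_event *ev)
{
    pthread_mutex_lock(&ev->lock);
    ev->is_set = true;
    pthread_cond_broadcast(&ev->cond);
    pthread_mutex_unlock(&ev->lock);
}

/* Sleeps (rather than spinning) until simple_event_set() has been called. */
static void simple_event_wait(struct simple_event *ev)
{
    pthread_mutex_lock(&ev->lock);
    while (!ev->is_set) {
        pthread_cond_wait(&ev->cond, &ev->lock);
    }
    pthread_mutex_unlock(&ev->lock);
}

/* Hypothetical incoming states standing in for the postcopy state enum. */
enum incoming_state { INCOMING_LISTENING, INCOMING_RUNNING };

static enum incoming_state incoming_state_now = INCOMING_LISTENING;
static struct simple_event run_event = SIMPLE_EVENT_INIT;

/* Main-thread side: finish device loading, then publish RUN. The state is
 * written before the event is set, so the waiter sees it after waking. */
static void *finish_loading_thread(void *opaque)
{
    (void)opaque;
    incoming_state_now = INCOMING_RUNNING;
    simple_event_set(&run_event);
    return NULL;
}
```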

> 
> > +         */
> > +        while (mis->postcopy_ram_state == POSTCOPY_RAM_INCOMING_LISTENING) {
> 
> What if we had postcopy of something else than RAM?  Can you remove the
> "ram" part from the symbols that do not directly deal with RAM but just
> with the protocol?

Done; those are now 'postcopy_state' and 'POSTCOPY_INCOMING_LISTENING'.
A lot of the internal command enums have also lost the 'RAM', but not
all of them (hopefully just the ones where it makes sense). Similarly,
the loadvm_postcopy_ram_handle_... functions are now loadvm_postcopy_handle_...

I've kept the hmp/qmp command with the 'ram'.

Dave

> Paolo
> 
> > +            if (!one_message) {
> > +                DPRINTF("%s: Waiting for RUN", __func__);
> > +                one_message = true;
> > +            }
> > +        }
> > +    }
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v4 47/47] End of migration for postcopy
  2014-10-04 17:49   ` Paolo Bonzini
@ 2014-10-23 14:24     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-10-23 14:24 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> Il 03/10/2014 19:47, Dr. David Alan Gilbert (git) ha scritto:
> > +            mis->postcopy_ram_state);
> > +    if (mis->postcopy_ram_state == POSTCOPY_RAM_INCOMING_ADVISE) {
> > +        /*
> > +         * Where a migration had postcopy enabled (and thus went to advise)
> > +         * but managed to complete within the precopy period
> > +         */
> > +        postcopy_ram_incoming_cleanup(mis);
> > +    } else {
> > +        if ((ret >= 0) &&
> > +            (mis->postcopy_ram_state > POSTCOPY_RAM_INCOMING_ADVISE)) {
> > +            /*
> > +             * Postcopy was started, cleanup should happen at the end of the
> > +             * postcopy thread.
> > +             */
> > +            DPRINTF("process_incoming_migration_co: exiting main branch");
> > +            return;
> > +        }
> 
> Extra parentheses and extra nesting.

Done.
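
A sketch of the flattened decision, written as a pure function so the three outcomes can be checked in isolation. The pc_state and incoming_action enums are hypothetical stand-ins for the patch's postcopy_ram_state values; as in the quoted code, the ADVISE case cleans up and then carries on with the normal exit path:

```c
/* Hypothetical stand-ins for the incoming postcopy states. Only the
 * ordering matters: anything after ADVISE means postcopy really started. */
enum pc_state { PC_NONE, PC_ADVISE, PC_LISTENING, PC_RUNNING };

enum incoming_action {
    ACTION_POSTCOPY_CLEANUP,   /* advised but finished in precopy: clean up, then normal exit */
    ACTION_LEAVE_TO_THREAD,    /* postcopy started: its thread cleans up later */
    ACTION_NORMAL_EXIT         /* ordinary exit path (including errors) */
};

/* No nesting and no redundant parentheses: each case returns early. */
static enum incoming_action incoming_exit_action(int ret, enum pc_state st)
{
    if (st == PC_ADVISE) {
        return ACTION_POSTCOPY_CLEANUP;
    }
    if (ret >= 0 && st > PC_ADVISE) {
        return ACTION_LEAVE_TO_THREAD;
    }
    return ACTION_NORMAL_EXIT;
}
```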

> 
> Paolo
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v4 14/47] Return path: Control commands
  2014-10-04 18:08   ` Paolo Bonzini
@ 2014-10-23 16:23     ` Dr. David Alan Gilbert
  2014-10-23 20:15       ` Paolo Bonzini
  0 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-10-23 16:23 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> Il 03/10/2014 19:47, Dr. David Alan Gilbert (git) ha scritto:
> >      QEMU_VM_CMD_INVALID = 0,   /* Must be 0 */
> > +    QEMU_VM_CMD_OPENRP,        /* Tell the dest to open the Return path */
> 
> OPEN_RETURN_PATH?
> 
> > +    QEMU_VM_CMD_REQACK,        /* Request an ACK on the RP */
> 
> SEND_ACK or ACK_REQUESTED?
> 
> >      QEMU_VM_CMD_AFTERLASTVALID
> 
> Pleaseseparatewords.  Is this enum actually used at all?
> 
> Please avoid the difference between QEMU_VM_CMD and MIG_RPCOMM_.
> 
> Perhaps MIG_CMD and MIG_RPCMD_?

Almost, I went with:

    MIG_CMD_INVALID = 0,       /* Must be 0 */
    MIG_CMD_OPEN_RETURN_PATH,  /* Tell the dest to open the Return path */
    MIG_CMD_SEND_ACK,          /* Request an ACK on the RP */
    MIG_CMD_PACKAGED,          /* Send a wrapped stream within this stream */

    MIG_CMD_POSTCOPY_ADVISE = 20,  /* Prior to any page transfers, just
                                      warn we might want to do PC */
    MIG_CMD_POSTCOPY_LISTEN,       /* Start listening for incoming
                                      pages as it's running. */
    MIG_CMD_POSTCOPY_RUN,          /* Start execution */
    MIG_CMD_POSTCOPY_END,          /* Postcopy is finished. */

    MIG_CMD_POSTCOPY_RAM_DISCARD,  /* A list of pages to discard that
                                      were previously sent during
                                      precopy but are dirty. */

and
    MIG_RP_CMD_INVALID = 0,  /* Must be 0 */
    MIG_RP_CMD_SHUT,         /* sibling will not send any more RP messages */
    MIG_RP_CMD_ACK,          /* data (seq: be32 ) */
    MIG_RP_CMD_REQ_PAGES,    /* data (start: be64, len: be64) */

The only oddity with that is the 'SEND_ACK' you suggested: since all my functions
that send commands are named send_..., I currently have
 'qemu_savevm_send_send_ack', which, while consistent, looks a bit odd.
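
To make the renamed return-path commands concrete, here is a sketch of how a receiver might frame one message, assuming the layout read by source_return_path_thread (be16 command, be16 length, then the payload). The enum values follow the declaration order above, and the 16-byte REQ_PAGES payload is taken from its comment; rp_parse() and the other helpers are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

enum mig_rp_cmd {
    MIG_RP_CMD_INVALID = 0,
    MIG_RP_CMD_SHUT,         /* data: be32 status */
    MIG_RP_CMD_ACK,          /* data: be32 seq */
    MIG_RP_CMD_REQ_PAGES,    /* data: be64 start, be64 len */
};

static uint16_t get_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | p[3];
}

/* Expected payload length per command, or -1 if the command is unknown. */
static int rp_expected_len(uint16_t cmd)
{
    switch (cmd) {
    case MIG_RP_CMD_SHUT:
    case MIG_RP_CMD_ACK:
        return 4;
    case MIG_RP_CMD_REQ_PAGES:
        return 16;
    default:
        return -1;
    }
}

/* Parse one message from 'buf'. On success stores the command (and the
 * be32 value for 4-byte payloads) and returns the bytes consumed;
 * returns -1 for an unknown command, a wrong length or truncated input. */
static int rp_parse(const uint8_t *buf, size_t avail,
                    uint16_t *cmd, uint32_t *val32)
{
    uint16_t len;
    int expected;

    if (avail < 4) {
        return -1;                      /* header incomplete */
    }
    *cmd = get_be16(buf);
    len = get_be16(buf + 2);
    expected = rp_expected_len(*cmd);
    if (expected < 0 || len != (uint16_t)expected) {
        return -1;                      /* unknown command or wrong length */
    }
    if (avail < 4u + len) {
        return -1;                      /* payload truncated */
    }
    if (len == 4) {
        *val32 = get_be32(buf + 4);     /* SHUT status or ACK value */
    }
    return 4 + len;
}
```

Unlike the v4 patch, which only rejects payloads longer than expected (header_len > expected_len), this sketch requires an exact length, so a short SHUT or ACK payload is rejected instead of being decoded as a full be32.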

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v4 16/47] Return path: Source handling of return path
  2014-10-04 18:14   ` Paolo Bonzini
@ 2014-10-23 18:00     ` Dr. David Alan Gilbert
  2014-10-24 10:04       ` Paolo Bonzini
  0 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-10-23 18:00 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> Il 03/10/2014 19:47, Dr. David Alan Gilbert (git) ha scritto:
> > +/* Source side RP state */
> > +struct MigrationRetPathState {
> > +    uint32_t      latest_ack;
> > +    QemuThread    rp_thread;
> > +    bool          error;
> 
> Should the QemuFile be in here?

Yes, done.

> > +};
> > +
> 
> Also please do not abbrev words, and add a typedef that matches the
> struct if it is useful.  If it is not, just embed the struct without
> giving the type a name (struct { } rp_state).

Done.
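
A sketch of the layout being discussed: the return-path file lives inside an unnamed struct embedded as rp_state, and cleanup swaps the pointer for NULL atomically so that only one thread ever closes it. It assumes C11 <stdatomic.h>; struct fake_file stands in for QEMUFile, and the field and function names are hypothetical. The patch itself uses atomic_cmpxchg; atomic_exchange gives the same single-closer guarantee here:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMUFile; only counts closes for the demo. */
struct fake_file {
    int close_count;
};

static void fake_fclose(struct fake_file *f)
{
    f->close_count++;
}

/* Return-path state embedded as an unnamed struct, with the file inside. */
struct migration_state {
    struct fake_file *to_dst_file;
    struct {
        _Atomic(struct fake_file *) from_dst_file;
        uint32_t                    latest_ack;
        bool                        error;
    } rp_state;
};

/* Safe to call from several threads: only the caller that swaps the
 * non-NULL pointer out for NULL gets to close the file. */
static void cleanup_return_path(struct migration_state *ms)
{
    struct fake_file *rp = atomic_exchange(&ms->rp_state.from_dst_file, NULL);

    if (rp) {
        fake_fclose(rp);
    }
}
```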

> 
> > +static bool migration_already_active(MigrationState *ms)
> > +{
> > +    switch (ms->state) {
> > +    case MIG_STATE_ACTIVE:
> > +    case MIG_STATE_SETUP:
> > +        return true;
> > +
> > +    default:
> > +        return false;
> > +
> > +    }
> > +}
> 
> Should CANCELLING also be considered active?  It is on the source->dest
> path.

Hmm, possibly - but my intention here was just to round up all of the
places that already checked for ACTIVE+SETUP so that I could add POSTCOPY_ACTIVE;
only one of those places also checked for CANCELLING, so I left it out.
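
One hypothetical way to keep that question visible is to make the CANCELLING decision an explicit argument, so each converted call site states whether it wants it. The state names below are illustrative stand-ins, including the POSTCOPY_ACTIVE state this series adds:

```c
#include <stdbool.h>

/* Illustrative state list, not the real header. */
enum mig_state {
    MIG_STATE_NONE,
    MIG_STATE_SETUP,
    MIG_STATE_ACTIVE,
    MIG_STATE_POSTCOPY_ACTIVE,
    MIG_STATE_CANCELLING,
    MIG_STATE_COMPLETED,
};

/* 'include_cancelling' makes the open question explicit at each caller. */
static bool migration_is_active(enum mig_state s, bool include_cancelling)
{
    switch (s) {
    case MIG_STATE_SETUP:
    case MIG_STATE_ACTIVE:
    case MIG_STATE_POSTCOPY_ACTIVE:
        return true;
    case MIG_STATE_CANCELLING:
        return include_cancelling;
    default:
        return false;
    }
}
```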

> > +static void await_outgoing_return_path_close(MigrationState *ms)
> > +{
> > +    /*
> > +     * If this is a normal exit then the destination will send a SHUT and the
> > +     * rp_thread will exit, however if there's an error we need to cause
> > +     * it to exit, which we can do by a shutdown.
> > +     * (canceling must also shutdown to stop us getting stuck here if
> > +     * the destination died at just the wrong place)
> > +     */
> > +    if (qemu_file_get_error(ms->file) && ms->return_path) {
> > +        qemu_file_shutdown(ms->return_path);
> > +    }
> 
> As mentioned early, I think it's simpler to let these function handle
> themselves the case where there is no return path, and call them
> unconditionally.

I still need to think about that.
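
A sketch of the unconditional-call style Paolo describes, combined with the shutdown-then-join pattern from await_outgoing_return_path_close: the helper accepts a NULL handle itself, and shutdown(2) on the socket wakes a reader thread blocked on it. It uses a plain socket and POSIX threads, not QEMUFile; struct rp_handle, rp_reader() and rp_shutdown_and_join() are hypothetical:

```c
#include <pthread.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical return-path handle: a socket fd plus its reader thread. */
struct rp_handle {
    int       fd;
    pthread_t thread;
};

/* Reader thread: blocks in read() until data, EOF or shutdown. */
static void *rp_reader(void *opaque)
{
    struct rp_handle *rp = opaque;
    char c;
    ssize_t n = read(rp->fd, &c, 1);

    return (void *)(intptr_t)n;         /* 0 means EOF or shut down */
}

/* Tolerates rp == NULL so callers need no "if (return_path)" guard.
 * Returns the reader's read() result, or -1 if there was no return path. */
static long rp_shutdown_and_join(struct rp_handle *rp)
{
    void *res;

    if (!rp) {
        return -1;
    }
    shutdown(rp->fd, SHUT_RDWR);        /* wakes a reader blocked on fd */
    pthread_join(rp->thread, &res);
    return (long)(intptr_t)res;
}
```

On Linux, a read() blocked on a socket returns 0 once that socket has been shut down for reading, which is what lets the join complete even if the destination never sends SHUT.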

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v4 14/47] Return path: Control commands
  2014-10-23 16:23     ` Dr. David Alan Gilbert
@ 2014-10-23 20:15       ` Paolo Bonzini
  2014-11-03  3:20         ` David Gibson
  2014-11-04 18:58         ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 204+ messages in thread
From: Paolo Bonzini @ 2014-10-23 20:15 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy



On 10/23/2014 06:23 PM, Dr. David Alan Gilbert wrote:
> * Paolo Bonzini (pbonzini@redhat.com) wrote:
>> Il 03/10/2014 19:47, Dr. David Alan Gilbert (git) ha scritto:
>>>      QEMU_VM_CMD_INVALID = 0,   /* Must be 0 */
>>> +    QEMU_VM_CMD_OPENRP,        /* Tell the dest to open the Return path */
>>
>> OPEN_RETURN_PATH?
>>
>>> +    QEMU_VM_CMD_REQACK,        /* Request an ACK on the RP */
>>
>> SEND_ACK or ACK_REQUESTED?
>>
>>>      QEMU_VM_CMD_AFTERLASTVALID
>>
>> Pleaseseparatewords.  Is this enum actually used at all?
>>
>> Please avoid the difference between QEMU_VM_CMD and MIG_RPCOMM_.
>>
>> Perhaps MIG_CMD and MIG_RPCMD_?
> 
> Almost, I went with:
> 
>     MIG_CMD_INVALID = 0,       /* Must be 0 */
>     MIG_CMD_OPEN_RETURN_PATH,  /* Tell the dest to open the Return path */
>     MIG_CMD_SEND_ACK,          /* Request an ACK on the RP */
>     MIG_CMD_PACKAGED,          /* Send a wrapped stream within this stream */
> 
>     MIG_CMD_POSTCOPY_ADVISE = 20,  /* Prior to any page transfers, just
>                                       warn we might want to do PC */
>     MIG_CMD_POSTCOPY_LISTEN,       /* Start listening for incoming
>                                       pages as it's running. */
>     MIG_CMD_POSTCOPY_RUN,          /* Start execution */
>     MIG_CMD_POSTCOPY_END,          /* Postcopy is finished. */
> 
>     MIG_CMD_POSTCOPY_RAM_DISCARD,  /* A list of pages to discard that
>                                       were previously sent during
>                                       precopy but are dirty. */
> 
> and
>     MIG_RP_CMD_INVALID = 0,  /* Must be 0 */
>     MIG_RP_CMD_SHUT,         /* sibling will not send any more RP messages */
>     MIG_RP_CMD_ACK,          /* data (seq: be32 ) */
>     MIG_RP_CMD_REQ_PAGES,    /* data (start: be64, len: be64) */
> 
> the only oddity I get from that is from the 'SEND_ACK' you suggested;
> since all my functions to send commands are send_  I currently have
>  'qemu_savevm_send_send_ack'  which while consistent looks a bit odd.

Perhaps ping/pong?

Paolo

* Re: [Qemu-devel] [PATCH v4 16/47] Return path: Source handling of return path
  2014-10-23 18:00     ` Dr. David Alan Gilbert
@ 2014-10-24 10:04       ` Paolo Bonzini
  0 siblings, 0 replies; 204+ messages in thread
From: Paolo Bonzini @ 2014-10-24 10:04 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy



On 10/23/2014 08:00 PM, Dr. David Alan Gilbert wrote:
>>> > > +static bool migration_already_active(MigrationState *ms)
>>> > > +{
>>> > > +    switch (ms->state) {
>>> > > +    case MIG_STATE_ACTIVE:
>>> > > +    case MIG_STATE_SETUP:
>>> > > +        return true;
>>> > > +
>>> > > +    default:
>>> > > +        return false;
>>> > > +
>>> > > +    }
>>> > > +}
>> > 
>> > Should CANCELLING also be considered active?  It is on the source->dest
>> > path.
> Hmm, possibly - but my intention here was just to round up all of the
> places that already checked for ACTIVE+SETUP so that I could add POSTCOPY_ACTIVE;
> only one of those places also checked for CANCELLING, so I left it out.

Ok, I would need to check the callers...  There may be bugs waiting to
be fixed. :)  For now I guess it's ok as is.

Thanks for answering my comments!

Paolo

* Re: [Qemu-devel] [PATCH v4 01/47] QEMUSizedBuffer based QEMUFile
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 01/47] QEMUSizedBuffer based QEMUFile Dr. David Alan Gilbert (git)
  2014-10-08  2:10   ` zhanghailiang
@ 2014-11-03  0:53   ` David Gibson
  1 sibling, 0 replies; 204+ messages in thread
From: David Gibson @ 2014-11-03  0:53 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On Fri, Oct 03, 2014 at 06:47:07PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> * Please comment on separate thread for this QEMUSizedBuffer patch *
> 
> This is based on Stefan and Joel's patch that creates a QEMUFile that goes
> to a memory buffer; from:
> 
> http://lists.gnu.org/archive/html/qemu-devel/2013-03/msg05036.html
> 
> Using the QEMUFile interface, this patch adds support functions for
> operating on in-memory sized buffers that can be written to or read from.
> 
> Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
> Signed-off-by: Joel Schopp <jschopp@linux.vnet.ibm.com>
> 
> For fixes/tweaks I've done:
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> Reviewed-by: Eric Blake <eblake@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
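
As a rough illustration of what such an in-memory sized buffer provides, here is a hypothetical growable buffer where writes append at the end and reads consume from a separate read position. It is not the QEMUSizedBuffer API from this patch (struct mem_buf and its functions are made up), just the underlying idea:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable in-memory buffer: writes append, reads consume. */
struct mem_buf {
    unsigned char *data;
    size_t         used;      /* bytes written so far */
    size_t         capacity;  /* bytes allocated */
    size_t         read_pos;  /* next byte to read */
};

/* Appends 'len' bytes, doubling the allocation as needed; 0 or -1. */
static int mem_buf_write(struct mem_buf *b, const void *src, size_t len)
{
    if (len == 0) {
        return 0;
    }
    if (b->used + len > b->capacity) {
        size_t cap = b->capacity ? b->capacity : 16;
        while (cap < b->used + len) {
            cap *= 2;
        }
        unsigned char *p = realloc(b->data, cap);
        if (!p) {
            return -1;                  /* old buffer is left intact */
        }
        b->data = p;
        b->capacity = cap;
    }
    memcpy(b->data + b->used, src, len);
    b->used += len;
    return 0;
}

/* Copies up to 'len' unread bytes into 'dst'; returns how many were copied. */
static size_t mem_buf_read(struct mem_buf *b, void *dst, size_t len)
{
    size_t avail = b->used - b->read_pos;
    size_t n = len < avail ? len : avail;

    if (n == 0) {
        return 0;
    }
    memcpy(dst, b->data + b->read_pos, n);
    b->read_pos += n;
    return n;
}

static void mem_buf_free(struct mem_buf *b)
{
    free(b->data);
    memset(b, 0, sizeof(*b));
}
```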

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH v4 02/47] Tests: QEMUSizedBuffer/QEMUBuffer
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 02/47] Tests: QEMUSizedBuffer/QEMUBuffer Dr. David Alan Gilbert (git)
@ 2014-11-03  1:02   ` David Gibson
  0 siblings, 0 replies; 204+ messages in thread
From: David Gibson @ 2014-11-03  1:02 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On Fri, Oct 03, 2014 at 06:47:08PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> * Please comment on separate thread for this QEMUSizedBuffer patch *
> 
> Modify some of tests/test-vmstate.c to use the in memory file based
> on QEMUSizedBuffer to provide basic testing of QEMUSizedBuffer and
> the associated memory backed QEMUFile type.
> 
> Only some of the tests are changed so that the fd backed QEMUFile is
> still tested.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

Looks ok, although I do wonder if the removal of some of the test
paths for regular files is a concern.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 204+ messages in thread

* Re: [Qemu-devel] [PATCH v4 03/47] Start documenting how postcopy works.
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 03/47] Start documenting how postcopy works Dr. David Alan Gilbert (git)
@ 2014-11-03  1:31   ` David Gibson
  0 siblings, 0 replies; 204+ messages in thread
From: David Gibson @ 2014-11-03  1:31 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On Fri, Oct 03, 2014 at 06:47:09PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

* Re: [Qemu-devel] [PATCH v4 04/47] qemu_ram_foreach_block: pass up error value, and down the ramblock name
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 04/47] qemu_ram_foreach_block: pass up error value, and down the ramblock name Dr. David Alan Gilbert (git)
@ 2014-11-03  2:34   ` David Gibson
  0 siblings, 0 replies; 204+ messages in thread
From: David Gibson @ 2014-11-03  2:34 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On Fri, Oct 03, 2014 at 06:47:10PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Check the return value of the function it calls and return an error if it is non-zero.
> Fix up qemu_rdma_init_one_block, which is the only current caller,
>   and __qemu_rdma_add_block, the only function it calls using it.
> 
> Pass the name of the ramblock to the function; helps in debugging.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
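
To make the commit message concrete, here is a minimal standalone sketch of a foreach helper that stops at the first non-zero callback return and hands that value back to its caller, while passing each block's name down to the callback. The types and names (RamBlockInfo, RamBlockIterFunc, ram_foreach_block, count_blocks_cb) are hypothetical simplifications for illustration, not QEMU's actual definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified description of one RAM block. */
typedef struct RamBlockInfo {
    const char *name;
    void *host;
    uint64_t offset;
    uint64_t length;
} RamBlockInfo;

typedef int (*RamBlockIterFunc)(const char *block_name, void *host_addr,
                                uint64_t offset, uint64_t length,
                                void *opaque);

/* Call func for each block; stop at the first non-zero return and pass it up. */
static int ram_foreach_block(const RamBlockInfo *blocks, size_t n,
                             RamBlockIterFunc func, void *opaque)
{
    for (size_t i = 0; i < n; i++) {
        int ret = func(blocks[i].name, blocks[i].host, blocks[i].offset,
                       blocks[i].length, opaque);
        if (ret) {
            return ret;
        }
    }
    return 0;
}

/* Example callback: count blocks, failing on the block named in ctx. */
typedef struct CountCtx {
    const char *fail_on;
    int seen;
} CountCtx;

static int count_blocks_cb(const char *block_name, void *host_addr,
                           uint64_t offset, uint64_t length, void *opaque)
{
    CountCtx *ctx = opaque;

    (void)host_addr;
    (void)offset;
    (void)length;
    if (ctx->fail_on && strcmp(block_name, ctx->fail_on) == 0) {
        return -1;  /* the name makes the failing block easy to report */
    }
    ctx->seen++;
    return 0;
}
```

Because the error propagates out of the loop, a caller such as the RDMA setup code can abort as soon as one block fails instead of silently continuing.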

* Re: [Qemu-devel] [PATCH v4 05/47] improve DPRINTF macros, add to savevm
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 05/47] improve DPRINTF macros, add to savevm Dr. David Alan Gilbert (git)
@ 2014-11-03  2:35   ` David Gibson
  0 siblings, 0 replies; 204+ messages in thread
From: David Gibson @ 2014-11-03  2:35 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On Fri, Oct 03, 2014 at 06:47:11PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Improve the existing DPRINTF macros in migration.c and arch_init
> by:
>   1) Making them go to stderr rather than stdout (so you can run with
> -nographic and redirect your debug to a file)
>   2) Making them print the ms time with each debug - useful for
> debugging latency issues
> 
> Add the same macro to savevm
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
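
A minimal sketch of the DPRINTF behaviour described above: output goes to stderr so it can be redirected separately (for example with -nographic), and each line is prefixed with a millisecond timestamp. It assumes a monotonic clock via POSIX clock_gettime(); debug_now_ms(), debug_printf() and DEBUG_MIGRATION are hypothetical names, and the patch itself may use QEMU's own clock helpers instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <inttypes.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define DEBUG_MIGRATION 1   /* hypothetical compile-time switch */

/* Milliseconds from a monotonic clock, useful for spotting latency. */
static int64_t debug_now_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Prefix each message with the timestamp and write it to stderr. */
static void debug_printf(const char *fmt, ...)
{
    va_list ap;

    fprintf(stderr, "%" PRId64 " migration: ", debug_now_ms());
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
}

#define DPRINTF(...)                      \
    do {                                  \
        if (DEBUG_MIGRATION) {            \
            debug_printf(__VA_ARGS__);    \
        }                                 \
    } while (0)
```

Routing through a variadic function keeps the macro standard C, avoiding the GNU `, ## __VA_ARGS__` extension when no arguments follow the format string.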

* Re: [Qemu-devel] [PATCH v4 06/47] Add qemu_get_counted_string to read a string prefixed by a count byte
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 06/47] Add qemu_get_counted_string to read a string prefixed by a count byte Dr. David Alan Gilbert (git)
@ 2014-11-03  2:39   ` David Gibson
  2014-11-25 16:13     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 204+ messages in thread
From: David Gibson @ 2014-11-03  2:39 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On Fri, Oct 03, 2014 at 06:47:12PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> and use it in loadvm_state.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  include/migration/qemu-file.h |  2 ++
>  qemu-file.c                   | 15 +++++++++++++++
>  savevm.c                      | 18 ++++++++++--------
>  3 files changed, 27 insertions(+), 8 deletions(-)
> 
> diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
> index 6ef8ebc..a8cac7a 100644
> --- a/include/migration/qemu-file.h
> +++ b/include/migration/qemu-file.h
> @@ -300,4 +300,6 @@ static inline void qemu_get_sbe64s(QEMUFile *f, int64_t *pv)
>  {
>      qemu_get_be64s(f, (uint64_t *)pv);
>  }
> +
> +int qemu_get_counted_string(QEMUFile *f, uint8_t *buf);

I'd suggest writing the prototype as

int qemu_get_counted_string(QEMUFile *f, uint8_t buf[256]);

The compiled code will be identical, of course, but it helps to
document what the function expects.
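
For illustration, a standalone sketch of the function with the suggested array-typed parameter. ByteSource, src_get_byte() and src_get_buffer() are hypothetical stand-ins for QEMUFile and its accessors so the example compiles without QEMU. One deliberate difference from the patch: the terminator is written after the bytes actually read, so a short read never leaves stale data in the buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for QEMUFile: reads from an in-memory buffer. */
typedef struct ByteSource {
    const uint8_t *data;
    size_t size;
    size_t pos;
} ByteSource;

/* Returns the next byte, or 0 at end of data. */
static int src_get_byte(ByteSource *s)
{
    return s->pos < s->size ? s->data[s->pos++] : 0;
}

/* Copies up to len bytes; returns the number actually copied. */
static int src_get_buffer(ByteSource *s, uint8_t *buf, int len)
{
    size_t avail = s->size - s->pos;
    size_t n = (size_t)len < avail ? (size_t)len : avail;

    memcpy(buf, s->data + s->pos, n);
    s->pos += n;
    return (int)n;
}

/*
 * The array parameter documents the 256-byte requirement: the count
 * byte is at most 255, leaving room for the terminating NUL.
 * Returns 0 on success, non-zero on a short read.
 */
static int get_counted_string(ByteSource *s, uint8_t buf[256])
{
    unsigned int len = src_get_byte(s);
    int res = src_get_buffer(s, buf, (int)len);

    buf[res] = 0;
    return res != (int)len;
}
```

The parameter still decays to `uint8_t *`, so the 256 is documentation for readers rather than something the compiler enforces.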


>  #endif
> diff --git a/qemu-file.c b/qemu-file.c
> index ccc516c..a057b3e 100644
> --- a/qemu-file.c
> +++ b/qemu-file.c
> @@ -879,6 +879,21 @@ uint64_t qemu_get_be64(QEMUFile *f)
>      return v;
>  }
>  
> +/*
> + * Get a string whose length is determined by a single preceding byte
> + * A preallocated 256 byte buffer must be passed in.
> + * Returns: 0 on success and a 0 terminated string in the buffer
> + */
> +int qemu_get_counted_string(QEMUFile *f, uint8_t *buf)
> +{
> +    unsigned int len = qemu_get_byte(f);
> +    int res = qemu_get_buffer(f, buf, len);
> +
> +    buf[len] = 0;
> +
> +    return res != len;
> +}
> +
>  #define QSB_CHUNK_SIZE      (1 << 10)
>  #define QSB_MAX_CHUNK_SIZE  (16 * QSB_CHUNK_SIZE)
>  
> diff --git a/savevm.c b/savevm.c
> index c3a1f68..cb6f0de 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -908,7 +908,7 @@ int qemu_loadvm_state(QEMUFile *f)
>  
>      v = qemu_get_be32(f);
>      if (v == QEMU_VM_FILE_VERSION_COMPAT) {
> -        fprintf(stderr, "SaveVM v2 format is obsolete and don't work anymore\n");
> +        error_report("SaveVM v2 format is obsolete and don't work anymore");

These changes of fprintf() to error_report() look like an unrelated
cleanup.

* Re: [Qemu-devel] [PATCH v4 07/47] Create MigrationIncomingState
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 07/47] Create MigrationIncomingState Dr. David Alan Gilbert (git)
@ 2014-11-03  2:45   ` David Gibson
  2014-11-04 19:06     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 204+ messages in thread
From: David Gibson @ 2014-11-03  2:45 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On Fri, Oct 03, 2014 at 06:47:13PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> There are currently lots of pieces of incoming migration state scattered
> around, and postcopy is adding more, and it seems better to try and keep
> it together.
> 
> allocate MIS in process_incoming_migration_co
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  include/migration/migration.h |  9 +++++++++
>  include/qemu/typedefs.h       |  2 ++
>  migration.c                   | 28 ++++++++++++++++++++++++++++
>  savevm.c                      |  2 ++
>  4 files changed, 41 insertions(+)
> 
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index 3cb5ba8..8a36255 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -41,6 +41,15 @@ struct MigrationParams {
>  
>  typedef struct MigrationState MigrationState;
>  
> +/* State for the incoming migration */
> +struct MigrationIncomingState {
> +    QEMUFile *file;
> +};
> +
> +MigrationIncomingState *migration_incoming_get_current(void);
> +MigrationIncomingState *migration_incoming_state_init(QEMUFile *f);

Hrm.  I'd prefer to see this called migration_incoming_state_new(),
since it allocates a new structure, rather than just initializing an
already allocated one.

I guess you're trying to match migrate_init() in name, so I guess
migrate_incoming_init() would work as well.
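
A sketch of the naming distinction: a `_new` function allocates and returns the object, while an `_init` function fills in storage the caller already owns. Only migration_incoming_state_new() and migration_incoming_get_current() come from this thread; the pointer-taking `_init` variant, migration_incoming_state_destroy() and the single global "current" pointer are assumptions for this example.

```c
#include <stdlib.h>

typedef struct QEMUFile QEMUFile;   /* opaque here; never dereferenced */

typedef struct MigrationIncomingState {
    QEMUFile *file;
} MigrationIncomingState;

static MigrationIncomingState *mis_current;

/* "_new": allocates, initialises and records a fresh object. */
MigrationIncomingState *migration_incoming_state_new(QEMUFile *f)
{
    MigrationIncomingState *mis = calloc(1, sizeof(*mis));

    if (mis) {
        mis->file = f;
        mis_current = mis;
    }
    return mis;
}

/* "_init": only fills in storage that the caller already owns. */
void migration_incoming_state_init(MigrationIncomingState *mis, QEMUFile *f)
{
    mis->file = f;
}

MigrationIncomingState *migration_incoming_get_current(void)
{
    return mis_current;
}

/* Hypothetical counterpart to _new: frees the current state. */
void migration_incoming_state_destroy(void)
{
    free(mis_current);
    mis_current = NULL;
}
```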

* Re: [Qemu-devel] [PATCH v4 10/47] Return path: Open a return path on QEMUFile for sockets
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 10/47] Return path: Open a return path on QEMUFile for sockets Dr. David Alan Gilbert (git)
@ 2014-11-03  3:05   ` David Gibson
  2014-11-03 19:04     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 204+ messages in thread
From: David Gibson @ 2014-11-03  3:05 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On Fri, Oct 03, 2014 at 06:47:16PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Postcopy needs a method to send messages from the destination back to
> the source, this is the 'return path'.
> 
> Wire it up for 'socket' QEMUFiles using a dup'd fd.

This doesn't seem like the right abstraction to me.  In particular I
can't really see how you'd implement this for anything other than
socket.

I'd suggest instead creating new "open" helper functions (within the
QEMUFile code) that open both a forward and return path
simultaneously.
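
One possible shape for that suggestion, sketched with plain file descriptors: a single helper opens the forward channel and, only when the transport can support one, the return channel. The MigChannels type and mig_open_channels() are hypothetical. The socket check uses getsockopt(SO_TYPE), which fails on anything that is not a socket, and the return path is a dup() of the same bidirectional socket, as in the patch.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical pair of channels for one migration connection. */
typedef struct MigChannels {
    int forward_fd;   /* main migration stream */
    int return_fd;    /* return path, or -1 if the transport has none */
} MigChannels;

/*
 * Open both directions in one call, so the decision about whether a
 * return path exists stays with the transport. A socket is
 * bidirectional, so its return path can be a dup() of the same fd;
 * anything else (pipe, file) gets no return path and an error.
 */
static int mig_open_channels(int fd, MigChannels *ch)
{
    int type;
    socklen_t len = sizeof(type);

    ch->forward_fd = fd;
    ch->return_fd = -1;
    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) < 0) {
        return -1;   /* not a socket (ENOTSOCK) or a bad fd */
    }
    ch->return_fd = dup(fd);
    return ch->return_fd < 0 ? -1 : 0;
}
```

With this shape, a transport that cannot provide a return path fails at open time, rather than a separate "get return path" call discovering it later.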

* Re: [Qemu-devel] [PATCH v4 11/47] Return path: socket_writev_buffer: Block even on non-blocking fd's
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 11/47] Return path: socket_writev_buffer: Block even on non-blocking fd's Dr. David Alan Gilbert (git)
@ 2014-11-03  3:10   ` David Gibson
  2014-11-03 18:59     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 204+ messages in thread
From: David Gibson @ 2014-11-03  3:10 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On Fri, Oct 03, 2014 at 06:47:17PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> The return path uses a non-blocking fd so as not to block waiting
> for the (possibly broken) destination to finish returning a message;
> however, we still want outbound data to behave in the same way and block.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  qemu-file.c | 39 +++++++++++++++++++++++++++++++++++----
>  1 file changed, 35 insertions(+), 4 deletions(-)
> 
> diff --git a/qemu-file.c b/qemu-file.c
> index 7393415..57eabd8 100644
> --- a/qemu-file.c
> +++ b/qemu-file.c
> @@ -85,12 +85,43 @@ static ssize_t socket_writev_buffer(void *opaque, struct iovec *iov, int iovcnt,
>      QEMUFileSocket *s = opaque;
>      ssize_t len;
>      ssize_t size = iov_size(iov, iovcnt);
> +    ssize_t offset = 0;
> +    int     err;
>  
> -    len = iov_send(s->fd, iov, iovcnt, 0, size);
> -    if (len < size) {
> -        len = -socket_error();
> +    while (size > 0) {
> +        len = iov_send(s->fd, iov, iovcnt, offset, size);
> +
> +        if (len > 0) {
> +            size -= len;
> +            offset += len;
> +        }
> +
> +        if (size > 0) {
> +            err = socket_error();
> +
> +            if (err != EAGAIN) {
> +                error_report("socket_writev_buffer: Got err=%d for (%zd/%zd)",
> +                             err, size, len);
> +                /*
> +                 * If I've already sent some but only just got the error, I
> +                 * could return the amount validly sent so far and wait for the
> +                 * next call to report the error, but I'd rather flag the error
> +                 * immediately.

Is that safe?  This gives the caller no means to detect a partially
completed send.

> +                 */
> +                return -err;
> +            }
> +
> +            /* Emulate blocking */
> +            GPollFD pfd;
> +
> +            pfd.fd = s->fd;
> +            pfd.events = G_IO_OUT | G_IO_ERR;
> +            pfd.revents = 0;
> +            g_poll(&pfd, 1 /* 1 fd */, -1 /* no timeout */);
> +        }
>      }
> -    return len;
> +
> +    return offset;
>  }
>  
>  static int socket_get_fd(void *opaque)
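
To illustrate the partial-send question raised above, here is a sketch of a blocking-emulation loop that always tells the caller how many bytes went out, even when it fails partway through. It uses POSIX write() and poll() directly rather than QEMU's iov_send()/g_poll(); write_all_blocking() and its out-parameter are hypothetical.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write all of buf to a (possibly non-blocking) fd, waiting in poll()
 * whenever the kernel reports EAGAIN. Returns 0 on success or -errno
 * on failure; *sent always holds the number of bytes written, so the
 * caller can tell a partial send from one that never started.
 */
static int write_all_blocking(int fd, const void *buf, size_t size,
                              size_t *sent)
{
    const char *p = buf;

    *sent = 0;
    while (*sent < size) {
        ssize_t len = write(fd, p + *sent, size - *sent);

        if (len > 0) {
            *sent += (size_t)len;
            continue;
        }
        if (len < 0 && errno == EINTR) {
            continue;
        }
        if (len < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            /* Emulate blocking: wait until the fd is writable again. */
            struct pollfd pfd = { .fd = fd, .events = POLLOUT };

            if (poll(&pfd, 1, -1) < 0 && errno != EINTR) {
                return -errno;
            }
            continue;
        }
        return len < 0 ? -errno : -EIO;
    }
    return 0;
}
```

Returning the error together with `*sent` keeps the "flag the error immediately" behaviour the patch wants, without hiding how much of the stream the destination may already have received.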

* Re: [Qemu-devel] [PATCH v4 12/47] Handle bi-directional communication for fd migration
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 12/47] Handle bi-directional communication for fd migration Dr. David Alan Gilbert (git)
@ 2014-11-03  3:12   ` David Gibson
  2014-11-03 13:53     ` Cristian Klein
  0 siblings, 1 reply; 204+ messages in thread
From: David Gibson @ 2014-11-03  3:12 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On Fri, Oct 03, 2014 at 06:47:18PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: Cristian Klein <cristian.klein@cs.umu.se>

This patch really, really requires a rationale in the commit message.
The reason it's necessary is certainly not obvious.

> 
> Signed-off-by: Cristian Klein <cristian.klein@cs.umu.se>
> ---
>  migration-fd.c | 24 ++++++++++++++++++++++--
>  1 file changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/migration-fd.c b/migration-fd.c
> index d2e523a..129da99 100644
> --- a/migration-fd.c
> +++ b/migration-fd.c
> @@ -31,13 +31,29 @@
>      do { } while (0)
>  #endif
>  
> +static bool fd_is_socket(int fd)
> +{
> +    struct stat stat;
> +    int ret = fstat(fd, &stat);
> +    if (ret == -1) {
> +        /* When in doubt say no */
> +        return false;
> +    }
> +    return S_ISSOCK(stat.st_mode);
> +}
> +
>  void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp)
>  {
>      int fd = monitor_get_fd(cur_mon, fdname, errp);
>      if (fd == -1) {
>          return;
>      }
> -    s->file = qemu_fdopen(fd, "wb");
> +
> +    if (fd_is_socket(fd)) {
> +        s->file = qemu_fopen_socket(fd, "wb");
> +    } else {
> +        s->file = qemu_fdopen(fd, "wb");
> +    }
>  
>      migrate_fd_connect(s);
>  }
> @@ -58,7 +74,11 @@ void fd_start_incoming_migration(const char *infd, Error **errp)
>      DPRINTF("Attempting to start an incoming migration via fd\n");
>  
>      fd = strtol(infd, NULL, 0);
> -    f = qemu_fdopen(fd, "rb");
> +    if (fd_is_socket(fd)) {
> +        f = qemu_fopen_socket(fd, "rb");
> +    } else {
> +        f = qemu_fdopen(fd, "rb");
> +    }
>      if(f == NULL) {
>          error_setg_errno(errp, errno, "failed to open the source descriptor");
>          return;


* Re: [Qemu-devel] [PATCH v4 13/47] Migration commands
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 13/47] Migration commands Dr. David Alan Gilbert (git)
@ 2014-11-03  3:14   ` David Gibson
  0 siblings, 0 replies; 204+ messages in thread
From: David Gibson @ 2014-11-03  3:14 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On Fri, Oct 03, 2014 at 06:47:19PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Create QEMU_VM_COMMAND section type for sending commands from
> source to destination.  These commands are not intended to convey
> guest state but to control the migration process.
> 
> For use in postcopy.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

* Re: [Qemu-devel] [PATCH v4 14/47] Return path: Control commands
  2014-10-23 20:15       ` Paolo Bonzini
@ 2014-11-03  3:20         ` David Gibson
  2014-11-04 18:58         ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 204+ messages in thread
From: David Gibson @ 2014-11-03  3:20 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein,
	Dr. David Alan Gilbert, qemu-devel, amit.shah, yanghy

On Thu, Oct 23, 2014 at 10:15:20PM +0200, Paolo Bonzini wrote:
> 
> 
> On 10/23/2014 06:23 PM, Dr. David Alan Gilbert wrote:
> > * Paolo Bonzini (pbonzini@redhat.com) wrote:
> >> Il 03/10/2014 19:47, Dr. David Alan Gilbert (git) ha scritto:
> >>>      QEMU_VM_CMD_INVALID = 0,   /* Must be 0 */
> >>> +    QEMU_VM_CMD_OPENRP,        /* Tell the dest to open the Return path */
> >>
> >> OPEN_RETURN_PATH?
> >>
> >>> +    QEMU_VM_CMD_REQACK,        /* Request an ACK on the RP */
> >>
> >> SEND_ACK or ACK_REQUESTED?
> >>
> >>>      QEMU_VM_CMD_AFTERLASTVALID
> >>
> >> Pleaseseparatewords.  Is this enum actually used at all?
> >>
> >> Please avoid the difference between QEMU_VM_CMD and MIG_RPCOMM_.
> >>
> >> Perhaps MIG_CMD and MIG_RPCMD_?
> > 
> > Almost, I went with:
> > 
> >     MIG_CMD_INVALID = 0,       /* Must be 0 */
> >     MIG_CMD_OPEN_RETURN_PATH,  /* Tell the dest to open the Return path */
> >     MIG_CMD_SEND_ACK,          /* Request an ACK on the RP */
> >     MIG_CMD_PACKAGED,          /* Send a wrapped stream within this stream */
> > 
> >     MIG_CMD_POSTCOPY_ADVISE = 20,  /* Prior to any page transfers, just
> >                                       warn we might want to do PC */
> >     MIG_CMD_POSTCOPY_LISTEN,       /* Start listening for incoming
> >                                       pages as it's running. */
> >     MIG_CMD_POSTCOPY_RUN,          /* Start execution */
> >     MIG_CMD_POSTCOPY_END,          /* Postcopy is finished. */
> > 
> >     MIG_CMD_POSTCOPY_RAM_DISCARD,  /* A list of pages to discard that
> >                                       were previously sent during
> >                                       precopy but are dirty. */
> > 
> > and
> >     MIG_RP_CMD_INVALID = 0,  /* Must be 0 */
> >     MIG_RP_CMD_SHUT,         /* sibling will not send any more RP messages */
> >     MIG_RP_CMD_ACK,          /* data (seq: be32 ) */
> >     MIG_RP_CMD_REQ_PAGES,    /* data (start: be64, len: be64) */
> > 
> > the only oddity I get from that is from the 'SEND_ACK' you suggested;
> > since all my functions to send commands are send_  I currently have
> >  'qemu_savevm_send_send_ack'  which while consistent looks a bit odd.
> 
> Perhaps ping/pong?

I like that idea.  Calling it "send_ack" looks like it's just asking
for confusing names somewhere.
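
A sketch of how the ping/pong naming might look, reusing the numbering style of the enums quoted above. MIG_CMD_PING, MIG_RP_CMD_PONG and handle_ping() are hypothetical, and echoing a be32 value is an assumption modelled on the existing ACK message's `data (seq: be32)`. With this naming the sender could be called something like qemu_savevm_send_ping() (also hypothetical), which avoids the doubled `send_send_ack`.

```c
#include <stdint.h>

enum mig_cmd {
    MIG_CMD_INVALID = 0,        /* Must be 0 */
    MIG_CMD_OPEN_RETURN_PATH,   /* Tell the dest to open the return path */
    MIG_CMD_PING,               /* Ask the dest to answer with a PONG */
    MIG_CMD_PACKAGED,           /* Send a wrapped stream within this stream */
};

enum mig_rp_cmd {
    MIG_RP_CMD_INVALID = 0,     /* Must be 0 */
    MIG_RP_CMD_SHUT,            /* Sibling will not send any more RP messages */
    MIG_RP_CMD_PONG,            /* data (echoed ping value: be32) */
    MIG_RP_CMD_REQ_PAGES,       /* data (start: be64, len: be64) */
};

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | p[3];
}

/* Destination side: answer a PING by echoing its value in a PONG. */
static enum mig_rp_cmd handle_ping(const uint8_t ping[4], uint8_t pong[4])
{
    put_be32(pong, get_be32(ping));
    return MIG_RP_CMD_PONG;
}
```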

* Re: [Qemu-devel] [PATCH v4 15/47] Return path: Send responses from destination to source
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 15/47] Return path: Send responses from destination to source Dr. David Alan Gilbert (git)
@ 2014-11-03  3:22   ` David Gibson
  0 siblings, 0 replies; 204+ messages in thread
From: David Gibson @ 2014-11-03  3:22 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On Fri, Oct 03, 2014 at 06:47:21PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Add migrate_send_rp_message to send a message from destination to source along the return path.
>   (It uses a mutex to let it be called from multiple threads)
> Add migrate_send_rp_shut to send a 'shut' message to indicate
>   the destination is finished with the RP.
> Add migrate_send_rp_ack to send an 'ack' message
>   Use it in the CMD_REQACK handler
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
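
A minimal sketch of the mutex-protected sending described in that commit message, writing into a memory buffer instead of a QEMUFile. The message types are the MIG_RP_CMD_* values listed earlier in this thread. The framing (be16 type, be16 length, payload), the RPWriter type and the function names are assumptions for illustration.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return-path message types, as listed earlier in the thread. */
enum mig_rp_cmd {
    MIG_RP_CMD_INVALID = 0,  /* Must be 0 */
    MIG_RP_CMD_SHUT,         /* sibling will not send any more RP messages */
    MIG_RP_CMD_ACK,          /* data (seq: be32) */
    MIG_RP_CMD_REQ_PAGES,    /* data (start: be64, len: be64) */
};

/* Hypothetical sink standing in for the return-path QEMUFile. */
typedef struct RPWriter {
    pthread_mutex_t lock;
    uint8_t buf[1024];
    size_t used;
} RPWriter;

static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
}

/*
 * Holding the lock across the whole message keeps one sender's header
 * and payload contiguous when several threads send at once.
 * Returns 0 on success, -1 if the message does not fit.
 */
static int rp_send_message(RPWriter *w, enum mig_rp_cmd type,
                           const void *data, uint16_t len)
{
    int ret = -1;

    pthread_mutex_lock(&w->lock);
    if (w->used + 4 + (size_t)len <= sizeof(w->buf)) {
        put_be16(w->buf + w->used, (uint16_t)type);
        put_be16(w->buf + w->used + 2, len);
        if (len) {
            memcpy(w->buf + w->used + 4, data, len);
        }
        w->used += 4 + (size_t)len;
        ret = 0;
    }
    pthread_mutex_unlock(&w->lock);
    return ret;
}

/* A 'shut' message carries no payload. */
static int rp_send_shut(RPWriter *w)
{
    return rp_send_message(w, MIG_RP_CMD_SHUT, NULL, 0);
}
```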

* Re: [Qemu-devel] [PATCH v4 16/47] Return path: Source handling of return path
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 16/47] Return path: Source handling of return path Dr. David Alan Gilbert (git)
  2014-10-04 18:14   ` Paolo Bonzini
  2014-10-16  8:26   ` zhanghailiang
@ 2014-11-03  3:46   ` David Gibson
  2014-11-03 13:22     ` Dr. David Alan Gilbert
  2 siblings, 1 reply; 204+ messages in thread
From: David Gibson @ 2014-11-03  3:46 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On Fri, Oct 03, 2014 at 06:47:22PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Open a return path, and handle messages that are received upon it.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

[snip]
> @@ -414,6 +448,11 @@ static void migrate_fd_cancel(MigrationState *s)
>      int old_state ;
>      trace_migrate_fd_cancel();
>  
> +    if (s->return_path) {
> +        /* shutdown the rp socket, so causing the rp thread to shutdown */
> +        qemu_file_shutdown(s->return_path);

Terminating the rp thread via shutting down its file seems roundabout,
and kind of dependent on the socket file implementation.

[snip]
> +__attribute__ (( unused )) /* Until later in patch series */
> +static int open_outgoing_return_path(MigrationState *ms)
> +{
> +
> +    ms->return_path = qemu_file_get_return_path(ms->file);

So, another reason this get_return_path abstraction doesn't seem right
to me, is that it's not obvious that for non-socket file types, the
source and destination side "get return path" operations would
necessarily be the same.

* Re: [Qemu-devel] [PATCH v4 16/47] Return path: Source handling of return path
  2014-10-16  8:26   ` zhanghailiang
  2014-10-16  8:35     ` Dr. David Alan Gilbert
@ 2014-11-03  3:47     ` David Gibson
  2014-11-25 15:44       ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 204+ messages in thread
From: David Gibson @ 2014-11-03  3:47 UTC (permalink / raw)
  To: zhanghailiang
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein,
	Dr. David Alan Gilbert (git),
	qemu-devel, amit.shah, yanghy

On Thu, Oct 16, 2014 at 04:26:55PM +0800, zhanghailiang wrote:
> On 2014/10/4 1:47, Dr. David Alan Gilbert (git) wrote:
> >From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
[snip]

> >+        case MIG_RPCOMM_ACK:
> >+            tmp32 = be32_to_cpup((uint32_t *)buf);
> >+            DPRINTF("RP: Received ACK 0x%x", tmp32);
> >+            atomic_xchg(&ms->rp_state.latest_ack, tmp32);
> 
> I didn't see *ms->rp_state.latest_ack* being used elsewhere; what's it used for? ;)

Also, you don't appear to use tmp32 after that point, so what's the
reason for the exchange, rather than just an assignment?
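
The distinction, sketched with C11 atomics rather than QEMU's own atomic helpers: a store is enough when nobody reads the old value, and an exchange only earns its cost when the caller uses the value it returns. RPState and both functions are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

typedef struct RPState {
    _Atomic uint32_t latest_ack;
} RPState;

/* The previous value is not needed, so a plain atomic store suffices. */
static void rp_record_ack(RPState *rp, uint32_t ack)
{
    atomic_store(&rp->latest_ack, ack);
}

/* An exchange is only useful when the caller consumes the old value. */
static uint32_t rp_swap_ack(RPState *rp, uint32_t ack)
{
    return atomic_exchange(&rp->latest_ack, ack);
}
```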

* Re: [Qemu-devel] [PATCH v4 17/47] qemu_loadvm errors and debug
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 17/47] qemu_loadvm errors and debug Dr. David Alan Gilbert (git)
@ 2014-11-03  3:49   ` David Gibson
  0 siblings, 0 replies; 204+ messages in thread
From: David Gibson @ 2014-11-03  3:49 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On Fri, Oct 03, 2014 at 06:47:23PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Flip many fprintf's to error_report
> Add lots of DPRINTF debug in qemu_loadvm*
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

* Re: [Qemu-devel] [PATCH v4 18/47] ram_debug_dump_bitmap: Dump a migration bitmap as text
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 18/47] ram_debug_dump_bitmap: Dump a migration bitmap as text Dr. David Alan Gilbert (git)
@ 2014-11-03  3:58   ` David Gibson
  2014-11-19 17:35     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 204+ messages in thread
From: David Gibson @ 2014-11-03  3:58 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On Fri, Oct 03, 2014 at 06:47:24PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Misses out lines that are all the expected value so the output
> can be quite compact depending on the circumstance.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  arch_init.c                   | 39 +++++++++++++++++++++++++++++++++++++++
>  include/migration/migration.h |  1 +
>  2 files changed, 40 insertions(+)
> 
> diff --git a/arch_init.c b/arch_init.c
> index 772de36..6970733 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -769,6 +769,45 @@ static void reset_ram_globals(void)
>  
>  #define MAX_WAIT 50 /* ms, half buffered_file limit */
>  
> +/*
> + * 'expected' is the value you expect the bitmap mostly to be full
> + * of and it won't bother printing lines that are all this value
> + * if 'todump' is null the migration bitmap is dumped.
> + */
> +void ram_debug_dump_bitmap(unsigned long *todump, bool expected)
> +{
> +    int64_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
> +
> +    int64_t cur;
> +    int64_t linelen = 128l;

I don't think there's any point to the 'l' there.  "long" isn't
necessarily correct for an int64_t, and normal type promotion should
get this right anyway.

Assuming the user has a >128 character wide terminal seems a little
obnoxious, too.

> +    char linebuf[129];
> +
> +    if (!todump) {
> +        todump = migration_bitmap;
> +    }
> +
> +    for (cur = 0; cur < ram_pages; cur += linelen) {
> +        int64_t curb;
> +        bool found = false;
> +        /*
> +         * Last line; catch the case where the line length
> +         * is longer than remaining ram
> +         */
> +        if (cur+linelen > ram_pages) {
> +            linelen = ram_pages - cur;
> +        }
> +        for (curb = 0; curb < linelen; curb++) {
> +            bool thisbit = test_bit(cur+curb, todump);
> +            linebuf[curb] = thisbit ? '1' : '.';
> +            found |= (thisbit ^ expected);

I guess this will have the right result with the obvious encoding of a
bool, but I don't think it's conceptually correct.  It should be
logical, not bitwise operations so:
	found = found || (thisbit != expected);
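
Putting the review points together, here is a standalone sketch of the dump loop. The line length is an unsuffixed constant narrow enough for an 80-column terminal with an offset prefix, and the accumulation uses logical rather than bitwise operators. test_bit_ul() is a hypothetical stand-in for QEMU's test_bit(). DUMP_LINE_LEN = 64 is an arbitrary choice. The function returns the number of lines printed so the result can be checked.

```c
#include <inttypes.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DUMP_LINE_LEN 64   /* "0x%08x : " prefix + 64 chars < 80 columns */

static bool test_bit_ul(int64_t nr, const unsigned long *map)
{
    return (map[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL;
}

/*
 * Print the bitmap one line at a time, skipping lines in which every
 * bit equals 'expected'. Returns the number of lines printed.
 */
static int dump_bitmap(FILE *out, const unsigned long *map, int64_t nbits,
                       bool expected)
{
    char linebuf[DUMP_LINE_LEN + 1];
    int printed = 0;

    for (int64_t cur = 0; cur < nbits; cur += DUMP_LINE_LEN) {
        int64_t linelen = DUMP_LINE_LEN;
        bool found = false;

        if (cur + linelen > nbits) {
            linelen = nbits - cur;   /* last, partial line */
        }
        for (int64_t curb = 0; curb < linelen; curb++) {
            bool thisbit = test_bit_ul(cur + curb, map);

            linebuf[curb] = thisbit ? '1' : '.';
            found = found || (thisbit != expected);
        }
        if (found) {
            linebuf[linelen] = '\0';
            fprintf(out, "0x%08" PRIx64 " : %s\n", (uint64_t)cur, linebuf);
            printed++;
        }
    }
    return printed;
}
```

Keeping `linelen` local to each iteration also avoids mutating a loop-wide variable for the final partial line.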

* Re: [Qemu-devel] [PATCH v4 19/47] Rework loadvm path for subloops
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 19/47] Rework loadvm path for subloops Dr. David Alan Gilbert (git)
  2014-10-04 16:46   ` Paolo Bonzini
@ 2014-11-03  5:08   ` David Gibson
  2014-11-19 17:50     ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 204+ messages in thread
From: David Gibson @ 2014-11-03  5:08 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On Fri, Oct 03, 2014 at 06:47:25PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Postcopy needs to have two migration streams loading concurrently;
> one from memory (with the device state) and the other from the fd
> with the memory transactions.
> 
> Split the core of qemu_loadvm_state out so we can use it for both.
> 
> Allow the inner loadvm loop to quit and signal whether the parent
> should.
> 
> loadvm_handlers is made static since its lifetime is greater
> than the outer qemu_loadvm_state.

Maybe it's just me, but to me "made static" indicates either a change
from fully global to module global, or from (function-)local automatic
to local static, not a change from function-local automatic to
module global as here.

It's also not clear from this patch alone why the lifetime of
loadvm_handlers now needs to exceed that of qemu_loadvm_state().
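
To make the terminology concrete, a small sketch contrasting the three storage choices mentioned above; the counters are hypothetical stand-ins for the handler list:

```c
#include <assert.h>

/* Module global: internal linkage, lives for the whole program run.
 * This is the form loadvm_handlers takes in the patch. */
static int module_counter;

static int bump_module(void)
{
    return ++module_counter;
}

/* Function-local static: program lifetime, but visible only here. */
static int bump_local_static(void)
{
    static int counter;
    return ++counter;
}

/* Function-local automatic: recreated on every call. */
static int bump_automatic(void)
{
    int counter = 0;
    return ++counter;
}
```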

* Re: [Qemu-devel] [PATCH v4 21/47] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages.
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 21/47] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages Dr. David Alan Gilbert (git)
@ 2014-11-03  5:51   ` David Gibson
  2014-12-17 14:50     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 204+ messages in thread
From: David Gibson @ 2014-11-03  5:51 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

[-- Attachment #1: Type: text/plain, Size: 14573 bytes --]

On Fri, Oct 03, 2014 at 06:47:27PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Add state variable showing current incoming postcopy state.

This appears to implement a lot more than just adding a state variable...

> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  include/migration/migration.h |   8 +
>  include/sysemu/sysemu.h       |  20 +++
>  savevm.c                      | 335 ++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 363 insertions(+)
> 
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index 0d9f62d..2c078c4 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -61,6 +61,14 @@ typedef struct MigrationState MigrationState;
>  struct MigrationIncomingState {
>      QEMUFile *file;
>  
> +    volatile enum {

What's the reason for the volatile?  I think that really needs a comment.

> +        POSTCOPY_RAM_INCOMING_NONE = 0,  /* Initial state - no postcopy */
> +        POSTCOPY_RAM_INCOMING_ADVISE,
> +        POSTCOPY_RAM_INCOMING_LISTENING,
> +        POSTCOPY_RAM_INCOMING_RUNNING,
> +        POSTCOPY_RAM_INCOMING_END
> +    } postcopy_ram_state;
> +
>      QEMUFile *return_path;
>      QemuMutex      rp_mutex;    /* We send replies from multiple threads */
>  };
> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> index ad96f2a..102dd93 100644
> --- a/include/sysemu/sysemu.h
> +++ b/include/sysemu/sysemu.h
> @@ -88,6 +88,16 @@ enum qemu_vm_cmd {
>      QEMU_VM_CMD_OPENRP,        /* Tell the dest to open the Return path */
>      QEMU_VM_CMD_REQACK,        /* Request an ACK on the RP */
>  
> +    QEMU_VM_CMD_POSTCOPY_RAM_ADVISE = 20,  /* Prior to any page transfers, just
> +                                              warn we might want to do postcopy */
> +    QEMU_VM_CMD_POSTCOPY_RAM_DISCARD,      /* A list of pages to discard that
> +                                              were previously sent during
> +                                              precopy but are dirty. */
> +    QEMU_VM_CMD_POSTCOPY_RAM_LISTEN,       /* Start listening for incoming
> +                                              pages as it's running. */
> +    QEMU_VM_CMD_POSTCOPY_RAM_RUN,          /* Start execution */
> +    QEMU_VM_CMD_POSTCOPY_RAM_END,          /* Postcopy is finished. */
> +
>      QEMU_VM_CMD_AFTERLASTVALID
>  };
>  
> @@ -102,6 +112,16 @@ void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
>                                uint16_t len, uint8_t *data);
>  void qemu_savevm_send_reqack(QEMUFile *f, uint32_t value);
>  void qemu_savevm_send_openrp(QEMUFile *f);
> +void qemu_savevm_send_postcopy_ram_advise(QEMUFile *f);
> +void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
> +                                           uint16_t len, uint8_t offset,
> +                                           uint64_t *addrlist,
> +                                           uint32_t *masklist);
> +
> +void qemu_savevm_send_postcopy_ram_listen(QEMUFile *f);
> +void qemu_savevm_send_postcopy_ram_run(QEMUFile *f);
> +void qemu_savevm_send_postcopy_ram_end(QEMUFile *f, uint8_t status);
> +
>  int qemu_loadvm_state(QEMUFile *f);
>  
>  /* SLIRP */
> diff --git a/savevm.c b/savevm.c
> index 7236232..b942e8c 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -39,6 +39,7 @@
>  #include "exec/memory.h"
>  #include "qmp-commands.h"
>  #include "trace.h"
> +#include "qemu/bitops.h"
>  #include "qemu/iov.h"
>  #include "block/snapshot.h"
>  #include "block/qapi.h"
> @@ -624,6 +625,92 @@ void qemu_savevm_send_openrp(QEMUFile *f)
>  {
>      qemu_savevm_command_send(f, QEMU_VM_CMD_OPENRP, 0, NULL);
>  }
> +
> +/* Send prior to any RAM transfer */
> +void qemu_savevm_send_postcopy_ram_advise(QEMUFile *f)
> +{
> +    DPRINTF("send postcopy-ram-advise");
> +    uint64_t tmp[2];
> +    tmp[0] = cpu_to_be64(sysconf(_SC_PAGESIZE));
> +    tmp[1] = cpu_to_be64(1ul << qemu_target_page_bits());
> +
> +    qemu_savevm_command_send(f, QEMU_VM_CMD_POSTCOPY_RAM_ADVISE, 16,
> +                             (uint8_t *)tmp);
> +}
> +
> +/* Prior to running, to cause pages that have been dirtied after precopy
> + * started to be discarded on the destination.
> + * CMD_POSTCOPY_RAM_DISCARD consists of:
> + *  3 byte header (filled in by qemu_savevm_send_postcopy_ram_discard)
> + *      byte   version (0)
> + *      byte   offset into the 1st data word containing 1st page of RAMBlock

I'm not able to follow what that description means.
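
For reference, a hedged sketch of the byte layout as built by `qemu_savevm_send_postcopy_ram_discard()` below: a 3-byte header (version, first-bit offset, name length), the unterminated RAMBlock name, then `len` entries of a big-endian 64-bit address followed by a big-endian 32-bit mask. `put_be64()`, `put_be32()` and `discard_msg_encode()` are hypothetical helpers, not QEMU functions:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical big-endian store helpers. */
static void put_be32(uint8_t *p, uint32_t v)
{
    for (int i = 3; i >= 0; i--) {
        p[i] = v & 0xff;
        v >>= 8;
    }
}

static void put_be64(uint8_t *p, uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        p[i] = v & 0xff;
        v >>= 8;
    }
}

/* Returns a malloc'd message and its size in *outlen, or NULL on error. */
static uint8_t *discard_msg_encode(const char *name, uint8_t offset,
                                   uint16_t len, const uint64_t *addrlist,
                                   const uint32_t *masklist, size_t *outlen)
{
    size_t namelen = strlen(name);
    if (namelen > 255) {
        return NULL;                    /* must fit the 1-byte length field */
    }
    size_t total = 3 + namelen + (size_t)len * 12;
    uint8_t *buf = malloc(total);
    if (!buf) {
        return NULL;
    }
    buf[0] = 0;                         /* version */
    buf[1] = offset;                    /* bit of word 0 holding the block's 1st page */
    buf[2] = (uint8_t)namelen;          /* name is NOT NUL-terminated */
    memcpy(buf + 3, name, namelen);

    uint8_t *p = buf + 3 + namelen;
    for (uint16_t t = 0; t < len; t++) {
        put_be64(p, addrlist[t]);       /* start of a 32-page window */
        put_be32(p + 8, masklist[t]);   /* '1' bits mark pages to discard */
        p += 12;
    }
    *outlen = total;
    return buf;
}
```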

> + *      byte   Length of name field
> + *  n x byte   RAM block name (NOT 0 terminated)
> + *  n x
> + *      be64   Page addresses for start of an invalidation range
> + *      be32   mask of 32 pages, '1' to discard
> + *
> + *  Hopefully this is pretty sparse so we don't get too many entries,
> + *  and using the mask should deal with most pagesize differences
> + *  just ending up as a single full mask
> + *
> + * The mask is always 32bits irrespective of the long size
> + *
> + *  name:  RAMBlock name that these entries are part of
> + *  len: Number of page entries
> + *  addrlist: 'len' addresses
> + *  masklist: 'len' masks (corresponding to the addresses)
> + */
> +void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
> +                                           uint16_t len, uint8_t offset,
> +                                           uint64_t *addrlist,
> +                                           uint32_t *masklist)
> +{
> +    uint8_t *buf;
> +    uint16_t tmplen;
> +    uint16_t t;
> +
> +    DPRINTF("send postcopy-ram-discard");
> +    buf = g_malloc0(len*12 + strlen(name) + 3);
> +    buf[0] = 0; /* Version */
> +    buf[1] = offset;
> +    assert(strlen(name) < 256);
> +    buf[2] = strlen(name);
> +    memcpy(buf+3, name, strlen(name));
> +    tmplen = 3+strlen(name);
> +
> +    for (t = 0; t < len; t++) {
> +        cpu_to_be64w((uint64_t *)(buf + tmplen), addrlist[t]);
> +        tmplen += 8;
> +        cpu_to_be32w((uint32_t *)(buf + tmplen), masklist[t]);
> +        tmplen += 4;
> +    }
> +    qemu_savevm_command_send(f, QEMU_VM_CMD_POSTCOPY_RAM_DISCARD,
> +                             tmplen, buf);
> +    g_free(buf);
> +}
> +
> +/* Get the destination into a state where it can receive page data. */
> +void qemu_savevm_send_postcopy_ram_listen(QEMUFile *f)
> +{
> +    DPRINTF("send postcopy-ram-listen");
> +    qemu_savevm_command_send(f, QEMU_VM_CMD_POSTCOPY_RAM_LISTEN, 0, NULL);
> +}
> +
> +/* Kick the destination into running */
> +void qemu_savevm_send_postcopy_ram_run(QEMUFile *f)
> +{
> +    DPRINTF("send postcopy-ram-run");
> +    qemu_savevm_command_send(f, QEMU_VM_CMD_POSTCOPY_RAM_RUN, 0, NULL);
> +}
> +
> +/* End of postcopy - with a status byte; 0 is good, anything else is a fail */
> +void qemu_savevm_send_postcopy_ram_end(QEMUFile *f, uint8_t status)
> +{
> +    DPRINTF("send postcopy-ram-end");
> +    qemu_savevm_command_send(f, QEMU_VM_CMD_POSTCOPY_RAM_END, 1, &status);
> +}
> +
>  bool qemu_savevm_state_blocked(Error **errp)
>  {
>      SaveStateEntry *se;
> @@ -935,6 +1022,220 @@ static LoadStateEntry_Head loadvm_handlers =
>  static int qemu_loadvm_state_main(QEMUFile *f,
>                                    LoadStateEntry_Head *loadvm_handlers);
>  
> +/* ------ incoming postcopy-ram messages ------ */
> +/* 'advise' arrives before any RAM transfers just to tell us that a postcopy
> + * *might* happen - it might be skipped if precopy transferred everything
> + * quickly.
> + */
> +static int loadvm_postcopy_ram_handle_advise(MigrationIncomingState *mis,
> +                                             uint64_t remote_hps,
> +                                             uint64_t remote_tps)
> +{
> +    DPRINTF("%s", __func__);
> +    if (mis->postcopy_ram_state != POSTCOPY_RAM_INCOMING_NONE) {
> +        error_report("CMD_POSTCOPY_RAM_ADVISE in wrong postcopy state (%d)",
> +                     mis->postcopy_ram_state);
> +        return -1;
> +    }
> +
> +    if (remote_hps != sysconf(_SC_PAGESIZE))  {
> +        /*
> +         * Some combinations of mismatch are probably possible but it gets
> +         * a bit more complicated.  In particular we need to place whole
> +         * host pages on the dest at once, and we need to ensure that we
> +         * handle dirtying to make sure we never end up sending part of
> +         * a host page on its own.
> +         */
> +        error_report("Postcopy needs matching host page sizes (s=%d d=%d)",
> +                     (int)remote_hps, (int)sysconf(_SC_PAGESIZE));
> +        return -1;
> +    }
> +
> +    if (remote_tps != (1ul << qemu_target_page_bits())) {
> +        /*
> +         * Again, some differences could be dealt with, but for now keep it
> +         * simple.
> +         */
> +        error_report("Postcopy needs matching target page sizes (s=%d d=%d)",
> +                     (int)remote_tps, 1 << qemu_target_page_bits());
> +        return -1;
> +    }
> +
> +    mis->postcopy_ram_state = POSTCOPY_RAM_INCOMING_ADVISE;
> +
> +    /*
> +     * Postcopy will be sending lots of small messages along the return path
> +     * that it needs quick answers to.
> +     */
> +    socket_set_nodelay(qemu_get_fd(mis->return_path));

So, here you break the QEMUFile abstraction and assume you have a
socket.

> +    return 0;
> +}
> +
> +/* After postcopy we will be told to throw some pages away since they're
> + * dirty and will have to be demand fetched.  Must happen before CPU is
> + * started.
> + * There can be 0..many of these messages, each encoding multiple pages.
> + * Bits set in the message represent a page in the source VMs bitmap, but
> + * since the guest/target page sizes can be different on s/d then we have
> + * to convert.
> + */
> +static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
> +                                              uint16_t len)
> +{
> +    int tmp;
> +    unsigned int first_bit_offset;
> +    char ramid[256];
> +
> +    DPRINTF("%s", __func__);
> +
> +    if (mis->postcopy_ram_state != POSTCOPY_RAM_INCOMING_ADVISE) {
> +        error_report("CMD_POSTCOPY_RAM_DISCARD in wrong postcopy state (%d)",
> +                     mis->postcopy_ram_state);
> +        return -1;
> +    }
> +    /* We're expecting a
> +     *    3 byte header,
> +     *    a RAM ID string
> +     *    then at least one 12-byte chunk
> +    */
> +    if (len < 16) {
> +        error_report("CMD_POSTCOPY_RAM_DISCARD invalid length (%d)", len);
> +        return -1;
> +    }
> +
> +    tmp = qemu_get_byte(mis->file);
> +    if (tmp != 0) {
> +        error_report("CMD_POSTCOPY_RAM_DISCARD invalid version (%d)", tmp);
> +        return -1;
> +    }
> +    first_bit_offset = qemu_get_byte(mis->file);
> +
> +    if (qemu_get_counted_string(mis->file, (uint8_t *)ramid)) {
> +        error_report("CMD_POSTCOPY_RAM_DISCARD Failed to read RAMBlock ID");
> +        return -1;
> +    }
> +
> +    len -= 3+strlen(ramid);
> +    if (len % 12) {
> +        error_report("CMD_POSTCOPY_RAM_DISCARD invalid length (%d)", len);
> +        return -1;
> +    }
> +    while (len) {
> +        uint64_t startaddr;
> +        uint32_t mask;
> +        /*
> +         * We now have pairs of address, mask
> +         *   The mask is 32 bits of bitmask starting at 'startaddr'-offset
> +         *   RAMBlock; e.g. if the RAMBlock started at 8k where TPS=4k
> +         *   then first_bit_offset=2 and the 1st 2 bits of the mask
> +         *   aren't relevant to this RAMBlock, and bit 2 corresponds
> +         *   to the 1st page of this RAMBlock

Um... yeah... I can't make much sense of this comment either.
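
Reading the patch's own arithmetic (`startaddr + firstset - first_bit_offset`), `startaddr` appears to be a page index, and bit i of each mask refers to page `startaddr + i - first_bit_offset` of the RAMBlock; the offset exists because the source's bitmap words need not line up with the start of the block. A hedged sketch of the run-extraction loop under that reading; `decode_discard_mask()` and `struct page_range` are hypothetical, and GCC/Clang's `__builtin_ctz()` stands in for QEMU's `ctz32()`/`cto32()`:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical output type: an inclusive range of page indices. */
struct page_range {
    uint64_t first;
    uint64_t last;
};

/* Count trailing ones (stand-in for cto32); 32 if all bits are set. */
static int cto32_sketch(uint32_t v)
{
    return v == UINT32_MAX ? 32 : __builtin_ctz(~v);
}

/*
 * Decode one (startaddr, mask) entry into runs of pages to discard,
 * relative to the start of the RAMBlock.  Returns the number of ranges
 * written to 'out' (at most 16 for a 32-bit mask), or -1 if a bit is set
 * before the block starts (the patch checks this only when startaddr == 0).
 */
static int decode_discard_mask(uint64_t startaddr, uint32_t mask,
                               unsigned first_bit_offset,
                               struct page_range out[16])
{
    int n = 0;

    while (mask) {
        int firstset = __builtin_ctz(mask);
        /* Set the bits below firstset so that cto finds the end of the run. */
        uint32_t filled = mask | ((UINT32_C(1) << firstset) - 1);
        int firstzero = cto32_sketch(filled);

        if (startaddr == 0 && (unsigned)firstset < first_bit_offset) {
            return -1;
        }
        out[n].first = startaddr + firstset - first_bit_offset;
        out[n].last = startaddr + (firstzero - 1) - first_bit_offset;
        n++;

        /* Clear the run just decoded; a run ending at bit 31 empties the mask. */
        mask = (firstzero == 32) ? 0 : mask & (UINT32_MAX << firstzero);
    }
    return n;
}
```

With the comment's own example (block starting at 8k with 4k target pages, so `first_bit_offset = 2`), a mask of `0x74` in the first entry decodes to pages 0..0 and 2..4 of the block.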

> +         */
> +        startaddr = qemu_get_be64(mis->file);
> +        mask = qemu_get_be32(mis->file);
> +
> +        len -= 12;
> +
> +        while (mask) {
> +            /* mask= .....?10...0 */
> +            /*             ^fs    */
> +            int firstset = ctz32(mask);
> +
> +            /* tmp32=.....?11...1 */
> +            /*             ^fs    */
> +            uint32_t tmp32 = mask | ((((uint32_t)1)<<firstset)-1);
> +
> +            /* mask= .?01..10...0 */
> +            /*         ^fz ^fs    */
> +            int firstzero = cto32(tmp32);
> +
> +            if ((startaddr == 0) && (firstset < first_bit_offset)) {
> +                error_report("CMD_POSTCOPY_RAM_DISCARD bad data; bit set"
> +                               " prior to block; block=%s offset=%d"
> +                               " firstset=%d", ramid, first_bit_offset,
> +                               firstset);
> +                return -1;
> +            }
> +
> +            /*
> +             * we know there must be at least 1 bit set due to the loop entry
> +             * If there is no 0 firstzero will be 32
> +             */
> +            /* TODO - ram_discard_range gets added in a later patch
> +            int ret = ram_discard_range(mis, ramid,
> +                                startaddr + firstset - first_bit_offset,
> +                                startaddr + (firstzero - 1) - first_bit_offset);
> +            ret = -1;
> +            if (ret) {
> +                return ret;
> +            }
> +            */
> +
> +            /* mask= .?0000000000 */
> +            /*         ^fz ^fs    */
> +            if (firstzero != 32) {
> +                mask &= (((uint32_t)-1) << firstzero);
> +            } else {
> +                mask = 0;
> +            }
> +        }
> +    }
> +    DPRINTF("%s finished", __func__);
> +
> +    return 0;
> +}
> +
> +/* After this message we must be able to immediately receive page data */

The purpose of the listen message from a protocol point of view isn't
really clear to me.  I understand why the destination needs to set up
the postcopy handling before processing the device data, but why does
it need an assertion from the source to start this, rather than it
being just an internal detail of the load path?
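
To make the question concrete, a hedged model of the incoming state sequence this patch introduces (NONE, ADVISE, LISTENING, RUNNING, END), where each command is accepted only in its expected predecessor state. ADVISE requiring NONE and DISCARD requiring ADVISE come from the handlers above; the predecessors for LISTEN, RUN and END are assumptions, since those handlers are not shown here, and all names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

enum pc_state { PC_NONE, PC_ADVISE, PC_LISTENING, PC_RUNNING, PC_END };
enum pc_cmd { CMD_ADVISE, CMD_DISCARD, CMD_LISTEN, CMD_RUN, CMD_END };

/*
 * Apply 'cmd' to '*state'.  Returns false, leaving the state unchanged,
 * if the command arrives in the wrong state.  DISCARD may repeat while
 * in ADVISE and does not change the state.
 */
static bool pc_apply(enum pc_state *state, enum pc_cmd cmd)
{
    static const struct { enum pc_state need, next; } table[] = {
        [CMD_ADVISE]  = { PC_NONE,      PC_ADVISE },
        [CMD_DISCARD] = { PC_ADVISE,    PC_ADVISE },
        [CMD_LISTEN]  = { PC_ADVISE,    PC_LISTENING },  /* assumed */
        [CMD_RUN]     = { PC_LISTENING, PC_RUNNING },    /* assumed */
        [CMD_END]     = { PC_RUNNING,   PC_END },        /* assumed */
    };
    if (*state != table[cmd].need) {
        return false;
    }
    *state = table[cmd].next;
    return true;
}
```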

* Re: [Qemu-devel] [PATCH v4 16/47] Return path: Source handling of return path
  2014-11-03  3:46   ` David Gibson
@ 2014-11-03 13:22     ` Dr. David Alan Gilbert
  2014-11-18  3:52       ` David Gibson
  0 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-11-03 13:22 UTC (permalink / raw)
  To: David Gibson
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Fri, Oct 03, 2014 at 06:47:22PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Open a return path, and handle messages that are received upon it.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> [snip]
> > @@ -414,6 +448,11 @@ static void migrate_fd_cancel(MigrationState *s)
> >      int old_state ;
> >      trace_migrate_fd_cancel();
> >  
> > +    if (s->return_path) {
> > +        /* shutdown the rp socket, so causing the rp thread to shutdown */
> > +        qemu_file_shutdown(s->return_path);
> 
> Terminating the rp thread via shutting down its file seems roundabout,
> and kind of dependent on the socket file implementation.

The rp thread might be in the middle of a blocking read()/recv(),
so I'm doing a shutdown() to cause those to exit; since I have to do
that anyway, it didn't seem necessary to add anything extra.
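
A minimal POSIX sketch of the behaviour relied on here: on Linux, after `shutdown(fd, SHUT_RDWR)`, `recv()` on that socket returns 0 (end-of-file) instead of waiting for data, which is what lets a reader blocked in `recv()` drop out of its loop. A `socketpair()` stands in for the migration connection, and `recv_after_shutdown()` is a hypothetical helper:

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a connected pair, shut one end down and report what recv() on
 * that end returns.  Without the shutdown() this recv() would block
 * forever, because the peer never writes anything.
 */
static ssize_t recv_after_shutdown(void)
{
    int sv[2];
    char c;
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        return -1;
    }
    shutdown(sv[0], SHUT_RDWR);
    n = recv(sv[0], &c, 1, 0);   /* 0 means EOF: nothing will ever arrive */
    close(sv[0]);
    close(sv[1]);
    return n;
}
```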

> [snip]
> > +__attribute__ (( unused )) /* Until later in patch series */
> > +static int open_outgoing_return_path(MigrationState *ms)
> > +{
> > +
> > +    ms->return_path = qemu_file_get_return_path(ms->file);
> 
> So, another reason this get_return_path abstraction doesn't seem right
> to me, is that it's not obvious that for non-socket file types, the
> source and destination side "get return path" operations would
> necessarily be the same.

However, get_return_path is a method of the particular QEMUFile
implementation, and it can differ between a QEMUFile opened for read
and one opened for write, so a non-socket file type could implement
it however it likes (including something like shutdown).

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v4 12/47] Handle bi-directional communication for fd migration
  2014-11-03  3:12   ` David Gibson
@ 2014-11-03 13:53     ` Cristian Klein
  2014-11-18  3:53       ` David Gibson
  0 siblings, 1 reply; 204+ messages in thread
From: Cristian Klein @ 2014-11-03 13:53 UTC (permalink / raw)
  To: David Gibson
  Cc: Andrea Arcangeli, yamahata, lilei, quintela,
	Dr. David Alan Gilbert, qemu-devel, amit.shah, yanghy

On 03 Nov 2014, at 5:12, David Gibson <david@gibson.dropbear.id.au> wrote:

> On Fri, Oct 03, 2014 at 06:47:18PM +0100, Dr. David Alan Gilbert (git) wrote:
>> From: Cristian Klein <cristian.klein@cs.umu.se>
> 
> This patch really, really requires a rationale in the commit message.
> The reason it's necessary is certainly not obvious.

"""
libvirt prefers opening the TCP connection itself, for two reasons. First, connection failures can be detected more easily, without having to parse qemu's error output. Second, libvirt might be asked to secure the transfer by tunnelling the communication through a TLS layer. Therefore, libvirt opens the TCP connection itself and passes an FD to qemu using QMP and a POSIX-specific mechanism. Hence, in order to make the reverse path work in such cases, qemu needs to distinguish whether the transmitted FD is a socket (reverse path available) or not (reverse path might not be available) and use the corresponding abstraction.
"""

If the above message clarifies the purpose of this commit, feel free to add it to the next version of the patch.

Cristian

> 
>> 
>> Signed-off-by: Cristian Klein <cristian.klein@cs.umu.se>
>> ---
>> migration-fd.c | 24 ++++++++++++++++++++++--
>> 1 file changed, 22 insertions(+), 2 deletions(-)
>> 
>> diff --git a/migration-fd.c b/migration-fd.c
>> index d2e523a..129da99 100644
>> --- a/migration-fd.c
>> +++ b/migration-fd.c
>> @@ -31,13 +31,29 @@
>>     do { } while (0)
>> #endif
>> 
>> +static bool fd_is_socket(int fd)
>> +{
>> +    struct stat stat;
>> +    int ret = fstat(fd, &stat);
>> +    if (ret == -1) {
>> +        /* When in doubt say no */
>> +        return false;
>> +    }
>> +    return S_ISSOCK(stat.st_mode);
>> +}
>> +
>> void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp)
>> {
>>     int fd = monitor_get_fd(cur_mon, fdname, errp);
>>     if (fd == -1) {
>>         return;
>>     }
>> -    s->file = qemu_fdopen(fd, "wb");
>> +
>> +    if (fd_is_socket(fd)) {
>> +        s->file = qemu_fopen_socket(fd, "wb");
>> +    } else {
>> +        s->file = qemu_fdopen(fd, "wb");
>> +    }
>> 
>>     migrate_fd_connect(s);
>> }
>> @@ -58,7 +74,11 @@ void fd_start_incoming_migration(const char *infd, Error **errp)
>>     DPRINTF("Attempting to start an incoming migration via fd\n");
>> 
>>     fd = strtol(infd, NULL, 0);
>> -    f = qemu_fdopen(fd, "rb");
>> +    if (fd_is_socket(fd)) {
>> +        f = qemu_fopen_socket(fd, "rb");
>> +    } else {
>> +        f = qemu_fdopen(fd, "rb");
>> +    }
>>     if(f == NULL) {
>>         error_setg_errno(errp, errno, "failed to open the source descriptor");
>>         return;

* Re: [Qemu-devel] [PATCH v4 11/47] Return path: socket_writev_buffer: Block even on non-blocking fd's
  2014-11-03  3:10   ` David Gibson
@ 2014-11-03 18:59     ` Dr. David Alan Gilbert
  2014-11-18  3:54       ` David Gibson
  0 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-11-03 18:59 UTC (permalink / raw)
  To: David Gibson
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Fri, Oct 03, 2014 at 06:47:17PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > The return path uses a non-blocking fd so as not to block waiting
> > for the (possibly broken) destination to finish returning a message,
> > however we still want outbound data to behave in the same way and block.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  qemu-file.c | 39 +++++++++++++++++++++++++++++++++++----
> >  1 file changed, 35 insertions(+), 4 deletions(-)
> > 
> > diff --git a/qemu-file.c b/qemu-file.c
> > index 7393415..57eabd8 100644
> > --- a/qemu-file.c
> > +++ b/qemu-file.c
> > @@ -85,12 +85,43 @@ static ssize_t socket_writev_buffer(void *opaque, struct iovec *iov, int iovcnt,
> >      QEMUFileSocket *s = opaque;
> >      ssize_t len;
> >      ssize_t size = iov_size(iov, iovcnt);
> > +    ssize_t offset = 0;
> > +    int     err;
> >  
> > -    len = iov_send(s->fd, iov, iovcnt, 0, size);
> > -    if (len < size) {
> > -        len = -socket_error();
> > +    while (size > 0) {
> > +        len = iov_send(s->fd, iov, iovcnt, offset, size);
> > +
> > +        if (len > 0) {
> > +            size -= len;
> > +            offset += len;
> > +        }
> > +
> > +        if (size > 0) {
> > +            err = socket_error();
> > +
> > +            if (err != EAGAIN) {
> > +                error_report("socket_writev_buffer: Got err=%d for (%zd/%zd)",
> > +                             err, size, len);
> > +                /*
> > +                 * If I've already sent some but only just got the error, I
> > +                 * could return the amount validly sent so far and wait for the
> > +                 * next call to report the error, but I'd rather flag the error
> > +                 * immediately.
> 
> Is that safe?  This gives the caller no means to detect a partially
> completed send.

Well, I'm returning -err, so the caller knows something has gone wrong; it just
doesn't know whether it managed to send some part of the data before the failure.
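
If a caller ever needs to detect a partial send, one option is to report the byte count and the error separately. A hedged sketch of a blocking write loop over a possibly non-blocking fd, waiting with `poll()` on `EAGAIN`; `write_full()` is a hypothetical helper, not the QEMU code, and uses plain `write()` rather than `iov_send()`:

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: write all 'len' bytes, waiting with poll() whenever
 * the fd would block.  Returns 0 on success or a negative errno; in both
 * cases *sent holds the number of bytes actually written, so a partially
 * completed send is visible to the caller.
 */
static int write_full(int fd, const void *buf, size_t len, size_t *sent)
{
    const char *p = buf;

    *sent = 0;
    while (*sent < len) {
        ssize_t n = write(fd, p + *sent, len - *sent);
        if (n > 0) {
            *sent += (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR) {
            continue;
        }
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            struct pollfd pfd = { .fd = fd, .events = POLLOUT };
            if (poll(&pfd, 1, -1) < 0 && errno != EINTR) {
                return -errno;
            }
            continue;
        }
        return n < 0 ? -errno : -EIO;  /* write() returning 0 is unexpected */
    }
    return 0;
}
```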

Dave

> 
> > +                 */
> > +                return -err;
> > +            }
> > +
> > +            /* Emulate blocking */
> > +            GPollFD pfd;
> > +
> > +            pfd.fd = s->fd;
> > +            pfd.events = G_IO_OUT | G_IO_ERR;
> > +            pfd.revents = 0;
> > +            g_poll(&pfd, 1 /* 1 fd */, -1 /* no timeout */);
> > +        }
> >      }
> > -    return len;
> > +
> > +    return offset;
> >  }
> >  
> >  static int socket_get_fd(void *opaque)

* Re: [Qemu-devel] [PATCH v4 10/47] Return path: Open a return path on QEMUFile for sockets
  2014-11-03  3:05   ` David Gibson
@ 2014-11-03 19:04     ` Dr. David Alan Gilbert
  2014-11-18  4:34       ` David Gibson
  0 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-11-03 19:04 UTC (permalink / raw)
  To: David Gibson
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Fri, Oct 03, 2014 at 06:47:16PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Postcopy needs a method to send messages from the destination back to
> > the source, this is the 'return path'.
> > 
> > Wire it up for 'socket' QEMUFile's using a dup'd fd.
> 
> This doesn't seem like the right abstraction to me.  In particular I
> can't really see how you'd implement this for anything other than
> socket.
> 
> I'd suggest instead creating new "open" helper functions (within the
> QEMUFile code) that open both a forward and return path
> simultaneously.

Can you give an example of a transport where it would be a problem,
so I can look at how that works?

It's a little tricky since, on the destination, at the time we create
the connection we don't know that we're going to need the return path.
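
For comparison, a hedged sketch of the suggested alternative for the socket case only: a hypothetical open helper that yields both directions at once, using `dup()` so that each direction owns its own descriptor (as the patch does for the return path), next to a one-way transport that simply reports no return path. `struct mig_channel` and both functions are illustrative names:

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical pairing of the forward stream and its return path. */
struct mig_channel {
    int forward_fd;   /* main migration stream */
    int return_fd;    /* return path, or -1 if the transport has none */
};

/* Bidirectional transport: both directions share one connection via dup(). */
static int mig_channel_open_socket(int fd, struct mig_channel *ch)
{
    ch->forward_fd = fd;
    ch->return_fd = dup(fd);
    return ch->return_fd < 0 ? -1 : 0;
}

/* One-way transport (e.g. a pipe or a file): no return path available. */
static int mig_channel_open_oneway(int fd, struct mig_channel *ch)
{
    ch->forward_fd = fd;
    ch->return_fd = -1;
    return 0;
}
```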

Dave

* Re: [Qemu-devel] [PATCH v4 22/47] QEMU_VM_CMD_PACKAGED: Send a packaged chunk of migration stream
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 22/47] QEMU_VM_CMD_PACKAGED: Send a packaged chunk of migration stream Dr. David Alan Gilbert (git)
@ 2014-11-04  1:28   ` David Gibson
  2014-11-04 10:19     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 204+ messages in thread
From: David Gibson @ 2014-11-04  1:28 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

[-- Attachment #1: Type: text/plain, Size: 1174 bytes --]

On Fri, Oct 03, 2014 at 06:47:28PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> QEMU_VM_CMD_PACKAGED is a migration command that allows a chunk
> of migration stream to be sent in one go, and be received by
> a separate instance of the loadvm loop while not interacting
> with the migration stream.
> 
> This is used by postcopy to load device state (from the package)
> while loading memory pages from the main stream.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

Though one minor comment:

[snip]
> +/* We have a buffer of data to send; we don't want that all to be loaded
> + * by the command itself, so the command contains just the length of the
> + * extra buffer that we then send straight after it.
> + * TODO: Must be a better way to organise that

I'm not quite understanding what that comment's getting at.
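
The comment seems to describe length-prefixed framing: the command carries only the length, and the payload follows as raw bytes that the receiver copies into its own buffer before handing it to a separate loadvm loop. A minimal sketch of that framing over plain memory; the 4-byte big-endian length and the `pkg_*` helpers are assumptions, not the patch's actual encoding:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Write a big-endian 32-bit length followed by the payload; returns bytes used. */
static size_t pkg_write(uint8_t *out, const uint8_t *payload, uint32_t len)
{
    out[0] = (uint8_t)(len >> 24);
    out[1] = (uint8_t)(len >> 16);
    out[2] = (uint8_t)(len >> 8);
    out[3] = (uint8_t)len;
    memcpy(out + 4, payload, len);
    return 4 + (size_t)len;
}

/*
 * Read one package from 'in' ('avail' bytes) into a fresh buffer that the
 * caller frees.  Sets *len and returns the buffer, or NULL if truncated.
 */
static uint8_t *pkg_read(const uint8_t *in, size_t avail, uint32_t *len)
{
    if (avail < 4) {
        return NULL;
    }
    uint32_t n = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
                 ((uint32_t)in[2] << 8) | in[3];
    if (avail - 4 < n) {
        return NULL;
    }
    uint8_t *buf = malloc(n ? n : 1);
    if (!buf) {
        return NULL;
    }
    memcpy(buf, in + 4, n);
    *len = n;
    return buf;
}
```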

* Re: [Qemu-devel] [PATCH v4 23/47] migrate_init: Call from savevm
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 23/47] migrate_init: Call from savevm Dr. David Alan Gilbert (git)
  2014-10-08  2:28   ` zhanghailiang
@ 2014-11-04  1:29   ` David Gibson
  1 sibling, 0 replies; 204+ messages in thread
From: David Gibson @ 2014-11-04  1:29 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

[-- Attachment #1: Type: text/plain, Size: 636 bytes --]

On Fri, Oct 03, 2014 at 06:47:29PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Suspend to file is very much like a migrate, and it makes life
> easier if we have the Migration state available, so initialise it
> in the savevm.c code for suspending.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

* Re: [Qemu-devel] [PATCH v4 24/47] Allow savevm handlers to state whether they could go into postcopy
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 24/47] Allow savevm handlers to state whether they could go into postcopy Dr. David Alan Gilbert (git)
@ 2014-11-04  1:33   ` David Gibson
  2014-11-19 17:53     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 204+ messages in thread
From: David Gibson @ 2014-11-04  1:33 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

[-- Attachment #1: Type: text/plain, Size: 1925 bytes --]

On Fri, Oct 03, 2014 at 06:47:30PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Use that to split the qemu_savevm_state_pending counts into postcopiable
> and non-postcopiable amounts
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  arch_init.c                 |  7 +++++++
>  include/migration/vmstate.h |  2 +-
>  include/sysemu/sysemu.h     |  4 +++-
>  migration.c                 |  9 ++++++++-
>  savevm.c                    | 23 +++++++++++++++++++----
>  5 files changed, 38 insertions(+), 7 deletions(-)
> 
> diff --git a/arch_init.c b/arch_init.c
> index 6970733..44072d8 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -1192,6 +1192,12 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>      return ret;
>  }
>  
> +/* RAM's always up for postcopying */
> +static bool ram_can_postcopy(void *opaque)
> +{
> +    return true;
> +}
> +
>  static SaveVMHandlers savevm_ram_handlers = {
>      .save_live_setup = ram_save_setup,
>      .save_live_iterate = ram_save_iterate,
> @@ -1199,6 +1205,7 @@ static SaveVMHandlers savevm_ram_handlers = {
>      .save_live_pending = ram_save_pending,
>      .load_state = ram_load,
>      .cancel = ram_migration_cancel,
> +    .can_postcopy = ram_can_postcopy,

Is there actually any plausible device for which you'd need a callback
here, rather than just having a static bool?

On the other hand, it does seem kind of plausible that there might be
situations in which some data from a device must be pre-copied, but
more can be post-copied, which would necessitate extending the
per-handler callback to return quantities for both.
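To make the second point concrete, here is a minimal sketch. It assumes a hypothetical per-handler callback that adds the handler's outstanding data to separate non-postcopiable and postcopiable totals, mirroring the two res_* outputs that this patch gives qemu_savevm_state_pending(). PendingSplitFn, HandlerSketch and the helpers are illustrative names, not QEMU's API. The 4 KiB precopy-only portion of the example device is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical callback: adds this handler's outstanding data to two
 * running totals, split into bytes that must be sent before the switch
 * to postcopy and bytes that may be sent after it.
 */
typedef void (*PendingSplitFn)(void *opaque, uint64_t *non_postcopiable,
                               uint64_t *postcopiable);

typedef struct {
    const char *name;
    PendingSplitFn pending;
    void *opaque; /* points at the handler's outstanding byte count */
} HandlerSketch;

/* RAM: everything outstanding can be postcopied. */
static void ram_pending_split(void *opaque, uint64_t *non_pc, uint64_t *pc)
{
    (void)non_pc;
    *pc += *(const uint64_t *)opaque;
}

/* Illustrative device: the first 4 KiB of its state must be precopied,
 * and anything beyond that may be postcopied. */
static void dev_pending_split(void *opaque, uint64_t *non_pc, uint64_t *pc)
{
    uint64_t total = *(const uint64_t *)opaque;
    uint64_t must = total < 4096 ? total : 4096;

    *non_pc += must;
    *pc += total - must;
}

/* Sum over all handlers, in the spirit of the split pending counts. */
static void pending_totals(const HandlerSketch *h, size_t n,
                           uint64_t *non_pc, uint64_t *pc)
{
    *non_pc = 0;
    *pc = 0;
    for (size_t i = 0; i < n; i++) {
        assert(h[i].pending != NULL);
        h[i].pending(h[i].opaque, non_pc, pc);
    }
}
```

With this shape a separate "can postcopy" flag is not needed: a handler that can never be postcopied simply reports everything as non-postcopiable.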

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH v4 25/47] postcopy: OS support test
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 25/47] postcopy: OS support test Dr. David Alan Gilbert (git)
@ 2014-11-04  1:40   ` David Gibson
  2014-11-25 17:34     ` Dr. David Alan Gilbert
From: David Gibson @ 2014-11-04  1:40 UTC
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On Fri, Oct 03, 2014 at 06:47:31PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Provide a check to see if the OS we're running on has all the bits
> needed for postcopy.
> 
> Creates postcopy-ram.c which will get most of the other helpers we need.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  Makefile.objs                    |   2 +-
>  include/migration/postcopy-ram.h |  19 +++++
>  postcopy-ram.c                   | 160 +++++++++++++++++++++++++++++++++++++++
>  savevm.c                         |   6 ++
>  4 files changed, 186 insertions(+), 1 deletion(-)
>  create mode 100644 include/migration/postcopy-ram.h
>  create mode 100644 postcopy-ram.c
> 
> diff --git a/Makefile.objs b/Makefile.objs
> index 97db978..fa0a3a0 100644
> --- a/Makefile.objs
> +++ b/Makefile.objs
> @@ -54,7 +54,7 @@ common-obj-y += qemu-file.o
>  common-obj-$(CONFIG_RDMA) += migration-rdma.o
>  common-obj-y += qemu-char.o #aio.o
>  common-obj-y += block-migration.o
> -common-obj-y += page_cache.o xbzrle.o
> +common-obj-y += page_cache.o xbzrle.o postcopy-ram.o
>  
>  common-obj-$(CONFIG_POSIX) += migration-exec.o migration-unix.o migration-fd.o
>  
> diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
> new file mode 100644
> index 0000000..dcd1afa
> --- /dev/null
> +++ b/include/migration/postcopy-ram.h
> @@ -0,0 +1,19 @@
> +/*
> + * Postcopy migration for RAM
> + *
> + * Copyright 2013 Red Hat, Inc. and/or its affiliates
> + *
> + * Authors:
> + *  Dave Gilbert  <dgilbert@redhat.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +#ifndef QEMU_POSTCOPY_RAM_H
> +#define QEMU_POSTCOPY_RAM_H
> +
> +/* Return 0 if the host supports everything we need to do postcopy-ram */
> +int postcopy_ram_hosttest(void);

Maybe postcopy_supported_by_host() would be a bit clearer?

[snip]
> +#ifdef HOST_X86_64
> + /* NOTE: These are Andrea's 3.15.0 world */

I thought the usual approach in qemu was to import the updated headers
first in a separate patch, rather than embedding new defines.


> +#ifndef MADV_USERFAULT
> +#define MADV_USERFAULT   18
> +#define MADV_NOUSERFAULT 19
> +#endif
> +
> +#ifndef __NR_remap_anon_pages
> +#define __NR_remap_anon_pages 321
> +#endif
> +
> +#ifndef __NR_userfaultfd
> +#define __NR_userfaultfd 322
> +#endif
> +
> +#endif
> +
> +#ifndef USERFAULTFD_PROTOCOL
> +#define USERFAULTFD_PROTOCOL (uint64_t)0xaa
> +#endif
> +
> +#endif

* Re: [Qemu-devel] [PATCH v4 26/47] migrate_start_postcopy: Command to trigger transition to postcopy
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 26/47] migrate_start_postcopy: Command to trigger transition to postcopy Dr. David Alan Gilbert (git)
@ 2014-11-04  1:47   ` David Gibson
From: David Gibson @ 2014-11-04  1:47 UTC
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On Fri, Oct 03, 2014 at 06:47:32PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Once postcopy is enabled (with migrate_set_capability), the migration
> will still start in precopy mode.  To cause a transition into postcopy,
> the
> 
>   migrate_start_postcopy
> 
> command must be issued.  Postcopy will start sometime after this
> (when it's next checked in the migration loop).
> 
> Issuing the command before migration has started will result in an error,
> and issuing it after migration has finished is ignored.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

* Re: [Qemu-devel] [PATCH v4 27/47] MIG_STATE_POSTCOPY_ACTIVE: Add new migration state
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 27/47] MIG_STATE_POSTCOPY_ACTIVE: Add new migration state Dr. David Alan Gilbert (git)
@ 2014-11-04  1:49   ` David Gibson
From: David Gibson @ 2014-11-04  1:49 UTC
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On Fri, Oct 03, 2014 at 06:47:33PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> 'MIG_STATE_POSTCOPY_ACTIVE' is entered after migrate_start_postcopy
> 
> 'migration_postcopy_phase' is provided for other sections to know if
> they're in postcopy.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

* Re: [Qemu-devel] [PATCH v4 28/47] qemu_savevm_state_complete: Postcopy changes
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 28/47] qemu_savevm_state_complete: Postcopy changes Dr. David Alan Gilbert (git)
@ 2014-11-04  2:18   ` David Gibson
  2014-12-17 16:14     ` Dr. David Alan Gilbert
From: David Gibson @ 2014-11-04  2:18 UTC
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On Fri, Oct 03, 2014 at 06:47:34PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> When postcopy calls qemu_savevm_state_complete it's not really
> the end of migration, so skip:
>    a) Finishing postcopiable iterative devices - they'll carry on
>    b) The termination byte on the end of the stream.
> 
> We then also add:
>   qemu_savevm_state_postcopy_complete
> which is called at the end of a postcopy migration to call the
> complete methods on devices skipped in the _complete call.

So, we should probably rename qemu_savevm_state_complete() to reflect
the fact that it's no longer actually a completion, but just the
transition from pre-copy to post-copy phases.  A good, brief name
doesn't immediately occur to me, unfortunately.

> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  include/sysemu/sysemu.h |  1 +
>  savevm.c                | 52 ++++++++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 52 insertions(+), 1 deletion(-)
> 
> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> index e7ff3d0..46665ce 100644
> --- a/include/sysemu/sysemu.h
> +++ b/include/sysemu/sysemu.h
> @@ -113,6 +113,7 @@ void qemu_savevm_state_cancel(void);
>  void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size,
>                                 uint64_t *res_non_postcopiable,
>                                 uint64_t *res_postcopiable);
> +void qemu_savevm_state_postcopy_complete(QEMUFile *f);
>  void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
>                                uint16_t len, uint8_t *data);
>  void qemu_savevm_send_reqack(QEMUFile *f, uint32_t value);
> diff --git a/savevm.c b/savevm.c
> index a0cb88b..7c4541d 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -854,10 +854,51 @@ int qemu_savevm_state_iterate(QEMUFile *f)
>      return ret;
>  }
>  
> +/*
> + * Calls the complete routines just for those devices that are postcopiable;
> + * causing the last few pages to be sent immediately and doing any associated
> + * cleanup.
> + * Note postcopy also calls the plain qemu_savevm_state_complete to complete
> + * all the other devices, but that happens at the point we switch to postcopy.
> + */
> +void qemu_savevm_state_postcopy_complete(QEMUFile *f)
> +{
> +    SaveStateEntry *se;
> +    int ret;
> +
> +    QTAILQ_FOREACH(se, &savevm_handlers, entry) {
> +        if (!se->ops || !se->ops->save_live_complete ||
> +            !se->ops->can_postcopy) {

So, you check for the presence of a can_postcopy callback, but you
don't ever actually invoke it.

> +            continue;
> +        }
> +        if (se->ops && se->ops->is_active) {
> +            if (!se->ops->is_active(se->opaque)) {
> +                continue;
> +            }
> +        }
> +        trace_savevm_section_start(se->idstr, se->section_id);
> +        /* Section type */
> +        qemu_put_byte(f, QEMU_VM_SECTION_END);
> +        qemu_put_be32(f, se->section_id);
> +
> +        ret = se->ops->save_live_complete(f, se->opaque);

I'm wondering if it might be clearer not to overload the
save_live_complete hook, but instead allow both "execution transition"
(old complete) and "final complete" (postcopy complete) hooks
(expecting only one to be non-NULL in most cases).
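A minimal sketch of the two-hook alternative follows. It assumes a hypothetical ops table with separate hooks for the switch to postcopy and for the final completion. OpsSketch, EntrySketch and the hook names are illustrative, not QEMU's SaveVMHandlers. Unlike the loop quoted above, it actually calls can_postcopy() instead of only testing whether the pointer is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical ops table with two distinct completion hooks instead of
 * overloading save_live_complete.  Names are illustrative only.
 */
typedef struct {
    bool (*can_postcopy)(void *opaque);
    int (*complete_precopy)(void *opaque);  /* at the switch to postcopy */
    int (*complete_postcopy)(void *opaque); /* at the end of postcopy */
} OpsSketch;

typedef struct {
    const OpsSketch *ops;
    void *opaque;
} EntrySketch;

/*
 * End-of-postcopy pass: run the final hook only for handlers that
 * provide one and whose can_postcopy() actually returns true.
 */
static int postcopy_complete_all(const EntrySketch *e, size_t n)
{
    assert(e != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const OpsSketch *ops = e[i].ops;

        if (!ops || !ops->complete_postcopy) {
            continue;
        }
        if (!ops->can_postcopy || !ops->can_postcopy(e[i].opaque)) {
            continue;
        }
        int ret = ops->complete_postcopy(e[i].opaque);
        if (ret < 0) {
            return ret;
        }
    }
    return 0;
}

/* Example hooks; opaque points at a call counter. */
static bool always_postcopy(void *opaque) { (void)opaque; return true; }
static bool never_postcopy(void *opaque) { (void)opaque; return false; }
static int count_call(void *opaque) { ++*(int *)opaque; return 0; }
```

Most handlers would fill in only one of the two completion hooks: RAM would supply complete_postcopy, while a device that must be fully precopied would supply complete_precopy.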

* Re: [Qemu-devel] [PATCH v4 29/47] Postcopy page-map-incoming (PMI) structure
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 29/47] Postcopy page-map-incoming (PMI) structure Dr. David Alan Gilbert (git)
@ 2014-11-04  3:09   ` David Gibson
  2014-11-19 18:46     ` Dr. David Alan Gilbert
From: David Gibson @ 2014-11-04  3:09 UTC
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On Fri, Oct 03, 2014 at 06:47:35PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> The PMI holds the state of each page on the incoming side,
> so that we can tell if the page is missing, already received
> or there is a request outstanding for it.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

Though there are a couple of minor comments below:

> ---
>  include/migration/migration.h    |  19 ++++
>  include/migration/postcopy-ram.h |  12 +++
>  include/qemu/typedefs.h          |   1 +
>  postcopy-ram.c                   | 220 +++++++++++++++++++++++++++++++++++++++
>  4 files changed, 252 insertions(+)
> 
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index 2ff9d35..1405a15 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -57,6 +57,24 @@ struct MigrationRetPathState {
>  
>  typedef struct MigrationState MigrationState;
>  
> +/* Postcopy page-map-incoming - data about each page on the inbound side */
> +
> +typedef enum {
> +   POSTCOPY_PMI_MISSING,   /* page hasn't yet been received */
> +   POSTCOPY_PMI_REQUESTED, /* Kernel asked for a page, but we've not got it */
> +   POSTCOPY_PMI_RECEIVED   /* We've got the page */
> +} PostcopyPMIState;
> +
> +struct PostcopyPMI {
> +    QemuMutex      mutex;
> +    unsigned long *received_map;  /* Pages that we have received */
> +    unsigned long *requested_map; /* Pages that we're sending a request for */
> +    unsigned long  host_mask;     /* A mask with enough bits set to cover one
> +                                     host page in the PMI */
> +    unsigned long  host_bits;     /* The number of bits in the map representing
> +                                     one host page */
> +};
> +
>  /* State for the incoming migration */
>  struct MigrationIncomingState {
>      QEMUFile *file;
> @@ -71,6 +89,7 @@ struct MigrationIncomingState {
>  
>      QEMUFile *return_path;
>      QemuMutex      rp_mutex;    /* We send replies from multiple threads */
> +    PostcopyPMI    postcopy_pmi;
>  };
>  
>  MigrationIncomingState *migration_incoming_get_current(void);
> diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
> index dcd1afa..addb88a 100644
> --- a/include/migration/postcopy-ram.h
> +++ b/include/migration/postcopy-ram.h
> @@ -13,7 +13,19 @@
>  #ifndef QEMU_POSTCOPY_RAM_H
>  #define QEMU_POSTCOPY_RAM_H
>  
> +#include "migration/migration.h"
> +
>  /* Return 0 if the host supports everything we need to do postcopy-ram */
>  int postcopy_ram_hosttest(void);
>  
> +/*
> + * In 'advise' mode record that a page has been received.
> + */
> +void postcopy_hook_early_receive(MigrationIncomingState *mis,
> +                                 size_t bitmap_index);
> +
> +void postcopy_pmi_destroy(MigrationIncomingState *mis);
> +void postcopy_pmi_discard_range(MigrationIncomingState *mis,
> +                                size_t start, size_t npages);
> +void postcopy_pmi_dump(MigrationIncomingState *mis);
>  #endif
> diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
> index 8539de6..61b330c 100644
> --- a/include/qemu/typedefs.h
> +++ b/include/qemu/typedefs.h
> @@ -77,6 +77,7 @@ typedef struct QEMUSGList QEMUSGList;
>  typedef struct SHPCDevice SHPCDevice;
>  typedef struct FWCfgState FWCfgState;
>  typedef struct PcGuestInfo PcGuestInfo;
> +typedef struct PostcopyPMI PostcopyPMI;
>  typedef struct Range Range;
>  typedef struct AdapterInfo AdapterInfo;
>  
> diff --git a/postcopy-ram.c b/postcopy-ram.c
> index bba5c71..210585c 100644
> --- a/postcopy-ram.c
> +++ b/postcopy-ram.c
> @@ -23,6 +23,9 @@
>  #include "qemu-common.h"
>  #include "migration/migration.h"
>  #include "migration/postcopy-ram.h"
> +#include "sysemu/sysemu.h"
> +#include "qemu/bitmap.h"
> +#include "qemu/error-report.h"
>  
>  //#define DEBUG_POSTCOPY
>  
> @@ -82,6 +85,216 @@
>  #if defined(__linux__) && defined(MADV_USERFAULT) && \
>                            defined(__NR_remap_anon_pages)
>  
> +/* ---------------------------------------------------------------------- */
> +/* Postcopy pagemap-inbound (pmi) - data structures that record the       */
> +/* state of each page used by the inbound postcopy                        */
> +/* It's a pair of bitmaps (of the same structure as the migration bitmaps)*/
> +/* holding one bit per target-page, although all operations work on host  */
> +/* pages.                                                                 */
> +__attribute__ (( unused )) /* Until later in patch series */
> +static void postcopy_pmi_init(MigrationIncomingState *mis, size_t ram_pages)
> +{
> +    unsigned int tpb = qemu_target_page_bits();
> +    unsigned long host_bits;
> +
> +    qemu_mutex_init(&mis->postcopy_pmi.mutex);
> +    mis->postcopy_pmi.received_map = bitmap_new(ram_pages);
> +    mis->postcopy_pmi.requested_map = bitmap_new(ram_pages);
> +    bitmap_clear(mis->postcopy_pmi.received_map, 0, ram_pages);
> +    bitmap_clear(mis->postcopy_pmi.requested_map, 0, ram_pages);
> +    /*
> +     * Each bit in the map represents one 'target page' which is no bigger
> +     * than a host page but can be smaller.  It's useful to have some
> +     * convenience masks for later

So, there's no inherent reason a target page couldn't be bigger than a
host page.  It's fair enough not to handle that case for now, but
something somewhere should probably verify that it's not the case.
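One way to make that explicit is sketched below, assuming a hypothetical helper pmi_host_bits() that would run before postcopy_pmi_init(). The helper rejects a target page larger than the host page with a direct check. It does not rely on the integer division coming out as zero and tripping the power-of-2 assert. It also folds in the other layout constraints that the init code asserts.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of target pages per host page (the PMI's
 * host_bits), or 0 for layouts the PMI code does not handle.
 */
static size_t pmi_host_bits(size_t host_page_size, unsigned target_page_bits)
{
    assert(target_page_bits < sizeof(size_t) * CHAR_BIT);

    size_t target_page_size = (size_t)1 << target_page_bits;

    if (target_page_size > host_page_size) {
        return 0; /* target page larger than host page: unsupported */
    }
    if (host_page_size % target_page_size) {
        return 0; /* host page not a whole number of target pages */
    }
    size_t host_bits = host_page_size / target_page_size;
    if (host_bits & (host_bits - 1)) {
        return 0; /* ratio is not a power of 2 */
    }
    if ((sizeof(unsigned long) * CHAR_BIT) % host_bits) {
        return 0; /* one host page would straddle two bitmap words */
    }
    return host_bits;
}
```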

> +     */
> +
> +    /*
> +     * The number of bits one host page takes up in the bitmap
> +     * e.g. on a 64k host page, 4k Target page, host_bits=64/4=16
> +     */
> +    host_bits = sysconf(_SC_PAGESIZE) / (1ul << tpb);
> +    /* Should be a power of 2 */
> +    assert(host_bits && !(host_bits & (host_bits - 1)));
> +    /*
> +     * If the host_bits isn't a division of the number of bits in long
> +     * then the code gets a lot more complex; disallow for now
> +     * (I'm not aware of a system where it's true anyway)
> +     */
> +    assert(((sizeof(long) * 8) % host_bits) == 0);
> +
> +    mis->postcopy_pmi.host_bits = host_bits;
> +    /* A mask, starting at bit 0, containing host_bits continuous set bits */
> +    mis->postcopy_pmi.host_mask =  (1ul << host_bits) - 1;
> +
> +    assert((ram_pages % host_bits) == 0);
> +}
> +
> +void postcopy_pmi_destroy(MigrationIncomingState *mis)
> +{
> +    if (mis->postcopy_pmi.received_map) {
> +        g_free(mis->postcopy_pmi.received_map);

g_free() is safe to call on NULL anyway, isn't it?

> +        mis->postcopy_pmi.received_map = NULL;
> +    }
> +    if (mis->postcopy_pmi.requested_map) {
> +        g_free(mis->postcopy_pmi.requested_map);
> +        mis->postcopy_pmi.requested_map = NULL;
> +    }
> +    qemu_mutex_destroy(&mis->postcopy_pmi.mutex);
> +}
> +
> +/*
> + * Mark a set of pages in the PMI as being clear; this is used by the discard
> + * at the start of postcopy, and before the postcopy stream starts.
> + */
> +void postcopy_pmi_discard_range(MigrationIncomingState *mis,
> +                                size_t start, size_t npages)
> +{
> +    bitmap_clear(mis->postcopy_pmi.received_map, start, npages);
> +}
> +
> +/*
> + * Test a host-page worth of bits in the map starting at bitmap_index
> + * The bits should all be consistent
> + */
> +static bool test_hpbits(MigrationIncomingState *mis,
> +                        size_t bitmap_index, unsigned long *map)
> +{
> +    long masked;
> +
> +    assert((bitmap_index & (mis->postcopy_pmi.host_bits-1)) == 0);
> +
> +    masked = (map[BIT_WORD(bitmap_index)] >>
> +               (bitmap_index % BITS_PER_LONG)) &
> +             mis->postcopy_pmi.host_mask;
> +
> +    assert((masked == 0) || (masked == mis->postcopy_pmi.host_mask));
> +    return !!masked;
> +}
> +
> +/*
> + * Set host-page worth of bits in the map starting at bitmap_index
> + */
> +static void set_hpbits(MigrationIncomingState *mis,
> +                       size_t bitmap_index, unsigned long *map)
> +{
> +    assert((bitmap_index & (mis->postcopy_pmi.host_bits-1)) == 0);
> +
> +    map[BIT_WORD(bitmap_index)] |= mis->postcopy_pmi.host_mask <<
> +                                    (bitmap_index % BITS_PER_LONG);
> +}
> +
> +/*
> + * Clear host-page worth of bits in the map starting at bitmap_index
> + */
> +static void clear_hpbits(MigrationIncomingState *mis,
> +                         size_t bitmap_index, unsigned long *map)
> +{
> +    assert((bitmap_index & (mis->postcopy_pmi.host_bits-1)) == 0);
> +
> +    map[BIT_WORD(bitmap_index)] &= ~(mis->postcopy_pmi.host_mask <<
> +                                    (bitmap_index % BITS_PER_LONG));
> +}
> +
> +/*
> + * Retrieve the state of the given page
> + * Note: This version for use by callers already holding the lock
> + */
> +static PostcopyPMIState postcopy_pmi_get_state_nolock(
> +                            MigrationIncomingState *mis,
> +                            size_t bitmap_index)
> +{
> +    bool received, requested;
> +
> +    received = test_hpbits(mis, bitmap_index, mis->postcopy_pmi.received_map);
> +    requested = test_hpbits(mis, bitmap_index, mis->postcopy_pmi.requested_map);
> +
> +    if (received) {
> +        assert(!requested);

Clearing the requested bit when you set the received bit seems a bit
pointless.  (requested && received) isn't meaningfully different from
(!requested && received) but there seems no reason to go to extra
trouble to avoid that state, and having the record might be
interesting for gathering statistics.
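The following is a sketch of that suggestion. PMIStateSketch, pmi_state_sketch() and count_requested_and_received() are hypothetical names, not part of the patch. The state lookup lets "received" win without asserting that "requested" is clear. The preserved requested bits can then be combined with the received map, here to count pages that arrived after an explicit request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    PMI_SKETCH_MISSING,
    PMI_SKETCH_REQUESTED,
    PMI_SKETCH_RECEIVED,
} PMIStateSketch;

/* 'received' wins; a leftover 'requested' bit is tolerated, not asserted away. */
static PMIStateSketch pmi_state_sketch(bool received, bool requested)
{
    if (received) {
        return PMI_SKETCH_RECEIVED;
    }
    return requested ? PMI_SKETCH_REQUESTED : PMI_SKETCH_MISSING;
}

/*
 * With both bits kept, received & requested identifies pages that
 * arrived in response to a request.  This counts set bits, i.e. target
 * pages, not host pages.
 */
static size_t count_requested_and_received(const unsigned long *received,
                                           const unsigned long *requested,
                                           size_t nwords)
{
    size_t n = 0;

    assert(received && requested);
    for (size_t i = 0; i < nwords; i++) {
        unsigned long both = received[i] & requested[i];

        while (both) {
            both &= both - 1; /* clear the lowest set bit */
            n++;
        }
    }
    return n;
}
```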

> +        return POSTCOPY_PMI_RECEIVED;
> +    } else {
> +        return requested ? POSTCOPY_PMI_REQUESTED : POSTCOPY_PMI_MISSING;
> +    }
> +}
> +
> +/* Retrieve the state of the given page */
> +__attribute__ (( unused )) /* Until later in patch series */
> +static PostcopyPMIState postcopy_pmi_get_state(MigrationIncomingState *mis,
> +                                               size_t bitmap_index)
> +{
> +    PostcopyPMIState ret;
> +    qemu_mutex_lock(&mis->postcopy_pmi.mutex);
> +    ret = postcopy_pmi_get_state_nolock(mis, bitmap_index);
> +    qemu_mutex_unlock(&mis->postcopy_pmi.mutex);
> +
> +    return ret;
> +}
> +
> +/*
> + * Set the page state to the given state if the previous state was as expected
> + * Return the actual previous state.
> + */
> +__attribute__ (( unused )) /* Until later in patch series */
> +static PostcopyPMIState postcopy_pmi_change_state(MigrationIncomingState *mis,
> +                                           size_t bitmap_index,
> +                                           PostcopyPMIState expected_state,
> +                                           PostcopyPMIState new_state)
> +{
> +    PostcopyPMIState old_state;
> +
> +    qemu_mutex_lock(&mis->postcopy_pmi.mutex);
> +    old_state = postcopy_pmi_get_state_nolock(mis, bitmap_index);
> +
> +    if (old_state == expected_state) {
> +        switch (new_state) {
> +        case POSTCOPY_PMI_MISSING:
> +          assert(0); /* This shouldn't actually happen - use discard_range */
> +          break;
> +
> +        case POSTCOPY_PMI_REQUESTED:
> +          assert(old_state == POSTCOPY_PMI_MISSING);
> +          set_hpbits(mis, bitmap_index, mis->postcopy_pmi.requested_map);
> +          break;
> +
> +        case POSTCOPY_PMI_RECEIVED:
> +          assert(old_state == POSTCOPY_PMI_MISSING ||
> +                 old_state == POSTCOPY_PMI_REQUESTED);
> +          set_hpbits(mis, bitmap_index, mis->postcopy_pmi.received_map);
> +          clear_hpbits(mis, bitmap_index, mis->postcopy_pmi.requested_map);
> +          break;
> +        }
> +    }
> +
> +    qemu_mutex_unlock(&mis->postcopy_pmi.mutex);
> +    return old_state;
> +}
> +
> +/*
> + * Useful when debugging postcopy, although if it failed early the
> + * received map can be quite sparse and thus big when dumped.
> + */
> +void postcopy_pmi_dump(MigrationIncomingState *mis)
> +{
> +    fprintf(stderr, "postcopy_pmi_dump: requested\n");
> +    ram_debug_dump_bitmap(mis->postcopy_pmi.requested_map, false);
> +    fprintf(stderr, "postcopy_pmi_dump: received\n");
> +    ram_debug_dump_bitmap(mis->postcopy_pmi.received_map, true);
> +    fprintf(stderr, "postcopy_pmi_dump: end\n");
> +}
> +
> +/* Called by ram_load prior to mapping the page */
> +void postcopy_hook_early_receive(MigrationIncomingState *mis,
> +                                 size_t bitmap_index)
> +{
> +    if (mis->postcopy_ram_state == POSTCOPY_RAM_INCOMING_ADVISE) {

A silent no-op if you're not in the expected migration phase doesn't
seem right.  Should this be an assert() instead?

> +        /*
> +         * If we're in precopy-advise mode we need to track received pages even
> +         * though we don't need to place pages atomically yet.
> +         * In advise mode there's only a single thread, so don't need locks
> +         */
> +        set_bit(bitmap_index, mis->postcopy_pmi.received_map);
> +    }
> +}
> +
>  int postcopy_ram_hosttest(void)
>  {
>      /* TODO: Needs guarding with CONFIG_ once we have libc's that have the defs
> @@ -156,5 +369,12 @@ int postcopy_ram_hosttest(void)
>      return -1;
>  }
>  
> +/* Called by ram_load prior to mapping the page */
> +void postcopy_hook_early_receive(MigrationIncomingState *mis,
> +                                 size_t bitmap_index)
> +{
> +    /* We don't support postcopy so don't care */
> +}
> +
>  #endif
>  

* Re: [Qemu-devel] [PATCH v4 22/47] QEMU_VM_CMD_PACKAGED: Send a packaged chunk of migration stream
  2014-11-04  1:28   ` David Gibson
@ 2014-11-04 10:19     ` Dr. David Alan Gilbert
  2014-11-18  4:36       ` David Gibson
From: Dr. David Alan Gilbert @ 2014-11-04 10:19 UTC
  To: David Gibson
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Fri, Oct 03, 2014 at 06:47:28PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > QEMU_VM_CMD_PACKAGED is a migration command that allows a chunk
> > of migration stream to be sent in one go, and be received by
> > a separate instance of the loadvm loop while not interacting
> > with the migration stream.
> > 
> > This is used by postcopy to load device state (from the package)
> > while loading memory pages from the main stream.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> 
> Though one minor comment:
> 
> [snip]
> > +/* We have a buffer of data to send; we don't want that all to be loaded
> > + * by the command itself, so the command contains just the length of the
> > + * extra buffer that we then send straight after it.
> > + * TODO: Must be a better way to organise that
> 
> I'm not quite understanding what that comment's getting at.

We have these VM commands; each consists of a command type and a length,
followed by the data:
     CMD_whatever
     length: whatever
     data for whatever

The comment describes the fact that, to make things easier for this code,
the layout has ended up as:

     CMD_PACKAGED
     CMD length: 4    <--- i.e. just enough to hold the next 'length' field
     package length
    ---------------
    The package

That is a little different, hence I thought it needed the comment.
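The layout above can be sketched as follows. The sketch assumes 16-bit command-type and command-length fields and a 32-bit package length, all big-endian. Those widths, CMD_PACKAGED_SKETCH, frame_packaged() and parse_packaged() are assumptions for illustration, and any framing that precedes the command in the real stream is omitted. The point it shows is that the command parser only consumes the 4-byte package length, leaving the package itself to be read separately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed command number, for illustration only. */
#define CMD_PACKAGED_SKETCH 1

static size_t put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
    return 2;
}

static size_t put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
    return 4;
}

static uint16_t get_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | p[3];
}

/* buf must have room for 8 + len bytes; returns the bytes written. */
static size_t frame_packaged(uint8_t *buf, const uint8_t *pkg, uint32_t len)
{
    size_t off = 0;

    assert(buf && pkg);
    off += put_be16(buf + off, CMD_PACKAGED_SKETCH); /* command type */
    off += put_be16(buf + off, 4);                   /* command length */
    off += put_be32(buf + off, len);                 /* package length */
    memcpy(buf + off, pkg, len);                     /* the package */
    return off + len;
}

/* Returns 0 and locates the package on success, -1 if malformed. */
static int parse_packaged(const uint8_t *buf, size_t buflen,
                          const uint8_t **pkg, uint32_t *pkglen)
{
    if (buflen < 8 || get_be16(buf) != CMD_PACKAGED_SKETCH ||
        get_be16(buf + 2) != 4) {
        return -1;
    }
    uint32_t len = get_be32(buf + 4);
    if (len > buflen - 8) {
        return -1; /* truncated package */
    }
    *pkg = buf + 8;
    *pkglen = len;
    return 0;
}
```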

Dave
 

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v4 14/47] Return path: Control commands
  2014-10-23 20:15       ` Paolo Bonzini
  2014-11-03  3:20         ` David Gibson
@ 2014-11-04 18:58         ` Dr. David Alan Gilbert
From: Dr. David Alan Gilbert @ 2014-11-04 18:58 UTC
  To: Paolo Bonzini
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* Paolo Bonzini (pbonzini@redhat.com) wrote:

> > the only oddity I get from that is from the 'SEND_ACK' you suggested;
> > since all my functions to send commands are send_  I currently have
> >  'qemu_savevm_send_send_ack'  which while consistent looks a bit odd.
> 
> Perhaps ping/pong?

Done (although I'm sure I'll find 'ack' in the comments somewhere).

Dave

* Re: [Qemu-devel] [PATCH v4 07/47] Create MigrationIncomingState
  2014-11-03  2:45   ` David Gibson
@ 2014-11-04 19:06     ` Dr. David Alan Gilbert
From: Dr. David Alan Gilbert @ 2014-11-04 19:06 UTC
  To: David Gibson
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Fri, Oct 03, 2014 at 06:47:13PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > There are currently lots of pieces of incoming migration state scattered
> > around, and postcopy is adding more, and it seems better to try and keep
> > it together.
> > 
> > allocate MIS in process_incoming_migration_co
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  include/migration/migration.h |  9 +++++++++
> >  include/qemu/typedefs.h       |  2 ++
> >  migration.c                   | 28 ++++++++++++++++++++++++++++
> >  savevm.c                      |  2 ++
> >  4 files changed, 41 insertions(+)
> > 
> > diff --git a/include/migration/migration.h b/include/migration/migration.h
> > index 3cb5ba8..8a36255 100644
> > --- a/include/migration/migration.h
> > +++ b/include/migration/migration.h
> > @@ -41,6 +41,15 @@ struct MigrationParams {
> >  
> >  typedef struct MigrationState MigrationState;
> >  
> > +/* State for the incoming migration */
> > +struct MigrationIncomingState {
> > +    QEMUFile *file;
> > +};
> > +
> > +MigrationIncomingState *migration_incoming_get_current(void);
> > +MigrationIncomingState *migration_incoming_state_init(QEMUFile *f);
> 
> Hrm.  I'd prefer to see this called migration_incoming_state_new(),
> since it allocates a new structure, rather than just initializing an
> already allocated one.
> 
> I guess you're trying to match migrate_init() in name, so I suppose
> migrate_incoming_init() would work as well.

No, you were right the first time: _new is more consistent; fixed.

Dave

* Re: [Qemu-devel] [PATCH v4 30/47] Postcopy: Maintain sentmap and calculate discard
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 30/47] Postcopy: Maintain sentmap and calculate discard Dr. David Alan Gilbert (git)
@ 2014-11-05  6:38   ` David Gibson
  2014-12-17 16:48     ` Dr. David Alan Gilbert
From: David Gibson @ 2014-11-05  6:38 UTC
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On Fri, Oct 03, 2014 at 06:47:36PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Where postcopy is preceded by a period of precopy, the destination will
> have received pages that may have been dirtied on the source after the
> page was sent.  The destination must throw these pages away before
> starting its CPUs.
> 
> Maintain a 'sentmap' of pages that have already been sent.
> Calculate list of sent & dirty pages
> Provide helpers on the destination side to discard these.

I find this one really hard to wrap my head around, and I'm having
trouble putting my finger on why.

I do wonder if the "base + tiny bitmap" encoding for the discard list
over the wire is the best choice.  It seems to involve a bunch of
rather tedious code rejigging the bitmap into 32-bit chunks, and a
bunch of rather hard to follow code moving back and forth between that
encoding and simple address or page ranges for handling the actual
discards.  It also involves sending the bit offsets for the start of
each ram block over the wire, which feels like it should be an
internal detail.

Would just a simple list of start..end or start/len pairs end up
simpler overall?  Converting the bitmap used to track it on the
source into ranges would be a little fiddly, but I suspect less so
than the code to split into 32-bit pieces.
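A minimal sketch of the bitmap-to-ranges conversion suggested above, in plain C. `PageRange` and `bitmap_to_ranges()` are hypothetical names, not QEMU code; the bitmap is assumed to be an array of `unsigned long` with bit `n` standing for page `n` (the usual QEMU/Linux bitmap layout).

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range type: one run of consecutive set bits (page indices). */
typedef struct {
    size_t start;   /* first page index in the run */
    size_t len;     /* number of pages in the run */
} PageRange;

/* Test bit 'nr' in a bitmap stored as an array of unsigned longs. */
static int bitmap_test(const unsigned long *map, size_t nr)
{
    const size_t bits = sizeof(unsigned long) * CHAR_BIT;
    return (int)((map[nr / bits] >> (nr % bits)) & 1UL);
}

/*
 * Walk 'nbits' bits of 'map' and emit each run of set bits as a
 * (start, len) pair into 'out', which has room for 'max_out' entries.
 * Returns the total number of runs found; if that exceeds max_out,
 * only the first max_out runs were stored.
 */
static size_t bitmap_to_ranges(const unsigned long *map, size_t nbits,
                               PageRange *out, size_t max_out)
{
    size_t count = 0;
    size_t i = 0;

    while (i < nbits) {
        /* Skip clear bits. */
        while (i < nbits && !bitmap_test(map, i)) {
            i++;
        }
        if (i == nbits) {
            break;
        }
        /* Measure the run of set bits starting at i. */
        size_t start = i;
        while (i < nbits && bitmap_test(map, i)) {
            i++;
        }
        if (count < max_out) {
            out[count].start = start;
            out[count].len = i - start;
        }
        count++;
    }
    return count;
}
```

Runs that straddle a word boundary come out as a single pair, so no per-word splitting is needed. The trade-off is that a heavily fragmented bitmap can yield more pairs than a chunked bitmap encoding would; which is smaller on the wire depends on the dirty pattern.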

It might also be a bit more robust against possible future options for
source host vs. dest host vs. target page size, since the source can
construct it in terms of its granularity constraints, and the destination
can round each chunk out to its own granularity.
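A sketch of the destination-side rounding described above, assuming each received chunk is a byte offset plus a length and that the destination's page size is a power of two. `ByteRange` and `round_out_to_pagesize()` are hypothetical names. Rounding *out* is the conservative direction for a discard, provided the source will resend any page the destination later requests: an extra discarded page just gets fetched again, whereas rounding in could leave a stale page mapped.

```c
#include <stdint.h>

/* Hypothetical byte range as it might arrive over the wire. */
typedef struct {
    uint64_t start;  /* byte offset */
    uint64_t len;    /* length in bytes */
} ByteRange;

/*
 * Expand 'r' so it covers whole pages of 'pagesize' bytes
 * ('pagesize' must be a power of two).
 */
static ByteRange round_out_to_pagesize(ByteRange r, uint64_t pagesize)
{
    uint64_t mask = pagesize - 1;
    uint64_t first = r.start & ~mask;                  /* round start down */
    uint64_t end = (r.start + r.len + mask) & ~mask;   /* round end up */
    ByteRange out;

    out.start = first;
    out.len = end - first;
    return out;
}
```

For example, a chunk at 0x1800 of length 0x1000 on a 4k-page destination becomes 0x1000 of length 0x2000, and the same helper handles a 2MB host page size unchanged.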

* Re: [Qemu-devel] [PATCH v4 31/47] postcopy: Incoming initialisation
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 31/47] postcopy: Incoming initialisation Dr. David Alan Gilbert (git)
@ 2014-11-05  6:47   ` David Gibson
  2014-12-17 17:21     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 204+ messages in thread
From: David Gibson @ 2014-11-05  6:47 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On Fri, Oct 03, 2014 at 06:47:37PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  arch_init.c                      |  11 ++++
>  include/migration/migration.h    |   1 +
>  include/migration/postcopy-ram.h |  12 +++++
>  migration.c                      |   1 +
>  postcopy-ram.c                   | 110 ++++++++++++++++++++++++++++++++++++++-
>  savevm.c                         |   4 ++
>  6 files changed, 138 insertions(+), 1 deletion(-)
> 
> diff --git a/arch_init.c b/arch_init.c
> index 030d189..4a03171 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -1345,6 +1345,17 @@ void ram_handle_compressed(void *host, uint8_t ch, uint64_t size)
>      }
>  }
>  
> +/*
> + * Allocate data structures etc needed by incoming migration with postcopy-ram
> + * postcopy-ram's similarly named postcopy_ram_incoming_init does the work
> + */
> +int ram_postcopy_incoming_init(MigrationIncomingState *mis)
> +{
> +    size_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
> +
> +    return postcopy_ram_incoming_init(mis, ram_pages);
> +}

Um.. yeah.  I'm sure ram_postcopy_incoming_init versus
postcopy_ram_incoming_init won't get confusing o_O.

[snip]
> +/*
> + * Setup an area of RAM so that it *can* be used for postcopy later; this
> + * must be done right at the start prior to pre-copy.
> + * opaque should be the MIS.
> + */
> +static int init_area(const char *block_name, void *host_addr,
> +                     ram_addr_t offset, ram_addr_t length, void *opaque)
> +{
> +    MigrationIncomingState *mis = opaque;
> +
> +    DPRINTF("init_area: %s: %p offset=%zx length=%zd(%zx)",
> +            block_name, host_addr, offset, length, length);
> +    /*
> +     * We need the whole of RAM to be truly empty for postcopy, so things
> +     * like ROMs and any data tables built during init must be zero'd
> +     * - we're going to get the copy from the source anyway.
> +     */
> +    if (postcopy_ram_discard_range(mis, host_addr, (host_addr + length - 1))) {
> +        return -1;
> +    }
> +
> +    /*
> +     * We also need the area to be normal 4k pages, not huge pages
> +     * (otherwise we can't be sure we can use remap_anon_pages to put
> +     * a 4k page in later).  THP might come along and map a 2MB page
> +     * and when it's partially accessed in precopy it might not break
> +     * it down, but leave a 2MB zero'd page.
> +     */
> +    if (madvise(host_addr, length, MADV_NOHUGEPAGE)) {
> +        perror("init_area: NOHUGEPAGE");
> +        return -1;
> +    }

I'm assuming this is because remap_anon_pages() can't automatically
split a THP itself.  It's not immediately obvious to me why it can't
though.

Also.. what effect will this have on an actual hugetlbfs memory
region?  If there's code to handle that case I haven't spotted it yet.

> +
> +    return 0;
> +}
> +
> +/*
> + * At the end of migration, undo the effects of init_area
> + * opaque should be the MIS.
> + */
> +static int cleanup_area(const char *block_name, void *host_addr,
> +                        ram_addr_t offset, ram_addr_t length, void *opaque)
> +{
> +    /* Turn off userfault here as well? */

This comment appears to be obsoleted by the code below.

> +
> +    DPRINTF("cleanup_area: %s: %p offset=%zx length=%zd(%zx)",
> +            block_name, host_addr, offset, length, length);
> +    /*
> +     * We turned off hugepage for the precopy stage with postcopy enabled
> +     * we can turn it back on now.
> +     */
> +    if (madvise(host_addr, length, MADV_HUGEPAGE)) {
> +        perror("init_area: HUGEPAGE");
> +        return -1;
> +    }
> +
> +    /*
> +     * We can also turn off userfault now since we should have all the
> +     * pages.   It can be useful to leave it on to debug postcopy
> +     * if you're not sure it's always getting every page.
> +     */
> +    if (madvise(host_addr, length, MADV_NOUSERFAULT)) {
> +        perror("init_area: NOUSERFAULT");
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +/*
> + * Initialise postcopy-ram, setting the RAM to a state where we can go into
> + * postcopy later; must be called prior to any precopy.
> + * called from arch_init's similarly named ram_postcopy_incoming_init
> + */
> +int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages)
> +{
> +    postcopy_pmi_init(mis, ram_pages);
> +
> +    if (qemu_ram_foreach_block(init_area, mis)) {
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +/*
> + * At the end of a migration where postcopy_ram_incoming_init was called.
> + */
> +int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
> +{
> +    /* TODO: Join the fault thread once we're sure it will exit */
> +    if (qemu_ram_foreach_block(cleanup_area, mis)) {
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
>  #else
>  /* No target OS support, stubs just fail */
>  
> @@ -404,6 +501,17 @@ void postcopy_hook_early_receive(MigrationIncomingState *mis,
>      /* We don't support postcopy so don't care */
>  }
>  
> +int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages)
> +{
> +    error_report("postcopy_ram_incoming_init: No OS support");
> +    return -1;
> +}
> +
> +int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
> +{
> +    assert(0);
> +}
> +
>  void postcopy_pmi_destroy(MigrationIncomingState *mis)
>  {
>      /* Called in normal cleanup path - so it's OK */
> diff --git a/savevm.c b/savevm.c
> index 7f9e0b2..54bdb26 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -1166,6 +1166,10 @@ static int loadvm_postcopy_ram_handle_advise(MigrationIncomingState *mis,
>          return -1;
>      }
>  
> +    if (ram_postcopy_incoming_init(mis)) {
> +        return -1;
> +    }
> +
>      mis->postcopy_ram_state = POSTCOPY_RAM_INCOMING_ADVISE;
>  
>      /*

* Re: [Qemu-devel] [PATCH v4 32/47] postcopy: ram_enable_notify to switch on userfault
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 32/47] postcopy: ram_enable_notify to switch on userfault Dr. David Alan Gilbert (git)
  2014-10-04 16:42   ` Paolo Bonzini
@ 2014-11-05  6:49   ` David Gibson
  2014-11-19 18:59     ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 204+ messages in thread
From: David Gibson @ 2014-11-05  6:49 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On Fri, Oct 03, 2014 at 06:47:38PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  include/migration/migration.h    |  2 ++
>  include/migration/postcopy-ram.h |  6 +++++
>  postcopy-ram.c                   | 49 +++++++++++++++++++++++++++++++++++++++-
>  savevm.c                         |  9 ++++++++
>  4 files changed, 65 insertions(+), 1 deletion(-)
> 
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index be63c89..b01cc17 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -87,6 +87,8 @@ struct MigrationIncomingState {
>          POSTCOPY_RAM_INCOMING_END
>      } postcopy_ram_state;
>  
> +    /* For the kernel to send us notifications */
> +    int            userfault_fd;
>      QEMUFile *return_path;
>      QemuMutex      rp_mutex;    /* We send replies from multiple threads */
>      PostcopyPMI    postcopy_pmi;
> diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
> index 8f237a2..413b670 100644
> --- a/include/migration/postcopy-ram.h
> +++ b/include/migration/postcopy-ram.h
> @@ -19,6 +19,12 @@
>  int postcopy_ram_hosttest(void);
>  
>  /*
> + * Make all of RAM sensitive to accesses to areas that haven't yet been written
> + * and wire up anything necessary to deal with it.
> + */
> +int postcopy_ram_enable_notify(MigrationIncomingState *mis);
> +
> +/*
>   * Initialise postcopy-ram, setting the RAM to a state where we can go into
>   * postcopy later; must be called prior to any precopy.
>   * called from arch_init's similarly named ram_postcopy_incoming_init
> diff --git a/postcopy-ram.c b/postcopy-ram.c
> index 8eccf26..925ac77 100644
> --- a/postcopy-ram.c
> +++ b/postcopy-ram.c
> @@ -485,9 +485,51 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
>      return 0;
>  }
>  
> +/*
> + * Mark the given area of RAM as requiring notification to unwritten areas
> + * Used as a  callback on qemu_ram_foreach_block.
> + *   host_addr: Base of area to mark
> + *   offset: Offset in the whole ram arena
> + *   length: Length of the section
> + *   opaque: Unused

                ^^^^^^
This appears to be wrong - opaque is used to find the MIS.
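A minimal sketch of the callback convention being described: the iterator forwards an opaque pointer unchanged, and the callback's doc comment should say what that pointer is. `Block`, `IncomingCtx`, `foreach_block()` and `account_area()` are hypothetical stand-ins for RAMBlock, MigrationIncomingState, qemu_ram_foreach_block() and a per-area callback; only the argument order mirrors the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for RAMBlock and MigrationIncomingState. */
typedef struct {
    const char *name;
    void *host;
    uint64_t offset;
    uint64_t length;
} Block;

typedef struct {
    int userfault_fd;     /* would hold the real fd in QEMU */
    uint64_t bytes_seen;  /* lets the example show opaque being used */
} IncomingCtx;

typedef int (*BlockIterFunc)(const char *block_name, void *host_addr,
                             uint64_t offset, uint64_t length, void *opaque);

/* Call fn on each block in turn, stopping at the first non-zero return. */
static int foreach_block(const Block *blocks, size_t n,
                         BlockIterFunc fn, void *opaque)
{
    for (size_t i = 0; i < n; i++) {
        int ret = fn(blocks[i].name, blocks[i].host, blocks[i].offset,
                     blocks[i].length, opaque);
        if (ret) {
            return ret;
        }
    }
    return 0;
}

/*
 * Example callback.
 *   opaque: the IncomingCtx passed to foreach_block (so it is NOT unused)
 * Returns 0 on success.
 */
static int account_area(const char *block_name, void *host_addr,
                        uint64_t offset, uint64_t length, void *opaque)
{
    IncomingCtx *ctx = opaque;

    (void)block_name;
    (void)host_addr;
    (void)offset;
    ctx->bytes_seen += length;
    return 0;
}
```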

> + * Returns 0 on success
> + */
> +static int postcopy_ram_sensitise_area(const char *block_name, void *host_addr,
> +                                       ram_addr_t offset, ram_addr_t length,
> +                                       void *opaque)
> +{
> +    MigrationIncomingState *mis = opaque;
> +    uint64_t tokern[2];

"tokern"?
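A sketch of the same 16-byte registration message with more descriptive names. It only mirrors the encoding visible in this patch (start address with bit 0 set meaning "register area", followed by the end address); the kernel interface here is the experimental one from the linked series, and `UfaultRangeMsg`, `UFAULT_REGISTER_FLAG` and the helpers are hypothetical. The encoding assumes `host_addr` is at least 2-byte aligned (in practice page aligned), so bit 0 is free for the flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of the start word: 1 = register this range (per the patch comment). */
#define UFAULT_REGISTER_FLAG 1ULL

typedef struct {
    uint64_t words[2];  /* [0] = start | flag, [1] = end (start + length) */
} UfaultRangeMsg;

static UfaultRangeMsg ufault_register_msg(uintptr_t host_addr, uint64_t length)
{
    UfaultRangeMsg m;

    assert((host_addr & 1) == 0);  /* the flag needs bit 0 to be free */
    m.words[0] = (uint64_t)host_addr | UFAULT_REGISTER_FLAG;
    m.words[1] = (uint64_t)host_addr + length;
    return m;
}

static bool ufault_msg_is_register(const UfaultRangeMsg *m)
{
    return (m->words[0] & UFAULT_REGISTER_FLAG) != 0;
}

static uint64_t ufault_msg_start(const UfaultRangeMsg *m)
{
    return m->words[0] & ~UFAULT_REGISTER_FLAG;
}
```

With a named struct, the write can use `sizeof(msg.words)` instead of the literal 16.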

> +
> +    if (madvise(host_addr, length, MADV_USERFAULT)) {
> +        perror("postcopy_ram_sensitise_area madvise");
> +        return -1;
> +    }
> +
> +    /* Now tell our userfault_fd that it's responsible for this area */
> +    tokern[0] = (uint64_t)(uintptr_t)host_addr | 1; /* 1 means register area */
> +    tokern[1] = (uint64_t)(uintptr_t)host_addr + length;
> +    if (write(mis->userfault_fd, tokern, 16) != 16) {
> +        perror("postcopy_ram_sensitise_area write");
> +        madvise(host_addr, length, MADV_NOUSERFAULT);
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +int postcopy_ram_enable_notify(MigrationIncomingState *mis)
> +{
> +    /* Mark so that we get notified of accesses to unwritten areas */
> +    if (qemu_ram_foreach_block(postcopy_ram_sensitise_area, mis)) {
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
>  #else
>  /* No target OS support, stubs just fail */
> -
>  int postcopy_ram_hosttest(void)
>  {
>      error_report("postcopy_ram_hosttest: No OS support");
> @@ -528,6 +570,11 @@ int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
>  {
>      assert(0);
>  }
> +
> +int postcopy_ram_enable_notify(MigrationIncomingState *mis)
> +{
> +    assert(0);
> +}
>  #endif
>  
>  /* ------------------------------------------------------------------------- */
> diff --git a/savevm.c b/savevm.c
> index 54bdb26..859c96f 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -1304,6 +1304,15 @@ static int loadvm_postcopy_ram_handle_listen(MigrationIncomingState *mis)
>  
>      mis->postcopy_ram_state = POSTCOPY_RAM_INCOMING_LISTENING;
>  
> +    /*
> +     * Sensitise RAM - can now generate requests for blocks that don't exist
> +     * However, at this point the CPU shouldn't be running, and the IO
> +     * shouldn't be doing anything yet so don't actually expect requests
> +     */
> +    if (postcopy_ram_enable_notify(mis)) {
> +        return -1;
> +    }
> +
>      /* TODO start up the postcopy listening thread */
>      return 0;
>  }

* Re: [Qemu-devel] [PATCH v4 33/47] Postcopy: Postcopy startup in migration thread
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 33/47] Postcopy: Postcopy startup in migration thread Dr. David Alan Gilbert (git)
  2014-10-04 16:27   ` Paolo Bonzini
@ 2014-11-10  6:05   ` David Gibson
  2015-01-05 16:06     ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 204+ messages in thread
From: David Gibson @ 2014-11-10  6:05 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On Fri, Oct 03, 2014 at 06:47:39PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Rework the migration thread to setup and start postcopy.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  include/migration/migration.h |   3 +
>  migration.c                   | 201 ++++++++++++++++++++++++++++++++++++++----
>  2 files changed, 185 insertions(+), 19 deletions(-)
> 
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index b01cc17..f401775 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -125,6 +125,9 @@ struct MigrationState
>      /* Flag set once the migration has been asked to enter postcopy */
>      volatile bool start_postcopy;
>  
> +    /* Flag set once the migration thread is running (and needs joining) */
> +    volatile bool started_migration_thread;
> +
>      /* bitmap of pages that have been sent at least once
>       * only maintained and used in postcopy at the moment
>       * where it's used to send the dirtymap at the start
> diff --git a/migration.c b/migration.c
> index 63d70b6..1731017 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -475,7 +475,10 @@ static void migrate_fd_cleanup(void *opaque)
>      if (s->file) {
>          trace_migrate_fd_cleanup();
>          qemu_mutex_unlock_iothread();
> -        qemu_thread_join(&s->thread);
> +        if (s->started_migration_thread) {
> +            qemu_thread_join(&s->thread);
> +            s->started_migration_thread = false;
> +        }
>          qemu_mutex_lock_iothread();
>  
>          qemu_fclose(s->file);
> @@ -872,7 +875,6 @@ out:
>      return NULL;
>  }
>  
> -__attribute__ (( unused )) /* Until later in patch series */
>  static int open_outgoing_return_path(MigrationState *ms)
>  {
>  
> @@ -890,7 +892,6 @@ static int open_outgoing_return_path(MigrationState *ms)
>      return 0;
>  }
>  
> -__attribute__ (( unused )) /* Until later in patch series */
>  static void await_outgoing_return_path_close(MigrationState *ms)
>  {
>      /*
> @@ -908,6 +909,97 @@ static void await_outgoing_return_path_close(MigrationState *ms)
>      DPRINTF("%s: Exit", __func__);
>  }
>  
> +/* Switch from normal iteration to postcopy
> + * Returns non-0 on error
> + */
> +static int postcopy_start(MigrationState *ms)
> +{
> +    int ret;
> +    const QEMUSizedBuffer *qsb;
> +    migrate_set_state(ms, MIG_STATE_ACTIVE, MIG_STATE_POSTCOPY_ACTIVE);
> +
> +    DPRINTF("postcopy_start\n");
> +    qemu_mutex_lock_iothread();
> +    DPRINTF("postcopy_start: setting run state\n");
> +    ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> +
> +    if (ret < 0) {
> +        migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
> +        qemu_mutex_unlock_iothread();
> +        return -1;
> +    }
> +
> +    /*
> +     * in Finish migrate and with the io-lock held everything should
> +     * be quiet, but we've potentially still got dirty pages and we
> +     * need to tell the destination to throw any pages it's already received
> +     * that are dirty
> +     */
> +    if (ram_postcopy_send_discard_bitmap(ms)) {
> +        DPRINTF("postcopy send discard bitmap failed\n");
> +        migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
> +        qemu_mutex_unlock_iothread();
> +        return -1;
> +    }
> +
> +    DPRINTF("postcopy_start: sending req 2\n");
> +    qemu_savevm_send_reqack(ms->file, 2);

Are these reqacks just for debugging, or do they affect the protocol?

> +    /*
> +     * send rest of state - note things that are doing postcopy
> +     * will notice we're in MIG_STATE_POSTCOPY_ACTIVE and not actually
> +     * wrap their state up here
> +     */
> +    qemu_file_set_rate_limit(ms->file, INT64_MAX);
> +    DPRINTF("postcopy_start: do state_complete\n");
> +
> +    /*
> +     * We need to leave the fd free for page transfers during the
> +     * loading of the device state, so wrap all the remaining
> +     * commands and state into a package that gets sent in one go
> +     */
> +    QEMUFile *fb = qemu_bufopen("w", NULL);
> +    if (!fb) {
> +        error_report("Failed to create buffered file");
> +        migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
> +        qemu_mutex_unlock_iothread();
> +        return -1;
> +    }
> +
> +    qemu_savevm_state_complete(fb);
> +    DPRINTF("postcopy_start: sending req 3\n");
> +    qemu_savevm_send_reqack(fb, 3);
> +
> +    qemu_savevm_send_postcopy_ram_run(fb);
> +
> +    /* <><> end of stuff going into the package */
> +    qsb = qemu_buf_get(fb);
> +
> +    /* Now send that blob */
> +    if (qsb_get_length(qsb) > MAX_VM_CMD_PACKAGED_SIZE) {
> +        DPRINTF("postcopy_start: Unreasonably large packaged state: %lu\n",
> +                (unsigned long)(qsb_get_length(qsb)));
> +        migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
> +        qemu_mutex_unlock_iothread();
> +        qemu_fclose(fb);
> +        return -1;
> +    }
> +    qemu_savevm_send_packaged(ms->file, qsb);
> +    qemu_fclose(fb);
> +
> +    qemu_mutex_unlock_iothread();
> +
> +    DPRINTF("postcopy_start not finished sending ack\n");
> +    qemu_savevm_send_reqack(ms->file, 4);
> +
> +    ret = qemu_file_get_error(ms->file);
> +    if (ret) {
> +        error_report("postcopy_start: Migration stream errored");
> +        migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
> +    }
> +
> +    return ret;
> +}
> +
>  /*
>   * Master migration thread on the source VM.
>   * It drives the migration and pumps the data down the outgoing channel.
> @@ -915,16 +1007,36 @@ static void await_outgoing_return_path_close(MigrationState *ms)
>  static void *migration_thread(void *opaque)
>  {
>      MigrationState *s = opaque;
> +    /* Used by the bandwidth calcs, updated later */
>      int64_t initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>      int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>      int64_t initial_bytes = 0;
>      int64_t max_size = 0;
>      int64_t start_time = initial_time;
> +
>      bool old_vm_running = false;
>  
> +    /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */
> +    enum MigrationPhase current_active_type = MIG_STATE_ACTIVE;
> +
>      qemu_savevm_state_begin(s->file, &s->params);
>  
> +    if (migrate_postcopy_ram()) {
> +        /* Now tell the dest that it should open it's end so it can reply */

s/it's/its/

> +        qemu_savevm_send_openrp(s->file);
> +
> +        /* And ask it to send an ack that will make stuff easier to debug */
> +        qemu_savevm_send_reqack(s->file, 1);
> +
> +        /* Tell the destination that we *might* want to do postcopy later;
> +         * if the other end can't do postcopy it should fail now, nice and
> +         * early.
> +         */
> +        qemu_savevm_send_postcopy_ram_advise(s->file);
> +    }
> +
>      s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
> +    current_active_type = MIG_STATE_ACTIVE;
>      migrate_set_state(s, MIG_STATE_SETUP, MIG_STATE_ACTIVE);
>  
>      DPRINTF("setup complete\n");
> @@ -945,37 +1057,74 @@ static void *migration_thread(void *opaque)
>                      " nonpost=%" PRIu64 ")\n",
>                      pending_size, max_size, pend_post, pend_nonpost);
>              if (pending_size && pending_size >= max_size) {
> +                /* Still a significant amount to transfer */
> +
> +                current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +                if (migrate_postcopy_ram() &&
> +                    s->state != MIG_STATE_POSTCOPY_ACTIVE &&
> +                    pend_nonpost == 0 && s->start_postcopy) {

Hrm.  This is checking for pend_nonpost == 0, rather than just close
to zero.  IIUC this will only work if all "live sendable" state is
also postcopyable.  But if we have live sendable data that's not
postcopyable - like the POWER hash page table - we'll need some
threshold here, like we currently have for entering the stopped vm
phase of a precopy migration.

Or am I missing something?
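A sketch of the threshold variant being suggested. `should_start_postcopy()` is a hypothetical helper; using `max_size` (the amount that fits in the downtime budget) as the limit is an assumption that mirrors the precopy completion test, not what the patch does.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: decide whether to switch to postcopy.
 *   pend_nonpost: bytes of live state that cannot be postcopied
 *   max_size:     bytes that fit in the downtime budget (assumed to be
 *                 the same limit precopy uses to decide when to stop)
 * Rather than waiting for pend_nonpost to reach exactly zero, switch once
 * the remainder is small enough to send while the VM is stopped.
 */
static bool should_start_postcopy(bool postcopy_enabled, bool in_postcopy,
                                  bool start_requested,
                                  uint64_t pend_nonpost, uint64_t max_size)
{
    if (!postcopy_enabled || in_postcopy || !start_requested) {
        return false;
    }
    return pend_nonpost <= max_size;
}
```

Any non-postcopiable remainder would then be wrapped up by qemu_savevm_state_complete() inside the package built in postcopy_start(), so the threshold would also need to keep that package under MAX_VM_CMD_PACKAGED_SIZE.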

> +
> +                    if (!postcopy_start(s)) {
> +                        current_active_type = MIG_STATE_POSTCOPY_ACTIVE;
> +                    }
> +
> +                    continue;
> +                }
> +                /* Just another iteration step */
>                  qemu_savevm_state_iterate(s->file);
>              } else {
>                  int ret;
>  
> -                DPRINTF("done iterating\n");
> -                qemu_mutex_lock_iothread();
> -                start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> -                qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
> -                old_vm_running = runstate_is_running();
> +                DPRINTF("done iterating pending size %" PRIu64 "\n",
> +                        pending_size);
> +
> +                if (s->state == MIG_STATE_ACTIVE) {
> +                    qemu_mutex_lock_iothread();
> +                    start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +                    qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
> +                    old_vm_running = runstate_is_running();
> +
> +                    ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> +                    if (ret >= 0) {
> +                        qemu_file_set_rate_limit(s->file, INT64_MAX);
> +                        qemu_savevm_state_complete(s->file);
> +                    }
> +                    qemu_mutex_unlock_iothread();
> +
> +                    if (ret < 0) {
> +                        migrate_set_state(s, current_active_type,
> +                                          MIG_STATE_ERROR);
> +                        break;
> +                    }
> +                } else if (s->state == MIG_STATE_POSTCOPY_ACTIVE) {
> +                    DPRINTF("postcopy end\n");
> +
> +                    qemu_savevm_state_postcopy_complete(s->file);
> +                    DPRINTF("postcopy end after complete\n");
>  
> -                ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> -                if (ret >= 0) {
> -                    qemu_file_set_rate_limit(s->file, INT64_MAX);
> -                    qemu_savevm_state_complete(s->file);
>                  }
> -                qemu_mutex_unlock_iothread();
>  
> -                if (ret < 0) {
> -                    migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_ERROR);
> -                    break;
> +                /*
> +                 * If rp was opened we must clean up the thread before
> +                 * cleaning everything else up.
> +                 * Postcopy opens rp if enabled (even if it's not activated)
> +                 */
> +                if (migrate_postcopy_ram()) {
> +                    DPRINTF("before rp close");
> +                    await_outgoing_return_path_close(s);
> +                    DPRINTF("after rp close");
>                  }
> -
>                  if (!qemu_file_get_error(s->file)) {
> -                    migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_COMPLETED);
> +                    migrate_set_state(s, current_active_type,
> +                                      MIG_STATE_COMPLETED);
>                      break;
>                  }
>              }
>          }
>  
>          if (qemu_file_get_error(s->file)) {
> -            migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_ERROR);
> +            migrate_set_state(s, current_active_type, MIG_STATE_ERROR);
> +            DPRINTF("migration_thread: file is in error state\n");
>              break;
>          }
>          current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> @@ -1006,6 +1155,7 @@ static void *migration_thread(void *opaque)
>          }
>      }
>  
> +    DPRINTF("migration_thread: After loop");
>      qemu_mutex_lock_iothread();
>      if (s->state == MIG_STATE_COMPLETED) {
>          int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> @@ -1043,6 +1193,19 @@ void migrate_fd_connect(MigrationState *s)
>      /* Notify before starting migration thread */
>      notifier_list_notify(&migration_state_notifiers, s);
>  
> +    /* Open the return path; currently for postcopy but other things might
> +     * also want it.
> +     */
> +    if (migrate_postcopy_ram()) {
> +        if (open_outgoing_return_path(s)) {
> +            error_report("Unable to open return-path for postcopy");
> +            migrate_set_state(s, MIG_STATE_SETUP, MIG_STATE_ERROR);
> +            migrate_fd_cleanup(s);
> +            return;
> +        }
> +    }
> +
>      qemu_thread_create(&s->thread, "migration", migration_thread, s,
>                         QEMU_THREAD_JOINABLE);
> +    s->started_migration_thread = true;
>  }

* Re: [Qemu-devel] [PATCH v4 34/47] Postcopy: Create a fault handler thread before marking the ram as userfault
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 34/47] Postcopy: Create a fault handler thread before marking the ram as userfault Dr. David Alan Gilbert (git)
@ 2014-11-10  6:10   ` David Gibson
  2014-11-19 18:56     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 204+ messages in thread
From: David Gibson @ 2014-11-10  6:10 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On Fri, Oct 03, 2014 at 06:47:40PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

This could do with a bit more rationale in the commit message.

Also is there a reason not to fold this with the patch originally
marking the RAM as userfault?  IIRC that one wasn't particularly long
either.

Otherwise

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

* Re: [Qemu-devel] [PATCH v4 35/47] Page request: Add MIG_RPCOMM_REQPAGES reverse command
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 35/47] Page request: Add MIG_RPCOMM_REQPAGES reverse command Dr. David Alan Gilbert (git)
@ 2014-11-10  6:19   ` David Gibson
  2014-11-19 20:01     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 204+ messages in thread
From: David Gibson @ 2014-11-10  6:19 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On Fri, Oct 03, 2014 at 06:47:41PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Add MIG_RPCOMM_REQPAGES command on Return path for the postcopy
> destination to request a page from the source.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  include/migration/migration.h |  3 ++
>  migration.c                   | 74 +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 77 insertions(+)
> 
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index cdd0e56..5e0d30d 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -45,6 +45,7 @@ enum mig_rpcomm_cmd {
>      MIG_RPCOMM_INVALID = 0,  /* Must be 0 */
>      MIG_RPCOMM_SHUT,         /* sibling will not send any more RP messages */
>      MIG_RPCOMM_ACK,          /* data (seq: be32 ) */
> +    MIG_RPCOMM_REQPAGES,     /* data (start: be64, len: be64) */
>      MIG_RPCOMM_AFTERLASTVALID
>  };
>  
> @@ -250,6 +251,8 @@ void migrate_send_rp_shut(MigrationIncomingState *mis,
>                            uint32_t value);
>  void migrate_send_rp_ack(MigrationIncomingState *mis,
>                           uint32_t value);
> +void migrate_send_rp_reqpages(MigrationIncomingState *mis, const char* rbname,
> +                              ram_addr_t start, ram_addr_t len);
>  
>  
>  void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
> diff --git a/migration.c b/migration.c
> index 1731017..cfdaa52 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -144,6 +144,38 @@ void migrate_send_rp_ack(MigrationIncomingState *mis,
>      migrate_send_rp_message(mis, MIG_RPCOMM_ACK, 4, (uint8_t *)&buf);
>  }
>  
> +/* Request a range of pages from the source VM at the given
> + * start address.
> + *   rbname: Name of the RAMBlock to request the page in, if NULL it's the same
> + *           as the last request (a name must have been given previously)
> + *   Start: Address offset within the RB
> + *   Len: Length in bytes required - must be a multiple of pagesize
> + */
> +void migrate_send_rp_reqpages(MigrationIncomingState *mis, const char *rbname,
> +                              ram_addr_t start, ram_addr_t len)
> +{
> +    uint8_t bufc[16+1+255]; /* start (8 byte), len (8 byte), rbname upto 256 */
> +    uint64_t *buf64 = (uint64_t *)bufc;
> +    size_t msglen = 16; /* start + len */
> +
> +    assert(!(len & 1));
> +    if (rbname) {
> +        int rbname_len = strlen(rbname);
> +        assert(rbname_len < 256);
> +
> +        len |= 1; /* Flag to say we've got a name */
> +        bufc[msglen++] = rbname_len;
> +        memcpy(bufc + msglen, rbname, rbname_len);
> +        msglen += rbname_len;
> +    }
> +
> +    buf64[0] = (uint64_t)start;
> +    buf64[0] = cpu_to_be64(buf64[0]);

I think this would be clearer as well as less verbose, as just:
	buf64[0] = cpu_to_be64(start);
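For illustration, here is a standalone sketch of the payload this function builds. It consists of a be64 start and a be64 len, with bit 0 of len set when a one-byte length and the RAMBlock name follow. `put_be64()` is a hypothetical stand-in for QEMU's cpu_to_be64() plus a store, and `encode_reqpages()` is likewise a made-up name. Because the byte swap is folded into the store, the separate cast-then-convert step disappears.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper standing in for cpu_to_be64(): store v big-endian. */
static void put_be64(uint8_t *p, uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        p[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

/*
 * Sketch (hypothetical name) of the MIG_RPCOMM_REQPAGES payload:
 * start (be64), len (be64) and, if bit 0 of len is set, a one-byte
 * name length followed by the name. buf must hold 16 + 1 + 255 bytes.
 * Returns the message length.
 */
static size_t encode_reqpages(uint8_t *buf, const char *rbname,
                              uint64_t start, uint64_t len)
{
    size_t msglen = 16;

    assert(!(len & 1));           /* bit 0 is reserved for the name flag */
    if (rbname) {
        size_t rbname_len = strlen(rbname);

        assert(rbname_len < 256); /* must fit in the one-byte length */
        len |= 1;                 /* flag: a name follows */
        buf[msglen++] = (uint8_t)rbname_len;
        memcpy(buf + msglen, rbname, rbname_len);
        msglen += rbname_len;
    }
    put_be64(buf, start);         /* swap and store in one step */
    put_be64(buf + 8, len);
    return msglen;
}
```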

> +    buf64[1] = (uint64_t)len;
> +    buf64[1] = cpu_to_be64(buf64[1]);
> +    migrate_send_rp_message(mis, MIG_RPCOMM_REQPAGES, msglen, bufc);
> +}
> +
>  void qemu_start_incoming_migration(const char *uri, Error **errp)
>  {
>      const char *p;
> @@ -784,6 +816,17 @@ static void source_return_path_bad(MigrationState *s)
>  }
>  
>  /*
> + * Process a request for pages received on the return path,
> + * We're allowed to send more than requested (e.g. to round to our page size)
> + * and we don't need to send pages that have already been sent.
> + */
> +static void migrate_handle_rp_reqpages(MigrationState *ms, const char* rbname,
> +                                       ram_addr_t start, ram_addr_t len)
> +{
> +    DPRINTF("migrate_handle_rp_reqpages: at %zx for len %zx", start, len);
> +}
> +
> +/*
>   * Handles messages sent on the return path towards the source VM
>   *
>   */
> @@ -795,6 +838,8 @@ static void *source_return_path_thread(void *opaque)
>      const int max_len = 512;
>      uint8_t buf[max_len];
>      uint32_t tmp32;
> +    uint64_t tmp64a, tmp64b;

Hrm... calling everything "tmp*" doesn't help readability.
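To illustrate the naming point, here is a standalone sketch of the same decode using descriptive field names. It also returns as soon as the length is inconsistent. By contrast, the quoted dispatch code still calls migrate_handle_rp_reqpages() after source_return_path_bad(). `struct reqpages_msg`, `get_be64()` (standing in for be64_to_cpup()) and `decode_reqpages()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical decoded form of a MIG_RPCOMM_REQPAGES message. */
struct reqpages_msg {
    uint64_t start;
    uint64_t len;
    char rbname[256];   /* empty string: reuse the last RAMBlock */
};

/* Hypothetical helper standing in for be64_to_cpup(). */
static uint64_t get_be64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Returns 0 on success, -1 if the payload length is inconsistent. */
static int decode_reqpages(const uint8_t *buf, size_t header_len,
                           struct reqpages_msg *msg)
{
    size_t expected_len = 16;

    assert(buf && msg);
    if (header_len < 16) {
        return -1;
    }
    msg->start = get_be64(buf);
    msg->len = get_be64(buf + 8);
    msg->rbname[0] = '\0';

    if (msg->len & 1) {
        size_t rbname_len;

        msg->len &= ~(uint64_t)1;   /* strip the "name follows" flag */
        if (header_len < 17) {
            return -1;
        }
        rbname_len = buf[16];
        expected_len = 16 + 1 + rbname_len;
        if (header_len != expected_len) {
            return -1;              /* reject before touching the name */
        }
        memcpy(msg->rbname, buf + 17, rbname_len);
        msg->rbname[rbname_len] = '\0';
    }
    return header_len == expected_len ? 0 : -1;
}
```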

> +    char *tmpstr;
>      int res;
>  
>      DPRINTF("RP: %s entry", __func__);
> @@ -810,6 +855,11 @@ static void *source_return_path_thread(void *opaque)
>              expected_len = 4;
>              break;
>  
> +        case MIG_RPCOMM_REQPAGES:
> +            /* 16 byte start/len _possibly_ plus an id str */
> +            expected_len = 16 + 256;
> +            break;
> +
>          default:
>              error_report("RP: Received invalid cmd 0x%04x length 0x%04x",
>                      header_com, header_len);
> @@ -857,6 +907,30 @@ static void *source_return_path_thread(void *opaque)
>              atomic_xchg(&ms->rp_state.latest_ack, tmp32);
>              break;
>  
> +        case MIG_RPCOMM_REQPAGES:
> +            tmp64a = be64_to_cpup((uint64_t *)buf);  /* Start */
> +            tmp64b = be64_to_cpup(((uint64_t *)buf)+1); /* Len */
> +            tmpstr = NULL;
> +            if (tmp64b & 1) {
> +                tmp64b -= 1; /* Remove the flag */
> +                /* Now we expect an idstr */
> +                tmp32 = buf[16]; /* Length of the following idstr */
> +                tmpstr = (char *)&buf[17];
> +                buf[17+tmp32] = '\0';
> +                expected_len = 16+1+tmp32;
> +            } else {
> +                expected_len = 16;
> +            }
> +            if (header_len != expected_len) {
> +                error_report("RP: Received ReqPage with length %d expecting %d",
> +                        header_len, expected_len);
> +                source_return_path_bad(ms);
> +            }
> +            migrate_handle_rp_reqpages(ms, tmpstr,
> +                                          (ram_addr_t)tmp64a,
> +                                          (ram_addr_t)tmp64b);
> +            break;
> +
>          default:
>              /* This shouldn't happen because we should catch this above */
>              DPRINTF("RP: Bad header_com in dispatch");

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 204+ messages in thread

* Re: [Qemu-devel] [PATCH v4 36/47] Page request: Process incoming page request
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 36/47] Page request: Process incoming page request Dr. David Alan Gilbert (git)
  2014-10-08  2:31   ` zhanghailiang
@ 2014-11-10  6:31   ` David Gibson
  2014-11-17 19:07     ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 204+ messages in thread
From: David Gibson @ 2014-11-10  6:31 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

[-- Attachment #1: Type: text/plain, Size: 7789 bytes --]

On Fri, Oct 03, 2014 at 06:47:42PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> On receiving MIG_RPCOMM_REQPAGES look up the address and
> queue the page.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  arch_init.c                   | 52 +++++++++++++++++++++++++++++++++++++++++++
>  include/migration/migration.h | 21 +++++++++++++++++
>  include/qemu/typedefs.h       |  3 ++-
>  migration.c                   | 34 +++++++++++++++++++++++++++-
>  4 files changed, 108 insertions(+), 2 deletions(-)
> 
> diff --git a/arch_init.c b/arch_init.c
> index 4a03171..72f9e17 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -660,6 +660,58 @@ static int ram_save_page(QEMUFile *f, RAMBlock* block, ram_addr_t offset,
>  }
>  
>  /*
> + * Queue the pages for transmission, e.g. a request from postcopy destination
> + *   ms: MigrationStatus in which the queue is held
> + *   rbname: The RAMBlock the request is for - may be NULL (to mean reuse last)
> + *   start: Offset from the start of the RAMBlock
> + *   len: Length (in bytes) to send
> + *   Return: 0 on success
> + */
> +int ram_save_queue_pages(MigrationState *ms, const char *rbname,
> +                         ram_addr_t start, ram_addr_t len)
> +{
> +    RAMBlock *ramblock;
> +
> +    if (!rbname) {
> +        /* Reuse last RAMBlock */
> +        ramblock = ms->last_req_rb;
> +
> +        if (!ramblock) {
> +            error_report("ram_save_queue_pages no previous block");
> +            return -1;

This should be an assert(), shouldn't it?

> +        }
> +    } else {
> +        ramblock = ram_find_block(rbname);
> +
> +        if (!ramblock) {
> +            error_report("ram_save_queue_pages no block '%s'", rbname);
> +            return -1;
> +        }

And maybe this one too - I would have expected the rb names to have
already been validated on the source machine at this stage.
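For reference, the queue that ram_save_queue_pages() feeds, and that patch 37/47 drains one target page at a time, behaves roughly like the standalone sketch below. It makes several simplifying assumptions. It uses a plain singly linked list instead of QSIMPLEQ, omits src_page_req_mutex, and assumes a 4 KiB TARGET_PAGE_SIZE. The names `struct page_req`, `queue_pages()` and `unqueue_page()` are hypothetical. The bounds check is written so that start + len cannot overflow.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TARGET_PAGE_SIZE 4096u   /* assumption: 4 KiB target pages */

/* Hypothetical queued request: a byte range within one RAMBlock. */
struct page_req {
    uint64_t offset;             /* byte offset within the block */
    uint64_t len;                /* bytes still to send */
    struct page_req *next;
};

struct page_queue {
    struct page_req *head, *tail;
};

/* Append a request after checking it fits in a block of block_len bytes. */
static int queue_pages(struct page_queue *q, uint64_t block_len,
                       uint64_t start, uint64_t len)
{
    struct page_req *req;

    assert(q != NULL);
    if (len > block_len || start > block_len - len) {
        return -1;               /* request overruns the block */
    }
    req = calloc(1, sizeof(*req));
    if (!req) {
        return -1;
    }
    req->offset = start;
    req->len = len;
    if (q->tail) {
        q->tail->next = req;
    } else {
        q->head = req;
    }
    q->tail = req;
    return 0;
}

/* Pop one target page: returns 0 and sets *offset, or -1 if empty. */
static int unqueue_page(struct page_queue *q, uint64_t *offset)
{
    struct page_req *req = q->head;

    if (!req) {
        return -1;
    }
    *offset = req->offset;
    if (req->len > TARGET_PAGE_SIZE) {
        req->offset += TARGET_PAGE_SIZE;   /* keep the remainder queued */
        req->len -= TARGET_PAGE_SIZE;
    } else {
        q->head = req->next;
        if (!q->head) {
            q->tail = NULL;
        }
        free(req);
    }
    return 0;
}
```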

> +    }
> +    DPRINTF("ram_save_queue_pages: Block %s start %zx len %zx",
> +                    ramblock->idstr, start, len);
> +
> +    if (start+len > ramblock->length) {
> +        error_report("%s request overrun start=%zx len=%zx blocklen=%zx",
> +                     __func__, start, len, ramblock->length);
> +        return -1;
> +    }
> +
> +    struct MigrationSrcPageRequest *new_entry =
> +        g_malloc0(sizeof(struct MigrationSrcPageRequest));
> +    new_entry->rb = ramblock;
> +    new_entry->offset = start;
> +    new_entry->len = len;
> +    ms->last_req_rb = ramblock;
> +
> +    qemu_mutex_lock(&ms->src_page_req_mutex);
> +    QSIMPLEQ_INSERT_TAIL(&ms->src_page_requests, new_entry, next_req);
> +    qemu_mutex_unlock(&ms->src_page_req_mutex);
> +
> +    return 0;
> +}
> +
> +/*
>   * ram_find_and_save_block: Finds a page to send and sends it to f
>   *
>   * Returns:  The number of bytes written.
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index 5e0d30d..5bc01d5 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -102,6 +102,18 @@ MigrationIncomingState *migration_incoming_get_current(void);
>  MigrationIncomingState *migration_incoming_state_init(QEMUFile *f);
>  void migration_incoming_state_destroy(void);
>  
> +/*
> + * An outstanding page request, on the source, having been received
> + * and queued
> + */
> +struct MigrationSrcPageRequest {
> +    RAMBlock *rb;
> +    hwaddr    offset;
> +    hwaddr    len;
> +
> +    QSIMPLEQ_ENTRY(MigrationSrcPageRequest) next_req;
> +};
> +
>  struct MigrationState
>  {
>      int64_t bandwidth_limit;
> @@ -138,6 +150,12 @@ struct MigrationState
>       * of the postcopy phase
>       */
>      unsigned long *sentmap;
> +
> +    /* Queue of outstanding page requests from the destination */
> +    QemuMutex src_page_req_mutex;
> +    QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest) src_page_requests;
> +    /* The RAMBlock used in the last src_page_request */
> +    RAMBlock *last_req_rb;
>  };
>  
>  void process_incoming_migration(QEMUFile *f);
> @@ -273,4 +291,7 @@ size_t ram_control_save_page(QEMUFile *f, ram_addr_t block_offset,
>                               ram_addr_t offset, size_t size,
>                               int *bytes_sent);
>  
> +int ram_save_queue_pages(MigrationState *ms, const char *rbname,
> +                         ram_addr_t start, ram_addr_t len);
> +
>  #endif
> diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
> index 79f57c0..24c2207 100644
> --- a/include/qemu/typedefs.h
> +++ b/include/qemu/typedefs.h
> @@ -8,6 +8,7 @@ typedef struct QEMUTimerListGroup QEMUTimerListGroup;
>  typedef struct QEMUFile QEMUFile;
>  typedef struct QEMUBH QEMUBH;
>  
> +typedef struct AdapterInfo AdapterInfo;
>  typedef struct AioContext AioContext;
>  
>  typedef struct Visitor Visitor;
> @@ -80,6 +81,6 @@ typedef struct FWCfgState FWCfgState;
>  typedef struct PcGuestInfo PcGuestInfo;
>  typedef struct PostcopyPMI PostcopyPMI;
>  typedef struct Range Range;
> -typedef struct AdapterInfo AdapterInfo;
> +typedef struct RAMBlock RAMBlock;
>  
>  #endif /* QEMU_TYPEDEFS_H */
> diff --git a/migration.c b/migration.c
> index cfdaa52..63d7699 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -26,6 +26,8 @@
>  #include "qemu/thread.h"
>  #include "qmp-commands.h"
>  #include "trace.h"
> +#include "exec/memory.h"
> +#include "exec/address-spaces.h"
>  
>  //#define DEBUG_MIGRATION
>  
> @@ -504,6 +506,15 @@ static void migrate_fd_cleanup(void *opaque)
>  
>      migrate_fd_cleanup_src_rp(s);
>  
> +    /* This queue generally should be empty - but in the case of a failed
> +     * migration might have some droppings in.
> +     */
> +    struct MigrationSrcPageRequest *mspr, *next_mspr;
> +    QSIMPLEQ_FOREACH_SAFE(mspr, &s->src_page_requests, next_req, next_mspr) {
> +        QSIMPLEQ_REMOVE_HEAD(&s->src_page_requests, next_req);
> +        g_free(mspr);
> +    }
> +
>      if (s->file) {
>          trace_migrate_fd_cleanup();
>          qemu_mutex_unlock_iothread();
> @@ -610,6 +621,9 @@ MigrationState *migrate_init(const MigrationParams *params)
>      s->state = MIG_STATE_SETUP;
>      trace_migrate_set_state(MIG_STATE_SETUP);
>  
> +    qemu_mutex_init(&s->src_page_req_mutex);
> +    QSIMPLEQ_INIT(&s->src_page_requests);
> +
>      s->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>      return s;
>  }
> @@ -823,7 +837,25 @@ static void source_return_path_bad(MigrationState *s)
>  static void migrate_handle_rp_reqpages(MigrationState *ms, const char* rbname,
>                                         ram_addr_t start, ram_addr_t len)
>  {
> -    DPRINTF("migrate_handle_rp_reqpages: at %zx for len %zx", start, len);
> +    DPRINTF("migrate_handle_rp_reqpages: in %s start %zx len %zx",
> +            rbname, start, len);
> +
> +    /* Round everything up to our host page size */
> +    long our_host_ps = sysconf(_SC_PAGESIZE);
> +    if (start & (our_host_ps-1)) {
> +        long roundings = start & (our_host_ps-1);
> +        start -= roundings;
> +        len += roundings;
> +    }
> +    if (len & (our_host_ps-1)) {
> +        long roundings = len & (our_host_ps-1);
> +        len -= roundings;
> +        len += our_host_ps;
> +    }
> +
> +    if (ram_save_queue_pages(ms, rbname, start, len)) {
> +        source_return_path_bad(ms);
> +    }
>  }
>  
>  /*

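The rounding in migrate_handle_rp_reqpages() above amounts to aligning start down and start + len up to the host page size. Here is a compact equivalent, assuming the page size is a power of two; `round_to_host_pages()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: widen [*start, *start + *len) so both ends are
 * multiples of page_size, which must be a power of two.
 */
static void round_to_host_pages(uint64_t *start, uint64_t *len,
                                uint64_t page_size)
{
    uint64_t mask = page_size - 1;
    uint64_t end = *start + *len;

    assert(page_size && !(page_size & mask));
    *start &= ~mask;                 /* round start down */
    end = (end + mask) & ~mask;      /* round end up */
    *len = end - *start;
}
```

Computing the end once avoids the two separate "roundings" adjustments, and the result matches the quoted code. The quoted code rounds start down and adds the difference to len, then rounds len up. Because start is already aligned at that point, rounding len up is equivalent to rounding the end up.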

* Re: [Qemu-devel] [PATCH v4 37/47] Page request: Consume pages off the post-copy queue
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 37/47] Page request: Consume pages off the post-copy queue Dr. David Alan Gilbert (git)
  2014-10-04 18:04   ` Paolo Bonzini
@ 2014-11-11  1:13   ` David Gibson
  2015-01-14 20:13     ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 204+ messages in thread
From: David Gibson @ 2014-11-11  1:13 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

[-- Attachment #1: Type: text/plain, Size: 6333 bytes --]

On Fri, Oct 03, 2014 at 06:47:43PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> When transmitting RAM pages, consume pages that have been queued by
> MIG_RPCOMM_REQPAGES commands and send them ahead of normal page scanning.
> 
> Note:
>   a) After a queued page the linear walk carries on from after the
> unqueued page; there is a reasonable chance that the destination
> was about to ask for other nearby pages anyway.
> 
>   b) We have to be careful of any assumptions that the page-walking
> code makes; in particular, it takes some shortcuts on its first linear
> walk that break as soon as we send a queued page.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  arch_init.c | 149 ++++++++++++++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 125 insertions(+), 24 deletions(-)
> 
> diff --git a/arch_init.c b/arch_init.c
> index 72f9e17..a945990 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -331,6 +331,7 @@ static RAMBlock *last_seen_block;
>  /* This is the last block from where we have sent data */
>  static RAMBlock *last_sent_block;
>  static ram_addr_t last_offset;
> +static bool last_was_from_queue;
>  static unsigned long *migration_bitmap;
>  static uint64_t migration_dirty_pages;
>  static uint32_t last_version;
> @@ -460,6 +461,19 @@ static inline bool migration_bitmap_set_dirty(ram_addr_t addr)
>      return ret;
>  }
>  
> +static inline bool migration_bitmap_clear_dirty(ram_addr_t addr)
> +{
> +    bool ret;
> +    int nr = addr >> TARGET_PAGE_BITS;
> +
> +    ret = test_and_clear_bit(nr, migration_bitmap);
> +
> +    if (ret) {
> +        migration_dirty_pages--;
> +    }
> +    return ret;
> +}
> +
>  static void migration_bitmap_sync_range(ram_addr_t start, ram_addr_t length)
>  {
>      ram_addr_t addr;
> @@ -660,6 +674,39 @@ static int ram_save_page(QEMUFile *f, RAMBlock* block, ram_addr_t offset,
>  }
>  
>  /*
> + * Unqueue a page from the queue fed by postcopy page requests
> + *
> + * Returns:   The RAMBlock* to transmit from (or NULL if the queue is empty)
> + *      ms:   MigrationState in
> + *  offset:   the byte offset within the RAMBlock for the start of the page
> + * bitoffset: global offset in the dirty/sent bitmaps
> + */
> +static RAMBlock *ram_save_unqueue_page(MigrationState *ms, ram_addr_t *offset,
> +                                       unsigned long *bitoffset)
> +{
> +    RAMBlock *result = NULL;
> +    qemu_mutex_lock(&ms->src_page_req_mutex);
> +    if (!QSIMPLEQ_EMPTY(&ms->src_page_requests)) {
> +        struct MigrationSrcPageRequest *entry =
> +                                    QSIMPLEQ_FIRST(&ms->src_page_requests);
> +        result = entry->rb;
> +        *offset = entry->offset;
> +        *bitoffset = (entry->offset + entry->rb->offset) >> TARGET_PAGE_BITS;
> +
> +        if (entry->len > TARGET_PAGE_SIZE) {
> +            entry->len -= TARGET_PAGE_SIZE;
> +            entry->offset += TARGET_PAGE_SIZE;
> +        } else {
> +            QSIMPLEQ_REMOVE_HEAD(&ms->src_page_requests, next_req);
> +            g_free(entry);
> +        }
> +    }
> +    qemu_mutex_unlock(&ms->src_page_req_mutex);
> +
> +    return result;
> +}
> +
> +/*
>   * Queue the pages for transmission, e.g. a request from postcopy destination
>   *   ms: MigrationStatus in which the queue is held
>   *   rbname: The RAMBlock the request is for - may be NULL (to mean reuse last)
> @@ -720,44 +767,97 @@ int ram_save_queue_pages(MigrationState *ms, const char *rbname,
>  
>  static int ram_find_and_save_block(QEMUFile *f, bool last_stage)
>  {
> +    MigrationState *ms = migrate_get_current();
>      RAMBlock *block = last_seen_block;
> +    RAMBlock *tmpblock;
>      ram_addr_t offset = last_offset;
> +    ram_addr_t tmpoffset;
>      bool complete_round = false;
>      int bytes_sent = 0;
> -    MemoryRegion *mr;
>      unsigned long bitoffset;
> +    unsigned long hps = sysconf(_SC_PAGESIZE);
>  
> -    if (!block)
> +    if (!block) {
>          block = QTAILQ_FIRST(&ram_list.blocks);
> +        last_was_from_queue = false;
> +    }
>  
> -    while (true) {
> -        mr = block->mr;
> -        offset = migration_bitmap_find_and_reset_dirty(mr, offset, &bitoffset);
> -        if (complete_round && block == last_seen_block &&
> -            offset >= last_offset) {
> -            break;
> +    while (true) { /* Until we send a block or run out of stuff to send */
> +        tmpblock = NULL;
> +
> +        /*
> +         * Don't break host-page chunks up with queue items

Why does this matter?
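Here is a standalone restatement of the unqueue condition quoted below, using a hypothetical helper name. Take 64 KiB host pages and 4 KiB target pages as example sizes. Then only offsets ending in 0xf000 within each host page qualify. When host and target page sizes are equal, the condition is always true, so queued pages can be taken after any page. Presumably this ties in with postcopy_place_page() in patch 39/47, which only places a host page once its last target page arrives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if offset is the start of the last target
 * page inside its host page, i.e. the next target page would begin a
 * new host page. Both sizes are assumed to be powers of two.
 */
static bool is_last_target_page_in_host_page(uint64_t offset,
                                             uint64_t host_page_size,
                                             uint64_t target_page_size)
{
    assert(host_page_size >= target_page_size);
    return (offset & (host_page_size - 1)) ==
           host_page_size - target_page_size;
}
```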

> +         * so only unqueue if,
> +         *   a) The last item came from the queue anyway
> +         *   b) The last sent item was the last target-page in a host page
> +         */
> +        if (last_was_from_queue || (!last_sent_block) ||
> +            ((last_offset & (hps - 1)) == (hps - TARGET_PAGE_SIZE))) {
> +            tmpblock = ram_save_unqueue_page(ms, &tmpoffset, &bitoffset);
>          }
> -        if (offset >= block->length) {
> -            offset = 0;
> -            block = QTAILQ_NEXT(block, next);
> -            if (!block) {
> -                block = QTAILQ_FIRST(&ram_list.blocks);
> -                complete_round = true;
> -                ram_bulk_stage = false;
> +
> +        if (tmpblock) {
> +            /* We've got a block from the postcopy queue */
> +            DPRINTF("%s: Got postcopy item '%s' offset=%zx bitoffset=%zx",
> +                    __func__, tmpblock->idstr, tmpoffset, bitoffset);
> +            /* We're sending this page, and since it's postcopy nothing else
> +             * will dirty it, and we must make sure it doesn't get sent again.
> +             */
> +            if (!migration_bitmap_clear_dirty(bitoffset << TARGET_PAGE_BITS)) {

Ugh... that's kind of subtle.  I think it would be clearer if you worked
in terms of a ram_addr_t throughout, rather than "bitoffset", whose
meaning is not terribly obvious.
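Here is a sketch of that suggestion. It carries a global address, the block base plus the offset as in `entry->offset + entry->rb->offset`, and derives the bitmap index only inside the helper. The 4 KiB TARGET_PAGE_BITS value, the `BITS_PER_LONG` definition and the helper names are assumptions for illustration. The open-coded bit operations stand in for QEMU's test_and_clear_bit().

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define TARGET_PAGE_BITS 12      /* assumption: 4 KiB target pages */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical: global RAM address of a page (block base + offset). */
static uint64_t page_ram_addr(uint64_t block_base, uint64_t offset)
{
    return block_base + offset;
}

/* Hypothetical: clear the dirty bit for addr; return whether it was set. */
static bool bitmap_clear_dirty(unsigned long *bitmap, uint64_t addr,
                               uint64_t *dirty_pages)
{
    uint64_t nr = addr >> TARGET_PAGE_BITS;   /* index derived only here */
    unsigned long mask = 1UL << (nr % BITS_PER_LONG);
    unsigned long *word = &bitmap[nr / BITS_PER_LONG];
    bool was_set = (*word & mask) != 0;

    *word &= ~mask;
    if (was_set) {
        assert(*dirty_pages > 0);  /* the counter must never go negative */
        (*dirty_pages)--;
    }
    return was_set;
}
```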


* Re: [Qemu-devel] [PATCH v4 38/47] Add assertion to check migration_dirty_pages
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 38/47] Add assertion to check migration_dirty_pages Dr. David Alan Gilbert (git)
  2014-10-04 18:32   ` Paolo Bonzini
@ 2014-11-11  1:14   ` David Gibson
  1 sibling, 0 replies; 204+ messages in thread
From: David Gibson @ 2014-11-11  1:14 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

[-- Attachment #1: Type: text/plain, Size: 530 bytes --]

On Fri, Oct 03, 2014 at 06:47:44PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> I've seen it go negative once during dev; it shouldn't
> happen.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
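The assertion patch itself is not quoted here. As an illustration of a related invariant, here is a debug-only helper that recounts the bitmap and compares the result with the cached counter. `bitmap_count_set()` and `check_dirty_pages()` are hypothetical and are not what the patch does.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: count set bits among the first nbits of a bitmap. */
static uint64_t bitmap_count_set(const unsigned long *bitmap, size_t nbits)
{
    const size_t bits_per_long = sizeof(unsigned long) * CHAR_BIT;
    uint64_t count = 0;

    for (size_t i = 0; i < nbits; i++) {
        if (bitmap[i / bits_per_long] & (1UL << (i % bits_per_long))) {
            count++;
        }
    }
    return count;
}

/* Hypothetical debug check: cached dirty counter must match the bitmap. */
static void check_dirty_pages(const unsigned long *bitmap, size_t nbits,
                              uint64_t dirty_pages)
{
    assert(bitmap_count_set(bitmap, nbits) == dirty_pages);
}
```

A full recount is O(number of pages), so a check like this only makes sense in debug builds or at coarse points such as the end of a bitmap sync.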


* Re: [Qemu-devel] [PATCH v4 39/47] postcopy_ram.c: place_page and helpers
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 39/47] postcopy_ram.c: place_page and helpers Dr. David Alan Gilbert (git)
@ 2014-11-11  1:39   ` David Gibson
  2015-01-15 18:14     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 204+ messages in thread
From: David Gibson @ 2014-11-11  1:39 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

[-- Attachment #1: Type: text/plain, Size: 10431 bytes --]

On Fri, Oct 03, 2014 at 06:47:45PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> postcopy_place_page (etc) provide a way for postcopy to place a page
> into the guest's memory atomically (using the new remap_anon_pages syscall).
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  include/migration/migration.h    |   2 +
>  include/migration/postcopy-ram.h |  23 +++++++
>  postcopy-ram.c                   | 145 ++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 168 insertions(+), 2 deletions(-)
> 
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index 5bc01d5..58ac7bf 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -96,6 +96,8 @@ struct MigrationIncomingState {
>      QEMUFile *return_path;
>      QemuMutex      rp_mutex;    /* We send replies from multiple threads */
>      PostcopyPMI    postcopy_pmi;
> +    void          *postcopy_tmp_page;
> +    long           postcopy_place_skipped; /* Check for incorrect place ops */
>  };
>  
>  MigrationIncomingState *migration_incoming_get_current(void);
> diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
> index 413b670..0210491 100644
> --- a/include/migration/postcopy-ram.h
> +++ b/include/migration/postcopy-ram.h
> @@ -80,4 +80,27 @@ void postcopy_discard_send_chunk(MigrationState *ms, PostcopyDiscardState *pds,
>  void postcopy_discard_send_finish(MigrationState *ms,
>                                    PostcopyDiscardState *pds);
>  
> +/*
> + * Place a zero'd page of memory at *host
> + * returns 0 on success
> + */
> +int postcopy_place_zero_page(MigrationIncomingState *mis, void *host,
> +                             long bitmap_offset);
> +
> +/*
> + * Place a page (from) at (host) efficiently
> + *    There are restrictions on how 'from' must be mapped, in general best
> + *    to use other postcopy_ routines to allocate.
> + * returns 0 on success
> + */
> +int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
> +                        long bitmap_offset);
> +
> +/*
> + * Allocate a page of memory that can be mapped at a later point in time
> + * using postcopy_place_page
> + * Returns: Pointer to allocated page
> + */
> +void *postcopy_get_tmp_page(MigrationIncomingState *mis, long bitmap_offset);
> +
>  #endif
> diff --git a/postcopy-ram.c b/postcopy-ram.c
> index 8b2a035..19d4b20 100644
> --- a/postcopy-ram.c
> +++ b/postcopy-ram.c
> @@ -229,7 +229,6 @@ static PostcopyPMIState postcopy_pmi_get_state_nolock(
>  }
>  
>  /* Retrieve the state of the given page */
> -__attribute__ (( unused )) /* Until later in patch series */
>  static PostcopyPMIState postcopy_pmi_get_state(MigrationIncomingState *mis,
>                                                 size_t bitmap_index)
>  {
> @@ -245,7 +244,6 @@ static PostcopyPMIState postcopy_pmi_get_state(MigrationIncomingState *mis,
>   * Set the page state to the given state if the previous state was as expected
>   * Return the actual previous state.
>   */
> -__attribute__ (( unused )) /* Until later in patch series */
>  static PostcopyPMIState postcopy_pmi_change_state(MigrationIncomingState *mis,
>                                             size_t bitmap_index,
>                                             PostcopyPMIState expected_state,
> @@ -464,6 +462,7 @@ static int cleanup_area(const char *block_name, void *host_addr,
>  int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages)
>  {
>      postcopy_pmi_init(mis, ram_pages);
> +    mis->postcopy_place_skipped = -1;
>  
>      if (qemu_ram_foreach_block(init_area, mis)) {
>          return -1;
> @@ -482,6 +481,10 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
>          return -1;
>      }
>  
> +    if (mis->postcopy_tmp_page) {
> +        munmap(mis->postcopy_tmp_page, getpagesize());
> +        mis->postcopy_tmp_page = NULL;
> +    }
>      return 0;
>  }
>  
> @@ -551,6 +554,126 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
>      return 0;
>  }
>  
> +/*
> + * Place a zero'd page of memory at *host
> + * returns 0 on success
> + * bitmap_offset: Index into the migration bitmaps
> + */
> +int postcopy_place_zero_page(MigrationIncomingState *mis, void *host,
> +                             long bitmap_offset)
> +{
> +    void *tmp = postcopy_get_tmp_page(mis, bitmap_offset);
> +    if (!tmp) {
> +        return -ENOMEM;
> +    }
> +    *(char *)tmp = 0;
> +    return postcopy_place_page(mis, host, tmp, bitmap_offset);
> +}
> +
> +/*
> + * Place a target page (from) at (host) efficiently
> + *    There are restrictions on how 'from' must be mapped, in general best
> + *    to use other postcopy_ routines to allocate.
> + * returns 0 on success
> + * bitmap_offset: Index into the migration bitmaps
> + *
> + * Where HPS > TPS it holds off doing the place until the last TP in the HP
> + *  and assumes (from, host) point to the last TP in a continuous HP
> + */
> +int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
> +                        long bitmap_offset)
> +{
> +    PostcopyPMIState old_state, tmp_state;
> +    size_t hps = sysconf(_SC_PAGESIZE);
> +
> +    /* Only place the page when the last target page within the hp arrives */
> +    if ((bitmap_offset + 1) & (mis->postcopy_pmi.host_bits - 1)) {
> +        DPRINTF("%s: Skipping incomplete hp host=%p from=%p bitmap_offset=%lx",
> +                __func__, host, from, bitmap_offset);
> +        mis->postcopy_place_skipped = bitmap_offset;
> +        return 0;
> +    }
> +
> +    /*
> +     * If we skip a page (above) we should end up placing that page before
> +     * doing anything with other host pages.
> +     */
> +    if (mis->postcopy_place_skipped != -1) {
> +        assert((bitmap_offset & ~(mis->postcopy_pmi.host_bits - 1)) ==
> +               (mis->postcopy_place_skipped &
> +                ~(mis->postcopy_pmi.host_bits - 1)));
> +    }
> +    mis->postcopy_place_skipped = -1;

All the above logic seems like you're making assumptions about exactly
how this function will be invoked, which are fragile and a layering
violation.

It seems like these lower-level functions should work only in host
pages and have the target->host page consolidation up in the protocol
handling layer.  Better yet would be to build it into the protocol
itself that requests made by the destination (in destination host page
chunks) should be answered by the source as a unit, to avoid the
hassle of splitting and recombining host pages.
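One way to read the suggestion is as follows. The layer that parses incoming target pages accumulates them into a host-page-sized buffer. It then reports when a whole host page is ready, so a host-page-only placement function can be called once. Below is a minimal sketch of that idea. `struct hp_assembler` and `hp_add_target_page()` are hypothetical names, and the 64-bit mask limits a host page to 64 target pages.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical assembler for one host page built from target pages. */
struct hp_assembler {
    uint8_t *buf;                /* host_page_size bytes */
    uint64_t host_page_size;
    uint64_t target_page_size;
    uint64_t cur_hp;             /* host page currently being assembled */
    uint64_t have;               /* bit i set: target page i received */
};

/*
 * Feed one target page at guest address addr.
 * Returns 1 when buf holds a complete host page (address in *hp_out),
 * 0 if more target pages are needed, -1 if a different host page is
 * still half-assembled.
 */
static int hp_add_target_page(struct hp_assembler *a, uint64_t addr,
                              const uint8_t *data, uint64_t *hp_out)
{
    uint64_t tps_per_hp = a->host_page_size / a->target_page_size;
    uint64_t hp, idx, full;

    assert(tps_per_hp >= 1 && tps_per_hp <= 64);
    hp = addr & ~(a->host_page_size - 1);
    idx = (addr - hp) / a->target_page_size;
    full = tps_per_hp == 64 ? UINT64_MAX : (UINT64_C(1) << tps_per_hp) - 1;

    if (a->have && hp != a->cur_hp) {
        return -1;
    }
    a->cur_hp = hp;
    memcpy(a->buf + idx * a->target_page_size, data, a->target_page_size);
    a->have |= UINT64_C(1) << idx;
    if (a->have != full) {
        return 0;                /* wait for the rest of the host page */
    }
    a->have = 0;
    *hp_out = hp;                /* caller places the whole host page */
    return 1;
}
```

With this split, the placement code never sees partial host pages. The ordering assumption then lives in one explicit place (the -1 return) rather than in an assert on postcopy_place_skipped.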

> +    /* Adjust pointers to point to start of host page */
> +    host = (void *)((uintptr_t)host & ~(hps - 1));
> +    from = (void *)((uintptr_t)from & ~(hps - 1));
> +    bitmap_offset -= (mis->postcopy_pmi.host_bits - 1);
> +
> +    if (syscall(__NR_remap_anon_pages, host, from, hps, 0) !=
> +            getpagesize()) {
> +        perror("remap_anon_pages in postcopy_place_page");
> +        fprintf(stderr, "host: %p from: %p pmi=%d\n", host, from,
> +                postcopy_pmi_get_state(mis, bitmap_offset));
> +
> +        return -errno;
> +    }
> +
> +    tmp_state = postcopy_pmi_get_state(mis, bitmap_offset);
> +    do {
> +        old_state = tmp_state;
> +        tmp_state = postcopy_pmi_change_state(mis, bitmap_offset, old_state,
> +                                              POSTCOPY_PMI_RECEIVED);
> +
> +    } while (old_state != tmp_state);
> +
> +
> +    if (old_state == POSTCOPY_PMI_REQUESTED) {
> +        /* TODO: Notify kernel */
> +    }
> +
> +    return 0;
> +}
> +
> +/*
> + * Returns a target page of memory that can be mapped at a later point in time
> + * using postcopy_place_page
> + * The same address is used repeatedly, postcopy_place_page just takes the
> + * backing page away.

The same address might be re-used, but I don't see anything that
actually makes that happen.
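Here is a sketch of a lazily allocated, reused temporary page. In this sketch, reuse comes simply from caching the pointer. It does not address whether the mapping remains usable after remap_anon_pages() moves the backing page away, which is a kernel-ABI question. As an aside, mmap() reports failure by returning MAP_FAILED rather than NULL, so the `!mis->postcopy_tmp_page` test in the quoted code would not catch a failed mapping. The sketch also leaves the pointer NULL if madvise() fails. `struct tmp_page`, `get_tmp_page()` and `put_tmp_page()` are hypothetical names, and MADV_DONTFORK is Linux-specific.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical holder for the single reusable temporary host page. */
struct tmp_page {
    void *addr;                  /* NULL until first use */
    size_t size;
};

/* Return the temporary page, mapping it on first use; NULL on failure. */
static void *get_tmp_page(struct tmp_page *tp)
{
    assert(tp != NULL);
    if (!tp->addr) {
        void *p;

        tp->size = (size_t)sysconf(_SC_PAGESIZE);
        p = mmap(NULL, tp->size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {   /* mmap() does not return NULL on error */
            perror("mapping postcopy tmp page");
            return NULL;
        }
        if (madvise(p, tp->size, MADV_DONTFORK)) {
            perror("postcopy tmp page DONTFORK");
            munmap(p, tp->size); /* tp->addr stays NULL */
            return NULL;
        }
        tp->addr = p;
    }
    return tp->addr;             /* same address on every later call */
}

/* Unmap the temporary page, if any. */
static void put_tmp_page(struct tmp_page *tp)
{
    if (tp->addr) {
        munmap(tp->addr, tp->size);
        tp->addr = NULL;
    }
}
```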

> + * Returns: Pointer to allocated page
> + *
> + * Note this is a target page and uses the bitmap_offset to get an offset
> + * into a hostpage; since there's only one real temporary host page the caller
> + * is expected to not flip around between pages.
> + */
> +void *postcopy_get_tmp_page(MigrationIncomingState *mis, long bitmap_offset)
> +{
> +    ptrdiff_t offset;
> +
> +    if (!mis->postcopy_tmp_page) {
> +        mis->postcopy_tmp_page = mmap(NULL, getpagesize(),
> +                             PROT_READ | PROT_WRITE, MAP_PRIVATE |
> +                             MAP_ANONYMOUS, -1, 0);
> +        if (!mis->postcopy_tmp_page) {
> +            perror("mapping postcopy tmp page");
> +            return NULL;
> +        }
> +        if (madvise(mis->postcopy_tmp_page, getpagesize(), MADV_DONTFORK)) {
> +            munmap(mis->postcopy_tmp_page, getpagesize());
> +            perror("postcpy tmp page DONTFORK");
> +            return NULL;
> +        }
> +    }
> +
> +    /*
> +     * Get the offset within the host page based on bitmap_offset.
> +     */
> +    offset = (bitmap_offset & (mis->postcopy_pmi.host_bits - 1)) <<
> +                 qemu_target_page_bits();
> +
> +    return (void *)((uint8_t *)mis->postcopy_tmp_page + offset);
> +}
> +
>  #else
>  /* No target OS support, stubs just fail */
>  int postcopy_ram_hosttest(void)
> @@ -598,6 +721,24 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
>  {
>      assert(0);
>  }
> +
> +int postcopy_place_zero_page(MigrationIncomingState *mis, void *host,
> +                             long bitmap_offset)
> +{
> +    assert(0);
> +}
> +
> +int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
> +                        long bitmap_offset)
> +{
> +    assert(0);
> +}
> +
> +void *postcopy_get_tmp_page(MigrationIncomingState *mis, long bitmap_offset)
> +{
> +    assert(0);
> +}
> +
>  #endif
>  
>  /* ------------------------------------------------------------------------- */


* Re: [Qemu-devel] [PATCH v4 40/47] Postcopy: Use helpers to map pages during migration
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 40/47] Postcopy: Use helpers to map pages during migration Dr. David Alan Gilbert (git)
@ 2014-11-13  2:53   ` David Gibson
  2014-11-25 18:14     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 204+ messages in thread
From: David Gibson @ 2014-11-13  2:53 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

[-- Attachment #1: Type: text/plain, Size: 8060 bytes --]

On Fri, Oct 03, 2014 at 06:47:46PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> In postcopy, the destination guest is running at the same time
> as it's receiving pages; as we receive new pages we must put
> them into the guests address space atomically to avoid a running
> CPU accessing a partially written page.
> 
> Use the helpers in postcopy-ram.c to map these pages.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  arch_init.c | 96 +++++++++++++++++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 87 insertions(+), 9 deletions(-)
> 
> diff --git a/arch_init.c b/arch_init.c
> index 2f4345a..0ba627b 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -1458,9 +1458,20 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
>      return 0;
>  }
>  
> +/*
> + * Read a RAMBlock ID from the stream f, find the host address of the
> + * start of that block and add on 'offset'
> + *
> + * f: Stream to read from
> + * mis: MigrationIncomingState
> + * offset: Offset within the block
> + * flags: Page flags (mostly to see if it's a continuation of previous block)
> + * rb: Pointer to RAMBlock* that gets filled in with the RB we find
> + */
>  static inline void *host_from_stream_offset(QEMUFile *f,
> +                                            MigrationIncomingState *mis,
>                                              ram_addr_t offset,
> -                                            int flags)
> +                                            int flags, RAMBlock **rb)
>  {
>      static RAMBlock *block = NULL;
>      char id[256];
> @@ -1471,8 +1482,11 @@ static inline void *host_from_stream_offset(QEMUFile *f,
>              error_report("Ack, bad migration stream!");
>              return NULL;
>          }
> +        if (rb) {
> +            *rb = block;
> +        }
>  
> -        return memory_region_get_ram_ptr(block->mr) + offset;
> +        goto gotit;

This is an ugly use of goto - it looks kind of like the exception
handling goto idiom, but it's not.  I think it would be nicer to make
the code fragment after gotit into a helper function.
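A rough sketch of that suggestion, using a cut-down hypothetical Block type and hook in place of RAMBlock and postcopy_hook_early_receive(): the code after 'gotit' becomes block_host_addr(), which both the name lookup and (in the real function) the cached continuation path would call. It also computes the hook argument from the block itself instead of (*rb)->offset, since the function otherwise allows rb to be NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for QEMU's RAMBlock. */
typedef struct Block {
    const char *idstr;
    uint64_t offset;   /* start of the block in ram_addr_t space */
    uint8_t *host;     /* host mapping of the block */
} Block;

/* Hypothetical hook standing in for postcopy_hook_early_receive(). */
static uint64_t last_hooked_page;
static void early_receive_hook(uint64_t page_index)
{
    last_hooked_page = page_index;
}

/*
 * What used to follow the 'gotit' label, as a helper. It uses 'block'
 * rather than *rb, so callers may pass rb == NULL.
 */
static void *block_host_addr(Block *block, uint64_t offset,
                             unsigned page_bits, Block **rb)
{
    if (rb) {
        *rb = block;
    }
    early_receive_hook((offset + block->offset) >> page_bits);
    return block->host + offset;
}

/* Name lookup; each success path returns through the helper. */
static void *host_from_id(Block *blocks, size_t nblocks, const char *id,
                          uint64_t offset, unsigned page_bits, Block **rb)
{
    for (size_t i = 0; i < nblocks; i++) {
        if (!strcmp(id, blocks[i].idstr)) {
            return block_host_addr(&blocks[i], offset, page_bits, rb);
        }
    }
    return NULL; /* unknown block: caller reports the error */
}
```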

>      }
>  
>      len = qemu_get_byte(f);
> @@ -1480,12 +1494,22 @@ static inline void *host_from_stream_offset(QEMUFile *f,
>      id[len] = 0;
>  
>      QTAILQ_FOREACH(block, &ram_list.blocks, next) {
> -        if (!strncmp(id, block->idstr, sizeof(id)))
> -            return memory_region_get_ram_ptr(block->mr) + offset;
> +        if (!strncmp(id, block->idstr, sizeof(id))) {
> +            if (rb) {
> +                *rb = block;
> +            }
> +            goto gotit;
> +        }
>      }
>  
>      error_report("Can't find block %s!", id);
>      return NULL;
> +
> +gotit:
> +    postcopy_hook_early_receive(mis,
> +        (offset + (*rb)->offset) >> TARGET_PAGE_BITS);
> +    return memory_region_get_ram_ptr(block->mr) + offset;
> +
>  }
>  
>  /*
> @@ -1515,6 +1539,13 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>      ram_addr_t addr;
>      int flags, ret = 0;
>      static uint64_t seq_iter;
> +    /*
> +     * System is running in postcopy mode, page inserts to host memory must be
> +     * atomic
> +     */
> +    MigrationIncomingState *mis = migration_incoming_get_current();
> +    bool postcopy_running = mis->postcopy_ram_state >=
> +                            POSTCOPY_RAM_INCOMING_LISTENING;
>  
>      seq_iter++;
>  
> @@ -1523,6 +1554,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>      }
>  
>      while (!ret) {
> +        RAMBlock *rb = 0; /* =0 needed to silence compiler */
>          addr = qemu_get_be64(f);
>  
>          flags = addr & ~TARGET_PAGE_MASK;
> @@ -1570,7 +1602,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>              void *host;
>              uint8_t ch;
>  
> -            host = host_from_stream_offset(f, addr, flags);
> +            host = host_from_stream_offset(f, mis, addr, flags, &rb);
>              if (!host) {
>                  error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
>                  ret = -EINVAL;
> @@ -1578,20 +1610,66 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>              }
>  
>              ch = qemu_get_byte(f);
> -            ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
> +            if (!postcopy_running) {
> +                ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
> +            } else {
> +                if (!ch) {
> +                    ret = postcopy_place_zero_page(mis, host,
> +                              (addr + rb->offset) >> TARGET_PAGE_BITS);
> +                } else {
> +                    void *tmp;
> +                    tmp = postcopy_get_tmp_page(mis, (addr + rb->offset) >>
> +                                                      TARGET_PAGE_BITS);
> +
> +                    if (!tmp) {
> +                        return -ENOMEM;
> +                    }
> +                    memset(tmp, ch, TARGET_PAGE_SIZE);
> +                    ret = postcopy_place_page(mis, host, tmp,
> +                              (addr + rb->offset) >> TARGET_PAGE_BITS);
> +                }
> +                if (ret) {
> +                    error_report("ram_load: Failure in postcopy compress @"
> +                                 "%zx/%p;%s+%zx",
> +                                 addr, host, rb->idstr, rb->offset);
> +                    return ret;
> +                }
> +            }

Might be nicer to fold this logic into ram_handle_compressed(), since
there's no obvious reason it should not be used for the postcopy path.
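One possible shape for a merged helper, sketched with hypothetical names (handle_compressed(), place_fn, copy_place()) rather than QEMU's actual ram_handle_compressed() signature: the precopy path writes in place as before, while the postcopy path fills a caller-supplied temporary page and passes it to a placement callback standing in for postcopy_place_page(). Unlike the quoted patch, it does not special-case zero pages via postcopy_place_zero_page().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback standing in for postcopy_place_page(). */
typedef int (*place_fn)(void *host, const void *from, size_t size);

static bool page_is_zero(const uint8_t *p, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        if (p[i]) {
            return false;
        }
    }
    return true;
}

static int handle_compressed(void *host, uint8_t ch, size_t size,
                             bool postcopy, void *tmp, place_fn place)
{
    if (!postcopy) {
        /* Precopy: write in place, skipping pages that are already zero. */
        if (ch != 0 || !page_is_zero(host, size)) {
            memset(host, ch, size);
        }
        return 0;
    }
    /* Postcopy: build the page privately, then publish it in one step. */
    if (!tmp) {
        return -ENOMEM;
    }
    memset(tmp, ch, size);
    return place(host, tmp, size);
}

/* Trivial placer for demonstration; the real one must be atomic. */
static int place_count;
static int copy_place(void *host, const void *from, size_t size)
{
    memcpy(host, from, size);
    place_count++;
    return 0;
}
```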

>          } else if (flags & RAM_SAVE_FLAG_PAGE) {
>              void *host;
>  
> -            host = host_from_stream_offset(f, addr, flags);
> +            host = host_from_stream_offset(f, mis, addr, flags, &rb);
>              if (!host) {
>                  error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
>                  ret = -EINVAL;
>                  break;
>              }
>  
> -            qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
> +            if (!postcopy_running) {
> +                qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
> +            } else {
> +                void *tmp = postcopy_get_tmp_page(mis, (addr + rb->offset) >>
> +                                                        TARGET_PAGE_BITS);
> +
> +                if (!tmp) {
> +                    return -ENOMEM;
> +                }
> +                qemu_get_buffer(f, tmp, TARGET_PAGE_SIZE);
> +                ret = postcopy_place_page(mis, host, tmp,
> +                          (addr + rb->offset) >> TARGET_PAGE_BITS);
> +                if (ret) {
> +                    error_report("ram_load: Failure in postcopy simple"
> +                                 "@%zx/%p;%s+%zx",
> +                                 addr, host, rb->idstr, rb->offset);
> +                    return ret;
> +                }
> +            }
>          } else if (flags & RAM_SAVE_FLAG_XBZRLE) {
> -            void *host = host_from_stream_offset(f, addr, flags);
> +            if (postcopy_running) {
> +                error_report("XBZRLE RAM block in postcopy mode @%zx\n", addr);
> +                return -EINVAL;
> +            }

Hrm, there doesn't seem to be an inherent reason why XBZRLE shouldn't
be possible in postcopy.  Obviously a temporary buffer would be
necessary.
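To illustrate the temporary-buffer approach, here is a minimal sketch. It assumes the destination still holds the previous version of the page to apply the delta to, and it uses a simplified, hypothetical delta format of (skip, len, bytes) records with single-byte lengths. This is loosely modelled on XBZRLE's alternating unchanged/changed runs but is not QEMU's actual wire encoding. The delta is applied to a private copy, and the result is only placed once it has decoded cleanly.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int (*place_fn)(void *host, const void *from, size_t size);

/* Hypothetical format: repeated (skip, len, len bytes) records. */
static int apply_delta(uint8_t *page, size_t page_size,
                       const uint8_t *delta, size_t delta_len)
{
    size_t d = 0, i = 0;

    while (i < delta_len) {
        if (delta_len - i < 2) {
            return -EINVAL;
        }
        size_t skip = delta[i++];   /* bytes left unchanged */
        size_t len = delta[i++];    /* bytes replaced */
        if (d + skip + len > page_size || i + len > delta_len) {
            return -EINVAL;
        }
        d += skip;
        memcpy(page + d, delta + i, len);
        d += len;
        i += len;
    }
    return 0;
}

static int postcopy_delta_page(void *host, uint8_t *tmp, size_t page_size,
                               const uint8_t *delta, size_t delta_len,
                               place_fn place)
{
    /* Start from the old contents (assumed present on the destination). */
    memcpy(tmp, host, page_size);
    int ret = apply_delta(tmp, page_size, delta, delta_len);
    if (ret) {
        return ret;  /* guest page untouched on a malformed delta */
    }
    return place(host, tmp, page_size);
}

/* Trivial placer for demonstration; the real one must be atomic. */
static int copy_place(void *host, const void *from, size_t size)
{
    memcpy(host, from, size);
    return 0;
}
```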

> +            void *host = host_from_stream_offset(f, mis, addr, flags, &rb);
>              if (!host) {
>                  error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
>                  ret = -EINVAL;


* Re: [Qemu-devel] [PATCH v4 41/47] qemu_ram_block_from_host
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 41/47] qemu_ram_block_from_host Dr. David Alan Gilbert (git)
@ 2014-11-13  2:59   ` David Gibson
  2014-11-25 18:55     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 204+ messages in thread
From: David Gibson @ 2014-11-13  2:59 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

[-- Attachment #1: Type: text/plain, Size: 5430 bytes --]

On Fri, Oct 03, 2014 at 06:47:47PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Postcopy sends RAMBlock names and offsets over the wire (since it can't
> rely on the order of ramaddr being the same), and it starts out with
> HVA fault addresses from the kernel.
> 
> qemu_ram_block_from_host translates a HVA into a RAMBlock, an offset
> in the RAMBlock, the global ram_addr_t value and it's bitmap position.

s/it's/its/

I find most of the passing around of bitmap positions in this patch
series confusing.  Would it make things simpler if you broke up the
bitmap into (aligned) per-RAMBlock chunks?  Then the offset would
determine the bitmap position, which is easier to understand since it
has an "inherent" meaning outside of the secondary data structure used
to track things.
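A sketch of what per-RAMBlock bitmaps could look like (BlockBitmap and its helpers are hypothetical, not part of the series): each block owns a bitmap that starts at bit 0, so the bit number is simply the offset within the block shifted by the page bits, and no per-block alignment adjustment is needed.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical RAMBlock fragment with its own page bitmap. */
typedef struct BlockBitmap {
    uint64_t length;        /* bytes */
    unsigned long *bmap;    /* one bit per target page of this block */
} BlockBitmap;

static int block_bitmap_init(BlockBitmap *b, uint64_t length,
                             unsigned page_bits)
{
    uint64_t pages = (length + (1ull << page_bits) - 1) >> page_bits;

    b->length = length;
    b->bmap = calloc((pages + BITS_PER_LONG - 1) / BITS_PER_LONG,
                     sizeof(unsigned long));
    return b->bmap ? 0 : -1;
}

/* The bit position depends only on the offset within the block. */
static void block_bitmap_set(BlockBitmap *b, uint64_t offset,
                             unsigned page_bits)
{
    uint64_t nr = offset >> page_bits;
    b->bmap[nr / BITS_PER_LONG] |= 1ul << (nr % BITS_PER_LONG);
}

static bool block_bitmap_test(const BlockBitmap *b, uint64_t offset,
                              unsigned page_bits)
{
    uint64_t nr = offset >> page_bits;
    return b->bmap[nr / BITS_PER_LONG] & (1ul << (nr % BITS_PER_LONG));
}
```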

> Rewrite qemu_ram_addr_from_host to use qemu_ram_block_from_host.
> 
> Provide qemu_ram_get_idstr since it's the actual name text sent on the
> wire.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  exec.c                    | 56 ++++++++++++++++++++++++++++++++++++++++++-----
>  include/exec/cpu-common.h |  4 ++++
>  2 files changed, 55 insertions(+), 5 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index 65ee612..07722b3 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1246,6 +1246,11 @@ static RAMBlock *find_ram_block(ram_addr_t addr)
>      return NULL;
>  }
>  
> +const char *qemu_ram_get_idstr(RAMBlock *rb)
> +{
> +    return rb->idstr;
> +}
> +
>  void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev)
>  {
>      RAMBlock *new_block = find_ram_block(addr);
> @@ -1603,16 +1608,35 @@ static void *qemu_ram_ptr_length(ram_addr_t addr, hwaddr *size)
>      }
>  }
>  
> -/* Some of the softmmu routines need to translate from a host pointer
> -   (typically a TLB entry) back to a ram offset.  */
> -MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
> +/*
> + * Translates a host ptr back to a RAMBlock, a ram_addr and an offset
> + * in that RAMBlock.
> + *
> + * ptr: Host pointer to look up
> + * round_offset: If true round the result offset down to a page boundary
> + * *ram_addr: set to result ram_addr
> + * *offset: set to result offset within the RAMBlock
> + * *bm_index: bitmap index (i.e. scaled ram_addr for use where the scale
> + *                          isn't available)
> + *
> + * Returns: RAMBlock (or NULL if not found)
> + */
> +RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
> +                                   ram_addr_t *ram_addr,
> +                                   ram_addr_t *offset,
> +                                   unsigned long *bm_index)
>  {
>      RAMBlock *block;
>      uint8_t *host = ptr;
>  
>      if (xen_enabled()) {
>          *ram_addr = xen_ram_addr_from_mapcache(ptr);
> -        return qemu_get_ram_block(*ram_addr)->mr;
> +        block = qemu_get_ram_block(*ram_addr);
> +        if (!block) {
> +            return NULL;
> +        }
> +        *offset = (host - block->host);
> +        return block;
>      }
>  
>      block = ram_list.mru_block;
> @@ -1633,7 +1657,29 @@ MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
>      return NULL;
>  
>  found:
> -    *ram_addr = block->offset + (host - block->host);
> +    *offset = (host - block->host);
> +    if (round_offset) {
> +        *offset &= TARGET_PAGE_MASK;
> +    }

This seems clumsy.  Surely the caller can apply the mask itself if it
wants that.
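A minimal sketch of the caller-side alternative, assuming 4 KiB target pages and a hypothetical fault_addrs() helper: the caller rounds the raw offset itself and derives ram_addr and the bitmap index from the rounded value, so the lookup function needs no round_offset flag. The macros here are local stand-ins, not QEMU's definitions.

```c
#include <assert.h>
#include <stdint.h>

#define TARGET_PAGE_BITS 12   /* assumption: 4 KiB target pages */
#define TARGET_PAGE_SIZE (1ull << TARGET_PAGE_BITS)
#define TARGET_PAGE_MASK (~(TARGET_PAGE_SIZE - 1))

typedef uint64_t ram_addr_t;

/* Hypothetical caller: round first, then derive the other values. */
static void fault_addrs(ram_addr_t block_offset, ram_addr_t raw_offset,
                        ram_addr_t *offset, ram_addr_t *ram_addr,
                        unsigned long *bm_index)
{
    *offset = raw_offset & TARGET_PAGE_MASK;
    *ram_addr = block_offset + *offset;
    *bm_index = (unsigned long)(*ram_addr >> TARGET_PAGE_BITS);
}
```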

> +    *ram_addr = block->offset + *offset;
> +    *bm_index = *ram_addr >> TARGET_PAGE_BITS;
> +    return block;
> +}
> +
> +/* Some of the softmmu routines need to translate from a host pointer
> +   (typically a TLB entry) back to a ram offset.  */
> +MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
> +{
> +    RAMBlock *block;
> +    ram_addr_t offset; /* Not used */
> +    unsigned long index; /* Not used */
> +
> +    block = qemu_ram_block_from_host(ptr, false, ram_addr, &offset, &index);
> +
> +    if (!block) {
> +        return NULL;
> +    }
> +
>      return block->mr;
>  }
>  
> diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
> index 8042f50..ae25407 100644
> --- a/include/exec/cpu-common.h
> +++ b/include/exec/cpu-common.h
> @@ -55,8 +55,12 @@ typedef uint32_t CPUReadMemoryFunc(void *opaque, hwaddr addr);
>  void qemu_ram_remap(ram_addr_t addr, ram_addr_t length);
>  /* This should not be used by devices.  */
>  MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr);
> +RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
> +                                   ram_addr_t *ram_addr, ram_addr_t *offset,
> +                                   unsigned long *bm_index);
>  void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev);
>  void qemu_ram_unset_idstr(ram_addr_t addr);
> +const char *qemu_ram_get_idstr(RAMBlock *rb);
>  
>  void cpu_physical_memory_rw(hwaddr addr, uint8_t *buf,
>                              int len, int is_write);

* Re: [Qemu-devel] [PATCH v4 42/47] Don't sync dirty bitmaps in postcopy
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 42/47] Don't sync dirty bitmaps in postcopy Dr. David Alan Gilbert (git)
@ 2014-11-13  3:01   ` David Gibson
  2014-11-25 16:25     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 204+ messages in thread
From: David Gibson @ 2014-11-13  3:01 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

[-- Attachment #1: Type: text/plain, Size: 1046 bytes --]

On Fri, Oct 03, 2014 at 06:47:48PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Once we're in postcopy the source processors are stopped and memory
> shouldn't change any more, so there's no need to look at the dirty
> map.
> 
> There are two notes to this:
>   1) If we do resync and a page had changed then the page would get
>      sent again, which the destination wouldn't allow (since it might
>      have also modified the page)
>   2) Before disabling this I'd seen very rare cases where a page had been
>      marked dirtied although the memory contents are apparently identical

It would be nice to understand how that happened.

> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Otherwise,

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

* Re: [Qemu-devel] [PATCH v4 43/47] Host page!=target page: Cleanup bitmaps
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 43/47] Host page!=target page: Cleanup bitmaps Dr. David Alan Gilbert (git)
@ 2014-11-13  3:10   ` David Gibson
  2014-12-17 18:21     ` Dr. David Alan Gilbert
  2015-01-27 10:20   ` Peter Maydell
  1 sibling, 1 reply; 204+ messages in thread
From: David Gibson @ 2014-11-13  3:10 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

[-- Attachment #1: Type: text/plain, Size: 6934 bytes --]

On Fri, Oct 03, 2014 at 06:47:49PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Prior to the start of postcopy, ensure that everything that will
> be transferred later is a whole host-page in size.
> 
> This is accomplished by discarding partially transferred host pages
> and marking any that are partially dirty as fully dirty.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  arch_init.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 111 insertions(+), 1 deletion(-)
> 
> diff --git a/arch_init.c b/arch_init.c
> index 1fe4fab..aac250c 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -1024,7 +1024,6 @@ static uint32_t get_32bits_map(unsigned long *map, int64_t start)
>   * A helper to put 32 bits into a bit map; trivial for HOST_LONG_BITS=32
>   * messier for 64; the bitmaps are actually long's that are 32 or 64bit
>   */
> -__attribute__ (( unused )) /* Until later in patch series */
>  static void put_32bits_map(unsigned long *map, int64_t start,
>                             uint32_t v)
>  {
> @@ -1153,15 +1152,126 @@ static int pc_each_ram_discard(MigrationState *ms)
>  }
>  
>  /*
> + * Utility for the outgoing postcopy code.
> + *
> + * Discard any partially sent host-page size chunks, mark any partially
> + * dirty host-page size chunks as all dirty.
> + *
> + * Returns: 0 on success
> + */
> +static int postcopy_chunk_hostpages(MigrationState *ms)
> +{
> +    struct RAMBlock *block;
> +    unsigned int host_bits = sysconf(_SC_PAGESIZE) / TARGET_PAGE_SIZE;
> +    uint32_t host_mask;
> +
> +    /* Should be a power of 2 */
> +    assert(host_bits && !(host_bits & (host_bits - 1)));
> +    /*
> +     * If the host_bits isn't a division of 32 (the minimum long size)
> +     * then the code gets a lot more complex; disallow for now
> +     * (I'm not aware of a system where it's true anyway)
> +     */
> +    assert((32 % host_bits) == 0);

This assert makes the first one redundant.
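The redundancy follows from 32 = 2^5: its only divisors are 1, 2, 4, 8, 16 and 32, all powers of two, so 32 % host_bits == 0 already implies the power-of-two check for any non-zero host_bits. The small check below enumerates this (is_pow2() and divisors_of_32_are_pow2() are hypothetical helpers). Note that host_bits == 0 would make the modulo a division by zero, so a non-zero test is still worth keeping in some form.

```c
#include <assert.h>
#include <stdbool.h>

static bool is_pow2(unsigned int x)
{
    return x && !(x & (x - 1));
}

/* True if every n in [1, limit] that divides 32 is a power of two. */
static bool divisors_of_32_are_pow2(unsigned int limit)
{
    for (unsigned int n = 1; n <= limit; n++) {
        if (32 % n == 0 && !is_pow2(n)) {
            return false;
        }
    }
    return true;
}
```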

> +
> +    /* A mask, starting at bit 0, containing host_bits continuous set bits */
> +    host_mask =  (1u << host_bits) - 1;
> +
> +
> +    if (host_bits == 1) {
> +        /* Easy case - TPS==HPS - nothing to be done */
> +        return 0;
> +    }
> +
> +    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
> +        unsigned long first32, last32, cur32;
> +        unsigned long first = block->offset >> TARGET_PAGE_BITS;
> +        unsigned long last = (block->offset + (block->length-1))
> +                                >> TARGET_PAGE_BITS;
> +        PostcopyDiscardState *pds = postcopy_discard_send_init(ms,
> +                                                               first & 31,
> +                                                               block->idstr);
> +
> +        first32 = first / 32;
> +        last32 = last / 32;
> +        for (cur32 = first32; cur32 <= last32; cur32++) {
> +            unsigned int current_hp;
> +            /* Deal with start/end not on alignment */
> +            uint32_t mask = make_32bit_mask(first, last, cur32);
> +
> +            /* a chunk of sent pages */
> +            uint32_t sdata = get_32bits_map(ms->sentmap, cur32 * 32);
> +            /* a chunk of dirty pages */
> +            uint32_t ddata = get_32bits_map(migration_bitmap, cur32 * 32);
> +            uint32_t discard = 0;
> +            uint32_t redirty = 0;
> +            sdata &= mask;
> +            ddata &= mask;
> +
> +            for (current_hp = 0; current_hp < 32; current_hp += host_bits) {
> +                uint32_t host_sent = (sdata >> current_hp) & host_mask;
> +                uint32_t host_dirty = (ddata >> current_hp) & host_mask;
> +
> +                if (host_sent && (host_sent != host_mask)) {
> +                    /* Partially sent host page */
> +                    redirty |= host_mask << current_hp;
> +                    discard |= host_mask << current_hp;
> +
> +                } else if (host_dirty && (host_dirty != host_mask)) {
> +                    /* Partially dirty host page */
> +                    redirty |= host_mask << current_hp;
> +                }
> +            }
> +            if (discard) {
> +                /* Tell the destination to discard these pages */
> +                postcopy_discard_send_chunk(ms, pds, (cur32-first32) * 32,
> +                                            discard);
> +                /* And clear them in the sent data structure */
> +                sdata = get_32bits_map(ms->sentmap, cur32 * 32);
> +                put_32bits_map(ms->sentmap, cur32 * 32, sdata & ~discard);
> +            }
> +            if (redirty) {
> +                /*
> +                 * Reread original dirty bits and OR in ones we clear; we
> +                 * must reread since we might be at the start or end of
> +                 * a RAMBlock that the original 'mask' discarded some
> +                 * bits from
> +                */
> +                ddata = get_32bits_map(migration_bitmap, cur32 * 32);
> +                put_32bits_map(migration_bitmap, cur32 * 32,
> +                           ddata | redirty);
> +                /* Inc the count of dirty pages */
> +                migration_dirty_pages += ctpop32(redirty - (ddata & redirty));
> +            }
> +        }
> +
> +        postcopy_discard_send_finish(ms, pds);
> +    }
> +    /* Easiest way to make sure we don't resume in the middle of a host-page */
> +    last_seen_block = NULL;
> +    last_sent_block = NULL;
> +
> +    return 0;
> +}
> +
> +/*
>   * Transmit the set of pages to be discarded after precopy to the target
>   * these are pages that have been sent previously but have been dirtied
>   * Hopefully this is pretty sparse
>   */
>  int ram_postcopy_send_discard_bitmap(MigrationState *ms)
>  {
> +    int ret;
> +
>      /* This should be our last sync, the src is now paused */
>      migration_bitmap_sync();
>  
> +    /* Deal with TPS != HPS */
> +    ret = postcopy_chunk_hostpages(ms);
> +    if (ret) {
> +        return ret;
> +    }

This really seems like a bogus thing to be doing on the outgoing
migration side.  Doesn't the host page size constraint come from the
destination (due to the need to atomically instate pages)?  Source
host page size == destination host page size doesn't seem like it
should be an inherent constraint, and it's not clear why you can't do
this rounding out to host page sized chunks on the receive end.
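A hypothetical sketch of the receive-side alternative: the destination records which target pages have arrived and only considers a host page ready to be placed atomically once every target page inside it is present. host_bits is assumed to be a power of two, and the helper names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* First target page of the host page containing tp_index. */
static unsigned long host_page_first(unsigned long tp_index,
                                     unsigned host_bits)
{
    return tp_index & ~(unsigned long)(host_bits - 1);
}

static void mark_received(uint64_t *map, unsigned long tp_index)
{
    map[tp_index / 64] |= UINT64_C(1) << (tp_index % 64);
}

/* True once every target page in tp_index's host page has arrived. */
static bool host_page_complete(const uint64_t *map, unsigned long tp_index,
                               unsigned host_bits)
{
    unsigned long first = host_page_first(tp_index, host_bits);

    for (unsigned long i = first; i < first + host_bits; i++) {
        if (!(map[i / 64] & (UINT64_C(1) << (i % 64)))) {
            return false;
        }
    }
    return true;
}
```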

>      /*
>       * Update the sentmap to be  sentmap&=dirty
>       */

* Re: [Qemu-devel] [PATCH v4 44/47] Postcopy; Handle userfault requests
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 44/47] Postcopy; Handle userfault requests Dr. David Alan Gilbert (git)
@ 2014-11-13  3:23   ` David Gibson
  2015-01-05 17:13     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 204+ messages in thread
From: David Gibson @ 2014-11-13  3:23 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

[-- Attachment #1: Type: text/plain, Size: 10271 bytes --]

On Fri, Oct 03, 2014 at 06:47:50PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> userfaultfd is a Linux syscall that gives an fd that receives a stream
> of notifications of accesses to pages marked as MADV_USERFAULT, and
> allows the program to acknowledge those stalls and tell the accessing
> thread to carry on.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

[snip]
>  /*
> + * Tell the kernel that we've now got some memory it previously asked for.
> + * Note: We're not allowed to ack a page which wasn't requested.
> + */
> +static int ack_userfault(MigrationIncomingState *mis, void *start, size_t len)
> +{
> +    uint64_t tmp[2];
> +
> +    /*
> +     * Kernel wants the range that's now safe to access
> +     * Note it always takes 64bit values, even on a 32bit host.
> +     */
> +    tmp[0] = (uint64_t)(uintptr_t)start;
> +    tmp[1] = (uint64_t)(uintptr_t)start + (uint64_t)len;
> +
> +    if (write(mis->userfault_fd, tmp, 16) != 16) {
> +        int e = errno;

Is an EOF (i.e. write() returns 0) ever possible here?  If so, errno
may not have a meaningful value.
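A sketch of a write wrapper that separates those cases, with write_full() as a hypothetical name: -1 from write() yields -errno, copied immediately because later calls such as error_report() may overwrite errno, while a short or zero-length result, for which errno carries no meaning, is mapped to -EIO (an arbitrary choice for illustration).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* 0 on a full write, -errno on failure, -EIO on a short write. */
static int write_full(int fd, const void *buf, size_t len)
{
    ssize_t ret = write(fd, buf, len);

    if (ret < 0) {
        int e = errno;  /* copy before anything else can clobber it */
        return -e;
    }
    if ((size_t)ret != len) {
        return -EIO;    /* write() did not set errno in this case */
    }
    return 0;
}
```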

> +        if (e == ENOENT) {
> +            /* Kernel said it wasn't waiting - one case where this can
> +             * happen is where two threads triggered the userfault
> +             * and we receive the page and ack it just after we received
> +             * the 2nd request and that ends up deciding it should ack it
> +             * We could optimise it out, but it's rare.
> +             */
> +            /*fprintf(stderr, "ack_userfault: %p/%zx ENOENT\n", start, len); */
> +            return 0;
> +        }
> +        error_report("postcopy_ram: Failed to notify kernel for %p/%zx (%d)",
> +                     start, len, e);
> +        return -errno;
> +    }
> +
> +    return 0;
> +}
> +
> +/*
>   * Handle faults detected by the USERFAULT markings
>   */
>  static void *postcopy_ram_fault_thread(void *opaque)
>  {
>      MigrationIncomingState *mis = (MigrationIncomingState *)opaque;
> +    void *hostaddr;
> +    int ret;
> +    size_t hostpagesize = getpagesize();
> +    RAMBlock *rb = NULL;
> +    RAMBlock *last_rb = NULL; /* last RAMBlock we sent part of */
>  
> -    fprintf(stderr, "postcopy_ram_fault_thread\n");
> -    /* TODO: In later patch */
> +    DPRINTF("%s", __func__);
>      qemu_sem_post(&mis->fault_thread_sem);
> -    while (1) {
> -        /* TODO: In later patch */
> -    }
> +    while (true) {
> +        PostcopyPMIState old_state, tmp_state;
> +        ram_addr_t rb_offset;
> +        ram_addr_t in_raspace;
> +        unsigned long bitmap_index;
> +        struct pollfd pfd[2];
> +
> +        /*
> +         * We're mainly waiting for the kernel to give us a faulting HVA,
> +         * however we can be told to quit via userfault_quit_fd which is
> +         * an eventfd
> +         */
> +        pfd[0].fd = mis->userfault_fd;
> +        pfd[0].events = POLLIN;
> +        pfd[0].revents = 0;
> +        pfd[1].fd = mis->userfault_quit_fd;
> +        pfd[1].events = POLLIN; /* Waiting for eventfd to go positive */
> +        pfd[1].revents = 0;
> +
> +        if (poll(pfd, 2, -1 /* Wait forever */) == -1) {
> +            perror("userfault poll");
> +            break;
> +        }
>  
> +        if (pfd[1].revents) {
> +            DPRINTF("%s got quit event", __func__);
> +            break;

I don't see any cleanup path in the userfault thread, so wouldn't it
be simpler to just pthread_cancel() it rather than using an extra fd
for quit notifications?
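A minimal sketch of the pthread_cancel() approach, valid only under the premise above that the thread holds no locks or resources needing cleanup (otherwise cleanup handlers via pthread_cleanup_push() would be required). It uses plain POSIX threads rather than QEMU's QemuThread wrappers and relies on poll() and read() being cancellation points, so a pending cancel takes effect while the thread is blocked. fault_thread() and stop_fault_thread() are hypothetical names.

```c
#include <assert.h>
#include <poll.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* Stand-in for the fault thread: blocks in poll() on one fd. */
static void *fault_thread(void *opaque)
{
    int fd = *(int *)opaque;
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    char buf[64];

    for (;;) {
        /* poll() is a cancellation point: a pending cancel ends us here */
        if (poll(&pfd, 1, -1) < 0) {
            break;
        }
        if ((pfd.revents & POLLIN) && read(fd, buf, sizeof(buf)) <= 0) {
            break;
        }
        if (pfd.revents & (POLLERR | POLLHUP | POLLNVAL)) {
            break;
        }
        /* ... handle the fault request ... */
    }
    return NULL;
}

/* Returns 0 if the thread was cancelled and joined. */
static int stop_fault_thread(pthread_t tid)
{
    void *res;

    if (pthread_cancel(tid) != 0 || pthread_join(tid, &res) != 0) {
        return -1;
    }
    return res == PTHREAD_CANCELED ? 0 : -1;
}
```

One trade-off is that the eventfd version keeps the exit point explicit in the thread's own code, while cancellation saves the extra fd at the cost of reasoning about every cancellation point the thread may reach.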

> +        }
> +
> +        ret = read(mis->userfault_fd, &hostaddr, sizeof(hostaddr));
> +        if (ret != sizeof(hostaddr)) {
> +            if (ret < 0) {
> +                perror("Failed to read full userfault hostaddr");
> +                break;
> +            } else {
> +                error_report("%s: Read %d bytes from userfaultfd expected %zd",
> +                             __func__, ret, sizeof(hostaddr));
> +                break; /* Lost alignment, don't know what we'd read next */
> +            }
> +        }
> +
> +        rb = qemu_ram_block_from_host(hostaddr, true, &in_raspace, &rb_offset,
> +                                      &bitmap_index);
> +        if (!rb) {
> +            error_report("postcopy_ram_fault_thread: Fault outside guest: %p",
> +                         hostaddr);
> +            break;
> +        }
> +
> +        DPRINTF("%s: Request for HVA=%p index=%lx rb=%s offset=%zx",
> +                __func__, hostaddr, bitmap_index, qemu_ram_get_idstr(rb),
> +                rb_offset);
> +
> +        tmp_state = postcopy_pmi_get_state(mis, bitmap_index);
> +        do {
> +            old_state = tmp_state;
> +
> +            switch (old_state) {
> +            case POSTCOPY_PMI_REQUESTED:
> +                /* Do nothing - it's already requested */
> +                break;
> +
> +            case POSTCOPY_PMI_RECEIVED:
> +                /* Already arrived - no state change, just kick the kernel */
> +                DPRINTF("postcopy_ram_fault_thread: notify pre of %p",
> +                        hostaddr);
> +                if (ack_userfault(mis,
> +                                  (void *)((uintptr_t)hostaddr
> +                                           & ~(hostpagesize - 1)),
> +                                  hostpagesize)) {
> +                    assert(0);
> +                }
> +                break;
> +
> +            case POSTCOPY_PMI_MISSING:
> +
> +                tmp_state = postcopy_pmi_change_state(mis, bitmap_index,
> +                                           old_state, POSTCOPY_PMI_REQUESTED);
> +                if (tmp_state == POSTCOPY_PMI_MISSING) {
> +                    /*
> +                     * Send the request to the source - we want to request one
> +                     * of our host page sizes (which is >= TPS)
> +                     */
> +                    if (rb != last_rb) {
> +                        last_rb = rb;
> +                        migrate_send_rp_reqpages(mis, qemu_ram_get_idstr(rb),
> +                                                 rb_offset, hostpagesize);
> +                    } else {
> +                        /* Save some space */
> +                        migrate_send_rp_reqpages(mis, NULL,
> +                                                 rb_offset, hostpagesize);
> +                    }
> +                }
> +                break;
> +           }
> +        } while (tmp_state != old_state);
> +    }
> +    DPRINTF("%s: exit", __func__);
>      return NULL;
>  }
>  
>  int postcopy_ram_enable_notify(MigrationIncomingState *mis)
>  {
> -    /* Create the fault handler thread and wait for it to be ready */
> +    uint64_t tmp64;
> +
> +    /* Open the fd for the kernel to give us userfaults */
> +    mis->userfault_fd = syscall(__NR_userfaultfd, O_CLOEXEC);
> +    if (mis->userfault_fd == -1) {
> +        perror("Failed to open userfault fd");
> +        return -1;
> +    }
> +
> +    /*
> +     * Version handshake, we send it the version we want and expect to get the
> +     * same back.
> +     */
> +    tmp64 = USERFAULTFD_PROTOCOL;
> +    if (write(mis->userfault_fd, &tmp64, sizeof(tmp64)) != sizeof(tmp64)) {
> +        perror("Writing userfaultfd version");
> +        close(mis->userfault_fd);
> +        return -1;
> +    }
> +    if (read(mis->userfault_fd, &tmp64, sizeof(tmp64)) != sizeof(tmp64)) {
> +        perror("Reading userfaultfd version");
> +        close(mis->userfault_fd);
> +        return -1;
> +    }
> +    if (tmp64 != USERFAULTFD_PROTOCOL) {
> +        error_report("Mismatched userfaultfd version, expected %zx, got %zx",
> +                     (size_t)USERFAULTFD_PROTOCOL, (size_t)tmp64);
> +        close(mis->userfault_fd);
> +        return -1;
> +    }
> +
> +    /* Now an eventfd we use to tell the fault-thread to quit */
> +    mis->userfault_quit_fd = eventfd(0, EFD_CLOEXEC);
> +    if (mis->userfault_quit_fd == -1) {
> +        perror("Opening userfault_quit_fd");
> +        close(mis->userfault_fd);
> +        return -1;
> +    }
> +
>      qemu_sem_init(&mis->fault_thread_sem, 0);
>      qemu_thread_create(&mis->fault_thread, "postcopy/fault",
>                         postcopy_ram_fault_thread, mis, QEMU_THREAD_JOINABLE);
>      qemu_sem_wait(&mis->fault_thread_sem);
> +    mis->have_fault_thread = true;
>  
>      /* Mark so that we get notified of accesses to unwritten areas */
>      if (qemu_ram_foreach_block(postcopy_ram_sensitise_area, mis)) {
>          return -1;
>      }
>  
> +    DPRINTF("postcopy_ram_enable_notify: Sensitised");
> +
>      return 0;
>  }
>  
> @@ -612,11 +814,12 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
>  
>      if (syscall(__NR_remap_anon_pages, host, from, hps, 0) !=
>              getpagesize()) {
> +        int e = errno;
>          perror("remap_anon_pages in postcopy_place_page");
>          fprintf(stderr, "host: %p from: %p pmi=%d\n", host, from,
>                  postcopy_pmi_get_state(mis, bitmap_offset));
>  
> -        return -errno;
> +        return -e;

Unrelated change; it should probably be folded into the patch which
added this code.

>      }
>  
>      tmp_state = postcopy_pmi_get_state(mis, bitmap_offset);
> @@ -629,7 +832,10 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
>  
>  
>      if (old_state == POSTCOPY_PMI_REQUESTED) {
> -        /* TODO: Notify kernel */
> +        /* Send the kernel the host address that should now be accessible */
> +        DPRINTF("%s: Notifying kernel bitmap_offset=0x%lx host=%p",
> +                __func__, bitmap_offset, host);
> +        return ack_userfault(mis, host, hps);
>      }
>  
>      return 0;

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 204+ messages in thread

* Re: [Qemu-devel] [PATCH v4 45/47] Start up a postcopy/listener thread ready for incoming page data
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 45/47] Start up a postcopy/listener thread ready for incoming page data Dr. David Alan Gilbert (git)
@ 2014-11-13  3:29   ` David Gibson
  2014-11-19 19:40     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 204+ messages in thread
From: David Gibson @ 2014-11-13  3:29 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

[-- Attachment #1: Type: text/plain, Size: 1415 bytes --]

On Fri, Oct 03, 2014 at 06:47:51PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> The loading of a device state (during postcopy) may access guest
> memory that's still on the source machine and thus might need
> a page fill; split off a separate thread that handles the incoming
> page data so that the original incoming migration code can finish
> off the device data.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  include/migration/migration.h |  4 +++
>  migration.c                   |  6 +++++
>  savevm.c                      | 62 +++++++++++++++++++++++++++++++++++++++++--
>  3 files changed, 70 insertions(+), 2 deletions(-)
> 
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index 00255b8..69e776c 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -92,6 +92,10 @@ struct MigrationIncomingState {
>      QemuThread     fault_thread;
>      QemuSemaphore  fault_thread_sem;
>  
> +    bool           have_listen_thread;

AFAICT have_listen_thread is never set to a value other than 'true',
so there doesn't seem to be much point to it.

* Re: [Qemu-devel] [PATCH v4 36/47] Page request: Process incoming page request
  2014-11-10  6:31   ` David Gibson
@ 2014-11-17 19:07     ` Dr. David Alan Gilbert
  2014-11-18  4:38       ` David Gibson
  0 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-11-17 19:07 UTC (permalink / raw)
  To: David Gibson
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Fri, Oct 03, 2014 at 06:47:42PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > On receiving MIG_RPCOMM_REQPAGES look up the address and
> > queue the page.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  arch_init.c                   | 52 +++++++++++++++++++++++++++++++++++++++++++
> >  include/migration/migration.h | 21 +++++++++++++++++
> >  include/qemu/typedefs.h       |  3 ++-
> >  migration.c                   | 34 +++++++++++++++++++++++++++-
> >  4 files changed, 108 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch_init.c b/arch_init.c
> > index 4a03171..72f9e17 100644
> > --- a/arch_init.c
> > +++ b/arch_init.c
> > @@ -660,6 +660,58 @@ static int ram_save_page(QEMUFile *f, RAMBlock* block, ram_addr_t offset,
> >  }
> >  
> >  /*
> > + * Queue the pages for transmission, e.g. a request from postcopy destination
> > + *   ms: MigrationStatus in which the queue is held
> > + *   rbname: The RAMBlock the request is for - may be NULL (to mean reuse last)
> > + *   start: Offset from the start of the RAMBlock
> > + *   len: Length (in bytes) to send
> > + *   Return: 0 on success
> > + */
> > +int ram_save_queue_pages(MigrationState *ms, const char *rbname,
> > +                         ram_addr_t start, ram_addr_t len)
> > +{
> > +    RAMBlock *ramblock;
> > +
> > +    if (!rbname) {
> > +        /* Reuse last RAMBlock */
> > +        ramblock = ms->last_req_rb;
> > +
> > +        if (!ramblock) {
> > +            error_report("ram_save_queue_pages no previous block");
> > +            return -1;
> 
> This should be an assert() shouldn't it?
> 
> > +        }
> > +    } else {
> > +        ramblock = ram_find_block(rbname);
> > +
> > +        if (!ramblock) {
> > +            error_report("ram_save_queue_pages no block '%s'", rbname);
> > +            return -1;
> > +        }
> 
> And maybe this one too - I would have expected the rb names to have
> already been validated on the source machine at this stage.

No to both:
I've been trying to avoid asserts in migration outgoing code, because
they shouldn't affect the state of your guest, so there's no reason
to kill off what might still be a viable running guest just because
migration failed.

(On the incoming side it's a bit more OK since if you've not got
a full running VM anyway yet then there's not much to lose).

Dave


> 
> > +    }
> > +    DPRINTF("ram_save_queue_pages: Block %s start %zx len %zx",
> > +                    ramblock->idstr, start, len);
> > +
> > +    if (start+len > ramblock->length) {
> > +        error_report("%s request overrun start=%zx len=%zx blocklen=%zx",
> > +                     __func__, start, len, ramblock->length);
> > +        return -1;
> > +    }
> > +
> > +    struct MigrationSrcPageRequest *new_entry =
> > +        g_malloc0(sizeof(struct MigrationSrcPageRequest));
> > +    new_entry->rb = ramblock;
> > +    new_entry->offset = start;
> > +    new_entry->len = len;
> > +    ms->last_req_rb = ramblock;
> > +
> > +    qemu_mutex_lock(&ms->src_page_req_mutex);
> > +    QSIMPLEQ_INSERT_TAIL(&ms->src_page_requests, new_entry, next_req);
> > +    qemu_mutex_unlock(&ms->src_page_req_mutex);
> > +
> > +    return 0;
> > +}
> > +
> > +/*
> >   * ram_find_and_save_block: Finds a page to send and sends it to f
> >   *
> >   * Returns:  The number of bytes written.
> > diff --git a/include/migration/migration.h b/include/migration/migration.h
> > index 5e0d30d..5bc01d5 100644
> > --- a/include/migration/migration.h
> > +++ b/include/migration/migration.h
> > @@ -102,6 +102,18 @@ MigrationIncomingState *migration_incoming_get_current(void);
> >  MigrationIncomingState *migration_incoming_state_init(QEMUFile *f);
> >  void migration_incoming_state_destroy(void);
> >  
> > +/*
> > + * An outstanding page request, on the source, having been received
> > + * and queued
> > + */
> > +struct MigrationSrcPageRequest {
> > +    RAMBlock *rb;
> > +    hwaddr    offset;
> > +    hwaddr    len;
> > +
> > +    QSIMPLEQ_ENTRY(MigrationSrcPageRequest) next_req;
> > +};
> > +
> >  struct MigrationState
> >  {
> >      int64_t bandwidth_limit;
> > @@ -138,6 +150,12 @@ struct MigrationState
> >       * of the postcopy phase
> >       */
> >      unsigned long *sentmap;
> > +
> > +    /* Queue of outstanding page requests from the destination */
> > +    QemuMutex src_page_req_mutex;
> > +    QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest) src_page_requests;
> > +    /* The RAMBlock used in the last src_page_request */
> > +    RAMBlock *last_req_rb;
> >  };
> >  
> >  void process_incoming_migration(QEMUFile *f);
> > @@ -273,4 +291,7 @@ size_t ram_control_save_page(QEMUFile *f, ram_addr_t block_offset,
> >                               ram_addr_t offset, size_t size,
> >                               int *bytes_sent);
> >  
> > +int ram_save_queue_pages(MigrationState *ms, const char *rbname,
> > +                         ram_addr_t start, ram_addr_t len);
> > +
> >  #endif
> > diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
> > index 79f57c0..24c2207 100644
> > --- a/include/qemu/typedefs.h
> > +++ b/include/qemu/typedefs.h
> > @@ -8,6 +8,7 @@ typedef struct QEMUTimerListGroup QEMUTimerListGroup;
> >  typedef struct QEMUFile QEMUFile;
> >  typedef struct QEMUBH QEMUBH;
> >  
> > +typedef struct AdapterInfo AdapterInfo;
> >  typedef struct AioContext AioContext;
> >  
> >  typedef struct Visitor Visitor;
> > @@ -80,6 +81,6 @@ typedef struct FWCfgState FWCfgState;
> >  typedef struct PcGuestInfo PcGuestInfo;
> >  typedef struct PostcopyPMI PostcopyPMI;
> >  typedef struct Range Range;
> > -typedef struct AdapterInfo AdapterInfo;
> > +typedef struct RAMBlock RAMBlock;
> >  
> >  #endif /* QEMU_TYPEDEFS_H */
> > diff --git a/migration.c b/migration.c
> > index cfdaa52..63d7699 100644
> > --- a/migration.c
> > +++ b/migration.c
> > @@ -26,6 +26,8 @@
> >  #include "qemu/thread.h"
> >  #include "qmp-commands.h"
> >  #include "trace.h"
> > +#include "exec/memory.h"
> > +#include "exec/address-spaces.h"
> >  
> >  //#define DEBUG_MIGRATION
> >  
> > @@ -504,6 +506,15 @@ static void migrate_fd_cleanup(void *opaque)
> >  
> >      migrate_fd_cleanup_src_rp(s);
> >  
> > +    /* This queue generally should be empty - but in the case of a failed
> > +     * migration might have some droppings in.
> > +     */
> > +    struct MigrationSrcPageRequest *mspr, *next_mspr;
> > +    QSIMPLEQ_FOREACH_SAFE(mspr, &s->src_page_requests, next_req, next_mspr) {
> > +        QSIMPLEQ_REMOVE_HEAD(&s->src_page_requests, next_req);
> > +        g_free(mspr);
> > +    }
> > +
> >      if (s->file) {
> >          trace_migrate_fd_cleanup();
> >          qemu_mutex_unlock_iothread();
> > @@ -610,6 +621,9 @@ MigrationState *migrate_init(const MigrationParams *params)
> >      s->state = MIG_STATE_SETUP;
> >      trace_migrate_set_state(MIG_STATE_SETUP);
> >  
> > +    qemu_mutex_init(&s->src_page_req_mutex);
> > +    QSIMPLEQ_INIT(&s->src_page_requests);
> > +
> >      s->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> >      return s;
> >  }
> > @@ -823,7 +837,25 @@ static void source_return_path_bad(MigrationState *s)
> >  static void migrate_handle_rp_reqpages(MigrationState *ms, const char* rbname,
> >                                         ram_addr_t start, ram_addr_t len)
> >  {
> > -    DPRINTF("migrate_handle_rp_reqpages: at %zx for len %zx", start, len);
> > +    DPRINTF("migrate_handle_rp_reqpages: in %s start %zx len %zx",
> > +            rbname, start, len);
> > +
> > +    /* Round everything up to our host page size */
> > +    long our_host_ps = sysconf(_SC_PAGESIZE);
> > +    if (start & (our_host_ps-1)) {
> > +        long roundings = start & (our_host_ps-1);
> > +        start -= roundings;
> > +        len += roundings;
> > +    }
> > +    if (len & (our_host_ps-1)) {
> > +        long roundings = len & (our_host_ps-1);
> > +        len -= roundings;
> > +        len += our_host_ps;
> > +    }
> > +
> > +    if (ram_save_queue_pages(ms, rbname, start, len)) {
> > +        source_return_path_bad(ms);
> > +    }
> >  }
> >  
> >  /*
> 


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v4 16/47] Return path: Source handling of return path
  2014-11-03 13:22     ` Dr. David Alan Gilbert
@ 2014-11-18  3:52       ` David Gibson
  2014-11-19 17:06         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 204+ messages in thread
From: David Gibson @ 2014-11-18  3:52 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On Mon, Nov 03, 2014 at 01:22:45PM +0000, Dr. David Alan Gilbert wrote:
> * David Gibson (david@gibson.dropbear.id.au) wrote:
> > On Fri, Oct 03, 2014 at 06:47:22PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > 
> > > Open a return path, and handle messages that are received upon it.
> > > 
> > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > 
> > [snip]
> > > @@ -414,6 +448,11 @@ static void migrate_fd_cancel(MigrationState *s)
> > >      int old_state ;
> > >      trace_migrate_fd_cancel();
> > >  
> > > +    if (s->return_path) {
> > > +        /* shutdown the rp socket, so causing the rp thread to shutdown */
> > > +        qemu_file_shutdown(s->return_path);
> > 
> > Terminating the rp thread via shutting down its file seems roundabout,
> > and kind of dependent on the socket file implementation.
> 
> The rp thread might be in the middle of a blocking read()/recv()
> so I'm doing a shutdown() to cause those to exit; once I have to do that
> anyway it didn't seem necessary to add anything extra.

Hm.  I don't recall, does the rp thread need to do some cleanup at
this point?  Otherwise pthread_cancel() should kill a thread, even if
it's blocked at the moment.

> > [snip]
> > > +__attribute__ (( unused )) /* Until later in patch series */
> > > +static int open_outgoing_return_path(MigrationState *ms)
> > > +{
> > > +
> > > +    ms->return_path = qemu_file_get_return_path(ms->file);
> > 
> > So, another reason this get_return_path abstraction doesn't seem right
> > to me, is that it's not obvious that for non-socket file types, the
> > source and destination side "get return path" operations would
> > necessarily be the same.
> 
> However, since get_return_path is a method of the particular QEMUFile
> implementation, and it can differ between a qemu_file opened for read
> and one opened for write, that non-socket file type could implement
> it however it likes (including something like shutdown).

So, I'm a little less bothered by this since I realised that QemuFile
is basically only used for migration streams, not for other file type
operations.  The fact that that makes QemuFile a really bad name is a
different matter.

The return path operation is quite specific to a migration stream, and
doesn't really belong with a "file" abstraction.

The case I've been considering where it's not easy to see how to
abstract this is that of a pipe - in that case it will be necessary to
open a second pipe from destination to source, which probably needs
some preliminary work when first opening the connection, and therefore
can't easily be encapsulated into a "get return path" callback.

The abstraction of the shutdown is another question again - I can't
think of any other file type which has an operation similar in effect
to shutdown(), so it seems really socket specific. Which is another
reason I'm not convinced telling the rp thread to die via its stream
is a good idea.

* Re: [Qemu-devel] [PATCH v4 12/47] Handle bi-directional communication for fd migration
  2014-11-03 13:53     ` Cristian Klein
@ 2014-11-18  3:53       ` David Gibson
  2014-11-19 17:27         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 204+ messages in thread
From: David Gibson @ 2014-11-18  3:53 UTC (permalink / raw)
  To: Cristian Klein
  Cc: Andrea Arcangeli, yamahata, lilei, quintela,
	Dr. David Alan Gilbert, qemu-devel, amit.shah, yanghy

On Mon, Nov 03, 2014 at 03:53:03PM +0200, Cristian Klein wrote:
> On 03 Nov 2014, at 5:12 , David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Fri, Oct 03, 2014 at 06:47:18PM +0100, Dr. David Alan Gilbert (git) wrote:
> >> From: Cristian Klein <cristian.klein@cs.umu.se>
> > 
> > This patch really, really requires a rationale in the commit message.
> > The reason it's necessary is certainly not obvious.
> 
> """
> libvirt prefers opening the TCP connection itself, for two reasons. First, connection failures can be detected more easily, without having to parse qemu’s error output. Second, libvirt might be asked to secure the transfer by tunnelling the communication through a TLS layer. Therefore, libvirt opens the TCP connection itself and passes an FD to qemu using QMP and a POSIX-specific mechanism. Hence, in order to make the reverse-path work in such cases, qemu needs to distinguish whether the transmitted FD is a socket (reverse-path available) or not (reverse-path might not be available) and use the corresponding abstraction.
> """
> 
> If the above message clarifies the purpose of this commit, feel
> free to add it in the next version of the patch.

That would help, yes.

* Re: [Qemu-devel] [PATCH v4 11/47] Return path: socket_writev_buffer: Block even on non-blocking fd's
  2014-11-03 18:59     ` Dr. David Alan Gilbert
@ 2014-11-18  3:54       ` David Gibson
  0 siblings, 0 replies; 204+ messages in thread
From: David Gibson @ 2014-11-18  3:54 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On Mon, Nov 03, 2014 at 06:59:35PM +0000, Dr. David Alan Gilbert wrote:
> * David Gibson (david@gibson.dropbear.id.au) wrote:
> > On Fri, Oct 03, 2014 at 06:47:17PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > 
> > > The return path uses a non-blocking fd so as not to block waiting
> > > for the (possibly broken) destination to finish returning a message,
> > > however we still want outbound data to behave in the same way and block.
> > > 
> > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > ---
> > >  qemu-file.c | 39 +++++++++++++++++++++++++++++++++++----
> > >  1 file changed, 35 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/qemu-file.c b/qemu-file.c
> > > index 7393415..57eabd8 100644
> > > --- a/qemu-file.c
> > > +++ b/qemu-file.c
> > > @@ -85,12 +85,43 @@ static ssize_t socket_writev_buffer(void *opaque, struct iovec *iov, int iovcnt,
> > >      QEMUFileSocket *s = opaque;
> > >      ssize_t len;
> > >      ssize_t size = iov_size(iov, iovcnt);
> > > +    ssize_t offset = 0;
> > > +    int     err;
> > >  
> > > -    len = iov_send(s->fd, iov, iovcnt, 0, size);
> > > -    if (len < size) {
> > > -        len = -socket_error();
> > > +    while (size > 0) {
> > > +        len = iov_send(s->fd, iov, iovcnt, offset, size);
> > > +
> > > +        if (len > 0) {
> > > +            size -= len;
> > > +            offset += len;
> > > +        }
> > > +
> > > +        if (size > 0) {
> > > +            err = socket_error();
> > > +
> > > +            if (err != EAGAIN) {
> > > +                error_report("socket_writev_buffer: Got err=%d for (%zd/%zd)",
> > > +                             err, size, len);
> > > +                /*
> > > +                 * If I've already sent some but only just got the error, I
> > > +                 * could return the amount validly sent so far and wait for the
> > > +                 * next call to report the error, but I'd rather flag the error
> > > +                 * immediately.
> > 
> > Is that safe?  This gives the caller no means to detect a partially
> > completed send.
> 
> Well I'm returning the -err, so the caller knows something has gone wrong - it just
> doesn't know whether it managed to send some part of the data before
> the failure.

Right.  Which seems like it could be pretty important.

* Re: [Qemu-devel] [PATCH v4 10/47] Return path: Open a return path on QEMUFile for sockets
  2014-11-03 19:04     ` Dr. David Alan Gilbert
@ 2014-11-18  4:34       ` David Gibson
  0 siblings, 0 replies; 204+ messages in thread
From: David Gibson @ 2014-11-18  4:34 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On Mon, Nov 03, 2014 at 07:04:48PM +0000, Dr. David Alan Gilbert wrote:
> * David Gibson (david@gibson.dropbear.id.au) wrote:
> > On Fri, Oct 03, 2014 at 06:47:16PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > 
> > > Postcopy needs a method to send messages from the destination back to
> > > the source, this is the 'return path'.
> > > 
> > > Wire it up for 'socket' QEMUFile's using a dup'd fd.
> > 
> > This doesn't seem like the right abstraction to me.  In particular I
> > can't really see how you'd implement this for anything other than
> > socket.
> > 
> > I'd suggest instead creating new "open" helper functions (within the
> > QEMUFile code) that open both a forward and return path
> > simultaneously.
> 
> Can you give an example of a transport where it would be a problem,
> so I can look at how that works?

pipe

socket routed through some external transport / encoding layer that's
one way only.

> It's a little tricky since, on the destination, at the time we create
> the connection we don't know that we're going to need the return path.

Creating it and not using it shouldn't be a problem though, should it?
As long as you just fall back to precopy rather than failing the
migration if you're unable to open it.

* Re: [Qemu-devel] [PATCH v4 22/47] QEMU_VM_CMD_PACKAGED: Send a packaged chunk of migration stream
  2014-11-04 10:19     ` Dr. David Alan Gilbert
@ 2014-11-18  4:36       ` David Gibson
  0 siblings, 0 replies; 204+ messages in thread
From: David Gibson @ 2014-11-18  4:36 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On Tue, Nov 04, 2014 at 10:19:15AM +0000, Dr. David Alan Gilbert wrote:
> * David Gibson (david@gibson.dropbear.id.au) wrote:
> > On Fri, Oct 03, 2014 at 06:47:28PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > 
> > > QEMU_VM_CMD_PACKAGED is a migration command that allows a chunk
> > > of migration stream to be sent in one go, and be received by
> > > a separate instance of the loadvm loop while not interacting
> > > with the migration stream.
> > > 
> > > This is used by postcopy to load device state (from the package)
> > > while loading memory pages from the main stream.
> > > 
> > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > 
> > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> > 
> > Though one minor comment:
> > 
> > [snip]
> > > +/* We have a buffer of data to send; we don't want that all to be loaded
> > > + * by the command itself, so the command contains just the length of the
> > > + * extra buffer that we then send straight after it.
> > > + * TODO: Must be a better way to organise that
> > 
> > I'm not quite understanding what that comment's getting at.
> 
> We have these VM Commands; and they are a command type, and a length:
>      CMD_whatever
>      length: whatever
>      data for whatever
> 
> This comment is describing that, to make things easier for this code it's
> ended up as:
> 
>      CMD_PACKAGED
>      CMD length: 4    <--- i.e. just enough to hold the next 'length' field
>      package length
>     ---------------
>     The package
> 
> Which is a little different, hence I thought it needed the comment.

Ah... right.  That seems gratuitously easy to get wrong further down
the track.  Why not just use the cmd length as the package length?

* Re: [Qemu-devel] [PATCH v4 36/47] Page request: Process incoming page request
  2014-11-17 19:07     ` Dr. David Alan Gilbert
@ 2014-11-18  4:38       ` David Gibson
  2014-11-19 19:37         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 204+ messages in thread
From: David Gibson @ 2014-11-18  4:38 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On Mon, Nov 17, 2014 at 07:07:33PM +0000, Dr. David Alan Gilbert wrote:
> * David Gibson (david@gibson.dropbear.id.au) wrote:
> > On Fri, Oct 03, 2014 at 06:47:42PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > 
> > > On receiving MIG_RPCOMM_REQPAGES look up the address and
> > > queue the page.
> > > 
> > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > ---
> > >  arch_init.c                   | 52 +++++++++++++++++++++++++++++++++++++++++++
> > >  include/migration/migration.h | 21 +++++++++++++++++
> > >  include/qemu/typedefs.h       |  3 ++-
> > >  migration.c                   | 34 +++++++++++++++++++++++++++-
> > >  4 files changed, 108 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch_init.c b/arch_init.c
> > > index 4a03171..72f9e17 100644
> > > --- a/arch_init.c
> > > +++ b/arch_init.c
> > > @@ -660,6 +660,58 @@ static int ram_save_page(QEMUFile *f, RAMBlock* block, ram_addr_t offset,
> > >  }
> > >  
> > >  /*
> > > + * Queue the pages for transmission, e.g. a request from postcopy destination
> > > + *   ms: MigrationStatus in which the queue is held
> > > + *   rbname: The RAMBlock the request is for - may be NULL (to mean reuse last)
> > > + *   start: Offset from the start of the RAMBlock
> > > + *   len: Length (in bytes) to send
> > > + *   Return: 0 on success
> > > + */
> > > +int ram_save_queue_pages(MigrationState *ms, const char *rbname,
> > > +                         ram_addr_t start, ram_addr_t len)
> > > +{
> > > +    RAMBlock *ramblock;
> > > +
> > > +    if (!rbname) {
> > > +        /* Reuse last RAMBlock */
> > > +        ramblock = ms->last_req_rb;
> > > +
> > > +        if (!ramblock) {
> > > +            error_report("ram_save_queue_pages no previous block");
> > > +            return -1;
> > 
> > This should be an assert() shouldn't it?
> > 
> > > +        }
> > > +    } else {
> > > +        ramblock = ram_find_block(rbname);
> > > +
> > > +        if (!ramblock) {
> > > +            error_report("ram_save_queue_pages no block '%s'", rbname);
> > > +            return -1;
> > > +        }
> > 
> > And maybe this one too - I would have expected the rb names to have
> > already been validated on the source machine at this stage.
> 
> No to both:
> I've been trying to avoid asserts in migration outgoing code, because
> they shouldn't affect the state of your guest, so there's no reason
> to kill off what might still be a viable running guest just because
> migration failed.

Ah, ok, that makes sense.  Maybe adding something to the error message
or a nearby comment indicating that if these happen it's certainly a
bug, not the result of some external problem?

* Re: [Qemu-devel] [PATCH v4 16/47] Return path: Source handling of return path
  2014-11-18  3:52       ` David Gibson
@ 2014-11-19 17:06         ` Dr. David Alan Gilbert
  2014-11-19 21:12           ` David Gibson
  0 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-11-19 17:06 UTC (permalink / raw)
  To: David Gibson
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Mon, Nov 03, 2014 at 01:22:45PM +0000, Dr. David Alan Gilbert wrote:
> > * David Gibson (david@gibson.dropbear.id.au) wrote:
> > > On Fri, Oct 03, 2014 at 06:47:22PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > 
> > > > Open a return path, and handle messages that are received upon it.
> > > > 
> > > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > 
> > > [snip]
> > > > @@ -414,6 +448,11 @@ static void migrate_fd_cancel(MigrationState *s)
> > > >      int old_state ;
> > > >      trace_migrate_fd_cancel();
> > > >  
> > > > +    if (s->return_path) {
> > > > +        /* shutdown the rp socket, so causing the rp thread to shutdown */
> > > > +        qemu_file_shutdown(s->return_path);
> > > 
> > > Terminating the rp thread via shutting down its file seems roundabout,
> > > and kind of dependent on the socket file implementation.
> > 
> > The rp thread might be in the middle of a blocking read()/recv()
> > so I'm doing a shutdown() to cause those to exit; once I have to do that
> > anyway it didn't seem necessary to add anything extra.
> 
> Hm.  I don't recall, does the rp thread need to do some cleanup at
> this point?  Otherwise pthread_cancel() should kill a thread, even if
> it's blocked at the moment.

It was Paolo's idea to use shutdown() and I agree - it works well;
I'd originally thought about using pthread_cancel but it seemed to be
generally disliked - you have to be very careful to either know exactly
the points at which it might be killed (if you use the deferred version)
or be prepared to deal with your thread disappearing at any time and
ensure your data structures are always consistent.  In addition there
was some concern that there was no Windows equivalent to pthread_cancel.
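A minimal sketch of why shutdown() suits this. A reader thread blocked in recv() gets 0 (end of stream) back once another thread shuts the socket down, so it leaves through its normal exit path and can tidy up, without pthread_cancel()'s cancellation-point concerns. This assumes Linux stream-socket behaviour. A socketpair() stands in for the migration connection, and rp_reader()/stop_reader_via_shutdown() are hypothetical names, not the patch's code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the return-path thread: keep reading until
 * recv() reports end-of-stream (0) or an error (<0), then return that. */
static void *rp_reader(void *opaque)
{
    int fd = *(int *)opaque;
    char buf[64];
    ssize_t n;

    do {
        n = recv(fd, buf, sizeof(buf), 0);
    } while (n > 0);

    return (void *)(intptr_t)n;
}

/* Start a reader on one end of a socketpair and stop it with shutdown()
 * rather than pthread_cancel().  Returns what recv() gave the reader
 * when it was woken (0 expected), or -2 if setup failed. */
static long stop_reader_via_shutdown(void)
{
    int sv[2];
    pthread_t tid;
    void *ret;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -2;
    }
    if (pthread_create(&tid, NULL, rp_reader, &sv[0]) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -2;
    }

    sleep(1);                    /* crude: let the reader block in recv() */
    shutdown(sv[0], SHUT_RDWR);  /* wakes it; a later recv() also sees EOF */
    pthread_join(tid, &ret);     /* the thread exits normally, no cancel */

    close(sv[0]);
    close(sv[1]);
    return (long)(intptr_t)ret;
}
```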

> > > [snip]
> > > > +__attribute__ (( unused )) /* Until later in patch series */
> > > > +static int open_outgoing_return_path(MigrationState *ms)
> > > > +{
> > > > +
> > > > +    ms->return_path = qemu_file_get_return_path(ms->file);
> > > 
> > > So, another reason this get_return_path abstraction doesn't seem right
> > > to me, is that it's not obvious that for non-socket file types, the
> > > source and destination side "get return path" operations would
> > > necessarily be the same.
> > 
> > However, since get_return_path is a method
> > on the particular implementation, and it can be different for a
> > qemu_file opened for read or write, that non-socket file type
> > could implement it however it likes, including something like shutdown.
> 
> So, I'm a little less bothered by this since I realised that QemuFile
> is basically only used for migration streams, not for other file type
> operations.  The fact that that makes QemuFile a really bad name is a
> different matter.

Yes, but hey we've got FILE* in C anyway, so it might be bad, but it's
not inconsistent.

> The return path operation is quite specific to a migration stream, and
> doesn't really belong with a "file" abstraction.

I think the specific part, as you say, is that I don't know whether
I need it until later.

> The case I've been considering where it's not easy to see how to
> abstract this is that of a pipe - in that case it will be necessary to
> open a second pipe from destination to source, which probably needs
> some preliminary work when first opening the connection, and therefore
> can't easily be encapsulated into a "get return path" callback.

I'm OK with some transports not supporting this; I check for it and
error out.  At a higher level I do send an 'open_return_path' command
from src->dest early on to say I'm going to want a return path, I guess
a pipe might be able to open that fd then and pass it back over the original
fd? But that might be hairy.
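The "check for it and error out" approach can be sketched with an ops table in which the return-path hook is optional. This assumes a simplified, hypothetical TransportOps rather than QEMU's real QEMUFile operations. A socket transport can reuse its bidirectional connection (here through dup()), while a one-way pipe transport leaves the hook NULL and the caller reports an error.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical per-transport operations; get_return_path is optional. */
typedef struct TransportOps {
    const char *name;
    int (*get_return_path)(int fd);   /* NULL: no return path possible */
} TransportOps;

/* A socket is bidirectional, so the return path can reuse the same
 * connection through a second descriptor. */
static int socket_get_return_path(int fd)
{
    return dup(fd);
}

static const TransportOps socket_ops = { "socket", socket_get_return_path };
static const TransportOps pipe_ops   = { "pipe",   NULL };

/* Returns a new fd for the return path, or -1 with errno set. */
static int open_return_path_fd(const TransportOps *ops, int fd)
{
    if (!ops->get_return_path) {
        fprintf(stderr, "%s transport has no return path\n", ops->name);
        errno = ENOTSUP;
        return -1;
    }
    return ops->get_return_path(fd);
}
```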

> The abstraction of the shutdown is another question again - I can't
> think of any other file type which has an operation similar in effect
> to shutdown(), so it seems really socket specific. Which is another
> reason I'm not convinced telling the rp thread to die via its stream
> is a good idea.

I'd be OK with setting some flag or similar at the same time if that
would help; but I don't think there's a safe posix'y way of killing a
thread that might be stuck in a recv()/read() other than shutdown().

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v4 12/47] Handle bi-directional communication for fd migration
  2014-11-18  3:53       ` David Gibson
@ 2014-11-19 17:27         ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-11-19 17:27 UTC (permalink / raw)
  To: David Gibson
  Cc: Andrea Arcangeli, yamahata, quintela, Cristian Klein, qemu-devel,
	amit.shah, yanghy

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Mon, Nov 03, 2014 at 03:53:03PM +0200, Cristian Klein wrote:
> > On 03 Nov 2014, at 5:12 , David Gibson <david@gibson.dropbear.id.au> wrote:
> > 
> > > On Fri, Oct 03, 2014 at 06:47:18PM +0100, Dr. David Alan Gilbert (git) wrote:
> > >> From: Cristian Klein <cristian.klein@cs.umu.se>
> > > 
> > > This patch really, really requires a rationale in the commit message.
> > > The reason it's necessary is certainly not obvious.
> > 
> > "
> > libvirt prefers opening the TCP connection itself, for two reasons. First, connection failures can be detected more easily, without having to parse qemu's error output. Second, libvirt might be asked to secure the transfer by tunnelling the communication through a TLS layer. Therefore, libvirt opens the TCP connection itself and passes an FD to qemu using QMP and a POSIX-specific mechanism. Hence, in order to make the reverse-path work in such cases, qemu needs to distinguish whether the transmitted FD is a socket (reverse-path available) or not (reverse-path might not be available) and use the corresponding abstraction.
> > "
> > 
> > If the above message clarifies the purpose of this commit, feel
> > free to add it in the next version of the patch.
> 
> That would help, yes.

I've added that text into the commit message.
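The check the commit message describes (is the FD handed over by libvirt a socket?) can be made with fstat() and S_ISSOCK(). A minimal POSIX sketch; fd_has_reverse_path() is a hypothetical helper, not the function from the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <unistd.h>

/* True if fd refers to a socket, i.e. the same descriptor can also be
 * read on the source side to receive return-path messages. */
static bool fd_has_reverse_path(int fd)
{
    struct stat st;

    if (fstat(fd, &st) < 0) {
        return false;
    }
    return S_ISSOCK(st.st_mode);
}
```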

Dave

> 
> -- 
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v4 18/47] ram_debug_dump_bitmap: Dump a migration bitmap as text
  2014-11-03  3:58   ` David Gibson
@ 2014-11-19 17:35     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-11-19 17:35 UTC (permalink / raw)
  To: David Gibson
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Fri, Oct 03, 2014 at 06:47:24PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Misses out lines that are all the expected value so the output
> > can be quite compact depending on the circumstance.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  arch_init.c                   | 39 +++++++++++++++++++++++++++++++++++++++
> >  include/migration/migration.h |  1 +
> >  2 files changed, 40 insertions(+)
> > 
> > diff --git a/arch_init.c b/arch_init.c
> > index 772de36..6970733 100644
> > --- a/arch_init.c
> > +++ b/arch_init.c
> > @@ -769,6 +769,45 @@ static void reset_ram_globals(void)
> >  
> >  #define MAX_WAIT 50 /* ms, half buffered_file limit */
> >  
> > +/*
> > + * 'expected' is the value you expect the bitmap mostly to be full
> > + * of and it won't bother printing lines that are all this value
> > + * if 'todump' is null the migration bitmap is dumped.
> > + */
> > +void ram_debug_dump_bitmap(unsigned long *todump, bool expected)
> > +{
> > +    int64_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
> > +
> > +    int64_t cur;
> > +    int64_t linelen = 128l;
> 
> I don't think there's any point to the 'l' there.  "long" isn't
> necessarily correct for an int64_t, and normal type promotion should
> get this right anyway.

Fixed.

> Assuming the user has a >128 character wide terminal seems a little
> obnoxious, too.

This is a debug routine primarily to help me, but it seems right to leave
it in for the next person who has to debug it; it's easy enough for them
to tweak it to whatever their preference is.

> > +    char linebuf[129];
> > +
> > +    if (!todump) {
> > +        todump = migration_bitmap;
> > +    }
> > +
> > +    for (cur = 0; cur < ram_pages; cur += linelen) {
> > +        int64_t curb;
> > +        bool found = false;
> > +        /*
> > +         * Last line; catch the case where the line length
> > +         * is longer than remaining ram
> > +         */
> > +        if (cur+linelen > ram_pages) {
> > +            linelen = ram_pages - cur;
> > +        }
> > +        for (curb = 0; curb < linelen; curb++) {
> > +            bool thisbit = test_bit(cur+curb, todump);
> > +            linebuf[curb] = thisbit ? '1' : '.';
> > +            found |= (thisbit ^ expected);
> 
> I guess this will have the right result with the obvious encoding of a
> bool, but I don't think it's conceptually correct.  It should be
> logical, not bitwise operations so:
> 	found = found || (thisbit != expected);

Fixed.
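With both review fixes applied (a plain 128 for linelen, and a logical rather than bitwise test), the dump loop might look roughly like the sketch below. It is a self-contained approximation, not the updated patch. It takes the bitmap and page count as parameters instead of using migration_bitmap and last_ram_offset(), uses a hypothetical test_bit_ul() in place of QEMU's test_bit(), NUL-terminates the line buffer before printing, and returns the number of lines printed. The output format is an assumption.

```c
#include <assert.h>
#include <inttypes.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define BITS_PER_UL (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for QEMU's test_bit(). */
static bool test_bit_ul(int64_t nr, const unsigned long *map)
{
    return (map[nr / BITS_PER_UL] >> (nr % BITS_PER_UL)) & 1;
}

/*
 * Print each 128-page line of 'map' that contains at least one bit
 * differing from 'expected'; lines that are all 'expected' are skipped.
 * Returns the number of lines printed.
 */
static int dump_bitmap(FILE *out, const unsigned long *map,
                       int64_t ram_pages, bool expected)
{
    int64_t linelen = 128;
    char linebuf[128 + 1];
    int printed = 0;

    for (int64_t cur = 0; cur < ram_pages; cur += linelen) {
        bool found = false;

        /* The last line may be shorter than linelen */
        if (cur + linelen > ram_pages) {
            linelen = ram_pages - cur;
        }
        for (int64_t curb = 0; curb < linelen; curb++) {
            bool thisbit = test_bit_ul(cur + curb, map);
            linebuf[curb] = thisbit ? '1' : '.';
            found = found || (thisbit != expected);
        }
        if (found) {
            linebuf[linelen] = '\0';
            fprintf(out, "0x%08" PRIx64 " : %s\n", (uint64_t)cur, linebuf);
            printed++;
        }
    }
    return printed;
}
```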

Dave

> 
> -- 
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v4 19/47] Rework loadvm path for subloops
  2014-11-03  5:08   ` David Gibson
@ 2014-11-19 17:50     ` Dr. David Alan Gilbert
  2014-11-21  6:53       ` David Gibson
  0 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-11-19 17:50 UTC (permalink / raw)
  To: David Gibson
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Fri, Oct 03, 2014 at 06:47:25PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Postcopy needs to have two migration streams loading concurrently;
> > one from memory (with the device state) and the other from the fd
> > with the memory transactions.
> > 
> > Split the core of qemu_loadvm_state out so we can use it for both.
> > 
> > Allow the inner loadvm loop to quit and signal whether the parent
> > should.
> > 
> > loadvm_handlers is made static since its lifetime is greater
> > than the outer qemu_loadvm_state.
> 
> Maybe it's just me, but "made static" to me indicates either a change
> from fully-global to module-global, or (function) local automatic to
> local static, not a change from function local-automatic to
> module-global as here.
> 
> It's also not clear from this patch alone why the lifetime of
> loadvm_handlers now needs to exceed that of qemu_loadvm_state().

OK, how about if I reworked that last sentence to be:

   loadvm_handlers is made module-global to survive beyond the lifetime
   of the outer qemu_loadvm_state since it may still be in use by
   a subloop in the postcopy listen thread.

Dave

> 
> -- 
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v4 24/47] Allow savevm handlers to state whether they could go into postcopy
  2014-11-04  1:33   ` David Gibson
@ 2014-11-19 17:53     ` Dr. David Alan Gilbert
  2014-11-21  6:58       ` David Gibson
  0 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-11-19 17:53 UTC (permalink / raw)
  To: David Gibson
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Fri, Oct 03, 2014 at 06:47:30PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Use that to split the qemu_savevm_state_pending counts into postcopiable
> > and non-postcopiable amounts
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  arch_init.c                 |  7 +++++++
> >  include/migration/vmstate.h |  2 +-
> >  include/sysemu/sysemu.h     |  4 +++-
> >  migration.c                 |  9 ++++++++-
> >  savevm.c                    | 23 +++++++++++++++++++----
> >  5 files changed, 38 insertions(+), 7 deletions(-)
> > 
> > diff --git a/arch_init.c b/arch_init.c
> > index 6970733..44072d8 100644
> > --- a/arch_init.c
> > +++ b/arch_init.c
> > @@ -1192,6 +1192,12 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> >      return ret;
> >  }
> >  
> > +/* RAM's always up for postcopying */
> > +static bool ram_can_postcopy(void *opaque)
> > +{
> > +    return true;
> > +}
> > +
> >  static SaveVMHandlers savevm_ram_handlers = {
> >      .save_live_setup = ram_save_setup,
> >      .save_live_iterate = ram_save_iterate,
> > @@ -1199,6 +1205,7 @@ static SaveVMHandlers savevm_ram_handlers = {
> >      .save_live_pending = ram_save_pending,
> >      .load_state = ram_load,
> >      .cancel = ram_migration_cancel,
> > +    .can_postcopy = ram_can_postcopy,
> 
> Is there actually any plausible device for which you'd need a callback
> here, rather than just having a static bool?
> 
> On the other hand, it does seem kind of plausible that there might be
> situations in which some data from a device must be pre-copied, but
> more can be post-copied, which would necessitate extending the
> per-handler callback to return quantities for both.

It's cheap enough and I couldn't make a strong argument about
any possible device, so I just used the function.
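The split the commit message describes (summing each handler's pending bytes into postcopiable and non-postcopiable totals according to its can_postcopy callback) might look roughly like this. Handler, split_pending() and the field names are hypothetical simplifications; the real SaveVMHandlers and qemu_savevm_state_pending() signatures differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, much-simplified handler: a pending-bytes count plus an
 * optional can_postcopy() callback as in the patch. */
typedef struct Handler {
    uint64_t pending;                    /* bytes still to send */
    bool (*can_postcopy)(void *opaque);  /* NULL: must be precopied */
    void *opaque;
} Handler;

static bool always_postcopy(void *opaque)
{
    (void)opaque;
    return true;                         /* like ram_can_postcopy() */
}

static void split_pending(const Handler *h, size_t n,
                          uint64_t *res_non_postcopiable,
                          uint64_t *res_postcopiable)
{
    *res_non_postcopiable = 0;
    *res_postcopiable = 0;
    for (size_t i = 0; i < n; i++) {
        if (h[i].can_postcopy && h[i].can_postcopy(h[i].opaque)) {
            *res_postcopiable += h[i].pending;
        } else {
            *res_non_postcopiable += h[i].pending;
        }
    }
}
```

A static bool would serve RAM equally well; the callback only pays off for a device whose answer depends on its current state, which is the trade-off discussed above.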

Dave

> -- 
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v4 29/47] Postcopy page-map-incoming (PMI) structure
  2014-11-04  3:09   ` David Gibson
@ 2014-11-19 18:46     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-11-19 18:46 UTC (permalink / raw)
  To: David Gibson
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Fri, Oct 03, 2014 at 06:47:35PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > The PMI holds the state of each page on the incoming side,
> > so that we can tell if the page is missing, already received
> > or there is a request outstanding for it.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> 
> Though there are a couple of minor comments below:

<snip>

> > +/* ---------------------------------------------------------------------- */
> > +/* Postcopy pagemap-inbound (pmi) - data structures that record the       */
> > +/* state of each page used by the inbound postcopy                        */
> > +/* It's a pair of bitmaps (of the same structure as the migration bitmaps)*/
> > +/* holding one bit per target-page, although all operations work on host  */
> > +/* pages.                                                                 */
> > +__attribute__ (( unused )) /* Until later in patch series */
> > +static void postcopy_pmi_init(MigrationIncomingState *mis, size_t ram_pages)
> > +{
> > +    unsigned int tpb = qemu_target_page_bits();
> > +    unsigned long host_bits;
> > +
> > +    qemu_mutex_init(&mis->postcopy_pmi.mutex);
> > +    mis->postcopy_pmi.received_map = bitmap_new(ram_pages);
> > +    mis->postcopy_pmi.requested_map = bitmap_new(ram_pages);
> > +    bitmap_clear(mis->postcopy_pmi.received_map, 0, ram_pages);
> > +    bitmap_clear(mis->postcopy_pmi.requested_map, 0, ram_pages);
> > +    /*
> > +     * Each bit in the map represents one 'target page' which is no bigger
> > +     * than a host page but can be smaller.  It's useful to have some
> > +     * convenience masks for later
> 
> So, there's no inherent reason a target page couldn't be bigger than a
> host page.  It's fair enough not to handle that case for now, but
> something somewhere should probably verify that it's not the case.

I've added a guard for this in the host test.
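The host_bits/host_mask arithmetic, with a guard for a target page larger than a host page, can be checked in isolation. For example, a 64KiB host page with 4KiB target pages (target page bits = 12) gives host_bits = 65536 / 4096 = 16 and host_mask = 0xffff. A sketch with hypothetical names. Note that the divisibility assert in the patch still admits host_bits equal to the width of long, where (1ul << host_bits) is undefined behaviour in C, so this version special-cases it.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define LONG_BITS (sizeof(long) * CHAR_BIT)

/*
 * Compute how many target-page bits make up one host page, and a mask of
 * that many low-order bits.  Returns false for layouts this sketch does
 * not handle: target page larger than the host page, a non-power-of-2
 * ratio, or a ratio that doesn't divide the number of bits in a long.
 */
static bool pmi_host_geometry(unsigned long host_page_size,
                              unsigned int target_page_bits,
                              unsigned long *host_bits,
                              unsigned long *host_mask)
{
    unsigned long target_page_size = 1ul << target_page_bits;
    unsigned long bits;

    if (target_page_size > host_page_size) {
        return false;                 /* the case raised in review */
    }
    bits = host_page_size / target_page_size;
    if (bits & (bits - 1)) {
        return false;                 /* not a power of 2 */
    }
    if (LONG_BITS % bits != 0) {
        return false;
    }
    *host_bits = bits;
    /* Avoid shifting by the full width of unsigned long (undefined) */
    *host_mask = (bits == LONG_BITS) ? ~0ul : (1ul << bits) - 1;
    return true;
}
```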

> > +     */
> > +
> > +    /*
> > +     * The number of bits one host page takes up in the bitmap
> > +     * e.g. on a 64k host page, 4k Target page, host_bits=64/4=16
> > +     */
> > +    host_bits = sysconf(_SC_PAGESIZE) / (1ul << tpb);
> > +    /* Should be a power of 2 */
> > +    assert(host_bits && !(host_bits & (host_bits - 1)));
> > +    /*
> > +     * If the host_bits isn't a division of the number of bits in long
> > +     * then the code gets a lot more complex; disallow for now
> > +     * (I'm not aware of a system where it's true anyway)
> > +     */
> > +    assert(((sizeof(long) * 8) % host_bits) == 0);
> > +
> > +    mis->postcopy_pmi.host_bits = host_bits;
> > +    /* A mask, starting at bit 0, containing host_bits continuous set bits */
> > +    mis->postcopy_pmi.host_mask =  (1ul << host_bits) - 1;
> > +
> > +    assert((ram_pages % host_bits) == 0);
> > +}
> > +
> > +void postcopy_pmi_destroy(MigrationIncomingState *mis)
> > +{
> > +    if (mis->postcopy_pmi.received_map) {
> > +        g_free(mis->postcopy_pmi.received_map);
> 
> g_free() is safe to call on NULL anyway, isn't it?

It is; fixed.

> > +/*
> > + * Retrieve the state of the given page
> > + * Note: This version for use by callers already holding the lock
> > + */
> > +static PostcopyPMIState postcopy_pmi_get_state_nolock(
> > +                            MigrationIncomingState *mis,
> > +                            size_t bitmap_index)
> > +{
> > +    bool received, requested;
> > +
> > +    received = test_hpbits(mis, bitmap_index, mis->postcopy_pmi.received_map);
> > +    requested = test_hpbits(mis, bitmap_index, mis->postcopy_pmi.requested_map);
> > +
> > +    if (received) {
> > +        assert(!requested);
> 
> Clearing the requested bit when you set the received bit seems a bit
> pointless.  (requested && received) isn't meaningfully different from
> (!requested && received) but there seems no reason to go to extra
> trouble to avoid that state, and having the record might be
> interesting for gathering statistics.

Hmm yes I think you're right; but I want to think about it to convince myself a
bit more; this code originally started off as two really separate bitmaps
and has slowly morphed into really representing 3 states.  I've added it
to a TODO.
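How two bits can encode the three per-page states, with received taking precedence as suggested above, so a requested bit left set after arrival is harmless (and could be kept for statistics). The enum and function names are hypothetical, not the patch's PostcopyPMIState values.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    PAGE_MISSING,     /* neither bit set */
    PAGE_REQUESTED,   /* requested, not yet arrived */
    PAGE_RECEIVED,    /* arrived; the requested bit is ignored */
} PageState;

static PageState page_state(bool received, bool requested)
{
    if (received) {
        return PAGE_RECEIVED;   /* (received && requested) is the same */
    }
    return requested ? PAGE_REQUESTED : PAGE_MISSING;
}
```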

> > +/* Called by ram_load prior to mapping the page */
> > +void postcopy_hook_early_receive(MigrationIncomingState *mis,
> > +                                 size_t bitmap_index)
> > +{
> > +    if (mis->postcopy_ram_state == POSTCOPY_RAM_INCOMING_ADVISE) {
> 
> A silent no-op if you're not in the expected migration phase doesn't
> seem right.  Should this be an assert() instead?

No.  This routine is called by the RAM code prior to doing anything with
the page, but it does it in all postcopy states; it's just that we only
care about it in the ADVISE state (it makes things a little cleaner - 
the RAM code no longer has to know about the postcopy stages).

Dave

> > +        /*
> > +         * If we're in precopy-advise mode we need to track received pages even
> > +         * though we don't need to place pages atomically yet.
> > +         * In advise mode there's only a single thread, so don't need locks
> > +         */
> > +        set_bit(bitmap_index, mis->postcopy_pmi.received_map);
> > +    }
> > +}
> > +
> >  int postcopy_ram_hosttest(void)
> >  {
> >      /* TODO: Needs guarding with CONFIG_ once we have libc's that have the defs
> > @@ -156,5 +369,12 @@ int postcopy_ram_hosttest(void)
> >      return -1;
> >  }
> >  
> > +/* Called by ram_load prior to mapping the page */
> > +void postcopy_hook_early_receive(MigrationIncomingState *mis,
> > +                                 size_t bitmap_index)
> > +{
> > +    /* We don't support postcopy so don't care */
> > +}
> > +
> >  #endif
> >  
> 
> -- 
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v4 34/47] Postcopy: Create a fault handler thread before marking the ram as userfault
  2014-11-10  6:10   ` David Gibson
@ 2014-11-19 18:56     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-11-19 18:56 UTC (permalink / raw)
  To: David Gibson
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Fri, Oct 03, 2014 at 06:47:40PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> This could do with a bit more rationale in the commit message.
> 
> Also is there a reason not to fold this with the patch originally
> marking the RAM as userfault?  IIRC that one wasn't particularly long
> either.

I've merged it with 'ram_enable_notify to switch on userfault' and
its commit message now reads:


 'Mark the area of RAM as 'userfault'
 Start up a fault-thread to handle any userfaults we might receive
 from it (to be filled in later)'

Dave

> 
> Otherwise
> 
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> 
> -- 
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v4 32/47] postcopy: ram_enable_notify to switch on userfault
  2014-11-05  6:49   ` David Gibson
@ 2014-11-19 18:59     ` Dr. David Alan Gilbert
  2014-11-19 21:17       ` David Gibson
  0 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-11-19 18:59 UTC (permalink / raw)
  To: David Gibson
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Fri, Oct 03, 2014 at 06:47:38PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  include/migration/migration.h    |  2 ++
> >  include/migration/postcopy-ram.h |  6 +++++
> >  postcopy-ram.c                   | 49 +++++++++++++++++++++++++++++++++++++++-
> >  savevm.c                         |  9 ++++++++
> >  4 files changed, 65 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/migration/migration.h b/include/migration/migration.h
> > index be63c89..b01cc17 100644
> > --- a/include/migration/migration.h
> > +++ b/include/migration/migration.h
> > @@ -87,6 +87,8 @@ struct MigrationIncomingState {
> >          POSTCOPY_RAM_INCOMING_END
> >      } postcopy_ram_state;
> >  
> > +    /* For the kernel to send us notifications */
> > +    int            userfault_fd;
> >      QEMUFile *return_path;
> >      QemuMutex      rp_mutex;    /* We send replies from multiple threads */
> >      PostcopyPMI    postcopy_pmi;
> > diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
> > index 8f237a2..413b670 100644
> > --- a/include/migration/postcopy-ram.h
> > +++ b/include/migration/postcopy-ram.h
> > @@ -19,6 +19,12 @@
> >  int postcopy_ram_hosttest(void);
> >  
> >  /*
> > + * Make all of RAM sensitive to accesses to areas that haven't yet been written
> > + * and wire up anything necessary to deal with it.
> > + */
> > +int postcopy_ram_enable_notify(MigrationIncomingState *mis);
> > +
> > +/*
> >   * Initialise postcopy-ram, setting the RAM to a state where we can go into
> >   * postcopy later; must be called prior to any precopy.
> >   * called from arch_init's similarly named ram_postcopy_incoming_init
> > diff --git a/postcopy-ram.c b/postcopy-ram.c
> > index 8eccf26..925ac77 100644
> > --- a/postcopy-ram.c
> > +++ b/postcopy-ram.c
> > @@ -485,9 +485,51 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
> >      return 0;
> >  }
> >  
> > +/*
> > + * Mark the given area of RAM as requiring notification to unwritten areas
> > + * Used as a  callback on qemu_ram_foreach_block.
> > + *   host_addr: Base of area to mark
> > + *   offset: Offset in the whole ram arena
> > + *   length: Length of the section
> > + *   opaque: Unused
> 
>                 ^^^^^^
> This appears to be wrong - opaque is used to find the MIS.

Fixed.

> 
> > + * Returns 0 on success
> > + */
> > +static int postcopy_ram_sensitise_area(const char *block_name, void *host_addr,
> > +                                       ram_addr_t offset, ram_addr_t length,
> > +                                       void *opaque)
> > +{
> > +    MigrationIncomingState *mis = opaque;
> > +    uint64_t tokern[2];
> 
> "tokern"?

Now "to_kernel"
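The registration message written to userfault_fd in this version of the series is two 64-bit words: the start address with bit 0 set to mean "register", then start + length. That only works because host_addr is page-aligned, which leaves bit 0 free for the flag. A sketch of just the encoding; it follows the experimental kernel ABI posted with this series, not the ioctl-based userfaultfd interface that later went into mainline Linux, and uf_encode_register() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build the 16-byte "register this range" message: word 0 is the
 * aligned start with bit 0 set as the register flag, word 1 is
 * start + length. */
static void uf_encode_register(uint64_t to_kernel[2], void *host_addr,
                               size_t length)
{
    uintptr_t start = (uintptr_t)host_addr;

    assert((start & 1) == 0);          /* the flag bit must be free */
    to_kernel[0] = (uint64_t)start | 1;
    to_kernel[1] = (uint64_t)start + length;
}
```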

Dave

> 
> > +
> > +    if (madvise(host_addr, length, MADV_USERFAULT)) {
> > +        perror("postcopy_ram_sensitise_area madvise");
> > +        return -1;
> > +    }
> > +
> > +    /* Now tell our userfault_fd that it's responsible for this area */
> > +    tokern[0] = (uint64_t)(uintptr_t)host_addr | 1; /* 1 means register area */
> > +    tokern[1] = (uint64_t)(uintptr_t)host_addr + length;
> > +    if (write(mis->userfault_fd, tokern, 16) != 16) {
> > +        perror("postcopy_ram_sensitise_area write");
> > +        madvise(host_addr, length, MADV_NOUSERFAULT);
> > +        return -1;
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +int postcopy_ram_enable_notify(MigrationIncomingState *mis)
> > +{
> > +    /* Mark so that we get notified of accesses to unwritten areas */
> > +    if (qemu_ram_foreach_block(postcopy_ram_sensitise_area, mis)) {
> > +        return -1;
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> >  #else
> >  /* No target OS support, stubs just fail */
> > -
> >  int postcopy_ram_hosttest(void)
> >  {
> >      error_report("postcopy_ram_hosttest: No OS support");
> > @@ -528,6 +570,11 @@ int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
> >  {
> >      assert(0);
> >  }
> > +
> > +int postcopy_ram_enable_notify(MigrationIncomingState *mis)
> > +{
> > +    assert(0);
> > +}
> >  #endif
> >  
> >  /* ------------------------------------------------------------------------- */
> > diff --git a/savevm.c b/savevm.c
> > index 54bdb26..859c96f 100644
> > --- a/savevm.c
> > +++ b/savevm.c
> > @@ -1304,6 +1304,15 @@ static int loadvm_postcopy_ram_handle_listen(MigrationIncomingState *mis)
> >  
> >      mis->postcopy_ram_state = POSTCOPY_RAM_INCOMING_LISTENING;
> >  
> > +    /*
> > +     * Sensitise RAM - can now generate requests for blocks that don't exist
> > +     * However, at this point the CPU shouldn't be running, and the IO
> > +     * shouldn't be doing anything yet so don't actually expect requests
> > +     */
> > +    if (postcopy_ram_enable_notify(mis)) {
> > +        return -1;
> > +    }
> > +
> >      /* TODO start up the postcopy listening thread */
> >      return 0;
> >  }
> 
> -- 
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v4 36/47] Page request: Process incoming page request
  2014-11-18  4:38       ` David Gibson
@ 2014-11-19 19:37         ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-11-19 19:37 UTC (permalink / raw)
  To: David Gibson
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Mon, Nov 17, 2014 at 07:07:33PM +0000, Dr. David Alan Gilbert wrote:

> > > And maybe this one too - I would have expected the rb names to have
> > > already been validated on the source machine at this stage.
> > 
> > No to both:
> > I've been trying to avoid asserts in migration outgoing code, because
> > they shouldn't affect the state of your guest, so there's no reason
> > to kill off what might still be a viable running guest just because
> > migration failed.
> 
> Ah, ok, that makes sense.  Maybe adding something to the error message
> or a nearby comment indicating that if these happen it's certainly a
> bug, not the result of some external problem?

Done.

Dave

> 
> -- 
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v4 45/47] Start up a postcopy/listener thread ready for incoming page data
  2014-11-13  3:29   ` David Gibson
@ 2014-11-19 19:40     ` Dr. David Alan Gilbert
  2014-11-21  8:36       ` David Gibson
  0 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-11-19 19:40 UTC (permalink / raw)
  To: David Gibson
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Fri, Oct 03, 2014 at 06:47:51PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > The loading of a device state (during postcopy) may access guest
> > memory that's still on the source machine and thus might need
> > a page fill; split off a separate thread that handles the incoming
> > page data so that the original incoming migration code can finish
> > off the device data.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  include/migration/migration.h |  4 +++
> >  migration.c                   |  6 +++++
> >  savevm.c                      | 62 +++++++++++++++++++++++++++++++++++++++++--
> >  3 files changed, 70 insertions(+), 2 deletions(-)
> > 
> > diff --git a/include/migration/migration.h b/include/migration/migration.h
> > index 00255b8..69e776c 100644
> > --- a/include/migration/migration.h
> > +++ b/include/migration/migration.h
> > @@ -92,6 +92,10 @@ struct MigrationIncomingState {
> >      QemuThread     fault_thread;
> >      QemuSemaphore  fault_thread_sem;
> >  
> > +    bool           have_listen_thread;
> 
> AFAICT have_listen_thread is never set to a value other than 'true',
> so there doesn't seem much point to it.

It's tested by qemu_loadvm_state to avoid cleaning things up as it exits,
since the listen thread is still using them.

Dave

> 
> -- 
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v4 35/47] Page request: Add MIG_RPCOMM_REQPAGES reverse command
  2014-11-10  6:19   ` David Gibson
@ 2014-11-19 20:01     ` Dr. David Alan Gilbert
  2014-11-19 21:48       ` David Gibson
  0 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-11-19 20:01 UTC (permalink / raw)
  To: David Gibson
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Fri, Oct 03, 2014 at 06:47:41PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Add MIG_RPCOMM_REQPAGES command on Return path for the postcopy
> > destination to request a page from the source.

> > +    buf64[0] = (uint64_t)start;
> > +    buf64[0] = cpu_to_be64(buf64[0]);
> 
> I think this would be clearer as well as less verbose, as just:
> 	buf64[0] = cpu_to_be64(start);

I've gone with the halfway mark of:

buf64[0] = cpu_to_be64((uint64_t)start);

it just doesn't feel right passing something into a byteswap
unless you know the size.
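cpu_to_be64() is a QEMU helper. Outside QEMU the same wire layout (start then len, each a big-endian 64-bit value) can be produced portably by writing the bytes explicitly, which also settles the size question because the parameter is a uint64_t. A sketch; put_be64() and encode_reqpages() are hypothetical, and the optional RAMBlock id string is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Store v at p as 8 big-endian bytes, independent of host byte order. */
static void put_be64(uint8_t *p, uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        p[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

/* 16-byte request body: start, then len. */
static void encode_reqpages(uint8_t buf[16], uint64_t start, uint64_t len)
{
    put_be64(buf, start);
    put_be64(buf + 8, len);
}
```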

> > +    buf64[1] = (uint64_t)len;
> > +    buf64[1] = cpu_to_be64(buf64[1]);
> > +    migrate_send_rp_message(mis, MIG_RPCOMM_REQPAGES, msglen, bufc);
> > +}
> > +
> >  void qemu_start_incoming_migration(const char *uri, Error **errp)
> >  {
> >      const char *p;
> > @@ -784,6 +816,17 @@ static void source_return_path_bad(MigrationState *s)
> >  }
> >  
> >  /*
> > + * Process a request for pages received on the return path,
> > + * We're allowed to send more than requested (e.g. to round to our page size)
> > + * and we don't need to send pages that have already been sent.
> > + */
> > +static void migrate_handle_rp_reqpages(MigrationState *ms, const char* rbname,
> > +                                       ram_addr_t start, ram_addr_t len)
> > +{
> > +    DPRINTF("migrate_handle_rp_reqpages: at %zx for len %zx", start, len);
> > +}
> > +
> > +/*
> >   * Handles messages sent on the return path towards the source VM
> >   *
> >   */
> > @@ -795,6 +838,8 @@ static void *source_return_path_thread(void *opaque)
> >      const int max_len = 512;
> >      uint8_t buf[max_len];
> >      uint32_t tmp32;
> > +    uint64_t tmp64a, tmp64b;
> 
> Hrm.. calling everything "tmp*" doesn't help readability.

True; most of the rest of those tmps are shared by multiple commands and are
just read off the wire and used immediately.
tmp64a/b are now called start/len.

Dave
> 
> > +    char *tmpstr;
> >      int res;
> >  
> >      DPRINTF("RP: %s entry", __func__);
> > @@ -810,6 +855,11 @@ static void *source_return_path_thread(void *opaque)
> >              expected_len = 4;
> >              break;
> >  
> > +        case MIG_RPCOMM_REQPAGES:
> > +            /* 16 byte start/len _possibly_ plus an id str */
> > +            expected_len = 16 + 256;
> > +            break;
> > +
> >          default:
> >              error_report("RP: Received invalid cmd 0x%04x length 0x%04x",
> >                      header_com, header_len);
> > @@ -857,6 +907,30 @@ static void *source_return_path_thread(void *opaque)
> >              atomic_xchg(&ms->rp_state.latest_ack, tmp32);
> >              break;
> >  
> > +        case MIG_RPCOMM_REQPAGES:
> > +            tmp64a = be64_to_cpup((uint64_t *)buf);  /* Start */
> > +            tmp64b = be64_to_cpup(((uint64_t *)buf)+1); /* Len */
> > +            tmpstr = NULL;
> > +            if (tmp64b & 1) {
> > +                tmp64b -= 1; /* Remove the flag */
> > +                /* Now we expect an idstr */
> > +                tmp32 = buf[16]; /* Length of the following idstr */
> > +                tmpstr = (char *)&buf[17];
> > +                buf[17+tmp32] = '\0';
> > +                expected_len = 16+1+tmp32;
> > +            } else {
> > +                expected_len = 16;
> > +            }
> > +            if (header_len != expected_len) {
> > +                error_report("RP: Received ReqPage with length %d expecting %d",
> > +                        header_len, expected_len);
> > +                source_return_path_bad(ms);
> > +            }
> > +            migrate_handle_rp_reqpages(ms, tmpstr,
> > +                                          (ram_addr_t)tmp64a,
> > +                                          (ram_addr_t)tmp64b);
> > +            break;
> > +
> >          default:
> >              /* This shouldn't happen because we should catch this above */
> >              DPRINTF("RP: Bad header_com in dispatch");

* Re: [Qemu-devel] [PATCH v4 16/47] Return path: Source handling of return path
  2014-11-19 17:06         ` Dr. David Alan Gilbert
@ 2014-11-19 21:12           ` David Gibson
  0 siblings, 0 replies; 204+ messages in thread
From: David Gibson @ 2014-11-19 21:12 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy


On Wed, Nov 19, 2014 at 05:06:50PM +0000, Dr. David Alan Gilbert wrote:
> * David Gibson (david@gibson.dropbear.id.au) wrote:
> > On Mon, Nov 03, 2014 at 01:22:45PM +0000, Dr. David Alan Gilbert wrote:
> > > * David Gibson (david@gibson.dropbear.id.au) wrote:
> > > > On Fri, Oct 03, 2014 at 06:47:22PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > > 
> > > > > Open a return path, and handle messages that are received upon it.
> > > > > 
> > > > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > > 
> > > > [snip]
> > > > > @@ -414,6 +448,11 @@ static void migrate_fd_cancel(MigrationState *s)
> > > > >      int old_state ;
> > > > >      trace_migrate_fd_cancel();
> > > > >  
> > > > > +    if (s->return_path) {
> > > > > +        /* shutdown the rp socket, so causing the rp thread to shutdown */
> > > > > +        qemu_file_shutdown(s->return_path);
> > > > 
> > > > Terminating the rp thread via shutting down its file seems roundabout,
> > > > and kind of dependent on the socket file implementation.
> > > 
> > > The rp thread might be in the middle of a blocking read()/recv()
> > > so I'm doing a shutdown() to cause those to exit; once I have to do that
> > > anyway it didn't seem necessary to add anything extra.
> > 
> > Hm.  I don't recall, does the rp thread need to do some cleanup at
> > this point?  Otherwise pthread_cancel() should kill a thread, even if
> > it's blocked at the moment.
> 
> It was Paolo's idea to use shutdown() and I agree - it works well;
> I'd originally thought about using pthread_cancel but it seemed to be
> generally disliked - you have to be very careful to either know exactly
> the points at which it might be killed (if you use the deferred version)
> or be prepared to deal with your thread disappearing at any time and
> ensure your data structures are always consistent.  In addition there
> was some concern that there was no Windows equivalent to pthread_cancel.

Hmm, yeah all right.
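A small sketch of the shutdown() approach, using a socketpair as a stand-in for the migration socket (an assumption; all names are hypothetical). A reader thread blocks in recv(), and shutdown() from another thread makes that recv() return 0 (end of stream) on Linux, so the thread leaves through its normal code path instead of being cancelled.

```c
#include <pthread.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* State shared with the reader thread (hypothetical) */
struct reader {
    int fd;
    ssize_t ret;   /* what recv() returned in the reader */
};

static void *reader_thread(void *opaque)
{
    struct reader *r = opaque;
    char c;

    r->ret = recv(r->fd, &c, 1, 0);   /* blocks: the peer never writes */
    return NULL;
}

/*
 * Starts a reader blocked in recv() on one end of a socketpair, unblocks it
 * with shutdown() from this thread, and joins it. Returns the reader's
 * recv() result (0 expected), or -2 if setup failed. If the reader has not
 * reached recv() yet, it simply sees end-of-stream straight away.
 */
static ssize_t shutdown_unblocks_reader(void)
{
    int sv[2];
    pthread_t tid;
    struct reader r;
    struct timespec delay = { 0, 100 * 1000 * 1000 };   /* 100 ms */

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -2;
    }
    r.fd = sv[0];
    r.ret = -2;
    if (pthread_create(&tid, NULL, reader_thread, &r) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -2;
    }
    nanosleep(&delay, NULL);        /* give the reader time to block */
    shutdown(sv[0], SHUT_RDWR);     /* the blocked recv() now returns 0 */
    pthread_join(tid, NULL);
    close(sv[0]);
    close(sv[1]);
    return r.ret;
}
```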

> > > > [snip]
> > > > > +__attribute__ (( unused )) /* Until later in patch series */
> > > > > +static int open_outgoing_return_path(MigrationState *ms)
> > > > > +{
> > > > > +
> > > > > +    ms->return_path = qemu_file_get_return_path(ms->file);
> > > > 
> > > > So, another reason this get_return_path abstraction doesn't seem right
> > > > to me, is that it's not obvious that for non-socket file types, the
> > > > source and destination side "get return path" operations would
> > > > necessarily be the same.
> > > 
> > > However, get_return_path is a method of the particular implementation,
> > > and it can differ between a QEMUFile opened for reading and one opened
> > > for writing, so a non-socket file type could implement it however it
> > > likes (including something like shutdown).
> > 
> > So, I'm a little less bothered by this since I realised that QemuFile
> > is basically only used for migration streams, not for other file type
> > operations.  The fact that that makes QemuFile a really bad name is a
> > different matter.
> 
> Yes, but hey, we've got FILE* in C anyway, so it might be a bad name, but it's
> not inconsistent.

Uh.. not following the analogy here, sorry.

> > The return path operation is quite specific to a migration stream, and
> > doesn't really belong with a "file" abstraction.
> 
> I think the specific bit is, as you say, that I don't know whether
> I need it until later.
> 
> > The case I've been considering where it's not easy to see how to
> > abstract this is that of a pipe - in that case it will be necessary to
> > open a second pipe from destination to source, which probably needs
> > some preliminary work when first opening the connection, and therefore
> > can't easily be encapsulated into a "get return path" callback.
> 
> I'm OK with some transports not supporting this; I check for it and
> error out.  At a higher level I do send an 'open_return_path' command
> from src->dest early on to say I'm going to want a return path.  I guess
> a pipe might be able to open that fd then and pass it back over the original
> fd?  But that might be hairy.
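A sketch of a transport that may or may not provide a return path, with a hypothetical StreamOps table loosely modelled on QEMUFile's per-implementation methods (not QEMU's actual ops structure): the generic wrapper checks for a missing method and errors out.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct Stream Stream;

/* Hypothetical per-transport method table */
typedef struct StreamOps {
    /* Optional: returns a stream for the reverse direction, or NULL */
    Stream *(*get_return_path)(Stream *s);
} StreamOps;

struct Stream {
    const StreamOps *ops;
    int fd;
};

/* Generic wrapper: errors out if this transport has no return path */
static Stream *stream_get_return_path(Stream *s)
{
    if (!s->ops->get_return_path) {
        fprintf(stderr, "transport does not support a return path\n");
        return NULL;
    }
    return s->ops->get_return_path(s);
}

/* Socket-like transport: sockets are bidirectional, so reuse the fd */
static Stream *socket_get_return_path(Stream *s)
{
    Stream *rp = malloc(sizeof(*rp));

    if (rp) {
        rp->ops = s->ops;
        rp->fd = s->fd;
    }
    return rp;
}

static const StreamOps socket_ops = { .get_return_path = socket_get_return_path };
/* Pipe-like transport: one-way only, so no method */
static const StreamOps pipe_ops = { .get_return_path = NULL };
```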

You can pass fds over Unix sockets, but not over pipes AFAIK.
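For reference, descriptor passing over an AF_UNIX socket uses SCM_RIGHTS ancillary data with sendmsg()/recvmsg(); a pipe has no equivalent mechanism. A minimal sketch (send_fd() and recv_fd() are hypothetical helper names):

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one file descriptor over a connected AF_UNIX socket; 0 on success */
static int send_fd(int sock, int fd)
{
    char byte = 0;   /* at least one byte of real data must accompany it */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;                /* forces correct alignment */
        char buf[CMSG_SPACE(sizeof(int))];
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns the new fd, or -1 */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg && cmsg->cmsg_level == SOL_SOCKET &&
        cmsg->cmsg_type == SCM_RIGHTS &&
        cmsg->cmsg_len == CMSG_LEN(sizeof(int))) {
        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    }
    return fd;
}
```

Sending the write end of a pipe() through a socketpair() and writing to the received descriptor shows the data arriving at the original pipe's read end.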

> > The abstraction of the shutdown is another question again - I can't
> > think of any other file type which has an operation similar in effect
> > to shutdown(), so it seems really socket specific. Which is another
> > reason I'm not convinced telling the rp thread to die via its stream
> > is a good idea.
> 
> I'd be OK with setting some flag or similar at the same time if that
> would help; but I don't think there's a safe POSIX way of killing a
> thread that might be stuck in a recv()/read() other than shutdown().
> 
> Dave
> 


* Re: [Qemu-devel] [PATCH v4 32/47] postcopy: ram_enable_notify to switch on userfault
  2014-11-19 18:59     ` Dr. David Alan Gilbert
@ 2014-11-19 21:17       ` David Gibson
  0 siblings, 0 replies; 204+ messages in thread
From: David Gibson @ 2014-11-19 21:17 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy


On Wed, Nov 19, 2014 at 06:59:38PM +0000, Dr. David Alan Gilbert wrote:
> * David Gibson (david@gibson.dropbear.id.au) wrote:
> > On Fri, Oct 03, 2014 at 06:47:38PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > 
> > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > ---
> > >  include/migration/migration.h    |  2 ++
> > >  include/migration/postcopy-ram.h |  6 +++++
> > >  postcopy-ram.c                   | 49 +++++++++++++++++++++++++++++++++++++++-
> > >  savevm.c                         |  9 ++++++++
> > >  4 files changed, 65 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/include/migration/migration.h b/include/migration/migration.h
> > > index be63c89..b01cc17 100644
> > > --- a/include/migration/migration.h
> > > +++ b/include/migration/migration.h
> > > @@ -87,6 +87,8 @@ struct MigrationIncomingState {
> > >          POSTCOPY_RAM_INCOMING_END
> > >      } postcopy_ram_state;
> > >  
> > > +    /* For the kernel to send us notifications */
> > > +    int            userfault_fd;
> > >      QEMUFile *return_path;
> > >      QemuMutex      rp_mutex;    /* We send replies from multiple threads */
> > >      PostcopyPMI    postcopy_pmi;
> > > diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
> > > index 8f237a2..413b670 100644
> > > --- a/include/migration/postcopy-ram.h
> > > +++ b/include/migration/postcopy-ram.h
> > > @@ -19,6 +19,12 @@
> > >  int postcopy_ram_hosttest(void);
> > >  
> > >  /*
> > > + * Make all of RAM sensitive to accesses to areas that haven't yet been written
> > > + * and wire up anything necessary to deal with it.
> > > + */
> > > +int postcopy_ram_enable_notify(MigrationIncomingState *mis);
> > > +
> > > +/*
> > >   * Initialise postcopy-ram, setting the RAM to a state where we can go into
> > >   * postcopy later; must be called prior to any precopy.
> > >   * called from arch_init's similarly named ram_postcopy_incoming_init
> > > diff --git a/postcopy-ram.c b/postcopy-ram.c
> > > index 8eccf26..925ac77 100644
> > > --- a/postcopy-ram.c
> > > +++ b/postcopy-ram.c
> > > @@ -485,9 +485,51 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
> > >      return 0;
> > >  }
> > >  
> > > +/*
> > > + * Mark the given area of RAM as requiring notification to unwritten areas
> > > + * Used as a  callback on qemu_ram_foreach_block.
> > > + *   host_addr: Base of area to mark
> > > + *   offset: Offset in the whole ram arena
> > > + *   length: Length of the section
> > > + *   opaque: Unused
> > 
> >                 ^^^^^^
> > This appears to be wrong - opaque is used to find the MIS.
> 
> Fixed.
> 
> > 
> > > + * Returns 0 on success
> > > + */
> > > +static int postcopy_ram_sensitise_area(const char *block_name, void *host_addr,
> > > +                                       ram_addr_t offset, ram_addr_t length,
> > > +                                       void *opaque)
> > > +{
> > > +    MigrationIncomingState *mis = opaque;
> > > +    uint64_t tokern[2];
> > 
> > "tokern"?
> 
> Now "to_kernel"

Ah!  I thought it was just misspelled "token".
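A sketch of the callback-with-opaque pattern under discussion, using a hypothetical Block array and foreach_block() iterator modelled on the quoted callback signature (not QEMU's qemu_ram_foreach_block itself): opaque carries the caller's context, and the first non-zero return stops the walk and is passed back up.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t ram_addr_t;   /* stand-in for QEMU's type */

/* Hypothetical block descriptor */
typedef struct Block {
    const char *name;
    void *host;
    ram_addr_t offset;
    ram_addr_t length;
} Block;

typedef int (*BlockIterFunc)(const char *block_name, void *host_addr,
                             ram_addr_t offset, ram_addr_t length,
                             void *opaque);

/* Calls func on each block; stops at, and returns, the first non-zero result */
static int foreach_block(const Block *blocks, size_t n, BlockIterFunc func,
                         void *opaque)
{
    for (size_t i = 0; i < n; i++) {
        int ret = func(blocks[i].name, blocks[i].host, blocks[i].offset,
                       blocks[i].length, opaque);
        if (ret) {
            return ret;
        }
    }
    return 0;
}

/* Hypothetical per-call context, playing the role of the MIS */
typedef struct Ctx {
    ram_addr_t total;      /* sum of lengths visited */
    const char *fail_on;   /* block name that should report an error */
} Ctx;

static int sensitise_cb(const char *block_name, void *host_addr,
                        ram_addr_t offset, ram_addr_t length, void *opaque)
{
    Ctx *ctx = opaque;     /* opaque carries the caller's state */

    (void)host_addr;
    (void)offset;
    if (ctx->fail_on && strcmp(block_name, ctx->fail_on) == 0) {
        return -1;
    }
    ctx->total += length;
    return 0;
}
```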


* Re: [Qemu-devel] [PATCH v4 35/47] Page request: Add MIG_RPCOMM_REQPAGES reverse command
  2014-11-19 20:01     ` Dr. David Alan Gilbert
@ 2014-11-19 21:48       ` David Gibson
  0 siblings, 0 replies; 204+ messages in thread
From: David Gibson @ 2014-11-19 21:48 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy


On Wed, Nov 19, 2014 at 08:01:31PM +0000, Dr. David Alan Gilbert wrote:
> * David Gibson (david@gibson.dropbear.id.au) wrote:
> > On Fri, Oct 03, 2014 at 06:47:41PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > 
> > > Add MIG_RPCOMM_REQPAGES command on Return path for the postcopy
> > > destination to request a page from the source.
> 
> > > +    buf64[0] = (uint64_t)start;
> > > +    buf64[0] = cpu_to_be64(buf64[0]);
> > 
> > I think this would be clearer as well as less verbose, as just:
> > 	buf64[0] = cpu_to_be64(start);
> 
> I've gone with the halfway mark of:
> 
> buf64[0] = cpu_to_be64((uint64_t)start);
> 
> It just doesn't feel right passing something into a byteswap
> unless you know the size.

I've always thought of (this group of) byteswap functions as
specifying the output size.  It's a value parameter so integer
promotion to the required type is pretty safe.
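To illustrate the point about the parameter type: assuming the byteswap helper is a function with a uint64_t parameter in scope (swap64() below is a hypothetical, unconditional swap, not QEMU's cpu_to_be64()), a narrower argument is converted to uint64_t before the call, so the explicit cast changes nothing. Where a cast could matter is a macro that operates on its argument's own type.

```c
#include <stdint.h>

/* Hypothetical unconditional 64-bit byte swap */
static uint64_t swap64(uint64_t v)
{
    uint64_t r = 0;

    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (v & 0xff);   /* move v's lowest byte into r */
        v >>= 8;
    }
    return r;
}
```

With `uint32_t start32 = 0x12345678;`, both `swap64(start32)` and `swap64((uint64_t)start32)` yield 0x7856341200000000.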

> > > +    buf64[1] = (uint64_t)len;
> > > +    buf64[1] = cpu_to_be64(buf64[1]);
> > > +    migrate_send_rp_message(mis, MIG_RPCOMM_REQPAGES, msglen, bufc);
> > > +}
> > > +
> > >  void qemu_start_incoming_migration(const char *uri, Error **errp)
> > >  {
> > >      const char *p;
> > > @@ -784,6 +816,17 @@ static void source_return_path_bad(MigrationState *s)
> > >  }
> > >  
> > >  /*
> > > + * Process a request for pages received on the return path,
> > > + * We're allowed to send more than requested (e.g. to round to our page size)
> > > + * and we don't need to send pages that have already been sent.
> > > + */
> > > +static void migrate_handle_rp_reqpages(MigrationState *ms, const char* rbname,
> > > +                                       ram_addr_t start, ram_addr_t len)
> > > +{
> > > +    DPRINTF("migrate_handle_rp_reqpages: at %zx for len %zx", start, len);
> > > +}
> > > +
> > > +/*
> > >   * Handles messages sent on the return path towards the source VM
> > >   *
> > >   */
> > > @@ -795,6 +838,8 @@ static void *source_return_path_thread(void *opaque)
> > >      const int max_len = 512;
> > >      uint8_t buf[max_len];
> > >      uint32_t tmp32;
> > > +    uint64_t tmp64a, tmp64b;
> > 
> > Hrm.. calling everything "tmp*" doesn't help readability.
> 
> True; most of the rest of those tmps are shared by multiple commands and are
> just read off the wire and used immediately.
> tmp64a/b are now called start/len.

Ok, great.  I find it's usually best to declare appropriate variables
for each case (or even use local blocks), rather than share.  The
compiler's smart enough to coalesce them, so there's no real cost.
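A sketch of that per-case-locals style applied to parsing the REQPAGES payload (the command number, get_be64() and dispatch() are hypothetical; the layout follows the quoted patch). The locals live in the case's own block, and a length mismatch returns before any value is used, whereas the quoted hunk appears to go on to migrate_handle_rp_reqpages() after calling source_return_path_bad().

```c
#include <stddef.h>
#include <stdint.h>

/* Portable big-endian 64-bit read, standing in for be64_to_cpup() */
static uint64_t get_be64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

enum { CMD_REQPAGES = 1 };   /* hypothetical command number */

/*
 * Hypothetical dispatcher for one return-path message of len bytes.
 * buf needs room for len + 1 bytes, because the id string is
 * NUL-terminated in place. Returns 0 on success, -1 if malformed.
 */
static int dispatch(int cmd, uint8_t *buf, size_t len, uint64_t *out_start,
                    uint64_t *out_len, const char **out_idstr)
{
    switch (cmd) {
    case CMD_REQPAGES: {
        /* Locals scoped to this command only */
        uint64_t start, rlen;
        const char *idstr = NULL;

        if (len < 16) {
            return -1;
        }
        start = get_be64(buf);
        rlen = get_be64(buf + 8);
        if (rlen & 1) {                  /* bit 0: an id string follows */
            size_t idlen;

            rlen &= ~(uint64_t)1;
            if (len < 17) {
                return -1;
            }
            idlen = buf[16];
            if (len != 17 + idlen) {
                return -1;               /* bail out before using anything */
            }
            buf[17 + idlen] = '\0';
            idstr = (const char *)&buf[17];
        } else if (len != 16) {
            return -1;
        }
        *out_start = start;
        *out_len = rlen;
        *out_idstr = idstr;
        return 0;
    }
    default:
        return -1;
    }
}
```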


* Re: [Qemu-devel] [PATCH v4 33/47] Postcopy: Postcopy startup in migration thread
  2014-10-04 16:27   ` Paolo Bonzini
@ 2014-11-20 11:45     ` Dr. David Alan Gilbert
  2014-11-21 12:01       ` Paolo Bonzini
  2014-11-20 17:12     ` Dr. David Alan Gilbert
  2014-11-24 18:26     ` Dr. David Alan Gilbert
  2 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-11-20 11:45 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> Il 03/10/2014 19:47, Dr. David Alan Gilbert (git) ha scritto:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Rework the migration thread to setup and start postcopy.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  include/migration/migration.h |   3 +
> >  migration.c                   | 201 ++++++++++++++++++++++++++++++++++++++----
> >  2 files changed, 185 insertions(+), 19 deletions(-)
> > 
> > diff --git a/include/migration/migration.h b/include/migration/migration.h
> > index b01cc17..f401775 100644
> > --- a/include/migration/migration.h
> > +++ b/include/migration/migration.h
> > @@ -125,6 +125,9 @@ struct MigrationState
> >      /* Flag set once the migration has been asked to enter postcopy */
> >      volatile bool start_postcopy;
> >  
> > +    /* Flag set once the migration thread is running (and needs joining) */
> > +    volatile bool started_migration_thread;
> 
> volatile almost never does what you think it does. :)

True.

> In this case, I think only one thread reads/writes the variable so
> "volatile" is unnecessary.

Let's just check that: it's set by migrate_fd_connect (from the main thread)
when it spawns the thread, and it's cleared by migrate_fd_cleanup, which always
runs as a bh and so should always be in the main thread.  So yes, it's always
the same thread, which is nice and simple; the volatile has evaporated.

> Otherwise, you would need to add actual memory barriers, atomic
> operations, or synchronization primitives.
> 
> For start_postcopy, it is okay because it is just a hint to the compiler
> and the processor will eventually see the assignment.

Yes; in this case my understanding is that it's necessary to stop the
compiler from hoisting the check out of the loop.

> For this case
> QEMU has atomic_read/atomic_set (corresponding to __ATOMIC_RELAXED in
> C/C++1x), so you could use those as well.

Ah, so those look like they just do a volatile cast anyway.

(I've probably got some other flags I need to think about reading/writing
atomically/safely).

Dave
(I'll take the other issues in this mail separately since there are quite a few).

* Re: [Qemu-devel] [PATCH v4 33/47] Postcopy: Postcopy startup in migration thread
  2014-10-04 16:27   ` Paolo Bonzini
  2014-11-20 11:45     ` Dr. David Alan Gilbert
@ 2014-11-20 17:12     ` Dr. David Alan Gilbert
  2014-11-20 17:19       ` Paolo Bonzini
  2014-11-24 18:26     ` Dr. David Alan Gilbert
  2 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-11-20 17:12 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> Il 03/10/2014 19:47, Dr. David Alan Gilbert (git) ha scritto:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Rework the migration thread to setup and start postcopy.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  include/migration/migration.h |   3 +
> >  migration.c                   | 201 ++++++++++++++++++++++++++++++++++++++----
> >  2 files changed, 185 insertions(+), 19 deletions(-)
> > 
> > diff --git a/include/migration/migration.h b/include/migration/migration.h
> > index b01cc17..f401775 100644
> > --- a/include/migration/migration.h
> > +++ b/include/migration/migration.h

<snip>

> > +/* Switch from normal iteration to postcopy
> > + * Returns non-0 on error
> > + */
> > +static int postcopy_start(MigrationState *ms)
> > +{
> > +    int ret;
> > +    const QEMUSizedBuffer *qsb;
> > +    migrate_set_state(ms, MIG_STATE_ACTIVE, MIG_STATE_POSTCOPY_ACTIVE);
> > +
> > +    DPRINTF("postcopy_start\n");
> > +    qemu_mutex_lock_iothread();
> > +    DPRINTF("postcopy_start: setting run state\n");
> > +    ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> > +
> > +    if (ret < 0) {
> > +        migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
> > +        qemu_mutex_unlock_iothread();
> > +        return -1;
> 
> Please use "goto" for error returns, like
> 
> fail_locked:
>     qemu_mutex_unlock_iothread();
> fail:
>     migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
>     return -1;

Done; they all end up unlocking, but I've got another label
for a case that has to close the fb later.
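A sketch of the label layout suggested above, extended with the extra label for closing the buffer (all names are hypothetical; a pthread mutex stands in for the iothread lock and malloc() for the buffer file). Each error path jumps to the label that undoes exactly what has been set up so far.

```c
#include <pthread.h>
#include <stdlib.h>

enum { STATE_POSTCOPY_ACTIVE, STATE_ERROR };

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static int mig_state;

/* fail_at selects which step fails (0 = none), to exercise each label */
static int start_sketch(int fail_at)
{
    char *fb;

    mig_state = STATE_POSTCOPY_ACTIVE;
    pthread_mutex_lock(&big_lock);

    if (fail_at == 1) {            /* e.g. stopping the VM failed */
        goto fail_locked;
    }
    fb = malloc(64);               /* stands in for opening the buffer file */
    if (!fb) {
        goto fail_locked;
    }
    if (fail_at == 2) {            /* e.g. packaged state too large */
        goto fail_closefb;
    }
    /* ... send the package ... */
    free(fb);
    pthread_mutex_unlock(&big_lock);

    if (fail_at == 3) {            /* e.g. stream error after unlocking */
        goto fail;
    }
    return 0;

fail_closefb:
    free(fb);
fail_locked:
    pthread_mutex_unlock(&big_lock);
fail:
    mig_state = STATE_ERROR;
    return -1;
}
```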

> > +    }
> > +
> > +    /*
> > +     * in Finish migrate and with the io-lock held everything should
> > +     * be quiet, but we've potentially still got dirty pages and we
> > +     * need to tell the destination to throw any pages it's already received
> > +     * that are dirty
> > +     */
> > +    if (ram_postcopy_send_discard_bitmap(ms)) {
> > +        DPRINTF("postcopy send discard bitmap failed\n");
> > +        migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
> > +        qemu_mutex_unlock_iothread();
> > +        return -1;
> > +    }
> > +
> > +    DPRINTF("postcopy_start: sending req 2\n");
> > +    qemu_savevm_send_reqack(ms->file, 2);
> 
> Perhaps move it below qemu_file_set_rate_limit, and add
> trace_qemu_savevm_send_reqack?

Trace added, and also moved as requested; was the request to move
it just to eliminate the other DPRINTF?

> Also what is 2/3/4?  Is this just for debugging or is it part of the
> protocol?

Debug; they're very useful for matching the debug streams up, especially
when the timers on the two hosts are very different.
(I'm up for suggestions on how to mark the 2/3/4 for debug more clearly,
especially if it meant that it didn't make the ping (née reqack) dedicated
to debug).

> > +    /*
> > +     * send rest of state - note things that are doing postcopy
> > +     * will notice we're in MIG_STATE_POSTCOPY_ACTIVE and not actually
> > +     * wrap their state up here
> > +     */
> > +    qemu_file_set_rate_limit(ms->file, INT64_MAX);
> > +    DPRINTF("postcopy_start: do state_complete\n");
> > +
> > +    /*
> > +     * We need to leave the fd free for page transfers during the
> > +     * loading of the device state, so wrap all the remaining
> > +     * commands and state into a package that gets sent in one go
> > +     */
> 
> The comments in the code are very nice.  Thanks.  This is a huge
> improvement from the last version I received.
> 
> > +    QEMUFile *fb = qemu_bufopen("w", NULL);
> > +    if (!fb) {
> > +        error_report("Failed to create buffered file");
> > +        migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
> > +        qemu_mutex_unlock_iothread();
> > +        return -1;
> > +    }
> > +
> > +    qemu_savevm_state_complete(fb);
> > +    DPRINTF("postcopy_start: sending req 3\n");
> > +    qemu_savevm_send_reqack(fb, 3);
> > +
> > +    qemu_savevm_send_postcopy_ram_run(fb);
> > +
> > +    /* <><> end of stuff going into the package */
> > +    qsb = qemu_buf_get(fb);
> > +
> > +    /* Now send that blob */
> > +    if (qsb_get_length(qsb) > MAX_VM_CMD_PACKAGED_SIZE) {
> > +        DPRINTF("postcopy_start: Unreasonably large packaged state: %lu\n",
> > +                (unsigned long)(qsb_get_length(qsb)));
> > +        migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
> > +        qemu_mutex_unlock_iothread();
> > +        qemu_fclose(fb);
> 
> Close fb above migrate_set_state, and use goto as above.  Or just have
> three labels.

Done, it's a separate label.
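A sketch of the packaging step described in the quoted comment (build the remaining state into one buffer, refuse it if it exceeds a size limit, then send it as a single length-prefixed blob). The Buf type, the 4-byte big-endian length header and MAX_PACKAGE_SIZE are assumptions for illustration, not the actual QEMU_VM_CMD_PACKAGED wire format.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PACKAGE_SIZE (1u << 24)   /* hypothetical limit, 16 MiB */

/* Hypothetical growable buffer standing in for a QEMUSizedBuffer */
typedef struct Buf {
    uint8_t *data;
    size_t len, cap;
} Buf;

/* Append n bytes, growing the buffer as needed; 0 on success */
static int buf_put(Buf *b, const void *p, size_t n)
{
    if (n == 0) {
        return 0;
    }
    if (b->len + n > b->cap) {
        size_t ncap = b->cap ? b->cap : 64;
        while (ncap < b->len + n) {
            ncap *= 2;
        }
        uint8_t *nd = realloc(b->data, ncap);
        if (!nd) {
            return -1;
        }
        b->data = nd;
        b->cap = ncap;
    }
    memcpy(b->data + b->len, p, n);
    b->len += n;
    return 0;
}

/* Emit pkg into out as a 4-byte big-endian length plus the bytes;
 * refuses packages larger than max_size. 0 on success. */
static int send_packaged(Buf *out, const Buf *pkg, size_t max_size)
{
    uint8_t hdr[4];

    if (pkg->len > max_size) {
        return -1;
    }
    hdr[0] = (uint8_t)(pkg->len >> 24);
    hdr[1] = (uint8_t)(pkg->len >> 16);
    hdr[2] = (uint8_t)(pkg->len >> 8);
    hdr[3] = (uint8_t)pkg->len;
    if (buf_put(out, hdr, 4) < 0) {
        return -1;
    }
    return buf_put(out, pkg->data, pkg->len);
}
```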

> 
> > +        return -1;
> > +    }
> > +    qemu_savevm_send_packaged(ms->file, qsb);
> > +    qemu_fclose(fb);
> > +
> > +    qemu_mutex_unlock_iothread();
> > +
> > +    DPRINTF("postcopy_start not finished sending ack\n");
> > +    qemu_savevm_send_reqack(ms->file, 4);
> > +
> > +    ret = qemu_file_get_error(ms->file);
> > +    if (ret) {
> > +        error_report("postcopy_start: Migration stream errored");
> 
> This should have been reported already.

No, sorry - I don't trust qemu_file reporting errors by itself.

Dave
(Again, the rest of the comments on this patch can wait for another mail)


* Re: [Qemu-devel] [PATCH v4 33/47] Postcopy: Postcopy startup in migration thread
  2014-11-20 17:12     ` Dr. David Alan Gilbert
@ 2014-11-20 17:19       ` Paolo Bonzini
  0 siblings, 0 replies; 204+ messages in thread
From: Paolo Bonzini @ 2014-11-20 17:19 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy



On 20/11/2014 18:12, Dr. David Alan Gilbert wrote:
> Trace added, and also moved as requested; was the request to move
> it just to eliminate the other DPRINTF?

Yes.

>> > Also what is 2/3/4?  Is this just for debugging or is it part of the
>> > protocol?
> Debug; they're very useful for matching the debug streams up, especially
> when the timers on the two hosts are very different.
> (I'm up for suggestions on how to mark the 2/3/4 for debug more clearly,
> especially if it meant that it didn't make the ping (née reqack) dedicated
> to debug).
> 

No problem, as long as it's clear to the guy matching the code against
the debug output.

Paolo


* Re: [Qemu-devel] [PATCH v4 00/47] Postcopy implementation
  2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (47 preceding siblings ...)
  2014-10-03 19:21 ` [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert
@ 2014-11-21  3:48 ` zhanghailiang
  2014-11-21 10:14   ` Dr. David Alan Gilbert
  2014-11-21 18:56   ` Andrea Arcangeli
  48 siblings, 2 replies; 204+ messages in thread
From: zhanghailiang @ 2014-11-21  3:48 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, amit.shah, yanghy

Hi David,

When I migrated a VM using postcopy, with the VM configured with the '-realtime mlock=on' option,
it failed, and the destination reported "postcopy_ram_hosttest: remap_anon_pages not available: File exists".

Is it a bug in the userfaultfd API?

cc: Andrea

Steps to reproduce:
Source:
qemu-postcopy/qemu # x86_64-softmmu/qemu-system-x86_64 -msg timestamp=on \
-machine pc-i440fx-2.2,accel=kvm -m 1024 -realtime mlock=on -smp 4 \
-hda /mnt/sdb/pure_IMG/redhat/redhat-6.4-httpd.img -vnc :11 -monitor stdio

Destination:
qemu-postcopy/qemu # x86_64-softmmu/qemu-system-x86_64 -msg timestamp=on \
-machine pc-i440fx-2.2,accel=kvm -m 1024 -realtime mlock=on -smp 4 \
-hda /mnt/sdb/pure_IMG/redhat/redhat-6.4-httpd.img -vnc :12 -monitor stdio \
-incoming unix:/mnt/migrate.sock
(1) migrate_set_capability x-postcopy-ram on
(2) migrate -d unix:/mnt/migrate.sock

On the destination, it fails and reports:
savevm@2040988668 qemu_loadvm_state_main QEMU_VM_COMMAND ret: 0
savevm@2040988668 qemu_loadvm_state loop: section_type=6
savevm@2040988668 loadvm_postcopy_ram_handle_advise
postcopy_ram_hosttest: remap_anon_pages not available: File exists
savevm@2040988668 qemu_loadvm_state_main QEMU_VM_COMMAND ret: -1

One more thing I'd like to know ;)
Why must we start precopy before starting postcopy?
Can we do postcopy from the beginning of migration?

Thanks,
zhanghailiang

On 2014/10/4 1:47, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Hi,
>    This is the 4th cut of my version of postcopy; it is designed for use with
> the Linux kernel additions just posted by Andrea Arcangeli here:
>
> http://marc.info/?l=linux-kernel&m=141235633015100&w=2
>
> (Note: This is a new version compared to my previous postcopy patchset; you'll
> need to update the kernel to the new version.)
>
> Other than the new kernel ABI (which is only a small change to the userspace side),
> the major changes are:
>
>    a) Code for host page size != target page size
>    b) Support for migration over fd
>       From Cristian Klein; this is for libvirt support which Cristian recently
>       posted to the libvirt list.
>    c) It's now build bisectable and builds on 32bit
>
> Testing-wise, I've now done many thousands of postcopy migrations without
> failure (of both idle and busy guests), so it seems pretty solid.
>
> Must-TODO's:
>    1) A partially repeatable migration_cancel failure
>    2) virt_test's migrate.with_reboot test is failing
>    3) The ACPI fix in 2.1 that allowed migrating RAMBlocks to be larger than
>      the source feels like it needs looking at for postcopy.
>    4) Paolo's comments with respect to the wakeup_request/is_running code
>       in the migration thread
>    5) xbzrle needs disabling once in postcopy
>
> Later-TODO's:
>    1) Control the rate of background page transfers during postcopy to
>       reduce their impact on the latency of postcopy requests.
>    2) Work with RDMA
>    3) Could destination RP be made blocking (as per discussion with Paolo;
>       I'm still worried that that changes too many assumptions)
>
>
>
> V4:
>    Initial support for host page size != target page size
>      - tested heavily on hps==tps
>      - only partially tested on hps!=tps systems
>      - This involved quite a bit of rework around the discard code
>    Updated to new kernel userfault ABI
>      - It won't work with the previous version
>    Fix mis-optimisation of postcopy request for wrong RAMBlock
>       request for block A offset n
>       un-needed fault for block B/m (already received - no req sent)
>       request for block B/l  - wrongly sent as request for A/l
>    Fix thinko in discard bitmap processing (missed last word of bitmap)
>       Symptom: remap failures near the top of RAM if postcopy started late
>    Fix bug that caused kernel page acknowledgments to be misaligned
>       May have meant the guest was paused for longer than required
>    Fix potential for crashing cleaning up failed RP
>    Fixes in docs (from Yang)
>    Handle migration by fd as sockets if they are sockets
>    Build tested on 32bit
>    Fully build bisectable (x86-64)
>
>
> Dave
>
> Cristian Klein (1):
>    Handle bi-directional communication for fd migration
>
> Dr. David Alan Gilbert (46):
>    QEMUSizedBuffer based QEMUFile
>    Tests: QEMUSizedBuffer/QEMUBuffer
>    Start documenting how postcopy works.
>    qemu_ram_foreach_block: pass up error value, and down the ramblock
>      name
>    improve DPRINTF macros, add to savevm
>    Add qemu_get_counted_string to read a string prefixed by a count byte
>    Create MigrationIncomingState
>    socket shutdown
>    Provide runtime Target page information
>    Return path: Open a return path on QEMUFile for sockets
>    Return path: socket_writev_buffer: Block even on non-blocking fd's
>    Migration commands
>    Return path: Control commands
>    Return path: Send responses from destination to source
>    Return path: Source handling of return path
>    qemu_loadvm errors and debug
>    ram_debug_dump_bitmap: Dump a migration bitmap as text
>    Rework loadvm path for subloops
>    Add migration-capability boolean for postcopy-ram.
>    Add wrappers and handlers for sending/receiving the postcopy-ram
>      migration messages.
>    QEMU_VM_CMD_PACKAGED: Send a packaged chunk of migration stream
>    migrate_init: Call from savevm
>    Allow savevm handlers to state whether they could go into postcopy
>    postcopy: OS support test
>    migrate_start_postcopy: Command to trigger transition to postcopy
>    MIG_STATE_POSTCOPY_ACTIVE: Add new migration state
>    qemu_savevm_state_complete: Postcopy changes
>    Postcopy page-map-incoming (PMI) structure
>    Postcopy: Maintain sentmap and calculate discard
>    postcopy: Incoming initialisation
>    postcopy: ram_enable_notify to switch on userfault
>    Postcopy: Postcopy startup in migration thread
>    Postcopy: Create a fault handler thread before marking the ram as
>      userfault
>    Page request:  Add MIG_RPCOMM_REQPAGES reverse command
>    Page request: Process incoming page request
>    Page request: Consume pages off the post-copy queue
>    Add assertion to check migration_dirty_pages
>    postcopy_ram.c: place_page and helpers
>    Postcopy: Use helpers to map pages during migration
>    qemu_ram_block_from_host
>    Don't sync dirty bitmaps in postcopy
>    Host page!=target page: Cleanup bitmaps
>    Postcopy; Handle userfault requests
>    Start up a postcopy/listener thread ready for incoming page data
>    postcopy: Wire up loadvm_postcopy_ram_handle_{run,end} commands
>    End of migration for postcopy
>
>   Makefile.objs                    |    2 +-
>   arch_init.c                      |  739 +++++++++++++++++++++++++--
>   docs/migration.txt               |  189 +++++++
>   exec.c                           |   76 ++-
>   hmp-commands.hx                  |   15 +
>   hmp.c                            |    7 +
>   hmp.h                            |    1 +
>   include/exec/cpu-common.h        |    8 +-
>   include/migration/migration.h    |  130 +++++
>   include/migration/postcopy-ram.h |  106 ++++
>   include/migration/qemu-file.h    |   47 ++
>   include/migration/vmstate.h      |    2 +-
>   include/qemu/sockets.h           |    1 +
>   include/qemu/typedefs.h          |    9 +-
>   include/sysemu/sysemu.h          |   43 +-
>   migration-fd.c                   |   24 +-
>   migration-rdma.c                 |    4 +-
>   migration.c                      |  693 +++++++++++++++++++++++++-
>   postcopy-ram.c                   | 1016 ++++++++++++++++++++++++++++++++++++++
>   qapi-schema.json                 |   14 +-
>   qemu-file.c                      |  598 +++++++++++++++++++++-
>   qmp-commands.hx                  |   19 +
>   savevm.c                         |  881 +++++++++++++++++++++++++++++++--
>   tests/Makefile                   |    2 +-
>   tests/test-vmstate.c             |   74 +--
>   util/qemu-sockets.c              |   28 ++
>   26 files changed, 4550 insertions(+), 178 deletions(-)
>   create mode 100644 include/migration/postcopy-ram.h
>   create mode 100644 postcopy-ram.c
>

* Re: [Qemu-devel] [PATCH v4 19/47] Rework loadvm path for subloops
  2014-11-19 17:50     ` Dr. David Alan Gilbert
@ 2014-11-21  6:53       ` David Gibson
  2014-12-11 14:47         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 204+ messages in thread
From: David Gibson @ 2014-11-21  6:53 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On Wed, Nov 19, 2014 at 05:50:11PM +0000, Dr. David Alan Gilbert wrote:
> * David Gibson (david@gibson.dropbear.id.au) wrote:
> > On Fri, Oct 03, 2014 at 06:47:25PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > 
> > > Postcopy needs to have two migration streams loading concurrently;
> > > one from memory (with the device state) and the other from the fd
> > > with the memory transactions.
> > > 
> > > Split the core of qemu_loadvm_state out so we can use it for both.
> > > 
> > > Allow the inner loadvm loop to quit and signal whether the parent
> > > should.
> > > 
> > > loadvm_handlers is made static since its lifetime is greater
> > > than the outer qemu_loadvm_state.
> > 
> > Maybe it's just me, but "made static" to me indicates either a change
> > from fully-global to module-global, or (function) local automatic to
> > local static, not a change from function local-automatic to
> > module-global as here.
> > 
> > It's also not clear from this patch alone why the lifetime of
> > loadvm_handlers now needs to exceed that of qemu_loadvm_state().
> 
> OK, how about if I reworked that last sentence to be:
> 
>    loadvm_handlers is made module-global to survive beyond the lifetime
>    of the outer qemu_loadvm_state since it may still be in use by
>    a subloop in the postcopy listen thread.

Yeah, that's better.  A global seems ugly though.  Would it be better
to dynamically allocate the list head and pass a pointer into the
listen thread, or even to pass the list head by value into the listen
thread?

The individual list elements need to be cleaned up at some point
anyway, so I don't think that introduces any lifetime questions that
weren't already there.
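A minimal sketch of that suggestion, assuming BSD <sys/queue.h> lists stand in for QEMU's QLIST_* macros. LoadHandlerEntry, LoadHandlerList and the helper functions are hypothetical names, not code from the series. The list head lives on the heap and is handed to the listen thread through its void * argument, so no module-global is needed:

```c
#include <stdint.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical per-section entry; QEMU itself would use QLIST_* from
 * qemu/queue.h rather than the BSD <sys/queue.h> macros used here. */
typedef struct LoadHandlerEntry {
    int section_id;
    LIST_ENTRY(LoadHandlerEntry) entry;
} LoadHandlerEntry;

/* The list head is heap-allocated so it can outlive qemu_loadvm_state(). */
typedef LIST_HEAD(LoadHandlerList, LoadHandlerEntry) LoadHandlerList;

static LoadHandlerList *loadvm_handlers_new(void)
{
    LoadHandlerList *head = malloc(sizeof(*head));
    if (head) {
        LIST_INIT(head);
    }
    return head;
}

static int loadvm_handlers_add(LoadHandlerList *head, int section_id)
{
    LoadHandlerEntry *le = malloc(sizeof(*le));
    if (!le) {
        return -1;
    }
    le->section_id = section_id;
    LIST_INSERT_HEAD(head, le, entry);
    return 0;
}

/* Frees every element and then the head; returns the number of elements. */
static int loadvm_handlers_free(LoadHandlerList *head)
{
    int n = 0;
    while (!LIST_EMPTY(head)) {
        LoadHandlerEntry *le = LIST_FIRST(head);
        LIST_REMOVE(le, entry);
        free(le);
        n++;
    }
    free(head);
    return n;
}

/* Listen-thread entry point (pthread_create()-style signature): it gets
 * the list through its opaque argument and owns it from then on. */
static void *listen_thread_body(void *opaque)
{
    LoadHandlerList *handlers = opaque;
    /* ... the postcopy subloop would look sections up in *handlers ... */
    return (void *)(intptr_t)loadvm_handlers_free(handlers);
}
```

In real use the loadvm path would pass the pointer as the thread argument (for example to pthread_create()) and stop touching it; the plain-precopy path would call loadvm_handlers_free() itself.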

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH v4 24/47] Allow savevm handlers to state whether they could go into postcopy
  2014-11-19 17:53     ` Dr. David Alan Gilbert
@ 2014-11-21  6:58       ` David Gibson
  2014-11-25 19:58         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 204+ messages in thread
From: David Gibson @ 2014-11-21  6:58 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On Wed, Nov 19, 2014 at 05:53:54PM +0000, Dr. David Alan Gilbert wrote:
> * David Gibson (david@gibson.dropbear.id.au) wrote:
> > On Fri, Oct 03, 2014 at 06:47:30PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > 
> > > Use that to split the qemu_savevm_state_pending counts into postcopiable
> > > and non-postcopiable amounts
> > > 
> > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > ---
> > >  arch_init.c                 |  7 +++++++
> > >  include/migration/vmstate.h |  2 +-
> > >  include/sysemu/sysemu.h     |  4 +++-
> > >  migration.c                 |  9 ++++++++-
> > >  savevm.c                    | 23 +++++++++++++++++++----
> > >  5 files changed, 38 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/arch_init.c b/arch_init.c
> > > index 6970733..44072d8 100644
> > > --- a/arch_init.c
> > > +++ b/arch_init.c
> > > @@ -1192,6 +1192,12 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> > >      return ret;
> > >  }
> > >  
> > > +/* RAM's always up for postcopying */
> > > +static bool ram_can_postcopy(void *opaque)
> > > +{
> > > +    return true;
> > > +}
> > > +
> > >  static SaveVMHandlers savevm_ram_handlers = {
> > >      .save_live_setup = ram_save_setup,
> > >      .save_live_iterate = ram_save_iterate,
> > > @@ -1199,6 +1205,7 @@ static SaveVMHandlers savevm_ram_handlers = {
> > >      .save_live_pending = ram_save_pending,
> > >      .load_state = ram_load,
> > >      .cancel = ram_migration_cancel,
> > > +    .can_postcopy = ram_can_postcopy,
> > 
> > Is there actually any plausible device for which you'd need a callback
> > here, rather than just having a static bool?
> > 
> > On the other hand, it does seem kind of plausible that there might be
> > situations in which some data from a device must be pre-copied, but
> > more can be post-copied, which would necessitate extending the
> > per-handler callback to return quantities for both.
> 
> It's cheap enough and I couldn't make a strong argument about
> any possible device, so I just used the function.

Ok.  I still wonder if it might be better to instead extend
the save_live_pending callback in order to return both
non-postcopyable and postcopyable quantities.  It allows for the case
of a postcopyable device which has some non-postcopyable data - and
with any postcopyable device other than RAM, it seems likely that
there will need to be some precopied metadata at least.  Plus it
avoids adding another callback.
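A rough sketch of that alternative, assuming a hypothetical two-output save_live_pending signature (SaveHandlersSketch, MixedDevice and the res_* parameter names are illustrative, not QEMU's). Each handler adds to whichever bucket applies, so a device with precopy-only metadata and postcopiable bulk data needs no extra callback:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single callback that reports both amounts, replacing a
 * separate can_postcopy() hook. */
typedef void PendingFn(void *opaque, uint64_t max_size,
                       uint64_t *res_non_postcopiable,
                       uint64_t *res_postcopiable);

typedef struct {
    void *opaque;
    PendingFn *save_live_pending;
} SaveHandlersSketch;

/* RAM: all remaining dirty data could be sent after the switch. */
static void ram_pending_sketch(void *opaque, uint64_t max_size,
                               uint64_t *res_non_postcopiable,
                               uint64_t *res_postcopiable)
{
    (void)max_size;
    (void)res_non_postcopiable;
    *res_postcopiable += *(const uint64_t *)opaque;
}

/* A device whose metadata must be precopied but whose bulk data need not. */
typedef struct {
    uint64_t metadata_bytes;
    uint64_t bulk_bytes;
} MixedDevice;

static void mixed_pending_sketch(void *opaque, uint64_t max_size,
                                 uint64_t *res_non_postcopiable,
                                 uint64_t *res_postcopiable)
{
    const MixedDevice *d = opaque;
    (void)max_size;
    *res_non_postcopiable += d->metadata_bytes;
    *res_postcopiable += d->bulk_bytes;
}

/* Sum over all handlers, roughly what qemu_savevm_state_pending() would do. */
static void state_pending_sketch(const SaveHandlersSketch *h, size_t n,
                                 uint64_t max_size,
                                 uint64_t *non_postcopiable,
                                 uint64_t *postcopiable)
{
    *non_postcopiable = 0;
    *postcopiable = 0;
    for (size_t i = 0; i < n; i++) {
        if (h[i].save_live_pending) {
            h[i].save_live_pending(h[i].opaque, max_size,
                                   non_postcopiable, postcopiable);
        }
    }
}
```

The migration thread could then compare the non-postcopiable total against its threshold when deciding whether it is safe to switch to postcopy.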

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH v4 45/47] Start up a postcopy/listener thread ready for incoming page data
  2014-11-19 19:40     ` Dr. David Alan Gilbert
@ 2014-11-21  8:36       ` David Gibson
  2014-11-21 10:17         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 204+ messages in thread
From: David Gibson @ 2014-11-21  8:36 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On Wed, Nov 19, 2014 at 07:40:20PM +0000, Dr. David Alan Gilbert wrote:
> * David Gibson (david@gibson.dropbear.id.au) wrote:
> > On Fri, Oct 03, 2014 at 06:47:51PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > 
> > > The loading of a device state (during postcopy) may access guest
> > > memory that's still on the source machine and thus might need
> > > a page fill; split off a separate thread that handles the incoming
> > > page data so that the original incoming migration code can finish
> > > off the device data.
> > > 
> > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > ---
> > >  include/migration/migration.h |  4 +++
> > >  migration.c                   |  6 +++++
> > >  savevm.c                      | 62 +++++++++++++++++++++++++++++++++++++++++--
> > >  3 files changed, 70 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/include/migration/migration.h b/include/migration/migration.h
> > > index 00255b8..69e776c 100644
> > > --- a/include/migration/migration.h
> > > +++ b/include/migration/migration.h
> > > @@ -92,6 +92,10 @@ struct MigrationIncomingState {
> > >      QemuThread     fault_thread;
> > >      QemuSemaphore  fault_thread_sem;
> > >  
> > > +    bool           have_listen_thread;
> > 
> > AFAICT have_listen_thread is never set to a value other than 'true',
> > so there doesn't seem much point to it.
> 
> It's tested by qemu_loadvm_state to avoid cleaning things up as it exits,
> since the listen thread is still using it.

Right, but I couldn't see under what circumstances it would ever be
false at the test in qemu_loadvm_state().

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH v4 00/47] Postcopy implementation
  2014-11-21  3:48 ` zhanghailiang
@ 2014-11-21 10:14   ` Dr. David Alan Gilbert
  2014-11-24  8:10     ` zhanghailiang
  2014-11-21 18:56   ` Andrea Arcangeli
  1 sibling, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-11-21 10:14 UTC (permalink / raw)
  To: zhanghailiang
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> Hi David,
> 
> When I migrated a VM using postcopy, with the VM configured with the '-realtime mlock=on' option,
> it failed, and the destination reported "postcopy_ram_hosttest: remap_anon_pages not available: File exists".
> 
> Is it a bug in the userfaultfd API?

Thanks.

> cc: Andrea
> 
> reproduce Steps:
> Source:
> qemu-postcopy/qemu # x86_64-softmmu/qemu-system-x86_64 -msg timestamp=on \
> -machine pc-i440fx-2.2,accel=kvm -m 1024 -realtime mlock=on -smp 4 \
> -hda /mnt/sdb/pure_IMG/redhat/redhat-6.4-httpd.img -vnc :11 -monitor stdio
> 
> Destination:
> qemu-postcopy/qemu # x86_64-softmmu/qemu-system-x86_64 -msg timestamp=on \
> -machine pc-i440fx-2.2,accel=kvm -m 1024 -realtime mlock=on -smp 4 \
> -hda /mnt/sdb/pure_IMG/redhat/redhat-6.4-httpd.img -vnc :12 -monitor stdio \
> -incoming unix:/mnt/migrate.sock
> (1) migrate_set_capability x-postcopy-ram on
> (2) migrate -d unix:/mnt/migrate.sock
> 
> On the destination, it fails and reports:
> savevm@2040988668 qemu_loadvm_state_main QEMU_VM_COMMAND ret: 0
> savevm@2040988668 qemu_loadvm_state loop: section_type=6
> savevm@2040988668 loadvm_postcopy_ram_handle_advise
> postcopy_ram_hosttest: remap_anon_pages not available: File exists
> savevm@2040988668 qemu_loadvm_state_main QEMU_VM_COMMAND ret: -1

Yes, I think I need to chat to Andrea about how that's supposed to work with mlock.
I've added it to my list and we'll figure it out; I suspect on the destination
I need to avoid doing the mlockall until after postcopy completes.

> And one more thing I'd like to know ;)
> Why must we start precopy before starting postcopy?
> Can we do postcopy from the beginning of migration?

You can send migrate_start_postcopy immediately after you send the migrate
command, which is very close to no-precopy; the original API had a timeout
and if you set it to 0 then it would do exactly no-precopy, but the current API
was preferred by reviewers, and is simpler.
In testing, the best performance comes from doing one full pass of precopy and
then starting postcopy; that way all of the guest kernel and other static data
has already moved to the destination, and there are far fewer page requests.

Thanks for the report,

Dave

> 
> Thanks,
> zhanghailiang
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v4 45/47] Start up a postcopy/listener thread ready for incoming page data
  2014-11-21  8:36       ` David Gibson
@ 2014-11-21 10:17         ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-11-21 10:17 UTC (permalink / raw)
  To: David Gibson
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Wed, Nov 19, 2014 at 07:40:20PM +0000, Dr. David Alan Gilbert wrote:
> > * David Gibson (david@gibson.dropbear.id.au) wrote:
> > > On Fri, Oct 03, 2014 at 06:47:51PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > 
> > > > The loading of a device state (during postcopy) may access guest
> > > > memory that's still on the source machine and thus might need
> > > > a page fill; split off a separate thread that handles the incoming
> > > > page data so that the original incoming migration code can finish
> > > > off the device data.
> > > > 
> > > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > > ---
> > > >  include/migration/migration.h |  4 +++
> > > >  migration.c                   |  6 +++++
> > > >  savevm.c                      | 62 +++++++++++++++++++++++++++++++++++++++++--
> > > >  3 files changed, 70 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/include/migration/migration.h b/include/migration/migration.h
> > > > index 00255b8..69e776c 100644
> > > > --- a/include/migration/migration.h
> > > > +++ b/include/migration/migration.h
> > > > @@ -92,6 +92,10 @@ struct MigrationIncomingState {
> > > >      QemuThread     fault_thread;
> > > >      QemuSemaphore  fault_thread_sem;
> > > >  
> > > > +    bool           have_listen_thread;
> > > 
> > > AFAICT have_listen_thread is never set to a value other than 'true',
> > > so there doesn't seem much point to it.
> > 
> > It's tested by qemu_loadvm_state to avoid cleaning things up as it exits,
> > since the listen thread is still using it.
> 
> Right, but I couldn't see under what circumstances it would ever be
> false at the test in qemu_loadvm_state().

In a normal pre-copy case where we don't have a listen-thread.
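To make the two paths concrete, a minimal sketch assuming a cut-down incoming-state struct; IncomingSketch, handle_listen_sketch() and loadvm_state_exit_sketch() are hypothetical stand-ins for MigrationIncomingState and the relevant parts of qemu_loadvm_state(), not the patch's code. The flag stays false for plain precopy and is assumed to be set only where the listen thread is created:

```c
#include <stdbool.h>

/* Hypothetical cut-down MigrationIncomingState. */
typedef struct {
    bool have_listen_thread;  /* set only once postcopy starts a listener */
    int cleanups;             /* counts teardown done by the main load path */
} IncomingSketch;

/* Postcopy "listen" handling: the real code would create the thread here. */
static void handle_listen_sketch(IncomingSketch *mis)
{
    mis->have_listen_thread = true;
}

/* Tail of the main load path. */
static void loadvm_state_exit_sketch(IncomingSketch *mis)
{
    if (!mis->have_listen_thread) {
        /* Plain precopy: nothing else uses the state, so tear it down now. */
        mis->cleanups++;
    }
    /* Otherwise the listen thread still needs it and cleans up at the end. */
}
```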

Dave

> 
> -- 
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v4 33/47] Postcopy: Postcopy startup in migration thread
  2014-11-20 11:45     ` Dr. David Alan Gilbert
@ 2014-11-21 12:01       ` Paolo Bonzini
  2014-11-21 12:07         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 204+ messages in thread
From: Paolo Bonzini @ 2014-11-21 12:01 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy



On 20/11/2014 12:45, Dr. David Alan Gilbert wrote:
> > For this case QEMU has atomic_read/atomic_set (corresponding to
> > __ATOMIC_RELAXED in C/C++1x), so you could use those as well.
>
> Ah, so those look like they just volatile cast anyway.

Yeah, but it explicitly shows that the assignment is a) for a
multi-threaded operation b) using relaxed semantics.  It attaches the
information to the use instead of the variable; it just happens that
volatile is the pre-C11 way to express those.
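As an illustration of attaching the semantics to each use rather than to the declaration, a small sketch assuming the GCC/Clang __atomic builtins; the ATOMIC_*_RELAXED macros and quit_requested are hypothetical stand-ins, not QEMU's actual atomic_read/atomic_set definitions:

```c
#include <stdbool.h>

/* Illustrative macros in the spirit of atomic_read()/atomic_set(),
 * built on the __atomic builtins with relaxed ordering. */
#define ATOMIC_READ_RELAXED(ptr)      __atomic_load_n((ptr), __ATOMIC_RELAXED)
#define ATOMIC_SET_RELAXED(ptr, val)  __atomic_store_n((ptr), (val), __ATOMIC_RELAXED)

/* Declared as a plain bool: the "shared between threads, relaxed" intent
 * is written at each access instead of being baked into the type. */
static bool quit_requested;

static void request_quit(void)
{
    ATOMIC_SET_RELAXED(&quit_requested, true);
}

static bool quit_is_requested(void)
{
    return ATOMIC_READ_RELAXED(&quit_requested);
}
```

Relaxed ordering only makes each access atomic; if the flag also publishes other data to the reader, acquire/release ordering would be needed instead.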

Paolo

* Re: [Qemu-devel] [PATCH v4 33/47] Postcopy: Postcopy startup in migration thread
  2014-11-21 12:01       ` Paolo Bonzini
@ 2014-11-21 12:07         ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-11-21 12:07 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> 
> 
> On 20/11/2014 12:45, Dr. David Alan Gilbert wrote:
> > > For this case QEMU has atomic_read/atomic_set (corresponding to
> > > __ATOMIC_RELAXED in C/C++1x), so you could use those as well.
> >
> > Ah, so those look like they just volatile cast anyway.
> 
> Yeah, but it explicitly shows that the assignment is a) for a
> multi-threaded operation b) using relaxed semantics.  It attaches the
> information to the use instead of the variable; it just happens that
> volatile is the pre-C11 way to express those.

OK, I'll use those anyway. Ideally what I'd have is a way to mark
something so that it would fail to compile if I didn't use an atomic_*
accessor on it, because it's the type of thing that I'm bound to forget somewhere.
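One common way to get that compile-time check, sketched here with hypothetical names (RelaxedBool and its accessors are not QEMU code): hide the value inside a single-member struct so that "flag = true" or "if (flag)" no longer compiles and only the accessors work:

```c
#include <stdbool.h>

/* Wrapper type: direct assignment from or testing as a bool is a
 * compile error, so every access has to go through the helpers below. */
typedef struct {
    bool v;
} RelaxedBool;

static inline bool relaxed_bool_read(RelaxedBool *b)
{
    return __atomic_load_n(&b->v, __ATOMIC_RELAXED);
}

static inline void relaxed_bool_set(RelaxedBool *b, bool val)
{
    __atomic_store_n(&b->v, val, __ATOMIC_RELAXED);
}

static RelaxedBool postcopy_running_sketch;
```

It only guards against accidents (code can still reach into .v directly). C11 _Atomic does not give this check: plain reads and writes of an _Atomic object still compile, they just become sequentially consistent.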

Dave

> 
> Paolo
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v4 00/47] Postcopy implementation
  2014-11-21  3:48 ` zhanghailiang
  2014-11-21 10:14   ` Dr. David Alan Gilbert
@ 2014-11-21 18:56   ` Andrea Arcangeli
  2014-11-24  8:25     ` zhanghailiang
  1 sibling, 1 reply; 204+ messages in thread
From: Andrea Arcangeli @ 2014-11-21 18:56 UTC (permalink / raw)
  To: zhanghailiang
  Cc: yamahata, lilei, quintela, cristian.klein,
	Dr. David Alan Gilbert (git),
	qemu-devel, amit.shah, yanghy

On Fri, Nov 21, 2014 at 11:48:03AM +0800, zhanghailiang wrote:
> Hi David,
> 
> When I migrated a VM using postcopy, with the VM configured with the '-realtime mlock=on' option,
> it failed, and the destination reported "postcopy_ram_hosttest: remap_anon_pages not available: File exists".
> 
> Is it a bug in the userfaultfd API?

It's not userfaultfd-related; it's related to remap_anon_pages (in
the future, mcopy_atomic or an equivalent userfaultfd command) and to
MADV_DONTNEED.

If the destination QEMU starts with mlockall(MCL_CURRENT|MCL_FUTURE),
-EEXIST saves the day by noticing that all the not-yet-transferred pages
were already present on the destination (as allocated zero pages). We
can't trigger non-present faults (in userfaultfd) if the destination
starts with mlockall.

Furthermore, if precopy has been run before postcopy (currently that's
always the case, as there's no way to specify the number of precopy
passes to run before starting postcopy, which would in turn allow
specifying zero passes), the bitmap of re-dirtied pages must be
transferred to the destination before postcopy can start, and
MADV_DONTNEED has to be used to zap those re-dirtied pages. But
MADV_DONTNEED will also fail with -EINVAL, well before postcopy starts,
if mlockall is in effect on the destination QEMU.
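The MADV_DONTNEED behaviour can be checked in isolation. This is a minimal, Linux-only sketch (not code from the series; dontneed_on_locked_page() is a hypothetical helper) that locks one anonymous page and then tries to discard it, which Linux refuses with EINVAL:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if MADV_DONTNEED was refused with EINVAL on a locked page,
 * 0 if it succeeded, and -1 if the page could not be set up (for example
 * when RLIMIT_MEMLOCK is too small to mlock a single page). */
static int dontneed_on_locked_page(void)
{
    long psize = sysconf(_SC_PAGESIZE);
    assert(psize > 0);

    char *p = mmap(NULL, (size_t)psize, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return -1;
    }
    p[0] = 1;                                   /* fault the page in */
    if (mlock(p, (size_t)psize) != 0) {
        munmap(p, (size_t)psize);
        return -1;
    }

    int result;
    if (madvise(p, (size_t)psize, MADV_DONTNEED) == 0) {
        result = 0;                             /* page was discarded */
    } else {
        result = (errno == EINVAL) ? 1 : -1;    /* EINVAL: locked range */
    }
    munlock(p, (size_t)psize);
    munmap(p, (size_t)psize);
    return result;
}
```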

If MADV_DONTNEED on the destination didn't fail with -EINVAL, there
probably weren't any re-dirtied pages.

remap_anon_pages is extremely strict (unlike vma-mangling mremap, which
would just silently zap any existing vma in the destination range), so it
cannot overwrite the guest memory and you get EEXIST instead (the
strictness was intentional, to eliminate the risk of memory corruption
if userland hits a bug like this one).

But it should have failed earlier, with MADV_DONTNEED returning -EINVAL,
if there was any re-dirtied page between the last precopy pass and
postcopy (I assume the guest was idle?).

In short, I think that to fix this, QEMU should call mlockall on the
destination only after postcopy is complete. There's no way to lock
the memory on the destination while it still resides on the source,
so some userfaults may have to happen (and if userfaults happen, it
means we're not mlocked yet).
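A sketch of that ordering, with hypothetical names throughout (DestMlockSketch, incoming_end_sketch() and the injectable do_mlockall pointer are not QEMU code): the lock request is recorded at startup but only applied once the incoming migration, including any postcopy phase, has finished:

```c
#include <stdbool.h>
#include <sys/mman.h>

/* Hypothetical destination-side bookkeeping for '-realtime mlock=on'. */
typedef struct {
    bool mlock_requested;           /* -realtime mlock=on was given */
    bool incoming_in_progress;      /* from -incoming until migration ends */
    bool locked;
    int (*do_mlockall)(int flags);  /* mlockall in real use */
} DestMlockSketch;

/* Lock only when requested and no guest pages can still be missing. */
static int apply_mlock_if_allowed(DestMlockSketch *s)
{
    if (!s->mlock_requested || s->locked || s->incoming_in_progress) {
        return 0;
    }
    if (s->do_mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        return -1;
    }
    s->locked = true;
    return 0;
}

/* Called when the incoming migration, including any postcopy phase, ends. */
static int incoming_end_sketch(DestMlockSketch *s)
{
    s->incoming_in_progress = false;
    return apply_mlock_if_allowed(s);
}

/* Dry-run stand-in so the ordering can be tried without locking memory. */
static int dry_run_calls;
static int dry_run_mlockall(int flags)
{
    (void)flags;
    dry_run_calls++;
    return 0;
}
```

Since the destination can't know at startup whether the source will switch to postcopy, the sketch defers the lock for any incoming migration; a VM started without -incoming could still lock immediately.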

* Re: [Qemu-devel] [PATCH v4 00/47] Postcopy implementation
  2014-11-21 10:14   ` Dr. David Alan Gilbert
@ 2014-11-24  8:10     ` zhanghailiang
  0 siblings, 0 replies; 204+ messages in thread
From: zhanghailiang @ 2014-11-24  8:10 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On 2014/11/21 18:14, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> Hi David,
>>
>> When I migrated a VM using postcopy, with the VM configured with the '-realtime mlock=on' option,
>> it failed, and the destination reported "postcopy_ram_hosttest: remap_anon_pages not available: File exists".
>>
>> Is it a bug in the userfaultfd API?
>
> Thanks.
>
>> cc: Andrea
>>
>> Steps to reproduce:
>> Source:
>> qemu-postcopy/qemu # x86_64-softmmu/qemu-system-x86_64 -msg timestamp=on \
>> -machine pc-i440fx-2.2,accel=kvm -m 1024 -realtime mlock=on -smp 4 \
>> -hda /mnt/sdb/pure_IMG/redhat/redhat-6.4-httpd.img -vnc :11 -monitor stdio
>>
>> Destination:
>> qemu-postcopy/qemu # x86_64-softmmu/qemu-system-x86_64 -msg timestamp=on \
>> -machine pc-i440fx-2.2,accel=kvm -m 1024 -realtime mlock=on -smp 4 \
>> -hda /mnt/sdb/pure_IMG/redhat/redhat-6.4-httpd.img -vnc :12 -monitor stdio \
>> -incoming unix:/mnt/migrate.sock
>> (1) migrate_set_capability x-postcopy-ram on
>> (2) migrate -d unix:/mnt/migrate.sock
>>
>> On the destination, it fails and reports:
>> savevm@2040988668 qemu_loadvm_state_main QEMU_VM_COMMAND ret: 0
>> savevm@2040988668 qemu_loadvm_state loop: section_type=6
>> savevm@2040988668 loadvm_postcopy_ram_handle_advise
>> postcopy_ram_hosttest: remap_anon_pages not available: File exists
>> savevm@2040988668 qemu_loadvm_state_main QEMU_VM_COMMAND ret: -1
>
> Yes, I think I need to chat to Andrea about how that's supposed to work with mlock.
> I've added it to my list and we'll figure it out; I suspect on the destination
> I need to avoid doing the mlockall until after postcopy completes.
>
>> And one more thing I'd like to know ;)
>> Why must we start precopy before starting postcopy?
>> Can we do postcopy from the beginning of migration?
>
> You can send migrate_start_postcopy immediately after you send the migrate
> command, which is very close to no-precopy; the original API had a timeout
> and if you set it to 0 then it would do exactly no-precopy, but the current API
> was preferred by reviewers, and is simpler.
> In testing, the best performance comes from doing one full pass of precopy and
> then starting postcopy; that way all of the guest kernel and other static data
> has already moved to the destination, and there are far fewer page requests.
>

Got it, :) Thanks.

> Thanks for the report,
>
> Dave
>
>>
>> Thanks,
>> zhanghailiang
>>
>> On 2014/10/4 1:47, Dr. David Alan Gilbert (git) wrote:
>>> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>>
>>> Hi,
>>>    This is the 4th cut of my version of postcopy; it is designed for use with
>>> the Linux kernel additions just posted by Andrea Arcangeli here:
>>>
>>> http://marc.info/?l=linux-kernel&m=141235633015100&w=2
>>>
>>> (Note: This is a new version compared to my previous postcopy patchset; you'll
>>> need to update the kernel to the new version.)
>>>
>>> Other than the new kernel ABI (which is only a small change to the userspace side);
>>> the major changes are;
>>>
>>>    a) Code for host page size != target page size
>>>    b) Support for migration over fd
>>>       From Cristian Klein; this is for libvirt support which Cristian recently
>>>       posted to the libvirt list.
>>>    c) It's now build bisectable and builds on 32bit
>>>
>>> Testing wise; I've now done many thousand of postcopy migrations without
>>> failure (both of idle and busy guests); so it seems pretty solid.
>>>
>>> Must-TODO's:
>>>    1) A partially repeatable migration_cancel failure
>>>    2) virt_test's migrate.with_reboot test is failing
>>>    3) The ACPI fix in 2.1 that allowed migrating RAMBlocks to be larger than
>>>      the source feels like it needs looking at for postcopy.
>>>    4) Paolo's comments with respect to the wakeup_request/is_running code
>>>       in the migration thread
>>>    5) xbzrle needs disabling once in postcopy
>>>
>>> Later-TODO's:
>>>    1) Control the rate of background page transfers during postcopy to
>>>       reduce their impact on the latency of postcopy requests.
>>>    2) Work with RDMA
>>>    3) Could destination RP be made blocking (as per discussion with Paolo;
>>>       I'm still worried that that changes too many assumptions)
>>>
>>>
>>>
>>> V4:
>>>    Initial support for host page size != target page size
>>>      - tested heavily on hps==tps
>>>      - only partially tested on hps!=tps systems
>>>      - This involved quite a bit of rework around the discard code
>>>    Updated to new kernel userfault ABI
>>>      - It won't work with the previous version
>>>    Fix mis-optimisation of postcopy request for wrong RAMBlock
>>>       request for block A offset n
>>>       un-needed fault for block B/m (already received - no req sent)
>>>       request for block B/l  - wrongly sent as request for A/l
>>>    Fix thinko in discard bitmap processing (missed last word of bitmap)
>>>       Symptom: remap failures near the top of RAM if postcopy started late
>>>    Fix bug that caused kernel page acknowledgments to be misaligned
>>>       May have meant the guest was paused for longer than required
>>>    Fix potential for crashing cleaning up failed RP
>>>    Fixes in docs (from Yang)
>>>    Handle migration by fd as sockets if they are sockets
>>>    Build tested on 32bit
>>>    Fully build bisectable (x86-64)
>>>
>>>
>>> Dave
>>>
>>> Cristian Klein (1):
>>>    Handle bi-directional communication for fd migration
>>>
>>> Dr. David Alan Gilbert (46):
>>>    QEMUSizedBuffer based QEMUFile
>>>    Tests: QEMUSizedBuffer/QEMUBuffer
>>>    Start documenting how postcopy works.
>>>    qemu_ram_foreach_block: pass up error value, and down the ramblock
>>>      name
>>>    improve DPRINTF macros, add to savevm
>>>    Add qemu_get_counted_string to read a string prefixed by a count byte
>>>    Create MigrationIncomingState
>>>    socket shutdown
>>>    Provide runtime Target page information
>>>    Return path: Open a return path on QEMUFile for sockets
>>>    Return path: socket_writev_buffer: Block even on non-blocking fd's
>>>    Migration commands
>>>    Return path: Control commands
>>>    Return path: Send responses from destination to source
>>>    Return path: Source handling of return path
>>>    qemu_loadvm errors and debug
>>>    ram_debug_dump_bitmap: Dump a migration bitmap as text
>>>    Rework loadvm path for subloops
>>>    Add migration-capability boolean for postcopy-ram.
>>>    Add wrappers and handlers for sending/receiving the postcopy-ram
>>>      migration messages.
>>>    QEMU_VM_CMD_PACKAGED: Send a packaged chunk of migration stream
>>>    migrate_init: Call from savevm
>>>    Allow savevm handlers to state whether they could go into postcopy
>>>    postcopy: OS support test
>>>    migrate_start_postcopy: Command to trigger transition to postcopy
>>>    MIG_STATE_POSTCOPY_ACTIVE: Add new migration state
>>>    qemu_savevm_state_complete: Postcopy changes
>>>    Postcopy page-map-incoming (PMI) structure
>>>    Postcopy: Maintain sentmap and calculate discard
>>>    postcopy: Incoming initialisation
>>>    postcopy: ram_enable_notify to switch on userfault
>>>    Postcopy: Postcopy startup in migration thread
>>>    Postcopy: Create a fault handler thread before marking the ram as
>>>      userfault
>>>    Page request:  Add MIG_RPCOMM_REQPAGES reverse command
>>>    Page request: Process incoming page request
>>>    Page request: Consume pages off the post-copy queue
>>>    Add assertion to check migration_dirty_pages
>>>    postcopy_ram.c: place_page and helpers
>>>    Postcopy: Use helpers to map pages during migration
>>>    qemu_ram_block_from_host
>>>    Don't sync dirty bitmaps in postcopy
>>>    Host page!=target page: Cleanup bitmaps
>>>    Postcopy; Handle userfault requests
>>>    Start up a postcopy/listener thread ready for incoming page data
>>>    postcopy: Wire up loadvm_postcopy_ram_handle_{run,end} commands
>>>    End of migration for postcopy
>>>
>>>   Makefile.objs                    |    2 +-
>>>   arch_init.c                      |  739 +++++++++++++++++++++++++--
>>>   docs/migration.txt               |  189 +++++++
>>>   exec.c                           |   76 ++-
>>>   hmp-commands.hx                  |   15 +
>>>   hmp.c                            |    7 +
>>>   hmp.h                            |    1 +
>>>   include/exec/cpu-common.h        |    8 +-
>>>   include/migration/migration.h    |  130 +++++
>>>   include/migration/postcopy-ram.h |  106 ++++
>>>   include/migration/qemu-file.h    |   47 ++
>>>   include/migration/vmstate.h      |    2 +-
>>>   include/qemu/sockets.h           |    1 +
>>>   include/qemu/typedefs.h          |    9 +-
>>>   include/sysemu/sysemu.h          |   43 +-
>>>   migration-fd.c                   |   24 +-
>>>   migration-rdma.c                 |    4 +-
>>>   migration.c                      |  693 +++++++++++++++++++++++++-
>>>   postcopy-ram.c                   | 1016 ++++++++++++++++++++++++++++++++++++++
>>>   qapi-schema.json                 |   14 +-
>>>   qemu-file.c                      |  598 +++++++++++++++++++++-
>>>   qmp-commands.hx                  |   19 +
>>>   savevm.c                         |  881 +++++++++++++++++++++++++++++++--
>>>   tests/Makefile                   |    2 +-
>>>   tests/test-vmstate.c             |   74 +--
>>>   util/qemu-sockets.c              |   28 ++
>>>   26 files changed, 4550 insertions(+), 178 deletions(-)
>>>   create mode 100644 include/migration/postcopy-ram.h
>>>   create mode 100644 postcopy-ram.c
>>>
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

* Re: [Qemu-devel] [PATCH v4 00/47] Postcopy implementation
  2014-11-21 18:56   ` Andrea Arcangeli
@ 2014-11-24  8:25     ` zhanghailiang
  0 siblings, 0 replies; 204+ messages in thread
From: zhanghailiang @ 2014-11-24  8:25 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: yamahata, lilei, quintela, cristian.klein,
	Dr. David Alan Gilbert (git),
	qemu-devel, amit.shah, yanghy

On 2014/11/22 2:56, Andrea Arcangeli wrote:
> On Fri, Nov 21, 2014 at 11:48:03AM +0800, zhanghailiang wrote:
>> Hi David,
>>
>> When I migrated a VM using postcopy with the '-realtime mlock=on' option,
>> it failed, and the destination reported "postcopy_ram_hosttest: remap_anon_pages not available: File exists".
>>
>> Is it a bug in the userfaultfd API?
>
> It's not userfaultfd related, but it's remap_anon_pages related (in
> the future mcopy_atomic or equivalent userfaultfd cmd) and
> MADV_DONTNEED related.
>
> If the destination qemu starts with mlockall(current|future), -EEXIST
> saves the day by noticing all not yet transferred pages were already
> present in the destination (as allocated zero pages). We can't trigger
> non-present faults (in userfaultfd) if the dst starts with mlockall.
>
> Furthermore if precopy has been run before postcopy (currently it's
> always the case as there's no way to specify the number of precopy
> passes to run before starting postcopy... in turn allowing to specify
> zero passes) the bitmap with the re-dirtied pages must be transferred
> to the destination before postcopy can start, and MADV_DONTNEED has to
> be used to zap those re-dirtied pages. But MADV_DONTNEED will fail
> with -EINVAL too well before postcopy starts if mlockall is set on the
> destination qemu.
>
> If MADV_DONTNEED didn't fail with -EINVAL on the destination, there
> probably weren't any re-dirtied pages.
>
> remap_anon_pages is extremely strict (unlike vma-mangling mremap that
> would just zap the dst range vma silently if it existed) so it cannot
> overwrite the guest memory and you get EEXIST (the strictness was
> intentional to eliminate the risk of any memory corruption if userland
> hits a bug like in this case).
>
> But it should have failed before with MADV_DONTNEED returning -EINVAL
> if there was any re-dirtied page between the last precopy pass and
> postcopy (I assume the guest was idle?).
>

You are right ;)

> In short I think to fix this qemu should call mlockall in the
> destination only after postcopy is complete. There's no way to lock
> the memory in the destination if the memory still resides in the
> source so some userfault may have to happen (and if userfaults happen,
> it means we're not mlocked yet).
>

Got it, so this problem should be fixed in qemu. Thanks for your explanation.
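Andrea's suggested fix, deferring mlockall on the destination until postcopy has finished, could be structured roughly as in this sketch. The phase enum and both functions are hypothetical names, not QEMU code; only `mlockall(MCL_CURRENT | MCL_FUTURE)` is a real call. Because the destination only learns from the incoming stream whether postcopy will be used, the sketch conservatively treats any incoming migration as a reason to wait.

```c
#define _DEFAULT_SOURCE
#include <stdbool.h>
#include <stdio.h>
#include <sys/mman.h>

/* Hypothetical destination-side view of an incoming migration */
enum dest_mig_phase {
    DEST_MIG_NONE,      /* not started with -incoming */
    DEST_MIG_PRECOPY,   /* receiving; source may still switch to postcopy */
    DEST_MIG_POSTCOPY,  /* guest running, some pages still on the source */
    DEST_MIG_DONE,      /* all guest RAM has arrived */
};

/*
 * mlockall(MCL_CURRENT | MCL_FUTURE) faults every page in, so pages not
 * yet transferred would already be present as zero pages: placing them
 * later fails with EEXIST and MADV_DONTNEED fails with EINVAL. Only lock
 * when no page can still be missing.
 */
static bool dest_can_mlock_now(bool realtime_mlock, enum dest_mig_phase ph)
{
    return realtime_mlock && (ph == DEST_MIG_NONE || ph == DEST_MIG_DONE);
}

/* Call at startup and again when the incoming migration finishes. */
static int dest_maybe_mlock(bool realtime_mlock, enum dest_mig_phase ph,
                            bool *locked)
{
    if (*locked || !dest_can_mlock_now(realtime_mlock, ph)) {
        return 0;
    }
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        perror("mlockall");
        return -1;
    }
    *locked = true;
    return 0;
}
```

The decision is kept in a pure function so it can be checked without actually locking memory, which typically needs a sufficient `RLIMIT_MEMLOCK` or `CAP_IPC_LOCK`.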

* Re: [Qemu-devel] [PATCH v4 33/47] Postcopy: Postcopy startup in migration thread
  2014-10-04 16:27   ` Paolo Bonzini
  2014-11-20 11:45     ` Dr. David Alan Gilbert
  2014-11-20 17:12     ` Dr. David Alan Gilbert
@ 2014-11-24 18:26     ` Dr. David Alan Gilbert
  2 siblings, 0 replies; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-11-24 18:26 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> Il 03/10/2014 19:47, Dr. David Alan Gilbert (git) ha scritto:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Rework the migration thread to setup and start postcopy.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  include/migration/migration.h |   3 +
> >  migration.c                   | 201 ++++++++++++++++++++++++++++++++++++++----
> >  2 files changed, 185 insertions(+), 19 deletions(-)
> > 

> > @@ -915,16 +1007,36 @@ static void await_outgoing_return_path_close(MigrationState *ms)
> >  static void *migration_thread(void *opaque)
> >  {
> >      MigrationState *s = opaque;
> > +    /* Used by the bandwidth calcs, updated later */
> >      int64_t initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> >      int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> >      int64_t initial_bytes = 0;
> >      int64_t max_size = 0;
> >      int64_t start_time = initial_time;
> > +
> >      bool old_vm_running = false;
> >  
> > +    /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */
> > +    enum MigrationPhase current_active_type = MIG_STATE_ACTIVE;
> > +
> >      qemu_savevm_state_begin(s->file, &s->params);
> >  
> > +    if (migrate_postcopy_ram()) {
> > +        /* Now tell the dest that it should open its end so it can reply */
> > +        qemu_savevm_send_openrp(s->file);
> > +
> > +        /* And ask it to send an ack that will make stuff easier to debug */
> > +        qemu_savevm_send_reqack(s->file, 1);
> > +
> > +        /* Tell the destination that we *might* want to do postcopy later;
> > +         * if the other end can't do postcopy it should fail now, nice and
> > +         * early.
> > +         */
> > +        qemu_savevm_send_postcopy_ram_advise(s->file);
> > +    }
> 
> Should this be done here or in the save_state_begin function for RAM?
> In general, I'm curious if there are parts of postcopy_start that
> could/should be changed into new save state functions (with
> postcopy_start just iterating on all devices).

The contents of this 'if' are generic to whatever is being postcopied,
(and as per one of your other comments the _ram_ has been removed from
the send_postcopy_ram_advise); so I think this is the right place for it.

> >      s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
> > +    current_active_type = MIG_STATE_ACTIVE;
> >      migrate_set_state(s, MIG_STATE_SETUP, MIG_STATE_ACTIVE);
> >  
> >      DPRINTF("setup complete\n");
> > @@ -945,37 +1057,74 @@ static void *migration_thread(void *opaque)
> >                      " nonpost=%" PRIu64 ")\n",
> >                      pending_size, max_size, pend_post, pend_nonpost);
> >              if (pending_size && pending_size >= max_size) {
> > +                /* Still a significant amount to transfer */
> > +
> > +                current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > +                if (migrate_postcopy_ram() &&
> > +                    s->state != MIG_STATE_POSTCOPY_ACTIVE &&
> > +                    pend_nonpost == 0 && s->start_postcopy) {
> > +
> > +                    if (!postcopy_start(s)) {
> > +                        current_active_type = MIG_STATE_POSTCOPY_ACTIVE;
> > +                    }
> > +
> > +                    continue;
> > +                }
> > +                /* Just another iteration step */
> >                  qemu_savevm_state_iterate(s->file);
> >              } else {
> >                  int ret;
> >  
> > -                DPRINTF("done iterating\n");
> > -                qemu_mutex_lock_iothread();
> > -                start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > -                qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
> > -                old_vm_running = runstate_is_running();
> > +                DPRINTF("done iterating pending size %" PRIu64 "\n",
> > +                        pending_size);
> > +
> > +                if (s->state == MIG_STATE_ACTIVE) {
> > +                    qemu_mutex_lock_iothread();
> > +                    start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > +                    qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
> > +                    old_vm_running = runstate_is_running();
> > +
> > +                    ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> > +                    if (ret >= 0) {
> > +                        qemu_file_set_rate_limit(s->file, INT64_MAX);
> > +                        qemu_savevm_state_complete(s->file);
> > +                    }
> > +                    qemu_mutex_unlock_iothread();
> > +
> > +                    if (ret < 0) {
> > +                        migrate_set_state(s, current_active_type,
> > +                                          MIG_STATE_ERROR);
> > +                        break;
> > +                    }
> > +                } else if (s->state == MIG_STATE_POSTCOPY_ACTIVE) {
> > +                    DPRINTF("postcopy end\n");
> > +
> > +                    qemu_savevm_state_postcopy_complete(s->file);
> > +                    DPRINTF("postcopy end after complete\n");
> >  
> > -                ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> > -                if (ret >= 0) {
> > -                    qemu_file_set_rate_limit(s->file, INT64_MAX);
> > -                    qemu_savevm_state_complete(s->file);
> >                  }
> > -                qemu_mutex_unlock_iothread();
> >  
> > -                if (ret < 0) {
> > -                    migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_ERROR);
> > -                    break;
> > +                /*
> > +                 * If rp was opened we must clean up the thread before
> > +                 * cleaning everything else up.
> > +                 * Postcopy opens rp if enabled (even if it's not activated)
> > +                 */
> > +                if (migrate_postcopy_ram()) {
> > +                    DPRINTF("before rp close");
> > +                    await_outgoing_return_path_close(s);
> 
> Should this be done even if there is an error?  Perhaps move it
> altogether out of the big migration thread while() loop?

Yes, I've made a note of that; I need to look at more error cases
to see where it makes sense (e.g. the one above). However, in the
non-error case I do want it to wait here for the 'SHUT' from the
destination, which tells us whether the destination believes the
migration completed correctly; that wait has to happen before the
state is set to COMPLETED.

> > +                    DPRINTF("after rp close");
> >                  }
> > -
> >                  if (!qemu_file_get_error(s->file)) {
> > -                    migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_COMPLETED);
> > +                    migrate_set_state(s, current_active_type,
> > +                                      MIG_STATE_COMPLETED);
> >                      break;
> >                  }
> 
> This "else" is huge, can you extract it into its own function?

Done, and all the changes inside this "else" I've moved into a separate
commit that deals with the end rather than the start of postcopy.
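The control flow under discussion (iterate in ACTIVE; switch to POSTCOPY_ACTIVE once postcopy has been requested and only postcopiable data remains; report completion or errors relative to whichever active state is current) can be summarised in a small sketch. The enum values and both functions are simplified, hypothetical stand-ins, not the functions in the reworked patch; the return-path wait is only marked by a comment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the MIG_STATE_* values used by the patch */
enum mig_state { ST_SETUP, ST_ACTIVE, ST_POSTCOPY_ACTIVE, ST_COMPLETED, ST_ERROR };

enum iter_action { DO_ITERATE, DO_START_POSTCOPY, DO_COMPLETE };

/*
 * One decision per loop iteration: while enough data is pending, either
 * iterate or (if requested and nothing non-postcopiable is left) switch
 * to postcopy; otherwise take the completion path.
 */
static enum iter_action next_action(enum mig_state st, bool postcopy_cap,
                                    bool start_requested, uint64_t pending,
                                    uint64_t pend_nonpost, uint64_t max_size)
{
    if (pending && pending >= max_size) {
        if (postcopy_cap && st != ST_POSTCOPY_ACTIVE &&
            pend_nonpost == 0 && start_requested) {
            return DO_START_POSTCOPY;
        }
        return DO_ITERATE;
    }
    return DO_COMPLETE;
}

/*
 * Extracted completion step. 'active' is the state we expect to be in
 * (ACTIVE or POSTCOPY_ACTIVE); outcomes are reported relative to it.
 */
static enum mig_state complete_migration(enum mig_state active, bool stop_ok,
                                         bool file_error)
{
    if (active == ST_ACTIVE && !stop_ok) {
        return ST_ERROR;    /* stopping the VM for the final pass failed */
    }
    /* The real code would wait for the return path's SHUT here. */
    return file_error ? ST_ERROR : ST_COMPLETED;
}
```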

Dave

> 
> >              }
> >          }
> >  
> >          if (qemu_file_get_error(s->file)) {
> > -            migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_ERROR);
> > +            migrate_set_state(s, current_active_type, MIG_STATE_ERROR);
> > +            DPRINTF("migration_thread: file is in error state\n");
> >              break;
> >          }
> >          current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > @@ -1006,6 +1155,7 @@ static void *migration_thread(void *opaque)
> >          }
> >      }
> >  
> > +    DPRINTF("migration_thread: After loop");
> >      qemu_mutex_lock_iothread();
> >      if (s->state == MIG_STATE_COMPLETED) {
> >          int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > @@ -1043,6 +1193,19 @@ void migrate_fd_connect(MigrationState *s)
> >      /* Notify before starting migration thread */
> >      notifier_list_notify(&migration_state_notifiers, s);
> >  
> > +    /* Open the return path; currently for postcopy but other things might
> > +     * also want it.
> > +     */
> > +    if (migrate_postcopy_ram()) {
> > +        if (open_outgoing_return_path(s)) {
> > +            error_report("Unable to open return-path for postcopy");
> > +            migrate_set_state(s, MIG_STATE_SETUP, MIG_STATE_ERROR);
> > +            migrate_fd_cleanup(s);
> > +            return;
> > +        }
> > +    }
> > +
> >      qemu_thread_create(&s->thread, "migration", migration_thread, s,
> >                         QEMU_THREAD_JOINABLE);
> > +    s->started_migration_thread = true;
> >  }
> > 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v4 16/47] Return path: Source handling of return path
  2014-11-03  3:47     ` David Gibson
@ 2014-11-25 15:44       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-11-25 15:44 UTC (permalink / raw)
  To: David Gibson
  Cc: aarcange, yamahata, lilei, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy, zhanghailiang

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Thu, Oct 16, 2014 at 04:26:55PM +0800, zhanghailiang wrote:
> > On 2014/10/4 1:47, Dr. David Alan Gilbert (git) wrote:
> > >From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> [snip]
> 
> > >+        case MIG_RPCOMM_ACK:
> > >+            tmp32 = be32_to_cpup((uint32_t *)buf);
> > >+            DPRINTF("RP: Received ACK 0x%x", tmp32);
> > >+            atomic_xchg(&ms->rp_state.latest_ack, tmp32);
> > 
> > I didn't see *ms->rp_state.latest_ack* being used elsewhere; what's it used for? ;)
> 
> Also, you don't appear to use tmp32 after that point, so what's the
> reason for the exchange, rather than just an assignment?

I've killed the 'latest_ack' off; I've kept the DPRINTF (and might turn it into
a trace).
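With `latest_ack` gone, the received value only needs a local variable, so neither `atomic_xchg()` nor a shared field is required. A minimal sketch of such a handler, with hypothetical names, decodes the 4-byte big-endian payload byte by byte, which also avoids relying on `buf` being suitably aligned for a `uint32_t *` access:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Decode a big-endian 32-bit value without assuming 'buf' is aligned */
static uint32_t rp_decode_be32(const uint8_t *buf)
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
}

/* Hypothetical ACK handler: validate the length, then just trace the value */
static int rp_handle_ack(const uint8_t *buf, size_t len, uint32_t *out)
{
    if (len != 4) {
        fprintf(stderr, "RP: bad ACK length %zu\n", len);
        return -1;
    }
    *out = rp_decode_be32(buf);
    fprintf(stderr, "RP: Received ACK 0x%x\n", (unsigned)*out);
    return 0;
}
```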

Dave

> 
> -- 
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v4 06/47] Add qemu_get_counted_string to read a string prefixed by a count byte
  2014-11-03  2:39   ` David Gibson
@ 2014-11-25 16:13     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-11-25 16:13 UTC (permalink / raw)
  To: David Gibson
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Fri, Oct 03, 2014 at 06:47:12PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > and use it in loadvm_state.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  include/migration/qemu-file.h |  2 ++
> >  qemu-file.c                   | 15 +++++++++++++++
> >  savevm.c                      | 18 ++++++++++--------
> >  3 files changed, 27 insertions(+), 8 deletions(-)
> > 
> > diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
> > index 6ef8ebc..a8cac7a 100644
> > --- a/include/migration/qemu-file.h
> > +++ b/include/migration/qemu-file.h
> > @@ -300,4 +300,6 @@ static inline void qemu_get_sbe64s(QEMUFile *f, int64_t *pv)
> >  {
> >      qemu_get_be64s(f, (uint64_t *)pv);
> >  }
> > +
> > +int qemu_get_counted_string(QEMUFile *f, uint8_t *buf);
> 
> I'd suggest writing the prototype as
> 
> int qemu_get_counted_string(QEMUFile *f, uint8_t buf[256]);
> 
> The compiled code will be identical, of course, but it helps to
> document what the function expects.

Good idea; done.

> >  #endif
> > diff --git a/qemu-file.c b/qemu-file.c
> > index ccc516c..a057b3e 100644
> > --- a/qemu-file.c
> > +++ b/qemu-file.c
> > @@ -879,6 +879,21 @@ uint64_t qemu_get_be64(QEMUFile *f)
> >      return v;
> >  }
> >  
> > +/*
> > + * Get a string whose length is determined by a single preceding byte
> > + * A preallocated 256 byte buffer must be passed in.
> > + * Returns: 0 on success and a 0 terminated string in the buffer
> > + */
> > +int qemu_get_counted_string(QEMUFile *f, uint8_t *buf)
> > +{
> > +    unsigned int len = qemu_get_byte(f);
> > +    int res = qemu_get_buffer(f, buf, len);
> > +
> > +    buf[len] = 0;
> > +
> > +    return res != len;
> > +}
> > +
> >  #define QSB_CHUNK_SIZE      (1 << 10)
> >  #define QSB_MAX_CHUNK_SIZE  (16 * QSB_CHUNK_SIZE)
> >  
> > diff --git a/savevm.c b/savevm.c
> > index c3a1f68..cb6f0de 100644
> > --- a/savevm.c
> > +++ b/savevm.c
> > @@ -908,7 +908,7 @@ int qemu_loadvm_state(QEMUFile *f)
> >  
> >      v = qemu_get_be32(f);
> >      if (v == QEMU_VM_FILE_VERSION_COMPAT) {
> > -        fprintf(stderr, "SaveVM v2 format is obsolete and don't work anymore\n");
> > +        error_report("SaveVM v2 format is obsolete and don't work anymore");
> 
> These changes of fprintf() to error_report() look like an unrelated
> cleanup.

Not quite; qemu_get_counted_string can return an error, so I check it and
report failures with error_report; while I was there I converted a couple
of the surrounding fprintf's to error_report as well.
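The length-prefixed format is simple enough to show in isolation. This sketch parses it from an in-memory byte array rather than a `QEMUFile`; `get_counted_string_mem()` and its bounds-checking parameters are hypothetical. Since the count is a single byte, a string holds at most 255 characters plus the terminating NUL, which is why the buffer is declared `uint8_t buf[256]`; the explicit error return is what a caller would then report with `error_report`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Read a string whose length is given by a single preceding byte from
 * src[*pos .. srclen). On success, copy it into buf, NUL-terminate it,
 * advance *pos and return 0; return -1 if the input is truncated.
 * buf must hold 256 bytes: up to 255 characters plus the terminator.
 */
static int get_counted_string_mem(const uint8_t *src, size_t srclen,
                                  size_t *pos, uint8_t buf[256])
{
    size_t len;

    if (*pos >= srclen) {
        return -1;                  /* no count byte */
    }
    len = src[*pos];
    if (srclen - *pos - 1 < len) {
        return -1;                  /* fewer bytes than the count says */
    }
    memcpy(buf, src + *pos + 1, len);
    buf[len] = 0;
    *pos += 1 + len;
    return 0;
}
```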

Dave

> 
> -- 
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v4 42/47] Don't sync dirty bitmaps in postcopy
  2014-11-13  3:01   ` David Gibson
@ 2014-11-25 16:25     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-11-25 16:25 UTC (permalink / raw)
  To: David Gibson
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Fri, Oct 03, 2014 at 06:47:48PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Once we're in postcopy the source processors are stopped and memory
> > shouldn't change any more, so there's no need to look at the dirty
> > map.
> > 
> > There are two notes to this:
> >   1) If we do resync and a page had changed then the page would get
> >      sent again, which the destination wouldn't allow (since it might
> >      have also modified the page)
> >   2) Before disabling this I'd seen very rare cases where a page had been
> >      marked dirty although the memory contents are apparently identical
> 
> It would be nice to understand how that happened.

Yes, I'd come to the conclusion it was a device that was prodding about in user
memory space even though it should have stopped, although I hadn't gone and
traced them down.
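The rule in this patch, that the dirty log is no longer merged once postcopy is running, might be sketched as follows. The struct and function are hypothetical stand-ins (one byte per page instead of QEMU's bitmaps); the point is only that in postcopy a page that was already sent must never be marked for sending again.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-migration view of guest pages (one byte per page) */
struct page_maps {
    bool postcopy_active;
    size_t npages;
    unsigned char *dirty;   /* pages still to be sent */
};

/*
 * Merge a freshly collected dirty log into the dirty map and return how
 * many pages were newly marked. Skipped in postcopy: the source CPUs are
 * stopped, and re-sending a page the destination may already have
 * modified would be refused by the destination.
 */
static size_t sync_dirty_pages(struct page_maps *m, const unsigned char *log)
{
    size_t i, added = 0;

    if (m->postcopy_active) {
        return 0;
    }
    for (i = 0; i < m->npages; i++) {
        if (log[i] && !m->dirty[i]) {
            m->dirty[i] = 1;
            added++;
        }
    }
    return added;
}
```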

Dave

> 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> Otherwise,
> 
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> 
> -- 
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v4 25/47] postcopy: OS support test
  2014-11-04  1:40   ` David Gibson
@ 2014-11-25 17:34     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-11-25 17:34 UTC (permalink / raw)
  To: David Gibson
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Fri, Oct 03, 2014 at 06:47:31PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Provide a check to see if the OS we're running on has all the bits
> > needed for postcopy.
> > 
> > Creates postcopy-ram.c which will get most of the other helpers we need.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  Makefile.objs                    |   2 +-
> >  include/migration/postcopy-ram.h |  19 +++++
> >  postcopy-ram.c                   | 160 +++++++++++++++++++++++++++++++++++++++
> >  savevm.c                         |   6 ++
> >  4 files changed, 186 insertions(+), 1 deletion(-)
> >  create mode 100644 include/migration/postcopy-ram.h
> >  create mode 100644 postcopy-ram.c
> > 
> > diff --git a/Makefile.objs b/Makefile.objs
> > index 97db978..fa0a3a0 100644
> > --- a/Makefile.objs
> > +++ b/Makefile.objs
> > @@ -54,7 +54,7 @@ common-obj-y += qemu-file.o
> >  common-obj-$(CONFIG_RDMA) += migration-rdma.o
> >  common-obj-y += qemu-char.o #aio.o
> >  common-obj-y += block-migration.o
> > -common-obj-y += page_cache.o xbzrle.o
> > +common-obj-y += page_cache.o xbzrle.o postcopy-ram.o
> >  
> >  common-obj-$(CONFIG_POSIX) += migration-exec.o migration-unix.o migration-fd.o
> >  
> > diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
> > new file mode 100644
> > index 0000000..dcd1afa
> > --- /dev/null
> > +++ b/include/migration/postcopy-ram.h
> > @@ -0,0 +1,19 @@
> > +/*
> > + * Postcopy migration for RAM
> > + *
> > + * Copyright 2013 Red Hat, Inc. and/or its affiliates
> > + *
> > + * Authors:
> > + *  Dave Gilbert  <dgilbert@redhat.com>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + *
> > + */
> > +#ifndef QEMU_POSTCOPY_RAM_H
> > +#define QEMU_POSTCOPY_RAM_H
> > +
> > +/* Return 0 if the host supports everything we need to do postcopy-ram */
> > +int postcopy_ram_hosttest(void);
> 
> Maybe postcopy_supported_by_host() would be a bit clearer?

I went with postcopy_ram_supported_by_host
(and flipped the sense so it returns true if it's supported)
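With the sense flipped, a `bool`-returning host check might look like the sketch below. It only probes things that can be tested portably on Linux: the host page size, and that `MADV_DONTNEED` works on a fresh anonymous mapping (the discard step depends on it). The kernel page-placement check is passed in as a function pointer because that ABI was still changing; `kernel_userfault_probe_stub()` is a placeholder that conservatively reports no support, not a real API.

```c
#define _DEFAULT_SOURCE
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Placeholder: a real version would probe the kernel's userfault ABI */
static bool kernel_userfault_probe_stub(void)
{
    return false;   /* conservative: report "not supported" */
}

/*
 * Return true if the host looks able to do postcopy-ram. 'kernel_probe'
 * checks the kernel interface; NULL means "unknown" and counts as
 * unsupported.
 */
static bool postcopy_ram_supported_by_host(bool (*kernel_probe)(void))
{
    long psize = sysconf(_SC_PAGESIZE);
    void *test;
    bool ok = true;

    if (psize <= 0) {
        fprintf(stderr, "postcopy: cannot determine host page size\n");
        return false;
    }

    /* Discarding re-dirtied pages relies on MADV_DONTNEED working */
    test = mmap(NULL, (size_t)psize, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (test == MAP_FAILED) {
        perror("postcopy: mmap");
        return false;
    }
    if (madvise(test, (size_t)psize, MADV_DONTNEED) != 0) {
        perror("postcopy: madvise(MADV_DONTNEED)");
        ok = false;
    }
    munmap(test, (size_t)psize);

    if (ok && (!kernel_probe || !kernel_probe())) {
        fprintf(stderr, "postcopy: kernel userfault support not detected\n");
        ok = false;
    }
    return ok;
}
```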

> [snip]
> > +#ifdef HOST_X86_64
> > + /* NOTE: These are Andrea's 3.15.0 world */
> 
> I thought the usual approach in qemu was to import the updated headers
> first in a separate patch, rather than embedding new defines.

Yes, those will sort themselves out when the syscalls land in the kernel and
then we can import the headers.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v4 40/47] Postcopy: Use helpers to map pages during migration
  2014-11-13  2:53   ` David Gibson
@ 2014-11-25 18:14     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-11-25 18:14 UTC (permalink / raw)
  To: David Gibson
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Fri, Oct 03, 2014 at 06:47:46PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > In postcopy, the destination guest is running at the same time
> > as it's receiving pages; as we receive new pages we must put
> > them into the guest's address space atomically to avoid a running
> > CPU accessing a partially written page.
> > 
> > Use the helpers in postcopy-ram.c to map these pages.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  arch_init.c | 96 +++++++++++++++++++++++++++++++++++++++++++++++++++++++------
> >  1 file changed, 87 insertions(+), 9 deletions(-)
> > 
> > diff --git a/arch_init.c b/arch_init.c
> > index 2f4345a..0ba627b 100644
> > --- a/arch_init.c
> > +++ b/arch_init.c
> > @@ -1458,9 +1458,20 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
> >      return 0;
> >  }
> >  
> > +/*
> > + * Read a RAMBlock ID from the stream f, find the host address of the
> > + * start of that block and add on 'offset'
> > + *
> > + * f: Stream to read from
> > + * mis: MigrationIncomingState
> > + * offset: Offset within the block
> > + * flags: Page flags (mostly to see if it's a continuation of previous block)
> > + * rb: Pointer to RAMBlock* that gets filled in with the RB we find
> > + */
> >  static inline void *host_from_stream_offset(QEMUFile *f,
> > +                                            MigrationIncomingState *mis,
> >                                              ram_addr_t offset,
> > -                                            int flags)
> > +                                            int flags, RAMBlock **rb)
> >  {
> >      static RAMBlock *block = NULL;
> >      char id[256];
> > @@ -1471,8 +1482,11 @@ static inline void *host_from_stream_offset(QEMUFile *f,
> >              error_report("Ack, bad migration stream!");
> >              return NULL;
> >          }
> > +        if (rb) {
> > +            *rb = block;
> > +        }
> >  
> > -        return memory_region_get_ram_ptr(block->mr) + offset;
> > +        goto gotit;
> 
> This is an ugly use of goto - it looks kind of like the exception
> handling goto idiom, but it's not.  I think it would be nicer to make
> the code fragment after gotit into a helper function.

Indeed; I've added the helper.
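A minimal sketch of the goto-free shape, using a simplified, hypothetical `struct ram_block_lite` rather than QEMU's `RAMBlock`, and passing the cached block explicitly instead of using a function-level `static`. Because the helper takes the block itself, it never touches the optional `rb` out-parameter, whereas the `gotit:` tail above dereferences `*rb` even though the earlier code allows `rb` to be NULL.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for QEMU's RAMBlock (hypothetical) */
struct ram_block_lite {
    const char *idstr;
    uint64_t offset;    /* start of the block in ram_addr space */
    uint8_t *host;      /* host mapping of the block */
};

/*
 * Helper replacing the 'gotit:' tail. It works from the block itself, so
 * it never needs the optional 'rb' out-parameter; a postcopy hook could
 * use b->offset + off here.
 */
static void *block_host_addr(const struct ram_block_lite *b, uint64_t off)
{
    return b->host + off;
}

/*
 * Look up the block named 'id' (or reuse *last for a continuation page)
 * and return the host address of 'off' within it, or NULL on a bad stream.
 */
static void *host_from_block_id(struct ram_block_lite *blocks, size_t n,
                                const char *id, int continuation,
                                struct ram_block_lite **last, uint64_t off,
                                struct ram_block_lite **rb)
{
    if (!continuation) {
        size_t i;

        *last = NULL;
        for (i = 0; i < n; i++) {
            if (strcmp(id, blocks[i].idstr) == 0) {
                *last = &blocks[i];
                break;
            }
        }
    }
    if (!*last) {
        return NULL;    /* unknown id, or continuation with no prior block */
    }
    if (rb) {
        *rb = *last;
    }
    return block_host_addr(*last, off);
}
```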

> >      }
> >  
> >      len = qemu_get_byte(f);
> > @@ -1480,12 +1494,22 @@ static inline void *host_from_stream_offset(QEMUFile *f,
> >      id[len] = 0;
> >  
> >      QTAILQ_FOREACH(block, &ram_list.blocks, next) {
> > -        if (!strncmp(id, block->idstr, sizeof(id)))
> > -            return memory_region_get_ram_ptr(block->mr) + offset;
> > +        if (!strncmp(id, block->idstr, sizeof(id))) {
> > +            if (rb) {
> > +                *rb = block;
> > +            }
> > +            goto gotit;
> > +        }
> >      }
> >  
> >      error_report("Can't find block %s!", id);
> >      return NULL;
> > +
> > +gotit:
> > +    postcopy_hook_early_receive(mis,
> > +        (offset + (*rb)->offset) >> TARGET_PAGE_BITS);
> > +    return memory_region_get_ram_ptr(block->mr) + offset;
> > +
> >  }
> >  
> >  /*
> > @@ -1515,6 +1539,13 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> >      ram_addr_t addr;
> >      int flags, ret = 0;
> >      static uint64_t seq_iter;
> > +    /*
> > +     * System is running in postcopy mode, page inserts to host memory must be
> > +     * atomic
> > +     */
> > +    MigrationIncomingState *mis = migration_incoming_get_current();
> > +    bool postcopy_running = mis->postcopy_ram_state >=
> > +                            POSTCOPY_RAM_INCOMING_LISTENING;
> >  
> >      seq_iter++;
> >  
> > @@ -1523,6 +1554,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> >      }
> >  
> >      while (!ret) {
> > +        RAMBlock *rb = 0; /* =0 needed to silence compiler */
> >          addr = qemu_get_be64(f);
> >  
> >          flags = addr & ~TARGET_PAGE_MASK;
> > @@ -1570,7 +1602,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> >              void *host;
> >              uint8_t ch;
> >  
> > -            host = host_from_stream_offset(f, addr, flags);
> > +            host = host_from_stream_offset(f, mis, addr, flags, &rb);
> >              if (!host) {
> >                  error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
> >                  ret = -EINVAL;
> > @@ -1578,20 +1610,66 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> >              }
> >  
> >              ch = qemu_get_byte(f);
> > -            ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
> > +            if (!postcopy_running) {
> > +                ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
> > +            } else {
> > +                if (!ch) {
> > +                    ret = postcopy_place_zero_page(mis, host,
> > +                              (addr + rb->offset) >> TARGET_PAGE_BITS);
> > +                } else {
> > +                    void *tmp;
> > +                    tmp = postcopy_get_tmp_page(mis, (addr + rb->offset) >>
> > +                                                      TARGET_PAGE_BITS);
> > +
> > +                    if (!tmp) {
> > +                        return -ENOMEM;
> > +                    }
> > +                    memset(tmp, ch, TARGET_PAGE_SIZE);
> > +                    ret = postcopy_place_page(mis, host, tmp,
> > +                              (addr + rb->offset) >> TARGET_PAGE_BITS);
> > +                }
> > +                if (ret) {
> > +                    error_report("ram_load: Failure in postcopy compress @"
> > +                                 "%zx/%p;%s+%zx",
> > +                                 addr, host, rb->idstr, rb->offset);
> > +                    return ret;
> > +                }
> > +            }
> 
> Might be nicer to fold this logic into ram_handle_compressed(), since
> there's no obvious reason it should not be used for the postcopy path.

Hmm, that would be true, except ram_handle_compressed is also called from
the RDMA code (which postcopy doesn't yet support), and when it does, its
data path might be a bit different as well.
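The branching under discussion reduces to roughly the following shape, shown here as a sketch only. `handle_compressed()`, `place_page()` and `PAGE_SZ` are hypothetical. `place_page()` is a plain `memcpy` so the example runs anywhere, whereas the real postcopy path installs the page atomically. The zero-page special case (`postcopy_place_zero_page`) is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SZ 4096 /* assumption: target page size */

/* Hypothetical stand-in for postcopy_place_page(): QEMU would map the
 * finished page into guest memory in one atomic step; here it is a copy. */
static int place_page(uint8_t *host, const uint8_t *from)
{
    memcpy(host, from, PAGE_SZ);
    return 0;
}

/* Fill a page whose contents are a single repeated byte 'ch'.
 * Precopy: the guest is stopped, so write straight into guest memory.
 * Postcopy: build the page in a private buffer first, then place it whole,
 * so a running vCPU never sees a partially written page. */
static int handle_compressed(uint8_t *host, uint8_t ch, bool postcopy_running,
                             uint8_t *tmp)
{
    if (!postcopy_running) {
        memset(host, ch, PAGE_SZ);
        return 0;
    }
    memset(tmp, ch, PAGE_SZ);
    return place_page(host, tmp);
}
```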

> >          } else if (flags & RAM_SAVE_FLAG_PAGE) {
> >              void *host;
> >  
> > -            host = host_from_stream_offset(f, addr, flags);
> > +            host = host_from_stream_offset(f, mis, addr, flags, &rb);
> >              if (!host) {
> >                  error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
> >                  ret = -EINVAL;
> >                  break;
> >              }
> >  
> > -            qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
> > +            if (!postcopy_running) {
> > +                qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
> > +            } else {
> > +                void *tmp = postcopy_get_tmp_page(mis, (addr + rb->offset) >>
> > +                                                        TARGET_PAGE_BITS);
> > +
> > +                if (!tmp) {
> > +                    return -ENOMEM;
> > +                }
> > +                qemu_get_buffer(f, tmp, TARGET_PAGE_SIZE);
> > +                ret = postcopy_place_page(mis, host, tmp,
> > +                          (addr + rb->offset) >> TARGET_PAGE_BITS);
> > +                if (ret) {
> > +                    error_report("ram_load: Failure in postcopy simple"
> > +                                 "@%zx/%p;%s+%zx",
> > +                                 addr, host, rb->idstr, rb->offset);
> > +                    return ret;
> > +                }
> > +            }
> >          } else if (flags & RAM_SAVE_FLAG_XBZRLE) {
> > -            void *host = host_from_stream_offset(f, addr, flags);
> > +            if (postcopy_running) {
> > +                error_report("XBZRLE RAM block in postcopy mode @%zx\n", addr);
> > +                return -EINVAL;
> > +            }
> 
> Hrm, there doesn't seem like an inherent reason XBZRLE shouldn't be
> possible in postcopy.  Obviously a temporary buffer would be
> necessary.

This is only disabling it in the postcopy stage; so the precopy stage at the beginning
still uses it.   In postcopy, we only ever send a page once, so we won't be sending
the page and then sending an XBZRLE fixup for it.   If the page was already sent
in precopy then it won't get sent again.
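The source-side reasoning can be modelled as a small decision function, a sketch under stated assumptions: `choose_encoding()` and `SendKind` are hypothetical, and the model assumes (as the sentmap/discard description in this thread says) that the destination throws away stale copies of re-dirtied pages before postcopy starts, so no base copy is left for a delta.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { SEND_NOTHING, SEND_FULL_PAGE, SEND_XBZRLE_DELTA } SendKind;

/* Hypothetical per-page decision on the source. */
static SendKind choose_encoding(bool postcopy, bool dirty, bool in_xbzrle_cache)
{
    if (!dirty) {
        return SEND_NOTHING;       /* destination copy is already current */
    }
    if (postcopy) {
        /* Stale destination copies were discarded at the switch, so there
         * is nothing to apply a delta to: always send the whole page. */
        return SEND_FULL_PAGE;
    }
    /* Precopy: a re-dirtied page with a cached old copy can go as a delta. */
    return in_xbzrle_cache ? SEND_XBZRLE_DELTA : SEND_FULL_PAGE;
}
```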

Dave

> 
> > +            void *host = host_from_stream_offset(f, mis, addr, flags, &rb);
> >              if (!host) {
> >                  error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
> >                  ret = -EINVAL;
> 
> -- 
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 204+ messages in thread

* Re: [Qemu-devel] [PATCH v4 41/47] qemu_ram_block_from_host
  2014-11-13  2:59   ` David Gibson
@ 2014-11-25 18:55     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-11-25 18:55 UTC
  To: David Gibson
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Fri, Oct 03, 2014 at 06:47:47PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Postcopy sends RAMBlock names and offsets over the wire (since it can't
> > rely on the order of ramaddr being the same), and it starts out with
> > HVA fault addresses from the kernel.
> > 
> > qemu_ram_block_from_host translates a HVA into a RAMBlock, an offset
> > in the RAMBlock, the global ram_addr_t value and it's bitmap position.
> 
> s/it's/its/

fixed.

> I find most of the passing around of bitmap positions confusing in
> this patch series.  Would it make things simpler if you broke up the
> bitmap into (aligned) per-ramblock chunks.  Then the offset would
> determine the bitmap position, which is easier to understand since it
> has an "inherent" meaning outside of the secondary data structure used
> to track things.

Yes it does get very confusing; there are two halves to the question though,
source and destination.

   source: I didn't really want to change the way the existing migration
      structures work here; and my 'sent' bitmap is very similar to the
      migration bitmap.   I think the reason that this is a single bitmap
      on the source is to make it easy/fast to search the bitmap for
      dirty pages; I don't know if there's any more detailed history behind
      it.

   destination:  It might be possible to make that change; although I need to
      think about it; I'm going to need to keep a mapping similar to the RAMBlock
list to get my data structure, or tack individual PMI data onto each
RAMBlock.

Let me add that to a list to think about.
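The difference between the two bitmap layouts can be shown in a few lines. This is a sketch: `Block`, `global_bit()`, `block_bit()` and the 4 KiB page size are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define TGT_PAGE_BITS 12 /* assumption: 4 KiB target pages */

typedef struct Block {
    uint64_t offset; /* start of the block in the global ram_addr space */
} Block;

/* Single global bitmap (current scheme): the bit index is the global
 * ram_addr scaled to pages, so it depends on where the block sits. */
static uint64_t global_bit(const Block *b, uint64_t offset_in_block)
{
    return (b->offset + offset_in_block) >> TGT_PAGE_BITS;
}

/* Per-RAMBlock bitmap (suggested alternative): the bit index follows from
 * the offset within the block alone. */
static uint64_t block_bit(uint64_t offset_in_block)
{
    return offset_in_block >> TGT_PAGE_BITS;
}
```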

> > Rewrite qemu_ram_addr_from_host to use qemu_ram_block_from_host.
> > 
> > Provide qemu_ram_get_idstr since it's the actual name text sent on the
> > wire.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  exec.c                    | 56 ++++++++++++++++++++++++++++++++++++++++++-----
> >  include/exec/cpu-common.h |  4 ++++
> >  2 files changed, 55 insertions(+), 5 deletions(-)
> > 
> > diff --git a/exec.c b/exec.c
> > index 65ee612..07722b3 100644
> > --- a/exec.c
> > +++ b/exec.c
> > @@ -1246,6 +1246,11 @@ static RAMBlock *find_ram_block(ram_addr_t addr)
> >      return NULL;
> >  }
> >  
> > +const char *qemu_ram_get_idstr(RAMBlock *rb)
> > +{
> > +    return rb->idstr;
> > +}
> > +
> >  void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev)
> >  {
> >      RAMBlock *new_block = find_ram_block(addr);
> > @@ -1603,16 +1608,35 @@ static void *qemu_ram_ptr_length(ram_addr_t addr, hwaddr *size)
> >      }
> >  }
> >  
> > -/* Some of the softmmu routines need to translate from a host pointer
> > -   (typically a TLB entry) back to a ram offset.  */
> > -MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
> > +/*
> > + * Translates a host ptr back to a RAMBlock, a ram_addr and an offset
> > + * in that RAMBlock.
> > + *
> > + * ptr: Host pointer to look up
> > + * round_offset: If true round the result offset down to a page boundary
> > + * *ram_addr: set to result ram_addr
> > + * *offset: set to result offset within the RAMBlock
> > + * *bm_index: bitmap index (i.e. scaled ram_addr for use where the scale
> > + *                          isn't available)
> > + *
> > + * Returns: RAMBlock (or NULL if not found)
> > + */
> > +RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
> > +                                   ram_addr_t *ram_addr,
> > +                                   ram_addr_t *offset,
> > +                                   unsigned long *bm_index)
> >  {
> >      RAMBlock *block;
> >      uint8_t *host = ptr;
> >  
> >      if (xen_enabled()) {
> >          *ram_addr = xen_ram_addr_from_mapcache(ptr);
> > -        return qemu_get_ram_block(*ram_addr)->mr;
> > +        block = qemu_get_ram_block(*ram_addr);
> > +        if (!block) {
> > +            return NULL;
> > +        }
> > +        *offset = (host - block->host);
> > +        return block;
> >      }
> >  
> >      block = ram_list.mru_block;
> > @@ -1633,7 +1657,29 @@ MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
> >      return NULL;
> >  
> >  found:
> > -    *ram_addr = block->offset + (host - block->host);
> > +    *offset = (host - block->host);
> > +    if (round_offset) {
> > +        *offset &= TARGET_PAGE_MASK;
> > +    }
> 
> This seems clumsy.  Surely the caller can apply the mask itself if it
> wants that.

That's true for what gets returned, but we're about to use that value again
(to compute ram_addr); whether that ever matters, I need to think about.
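One way to check whether the rounding matters is to let the lookup return the unrounded offset and have the caller mask it. The sketch below does that. It is a self-contained model, not the patch: `Block`, `block_from_host()` and the `TGT_PAGE_*` macros are hypothetical. It assumes block offsets are page aligned, in which case `bm_index` is identical whether or not the offset is rounded first, and rounding `ram_addr` afterwards gives the same result as computing it from a rounded offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TGT_PAGE_BITS 12                                   /* assumption */
#define TGT_PAGE_MASK (~(((uint64_t)1 << TGT_PAGE_BITS) - 1))

typedef struct Block {
    uint64_t offset;  /* page-aligned start in the global ram_addr space */
    uint64_t length;
    uint8_t *host;
} Block;

/* Translate a host address into (block, offset, ram_addr, bitmap index).
 * The offset is returned unrounded; callers that want a page boundary
 * apply TGT_PAGE_MASK themselves. */
static Block *block_from_host(Block *blocks, size_t n, const void *ptr,
                              uint64_t *offset, uint64_t *ram_addr,
                              uint64_t *bm_index)
{
    uintptr_t p = (uintptr_t)ptr;
    for (size_t i = 0; i < n; i++) {
        uintptr_t base = (uintptr_t)blocks[i].host;
        if (p >= base && p - base < blocks[i].length) {
            *offset = (uint64_t)(p - base);
            *ram_addr = blocks[i].offset + *offset;
            *bm_index = *ram_addr >> TGT_PAGE_BITS; /* same if rounded */
            return &blocks[i];
        }
    }
    return NULL;
}
```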

> > +    *ram_addr = block->offset + *offset;
> > +    *bm_index = *ram_addr >> TARGET_PAGE_BITS;
> > +    return block;
> > +}
> > +
> > +/* Some of the softmmu routines need to translate from a host pointer
> > +   (typically a TLB entry) back to a ram offset.  */
> > +MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
> > +{
> > +    RAMBlock *block;
> > +    ram_addr_t offset; /* Not used */
> > +    unsigned long index; /* Not used */
> > +
> > +    block = qemu_ram_block_from_host(ptr, false, ram_addr, &offset, &index);
> > +
> > +    if (!block) {
> > +        return NULL;
> > +    }
> > +
> >      return block->mr;
> >  }
> >  
> > diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
> > index 8042f50..ae25407 100644
> > --- a/include/exec/cpu-common.h
> > +++ b/include/exec/cpu-common.h
> > @@ -55,8 +55,12 @@ typedef uint32_t CPUReadMemoryFunc(void *opaque, hwaddr addr);
> >  void qemu_ram_remap(ram_addr_t addr, ram_addr_t length);
> >  /* This should not be used by devices.  */
> >  MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr);
> > +RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
> > +                                   ram_addr_t *ram_addr, ram_addr_t *offset,
> > +                                   unsigned long *bm_index);
> >  void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev);
> >  void qemu_ram_unset_idstr(ram_addr_t addr);
> > +const char *qemu_ram_get_idstr(RAMBlock *rb);
> >  
> >  void cpu_physical_memory_rw(hwaddr addr, uint8_t *buf,
> >                              int len, int is_write);
> 
> -- 
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 204+ messages in thread

* Re: [Qemu-devel] [PATCH v4 24/47] Allow savevm handlers to state whether they could go into postcopy
  2014-11-21  6:58       ` David Gibson
@ 2014-11-25 19:58         ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-11-25 19:58 UTC
  To: David Gibson
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Wed, Nov 19, 2014 at 05:53:54PM +0000, Dr. David Alan Gilbert wrote:
> > * David Gibson (david@gibson.dropbear.id.au) wrote:
> > > On Fri, Oct 03, 2014 at 06:47:30PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > 
> > > > Use that to split the qemu_savevm_state_pending counts into postcopiable
> > > > and non-postcopiable amounts
> > > > 
> > > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > > ---
> > > >  arch_init.c                 |  7 +++++++
> > > >  include/migration/vmstate.h |  2 +-
> > > >  include/sysemu/sysemu.h     |  4 +++-
> > > >  migration.c                 |  9 ++++++++-
> > > >  savevm.c                    | 23 +++++++++++++++++++----
> > > >  5 files changed, 38 insertions(+), 7 deletions(-)
> > > > 
> > > > diff --git a/arch_init.c b/arch_init.c
> > > > index 6970733..44072d8 100644
> > > > --- a/arch_init.c
> > > > +++ b/arch_init.c
> > > > @@ -1192,6 +1192,12 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> > > >      return ret;
> > > >  }
> > > >  
> > > > +/* RAM's always up for postcopying */
> > > > +static bool ram_can_postcopy(void *opaque)
> > > > +{
> > > > +    return true;
> > > > +}
> > > > +
> > > >  static SaveVMHandlers savevm_ram_handlers = {
> > > >      .save_live_setup = ram_save_setup,
> > > >      .save_live_iterate = ram_save_iterate,
> > > > @@ -1199,6 +1205,7 @@ static SaveVMHandlers savevm_ram_handlers = {
> > > >      .save_live_pending = ram_save_pending,
> > > >      .load_state = ram_load,
> > > >      .cancel = ram_migration_cancel,
> > > > +    .can_postcopy = ram_can_postcopy,
> > > 
> > > Is there actually any plausible device for which you'd need a callback
> > > here, rather than just having a static bool?
> > > 
> > > On the other hand, it does seem kind of plausible that there might be
> > > situations in which some data from a device must be pre-copied, but
> > > more can be post-copied, which would necessitate extending the
> > > per-handler callback to return quantities for both.
> > 
> > It's cheap enough and I couldn't make a strong argument about
> > any possible device, so I just used the function.
> 
> Ok.  I still wonder if it might be better to instead extend
> the save_live_pending callback in order to return both
> non-postcopyable and postcopyable quantities.  It allows for the case
> of a postcopyable device which has some non-postcopyable data - and
> with any postcopyable device other than RAM, it seems likely that
> there will need to be some precopied metadata at least.  Plus it
> avoids adding another callback.

There are two separate suggestions there - which I'll address in opposite order:
  1) Extending save_live_pending callback to avoid a new callback
    can_postcopy is used for a few different decisions, not just for choosing
    which pending total to add to, so I don't think extending s_l_p removes
    the need for the other callback.

  2) Allowing a device to do both pre and postcopy
    Yeah, I can see you could theoretically have a device where that would be
    useful; but for reasonably small chunks of metadata it already gets
    the chance to send those during the _begin call.  I'll make the change
    to save_live_pending's parameters to let this work.
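A sketch of what a split pending callback could look like. The two result names follow the `qemu_savevm_state_pending()` prototype quoted later in this thread, but `PendingFn`, `Handler`, `state_pending()` and the example device are hypothetical, and the real signature's `QEMUFile *` and `max_size` arguments are left out. Each callback adds to the running totals rather than overwriting them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-handler callback: add this handler's remaining data to
 * the "must precopy" and "may postcopy" totals. */
typedef void (*PendingFn)(void *opaque, uint64_t *non_postcopiable,
                          uint64_t *postcopiable);

typedef struct Handler {
    PendingFn pending;
    void *opaque;
} Handler;

/* Sum over all handlers, in the spirit of qemu_savevm_state_pending(). */
static void state_pending(const Handler *h, size_t n,
                          uint64_t *res_non_postcopiable,
                          uint64_t *res_postcopiable)
{
    *res_non_postcopiable = 0;
    *res_postcopiable = 0;
    for (size_t i = 0; i < n; i++) {
        if (h[i].pending) {
            h[i].pending(h[i].opaque, res_non_postcopiable, res_postcopiable);
        }
    }
}

/* RAM: everything can be postcopied. */
static void ram_pending(void *opaque, uint64_t *non_pc, uint64_t *pc)
{
    (void)non_pc;
    *pc += *(const uint64_t *)opaque;
}

/* Hypothetical device: small metadata that must be precopied plus a bulk
 * payload that could follow in postcopy. */
typedef struct Dev { uint64_t meta, bulk; } Dev;

static void dev_pending(void *opaque, uint64_t *non_pc, uint64_t *pc)
{
    const Dev *d = opaque;
    *non_pc += d->meta;
    *pc += d->bulk;
}
```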

Dave

> 
> -- 
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 204+ messages in thread

* Re: [Qemu-devel] [PATCH v4 19/47] Rework loadvm path for subloops
  2014-11-21  6:53       ` David Gibson
@ 2014-12-11 14:47         ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-12-11 14:47 UTC
  To: David Gibson
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Wed, Nov 19, 2014 at 05:50:11PM +0000, Dr. David Alan Gilbert wrote:
> > * David Gibson (david@gibson.dropbear.id.au) wrote:
> > > On Fri, Oct 03, 2014 at 06:47:25PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > 
> > > > Postcopy needs to have two migration streams loading concurrently;
> > > > one from memory (with the device state) and the other from the fd
> > > > with the memory transactions.
> > > > 
> > > > Split the core of qemu_loadvm_state out so we can use it for both.
> > > > 
> > > > Allow the inner loadvm loop to quit and signal whether the parent
> > > > should.
> > > > 
> > > > loadvm_handlers is made static since its lifetime is greater
> > > > than the outer qemu_loadvm_state.
> > > 
> > > Maybe it's just me, but "made static" to me indicates either a change
> > > from fully-global to module-global, or (function) local automatic to
> > > local static, not a change from function local-automatic to
> > > module-global as here.
> > > 
> > > It's also not clear from this patch alone why the lifetime of
> > > loadvm_handlers now needs to exceed that of qemu_loadvm_state().
> > 
> > OK, how about if I reworked that last sentence to be:
> > 
> >    loadvm_handlers is made module-global to survive beyond the lifetime
> >    of the outer qemu_loadvm_state since it may still be in use by
> >    a subloop in the postcopy listen thread.
> 
> Yeah, that's better.  A global seems ugly though.  Would it be better
> to dynamically allocate the list head and pass a pointer into the
> listen thread, or even to pass the list head by value into the listen
> thread.
> 
> The individual list elements need to be cleaned up at some point
> anyway, so I don't think that introduces any lifetime questions that
> weren't already there.

I've moved the loadvm_handlers out into the MigrationIncomingState
structure, and free them when that is deallocated at the end of migration.

Dave

> 
> -- 
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 204+ messages in thread

* Re: [Qemu-devel] [PATCH v4 21/47] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages.
  2014-11-03  5:51   ` David Gibson
@ 2014-12-17 14:50     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-12-17 14:50 UTC
  To: David Gibson
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Fri, Oct 03, 2014 at 06:47:27PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Add state variable showing current incoming postcopy state.
> 
> This appears to implement a lot more than just adding a state variable...

It's clearer with the title line;  I've reworded it to:

    Add wrappers and handlers for sending/receiving the postcopy-ram migration messages.
    
    The state of the postcopy process is managed via a series of messages;
       * Add wrappers and handlers for sending/receiving these messages
       * Add a state variable that tracks the current state of postcopy

> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  include/migration/migration.h |   8 +
> >  include/sysemu/sysemu.h       |  20 +++
> >  savevm.c                      | 335 ++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 363 insertions(+)
> > 
> > diff --git a/include/migration/migration.h b/include/migration/migration.h
> > index 0d9f62d..2c078c4 100644
> > --- a/include/migration/migration.h
> > +++ b/include/migration/migration.h
> > @@ -61,6 +61,14 @@ typedef struct MigrationState MigrationState;
> >  struct MigrationIncomingState {
> >      QEMUFile *file;
> >  
> > +    volatile enum {
> 
> What's the reason for the volatile?  I think that really needs a comment.

I've removed it and replaced it with atomic_ accessor functions, since
the state is accessed from multiple threads.
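A sketch of atomic access to the state variable. It uses C11 `<stdatomic.h>` rather than QEMU's own `atomic_` helpers. The enum values are the ones from the patch, while `Incoming`, `state_get()`, `state_transition()` (a compare-and-swap so two threads cannot both claim the same transition) and `postcopy_running()` (mirroring the `>= POSTCOPY_RAM_INCOMING_LISTENING` test in `ram_load`) are illustrative additions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef enum {
    POSTCOPY_RAM_INCOMING_NONE = 0,  /* Initial state - no postcopy */
    POSTCOPY_RAM_INCOMING_ADVISE,
    POSTCOPY_RAM_INCOMING_LISTENING,
    POSTCOPY_RAM_INCOMING_RUNNING,
    POSTCOPY_RAM_INCOMING_END
} PostcopyState;

typedef struct {
    atomic_int postcopy_ram_state; /* read and written from several threads */
} Incoming;

static PostcopyState state_get(Incoming *mis)
{
    return (PostcopyState)atomic_load(&mis->postcopy_ram_state);
}

/* Move from 'expected' to 'next' only if no other thread changed the state
 * first; returns false and leaves the state alone otherwise. */
static bool state_transition(Incoming *mis, PostcopyState expected,
                             PostcopyState next)
{
    int old = expected;
    return atomic_compare_exchange_strong(&mis->postcopy_ram_state, &old,
                                          (int)next);
}

static bool postcopy_running(Incoming *mis)
{
    return state_get(mis) >= POSTCOPY_RAM_INCOMING_LISTENING;
}
```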

> > +        POSTCOPY_RAM_INCOMING_NONE = 0,  /* Initial state - no postcopy */
> > +        POSTCOPY_RAM_INCOMING_ADVISE,
> > +        POSTCOPY_RAM_INCOMING_LISTENING,
> > +        POSTCOPY_RAM_INCOMING_RUNNING,
> > +        POSTCOPY_RAM_INCOMING_END
> > +    } postcopy_ram_state;
> > +
> >      QEMUFile *return_path;
> >      QemuMutex      rp_mutex;    /* We send replies from multiple threads */
> >  };

<snip>

> > diff --git a/savevm.c b/savevm.c
> > index 7236232..b942e8c 100644
> > --- a/savevm.c
> > +++ b/savevm.c
> > @@ -39,6 +39,7 @@
> >  #include "exec/memory.h"
> >  #include "qmp-commands.h"
> >  #include "trace.h"
> > +#include "qemu/bitops.h"
> >  #include "qemu/iov.h"
> >  #include "block/snapshot.h"
> >  #include "block/qapi.h"
> > @@ -624,6 +625,92 @@ void qemu_savevm_send_openrp(QEMUFile *f)
> >  {
> >      qemu_savevm_command_send(f, QEMU_VM_CMD_OPENRP, 0, NULL);
> >  }
> > +
> > +/* Send prior to any RAM transfer */
> > +void qemu_savevm_send_postcopy_ram_advise(QEMUFile *f)
> > +{
> > +    DPRINTF("send postcopy-ram-advise");
> > +    uint64_t tmp[2];
> > +    tmp[0] = cpu_to_be64(sysconf(_SC_PAGESIZE));
> > +    tmp[1] = cpu_to_be64(1ul << qemu_target_page_bits());
> > +
> > +    qemu_savevm_command_send(f, QEMU_VM_CMD_POSTCOPY_RAM_ADVISE, 16,
> > +                             (uint8_t *)tmp);
> > +}
> > +
> > +/* Prior to running, to cause pages that have been dirtied after precopy
> > + * started to be discarded on the destination.
> > + * CMD_POSTCOPY_RAM_DISCARD consist of:
> > + *  3 byte header (filled in by qemu_savevm_send_postcopy_ram_discard)
> > + *      byte   version (0)
> > + *      byte   offset into the 1st data word containing 1st page of RAMBlock
> 
> I'm not able to follow what that description means.

I've reworded it as:
 *      byte   offset to be subtracted from each page address to deal with
 *             RAMBlocks that don't start on a mask word boundary.

(I've tried reworking this protocol a few times, the offset still seems
to work out easiest, otherwise you end up having to treat the address entries
as potentially signed).
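One reading of the header offset is sketched below. It assumes that start addresses are block-relative page numbers shifted up by `offset` and aligned to 32-page mask words. The helpers `discard_header_offset()` and `discard_encode()` are hypothetical. With a RAMBlock starting at 8k and 4k target pages, the offset is 2 and the block's first page lands on bit 2 of the first word, which matches the example in the patch comment. Adding the offset first is what keeps every start address non-negative.

```c
#include <assert.h>
#include <stdint.h>

#define MASK_BITS 32 /* each mask word covers 32 target pages */

/* Header byte: how far into its mask word the block's first page sits. */
static uint8_t discard_header_offset(uint64_t block_first_page)
{
    return (uint8_t)(block_first_page % MASK_BITS);
}

/* Encode a block-relative page as (word-aligned start address, bit).
 * Shifting by 'offset' keeps start addresses non-negative even though the
 * first mask word begins before the block does. */
static void discard_encode(uint64_t page, uint8_t offset,
                           uint64_t *startaddr, unsigned *bit)
{
    uint64_t shifted = page + offset;
    *startaddr = shifted - shifted % MASK_BITS;
    *bit = (unsigned)(shifted % MASK_BITS);
}
```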

<snip>

> > +    /*
> > +     * Postcopy will be sending lots of small messages along the return path
> > +     * that it needs quick answers to.
> > +     */
> > +    socket_set_nodelay(qemu_get_fd(mis->return_path));
> 
> So, here you break the QEMUFile abstraction and assume you have a
> socket.

Ah yes; I put that in to see if it would help.  Hmm; I've taken it out for now,
I need to see if there's a sensible way to add the abstraction.
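One possible shape for such an abstraction is an optional hook in the stream's ops table: socket-backed streams set `TCP_NODELAY`, and other backends treat the request as a no-op. `Stream`, `StreamOps`, `set_low_latency` and `stream_set_low_latency()` are hypothetical names, not QEMUFile's API; `setsockopt(..., IPPROTO_TCP, TCP_NODELAY, ...)` is the standard POSIX call.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical ops table: only backends that sit on a TCP socket provide
 * the low-latency hook. */
typedef struct StreamOps {
    int (*set_low_latency)(void *opaque);
} StreamOps;

typedef struct Stream {
    const StreamOps *ops;
    void *opaque;
} Stream;

static int stream_set_low_latency(Stream *s)
{
    if (!s->ops || !s->ops->set_low_latency) {
        return 0; /* not meaningful for this backend: no-op */
    }
    return s->ops->set_low_latency(s->opaque);
}

/* Socket backend: disable Nagle so small return-path messages go out
 * immediately. 'opaque' points at the socket fd. */
static int socket_set_low_latency(void *opaque)
{
    int fd = *(int *)opaque;
    int one = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}

static const StreamOps socket_ops = { socket_set_low_latency };
```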

> > +    while (len) {
> > +        uint64_t startaddr;
> > +        uint32_t mask;
> > +        /*
> > +         * We now have pairs of address, mask
> > +         *   The mask is 32 bits of bitmask starting at 'startaddr'-offset
> > +         *   RAMBlock; e.g. if the RAMBlock started at 8k where TPS=4k
> > +         *   then first_bit_offset=2 and the 1st 2 bits of the mask
> > +         *   aren't relevant to this RAMBlock, and bit 2 corresponds
> > +         *   to the 1st page of this RAMBlock
> 
> Um.. yeah.. can't make much sense of this comment either.

Well, at least it's the one that corresponds to the comment above you couldn't
understand either.
I've reworded it to:
         * We now have pairs of address, mask
         *   Each word of mask is 32 bits, where each bit corresponds to one
         *   target page.
         *   RAMBlocks don't necessarily start on word boundaries, 
         *   and the offset in the header indicates the offset into the 1st
         *   mask word that corresponds to the 1st page of the RAMBlock.

<snip>

> > +/* After this message we must be able to immediately receive page data */
> 
> The purpose of the listen message from a protocol point of view isn't
> really clear to me.  I understand why the destination needs to set up
> the postcopy handling before processing the device data, but why does
> it need an assertion from the source to start this, rather than just
> an internal detail on the load path.

The device migration format isn't structured well enough to allow the
receiver to know the size of the device data without parsing it all.
The parsing isn't structured, so there's no way to do that without
calling all the device code that may read from guest memory.
We solve that by sending all the device data in the CMD_PACKAGED.

While we could assume that a CMD_PACKAGED is always for postcopy data,
I don't special-case the processing of the data inside the package at
all; I keep it general.  Then the listen message is used
inside that package to start the listener thread off and set up the other
associated state.
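The point of packaging is that the receiver can lift the whole device-state blob off the stream without parsing any of it. A sketch follows. The CMD_PACKAGED wire format is not shown in this excerpt, so the 32-bit big-endian length prefix and the `package_write()`/`package_read()` helpers are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Wrap an opaque device-state blob as a 32-bit big-endian length followed
 * by the bytes. 'out' must have room for len + 4 bytes. */
static size_t package_write(uint8_t *out, const uint8_t *blob, uint32_t len)
{
    out[0] = (uint8_t)(len >> 24);
    out[1] = (uint8_t)(len >> 16);
    out[2] = (uint8_t)(len >> 8);
    out[3] = (uint8_t)len;
    memcpy(out + 4, blob, len);
    return 4 + (size_t)len;
}

/* Copy the whole package out of the stream without interpreting it; the
 * copy can then be loaded from memory while the main stream goes on to
 * carry page data. Returns NULL if the package is incomplete. */
static uint8_t *package_read(const uint8_t *in, size_t avail, uint32_t *len)
{
    if (avail < 4) {
        return NULL;
    }
    uint32_t l = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
                 ((uint32_t)in[2] << 8) | (uint32_t)in[3];
    if (avail - 4 < l) {
        return NULL;
    }
    uint8_t *copy = malloc(l ? l : 1);
    if (!copy) {
        return NULL;
    }
    memcpy(copy, in + 4, l);
    *len = l;
    return copy;
}
```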

Dave

> 
> -- 
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 204+ messages in thread

* Re: [Qemu-devel] [PATCH v4 28/47] qemu_savevm_state_complete: Postcopy changes
  2014-11-04  2:18   ` David Gibson
@ 2014-12-17 16:14     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-12-17 16:14 UTC
  To: David Gibson
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Fri, Oct 03, 2014 at 06:47:34PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > When postcopy calls qemu_savevm_state_complete it's not really
> > the end of migration, so skip:
> >    a) Finishing postcopiable iterative devices - they'll carry on
> >    b) The termination byte on the end of the stream.
> > 
> > We then also add:
> >   qemu_savevm_state_postcopy_complete
> > which is called at the end of a postcopy migration to call the
> > complete methods on devices skipped in the _complete call.
> 
> So, we should probably rename qemu_savevm_state_complete() to reflect
> the fact that it's no longer actually a completion, but just the
> transition from pre-copy to post-copy phases.  A good, brief name
> doesn't immediately occur to me, unfortunately.

Well it's still completion in the non-postcopy case; if you do think
of a good obvious name then I'd be happy to change it.
(Another way would be to add a parameter to qemu_savevm_state_complete
to make it do one or the other, but some of the conditions in it were
already a bit hairy).

> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  include/sysemu/sysemu.h |  1 +
> >  savevm.c                | 52 ++++++++++++++++++++++++++++++++++++++++++++++++-
> >  2 files changed, 52 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> > index e7ff3d0..46665ce 100644
> > --- a/include/sysemu/sysemu.h
> > +++ b/include/sysemu/sysemu.h
> > @@ -113,6 +113,7 @@ void qemu_savevm_state_cancel(void);
> >  void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size,
> >                                 uint64_t *res_non_postcopiable,
> >                                 uint64_t *res_postcopiable);
> > +void qemu_savevm_state_postcopy_complete(QEMUFile *f);
> >  void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
> >                                uint16_t len, uint8_t *data);
> >  void qemu_savevm_send_reqack(QEMUFile *f, uint32_t value);
> > diff --git a/savevm.c b/savevm.c
> > index a0cb88b..7c4541d 100644
> > --- a/savevm.c
> > +++ b/savevm.c
> > @@ -854,10 +854,51 @@ int qemu_savevm_state_iterate(QEMUFile *f)
> >      return ret;
> >  }
> >  
> > +/*
> > + * Calls the complete routines just for those devices that are postcopiable;
> > + * causing the last few pages to be sent immediately and doing any associated
> > + * cleanup.
> > + * Note postcopy also calls the plain qemu_savevm_state_complete to complete
> > + * all the other devices, but that happens at the point we switch to postcopy.
> > + */
> > +void qemu_savevm_state_postcopy_complete(QEMUFile *f)
> > +{
> > +    SaveStateEntry *se;
> > +    int ret;
> > +
> > +    QTAILQ_FOREACH(se, &savevm_handlers, entry) {
> > +        if (!se->ops || !se->ops->save_live_complete ||
> > +            !se->ops->can_postcopy) {
> 
> So, you check for the presence of a can_postcopy callback, but you
> don't ever actually invoke it.

Thanks, fixed.

> > +            continue;
> > +        }
> > +        if (se->ops && se->ops->is_active) {
> > +            if (!se->ops->is_active(se->opaque)) {
> > +                continue;
> > +            }
> > +        }
> > +        trace_savevm_section_start(se->idstr, se->section_id);
> > +        /* Section type */
> > +        qemu_put_byte(f, QEMU_VM_SECTION_END);
> > +        qemu_put_be32(f, se->section_id);
> > +
> > +        ret = se->ops->save_live_complete(f, se->opaque);
> 
> I'm wondering if it might be clearer not to overload the
> save_live_complete hook, but instead allow both "execution transition"
> (old complete) and "final complete" (postcopy complete) hooks
> (expecting only one to be non-NULL in most cases).

Note that I only call save_live_complete once for any one device.
Non-postcopied devices get done in the original loop, postcopied
devices in the new ones, so from the point of view of a device
it's still a complete.
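The "complete exactly once" property can be sketched with two filtered passes over the same device list. `Dev`, `is_postcopiable()`, `complete_at_switchover()` and `complete_after_postcopy()` are hypothetical. The predicate actually calls the optional `can_postcopy` hook rather than just checking that it exists, the bug Gibson spotted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Dev Dev;
struct Dev {
    bool (*can_postcopy)(const Dev *d); /* optional hook, may be NULL */
    int completions;                    /* times 'complete' has run */
};

static bool ram_can_postcopy(const Dev *d) { (void)d; return true; }
static bool never_postcopy(const Dev *d) { (void)d; return false; }

/* Invoke the hook, not just test for its presence. */
static bool is_postcopiable(const Dev *d)
{
    return d->can_postcopy && d->can_postcopy(d);
}

static void complete(Dev *d) { d->completions++; }

/* Switch to postcopy: complete every device that cannot be postcopied. */
static void complete_at_switchover(Dev *devs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!is_postcopiable(&devs[i])) {
            complete(&devs[i]);
        }
    }
}

/* End of postcopy: complete the remaining, postcopiable devices. */
static void complete_after_postcopy(Dev *devs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (is_postcopiable(&devs[i])) {
            complete(&devs[i]);
        }
    }
}
```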

Dave

> 
> -- 
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 204+ messages in thread

* Re: [Qemu-devel] [PATCH v4 30/47] Postcopy: Maintain sentmap and calculate discard
  2014-11-05  6:38   ` David Gibson
@ 2014-12-17 16:48     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-12-17 16:48 UTC (permalink / raw)
  To: David Gibson
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Fri, Oct 03, 2014 at 06:47:36PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Where postcopy is preceded by a period of precopy, the destination will
> > have received pages that may have been dirtied on the source after the
> > page was sent.  The destination must throw these pages away before
> > starting its CPUs.
> > 
> > Maintain a 'sentmap' of pages that have already been sent.
> > Calculate list of sent & dirty pages
> > Provide helpers on the destination side to discard these.
> 
> I find this one really hard to wrap my head around, and I'm having
> trouble putting my finger on why.

It seems to turn out fairly compact; a lot of the time the
bitmap we're trying to transfer is very sparse, though a little
clumpy.

> I do wonder if the "base + tiny bitmap" encoding for the discard list
> over the wire is the best choice.  It seems to involve a bunch of
> rather tedious code rejigging the bitmap into 32-bit chunks, and a
> bunch of rather hard to follow code moving back and forth between that
> encoding and simple address or page ranges for handling the actual
> discards.  It also involves sending the bit offsets for the start of
> each ram block over the wire, which feels like it should be an
> internal detail.

Yes, the fiddling into 32-bit chunks is a bit messier than it was originally;
the problem here is that the migration bitmap (for no apparent reason)
uses 'long', so the complexity all comes from having the internal structure
be flexible and wanting not to pass that flexibility onto the wire.
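This shows where the fiddliness comes from. A bitmap stored as unsigned long words (32 or 64 bits depending on the host) has to be read and written 32 bits at a time if the wire format is not to depend on the host's long size. The sketch below assumes start is always a multiple of 32. The helper names echo the patch's get_32bits_map()/put_32bits_map(), but these bodies are our own illustration, not the patch's code:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Read the 32 bits starting at bit 'start' (a multiple of 32). */
static uint32_t get_32bits(const unsigned long *map, uint64_t start)
{
    assert(start % 32 == 0);
    unsigned long word = map[start / BITS_PER_ULONG];
    unsigned int shift = start % BITS_PER_ULONG;  /* 0, or 32 on 64-bit longs */
    return (uint32_t)(word >> shift);
}

/* Overwrite the 32 bits starting at bit 'start' (a multiple of 32) with v. */
static void put_32bits(unsigned long *map, uint64_t start, uint32_t v)
{
    assert(start % 32 == 0);
    unsigned long *word = &map[start / BITS_PER_ULONG];
    unsigned int shift = start % BITS_PER_ULONG;
    unsigned long mask = (unsigned long)UINT32_MAX << shift;
    *word = (*word & ~mask) | ((unsigned long)v << shift);
}
```

On a 32-bit host the shift is always 0 and both helpers reduce to plain array accesses; only 64-bit longs need the masking.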

> Would just a simple list of start..end or start/len pairs end up
> simpler overall?  Converting the bitmap used to track it on the
> source into ranges would be a little fiddly, but I suspect less so
> than the code to split into 32-bit pieces.

In the end you have to traverse that bitmap somewhere; since I already
had the data in the form of a bitmap, it seemed reasonable to take advantage
of it as a compact representation.
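For comparison, producing the start/length list suggested above also means walking the whole bitmap. Here is a minimal sketch of that walk. The function name and output arrays are hypothetical, and a real implementation would use find-next-bit style helpers rather than testing one bit at a time:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

static int test_bit(const unsigned long *map, size_t nr)
{
    return (map[nr / BITS_PER_ULONG] >> (nr % BITS_PER_ULONG)) & 1;
}

/*
 * Record each maximal run of set bits in map[0..nbits) as (start, len).
 * Returns the number of runs found; only the first 'max' are stored.
 */
static size_t bitmap_to_ranges(const unsigned long *map, size_t nbits,
                               size_t *starts, size_t *lens, size_t max)
{
    size_t runs = 0, i = 0;

    while (i < nbits) {
        if (!test_bit(map, i)) {
            i++;
            continue;
        }
        size_t start = i;
        while (i < nbits && test_bit(map, i)) {
            i++;
        }
        if (runs < max) {
            starts[runs] = start;
            lens[runs] = i - start;
        }
        runs++;
    }
    return runs;
}
```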

> It might also be a bit more robust against possible future options for
> source host vs. dest host vs. target page size, since the source can
> construct it in terms of its granularity constraints, and destination
> can round each chunk out to its own granularity.

That already happens here, as the destination reconstitutes the addresses
from the bitmap.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v4 31/47] postcopy: Incoming initialisation
  2014-11-05  6:47   ` David Gibson
@ 2014-12-17 17:21     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-12-17 17:21 UTC (permalink / raw)
  To: David Gibson
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Fri, Oct 03, 2014 at 06:47:37PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  arch_init.c                      |  11 ++++
> >  include/migration/migration.h    |   1 +
> >  include/migration/postcopy-ram.h |  12 +++++
> >  migration.c                      |   1 +
> >  postcopy-ram.c                   | 110 ++++++++++++++++++++++++++++++++++++++-
> >  savevm.c                         |   4 ++
> >  6 files changed, 138 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch_init.c b/arch_init.c
> > index 030d189..4a03171 100644
> > --- a/arch_init.c
> > +++ b/arch_init.c
> > @@ -1345,6 +1345,17 @@ void ram_handle_compressed(void *host, uint8_t ch, uint64_t size)
> >      }
> >  }
> >  
> > +/*
> > + * Allocate data structures etc needed by incoming migration with postcopy-ram
> > + * postcopy-ram's similarly names postcopy_ram_incoming_init does the work
> > + */
> > +int ram_postcopy_incoming_init(MigrationIncomingState *mis)
> > +{
> > +    size_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
> > +
> > +    return postcopy_ram_incoming_init(mis, ram_pages);
> > +}
> 
> Um.. yeah.  I'm sure ram_postcopy_incoming_init versus
> postcopy_ram_incoming_init won't get confusing o_O.

Agreed; that's why I put the comments on.  My problem here is that:
  1) last_ram_offset() comes from code that's poisoned, so it can't be built
     in a target-independent file.
  2) I'd managed so far (with a couple of hacks) to keep postcopy-ram.c
     target independent.
  3) ram_ is the prefix for external names in arch_init.c.
  4) postcopy_ram_ is the prefix for external names in postcopy-ram.c.

If I threw in the towel and made postcopy-ram.c target dependent, it would
remove the need for that wrapper; it might be the best bet.
(Other naming suggestions are also welcome.)

> [snip]
> > +/*
> > + * Setup an area of RAM so that it *can* be used for postcopy later; this
> > + * must be done right at the start prior to pre-copy.
> > + * opaque should be the MIS.
> > + */
> > +static int init_area(const char *block_name, void *host_addr,
> > +                     ram_addr_t offset, ram_addr_t length, void *opaque)
> > +{
> > +    MigrationIncomingState *mis = opaque;
> > +
> > +    DPRINTF("init_area: %s: %p offset=%zx length=%zd(%zx)",
> > +            block_name, host_addr, offset, length, length);
> > +    /*
> > +     * We need the whole of RAM to be truly empty for postcopy, so things
> > +     * like ROMs and any data tables built during init must be zero'd
> > +     * - we're going to get the copy from the source anyway.
> > +     */
> > +    if (postcopy_ram_discard_range(mis, host_addr, (host_addr + length - 1))) {
> > +        return -1;
> > +    }
> > +
> > +    /*
> > +     * We also need the area to be normal 4k pages, not huge pages
> > +     * (otherwise we can't be sure we can use remap_anon_pages to put
> > +     * a 4k page in later).  THP might come along and map a 2MB page
> > +     * and when it's partially accessed in precopy it might not break
> > +     * it down, but leave a 2MB zero'd page.
> > +     */
> > +    if (madvise(host_addr, length, MADV_NOHUGEPAGE)) {
> > +        perror("init_area: NOHUGEPAGE");
> > +        return -1;
> > +    }
> 
> I'm assuming this is because remap_anon_pages() can't automatically
> split a THP itself.  It's not immediately obvious to me why it can't
> though.

No, I think this restriction stems from two things:
   1) remap_anon_pages doesn't allow us to map into an area that already
   has a page present; that's a good protection against us doing something
   stupid, such as receiving a page that we've already received and that
   the destination is busy accessing.

   2) We wouldn't want THP to decide to convert a page that we'd only
   partially received into a huge page, because we then wouldn't receive
   userfault messages for it.

(Although it might be best to check with Andrea.)
(1) might disappear with the modifications to replace remap_anon_pages
that Andrea is working on.
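Here is a minimal, Linux-only sketch of the two steps init_area performs, written directly against madvise(2). MADV_DONTNEED on a private anonymous mapping drops the pages, so the next read sees zero-filled memory. MADV_NOHUGEPAGE asks the kernel not to back the range with transparent huge pages. The sketch assumes postcopy_ram_discard_range() boils down to MADV_DONTNEED, which this thread doesn't show. prepare_postcopy_area() is a hypothetical name:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Prepare [addr, addr + len) for later postcopy use:
 *  1) throw away whatever is there (ROM copies, tables built at init),
 *  2) stop THP from backing it with huge pages.
 * Returns 0 on success or a negative errno value.
 */
static int prepare_postcopy_area(void *addr, size_t len)
{
    /* madvise() requires a page-aligned start address. */
    assert((uintptr_t)addr % (uintptr_t)sysconf(_SC_PAGESIZE) == 0);

    if (madvise(addr, len, MADV_DONTNEED)) {
        int e = errno;
        perror("prepare_postcopy_area: DONTNEED");
        return -e;
    }
    if (madvise(addr, len, MADV_NOHUGEPAGE)) {
        int e = errno;  /* EINVAL if the kernel was built without THP */
        perror("prepare_postcopy_area: NOHUGEPAGE");
        return -e;
    }
    return 0;
}
```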

> Also.. what effect will this have on an actual hugetlbfs memory
> region?  If there's code to handle that case I haven't spotted it yet.

I wouldn't expect this code to work with hugetlbfs mappings.

> > +
> > +    return 0;
> > +}
> > +
> > +/*
> > + * At the end of migration, undo the effects of init_area
> > + * opaque should be the MIS.
> > + */
> > +static int cleanup_area(const char *block_name, void *host_addr,
> > +                        ram_addr_t offset, ram_addr_t length, void *opaque)
> > +{
> > +    /* Turn off userfault here as well? */
> 
> This comment appears to be obsoleted by the code below.

Thanks; I've squashed it.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v4 43/47] Host page!=target page: Cleanup bitmaps
  2014-11-13  3:10   ` David Gibson
@ 2014-12-17 18:21     ` Dr. David Alan Gilbert
  2015-01-27  4:50       ` David Gibson
  0 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2014-12-17 18:21 UTC (permalink / raw)
  To: David Gibson
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Fri, Oct 03, 2014 at 06:47:49PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Prior to the start of postcopy, ensure that everything that will
> > be transferred later is a whole host-page in size.
> > 
> > This is accomplished by discarding partially transferred host pages
> > and marking any that are partially dirty as fully dirty.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  arch_init.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 111 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch_init.c b/arch_init.c
> > index 1fe4fab..aac250c 100644
> > --- a/arch_init.c
> > +++ b/arch_init.c
> > @@ -1024,7 +1024,6 @@ static uint32_t get_32bits_map(unsigned long *map, int64_t start)
> >   * A helper to put 32 bits into a bit map; trivial for HOST_LONG_BITS=32
> >   * messier for 64; the bitmaps are actually long's that are 32 or 64bit
> >   */
> > -__attribute__ (( unused )) /* Until later in patch series */
> >  static void put_32bits_map(unsigned long *map, int64_t start,
> >                             uint32_t v)
> >  {
> > @@ -1153,15 +1152,126 @@ static int pc_each_ram_discard(MigrationState *ms)
> >  }
> >  
> >  /*
> > + * Utility for the outgoing postcopy code.
> > + *
> > + * Discard any partially sent host-page size chunks, mark any partially
> > + * dirty host-page size chunks as all dirty.
> > + *
> > + * Returns: 0 on success
> > + */
> > +static int postcopy_chunk_hostpages(MigrationState *ms)
> > +{
> > +    struct RAMBlock *block;
> > +    unsigned int host_bits = sysconf(_SC_PAGESIZE) / TARGET_PAGE_SIZE;
> > +    uint32_t host_mask;
> > +
> > +    /* Should be a power of 2 */
> > +    assert(host_bits && !(host_bits & (host_bits - 1)));
> > +    /*
> > +     * If the host_bits isn't a division of 32 (the minimum long size)
> > +     * then the code gets a lot more complex; disallow for now
> > +     * (I'm not aware of a system where it's true anyway)
> > +     */
> > +    assert((32 % host_bits) == 0);
> 
> This assert makes the first one redundant.

True I guess, removed the power of 2 check.
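The commit message's rule can be sketched on one 32-bit word of each bitmap: partly sent host pages are discarded, and partly dirty ones become wholly dirty. Because host_bits divides 32, which is exactly what the remaining assert checks, a host page never straddles two words. The function name is an assumption for illustration, not the patch's code. So is the choice to mark a discarded host page dirty again so that it gets resent in full:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round one 32-bit chunk of the sent and dirty bitmaps out to whole host
 * pages.  host_bits is the number of target pages per host page and must
 * divide 32, so no host page straddles two chunks.
 * Returns a mask of the target pages whose host page must be discarded on
 * the destination because it was only partly sent.
 */
static uint32_t chunk_hostpages(uint32_t *sent, uint32_t *dirty,
                                unsigned int host_bits)
{
    /* 'host_bits &&' only guards the modulo against a zero divisor. */
    assert(host_bits && (32 % host_bits) == 0);
    uint32_t group = (host_bits == 32) ? UINT32_MAX
                                       : (UINT32_C(1) << host_bits) - 1;
    uint32_t discard = 0;

    for (unsigned int bit = 0; bit < 32; bit += host_bits) {
        uint32_t mask = group << bit;
        uint32_t s = *sent & mask;
        uint32_t d = *dirty & mask;

        if (s && s != mask) {
            /* Partly sent: discard the whole host page and resend it all. */
            *sent &= ~mask;
            *dirty |= mask;
            discard |= mask;
        } else if (d && d != mask) {
            /* Partly dirty: the whole host page has to be resent. */
            *dirty |= mask;
        }
    }
    return discard;
}
```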

<snip>

> > +/*
> >   * Transmit the set of pages to be discarded after precopy to the target
> >   * these are pages that have been sent previously but have been dirtied
> >   * Hopefully this is pretty sparse
> >   */
> >  int ram_postcopy_send_discard_bitmap(MigrationState *ms)
> >  {
> > +    int ret;
> > +
> >      /* This should be our last sync, the src is now paused */
> >      migration_bitmap_sync();
> >  
> > +    /* Deal with TPS != HPS */
> > +    ret = postcopy_chunk_hostpages(ms);
> > +    if (ret) {
> > +        return ret;
> > +    }
> 
> This really seems like a bogus thing to be doing on the outgoing
> migration side.  Doesn't the host page size constraint come from the
> destination (due to the need to atomically instate pages).  Source
> host page size == destination host page size doesn't seem like it
> should be an inherent constraint

It's not an inherent constraint; it just makes life messier. I had
some code to deal with it but it complicates things even more, and
I've not got anything to test that rare case with; if someone is
desperate for it then it can be added.

> and it's not clear why you can't do
> this rounding out to host page sized chunks on the receive end.

The source keeps track of which pages still need sending, and so
has to update that list when it tells the destination to perform
a discard.

If the destination discards more than the source told it to (for
example because it has bigger host pages), the source would need
to update its map of the pages that still need sending.


Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v4 33/47] Postcopy: Postcopy startup in migration thread
  2014-11-10  6:05   ` David Gibson
@ 2015-01-05 16:06     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2015-01-05 16:06 UTC (permalink / raw)
  To: David Gibson
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Fri, Oct 03, 2014 at 06:47:39PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Rework the migration thread to setup and start postcopy.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  include/migration/migration.h |   3 +
> >  migration.c                   | 201 ++++++++++++++++++++++++++++++++++++++----
> >  2 files changed, 185 insertions(+), 19 deletions(-)
> > 
> > diff --git a/include/migration/migration.h b/include/migration/migration.h
> > index b01cc17..f401775 100644
> > --- a/include/migration/migration.h
> > +++ b/include/migration/migration.h

> > +    DPRINTF("postcopy_start: sending req 2\n");
> > +    qemu_savevm_send_reqack(ms->file, 2);
> 
> Are these reqacks just for debugging, or do they affect the protocol?

Just for debugging; comment added.  They just make it easy to line
traces up between the source and destination, and also make it easy to figure
out how far stuff has got if it jams up.

> > +    if (migrate_postcopy_ram()) {
> > +        /* Now tell the dest that it should open it's end so it can reply */
> 
> s/it's/its/

Fixed.

> > +        qemu_savevm_send_openrp(s->file);
> > +
> > +        /* And ask it to send an ack that will make stuff easier to debug */
> > +        qemu_savevm_send_reqack(s->file, 1);
> > +
> > +        /* Tell the destination that we *might* want to do postcopy later;
> > +         * if the other end can't do postcopy it should fail now, nice and
> > +         * early.
> > +         */
> > +        qemu_savevm_send_postcopy_ram_advise(s->file);
> > +    }
> > +
> >      s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
> > +    current_active_type = MIG_STATE_ACTIVE;
> >      migrate_set_state(s, MIG_STATE_SETUP, MIG_STATE_ACTIVE);
> >  
> >      DPRINTF("setup complete\n");
> > @@ -945,37 +1057,74 @@ static void *migration_thread(void *opaque)
> >                      " nonpost=%" PRIu64 ")\n",
> >                      pending_size, max_size, pend_post, pend_nonpost);
> >              if (pending_size && pending_size >= max_size) {
> > +                /* Still a significant amount to transfer */
> > +
> > +                current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > +                if (migrate_postcopy_ram() &&
> > +                    s->state != MIG_STATE_POSTCOPY_ACTIVE &&
> > +                    pend_nonpost == 0 && s->start_postcopy) {
> 
> Hrm.  This is checking for pend_nonpost == 0, rather than just close
> to zero.  IIUC this will only work if all "live sendable" state is
> also postcopyable.  But if we have live sendable data that's not
> postcopyable - like the power hash page table - we'll need some
> threshold here, like we currently have for entering the stopped vm
> phase of a precopy migration.
> 
> Or am I missing something?

Hmm, I think you're right; I've changed this to:
	pend_nonpost <= max_size 

so that it uses the same cut-off logic as the normal end of migration.
I think that will work: the non-postcopiable state gets small enough that
it can be expected to complete quickly in the _complete phase at the start
of postcopy.
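The revised start condition can be expressed as a standalone predicate, which makes the cut-off easy to reason about. This is a sketch with hypothetical names that mirror the variables in migration_thread():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide, on one pass of the migration loop, whether to switch to postcopy.
 * pending_size: everything still to send.
 * pend_nonpost: the part of it that can't be postcopied (e.g. a hash page
 *               table) and so must be sent before the switch.
 * max_size:     what we expect to be able to send within the downtime limit.
 */
static bool should_start_postcopy(uint64_t pending_size, uint64_t pend_nonpost,
                                  uint64_t max_size, bool postcopy_enabled,
                                  bool postcopy_active, bool start_requested)
{
    if (!pending_size || pending_size < max_size) {
        return false;  /* small enough for an ordinary precopy completion */
    }
    /* Same cut-off as normal completion, applied to the non-postcopiable part. */
    return postcopy_enabled && !postcopy_active && start_requested &&
           pend_nonpost <= max_size;
}
```

With the old pend_nonpost == 0 test, any live but non-postcopiable state would have kept this false forever.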

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v4 44/47] Postcopy; Handle userfault requests
  2014-11-13  3:23   ` David Gibson
@ 2015-01-05 17:13     ` Dr. David Alan Gilbert
  2015-01-27  4:33       ` David Gibson
  0 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2015-01-05 17:13 UTC (permalink / raw)
  To: David Gibson
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Fri, Oct 03, 2014 at 06:47:50PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > userfaultfd is a Linux syscall that gives an fd that receives a stream
> > of notifications of accesses to pages marked as MADV_USERFAULT, and
> > allows the program to acknowledge those stalls and tell the accessing
> > thread to carry on.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> [snip]
> >  /*
> > + * Tell the kernel that we've now got some memory it previously asked for.
> > + * Note: We're not allowed to ack a page which wasn't requested.
> > + */
> > +static int ack_userfault(MigrationIncomingState *mis, void *start, size_t len)
> > +{
> > +    uint64_t tmp[2];
> > +
> > +    /*
> > +     * Kernel wants the range that's now safe to access
> > +     * Note it always takes 64bit values, even on a 32bit host.
> > +     */
> > +    tmp[0] = (uint64_t)(uintptr_t)start;
> > +    tmp[1] = (uint64_t)(uintptr_t)start + (uint64_t)len;
> > +
> > +    if (write(mis->userfault_fd, tmp, 16) != 16) {
> > +        int e = errno;
> 
> Is an EOF (i.e. write() returns 0) ever possible here?  If so errno
> may not have a meaningful value.

I don't think so; I think any != 16 case is an error. However, if I understand
correctly, the safe thing to do is for me to set

errno = 0

before the call.
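Here is a minimal sketch of the errno handling being discussed. It is written against a plain file descriptor rather than a real userfaultfd. ack_range() is a hypothetical name, and returning -EIO for a short write is our own choice. errno is only meaningful when write() returns -1, so checking for a negative return first is an alternative to setting errno = 0. The code also copies errno into a local before fprintf() or strerror() can overwrite it:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Write the range [start, start + len) to fd as two 64-bit values, the
 * layout the patch's ack_userfault() uses (even on 32-bit hosts).
 * Returns 0 on success or a negative errno value.
 */
static int ack_range(int fd, void *start, size_t len)
{
    uint64_t tmp[2];
    ssize_t ret;

    tmp[0] = (uint64_t)(uintptr_t)start;
    tmp[1] = tmp[0] + (uint64_t)len;

    ret = write(fd, tmp, sizeof(tmp));
    if (ret == (ssize_t)sizeof(tmp)) {
        return 0;
    }
    if (ret >= 0) {
        /* Short write: write() didn't fail, so errno is not meaningful. */
        fprintf(stderr, "ack_range: short write (%zd bytes)\n", ret);
        return -EIO;
    }

    int e = errno;  /* save it before anything else can clobber it */
    if (e == ENOENT) {
        /* The kernel wasn't waiting for this range (e.g. a duplicate ack). */
        return 0;
    }
    fprintf(stderr, "ack_range: failed for %p/%zx: %s\n", start, len,
            strerror(e));
    return -e;
}
```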

> 
> > +        if (e == ENOENT) {
> > +            /* Kernel said it wasn't waiting - one case where this can
> > +             * happen is where two threads triggered the userfault
> > +             * and we receive the page and ack it just after we received
> > +             * the 2nd request and that ends up deciding it should ack it
> > +             * We could optimise it out, but it's rare.
> > +             */
> > +            /*fprintf(stderr, "ack_userfault: %p/%zx ENOENT\n", start, len); */
> > +            return 0;
> > +        }
> > +        error_report("postcopy_ram: Failed to notify kernel for %p/%zx (%d)",
> > +                     start, len, e);
> > +        return -errno;

Hmm, and I've made that 'return -e'.

> > +/*
> >   * Handle faults detected by the USERFAULT markings
> >   */
> >  static void *postcopy_ram_fault_thread(void *opaque)
> >  {
> >      MigrationIncomingState *mis = (MigrationIncomingState *)opaque;
> > +    void *hostaddr;
> > +    int ret;
> > +    size_t hostpagesize = getpagesize();
> > +    RAMBlock *rb = NULL;
> > +    RAMBlock *last_rb = NULL; /* last RAMBlock we sent part of */
> >  
> > -    fprintf(stderr, "postcopy_ram_fault_thread\n");
> > -    /* TODO: In later patch */
> > +    DPRINTF("%s", __func__);
> >      qemu_sem_post(&mis->fault_thread_sem);
> > -    while (1) {
> > -        /* TODO: In later patch */
> > -    }
> > +    while (true) {
> > +        PostcopyPMIState old_state, tmp_state;
> > +        ram_addr_t rb_offset;
> > +        ram_addr_t in_raspace;
> > +        unsigned long bitmap_index;
> > +        struct pollfd pfd[2];
> > +
> > +        /*
> > +         * We're mainly waiting for the kernel to give us a faulting HVA,
> > +         * however we can be told to quit via userfault_quit_fd which is
> > +         * an eventfd
> > +         */
> > +        pfd[0].fd = mis->userfault_fd;
> > +        pfd[0].events = POLLIN;
> > +        pfd[0].revents = 0;
> > +        pfd[1].fd = mis->userfault_quit_fd;
> > +        pfd[1].events = POLLIN; /* Waiting for eventfd to go positive */
> > +        pfd[1].revents = 0;
> > +
> > +        if (poll(pfd, 2, -1 /* Wait forever */) == -1) {
> > +            perror("userfault poll");
> > +            break;
> > +        }
> >  
> > +        if (pfd[1].revents) {
> > +            DPRINTF("%s got quit event", __func__);
> > +            break;
> 
> I don't see any cleanup path in the userfault thread.  So wouldn't it
> be simpler to just pthread_cancel() it rather than using an extra fd
> for quit notifications.

But it does call functions that take locks (both the pmi and the
return path qemu-file), so I don't feel comfortable just cancelling the
thread.
I guess I could use pthread_setcancelstate() around the top of the loop
to only allow it to cancel there; is that any better than the fd?
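Here is a minimal, Linux-only sketch of the quit-fd pattern the patch uses. The names are hypothetical, and any readable fd stands in for the userfault fd. Unlike pthread_cancel(), this lets the thread stop only at one well-defined point, so it is never torn down while holding the pmi or return-path locks. The EINTR retry is our addition:

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

enum fault_wait { WAIT_FAULT, WAIT_QUIT, WAIT_ERROR };

/* Create the fd used to ask the fault thread to exit. */
static int create_quit_fd(void)
{
    return eventfd(0, EFD_CLOEXEC);
}

/* Make quit_fd readable; an eventfd needs an 8-byte write. */
static int signal_quit(int quit_fd)
{
    uint64_t one = 1;
    ssize_t ret = write(quit_fd, &one, sizeof(one));
    if (ret < 0) {
        return -errno;
    }
    return ret == (ssize_t)sizeof(one) ? 0 : -EIO;
}

/*
 * Block until fault_fd has data or quit_fd becomes readable.
 * Quitting wins if both are ready, matching the order in the patch.
 */
static enum fault_wait wait_for_fault(int fault_fd, int quit_fd)
{
    struct pollfd pfd[2] = {
        { .fd = fault_fd, .events = POLLIN },
        { .fd = quit_fd,  .events = POLLIN },
    };

    for (;;) {
        if (poll(pfd, 2, -1 /* wait forever */) == -1) {
            if (errno == EINTR) {
                continue;  /* interrupted by a signal: just retry */
            }
            return WAIT_ERROR;
        }
        if (pfd[1].revents) {
            return WAIT_QUIT;
        }
        if (pfd[0].revents) {
            return WAIT_FAULT;
        }
    }
}
```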

> > @@ -612,11 +814,12 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
> >  
> >      if (syscall(__NR_remap_anon_pages, host, from, hps, 0) !=
> >              getpagesize()) {
> > +        int e = errno;
> >          perror("remap_anon_pages in postcopy_place_page");
> >          fprintf(stderr, "host: %p from: %p pmi=%d\n", host, from,
> >                  postcopy_pmi_get_state(mis, bitmap_offset));
> >  
> > -        return -errno;
> > +        return -e;
> 
> Unrelated change, should probably be folded into the patch which added
> this code.

Thanks, fixed.

Dave

> 
> >      }
> >  
> >      tmp_state = postcopy_pmi_get_state(mis, bitmap_offset);
> > @@ -629,7 +832,10 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
> >  
> >  
> >      if (old_state == POSTCOPY_PMI_REQUESTED) {
> > -        /* TODO: Notify kernel */
> > +        /* Send the kernel the host address that should now be accessible */
> > +        DPRINTF("%s: Notifying kernel bitmap_offset=0x%lx host=%p",
> > +                __func__, bitmap_offset, host);
> > +        return ack_userfault(mis, host, hps);
> >      }
> >  
> >      return 0;
> 


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v4 37/47] Page request: Consume pages off the post-copy queue
  2014-11-11  1:13   ` David Gibson
@ 2015-01-14 20:13     ` Dr. David Alan Gilbert
  2015-01-27  4:38       ` David Gibson
  0 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2015-01-14 20:13 UTC (permalink / raw)
  To: David Gibson
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Fri, Oct 03, 2014 at 06:47:43PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > When transmitting RAM pages, consume pages that have been queued by
> > MIG_RPCOMM_REQPAGE commands and send them ahead of normal page scanning.
> > 
> > Note:
> >   a) After a queued page the linear walk carries on from after the
> > unqueued page; there is a reasonable chance that the destination
> > was about to ask for other nearby pages anyway.
> > 
> >   b) We have to be careful of any assumptions that the page walking
> > code makes; in particular it takes some shortcuts on its first linear
> > walk that break as soon as we do a queued page.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  arch_init.c | 149 ++++++++++++++++++++++++++++++++++++++++++++++++++----------
> >  1 file changed, 125 insertions(+), 24 deletions(-)
> > 
> > diff --git a/arch_init.c b/arch_init.c
> > index 72f9e17..a945990 100644
> > +
> > +        /*
> > +         * Don't break host-page chunks up with queue items
> 
> Why does this matter?

See the comment you make a few patches later; it's about being able
to place the pages atomically on the destination.
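Here is the unqueue condition from the hunk above, pulled out as a standalone predicate with hypothetical names. Host pages are hps bytes and target pages are tps bytes. The walk may switch to a queued page only when the last page it sent came from the queue, when nothing has been sent yet, or when the last page was the final target page of its host page:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Only switch to a queued (requested) page at a host-page boundary, so a
 * host page is never split between the linear walk and the queue.
 * hps and tps are powers of two with hps >= tps.
 */
static bool may_unqueue(bool last_was_from_queue, bool have_last_block,
                        uint64_t last_offset, uint64_t hps, uint64_t tps)
{
    assert(hps >= tps && (hps & (hps - 1)) == 0);
    return last_was_from_queue || !have_last_block ||
           (last_offset & (hps - 1)) == hps - tps;
}
```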

> > +         * so only unqueue if,
> > +         *   a) The last item came from the queue anyway
> > +         *   b) The last sent item was the last target-page in a host page
> > +         */
> > +        if (last_was_from_queue || (!last_sent_block) ||
> > +            ((last_offset & (hps - 1)) == (hps - TARGET_PAGE_SIZE))) {
> > +            tmpblock = ram_save_unqueue_page(ms, &tmpoffset, &bitoffset);
> >          }
> > -        if (offset >= block->length) {
> > -            offset = 0;
> > -            block = QTAILQ_NEXT(block, next);
> > -            if (!block) {
> > -                block = QTAILQ_FIRST(&ram_list.blocks);
> > -                complete_round = true;
> > -                ram_bulk_stage = false;
> > +
> > +        if (tmpblock) {
> > +            /* We've got a block from the postcopy queue */
> > +            DPRINTF("%s: Got postcopy item '%s' offset=%zx bitoffset=%zx",
> > +                    __func__, tmpblock->idstr, tmpoffset, bitoffset);
> > +            /* We're sending this page, and since it's postcopy nothing else
> > +             * will dirty it, and we must make sure it doesn't get sent again.
> > +             */
> > +            if (!migration_bitmap_clear_dirty(bitoffset << TARGET_PAGE_BITS)) {
> 
> Ugh.. that's kind of subtle.  I think it would be clearer if you work
> in terms of a ram_addr_t throughout, rather than "bitoffset" whose
> meaning is not terribly obvious.

I've changed it to ram_addr_t as requested; it's slightly clearer, but there
are a few places where we're dealing with the sentmap where we now need to shift
the other way.  In the end ram_addr_t is really a scaled offset into those
bitmaps.
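A ram_addr_t here is just the bitmap index scaled by the target page size. The sketch below shows the two conversions. It assumes a 4 KiB target page (TARGET_PAGE_BITS = 12) and uses a local ram_addr_t stand-in. Both are assumptions for illustration; QEMU's own definitions depend on the target and build:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 4 KiB target page; QEMU's real value depends on the target. */
#define TARGET_PAGE_BITS 12

typedef uint64_t ram_addr_t;  /* local stand-in for QEMU's typedef */

/* Bit index in the migration/sent bitmaps -> offset in ram_addr_t space. */
static ram_addr_t bit_to_ram_addr(uint64_t bit)
{
    return (ram_addr_t)bit << TARGET_PAGE_BITS;
}

/* And back: one bit per target page, so any offset within a page maps to it. */
static uint64_t ram_addr_to_bit(ram_addr_t addr)
{
    return addr >> TARGET_PAGE_BITS;
}
```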

Dave

> 


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v4 39/47] postcopy_ram.c: place_page and helpers
  2014-11-11  1:39   ` David Gibson
@ 2015-01-15 18:14     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2015-01-15 18:14 UTC (permalink / raw)
  To: David Gibson
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Fri, Oct 03, 2014 at 06:47:45PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > postcopy_place_page (etc) provide a way for postcopy to place a page
> > into guests memory atomically (using the new remap_anon_pages syscall).
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  include/migration/migration.h    |   2 +
> >  include/migration/postcopy-ram.h |  23 +++++++
> >  postcopy-ram.c                   | 145 ++++++++++++++++++++++++++++++++++++++-
> >  3 files changed, 168 insertions(+), 2 deletions(-)
> > 
> > diff --git a/include/migration/migration.h b/include/migration/migration.h
> > index 5bc01d5..58ac7bf 100644
> > --- a/include/migration/migration.h
> > +++ b/include/migration/migration.h
> > @@ -96,6 +96,8 @@ struct MigrationIncomingState {
> >      QEMUFile *return_path;
> >      QemuMutex      rp_mutex;    /* We send replies from multiple threads */
> >      PostcopyPMI    postcopy_pmi;
> > +    void          *postcopy_tmp_page;
> > +    long           postcopy_place_skipped; /* Check for incorrect place ops */
> >  };
> >  
> >  MigrationIncomingState *migration_incoming_get_current(void);
> > diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
> > index 413b670..0210491 100644
> > --- a/include/migration/postcopy-ram.h
> > +++ b/include/migration/postcopy-ram.h
> > @@ -80,4 +80,27 @@ void postcopy_discard_send_chunk(MigrationState *ms, PostcopyDiscardState *pds,
> >  void postcopy_discard_send_finish(MigrationState *ms,
> >                                    PostcopyDiscardState *pds);
> >  
> > +/*
> > + * Place a zero'd page of memory at *host
> > + * returns 0 on success
> > + */
> > +int postcopy_place_zero_page(MigrationIncomingState *mis, void *host,
> > +                             long bitmap_offset);
> > +
> > +/*
> > + * Place a page (from) at (host) efficiently
> > + *    There are restrictions on how 'from' must be mapped, in general best
> > + *    to use other postcopy_ routines to allocate.
> > + * returns 0 on success
> > + */
> > +int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
> > +                        long bitmap_offset);
> > +
> > +/*
> > + * Allocate a page of memory that can be mapped at a later point in time
> > + * using postcopy_place_page
> > + * Returns: Pointer to allocated page
> > + */
> > +void *postcopy_get_tmp_page(MigrationIncomingState *mis, long bitmap_offset);
> > +
> >  #endif
> > diff --git a/postcopy-ram.c b/postcopy-ram.c
> > index 8b2a035..19d4b20 100644
> > --- a/postcopy-ram.c
> > +++ b/postcopy-ram.c
> > @@ -229,7 +229,6 @@ static PostcopyPMIState postcopy_pmi_get_state_nolock(
> >  }
> >  
> >  /* Retrieve the state of the given page */
> > -__attribute__ (( unused )) /* Until later in patch series */
> >  static PostcopyPMIState postcopy_pmi_get_state(MigrationIncomingState *mis,
> >                                                 size_t bitmap_index)
> >  {
> > @@ -245,7 +244,6 @@ static PostcopyPMIState postcopy_pmi_get_state(MigrationIncomingState *mis,
> >   * Set the page state to the given state if the previous state was as expected
> >   * Return the actual previous state.
> >   */
> > -__attribute__ (( unused )) /* Until later in patch series */
> >  static PostcopyPMIState postcopy_pmi_change_state(MigrationIncomingState *mis,
> >                                             size_t bitmap_index,
> >                                             PostcopyPMIState expected_state,
> > @@ -464,6 +462,7 @@ static int cleanup_area(const char *block_name, void *host_addr,
> >  int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages)
> >  {
> >      postcopy_pmi_init(mis, ram_pages);
> > +    mis->postcopy_place_skipped = -1;
> >  
> >      if (qemu_ram_foreach_block(init_area, mis)) {
> >          return -1;
> > @@ -482,6 +481,10 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
> >          return -1;
> >      }
> >  
> > +    if (mis->postcopy_tmp_page) {
> > +        munmap(mis->postcopy_tmp_page, getpagesize());
> > +        mis->postcopy_tmp_page = NULL;
> > +    }
> >      return 0;
> >  }
> >  
> > @@ -551,6 +554,126 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
> >      return 0;
> >  }
> >  
> > +/*
> > + * Place a zero'd page of memory at *host
> > + * returns 0 on success
> > + * bitmap_offset: Index into the migration bitmaps
> > + */
> > +int postcopy_place_zero_page(MigrationIncomingState *mis, void *host,
> > +                             long bitmap_offset)
> > +{
> > +    void *tmp = postcopy_get_tmp_page(mis, bitmap_offset);
> > +    if (!tmp) {
> > +        return -ENOMEM;
> > +    }
> > +    *(char *)tmp = 0;
> > +    return postcopy_place_page(mis, host, tmp, bitmap_offset);
> > +}
> > +
> > +/*
> > + * Place a target page (from) at (host) efficiently
> > + *    There are restrictions on how 'from' must be mapped, in general best
> > + *    to use other postcopy_ routines to allocate.
> > + * returns 0 on success
> > + * bitmap_offset: Index into the migration bitmaps
> > + *
> > + * Where HPS > TPS it holds off doing the place until the last TP in the HP
> > + *  and assumes (from, host) point to the last TP in a continuous HP
> > + */
> > +int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
> > +                        long bitmap_offset)
> > +{
> > +    PostcopyPMIState old_state, tmp_state;
> > +    size_t hps = sysconf(_SC_PAGESIZE);
> > +
> > +    /* Only place the page when the last target page within the hp arrives */
> > +    if ((bitmap_offset + 1) & (mis->postcopy_pmi.host_bits - 1)) {
> > +        DPRINTF("%s: Skipping incomplete hp host=%p from=%p bitmap_offset=%lx",
> > +                __func__, host, from, bitmap_offset);
> > +        mis->postcopy_place_skipped = bitmap_offset;
> > +        return 0;
> > +    }
> > +
> > +    /*
> > +     * If we skip a page (above) we should end up placing that page before
> > +     * doing anything with other host pages.
> > +     */
> > +    if (mis->postcopy_place_skipped != -1) {
> > +        assert((bitmap_offset & ~(mis->postcopy_pmi.host_bits - 1)) ==
> > +               (mis->postcopy_place_skipped &
> > +                ~(mis->postcopy_pmi.host_bits - 1)));
> > +    }
> > +    mis->postcopy_place_skipped = -1;
> 
> All the above logic seems like you're making assumptions about exactly
> how this function will be invoked which are fragile and a layering
> violation.
> 
> It seems like these lower level functions should work only in host
> pages and have the target->host page consolidation up in the protocol
> handling layer.  Better yet would be to build it into the protocol
> itself that requests made by the destination (in destination host page
> chunks) should be answered by the source as a unit, to avoid the
> hassle of splitting and recombining host pages.

I've reworked place_page to push this logic up into its callers in
ram_load; the sending side ensures that it meets these requirements
when in postcopy mode.
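The following sketch illustrates the host-page arithmetic involved. It is not the reworked code, which isn't shown here. Suppose there are host_bits target pages per host page, and host_bits is a power of two. Then the last target page of a host page is the one for which `((index + 1) & (host_bits - 1)) == 0`. This is the same test the quoted postcopy_place_page makes on bitmap_offset. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if target-page index tp_index is the last
 * target page inside its host page.  host_bits = host page size /
 * target page size, assumed to be a non-zero power of two. */
static bool tp_is_last_in_hp(size_t tp_index, size_t host_bits)
{
    return ((tp_index + 1) & (host_bits - 1)) == 0;
}

/* Hypothetical helper: index of the first target page of the same host
 * page, i.e. tp_index rounded down to a multiple of host_bits. */
static size_t tp_hp_start(size_t tp_index, size_t host_bits)
{
    return tp_index & ~(host_bits - 1);
}
```

For example, with 16 target pages per host page, index 15 completes the host page that starts at index 0. Index 17 belongs to the host page starting at 16.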


<snip>

> > +
> > +/*
> > + * Returns a target page of memory that can be mapped at a later point in time
> > + * using postcopy_place_page
> > + * The same address is used repeatedly, postcopy_place_page just takes the
> > + * backing page away.
> 
> The same address might be re-used, but I don't see anything that
> actually makes that happen.

The virtual allocation is never freed, so once it has been allocated
the address returned here is always the same.  The physical page behind
it does churn, though, as each one is put into place.
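Here is a minimal sketch of the allocation pattern described above. It uses a hypothetical state struct in place of MigrationIncomingState. The scratch page is mmap'd on first use, and the same virtual address is handed out on every later call. The sketch deliberately leaves out the part where placing the page moves the physical backing away, since that depends on the kernel interface the series uses.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for MAP_ANONYMOUS under strict -std modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-in for the relevant part of MigrationIncomingState. */
struct tmp_page_state {
    void *tmp_page;            /* NULL until first use */
};

static size_t host_page_size(void)
{
    return (size_t)sysconf(_SC_PAGESIZE);
}

/* Return a host-page-sized scratch buffer.  The virtual range is mapped
 * once and the same address is returned on every later call. */
static void *get_tmp_page(struct tmp_page_state *s)
{
    if (!s->tmp_page) {
        void *p = mmap(NULL, host_page_size(), PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
            return NULL;
        }
        s->tmp_page = p;
    }
    return s->tmp_page;
}

/* Unmap the range, as the cleanup path in the patch does. */
static void put_tmp_page(struct tmp_page_state *s)
{
    if (s->tmp_page) {
        munmap(s->tmp_page, host_page_size());
        s->tmp_page = NULL;
    }
}
```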

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 204+ messages in thread

* Re: [Qemu-devel] [PATCH v4 44/47] Postcopy; Handle userfault requests
  2015-01-05 17:13     ` Dr. David Alan Gilbert
@ 2015-01-27  4:33       ` David Gibson
  0 siblings, 0 replies; 204+ messages in thread
From: David Gibson @ 2015-01-27  4:33 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On Mon, Jan 05, 2015 at 05:13:50PM +0000, Dr. David Alan Gilbert wrote:
> * David Gibson (david@gibson.dropbear.id.au) wrote:
> > On Fri, Oct 03, 2014 at 06:47:50PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > 
> > > userfaultfd is a Linux syscall that gives an fd that receives a stream
> > > of notifications of accesses to pages marked as MADV_USERFAULT, and
> > > allows the program to acknowledge those stalls and tell the accessing
> > > thread to carry on.
> > > 
> > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > 
> > [snip]
> > >  /*
> > > + * Tell the kernel that we've now got some memory it previously asked for.
> > > + * Note: We're not allowed to ack a page which wasn't requested.
> > > + */
> > > +static int ack_userfault(MigrationIncomingState *mis, void *start, size_t len)
> > > +{
> > > +    uint64_t tmp[2];
> > > +
> > > +    /*
> > > +     * Kernel wants the range that's now safe to access
> > > +     * Note it always takes 64bit values, even on a 32bit host.
> > > +     */
> > > +    tmp[0] = (uint64_t)(uintptr_t)start;
> > > +    tmp[1] = (uint64_t)(uintptr_t)start + (uint64_t)len;
> > > +
> > > +    if (write(mis->userfault_fd, tmp, 16) != 16) {
> > > +        int e = errno;
> > 
> > Is an EOF (i.e. write() returns 0) ever possible here?  If so errno
> > may not have a meaningful value.
> 
> I don't think so; I think any !=16 case is an error.  However, if I
> understand correctly, the safe thing to do is to set
> 
> errno = 0
> 
> before the call.

Either that, or handle unexpected EOF / short write as a different case.

> 
> > 
> > > +        if (e == ENOENT) {
> > > +            /* Kernel said it wasn't waiting - one case where this can
> > > +             * happen is where two threads triggered the userfault
> > > +             * and we receive the page and ack it just after we received
> > > +             * the 2nd request and that ends up deciding it should ack it
> > > +             * We could optimise it out, but it's rare.
> > > +             */
> > > +            /*fprintf(stderr, "ack_userfault: %p/%zx ENOENT\n", start, len); */
> > > +            return 0;
> > > +        }
> > > +        error_report("postcopy_ram: Failed to notify kernel for %p/%zx (%d)",
> > > +                     start, len, e);
> > > +        return -errno;
> 
> Hmm, and changed that to    return -e

Ah, yes, otherwise it's very likely that error_report() will clobber
the value.
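The sketch below shows the two fixes discussed above. It uses a hypothetical send_range() over an ordinary file descriptor rather than the userfault fd. First, errno is copied immediately after write(), so a later logging call cannot clobber it. Second, a short write is reported as its own case, because errno is not meaningful there. The 16-byte record of two uint64_t values mirrors the quoted ack_userfault.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write a [start, start + len) range record to fd.
 * Returns 0 on success or a negative errno-style value. */
static int send_range(int fd, uint64_t start, uint64_t len)
{
    uint64_t rec[2] = { start, start + len };
    ssize_t n = write(fd, rec, sizeof(rec));

    if (n == (ssize_t)sizeof(rec)) {
        return 0;
    }
    if (n < 0) {
        int e = errno;                 /* save before any other call */
        fprintf(stderr, "send_range: write failed (%d)\n", e);
        return -e;
    }
    /* Short write: errno was not set, so report a fixed error instead. */
    fprintf(stderr, "send_range: short write (%zd bytes)\n", n);
    return -EIO;
}
```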

> > > +/*
> > >   * Handle faults detected by the USERFAULT markings
> > >   */
> > >  static void *postcopy_ram_fault_thread(void *opaque)
> > >  {
> > >      MigrationIncomingState *mis = (MigrationIncomingState *)opaque;
> > > +    void *hostaddr;
> > > +    int ret;
> > > +    size_t hostpagesize = getpagesize();
> > > +    RAMBlock *rb = NULL;
> > > +    RAMBlock *last_rb = NULL; /* last RAMBlock we sent part of */
> > >  
> > > -    fprintf(stderr, "postcopy_ram_fault_thread\n");
> > > -    /* TODO: In later patch */
> > > +    DPRINTF("%s", __func__);
> > >      qemu_sem_post(&mis->fault_thread_sem);
> > > -    while (1) {
> > > -        /* TODO: In later patch */
> > > -    }
> > > +    while (true) {
> > > +        PostcopyPMIState old_state, tmp_state;
> > > +        ram_addr_t rb_offset;
> > > +        ram_addr_t in_raspace;
> > > +        unsigned long bitmap_index;
> > > +        struct pollfd pfd[2];
> > > +
> > > +        /*
> > > +         * We're mainly waiting for the kernel to give us a faulting HVA,
> > > +         * however we can be told to quit via userfault_quit_fd which is
> > > +         * an eventfd
> > > +         */
> > > +        pfd[0].fd = mis->userfault_fd;
> > > +        pfd[0].events = POLLIN;
> > > +        pfd[0].revents = 0;
> > > +        pfd[1].fd = mis->userfault_quit_fd;
> > > +        pfd[1].events = POLLIN; /* Waiting for eventfd to go positive */
> > > +        pfd[1].revents = 0;
> > > +
> > > +        if (poll(pfd, 2, -1 /* Wait forever */) == -1) {
> > > +            perror("userfault poll");
> > > +            break;
> > > +        }
> > >  
> > > +        if (pfd[1].revents) {
> > > +            DPRINTF("%s got quit event", __func__);
> > > +            break;
> > 
> > I don't see any cleanup path in the userfault thread.  So wouldn't it
> > be simpler to just pthread_cancel() it rather than using an extra fd
> > for quit notifications.
> 
> But it does call functions that take locks (both the pmi and the
> return path qemu-file), so I don't feel comfortable just cancelling the
> thread.

Ah, good point.  Using an event restricts the points at which the
thread can exit, which is significant.
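Here is a minimal sketch of the quit-fd pattern. It assumes Linux (for eventfd) and uses hypothetical helper names. The thread only checks for quit at its single poll() call. It therefore never exits while holding one of the locks taken later in the loop body, as it could if pthread_cancel() acted at whatever cancellation point the thread happened to reach.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

enum wait_result { WAIT_WORK, WAIT_QUIT, WAIT_ERROR };

/* Hypothetical helper: block until work_fd is readable or quit_fd (an
 * eventfd) has been signalled.  This is the only place the loop looks
 * at quit_fd, so it is the only point at which the thread can exit. */
static enum wait_result wait_for_work_or_quit(int work_fd, int quit_fd)
{
    struct pollfd pfd[2] = {
        { .fd = work_fd, .events = POLLIN },
        { .fd = quit_fd, .events = POLLIN },
    };

    if (poll(pfd, 2, -1 /* wait forever */) == -1) {
        return WAIT_ERROR;
    }
    if (pfd[1].revents) {
        return WAIT_QUIT;      /* checked first, so quitting wins */
    }
    return WAIT_WORK;
}

/* Make the eventfd counter non-zero so a waiter returns WAIT_QUIT. */
static int request_quit(int quit_fd)
{
    uint64_t one = 1;
    return write(quit_fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}
```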

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH v4 37/47] Page request: Consume pages off the post-copy queue
  2015-01-14 20:13     ` Dr. David Alan Gilbert
@ 2015-01-27  4:38       ` David Gibson
  2015-01-27  9:40         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 204+ messages in thread
From: David Gibson @ 2015-01-27  4:38 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On Wed, Jan 14, 2015 at 08:13:27PM +0000, Dr. David Alan Gilbert wrote:
> * David Gibson (david@gibson.dropbear.id.au) wrote:
> > On Fri, Oct 03, 2014 at 06:47:43PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > 
> > > When transmitting RAM pages, consume pages that have been queued by
> > > MIG_RPCOMM_REQPAGE commands and send them ahead of normal page scanning.
> > > 
> > > Note:
> > >   a) After a queued page the linear walk carries on from after the
> > > unqueued page; there is a reasonable chance that the destination
> > > was about to ask for other nearby pages anyway.
> > > 
> > >   b) We have to be careful of any assumptions that the page walking
> > > code makes, in particular it does some short cuts on its first linear
> > > walk that break as soon as we do a queued page.
> > > 
> > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > ---
> > >  arch_init.c | 149 ++++++++++++++++++++++++++++++++++++++++++++++++++----------
> > >  1 file changed, 125 insertions(+), 24 deletions(-)
> > > 
> > > diff --git a/arch_init.c b/arch_init.c
> > > index 72f9e17..a945990 100644
> > > +
> > > +        /*
> > > +         * Don't break host-page chunks up with queue items
> > 
> > Why does this matter?
> 
> See the comment you make a few patches later; it's about being able
> to place the pages atomically on the destination.

Hmm.  But if the destination has to wait for all the pieces of a host
page to arrive anyway, does it really make any difference if they're
contiguous in the stream?

> > > +         * so only unqueue if,
> > > +         *   a) The last item came from the queue anyway
> > > +         *   b) The last sent item was the last target-page in a host page
> > > +         */
> > > +        if (last_was_from_queue || (!last_sent_block) ||
> > > +            ((last_offset & (hps - 1)) == (hps - TARGET_PAGE_SIZE))) {
> > > +            tmpblock = ram_save_unqueue_page(ms, &tmpoffset, &bitoffset);
> > >          }
> > > -        if (offset >= block->length) {
> > > -            offset = 0;
> > > -            block = QTAILQ_NEXT(block, next);
> > > -            if (!block) {
> > > -                block = QTAILQ_FIRST(&ram_list.blocks);
> > > -                complete_round = true;
> > > -                ram_bulk_stage = false;
> > > +
> > > +        if (tmpblock) {
> > > +            /* We've got a block from the postcopy queue */
> > > +            DPRINTF("%s: Got postcopy item '%s' offset=%zx bitoffset=%zx",
> > > +                    __func__, tmpblock->idstr, tmpoffset, bitoffset);
> > > +            /* We're sending this page, and since it's postcopy nothing else
> > > +             * will dirty it, and we must make sure it doesn't get sent again.
> > > +             */
> > > +            if (!migration_bitmap_clear_dirty(bitoffset << TARGET_PAGE_BITS)) {
> > 
> > Ugh.. that's kind of subtle.  I think it would be clearer if you work
> > in terms of a ram_addr_t throughout, rather than "bitoffset" whose
> > meaning is not terribly obvious.
> 
> I've changed it to ram_addr_t as requested; it's slightly clearer but there
> are a few places where we're dealing with the sentmap where we now need to shift
> the other way.  In the end ram_addr_t is really a scaled offset into those
> bitmaps.

Right, but to someone who isn't deeply familiar with the code, they're
more likely to understand what the ram address means than the bitmap
offset.
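The sketch below shows the two conversions involved. It assumes one bitmap bit per target page, as implied by the quoted `bitoffset << TARGET_PAGE_BITS`. The 4 KiB page shift and the helper names are assumptions for illustration; in QEMU, TARGET_PAGE_BITS is fixed per target.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value for illustration only: 4 KiB target pages. */
#define SKETCH_TARGET_PAGE_BITS 12

typedef uint64_t sketch_ram_addr_t;    /* stand-in for ram_addr_t */

/* One bit per target page: the bitmap index is the RAM address scaled
 * down by the target page size (any offset within the page is dropped). */
static inline unsigned long ram_addr_to_bit(sketch_ram_addr_t addr)
{
    return (unsigned long)(addr >> SKETCH_TARGET_PAGE_BITS);
}

/* And back again: the address of the start of that target page. */
static inline sketch_ram_addr_t bit_to_ram_addr(unsigned long bit)
{
    return (sketch_ram_addr_t)bit << SKETCH_TARGET_PAGE_BITS;
}
```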

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH v4 43/47] Host page!=target page: Cleanup bitmaps
  2014-12-17 18:21     ` Dr. David Alan Gilbert
@ 2015-01-27  4:50       ` David Gibson
  2015-01-27 10:04         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 204+ messages in thread
From: David Gibson @ 2015-01-27  4:50 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

On Wed, Dec 17, 2014 at 06:21:34PM +0000, Dr. David Alan Gilbert wrote:
> * David Gibson (david@gibson.dropbear.id.au) wrote:
> > On Fri, Oct 03, 2014 at 06:47:49PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > 
> > > Prior to the start of postcopy, ensure that everything that will
> > > be transferred later is a whole host-page in size.
> > > 
> > > This is accomplished by discarding partially transferred host pages
> > > and marking any that are partially dirty as fully dirty.
> > > 
> > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > ---
> > >  arch_init.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> > >  1 file changed, 111 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/arch_init.c b/arch_init.c
> > > index 1fe4fab..aac250c 100644
> > > --- a/arch_init.c
> > > +++ b/arch_init.c
> > > @@ -1024,7 +1024,6 @@ static uint32_t get_32bits_map(unsigned long *map, int64_t start)
> > >   * A helper to put 32 bits into a bit map; trivial for HOST_LONG_BITS=32
> > >   * messier for 64; the bitmaps are actually long's that are 32 or 64bit
> > >   */
> > > -__attribute__ (( unused )) /* Until later in patch series */
> > >  static void put_32bits_map(unsigned long *map, int64_t start,
> > >                             uint32_t v)
> > >  {
> > > @@ -1153,15 +1152,126 @@ static int pc_each_ram_discard(MigrationState *ms)
> > >  }
> > >  
> > >  /*
> > > + * Utility for the outgoing postcopy code.
> > > + *
> > > + * Discard any partially sent host-page size chunks, mark any partially
> > > + * dirty host-page size chunks as all dirty.
> > > + *
> > > + * Returns: 0 on success
> > > + */
> > > +static int postcopy_chunk_hostpages(MigrationState *ms)
> > > +{
> > > +    struct RAMBlock *block;
> > > +    unsigned int host_bits = sysconf(_SC_PAGESIZE) / TARGET_PAGE_SIZE;
> > > +    uint32_t host_mask;
> > > +
> > > +    /* Should be a power of 2 */
> > > +    assert(host_bits && !(host_bits & (host_bits - 1)));
> > > +    /*
> > > +     * If the host_bits isn't a division of 32 (the minimum long size)
> > > +     * then the code gets a lot more complex; disallow for now
> > > +     * (I'm not aware of a system where it's true anyway)
> > > +     */
> > > +    assert((32 % host_bits) == 0);
> > 
> > This assert makes the first one redundant.
> 
> True I guess, removed the power of 2 check.
> 
> <snip>
> 
> > > +/*
> > >   * Transmit the set of pages to be discarded after precopy to the target
> > >   * these are pages that have been sent previously but have been dirtied
> > >   * Hopefully this is pretty sparse
> > >   */
> > >  int ram_postcopy_send_discard_bitmap(MigrationState *ms)
> > >  {
> > > +    int ret;
> > > +
> > >      /* This should be our last sync, the src is now paused */
> > >      migration_bitmap_sync();
> > >  
> > > +    /* Deal with TPS != HPS */
> > > +    ret = postcopy_chunk_hostpages(ms);
> > > +    if (ret) {
> > > +        return ret;
> > > +    }
> > 
> > This really seems like a bogus thing to be doing on the outgoing
> > migration side.  Doesn't the host page size constraint come from the
> > destination (due to the need to atomically instate pages)?  Source
> > host page size == destination host page size doesn't seem like it
> > should be an inherent constraint
> 
> It's not an inherent constraint; it just makes life messier. I had
> some code to deal with it but it complicates things even more, and
> I've not got anything to test that rare case with; if someone is
> desperate for it then it can be added.

So, I'm all for deferring implementation improvements that we don't
need for the time being.

What worries me though, is having the source have to make assumptions
about how the migration stream will be processed on the destination
that aren't somehow baked into the protocol itself.  i.e. I think we
should really try to avoid the possibility of migration streams that
are structurally sound, and look like they should be valid, but
aren't, because of subtle constraints in the order and manner in which
the destination needs to process the individual chunks.

> > and it's not clear why you can't do
> > this rounding out to host page sized chunks on the receive end.
> 
> The source keeps track of which pages still need sending, and so
> has to update that list when it tells the destination to perform
> a discard.

Ah.

> If the destination discards more than the source told it to (for
> example because it has bigger host-pages) the source would need
> to update its map of the pages that still need sending.

I'm beginning to wonder if what we really need is for early in the
migration process the destination to tell the source what granularity of
updates it can handle (based on its page size).

Perhaps the short summary is that I don't think we need to actually
handle the case of different source and dest host page sizes.  BUT,
if that does happen the migration process should be able to detect
that that's what's gone wrong and print a meaningful error, rather
than having the destination blow up partway through, deep in the code
because chunk constraints necessary for the dest host page size
haven't been met by the source.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH v4 37/47] Page request: Consume pages off the post-copy queue
  2015-01-27  4:38       ` David Gibson
@ 2015-01-27  9:40         ` Dr. David Alan Gilbert
  2015-01-28  5:33           ` David Gibson
  0 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2015-01-27  9:40 UTC (permalink / raw)
  To: David Gibson
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Wed, Jan 14, 2015 at 08:13:27PM +0000, Dr. David Alan Gilbert wrote:
> > * David Gibson (david@gibson.dropbear.id.au) wrote:
> > > On Fri, Oct 03, 2014 at 06:47:43PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > 
> > > > When transmitting RAM pages, consume pages that have been queued by
> > > > MIG_RPCOMM_REQPAGE commands and send them ahead of normal page scanning.
> > > > 
> > > > Note:
> > > >   a) After a queued page the linear walk carries on from after the
> > > > unqueued page; there is a reasonable chance that the destination
> > > > was about to ask for other nearby pages anyway.
> > > > 
> > > >   b) We have to be careful of any assumptions that the page walking
> > > > code makes, in particular it does some short cuts on its first linear
> > > > walk that break as soon as we do a queued page.
> > > > 
> > > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > > ---
> > > >  arch_init.c | 149 ++++++++++++++++++++++++++++++++++++++++++++++++++----------
> > > >  1 file changed, 125 insertions(+), 24 deletions(-)
> > > > 
> > > > diff --git a/arch_init.c b/arch_init.c
> > > > index 72f9e17..a945990 100644
> > > > +
> > > > +        /*
> > > > +         * Don't break host-page chunks up with queue items
> > > 
> > > Why does this matter?
> > 
> > See the comment you make a few patches later; it's about being able
> > to place the pages atomically on the destination.
> 
> Hmm.  But if the destination has to wait for all the pieces of a host
> page to arrive anyway, does it really make any difference if they're
> contiguous in the stream?

The problem is knowing where to put the arriving target pages until you've
got a full host page: each arriving TP has to go into a temporary until you
have the full set.  If they're not contiguous in the stream, you need
multiple temporaries to handle all the outstanding host pages that are
still incomplete, and you still have to be careful on the sending side to
keep a bounded number of host pages in flight at any time.  Making that
bound 1 makes the code simpler.
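Here is a sketch of the bound-of-one approach. It assumes 1 KiB target pages, four per host page, and uses hypothetical names. A single buffer collects the target pages of the current host page. The assertion encodes the sender-side guarantee that a new host page is not started until the previous one is complete.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed sizes for illustration: 1 KiB target pages, 4 per host page. */
#define SK_TPS        1024u
#define SK_HOST_BITS  4u
#define SK_HPS        (SK_TPS * SK_HOST_BITS)

/* The single host page being assembled. */
struct hp_assembler {
    unsigned char buf[SK_HPS];
    size_t next_tp;            /* next expected target-page index */
};

/* Copy one target page into its slot.  Returns true once the last
 * target page of the host page has arrived and buf can be placed. */
static bool hp_add_target_page(struct hp_assembler *a, size_t tp_index,
                               const unsigned char *data)
{
    size_t slot = tp_index & (SK_HOST_BITS - 1);

    /* A new host page may only start once the previous one is complete;
     * within a host page, target pages must arrive contiguously. */
    assert(slot == 0 ? (a->next_tp & (SK_HOST_BITS - 1)) == 0
                     : tp_index == a->next_tp);

    memcpy(a->buf + slot * SK_TPS, data, SK_TPS);
    a->next_tp = tp_index + 1;
    return slot == SK_HOST_BITS - 1;
}
```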

> > > > +         * so only unqueue if,
> > > > +         *   a) The last item came from the queue anyway
> > > > +         *   b) The last sent item was the last target-page in a host page
> > > > +         */
> > > > +        if (last_was_from_queue || (!last_sent_block) ||
> > > > +            ((last_offset & (hps - 1)) == (hps - TARGET_PAGE_SIZE))) {
> > > > +            tmpblock = ram_save_unqueue_page(ms, &tmpoffset, &bitoffset);
> > > >          }
> > > > -        if (offset >= block->length) {
> > > > -            offset = 0;
> > > > -            block = QTAILQ_NEXT(block, next);
> > > > -            if (!block) {
> > > > -                block = QTAILQ_FIRST(&ram_list.blocks);
> > > > -                complete_round = true;
> > > > -                ram_bulk_stage = false;
> > > > +
> > > > +        if (tmpblock) {
> > > > +            /* We've got a block from the postcopy queue */
> > > > +            DPRINTF("%s: Got postcopy item '%s' offset=%zx bitoffset=%zx",
> > > > +                    __func__, tmpblock->idstr, tmpoffset, bitoffset);
> > > > +            /* We're sending this page, and since it's postcopy nothing else
> > > > +             * will dirty it, and we must make sure it doesn't get sent again.
> > > > +             */
> > > > +            if (!migration_bitmap_clear_dirty(bitoffset << TARGET_PAGE_BITS)) {
> > > 
> > > Ugh.. that's kind of subtle.  I think it would be clearer if you work
> > > in terms of a ram_addr_t throughout, rather than "bitoffset" whose
> > > meaning is not terribly obvious.
> > 
> > I've changed it to ram_addr_t as requested; it's slightly clearer but there
> > are a few places where we're dealing with the sentmap where we now need to shift
> > the other way.  In the end ram_addr_t is really a scaled offset into those
> > bitmaps.
> 
> Right, but to someone who isn't deeply familiar with the code, they're
> more likely to understand what the ram address means than the bitmap
> offset.

Fair enough.

Dave

> 
> -- 
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v4 43/47] Host page!=target page: Cleanup bitmaps
  2015-01-27  4:50       ` David Gibson
@ 2015-01-27 10:04         ` Dr. David Alan Gilbert
  2015-01-28  5:36           ` David Gibson
  0 siblings, 1 reply; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2015-01-27 10:04 UTC (permalink / raw)
  To: David Gibson
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Wed, Dec 17, 2014 at 06:21:34PM +0000, Dr. David Alan Gilbert wrote:
> > * David Gibson (david@gibson.dropbear.id.au) wrote:
> > > On Fri, Oct 03, 2014 at 06:47:49PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > 
> > > > Prior to the start of postcopy, ensure that everything that will
> > > > be transferred later is a whole host-page in size.
> > > > 
> > > > This is accomplished by discarding partially transferred host pages
> > > > and marking any that are partially dirty as fully dirty.
> > > > 
> > > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > > ---
> > > >  arch_init.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> > > >  1 file changed, 111 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/arch_init.c b/arch_init.c
> > > > index 1fe4fab..aac250c 100644
> > > > --- a/arch_init.c
> > > > +++ b/arch_init.c
> > > > @@ -1024,7 +1024,6 @@ static uint32_t get_32bits_map(unsigned long *map, int64_t start)
> > > >   * A helper to put 32 bits into a bit map; trivial for HOST_LONG_BITS=32
> > > >   * messier for 64; the bitmaps are actually long's that are 32 or 64bit
> > > >   */
> > > > -__attribute__ (( unused )) /* Until later in patch series */
> > > >  static void put_32bits_map(unsigned long *map, int64_t start,
> > > >                             uint32_t v)
> > > >  {
> > > > @@ -1153,15 +1152,126 @@ static int pc_each_ram_discard(MigrationState *ms)
> > > >  }
> > > >  
> > > >  /*
> > > > + * Utility for the outgoing postcopy code.
> > > > + *
> > > > + * Discard any partially sent host-page size chunks, mark any partially
> > > > + * dirty host-page size chunks as all dirty.
> > > > + *
> > > > + * Returns: 0 on success
> > > > + */
> > > > +static int postcopy_chunk_hostpages(MigrationState *ms)
> > > > +{
> > > > +    struct RAMBlock *block;
> > > > +    unsigned int host_bits = sysconf(_SC_PAGESIZE) / TARGET_PAGE_SIZE;
> > > > +    uint32_t host_mask;
> > > > +
> > > > +    /* Should be a power of 2 */
> > > > +    assert(host_bits && !(host_bits & (host_bits - 1)));
> > > > +    /*
> > > > +     * If the host_bits isn't a division of 32 (the minimum long size)
> > > > +     * then the code gets a lot more complex; disallow for now
> > > > +     * (I'm not aware of a system where it's true anyway)
> > > > +     */
> > > > +    assert((32 % host_bits) == 0);
> > > 
> > > This assert makes the first one redundant.
> > 
> > True I guess, removed the power of 2 check.
> > 
> > <snip>
> > 
> > > > +/*
> > > >   * Transmit the set of pages to be discarded after precopy to the target
> > > >   * these are pages that have been sent previously but have been dirtied
> > > >   * Hopefully this is pretty sparse
> > > >   */
> > > >  int ram_postcopy_send_discard_bitmap(MigrationState *ms)
> > > >  {
> > > > +    int ret;
> > > > +
> > > >      /* This should be our last sync, the src is now paused */
> > > >      migration_bitmap_sync();
> > > >  
> > > > +    /* Deal with TPS != HPS */
> > > > +    ret = postcopy_chunk_hostpages(ms);
> > > > +    if (ret) {
> > > > +        return ret;
> > > > +    }
> > > 
> > > This really seems like a bogus thing to be doing on the outgoing
> > > migration side.  Doesn't the host page size constraint come from the
> > > destination (due to the need to atomically instate pages)?  Source
> > > host page size == destination host page size doesn't seem like it
> > > should be an inherent constraint
> > 
> > It's not an inherent constraint; it just makes life messier. I had
> > some code to deal with it but it complicates things even more, and
> > I've not got anything to test that rare case with; if someone is
> > desperate for it then it can be added.
> 
> So, I'm all for deferring implementation improvements that we don't
> need for the time being.
> 
> What worries me though, is having the source have to make assumptions
> about how the migration stream will be processed on the destination
> that aren't somehow baked into the protocol itself.  i.e. I think we
> should really try to avoid the possibility of migration streams that
> are structurally sound, and look like they should be valid, but
> aren't, because of subtle constraints in the order and manner in which
> the destination needs to process the individual chunks.

Agreed; see below.

> > > and it's not clear why you can't do
> > > this rounding out to host page sized chunks on the receive end.
> > 
> > The source keeps track of which pages still need sending, and so
> > has to update that list when it tells the destination to perform
> > a discard.
> 
> Ah.
> 
> > If the destination discards more than the source told it to (for
> > example because it has bigger host-pages) the source would need
> > to update its map of the pages that still need sending.
> 
> I'm beginning to wonder if what we really need is for early in the
> migration process the destination to tell the source what granularity of
> updates it can handle (based on its page size).
> 
> Perhaps the short summary is that I don't think we need to actually
> handle the case of different source and dest host page sizes.  BUT,
> if that does happen the migration process should be able to detect
> that that's what's gone wrong and print a meaningful error, rather
> than having the destination blow up partway through, deep in the code
> because chunk constraints necessary for the dest host page size
> haven't been met by the source.

Right; I cut the problem in the opposite direction: the source sends
the destination its page sizes in the 'advise' message, and the
destination validates them and prints:
   "Postcopy needs matching host page sizes (s=%d d=%d)"

(That's in 21/47 Add wrappers and handlers....)
It's just a little easier to do it that way rather than having to
make the source wait for the destination.
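Here is a sketch of the destination-side check. It assumes the advise message has already been parsed into the two page sizes. The function name is hypothetical, and the message text follows the one quoted above, using %zu for size_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical check run by the destination on the page sizes carried
 * in the advise message.  Returns 0 if postcopy can proceed, -1 if not. */
static int check_advised_page_sizes(size_t src_hps, size_t dst_hps)
{
    if (src_hps != dst_hps) {
        fprintf(stderr,
                "Postcopy needs matching host page sizes (s=%zu d=%zu)\n",
                src_hps, dst_hps);
        return -1;
    }
    return 0;
}
```

Failing at advise time means a mismatch is reported before any pages are sent, rather than partway through the migration.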

Dave

> 
> -- 
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v4 43/47] Host page!=target page: Cleanup bitmaps
  2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 43/47] Host page!=target page: Cleanup bitmaps Dr. David Alan Gilbert (git)
  2014-11-13  3:10   ` David Gibson
@ 2015-01-27 10:20   ` Peter Maydell
  2015-01-27 11:50     ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 204+ messages in thread
From: Peter Maydell @ 2015-01-27 10:20 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: Andrea Arcangeli, yamahata, Lei Li, Juan Quintela,
	cristian.klein, QEMU Developers, Amit Shah, yanghy

On 3 October 2014 at 18:47, Dr. David Alan Gilbert (git)
<dgilbert@redhat.com> wrote:

Just noticed after writing most of this email that this is an
old patch with new review comments. I'm sending the below
anyway in case some of it is still valid...

>  /*
> + * Utility for the outgoing postcopy code.
> + *
> + * Discard any partially sent host-page size chunks, mark any partially
> + * dirty host-page size chunks as all dirty.
> + *
> + * Returns: 0 on success
> + */
> +static int postcopy_chunk_hostpages(MigrationState *ms)
> +{
> +    struct RAMBlock *block;
> +    unsigned int host_bits = sysconf(_SC_PAGESIZE) / TARGET_PAGE_SIZE;

I'm guessing this won't build on Win32. Can you use getpagesize()?
We provide a compat wrapper for that in util/ as necessary.

What happens if the TARGET_PAGE_SIZE is larger than the
host page size? (If you want MIN(host page size, TARGET_PAGE_SIZE)
try qemu_host_page_size, see page_size_init()).
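Here is a sketch of one way to make the larger-target-page case explicit. Plain integer division gives 0 target pages per host page when TARGET_PAGE_SIZE exceeds the host page size. The hypothetical helper below therefore returns 0 for that case and leaves it to the caller to reject it. The host page size is passed in, for example from getpagesize() via the compat wrapper mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of target pages per host page, or 0 if
 * the target page is larger than the host page (or is zero), so the
 * caller can fail cleanly instead of continuing with host_bits == 0. */
static unsigned int target_pages_per_host_page(size_t host_page_size,
                                               size_t target_page_size)
{
    if (target_page_size == 0 || host_page_size < target_page_size) {
        return 0;
    }
    return (unsigned int)(host_page_size / target_page_size);
}
```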

> +    uint32_t host_mask;
> +
> +    /* Should be a power of 2 */
> +    assert(host_bits && !(host_bits & (host_bits - 1)));

assert(is_power_of_2(host_bits));
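The sketch below shows why the divides-32 assert in the quoted code implies the power-of-2 one. It uses a local stand-in with the same semantics as QEMU's is_power_of_2(), where zero is not a power of two. One caveat: `32 % host_bits` is undefined when host_bits is 0, which happens if TARGET_PAGE_SIZE is larger than the host page size. A non-zero check is therefore still needed ahead of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in with the same semantics as QEMU's is_power_of_2(). */
static bool sketch_is_power_of_2(uint64_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* True if every n in [1, 32] that divides 32 is a power of two, which
 * holds because 32 = 2^5 and its divisors are 2^0 .. 2^5. */
static bool divisors_of_32_are_powers_of_2(void)
{
    for (unsigned int n = 1; n <= 32; n++) {
        if (32 % n == 0 && !sketch_is_power_of_2(n)) {
            return false;
        }
    }
    return true;
}
```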

> +    /*
> +     * If the host_bits isn't a division of 32 (the minimum long size)
> +     * then the code gets a lot more complex; disallow for now
> +     * (I'm not aware of a system where it's true anyway)
> +     */
> +    assert((32 % host_bits) == 0);
> +
> +    /* A mask, starting at bit 0, containing host_bits continuous set bits */
> +    host_mask =  (1u << host_bits) - 1;

If the host has 64K pages and the guest TARGET_PAGE_SIZE is 1K
(eg ARM) then this will shift off the end of your uint32_t.
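With 64K host pages and 1K target pages, host_bits is 64, and shifting a 32-bit `1u` by 64 is undefined behaviour in C (the shift count must be smaller than the width of the promoted operand). One possible way round it is to work on 64-bit words and special-case the full-word mask. The sketch below uses hypothetical names and a plain `uint64_t` array instead of QEMU's `unsigned long` bitmaps, and shows only the "mark partially dirty host pages as all dirty" half of the patch's job; discarding partially sent pages is left out:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mask of host_bits contiguous set bits starting at bit 0; host_bits may be 64. */
static uint64_t host_page_mask(unsigned int host_bits)
{
    assert(host_bits >= 1 && host_bits <= 64);
    assert((64 % host_bits) == 0);  /* a host page never straddles two words */
    return host_bits == 64 ? UINT64_MAX : (UINT64_C(1) << host_bits) - 1;
}

/*
 * Hypothetical helper: in a dirty bitmap with one bit per target page,
 * mark every host page that is partially dirty as entirely dirty.
 */
static void chunk_dirty_bitmap(uint64_t *map, size_t nwords,
                               unsigned int host_bits)
{
    uint64_t mask = host_page_mask(host_bits);

    for (size_t w = 0; w < nwords; w++) {
        /* shift stays below 64, so every shift here is well defined */
        for (unsigned int shift = 0; shift < 64; shift += host_bits) {
            if (map[w] & (mask << shift)) {
                map[w] |= mask << shift;
            }
        }
    }
}
```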

thanks
-- PMM

* Re: [Qemu-devel] [PATCH v4 43/47] Host page!=target page: Cleanup bitmaps
  2015-01-27 10:20   ` Peter Maydell
@ 2015-01-27 11:50     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 204+ messages in thread
From: Dr. David Alan Gilbert @ 2015-01-27 11:50 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Andrea Arcangeli, yamahata, Juan Quintela, cristian.klein,
	QEMU Developers, Amit Shah, yanghy

* Peter Maydell (peter.maydell@linaro.org) wrote:
> On 3 October 2014 at 18:47, Dr. David Alan Gilbert (git)
> <dgilbert@redhat.com> wrote:
> 
> Just noticed after writing most of this email that this is an
> old patch with new review comments. I'm sending the below
> anyway in case some of it is still valid...

Still useful; I'm just looking at doing a new post soon.

> >  /*
> > + * Utility for the outgoing postcopy code.
> > + *
> > + * Discard any partially sent host-page size chunks, mark any partially
> > + * dirty host-page size chunks as all dirty.
> > + *
> > + * Returns: 0 on success
> > + */
> > +static int postcopy_chunk_hostpages(MigrationState *ms)
> > +{
> > +    struct RAMBlock *block;
> > +    unsigned int host_bits = sysconf(_SC_PAGESIZE) / TARGET_PAGE_SIZE;
> 
> I'm guessing this won't build on Win32. Can you use getpagesize() ?
> We provide a compat wrapper for that in util/ as necessary.

I'd used sysconf since the getpagesize() man page says 'Portable
applications should employ sysconf(_SC_PAGESIZE) instead of getpagesize()',
and I can see a couple of other places in qemu that use the same sysconf;
however, if it's not available on Win32 then yes, I'm happy to change
over.
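For illustration only: a portable page-size query generally has to special-case Windows, which has no sysconf(). This sketch makes no assumption about the shape of QEMU's compat wrapper in util/; the helper name is hypothetical:

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#else
#include <unistd.h>
#endif

/* Hypothetical helper: the host's page size in bytes, on POSIX or Win32. */
static size_t host_page_size(void)
{
#ifdef _WIN32
    SYSTEM_INFO si;

    GetSystemInfo(&si);
    return si.dwPageSize;
#else
    long sz = sysconf(_SC_PAGESIZE);  /* POSIX; returns -1 on error */

    assert(sz > 0);
    return (size_t)sz;
#endif
}
```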

> What happens if the TARGET_PAGE_SIZE is larger than the
> host page size? (If you want MIN(host page size, TARGET_PAGE_SIZE)
> try qemu_host_page_size, see page_size_init()).

Thanks; I hadn't realised that was possible, but yes, I should probably
just use qemu_host_page_size instead of my sysconf in most places.

What happens when the target wants to map a RAMBlock with target-page-size
alignment?

> > +    uint32_t host_mask;
> > +
> > +    /* Should be a power of 2 */
> > +    assert(host_bits && !(host_bits & (host_bits - 1)));
> 
> assert(is_power_of_2(host_bits));

Thanks; fixed.

> > +    /*
> > +     * If the host_bits isn't a division of 32 (the minimum long size)
> > +     * then the code gets a lot more complex; disallow for now
> > +     * (I'm not aware of a system where it's true anyway)
> > +     */
> > +    assert((32 % host_bits) == 0);
> > +
> > +    /* A mask, starting at bit 0, containing host_bits continuous set bits */
> > +    host_mask =  (1u << host_bits) - 1;
> 
> If the host has 64K pages and the guest TARGET_PAGE_SIZE is 1K
> (eg ARM) then this will shift off the end of your uint32_t.

Gah! That's going to make things a lot hairier; OK, that's going to take
some rework. I'll have a think about how.

Note: keep an eye out for the RAM_SAVE_FLAG definitions in arch_init.c;
they're one bit per message type (for no good reason), and with a TPS
(target page size) of 1K there are only a couple of spare bits.
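Why the target page size limits the flag space: the RAM flags are ORed into the low bits of the page-aligned address written before each page in the stream, so only the bits below TARGET_PAGE_BITS are available. A sketch with hypothetical flag names and values (not the real arch_init.c definitions) and a 1K target page:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_TARGET_PAGE_BITS 10   /* 1K target pages: only 10 low bits free */
#define DEMO_TARGET_PAGE_MASK (~((UINT64_C(1) << DEMO_TARGET_PAGE_BITS) - 1))

/* Hypothetical one-bit-per-message-type flags, in the style of RAM_SAVE_FLAG_*. */
enum {
    DEMO_FLAG_FULL     = 0x01,
    DEMO_FLAG_ZERO     = 0x02,
    DEMO_FLAG_MEM_SIZE = 0x04,
    DEMO_FLAG_PAGE     = 0x08,
    DEMO_FLAG_EOS      = 0x10,
    DEMO_FLAG_CONTINUE = 0x20,
    DEMO_FLAG_XBZRLE   = 0x40,
};

/* Pack flags into the low bits of a page-aligned offset. */
static uint64_t pack_header(uint64_t offset, uint64_t flags)
{
    assert((offset & ~DEMO_TARGET_PAGE_MASK) == 0);  /* offset is page aligned */
    assert((flags & DEMO_TARGET_PAGE_MASK) == 0);     /* flags fit below the page bits */
    return offset | flags;
}

/* Split a header back into its page offset and its flags. */
static uint64_t header_offset(uint64_t hdr) { return hdr & DEMO_TARGET_PAGE_MASK; }
static uint64_t header_flags(uint64_t hdr)  { return hdr & ~DEMO_TARGET_PAGE_MASK; }
```

With one bit per type, a 1K target page caps the scheme at ten message types in total, however they are allocated.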

Thanks for the comments,

Dave
> 
> thanks
> -- PMM

* Re: [Qemu-devel] [PATCH v4 37/47] Page request: Consume pages off the post-copy queue
  2015-01-27  9:40         ` Dr. David Alan Gilbert
@ 2015-01-28  5:33           ` David Gibson
  0 siblings, 0 replies; 204+ messages in thread
From: David Gibson @ 2015-01-28  5:33 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

[-- Attachment #1: Type: text/plain, Size: 4789 bytes --]

On Tue, Jan 27, 2015 at 09:40:12AM +0000, Dr. David Alan Gilbert wrote:
> * David Gibson (david@gibson.dropbear.id.au) wrote:
> > On Wed, Jan 14, 2015 at 08:13:27PM +0000, Dr. David Alan Gilbert wrote:
> > > * David Gibson (david@gibson.dropbear.id.au) wrote:
> > > > On Fri, Oct 03, 2014 at 06:47:43PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > > 
> > > > > When transmitting RAM pages, consume pages that have been queued by
> > > > > MIG_RPCOMM_REQPAGE commands and send them ahead of normal page scanning.
> > > > > 
> > > > > Note:
> > > > >   a) After a queued page the linear walk carries on from after the
> > > > > unqueued page; there is a reasonable chance that the destination
> > > > > was about to ask for other closeby pages anyway.
> > > > > 
> > > > >   b) We have to be careful of any assumptions that the page walking
> > > > > code makes, in particular it does some short cuts on its first linear
> > > > > walk that break as soon as we do a queued page.
> > > > > 
> > > > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > > > ---
> > > > >  arch_init.c | 149 ++++++++++++++++++++++++++++++++++++++++++++++++++----------
> > > > >  1 file changed, 125 insertions(+), 24 deletions(-)
> > > > > 
> > > > > diff --git a/arch_init.c b/arch_init.c
> > > > > index 72f9e17..a945990 100644
> > > > > +
> > > > > +        /*
> > > > > +         * Don't break host-page chunks up with queue items
> > > > 
> > > > Why does this matter?
> > > 
> > > See the comment you make in a few patches time, it's about being able
> > > to place the pages atomically on the destination.
> > 
> > Hmm.  But if the destination has to wait for all the pieces of a host
> > page to arrive anyway, does it really make any difference if they're
> > contiguous in the stream?
> 
> The problem is knowing where to put the arriving target pages until you've
> got a full host page; you've got to put each arriving TP into a temporary
> until you have the full set.  If they're not contiguous in the stream,
> then you need multiple temporaries to cover the set of outstanding host
> pages that you haven't got the full set for, and you've still got to be
> careful on the sending side to keep a bounded number of host pages in
> flight at any time.  Making that bound 1 makes the code simpler.

Ah, right, I see your point.
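A minimal receiver-side sketch of why a bound of one host page in flight keeps things simple: a single temporary buffer is enough, provided the target pages of each host page arrive back to back. The sizes, struct and function names are hypothetical, and the real placement code is not reproduced here:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_TPS 1024u                       /* hypothetical target page size */
#define DEMO_TP_PER_HP 4u                    /* hypothetical: 4K host pages */
#define DEMO_HPS (DEMO_TPS * DEMO_TP_PER_HP)

/* One temporary host page; valid only because the sender never
 * interleaves target pages belonging to different host pages. */
struct hp_assembler {
    uint8_t buf[DEMO_HPS];
    uint64_t host_addr;   /* host-page-aligned address being assembled */
    unsigned int filled;  /* target pages received so far */
};

/*
 * Feed one target page at target-page-aligned address 'addr'. Returns true
 * once the host page is complete and buf can be placed in one atomic step.
 */
static bool hp_assembler_add(struct hp_assembler *a, uint64_t addr,
                             const uint8_t *data)
{
    uint64_t hp = addr & ~(uint64_t)(DEMO_HPS - 1);
    unsigned int idx = (unsigned int)((addr - hp) / DEMO_TPS);

    if (a->filled == 0) {
        a->host_addr = hp;
    }
    /* The contiguity the sender guarantees; a single buffer cannot cope
     * with target pages from another host page arriving in between. */
    assert(hp == a->host_addr && idx == a->filled);

    memcpy(a->buf + (size_t)idx * DEMO_TPS, data, DEMO_TPS);
    if (++a->filled == DEMO_TP_PER_HP) {
        a->filled = 0;   /* ready to start the next host page */
        return true;
    }
    return false;
}
```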

> > > > > +         * so only unqueue if,
> > > > > +         *   a) The last item came from the queue anyway
> > > > > +         *   b) The last sent item was the last target-page in a host page
> > > > > +         */
> > > > > +        if (last_was_from_queue || (!last_sent_block) ||
> > > > > +            ((last_offset & (hps - 1)) == (hps - TARGET_PAGE_SIZE))) {
> > > > > +            tmpblock = ram_save_unqueue_page(ms, &tmpoffset, &bitoffset);
> > > > >          }
> > > > > -        if (offset >= block->length) {
> > > > > -            offset = 0;
> > > > > -            block = QTAILQ_NEXT(block, next);
> > > > > -            if (!block) {
> > > > > -                block = QTAILQ_FIRST(&ram_list.blocks);
> > > > > -                complete_round = true;
> > > > > -                ram_bulk_stage = false;
> > > > > +
> > > > > +        if (tmpblock) {
> > > > > +            /* We've got a block from the postcopy queue */
> > > > > +            DPRINTF("%s: Got postcopy item '%s' offset=%zx bitoffset=%zx",
> > > > > +                    __func__, tmpblock->idstr, tmpoffset, bitoffset);
> > > > > +            /* We're sending this page, and since it's postcopy nothing else
> > > > > +             * will dirty it, and we must make sure it doesn't get sent again.
> > > > > +             */
> > > > > +            if (!migration_bitmap_clear_dirty(bitoffset << TARGET_PAGE_BITS)) {
> > > > 
> > > > Ugh.. that's kind of subtle.  I think it would be clearer if you work
> > > > in terms of a ram_addr_t throughout, rather than "bitoffset" whose
> > > > meaning is not terribly obvious.
> > > 
> > > I've changed it to ram_addr_t as requested; it's slightly clearer but there
> > > are a few places where we're dealing with the sentmap where we now need to shift
> > > the other way.  In the end ram_addr_t is really a scaled offset into those
> > > bitmaps.
> > 
> > Right, but to someone who isn't deeply familiar with the code, they're
> > more likely to understand what the ram address means than the bitmap
> > offset.
> 
> Fair enough.
> 
> Dave
> 
> > 
> 
> 
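The relationship being discussed is a shift by TARGET_PAGE_BITS: the patch's `bitoffset << TARGET_PAGE_BITS` turns a bitmap index into a RAM address, and the sentmap code has to shift the other way. A sketch with hypothetical names and a fixed 4K target page:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_TARGET_PAGE_BITS 12   /* hypothetical: 4K target pages */

typedef uint64_t demo_ram_addr_t;  /* stand-in for QEMU's ram_addr_t */

/* Bit index, in a one-bit-per-target-page bitmap, for a RAM address. */
static uint64_t ram_addr_to_bit(demo_ram_addr_t addr)
{
    return addr >> DEMO_TARGET_PAGE_BITS;
}

/* Inverse: the RAM address of the target page a bitmap bit describes. */
static demo_ram_addr_t bit_to_ram_addr(uint64_t bit)
{
    return (demo_ram_addr_t)bit << DEMO_TARGET_PAGE_BITS;
}
```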

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

* Re: [Qemu-devel] [PATCH v4 43/47] Host page!=target page: Cleanup bitmaps
  2015-01-27 10:04         ` Dr. David Alan Gilbert
@ 2015-01-28  5:36           ` David Gibson
  0 siblings, 0 replies; 204+ messages in thread
From: David Gibson @ 2015-01-28  5:36 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, quintela, cristian.klein, qemu-devel,
	amit.shah, yanghy

[-- Attachment #1: Type: text/plain, Size: 6888 bytes --]

On Tue, Jan 27, 2015 at 10:04:31AM +0000, Dr. David Alan Gilbert wrote:
> * David Gibson (david@gibson.dropbear.id.au) wrote:
> > On Wed, Dec 17, 2014 at 06:21:34PM +0000, Dr. David Alan Gilbert wrote:
> > > * David Gibson (david@gibson.dropbear.id.au) wrote:
> > > > On Fri, Oct 03, 2014 at 06:47:49PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > > 
> > > > > Prior to the start of postcopy, ensure that everything that will
> > > > > be transferred later is a whole host-page in size.
> > > > > 
> > > > > This is accomplished by discarding partially transferred host pages
> > > > > and marking any that are partially dirty as fully dirty.
> > > > > 
> > > > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > > > ---
> > > > >  arch_init.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> > > > >  1 file changed, 111 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/arch_init.c b/arch_init.c
> > > > > index 1fe4fab..aac250c 100644
> > > > > --- a/arch_init.c
> > > > > +++ b/arch_init.c
> > > > > @@ -1024,7 +1024,6 @@ static uint32_t get_32bits_map(unsigned long *map, int64_t start)
> > > > >   * A helper to put 32 bits into a bit map; trivial for HOST_LONG_BITS=32
> > > > >   * messier for 64; the bitmaps are actually long's that are 32 or 64bit
> > > > >   */
> > > > > -__attribute__ (( unused )) /* Until later in patch series */
> > > > >  static void put_32bits_map(unsigned long *map, int64_t start,
> > > > >                             uint32_t v)
> > > > >  {
> > > > > @@ -1153,15 +1152,126 @@ static int pc_each_ram_discard(MigrationState *ms)
> > > > >  }
> > > > >  
> > > > >  /*
> > > > > + * Utility for the outgoing postcopy code.
> > > > > + *
> > > > > + * Discard any partially sent host-page size chunks, mark any partially
> > > > > + * dirty host-page size chunks as all dirty.
> > > > > + *
> > > > > + * Returns: 0 on success
> > > > > + */
> > > > > +static int postcopy_chunk_hostpages(MigrationState *ms)
> > > > > +{
> > > > > +    struct RAMBlock *block;
> > > > > +    unsigned int host_bits = sysconf(_SC_PAGESIZE) / TARGET_PAGE_SIZE;
> > > > > +    uint32_t host_mask;
> > > > > +
> > > > > +    /* Should be a power of 2 */
> > > > > +    assert(host_bits && !(host_bits & (host_bits - 1)));
> > > > > +    /*
> > > > > +     * If the host_bits isn't a division of 32 (the minimum long size)
> > > > > +     * then the code gets a lot more complex; disallow for now
> > > > > +     * (I'm not aware of a system where it's true anyway)
> > > > > +     */
> > > > > +    assert((32 % host_bits) == 0);
> > > > 
> > > > This assert makes the first one redundant.
> > > 
> > > True I guess, removed the power of 2 check.
> > > 
> > > <snip>
> > > 
> > > > > +/*
> > > > >   * Transmit the set of pages to be discarded after precopy to the target
> > > > >   * these are pages that have been sent previously but have been dirtied
> > > > >   * Hopefully this is pretty sparse
> > > > >   */
> > > > >  int ram_postcopy_send_discard_bitmap(MigrationState *ms)
> > > > >  {
> > > > > +    int ret;
> > > > > +
> > > > >      /* This should be our last sync, the src is now paused */
> > > > >      migration_bitmap_sync();
> > > > >  
> > > > > +    /* Deal with TPS != HPS */
> > > > > +    ret = postcopy_chunk_hostpages(ms);
> > > > > +    if (ret) {
> > > > > +        return ret;
> > > > > +    }
> > > > 
> > > > This really seems like a bogus thing to be doing on the outgoing
> > > > migration side.  Doesn't the host page size constraint come from the
> > > > destination (due to the need to atomically instate pages)?  Source
> > > > host page size == destination host page size doesn't seem like it
> > > > should be an inherent constraint
> > > 
> > > It's not an inherent constraint; it just makes life messier. I had
> > > some code to deal with it but it complicates things even more, and
> > > I've not got anything to test that rare case with; if someone is
> > > desperate for it then it can be added.
> > 
> > So, I'm all for deferring implementation improvements that we don't
> > need for the time being.
> > 
> > What worries me though, is having the source have to make assumptions
> > about how the migration stream will be processed on the destination
> > that aren't somehow baked into the protocol itself.  i.e. I think we
> > should really try to avoid the possibility of migration streams that
> > are structurally sound, and look like they should be valid, but
> > aren't, because of subtle constraints in the order and manner in which
> > the destination needs to process the individual chunks.
> 
> Agreed; see below.
> 
> > > > and it's not clear why you can't do
> > > > this rounding out to host page sized chunks on the receive end.
> > > 
> > > The source keeps track of which pages still need sending, and so
> > > has to update that list when it tells the destination to perform
> > > a discard.
> > 
> > Ah.
> > 
> > > If the destination discards more than the source told it to (for
> > > example because it has bigger host-pages) the source would need
> > > to update its map of the pages that still need sending.
> > 
> > I'm beginning to wonder if what we really need is for the destination,
> > early in the migration process, to tell the source what granularity of
> > updates it can handle (based on its page size).
> > 
> > Perhaps the short summary is that I don't think we need to actually
> > handle the case of different source and dest host page sizes.  BUT,
> > if that does happen the migration process should be able to detect
> > that that's what's gone wrong and print a meaningful error, rather
> > than having the destination blow up partway through, deep in the code,
> > because chunk constraints necessary for the dest host page size
> > haven't been met by the source.
> 
> Right; I cut the problem in the opposite direction: the source sends
> the destination its page sizes in the 'advise' message, and the
> destination validates them and prints:
>    "Postcopy needs matching host page sizes (s=%d d=%d)"
> 
> (That's in 21/47, "Add wrappers and handlers...".)
> It's just a little easier to do it that way than to make the source
> wait for the destination.

I guess that's ok.  It's still a bit ugly, because it means if we were
ever to support different source and dest host page sizes, it would
necessarily require a protocol change (unless I've missed something).
But at least the errors would be obvious.
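A minimal sketch of the destination-side check described above, failing early with a clear message instead of partway through. The function name and arguments are hypothetical, decoding the sizes from the 'advise' message is not shown, and size_t is used here so the format is %zu rather than %d:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical destination-side validation of the source's host page
 * size, as advertised in the postcopy 'advise' message.
 * Returns 0 if postcopy can proceed, -1 otherwise.
 */
static int check_advised_host_page_size(size_t src_host_ps, size_t dst_host_ps)
{
    if (src_host_ps != dst_host_ps) {
        fprintf(stderr, "Postcopy needs matching host page sizes (s=%zu d=%zu)\n",
                src_host_ps, dst_host_ps);
        return -1;
    }
    return 0;
}
```

Checking on the destination means the source never has to wait for a reply before streaming, at the cost, as noted above, that supporting mismatched sizes later would need a protocol change.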

Thread overview: 204+ messages
2014-10-03 17:47 [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 01/47] QEMUSizedBuffer based QEMUFile Dr. David Alan Gilbert (git)
2014-10-08  2:10   ` zhanghailiang
2014-11-03  0:53   ` David Gibson
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 02/47] Tests: QEMUSizedBuffer/QEMUBuffer Dr. David Alan Gilbert (git)
2014-11-03  1:02   ` David Gibson
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 03/47] Start documenting how postcopy works Dr. David Alan Gilbert (git)
2014-11-03  1:31   ` David Gibson
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 04/47] qemu_ram_foreach_block: pass up error value, and down the ramblock name Dr. David Alan Gilbert (git)
2014-11-03  2:34   ` David Gibson
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 05/47] improve DPRINTF macros, add to savevm Dr. David Alan Gilbert (git)
2014-11-03  2:35   ` David Gibson
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 06/47] Add qemu_get_counted_string to read a string prefixed by a count byte Dr. David Alan Gilbert (git)
2014-11-03  2:39   ` David Gibson
2014-11-25 16:13     ` Dr. David Alan Gilbert
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 07/47] Create MigrationIncomingState Dr. David Alan Gilbert (git)
2014-11-03  2:45   ` David Gibson
2014-11-04 19:06     ` Dr. David Alan Gilbert
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 08/47] socket shutdown Dr. David Alan Gilbert (git)
2014-10-04 18:09   ` Paolo Bonzini
2014-10-07 10:00     ` Dr. David Alan Gilbert
2014-10-07 11:10       ` Paolo Bonzini
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 09/47] Provide runtime Target page information Dr. David Alan Gilbert (git)
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 10/47] Return path: Open a return path on QEMUFile for sockets Dr. David Alan Gilbert (git)
2014-11-03  3:05   ` David Gibson
2014-11-03 19:04     ` Dr. David Alan Gilbert
2014-11-18  4:34       ` David Gibson
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 11/47] Return path: socket_writev_buffer: Block even on non-blocking fd's Dr. David Alan Gilbert (git)
2014-11-03  3:10   ` David Gibson
2014-11-03 18:59     ` Dr. David Alan Gilbert
2014-11-18  3:54       ` David Gibson
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 12/47] Handle bi-directional communication for fd migration Dr. David Alan Gilbert (git)
2014-11-03  3:12   ` David Gibson
2014-11-03 13:53     ` Cristian Klein
2014-11-18  3:53       ` David Gibson
2014-11-19 17:27         ` Dr. David Alan Gilbert
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 13/47] Migration commands Dr. David Alan Gilbert (git)
2014-11-03  3:14   ` David Gibson
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 14/47] Return path: Control commands Dr. David Alan Gilbert (git)
2014-10-04 18:08   ` Paolo Bonzini
2014-10-23 16:23     ` Dr. David Alan Gilbert
2014-10-23 20:15       ` Paolo Bonzini
2014-11-03  3:20         ` David Gibson
2014-11-04 18:58         ` Dr. David Alan Gilbert
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 15/47] Return path: Send responses from destination to source Dr. David Alan Gilbert (git)
2014-11-03  3:22   ` David Gibson
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 16/47] Return path: Source handling of return path Dr. David Alan Gilbert (git)
2014-10-04 18:14   ` Paolo Bonzini
2014-10-23 18:00     ` Dr. David Alan Gilbert
2014-10-24 10:04       ` Paolo Bonzini
2014-10-16  8:26   ` zhanghailiang
2014-10-16  8:35     ` Dr. David Alan Gilbert
2014-10-16  9:09       ` zhanghailiang
2014-11-03  3:47     ` David Gibson
2014-11-25 15:44       ` Dr. David Alan Gilbert
2014-11-03  3:46   ` David Gibson
2014-11-03 13:22     ` Dr. David Alan Gilbert
2014-11-18  3:52       ` David Gibson
2014-11-19 17:06         ` Dr. David Alan Gilbert
2014-11-19 21:12           ` David Gibson
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 17/47] qemu_loadvm errors and debug Dr. David Alan Gilbert (git)
2014-11-03  3:49   ` David Gibson
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 18/47] ram_debug_dump_bitmap: Dump a migration bitmap as text Dr. David Alan Gilbert (git)
2014-11-03  3:58   ` David Gibson
2014-11-19 17:35     ` Dr. David Alan Gilbert
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 19/47] Rework loadvm path for subloops Dr. David Alan Gilbert (git)
2014-10-04 16:46   ` Paolo Bonzini
2014-10-07  8:58     ` Dr. David Alan Gilbert
2014-10-07 10:12       ` Paolo Bonzini
2014-10-07 10:21         ` Dr. David Alan Gilbert
2014-11-03  5:08   ` David Gibson
2014-11-19 17:50     ` Dr. David Alan Gilbert
2014-11-21  6:53       ` David Gibson
2014-12-11 14:47         ` Dr. David Alan Gilbert
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 20/47] Add migration-capability boolean for postcopy-ram Dr. David Alan Gilbert (git)
2014-10-06 18:59   ` Eric Blake
2014-10-06 19:07     ` Dr. David Alan Gilbert
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 21/47] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages Dr. David Alan Gilbert (git)
2014-11-03  5:51   ` David Gibson
2014-12-17 14:50     ` Dr. David Alan Gilbert
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 22/47] QEMU_VM_CMD_PACKAGED: Send a packaged chunk of migration stream Dr. David Alan Gilbert (git)
2014-11-04  1:28   ` David Gibson
2014-11-04 10:19     ` Dr. David Alan Gilbert
2014-11-18  4:36       ` David Gibson
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 23/47] migrate_init: Call from savevm Dr. David Alan Gilbert (git)
2014-10-08  2:28   ` zhanghailiang
2014-11-04  1:29   ` David Gibson
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 24/47] Allow savevm handlers to state whether they could go into postcopy Dr. David Alan Gilbert (git)
2014-11-04  1:33   ` David Gibson
2014-11-19 17:53     ` Dr. David Alan Gilbert
2014-11-21  6:58       ` David Gibson
2014-11-25 19:58         ` Dr. David Alan Gilbert
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 25/47] postcopy: OS support test Dr. David Alan Gilbert (git)
2014-11-04  1:40   ` David Gibson
2014-11-25 17:34     ` Dr. David Alan Gilbert
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 26/47] migrate_start_postcopy: Command to trigger transition to postcopy Dr. David Alan Gilbert (git)
2014-11-04  1:47   ` David Gibson
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 27/47] MIG_STATE_POSTCOPY_ACTIVE: Add new migration state Dr. David Alan Gilbert (git)
2014-11-04  1:49   ` David Gibson
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 28/47] qemu_savevm_state_complete: Postcopy changes Dr. David Alan Gilbert (git)
2014-11-04  2:18   ` David Gibson
2014-12-17 16:14     ` Dr. David Alan Gilbert
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 29/47] Postcopy page-map-incoming (PMI) structure Dr. David Alan Gilbert (git)
2014-11-04  3:09   ` David Gibson
2014-11-19 18:46     ` Dr. David Alan Gilbert
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 30/47] Postcopy: Maintain sentmap and calculate discard Dr. David Alan Gilbert (git)
2014-11-05  6:38   ` David Gibson
2014-12-17 16:48     ` Dr. David Alan Gilbert
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 31/47] postcopy: Incoming initialisation Dr. David Alan Gilbert (git)
2014-11-05  6:47   ` David Gibson
2014-12-17 17:21     ` Dr. David Alan Gilbert
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 32/47] postcopy: ram_enable_notify to switch on userfault Dr. David Alan Gilbert (git)
2014-10-04 16:42   ` Paolo Bonzini
2014-10-06 19:00     ` Dr. David Alan Gilbert
2014-11-05  6:49   ` David Gibson
2014-11-19 18:59     ` Dr. David Alan Gilbert
2014-11-19 21:17       ` David Gibson
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 33/47] Postcopy: Postcopy startup in migration thread Dr. David Alan Gilbert (git)
2014-10-04 16:27   ` Paolo Bonzini
2014-11-20 11:45     ` Dr. David Alan Gilbert
2014-11-21 12:01       ` Paolo Bonzini
2014-11-21 12:07         ` Dr. David Alan Gilbert
2014-11-20 17:12     ` Dr. David Alan Gilbert
2014-11-20 17:19       ` Paolo Bonzini
2014-11-24 18:26     ` Dr. David Alan Gilbert
2014-11-10  6:05   ` David Gibson
2015-01-05 16:06     ` Dr. David Alan Gilbert
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 34/47] Postcopy: Create a fault handler thread before marking the ram as userfault Dr. David Alan Gilbert (git)
2014-11-10  6:10   ` David Gibson
2014-11-19 18:56     ` Dr. David Alan Gilbert
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 35/47] Page request: Add MIG_RPCOMM_REQPAGES reverse command Dr. David Alan Gilbert (git)
2014-11-10  6:19   ` David Gibson
2014-11-19 20:01     ` Dr. David Alan Gilbert
2014-11-19 21:48       ` David Gibson
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 36/47] Page request: Process incoming page request Dr. David Alan Gilbert (git)
2014-10-08  2:31   ` zhanghailiang
2014-10-08  7:49     ` Dr. David Alan Gilbert
2014-10-08  8:07       ` Paolo Bonzini
2014-10-08  8:10       ` zhanghailiang
2014-10-08  8:18         ` Dr. David Alan Gilbert
2014-11-10  6:31   ` David Gibson
2014-11-17 19:07     ` Dr. David Alan Gilbert
2014-11-18  4:38       ` David Gibson
2014-11-19 19:37         ` Dr. David Alan Gilbert
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 37/47] Page request: Consume pages off the post-copy queue Dr. David Alan Gilbert (git)
2014-10-04 18:04   ` Paolo Bonzini
2014-10-07 11:35     ` Dr. David Alan Gilbert
2014-11-11  1:13   ` David Gibson
2015-01-14 20:13     ` Dr. David Alan Gilbert
2015-01-27  4:38       ` David Gibson
2015-01-27  9:40         ` Dr. David Alan Gilbert
2015-01-28  5:33           ` David Gibson
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 38/47] Add assertion to check migration_dirty_pages Dr. David Alan Gilbert (git)
2014-10-04 18:32   ` Paolo Bonzini
2014-10-06 18:51     ` Dr. David Alan Gilbert
2014-10-06 20:30       ` Paolo Bonzini
2014-11-11  1:14   ` David Gibson
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 39/47] postcopy_ram.c: place_page and helpers Dr. David Alan Gilbert (git)
2014-11-11  1:39   ` David Gibson
2015-01-15 18:14     ` Dr. David Alan Gilbert
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 40/47] Postcopy: Use helpers to map pages during migration Dr. David Alan Gilbert (git)
2014-11-13  2:53   ` David Gibson
2014-11-25 18:14     ` Dr. David Alan Gilbert
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 41/47] qemu_ram_block_from_host Dr. David Alan Gilbert (git)
2014-11-13  2:59   ` David Gibson
2014-11-25 18:55     ` Dr. David Alan Gilbert
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 42/47] Don't sync dirty bitmaps in postcopy Dr. David Alan Gilbert (git)
2014-11-13  3:01   ` David Gibson
2014-11-25 16:25     ` Dr. David Alan Gilbert
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 43/47] Host page!=target page: Cleanup bitmaps Dr. David Alan Gilbert (git)
2014-11-13  3:10   ` David Gibson
2014-12-17 18:21     ` Dr. David Alan Gilbert
2015-01-27  4:50       ` David Gibson
2015-01-27 10:04         ` Dr. David Alan Gilbert
2015-01-28  5:36           ` David Gibson
2015-01-27 10:20   ` Peter Maydell
2015-01-27 11:50     ` Dr. David Alan Gilbert
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 44/47] Postcopy; Handle userfault requests Dr. David Alan Gilbert (git)
2014-11-13  3:23   ` David Gibson
2015-01-05 17:13     ` Dr. David Alan Gilbert
2015-01-27  4:33       ` David Gibson
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 45/47] Start up a postcopy/listener thread ready for incoming page data Dr. David Alan Gilbert (git)
2014-11-13  3:29   ` David Gibson
2014-11-19 19:40     ` Dr. David Alan Gilbert
2014-11-21  8:36       ` David Gibson
2014-11-21 10:17         ` Dr. David Alan Gilbert
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 46/47] postcopy: Wire up loadvm_postcopy_ram_handle_{run, end} commands Dr. David Alan Gilbert (git)
2014-10-04 17:51   ` Paolo Bonzini
2014-10-23 12:18     ` Dr. David Alan Gilbert
2014-10-03 17:47 ` [Qemu-devel] [PATCH v4 47/47] End of migration for postcopy Dr. David Alan Gilbert (git)
2014-10-04 17:49   ` Paolo Bonzini
2014-10-23 14:24     ` Dr. David Alan Gilbert
2014-10-04 18:31   ` Paolo Bonzini
2014-10-07 10:29     ` Dr. David Alan Gilbert
2014-10-07 11:12       ` Paolo Bonzini
2014-10-03 19:21 ` [Qemu-devel] [PATCH v4 00/47] Postcopy implementation Dr. David Alan Gilbert
2014-10-07  2:27   ` Cristian Klein
2014-10-07  8:12     ` Dr. David Alan Gilbert
2014-10-08  8:36       ` Cristian Klein
2014-11-21  3:48 ` zhanghailiang
2014-11-21 10:14   ` Dr. David Alan Gilbert
2014-11-24  8:10     ` zhanghailiang
2014-11-21 18:56   ` Andrea Arcangeli
2014-11-24  8:25     ` zhanghailiang