All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v2 00/43] Postcopy implementation
@ 2014-08-11 14:29 Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 01/43] qemu_ram_foreach_block: pass up error value, and down the ramblock name Dr. David Alan Gilbert (git)
                   ` (43 more replies)
  0 siblings, 44 replies; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Hi,
  This is a 2nd cut of my postcopy implementation; it fixes up some
of the comments from the 1st posting but it also fixes a lot of bugs,
(other comments from the 1st round will be fixed later).

The commands to start a postcopy migration have changed to:

  migrate_set_capability x-postcopy-ram on
  migrate -d   tcp:whereever
  <some time later>
  migrate_start_postcopy

Starting the destination in paused mode (-s) now works.

This now survives in a test harness with a few hundred migrations without
problems (both on heavily loaded and idle VMs).

Other changes:
  Changed source rp handler to coroutine (from fd_set_handler) - cures
    occasional loss of requests (and hence huge latency spikes)
  Added 'SHUT' rp command as a handshake from the destination that it's finished
  Fixed race during migration of guest with very little page modification
  Shutdown the fault-thread cleanly
  Update 'received' map in ADVISE state
  Turned off nagling on rp - we want our requests to get their ASAP
  Fixed seg when attempting to postcopy to a stream that couldn't get a return path
  Got rid of the move of QEMUFile into the .h
  Fixed up '-ve' and title of assert patch
  Removed a couple of return-path opening optimisations as per Paolo's comments
 
Note this version needs the QEMUSizedBuffer/QEMUFile patches that I've posted
separately.

Current TODO:
   1) It's not bisectable yet
   2) There are no testsuite additions (although I have a virt-test modification
      I've been using).
   3) End-of-migration cleanup is much better than it was but still needs some work.
   4) Not all the code is there for systems with hostpagesize!=qemupagesize
   5) xbzrle needs disabling once in postcopy
   6) RDMA needs some rework
   7) The latency measurements are now pretty consistent, no very large spikes,
      but they're a bit higher than expected, I need to look at rate limiting
      just the background scan.
   8) Conversion of return-path to a process and blocking fd needs investigation
      (as per discussion with Paolo)
   9) Andrea has suggestions on ways to avoid some of the huge-page splitting
      that occurs during the discard phase after precopy.
  10) I'd like to format the data on the return path in a more structured way
      (i.e. maybe using stuff from my BER world).
  11) The ACPI fix in 2.1 that allowed migrating RAMBlocks to be larger than
      the source feels like it needs looking at for postcopy.
Dave

Dr. David Alan Gilbert (43):
  qemu_ram_foreach_block: pass up error value, and down the ramblock
    name
  improve DPRINTF macros, add to savevm
  Add qemu_get_counted_string to read a string prefixed by a count byte
  Create MigrationIncomingState
  
  Return path: socket_writev_buffer: Block even on non-blocking fd's
  Migration commands
  Return path: Control commands
  Return path: Send responses from destination to source
  Return path: Source handling of return path
  qemu_loadvm errors and debug
  ram_debug_dump_bitmap: Dump a migration bitmap as text
  Rework loadvm path for subloops
  Add migration-capability boolean for postcopy-ram.
  Add wrappers and handlers for sending/receiving the postcopy-ram
    migration messages.
  QEMU_VM_CMD_PACKAGED: Send a packaged chunk of migration stream
  migrate_init: Call from savevm
  Allow savevm handlers to state whether they could go into postcopy
  postcopy: OS support test
  migrate_start_postcopy: Command to trigger transition to postcopy
  MIG_STATE_POSTCOPY_ACTIVE: Add new migration state
  qemu_savevm_state_complete: Postcopy changes
  Postcopy: Maintain sentmap during postcopy pre phase
  Postcopy page-map-incoming (PMI) structure
  postcopy: Add incoming_init/cleanup functions
  postcopy: Incoming initialisation
  postcopy: ram_enable_notify to switch on userfault
  Postcopy: postcopy_start
  Postcopy: Rework migration thread for postcopy mode
  mig fd_connect: open return path
  Postcopy: Create a fault handler thread before marking the ram as
    userfault
  Page request:  Add MIG_RPCOMM_REQPAGES reverse command
  Page request: Process incoming page request
  Page request: Consume pages off the post-copy queue
  Add assertion to check migration_dirty_pages
  postcopy_ram.c: place_page and helpers
  Postcopy: Use helpers to map pages during migration
  qemu_ram_block_from_host
  Postcopy; Handle userfault requests
  Start up a postcopy/listener thread ready for incoming page data
  postcopy: Wire up loadvm_postcopy_ram_handle_{run,end} commands
  End of migration for postcopy
  Start documenting how postcopy works.

 Makefile.objs                    |   2 +-
 arch_init.c                      | 490 +++++++++++++++++++--
 docs/migration.txt               | 150 +++++++
 exec.c                           |  66 ++-
 hmp-commands.hx                  |  15 +
 hmp.c                            |   7 +
 hmp.h                            |   1 +
 include/exec/cpu-common.h        |   8 +-
 include/migration/migration.h    | 128 ++++++
 include/migration/postcopy-ram.h |  89 ++++
 include/migration/qemu-file.h    |   9 +
 include/migration/vmstate.h      |   2 +-
 include/qemu/typedefs.h          |   7 +-
 include/sysemu/sysemu.h          |  41 +-
 migration-rdma.c                 |   4 +-
 migration.c                      | 667 ++++++++++++++++++++++++++--
 postcopy-ram.c                   | 915 +++++++++++++++++++++++++++++++++++++++
 qapi-schema.json                 |  14 +-
 qemu-file.c                      | 124 +++++-
 qmp-commands.hx                  |  19 +
 savevm.c                         | 854 +++++++++++++++++++++++++++++++++---
 21 files changed, 3473 insertions(+), 139 deletions(-)
 create mode 100644 include/migration/postcopy-ram.h
 create mode 100644 postcopy-ram.c

-- 
1.9.3

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 01/43] qemu_ram_foreach_block: pass up error value, and down the ramblock name
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 18:29   ` Eric Blake
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 02/43] improve DPRINTF macros, add to savevm Dr. David Alan Gilbert (git)
                   ` (42 subsequent siblings)
  43 siblings, 1 reply; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

check the return value of the function it calls and error if it's none-0
Fixup qemu_rdma_init_one_block that is the only current caller,
  and __qemu_rdma_add_block the only function it calls using it.

Pass the name of the ramblock to the function; helps in debugging.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 exec.c                    | 10 ++++++++--
 include/exec/cpu-common.h |  4 ++--
 migration-rdma.c          |  4 ++--
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/exec.c b/exec.c
index 765bd94..729c12c 100644
--- a/exec.c
+++ b/exec.c
@@ -2774,12 +2774,18 @@ bool cpu_physical_memory_is_io(hwaddr phys_addr)
              memory_region_is_romd(mr));
 }
 
-void qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque)
+int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque)
 {
     RAMBlock *block;
+    int ret;
 
     QTAILQ_FOREACH(block, &ram_list.blocks, next) {
-        func(block->host, block->offset, block->length, opaque);
+        ret = func(block->idstr, block->host, block->offset, block->length,
+                   opaque);
+        if (ret) {
+            return ret;
+        }
     }
+    return 0;
 }
 #endif
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index e3ec4c8..8042f50 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -118,10 +118,10 @@ void cpu_flush_icache_range(hwaddr start, int len);
 extern struct MemoryRegion io_mem_rom;
 extern struct MemoryRegion io_mem_notdirty;
 
-typedef void (RAMBlockIterFunc)(void *host_addr,
+typedef int (RAMBlockIterFunc)(const char *block_name, void *host_addr,
     ram_addr_t offset, ram_addr_t length, void *opaque);
 
-void qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque);
+int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque);
 
 #endif
 
diff --git a/migration-rdma.c b/migration-rdma.c
index d99812c..666c052 100644
--- a/migration-rdma.c
+++ b/migration-rdma.c
@@ -595,10 +595,10 @@ static int __qemu_rdma_add_block(RDMAContext *rdma, void *host_addr,
  * in advanced before the migration starts. This tells us where the RAM blocks
  * are so that we can register them individually.
  */
-static void qemu_rdma_init_one_block(void *host_addr,
+static int qemu_rdma_init_one_block(const char *block_name, void *host_addr,
     ram_addr_t block_offset, ram_addr_t length, void *opaque)
 {
-    __qemu_rdma_add_block(opaque, host_addr, block_offset, length);
+    return __qemu_rdma_add_block(opaque, host_addr, block_offset, length);
 }
 
 /*
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 02/43] improve DPRINTF macros, add to savevm
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 01/43] qemu_ram_foreach_block: pass up error value, and down the ramblock name Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 03/43] Add qemu_get_counted_string to read a string prefixed by a count byte Dr. David Alan Gilbert (git)
                   ` (41 subsequent siblings)
  43 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Improve the existing DPRINTF macros in migration.c and arch_init
by:
  1) Making them go to stderr rather than stdout (so you can run with
-nographic and redirect your debug to a file)
  2) Making them print the ms time with each debug - useful for
debugging latency issues

Add the same macro to savevm

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c |  5 ++++-
 migration.c | 12 ++++++++++++
 savevm.c    | 10 ++++++++++
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/arch_init.c b/arch_init.c
index 8ddaf35..eb6455a 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -53,9 +53,12 @@
 #include "hw/acpi/acpi.h"
 #include "qemu/host-utils.h"
 
+// #define DEBUG_ARCH_INIT
 #ifdef DEBUG_ARCH_INIT
 #define DPRINTF(fmt, ...) \
-    do { fprintf(stdout, "arch_init: " fmt, ## __VA_ARGS__); } while (0)
+    do { fprintf(stderr,  "arch_init@%" PRId64 " " fmt "\n", \
+                          qemu_clock_get_ms(QEMU_CLOCK_REALTIME), \
+                          ## __VA_ARGS__); } while (0)
 #else
 #define DPRINTF(fmt, ...) \
     do { } while (0)
diff --git a/migration.c b/migration.c
index 8d675b3..e241370 100644
--- a/migration.c
+++ b/migration.c
@@ -26,6 +26,18 @@
 #include "qmp-commands.h"
 #include "trace.h"
 
+//#define DEBUG_MIGRATION
+
+#ifdef DEBUG_MIGRATION
+#define DPRINTF(fmt, ...) \
+    do { fprintf(stderr, "migration@%" PRId64 " " fmt "\n", \
+                          qemu_clock_get_ms(QEMU_CLOCK_REALTIME), \
+                          ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+    do { } while (0)
+#endif
+
 enum {
     MIG_STATE_ERROR = -1,
     MIG_STATE_NONE,
diff --git a/savevm.c b/savevm.c
index e19ae0a..c3a1f68 100644
--- a/savevm.c
+++ b/savevm.c
@@ -43,6 +43,16 @@
 #include "block/snapshot.h"
 #include "block/qapi.h"
 
+#ifdef DEBUG_SAVEVM
+#define DPRINTF(fmt, ...) \
+    do { fprintf(stderr, "savevm@%" PRId64 " " fmt "\n", \
+                          qemu_clock_get_ms(QEMU_CLOCK_REALTIME), \
+                          ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+    do { } while (0)
+#endif
+
 
 #ifndef ETH_P_RARP
 #define ETH_P_RARP 0x8035
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 03/43] Add qemu_get_counted_string to read a string prefixed by a count byte
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 01/43] qemu_ram_foreach_block: pass up error value, and down the ramblock name Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 02/43] improve DPRINTF macros, add to savevm Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 04/43] Create MigrationIncomingState Dr. David Alan Gilbert (git)
                   ` (40 subsequent siblings)
  43 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

and use it in loadvm_state.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/qemu-file.h |  2 ++
 qemu-file.c                   | 15 +++++++++++++++
 savevm.c                      | 18 ++++++++++--------
 3 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index 80af3ff..e50d696 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -300,4 +300,6 @@ static inline void qemu_get_sbe64s(QEMUFile *f, int64_t *pv)
 {
     qemu_get_be64s(f, (uint64_t *)pv);
 }
+
+int qemu_get_counted_string(QEMUFile *f, uint8_t *buf);
 #endif
diff --git a/qemu-file.c b/qemu-file.c
index d64bee2..f6d64ce 100644
--- a/qemu-file.c
+++ b/qemu-file.c
@@ -879,6 +879,21 @@ uint64_t qemu_get_be64(QEMUFile *f)
     return v;
 }
 
+/*
+ * Get a string whose length is determined by a single preceding byte
+ * A preallocated 256 byte buffer must be passed in.
+ * Returns: 0 on success and a 0 terminated string in the buffer
+ */
+int qemu_get_counted_string(QEMUFile *f, uint8_t *buf)
+{
+    unsigned int len = qemu_get_byte(f);
+    int res = qemu_get_buffer(f, buf, len);
+
+    buf[len] = 0;
+
+    return res != len;
+}
+
 #define QSB_CHUNK_SIZE      (1 << 10)
 #define QSB_MAX_CHUNK_SIZE  (10 * QSB_CHUNK_SIZE)
 
diff --git a/savevm.c b/savevm.c
index c3a1f68..cb6f0de 100644
--- a/savevm.c
+++ b/savevm.c
@@ -908,7 +908,7 @@ int qemu_loadvm_state(QEMUFile *f)
 
     v = qemu_get_be32(f);
     if (v == QEMU_VM_FILE_VERSION_COMPAT) {
-        fprintf(stderr, "SaveVM v2 format is obsolete and don't work anymore\n");
+        error_report("SaveVM v2 format is obsolete and don't work anymore");
         return -ENOTSUP;
     }
     if (v != QEMU_VM_FILE_VERSION) {
@@ -918,31 +918,33 @@ int qemu_loadvm_state(QEMUFile *f)
     while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
         uint32_t instance_id, version_id, section_id;
         SaveStateEntry *se;
-        char idstr[257];
-        int len;
+        char idstr[256];
 
         switch (section_type) {
         case QEMU_VM_SECTION_START:
         case QEMU_VM_SECTION_FULL:
             /* Read section start */
             section_id = qemu_get_be32(f);
-            len = qemu_get_byte(f);
-            qemu_get_buffer(f, (uint8_t *)idstr, len);
-            idstr[len] = 0;
+            if (qemu_get_counted_string(f, (uint8_t *)idstr)) {
+                error_report("Unable to read ID string for section %u",
+                            section_id);
+                return -EINVAL;
+            }
             instance_id = qemu_get_be32(f);
             version_id = qemu_get_be32(f);
 
             /* Find savevm section */
             se = find_se(idstr, instance_id);
             if (se == NULL) {
-                fprintf(stderr, "Unknown savevm section or instance '%s' %d\n", idstr, instance_id);
+                error_report("Unknown savevm section or instance '%s' %d",
+                             idstr, instance_id);
                 ret = -EINVAL;
                 goto out;
             }
 
             /* Validate version */
             if (version_id > se->version_id) {
-                fprintf(stderr, "savevm: unsupported version %d for '%s' v%d\n",
+                error_report("savevm: unsupported version %d for '%s' v%d",
                         version_id, idstr, se->version_id);
                 ret = -EINVAL;
                 goto out;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 04/43] Create MigrationIncomingState
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (2 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 03/43] Add qemu_get_counted_string to read a string prefixed by a count byte Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 05/43] Return path: Open a return path on QEMUFile for sockets Dr. David Alan Gilbert (git)
                   ` (39 subsequent siblings)
  43 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

There are currently lots of pieces of incoming migration state scattered
around, and postcopy is adding more, and it seems better to try and keep
it together.

allocate MIS in process_incoming_migration_co

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  9 +++++++++
 include/qemu/typedefs.h       |  2 ++
 migration.c                   | 30 ++++++++++++++++++++++++++++++
 savevm.c                      |  2 ++
 4 files changed, 43 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 3cb5ba8..8a36255 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -41,6 +41,15 @@ struct MigrationParams {
 
 typedef struct MigrationState MigrationState;
 
+/* State for the incoming migration */
+struct MigrationIncomingState {
+    QEMUFile *file;
+};
+
+MigrationIncomingState *migration_incoming_get_current(void);
+MigrationIncomingState *migration_incoming_state_init(QEMUFile *f);
+void migration_incoming_state_destroy(void);
+
 struct MigrationState
 {
     int64_t bandwidth_limit;
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index db1153a..0f79b5c 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -14,6 +14,7 @@ typedef struct Visitor Visitor;
 
 struct Monitor;
 typedef struct Monitor Monitor;
+typedef struct MigrationIncomingState MigrationIncomingState;
 typedef struct MigrationParams MigrationParams;
 
 typedef struct Property Property;
@@ -44,6 +45,7 @@ typedef struct PixelFormat PixelFormat;
 typedef struct QemuConsole QemuConsole;
 typedef struct CharDriverState CharDriverState;
 typedef struct MACAddr MACAddr;
+typedef struct MigrationIncomingState MigrationIncomingState;
 typedef struct NetClientState NetClientState;
 typedef struct I2CBus I2CBus;
 typedef struct ISABus ISABus;
diff --git a/migration.c b/migration.c
index e241370..c203958 100644
--- a/migration.c
+++ b/migration.c
@@ -65,6 +65,7 @@ static NotifierList migration_state_notifiers =
    migrations at once.  For now we don't need to add
    dynamic creation of migration */
 
+/* For outgoing */
 MigrationState *migrate_get_current(void)
 {
     static MigrationState current_migration = {
@@ -77,6 +78,28 @@ MigrationState *migrate_get_current(void)
     return &current_migration;
 }
 
+/* For incoming */
+static MigrationIncomingState *mis_current;
+
+MigrationIncomingState *migration_incoming_get_current(void)
+{
+    return mis_current;
+}
+
+MigrationIncomingState *migration_incoming_state_init(QEMUFile* f)
+{
+    mis_current = g_malloc0(sizeof(MigrationIncomingState));
+    mis_current->file = f;
+
+    return mis_current;
+}
+
+void migration_incoming_state_destroy(void)
+{
+    g_free(mis_current);
+    mis_current = NULL;
+}
+
 void qemu_start_incoming_migration(const char *uri, Error **errp)
 {
     const char *p;
@@ -104,11 +127,18 @@ static void process_incoming_migration_co(void *opaque)
 {
     QEMUFile *f = opaque;
     Error *local_err = NULL;
+    MigrationIncomingState *mis;
     int ret;
 
+    mis = migration_incoming_state_init(f);
+
     ret = qemu_loadvm_state(f);
+
     qemu_fclose(f);
     free_xbzrle_decoded_buf();
+    migration_incoming_state_destroy();
+    mis = NULL;
+
     if (ret < 0) {
         error_report("load of migration failed: %s", strerror(-ret));
         exit(EXIT_FAILURE);
diff --git a/savevm.c b/savevm.c
index cb6f0de..a0c3b40 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1244,9 +1244,11 @@ int load_vmstate(const char *name)
     }
 
     qemu_system_reset(VMRESET_SILENT);
+    migration_incoming_state_init(f);
     ret = qemu_loadvm_state(f);
 
     qemu_fclose(f);
+    migration_incoming_state_destroy();
     if (ret < 0) {
         error_report("Error %d while loading VM state", ret);
         return ret;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 05/43] Return path: Open a return path on QEMUFile for sockets
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (3 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 04/43] Create MigrationIncomingState Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 06/43] Return path: socket_writev_buffer: Block even on non-blocking fd's Dr. David Alan Gilbert (git)
                   ` (38 subsequent siblings)
  43 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Postcopy needs a method to send messages from the destination back to
the source, this is the 'return path'.

Wire it up for 'socket' QEMUFile's using a dup'd fd.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/qemu-file.h |  7 +++++
 qemu-file.c                   | 70 +++++++++++++++++++++++++++++++++++++++----
 2 files changed, 71 insertions(+), 6 deletions(-)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index e50d696..06baaf4 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -84,6 +84,11 @@ typedef size_t (QEMURamSaveFunc)(QEMUFile *f, void *opaque,
                                size_t size,
                                int *bytes_sent);
 
+/*
+ * Return a QEMUFile for comms in the opposite direction
+ */
+typedef QEMUFile *(QEMURetPathFunc)(void *opaque);
+
 typedef struct QEMUFileOps {
     QEMUFilePutBufferFunc *put_buffer;
     QEMUFileGetBufferFunc *get_buffer;
@@ -94,6 +99,7 @@ typedef struct QEMUFileOps {
     QEMURamHookFunc *after_ram_iterate;
     QEMURamHookFunc *hook_ram_load;
     QEMURamSaveFunc *save_page;
+    QEMURetPathFunc *get_return_path;
 } QEMUFileOps;
 
 struct QEMUSizedBuffer {
@@ -178,6 +184,7 @@ void qemu_file_set_rate_limit(QEMUFile *f, int64_t new_rate);
 int64_t qemu_file_get_rate_limit(QEMUFile *f);
 int qemu_file_get_error(QEMUFile *f);
 void qemu_file_set_error(QEMUFile *f, int ret);
+QEMUFile *qemu_file_get_return_path(QEMUFile *f);
 void qemu_fflush(QEMUFile *f);
 
 static inline void qemu_put_be64s(QEMUFile *f, const uint64_t *pv)
diff --git a/qemu-file.c b/qemu-file.c
index f6d64ce..46cb6de 100644
--- a/qemu-file.c
+++ b/qemu-file.c
@@ -26,6 +26,8 @@ struct QEMUFile {
     unsigned int iovcnt;
 
     int last_error;
+
+    struct QEMUFile *return_path;
 };
 
 typedef struct QEMUFileStdio {
@@ -38,6 +40,48 @@ typedef struct QEMUFileSocket {
     QEMUFile *file;
 } QEMUFileSocket;
 
+/* Give a QEMUFile* off the same socket but data in the opposite
+ * direction.
+ * qemu_fopen_socket marks write fd's as blocking, but doesn't
+ * touch read fd's status, so we dup the fd just to keep settings
+ * separate.
+ */
+static QEMUFile *socket_dup_return_path(void *opaque)
+{
+    QEMUFileSocket *qfs = opaque;
+    int revfd;
+    bool this_is_read;
+    QEMUFile *result;
+
+    /* We should only be called once to get a RP on a file */
+    assert(!qfs->file->return_path);
+
+    if (qemu_file_get_error(qfs->file)) {
+        /* If the forward file is in error, don't try and open a return */
+        return NULL;
+    }
+
+    /* I don't think there's a better way to tell which direction 'this' is */
+    this_is_read = qfs->file->ops->get_buffer != NULL;
+
+    revfd = dup(qfs->fd);
+    if (revfd == -1) {
+        error_report("Error duplicating fd for return path: %s",
+                      strerror(errno));
+        return NULL;
+    }
+
+    qemu_set_nonblock(revfd);
+    result = qemu_fopen_socket(revfd, this_is_read ? "wb" : "rb");
+    qfs->file->return_path = result;
+
+    if (!result) {
+        close(revfd);
+    }
+
+    return result;
+}
+
 static ssize_t socket_writev_buffer(void *opaque, struct iovec *iov, int iovcnt,
                                     int64_t pos)
 {
@@ -335,17 +379,31 @@ QEMUFile *qemu_fdopen(int fd, const char *mode)
 }
 
 static const QEMUFileOps socket_read_ops = {
-    .get_fd =     socket_get_fd,
-    .get_buffer = socket_get_buffer,
-    .close =      socket_close
+    .get_fd          = socket_get_fd,
+    .get_buffer      = socket_get_buffer,
+    .close           = socket_close,
+    .get_return_path = socket_dup_return_path
 };
 
 static const QEMUFileOps socket_write_ops = {
-    .get_fd =     socket_get_fd,
-    .writev_buffer = socket_writev_buffer,
-    .close =      socket_close
+    .get_fd          = socket_get_fd,
+    .writev_buffer   = socket_writev_buffer,
+    .close           = socket_close,
+    .get_return_path = socket_dup_return_path
 };
 
+/*
+ * Result: QEMUFile* for a 'return path' for comms in the opposite direction
+ *         NULL if not available
+ */
+QEMUFile *qemu_file_get_return_path(QEMUFile *f)
+{
+    if (!f->ops->get_return_path) {
+        return NULL;
+    }
+    return f->ops->get_return_path(f->opaque);
+}
+
 bool qemu_file_mode_is_not_valid(const char *mode)
 {
     if (mode == NULL ||
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 06/43] Return path: socket_writev_buffer: Block even on non-blocking fd's
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (4 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 05/43] Return path: Open a return path on QEMUFile for sockets Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-12  2:13   ` [Qemu-devel] 答复: " chenliang (T)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 07/43] Migration commands Dr. David Alan Gilbert (git)
                   ` (37 subsequent siblings)
  43 siblings, 1 reply; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The return path uses a non-blocking fd so as not to block waiting
for the (possibly broken) destination to finish returning a message,
however we still want outbound data to behave in the same way and block.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 qemu-file.c | 39 +++++++++++++++++++++++++++++++++++----
 1 file changed, 35 insertions(+), 4 deletions(-)

diff --git a/qemu-file.c b/qemu-file.c
index 46cb6de..f410a1d 100644
--- a/qemu-file.c
+++ b/qemu-file.c
@@ -88,12 +88,43 @@ static ssize_t socket_writev_buffer(void *opaque, struct iovec *iov, int iovcnt,
     QEMUFileSocket *s = opaque;
     ssize_t len;
     ssize_t size = iov_size(iov, iovcnt);
+    ssize_t offset = 0;
+    int     err;
 
-    len = iov_send(s->fd, iov, iovcnt, 0, size);
-    if (len < size) {
-        len = -socket_error();
+    while (size > 0) {
+        len = iov_send(s->fd, iov, iovcnt, offset, size);
+
+        if (len > 0) {
+            size -= len;
+            offset += len;
+        }
+
+        if (size > 0) {
+            err = socket_error();
+
+            if (err != EAGAIN) {
+                error_report("socket_writev_buffer: Got err=%d for (%zd/%zd)",
+                             err, size, len);
+                /*
+                 * If I've already sent some but only just got the error, I
+                 * could return the amount validly sent so far and wait for the
+                 * next call to report the error, but I'd rather flag the error
+                 * immediately.
+                 */
+                return -err;
+            }
+
+            /* Emulate blocking */
+            GPollFD pfd;
+
+            pfd.fd = s->fd;
+            pfd.events = G_IO_OUT | G_IO_ERR;
+            pfd.revents = 0;
+            g_poll(&pfd, 1 /* 1 fd */, -1 /* no timeout */);
+        }
     }
-    return len;
+
+    return offset;
 }
 
 static int socket_get_fd(void *opaque)
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 07/43] Migration commands
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (5 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 06/43] Return path: socket_writev_buffer: Block even on non-blocking fd's Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 08/43] Return path: Control commands Dr. David Alan Gilbert (git)
                   ` (36 subsequent siblings)
  43 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Create QEMU_VM_COMMAND section type for sending commands from
source to destination.  These commands are not intended to convey
guest state but to control the migration process.

For use in postcopy.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  1 +
 include/sysemu/sysemu.h       |  7 +++++
 savevm.c                      | 62 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 70 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 8a36255..e23947a 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -33,6 +33,7 @@
 #define QEMU_VM_SECTION_END          0x03
 #define QEMU_VM_SECTION_FULL         0x04
 #define QEMU_VM_SUBSECTION           0x05
+#define QEMU_VM_COMMAND              0x06
 
 struct MigrationParams {
     bool blk;
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index d8539fd..66821eb 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -81,6 +81,13 @@ void do_info_snapshots(Monitor *mon, const QDict *qdict);
 
 void qemu_announce_self(void);
 
+/* Subcommands for QEMU_VM_COMMAND */
+enum qemu_vm_cmd {
+    QEMU_VM_CMD_INVALID = 0,   /* Must be 0 */
+
+    QEMU_VM_CMD_AFTERLASTVALID
+};
+
 bool qemu_savevm_state_blocked(Error **errp);
 void qemu_savevm_state_begin(QEMUFile *f,
                              const MigrationParams *params);
diff --git a/savevm.c b/savevm.c
index a0c3b40..8a1f5f1 100644
--- a/savevm.c
+++ b/savevm.c
@@ -592,6 +592,25 @@ static void vmstate_save(QEMUFile *f, SaveStateEntry *se)
     vmstate_save_state(f, se->vmsd, se->opaque);
 }
 
+
+/* Send a 'QEMU_VM_COMMAND' type element with the command
+ * and associated data.
+ */
+void qemu_savevm_command_send(QEMUFile *f,
+                              enum qemu_vm_cmd command,
+                              uint16_t len,
+                              uint8_t *data)
+{
+    uint32_t tmp = (uint16_t)command;
+    qemu_put_byte(f, QEMU_VM_COMMAND);
+    qemu_put_be16(f, tmp);
+    qemu_put_be16(f, len);
+    if (len) {
+        qemu_put_buffer(f, data, len);
+    }
+    qemu_fflush(f);
+}
+
 bool qemu_savevm_state_blocked(Error **errp)
 {
     SaveStateEntry *se;
@@ -881,6 +900,43 @@ static SaveStateEntry *find_se(const char *idstr, int instance_id)
     return NULL;
 }
 
+static int loadvm_process_command_simple_lencheck(const char *name,
+                                                  unsigned int actual,
+                                                  unsigned int expected)
+{
+    if (actual != expected) {
+        error_report("%s received with bad length - expecting %d, got %d",
+                     name, expected, actual);
+        return -1;
+    }
+
+    return 0;
+}
+
+/*
+ * Process an incoming 'QEMU_VM_COMMAND'
+ * negative return on error (will issue error message)
+ */
+static int loadvm_process_command(QEMUFile *f)
+{
+    uint16_t com;
+    uint16_t len;
+    uint32_t tmp32;
+
+    com = qemu_get_be16(f);
+    len = qemu_get_be16(f);
+
+    /* fprintf(stderr,"loadvm_process_command: com=0x%x len=%d\n", com,len); */
+    switch (com) {
+
+    default:
+        error_report("VM_COMMAND 0x%x unknown (len 0x%x)", com, len);
+        return -1;
+    }
+
+    return 0;
+}
+
 typedef struct LoadStateEntry {
     QLIST_ENTRY(LoadStateEntry) entry;
     SaveStateEntry *se;
@@ -987,6 +1043,12 @@ int qemu_loadvm_state(QEMUFile *f)
                 goto out;
             }
             break;
+        case QEMU_VM_COMMAND:
+            ret = loadvm_process_command(f);
+            if (ret < 0) {
+                goto out;
+            }
+            break;
         default:
             fprintf(stderr, "Unknown savevm section type %d\n", section_type);
             ret = -EINVAL;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 08/43] Return path: Control commands
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (6 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 07/43] Migration commands Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 09/43] Return path: Send responses from destination to source Dr. David Alan Gilbert (git)
                   ` (35 subsequent siblings)
  43 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add two src->dest commands:
   * OPENRP - To request that the destination open the return path
   * REQACK - Request an acknowledge from the destination

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  2 ++
 include/sysemu/sysemu.h       |  2 ++
 savevm.c                      | 43 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 47 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index e23947a..173775b 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -45,6 +45,8 @@ typedef struct MigrationState MigrationState;
 /* State for the incoming migration */
 struct MigrationIncomingState {
     QEMUFile *file;
+
+    QEMUFile *return_path;
 };
 
 MigrationIncomingState *migration_incoming_get_current(void);
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 66821eb..b25e938 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -84,6 +84,8 @@ void qemu_announce_self(void);
 /* Subcommands for QEMU_VM_COMMAND */
 enum qemu_vm_cmd {
     QEMU_VM_CMD_INVALID = 0,   /* Must be 0 */
+    QEMU_VM_CMD_OPENRP,        /* Tell the dest to open the Return path */
+    QEMU_VM_CMD_REQACK,        /* Request an ACK on the RP */
 
     QEMU_VM_CMD_AFTERLASTVALID
 };
diff --git a/savevm.c b/savevm.c
index 8a1f5f1..8eebbfd 100644
--- a/savevm.c
+++ b/savevm.c
@@ -611,6 +611,19 @@ void qemu_savevm_command_send(QEMUFile *f,
     qemu_fflush(f);
 }
 
+void qemu_savevm_send_reqack(QEMUFile *f, uint32_t value)
+{
+    uint32_t buf;
+
+    DPRINTF("send_reqack %d", value);
+    buf = cpu_to_be32(value);
+    qemu_savevm_command_send(f, QEMU_VM_CMD_REQACK, 4, (uint8_t *)&buf);
+}
+
+void qemu_savevm_send_openrp(QEMUFile *f)
+{
+    qemu_savevm_command_send(f, QEMU_VM_CMD_OPENRP, 0, NULL);
+}
 bool qemu_savevm_state_blocked(Error **errp)
 {
     SaveStateEntry *se;
@@ -919,6 +932,7 @@ static int loadvm_process_command_simple_lencheck(const char *name,
  */
 static int loadvm_process_command(QEMUFile *f)
 {
+    MigrationIncomingState *mis = migration_incoming_get_current();
     uint16_t com;
     uint16_t len;
     uint32_t tmp32;
@@ -928,6 +942,35 @@ static int loadvm_process_command(QEMUFile *f)
 
     /* fprintf(stderr,"loadvm_process_command: com=0x%x len=%d\n", com,len); */
     switch (com) {
+    case QEMU_VM_CMD_OPENRP:
+        if (loadvm_process_command_simple_lencheck("CMD_OPENRP", len, 0)) {
+            return -1;
+        }
+        if (mis->return_path) {
+            error_report("CMD_OPENRP called when RP already open");
+            /* Not really a problem, so don't give up */
+            return 0;
+        }
+        mis->return_path = qemu_file_get_return_path(f);
+        if (!mis->return_path) {
+            error_report("CMD_OPENRP failed - could not open return path");
+            return -1;
+        }
+        break;
+
+    case QEMU_VM_CMD_REQACK:
+        if (loadvm_process_command_simple_lencheck("CMD_REQACK", len, 4)) {
+            return -1;
+        }
+        tmp32 = qemu_get_be32(f);
+        DPRINTF("Received REQACK 0x%x", tmp32);
+        if (!mis->return_path) {
+            error_report("CMD_REQACK (0x%x) received with no open return path",
+                         tmp32);
+            return -1;
+        }
+        migrate_send_rp_ack(mis, tmp32);
+        break;
 
     default:
         error_report("VM_COMMAND 0x%x unknown (len 0x%x)", com, len);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 09/43] Return path: Send responses from destination to source
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (7 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 08/43] Return path: Control commands Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 10/43] Return path: Source handling of return path Dr. David Alan Gilbert (git)
                   ` (34 subsequent siblings)
  43 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add migrate_send_rp_message to send a message from destination to source along the return path.
  (It uses a mutex to let it be called from multiple threads)
Add migrate_send_rp_shut to send a 'shut' message to indicate
  the destination is finished with the RP.
Add migrate_send_rp_ack to send an 'ack' message
  Use it in the CMD_REQACK handler

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h | 18 ++++++++++++++++++
 migration.c                   | 41 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 59 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 173775b..12e640d 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -40,6 +40,13 @@ struct MigrationParams {
     bool shared;
 };
 
+/* Commands sent on the return path from destination to source*/
+enum mig_rpcomm_cmd {
+    MIG_RPCOMM_INVALID = 0,  /* Must be 0 */
+    MIG_RPCOMM_SHUT,         /* sibling will not send any more RP messages */
+    MIG_RPCOMM_ACK,          /* data (seq: be32 ) */
+    MIG_RPCOMM_AFTERLASTVALID
+};
 typedef struct MigrationState MigrationState;
 
 /* State for the incoming migration */
@@ -47,6 +54,7 @@ struct MigrationIncomingState {
     QEMUFile *file;
 
     QEMUFile *return_path;
+    QemuMutex      rp_mutex;    /* We send replies from multiple threads */
 };
 
 MigrationIncomingState *migration_incoming_get_current(void);
@@ -168,6 +176,16 @@ int64_t migrate_xbzrle_cache_size(void);
 
 int64_t xbzrle_cache_resize(int64_t new_size);
 
+/* Sending on the return path - generic and then for each message type */
+void migrate_send_rp_message(MigrationIncomingState *mis,
+                             enum mig_rpcomm_cmd cmd,
+                             uint16_t len, uint8_t *data);
+void migrate_send_rp_shut(MigrationIncomingState *mis,
+                          uint32_t value);
+void migrate_send_rp_ack(MigrationIncomingState *mis,
+                         uint32_t value);
+
+
 void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_load_hook(QEMUFile *f, uint64_t flags);
diff --git a/migration.c b/migration.c
index c203958..3e4e120 100644
--- a/migration.c
+++ b/migration.c
@@ -90,6 +90,7 @@ MigrationIncomingState *migration_incoming_state_init(QEMUFile* f)
 {
     mis_current = g_malloc0(sizeof(MigrationIncomingState));
     mis_current->file = f;
+    qemu_mutex_init(&mis_current->rp_mutex);
 
     return mis_current;
 }
@@ -100,6 +101,46 @@ void migration_incoming_state_destroy(void)
     mis_current = NULL;
 }
 
+/* Send a message on the return channel back to the source
+ * of the migration.
+ */
+void migrate_send_rp_message(MigrationIncomingState *mis,
+                             enum mig_rpcomm_cmd cmd,
+                             uint16_t len, uint8_t *data)
+{
+    DPRINTF("migrate_send_rp_message: cmd=%d, len=%d\n", (int)cmd, len);
+    qemu_mutex_lock(&mis->rp_mutex);
+    qemu_put_be16(mis->return_path, (unsigned int)cmd);
+    qemu_put_be16(mis->return_path, len);
+    qemu_put_buffer(mis->return_path, data, len);
+    qemu_fflush(mis->return_path);
+    qemu_mutex_unlock(&mis->rp_mutex);
+}
+
+/*
+ * Send a 'SHUT' message on the return channel with the given value
+ * to indicate that we've finished with the RP.  None-0 value indicates
+ * error.
+ */
+void migrate_send_rp_shut(MigrationIncomingState *mis,
+                          uint32_t value)
+{
+    uint32_t buf;
+
+    buf = cpu_to_be32(value);
+    migrate_send_rp_message(mis, MIG_RPCOMM_SHUT, 4, (uint8_t *)&buf);
+}
+
+/* Send an 'ACK' message on the return channel with the given value */
+void migrate_send_rp_ack(MigrationIncomingState *mis,
+                         uint32_t value)
+{
+    uint32_t buf;
+
+    buf = cpu_to_be32(value);
+    migrate_send_rp_message(mis, MIG_RPCOMM_ACK, 4, (uint8_t *)&buf);
+}
+
 void qemu_start_incoming_migration(const char *uri, Error **errp)
 {
     const char *p;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 10/43] Return path: Source handling of return path
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (8 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 09/43] Return path: Send responses from destination to source Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 11/43] qemu_loadvm errors and debug Dr. David Alan Gilbert (git)
                   ` (33 subsequent siblings)
  43 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Open a return path, and handle messages that are received upon it.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  11 ++++
 migration.c                   | 145 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 155 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 12e640d..0c9055f 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -47,6 +47,15 @@ enum mig_rpcomm_cmd {
     MIG_RPCOMM_ACK,          /* data (seq: be32 ) */
     MIG_RPCOMM_AFTERLASTVALID
 };
+
+struct MigrationRetPathState {
+    uint16_t      header_com;   /* Headers of last (partially?) received cmd */
+    uint16_t      header_len;
+    uint32_t      latest_ack;
+    bool          error;        /* True if something bad happened on the RP */
+    QemuSemaphore finished;     /* When the RP co quits */
+};
+
 typedef struct MigrationState MigrationState;
 
 /* State for the incoming migration */
@@ -69,9 +78,11 @@ struct MigrationState
     QemuThread thread;
     QEMUBH *cleanup_bh;
     QEMUFile *file;
+    QEMUFile *return_path;
 
     int state;
     MigrationParams params;
+    struct MigrationRetPathState rp_state;
     double mbps;
     int64_t total_time;
     int64_t downtime;
diff --git a/migration.c b/migration.c
index 3e4e120..c54d59e 100644
--- a/migration.c
+++ b/migration.c
@@ -373,6 +373,15 @@ static void migrate_set_state(MigrationState *s, int old_state, int new_state)
     }
 }
 
+static void migrate_fd_cleanup_src_rp(MigrationState *ms)
+{
+    if (ms->return_path) {
+        DPRINTF("cleaning up return path\n");
+        qemu_fclose(ms->return_path);
+        ms->return_path = NULL;
+    }
+}
+
 static void migrate_fd_cleanup(void *opaque)
 {
     MigrationState *s = opaque;
@@ -380,6 +389,8 @@ static void migrate_fd_cleanup(void *opaque)
     qemu_bh_delete(s->cleanup_bh);
     s->cleanup_bh = NULL;
 
+    migrate_fd_cleanup_src_rp(s);
+
     if (s->file) {
         trace_migrate_fd_cleanup();
         qemu_mutex_unlock_iothread();
@@ -657,8 +668,140 @@ int64_t migrate_xbzrle_cache_size(void)
     return s->xbzrle_cache_size;
 }
 
-/* migration thread support */
+/*
+ * Something bad happened to the RP stream, mark an error
+ * The caller shall print something to indicate why
+ */
+static void source_return_path_bad(MigrationState *s)
+{
+    s->rp_state.error = true;
+    migrate_fd_cleanup_src_rp(s);
+}
+
+/*
+ * Handles messages sent on the return path towards the source VM
+ *
+ * This is a coroutine that sits around listening for messages as
+ * long as the return-path exists
+ */
+static void source_return_path_co(void *opaque)
+{
+    MigrationState *ms = opaque;
+    QEMUFile *rp = ms->return_path;
+    const int max_len = 512;
+    uint8_t buf[max_len];
+    uint32_t tmp32;
+    int res;
+
+    DPRINTF("RP: source_return_path_co entry");
+    while (rp && !qemu_file_get_error(rp)) {
+        DPRINTF("RP: source_return_path_co top of loop");
+        ms->rp_state.header_com = qemu_get_be16(rp);
+        ms->rp_state.header_len = qemu_get_be16(rp);
+
+        uint16_t expected_len;
+
+        switch (ms->rp_state.header_com) {
+        case MIG_RPCOMM_SHUT:
+        case MIG_RPCOMM_ACK:
+            expected_len = 4;
+            break;
+
+        default:
+            error_report("RP: Received invalid cmd 0x%04x length 0x%04x",
+                    ms->rp_state.header_com, ms->rp_state.header_len);
+            source_return_path_bad(ms);
+            goto out;
+        }
 
+        if (ms->rp_state.header_len > expected_len) {
+            error_report("RP: Received command 0x%04x with"
+                    "incorrect length %d expecting %d",
+                    ms->rp_state.header_com, ms->rp_state.header_len,
+                    expected_len);
+            source_return_path_bad(ms);
+            goto out;
+        }
+
+        /* We know we've got a valid header by this point */
+        res = qemu_get_buffer(rp, buf, ms->rp_state.header_len);
+        if (res != ms->rp_state.header_len) {
+            DPRINTF("RP: Failed to read command data");
+            source_return_path_bad(ms);
+            goto out;
+        }
+
+        /* OK, we have the command and the data */
+        switch (ms->rp_state.header_com) {
+        case MIG_RPCOMM_SHUT:
+            tmp32 = be32_to_cpup((uint32_t *)buf);
+            if (tmp32) {
+                error_report("RP: Sibling indicated error %d", tmp32);
+                source_return_path_bad(ms);
+            } else {
+                DPRINTF("RP: SHUT received");
+            }
+            /*
+             * We'll let the main thread deal with closing the RP
+             * we could do a shutdown(2) on it, but we're the only user
+             * anyway, so there's nothing gained.
+             */
+            goto out;
+
+        case MIG_RPCOMM_ACK:
+            tmp32 = be32_to_cpup((uint32_t *)buf);
+            DPRINTF("RP: Received ACK 0x%x", tmp32);
+            atomic_xchg(&ms->rp_state.latest_ack, tmp32);
+            break;
+
+        default:
+            /* This shouldn't happen because we should catch this above */
+            DPRINTF("RP: Bad header_com in dispatch");
+        }
+        /* Latest command processed, now leave a gap for the next one */
+        ms->rp_state.header_com = MIG_RPCOMM_INVALID;
+    }
+    if (rp && qemu_file_get_error(rp)) {
+        DPRINTF("source_report_path_co: rp bad at end");
+        source_return_path_bad(ms);
+    }
+
+    DPRINTF("source_report_path_co: Bottom exit");
+
+out:
+    /* For await_outgoing_return_path_close */
+    qemu_sem_post(&ms->rp_state.finished);
+}
+
+static int open_outgoing_return_path(MigrationState *ms)
+{
+    Coroutine *co = qemu_coroutine_create(source_return_path_co);
+    qemu_sem_init(&ms->rp_state.finished, 0);
+
+    ms->return_path = qemu_file_get_return_path(ms->file);
+    if (!ms->return_path) {
+        return -1;
+    }
+
+    DPRINTF("open_outgoing_return_path starting co");
+    qemu_coroutine_enter(co, ms);
+    DPRINTF("open_outgoing_return_path continuing");
+
+    return 0;
+}
+
+static void await_outgoing_return_path_close(MigrationState *ms)
+{
+    /* TODO: once the _co becomes a process we can replace this by a join */
+    DPRINTF("%s: Waiting", __func__);
+    qemu_sem_wait(&ms->rp_state.finished);
+    DPRINTF("%s: Exit", __func__);
+}
+
+/*
+ * Master migration thread on the source VM.
+ * It drives the migration and pumps the data down the outgoing channel.
+ */
 static void *migration_thread(void *opaque)
 {
     MigrationState *s = opaque;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 11/43] qemu_loadvm errors and debug
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (9 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 10/43] Return path: Source handling of return path Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 12/43] ram_debug_dump_bitmap: Dump a migration bitmap as text Dr. David Alan Gilbert (git)
                   ` (32 subsequent siblings)
  43 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Flip many fprintf's to error_report
Add lots of DPRINTF debug in qemu_loadvm*

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 savevm.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/savevm.c b/savevm.c
index 8eebbfd..2c0d61a 100644
--- a/savevm.c
+++ b/savevm.c
@@ -719,6 +719,8 @@ int qemu_savevm_state_iterate(QEMUFile *f)
         trace_savevm_section_end(se->idstr, se->section_id);
 
         if (ret < 0) {
+            DPRINTF("%s: setting error state after iterate on id=%d/%s",
+                    __func__, se->section_id, se->idstr);
             qemu_file_set_error(f, ret);
         }
         if (ret <= 0) {
@@ -1019,6 +1021,7 @@ int qemu_loadvm_state(QEMUFile *f)
         SaveStateEntry *se;
         char idstr[256];
 
+        DPRINTF("qemu_loadvm_state loop: section_type=%d", section_type);
         switch (section_type) {
         case QEMU_VM_SECTION_START:
         case QEMU_VM_SECTION_FULL:
@@ -1032,6 +1035,9 @@ int qemu_loadvm_state(QEMUFile *f)
             instance_id = qemu_get_be32(f);
             version_id = qemu_get_be32(f);
 
+            DPRINTF("qemu_loadvm_state loop START/FULL: id=%d(%s)",
+                    section_id, idstr);
+
             /* Find savevm section */
             se = find_se(idstr, instance_id);
             if (se == NULL) {
@@ -1059,8 +1065,9 @@ int qemu_loadvm_state(QEMUFile *f)
 
             ret = vmstate_load(f, le->se, le->version_id);
             if (ret < 0) {
-                fprintf(stderr, "qemu: warning: error while loading state for instance 0x%x of device '%s'\n",
-                        instance_id, idstr);
+                error_report("qemu: error while loading state for"
+                             "instance 0x%x of device '%s'",
+                             instance_id, idstr);
                 goto out;
             }
             break;
@@ -1068,23 +1075,25 @@ int qemu_loadvm_state(QEMUFile *f)
         case QEMU_VM_SECTION_END:
             section_id = qemu_get_be32(f);
 
+            DPRINTF("QEMU_VM_SECTION_PART/END entry for id=%d", section_id);
             QLIST_FOREACH(le, &loadvm_handlers, entry) {
                 if (le->section_id == section_id) {
                     break;
                 }
             }
             if (le == NULL) {
-                fprintf(stderr, "Unknown savevm section %d\n", section_id);
+                error_report("Unknown savevm section %d", section_id);
                 ret = -EINVAL;
                 goto out;
             }
 
             ret = vmstate_load(f, le->se, le->version_id);
             if (ret < 0) {
-                fprintf(stderr, "qemu: warning: error while loading state section id %d\n",
-                        section_id);
+                error_report("qemu: error while loading state section"
+                             " id %d (%s)", section_id, le->se->idstr);
                 goto out;
             }
+            DPRINTF("QEMU_VM_SECTION_PART/END done for id=%d", section_id);
             break;
         case QEMU_VM_COMMAND:
             ret = loadvm_process_command(f);
@@ -1093,11 +1102,12 @@ int qemu_loadvm_state(QEMUFile *f)
             }
             break;
         default:
-            fprintf(stderr, "Unknown savevm section type %d\n", section_type);
+            error_report("Unknown savevm section type %d", section_type);
             ret = -EINVAL;
             goto out;
         }
     }
+    DPRINTF("qemu_loadvm_state loop: exited loop");
 
     cpu_synchronize_all_post_init();
 
@@ -1113,6 +1123,7 @@ out:
         ret = qemu_file_get_error(f);
     }
 
+    DPRINTF("qemu_loadvm_state out: ret=%d", ret);
     return ret;
 }
 
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 12/43] ram_debug_dump_bitmap: Dump a migration bitmap as text
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (10 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 11/43] qemu_loadvm errors and debug Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 13/43] Rework loadvm path for subloops Dr. David Alan Gilbert (git)
                   ` (31 subsequent siblings)
  43 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Misses out lines that are all the expected value so the output
can be quite compact depending on the circumstance.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c                   | 39 +++++++++++++++++++++++++++++++++++++++
 include/migration/migration.h |  1 +
 2 files changed, 40 insertions(+)

diff --git a/arch_init.c b/arch_init.c
index eb6455a..a3c468e 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -767,6 +767,45 @@ static void reset_ram_globals(void)
 
 #define MAX_WAIT 50 /* ms, half buffered_file limit */
 
+/*
+ * 'expected' is the value you expect the bitmap mostly to be full
+ * of and it won't bother printing lines that are all this value
+ * if 'todump' is null the migration bitmap is dumped.
+ */
+void ram_debug_dump_bitmap(unsigned long *todump, bool expected)
+{
+    int64_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
+
+    int64_t cur;
+    int64_t linelen = 128l;
+    char linebuf[129];
+
+    if (!todump) {
+        todump = migration_bitmap;
+    }
+
+    for (cur = 0; cur < ram_pages; cur += linelen) {
+        int64_t curb;
+        bool found = false;
+        /*
+         * Last line; catch the case where the line length
+         * is longer than remaining ram
+         */
+        if (cur+linelen > ram_pages) {
+            linelen = ram_pages - cur;
+        }
+        for (curb = 0; curb < linelen; curb++) {
+            bool thisbit = test_bit(cur+curb, todump);
+            linebuf[curb] = thisbit ? '1' : '.';
+            found |= (thisbit ^ expected);
+        }
+        if (found) {
+            linebuf[curb] = '\0';
+            fprintf(stderr,  "0x%08lx : %s\n", cur, linebuf);
+        }
+    }
+}
+
 static int ram_save_setup(QEMUFile *f, void *opaque)
 {
     RAMBlock *block;
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 0c9055f..984d96c 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -158,6 +158,7 @@ uint64_t xbzrle_mig_pages_cache_miss(void);
 double xbzrle_mig_cache_miss_rate(void);
 
 void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
+void ram_debug_dump_bitmap(unsigned long *todump, bool expected);
 
 /**
  * @migrate_add_blocker - prevent migration from proceeding
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 13/43] Rework loadvm path for subloops
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (11 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 12/43] ram_debug_dump_bitmap: Dump a migration bitmap as text Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 14/43] Add migration-capability boolean for postcopy-ram Dr. David Alan Gilbert (git)
                   ` (30 subsequent siblings)
  43 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Postcopy needs to have two migration streams loading concurrently;
one from memory (with the device state) and the other from the fd
with the memory transactions.

Split the core of qemu_loadvm_state out so we can use it for both.

Allow the inner loadvm loop to quit and signal whether the parent
should.

loadvm_handlers is made static since it's lifetime is greater
than the outer qemu_loadvm_state.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 savevm.c | 136 +++++++++++++++++++++++++++++++++++++++------------------------
 1 file changed, 84 insertions(+), 52 deletions(-)

diff --git a/savevm.c b/savevm.c
index 2c0d61a..7236232 100644
--- a/savevm.c
+++ b/savevm.c
@@ -915,6 +915,26 @@ static SaveStateEntry *find_se(const char *idstr, int instance_id)
     return NULL;
 }
 
+/* These are ORable flags */
+const int LOADVM_EXITCODE_QUITLOOP     =  1;
+const int LOADVM_EXITCODE_QUITPARENT   =  2;
+const int LOADVM_EXITCODE_KEEPHANDLERS =  4;
+
+typedef struct LoadStateEntry {
+    QLIST_ENTRY(LoadStateEntry) entry;
+    SaveStateEntry *se;
+    int section_id;
+    int version_id;
+} LoadStateEntry;
+
+typedef QLIST_HEAD(, LoadStateEntry) LoadStateEntry_Head;
+
+static LoadStateEntry_Head loadvm_handlers =
+ QLIST_HEAD_INITIALIZER(loadvm_handlers);
+
+static int qemu_loadvm_state_main(QEMUFile *f,
+                                  LoadStateEntry_Head *loadvm_handlers);
+
 static int loadvm_process_command_simple_lencheck(const char *name,
                                                   unsigned int actual,
                                                   unsigned int expected)
@@ -931,8 +951,11 @@ static int loadvm_process_command_simple_lencheck(const char *name,
 /*
  * Process an incoming 'QEMU_VM_COMMAND'
  * negative return on error (will issue error message)
+ * 0   just a normal return
+ * 1   All good, but exit the loop
  */
-static int loadvm_process_command(QEMUFile *f)
+static int loadvm_process_command(QEMUFile *f,
+                                  LoadStateEntry_Head *loadvm_handlers)
 {
     MigrationIncomingState *mis = migration_incoming_get_current();
     uint16_t com;
@@ -982,39 +1005,13 @@ static int loadvm_process_command(QEMUFile *f)
     return 0;
 }
 
-typedef struct LoadStateEntry {
-    QLIST_ENTRY(LoadStateEntry) entry;
-    SaveStateEntry *se;
-    int section_id;
-    int version_id;
-} LoadStateEntry;
-
-int qemu_loadvm_state(QEMUFile *f)
+static int qemu_loadvm_state_main(QEMUFile *f,
+                                  LoadStateEntry_Head *loadvm_handlers)
 {
-    QLIST_HEAD(, LoadStateEntry) loadvm_handlers =
-        QLIST_HEAD_INITIALIZER(loadvm_handlers);
-    LoadStateEntry *le, *new_le;
+    LoadStateEntry *le;
     uint8_t section_type;
-    unsigned int v;
     int ret;
-
-    if (qemu_savevm_state_blocked(NULL)) {
-        return -EINVAL;
-    }
-
-    v = qemu_get_be32(f);
-    if (v != QEMU_VM_FILE_MAGIC) {
-        return -EINVAL;
-    }
-
-    v = qemu_get_be32(f);
-    if (v == QEMU_VM_FILE_VERSION_COMPAT) {
-        error_report("SaveVM v2 format is obsolete and don't work anymore");
-        return -ENOTSUP;
-    }
-    if (v != QEMU_VM_FILE_VERSION) {
-        return -ENOTSUP;
-    }
+    int exitcode = 0;
 
     while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
         uint32_t instance_id, version_id, section_id;
@@ -1043,16 +1040,14 @@ int qemu_loadvm_state(QEMUFile *f)
             if (se == NULL) {
                 error_report("Unknown savevm section or instance '%s' %d",
                              idstr, instance_id);
-                ret = -EINVAL;
-                goto out;
+                return -EINVAL;
             }
 
             /* Validate version */
             if (version_id > se->version_id) {
                 error_report("savevm: unsupported version %d for '%s' v%d",
                         version_id, idstr, se->version_id);
-                ret = -EINVAL;
-                goto out;
+                return -EINVAL;
             }
 
             /* Add entry */
@@ -1061,14 +1056,14 @@ int qemu_loadvm_state(QEMUFile *f)
             le->se = se;
             le->section_id = section_id;
             le->version_id = version_id;
-            QLIST_INSERT_HEAD(&loadvm_handlers, le, entry);
+            QLIST_INSERT_HEAD(loadvm_handlers, le, entry);
 
             ret = vmstate_load(f, le->se, le->version_id);
             if (ret < 0) {
                 error_report("qemu: error while loading state for"
                              "instance 0x%x of device '%s'",
                              instance_id, idstr);
-                goto out;
+                return ret;
             }
             break;
         case QEMU_VM_SECTION_PART:
@@ -1076,47 +1071,84 @@ int qemu_loadvm_state(QEMUFile *f)
             section_id = qemu_get_be32(f);
 
             DPRINTF("QEMU_VM_SECTION_PART/END entry for id=%d", section_id);
-            QLIST_FOREACH(le, &loadvm_handlers, entry) {
+            QLIST_FOREACH(le, loadvm_handlers, entry) {
                 if (le->section_id == section_id) {
                     break;
                 }
             }
             if (le == NULL) {
                 error_report("Unknown savevm section %d", section_id);
-                ret = -EINVAL;
-                goto out;
+                return -EINVAL;
             }
 
             ret = vmstate_load(f, le->se, le->version_id);
             if (ret < 0) {
                 error_report("qemu: error while loading state section"
                              " id %d (%s)", section_id, le->se->idstr);
-                goto out;
+                return ret;
             }
             DPRINTF("QEMU_VM_SECTION_PART/END done for id=%d", section_id);
             break;
         case QEMU_VM_COMMAND:
-            ret = loadvm_process_command(f);
-            if (ret < 0) {
-                goto out;
+            ret = loadvm_process_command(f, loadvm_handlers);
+            DPRINTF("%s QEMU_VM_COMMAND ret: %d", __func__, ret);
+            if ((ret < 0) || (ret & LOADVM_EXITCODE_QUITLOOP)) {
+                return ret;
             }
+            exitcode |= ret; /* Lets us pass flags up to the parent */
             break;
         default:
             error_report("Unknown savevm section type %d", section_type);
-            ret = -EINVAL;
-            goto out;
+            return -EINVAL;
         }
     }
     DPRINTF("qemu_loadvm_state loop: exited loop");
 
-    cpu_synchronize_all_post_init();
+    if (exitcode & LOADVM_EXITCODE_QUITPARENT) {
+        DPRINTF("loadvm_handlers_state_main: End of loop with QUITPARENT");
+        exitcode &= ~LOADVM_EXITCODE_QUITPARENT;
+        exitcode &= LOADVM_EXITCODE_QUITLOOP;
+    }
+
+    return exitcode;
+}
+
+int qemu_loadvm_state(QEMUFile *f)
+{
+    LoadStateEntry *le, *new_le;
+    unsigned int v;
+    int ret;
+
+    if (qemu_savevm_state_blocked(NULL)) {
+        return -EINVAL;
+    }
+
+    v = qemu_get_be32(f);
+    if (v != QEMU_VM_FILE_MAGIC) {
+        return -EINVAL;
+    }
 
-    ret = 0;
+    v = qemu_get_be32(f);
+    if (v == QEMU_VM_FILE_VERSION_COMPAT) {
+        error_report("SaveVM v2 format is obsolete and don't work anymore");
+        return -ENOTSUP;
+    }
+    if (v != QEMU_VM_FILE_VERSION) {
+        return -ENOTSUP;
+    }
+
+    QLIST_INIT(&loadvm_handlers);
+    ret = qemu_loadvm_state_main(f, &loadvm_handlers);
 
-out:
-    QLIST_FOREACH_SAFE(le, &loadvm_handlers, entry, new_le) {
-        QLIST_REMOVE(le, entry);
-        g_free(le);
+    if (ret == 0) {
+        cpu_synchronize_all_post_init();
+    }
+
+    if ((ret < 0) || !(ret & LOADVM_EXITCODE_KEEPHANDLERS)) {
+        QLIST_FOREACH_SAFE(le, &loadvm_handlers, entry, new_le) {
+            QLIST_REMOVE(le, entry);
+            g_free(le);
+        }
     }
 
     if (ret == 0) {
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 14/43] Add migration-capability boolean for postcopy-ram.
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (12 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 13/43] Rework loadvm path for subloops Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 16:47   ` Eric Blake
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 15/43] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages Dr. David Alan Gilbert (git)
                   ` (29 subsequent siblings)
  43 siblings, 1 reply; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h | 1 +
 migration.c                   | 9 +++++++++
 qapi-schema.json              | 6 +++++-
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 984d96c..ad02602 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -174,6 +174,7 @@ void migrate_add_blocker(Error *reason);
  */
 void migrate_del_blocker(Error *reason);
 
+bool migrate_postcopy_ram(void);
 bool migrate_rdma_pin_all(void);
 bool migrate_zero_blocks(void);
 
diff --git a/migration.c b/migration.c
index c54d59e..bd7071a 100644
--- a/migration.c
+++ b/migration.c
@@ -632,6 +632,15 @@ bool migrate_rdma_pin_all(void)
     return s->enabled_capabilities[MIGRATION_CAPABILITY_RDMA_PIN_ALL];
 }
 
+bool migrate_postcopy_ram(void)
+{
+    MigrationState *s;
+
+    s = migrate_get_current();
+
+    return s->enabled_capabilities[MIGRATION_CAPABILITY_X_POSTCOPY_RAM];
+}
+
 bool migrate_auto_converge(void)
 {
     MigrationState *s;
diff --git a/qapi-schema.json b/qapi-schema.json
index 341f417..77e57bd 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -491,10 +491,14 @@
 # @auto-converge: If enabled, QEMU will automatically throttle down the guest
 #          to speed up convergence of RAM migration. (since 1.6)
 #
+# @x-postcopy-ram: Start executing on the migration target before all of RAM has been
+#          migrated, pulling the remaining pages along as needed. NOTE: If the
+#          migration fails during postcopy the VM will fail.  (since 2.2)
+#
 # Since: 1.2
 ##
 { 'enum': 'MigrationCapability',
-  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks'] }
+  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks', 'x-postcopy-ram'] }
 
 ##
 # @MigrationCapabilityStatus
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 15/43] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages.
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (13 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 14/43] Add migration-capability boolean for postcopy-ram Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 16/43] QEMU_VM_CMD_PACKAGED: Send a packaged chunk of migration stream Dr. David Alan Gilbert (git)
                   ` (28 subsequent siblings)
  43 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add state variable showing current incoming postcopy state.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |   8 ++
 include/sysemu/sysemu.h       |  23 +++
 savevm.c                      | 324 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 355 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index ad02602..e3f4494 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -62,6 +62,14 @@ typedef struct MigrationState MigrationState;
 struct MigrationIncomingState {
     QEMUFile *file;
 
+    volatile enum {
+        POSTCOPY_RAM_INCOMING_NONE = 0,  /* Initial state - no postcopy */
+        POSTCOPY_RAM_INCOMING_ADVISE,
+        POSTCOPY_RAM_INCOMING_LISTENING,
+        POSTCOPY_RAM_INCOMING_RUNNING,
+        POSTCOPY_RAM_INCOMING_END
+    } postcopy_ram_state;
+
     QEMUFile *return_path;
     QemuMutex      rp_mutex;    /* We send replies from multiple threads */
 };
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index b25e938..0641cc2 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -87,6 +87,16 @@ enum qemu_vm_cmd {
     QEMU_VM_CMD_OPENRP,        /* Tell the dest to open the Return path */
     QEMU_VM_CMD_REQACK,        /* Request an ACK on the RP */
 
+    QEMU_VM_CMD_POSTCOPY_RAM_ADVISE = 20,  /* Prior to any page transfers, just
+                                              warn we might want to do PC */
+    QEMU_VM_CMD_POSTCOPY_RAM_DISCARD,      /* A list of pages to discard that
+                                              were previously sent during
+                                              precopy but are dirty. */
+    QEMU_VM_CMD_POSTCOPY_RAM_LISTEN,       /* Start listening for incoming
+                                              pages as it's running. */
+    QEMU_VM_CMD_POSTCOPY_RAM_RUN,          /* Start execution */
+    QEMU_VM_CMD_POSTCOPY_RAM_END,          /* Postcopy is finished. */
+
     QEMU_VM_CMD_AFTERLASTVALID
 };
 
@@ -97,6 +107,19 @@ int qemu_savevm_state_iterate(QEMUFile *f);
 void qemu_savevm_state_complete(QEMUFile *f);
 void qemu_savevm_state_cancel(void);
 uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size);
+void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
+                              uint16_t len, uint8_t *data);
+void qemu_savevm_send_reqack(QEMUFile *f, uint32_t value);
+void qemu_savevm_send_openrp(QEMUFile *f);
+void qemu_savevm_send_postcopy_ram_advise(QEMUFile *f);
+void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
+                                           uint16_t len, uint8_t offset,
+                                           uint64_t *pagelist);
+
+void qemu_savevm_send_postcopy_ram_listen(QEMUFile *f);
+void qemu_savevm_send_postcopy_ram_run(QEMUFile *f);
+void qemu_savevm_send_postcopy_ram_end(QEMUFile *f, uint8_t status);
+
 int qemu_loadvm_state(QEMUFile *f);
 
 /* SLIRP */
diff --git a/savevm.c b/savevm.c
index 7236232..13d975d 100644
--- a/savevm.c
+++ b/savevm.c
@@ -33,12 +33,14 @@
 #include "qemu/timer.h"
 #include "audio/audio.h"
 #include "migration/migration.h"
+#include "migration/postcopy-ram.h"
 #include "qemu/sockets.h"
 #include "qemu/queue.h"
 #include "sysemu/cpus.h"
 #include "exec/memory.h"
 #include "qmp-commands.h"
 #include "trace.h"
+#include "qemu/bitops.h"
 #include "qemu/iov.h"
 #include "block/snapshot.h"
 #include "block/qapi.h"
@@ -624,6 +626,83 @@ void qemu_savevm_send_openrp(QEMUFile *f)
 {
     qemu_savevm_command_send(f, QEMU_VM_CMD_OPENRP, 0, NULL);
 }
+
+/* Send prior to any RAM transfer */
+void qemu_savevm_send_postcopy_ram_advise(QEMUFile *f)
+{
+    DPRINTF("send postcopy-ram-advise");
+    qemu_savevm_command_send(f, QEMU_VM_CMD_POSTCOPY_RAM_ADVISE, 0, NULL);
+}
+
+/* Prior to running, to cause pages that have been dirtied after precopy
+ * started to be discarded on the destination.
+ * CMD_POSTCOPY_RAM_DISCARD consist of:
+ *  2 byte header (filled in by qemu_savevm_send_postcopy_ram_discard)
+ *      byte   version (0)
+ *      byte   offset into the 1st data word containing 1st page of RAMBlock
+ *      byte   Length of name field
+ *  n x byte   RAM block name (NOT 0 terminated)
+ *  n x
+ *      be64   Page addresses for start of an invalidation range
+ *      be64   mask of 64 pages, '1' to discard'
+ *
+ *  Hopefully this is pretty sparse so we don't get too many entries,
+ *  and using the mask should deal with most pagesize differences
+ *  just ending up as a single full mask
+ *
+ *  The mask is always 64bits irrespective of the long size
+ *
+ *  Note the destination is free to discard *more* than we've asked
+ *  (e.g. rounding up to some convenient page size)
+ *
+ *  name:  RAMBlock name that these entries are part of
+ *  len: Number of page entries
+ *  pagelist: one 8byte header word (empty) then len*(start,mask) pairs
+ *            The caller must have already put these into be64 format
+ */
+void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
+                                           uint16_t len, uint8_t offset,
+                                           uint64_t *pagelist)
+{
+    uint8_t *buf;
+    uint16_t tmplen;
+
+    DPRINTF("send postcopy-ram-discard");
+    buf = g_malloc0(len*16 + strlen(name) + 3);
+    buf[0] = 0; /* Version */
+    buf[1] = offset;
+    assert(strlen(name) < 256);
+    buf[2] = strlen(name);
+    memcpy(buf+3, name, strlen(name));
+    tmplen = 3+strlen(name);
+    memcpy(buf + tmplen, pagelist, len*16);
+
+    qemu_savevm_command_send(f, QEMU_VM_CMD_POSTCOPY_RAM_DISCARD,
+                             tmplen + len*16, buf);
+    g_free(buf);
+}
+
+/* Get the destination into a state where it can receive page data. */
+void qemu_savevm_send_postcopy_ram_listen(QEMUFile *f)
+{
+    DPRINTF("send postcopy-ram-listen");
+    qemu_savevm_command_send(f, QEMU_VM_CMD_POSTCOPY_RAM_LISTEN, 0, NULL);
+}
+
+/* Kick the destination into running */
+void qemu_savevm_send_postcopy_ram_run(QEMUFile *f)
+{
+    DPRINTF("send postcopy-ram-run");
+    qemu_savevm_command_send(f, QEMU_VM_CMD_POSTCOPY_RAM_RUN, 0, NULL);
+}
+
+/* End of postcopy - with a status byte; 0 is good, anything else is a fail */
+void qemu_savevm_send_postcopy_ram_end(QEMUFile *f, uint8_t status)
+{
+    DPRINTF("send postcopy-ram-end");
+    qemu_savevm_command_send(f, QEMU_VM_CMD_POSTCOPY_RAM_END, 1, &status);
+}
+
 bool qemu_savevm_state_blocked(Error **errp)
 {
     SaveStateEntry *se;
@@ -935,6 +1014,220 @@ static LoadStateEntry_Head loadvm_handlers =
 static int qemu_loadvm_state_main(QEMUFile *f,
                                   LoadStateEntry_Head *loadvm_handlers);
 
+/* ------ incoming postcopy-ram messages ------ */
+/* 'advise' arrives before any RAM transfers just to tell us that a postcopy
+ * *might* happen - it might be skipped if precopy transferred everything
+ * quickly.
+ */
+static int loadvm_postcopy_ram_handle_advise(MigrationIncomingState *mis)
+{
+    DPRINTF("%s", __func__);
+    if (mis->postcopy_ram_state != POSTCOPY_RAM_INCOMING_NONE) {
+        error_report("CMD_POSTCOPY_RAM_ADVISE in wrong postcopy state (%d)",
+                     mis->postcopy_ram_state);
+        return -1;
+    }
+
+    /* Check this host can do it */
+    if (postcopy_ram_hosttest()) {
+        return -1;
+    }
+
+    if (ram_postcopy_incoming_init(mis)) {
+        return -1;
+    }
+
+    mis->postcopy_ram_state = POSTCOPY_RAM_INCOMING_ADVISE;
+
+    /*
+     * Postcopy will be sending lots of small messages along the return path
+     * that it needs quick answers to.
+     */
+    socket_set_nodelay(qemu_get_fd(mis->return_path));
+
+    return 0;
+}
+
+/* After postcopy we will be told to throw some pages away since they're
+ * dirty and will have to be demand fetched.  Must happen before CPU is
+ * started.
+ * There can be 0..many of these messages, each encoding multiple pages.
+ * Bits set in the message represent a page in the source VMs bitmap, but
+ * since the guest/target page sizes can be different on s/d then we have
+ * to convert.
+ */
+static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
+                                              uint16_t len)
+{
+    int tmp;
+    const int source_target_page_bits = 12; /* TODO */
+    unsigned int first_bit_offset;
+    char ramid[256];
+
+    DPRINTF("%s", __func__);
+
+    if (mis->postcopy_ram_state != POSTCOPY_RAM_INCOMING_ADVISE) {
+        error_report("CMD_POSTCOPY_RAM_DISCARD in wrong postcopy state (%d)",
+                     mis->postcopy_ram_state);
+        return -1;
+    }
+    /* We're expecting a
+     *    3 byte header,
+     *    a RAM ID string
+     *    then at least 1 2x8 byte chunks
+    */
+    if (len < 19) {
+        error_report("CMD_POSTCOPY_RAM_DISCARD invalid length (%d)", len);
+        return -1;
+    }
+
+    tmp = qemu_get_byte(mis->file);
+    if (tmp != 0) {
+        error_report("CMD_POSTCOPY_RAM_DISCARD invalid version (%d)", tmp);
+        return -1;
+    }
+    first_bit_offset = qemu_get_byte(mis->file);
+
+    if (qemu_get_counted_string(mis->file, (uint8_t *)ramid)) {
+        error_report("CMD_POSTCOPY_RAM_DISCARD Failed to read RAMBlock ID");
+        return -1;
+    }
+
+    len -= 3+strlen(ramid);
+    if (len & 15) {
+        error_report("CMD_POSTCOPY_RAM_DISCARD invalid length (%d)", len);
+        return -1;
+    }
+    while (len) {
+        uint64_t startaddr, mask;
+        /*
+         * We now have pairs of address, mask
+         *   The address is in multiples of 64bit chunks in the source bitmask
+         *     ie multiply by 64 and then source-target-page-size to get bytes
+         *     '0' represents the chunk in which the RAMBlock starts for the
+         *     source and 'first_bit_offset' (see above) represents which bit in
+         *     that first word corresponds to the first page of the RAMBlock
+         *   The mask is 64 bits of bitmask starting at that offset into the
+         *   RAMBlock.
+         *
+         *   For example:
+         *      an address of 1 with a first_bit_offset of 12 indicates
+         *      page 1*64 - 12 = page 52 for bit 0 of the mask
+         *      Source guarantees that for address 0, bits <first_bit_offset
+         *      shall be 0
+         */
+        startaddr = qemu_get_be64(mis->file) * 64;
+        mask = qemu_get_be64(mis->file);
+
+        len -= 16;
+
+        while (mask) {
+            /* mask= .....?10...0 */
+            /*             ^fs    */
+            int firstset = ctz64(mask);
+
+            /* tmp64=.....?11...1 */
+            /*             ^fs    */
+            uint64_t tmp64 = mask | ((((uint64_t)1)<<firstset)-1);
+
+            /* mask= .?01..10...0 */
+            /*         ^fz ^fs    */
+            int firstzero = cto64(tmp64);
+
+            if ((startaddr == 0) && (firstset < first_bit_offset)) {
+                error_report("CMD_POSTCOPY_RAM_DISCARD bad data; bit set"
+                               " prior to block; block=%s offset=%d"
+                               " firstset=%d\n", ramid, first_bit_offset,
+                               firstzero);
+                return -1;
+            }
+            /*
+             * we know there must be at least 1 bit set due to the loop entry
+             * If there is no 0 firstzero will be 64
+             */
+            /* TODO - ram_discard_range gets added in a later patch
+            int ret = ram_discard_range(mis, ramid, source_target_page_bits,
+                                startaddr + firstset - first_bit_offset,
+                                startaddr + (firstzero - 1) - first_bit_offset);
+             */
+            ret = -1; /* TODO */
+            if (ret) {
+                return ret;
+            }
+
+            /* mask= .?0000000000 */
+            /*         ^fz ^fs    */
+            if (firstzero != 64) {
+                mask &= (((uint64_t)-1) << firstzero);
+            } else {
+                mask = 0;
+            }
+        }
+    }
+    DPRINTF("%s finished", __func__);
+
+    return 0;
+}
+
+/* After this message we must be able to immediately receive page data */
+static int loadvm_postcopy_ram_handle_listen(MigrationIncomingState *mis)
+{
+    DPRINTF("%s", __func__);
+    if (mis->postcopy_ram_state != POSTCOPY_RAM_INCOMING_ADVISE) {
+        error_report("CMD_POSTCOPY_RAM_LISTEN in wrong postcopy state (%d)",
+                     mis->postcopy_ram_state);
+        return -1;
+    }
+
+    mis->postcopy_ram_state = POSTCOPY_RAM_INCOMING_LISTENING;
+
+    /*
+     * Sensitise RAM - can now generate requests for blocks that don't exist
+     * However, at this point the CPU shouldn't be running, and the IO
+     * shouldn't be doing anything yet so don't actually expect requests
+     */
+    if (postcopy_ram_enable_notify(mis)) {
+        return -1;
+    }
+
+    /* TODO start up the postcopy listening thread */
+    return 0;
+}
+
+/* After all discards we can start running and asking for pages */
+static int loadvm_postcopy_ram_handle_run(MigrationIncomingState *mis)
+{
+    DPRINTF("%s", __func__);
+    if (mis->postcopy_ram_state != POSTCOPY_RAM_INCOMING_LISTENING) {
+        error_report("CMD_POSTCOPY_RAM_RUN in wrong postcopy state (%d)",
+                     mis->postcopy_ram_state);
+        return -1;
+    }
+
+    mis->postcopy_ram_state = POSTCOPY_RAM_INCOMING_RUNNING;
+    if (autostart) {
+        /* Hold onto your hats, starting the CPU */
+        vm_start();
+    } else {
+        /* leave it paused and let management decide when to start the CPU */
+        runstate_set(RUN_STATE_PAUSED);
+    }
+
+    return 0;
+}
+
+/* The end - with a byte from the source which can tell us to fail. */
+static int loadvm_postcopy_ram_handle_end(MigrationIncomingState *mis)
+{
+    DPRINTF("%s", __func__);
+    if (mis->postcopy_ram_state == POSTCOPY_RAM_INCOMING_NONE) {
+        error_report("CMD_POSTCOPY_RAM_END in wrong postcopy state (%d)",
+                     mis->postcopy_ram_state);
+        return -1;
+    }
+    return -1; /* TODO - expecting 1 byte good/fail */
+}
+
 static int loadvm_process_command_simple_lencheck(const char *name,
                                                   unsigned int actual,
                                                   unsigned int expected)
@@ -997,6 +1290,37 @@ static int loadvm_process_command(QEMUFile *f,
         migrate_send_rp_ack(mis, tmp32);
         break;
 
+    case QEMU_VM_CMD_POSTCOPY_RAM_ADVISE:
+        if (loadvm_process_command_simple_lencheck("CMD_POSTCOPY_RAM_ADVISE",
+                                                   len, 0)) {
+            return -1;
+        }
+        return loadvm_postcopy_ram_handle_advise(mis);
+
+    case QEMU_VM_CMD_POSTCOPY_RAM_DISCARD:
+        return loadvm_postcopy_ram_handle_discard(mis, len);
+
+    case QEMU_VM_CMD_POSTCOPY_RAM_LISTEN:
+        if (loadvm_process_command_simple_lencheck("CMD_POSTCOPY_RAM_LISTEN",
+                                                   len, 0)) {
+            return -1;
+        }
+        return loadvm_postcopy_ram_handle_listen(mis);
+
+    case QEMU_VM_CMD_POSTCOPY_RAM_RUN:
+        if (loadvm_process_command_simple_lencheck("CMD_POSTCOPY_RAM_RUN",
+                                                   len, 0)) {
+            return -1;
+        }
+        return loadvm_postcopy_ram_handle_run(mis);
+
+    case QEMU_VM_CMD_POSTCOPY_RAM_END:
+        if (loadvm_process_command_simple_lencheck("CMD_POSTCOPY_RAM_END",
+                                                   len, 1)) {
+            return -1;
+        }
+        return loadvm_postcopy_ram_handle_end(mis);
+
     default:
         error_report("VM_COMMAND 0x%x unknown (len 0x%x)", com, len);
         return -1;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 16/43] QEMU_VM_CMD_PACKAGED: Send a packaged chunk of migration stream
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (14 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 15/43] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 17/43] migrate_init: Call from savevm Dr. David Alan Gilbert (git)
                   ` (27 subsequent siblings)
  43 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

QEMU_VM_CMD_PACKAGED is a migration command that allows a chunk
of migration stream to be sent in one go, and be received by
a separate instance of the loadvm loop while not interacting
with the migration stream.

This is used by postcopy to load device state (from the package)
while loading memory pages from the main stream.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/sysemu/sysemu.h |  4 +++
 savevm.c                | 79 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 83 insertions(+)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 0641cc2..abf0d63 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -86,6 +86,7 @@ enum qemu_vm_cmd {
     QEMU_VM_CMD_INVALID = 0,   /* Must be 0 */
     QEMU_VM_CMD_OPENRP,        /* Tell the dest to open the Return path */
     QEMU_VM_CMD_REQACK,        /* Request an ACK on the RP */
+    QEMU_VM_CMD_PACKAGED,      /* Send a wrapped stream within this stream */
 
     QEMU_VM_CMD_POSTCOPY_RAM_ADVISE = 20,  /* Prior to any page transfers, just
                                               warn we might want to do PC */
@@ -100,6 +101,8 @@ enum qemu_vm_cmd {
     QEMU_VM_CMD_AFTERLASTVALID
 };
 
+#define MAX_VM_CMD_PACKAGED_SIZE (1ul << 24)
+
 bool qemu_savevm_state_blocked(Error **errp);
 void qemu_savevm_state_begin(QEMUFile *f,
                              const MigrationParams *params);
@@ -111,6 +114,7 @@ void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
                               uint16_t len, uint8_t *data);
 void qemu_savevm_send_reqack(QEMUFile *f, uint32_t value);
 void qemu_savevm_send_openrp(QEMUFile *f);
+void qemu_savevm_send_packaged(QEMUFile *f, const QEMUSizedBuffer *qsb);
 void qemu_savevm_send_postcopy_ram_advise(QEMUFile *f);
 void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
                                            uint16_t len, uint8_t offset,
diff --git a/savevm.c b/savevm.c
index 13d975d..171676d 100644
--- a/savevm.c
+++ b/savevm.c
@@ -627,6 +627,38 @@ void qemu_savevm_send_openrp(QEMUFile *f)
     qemu_savevm_command_send(f, QEMU_VM_CMD_OPENRP, 0, NULL);
 }
 
+/* We have a buffer of data to send; we don't want that all to be loaded
+ * by the command itself, so the command contains just the length of the
+ * extra buffer that we then send straight after it.
+ * TODO: Must be a better way to organise that
+ */
+void qemu_savevm_send_packaged(QEMUFile *f, const QEMUSizedBuffer *qsb)
+{
+    size_t cur_iov;
+    size_t len = qsb_get_length(qsb);
+    uint32_t tmp;
+
+    tmp = cpu_to_be32(len);
+
+    DPRINTF("send_packaged");
+    qemu_savevm_command_send(f, QEMU_VM_CMD_PACKAGED, 4, (uint8_t *)&tmp);
+
+    /* all the data follows (concatinating the iov's) */
+    for (cur_iov = 0; cur_iov < qsb->n_iov; cur_iov++) {
+        /* The iov entries are partially filled */
+        size_t towrite = (qsb->iov[cur_iov].iov_len > len) ?
+                              len :
+                              qsb->iov[cur_iov].iov_len;
+        len -= towrite;
+
+        if (!towrite) {
+            break;
+        }
+
+        qemu_put_buffer(f, qsb->iov[cur_iov].iov_base, towrite);
+    }
+}
+
 /* Send prior to any RAM transfer */
 void qemu_savevm_send_postcopy_ram_advise(QEMUFile *f)
 {
@@ -1241,6 +1273,45 @@ static int loadvm_process_command_simple_lencheck(const char *name,
     return 0;
 }
 
+/* Immediately following this command is a blob of data containing an embedded
+ * chunk of migration stream; read it and load it.
+ */
+static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis,
+                                      uint32_t length,
+                                      LoadStateEntry_Head *loadvm_handlers)
+{
+    int ret;
+    uint8_t *buffer;
+    QEMUSizedBuffer *qsb;
+
+    DPRINTF("loadvm_handle_cmd_packaged: length=%u", length);
+
+    if (length > MAX_VM_CMD_PACKAGED_SIZE) {
+        error_report("Unreasonably large packaged state: %u", length);
+        return -1;
+    }
+    buffer = g_malloc0(length);
+    ret = qemu_get_buffer(mis->file, buffer, (int)length);
+    if (ret != length) {
+        g_free(buffer);
+        error_report("CMD_PACKAGED: Buffer receive fail ret=%d length=%d\n",
+                ret, length);
+        return (ret < 0) ? ret : -EAGAIN;
+    }
+    DPRINTF("%s: Received %d package, going to load", __func__, ret);
+
+    /* Setup a dummy QEMUFile that actually reads from the buffer */
+    qsb = qsb_create(buffer, length);
+    g_free(buffer); /* Because qsb_create copies */
+    QEMUFile *packf = qemu_bufopen("r", qsb);
+
+    ret = qemu_loadvm_state_main(packf, loadvm_handlers);
+    DPRINTF("%s: qemu_loadvm_state_main returned %d", __func__, ret);
+    qemu_fclose(packf); /* also frees the qsb */
+
+    return ret;
+}
+
 /*
  * Process an incoming 'QEMU_VM_COMMAND'
  * negative return on error (will issue error message)
@@ -1290,6 +1361,14 @@ static int loadvm_process_command(QEMUFile *f,
         migrate_send_rp_ack(mis, tmp32);
         break;
 
+    case QEMU_VM_CMD_PACKAGED:
+        if (loadvm_process_command_simple_lencheck("CMD_POSTCOPY_RAM_ADVISE",
+            len, 4)) {
+            return -1;
+         }
+        tmp32 = qemu_get_be32(f);
+        return loadvm_handle_cmd_packaged(mis, tmp32, loadvm_handlers);
+
     case QEMU_VM_CMD_POSTCOPY_RAM_ADVISE:
         if (loadvm_process_command_simple_lencheck("CMD_POSTCOPY_RAM_ADVISE",
                                                    len, 0)) {
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 17/43] migrate_init: Call from savevm
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (15 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 16/43] QEMU_VM_CMD_PACKAGED: Send a packaged chunk of migration stream Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 18/43] Allow savevm handlers to state whether they could go into postcopy Dr. David Alan Gilbert (git)
                   ` (26 subsequent siblings)
  43 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Suspend to file is very much like a migrate, and it makes life
easier if we have the Migration state available, so initialise it
in the savevm.c code for suspending.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h | 1 +
 include/qemu/typedefs.h       | 1 +
 migration.c                   | 2 +-
 savevm.c                      | 2 ++
 4 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index e3f4494..bbb616e 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -141,6 +141,7 @@ int migrate_fd_close(MigrationState *s);
 
 void add_migration_state_change_notifier(Notifier *notify);
 void remove_migration_state_change_notifier(Notifier *notify);
+MigrationState *migrate_init(const MigrationParams *params);
 bool migration_in_setup(MigrationState *);
 bool migration_has_finished(MigrationState *);
 bool migration_has_failed(MigrationState *);
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 0f79b5c..8539de6 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -16,6 +16,7 @@ struct Monitor;
 typedef struct Monitor Monitor;
 typedef struct MigrationIncomingState MigrationIncomingState;
 typedef struct MigrationParams MigrationParams;
+typedef struct MigrationState MigrationState;
 
 typedef struct Property Property;
 typedef struct PropertyInfo PropertyInfo;
diff --git a/migration.c b/migration.c
index bd7071a..500de77 100644
--- a/migration.c
+++ b/migration.c
@@ -462,7 +462,7 @@ bool migration_has_failed(MigrationState *s)
             s->state == MIG_STATE_ERROR);
 }
 
-static MigrationState *migrate_init(const MigrationParams *params)
+MigrationState *migrate_init(const MigrationParams *params)
 {
     MigrationState *s = migrate_get_current();
     int64_t bandwidth_limit = s->bandwidth_limit;
diff --git a/savevm.c b/savevm.c
index 171676d..809bd04 100644
--- a/savevm.c
+++ b/savevm.c
@@ -941,6 +941,8 @@ static int qemu_savevm_state(QEMUFile *f)
         .blk = 0,
         .shared = 0
     };
+    MigrationState *ms = migrate_init(&params);
+    ms->file = f;
 
     if (qemu_savevm_state_blocked(NULL)) {
         return -EINVAL;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 18/43] Allow savevm handlers to state whether they could go into postcopy
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (16 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 17/43] migrate_init: Call from savevm Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 19/43] postcopy: OS support test Dr. David Alan Gilbert (git)
                   ` (25 subsequent siblings)
  43 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Use that to split the qemu_savevm_state_pending counts into postcopiable
and non-postcopiable amounts

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c                 |  7 +++++++
 include/migration/vmstate.h |  2 +-
 include/sysemu/sysemu.h     |  4 +++-
 migration.c                 |  9 ++++++++-
 savevm.c                    | 23 +++++++++++++++++++----
 5 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index a3c468e..aeeaf37 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -1190,6 +1190,12 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
     return ret;
 }
 
+/* RAM's always up for postcopying */
+static bool ram_can_postcopy(void *opaque)
+{
+    return true;
+}
+
 static SaveVMHandlers savevm_ram_handlers = {
     .save_live_setup = ram_save_setup,
     .save_live_iterate = ram_save_iterate,
@@ -1197,6 +1203,7 @@ static SaveVMHandlers savevm_ram_handlers = {
     .save_live_pending = ram_save_pending,
     .load_state = ram_load,
     .cancel = ram_migration_cancel,
+    .can_postcopy = ram_can_postcopy,
 };
 
 void ram_mig_init(void)
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 9a001bd..4991935 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -54,7 +54,7 @@ typedef struct SaveVMHandlers {
     /* This runs outside the iothread lock!  */
     int (*save_live_setup)(QEMUFile *f, void *opaque);
     uint64_t (*save_live_pending)(QEMUFile *f, void *opaque, uint64_t max_size);
-
+    bool (*can_postcopy)(void *opaque);
     LoadStateHandler *load_state;
 } SaveVMHandlers;
 
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index abf0d63..dc53580 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -109,7 +109,9 @@ void qemu_savevm_state_begin(QEMUFile *f,
 int qemu_savevm_state_iterate(QEMUFile *f);
 void qemu_savevm_state_complete(QEMUFile *f);
 void qemu_savevm_state_cancel(void);
-uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size);
+void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size,
+                               uint64_t *res_non_postcopiable,
+                               uint64_t *res_postcopiable);
 void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
                               uint16_t len, uint8_t *data);
 void qemu_savevm_send_reqack(QEMUFile *f, uint32_t value);
diff --git a/migration.c b/migration.c
index 500de77..ee923e2 100644
--- a/migration.c
+++ b/migration.c
@@ -831,8 +831,15 @@ static void *migration_thread(void *opaque)
         uint64_t pending_size;
 
         if (!qemu_file_rate_limit(s->file)) {
-            pending_size = qemu_savevm_state_pending(s->file, max_size);
+            uint64_t pend_post, pend_nonpost;
+            DPRINTF("iterate\n");
+            qemu_savevm_state_pending(s->file, max_size, &pend_nonpost,
+                                      &pend_post);
+            pending_size = pend_nonpost + pend_post;
             trace_migrate_pending(pending_size, max_size);
+            DPRINTF("pending size %" PRIu64 " max %" PRIu64 " (post=%" PRIu64
+                    " nonpost=%" PRIu64 ")\n",
+                    pending_size, max_size, pend_post, pend_nonpost);
             if (pending_size && pending_size >= max_size) {
                 qemu_savevm_state_iterate(s->file);
             } else {
diff --git a/savevm.c b/savevm.c
index 809bd04..412bf9d 100644
--- a/savevm.c
+++ b/savevm.c
@@ -903,10 +903,18 @@ void qemu_savevm_state_complete(QEMUFile *f)
     qemu_fflush(f);
 }
 
-uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size)
+/* Give an estimate of the amount left to be transferred,
+ * the result is split into the amount for units that can and
+ * for units that can't do postcopy.
+ */
+void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size,
+                               uint64_t *res_non_postcopiable,
+                               uint64_t *res_postcopiable)
 {
     SaveStateEntry *se;
-    uint64_t ret = 0;
+    uint64_t res_nonpc = 0;
+    uint64_t res_pc = 0;
+    uint64_t tmp;
 
     QTAILQ_FOREACH(se, &savevm_handlers, entry) {
         if (!se->ops || !se->ops->save_live_pending) {
@@ -917,9 +925,16 @@ uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size)
                 continue;
             }
         }
-        ret += se->ops->save_live_pending(f, se->opaque, max_size);
+        tmp = se->ops->save_live_pending(f, se->opaque, max_size);
+
+        if (se->ops->can_postcopy(se->opaque)) {
+            res_pc += tmp;
+        } else {
+            res_nonpc += tmp;
+        }
     }
-    return ret;
+    *res_non_postcopiable = res_nonpc;
+    *res_postcopiable = res_pc;
 }
 
 void qemu_savevm_state_cancel(void)
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 19/43] postcopy: OS support test
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (17 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 18/43] Allow savevm handlers to state whether they could go into postcopy Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-12  5:32   ` zhanghailiang
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 20/43] migrate_start_postcopy: Command to trigger transition to postcopy Dr. David Alan Gilbert (git)
                   ` (24 subsequent siblings)
  43 siblings, 1 reply; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Provide a check to see if the OS we're running on has all the bits
needed for postcopy.

Creates postcopy-ram.c which will get most of the other helpers we need.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 Makefile.objs                    |   2 +-
 include/migration/postcopy-ram.h |  19 ++++++
 postcopy-ram.c                   | 129 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 149 insertions(+), 1 deletion(-)
 create mode 100644 include/migration/postcopy-ram.h
 create mode 100644 postcopy-ram.c

diff --git a/Makefile.objs b/Makefile.objs
index 1f76cea..a7ad235 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -55,7 +55,7 @@ common-obj-y += qemu-file.o
 common-obj-$(CONFIG_RDMA) += migration-rdma.o
 common-obj-y += qemu-char.o #aio.o
 common-obj-y += block-migration.o
-common-obj-y += page_cache.o xbzrle.o
+common-obj-y += page_cache.o xbzrle.o postcopy-ram.o
 
 common-obj-$(CONFIG_POSIX) += migration-exec.o migration-unix.o migration-fd.o
 
diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
new file mode 100644
index 0000000..dcd1afa
--- /dev/null
+++ b/include/migration/postcopy-ram.h
@@ -0,0 +1,19 @@
+/*
+ * Postcopy migration for RAM
+ *
+ * Copyright 2013 Red Hat, Inc. and/or its affiliates
+ *
+ * Authors:
+ *  Dave Gilbert  <dgilbert@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+#ifndef QEMU_POSTCOPY_RAM_H
+#define QEMU_POSTCOPY_RAM_H
+
+/* Return 0 if the host supports everything we need to do postcopy-ram */
+int postcopy_ram_hosttest(void);
+
+#endif
diff --git a/postcopy-ram.c b/postcopy-ram.c
new file mode 100644
index 0000000..1f3e6ea
--- /dev/null
+++ b/postcopy-ram.c
@@ -0,0 +1,129 @@
+/*
+ * Postcopy migration for RAM
+ *
+ * Copyright 2013-2014 Red Hat, Inc. and/or its affiliates
+ *
+ * Authors:
+ *  Dave Gilbert  <dgilbert@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+/*
+ * Postcopy is a migration technique where the execution flips from the
+ * source to the destination before all the data has been copied.
+ */
+
+#include <glib.h>
+#include <stdio.h>
+#include <unistd.h>
+
+#include "qemu-common.h"
+#include "migration/migration.h"
+#include "migration/postcopy-ram.h"
+
+//#define DEBUG_POSTCOPY
+
+#ifdef DEBUG_POSTCOPY
+#define DPRINTF(fmt, ...) \
+    do { fprintf(stderr, "postcopy@%" PRId64 " " fmt "\n", \
+                          qemu_clock_get_ms(QEMU_CLOCK_REALTIME), \
+                          ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+    do { } while (0)
+#endif
+
+/* Postcopy needs to detect accesses to pages that haven't yet been copied
+ * across, and efficiently map new pages in, the techniques for doing this
+ * are target OS specific.
+ */
+#if defined(__linux__)
+
+/* On Linux we use:
+ *    madvise MADV_USERFAULT - to mark an area of anonymous memory such
+ *                             that userspace is notifed of accesses to
+ *                             unallocated areas.
+ *    userfaultfd      - opens a socket to receive USERFAULT messages
+ *    remap_anon_pages - to shuffle mapped pages into previously unallocated
+ *                       areas without creating loads of VMAs.
+ */
+
+#include <sys/mman.h>
+#include <sys/types.h>
+
+/* TODO remove once we have libc defs
+ * NOTE: These are x86-64 numbers for Andrea's 3.15.0 world */
+#ifndef MADV_USERFAULT
+#define MADV_USERFAULT   18
+#define MADV_NOUSERFAULT 19
+#endif
+
+#ifndef __NR_remap_anon_pages
+#define __NR_remap_anon_pages 317
+#endif
+
+int postcopy_ram_hosttest(void)
+{
+    /* TODO: Needs guarding with CONFIG_ once we have libc's that have the defs
+     *
+     * Try each syscall we need, but this isn't a testbench,
+     * just enough to see that we have the calls
+     */
+    void *testarea, *testarea2;
+    long pagesize = getpagesize();
+
+    testarea = mmap(NULL, pagesize, PROT_READ | PROT_WRITE, MAP_PRIVATE |
+                                    MAP_ANONYMOUS, -1, 0);
+    if (!testarea) {
+        perror("postcopy_ram_hosttest: Failed to map test area");
+        return -1;
+    }
+    g_assert(((size_t)testarea & (pagesize-1)) == 0);
+
+    if (madvise(testarea, pagesize, MADV_USERFAULT)) {
+        perror("postcopy_ram_hosttest: MADV_USERFAULT not available");
+        munmap(testarea, pagesize);
+        return -1;
+    }
+
+    if (madvise(testarea, pagesize, MADV_NOUSERFAULT)) {
+        perror("postcopy_ram_hosttest: MADV_NOUSERFAULT not available");
+        munmap(testarea, pagesize);
+        return -1;
+    }
+
+    testarea2 = mmap(NULL, pagesize, PROT_READ | PROT_WRITE, MAP_PRIVATE |
+                                     MAP_ANONYMOUS, -1, 0);
+    if (!testarea2) {
+        perror("postcopy_ram_hosttest: Failed to map second test area");
+        return -1;
+    }
+    g_assert(((size_t)testarea2 & (pagesize-1)) == 0);
+    *(char *)testarea = 0; /* Force the map of the new page */
+    if (syscall(__NR_remap_anon_pages, testarea2, testarea, pagesize, 0) !=
+        pagesize) {
+        perror("postcopy_ram_hosttest: remap_anon_pages not available");
+        munmap(testarea, pagesize);
+        munmap(testarea2, pagesize);
+        return -1;
+    }
+
+    munmap(testarea, pagesize);
+    munmap(testarea2, pagesize);
+    return 0;
+}
+
+#else
+/* No target OS support, stubs just fail */
+
+int postcopy_ram_hosttest(void)
+{
+    error_report("postcopy_ram_hosttest: No OS support");
+    return -1;
+}
+
+#endif
+
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 20/43] migrate_start_postcopy: Command to trigger transition to postcopy
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (18 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 19/43] postcopy: OS support test Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 17:01   ` Eric Blake
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 21/43] MIG_STATE_POSTCOPY_ACTIVE: Add new migration state Dr. David Alan Gilbert (git)
                   ` (23 subsequent siblings)
  43 siblings, 1 reply; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Once postcopy is enabled (with migrate_set_capability), the migration
will still start on precopy mode.  To cause a transition into postcopy
the:

  migrate_start_postcopy

command must be issued.  Postcopy will start sometime after this
(when it's next checked in the migration loop).

Issuing the command before migration has started will error,
and issuing after it has finished is ignored.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 hmp-commands.hx               | 15 +++++++++++++++
 hmp.c                         |  7 +++++++
 hmp.h                         |  1 +
 include/migration/migration.h |  3 +++
 migration.c                   | 22 ++++++++++++++++++++++
 qapi-schema.json              |  8 ++++++++
 qmp-commands.hx               | 19 +++++++++++++++++++
 7 files changed, 75 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index d0943b1..53362b3 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -987,6 +987,21 @@ Enable/Disable the usage of a capability @var{capability} for migration.
 ETEXI
 
     {
+        .name       = "migrate_start_postcopy",
+        .args_type  = "",
+        .params     = "",
+        .help       = "Switch migration to postcopy mode",
+        .mhandler.cmd = hmp_migrate_start_postcopy,
+    },
+
+STEXI
+@item migrate_start_postcopy
+@findex migrate_start_postcopy
+Switch in-progress migration to postcopy mode. Ignored after the end of
+migration (or once already in postcopy).
+ETEXI
+
+    {
         .name       = "client_migrate_info",
         .args_type  = "protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?",
         .params     = "protocol hostname port tls-port cert-subject",
diff --git a/hmp.c b/hmp.c
index 4d1838e..e4a6189 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1077,6 +1077,13 @@ void hmp_migrate_set_capability(Monitor *mon, const QDict *qdict)
     }
 }
 
+void hmp_migrate_start_postcopy(Monitor *mon, const QDict *qdict)
+{
+    Error *err = NULL;
+    qmp_migrate_start_postcopy(&err);
+    hmp_handle_error(mon, &err);
+}
+
 void hmp_set_password(Monitor *mon, const QDict *qdict)
 {
     const char *protocol  = qdict_get_str(qdict, "protocol");
diff --git a/hmp.h b/hmp.h
index 4fd3c4a..4d59e6e 100644
--- a/hmp.h
+++ b/hmp.h
@@ -64,6 +64,7 @@ void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_capability(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_cache_size(Monitor *mon, const QDict *qdict);
+void hmp_migrate_start_postcopy(Monitor *mon, const QDict *qdict);
 void hmp_set_password(Monitor *mon, const QDict *qdict);
 void hmp_expire_password(Monitor *mon, const QDict *qdict);
 void hmp_eject(Monitor *mon, const QDict *qdict);
diff --git a/include/migration/migration.h b/include/migration/migration.h
index bbb616e..3c6fdd8 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -101,6 +101,9 @@ struct MigrationState
     int64_t xbzrle_cache_size;
     int64_t setup_time;
     int64_t dirty_sync_count;
+
+    /* Flag set once the migration has been asked to enter postcopy */
+    volatile bool start_postcopy;
 };
 
 void process_incoming_migration(QEMUFile *f);
diff --git a/migration.c b/migration.c
index ee923e2..c473a0f 100644
--- a/migration.c
+++ b/migration.c
@@ -364,6 +364,28 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
     }
 }
 
+void qmp_migrate_start_postcopy(Error **errp)
+{
+    MigrationState *s = migrate_get_current();
+
+    if (!migrate_postcopy_ram()) {
+        error_setg(errp, "Enable postcopy with migration_set_capability before"
+                         " the start of migration");
+        return;
+    }
+
+    if (s->state == MIG_STATE_NONE) {
+        error_setg(errp, "Postcopy must be started after migration has been"
+                         " started");
+        return;
+    }
+    /*
+     * we don't error if migration has finished since that would be racy
+     * with issuing this command.
+     */
+    s->start_postcopy = true;
+}
+
 /* shared migration helpers */
 
 static void migrate_set_state(MigrationState *s, int old_state, int new_state)
diff --git a/qapi-schema.json b/qapi-schema.json
index 77e57bd..d639a78 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -538,6 +538,14 @@
 { 'command': 'query-migrate-capabilities', 'returns':   ['MigrationCapabilityStatus']}
 
 ##
+# @migrate-start-postcopy
+#
+# Switch migration to postcopy mode
+#
+# Since: 2.2
+{ 'command': 'migrate-start-postcopy' }
+
+##
 # @MouseInfo:
 #
 # Information about a mouse device.
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 4be4765..96eb854 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -686,6 +686,25 @@ Example:
 
 EQMP
     {
+        .name       = "migrate-start-postcopy",
+        .args_type  = "",
+        .mhandler.cmd_new = qmp_marshal_input_migrate_start_postcopy,
+    },
+
+SQMP
+migrate-start-postcopy
+----------------------
+
+Switch an in-progress migration to postcopy mode. Ignored after the end of
+migration (or once already in postcopy).
+
+Example:
+-> { "execute": "migrate-start-postcopy" }
+<- { "return": {} }
+
+EQMP
+
+    {
         .name       = "query-migrate-cache-size",
         .args_type  = "",
         .mhandler.cmd_new = qmp_marshal_input_query_migrate_cache_size,
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 21/43] MIG_STATE_POSTCOPY_ACTIVE: Add new migration state
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (19 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 20/43] migrate_start_postcopy: Command to trigger transition to postcopy Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 22/43] qemu_savevm_state_complete: Postcopy changes Dr. David Alan Gilbert (git)
                   ` (22 subsequent siblings)
  43 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

'MIG_STATE_POSTCOPY_ACTIVE' is entered after migrate_start_postcopy

'migration_postcopy_phase' is provided for other sections to know if
they're in postcopy.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  2 ++
 migration.c                   | 74 +++++++++++++++++++++++++++++++++++++++----
 2 files changed, 70 insertions(+), 6 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 3c6fdd8..e7999e6 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -148,6 +148,8 @@ MigrationState *migrate_init(const MigrationParams *params);
 bool migration_in_setup(MigrationState *);
 bool migration_has_finished(MigrationState *);
 bool migration_has_failed(MigrationState *);
+/* True if outgoing migration has entered postcopy phase */
+bool migration_postcopy_phase(MigrationState *);
 MigrationState *migrate_get_current(void);
 
 uint64_t ram_bytes_remaining(void);
diff --git a/migration.c b/migration.c
index c473a0f..e3ea9ae 100644
--- a/migration.c
+++ b/migration.c
@@ -38,13 +38,14 @@
     do { } while (0)
 #endif
 
-enum {
+enum MigrationPhase {
     MIG_STATE_ERROR = -1,
     MIG_STATE_NONE,
     MIG_STATE_SETUP,
     MIG_STATE_CANCELLING,
     MIG_STATE_CANCELLED,
     MIG_STATE_ACTIVE,
+    MIG_STATE_POSTCOPY_ACTIVE,
     MIG_STATE_COMPLETED,
 };
 
@@ -248,6 +249,23 @@ MigrationCapabilityStatusList *qmp_query_migrate_capabilities(Error **errp)
     return head;
 }
 
+/* Return true if we're already in the middle of a migration
+ * (i.e. any of the active or setup states)
+ */
+static bool migration_already_active(MigrationState *ms)
+{
+    switch (ms->state) {
+    case MIG_STATE_ACTIVE:
+    case MIG_STATE_POSTCOPY_ACTIVE:
+    case MIG_STATE_SETUP:
+        return true;
+
+    default:
+        return false;
+
+    }
+}
+
 static void get_xbzrle_cache_stats(MigrationInfo *info)
 {
     if (migrate_use_xbzrle()) {
@@ -311,6 +329,40 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 
         get_xbzrle_cache_stats(info);
         break;
+    case MIG_STATE_POSTCOPY_ACTIVE:
+        /* Mostly the same as active; TODO add some postcopy stats */
+        info->has_status = true;
+        info->status = g_strdup("postcopy-active");
+        info->has_total_time = true;
+        info->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME)
+            - s->total_time;
+        info->has_expected_downtime = true;
+        info->expected_downtime = s->expected_downtime;
+        info->has_setup_time = true;
+        info->setup_time = s->setup_time;
+
+        info->has_ram = true;
+        info->ram = g_malloc0(sizeof(*info->ram));
+        info->ram->transferred = ram_bytes_transferred();
+        info->ram->remaining = ram_bytes_remaining();
+        info->ram->total = ram_bytes_total();
+        info->ram->duplicate = dup_mig_pages_transferred();
+        info->ram->skipped = skipped_mig_pages_transferred();
+        info->ram->normal = norm_mig_pages_transferred();
+        info->ram->normal_bytes = norm_mig_bytes_transferred();
+        info->ram->dirty_pages_rate = s->dirty_pages_rate;
+        info->ram->mbps = s->mbps;
+
+        if (blk_mig_active()) {
+            info->has_disk = true;
+            info->disk = g_malloc0(sizeof(*info->disk));
+            info->disk->transferred = blk_mig_bytes_transferred();
+            info->disk->remaining = blk_mig_bytes_remaining();
+            info->disk->total = blk_mig_bytes_total();
+        }
+
+        get_xbzrle_cache_stats(info);
+        break;
     case MIG_STATE_COMPLETED:
         get_xbzrle_cache_stats(info);
 
@@ -354,7 +406,7 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
     MigrationState *s = migrate_get_current();
     MigrationCapabilityStatusList *cap;
 
-    if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP) {
+    if (migration_already_active(s)) {
         error_set(errp, QERR_MIGRATION_ACTIVE);
         return;
     }
@@ -423,7 +475,8 @@ static void migrate_fd_cleanup(void *opaque)
         s->file = NULL;
     }
 
-    assert(s->state != MIG_STATE_ACTIVE);
+    assert((s->state != MIG_STATE_ACTIVE) &&
+           (s->state != MIG_STATE_POSTCOPY_ACTIVE));
 
     if (s->state != MIG_STATE_COMPLETED) {
         qemu_savevm_state_cancel();
@@ -451,7 +504,8 @@ static void migrate_fd_cancel(MigrationState *s)
 
     do {
         old_state = s->state;
-        if (old_state != MIG_STATE_SETUP && old_state != MIG_STATE_ACTIVE) {
+        if (old_state != MIG_STATE_SETUP && old_state != MIG_STATE_ACTIVE &&
+            old_state != MIG_STATE_POSTCOPY_ACTIVE) {
             break;
         }
         migrate_set_state(s, old_state, MIG_STATE_CANCELLING);
@@ -484,6 +538,11 @@ bool migration_has_failed(MigrationState *s)
             s->state == MIG_STATE_ERROR);
 }
 
+bool migration_postcopy_phase(MigrationState *s)
+{
+    return (s->state == MIG_STATE_POSTCOPY_ACTIVE);
+}
+
 MigrationState *migrate_init(const MigrationParams *params)
 {
     MigrationState *s = migrate_get_current();
@@ -532,7 +591,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     params.blk = has_blk && blk;
     params.shared = has_inc && inc;
 
-    if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP ||
+    if (migration_already_active(s) ||
         s->state == MIG_STATE_CANCELLING) {
         error_set(errp, QERR_MIGRATION_ACTIVE);
         return;
@@ -848,7 +907,10 @@ static void *migration_thread(void *opaque)
     s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
     migrate_set_state(s, MIG_STATE_SETUP, MIG_STATE_ACTIVE);
 
-    while (s->state == MIG_STATE_ACTIVE) {
+    DPRINTF("setup complete\n");
+
+    while (s->state == MIG_STATE_ACTIVE ||
+           s->state == MIG_STATE_POSTCOPY_ACTIVE) {
         int64_t current_time;
         uint64_t pending_size;
 
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 22/43] qemu_savevm_state_complete: Postcopy changes
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (20 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 21/43] MIG_STATE_POSTCOPY_ACTIVE: Add new migration state Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 23/43] Postcopy: Maintain sentmap during postcopy pre phase Dr. David Alan Gilbert (git)
                   ` (21 subsequent siblings)
  43 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

When postcopy calls qemu_savevm_state_complete it's not really
the end of migration, so skip:
   a) Finishing postcopiable iterative devices - they'll carry on
   b) The termination byte on the end of the stream.

We then also add:
  qemu_savevm_state_postcopy_complete
which is called at the end of a postcopy migration to call the
complete methods on devices skipped in the _complete call.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/sysemu/sysemu.h |  1 +
 savevm.c                | 52 ++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index dc53580..ce52c0a 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -112,6 +112,7 @@ void qemu_savevm_state_cancel(void);
 void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size,
                                uint64_t *res_non_postcopiable,
                                uint64_t *res_postcopiable);
+void qemu_savevm_state_postcopy_complete(QEMUFile *f);
 void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
                               uint16_t len, uint8_t *data);
 void qemu_savevm_send_reqack(QEMUFile *f, uint32_t value);
diff --git a/savevm.c b/savevm.c
index 412bf9d..d54286f 100644
--- a/savevm.c
+++ b/savevm.c
@@ -845,10 +845,51 @@ int qemu_savevm_state_iterate(QEMUFile *f)
     return ret;
 }
 
+/*
+ * Calls the complete routines just for those devices that are postcopiable;
+ * causing the last few pages to be sent immediately and doing any associated
+ * cleanup.
+ * Note postcopy also calls the plain qemu_savevm_state_complete to complete
+ * all the other devices, but that happens at the point we switch to postcopy.
+ */
+void qemu_savevm_state_postcopy_complete(QEMUFile *f)
+{
+    SaveStateEntry *se;
+    int ret;
+
+    QTAILQ_FOREACH(se, &savevm_handlers, entry) {
+        if (!se->ops || !se->ops->save_live_complete ||
+            !se->ops->can_postcopy) {
+            continue;
+        }
+        if (se->ops && se->ops->is_active) {
+            if (!se->ops->is_active(se->opaque)) {
+                continue;
+            }
+        }
+        trace_savevm_section_start(se->idstr, se->section_id);
+        /* Section type */
+        qemu_put_byte(f, QEMU_VM_SECTION_END);
+        qemu_put_be32(f, se->section_id);
+
+        ret = se->ops->save_live_complete(f, se->opaque);
+        trace_savevm_section_end(se->idstr, se->section_id);
+        if (ret < 0) {
+            qemu_file_set_error(f, ret);
+            return;
+        }
+    }
+
+    qemu_savevm_send_postcopy_ram_end(f, 0 /* Good */);
+    qemu_put_byte(f, QEMU_VM_EOF);
+    qemu_fflush(f);
+}
+
 void qemu_savevm_state_complete(QEMUFile *f)
 {
     SaveStateEntry *se;
     int ret;
+    bool in_postcopy = migration_postcopy_phase(migrate_get_current());
 
     trace_savevm_state_complete();
 
@@ -863,6 +904,11 @@ void qemu_savevm_state_complete(QEMUFile *f)
                 continue;
             }
         }
+        if (in_postcopy && se->ops &&  se->ops->can_postcopy &&
+            se->ops->can_postcopy(se->opaque)) {
+            DPRINTF("%s: Skipping %s in postcopy", __func__, se->idstr);
+            continue;
+        }
         trace_savevm_section_start(se->idstr, se->section_id);
         /* Section type */
         qemu_put_byte(f, QEMU_VM_SECTION_END);
@@ -899,7 +945,11 @@ void qemu_savevm_state_complete(QEMUFile *f)
         trace_savevm_section_end(se->idstr, se->section_id);
     }
 
-    qemu_put_byte(f, QEMU_VM_EOF);
+    if (!in_postcopy) {
+        /* Postcopy stream will still be going */
+        qemu_put_byte(f, QEMU_VM_EOF);
+    }
+
     qemu_fflush(f);
 }
 
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 23/43] Postcopy: Maintain sentmap during postcopy pre phase
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (21 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 22/43] qemu_savevm_state_complete: Postcopy changes Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 24/43] Postcopy page-map-incoming (PMI) structure Dr. David Alan Gilbert (git)
                   ` (20 subsequent siblings)
  43 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Where postcopy is preceeded by a period of precopy, the destination will
have received pages that may have been dirtied on the source after the
page was sent.  The destination must throw these pages away before
starting it's CPUs.

Maintain a 'sentmap' of pages that have already been sent.
Calculate list of sent & dirty pages
Provide helpers on the destination side to discard these.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c                      | 162 ++++++++++++++++++++++++++++++++++++++-
 include/migration/migration.h    |   5 ++
 include/migration/postcopy-ram.h |  20 +++++
 migration.c                      |   2 +
 postcopy-ram.c                   | 156 +++++++++++++++++++++++++++++++++++++
 savevm.c                         |   3 -
 6 files changed, 342 insertions(+), 6 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index aeeaf37..134ea7e 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -40,6 +40,7 @@
 #include "hw/audio/audio.h"
 #include "sysemu/kvm.h"
 #include "migration/migration.h"
+#include "migration/postcopy-ram.h"
 #include "hw/i386/smbios.h"
 #include "exec/address-spaces.h"
 #include "hw/audio/pcspk.h"
@@ -413,9 +414,15 @@ static int save_xbzrle_page(QEMUFile *f, uint8_t **current_data,
     return bytes_sent;
 }
 
+/* mr: The region to search for dirty pages in
+ * start: Start address (typically so we can continue from previous page)
+ * bitoffset: Pointer into which to store the offset into the dirty map
+ *            at which the bit was found.
+ */
 static inline
 ram_addr_t migration_bitmap_find_and_reset_dirty(MemoryRegion *mr,
-                                                 ram_addr_t start)
+                                                 ram_addr_t start,
+                                                 unsigned long *bitoffset)
 {
     unsigned long base = mr->ram_addr >> TARGET_PAGE_BITS;
     unsigned long nr = base + (start >> TARGET_PAGE_BITS);
@@ -434,6 +441,7 @@ ram_addr_t migration_bitmap_find_and_reset_dirty(MemoryRegion *mr,
         clear_bit(next, migration_bitmap);
         migration_dirty_pages--;
     }
+    *bitoffset = next;
     return (next - base) << TARGET_PAGE_BITS;
 }
 
@@ -562,6 +570,19 @@ static void migration_bitmap_sync(void)
     }
 }
 
+static RAMBlock *ram_find_block(const char *id)
+{
+    RAMBlock *block;
+
+    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+        if (!strcmp(id, block->idstr)) {
+            return block;
+        }
+    }
+
+    return NULL;
+}
+
 /*
  * ram_save_page: Send the given page to the stream
  *
@@ -650,13 +671,14 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage)
     bool complete_round = false;
     int bytes_sent = 0;
     MemoryRegion *mr;
+    unsigned long bitoffset;
 
     if (!block)
         block = QTAILQ_FIRST(&ram_list.blocks);
 
     while (true) {
         mr = block->mr;
-        offset = migration_bitmap_find_and_reset_dirty(mr, offset);
+        offset = migration_bitmap_find_and_reset_dirty(mr, offset, &bitoffset);
         if (complete_round && block == last_seen_block &&
             offset >= last_offset) {
             break;
@@ -674,6 +696,11 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage)
 
             /* if page is unmodified, continue to the next */
             if (bytes_sent > 0) {
+                MigrationState *s = migrate_get_current();
+                if (s->sentmap) {
+                    set_bit(bitoffset, s->sentmap);
+                }
+
                 last_sent_block = block;
                 break;
             }
@@ -733,12 +760,19 @@ void free_xbzrle_decoded_buf(void)
 
 static void migration_end(void)
 {
+    MigrationState *s = migrate_get_current();
+
     if (migration_bitmap) {
         memory_global_dirty_log_stop();
         g_free(migration_bitmap);
         migration_bitmap = NULL;
     }
 
+    if (s->sentmap) {
+        g_free(s->sentmap);
+        s->sentmap = NULL;
+    }
+
     XBZRLE_cache_lock();
     if (XBZRLE.cache) {
         cache_fini(XBZRLE.cache);
@@ -806,6 +840,123 @@ void ram_debug_dump_bitmap(unsigned long *todump, bool expected)
     }
 }
 
+/*
+ * Utility for the outgoing postcopy code; this performs
+ * sentmap &= migration_bitmap
+ * returning the length of the bitmap
+ */
+int64_t ram_mask_postcopy_bitmap(MigrationState *ms)
+{
+    int64_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
+
+    migration_bitmap_sync();
+    bitmap_and(ms->sentmap, ms->sentmap, migration_bitmap, ram_pages);
+    return ram_pages;
+}
+
+/*
+ * Utility for the outgoing postcopy code.
+ *   Calls postcopy_send_discard_bm_ram for each RAMBlock
+ *   passing it bitmap indexes and name.
+ * Returns: 0 on success
+ * (qemu_ram_foreach_block ends up passing unscaled lengths
+ *  which would mean postcopy code would have to deal with target page)
+ */
+int ram_postcopy_each_ram_discard(MigrationState *ms)
+{
+    struct RAMBlock *block;
+    int ret;
+
+    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+        /*
+         * Postcopy sends chunks of bitmap over the wire, but it
+         * just needs indexes at this point, avoids it having
+         * target page specific code.
+         */
+        unsigned long first, last;
+        first = block->offset >> TARGET_PAGE_BITS;
+        last = (block->offset + (block->length-1)) >> TARGET_PAGE_BITS;
+        ret = postcopy_send_discard_bm_ram(ms, block->idstr, first, last);
+        if (ret) {
+            return ret;
+        }
+    }
+
+    return 0;
+}
+
+/*
+ * At the start of the postcopy phase of migration, any now-dirty
+ * precopied pages are discarded.
+ *
+ * start..end is an inclusive range of bits indexed in the source
+ *    VMs bitmap for this RAMBlock, source_target_page_bits tells
+ *    us what one of those bits represents.
+ *
+ * start/end are offsets from the start of the bitmap for RAMBlock 'block_name'
+ *
+ * Returns 0 on success.
+ */
+int ram_discard_range(MigrationIncomingState *mis,
+                      const char *block_name,
+                      int source_target_page_bits,
+                      uint64_t start, uint64_t end)
+{
+    assert(end >= start);
+    unsigned int bitdif;
+
+    RAMBlock *rb = ram_find_block(block_name);
+
+    if (!rb) {
+        error_report("ram_discard_range: Failed to find block '%s'",
+                     block_name);
+        return -1;
+    }
+
+    if (source_target_page_bits != TARGET_PAGE_BITS) {
+        if (source_target_page_bits < TARGET_PAGE_BITS) {
+            /*
+             * e.g. source is 4K and we're 64k - we'll have to discard
+             * on the larger boundary
+             * e.g. a range of  70K...132K we would discard from
+             * 64K..192K, so round start down, and end up
+             */
+            bitdif = TARGET_PAGE_BITS - source_target_page_bits;
+            start = start >> bitdif;
+            if (end & ((1<<bitdif)-1)) {
+                end = end >> bitdif;
+                end++;
+            } else {
+                end = end >> bitdif;
+            }
+
+        } else {
+            /* e.g. source is 64K and we're 4K - easy just scale the indexes */
+            bitdif = source_target_page_bits - TARGET_PAGE_BITS;
+
+            start = start << bitdif;
+            end = end << bitdif;
+        }
+    }
+
+    uint64_t index_offset = rb->offset >> TARGET_PAGE_BITS;
+    postcopy_pmi_discard_range(mis, start + index_offset, (end - start) + 1);
+
+    /* +1 gives the byte after the end of the last page to be discarded */
+    ram_addr_t end_offset = (end+1) << TARGET_PAGE_BITS;
+    uint8_t *host_startaddr = rb->host + (start << TARGET_PAGE_BITS);
+    uint8_t *host_endaddr;
+
+    if (end_offset <= rb->length) {
+        host_endaddr   = rb->host + (end_offset-1);
+        return postcopy_ram_discard_range(mis, host_startaddr, host_endaddr);
+    } else {
+        error_report("ram_discard_range: Overrun block '%s' (%zu/%zu/%zu)",
+                     block_name, start, end, rb->length);
+        return -1;
+    }
+}
+
 static int ram_save_setup(QEMUFile *f, void *opaque)
 {
     RAMBlock *block;
@@ -844,7 +995,6 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 
         acct_clear();
     }
-
     qemu_mutex_lock_iothread();
     qemu_mutex_lock_ramlist();
     bytes_transferred = 0;
@@ -854,6 +1004,12 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     migration_bitmap = bitmap_new(ram_bitmap_pages);
     bitmap_set(migration_bitmap, 0, ram_bitmap_pages);
 
+    if (migrate_postcopy_ram()) {
+        MigrationState *s = migrate_get_current();
+        s->sentmap = bitmap_new(ram_bitmap_pages);
+        bitmap_clear(s->sentmap, 0, ram_bitmap_pages);
+    }
+
     /*
      * Count the total number of pages used by ram blocks not including any
      * gaps due to alignment or unplugs.
diff --git a/include/migration/migration.h b/include/migration/migration.h
index e7999e6..627c95a 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -173,6 +173,11 @@ double xbzrle_mig_cache_miss_rate(void);
 
 void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
 void ram_debug_dump_bitmap(unsigned long *todump, bool expected);
+int64_t ram_mask_postcopy_bitmap(MigrationState *ms);
+int ram_postcopy_each_ram_discard(MigrationState *ms);
+int ram_discard_range(MigrationIncomingState *mis, const char *block_name,
+                      int source_target_page_bits,
+                      uint64_t start, uint64_t end);
 
 /**
  * @migrate_add_blocker - prevent migration from proceeding
diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index dcd1afa..fe89a3c 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -13,7 +13,27 @@
 #ifndef QEMU_POSTCOPY_RAM_H
 #define QEMU_POSTCOPY_RAM_H
 
+#include "migration/migration.h"
+
 /* Return 0 if the host supports everything we need to do postcopy-ram */
 int postcopy_ram_hosttest(void);
 
+/* Send the list of sent-but-dirty pages */
+int postcopy_send_discard_bitmap(MigrationState *ms);
+
+/*
+ * Discard the contents of memory start..end inclusive.
+ * We can assume that if we've been called postcopy_ram_hosttest returned true
+ */
+int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
+                               uint8_t *end);
+
+
+/*
+ * Called back from arch_init's ram_postcopy_each_ram_discard to handle
+ * discarding one RAMBlock's pre-postcopy dirty pages
+ */
+int postcopy_send_discard_bm_ram(MigrationState *ms, const char *name,
+                                 unsigned long start, unsigned long end);
+
 #endif
diff --git a/migration.c b/migration.c
index e3ea9ae..b509e4d 100644
--- a/migration.c
+++ b/migration.c
@@ -22,6 +22,7 @@
 #include "block/block.h"
 #include "qemu/sockets.h"
 #include "migration/block.h"
+#include "migration/postcopy-ram.h"
 #include "qemu/thread.h"
 #include "qmp-commands.h"
 #include "trace.h"
@@ -929,6 +930,7 @@ static void *migration_thread(void *opaque)
             } else {
                 int ret;
 
+                DPRINTF("done iterating\n");
                 qemu_mutex_lock_iothread();
                 start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
                 qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
diff --git a/postcopy-ram.c b/postcopy-ram.c
index 1f3e6ea..ff6bdd6 100644
--- a/postcopy-ram.c
+++ b/postcopy-ram.c
@@ -23,6 +23,7 @@
 #include "qemu-common.h"
 #include "migration/migration.h"
 #include "migration/postcopy-ram.h"
+#include "sysemu/sysemu.h"
 
 //#define DEBUG_POSTCOPY
 
@@ -116,6 +117,21 @@ int postcopy_ram_hosttest(void)
     return 0;
 }
 
+/*
+ * Discard the contents of memory start..end inclusive.
+ * We can assume that if we've been called postcopy_ram_hosttest returned true
+ */
+int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
+                               uint8_t *end)
+{
+    if (madvise(start, (end-start)+1, MADV_DONTNEED)) {
+        perror("postcopy_ram_discard_range MADV_DONTNEED");
+        return -1;
+    }
+
+    return 0;
+}
+
 #else
 /* No target OS support, stubs just fail */
 
@@ -125,5 +141,145 @@ int postcopy_ram_hosttest(void)
     return -1;
 }
 
+int postcopy_ram_discard_range(MigrationIncomingState *mis, void *start,
+                               void *end)
+{
+    error_report("postcopy_ram_discard_range: No OS support");
+    return -1;
+}
+#endif
+
+/* ------------------------------------------------------------------------- */
+/*
+ * A helper to get 64 bits from the sentmap; trivial for HOST_LONG_BITS=64
+ * messier for other sizes; pads with 0's at end if an unaligned end
+ *   check2nd32: True if it's safe to read the upper 32bits in a 32bit long
+ *               map
+ */
+static uint64_t get_64bits_sentmap(unsigned long *sentmap, bool check2nd32,
+                                   int64_t start)
+{
+    uint64_t result;
+#if HOST_LONG_BITS == 64
+    result = sentmap[start / 64];
+#elif HOST_LONG_BITS == 32
+    /*
+     * Irrespective of host endianness, sentmap[n] is for pages earlier
+     * than sentmap[n+1] so we can't just cast up
+     */
+    uint32_t sm0, sm1;
+    sm0 = sentmap[start / 32];
+    sm1 = check2nd32 ? sentmap[(start / 32) + 1] : 0;
+    result = sm0 | ((uint64_t)sm1) << 32;
+#else
+#error "Host long other than 64/32 not supported"
+#endif
+
+    return result;
+}
+
+/*
+ * Callback from ram_postcopy_each_ram_discard for each RAMBlock
+ * start,end: Indexes into the bitmap for the first and last bit
+ *            representing the named block
+ */
+int postcopy_send_discard_bm_ram(MigrationState *ms, const char *name,
+                                 unsigned long start, unsigned long end)
+{
+    /* Keeps command under 256 bytes - but arbitrary */
+    const unsigned int max_entries_per_command = 12;
+    uint16_t cur_entry;
+    uint64_t buffer[2*max_entries_per_command];
+    unsigned int nsentwords = 0;
+    unsigned int nsentcmds = 0;
+
+    /*
+     * There is no guarantee that start, end are on convenient 64bit multiples
+     * (We always send 64bit chunks over the wire, irrespective of long size)
+     */
+    unsigned long first64, last64, cur64;
+    first64 = start / 64;
+    last64 = end / 64;
+
+    cur_entry = 0;
+    for (cur64 = first64; cur64 <= last64; cur64++) {
+        /* Deal with start/end not on alignment */
+        uint64_t mask;
+        mask = ~(uint64_t)0;
+
+        if ((cur64 == first64) && (start & 63)) {
+            /* e.g. (start & 63) = 3
+             *         1 << .    -> 2^3
+             *         . - 1     -> 2^3 - 1 i.e. mask 2..0
+             *         ~.        -> mask 63..3
+             */
+            mask &= ~((((uint64_t)1) << (start & 63)) - 1);
+        }
+
+        if ((cur64 == last64) && ((end & 64) != 63)) {
+            /* e.g. (end & 64) = 3
+             *            .   +1 -> 4
+             *         1 << .    -> 2^4
+             *         . -1      -> 2^4 - 1
+             *                   = mask set 3..0
+             */
+            mask &= (((uint64_t)1) << ((end & 64) + 1)) - 1;
+        }
+
+        uint64_t data = get_64bits_sentmap(ms->sentmap,
+                                           (end & 64) >= 32, cur64 * 64);
+        data &= mask;
+
+        if (data) {
+            cpu_to_be64w(buffer+2*cur_entry, (cur64-first64));
+            cpu_to_be64w(buffer+1+2*cur_entry, data);
+            cur_entry++;
+            nsentwords++;
+
+            if (cur_entry == max_entries_per_command) {
+                /* Full set, ship it! */
+                qemu_savevm_send_postcopy_ram_discard(ms->file, name,
+                                                      cur_entry,
+                                                      start & 63,
+                                                      buffer);
+                nsentcmds++;
+                cur_entry = 0;
+            }
+        }
+    }
+
+    /* Anything unsent? */
+    if (cur_entry) {
+        qemu_savevm_send_postcopy_ram_discard(ms->file, name, cur_entry,
+                                              start & 63, buffer);
+        nsentcmds++;
+    }
+
+    /*fprintf(stderr, "postcopy_send_discard_bm_ram: '%s' mask words"
+                      " sent=%d in %d commands.\n",
+            name, nsentwords, nsentcmds);*/
+
+    return 0;
+}
+
+/*
+ * Transmit the set of pages to be discarded after precopy to the target
+ * these are pages that have been sent previously but have been dirtied
+ * Hopefully this is pretty sparse
+ */
+int postcopy_send_discard_bitmap(MigrationState *ms)
+{
+    /*
+     * Update the sentmap to be  sentmap&=dirty
+     * (arch_init gives us the full size as a return)
+     */
+    ram_mask_postcopy_bitmap(ms);
+
+    DPRINTF("Dumping merged sentmap");
+#ifdef DEBUG_POSTCOPY
+    ram_debug_dump_bitmap(ms->sentmap, false);
 #endif
 
+    return ram_postcopy_each_ram_discard(ms);
+}
+
diff --git a/savevm.c b/savevm.c
index d54286f..0cae88b 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1244,12 +1244,9 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
              * we know there must be at least 1 bit set due to the loop entry
              * If there is no 0 firstzero will be 64
              */
-            /* TODO - ram_discard_range gets added in a later patch
             int ret = ram_discard_range(mis, ramid, source_target_page_bits,
                                 startaddr + firstset - first_bit_offset,
                                 startaddr + (firstzero - 1) - first_bit_offset);
-             */
-            ret = -1; /* TODO */
             if (ret) {
                 return ret;
             }
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 24/43] Postcopy page-map-incoming (PMI) structure
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (22 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 23/43] Postcopy: Maintain sentmap during postcopy pre phase Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 25/43] postcopy: Add incoming_init/cleanup functions Dr. David Alan Gilbert (git)
                   ` (19 subsequent siblings)
  43 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The PMI holds the state of each page on the incoming side,
so that we can tell if the page is missing, already received
or there is a request outstanding for it.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h    |  18 +++++
 include/migration/postcopy-ram.h |  10 +++
 include/qemu/typedefs.h          |   1 +
 postcopy-ram.c                   | 139 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 168 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 627c95a..0ed3790 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -58,6 +58,23 @@ struct MigrationRetPathState {
 
 typedef struct MigrationState MigrationState;
 
+/* Postcopy page-map-incoming - data about each page on the inbound side */
+
+typedef enum {
+   POSTCOPY_PMI_MISSING,   /* page hasn't yet been received */
+   POSTCOPY_PMI_REQUESTED, /* Kernel asked for a page, but we've not got it */
+   POSTCOPY_PMI_RECEIVED   /* We've got the page */
+} PostcopyPMIState;
+
+struct PostcopyPMI {
+    /* TODO: I'm expecting to rework this using some atomic compare-exchange
+     * thing, which will require merging the maps together
+     */
+    QemuMutex      mutex;
+    unsigned long *received_map;  /* Pages that we have received */
+    unsigned long *requested_map; /* Pages that we're sending a request for */
+};
+
 /* State for the incoming migration */
 struct MigrationIncomingState {
     QEMUFile *file;
@@ -72,6 +89,7 @@ struct MigrationIncomingState {
 
     QEMUFile *return_path;
     QemuMutex      rp_mutex;    /* We send replies from multiple threads */
+    PostcopyPMI    postcopy_pmi;
 };
 
 MigrationIncomingState *migration_incoming_get_current(void);
diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index fe89a3c..e06cb45 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -36,4 +36,14 @@ int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
 int postcopy_send_discard_bm_ram(MigrationState *ms, const char *name,
                                  unsigned long start, unsigned long end);
 
+/*
+ * In 'advise' mode record that a page has been received.
+ */
+void postcopy_hook_early_receive(MigrationIncomingState *mis,
+                                 size_t bitmap_index);
+
+void postcopy_pmi_destroy(MigrationIncomingState *mis);
+void postcopy_pmi_discard_range(MigrationIncomingState *mis,
+                                size_t start, size_t npages);
+
 #endif
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 8539de6..61b330c 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -77,6 +77,7 @@ typedef struct QEMUSGList QEMUSGList;
 typedef struct SHPCDevice SHPCDevice;
 typedef struct FWCfgState FWCfgState;
 typedef struct PcGuestInfo PcGuestInfo;
+typedef struct PostcopyPMI PostcopyPMI;
 typedef struct Range Range;
 typedef struct AdapterInfo AdapterInfo;
 
diff --git a/postcopy-ram.c b/postcopy-ram.c
index ff6bdd6..b41fec8 100644
--- a/postcopy-ram.c
+++ b/postcopy-ram.c
@@ -23,6 +23,8 @@
 #include "qemu-common.h"
 #include "migration/migration.h"
 #include "migration/postcopy-ram.h"
+#include "qemu/bitmap.h"
+#include "qemu/error-report.h"
 #include "sysemu/sysemu.h"
 
 //#define DEBUG_POSTCOPY
@@ -66,6 +68,135 @@
 #define __NR_remap_anon_pages 317
 #endif
 
+/* ---------------------------------------------------------------------- */
+/* Postcopy pagemap-inbound (pmi) - data structures that record the       */
+/* state of each page used by the inbound postcopy                        */
+
+static void postcopy_pmi_init(MigrationIncomingState *mis, size_t ram_pages)
+{
+    qemu_mutex_init(&mis->postcopy_pmi.mutex);
+    mis->postcopy_pmi.received_map = bitmap_new(ram_pages);
+    mis->postcopy_pmi.requested_map = bitmap_new(ram_pages);
+    bitmap_clear(mis->postcopy_pmi.received_map, 0, ram_pages);
+    bitmap_clear(mis->postcopy_pmi.requested_map, 0, ram_pages);
+}
+
+void postcopy_pmi_destroy(MigrationIncomingState *mis)
+{
+    if (mis->postcopy_pmi.received_map) {
+        g_free(mis->postcopy_pmi.received_map);
+        mis->postcopy_pmi.received_map = NULL;
+    }
+    if (mis->postcopy_pmi.requested_map) {
+        g_free(mis->postcopy_pmi.requested_map);
+        mis->postcopy_pmi.requested_map = NULL;
+    }
+    qemu_mutex_destroy(&mis->postcopy_pmi.mutex);
+}
+
+/*
+ * Mark a set of pages in the PMI as being clear; this is used by the discard
+ * at the start of postcopy, and before the postcopy stream starts.
+ */
+void postcopy_pmi_discard_range(MigrationIncomingState *mis,
+                                size_t start, size_t npages)
+{
+    bitmap_clear(mis->postcopy_pmi.received_map, start, npages);
+}
+
+/*
+ * Retrieve the state of the given page
+ * Note: This version for use by callers already holding the lock
+ */
+static PostcopyPMIState postcopy_pmi_get_state_nolock(
+                            MigrationIncomingState *mis,
+                            size_t bitmap_index)
+{
+    bool received, requested;
+
+    received = test_bit(bitmap_index, mis->postcopy_pmi.received_map);
+    requested = test_bit(bitmap_index, mis->postcopy_pmi.requested_map);
+
+    if (received) {
+        assert(!requested);
+        return POSTCOPY_PMI_RECEIVED;
+    } else {
+        return requested ? POSTCOPY_PMI_REQUESTED : POSTCOPY_PMI_MISSING;
+    }
+}
+
+/* Retrieve the state of the given page */
+static PostcopyPMIState postcopy_pmi_get_state(MigrationIncomingState *mis,
+                                               size_t bitmap_index)
+{
+    PostcopyPMIState ret;
+    qemu_mutex_lock(&mis->postcopy_pmi.mutex);
+    ret = postcopy_pmi_get_state_nolock(mis, bitmap_index);
+    qemu_mutex_unlock(&mis->postcopy_pmi.mutex);
+
+    return ret;
+}
+
+/*
+ * Set the page state to the given state if the previous state was as expected
+ * Return the actual previous state.
+ */
+static PostcopyPMIState postcopy_pmi_change_state(MigrationIncomingState *mis,
+                                           size_t bitmap_index,
+                                           PostcopyPMIState expected_state,
+                                           PostcopyPMIState new_state)
+{
+    PostcopyPMIState old_state;
+
+    qemu_mutex_lock(&mis->postcopy_pmi.mutex);
+    old_state = postcopy_pmi_get_state_nolock(mis, bitmap_index);
+
+    if (old_state == expected_state) {
+        switch (new_state) {
+        case POSTCOPY_PMI_MISSING:
+          assert(0); /* This shouldn't actually happen - use discard_range */
+          break;
+
+        case POSTCOPY_PMI_REQUESTED:
+          assert(old_state == POSTCOPY_PMI_MISSING);
+          set_bit(bitmap_index, mis->postcopy_pmi.requested_map);
+          break;
+
+        case POSTCOPY_PMI_RECEIVED:
+          assert(old_state == POSTCOPY_PMI_MISSING ||
+                 old_state == POSTCOPY_PMI_REQUESTED);
+          set_bit(bitmap_index, mis->postcopy_pmi.received_map);
+          clear_bit(bitmap_index, mis->postcopy_pmi.requested_map);
+          break;
+        }
+    }
+
+    qemu_mutex_unlock(&mis->postcopy_pmi.mutex);
+    return old_state;
+}
+
+static void postcopy_pmi_dump(MigrationIncomingState *mis)
+{
+    fprintf(stderr, "postcopy_pmi_dump: requested\n");
+    ram_debug_dump_bitmap(mis->postcopy_pmi.requested_map, false);
+    fprintf(stderr, "postcopy_pmi_dump: received\n");
+    ram_debug_dump_bitmap(mis->postcopy_pmi.received_map, true);
+}
+
+/* Called by ram_load prior to mapping the page */
+void postcopy_hook_early_receive(MigrationIncomingState *mis,
+                                 size_t bitmap_index)
+{
+    if (mis->postcopy_ram_state == POSTCOPY_RAM_INCOMING_ADVISE) {
+        /*
+         * If we're in precopy-advise mode we need to track received pages even
+         * though we don't need to place pages atomically yet.
+         * In advise mode there's only a single thread, so don't need locks
+         */
+        set_bit(bitmap_index, mis->postcopy_pmi.received_map);
+    }
+}
+
 int postcopy_ram_hosttest(void)
 {
     /* TODO: Needs guarding with CONFIG_ once we have libc's that have the defs
@@ -147,6 +278,14 @@ int postcopy_ram_discard_range(MigrationIncomingState *mis, void *start,
     error_report("postcopy_ram_discard_range: No OS support");
     return -1;
 }
+
+/* Called by ram_load prior to mapping the page */
+void postcopy_hook_early_receive(MigrationIncomingState *mis,
+                                 size_t bitmap_index)
+{
+    /* We don't support postcopy so don't care */
+}
+
 #endif
 
 /* ------------------------------------------------------------------------- */
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 25/43] postcopy: Add incoming_init/cleanup functions
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (23 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 24/43] Postcopy page-map-incoming (PMI) structure Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 26/43] postcopy: Incoming initialisation Dr. David Alan Gilbert (git)
                   ` (18 subsequent siblings)
  43 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Provide functions to be called before the start of a postcopy
enabled migration (even if it's not eventually used) and
at the end.

During the init we must disable huge pages in the RAM that
we will receive postcopy data into, since if they start off
as hugepage and get a 4k page written to them, the rest of
the hugepage won't get userfault'd and won't work as a destination
for remap.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/postcopy-ram.h |  12 +++++
 postcopy-ram.c                   | 101 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 113 insertions(+)

diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index e06cb45..ca2045b 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -30,6 +30,18 @@ int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
 
 
 /*
+ * Initialise postcopy-ram, setting the RAM to a state where we can go into
+ * postcopy later; must be called prior to any precopy.
+ * called from arch_init's similarly named ram_postcopy_incoming_init
+ */
+int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages);
+
+/*
+ * At the end of a migration where postcopy_ram_incoming_init was called.
+ */
+int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis);
+
+/*
  * Called back from arch_init's ram_postcopy_each_ram_discard to handle
  * discarding one RAMBlock's pre-postcopy dirty pages
  */
diff --git a/postcopy-ram.c b/postcopy-ram.c
index b41fec8..b1fbabd 100644
--- a/postcopy-ram.c
+++ b/postcopy-ram.c
@@ -263,6 +263,95 @@ int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
     return 0;
 }
 
+/*
+ * Setup an area of RAM so that it *can* be used for postcopy later; this
+ * must be done right at the start prior to pre-copy.
+ * opaque should be the MIS.
+ */
+static int init_area(const char *block_name, void *host_addr,
+                     ram_addr_t offset, ram_addr_t length, void *opaque)
+{
+    MigrationIncomingState *mis = opaque;
+
+    DPRINTF("init_area: %s: %p offset=%zx length=%zd(%zx)",
+            block_name, host_addr, offset, length, length);
+    /*
+     * We need the whole of RAM to be truly empty for postcopy, so things
+     * like ROMs and any data tables built during init must be zero'd
+     * - we're going to get the copy from the source anyway.
+     */
+    if (postcopy_ram_discard_range(mis, host_addr, (host_addr + length - 1))) {
+        return -1;
+    }
+
+    /*
+     * We also need the area to be normal 4k pages, not huge pages
+     * (otherwise we can't be sure we can use remap_anon_pages to put
+     * a 4k page in later).  THP might come along and map a 2MB page
+     * and when it's partially accessed in precopy it might not break
+     * it down, but leave a 2MB zero'd page.
+     */
+    if (madvise(host_addr, length, MADV_NOHUGEPAGE)) {
+        perror("init_area: NOHUGEPAGE");
+        return -1;
+    }
+
+    return 0;
+}
+
+/*
+ * At the end of migration, undo the effects of init_area
+ * opaque should be the MIS.
+ */
+static int cleanup_area(const char *block_name, void *host_addr,
+                        ram_addr_t offset, ram_addr_t length, void *opaque)
+{
+    /* Turn off userfault here as well? */
+
+    DPRINTF("cleanup_area: %s: %p offset=%zx length=%zd(%zx)",
+            block_name, host_addr, offset, length, length);
+    /*
+     * We turned off hugepage for the precopy stage with postcopy enabled
+     * we can turn it back on now.
+     */
+    if (madvise(host_addr, length, MADV_HUGEPAGE)) {
+        perror("init_area: HUGEPAGE");
+        return -1;
+    }
+
+    return 0;
+}
+
+/*
+ * Initialise postcopy-ram, setting the RAM to a state where we can go into
+ * postcopy later; must be called prior to any precopy.
+ * called from arch_init's similarly named ram_postcopy_incoming_init
+ */
+int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages)
+{
+    postcopy_pmi_init(mis, ram_pages);
+
+    if (qemu_ram_foreach_block(init_area, mis)) {
+        return -1;
+    }
+
+    return 0;
+}
+
+/*
+ * At the end of a migration where postcopy_ram_incoming_init was called.
+ */
+int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
+{
+    /* TODO: Join the fault thread once we're sure it will exit */
+    close(mis->userfault_fd);
+    if (qemu_ram_foreach_block(cleanup_area, mis)) {
+        return -1;
+    }
+
+    return 0;
+}
+
 #else
 /* No target OS support, stubs just fail */
 
@@ -286,6 +375,18 @@ void postcopy_hook_early_receive(MigrationIncomingState *mis,
     /* We don't support postcopy so don't care */
 }
 
+int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages)
+{
+    error_report("postcopy_ram_incoming_init: No OS support\n");
+    return -1;
+}
+
+int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
+{
+    error_report("postcopy_ram_incoming_cleanup: No OS support\n");
+    return -1;
+}
+
 #endif
 
 /* ------------------------------------------------------------------------- */
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 26/43] postcopy: Incoming initialisation
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (24 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 25/43] postcopy: Add incoming_init/cleanup functions Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 27/43] postcopy: ram_enable_notify to switch on userfault Dr. David Alan Gilbert (git)
                   ` (17 subsequent siblings)
  43 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c                   | 11 +++++++++++
 include/migration/migration.h |  1 +
 migration.c                   |  1 +
 3 files changed, 13 insertions(+)

diff --git a/arch_init.c b/arch_init.c
index 134ea7e..fd7399c 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -1234,6 +1234,17 @@ void ram_handle_compressed(void *host, uint8_t ch, uint64_t size)
     }
 }
 
+/*
+ * Allocate data structures etc needed by incoming migration with postcopy-ram
+ * postcopy-ram's similarly names postcopy_ram_incoming_init does the work
+ */
+int ram_postcopy_incoming_init(MigrationIncomingState *mis)
+{
+    size_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
+
+    return postcopy_ram_incoming_init(mis, ram_pages);
+}
+
 static int ram_load(QEMUFile *f, void *opaque, int version_id)
 {
     ram_addr_t addr;
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 0ed3790..12a54e5 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -196,6 +196,7 @@ int ram_postcopy_each_ram_discard(MigrationState *ms);
 int ram_discard_range(MigrationIncomingState *mis, const char *block_name,
                       int source_target_page_bits,
                       uint64_t start, uint64_t end);
+int ram_postcopy_incoming_init(MigrationIncomingState *mis);
 
 /**
  * @migrate_add_blocker - prevent migration from proceeding
diff --git a/migration.c b/migration.c
index b509e4d..d962aee 100644
--- a/migration.c
+++ b/migration.c
@@ -99,6 +99,7 @@ MigrationIncomingState *migration_incoming_state_init(QEMUFile* f)
 
 void migration_incoming_state_destroy(void)
 {
+    postcopy_pmi_destroy(mis_current);
     g_free(mis_current);
     mis_current = NULL;
 }
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 27/43] postcopy: ram_enable_notify to switch on userfault
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (25 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 26/43] postcopy: Incoming initialisation Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 28/43] Postcopy: postcopy_start Dr. David Alan Gilbert (git)
                   ` (16 subsequent siblings)
  43 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/postcopy-ram.h |  5 +++++
 postcopy-ram.c                   | 36 +++++++++++++++++++++++++++++++++++-
 2 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index ca2045b..fc23558 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -28,6 +28,11 @@ int postcopy_send_discard_bitmap(MigrationState *ms);
 int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
                                uint8_t *end);
 
+/*
+ * Make all of RAM sensitive to accesses to areas that haven't yet been written
+ * and wire up anything necessary to deal with it.
+ */
+int postcopy_ram_enable_notify(MigrationIncomingState *mis);
 
 /*
  * Initialise postcopy-ram, setting the RAM to a state where we can go into
diff --git a/postcopy-ram.c b/postcopy-ram.c
index b1fbabd..5b52542 100644
--- a/postcopy-ram.c
+++ b/postcopy-ram.c
@@ -352,9 +352,38 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
     return 0;
 }
 
+/*
+ * Mark the given area of RAM as requiring notification to unwritten areas
+ * Used as a  callback on qemu_ram_foreach_block.
+ *   host_addr: Base of area to mark
+ *   offset: Offset in the whole ram arena
+ *   length: Length of the section
+ *   opaque: Unused
+ * Returns 0 on success
+ */
+static int postcopy_ram_sensitise_area(const char *block_name, void *host_addr,
+                                       ram_addr_t offset, ram_addr_t length,
+                                       void *opaque)
+{
+    if (madvise(host_addr, length, MADV_USERFAULT)) {
+        perror("postcopy_ram_sensitise_area");
+        return -1;
+    }
+    return 0;
+}
+
+int postcopy_ram_enable_notify(MigrationIncomingState *mis)
+{
+    /* Mark so that we get notified of accesses to unwritten areas */
+    if (qemu_ram_foreach_block(postcopy_ram_sensitise_area, NULL)) {
+        return -1;
+    }
+
+    return 0;
+}
+
 #else
 /* No target OS support, stubs just fail */
-
 int postcopy_ram_hosttest(void)
 {
     error_report("postcopy_ram_hosttest: No OS support");
@@ -387,6 +416,11 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
     return -1;
 }
 
+int postcopy_ram_enable_notify(MigrationIncomingState *mis)
+{
+    fprintf(stderr, "postcopy_ram_enable_notify: No OS support\n");
+    return -1;
+}
 #endif
 
 /* ------------------------------------------------------------------------- */
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 28/43] Postcopy: postcopy_start
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (26 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 27/43] postcopy: ram_enable_notify to switch on userfault Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 29/43] Postcopy: Rework migration thread for postcopy mode Dr. David Alan Gilbert (git)
                   ` (15 subsequent siblings)
  43 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

postcopy_start:
  Perform all the initialisation associated with starting up postcopy
mode from the source.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration.c | 85 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 85 insertions(+)

diff --git a/migration.c b/migration.c
index d962aee..4be9053 100644
--- a/migration.c
+++ b/migration.c
@@ -890,6 +890,91 @@ static void await_outgoing_return_path_close(MigrationState *ms)
     DPRINTF("%s: Exit", __func__);
 }
 
+/* Switch from normal iteration to postcopy
+ * Returns non-0 on error
+ */
+static int postcopy_start(MigrationState *ms)
+{
+    int ret;
+    const QEMUSizedBuffer *qsb;
+    migrate_set_state(ms, MIG_STATE_ACTIVE, MIG_STATE_POSTCOPY_ACTIVE);
+
+    DPRINTF("postcopy_start\n");
+    qemu_mutex_lock_iothread();
+    DPRINTF("postcopy_start: setting run state\n");
+    ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
+
+    if (ret < 0) {
+        migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
+        qemu_mutex_unlock_iothread();
+        return -1;
+    }
+
+    /*
+     * in Finish migrate and with the io-lock held everything should
+     * be quiet, but we've potentially still got dirty pages and we
+     * need to tell the destination to throw any pages it's already received
+     * that are dirty
+     */
+    if (postcopy_send_discard_bitmap(ms)) {
+        DPRINTF("postcopy send discard bitmap failed\n");
+        migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
+        qemu_mutex_unlock_iothread();
+        return -1;
+    }
+
+    DPRINTF("postcopy_start: sending req 2\n");
+    qemu_savevm_send_reqack(ms->file, 2);
+    /*
+     * send rest of state - note things that are doing postcopy
+     * will notice we're in MIG_STATE_POSTCOPY_ACTIVE and not actually
+     * wrap their state up here
+     */
+    qemu_file_set_rate_limit(ms->file, INT64_MAX);
+    DPRINTF("postcopy_start: do state_complete\n");
+
+    /*
+     * We need to leave the fd free for page transfers during the
+     * loading of the device state, so wrap all the remaining
+     * commands and state into a package that gets sent in one go
+     */
+    QEMUFile *fb = qemu_bufopen("w", NULL);
+
+    qemu_savevm_state_complete(fb);
+    DPRINTF("postcopy_start: sending req 3\n");
+    qemu_savevm_send_reqack(fb, 3);
+
+    qemu_savevm_send_postcopy_ram_run(fb);
+
+    /* <><> end of stuff going into the package */
+    qsb = qemu_buf_get(fb);
+
+    /* Now send that blob */
+    if (qsb_get_length(qsb) > MAX_VM_CMD_PACKAGED_SIZE) {
+        DPRINTF("postcopy_start: Unreasonably large packaged state: %lu\n",
+                (unsigned long)(qsb_get_length(qsb)));
+        migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
+        qemu_mutex_unlock_iothread();
+        qemu_fclose(fb);
+        return -1;
+    }
+    qemu_savevm_send_packaged(ms->file, qsb);
+    qemu_fclose(fb);
+
+    qemu_mutex_unlock_iothread();
+
+    DPRINTF("postcopy_start not finished sending ack\n");
+    qemu_savevm_send_reqack(ms->file, 4);
+
+    ret = qemu_file_get_error(ms->file);
+    if (ret) {
+        error_report("postcopy_start: Migration stream errored");
+        migrate_set_state(ms, MIG_STATE_POSTCOPY_ACTIVE, MIG_STATE_ERROR);
+    }
+
+    return ret;
+}
+
 /*
  * Master migration thread on the source VM.
  * It drives the migration and pumps the data down the outgoing channel.
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 29/43] Postcopy: Rework migration thread for postcopy mode
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (27 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 28/43] Postcopy: postcopy_start Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 30/43] mig fd_connect: open return path Dr. David Alan Gilbert (git)
                   ` (14 subsequent siblings)
  43 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Switch to postcopy if:
   1) There's still a significant amount to transfer
   2) Postcopy is enabled
   3) migrate_postcopy_start has been issued.

and change the cleanup at the end of migration to match.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration.c | 88 ++++++++++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 70 insertions(+), 18 deletions(-)

diff --git a/migration.c b/migration.c
index 4be9053..472fc4d 100644
--- a/migration.c
+++ b/migration.c
@@ -982,16 +982,36 @@ static int postcopy_start(MigrationState *ms)
 static void *migration_thread(void *opaque)
 {
     MigrationState *s = opaque;
+    /* Used by the bandwidth calcs, updated later */
     int64_t initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
     int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
     int64_t initial_bytes = 0;
     int64_t max_size = 0;
     int64_t start_time = initial_time;
+
     bool old_vm_running = false;
 
+    /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */
+    enum MigrationPhase current_active_type = MIG_STATE_ACTIVE;
+
     qemu_savevm_state_begin(s->file, &s->params);
 
+    if (migrate_postcopy_ram()) {
+        /* Now tell the dest that it should open it's end so it can reply */
+        qemu_savevm_send_openrp(s->file);
+
+        /* And ask it to send an ack that will make stuff easier to debug */
+        qemu_savevm_send_reqack(s->file, 1);
+
+        /* Tell the destination that we *might* want to do postcopy later;
+         * if the other end can't do postcopy it should fail now, nice and
+         * early.
+         */
+        qemu_savevm_send_postcopy_ram_advise(s->file);
+    }
+
     s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
+    current_active_type = MIG_STATE_ACTIVE;
     migrate_set_state(s, MIG_STATE_SETUP, MIG_STATE_ACTIVE);
 
     DPRINTF("setup complete\n");
@@ -1012,37 +1032,68 @@ static void *migration_thread(void *opaque)
                     " nonpost=%" PRIu64 ")\n",
                     pending_size, max_size, pend_post, pend_nonpost);
             if (pending_size && pending_size >= max_size) {
-                qemu_savevm_state_iterate(s->file);
-            } else {
-                int ret;
+                /* Still a significant amount to transfer */
+
+                current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+                if (migrate_postcopy_ram() &&
+                    s->state != MIG_STATE_POSTCOPY_ACTIVE &&
+                    pend_nonpost == 0 && s->start_postcopy) {
 
-                DPRINTF("done iterating\n");
-                qemu_mutex_lock_iothread();
-                start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
-                qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
-                old_vm_running = runstate_is_running();
+                    if (!postcopy_start(s)) {
+                        current_active_type = MIG_STATE_POSTCOPY_ACTIVE;
+                    }
 
-                ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
-                if (ret >= 0) {
-                    qemu_file_set_rate_limit(s->file, INT64_MAX);
-                    qemu_savevm_state_complete(s->file);
+                    continue;
+                } else {
+                    /* Just another iteration step */
+                    qemu_savevm_state_iterate(s->file);
                 }
-                qemu_mutex_unlock_iothread();
+            } else {
+                int ret;
 
-                if (ret < 0) {
-                    migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_ERROR);
-                    break;
+                DPRINTF("done iterating pending size %" PRIu64 "\n",
+                        pending_size);
+
+                if (s->state == MIG_STATE_ACTIVE) {
+                    qemu_mutex_lock_iothread();
+                    start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+                    qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
+                    old_vm_running = runstate_is_running();
+
+                    ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
+                    if (ret >= 0) {
+                        qemu_file_set_rate_limit(s->file, INT64_MAX);
+                        qemu_savevm_state_complete(s->file);
+                    }
+                    qemu_mutex_unlock_iothread();
+
+                    if (ret < 0) {
+                        migrate_set_state(s, current_active_type,
+                                          MIG_STATE_ERROR);
+                        break;
+                    }
+                } else {
+                    assert(s->state == MIG_STATE_POSTCOPY_ACTIVE);
+                    DPRINTF("postcopy end\n");
+
+                    qemu_savevm_state_postcopy_complete(s->file);
+                    DPRINTF("postcopy end after complete\n");
+
+                    await_outgoing_return_path_close(s);
+                    DPRINTF("postcopy end after rp close");
                 }
 
                 if (!qemu_file_get_error(s->file)) {
-                    migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_COMPLETED);
+                    migrate_set_state(s, current_active_type,
+                                      MIG_STATE_COMPLETED);
                     break;
                 }
             }
         }
 
         if (qemu_file_get_error(s->file)) {
-            migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_ERROR);
+            migrate_set_state(s, current_active_type, MIG_STATE_ERROR);
+            DPRINTF("migration_thread: file is in error state\n");
             break;
         }
         current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
@@ -1073,6 +1124,7 @@ static void *migration_thread(void *opaque)
         }
     }
 
+    DPRINTF("migration_thread: Hit error: case\n");
     qemu_mutex_lock_iothread();
     if (s->state == MIG_STATE_COMPLETED) {
         int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 30/43] mig fd_connect: open return path
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (28 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 29/43] Postcopy: Rework migration thread for postcopy mode Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 31/43] Postcopy: Create a fault handler thread before marking the ram as userfault Dr. David Alan Gilbert (git)
                   ` (13 subsequent siblings)
  43 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Open the return path before migration thread creation.
Since this can fail, guard the fd cleanup so it doesn't
try and destroy the potentially non-existent thread.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  3 +++
 migration.c                   | 18 +++++++++++++++++-
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 12a54e5..21aa8e3 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -122,6 +122,9 @@ struct MigrationState
 
     /* Flag set once the migration has been asked to enter postcopy */
     volatile bool start_postcopy;
+    /* Flag set once the migration thread is running (and needs joining) */
+    volatile bool started_migration_thread;
+
 };
 
 void process_incoming_migration(QEMUFile *f);
diff --git a/migration.c b/migration.c
index 472fc4d..9637941 100644
--- a/migration.c
+++ b/migration.c
@@ -470,7 +470,10 @@ static void migrate_fd_cleanup(void *opaque)
     if (s->file) {
         trace_migrate_fd_cleanup();
         qemu_mutex_unlock_iothread();
-        qemu_thread_join(&s->thread);
+        if (s->started_migration_thread) {
+            qemu_thread_join(&s->thread);
+            s->started_migration_thread = false;
+        }
         qemu_mutex_lock_iothread();
 
         qemu_fclose(s->file);
@@ -1162,6 +1165,19 @@ void migrate_fd_connect(MigrationState *s)
     /* Notify before starting migration thread */
     notifier_list_notify(&migration_state_notifiers, s);
 
+    /* Open the return path; currently for postcopy but other things might
+     * also want it.
+     */
+    if (migrate_postcopy_ram()) {
+        if (open_outgoing_return_path(s)) {
+            error_report("Unable to open return-path for postcopy");
+            migrate_set_state(s, MIG_STATE_SETUP, MIG_STATE_ERROR);
+            migrate_fd_cleanup(s);
+            return;
+        }
+    }
+
     qemu_thread_create(&s->thread, "migration", migration_thread, s,
                        QEMU_THREAD_JOINABLE);
+    s->started_migration_thread = true;
 }
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 31/43] Postcopy: Create a fault handler thread before marking the ram as userfault
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (29 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 30/43] mig fd_connect: open return path Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 32/43] Page request: Add MIG_RPCOMM_REQPAGES reverse command Dr. David Alan Gilbert (git)
                   ` (12 subsequent siblings)
  43 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  3 +++
 postcopy-ram.c                | 23 +++++++++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 21aa8e3..98ebe69 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -87,6 +87,9 @@ struct MigrationIncomingState {
         POSTCOPY_RAM_INCOMING_END
     } postcopy_ram_state;
 
+    QemuThread     fault_thread;
+    QemuSemaphore  fault_thread_sem;
+
     QEMUFile *return_path;
     QemuMutex      rp_mutex;    /* We send replies from multiple threads */
     PostcopyPMI    postcopy_pmi;
diff --git a/postcopy-ram.c b/postcopy-ram.c
index 5b52542..84b36d7 100644
--- a/postcopy-ram.c
+++ b/postcopy-ram.c
@@ -372,8 +372,31 @@ static int postcopy_ram_sensitise_area(const char *block_name, void *host_addr,
     return 0;
 }
 
+/*
+ * Handle faults detected by the USERFAULT markings
+ */
+static void *postcopy_ram_fault_thread(void *opaque)
+{
+    MigrationIncomingState *mis = (MigrationIncomingState *)opaque;
+
+    fprintf(stderr, "postcopy_ram_fault_thread\n");
+    /* TODO: In later patch */
+    qemu_sem_post(&mis->fault_thread_sem);
+    while (1) {
+        /* TODO: In later patch */
+    }
+
+    return NULL;
+}
+
 int postcopy_ram_enable_notify(MigrationIncomingState *mis)
 {
+    /* Create the fault handler thread and wait for it to be ready */
+    qemu_sem_init(&mis->fault_thread_sem, 0);
+    qemu_thread_create(&mis->fault_thread, "postcopy/fault",
+                       postcopy_ram_fault_thread, mis, QEMU_THREAD_JOINABLE);
+    qemu_sem_wait(&mis->fault_thread_sem);
+
     /* Mark so that we get notified of accesses to unwritten areas */
     if (qemu_ram_foreach_block(postcopy_ram_sensitise_area, NULL)) {
         return -1;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 32/43] Page request: Add MIG_RPCOMM_REQPAGES reverse command
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (30 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 31/43] Postcopy: Create a fault handler thread before marking the ram as userfault Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 33/43] Page request: Process incoming page request Dr. David Alan Gilbert (git)
                   ` (11 subsequent siblings)
  43 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add MIG_RPCOMM_REQPAGES command on Return path for the postcopy
destination to request a page from the source.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  3 ++
 migration.c                   | 77 +++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 78 insertions(+), 2 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 98ebe69..df5f0fb 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -45,6 +45,7 @@ enum mig_rpcomm_cmd {
     MIG_RPCOMM_INVALID = 0,  /* Must be 0 */
     MIG_RPCOMM_SHUT,         /* sibling will not send any more RP messages */
     MIG_RPCOMM_ACK,          /* data (seq: be32 ) */
+    MIG_RPCOMM_REQPAGES,     /* data (start: be64, len: be64) */
     MIG_RPCOMM_AFTERLASTVALID
 };
 
@@ -241,6 +242,8 @@ void migrate_send_rp_shut(MigrationIncomingState *mis,
                           uint32_t value);
 void migrate_send_rp_ack(MigrationIncomingState *mis,
                          uint32_t value);
+void migrate_send_rp_reqpages(MigrationIncomingState *mis, const char* rbname,
+                              ram_addr_t start, ram_addr_t len);
 
 
 void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
diff --git a/migration.c b/migration.c
index 9637941..8dfae51 100644
--- a/migration.c
+++ b/migration.c
@@ -144,6 +144,38 @@ void migrate_send_rp_ack(MigrationIncomingState *mis,
     migrate_send_rp_message(mis, MIG_RPCOMM_ACK, 4, (uint8_t *)&buf);
 }
 
+/* Request a range of pages from the source VM at the given
+ * start address.
+ *   rbname: Name of the RAMBlock to request the page in, if NULL it's the same
+ *           as the last request (a name must have been given previously)
+ *   Start: Address offset within the RB
+ *   Len: Length in bytes required - must be a multiple of pagesize
+ */
+void migrate_send_rp_reqpages(MigrationIncomingState *mis, const char *rbname,
+                              ram_addr_t start, ram_addr_t len)
+{
+    uint8_t bufc[16+1+255]; /* start (8 byte), len (8 byte), rbname upto 256 */
+    uint64_t *buf64 = (uint64_t *)bufc;
+    size_t msglen = 16; /* start + len */
+
+    assert(!(len & 1));
+    if (rbname) {
+        int rbname_len = strlen(rbname);
+        assert(rbname_len < 256);
+
+        len |= 1; /* Flag to say we've got a name */
+        bufc[msglen++] = rbname_len;
+        memcpy(bufc + msglen, rbname, rbname_len);
+        msglen += rbname_len;
+    }
+
+    buf64[0] = (uint64_t)start;
+    buf64[0] = cpu_to_be64(buf64[0]);
+    buf64[1] = (uint64_t)len;
+    buf64[1] = cpu_to_be64(buf64[1]);
+    migrate_send_rp_message(mis, MIG_RPCOMM_REQPAGES, msglen, bufc);
+}
+
 void qemu_start_incoming_migration(const char *uri, Error **errp)
 {
     const char *p;
@@ -774,6 +806,17 @@ static void source_return_path_bad(MigrationState *s)
 }
 
 /*
+ * Process a request for pages received on the return path,
+ * We're allowed to send more than requested (e.g. to round to our page size)
+ * and we don't need to send pages that have already been sent.
+ */
+static void migrate_handle_rp_reqpages(MigrationState *ms, const char* rbname,
+                                       ram_addr_t start, ram_addr_t len)
+{
+    DPRINTF("migrate_handle_rp_reqpages: at %zx for len %zx", start, len);
+}
+
+/*
  * Handles messages sent on the return path towards the source VM
  *
  * This is a coroutine that sits around listening for messages as
@@ -783,9 +826,12 @@ static void source_return_path_co(void *opaque)
 {
     MigrationState *ms = opaque;
     QEMUFile *rp = ms->return_path;
+    uint16_t expected_len;
     const int max_len = 512;
     uint8_t buf[max_len];
     uint32_t tmp32;
+    uint64_t tmp64a, tmp64b;
+    char *tmpstr;
     int res;
 
     DPRINTF("RP: source_return_path_co entry");
@@ -794,14 +840,17 @@ static void source_return_path_co(void *opaque)
         ms->rp_state.header_com = qemu_get_be16(rp);
         ms->rp_state.header_len = qemu_get_be16(rp);
 
-        uint16_t expected_len;
-
         switch (ms->rp_state.header_com) {
         case MIG_RPCOMM_SHUT:
         case MIG_RPCOMM_ACK:
             expected_len = 4;
             break;
 
+        case MIG_RPCOMM_REQPAGES:
+            /* 16 byte start/len _possibly_ plus an id str */
+            expected_len = 16 + 256;
+            break;
+
         default:
             error_report("RP: Received invalid cmd 0x%04x length 0x%04x",
                     ms->rp_state.header_com, ms->rp_state.header_len);
@@ -849,6 +898,30 @@ static void source_return_path_co(void *opaque)
             atomic_xchg(&ms->rp_state.latest_ack, tmp32);
             break;
 
+        case MIG_RPCOMM_REQPAGES:
+            tmp64a = be64_to_cpup((uint64_t *)buf);  /* Start */
+            tmp64b = be64_to_cpup(((uint64_t *)buf)+1); /* Len */
+            tmpstr = NULL;
+            if (tmp64b & 1) {
+                tmp64b -= 1; /* Remove the flag */
+                /* Now we expect an idstr */
+                tmp32 = buf[16]; /* Length of the following idstr */
+                tmpstr = (char *)&buf[17];
+                buf[17+tmp32] = '\0';
+                expected_len = 16+1+tmp32;
+            } else {
+                expected_len = 16;
+            }
+            if (ms->rp_state.header_len != expected_len) {
+                error_report("RP: Received ReqPage with length %d expecting %d",
+                        ms->rp_state.header_len, expected_len);
+                source_return_path_bad(ms);
+            }
+            migrate_handle_rp_reqpages(ms, tmpstr,
+                                          (ram_addr_t)tmp64a,
+                                          (ram_addr_t)tmp64b);
+            break;
+
         default:
             /* This shouldn't happen because we should catch this above */
             DPRINTF("RP: Bad header_com in dispatch");
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 33/43] Page request: Process incoming page request
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (31 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 32/43] Page request: Add MIG_RPCOMM_REQPAGES reverse command Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 34/43] Page request: Consume pages off the post-copy queue Dr. David Alan Gilbert (git)
                   ` (10 subsequent siblings)
  43 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

On receiving MIG_RPCOMM_REQPAGES look up the address and
queue the page.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c                   | 52 +++++++++++++++++++++++++++++++++++++++++++
 include/migration/migration.h | 27 ++++++++++++++++++++++
 include/qemu/typedefs.h       |  3 ++-
 migration.c                   | 34 +++++++++++++++++++++++++++-
 4 files changed, 114 insertions(+), 2 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index fd7399c..cc4acea 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -658,6 +658,58 @@ static int ram_save_page(QEMUFile *f, RAMBlock* block, ram_addr_t offset,
 }
 
 /*
+ * Queue the pages for transmission, e.g. a request from postcopy destination
+ *   ms: MigrationStatus in which the queue is held
+ *   rbname: The RAMBlock the request is for - may be NULL (to mean reuse last)
+ *   start: Offset from the start of the RAMBlock
+ *   len: Length (in bytes) to send
+ *   Return: 0 on success
+ */
+int ram_save_queue_pages(MigrationState *ms, const char *rbname,
+                         ram_addr_t start, ram_addr_t len)
+{
+    RAMBlock *ramblock;
+
+    if (!rbname) {
+        /* Reuse last RAMBlock */
+        ramblock = ms->last_req_rb;
+
+        if (!ramblock) {
+            error_report("ram_save_queue_pages no previous block");
+            return -1;
+        }
+    } else {
+        ramblock = ram_find_block(rbname);
+
+        if (!ramblock) {
+            error_report("ram_save_queue_pages no block '%s'", rbname);
+            return -1;
+        }
+    }
+    DPRINTF("ram_save_queue_pages: Block %s start %zx len %zx",
+                    ramblock->idstr, start, len);
+
+    if (start+len > ramblock->length) {
+        error_report("%s request overrun start=%zx len=%zx blocklen=%zx",
+                     __func__, start, len, ramblock->length);
+        return -1;
+    }
+
+    struct MigrationSrcPageRequest *new_entry =
+        g_malloc0(sizeof(struct MigrationSrcPageRequest));
+    new_entry->rb = ramblock;
+    new_entry->offset = start;
+    new_entry->len = len;
+    ms->last_req_rb = ramblock;
+
+    qemu_mutex_lock(&ms->src_page_req_mutex);
+    QSIMPLEQ_INSERT_TAIL(&ms->src_page_requests, new_entry, next_req);
+    qemu_mutex_unlock(&ms->src_page_req_mutex);
+
+    return 0;
+}
+
+/*
  * ram_find_and_save_block: Finds a page to send and sends it to f
  *
  * Returns:  The number of bytes written.
diff --git a/include/migration/migration.h b/include/migration/migration.h
index df5f0fb..16f9db1 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -100,6 +100,18 @@ MigrationIncomingState *migration_incoming_get_current(void);
 MigrationIncomingState *migration_incoming_state_init(QEMUFile *f);
 void migration_incoming_state_destroy(void);
 
+/*
+ * An outstanding page request, on the source, having been received
+ * and queued
+ */
+struct MigrationSrcPageRequest {
+    RAMBlock *rb;
+    hwaddr    offset;
+    hwaddr    len;
+
+    QSIMPLEQ_ENTRY(MigrationSrcPageRequest) next_req;
+};
+
 struct MigrationState
 {
     int64_t bandwidth_limit;
@@ -129,6 +141,18 @@ struct MigrationState
     /* Flag set once the migration thread is running (and needs joining) */
     volatile bool started_migration_thread;
 
+    /* bitmap of pages that have been sent at least once
+     * only maintained and used in postcopy at the moment
+     *    Where it's used to send the dirtymap at the start
+     *    of the postcopy phase, then cleared
+     */
+    unsigned long *sentmap;
+
+    /* Queue of outstanding page requests from the destination */
+    QemuMutex src_page_req_mutex;
+    QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest) src_page_requests;
+    /* The RAMBlock used in the last src_page_request */
+    RAMBlock *last_req_rb;
 };
 
 void process_incoming_migration(QEMUFile *f);
@@ -264,4 +288,7 @@ size_t ram_control_save_page(QEMUFile *f, ram_addr_t block_offset,
                              ram_addr_t offset, size_t size,
                              int *bytes_sent);
 
+int ram_save_queue_pages(MigrationState *ms, const char *rbname,
+                         ram_addr_t start, ram_addr_t len);
+
 #endif
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 61b330c..d57acc5 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -8,6 +8,7 @@ typedef struct QEMUTimerListGroup QEMUTimerListGroup;
 typedef struct QEMUFile QEMUFile;
 typedef struct QEMUBH QEMUBH;
 
+typedef struct AdapterInfo AdapterInfo;
 typedef struct AioContext AioContext;
 
 typedef struct Visitor Visitor;
@@ -79,6 +80,6 @@ typedef struct FWCfgState FWCfgState;
 typedef struct PcGuestInfo PcGuestInfo;
 typedef struct PostcopyPMI PostcopyPMI;
 typedef struct Range Range;
-typedef struct AdapterInfo AdapterInfo;
+typedef struct RAMBlock RAMBlock;
 
 #endif /* QEMU_TYPEDEFS_H */
diff --git a/migration.c b/migration.c
index 8dfae51..44eb0e2 100644
--- a/migration.c
+++ b/migration.c
@@ -26,6 +26,8 @@
 #include "qemu/thread.h"
 #include "qmp-commands.h"
 #include "trace.h"
+#include "exec/memory.h"
+#include "exec/address-spaces.h"
 
 //#define DEBUG_MIGRATION
 
@@ -499,6 +501,15 @@ static void migrate_fd_cleanup(void *opaque)
 
     migrate_fd_cleanup_src_rp(s);
 
+    /* This queue generally should be empty - but in the case of a failed
+     * migration might have some droppings in.
+     */
+    struct MigrationSrcPageRequest *mspr, *next_mspr;
+    QSIMPLEQ_FOREACH_SAFE(mspr, &s->src_page_requests, next_req, next_mspr) {
+        QSIMPLEQ_REMOVE_HEAD(&s->src_page_requests, next_req);
+        g_free(mspr);
+    }
+
     if (s->file) {
         trace_migrate_fd_cleanup();
         qemu_mutex_unlock_iothread();
@@ -600,6 +611,9 @@ MigrationState *migrate_init(const MigrationParams *params)
     s->state = MIG_STATE_SETUP;
     trace_migrate_set_state(MIG_STATE_SETUP);
 
+    qemu_mutex_init(&s->src_page_req_mutex);
+    QSIMPLEQ_INIT(&s->src_page_requests);
+
     s->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
     return s;
 }
@@ -813,7 +827,25 @@ static void source_return_path_bad(MigrationState *s)
 static void migrate_handle_rp_reqpages(MigrationState *ms, const char* rbname,
                                        ram_addr_t start, ram_addr_t len)
 {
-    DPRINTF("migrate_handle_rp_reqpages: at %zx for len %zx", start, len);
+    DPRINTF("migrate_handle_rp_reqpages: in %s start %zx len %zx",
+            rbname, start, len);
+
+    /* Round everything up to our host page size */
+    long our_host_ps = sysconf(_SC_PAGESIZE);
+    if (start & (our_host_ps-1)) {
+        long roundings = start & (our_host_ps-1);
+        start -= roundings;
+        len += roundings;
+    }
+    if (len & (our_host_ps-1)) {
+        long roundings = len & (our_host_ps-1);
+        len -= roundings;
+        len += our_host_ps;
+    }
+
+    if (ram_save_queue_pages(ms, rbname, start, len)) {
+        source_return_path_bad(ms);
+    }
 }
 
 /*
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 34/43] Page request: Consume pages off the post-copy queue
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (32 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 33/43] Page request: Process incoming page request Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 35/43] Add assertion to check migration_dirty_pages Dr. David Alan Gilbert (git)
                   ` (9 subsequent siblings)
  43 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

When transmitting RAM pages, consume pages that have been queued by
MIG_RPCOMM_REQPAGE commands and send them ahead of normal page scanning.

Note:
  a) After a queued page the linear walk carries on from after the
unqueued page; there is a reasonable chance that the destination
was about to ask for other closeby pages anyway.

  b) We have to be careful of any assumptions that the page walking
code makes, in particular it does some short cuts on its first linear
walk that break as soon as we do a queued page.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c | 130 +++++++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 106 insertions(+), 24 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index cc4acea..c006d21 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -458,6 +458,19 @@ static inline bool migration_bitmap_set_dirty(ram_addr_t addr)
     return ret;
 }
 
+static inline bool migration_bitmap_clear_dirty(ram_addr_t addr)
+{
+    bool ret;
+    int nr = addr >> TARGET_PAGE_BITS;
+
+    ret = test_and_clear_bit(nr, migration_bitmap);
+
+    if (ret) {
+        migration_dirty_pages--;
+    }
+    return ret;
+}
+
 static void migration_bitmap_sync_range(ram_addr_t start, ram_addr_t length)
 {
     ram_addr_t addr;
@@ -658,6 +671,39 @@ static int ram_save_page(QEMUFile *f, RAMBlock* block, ram_addr_t offset,
 }
 
 /*
+ * Unqueue a page from the queue fed by postcopy page requests
+ *
+ * Returns:   The RAMBlock* to transmit from (or NULL if the queue is empty)
+ *      ms:   MigrationState in
+ *  offset:   the byte offset within the RAMBlock for the start of the page
+ * bitoffset: global offset in the dirty/sent bitmaps
+ */
+static RAMBlock *ram_save_unqueue_page(MigrationState *ms, ram_addr_t *offset,
+                                       unsigned long *bitoffset)
+{
+    RAMBlock *result = NULL;
+    qemu_mutex_lock(&ms->src_page_req_mutex);
+    if (!QSIMPLEQ_EMPTY(&ms->src_page_requests)) {
+        struct MigrationSrcPageRequest *entry =
+                                    QSIMPLEQ_FIRST(&ms->src_page_requests);
+        result = entry->rb;
+        *offset = entry->offset;
+        *bitoffset = (entry->offset + entry->rb->offset) >> TARGET_PAGE_BITS;
+
+        if (entry->len > TARGET_PAGE_SIZE) {
+            entry->len -= TARGET_PAGE_SIZE;
+            entry->offset += TARGET_PAGE_SIZE;
+        } else {
+            QSIMPLEQ_REMOVE_HEAD(&ms->src_page_requests, next_req);
+            g_free(entry);
+        }
+    }
+    qemu_mutex_unlock(&ms->src_page_req_mutex);
+
+    return result;
+}
+
+/*
  * Queue the pages for transmission, e.g. a request from postcopy destination
  *   ms: MigrationStatus in which the queue is held
  *   rbname: The RAMBlock the request is for - may be NULL (to mean reuse last)
@@ -718,44 +764,80 @@ int ram_save_queue_pages(MigrationState *ms, const char *rbname,
 
 static int ram_find_and_save_block(QEMUFile *f, bool last_stage)
 {
+    MigrationState *ms = migrate_get_current();
     RAMBlock *block = last_seen_block;
+    RAMBlock *tmpblock;
     ram_addr_t offset = last_offset;
+    ram_addr_t tmpoffset;
     bool complete_round = false;
     int bytes_sent = 0;
-    MemoryRegion *mr;
     unsigned long bitoffset;
 
     if (!block)
         block = QTAILQ_FIRST(&ram_list.blocks);
 
-    while (true) {
-        mr = block->mr;
-        offset = migration_bitmap_find_and_reset_dirty(mr, offset, &bitoffset);
-        if (complete_round && block == last_seen_block &&
-            offset >= last_offset) {
-            break;
-        }
-        if (offset >= block->length) {
-            offset = 0;
-            block = QTAILQ_NEXT(block, next);
-            if (!block) {
-                block = QTAILQ_FIRST(&ram_list.blocks);
-                complete_round = true;
-                ram_bulk_stage = false;
+    while (true) { /* Until we send a block or run out of stuff to send */
+        tmpblock = ram_save_unqueue_page(ms, &tmpoffset, &bitoffset);
+        if (tmpblock) {
+            /* We've got a block from the postcopy queue */
+            DPRINTF("%s: Got postcopy item '%s' offset=%zx bitoffset=%zx",
+                    __func__, tmpblock->idstr, tmpoffset, bitoffset);
+            /* We're sending this page, and since it's postcopy nothing else
+             * will dirty it, and we must make sure it doesn't get sent again.
+             */
+            if (!migration_bitmap_clear_dirty(bitoffset << TARGET_PAGE_BITS)) {
+                DPRINTF("%s: Not dirty for postcopy %s/%zx bito=%zx (sent=%d)",
+                        __func__, tmpblock->idstr, tmpoffset, bitoffset,
+                        test_bit(bitoffset, ms->sentmap));
+                continue;
             }
+            /*
+             * As soon as we start servicing pages out of order, then we have
+             * to kill the bulk stage, since the bulk stage assumes
+             * in (migration_bitmap_find_and_reset_dirty) that every page is
+             * dirty, that's no longer true.
+             */
+            ram_bulk_stage = false;
+            /*
+             * We mustn't change block/offset unless it's to a valid one
+             * otherwise we can go down some of the exit cases in the normal
+             * path.
+             */
+            block = tmpblock;
+            offset = tmpoffset;
         } else {
-            bytes_sent = ram_save_page(f, block, offset, last_stage);
-
-            /* if page is unmodified, continue to the next */
-            if (bytes_sent > 0) {
-                MigrationState *s = migrate_get_current();
-                if (s->sentmap) {
-                    set_bit(bitoffset, s->sentmap);
+            MemoryRegion *mr;
+            /* priority queue empty, so just search for something dirty */
+            mr = block->mr;
+            offset = migration_bitmap_find_and_reset_dirty(mr, offset,
+                                                           &bitoffset);
+            if (complete_round && block == last_seen_block &&
+                offset >= last_offset) {
+                break;
+            }
+            if (offset >= block->length) {
+                offset = 0;
+                block = QTAILQ_NEXT(block, next);
+                if (!block) {
+                    block = QTAILQ_FIRST(&ram_list.blocks);
+                    complete_round = true;
+                    ram_bulk_stage = false;
                 }
+                continue; /* pick an offset in the new block */
+            }
+        }
 
-                last_sent_block = block;
-                break;
+        /* We have a page to send, so send it */
+        bytes_sent = ram_save_page(f, block, offset, last_stage);
+
+        /* if page is unmodified, continue to the next */
+        if (bytes_sent > 0) {
+            if (ms->sentmap) {
+                set_bit(bitoffset, ms->sentmap);
             }
+
+            last_sent_block = block;
+            break;
         }
     }
     last_seen_block = block;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 35/43] Add assertion to check migration_dirty_pages
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (33 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 34/43] Page request: Consume pages off the post-copy queue Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 36/43] postcopy_ram.c: place_page and helpers Dr. David Alan Gilbert (git)
                   ` (8 subsequent siblings)
  43 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

I've seen it go negative once during dev, it shouldn't
happen.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch_init.c b/arch_init.c
index c006d21..58eccc1 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -439,6 +439,7 @@ ram_addr_t migration_bitmap_find_and_reset_dirty(MemoryRegion *mr,
 
     if (next < size) {
         clear_bit(next, migration_bitmap);
+        assert(migration_dirty_pages > 0);
         migration_dirty_pages--;
     }
     *bitoffset = next;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 36/43] postcopy_ram.c: place_page and helpers
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (34 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 35/43] Add assertion to check migration_dirty_pages Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 37/43] Postcopy: Use helpers to map pages during migration Dr. David Alan Gilbert (git)
                   ` (7 subsequent siblings)
  43 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

postcopy_place_page (etc) provide a way for postcopy to place a page
into guests memory atomically (using the new remap_anon_pages syscall).

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h    |   1 +
 include/migration/postcopy-ram.h |  23 +++++++++
 postcopy-ram.c                   | 105 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 129 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 16f9db1..59d48bc 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -94,6 +94,7 @@ struct MigrationIncomingState {
     QEMUFile *return_path;
     QemuMutex      rp_mutex;    /* We send replies from multiple threads */
     PostcopyPMI    postcopy_pmi;
+    void          *postcopy_tmp_page;
 };
 
 MigrationIncomingState *migration_incoming_get_current(void);
diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index fc23558..eceea95 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -59,6 +59,29 @@ int postcopy_send_discard_bm_ram(MigrationState *ms, const char *name,
 void postcopy_hook_early_receive(MigrationIncomingState *mis,
                                  size_t bitmap_index);
 
+/*
+ * Place a zero'd page of memory at *host
+ * returns 0 on success
+ */
+int postcopy_place_zero_page(MigrationIncomingState *mis, void *host,
+                             long bitmap_offset);
+
+/*
+ * Place a page (from) at (host) efficiently
+ *    There are restrictions on how 'from' must be mapped, in general best
+ *    to use other postcopy_ routines to allocate.
+ * returns 0 on success
+ */
+int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
+                        long bitmap_offset);
+
+/*
+ * Allocate a page of memory that can be mapped at a later point in time
+ * using postcopy_place_page
+ * Returns: Pointer to allocated page
+ */
+void *postcopy_get_tmp_page(MigrationIncomingState *mis);
+
 void postcopy_pmi_destroy(MigrationIncomingState *mis);
 void postcopy_pmi_discard_range(MigrationIncomingState *mis,
                                 size_t start, size_t npages);
diff --git a/postcopy-ram.c b/postcopy-ram.c
index 84b36d7..b37af47 100644
--- a/postcopy-ram.c
+++ b/postcopy-ram.c
@@ -349,6 +349,10 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
         return -1;
     }
 
+    if (mis->postcopy_tmp_page) {
+        munmap(mis->postcopy_tmp_page, getpagesize());
+        mis->postcopy_tmp_page = NULL;
+    }
     return 0;
 }
 
@@ -405,6 +409,88 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
     return 0;
 }
 
+/*
+ * Place a zero'd page of memory at *host
+ * returns 0 on success
+ * bitmap_offset: Index into the migration bitmaps
+ */
+int postcopy_place_zero_page(MigrationIncomingState *mis, void *host,
+                             long bitmap_offset)
+{
+    void *tmp = postcopy_get_tmp_page(mis);
+    if (!tmp) {
+        return -ENOMEM;
+    }
+    *(char *)tmp = 0;
+    return postcopy_place_page(mis, host, tmp, bitmap_offset);
+}
+
+/*
+ * Place a page (from) at (host) efficiently
+ *    There are restrictions on how 'from' must be mapped, in general best
+ *    to use other postcopy_ routines to allocate.
+ * returns 0 on success
+ * bitmap_offset: Index into the migration bitmaps
+ */
+int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
+                        long bitmap_offset)
+{
+    PostcopyPMIState old_state, tmp_state;
+
+    if (syscall(__NR_remap_anon_pages, host, from, getpagesize(), 0) !=
+            getpagesize()) {
+        perror("remap_anon_pages in postcopy_place_page");
+        fprintf(stderr, "host: %p from: %p pmi=%d\n", host, from,
+                postcopy_pmi_get_state(mis, bitmap_offset));
+
+        return -errno;
+    }
+
+    tmp_state = postcopy_pmi_get_state(mis, bitmap_offset);
+    do {
+        old_state = tmp_state;
+        tmp_state = postcopy_pmi_change_state(mis, bitmap_offset, old_state,
+                                              POSTCOPY_PMI_RECEIVED);
+
+    } while (old_state != tmp_state);
+
+
+    if (old_state == POSTCOPY_PMI_REQUESTED) {
+        /* TODO: Notify kernel */
+    }
+
+    /* TODO: hostpagesize!=targetpagesize case */
+    return 0;
+}
+
+/*
+ * Returns a page of memory that can be mapped at a later point in time
+ * using postcopy_place_page
+ * The same address is used repeatedly, postcopy_place_page just takes the
+ * backing page away.
+ * Returns: Pointer to allocated page
+ */
+void *postcopy_get_tmp_page(MigrationIncomingState *mis)
+{
+
+    if (!mis->postcopy_tmp_page) {
+        mis->postcopy_tmp_page = mmap(NULL, getpagesize(),
+                             PROT_READ | PROT_WRITE, MAP_PRIVATE |
+                             MAP_ANONYMOUS, -1, 0);
+        if (!mis->postcopy_tmp_page) {
+            perror("mapping postcopy tmp page");
+            return NULL;
+        }
+        if (madvise(mis->postcopy_tmp_page, getpagesize(), MADV_DONTFORK)) {
+            munmap(mis->postcopy_tmp_page, getpagesize());
+            perror("postcpy tmp page DONTFORK");
+            return NULL;
+        }
+    }
+
+    return mis->postcopy_tmp_page;
+}
+
 #else
 /* No target OS support, stubs just fail */
 int postcopy_ram_hosttest(void)
@@ -444,6 +530,25 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
     fprintf(stderr, "postcopy_ram_enable_notify: No OS support\n");
     return -1;
 }
+
+int postcopy_place_zero_page(MigrationIncomingState *mis, void *host)
+{
+    error_report("postcopy_place_zero_page: No OS support");
+    return -1;
+}
+
+int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from)
+{
+    error_report("postcopy_place_page: No OS support");
+    return -1;
+}
+
+void *postcopy_get_tmp_page(MigrationIncomingState *mis)
+{
+    error_report("postcopy_get_tmp_page: No OS support");
+    return -1;
+}
+
 #endif
 
 /* ------------------------------------------------------------------------- */
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 37/43] Postcopy: Use helpers to map pages during migration
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (35 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 36/43] postcopy_ram.c: place_page and helpers Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 38/43] qemu_ram_block_from_host Dr. David Alan Gilbert (git)
                   ` (6 subsequent siblings)
  43 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

In postcopy, the destination guest is running at the same time
as it's receiving pages; as we receive new pages we must put
them into the guests address space atomically to avoid a running
CPU accessing a partially written page.

Use the helpers in postcopy-ram.c to map these pages.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c | 93 +++++++++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 84 insertions(+), 9 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 58eccc1..35afba1 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -1328,9 +1328,20 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
     return 0;
 }
 
+/*
+ * Read a RAMBlock ID from the stream f, find the host address of the
+ * start of that block and add on 'offset'
+ *
+ * f: Stream to read from
+ * mis: MigrationIncomingState
+ * offset: Offset within the block
+ * flags: Page flags (mostly to see if it's a continuation of previous block)
+ * rb: Pointer to RAMBlock* that gets filled in with the RB we find
+ */
 static inline void *host_from_stream_offset(QEMUFile *f,
+                                            MigrationIncomingState *mis,
                                             ram_addr_t offset,
-                                            int flags)
+                                            int flags, RAMBlock **rb)
 {
     static RAMBlock *block = NULL;
     char id[256];
@@ -1341,8 +1352,11 @@ static inline void *host_from_stream_offset(QEMUFile *f,
             error_report("Ack, bad migration stream!");
             return NULL;
         }
+        if (rb) {
+            *rb = block;
+        }
 
-        return memory_region_get_ram_ptr(block->mr) + offset;
+        goto gotit;
     }
 
     len = qemu_get_byte(f);
@@ -1350,12 +1364,22 @@ static inline void *host_from_stream_offset(QEMUFile *f,
     id[len] = 0;
 
     QTAILQ_FOREACH(block, &ram_list.blocks, next) {
-        if (!strncmp(id, block->idstr, sizeof(id)))
-            return memory_region_get_ram_ptr(block->mr) + offset;
+        if (!strncmp(id, block->idstr, sizeof(id))) {
+            if (rb) {
+                *rb = block;
+            }
+            goto gotit;
+        }
     }
 
     error_report("Can't find block %s!", id);
     return NULL;
+
+gotit:
+    postcopy_hook_early_receive(mis,
+        (offset + (*rb)->offset) >> TARGET_PAGE_BITS);
+    return memory_region_get_ram_ptr(block->mr) + offset;
+
 }
 
 /*
@@ -1385,6 +1409,13 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
     ram_addr_t addr;
     int flags, ret = 0;
     static uint64_t seq_iter;
+    /*
+     * System is running in postcopy mode, page inserts to host memory must be
+     * atomic
+     */
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    bool postcopy_running = mis->postcopy_ram_state >=
+                            POSTCOPY_RAM_INCOMING_LISTENING;
 
     seq_iter++;
 
@@ -1439,8 +1470,9 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
         } else if (flags & RAM_SAVE_FLAG_COMPRESS) {
             void *host;
             uint8_t ch;
+            RAMBlock *rb;
 
-            host = host_from_stream_offset(f, addr, flags);
+            host = host_from_stream_offset(f, mis, addr, flags, &rb);
             if (!host) {
                 error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
                 ret = -EINVAL;
@@ -1448,20 +1480,63 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
             }
 
             ch = qemu_get_byte(f);
-            ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
+            if (!postcopy_running) {
+                ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
+            } else {
+                if (!ch) {
+                    ret = postcopy_place_zero_page(mis, host,
+                              (addr + rb->offset) >> TARGET_PAGE_BITS);
+                } else {
+                    void *tmp;
+                    tmp = postcopy_get_tmp_page(mis);
+                    if (!tmp) {
+                        return -ENOMEM;
+                    }
+                    memset(tmp, ch, TARGET_PAGE_SIZE);
+                    ret = postcopy_place_page(mis, host, tmp,
+                              (addr + rb->offset) >> TARGET_PAGE_BITS);
+                }
+                if (ret) {
+                    error_report("ram_load: Failure in postcopy compress @"
+                                 "%zx/%p;%s+%zx",
+                                 addr, host, rb->idstr, rb->offset);
+                    return ret;
+                }
+            }
         } else if (flags & RAM_SAVE_FLAG_PAGE) {
             void *host;
+            RAMBlock *rb;
 
-            host = host_from_stream_offset(f, addr, flags);
+            host = host_from_stream_offset(f, mis, addr, flags, &rb);
             if (!host) {
                 error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
                 ret = -EINVAL;
                 break;
             }
 
-            qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
+            if (!postcopy_running) {
+                qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
+            } else {
+                void *tmp = postcopy_get_tmp_page(mis);
+                if (!tmp) {
+                    return -ENOMEM;
+                }
+                qemu_get_buffer(f, tmp, TARGET_PAGE_SIZE);
+                ret = postcopy_place_page(mis, host, tmp,
+                          (addr + rb->offset) >> TARGET_PAGE_BITS);
+                if (ret) {
+                    error_report("ram_load: Failure in postcopy simple"
+                                 "@%zx/%p;%s+%zx",
+                                 addr, host, rb->idstr, rb->offset);
+                    return ret;
+                }
+            }
         } else if (flags & RAM_SAVE_FLAG_XBZRLE) {
-            void *host = host_from_stream_offset(f, addr, flags);
+            if (postcopy_running) {
+                error_report("XBZRLE RAM block in postcopy mode @%zx\n", addr);
+                return -EINVAL;
+            }
+            void *host = host_from_stream_offset(f, mis, addr, flags, NULL);
             if (!host) {
                 error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
                 ret = -EINVAL;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 38/43] qemu_ram_block_from_host
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (36 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 37/43] Postcopy: Use helpers to map pages during migration Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 39/43] Postcopy; Handle userfault requests Dr. David Alan Gilbert (git)
                   ` (5 subsequent siblings)
  43 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Postcopy sends RAMBlock names and offsets over the wire (since it can't
rely on the order of ramaddr being the same), and it starts out with
HVA fault addresses from the kernel.

qemu_ram_block_from_host translates a HVA into a RAMBlock, an offset
in the RAMBlock, the global ram_addr_t value and it's bitmap position.

Rewrite qemu_ram_addr_from_host to use qemu_ram_block_from_host.

Provide qemu_ram_get_idstr since it's the actual name text sent on the
wire.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 exec.c                    | 56 ++++++++++++++++++++++++++++++++++++++++++-----
 include/exec/cpu-common.h |  4 ++++
 2 files changed, 55 insertions(+), 5 deletions(-)

diff --git a/exec.c b/exec.c
index 729c12c..9bcb25b 100644
--- a/exec.c
+++ b/exec.c
@@ -1176,6 +1176,11 @@ static RAMBlock *find_ram_block(ram_addr_t addr)
     return NULL;
 }
 
+const char *qemu_ram_get_idstr(RAMBlock *rb)
+{
+    return rb->idstr;
+}
+
 void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev)
 {
     RAMBlock *new_block = find_ram_block(addr);
@@ -1515,16 +1520,35 @@ static void *qemu_ram_ptr_length(ram_addr_t addr, hwaddr *size)
     }
 }
 
-/* Some of the softmmu routines need to translate from a host pointer
-   (typically a TLB entry) back to a ram offset.  */
-MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
+/*
+ * Translates a host ptr back to a RAMBlock, a ram_addr and an offset
+ * in that RAMBlock.
+ *
+ * ptr: Host pointer to look up
+ * round_offset: If true round the result offset down to a page boundary
+ * *ram_addr: set to result ram_addr
+ * *offset: set to result offset within the RAMBlock
+ * *bm_index: bitmap index (i.e. scaled ram_addr for use where the scale
+ *                          isn't available)
+ *
+ * Returns: RAMBlock (or NULL if not found)
+ */
+RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
+                                   ram_addr_t *ram_addr,
+                                   ram_addr_t *offset,
+                                   unsigned long *bm_index)
 {
     RAMBlock *block;
     uint8_t *host = ptr;
 
     if (xen_enabled()) {
         *ram_addr = xen_ram_addr_from_mapcache(ptr);
-        return qemu_get_ram_block(*ram_addr)->mr;
+        block = qemu_get_ram_block(*ram_addr);
+        if (!block) {
+            return NULL;
+        }
+        *offset = (host - block->host);
+        return block;
     }
 
     block = ram_list.mru_block;
@@ -1545,7 +1569,29 @@ MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
     return NULL;
 
 found:
-    *ram_addr = block->offset + (host - block->host);
+    *offset = (host - block->host);
+    if (round_offset) {
+        *offset &= TARGET_PAGE_MASK;
+    }
+    *ram_addr = block->offset + *offset;
+    *bm_index = *ram_addr >> TARGET_PAGE_BITS;
+    return block;
+}
+
+/* Some of the softmmu routines need to translate from a host pointer
+   (typically a TLB entry) back to a ram offset.  */
+MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
+{
+    RAMBlock *block;
+    ram_addr_t offset; /* Not used */
+    unsigned long index; /* Not used */
+
+    block = qemu_ram_block_from_host(ptr, false, ram_addr, &offset, &index);
+
+    if (!block) {
+        return NULL;
+    }
+
     return block->mr;
 }
 
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 8042f50..ae25407 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -55,8 +55,12 @@ typedef uint32_t CPUReadMemoryFunc(void *opaque, hwaddr addr);
 void qemu_ram_remap(ram_addr_t addr, ram_addr_t length);
 /* This should not be used by devices.  */
 MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr);
+RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
+                                   ram_addr_t *ram_addr, ram_addr_t *offset,
+                                   unsigned long *bm_index);
 void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev);
 void qemu_ram_unset_idstr(ram_addr_t addr);
+const char *qemu_ram_get_idstr(RAMBlock *rb);
 
 void cpu_physical_memory_rw(hwaddr addr, uint8_t *buf,
                             int len, int is_write);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 39/43] Postcopy; Handle userfault requests
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (37 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 38/43] qemu_ram_block_from_host Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 40/43] Start up a postcopy/listener thread ready for incoming page data Dr. David Alan Gilbert (git)
                   ` (4 subsequent siblings)
  43 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

userfaultfd is a Linux syscall that gives an fd that receives a stream
of notifications of accesses to pages marked as MADV_USERFAULT, and
allows the program to acknowledge those stalls and tell the accessing
thread to carry on.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |   6 ++
 postcopy-ram.c                | 246 ++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 243 insertions(+), 9 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 59d48bc..f79e133 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -88,9 +88,15 @@ struct MigrationIncomingState {
         POSTCOPY_RAM_INCOMING_END
     } postcopy_ram_state;
 
+    bool           have_fault_thread;
     QemuThread     fault_thread;
     QemuSemaphore  fault_thread_sem;
 
+    /* For the kernel to send us notifications */
+    int            userfault_fd;
+    /* To tell the fault_thread to quit */
+    int            userfault_quit_fd;
+
     QEMUFile *return_path;
     QemuMutex      rp_mutex;    /* We send replies from multiple threads */
     PostcopyPMI    postcopy_pmi;
diff --git a/postcopy-ram.c b/postcopy-ram.c
index b37af47..55c16ea 100644
--- a/postcopy-ram.c
+++ b/postcopy-ram.c
@@ -54,6 +54,8 @@
  *                       areas without creating loads of VMAs.
  */
 
+#include <poll.h>
+#include <sys/eventfd.h>
 #include <sys/mman.h>
 #include <sys/types.h>
 
@@ -68,6 +70,14 @@
 #define __NR_remap_anon_pages 317
 #endif
 
+#ifndef __NR_userfaultfd
+#define __NR_userfaultfd 318
+#endif
+
+#ifndef USERFAULTFD_PROTOCOL
+#define USERFAULTFD_PROTOCOL (uint64_t)0xaa
+#endif
+
 /* ---------------------------------------------------------------------- */
 /* Postcopy pagemap-inbound (pmi) - data structures that record the       */
 /* state of each page used by the inbound postcopy                        */
@@ -206,6 +216,7 @@ int postcopy_ram_hosttest(void)
      */
     void *testarea, *testarea2;
     long pagesize = getpagesize();
+    int ufd;
 
     testarea = mmap(NULL, pagesize, PROT_READ | PROT_WRITE, MAP_PRIVATE |
                                     MAP_ANONYMOUS, -1, 0);
@@ -215,15 +226,24 @@ int postcopy_ram_hosttest(void)
     }
     g_assert(((size_t)testarea & (pagesize-1)) == 0);
 
+    ufd = syscall(__NR_userfaultfd, O_CLOEXEC);
+    if (ufd == -1) {
+        perror("postcopy_ram_hosttest: userfaultfd not available");
+        munmap(testarea, pagesize);
+        return -1;
+    }
+
     if (madvise(testarea, pagesize, MADV_USERFAULT)) {
         perror("postcopy_ram_hosttest: MADV_USERFAULT not available");
         munmap(testarea, pagesize);
+        close(ufd);
         return -1;
     }
 
     if (madvise(testarea, pagesize, MADV_NOUSERFAULT)) {
         perror("postcopy_ram_hosttest: MADV_NOUSERFAULT not available");
         munmap(testarea, pagesize);
+        close(ufd);
         return -1;
     }
 
@@ -240,11 +260,13 @@ int postcopy_ram_hosttest(void)
         perror("postcopy_ram_hosttest: remap_anon_pages not available");
         munmap(testarea, pagesize);
         munmap(testarea2, pagesize);
+        close(ufd);
         return -1;
     }
 
     munmap(testarea, pagesize);
     munmap(testarea2, pagesize);
+    close(ufd);
     return 0;
 }
 
@@ -343,8 +365,29 @@ int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages)
  */
 int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
 {
-    /* TODO: Join the fault thread once we're sure it will exit */
+    DPRINTF("%s: entry", __func__);
+    if (mis->have_fault_thread) {
+        uint64_t tmp64;
+        /*
+         * Tell the fault_thread to exit, it's an eventfd that should
+         * currently be at 0, we're going to inc it to 1
+         */
+        tmp64 = 1;
+        if (write(mis->userfault_quit_fd, &tmp64, 8) == 8) {
+            DPRINTF("%s: Joining fault thread", __func__);
+            qemu_thread_join(&mis->fault_thread);
+        } else {
+            /* Not much we can do here, but may as well report it */
+            perror("incing userfault_quit_fd");
+        }
+
+    }
+
+    DPRINTF("%s: closing uf", __func__);
     close(mis->userfault_fd);
+    close(mis->userfault_quit_fd);
+    mis->have_fault_thread = false;
+
     if (qemu_ram_foreach_block(cleanup_area, mis)) {
         return -1;
     }
@@ -353,6 +396,7 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
         munmap(mis->postcopy_tmp_page, getpagesize());
         mis->postcopy_tmp_page = NULL;
     }
+    DPRINTF("%s: exit", __func__);
     return 0;
 }
 
@@ -377,35 +421,215 @@ static int postcopy_ram_sensitise_area(const char *block_name, void *host_addr,
 }
 
 /*
+ * Tell the kernel that we've now got some memory it previously asked for.
+ * Note: We're not allowed to ack a page which wasn't requested.
+ */
+static int ack_userfault(MigrationIncomingState *mis, void *start, size_t len)
+{
+    uint64_t tmp[2];
+
+    /* Kernel wants the range that's now safe to access */
+    tmp[0] = (uint64_t)start;
+    tmp[1] = (uint64_t)start + (uint64_t)(len-1);
+
+    if (write(mis->userfault_fd, tmp, 16) != 16) {
+        int e = errno;
+
+        if (e == ENOENT) {
+            /* Kernel said it wasn't waiting - one case where this can
+             * happen is where two threads triggered the userfault
+             * and we receive the page and ack it just after we received
+             * the 2nd request and that ends up deciding it should ack it
+             * We could optimise it out, but it's rare.
+             */
+            /*fprintf(stderr, "ack_userfault: %p/%zx ENOENT\n", start, len); */
+            return 0;
+        }
+        error_report("postcopy_ram: Failed to notify kernel for %p/%zx (%d)",
+                     start, len, e);
+        return -errno;
+    }
+
+    return 0;
+}
+
+/*
  * Handle faults detected by the USERFAULT markings
  */
 static void *postcopy_ram_fault_thread(void *opaque)
 {
     MigrationIncomingState *mis = (MigrationIncomingState *)opaque;
+    void *hostaddr;
+    int ret;
+    size_t hostpagesize = getpagesize();
+    RAMBlock *rb = NULL;
+    RAMBlock *last_rb = NULL;
 
-    fprintf(stderr, "postcopy_ram_fault_thread\n");
-    /* TODO: In later patch */
+    DPRINTF("%s", __func__);
     qemu_sem_post(&mis->fault_thread_sem);
-    while (1) {
-        /* TODO: In later patch */
-    }
+    while (true) {
+        PostcopyPMIState old_state, tmp_state;
+        ram_addr_t rb_offset;
+        ram_addr_t in_raspace;
+        unsigned long bitmap_index;
+        struct pollfd pfd[2];
 
+        /*
+         * We're mainly waiting for the kernel to give us a faulting HVA,
+         * however we can be told to quit via userfault_quit_fd which is
+         * an eventfd
+         */
+        pfd[0].fd = mis->userfault_fd;
+        pfd[0].events = POLLIN;
+        pfd[0].revents = 0;
+        pfd[1].fd = mis->userfault_quit_fd;
+        pfd[1].events = POLLIN; /* Waiting for eventfd to go positive */
+        pfd[1].revents = 0;
+
+        if (poll(pfd, 2, -1 /* Wait forever */) == -1) {
+            perror("userfault poll");
+            break;
+        }
+
+        if (pfd[1].revents) {
+            DPRINTF("%s got quit event", __func__);
+            break;
+        }
+
+        ret = read(mis->userfault_fd, &hostaddr, sizeof(hostaddr));
+        if (ret != sizeof(hostaddr)) {
+            if (ret < 0) {
+                perror("Failed to read full userfault hostaddr");
+                break;
+            } else {
+                error_report("%s: Read %d bytes from userfaultfd expected %ld",
+                             __func__, ret, sizeof(hostaddr));
+                break; /* Lost alignment, don't know what we'd read next */
+            }
+        }
+
+        /* TODO: We want to be marking host-page-size areas of the bitmaps? */
+        last_rb = rb;
+        rb = qemu_ram_block_from_host(hostaddr, true, &in_raspace, &rb_offset,
+                                      &bitmap_index);
+        if (!rb) {
+            error_report("postcopy_ram_fault_thread: Fault outside guest: %p",
+                         hostaddr);
+            break;
+        }
+
+        DPRINTF("%s: Request for HVA=%p index=%lx rb=%s offset=%zx",
+                __func__, hostaddr, bitmap_index, qemu_ram_get_idstr(rb),
+                rb_offset);
+
+        tmp_state = postcopy_pmi_get_state(mis, bitmap_index);
+        do {
+            old_state = tmp_state;
+
+            switch (old_state) {
+            case POSTCOPY_PMI_REQUESTED:
+                /* Do nothing - it's already requested */
+                break;
+
+            case POSTCOPY_PMI_RECEIVED:
+                /* Already arrived - no state change, just kick the kernel */
+                DPRINTF("postcopy_ram_fault_thread: notify pre of %p",
+                        hostaddr);
+                if (ack_userfault(mis, hostaddr, hostpagesize)) {
+                    assert(0);
+                }
+                break;
+
+            case POSTCOPY_PMI_MISSING:
+
+                tmp_state = postcopy_pmi_change_state(mis, bitmap_index,
+                                           old_state, POSTCOPY_PMI_REQUESTED);
+                if (tmp_state == POSTCOPY_PMI_MISSING) {
+                    /*
+                     * Send the request to the source - we want to request one
+                     * of our host page sizes (which is >= TPS)
+                     */
+                    if (rb != last_rb) {
+                        migrate_send_rp_reqpages(mis, qemu_ram_get_idstr(rb),
+                                                 rb_offset, hostpagesize);
+                    } else {
+                        /* Save some space */
+                        migrate_send_rp_reqpages(mis, NULL,
+                                                 rb_offset, hostpagesize);
+                    }
+
+                    if (mis->postcopy_ram_state == POSTCOPY_RAM_INCOMING_END) {
+                        /* This shouldn't happen - the command to close the
+                         * postcopy stream should be after the last page of RAM
+                         * so we're not going to get an answer
+                         */
+                        error_report("postcopy_ram_fault_thread: UF after end");
+                        postcopy_pmi_dump(mis);
+                        assert(0);
+                    }
+                }
+                break;
+           }
+        } while (tmp_state != old_state);
+    }
+    DPRINTF("%s: exit", __func__);
     return NULL;
 }
 
 int postcopy_ram_enable_notify(MigrationIncomingState *mis)
 {
-    /* Create the fault handler thread and wait for it to be ready */
+    uint64_t tmp64;
+
+    /* Open the fd for the kernel to give us userfaults */
+    mis->userfault_fd = syscall(__NR_userfaultfd, O_CLOEXEC);
+    if (mis->userfault_fd == -1) {
+        perror("Failed to open userfault fd");
+        return -1;
+    }
+
+    /*
+     * Version handshake, we send it the version we want and expect to get the
+     * same back.
+     */
+    tmp64 = USERFAULTFD_PROTOCOL;
+    if (write(mis->userfault_fd, &tmp64, sizeof(tmp64)) != sizeof(tmp64)) {
+        perror("Writing userfaultfd version");
+        close(mis->userfault_fd);
+        return -1;
+    }
+    if (read(mis->userfault_fd, &tmp64, sizeof(tmp64)) != sizeof(tmp64)) {
+        perror("Reading userfaultfd version");
+        close(mis->userfault_fd);
+        return -1;
+    }
+    if (tmp64 != USERFAULTFD_PROTOCOL) {
+        error_report("Mismatched userfaultfd version, expected %zx, got %zx",
+                     (size_t)USERFAULTFD_PROTOCOL, (size_t)tmp64);
+        close(mis->userfault_fd);
+        return -1;
+    }
+
+    /* Now an eventfd we use to tell the fault-thread to quit */
+    mis->userfault_quit_fd = eventfd(0, EFD_CLOEXEC);
+    if (mis->userfault_quit_fd == -1) {
+        perror("Opening userfault_quit_fd");
+        close(mis->userfault_fd);
+        return -1;
+    }
+
     qemu_sem_init(&mis->fault_thread_sem, 0);
     qemu_thread_create(&mis->fault_thread, "postcopy/fault",
                        postcopy_ram_fault_thread, mis, QEMU_THREAD_JOINABLE);
     qemu_sem_wait(&mis->fault_thread_sem);
+    mis->have_fault_thread = true;
 
     /* Mark so that we get notified of accesses to unwritten areas */
     if (qemu_ram_foreach_block(postcopy_ram_sensitise_area, NULL)) {
         return -1;
     }
 
+    DPRINTF("postcopy_ram_enable_notify: Sensitised");
+
     return 0;
 }
 
@@ -439,11 +663,12 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
 
     if (syscall(__NR_remap_anon_pages, host, from, getpagesize(), 0) !=
             getpagesize()) {
+        int e = errno;
         perror("remap_anon_pages in postcopy_place_page");
         fprintf(stderr, "host: %p from: %p pmi=%d\n", host, from,
                 postcopy_pmi_get_state(mis, bitmap_offset));
 
-        return -errno;
+        return -e;
     }
 
     tmp_state = postcopy_pmi_get_state(mis, bitmap_offset);
@@ -456,7 +681,10 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
 
 
     if (old_state == POSTCOPY_PMI_REQUESTED) {
-        /* TODO: Notify kernel */
+        /* Send the kernel the host address that should now be accessible */
+        DPRINTF("%s: Notifying kernel bitmap_offset=0x%lx host=%p",
+                __func__, bitmap_offset, host);
+        return ack_userfault(mis, host, getpagesize());
     }
 
     /* TODO: hostpagesize!=targetpagesize case */
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 40/43] Start up a postcopy/listener thread ready for incoming page data
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (38 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 39/43] Postcopy; Handle userfault requests Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 41/43] postcopy: Wire up loadvm_postcopy_ram_handle_{run, end} commands Dr. David Alan Gilbert (git)
                   ` (3 subsequent siblings)
  43 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The loading of a device state (during postcopy) may access guest
memory that's still on the source machine and thus might need
a page fill; split off a separate thread that handles the incoming
page data so that the original incoming migration code can finish
off the device data.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  4 +++
 migration.c                   |  6 +++++
 savevm.c                      | 63 +++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index f79e133..d97c52b 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -92,6 +92,10 @@ struct MigrationIncomingState {
     QemuThread     fault_thread;
     QemuSemaphore  fault_thread_sem;
 
+    bool           have_listen_thread;
+    QemuThread     listen_thread;
+    QemuSemaphore  listen_thread_sem;
+
     /* For the kernel to send us notifications */
     int            userfault_fd;
     /* To tell the fault_thread to quit */
diff --git a/migration.c b/migration.c
index 44eb0e2..ac5bd3d 100644
--- a/migration.c
+++ b/migration.c
@@ -1048,6 +1048,12 @@ static int postcopy_start(MigrationState *ms)
      */
     QEMUFile *fb = qemu_bufopen("w", NULL);
 
+    /*
+     * Make sure the receiver can get incoming pages before we send the rest
+     * of the state
+     */
+    qemu_savevm_send_postcopy_ram_listen(fb);
+
     qemu_savevm_state_complete(fb);
     DPRINTF("postcopy_start: sending req 3\n");
     qemu_savevm_send_reqack(fb, 3);
diff --git a/savevm.c b/savevm.c
index 0cae88b..b2f51f7 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1265,9 +1265,46 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
     return 0;
 }
 
+typedef struct ram_listen_thread_data {
+    QEMUFile *f;
+    LoadStateEntry_Head *lh;
+} ram_listen_thread_data;
+
+/*
+ * Triggered by a postcopy_listen command; this thread takes over reading
+ * the input stream, leaving the main thread free to carry on loading the rest
+ * of the device state (from RAM).
+ * (TODO:This could do with being in a postcopy file - but there again it's
+ * just another input loop, not that postcopy specific)
+ */
+static void *postcopy_ram_listen_thread(void *opaque)
+{
+    ram_listen_thread_data *rltd = opaque;
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    int load_res;
+
+    qemu_sem_post(&mis->listen_thread_sem);
+    DPRINTF("postcopy_ram_listen_thread start");
+
+    load_res = qemu_loadvm_state_main(rltd->f, rltd->lh);
+
+    DPRINTF("postcopy_ram_listen_thread exiting");
+    if (load_res < 0) {
+        error_report("%s: loadvm failed: %d", __func__, load_res);
+        qemu_file_set_error(rltd->f, load_res);
+        mis->postcopy_ram_state = POSTCOPY_RAM_INCOMING_END;
+    }
+    postcopy_ram_incoming_cleanup(mis);
+    g_free(rltd);
+
+    return NULL;
+}
+
 /* After this message we must be able to immediately receive page data */
 static int loadvm_postcopy_ram_handle_listen(MigrationIncomingState *mis)
 {
+    ram_listen_thread_data *rltd = g_malloc(sizeof(ram_listen_thread_data));
+
     DPRINTF("%s", __func__);
     if (mis->postcopy_ram_state != POSTCOPY_RAM_INCOMING_ADVISE) {
         error_report("CMD_POSTCOPY_RAM_LISTEN in wrong postcopy state (%d)",
@@ -1286,8 +1323,25 @@ static int loadvm_postcopy_ram_handle_listen(MigrationIncomingState *mis)
         return -1;
     }
 
-    /* TODO start up the postcopy listening thread */
-    return 0;
+    if (mis->have_listen_thread) {
+        error_report("CMD_POSTCOPY_RAM_LISTEN already has a listen thread");
+        return -1;
+    }
+
+    mis->have_listen_thread = true;
+    /* Start up the listening thread and wait for it to signal ready */
+    qemu_sem_init(&mis->listen_thread_sem, 0);
+    rltd->f = mis->file;
+    rltd->lh = &loadvm_handlers;
+    qemu_thread_create(&mis->listen_thread, "postcopy/listen",
+                       postcopy_ram_listen_thread, rltd, QEMU_THREAD_JOINABLE);
+    qemu_sem_wait(&mis->listen_thread_sem);
+
+    /*
+     * all good - cause the loop that handled this command to exit because
+     * the new thread is taking over
+     */
+    return LOADVM_EXITCODE_QUITPARENT | LOADVM_EXITCODE_KEEPHANDLERS;
 }
 
 /* After all discards we can start running and asking for pages */
@@ -1607,6 +1661,11 @@ int qemu_loadvm_state(QEMUFile *f)
     QLIST_INIT(&loadvm_handlers);
     ret = qemu_loadvm_state_main(f, &loadvm_handlers);
 
+    if (migration_incoming_get_current()->have_listen_thread) {
+        /* Listen thread still going, can't clean up yet */
+        return ret;
+    }
+
     if (ret == 0) {
         cpu_synchronize_all_post_init();
     }
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 41/43] postcopy: Wire up loadvm_postcopy_ram_handle_{run, end} commands
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (39 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 40/43] Start up a postcopy/listener thread ready for incoming page data Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 42/43] End of migration for postcopy Dr. David Alan Gilbert (git)
                   ` (2 subsequent siblings)
  43 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Wire up more of the handlers for the commands on the destination side,
in particular loadvm_postcopy_ram_handle_run now has enough to start the
guest running.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 savevm.c | 68 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 63 insertions(+), 5 deletions(-)

diff --git a/savevm.c b/savevm.c
index b2f51f7..77e4f88 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1347,6 +1347,8 @@ static int loadvm_postcopy_ram_handle_listen(MigrationIncomingState *mis)
 /* After all discards we can start running and asking for pages */
 static int loadvm_postcopy_ram_handle_run(MigrationIncomingState *mis)
 {
+    Error *local_err = NULL;
+
     DPRINTF("%s", __func__);
     if (mis->postcopy_ram_state != POSTCOPY_RAM_INCOMING_LISTENING) {
         error_report("CMD_POSTCOPY_RAM_RUN in wrong postcopy state (%d)",
@@ -1355,6 +1357,28 @@ static int loadvm_postcopy_ram_handle_run(MigrationIncomingState *mis)
     }
 
     mis->postcopy_ram_state = POSTCOPY_RAM_INCOMING_RUNNING;
+
+    /* TODO we should move all of this lot into postcopy_ram.c or a shared code
+     * in migration.c
+     */
+    cpu_synchronize_all_post_init();
+
+    qemu_announce_self();
+    bdrv_clear_incoming_migration_all();
+
+    /* Make sure all file formats flush their mutable metadata */
+    bdrv_invalidate_cache_all(&local_err);
+    if (local_err) {
+        qerror_report_err(local_err);
+        error_free(local_err);
+        return -1;
+    }
+
+    DPRINTF("loadvm_postcopy_ram_handle_run: cpu_synchronize_all_post_init");
+    cpu_synchronize_all_post_init();
+
+    DPRINTF("loadvm_postcopy_ram_handle_run: vm_start");
+
     if (autostart) {
         /* Hold onto your hats, starting the CPU */
         vm_start();
@@ -1363,11 +1387,15 @@ static int loadvm_postcopy_ram_handle_run(MigrationIncomingState *mis)
         runstate_set(RUN_STATE_PAUSED);
     }
 
-    return 0;
+    return LOADVM_EXITCODE_QUITLOOP;
 }
 
-/* The end - with a byte from the source which can tell us to fail. */
-static int loadvm_postcopy_ram_handle_end(MigrationIncomingState *mis)
+/* The end - with a byte from the source which can tell us to fail.
+ * The source sends this either if there is a failure, or if it believes it's
+ * sent everything
+ */
+static int loadvm_postcopy_ram_handle_end(MigrationIncomingState *mis,
+                                          uint8_t status)
 {
     DPRINTF("%s", __func__);
     if (mis->postcopy_ram_state == POSTCOPY_RAM_INCOMING_NONE) {
@@ -1375,7 +1403,37 @@ static int loadvm_postcopy_ram_handle_end(MigrationIncomingState *mis)
                      mis->postcopy_ram_state);
         return -1;
     }
-    return -1; /* TODO - expecting 1 byte good/fail */
+
+    DPRINTF("loadvm_postcopy_ram_handle_end status=%d", status);
+
+    if (!status) {
+        bool one_message = false;
+        /* This looks good, but it's possible that the device loading in the
+         * main thread hasn't finished yet, and so we might not be in 'RUN'
+         * state yet.
+         * TODO: Using an atomic_xchg or something for this
+         */
+        while (mis->postcopy_ram_state == POSTCOPY_RAM_INCOMING_LISTENING) {
+            if (!one_message) {
+                DPRINTF("%s: Waiting for RUN", __func__);
+                one_message = true;
+            }
+        }
+    }
+
+    /* TODO: Give up on non-0 status
+     * TODO: If 0 status, check we've received everything (all outstanding
+     * requests should already have been completed)
+     */
+
+    /* TODO: Or should this be none? */
+    mis->postcopy_ram_state = POSTCOPY_RAM_INCOMING_END;
+    /* TODO: Wait for the UF thread, but we can't join here since it would
+     * block main thread; we should only have received this after every page
+     * anyway.
+     */
+    migrate_send_rp_shut(mis, status);
+    return LOADVM_EXITCODE_QUITLOOP;
 }
 
 static int loadvm_process_command_simple_lencheck(const char *name,
@@ -1516,7 +1574,7 @@ static int loadvm_process_command(QEMUFile *f,
                                                    len, 1)) {
             return -1;
         }
-        return loadvm_postcopy_ram_handle_end(mis);
+        return loadvm_postcopy_ram_handle_end(mis, qemu_get_byte(f));
 
     default:
         error_report("VM_COMMAND 0x%x unknown (len 0x%x)", com, len);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 42/43] End of migration for postcopy
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (40 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 41/43] postcopy: Wire up loadvm_postcopy_ram_handle_{run, end} commands Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 43/43] Start documenting how postcopy works Dr. David Alan Gilbert (git)
  2014-08-12  1:50 ` [Qemu-devel] [PATCH v2 00/43] Postcopy implementation zhanghailiang
  43 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Tweak the end of migration cleanup; we don't want to close stuff down
at the end of the main stream, since the postcopy is still sending pages
on the other thread.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/migration.c b/migration.c
index ac5bd3d..188b13b 100644
--- a/migration.c
+++ b/migration.c
@@ -212,6 +212,26 @@ static void process_incoming_migration_co(void *opaque)
 
     ret = qemu_loadvm_state(f);
 
+    if (mis->postcopy_ram_state == POSTCOPY_RAM_INCOMING_ADVISE) {
+        /*
+         * Where a migration had postcopy enabled (and thus went to advise)
+         * but managed to complete within the precopy period
+         */
+        postcopy_ram_incoming_cleanup(mis);
+    }
+
+    DPRINTF("%s: ret=%d postcopy_ram_state=%d", __func__, ret,
+            mis->postcopy_ram_state);
+    if ((ret >= 0) &&
+        (mis->postcopy_ram_state > POSTCOPY_RAM_INCOMING_ADVISE)) {
+        /*
+         * Postcopy was started, cleanup should happen at the end of the
+         * postcopy thread.
+         */
+        DPRINTF("process_incoming_migration_co: exiting main branch");
+        return;
+    }
+
     qemu_fclose(f);
     free_xbzrle_decoded_buf();
     migration_incoming_state_destroy();
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH v2 43/43] Start documenting how postcopy works.
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (41 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 42/43] End of migration for postcopy Dr. David Alan Gilbert (git)
@ 2014-08-11 14:29 ` Dr. David Alan Gilbert (git)
  2014-08-11 17:19   ` Eric Blake
  2014-08-12  1:50 ` [Qemu-devel] [PATCH v2 00/43] Postcopy implementation zhanghailiang
  43 siblings, 1 reply; 55+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2014-08-11 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: aarcange, yamahata, lilei, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 docs/migration.txt | 150 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 150 insertions(+)

diff --git a/docs/migration.txt b/docs/migration.txt
index 0492a45..fec2d46 100644
--- a/docs/migration.txt
+++ b/docs/migration.txt
@@ -294,3 +294,153 @@ save/send this state when we are in the middle of a pio operation
 (that is what ide_drive_pio_state_needed() checks).  If DRQ_STAT is
 not enabled, the values on that fields are garbage and don't need to
 be sent.
+
+= Return path =
+
+In most migration scenarios there is only a single data path that runs
+from the source VM to the destination, typically along a single fd (although
+possibly with another fd or similar for some fast way of throwing pages across).
+
+However, some uses need two way comms; in particular the Postcopy destination
+needs to be able to request pages on demand from the source.
+
+For these scenarios there is a 'return path' from the destination to the source;
+qemu_file_get_return_path(QEMUFile* fwdpath) gives the QEMUFile* for the return
+path.
+
+  Source side
+     Forward path - written by migration thread
+     Return path  - opened by main thread, read by fd_handler on main thread
+
+  Destination side
+     Forward path - read by main thread
+     Return path  - opened by main thread, written by main thread AND postcopy
+                    thread (protected by rp_mutex)
+
+Opening the return path generally sets the fd to be non-blocking so that a
+failed destination can't block the source; and since the non-blockingness seems
+to follow both directions it does alter the semantics of the forward path.
+
+= Postcopy =
+'Postcopy' migration is a way to deal with migrations that refuse to converge;
+it's plus side is that there is an upper bound on the amount of migration traffic
+and time it takes, the down side is that during the postcopy phase, a failure of
+*either* side or the network connection causes the guest to be lost.
+
+In postcopy the destination CPUs are started before all the memory has been
+transferred, and accesses to pages that are yet to be transferred cause
+a fault that's translated by QEMU into a request to the source QEMU.
+
+Postcopy can be combined with precopy (i.e. normal migration) so that if precopy
+doesn't finish in a given time the switch is automatically made to precopy.
+
+=== Enabling postcopy ===
+
+To enable postcopy (prior to the start of migration):
+
+migrate_set_capability x-postcopy-ram on
+
+The migration will still start in precopy mode, however issuing:
+
+migrate_start_postcopy
+
+will now cause the transition from precopy to postcopy.
+It can be issued immediately after migration is started or any
+time later on.  Issuing it after the end of a migration is harmless.
+
+=== Postcopy states ===
+Postcopy moves through a series of states (see postcopy_ram_state)
+from ADVISE->LISTEN->RUNNING->END
+
+  Advise: Set at the start of migration if postcopy is enabled, even
+          if it hasn't passed the start-time threshold; here the destination
+          checks it's OS has the support needed for postcopy, and performs
+          setup to ensure the RAM mappings are suitable for later postcopy.
+          (Triggered by reception of POSTCOPY_RAM_ADVISE command)
+
+Normal precopy now carries on as normal, until the point that the source
+hits the start-time threshold and transitions to postcopy.  The source
+stops it's CPUs and transmits a 'discard bitmap' indicating pages that
+have been previously sent but are now dirty again and hence are out of
+date on the destination.
+
+The migration stream now contains a 'package' containing it's own chunk
+of migration stream, followed by a return to a normal stream containing
+page data.  The package (sent as CMD_PACKAGED) contains the commands to
+cycle the states on the destination, followed by all of the device
+state excluding RAM.  This lets the destination request pages from the
+source in parallel with loading device state, this is required since
+some devices (virtio) access guest memory during device initialisation.
+
+  Listen: The first command in the package, POSTCOPY_RAM_LISTEN, switches
+          the destination state to Listen, and starts a new thread
+          (the 'listen thread') which takes over the job of receiving
+          pages off the migration stream, while the main thread carries
+          on processing the blob.  With this thread able to process page
+          reception, the destination now 'sensitises' the RAM to detect
+          any access to missing pages (on Linux using the 'userfault'
+          system).
+
+The package now contains all the remaining state data and the command
+to transition to the next state.
+
+  Running: POSTCOPY_RAM_RUN causes the destination to synchronise all
+          state and start the CPUs and IO devices running.  The main
+          thread now finishes processing the migration package and
+          now carries on as it would for normal precopy migration
+          (although it can't do the cleanup it would do as it
+          finishes a normal migration).
+
+Page data is sent from the source to the destination both as part
+of a linear scan (like normal migration), and received by the 'listen thread',
+When the destination tries to use a page it hasn't got, it requests
+it from the source (down the return path) and the source sends this
+page in the same stream.  When the source has transmitted all pages
+it sends a POSTCOPY_RAM_END command to transition to
+
+  End: The listen thread can now quit, and perform the cleanup of migration
+state, the migration is now complete.
+
+=== Source side page maps ===
+The source side keeps two bitmaps during postcopy; 'the migration bitmap'
+and 'sent map'.  The 'migration bitmap' is basically the same as in
+the precopy case, and holds a bit to indicate that page is 'dirty' -
+i.e. needs sending.  During the precopy phase this is updated as the CPU
+dirties pages, however during postcopy the CPUs are stopped and nothing
+should dirty anything any more.
+
+The 'sent map' is used for the transition to postcopy. It is a bitmap that
+has a bit set whenever a page is sent to the destination, however during
+the transition to postcopy mode it is masked against the migration bitmap
+(sentmap &= migrationbitmap) to generate a bitmap recording pages that
+have been previously been sent but are now dirty again.  This masked
+sentmap is sent to the destination which discards those now dirty pages
+before starting the CPUs.
+
+Note that once in postcopy mode, the sent map is still updated, however it's
+contents are not-consistent as a local view of what's been sent since it's
+only got the masked result.
+
+=== Destination side page maps ===
+(Needs to be changed so we can update both easily - at the moment updates are done
+ with a lock)
+The destination keeps a 'requested map' and a 'received map'.
+Both maps are initially 0, as pages are received the bits are set in 'received map'.
+Incoming requests from the kernel cause the bit to be set in the 'requested map'.
+When a page is received that is marked as 'requested' the kernel is notified.
+If the kernel requests a page that has already been 'received' the kernel is notified
+without re-requesting.
+
+This leads to three valid page states:
+page states:
+    missing (!rc,!rq)  - page not yet received or requested
+    received (rc,!rq)  - Page received
+    requested (!rc,rq) - page requested but not yet received
+
+state transitions:
+      received -> missing   (only during setup/discard)
+
+      missing -> received   (normal incoming page)
+      requested -> received (incoming page previously requested)
+      missing -> requested  (userfault request)
+
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [PATCH v2 14/43] Add migration-capability boolean for postcopy-ram.
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 14/43] Add migration-capability boolean for postcopy-ram Dr. David Alan Gilbert (git)
@ 2014-08-11 16:47   ` Eric Blake
  0 siblings, 0 replies; 55+ messages in thread
From: Eric Blake @ 2014-08-11 16:47 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, lilei, quintela

[-- Attachment #1: Type: text/plain, Size: 535 bytes --]

On 08/11/2014 08:29 AM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  include/migration/migration.h | 1 +
>  migration.c                   | 9 +++++++++
>  qapi-schema.json              | 6 +++++-
>  3 files changed, 15 insertions(+), 1 deletion(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 539 bytes --]

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [PATCH v2 20/43] migrate_start_postcopy: Command to trigger transition to postcopy
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 20/43] migrate_start_postcopy: Command to trigger transition to postcopy Dr. David Alan Gilbert (git)
@ 2014-08-11 17:01   ` Eric Blake
  0 siblings, 0 replies; 55+ messages in thread
From: Eric Blake @ 2014-08-11 17:01 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, lilei, quintela

[-- Attachment #1: Type: text/plain, Size: 1569 bytes --]

On 08/11/2014 08:29 AM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Once postcopy is enabled (with migrate_set_capability), the migration
> will still start on precopy mode.  To cause a transition into postcopy
> the:
> 
>   migrate_start_postcopy
> 
> command must be issued.  Postcopy will start sometime after this
> (when it's next checked in the migration loop).
> 
> Issuing the command before migration has started will error,
> and issuing after it has finished is ignored.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  hmp-commands.hx               | 15 +++++++++++++++
>  hmp.c                         |  7 +++++++
>  hmp.h                         |  1 +
>  include/migration/migration.h |  3 +++
>  migration.c                   | 22 ++++++++++++++++++++++
>  qapi-schema.json              |  8 ++++++++
>  qmp-commands.hx               | 19 +++++++++++++++++++
>  7 files changed, 75 insertions(+)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

> +++ b/qapi-schema.json
> @@ -538,6 +538,14 @@
>  { 'command': 'query-migrate-capabilities', 'returns':   ['MigrationCapabilityStatus']}
>  
>  ##
> +# @migrate-start-postcopy
> +#
> +# Switch migration to postcopy mode
> +#
> +# Since: 2.2
> +{ 'command': 'migrate-start-postcopy' }
> +

Interface looks reasonable.  I haven't closely reviewed much else in the
series yet.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 539 bytes --]

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [PATCH v2 43/43] Start documenting how postcopy works.
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 43/43] Start documenting how postcopy works Dr. David Alan Gilbert (git)
@ 2014-08-11 17:19   ` Eric Blake
  2014-08-11 17:58     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 55+ messages in thread
From: Eric Blake @ 2014-08-11 17:19 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, lilei, quintela

[-- Attachment #1: Type: text/plain, Size: 5241 bytes --]

On 08/11/2014 08:29 AM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  docs/migration.txt | 150 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 150 insertions(+)

I personally like docs early in the series (see if they make sense in
isolation, then whether the implementation matched the docs; rather than
being tainted by implementation and overlooking design flaws).

> 
> diff --git a/docs/migration.txt b/docs/migration.txt
> index 0492a45..fec2d46 100644
> --- a/docs/migration.txt
> +++ b/docs/migration.txt
> @@ -294,3 +294,153 @@ save/send this state when we are in the middle of a pio operation
>  (that is what ide_drive_pio_state_needed() checks).  If DRQ_STAT is
>  not enabled, the values on that fields are garbage and don't need to
>  be sent.
> +
> += Return path =
> +
> +In most migration scenarios there is only a single data path that runs
> +from the source VM to the destination, typically along a single fd (although
> +possibly with another fd or similar for some fast way of throwing pages across).
> +
> +However, some uses need two way comms; in particular the Postcopy destination

s/comms/communication/

> +needs to be able to request pages on demand from the source.
> +
> +For these scenarios there is a 'return path' from the destination to the source;
> +qemu_file_get_return_path(QEMUFile* fwdpath) gives the QEMUFile* for the return
> +path.
> +
> +  Source side
> +     Forward path - written by migration thread
> +     Return path  - opened by main thread, read by fd_handler on main thread
> +
> +  Destination side
> +     Forward path - read by main thread
> +     Return path  - opened by main thread, written by main thread AND postcopy
> +                    thread (protected by rp_mutex)
> +
> +Opening the return path generally sets the fd to be non-blocking so that a
> +failed destination can't block the source; and since the non-blockingness seems
> +to follow both directions it does alter the semantics of the forward path.

Once you've opened a return path, haven't you already committed to the
migration?  That is, since we already document that postcopy loses the
VM if something goes wrong, what point is there in trying to protect
from a failed destination?  Once the source knows that the destination
has opened the return path, the source should never run the VM again.

> +
> += Postcopy =
> +'Postcopy' migration is a way to deal with migrations that refuse to converge;
> +it's plus side is that there is an upper bound on the amount of migration traffic

s/it's/its/

> +and time it takes, the down side is that during the postcopy phase, a failure of
> +*either* side or the network connection causes the guest to be lost.
> +
...

> +=== Postcopy states ===
> +Postcopy moves through a series of states (see postcopy_ram_state)
> +from ADVISE->LISTEN->RUNNING->END
> +
> +  Advise: Set at the start of migration if postcopy is enabled, even
> +          if it hasn't passed the start-time threshold; here the destination
> +          checks it's OS has the support needed for postcopy, and performs

s/it's/that its/

> +          setup to ensure the RAM mappings are suitable for later postcopy.
> +          (Triggered by reception of POSTCOPY_RAM_ADVISE command)
> +
> +Normal precopy now carries on as normal, until the point that the source
> +hits the start-time threshold and transitions to postcopy.  The source
> +stops it's CPUs and transmits a 'discard bitmap' indicating pages that

s/it's/its/

(now that you've made the same mistake multiple times: remember, "it's"
is correct ONLY if you can say "it is" (or "it has") in the same
context; possession uses "its").

> +have been previously sent but are now dirty again and hence are out of
> +date on the destination.
> +
> +The migration stream now contains a 'package' containing it's own chunk

s/it's/its/

> +of migration stream, followed by a return to a normal stream containing
> +page data.  The package (sent as CMD_PACKAGED) contains the commands to
> +cycle the states on the destination, followed by all of the device
> +state excluding RAM.  This lets the destination request pages from the
> +source in parallel with loading device state, this is required since
> +some devices (virtio) access guest memory during device initialisation.

I don't know if we have a strong preference for US (initialization) or
UK (initialisation) spelling in our docs.

...
> +
> +Note that once in postcopy mode, the sent map is still updated, however it's

s/it's/its/

> +contents are not-consistent as a local view of what's been sent since it's

This use of "it's" is correct, but sounds awkward.

> +only got the masked result.

Maybe:

Note that once in postcopy mode, the sent map is still updated; however,
its contents are not necessarily consistent with the pages already sent
due to the masking with the migration bitmap.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 539 bytes --]

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [PATCH v2 43/43] Start documenting how postcopy works.
  2014-08-11 17:19   ` Eric Blake
@ 2014-08-11 17:58     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert @ 2014-08-11 17:58 UTC (permalink / raw)
  To: Eric Blake; +Cc: aarcange, yamahata, quintela, qemu-devel, lilei

* Eric Blake (eblake@redhat.com) wrote:
> On 08/11/2014 08:29 AM, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  docs/migration.txt | 150 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 150 insertions(+)
> 
> I personally like docs early in the series (see if they make sense in
> isolation, then whether the implementation matched the docs; rather than
> being tainted by implementation and overlooking design flaws).

No problem, I can flip that around.

> > diff --git a/docs/migration.txt b/docs/migration.txt
> > index 0492a45..fec2d46 100644
> > --- a/docs/migration.txt
> > +++ b/docs/migration.txt
> > @@ -294,3 +294,153 @@ save/send this state when we are in the middle of a pio operation
> >  (that is what ide_drive_pio_state_needed() checks).  If DRQ_STAT is
> >  not enabled, the values on that fields are garbage and don't need to
> >  be sent.
> > +
> > += Return path =
> > +
> > +In most migration scenarios there is only a single data path that runs
> > +from the source VM to the destination, typically along a single fd (although
> > +possibly with another fd or similar for some fast way of throwing pages across).
> > +
> > +However, some uses need two way comms; in particular the Postcopy destination
> 
> s/comms/communication/

Thanks, all typos/etc applied.

> > +needs to be able to request pages on demand from the source.
> > +
> > +For these scenarios there is a 'return path' from the destination to the source;
> > +qemu_file_get_return_path(QEMUFile* fwdpath) gives the QEMUFile* for the return
> > +path.
> > +
> > +  Source side
> > +     Forward path - written by migration thread
> > +     Return path  - opened by main thread, read by fd_handler on main thread
> > +
> > +  Destination side
> > +     Forward path - read by main thread
> > +     Return path  - opened by main thread, written by main thread AND postcopy
> > +                    thread (protected by rp_mutex)
> > +
> > +Opening the return path generally sets the fd to be non-blocking so that a
> > +failed destination can't block the source; and since the non-blockingness seems
> > +to follow both directions it does alter the semantics of the forward path.
> 
> Once you've opened a return path, haven't you already committed to the
> migration?  That is, since we already document that postcopy loses the
> VM if something goes wrong, what point is there in trying to protect
> from a failed destination?  Once the source knows that the destination
> has opened the return path, the source should never run the VM again.

The point at which the source can't run again is quite late; plenty of things
can fail before that point and be recovered from; as long as the destination
hasn't started its CPU (and hence started chatting on the network or
changing shared storage) recovery is possible in this implementation.

Opening the return path is done early on in migration if the postcopy capability is on,
before the migrate_start_postcopy is issued.  The entire migration might be
completed in precopy with the return path opened but unused.

(The return-path blocking-ness is up for discussion anyway with the changes that
Paolo suggested in the 1st series that I've not done yet).

> > +
> > += Postcopy =
> > +'Postcopy' migration is a way to deal with migrations that refuse to converge;
> > +it's plus side is that there is an upper bound on the amount of migration traffic
> 
> s/it's/its/
> 
> > +and time it takes, the down side is that during the postcopy phase, a failure of
> > +*either* side or the network connection causes the guest to be lost.
> > +
> ...
> 
> > +=== Postcopy states ===
> > +Postcopy moves through a series of states (see postcopy_ram_state)
> > +from ADVISE->LISTEN->RUNNING->END
> > +
> > +  Advise: Set at the start of migration if postcopy is enabled, even
> > +          if it hasn't passed the start-time threshold; here the destination
> > +          checks it's OS has the support needed for postcopy, and performs
> 
> s/it's/that its/
> 
> > +          setup to ensure the RAM mappings are suitable for later postcopy.
> > +          (Triggered by reception of POSTCOPY_RAM_ADVISE command)
> > +
> > +Normal precopy now carries on as normal, until the point that the source
> > +hits the start-time threshold and transitions to postcopy.  The source
> > +stops it's CPUs and transmits a 'discard bitmap' indicating pages that
> 
> s/it's/its/
> 
> (now that you've made the same mistake multiple times: remember, "it's"
> is correct ONLY if you can say "it is" (or "it has") in the same
> context; possession uses "its").
> 
> > +have been previously sent but are now dirty again and hence are out of
> > +date on the destination.
> > +
> > +The migration stream now contains a 'package' containing it's own chunk
> 
> s/it's/its/
> 
> > +of migration stream, followed by a return to a normal stream containing
> > +page data.  The package (sent as CMD_PACKAGED) contains the commands to
> > +cycle the states on the destination, followed by all of the device
> > +state excluding RAM.  This lets the destination request pages from the
> > +source in parallel with loading device state, this is required since
> > +some devices (virtio) access guest memory during device initialisation.
> 
> I don't know if we have a strong preference for US (initialization) or
> UK (initialisation) spelling in our docs.

grep suggests US.

> ...
> > +
> > +Note that once in postcopy mode, the sent map is still updated, however it's
> 
> s/it's/its/
> 
> > +contents are not-consistent as a local view of what's been sent since it's
> 
> This use of "it's" is correct, but sounds awkward.
> 
> > +only got the masked result.
> 
> Maybe:
> 
> Note that once in postcopy mode, the sent map is still updated; however,
> its contents are not necessarily consistent with the pages already sent
> due to the masking with the migration bitmap.

Thanks.

Dave
> -- 
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [PATCH v2 01/43] qemu_ram_foreach_block: pass up error value, and down the ramblock name
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 01/43] qemu_ram_foreach_block: pass up error value, and down the ramblock name Dr. David Alan Gilbert (git)
@ 2014-08-11 18:29   ` Eric Blake
  0 siblings, 0 replies; 55+ messages in thread
From: Eric Blake @ 2014-08-11 18:29 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, lilei, quintela

[-- Attachment #1: Type: text/plain, Size: 1664 bytes --]

On 08/11/2014 08:29 AM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> check the return value of the function it calls and error if it's none-0

s/none/non/

> Fixup qemu_rdma_init_one_block that is the only current caller,
>   and __qemu_rdma_add_block the only function it calls using it.
> 
> Pass the name of the ramblock to the function; helps in debugging.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  exec.c                    | 10 ++++++++--
>  include/exec/cpu-common.h |  4 ++--
>  migration-rdma.c          |  4 ++--
>  3 files changed, 12 insertions(+), 6 deletions(-)

> 
> +++ b/migration-rdma.c
> @@ -595,10 +595,10 @@ static int __qemu_rdma_add_block(RDMAContext *rdma, void *host_addr,
>   * in advanced before the migration starts. This tells us where the RAM blocks
>   * are so that we can register them individually.
>   */
> -static void qemu_rdma_init_one_block(void *host_addr,
> +static int qemu_rdma_init_one_block(const char *block_name, void *host_addr,
>      ram_addr_t block_offset, ram_addr_t length, void *opaque)
>  {
> -    __qemu_rdma_add_block(opaque, host_addr, block_offset, length);
> +    return __qemu_rdma_add_block(opaque, host_addr, block_offset, length);

We really shouldn't be using __qemu naming in our source, as that
namespace is reserved for the compiler/libc (this patch didn't introduce
it, so if you fix it, do it as a separate cleanup).

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 539 bytes --]

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/43] Postcopy implementation
  2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (42 preceding siblings ...)
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 43/43] Start documenting how postcopy works Dr. David Alan Gilbert (git)
@ 2014-08-12  1:50 ` zhanghailiang
  2014-08-12  9:19   ` Dr. David Alan Gilbert
  43 siblings, 1 reply; 55+ messages in thread
From: zhanghailiang @ 2014-08-12  1:50 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, qemu-devel, lilei

On 2014/8/11 22:29, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert"<dgilbert@redhat.com>
>
> Hi,
>    This is a 2nd cut of my postcopy implementation; it fixes up some
> of the comments from the 1st posting but it also fixes a lot of bugs,
> (other comments from the 1st round will be fixed later).
>
> The commands to start a postcopy migration have changed to:
>
>    migrate_set_capability x-postcopy-ram on
>    migrate -d   tcp:whereever
>    <some time later>
>    migrate_start_postcopy
>
> Starting the destination in paused mode (-s) now works.
>
> This now survives in a test harness with a few hundred migrations without
> problems (both on heavily loaded and idle VMs).
>
> Other changes:
>    Changed source rp handler to coroutine (from fd_set_handler) - cures
>      occasional loss of requests (and hence huge latency spikes)
>    Added 'SHUT' rp command as a handshake from the destination that it's finished
>    Fixed race during migration of guest with very little page modification
>    Shutdown the fault-thread cleanly
>    Update 'received' map in ADVISE state
>    Turned off nagling on rp - we want our requests to get their ASAP
>    Fixed seg when attempting to postcopy to a stream that couldn't get a return path
>    Got rid of the move of QEMUFile into the .h
>    Fixed up '-ve' and title of assert patch
>    Removed a couple of return-path opening optimisations as per Paolo's comments
>
> Note this version needs the QEMUSizedBuffer/QEMUFile patches that I've posted
> separately.
>
> Current TODO:
>     1) It's not bisectable yet
>     2) There are no testsuite additions (although I have a virt-test modification
>        I've been using).
>     3) End-of-migration cleanup is much better than it was but still needs some work.
>     4) Not all the code is there for systems with hostpagesize!=qemupagesize
>     5) xbzrle needs disabling once in postcopy
>     6) RDMA needs some rework
>     7) The latency measurements are now pretty consistent, no very large spikes,
>        but they're a bit higher than expected, I need to look at rate limiting
>        just the background scan.
>     8) Conversion of return-path to a process and blocking fd needs investigation
>        (as per discussion with Paolo)
>     9) Andrea has suggestions on ways to avoid some of the huge-page splitting
>        that occurs during the discard phase after precopy.
>    10) I'd like to format the data on the return path in a more structured way
>        (i.e. maybe using stuff from my BER world).
>    11) The ACPI fix in 2.1 that allowed migrating RAMBlocks to be larger than
>        the source feels like it needs looking at for postcopy.
> Dave
>
> Dr. David Alan Gilbert (43):
>    qemu_ram_foreach_block: pass up error value, and down the ramblock
>      name
>    improve DPRINTF macros, add to savevm
>    Add qemu_get_counted_string to read a string prefixed by a count byte
>    Create MigrationIncomingState
>
>    Return path: socket_writev_buffer: Block even on non-blocking fd's
>    Migration commands
>    Return path: Control commands
>    Return path: Send responses from destination to source
>    Return path: Source handling of return path
>    qemu_loadvm errors and debug
>    ram_debug_dump_bitmap: Dump a migration bitmap as text
>    Rework loadvm path for subloops
>    Add migration-capability boolean for postcopy-ram.
>    Add wrappers and handlers for sending/receiving the postcopy-ram
>      migration messages.
>    QEMU_VM_CMD_PACKAGED: Send a packaged chunk of migration stream
>    migrate_init: Call from savevm
>    Allow savevm handlers to state whether they could go into postcopy
>    postcopy: OS support test
>    migrate_start_postcopy: Command to trigger transition to postcopy
>    MIG_STATE_POSTCOPY_ACTIVE: Add new migration state
>    qemu_savevm_state_complete: Postcopy changes
>    Postcopy: Maintain sentmap during postcopy pre phase
>    Postcopy page-map-incoming (PMI) structure
>    postcopy: Add incoming_init/cleanup functions
>    postcopy: Incoming initialisation
>    postcopy: ram_enable_notify to switch on userfault
>    Postcopy: postcopy_start
>    Postcopy: Rework migration thread for postcopy mode
>    mig fd_connect: open return path
>    Postcopy: Create a fault handler thread before marking the ram as
>      userfault
>    Page request:  Add MIG_RPCOMM_REQPAGES reverse command
>    Page request: Process incoming page request
>    Page request: Consume pages off the post-copy queue
>    Add assertion to check migration_dirty_pages
>    postcopy_ram.c: place_page and helpers
>    Postcopy: Use helpers to map pages during migration
>    qemu_ram_block_from_host
>    Postcopy; Handle userfault requests
>    Start up a postcopy/listener thread ready for incoming page data
>    postcopy: Wire up loadvm_postcopy_ram_handle_{run,end} commands
>    End of migration for postcopy
>    Start documenting how postcopy works.
>
>   Makefile.objs                    |   2 +-
>   arch_init.c                      | 490 +++++++++++++++++++--
>   docs/migration.txt               | 150 +++++++
>   exec.c                           |  66 ++-
>   hmp-commands.hx                  |  15 +
>   hmp.c                            |   7 +
>   hmp.h                            |   1 +
>   include/exec/cpu-common.h        |   8 +-
>   include/migration/migration.h    | 128 ++++++
>   include/migration/postcopy-ram.h |  89 ++++
>   include/migration/qemu-file.h    |   9 +
>   include/migration/vmstate.h      |   2 +-
>   include/qemu/typedefs.h          |   7 +-
>   include/sysemu/sysemu.h          |  41 +-
>   migration-rdma.c                 |   4 +-
>   migration.c                      | 667 ++++++++++++++++++++++++++--
>   postcopy-ram.c                   | 915 +++++++++++++++++++++++++++++++++++++++
>   qapi-schema.json                 |  14 +-
>   qemu-file.c                      | 124 +++++-
>   qmp-commands.hx                  |  19 +
>   savevm.c                         | 854 +++++++++++++++++++++++++++++++++---
>   21 files changed, 3473 insertions(+), 139 deletions(-)
>   create mode 100644 include/migration/postcopy-ram.h
>   create mode 100644 postcopy-ram.c
>

Hi Dave,

I want to test your patches, but i failed to 'git am' them to the new
qemu-2.1 source. I want to know if you has a git-branch which i can
git clone directly? Thanks.

Best regards,
zhanghailiang

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [Qemu-devel] 答复:  [PATCH v2 06/43] Return path: socket_writev_buffer: Block even on non-blocking fd's
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 06/43] Return path: socket_writev_buffer: Block even on non-blocking fd's Dr. David Alan Gilbert (git)
@ 2014-08-12  2:13   ` chenliang (T)
  2014-08-12  9:36     ` [Qemu-devel] ????: [PATCH v2 06/43] Return path: socket_writev_buffer:?Block " Dr. David Alan Gilbert
  0 siblings, 1 reply; 55+ messages in thread
From: chenliang (T) @ 2014-08-12  2:13 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel


From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The return path uses a non-blocking fd so as not to block waiting for the (possibly broken) destination to finish returning a message, however we still want outbound data to behave in the same way and block.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Hi David
  It is confusing why don't use blocking fd?
Best regards

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [PATCH v2 19/43] postcopy: OS support test
  2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 19/43] postcopy: OS support test Dr. David Alan Gilbert (git)
@ 2014-08-12  5:32   ` zhanghailiang
  2014-08-12  8:18     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 55+ messages in thread
From: zhanghailiang @ 2014-08-12  5:32 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, qemu-devel, lilei

On 2014/8/11 22:29, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert"<dgilbert@redhat.com>
>
> Provide a check to see if the OS we're running on has all the bits
> needed for postcopy.
>
> Creates postcopy-ram.c which will get most of the other helpers we need.
>
> Signed-off-by: Dr. David Alan Gilbert<dgilbert@redhat.com>
> ---
>   Makefile.objs                    |   2 +-
>   include/migration/postcopy-ram.h |  19 ++++++
>   postcopy-ram.c                   | 129 +++++++++++++++++++++++++++++++++++++++
>   3 files changed, 149 insertions(+), 1 deletion(-)
>   create mode 100644 include/migration/postcopy-ram.h
>   create mode 100644 postcopy-ram.c
>
> diff --git a/Makefile.objs b/Makefile.objs
> index 1f76cea..a7ad235 100644
> --- a/Makefile.objs
> +++ b/Makefile.objs
> @@ -55,7 +55,7 @@ common-obj-y += qemu-file.o
>   common-obj-$(CONFIG_RDMA) += migration-rdma.o
>   common-obj-y += qemu-char.o #aio.o
>   common-obj-y += block-migration.o
> -common-obj-y += page_cache.o xbzrle.o
> +common-obj-y += page_cache.o xbzrle.o postcopy-ram.o
>
>   common-obj-$(CONFIG_POSIX) += migration-exec.o migration-unix.o migration-fd.o
>
> diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
> new file mode 100644
> index 0000000..dcd1afa
> --- /dev/null
> +++ b/include/migration/postcopy-ram.h
> @@ -0,0 +1,19 @@
> +/*
> + * Postcopy migration for RAM
> + *
> + * Copyright 2013 Red Hat, Inc. and/or its affiliates
> + *
> + * Authors:
> + *  Dave Gilbert<dgilbert@redhat.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +#ifndef QEMU_POSTCOPY_RAM_H
> +#define QEMU_POSTCOPY_RAM_H
> +
> +/* Return 0 if the host supports everything we need to do postcopy-ram */
> +int postcopy_ram_hosttest(void);
> +
> +#endif
> diff --git a/postcopy-ram.c b/postcopy-ram.c
> new file mode 100644
> index 0000000..1f3e6ea
> --- /dev/null
> +++ b/postcopy-ram.c
> @@ -0,0 +1,129 @@
> +/*
> + * Postcopy migration for RAM
> + *
> + * Copyright 2013-2014 Red Hat, Inc. and/or its affiliates
> + *
> + * Authors:
> + *  Dave Gilbert<dgilbert@redhat.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +/*
> + * Postcopy is a migration technique where the execution flips from the
> + * source to the destination before all the data has been copied.
> + */
> +
> +#include<glib.h>
> +#include<stdio.h>
> +#include<unistd.h>
> +
> +#include "qemu-common.h"
> +#include "migration/migration.h"
> +#include "migration/postcopy-ram.h"
> +
> +//#define DEBUG_POSTCOPY
> +
> +#ifdef DEBUG_POSTCOPY
> +#define DPRINTF(fmt, ...) \
> +    do { fprintf(stderr, "postcopy@%" PRId64 " " fmt "\n", \
> +                          qemu_clock_get_ms(QEMU_CLOCK_REALTIME), \
> +                          ## __VA_ARGS__); } while (0)
> +#else
> +#define DPRINTF(fmt, ...) \
> +    do { } while (0)
> +#endif
> +
> +/* Postcopy needs to detect accesses to pages that haven't yet been copied
> + * across, and efficiently map new pages in, the techniques for doing this
> + * are target OS specific.
> + */
> +#if defined(__linux__)
> +
> +/* On Linux we use:
> + *    madvise MADV_USERFAULT - to mark an area of anonymous memory such
> + *                             that userspace is notifed of accesses to
> + *                             unallocated areas.
> + *    userfaultfd      - opens a socket to receive USERFAULT messages
> + *    remap_anon_pages - to shuffle mapped pages into previously unallocated
> + *                       areas without creating loads of VMAs.
> + */
> +
> +#include<sys/mman.h>
> +#include<sys/types.h>
> +
> +/* TODO remove once we have libc defs
> + * NOTE: These are x86-64 numbers for Andrea's 3.15.0 world */
> +#ifndef MADV_USERFAULT
> +#define MADV_USERFAULT   18
> +#define MADV_NOUSERFAULT 19
> +#endif
> +
> +#ifndef __NR_remap_anon_pages
> +#define __NR_remap_anon_pages 317
> +#endif
> +
> +int postcopy_ram_hosttest(void)
> +{
> +    /* TODO: Needs guarding with CONFIG_ once we have libc's that have the defs
> +     *
> +     * Try each syscall we need, but this isn't a testbench,
> +     * just enough to see that we have the calls
> +     */
> +    void *testarea, *testarea2;
> +    long pagesize = getpagesize();
> +
> +    testarea = mmap(NULL, pagesize, PROT_READ | PROT_WRITE, MAP_PRIVATE |
> +                                    MAP_ANONYMOUS, -1, 0);
> +    if (!testarea) {
> +        perror("postcopy_ram_hosttest: Failed to map test area");
> +        return -1;
> +    }
> +    g_assert(((size_t)testarea&  (pagesize-1)) == 0);
> +
> +    if (madvise(testarea, pagesize, MADV_USERFAULT)) {
> +        perror("postcopy_ram_hosttest: MADV_USERFAULT not available");
> +        munmap(testarea, pagesize);
> +        return -1;
> +    }
> +
> +    if (madvise(testarea, pagesize, MADV_NOUSERFAULT)) {
> +        perror("postcopy_ram_hosttest: MADV_NOUSERFAULT not available");
> +        munmap(testarea, pagesize);
> +        return -1;
> +    }
> +
> +    testarea2 = mmap(NULL, pagesize, PROT_READ | PROT_WRITE, MAP_PRIVATE |
> +                                     MAP_ANONYMOUS, -1, 0);
> +    if (!testarea2) {
> +        perror("postcopy_ram_hosttest: Failed to map second test area");

Before return, should we munmap testarea?
BTW, i think it is better to use goto statement,
which we can handle the error cases together!

Best regards,
zhanghailiang
> +        return -1;
> +    }
> +    g_assert(((size_t)testarea2&  (pagesize-1)) == 0);
> +    *(char *)testarea = 0; /* Force the map of the new page */
> +    if (syscall(__NR_remap_anon_pages, testarea2, testarea, pagesize, 0) !=
> +        pagesize) {
> +        perror("postcopy_ram_hosttest: remap_anon_pages not available");
> +        munmap(testarea, pagesize);
> +        munmap(testarea2, pagesize);
> +        return -1;
> +    }
> +
> +    munmap(testarea, pagesize);
> +    munmap(testarea2, pagesize);
> +    return 0;
> +}
> +
> +#else
> +/* No target OS support, stubs just fail */
> +
> +int postcopy_ram_hosttest(void)
> +{
> +    error_report("postcopy_ram_hosttest: No OS support");
> +    return -1;
> +}
> +
> +#endif
> +

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [PATCH v2 19/43] postcopy: OS support test
  2014-08-12  5:32   ` zhanghailiang
@ 2014-08-12  8:18     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert @ 2014-08-12  8:18 UTC (permalink / raw)
  To: zhanghailiang; +Cc: aarcange, yamahata, quintela, qemu-devel, lilei

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> On 2014/8/11 22:29, Dr. David Alan Gilbert (git) wrote:
> >From: "Dr. David Alan Gilbert"<dgilbert@redhat.com>
> >

<snip>

> >+    testarea2 = mmap(NULL, pagesize, PROT_READ | PROT_WRITE, MAP_PRIVATE |
> >+                                     MAP_ANONYMOUS, -1, 0);
> >+    if (!testarea2) {
> >+        perror("postcopy_ram_hosttest: Failed to map second test area");
> 
> Before return, should we munmap testarea?
> BTW, i think it is better to use goto statement,
> which we can handle the error cases together!

Yes, thank you for spotting this; I'll fix it in my next version.

Dave

> 
> Best regards,
> zhanghailiang
> >+        return -1;
> >+    }
> >+    g_assert(((size_t)testarea2&  (pagesize-1)) == 0);
> >+    *(char *)testarea = 0; /* Force the map of the new page */
> >+    if (syscall(__NR_remap_anon_pages, testarea2, testarea, pagesize, 0) !=
> >+        pagesize) {
> >+        perror("postcopy_ram_hosttest: remap_anon_pages not available");
> >+        munmap(testarea, pagesize);
> >+        munmap(testarea2, pagesize);
> >+        return -1;
> >+    }
> >+
> >+    munmap(testarea, pagesize);
> >+    munmap(testarea2, pagesize);
> >+    return 0;
> >+}
> >+
> >+#else
> >+/* No target OS support, stubs just fail */
> >+
> >+int postcopy_ram_hosttest(void)
> >+{
> >+    error_report("postcopy_ram_hosttest: No OS support");
> >+    return -1;
> >+}
> >+
> >+#endif
> >+
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/43] Postcopy implementation
  2014-08-12  1:50 ` [Qemu-devel] [PATCH v2 00/43] Postcopy implementation zhanghailiang
@ 2014-08-12  9:19   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert @ 2014-08-12  9:19 UTC (permalink / raw)
  To: zhanghailiang; +Cc: aarcange, yamahata, quintela, qemu-devel, lilei

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> On 2014/8/11 22:29, Dr. David Alan Gilbert (git) wrote:
> >From: "Dr. David Alan Gilbert"<dgilbert@redhat.com>


> Hi Dave,
> 
> I want to test your patches, but i failed to 'git am' them to the new
> qemu-2.1 source. I want to know if you has a git-branch which i can
> git clone directly? Thanks.

I should have one setup in a few days, I'll let you know.

(The code is based on HEAD as of yesterday, rather than 2.1)

Dave


> 
> Best regards,
> zhanghailiang
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] ????: [PATCH v2 06/43] Return path: socket_writev_buffer:?Block even on non-blocking fd's
  2014-08-12  2:13   ` [Qemu-devel] 答复: " chenliang (T)
@ 2014-08-12  9:36     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 55+ messages in thread
From: Dr. David Alan Gilbert @ 2014-08-12  9:36 UTC (permalink / raw)
  To: chenliang (T); +Cc: qemu-devel

* chenliang (T) (chenliang88@huawei.com) wrote:
> 
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> The return path uses a non-blocking fd so as not to block waiting for the (possibly broken) destination to finish returning a message, however we still want outbound data to behave in the same way and block.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> Hi David
>   It is confusing why don't use blocking fd?

I want to make sure that if the destination guest hung during
sending a return path command, that the source wouldn't block
the main thread (in which I'm currently processing the RP commands).

Paolo made a suggestion to move the RP processing to a separate
thread, which would also avoid the problem, and I'll be trying that.

Dave

> Best regards
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2014-08-12  9:42 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-11 14:29 [Qemu-devel] [PATCH v2 00/43] Postcopy implementation Dr. David Alan Gilbert (git)
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 01/43] qemu_ram_foreach_block: pass up error value, and down the ramblock name Dr. David Alan Gilbert (git)
2014-08-11 18:29   ` Eric Blake
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 02/43] improve DPRINTF macros, add to savevm Dr. David Alan Gilbert (git)
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 03/43] Add qemu_get_counted_string to read a string prefixed by a count byte Dr. David Alan Gilbert (git)
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 04/43] Create MigrationIncomingState Dr. David Alan Gilbert (git)
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 05/43] Return path: Open a return path on QEMUFile for sockets Dr. David Alan Gilbert (git)
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 06/43] Return path: socket_writev_buffer: Block even on non-blocking fd's Dr. David Alan Gilbert (git)
2014-08-12  2:13   ` [Qemu-devel] 答复: " chenliang (T)
2014-08-12  9:36     ` [Qemu-devel] ????: [PATCH v2 06/43] Return path: socket_writev_buffer:?Block " Dr. David Alan Gilbert
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 07/43] Migration commands Dr. David Alan Gilbert (git)
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 08/43] Return path: Control commands Dr. David Alan Gilbert (git)
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 09/43] Return path: Send responses from destination to source Dr. David Alan Gilbert (git)
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 10/43] Return path: Source handling of return path Dr. David Alan Gilbert (git)
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 11/43] qemu_loadvm errors and debug Dr. David Alan Gilbert (git)
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 12/43] ram_debug_dump_bitmap: Dump a migration bitmap as text Dr. David Alan Gilbert (git)
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 13/43] Rework loadvm path for subloops Dr. David Alan Gilbert (git)
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 14/43] Add migration-capability boolean for postcopy-ram Dr. David Alan Gilbert (git)
2014-08-11 16:47   ` Eric Blake
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 15/43] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages Dr. David Alan Gilbert (git)
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 16/43] QEMU_VM_CMD_PACKAGED: Send a packaged chunk of migration stream Dr. David Alan Gilbert (git)
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 17/43] migrate_init: Call from savevm Dr. David Alan Gilbert (git)
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 18/43] Allow savevm handlers to state whether they could go into postcopy Dr. David Alan Gilbert (git)
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 19/43] postcopy: OS support test Dr. David Alan Gilbert (git)
2014-08-12  5:32   ` zhanghailiang
2014-08-12  8:18     ` Dr. David Alan Gilbert
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 20/43] migrate_start_postcopy: Command to trigger transition to postcopy Dr. David Alan Gilbert (git)
2014-08-11 17:01   ` Eric Blake
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 21/43] MIG_STATE_POSTCOPY_ACTIVE: Add new migration state Dr. David Alan Gilbert (git)
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 22/43] qemu_savevm_state_complete: Postcopy changes Dr. David Alan Gilbert (git)
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 23/43] Postcopy: Maintain sentmap during postcopy pre phase Dr. David Alan Gilbert (git)
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 24/43] Postcopy page-map-incoming (PMI) structure Dr. David Alan Gilbert (git)
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 25/43] postcopy: Add incoming_init/cleanup functions Dr. David Alan Gilbert (git)
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 26/43] postcopy: Incoming initialisation Dr. David Alan Gilbert (git)
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 27/43] postcopy: ram_enable_notify to switch on userfault Dr. David Alan Gilbert (git)
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 28/43] Postcopy: postcopy_start Dr. David Alan Gilbert (git)
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 29/43] Postcopy: Rework migration thread for postcopy mode Dr. David Alan Gilbert (git)
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 30/43] mig fd_connect: open return path Dr. David Alan Gilbert (git)
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 31/43] Postcopy: Create a fault handler thread before marking the ram as userfault Dr. David Alan Gilbert (git)
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 32/43] Page request: Add MIG_RPCOMM_REQPAGES reverse command Dr. David Alan Gilbert (git)
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 33/43] Page request: Process incoming page request Dr. David Alan Gilbert (git)
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 34/43] Page request: Consume pages off the post-copy queue Dr. David Alan Gilbert (git)
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 35/43] Add assertion to check migration_dirty_pages Dr. David Alan Gilbert (git)
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 36/43] postcopy_ram.c: place_page and helpers Dr. David Alan Gilbert (git)
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 37/43] Postcopy: Use helpers to map pages during migration Dr. David Alan Gilbert (git)
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 38/43] qemu_ram_block_from_host Dr. David Alan Gilbert (git)
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 39/43] Postcopy; Handle userfault requests Dr. David Alan Gilbert (git)
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 40/43] Start up a postcopy/listener thread ready for incoming page data Dr. David Alan Gilbert (git)
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 41/43] postcopy: Wire up loadvm_postcopy_ram_handle_{run, end} commands Dr. David Alan Gilbert (git)
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 42/43] End of migration for postcopy Dr. David Alan Gilbert (git)
2014-08-11 14:29 ` [Qemu-devel] [PATCH v2 43/43] Start documenting how postcopy works Dr. David Alan Gilbert (git)
2014-08-11 17:19   ` Eric Blake
2014-08-11 17:58     ` Dr. David Alan Gilbert
2014-08-12  1:50 ` [Qemu-devel] [PATCH v2 00/43] Postcopy implementation zhanghailiang
2014-08-12  9:19   ` Dr. David Alan Gilbert

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.