* [Qemu-devel] [PATCH 0/6] Recovery from network failure during Postcopy Migration
@ 2016-08-21 20:58 Md Haris Iqbal
  2016-08-21 20:58 ` [Qemu-devel] [PATCH 1/6] Migration: Reconnect network in case of network failure during pc migration (source) Md Haris Iqbal
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: Md Haris Iqbal @ 2016-08-21 20:58 UTC
  To: qemu-devel; +Cc: dgilbert, Md Haris Iqbal

Final Report Link : https://harisphnx.github.io

Usage : After the network fails, wait for both sides to error out and enter
        recovery. Each side prints error messages at its terminal when this
        happens.
        Once both sides are in recovery, reconnection can be attempted.
        First, on the destination side console, enter the migrate_incoming
        command with the new -r flag, along with the same details that were
        used at the beginning.
        Then do the same on the source side: use the migrate command with the
        new -r flag and similar details to those used at the beginning.
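
For readers who want the shape of the handshake before reading the patches:
the failed side parks its thread on a condition variable, and the later
reconnect (triggered by the -r command) attaches a new channel and wakes it.
Below is a minimal standalone sketch of that pattern using plain pthreads
rather than QEMU's QemuMutex/QemuCond wrappers. RecoveryState,
enter_recovery(), wait_for_recovery(), complete_recovery() and
run_recovery_demo() are hypothetical names made up for this illustration;
they are not part of the series.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few MigrationState fields involved. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool in_recovery;     /* true while waiting for a reconnect */
    void *to_dst_file;    /* NULL until a new channel is attached */
} RecoveryState;

static void recovery_state_init(RecoveryState *rs)
{
    pthread_mutex_init(&rs->lock, NULL);
    pthread_cond_init(&rs->cond, NULL);
    rs->in_recovery = false;
    rs->to_dst_file = NULL;
}

/* Failed side: mark the migration as waiting for recovery. */
static void enter_recovery(RecoveryState *rs)
{
    pthread_mutex_lock(&rs->lock);
    rs->in_recovery = true;
    pthread_mutex_unlock(&rs->lock);
}

/* Failed side: block until a reconnect clears in_recovery.
 * Returns 0 if a channel was attached, -1 otherwise. */
static int wait_for_recovery(RecoveryState *rs)
{
    int ret;

    pthread_mutex_lock(&rs->lock);
    while (rs->in_recovery) {
        pthread_cond_wait(&rs->cond, &rs->lock);
    }
    ret = rs->to_dst_file ? 0 : -1;
    pthread_mutex_unlock(&rs->lock);
    return ret;
}

/* Reconnect side: attach the new channel and wake the waiter. */
static void complete_recovery(RecoveryState *rs, void *new_channel)
{
    pthread_mutex_lock(&rs->lock);
    rs->to_dst_file = new_channel;
    rs->in_recovery = false;
    pthread_cond_signal(&rs->cond);
    pthread_mutex_unlock(&rs->lock);
}

typedef struct {
    RecoveryState *rs;
    void *channel;
} ReconnectArgs;

static void *reconnect_thread(void *opaque)
{
    ReconnectArgs *args = opaque;

    complete_recovery(args->rs, args->channel);
    return NULL;
}

/* Runs one failure/reconnect cycle and returns the waiter's result. */
static int run_recovery_demo(void *new_channel)
{
    RecoveryState rs;
    ReconnectArgs args = { &rs, new_channel };
    pthread_t tid;
    int ret;

    recovery_state_init(&rs);
    enter_recovery(&rs);      /* set the flag before the reconnect can run */
    ret = pthread_create(&tid, NULL, reconnect_thread, &args);
    assert(ret == 0);
    ret = wait_for_recovery(&rs);
    pthread_join(tid, NULL);
    pthread_cond_destroy(&rs.cond);
    pthread_mutex_destroy(&rs.lock);
    return ret;
}
```

In this sketch the flag is set before the reconnecting thread is started and
is only read and written under the mutex, so a wakeup cannot be lost even if
the reconnect finishes before the waiter reaches pthread_cond_wait().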

Md Haris Iqbal (6):
  Migration: Reconnect network in case of network failure during pc
    migration (source)
  migration : General additions for migration recovery
  Migration: Reconnect network in case of network failure during pc
    migration (destination)
  Migration: New bitmap for postcopy migration failure
  Migration: Recovering pages lost due to n/w failure during pc
    migration (source)
  Migration: Recovering pages lost due to n/w failure during pc
    migration (destination)

 hmp-commands.hx               |  34 +++---
 hmp.c                         |   7 +-
 include/migration/migration.h |  17 +++
 include/migration/qemu-file.h |   1 +
 include/sysemu/sysemu.h       |   1 +
 migration/migration.c         | 259 ++++++++++++++++++++++++++++++++++++++----
 migration/postcopy-ram.c      |  12 ++
 migration/qemu-file.c         |   5 +
 migration/ram.c               | 108 +++++++++++++++++-
 migration/savevm.c            |  52 +++++++--
 qapi-schema.json              |  18 ++-
 qemu-version.h                |   1 +
 qmp-commands.hx               |   7 +-
 vl.c                          |   4 +
 14 files changed, 472 insertions(+), 54 deletions(-)
 create mode 100644 qemu-version.h

-- 
2.7.4


* [Qemu-devel] [PATCH 1/6] Migration: Reconnect network in case of network failure during pc migration (source)
  2016-08-21 20:58 [Qemu-devel] [PATCH 0/6] Recovery from network failure during Postcopy Migration Md Haris Iqbal
@ 2016-08-21 20:58 ` Md Haris Iqbal
  2016-08-21 20:58 ` [Qemu-devel] [PATCH 2/6] migration : General additions for migration recovery Md Haris Iqbal
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Md Haris Iqbal @ 2016-08-21 20:58 UTC
  To: qemu-devel; +Cc: dgilbert, Md Haris Iqbal

Signed-off-by: Md Haris Iqbal <haris.phnx@gmail.com>
---
 hmp-commands.hx               |  20 +++---
 hmp.c                         |   4 +-
 include/migration/migration.h |   4 ++
 migration/migration.c         | 153 +++++++++++++++++++++++++++++++++++++-----
 qapi-schema.json              |  16 ++++-
 qmp-commands.hx               |   3 +-
 vl.c                          |   4 ++
 7 files changed, 174 insertions(+), 30 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 848efee..8f765fd 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -894,23 +894,25 @@ ETEXI
 
     {
         .name       = "migrate",
-        .args_type  = "detach:-d,blk:-b,inc:-i,uri:s",
-        .params     = "[-d] [-b] [-i] uri",
+        .args_type  = "detach:-d,recover:-r,blk:-b,inc:-i,uri:s",
+        .params     = "[-d] [-r] [-b] [-i] uri",
         .help       = "migrate to URI (using -d to not wait for completion)"
-		      "\n\t\t\t -b for migration without shared storage with"
-		      " full copy of disk\n\t\t\t -i for migration without "
-		      "shared storage with incremental copy of disk "
-		      "(base image shared between src and destination)",
+                     "\n\t\t\t -r to recover from a broken migration\n\t\t\t"
+                     " -b for migration without shared storage with"
+                     " full copy of disk\n\t\t\t -i for migration without "
+                     "shared storage with incremental copy of disk "
+                     "(base image shared between src and destination)",
         .mhandler.cmd = hmp_migrate,
     },
 
 
 STEXI
-@item migrate [-d] [-b] [-i] @var{uri}
+@item migrate [-d] [-r] [-b] [-i] @var{uri}
 @findex migrate
 Migrate to @var{uri} (using -d to not wait for completion).
-	-b for migration with full copy of disk
-	-i for migration with incremental copy of disk (base image is shared)
+       -r to recover from a broken migration
+       -b for migration with full copy of disk
+       -i for migration with incremental copy of disk (base image is shared)
 ETEXI
 
     {
diff --git a/hmp.c b/hmp.c
index cc2056e..02ed457 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1563,12 +1563,14 @@ static void hmp_migrate_status_cb(void *opaque)
 void hmp_migrate(Monitor *mon, const QDict *qdict)
 {
     bool detach = qdict_get_try_bool(qdict, "detach", false);
+    bool recover = qdict_get_try_bool(qdict, "recover", false);
     bool blk = qdict_get_try_bool(qdict, "blk", false);
     bool inc = qdict_get_try_bool(qdict, "inc", false);
     const char *uri = qdict_get_str(qdict, "uri");
     Error *err = NULL;
 
-    qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false, &err);
+    qmp_migrate(uri, !!recover, recover, !!blk, blk, !!inc, inc, false, false,
+                &err);
     if (err) {
         error_report_err(err);
         return;
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 3c96623..bcaf55d 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -142,6 +142,7 @@ struct MigrationState
     int state;
     /* Old style params from 'migrate' command */
     MigrationParams params;
+    bool in_recovery;
 
     /* State related to return path */
     struct {
@@ -351,6 +352,9 @@ void flush_page_queue(MigrationState *ms);
 int ram_save_queue_pages(MigrationState *ms, const char *rbname,
                          ram_addr_t start, ram_addr_t len);
 
+int qemu_migrate_postcopy_outgoing_recovery(MigrationState *ms);
+int qemu_migrate_postcopy_incoming_recovery(QEMUFile **f,MigrationIncomingState* mis);
+
 PostcopyState postcopy_state_get(void);
 /* Set the state and return the old state */
 PostcopyState postcopy_state_set(PostcopyState new_state);
diff --git a/migration/migration.c b/migration/migration.c
index 955d5ee..6ed2e82 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -709,6 +709,33 @@ MigrationInfo *qmp_query_migrate(Error **errp)
     case MIGRATION_STATUS_CANCELLED:
         info->has_status = true;
         break;
+    case MIGRATION_STATUS_POSTCOPY_RECOVERY:
+        info->has_status = true;
+        info->has_total_time = true;
+        info->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+
+        info->has_ram = true;
+        info->ram = g_malloc0(sizeof(*info->ram));
+        info->ram->transferred = ram_bytes_transferred();
+        info->ram->remaining = ram_bytes_remaining();
+        info->ram->total = ram_bytes_total();
+        info->ram->duplicate = dup_mig_pages_transferred();
+        info->ram->skipped = skipped_mig_pages_transferred();
+        info->ram->normal = norm_mig_pages_transferred();
+        info->ram->normal_bytes = norm_mig_bytes_transferred();
+        info->ram->dirty_pages_rate = s->dirty_pages_rate;
+        info->ram->mbps = s->mbps;
+        info->ram->dirty_sync_count = s->dirty_sync_count;
+
+        if (blk_mig_active()) {
+            info->has_disk = true;
+            info->disk = g_malloc0(sizeof(*info->disk));
+            info->disk->transferred = blk_mig_bytes_transferred();
+            info->disk->remaining = blk_mig_bytes_remaining();
+            info->disk->total = blk_mig_bytes_total();
+        }
+
+        get_xbzrle_cache_stats(info);
     }
     info->status = s->state;
 
@@ -993,6 +1020,7 @@ MigrationState *migrate_init(const MigrationParams *params)
     s->xfer_limit = 0;
     s->cleanup_bh = 0;
     s->to_dst_file = NULL;
+    s->in_recovery = false;
     s->state = MIGRATION_STATUS_NONE;
     s->params = *params;
     s->rp_state.from_dst_file = NULL;
@@ -1069,13 +1097,14 @@ bool migration_is_blocked(Error **errp)
     return false;
 }
 
-void qmp_migrate(const char *uri, bool has_blk, bool blk,
-                 bool has_inc, bool inc, bool has_detach, bool detach,
+void qmp_migrate(const char *uri, bool in_recover, bool recover, bool has_blk,
+                 bool blk, bool has_inc, bool inc, bool has_detach, bool detach,
                  Error **errp)
 {
     Error *local_err = NULL;
     MigrationState *s = migrate_get_current();
     MigrationParams params;
+    bool recovery = in_recover && recover;
     const char *p;
 
     params.blk = has_blk && blk;
@@ -1095,7 +1124,39 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
         return;
     }
 
-    s = migrate_init(&params);
+    if (recovery ^ atomic_mb_read(&s->in_recovery)) {
+        if (recovery) {
+            /* No VM is waiting for recovery and
+             * recovery option was set
+             */
+
+            error_setg(errp, "No VM to recover");
+            return;
+        } else {
+            /* A VM is waiting for recovery and
+             * no recovery option is set
+             */
+
+            error_setg(errp, "A migration is in recovery state");
+            return;
+        }
+    } else {
+        if (!recovery) {
+            /* No VM is waiting for recovery and
+             * no recovery option is set
+             */
+            s = migrate_init(&params);
+        } else {
+            /* A VM is waiting for recovery and
+             * recovery option was set
+             */
+            s->to_dst_file = NULL;
+            if (s->rp_state.from_dst_file) {
+                /* shutdown the rp socket, so causing the rp thread to shutdown */
+                qemu_file_shutdown(s->rp_state.from_dst_file);
+            }
+        }
+    }
 
     if (strstart(uri, "tcp:", &p)) {
         tcp_start_outgoing_migration(s, p, &local_err);
@@ -1336,6 +1397,8 @@ static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
  */
 static void *source_return_path_thread(void *opaque)
 {
+    fprintf(stderr, "Return path started on source\n");
+
     MigrationState *ms = opaque;
     QEMUFile *rp = ms->rp_state.from_dst_file;
     uint16_t header_len, header_type;
@@ -1439,8 +1502,8 @@ static void *source_return_path_thread(void *opaque)
 
     trace_source_return_path_thread_end();
 out:
-    ms->rp_state.from_dst_file = NULL;
     qemu_fclose(rp);
+    fprintf(stderr, "Return path failed on source\n");
     return NULL;
 }
 
@@ -1714,6 +1777,7 @@ static void *migration_thread(void *opaque)
     bool entered_postcopy = false;
     /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */
     enum MigrationStatus current_active_state = MIGRATION_STATUS_ACTIVE;
+    int ret;
 
     rcu_register_thread();
 
@@ -1781,7 +1845,26 @@ static void *migration_thread(void *opaque)
             }
         }
 
-        if (qemu_file_get_error(s->to_dst_file)) {
+        if ((ret = qemu_file_get_error(s->to_dst_file))) {
+            /*  This check is based on how the error is set during the network
+             *  recv(). When recv() returns 0 (i.e. no data to read), the error
+             *  is set to -EIO. For all other network errors, it is set
+             *  according to the return value received.
+             */
+            if (ret == -EIO && s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+                /* Network Failure during postcopy */
+
+                current_active_state = MIGRATION_STATUS_POSTCOPY_RECOVERY;
+                runstate_set(RUN_STATE_POSTMIGRATE_RECOVERY);
+                ret = qemu_migrate_postcopy_outgoing_recovery(s);
+                if(ret == 0) {
+                    current_active_state = MIGRATION_STATUS_POSTCOPY_ACTIVE;
+                    runstate_set(RUN_STATE_FINISH_MIGRATE);
+                    qemu_file_clear_error(s->to_dst_file);
+                    continue;
+                }
+
+            }
             migrate_set_state(&s->state, current_active_state,
                               MIGRATION_STATUS_FAILED);
             trace_migration_thread_file_err();
@@ -1852,17 +1935,6 @@ static void *migration_thread(void *opaque)
 
 void migrate_fd_connect(MigrationState *s)
 {
-    /* This is a best 1st approximation. ns to ms */
-    s->expected_downtime = max_downtime/1000000;
-    s->cleanup_bh = qemu_bh_new(migrate_fd_cleanup, s);
-
-    qemu_file_set_blocking(s->to_dst_file, true);
-    qemu_file_set_rate_limit(s->to_dst_file,
-                             s->bandwidth_limit / XFER_LIMIT_RATIO);
-
-    /* Notify before starting migration thread */
-    notifier_list_notify(&migration_state_notifiers, s);
-
     /*
      * Open the return path; currently for postcopy but other things might
      * also want it.
@@ -1877,12 +1949,61 @@ void migrate_fd_connect(MigrationState *s)
         }
     }
 
+    qemu_file_set_blocking(s->to_dst_file, true);
+    qemu_file_set_rate_limit(s->to_dst_file,
+                             s->bandwidth_limit / XFER_LIMIT_RATIO);
+
+    if (atomic_mb_read(&s->in_recovery)) {
+        qemu_mutex_lock(&migration_recovery_mutex);
+        atomic_mb_set(&s->in_recovery, false);
+        qemu_cond_signal(&migration_recovery_cond);
+        qemu_mutex_unlock(&migration_recovery_mutex);
+
+        fprintf(stderr, "recovered\n");
+        return;
+    }
+
+    /* This is a best 1st approximation. ns to ms */
+    s->expected_downtime = max_downtime/1000000;
+    s->cleanup_bh = qemu_bh_new(migrate_fd_cleanup, s);
+
+
+    /* Notify before starting migration thread */
+    notifier_list_notify(&migration_state_notifiers, s);
+
     migrate_compress_threads_create();
     qemu_thread_create(&s->thread, "migration", migration_thread, s,
                        QEMU_THREAD_JOINABLE);
     s->migration_thread_running = true;
 }
 
+int qemu_migrate_postcopy_outgoing_recovery(MigrationState* ms)
+{
+    migrate_set_state(&ms->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+                                  MIGRATION_STATUS_POSTCOPY_RECOVERY);
+
+    atomic_mb_set(&ms->in_recovery, true);
+    /* Code for network recovery to be added here */
+    qemu_mutex_lock(&migration_recovery_mutex);
+    while(atomic_mb_read(&ms->in_recovery) == true) {
+        fprintf(stderr, "Under recovery, not letting it fail %p\n", ms->to_dst_file);
+        qemu_cond_wait(&migration_recovery_cond, &migration_recovery_mutex);
+    }
+    qemu_mutex_unlock(&migration_recovery_mutex);
+
+    if(ms->to_dst_file != NULL) {
+        /* Recovery successful */
+        migrate_set_state(&ms->state, MIGRATION_STATUS_POSTCOPY_RECOVERY,
+                                      MIGRATION_STATUS_POSTCOPY_ACTIVE);
+
+        qemu_savevm_send_open_return_path(ms->to_dst_file);
+        return 0;
+    }
+
+    return -1;
+
+}
+
 PostcopyState  postcopy_state_get(void)
 {
     return atomic_mb_read(&incoming_postcopy_state);
diff --git a/qapi-schema.json b/qapi-schema.json
index 5658723..a658462 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -154,12 +154,15 @@
 # @watchdog: the watchdog action is configured to pause and has been triggered
 #
 # @guest-panicked: guest has been panicked as a result of guest OS panic
+#
+# @postmigrate-recovery: guest is paused for recovery after a network failure
+# (since 2.7)
 ##
 { 'enum': 'RunState',
   'data': [ 'debug', 'inmigrate', 'internal-error', 'io-error', 'paused',
             'postmigrate', 'prelaunch', 'finish-migrate', 'restore-vm',
             'running', 'save-vm', 'shutdown', 'suspended', 'watchdog',
-            'guest-panicked' ] }
+            'guest-panicked', 'postmigrate-recovery' ] }
 
 ##
 # @StatusInfo:
@@ -438,12 +441,15 @@
 #
 # @failed: some error occurred during migration process.
 #
+# @postcopy-recovery: in recovery mode, after a network failure. (since 2.7)
+#
 # Since: 2.3
 #
 ##
 { 'enum': 'MigrationStatus',
   'data': [ 'none', 'setup', 'cancelling', 'cancelled',
-            'active', 'postcopy-active', 'completed', 'failed' ] }
+            'active', 'postcopy-active', 'completed', 'failed',
+            'postcopy-recovery' ] }
 
 ##
 # @MigrationInfo
@@ -2119,6 +2125,8 @@
 #
 # @uri: the Uniform Resource Identifier of the destination VM
 #
+# @recover: #optional recover from a broken migration (since 2.7)
+#
 # @blk: #optional do block migration (full disk copy)
 #
 # @inc: #optional incremental disk copy migration
@@ -2131,7 +2139,7 @@
 # Since: 0.14.0
 ##
 { 'command': 'migrate',
-  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' } }
+  'data': {'uri': 'str', '*recover': 'bool', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' } }
 
 ##
 # @migrate-incoming
@@ -2142,6 +2150,8 @@
 # @uri: The Uniform Resource Identifier identifying the source or
 #       address to listen on
 #
+# @recover: #optional recover from a broken migration (since 2.7)
+#
 # Returns: nothing on success
 #
 # Since: 2.3
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 6866264..dd727bf 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -639,7 +639,7 @@ EQMP
 
     {
         .name       = "migrate",
-        .args_type  = "detach:-d,blk:-b,inc:-i,uri:s",
+        .args_type  = "detach:-d,recover:-r,blk:-b,inc:-i,uri:s",
         .mhandler.cmd_new = qmp_marshal_migrate,
     },
 
@@ -651,6 +651,7 @@ Migrate to URI.
 
 Arguments:
 
+- "recover": recover migration (json-bool, optional)
 - "blk": block migration, full disk copy (json-bool, optional)
 - "inc": incremental disk copy (json-bool, optional)
 - "uri": Destination URI (json-string)
diff --git a/vl.c b/vl.c
index b3c80d5..f702886 100644
--- a/vl.c
+++ b/vl.c
@@ -597,6 +597,10 @@ static const RunStateTransition runstate_transitions_def[] = {
     { RUN_STATE_FINISH_MIGRATE, RUN_STATE_RUNNING },
     { RUN_STATE_FINISH_MIGRATE, RUN_STATE_POSTMIGRATE },
     { RUN_STATE_FINISH_MIGRATE, RUN_STATE_PRELAUNCH },
+    { RUN_STATE_FINISH_MIGRATE, RUN_STATE_POSTMIGRATE_RECOVERY },
+
+    { RUN_STATE_POSTMIGRATE_RECOVERY, RUN_STATE_FINISH_MIGRATE },
+    { RUN_STATE_POSTMIGRATE_RECOVERY, RUN_STATE_SHUTDOWN },
 
     { RUN_STATE_RESTORE_VM, RUN_STATE_RUNNING },
     { RUN_STATE_RESTORE_VM, RUN_STATE_PRELAUNCH },
-- 
2.7.4


* [Qemu-devel] [PATCH 2/6] migration : General additions for migration recovery
  2016-08-21 20:58 [Qemu-devel] [PATCH 0/6] Recovery from network failure during Postcopy Migration Md Haris Iqbal
  2016-08-21 20:58 ` [Qemu-devel] [PATCH 1/6] Migration: Reconnect network in case of network failure during pc migration (source) Md Haris Iqbal
@ 2016-08-21 20:58 ` Md Haris Iqbal
  2016-08-21 20:58 ` [Qemu-devel] [PATCH 3/6] Migration: Reconnect network in case of network failure during pc migration (destination) Md Haris Iqbal
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Md Haris Iqbal @ 2016-08-21 20:58 UTC
  To: qemu-devel; +Cc: dgilbert, Md Haris Iqbal

Signed-off-by: Md Haris Iqbal <haris.phnx@gmail.com>
---
 include/migration/qemu-file.h | 1 +
 migration/migration.c         | 3 +++
 migration/qemu-file.c         | 5 +++++
 3 files changed, 9 insertions(+)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index abedd46..56a51b9 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -181,6 +181,7 @@ void qemu_file_set_rate_limit(QEMUFile *f, int64_t new_rate);
 int64_t qemu_file_get_rate_limit(QEMUFile *f);
 int qemu_file_get_error(QEMUFile *f);
 void qemu_file_set_error(QEMUFile *f, int ret);
+void qemu_file_clear_error(QEMUFile *f);
 int qemu_file_shutdown(QEMUFile *f);
 QEMUFile *qemu_file_get_return_path(QEMUFile *f);
 void qemu_fflush(QEMUFile *f);
diff --git a/migration/migration.c b/migration/migration.c
index 6ed2e82..149cf1e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -63,6 +63,9 @@ static NotifierList migration_state_notifiers =
 
 static bool deferred_incoming;
 
+static QemuMutex migration_recovery_mutex;
+static QemuCond migration_recovery_cond;
+
 /*
  * Current state of incoming postcopy; note this is not part of
  * MigrationIncomingState since it's state is used during cleanup
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index e9fae31..60e53c9 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -127,6 +127,11 @@ void qemu_file_set_error(QEMUFile *f, int ret)
     }
 }
 
+void qemu_file_clear_error(QEMUFile *f)
+{
+    f->last_error = 0;
+}
+
 bool qemu_file_is_writable(QEMUFile *f)
 {
     return f->ops->writev_buffer;
-- 
2.7.4


* [Qemu-devel] [PATCH 3/6] Migration: Reconnect network in case of network failure during pc migration (destination)
  2016-08-21 20:58 [Qemu-devel] [PATCH 0/6] Recovery from network failure during Postcopy Migration Md Haris Iqbal
  2016-08-21 20:58 ` [Qemu-devel] [PATCH 1/6] Migration: Reconnect network in case of network failure during pc migration (source) Md Haris Iqbal
  2016-08-21 20:58 ` [Qemu-devel] [PATCH 2/6] migration : General additions for migration recovery Md Haris Iqbal
@ 2016-08-21 20:58 ` Md Haris Iqbal
  2016-08-21 20:58 ` [Qemu-devel] [PATCH 4/6] Migration: New bitmap for postcopy migration failure Md Haris Iqbal
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Md Haris Iqbal @ 2016-08-21 20:58 UTC
  To: qemu-devel; +Cc: dgilbert, Md Haris Iqbal

Signed-off-by: Md Haris Iqbal <haris.phnx@gmail.com>
---
 hmp-commands.hx               | 14 ++++---
 hmp.c                         |  3 +-
 include/migration/migration.h |  3 ++
 migration/migration.c         | 97 ++++++++++++++++++++++++++++++++++++++++---
 migration/postcopy-ram.c      |  9 ++++
 migration/savevm.c            | 35 ++++++++++++----
 qapi-schema.json              |  2 +-
 qemu-version.h                |  1 +
 qmp-commands.hx               |  4 +-
 9 files changed, 145 insertions(+), 23 deletions(-)
 create mode 100644 qemu-version.h

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 8f765fd..e468c53 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -932,17 +932,19 @@ ETEXI
 
     {
         .name       = "migrate_incoming",
-        .args_type  = "uri:s",
-        .params     = "uri",
-        .help       = "Continue an incoming migration from an -incoming defer",
+        .args_type  = "recover:-r,uri:s",
+        .params     = "[-r] uri",
+        .help       = "Continue an incoming migration from an -incoming defer"
+                     "\n\t\t\t -r to recover from a broken migration",
         .mhandler.cmd = hmp_migrate_incoming,
     },
 
 STEXI
-@item migrate_incoming @var{uri}
+@item migrate_incoming [-r] @var{uri}
 @findex migrate_incoming
-Continue an incoming migration using the @var{uri} (that has the same syntax
-as the -incoming option).
+Continue an incoming migration using the @var{uri} (that has the same syntax
+as the -incoming option).
+    -r to recover from a broken migration
 
 ETEXI
 
diff --git a/hmp.c b/hmp.c
index 02ed457..965e4f3 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1186,9 +1186,10 @@ void hmp_migrate_cancel(Monitor *mon, const QDict *qdict)
 void hmp_migrate_incoming(Monitor *mon, const QDict *qdict)
 {
     Error *err = NULL;
+    bool recover = qdict_get_try_bool(qdict, "recover", false);
     const char *uri = qdict_get_str(qdict, "uri");
 
-    qmp_migrate_incoming(uri, &err);
+    qmp_migrate_incoming(uri, !!recover, recover, &err);
 
     hmp_handle_error(mon, &err);
 }
diff --git a/include/migration/migration.h b/include/migration/migration.h
index bcaf55d..74d456e 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -82,6 +82,9 @@ typedef enum {
 struct MigrationIncomingState {
     QEMUFile *from_src_file;
 
+    /* To be used by a VM for recovery */
+    bool in_recovery;
+
     /*
      * Free at the start of the main state load, set as the main thread finishes
      * loading state.
diff --git a/migration/migration.c b/migration/migration.c
index 149cf1e..166f4f7 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -432,6 +432,8 @@ void migration_fd_process_incoming(QEMUFile *f)
 void migration_channel_process_incoming(MigrationState *s,
                                         QIOChannel *ioc)
 {
+    MigrationIncomingState *mis = migration_incoming_get_current();
+
     trace_migration_set_incoming_channel(
         ioc, object_get_typename(OBJECT(ioc)));
 
@@ -445,6 +447,19 @@ void migration_channel_process_incoming(MigrationState *s,
         }
     } else {
         QEMUFile *f = qemu_fopen_channel_input(ioc);
+
+        if (mis != NULL && atomic_mb_read(&mis->in_recovery)) {
+            mis->from_src_file = f;
+
+            qemu_mutex_lock(&migration_recovery_mutex);
+            atomic_mb_set(&mis->in_recovery, false);
+            qemu_cond_signal(&migration_recovery_cond);
+            qemu_mutex_unlock(&migration_recovery_mutex);
+
+            fprintf(stderr, "recovered\n");
+            return;
+        }
+
         migration_fd_process_incoming(f);
     }
 }
@@ -1063,19 +1078,62 @@ void migrate_del_blocker(Error *reason)
     migration_blockers = g_slist_remove(migration_blockers, reason);
 }
 
-void qmp_migrate_incoming(const char *uri, Error **errp)
+void qmp_migrate_incoming(const char *uri, bool in_recover, bool recover, Error **errp)
 {
     Error *local_err = NULL;
+    bool recovery = in_recover && recover;
     static bool once = true;
+    MigrationIncomingState *mis = migration_incoming_get_current();
 
-    if (!deferred_incoming) {
-        error_setg(errp, "For use with '-incoming defer'");
-        return;
-    }
-    if (!once) {
+    if (recovery) {
+        if (mis != NULL) {
+
+            if(!atomic_mb_read(&mis->in_recovery)) {
+                /* Recovery option was set but the VM
+                 * Does not seem to have been in recovery
+                 */
+                error_setg(errp, "No VM to recover");
+                return;
+            } else {
+                /* Recovery option was set and the VM
+                 * needs a recovery, resetting the socket
+                 * to NULL
+                 */
+                mis->from_src_file = NULL;
+                if(mis->have_fault_thread) {
+                    /* shutdown the socket to source, causing the fault_thread to shutdown */
+                    uint64_t tmp64 = 1;
+
+                    fprintf(stderr, "rp shutdown\n");
+
+                    if (write(mis->userfault_quit_fd, &tmp64, 8) != 8) {
+                        error_report("%s: incrementing userfault_quit_fd: %s",
+                            __func__, strerror(errno));
+                    }
+                    close(mis->userfault_quit_fd);
+                    close(mis->userfault_fd);
+                    mis->have_fault_thread = false;
+                }
+                fprintf(stderr, "rp after shutdown %p\n", mis->to_src_file);
+            }
+
+        } else {
+            /* Recovery option was set but there
+             * is no VM running/(in recovery)
+             */
+            error_setg(errp, "Cannot use -r option without a VM to recover");
+            return;
+        }
+    } else if (!once) {
         error_setg(errp, "The incoming migration has already been started");
     }
 
+    if (!recover && !deferred_incoming) {
+         error_setg(errp, "For use with '-incoming defer'");
+         return;
+     }
+
+
     qemu_start_incoming_migration(uri, &local_err);
 
     if (local_err) {
@@ -2007,6 +2065,33 @@ int qemu_migrate_postcopy_outgoing_recovery(MigrationState* ms)
 
 }
 
+int qemu_migrate_postcopy_incoming_recovery(QEMUFile **f,
+                                            MigrationIncomingState* mis)
+{
+    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
+                                   MIGRATION_STATUS_POSTCOPY_RECOVERY);
+
+    atomic_mb_set(&mis->in_recovery, true);
+    /* Code for network recovery to be added here */
+    qemu_mutex_lock(&migration_recovery_mutex);
+    while(atomic_mb_read(&mis->in_recovery) == true) {
+        fprintf(stderr, "Recover, not letting it fail %p\n", mis->from_src_file);
+        qemu_cond_wait(&migration_recovery_cond, &migration_recovery_mutex);
+    }
+    qemu_mutex_unlock(&migration_recovery_mutex);
+
+    if(mis->from_src_file != NULL) {
+        *f = mis->from_src_file;
+
+        migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_RECOVERY,
+                                       MIGRATION_STATUS_ACTIVE);
+        return 0;
+    }
+
+    return -1;
+}
+
+
 PostcopyState  postcopy_state_get(void)
 {
     return atomic_mb_read(&incoming_postcopy_state);
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 9b04778..d19c13a 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -393,6 +393,8 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
  */
 static void *postcopy_ram_fault_thread(void *opaque)
 {
+    fprintf(stderr, "return path thread started\n");
+
     MigrationIncomingState *mis = opaque;
     struct uffd_msg msg;
     int ret;
@@ -481,8 +483,15 @@ static void *postcopy_ram_fault_thread(void *opaque)
             migrate_send_rp_req_pages(mis, NULL,
                                      rb_offset, hostpagesize);
         }
+
+        ret = qemu_file_get_error(mis->to_src_file);
+        if (ret != 0) {
+            qemu_file_clear_error(mis->to_src_file);
+            break;
+        }
     }
     trace_postcopy_ram_fault_thread_exit();
+    fprintf(stderr, "return path failed\n");
     return NULL;
 }
 
diff --git a/migration/savevm.c b/migration/savevm.c
index 33a2911..79f601c 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1829,6 +1829,7 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
 {
     uint8_t section_type;
     int ret;
+    PostcopyState ps;
 
     while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
 
@@ -1837,28 +1838,46 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
         case QEMU_VM_SECTION_START:
         case QEMU_VM_SECTION_FULL:
             ret = qemu_loadvm_section_start_full(f, mis);
-            if (ret < 0) {
-                return ret;
-            }
             break;
         case QEMU_VM_SECTION_PART:
         case QEMU_VM_SECTION_END:
             ret = qemu_loadvm_section_part_end(f, mis);
-            if (ret < 0) {
-                return ret;
-            }
             break;
         case QEMU_VM_COMMAND:
             ret = loadvm_process_command(f);
             trace_qemu_loadvm_state_section_command(ret);
-            if ((ret < 0) || (ret & LOADVM_QUIT)) {
+            if (ret & LOADVM_QUIT) {
+                fprintf(stderr, "LOADVM_QUIT\n");
                 return ret;
-            }
+             }
             break;
         default:
             error_report("Unknown savevm section type %d", section_type);
             return -EINVAL;
         }
+
+        if (ret < 0) {
+            ps = postcopy_state_get();
+
+            /*
+             * This check relies on how the error is set on a network recv().
+             * When recv() returns 0 (i.e. the peer closed the connection),
+             * the error is set to -EIO. For all other network errors, it is
+             * set according to the return value received.
+             */
+            if (qemu_file_get_error(f) == -EIO &&
+                ps == POSTCOPY_INCOMING_RUNNING) {
+                if (qemu_migrate_postcopy_incoming_recovery(&f, mis) == 0) {
+                    postcopy_ram_enable_notify(mis);
+                    qemu_file_clear_error(f);
+                    continue;
+                }
+            }
+
+            /* Recovery was not attempted or did not succeed, so report
+             * the section handler's own error code. */
+            return ret;
+        }
     }
 
     return 0;
diff --git a/qapi-schema.json b/qapi-schema.json
index a658462..6a4c23b 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -2159,7 +2159,7 @@
 # compatible with -incoming and the format of the uri is already exposed
 # above libvirt
 ##
-{ 'command': 'migrate-incoming', 'data': {'uri': 'str' } }
+{ 'command': 'migrate-incoming', 'data': {'uri': 'str', '*recover': 'bool' } }
 
 # @xen-save-devices-state:
 #
diff --git a/qemu-version.h b/qemu-version.h
new file mode 100644
index 0000000..9ce32a4
--- /dev/null
+++ b/qemu-version.h
@@ -0,0 +1 @@
+#define QEMU_PKGVERSION " (v2.6.0-1776-g689a31f-dirty)"
diff --git a/qmp-commands.hx b/qmp-commands.hx
index dd727bf..4234bc9 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -694,7 +694,7 @@ EQMP
 
     {
         .name       = "migrate-incoming",
-        .args_type  = "uri:s",
+        .args_type  = "recover:-r,uri:s",
         .mhandler.cmd_new = qmp_marshal_migrate_incoming,
     },
 
@@ -703,10 +703,12 @@ migrate-incoming
 ----------------
 
 Continue an incoming migration
+ -r to recover from a broken migration
 
 Arguments:
 
 - "uri": Source/listening URI (json-string)
+- "recover": recover migration (json-bool, optional)
 
 Example:
 
-- 
2.7.4
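
Note on the destination-side check in qemu_loadvm_state_main(): the
decision it makes can be sketched in standalone C. This is only a
minimal sketch under assumptions. handle_load_error(), recover_fn and
FakeRecovery are hypothetical names that do not exist in QEMU, and the
real code works on a QEMUFile and the postcopy state rather than on
plain integers. The sketch shows recovery being attempted only for -EIO
on the stream while postcopy is running, and the handler's own error
being kept in every other case.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical hook standing in for the reconnect step. */
typedef bool (*recover_fn)(void *opaque);

/*
 * Decide what to do after a section handler returned a negative value.
 * file_err is the error recorded on the stream (0 if none).
 * Returns 0 if the caller may resume loading, else the handler's error.
 */
static int handle_load_error(int handler_ret, int file_err,
                             bool postcopy_running,
                             recover_fn recover, void *opaque)
{
    assert(handler_ret < 0);

    if (file_err == -EIO && postcopy_running && recover(opaque)) {
        return 0;           /* reconnected: the load loop continues */
    }
    return handler_ret;     /* keep the handler's own error code */
}

/* Example hook: succeeds or fails according to a flag, counting calls. */
typedef struct {
    bool succeed;
    int calls;
} FakeRecovery;

static bool fake_recover(void *opaque)
{
    FakeRecovery *r = opaque;

    r->calls++;
    return r->succeed;
}
```

With a hook that succeeds, handle_load_error(-EIO, -EIO, true, ...)
yields 0; an error such as -EINVAL with no stream error is passed
through unchanged and the hook is never called.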

* [Qemu-devel] [PATCH 4/6] Migration: New bitmap for postcopy migration failure
  2016-08-21 20:58 [Qemu-devel] [PATCH 0/6] Recovery from network failure during Postcopy Migration Md Haris Iqbal
                   ` (2 preceding siblings ...)
  2016-08-21 20:58 ` [Qemu-devel] [PATCH 3/6] Migration: Reconnect network in case of network failure during pc migration (destination) Md Haris Iqbal
@ 2016-08-21 20:58 ` Md Haris Iqbal
  2016-08-21 20:58 ` [Qemu-devel] [PATCH 5/6] Migration: Recovering pages lost due to n/w failure during pc migration (source) Md Haris Iqbal
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Md Haris Iqbal @ 2016-08-21 20:58 UTC (permalink / raw)
  To: qemu-devel; +Cc: dgilbert, Md Haris Iqbal

Signed-off-by: Md Haris Iqbal <haris.phnx@gmail.com>
---
 include/migration/migration.h |  4 +++
 migration/migration.c         |  4 +++
 migration/postcopy-ram.c      |  3 ++
 migration/ram.c               | 73 ++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 83 insertions(+), 1 deletion(-)
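
Note: the bookkeeping added by this patch can be sketched in standalone
C. This is only a minimal sketch under assumptions. NotReceivedMap and
the not_received_* helpers are hypothetical names, QEMU's own bitmap
helpers are not used, and the RCU handling done in ram.c is left out.
It shows the convention the patch follows: every bit starts out set,
and a bit is cleared once its page has arrived.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for the not_received bitmap. */
typedef struct {
    unsigned long *words;   /* one bit per guest page */
    size_t nr_pages;
} NotReceivedMap;

/* Allocate the map with every bit set: nothing has been received yet. */
static bool not_received_init(NotReceivedMap *m, size_t nr_pages)
{
    size_t nr_words = (nr_pages + BITS_PER_WORD - 1) / BITS_PER_WORD;

    m->words = calloc(nr_words ? nr_words : 1, sizeof(unsigned long));
    if (!m->words) {
        return false;
    }
    m->nr_pages = nr_pages;
    for (size_t i = 0; i < nr_pages; i++) {
        m->words[i / BITS_PER_WORD] |= 1UL << (i % BITS_PER_WORD);
    }
    return true;
}

/* Clear the bit for a page once its contents have arrived. */
static void not_received_mark_arrived(NotReceivedMap *m, size_t page)
{
    assert(page < m->nr_pages);
    m->words[page / BITS_PER_WORD] &= ~(1UL << (page % BITS_PER_WORD));
}

/* True while the page still has to be requested from the source. */
static bool not_received_test(const NotReceivedMap *m, size_t page)
{
    assert(page < m->nr_pages);
    return (m->words[page / BITS_PER_WORD] >> (page % BITS_PER_WORD)) & 1UL;
}

/* Number of pages that are still missing. */
static size_t not_received_count(const NotReceivedMap *m)
{
    size_t missing = 0;

    for (size_t i = 0; i < m->nr_pages; i++) {
        missing += not_received_test(m, i) ? 1 : 0;
    }
    return missing;
}

static void not_received_free(NotReceivedMap *m)
{
    free(m->words);
    m->words = NULL;
    m->nr_pages = 0;
}
```

After a network failure, the bits that are still set identify the
pages to request again from the source.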

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 74d456e..4e4c0c8 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -358,6 +358,10 @@ int ram_save_queue_pages(MigrationState *ms, const char *rbname,
 int qemu_migrate_postcopy_outgoing_recovery(MigrationState *ms);
 int qemu_migrate_postcopy_incoming_recovery(QEMUFile **f,MigrationIncomingState* mis);
 
+void migrate_incoming_ram_bitmap_init(void);
+void migrate_incoming_ram_bitmap_update(RAMBlock *rb, ram_addr_t addr);
+void migrate_incoming_ram_bitmap_free(void);
+
 PostcopyState postcopy_state_get(void);
 /* Set the state and return the old state */
 PostcopyState postcopy_state_set(PostcopyState new_state);
diff --git a/migration/migration.c b/migration/migration.c
index 166f4f7..7cd3344 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -381,6 +381,10 @@ static void process_incoming_migration_co(void *opaque)
     postcopy_state_set(POSTCOPY_INCOMING_NONE);
     migrate_set_state(&mis->state, MIGRATION_STATUS_NONE,
                       MIGRATION_STATUS_ACTIVE);
+
+    /* Initializing the bitmap for destination side */
+    migrate_incoming_ram_bitmap_init();
+
     ret = qemu_loadvm_state(f);
 
     ps = postcopy_state_get();
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index d19c13a..a8bb311 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -317,6 +317,9 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
     postcopy_state_set(POSTCOPY_INCOMING_END);
     migrate_send_rp_shut(mis, qemu_file_get_error(mis->from_src_file) != 0);
 
+    /* Free the bitmap used to keep track of incoming pages */
+    migrate_incoming_ram_bitmap_free();
+
     if (mis->postcopy_tmp_page) {
         munmap(mis->postcopy_tmp_page, getpagesize());
         mis->postcopy_tmp_page = NULL;
diff --git a/migration/ram.c b/migration/ram.c
index a3d70c4..ea1382b 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -250,6 +250,13 @@ static struct BitmapRcu {
      * of the postcopy phase
      */
     unsigned long *unsentmap;
+    /*
+     * A new bitmap for postcopy network failure recovery.
+     * It keeps track of the pages received.
+     * In the end, it would be used to request pages that were
+     * lost due to network failure.
+     */
+    unsigned long *not_received;
 } *migration_bitmap_rcu;
 
 struct CompressParam {
@@ -2340,6 +2347,7 @@ static int ram_load_postcopy(QEMUFile *f)
         void *page_buffer = NULL;
         void *place_source = NULL;
         uint8_t ch;
+        RAMBlock* block = NULL;
 
         addr = qemu_get_be64(f);
         flags = addr & ~TARGET_PAGE_MASK;
@@ -2348,7 +2356,7 @@ static int ram_load_postcopy(QEMUFile *f)
         trace_ram_load_postcopy_loop((uint64_t)addr, flags);
         place_needed = false;
         if (flags & (RAM_SAVE_FLAG_COMPRESS | RAM_SAVE_FLAG_PAGE)) {
-            RAMBlock *block = ram_block_from_stream(f, flags);
+            block = ram_block_from_stream(f, flags);
 
             host = host_from_ram_block_offset(block, addr);
             if (!host) {
@@ -2436,6 +2444,15 @@ static int ram_load_postcopy(QEMUFile *f)
         if (!ret) {
             ret = qemu_file_get_error(f);
         }
+        if (block != NULL) {
+            /*
+             * TODO
+             * We need to delay updating the bits until host page is
+             * received and the place is done, or tidy up the bitmap later
+             * accordingly (whether whole host page was received or not)
+             */
+            migrate_incoming_ram_bitmap_update(block, addr);
+        }
     }
 
     return ret;
@@ -2483,6 +2500,16 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
             RAMBlock *block = ram_block_from_stream(f, flags);
 
             host = host_from_ram_block_offset(block, addr);
+            if (host) {
+                migrate_incoming_ram_bitmap_update(block, addr);
+            }
+            /*
+             * TODO
+             * 1) Do we need a bitmap_update call later in the while loop also?
+             * 2) We need to delay updating the bits until host page is
+             * received and the place is done, or tidy up the bitmap later
+             * accordingly (whether whole host page was received or not)
+             */
             if (!host) {
                 error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
                 ret = -EINVAL;
@@ -2578,6 +2605,50 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
     return ret;
 }
 
+void migrate_incoming_ram_bitmap_init(void)
+{
+    int64_t ram_bitmap_pages; /* Size of bitmap in pages, including gaps */
+
+    /*
+     * A new bitmap for postcopy network failure recovery.
+     * It keeps track of the pages received.
+     * In the end, it would be used to request pages that were
+     * lost due to network failure.
+     */
+
+    ram_bitmap_pages = last_ram_offset() >> TARGET_PAGE_BITS;
+    migration_bitmap_rcu = g_new0(struct BitmapRcu, 1);
+    migration_bitmap_rcu->not_received = bitmap_new(ram_bitmap_pages);
+    bitmap_set(migration_bitmap_rcu->not_received, 0, ram_bitmap_pages);
+}
+
+void migrate_incoming_ram_bitmap_update(RAMBlock *rb, ram_addr_t addr)
+{
+    unsigned long base = rb->offset >> TARGET_PAGE_BITS;
+    unsigned long nr = base + (addr >> TARGET_PAGE_BITS);
+    unsigned long *bitmap;
+
+    bitmap = atomic_rcu_read(&migration_bitmap_rcu)->not_received;
+    clear_bit(nr, bitmap);
+
+    static int count = 0;
+    count++;
+    if(count == 1000) {
+        count = 0;
+        ram_debug_dump_bitmap(bitmap, true);
+    }
+}
+
+void migrate_incoming_ram_bitmap_free(void)
+{
+    struct BitmapRcu *bitmap = migration_bitmap_rcu;
+    atomic_rcu_set(&migration_bitmap_rcu, NULL);
+    if (bitmap) {
+        memory_global_dirty_log_stop();
+        call_rcu(bitmap, migration_bitmap_free, rcu);
+    }
+}
+
 static SaveVMHandlers savevm_ram_handlers = {
     .save_live_setup = ram_save_setup,
     .save_live_iterate = ram_save_iterate,
-- 
2.7.4

* [Qemu-devel] [PATCH 5/6] Migration: Recovering pages lost due to n/w failure during pc migration (source)
  2016-08-21 20:58 [Qemu-devel] [PATCH 0/6] Recovery from network failure during Postcopy Migration Md Haris Iqbal
                   ` (3 preceding siblings ...)
  2016-08-21 20:58 ` [Qemu-devel] [PATCH 4/6] Migration: New bitmap for postcopy migration failure Md Haris Iqbal
@ 2016-08-21 20:58 ` Md Haris Iqbal
  2016-08-21 20:58 ` [Qemu-devel] [PATCH 6/6] Migration: Recovering pages lost due to n/w failure during pc migration (destination) Md Haris Iqbal
  2016-08-21 21:10 ` [Qemu-devel] [PATCH 0/6] Recovery from network failure during Postcopy Migration no-reply
  6 siblings, 0 replies; 8+ messages in thread
From: Md Haris Iqbal @ 2016-08-21 20:58 UTC (permalink / raw)
  To: qemu-devel; +Cc: dgilbert, Md Haris Iqbal

Signed-off-by: Md Haris Iqbal <haris.phnx@gmail.com>
---
 include/migration/migration.h | 5 +++++
 migration/migration.c         | 2 ++
 migration/savevm.c            | 5 +++++
 3 files changed, 12 insertions(+)
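
Note: the comment added to MigrationState asks for a single variable
with different states in place of the in_recovery / recovered_once
pair. One possible shape is sketched below. This is only a sketch under
assumptions. RecoveryState and the recovery_* helpers are hypothetical
names, and the atomic accesses used on in_recovery in migration.c are
not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical replacement for the two bool fields. */
typedef enum {
    RECOVERY_NONE,        /* no network failure seen so far */
    RECOVERY_IN_PROGRESS, /* connection lost, waiting for a reconnect */
    RECOVERY_DONE         /* reconnected at least once */
} RecoveryState;

/* A network failure always moves the migration into recovery. */
static RecoveryState recovery_on_network_failure(RecoveryState s)
{
    (void)s;
    return RECOVERY_IN_PROGRESS;
}

/* A reconnect only has an effect while recovery is in progress. */
static RecoveryState recovery_on_reconnect(RecoveryState s)
{
    return s == RECOVERY_IN_PROGRESS ? RECOVERY_DONE : s;
}

/* Plays the role of testing in_recovery. */
static bool recovery_is_active(RecoveryState s)
{
    return s == RECOVERY_IN_PROGRESS;
}

/* Plays the role of testing recovered_once at completion time. */
static bool recovery_needs_page_resend(RecoveryState s)
{
    return s == RECOVERY_DONE;
}
```

In this sketch, recovery_needs_page_resend() corresponds to the point
where MIG_CMD_POSTCOPY_RECOVERY is sent in
qemu_savevm_state_complete_postcopy().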

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 4e4c0c8..5533832 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -145,6 +145,11 @@ struct MigrationState
     int state;
     /* Old style params from 'migrate' command */
     MigrationParams params;
+    /*
+     * TODO: two variables are not needed to track recovery.
+     * Replace them with a single variable holding the recovery state.
+     */
+    bool recovered_once;
     bool in_recovery;
 
     /* State related to return path */
diff --git a/migration/migration.c b/migration/migration.c
index 7cd3344..6faa483 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1042,6 +1042,7 @@ MigrationState *migrate_init(const MigrationParams *params)
     s->xfer_limit = 0;
     s->cleanup_bh = 0;
     s->to_dst_file = NULL;
+    s->recovered_once = false;
     s->in_recovery = false;
     s->state = MIGRATION_STATUS_NONE;
     s->params = *params;
@@ -1925,6 +1926,7 @@ static void *migration_thread(void *opaque)
                 if(ret == 0) {
                     current_active_state = MIGRATION_STATUS_POSTCOPY_ACTIVE;
                     runstate_set(RUN_STATE_FINISH_MIGRATE);
+                    s->recovered_once = true;
                     qemu_file_clear_error(s->to_dst_file);
                     continue;
                 }
diff --git a/migration/savevm.c b/migration/savevm.c
index 79f601c..aa4f777 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -986,6 +986,11 @@ void qemu_savevm_state_complete_postcopy(QEMUFile *f)
 {
     SaveStateEntry *se;
     int ret;
+    MigrationState *ms = migrate_get_current();
+
+    if (ms->recovered_once) {
+        qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_RECOVERY, 0, NULL);
+    }
 
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
         if (!se->ops || !se->ops->save_live_complete_postcopy) {
-- 
2.7.4

* [Qemu-devel] [PATCH 6/6] Migration: Recovering pages lost due to n/w failure during pc migration (destination)
  2016-08-21 20:58 [Qemu-devel] [PATCH 0/6] Recovery from network failure during Postcopy Migration Md Haris Iqbal
                   ` (4 preceding siblings ...)
  2016-08-21 20:58 ` [Qemu-devel] [PATCH 5/6] Migration: Recovering pages lost due to n/w failure during pc migration (source) Md Haris Iqbal
@ 2016-08-21 20:58 ` Md Haris Iqbal
  2016-08-21 21:10 ` [Qemu-devel] [PATCH 0/6] Recovery from network failure during Postcopy Migration no-reply
  6 siblings, 0 replies; 8+ messages in thread
From: Md Haris Iqbal @ 2016-08-21 20:58 UTC (permalink / raw)
  To: qemu-devel; +Cc: dgilbert, Md Haris Iqbal

Signed-off-by: Md Haris Iqbal <haris.phnx@gmail.com>
---
 include/migration/migration.h |  1 +
 include/sysemu/sysemu.h       |  1 +
 migration/ram.c               | 35 +++++++++++++++++++++++++++++++++++
 migration/savevm.c            | 12 ++++++++++++
 4 files changed, 49 insertions(+)
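
Note: the per-block scan done by migrate_incoming_ram_req_pages() can
be sketched in standalone C. This is only a minimal sketch under
assumptions. next_set_bit(), request_missing_pages(), RequestLog and
log_request() are hypothetical names, the callback stands in for
migrate_send_rp_req_pages(), and page indices are used in place of byte
addresses. The point it illustrates is that the search has to resume
one whole page after the last hit, because a bit stays set until its
page has arrived.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Return the index of the first set bit in [start, end), or end if none. */
static size_t next_set_bit(const unsigned long *map, size_t end, size_t start)
{
    for (size_t i = start; i < end; i++) {
        if ((map[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1UL) {
            return i;
        }
    }
    return end;
}

/* Hypothetical callback standing in for the return-path page request. */
typedef void (*request_page_fn)(size_t page_in_block, void *opaque);

/*
 * Request every still-missing page of one block. The block covers bitmap
 * bits [base, base + nr_pages). Returns the number of requests issued.
 */
static size_t request_missing_pages(const unsigned long *not_received,
                                    size_t base, size_t nr_pages,
                                    request_page_fn request, void *opaque)
{
    size_t end = base + nr_pages;
    size_t requests = 0;
    size_t bit = next_set_bit(not_received, end, base);

    while (bit < end) {
        request(bit - base, opaque);
        requests++;
        /* Resume after this page; its bit stays set until it arrives. */
        bit = next_set_bit(not_received, end, bit + 1);
    }
    return requests;
}

/* Example callback: record the requested page numbers in a small array. */
typedef struct {
    size_t pages[16];
    size_t count;
} RequestLog;

static void log_request(size_t page_in_block, void *opaque)
{
    RequestLog *log = opaque;

    if (log->count < sizeof(log->pages) / sizeof(log->pages[0])) {
        log->pages[log->count++] = page_in_block;
    }
}
```

Each missing page is requested exactly once per scan, which keeps the
number of messages on the return path bounded by the number of set
bits.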

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 5533832..cda5ece 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -366,6 +366,7 @@ int qemu_migrate_postcopy_incoming_recovery(QEMUFile **f,MigrationIncomingState*
 void migrate_incoming_ram_bitmap_init(void);
 void migrate_incoming_ram_bitmap_update(RAMBlock *rb, ram_addr_t addr);
 void migrate_incoming_ram_bitmap_free(void);
+void *migrate_incoming_ram_req_pages(void *opaque);
 
 PostcopyState postcopy_state_get(void);
 /* Set the state and return the old state */
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index ee7c760..af5630c 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -100,6 +100,7 @@ enum qemu_vm_cmd {
                                       were previously sent during
                                       precopy but are dirty. */
     MIG_CMD_PACKAGED,          /* Send a wrapped stream within this stream */
+    MIG_CMD_POSTCOPY_RECOVERY,  /* Send pages lost due to n/w failure */
     MIG_CMD_MAX
 };
 
diff --git a/migration/ram.c b/migration/ram.c
index ea1382b..28381b6 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2649,6 +2649,41 @@ void migrate_incoming_ram_bitmap_free(void)
     }
 }
 
+void *migrate_incoming_ram_req_pages(void* opaque)
+{
+    MigrationIncomingState *mis = opaque;
+    struct RAMBlock *rb;
+    size_t hostpagesize = getpagesize();
+    uint64_t addr;
+    unsigned long base;
+    unsigned long nr;
+    unsigned long rb_end;
+    unsigned long next;
+    unsigned long *not_received;
+
+    not_received = atomic_rcu_read(&migration_bitmap_rcu)->not_received;
+    QLIST_FOREACH_RCU(rb, &ram_list.blocks, next) {
+        addr = 0;
+        base = rb->offset >> TARGET_PAGE_BITS;
+        rb_end = base + (rb->used_length >> TARGET_PAGE_BITS);
+        while (true) {
+            nr = base + (addr >> TARGET_PAGE_BITS);
+            next = find_next_bit(not_received, rb_end, nr);
+            addr = (next - base) << TARGET_PAGE_BITS;
+
+            if (addr >= rb->used_length) {
+                break;
+            }
+            else {
+                migrate_send_rp_req_pages(mis, qemu_ram_get_idstr(rb),
+                                     addr, hostpagesize);
+                addr += TARGET_PAGE_SIZE;
+            }
+        }
+    }
+    return NULL;
+}
+
 static SaveVMHandlers savevm_ram_handlers = {
     .save_live_setup = ram_save_setup,
     .save_live_iterate = ram_save_iterate,
diff --git a/migration/savevm.c b/migration/savevm.c
index aa4f777..2301b74 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1621,6 +1621,7 @@ static int loadvm_process_command(QEMUFile *f)
     uint16_t cmd;
     uint16_t len;
     uint32_t tmp32;
+    QemuThread req_pages_not_received;
 
     cmd = qemu_get_be16(f);
     len = qemu_get_be16(f);
@@ -1677,6 +1678,17 @@ static int loadvm_process_command(QEMUFile *f)
 
     case MIG_CMD_POSTCOPY_RAM_DISCARD:
         return loadvm_postcopy_ram_handle_discard(mis, len);
+
+    case MIG_CMD_POSTCOPY_RECOVERY:
+        /*
+         * This case will only be used when migration recovers from a
+         * network failure during a postcopy migration.
+         * Now, send the requests for pages that were lost due to the
+         * network failure.
+         */
+        qemu_thread_create(&req_pages_not_received, "pc/recovery",
+                       migrate_incoming_ram_req_pages, mis, QEMU_THREAD_DETACHED);
+        break;
     }
 
     return 0;
-- 
2.7.4

* Re: [Qemu-devel] [PATCH 0/6] Recovery from network failure during Postcopy Migration
  2016-08-21 20:58 [Qemu-devel] [PATCH 0/6] Recovery from network failure during Postcopy Migration Md Haris Iqbal
                   ` (5 preceding siblings ...)
  2016-08-21 20:58 ` [Qemu-devel] [PATCH 6/6] Migration: Recovering pages lost due to n/w failure during pc migration (destination) Md Haris Iqbal
@ 2016-08-21 21:10 ` no-reply
  6 siblings, 0 replies; 8+ messages in thread
From: no-reply @ 2016-08-21 21:10 UTC (permalink / raw)
  To: haris.phnx; +Cc: famz, qemu-devel, dgilbert

Hi,

Your series seems to have some coding style problems. See output below for
more information:

Message-id: 1471813132-13836-1-git-send-email-haris.phnx@gmail.com
Subject: [Qemu-devel] [PATCH 0/6] Recovery from network failure during Postcopy Migration
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

# Useful git options
git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git show --no-patch --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]         patchew/1471813132-13836-1-git-send-email-haris.phnx@gmail.com -> patchew/1471813132-13836-1-git-send-email-haris.phnx@gmail.com
Switched to a new branch 'test'
f265ac8 Migration: Recovering pages lost due to n/w failure during pc migration (destination)
f92ad90 Migration: Recovering pages lost due to n/w failure during pc migration (source)
0b2e481 Migration: New bitmap for postcopy migration failure
970a113 Migration: Reconnect network in case of network failure during pc migration (destination)
be8f55b migration : General additions for migration recovery
e16354b Migration: Reconnect network in case of network failure during pc migration (source)

=== OUTPUT BEGIN ===
Checking PATCH 1/6: Migration: Reconnect network in case of network failure during pc migration (source)...
WARNING: line over 80 characters
#86: FILE: include/migration/migration.h:356:
+int qemu_migrate_postcopy_incoming_recovery(QEMUFile **f,MigrationIncomingState* mis);

ERROR: space required after that ',' (ctx:VxV)
#86: FILE: include/migration/migration.h:356:
+int qemu_migrate_postcopy_incoming_recovery(QEMUFile **f,MigrationIncomingState* mis);
                                                         ^

WARNING: line over 80 characters
#187: FILE: migration/migration.c:1155:
+                /* shutdown the rp socket, so causing the rp thread to shutdown */

ERROR: do not use assignment in if condition
#227: FILE: migration/migration.c:1848:
+        if ((ret = qemu_file_get_error(s->to_dst_file))) {

ERROR: space required before the open parenthesis '('
#239: FILE: migration/migration.c:1860:
+                if(ret == 0) {

ERROR: spaces required around that '/' (ctx:VxV)
#287: FILE: migration/migration.c:1967:
+    s->expected_downtime = max_downtime/1000000;
                                        ^

ERROR: "foo* bar" should be "foo *bar"
#300: FILE: migration/migration.c:1980:
+int qemu_migrate_postcopy_outgoing_recovery(MigrationState* ms)

ERROR: space required before the open parenthesis '('
#308: FILE: migration/migration.c:1988:
+    while(atomic_mb_read(&ms->in_recovery) == true) {

WARNING: line over 80 characters
#309: FILE: migration/migration.c:1989:
+        fprintf(stderr, "Under recovery, not letting it fail %p\n", ms->to_dst_file);

ERROR: space required before the open parenthesis '('
#314: FILE: migration/migration.c:1994:
+    if(ms->to_dst_file != NULL) {

total: 7 errors, 3 warnings, 371 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 2/6: migration : General additions for migration recovery...
Checking PATCH 3/6: Migration: Reconnect network in case of network failure during pc migration (destination)...
WARNING: line over 80 characters
#108: FILE: migration/migration.c:1081:
+void qmp_migrate_incoming(const char *uri, bool in_recover, bool recover, Error **errp)

ERROR: space required before the open parenthesis '('
#123: FILE: migration/migration.c:1091:
+            if(!atomic_mb_read(&mis->in_recovery)) {

ERROR: space required before the open parenthesis '('
#135: FILE: migration/migration.c:1103:
+                if(mis->have_fault_thread) {

ERROR: line over 90 characters
#136: FILE: migration/migration.c:1104:
+                    /* shutdown the socket to source, causing the fault_thread to shutdown */

ERROR: suspect code indent for conditional statements (4, 9)
#163: FILE: migration/migration.c:1131:
+    if (!recover && !deferred_incoming) {
+         error_setg(errp, "For use with '-incoming defer'");

ERROR: "foo* bar" should be "foo *bar"
#177: FILE: migration/migration.c:2069:
+                                            MigrationIncomingState* mis)

ERROR: space required before the open parenthesis '('
#185: FILE: migration/migration.c:2077:
+    while(atomic_mb_read(&mis->in_recovery) == true) {

WARNING: line over 80 characters
#186: FILE: migration/migration.c:2078:
+        fprintf(stderr, "Recover, not letting it fail %p\n", mis->from_src_file);

ERROR: space required before the open parenthesis '('
#191: FILE: migration/migration.c:2083:
+    if(mis->from_src_file != NULL) {

total: 7 errors, 2 warnings, 286 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 4/6: Migration: New bitmap for postcopy migration failure...
ERROR: "foo* bar" should be "foo *bar"
#75: FILE: migration/ram.c:2350:
+        RAMBlock* block = NULL;

ERROR: do not initialise statics to 0 or NULL
#151: FILE: migration/ram.c:2634:
+    static int count = 0;

ERROR: space required before the open parenthesis '('
#153: FILE: migration/ram.c:2636:
+    if(count == 1000) {

total: 3 errors, 0 warnings, 138 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 5/6: Migration: Recovering pages lost due to n/w failure during pc migration (source)...
Checking PATCH 6/6: Migration: Recovering pages lost due to n/w failure during pc migration (destination)...
ERROR: else should follow close brace '}'
#67: FILE: migration/ram.c:2677:
+            }
+            else {

WARNING: line over 80 characters
#105: FILE: migration/savevm.c:1690:
+                       migrate_incoming_ram_req_pages, mis, QEMU_THREAD_DETACHED);

total: 1 errors, 1 warnings, 79 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@freelists.org
