* [Qemu-devel] [PATCH v2 0/4] Curling: KVM Fault Tolerance
@ 2013-09-29 20:14 Jules Wang
  2013-09-29 20:14 ` [Qemu-devel] [PATCH v2 1/4] Curling: add doc Jules Wang
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Jules Wang @ 2013-09-29 20:14 UTC
  To: qemu-devel; +Cc: pbonzini, Jules Wang, owasserm, quintela

v1 -> v2:
* cmdline: migrate curling:tcp:<address>:<port> 
       ->  migrate -f tcp:<address>:<port>

* sender: use QEMU_VM_FILE_MAGIC_FT as the header of the migration
          to indicate that this is an ft migration.

* receiver: look for the signature
            QEMU_VM_EOF_MAGIC + QEMU_VM_FILE_MAGIC_FT (64 bits total),
            which indicates the end of one migration.
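
As a rough illustration of the receiver-side check described above, the
following standalone sketch builds the 8-byte signature in big-endian (wire)
order and scans a byte stream for it. The two magic values are the ones
defined in patch 3/4; put_be32() and find_ft_signature() are hypothetical
helper names that do not appear in the series.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values taken from patch 3/4 of this series. */
#define QEMU_VM_FILE_MAGIC_FT 0x51454654u
#define QEMU_VM_EOF_MAGIC     0xFEEDCAFEu

/* Hypothetical helper: store a 32-bit value in big-endian (wire) order. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/*
 * Hypothetical helper: return the offset at which the 8-byte
 * end-of-migration signature starts, or -1 if buf[0..len) does not
 * contain a complete signature.
 */
static long find_ft_signature(const uint8_t *buf, size_t len)
{
    uint8_t sig[8];
    size_t i;

    put_be32(sig, QEMU_VM_EOF_MAGIC);
    put_be32(sig + 4, QEMU_VM_FILE_MAGIC_FT);

    /* Only compare windows that lie entirely inside the buffer. */
    for (i = 0; i + sizeof(sig) <= len; i++) {
        if (memcmp(buf + i, sig, sizeof(sig)) == 0) {
            return (long)i;
        }
    }
    return -1;
}
```

Building the pattern byte by byte avoids any dependence on host endianness
or on a non-standard htobe64().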
--
Jules Wang (4):
  Curling: add doc
  Curling: cmdline interface.
  Curling: the sender
  Curling: the receiver

 arch_init.c                   |  25 ++++--
 docs/curling.txt              |  51 ++++++++++++
 hmp-commands.hx               |  11 ++-
 hmp.c                         |   3 +-
 include/migration/migration.h |   1 +
 include/migration/qemu-file.h |   1 +
 include/sysemu/sysemu.h       |   5 +-
 migration.c                   |  50 ++++++++++--
 qapi-schema.json              |   3 +-
 qmp-commands.hx               |   3 +-
 savevm.c                      | 178 +++++++++++++++++++++++++++++++++++++++---
 11 files changed, 301 insertions(+), 30 deletions(-)
 create mode 100644 docs/curling.txt

-- 
1.8.0.1


* [Qemu-devel] [PATCH v2 1/4] Curling: add doc
  2013-09-29 20:14 [Qemu-devel] [PATCH v2 0/4] Curling: KVM Fault Tolerance Jules Wang
@ 2013-09-29 20:14 ` Jules Wang
  2013-09-29 20:14 ` [Qemu-devel] [PATCH v2 2/4] Curling: cmdline interface Jules Wang
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Jules Wang @ 2013-09-29 20:14 UTC
  To: qemu-devel; +Cc: pbonzini, Jules Wang, owasserm, quintela

Curling provides a fault tolerance mechanism for KVM.
For more information, see 'docs/curling.txt'.

Signed-off-by: Jules Wang <junqing.wang@cs2c.com.cn>
---
 docs/curling.txt | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)
 create mode 100644 docs/curling.txt

diff --git a/docs/curling.txt b/docs/curling.txt
new file mode 100644
index 0000000..f506a77
--- /dev/null
+++ b/docs/curling.txt
@@ -0,0 +1,51 @@
+KVM Fault Tolerance Specification
+=================================
+
+
+Contents:
+=========
+* Introduction
+* Usage
+* Design & Implement
+* Performance
+
+Introduction
+============
+The goal of Curling (the sport) is to provide a fault-tolerant (ft for short)
+mechanism for KVM, so that in the event of a hardware failure, the virtual
+machine fails over to the backup in a way that is completely transparent
+to the guest operating system.
+
+
+Usage
+=====
+The steps of curling are the same as the steps of live migration except the
+following:
+1. Start ft in the qemu monitor of the sender vm with the following commands:
+   > migrate_set_speed <full bandwidth>
+   > migrate -f tcp:<address>:<port>
+2. Connect to the receiver vm by vnc or spice. The screen of the vm is displayed
+when ft is ready.
+3. The sender vm is now protected by ft. When it encounters a failure,
+the failover kicks in.
+
+
+
+Design & Implement
+==================
+* By leveraging the live migration feature, we do endless live migrations between
+the sender and receiver, so the two virtual machines are synchronized.
+
+* The receiver does not load vm state as soon as the migration begins. Instead,
+it prefetches the data of one whole migration into a buffer, then loads vm state
+from that buffer afterwards. This "all or nothing" approach prevents the
+broken-in-the-middle problem Kemari has.
+
+* The sender sleeps a little while after each migration, to ease the
+performance penalty entailed by vm_stop and iothread locks. This is a
+tradeoff between performance and accuracy.
+....
+
+
+Performance
+===========
-- 
1.8.0.1


* [Qemu-devel] [PATCH v2 2/4] Curling: cmdline interface.
  2013-09-29 20:14 [Qemu-devel] [PATCH v2 0/4] Curling: KVM Fault Tolerance Jules Wang
  2013-09-29 20:14 ` [Qemu-devel] [PATCH v2 1/4] Curling: add doc Jules Wang
@ 2013-09-29 20:14 ` Jules Wang
  2013-09-30 22:16   ` Eric Blake
  2013-09-29 20:14 ` [Qemu-devel] [PATCH v2 3/4] Curling: the sender Jules Wang
  2013-09-29 20:14 ` [Qemu-devel] [PATCH v2 4/4] Curling: the receiver Jules Wang
  3 siblings, 1 reply; 9+ messages in thread
From: Jules Wang @ 2013-09-29 20:14 UTC
  To: qemu-devel; +Cc: pbonzini, Jules Wang, owasserm, quintela

Add a '-f' option to the migrate command line, indicating whether to
enable fault tolerance.

Signed-off-by: Jules Wang <junqing.wang@cs2c.com.cn>
---
 hmp-commands.hx               | 11 +++++++----
 hmp.c                         |  3 ++-
 include/migration/migration.h |  1 +
 migration.c                   |  3 ++-
 qapi-schema.json              |  3 ++-
 qmp-commands.hx               |  3 ++-
 6 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 65b7f60..8418d37 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -877,23 +877,26 @@ ETEXI
 
     {
         .name       = "migrate",
-        .args_type  = "detach:-d,blk:-b,inc:-i,uri:s",
-        .params     = "[-d] [-b] [-i] uri",
+        .args_type  = "detach:-d,blk:-b,inc:-i,ft:-f,uri:s",
+        .params     = "[-d] [-b] [-i] [-f] uri",
         .help       = "migrate to URI (using -d to not wait for completion)"
 		      "\n\t\t\t -b for migration without shared storage with"
 		      " full copy of disk\n\t\t\t -i for migration without "
 		      "shared storage with incremental copy of disk "
-		      "(base image shared between src and destination)",
+		      "(base image shared between src and destination)"
+		      "\n\t\t\t -f for fault tolerant, this is another "
+		      "feature rather than migrate",
         .mhandler.cmd = hmp_migrate,
     },
 
 
 STEXI
-@item migrate [-d] [-b] [-i] @var{uri}
+@item migrate [-d] [-b] [-i] [-f] @var{uri}
 @findex migrate
 Migrate to @var{uri} (using -d to not wait for completion).
 	-b for migration with full copy of disk
 	-i for migration with incremental copy of disk (base image is shared)
+	-f for fault tolerant
 ETEXI
 
     {
diff --git a/hmp.c b/hmp.c
index fcca6ae..91beae9 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1213,10 +1213,11 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
     int detach = qdict_get_try_bool(qdict, "detach", 0);
     int blk = qdict_get_try_bool(qdict, "blk", 0);
     int inc = qdict_get_try_bool(qdict, "inc", 0);
+    int ft  = qdict_get_try_bool(qdict, "ft", 0);
     const char *uri = qdict_get_str(qdict, "uri");
     Error *err = NULL;
 
-    qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false, &err);
+    qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false, !!ft, ft, &err);
     if (err) {
         monitor_printf(mon, "migrate: %s\n", error_get_pretty(err));
         error_free(err);
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 140e6b4..fc2b066 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -25,6 +25,7 @@
 
 struct MigrationParams {
     bool blk;
+    bool ft;
     bool shared;
 };
 
diff --git a/migration.c b/migration.c
index 200d404..8989a51 100644
--- a/migration.c
+++ b/migration.c
@@ -394,7 +394,7 @@ void migrate_del_blocker(Error *reason)
 
 void qmp_migrate(const char *uri, bool has_blk, bool blk,
                  bool has_inc, bool inc, bool has_detach, bool detach,
-                 Error **errp)
+                 bool has_ft, bool ft, Error **errp)
 {
     Error *local_err = NULL;
     MigrationState *s = migrate_get_current();
@@ -403,6 +403,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 
     params.blk = has_blk && blk;
     params.shared = has_inc && inc;
+    params.ft = has_ft && ft;
 
     if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP) {
         error_set(errp, QERR_MIGRATION_ACTIVE);
diff --git a/qapi-schema.json b/qapi-schema.json
index a51f7d2..a8187cf 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -2420,7 +2420,8 @@
 # Since: 0.14.0
 ##
 { 'command': 'migrate',
-  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' } }
+  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool',
+           '*ft': 'bool' } }
 
 # @xen-save-devices-state:
 #
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 8a8f342..1fa0e60 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -611,7 +611,7 @@ EQMP
 
     {
         .name       = "migrate",
-        .args_type  = "detach:-d,blk:-b,inc:-i,uri:s",
+        .args_type  = "detach:-d,blk:-b,inc:-i,ft:-f,uri:s",
         .mhandler.cmd_new = qmp_marshal_input_migrate,
     },
 
@@ -625,6 +625,7 @@ Arguments:
 
 - "blk": block migration, full disk copy (json-bool, optional)
 - "inc": incremental disk copy (json-bool, optional)
+- "ft" : fault tolerant (json-bool, optional)
 - "uri": Destination URI (json-string)
 
 Example:
-- 
1.8.0.1


* [Qemu-devel] [PATCH v2 3/4] Curling: the sender
  2013-09-29 20:14 [Qemu-devel] [PATCH v2 0/4] Curling: KVM Fault Tolerance Jules Wang
  2013-09-29 20:14 ` [Qemu-devel] [PATCH v2 1/4] Curling: add doc Jules Wang
  2013-09-29 20:14 ` [Qemu-devel] [PATCH v2 2/4] Curling: cmdline interface Jules Wang
@ 2013-09-29 20:14 ` Jules Wang
  2013-09-29 20:14 ` [Qemu-devel] [PATCH v2 4/4] Curling: the receiver Jules Wang
  3 siblings, 0 replies; 9+ messages in thread
From: Jules Wang @ 2013-09-29 20:14 UTC
  To: qemu-devel; +Cc: pbonzini, Jules Wang, owasserm, quintela

By leveraging the live migration feature, the sender simply starts a
new migration when the previous migration is completed.

The variables related to live migration must be handled very
carefully, so that the new migration does not restart from the very
beginning. Instead, it continues from where the previous migration
left off.
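
The idea of continuing rather than restarting can be sketched in isolation
as follows. This is a simplified stand-in, not the code in arch_init.c:
struct ram_state, its one-byte-per-page dirty array, and ram_round_setup()
are hypothetical names chosen for illustration. The point is only that in ft
mode the dirty tracking from the previous round is kept, so a full resend
happens just once.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the RAM migration state. */
struct ram_state {
    unsigned char *dirty;   /* one byte per page, 1 = page must be sent */
    size_t pages;
};

/*
 * Begin one migration round.
 * Returns 1 if the dirty map was (re)created and every page is marked
 * dirty, 0 if the previous round's map is carried over, and -1 if the
 * allocation failed (the old state is left untouched in that case).
 */
static int ram_round_setup(struct ram_state *st, size_t pages, bool ft)
{
    unsigned char *fresh;

    if (ft && st->dirty != NULL) {
        return 0;               /* continue from the previous round */
    }

    fresh = malloc(pages);
    if (fresh == NULL) {
        return -1;
    }
    memset(fresh, 1, pages);    /* first round: everything is dirty */

    free(st->dirty);
    st->dirty = fresh;
    st->pages = pages;
    return 1;
}
```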

Signed-off-by: Jules Wang <junqing.wang@cs2c.com.cn>
---
 arch_init.c             | 25 ++++++++++++++++++++-----
 include/sysemu/sysemu.h |  3 ++-
 migration.c             | 25 +++++++++++++++++++++++--
 savevm.c                | 20 ++++++++++++++++----
 4 files changed, 61 insertions(+), 12 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index e47e139..cd0e0b1 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -107,6 +107,7 @@ const uint32_t arch_type = QEMU_ARCH;
 static bool mig_throttle_on;
 static int dirty_rate_high_cnt;
 static void check_guest_throttling(void);
+static MigrationParams ram_mig_params;
 
 /***********************************************************/
 /* ram save/restore */
@@ -596,6 +597,11 @@ static void ram_migration_cancel(void *opaque)
     migration_end();
 }
 
+static void ram_set_params(const MigrationParams *params, void *opaque)
+{
+    ram_mig_params.ft = params->ft;
+}
+
 static void reset_ram_globals(void)
 {
     last_seen_block = NULL;
@@ -611,10 +617,14 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 {
     RAMBlock *block;
     int64_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
+    bool create = false;
 
-    migration_bitmap = bitmap_new(ram_pages);
-    bitmap_set(migration_bitmap, 0, ram_pages);
-    migration_dirty_pages = ram_pages;
+    if (!ram_mig_params.ft || !migration_bitmap)  {
+        migration_bitmap = bitmap_new(ram_pages);
+        bitmap_set(migration_bitmap, 0, ram_pages);
+        migration_dirty_pages = ram_pages;
+        create = true;
+    }
     mig_throttle_on = false;
     dirty_rate_high_cnt = 0;
 
@@ -634,7 +644,9 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     qemu_mutex_lock_iothread();
     qemu_mutex_lock_ramlist();
     bytes_transferred = 0;
-    reset_ram_globals();
+    if (!ram_mig_params.ft || create) {
+        reset_ram_globals();
+    }
 
     memory_global_dirty_log_start();
     migration_bitmap_sync();
@@ -744,7 +756,9 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
     }
 
     ram_control_after_iterate(f, RAM_CONTROL_FINISH);
-    migration_end();
+    if (!ram_mig_params.ft) {
+        migration_end();
+    }
 
     qemu_mutex_unlock_ramlist();
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
@@ -970,6 +984,7 @@ SaveVMHandlers savevm_ram_handlers = {
     .save_live_pending = ram_save_pending,
     .load_state = ram_load,
     .cancel = ram_migration_cancel,
+    .set_params = ram_set_params,
 };
 
 struct soundhw {
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index b1aa059..dabade4 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -77,7 +77,8 @@ bool qemu_savevm_state_blocked(Error **errp);
 void qemu_savevm_state_begin(QEMUFile *f,
                              const MigrationParams *params);
 int qemu_savevm_state_iterate(QEMUFile *f);
-void qemu_savevm_state_complete(QEMUFile *f);
+void qemu_savevm_state_complete(QEMUFile *f,
+                                const MigrationParams *params);
 void qemu_savevm_state_cancel(void);
 uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size);
 int qemu_loadvm_state(QEMUFile *f);
diff --git a/migration.c b/migration.c
index 8989a51..bf17c63 100644
--- a/migration.c
+++ b/migration.c
@@ -552,6 +552,7 @@ static void *migration_thread(void *opaque)
     int64_t max_size = 0;
     int64_t start_time = initial_time;
     bool old_vm_running = false;
+    int  time_window = 100;
 
     DPRINTF("beginning savevm\n");
     qemu_savevm_state_begin(s->file, &s->params);
@@ -563,6 +564,8 @@ static void *migration_thread(void *opaque)
 
     while (s->state == MIG_STATE_ACTIVE) {
         int64_t current_time;
+        int64_t time_spent;
+        int64_t migration_start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
         uint64_t pending_size;
 
         if (!qemu_file_rate_limit(s->file)) {
@@ -583,7 +586,7 @@ static void *migration_thread(void *opaque)
                 ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
                 if (ret >= 0) {
                     qemu_file_set_rate_limit(s->file, INT_MAX);
-                    qemu_savevm_state_complete(s->file);
+                    qemu_savevm_state_complete(s->file, &s->params);
                 }
                 qemu_mutex_unlock_iothread();
 
@@ -592,10 +595,28 @@ static void *migration_thread(void *opaque)
                     break;
                 }
 
-                if (!qemu_file_get_error(s->file)) {
+                if (!qemu_file_get_error(s->file) && !s->params.ft) {
                     migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_COMPLETED);
                     break;
                 }
+
+                if (s->params.ft) {
+                    if (old_vm_running) {
+                        qemu_mutex_lock_iothread();
+                        vm_start();
+                        qemu_mutex_unlock_iothread();
+
+                        current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+                        time_spent = current_time - migration_start_time;
+                        DPRINTF("this migration lasts for %" PRId64 "ms\n",
+                                time_spent);
+                        if (time_spent < time_window) {
+                            g_usleep((time_window - time_spent)*1000);
+                            initial_time += time_window - time_spent;
+                        }
+                    }
+                    qemu_savevm_state_begin(s->file, &s->params);
+                }
             }
         }
 
diff --git a/savevm.c b/savevm.c
index c536aa4..556c0e7 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1815,6 +1815,7 @@ static void vmstate_save(QEMUFile *f, SaveStateEntry *se)
 }
 
 #define QEMU_VM_FILE_MAGIC           0x5145564d
+#define QEMU_VM_FILE_MAGIC_FT        0x51454654
 #define QEMU_VM_FILE_VERSION_COMPAT  0x00000002
 #define QEMU_VM_FILE_VERSION         0x00000003
 
@@ -1824,6 +1825,7 @@ static void vmstate_save(QEMUFile *f, SaveStateEntry *se)
 #define QEMU_VM_SECTION_END          0x03
 #define QEMU_VM_SECTION_FULL         0x04
 #define QEMU_VM_SUBSECTION           0x05
+#define QEMU_VM_EOF_MAGIC            0xFEEDCAFE
 
 bool qemu_savevm_state_blocked(Error **errp)
 {
@@ -1851,7 +1853,12 @@ void qemu_savevm_state_begin(QEMUFile *f,
         se->ops->set_params(params, se->opaque);
     }
     
-    qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
+    if (params->ft) {
+        qemu_put_be32(f, QEMU_VM_FILE_MAGIC_FT);
+    } else {
+        qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
+    }
+
     qemu_put_be32(f, QEMU_VM_FILE_VERSION);
 
     QTAILQ_FOREACH(se, &savevm_handlers, entry) {
@@ -1930,7 +1937,8 @@ int qemu_savevm_state_iterate(QEMUFile *f)
     return ret;
 }
 
-void qemu_savevm_state_complete(QEMUFile *f)
+void qemu_savevm_state_complete(QEMUFile *f,
+                                const MigrationParams *params)
 {
     SaveStateEntry *se;
     int ret;
@@ -1983,6 +1991,9 @@ void qemu_savevm_state_complete(QEMUFile *f)
     }
 
     qemu_put_byte(f, QEMU_VM_EOF);
+    if (params->ft) {
+        qemu_put_be32(f, QEMU_VM_EOF_MAGIC);
+    }
     qemu_fflush(f);
 }
 
@@ -2021,7 +2032,8 @@ static int qemu_savevm_state(QEMUFile *f)
     int ret;
     MigrationParams params = {
         .blk = 0,
-        .shared = 0
+        .shared = 0,
+        .ft = 0
     };
 
     if (qemu_savevm_state_blocked(NULL)) {
@@ -2040,7 +2052,7 @@ static int qemu_savevm_state(QEMUFile *f)
 
     ret = qemu_file_get_error(f);
     if (ret == 0) {
-        qemu_savevm_state_complete(f);
+        qemu_savevm_state_complete(f, &params);
         ret = qemu_file_get_error(f);
     }
     if (ret != 0) {
-- 
1.8.0.1


* [Qemu-devel] [PATCH v2 4/4] Curling: the receiver
  2013-09-29 20:14 [Qemu-devel] [PATCH v2 0/4] Curling: KVM Fault Tolerance Jules Wang
                   ` (2 preceding siblings ...)
  2013-09-29 20:14 ` [Qemu-devel] [PATCH v2 3/4] Curling: the sender Jules Wang
@ 2013-09-29 20:14 ` Jules Wang
  3 siblings, 0 replies; 9+ messages in thread
From: Jules Wang @ 2013-09-29 20:14 UTC
  To: qemu-devel; +Cc: pbonzini, Jules Wang, owasserm, quintela

The receiver loops over incoming migrations until the migration
connection is lost. It then starts running as the backup.

The receiver does not load vm state as soon as the migration begins.
Instead, it prefetches the data of one whole migration into a buffer,
then loads vm state from that buffer afterwards.
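
A minimal sketch of such a prefetch buffer, assuming a doubling growth
policy with an upper bound, might look like the following. struct
prefetch_buf and prefetch_append() are hypothetical names for illustration
and are not the helpers added by this patch.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer holding one complete migration round. */
struct prefetch_buf {
    uint8_t *data;
    size_t used;
    size_t cap;
};

/*
 * Append len bytes, doubling the capacity as needed but never beyond
 * max_cap. Returns 0 on success, or -1 if the cap would be exceeded or
 * memory runs out; the buffer contents are unchanged on failure.
 */
static int prefetch_append(struct prefetch_buf *b, const uint8_t *src,
                           size_t len, size_t max_cap)
{
    size_t need, cap;
    uint8_t *p;

    if (len == 0) {
        return 0;
    }
    if (len > max_cap || b->used > max_cap - len) {
        return -1;              /* would grow past the allowed maximum */
    }

    need = b->used + len;
    cap = b->cap ? b->cap : (max_cap < 64 ? max_cap : 64);
    while (cap < need) {
        cap = (cap > max_cap / 2) ? max_cap : cap * 2;
    }

    if (cap != b->cap) {
        p = realloc(b->data, cap);
        if (p == NULL) {
            return -1;
        }
        b->data = p;
        b->cap = cap;
    }

    memcpy(b->data + b->used, src, len);
    b->used = need;
    return 0;
}
```

Only once a whole round has been collected this way would the vm state be
loaded from the buffer, which is what gives the "all or nothing" behaviour.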

Signed-off-by: Jules Wang <junqing.wang@cs2c.com.cn>
---
 include/migration/qemu-file.h |   1 +
 include/sysemu/sysemu.h       |   2 +
 migration.c                   |  22 ++++--
 savevm.c                      | 158 ++++++++++++++++++++++++++++++++++++++++--
 4 files changed, 173 insertions(+), 10 deletions(-)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index 0f757fb..f01ff10 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -92,6 +92,7 @@ typedef struct QEMUFileOps {
     QEMURamHookFunc *after_ram_iterate;
     QEMURamHookFunc *hook_ram_load;
     QEMURamSaveFunc *save_page;
+    QEMUFileGetBufferFunc *get_prefetch_buffer;
 } QEMUFileOps;
 
 QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops);
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index dabade4..0058987 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -82,6 +82,8 @@ void qemu_savevm_state_complete(QEMUFile *f,
 void qemu_savevm_state_cancel(void);
 uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size);
 int qemu_loadvm_state(QEMUFile *f);
+int qemu_loadvm_state_ft(QEMUFile *f);
+bool is_ft_migration(QEMUFile *f);
 
 /* SLIRP */
 void do_info_slirp(Monitor *mon);
diff --git a/migration.c b/migration.c
index bf17c63..64e7007 100644
--- a/migration.c
+++ b/migration.c
@@ -19,6 +19,7 @@
 #include "monitor/monitor.h"
 #include "migration/qemu-file.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/cpus.h"
 #include "block/block.h"
 #include "qemu/sockets.h"
 #include "migration/block.h"
@@ -101,13 +102,24 @@ static void process_incoming_migration_co(void *opaque)
 {
     QEMUFile *f = opaque;
     int ret;
+    int count = 0;
 
-    ret = qemu_loadvm_state(f);
-    qemu_fclose(f);
-    if (ret < 0) {
-        fprintf(stderr, "load of migration failed\n");
-        exit(EXIT_FAILURE);
+    if (is_ft_migration(f)) {
+        while (qemu_loadvm_state_ft(f) >= 0) {
+            count++;
+            DPRINTF("incoming count %d\r", count);
+        }
+        qemu_fclose(f);
+        DPRINTF("ft connection lost, launching self..\n");
+    } else {
+        ret = qemu_loadvm_state(f);
+        qemu_fclose(f);
+        if (ret < 0) {
+            fprintf(stderr, "load of migration failed\n");
+            exit(EXIT_FAILURE);
+        }
     }
+    cpu_synchronize_all_post_init();
     qemu_announce_self();
     DPRINTF("successfully loaded vm state\n");
 
diff --git a/savevm.c b/savevm.c
index 556c0e7..8cb3613 100644
--- a/savevm.c
+++ b/savevm.c
@@ -52,6 +52,8 @@
 #define ARP_PTYPE_IP 0x0800
 #define ARP_OP_REQUEST_REV 0x3
 
+#define PREFETCH_BUFFER_SIZE 0x010000
+
 static int announce_self_create(uint8_t *buf,
 				uint8_t *mac_addr)
 {
@@ -135,6 +137,10 @@ struct QEMUFile {
     unsigned int iovcnt;
 
     int last_error;
+
+    uint8_t *prefetch_buf;
+    uint64_t prefetch_buf_index;
+    uint64_t prefetch_buf_size;
 };
 
 typedef struct QEMUFileStdio
@@ -193,6 +199,25 @@ static int socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
     return len;
 }
 
+static int socket_get_prefetch_buffer(void *opaque, uint8_t *buf,
+                                      int64_t pos, int size)
+{
+    QEMUFile *f = opaque;
+
+    if (f->prefetch_buf_size - pos <= 0) {
+        return 0;
+    }
+
+    if (f->prefetch_buf_size - pos < size) {
+        size = f->prefetch_buf_size - pos;
+    }
+
+    memcpy(buf, f->prefetch_buf + pos, size);
+
+    return size;
+}
+
+
 static int socket_close(void *opaque)
 {
     QEMUFileSocket *s = opaque;
@@ -440,6 +465,7 @@ QEMUFile *qemu_fdopen(int fd, const char *mode)
 static const QEMUFileOps socket_read_ops = {
     .get_fd =     socket_get_fd,
     .get_buffer = socket_get_buffer,
+    .get_prefetch_buffer = socket_get_prefetch_buffer,
     .close =      socket_close
 };
 
@@ -739,6 +765,8 @@ int qemu_fclose(QEMUFile *f)
     if (f->last_error) {
         ret = f->last_error;
     }
+
+    g_free(f->prefetch_buf);
     g_free(f);
     return ret;
 }
@@ -822,6 +850,14 @@ void qemu_put_byte(QEMUFile *f, int v)
 
 static void qemu_file_skip(QEMUFile *f, int size)
 {
+    if (f->prefetch_buf_index + size <= f->prefetch_buf_size) {
+        f->prefetch_buf_index += size;
+        return;
+    } else {
+        size -= f->prefetch_buf_size - f->prefetch_buf_index;
+        f->prefetch_buf_index = f->prefetch_buf_size;
+    }
+
     if (f->buf_index + size <= f->buf_size) {
         f->buf_index += size;
     }
@@ -831,6 +867,23 @@ static int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset)
 {
     int pending;
     int index;
+    int done;
+
+    if (f->ops->get_prefetch_buffer) {
+        if (f->prefetch_buf_index + offset < f->prefetch_buf_size) {
+            done = f->ops->get_prefetch_buffer(f,
+                                               buf,
+                                               f->prefetch_buf_index + offset,
+                                               size);
+            if (done == size) {
+                return size;
+            }
+            size -= done;
+            buf  += done;
+        } else {
+            offset -= f->prefetch_buf_size - f->prefetch_buf_index;
+        }
+    }
 
     assert(!qemu_file_is_writable(f));
 
@@ -875,7 +928,15 @@ int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size)
 
 static int qemu_peek_byte(QEMUFile *f, int offset)
 {
-    int index = f->buf_index + offset;
+    int index;
+
+    if (f->prefetch_buf_index + offset < f->prefetch_buf_size) {
+        return f->prefetch_buf[f->prefetch_buf_index + offset];
+    } else {
+        offset -= f->prefetch_buf_size - f->prefetch_buf_index;
+    }
+
+    index = f->buf_index + offset;
 
     assert(!qemu_file_is_writable(f));
 
@@ -889,6 +950,16 @@ static int qemu_peek_byte(QEMUFile *f, int offset)
     return f->buf[index];
 }
 
+static unsigned int qemu_peek_be32(QEMUFile *f, int offset)
+{
+    unsigned int v;
+    v = qemu_peek_byte(f, offset) << 24;
+    v |= qemu_peek_byte(f, offset + 1) << 16;
+    v |= qemu_peek_byte(f, offset + 2) << 8;
+    v |= qemu_peek_byte(f, offset + 3);
+    return v;
+}
+
 int qemu_get_byte(QEMUFile *f)
 {
     int result;
@@ -976,7 +1047,6 @@ uint64_t qemu_get_be64(QEMUFile *f)
     return v;
 }
 
-
 /* timer */
 
 void timer_put(QEMUFile *f, QEMUTimer *ts)
@@ -2193,6 +2263,11 @@ static void vmstate_subsection_save(QEMUFile *f, const VMStateDescription *vmsd,
     }
 }
 
+bool is_ft_migration(QEMUFile *f)
+{
+    return (qemu_peek_be32(f, 0) == QEMU_VM_FILE_MAGIC_FT);
+}
+
 typedef struct LoadStateEntry {
     QLIST_ENTRY(LoadStateEntry) entry;
     SaveStateEntry *se;
@@ -2214,8 +2289,9 @@ int qemu_loadvm_state(QEMUFile *f)
     }
 
     v = qemu_get_be32(f);
-    if (v != QEMU_VM_FILE_MAGIC)
+    if (v != QEMU_VM_FILE_MAGIC && v != QEMU_VM_FILE_MAGIC_FT) {
         return -EINVAL;
+    }
 
     v = qemu_get_be32(f);
     if (v == QEMU_VM_FILE_VERSION_COMPAT) {
@@ -2302,8 +2378,6 @@ int qemu_loadvm_state(QEMUFile *f)
         }
     }
 
-    cpu_synchronize_all_post_init();
-
     ret = 0;
 
 out:
@@ -2319,6 +2393,79 @@ out:
     return ret;
 }
 
+int qemu_loadvm_state_ft(QEMUFile *f)
+{
+    int ret = 0;
+    int i   = 0;
+    int done = 0;
+    uint64_t size = 0;
+    uint64_t offset = 0;
+    uint8_t *prefetch_buf = NULL;
+    uint8_t *buf = NULL;
+
+    uint64_t max_mem = last_ram_offset() * 1.5;
+    uint64_t eof = htobe64((uint64_t)QEMU_VM_EOF_MAGIC << 32 |
+                                  QEMU_VM_FILE_MAGIC_FT);
+
+    if (!f->ops->get_prefetch_buffer) {
+        fprintf(stderr, "Fault tolerant is not supported by this protocol.\n");
+        return -EINVAL;
+    }
+
+    size = PREFETCH_BUFFER_SIZE;
+    prefetch_buf = g_malloc(size);
+
+    while (true) {
+        if (offset + TARGET_PAGE_SIZE >= size) {
+            if (size*2 > max_mem) {
+                fprintf(stderr, "qemu_loadvm_state_ft: warning:" \
+                       "Prefetch buffer becomes too large.\n" \
+                       "Fault tolerant is unstable when you see this,\n" \
+                       "please increase the bandwidth or increase " \
+                       "the max down time.\n");
+                break;
+            }
+            size = size * 2;
+            buf = g_try_realloc(prefetch_buf, size);
+            if (!buf) {
+                error_report("qemu_loadvm_state_ft: out of memory.\n");
+                g_free(prefetch_buf);
+                return -ENOMEM;
+            }
+
+            prefetch_buf = buf;
+        }
+
+        done = qemu_get_buffer(f, prefetch_buf + offset, TARGET_PAGE_SIZE);
+
+        ret = qemu_file_get_error(f);
+        if (ret != 0) {
+            g_free(prefetch_buf);
+            return ret;
+        }
+
+        buf = prefetch_buf + offset;
+        offset += done;
+        for (i = (offset - done >= 7) ? -7 : 0; i + 8 <= done; i++) {
+            if (memcmp(buf + i, &eof, 8) == 0) {
+                goto out;
+            }
+        }
+    }
+ out:
+    g_free(f->prefetch_buf);
+    f->prefetch_buf_size = offset;
+    f->prefetch_buf_index = 0;
+    f->prefetch_buf = prefetch_buf;
+
+    ret = qemu_loadvm_state(f);
+
+    /* Skip magic number */
+    qemu_get_be32(f);
+
+    return ret;
+}
+
 static BlockDriverState *find_vmstate_bs(void)
 {
     BlockDriverState *bs = NULL;
@@ -2427,6 +2574,7 @@ void do_savevm(Monitor *mon, const QDict *qdict)
         goto the_end;
     }
     ret = qemu_savevm_state(f);
+    cpu_synchronize_all_post_init();
     vm_state_size = qemu_ftell(f);
     qemu_fclose(f);
     if (ret < 0) {
-- 
1.8.0.1


* Re: [Qemu-devel] [PATCH v2 2/4] Curling: cmdline interface.
  2013-09-29 20:14 ` [Qemu-devel] [PATCH v2 2/4] Curling: cmdline interface Jules Wang
@ 2013-09-30 22:16   ` Eric Blake
  2013-10-09  6:49     ` junqing.wang
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Blake @ 2013-09-30 22:16 UTC
  To: Jules Wang; +Cc: pbonzini, quintela, qemu-devel, owasserm


On 09/29/2013 02:14 PM, Jules Wang wrote:
> Add an option '-f' to migration cmdline.
> Indicating whether to enable fault tolerant or not.
> 
> Signed-off-by: Jules Wang <junqing.wang@cs2c.com.cn>
> ---
>          .help       = "migrate to URI (using -d to not wait for completion)"
>  		      "\n\t\t\t -b for migration without shared storage with"
>  		      " full copy of disk\n\t\t\t -i for migration without "
>  		      "shared storage with incremental copy of disk "
> -		      "(base image shared between src and destination)",
> +		      "(base image shared between src and destination)"
> +		      "\n\t\t\t -f for fault tolerant, this is another "
> +		      "feature rather than migrate",

That sounds awkward, and overly long.  Maybe go with just:

-f for fault tolerance mode

and let the user then read the full documentation for what it entails.

> -@item migrate [-d] [-b] [-i] @var{uri}
> +@item migrate [-d] [-b] [-i] [-f] @var{uri}
>  @findex migrate
>  Migrate to @var{uri} (using -d to not wait for completion).
>  	-b for migration with full copy of disk
>  	-i for migration with incremental copy of disk (base image is shared)
> +	-f for fault tolerant

Can -d and -f be used at the same time, or are they exclusive?

> +++ b/hmp.c
> @@ -1213,10 +1213,11 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
>      int detach = qdict_get_try_bool(qdict, "detach", 0);
>      int blk = qdict_get_try_bool(qdict, "blk", 0);
>      int inc = qdict_get_try_bool(qdict, "inc", 0);
> +    int ft  = qdict_get_try_bool(qdict, "ft", 0);

Why two spaces?

> +++ b/qapi-schema.json
> @@ -2420,7 +2420,8 @@
>  # Since: 0.14.0
>  ##
>  { 'command': 'migrate',
> -  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' } }
> +  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool',
> +           '*ft': 'bool' } }

Missing documentation, including mention that the new option was only
made available in 1.7.  We still don't have introspection; is there some
other means by which libvirt and other management apps can tell whether
this feature is available?  Furthermore, 'ft' is an awfully short name;
for QMP, we prefer to use full words where possible, such as
'fault-tolerant'.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




* Re: [Qemu-devel] [PATCH v2 2/4] Curling: cmdline interface.
  2013-09-30 22:16   ` Eric Blake
@ 2013-10-09  6:49     ` junqing.wang
  2013-10-09 12:02       ` Eric Blake
  0 siblings, 1 reply; 9+ messages in thread
From: junqing.wang @ 2013-10-09  6:49 UTC
  To: Eric Blake; +Cc: pbonzini, quintela, qemu-devel, owasserm


At 2013-10-01 06:16:34,"Eric Blake" <eblake@redhat.com> wrote:
>On 09/29/2013 02:14 PM, Jules Wang wrote:
>> Add an option '-f' to migration cmdline.
>> Indicating whether to enable fault tolerant or not.
>>
>> Signed-off-by: Jules Wang <junqing.wang@cs2c.com.cn>
>> ---
>>          .help       = "migrate to URI (using -d to not wait for completion)"
>>         "\n\t\t\t -b for migration without shared storage with"
>>         " full copy of disk\n\t\t\t -i for migration without "
>>         "shared storage with incremental copy of disk "
>> -       "(base image shared between src and destination)",
>> +       "(base image shared between src and destination)"
>> +       "\n\t\t\t -f for fault tolerant, this is another "
>> +       "feature rather than migrate",
>
>That sounds awkward, and overly long.  Maybe go with just:
>
>-f for fault tolerance mode
>
>and let the user then read the full documentation for what it entails.

Agree.

>> -@item migrate [-d] [-b] [-i] @var{uri}
>> +@item migrate [-d] [-b] [-i] [-f] @var{uri}
>>  @findex migrate
>>  Migrate to @var{uri} (using -d to not wait for completion).
>>  	-b for migration with full copy of disk
>>  	-i for migration with incremental copy of disk (base image is shared)
>> +	-f for fault tolerant
>
>Can -d and -f be used at the same time, or are they exclusive?

AFAIK, the migration is always detached (in the code, the -d option is always
passed as false), so -d and -f can be used at the same time without any problem:
qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false, !!ft, ft, &err);

By the way, neither -b nor -i can be used together with -f, because fault
tolerance needs shared storage.
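
The flag rules described above could be checked up front with something like
the following. This is only a minimal sketch: migrate_flags_error() is a
hypothetical helper that does not exist in the posted patch, and its error
text is made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the posted patch: returns NULL when the
 * combination of migrate flags is acceptable, or a short error message when
 * it is not.  Fault tolerance relies on shared storage, so it cannot be
 * combined with either block-migration mode (-b or -i).  -d is not checked
 * because it may be combined with -f.
 */
static const char *migrate_flags_error(bool blk, bool inc, bool ft)
{
    if (ft && (blk || inc)) {
        return "-f cannot be combined with -b or -i";
    }
    return NULL;
}
```

A caller such as hmp_migrate() could report the returned message and bail out
before calling qmp_migrate(), instead of starting a migration that cannot work.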

>> +++ b/hmp.c
>> @@ -1213,10 +1213,11 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
>>      int detach = qdict_get_try_bool(qdict, "detach", 0);
>>      int blk = qdict_get_try_bool(qdict, "blk", 0);
>>      int inc = qdict_get_try_bool(qdict, "inc", 0);
>> +   int ft   = qdict_get_try_bool(qdict, "ft", 0);
>
>Why two spaces?

To align the '='. I will remove them if you like.

>
>> +++ b/qapi-schema.json
>> @@ -2420,7 +2420,8 @@
>>  # Since: 0.14.0
>>  ##
>>  { 'command': 'migrate',
>> -  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' } }
>> +  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool',
>> +           '*ft': 'bool' } }
>
>Missing documentation, including mention that the new option was only
>made available in 1.7.  We still don't have introspection; is there some
>other means by which libvirt and other management apps can tell whether
>this feature is available? 

I'm not sure how to do that. Could you please give me some hints on where to
add the code and documentation?

>Furthermore, 'ft' is an awfully short name;
>for QMP, we prefer to use full words where possible, such as
>'fault-tolerant'.

Agree.

>-- 
>Eric Blake   eblake redhat com    +1-919-301-3266
>Libvirt virtualization library http://libvirt.org
>


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/4] Curling: cmdline interface.
  2013-10-09  6:49     ` junqing.wang
@ 2013-10-09 12:02       ` Eric Blake
  2013-10-10  2:52         ` Jules
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Blake @ 2013-10-09 12:02 UTC (permalink / raw)
  To: junqing.wang; +Cc: pbonzini, quintela, qemu-devel, owasserm

[-- Attachment #1: Type: text/plain, Size: 2627 bytes --]

[your emailer munged the reply, making it a bit hard to read.  Are you
set for plain-text-only mail to the list?]

On 10/09/2013 12:49 AM, junqing.wang@cs2c.com.cn wrote:

>>> +++ b/hmp.c
>>> @@ -1213,10 +1213,11 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
>>>      int detach = qdict_get_try_bool(qdict, "detach", 0);
>>>      int blk = qdict_get_try_bool(qdict, "blk", 0);
>>>      int inc = qdict_get_try_bool(qdict, "inc", 0);
>>> +   int ft   = qdict_get_try_bool(qdict, "ft", 0);
>>
>> Why two spaces?
> 
> To align the '=',  I will remove them if you like. 

It's not a problem with me either way, other than we have a lot of code
that doesn't care about alignment and consistently uses one space, and a
fair amount of code where everything in a block of code is consistently
aligned.  But your patch was neither, in the context of the block it
lives within - if you're going to align, then line up everything with
the longest line 'int detach' (including blk and inc).
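
For illustration, the two consistent layouts look roughly like this. It is a
sketch only: flag_get_try_bool() is a hypothetical stand-in for
qdict_get_try_bool() so that the fragment compiles on its own, and the
count_flags_*() wrappers exist purely to hold the declarations.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for qdict_get_try_bool(): reports whether the
 * option letter appears in a flag string, else returns the default. */
static bool flag_get_try_bool(const char *flags, char opt, bool def)
{
    return strchr(flags, opt) != NULL ? true : def;
}

/* Style 1: a single space before '=' on every line, no alignment. */
static int count_flags_plain(const char *flags)
{
    int detach = flag_get_try_bool(flags, 'd', false);
    int blk = flag_get_try_bool(flags, 'b', false);
    int inc = flag_get_try_bool(flags, 'i', false);
    int ft = flag_get_try_bool(flags, 'f', false);

    return detach + blk + inc + ft;
}

/* Style 2: every '=' lined up with the longest declaration, 'int detach'. */
static int count_flags_aligned(const char *flags)
{
    int detach = flag_get_try_bool(flags, 'd', false);
    int blk    = flag_get_try_bool(flags, 'b', false);
    int inc    = flag_get_try_bool(flags, 'i', false);
    int ft     = flag_get_try_bool(flags, 'f', false);

    return detach + blk + inc + ft;
}
```

Either style is fine within a block; what stands out is a block that mixes
the two.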

> 
>>
>>> +++ b/qapi-schema.json
>>> @@ -2420,7 +2420,8 @@
>>>  # Since: 0.14.0
>>>  ##
>>>  { 'command': 'migrate',
>>> -  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' } }
>>> +  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool',
>>> +           '*ft': 'bool' } }
>>
>> Missing documentation, including mention that the new option was only
>> made available in 1.7.  We still don't have introspection; is there some
>> other means by which libvirt and other management apps can tell whether
>> this feature is available? 
> 
> I'm not clear about how to do that, could you pls give me some hints, where to 
> add code and documentation. 

As for the documentation, qapi-schema.json has plenty of examples (look
for a field with "(since 1.7)" as a hint for how to document an optional
field added in a later release than the main struct).

As for the introspection, Amos Kong was most recently working on trying
to add that (but missed the 1.6 deadline, and I haven't seen work on it
since).  Introspection is not a hard requirement, but it makes it harder
for libvirt to know if it can use 'ft':true if there is no other
'query-*' command that it can call first that would give it a hint that
this is a new enough qemu to support 'ft' during migration.  Maybe even
having something listed under query-migrate-capabilities would be
sufficient (ie. modify the 'MigrationCapability' enum to advertise a new
capability).
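
On the management side, the probe could be as small as the sketch below. It
assumes the application has already parsed the capability names out of a
query-migrate-capabilities reply into an array of strings.
migration_cap_advertised() is a hypothetical helper, and the capability name
"fault-tolerant" is only an assumption, since no name has been settled on.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical client-side check: 'caps' holds the capability names a
 * management application parsed out of a query-migrate-capabilities reply.
 * The name "fault-tolerant" used by callers is an assumption; the thread
 * does not settle on one.
 */
static bool migration_cap_advertised(const char *const *caps, size_t ncaps,
                                     const char *name)
{
    for (size_t i = 0; i < ncaps; i++) {
        if (strcmp(caps[i], name) == 0) {
            return true;
        }
    }
    return false;
}
```

If the name is absent, the application would simply not pass 'ft':true to
'migrate' and could report that the qemu binary is too old for the feature.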

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/4] Curling: cmdline interface.
  2013-10-09 12:02       ` Eric Blake
@ 2013-10-10  2:52         ` Jules
  0 siblings, 0 replies; 9+ messages in thread
From: Jules @ 2013-10-10  2:52 UTC (permalink / raw)
  To: Eric Blake; +Cc: pbonzini, quintela, qemu-devel, owasserm

On Wed, 2013-10-09 at 06:02 -0600, Eric Blake wrote:
> [your emailer munged the reply, making it a bit hard to read.  Are you
> set for plain-text-only mail to the list?]

Thanks VERY much for reminding me of that; I'm using another client now.

> On 10/09/2013 12:49 AM, junqing.wang@cs2c.com.cn wrote:
> 
> >>> +++ b/hmp.c
> >>> @@ -1213,10 +1213,11 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
> >>>      int detach = qdict_get_try_bool(qdict, "detach", 0);
> >>>      int blk = qdict_get_try_bool(qdict, "blk", 0);
> >>>      int inc = qdict_get_try_bool(qdict, "inc", 0);
> >>> +   int ft   = qdict_get_try_bool(qdict, "ft", 0);
> >>
> >> Why two spaces?
> > 
> > To align the '=',  I will remove them if you like. 
> 
> It's not a problem with me either way, other than we have a lot of code
> that doesn't care about alignment and consistently uses one space, and a
> fair amount of code where everything in a block of code is consistently
> aligned.  But your patch was neither, in the context of the block it
> lives within - if you're going to align, then line up everything with
> the longest line 'int detach' (including blk and inc).
> 

Oh, I see. You are right; I missed the longest line, 'int detach ...'.
I'm not going to align them.

> > 
> >>
> >>> +++ b/qapi-schema.json
> >>> @@ -2420,7 +2420,8 @@
> >>>  # Since: 0.14.0
> >>>  ##
> >>>  { 'command': 'migrate',
> >>> -  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' } }
> >>> +  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool',
> >>> +           '*ft': 'bool' } }
> >>
> >> Missing documentation, including mention that the new option was only
> >> made available in 1.7.  We still don't have introspection; is there some
> >> other means by which libvirt and other management apps can tell whether
> >> this feature is available? 
> > 
> > I'm not clear about how to do that, could you pls give me some hints, where to 
> > add code and documentation. 
> 
> As for the documentation, qapi-schema.json has plenty of examples (look
> for a field with "(since 1.7)" as a hint for how to document an optional
> field added in a later release than the main struct).

I see. Thanks.
> 
> As for the introspection, Amos Kong was most recently working on trying
> to add that (but missed the 1.6 deadline, and I haven't seen work on it
> since).  Introspection is not a hard requirement, but it makes it harder
> for libvirt to know if it can use 'ft':true if there is no other
> 'query-*' command that it can call first that would give it a hint that
> this is a new enough qemu to support 'ft' during migration.  Maybe even
> having something listed under query-migrate-capabilities would be
> sufficient (ie. modify the 'MigrationCapability' enum to advertise a new
> capability).

Adding a new migration capability would be a workaround: we turn on ft
with the -f option rather than by setting fault-tolerant-capability to
true, so I hesitate to add it.

What about adding a query for the migration options, similar to
@query-command-line-options?

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2013-10-10  2:53 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-09-29 20:14 [Qemu-devel] [PATCH v2 0/4] Curling: KVM Fault Tolerance Jules Wang
2013-09-29 20:14 ` [Qemu-devel] [PATCH v2 1/4] Curling: add doc Jules Wang
2013-09-29 20:14 ` [Qemu-devel] [PATCH v2 2/4] Curling: cmdline interface Jules Wang
2013-09-30 22:16   ` Eric Blake
2013-10-09  6:49     ` junqing.wang
2013-10-09 12:02       ` Eric Blake
2013-10-10  2:52         ` Jules
2013-09-29 20:14 ` [Qemu-devel] [PATCH v2 3/4] Curling: the sender Jules Wang
2013-09-29 20:14 ` [Qemu-devel] [PATCH v2 4/4] Curling: the receiver Jules Wang
