* [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
From: zhanghailiang @ 2016-02-22  2:39 UTC
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, root, arei.gonglei,
	stefanha, amit.shah, zhangchen.fnst, hongyang.yang

From: root <root@localhost.localdomain>

This is the 15th version of COLO (still only supports periodic checkpoint mode).

This series contains only the COLO framework part; you can get the complete code from GitHub:
https://github.com/coloft/qemu/commits/colo-v2.6-periodic-mode
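
For example, to fetch and build that tree (the branch name here is assumed
from the URL above):

  git clone https://github.com/coloft/qemu.git
  cd qemu
  git checkout colo-v2.6-periodic-mode
  ./configure --enable-colo && make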

There are few changes in this series apart from the network-related part.

Patch status:
Unreviewed: patches 21, 27, 28, 29, 33, 38
Updated: patches 31, 34, 35, 37

TODO:
1. Checkpoint based on proxy in qemu
2. The capability of continuous FT
3. Optimize the VM's downtime during checkpoint

v15:
 - Continue the shutdown process if an error is encountered while sending the
   shutdown message to the SVM. (patch 24)
 - Rename qemu_need_skip_netfilter to qemu_netfilter_can_skip and remove
   some useless comments. (patch 31, Jason)
 - Call object_new_with_props() directly to add filter in
   colo_add_buffer_filter. (patch 34, Jason)
 - Re-implement colo_set_filter_status() based on COLOBufferFilters
   list. (patch 35)
 - Re-implement colo_flush_filter_packets() based on COLOBufferFilters
   list. (patch 37)

v14:
 - Re-implement the network processing based on netfilter (Jason Wang)
 - Rename 'COLOCommand' to 'COLOMessage'. (Markus's suggestion)
 - Split two new patches (patch 27/28) from patch 29
 - Fix some other comments from Dave and Markus.

v13:
 - Refactor colo_*_cmd helper functions to use 'Error **errp' parameter
  instead of return value to indicate success or failure. (patch 10)
 - Remove the optional error message for COLO_EXIT event. (patch 25)
 - Use a semaphore to notify the colo/colo incoming loop that failover work
   is finished. (patch 26)
 - Move COLO shutdown related codes to colo.c file. (patch 28)
 - Fix memory leak bug for colo incoming loop. (new patch 31)
 - Reuse some existing helper functions to implement the process of
   saving/loading RAM and device state. (patch 32)
 - Fix some other comments from Dave and Markus.

zhanghailiang (38):
  configure: Add parameter for configure to enable/disable COLO support
  migration: Introduce capability 'x-colo' to migration
  COLO: migrate colo related info to secondary node
  migration: Integrate COLO checkpoint process into migration
  migration: Integrate COLO checkpoint process into loadvm
  COLO/migration: Create a new communication path from destination to
    source
  COLO: Implement colo checkpoint protocol
  COLO: Add a new RunState RUN_STATE_COLO
  QEMUSizedBuffer: Introduce two help functions for qsb
  COLO: Save PVM state to secondary side when do checkpoint
  COLO: Load PVM's dirty pages into SVM's RAM cache temporarily
  ram/COLO: Record the dirty pages that SVM received
  COLO: Load VMState into qsb before restore it
  COLO: Flush PVM's cached RAM into SVM's memory
  COLO: Add checkpoint-delay parameter for migrate-set-parameters
  COLO: synchronize PVM's state to SVM periodically
  COLO failover: Introduce a new command to trigger a failover
  COLO failover: Introduce state to record failover process
  COLO: Implement failover work for Primary VM
  COLO: Implement failover work for Secondary VM
  qmp event: Add COLO_EXIT event to notify users while exited from COLO
  COLO failover: Shutdown related socket fd when do failover
  COLO failover: Don't do failover during loading VM's state
  COLO: Process shutdown command for VM in COLO state
  COLO: Update the global runstate after going into colo state
  savevm: Introduce two helper functions for save/find loadvm_handlers
    entry
  migration/savevm: Add new helpers to process the different stages of
    loadvm
  migration/savevm: Export two helper functions for savevm process
  COLO: Separate the process of saving/loading ram and device state
  COLO: Split qemu_savevm_state_begin out of checkpoint process
  net/filter: Add a 'status' property for filter object
  filter-buffer: Accept zero interval
  net: Add notifier/callback for netdev init
  COLO/filter: add each netdev a buffer filter
  COLO: manage the status of buffer filters for PVM
  filter-buffer: make filter_buffer_flush() public
  COLO: flush buffered packets in checkpoint process or exit COLO
  COLO: Add block replication into colo process

 configure                     |  11 +
 docs/qmp-events.txt           |  16 +
 hmp-commands.hx               |  15 +
 hmp.c                         |  15 +
 hmp.h                         |   1 +
 include/exec/ram_addr.h       |   1 +
 include/migration/colo.h      |  42 ++
 include/migration/failover.h  |  33 ++
 include/migration/migration.h |  16 +
 include/migration/qemu-file.h |   3 +-
 include/net/filter.h          |   5 +
 include/net/net.h             |   4 +
 include/sysemu/sysemu.h       |   9 +
 migration/Makefile.objs       |   2 +
 migration/colo-comm.c         |  76 ++++
 migration/colo-failover.c     |  83 ++++
 migration/colo.c              | 866 ++++++++++++++++++++++++++++++++++++++++++
 migration/migration.c         | 109 +++++-
 migration/qemu-file-buf.c     |  61 +++
 migration/ram.c               | 175 ++++++++-
 migration/savevm.c            | 114 ++++--
 net/filter-buffer.c           |  14 +-
 net/filter.c                  |  40 ++
 net/net.c                     |  33 ++
 qapi-schema.json              | 104 ++++-
 qapi/event.json               |  15 +
 qemu-options.hx               |   4 +-
 qmp-commands.hx               |  23 +-
 stubs/Makefile.objs           |   1 +
 stubs/migration-colo.c        |  54 +++
 trace-events                  |   8 +
 vl.c                          |  31 +-
 32 files changed, 1908 insertions(+), 76 deletions(-)
 create mode 100644 include/migration/colo.h
 create mode 100644 include/migration/failover.h
 create mode 100644 migration/colo-comm.c
 create mode 100644 migration/colo-failover.c
 create mode 100644 migration/colo.c
 create mode 100644 stubs/migration-colo.c

-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v15 01/38] configure: Add parameter for configure to enable/disable COLO support
From: zhanghailiang @ 2016-02-22  2:39 UTC
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, zhangchen.fnst, hongyang.yang

Add configure parameters --enable-colo / --disable-colo to switch COLO
support on or off.
COLO support is on by default.
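
For example:

  ./configure --enable-colo     # explicit, same as the default
  ./configure --disable-colo    # build without COLO support

When enabled, CONFIG_COLO=y is written to config-host.mak, as the last hunk
below shows.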

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v11:
- Turn COLO on by default (Eric's suggestion)
---
 configure | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/configure b/configure
index 0aa249b..3b89fe2 100755
--- a/configure
+++ b/configure
@@ -229,6 +229,7 @@ xfs=""
 vhost_net="no"
 vhost_scsi="no"
 kvm="no"
+colo="yes"
 rdma=""
 gprof="no"
 debug_tcg="no"
@@ -911,6 +912,10 @@ for opt do
   ;;
   --enable-kvm) kvm="yes"
   ;;
+  --disable-colo) colo="no"
+  ;;
+  --enable-colo) colo="yes"
+  ;;
   --disable-tcg-interpreter) tcg_interpreter="no"
   ;;
   --enable-tcg-interpreter) tcg_interpreter="yes"
@@ -1334,6 +1339,7 @@ disabled with --disable-FEATURE, default is enabled if available:
   fdt             fdt device tree
   bluez           bluez stack connectivity
   kvm             KVM acceleration support
+  colo            COarse-grain LOck-stepping VM for Non-stop Service
   rdma            RDMA-based migration support
   uuid            uuid support
   vde             support for vde network
@@ -4725,6 +4731,7 @@ echo "Linux AIO support $linux_aio"
 echo "ATTR/XATTR support $attr"
 echo "Install blobs     $blobs"
 echo "KVM support       $kvm"
+echo "COLO support      $colo"
 echo "RDMA support      $rdma"
 echo "TCG interpreter   $tcg_interpreter"
 echo "fdt support       $fdt"
@@ -5320,6 +5327,10 @@ if have_backend "ftrace"; then
 fi
 echo "CONFIG_TRACE_FILE=$trace_file" >> $config_host_mak
 
+if test "$colo" = "yes"; then
+  echo "CONFIG_COLO=y" >> $config_host_mak
+fi
+
 if test "$rdma" = "yes" ; then
   echo "CONFIG_RDMA=y" >> $config_host_mak
 fi
-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v15 02/38] migration: Introduce capability 'x-colo' to migration
From: zhanghailiang @ 2016-02-22  2:39 UTC
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, zhangchen.fnst, hongyang.yang

We add a helper function colo_supported() to indicate whether COLO is
supported. It is used to control whether the 'x-colo' string is shown to
users; they can use the QMP command 'query-migrate-capabilities' or the HMP
command 'info migrate_capabilities' to learn whether COLO is supported.
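
An illustrative QMP exchange (the values shown are examples only):

  -> { "execute": "migrate-set-capabilities",
       "arguments": { "capabilities": [
         { "capability": "x-colo", "state": true } ] } }
  <- { "return": {} }

  -> { "execute": "query-migrate-capabilities" }
  <- { "return": [ ..., { "capability": "x-colo", "state": true } ] }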

Cc: Juan Quintela <quintela@redhat.com>
Cc: Amit Shah <amit.shah@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
v14:
- Fix the date of Copyright to 2016
v10:
- Rename capability 'colo' to experimental 'x-colo' (Eric's suggestion).
- Rename migrate_enable_colo() to migrate_colo_enabled() (Eric's suggestion).
---
 include/migration/colo.h      | 20 ++++++++++++++++++++
 include/migration/migration.h |  1 +
 migration/Makefile.objs       |  1 +
 migration/colo.c              | 18 ++++++++++++++++++
 migration/migration.c         | 18 ++++++++++++++++++
 qapi-schema.json              |  6 +++++-
 qmp-commands.hx               |  1 +
 stubs/Makefile.objs           |  1 +
 stubs/migration-colo.c        | 18 ++++++++++++++++++
 9 files changed, 83 insertions(+), 1 deletion(-)
 create mode 100644 include/migration/colo.h
 create mode 100644 migration/colo.c
 create mode 100644 stubs/migration-colo.c

diff --git a/include/migration/colo.h b/include/migration/colo.h
new file mode 100644
index 0000000..59a632a
--- /dev/null
+++ b/include/migration/colo.h
@@ -0,0 +1,20 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2016 FUJITSU LIMITED
+ * Copyright (c) 2016 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_COLO_H
+#define QEMU_COLO_H
+
+#include "qemu-common.h"
+
+bool colo_supported(void);
+
+#endif
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 74684ad..c962ad4 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -271,6 +271,7 @@ int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen);
 
 int migrate_use_xbzrle(void);
 int64_t migrate_xbzrle_cache_size(void);
+bool migrate_colo_enabled(void);
 
 int64_t xbzrle_cache_resize(int64_t new_size);
 
diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index 0cac6d7..65ecc35 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -1,4 +1,5 @@
 common-obj-y += migration.o tcp.o
+common-obj-$(CONFIG_COLO) += colo.o
 common-obj-y += vmstate.o
 common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o qemu-file-stdio.o
 common-obj-y += xbzrle.o postcopy-ram.o
diff --git a/migration/colo.c b/migration/colo.c
new file mode 100644
index 0000000..cb3e22d
--- /dev/null
+++ b/migration/colo.c
@@ -0,0 +1,18 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2016 FUJITSU LIMITED
+ * Copyright (c) 2016 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "migration/colo.h"
+
+bool colo_supported(void)
+{
+    return true;
+}
diff --git a/migration/migration.c b/migration/migration.c
index a64cfcd..68b5019 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -34,6 +34,7 @@
 #include "qom/cpu.h"
 #include "exec/memory.h"
 #include "exec/address-spaces.h"
+#include "migration/colo.h"
 
 #define MAX_THROTTLE  (32 << 20)      /* Migration transfer speed throttling */
 
@@ -485,6 +486,9 @@ MigrationCapabilityStatusList *qmp_query_migrate_capabilities(Error **errp)
 
     caps = NULL; /* silence compiler warning */
     for (i = 0; i < MIGRATION_CAPABILITY__MAX; i++) {
+        if (i == MIGRATION_CAPABILITY_X_COLO && !colo_supported()) {
+            continue;
+        }
         if (head == NULL) {
             head = g_malloc0(sizeof(*caps));
             caps = head;
@@ -684,6 +688,14 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
     }
 
     for (cap = params; cap; cap = cap->next) {
+        if (cap->value->capability == MIGRATION_CAPABILITY_X_COLO) {
+            if (!colo_supported()) {
+                error_setg(errp, "COLO is not currently supported, please"
+                             " configure with --enable-colo option in order to"
+                             " support COLO feature");
+                continue;
+            }
+        }
         s->enabled_capabilities[cap->value->capability] = cap->value->state;
     }
 
@@ -1592,6 +1604,12 @@ fail:
                       MIGRATION_STATUS_FAILED);
 }
 
+bool migrate_colo_enabled(void)
+{
+    MigrationState *s = migrate_get_current();
+    return s->enabled_capabilities[MIGRATION_CAPABILITY_X_COLO];
+}
+
 /*
  * Master migration thread on the source VM.
  * It drives the migration and pumps the data down the outgoing channel.
diff --git a/qapi-schema.json b/qapi-schema.json
index 8d04897..aafa9f7 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -544,11 +544,15 @@
 #          been migrated, pulling the remaining pages along as needed. NOTE: If
 #          the migration fails during postcopy the VM will fail.  (since 2.5)
 #
+# @x-colo: If enabled, migration will never end, and the state of the VM on the
+#        primary side will be migrated continuously to the VM on secondary
+#        side. (since 2.6)
+#
 # Since: 1.2
 ##
 { 'enum': 'MigrationCapability',
   'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
-           'compress', 'events', 'x-postcopy-ram'] }
+           'compress', 'events', 'x-postcopy-ram', 'x-colo'] }
 
 ##
 # @MigrationCapabilityStatus
diff --git a/qmp-commands.hx b/qmp-commands.hx
index f9824f7..bf27c38 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -3680,6 +3680,7 @@ Query current migration capabilities
          - "compress": Multiple compression threads state (json-bool)
          - "events": Migration state change event state (json-bool)
          - "x-postcopy-ram": postcopy ram state (json-bool)
+         - "x-colo" : COarse-Grain LOck Stepping for Non-stop Service (json-bool)
 
 Arguments:
 
diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index e922de9..459607c 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -39,3 +39,4 @@ stub-obj-y += qmp_pc_dimm_device_list.o
 stub-obj-y += target-monitor-defs.o
 stub-obj-y += target-get-monitor-def.o
 stub-obj-y += vhost.o
+stub-obj-y += migration-colo.o
diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
new file mode 100644
index 0000000..7b7aee0
--- /dev/null
+++ b/stubs/migration-colo.c
@@ -0,0 +1,18 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2016 FUJITSU LIMITED
+ * Copyright (c) 2016 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "migration/colo.h"
+
+bool colo_supported(void)
+{
+    return false;
+}
-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v15 03/38] COLO: migrate colo related info to secondary node
From: zhanghailiang @ 2016-02-22  2:39 UTC
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, zhangchen.fnst, hongyang.yang

The destination can know whether the VM should go into COLO mode by
referring to the info migrated from the PVM.

We skip this vmstate section if COLO is not enabled (i.e.
'migrate_set_capability x-colo off'), so compatibility with normal
migration is preserved no matter how --enable-colo/--disable-colo is used
on the source and destination.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v14:
- Adjust the place of calling colo_info_init()
v11:
- Add Reviewed-by tag
v10:
- Use VMSTATE_BOOL instead of VMSTATE_UINT32 for 'colo_requested' (Dave's suggestion)
---
 include/migration/colo.h |  2 ++
 migration/Makefile.objs  |  1 +
 migration/colo-comm.c    | 50 ++++++++++++++++++++++++++++++++++++++++++++++++
 vl.c                     |  4 +++-
 4 files changed, 56 insertions(+), 1 deletion(-)
 create mode 100644 migration/colo-comm.c

diff --git a/include/migration/colo.h b/include/migration/colo.h
index 59a632a..1c899a0 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -14,7 +14,9 @@
 #define QEMU_COLO_H
 
 #include "qemu-common.h"
+#include "migration/migration.h"
 
 bool colo_supported(void);
+void colo_info_init(void);
 
 #endif
diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index 65ecc35..81b5713 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -1,5 +1,6 @@
 common-obj-y += migration.o tcp.o
 common-obj-$(CONFIG_COLO) += colo.o
+common-obj-y += colo-comm.o
 common-obj-y += vmstate.o
 common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o qemu-file-stdio.o
 common-obj-y += xbzrle.o postcopy-ram.o
diff --git a/migration/colo-comm.c b/migration/colo-comm.c
new file mode 100644
index 0000000..723d86d
--- /dev/null
+++ b/migration/colo-comm.c
@@ -0,0 +1,50 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2016 FUJITSU LIMITED
+ * Copyright (c) 2016 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later. See the COPYING file in the top-level directory.
+ *
+ */
+
+#include <migration/colo.h>
+#include "trace.h"
+
+typedef struct {
+     bool colo_requested;
+} COLOInfo;
+
+static COLOInfo colo_info;
+
+static void colo_info_pre_save(void *opaque)
+{
+    COLOInfo *s = opaque;
+
+    s->colo_requested = migrate_colo_enabled();
+}
+
+static bool colo_info_need(void *opaque)
+{
+   return migrate_colo_enabled();
+}
+
+static const VMStateDescription colo_state = {
+     .name = "COLOState",
+     .version_id = 1,
+     .minimum_version_id = 1,
+     .pre_save = colo_info_pre_save,
+     .needed = colo_info_need,
+     .fields = (VMStateField[]) {
+         VMSTATE_BOOL(colo_requested, COLOInfo),
+         VMSTATE_END_OF_LIST()
+        },
+};
+
+void colo_info_init(void)
+{
+    vmstate_register(NULL, 0, &colo_state, &colo_info);
+}
diff --git a/vl.c b/vl.c
index b87e292..f35703f 100644
--- a/vl.c
+++ b/vl.c
@@ -85,6 +85,7 @@ int main(int argc, char **argv)
 #include "sysemu/dma.h"
 #include "audio/audio.h"
 #include "migration/migration.h"
+#include "migration/colo.h"
 #include "sysemu/kvm.h"
 #include "qapi/qmp/qjson.h"
 #include "qemu/option.h"
@@ -4394,6 +4395,8 @@ int main(int argc, char **argv, char **envp)
     /* clean up network at qemu process termination */
     atexit(&net_cleanup);
 
+    colo_info_init();
+
     if (net_init_clients() < 0) {
         exit(1);
     }
@@ -4425,7 +4428,6 @@ int main(int argc, char **argv, char **envp)
 
     blk_mig_init();
     ram_mig_init();
-
     /* If the currently selected machine wishes to override the units-per-bus
      * property of its default HBA interface type, do so now. */
     if (machine_class->units_per_default_bus) {
-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v15 04/38] migration: Integrate COLO checkpoint process into migration
From: zhanghailiang @ 2016-02-22  2:39 UTC
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, zhangchen.fnst, hongyang.yang

Add a new migration state, MIGRATION_STATUS_COLO, which is entered after
the first live migration finishes successfully.

We reuse the migration thread, so if COLO is enabled by the user, the
migration thread will go into the COLO process.
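
An illustrative HMP session on the primary side (host and port are examples
only):

  (qemu) migrate_set_capability x-colo on
  (qemu) migrate -d tcp:192.168.1.2:8888

Once the initial live migration completes, 'info migrate' reports the new
'colo' status instead of 'completed'.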

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v11:
- Rebase to master
- Add Reviewed-by tag
v10:
- Simplify process by dropping colo thread and reusing migration thread.
     (Dave's suggestion)
---
 include/migration/colo.h |  3 +++
 migration/colo.c         | 31 +++++++++++++++++++++++++++++++
 migration/migration.c    | 30 ++++++++++++++++++++++++++----
 qapi-schema.json         |  4 +++-
 stubs/migration-colo.c   |  9 +++++++++
 trace-events             |  3 +++
 6 files changed, 75 insertions(+), 5 deletions(-)

diff --git a/include/migration/colo.h b/include/migration/colo.h
index 1c899a0..bf84b99 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -19,4 +19,7 @@
 bool colo_supported(void);
 void colo_info_init(void);
 
+void migrate_start_colo_process(MigrationState *s);
+bool migration_in_colo_state(void);
+
 #endif
diff --git a/migration/colo.c b/migration/colo.c
index cb3e22d..8d0d851 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -10,9 +10,40 @@
  * later.  See the COPYING file in the top-level directory.
  */
 
+#include "sysemu/sysemu.h"
 #include "migration/colo.h"
+#include "trace.h"
 
 bool colo_supported(void)
 {
     return true;
 }
+
+bool migration_in_colo_state(void)
+{
+    MigrationState *s = migrate_get_current();
+
+    return (s->state == MIGRATION_STATUS_COLO);
+}
+
+static void colo_process_checkpoint(MigrationState *s)
+{
+    qemu_mutex_lock_iothread();
+    vm_start();
+    qemu_mutex_unlock_iothread();
+    trace_colo_vm_state_change("stop", "run");
+
+    /*TODO: COLO checkpoint savevm loop*/
+
+    migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
+                      MIGRATION_STATUS_COMPLETED);
+}
+
+void migrate_start_colo_process(MigrationState *s)
+{
+    qemu_mutex_unlock_iothread();
+    migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
+                      MIGRATION_STATUS_COLO);
+    colo_process_checkpoint(s);
+    qemu_mutex_lock_iothread();
+}
diff --git a/migration/migration.c b/migration/migration.c
index 68b5019..d7228f5 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -641,6 +641,10 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 
         get_xbzrle_cache_stats(info);
         break;
+    case MIGRATION_STATUS_COLO:
+        info->has_status = true;
+        /* TODO: display COLO specific information (checkpoint info etc.) */
+        break;
     case MIGRATION_STATUS_COMPLETED:
         get_xbzrle_cache_stats(info);
 
@@ -1001,7 +1005,8 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     params.shared = has_inc && inc;
 
     if (migration_is_setup_or_active(s->state) ||
-        s->state == MIGRATION_STATUS_CANCELLING) {
+        s->state == MIGRATION_STATUS_CANCELLING ||
+        s->state == MIGRATION_STATUS_COLO) {
         error_setg(errp, QERR_MIGRATION_ACTIVE);
         return;
     }
@@ -1595,8 +1600,11 @@ static void migration_completion(MigrationState *s, int current_active_state,
         goto fail;
     }
 
-    migrate_set_state(&s->state, current_active_state,
-                      MIGRATION_STATUS_COMPLETED);
+    if (!migrate_colo_enabled()) {
+        migrate_set_state(&s->state, current_active_state,
+                          MIGRATION_STATUS_COMPLETED);
+    }
+
     return;
 
 fail:
@@ -1628,6 +1636,7 @@ static void *migration_thread(void *opaque)
     bool entered_postcopy = false;
     /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */
     enum MigrationStatus current_active_state = MIGRATION_STATUS_ACTIVE;
+    bool enable_colo = migrate_colo_enabled();
 
     rcu_register_thread();
 
@@ -1736,7 +1745,11 @@ static void *migration_thread(void *opaque)
     end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
 
     qemu_mutex_lock_iothread();
-    qemu_savevm_state_cleanup();
+    /* The resources allocated by migration will be reused in the COLO
+       process, so don't release them here. */
+    if (!enable_colo) {
+        qemu_savevm_state_cleanup();
+    }
     if (s->state == MIGRATION_STATUS_COMPLETED) {
         uint64_t transferred_bytes = qemu_ftell(s->to_dst_file);
         s->total_time = end_time - s->total_time;
@@ -1749,6 +1762,15 @@ static void *migration_thread(void *opaque)
         }
         runstate_set(RUN_STATE_POSTMIGRATE);
     } else {
+        if (s->state == MIGRATION_STATUS_ACTIVE && enable_colo) {
+            migrate_start_colo_process(s);
+            qemu_savevm_state_cleanup();
+            /*
+            * FIXME: we will run the VM in COLO regardless of its old running
+            * state; after exiting COLO, it will keep running.
+            */
+            old_vm_running = true;
+        }
         if (old_vm_running && !entered_postcopy) {
             vm_start();
         }
diff --git a/qapi-schema.json b/qapi-schema.json
index aafa9f7..26a1d37 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -434,12 +434,14 @@
 #
 # @failed: some error occurred during migration process.
 #
+# @colo: VM is in the process of fault tolerance. (since 2.6)
+#
 # Since: 2.3
 #
 ##
 { 'enum': 'MigrationStatus',
   'data': [ 'none', 'setup', 'cancelling', 'cancelled',
-            'active', 'postcopy-active', 'completed', 'failed' ] }
+            'active', 'postcopy-active', 'completed', 'failed', 'colo' ] }
 
 ##
 # @MigrationInfo
diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
index 7b7aee0..41fed49 100644
--- a/stubs/migration-colo.c
+++ b/stubs/migration-colo.c
@@ -16,3 +16,12 @@ bool colo_supported(void)
 {
     return false;
 }
+
+bool migration_in_colo_state(void)
+{
+    return false;
+}
+
+void migrate_start_colo_process(MigrationState *s)
+{
+}
diff --git a/trace-events b/trace-events
index f986c81..53714db 100644
--- a/trace-events
+++ b/trace-events
@@ -1603,6 +1603,9 @@ postcopy_ram_incoming_cleanup_entry(void) ""
 postcopy_ram_incoming_cleanup_exit(void) ""
 postcopy_ram_incoming_cleanup_join(void) ""
 
+# migration/colo.c
+colo_vm_state_change(const char *old, const char *new) "Change '%s' => '%s'"
+
 # kvm-all.c
 kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
 kvm_vm_ioctl(int type, void *arg) "type 0x%x, arg %p"
-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v15 05/38] migration: Integrate COLO checkpoint process into loadvm
From: zhanghailiang @ 2016-02-22  2:39 UTC
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, zhangchen.fnst, hongyang.yang

Switch from the normal migration loadvm process into the COLO checkpoint
process if COLO mode is enabled.

We add three new members to struct MigrationIncomingState:
'have_colo_incoming_thread' and 'colo_incoming_thread' record the COLO
related thread for the secondary VM, and 'migration_incoming_co' records
the original migration incoming coroutine.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v12:
- Add Reviewed-by tag
v11:
- We moved the place of bdrv_invalidate_cache_all(), but the deletion was
  done in another patch. Fix it.
- Add documentation for colo in 'MigrationStatus' (Eric's review comment)
v10:
- Fix an fd leak bug found by Dave.
---
 include/migration/colo.h      |  7 +++++++
 include/migration/migration.h |  7 +++++++
 migration/colo-comm.c         | 10 ++++++++++
 migration/colo.c              | 22 ++++++++++++++++++++++
 migration/migration.c         | 31 +++++++++++++++++++++----------
 stubs/migration-colo.c        | 10 ++++++++++
 6 files changed, 77 insertions(+), 10 deletions(-)

diff --git a/include/migration/colo.h b/include/migration/colo.h
index bf84b99..b40676c 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -15,6 +15,8 @@
 
 #include "qemu-common.h"
 #include "migration/migration.h"
+#include "qemu/coroutine_int.h"
+#include "qemu/thread.h"
 
 bool colo_supported(void);
 void colo_info_init(void);
@@ -22,4 +24,9 @@ void colo_info_init(void);
 void migrate_start_colo_process(MigrationState *s);
 bool migration_in_colo_state(void);
 
+/* loadvm */
+bool migration_incoming_enable_colo(void);
+void migration_incoming_exit_colo(void);
+void *colo_process_incoming_thread(void *opaque);
+bool migration_incoming_in_colo_state(void);
 #endif
diff --git a/include/migration/migration.h b/include/migration/migration.h
index c962ad4..e7a516c 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -22,6 +22,7 @@
 #include "migration/vmstate.h"
 #include "qapi-types.h"
 #include "exec/cpu-common.h"
+#include "qemu/coroutine_int.h"
 
 #define QEMU_VM_FILE_MAGIC           0x5145564d
 #define QEMU_VM_FILE_VERSION_COMPAT  0x00000002
@@ -106,6 +107,12 @@ struct MigrationIncomingState {
     void     *postcopy_tmp_page;
 
     int state;
+
+    bool have_colo_incoming_thread;
+    QemuThread colo_incoming_thread;
+    /* The coroutine we should enter (back) after failover */
+    Coroutine *migration_incoming_co;
+
     /* See savevm.c */
     LoadStateEntry_Head loadvm_handlers;
 };
diff --git a/migration/colo-comm.c b/migration/colo-comm.c
index 723d86d..c36d13f 100644
--- a/migration/colo-comm.c
+++ b/migration/colo-comm.c
@@ -48,3 +48,13 @@ void colo_info_init(void)
 {
     vmstate_register(NULL, 0, &colo_state, &colo_info);
 }
+
+bool migration_incoming_enable_colo(void)
+{
+    return colo_info.colo_requested;
+}
+
+void migration_incoming_exit_colo(void)
+{
+    colo_info.colo_requested = 0;
+}
diff --git a/migration/colo.c b/migration/colo.c
index 8d0d851..20052d9 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -13,6 +13,7 @@
 #include "sysemu/sysemu.h"
 #include "migration/colo.h"
 #include "trace.h"
+#include "qemu/error-report.h"
 
 bool colo_supported(void)
 {
@@ -26,6 +27,13 @@ bool migration_in_colo_state(void)
     return (s->state == MIGRATION_STATUS_COLO);
 }
 
+bool migration_incoming_in_colo_state(void)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+
+    return mis && (mis->state == MIGRATION_STATUS_COLO);
+}
+
 static void colo_process_checkpoint(MigrationState *s)
 {
     qemu_mutex_lock_iothread();
@@ -47,3 +55,17 @@ void migrate_start_colo_process(MigrationState *s)
     colo_process_checkpoint(s);
     qemu_mutex_lock_iothread();
 }
+
+void *colo_process_incoming_thread(void *opaque)
+{
+    MigrationIncomingState *mis = opaque;
+
+    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
+                      MIGRATION_STATUS_COLO);
+
+    /* TODO: COLO checkpoint restore loop */
+
+    migration_incoming_exit_colo();
+
+    return NULL;
+}
diff --git a/migration/migration.c b/migration/migration.c
index d7228f5..6e19c15 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -359,6 +359,27 @@ static void process_incoming_migration_co(void *opaque)
         /* Else if something went wrong then just fall out of the normal exit */
     }
 
+    if (!ret) {
+        /* Make sure all file formats flush their mutable metadata */
+        bdrv_invalidate_cache_all(&local_err);
+        if (local_err) {
+            error_report_err(local_err);
+            migrate_decompress_threads_join();
+            exit(EXIT_FAILURE);
+        }
+    }
+    /* we get colo info, and know if we are in colo mode */
+    if (!ret && migration_incoming_enable_colo()) {
+        mis->migration_incoming_co = qemu_coroutine_self();
+        qemu_thread_create(&mis->colo_incoming_thread, "colo incoming",
+             colo_process_incoming_thread, mis, QEMU_THREAD_JOINABLE);
+        mis->have_colo_incoming_thread = true;
+        qemu_coroutine_yield();
+
+        /* Wait checkpoint incoming thread exit before free resource */
+        qemu_thread_join(&mis->colo_incoming_thread);
+    }
+
     qemu_fclose(f);
     free_xbzrle_decoded_buf();
 
@@ -370,16 +391,6 @@ static void process_incoming_migration_co(void *opaque)
         exit(EXIT_FAILURE);
     }
 
-    /* Make sure all file formats flush their mutable metadata */
-    bdrv_invalidate_cache_all(&local_err);
-    if (local_err) {
-        migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
-                          MIGRATION_STATUS_FAILED);
-        error_report_err(local_err);
-        migrate_decompress_threads_join();
-        exit(EXIT_FAILURE);
-    }
-
     /*
      * This must happen after all error conditions are dealt with and
      * we're sure the VM is going to be running on this host.
diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
index 41fed49..b6f3190 100644
--- a/stubs/migration-colo.c
+++ b/stubs/migration-colo.c
@@ -22,6 +22,16 @@ bool migration_in_colo_state(void)
     return false;
 }
 
+bool migration_incoming_in_colo_state(void)
+{
+    return false;
+}
+
 void migrate_start_colo_process(MigrationState *s)
 {
 }
+
+void *colo_process_incoming_thread(void *opaque)
+{
+    return NULL;
+}
-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v15 06/38] COLO/migration: Create a new communication path from destination to source
From: zhanghailiang @ 2016-02-22  2:40 UTC
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, zhangchen.fnst, hongyang.yang

This new communication path will be used for returning messages
from destination to source.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v13:
- Remove useless error report
v12:
- Add Reviewed-by tag
v11:
- Rebase master to use qemu_file_get_return_path() for opening return path
v10:
- Fix the error log (Dave's suggestion).
---
 migration/colo.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index 20052d9..43e9890 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -36,6 +36,12 @@ bool migration_incoming_in_colo_state(void)
 
 static void colo_process_checkpoint(MigrationState *s)
 {
+    s->rp_state.from_dst_file = qemu_file_get_return_path(s->to_dst_file);
+    if (!s->rp_state.from_dst_file) {
+        error_report("Open QEMUFile from_dst_file failed");
+        goto out;
+    }
+
     qemu_mutex_lock_iothread();
     vm_start();
     qemu_mutex_unlock_iothread();
@@ -43,8 +49,13 @@ static void colo_process_checkpoint(MigrationState *s)
 
     /*TODO: COLO checkpoint savevm loop*/
 
+out:
     migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
                       MIGRATION_STATUS_COMPLETED);
+
+    if (s->rp_state.from_dst_file) {
+        qemu_fclose(s->rp_state.from_dst_file);
+    }
 }
 
 void migrate_start_colo_process(MigrationState *s)
@@ -63,8 +74,23 @@ void *colo_process_incoming_thread(void *opaque)
     migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
                       MIGRATION_STATUS_COLO);
 
+    mis->to_src_file = qemu_file_get_return_path(mis->from_src_file);
+    if (!mis->to_src_file) {
+        error_report("colo incoming thread: Open QEMUFile to_src_file failed");
+        goto out;
+    }
+    /* Note: we set the fd to non-blocking in the migration incoming
+    *  coroutine, but here we are in the colo incoming thread, so it is
+    *  OK to set the fd back to blocking.
+    */
+    qemu_file_set_blocking(mis->from_src_file, true);
+
     /* TODO: COLO checkpoint restore loop */
 
+out:
+    if (mis->to_src_file) {
+        qemu_fclose(mis->to_src_file);
+    }
     migration_incoming_exit_colo();
 
     return NULL;
-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v15 07/38] COLO: Implement colo checkpoint protocol
From: zhanghailiang @ 2016-02-22  2:40 UTC
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, zhangchen.fnst, hongyang.yang

We need a user-defined communication protocol to control the checkpoint
process.

The new checkpoint request is started by the Primary VM, and the
interaction proceeds as below:
Checkpoint synchronizing points:

                   Primary               Secondary
                                            initial work
'checkpoint-ready'    <-------------------- @

'checkpoint-request'  @ -------------------->
                                            Suspend (Only in hybrid mode)
'checkpoint-reply'    <-------------------- @
                      Suspend&Save state
'vmstate-send'        @ -------------------->
                      Send state            Receive state
'vmstate-received'    <-------------------- @
                      Release packets       Load state
'vmstate-load'        <-------------------- @
                      Resume                Resume (Only in hybrid mode)

                      Start Comparing (Only in hybrid mode)
NOTE:
 1) '@' marks who sends the message
 2) Every sync-point is synchronized by the two sides with only
    one handshake (single direction) for low latency.
    If stricter synchronization is required, an opposite-direction
    sync-point should be added.
 3) Since sync-points are single direction, the remote side may
    have gone forward a lot by the time this side receives the sync-point.
 4) For now, we only support the 'periodic' checkpoint mode, in which
    the Secondary VM is not running; later we will support the 'hybrid' mode.
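
In outline, a single checkpoint round on the Primary side mirrors
colo_do_checkpoint_transaction() in the diff below (file and error
arguments abbreviated):

  colo_put_cmd(to_dst_file, COLO_MESSAGE_CHECKPOINT_REQUEST, &err);
  colo_get_check_cmd(from_dst_file, COLO_MESSAGE_CHECKPOINT_REPLY, &err);
  /* suspend the PVM and save its state */
  colo_put_cmd(to_dst_file, COLO_MESSAGE_VMSTATE_SEND, &err);
  /* send the vmstate to the Secondary */
  colo_get_check_cmd(from_dst_file, COLO_MESSAGE_VMSTATE_RECEIVED, &err);
  colo_get_check_cmd(from_dst_file, COLO_MESSAGE_VMSTATE_LOADED, &err);
  /* resume the PVM */

Each message is a single big-endian 32-bit COLOMessage value on the wire
(see colo_put_cmd()/colo_get_cmd()).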

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v14:
- Rename 'COLOCommand' to 'COLOMessage'. (Markus's suggestion)
- Add Reviewed-by tag
v13:
- Refactor colo command related helper functions, use 'Error **errp' parameter
  instead of return value to indicate success or failure.
- Fix some other comments from Markus.

v12:
- Rename colo_ctl_put() to colo_put_cmd()
- Rename colo_ctl_get() to colo_get_check_cmd() and drop
  the third parameter
- Rename colo_ctl_get_cmd() to colo_get_cmd()
- Remove useless 'invalid' member for COLOcommand enum.
v11:
- Add missing 'checkpoint-ready' communication in comment.
- Use parameter to return 'value' for colo_ctl_get() (Dave's suggestion)
- Fix trace for colo_ctl_get() to trace command and value both
v10:
- Rename enum COLOCmd to COLOCommand (Eric's suggestion).
- Remove unused 'ram-steal'
---
 migration/colo.c | 201 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 qapi-schema.json |  25 +++++++
 trace-events     |   2 +
 3 files changed, 226 insertions(+), 2 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 43e9890..c0ff088 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -10,6 +10,7 @@
  * later.  See the COPYING file in the top-level directory.
  */
 
+#include <unistd.h>
 #include "sysemu/sysemu.h"
 #include "migration/colo.h"
 #include "trace.h"
@@ -34,22 +35,147 @@ bool migration_incoming_in_colo_state(void)
     return mis && (mis->state == MIGRATION_STATUS_COLO);
 }
 
+static void colo_put_cmd(QEMUFile *f, COLOMessage cmd,
+                         Error **errp)
+{
+    int ret;
+
+    if (cmd >= COLO_MESSAGE__MAX) {
+        error_setg(errp, "%s: Invalid cmd", __func__);
+        return;
+    }
+    qemu_put_be32(f, cmd);
+    qemu_fflush(f);
+
+    ret = qemu_file_get_error(f);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "Can't put COLO command");
+    }
+    trace_colo_put_cmd(COLOMessage_lookup[cmd]);
+}
+
+static COLOMessage colo_get_cmd(QEMUFile *f, Error **errp)
+{
+    COLOMessage cmd;
+    int ret;
+
+    cmd = qemu_get_be32(f);
+    ret = qemu_file_get_error(f);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "Can't get COLO command");
+        return cmd;
+    }
+    if (cmd >= COLO_MESSAGE__MAX) {
+        error_setg(errp, "%s: Invalid cmd", __func__);
+        return cmd;
+    }
+    trace_colo_get_cmd(COLOMessage_lookup[cmd]);
+    return cmd;
+}
+
+static void colo_get_check_cmd(QEMUFile *f, COLOMessage expect_cmd,
+                               Error **errp)
+{
+    COLOMessage cmd;
+    Error *local_err = NULL;
+
+    cmd = colo_get_cmd(f, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+    if (cmd != expect_cmd) {
+        error_setg(errp, "Unexpected COLO command %d, expected %d",
+                          expect_cmd, cmd);
+    }
+}
+
+static int colo_do_checkpoint_transaction(MigrationState *s)
+{
+    Error *local_err = NULL;
+
+    colo_put_cmd(s->to_dst_file, COLO_MESSAGE_CHECKPOINT_REQUEST,
+                 &local_err);
+    if (local_err) {
+        goto out;
+    }
+
+    colo_get_check_cmd(s->rp_state.from_dst_file,
+                       COLO_MESSAGE_CHECKPOINT_REPLY, &local_err);
+    if (local_err) {
+        goto out;
+    }
+
+    /* TODO: suspend and save vm state to colo buffer */
+
+    colo_put_cmd(s->to_dst_file, COLO_MESSAGE_VMSTATE_SEND, &local_err);
+    if (local_err) {
+        goto out;
+    }
+
+    /* TODO: send vmstate to Secondary */
+
+    colo_get_check_cmd(s->rp_state.from_dst_file,
+                       COLO_MESSAGE_VMSTATE_RECEIVED, &local_err);
+    if (local_err) {
+        goto out;
+    }
+
+    colo_get_check_cmd(s->rp_state.from_dst_file,
+                       COLO_MESSAGE_VMSTATE_LOADED, &local_err);
+    if (local_err) {
+        goto out;
+    }
+
+    /* TODO: resume Primary */
+
+    return 0;
+out:
+    if (local_err) {
+        error_report_err(local_err);
+    }
+    return -EINVAL;
+}
+
 static void colo_process_checkpoint(MigrationState *s)
 {
+    Error *local_err = NULL;
+    int ret;
+
     s->rp_state.from_dst_file = qemu_file_get_return_path(s->to_dst_file);
     if (!s->rp_state.from_dst_file) {
         error_report("Open QEMUFile from_dst_file failed");
         goto out;
     }
 
+    /*
+     * Wait for Secondary finish loading vm states and enter COLO
+     * restore.
+     */
+    colo_get_check_cmd(s->rp_state.from_dst_file,
+                       COLO_MESSAGE_CHECKPOINT_READY, &local_err);
+    if (local_err) {
+        goto out;
+    }
+
     qemu_mutex_lock_iothread();
     vm_start();
     qemu_mutex_unlock_iothread();
     trace_colo_vm_state_change("stop", "run");
 
-    /*TODO: COLO checkpoint savevm loop*/
+    while (s->state == MIGRATION_STATUS_COLO) {
+        /* start a colo checkpoint */
+        ret = colo_do_checkpoint_transaction(s);
+        if (ret < 0) {
+            goto out;
+        }
+    }
 
 out:
+    /* Throw the unreported error message after exited from loop */
+    if (local_err) {
+        error_report_err(local_err);
+    }
     migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
                       MIGRATION_STATUS_COMPLETED);
 
@@ -67,9 +193,33 @@ void migrate_start_colo_process(MigrationState *s)
     qemu_mutex_lock_iothread();
 }
 
+static void colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request,
+                                 Error **errp)
+{
+    COLOMessage cmd;
+    Error *local_err = NULL;
+
+    cmd = colo_get_cmd(f, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    switch (cmd) {
+    case COLO_MESSAGE_CHECKPOINT_REQUEST:
+        *checkpoint_request = 1;
+        break;
+    default:
+        *checkpoint_request = 0;
+        error_setg(errp, "Got unknown COLO command: %d", cmd);
+        break;
+    }
+}
+
 void *colo_process_incoming_thread(void *opaque)
 {
     MigrationIncomingState *mis = opaque;
+    Error *local_err = NULL;
 
     migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
                       MIGRATION_STATUS_COLO);
@@ -85,9 +235,56 @@ void *colo_process_incoming_thread(void *opaque)
     */
     qemu_file_set_blocking(mis->from_src_file, true);
 
-    /* TODO: COLO checkpoint restore loop */
+    colo_put_cmd(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_READY,
+                 &local_err);
+    if (local_err) {
+        goto out;
+    }
+
+    while (mis->state == MIGRATION_STATUS_COLO) {
+        int request;
+
+        colo_wait_handle_cmd(mis->from_src_file, &request, &local_err);
+        if (local_err) {
+            goto out;
+        }
+        assert(request);
+        /* FIXME: This is unnecessary for periodic checkpoint mode */
+        colo_put_cmd(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_REPLY,
+                     &local_err);
+        if (local_err) {
+            goto out;
+        }
+
+        colo_get_check_cmd(mis->from_src_file,
+                           COLO_MESSAGE_VMSTATE_SEND, &local_err);
+        if (local_err) {
+            goto out;
+        }
+
+        /* TODO: read migration data into colo buffer */
+
+        colo_put_cmd(mis->to_src_file, COLO_MESSAGE_VMSTATE_RECEIVED,
+                     &local_err);
+        if (local_err) {
+            goto out;
+        }
+
+        /* TODO: load vm state */
+
+        colo_put_cmd(mis->to_src_file, COLO_MESSAGE_VMSTATE_LOADED,
+                     &local_err);
+        if (local_err) {
+            goto out;
+        }
+    }
 
 out:
+    /* Throw the unreported error message after exited from loop */
+    if (local_err) {
+        error_report_err(local_err);
+    }
+
     if (mis->to_src_file) {
         qemu_fclose(mis->to_src_file);
     }
diff --git a/qapi-schema.json b/qapi-schema.json
index 26a1d37..29afbb9 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -718,6 +718,31 @@
 { 'command': 'migrate-start-postcopy' }
 
 ##
+# @COLOMessage
+#
+# The message transmission between PVM and SVM
+#
+# @checkpoint-ready: SVM is ready for checkpointing
+#
+# @checkpoint-request: PVM tells SVM to prepare for new checkpointing
+#
+# @checkpoint-reply: SVM gets PVM's checkpoint request
+#
+# @vmstate-send: VM's state will be sent by PVM.
+#
+# @vmstate-size: The total size of VMstate.
+#
+# @vmstate-received: VM's state has been received by SVM.
+#
+# @vmstate-loaded: VM's state has been loaded by SVM.
+#
+# Since: 2.6
+##
+{ 'enum': 'COLOMessage',
+  'data': [ 'checkpoint-ready', 'checkpoint-request', 'checkpoint-reply',
+            'vmstate-send', 'vmstate-size', 'vmstate-received',
+            'vmstate-loaded' ] }
+
 # @MouseInfo:
 #
 # Information about a mouse device.
diff --git a/trace-events b/trace-events
index 53714db..97807cd 100644
--- a/trace-events
+++ b/trace-events
@@ -1605,6 +1605,8 @@ postcopy_ram_incoming_cleanup_join(void) ""
 
 # migration/colo.c
 colo_vm_state_change(const char *old, const char *new) "Change '%s' => '%s'"
+colo_put_cmd(const char *msg) "Send '%s' cmd"
+colo_get_cmd(const char *msg) "Receive '%s' cmd"
 
 # kvm-all.c
 kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v15 08/38] COLO: Add a new RunState RUN_STATE_COLO
From: zhanghailiang @ 2016-02-22  2:40 UTC
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, zhangchen.fnst, hongyang.yang

The guest will enter this state when it is paused to save/restore VM state
during a COLO checkpoint.
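
For example, while the guest is paused at a checkpoint, query-status would
report the new state (illustrative output):

  -> { "execute": "query-status" }
  <- { "return": { "running": false, "singlestep": false, "status": "colo" } }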

Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 qapi-schema.json | 5 ++++-
 vl.c             | 8 ++++++++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/qapi-schema.json b/qapi-schema.json
index 29afbb9..935870d 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -154,12 +154,15 @@
 # @watchdog: the watchdog action is configured to pause and has been triggered
 #
 # @guest-panicked: guest has been panicked as a result of guest OS panic
+#
+# @colo: guest is paused to save/restore VM state under colo checkpoint (since
+# 2.6)
 ##
 { 'enum': 'RunState',
   'data': [ 'debug', 'inmigrate', 'internal-error', 'io-error', 'paused',
             'postmigrate', 'prelaunch', 'finish-migrate', 'restore-vm',
             'running', 'save-vm', 'shutdown', 'suspended', 'watchdog',
-            'guest-panicked' ] }
+            'guest-panicked', 'colo' ] }
 
 ##
 # @StatusInfo:
diff --git a/vl.c b/vl.c
index f35703f..1cde195 100644
--- a/vl.c
+++ b/vl.c
@@ -593,6 +593,7 @@ static const RunStateTransition runstate_transitions_def[] = {
     { RUN_STATE_INMIGRATE, RUN_STATE_FINISH_MIGRATE },
     { RUN_STATE_INMIGRATE, RUN_STATE_PRELAUNCH },
     { RUN_STATE_INMIGRATE, RUN_STATE_POSTMIGRATE },
+    { RUN_STATE_INMIGRATE, RUN_STATE_COLO },
 
     { RUN_STATE_INTERNAL_ERROR, RUN_STATE_PAUSED },
     { RUN_STATE_INTERNAL_ERROR, RUN_STATE_FINISH_MIGRATE },
@@ -605,6 +606,7 @@ static const RunStateTransition runstate_transitions_def[] = {
     { RUN_STATE_PAUSED, RUN_STATE_RUNNING },
     { RUN_STATE_PAUSED, RUN_STATE_FINISH_MIGRATE },
     { RUN_STATE_PAUSED, RUN_STATE_PRELAUNCH },
+    { RUN_STATE_PAUSED, RUN_STATE_COLO},
 
     { RUN_STATE_POSTMIGRATE, RUN_STATE_RUNNING },
     { RUN_STATE_POSTMIGRATE, RUN_STATE_FINISH_MIGRATE },
@@ -617,10 +619,13 @@ static const RunStateTransition runstate_transitions_def[] = {
     { RUN_STATE_FINISH_MIGRATE, RUN_STATE_RUNNING },
     { RUN_STATE_FINISH_MIGRATE, RUN_STATE_POSTMIGRATE },
     { RUN_STATE_FINISH_MIGRATE, RUN_STATE_PRELAUNCH },
+    { RUN_STATE_FINISH_MIGRATE, RUN_STATE_COLO},
 
     { RUN_STATE_RESTORE_VM, RUN_STATE_RUNNING },
     { RUN_STATE_RESTORE_VM, RUN_STATE_PRELAUNCH },
 
+    { RUN_STATE_COLO, RUN_STATE_RUNNING },
+
     { RUN_STATE_RUNNING, RUN_STATE_DEBUG },
     { RUN_STATE_RUNNING, RUN_STATE_INTERNAL_ERROR },
     { RUN_STATE_RUNNING, RUN_STATE_IO_ERROR },
@@ -631,6 +636,7 @@ static const RunStateTransition runstate_transitions_def[] = {
     { RUN_STATE_RUNNING, RUN_STATE_SHUTDOWN },
     { RUN_STATE_RUNNING, RUN_STATE_WATCHDOG },
     { RUN_STATE_RUNNING, RUN_STATE_GUEST_PANICKED },
+    { RUN_STATE_RUNNING, RUN_STATE_COLO },
 
     { RUN_STATE_SAVE_VM, RUN_STATE_RUNNING },
 
@@ -643,10 +649,12 @@ static const RunStateTransition runstate_transitions_def[] = {
     { RUN_STATE_SUSPENDED, RUN_STATE_RUNNING },
     { RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE },
     { RUN_STATE_SUSPENDED, RUN_STATE_PRELAUNCH },
+    { RUN_STATE_SUSPENDED, RUN_STATE_COLO },
 
     { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
     { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
     { RUN_STATE_WATCHDOG, RUN_STATE_PRELAUNCH },
+    { RUN_STATE_WATCHDOG, RUN_STATE_COLO },
 
     { RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING },
     { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE },
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread
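
A side note on how the new state is enforced: runstate_set() rejects any
transition that is not listed in runstate_transitions_def[], so every path
into and out of RUN_STATE_COLO has to be whitelisted one pair at a time, as
the hunks above do. A minimal standalone model of that whitelist check (the
enum, table and function below are invented for illustration; this is not
QEMU code):

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef enum { STATE_RUNNING, STATE_PAUSED, STATE_COLO } State;

/* Whitelist of allowed transitions, cf. runstate_transitions_def[] */
static const State transitions[][2] = {
    { STATE_RUNNING, STATE_COLO },    /* stop the guest for a checkpoint */
    { STATE_COLO,    STATE_RUNNING }, /* resume once the checkpoint is done */
    { STATE_PAUSED,  STATE_COLO },
};

static bool transition_allowed(State from, State to)
{
    for (size_t i = 0; i < sizeof(transitions) / sizeof(transitions[0]); i++) {
        if (transitions[i][0] == from && transitions[i][1] == to) {
            return true;
        }
    }
    return false;
}

int main(void)
{
    printf("running->colo: %d\n", transition_allowed(STATE_RUNNING, STATE_COLO));
    printf("colo->paused:  %d\n", transition_allowed(STATE_COLO, STATE_PAUSED));
    return 0;
}

Note that the only way out of RUN_STATE_COLO added here is back to
RUN_STATE_RUNNING.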

* [Qemu-devel] [PATCH COLO-Frame v15 09/38] QEMUSizedBuffer: Introduce two help functions for qsb
  2016-02-22  2:39 [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (7 preceding siblings ...)
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 08/38] COLO: Add a new RunState RUN_STATE_COLO zhanghailiang
@ 2016-02-22  2:40 ` zhanghailiang
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 10/38] COLO: Save PVM state to secondary side when do checkpoint zhanghailiang
                   ` (29 subsequent siblings)
  38 siblings, 0 replies; 52+ messages in thread
From: zhanghailiang @ 2016-02-22  2:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, zhangchen.fnst, hongyang.yang

Introduce two new QEMUSizedBuffer APIs which will be used by COLO to buffer
VM state:
One is qsb_put_buffer(), which puts the content of a given QEMUSizedBuffer
into a QEMUFile; this is used to send the buffered VM state to the secondary.
The other is qsb_fill_buffer(), which reads 'size' bytes of data from the
file into the qsb; this is used to read the VM state from the socket into a
buffer.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v11:
- size_t'ify these two help functions (Dave's suggestion)
---
 include/migration/qemu-file.h |  3 ++-
 migration/qemu-file-buf.c     | 61 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index b5d08d2..ca6a582 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -150,7 +150,8 @@ ssize_t qsb_get_buffer(const QEMUSizedBuffer *, off_t start, size_t count,
                        uint8_t *buf);
 ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t *buf,
                      off_t pos, size_t count);
-
+void qsb_put_buffer(QEMUFile *f, QEMUSizedBuffer *qsb, size_t size);
+size_t qsb_fill_buffer(QEMUSizedBuffer *qsb, QEMUFile *f, size_t size);
 
 /*
  * For use on files opened with qemu_bufopen
diff --git a/migration/qemu-file-buf.c b/migration/qemu-file-buf.c
index 7b8e78e..7801780 100644
--- a/migration/qemu-file-buf.c
+++ b/migration/qemu-file-buf.c
@@ -367,6 +367,67 @@ ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t *source,
     return count;
 }
 
+/**
+ * Put the content of a given QEMUSizedBuffer into QEMUFile.
+ *
+ * @f: A QEMUFile
+ * @qsb: A QEMUSizedBuffer
+ * @size: size of content to write
+ */
+void qsb_put_buffer(QEMUFile *f, QEMUSizedBuffer *qsb, size_t size)
+{
+    size_t l;
+    int i;
+
+    for (i = 0; i < qsb->n_iov && size > 0; i++) {
+        l = MIN(qsb->iov[i].iov_len, size);
+        qemu_put_buffer(f, qsb->iov[i].iov_base, l);
+        size -= l;
+    }
+}
+
+/*
+ * Read 'size' bytes of data from the file into qsb.
+ * It always fills from position 0 and is meant to be used right after
+ * qsb_create().
+ *
+ * It will return 'size' bytes unless there was an error, in which case it
+ * will return as many bytes as it managed to read (assuming a blocking fd,
+ * which all current QEMUFiles are).
+ */
+size_t qsb_fill_buffer(QEMUSizedBuffer *qsb, QEMUFile *f, size_t size)
+{
+    ssize_t rc = qsb_grow(qsb, size);
+    ssize_t pending = size;
+    int i;
+    uint8_t *buf = NULL;
+
+    qsb->used = 0;
+
+    if (rc < 0) {
+        return rc;
+    }
+
+    for (i = 0; i < qsb->n_iov && pending > 0; i++) {
+        size_t doneone = 0;
+        /* read until iov full */
+        while (doneone < qsb->iov[i].iov_len && pending > 0) {
+            size_t readone = 0;
+
+            buf = qsb->iov[i].iov_base;
+            readone = qemu_get_buffer(f, buf,
+                                MIN(qsb->iov[i].iov_len - doneone, pending));
+            if (readone == 0) {
+                return qsb->used;
+            }
+            buf += readone;
+            doneone += readone;
+            pending -= readone;
+            qsb->used += readone;
+        }
+    }
+    return qsb->used;
+}
+
 typedef struct QEMUBuffer {
     QEMUSizedBuffer *qsb;
     QEMUFile *file;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread
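
As a rough standalone model of the two helpers (not QEMU code: SizedBuf,
buf_put and buf_fill are invented names, a flat array stands in for the
iovec-backed QEMUSizedBuffer, and stdio streams stand in for QEMUFile):

#include <stdio.h>

typedef struct { unsigned char data[4096]; size_t used; } SizedBuf;

/* cf. qsb_put_buffer(): drain 'size' bytes of the buffer into the stream */
static void buf_put(FILE *f, const SizedBuf *b, size_t size)
{
    if (size > b->used) {
        size = b->used;
    }
    fwrite(b->data, 1, size, f);
}

/* cf. qsb_fill_buffer(): refill from position 0; a short count means the
 * stream ended (or failed) before 'size' bytes arrived */
static size_t buf_fill(SizedBuf *b, FILE *f, size_t size)
{
    if (size > sizeof(b->data)) {
        size = sizeof(b->data);
    }
    b->used = fread(b->data, 1, size, f);
    return b->used;
}

int main(void)
{
    SizedBuf out = { "checkpoint payload", 18 }, in = { { 0 }, 0 };
    FILE *f = tmpfile();

    if (!f) {
        return 1;
    }
    buf_put(f, &out, out.used);
    rewind(f);
    printf("refilled %zu bytes: %s\n", buf_fill(&in, f, 18), (char *)in.data);
    fclose(f);
    return 0;
}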

* [Qemu-devel] [PATCH COLO-Frame v15 10/38] COLO: Save PVM state to secondary side when do checkpoint
  2016-02-22  2:39 [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (8 preceding siblings ...)
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 09/38] QEMUSizedBuffer: Introduce two help functions for qsb zhanghailiang
@ 2016-02-22  2:40 ` zhanghailiang
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 11/38] COLO: Load PVM's dirty pages into SVM's RAM cache temporarily zhanghailiang
                   ` (28 subsequent siblings)
  38 siblings, 0 replies; 52+ messages in thread
From: zhanghailiang @ 2016-02-22  2:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, zhangchen.fnst, hongyang.yang

The main task of a checkpoint is to synchronize the SVM with the PVM.
The VM's state includes RAM and device state, so we migrate the PVM's
state to the SVM when we do a checkpoint, just as migration does.

We cache the PVM's state on the secondary side, using a QEMUSizedBuffer
to store the data. Since we need to know the size of the VM state, the
primary side also stores the VM state temporarily in a qsb, obtains the
data size by calling qsb_get_length(), and then migrates the data into the
qsb on the secondary side.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v13:
- Refactor colo_put_cmd_value() to use 'Error **errp' to indicate success
  or failure.
v12:
- Replace the old colo_ctl_get() with the new helper function colo_put_cmd_value()
v11:
- Add Reviewed-by tag
---
 migration/colo.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++++++-----
 migration/ram.c  | 39 ++++++++++++++++++------
 2 files changed, 114 insertions(+), 17 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index c0ff088..7e4692c 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -16,6 +16,9 @@
 #include "trace.h"
 #include "qemu/error-report.h"
 
+/* colo buffer */
+#define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
+
 bool colo_supported(void)
 {
     return true;
@@ -54,6 +57,27 @@ static void colo_put_cmd(QEMUFile *f, COLOMessage cmd,
     trace_colo_put_cmd(COLOMessage_lookup[cmd]);
 }
 
+static void colo_put_cmd_value(QEMUFile *f, COLOMessage cmd,
+                               uint64_t value, Error **errp)
+{
+    Error *local_err = NULL;
+    int ret;
+
+    colo_put_cmd(f, cmd, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+    qemu_put_be64(f, value);
+    qemu_fflush(f);
+
+    ret = qemu_file_get_error(f);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "Failed to send value for command:%s",
+                         COLOMessage_lookup[cmd]);
+    }
+}
+
 static COLOMessage colo_get_cmd(QEMUFile *f, Error **errp)
 {
     COLOMessage cmd;
@@ -90,9 +114,13 @@ static void colo_get_check_cmd(QEMUFile *f, COLOMessage expect_cmd,
     }
 }
 
-static int colo_do_checkpoint_transaction(MigrationState *s)
+static int colo_do_checkpoint_transaction(MigrationState *s,
+                                          QEMUSizedBuffer *buffer)
 {
+    QEMUFile *trans = NULL;
+    size_t size;
     Error *local_err = NULL;
+    int ret = -1;
 
     colo_put_cmd(s->to_dst_file, COLO_MESSAGE_CHECKPOINT_REQUEST,
                  &local_err);
@@ -105,15 +133,48 @@ static int colo_do_checkpoint_transaction(MigrationState *s)
     if (local_err) {
         goto out;
     }
+    /* Reset colo buffer and open it for write */
+    qsb_set_length(buffer, 0);
+    trans = qemu_bufopen("w", buffer);
+    if (!trans) {
+        error_report("Open colo buffer for write failed");
+        goto out;
+    }
 
-    /* TODO: suspend and save vm state to colo buffer */
+    qemu_mutex_lock_iothread();
+    vm_stop_force_state(RUN_STATE_COLO);
+    qemu_mutex_unlock_iothread();
+    trace_colo_vm_state_change("run", "stop");
+
+    /* Disable block migration */
+    s->params.blk = 0;
+    s->params.shared = 0;
+    qemu_savevm_state_header(trans);
+    qemu_savevm_state_begin(trans, &s->params);
+    qemu_mutex_lock_iothread();
+    qemu_savevm_state_complete_precopy(trans, false);
+    qemu_mutex_unlock_iothread();
+
+    qemu_fflush(trans);
 
     colo_put_cmd(s->to_dst_file, COLO_MESSAGE_VMSTATE_SEND, &local_err);
     if (local_err) {
         goto out;
     }
+    /* we send the total size of the vmstate first */
+    size = qsb_get_length(buffer);
+    colo_put_cmd_value(s->to_dst_file, COLO_MESSAGE_VMSTATE_SIZE,
+                       size, &local_err);
+    if (local_err) {
+        goto out;
+    }
 
-    /* TODO: send vmstate to Secondary */
+    qsb_put_buffer(s->to_dst_file, buffer, size);
+    qemu_fflush(s->to_dst_file);
+    ret = qemu_file_get_error(s->to_dst_file);
+    if (ret < 0) {
+        goto out;
+    }
 
     colo_get_check_cmd(s->rp_state.from_dst_file,
                        COLO_MESSAGE_VMSTATE_RECEIVED, &local_err);
@@ -127,18 +188,26 @@ static int colo_do_checkpoint_transaction(MigrationState *s)
         goto out;
     }
 
-    /* TODO: resume Primary */
+    ret = 0;
+    /* Resume primary guest */
+    qemu_mutex_lock_iothread();
+    vm_start();
+    qemu_mutex_unlock_iothread();
+    trace_colo_vm_state_change("stop", "run");
 
-    return 0;
 out:
     if (local_err) {
         error_report_err(local_err);
     }
-    return -EINVAL;
+    if (trans) {
+        qemu_fclose(trans);
+    }
+    return ret;
 }
 
 static void colo_process_checkpoint(MigrationState *s)
 {
+    QEMUSizedBuffer *buffer = NULL;
     Error *local_err = NULL;
     int ret;
 
@@ -158,6 +227,12 @@ static void colo_process_checkpoint(MigrationState *s)
         goto out;
     }
 
+    buffer = qsb_create(NULL, COLO_BUFFER_BASE_SIZE);
+    if (buffer == NULL) {
+        error_report("Failed to allocate colo buffer!");
+        goto out;
+    }
+
     qemu_mutex_lock_iothread();
     vm_start();
     qemu_mutex_unlock_iothread();
@@ -165,7 +240,7 @@ static void colo_process_checkpoint(MigrationState *s)
 
     while (s->state == MIGRATION_STATUS_COLO) {
         /* start a colo checkpoint */
-        ret = colo_do_checkpoint_transaction(s);
+        ret = colo_do_checkpoint_transaction(s, buffer);
         if (ret < 0) {
             goto out;
         }
@@ -179,6 +254,9 @@ out:
     migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
                       MIGRATION_STATUS_COMPLETED);
 
+    qsb_free(buffer);
+    buffer = NULL;
+
     if (s->rp_state.from_dst_file) {
         qemu_fclose(s->rp_state.from_dst_file);
     }
diff --git a/migration/ram.c b/migration/ram.c
index 704f6a9..627ffea 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -40,6 +40,7 @@
 #include "trace.h"
 #include "exec/ram_addr.h"
 #include "qemu/rcu_queue.h"
+#include "migration/colo.h"
 
 #ifdef DEBUG_MIGRATION_RAM
 #define DPRINTF(fmt, ...) \
@@ -1873,16 +1874,8 @@ err:
     return ret;
 }
 
-
-/* Each of ram_save_setup, ram_save_iterate and ram_save_complete has
- * long-running RCU critical section.  When rcu-reclaims in the code
- * start to become numerous it will be necessary to reduce the
- * granularity of these critical sections.
- */
-
-static int ram_save_setup(QEMUFile *f, void *opaque)
+static int ram_save_init_globals(void)
 {
-    RAMBlock *block;
     int64_t ram_bitmap_pages; /* Size of bitmap in pages, including gaps */
 
     dirty_rate_high_cnt = 0;
@@ -1948,6 +1941,31 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     migration_bitmap_sync();
     qemu_mutex_unlock_ramlist();
     qemu_mutex_unlock_iothread();
+    rcu_read_unlock();
+
+    return 0;
+}
+
+/* Each of ram_save_setup, ram_save_iterate and ram_save_complete has
+ * long-running RCU critical section.  When rcu-reclaims in the code
+ * start to become numerous it will be necessary to reduce the
+ * granularity of these critical sections.
+ */
+
+static int ram_save_setup(QEMUFile *f, void *opaque)
+{
+    RAMBlock *block;
+
+    /*
+     * Migration has already set up the bitmap; reuse it.
+     */
+    if (!migration_in_colo_state()) {
+        if (ram_save_init_globals() < 0) {
+            return -1;
+        }
+    }
+
+    rcu_read_lock();
 
     qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
 
@@ -2049,7 +2067,8 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
     while (true) {
         int pages;
 
-        pages = ram_find_and_save_block(f, true, &bytes_transferred);
+        pages = ram_find_and_save_block(f, !migration_in_colo_state(),
+                                        &bytes_transferred);
         /* no more blocks to send */
         if (pages == 0) {
             break;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread
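
The transaction above amounts to a simple size-first framing: the whole VM
state is staged in the qsb precisely so that its exact length can be
announced (COLO_MESSAGE_VMSTATE_SIZE) before the bytes follow. A standalone
sketch of that framing (invented names; host byte order for brevity, whereas
the real code sends the length big-endian via qemu_put_be64()):

#include <stdint.h>
#include <stdio.h>

/* Sender: length first, then the payload, cf. colo_put_cmd_value() +
 * qsb_put_buffer() */
static void send_frame(FILE *f, const void *buf, uint64_t len)
{
    fwrite(&len, sizeof(len), 1, f);
    fwrite(buf, 1, len, f);
}

/* Receiver: read the announced length, then exactly that many bytes;
 * a short count means the sender died mid-transfer */
static uint64_t recv_frame(FILE *f, void *buf, uint64_t cap)
{
    uint64_t len = 0;

    if (fread(&len, sizeof(len), 1, f) != 1 || len > cap) {
        return 0;
    }
    return fread(buf, 1, len, f);
}

int main(void)
{
    char out[64] = { 0 };
    FILE *f = tmpfile();

    if (!f) {
        return 1;
    }
    send_frame(f, "vmstate bytes", 13);
    rewind(f);
    printf("received %llu bytes\n",
           (unsigned long long)recv_frame(f, out, sizeof(out)));
    fclose(f);
    return 0;
}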

* [Qemu-devel] [PATCH COLO-Frame v15 11/38] COLO: Load PVM's dirty pages into SVM's RAM cache temporarily
  2016-02-22  2:39 [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (9 preceding siblings ...)
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 10/38] COLO: Save PVM state to secondary side when do checkpoint zhanghailiang
@ 2016-02-22  2:40 ` zhanghailiang
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 12/38] ram/COLO: Record the dirty pages that SVM received zhanghailiang
                   ` (27 subsequent siblings)
  38 siblings, 0 replies; 52+ messages in thread
From: zhanghailiang @ 2016-02-22  2:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, zhangchen.fnst, hongyang.yang

We should not load the PVM's state directly into the SVM, because errors
may happen while the SVM is receiving data, which would break the SVM.

We need to ensure that all data has been received before we load the state
into the SVM, so we use extra memory to cache this data (the PVM's RAM). The
RAM cache on the secondary side is initially the same as the SVM/PVM's
memory. During each checkpoint we first cache the PVM's dirty pages in this
RAM cache, so the cache is always the same as the PVM's memory at every
checkpoint; we then flush the cached RAM into the SVM after we have received
all of the PVM's state.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v12:
- Fix minor error in error_report (Dave's comment)
- Add Reviewed-by tag
v11:
- Rename 'host_cache' to 'colo_cache' (Dave's suggestion)
v10:
- Split the process of dirty pages recording into a new patch
---
 include/exec/ram_addr.h       |  1 +
 include/migration/migration.h |  4 +++
 migration/colo.c              | 11 +++++++
 migration/ram.c               | 73 ++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 5d33def..53c1f48 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -26,6 +26,7 @@ struct RAMBlock {
     struct rcu_head rcu;
     struct MemoryRegion *mr;
     uint8_t *host;
+    uint8_t *colo_cache; /* For colo, VM's ram cache */
     ram_addr_t offset;
     ram_addr_t used_length;
     ram_addr_t max_length;
diff --git a/include/migration/migration.h b/include/migration/migration.h
index e7a516c..6907986 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -332,4 +332,8 @@ int ram_save_queue_pages(MigrationState *ms, const char *rbname,
 PostcopyState postcopy_state_get(void);
 /* Set the state and return the old state */
 PostcopyState postcopy_state_set(PostcopyState new_state);
+
+/* ram cache */
+int colo_init_ram_cache(void);
+void colo_release_ram_cache(void);
 #endif
diff --git a/migration/colo.c b/migration/colo.c
index 7e4692c..57a1132 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -298,6 +298,7 @@ void *colo_process_incoming_thread(void *opaque)
 {
     MigrationIncomingState *mis = opaque;
     Error *local_err = NULL;
+    int ret;
 
     migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
                       MIGRATION_STATUS_COLO);
@@ -313,6 +314,12 @@ void *colo_process_incoming_thread(void *opaque)
     */
     qemu_file_set_blocking(mis->from_src_file, true);
 
+    ret = colo_init_ram_cache();
+    if (ret < 0) {
+        error_report("Failed to initialize ram cache");
+        goto out;
+    }
+
     colo_put_cmd(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_READY,
                  &local_err);
     if (local_err) {
@@ -363,6 +370,10 @@ out:
         error_report_err(local_err);
     }
 
+    qemu_mutex_lock_iothread();
+    colo_release_ram_cache();
+    qemu_mutex_unlock_iothread();
+
     if (mis->to_src_file) {
         qemu_fclose(mis->to_src_file);
     }
diff --git a/migration/ram.c b/migration/ram.c
index 627ffea..027c5bc 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -224,6 +224,7 @@ static RAMBlock *last_sent_block;
 static ram_addr_t last_offset;
 static QemuMutex migration_bitmap_mutex;
 static uint64_t migration_dirty_pages;
+static bool ram_cache_enable;
 static uint32_t last_version;
 static bool ram_bulk_stage;
 
@@ -2191,6 +2192,20 @@ static inline void *host_from_ram_block_offset(RAMBlock *block,
     return block->host + offset;
 }
 
+static inline void *colo_cache_from_block_offset(RAMBlock *block,
+                                                 ram_addr_t offset)
+{
+    if (!offset_in_ramblock(block, offset)) {
+        return NULL;
+    }
+    if (!block->colo_cache) {
+        error_report("%s: colo_cache is NULL in block :%s",
+                     __func__, block->idstr);
+        return NULL;
+    }
+    return block->colo_cache + offset;
+}
+
 /*
  * If a page (or a whole RDMA chunk) has been
  * determined to be zero, then zap it.
@@ -2467,7 +2482,12 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
                      RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
             RAMBlock *block = ram_block_from_stream(f, flags);
 
-            host = host_from_ram_block_offset(block, addr);
+            /* After going into COLO, we should load the page into colo_cache */
+            if (ram_cache_enable) {
+                host = colo_cache_from_block_offset(block, addr);
+            } else {
+                host = host_from_ram_block_offset(block, addr);
+            }
             if (!host) {
                 error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
                 ret = -EINVAL;
@@ -2562,6 +2582,57 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
     return ret;
 }
 
+/*
+ * colo cache: this is for the secondary VM. We cache the whole memory of
+ * the secondary VM; this function is called after the first migration.
+ */
+int colo_init_ram_cache(void)
+{
+    RAMBlock *block;
+
+    rcu_read_lock();
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+        block->colo_cache = qemu_anon_ram_alloc(block->used_length, NULL);
+        if (!block->colo_cache) {
+            error_report("%s: Can't alloc memory for COLO cache of block %s,"
+                         "size 0x" RAM_ADDR_FMT, __func__, block->idstr,
+                         block->used_length);
+            goto out_locked;
+        }
+        memcpy(block->colo_cache, block->host, block->used_length);
+    }
+    rcu_read_unlock();
+    ram_cache_enable = true;
+    return 0;
+
+out_locked:
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+        if (block->colo_cache) {
+            qemu_anon_ram_free(block->colo_cache, block->used_length);
+            block->colo_cache = NULL;
+        }
+    }
+
+    rcu_read_unlock();
+    return -errno;
+}
+
+void colo_release_ram_cache(void)
+{
+    RAMBlock *block;
+
+    ram_cache_enable = false;
+
+    rcu_read_lock();
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+        if (block->colo_cache) {
+            qemu_anon_ram_free(block->colo_cache, block->used_length);
+            block->colo_cache = NULL;
+        }
+    }
+    rcu_read_unlock();
+}
+
 static SaveVMHandlers savevm_ram_handlers = {
     .save_live_setup = ram_save_setup,
     .save_live_iterate = ram_save_iterate,
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread
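
A standalone model of the cache lifecycle this patch introduces (Block,
cache_init and cache_release are invented stand-ins, not QEMU code): the
shadow copy starts out identical to guest RAM, incoming checkpoint pages
land only in the shadow, and the shadow is freed when COLO ends.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned char *host;   /* the SVM's real RAM */
    unsigned char *cache;  /* cf. block->colo_cache */
    size_t len;
} Block;

static int cache_init(Block *b)
{
    b->cache = malloc(b->len);
    if (!b->cache) {
        return -1;
    }
    memcpy(b->cache, b->host, b->len); /* cache == SVM RAM at COLO start */
    return 0;
}

static void cache_release(Block *b)
{
    free(b->cache);
    b->cache = NULL;
}

int main(void)
{
    unsigned char ram[8] = "SVM RAM";
    Block b = { ram, NULL, sizeof(ram) };

    if (cache_init(&b) == 0) {
        b.cache[0] = 'P';  /* a dirty page from the PVM hits the cache only */
        printf("host=%s cache=%s\n", (char *)b.host, (char *)b.cache);
        cache_release(&b);
    }
    return 0;
}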

* [Qemu-devel] [PATCH COLO-Frame v15 12/38] ram/COLO: Record the dirty pages that SVM received
  2016-02-22  2:39 [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (10 preceding siblings ...)
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 11/38] COLO: Load PVM's dirty pages into SVM's RAM cache temporarily zhanghailiang
@ 2016-02-22  2:40 ` zhanghailiang
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 13/38] COLO: Load VMState into qsb before restore it zhanghailiang
                   ` (26 subsequent siblings)
  38 siblings, 0 replies; 52+ messages in thread
From: zhanghailiang @ 2016-02-22  2:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, zhangchen.fnst, hongyang.yang

We record the addresses of the dirty pages that have been received;
this helps with flushing the pages that are cached into the SVM.
We record them by re-using the migration dirty bitmap.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v12:
- Add Reviewed-by tag
v11:
- Split a new helper function from original
  host_from_stream_offset() (Dave's suggestion)
- Only do recording work in this patch
v10:
- New patch split from v9's patch 13
- Rebase to master to use 'migration_bitmap_rcu'
---
 migration/ram.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 027c5bc..7373df3 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2195,6 +2195,9 @@ static inline void *host_from_ram_block_offset(RAMBlock *block,
 static inline void *colo_cache_from_block_offset(RAMBlock *block,
                                                  ram_addr_t offset)
 {
+    unsigned long *bitmap;
+    long k;
+
     if (!offset_in_ramblock(block, offset)) {
         return NULL;
     }
@@ -2203,6 +2206,17 @@ static inline void *colo_cache_from_block_offset(RAMBlock *block,
                      __func__, block->idstr);
         return NULL;
     }
+
+    k = (block->mr->ram_addr + offset) >> TARGET_PAGE_BITS;
+    bitmap = atomic_rcu_read(&migration_bitmap_rcu)->bmap;
+    /*
+     * During a colo checkpoint, we need a bitmap of these migrated pages.
+     * It helps us decide which pages in the ram cache should be flushed
+     * into the VM's RAM later.
+     */
+    if (!test_and_set_bit(k, bitmap)) {
+        migration_dirty_pages++;
+    }
     return block->colo_cache + offset;
 }
 
@@ -2589,6 +2603,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 int colo_init_ram_cache(void)
 {
     RAMBlock *block;
+    int64_t ram_cache_pages = last_ram_offset() >> TARGET_PAGE_BITS;
 
     rcu_read_lock();
     QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
@@ -2603,6 +2618,15 @@ int colo_init_ram_cache(void)
     }
     rcu_read_unlock();
     ram_cache_enable = true;
+    /*
+     * Record the dirty pages that were sent by the PVM; we use this dirty
+     * bitmap to decide which pages in the cache should be flushed into the
+     * SVM's RAM. Here we use the same name, 'migration_bitmap_rcu', as for
+     * migration.
+     */
+    migration_bitmap_rcu = g_new0(struct BitmapRcu, 1);
+    migration_bitmap_rcu->bmap = bitmap_new(ram_cache_pages);
+    migration_dirty_pages = 0;
+
     return 0;
 
 out_locked:
@@ -2620,9 +2644,15 @@ out_locked:
 void colo_release_ram_cache(void)
 {
     RAMBlock *block;
+    struct BitmapRcu *bitmap = migration_bitmap_rcu;
 
     ram_cache_enable = false;
 
+    atomic_rcu_set(&migration_bitmap_rcu, NULL);
+    if (bitmap) {
+        call_rcu(bitmap, migration_bitmap_free, rcu);
+    }
+
     rcu_read_lock();
     QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
         if (block->colo_cache) {
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread
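
The key property is that test_and_set_bit() counts each page only once,
however many times it is re-sent within a checkpoint, so
migration_dirty_pages remains an exact count of the pages to flush. A
standalone model (invented names; a single 64-bit word instead of the real
bitmap):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static uint64_t bmap;        /* one bit per page, 64 pages is enough here */
static unsigned dirty_pages; /* cf. migration_dirty_pages */

static bool test_and_set(unsigned page)
{
    uint64_t mask = UINT64_C(1) << page;
    bool was_set = bmap & mask;

    bmap |= mask;
    return was_set;
}

static void record_received(unsigned page)
{
    if (!test_and_set(page)) {
        dirty_pages++;       /* first time this page arrived */
    }
}

int main(void)
{
    record_received(3);
    record_received(3);      /* duplicate within the same checkpoint */
    record_received(7);
    printf("pages to flush: %u\n", dirty_pages); /* prints 2 */
    return 0;
}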

* [Qemu-devel] [PATCH COLO-Frame v15 13/38] COLO: Load VMState into qsb before restore it
  2016-02-22  2:39 [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (11 preceding siblings ...)
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 12/38] ram/COLO: Record the dirty pages that SVM received zhanghailiang
@ 2016-02-22  2:40 ` zhanghailiang
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 14/38] COLO: Flush PVM's cached RAM into SVM's memory zhanghailiang
                   ` (25 subsequent siblings)
  38 siblings, 0 replies; 52+ messages in thread
From: zhanghailiang @ 2016-02-22  2:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, zhangchen.fnst, hongyang.yang

We should not destroy the state of the SVM (Secondary VM) until we have
received the whole state from the PVM (Primary VM), in case the primary
fails in the middle of sending the state. So here we cache the device state
on the secondary side before restoring it.

Besides, we should call qemu_system_reset() before loading the VM state,
which ensures the data is intact.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v13:
- Fix the define of colo_get_cmd_value() to use 'Error **errp' instead of
  return value.
v12:
- Use the new helper colo_get_cmd_value() instead of colo_ctl_get()
---
 migration/colo.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 72 insertions(+), 2 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 57a1132..b9f60c7 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -114,6 +114,28 @@ static void colo_get_check_cmd(QEMUFile *f, COLOMessage expect_cmd,
     }
 }
 
+static uint64_t colo_get_cmd_value(QEMUFile *f, uint32_t expect_cmd,
+                                   Error **errp)
+{
+    Error *local_err = NULL;
+    uint64_t value;
+    int ret;
+
+    colo_get_check_cmd(f, expect_cmd, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return 0;
+    }
+
+    value = qemu_get_be64(f);
+    ret = qemu_file_get_error(f);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "Failed to get value for COLO command: %s",
+                         COLOMessage_lookup[expect_cmd]);
+    }
+    return value;
+}
+
 static int colo_do_checkpoint_transaction(MigrationState *s,
                                           QEMUSizedBuffer *buffer)
 {
@@ -297,6 +319,10 @@ static void colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request,
 void *colo_process_incoming_thread(void *opaque)
 {
     MigrationIncomingState *mis = opaque;
+    QEMUFile *fb = NULL;
+    QEMUSizedBuffer *buffer = NULL; /* Cache incoming device state */
+    uint64_t total_size;
+    uint64_t value;
     Error *local_err = NULL;
     int ret;
 
@@ -320,6 +346,12 @@ void *colo_process_incoming_thread(void *opaque)
         goto out;
     }
 
+    buffer = qsb_create(NULL, COLO_BUFFER_BASE_SIZE);
+    if (buffer == NULL) {
+        error_report("Failed to allocate colo buffer!");
+        goto out;
+    }
+
     colo_put_cmd(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_READY,
                  &local_err);
     if (local_err) {
@@ -347,7 +379,21 @@ void *colo_process_incoming_thread(void *opaque)
             goto out;
         }
 
-        /* TODO: read migration data into colo buffer */
+        /* read the VM state total size first */
+        value = colo_get_cmd_value(mis->from_src_file,
+                                 COLO_MESSAGE_VMSTATE_SIZE, &local_err);
+        if (local_err) {
+            goto out;
+        }
+
+        /* read vm device state into colo buffer */
+        total_size = qsb_fill_buffer(buffer, mis->from_src_file, value);
+        if (total_size != value) {
+            error_report("Got %lu VMState data, less than expected %lu",
+                         total_size, value);
+            ret = -EINVAL;
+            goto out;
+        }
 
         colo_put_cmd(mis->to_src_file, COLO_MESSAGE_VMSTATE_RECEIVED,
                      &local_err);
@@ -355,13 +401,32 @@ void *colo_process_incoming_thread(void *opaque)
             goto out;
         }
 
-        /* TODO: load vm state */
+        /* open colo buffer for read */
+        fb = qemu_bufopen("r", buffer);
+        if (!fb) {
+            error_report("Can't open colo buffer for read");
+            goto out;
+        }
+
+        qemu_mutex_lock_iothread();
+        qemu_system_reset(VMRESET_SILENT);
+        if (qemu_loadvm_state(fb) < 0) {
+            error_report("COLO: loadvm failed");
+            qemu_mutex_unlock_iothread();
+            goto out;
+        }
+        qemu_mutex_unlock_iothread();
+
+        /* TODO: flush vm state */
 
         colo_put_cmd(mis->to_src_file, COLO_MESSAGE_VMSTATE_LOADED,
                      &local_err);
         if (local_err) {
             goto out;
         }
+
+        qemu_fclose(fb);
+        fb = NULL;
     }
 
 out:
@@ -370,6 +435,11 @@ out:
         error_report_err(local_err);
     }
 
+    if (fb) {
+        qemu_fclose(fb);
+    }
+    qsb_free(buffer);
+
     qemu_mutex_lock_iothread();
     colo_release_ram_cache();
     qemu_mutex_unlock_iothread();
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread
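
The ordering here matters: the SVM's current state must survive a half-sent
checkpoint, so reset-and-load only happens once the received byte count
matches what the primary announced. A standalone model of that commit-or-keep
decision (invented names; memset() and strncpy() stand in for
qemu_system_reset() and qemu_loadvm_state()):

#include <stdio.h>
#include <string.h>

static char svm_state[32] = "last good state";

static int commit_checkpoint(const char *staged, size_t got, size_t expected)
{
    if (got != expected) {
        return -1;  /* incomplete: keep running on the old state */
    }
    memset(svm_state, 0, sizeof(svm_state));            /* "reset" */
    strncpy(svm_state, staged, sizeof(svm_state) - 1);  /* "load" */
    return 0;
}

int main(void)
{
    commit_checkpoint("new state", 5, 9);  /* truncated: rejected */
    printf("after short read: %s\n", svm_state);
    commit_checkpoint("new state", 9, 9);  /* complete: applied */
    printf("after full state: %s\n", svm_state);
    return 0;
}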

* [Qemu-devel] [PATCH COLO-Frame v15 14/38] COLO: Flush PVM's cached RAM into SVM's memory
  2016-02-22  2:39 [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (12 preceding siblings ...)
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 13/38] COLO: Load VMState into qsb before restore it zhanghailiang
@ 2016-02-22  2:40 ` zhanghailiang
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 15/38] COLO: Add checkpoint-delay parameter for migrate-set-parameters zhanghailiang
                   ` (24 subsequent siblings)
  38 siblings, 0 replies; 52+ messages in thread
From: zhanghailiang @ 2016-02-22  2:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, zhangchen.fnst, hongyang.yang

While the VM is running, the PVM may dirty some pages; we will transfer the
PVM's dirty pages to the SVM and store them in the SVM's RAM cache at the
next checkpoint. So the content of the SVM's RAM cache is always the same
as the PVM's memory after a checkpoint.

Instead of flushing the whole content of the RAM cache into the SVM's
memory, we do this in a more efficient way:
only flush the pages that the PVM has dirtied since the last checkpoint.
In this way, we can ensure the SVM's memory stays the same as the PVM's.

Besides, we must ensure the RAM cache is flushed before the device state
is loaded.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v12:
- Add a trace point in the end of colo_flush_ram_cache() (Dave's suggestion)
- Add Reviewed-by tag
v11:
- Move the place of 'need_flush' (Dave's suggestion)
- Remove unused 'DPRINTF("Flush ram_cache\n")'
v10:
- trace the number of dirty pages that be received.
---
 include/migration/migration.h |  1 +
 migration/colo.c              |  2 --
 migration/ram.c               | 38 ++++++++++++++++++++++++++++++++++++++
 trace-events                  |  2 ++
 4 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 6907986..14b9f3d 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -336,4 +336,5 @@ PostcopyState postcopy_state_set(PostcopyState new_state);
 /* ram cache */
 int colo_init_ram_cache(void);
 void colo_release_ram_cache(void);
+void colo_flush_ram_cache(void);
 #endif
diff --git a/migration/colo.c b/migration/colo.c
index b9f60c7..473fb14 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -417,8 +417,6 @@ void *colo_process_incoming_thread(void *opaque)
         }
         qemu_mutex_unlock_iothread();
 
-        /* TODO: flush vm state */
-
         colo_put_cmd(mis->to_src_file, COLO_MESSAGE_VMSTATE_LOADED,
                      &local_err);
         if (local_err) {
diff --git a/migration/ram.c b/migration/ram.c
index 7373df3..891f3b2 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2465,6 +2465,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
      * be atomic
      */
     bool postcopy_running = postcopy_state_get() >= POSTCOPY_INCOMING_LISTENING;
+    bool need_flush = false;
 
     seq_iter++;
 
@@ -2499,6 +2500,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
             /* After going into COLO, we should load the page into colo_cache */
             if (ram_cache_enable) {
                 host = colo_cache_from_block_offset(block, addr);
+                need_flush = true;
             } else {
                 host = host_from_ram_block_offset(block, addr);
             }
@@ -2591,6 +2593,10 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
     }
 
     rcu_read_unlock();
+
+    if (!ret && ram_cache_enable && need_flush) {
+        colo_flush_ram_cache();
+    }
     DPRINTF("Completed load of VM with exit code %d seq iteration "
             "%" PRIu64 "\n", ret, seq_iter);
     return ret;
@@ -2663,6 +2669,38 @@ void colo_release_ram_cache(void)
     rcu_read_unlock();
 }
 
+/*
+ * Flush the content of the RAM cache into the SVM's memory.
+ * Only flush the pages that have been dirtied by the PVM or SVM (or both).
+ */
+void colo_flush_ram_cache(void)
+{
+    RAMBlock *block = NULL;
+    void *dst_host;
+    void *src_host;
+    ram_addr_t offset = 0;
+
+    trace_colo_flush_ram_cache_begin(migration_dirty_pages);
+    rcu_read_lock();
+    block = QLIST_FIRST_RCU(&ram_list.blocks);
+    while (block) {
+        ram_addr_t ram_addr_abs;
+        offset = migration_bitmap_find_dirty(block, offset, &ram_addr_abs);
+        migration_bitmap_clear_dirty(ram_addr_abs);
+        if (offset >= block->used_length) {
+            offset = 0;
+            block = QLIST_NEXT_RCU(block, next);
+        } else {
+            dst_host = block->host + offset;
+            src_host = block->colo_cache + offset;
+            memcpy(dst_host, src_host, TARGET_PAGE_SIZE);
+        }
+    }
+    rcu_read_unlock();
+    trace_colo_flush_ram_cache_end();
+    assert(migration_dirty_pages == 0);
+}
+
 static SaveVMHandlers savevm_ram_handlers = {
     .save_live_setup = ram_save_setup,
     .save_live_iterate = ram_save_iterate,
diff --git a/trace-events b/trace-events
index 97807cd..ee4a2fb 100644
--- a/trace-events
+++ b/trace-events
@@ -1290,6 +1290,8 @@ migration_throttle(void) ""
 ram_load_postcopy_loop(uint64_t addr, int flags) "@%" PRIx64 " %x"
 ram_postcopy_send_discard_bitmap(void) ""
 ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: %zx len: %zx"
+colo_flush_ram_cache_begin(uint64_t dirty_pages) "dirty_pages %" PRIu64
+colo_flush_ram_cache_end(void) ""
 
 # hw/display/qxl.c
 disable qxl_interface_set_mm_time(int qid, uint32_t mm_time) "%d %d"
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread
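
A standalone model of the flush loop (invented names; 4-byte "pages" and a
one-byte bitmap instead of migration_bitmap_find_dirty()): only pages whose
dirty bit is set are copied from the cache into guest RAM, and each bit is
cleared as it is consumed, which is why colo_flush_ram_cache() can assert
migration_dirty_pages == 0 at the end.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PAGE   4
#define NPAGES 4

static char host[NPAGES * PAGE + 1]  = "aaaabbbbccccdddd"; /* SVM RAM */
static char cache[NPAGES * PAGE + 1] = "aaaaBBBBccccDDDD"; /* colo_cache */
static uint8_t dirty = 0x0a;  /* pages 1 and 3 were dirtied by the PVM */

static void flush_cache(void)
{
    for (unsigned p = 0; p < NPAGES; p++) {
        if (dirty & (1u << p)) {
            memcpy(host + p * PAGE, cache + p * PAGE, PAGE);
            dirty &= ~(1u << p);  /* cf. migration_bitmap_clear_dirty() */
        }
    }
}

int main(void)
{
    flush_cache();
    printf("host after flush: %s (dirty=0x%x)\n", host, dirty);
    return 0;
}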

* [Qemu-devel] [PATCH COLO-Frame v15 15/38] COLO: Add checkpoint-delay parameter for migrate-set-parameters
  2016-02-22  2:39 [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (13 preceding siblings ...)
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 14/38] COLO: Flush PVM's cached RAM into SVM's memory zhanghailiang
@ 2016-02-22  2:40 ` zhanghailiang
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 16/38] COLO: synchronize PVM's state to SVM periodically zhanghailiang
                   ` (23 subsequent siblings)
  38 siblings, 0 replies; 52+ messages in thread
From: zhanghailiang @ 2016-02-22  2:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, zhangchen.fnst,
	Luiz Capitulino, hongyang.yang

Add checkpoint-delay parameter for migrate-set-parameters, so that
we can control the checkpoint frequency when COLO is in periodic mode.

Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v12:
- Change checkpoint-delay to x-checkpoint-delay (Dave's suggestion)
- Add Reviewed-by tag
v11:
- Move this patch ahead of the patch where uses 'checkpoint_delay'
 (Dave's suggestion)
v10:
- Fix related qmp command
---
 hmp.c                 |  7 +++++++
 migration/migration.c | 24 +++++++++++++++++++++++-
 qapi-schema.json      | 19 ++++++++++++++++---
 qmp-commands.hx       |  3 ++-
 4 files changed, 48 insertions(+), 5 deletions(-)

diff --git a/hmp.c b/hmp.c
index bfbd667..786954d 100644
--- a/hmp.c
+++ b/hmp.c
@@ -285,6 +285,9 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
         monitor_printf(mon, " %s: %" PRId64,
             MigrationParameter_lookup[MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT],
             params->x_cpu_throttle_increment);
+        monitor_printf(mon, " %s: %" PRId64,
+            MigrationParameter_lookup[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY],
+            params->x_checkpoint_delay);
         monitor_printf(mon, "\n");
     }
 
@@ -1241,6 +1244,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
     bool has_decompress_threads = false;
     bool has_x_cpu_throttle_initial = false;
     bool has_x_cpu_throttle_increment = false;
+    bool has_x_checkpoint_delay = false;
     int i;
 
     for (i = 0; i < MIGRATION_PARAMETER__MAX; i++) {
@@ -1260,6 +1264,8 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
                 break;
             case MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT:
                 has_x_cpu_throttle_increment = true;
+                break;
+            case MIGRATION_PARAMETER_X_CHECKPOINT_DELAY:
+                has_x_checkpoint_delay = true;
                 break;
             }
             qmp_migrate_set_parameters(has_compress_level, value,
@@ -1267,6 +1273,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
                                        has_decompress_threads, value,
                                        has_x_cpu_throttle_initial, value,
                                        has_x_cpu_throttle_increment, value,
+                                       has_x_checkpoint_delay, value,
                                        &err);
             break;
         }
diff --git a/migration/migration.c b/migration/migration.c
index 6e19c15..324dcb6 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -57,6 +57,11 @@
 /* Migration XBZRLE default cache size */
 #define DEFAULT_MIGRATE_CACHE_SIZE (64 * 1024 * 1024)
 
+/* The delay time (in ms) between two COLO checkpoints
+ * Note: Please change this default value to 10000 when we support hybrid mode.
+ */
+#define DEFAULT_MIGRATE_X_CHECKPOINT_DELAY 200
+
 static NotifierList migration_state_notifiers =
     NOTIFIER_LIST_INITIALIZER(migration_state_notifiers);
 
@@ -92,6 +97,8 @@ MigrationState *migrate_get_current(void)
                 DEFAULT_MIGRATE_X_CPU_THROTTLE_INITIAL,
         .parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT] =
                 DEFAULT_MIGRATE_X_CPU_THROTTLE_INCREMENT,
+        .parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY] =
+                DEFAULT_MIGRATE_X_CHECKPOINT_DELAY,
     };
 
     if (!once) {
@@ -531,6 +538,8 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
             s->parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INITIAL];
     params->x_cpu_throttle_increment =
             s->parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT];
+    params->x_checkpoint_delay =
+            s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY];
 
     return params;
 }
@@ -738,7 +747,10 @@ void qmp_migrate_set_parameters(bool has_compress_level,
                                 bool has_x_cpu_throttle_initial,
                                 int64_t x_cpu_throttle_initial,
                                 bool has_x_cpu_throttle_increment,
-                                int64_t x_cpu_throttle_increment, Error **errp)
+                                int64_t x_cpu_throttle_increment,
+                                bool has_x_checkpoint_delay,
+                                int64_t x_checkpoint_delay,
+                                Error **errp)
 {
     MigrationState *s = migrate_get_current();
 
@@ -773,6 +785,11 @@ void qmp_migrate_set_parameters(bool has_compress_level,
                    "x_cpu_throttle_increment",
                    "an integer in the range of 1 to 99");
     }
+    if (has_x_checkpoint_delay && (x_checkpoint_delay < 0)) {
+        error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
+                   "x_checkpoint_delay",
+                   "is invalid, it should be positive");
+        return;
+    }
 
     if (has_compress_level) {
         s->parameters[MIGRATION_PARAMETER_COMPRESS_LEVEL] = compress_level;
@@ -793,6 +810,11 @@ void qmp_migrate_set_parameters(bool has_compress_level,
         s->parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT] =
                                                     x_cpu_throttle_increment;
     }
+
+    if (has_x_checkpoint_delay) {
+        s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY] =
+                                                    x_checkpoint_delay;
+    }
 }
 
 void qmp_migrate_start_postcopy(Error **errp)
diff --git a/qapi-schema.json b/qapi-schema.json
index 935870d..582407d 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -621,11 +621,16 @@
 # @x-cpu-throttle-increment: throttle percentage increase each time
 #                            auto-converge detects that migration is not making
 #                            progress. The default value is 10. (Since 2.5)
+#
+# @x-checkpoint-delay: The delay time (in ms) between two COLO checkpoints in
+#          periodic mode. (Since 2.6)
+#
 # Since: 2.4
 ##
 { 'enum': 'MigrationParameter',
   'data': ['compress-level', 'compress-threads', 'decompress-threads',
-           'x-cpu-throttle-initial', 'x-cpu-throttle-increment'] }
+           'x-cpu-throttle-initial', 'x-cpu-throttle-increment',
+           'x-checkpoint-delay' ] }
 
 #
 # @migrate-set-parameters
@@ -645,6 +650,9 @@
 # @x-cpu-throttle-increment: throttle percentage increase each time
 #                            auto-converge detects that migration is not making
 #                            progress. The default value is 10. (Since 2.5)
+#
+# @x-checkpoint-delay: the delay time between two checkpoints. (Since 2.6)
+#
 # Since: 2.4
 ##
 { 'command': 'migrate-set-parameters',
@@ -652,7 +660,8 @@
             '*compress-threads': 'int',
             '*decompress-threads': 'int',
             '*x-cpu-throttle-initial': 'int',
-            '*x-cpu-throttle-increment': 'int'} }
+            '*x-cpu-throttle-increment': 'int',
+            '*x-checkpoint-delay': 'int' } }
 
 #
 # @MigrationParameters
@@ -671,6 +680,8 @@
 #                            auto-converge detects that migration is not making
 #                            progress. The default value is 10. (Since 2.5)
 #
+# @x-checkpoint-delay: the delay time between two COLO checkpoints. (Since 2.6)
+#
 # Since: 2.4
 ##
 { 'struct': 'MigrationParameters',
@@ -678,7 +689,9 @@
             'compress-threads': 'int',
             'decompress-threads': 'int',
             'x-cpu-throttle-initial': 'int',
-            'x-cpu-throttle-increment': 'int'} }
+            'x-cpu-throttle-increment': 'int',
+            'x-checkpoint-delay': 'int'} }
+
 ##
 # @query-migrate-parameters
 #
diff --git a/qmp-commands.hx b/qmp-commands.hx
index bf27c38..8e08989 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -3718,6 +3718,7 @@ Set migration parameters
                            throttled for auto-converge (json-int)
 - "x-cpu-throttle-increment": set throttle increasing percentage for
                              auto-converge (json-int)
+- "x-checkpoint-delay": set the delay time for periodic checkpoint (json-int)
 
 Arguments:
 
@@ -3731,7 +3732,7 @@ EQMP
     {
         .name       = "migrate-set-parameters",
         .args_type  =
-            "compress-level:i?,compress-threads:i?,decompress-threads:i?,x-cpu-throttle-initial:i?,x-cpu-throttle-increment:i?",
+            "compress-level:i?,compress-threads:i?,decompress-threads:i?,x-cpu-throttle-initial:i?,x-cpu-throttle-increment:i?,x-checkpoint-delay:i?",
         .mhandler.cmd_new = qmp_marshal_migrate_set_parameters,
     },
 SQMP
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread
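
The validate-then-store shape of qmp_migrate_set_parameters() for the new
parameter, as a standalone sketch (invented names; a negative delay is
rejected and the previous value kept):

#include <stdio.h>

/* cf. DEFAULT_MIGRATE_X_CHECKPOINT_DELAY */
static long checkpoint_delay_ms = 200;

static int set_checkpoint_delay(long value)
{
    if (value < 0) {
        fprintf(stderr, "x_checkpoint_delay should be positive\n");
        return -1;  /* reject and keep the old value */
    }
    checkpoint_delay_ms = value;
    return 0;
}

int main(void)
{
    set_checkpoint_delay(-5);    /* rejected */
    set_checkpoint_delay(1000);  /* accepted */
    printf("checkpoint delay: %ld ms\n", checkpoint_delay_ms);
    return 0;
}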

* [Qemu-devel] [PATCH COLO-Frame v15 16/38] COLO: synchronize PVM's state to SVM periodically
  2016-02-22  2:39 [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (14 preceding siblings ...)
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 15/38] COLO: Add checkpoint-delay parameter for migrate-set-parameters zhanghailiang
@ 2016-02-22  2:40 ` zhanghailiang
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 17/38] COLO failover: Introduce a new command to trigger a failover zhanghailiang
                   ` (22 subsequent siblings)
  38 siblings, 0 replies; 52+ messages in thread
From: zhanghailiang @ 2016-02-22  2:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, zhangchen.fnst, hongyang.yang

Do a checkpoint periodically; the default interval is 200 ms.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v12:
- Add Reviewed-by tag
v11:
- Fix wrong sleep time for checkpoint period. (Dave's comment)
---
 migration/colo.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index 473fb14..ba3b310 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -11,6 +11,7 @@
  */
 
 #include <unistd.h>
+#include "qemu/timer.h"
 #include "sysemu/sysemu.h"
 #include "migration/colo.h"
 #include "trace.h"
@@ -230,6 +231,7 @@ out:
 static void colo_process_checkpoint(MigrationState *s)
 {
     QEMUSizedBuffer *buffer = NULL;
+    int64_t current_time, checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
     Error *local_err = NULL;
     int ret;
 
@@ -261,11 +263,21 @@ static void colo_process_checkpoint(MigrationState *s)
     trace_colo_vm_state_change("stop", "run");
 
     while (s->state == MIGRATION_STATUS_COLO) {
+        current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+        if (current_time - checkpoint_time <
+            s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY]) {
+            int64_t delay_ms;
+
+            delay_ms = s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY] -
+                       (current_time - checkpoint_time);
+            g_usleep(delay_ms * 1000);
+        }
         /* start a colo checkpoint */
         ret = colo_do_checkpoint_transaction(s, buffer);
         if (ret < 0) {
             goto out;
         }
+        checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
     }
 
 out:
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread
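
A standalone model of the pacing logic (invented names): sleep only for the
remainder of x-checkpoint-delay, so a checkpoint that itself took longer
than the interval is followed immediately by the next one.

#include <stdio.h>

static long now_ms;  /* stand-in for qemu_clock_get_ms(QEMU_CLOCK_HOST) */

static long pace(long last_checkpoint_ms, long delay_ms)
{
    long elapsed = now_ms - last_checkpoint_ms;
    long sleep_ms = elapsed < delay_ms ? delay_ms - elapsed : 0;

    now_ms += sleep_ms;  /* cf. g_usleep(delay_ms * 1000) */
    return sleep_ms;
}

int main(void)
{
    now_ms = 120;                              /* checkpoint took 120 ms */
    printf("sleep %ld ms\n", pace(0, 200));    /* 80: top up to 200 ms */
    now_ms = 350;
    printf("sleep %ld ms\n", pace(100, 200));  /* 0: already overdue */
    return 0;
}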

* [Qemu-devel] [PATCH COLO-Frame v15 17/38] COLO failover: Introduce a new command to trigger a failover
  2016-02-22  2:39 [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (15 preceding siblings ...)
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 16/38] COLO: synchronize PVM's state to SVM periodically zhanghailiang
@ 2016-02-22  2:40 ` zhanghailiang
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 18/38] COLO failover: Introduce state to record failover process zhanghailiang
                   ` (21 subsequent siblings)
  38 siblings, 0 replies; 52+ messages in thread
From: zhanghailiang @ 2016-02-22  2:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, zhangchen.fnst,
	Luiz Capitulino, hongyang.yang

We leave users free to choose whatever heartbeat solution they want; if the
heartbeat is lost, or they detect other errors, they can use the experimental
command 'x_colo_lost_heartbeat' to tell COLO to fail over, and COLO will
act accordingly.

For example, if the command is sent to the PVM, the Primary side will
exit COLO mode and take over operation. If sent to the Secondary, the
Secondary will run the failover work and then take over server operation
to become the new Primary.

Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v13:
- Add Reviewed-by tag
v11:
- Add more comments for x-colo-lost-heartbeat command (Eric's suggestion)
- Return 'enum' instead of 'int' for get_colo_mode() (Eric's suggestion)
v10:
- Rename command colo_lost_hearbeat to experimental 'x_colo_lost_heartbeat'
---
 hmp-commands.hx              | 15 +++++++++++++++
 hmp.c                        |  8 ++++++++
 hmp.h                        |  1 +
 include/migration/colo.h     |  3 +++
 include/migration/failover.h | 20 ++++++++++++++++++++
 migration/Makefile.objs      |  2 +-
 migration/colo-comm.c        | 11 +++++++++++
 migration/colo-failover.c    | 41 +++++++++++++++++++++++++++++++++++++++++
 migration/colo.c             |  1 +
 qapi-schema.json             | 29 +++++++++++++++++++++++++++++
 qmp-commands.hx              | 19 +++++++++++++++++++
 stubs/migration-colo.c       |  8 ++++++++
 12 files changed, 157 insertions(+), 1 deletion(-)
 create mode 100644 include/migration/failover.h
 create mode 100644 migration/colo-failover.c

diff --git a/hmp-commands.hx b/hmp-commands.hx
index bb52e4d..a381b0b 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1039,6 +1039,21 @@ migration (or once already in postcopy).
 ETEXI
 
     {
+        .name       = "x_colo_lost_heartbeat",
+        .args_type  = "",
+        .params     = "",
+        .help       = "Tell COLO that heartbeat is lost,\n\t\t\t"
+                      "a failover or takeover is needed.",
+        .mhandler.cmd = hmp_x_colo_lost_heartbeat,
+    },
+
+STEXI
+@item x_colo_lost_heartbeat
+@findex x_colo_lost_heartbeat
+Tell COLO that heartbeat is lost, a failover or takeover is needed.
+ETEXI
+
+    {
         .name       = "client_migrate_info",
         .args_type  = "protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?",
         .params     = "protocol hostname port tls-port cert-subject",
diff --git a/hmp.c b/hmp.c
index 786954d..531963c 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1312,6 +1312,14 @@ void hmp_migrate_start_postcopy(Monitor *mon, const QDict *qdict)
     hmp_handle_error(mon, &err);
 }
 
+void hmp_x_colo_lost_heartbeat(Monitor *mon, const QDict *qdict)
+{
+    Error *err = NULL;
+
+    qmp_x_colo_lost_heartbeat(&err);
+    hmp_handle_error(mon, &err);
+}
+
 void hmp_set_password(Monitor *mon, const QDict *qdict)
 {
     const char *protocol  = qdict_get_str(qdict, "protocol");
diff --git a/hmp.h b/hmp.h
index a8c5b5a..864a300 100644
--- a/hmp.h
+++ b/hmp.h
@@ -70,6 +70,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_cache_size(Monitor *mon, const QDict *qdict);
 void hmp_client_migrate_info(Monitor *mon, const QDict *qdict);
 void hmp_migrate_start_postcopy(Monitor *mon, const QDict *qdict);
+void hmp_x_colo_lost_heartbeat(Monitor *mon, const QDict *qdict);
 void hmp_set_password(Monitor *mon, const QDict *qdict);
 void hmp_expire_password(Monitor *mon, const QDict *qdict);
 void hmp_eject(Monitor *mon, const QDict *qdict);
diff --git a/include/migration/colo.h b/include/migration/colo.h
index b40676c..e9ac2c3 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -17,6 +17,7 @@
 #include "migration/migration.h"
 #include "qemu/coroutine_int.h"
 #include "qemu/thread.h"
+#include "qemu/main-loop.h"
 
 bool colo_supported(void);
 void colo_info_init(void);
@@ -29,4 +30,6 @@ bool migration_incoming_enable_colo(void);
 void migration_incoming_exit_colo(void);
 void *colo_process_incoming_thread(void *opaque);
 bool migration_incoming_in_colo_state(void);
+
+COLOMode get_colo_mode(void);
 #endif
diff --git a/include/migration/failover.h b/include/migration/failover.h
new file mode 100644
index 0000000..3274735
--- /dev/null
+++ b/include/migration/failover.h
@@ -0,0 +1,20 @@
+/*
+ *  COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ *  (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO.,LTD.
+ * Copyright (c) 2016 FUJITSU LIMITED
+ * Copyright (c) 2016 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_FAILOVER_H
+#define QEMU_FAILOVER_H
+
+#include "qemu-common.h"
+
+void failover_request_active(Error **errp);
+
+#endif
diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index 81b5713..920d1e7 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -1,6 +1,6 @@
 common-obj-y += migration.o tcp.o
-common-obj-$(CONFIG_COLO) += colo.o
 common-obj-y += colo-comm.o
+common-obj-$(CONFIG_COLO) += colo.o colo-failover.o
 common-obj-y += vmstate.o
 common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o qemu-file-stdio.o
 common-obj-y += xbzrle.o postcopy-ram.o
diff --git a/migration/colo-comm.c b/migration/colo-comm.c
index c36d13f..3943e94 100644
--- a/migration/colo-comm.c
+++ b/migration/colo-comm.c
@@ -20,6 +20,17 @@ typedef struct {
 
 static COLOInfo colo_info;
 
+COLOMode get_colo_mode(void)
+{
+    if (migration_in_colo_state()) {
+        return COLO_MODE_PRIMARY;
+    } else if (migration_incoming_in_colo_state()) {
+        return COLO_MODE_SECONDARY;
+    } else {
+        return COLO_MODE_UNKNOWN;
+    }
+}
+
 static void colo_info_pre_save(void *opaque)
 {
     COLOInfo *s = opaque;
diff --git a/migration/colo-failover.c b/migration/colo-failover.c
new file mode 100644
index 0000000..3533409
--- /dev/null
+++ b/migration/colo-failover.c
@@ -0,0 +1,41 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2016 FUJITSU LIMITED
+ * Copyright (c) 2016 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "migration/colo.h"
+#include "migration/failover.h"
+#include "qmp-commands.h"
+#include "qapi/qmp/qerror.h"
+
+static QEMUBH *failover_bh;
+
+static void colo_failover_bh(void *opaque)
+{
+    qemu_bh_delete(failover_bh);
+    failover_bh = NULL;
+    /* TODO: Do failover work */
+}
+
+void failover_request_active(Error **errp)
+{
+    failover_bh = qemu_bh_new(colo_failover_bh, NULL);
+    qemu_bh_schedule(failover_bh);
+}
+
+void qmp_x_colo_lost_heartbeat(Error **errp)
+{
+    if (get_colo_mode() == COLO_MODE_UNKNOWN) {
+        error_setg(errp, QERR_FEATURE_DISABLED, "colo");
+        return;
+    }
+
+    failover_request_active(errp);
+}
diff --git a/migration/colo.c b/migration/colo.c
index ba3b310..1aede64 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -16,6 +16,7 @@
 #include "migration/colo.h"
 #include "trace.h"
 #include "qemu/error-report.h"
+#include "migration/failover.h"
 
 /* colo buffer */
 #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
diff --git a/qapi-schema.json b/qapi-schema.json
index 582407d..73325ed 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -759,6 +759,35 @@
             'vmstate-send', 'vmstate-size', 'vmstate-received',
             'vmstate-loaded' ] }
 
+##
+# @COLOMode
+#
+# The colo mode
+#
+# @unknown: unknown mode
+#
+# @primary: master side
+#
+# @secondary: slave side
+#
+# Since: 2.6
+##
+{ 'enum': 'COLOMode',
+  'data': [ 'unknown', 'primary', 'secondary'] }
+
+##
+# @x-colo-lost-heartbeat
+#
+# Tell qemu that heartbeat is lost, and request it to do takeover procedures.
+# If this command is sent to the PVM, the Primary side will exit COLO mode.
+# If sent to the Secondary, the Secondary side will run failover work, and
+# then take over server operation to become the service VM.
+#
+# Since: 2.6
+##
+{ 'command': 'x-colo-lost-heartbeat' }
+
+##
 # @MouseInfo:
 #
 # Information about a mouse device.
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 8e08989..5557ea2 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -803,6 +803,25 @@ Example:
 EQMP
 
     {
+        .name       = "x-colo-lost-heartbeat",
+        .args_type  = "",
+        .mhandler.cmd_new = qmp_marshal_x_colo_lost_heartbeat,
+    },
+
+SQMP
+x-colo-lost-heartbeat
+--------------------
+
+Tell COLO that heartbeat is lost, a failover or takeover is needed.
+
+Example:
+
+-> { "execute": "x-colo-lost-heartbeat" }
+<- { "return": {} }
+
+EQMP
+
+    {
         .name       = "client_migrate_info",
         .args_type  = "protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?",
         .params     = "protocol hostname port tls-port cert-subject",
diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
index b6f3190..a6cd6e5 100644
--- a/stubs/migration-colo.c
+++ b/stubs/migration-colo.c
@@ -11,6 +11,7 @@
  */
 
 #include "migration/colo.h"
+#include "qmp-commands.h"
 
 bool colo_supported(void)
 {
@@ -35,3 +36,10 @@ void *colo_process_incoming_thread(void *opaque)
 {
     return NULL;
 }
+
+void qmp_x_colo_lost_heartbeat(Error **errp)
+{
+    error_setg(errp, "COLO is not supported, please rerun configure"
+                     " with --enable-colo option in order to support"
+                     " COLO feature");
+}
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v15 18/38] COLO failover: Introduce state to record failover process
  2016-02-22  2:39 [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (16 preceding siblings ...)
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 17/38] COLO failover: Introduce a new command to trigger a failover zhanghailiang
@ 2016-02-22  2:40 ` zhanghailiang
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 19/38] COLO: Implement failover work for Primary VM zhanghailiang
                   ` (20 subsequent siblings)
  38 siblings, 0 replies; 52+ messages in thread
From: zhanghailiang @ 2016-02-22  2:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, zhangchen.fnst, hongyang.yang

When handling failover, we do different things according to the stage the
failover process has reached, so here we introduce a global atomic variable
to record the status of failover.

We add four failover statuses to indicate the different stages of the
failover process. Always use the helpers to get and set the value.
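
As an illustration (a sketch only, not part of this patch), a caller claims
a stage of the failover work by atomically advancing the state, and backs
off if another context won the race:

    /* Sketch: claim the REQUEST -> HANDLING transition atomically. */
    int old = failover_set_state(FAILOVER_STATUS_REQUEST,
                                 FAILOVER_STATUS_HANDLING);
    if (old != FAILOVER_STATUS_REQUEST) {
        return; /* another context changed the state first */
    }
    /* ... we now own the HANDLING stage ... */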

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v11:
- fix several typos found by Dave
- Add Reviewed-by tag
---
 include/migration/failover.h | 10 ++++++++++
 migration/colo-failover.c    | 37 +++++++++++++++++++++++++++++++++++++
 migration/colo.c             |  4 ++++
 trace-events                 |  1 +
 4 files changed, 52 insertions(+)

diff --git a/include/migration/failover.h b/include/migration/failover.h
index 3274735..fe71bb4 100644
--- a/include/migration/failover.h
+++ b/include/migration/failover.h
@@ -15,6 +15,16 @@
 
 #include "qemu-common.h"
 
+typedef enum COLOFailoverStatus {
+    FAILOVER_STATUS_NONE = 0,
+    FAILOVER_STATUS_REQUEST = 1, /* Request but not handled */
+    FAILOVER_STATUS_HANDLING = 2, /* In the process of handling failover */
+    FAILOVER_STATUS_COMPLETED = 3, /* Finish the failover process */
+} COLOFailoverStatus;
+
+void failover_init_state(void);
+int failover_set_state(int old_state, int new_state);
+int failover_get_state(void);
 void failover_request_active(Error **errp);
 
 #endif
diff --git a/migration/colo-failover.c b/migration/colo-failover.c
index 3533409..e94b3ba 100644
--- a/migration/colo-failover.c
+++ b/migration/colo-failover.c
@@ -14,22 +14,59 @@
 #include "migration/failover.h"
 #include "qmp-commands.h"
 #include "qapi/qmp/qerror.h"
+#include "qemu/error-report.h"
+#include "trace.h"
 
 static QEMUBH *failover_bh;
+static COLOFailoverStatus failover_state;
 
 static void colo_failover_bh(void *opaque)
 {
+    int old_state;
+
     qemu_bh_delete(failover_bh);
     failover_bh = NULL;
+    old_state = failover_set_state(FAILOVER_STATUS_REQUEST,
+                                   FAILOVER_STATUS_HANDLING);
+    if (old_state != FAILOVER_STATUS_REQUEST) {
+        error_report("Unkown error for failover, old_state=%d", old_state);
+        return;
+    }
     /* TODO: Do failover work */
 }
 
 void failover_request_active(Error **errp)
 {
+    if (failover_set_state(FAILOVER_STATUS_NONE, FAILOVER_STATUS_REQUEST)
+        != FAILOVER_STATUS_NONE) {
+        error_setg(errp, "COLO failover is already active");
+        return;
+    }
     failover_bh = qemu_bh_new(colo_failover_bh, NULL);
     qemu_bh_schedule(failover_bh);
 }
 
+void failover_init_state(void)
+{
+    failover_state = FAILOVER_STATUS_NONE;
+}
+
+int failover_set_state(int old_state, int new_state)
+{
+    int old;
+
+    old = atomic_cmpxchg(&failover_state, old_state, new_state);
+    if (old == old_state) {
+        trace_colo_failover_set_state(new_state);
+    }
+    return old;
+}
+
+int failover_get_state(void)
+{
+    return atomic_read(&failover_state);
+}
+
 void qmp_x_colo_lost_heartbeat(Error **errp)
 {
     if (get_colo_mode() == COLO_MODE_UNKNOWN) {
diff --git a/migration/colo.c b/migration/colo.c
index 1aede64..bf1ac2e 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -236,6 +236,8 @@ static void colo_process_checkpoint(MigrationState *s)
     Error *local_err = NULL;
     int ret;
 
+    failover_init_state();
+
     s->rp_state.from_dst_file = qemu_file_get_return_path(s->to_dst_file);
     if (!s->rp_state.from_dst_file) {
         error_report("Open QEMUFile from_dst_file failed");
@@ -342,6 +344,8 @@ void *colo_process_incoming_thread(void *opaque)
     migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
                       MIGRATION_STATUS_COLO);
 
+    failover_init_state();
+
     mis->to_src_file = qemu_file_get_return_path(mis->from_src_file);
     if (!mis->to_src_file) {
         error_report("colo incoming thread: Open QEMUFile to_src_file failed");
diff --git a/trace-events b/trace-events
index ee4a2fb..8fb6b31 100644
--- a/trace-events
+++ b/trace-events
@@ -1609,6 +1609,7 @@ postcopy_ram_incoming_cleanup_join(void) ""
 colo_vm_state_change(const char *old, const char *new) "Change '%s' => '%s'"
 colo_put_cmd(const char *msg) "Send '%s' cmd"
 colo_get_cmd(const char *msg) "Receive '%s' cmd"
+colo_failover_set_state(int new_state) "new state %d"
 
 # kvm-all.c
 kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v15 19/38] COLO: Implement failover work for Primary VM
  2016-02-22  2:39 [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (17 preceding siblings ...)
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 18/38] COLO failover: Introduce state to record failover process zhanghailiang
@ 2016-02-22  2:40 ` zhanghailiang
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 20/38] COLO: Implement failover work for Secondary VM zhanghailiang
                   ` (19 subsequent siblings)
  38 siblings, 0 replies; 52+ messages in thread
From: zhanghailiang @ 2016-02-22  2:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, zhangchen.fnst, hongyang.yang

For the PVM, if there is a failover request from users,
the colo thread will exit the loop, while the failover BH does the
cleanup work and resumes the VM.
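
In outline (a simplified sketch of the diff below, not literal patch code),
the checkpoint loop and the failover BH interact like this:

    /* colo thread, in colo_process_checkpoint() (sketch): */
    while (s->state == MIGRATION_STATUS_COLO) {
        if (failover_request_is_active()) {
            goto out;    /* leave the loop; the BH does the cleanup */
        }
        /* ... wait out the checkpoint delay, then take a checkpoint ... */
    }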

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v13:
- Add Reviewed-by tag
v12:
- Fix error report and remove unnecessary check in
  primary_vm_do_failover() (Dave's suggestion)
v11:
- Don't call migration_end() in primary_vm_do_failover(),
 The cleanup work will be done in migration_thread().
- Remove vm_start() in primary_vm_do_failover() which also been
  done in migraiton_thread()
v10:
- Call migration_end() in primary_vm_do_failover()
---
 include/migration/colo.h     |  3 +++
 include/migration/failover.h |  1 +
 migration/colo-failover.c    |  7 +++++-
 migration/colo.c             | 53 +++++++++++++++++++++++++++++++++++++++++---
 4 files changed, 60 insertions(+), 4 deletions(-)

diff --git a/include/migration/colo.h b/include/migration/colo.h
index e9ac2c3..e32eef4 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -32,4 +32,7 @@ void *colo_process_incoming_thread(void *opaque);
 bool migration_incoming_in_colo_state(void);
 
 COLOMode get_colo_mode(void);
+
+/* failover */
+void colo_do_failover(MigrationState *s);
 #endif
diff --git a/include/migration/failover.h b/include/migration/failover.h
index fe71bb4..c4bd81e 100644
--- a/include/migration/failover.h
+++ b/include/migration/failover.h
@@ -26,5 +26,6 @@ void failover_init_state(void);
 int failover_set_state(int old_state, int new_state);
 int failover_get_state(void);
 void failover_request_active(Error **errp);
+bool failover_request_is_active(void);
 
 #endif
diff --git a/migration/colo-failover.c b/migration/colo-failover.c
index e94b3ba..0a1d4bd 100644
--- a/migration/colo-failover.c
+++ b/migration/colo-failover.c
@@ -32,7 +32,7 @@ static void colo_failover_bh(void *opaque)
         error_report("Unkown error for failover, old_state=%d", old_state);
         return;
     }
-    /* TODO: Do failover work */
+    colo_do_failover(NULL);
 }
 
 void failover_request_active(Error **errp)
@@ -67,6 +67,11 @@ int failover_get_state(void)
     return atomic_read(&failover_state);
 }
 
+bool failover_request_is_active(void)
+{
+    return failover_get_state() != FAILOVER_STATUS_NONE;
+}
+
 void qmp_x_colo_lost_heartbeat(Error **errp)
 {
     if (get_colo_mode() == COLO_MODE_UNKNOWN) {
diff --git a/migration/colo.c b/migration/colo.c
index bf1ac2e..89cea58 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -40,6 +40,40 @@ bool migration_incoming_in_colo_state(void)
     return mis && (mis->state == MIGRATION_STATUS_COLO);
 }
 
+static bool colo_runstate_is_stopped(void)
+{
+    return runstate_check(RUN_STATE_COLO) || !runstate_is_running();
+}
+
+static void primary_vm_do_failover(void)
+{
+    MigrationState *s = migrate_get_current();
+    int old_state;
+
+    migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
+                      MIGRATION_STATUS_COMPLETED);
+
+    old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
+                                   FAILOVER_STATUS_COMPLETED);
+    if (old_state != FAILOVER_STATUS_HANDLING) {
+        error_report("Incorrect state (%d) while doing failover for Primary VM",
+                     old_state);
+        return;
+    }
+}
+
+void colo_do_failover(MigrationState *s)
+{
+    /* Make sure vm stopped while failover */
+    if (!colo_runstate_is_stopped()) {
+        vm_stop_force_state(RUN_STATE_COLO);
+    }
+
+    if (get_colo_mode() == COLO_MODE_PRIMARY) {
+        primary_vm_do_failover();
+    }
+}
+
 static void colo_put_cmd(QEMUFile *f, COLOMessage cmd,
                          Error **errp)
 {
@@ -166,9 +200,20 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
     }
 
     qemu_mutex_lock_iothread();
+    if (failover_request_is_active()) {
+        qemu_mutex_unlock_iothread();
+        goto out;
+    }
     vm_stop_force_state(RUN_STATE_COLO);
     qemu_mutex_unlock_iothread();
     trace_colo_vm_state_change("run", "stop");
+    /*
+     * The failover request BH could run after vm_stop_force_state(),
+     * so we check failover_request_is_active() again.
+     */
+    if (failover_request_is_active()) {
+        goto out;
+    }
 
     /* Disable block migration */
     s->params.blk = 0;
@@ -266,6 +311,11 @@ static void colo_process_checkpoint(MigrationState *s)
     trace_colo_vm_state_change("stop", "run");
 
     while (s->state == MIGRATION_STATUS_COLO) {
+        if (failover_request_is_active()) {
+            error_report("failover request");
+            goto out;
+        }
+
         current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
         if (current_time - checkpoint_time <
             s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY]) {
@@ -288,9 +338,6 @@ out:
     if (local_err) {
         error_report_err(local_err);
     }
-    migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
-                      MIGRATION_STATUS_COMPLETED);
-
     qsb_free(buffer);
     buffer = NULL;
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v15 20/38] COLO: Implement failover work for Secondary VM
  2016-02-22  2:39 [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (18 preceding siblings ...)
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 19/38] COLO: Implement failover work for Primary VM zhanghailiang
@ 2016-02-22  2:40 ` zhanghailiang
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 21/38] qmp event: Add COLO_EXIT event to notify users while exited from COLO zhanghailiang
                   ` (18 subsequent siblings)
  38 siblings, 0 replies; 52+ messages in thread
From: zhanghailiang @ 2016-02-22  2:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, zhangchen.fnst, hongyang.yang

If users require the SVM to take over, the colo incoming thread should
exit from its loop, while the failover BH helps jump back to the
migration incoming coroutine.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v12:
- Improve error message that suggested by Dave
- Add Reviewed-by tag
---
 migration/colo.c | 41 ++++++++++++++++++++++++++++++++++++++---
 1 file changed, 38 insertions(+), 3 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 89cea58..a65b22b 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -45,6 +45,33 @@ static bool colo_runstate_is_stopped(void)
     return runstate_check(RUN_STATE_COLO) || !runstate_is_running();
 }
 
+static void secondary_vm_do_failover(void)
+{
+    int old_state;
+    MigrationIncomingState *mis = migration_incoming_get_current();
+
+    migrate_set_state(&mis->state, MIGRATION_STATUS_COLO,
+                      MIGRATION_STATUS_COMPLETED);
+
+    if (!autostart) {
+        error_report("\"-S\" qemu option will be ignored in secondary side");
+        /* recover runstate to normal migration finish state */
+        autostart = true;
+    }
+
+    old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
+                                   FAILOVER_STATUS_COMPLETED);
+    if (old_state != FAILOVER_STATUS_HANDLING) {
+        error_report("Incorrect state (%d) while doing failover for "
+                     "secondary VM", old_state);
+        return;
+    }
+    /* For Secondary VM, jump to incoming co */
+    if (mis->migration_incoming_co) {
+        qemu_coroutine_enter(mis->migration_incoming_co, NULL);
+    }
+}
+
 static void primary_vm_do_failover(void)
 {
     MigrationState *s = migrate_get_current();
@@ -71,6 +98,8 @@ void colo_do_failover(MigrationState *s)
 
     if (get_colo_mode() == COLO_MODE_PRIMARY) {
         primary_vm_do_failover();
+    } else {
+        secondary_vm_do_failover();
     }
 }
 
@@ -430,6 +459,11 @@ void *colo_process_incoming_thread(void *opaque)
             goto out;
         }
         assert(request);
+        if (failover_request_is_active()) {
+            error_report("failover request");
+            goto out;
+        }
+
         /* FIXME: This is unnecessary for periodic checkpoint mode */
         colo_put_cmd(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_REPLY,
                      &local_err);
@@ -501,10 +535,11 @@ out:
         qemu_fclose(fb);
     }
     qsb_free(buffer);
-
-    qemu_mutex_lock_iothread();
+    /* Here we can be sure the failover BH holds the global lock, and it
+     * will join the colo incoming thread, so it is not necessary to take
+     * the lock again here, or there would be a deadlock.
+     */
     colo_release_ram_cache();
-    qemu_mutex_unlock_iothread();
 
     if (mis->to_src_file) {
         qemu_fclose(mis->to_src_file);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v15 21/38] qmp event: Add COLO_EXIT event to notify users while exited from COLO
  2016-02-22  2:39 [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (19 preceding siblings ...)
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 20/38] COLO: Implement failover work for Secondary VM zhanghailiang
@ 2016-02-22  2:40 ` zhanghailiang
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 22/38] COLO failover: Shutdown related socket fd when do failover zhanghailiang
                   ` (17 subsequent siblings)
  38 siblings, 0 replies; 52+ messages in thread
From: zhanghailiang @ 2016-02-22  2:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, zhangchen.fnst, Michael Roth,
	hongyang.yang

If some error happens during the VM's COLO FT stage, it's important to notify
the users of this event. Together with 'x_colo_lost_heartbeat', users can
intervene in COLO's failover work immediately.
Even if users don't want to get involved in COLO's failover verdict,
it is still necessary to notify them that we exited COLO mode.
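
For example (a hypothetical management-side flow, using only the commands
introduced in this series), a client that receives this event on the
Secondary could trigger the takeover itself:

<- { "timestamp": {"seconds": 2032141960, "microseconds": 417172},
     "event": "COLO_EXIT", "data": {"mode": "secondary", "reason": "error" } }
-> { "execute": "x-colo-lost-heartbeat" }
<- { "return": {} }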

Cc: Markus Armbruster <armbru@redhat.com>
Cc: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
v13:
- Remove optional 'error' string for this event.
  (I doubted it was useful for users, since users shouldn't
   interpret it and can't depend on it to decide what happened
   exactly. Besides, it is really hard to organize.)
- Remove unused 'unknown' member for enum COLOExitReason.
 (Eric's suggestion)
- Fix comment for COLO_EXIT
v11:
- Fix several typos found by Eric
---
 docs/qmp-events.txt | 16 ++++++++++++++++
 migration/colo.c    | 20 ++++++++++++++++++++
 qapi-schema.json    | 14 ++++++++++++++
 qapi/event.json     | 15 +++++++++++++++
 4 files changed, 65 insertions(+)

diff --git a/docs/qmp-events.txt b/docs/qmp-events.txt
index 52eb7e2..b6e8937 100644
--- a/docs/qmp-events.txt
+++ b/docs/qmp-events.txt
@@ -184,6 +184,22 @@ Example:
 Note: The "ready to complete" status is always reset by a BLOCK_JOB_ERROR
 event.
 
+COLO_EXIT
+---------
+
+Emitted when the VM leaves COLO mode, either because some error happened
+or at the request of users.
+
+Data:
+
+ - "mode": COLO mode, primary or secondary side (json-string)
+ - "reason": the exit reason, internal error or external request. (json-string)
+
+Example:
+
+{"timestamp": {"seconds": 2032141960, "microseconds": 417172},
+ "event": "COLO_EXIT", "data": {"mode": "primary", "reason": "request" } }
+
 DEVICE_DELETED
 --------------
 
diff --git a/migration/colo.c b/migration/colo.c
index a65b22b..814480c 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -17,6 +17,7 @@
 #include "trace.h"
 #include "qemu/error-report.h"
 #include "migration/failover.h"
+#include "qapi-event.h"
 
 /* colo buffer */
 #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
@@ -367,6 +368,18 @@ out:
     if (local_err) {
         error_report_err(local_err);
     }
+    /*
+     * There are only two reasons we can get here: either some error
+     * happened, or the user triggered a failover.
+     */
+    if (!failover_request_is_active()) {
+        qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
+                                  COLO_EXIT_REASON_ERROR, NULL);
+    } else {
+        qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
+                                  COLO_EXIT_REASON_REQUEST, NULL);
+    }
+
     qsb_free(buffer);
     buffer = NULL;
 
@@ -530,6 +543,13 @@ out:
     if (local_err) {
         error_report_err(local_err);
     }
+    if (!failover_request_is_active()) {
+        qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
+                                  COLO_EXIT_REASON_ERROR, NULL);
+    } else {
+        qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
+                                  COLO_EXIT_REASON_REQUEST, NULL);
+    }
 
     if (fb) {
         qemu_fclose(fb);
diff --git a/qapi-schema.json b/qapi-schema.json
index 73325ed..7fec696 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -776,6 +776,20 @@
   'data': [ 'unknown', 'primary', 'secondary'] }
 
 ##
+# @COLOExitReason
+#
+# The reason for a COLO exit
+#
+# @request: COLO exit is due to an external request
+#
+# @error: COLO exit is due to an internal error
+#
+# Since: 2.6
+##
+{ 'enum': 'COLOExitReason',
+  'data': [ 'request', 'error' ] }
+
+##
 # @x-colo-lost-heartbeat
 #
 # Tell qemu that heartbeat is lost, request it to do takeover procedures.
diff --git a/qapi/event.json b/qapi/event.json
index 390fd45..cfcc887 100644
--- a/qapi/event.json
+++ b/qapi/event.json
@@ -268,6 +268,21 @@
   'data': { 'pass': 'int' } }
 
 ##
+# @COLO_EXIT
+#
+# Emitted when the VM leaves COLO mode, either because some error happened
+# or at the request of users.
+#
+# @mode: which COLO mode the VM was in when it exited.
+#
+# @reason: describes the reason for the COLO exit.
+#
+# Since: 2.6
+##
+{ 'event': 'COLO_EXIT',
+  'data': {'mode': 'COLOMode', 'reason': 'COLOExitReason' } }
+
+##
 # @ACPI_DEVICE_OST
 #
 # Emitted when guest executes ACPI _OST method.
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v15 22/38] COLO failover: Shutdown related socket fd when do failover
  2016-02-22  2:39 [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (20 preceding siblings ...)
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 21/38] qmp event: Add COLO_EXIT event to notify users while exited from COLO zhanghailiang
@ 2016-02-22  2:40 ` zhanghailiang
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 23/38] COLO failover: Don't do failover during loading VM's state zhanghailiang
                   ` (16 subsequent siblings)
  38 siblings, 0 replies; 52+ messages in thread
From: zhanghailiang @ 2016-02-22  2:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, zhangchen.fnst, hongyang.yang

If the net connection between COLO's two sides is broken while the colo/colo
incoming thread is blocked in a 'read'/'write' on the socket fd, the thread
will not detect the error until the connection times out, which can take a
long time.

Here we shut down all the related socket file descriptors in the failover BH
to wake up the blocking operation. Besides, we should only close the
corresponding file descriptors after the failover BH has shut them down, or
there will be an error.
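
The resulting ordering, sketched for the primary side (condensed from the
diff below; error handling omitted):

    /* failover BH: wake up any blocked I/O, then signal completion */
    qemu_file_shutdown(s->to_dst_file);
    qemu_file_shutdown(s->rp_state.from_dst_file);
    qemu_sem_post(&s->colo_sem);

    /* colo thread, on its way out: close only after the BH is done */
    qemu_sem_wait(&s->colo_sem);
    qemu_fclose(s->rp_state.from_dst_file);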

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v13:
- Add Reviewed-by tag
- Use semaphore to notify colo/colo incoming loop that
  failover work is finished.
v12:
- Shutdown both QEMUFile's fd though they may use the
  same fd. (Dave's suggestion)
v11:
- Only shutdown fd for once
---
 include/migration/migration.h |  3 +++
 migration/colo.c              | 43 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 14b9f3d..b34def6 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -112,6 +112,7 @@ struct MigrationIncomingState {
     QemuThread colo_incoming_thread;
     /* The coroutine we should enter (back) after failover */
     Coroutine *migration_incoming_co;
+    QemuSemaphore colo_incoming_sem;
 
     /* See savevm.c */
     LoadStateEntry_Head loadvm_handlers;
@@ -175,6 +176,8 @@ struct MigrationState
     QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest) src_page_requests;
     /* The RAMBlock used in the last src_page_request */
     RAMBlock *last_req_rb;
+
+    QemuSemaphore colo_sem;
 };
 
 void migrate_set_state(int *state, int old_state, int new_state);
diff --git a/migration/colo.c b/migration/colo.c
index 814480c..5c87a8e 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -59,6 +59,18 @@ static void secondary_vm_do_failover(void)
         /* recover runstate to normal migration finish state */
         autostart = true;
     }
+    /*
+     * Make sure the colo incoming thread does not block in recv or send.
+     * If mis->from_src_file and mis->to_src_file use the same fd, the
+     * second shutdown() will return -1; we ignore this value, as it is
+     * harmless.
+     */
+    if (mis->from_src_file) {
+        qemu_file_shutdown(mis->from_src_file);
+    }
+    if (mis->to_src_file) {
+        qemu_file_shutdown(mis->to_src_file);
+    }
 
     old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
                                    FAILOVER_STATUS_COMPLETED);
@@ -67,6 +79,8 @@ static void secondary_vm_do_failover(void)
                      "secondary VM", old_state);
         return;
     }
+    /* Notify COLO incoming thread that failover work is finished */
+    qemu_sem_post(&mis->colo_incoming_sem);
     /* For Secondary VM, jump to incoming co */
     if (mis->migration_incoming_co) {
         qemu_coroutine_enter(mis->migration_incoming_co, NULL);
@@ -81,6 +95,18 @@ static void primary_vm_do_failover(void)
     migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
                       MIGRATION_STATUS_COMPLETED);
 
+    /*
+     * Make sure the colo thread does not block in recv or send.
+     * s->rp_state.from_dst_file and s->to_dst_file may use the same fd,
+     * but we still shut the fd down twice; it is harmless.
+     */
+    if (s->to_dst_file) {
+        qemu_file_shutdown(s->to_dst_file);
+    }
+    if (s->rp_state.from_dst_file) {
+        qemu_file_shutdown(s->rp_state.from_dst_file);
+    }
+
     old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
                                    FAILOVER_STATUS_COMPLETED);
     if (old_state != FAILOVER_STATUS_HANDLING) {
@@ -88,6 +114,8 @@ static void primary_vm_do_failover(void)
                      old_state);
         return;
     }
+    /* Notify COLO thread that failover work is finished */
+    qemu_sem_post(&s->colo_sem);
 }
 
 void colo_do_failover(MigrationState *s)
@@ -383,6 +411,14 @@ out:
     qsb_free(buffer);
     buffer = NULL;
 
+    /* Hopefully we will not have to wait here for too long */
+    qemu_sem_wait(&s->colo_sem);
+    qemu_sem_destroy(&s->colo_sem);
+    /*
+     * This must be called after the failover BH has completed, or the
+     * failover BH may shut down the wrong fd, which could be re-used by
+     * another thread after we release it here.
+     */
     if (s->rp_state.from_dst_file) {
         qemu_fclose(s->rp_state.from_dst_file);
     }
@@ -391,6 +427,7 @@ out:
 void migrate_start_colo_process(MigrationState *s)
 {
     qemu_mutex_unlock_iothread();
+    qemu_sem_init(&s->colo_sem, 0);
     migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
                       MIGRATION_STATUS_COLO);
     colo_process_checkpoint(s);
@@ -430,6 +467,8 @@ void *colo_process_incoming_thread(void *opaque)
     Error *local_err = NULL;
     int ret;
 
+    qemu_sem_init(&mis->colo_incoming_sem, 0);
+
     migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
                       MIGRATION_STATUS_COLO);
 
@@ -561,6 +600,10 @@ out:
     */
     colo_release_ram_cache();
 
+    /* Hopefully we will not have to wait here for too long */
+    qemu_sem_wait(&mis->colo_incoming_sem);
+    qemu_sem_destroy(&mis->colo_incoming_sem);
+    /* Must be called after failover BH is completed */
     if (mis->to_src_file) {
         qemu_fclose(mis->to_src_file);
     }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v15 23/38] COLO failover: Don't do failover during loading VM's state
  2016-02-22  2:39 [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (21 preceding siblings ...)
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 22/38] COLO failover: Shutdown related socket fd when do failover zhanghailiang
@ 2016-02-22  2:40 ` zhanghailiang
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 24/38] COLO: Process shutdown command for VM in COLO state zhanghailiang
                   ` (15 subsequent siblings)
  38 siblings, 0 replies; 52+ messages in thread
From: zhanghailiang @ 2016-02-22  2:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, zhangchen.fnst, hongyang.yang

We should not do the failover work while the main thread is loading the
VM's state, otherwise it will destroy the consistency of the VM's memory
and device state.

Here we add a new failover status 'RELAUNCH', which means we should
relaunch the failover process.
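
Condensed from the diff below, the control flow is:

    /* failover BH: */
    if (vmstate_loading) {
        failover_set_state(FAILOVER_STATUS_HANDLING,
                           FAILOVER_STATUS_RELAUNCH);
        return;    /* retried once loadvm has finished */
    }

    /* colo incoming thread, after qemu_loadvm_state(): */
    if (failover_get_state() == FAILOVER_STATUS_RELAUNCH) {
        failover_set_state(FAILOVER_STATUS_RELAUNCH, FAILOVER_STATUS_NONE);
        failover_request_active(NULL);    /* relaunch the failover */
    }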

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v14:
- Move the place of 'vmstate_loading = false;'.
v13:
- Add Reviewed-by tag
---
 include/migration/failover.h |  2 ++
 migration/colo.c             | 25 +++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/include/migration/failover.h b/include/migration/failover.h
index c4bd81e..99b0d58 100644
--- a/include/migration/failover.h
+++ b/include/migration/failover.h
@@ -20,6 +20,8 @@ typedef enum COLOFailoverStatus {
     FAILOVER_STATUS_REQUEST = 1, /* Request but not handled */
     FAILOVER_STATUS_HANDLING = 2, /* In the process of handling failover */
     FAILOVER_STATUS_COMPLETED = 3, /* Finish the failover process */
+    /* Optional, Relaunch the failover process, again 'NONE' -> 'COMPLETED' */
+    FAILOVER_STATUS_RELAUNCH = 4,
 } COLOFailoverStatus;
 
 void failover_init_state(void);
diff --git a/migration/colo.c b/migration/colo.c
index 5c87a8e..515d561 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -19,6 +19,8 @@
 #include "migration/failover.h"
 #include "qapi-event.h"
 
+static bool vmstate_loading;
+
 /* colo buffer */
 #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
 
@@ -51,6 +53,19 @@ static void secondary_vm_do_failover(void)
     int old_state;
     MigrationIncomingState *mis = migration_incoming_get_current();
 
+    /* We cannot do a failover while the VM's state is being loaded, or
+     * it will break the secondary VM.
+     */
+    if (vmstate_loading) {
+        old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
+                                       FAILOVER_STATUS_RELAUNCH);
+        if (old_state != FAILOVER_STATUS_HANDLING) {
+            error_report("Unknown error while do failover for secondary VM,"
+                         "old_state: %d", old_state);
+        }
+        return;
+    }
+
     migrate_set_state(&mis->state, MIGRATION_STATUS_COLO,
                       MIGRATION_STATUS_COMPLETED);
 
@@ -560,13 +575,22 @@ void *colo_process_incoming_thread(void *opaque)
 
         qemu_mutex_lock_iothread();
         qemu_system_reset(VMRESET_SILENT);
+        vmstate_loading = true;
         if (qemu_loadvm_state(fb) < 0) {
             error_report("COLO: loadvm failed");
             qemu_mutex_unlock_iothread();
             goto out;
         }
+
+        vmstate_loading = false;
         qemu_mutex_unlock_iothread();
 
+        if (failover_get_state() == FAILOVER_STATUS_RELAUNCH) {
+            failover_set_state(FAILOVER_STATUS_RELAUNCH, FAILOVER_STATUS_NONE);
+            failover_request_active(NULL);
+            goto out;
+        }
+
         colo_put_cmd(mis->to_src_file, COLO_MESSAGE_VMSTATE_LOADED,
                      &local_err);
         if (local_err) {
@@ -578,6 +602,7 @@ void *colo_process_incoming_thread(void *opaque)
     }
 
 out:
+    vmstate_loading = false;
     /* Throw the unreported error message after exited from loop */
     if (local_err) {
         error_report_err(local_err);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v15 24/38] COLO: Process shutdown command for VM in COLO state
  2016-02-22  2:39 [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (22 preceding siblings ...)
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 23/38] COLO failover: Don't do failover during loading VM's state zhanghailiang
@ 2016-02-22  2:40 ` zhanghailiang
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 25/38] COLO: Update the global runstate after going into colo state zhanghailiang
                   ` (14 subsequent siblings)
  38 siblings, 0 replies; 52+ messages in thread
From: zhanghailiang @ 2016-02-22  2:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, Paolo Bonzini, amit.shah, zhangchen.fnst,
	hongyang.yang

If the VM is in COLO FT state, we should do some extra work before the normal
shutdown process: the SVM will ignore the shutdown command if this command is
issued directly to it, while the PVM will forward the shutdown command to the
SVM when it gets this command.
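
Condensed message flow (a sketch of the colo.c hunks below; error handling
omitted):

    /* PVM colo thread, at checkpoint time, if shutdown was requested: */
    colo_put_cmd(s->to_dst_file, COLO_MESSAGE_GUEST_SHUTDOWN, &local_err);
    qemu_system_shutdown_request_core();    /* then shut down locally */

    /* SVM colo incoming thread, on receiving GUEST_SHUTDOWN: */
    vm_stop_force_state(RUN_STATE_COLO);
    qemu_system_shutdown_request_core();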

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v15:
- Go on the shutdown process even some error happened
  while sent 'SHUTDOWN' message to SVM.
- Add Reviewed-by tag
v14:
- Remove 'colo_shutdown' variable, use colo_shutdown_request directly
v13:
- Move COLO shutdown related codes to colo.c file (Dave's suggestion)
---
 include/migration/colo.h |  2 ++
 include/sysemu/sysemu.h  |  3 +++
 migration/colo.c         | 44 ++++++++++++++++++++++++++++++++++++++++++--
 qapi-schema.json         |  4 +++-
 stubs/migration-colo.c   |  5 +++++
 vl.c                     | 19 ++++++++++++++++---
 6 files changed, 71 insertions(+), 6 deletions(-)

diff --git a/include/migration/colo.h b/include/migration/colo.h
index e32eef4..919b135 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -35,4 +35,6 @@ COLOMode get_colo_mode(void);
 
 /* failover */
 void colo_do_failover(MigrationState *s);
+
+bool colo_shutdown(void);
 #endif
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 3bb8897..91eeda3 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -52,6 +52,8 @@ typedef enum WakeupReason {
     QEMU_WAKEUP_REASON_OTHER,
 } WakeupReason;
 
+extern int colo_shutdown_requested;
+
 void qemu_system_reset_request(void);
 void qemu_system_suspend_request(void);
 void qemu_register_suspend_notifier(Notifier *notifier);
@@ -59,6 +61,7 @@ void qemu_system_wakeup_request(WakeupReason reason);
 void qemu_system_wakeup_enable(WakeupReason reason, bool enabled);
 void qemu_register_wakeup_notifier(Notifier *notifier);
 void qemu_system_shutdown_request(void);
+void qemu_system_shutdown_request_core(void);
 void qemu_system_powerdown_request(void);
 void qemu_register_powerdown_notifier(Notifier *notifier);
 void qemu_system_debug_request(void);
diff --git a/migration/colo.c b/migration/colo.c
index 515d561..855edee 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -330,6 +330,20 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
         goto out;
     }
 
+    if (colo_shutdown_requested) {
+        colo_put_cmd(s->to_dst_file, COLO_MESSAGE_GUEST_SHUTDOWN, &local_err);
+        if (local_err) {
+            error_free(local_err);
+            /* Go on with the shutdown process and just report the error */
+            error_report("Failed to send shutdown message to SVM");
+        }
+        qemu_fflush(s->to_dst_file);
+        colo_shutdown_requested = 0;
+        qemu_system_shutdown_request_core();
+        /* FIXME: Just let the colo thread exit? */
+        qemu_thread_exit(0);
+    }
+
     ret = 0;
     /* Resume primary guest */
     qemu_mutex_lock_iothread();
@@ -390,8 +404,9 @@ static void colo_process_checkpoint(MigrationState *s)
         }
 
         current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
-        if (current_time - checkpoint_time <
-            s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY]) {
+        if ((current_time - checkpoint_time <
+            s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY]) &&
+            !colo_shutdown_requested) {
             int64_t delay_ms;
 
             delay_ms = s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY] -
@@ -465,6 +480,15 @@ static void colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request,
     case COLO_MESSAGE_CHECKPOINT_REQUEST:
         *checkpoint_request = 1;
         break;
+    case COLO_MESSAGE_GUEST_SHUTDOWN:
+        qemu_mutex_lock_iothread();
+        vm_stop_force_state(RUN_STATE_COLO);
+        qemu_system_shutdown_request_core();
+        qemu_mutex_unlock_iothread();
+        /* The main thread will exit and terminate the whole
+         * process; do we need some cleanup here?
+         */
+        qemu_thread_exit(0);
     default:
         *checkpoint_request = 0;
         error_setg(errp, "Got unknown COLO command: %d", cmd);
@@ -636,3 +660,19 @@ out:
 
     return NULL;
 }
+
+bool colo_shutdown(void)
+{
+    /*
+     * If we are in COLO mode, we need to do some significant work before
+     * responding to the shutdown request.
+     */
+    if (migration_incoming_in_colo_state()) {
+        return true; /* primary's responsibility */
+    }
+    if (migration_in_colo_state()) {
+        colo_shutdown_requested = 1;
+        return true;
+    }
+    return false;
+}
diff --git a/qapi-schema.json b/qapi-schema.json
index 7fec696..4d8ba04 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -752,12 +752,14 @@
 #
 # @vmstate-loaded: VM's state has been loaded by SVM.
 #
+# @guest-shutdown: shutdown request sent from the PVM to the SVM
+#
 # Since: 2.6
 ##
 { 'enum': 'COLOMessage',
   'data': [ 'checkpoint-ready', 'checkpoint-request', 'checkpoint-reply',
             'vmstate-send', 'vmstate-size', 'vmstate-received',
-            'vmstate-loaded' ] }
+            'vmstate-loaded', 'guest-shutdown' ] }
 
 ##
 # @COLOMode
diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
index a6cd6e5..1996cd9 100644
--- a/stubs/migration-colo.c
+++ b/stubs/migration-colo.c
@@ -43,3 +43,8 @@ void qmp_x_colo_lost_heartbeat(Error **errp)
                      " with --enable-colo option in order to support"
                      " COLO feature");
 }
+
+bool colo_shutdown(void)
+{
+    return false;
+}
diff --git a/vl.c b/vl.c
index 1cde195..20a7889 100644
--- a/vl.c
+++ b/vl.c
@@ -1644,6 +1644,8 @@ static NotifierList wakeup_notifiers =
     NOTIFIER_LIST_INITIALIZER(wakeup_notifiers);
 static uint32_t wakeup_reason_mask = ~(1 << QEMU_WAKEUP_REASON_NONE);
 
+int colo_shutdown_requested;
+
 int qemu_shutdown_requested_get(void)
 {
     return shutdown_requested;
@@ -1775,7 +1777,10 @@ void qemu_system_guest_panicked(void)
 void qemu_system_reset_request(void)
 {
     if (no_reboot) {
-        shutdown_requested = 1;
+        qemu_system_shutdown_request();
+        if (!shutdown_requested) { /* colo handled it? */
+            return;
+        }
     } else {
         reset_requested = 1;
     }
@@ -1848,14 +1853,22 @@ void qemu_system_killed(int signal, pid_t pid)
     qemu_notify_event();
 }
 
-void qemu_system_shutdown_request(void)
+void qemu_system_shutdown_request_core(void)
 {
-    trace_qemu_system_shutdown_request();
     replay_shutdown_request();
     shutdown_requested = 1;
     qemu_notify_event();
 }
 
+void qemu_system_shutdown_request(void)
+{
+    trace_qemu_system_shutdown_request();
+    if (colo_shutdown()) {
+        return;
+    }
+    qemu_system_shutdown_request_core();
+}
+
 static void qemu_system_powerdown(void)
 {
     qapi_event_send_powerdown(&error_abort);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v15 25/38] COLO: Update the global runstate after going into colo state
  2016-02-22  2:39 [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (23 preceding siblings ...)
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 24/38] COLO: Process shutdown command for VM in COLO state zhanghailiang
@ 2016-02-22  2:40 ` zhanghailiang
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 26/38] savevm: Introduce two helper functions for save/find loadvm_handlers entry zhanghailiang
                   ` (13 subsequent siblings)
  38 siblings, 0 replies; 52+ messages in thread
From: zhanghailiang @ 2016-02-22  2:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, zhangchen.fnst, hongyang.yang

If we start qemu with -S, the runstate will change from 'prelaunch' to
'running' after going into COLO state, so it is necessary to update the
global runstate after going into COLO state.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v13:
- Add Reviewed-by tag
---
 migration/colo.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index 855edee..16bada6 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -397,6 +397,11 @@ static void colo_process_checkpoint(MigrationState *s)
     qemu_mutex_unlock_iothread();
     trace_colo_vm_state_change("stop", "run");
 
+    ret = global_state_store();
+    if (ret < 0) {
+        goto out;
+    }
+
     while (s->state == MIGRATION_STATUS_COLO) {
         if (failover_request_is_active()) {
             error_report("failover request");
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v15 26/38] savevm: Introduce two helper functions for save/find loadvm_handlers entry
  2016-02-22  2:39 [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (24 preceding siblings ...)
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 25/38] COLO: Update the global runstate after going into colo state zhanghailiang
@ 2016-02-22  2:40 ` zhanghailiang
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 27/38] migration/savevm: Add new helpers to process the different stages of loadvm zhanghailiang
                   ` (12 subsequent siblings)
  38 siblings, 0 replies; 52+ messages in thread
From: zhanghailiang @ 2016-02-22  2:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, zhangchen.fnst, hongyang.yang

For COLO's checkpoint process, we do savevm/loadvm repeatedly, so every
time we call qemu_loadvm_section_start_full(), we add all the sections'
information to the loadvm_handlers list once more. This leaves many
entries in loadvm_handlers for a single section, which leads to a memory
leak.

We need to check whether the section info is already in the
loadvm_handlers list before saving it. For normal migration, this is
harmless.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v14:
- Add Reviewed-by tag
-
v13:
- New patch
---
 migration/savevm.c | 56 ++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 40 insertions(+), 16 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 94f2894..9e3c18a 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1718,6 +1718,37 @@ void loadvm_free_handlers(MigrationIncomingState *mis)
     }
 }
 
+static LoadStateEntry *loadvm_save_section_entry(MigrationIncomingState *mis,
+                                                 SaveStateEntry *se,
+                                                 uint32_t section_id,
+                                                 uint32_t version_id)
+{
+    LoadStateEntry *le;
+
+    /* Add entry */
+    le = g_malloc0(sizeof(*le));
+
+    le->se = se;
+    le->section_id = section_id;
+    le->version_id = version_id;
+    QLIST_INSERT_HEAD(&mis->loadvm_handlers, le, entry);
+    return le;
+}
+
+static LoadStateEntry *loadvm_find_section_entry(MigrationIncomingState *mis,
+                                                 uint32_t section_id)
+{
+    LoadStateEntry *le;
+
+    QLIST_FOREACH(le, &mis->loadvm_handlers, entry) {
+        if (le->section_id == section_id) {
+            break;
+        }
+    }
+
+    return le;
+}
+
 static int
 qemu_loadvm_section_start_full(QEMUFile *f, MigrationIncomingState *mis)
 {
@@ -1753,16 +1784,12 @@ qemu_loadvm_section_start_full(QEMUFile *f, MigrationIncomingState *mis)
                      version_id, idstr, se->version_id);
         return -EINVAL;
     }
-
-    /* Add entry */
-    le = g_malloc0(sizeof(*le));
-
-    le->se = se;
-    le->section_id = section_id;
-    le->version_id = version_id;
-    QLIST_INSERT_HEAD(&mis->loadvm_handlers, le, entry);
-
-    ret = vmstate_load(f, le->se, le->version_id);
+    /* Check if we have saved this section info before; if not, save it */
+    le = loadvm_find_section_entry(mis, section_id);
+    if (!le) {
+        le = loadvm_save_section_entry(mis, se, section_id, version_id);
+    }
+    ret = vmstate_load(f, se, version_id);
     if (ret < 0) {
         error_report("error while loading state for instance 0x%x of"
                      " device '%s'", instance_id, idstr);
@@ -1785,12 +1812,9 @@ qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis)
     section_id = qemu_get_be32(f);
 
     trace_qemu_loadvm_state_section_partend(section_id);
-    QLIST_FOREACH(le, &mis->loadvm_handlers, entry) {
-        if (le->section_id == section_id) {
-            break;
-        }
-    }
-    if (le == NULL) {
+
+    le = loadvm_find_section_entry(mis, section_id);
+    if (!le) {
         error_report("Unknown savevm section %d", section_id);
         return -EINVAL;
     }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v15 27/38] migration/savevm: Add new helpers to process the different stages of loadvm
  2016-02-22  2:39 [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (25 preceding siblings ...)
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 26/38] savevm: Introduce two helper functions for save/find loadvm_handlers entry zhanghailiang
@ 2016-02-22  2:40 ` zhanghailiang
  2016-02-26 12:52   ` Dr. David Alan Gilbert
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 28/38] migration/savevm: Export two helper functions for savevm process zhanghailiang
                   ` (11 subsequent siblings)
  38 siblings, 1 reply; 52+ messages in thread
From: zhanghailiang @ 2016-02-22  2:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, zhangchen.fnst, hongyang.yang

There are several stages during the loadvm process, and in each stage the
migration incoming side processes a different kind of section.
We want to control these stages more precisely, to optimize the COLO
capability.

Here we add two new helper functions: qemu_loadvm_state_begin()
and qemu_load_device_state().
Besides, we make the qemu_loadvm_state_main() API public.
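
A possible call pattern on the COLO secondary (a sketch only; the real call
sites land in later patches of this series, and 'fb' stands for an assumed
in-memory QEMUFile holding the received device state):

    /* once, right after entering COLO mode: consume SECTION_START parts */
    ret = qemu_loadvm_state_begin(mis->from_src_file);

    /* per checkpoint: reload only the device state */
    ret = qemu_load_device_state(fb);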

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
v14:
- Split from patch 'COLO: Separate the process of saving/loading
  ram and device state
---
 include/sysemu/sysemu.h |  3 +++
 migration/savevm.c      | 38 +++++++++++++++++++++++++++++++++++---
 2 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 91eeda3..c0694a1 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -134,6 +134,9 @@ void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
                                            uint64_t *length_list);
 
 int qemu_loadvm_state(QEMUFile *f);
+int qemu_loadvm_state_begin(QEMUFile *f);
+int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
+int qemu_load_device_state(QEMUFile *f);
 
 typedef enum DisplayType
 {
diff --git a/migration/savevm.c b/migration/savevm.c
index 9e3c18a..954e0a7 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1249,8 +1249,6 @@ enum LoadVMExitCodes {
     LOADVM_QUIT     =  1,
 };
 
-static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
-
 /* ------ incoming postcopy messages ------ */
 /* 'advise' arrives before any transfers just to tell us that a postcopy
  * *might* happen - it might be skipped if precopy transferred everything
@@ -1832,7 +1830,7 @@ qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis)
     return 0;
 }
 
-static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
+int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
 {
     uint8_t section_type;
     int ret;
@@ -1965,6 +1963,40 @@ int qemu_loadvm_state(QEMUFile *f)
     return ret;
 }
 
+int qemu_loadvm_state_begin(QEMUFile *f)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    Error *local_err = NULL;
+    int ret;
+
+    if (qemu_savevm_state_blocked(&local_err)) {
+        error_report_err(local_err);
+        return -EINVAL;
+    }
+    /* Load QEMU_VM_SECTION_START section */
+    ret = qemu_loadvm_state_main(f, mis);
+    if (ret < 0) {
+        error_report("Failed to loadvm begin work: %d", ret);
+    }
+    return ret;
+}
+
+int qemu_load_device_state(QEMUFile *f)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    int ret;
+
+    /* Load QEMU_VM_SECTION_FULL section */
+    ret = qemu_loadvm_state_main(f, mis);
+    if (ret < 0) {
+        error_report("Failed to load device state: %d", ret);
+        return ret;
+    }
+
+    cpu_synchronize_all_post_init();
+    return 0;
+}
+
 void hmp_savevm(Monitor *mon, const QDict *qdict)
 {
     BlockDriverState *bs, *bs1;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v15 28/38] migration/savevm: Export two helper functions for savevm process
  2016-02-22  2:39 [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (26 preceding siblings ...)
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 27/38] migration/savevm: Add new helpers to process the different stages of loadvm zhanghailiang
@ 2016-02-22  2:40 ` zhanghailiang
  2016-02-26 13:00   ` Dr. David Alan Gilbert
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 29/38] COLO: Separate the process of saving/loading ram and device state zhanghailiang
                   ` (10 subsequent siblings)
  38 siblings, 1 reply; 52+ messages in thread
From: zhanghailiang @ 2016-02-22  2:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, zhangchen.fnst, hongyang.yang

We add a new helper function, qemu_savevm_live_state(),
and make qemu_save_device_state() public.
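
A sketch of how the pair is meant to be used at checkpoint time on the
primary (the caller is added in the next patch; 'fb' is assumed to be a
QEMUFile backed by an in-memory buffer):

    /* live state (RAM): emits a SECTION_END section plus QEMU_VM_EOF */
    qemu_savevm_live_state(s->to_dst_file);

    /* device state: full device sections, staged into the buffer */
    ret = qemu_save_device_state(fb);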

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
v14:
- New patch split from previous
 'COLO: Separate the process of saving/loading ram and device state
---
 include/sysemu/sysemu.h |  3 +++
 migration/savevm.c      | 15 +++++++++++----
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index c0694a1..7b1748c 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -133,6 +133,9 @@ void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
                                            uint64_t *start_list,
                                            uint64_t *length_list);
 
+void qemu_savevm_live_state(QEMUFile *f);
+int qemu_save_device_state(QEMUFile *f);
+
 int qemu_loadvm_state(QEMUFile *f);
 int qemu_loadvm_state_begin(QEMUFile *f);
 int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
diff --git a/migration/savevm.c b/migration/savevm.c
index 954e0a7..60c7b57 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1192,13 +1192,20 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
     return ret;
 }
 
-static int qemu_save_device_state(QEMUFile *f)
+void qemu_savevm_live_state(QEMUFile *f)
 {
-    SaveStateEntry *se;
+    /* save QEMU_VM_SECTION_END section */
+    qemu_savevm_state_complete_precopy(f, true);
+    qemu_put_byte(f, QEMU_VM_EOF);
+}
 
-    qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
-    qemu_put_be32(f, QEMU_VM_FILE_VERSION);
+int qemu_save_device_state(QEMUFile *f)
+{
+    SaveStateEntry *se;
 
+    if (!migration_in_colo_state()) {
+        qemu_savevm_state_header(f);
+    }
     cpu_synchronize_all_states();
 
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v15 29/38] COLO: Separate the process of saving/loading ram and device state
  2016-02-22  2:39 [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (27 preceding siblings ...)
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 28/38] migration/savevm: Export two helper functions for savevm process zhanghailiang
@ 2016-02-22  2:40 ` zhanghailiang
  2016-02-26 13:16   ` Dr. David Alan Gilbert
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 30/38] COLO: Split qemu_savevm_state_begin out of checkpoint process zhanghailiang
                   ` (9 subsequent siblings)
  38 siblings, 1 reply; 52+ messages in thread
From: zhanghailiang @ 2016-02-22  2:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, zhangchen.fnst, hongyang.yang

We separate the process of saving/loading ram and device state when doing
a checkpoint, and add new helpers to save/load the ram and device state.
With this change, we can transfer ram directly from master to slave without
using a QEMUSizedBuffer as an intermediary, which also reduces the amount
of extra memory used during a checkpoint.

Besides, we move colo_flush_ram_cache() to the proper position after the
above change.
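
After this change, the secondary side of a checkpoint is roughly (a
sketch of the code in the diff below; fb is the buffer file holding
the received device state):

    /* live state (ram) now comes directly from the source */
    ret = qemu_loadvm_state_main(mis->from_src_file, mis);
    ...
    qemu_system_reset(VMRESET_SILENT);
    colo_flush_ram_cache();            /* apply the cached ram in one go */
    ret = qemu_load_device_state(fb);  /* device state from the buffer */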

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
v14:
- split two new patches from this patch
- Some minor fixes from Dave
v13:
- Re-use some existed helper functions to realize saving/loading
  ram and device.
v11:
- Remove load configuration section in qemu_loadvm_state_begin()
---
 migration/colo.c   | 48 ++++++++++++++++++++++++++++++++++++++----------
 migration/ram.c    |  5 -----
 migration/savevm.c |  5 +++++
 3 files changed, 43 insertions(+), 15 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 16bada6..300fa54 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -288,21 +288,37 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
         goto out;
     }
 
+    colo_put_cmd(s->to_dst_file, COLO_MESSAGE_VMSTATE_SEND, &local_err);
+    if (local_err) {
+        goto out;
+    }
+
     /* Disable block migration */
     s->params.blk = 0;
     s->params.shared = 0;
-    qemu_savevm_state_header(trans);
-    qemu_savevm_state_begin(trans, &s->params);
+    qemu_savevm_state_begin(s->to_dst_file, &s->params);
+    ret = qemu_file_get_error(s->to_dst_file);
+    if (ret < 0) {
+        error_report("Save vm state begin error");
+        goto out;
+    }
+
     qemu_mutex_lock_iothread();
-    qemu_savevm_state_complete_precopy(trans, false);
+    /*
+    * Only save the VM's live state, which does not include device state.
+    * TODO: We may need a timeout mechanism to prevent the COLO process
+    * from being blocked here.
+    */
+    qemu_savevm_live_state(s->to_dst_file);
+    /* Note: device state is saved into buffer */
+    ret = qemu_save_device_state(trans);
     qemu_mutex_unlock_iothread();
-
-    qemu_fflush(trans);
-
-    colo_put_cmd(s->to_dst_file, COLO_MESSAGE_VMSTATE_SEND, &local_err);
-    if (local_err) {
+    if (ret < 0) {
+        error_report("Save device state error");
         goto out;
     }
+    qemu_fflush(trans);
+
     /* we send the total size of the vmstate first */
     size = qsb_get_length(buffer);
     colo_put_cmd_value(s->to_dst_file, COLO_MESSAGE_VMSTATE_SIZE,
@@ -573,6 +589,16 @@ void *colo_process_incoming_thread(void *opaque)
             goto out;
         }
 
+        ret = qemu_loadvm_state_begin(mis->from_src_file);
+        if (ret < 0) {
+            error_report("Load vm state begin error, ret=%d", ret);
+            goto out;
+        }
+        ret = qemu_loadvm_state_main(mis->from_src_file, mis);
+        if (ret < 0) {
+            error_report("Load VM's live state (ram) error");
+            goto out;
+        }
         /* read the VM state total size first */
         value = colo_get_cmd_value(mis->from_src_file,
                                  COLO_MESSAGE_VMSTATE_SIZE, &local_err);
@@ -605,8 +631,10 @@ void *colo_process_incoming_thread(void *opaque)
         qemu_mutex_lock_iothread();
         qemu_system_reset(VMRESET_SILENT);
         vmstate_loading = true;
-        if (qemu_loadvm_state(fb) < 0) {
-            error_report("COLO: loadvm failed");
+        colo_flush_ram_cache();
+        ret = qemu_load_device_state(fb);
+        if (ret < 0) {
+            error_report("COLO: load device state failed");
             qemu_mutex_unlock_iothread();
             goto out;
         }
diff --git a/migration/ram.c b/migration/ram.c
index 891f3b2..8f416d5 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2465,7 +2465,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
      * be atomic
      */
     bool postcopy_running = postcopy_state_get() >= POSTCOPY_INCOMING_LISTENING;
-    bool need_flush = false;
 
     seq_iter++;
 
@@ -2500,7 +2499,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
             /* After going into COLO, we should load the Page into colo_cache */
             if (ram_cache_enable) {
                 host = colo_cache_from_block_offset(block, addr);
-                need_flush = true;
             } else {
                 host = host_from_ram_block_offset(block, addr);
             }
@@ -2594,9 +2592,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 
     rcu_read_unlock();
 
-    if (!ret  && ram_cache_enable && need_flush) {
-        colo_flush_ram_cache();
-    }
     DPRINTF("Completed load of VM with exit code %d seq iteration "
             "%" PRIu64 "\n", ret, seq_iter);
     return ret;
diff --git a/migration/savevm.c b/migration/savevm.c
index 60c7b57..1551fbb 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -50,6 +50,7 @@
 #include "qemu/iov.h"
 #include "block/snapshot.h"
 #include "block/qapi.h"
+#include "migration/colo.h"
 
 
 #ifndef ETH_P_RARP
@@ -923,6 +924,10 @@ void qemu_savevm_state_begin(QEMUFile *f,
             break;
         }
     }
+    if (migration_in_colo_state()) {
+        qemu_put_byte(f, QEMU_VM_EOF);
+        qemu_fflush(f);
+    }
 }
 
 /*
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v15 30/38] COLO: Split qemu_savevm_state_begin out of checkpoint process
  2016-02-22  2:39 [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (28 preceding siblings ...)
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 29/38] COLO: Separate the process of saving/loading ram and device state zhanghailiang
@ 2016-02-22  2:40 ` zhanghailiang
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 31/38] net/filter: Add a 'status' property for filter object zhanghailiang
                   ` (8 subsequent siblings)
  38 siblings, 0 replies; 52+ messages in thread
From: zhanghailiang @ 2016-02-22  2:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, zhangchen.fnst, hongyang.yang

It is unnecessary to call qemu_savevm_state_begin() in every checkpoint
process. It mainly sets up devices and does the first pass of device state.
This data will not change during the later checkpoint process, so we split
it out of colo_do_checkpoint_transaction(); in this way, we reduce the data
transferred in later checkpoints.
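
The resulting call order is roughly (sketch):

    colo_process_checkpoint()
        colo_prepare_before_save()           /* qemu_savevm_state_begin(), once */
        while (in COLO state)
            colo_do_checkpoint_transaction() /* live + device state only */

and symmetrically colo_prepare_before_load() on the incoming side.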

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v13:
- Fix some minor issues found by Dave
- Add Reviewed-by tag
---
 migration/colo.c | 51 ++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 36 insertions(+), 15 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 300fa54..0140203 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -293,16 +293,6 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
         goto out;
     }
 
-    /* Disable block migration */
-    s->params.blk = 0;
-    s->params.shared = 0;
-    qemu_savevm_state_begin(s->to_dst_file, &s->params);
-    ret = qemu_file_get_error(s->to_dst_file);
-    if (ret < 0) {
-        error_report("Save vm state begin error");
-        goto out;
-    }
-
     qemu_mutex_lock_iothread();
     /*
     * Only save the VM's live state, which does not include device state.
@@ -377,6 +367,21 @@ out:
     return ret;
 }
 
+static int colo_prepare_before_save(MigrationState *s)
+{
+    int ret;
+
+    /* Disable block migration */
+    s->params.blk = 0;
+    s->params.shared = 0;
+    qemu_savevm_state_begin(s->to_dst_file, &s->params);
+    ret = qemu_file_get_error(s->to_dst_file);
+    if (ret < 0) {
+        error_report("Save vm state begin error");
+    }
+    return ret;
+}
+
 static void colo_process_checkpoint(MigrationState *s)
 {
     QEMUSizedBuffer *buffer = NULL;
@@ -392,6 +397,11 @@ static void colo_process_checkpoint(MigrationState *s)
         goto out;
     }
 
+    ret = colo_prepare_before_save(s);
+    if (ret < 0) {
+        goto out;
+    }
+
     /*
      * Wait for Secondary finish loading vm states and enter COLO
      * restore.
@@ -517,6 +527,17 @@ static void colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request,
     }
 }
 
+static int colo_prepare_before_load(QEMUFile *f)
+{
+    int ret;
+
+    ret = qemu_loadvm_state_begin(f);
+    if (ret < 0) {
+        error_report("load vm state begin error, ret=%d", ret);
+    }
+    return ret;
+}
+
 void *colo_process_incoming_thread(void *opaque)
 {
     MigrationIncomingState *mis = opaque;
@@ -557,6 +578,11 @@ void *colo_process_incoming_thread(void *opaque)
         goto out;
     }
 
+    ret = colo_prepare_before_load(mis->from_src_file);
+    if (ret < 0) {
+        goto out;
+    }
+
     colo_put_cmd(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_READY,
                  &local_err);
     if (local_err) {
@@ -589,11 +615,6 @@ void *colo_process_incoming_thread(void *opaque)
             goto out;
         }
 
-        ret = qemu_loadvm_state_begin(mis->from_src_file);
-        if (ret < 0) {
-            error_report("Load vm state begin error, ret=%d", ret);
-            goto out;
-        }
         ret = qemu_loadvm_state_main(mis->from_src_file, mis);
         if (ret < 0) {
             error_report("Load VM's live state (ram) error");
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v15 31/38] net/filter: Add a 'status' property for filter object
  2016-02-22  2:39 [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (29 preceding siblings ...)
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 30/38] COLO: Split qemu_savevm_state_begin out of checkpoint process zhanghailiang
@ 2016-02-22  2:40 ` zhanghailiang
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 32/38] filter-buffer: Accept zero interval zhanghailiang
                   ` (7 subsequent siblings)
  38 siblings, 0 replies; 52+ messages in thread
From: zhanghailiang @ 2016-02-22  2:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, armbru, Jason Wang,
	yunhong.jiang, eddie.dong, peter.huangpeng, dgilbert,
	zhanghailiang, arei.gonglei, stefanha, amit.shah, zhangchen.fnst,
	hongyang.yang

With this property, users can control whether this filter is enabled
('enable') or disabled ('disable'). Filters are enabled by default.

We will skip disabled filters when delivering packets in the net layer.
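
For example (hypothetical ids), a buffer filter can be created in the
disabled state from the command line:

    -object filter-buffer,id=f0,netdev=hn0,queue=rx,interval=1000,status=disable

and toggled later through the QOM property, e.g. from C:

    object_property_set_str(OBJECT(nf), "enable", "status", &err);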

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Yang Hongyang <hongyang.yang@easystack.cn>
---
v15:
- Rename qemu_need_skip_netfilter to qemu_netfilter_can_skip (Jason)
- Remove some useless comment (Jason)
---
 include/net/filter.h |  1 +
 net/filter.c         | 40 ++++++++++++++++++++++++++++++++++++++++
 qemu-options.hx      |  4 +++-
 3 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/include/net/filter.h b/include/net/filter.h
index 5639976..af3c53c 100644
--- a/include/net/filter.h
+++ b/include/net/filter.h
@@ -55,6 +55,7 @@ struct NetFilterState {
     char *netdev_id;
     NetClientState *netdev;
     NetFilterDirection direction;
+    bool enabled;
     QTAILQ_ENTRY(NetFilterState) next;
 };
 
diff --git a/net/filter.c b/net/filter.c
index d2a514e..f114dfb 100644
--- a/net/filter.c
+++ b/net/filter.c
@@ -17,6 +17,11 @@
 #include "qom/object_interfaces.h"
 #include "qemu/iov.h"
 
+static inline bool qemu_can_skip_netfilter(NetFilterState *nf)
+{
+    return nf->enabled ? false : true;
+}
+
 ssize_t qemu_netfilter_receive(NetFilterState *nf,
                                NetFilterDirection direction,
                                NetClientState *sender,
@@ -25,6 +30,9 @@ ssize_t qemu_netfilter_receive(NetFilterState *nf,
                                int iovcnt,
                                NetPacketSent *sent_cb)
 {
+    if (qemu_can_skip_netfilter(nf)) {
+        return 0;
+    }
     if (nf->direction == direction ||
         nf->direction == NET_FILTER_DIRECTION_ALL) {
         return NETFILTER_GET_CLASS(OBJECT(nf))->receive_iov(
@@ -134,8 +142,37 @@ static void netfilter_set_direction(Object *obj, int direction, Error **errp)
     nf->direction = direction;
 }
 
+static char *netfilter_get_status(Object *obj, Error **errp)
+{
+    NetFilterState *nf = NETFILTER(obj);
+
+    if (nf->enabled) {
+        return g_strdup("enable");
+    } else {
+        return g_strdup("disable");
+    }
+}
+
+static void netfilter_set_status(Object *obj, const char *str, Error **errp)
+{
+    NetFilterState *nf = NETFILTER(obj);
+
+    if (!strcmp(str, "enable")) {
+        nf->enabled = true;
+    } else if (!strcmp(str, "disable")) {
+        nf->enabled = false;
+    } else {
+        error_setg(errp, "Invalid value for netfilter status, "
+                         "should be 'enable' or 'disable'");
+    }
+}
+
 static void netfilter_init(Object *obj)
 {
+    NetFilterState *nf = NETFILTER(obj);
+
+    nf->enabled = true;
+
     object_property_add_str(obj, "netdev",
                             netfilter_get_netdev_id, netfilter_set_netdev_id,
                             NULL);
@@ -143,6 +180,9 @@ static void netfilter_init(Object *obj)
                              NetFilterDirection_lookup,
                              netfilter_get_direction, netfilter_set_direction,
                              NULL);
+    object_property_add_str(obj, "status",
+                            netfilter_get_status, netfilter_set_status,
+                            NULL);
 }
 
 static void netfilter_complete(UserCreatable *uc, Error **errp)
diff --git a/qemu-options.hx b/qemu-options.hx
index 2f0465e..6f302e6 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3742,11 +3742,13 @@ version by providing the @var{passwordid} parameter. This provides
 the ID of a previously created @code{secret} object containing the
 password for decryption.
 
-@item -object filter-buffer,id=@var{id},netdev=@var{netdevid},interval=@var{t}[,queue=@var{all|rx|tx}]
+@item -object filter-buffer,id=@var{id},netdev=@var{netdevid},interval=@var{t}[,queue=@var{all|rx|tx}][,status=@var{enable|disable}]
 
 Interval @var{t} can't be 0, this filter batches the packet delivery: all
 packets arriving in a given interval on netdev @var{netdevid} are delayed
 until the end of the interval. Interval is in microseconds.
+@option{status} is optional and indicates whether the netfilter is enabled
+or disabled; the default status for a netfilter is enabled.
 
 queue @var{all|rx|tx} is an option that can be applied to any netfilter.
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v15 32/38] filter-buffer: Accept zero interval
  2016-02-22  2:39 [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (30 preceding siblings ...)
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 31/38] net/filter: Add a 'status' property for filter object zhanghailiang
@ 2016-02-22  2:40 ` zhanghailiang
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 33/38] net: Add notifier/callback for netdev init zhanghailiang
                   ` (6 subsequent siblings)
  38 siblings, 0 replies; 52+ messages in thread
From: zhanghailiang @ 2016-02-22  2:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, armbru, Jason Wang,
	yunhong.jiang, eddie.dong, peter.huangpeng, dgilbert,
	zhanghailiang, arei.gonglei, stefanha, amit.shah, zhangchen.fnst,
	hongyang.yang

We may want to accept a zero interval when VM FT solutions like MC
or COLO use this filter to release packets on demand.
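
With a zero interval the release timer is simply never armed (the
existing setup code only calls timer_init_us() when s->interval != 0),
so packets stay queued until they are flushed explicitly, e.g.
(hypothetical ids):

    -object filter-buffer,id=f0,netdev=hn0,queue=rx,interval=0,status=disable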

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Reviewed-by: Yang Hongyang <hongyang.yang@easystack.cn>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Yang Hongyang <hongyang.yang@easystack.cn>
---
 net/filter-buffer.c | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/net/filter-buffer.c b/net/filter-buffer.c
index 12ad2e3..f0a9151 100644
--- a/net/filter-buffer.c
+++ b/net/filter-buffer.c
@@ -104,16 +104,6 @@ static void filter_buffer_setup(NetFilterState *nf, Error **errp)
 {
     FilterBufferState *s = FILTER_BUFFER(nf);
 
-    /*
-     * We may want to accept zero interval when VM FT solutions like MC
-     * or COLO use this filter to release packets on demand.
-     */
-    if (!s->interval) {
-        error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "interval",
-                   "a non-zero interval");
-        return;
-    }
-
     s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
     if (s->interval) {
         timer_init_us(&s->release_timer, QEMU_CLOCK_VIRTUAL,
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v15 33/38] net: Add notifier/callback for netdev init
  2016-02-22  2:39 [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (31 preceding siblings ...)
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 32/38] filter-buffer: Accept zero interval zhanghailiang
@ 2016-02-22  2:40 ` zhanghailiang
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 34/38] COLO/filter: add each netdev a buffer filter zhanghailiang
                   ` (5 subsequent siblings)
  38 siblings, 0 replies; 52+ messages in thread
From: zhanghailiang @ 2016-02-22  2:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, armbru, Jason Wang,
	yunhong.jiang, eddie.dong, peter.huangpeng, dgilbert,
	zhanghailiang, arei.gonglei, stefanha, amit.shah, zhangchen.fnst,
	hongyang.yang

We can register callbacks on this notifier; COLO will use it to
register a callback which adds a buffer filter to each netdev.
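
Usage is then just (sketch, with a hypothetical callback):

    static void my_netdev_added(const char *netdev_id, void *opaque)
    {
        /* e.g. attach a filter to the new netdev */
    }

    netdev_init_add_handler(my_netdev_added, NULL);

The callback runs once for every netdev, right after
net_client_init1() succeeds.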

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Yang Hongyang <hongyang.yang@easystack.cn>
---
v14:
- New patch
---
 include/net/net.h |  4 ++++
 net/net.c         | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+)

diff --git a/include/net/net.h b/include/net/net.h
index 73e4c46..f6f0194 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -176,6 +176,10 @@ struct NICInfo {
     int nvectors;
 };
 
+typedef struct netdev_init_entry NetdevInitEntry;
+typedef void NetdevInitHandler(const char *netdev_id, void *opaque);
+NetdevInitEntry *netdev_init_add_handler(NetdevInitHandler *cb, void *opaque);
+
 extern int nb_nics;
 extern NICInfo nd_table[MAX_NICS];
 extern int default_net;
diff --git a/net/net.c b/net/net.c
index aebf753..bdd3e7b 100644
--- a/net/net.c
+++ b/net/net.c
@@ -55,6 +55,14 @@
 static VMChangeStateEntry *net_change_state_entry;
 static QTAILQ_HEAD(, NetClientState) net_clients;
 
+struct netdev_init_entry {
+    NetdevInitHandler *cb;
+    void *opaque;
+    QLIST_ENTRY(netdev_init_entry) entries;
+};
+
+static QLIST_HEAD(netdev_init_head, netdev_init_entry) netdev_init_head;
+
 const char *host_net_devices[] = {
     "tap",
     "socket",
@@ -953,6 +961,26 @@ static int net_init_nic(const NetClientOptions *opts, const char *name,
     return idx;
 }
 
+NetdevInitEntry *netdev_init_add_handler(NetdevInitHandler *cb, void *opaque)
+{
+    NetdevInitEntry *e;
+
+    e = g_malloc0(sizeof(*e));
+
+    e->cb = cb;
+    e->opaque = opaque;
+    QLIST_INSERT_HEAD(&netdev_init_head, e, entries);
+    return e;
+}
+
+static void netdev_init_notify(const char *netdev_id)
+{
+    NetdevInitEntry *e, *next;
+
+    QLIST_FOREACH_SAFE(e, &netdev_init_head, entries, next) {
+        e->cb(netdev_id, e->opaque);
+    }
+}
 
 static int (* const net_client_init_fun[NET_CLIENT_OPTIONS_KIND__MAX])(
     const NetClientOptions *opts,
@@ -1039,6 +1067,11 @@ static int net_client_init1(const void *object, int is_netdev, Error **errp)
         }
         return -1;
     }
+    if (is_netdev) {
+        const Netdev *netdev = object;
+
+        netdev_init_notify(netdev->id);
+    }
     return 0;
 }
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v15 34/38] COLO/filter: add each netdev a buffer filter
  2016-02-22  2:39 [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (32 preceding siblings ...)
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 33/38] net: Add notifier/callback for netdev init zhanghailiang
@ 2016-02-22  2:40 ` zhanghailiang
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 35/38] COLO: manage the status of buffer filters for PVM zhanghailiang
                   ` (4 subsequent siblings)
  38 siblings, 0 replies; 52+ messages in thread
From: zhanghailiang @ 2016-02-22  2:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, armbru, Jason Wang,
	yunhong.jiang, eddie.dong, peter.huangpeng, dgilbert,
	zhanghailiang, arei.gonglei, stefanha, amit.shah, zhangchen.fnst,
	hongyang.yang

For COLO periodic mode, we need to buffer the packets sent by
the VM, and we will not release these packets until a checkpoint
finishes.

Here, we add a buffer-filter to each netdev; these filters are
controlled by COLO. They are disabled by default, so packets do not
pass through them. If users don't enable COLO when configuring
qemu, these buffer-filters will not be added.
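
Each added filter is roughly equivalent to the following command line
(sketch; here for a netdev with id 'hn0'):

    -object filter-buffer,id=hn0colo,netdev=hn0,status=disable

with no interval (which is why the previous patch had to accept
interval=0), and with the direction then forced to 'rx' in code so
that only packets sent out by the VM are buffered.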

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Yang Hongyang <hongyang.yang@easystack.cn>
---
v15:
- call object_new_with_props() directly to add filter in
  colo_add_buffer_filter. (Jason's suggestion)
v14:
- New patch
---
 include/migration/colo.h |  2 ++
 include/net/filter.h     |  2 ++
 migration/colo-comm.c    |  5 +++++
 migration/colo.c         | 49 ++++++++++++++++++++++++++++++++++++++++++++++++
 net/filter-buffer.c      |  2 --
 stubs/migration-colo.c   |  4 ++++
 6 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/include/migration/colo.h b/include/migration/colo.h
index 919b135..22b92c9 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -37,4 +37,6 @@ COLOMode get_colo_mode(void);
 void colo_do_failover(MigrationState *s);
 
 bool colo_shutdown(void);
+void colo_add_buffer_filter(const char *netdev_id, void *opaque);
+
 #endif
diff --git a/include/net/filter.h b/include/net/filter.h
index af3c53c..faccedd 100644
--- a/include/net/filter.h
+++ b/include/net/filter.h
@@ -22,6 +22,8 @@
 #define NETFILTER_CLASS(klass) \
     OBJECT_CLASS_CHECK(NetFilterClass, (klass), TYPE_NETFILTER)
 
+#define TYPE_FILTER_BUFFER "filter-buffer"
+
 typedef void (FilterSetup) (NetFilterState *nf, Error **errp);
 typedef void (FilterCleanup) (NetFilterState *nf);
 /*
diff --git a/migration/colo-comm.c b/migration/colo-comm.c
index 3943e94..91d873e 100644
--- a/migration/colo-comm.c
+++ b/migration/colo-comm.c
@@ -13,6 +13,7 @@
 
 #include <migration/colo.h>
 #include "trace.h"
+#include <net/net.h>
 
 typedef struct {
      bool colo_requested;
@@ -58,6 +59,10 @@ static const VMStateDescription colo_state = {
 void colo_info_init(void)
 {
     vmstate_register(NULL, 0, &colo_state, &colo_info);
+    /* FIXME: Remove this after COLO switch to use colo-proxy */
+    if (colo_supported()) {
+        netdev_init_add_handler(colo_add_buffer_filter, NULL);
+    }
 }
 
 bool migration_incoming_enable_colo(void)
diff --git a/migration/colo.c b/migration/colo.c
index 0140203..bbff4e8 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -18,12 +18,23 @@
 #include "qemu/error-report.h"
 #include "migration/failover.h"
 #include "qapi-event.h"
+#include "net/net.h"
+#include "net/filter.h"
+#include "net/vhost_net.h"
 
 static bool vmstate_loading;
 
 /* colo buffer */
 #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
 
+typedef struct COLOListNode {
+    void *opaque;
+    QLIST_ENTRY(COLOListNode) node;
+} COLOListNode;
+
+static QLIST_HEAD(, COLOListNode) COLOBufferFilters =
+    QLIST_HEAD_INITIALIZER(COLOBufferFilters);
+
 bool colo_supported(void)
 {
     return true;
@@ -382,6 +393,44 @@ static int colo_prepare_before_save(MigrationState *s)
     return ret;
 }
 
+void colo_add_buffer_filter(const char *netdev_id, void *opaque)
+{
+    NetFilterState *nf;
+    char filter_name[128];
+    Object *filter;
+    COLOListNode *filternode;
+    NetClientState *nc = qemu_find_netdev(netdev_id);
+
+    /* FIXME: Not support multiple queues */
+    if (!nc || nc->queue_index > 1) {
+        return;
+    }
+
+    /* vhost-net is not supported */
+    if (get_vhost_net(nc)) {
+        return;
+    }
+
+    snprintf(filter_name, sizeof(filter_name),
+            "%scolo", netdev_id);
+
+    filter = object_new_with_props(TYPE_FILTER_BUFFER,
+                        object_get_objects_root(),
+                        filter_name, NULL,
+                        "netdev", netdev_id,
+                        "status", "disable",
+                        NULL);
+    if (!filter) {
+        return;
+    }
+    nf =  NETFILTER(filter);
+    /* Only buffer the packets sent out by the VM */
+    nf->direction = NET_FILTER_DIRECTION_RX;
+    filternode = g_new0(COLOListNode, 1);
+    filternode->opaque = nf;
+    QLIST_INSERT_HEAD(&COLOBufferFilters, filternode, node);
+}
+
 static void colo_process_checkpoint(MigrationState *s)
 {
     QEMUSizedBuffer *buffer = NULL;
diff --git a/net/filter-buffer.c b/net/filter-buffer.c
index f0a9151..34dc312 100644
--- a/net/filter-buffer.c
+++ b/net/filter-buffer.c
@@ -16,8 +16,6 @@
 #include "qapi-visit.h"
 #include "qom/object.h"
 
-#define TYPE_FILTER_BUFFER "filter-buffer"
-
 #define FILTER_BUFFER(obj) \
     OBJECT_CHECK(FilterBufferState, (obj), TYPE_FILTER_BUFFER)
 
diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
index 1996cd9..8e74acb 100644
--- a/stubs/migration-colo.c
+++ b/stubs/migration-colo.c
@@ -48,3 +48,7 @@ bool colo_shutdown(void)
 {
     return false;
 }
+
+void colo_add_buffer_filter(const char *netdev_id, void *opaque)
+{
+}
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v15 35/38] COLO: manage the status of buffer filters for PVM
  2016-02-22  2:39 [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (33 preceding siblings ...)
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 34/38] COLO/filter: add each netdev a buffer filter zhanghailiang
@ 2016-02-22  2:40 ` zhanghailiang
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 36/38] filter-buffer: make filter_buffer_flush() public zhanghailiang
                   ` (3 subsequent siblings)
  38 siblings, 0 replies; 52+ messages in thread
From: zhanghailiang @ 2016-02-22  2:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, armbru, Jason Wang,
	yunhong.jiang, eddie.dong, peter.huangpeng, dgilbert,
	zhanghailiang, arei.gonglei, stefanha, amit.shah, zhangchen.fnst,
	hongyang.yang

Enable all buffer filters added by COLO when entering the COLO
process, and disable them when exiting COLO.
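
The filter status now follows the COLO state (sketch):

    colo_process_checkpoint():   colo_set_filter_status("enable", ...);
    primary_vm_do_failover():    colo_set_filter_status("disable", ...);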

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Yang Hongyang <hongyang.yang@easystack.cn>
---
v15:
- Re-implement colo_set_filter_status() based on COLOBufferFilters list.
- Fix the title of this patch
---
 migration/colo.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index bbff4e8..4c39204 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -113,10 +113,22 @@ static void secondary_vm_do_failover(void)
     }
 }
 
+static void colo_set_filter_status(const char *status, Error **errp)
+{
+    struct COLOListNode *e, *next;
+    NetFilterState *nf;
+
+    QLIST_FOREACH_SAFE(e, &COLOBufferFilters, node, next) {
+        nf = e->opaque;
+        object_property_set_str(OBJECT(nf), status, "status", errp);
+    }
+}
+
 static void primary_vm_do_failover(void)
 {
     MigrationState *s = migrate_get_current();
     int old_state;
+    Error *local_err = NULL;
 
     migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
                       MIGRATION_STATUS_COMPLETED);
@@ -140,6 +152,12 @@ static void primary_vm_do_failover(void)
                      old_state);
         return;
     }
+
+    colo_set_filter_status("disable", &local_err);
+    if (local_err) {
+        error_report_err(local_err);
+    }
+
     /* Notify COLO thread that failover work is finished */
     qemu_sem_post(&s->colo_sem);
 }
@@ -440,6 +458,11 @@ static void colo_process_checkpoint(MigrationState *s)
 
     failover_init_state();
 
+    colo_set_filter_status("enable", &local_err);
+    if (local_err) {
+        goto out;
+    }
+
     s->rp_state.from_dst_file = qemu_file_get_return_path(s->to_dst_file);
     if (!s->rp_state.from_dst_file) {
         error_report("Open QEMUFile from_dst_file failed");
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v15 36/38] filter-buffer: make filter_buffer_flush() public
  2016-02-22  2:39 [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (34 preceding siblings ...)
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 35/38] COLO: manage the status of buffer filters for PVM zhanghailiang
@ 2016-02-22  2:40 ` zhanghailiang
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 37/38] COLO: flush buffered packets in checkpoint process or exit COLO zhanghailiang
                   ` (2 subsequent siblings)
  38 siblings, 0 replies; 52+ messages in thread
From: zhanghailiang @ 2016-02-22  2:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, armbru, Jason Wang,
	yunhong.jiang, eddie.dong, peter.huangpeng, dgilbert,
	zhanghailiang, arei.gonglei, stefanha, amit.shah, zhangchen.fnst,
	hongyang.yang

We will use it in COLO to flush the buffered packets.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Yang Hongyang <hongyang.yang@easystack.cn>
---
v14:
- New patch
---
 include/net/filter.h | 2 ++
 net/filter-buffer.c  | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/net/filter.h b/include/net/filter.h
index faccedd..8ffd53b 100644
--- a/include/net/filter.h
+++ b/include/net/filter.h
@@ -76,4 +76,6 @@ ssize_t qemu_netfilter_pass_to_next(NetClientState *sender,
                                     int iovcnt,
                                     void *opaque);
 
+void filter_buffer_flush(NetFilterState *nf);
+
 #endif /* QEMU_NET_FILTER_H */
diff --git a/net/filter-buffer.c b/net/filter-buffer.c
index 34dc312..91ddd68 100644
--- a/net/filter-buffer.c
+++ b/net/filter-buffer.c
@@ -27,7 +27,7 @@ typedef struct FilterBufferState {
     QEMUTimer release_timer;
 } FilterBufferState;
 
-static void filter_buffer_flush(NetFilterState *nf)
+void filter_buffer_flush(NetFilterState *nf)
 {
     FilterBufferState *s = FILTER_BUFFER(nf);
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v15 37/38] COLO: flush buffered packets in checkpoint process or exit COLO
  2016-02-22  2:39 [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (35 preceding siblings ...)
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 36/38] filter-buffer: make filter_buffer_flush() public zhanghailiang
@ 2016-02-22  2:40 ` zhanghailiang
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 38/38] COLO: Add block replication into colo process zhanghailiang
  2016-02-25 19:52 ` [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) Dr. David Alan Gilbert
  38 siblings, 0 replies; 52+ messages in thread
From: zhanghailiang @ 2016-02-22  2:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, armbru, Jason Wang,
	yunhong.jiang, eddie.dong, peter.huangpeng, dgilbert,
	zhanghailiang, arei.gonglei, stefanha, amit.shah, zhangchen.fnst,
	hongyang.yang

In COLO periodic mode, the packets from the VM should not be sent out
during the interval between two checkpoints. We release all these
buffered packets after the checkpoint process, before the VM is
resumed.

In this way, we can ensure that network services are not broken if
COLO goes into the failover process.
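
The lifetime of a guest packet in periodic mode is then roughly (sketch):

    PVM sends packet  ->  held in the per-netdev buffer filter
    checkpoint transaction completes (SVM state is consistent)
    colo_flush_filter_packets()  ->  packet is released to the outside
    PVM resumes running

so a packet only becomes visible externally once a checkpoint that
contains its effects has been taken.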

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Yang Hongyang <hongyang.yang@easystack.cn>
---
v15:
- Re-implement colo_flush_filter_packets() based on COLOBufferFilters list
---
 migration/colo.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index 4c39204..a2d489b 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -124,6 +124,17 @@ static void colo_set_filter_status(const char *status, Error **errp)
     }
 }
 
+static void colo_flush_filter_packets(Error **errp)
+{
+    struct COLOListNode *e, *next;
+    NetFilterState *nf;
+
+    QLIST_FOREACH_SAFE(e, &COLOBufferFilters, node, next) {
+        nf = e->opaque;
+        filter_buffer_flush(nf);
+    }
+}
+
 static void primary_vm_do_failover(void)
 {
     MigrationState *s = migrate_get_current();
@@ -157,6 +168,7 @@ static void primary_vm_do_failover(void)
     if (local_err) {
         error_report_err(local_err);
     }
+    colo_flush_filter_packets(NULL);
 
     /* Notify COLO thread that failover work is finished */
     qemu_sem_post(&s->colo_sem);
@@ -364,6 +376,8 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
     if (local_err) {
         goto out;
     }
+    /* FIXME: Remove this after switch to use colo-proxy */
+    colo_flush_filter_packets(NULL);
 
     if (colo_shutdown_requested) {
         colo_put_cmd(s->to_dst_file, COLO_MESSAGE_GUEST_SHUTDOWN, &local_err);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v15 38/38] COLO: Add block replication into colo process
  2016-02-22  2:39 [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (36 preceding siblings ...)
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 37/38] COLO: flush buffered packets in checkpoint process or exit COLO zhanghailiang
@ 2016-02-22  2:40 ` zhanghailiang
  2016-02-25 19:52 ` [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) Dr. David Alan Gilbert
  38 siblings, 0 replies; 52+ messages in thread
From: zhanghailiang @ 2016-02-22  2:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, xiecl.fnst, lizhijian, quintela, armbru,
	yunhong.jiang, eddie.dong, peter.huangpeng, dgilbert,
	zhanghailiang, arei.gonglei, stefanha, amit.shah, zhangchen.fnst,
	Max Reitz, hongyang.yang

Make sure the master starts block replication only after the slave's
block replication has started.
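
The ordering enforced here is roughly (sketch):

    SVM: replication_start_all(REPLICATION_MODE_SECONDARY, ...)
    SVM -> PVM: COLO_MESSAGE_CHECKPOINT_READY
    PVM: replication_start_all(REPLICATION_MODE_PRIMARY, ...)
    PVM: vm_start()

so the primary only starts block replication once the secondary's
side is already in place.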

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
---
 migration/colo.c      | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 migration/migration.c |  6 +++++-
 2 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/migration/colo.c b/migration/colo.c
index a2d489b..abb7b14 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -21,6 +21,7 @@
 #include "net/net.h"
 #include "net/filter.h"
 #include "net/vhost_net.h"
+#include "replication.h"
 
 static bool vmstate_loading;
 
@@ -63,6 +64,7 @@ static void secondary_vm_do_failover(void)
 {
     int old_state;
     MigrationIncomingState *mis = migration_incoming_get_current();
+    Error *local_err = NULL;
 
     /* Can not do failover during the process of VM's loading VMstate, Or
       * it will break the secondary VM.
@@ -80,6 +82,11 @@ static void secondary_vm_do_failover(void)
     migrate_set_state(&mis->state, MIGRATION_STATUS_COLO,
                       MIGRATION_STATUS_COMPLETED);
 
+    replication_stop_all(true, &local_err);
+    if (local_err) {
+        error_report_err(local_err);
+    }
+
     if (!autostart) {
         error_report("\"-S\" qemu option will be ignored in secondary side");
         /* recover runstate to normal migration finish state */
@@ -170,6 +177,11 @@ static void primary_vm_do_failover(void)
     }
     colo_flush_filter_packets(NULL);
 
+    replication_stop_all(true, &local_err);
+    if (local_err) {
+        error_report_err(local_err);
+    }
+
     /* Notify COLO thread that failover work is finished */
     qemu_sem_post(&s->colo_sem);
 }
@@ -329,6 +341,14 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
         goto out;
     }
 
+    /* We call this API although it may do nothing on the primary side */
+    qemu_mutex_lock_iothread();
+    replication_do_checkpoint_all(&local_err);
+    qemu_mutex_unlock_iothread();
+    if (local_err) {
+        goto out;
+    }
+
     colo_put_cmd(s->to_dst_file, COLO_MESSAGE_VMSTATE_SEND, &local_err);
     if (local_err) {
         goto out;
@@ -505,6 +525,13 @@ static void colo_process_checkpoint(MigrationState *s)
     }
 
     qemu_mutex_lock_iothread();
+    /* start block replication */
+    replication_start_all(REPLICATION_MODE_PRIMARY, &local_err);
+    if (local_err) {
+        qemu_mutex_unlock_iothread();
+        goto out;
+    }
+
     vm_start();
     qemu_mutex_unlock_iothread();
     trace_colo_vm_state_change("stop", "run");
@@ -600,6 +627,7 @@ static void colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request,
     case COLO_MESSAGE_GUEST_SHUTDOWN:
         qemu_mutex_lock_iothread();
         vm_stop_force_state(RUN_STATE_COLO);
+        replication_stop_all(false, NULL);
         qemu_system_shutdown_request_core();
         qemu_mutex_unlock_iothread();
         /* the main thread will exit and terminate the whole
@@ -669,6 +697,14 @@ void *colo_process_incoming_thread(void *opaque)
         goto out;
     }
 
+    qemu_mutex_lock_iothread();
+    /* start block replication */
+    replication_start_all(REPLICATION_MODE_SECONDARY, &local_err);
+    qemu_mutex_unlock_iothread();
+    if (local_err) {
+        goto out;
+    }
+
     colo_put_cmd(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_READY,
                  &local_err);
     if (local_err) {
@@ -746,6 +782,18 @@ void *colo_process_incoming_thread(void *opaque)
             goto out;
         }
 
+        replication_get_error_all(&local_err);
+        if (local_err) {
+            qemu_mutex_unlock_iothread();
+            goto out;
+        }
+        /* discard colo disk buffer */
+        replication_do_checkpoint_all(&local_err);
+        if (local_err) {
+            qemu_mutex_unlock_iothread();
+            goto out;
+        }
+
         vmstate_loading = false;
         qemu_mutex_unlock_iothread();
 
diff --git a/migration/migration.c b/migration/migration.c
index 324dcb6..068edb0 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1591,7 +1591,11 @@ static void migration_completion(MigrationState *s, int current_active_state,
 
         if (!ret) {
             ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
-            if (ret >= 0) {
+            /*
+            * Don't mark the image with the BDRV_O_INACTIVE flag if
+            * we will go into the COLO stage later.
+            */
+            if (ret >= 0 && !migrate_colo_enabled()) {
                 ret = bdrv_inactivate_all();
             }
             if (ret >= 0) {
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
  2016-02-22  2:39 [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (37 preceding siblings ...)
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 38/38] COLO: Add block replication into colo process zhanghailiang
@ 2016-02-25 19:52 ` Dr. David Alan Gilbert
  2016-02-26 16:36   ` Dr. David Alan Gilbert
  38 siblings, 1 reply; 52+ messages in thread
From: Dr. David Alan Gilbert @ 2016-02-25 19:52 UTC (permalink / raw)
  To: zhanghailiang
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, qemu-devel, arei.gonglei, stefanha,
	amit.shah, zhangchen.fnst, hongyang.yang

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> From: root <root@localhost.localdomain>
> 
> This is the 15th version of COLO (Still only support periodic checkpoint).
> 
> Here is only COLO frame part, you can get the whole codes from github:
> https://github.com/coloft/qemu/commits/colo-v2.6-periodic-mode
> 
> There are little changes for this series except the network related part.

I was looking at the time the guest is paused during COLO and
was surprised to find one of the larger chunks was the time to reset
the guest before loading each checkpoint. I've traced it part way, and
the biggest contributors for my test VM seem to be:

  3.8ms  pcibus_reset: VGA
  1.8ms  pcibus_reset: virtio-net-pci
  1.5ms  pcibus_reset: virtio-blk-pci
  1.5ms  qemu_devices_reset: piix4_reset
  1.1ms  pcibus_reset: piix3-ide
  1.1ms  pcibus_reset: virtio-rng-pci

I've not looked deeper yet, but some of these are very silly;
I'm running with -nographic, so why it takes 3.8ms to reset VGA is
going to be interesting.
Also, my only block device is the virtio-blk, so while I understand that
the standard PC machine has the IDE controller, it's unclear why it
takes over a millisecond to reset an unused device.

I guess reset is normally off anyone's radar since it's outside
the time anyone cares about, but perhaps the guys trying
to make qemu start really quickly would be interested.

Dave

> 
> Patch status:
> Unreviewed: patch 21,27,28,29,33,38
> Updated: patch 31,34,35,37
> 
> TODO:
> 1. Checkpoint based on proxy in qemu
> 2. The capability of continuous FT
> 3. Optimize the VM's downtime during checkpoint
> 
> v15:
>  - Go on the shutdown process if encounter error while sending shutdown
>    message to SVM. (patch 24)
>  - Rename qemu_need_skip_netfilter to qemu_netfilter_can_skip and Remove
>    some useless comment. (patch 31, Jason)
>  - Call object_new_with_props() directly to add filter in
>    colo_add_buffer_filter. (patch 34, Jason)
>  - Re-implement colo_set_filter_status() based on COLOBufferFilters
>    list. (patch 35)
>  - Re-implement colo_flush_filter_packets() based on COLOBufferFilters
>    list. (patch 37) 
> v14:
>  - Re-implement the network processing based on netfilter (Jason Wang)
>  - Rename 'COLOCommand' to 'COLOMessage'. (Markus's suggestion)
>  - Split two new patches (patch 27/28) from patch 29
>  - Fix some other comments from Dave and Markus.
> 
> v13:
>  - Refactor colo_*_cmd helper functions to use 'Error **errp' parameter
>   instead of return value to indicate success or failure. (patch 10)
>  - Remove the optional error message for COLO_EXIT event. (patch 25)
>  - Use semaphore to notify colo/colo incoming loop that failover work is
>    finished. (patch 26)
>  - Move COLO shutdown related codes to colo.c file. (patch 28)
>  - Fix memory leak bug for colo incoming loop. (new patch 31)
>  - Re-use some existed helper functions to realize the process of
>    saving/loading ram and device. (patch 32)
>  - Fix some other comments from Dave and Markus.
> 
> zhanghailiang (38):
>   configure: Add parameter for configure to enable/disable COLO support
>   migration: Introduce capability 'x-colo' to migration
>   COLO: migrate colo related info to secondary node
>   migration: Integrate COLO checkpoint process into migration
>   migration: Integrate COLO checkpoint process into loadvm
>   COLO/migration: Create a new communication path from destination to
>     source
>   COLO: Implement colo checkpoint protocol
>   COLO: Add a new RunState RUN_STATE_COLO
>   QEMUSizedBuffer: Introduce two help functions for qsb
>   COLO: Save PVM state to secondary side when do checkpoint
>   COLO: Load PVM's dirty pages into SVM's RAM cache temporarily
>   ram/COLO: Record the dirty pages that SVM received
>   COLO: Load VMState into qsb before restore it
>   COLO: Flush PVM's cached RAM into SVM's memory
>   COLO: Add checkpoint-delay parameter for migrate-set-parameters
>   COLO: synchronize PVM's state to SVM periodically
>   COLO failover: Introduce a new command to trigger a failover
>   COLO failover: Introduce state to record failover process
>   COLO: Implement failover work for Primary VM
>   COLO: Implement failover work for Secondary VM
>   qmp event: Add COLO_EXIT event to notify users while exited from COLO
>   COLO failover: Shutdown related socket fd when do failover
>   COLO failover: Don't do failover during loading VM's state
>   COLO: Process shutdown command for VM in COLO state
>   COLO: Update the global runstate after going into colo state
>   savevm: Introduce two helper functions for save/find loadvm_handlers
>     entry
>   migration/savevm: Add new helpers to process the different stages of
>     loadvm
>   migration/savevm: Export two helper functions for savevm process
>   COLO: Separate the process of saving/loading ram and device state
>   COLO: Split qemu_savevm_state_begin out of checkpoint process
>   net/filter: Add a 'status' property for filter object
>   filter-buffer: Accept zero interval
>   net: Add notifier/callback for netdev init
>   COLO/filter: add each netdev a buffer filter
>   COLO: manage the status of buffer filters for PVM
>   filter-buffer: make filter_buffer_flush() public
>   COLO: flush buffered packets in checkpoint process or exit COLO
>   COLO: Add block replication into colo process
> 
>  configure                     |  11 +
>  docs/qmp-events.txt           |  16 +
>  hmp-commands.hx               |  15 +
>  hmp.c                         |  15 +
>  hmp.h                         |   1 +
>  include/exec/ram_addr.h       |   1 +
>  include/migration/colo.h      |  42 ++
>  include/migration/failover.h  |  33 ++
>  include/migration/migration.h |  16 +
>  include/migration/qemu-file.h |   3 +-
>  include/net/filter.h          |   5 +
>  include/net/net.h             |   4 +
>  include/sysemu/sysemu.h       |   9 +
>  migration/Makefile.objs       |   2 +
>  migration/colo-comm.c         |  76 ++++
>  migration/colo-failover.c     |  83 ++++
>  migration/colo.c              | 866 ++++++++++++++++++++++++++++++++++++++++++
>  migration/migration.c         | 109 +++++-
>  migration/qemu-file-buf.c     |  61 +++
>  migration/ram.c               | 175 ++++++++-
>  migration/savevm.c            | 114 ++++--
>  net/filter-buffer.c           |  14 +-
>  net/filter.c                  |  40 ++
>  net/net.c                     |  33 ++
>  qapi-schema.json              | 104 ++++-
>  qapi/event.json               |  15 +
>  qemu-options.hx               |   4 +-
>  qmp-commands.hx               |  23 +-
>  stubs/Makefile.objs           |   1 +
>  stubs/migration-colo.c        |  54 +++
>  trace-events                  |   8 +
>  vl.c                          |  31 +-
>  32 files changed, 1908 insertions(+), 76 deletions(-)
>  create mode 100644 include/migration/colo.h
>  create mode 100644 include/migration/failover.h
>  create mode 100644 migration/colo-comm.c
>  create mode 100644 migration/colo-failover.c
>  create mode 100644 migration/colo.c
>  create mode 100644 stubs/migration-colo.c
> 
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v15 27/38] migration/savevm: Add new helpers to process the different stages of loadvm
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 27/38] migration/savevm: Add new helpers to process the different stages of loadvm zhanghailiang
@ 2016-02-26 12:52   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 52+ messages in thread
From: Dr. David Alan Gilbert @ 2016-02-26 12:52 UTC (permalink / raw)
  To: zhanghailiang
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, qemu-devel, arei.gonglei, stefanha,
	amit.shah, zhangchen.fnst, hongyang.yang

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> There are several stages during the loadvm process. In different stages,
> migration incoming processes different sections.
> We want to control these stages more accurately, to optimize the COLO
> capability.
> 
> Here we add two new helper functions: qemu_loadvm_state_begin()
> and qemu_load_device_state().
> Besides, we make the qemu_loadvm_state_main() API public.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

It's interesting; there's not that much difference between the two
routines, and they only wrap a little around qemu_loadvm_state_main(),
but I can see it makes things clearer at the places they're called.

Dave

> ---
> v14:
> - Split from patch 'COLO: Separate the process of saving/loading
>   ram and device state
> ---
>  include/sysemu/sysemu.h |  3 +++
>  migration/savevm.c      | 38 +++++++++++++++++++++++++++++++++++---
>  2 files changed, 38 insertions(+), 3 deletions(-)
> 
> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> index 91eeda3..c0694a1 100644
> --- a/include/sysemu/sysemu.h
> +++ b/include/sysemu/sysemu.h
> @@ -134,6 +134,9 @@ void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
>                                             uint64_t *length_list);
>  
>  int qemu_loadvm_state(QEMUFile *f);
> +int qemu_loadvm_state_begin(QEMUFile *f);
> +int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
> +int qemu_load_device_state(QEMUFile *f);
>  
>  typedef enum DisplayType
>  {
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 9e3c18a..954e0a7 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1249,8 +1249,6 @@ enum LoadVMExitCodes {
>      LOADVM_QUIT     =  1,
>  };
>  
> -static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
> -
>  /* ------ incoming postcopy messages ------ */
>  /* 'advise' arrives before any transfers just to tell us that a postcopy
>   * *might* happen - it might be skipped if precopy transferred everything
> @@ -1832,7 +1830,7 @@ qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis)
>      return 0;
>  }
>  
> -static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
> +int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
>  {
>      uint8_t section_type;
>      int ret;
> @@ -1965,6 +1963,40 @@ int qemu_loadvm_state(QEMUFile *f)
>      return ret;
>  }
>  
> +int qemu_loadvm_state_begin(QEMUFile *f)
> +{
> +    MigrationIncomingState *mis = migration_incoming_get_current();
> +    Error *local_err = NULL;
> +    int ret;
> +
> +    if (qemu_savevm_state_blocked(&local_err)) {
> +        error_report_err(local_err);
> +        return -EINVAL;
> +    }
> +    /* Load QEMU_VM_SECTION_START section */
> +    ret = qemu_loadvm_state_main(f, mis);
> +    if (ret < 0) {
> +        error_report("Failed to loadvm begin work: %d", ret);
> +    }
> +    return ret;
> +}
> +
> +int qemu_load_device_state(QEMUFile *f)
> +{
> +    MigrationIncomingState *mis = migration_incoming_get_current();
> +    int ret;
> +
> +    /* Load QEMU_VM_SECTION_FULL section */
> +    ret = qemu_loadvm_state_main(f, mis);
> +    if (ret < 0) {
> +        error_report("Failed to load device state: %d", ret);
> +        return ret;
> +    }
> +
> +    cpu_synchronize_all_post_init();
> +    return 0;
> +}
> +
>  void hmp_savevm(Monitor *mon, const QDict *qdict)
>  {
>      BlockDriverState *bs, *bs1;
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v15 28/38] migration/savevm: Export two helper functions for savevm process
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 28/38] migration/savevm: Export two helper functions for savevm process zhanghailiang
@ 2016-02-26 13:00   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 52+ messages in thread
From: Dr. David Alan Gilbert @ 2016-02-26 13:00 UTC (permalink / raw)
  To: zhanghailiang
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, qemu-devel, arei.gonglei, stefanha,
	amit.shah, zhangchen.fnst, hongyang.yang

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> We add a new helper function qemu_savevm_live_state(),
> and make qemu_save_device_state() public.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Again, the extra function doesn't do much, but it's best to wrap
the explicit EOF byte in a savevm function.

Dave

> ---
> v14:
> - New patch split from previous
>  'COLO: Separate the process of saving/loading ram and device state
> ---
>  include/sysemu/sysemu.h |  3 +++
>  migration/savevm.c      | 15 +++++++++++----
>  2 files changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> index c0694a1..7b1748c 100644
> --- a/include/sysemu/sysemu.h
> +++ b/include/sysemu/sysemu.h
> @@ -133,6 +133,9 @@ void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
>                                             uint64_t *start_list,
>                                             uint64_t *length_list);
>  
> +void qemu_savevm_live_state(QEMUFile *f);
> +int qemu_save_device_state(QEMUFile *f);
> +
>  int qemu_loadvm_state(QEMUFile *f);
>  int qemu_loadvm_state_begin(QEMUFile *f);
>  int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 954e0a7..60c7b57 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1192,13 +1192,20 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
>      return ret;
>  }
>  
> -static int qemu_save_device_state(QEMUFile *f)
> +void qemu_savevm_live_state(QEMUFile *f)
>  {
> -    SaveStateEntry *se;
> +    /* save QEMU_VM_SECTION_END section */
> +    qemu_savevm_state_complete_precopy(f, true);
> +    qemu_put_byte(f, QEMU_VM_EOF);
> +}
>  
> -    qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
> -    qemu_put_be32(f, QEMU_VM_FILE_VERSION);
> +int qemu_save_device_state(QEMUFile *f)
> +{
> +    SaveStateEntry *se;
>  
> +    if (!migration_in_colo_state()) {
> +        qemu_savevm_state_header(f);
> +    }
>      cpu_synchronize_all_states();
>  
>      QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v15 29/38] COLO: Separate the process of saving/loading ram and device state
  2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 29/38] COLO: Separate the process of saving/loading ram and device state zhanghailiang
@ 2016-02-26 13:16   ` Dr. David Alan Gilbert
  2016-02-27 10:03     ` Hailiang Zhang
  0 siblings, 1 reply; 52+ messages in thread
From: Dr. David Alan Gilbert @ 2016-02-26 13:16 UTC (permalink / raw)
  To: zhanghailiang
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, qemu-devel, arei.gonglei, stefanha,
	amit.shah, zhangchen.fnst, hongyang.yang

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> We separate the process of saving/loading ram and device state when doing
> a checkpoint, and add new helpers for saving/loading ram/device. With this
> change, we can directly transfer ram from master to slave without using a
> QEMUSizedBuffer as an assistant, which also reduces the amount of extra
> memory used during checkpoint.
>
> Besides, we move colo_flush_ram_cache to the proper position after the
> above change.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> ---
> v14:
> - split two new patches from this patch
> - Some minor fixes from Dave
> v13:
> - Re-use some existed helper functions to realize saving/loading
>   ram and device.
> v11:
> - Remove load configuration section in qemu_loadvm_state_begin()
> ---
>  migration/colo.c   | 48 ++++++++++++++++++++++++++++++++++++++----------
>  migration/ram.c    |  5 -----
>  migration/savevm.c |  5 +++++
>  3 files changed, 43 insertions(+), 15 deletions(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index 16bada6..300fa54 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -288,21 +288,37 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
>          goto out;
>      }
>  
> +    colo_put_cmd(s->to_dst_file, COLO_MESSAGE_VMSTATE_SEND, &local_err);
> +    if (local_err) {
> +        goto out;
> +    }
> +
>      /* Disable block migration */
>      s->params.blk = 0;
>      s->params.shared = 0;
> -    qemu_savevm_state_header(trans);
> -    qemu_savevm_state_begin(trans, &s->params);
> +    qemu_savevm_state_begin(s->to_dst_file, &s->params);
> +    ret = qemu_file_get_error(s->to_dst_file);
> +    if (ret < 0) {
> +        error_report("Save vm state begin error");
> +        goto out;
> +    }
> +
>      qemu_mutex_lock_iothread();
> -    qemu_savevm_state_complete_precopy(trans, false);
> +    /*
> +    * Only save the VM's live state, which does not include device state.
> +    * TODO: We may need a timeout mechanism to prevent the COLO process
> +    * from being blocked here.
> +    */
> +    qemu_savevm_live_state(s->to_dst_file);
> +    /* Note: device state is saved into buffer */
> +    ret = qemu_save_device_state(trans);
>      qemu_mutex_unlock_iothread();

Yes, I still worry a little about what can hang under that lock, but I think
it's the best we've got at the moment; we probably need to understand what
the rules are about what actually needs the lock!

Other than that,

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> -
> -    qemu_fflush(trans);
> -
> -    colo_put_cmd(s->to_dst_file, COLO_MESSAGE_VMSTATE_SEND, &local_err);
> -    if (local_err) {
> +    if (ret < 0) {
> +        error_report("Save device state error");
>          goto out;
>      }
> +    qemu_fflush(trans);
> +
>      /* we send the total size of the vmstate first */
>      size = qsb_get_length(buffer);
>      colo_put_cmd_value(s->to_dst_file, COLO_MESSAGE_VMSTATE_SIZE,
> @@ -573,6 +589,16 @@ void *colo_process_incoming_thread(void *opaque)
>              goto out;
>          }
>  
> +        ret = qemu_loadvm_state_begin(mis->from_src_file);
> +        if (ret < 0) {
> +            error_report("Load vm state begin error, ret=%d", ret);
> +            goto out;
> +        }
> +        ret = qemu_loadvm_state_main(mis->from_src_file, mis);
> +        if (ret < 0) {
> +            error_report("Load VM's live state (ram) error");
> +            goto out;
> +        }
>          /* read the VM state total size first */
>          value = colo_get_cmd_value(mis->from_src_file,
>                                   COLO_MESSAGE_VMSTATE_SIZE, &local_err);
> @@ -605,8 +631,10 @@ void *colo_process_incoming_thread(void *opaque)
>          qemu_mutex_lock_iothread();
>          qemu_system_reset(VMRESET_SILENT);
>          vmstate_loading = true;
> -        if (qemu_loadvm_state(fb) < 0) {
> -            error_report("COLO: loadvm failed");
> +        colo_flush_ram_cache();
> +        ret = qemu_load_device_state(fb);
> +        if (ret < 0) {
> +            error_report("COLO: load device state failed");
>              qemu_mutex_unlock_iothread();
>              goto out;
>          }
> diff --git a/migration/ram.c b/migration/ram.c
> index 891f3b2..8f416d5 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2465,7 +2465,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>       * be atomic
>       */
>      bool postcopy_running = postcopy_state_get() >= POSTCOPY_INCOMING_LISTENING;
> -    bool need_flush = false;
>  
>      seq_iter++;
>  
> @@ -2500,7 +2499,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>              /* After going into COLO, we should load the Page into colo_cache */
>              if (ram_cache_enable) {
>                  host = colo_cache_from_block_offset(block, addr);
> -                need_flush = true;
>              } else {
>                  host = host_from_ram_block_offset(block, addr);
>              }
> @@ -2594,9 +2592,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>  
>      rcu_read_unlock();
>  
> -    if (!ret  && ram_cache_enable && need_flush) {
> -        colo_flush_ram_cache();
> -    }
>      DPRINTF("Completed load of VM with exit code %d seq iteration "
>              "%" PRIu64 "\n", ret, seq_iter);
>      return ret;
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 60c7b57..1551fbb 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -50,6 +50,7 @@
>  #include "qemu/iov.h"
>  #include "block/snapshot.h"
>  #include "block/qapi.h"
> +#include "migration/colo.h"
>  
>  
>  #ifndef ETH_P_RARP
> @@ -923,6 +924,10 @@ void qemu_savevm_state_begin(QEMUFile *f,
>              break;
>          }
>      }
> +    if (migration_in_colo_state()) {
> +        qemu_put_byte(f, QEMU_VM_EOF);
> +        qemu_fflush(f);
> +    }
>  }
>  
>  /*
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
  2016-02-25 19:52 ` [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) Dr. David Alan Gilbert
@ 2016-02-26 16:36   ` Dr. David Alan Gilbert
  2016-02-27  7:54     ` Hailiang Zhang
  0 siblings, 1 reply; 52+ messages in thread
From: Dr. David Alan Gilbert @ 2016-02-26 16:36 UTC (permalink / raw)
  To: zhanghailiang
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, qemu-devel, arei.gonglei, stefanha,
	amit.shah, zhangchen.fnst, hongyang.yang

* Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> > From: root <root@localhost.localdomain>
> > 
> > This is the 15th version of COLO (Still only support periodic checkpoint).
> > 
> > Here is only COLO frame part, you can get the whole codes from github:
> > https://github.com/coloft/qemu/commits/colo-v2.6-periodic-mode
> > 
> > There are little changes for this series except the network related part.
> 
> I was looking at the time the guest is paused during COLO and
> was surprised to find one of the larger chunks was the time to reset
> the guest before loading each checkpoint;  I've traced it part way, the
> biggest contributors for my test VM seem to be:
> 
>   3.8ms  pcibus_reset: VGA
>   1.8ms  pcibus_reset: virtio-net-pci
>   1.5ms  pcibus_reset: virtio-blk-pci
>   1.5ms  qemu_devices_reset: piix4_reset
>   1.1ms  pcibus_reset: piix3-ide
>   1.1ms  pcibus_reset: virtio-rng-pci
> 
> I've not looked deeper yet, but some of these are very silly;
> I'm running with -nographic so why it's taking 3.8ms to reset VGA is 
> going to be interesting.
> Also, my only block device is the virtio-blk, so while I understand that the
> standard PC machine has the IDE controller, I don't see why it takes over a
> ms to reset an unused device.

OK, so I've dug a bit deeper, and it appears that it's the changes in
PCI BARs that actually take the time; every time we do a reset we
reset all the BARs, which causes a pci_update_mappings() and ends up
doing a memory_region_del_subregion().
Then we load the config space of the PCI device as we do the vmstate_load,
and this recreates all the mappings again.

I'm not sure what the fix is, but it sounds like it would speed up
the checkpoints usefully if we could avoid the map/remap when
they're the same.
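
One direction that might work (a hypothetical, untested sketch using the
memory-region transaction API; I haven't checked that the listeners
really collapse a matching del/add pair at commit time) is to batch the
reset with the subsequent load, so the intermediate BAR teardown never
reaches the listeners:

/* Hypothetical: fold the BAR teardown done by reset and the rebuild
 * done by the vmstate load into one transaction, so an unchanged
 * mapping produces no del/add churn when we commit. */
memory_region_transaction_begin();
qemu_system_reset(VMRESET_SILENT);    /* clears all the BARs */
ret = qemu_load_device_state(fb);     /* reloads PCI config space */
memory_region_transaction_commit();   /* one net topology update */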

Dave

> 
> I guess reset is normally off anyone's radar since it's outside
> the time anyone cares about, but I guess perhaps the guys trying
> to make qemu start really quickly would be interested.
> 
> Dave
> 
> > [...]
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
  2016-02-26 16:36   ` Dr. David Alan Gilbert
@ 2016-02-27  7:54     ` Hailiang Zhang
  2016-02-29  9:47       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 52+ messages in thread
From: Hailiang Zhang @ 2016-02-27  7:54 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, qemu-devel, arei.gonglei, stefanha,
	amit.shah, zhangchen.fnst, hongyang.yang

On 2016/2/27 0:36, Dr. David Alan Gilbert wrote:
> * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
>> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>>> From: root <root@localhost.localdomain>
>>>
>>> This is the 15th version of COLO (Still only support periodic checkpoint).
>>>
>>> Here is only COLO frame part, you can get the whole codes from github:
>>> https://github.com/coloft/qemu/commits/colo-v2.6-periodic-mode
>>>
>>> There are little changes for this series except the network related part.
>>
>> I was looking at the time the guest is paused during COLO and
>> was surprised to find one of the larger chunks was the time to reset
>> the guest before loading each checkpoint;  I've traced it part way, the
>> biggest contributors for my test VM seem to be:
>>
>>    3.8ms  pcibus_reset: VGA
>>    1.8ms  pcibus_reset: virtio-net-pci
>>    1.5ms  pcibus_reset: virtio-blk-pci
>>    1.5ms  qemu_devices_reset: piix4_reset
>>    1.1ms  pcibus_reset: piix3-ide
>>    1.1ms  pcibus_reset: virtio-rng-pci
>>
>> I've not looked deeper yet, but some of these are very silly;
>> I'm running with -nographic so why it's taking 3.8ms to reset VGA is
>> going to be interesting.
>> Also, my only block device is the virtio-blk, so while I understand the
>> standard PC machine has the IDE controller, why it takes it over a ms
>> to reset an unused device.
>
> OK, so I've dug a bit deeper, and it appears that it's the changes in
> PCI bars that actually take the time;  every time we do a reset we
> reset all the BARs, this causes it to do a pci_update_mappings and
> end up doing a memory_region_del_subregion.
> Then we load the config space of the PCI device as we do the vmstate_load,
> and this recreates all the mappings again.
>
> I'm not sure what the fix is, but that sounds like it would
> speed up the checkpoints usefully if we can avoid the map/remap when
> they're the same.
>

Interesting, and thanks for your report.

We already know qemu_system_reset() is a time-consuming function and we
shouldn't call it here, but if we don't, there will be a bug, which we
reported before in the previous COLO series; below is a copy of the related
patch comment:

     COLO VMstate: Load VM state into qsb before restore it

     We should not destroy the state of the secondary until we receive the
     whole state from the primary, in case the primary fails in the middle
     of sending the state; so here we cache the device state on the
     secondary before restoring it.

     Besides, we should call qemu_system_reset() before loading the VM
     state, which can ensure the data is intact.
     Note: If we discard qemu_system_reset(), there will be some odd errors.
     For example, qemu on the slave side crashes and reports:

     KVM: entry failed, hardware error 0x7
     EAX=00000000 EBX=0000e000 ECX=00009578 EDX=0000434f
     ESI=0000fc10 EDI=0000434f EBP=00000000 ESP=00001fca
     EIP=00009594 EFL=00010246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
     ES =0040 00000400 0000ffff 00009300
     CS =f000 000f0000 0000ffff 00009b00
     SS =434f 000434f0 0000ffff 00009300
     DS =434f 000434f0 0000ffff 00009300
     FS =0000 00000000 0000ffff 00009300
     GS =0000 00000000 0000ffff 00009300
     LDT=0000 00000000 0000ffff 00008200
     TR =0000 00000000 0000ffff 00008b00
     GDT=     0002dcc8 00000047
     IDT=     00000000 0000ffff
     CR0=00000010 CR2=ffffffff CR3=00000000 CR4=00000000
     DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
     DR6=00000000ffff0ff0 DR7=0000000000000400
     EFER=0000000000000000
     Code=c0 74 0f 66 b9 78 95 00 00 66 31 d2 66 31 c0 e9 47 e0 fb 90 <f3> 90 fa fc 66 c3 66 53 66 89 c3
     ERROR: invalid runstate transition: 'internal-error' -> 'colo'

     The reason is that some of the device state will be ignored when saving
     device state to the slave if the corresponding data is at its initial
     value, such as 0.
     But the device state on the slave may no longer be at its initialized
     value; after a round of checkpointing, the values of the device state
     can become inconsistent.
     This will happen when the PVM reboots or the SVM runs ahead of the PVM
     in the startup process.
     Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
     Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
     Signed-off-by: Gonglei <arei.gonglei@huawei.com>
     Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

As described above, some values of the device state are zero and will be
ignored during migration. That is not a problem for normal migration,
because for the VM on the destination the initial values are zero too. But
COLO does more than one round of migration: the related values may change
from non-zero to zero, they will again be ignored in the next checkpoint,
and the VMState will become inconsistent on the SVM.
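
An illustrative timeline of the problem (a sketch; the values are made up):

    checkpoint N:   PVM async_pf_en_msr = 0x3f -> needed() == true,
                    subsection sent, SVM now holds 0x3f
    (guest reboots on the PVM, the MSR goes back to 0)
    checkpoint N+1: PVM async_pf_en_msr = 0    -> needed() == false,
                    subsection NOT sent, SVM silently keeps the stale 0x3f

Without a qemu_system_reset() on the SVM before loading the new checkpoint,
the stale value survives and the two sides diverge.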

The above error is caused directly by a wrong value of 'async_pf_en_msr'.

static const VMStateDescription vmstate_async_pf_msr = {
     .name = "cpu/async_pf_msr",
     .version_id = 1,
     .minimum_version_id = 1,
     .needed = async_pf_msr_needed,
     .fields = (VMStateField[]) {
         VMSTATE_UINT64(env.async_pf_en_msr, X86CPU),
         VMSTATE_END_OF_LIST()
     }
};

static bool async_pf_msr_needed(void *opaque)
{
     X86CPU *cpu = opaque;

     return cpu->env.async_pf_en_msr != 0;
}

Some other VMState entries for registers in CPUX86State have the same
problem, and we can't be sure they won't cause trouble if their values are
incorrect.
So here we simply call qemu_system_reset() to avoid the inconsistency.
Besides, compared with the most time-consuming operation (flushing RAM from
the COLO cache to the SVM), the time spent in qemu_system_reset() seems
acceptable ;)

Another way to fix the problem is to save the VMState while ignoring the
needed() return value, but this method is not so graceful.
diff --git a/migration/vmstate.c b/migration/vmstate.c
index e5388f0..7d15bba 100644
--- a/migration/vmstate.c
+++ b/migration/vmstate.c
@@ -409,7 +409,7 @@ static void vmstate_subsection_save(QEMUFile *f, const VMStateDescription *vmsd,
      bool subsection_found = false;

      while (sub && sub->needed) {
-        if (sub->needed(opaque)) {
+        if (sub->needed(opaque) || migration_in_colo_state()) {
              const VMStateDescription *vmsd = sub->vmsd;
              uint8_t len;


Thanks,
Hailiang

> [...]

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v15 29/38] COLO: Separate the process of saving/loading ram and device state
  2016-02-26 13:16   ` Dr. David Alan Gilbert
@ 2016-02-27 10:03     ` Hailiang Zhang
  0 siblings, 0 replies; 52+ messages in thread
From: Hailiang Zhang @ 2016-02-27 10:03 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, qemu-devel, arei.gonglei, stefanha,
	amit.shah, zhangchen.fnst, hongyang.yang

On 2016/2/26 21:16, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> We separate the process of saving/loading ram and device state when doing
>> a checkpoint, and add new helpers for saving/loading ram/device. With this
>> change, we can directly transfer ram from master to slave without using a
>> QEMUSizedBuffer as an assistant, which also reduces the amount of extra
>> memory used during checkpoint.
>>
>> Besides, we move colo_flush_ram_cache to the proper position after the
>> above change.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> ---
>> v14:
>> - split two new patches from this patch
>> - Some minor fixes from Dave
>> v13:
>> - Re-use some existed helper functions to realize saving/loading
>>    ram and device.
>> v11:
>> - Remove load configuration section in qemu_loadvm_state_begin()
>> ---
>>   migration/colo.c   | 48 ++++++++++++++++++++++++++++++++++++++----------
>>   migration/ram.c    |  5 -----
>>   migration/savevm.c |  5 +++++
>>   3 files changed, 43 insertions(+), 15 deletions(-)
>>
>> diff --git a/migration/colo.c b/migration/colo.c
>> index 16bada6..300fa54 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -288,21 +288,37 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
>>           goto out;
>>       }
>>
>> +    colo_put_cmd(s->to_dst_file, COLO_MESSAGE_VMSTATE_SEND, &local_err);
>> +    if (local_err) {
>> +        goto out;
>> +    }
>> +
>>       /* Disable block migration */
>>       s->params.blk = 0;
>>       s->params.shared = 0;
>> -    qemu_savevm_state_header(trans);
>> -    qemu_savevm_state_begin(trans, &s->params);
>> +    qemu_savevm_state_begin(s->to_dst_file, &s->params);
>> +    ret = qemu_file_get_error(s->to_dst_file);
>> +    if (ret < 0) {
>> +        error_report("Save vm state begin error");
>> +        goto out;
>> +    }
>> +
>>       qemu_mutex_lock_iothread();
>> -    qemu_savevm_state_complete_precopy(trans, false);
>> +    /*
>> +    * Only save the VM's live state, which does not include device state.
>> +    * TODO: We may need a timeout mechanism to prevent the COLO process
>> +    * from being blocked here.
>> +    */
>> +    qemu_savevm_live_state(s->to_dst_file);
>> +    /* Note: device state is saved into buffer */
>> +    ret = qemu_save_device_state(trans);
>>       qemu_mutex_unlock_iothread();
>
> Yes, I still worry a little about what can hang under that lock, but I think

Hmm, we have some other places in COLO taking this lock too. Some of them
are OK holding it, but holding it while sending/receiving data to/from the
other side in COLO is dangerous. One solution is to apply a timeout to the
QEMUFile operations, but that is not a good choice.

> it's the best we've got at the moment; we probably need to understand what
> the rules are about what actually needs the lock!
>

Yes, this is another way to solve the problem: don't hold the lock while
sending/receiving data.
For qemu_savevm_live_state() here, IMHO, we can send the remaining RAM data
without holding the global lock. We hold the lock here because we
call cpu_synchronize_all_states() in it. (I'm not sure ...)
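
One possible shape of that idea (a rough, untested sketch; whether the
RAM streaming is really safe without the lock is exactly what needs to
be verified):

/* Keep the BQL only around the device-state capture, which does
 * cpu_synchronize_all_states() internally, and stream the remaining
 * RAM with the lock dropped. */
qemu_mutex_lock_iothread();
ret = qemu_save_device_state(trans);
qemu_mutex_unlock_iothread();

if (ret >= 0) {
    /* The guest is already stopped for the checkpoint, so no page
     * should be dirtied underneath us while we send the RAM. */
    qemu_savevm_live_state(s->to_dst_file);
    ret = qemu_file_get_error(s->to_dst_file);
}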

Thanks,
Hailiang

> [...]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
  2016-02-27  7:54     ` Hailiang Zhang
@ 2016-02-29  9:47       ` Dr. David Alan Gilbert
  2016-02-29 12:16         ` Hailiang Zhang
  0 siblings, 1 reply; 52+ messages in thread
From: Dr. David Alan Gilbert @ 2016-02-29  9:47 UTC (permalink / raw)
  To: Hailiang Zhang
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, qemu-devel, arei.gonglei, stefanha,
	amit.shah, zhangchen.fnst, hongyang.yang

* Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
> On 2016/2/27 0:36, Dr. David Alan Gilbert wrote:
> >* Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> >>* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> >>>From: root <root@localhost.localdomain>
> >>>
> >>>This is the 15th version of COLO (Still only support periodic checkpoint).
> >>>
> >>>Here is only COLO frame part, you can get the whole codes from github:
> >>>https://github.com/coloft/qemu/commits/colo-v2.6-periodic-mode
> >>>
> >>>There are little changes for this series except the network related part.
> >>
> >>I was looking at the time the guest is paused during COLO and
> >>was surprised to find one of the larger chunks was the time to reset
> >>the guest before loading each checkpoint;  I've traced it part way, the
> >>biggest contributors for my test VM seem to be:
> >>
> >>   3.8ms  pcibus_reset: VGA
> >>   1.8ms  pcibus_reset: virtio-net-pci
> >>   1.5ms  pcibus_reset: virtio-blk-pci
> >>   1.5ms  qemu_devices_reset: piix4_reset
> >>   1.1ms  pcibus_reset: piix3-ide
> >>   1.1ms  pcibus_reset: virtio-rng-pci
> >>
> >>I've not looked deeper yet, but some of these are very silly;
> >>I'm running with -nographic so why it's taking 3.8ms to reset VGA is
> >>going to be interesting.
> >>Also, my only block device is the virtio-blk, so while I understand the
> >>standard PC machine has the IDE controller, why it takes it over a ms
> >>to reset an unused device.
> >
> >OK, so I've dug a bit deeper, and it appears that it's the changes in
> >PCI bars that actually take the time;  every time we do a reset we
> >reset all the BARs, this causes it to do a pci_update_mappings and
> >end up doing a memory_region_del_subregion.
> >Then we load the config space of the PCI device as we do the vmstate_load,
> >and this recreates all the mappings again.
> >
> >I'm not sure what the fix is, but that sounds like it would
> >speed up the checkpoints usefully if we can avoid the map/remap when
> >they're the same.
> >
> 
> Interesting, and thanks for your report.
> 
> We already known qemu_system_reset() is a time-consuming function, we shouldn't
> call it here, but if we didn't do that, there will be a bug, which we have
> reported before in the previous COLO series, the bellow is the copy of the related
> patch comment:
> 
>     COLO VMstate: Load VM state into qsb before restore it
> 
>     We should not destroy the state of secondary until we receive the whole
>     state from the primary, in case the primary fails in the middle of sending
>     the state, so, here we cache the device state in Secondary before restore it.
> 
>     Besides, we should call qemu_system_reset() before load VM state,
>     which can ensure the data is intact.
>     Note: If we discard qemu_system_reset(), there will be some odd error,
>     For exmple, qemu in slave side crashes and reports:
> 
>     [... KVM register dump snipped; see the original patch comment above ...]
>     ERROR: invalid runstate transition: 'internal-error' -> 'colo'
> 
>     The reason is, some of the device state will be ignored when saving device state to slave,
>     if the corresponding data is in its initial value, such as 0.
>     But the device state in slave maybe in initialized value, after a loop of checkpoint,
>     there will be inconsistent for the value of device state.
>     This will happen when the PVM reboot or SVM run ahead of PVM in the startup process.
>     Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>     Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>     Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>     Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com
> 
> As described above, some values of the device state are zero, they will be
> ignored during  migration, it has no problem for normal migration, because
> for the VM in destination, the initial values will be zero too. But for COLO,
> there are more than one round of migration, the related values may be changed
> from no-zero to zero, they will be ignored too in the next checkpoint, the
> VMstate will be inconsistent for SVM.

Yes, this doesn't really surprise me; a lot of the migration code does weird things
like this, so reset is the safest way.

> The above error is caused directly by wrong value of 'async_pf_en_msr'.
> 
> static const VMStateDescription vmstate_async_pf_msr = {
>     .name = "cpu/async_pf_msr",
>     .version_id = 1,
>     .minimum_version_id = 1,
>     .needed = async_pf_msr_needed,
>     .fields = (VMStateField[]) {
>         VMSTATE_UINT64(env.async_pf_en_msr, X86CPU),
>         VMSTATE_END_OF_LIST()
>     }
> };
> 
> static bool async_pf_msr_needed(void *opaque)
> {
>     X86CPU *cpu = opaque;
> 
>     return cpu->env.async_pf_en_msr != 0;
> }
> 
> Some other VMstate of registers in CPUX86State have the same problem,
> we can't make sure they won't cause any problems if the values of them
> are incorrect.
> So here, we just simply call qemu_system_reset() to avoid the inconsistent
> problem.
> Besides, compared with the most time-consuming operation (ram flushed from
> COLO cache to SVM). The time consuming for qemu_system_reset() seems to be
> acceptable ;)

I've got a patch where I've tried to multithread the flush - it's made it a little
faster, but not as much as I hoped (~20ms down to ~16ms using 4 cores)
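
For reference, the rough shape of it (a simplified, self-contained
sketch, not the actual patch; the flat page arrays stand in for the walk
over the COLO cache's dirty bitmap):

#include <pthread.h>
#include <string.h>

#define NR_FLUSH_THREADS 4

struct flush_job {
    char **cache_pages;   /* source: COLO cache */
    char **guest_pages;   /* destination: SVM RAM */
    size_t page_size;
    size_t start, end;    /* half-open range of dirty page indexes */
};

static void *flush_worker(void *opaque)
{
    struct flush_job *job = opaque;
    size_t i;

    for (i = job->start; i < job->end; i++) {
        memcpy(job->guest_pages[i], job->cache_pages[i], job->page_size);
    }
    return NULL;
}

static void flush_cache_parallel(char **cache, char **guest,
                                 size_t npages, size_t page_size)
{
    pthread_t tids[NR_FLUSH_THREADS];
    struct flush_job jobs[NR_FLUSH_THREADS];
    size_t chunk = (npages + NR_FLUSH_THREADS - 1) / NR_FLUSH_THREADS;
    int t;

    for (t = 0; t < NR_FLUSH_THREADS; t++) {
        size_t start = t * chunk;
        size_t end = start + chunk < npages ? start + chunk : npages;

        jobs[t] = (struct flush_job) {
            .cache_pages = cache, .guest_pages = guest,
            .page_size = page_size, .start = start, .end = end,
        };
        pthread_create(&tids[t], NULL, flush_worker, &jobs[t]);
    }
    for (t = 0; t < NR_FLUSH_THREADS; t++) {
        pthread_join(tids[t], NULL);
    }
}

The guest stays paused for the whole flush either way, which is
presumably why the win is bounded by memory bandwidth rather than by the
thread count.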

> Another choice to fix the problem is to save the VMstate ignoring the needed()
> return value, but this method is not so graceful.
> diff --git a/migration/vmstate.c b/migration/vmstate.c
> index e5388f0..7d15bba 100644
> --- a/migration/vmstate.c
> +++ b/migration/vmstate.c
> @@ -409,7 +409,7 @@ static void vmstate_subsection_save(QEMUFile *f, const VMStateDescription *vmsd,
>      bool subsection_found = false;
> 
>      while (sub && sub->needed) {
> -        if (sub->needed(opaque)) {
> +        if (sub->needed(opaque) || migrate_in_colo_state()) {
>              const VMStateDescription *vmsd = sub->vmsd;
>              uint8_t len;

Maybe, but I suspect this will also find other strange cases in devices.
For example, without the reset I wouldn't really trust devices to load the
new state in; they wouldn't have been tested with all the states they
might have left themselves in previously.

Dave



> [...]
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
  2016-02-29  9:47       ` Dr. David Alan Gilbert
@ 2016-02-29 12:16         ` Hailiang Zhang
  2016-02-29 13:04           ` Dr. David Alan Gilbert
  2016-03-01 12:25           ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 52+ messages in thread
From: Hailiang Zhang @ 2016-02-29 12:16 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, qemu-devel, arei.gonglei, stefanha,
	amit.shah, zhangchen.fnst, hongyang.yang

On 2016/2/29 17:47, Dr. David Alan Gilbert wrote:
> * Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
>> On 2016/2/27 0:36, Dr. David Alan Gilbert wrote:
>>> * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
>>>> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>>>>> From: root <root@localhost.localdomain>
>>>>>
>>>>> This is the 15th version of COLO (Still only support periodic checkpoint).
>>>>>
>>>>> Here is only COLO frame part, you can get the whole codes from github:
>>>>> https://github.com/coloft/qemu/commits/colo-v2.6-periodic-mode
>>>>>
>>>>> There are little changes for this series except the network related part.
>>>>
>>>> I was looking at the time the guest is paused during COLO and
>>>> was surprised to find one of the larger chunks was the time to reset
>>>> the guest before loading each checkpoint;  I've traced it part way, the
>>>> biggest contributors for my test VM seem to be:
>>>>
>>>>    3.8ms  pcibus_reset: VGA
>>>>    1.8ms  pcibus_reset: virtio-net-pci
>>>>    1.5ms  pcibus_reset: virtio-blk-pci
>>>>    1.5ms  qemu_devices_reset: piix4_reset
>>>>    1.1ms  pcibus_reset: piix3-ide
>>>>    1.1ms  pcibus_reset: virtio-rng-pci
>>>>
>>>> I've not looked deeper yet, but some of these are very silly;
>>>> I'm running with -nographic so why it's taking 3.8ms to reset VGA is
>>>> going to be interesting.
>>>> Also, my only block device is the virtio-blk, so while I understand the
>>>> standard PC machine has the IDE controller, why it takes it over a ms
>>>> to reset an unused device.
>>>
>>> OK, so I've dug a bit deeper, and it appears that it's the changes in
>>> PCI bars that actually take the time;  every time we do a reset we
>>> reset all the BARs, this causes it to do a pci_update_mappings and
>>> end up doing a memory_region_del_subregion.
>>> Then we load the config space of the PCI device as we do the vmstate_load,
>>> and this recreates all the mappings again.
>>>
>>> I'm not sure what the fix is, but that sounds like it would
>>> speed up the checkpoints usefully if we can avoid the map/remap when
>>> they're the same.
>>>
>>
>> Interesting, and thanks for your report.
>>
>> We already known qemu_system_reset() is a time-consuming function, we shouldn't
>> call it here, but if we didn't do that, there will be a bug, which we have
>> reported before in the previous COLO series, the bellow is the copy of the related
>> patch comment:
>>
>>      COLO VMstate: Load VM state into qsb before restore it
>>
>>      We should not destroy the state of secondary until we receive the whole
>>      state from the primary, in case the primary fails in the middle of sending
>>      the state, so, here we cache the device state in Secondary before restore it.
>>
>>      Besides, we should call qemu_system_reset() before load VM state,
>>      which can ensure the data is intact.
>>      Note: If we discard qemu_system_reset(), there will be some odd error,
>>      For exmple, qemu in slave side crashes and reports:
>>
>>      [... KVM register dump snipped ...]
>>      ERROR: invalid runstate transition: 'internal-error' -> 'colo'
>>
>>      The reason is that some parts of the device state are ignored when saving
>>      the device state to the slave, if the corresponding data still holds its
>>      initial value, such as 0. But the device state in the slave may no longer
>>      hold the initial value; after a checkpoint loop, the device state values
>>      become inconsistent. This happens when the PVM reboots, or when the SVM
>>      runs ahead of the PVM in the startup process.
>>      Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>      Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>>      Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>>      Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>
>> As described above, some values of the device state are zero and will be
>> ignored during migration. That is no problem for normal migration, because
>> for the VM on the destination the initial values are zero too. But COLO
>> performs more than one round of migration: the related values may change
>> from non-zero to zero, be ignored again in the next checkpoint, and leave
>> the VMstate on the SVM inconsistent.
>
> Yes, this doesn't really surprise me; a lot of the migration code does weird things
> like this, so reset is the safest way.
>

>> The above error is caused directly by wrong value of 'async_pf_en_msr'.
>>
>> static const VMStateDescription vmstate_async_pf_msr = {
>>      .name = "cpu/async_pf_msr",
>>      .version_id = 1,
>>      .minimum_version_id = 1,
>>      .needed = async_pf_msr_needed,
>>      .fields = (VMStateField[]) {
>>          VMSTATE_UINT64(env.async_pf_en_msr, X86CPU),
>>          VMSTATE_END_OF_LIST()
>>      }
>> };
>>
>> static bool async_pf_msr_needed(void *opaque)
>> {
>>      X86CPU *cpu = opaque;
>>
>>      return cpu->env.async_pf_en_msr != 0;
>> }
>>
>> The VMstate of some other registers in CPUX86State has the same problem,
>> and we can't be sure they won't cause any trouble if their values are
>> incorrect.
>> So here we simply call qemu_system_reset() to avoid the inconsistency.
>> Besides, compared with the most time-consuming operation (flushing RAM
>> from the COLO cache into the SVM), the time spent in qemu_system_reset()
>> seems acceptable ;)
>
> I've got a patch where I've tried to multithread the flush - it's made it a little
> faster, but not as much as I hoped (~20ms down to ~16ms using 4 cores)
>

Hmm, that seems like a good idea. After switching to COLO (hybrid) mode we
will in most cases get many more dirtied pages than in periodic mode, because
the delay between two checkpoints is usually longer.
Multi-threaded flushing may gain much more in that case, but I suspect that
in some bad cases users still can't bear the pause time.
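
A minimal sketch of the multi-threaded flush idea, assuming the dirty pages
recorded during the checkpoint are available as a plain array of byte offsets
(these names are made up for illustration, they are not the real COLO/QEMU
functions):

#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define COLO_PAGE_SIZE 4096

typedef struct FlushJob {
    uint8_t *cache;           /* COLO RAM cache (the checkpointed copy) */
    uint8_t *guest_ram;       /* the SVM's live RAM */
    const uint64_t *offsets;  /* dirty-page offsets, in bytes */
    size_t start, end;        /* this worker's slice of the array */
} FlushJob;

static void *flush_worker(void *opaque)
{
    FlushJob *job = opaque;
    size_t i;

    for (i = job->start; i < job->end; i++) {
        memcpy(job->guest_ram + job->offsets[i],
               job->cache + job->offsets[i], COLO_PAGE_SIZE);
    }
    return NULL;
}

/* Split the dirty-page list evenly across nthreads workers. */
static void colo_flush_ram_cache_mt(uint8_t *cache, uint8_t *guest_ram,
                                    const uint64_t *offsets, size_t npages,
                                    int nthreads)
{
    pthread_t tids[nthreads];
    FlushJob jobs[nthreads];
    size_t chunk = (npages + nthreads - 1) / nthreads;
    int t;

    for (t = 0; t < nthreads; t++) {
        size_t start = t * chunk < npages ? t * chunk : npages;
        size_t end = start + chunk < npages ? start + chunk : npages;

        jobs[t] = (FlushJob){ cache, guest_ram, offsets, start, end };
        pthread_create(&tids[t], NULL, flush_worker, &jobs[t]);
    }
    for (t = 0; t < nthreads; t++) {
        pthread_join(tids[t], NULL);
    }
}

The pages are disjoint, so the workers need no locking; the copy is bound by
memory bandwidth though, which would explain why four cores only brought the
time from ~20ms down to ~16ms.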

Actually, we have thought about this problem for a long time.
In our early tests based on the kernel COLO-proxy, we easily got more than
one second of flushing time; IMHO, users can't bear such a long VM pause if
they choose to use COLO.

We have designed another scenario, based on userfault's page-miss capability.
The basic idea is to convert the flushing action into a marking action; the
flush itself is then processed while the SVM is running. For now it is only
an idea, and we'd like to verify it first. (I'm not quite sure userfault's
page-miss feature performs well when we use it to mark pages as missing one
page at a time.)
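
To make the marking idea concrete, a bare-bones sketch of the kernel interface
we would build on (plain userfaultfd(2), available since Linux 4.3, in
page-MISSING mode; error handling is omitted, and a real implementation would
first have to drop the stale SVM pages, e.g. with madvise(MADV_DONTNEED), so
that the next access actually faults):

#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <poll.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

static int uffd_register(void *area, size_t len)
{
    int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    struct uffdio_api api = { .api = UFFD_API };
    struct uffdio_register reg = {
        .range = { .start = (unsigned long)area, .len = len },
        .mode  = UFFDIO_REGISTER_MODE_MISSING,
    };

    ioctl(uffd, UFFDIO_API, &api);
    ioctl(uffd, UFFDIO_REGISTER, &reg);
    return uffd;
}

/* Resolve each miss by copying the checkpointed page in on demand. */
static void handle_misses(int uffd, void *area, const void *cache,
                          size_t page_size)
{
    struct pollfd pfd = { .fd = uffd, .events = POLLIN };
    struct uffd_msg msg;

    while (poll(&pfd, 1, -1) > 0) {
        if (read(uffd, &msg, sizeof(msg)) != sizeof(msg) ||
            msg.event != UFFD_EVENT_PAGEFAULT) {
            continue;
        }
        unsigned long addr = msg.arg.pagefault.address & ~(page_size - 1);
        struct uffdio_copy copy = {
            .dst = addr,
            .src = (unsigned long)cache + (addr - (unsigned long)area),
            .len = page_size,
        };
        ioctl(uffd, UFFDIO_COPY, &copy);  /* also wakes the faulting thread */
    }
}

Every miss costs a fault plus a round trip to the handler thread, which is
exactly the per-page overhead I'm unsure about above.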


Thanks,
Hailiang

>> Another way to fix the problem is to save the VMstate regardless of the
>> needed() return value, but this method is not so graceful.
>> diff --git a/migration/vmstate.c b/migration/vmstate.c
>> index e5388f0..7d15bba 100644
>> --- a/migration/vmstate.c
>> +++ b/migration/vmstate.c
>> @@ -409,7 +409,7 @@ static void vmstate_subsection_save(QEMUFile *f, const VMStateDescription *vmsd,
>>       bool subsection_found = false;
>>
>>       while (sub && sub->needed) {
>> -        if (sub->needed(opaque)) {
>> +        if (sub->needed(opaque) || migrate_in_colo_state()) {
>>               const VMStateDescription *vmsd = sub->vmsd;
>>               uint8_t len;
>
> Maybe, but I suspect this will also find other strange cases in devices.
> For example, without the reset I wouldn't really trust devices to load the
> new state in; they wouldn't have been tested with all the states they
> might have left themselves in previously.
>
> Dave
>
>
>
>> Thanks,
>> Hailiang
>>
>>> Dave
>>>
>>>>
>>>> I guess reset is normally off anyone's radar since it's outside
>>>> the time anyone cares about, but perhaps the guys trying
>>>> to make qemu start really quickly would be interested.
>>>>
>>>> Dave
>>>>
>>>>>
>>>>> Patch status:
>>>>> Unreviewed: patch 21,27,28,29,33,38
>>>>> Updated: patch 31,34,35,37
>>>>>
>>>>> TODO:
>>>>> 1. Checkpoint based on proxy in qemu
>>>>> 2. The capability of continuous FT
>>>>> 3. Optimize the VM's downtime during checkpoint
>>>>>
>>>>> v15:
>>>>>   - Go on the shutdown process if encounter error while sending shutdown
>>>>>     message to SVM. (patch 24)
>>>>>   - Rename qemu_need_skip_netfilter to qemu_netfilter_can_skip and Remove
>>>>>     some useless comment. (patch 31, Jason)
>>>>>   - Call object_new_with_props() directly to add filter in
>>>>>     colo_add_buffer_filter. (patch 34, Jason)
>>>>>   - Re-implement colo_set_filter_status() based on COLOBufferFilters
>>>>>     list. (patch 35)
>>>>>   - Re-implement colo_flush_filter_packets() based on COLOBufferFilters
>>>>>     list. (patch 37)
>>>>> v14:
>>>>>   - Re-implement the network processing based on netfilter (Jason Wang)
>>>>>   - Rename 'COLOCommand' to 'COLOMessage'. (Markus's suggestion)
>>>>>   - Split two new patches (patch 27/28) from patch 29
>>>>>   - Fix some other comments from Dave and Markus.
>>>>>
>>>>> v13:
>>>>>   - Refactor colo_*_cmd helper functions to use 'Error **errp' parameter
>>>>>    instead of return value to indicate success or failure. (patch 10)
>>>>>   - Remove the optional error message for COLO_EXIT event. (patch 25)
>>>>>   - Use semaphore to notify colo/colo incoming loop that failover work is
>>>>>     finished. (patch 26)
>>>>>   - Move COLO shutdown related codes to colo.c file. (patch 28)
>>>>>   - Fix memory leak bug for colo incoming loop. (new patch 31)
>>>>>   - Re-use some existed helper functions to realize the process of
>>>>>     saving/loading ram and device. (patch 32)
>>>>>   - Fix some other comments from Dave and Markus.
>>>>>
>>>>> zhanghailiang (38):
>>>>>    configure: Add parameter for configure to enable/disable COLO support
>>>>>    migration: Introduce capability 'x-colo' to migration
>>>>>    COLO: migrate colo related info to secondary node
>>>>>    migration: Integrate COLO checkpoint process into migration
>>>>>    migration: Integrate COLO checkpoint process into loadvm
>>>>>    COLO/migration: Create a new communication path from destination to
>>>>>      source
>>>>>    COLO: Implement colo checkpoint protocol
>>>>>    COLO: Add a new RunState RUN_STATE_COLO
>>>>>    QEMUSizedBuffer: Introduce two help functions for qsb
>>>>>    COLO: Save PVM state to secondary side when do checkpoint
>>>>>    COLO: Load PVM's dirty pages into SVM's RAM cache temporarily
>>>>>    ram/COLO: Record the dirty pages that SVM received
>>>>>    COLO: Load VMState into qsb before restore it
>>>>>    COLO: Flush PVM's cached RAM into SVM's memory
>>>>>    COLO: Add checkpoint-delay parameter for migrate-set-parameters
>>>>>    COLO: synchronize PVM's state to SVM periodically
>>>>>    COLO failover: Introduce a new command to trigger a failover
>>>>>    COLO failover: Introduce state to record failover process
>>>>>    COLO: Implement failover work for Primary VM
>>>>>    COLO: Implement failover work for Secondary VM
>>>>>    qmp event: Add COLO_EXIT event to notify users while exited from COLO
>>>>>    COLO failover: Shutdown related socket fd when do failover
>>>>>    COLO failover: Don't do failover during loading VM's state
>>>>>    COLO: Process shutdown command for VM in COLO state
>>>>>    COLO: Update the global runstate after going into colo state
>>>>>    savevm: Introduce two helper functions for save/find loadvm_handlers
>>>>>      entry
>>>>>    migration/savevm: Add new helpers to process the different stages of
>>>>>      loadvm
>>>>>    migration/savevm: Export two helper functions for savevm process
>>>>>    COLO: Separate the process of saving/loading ram and device state
>>>>>    COLO: Split qemu_savevm_state_begin out of checkpoint process
>>>>>    net/filter: Add a 'status' property for filter object
>>>>>    filter-buffer: Accept zero interval
>>>>>    net: Add notifier/callback for netdev init
>>>>>    COLO/filter: add each netdev a buffer filter
>>>>>    COLO: manage the status of buffer filters for PVM
>>>>>    filter-buffer: make filter_buffer_flush() public
>>>>>    COLO: flush buffered packets in checkpoint process or exit COLO
>>>>>    COLO: Add block replication into colo process
>>>>>
>>>>>   configure                     |  11 +
>>>>>   docs/qmp-events.txt           |  16 +
>>>>>   hmp-commands.hx               |  15 +
>>>>>   hmp.c                         |  15 +
>>>>>   hmp.h                         |   1 +
>>>>>   include/exec/ram_addr.h       |   1 +
>>>>>   include/migration/colo.h      |  42 ++
>>>>>   include/migration/failover.h  |  33 ++
>>>>>   include/migration/migration.h |  16 +
>>>>>   include/migration/qemu-file.h |   3 +-
>>>>>   include/net/filter.h          |   5 +
>>>>>   include/net/net.h             |   4 +
>>>>>   include/sysemu/sysemu.h       |   9 +
>>>>>   migration/Makefile.objs       |   2 +
>>>>>   migration/colo-comm.c         |  76 ++++
>>>>>   migration/colo-failover.c     |  83 ++++
>>>>>   migration/colo.c              | 866 ++++++++++++++++++++++++++++++++++++++++++
>>>>>   migration/migration.c         | 109 +++++-
>>>>>   migration/qemu-file-buf.c     |  61 +++
>>>>>   migration/ram.c               | 175 ++++++++-
>>>>>   migration/savevm.c            | 114 ++++--
>>>>>   net/filter-buffer.c           |  14 +-
>>>>>   net/filter.c                  |  40 ++
>>>>>   net/net.c                     |  33 ++
>>>>>   qapi-schema.json              | 104 ++++-
>>>>>   qapi/event.json               |  15 +
>>>>>   qemu-options.hx               |   4 +-
>>>>>   qmp-commands.hx               |  23 +-
>>>>>   stubs/Makefile.objs           |   1 +
>>>>>   stubs/migration-colo.c        |  54 +++
>>>>>   trace-events                  |   8 +
>>>>>   vl.c                          |  31 +-
>>>>>   32 files changed, 1908 insertions(+), 76 deletions(-)
>>>>>   create mode 100644 include/migration/colo.h
>>>>>   create mode 100644 include/migration/failover.h
>>>>>   create mode 100644 migration/colo-comm.c
>>>>>   create mode 100644 migration/colo-failover.c
>>>>>   create mode 100644 migration/colo.c
>>>>>   create mode 100644 stubs/migration-colo.c
>>>>>
>>>>> --
>>>>> 1.8.3.1
>>>>>
>>>>>
>>>> --
>>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>
>>> .
>>>
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
  2016-02-29 12:16         ` Hailiang Zhang
@ 2016-02-29 13:04           ` Dr. David Alan Gilbert
  2016-03-01 12:25           ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 52+ messages in thread
From: Dr. David Alan Gilbert @ 2016-02-29 13:04 UTC (permalink / raw)
  To: Hailiang Zhang
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, qemu-devel, arei.gonglei, stefanha,
	amit.shah, zhangchen.fnst, hongyang.yang

* Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
> On 2016/2/29 17:47, Dr. David Alan Gilbert wrote:
> >* Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
> >>On 2016/2/27 0:36, Dr. David Alan Gilbert wrote:
> >>>* Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:

> >I've got a patch where I've tried to multithread the flush - it's made it a little
> >faster, but not as much as I hoped (~20ms down to ~16ms using 4 cores)
> >
> 
> Hmm, that seems like a good idea. After switching to COLO (hybrid) mode we
> will in most cases get many more dirtied pages than in periodic mode, because
> the delay between two checkpoints is usually longer.
> Multi-threaded flushing may gain much more in that case, but I suspect that
> in some bad cases users still can't bear the pause time.
> 
> Actually, we have thought about this problem for a long time.
> In our early tests based on the kernel COLO-proxy, we easily got more than
> one second of flushing time; IMHO, users can't bear such a long VM pause if
> they choose to use COLO.

Yes, that's just too long; though solving the 'flushing' time alone isn't enough
in those cases, because the same cases will probably need to transfer lots of
RAM over the wire as well.

> We have designed another scenario, based on userfault's page-miss capability.
> The basic idea is to convert the flushing action into a marking action; the
> flush itself is then processed while the SVM is running. For now it is only
> an idea, and we'd like to verify it first. (I'm not quite sure userfault's
> page-miss feature performs well when we use it to mark pages as missing one
> page at a time.)

Yes, it's a different trade off, slower execution, but no flush time.

Dave

> 
> 
> Thanks,
> Hailiang
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
  2016-02-29 12:16         ` Hailiang Zhang
  2016-02-29 13:04           ` Dr. David Alan Gilbert
@ 2016-03-01 12:25           ` Dr. David Alan Gilbert
  2016-03-02 13:01             ` Hailiang Zhang
  1 sibling, 1 reply; 52+ messages in thread
From: Dr. David Alan Gilbert @ 2016-03-01 12:25 UTC (permalink / raw)
  To: Hailiang Zhang
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, qemu-devel, arei.gonglei, stefanha,
	pbonzini, amit.shah, zhangchen.fnst, hongyang.yang

* Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
> On 2016/2/29 17:47, Dr. David Alan Gilbert wrote:
> >* Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
> >>On 2016/2/27 0:36, Dr. David Alan Gilbert wrote:
> >>>* Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> >>>>* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> >>>>>From: root <root@localhost.localdomain>
> >>>>>
> >>>>>This is the 15th version of COLO (Still only support periodic checkpoint).
> >>>>>
> >>>>>Here is only COLO frame part, you can get the whole codes from github:
> >>>>>https://github.com/coloft/qemu/commits/colo-v2.6-periodic-mode
> >>>>>
> >>>>>There are little changes for this series except the network releated part.
> >>>>
> >>>>I was looking at the time the guest is paused during COLO and
> >>>>was surprised to find one of the larger chunks was the time to reset
> >>>>the guest before loading each checkpoint;  I've traced it part way, the
> >>>>biggest contributors for my test VM seem to be:
> >>>>
> >>>>   3.8ms  pcibus_reset: VGA
> >>>>   1.8ms  pcibus_reset: virtio-net-pci
> >>>>   1.5ms  pcibus_reset: virtio-blk-pci
> >>>>   1.5ms  qemu_devices_reset: piix4_reset
> >>>>   1.1ms  pcibus_reset: piix3-ide
> >>>>   1.1ms  pcibus_reset: virtio-rng-pci
> >>>>
> >>>>I've not looked deeper yet, but some of these are very silly;
> >>>>I'm running with -nographic so why it's taking 3.8ms to reset VGA is
> >>>>going to be interesting.
> >>>>Also, my only block device is the virtio-blk, so while I understand the
> >>>>standard PC machine has the IDE controller, it's unclear why it takes
> >>>>over a ms to reset an unused device.
> >>>
> >>>>>OK, so I've dug a bit deeper, and it appears that it's the changes in
> >>>>>PCI BARs that actually take the time; every time we do a reset we
> >>>>>reset all the BARs, which causes a pci_update_mappings and
> >>>>>ends up doing a memory_region_del_subregion.
> >>>Then we load the config space of the PCI device as we do the vmstate_load,
> >>>and this recreates all the mappings again.
> >>>
> >>>I'm not sure what the fix is, but that sounds like it would
> >>>speed up the checkpoints usefully if we can avoid the map/remap when
> >>>they're the same.
> >>>
> >>
> >>Interesting, and thanks for your report.
> >>
> >>We already know qemu_system_reset() is a time-consuming function and we
> >>shouldn't call it here, but if we don't call it, there will be a bug,
> >>which we reported before in the previous COLO series; below is a copy of
> >>the related patch comment:

Paolo suggested one fix, see the patch below; I'm not sure if it's safe
(in particular if the guest changed a BAR and the device code tried to access
the memory while loading the state???) - but it does seem to work and shaves
~10ms off the reset/load times:

Dave

commit 7570b2984143860005ad9fe79f5394c75f294328
Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
Date:   Tue Mar 1 12:08:14 2016 +0000

    COLO: Lock memory map around reset/load
    
    Changing the memory map appears to be expensive; we see this
    particularly when, on loading a checkpoint, we:
       a) reset the devices
          This causes PCI bars to be reset
       b) Loading the device states
          This causes the PCI bars to be reloaded.
    
    Turning this all into a single memory_region_transaction saves
     ~10ms/checkpoint.
    
    TBD: What happens if the device code accesses the RAM during loading
    the checkpoint?
    
    Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
    Suggested-by: Paolo Bonzini <pbonzini@redhat.com>

diff --git a/migration/colo.c b/migration/colo.c
index 45c3432..c44fb2a 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -22,6 +22,7 @@
 #include "net/colo-proxy.h"
 #include "net/net.h"
 #include "block/block_int.h"
+#include "exec/memory.h"
 
 static bool vmstate_loading;
 
@@ -934,6 +935,7 @@ void *colo_process_incoming_thread(void *opaque)
 
         stage_time_start = qemu_clock_get_us(QEMU_CLOCK_HOST);
         qemu_mutex_lock_iothread();
+        memory_region_transaction_begin();
         qemu_system_reset(VMRESET_SILENT);
         stage_time_end = qemu_clock_get_us(QEMU_CLOCK_HOST);
         timed_average_account(&mis->colo_state.time_reset,
@@ -947,6 +949,7 @@ void *colo_process_incoming_thread(void *opaque)
                           stage_time_end - stage_time_start);
         stage_time_start = stage_time_end;
         ret = qemu_load_device_state(fb);
+        memory_region_transaction_commit();
         if (ret < 0) {
             error_report("COLO: load device state failed\n");
             vmstate_loading = false;
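
(For reference, the reason a single transaction helps: begin/commit is just a
depth counter in memory.c, and the expensive flat-view rebuild is deferred to
the outermost commit.  A simplified model of it, not the actual code:)

#include <assert.h>
#include <stdbool.h>

static unsigned memory_region_transaction_depth;
static bool memory_region_update_pending;

/* Stand-in for the real work: regenerating the FlatView of every
 * AddressSpace and pushing it to the listeners (KVM slots etc.). */
static void rebuild_flat_views(void) { }

void memory_region_transaction_begin(void)
{
    ++memory_region_transaction_depth;
}

void memory_region_transaction_commit(void)
{
    assert(memory_region_transaction_depth);
    --memory_region_transaction_depth;
    if (!memory_region_transaction_depth && memory_region_update_pending) {
        rebuild_flat_views();   /* once, instead of once per BAR change */
        memory_region_update_pending = false;
    }
}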

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
  2016-03-01 12:25           ` Dr. David Alan Gilbert
@ 2016-03-02 13:01             ` Hailiang Zhang
  2016-03-03 20:13               ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 52+ messages in thread
From: Hailiang Zhang @ 2016-03-02 13:01 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, qemu-devel, arei.gonglei, stefanha,
	pbonzini, amit.shah, zhangchen.fnst, hongyang.yang

On 2016/3/1 20:25, Dr. David Alan Gilbert wrote:
> * Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
>> On 2016/2/29 17:47, Dr. David Alan Gilbert wrote:
>>> * Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
>>>> On 2016/2/27 0:36, Dr. David Alan Gilbert wrote:
>>>>> * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
>>>>>> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>>>>>>> From: root <root@localhost.localdomain>
>>>>>>>
>>>>>>> This is the 15th version of COLO (Still only support periodic checkpoint).
>>>>>>>
>>>>>>> Here is only COLO frame part, you can get the whole codes from github:
>>>>>>> https://github.com/coloft/qemu/commits/colo-v2.6-periodic-mode
>>>>>>>
>>>>>>> There are little changes for this series except the network releated part.
>>>>>>
>>>>>> I was looking at the time the guest is paused during COLO and
>>>>>> was surprised to find one of the larger chunks was the time to reset
>>>>>> the guest before loading each checkpoint;  I've traced it part way, the
>>>>>> biggest contributors for my test VM seem to be:
>>>>>>
>>>>>>    3.8ms  pcibus_reset: VGA
>>>>>>    1.8ms  pcibus_reset: virtio-net-pci
>>>>>>    1.5ms  pcibus_reset: virtio-blk-pci
>>>>>>    1.5ms  qemu_devices_reset: piix4_reset
>>>>>>    1.1ms  pcibus_reset: piix3-ide
>>>>>>    1.1ms  pcibus_reset: virtio-rng-pci
>>>>>>
>>>>>> I've not looked deeper yet, but some of these are very silly;
>>>>>> I'm running with -nographic so why it's taking 3.8ms to reset VGA is
>>>>>> going to be interesting.
>>>>>> Also, my only block device is the virtio-blk, so while I understand the
>>>>>> standard PC machine has the IDE controller, it's unclear why it takes
>>>>>> over a ms to reset an unused device.
>>>>>
>>>>> OK, so I've dug a bit deeper, and it appears that it's the changes in
>>>>> PCI BARs that actually take the time; every time we do a reset we
>>>>> reset all the BARs, which causes a pci_update_mappings and
>>>>> ends up doing a memory_region_del_subregion.
>>>>> Then we load the config space of the PCI device as we do the vmstate_load,
>>>>> and this recreates all the mappings again.
>>>>>
>>>>> I'm not sure what the fix is, but that sounds like it would
>>>>> speed up the checkpoints usefully if we can avoid the map/remap when
>>>>> they're the same.
>>>>>
>>>>
>>>> Interesting, and thanks for your report.
>>>>
>>>> We already know qemu_system_reset() is a time-consuming function and we
>>>> shouldn't call it here, but if we don't call it, there will be a bug,
>>>> which we reported before in the previous COLO series; below is a copy of
>>>> the related patch comment:
>
> Paolo suggested one fix, see the patch below; I'm not sure if it's safe
> (in particular if the guest changed a BAR and the device code tried to access
> the memory while loading the state???) - but it does seem to work and shaves
> ~10ms off the reset/load times:
>

Nice work, I also tested it, and it is a good improvement. I'm still wondering
whether it is safe here, but it should be safe to apply to qemu_system_reset()
independently (I tested that too; it shaves about 5ms off).
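
Something like this, perhaps (a sketch against the vl.c of that era, with the
machine-specific reset hook and the post-reset bookkeeping elided; untested,
just to show where the begin/commit would sit):

void qemu_system_reset(bool report)
{
    cpu_synchronize_all_states();

    memory_region_transaction_begin();
    qemu_devices_reset();               /* run every registered reset handler */
    memory_region_transaction_commit(); /* one flat-view rebuild for all BARs */

    /* ... RESET event reporting and cpu_synchronize_all_post_reset()
     * unchanged ... */
}

That would pick up the ~5ms for every reset user, not only COLO, though the
same question applies: nothing may access guest memory through the stale map
between the begin and the commit.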

Hailiang

> Dave
>
> commit 7570b2984143860005ad9fe79f5394c75f294328
> Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Date:   Tue Mar 1 12:08:14 2016 +0000
>
>      COLO: Lock memory map around reset/load
>
>      Changing the memory map appears to be expensive; we see this
>      particularly when, on loading a checkpoint, we:
>         a) reset the devices
>            This causes PCI bars to be reset
>         b) Loading the device states
>            This causes the PCI bars to be reloaded.
>
>      Turning this all into a single memory_region_transaction saves
>       ~10ms/checkpoint.
>
>      TBD: What happens if the device code accesses the RAM during loading
>      the checkpoint?
>
>      Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
>
> diff --git a/migration/colo.c b/migration/colo.c
> index 45c3432..c44fb2a 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -22,6 +22,7 @@
>   #include "net/colo-proxy.h"
>   #include "net/net.h"
>   #include "block/block_int.h"
> +#include "exec/memory.h"
>
>   static bool vmstate_loading;
>
> @@ -934,6 +935,7 @@ void *colo_process_incoming_thread(void *opaque)
>
>           stage_time_start = qemu_clock_get_us(QEMU_CLOCK_HOST);
>           qemu_mutex_lock_iothread();
> +        memory_region_transaction_begin();
>           qemu_system_reset(VMRESET_SILENT);
>           stage_time_end = qemu_clock_get_us(QEMU_CLOCK_HOST);
>           timed_average_account(&mis->colo_state.time_reset,
> @@ -947,6 +949,7 @@ void *colo_process_incoming_thread(void *opaque)
>                             stage_time_end - stage_time_start);
>           stage_time_start = stage_time_end;
>           ret = qemu_load_device_state(fb);
> +        memory_region_transaction_commit();
>           if (ret < 0) {
>               error_report("COLO: load device state failed\n");
>               vmstate_loading = false;
>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
  2016-03-02 13:01             ` Hailiang Zhang
@ 2016-03-03 20:13               ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 52+ messages in thread
From: Dr. David Alan Gilbert @ 2016-03-03 20:13 UTC (permalink / raw)
  To: Hailiang Zhang
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, qemu-devel, arei.gonglei, stefanha,
	pbonzini, amit.shah, zhangchen.fnst, hongyang.yang

* Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
> On 2016/3/1 20:25, Dr. David Alan Gilbert wrote:
> >* Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
> >>On 2016/2/29 17:47, Dr. David Alan Gilbert wrote:
> >>>* Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
> >>>>On 2016/2/27 0:36, Dr. David Alan Gilbert wrote:
> >>>>>* Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> >>>>>>* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> >>>>>>>From: root <root@localhost.localdomain>
> >>>>>>>
> >>>>>>>This is the 15th version of COLO (Still only support periodic checkpoint).
> >>>>>>>
> >>>>>>>Here is only COLO frame part, you can get the whole codes from github:
> >>>>>>>https://github.com/coloft/qemu/commits/colo-v2.6-periodic-mode
> >>>>>>>
> >>>>>>>There are little changes for this series except the network releated part.
> >>>>>>
> >>>>>>I was looking at the time the guest is paused during COLO and
> >>>>>>was surprised to find one of the larger chunks was the time to reset
> >>>>>>the guest before loading each checkpoint;  I've traced it part way, the
> >>>>>>biggest contributors for my test VM seem to be:
> >>>>>>
> >>>>>>   3.8ms  pcibus_reset: VGA
> >>>>>>   1.8ms  pcibus_reset: virtio-net-pci
> >>>>>>   1.5ms  pcibus_reset: virtio-blk-pci
> >>>>>>   1.5ms  qemu_devices_reset: piix4_reset
> >>>>>>   1.1ms  pcibus_reset: piix3-ide
> >>>>>>   1.1ms  pcibus_reset: virtio-rng-pci
> >>>>>>
> >>>>>>I've not looked deeper yet, but some of these are very silly;
> >>>>>>I'm running with -nographic so why it's taking 3.8ms to reset VGA is
> >>>>>>going to be interesting.
> >>>>>>Also, my only block device is the virtio-blk, so while I understand the
> >>>>>>standard PC machine has the IDE controller, it's unclear why it takes
> >>>>>>over a ms to reset an unused device.
> >>>>>
> >>>>>OK, so I've dug a bit deeper, and it appears that it's the changes in
> >>>>>PCI BARs that actually take the time; every time we do a reset we
> >>>>>reset all the BARs, which causes a pci_update_mappings and
> >>>>>ends up doing a memory_region_del_subregion.
> >>>>>Then we load the config space of the PCI device as we do the vmstate_load,
> >>>>>and this recreates all the mappings again.
> >>>>>
> >>>>>I'm not sure what the fix is, but that sounds like it would
> >>>>>speed up the checkpoints usefully if we can avoid the map/remap when
> >>>>>they're the same.
> >>>>>
> >>>>
> >>>>Interesting, and thanks for your report.
> >>>>
> >>>>We already know qemu_system_reset() is a time-consuming function and we
> >>>>shouldn't call it here, but if we don't call it, there will be a bug,
> >>>>which we reported before in the previous COLO series; below is a copy of
> >>>>the related patch comment:
> >
> >Paolo suggested one fix, see the patch below; I'm not sure if it's safe
> >(in particular if the guest changed a BAR and the device code tried to access
> >the memory while loading the state???) - but it does seem to work and shaves
> >~10ms off the reset/load times:
> >
> 
> Nice work, I also tested it, and it is a good improvement. I'm still wondering
> whether it is safe here, but it should be safe to apply to qemu_system_reset()
> independently (I tested that too; it shaves about 5ms off).

Yes, it seems quite nice.
I did find one VM today that won't boot with COLO with that change; it's
an Ubuntu VM that has a delay in GRUB, and when it takes the first
checkpoint while GRUB is still being displayed, it gets an error from
the inbound migration.

The error, from virtio-blk, is "VQ 0 size 0x80 Guest index 0x2444 inconsistent
with Host index 0x119e: delta 0x12a6" - so maybe virtio-blk is accessing the
memory during loading.
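
(That message comes from the vring sanity check in virtio_load(); roughly the
following, paraphrased from hw/virtio/virtio.c of that era, so treat the
details as approximate:)

    /* Per queue on load: re-read the avail index from guest RAM and
     * check it against the index carried in the saved device state. */
    nheads = vring_avail_idx(&vdev->vq[i]) - vdev->vq[i].last_avail_idx;
    if (nheads > vdev->vq[i].vring.num) {
        error_report("VQ %d size 0x%x Guest index 0x%x "
                     "inconsistent with Host index 0x%x: delta 0x%x",
                     i, vdev->vq[i].vring.num,
                     vring_avail_idx(&vdev->vq[i]),
                     vdev->vq[i].last_avail_idx, nheads);
        return -1;
    }

So if anything lets the device look at the rings while the checkpoint is only
partially loaded, the avail index read from guest RAM can run ahead of the
saved last_avail_idx, which would fit virtio-blk touching the memory during
loading.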

Dave

> Hailiang
> 
> >Dave
> >
> >commit 7570b2984143860005ad9fe79f5394c75f294328
> >Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >Date:   Tue Mar 1 12:08:14 2016 +0000
> >
> >     COLO: Lock memory map around reset/load
> >
> >     Changing the memory map appears to be expensive; we see this
> >     particularly when, on loading a checkpoint, we:
> >        a) reset the devices
> >           This causes PCI bars to be reset
> >        b) Loading the device states
> >           This causes the PCI bars to be reloaded.
> >
> >     Turning this all into a single memory_region_transaction saves
> >      ~10ms/checkpoint.
> >
> >     TBD: What happens if the device code accesses the RAM during loading
> >     the checkpoint?
> >
> >     Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >     Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
> >
> >diff --git a/migration/colo.c b/migration/colo.c
> >index 45c3432..c44fb2a 100644
> >--- a/migration/colo.c
> >+++ b/migration/colo.c
> >@@ -22,6 +22,7 @@
> >  #include "net/colo-proxy.h"
> >  #include "net/net.h"
> >  #include "block/block_int.h"
> >+#include "exec/memory.h"
> >
> >  static bool vmstate_loading;
> >
> >@@ -934,6 +935,7 @@ void *colo_process_incoming_thread(void *opaque)
> >
> >          stage_time_start = qemu_clock_get_us(QEMU_CLOCK_HOST);
> >          qemu_mutex_lock_iothread();
> >+        memory_region_transaction_begin();
> >          qemu_system_reset(VMRESET_SILENT);
> >          stage_time_end = qemu_clock_get_us(QEMU_CLOCK_HOST);
> >          timed_average_account(&mis->colo_state.time_reset,
> >@@ -947,6 +949,7 @@ void *colo_process_incoming_thread(void *opaque)
> >                            stage_time_end - stage_time_start);
> >          stage_time_start = stage_time_end;
> >          ret = qemu_load_device_state(fb);
> >+        memory_region_transaction_commit();
> >          if (ret < 0) {
> >              error_report("COLO: load device state failed\n");
> >              vmstate_loading = false;
> >
> >--
> >Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
> >.
> >
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 52+ messages in thread

end of thread, other threads:[~2016-03-03 20:14 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-22  2:39 [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
2016-02-22  2:39 ` [Qemu-devel] [PATCH COLO-Frame v15 01/38] configure: Add parameter for configure to enable/disable COLO support zhanghailiang
2016-02-22  2:39 ` [Qemu-devel] [PATCH COLO-Frame v15 02/38] migration: Introduce capability 'x-colo' to migration zhanghailiang
2016-02-22  2:39 ` [Qemu-devel] [PATCH COLO-Frame v15 03/38] COLO: migrate colo related info to secondary node zhanghailiang
2016-02-22  2:39 ` [Qemu-devel] [PATCH COLO-Frame v15 04/38] migration: Integrate COLO checkpoint process into migration zhanghailiang
2016-02-22  2:39 ` [Qemu-devel] [PATCH COLO-Frame v15 05/38] migration: Integrate COLO checkpoint process into loadvm zhanghailiang
2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 06/38] COLO/migration: Create a new communication path from destination to source zhanghailiang
2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 07/38] COLO: Implement colo checkpoint protocol zhanghailiang
2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 08/38] COLO: Add a new RunState RUN_STATE_COLO zhanghailiang
2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 09/38] QEMUSizedBuffer: Introduce two help functions for qsb zhanghailiang
2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 10/38] COLO: Save PVM state to secondary side when do checkpoint zhanghailiang
2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 11/38] COLO: Load PVM's dirty pages into SVM's RAM cache temporarily zhanghailiang
2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 12/38] ram/COLO: Record the dirty pages that SVM received zhanghailiang
2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 13/38] COLO: Load VMState into qsb before restore it zhanghailiang
2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 14/38] COLO: Flush PVM's cached RAM into SVM's memory zhanghailiang
2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 15/38] COLO: Add checkpoint-delay parameter for migrate-set-parameters zhanghailiang
2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 16/38] COLO: synchronize PVM's state to SVM periodically zhanghailiang
2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 17/38] COLO failover: Introduce a new command to trigger a failover zhanghailiang
2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 18/38] COLO failover: Introduce state to record failover process zhanghailiang
2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 19/38] COLO: Implement failover work for Primary VM zhanghailiang
2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 20/38] COLO: Implement failover work for Secondary VM zhanghailiang
2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 21/38] qmp event: Add COLO_EXIT event to notify users while exited from COLO zhanghailiang
2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 22/38] COLO failover: Shutdown related socket fd when do failover zhanghailiang
2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 23/38] COLO failover: Don't do failover during loading VM's state zhanghailiang
2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 24/38] COLO: Process shutdown command for VM in COLO state zhanghailiang
2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 25/38] COLO: Update the global runstate after going into colo state zhanghailiang
2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 26/38] savevm: Introduce two helper functions for save/find loadvm_handlers entry zhanghailiang
2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 27/38] migration/savevm: Add new helpers to process the different stages of loadvm zhanghailiang
2016-02-26 12:52   ` Dr. David Alan Gilbert
2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 28/38] migration/savevm: Export two helper functions for savevm process zhanghailiang
2016-02-26 13:00   ` Dr. David Alan Gilbert
2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 29/38] COLO: Separate the process of saving/loading ram and device state zhanghailiang
2016-02-26 13:16   ` Dr. David Alan Gilbert
2016-02-27 10:03     ` Hailiang Zhang
2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 30/38] COLO: Split qemu_savevm_state_begin out of checkpoint process zhanghailiang
2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 31/38] net/filter: Add a 'status' property for filter object zhanghailiang
2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 32/38] filter-buffer: Accept zero interval zhanghailiang
2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 33/38] net: Add notifier/callback for netdev init zhanghailiang
2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 34/38] COLO/filter: add each netdev a buffer filter zhanghailiang
2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 35/38] COLO: manage the status of buffer filters for PVM zhanghailiang
2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 36/38] filter-buffer: make filter_buffer_flush() public zhanghailiang
2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 37/38] COLO: flush buffered packets in checkpoint process or exit COLO zhanghailiang
2016-02-22  2:40 ` [Qemu-devel] [PATCH COLO-Frame v15 38/38] COLO: Add block replication into colo process zhanghailiang
2016-02-25 19:52 ` [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) Dr. David Alan Gilbert
2016-02-26 16:36   ` Dr. David Alan Gilbert
2016-02-27  7:54     ` Hailiang Zhang
2016-02-29  9:47       ` Dr. David Alan Gilbert
2016-02-29 12:16         ` Hailiang Zhang
2016-02-29 13:04           ` Dr. David Alan Gilbert
2016-03-01 12:25           ` Dr. David Alan Gilbert
2016-03-02 13:01             ` Hailiang Zhang
2016-03-03 20:13               ` Dr. David Alan Gilbert
