* [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service
@ 2015-03-26  5:29 zhanghailiang
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 01/28] configure: Add parameter for configure to enable/disable COLO support zhanghailiang
                   ` (29 more replies)
  0 siblings, 30 replies; 51+ messages in thread
From: zhanghailiang @ 2015-03-26  5:29 UTC
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, amit.shah, david

This is the 4th version of COLO. It contains only the COLO framework part: VM
checkpointing, failover, the proxy API and the block replication API; the block
replication implementation itself is not included.
The block part has been sent separately by Wen Congyang:
[RFC PATCH COLO v2 00/13] Block replication for continuous checkpoints

Compared with the last version there are not many optimizations or new features.
The main reason is a known, still unsolved issue: we found some dirty pages
whose bits are missed in the corresponding dirty bitmap, which triggers strange
problems in the VM. We hope to resolve it before adding more code.

You can get the newest integrated qemu colo patches from github:
https://github.com/coloft/qemu/commits/colo-v1.1

For information about how to test COLO, please refer to the following link:
http://wiki.qemu.org/Features/COLO

Please review and test.

Known unsolved issue:
(1) Some pages are dirtied without the corresponding bit being set in the dirty bitmap.

Previous posted RFC patch series:
http://lists.nongnu.org/archive/html/qemu-devel/2014-06/msg05567.html
http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg04459.html
https://lists.gnu.org/archive/html/qemu-devel/2015-02/msg04771.html

TODO list:
1 Optimize the checkpoint process and shorten its duration:
  (Partly done; the patches are not included in this series)
   1) separate the RAM and device save/load processes to reduce the extra
      memory used during a checkpoint
   2) live-migrate part of the dirty pages to the slave during the sleep time.
2 Add more debug/statistics info
  (Partly done; the patches are not included in this series)
   including checkpoint count, proxy miscompare count, downtime,
   number of live-migrated pages, total sent pages, etc.
3 Strengthen failover
4 Optimize the proxy part, including the proxy script.
5 The capability of continuous FT

v4:
- New block replication scheme (use image-fleecing for the secondary side)
- Address some comments from Eric Blake and Dave
- Add command colo-set-checkpoint-period to set the period of periodic checkpoints
- Add a delay (100ms) between consecutive checkpoint requests to ensure the VM
  runs for at least 100ms since the last pause.

v3:
- use proxy instead of colo agent to compare network packets
- add block replication
- Optimize failover handling
- handle shutdown

v2:
- use QEMUSizedBuffer/QEMUFile as COLO buffer
- colo support is enabled by default
- add nic replication support
- addressed comments from Eric Blake and Dr. David Alan Gilbert

v1:
- implement the frame of colo

Wen Congyang (1):
  COLO: Add block replication into colo process

zhanghailiang (27):
  configure: Add parameter for configure to enable/disable COLO support
  migration: Introduce capability 'colo' to migration
  COLO: migrate colo related info to slave
  migration: Integrate COLO checkpoint process into migration
  migration: Integrate COLO checkpoint process into loadvm
  COLO: Implement colo checkpoint protocol
  COLO: Add a new RunState RUN_STATE_COLO
  QEMUSizedBuffer: Introduce two help functions for qsb
  COLO: Save VM state to slave when do checkpoint
  COLO RAM: Load PVM's dirty page into SVM's RAM cache temporarily
  COLO VMstate: Load VM state into qsb before restore it
  arch_init: Start to trace dirty pages of SVM
  COLO RAM: Flush cached RAM into SVM's memory
  COLO failover: Introduce a new command to trigger a failover
  COLO failover: Implement COLO master/slave failover work
  COLO failover: Don't do failover during loading VM's state
  COLO: Add new command parameter 'colo_nicname' 'colo_script' for net
  COLO NIC: Init/remove colo nic devices when add/cleanup tap devices
  COLO NIC: Implement colo nic device interface configure()
  COLO NIC : Implement colo nic init/destroy function
  COLO NIC: Some init work related with proxy module
  COLO: Do checkpoint according to the result of net packets comparing
  COLO: Improve checkpoint efficiency by do additional periodic
    checkpoint
  COLO: Add colo-set-checkpoint-period command
  COLO NIC: Implement NIC checkpoint and failover
  COLO: Disable qdev hotplug when VM is in COLO mode
  COLO: Implement shutdown checkpoint

 arch_init.c                            | 199 +++++++-
 configure                              |  14 +
 hmp-commands.hx                        |  30 ++
 hmp.c                                  |  14 +
 hmp.h                                  |   2 +
 include/exec/cpu-all.h                 |   1 +
 include/migration/migration-colo.h     |  58 +++
 include/migration/migration-failover.h |  22 +
 include/migration/migration.h          |   3 +
 include/migration/qemu-file.h          |   3 +-
 include/net/colo-nic.h                 |  25 +
 include/net/net.h                      |   4 +
 include/sysemu/sysemu.h                |   3 +
 migration/Makefile.objs                |   2 +
 migration/colo-comm.c                  |  80 ++++
 migration/colo-failover.c              |  48 ++
 migration/colo.c                       | 809 +++++++++++++++++++++++++++++++++
 migration/migration.c                  |  60 ++-
 migration/qemu-file-buf.c              |  58 +++
 net/Makefile.objs                      |   1 +
 net/colo-nic.c                         | 438 ++++++++++++++++++
 net/tap.c                              |  45 +-
 qapi-schema.json                       |  42 +-
 qemu-options.hx                        |  10 +-
 qmp-commands.hx                        |  41 ++
 savevm.c                               |   2 +-
 scripts/colo-proxy-script.sh           |  97 ++++
 stubs/Makefile.objs                    |   1 +
 stubs/migration-colo.c                 |  58 +++
 vl.c                                   |  36 +-
 30 files changed, 2178 insertions(+), 28 deletions(-)
 create mode 100644 include/migration/migration-colo.h
 create mode 100644 include/migration/migration-failover.h
 create mode 100644 include/net/colo-nic.h
 create mode 100644 migration/colo-comm.c
 create mode 100644 migration/colo-failover.c
 create mode 100644 migration/colo.c
 create mode 100644 migration/colo.c.
 create mode 100644 net/colo-nic.c
 create mode 100755 scripts/colo-proxy-script.sh
 create mode 100644 stubs/migration-colo.c

-- 
1.7.12.4


* [Qemu-devel] [RFC PATCH v4 01/28] configure: Add parameter for configure to enable/disable COLO support
  2015-03-26  5:29 [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
@ 2015-03-26  5:29 ` zhanghailiang
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 02/28] migration: Introduce capability 'colo' to migration zhanghailiang
                   ` (28 subsequent siblings)
  29 siblings, 0 replies; 51+ messages in thread
From: zhanghailiang @ 2015-03-26  5:29 UTC
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, amit.shah, Lai Jiangshan,
	Yang Hongyang, david

Add configure parameters --enable-colo/--disable-colo to switch COLO
support on or off.
COLO support is enabled by default.
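
For example, './configure --disable-colo' compiles COLO support out, while
'./configure --enable-colo' (or simply the default configure run) builds it in.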

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 configure | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/configure b/configure
index 589798e..b0a29e2 100755
--- a/configure
+++ b/configure
@@ -258,6 +258,7 @@ xfs=""
 vhost_net="no"
 vhost_scsi="no"
 kvm="no"
+colo="yes"
 rdma=""
 gprof="no"
 debug_tcg="no"
@@ -923,6 +924,10 @@ for opt do
   ;;
   --enable-kvm) kvm="yes"
   ;;
+  --disable-colo) colo="no"
+  ;;
+  --enable-colo) colo="yes"
+  ;;
   --disable-tcg-interpreter) tcg_interpreter="no"
   ;;
   --enable-tcg-interpreter) tcg_interpreter="yes"
@@ -1323,6 +1328,10 @@ Advanced options (experts only):
   --disable-slirp          disable SLIRP userspace network connectivity
   --disable-kvm            disable KVM acceleration support
   --enable-kvm             enable KVM acceleration support
+  --disable-colo           disable COarse-grain LOck-stepping Virtual
+                           Machines for Non-stop Service
+  --enable-colo            enable COarse-grain LOck-stepping Virtual
+                           Machines for Non-stop Service (default)
   --disable-rdma           disable RDMA-based migration support
   --enable-rdma            enable RDMA-based migration support
   --enable-tcg-interpreter enable TCG with bytecode interpreter (TCI)
@@ -4387,6 +4396,7 @@ echo "Linux AIO support $linux_aio"
 echo "ATTR/XATTR support $attr"
 echo "Install blobs     $blobs"
 echo "KVM support       $kvm"
+echo "COLO support      $colo"
 echo "RDMA support      $rdma"
 echo "TCG interpreter   $tcg_interpreter"
 echo "fdt support       $fdt"
@@ -4946,6 +4956,10 @@ if have_backend "ftrace"; then
 fi
 echo "CONFIG_TRACE_FILE=$trace_file" >> $config_host_mak
 
+if test "$colo" = "yes"; then
+  echo "CONFIG_COLO=y" >> $config_host_mak
+fi
+
 if test "$rdma" = "yes" ; then
   echo "CONFIG_RDMA=y" >> $config_host_mak
 fi
-- 
1.7.12.4


* [Qemu-devel] [RFC PATCH v4 02/28] migration: Introduce capability 'colo' to migration
  2015-03-26  5:29 [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 01/28] configure: Add parameter for configure to enable/disable COLO support zhanghailiang
@ 2015-03-26  5:29 ` zhanghailiang
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 03/28] COLO: migrate colo related info to slave zhanghailiang
                   ` (27 subsequent siblings)
  29 siblings, 0 replies; 51+ messages in thread
From: zhanghailiang @ 2015-03-26  5:29 UTC
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, amit.shah, Lai Jiangshan,
	Yang Hongyang, david

Add a helper function colo_supported() that indicates whether COLO is
supported, and use it to control whether the 'colo' string is shown to users.
They can use the QMP command 'query-migrate-capabilities' or the HMP command
'info migrate_capabilities' to learn whether COLO is supported.
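
As an example of the expected usage (not part of this patch): on a binary
built with --enable-colo the capability can be toggled from the HMP monitor
with 'migrate_set_capability colo on' and listed with
'info migrate_capabilities'; on a --disable-colo build the stub
colo_supported() returns false, so 'colo' is neither listed nor accepted.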

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 include/migration/migration-colo.h | 20 ++++++++++++++++++++
 include/migration/migration.h      |  1 +
 migration/Makefile.objs            |  1 +
 migration/colo.c                   | 18 ++++++++++++++++++
 migration/migration.c              | 17 +++++++++++++++++
 qapi-schema.json                   |  5 ++++-
 stubs/Makefile.objs                |  1 +
 stubs/migration-colo.c             | 18 ++++++++++++++++++
 8 files changed, 80 insertions(+), 1 deletion(-)
 create mode 100644 include/migration/migration-colo.h
 create mode 100644 migration/colo.c
 create mode 100644 migration/colo.c.
 create mode 100644 stubs/migration-colo.c

diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h
new file mode 100644
index 0000000..6fdbb94
--- /dev/null
+++ b/include/migration/migration-colo.h
@@ -0,0 +1,20 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_MIGRATION_COLO_H
+#define QEMU_MIGRATION_COLO_H
+
+#include "qemu-common.h"
+
+bool colo_supported(void);
+
+#endif
diff --git a/include/migration/migration.h b/include/migration/migration.h
index bf09968..59acb62 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -149,6 +149,7 @@ int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen);
 
 int migrate_use_xbzrle(void);
 int64_t migrate_xbzrle_cache_size(void);
+bool migrate_enable_colo(void);
 
 int64_t xbzrle_cache_resize(int64_t new_size);
 
diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index d929e96..5a25d39 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -1,4 +1,5 @@
 common-obj-y += migration.o tcp.o
+common-obj-$(CONFIG_COLO) += colo.o
 common-obj-y += vmstate.o
 common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o qemu-file-stdio.o
 common-obj-y += xbzrle.o
diff --git a/migration/colo.c b/migration/colo.c
new file mode 100644
index 0000000..babfe4f
--- /dev/null
+++ b/migration/colo.c
@@ -0,0 +1,18 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "migration/migration-colo.h"
+
+bool colo_supported(void)
+{
+    return true;
+}
diff --git a/migration/colo.c. b/migration/colo.c.
new file mode 100644
index 0000000..e69de29
diff --git a/migration/migration.c b/migration/migration.c
index bc42490..7009ddd 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -25,6 +25,7 @@
 #include "qemu/thread.h"
 #include "qmp-commands.h"
 #include "trace.h"
+#include "migration/migration-colo.h"
 
 #define MAX_THROTTLE  (32 << 20)      /* Migration speed throttling */
 
@@ -154,6 +155,9 @@ MigrationCapabilityStatusList *qmp_query_migrate_capabilities(Error **errp)
 
     caps = NULL; /* silence compiler warning */
     for (i = 0; i < MIGRATION_CAPABILITY_MAX; i++) {
+        if (i == MIGRATION_CAPABILITY_COLO && !colo_supported()) {
+            continue;
+        }
         if (head == NULL) {
             head = g_malloc0(sizeof(*caps));
             caps = head;
@@ -279,6 +283,13 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
     }
 
     for (cap = params; cap; cap = cap->next) {
+        if (cap->value->capability == MIGRATION_CAPABILITY_COLO &&
+            cap->value->state && !colo_supported()) {
+            error_setg(errp, "COLO is not currently supported, please"
+                             " configure with --enable-colo option in order to"
+                             " support COLO feature");
+            continue;
+        }
         s->enabled_capabilities[cap->value->capability] = cap->value->state;
     }
 }
@@ -605,6 +616,12 @@ int64_t migrate_xbzrle_cache_size(void)
     return s->xbzrle_cache_size;
 }
 
+bool migrate_enable_colo(void)
+{
+    MigrationState *s = migrate_get_current();
+    return s->enabled_capabilities[MIGRATION_CAPABILITY_COLO];
+}
+
 /* migration thread support */
 
 static void *migration_thread(void *opaque)
diff --git a/qapi-schema.json b/qapi-schema.json
index ac9594d..193cad4 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -518,10 +518,13 @@
 # @auto-converge: If enabled, QEMU will automatically throttle down the guest
 #          to speed up convergence of RAM migration. (since 1.6)
 #
+# @colo: If enabled, migration will never end, and the state of VM in primary side
+#        will be migrated continuously to VM in secondary side. (since 2.4)
+#
 # Since: 1.2
 ##
 { 'enum': 'MigrationCapability',
-  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks'] }
+  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks', 'colo'] }
 
 ##
 # @MigrationCapabilityStatus
diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index 8beff4c..65a7171 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -39,3 +39,4 @@ stub-obj-$(CONFIG_WIN32) += fd-register.o
 stub-obj-y += cpus.o
 stub-obj-y += kvm.o
 stub-obj-y += qmp_pc_dimm_device_list.o
+stub-obj-y += migration-colo.o
diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
new file mode 100644
index 0000000..ccbaea8
--- /dev/null
+++ b/stubs/migration-colo.c
@@ -0,0 +1,18 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "migration/migration-colo.h"
+
+bool colo_supported(void)
+{
+    return false;
+}
-- 
1.7.12.4


* [Qemu-devel] [RFC PATCH v4 03/28] COLO: migrate colo related info to slave
  2015-03-26  5:29 [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 01/28] configure: Add parameter for configure to enable/disable COLO support zhanghailiang
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 02/28] migration: Introduce capability 'colo' to migration zhanghailiang
@ 2015-03-26  5:29 ` zhanghailiang
  2015-05-15 11:38   ` Dr. David Alan Gilbert
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 04/28] migration: Integrate COLO checkpoint process into migration zhanghailiang
                   ` (26 subsequent siblings)
  29 siblings, 1 reply; 51+ messages in thread
From: zhanghailiang @ 2015-03-26  5:29 UTC
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, amit.shah, Lai Jiangshan,
	Yang Hongyang, david

The destination can tell whether its VM should go into COLO mode by referring
to the info that has been migrated from the PVM.
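
For reference, the destination side (added in patch 05 of this series)
consults this flag right after qemu_loadvm_state(), roughly like the sketch
below (illustration only; see patch 05 for the real code):

    ret = qemu_loadvm_state(f);
    if (loadvm_enable_colo()) {
        /* hand the incoming stream over to the COLO incoming thread */
    }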

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 include/migration/migration-colo.h |  2 ++
 migration/Makefile.objs            |  1 +
 migration/colo-comm.c              | 55 ++++++++++++++++++++++++++++++++++++++
 vl.c                               |  5 +++-
 4 files changed, 62 insertions(+), 1 deletion(-)
 create mode 100644 migration/colo-comm.c

diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h
index 6fdbb94..de68c72 100644
--- a/include/migration/migration-colo.h
+++ b/include/migration/migration-colo.h
@@ -14,7 +14,9 @@
 #define QEMU_MIGRATION_COLO_H
 
 #include "qemu-common.h"
+#include "migration/migration.h"
 
 bool colo_supported(void);
+void colo_info_mig_init(void);
 
 #endif
diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index 5a25d39..cb7bd30 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -1,5 +1,6 @@
 common-obj-y += migration.o tcp.o
 common-obj-$(CONFIG_COLO) += colo.o
+common-obj-y += colo-comm.o
 common-obj-y += vmstate.o
 common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o qemu-file-stdio.o
 common-obj-y += xbzrle.o
diff --git a/migration/colo-comm.c b/migration/colo-comm.c
new file mode 100644
index 0000000..cab97e9
--- /dev/null
+++ b/migration/colo-comm.c
@@ -0,0 +1,55 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later. See the COPYING file in the top-level directory.
+ *
+ */
+
+#include <migration/migration-colo.h>
+
+#define DEBUG_COLO_COMMON 0
+
+#define DPRINTF(fmt, ...)                                  \
+    do {                                                   \
+        if (DEBUG_COLO_COMMON) {                           \
+            fprintf(stderr, "COLO: " fmt, ## __VA_ARGS__); \
+        }                                                  \
+    } while (0)
+
+static bool colo_requested;
+
+/* save */
+static void colo_info_save(QEMUFile *f, void *opaque)
+{
+    qemu_put_byte(f, migrate_enable_colo());
+}
+
+/* restore */
+static int colo_info_load(QEMUFile *f, void *opaque, int version_id)
+{
+    int value = qemu_get_byte(f);
+
+    if (value && !colo_requested) {
+        DPRINTF("COLO requested!\n");
+    }
+    colo_requested = value;
+
+    return 0;
+}
+
+static SaveVMHandlers savevm_colo_info_handlers = {
+    .save_state = colo_info_save,
+    .load_state = colo_info_load,
+};
+
+void colo_info_mig_init(void)
+{
+    register_savevm_live(NULL, "colo", -1, 1,
+                         &savevm_colo_info_handlers, NULL);
+}
diff --git a/vl.c b/vl.c
index 75ec292..9724992 100644
--- a/vl.c
+++ b/vl.c
@@ -90,6 +90,7 @@ int main(int argc, char **argv)
 #include "sysemu/dma.h"
 #include "audio/audio.h"
 #include "migration/migration.h"
+#include "migration/migration-colo.h"
 #include "sysemu/kvm.h"
 #include "qapi/qmp/qjson.h"
 #include "qemu/option.h"
@@ -4149,7 +4150,9 @@ int main(int argc, char **argv, char **envp)
 
     blk_mig_init();
     ram_mig_init();
-
+#ifdef CONFIG_COLO
+    colo_info_mig_init();
+#endif
     /* If the currently selected machine wishes to override the units-per-bus
      * property of its default HBA interface type, do so now. */
     if (machine_class->units_per_default_bus) {
-- 
1.7.12.4


* [Qemu-devel] [RFC PATCH v4 04/28] migration: Integrate COLO checkpoint process into migration
  2015-03-26  5:29 [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (2 preceding siblings ...)
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 03/28] COLO: migrate colo related info to slave zhanghailiang
@ 2015-03-26  5:29 ` zhanghailiang
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 05/28] migration: Integrate COLO checkpoint process into loadvm zhanghailiang
                   ` (25 subsequent siblings)
  29 siblings, 0 replies; 51+ messages in thread
From: zhanghailiang @ 2015-03-26  5:29 UTC
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, amit.shah, Lai Jiangshan,
	david

Add a migration state, MIGRATION_STATUS_COLO, which is entered after the
first live migration has finished successfully.
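
With this patch the status sequence of a COLO-enabled migration is roughly
setup -> active -> colo (entered via colo_init_checkpointer() once the first
full pass completes) -> completed (set by the COLO thread when it exits); the
cleanup bottom half is deferred until the COLO thread finishes.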

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 include/migration/migration-colo.h |  3 ++
 include/migration/migration.h      |  2 ++
 migration/colo.c                   | 65 ++++++++++++++++++++++++++++++++++++++
 migration/migration.c              | 22 ++++++++++---
 qapi-schema.json                   |  2 +-
 stubs/migration-colo.c             |  9 ++++++
 6 files changed, 97 insertions(+), 6 deletions(-)

diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h
index de68c72..cac23e1 100644
--- a/include/migration/migration-colo.h
+++ b/include/migration/migration-colo.h
@@ -19,4 +19,7 @@
 bool colo_supported(void);
 void colo_info_mig_init(void);
 
+void colo_init_checkpointer(MigrationState *s);
+bool migrate_in_colo_state(void);
+
 #endif
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 59acb62..c723a02 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -65,6 +65,8 @@ struct MigrationState
     int64_t dirty_sync_count;
 };
 
+void migrate_set_state(MigrationState *s, int old_state, int new_state);
+
 void process_incoming_migration(QEMUFile *f);
 
 void qemu_start_incoming_migration(const char *uri, Error **errp);
diff --git a/migration/colo.c b/migration/colo.c
index babfe4f..d8cab6d 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -10,9 +10,74 @@
  * later.  See the COPYING file in the top-level directory.
  */
 
+#include "sysemu/sysemu.h"
 #include "migration/migration-colo.h"
+#include "qemu/error-report.h"
+
+#define DEBUG_COLO 0
+
+#define DPRINTF(fmt, ...)                                   \
+    do {                                                    \
+        if (DEBUG_COLO) {                                   \
+            fprintf(stderr, "colo: " fmt , ## __VA_ARGS__); \
+        }                                                   \
+    } while (0)
+
+static QEMUBH *colo_bh;
 
 bool colo_supported(void)
 {
     return true;
 }
+
+bool migrate_in_colo_state(void)
+{
+    MigrationState *s = migrate_get_current();
+    return (s->state == MIGRATION_STATUS_COLO);
+}
+
+static void *colo_thread(void *opaque)
+{
+    MigrationState *s = opaque;
+
+    qemu_mutex_lock_iothread();
+    vm_start();
+    qemu_mutex_unlock_iothread();
+    DPRINTF("vm resume to run\n");
+
+
+    /*TODO: COLO checkpoint savevm loop*/
+
+    migrate_set_state(s, MIGRATION_STATUS_COLO, MIGRATION_STATUS_COMPLETED);
+
+    qemu_mutex_lock_iothread();
+    qemu_bh_schedule(s->cleanup_bh);
+    qemu_mutex_unlock_iothread();
+
+    return NULL;
+}
+
+static void colo_start_checkpointer(void *opaque)
+{
+    MigrationState *s = opaque;
+
+    if (colo_bh) {
+        qemu_bh_delete(colo_bh);
+        colo_bh = NULL;
+    }
+
+    qemu_mutex_unlock_iothread();
+    qemu_thread_join(&s->thread);
+    qemu_mutex_lock_iothread();
+
+    migrate_set_state(s, MIGRATION_STATUS_ACTIVE, MIGRATION_STATUS_COLO);
+
+    qemu_thread_create(&s->thread, "colo", colo_thread, s,
+                       QEMU_THREAD_JOINABLE);
+}
+
+void colo_init_checkpointer(MigrationState *s)
+{
+    colo_bh = qemu_bh_new(colo_start_checkpointer, s);
+    qemu_bh_schedule(colo_bh);
+}
diff --git a/migration/migration.c b/migration/migration.c
index 7009ddd..d904c4d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -235,6 +235,10 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 
         get_xbzrle_cache_stats(info);
         break;
+    case MIGRATION_STATUS_COLO:
+        info->has_status = true;
+        /* TODO: display COLO specific information (checkpoint info etc.) */
+        break;
     case MIGRATION_STATUS_COMPLETED:
         get_xbzrle_cache_stats(info);
 
@@ -296,7 +300,7 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
 
 /* shared migration helpers */
 
-static void migrate_set_state(MigrationState *s, int old_state, int new_state)
+void migrate_set_state(MigrationState *s, int old_state, int new_state)
 {
     if (atomic_cmpxchg(&s->state, old_state, new_state) == new_state) {
         trace_migrate_set_state(new_state);
@@ -633,6 +637,7 @@ static void *migration_thread(void *opaque)
     int64_t max_size = 0;
     int64_t start_time = initial_time;
     bool old_vm_running = false;
+    bool enable_colo = migrate_enable_colo();
 
     qemu_savevm_state_begin(s->file, &s->params);
 
@@ -670,8 +675,10 @@ static void *migration_thread(void *opaque)
                 }
 
                 if (!qemu_file_get_error(s->file)) {
-                    migrate_set_state(s, MIGRATION_STATUS_ACTIVE,
-                                      MIGRATION_STATUS_COMPLETED);
+                    if (!enable_colo) {
+                        migrate_set_state(s, MIGRATION_STATUS_ACTIVE,
+                                          MIGRATION_STATUS_COMPLETED);
+                    }
                     break;
                 }
             }
@@ -722,11 +729,16 @@ static void *migration_thread(void *opaque)
         }
         runstate_set(RUN_STATE_POSTMIGRATE);
     } else {
-        if (old_vm_running) {
+        if (s->state == MIGRATION_STATUS_ACTIVE && enable_colo) {
+            colo_init_checkpointer(s);
+        } else if (old_vm_running) {
             vm_start();
         }
     }
-    qemu_bh_schedule(s->cleanup_bh);
+
+    if (!enable_colo) {
+        qemu_bh_schedule(s->cleanup_bh);
+    }
     qemu_mutex_unlock_iothread();
 
     return NULL;
diff --git a/qapi-schema.json b/qapi-schema.json
index 193cad4..172aae3 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -433,7 +433,7 @@
 ##
 { 'enum': 'MigrationStatus',
   'data': [ 'none', 'setup', 'cancelling', 'cancelled',
-            'active', 'completed', 'failed' ] }
+            'active', 'completed', 'failed', 'colo' ] }
 
 ##
 # @MigrationInfo
diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
index ccbaea8..495ca28 100644
--- a/stubs/migration-colo.c
+++ b/stubs/migration-colo.c
@@ -16,3 +16,12 @@ bool colo_supported(void)
 {
     return false;
 }
+
+bool migrate_in_colo_state(void)
+{
+    return false;
+}
+
+void colo_init_checkpointer(MigrationState *s)
+{
+}
-- 
1.7.12.4


* [Qemu-devel] [RFC PATCH v4 05/28] migration: Integrate COLO checkpoint process into loadvm
  2015-03-26  5:29 [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (3 preceding siblings ...)
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 04/28] migration: Integrate COLO checkpoint process into migration zhanghailiang
@ 2015-03-26  5:29 ` zhanghailiang
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 06/28] COLO: Implement colo checkpoint protocol zhanghailiang
                   ` (24 subsequent siblings)
  29 siblings, 0 replies; 51+ messages in thread
From: zhanghailiang @ 2015-03-26  5:29 UTC
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, amit.shah, Lai Jiangshan,
	Yang Hongyang, david

Switch from the normal migration loadvm process to the COLO checkpoint
process if COLO mode is enabled.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 include/migration/migration-colo.h | 13 +++++++++++++
 migration/colo-comm.c              | 10 ++++++++++
 migration/colo.c                   | 14 ++++++++++++++
 migration/migration.c              | 21 ++++++++++++++++++++-
 stubs/migration-colo.c             |  5 +++++
 5 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h
index cac23e1..b326c35 100644
--- a/include/migration/migration-colo.h
+++ b/include/migration/migration-colo.h
@@ -15,11 +15,24 @@
 
 #include "qemu-common.h"
 #include "migration/migration.h"
+#include "block/coroutine.h"
+#include "qemu/thread.h"
 
 bool colo_supported(void);
 void colo_info_mig_init(void);
 
+struct colo_incoming {
+    QEMUFile *file;
+    QemuThread thread;
+};
+
 void colo_init_checkpointer(MigrationState *s);
 bool migrate_in_colo_state(void);
 
+/* loadvm */
+extern Coroutine *migration_incoming_co;
+bool loadvm_enable_colo(void);
+void loadvm_exit_colo(void);
+void *colo_process_incoming_checkpoints(void *opaque);
+bool loadvm_in_colo_state(void);
 #endif
diff --git a/migration/colo-comm.c b/migration/colo-comm.c
index cab97e9..1d844e1 100644
--- a/migration/colo-comm.c
+++ b/migration/colo-comm.c
@@ -53,3 +53,13 @@ void colo_info_mig_init(void)
     register_savevm_live(NULL, "colo", -1, 1,
                          &savevm_colo_info_handlers, NULL);
 }
+
+bool loadvm_enable_colo(void)
+{
+    return colo_requested;
+}
+
+void loadvm_exit_colo(void)
+{
+    colo_requested = false;
+}
diff --git a/migration/colo.c b/migration/colo.c
index d8cab6d..3b6fbf2 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -24,6 +24,7 @@
     } while (0)
 
 static QEMUBH *colo_bh;
+static Coroutine *colo;
 
 bool colo_supported(void)
 {
@@ -81,3 +82,16 @@ void colo_init_checkpointer(MigrationState *s)
     colo_bh = qemu_bh_new(colo_start_checkpointer, s);
     qemu_bh_schedule(colo_bh);
 }
+
+void *colo_process_incoming_checkpoints(void *opaque)
+{
+    colo = qemu_coroutine_self();
+    assert(colo != NULL);
+
+    /* TODO: COLO checkpoint restore loop */
+
+    colo = NULL;
+    loadvm_exit_colo();
+
+    return NULL;
+}
diff --git a/migration/migration.c b/migration/migration.c
index d904c4d..26bf235 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -96,6 +96,7 @@ void qemu_start_incoming_migration(const char *uri, Error **errp)
     }
 }
 
+Coroutine *migration_incoming_co;
 static void process_incoming_migration_co(void *opaque)
 {
     QEMUFile *f = opaque;
@@ -103,7 +104,25 @@ static void process_incoming_migration_co(void *opaque)
     int ret;
 
     ret = qemu_loadvm_state(f);
-    qemu_fclose(f);
+
+    /* we get colo info, and know if we are in colo mode */
+    if (loadvm_enable_colo()) {
+        struct colo_incoming *colo_in = g_malloc0(sizeof(*colo_in));
+
+        colo_in->file = f;
+        migration_incoming_co = qemu_coroutine_self();
+        qemu_thread_create(&colo_in->thread, "colo incoming",
+             colo_process_incoming_checkpoints, colo_in, QEMU_THREAD_JOINABLE);
+        qemu_coroutine_yield();
+        migration_incoming_co = NULL;
+#if 0
+        /* FIXME  wait checkpoint incoming thread exit, and free resource */
+        qemu_thread_join(&colo_in->thread);
+        g_free(colo_in);
+#endif
+    } else {
+        qemu_fclose(f);
+    }
     free_xbzrle_decoded_buf();
     if (ret < 0) {
         error_report("load of migration failed: %s", strerror(-ret));
diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
index 495ca28..cbadcd6 100644
--- a/stubs/migration-colo.c
+++ b/stubs/migration-colo.c
@@ -25,3 +25,8 @@ bool migrate_in_colo_state(void)
 void colo_init_checkpointer(MigrationState *s)
 {
 }
+
+void *colo_process_incoming_checkpoints(void *opaque)
+{
+    return NULL;
+}
-- 
1.7.12.4


* [Qemu-devel] [RFC PATCH v4 06/28] COLO: Implement colo checkpoint protocol
  2015-03-26  5:29 [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (4 preceding siblings ...)
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 05/28] migration: Integrate COLO checkpoint process into loadvm zhanghailiang
@ 2015-03-26  5:29 ` zhanghailiang
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 07/28] COLO: Add a new RunState RUN_STATE_COLO zhanghailiang
                   ` (23 subsequent siblings)
  29 siblings, 0 replies; 51+ messages in thread
From: zhanghailiang @ 2015-03-26  5:29 UTC
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, amit.shah, Lai Jiangshan,
	Yang Hongyang, david

We need a user-defined communication protocol to control the checkpoint
process.

A new checkpoint request is started by the Primary VM, and the interaction
proceeds as shown below (a condensed code sketch follows the notes):
Checkpoint synchronizing points:

                  Primary                 Secondary
  NEW             @
                                          Suspend
  SUSPENDED                               @
                  Suspend&Save state
  SEND            @
                  Send state              Receive state
  RECEIVED                                @
                  Flush network           Load state
  LOADED                                  @
                  Resume                  Resume

                  Start Comparing
NOTE:
 1) '@' marks the side that sends the message
 2) Every sync-point is synchronized by the two sides with only
    one handshake (single direction) for low latency.
    If stricter synchronization is required, a sync-point in the
    opposite direction should be added.
 3) Since sync-points are single direction, the remote side may
    have gone forward a lot by the time this side receives the sync-point.
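
Condensed, one checkpoint round on the primary side maps onto the helpers
introduced below roughly as follows (error handling omitted; saving and
sending the actual VM state is filled in by later patches):

    colo_ctl_put(s->file, COLO_CHECKPOINT_NEW);        /* ask slave to suspend */
    colo_ctl_get(control, COLO_CHECKPOINT_SUSPENDED);  /* slave is suspended   */
    /* suspend the primary and save its state into the colo buffer (later)    */
    colo_ctl_put(s->file, COLO_CHECKPOINT_SEND);
    /* stream the buffered vmstate to the slave (later)                       */
    colo_ctl_get(control, COLO_CHECKPOINT_RECEIVED);
    colo_ctl_get(control, COLO_CHECKPOINT_LOADED);
    /* both sides resume and the packet comparison starts again               */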

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 migration/colo.c | 237 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 234 insertions(+), 3 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 3b6fbf2..5a8ed1b 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -23,6 +23,41 @@
         }                                                   \
     } while (0)
 
+enum {
+    COLO_READY = 0x46,
+
+    /*
+    * Checkpoint synchronizing points.
+    *
+    *                  Primary                 Secondary
+    *  NEW             @
+    *                                          Suspend
+    *  SUSPENDED                               @
+    *                  Suspend&Save state
+    *  SEND            @
+    *                  Send state              Receive state
+    *  RECEIVED                                @
+    *                  Flush network           Load state
+    *  LOADED                                  @
+    *                  Resume                  Resume
+    *
+    *                  Start Comparing
+    * NOTE:
+    * 1) '@' who sends the message
+    * 2) Every sync-point is synchronized by two sides with only
+    *    one handshake(single direction) for low-latency.
+    *    If more strict synchronization is required, a opposite direction
+    *    sync-point should be added.
+    * 3) Since sync-points are single direction, the remote side may
+    *    go forward a lot when this side just receives the sync-point.
+    */
+    COLO_CHECKPOINT_NEW,
+    COLO_CHECKPOINT_SUSPENDED,
+    COLO_CHECKPOINT_SEND,
+    COLO_CHECKPOINT_RECEIVED,
+    COLO_CHECKPOINT_LOADED,
+};
+
 static QEMUBH *colo_bh;
 static Coroutine *colo;
 
@@ -37,20 +72,135 @@ bool migrate_in_colo_state(void)
     return (s->state == MIGRATION_STATUS_COLO);
 }
 
+/* colo checkpoint control helper */
+static int colo_ctl_put(QEMUFile *f, uint64_t request)
+{
+    int ret = 0;
+
+    qemu_put_be64(f, request);
+    qemu_fflush(f);
+
+    ret = qemu_file_get_error(f);
+
+    return ret;
+}
+
+static int colo_ctl_get_value(QEMUFile *f, uint64_t *value)
+{
+    int ret = 0;
+    uint64_t temp;
+
+    temp = qemu_get_be64(f);
+
+    ret = qemu_file_get_error(f);
+    if (ret < 0) {
+        return -1;
+    }
+
+    *value = temp;
+    return 0;
+}
+
+static int colo_ctl_get(QEMUFile *f, uint64_t require)
+{
+    int ret;
+    uint64_t value;
+
+    ret = colo_ctl_get_value(f, &value);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (value != require) {
+        error_report("unexpected state! expected: %"PRIu64
+                     ", received: %"PRIu64, require, value);
+        exit(1);
+    }
+
+    return ret;
+}
+
+static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control)
+{
+    int ret;
+
+    ret = colo_ctl_put(s->file, COLO_CHECKPOINT_NEW);
+    if (ret < 0) {
+        goto out;
+    }
+
+    ret = colo_ctl_get(control, COLO_CHECKPOINT_SUSPENDED);
+    if (ret < 0) {
+        goto out;
+    }
+
+    /* TODO: suspend and save vm state to colo buffer */
+
+    ret = colo_ctl_put(s->file, COLO_CHECKPOINT_SEND);
+    if (ret < 0) {
+        goto out;
+    }
+
+    /* TODO: send vmstate to slave */
+
+    ret = colo_ctl_get(control, COLO_CHECKPOINT_RECEIVED);
+    if (ret < 0) {
+        goto out;
+    }
+    DPRINTF("got COLO_CHECKPOINT_RECEIVED\n");
+    ret = colo_ctl_get(control, COLO_CHECKPOINT_LOADED);
+    if (ret < 0) {
+        goto out;
+    }
+    DPRINTF("got COLO_CHECKPOINT_LOADED\n");
+
+    /* TODO: resume master */
+
+out:
+    return ret;
+}
+
 static void *colo_thread(void *opaque)
 {
     MigrationState *s = opaque;
+    QEMUFile *colo_control = NULL;
+    int ret;
+
+    colo_control = qemu_fopen_socket(qemu_get_fd(s->file), "rb");
+    if (!colo_control) {
+        error_report("Open colo_control failed!");
+        goto out;
+    }
+
+    /*
+     * Wait for slave finish loading vm states and enter COLO
+     * restore.
+     */
+    ret = colo_ctl_get(colo_control, COLO_READY);
+    if (ret < 0) {
+        goto out;
+    }
+    DPRINTF("get COLO_READY\n");
 
     qemu_mutex_lock_iothread();
     vm_start();
     qemu_mutex_unlock_iothread();
     DPRINTF("vm resume to run\n");
 
+    while (s->state == MIGRATION_STATUS_COLO) {
+        /* start a colo checkpoint */
+        if (colo_do_checkpoint_transaction(s, colo_control)) {
+            goto out;
+        }
+    }
 
-    /*TODO: COLO checkpoint savevm loop*/
-
+out:
     migrate_set_state(s, MIGRATION_STATUS_COLO, MIGRATION_STATUS_COMPLETED);
 
+    if (colo_control) {
+        qemu_fclose(colo_control);
+    }
+
     qemu_mutex_lock_iothread();
     qemu_bh_schedule(s->cleanup_bh);
     qemu_mutex_unlock_iothread();
@@ -83,14 +233,95 @@ void colo_init_checkpointer(MigrationState *s)
     qemu_bh_schedule(colo_bh);
 }
 
+/*
+ * return:
+ * 0: start a checkpoint
+ * -1: some error happened, exit colo restore
+ */
+static int colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request)
+{
+    int ret;
+    uint64_t cmd;
+
+    ret = colo_ctl_get_value(f, &cmd);
+    if (ret < 0) {
+        return -1;
+    }
+
+    switch (cmd) {
+    case COLO_CHECKPOINT_NEW:
+        *checkpoint_request = 1;
+        return 0;
+    default:
+        return -1;
+    }
+}
+
 void *colo_process_incoming_checkpoints(void *opaque)
 {
+    struct colo_incoming *colo_in = opaque;
+    QEMUFile *f = colo_in->file;
+    int fd = qemu_get_fd(f);
+    QEMUFile *ctl = NULL;
+    int ret;
     colo = qemu_coroutine_self();
     assert(colo != NULL);
 
-    /* TODO: COLO checkpoint restore loop */
+    ctl = qemu_fopen_socket(fd, "wb");
+    if (!ctl) {
+        error_report("Can't open incoming channel!");
+        goto out;
+    }
+    ret = colo_ctl_put(ctl, COLO_READY);
+    if (ret < 0) {
+        goto out;
+    }
+    /* TODO: in COLO mode, slave is runing, so start the vm */
+    while (true) {
+        int request = 0;
+        int ret = colo_wait_handle_cmd(f, &request);
+
+        if (ret < 0) {
+            break;
+        } else {
+            if (!request) {
+                continue;
+            }
+        }
 
+        /* TODO: suspend guest */
+        ret = colo_ctl_put(ctl, COLO_CHECKPOINT_SUSPENDED);
+        if (ret < 0) {
+            goto out;
+        }
+
+        ret = colo_ctl_get(f, COLO_CHECKPOINT_SEND);
+        if (ret < 0) {
+            goto out;
+        }
+        DPRINTF("Got COLO_CHECKPOINT_SEND\n");
+
+        /* TODO: read migration data into colo buffer */
+
+        ret = colo_ctl_put(ctl, COLO_CHECKPOINT_RECEIVED);
+        if (ret < 0) {
+            goto out;
+        }
+        DPRINTF("Recived vm state\n");
+
+        /* TODO: load vm state */
+
+        ret = colo_ctl_put(ctl, COLO_CHECKPOINT_LOADED);
+        if (ret < 0) {
+            goto out;
+        }
+}
+
+out:
     colo = NULL;
+    if (ctl) {
+        qemu_fclose(ctl);
+    }
     loadvm_exit_colo();
 
     return NULL;
-- 
1.7.12.4


* [Qemu-devel] [RFC PATCH v4 07/28] COLO: Add a new RunState RUN_STATE_COLO
  2015-03-26  5:29 [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (5 preceding siblings ...)
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 06/28] COLO: Implement colo checkpoint protocol zhanghailiang
@ 2015-03-26  5:29 ` zhanghailiang
  2015-05-15 11:28   ` Dr. David Alan Gilbert
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 08/28] QEMUSizedBuffer: Introduce two help functions for qsb zhanghailiang
                   ` (22 subsequent siblings)
  29 siblings, 1 reply; 51+ messages in thread
From: zhanghailiang @ 2015-03-26  5:29 UTC
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, amit.shah, Lai Jiangshan,
	david

The guest enters this state when it is paused to save/restore the VM state
during a COLO checkpoint.
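
Concretely, the transition table below allows entering 'colo' from inmigrate,
paused, finish-migrate, running, suspended and watchdog, and leaving it only
back to running.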

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 qapi-schema.json | 5 ++++-
 vl.c             | 8 ++++++++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/qapi-schema.json b/qapi-schema.json
index 172aae3..43a964b 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -148,12 +148,15 @@
 # @watchdog: the watchdog action is configured to pause and has been triggered
 #
 # @guest-panicked: guest has been panicked as a result of guest OS panic
+#
+# @colo: guest is paused to save/restore VM state under colo checkpoint (since
+# 2.4)
 ##
 { 'enum': 'RunState',
   'data': [ 'debug', 'inmigrate', 'internal-error', 'io-error', 'paused',
             'postmigrate', 'prelaunch', 'finish-migrate', 'restore-vm',
             'running', 'save-vm', 'shutdown', 'suspended', 'watchdog',
-            'guest-panicked' ] }
+            'guest-panicked', 'colo' ] }
 
 ##
 # @StatusInfo:
diff --git a/vl.c b/vl.c
index 9724992..8c07244 100644
--- a/vl.c
+++ b/vl.c
@@ -550,6 +550,7 @@ static const RunStateTransition runstate_transitions_def[] = {
 
     { RUN_STATE_INMIGRATE, RUN_STATE_RUNNING },
     { RUN_STATE_INMIGRATE, RUN_STATE_PAUSED },
+    { RUN_STATE_INMIGRATE, RUN_STATE_COLO },
 
     { RUN_STATE_INTERNAL_ERROR, RUN_STATE_PAUSED },
     { RUN_STATE_INTERNAL_ERROR, RUN_STATE_FINISH_MIGRATE },
@@ -559,6 +560,7 @@ static const RunStateTransition runstate_transitions_def[] = {
 
     { RUN_STATE_PAUSED, RUN_STATE_RUNNING },
     { RUN_STATE_PAUSED, RUN_STATE_FINISH_MIGRATE },
+    { RUN_STATE_PAUSED, RUN_STATE_COLO},
 
     { RUN_STATE_POSTMIGRATE, RUN_STATE_RUNNING },
     { RUN_STATE_POSTMIGRATE, RUN_STATE_FINISH_MIGRATE },
@@ -569,9 +571,12 @@ static const RunStateTransition runstate_transitions_def[] = {
 
     { RUN_STATE_FINISH_MIGRATE, RUN_STATE_RUNNING },
     { RUN_STATE_FINISH_MIGRATE, RUN_STATE_POSTMIGRATE },
+    { RUN_STATE_FINISH_MIGRATE, RUN_STATE_COLO},
 
     { RUN_STATE_RESTORE_VM, RUN_STATE_RUNNING },
 
+    { RUN_STATE_COLO, RUN_STATE_RUNNING },
+
     { RUN_STATE_RUNNING, RUN_STATE_DEBUG },
     { RUN_STATE_RUNNING, RUN_STATE_INTERNAL_ERROR },
     { RUN_STATE_RUNNING, RUN_STATE_IO_ERROR },
@@ -582,6 +587,7 @@ static const RunStateTransition runstate_transitions_def[] = {
     { RUN_STATE_RUNNING, RUN_STATE_SHUTDOWN },
     { RUN_STATE_RUNNING, RUN_STATE_WATCHDOG },
     { RUN_STATE_RUNNING, RUN_STATE_GUEST_PANICKED },
+    { RUN_STATE_RUNNING, RUN_STATE_COLO},
 
     { RUN_STATE_SAVE_VM, RUN_STATE_RUNNING },
 
@@ -592,9 +598,11 @@ static const RunStateTransition runstate_transitions_def[] = {
     { RUN_STATE_RUNNING, RUN_STATE_SUSPENDED },
     { RUN_STATE_SUSPENDED, RUN_STATE_RUNNING },
     { RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE },
+    { RUN_STATE_SUSPENDED, RUN_STATE_COLO},
 
     { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
     { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
+    { RUN_STATE_WATCHDOG, RUN_STATE_COLO},
 
     { RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING },
     { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE },
-- 
1.7.12.4


* [Qemu-devel] [RFC PATCH v4 08/28] QEMUSizedBuffer: Introduce two help functions for qsb
  2015-03-26  5:29 [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (6 preceding siblings ...)
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 07/28] COLO: Add a new RunState RUN_STATE_COLO zhanghailiang
@ 2015-03-26  5:29 ` zhanghailiang
  2015-05-15 11:56   ` Dr. David Alan Gilbert
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 09/28] COLO: Save VM state to slave when do checkpoint zhanghailiang
                   ` (21 subsequent siblings)
  29 siblings, 1 reply; 51+ messages in thread
From: zhanghailiang @ 2015-03-26  5:29 UTC
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, amit.shah, Yang Hongyang,
	david

Introduce two new QEMUSizedBuffer APIs that COLO will use to buffer VM state:
qsb_put_buffer() puts the content of a given QEMUSizedBuffer into a QEMUFile;
it is used to send buffered VM state to the secondary.
qsb_fill_buffer() reads 'size' bytes of data from the file into a qsb; it is
used to read VM state from the socket into a buffer.
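
A rough usage sketch, based on how the later COLO patches combine these
helpers ('to_dst' and 'from_src' are placeholder QEMUFile names, not part of
this patch):

    /* sender: buffer the VM state, then stream it preceded by its length */
    QEMUSizedBuffer *qsb = qsb_create(NULL, 4096);
    QEMUFile *trans = qemu_bufopen("w", qsb);
    /* ... save the VM state into 'trans' ... */
    qemu_fflush(trans);
    size_t size = qsb_get_length(qsb);
    qemu_put_be64(to_dst, size);
    qsb_put_buffer(to_dst, qsb, size);

    /* receiver: read the announced length, then fill its own qsb from it */
    uint64_t total = qemu_get_be64(from_src);
    int got = qsb_fill_buffer(qsb, from_src, total);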

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 include/migration/qemu-file.h |  3 ++-
 migration/qemu-file-buf.c     | 58 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index 745a850..09a0e2a 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -140,7 +140,8 @@ ssize_t qsb_get_buffer(const QEMUSizedBuffer *, off_t start, size_t count,
                        uint8_t *buf);
 ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t *buf,
                      off_t pos, size_t count);
-
+void qsb_put_buffer(QEMUFile *f, QEMUSizedBuffer *qsb, int size);
+int qsb_fill_buffer(QEMUSizedBuffer *qsb, QEMUFile *f, int size);
 
 /*
  * For use on files opened with qemu_bufopen
diff --git a/migration/qemu-file-buf.c b/migration/qemu-file-buf.c
index 16a51a1..686f417 100644
--- a/migration/qemu-file-buf.c
+++ b/migration/qemu-file-buf.c
@@ -365,6 +365,64 @@ ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t *source,
     return count;
 }
 
+
+/**
+ * Put the content of a given QEMUSizedBuffer into QEMUFile.
+ *
+ * @f: A QEMUFile
+ * @qsb: A QEMUSizedBuffer
+ * @size: size of content to write
+ */
+void qsb_put_buffer(QEMUFile *f, QEMUSizedBuffer *qsb, int size)
+{
+    int i, l;
+
+    for (i = 0; i < qsb->n_iov && size > 0; i++) {
+        l = MIN(qsb->iov[i].iov_len, size);
+        qemu_put_buffer(f, qsb->iov[i].iov_base, l);
+        size -= l;
+    }
+}
+
+/*
+ * Read 'size' bytes of data from the file into qsb.
+ * always fill from pos 0 and used after qsb_create().
+ *
+ * It will return size bytes unless there was an error, in which case it will
+ * return as many as it managed to read (assuming blocking fd's which
+ * all current QEMUFile are)
+ */
+int qsb_fill_buffer(QEMUSizedBuffer *qsb, QEMUFile *f, int size)
+{
+    ssize_t rc = qsb_grow(qsb, size);
+    int pending = size, i;
+    qsb->used = 0;
+    uint8_t *buf = NULL;
+
+    if (rc < 0) {
+        return rc;
+    }
+
+    for (i = 0; i < qsb->n_iov && pending > 0; i++) {
+        int doneone = 0;
+        /* read until iov full */
+        while (doneone < qsb->iov[i].iov_len && pending > 0) {
+            int readone = 0;
+            buf = qsb->iov[i].iov_base;
+            readone = qemu_get_buffer(f, buf,
+                                MIN(qsb->iov[i].iov_len - doneone, pending));
+            if (readone == 0) {
+                return qsb->used;
+            }
+            buf += readone;
+            doneone += readone;
+            pending -= readone;
+            qsb->used += readone;
+        }
+    }
+    return qsb->used;
+}
+
 typedef struct QEMUBuffer {
     QEMUSizedBuffer *qsb;
     QEMUFile *file;
-- 
1.7.12.4


* [Qemu-devel] [RFC PATCH v4 09/28] COLO: Save VM state to slave when do checkpoint
  2015-03-26  5:29 [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (7 preceding siblings ...)
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 08/28] QEMUSizedBuffer: Introduce two help functions for qsb zhanghailiang
@ 2015-03-26  5:29 ` zhanghailiang
  2015-05-15 12:09   ` Dr. David Alan Gilbert
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 10/28] COLO RAM: Load PVM's dirty page into SVM's RAM cache temporarily zhanghailiang
                   ` (20 subsequent siblings)
  29 siblings, 1 reply; 51+ messages in thread
From: zhanghailiang @ 2015-03-26  5:29 UTC
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, amit.shah, Lai Jiangshan,
	Yang Hongyang, david

Save the PVM's RAM/device state to the slave when needed.

The VM state will be cached on the slave in a QEMUSizedBuffer, and the slave
needs to know the size of the VM state in advance, so the master stores the
VM state temporarily in a qsb and then migrates the data to the slave.
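
On the wire this means the master first sends the total vmstate size as a
be64, followed by that many raw bytes of buffered vmstate; the matching
receive side (reading the size and filling a qsb with the data) is added by
the 'COLO VMstate' patch later in this series.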

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 arch_init.c      | 22 ++++++++++++++++++--
 migration/colo.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++++----
 savevm.c         |  2 +-
 3 files changed, 79 insertions(+), 7 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index fcfa328..e928e11 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -53,6 +53,7 @@
 #include "hw/acpi/acpi.h"
 #include "qemu/host-utils.h"
 #include "qemu/rcu_queue.h"
+#include "migration/migration-colo.h"
 
 #ifdef DEBUG_ARCH_INIT
 #define DPRINTF(fmt, ...) \
@@ -845,6 +846,13 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     RAMBlock *block;
     int64_t ram_bitmap_pages; /* Size of bitmap in pages, including gaps */
 
+    /*
+     * migration has already setup the bitmap, reuse it.
+     */
+    if (migrate_in_colo_state()) {
+        goto setup_part;
+    }
+
     mig_throttle_on = false;
     dirty_rate_high_cnt = 0;
     bitmap_sync_count = 0;
@@ -901,9 +909,12 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     migration_bitmap_sync();
     qemu_mutex_unlock_ramlist();
     qemu_mutex_unlock_iothread();
-
+setup_part:
     qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
 
+    if (migrate_in_colo_state()) {
+        rcu_read_lock();
+    }
     QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
         qemu_put_byte(f, strlen(block->idstr));
         qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr));
@@ -1007,7 +1018,14 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
     }
 
     ram_control_after_iterate(f, RAM_CONTROL_FINISH);
-    migration_end();
+
+    /*
+     * Since we need to reuse dirty bitmap in colo,
+     * don't cleanup the bitmap.
+     */
+    if (!migrate_enable_colo() || migration_has_failed(migrate_get_current())) {
+        migration_end();
+    }
 
     rcu_read_unlock();
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
diff --git a/migration/colo.c b/migration/colo.c
index 5a8ed1b..64e3f3a 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -60,6 +60,9 @@ enum {
 
 static QEMUBH *colo_bh;
 static Coroutine *colo;
+/* colo buffer */
+#define COLO_BUFFER_BASE_SIZE (1000*1000*4ULL)
+QEMUSizedBuffer *colo_buffer;
 
 bool colo_supported(void)
 {
@@ -123,6 +126,8 @@ static int colo_ctl_get(QEMUFile *f, uint64_t require)
 static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control)
 {
     int ret;
+    size_t size;
+    QEMUFile *trans = NULL;
 
     ret = colo_ctl_put(s->file, COLO_CHECKPOINT_NEW);
     if (ret < 0) {
@@ -133,16 +138,47 @@ static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control)
     if (ret < 0) {
         goto out;
     }
+    /* Reset colo buffer and open it for write */
+    qsb_set_length(colo_buffer, 0);
+    trans = qemu_bufopen("w", colo_buffer);
+    if (!trans) {
+        error_report("Open colo buffer for write failed");
+        goto out;
+    }
+
+    /* suspend and save vm state to colo buffer */
+    qemu_mutex_lock_iothread();
+    vm_stop_force_state(RUN_STATE_COLO);
+    qemu_mutex_unlock_iothread();
+    DPRINTF("vm is stoped\n");
+
+    /* Disable block migration */
+    s->params.blk = 0;
+    s->params.shared = 0;
+    qemu_savevm_state_begin(trans, &s->params);
+    qemu_mutex_lock_iothread();
+    qemu_savevm_state_complete(trans);
+    qemu_mutex_unlock_iothread();
 
-    /* TODO: suspend and save vm state to colo buffer */
+    qemu_fflush(trans);
 
     ret = colo_ctl_put(s->file, COLO_CHECKPOINT_SEND);
     if (ret < 0) {
         goto out;
     }
+    /* we send the total size of the vmstate first */
+    size = qsb_get_length(colo_buffer);
+    ret = colo_ctl_put(s->file, size);
+    if (ret < 0) {
+        goto out;
+    }
 
-    /* TODO: send vmstate to slave */
-
+    qsb_put_buffer(s->file, colo_buffer, size);
+    qemu_fflush(s->file);
+    ret = qemu_file_get_error(s->file);
+    if (ret < 0) {
+        goto out;
+    }
     ret = colo_ctl_get(control, COLO_CHECKPOINT_RECEIVED);
     if (ret < 0) {
         goto out;
@@ -154,9 +190,18 @@ static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control)
     }
     DPRINTF("got COLO_CHECKPOINT_LOADED\n");
 
-    /* TODO: resume master */
+    ret = 0;
+    /* resume master */
+    qemu_mutex_lock_iothread();
+    vm_start();
+    qemu_mutex_unlock_iothread();
+    DPRINTF("vm resume to run again\n");
 
 out:
+    if (trans) {
+        qemu_fclose(trans);
+    }
+
     return ret;
 }
 
@@ -182,6 +227,12 @@ static void *colo_thread(void *opaque)
     }
     DPRINTF("get COLO_READY\n");
 
+    colo_buffer = qsb_create(NULL, COLO_BUFFER_BASE_SIZE);
+    if (colo_buffer == NULL) {
+        error_report("Failed to allocate colo buffer!");
+        goto out;
+    }
+
     qemu_mutex_lock_iothread();
     vm_start();
     qemu_mutex_unlock_iothread();
@@ -197,6 +248,9 @@ static void *colo_thread(void *opaque)
 out:
     migrate_set_state(s, MIGRATION_STATUS_COLO, MIGRATION_STATUS_COMPLETED);
 
+    qsb_free(colo_buffer);
+    colo_buffer = NULL;
+
     if (colo_control) {
         qemu_fclose(colo_control);
     }
diff --git a/savevm.c b/savevm.c
index 3b0e222..cd7ec27 100644
--- a/savevm.c
+++ b/savevm.c
@@ -42,7 +42,7 @@
 #include "qemu/iov.h"
 #include "block/snapshot.h"
 #include "block/qapi.h"
-
+#include "migration/migration-colo.h"
 
 #ifndef ETH_P_RARP
 #define ETH_P_RARP 0x8035
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [RFC PATCH v4 10/28] COLO RAM: Load PVM's dirty page into SVM's RAM cache temporarily
  2015-03-26  5:29 [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (8 preceding siblings ...)
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 09/28] COLO: Save VM state to slave when do checkpoint zhanghailiang
@ 2015-03-26  5:29 ` zhanghailiang
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 11/28] COLO VMstate: Load VM state into qsb before restore it zhanghailiang
                   ` (19 subsequent siblings)
  29 siblings, 0 replies; 51+ messages in thread
From: zhanghailiang @ 2015-03-26  5:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, amit.shah, Lai Jiangshan,
	Yang Hongyang, david

The RAM cache starts out identical to the SVM's (and thus the PVM's) memory.

At each checkpoint, the PVM's dirty RAM is stored into the RAM cache on the
slave (so that the cache matches the PVM's memory at every checkpoint). The
cached RAM is flushed into the SVM only after all of the PVM's vmstate
(RAM and device state) has been received.
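
Below is a stand-alone model of the host_cache idea, with a made-up page
count and names; the real code walks ram_list.blocks and records the page
in the migration bitmap when it is written into the cache:

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096
#define NR_PAGES  8

static uint8_t ram[NR_PAGES * PAGE_SIZE];   /* stands in for the SVM's RAM */
static uint8_t *cache;                      /* stands in for host_cache    */
static unsigned long dirty;                 /* one bit per page            */

static void cache_init(void)
{
    cache = malloc(sizeof(ram));
    if (!cache) {
        exit(1);
    }
    memcpy(cache, ram, sizeof(ram));        /* cache starts identical      */
}

/*
 * A page received from the PVM goes into the cache, not into the guest's
 * RAM, and its bit is set so that a later flush knows to copy it.
 */
static void cache_store_page(unsigned long page, const uint8_t *data)
{
    memcpy(cache + page * PAGE_SIZE, data, PAGE_SIZE);
    dirty |= 1UL << page;
}

int main(void)
{
    uint8_t page[PAGE_SIZE] = { 0xab };

    cache_init();
    cache_store_page(3, page);
    free(cache);
    return 0;
}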

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 arch_init.c                        | 70 ++++++++++++++++++++++++++++++++++++--
 include/exec/cpu-all.h             |  1 +
 include/migration/migration-colo.h |  3 ++
 migration/colo.c                   | 27 ++++++++++++---
 4 files changed, 95 insertions(+), 6 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index e928e11..e32d258 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -314,6 +314,7 @@ static RAMBlock *last_sent_block;
 static ram_addr_t last_offset;
 static unsigned long *migration_bitmap;
 static uint64_t migration_dirty_pages;
+static bool ram_cache_enable;
 static uint32_t last_version;
 static bool ram_bulk_stage;
 
@@ -1085,6 +1086,8 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
     return 0;
 }
 
+static void *memory_region_get_ram_cache_ptr(MemoryRegion *mr, RAMBlock *block);
+
 /* Must be called from within a rcu critical section.
  * Returns a pointer from within the RCU-protected ram_list.
  */
@@ -1102,7 +1105,17 @@ static inline void *host_from_stream_offset(QEMUFile *f,
             return NULL;
         }
 
-        return memory_region_get_ram_ptr(block->mr) + offset;
+        if (ram_cache_enable) {
+            /*
+             * During a colo checkpoint we need the bitmap of these migrated
+             * pages: it helps us decide which pages in the ram cache should
+             * be flushed into the VM's RAM later.
+             */
+            migration_bitmap_set_dirty(block->mr->ram_addr + offset);
+            return memory_region_get_ram_cache_ptr(block->mr, block) + offset;
+        } else {
+            return memory_region_get_ram_ptr(block->mr) + offset;
+        }
     }
 
     len = qemu_get_byte(f);
@@ -1112,7 +1125,13 @@ static inline void *host_from_stream_offset(QEMUFile *f,
     QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
         if (!strncmp(id, block->idstr, sizeof(id)) &&
             block->max_length > offset) {
-            return memory_region_get_ram_ptr(block->mr) + offset;
+            if (ram_cache_enable) {
+                migration_bitmap_set_dirty(block->mr->ram_addr + offset);
+                return memory_region_get_ram_cache_ptr(block->mr, block)
+                       + offset;
+            } else {
+                return memory_region_get_ram_ptr(block->mr) + offset;
+            }
         }
     }
 
@@ -1251,6 +1270,53 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
     return ret;
 }
 
+/*
+ * colo cache: this is for the secondary VM.  We cache the whole memory of
+ * the secondary VM; this will be called after the first migration.
+ */
+void create_and_init_ram_cache(void)
+{
+    RAMBlock *block;
+
+    rcu_read_lock();
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+        block->host_cache = g_malloc(block->used_length);
+        memcpy(block->host_cache, block->host, block->used_length);
+    }
+    rcu_read_unlock();
+
+    ram_cache_enable = true;
+}
+
+void release_ram_cache(void)
+{
+    RAMBlock *block;
+
+    ram_cache_enable = false;
+
+    rcu_read_lock();
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+        g_free(block->host_cache);
+    }
+    rcu_read_unlock();
+}
+
+static void *memory_region_get_ram_cache_ptr(MemoryRegion *mr, RAMBlock *block)
+{
+    if (mr->alias) {
+        return memory_region_get_ram_cache_ptr(mr->alias, block) +
+               mr->alias_offset;
+    }
+
+    assert(mr->terminates);
+
+    ram_addr_t addr = mr->ram_addr & TARGET_PAGE_MASK;
+
+    assert(addr - block->offset < block->used_length);
+
+    return block->host_cache + (addr - block->offset);
+}
+
 static SaveVMHandlers savevm_ram_handlers = {
     .save_live_setup = ram_save_setup,
     .save_live_iterate = ram_save_iterate,
diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index ac06c67..bcfa3bc 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -272,6 +272,7 @@ struct RAMBlock {
     struct rcu_head rcu;
     struct MemoryRegion *mr;
     uint8_t *host;
+    uint8_t *host_cache; /* For colo, VM's ram cache */
     ram_addr_t offset;
     ram_addr_t used_length;
     ram_addr_t max_length;
diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h
index b326c35..d47ad72 100644
--- a/include/migration/migration-colo.h
+++ b/include/migration/migration-colo.h
@@ -35,4 +35,7 @@ bool loadvm_enable_colo(void);
 void loadvm_exit_colo(void);
 void *colo_process_incoming_checkpoints(void *opaque);
 bool loadvm_in_colo_state(void);
+/* ram cache */
+void create_and_init_ram_cache(void);
+void release_ram_cache(void);
 #endif
diff --git a/migration/colo.c b/migration/colo.c
index 64e3f3a..105434e 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -326,11 +326,18 @@ void *colo_process_incoming_checkpoints(void *opaque)
         error_report("Can't open incoming channel!");
         goto out;
     }
+
+    create_and_init_ram_cache();
+
     ret = colo_ctl_put(ctl, COLO_READY);
     if (ret < 0) {
         goto out;
     }
-    /* TODO: in COLO mode, slave is runing, so start the vm */
+    qemu_mutex_lock_iothread();
+    /* in COLO mode, the slave is running, so start the vm */
+    vm_start();
+    qemu_mutex_unlock_iothread();
+    DPRINTF("vm is start\n");
     while (true) {
         int request = 0;
         int ret = colo_wait_handle_cmd(f, &request);
@@ -343,7 +350,12 @@ void *colo_process_incoming_checkpoints(void *opaque)
             }
         }
 
-        /* TODO: suspend guest */
+        /* suspend guest */
+        qemu_mutex_lock_iothread();
+        vm_stop_force_state(RUN_STATE_COLO);
+        qemu_mutex_unlock_iothread();
+        DPRINTF("suspend vm for checkpoint\n");
+
         ret = colo_ctl_put(ctl, COLO_CHECKPOINT_SUSPENDED);
         if (ret < 0) {
             goto out;
@@ -355,7 +367,7 @@ void *colo_process_incoming_checkpoints(void *opaque)
         }
         DPRINTF("Got COLO_CHECKPOINT_SEND\n");
 
-        /* TODO: read migration data into colo buffer */
+        /*TODO Load VM state */
 
         ret = colo_ctl_put(ctl, COLO_CHECKPOINT_RECEIVED);
         if (ret < 0) {
@@ -363,16 +375,23 @@ void *colo_process_incoming_checkpoints(void *opaque)
         }
         DPRINTF("Recived vm state\n");
 
-        /* TODO: load vm state */
+        /* TODO: flush vm state */
 
         ret = colo_ctl_put(ctl, COLO_CHECKPOINT_LOADED);
         if (ret < 0) {
             goto out;
         }
+
+        /* resume guest */
+        qemu_mutex_lock_iothread();
+        vm_start();
+        qemu_mutex_unlock_iothread();
+        DPRINTF("OK, vm runs again\n");
 }
 
 out:
     colo = NULL;
+    release_ram_cache();
     if (ctl) {
         qemu_fclose(ctl);
     }
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [RFC PATCH v4 11/28] COLO VMstate: Load VM state into qsb before restore it
  2015-03-26  5:29 [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (9 preceding siblings ...)
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 10/28] COLO RAM: Load PVM's dirty page into SVM's RAM cache temporarily zhanghailiang
@ 2015-03-26  5:29 ` zhanghailiang
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 12/28] arch_init: Start to trace dirty pages of SVM zhanghailiang
                   ` (18 subsequent siblings)
  29 siblings, 0 replies; 51+ messages in thread
From: zhanghailiang @ 2015-03-26  5:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, amit.shah, Yang Hongyang,
	david

We should cache the device state so that we know the data is complete
and intact before restoring it.
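
On the secondary side this means: read the announced total size, fill the
colo buffer with exactly that many bytes (qsb_fill_buffer), and only then
open it for reading and call qemu_loadvm_state(). A stand-alone sketch of
the "receive everything before parsing" step, in plain C with illustrative
names only:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Read exactly 'len' bytes before letting the caller parse anything;
 * a partially received device state is useless.
 */
static uint8_t *recv_blob(FILE *in, uint64_t len)
{
    uint8_t *buf = malloc(len ? len : 1);

    if (buf && fread(buf, 1, len, in) != len) {
        free(buf);
        return NULL;
    }
    return buf;
}

int main(void)
{
    free(recv_blob(stdin, 0));
    return 0;
}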

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 migration/colo.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 48 insertions(+), 3 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 105434e..119e66c 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -316,8 +316,10 @@ void *colo_process_incoming_checkpoints(void *opaque)
     struct colo_incoming *colo_in = opaque;
     QEMUFile *f = colo_in->file;
     int fd = qemu_get_fd(f);
-    QEMUFile *ctl = NULL;
+    QEMUFile *ctl = NULL, *fb = NULL;
     int ret;
+    uint64_t total_size;
+
     colo = qemu_coroutine_self();
     assert(colo != NULL);
 
@@ -329,10 +331,17 @@ void *colo_process_incoming_checkpoints(void *opaque)
 
     create_and_init_ram_cache();
 
+    colo_buffer = qsb_create(NULL, COLO_BUFFER_BASE_SIZE);
+    if (colo_buffer == NULL) {
+        error_report("Failed to allocate colo buffer!");
+        goto out;
+    }
+
     ret = colo_ctl_put(ctl, COLO_READY);
     if (ret < 0) {
         goto out;
     }
+
     qemu_mutex_lock_iothread();
     /* in COLO mode, the slave is running, so start the vm */
     vm_start();
@@ -367,14 +376,39 @@ void *colo_process_incoming_checkpoints(void *opaque)
         }
         DPRINTF("Got COLO_CHECKPOINT_SEND\n");
 
-        /*TODO Load VM state */
+        /* read the VM state total size first */
+        ret = colo_ctl_get_value(f, &total_size);
+        if (ret < 0) {
+            goto out;
+        }
+        DPRINTF("vmstate total size = %ld\n", total_size);
+        /* read vm device state into colo buffer */
+        ret = qsb_fill_buffer(colo_buffer, f, total_size);
+        if (ret != total_size) {
+            error_report("can't get all migration data");
+            goto out;
+        }
 
         ret = colo_ctl_put(ctl, COLO_CHECKPOINT_RECEIVED);
         if (ret < 0) {
             goto out;
         }
         DPRINTF("Recived vm state\n");
+        /* open colo buffer for read */
+        fb = qemu_bufopen("r", colo_buffer);
+        if (!fb) {
+            error_report("can't open colo buffer for read");
+            goto out;
+        }
 
+        qemu_mutex_lock_iothread();
+        if (qemu_loadvm_state(fb) < 0) {
+            error_report("COLO: loadvm failed");
+            qemu_mutex_unlock_iothread();
+            goto out;
+        }
+        DPRINTF("Finish load all vm state to cache\n");
+        qemu_mutex_unlock_iothread();
         /* TODO: flush vm state */
 
         ret = colo_ctl_put(ctl, COLO_CHECKPOINT_LOADED);
@@ -387,14 +421,25 @@ void *colo_process_incoming_checkpoints(void *opaque)
         vm_start();
         qemu_mutex_unlock_iothread();
         DPRINTF("OK, vm runs again\n");
-}
+
+        qemu_fclose(fb);
+        fb = NULL;
+    }
 
 out:
     colo = NULL;
+
+    if (fb) {
+        qemu_fclose(fb);
+    }
+
     release_ram_cache();
     if (ctl) {
         qemu_fclose(ctl);
     }
+
+    qsb_free(colo_buffer);
+
     loadvm_exit_colo();
 
     return NULL;
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [RFC PATCH v4 12/28] arch_init: Start to trace dirty pages of SVM
  2015-03-26  5:29 [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (10 preceding siblings ...)
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 11/28] COLO VMstate: Load VM state into qsb before restore it zhanghailiang
@ 2015-03-26  5:29 ` zhanghailiang
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 13/28] COLO RAM: Flush cached RAM into SVM's memory zhanghailiang
                   ` (17 subsequent siblings)
  29 siblings, 0 replies; 51+ messages in thread
From: zhanghailiang @ 2015-03-26  5:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, amit.shah, david

This dirty bitmap is used together with the RAM cache's dirty bitmap to
decide which pages in the cache should be flushed into the SVM's RAM.
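
The idea, sketched below with a made-up 8-page guest: the set of pages to
flush at each checkpoint is the union of what the PVM sent into the cache
and what the SVM dirtied locally.

#include <stdio.h>

int main(void)
{
    unsigned long pvm_dirty = 0x05;            /* pages 0 and 2 */
    unsigned long svm_dirty = 0x06;            /* pages 1 and 2 */
    unsigned long flush     = pvm_dirty | svm_dirty;

    printf("pages to flush: 0x%lx\n", flush);  /* prints 0x7 */
    return 0;
}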

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 arch_init.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/arch_init.c b/arch_init.c
index e32d258..f3f2460 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -1277,6 +1277,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 void create_and_init_ram_cache(void)
 {
     RAMBlock *block;
+    int64_t ram_cache_pages = last_ram_offset() >> TARGET_PAGE_BITS;
 
     rcu_read_lock();
     QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
@@ -1286,6 +1287,14 @@ void create_and_init_ram_cache(void)
     rcu_read_unlock();
 
     ram_cache_enable = true;
+    /*
+     * Start dirty logging for the slave VM.  This dirty bitmap is used
+     * together with the RAM cache's dirty bitmap to decide which pages
+     * in the cache should be flushed into the VM's RAM.
+     */
+    migration_bitmap = bitmap_new(ram_cache_pages);
+    migration_dirty_pages = 0;
+    memory_global_dirty_log_start();
 }
 
 void release_ram_cache(void)
@@ -1294,6 +1303,12 @@ void release_ram_cache(void)
 
     ram_cache_enable = false;
 
+    if (migration_bitmap) {
+        memory_global_dirty_log_stop();
+        g_free(migration_bitmap);
+        migration_bitmap = NULL;
+    }
+
     rcu_read_lock();
     QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
         g_free(block->host_cache);
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [RFC PATCH v4 13/28] COLO RAM: Flush cached RAM into SVM's memory
  2015-03-26  5:29 [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (11 preceding siblings ...)
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 12/28] arch_init: Start to trace dirty pages of SVM zhanghailiang
@ 2015-03-26  5:29 ` zhanghailiang
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 14/28] COLO failover: Introduce a new command to trigger a failover zhanghailiang
                   ` (16 subsequent siblings)
  29 siblings, 0 replies; 51+ messages in thread
From: zhanghailiang @ 2015-03-26  5:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, amit.shah, Lai Jiangshan,
	Yang Hongyang, david

While the VMs run, both the PVM and the SVM may dirty pages. At the next
checkpoint the PVM's dirty pages are transferred to the SVM and stored in
the SVM's RAM cache, so after every checkpoint the cache is identical to
the PVM's memory.

Instead of flushing the whole RAM cache into the SVM's memory, do it more
efficiently: only flush pages that were dirtied by the PVM or the SVM since
the last checkpoint. That is enough to make the SVM's memory identical to
the PVM's.

Note that the RAM cache must be flushed before the device state is loaded.
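
A simplified, single-bitmap model of the flush step is sketched below; the
real colo_flush_ram_cache() walks the cache bitmap and the host dirty
bitmap per RAMBlock, and all sizes and names here are made up:

#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096
#define NR_PAGES  8

/*
 * Copy only the pages marked dirty (by either side) from the cache back
 * into the SVM's RAM, clearing each bit as it is handled.
 */
static void flush_cache(uint8_t *ram, const uint8_t *cache,
                        unsigned long *dirty)
{
    for (unsigned long page = 0; page < NR_PAGES; page++) {
        if (*dirty & (1UL << page)) {
            memcpy(ram + page * PAGE_SIZE,
                   cache + page * PAGE_SIZE, PAGE_SIZE);
            *dirty &= ~(1UL << page);
        }
    }
}

int main(void)
{
    static uint8_t ram[NR_PAGES * PAGE_SIZE], cache[NR_PAGES * PAGE_SIZE];
    unsigned long dirty = 0x07;      /* pages 0..2 need flushing */

    memset(cache, 0xab, sizeof(cache));
    flush_cache(ram, cache, &dirty);
    return dirty != 0;               /* bitmap must be empty afterwards */
}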

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 arch_init.c                        | 92 ++++++++++++++++++++++++++++++++++++++
 include/migration/migration-colo.h |  1 +
 migration/colo.c                   |  1 -
 3 files changed, 93 insertions(+), 1 deletion(-)

diff --git a/arch_init.c b/arch_init.c
index f3f2460..2abb5d2 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -1154,6 +1154,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 {
     int flags = 0, ret = 0;
     static uint64_t seq_iter;
+    bool need_flush = false;
 
     seq_iter++;
 
@@ -1221,6 +1222,8 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
                 ret = -EINVAL;
                 break;
             }
+
+            need_flush = true;
             ch = qemu_get_byte(f);
             ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
             break;
@@ -1231,6 +1234,8 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
                 ret = -EINVAL;
                 break;
             }
+
+            need_flush = true;
             qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
             break;
         case RAM_SAVE_FLAG_XBZRLE:
@@ -1246,6 +1251,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
                 ret = -EINVAL;
                 break;
             }
+            need_flush = true;
             break;
         case RAM_SAVE_FLAG_EOS:
             /* normal exit */
@@ -1265,6 +1271,11 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
     }
 
     rcu_read_unlock();
+
+    if (!ret  && ram_cache_enable && need_flush) {
+        DPRINTF("Flush ram_cache\n");
+        colo_flush_ram_cache();
+    }
     DPRINTF("Completed load of VM with exit code %d seq iteration "
             "%" PRIu64 "\n", ret, seq_iter);
     return ret;
@@ -1332,6 +1343,87 @@ static void *memory_region_get_ram_cache_ptr(MemoryRegion *mr, RAMBlock *block)
     return block->host_cache + (addr - block->offset);
 }
 
+/* fix me: should this helper function be merged with
+ * migration_bitmap_find_and_reset_dirty ?
+ */
+static inline
+ram_addr_t host_bitmap_find_and_reset_dirty(MemoryRegion *mr,
+                                            ram_addr_t start)
+{
+    unsigned long base = mr->ram_addr >> TARGET_PAGE_BITS;
+    unsigned long nr = base + (start >> TARGET_PAGE_BITS);
+    uint64_t mr_size = TARGET_PAGE_ALIGN(memory_region_size(mr));
+    unsigned long size = base + (mr_size >> TARGET_PAGE_BITS);
+
+    unsigned long next;
+
+    next = find_next_bit(ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION],
+                         size, nr);
+    if (next < size) {
+        clear_bit(next, ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]);
+    }
+    return (next - base) << TARGET_PAGE_BITS;
+}
+
+/*
+ * Flush the contents of the RAM cache into the SVM's memory.
+ * Only flush pages that were dirtied by the PVM, the SVM, or both.
+ */
+void colo_flush_ram_cache(void)
+{
+    RAMBlock *block = NULL;
+    void *dst_host;
+    void *src_host;
+    ram_addr_t ca  = 0, ha = 0;
+    bool got_ca = 0, got_ha = 0;
+    int64_t host_dirty = 0, both_dirty = 0;
+
+    address_space_sync_dirty_bitmap(&address_space_memory);
+    rcu_read_lock();
+    block = QLIST_FIRST_RCU(&ram_list.blocks);
+    while (true) {
+        if (ca < block->used_length && ca <= ha) {
+            ca = migration_bitmap_find_and_reset_dirty(block->mr, ca);
+            if (ca < block->used_length) {
+                got_ca = 1;
+            }
+        }
+        if (ha < block->used_length && ha <= ca) {
+            ha = host_bitmap_find_and_reset_dirty(block->mr, ha);
+            if (ha < block->used_length && ha != ca) {
+                got_ha = 1;
+            }
+            host_dirty += (ha < block->used_length ? 1 : 0);
+            both_dirty += (ha < block->used_length && ha == ca ? 1 : 0);
+        }
+        if (ca >= block->used_length && ha >= block->used_length) {
+            ca = 0;
+            ha = 0;
+            block = QLIST_NEXT_RCU(block, next);
+            if (!block) {
+                break;
+            }
+        } else {
+            if (got_ha) {
+                got_ha = 0;
+                dst_host = memory_region_get_ram_ptr(block->mr) + ha;
+                src_host = memory_region_get_ram_cache_ptr(block->mr, block)
+                           + ha;
+                memcpy(dst_host, src_host, TARGET_PAGE_SIZE);
+            }
+            if (got_ca) {
+                got_ca = 0;
+                dst_host = memory_region_get_ram_ptr(block->mr) + ca;
+                src_host = memory_region_get_ram_cache_ptr(block->mr, block)
+                           + ca;
+                memcpy(dst_host, src_host, TARGET_PAGE_SIZE);
+            }
+        }
+    }
+    rcu_read_unlock();
+    assert(migration_dirty_pages == 0);
+}
+
 static SaveVMHandlers savevm_ram_handlers = {
     .save_live_setup = ram_save_setup,
     .save_live_iterate = ram_save_iterate,
diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h
index d47ad72..7e177f8 100644
--- a/include/migration/migration-colo.h
+++ b/include/migration/migration-colo.h
@@ -37,5 +37,6 @@ void *colo_process_incoming_checkpoints(void *opaque);
 bool loadvm_in_colo_state(void);
 /* ram cache */
 void create_and_init_ram_cache(void);
+void colo_flush_ram_cache(void);
 void release_ram_cache(void);
 #endif
diff --git a/migration/colo.c b/migration/colo.c
index 119e66c..386f5f5 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -409,7 +409,6 @@ void *colo_process_incoming_checkpoints(void *opaque)
         }
         DPRINTF("Finish load all vm state to cache\n");
         qemu_mutex_unlock_iothread();
-        /* TODO: flush vm state */
 
         ret = colo_ctl_put(ctl, COLO_CHECKPOINT_LOADED);
         if (ret < 0) {
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [RFC PATCH v4 14/28] COLO failover: Introduce a new command to trigger a failover
  2015-03-26  5:29 [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (12 preceding siblings ...)
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 13/28] COLO RAM: Flush cached RAM into SVM's memory zhanghailiang
@ 2015-03-26  5:29 ` zhanghailiang
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 15/28] COLO failover: Implement COLO master/slave failover work zhanghailiang
                   ` (15 subsequent siblings)
  29 siblings, 0 replies; 51+ messages in thread
From: zhanghailiang @ 2015-03-26  5:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, amit.shah, Lai Jiangshan,
	Yang Hongyang, david

Users are free to pick whatever heartbeat solution they want. If the
heartbeat is lost, or they detect some other error, they can issue the
'colo_lost_heartbeat' command to tell COLO to fail over, and COLO will
act accordingly.

For example, if the command is sent to the PVM, the Primary exits COLO
mode and takes over; if it is sent to the Secondary, the Secondary does
the failover work and finally takes over the service.
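
To trigger this from the HMP monitor of whichever side should take over
(matching the QMP example added to qmp-commands.hx below):

    (qemu) colo_lost_heartbeat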

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 hmp-commands.hx                        | 15 ++++++++++++++
 hmp.c                                  |  7 +++++++
 hmp.h                                  |  1 +
 include/migration/migration-colo.h     |  1 +
 include/migration/migration-failover.h | 20 ++++++++++++++++++
 migration/Makefile.objs                |  2 +-
 migration/colo-failover.c              | 38 ++++++++++++++++++++++++++++++++++
 migration/colo.c                       |  1 +
 qapi-schema.json                       |  9 ++++++++
 qmp-commands.hx                        | 19 +++++++++++++++++
 stubs/migration-colo.c                 |  8 +++++++
 11 files changed, 120 insertions(+), 1 deletion(-)
 create mode 100644 include/migration/migration-failover.h
 create mode 100644 migration/colo-failover.c

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 3089533..e615889 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -993,6 +993,21 @@ Enable/Disable the usage of a capability @var{capability} for migration.
 ETEXI
 
     {
+        .name       = "colo_lost_heartbeat",
+        .args_type  = "",
+        .params     = "",
+        .help       = "Tell COLO that heartbeat is lost,\n\t\t\t"
+                      "a failover or takeover is needed.",
+        .mhandler.cmd = hmp_colo_lost_heartbeat,
+    },
+
+STEXI
+@item colo_lost_heartbeat
+@findex colo_lost_heartbeat
+Tell COLO that heartbeat is lost, a failover or takeover is needed.
+ETEXI
+
+    {
         .name       = "client_migrate_info",
         .args_type  = "protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?",
         .params     = "protocol hostname port tls-port cert-subject",
diff --git a/hmp.c b/hmp.c
index f31ae27..ee0e139 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1184,6 +1184,13 @@ void hmp_migrate_set_capability(Monitor *mon, const QDict *qdict)
     }
 }
 
+void hmp_colo_lost_heartbeat(Monitor *mon, const QDict *qdict)
+{
+    Error *err = NULL;
+    qmp_colo_lost_heartbeat(&err);
+    hmp_handle_error(mon, &err);
+}
+
 void hmp_set_password(Monitor *mon, const QDict *qdict)
 {
     const char *protocol  = qdict_get_str(qdict, "protocol");
diff --git a/hmp.h b/hmp.h
index 2b9308b..af85144 100644
--- a/hmp.h
+++ b/hmp.h
@@ -65,6 +65,7 @@ void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_capability(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_cache_size(Monitor *mon, const QDict *qdict);
+void hmp_colo_lost_heartbeat(Monitor *mon, const QDict *qdict);
 void hmp_set_password(Monitor *mon, const QDict *qdict);
 void hmp_expire_password(Monitor *mon, const QDict *qdict);
 void hmp_eject(Monitor *mon, const QDict *qdict);
diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h
index 7e177f8..593431a 100644
--- a/include/migration/migration-colo.h
+++ b/include/migration/migration-colo.h
@@ -17,6 +17,7 @@
 #include "migration/migration.h"
 #include "block/coroutine.h"
 #include "qemu/thread.h"
+#include "qemu/main-loop.h"
 
 bool colo_supported(void);
 void colo_info_mig_init(void);
diff --git a/include/migration/migration-failover.h b/include/migration/migration-failover.h
new file mode 100644
index 0000000..a8767fc
--- /dev/null
+++ b/include/migration/migration-failover.h
@@ -0,0 +1,20 @@
+/*
+ *  COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ *  (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#ifndef MIGRATION_FAILOVER_H
+#define MIGRATION_FAILOVER_H
+
+#include "qemu-common.h"
+
+void failover_request_set(void);
+
+#endif
diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index cb7bd30..50d8392 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -1,6 +1,6 @@
 common-obj-y += migration.o tcp.o
-common-obj-$(CONFIG_COLO) += colo.o
 common-obj-y += colo-comm.o
+common-obj-$(CONFIG_COLO) += colo.o colo-failover.o
 common-obj-y += vmstate.o
 common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o qemu-file-stdio.o
 common-obj-y += xbzrle.o
diff --git a/migration/colo-failover.c b/migration/colo-failover.c
new file mode 100644
index 0000000..af78054
--- /dev/null
+++ b/migration/colo-failover.c
@@ -0,0 +1,38 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "migration/migration-colo.h"
+#include "migration/migration-failover.h"
+#include "qmp-commands.h"
+
+static bool failover_request;
+
+static QEMUBH *failover_bh;
+
+static void colo_failover_bh(void *opaque)
+{
+    qemu_bh_delete(failover_bh);
+    failover_bh = NULL;
+    /*TODO: Do failover work */
+}
+
+void failover_request_set(void)
+{
+    failover_request = true;
+    failover_bh = qemu_bh_new(colo_failover_bh, NULL);
+    qemu_bh_schedule(failover_bh);
+}
+
+void qmp_colo_lost_heartbeat(Error **errp)
+{
+    failover_request_set();
+}
diff --git a/migration/colo.c b/migration/colo.c
index 386f5f5..ce41afb 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -13,6 +13,7 @@
 #include "sysemu/sysemu.h"
 #include "migration/migration-colo.h"
 #include "qemu/error-report.h"
+#include "migration/migration-failover.h"
 
 #define DEBUG_COLO 0
 
diff --git a/qapi-schema.json b/qapi-schema.json
index 43a964b..e11c152 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -567,6 +567,15 @@
 { 'command': 'query-migrate-capabilities', 'returns':   ['MigrationCapabilityStatus']}
 
 ##
+# @colo-lost-heartbeat
+#
+# Tell COLO that heartbeat is lost
+#
+# Since: 2.4
+##
+{ 'command': 'colo-lost-heartbeat' }
+
+##
 # @MouseInfo:
 #
 # Information about a mouse device.
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 7f68760..85d3d72 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -781,6 +781,25 @@ Example:
 EQMP
 
     {
+        .name       = "colo-lost-heartbeat",
+        .args_type  = "",
+        .mhandler.cmd_new = qmp_marshal_input_colo_lost_heartbeat,
+    },
+
+SQMP
+colo-lost-heartbeat
+--------------------
+
+Tell COLO that heartbeat is lost, a failover or takeover is needed.
+
+Example:
+
+-> { "execute": "colo-lost-heartbeat" }
+<- { "return": {} }
+
+EQMP
+
+    {
         .name       = "client_migrate_info",
         .args_type  = "protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?",
         .params     = "protocol hostname port tls-port cert-subject",
diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
index cbadcd6..82fe14c 100644
--- a/stubs/migration-colo.c
+++ b/stubs/migration-colo.c
@@ -11,6 +11,7 @@
  */
 
 #include "migration/migration-colo.h"
+#include "qmp-commands.h"
 
 bool colo_supported(void)
 {
@@ -30,3 +31,10 @@ void *colo_process_incoming_checkpoints(void *opaque)
 {
     return NULL;
 }
+
+void qmp_colo_lost_heartbeat(Error **errp)
+{
+    error_setg(errp, "COLO is not supported, please rerun configure"
+                     " with --enable-colo option in order to support"
+                     " COLO feature");
+}
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [RFC PATCH v4 15/28] COLO failover: Implement COLO master/slave failover work
  2015-03-26  5:29 [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (13 preceding siblings ...)
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 14/28] COLO failover: Introduce a new command to trigger a failover zhanghailiang
@ 2015-03-26  5:29 ` zhanghailiang
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 16/28] COLO failover: Don't do failover during loading VM's state zhanghailiang
                   ` (14 subsequent siblings)
  29 siblings, 0 replies; 51+ messages in thread
From: zhanghailiang @ 2015-03-26  5:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, amit.shah, Lai Jiangshan,
	david

If a failover is requested, the PVM or SVM does some cleanup work, exits
COLO mode, and resumes normal operation.
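
Roughly, the failover path dispatches on the node's COLO role and then
signals the checkpoint thread that is spinning on a completion flag. The
C11 sketch below only models that shape; the real code also stops the VM
and, on the slave, re-enters the incoming coroutine:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

enum colo_mode {
    COLO_UNPROTECTED_MODE,
    COLO_PRIMARY_MODE,
    COLO_SECONDARY_MODE,
};

static atomic_bool failover_completed;

static void do_failover(enum colo_mode mode)
{
    if (mode == COLO_SECONDARY_MODE) {
        puts("secondary: take over, resume from the local state");
    } else {
        puts("primary: exit COLO mode, restart the guest");
    }
    atomic_store(&failover_completed, true);    /* wake the waiters */
}

int main(void)
{
    do_failover(COLO_PRIMARY_MODE);
    /* the checkpoint thread in the real code spins until this is set */
    while (!atomic_load(&failover_completed)) {
        ;
    }
    return 0;
}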

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 include/migration/migration-colo.h     |  14 ++++
 include/migration/migration-failover.h |   2 +
 migration/colo-comm.c                  |  10 +++
 migration/colo-failover.c              |  12 +++-
 migration/colo.c                       | 122 ++++++++++++++++++++++++++++++++-
 stubs/migration-colo.c                 |   5 ++
 6 files changed, 163 insertions(+), 2 deletions(-)

diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h
index 593431a..7e8fe46 100644
--- a/include/migration/migration-colo.h
+++ b/include/migration/migration-colo.h
@@ -22,6 +22,13 @@
 bool colo_supported(void);
 void colo_info_mig_init(void);
 
+/* Checkpoint control, called in migration/checkpoint thread */
+enum {
+    COLO_UNPROTECTED_MODE = 0,
+    COLO_PRIMARY_MODE,
+    COLO_SECONDARY_MODE,
+};
+
 struct colo_incoming {
     QEMUFile *file;
     QemuThread thread;
@@ -36,8 +43,15 @@ bool loadvm_enable_colo(void);
 void loadvm_exit_colo(void);
 void *colo_process_incoming_checkpoints(void *opaque);
 bool loadvm_in_colo_state(void);
+
+int get_colo_mode(void);
+
 /* ram cache */
 void create_and_init_ram_cache(void);
 void colo_flush_ram_cache(void);
 void release_ram_cache(void);
+
+/* failover */
+void colo_do_failover(MigrationState *s);
+
 #endif
diff --git a/include/migration/migration-failover.h b/include/migration/migration-failover.h
index a8767fc..5e59b1d 100644
--- a/include/migration/migration-failover.h
+++ b/include/migration/migration-failover.h
@@ -16,5 +16,7 @@
 #include "qemu-common.h"
 
 void failover_request_set(void);
+void failover_request_clear(void);
+bool failover_request_is_set(void);
 
 #endif
diff --git a/migration/colo-comm.c b/migration/colo-comm.c
index 1d844e1..c3dd617 100644
--- a/migration/colo-comm.c
+++ b/migration/colo-comm.c
@@ -31,6 +31,16 @@ static void colo_info_save(QEMUFile *f, void *opaque)
 }
 
 /* restore */
+int get_colo_mode(void)
+{
+    if (migrate_in_colo_state()) {
+        return COLO_PRIMARY_MODE;
+    } else if (loadvm_in_colo_state()) {
+        return COLO_SECONDARY_MODE;
+    } else {
+        return COLO_UNPROTECTED_MODE;
+    }
+}
 static int colo_info_load(QEMUFile *f, void *opaque, int version_id)
 {
     int value = qemu_get_byte(f);
diff --git a/migration/colo-failover.c b/migration/colo-failover.c
index af78054..850b05c 100644
--- a/migration/colo-failover.c
+++ b/migration/colo-failover.c
@@ -22,7 +22,7 @@ static void colo_failover_bh(void *opaque)
 {
     qemu_bh_delete(failover_bh);
     failover_bh = NULL;
-    /*TODO: Do failover work */
+    colo_do_failover(NULL);
 }
 
 void failover_request_set(void)
@@ -32,6 +32,16 @@ void failover_request_set(void)
     qemu_bh_schedule(failover_bh);
 }
 
+void failover_request_clear(void)
+{
+    failover_request = false;
+}
+
+bool failover_request_is_set(void)
+{
+    return failover_request;
+}
+
 void qmp_colo_lost_heartbeat(Error **errp)
 {
     failover_request_set();
diff --git a/migration/colo.c b/migration/colo.c
index ce41afb..6240178 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -76,6 +76,68 @@ bool migrate_in_colo_state(void)
     return (s->state == MIGRATION_STATUS_COLO);
 }
 
+static bool colo_runstate_is_stopped(void)
+{
+    return runstate_check(RUN_STATE_COLO) || !runstate_is_running();
+}
+
+/*
+ * There are two ways to enter this function:
+ * 1. From the colo checkpoint incoming thread; in this case
+ *    we should protect it with the iothread lock.
+ * 2. From a user command; since hmp/qmp commands run in the
+ *    main loop, taking the iothread lock there would cause a
+ *    deadlock.
+ */
+static void slave_do_failover(void)
+{
+    DPRINTF("do_failover!\n");
+
+    colo = NULL;
+
+    if (!autostart) {
+        error_report("\"-S\" qemu option will be ignored in colo slave side");
+        /* recover runstate to normal migration finish state */
+        autostart = true;
+    }
+
+    /* On slave side, jump to incoming co */
+    if (migration_incoming_co) {
+        qemu_coroutine_enter(migration_incoming_co, NULL);
+    }
+}
+
+static void master_do_failover(void)
+{
+    MigrationState *s = migrate_get_current();
+
+    if (!colo_runstate_is_stopped()) {
+        vm_stop_force_state(RUN_STATE_COLO);
+    }
+
+    if (s->state != MIGRATION_STATUS_FAILED) {
+        migrate_set_state(s, MIGRATION_STATUS_COLO, MIGRATION_STATUS_COMPLETED);
+    }
+
+    vm_start();
+}
+
+static bool failover_completed;
+void colo_do_failover(MigrationState *s)
+{
+    /* Make sure vm stopped while failover */
+    if (!colo_runstate_is_stopped()) {
+        vm_stop_force_state(RUN_STATE_COLO);
+    }
+
+    if (get_colo_mode() == COLO_SECONDARY_MODE) {
+        slave_do_failover();
+    } else {
+        master_do_failover();
+    }
+    failover_completed = true;
+}
+
 /* colo checkpoint control helper */
 static int colo_ctl_put(QEMUFile *f, uint64_t request)
 {
@@ -147,11 +209,23 @@ static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control)
         goto out;
     }
 
+    if (failover_request_is_set()) {
+        ret = -1;
+        goto out;
+    }
     /* suspend and save vm state to colo buffer */
     qemu_mutex_lock_iothread();
     vm_stop_force_state(RUN_STATE_COLO);
     qemu_mutex_unlock_iothread();
     DPRINTF("vm is stoped\n");
+    /*
+     * failover request bh could be called after
+     * vm_stop_force_state so we check failover_request_is_set() again.
+     */
+    if (failover_request_is_set()) {
+        ret = -1;
+        goto out;
+    }
 
     /* Disable block migration */
     s->params.blk = 0;
@@ -247,7 +321,18 @@ static void *colo_thread(void *opaque)
     }
 
 out:
-    migrate_set_state(s, MIGRATION_STATUS_COLO, MIGRATION_STATUS_COMPLETED);
+    error_report("colo: some error happens in colo_thread");
+    qemu_mutex_lock_iothread();
+    if (!failover_request_is_set()) {
+        error_report("master takeover from checkpoint channel");
+        failover_request_set();
+    }
+    qemu_mutex_unlock_iothread();
+
+    while (!failover_completed) {
+        ;
+    }
+    failover_request_clear();
 
     qsb_free(colo_buffer);
     colo_buffer = NULL;
@@ -288,6 +373,11 @@ void colo_init_checkpointer(MigrationState *s)
     qemu_bh_schedule(colo_bh);
 }
 
+bool loadvm_in_colo_state(void)
+{
+    return colo != NULL;
+}
+
 /*
  * return:
  * 0: start a checkpoint
@@ -359,6 +449,10 @@ void *colo_process_incoming_checkpoints(void *opaque)
                 continue;
             }
         }
+        if (failover_request_is_set()) {
+            error_report("failover request from heartbeat channel");
+            goto out;
+        }
 
         /* suspend guest */
         qemu_mutex_lock_iothread();
@@ -427,6 +521,32 @@ void *colo_process_incoming_checkpoints(void *opaque)
     }
 
 out:
+    error_report("Detect some error or get a failover request");
+    /* determine whether we need to failover */
+    if (!failover_request_is_set()) {
+        /*
+         * TODO: we should probably raise a QMP event to the user here.
+         * It would help the user know what happened and decide whether
+         * to do a failover.
+         */
+        usleep(2000 * 1000);
+    }
+    /* check the flag again */
+    if (!failover_request_is_set()) {
+        /*
+         * We assume the master is still alive according to the heartbeat,
+         * so just kill the slave.
+         */
+        error_report("SVM is going to exit!");
+        exit(1);
+    } else {
+        /* if we get here, the master may be dead; we are doing a failover */
+        while (!failover_completed) {
+            ;
+        }
+        failover_request_clear();
+    }
+
     colo = NULL;
 
     if (fb) {
diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
index 82fe14c..75b9940 100644
--- a/stubs/migration-colo.c
+++ b/stubs/migration-colo.c
@@ -32,6 +32,11 @@ void *colo_process_incoming_checkpoints(void *opaque)
     return NULL;
 }
 
+bool loadvm_in_colo_state(void)
+{
+    return false;
+}
+
 void qmp_colo_lost_heartbeat(Error **errp)
 {
     error_setg(errp, "COLO is not supported, please rerun configure"
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [RFC PATCH v4 16/28] COLO failover: Don't do failover during loading VM's state
  2015-03-26  5:29 [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (14 preceding siblings ...)
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 15/28] COLO failover: Implement COLO master/slave failover work zhanghailiang
@ 2015-03-26  5:29 ` zhanghailiang
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 17/28] COLO: Add new command parameter 'colo_nicname' 'colo_script' for net zhanghailiang
                   ` (13 subsequent siblings)
  29 siblings, 0 replies; 51+ messages in thread
From: zhanghailiang @ 2015-03-26  5:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, amit.shah, Lai Jiangshan,
	david

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 migration/colo.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/migration/colo.c b/migration/colo.c
index 6240178..f419e88 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -60,6 +60,7 @@ enum {
 };
 
 static QEMUBH *colo_bh;
+static bool vmstate_loading;
 static Coroutine *colo;
 /* colo buffer */
 #define COLO_BUFFER_BASE_SIZE (1000*1000*4ULL)
@@ -91,7 +92,10 @@ static bool colo_runstate_is_stopped(void)
  */
 static void slave_do_failover(void)
 {
-    DPRINTF("do_failover!\n");
+    /* Wait for incoming thread loading vmstate */
+    while (vmstate_loading) {
+        ;
+    }
 
     colo = NULL;
 
@@ -125,6 +129,7 @@ static void master_do_failover(void)
 static bool failover_completed;
 void colo_do_failover(MigrationState *s)
 {
+    DPRINTF("do_failover!\n");
     /* Make sure vm stopped while failover */
     if (!colo_runstate_is_stopped()) {
         vm_stop_force_state(RUN_STATE_COLO);
@@ -497,12 +502,15 @@ void *colo_process_incoming_checkpoints(void *opaque)
         }
 
         qemu_mutex_lock_iothread();
+        vmstate_loading = true;
         if (qemu_loadvm_state(fb) < 0) {
             error_report("COLO: loadvm failed");
+            vmstate_loading = false;
             qemu_mutex_unlock_iothread();
             goto out;
         }
         DPRINTF("Finish load all vm state to cache\n");
+        vmstate_loading = false;
         qemu_mutex_unlock_iothread();
 
         ret = colo_ctl_put(ctl, COLO_CHECKPOINT_LOADED);
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [RFC PATCH v4 17/28] COLO: Add new command parameter 'colo_nicname' 'colo_script' for net
  2015-03-26  5:29 [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (15 preceding siblings ...)
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 16/28] COLO failover: Don't do failover during loading VM's state zhanghailiang
@ 2015-03-26  5:29 ` zhanghailiang
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 18/28] COLO NIC: Init/remove colo nic devices when add/cleanup tap devices zhanghailiang
                   ` (12 subsequent siblings)
  29 siblings, 0 replies; 51+ messages in thread
From: zhanghailiang @ 2015-03-26  5:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, amit.shah, david

'colo_nicname' should be set to the name of the host network interface,
for example 'eth2'; it is passed as a parameter to 'colo_script'.
'colo_script' should be set to the path of the COLO proxy script.

These parameters are parsed in the tap code.
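
A hypothetical command line (the interface names and the script path are
examples, not defaults):

    -netdev tap,id=hn0,ifname=tap0,colo_nicname=eth2,colo_script=/opt/colo/colo_proxy_script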

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 include/net/net.h |  4 ++++
 net/tap.c         | 27 ++++++++++++++++++++++++---
 qapi-schema.json  |  8 +++++++-
 qemu-options.hx   | 10 +++++++++-
 4 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/include/net/net.h b/include/net/net.h
index 50ffcb9..6cc575f 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -84,6 +84,10 @@ struct NetClientState {
     char *model;
     char *name;
     char info_str[256];
+    char colo_script[1024];
+    char colo_nicname[128];
+    char ifname[128];
+    char ifb[2][128];
     unsigned receive_disabled : 1;
     NetClientDestructor *destructor;
     unsigned int queue_index;
diff --git a/net/tap.c b/net/tap.c
index 968df46..823f78e 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -608,6 +608,7 @@ static int net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
     Error *err = NULL;
     TAPState *s;
     int vhostfd;
+    NetClientState *nc = NULL;
 
     s = net_tap_fd_init(peer, model, name, fd, vnet_hdr);
     if (!s) {
@@ -635,6 +636,17 @@ static int net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
         }
     }
 
+    nc = &(s->nc);
+    snprintf(nc->ifname, sizeof(nc->ifname), "%s", ifname);
+    if (tap->has_colo_script) {
+        snprintf(nc->colo_script, sizeof(nc->colo_script), "%s",
+                 tap->colo_script);
+    }
+    if (tap->has_colo_nicname) {
+        snprintf(nc->colo_nicname, sizeof(nc->colo_nicname), "%s",
+                 tap->colo_nicname);
+    }
+
     if (tap->has_vhost ? tap->vhost :
         vhostfdname || (tap->has_vhostforce && tap->vhostforce)) {
         VhostNetOptions options;
@@ -754,9 +766,10 @@ int net_init_tap(const NetClientOptions *opts, const char *name,
 
         if (tap->has_ifname || tap->has_script || tap->has_downscript ||
             tap->has_vnet_hdr || tap->has_helper || tap->has_queues ||
-            tap->has_vhostfd) {
+            tap->has_vhostfd || tap->has_colo_script || tap->has_colo_nicname) {
             error_report("ifname=, script=, downscript=, vnet_hdr=, "
                          "helper=, queues=, and vhostfd= "
+                         "colo_script=, and colo_nicname= "
                          "are invalid with fds=");
             return -1;
         }
@@ -796,9 +809,11 @@ int net_init_tap(const NetClientOptions *opts, const char *name,
         }
     } else if (tap->has_helper) {
         if (tap->has_ifname || tap->has_script || tap->has_downscript ||
-            tap->has_vnet_hdr || tap->has_queues || tap->has_vhostfds) {
+            tap->has_vnet_hdr || tap->has_queues || tap->has_vhostfds ||
+            tap->has_colo_script || tap->has_colo_nicname) {
             error_report("ifname=, script=, downscript=, and vnet_hdr= "
-                         "queues=, and vhostfds= are invalid with helper=");
+                         "queues=, vhostfds=, colo_script=, and "
+                         "colo_nicname= are invalid with helper=");
             return -1;
         }
 
@@ -817,6 +832,12 @@ int net_init_tap(const NetClientOptions *opts, const char *name,
             return -1;
         }
     } else {
+        if (queues > 1 && (tap->has_colo_script || tap->has_colo_nicname)) {
+            error_report("queues > 1 is invalid if colo_script or "
+                         "colo_nicname is specified");
+            return -1;
+        }
+
         if (tap->has_vhostfds) {
             error_report("vhostfds= is invalid if fds= wasn't specified");
             return -1;
diff --git a/qapi-schema.json b/qapi-schema.json
index e11c152..8abc367 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -2143,6 +2143,10 @@
 #
 # @queues: #optional number of queues to be created for multiqueue capable tap
 #
+# @colo_nicname: #optional the host physical nic for QEMU (Since 2.3)
+#
+# @colo_script: #optional the script file which used by COLO (Since 2.3)
+#
 # Since 1.2
 ##
 { 'type': 'NetdevTapOptions',
@@ -2159,7 +2163,9 @@
     '*vhostfd':    'str',
     '*vhostfds':   'str',
     '*vhostforce': 'bool',
-    '*queues':     'uint32'} }
+    '*queues':     'uint32',
+    '*colo_nicname':  'str',
+    '*colo_script':   'str'} }
 
 ##
 # @NetdevSocketOptions
diff --git a/qemu-options.hx b/qemu-options.hx
index 319d971..ff63c50 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1466,7 +1466,11 @@ DEF("net", HAS_ARG, QEMU_OPTION_net,
     "-net tap[,vlan=n][,name=str],ifname=name\n"
     "                connect the host TAP network interface to VLAN 'n'\n"
 #else
-    "-net tap[,vlan=n][,name=str][,fd=h][,fds=x:y:...:z][,ifname=name][,script=file][,downscript=dfile][,helper=helper][,sndbuf=nbytes][,vnet_hdr=on|off][,vhost=on|off][,vhostfd=h][,vhostfds=x:y:...:z][,vhostforce=on|off][,queues=n]\n"
+    "-net tap[,vlan=n][,name=str][,fd=h][,fds=x:y:...:z][,ifname=name][,script=file][,downscript=dfile][,helper=helper][,sndbuf=nbytes][,vnet_hdr=on|off][,vhost=on|off][,vhostfd=h][,vhostfds=x:y:...:z][,vhostforce=on|off][,queues=n]"
+#ifdef CONFIG_COLO
+    "[,colo_nicname=nicname][,colo_script=scriptfile]"
+#endif
+    "\n"
     "                connect the host TAP network interface to VLAN 'n'\n"
     "                use network scripts 'file' (default=" DEFAULT_NETWORK_SCRIPT ")\n"
     "                to configure it and 'dfile' (default=" DEFAULT_NETWORK_DOWN_SCRIPT ")\n"
@@ -1486,6 +1490,10 @@ DEF("net", HAS_ARG, QEMU_OPTION_net,
     "                use 'vhostfd=h' to connect to an already opened vhost net device\n"
     "                use 'vhostfds=x:y:...:z to connect to multiple already opened vhost net devices\n"
     "                use 'queues=n' to specify the number of queues to be created for multiqueue TAP\n"
+#ifdef CONFIG_COLO
+    "                use 'colo_nicname=nicname' to specify the host physical nic for QEMU\n"
+    "                use 'colo_script=scriptfile' to specify script file when colo is enabled\n"
+#endif
     "-net bridge[,vlan=n][,name=str][,br=bridge][,helper=helper]\n"
     "                connects a host TAP network interface to a host bridge device 'br'\n"
     "                (default=" DEFAULT_BRIDGE_INTERFACE ") using the program 'helper'\n"
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [RFC PATCH v4 18/28] COLO NIC: Init/remove colo nic devices when add/cleanup tap devices
  2015-03-26  5:29 [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (16 preceding siblings ...)
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 17/28] COLO: Add new command parameter 'colo_nicname' 'colo_script' for net zhanghailiang
@ 2015-03-26  5:29 ` zhanghailiang
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 19/28] COLO NIC: Implement colo nic device interface configure() zhanghailiang
                   ` (11 subsequent siblings)
  29 siblings, 0 replies; 51+ messages in thread
From: zhanghailiang @ 2015-03-26  5:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, amit.shah, david

In COLO mode, do some initialization work for each NIC that will be used
by COLO.
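
Registration happens from net_init_tap_one() and removal from
tap_cleanup(). The sketch below models only the bookkeeping, using a plain
singly linked list instead of the QTAILQ and purely illustrative names:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct nic_device {
    char ifname[128];
    struct nic_device *next;
};

static struct nic_device *nic_devices;

/* Remember a tap that carries colo_script/colo_nicname. */
static void colo_add_nic(const char *ifname)
{
    struct nic_device *nic = calloc(1, sizeof(*nic));

    if (!nic) {
        return;
    }
    snprintf(nic->ifname, sizeof(nic->ifname), "%s", ifname);
    nic->next = nic_devices;
    nic_devices = nic;
}

/* Drop it again when the tap goes away. */
static void colo_remove_nic(const char *ifname)
{
    struct nic_device **p = &nic_devices;

    while (*p) {
        if (!strcmp((*p)->ifname, ifname)) {
            struct nic_device *gone = *p;

            *p = gone->next;
            free(gone);
            return;
        }
        p = &(*p)->next;
    }
}

int main(void)
{
    colo_add_nic("tap0");
    colo_remove_nic("tap0");
    return nic_devices != NULL;
}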

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 include/net/colo-nic.h | 20 ++++++++++++++
 net/Makefile.objs      |  1 +
 net/colo-nic.c         | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++
 net/tap.c              | 18 +++++++++----
 stubs/migration-colo.c |  9 +++++++
 5 files changed, 116 insertions(+), 5 deletions(-)
 create mode 100644 include/net/colo-nic.h
 create mode 100644 net/colo-nic.c

diff --git a/include/net/colo-nic.h b/include/net/colo-nic.h
new file mode 100644
index 0000000..d35ee17
--- /dev/null
+++ b/include/net/colo-nic.h
@@ -0,0 +1,20 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef COLO_NIC_H
+#define COLO_NIC_H
+
+void colo_add_nic_devices(NetClientState *nc);
+void colo_remove_nic_devices(NetClientState *nc);
+
+#endif
diff --git a/net/Makefile.objs b/net/Makefile.objs
index ec19cb3..73f4a81 100644
--- a/net/Makefile.objs
+++ b/net/Makefile.objs
@@ -13,3 +13,4 @@ common-obj-$(CONFIG_HAIKU) += tap-haiku.o
 common-obj-$(CONFIG_SLIRP) += slirp.o
 common-obj-$(CONFIG_VDE) += vde.o
 common-obj-$(CONFIG_NETMAP) += netmap.o
+common-obj-$(CONFIG_COLO) += colo-nic.o
diff --git a/net/colo-nic.c b/net/colo-nic.c
new file mode 100644
index 0000000..965af49
--- /dev/null
+++ b/net/colo-nic.c
@@ -0,0 +1,73 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ *
+ */
+#include "include/migration/migration.h"
+#include "migration/migration-colo.h"
+#include "net/net.h"
+#include "net/colo-nic.h"
+#include "qemu/error-report.h"
+
+
+typedef struct nic_device {
+    NetClientState *nc;
+    bool (*support_colo)(NetClientState *nc);
+    int (*configure)(NetClientState *nc, bool up, int side, int index);
+    QTAILQ_ENTRY(nic_device) next;
+    bool is_up;
+} nic_device;
+
+
+
+QTAILQ_HEAD(, nic_device) nic_devices = QTAILQ_HEAD_INITIALIZER(nic_devices);
+static int colo_nic_side = -1;
+
+/*
+* colo_proxy_script usage
+* ./colo_proxy_script master/slave install/uninstall phy_if virt_if index
+*/
+static bool colo_nic_support(NetClientState *nc)
+{
+    return nc && nc->colo_script[0] && nc->colo_nicname[0];
+}
+
+void colo_add_nic_devices(NetClientState *nc)
+{
+    struct nic_device *nic = g_malloc0(sizeof(*nic));
+
+    nic->support_colo = colo_nic_support;
+    nic->configure = NULL;
+    /*
+     * TODO
+     * only support "-netdev tap,colo_script=..." options,
+     * "-net nic -net tap..." options are not supported
+     */
+    nic->nc = nc;
+
+    QTAILQ_INSERT_TAIL(&nic_devices, nic, next);
+}
+
+void colo_remove_nic_devices(NetClientState *nc)
+{
+    struct nic_device *nic, *next_nic;
+
+    if (!nc || colo_nic_side == -1) {
+        return;
+    }
+
+    QTAILQ_FOREACH_SAFE(nic, &nic_devices, next, next_nic) {
+        if (nic->nc == nc) {
+            QTAILQ_REMOVE(&nic_devices, nic, next);
+            g_free(nic);
+        }
+    }
+    colo_nic_side = -1;
+}
diff --git a/net/tap.c b/net/tap.c
index 823f78e..d64e046 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -41,6 +41,7 @@
 #include "qemu/error-report.h"
 
 #include "net/tap.h"
+#include "net/colo-nic.h"
 
 #include "net/vhost_net.h"
 
@@ -296,6 +297,8 @@ static void tap_cleanup(NetClientState *nc)
 
     qemu_purge_queued_packets(nc);
 
+    colo_remove_nic_devices(nc);
+
     if (s->down_script[0])
         launch_script(s->down_script, s->down_script_arg, s->fd);
 
@@ -603,7 +606,7 @@ static int net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
                             const char *model, const char *name,
                             const char *ifname, const char *script,
                             const char *downscript, const char *vhostfdname,
-                            int vnet_hdr, int fd)
+                            int vnet_hdr, int fd, bool setup_colo)
 {
     Error *err = NULL;
     TAPState *s;
@@ -647,6 +650,10 @@ static int net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
                  tap->colo_nicname);
     }
 
+    if (setup_colo) {
+        colo_add_nic_devices(nc);
+    }
+
     if (tap->has_vhost ? tap->vhost :
         vhostfdname || (tap->has_vhostforce && tap->vhostforce)) {
         VhostNetOptions options;
@@ -756,7 +763,7 @@ int net_init_tap(const NetClientOptions *opts, const char *name,
 
         if (net_init_tap_one(tap, peer, "tap", name, NULL,
                              script, downscript,
-                             vhostfdname, vnet_hdr, fd)) {
+                             vhostfdname, vnet_hdr, fd, true)) {
             return -1;
         }
     } else if (tap->has_fds) {
@@ -803,7 +810,7 @@ int net_init_tap(const NetClientOptions *opts, const char *name,
             if (net_init_tap_one(tap, peer, "tap", name, ifname,
                                  script, downscript,
                                  tap->has_vhostfds ? vhost_fds[i] : NULL,
-                                 vnet_hdr, fd)) {
+                                 vnet_hdr, fd, false)) {
                 return -1;
             }
         }
@@ -827,7 +834,7 @@ int net_init_tap(const NetClientOptions *opts, const char *name,
 
         if (net_init_tap_one(tap, peer, "bridge", name, ifname,
                              script, downscript, vhostfdname,
-                             vnet_hdr, fd)) {
+                             vnet_hdr, fd, false)) {
             close(fd);
             return -1;
         }
@@ -870,7 +877,8 @@ int net_init_tap(const NetClientOptions *opts, const char *name,
             if (net_init_tap_one(tap, peer, "tap", name, ifname,
                                  i >= 1 ? "no" : script,
                                  i >= 1 ? "no" : downscript,
-                                 vhostfdname, vnet_hdr, fd)) {
+                                 vhostfdname, vnet_hdr, fd,
+                                 i == 0)) {
                 close(fd);
                 return -1;
             }
diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
index 75b9940..61fe22b 100644
--- a/stubs/migration-colo.c
+++ b/stubs/migration-colo.c
@@ -12,6 +12,7 @@
 
 #include "migration/migration-colo.h"
 #include "qmp-commands.h"
+#include "net/colo-nic.h"
 
 bool colo_supported(void)
 {
@@ -37,6 +38,14 @@ bool loadvm_in_colo_state(void)
     return false;
 }
 
+void colo_add_nic_devices(NetClientState *nc)
+{
+}
+
+void colo_remove_nic_devices(NetClientState *nc)
+{
+}
+
 void qmp_colo_lost_heartbeat(Error **errp)
 {
     error_setg(errp, "COLO is not supported, please rerun configure"
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [RFC PATCH v4 19/28] COLO NIC: Implement colo nic device interface configure()
  2015-03-26  5:29 [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (17 preceding siblings ...)
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 18/28] COLO NIC: Init/remove colo nic devices when add/cleanup tap devices zhanghailiang
@ 2015-03-26  5:29 ` zhanghailiang
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 20/28] COLO NIC : Implement colo nic init/destroy function zhanghailiang
                   ` (10 subsequent siblings)
  29 siblings, 0 replies; 51+ messages in thread
From: zhanghailiang @ 2015-03-26  5:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, amit.shah, david

Implement the COLO NIC device interface configure(), and add a script that
configures the NIC devices:
${QEMU_SCRIPT_DIR}/colo-proxy-script.sh
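
The contract between QEMU and the helper is only this five-argument command line
(master/slave install/uninstall phy_if virt_if index). As a rough standalone illustration
of invoking it the way the patch's launch_colo_script() does, via fork()/execv()/waitpid();
the script path, interface names and index below are made-up example values:

#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    /* Example values only: primary side, physical NIC eth2, tap0, index 1. */
    char *argv[] = {
        (char *)"./scripts/colo-proxy-script.sh",
        (char *)"master", (char *)"install",
        (char *)"eth2", (char *)"tap0", (char *)"1",
        NULL
    };
    pid_t pid = fork();
    int status;

    if (pid == 0) {
        execv(argv[0], argv);
        _exit(127);                        /* exec failed in the child */
    }
    if (pid > 0 && waitpid(pid, &status, 0) == pid &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0) {
        printf("proxy rules installed\n");
        return 0;
    }
    fprintf(stderr, "colo-proxy-script.sh failed\n");
    return 1;
}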

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 net/colo-nic.c               | 56 ++++++++++++++++++++++++-
 scripts/colo-proxy-script.sh | 97 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 152 insertions(+), 1 deletion(-)
 create mode 100755 scripts/colo-proxy-script.sh

diff --git a/net/colo-nic.c b/net/colo-nic.c
index 965af49..f8fc35d 100644
--- a/net/colo-nic.c
+++ b/net/colo-nic.c
@@ -39,12 +39,66 @@ static bool colo_nic_support(NetClientState *nc)
     return nc && nc->colo_script[0] && nc->colo_nicname[0];
 }
 
+static int launch_colo_script(char *argv[])
+{
+    int pid, status;
+    char *script = argv[0];
+
+    /* try to launch network script */
+    pid = fork();
+    if (pid == 0) {
+        execv(script, argv);
+        _exit(1);
+    } else if (pid > 0) {
+        while (waitpid(pid, &status, 0) != pid) {
+            /* loop */
+        }
+
+        if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
+            return 0;
+        }
+    }
+    return -1;
+}
+
+static int colo_nic_configure(NetClientState *nc,
+            bool up, int side, int index)
+{
+    int i, argc = 6;
+    char *argv[7], index_str[32];
+    char **parg;
+
+    if (!nc || index <= 0) {
+        error_report("Cannot parse colo_script or colo_nicname");
+        return -1;
+    }
+
+    parg = argv;
+    *parg++ = nc->colo_script;
+    *parg++ = (char *)(side == COLO_SECONDARY_MODE ? "slave" : "master");
+    *parg++ = (char *)(up ? "install" : "uninstall");
+    *parg++ = nc->colo_nicname;
+    *parg++ = nc->ifname;
+    sprintf(index_str, "%d", index);
+    *parg++ = index_str;
+    *parg = NULL;
+
+    for (i = 0; i < argc; i++) {
+        if (!argv[i][0]) {
+            error_report("Can not get colo_script argument");
+            return -1;
+        }
+    }
+
+    return launch_colo_script(argv);
+}
+
 void colo_add_nic_devices(NetClientState *nc)
 {
     struct nic_device *nic = g_malloc0(sizeof(*nic));
 
     nic->support_colo = colo_nic_support;
-    nic->configure = NULL;
+    nic->configure = colo_nic_configure;
     /*
      * TODO
      * only support "-netdev tap,colo_script=..." options,
diff --git a/scripts/colo-proxy-script.sh b/scripts/colo-proxy-script.sh
new file mode 100755
index 0000000..e1a9154
--- /dev/null
+++ b/scripts/colo-proxy-script.sh
@@ -0,0 +1,97 @@
+#!/bin/sh
+#usage: colo-proxy-script.sh master/slave install/uninstall phy_if virt_if index
+# e.g. colo-proxy-script.sh master install eth2 tap0 1
+
+side=$1
+action=$2
+phy_if=$3
+virt_if=$4
+index=$5
+br=br1
+failover_br=br0
+
+script_usage()
+{
+    echo -n "usage: ./colo-proxy-script.sh master/slave "
+    echo -e "install/uninstall phy_if virt_if index\n"
+}
+
+master_install()
+{
+    tc qdisc add dev $virt_if root handle 1: prio
+    tc filter add dev $virt_if parent 1: protocol ip prio 10 u32 match u32 \
+        0 0 flowid 1:2 action mirred egress mirror dev $phy_if
+    tc filter add dev $virt_if parent 1: protocol arp prio 11 u32 match u32 \
+        0 0 flowid 1:2 action mirred egress mirror dev $phy_if
+    tc filter add dev $virt_if parent 1: protocol ipv6 prio 12 u32 match u32 \
+        0 0 flowid 1:2 action mirred egress mirror dev $phy_if
+
+    modprobe nf_conntrack_ipv4
+    modprobe xt_PMYCOLO sec_dev=$phy_if
+
+    /usr/local/sbin/iptables -t mangle -I PREROUTING -m physdev --physdev-in \
+        $virt_if -j PMYCOLO --index $index
+    /usr/local/sbin/ip6tables -t mangle -I PREROUTING -m physdev --physdev-in \
+        $virt_if -j PMYCOLO --index $index
+    /usr/local/sbin/arptables -I INPUT -i $phy_if -j MARK --set-mark $index
+}
+
+master_uninstall()
+{
+    tc filter del dev $virt_if parent 1: protocol ip prio 10 u32 match u32 \
+        0 0 flowid 1:2 action mirred egress mirror dev $phy_if
+    tc filter del dev $virt_if parent 1: protocol arp prio 11 u32 match u32 \
+        0 0 flowid 1:2 action mirred egress mirror dev $phy_if
+    tc filter del dev $virt_if parent 1: protocol ipv6 prio 12 u32 match u32 \
+        0 0 flowid 1:2 action mirred egress mirror dev $phy_if
+    tc qdisc del dev $virt_if root handle 1: prio
+
+    /usr/local/sbin/iptables -t mangle -F
+    /usr/local/sbin/ip6tables -t mangle -F
+    /usr/local/sbin/arptables -F
+    rmmod xt_PMYCOLO
+}
+
+slave_install()
+{
+    brctl addif $br $phy_if
+    modprobe xt_SECCOLO
+
+    /usr/local/sbin/iptables -t mangle -I PREROUTING -m physdev --physdev-in \
+        $virt_if -j SECCOLO --index $index
+    /usr/local/sbin/ip6tables -t mangle -I PREROUTING -m physdev --physdev-in \
+        $virt_if -j SECCOLO --index $index
+}
+
+slave_uninstall()
+{
+    brctl delif $br $phy_if
+    brctl delif $br $virt_if
+    brctl addif $failover_br $virt_if
+
+    /usr/local/sbin/iptables -t mangle -F
+    /usr/local/sbin/ip6tables -t mangle -F
+    rmmod xt_SECCOLO
+}
+
+if [ $# -ne 5 ]; then
+    script_usage
+    exit 1
+fi
+
+if [ "x$side" != "xmaster" ] && [ "x$side" != "xslave" ]; then
+    script_usage
+    exit 2
+fi
+
+if [ "x$action" != "xinstall" ] && [ "x$action" != "xuninstall" ]; then
+    script_usage
+    exit 3
+fi
+
+if [ $index -lt 0 ] || [ $index -gt 100 ]; then
+    echo "index overflow"
+    exit 4
+fi
+
+${side}_${action}
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [RFC PATCH v4 20/28] COLO NIC : Implement colo nic init/destroy function
  2015-03-26  5:29 [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (18 preceding siblings ...)
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 19/28] COLO NIC: Implement colo nic device interface configure() zhanghailiang
@ 2015-03-26  5:29 ` zhanghailiang
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 21/28] COLO NIC: Some init work related with proxy module zhanghailiang
                   ` (9 subsequent siblings)
  29 siblings, 0 replies; 51+ messages in thread
From: zhanghailiang @ 2015-03-26  5:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, amit.shah, david

When in COLO mode, call the COLO NIC init/destroy functions: colo_proxy_init() when the
COLO checkpoint threads start, and colo_proxy_destroy() when they exit.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 include/net/colo-nic.h |  2 ++
 migration/colo.c       | 17 +++++++++++
 net/colo-nic.c         | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 99 insertions(+)

diff --git a/include/net/colo-nic.h b/include/net/colo-nic.h
index d35ee17..40dbcfb 100644
--- a/include/net/colo-nic.h
+++ b/include/net/colo-nic.h
@@ -14,6 +14,8 @@
 #ifndef COLO_NIC_H
 #define COLO_NIC_H
 
+int colo_proxy_init(int side);
+void colo_proxy_destroy(int side);
 void colo_add_nic_devices(NetClientState *nc);
 void colo_remove_nic_devices(NetClientState *nc);
 
diff --git a/migration/colo.c b/migration/colo.c
index f419e88..dffd6f9 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -14,6 +14,7 @@
 #include "migration/migration-colo.h"
 #include "qemu/error-report.h"
 #include "migration/migration-failover.h"
+#include "net/colo-nic.h"
 
 #define DEBUG_COLO 0
 
@@ -291,6 +292,12 @@ static void *colo_thread(void *opaque)
     QEMUFile *colo_control = NULL;
     int ret;
 
+    if (colo_proxy_init(COLO_PRIMARY_MODE) != 0) {
+        error_report("Init colo proxy error");
+        goto out;
+    }
+    DPRINTF("proxy init complete\n");
+
     colo_control = qemu_fopen_socket(qemu_get_fd(s->file), "rb");
     if (!colo_control) {
         error_report("Open colo_control failed!");
@@ -350,6 +357,8 @@ out:
     qemu_bh_schedule(s->cleanup_bh);
     qemu_mutex_unlock_iothread();
 
+    colo_proxy_destroy(COLO_PRIMARY_MODE);
+
     return NULL;
 }
 
@@ -419,6 +428,13 @@ void *colo_process_incoming_checkpoints(void *opaque)
     colo = qemu_coroutine_self();
     assert(colo != NULL);
 
+     /* configure the network */
+    if (colo_proxy_init(COLO_SECONDARY_MODE) != 0) {
+        error_report("Init colo proxy error\n");
+        goto out;
+    }
+    DPRINTF("proxy init complete\n");
+
     ctl = qemu_fopen_socket(fd, "wb");
     if (!ctl) {
         error_report("Can't open incoming channel!");
@@ -570,5 +586,6 @@ out:
 
     loadvm_exit_colo();
 
+    colo_proxy_destroy(COLO_SECONDARY_MODE);
     return NULL;
 }
diff --git a/net/colo-nic.c b/net/colo-nic.c
index f8fc35d..a4719ce 100644
--- a/net/colo-nic.c
+++ b/net/colo-nic.c
@@ -26,6 +26,12 @@ typedef struct nic_device {
 } nic_device;
 
 
+typedef struct colo_proxy {
+    int sockfd;
+    int index;
+} colo_proxy;
+
+static colo_proxy cp_info = {-1, -1};
 
 QTAILQ_HEAD(, nic_device) nic_devices = QTAILQ_HEAD_INITIALIZER(nic_devices);
 static int colo_nic_side = -1;
@@ -93,6 +99,60 @@ static int colo_nic_configure(NetClientState *nc,
     return launch_colo_script(argv);
 }
 
+static int configure_one_nic(NetClientState *nc,
+             bool up, int side, int index)
+{
+    struct nic_device *nic;
+
+    assert(nc);
+
+    QTAILQ_FOREACH(nic, &nic_devices, next) {
+        if (nic->nc == nc) {
+            if (!nic->support_colo || !nic->support_colo(nic->nc)
+                || !nic->configure) {
+                return -1;
+            }
+            if (up == nic->is_up) {
+                return 0;
+            }
+
+            if (nic->configure(nic->nc, up, side, index) && up) {
+                return -1;
+            }
+            nic->is_up = up;
+            return 0;
+        }
+    }
+
+    return -1;
+}
+
+static int configure_nic(int side, int index)
+{
+    struct nic_device *nic;
+
+    if (QTAILQ_EMPTY(&nic_devices)) {
+        return -1;
+    }
+
+    QTAILQ_FOREACH(nic, &nic_devices, next) {
+        if (configure_one_nic(nic->nc, 1, side, index)) {
+            return -1;
+        }
+    }
+
+    return 0;
+}
+
+static void teardown_nic(int side, int index)
+{
+    struct nic_device *nic;
+
+    QTAILQ_FOREACH(nic, &nic_devices, next) {
+        configure_one_nic(nic->nc, 0, side, index);
+    }
+}
+
 void colo_add_nic_devices(NetClientState *nc)
 {
     struct nic_device *nic = g_malloc0(sizeof(*nic));
@@ -119,9 +179,29 @@ void colo_remove_nic_devices(NetClientState *nc)
 
     QTAILQ_FOREACH_SAFE(nic, &nic_devices, next, next_nic) {
         if (nic->nc == nc) {
+            configure_one_nic(nc, 0, colo_nic_side, cp_info.index);
             QTAILQ_REMOVE(&nic_devices, nic, next);
             g_free(nic);
         }
     }
     colo_nic_side = -1;
 }
+
+int colo_proxy_init(int side)
+{
+    int ret = -1;
+
+    ret = configure_nic(side, cp_info.index);
+    if (ret != 0) {
+        error_report("failed to execute colo-proxy-script");
+    }
+    colo_nic_side = side;
+    return ret;
+}
+
+void colo_proxy_destroy(int side)
+{
+    teardown_nic(side, cp_info.index);
+    cp_info.index = -1;
+    colo_nic_side = -1;
+}
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [RFC PATCH v4 21/28] COLO NIC: Some init work related with proxy module
  2015-03-26  5:29 [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (19 preceding siblings ...)
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 20/28] COLO NIC : Implement colo nic init/destroy function zhanghailiang
@ 2015-03-26  5:29 ` zhanghailiang
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 22/28] COLO: Do checkpoint according to the result of net packets comparing zhanghailiang
                   ` (8 subsequent siblings)
  29 siblings, 0 replies; 51+ messages in thread
From: zhanghailiang @ 2015-03-26  5:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, amit.shah, david

Implement the communication protocol with the proxy kernel module using
netlink, and do some initialization work.
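
For readers not familiar with private netlink families: the proxy is an out-of-tree kernel
module, and QEMU talks to it over a dedicated netlink protocol number. A minimal standalone
sketch of the init handshake follows; it reuses the NETLINK_COLO number (28) and the message
type ordering from this patch, while the nl_pid value of 1 is an arbitrary example. Without a
module registering that protocol, the socket() call simply fails:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>

#define NETLINK_COLO    28                      /* private protocol, as in the patch */
#define COLO_PROXY_INIT (NLMSG_MIN_TYPE + 4)    /* matches the patch's enum order */

int main(void)
{
    struct sockaddr_nl sa = { .nl_family = AF_NETLINK, .nl_pid = 1 };
    struct nlmsghdr req;
    int fd = socket(PF_NETLINK, SOCK_RAW, NETLINK_COLO);

    /* Fails with EPROTONOSUPPORT unless the proxy kernel module is loaded. */
    if (fd < 0 || bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        perror("netlink");
        return 1;
    }

    memset(&req, 0, sizeof(req));
    req.nlmsg_len   = NLMSG_SPACE(0);            /* header only, no payload */
    req.nlmsg_type  = COLO_PROXY_INIT;
    req.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
    req.nlmsg_pid   = sa.nl_pid;                 /* doubles as the COLO index */

    /* An unconnected netlink socket sends to the kernel (pid 0) by default. */
    if (send(fd, &req, req.nlmsg_len, 0) < 0) {
        perror("send");
        close(fd);
        return 1;
    }
    /* A real client now recvmsg()s the NLMSG_ERROR ack and checks err->error. */
    close(fd);
    return 0;
}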

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 net/colo-nic.c | 171 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 171 insertions(+)

diff --git a/net/colo-nic.c b/net/colo-nic.c
index a4719ce..38d9bf5 100644
--- a/net/colo-nic.c
+++ b/net/colo-nic.c
@@ -15,7 +15,19 @@
 #include "net/net.h"
 #include "net/colo-nic.h"
 #include "qemu/error-report.h"
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <linux/netlink.h>
 
+#define NETLINK_COLO 28
+
+enum colo_netlink_op {
+    COLO_QUERY_CHECKPOINT = (NLMSG_MIN_TYPE + 1),
+    COLO_CHECKPOINT,
+    COLO_FAILOVER,
+    COLO_PROXY_INIT,
+    COLO_PROXY_RESET, /* UNUSED, will be used for continuous FT */
+};
 
 typedef struct nic_device {
     NetClientState *nc;
@@ -177,6 +189,12 @@ void colo_remove_nic_devices(NetClientState *nc)
         return;
     }
 
+    /* close netlink socket before cleanup tap device. */
+    if (cp_info.sockfd >= 0) {
+        close(cp_info.sockfd);
+        cp_info.sockfd = -1;
+    }
+
     QTAILQ_FOREACH_SAFE(nic, &nic_devices, next, next_nic) {
         if (nic->nc == nc) {
             configure_one_nic(nc, 0, colo_nic_side, cp_info.index);
@@ -187,20 +205,173 @@ void colo_remove_nic_devices(NetClientState *nc)
     colo_nic_side = -1;
 }
 
+static int colo_proxy_send(uint8_t *buff, uint64_t size, int type)
+{
+    struct sockaddr_nl sa;
+    struct nlmsghdr msg;
+    struct iovec iov;
+    struct msghdr mh;
+    int ret;
+
+    memset(&sa, 0, sizeof(sa));
+    sa.nl_family = AF_NETLINK;
+    sa.nl_pid = 0;
+    sa.nl_groups = 0;
+
+    msg.nlmsg_len = NLMSG_SPACE(0);
+    msg.nlmsg_flags = NLM_F_REQUEST;
+    if (type == COLO_PROXY_INIT) {
+        msg.nlmsg_flags |= NLM_F_ACK;
+    }
+    msg.nlmsg_seq = 0;
+    /* This is untrustworthy */
+    msg.nlmsg_pid = cp_info.index;
+    msg.nlmsg_type = type;
+
+    iov.iov_base = &msg;
+    iov.iov_len = msg.nlmsg_len;
+
+    mh.msg_name = &sa;
+    mh.msg_namelen = sizeof(sa);
+    mh.msg_iov = &iov;
+    mh.msg_iovlen = 1;
+    mh.msg_control = NULL;
+    mh.msg_controllen = 0;
+    mh.msg_flags = 0;
+
+    ret = sendmsg(cp_info.sockfd, &mh, 0);
+    if (ret <= 0) {
+        error_report("can't send msg to kernel by netlink: %s",
+                     strerror(errno));
+    }
+
+    return ret;
+}
+
+/* error: return -1, otherwise return 0 */
+static int64_t colo_proxy_recv(uint8_t **buff, int flags)
+{
+    struct sockaddr_nl sa;
+    struct iovec iov;
+    struct msghdr mh = {
+        .msg_name = &sa,
+        .msg_namelen = sizeof(sa),
+        .msg_iov = &iov,
+        .msg_iovlen = 1,
+    };
+    uint8_t *tmp = g_malloc(16384);
+    uint32_t size = 16384;
+    int64_t len = 0;
+    int ret;
+
+    iov.iov_base = tmp;
+    iov.iov_len = size;
+next:
+    ret = recvmsg(cp_info.sockfd, &mh, flags);
+    if (ret <= 0) {
+        goto out;
+    }
+
+    len += ret;
+    if (mh.msg_flags & MSG_TRUNC) {
+        size += 16384;
+        tmp = g_realloc(tmp, size);
+        iov.iov_base = tmp + len;
+        iov.iov_len = size - len;
+        goto next;
+    }
+
+    *buff = tmp;
+    return len;
+
+out:
+    g_free(tmp);
+    *buff = NULL;
+    return ret;
+}
+
 int colo_proxy_init(int side)
 {
+    int skfd = 0;
+    struct sockaddr_nl sa;
+    struct nlmsghdr *h;
+    struct timeval tv = {0, 500000}; /* timeout for recvmsg from kernel */
+    int i = 1;
     int ret = -1;
+    uint8_t *buff = NULL;
+    int64_t size;
+
+    skfd = socket(PF_NETLINK, SOCK_RAW, NETLINK_COLO);
+    if (skfd < 0) {
+        error_report("can not create a netlink socket: %s", strerror(errno));
+        goto out;
+    }
+    cp_info.sockfd = skfd;
+    memset(&sa, 0, sizeof(sa));
+    sa.nl_family = AF_NETLINK;
+    sa.nl_groups = 0;
+retry:
+    sa.nl_pid = i++;
+
+    if (i > 10) {
+        error_report("netlink bind error");
+        goto out;
+    }
+
+    ret = bind(skfd, (struct sockaddr *)&sa, sizeof(sa));
+    if (ret < 0 && errno == EADDRINUSE) {
+        error_report("colo index %d is already in use", sa.nl_pid);
+        goto retry;
+    }
+
+    cp_info.index = sa.nl_pid;
+    ret = colo_proxy_send(NULL, 0, COLO_PROXY_INIT);
+    if (ret < 0) {
+        goto out;
+    }
+    setsockopt(cp_info.sockfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
+    ret = -1;
+    size = colo_proxy_recv(&buff, 0);
+    /* disable SO_RCVTIMEO */
+    tv.tv_usec = 0;
+    setsockopt(cp_info.sockfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
+    if (size < 0) {
+        error_report("Can't recv msg from kernel by netlink: %s",
+                     strerror(errno));
+        goto out;
+    }
+
+    if (size) {
+        h = (struct nlmsghdr *)buff;
+
+        if (h->nlmsg_type == NLMSG_ERROR) {
+            struct nlmsgerr *err = (struct nlmsgerr *)NLMSG_DATA(h);
+            if (size - sizeof(*h) < sizeof(*err)) {
+                goto out;
+            }
+            ret = -err->error;
+            if (ret) {
+                goto out;
+            }
+        }
+    }
 
     ret = configure_nic(side, cp_info.index);
     if (ret != 0) {
         error_report("failed to execute colo-proxy-script");
     }
     colo_nic_side = side;
+
+out:
+    g_free(buff);
     return ret;
 }
 
 void colo_proxy_destroy(int side)
 {
+    if (cp_info.sockfd >= 0) {
+        close(cp_info.sockfd);
+    }
     teardown_nic(side, cp_info.index);
     cp_info.index = -1;
     colo_nic_side = -1;
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [RFC PATCH v4 22/28] COLO: Do checkpoint according to the result of net packets comparing
  2015-03-26  5:29 [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (20 preceding siblings ...)
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 21/28] COLO NIC: Some init work related with proxy module zhanghailiang
@ 2015-03-26  5:29 ` zhanghailiang
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 23/28] COLO: Improve checkpoint efficiency by do additional periodic checkpoint zhanghailiang
                   ` (7 subsequent siblings)
  29 siblings, 0 replies; 51+ messages in thread
From: zhanghailiang @ 2015-03-26  5:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, amit.shah, david

Only do a checkpoint when the two VMs' outgoing network packets are inconsistent.
We also enforce a minimum interval between two consecutive checkpoints, to
give the VM a chance to run.
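
The "do we need a checkpoint" answer arrives from the kernel proxy as a netlink message whose
payload is a single flag (the patch's struct colo_msg). Below is a standalone sketch of parsing
such a message; the struct layout mirrors the patch, the helper name is made up, and the buffer
is built by hand instead of being received over the netlink socket:

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <linux/netlink.h>

/* Mirrors the payload the proxy module sends, as defined in this patch. */
struct colo_msg {
    bool is_checkpoint;
};

/* Return 1 = checkpoint needed, 0 = not needed, -1 = malformed/error. */
static int parse_proxy_msg(void *buf, size_t len)
{
    struct nlmsghdr *h = buf;

    if (len < sizeof(*h) || h->nlmsg_type == NLMSG_ERROR ||
        h->nlmsg_len < NLMSG_LENGTH(sizeof(struct colo_msg))) {
        return -1;
    }
    return ((struct colo_msg *)NLMSG_DATA(h))->is_checkpoint ? 1 : 0;
}

int main(void)
{
    /* Hand-built message standing in for what the proxy would deliver. */
    size_t size = NLMSG_SPACE(sizeof(struct colo_msg));
    struct nlmsghdr *h = calloc(1, size);
    struct colo_msg *m = NLMSG_DATA(h);

    h->nlmsg_len  = NLMSG_LENGTH(sizeof(*m));
    h->nlmsg_type = NLMSG_MIN_TYPE;          /* any non-error message type */
    m->is_checkpoint = true;

    printf("proxy verdict: %d\n", parse_proxy_msg(h, size));
    free(h);
    return 0;
}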

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 include/net/colo-nic.h |  2 ++
 migration/colo.c       | 34 ++++++++++++++++++++++++++++++++++
 net/colo-nic.c         | 41 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 77 insertions(+)

diff --git a/include/net/colo-nic.h b/include/net/colo-nic.h
index 40dbcfb..67c9807 100644
--- a/include/net/colo-nic.h
+++ b/include/net/colo-nic.h
@@ -19,4 +19,6 @@ void colo_proxy_destroy(int side);
 void colo_add_nic_devices(NetClientState *nc);
 void colo_remove_nic_devices(NetClientState *nc);
 
+int colo_proxy_compare(void);
+
 #endif
diff --git a/migration/colo.c b/migration/colo.c
index dffd6f9..9ef4554 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -25,6 +25,13 @@
         }                                                   \
     } while (0)
 
+/*
+* We should not do checkpoints one after another without any time interval,
+* because that would keep the VM in a continuous 'stop' state.
+* CHECKPOINT_MIN_PERIOD is the minimum time between two checkpoint actions.
+*/
+#define CHECKPOINT_MIN_PERIOD 100  /* unit: ms */
+
 enum {
     COLO_READY = 0x46,
 
@@ -290,6 +297,7 @@ static void *colo_thread(void *opaque)
 {
     MigrationState *s = opaque;
     QEMUFile *colo_control = NULL;
+    int64_t current_time, checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
     int ret;
 
     if (colo_proxy_init(COLO_PRIMARY_MODE) != 0) {
@@ -326,10 +334,36 @@ static void *colo_thread(void *opaque)
     DPRINTF("vm resume to run\n");
 
     while (s->state == MIGRATION_STATUS_COLO) {
+        int proxy_checkpoint_req;
+
+        /* wait for a colo checkpoint */
+        proxy_checkpoint_req = colo_proxy_compare();
+        if (proxy_checkpoint_req < 0) {
+            goto out;
+        } else if (!proxy_checkpoint_req) {
+            /*
+             * No checkpoint is needed, wait for 1ms and then
+             * check if we need checkpoint again
+             */
+            g_usleep(1000);
+            continue;
+        } else {
+            int64_t interval;
+
+            current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+            interval = current_time - checkpoint_time;
+            if (interval < CHECKPOINT_MIN_PERIOD) {
+                /* Limit the min time between two checkpoint */
+                g_usleep((1000*(CHECKPOINT_MIN_PERIOD - interval)));
+            }
+            DPRINTF("Net packets are not consistent!!!\n");
+        }
+
         /* start a colo checkpoint */
         if (colo_do_checkpoint_transaction(s, colo_control)) {
             goto out;
         }
+        checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
     }
 
 out:
diff --git a/net/colo-nic.c b/net/colo-nic.c
index 38d9bf5..563d661 100644
--- a/net/colo-nic.c
+++ b/net/colo-nic.c
@@ -37,6 +37,9 @@ typedef struct nic_device {
     bool is_up;
 } nic_device;
 
+typedef struct colo_msg {
+    bool is_checkpoint;
+} colo_msg;
 
 typedef struct colo_proxy {
     int sockfd;
@@ -376,3 +379,41 @@ void colo_proxy_destroy(int side)
     cp_info.index = -1;
     colo_nic_side = -1;
 }
+/*
+do checkpoint: return 1
+error: return -1
+do not checkpoint: return 0
+*/
+int colo_proxy_compare(void)
+{
+    uint8_t *buff;
+    int64_t size;
+    struct nlmsghdr *h;
+    struct colo_msg *m;
+    int ret = -1;
+
+    size = colo_proxy_recv(&buff, MSG_DONTWAIT);
+
+    /* timeout, return no checkpoint message. */
+    if (size <= 0) {
+        return 0;
+    }
+
+    h = (struct nlmsghdr *) buff;
+
+    if (h->nlmsg_type == NLMSG_ERROR) {
+        goto out;
+    }
+
+    if (h->nlmsg_len < NLMSG_LENGTH(sizeof(*m))) {
+        goto out;
+    }
+
+    m = NLMSG_DATA(h);
+
+    ret = m->is_checkpoint ? 1 : 0;
+
+out:
+    g_free(buff);
+    return ret;
+}
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [RFC PATCH v4 23/28] COLO: Improve checkpoint efficiency by do additional periodic checkpoint
  2015-03-26  5:29 [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (21 preceding siblings ...)
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 22/28] COLO: Do checkpoint according to the result of net packets comparing zhanghailiang
@ 2015-03-26  5:29 ` zhanghailiang
  2015-05-18 16:48   ` Dr. David Alan Gilbert
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 24/28] COLO: Add colo-set-checkpoint-period command zhanghailiang
                   ` (6 subsequent siblings)
  29 siblings, 1 reply; 51+ messages in thread
From: zhanghailiang @ 2015-03-26  5:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, amit.shah, Yang Hongyang,
	david

Besides the normal checkpoints that are triggered by the result of network packet
comparison, we also do an additional checkpoint periodically. If no checkpoint has been
taken for a long time (a special case that happens when the network packets stay
consistent), this reduces the number of dirty pages that have to be handled in a single checkpoint.
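
Combined with the previous patch, the trigger policy becomes "whichever comes first": a
proxy-reported miscompare (rate-limited to one checkpoint per CHECKPOINT_MIN_PERIOD) or
CHECKPOINT_MAX_PERIOD of silence. A small self-contained sketch of that decision; the
constants follow the patches, the helper name is made up:

#include <stdint.h>
#include <stdio.h>

#define CHECKPOINT_MIN_PERIOD_MS   100    /* rate limit, from the previous patch */
#define CHECKPOINT_MAX_PERIOD_MS 10000    /* forced checkpoint, this patch */

/*
 * Given whether the proxy reported a miscompare and how many ms have passed
 * since the last checkpoint, return how long to wait before the next
 * checkpoint, or -1 if no checkpoint is due yet.
 */
static int64_t next_checkpoint_in(int miscompare, int64_t elapsed_ms)
{
    if (miscompare) {
        return elapsed_ms < CHECKPOINT_MIN_PERIOD_MS
               ? CHECKPOINT_MIN_PERIOD_MS - elapsed_ms : 0;
    }
    return elapsed_ms >= CHECKPOINT_MAX_PERIOD_MS ? 0 : -1;
}

int main(void)
{
    printf("miscompare 30ms after last checkpoint -> wait %lld ms\n",
           (long long)next_checkpoint_in(1, 30));      /* 70 */
    printf("no miscompare, quiet for 9s           -> %lld (keep polling)\n",
           (long long)next_checkpoint_in(0, 9000));    /* -1 */
    printf("no miscompare, quiet for 10s          -> wait %lld ms\n",
           (long long)next_checkpoint_in(0, 10000));   /* 0 */
    return 0;
}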

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 migration/colo.c | 29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 9ef4554..da5bc5e 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -10,6 +10,7 @@
  * later.  See the COPYING file in the top-level directory.
  */
 
+#include "qemu/timer.h"
 #include "sysemu/sysemu.h"
 #include "migration/migration-colo.h"
 #include "qemu/error-report.h"
@@ -32,6 +33,13 @@
 */
 #define CHECKPOINT_MIN_PERIOD 100  /* unit: ms */
 
+/*
+ * force checkpoint timer: unit ms
+ * this is large because COLO checkpoint will mostly depend on
+ * COLO compare module.
+ */
+#define CHECKPOINT_MAX_PEROID 10000
+
 enum {
     COLO_READY = 0x46,
 
@@ -340,14 +348,7 @@ static void *colo_thread(void *opaque)
         proxy_checkpoint_req = colo_proxy_compare();
         if (proxy_checkpoint_req < 0) {
             goto out;
-        } else if (!proxy_checkpoint_req) {
-            /*
-             * No checkpoint is needed, wait for 1ms and then
-             * check if we need checkpoint again
-             */
-            g_usleep(1000);
-            continue;
-        } else {
+        } else if (proxy_checkpoint_req) {
             int64_t interval;
 
             current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
@@ -357,8 +358,20 @@ static void *colo_thread(void *opaque)
                 g_usleep((1000*(CHECKPOINT_MIN_PERIOD - interval)));
             }
             DPRINTF("Net packets is not consistent!!!\n");
+            goto do_checkpoint;
+        }
+
+        /*
+         * No proxy checkpoint is requested, wait for 100ms
+         * and then check if we need checkpoint again.
+         */
+        current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+        if (current_time - checkpoint_time < CHECKPOINT_MAX_PEROID) {
+            g_usleep(100000);
+            continue;
         }
 
+do_checkpoint:
         /* start a colo checkpoint */
         if (colo_do_checkpoint_transaction(s, colo_control)) {
             goto out;
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [RFC PATCH v4 24/28] COLO: Add colo-set-checkpoint-period command
  2015-03-26  5:29 [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (22 preceding siblings ...)
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 23/28] COLO: Improve checkpoint efficiency by do additional periodic checkpoint zhanghailiang
@ 2015-03-26  5:29 ` zhanghailiang
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 25/28] COLO NIC: Implement NIC checkpoint and failover zhanghailiang
                   ` (5 subsequent siblings)
  29 siblings, 0 replies; 51+ messages in thread
From: zhanghailiang @ 2015-03-26  5:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, amit.shah, david

With this command, we can control the period of the forced checkpoint that is
taken when packet comparison has not requested one.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 hmp-commands.hx        | 15 +++++++++++++++
 hmp.c                  |  7 +++++++
 hmp.h                  |  1 +
 migration/colo.c       | 11 ++++++++++-
 qapi-schema.json       | 13 +++++++++++++
 qmp-commands.hx        | 22 ++++++++++++++++++++++
 stubs/migration-colo.c |  4 ++++
 7 files changed, 72 insertions(+), 1 deletion(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index e615889..bea01a6 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1008,6 +1008,21 @@ Tell COLO that heartbeat is lost, a failover or takeover is needed.
 ETEXI
 
     {
+        .name       = "colo_set_checkpoint_period",
+        .args_type  = "value:i",
+        .params     = "value",
+        .help       = "set checkpoint period (in ms) for colo. "
+        "Defaults to 10000ms",
+        .mhandler.cmd = hmp_colo_set_checkpoint_period,
+    },
+
+STEXI
+@item colo_set_checkpoint_period @var{value}
+@findex colo_set_checkpoint_period
+Set checkpoint period to @var{value} (in ms) for colo.
+ETEXI
+
+    {
         .name       = "client_migrate_info",
         .args_type  = "protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?",
         .params     = "protocol hostname port tls-port cert-subject",
diff --git a/hmp.c b/hmp.c
index ee0e139..bd880af 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1191,6 +1191,13 @@ void hmp_colo_lost_heartbeat(Monitor *mon, const QDict *qdict)
     hmp_handle_error(mon, &err);
 }
 
+void hmp_colo_set_checkpoint_period(Monitor *mon, const QDict *qdict)
+{
+    int64_t value = qdict_get_int(qdict, "value");
+
+    qmp_colo_set_checkpoint_period(value, NULL);
+}
+
 void hmp_set_password(Monitor *mon, const QDict *qdict)
 {
     const char *protocol  = qdict_get_str(qdict, "protocol");
diff --git a/hmp.h b/hmp.h
index af85144..f7d23bb 100644
--- a/hmp.h
+++ b/hmp.h
@@ -66,6 +66,7 @@ void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_capability(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_cache_size(Monitor *mon, const QDict *qdict);
 void hmp_colo_lost_heartbeat(Monitor *mon, const QDict *qdict);
+void hmp_colo_set_checkpoint_period(Monitor *mon, const QDict *qdict);
 void hmp_set_password(Monitor *mon, const QDict *qdict);
 void hmp_expire_password(Monitor *mon, const QDict *qdict);
 void hmp_eject(Monitor *mon, const QDict *qdict);
diff --git a/migration/colo.c b/migration/colo.c
index da5bc5e..1c4b222 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -16,6 +16,7 @@
 #include "qemu/error-report.h"
 #include "migration/migration-failover.h"
 #include "net/colo-nic.h"
+#include "qmp-commands.h"
 
 #define DEBUG_COLO 0
 
@@ -78,6 +79,9 @@ enum {
 static QEMUBH *colo_bh;
 static bool vmstate_loading;
 static Coroutine *colo;
+
+int64_t colo_checkpoint_period = CHECKPOINT_MAX_PEROID;
+
 /* colo buffer */
 #define COLO_BUFFER_BASE_SIZE (1000*1000*4ULL)
 QEMUSizedBuffer *colo_buffer;
@@ -93,6 +97,11 @@ bool migrate_in_colo_state(void)
     return (s->state == MIGRATION_STATUS_COLO);
 }
 
+void qmp_colo_set_checkpoint_period(int64_t value, Error **errp)
+{
+    colo_checkpoint_period = value;
+}
+
 static bool colo_runstate_is_stopped(void)
 {
     return runstate_check(RUN_STATE_COLO) || !runstate_is_running();
@@ -366,7 +375,7 @@ static void *colo_thread(void *opaque)
          * and then check if we need checkpoint again.
          */
         current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
-        if (current_time - checkpoint_time < CHECKPOINT_MAX_PEROID) {
+        if (current_time - checkpoint_time < colo_checkpoint_period) {
             g_usleep(100000);
             continue;
         }
diff --git a/qapi-schema.json b/qapi-schema.json
index 8abc367..915a9cb 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -576,6 +576,19 @@
 { 'command': 'colo-lost-heartbeat' }
 
 ##
+# @colo-set-checkpoint-period
+#
+# Set colo checkpoint period
+#
+# @value: period of colo checkpoint in ms
+#
+# Returns: nothing on success
+#
+# Since: 2.4
+##
+{ 'command': 'colo-set-checkpoint-period', 'data': {'value': 'int'} }
+
+##
 # @MouseInfo:
 #
 # Information about a mouse device.
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 85d3d72..ab7e4a1 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -800,6 +800,28 @@ Example:
 EQMP
 
     {
+         .name       = "colo-set-checkpoint-period",
+         .args_type  = "value:i",
+         .mhandler.cmd_new = qmp_marshal_input_colo_set_checkpoint_period,
+    },
+
+SQMP
+colo-set-checkpoint-period
+--------------------------
+
+set checkpoint period
+
+Arguments:
+- "value": checkpoint period
+
+Example:
+
+-> { "execute": "colo-set-checkpoint-period", "arguments": { "value": 1000 } }
+<- { "return": {} }
+
+EQMP
+
+    {
         .name       = "client_migrate_info",
         .args_type  = "protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?",
         .params     = "protocol hostname port tls-port cert-subject",
diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
index 61fe22b..368590a 100644
--- a/stubs/migration-colo.c
+++ b/stubs/migration-colo.c
@@ -52,3 +52,7 @@ void qmp_colo_lost_heartbeat(Error **errp)
                      " with --enable-colo option in order to support"
                      " COLO feature");
 }
+
+void qmp_colo_set_checkpoint_period(int64_t value, Error **errp)
+{
+}
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [RFC PATCH v4 25/28] COLO NIC: Implement NIC checkpoint and failover
  2015-03-26  5:29 [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (23 preceding siblings ...)
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 24/28] COLO: Add colo-set-checkpoint-period command zhanghailiang
@ 2015-03-26  5:29 ` zhanghailiang
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 26/28] COLO: Disable qdev hotplug when VM is in COLO mode zhanghailiang
                   ` (4 subsequent siblings)
  29 siblings, 0 replies; 51+ messages in thread
From: zhanghailiang @ 2015-03-26  5:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, amit.shah, david

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 include/net/colo-nic.h |  3 ++-
 migration/colo.c       | 22 +++++++++++++++++++---
 net/colo-nic.c         | 19 +++++++++++++++++++
 3 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/include/net/colo-nic.h b/include/net/colo-nic.h
index 67c9807..ddc21cd 100644
--- a/include/net/colo-nic.h
+++ b/include/net/colo-nic.h
@@ -20,5 +20,6 @@ void colo_add_nic_devices(NetClientState *nc);
 void colo_remove_nic_devices(NetClientState *nc);
 
 int colo_proxy_compare(void);
-
+int colo_proxy_failover(void);
+int colo_proxy_checkpoint(void);
 #endif
diff --git a/migration/colo.c b/migration/colo.c
index 1c4b222..54ae184 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -122,6 +122,11 @@ static void slave_do_failover(void)
         ;
     }
 
+    if (colo_proxy_failover() != 0) {
+        error_report("colo proxy failed to do failover");
+    }
+    colo_proxy_destroy(COLO_SECONDARY_MODE);
+
     colo = NULL;
 
     if (!autostart) {
@@ -144,6 +149,8 @@ static void master_do_failover(void)
         vm_stop_force_state(RUN_STATE_COLO);
     }
 
+    colo_proxy_destroy(COLO_PRIMARY_MODE);
+
     if (s->state != MIGRATION_STATUS_FAILED) {
         migrate_set_state(s, MIGRATION_STATUS_COLO, MIGRATION_STATUS_COMPLETED);
     }
@@ -267,6 +274,11 @@ static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control)
 
     qemu_fflush(trans);
 
+    ret = colo_proxy_checkpoint();
+    if (ret < 0) {
+        goto out;
+    }
+
     ret = colo_ctl_put(s->file, COLO_CHECKPOINT_SEND);
     if (ret < 0) {
         goto out;
@@ -413,8 +425,6 @@ out:
     qemu_bh_schedule(s->cleanup_bh);
     qemu_mutex_unlock_iothread();
 
-    colo_proxy_destroy(COLO_PRIMARY_MODE);
-
     return NULL;
 }
 
@@ -542,6 +552,12 @@ void *colo_process_incoming_checkpoints(void *opaque)
             goto out;
         }
 
+        ret = colo_proxy_checkpoint();
+        if (ret < 0) {
+                goto out;
+        }
+        DPRINTF("proxy begin to do checkpoint\n");
+
         ret = colo_ctl_get(f, COLO_CHECKPOINT_SEND);
         if (ret < 0) {
             goto out;
@@ -618,6 +634,7 @@ out:
         * just kill slave
         */
         error_report("SVM is going to exit!");
+        colo_proxy_destroy(COLO_SECONDARY_MODE);
         exit(1);
     } else {
         /* if we went here, means master may dead, we are doing failover */
@@ -642,6 +659,5 @@ out:
 
     loadvm_exit_colo();
 
-    colo_proxy_destroy(COLO_SECONDARY_MODE);
     return NULL;
 }
diff --git a/net/colo-nic.c b/net/colo-nic.c
index 563d661..02a454d 100644
--- a/net/colo-nic.c
+++ b/net/colo-nic.c
@@ -379,6 +379,25 @@ void colo_proxy_destroy(int side)
     cp_info.index = -1;
     colo_nic_side = -1;
 }
+
+int colo_proxy_failover(void)
+{
+    if (colo_proxy_send(NULL, 0, COLO_FAILOVER) < 0) {
+        return -1;
+    }
+
+    return 0;
+}
+
+int colo_proxy_checkpoint(void)
+{
+    if (colo_proxy_send(NULL, 0, COLO_CHECKPOINT) < 0) {
+        return -1;
+    }
+
+    return 0;
+}
+
 /*
 do checkpoint: return 1
 error: return -1
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [RFC PATCH v4 26/28] COLO: Disable qdev hotplug when VM is in COLO mode
  2015-03-26  5:29 [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (24 preceding siblings ...)
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 25/28] COLO NIC: Implement NIC checkpoint and failover zhanghailiang
@ 2015-03-26  5:29 ` zhanghailiang
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 27/28] COLO: Implement shutdown checkpoint zhanghailiang
                   ` (3 subsequent siblings)
  29 siblings, 0 replies; 51+ messages in thread
From: zhanghailiang @ 2015-03-26  5:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, amit.shah, Yang Hongyang,
	david

COLO does not support qdev hotplug during migration, so disable it while the VM is in COLO mode.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 migration/colo.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index 54ae184..7d57121 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -10,6 +10,7 @@
  * later.  See the COPYING file in the top-level directory.
  */
 
+#include "hw/qdev-core.h"
 #include "qemu/timer.h"
 #include "sysemu/sysemu.h"
 #include "migration/migration-colo.h"
@@ -325,6 +326,7 @@ out:
 static void *colo_thread(void *opaque)
 {
     MigrationState *s = opaque;
+    int dev_hotplug = qdev_hotplug;
     QEMUFile *colo_control = NULL;
     int64_t current_time, checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
     int ret;
@@ -341,6 +343,8 @@ static void *colo_thread(void *opaque)
         goto out;
     }
 
+    qdev_hotplug = 0;
+
     /*
      * Wait for slave finish loading vm states and enter COLO
      * restore.
@@ -425,6 +429,8 @@ out:
     qemu_bh_schedule(s->cleanup_bh);
     qemu_mutex_unlock_iothread();
 
+    qdev_hotplug = dev_hotplug;
+
     return NULL;
 }
 
@@ -487,10 +493,13 @@ void *colo_process_incoming_checkpoints(void *opaque)
     struct colo_incoming *colo_in = opaque;
     QEMUFile *f = colo_in->file;
     int fd = qemu_get_fd(f);
+    int dev_hotplug = qdev_hotplug;
     QEMUFile *ctl = NULL, *fb = NULL;
     int ret;
     uint64_t total_size;
 
+    qdev_hotplug = 0;
+
     colo = qemu_coroutine_self();
     assert(colo != NULL);
 
@@ -659,5 +668,7 @@ out:
 
     loadvm_exit_colo();
 
+    qdev_hotplug = dev_hotplug;
+
     return NULL;
 }
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [RFC PATCH v4 27/28] COLO: Implement shutdown checkpoint
  2015-03-26  5:29 [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (25 preceding siblings ...)
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 26/28] COLO: Disable qdev hotplug when VM is in COLO mode zhanghailiang
@ 2015-03-26  5:29 ` zhanghailiang
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 28/28] COLO: Add block replication into colo process zhanghailiang
                   ` (2 subsequent siblings)
  29 siblings, 0 replies; 51+ messages in thread
From: zhanghailiang @ 2015-03-26  5:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, amit.shah, Lai Jiangshan,
	david

For the SVM, we forbid it from shutting down directly while in COLO mode.
For the PVM's shutdown, we need to do some extra work to ensure that the PVM and
SVM take consistent action.
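
The idea is that a guest-initiated shutdown on the primary is not acted on immediately: it
is latched, forwarded to the secondary at the next checkpoint, and only then do both sides
really shut down. A compact standalone sketch of that latching logic; the names here are
illustrative, while in the patch the latch is colo_shutdown_requested and the forwarding uses
the COLO_GUEST_SHUTDOWN control message:

#include <stdbool.h>
#include <stdio.h>

static bool in_colo_state;          /* set while COLO replication runs   */
static bool colo_shutdown_pending;  /* latched shutdown request          */
static bool shutdown_now;           /* what the main loop really acts on */

/* Called when the guest (or user) asks for a shutdown. */
static void shutdown_request(void)
{
    if (in_colo_state) {
        colo_shutdown_pending = true;   /* defer until the next checkpoint */
        return;
    }
    shutdown_now = true;
}

/* Called at the end of a successful checkpoint on the primary. */
static void checkpoint_done(void)
{
    if (colo_shutdown_pending) {
        /* Here the real code also sends COLO_GUEST_SHUTDOWN to the peer. */
        colo_shutdown_pending = false;
        shutdown_now = true;
    }
}

int main(void)
{
    in_colo_state = true;
    shutdown_request();
    printf("after request:    shutdown_now=%d\n", shutdown_now);   /* 0 */
    checkpoint_done();
    printf("after checkpoint: shutdown_now=%d\n", shutdown_now);   /* 1 */
    return 0;
}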

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 include/migration/migration-colo.h |  1 +
 include/sysemu/sysemu.h            |  3 +++
 migration/colo-comm.c              |  5 +++++
 migration/colo.c                   | 19 +++++++++++++++++++
 vl.c                               | 23 +++++++++++++++++++++--
 5 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h
index 7e8fe46..e8628f7 100644
--- a/include/migration/migration-colo.h
+++ b/include/migration/migration-colo.h
@@ -44,6 +44,7 @@ void loadvm_exit_colo(void);
 void *colo_process_incoming_checkpoints(void *opaque);
 bool loadvm_in_colo_state(void);
 
+bool vm_in_colo_state(void);
 int get_colo_mode(void);
 
 /* ram cache */
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 8a52934..8b37bd2 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -51,6 +51,8 @@ typedef enum WakeupReason {
     QEMU_WAKEUP_REASON_OTHER,
 } WakeupReason;
 
+extern int colo_shutdown_requested;
+
 void qemu_system_reset_request(void);
 void qemu_system_suspend_request(void);
 void qemu_register_suspend_notifier(Notifier *notifier);
@@ -58,6 +60,7 @@ void qemu_system_wakeup_request(WakeupReason reason);
 void qemu_system_wakeup_enable(WakeupReason reason, bool enabled);
 void qemu_register_wakeup_notifier(Notifier *notifier);
 void qemu_system_shutdown_request(void);
+void qemu_system_shutdown_request_core(void);
 void qemu_system_powerdown_request(void);
 void qemu_register_powerdown_notifier(Notifier *notifier);
 void qemu_system_debug_request(void);
diff --git a/migration/colo-comm.c b/migration/colo-comm.c
index c3dd617..ee92bb2 100644
--- a/migration/colo-comm.c
+++ b/migration/colo-comm.c
@@ -31,6 +31,11 @@ static void colo_info_save(QEMUFile *f, void *opaque)
 }
 
 /* restore */
+bool vm_in_colo_state(void)
+{
+    return migrate_in_colo_state() || loadvm_in_colo_state();
+}
+
 int get_colo_mode(void)
 {
     if (migrate_in_colo_state()) {
diff --git a/migration/colo.c b/migration/colo.c
index 7d57121..894bf5f 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -75,6 +75,8 @@ enum {
     COLO_CHECKPOINT_SEND,
     COLO_CHECKPOINT_RECEIVED,
     COLO_CHECKPOINT_LOADED,
+
+    COLO_GUEST_SHUTDOWN
 };
 
 static QEMUBH *colo_bh;
@@ -308,6 +310,13 @@ static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control)
     }
     DPRINTF("got COLO_CHECKPOINT_LOADED\n");
 
+    if (colo_shutdown_requested) {
+        colo_ctl_put(s->file, COLO_GUEST_SHUTDOWN);
+        qemu_fflush(s->file);
+        colo_shutdown_requested = 0;
+        qemu_system_shutdown_request_core();
+    }
+
     ret = 0;
     /* resume master */
     qemu_mutex_lock_iothread();
@@ -483,6 +492,16 @@ static int colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request)
     case COLO_CHECKPOINT_NEW:
         *checkpoint_request = 1;
         return 0;
+    case COLO_GUEST_SHUTDOWN:
+        qemu_mutex_lock_iothread();
+        qemu_system_shutdown_request_core();
+        qemu_mutex_unlock_iothread();
+        /* the main thread will exit and terminate the whole
+         * process, do we need some cleanup?
+         */
+        for (;;) {
+            ;
+        }
     default:
         return -1;
     }
diff --git a/vl.c b/vl.c
index 8c07244..b0f3237 100644
--- a/vl.c
+++ b/vl.c
@@ -1532,6 +1532,8 @@ static NotifierList wakeup_notifiers =
     NOTIFIER_LIST_INITIALIZER(wakeup_notifiers);
 static uint32_t wakeup_reason_mask = ~(1 << QEMU_WAKEUP_REASON_NONE);
 
+int colo_shutdown_requested;
+
 int qemu_shutdown_requested_get(void)
 {
     return shutdown_requested;
@@ -1648,6 +1650,10 @@ void qemu_system_reset(bool report)
 void qemu_system_reset_request(void)
 {
     if (no_reboot) {
+        if (vm_in_colo_state()) {
+            colo_shutdown_requested = 1;
+            return;
+        }
         shutdown_requested = 1;
     } else {
         reset_requested = 1;
@@ -1716,13 +1722,26 @@ void qemu_system_killed(int signal, pid_t pid)
     qemu_system_shutdown_request();
 }
 
-void qemu_system_shutdown_request(void)
+void qemu_system_shutdown_request_core(void)
 {
-    trace_qemu_system_shutdown_request();
     shutdown_requested = 1;
     qemu_notify_event();
 }
 
+void qemu_system_shutdown_request(void)
+{
+    trace_qemu_system_shutdown_request();
+    /*
+    * if in colo mode, we need to do some additional work before responding
+    * to the shutdown request.
+    */
+    if (vm_in_colo_state()) {
+        colo_shutdown_requested = 1;
+        return;
+    }
+    qemu_system_shutdown_request_core();
+}
+
 static void qemu_system_powerdown(void)
 {
     qapi_event_send_powerdown(&error_abort);
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [RFC PATCH v4 28/28] COLO: Add block replication into colo process
  2015-03-26  5:29 [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (26 preceding siblings ...)
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 27/28] COLO: Implement shutdown checkpoint zhanghailiang
@ 2015-03-26  5:29 ` zhanghailiang
  2015-04-08  8:16 ` [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
  2015-05-14 12:14 ` Dr. David Alan Gilbert
  29 siblings, 0 replies; 51+ messages in thread
From: zhanghailiang @ 2015-03-26  5:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, amit.shah, Yang Hongyang,
	david

From: Wen Congyang <wency@cn.fujitsu.com>

Make sure the master starts block replication only after the slave's block replication has started.
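
The ordering is enforced purely by the checkpoint control channel: the secondary enables its
replication driver first and only then sends COLO_READY, and the primary does not enable its
side until that message arrives. A schematic standalone sketch of that ordering; COLO_READY
keeps the patch's value, everything else is a simplified stand-in for the real control channel
and blk_start_replication() calls:

#include <stdbool.h>
#include <stdio.h>

enum { COLO_READY = 0x46 };            /* same value as the patch's enum */

static bool secondary_repl_started;
static bool primary_repl_started;

/* Secondary: enable its replication driver first, then announce readiness. */
static int secondary_startup(void)
{
    secondary_repl_started = true;     /* blk_start_replication(false, ...) */
    printf("secondary: replication started, sending COLO_READY\n");
    return COLO_READY;                 /* colo_ctl_put(ctl, COLO_READY)     */
}

/* Primary: refuse to start until the secondary has said it is ready. */
static int primary_startup(int msg)
{
    if (msg != COLO_READY) {
        return -1;
    }
    primary_repl_started = true;       /* blk_start_replication(true, ...)  */
    printf("primary: got COLO_READY, replication started\n");
    return 0;
}

int main(void)
{
    return primary_startup(secondary_startup()) == 0 &&
           secondary_repl_started && primary_repl_started ? 0 : 1;
}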

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 migration/colo.c | 120 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 118 insertions(+), 2 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 894bf5f..d70f80b 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -18,6 +18,8 @@
 #include "migration/migration-failover.h"
 #include "net/colo-nic.h"
 #include "qmp-commands.h"
+#include "block/block.h"
+#include "sysemu/block-backend.h"
 
 #define DEBUG_COLO 0
 
@@ -110,6 +112,68 @@ static bool colo_runstate_is_stopped(void)
     return runstate_check(RUN_STATE_COLO) || !runstate_is_running();
 }
 
+static void blk_start_replication(bool primary, Error **errp)
+{
+    int mode = primary ? COLO_MODE_PRIMARY : COLO_MODE_SECONDARY;
+    BlockBackend *blk, *temp;
+    Error *local_err = NULL;
+
+    for (blk = blk_next(NULL); blk; blk = blk_next(blk)) {
+        if (blk_is_read_only(blk) || !blk_is_inserted(blk)) {
+            continue;
+        }
+
+        bdrv_start_replication(blk_bs(blk), mode, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            goto fail;
+        }
+    }
+
+    return;
+
+fail:
+    for (temp = blk_next(NULL); temp != blk; temp = blk_next(temp)) {
+        bdrv_stop_replication(blk_bs(temp), NULL);
+    }
+}
+
+static void blk_do_checkpoint(Error **errp)
+{
+    BlockBackend *blk;
+    Error *local_err = NULL;
+
+    for (blk = blk_next(NULL); blk; blk = blk_next(blk)) {
+        if (blk_is_read_only(blk) || !blk_is_inserted(blk)) {
+            continue;
+        }
+
+        bdrv_do_checkpoint(blk_bs(blk), &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
+    }
+}
+
+static void blk_stop_replication(Error **errp)
+{
+    BlockBackend *blk;
+    Error *local_err = NULL;
+
+    for (blk = blk_next(NULL); blk; blk = blk_next(blk)) {
+        if (blk_is_read_only(blk) || !blk_is_inserted(blk)) {
+            continue;
+        }
+
+        bdrv_stop_replication(blk_bs(blk), &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
+    }
+}
+
 /*
  * there are two way to entry this function
  * 1. From colo checkpoint incoming thread, in this case
@@ -120,6 +184,8 @@ static bool colo_runstate_is_stopped(void)
  */
 static void slave_do_failover(void)
 {
+    Error *local_err = NULL;
+
     /* Wait for incoming thread loading vmstate */
     while (vmstate_loading) {
         ;
@@ -129,6 +195,10 @@ static void slave_do_failover(void)
         error_report("colo proxy failed to do failover");
     }
     colo_proxy_destroy(COLO_SECONDARY_MODE);
+    blk_stop_replication(&local_err);
+    if (local_err) {
+        error_report_err(local_err);
+    }
 
     colo = NULL;
 
@@ -147,6 +217,7 @@ static void slave_do_failover(void)
 static void master_do_failover(void)
 {
     MigrationState *s = migrate_get_current();
+    Error *local_err = NULL;
 
     if (!colo_runstate_is_stopped()) {
         vm_stop_force_state(RUN_STATE_COLO);
@@ -158,6 +229,11 @@ static void master_do_failover(void)
         migrate_set_state(s, MIGRATION_STATUS_COLO, MIGRATION_STATUS_COMPLETED);
     }
 
+    blk_stop_replication(&local_err);
+    if (local_err) {
+        error_report_err(local_err);
+    }
+
     vm_start();
 }
 
@@ -231,6 +307,7 @@ static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control)
     int ret;
     size_t size;
     QEMUFile *trans = NULL;
+    Error *local_err = NULL;
 
     ret = colo_ctl_put(s->file, COLO_CHECKPOINT_NEW);
     if (ret < 0) {
@@ -282,6 +359,16 @@ static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control)
         goto out;
     }
 
+    /* we call this api although this may do nothing on primary side */
+    qemu_mutex_lock_iothread();
+    blk_do_checkpoint(&local_err);
+    qemu_mutex_unlock_iothread();
+    if (local_err) {
+        error_report_err(local_err);
+        ret = -1;
+        goto out;
+    }
+
     ret = colo_ctl_put(s->file, COLO_CHECKPOINT_SEND);
     if (ret < 0) {
         goto out;
@@ -339,6 +426,7 @@ static void *colo_thread(void *opaque)
     QEMUFile *colo_control = NULL;
     int64_t current_time, checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
     int ret;
+    Error *local_err = NULL;
 
     if (colo_proxy_init(COLO_PRIMARY_MODE) != 0) {
         error_report("Init colo proxy error");
@@ -370,6 +458,12 @@ static void *colo_thread(void *opaque)
         goto out;
     }
 
+    /* start block replication */
+    blk_start_replication(true, &local_err);
+    if (local_err) {
+        goto out;
+    }
+
     qemu_mutex_lock_iothread();
     vm_start();
     qemu_mutex_unlock_iothread();
@@ -414,7 +508,11 @@ do_checkpoint:
     }
 
 out:
-    error_report("colo: some error happens in colo_thread");
+    if (local_err) {
+        error_report_err(local_err);
+    } else {
+        error_report("colo: some error happens in colo_thread");
+    }
     qemu_mutex_lock_iothread();
     if (!failover_request_is_set()) {
         error_report("master takeover from checkpoint channel");
@@ -516,6 +614,7 @@ void *colo_process_incoming_checkpoints(void *opaque)
     QEMUFile *ctl = NULL, *fb = NULL;
     int ret;
     uint64_t total_size;
+    Error *local_err = NULL;
 
     qdev_hotplug = 0;
 
@@ -543,6 +642,13 @@ void *colo_process_incoming_checkpoints(void *opaque)
         goto out;
     }
 
+    /* start block replication */
+    blk_start_replication(false, &local_err);
+    if (local_err) {
+        goto out;
+    }
+    DPRINTF("finish block replication\n");
+
     ret = colo_ctl_put(ctl, COLO_READY);
     if (ret < 0) {
         goto out;
@@ -627,7 +733,13 @@ void *colo_process_incoming_checkpoints(void *opaque)
         }
         DPRINTF("Finish load all vm state to cache\n");
         vmstate_loading = false;
+
+        /* discard colo disk buffer */
+        blk_do_checkpoint(&local_err);
         qemu_mutex_unlock_iothread();
+        if (local_err) {
+            goto out;
+        }
 
         ret = colo_ctl_put(ctl, COLO_CHECKPOINT_LOADED);
         if (ret < 0) {
@@ -645,7 +757,11 @@ void *colo_process_incoming_checkpoints(void *opaque)
     }
 
 out:
-    error_report("Detect some error or get a failover request");
+    if (local_err) {
+        error_report_err(local_err);
+    } else {
+        error_report("Detect some error or get a failover request");
+    }
     /* determine whether we need to failover */
     if (!failover_request_is_set()) {
         /*
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service
  2015-03-26  5:29 [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (27 preceding siblings ...)
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 28/28] COLO: Add block replication into colo process zhanghailiang
@ 2015-04-08  8:16 ` zhanghailiang
  2015-04-22 11:18   ` Dr. David Alan Gilbert
  2015-05-14 12:14 ` Dr. David Alan Gilbert
  29 siblings, 1 reply; 51+ messages in thread
From: zhanghailiang @ 2015-04-08  8:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, dgilbert,
	peter huangpeng, arei.gonglei, amit.shah, david

Hi,

ping ...

The main blocking bugs for COLO have been solved,
and we have also finished some new features and optimizations for COLO. (If you are interested in these,
we can send them to you in private ;))

For ease of review, it is better to keep things simple for now, so we will not add too much new code into this framework
patch set before it has been fully reviewed.

COLO is a totally new feature which is still at an early stage. We hope to speed up its development,
so your comments and feedback are warmly welcome. :)

Thanks,
zhanghailiang

On 2015/3/26 13:29, zhanghailiang wrote:
> [...]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service
  2015-04-08  8:16 ` [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
@ 2015-04-22 11:18   ` Dr. David Alan Gilbert
  2015-04-24  7:25     ` Wen Congyang
  2015-04-24  8:52     ` zhanghailiang
  0 siblings, 2 replies; 51+ messages in thread
From: Dr. David Alan Gilbert @ 2015-04-22 11:18 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, qemu-devel,
	peter huangpeng, arei.gonglei, amit.shah, david

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> Hi,
> 
> ping ...

I will get to look at this again; but not until after next week.

> The main blocked bugs for COLO have been solved,

I've got the v3 set running, but the biggest problem I hit are problems
with the packet comparison module; I've seen a panic which I think is
in colo_send_checkpoint_req that I think is due to the use of
GFP_KERNEL to allocate the netlink message and I think it can schedule
there.  I tried making that a GFP_ATOMIC  but I'm hitting other
problems with :

kcolo_thread, no conn, schedule out

that I've not had time to look into yet.

So I only get about a 50% success rate of starting COLO.
I see there are stuff in the TODO of the colo-proxy that
seem to say the netlink stuff should change, maybe you're already fixing
that?

> we also have finished some new features and optimization on COLO. (If you are interested in this,
> we can send them to you in private ;))

> For easy of review, it is better to keep it simple now, so we will not add too much new codes into this frame
> patch set before it been totally reviewed.

I'd like to see those; but I don't want to take code privately.
It's OK to post extra stuff as a separate set.

> COLO is a totally new feature which is still in early stage, we hope to speed up the development,
> so your comments and feedback are warmly welcomed. :)

Yes, it's getting there though; I don't think anyone else has
got this close to getting a full FT set working with disk and networking.

Dave

> 
> Thanks,
> zhanghailiang
> 
> [...]
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service
  2015-04-22 11:18   ` Dr. David Alan Gilbert
@ 2015-04-24  7:25     ` Wen Congyang
  2015-04-24  8:35       ` Dr. David Alan Gilbert
  2015-04-24  8:52     ` zhanghailiang
  1 sibling, 1 reply; 51+ messages in thread
From: Wen Congyang @ 2015-04-24  7:25 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, qemu-devel,
	peter huangpeng, arei.gonglei, amit.shah, david

On 04/22/2015 07:18 PM, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> Hi,
>>
>> ping ...
> 
> I will get to look at this again; but not until after next week.
> 
>> The main blocked bugs for COLO have been solved,
> 
> I've got the v3 set running, but the biggest problem I hit are problems
> with the packet comparison module; I've seen a panic which I think is
> in colo_send_checkpoint_req that I think is due to the use of
> GFP_KERNEL to allocate the netlink message and I think it can schedule
> there.  I tried making that a GFP_ATOMIC  but I'm hitting other
> problems with :

Thanks for your test.
I guess the backtrace looks something like:
1. colo_send_checkpoint_req()
2. colo_setup_checkpoint_by_id()

Because we hold the RCU read lock there, we cannot use GFP_KERNEL to allocate memory.
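
As a rough illustration only (a hypothetical helper, not the actual xt_PMYCOLO
code): a GFP_KERNEL allocation may sleep, and sleeping is not allowed inside an
RCU read-side critical section, which is why such an allocation can blow up
there.

#include <linux/slab.h>
#include <linux/rcupdate.h>

static void *alloc_inside_rcu(size_t len)
{
        void *buf;

        rcu_read_lock();
        /* buf = kmalloc(len, GFP_KERNEL);    BAD: may schedule here */
        buf = kmalloc(len, GFP_ATOMIC);       /* OK: never sleeps */
        rcu_read_unlock();
        return buf;
}

static void *alloc_outside_rcu(size_t len)
{
        rcu_read_lock();
        /* ... look up whatever is needed under the read lock ... */
        rcu_read_unlock();

        /* Outside the critical section a sleeping allocation is fine. */
        return kmalloc(len, GFP_KERNEL);
}

So either the allocation has to become GFP_ATOMIC, or the allocation (and the
netlink send) has to be moved outside the RCU read-side critical section.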

> 
> kcolo_thread, no conn, schedule out

Hmm, how to reproduce it? In my test, I only focus on block replication, and
I don't use the network.

> 
> that I've not had time to look into yet.
> 
> So I only get about a 50% success rate of starting COLO.
> I see there are stuff in the TODO of the colo-proxy that
> seem to say the netlink stuff should change, maybe you're already fixing
> that?

Do you mean you get about a 50% success rate if you use the network?


Thanks
Wen Congyang

> 
>> we also have finished some new features and optimization on COLO. (If you are interested in this,
>> we can send them to you in private ;))
> 
>> For easy of review, it is better to keep it simple now, so we will not add too much new codes into this frame
>> patch set before it been totally reviewed.
> 
> I'd like to see those; but I don't want to take code privately.
> It's OK to post extra stuff as a separate set.
> 
>> COLO is a totally new feature which is still in early stage, we hope to speed up the development,
>> so your comments and feedback are warmly welcomed. :)
> 
> Yes, it's getting there though; I don't think anyone else has
> got this close to getting a full FT set working with disk and networking.
> 
> Dave
> 
>>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service
  2015-04-24  7:25     ` Wen Congyang
@ 2015-04-24  8:35       ` Dr. David Alan Gilbert
  2015-04-28 10:51         ` zhanghailiang
  0 siblings, 1 reply; 51+ messages in thread
From: Dr. David Alan Gilbert @ 2015-04-24  8:35 UTC (permalink / raw)
  To: Wen Congyang
  Cc: zhanghailiang, lizhijian, quintela, yunhong.jiang, eddie.dong,
	qemu-devel, peter huangpeng, arei.gonglei, amit.shah, david

* Wen Congyang (wency@cn.fujitsu.com) wrote:
> On 04/22/2015 07:18 PM, Dr. David Alan Gilbert wrote:
> > * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> >> Hi,
> >>
> >> ping ...
> > 
> > I will get to look at this again; but not until after next week.
> > 
> >> The main blocked bugs for COLO have been solved,
> > 
> > I've got the v3 set running, but the biggest problem I hit are problems
> > with the packet comparison module; I've seen a panic which I think is
> > in colo_send_checkpoint_req that I think is due to the use of
> > GFP_KERNEL to allocate the netlink message and I think it can schedule
> > there.  I tried making that a GFP_ATOMIC  but I'm hitting other
> > problems with :
> 
> Thanks for your test.
> I guest the backtrace should like:
> 1. colo_send_checkpoint_req()
> 2. colo_setup_checkpoint_by_id()
> 
> Because we hold rcu read lock, so we cannot use GFP_KERNEL to malloc memory.

See the backtrace below.

> > kcolo_thread, no conn, schedule out
> 
> Hmm, how to reproduce it? In my test, I only focus on block replication, and
> I don't use the network.
> 
> > 
> > that I've not had time to look into yet.
> > 
> > So I only get about a 50% success rate of starting COLO.
> > I see there are stuff in the TODO of the colo-proxy that
> > seem to say the netlink stuff should change, maybe you're already fixing
> > that?
> 
> Do you mean you get about a 50% success rate if you use the network?

I always run with the network configured; but the 'kcolo_thread, no conn' bug
will hit very early; so I don't see any output on the primary or secondary
after the migrate -d is issued on the primary.  On the primary in the dmesg
I see:
[  736.607043] ip_tables: (C) 2000-2006 Netfilter Core Team
[  736.615268] kcolo_thread, no conn, schedule out, chk 0
[  736.619442] ip6_tables: (C) 2000-2006 Netfilter Core Team
[  736.718273] arp_tables: (C) 2002 David S. Miller

I've not had a chance to look further at that yet.

Here is the backtrace from the 1st bug.

Dave (I'm on holiday next week; I probably won't respond to many mails)

[ 9087.833228] BUG: scheduling while atomic: swapper/1/0/0x10000100
[ 9087.833271] Modules linked in: ip6table_mangle ip6_tables xt_physdev iptable_mangle xt_PMYCOLO(OF) nf_conntrack_i
pv4 nf_defrag_ipv4 xt_mark nf_conntrack_colo(OF) nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack iptable_filter ip_tab
les arptable_filter arp_tables act_mirred cls_u32 sch_prio tun bridge stp llc sg kvm_intel kvm snd_hda_codec_generic
 cirrus snd_hda_intel crct10dif_pclmul snd_hda_codec crct10dif_common snd_hwdep syscopyarea snd_seq crc32_pclmul crc
32c_intel sysfillrect ghash_clmulni_intel snd_seq_device aesni_intel lrw sysimgblt gf128mul ttm drm_kms_helper snd_p
cm snd_page_alloc snd_timer snd soundcore glue_helper i2c_piix4 ablk_helper drm cryptd virtio_console i2c_core virti
o_balloon serio_raw mperf pcspkr nfsd auth_rpcgss nfs_acl lockd uinput sunrpc xfs libcrc32c sr_mod cdrom ata_generic
[ 9087.833572]  pata_acpi virtio_net virtio_blk ata_piix e1000 virtio_pci libata virtio_ring floppy virtio dm_mirror
 dm_region_hash dm_log dm_mod [last unloaded: ip_tables]
[ 9087.833616] CPU: 1 PID: 0 Comm: swapper/1 Tainted: GF          O--------------   3.10.0-123.20.1.el7.dgilbertcolo
.x86_64 #1
[ 9087.833623] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 9087.833630]  ffff880813de8000 7b4d45d276068aee ffff88083fc23980 ffffffff815e2b0c
[ 9087.833640]  ffff88083fc23990 ffffffff815dca9f ffff88083fc239f0 ffffffff815e827b
[ 9087.833648]  ffff880813de9fd8 00000000000135c0 ffff880813de9fd8 00000000000135c0
[ 9087.833657] Call Trace:
[ 9087.833664]  <IRQ>  [<ffffffff815e2b0c>] dump_stack+0x19/0x1b
[ 9087.833680]  [<ffffffff815dca9f>] __schedule_bug+0x4d/0x5b
[ 9087.833688]  [<ffffffff815e827b>] __schedule+0x78b/0x790
[ 9087.833699]  [<ffffffff81094fb6>] __cond_resched+0x26/0x30
[ 9087.833707]  [<ffffffff815e86aa>] _cond_resched+0x3a/0x50
[ 9087.833716]  [<ffffffff81193908>] kmem_cache_alloc_node+0x38/0x200
[ 9087.833752]  [<ffffffffa046b770>] ? nf_conntrack_find_get+0x30/0x40 [nf_conntrack]
[ 9087.833761]  [<ffffffff814c115d>] ? __alloc_skb+0x5d/0x2d0
[ 9087.833768]  [<ffffffff814c115d>] __alloc_skb+0x5d/0x2d0
[ 9087.833777]  [<ffffffff814fb972>] ? netlink_lookup+0x32/0xf0
[ 9087.833786]  [<ffffffff8153b7d0>] ? arp_req_set+0x270/0x270
[ 9087.833794]  [<ffffffff814fbc3b>] netlink_alloc_skb+0x6b/0x1e0
[ 9087.833801]  [<ffffffff8153b7d0>] ? arp_req_set+0x270/0x270
[ 9087.833816]  [<ffffffffa04a462b>] colo_send_checkpoint_req+0x2b/0x80 [xt_PMYCOLO]
[ 9087.833823]  [<ffffffff8153b7d0>] ? arp_req_set+0x270/0x270
[ 9087.833832]  [<ffffffffa04a4dd9>] colo_slaver_arp_hook+0x79/0xa0 [xt_PMYCOLO]
[ 9087.833850]  [<ffffffffa05fc02f>] ? arptable_filter_hook+0x2f/0x40 [arptable_filter]
[ 9087.833858]  [<ffffffff81500c5a>] nf_iterate+0xaa/0xc0
[ 9087.833866]  [<ffffffff8153b7d0>] ? arp_req_set+0x270/0x270
[ 9087.833874]  [<ffffffff81500cf4>] nf_hook_slow+0x84/0x140
[ 9087.833882]  [<ffffffff8153b7d0>] ? arp_req_set+0x270/0x270
[ 9087.833890]  [<ffffffff8153bf60>] arp_rcv+0x120/0x160
[ 9087.833906]  [<ffffffff814d0596>] __netif_receive_skb_core+0x676/0x870
[ 9087.833914]  [<ffffffff814d07a8>] __netif_receive_skb+0x18/0x60
[ 9087.833922]  [<ffffffff814d0830>] netif_receive_skb+0x40/0xd0
[ 9087.833930]  [<ffffffff814d1290>] napi_gro_receive+0x80/0xb0
[ 9087.833959]  [<ffffffffa00e34a0>] e1000_clean_rx_irq+0x2b0/0x580 [e1000]
[ 9087.833970]  [<ffffffffa00e5985>] e1000_clean+0x265/0x8e0 [e1000]
[ 9087.833979]  [<ffffffff8109506d>] ? ttwu_do_activate.constprop.85+0x5d/0x70
[ 9087.833988]  [<ffffffff814d0bfa>] net_rx_action+0x15a/0x250
[ 9087.833997]  [<ffffffff81067047>] __do_softirq+0xf7/0x290
[ 9087.834006]  [<ffffffff815f4b5c>] call_softirq+0x1c/0x30
[ 9087.834011]  [<ffffffff81014cf5>] do_softirq+0x55/0x90
[ 9087.834011]  [<ffffffff810673e5>] irq_exit+0x115/0x120
[ 9087.834011]  [<ffffffff815f5458>] do_IRQ+0x58/0xf0
[ 9087.834011]  [<ffffffff815ea5ad>] common_interrupt+0x6d/0x6d
[ 9087.834011]  <EOI>  [<ffffffff81046346>] ? native_safe_halt+0x6/0x10
[ 9087.834011]  [<ffffffff8101b39f>] default_idle+0x1f/0xc0
[ 9087.834011]  [<ffffffff8101bc96>] arch_cpu_idle+0x26/0x30
[ 9087.834011]  [<ffffffff810b47e5>] cpu_startup_entry+0xf5/0x290
[ 9087.834011]  [<ffffffff815d0a6e>] start_secondary+0x1c4/0x1da
[ 9087.837189] ------------[ cut here ]------------
[ 9087.837189] kernel BUG at net/core/dev.c:4130!

> 
> 
> Thanks
> Wen Congyang
> 
> > 
> >> we also have finished some new features and optimization on COLO. (If you are interested in this,
> >> we can send them to you in private ;))
> > 
> >> For easy of review, it is better to keep it simple now, so we will not add too much new codes into this frame
> >> patch set before it been totally reviewed.
> > 
> > I'd like to see those; but I don't want to take code privately.
> > It's OK to post extra stuff as a separate set.
> > 
> >> COLO is a totally new feature which is still in early stage, we hope to speed up the development,
> >> so your comments and feedback are warmly welcomed. :)
> > 
> > Yes, it's getting there though; I don't think anyone else has
> > got this close to getting a full FT set working with disk and networking.
> > 
> > Dave
> > 
> >>
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service
  2015-04-22 11:18   ` Dr. David Alan Gilbert
  2015-04-24  7:25     ` Wen Congyang
@ 2015-04-24  8:52     ` zhanghailiang
  2015-04-24  8:56       ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 51+ messages in thread
From: zhanghailiang @ 2015-04-24  8:52 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, amit.shah, david

On 2015/4/22 19:18, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> Hi,
>>
>> ping ...
>
> I will get to look at this again; but not until after next week.
>

OK, thanks for your reply. :)

>> The main blocked bugs for COLO have been solved,
>
> I've got the v3 set running, but the biggest problem I hit are problems
> with the packet comparison module; I've seen a panic which I think is

What's the panic log?

> in colo_send_checkpoint_req that I think is due to the use of
> GFP_KERNEL to allocate the netlink message and I think it can schedule
> there.  I tried making that a GFP_ATOMIC  but I'm hitting other
> problems with :
>
> kcolo_thread, no conn, schedule out
>

Er, it is OK to get this message if you have debugging enabled:
if there is no network connection to the VM, or a checkpoint request is in progress,
there is no need to compare any network packets, so we just schedule out the kcolo_thread.
Is it just this message being printed, or are there some other problems?
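
For reference, the wait/schedule loop being described is roughly of this shape
(hypothetical names, not the actual kcolo_thread code):

#include <linux/kthread.h>
#include <linux/wait.h>

static DECLARE_WAIT_QUEUE_HEAD(colo_conn_wq);   /* illustrative only */
static bool colo_have_conn;   /* set when there are packets to compare */

static int kcolo_thread_sketch(void *arg)
{
        while (!kthread_should_stop()) {
                if (!colo_have_conn) {
                        /* "no conn, schedule out": sleep until woken up */
                        wait_event_interruptible(colo_conn_wq,
                                                 colo_have_conn ||
                                                 kthread_should_stop());
                        continue;
                }
                /* ... compare queued primary/secondary packets ... */
        }
        return 0;
}

The proxy side would set the flag and wake the wait queue when a packet arrives
for comparison, so seeing the "schedule out" message by itself is harmless.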

> that I've not had time to look into yet.
>
> So I only get about a 50% success rate of starting COLO.

This is really strange. Yes, sometimes we come across problems like kernel panics in our tests,
but not so often. Can you describe the problems in detail?

> I see there are stuff in the TODO of the colo-proxy that
> seem to say the netlink stuff should change, maybe you're already fixing
> that?
>

Yes, we are trying to replace the current netlink code in COLO with the nfnetlink interface.
We hope to merge that code in the next version.

>> we also have finished some new features and optimization on COLO. (If you are interested in this,
>> we can send them to you in private ;))
>
>> For easy of review, it is better to keep it simple now, so we will not add too much new codes into this frame
>> patch set before it been totally reviewed.
>
> I'd like to see those; but I don't want to take code privately.
> It's OK to post extra stuff as a separate set.
>

Hmm, that is really a good idea; maybe we should also add a branch
with all the optimizations and new features on github.

>> COLO is a totally new feature which is still in early stage, we hope to speed up the development,
>> so your comments and feedback are warmly welcomed. :)
>
> Yes, it's getting there though; I don't think anyone else has
> got this close to getting a full FT set working with disk and networking.
>

Thanks,
zhanghailiang

>> [...]
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service
  2015-04-24  8:52     ` zhanghailiang
@ 2015-04-24  8:56       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 51+ messages in thread
From: Dr. David Alan Gilbert @ 2015-04-24  8:56 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, amit.shah, david

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> On 2015/4/22 19:18, Dr. David Alan Gilbert wrote:
> >* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> >>Hi,
> >>
> >>ping ...
> >
> >I will get to look at this again; but not until after next week.
> >
> 
> OK, thanks for your reply. :)
> 
> >>The main blocked bugs for COLO have been solved,
> >
> >I've got the v3 set running, but the biggest problem I hit are problems
> >with the packet comparison module; I've seen a panic which I think is
> 
> What's the panic log?

See my reply to Wen I just sent.

> >in colo_send_checkpoint_req that I think is due to the use of
> >GFP_KERNEL to allocate the netlink message and I think it can schedule
> >there.  I tried making that a GFP_ATOMIC  but I'm hitting other
> >problems with :
> >
> >kcolo_thread, no conn, schedule out
> >
> 
> Er, it is OK to get this messages if you enable the debug,
> if there is no net connect to VM, or there is a checkpoint request happening,
> it is no need to compare any network packets. So we just schedule out the kcolo_thread.
> Is it just this messages been printed ? Or maybe some other problems ?

The problem is that the primary stops at that point; I've not looked into
why yet.

> >that I've not had time to look into yet.
> >
> >So I only get about a 50% success rate of starting COLO.
> 
> This is really strange, yes, sometimes we can come across problems like kernel panic in our tests,
> but not so often. Can you describe the problems in detail ?
> 
> >I see there are stuff in the TODO of the colo-proxy that
> >seem to say the netlink stuff should change, maybe you're already fixing
> >that?
> >
> 
> Yes, we are trying to replace the  current netlink in COLO with nfnetlink interface.
> Hope to merge the code in next version.

Good.

> >>we also have finished some new features and optimization on COLO. (If you are interested in this,
> >>we can send them to you in private ;))
> >
> >>For easy of review, it is better to keep it simple now, so we will not add too much new codes into this frame
> >>patch set before it been totally reviewed.
> >
> >I'd like to see those; but I don't want to take code privately.
> >It's OK to post extra stuff as a separate set.
> >
> 
> Hmm, there is really a good idea, maybe we should also add a branch
> with all the optimization and new features in github.

Yes, that would be good.

Dave

> >>COLO is a totally new feature which is still in early stage, we hope to speed up the development,
> >>so your comments and feedback are warmly welcomed. :)
> >
> >Yes, it's getting there though; I don't think anyone else has
> >got this close to getting a full FT set working with disk and networking.
> >
> 
> Thanks,
> zhanghailiang
> 
> >> [...]
> >--
> >Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
> >.
> >
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service
  2015-04-24  8:35       ` Dr. David Alan Gilbert
@ 2015-04-28 10:51         ` zhanghailiang
  2015-05-06 17:11           ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 51+ messages in thread
From: zhanghailiang @ 2015-04-28 10:51 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Wen Congyang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, amit.shah, david

On 2015/4/24 16:35, Dr. David Alan Gilbert wrote:
> * Wen Congyang (wency@cn.fujitsu.com) wrote:
>> On 04/22/2015 07:18 PM, Dr. David Alan Gilbert wrote:
>>> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>>>> Hi,
>>>>
>>>> ping ...
>>>
>>> I will get to look at this again; but not until after next week.
>>>
>>>> The main blocked bugs for COLO have been solved,
>>>
>>> I've got the v3 set running, but the biggest problem I hit are problems
>>> with the packet comparison module; I've seen a panic which I think is
>>> in colo_send_checkpoint_req that I think is due to the use of
>>> GFP_KERNEL to allocate the netlink message and I think it can schedule
>>> there.  I tried making that a GFP_ATOMIC  but I'm hitting other
>>> problems with :
>>
>> Thanks for your test.
>> I guest the backtrace should like:
>> 1. colo_send_checkpoint_req()
>> 2. colo_setup_checkpoint_by_id()
>>
>> Because we hold rcu read lock, so we cannot use GFP_KERNEL to malloc memory.
>
> See the backtrace below.
>
>>> kcolo_thread, no conn, schedule out
>>
>> Hmm, how to reproduce it? In my test, I only focus on block replication, and
>> I don't use the network.
>>
>>>
>>> that I've not had time to look into yet.
>>>
>>> So I only get about a 50% success rate of starting COLO.
>>> I see there are stuff in the TODO of the colo-proxy that
>>> seem to say the netlink stuff should change, maybe you're already fixing
>>> that?
>>
>> Do you mean you get about a 50% success rate if you use the network?
>
> I always run with the network configured; but the 'kcolo_thread, no conn' bug
> will hit very early; so I don't see any output on the primary or secondary
> after the migrate -d is issued on the primary.  On the primary in the dmesg
> I see:
> [  736.607043] ip_tables: (C) 2000-2006 Netfilter Core Team
> [  736.615268] kcolo_thread, no conn, schedule out, chk 0
> [  736.619442] ip6_tables: (C) 2000-2006 Netfilter Core Team
> [  736.718273] arp_tables: (C) 2002 David S. Miller
>
> I've not had a chance to look further at that yet.
>
> Here is the backtrace from the 1st bug.
>
> Dave (I'm on holiday next week; I probably won't respond to many mails)
>
> [ 9087.833228] BUG: scheduling while atomic: swapper/1/0/0x10000100
> [...]
>

Hi Dave,

This seems to be a deadlock bug: we call some functions that can schedule
while we are between rcu_read_lock() and rcu_read_unlock(). There are two such places: one is netlink_alloc_skb() with the GFP_KERNEL flag,
and the other is netlink_unicast() (it can also schedule in some special cases).

Please test with the follow modification. ;)

diff --git a/xt_PMYCOLO.c b/xt_PMYCOLO.c
index a8cf1a1..d8a6eab 100644
--- a/xt_PMYCOLO.c
+++ b/xt_PMYCOLO.c
@@ -1360,6 +1360,7 @@ static void colo_setup_checkpoint_by_id(u32 id) {
         if (node) {
                 pr_dbg("mark %d, find colo_primary %p, setup checkpoint\n",
                         id, node);
+               rcu_read_unlock();
                 colo_send_checkpoint_req(&node->u.p);
         }
         rcu_read_unlock();
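
If that is not enough, the GFP_ATOMIC route you mentioned trying should also work for
this allocation: keep the request inside the RCU read-side section and simply avoid
sleeping there. A minimal sketch of that shape (the node type, the lookup helper and the
send helper below are illustrative placeholders, not the real xt_PMYCOLO code):

static void colo_setup_checkpoint_by_id_sketch(u32 id)
{
        struct colo_node *node;        /* placeholder type */
        struct sk_buff *skb;

        rcu_read_lock();
        node = colo_node_find(id);     /* placeholder lookup */
        if (node) {
                /*
                 * GFP_ATOMIC may fail under memory pressure, but it never
                 * schedules while the RCU read lock is held.
                 */
                skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC);
                if (skb)
                        colo_send_checkpoint_skb(node, skb);  /* placeholder */
        }
        rcu_read_unlock();
}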


Thanks,
zhanghailiang

>>
>>
>> Thanks
>> Wen Congyang
>>
>>>
>>>> we also have finished some new features and optimization on COLO. (If you are interested in this,
>>>> we can send them to you in private ;))
>>>
>>>> For easy of review, it is better to keep it simple now, so we will not add too much new codes into this frame
>>>> patch set before it been totally reviewed.
>>>
>>> I'd like to see those; but I don't want to take code privately.
>>> It's OK to post extra stuff as a separate set.
>>>
>>>> COLO is a totally new feature which is still in early stage, we hope to speed up the development,
>>>> so your comments and feedback are warmly welcomed. :)
>>>
>>> Yes, it's getting there though; I don't think anyone else has
>>> got this close to getting a full FT set working with disk and networking.
>>>
>>> Dave
>>>
>>>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service
  2015-04-28 10:51         ` zhanghailiang
@ 2015-05-06 17:11           ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 51+ messages in thread
From: Dr. David Alan Gilbert @ 2015-05-06 17:11 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, amit.shah, david

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> On 2015/4/24 16:35, Dr. David Alan Gilbert wrote:
> >* Wen Congyang (wency@cn.fujitsu.com) wrote:
> >>On 04/22/2015 07:18 PM, Dr. David Alan Gilbert wrote:
> >>>* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> >>>>Hi,
> >>>>
> >>>>ping ...
> >>>
> >>>I will get to look at this again; but not until after next week.
> >>>
> >>>>The main blocked bugs for COLO have been solved,
> >>>
> >>>I've got the v3 set running, but the biggest problem I hit are problems
> >>>with the packet comparison module; I've seen a panic which I think is
> >>>in colo_send_checkpoint_req that I think is due to the use of
> >>>GFP_KERNEL to allocate the netlink message and I think it can schedule
> >>>there.  I tried making that a GFP_ATOMIC  but I'm hitting other
> >>>problems with :
> >>
> >>Thanks for your test.
> >>I guest the backtrace should like:
> >>1. colo_send_checkpoint_req()
> >>2. colo_setup_checkpoint_by_id()
> >>
> >>Because we hold rcu read lock, so we cannot use GFP_KERNEL to malloc memory.
> >
> >See the backtrace below.
> >
> >>>kcolo_thread, no conn, schedule out
> >>
> >>Hmm, how to reproduce it? In my test, I only focus on block replication, and
> >>I don't use the network.
> >>
> >>>
> >>>that I've not had time to look into yet.
> >>>
> >>>So I only get about a 50% success rate of starting COLO.
> >>>I see there are stuff in the TODO of the colo-proxy that
> >>>seem to say the netlink stuff should change, maybe you're already fixing
> >>>that?
> >>
> >>Do you mean you get about a 50% success rate if you use the network?
> >
> >I always run with the network configured; but the 'kcolo_thread, no conn' bug
> >will hit very early; so I don't see any output on the primary or secondary
> >after the migrate -d is issued on the primary.  On the primary in the dmesg
> >I see:
> >[  736.607043] ip_tables: (C) 2000-2006 Netfilter Core Team
> >[  736.615268] kcolo_thread, no conn, schedule out, chk 0
> >[  736.619442] ip6_tables: (C) 2000-2006 Netfilter Core Team
> >[  736.718273] arp_tables: (C) 2002 David S. Miller
> >
> >I've not had a chance to look further at that yet.
> >
> >Here is the backtrace from the 1st bug.
> >
> >Dave (I'm on holiday next week; I probably won't respond to many mails)
> >
> >[ 9087.833228] BUG: scheduling while atomic: swapper/1/0/0x10000100
> >[ 9087.833271] Modules linked in: ip6table_mangle ip6_tables xt_physdev iptable_mangle xt_PMYCOLO(OF) nf_conntrack_i
> >pv4 nf_defrag_ipv4 xt_mark nf_conntrack_colo(OF) nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack iptable_filter ip_tab
> >les arptable_filter arp_tables act_mirred cls_u32 sch_prio tun bridge stp llc sg kvm_intel kvm snd_hda_codec_generic
> >  cirrus snd_hda_intel crct10dif_pclmul snd_hda_codec crct10dif_common snd_hwdep syscopyarea snd_seq crc32_pclmul crc
> >32c_intel sysfillrect ghash_clmulni_intel snd_seq_device aesni_intel lrw sysimgblt gf128mul ttm drm_kms_helper snd_p
> >cm snd_page_alloc snd_timer snd soundcore glue_helper i2c_piix4 ablk_helper drm cryptd virtio_console i2c_core virti
> >o_balloon serio_raw mperf pcspkr nfsd auth_rpcgss nfs_acl lockd uinput sunrpc xfs libcrc32c sr_mod cdrom ata_generic
> >[ 9087.833572]  pata_acpi virtio_net virtio_blk ata_piix e1000 virtio_pci libata virtio_ring floppy virtio dm_mirror
> >  dm_region_hash dm_log dm_mod [last unloaded: ip_tables]
> >[ 9087.833616] CPU: 1 PID: 0 Comm: swapper/1 Tainted: GF          O--------------   3.10.0-123.20.1.el7.dgilbertcolo
> >.x86_64 #1
> >[ 9087.833623] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> >[ 9087.833630]  ffff880813de8000 7b4d45d276068aee ffff88083fc23980 ffffffff815e2b0c
> >[ 9087.833640]  ffff88083fc23990 ffffffff815dca9f ffff88083fc239f0 ffffffff815e827b
> >[ 9087.833648]  ffff880813de9fd8 00000000000135c0 ffff880813de9fd8 00000000000135c0
> >[ 9087.833657] Call Trace:
> >[ 9087.833664]  <IRQ>  [<ffffffff815e2b0c>] dump_stack+0x19/0x1b
> >[ 9087.833680]  [<ffffffff815dca9f>] __schedule_bug+0x4d/0x5b
> >[ 9087.833688]  [<ffffffff815e827b>] __schedule+0x78b/0x790
> >[ 9087.833699]  [<ffffffff81094fb6>] __cond_resched+0x26/0x30
> >[ 9087.833707]  [<ffffffff815e86aa>] _cond_resched+0x3a/0x50
> >[ 9087.833716]  [<ffffffff81193908>] kmem_cache_alloc_node+0x38/0x200
> >[ 9087.833752]  [<ffffffffa046b770>] ? nf_conntrack_find_get+0x30/0x40 [nf_conntrack]
> >[ 9087.833761]  [<ffffffff814c115d>] ? __alloc_skb+0x5d/0x2d0
> >[ 9087.833768]  [<ffffffff814c115d>] __alloc_skb+0x5d/0x2d0
> >[ 9087.833777]  [<ffffffff814fb972>] ? netlink_lookup+0x32/0xf0
> >[ 9087.833786]  [<ffffffff8153b7d0>] ? arp_req_set+0x270/0x270
> >[ 9087.833794]  [<ffffffff814fbc3b>] netlink_alloc_skb+0x6b/0x1e0
> >[ 9087.833801]  [<ffffffff8153b7d0>] ? arp_req_set+0x270/0x270
> >[ 9087.833816]  [<ffffffffa04a462b>] colo_send_checkpoint_req+0x2b/0x80 [xt_PMYCOLO]
> >[ 9087.833823]  [<ffffffff8153b7d0>] ? arp_req_set+0x270/0x270
> >[ 9087.833832]  [<ffffffffa04a4dd9>] colo_slaver_arp_hook+0x79/0xa0 [xt_PMYCOLO]
> >[ 9087.833850]  [<ffffffffa05fc02f>] ? arptable_filter_hook+0x2f/0x40 [arptable_filter]
> >[ 9087.833858]  [<ffffffff81500c5a>] nf_iterate+0xaa/0xc0
> >[ 9087.833866]  [<ffffffff8153b7d0>] ? arp_req_set+0x270/0x270
> >[ 9087.833874]  [<ffffffff81500cf4>] nf_hook_slow+0x84/0x140
> >[ 9087.833882]  [<ffffffff8153b7d0>] ? arp_req_set+0x270/0x270
> >[ 9087.833890]  [<ffffffff8153bf60>] arp_rcv+0x120/0x160
> >[ 9087.833906]  [<ffffffff814d0596>] __netif_receive_skb_core+0x676/0x870
> >[ 9087.833914]  [<ffffffff814d07a8>] __netif_receive_skb+0x18/0x60
> >[ 9087.833922]  [<ffffffff814d0830>] netif_receive_skb+0x40/0xd0
> >[ 9087.833930]  [<ffffffff814d1290>] napi_gro_receive+0x80/0xb0
> >[ 9087.833959]  [<ffffffffa00e34a0>] e1000_clean_rx_irq+0x2b0/0x580 [e1000]
> >[ 9087.833970]  [<ffffffffa00e5985>] e1000_clean+0x265/0x8e0 [e1000]
> >[ 9087.833979]  [<ffffffff8109506d>] ? ttwu_do_activate.constprop.85+0x5d/0x70
> >[ 9087.833988]  [<ffffffff814d0bfa>] net_rx_action+0x15a/0x250
> >[ 9087.833997]  [<ffffffff81067047>] __do_softirq+0xf7/0x290
> >[ 9087.834006]  [<ffffffff815f4b5c>] call_softirq+0x1c/0x30
> >[ 9087.834011]  [<ffffffff81014cf5>] do_softirq+0x55/0x90
> >[ 9087.834011]  [<ffffffff810673e5>] irq_exit+0x115/0x120
> >[ 9087.834011]  [<ffffffff815f5458>] do_IRQ+0x58/0xf0
> >[ 9087.834011]  [<ffffffff815ea5ad>] common_interrupt+0x6d/0x6d
> >[ 9087.834011]  <EOI>  [<ffffffff81046346>] ? native_safe_halt+0x6/0x10
> >[ 9087.834011]  [<ffffffff8101b39f>] default_idle+0x1f/0xc0
> >[ 9087.834011]  [<ffffffff8101bc96>] arch_cpu_idle+0x26/0x30
> >[ 9087.834011]  [<ffffffff810b47e5>] cpu_startup_entry+0xf5/0x290
> >[ 9087.834011]  [<ffffffff815d0a6e>] start_secondary+0x1c4/0x1da
> >[ 9087.837189] ------------[ cut here ]------------
> >[ 9087.837189] kernel BUG at net/core/dev.c:4130!
> >
> 
> Hi Dave,

Hi,
  Sorry for the delayed response; I was on vacation last week.

> This seems to be a deadlock bug. We have called some functions that could lead to schedule
> between rcu read lock and unlock. There are two places, One is netlink_alloc_skb() with GFP_KERNEL flag,
> and the other one is netlink_unicast() (It can also lead to schedule in some special cases).
> 
> Please test with the follow modification. ;)

Thanks.

> diff --git a/xt_PMYCOLO.c b/xt_PMYCOLO.c
> index a8cf1a1..d8a6eab 100644
> --- a/xt_PMYCOLO.c
> +++ b/xt_PMYCOLO.c
> @@ -1360,6 +1360,7 @@ static void colo_setup_checkpoint_by_id(u32 id) {
>         if (node) {
>                 pr_dbg("mark %d, find colo_primary %p, setup checkpoint\n",
>                         id, node);
> +               rcu_read_unlock();
>                 colo_send_checkpoint_req(&node->u.p);
>         }
>         rcu_read_unlock();

It still seemed to generate the same backtrace for me; I'm not sure that
the rcu_read_lock is the problem here. I'd assumed it was because this is
being called in the softirq path, but I'm fuzzy about how that's
supposed to work.

Dave

> 
> 
> Thanks,
> zhanghailiang
> 
> >>
> >>
> >>Thanks
> >>Wen Congyang
> >>
> >>>
> >>>>we also have finished some new features and optimization on COLO. (If you are interested in this,
> >>>>we can send them to you in private ;))
> >>>
> >>>>For easy of review, it is better to keep it simple now, so we will not add too much new codes into this frame
> >>>>patch set before it been totally reviewed.
> >>>
> >>>I'd like to see those; but I don't want to take code privately.
> >>>It's OK to post extra stuff as a separate set.
> >>>
> >>>>COLO is a totally new feature which is still in early stage, we hope to speed up the development,
> >>>>so your comments and feedback are warmly welcomed. :)
> >>>
> >>>Yes, it's getting there though; I don't think anyone else has
> >>>got this close to getting a full FT set working with disk and networking.
> >>>
> >>>Dave
> >>>
> >>>>
> >>
> >--
> >Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
> >.
> >
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service
  2015-03-26  5:29 [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (28 preceding siblings ...)
  2015-04-08  8:16 ` [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
@ 2015-05-14 12:14 ` Dr. David Alan Gilbert
  2015-05-14 12:58   ` zhanghailiang
  29 siblings, 1 reply; 51+ messages in thread
From: Dr. David Alan Gilbert @ 2015-05-14 12:14 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, amit.shah, david

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> This is the 4th version of COLO, here is only COLO frame part, include: VM checkpoint,
> failover, proxy API, block replication API, not include block replication.
> The block part has been sent by wencongyang:
> [RFC PATCH COLO v2 00/13] Block replication for continuous checkpoints
> 
> Compared with last version, there aren't too much optimize and new functions.
> The main reason is that there is an known issue that still unsolved, we found
> some dirty pages which have been missed setting bit in corresponding bitmap.
> And it will trigger strange problem in VM.
> We hope to resolve it before add more codes.
> 
> You can get the newest integrated qemu colo patches from github:
> https://github.com/coloft/qemu/commits/colo-v1.1

I thought I'd just say I've got the remotes/origin/colo_huawei_v4.7 off Wen's
git running here on one of my pair of machines.

This set of hosts is mostly OK running colo; the proxy module still gets
upset sometimes, but mostly on errors.  If the colo-proxy-script fails
during startup I get RCU stalls, but if the script works, colo normally
works on this setup.

Dave

> About how to test COLO, Please reference to the follow link.
> http://wiki.qemu.org/Features/COLO.
> 
> Please review and test.
> 
> Known issue still unsolved:
> (1) Some pages dirtied without setting its corresponding dirty-bitmap.
> 
> Previous posted RFC patch series:
> http://lists.nongnu.org/archive/html/qemu-devel/2014-06/msg05567.html
> http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg04459.html
> https://lists.gnu.org/archive/html/qemu-devel/2015-02/msg04771.html
> 
> TODO list:
> 1 Optimize the process of checkpoint, shorten the time-consuming:
>   (Partly done, patch is not include into this series)
>    1) separate ram and device save/load process to reduce size of extra memory
>       used during checkpoint
>    2) live migrate part of dirty pages to slave during sleep time.
> 2 Add more debug/stat info
>   (Partly done, patch is not include into this series)
>   include checkpoint count, proxy discompare count, downtime,
>    number of live migrated pages, total sent pages, etc.
> 3 Strengthen failover
> 4 optimize proxy part, include proxy script.
> 5 The capability of continuous FT
> 
> v4:
> - New block replication scheme (use image-fleecing for sencondary side)
> - Adress some comments from Eric Blake and Dave
> - Add commmand colo-set-checkpoint-period to set the time of periodic checkpoint
> - Add a delay (100ms) between continuous checkpoint requests to ensure VM
>   run 100ms at least since last pause.
> 
> v3:
> - use proxy instead of colo agent to compare network packets
> - add block replication
> - Optimize failover disposal
> - handle shutdown
> 
> v2:
> - use QEMUSizedBuffer/QEMUFile as COLO buffer
> - colo support is enabled by default
> - add nic replication support
> - addressed comments from Eric Blake and Dr. David Alan Gilbert
> 
> v1:
> - implement the frame of colo
> 
> Wen Congyang (1):
>   COLO: Add block replication into colo process
> 
> zhanghailiang (27):
>   configure: Add parameter for configure to enable/disable COLO support
>   migration: Introduce capability 'colo' to migration
>   COLO: migrate colo related info to slave
>   migration: Integrate COLO checkpoint process into migration
>   migration: Integrate COLO checkpoint process into loadvm
>   COLO: Implement colo checkpoint protocol
>   COLO: Add a new RunState RUN_STATE_COLO
>   QEMUSizedBuffer: Introduce two help functions for qsb
>   COLO: Save VM state to slave when do checkpoint
>   COLO RAM: Load PVM's dirty page into SVM's RAM cache temporarily
>   COLO VMstate: Load VM state into qsb before restore it
>   arch_init: Start to trace dirty pages of SVM
>   COLO RAM: Flush cached RAM into SVM's memory
>   COLO failover: Introduce a new command to trigger a failover
>   COLO failover: Implement COLO master/slave failover work
>   COLO failover: Don't do failover during loading VM's state
>   COLO: Add new command parameter 'colo_nicname' 'colo_script' for net
>   COLO NIC: Init/remove colo nic devices when add/cleanup tap devices
>   COLO NIC: Implement colo nic device interface configure()
>   COLO NIC : Implement colo nic init/destroy function
>   COLO NIC: Some init work related with proxy module
>   COLO: Do checkpoint according to the result of net packets comparing
>   COLO: Improve checkpoint efficiency by do additional periodic
>     checkpoint
>   COLO: Add colo-set-checkpoint-period command
>   COLO NIC: Implement NIC checkpoint and failover
>   COLO: Disable qdev hotplug when VM is in COLO mode
>   COLO: Implement shutdown checkpoint
> 
>  arch_init.c                            | 199 +++++++-
>  configure                              |  14 +
>  hmp-commands.hx                        |  30 ++
>  hmp.c                                  |  14 +
>  hmp.h                                  |   2 +
>  include/exec/cpu-all.h                 |   1 +
>  include/migration/migration-colo.h     |  58 +++
>  include/migration/migration-failover.h |  22 +
>  include/migration/migration.h          |   3 +
>  include/migration/qemu-file.h          |   3 +-
>  include/net/colo-nic.h                 |  25 +
>  include/net/net.h                      |   4 +
>  include/sysemu/sysemu.h                |   3 +
>  migration/Makefile.objs                |   2 +
>  migration/colo-comm.c                  |  80 ++++
>  migration/colo-failover.c              |  48 ++
>  migration/colo.c                       | 809 +++++++++++++++++++++++++++++++++
>  migration/migration.c                  |  60 ++-
>  migration/qemu-file-buf.c              |  58 +++
>  net/Makefile.objs                      |   1 +
>  net/colo-nic.c                         | 438 ++++++++++++++++++
>  net/tap.c                              |  45 +-
>  qapi-schema.json                       |  42 +-
>  qemu-options.hx                        |  10 +-
>  qmp-commands.hx                        |  41 ++
>  savevm.c                               |   2 +-
>  scripts/colo-proxy-script.sh           |  97 ++++
>  stubs/Makefile.objs                    |   1 +
>  stubs/migration-colo.c                 |  58 +++
>  vl.c                                   |  36 +-
>  30 files changed, 2178 insertions(+), 28 deletions(-)
>  create mode 100644 include/migration/migration-colo.h
>  create mode 100644 include/migration/migration-failover.h
>  create mode 100644 include/net/colo-nic.h
>  create mode 100644 migration/colo-comm.c
>  create mode 100644 migration/colo-failover.c
>  create mode 100644 migration/colo.c
>  create mode 100644 migration/colo.c.
>  create mode 100644 net/colo-nic.c
>  create mode 100755 scripts/colo-proxy-script.sh
>  create mode 100644 stubs/migration-colo.c
> 
> -- 
> 1.7.12.4
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service
  2015-05-14 12:14 ` Dr. David Alan Gilbert
@ 2015-05-14 12:58   ` zhanghailiang
  2015-05-14 16:09     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 51+ messages in thread
From: zhanghailiang @ 2015-05-14 12:58 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, amit.shah, david

On 2015/5/14 20:14, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> This is the 4th version of COLO, here is only COLO frame part, include: VM checkpoint,
>> failover, proxy API, block replication API, not include block replication.
>> The block part has been sent by wencongyang:
>> [RFC PATCH COLO v2 00/13] Block replication for continuous checkpoints
>>
>> Compared with last version, there aren't too much optimize and new functions.
>> The main reason is that there is an known issue that still unsolved, we found
>> some dirty pages which have been missed setting bit in corresponding bitmap.
>> And it will trigger strange problem in VM.
>> We hope to resolve it before add more codes.
>>
>> You can get the newest integrated qemu colo patches from github:
>> https://github.com/coloft/qemu/commits/colo-v1.1
>
> I thought I'd just say I've got the remotes/origin/colo_huawei_v4.7 off Wen's
> git running here on one of my pair of machines.
>
> This set of hosts is mostly OK running colo; the proxy module still gets
> upset sometimes; but mostly on errors;  if the colo-proxy-script fails
> during startup, I get RCU stalls; but if the script works, colo normally
> works on this setup.

Hi Dave,

I'm trying to optimize the proxy module code, and yes, the 'lock' we used in the proxy module
was a little messy before. I have finished rewriting the communication between qemu and the
proxy, and have also removed the parameter of the xt_PMYCOLO module ...

There are still some further tests that need to be done; I will finish them as soon as possible.
I hope to send the next version next week. You can test Wen's branch in the meantime:
except for the block part, there is no difference in the COLO framework between his branch and mine. ;)

Thanks,
zhanghailiang


>
>> About how to test COLO, Please reference to the follow link.
>> http://wiki.qemu.org/Features/COLO.
>>
>> Please review and test.
>>
>> Known issue still unsolved:
>> (1) Some pages dirtied without setting its corresponding dirty-bitmap.
>>
>> Previous posted RFC patch series:
>> http://lists.nongnu.org/archive/html/qemu-devel/2014-06/msg05567.html
>> http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg04459.html
>> https://lists.gnu.org/archive/html/qemu-devel/2015-02/msg04771.html
>>
>> TODO list:
>> 1 Optimize the process of checkpoint, shorten the time-consuming:
>>    (Partly done, patch is not include into this series)
>>     1) separate ram and device save/load process to reduce size of extra memory
>>        used during checkpoint
>>     2) live migrate part of dirty pages to slave during sleep time.
>> 2 Add more debug/stat info
>>    (Partly done, patch is not include into this series)
>>    include checkpoint count, proxy discompare count, downtime,
>>     number of live migrated pages, total sent pages, etc.
>> 3 Strengthen failover
>> 4 optimize proxy part, include proxy script.
>> 5 The capability of continuous FT
>>
>> v4:
>> - New block replication scheme (use image-fleecing for sencondary side)
>> - Adress some comments from Eric Blake and Dave
>> - Add commmand colo-set-checkpoint-period to set the time of periodic checkpoint
>> - Add a delay (100ms) between continuous checkpoint requests to ensure VM
>>    run 100ms at least since last pause.
>>
>> v3:
>> - use proxy instead of colo agent to compare network packets
>> - add block replication
>> - Optimize failover disposal
>> - handle shutdown
>>
>> v2:
>> - use QEMUSizedBuffer/QEMUFile as COLO buffer
>> - colo support is enabled by default
>> - add nic replication support
>> - addressed comments from Eric Blake and Dr. David Alan Gilbert
>>
>> v1:
>> - implement the frame of colo
>>
>> Wen Congyang (1):
>>    COLO: Add block replication into colo process
>>
>> zhanghailiang (27):
>>    configure: Add parameter for configure to enable/disable COLO support
>>    migration: Introduce capability 'colo' to migration
>>    COLO: migrate colo related info to slave
>>    migration: Integrate COLO checkpoint process into migration
>>    migration: Integrate COLO checkpoint process into loadvm
>>    COLO: Implement colo checkpoint protocol
>>    COLO: Add a new RunState RUN_STATE_COLO
>>    QEMUSizedBuffer: Introduce two help functions for qsb
>>    COLO: Save VM state to slave when do checkpoint
>>    COLO RAM: Load PVM's dirty page into SVM's RAM cache temporarily
>>    COLO VMstate: Load VM state into qsb before restore it
>>    arch_init: Start to trace dirty pages of SVM
>>    COLO RAM: Flush cached RAM into SVM's memory
>>    COLO failover: Introduce a new command to trigger a failover
>>    COLO failover: Implement COLO master/slave failover work
>>    COLO failover: Don't do failover during loading VM's state
>>    COLO: Add new command parameter 'colo_nicname' 'colo_script' for net
>>    COLO NIC: Init/remove colo nic devices when add/cleanup tap devices
>>    COLO NIC: Implement colo nic device interface configure()
>>    COLO NIC : Implement colo nic init/destroy function
>>    COLO NIC: Some init work related with proxy module
>>    COLO: Do checkpoint according to the result of net packets comparing
>>    COLO: Improve checkpoint efficiency by do additional periodic
>>      checkpoint
>>    COLO: Add colo-set-checkpoint-period command
>>    COLO NIC: Implement NIC checkpoint and failover
>>    COLO: Disable qdev hotplug when VM is in COLO mode
>>    COLO: Implement shutdown checkpoint
>>
>>   arch_init.c                            | 199 +++++++-
>>   configure                              |  14 +
>>   hmp-commands.hx                        |  30 ++
>>   hmp.c                                  |  14 +
>>   hmp.h                                  |   2 +
>>   include/exec/cpu-all.h                 |   1 +
>>   include/migration/migration-colo.h     |  58 +++
>>   include/migration/migration-failover.h |  22 +
>>   include/migration/migration.h          |   3 +
>>   include/migration/qemu-file.h          |   3 +-
>>   include/net/colo-nic.h                 |  25 +
>>   include/net/net.h                      |   4 +
>>   include/sysemu/sysemu.h                |   3 +
>>   migration/Makefile.objs                |   2 +
>>   migration/colo-comm.c                  |  80 ++++
>>   migration/colo-failover.c              |  48 ++
>>   migration/colo.c                       | 809 +++++++++++++++++++++++++++++++++
>>   migration/migration.c                  |  60 ++-
>>   migration/qemu-file-buf.c              |  58 +++
>>   net/Makefile.objs                      |   1 +
>>   net/colo-nic.c                         | 438 ++++++++++++++++++
>>   net/tap.c                              |  45 +-
>>   qapi-schema.json                       |  42 +-
>>   qemu-options.hx                        |  10 +-
>>   qmp-commands.hx                        |  41 ++
>>   savevm.c                               |   2 +-
>>   scripts/colo-proxy-script.sh           |  97 ++++
>>   stubs/Makefile.objs                    |   1 +
>>   stubs/migration-colo.c                 |  58 +++
>>   vl.c                                   |  36 +-
>>   30 files changed, 2178 insertions(+), 28 deletions(-)
>>   create mode 100644 include/migration/migration-colo.h
>>   create mode 100644 include/migration/migration-failover.h
>>   create mode 100644 include/net/colo-nic.h
>>   create mode 100644 migration/colo-comm.c
>>   create mode 100644 migration/colo-failover.c
>>   create mode 100644 migration/colo.c
>>   create mode 100644 migration/colo.c.
>>   create mode 100644 net/colo-nic.c
>>   create mode 100755 scripts/colo-proxy-script.sh
>>   create mode 100644 stubs/migration-colo.c
>>
>> --
>> 1.7.12.4
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service
  2015-05-14 12:58   ` zhanghailiang
@ 2015-05-14 16:09     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 51+ messages in thread
From: Dr. David Alan Gilbert @ 2015-05-14 16:09 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, amit.shah, david

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> On 2015/5/14 20:14, Dr. David Alan Gilbert wrote:
> >* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> >>This is the 4th version of COLO, here is only COLO frame part, include: VM checkpoint,
> >>failover, proxy API, block replication API, not include block replication.
> >>The block part has been sent by wencongyang:
> >>[RFC PATCH COLO v2 00/13] Block replication for continuous checkpoints
> >>
> >>Compared with last version, there aren't too much optimize and new functions.
> >>The main reason is that there is an known issue that still unsolved, we found
> >>some dirty pages which have been missed setting bit in corresponding bitmap.
> >>And it will trigger strange problem in VM.
> >>We hope to resolve it before add more codes.
> >>
> >>You can get the newest integrated qemu colo patches from github:
> >>https://github.com/coloft/qemu/commits/colo-v1.1
> >
> >I thought I'd just say I've got the remotes/origin/colo_huawei_v4.7 off Wen's
> >git running here on one of my pair of machines.
> >
> >This set of hosts is mostly OK running colo; the proxy module still gets
> >upset sometimes; but mostly on errors;  if the colo-proxy-script fails
> >during startup, I get RCU stalls; but if the script works, colo normally
> >works on this setup.
> 
> Hi Dave,
> 
> I'm trying to optimize the proxy module codes, and yes, the 'lock' we used in proxy module
> was a little messy before, and i have finished rewriting the part of communication between qemu
> and proxy, also removing the parameter of xt_PMYCOLO module ...

Good; that should be a lot better.

> There are still some further tests need to be done, i will finish it as soon as possible.
> I hope to send the next version in next week. And you can test the branch of Wen's temporarily,
> Except block part, there is no difference in COLO framework between his branch and mine. ;)

I'll be happy to try that.

Dave

> 
> Thanks,
> zhanghailiang
> 
> 
> >
> >>About how to test COLO, Please reference to the follow link.
> >>http://wiki.qemu.org/Features/COLO.
> >>
> >>Please review and test.
> >>
> >>Known issue still unsolved:
> >>(1) Some pages dirtied without setting its corresponding dirty-bitmap.
> >>
> >>Previous posted RFC patch series:
> >>http://lists.nongnu.org/archive/html/qemu-devel/2014-06/msg05567.html
> >>http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg04459.html
> >>https://lists.gnu.org/archive/html/qemu-devel/2015-02/msg04771.html
> >>
> >>TODO list:
> >>1 Optimize the process of checkpoint, shorten the time-consuming:
> >>   (Partly done, patch is not include into this series)
> >>    1) separate ram and device save/load process to reduce size of extra memory
> >>       used during checkpoint
> >>    2) live migrate part of dirty pages to slave during sleep time.
> >>2 Add more debug/stat info
> >>   (Partly done, patch is not include into this series)
> >>   include checkpoint count, proxy discompare count, downtime,
> >>    number of live migrated pages, total sent pages, etc.
> >>3 Strengthen failover
> >>4 optimize proxy part, include proxy script.
> >>5 The capability of continuous FT
> >>
> >>v4:
> >>- New block replication scheme (use image-fleecing for sencondary side)
> >>- Adress some comments from Eric Blake and Dave
> >>- Add commmand colo-set-checkpoint-period to set the time of periodic checkpoint
> >>- Add a delay (100ms) between continuous checkpoint requests to ensure VM
> >>   run 100ms at least since last pause.
> >>
> >>v3:
> >>- use proxy instead of colo agent to compare network packets
> >>- add block replication
> >>- Optimize failover disposal
> >>- handle shutdown
> >>
> >>v2:
> >>- use QEMUSizedBuffer/QEMUFile as COLO buffer
> >>- colo support is enabled by default
> >>- add nic replication support
> >>- addressed comments from Eric Blake and Dr. David Alan Gilbert
> >>
> >>v1:
> >>- implement the frame of colo
> >>
> >>Wen Congyang (1):
> >>   COLO: Add block replication into colo process
> >>
> >>zhanghailiang (27):
> >>   configure: Add parameter for configure to enable/disable COLO support
> >>   migration: Introduce capability 'colo' to migration
> >>   COLO: migrate colo related info to slave
> >>   migration: Integrate COLO checkpoint process into migration
> >>   migration: Integrate COLO checkpoint process into loadvm
> >>   COLO: Implement colo checkpoint protocol
> >>   COLO: Add a new RunState RUN_STATE_COLO
> >>   QEMUSizedBuffer: Introduce two help functions for qsb
> >>   COLO: Save VM state to slave when do checkpoint
> >>   COLO RAM: Load PVM's dirty page into SVM's RAM cache temporarily
> >>   COLO VMstate: Load VM state into qsb before restore it
> >>   arch_init: Start to trace dirty pages of SVM
> >>   COLO RAM: Flush cached RAM into SVM's memory
> >>   COLO failover: Introduce a new command to trigger a failover
> >>   COLO failover: Implement COLO master/slave failover work
> >>   COLO failover: Don't do failover during loading VM's state
> >>   COLO: Add new command parameter 'colo_nicname' 'colo_script' for net
> >>   COLO NIC: Init/remove colo nic devices when add/cleanup tap devices
> >>   COLO NIC: Implement colo nic device interface configure()
> >>   COLO NIC : Implement colo nic init/destroy function
> >>   COLO NIC: Some init work related with proxy module
> >>   COLO: Do checkpoint according to the result of net packets comparing
> >>   COLO: Improve checkpoint efficiency by do additional periodic
> >>     checkpoint
> >>   COLO: Add colo-set-checkpoint-period command
> >>   COLO NIC: Implement NIC checkpoint and failover
> >>   COLO: Disable qdev hotplug when VM is in COLO mode
> >>   COLO: Implement shutdown checkpoint
> >>
> >>  arch_init.c                            | 199 +++++++-
> >>  configure                              |  14 +
> >>  hmp-commands.hx                        |  30 ++
> >>  hmp.c                                  |  14 +
> >>  hmp.h                                  |   2 +
> >>  include/exec/cpu-all.h                 |   1 +
> >>  include/migration/migration-colo.h     |  58 +++
> >>  include/migration/migration-failover.h |  22 +
> >>  include/migration/migration.h          |   3 +
> >>  include/migration/qemu-file.h          |   3 +-
> >>  include/net/colo-nic.h                 |  25 +
> >>  include/net/net.h                      |   4 +
> >>  include/sysemu/sysemu.h                |   3 +
> >>  migration/Makefile.objs                |   2 +
> >>  migration/colo-comm.c                  |  80 ++++
> >>  migration/colo-failover.c              |  48 ++
> >>  migration/colo.c                       | 809 +++++++++++++++++++++++++++++++++
> >>  migration/migration.c                  |  60 ++-
> >>  migration/qemu-file-buf.c              |  58 +++
> >>  net/Makefile.objs                      |   1 +
> >>  net/colo-nic.c                         | 438 ++++++++++++++++++
> >>  net/tap.c                              |  45 +-
> >>  qapi-schema.json                       |  42 +-
> >>  qemu-options.hx                        |  10 +-
> >>  qmp-commands.hx                        |  41 ++
> >>  savevm.c                               |   2 +-
> >>  scripts/colo-proxy-script.sh           |  97 ++++
> >>  stubs/Makefile.objs                    |   1 +
> >>  stubs/migration-colo.c                 |  58 +++
> >>  vl.c                                   |  36 +-
> >>  30 files changed, 2178 insertions(+), 28 deletions(-)
> >>  create mode 100644 include/migration/migration-colo.h
> >>  create mode 100644 include/migration/migration-failover.h
> >>  create mode 100644 include/net/colo-nic.h
> >>  create mode 100644 migration/colo-comm.c
> >>  create mode 100644 migration/colo-failover.c
> >>  create mode 100644 migration/colo.c
> >>  create mode 100644 migration/colo.c.
> >>  create mode 100644 net/colo-nic.c
> >>  create mode 100755 scripts/colo-proxy-script.sh
> >>  create mode 100644 stubs/migration-colo.c
> >>
> >>--
> >>1.7.12.4
> >>
> >>
> >--
> >Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
> >.
> >
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v4 07/28] COLO: Add a new RunState RUN_STATE_COLO
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 07/28] COLO: Add a new RunState RUN_STATE_COLO zhanghailiang
@ 2015-05-15 11:28   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 51+ messages in thread
From: Dr. David Alan Gilbert @ 2015-05-15 11:28 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, amit.shah, Lai Jiangshan, david

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> Guest will enter this state when paused to save/restore VM state
> under colo checkpoint.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

(Do suspend and watchdog-with-pause work with colo?  They sound like odd
combinations.)

> ---
>  qapi-schema.json | 5 ++++-
>  vl.c             | 8 ++++++++
>  2 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 172aae3..43a964b 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -148,12 +148,15 @@
>  # @watchdog: the watchdog action is configured to pause and has been triggered
>  #
>  # @guest-panicked: guest has been panicked as a result of guest OS panic
> +#
> +# @colo: guest is paused to save/restore VM state under colo checkpoint (since
> +# 2.4)
>  ##
>  { 'enum': 'RunState',
>    'data': [ 'debug', 'inmigrate', 'internal-error', 'io-error', 'paused',
>              'postmigrate', 'prelaunch', 'finish-migrate', 'restore-vm',
>              'running', 'save-vm', 'shutdown', 'suspended', 'watchdog',
> -            'guest-panicked' ] }
> +            'guest-panicked', 'colo' ] }
>  
>  ##
>  # @StatusInfo:
> diff --git a/vl.c b/vl.c
> index 9724992..8c07244 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -550,6 +550,7 @@ static const RunStateTransition runstate_transitions_def[] = {
>  
>      { RUN_STATE_INMIGRATE, RUN_STATE_RUNNING },
>      { RUN_STATE_INMIGRATE, RUN_STATE_PAUSED },
> +    { RUN_STATE_INMIGRATE, RUN_STATE_COLO },
>  
>      { RUN_STATE_INTERNAL_ERROR, RUN_STATE_PAUSED },
>      { RUN_STATE_INTERNAL_ERROR, RUN_STATE_FINISH_MIGRATE },
> @@ -559,6 +560,7 @@ static const RunStateTransition runstate_transitions_def[] = {
>  
>      { RUN_STATE_PAUSED, RUN_STATE_RUNNING },
>      { RUN_STATE_PAUSED, RUN_STATE_FINISH_MIGRATE },
> +    { RUN_STATE_PAUSED, RUN_STATE_COLO},
>  
>      { RUN_STATE_POSTMIGRATE, RUN_STATE_RUNNING },
>      { RUN_STATE_POSTMIGRATE, RUN_STATE_FINISH_MIGRATE },
> @@ -569,9 +571,12 @@ static const RunStateTransition runstate_transitions_def[] = {
>  
>      { RUN_STATE_FINISH_MIGRATE, RUN_STATE_RUNNING },
>      { RUN_STATE_FINISH_MIGRATE, RUN_STATE_POSTMIGRATE },
> +    { RUN_STATE_FINISH_MIGRATE, RUN_STATE_COLO},
>  
>      { RUN_STATE_RESTORE_VM, RUN_STATE_RUNNING },
>  
> +    { RUN_STATE_COLO, RUN_STATE_RUNNING },
> +
>      { RUN_STATE_RUNNING, RUN_STATE_DEBUG },
>      { RUN_STATE_RUNNING, RUN_STATE_INTERNAL_ERROR },
>      { RUN_STATE_RUNNING, RUN_STATE_IO_ERROR },
> @@ -582,6 +587,7 @@ static const RunStateTransition runstate_transitions_def[] = {
>      { RUN_STATE_RUNNING, RUN_STATE_SHUTDOWN },
>      { RUN_STATE_RUNNING, RUN_STATE_WATCHDOG },
>      { RUN_STATE_RUNNING, RUN_STATE_GUEST_PANICKED },
> +    { RUN_STATE_RUNNING, RUN_STATE_COLO},
>  
>      { RUN_STATE_SAVE_VM, RUN_STATE_RUNNING },
>  
> @@ -592,9 +598,11 @@ static const RunStateTransition runstate_transitions_def[] = {
>      { RUN_STATE_RUNNING, RUN_STATE_SUSPENDED },
>      { RUN_STATE_SUSPENDED, RUN_STATE_RUNNING },
>      { RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE },
> +    { RUN_STATE_SUSPENDED, RUN_STATE_COLO},
>  
>      { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
>      { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
> +    { RUN_STATE_WATCHDOG, RUN_STATE_COLO},
>  
>      { RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING },
>      { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE },
> -- 
> 1.7.12.4
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v4 03/28] COLO: migrate colo related info to slave
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 03/28] COLO: migrate colo related info to slave zhanghailiang
@ 2015-05-15 11:38   ` Dr. David Alan Gilbert
  2015-05-18  5:04     ` zhanghailiang
  0 siblings, 1 reply; 51+ messages in thread
From: Dr. David Alan Gilbert @ 2015-05-15 11:38 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, amit.shah, Lai Jiangshan,
	Yang Hongyang, david

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> We can know if VM in destination should go into COLO mode by refer to
> the info that has been migrated from PVM.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> ---
>  include/migration/migration-colo.h |  2 ++
>  migration/Makefile.objs            |  1 +
>  migration/colo-comm.c              | 55 ++++++++++++++++++++++++++++++++++++++
>  vl.c                               |  5 +++-
>  4 files changed, 62 insertions(+), 1 deletion(-)
>  create mode 100644 migration/colo-comm.c
> 
> diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h
> index 6fdbb94..de68c72 100644
> --- a/include/migration/migration-colo.h
> +++ b/include/migration/migration-colo.h
> @@ -14,7 +14,9 @@
>  #define QEMU_MIGRATION_COLO_H
>  
>  #include "qemu-common.h"
> +#include "migration/migration.h"
>  
>  bool colo_supported(void);
> +void colo_info_mig_init(void);
>  
>  #endif
> diff --git a/migration/Makefile.objs b/migration/Makefile.objs
> index 5a25d39..cb7bd30 100644
> --- a/migration/Makefile.objs
> +++ b/migration/Makefile.objs
> @@ -1,5 +1,6 @@
>  common-obj-y += migration.o tcp.o
>  common-obj-$(CONFIG_COLO) += colo.o
> +common-obj-y += colo-comm.o
>  common-obj-y += vmstate.o
>  common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o qemu-file-stdio.o
>  common-obj-y += xbzrle.o
> diff --git a/migration/colo-comm.c b/migration/colo-comm.c
> new file mode 100644
> index 0000000..cab97e9
> --- /dev/null
> +++ b/migration/colo-comm.c
> @@ -0,0 +1,55 @@
> +/*
> + * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
> + * (a.k.a. Fault Tolerance or Continuous Replication)
> + *
> + * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
> + * Copyright (c) 2015 FUJITSU LIMITED
> + * Copyright (c) 2015 Intel Corporation
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or
> + * later. See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include <migration/migration-colo.h>
> +
> +#define DEBUG_COLO_COMMON 0
> +
> +#define DPRINTF(fmt, ...)                                  \
> +    do {                                                   \
> +        if (DEBUG_COLO_COMMON) {                           \
> +            fprintf(stderr, "COLO: " fmt, ## __VA_ARGS__); \
> +        }                                                  \
> +    } while (0)


I'm trying to get rid of all the DPRINTFs in migration - I've already turned
most of the existing DPRINTFs into trace_ calls.  Using the stderr trace
backend it's very easy (--enable-trace-backends=stderr), and then you can
switch the events on individually; it also prints a timestamp with each
message, which is useful for performance tuning.
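
For example, a sketch of what that conversion looks like here (the event name and
format string are just made up for illustration; they're not in trace-events yet):

  In trace-events:

    # migration/colo-comm.c
    colo_info_load_requested(int value) "colo requested %d"

  and in migration/colo-comm.c:

    #include "trace.h"

    static int colo_info_load(QEMUFile *f, void *opaque, int version_id)
    {
        int value = qemu_get_byte(f);

        if (value && !colo_requested) {
            trace_colo_info_load_requested(value);
        }
        colo_requested = value;

        return 0;
    }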

Dave

> +static bool colo_requested;
> +
> +/* save */
> +static void colo_info_save(QEMUFile *f, void *opaque)
> +{
> +    qemu_put_byte(f, migrate_enable_colo());
> +}
> +
> +/* restore */
> +static int colo_info_load(QEMUFile *f, void *opaque, int version_id)
> +{
> +    int value = qemu_get_byte(f);
> +
> +    if (value && !colo_requested) {
> +        DPRINTF("COLO requested!\n");
> +    }
> +    colo_requested = value;
> +
> +    return 0;
> +}
> +
> +static SaveVMHandlers savevm_colo_info_handlers = {
> +    .save_state = colo_info_save,
> +    .load_state = colo_info_load,
> +};
> +
> +void colo_info_mig_init(void)
> +{
> +    register_savevm_live(NULL, "colo", -1, 1,
> +                         &savevm_colo_info_handlers, NULL);
> +}
> diff --git a/vl.c b/vl.c
> index 75ec292..9724992 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -90,6 +90,7 @@ int main(int argc, char **argv)
>  #include "sysemu/dma.h"
>  #include "audio/audio.h"
>  #include "migration/migration.h"
> +#include "migration/migration-colo.h"
>  #include "sysemu/kvm.h"
>  #include "qapi/qmp/qjson.h"
>  #include "qemu/option.h"
> @@ -4149,7 +4150,9 @@ int main(int argc, char **argv, char **envp)
>  
>      blk_mig_init();
>      ram_mig_init();
> -
> +#ifdef CONFIG_COLO
> +    colo_info_mig_init();
> +#endif
>      /* If the currently selected machine wishes to override the units-per-bus
>       * property of its default HBA interface type, do so now. */
>      if (machine_class->units_per_default_bus) {
> -- 
> 1.7.12.4
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v4 08/28] QEMUSizedBuffer: Introduce two help functions for qsb
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 08/28] QEMUSizedBuffer: Introduce two help functions for qsb zhanghailiang
@ 2015-05-15 11:56   ` Dr. David Alan Gilbert
  2015-05-18  5:10     ` zhanghailiang
  0 siblings, 1 reply; 51+ messages in thread
From: Dr. David Alan Gilbert @ 2015-05-15 11:56 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, amit.shah, Yang Hongyang, david

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> Introduce two new QEMUSizedBuffer APIs which will be used by COLO to buffer
> VM state:
> One is qsb_put_buffer(), which put the content of a given QEMUSizedBuffer
> into QEMUFile, this is used to send buffered VM state to secondary.
> Another is qsb_fill_buffer(), read 'size' bytes of data from the file into
> qsb, this is used to get VM state from socket into a buffer.
> 
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

(I could use these in my postcopy world)
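
(For reference, this is roughly how the later checkpoint patch in this series uses them
on the primary side, heavily condensed and with error handling elided - 's' is the
MigrationState:)

    QEMUSizedBuffer *colo_buffer = qsb_create(NULL, 4 * 1024 * 1024);
    QEMUFile *trans = qemu_bufopen("w", colo_buffer);
    size_t size;

    /* buffer the whole VM state locally first */
    qemu_savevm_state_begin(trans, &s->params);
    qemu_savevm_state_complete(trans);
    qemu_fflush(trans);

    /* then send the length, followed by the buffered state */
    size = qsb_get_length(colo_buffer);
    colo_ctl_put(s->file, size);
    qsb_put_buffer(s->file, colo_buffer, size);
    qemu_fflush(s->file);

On the destination side the incoming state is read into a qsb with qsb_fill_buffer()
before it is applied.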

Dave

> ---
>  include/migration/qemu-file.h |  3 ++-
>  migration/qemu-file-buf.c     | 58 +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 60 insertions(+), 1 deletion(-)
> 
> diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
> index 745a850..09a0e2a 100644
> --- a/include/migration/qemu-file.h
> +++ b/include/migration/qemu-file.h
> @@ -140,7 +140,8 @@ ssize_t qsb_get_buffer(const QEMUSizedBuffer *, off_t start, size_t count,
>                         uint8_t *buf);
>  ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t *buf,
>                       off_t pos, size_t count);
> -
> +void qsb_put_buffer(QEMUFile *f, QEMUSizedBuffer *qsb, int size);
> +int qsb_fill_buffer(QEMUSizedBuffer *qsb, QEMUFile *f, int size);
>  
>  /*
>   * For use on files opened with qemu_bufopen
> diff --git a/migration/qemu-file-buf.c b/migration/qemu-file-buf.c
> index 16a51a1..686f417 100644
> --- a/migration/qemu-file-buf.c
> +++ b/migration/qemu-file-buf.c
> @@ -365,6 +365,64 @@ ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t *source,
>      return count;
>  }
>  
> +
> +/**
> + * Put the content of a given QEMUSizedBuffer into QEMUFile.
> + *
> + * @f: A QEMUFile
> + * @qsb: A QEMUSizedBuffer
> + * @size: size of content to write
> + */
> +void qsb_put_buffer(QEMUFile *f, QEMUSizedBuffer *qsb, int size)
> +{
> +    int i, l;
> +
> +    for (i = 0; i < qsb->n_iov && size > 0; i++) {
> +        l = MIN(qsb->iov[i].iov_len, size);
> +        qemu_put_buffer(f, qsb->iov[i].iov_base, l);
> +        size -= l;
> +    }
> +}
> +
> +/*
> + * Read 'size' bytes of data from the file into qsb.
> + * always fill from pos 0 and used after qsb_create().
> + *
> + * It will return size bytes unless there was an error, in which case it will
> + * return as many as it managed to read (assuming blocking fd's which
> + * all current QEMUFile are)
> + */
> +int qsb_fill_buffer(QEMUSizedBuffer *qsb, QEMUFile *f, int size)
> +{
> +    ssize_t rc = qsb_grow(qsb, size);
> +    int pending = size, i;
> +    qsb->used = 0;
> +    uint8_t *buf = NULL;
> +
> +    if (rc < 0) {
> +        return rc;
> +    }
> +
> +    for (i = 0; i < qsb->n_iov && pending > 0; i++) {
> +        int doneone = 0;
> +        /* read until iov full */
> +        while (doneone < qsb->iov[i].iov_len && pending > 0) {
> +            int readone = 0;
> +            buf = qsb->iov[i].iov_base;
> +            readone = qemu_get_buffer(f, buf,
> +                                MIN(qsb->iov[i].iov_len - doneone, pending));
> +            if (readone == 0) {
> +                return qsb->used;
> +            }
> +            buf += readone;
> +            doneone += readone;
> +            pending -= readone;
> +            qsb->used += readone;
> +        }
> +    }
> +    return qsb->used;
> +}
> +
>  typedef struct QEMUBuffer {
>      QEMUSizedBuffer *qsb;
>      QEMUFile *file;
> -- 
> 1.7.12.4
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v4 09/28] COLO: Save VM state to slave when do checkpoint
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 09/28] COLO: Save VM state to slave when do checkpoint zhanghailiang
@ 2015-05-15 12:09   ` Dr. David Alan Gilbert
  2015-05-18  9:11     ` zhanghailiang
  0 siblings, 1 reply; 51+ messages in thread
From: Dr. David Alan Gilbert @ 2015-05-15 12:09 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, amit.shah, Lai Jiangshan,
	Yang Hongyang, david

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> We should save PVM's RAM/device to slave when needed.
> 
> For VM state, we  will cache them in slave, we use QEMUSizedBuffer
> to store the data, we need know the data size of VM state, so in master,
> we use qsb to store VM state temporarily, and then migrate the data to
> slave.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> ---
>  arch_init.c      | 22 ++++++++++++++++++--
>  migration/colo.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++++----
>  savevm.c         |  2 +-
>  3 files changed, 79 insertions(+), 7 deletions(-)
> 
> diff --git a/arch_init.c b/arch_init.c
> index fcfa328..e928e11 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -53,6 +53,7 @@
>  #include "hw/acpi/acpi.h"
>  #include "qemu/host-utils.h"
>  #include "qemu/rcu_queue.h"
> +#include "migration/migration-colo.h"
>  
>  #ifdef DEBUG_ARCH_INIT
>  #define DPRINTF(fmt, ...) \
> @@ -845,6 +846,13 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>      RAMBlock *block;
>      int64_t ram_bitmap_pages; /* Size of bitmap in pages, including gaps */
>  
> +    /*
> +     * migration has already setup the bitmap, reuse it.
> +     */
> +    if (migrate_in_colo_state()) {
> +        goto setup_part;
> +    }
> +

This is a bit odd.  It would be easier to move the init code inside this if,
rather than goto'ing over it (or to move the code that you actually want into
another function that then gets called from the bottom of here).
What also makes it especially odd is that you goto over the rcu_read_lock and
then have to fix that up afterwards; that's getting messy.

(QEMU style seems to accept using goto to jump to a shared error block at the
end of a function, but otherwise it should be rare.)
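
Roughly the shape I mean (just a sketch - ram_save_init_globals() is an imagined helper
holding the setup code you currently jump over, and the tail of the function is elided):

    static int ram_save_setup(QEMUFile *f, void *opaque)
    {
        RAMBlock *block;

        if (!migrate_in_colo_state()) {
            /* normal migration / first checkpoint: bitmap setup, sync, etc. */
            ram_save_init_globals();
        }

        rcu_read_lock();
        qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
        QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
            qemu_put_byte(f, strlen(block->idstr));
            qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr));
            qemu_put_be64(f, block->used_length);
        }
        rcu_read_unlock();

        /* ... rest of the function unchanged ... */
        qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
        return 0;
    }

That keeps the rcu_read_lock()/rcu_read_unlock() pairing visible in one place instead of
conditionally re-taking it.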

>      mig_throttle_on = false;
>      dirty_rate_high_cnt = 0;
>      bitmap_sync_count = 0;
> @@ -901,9 +909,12 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>      migration_bitmap_sync();
>      qemu_mutex_unlock_ramlist();
>      qemu_mutex_unlock_iothread();
> -
> +setup_part:
>      qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
>  
> +    if (migrate_in_colo_state()) {
> +        rcu_read_lock();
> +    }
>      QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
>          qemu_put_byte(f, strlen(block->idstr));
>          qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr));
> @@ -1007,7 +1018,14 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
>      }
>  
>      ram_control_after_iterate(f, RAM_CONTROL_FINISH);
> -    migration_end();
> +
> +    /*
> +     * Since we need to reuse dirty bitmap in colo,
> +     * don't cleanup the bitmap.
> +     */
> +    if (!migrate_enable_colo() || migration_has_failed(migrate_get_current())) {
> +        migration_end();
> +    }
>  
>      rcu_read_unlock();
>      qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
> diff --git a/migration/colo.c b/migration/colo.c
> index 5a8ed1b..64e3f3a 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -60,6 +60,9 @@ enum {
>  
>  static QEMUBH *colo_bh;
>  static Coroutine *colo;
> +/* colo buffer */
> +#define COLO_BUFFER_BASE_SIZE (1000*1000*4ULL)

Surely you want that as 4*1024*1024 ?  Anyway, now that qemu has
migrate_set_parameter, it's probably best to wire these magic numbers
to parameters that can be configured.
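
i.e. something like (the parameter name here is made up, just to illustrate):

    /* 4 MiB initial buffer; ideally configurable via a migration parameter
     * (e.g. an x-colo-buffer-size) rather than hard-coded. */
    #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024ULL)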

> +QEMUSizedBuffer *colo_buffer;
>  
>  bool colo_supported(void)
>  {
> @@ -123,6 +126,8 @@ static int colo_ctl_get(QEMUFile *f, uint64_t require)
>  static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control)
>  {
>      int ret;
> +    size_t size;
> +    QEMUFile *trans = NULL;
>  
>      ret = colo_ctl_put(s->file, COLO_CHECKPOINT_NEW);
>      if (ret < 0) {
> @@ -133,16 +138,47 @@ static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control)
>      if (ret < 0) {
>          goto out;
>      }
> +    /* Reset colo buffer and open it for write */
> +    qsb_set_length(colo_buffer, 0);
> +    trans = qemu_bufopen("w", colo_buffer);
> +    if (!trans) {
> +        error_report("Open colo buffer for write failed");
> +        goto out;
> +    }
> +
> +    /* suspend and save vm state to colo buffer */
> +    qemu_mutex_lock_iothread();
> +    vm_stop_force_state(RUN_STATE_COLO);
> +    qemu_mutex_unlock_iothread();
> +    DPRINTF("vm is stoped\n");
> +
> +    /* Disable block migration */
> +    s->params.blk = 0;
> +    s->params.shared = 0;
> +    qemu_savevm_state_begin(trans, &s->params);
> +    qemu_mutex_lock_iothread();
> +    qemu_savevm_state_complete(trans);
> +    qemu_mutex_unlock_iothread();
>  
> -    /* TODO: suspend and save vm state to colo buffer */
> +    qemu_fflush(trans);
>  
>      ret = colo_ctl_put(s->file, COLO_CHECKPOINT_SEND);
>      if (ret < 0) {
>          goto out;
>      }
> +    /* we send the total size of the vmstate first */
> +    size = qsb_get_length(colo_buffer);
> +    ret = colo_ctl_put(s->file, size);
> +    if (ret < 0) {
> +        goto out;
> +    }
>  
> -    /* TODO: send vmstate to slave */
> -
> +    qsb_put_buffer(s->file, colo_buffer, size);
> +    qemu_fflush(s->file);
> +    ret = qemu_file_get_error(s->file);
> +    if (ret < 0) {
> +        goto out;
> +    }
>      ret = colo_ctl_get(control, COLO_CHECKPOINT_RECEIVED);
>      if (ret < 0) {
>          goto out;
> @@ -154,9 +190,18 @@ static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control)
>      }
>      DPRINTF("got COLO_CHECKPOINT_LOADED\n");
>  
> -    /* TODO: resume master */
> +    ret = 0;
> +    /* resume master */
> +    qemu_mutex_lock_iothread();
> +    vm_start();
> +    qemu_mutex_unlock_iothread();
> +    DPRINTF("vm resume to run again\n");
>  
>  out:
> +    if (trans) {
> +        qemu_fclose(trans);
> +    }
> +
>      return ret;
>  }
>  
> @@ -182,6 +227,12 @@ static void *colo_thread(void *opaque)
>      }
>      DPRINTF("get COLO_READY\n");
>  
> +    colo_buffer = qsb_create(NULL, COLO_BUFFER_BASE_SIZE);
> +    if (colo_buffer == NULL) {
> +        error_report("Failed to allocate colo buffer!");
> +        goto out;
> +    }
> +
>      qemu_mutex_lock_iothread();
>      vm_start();
>      qemu_mutex_unlock_iothread();
> @@ -197,6 +248,9 @@ static void *colo_thread(void *opaque)
>  out:
>      migrate_set_state(s, MIGRATION_STATUS_COLO, MIGRATION_STATUS_COMPLETED);
>  
> +    qsb_free(colo_buffer);
> +    colo_buffer = NULL;
> +
>      if (colo_control) {
>          qemu_fclose(colo_control);
>      }
> diff --git a/savevm.c b/savevm.c
> index 3b0e222..cd7ec27 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -42,7 +42,7 @@
>  #include "qemu/iov.h"
>  #include "block/snapshot.h"
>  #include "block/qapi.h"
> -
> +#include "migration/migration-colo.h"
>  
>  #ifndef ETH_P_RARP
>  #define ETH_P_RARP 0x8035
> -- 
> 1.7.12.4
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC PATCH v4 03/28] COLO: migrate colo related info to slave
  2015-05-15 11:38   ` Dr. David Alan Gilbert
@ 2015-05-18  5:04     ` zhanghailiang
  0 siblings, 0 replies; 51+ messages in thread
From: zhanghailiang @ 2015-05-18  5:04 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, amit.shah, Lai Jiangshan,
	Yang Hongyang, david

On 2015/5/15 19:38, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> We can know if VM in destination should go into COLO mode by refer to
>> the info that has been migrated from PVM.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> ---
>>   include/migration/migration-colo.h |  2 ++
>>   migration/Makefile.objs            |  1 +
>>   migration/colo-comm.c              | 55 ++++++++++++++++++++++++++++++++++++++
>>   vl.c                               |  5 +++-
>>   4 files changed, 62 insertions(+), 1 deletion(-)
>>   create mode 100644 migration/colo-comm.c
>>
>> diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h
>> index 6fdbb94..de68c72 100644
>> --- a/include/migration/migration-colo.h
>> +++ b/include/migration/migration-colo.h
>> @@ -14,7 +14,9 @@
>>   #define QEMU_MIGRATION_COLO_H
>>
>>   #include "qemu-common.h"
>> +#include "migration/migration.h"
>>
>>   bool colo_supported(void);
>> +void colo_info_mig_init(void);
>>
>>   #endif
>> diff --git a/migration/Makefile.objs b/migration/Makefile.objs
>> index 5a25d39..cb7bd30 100644
>> --- a/migration/Makefile.objs
>> +++ b/migration/Makefile.objs
>> @@ -1,5 +1,6 @@
>>   common-obj-y += migration.o tcp.o
>>   common-obj-$(CONFIG_COLO) += colo.o
>> +common-obj-y += colo-comm.o
>>   common-obj-y += vmstate.o
>>   common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o qemu-file-stdio.o
>>   common-obj-y += xbzrle.o
>> diff --git a/migration/colo-comm.c b/migration/colo-comm.c
>> new file mode 100644
>> index 0000000..cab97e9
>> --- /dev/null
>> +++ b/migration/colo-comm.c
>> @@ -0,0 +1,55 @@
>> +/*
>> + * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
>> + * (a.k.a. Fault Tolerance or Continuous Replication)
>> + *
>> + * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
>> + * Copyright (c) 2015 FUJITSU LIMITED
>> + * Copyright (c) 2015 Intel Corporation
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or
>> + * later. See the COPYING file in the top-level directory.
>> + *
>> + */
>> +
>> +#include <migration/migration-colo.h>
>> +
>> +#define DEBUG_COLO_COMMON 0
>> +
>> +#define DPRINTF(fmt, ...)                                  \
>> +    do {                                                   \
>> +        if (DEBUG_COLO_COMMON) {                           \
>> +            fprintf(stderr, "COLO: " fmt, ## __VA_ARGS__); \
>> +        }                                                  \
>> +    } while (0)
>
>
> I'm trying to get rid of all the DPRINTF's in migration - I've turned
> most of the existing DPRINTF into trace_ calls already;
> using the stderr trace backend it's very easy (--enable-trace-backends=stderr)
> and then you can switch them on individually, it also prints times with
> each message that's useful for performance tuning.

Yes, we should change all the DPRINTFs to trace_ calls. We will try to fix them in the next version. ;)
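Something like this, I guess (just a sketch; the trace point name is made up):

    /* trace-events */
    colo_info_load_request(int value) "colo requested %d"

    /* migration/colo-comm.c */
    #include "trace.h"
    ...
        if (value && !colo_requested) {
            trace_colo_info_load_request(value);    /* instead of DPRINTF() */
        }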

Thanks,
zhanghailiang

>
>> +static bool colo_requested;
>> +
>> +/* save */
>> +static void colo_info_save(QEMUFile *f, void *opaque)
>> +{
>> +    qemu_put_byte(f, migrate_enable_colo());
>> +}
>> +
>> +/* restore */
>> +static int colo_info_load(QEMUFile *f, void *opaque, int version_id)
>> +{
>> +    int value = qemu_get_byte(f);
>> +
>> +    if (value && !colo_requested) {
>> +        DPRINTF("COLO requested!\n");
>> +    }
>> +    colo_requested = value;
>> +
>> +    return 0;
>> +}
>> +
>> +static SaveVMHandlers savevm_colo_info_handlers = {
>> +    .save_state = colo_info_save,
>> +    .load_state = colo_info_load,
>> +};
>> +
>> +void colo_info_mig_init(void)
>> +{
>> +    register_savevm_live(NULL, "colo", -1, 1,
>> +                         &savevm_colo_info_handlers, NULL);
>> +}
>> diff --git a/vl.c b/vl.c
>> index 75ec292..9724992 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -90,6 +90,7 @@ int main(int argc, char **argv)
>>   #include "sysemu/dma.h"
>>   #include "audio/audio.h"
>>   #include "migration/migration.h"
>> +#include "migration/migration-colo.h"
>>   #include "sysemu/kvm.h"
>>   #include "qapi/qmp/qjson.h"
>>   #include "qemu/option.h"
>> @@ -4149,7 +4150,9 @@ int main(int argc, char **argv, char **envp)
>>
>>       blk_mig_init();
>>       ram_mig_init();
>> -
>> +#ifdef CONFIG_COLO
>> +    colo_info_mig_init();
>> +#endif
>>       /* If the currently selected machine wishes to override the units-per-bus
>>        * property of its default HBA interface type, do so now. */
>>       if (machine_class->units_per_default_bus) {
>> --
>> 1.7.12.4
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>


* Re: [Qemu-devel] [RFC PATCH v4 08/28] QEMUSizedBuffer: Introduce two help functions for qsb
  2015-05-15 11:56   ` Dr. David Alan Gilbert
@ 2015-05-18  5:10     ` zhanghailiang
  0 siblings, 0 replies; 51+ messages in thread
From: zhanghailiang @ 2015-05-18  5:10 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, amit.shah, Yang Hongyang, david

On 2015/5/15 19:56, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> Introduce two new QEMUSizedBuffer APIs which will be used by COLO to buffer
>> VM state:
>> One is qsb_put_buffer(), which put the content of a given QEMUSizedBuffer
>> into QEMUFile, this is used to send buffered VM state to secondary.
>> Another is qsb_fill_buffer(), read 'size' bytes of data from the file into
>> qsb, this is used to get VM state from socket into a buffer.
>>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>
> (I could use these in my postcopy world)

Please feel free to do that; maybe this patch can be merged in advance as a prerequisite patch ;)
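For reference, the intended usage on the two sides is roughly as below (sketch only,
error handling trimmed; COLO itself sends the size via colo_ctl_put() rather than
qemu_put_be64()):

    /* sender: announce the size, then stream the buffered state */
    size = qsb_get_length(qsb);
    qemu_put_be64(f, size);
    qsb_put_buffer(f, qsb, size);
    qemu_fflush(f);

    /* receiver: pull the state back into a qsb before loading it */
    qsb = qsb_create(NULL, size);
    if (qsb_fill_buffer(qsb, f, size) != size) {
        error_report("failed to read the buffered vmstate");
    }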

>
>> ---
>>   include/migration/qemu-file.h |  3 ++-
>>   migration/qemu-file-buf.c     | 58 +++++++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 60 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
>> index 745a850..09a0e2a 100644
>> --- a/include/migration/qemu-file.h
>> +++ b/include/migration/qemu-file.h
>> @@ -140,7 +140,8 @@ ssize_t qsb_get_buffer(const QEMUSizedBuffer *, off_t start, size_t count,
>>                          uint8_t *buf);
>>   ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t *buf,
>>                        off_t pos, size_t count);
>> -
>> +void qsb_put_buffer(QEMUFile *f, QEMUSizedBuffer *qsb, int size);
>> +int qsb_fill_buffer(QEMUSizedBuffer *qsb, QEMUFile *f, int size);
>>
>>   /*
>>    * For use on files opened with qemu_bufopen
>> diff --git a/migration/qemu-file-buf.c b/migration/qemu-file-buf.c
>> index 16a51a1..686f417 100644
>> --- a/migration/qemu-file-buf.c
>> +++ b/migration/qemu-file-buf.c
>> @@ -365,6 +365,64 @@ ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t *source,
>>       return count;
>>   }
>>
>> +
>> +/**
>> + * Put the content of a given QEMUSizedBuffer into QEMUFile.
>> + *
>> + * @f: A QEMUFile
>> + * @qsb: A QEMUSizedBuffer
>> + * @size: size of content to write
>> + */
>> +void qsb_put_buffer(QEMUFile *f, QEMUSizedBuffer *qsb, int size)
>> +{
>> +    int i, l;
>> +
>> +    for (i = 0; i < qsb->n_iov && size > 0; i++) {
>> +        l = MIN(qsb->iov[i].iov_len, size);
>> +        qemu_put_buffer(f, qsb->iov[i].iov_base, l);
>> +        size -= l;
>> +    }
>> +}
>> +
>> +/*
>> + * Read 'size' bytes of data from the file into qsb.
>> + * always fill from pos 0 and used after qsb_create().
>> + *
>> + * It will return size bytes unless there was an error, in which case it will
>> + * return as many as it managed to read (assuming blocking fd's which
>> + * all current QEMUFile are)
>> + */
>> +int qsb_fill_buffer(QEMUSizedBuffer *qsb, QEMUFile *f, int size)
>> +{
>> +    ssize_t rc = qsb_grow(qsb, size);
>> +    int pending = size, i;
>> +    qsb->used = 0;
>> +    uint8_t *buf = NULL;
>> +
>> +    if (rc < 0) {
>> +        return rc;
>> +    }
>> +
>> +    for (i = 0; i < qsb->n_iov && pending > 0; i++) {
>> +        int doneone = 0;
>> +        /* read until iov full */
>> +        while (doneone < qsb->iov[i].iov_len && pending > 0) {
>> +            int readone = 0;
>> +            buf = qsb->iov[i].iov_base;
>> +            readone = qemu_get_buffer(f, buf,
>> +                                MIN(qsb->iov[i].iov_len - doneone, pending));
>> +            if (readone == 0) {
>> +                return qsb->used;
>> +            }
>> +            buf += readone;
>> +            doneone += readone;
>> +            pending -= readone;
>> +            qsb->used += readone;
>> +        }
>> +    }
>> +    return qsb->used;
>> +}
>> +
>>   typedef struct QEMUBuffer {
>>       QEMUSizedBuffer *qsb;
>>       QEMUFile *file;
>> --
>> 1.7.12.4
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>


* Re: [Qemu-devel] [RFC PATCH v4 09/28] COLO: Save VM state to slave when do checkpoint
  2015-05-15 12:09   ` Dr. David Alan Gilbert
@ 2015-05-18  9:11     ` zhanghailiang
  2015-05-18 12:10       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 51+ messages in thread
From: zhanghailiang @ 2015-05-18  9:11 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, amit.shah, Lai Jiangshan,
	Yang Hongyang, david

On 2015/5/15 20:09, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> We should save PVM's RAM/device to slave when needed.
>>
>> For VM state, we  will cache them in slave, we use QEMUSizedBuffer
>> to store the data, we need know the data size of VM state, so in master,
>> we use qsb to store VM state temporarily, and then migrate the data to
>> slave.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> ---
>>   arch_init.c      | 22 ++++++++++++++++++--
>>   migration/colo.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++++----
>>   savevm.c         |  2 +-
>>   3 files changed, 79 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch_init.c b/arch_init.c
>> index fcfa328..e928e11 100644
>> --- a/arch_init.c
>> +++ b/arch_init.c
>> @@ -53,6 +53,7 @@
>>   #include "hw/acpi/acpi.h"
>>   #include "qemu/host-utils.h"
>>   #include "qemu/rcu_queue.h"
>> +#include "migration/migration-colo.h"
>>
>>   #ifdef DEBUG_ARCH_INIT
>>   #define DPRINTF(fmt, ...) \
>> @@ -845,6 +846,13 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>>       RAMBlock *block;
>>       int64_t ram_bitmap_pages; /* Size of bitmap in pages, including gaps */
>>
>> +    /*
>> +     * migration has already setup the bitmap, reuse it.
>> +     */
>> +    if (migrate_in_colo_state()) {
>> +        goto setup_part;
>> +    }
>> +
>
> This is a bit odd.   It would be easier if you just moved the init code
> inside this if, rather than goto'ing over it (or move the other code that
> you actually want into another function that then gets called from the bottom
> of here?)
> The thing that also makes it especially odd is that you goto over
> the rcu_read_lock and then have to fix it up; that's getting messy.
>

Yes, here we reuse ram_save_setup() in COLO's checkpoint process. The difference is that in
COLO's checkpoint process we don't have to initialize these global variables again, since they
have already been initialized during the initial migration. But we do have to resend the info of
ram_list.blocks (maybe that is not necessary either). I will split this function for now to make
it look cleaner.
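Maybe roughly like this (just a sketch; the helper name is made up and the details
are elided):

    static void ram_save_init_globals(void)
    {
        /* all the one-time initialization that COLO must not repeat */
        mig_throttle_on = false;
        dirty_rate_high_cnt = 0;
        bitmap_sync_count = 0;
        ...
        migration_bitmap_sync();
    }

    static int ram_save_setup(QEMUFile *f, void *opaque)
    {
        RAMBlock *block;

        if (!migrate_in_colo_state()) {
            ram_save_init_globals();
        }

        rcu_read_lock();    /* taken unconditionally, no fix-up needed */

        qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
        QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
            qemu_put_byte(f, strlen(block->idstr));
            qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr));
            ...
        }
        ...
    }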

> (The qemu style seems to be OK to use goto to jump to a shared error
> block at the end of a function but otherwise it should be rare).
>
>>       mig_throttle_on = false;
>>       dirty_rate_high_cnt = 0;
>>       bitmap_sync_count = 0;
>> @@ -901,9 +909,12 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>>       migration_bitmap_sync();
>>       qemu_mutex_unlock_ramlist();
>>       qemu_mutex_unlock_iothread();
>> -
>> +setup_part:
>>       qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
>>
>> +    if (migrate_in_colo_state()) {
>> +        rcu_read_lock();
>> +    }
>>       QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
>>           qemu_put_byte(f, strlen(block->idstr));
>>           qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr));
>> @@ -1007,7 +1018,14 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
>>       }
>>
>>       ram_control_after_iterate(f, RAM_CONTROL_FINISH);
>> -    migration_end();
>> +
>> +    /*
>> +     * Since we need to reuse dirty bitmap in colo,
>> +     * don't cleanup the bitmap.
>> +     */
>> +    if (!migrate_enable_colo() || migration_has_failed(migrate_get_current())) {
>> +        migration_end();
>> +    }
>>
>>       rcu_read_unlock();
>>       qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
>> diff --git a/migration/colo.c b/migration/colo.c
>> index 5a8ed1b..64e3f3a 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -60,6 +60,9 @@ enum {
>>
>>   static QEMUBH *colo_bh;
>>   static Coroutine *colo;
>> +/* colo buffer */
>> +#define COLO_BUFFER_BASE_SIZE (1000*1000*4ULL)
>
> Surely you want that as 4*1024*1024 ?  Anyway, now that qemu has
> migrate_set_parameter, it's probably best to wire these magic numbers
> to parameters that can be configured.
>

Er, actually this macro can be any value; it does not matter much, because the qsb will grow
automatically if its size is not enough. (Am I right?)
And I don't think this internally used value should be exported to the user. For now this size
includes the size of the dirty pages and the size of the device-related data. But maybe I can use
the new 'migrate_set_parameter' to implement the capability of 'colo_set_checkpoint_period'.
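Regarding the qsb growing automatically - e.g. something like the following is fine
even though the hint is far too small (sketch only, 'data' stands for whatever we
are buffering):

    QEMUSizedBuffer *qsb = qsb_create(NULL, 4 * 1024);    /* small initial hint */
    /* qsb_write_at()/qsb_grow() extend the buffer on demand,
     * so writing well past the hint still works: */
    qsb_write_at(qsb, data, 0, 1024 * 1024);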


Thanks,
zhanghailiang

>> +QEMUSizedBuffer *colo_buffer;
>>
>>   bool colo_supported(void)
>>   {
>> @@ -123,6 +126,8 @@ static int colo_ctl_get(QEMUFile *f, uint64_t require)
>>   static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control)
>>   {
>>       int ret;
>> +    size_t size;
>> +    QEMUFile *trans = NULL;
>>
>>       ret = colo_ctl_put(s->file, COLO_CHECKPOINT_NEW);
>>       if (ret < 0) {
>> @@ -133,16 +138,47 @@ static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control)
>>       if (ret < 0) {
>>           goto out;
>>       }
>> +    /* Reset colo buffer and open it for write */
>> +    qsb_set_length(colo_buffer, 0);
>> +    trans = qemu_bufopen("w", colo_buffer);
>> +    if (!trans) {
>> +        error_report("Open colo buffer for write failed");
>> +        goto out;
>> +    }
>> +
>> +    /* suspend and save vm state to colo buffer */
>> +    qemu_mutex_lock_iothread();
>> +    vm_stop_force_state(RUN_STATE_COLO);
>> +    qemu_mutex_unlock_iothread();
>> +    DPRINTF("vm is stoped\n");
>> +
>> +    /* Disable block migration */
>> +    s->params.blk = 0;
>> +    s->params.shared = 0;
>> +    qemu_savevm_state_begin(trans, &s->params);
>> +    qemu_mutex_lock_iothread();
>> +    qemu_savevm_state_complete(trans);
>> +    qemu_mutex_unlock_iothread();
>>
>> -    /* TODO: suspend and save vm state to colo buffer */
>> +    qemu_fflush(trans);
>>
>>       ret = colo_ctl_put(s->file, COLO_CHECKPOINT_SEND);
>>       if (ret < 0) {
>>           goto out;
>>       }
>> +    /* we send the total size of the vmstate first */
>> +    size = qsb_get_length(colo_buffer);
>> +    ret = colo_ctl_put(s->file, size);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>>
>> -    /* TODO: send vmstate to slave */
>> -
>> +    qsb_put_buffer(s->file, colo_buffer, size);
>> +    qemu_fflush(s->file);
>> +    ret = qemu_file_get_error(s->file);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>>       ret = colo_ctl_get(control, COLO_CHECKPOINT_RECEIVED);
>>       if (ret < 0) {
>>           goto out;
>> @@ -154,9 +190,18 @@ static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control)
>>       }
>>       DPRINTF("got COLO_CHECKPOINT_LOADED\n");
>>
>> -    /* TODO: resume master */
>> +    ret = 0;
>> +    /* resume master */
>> +    qemu_mutex_lock_iothread();
>> +    vm_start();
>> +    qemu_mutex_unlock_iothread();
>> +    DPRINTF("vm resume to run again\n");
>>
>>   out:
>> +    if (trans) {
>> +        qemu_fclose(trans);
>> +    }
>> +
>>       return ret;
>>   }
>>
>> @@ -182,6 +227,12 @@ static void *colo_thread(void *opaque)
>>       }
>>       DPRINTF("get COLO_READY\n");
>>
>> +    colo_buffer = qsb_create(NULL, COLO_BUFFER_BASE_SIZE);
>> +    if (colo_buffer == NULL) {
>> +        error_report("Failed to allocate colo buffer!");
>> +        goto out;
>> +    }
>> +
>>       qemu_mutex_lock_iothread();
>>       vm_start();
>>       qemu_mutex_unlock_iothread();
>> @@ -197,6 +248,9 @@ static void *colo_thread(void *opaque)
>>   out:
>>       migrate_set_state(s, MIGRATION_STATUS_COLO, MIGRATION_STATUS_COMPLETED);
>>
>> +    qsb_free(colo_buffer);
>> +    colo_buffer = NULL;
>> +
>>       if (colo_control) {
>>           qemu_fclose(colo_control);
>>       }
>> diff --git a/savevm.c b/savevm.c
>> index 3b0e222..cd7ec27 100644
>> --- a/savevm.c
>> +++ b/savevm.c
>> @@ -42,7 +42,7 @@
>>   #include "qemu/iov.h"
>>   #include "block/snapshot.h"
>>   #include "block/qapi.h"
>> -
>> +#include "migration/migration-colo.h"
>>
>>   #ifndef ETH_P_RARP
>>   #define ETH_P_RARP 0x8035
>> --
>> 1.7.12.4
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>


* Re: [Qemu-devel] [RFC PATCH v4 09/28] COLO: Save VM state to slave when do checkpoint
  2015-05-18  9:11     ` zhanghailiang
@ 2015-05-18 12:10       ` Dr. David Alan Gilbert
  2015-05-18 12:22         ` zhanghailiang
  0 siblings, 1 reply; 51+ messages in thread
From: Dr. David Alan Gilbert @ 2015-05-18 12:10 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, amit.shah, Lai Jiangshan,
	Yang Hongyang, david

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> On 2015/5/15 20:09, Dr. David Alan Gilbert wrote:
> >* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> >>We should save PVM's RAM/device to slave when needed.
> >>
> >>For VM state, we  will cache them in slave, we use QEMUSizedBuffer
> >>to store the data, we need know the data size of VM state, so in master,
> >>we use qsb to store VM state temporarily, and then migrate the data to
> >>slave.
> >>
> >>Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> >>Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> >>Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> >>Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> >>Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> >>---
> >>  arch_init.c      | 22 ++++++++++++++++++--
> >>  migration/colo.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++++----
> >>  savevm.c         |  2 +-
> >>  3 files changed, 79 insertions(+), 7 deletions(-)
> >>
> >>diff --git a/arch_init.c b/arch_init.c
> >>index fcfa328..e928e11 100644
> >>--- a/arch_init.c
> >>+++ b/arch_init.c
> >>@@ -53,6 +53,7 @@
> >>  #include "hw/acpi/acpi.h"
> >>  #include "qemu/host-utils.h"
> >>  #include "qemu/rcu_queue.h"
> >>+#include "migration/migration-colo.h"
> >>
> >>  #ifdef DEBUG_ARCH_INIT
> >>  #define DPRINTF(fmt, ...) \
> >>@@ -845,6 +846,13 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
> >>      RAMBlock *block;
> >>      int64_t ram_bitmap_pages; /* Size of bitmap in pages, including gaps */
> >>
> >>+    /*
> >>+     * migration has already setup the bitmap, reuse it.
> >>+     */
> >>+    if (migrate_in_colo_state()) {
> >>+        goto setup_part;
> >>+    }
> >>+
> >
> >This is a bit odd.   It would be easier if you just moved the init code
> >inside this if, rather than goto'ing over it (or move the other code that
> >you actually want into another function that then gets called from the bottom
> >of here?)
> >The thing that also makes it especially odd is that you goto over
> >the rcu_read_lock and then have to fix it up; that's getting messy.
> >
> 
> Yes, here we reuse ram_save_setup() in COLO's checkpoint process. The difference is that in
> COLO's checkpoint process we don't have to initialize these global variables again, since they
> have already been initialized during the initial migration. But we do have to resend the info of
> ram_list.blocks (maybe that is not necessary either). I will split this function for now to make
> it look cleaner.

Great.

> >(The qemu style seems to be OK to use goto to jump to a shared error
> >block at the end of a function but otherwise it should be rare).
> >
> >>      mig_throttle_on = false;
> >>      dirty_rate_high_cnt = 0;
> >>      bitmap_sync_count = 0;
> >>@@ -901,9 +909,12 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
> >>      migration_bitmap_sync();
> >>      qemu_mutex_unlock_ramlist();
> >>      qemu_mutex_unlock_iothread();
> >>-
> >>+setup_part:
> >>      qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
> >>
> >>+    if (migrate_in_colo_state()) {
> >>+        rcu_read_lock();
> >>+    }
> >>      QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
> >>          qemu_put_byte(f, strlen(block->idstr));
> >>          qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr));
> >>@@ -1007,7 +1018,14 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
> >>      }
> >>
> >>      ram_control_after_iterate(f, RAM_CONTROL_FINISH);
> >>-    migration_end();
> >>+
> >>+    /*
> >>+     * Since we need to reuse dirty bitmap in colo,
> >>+     * don't cleanup the bitmap.
> >>+     */
> >>+    if (!migrate_enable_colo() || migration_has_failed(migrate_get_current())) {
> >>+        migration_end();
> >>+    }
> >>
> >>      rcu_read_unlock();
> >>      qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
> >>diff --git a/migration/colo.c b/migration/colo.c
> >>index 5a8ed1b..64e3f3a 100644
> >>--- a/migration/colo.c
> >>+++ b/migration/colo.c
> >>@@ -60,6 +60,9 @@ enum {
> >>
> >>  static QEMUBH *colo_bh;
> >>  static Coroutine *colo;
> >>+/* colo buffer */
> >>+#define COLO_BUFFER_BASE_SIZE (1000*1000*4ULL)
> >
> >Surely you want that as 4*1024*1024 ?  Anyway, now that qemu has
> >migrate_set_parameter, it's probably best to wire these magic numbers
> >to parameters that can be configured.
> >
> 
> Er, actually this macro can be any value; it does not matter much, because the qsb will grow
> automatically if its size is not enough. (Am I right?)
> And I don't think this internally used value should be exported to the user. For now this size
> includes the size of the dirty pages and the size of the device-related data.

Yes, you're right; (However, I'd still use a power of 2 for a memory size, but maybe
that's just me)

> But maybe I can use
> the new 'migrate_set_parameter' to implement the capability of 'colo_set_checkpoint_period'.

Yes, it's good for that.
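Something along these lines, perhaps (names made up, just to show the shape of it):

    /* set from a migrate_set_parameter / colo-set-checkpoint-period command */
    static int64_t colo_checkpoint_period_ms = CHECKPOINT_MIN_PERIOD;

    void colo_set_checkpoint_period(int64_t ms)
    {
        colo_checkpoint_period_ms = MAX(ms, CHECKPOINT_MIN_PERIOD);
    }

    /* colo_thread() then compares against colo_checkpoint_period_ms
     * instead of a hard-coded constant */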

Dave

> 
> 
> Thanks,
> zhanghailiang
> 
> >>+QEMUSizedBuffer *colo_buffer;
> >>
> >>  bool colo_supported(void)
> >>  {
> >>@@ -123,6 +126,8 @@ static int colo_ctl_get(QEMUFile *f, uint64_t require)
> >>  static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control)
> >>  {
> >>      int ret;
> >>+    size_t size;
> >>+    QEMUFile *trans = NULL;
> >>
> >>      ret = colo_ctl_put(s->file, COLO_CHECKPOINT_NEW);
> >>      if (ret < 0) {
> >>@@ -133,16 +138,47 @@ static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control)
> >>      if (ret < 0) {
> >>          goto out;
> >>      }
> >>+    /* Reset colo buffer and open it for write */
> >>+    qsb_set_length(colo_buffer, 0);
> >>+    trans = qemu_bufopen("w", colo_buffer);
> >>+    if (!trans) {
> >>+        error_report("Open colo buffer for write failed");
> >>+        goto out;
> >>+    }
> >>+
> >>+    /* suspend and save vm state to colo buffer */
> >>+    qemu_mutex_lock_iothread();
> >>+    vm_stop_force_state(RUN_STATE_COLO);
> >>+    qemu_mutex_unlock_iothread();
> >>+    DPRINTF("vm is stoped\n");
> >>+
> >>+    /* Disable block migration */
> >>+    s->params.blk = 0;
> >>+    s->params.shared = 0;
> >>+    qemu_savevm_state_begin(trans, &s->params);
> >>+    qemu_mutex_lock_iothread();
> >>+    qemu_savevm_state_complete(trans);
> >>+    qemu_mutex_unlock_iothread();
> >>
> >>-    /* TODO: suspend and save vm state to colo buffer */
> >>+    qemu_fflush(trans);
> >>
> >>      ret = colo_ctl_put(s->file, COLO_CHECKPOINT_SEND);
> >>      if (ret < 0) {
> >>          goto out;
> >>      }
> >>+    /* we send the total size of the vmstate first */
> >>+    size = qsb_get_length(colo_buffer);
> >>+    ret = colo_ctl_put(s->file, size);
> >>+    if (ret < 0) {
> >>+        goto out;
> >>+    }
> >>
> >>-    /* TODO: send vmstate to slave */
> >>-
> >>+    qsb_put_buffer(s->file, colo_buffer, size);
> >>+    qemu_fflush(s->file);
> >>+    ret = qemu_file_get_error(s->file);
> >>+    if (ret < 0) {
> >>+        goto out;
> >>+    }
> >>      ret = colo_ctl_get(control, COLO_CHECKPOINT_RECEIVED);
> >>      if (ret < 0) {
> >>          goto out;
> >>@@ -154,9 +190,18 @@ static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control)
> >>      }
> >>      DPRINTF("got COLO_CHECKPOINT_LOADED\n");
> >>
> >>-    /* TODO: resume master */
> >>+    ret = 0;
> >>+    /* resume master */
> >>+    qemu_mutex_lock_iothread();
> >>+    vm_start();
> >>+    qemu_mutex_unlock_iothread();
> >>+    DPRINTF("vm resume to run again\n");
> >>
> >>  out:
> >>+    if (trans) {
> >>+        qemu_fclose(trans);
> >>+    }
> >>+
> >>      return ret;
> >>  }
> >>
> >>@@ -182,6 +227,12 @@ static void *colo_thread(void *opaque)
> >>      }
> >>      DPRINTF("get COLO_READY\n");
> >>
> >>+    colo_buffer = qsb_create(NULL, COLO_BUFFER_BASE_SIZE);
> >>+    if (colo_buffer == NULL) {
> >>+        error_report("Failed to allocate colo buffer!");
> >>+        goto out;
> >>+    }
> >>+
> >>      qemu_mutex_lock_iothread();
> >>      vm_start();
> >>      qemu_mutex_unlock_iothread();
> >>@@ -197,6 +248,9 @@ static void *colo_thread(void *opaque)
> >>  out:
> >>      migrate_set_state(s, MIGRATION_STATUS_COLO, MIGRATION_STATUS_COMPLETED);
> >>
> >>+    qsb_free(colo_buffer);
> >>+    colo_buffer = NULL;
> >>+
> >>      if (colo_control) {
> >>          qemu_fclose(colo_control);
> >>      }
> >>diff --git a/savevm.c b/savevm.c
> >>index 3b0e222..cd7ec27 100644
> >>--- a/savevm.c
> >>+++ b/savevm.c
> >>@@ -42,7 +42,7 @@
> >>  #include "qemu/iov.h"
> >>  #include "block/snapshot.h"
> >>  #include "block/qapi.h"
> >>-
> >>+#include "migration/migration-colo.h"
> >>
> >>  #ifndef ETH_P_RARP
> >>  #define ETH_P_RARP 0x8035
> >>--
> >>1.7.12.4
> >>
> >>
> >--
> >Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
> >.
> >
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC PATCH v4 09/28] COLO: Save VM state to slave when do checkpoint
  2015-05-18 12:10       ` Dr. David Alan Gilbert
@ 2015-05-18 12:22         ` zhanghailiang
  0 siblings, 0 replies; 51+ messages in thread
From: zhanghailiang @ 2015-05-18 12:22 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, amit.shah, Lai Jiangshan,
	Yang Hongyang, david

On 2015/5/18 20:10, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> On 2015/5/15 20:09, Dr. David Alan Gilbert wrote:
>>> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>>>> We should save PVM's RAM/device to slave when needed.
>>>>
>>>> For VM state, we  will cache them in slave, we use QEMUSizedBuffer
>>>> to store the data, we need know the data size of VM state, so in master,
>>>> we use qsb to store VM state temporarily, and then migrate the data to
>>>> slave.
>>>>
>>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>>>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>>>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>>>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>>>> ---
>>>>   arch_init.c      | 22 ++++++++++++++++++--
>>>>   migration/colo.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++++----
>>>>   savevm.c         |  2 +-
>>>>   3 files changed, 79 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/arch_init.c b/arch_init.c
>>>> index fcfa328..e928e11 100644
>>>> --- a/arch_init.c
>>>> +++ b/arch_init.c
>>>> @@ -53,6 +53,7 @@
>>>>   #include "hw/acpi/acpi.h"
>>>>   #include "qemu/host-utils.h"
>>>>   #include "qemu/rcu_queue.h"
>>>> +#include "migration/migration-colo.h"
>>>>
>>>>   #ifdef DEBUG_ARCH_INIT
>>>>   #define DPRINTF(fmt, ...) \
>>>> @@ -845,6 +846,13 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>>>>       RAMBlock *block;
>>>>       int64_t ram_bitmap_pages; /* Size of bitmap in pages, including gaps */
>>>>
>>>> +    /*
>>>> +     * migration has already setup the bitmap, reuse it.
>>>> +     */
>>>> +    if (migrate_in_colo_state()) {
>>>> +        goto setup_part;
>>>> +    }
>>>> +
>>>
>>> This is a bit odd.   It would be easier if you just moved the init code
>>> inside this if, rather than goto'ing over it (or move the other code that
>>> you actually want into another function that then gets called from the bottom
>>> of here?)
>>> The thing that also makes it especially odd is that you goto over
>>> the rcu_read_lock and then have to fix it up; that's getting messy.
>>>
>>
>> Yes, here we reuse ram_save_setup() in COLO's checkpoint process. The difference is that in
>> COLO's checkpoint process we don't have to initialize these global variables again, since they
>> have already been initialized during the initial migration. But we do have to resend the info of
>> ram_list.blocks (maybe that is not necessary either). I will split this function for now to make
>> it look cleaner.
>
> Great.
>
>>> (The qemu style seems to be OK to use goto to jump to a shared error
>>> block at the end of a function but otherwise it should be rare).
>>>
>>>>       mig_throttle_on = false;
>>>>       dirty_rate_high_cnt = 0;
>>>>       bitmap_sync_count = 0;
>>>> @@ -901,9 +909,12 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>>>>       migration_bitmap_sync();
>>>>       qemu_mutex_unlock_ramlist();
>>>>       qemu_mutex_unlock_iothread();
>>>> -
>>>> +setup_part:
>>>>       qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
>>>>
>>>> +    if (migrate_in_colo_state()) {
>>>> +        rcu_read_lock();
>>>> +    }
>>>>       QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
>>>>           qemu_put_byte(f, strlen(block->idstr));
>>>>           qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr));
>>>> @@ -1007,7 +1018,14 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
>>>>       }
>>>>
>>>>       ram_control_after_iterate(f, RAM_CONTROL_FINISH);
>>>> -    migration_end();
>>>> +
>>>> +    /*
>>>> +     * Since we need to reuse dirty bitmap in colo,
>>>> +     * don't cleanup the bitmap.
>>>> +     */
>>>> +    if (!migrate_enable_colo() || migration_has_failed(migrate_get_current())) {
>>>> +        migration_end();
>>>> +    }
>>>>
>>>>       rcu_read_unlock();
>>>>       qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
>>>> diff --git a/migration/colo.c b/migration/colo.c
>>>> index 5a8ed1b..64e3f3a 100644
>>>> --- a/migration/colo.c
>>>> +++ b/migration/colo.c
>>>> @@ -60,6 +60,9 @@ enum {
>>>>
>>>>   static QEMUBH *colo_bh;
>>>>   static Coroutine *colo;
>>>> +/* colo buffer */
>>>> +#define COLO_BUFFER_BASE_SIZE (1000*1000*4ULL)
>>>
>>> Surely you want that as 4*1024*1024 ?  Anyway, now that qemu has
>>> migrate_set_parameter, it's probably best to wire these magic numbers
>>> to parameters that can be configured.
>>>
>>
>> Er, actually this macro can be any value; it does not matter much, because the qsb will grow
>> automatically if its size is not enough. (Am I right?)
>> And I don't think this internally used value should be exported to the user. For now this size
>> includes the size of the dirty pages and the size of the device-related data.
>
> Yes, you're right; (However, I'd still use a power of 2 for a memory size, but maybe
> that's just me)
>

Er, I searched for this in the QEMU code, and your advice about using a power of 2 for the size is very reasonable. ;)
I will fix that, thanks!

>> But maybe I can use
>> the new 'migrate_set_parameter' to implement the capability of 'colo_set_checkpoint_period'.
>
> Yes, it's good for that.
>
> Dave
>
>>
>>
>> Thanks,
>> zhanghailiang
>>
>>>> +QEMUSizedBuffer *colo_buffer;
>>>>
>>>>   bool colo_supported(void)
>>>>   {
>>>> @@ -123,6 +126,8 @@ static int colo_ctl_get(QEMUFile *f, uint64_t require)
>>>>   static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control)
>>>>   {
>>>>       int ret;
>>>> +    size_t size;
>>>> +    QEMUFile *trans = NULL;
>>>>
>>>>       ret = colo_ctl_put(s->file, COLO_CHECKPOINT_NEW);
>>>>       if (ret < 0) {
>>>> @@ -133,16 +138,47 @@ static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control)
>>>>       if (ret < 0) {
>>>>           goto out;
>>>>       }
>>>> +    /* Reset colo buffer and open it for write */
>>>> +    qsb_set_length(colo_buffer, 0);
>>>> +    trans = qemu_bufopen("w", colo_buffer);
>>>> +    if (!trans) {
>>>> +        error_report("Open colo buffer for write failed");
>>>> +        goto out;
>>>> +    }
>>>> +
>>>> +    /* suspend and save vm state to colo buffer */
>>>> +    qemu_mutex_lock_iothread();
>>>> +    vm_stop_force_state(RUN_STATE_COLO);
>>>> +    qemu_mutex_unlock_iothread();
>>>> +    DPRINTF("vm is stoped\n");
>>>> +
>>>> +    /* Disable block migration */
>>>> +    s->params.blk = 0;
>>>> +    s->params.shared = 0;
>>>> +    qemu_savevm_state_begin(trans, &s->params);
>>>> +    qemu_mutex_lock_iothread();
>>>> +    qemu_savevm_state_complete(trans);
>>>> +    qemu_mutex_unlock_iothread();
>>>>
>>>> -    /* TODO: suspend and save vm state to colo buffer */
>>>> +    qemu_fflush(trans);
>>>>
>>>>       ret = colo_ctl_put(s->file, COLO_CHECKPOINT_SEND);
>>>>       if (ret < 0) {
>>>>           goto out;
>>>>       }
>>>> +    /* we send the total size of the vmstate first */
>>>> +    size = qsb_get_length(colo_buffer);
>>>> +    ret = colo_ctl_put(s->file, size);
>>>> +    if (ret < 0) {
>>>> +        goto out;
>>>> +    }
>>>>
>>>> -    /* TODO: send vmstate to slave */
>>>> -
>>>> +    qsb_put_buffer(s->file, colo_buffer, size);
>>>> +    qemu_fflush(s->file);
>>>> +    ret = qemu_file_get_error(s->file);
>>>> +    if (ret < 0) {
>>>> +        goto out;
>>>> +    }
>>>>       ret = colo_ctl_get(control, COLO_CHECKPOINT_RECEIVED);
>>>>       if (ret < 0) {
>>>>           goto out;
>>>> @@ -154,9 +190,18 @@ static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control)
>>>>       }
>>>>       DPRINTF("got COLO_CHECKPOINT_LOADED\n");
>>>>
>>>> -    /* TODO: resume master */
>>>> +    ret = 0;
>>>> +    /* resume master */
>>>> +    qemu_mutex_lock_iothread();
>>>> +    vm_start();
>>>> +    qemu_mutex_unlock_iothread();
>>>> +    DPRINTF("vm resume to run again\n");
>>>>
>>>>   out:
>>>> +    if (trans) {
>>>> +        qemu_fclose(trans);
>>>> +    }
>>>> +
>>>>       return ret;
>>>>   }
>>>>
>>>> @@ -182,6 +227,12 @@ static void *colo_thread(void *opaque)
>>>>       }
>>>>       DPRINTF("get COLO_READY\n");
>>>>
>>>> +    colo_buffer = qsb_create(NULL, COLO_BUFFER_BASE_SIZE);
>>>> +    if (colo_buffer == NULL) {
>>>> +        error_report("Failed to allocate colo buffer!");
>>>> +        goto out;
>>>> +    }
>>>> +
>>>>       qemu_mutex_lock_iothread();
>>>>       vm_start();
>>>>       qemu_mutex_unlock_iothread();
>>>> @@ -197,6 +248,9 @@ static void *colo_thread(void *opaque)
>>>>   out:
>>>>       migrate_set_state(s, MIGRATION_STATUS_COLO, MIGRATION_STATUS_COMPLETED);
>>>>
>>>> +    qsb_free(colo_buffer);
>>>> +    colo_buffer = NULL;
>>>> +
>>>>       if (colo_control) {
>>>>           qemu_fclose(colo_control);
>>>>       }
>>>> diff --git a/savevm.c b/savevm.c
>>>> index 3b0e222..cd7ec27 100644
>>>> --- a/savevm.c
>>>> +++ b/savevm.c
>>>> @@ -42,7 +42,7 @@
>>>>   #include "qemu/iov.h"
>>>>   #include "block/snapshot.h"
>>>>   #include "block/qapi.h"
>>>> -
>>>> +#include "migration/migration-colo.h"
>>>>
>>>>   #ifndef ETH_P_RARP
>>>>   #define ETH_P_RARP 0x8035
>>>> --
>>>> 1.7.12.4
>>>>
>>>>
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>
>>> .
>>>
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>


* Re: [Qemu-devel] [RFC PATCH v4 23/28] COLO: Improve checkpoint efficiency by do additional periodic checkpoint
  2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 23/28] COLO: Improve checkpoint efficiency by do additional periodic checkpoint zhanghailiang
@ 2015-05-18 16:48   ` Dr. David Alan Gilbert
  2015-05-19  6:08     ` zhanghailiang
  0 siblings, 1 reply; 51+ messages in thread
From: Dr. David Alan Gilbert @ 2015-05-18 16:48 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, qemu-devel,
	dgilbert, arei.gonglei, amit.shah, peter.huangpeng,
	Yang Hongyang, david

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> Besides normal checkpoint which according to the result of net packets
> comparing, We do additional checkpoint periodically, it will reduce the number
> of dirty pages when do one checkpoint, if we don't do checkpoint for a long
> time (This is a special case when the net packets is always consistent).
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> ---
>  migration/colo.c | 29 +++++++++++++++++++++--------
>  1 file changed, 21 insertions(+), 8 deletions(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index 9ef4554..da5bc5e 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -10,6 +10,7 @@
>   * later.  See the COPYING file in the top-level directory.
>   */
>  
> +#include "qemu/timer.h"
>  #include "sysemu/sysemu.h"
>  #include "migration/migration-colo.h"
>  #include "qemu/error-report.h"
> @@ -32,6 +33,13 @@
>  */
>  #define CHECKPOINT_MIN_PERIOD 100  /* unit: ms */
>  
> +/*
> + * force checkpoint timer: unit ms
> + * this is large because COLO checkpoint will mostly depend on
> + * COLO compare module.
> + */
> +#define CHECKPOINT_MAX_PEROID 10000
> +
>  enum {
>      COLO_READY = 0x46,
>  
> @@ -340,14 +348,7 @@ static void *colo_thread(void *opaque)
>          proxy_checkpoint_req = colo_proxy_compare();
>          if (proxy_checkpoint_req < 0) {
>              goto out;
> -        } else if (!proxy_checkpoint_req) {
> -            /*
> -             * No checkpoint is needed, wait for 1ms and then
> -             * check if we need checkpoint again
> -             */
> -            g_usleep(1000);
> -            continue;
> -        } else {
> +        } else if (proxy_checkpoint_req) {
>              int64_t interval;
>  
>              current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> @@ -357,8 +358,20 @@ static void *colo_thread(void *opaque)
>                  g_usleep((1000*(CHECKPOINT_MIN_PERIOD - interval)));
>              }
>              DPRINTF("Net packets is not consistent!!!\n");
> +            goto do_checkpoint;
> +        }
> +
> +        /*
> +         * No proxy checkpoint is request, wait for 100ms
> +         * and then check if we need checkpoint again.
> +         */
> +        current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> +        if (current_time - checkpoint_time < CHECKPOINT_MAX_PEROID) {
> +            g_usleep(100000);
> +            continue;

This 100ms sleep is interesting - can you explain its purpose; is it
just to save CPU time in the colo thread?  It used to be 1ms (above
and in the previous version).

The MIN_PERIOD already stops the checkpoints being too close together,
so this is a separate sleep from that.

Dave

>          }
>  
> +do_checkpoint:
>          /* start a colo checkpoint */
>          if (colo_do_checkpoint_transaction(s, colo_control)) {
>              goto out;
> -- 
> 1.7.12.4
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC PATCH v4 23/28] COLO: Improve checkpoint efficiency by do additional periodic checkpoint
  2015-05-18 16:48   ` Dr. David Alan Gilbert
@ 2015-05-19  6:08     ` zhanghailiang
  0 siblings, 0 replies; 51+ messages in thread
From: zhanghailiang @ 2015-05-19  6:08 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, amit.shah, Yang Hongyang, david

On 2015/5/19 0:48, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> Besides normal checkpoint which according to the result of net packets
>> comparing, We do additional checkpoint periodically, it will reduce the number
>> of dirty pages when do one checkpoint, if we don't do checkpoint for a long
>> time (This is a special case when the net packets is always consistent).
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> ---
>>   migration/colo.c | 29 +++++++++++++++++++++--------
>>   1 file changed, 21 insertions(+), 8 deletions(-)
>>
>> diff --git a/migration/colo.c b/migration/colo.c
>> index 9ef4554..da5bc5e 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -10,6 +10,7 @@
>>    * later.  See the COPYING file in the top-level directory.
>>    */
>>
>> +#include "qemu/timer.h"
>>   #include "sysemu/sysemu.h"
>>   #include "migration/migration-colo.h"
>>   #include "qemu/error-report.h"
>> @@ -32,6 +33,13 @@
>>   */
>>   #define CHECKPOINT_MIN_PERIOD 100  /* unit: ms */
>>
>> +/*
>> + * force checkpoint timer: unit ms
>> + * this is large because COLO checkpoint will mostly depend on
>> + * COLO compare module.
>> + */
>> +#define CHECKPOINT_MAX_PEROID 10000
>> +
>>   enum {
>>       COLO_READY = 0x46,
>>
>> @@ -340,14 +348,7 @@ static void *colo_thread(void *opaque)
>>           proxy_checkpoint_req = colo_proxy_compare();
>>           if (proxy_checkpoint_req < 0) {
>>               goto out;
>> -        } else if (!proxy_checkpoint_req) {
>> -            /*
>> -             * No checkpoint is needed, wait for 1ms and then
>> -             * check if we need checkpoint again
>> -             */
>> -            g_usleep(1000);
>> -            continue;
>> -        } else {
>> +        } else if (proxy_checkpoint_req) {
>>               int64_t interval;
>>
>>               current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>> @@ -357,8 +358,20 @@ static void *colo_thread(void *opaque)
>>                   g_usleep((1000*(CHECKPOINT_MIN_PERIOD - interval)));
>>               }
>>               DPRINTF("Net packets is not consistent!!!\n");
>> +            goto do_checkpoint;
>> +        }
>> +
>> +        /*
>> +         * No proxy checkpoint is request, wait for 100ms
>> +         * and then check if we need checkpoint again.
>> +         */
>> +        current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>> +        if (current_time - checkpoint_time < CHECKPOINT_MAX_PEROID) {
>> +            g_usleep(100000);
>> +            continue;
>
> This 100ms sleep is interesting - can you explain its purpose; is it
> just to save CPU time in the colo thread?  It used to be 1ms (above
> and in the previous version).
>

You are right, it is just to save CPU time. I'm not sure, but isn't '1ms' a little too short?
In our latest patch, we will actually send some dirty pages to the slave side during this sleep
time if there are any dirty pages.
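To be clear about the timing, what the loop does now is roughly this (condensed
sketch; the checkpoint_time bookkeeping after the transaction is assumed):

    while (true) {
        proxy_checkpoint_req = colo_proxy_compare();
        if (proxy_checkpoint_req < 0) {
            break;                                  /* error -> leave COLO */
        } else if (proxy_checkpoint_req) {
            /* packets diverged: checkpoint, but no sooner than
             * CHECKPOINT_MIN_PERIOD (100ms) after the previous one */
            interval = qemu_clock_get_ms(QEMU_CLOCK_HOST) - checkpoint_time;
            if (interval < CHECKPOINT_MIN_PERIOD) {
                g_usleep(1000 * (CHECKPOINT_MIN_PERIOD - interval));
            }
        } else if (qemu_clock_get_ms(QEMU_CLOCK_HOST) - checkpoint_time
                   < CHECKPOINT_MAX_PEROID) {
            /* nothing to do yet: this is the 100ms poll/sleep, which we
             * also plan to use for sending dirty pages to the slave */
            g_usleep(100 * 1000);
            continue;
        }
        /* either the proxy asked for a checkpoint or 10s have passed */
        if (colo_do_checkpoint_transaction(s, colo_control)) {
            break;
        }
        checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
    }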

> The MIN_PERIOD already stops the checkpoints being too close together,
> so this is a separate sleep from that.
>>           }
>>
>> +do_checkpoint:
>>           /* start a colo checkpoint */
>>           if (colo_do_checkpoint_transaction(s, colo_control)) {
>>               goto out;
>> --
>> 1.7.12.4
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>


end of thread

Thread overview: 51+ messages
2015-03-26  5:29 [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 01/28] configure: Add parameter for configure to enable/disable COLO support zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 02/28] migration: Introduce capability 'colo' to migration zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 03/28] COLO: migrate colo related info to slave zhanghailiang
2015-05-15 11:38   ` Dr. David Alan Gilbert
2015-05-18  5:04     ` zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 04/28] migration: Integrate COLO checkpoint process into migration zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 05/28] migration: Integrate COLO checkpoint process into loadvm zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 06/28] COLO: Implement colo checkpoint protocol zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 07/28] COLO: Add a new RunState RUN_STATE_COLO zhanghailiang
2015-05-15 11:28   ` Dr. David Alan Gilbert
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 08/28] QEMUSizedBuffer: Introduce two help functions for qsb zhanghailiang
2015-05-15 11:56   ` Dr. David Alan Gilbert
2015-05-18  5:10     ` zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 09/28] COLO: Save VM state to slave when do checkpoint zhanghailiang
2015-05-15 12:09   ` Dr. David Alan Gilbert
2015-05-18  9:11     ` zhanghailiang
2015-05-18 12:10       ` Dr. David Alan Gilbert
2015-05-18 12:22         ` zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 10/28] COLO RAM: Load PVM's dirty page into SVM's RAM cache temporarily zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 11/28] COLO VMstate: Load VM state into qsb before restore it zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 12/28] arch_init: Start to trace dirty pages of SVM zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 13/28] COLO RAM: Flush cached RAM into SVM's memory zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 14/28] COLO failover: Introduce a new command to trigger a failover zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 15/28] COLO failover: Implement COLO master/slave failover work zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 16/28] COLO failover: Don't do failover during loading VM's state zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 17/28] COLO: Add new command parameter 'colo_nicname' 'colo_script' for net zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 18/28] COLO NIC: Init/remove colo nic devices when add/cleanup tap devices zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 19/28] COLO NIC: Implement colo nic device interface configure() zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 20/28] COLO NIC : Implement colo nic init/destroy function zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 21/28] COLO NIC: Some init work related with proxy module zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 22/28] COLO: Do checkpoint according to the result of net packets comparing zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 23/28] COLO: Improve checkpoint efficiency by do additional periodic checkpoint zhanghailiang
2015-05-18 16:48   ` Dr. David Alan Gilbert
2015-05-19  6:08     ` zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 24/28] COLO: Add colo-set-checkpoint-period command zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 25/28] COLO NIC: Implement NIC checkpoint and failover zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 26/28] COLO: Disable qdev hotplug when VM is in COLO mode zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 27/28] COLO: Implement shutdown checkpoint zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 28/28] COLO: Add block replication into colo process zhanghailiang
2015-04-08  8:16 ` [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
2015-04-22 11:18   ` Dr. David Alan Gilbert
2015-04-24  7:25     ` Wen Congyang
2015-04-24  8:35       ` Dr. David Alan Gilbert
2015-04-28 10:51         ` zhanghailiang
2015-05-06 17:11           ` Dr. David Alan Gilbert
2015-04-24  8:52     ` zhanghailiang
2015-04-24  8:56       ` Dr. David Alan Gilbert
2015-05-14 12:14 ` Dr. David Alan Gilbert
2015-05-14 12:58   ` zhanghailiang
2015-05-14 16:09     ` Dr. David Alan Gilbert
