* [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
@ 2015-12-15  8:22 zhanghailiang
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 01/38] configure: Add parameter for configure to enable/disable COLO support zhanghailiang
                   ` (38 more replies)
  0 siblings, 39 replies; 94+ messages in thread
From: zhanghailiang @ 2015-12-15  8:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

This is the 12th version of COLO.

As usual, this version of COLO only supports periodic checkpoints,
just like MicroCheckpointing and Remus do.

This series contains only the COLO frame part; you can get the whole code from GitHub:
https://github.com/coloft/qemu/commits/colo-v2.3-periodic-mode

Test procedure:
1. Start up QEMU
Primary side:
#x86_64-softmmu/qemu-system-x86_64 -enable-kvm -boot c -m 2048 -smp 2 -qmp stdio -vnc :7 -name primary -cpu qemu64,+kvmclock -device piix3-usb-uhci -device usb-tablet -netdev tap,id=hn0,vhost=off -device virtio-net-pci,id=net-pci0,netdev=hn0 -drive if=virtio,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0.file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,children.0.driver=raw
Secondary side:
#x86_64-softmmu/qemu-system-x86_64 -boot c -m 2048 -smp 2 -qmp stdio -vnc :7 -name secondary -enable-kvm -cpu qemu64,+kvmclock -device piix3-usb-uhci -device usb-tablet -netdev tap,id=hn0,vhost=off -device virtio-net-pci,id=net-pci0,netdev=hn0 -drive if=none,id=colo-disk0,file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,driver=raw,node-name=node0 -drive if=virtio,id=active-disk0,throttling.bps-total=70000000,driver=replication,mode=secondary,file.driver=qcow2,file.file.filename=/mnt/ramfs/active_disk.img,file.backing.driver=qcow2,file.backing.file.filename=/mnt/ramfs/hidden_disk.img,file.backing.backing=colo-disk0 -incoming tcp:0:8888
2. On the Secondary VM's QEMU monitor, issue the following commands:
{'execute':'qmp_capabilities'}
{'execute': 'nbd-server-start', 'arguments': {'addr': {'type': 'inet', 'data': {'host': '192.168.2.88', 'port': '8889'} } } }
{'execute': 'nbd-server-add', 'arguments': {'device': 'colo-disk0', 'writable': true } }
{'execute': 'trace-event-set-state', 'arguments': {'name': 'colo*', 'enable': true} }

3. On the Primary VM's QEMU monitor, issue the following commands:
{'execute':'qmp_capabilities'}
{'execute': 'human-monitor-command', 'arguments': {'command-line': 'drive_add buddy driver=replication,mode=primary,file.driver=nbd,file.host=9.61.1.7,file.port=8889,file.export=colo-disk0,node-name=node0,if=none'}}
{'execute':'x-blockdev-change', 'arguments':{'parent': 'colo-disk0', 'node': 'node0' } }
{'execute': 'migrate-set-capabilities', 'arguments': {'capabilities': [ {'capability': 'x-colo', 'state': true } ] } }
{'execute': 'migrate', 'arguments': {'uri': 'tcp:192.168.2.88:8888' } }

4. After the above steps, you will see that whenever you make changes to the PVM, the SVM is synced.
You can change the checkpoint period by issuing the command
'{ "execute": "migrate-set-parameters" , "arguments":{ "x-checkpoint-delay": 2000 } }'.

5. Failover test
You can kill the Primary VM and run 'x_colo_lost_heartbeat' in the Secondary VM's
monitor at the same time; the SVM will then fail over and the client will not
notice the change.

Before issuing the '{ "execute": "x-colo-lost-heartbeat" }' command, we have to
issue block-related commands to stop block replication.
Primary:
  Remove the nbd child from the quorum:
  { 'execute': 'x-blockdev-change', 'arguments': {'parent': 'colo-disk0', 'child': 'children.1'}}
  Note: there is currently no QMP command to remove the blockdev.

Secondary:
  The primary host is down, so we should run the following command:
  { 'execute': 'nbd-server-stop' }

Please review, thanks.

TODO:
1. Implement the packet comparison module (proxy) in QEMU (in progress)
2. Checkpoint based on proxy in qemu
3. The capability of continuous FT

v12:
 - Fix the bug where the default buffer filter broke vhost-net.
 - Add a flag in struct NetFilterState to help skip the default
   filter for packets travelling through the filter layer.
 - Remove the default failover treatment, which may cause split-brain.
 - Rename checkpoint-delay to x-checkpoint-delay.
 - Check whether all netdevs support the default filter before going into COLO.
 - Reconstruct send/receive helper functions in patch 10.
 - Address several other comments from Dave.

v11:
 - Re-implement buffer/release packets based on filter-buffer according
   to Jason Wang's suggestion. (patch 34, patch 36 ~ patch 38)
 - Rebase onto master to re-use some stuff introduced by post-copy.
 - Address several comments from Eric and Dave; the record of fixes can
   be found in each patch.

v10:
 - Rename 'colo_lost_heartbeat' command to experimental 'x_colo_lost_heartbeat'
 - Rename migration capability 'colo' to 'x-colo' (Eric's suggestion)
 - Simplify the process of primary side by dropping colo thread and reusing
   migration thread. (Dave's suggestion)
 - Add several netfilter related APIs to support buffer/release packets
   for COLO (patch 32 ~ patch 36)

zhanghailiang (38):
  configure: Add parameter for configure to enable/disable COLO support
  migration: Introduce capability 'x-colo' to migration
  COLO: migrate colo related info to secondary node
  migration: Export migrate_set_state()
  migration: Add state records for migration incoming
  migration: Integrate COLO checkpoint process into migration
  migration: Integrate COLO checkpoint process into loadvm
  migration: Rename the'file' member of MigrationState
  COLO/migration: Create a new communication path from destination to
    source
  COLO: Implement colo checkpoint protocol
  COLO: Add a new RunState RUN_STATE_COLO
  QEMUSizedBuffer: Introduce two help functions for qsb
  COLO: Save PVM state to secondary side when do checkpoint
  ram: Split host_from_stream_offset() into two helper functions
  COLO: Load PVM's dirty pages into SVM's RAM cache temporarily
  ram/COLO: Record the dirty pages that SVM received
  COLO: Load VMState into qsb before restore it
  COLO: Flush PVM's cached RAM into SVM's memory
  COLO: Add checkpoint-delay parameter for migrate-set-parameters
  COLO: synchronize PVM's state to SVM periodically
  COLO failover: Introduce a new command to trigger a failover
  COLO failover: Introduce state to record failover process
  COLO: Implement failover work for Primary VM
  COLO: Implement failover work for Secondary VM
  qmp event: Add event notification for COLO error
  COLO failover: Shutdown related socket fd when do failover
  COLO failover: Don't do failover during loading VM's state
  COLO: Process shutdown command for VM in COLO state
  COLO: Update the global runstate after going into colo state
  savevm: Split load vm state function qemu_loadvm_state
  COLO: Separate the process of saving/loading ram and device state
  COLO: Split qemu_savevm_state_begin out of checkpoint process
  net/filter-buffer: Add default filter-buffer for each netdev
  filter-buffer: Accept zero interval
  filter-buffer: Introduce a helper function to enable/disable default
    filter
  filter-buffer: Introduce a helper function to release packets
  colo: Use default buffer-filter to buffer and release packets
  COLO: Add block replication into colo process

 configure                     |  11 +
 docs/qmp-events.txt           |  17 +
 hmp-commands.hx               |  15 +
 hmp.c                         |  15 +
 hmp.h                         |   1 +
 include/exec/ram_addr.h       |   9 +-
 include/migration/colo.h      |  38 +++
 include/migration/failover.h  |  33 ++
 include/migration/migration.h |  18 +-
 include/migration/qemu-file.h |   3 +-
 include/net/filter.h          |  12 +
 include/net/net.h             |   5 +
 include/sysemu/sysemu.h       |   9 +
 migration/Makefile.objs       |   2 +
 migration/colo-comm.c         |  71 ++++
 migration/colo-failover.c     |  83 +++++
 migration/colo.c              | 765 ++++++++++++++++++++++++++++++++++++++++++
 migration/exec.c              |   4 +-
 migration/fd.c                |   4 +-
 migration/migration.c         | 216 ++++++++----
 migration/postcopy-ram.c      |   6 +-
 migration/qemu-file-buf.c     |  61 ++++
 migration/ram.c               | 213 ++++++++++--
 migration/rdma.c              |   2 +-
 migration/savevm.c            | 295 ++++++++++++----
 migration/tcp.c               |   4 +-
 migration/unix.c              |   4 +-
 net/filter-buffer.c           | 127 ++++++-
 net/filter.c                  |   6 +-
 net/net.c                     |  58 ++++
 qapi-schema.json              | 106 +++++-
 qapi/event.json               |  17 +
 qmp-commands.hx               |  24 +-
 stubs/Makefile.objs           |   1 +
 stubs/migration-colo.c        |  45 +++
 trace-events                  |  10 +
 vl.c                          |  37 +-
 37 files changed, 2152 insertions(+), 195 deletions(-)
 create mode 100644 include/migration/colo.h
 create mode 100644 include/migration/failover.h
 create mode 100644 migration/colo-comm.c
 create mode 100644 migration/colo-failover.c
 create mode 100644 migration/colo.c
 create mode 100644 stubs/migration-colo.c

-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v12 01/38] configure: Add parameter for configure to enable/disable COLO support
  2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
@ 2015-12-15  8:22 ` zhanghailiang
  2015-12-15  9:46   ` Wen Congyang
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 02/38] migration: Introduce capability 'x-colo' to migration zhanghailiang
                   ` (37 subsequent siblings)
  38 siblings, 1 reply; 94+ messages in thread
From: zhanghailiang @ 2015-12-15  8:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

Add the configure options --enable-colo/--disable-colo to switch COLO
support on or off.
COLO support is on by default.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v11:
- Turn COLO on by default (Eric's suggestion)

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 configure | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/configure b/configure
index b9552fd..32e466f 100755
--- a/configure
+++ b/configure
@@ -260,6 +260,7 @@ xfs=""
 vhost_net="no"
 vhost_scsi="no"
 kvm="no"
+colo="yes"
 rdma=""
 gprof="no"
 debug_tcg="no"
@@ -939,6 +940,10 @@ for opt do
   ;;
   --enable-kvm) kvm="yes"
   ;;
+  --disable-colo) colo="no"
+  ;;
+  --enable-colo) colo="yes"
+  ;;
   --disable-tcg-interpreter) tcg_interpreter="no"
   ;;
   --enable-tcg-interpreter) tcg_interpreter="yes"
@@ -1362,6 +1367,7 @@ disabled with --disable-FEATURE, default is enabled if available:
   fdt             fdt device tree
   bluez           bluez stack connectivity
   kvm             KVM acceleration support
+  colo            COarse-grain LOck-stepping VM for Non-stop Service
   rdma            RDMA-based migration support
   uuid            uuid support
   vde             support for vde network
@@ -4792,6 +4798,7 @@ echo "Linux AIO support $linux_aio"
 echo "ATTR/XATTR support $attr"
 echo "Install blobs     $blobs"
 echo "KVM support       $kvm"
+echo "COLO support      $colo"
 echo "RDMA support      $rdma"
 echo "TCG interpreter   $tcg_interpreter"
 echo "fdt support       $fdt"
@@ -5381,6 +5388,10 @@ if have_backend "ftrace"; then
 fi
 echo "CONFIG_TRACE_FILE=$trace_file" >> $config_host_mak
 
+if test "$colo" = "yes"; then
+  echo "CONFIG_COLO=y" >> $config_host_mak
+fi
+
 if test "$rdma" = "yes" ; then
   echo "CONFIG_RDMA=y" >> $config_host_mak
 fi
-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v12 02/38] migration: Introduce capability 'x-colo' to migration
  2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 01/38] configure: Add parameter for configure to enable/disable COLO support zhanghailiang
@ 2015-12-15  8:22 ` zhanghailiang
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 03/38] COLO: migrate colo related info to secondary node zhanghailiang
                   ` (36 subsequent siblings)
  38 siblings, 0 replies; 94+ messages in thread
From: zhanghailiang @ 2015-12-15  8:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, Markus Armbruster, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, hongyang.yang

Add a helper function colo_supported() that indicates whether COLO is
supported. We use it to control whether the 'x-colo' string is shown to
users, who can run the QMP command 'query-migrate-capabilities' or the
HMP command 'info migrate_capabilities' to learn whether COLO is
supported.
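
As a rough illustration of the filtering described above, here is a minimal,
self-contained sketch. It is not the QEMU code: the enum, the name table, the
colo_built_in variable and list_capabilities() are hypothetical stand-ins. In
the patch itself, colo_supported() is selected at link time (migration/colo.c
versus the stub), whereas the sketch uses a plain variable to make both cases
easy to exercise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the MigrationCapability enum. */
enum { CAP_XBZRLE, CAP_X_POSTCOPY_RAM, CAP_X_COLO, CAP_MAX };

const char *const cap_names[CAP_MAX] = {
    "xbzrle", "x-postcopy-ram", "x-colo",
};

/*
 * Assumption: a variable stands in for the link-time choice between
 * the real colo_supported() and the stub version.
 */
bool colo_built_in = true;

bool colo_supported(void)
{
    return colo_built_in;
}

/*
 * Fill 'out' with the names of the capabilities that should be
 * visible to the user and return how many were written.
 */
size_t list_capabilities(const char *out[], size_t out_len)
{
    size_t n = 0;

    for (int i = 0; i < CAP_MAX; i++) {
        if (i == CAP_X_COLO && !colo_supported()) {
            continue;   /* hide 'x-colo' when COLO is compiled out */
        }
        assert(n < out_len);
        out[n++] = cap_names[i];
    }
    return n;
}
```

With colo_built_in set to false, the returned list simply omits "x-colo", which
is the behaviour a user would observe through 'query-migrate-capabilities' on a
build configured with --disable-colo.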

Cc: Juan Quintela <quintela@redhat.com>
Cc: Amit Shah <amit.shah@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
v10:
- Rename capability 'colo' to experimental 'x-colo' (Eric's suggestion).
- Rename migrate_enable_colo() to migrate_colo_enabled() (Eric's suggestion).

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 include/migration/colo.h      | 20 ++++++++++++++++++++
 include/migration/migration.h |  1 +
 migration/Makefile.objs       |  1 +
 migration/colo.c              | 18 ++++++++++++++++++
 migration/migration.c         | 17 +++++++++++++++++
 qapi-schema.json              |  6 +++++-
 qmp-commands.hx               |  1 +
 stubs/Makefile.objs           |  1 +
 stubs/migration-colo.c        | 18 ++++++++++++++++++
 9 files changed, 82 insertions(+), 1 deletion(-)
 create mode 100644 include/migration/colo.h
 create mode 100644 migration/colo.c
 create mode 100644 stubs/migration-colo.c

diff --git a/include/migration/colo.h b/include/migration/colo.h
new file mode 100644
index 0000000..c60a590
--- /dev/null
+++ b/include/migration/colo.h
@@ -0,0 +1,20 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_COLO_H
+#define QEMU_COLO_H
+
+#include "qemu-common.h"
+
+bool colo_supported(void);
+
+#endif
diff --git a/include/migration/migration.h b/include/migration/migration.h
index fd018b7..1f004e4 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -268,6 +268,7 @@ int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen);
 
 int migrate_use_xbzrle(void);
 int64_t migrate_xbzrle_cache_size(void);
+bool migrate_colo_enabled(void);
 
 int64_t xbzrle_cache_resize(int64_t new_size);
 
diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index 0cac6d7..65ecc35 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -1,4 +1,5 @@
 common-obj-y += migration.o tcp.o
+common-obj-$(CONFIG_COLO) += colo.o
 common-obj-y += vmstate.o
 common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o qemu-file-stdio.o
 common-obj-y += xbzrle.o postcopy-ram.o
diff --git a/migration/colo.c b/migration/colo.c
new file mode 100644
index 0000000..2c40d2e
--- /dev/null
+++ b/migration/colo.c
@@ -0,0 +1,18 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "migration/colo.h"
+
+bool colo_supported(void)
+{
+    return true;
+}
diff --git a/migration/migration.c b/migration/migration.c
index adc6b6f..0d525ee 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -33,6 +33,7 @@
 #include "qom/cpu.h"
 #include "exec/memory.h"
 #include "exec/address-spaces.h"
+#include "migration/colo.h"
 
 #define MAX_THROTTLE  (32 << 20)      /* Migration transfer speed throttling */
 
@@ -480,6 +481,9 @@ MigrationCapabilityStatusList *qmp_query_migrate_capabilities(Error **errp)
 
     caps = NULL; /* silence compiler warning */
     for (i = 0; i < MIGRATION_CAPABILITY_MAX; i++) {
+        if (i == MIGRATION_CAPABILITY_X_COLO && !colo_supported()) {
+            continue;
+        }
         if (head == NULL) {
             head = g_malloc0(sizeof(*caps));
             caps = head;
@@ -679,6 +683,13 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
     }
 
     for (cap = params; cap; cap = cap->next) {
+        if (cap->value->capability == MIGRATION_CAPABILITY_X_COLO &&
+            !colo_supported()) {
+            error_setg(errp, "COLO is not currently supported, please"
+                             " configure with --enable-colo option in order to"
+                             " support COLO feature");
+            continue;
+        }
         s->enabled_capabilities[cap->value->capability] = cap->value->state;
     }
 
@@ -1581,6 +1592,12 @@ fail:
     migrate_set_state(s, current_active_state, MIGRATION_STATUS_FAILED);
 }
 
+bool migrate_colo_enabled(void)
+{
+    MigrationState *s = migrate_get_current();
+    return s->enabled_capabilities[MIGRATION_CAPABILITY_X_COLO];
+}
+
 /*
  * Master migration thread on the source VM.
  * It drives the migration and pumps the data down the outgoing channel.
diff --git a/qapi-schema.json b/qapi-schema.json
index 8b1a423..d20c0ec 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -546,11 +546,15 @@
 #          been migrated, pulling the remaining pages along as needed. NOTE: If
 #          the migration fails during postcopy the VM will fail.  (since 2.5)
 #
+# @x-colo: If enabled, migration will never end, and the state of the VM on the
+#        primary side will be migrated continuously to the VM on secondary
+#        side. (since 2.6)
+#
 # Since: 1.2
 ##
 { 'enum': 'MigrationCapability',
   'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
-           'compress', 'events', 'x-postcopy-ram'] }
+           'compress', 'events', 'x-postcopy-ram', 'x-colo'] }
 
 ##
 # @MigrationCapabilityStatus
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 517cdf1..91979b4 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -3625,6 +3625,7 @@ Query current migration capabilities
          - "rdma-pin-all" : RDMA Pin Page state (json-bool)
          - "auto-converge" : Auto Converge state (json-bool)
          - "zero-blocks" : Zero Blocks state (json-bool)
+         - "x-colo" : COarse-Grain LOck Stepping for Non-stop Service (json-bool)
 
 Arguments:
 
diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index d7898a0..12eb4c6 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -38,3 +38,4 @@ stub-obj-y += qmp_pc_dimm_device_list.o
 stub-obj-y += target-monitor-defs.o
 stub-obj-y += target-get-monitor-def.o
 stub-obj-y += vhost.o
+stub-obj-y += migration-colo.o
diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
new file mode 100644
index 0000000..3d817df
--- /dev/null
+++ b/stubs/migration-colo.c
@@ -0,0 +1,18 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "migration/colo.h"
+
+bool colo_supported(void)
+{
+    return false;
+}
-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v12 03/38] COLO: migrate colo related info to secondary node
  2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 01/38] configure: Add parameter for configure to enable/disable COLO support zhanghailiang
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 02/38] migration: Introduce capability 'x-colo' to migration zhanghailiang
@ 2015-12-15  8:22 ` zhanghailiang
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 04/38] migration: Export migrate_set_state() zhanghailiang
                   ` (35 subsequent siblings)
  38 siblings, 0 replies; 94+ messages in thread
From: zhanghailiang @ 2015-12-15  8:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

The destination VM can tell whether it should go into COLO mode by
referring to the info migrated from the PVM.

We skip this section if COLO is not enabled (i.e.
migrate_set_capability x-colo off), so it does not break migration
compatibility regardless of whether the source and destination were
configured with --enable-colo or --disable-colo.
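
The "skip the section when it is not needed" idea can be sketched in isolation
as follows. This is a simplified model under stated assumptions, not the QEMU
vmstate machinery: SectionDesc, save_section() and colo_capability_on are
hypothetical, and the "stream" is just a byte buffer. Only the names
colo_info_need, colo_info_pre_save, COLOInfo and "COLOState" mirror the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, much simplified stand-in for a VMStateDescription. */
typedef struct {
    const char *name;
    bool (*needed)(void *opaque);    /* NULL means "always send" */
    void (*pre_save)(void *opaque);  /* refresh fields before saving */
} SectionDesc;

typedef struct {
    bool colo_requested;
} COLOInfo;

/* Assumption: stands in for the 'x-colo' capability on the source. */
bool colo_capability_on;

bool colo_info_need(void *opaque)
{
    (void)opaque;
    return colo_capability_on;
}

void colo_info_pre_save(void *opaque)
{
    COLOInfo *s = opaque;

    s->colo_requested = colo_capability_on;
}

const SectionDesc colo_section = {
    .name = "COLOState",
    .needed = colo_info_need,
    .pre_save = colo_info_pre_save,
};

/*
 * Write the section into 'stream' only if it is needed. Returns the
 * number of bytes written, which is 0 when the section is skipped.
 */
size_t save_section(const SectionDesc *d, COLOInfo *info,
                    uint8_t *stream, size_t len)
{
    if (d->needed && !d->needed(info)) {
        return 0;   /* stream is identical to a non-COLO migration */
    }
    if (d->pre_save) {
        d->pre_save(info);
    }
    assert(len >= 1);
    stream[0] = info->colo_requested;
    return 1;
}
```

Because nothing is written when the capability is off, a destination built
without COLO never sees a section it does not understand, which is the
compatibility property the commit message relies on.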

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v11:
- Add Reviewed-by tag
v10:
- Use VMSTATE_BOOL instead of VMSTATE_UINT32 for 'colo_requested' (Dave's suggestion).

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 include/migration/colo.h |  2 ++
 migration/Makefile.objs  |  1 +
 migration/colo-comm.c    | 50 ++++++++++++++++++++++++++++++++++++++++++++++++
 vl.c                     |  3 ++-
 4 files changed, 55 insertions(+), 1 deletion(-)
 create mode 100644 migration/colo-comm.c

diff --git a/include/migration/colo.h b/include/migration/colo.h
index c60a590..9b6662d 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -14,7 +14,9 @@
 #define QEMU_COLO_H
 
 #include "qemu-common.h"
+#include "migration/migration.h"
 
 bool colo_supported(void);
+void colo_info_mig_init(void);
 
 #endif
diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index 65ecc35..81b5713 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -1,5 +1,6 @@
 common-obj-y += migration.o tcp.o
 common-obj-$(CONFIG_COLO) += colo.o
+common-obj-y += colo-comm.o
 common-obj-y += vmstate.o
 common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o qemu-file-stdio.o
 common-obj-y += xbzrle.o postcopy-ram.o
diff --git a/migration/colo-comm.c b/migration/colo-comm.c
new file mode 100644
index 0000000..fb407e0
--- /dev/null
+++ b/migration/colo-comm.c
@@ -0,0 +1,50 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later. See the COPYING file in the top-level directory.
+ *
+ */
+
+#include <migration/colo.h>
+#include "trace.h"
+
+typedef struct {
+     bool colo_requested;
+} COLOInfo;
+
+static COLOInfo colo_info;
+
+static void colo_info_pre_save(void *opaque)
+{
+    COLOInfo *s = opaque;
+
+    s->colo_requested = migrate_colo_enabled();
+}
+
+static bool colo_info_need(void *opaque)
+{
+   return migrate_colo_enabled();
+}
+
+static const VMStateDescription colo_state = {
+     .name = "COLOState",
+     .version_id = 1,
+     .minimum_version_id = 1,
+     .pre_save = colo_info_pre_save,
+     .needed = colo_info_need,
+     .fields = (VMStateField[]) {
+         VMSTATE_BOOL(colo_requested, COLOInfo),
+         VMSTATE_END_OF_LIST()
+        },
+};
+
+void colo_info_mig_init(void)
+{
+    vmstate_register(NULL, 0, &colo_state, &colo_info);
+}
diff --git a/vl.c b/vl.c
index 4211ff1..f84fde8 100644
--- a/vl.c
+++ b/vl.c
@@ -91,6 +91,7 @@ int main(int argc, char **argv)
 #include "sysemu/dma.h"
 #include "audio/audio.h"
 #include "migration/migration.h"
+#include "migration/colo.h"
 #include "sysemu/kvm.h"
 #include "qapi/qmp/qjson.h"
 #include "qemu/option.h"
@@ -4450,7 +4451,7 @@ int main(int argc, char **argv, char **envp)
 
     blk_mig_init();
     ram_mig_init();
-
+    colo_info_mig_init();
     /* If the currently selected machine wishes to override the units-per-bus
      * property of its default HBA interface type, do so now. */
     if (machine_class->units_per_default_bus) {
-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v12 04/38] migration: Export migrate_set_state()
  2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (2 preceding siblings ...)
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 03/38] COLO: migrate colo related info to secondary node zhanghailiang
@ 2015-12-15  8:22 ` zhanghailiang
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 05/38] migration: Add state records for migration incoming zhanghailiang
                   ` (34 subsequent siblings)
  38 siblings, 0 replies; 94+ messages in thread
From: zhanghailiang @ 2015-12-15  8:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

Change the first parameter of migrate_set_state() from a MigrationState
pointer to a pointer to the state field itself, and export the function.
We will use it in later patches.
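
The effect of taking a pointer to the state variable can be sketched with C11
atomics. This is only an illustration under assumptions: set_state() and the
STATUS_* values are hypothetical, the patch itself uses QEMU's atomic_cmpxchg()
and returns void, and the sketch leaves out the trace and event calls.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the migration status values. */
enum { STATUS_NONE, STATUS_SETUP, STATUS_ACTIVE, STATUS_FAILED };

/*
 * Move *state from old_state to new_state, but only if it still holds
 * old_state. Returns true when the transition happened. Since the
 * helper only sees a pointer to the variable, it works for any state
 * field, not just one embedded in a particular struct.
 */
bool set_state(atomic_int *state, int old_state, int new_state)
{
    int expected = old_state;

    assert(state != NULL);
    return atomic_compare_exchange_strong(state, &expected, new_state);
}
```

A caller that owns two independent state variables (for example one for the
outgoing side and one for the incoming side) can pass either address to the
same helper; a transition whose old_state no longer matches is ignored.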

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v12:
- Add Reviewed-by tag
v11:
- New patch which is split from patch
  'migration: Add state records for migration incoming' (Juan's suggestion)

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 include/migration/migration.h |  2 ++
 migration/migration.c         | 36 +++++++++++++++++++++---------------
 2 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 1f004e4..4b19e80 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -169,6 +169,8 @@ struct MigrationState
     RAMBlock *last_req_rb;
 };
 
+void migrate_set_state(int *state, int old_state, int new_state);
+
 void process_incoming_migration(QEMUFile *f);
 
 void qemu_start_incoming_migration(const char *uri, Error **errp);
diff --git a/migration/migration.c b/migration/migration.c
index 0d525ee..c9cd80d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -798,9 +798,9 @@ void qmp_migrate_start_postcopy(Error **errp)
 
 /* shared migration helpers */
 
-static void migrate_set_state(MigrationState *s, int old_state, int new_state)
+void migrate_set_state(int *state, int old_state, int new_state)
 {
-    if (atomic_cmpxchg(&s->state, old_state, new_state) == old_state) {
+    if (atomic_cmpxchg(state, old_state, new_state) == old_state) {
         trace_migrate_set_state(new_state);
         migrate_generate_event(new_state);
     }
@@ -833,7 +833,7 @@ static void migrate_fd_cleanup(void *opaque)
            (s->state != MIGRATION_STATUS_POSTCOPY_ACTIVE));
 
     if (s->state == MIGRATION_STATUS_CANCELLING) {
-        migrate_set_state(s, MIGRATION_STATUS_CANCELLING,
+        migrate_set_state(&s->state, MIGRATION_STATUS_CANCELLING,
                           MIGRATION_STATUS_CANCELLED);
     }
 
@@ -844,7 +844,8 @@ void migrate_fd_error(MigrationState *s)
 {
     trace_migrate_fd_error();
     assert(s->file == NULL);
-    migrate_set_state(s, MIGRATION_STATUS_SETUP, MIGRATION_STATUS_FAILED);
+    migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
+                      MIGRATION_STATUS_FAILED);
     notifier_list_notify(&migration_state_notifiers, s);
 }
 
@@ -864,7 +865,7 @@ static void migrate_fd_cancel(MigrationState *s)
         if (!migration_is_setup_or_active(old_state)) {
             break;
         }
-        migrate_set_state(s, old_state, MIGRATION_STATUS_CANCELLING);
+        migrate_set_state(&s->state, old_state, MIGRATION_STATUS_CANCELLING);
     } while (s->state != MIGRATION_STATUS_CANCELLING);
 
     /*
@@ -938,7 +939,7 @@ MigrationState *migrate_init(const MigrationParams *params)
     s->migration_thread_running = false;
     s->last_req_rb = NULL;
 
-    migrate_set_state(s, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP);
+    migrate_set_state(&s->state, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP);
 
     QSIMPLEQ_INIT(&s->src_page_requests);
 
@@ -1037,7 +1038,8 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     } else {
         error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "uri",
                    "a valid migration protocol");
-        migrate_set_state(s, MIGRATION_STATUS_SETUP, MIGRATION_STATUS_FAILED);
+        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
+                          MIGRATION_STATUS_FAILED);
         return;
     }
 
@@ -1416,7 +1418,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
     int ret;
     const QEMUSizedBuffer *qsb;
     int64_t time_at_stop = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
-    migrate_set_state(ms, MIGRATION_STATUS_ACTIVE,
+    migrate_set_state(&ms->state, MIGRATION_STATUS_ACTIVE,
                       MIGRATION_STATUS_POSTCOPY_ACTIVE);
 
     trace_postcopy_start();
@@ -1507,7 +1509,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
     ret = qemu_file_get_error(ms->file);
     if (ret) {
         error_report("postcopy_start: Migration stream errored");
-        migrate_set_state(ms, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+        migrate_set_state(&ms->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
                               MIGRATION_STATUS_FAILED);
     }
 
@@ -1516,7 +1518,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
 fail_closefb:
     qemu_fclose(fb);
 fail:
-    migrate_set_state(ms, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+    migrate_set_state(&ms->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
                           MIGRATION_STATUS_FAILED);
     qemu_mutex_unlock_iothread();
     return -1;
@@ -1585,11 +1587,13 @@ static void migration_completion(MigrationState *s, int current_active_state,
         goto fail;
     }
 
-    migrate_set_state(s, current_active_state, MIGRATION_STATUS_COMPLETED);
+    migrate_set_state(&s->state, current_active_state,
+                      MIGRATION_STATUS_COMPLETED);
     return;
 
 fail:
-    migrate_set_state(s, current_active_state, MIGRATION_STATUS_FAILED);
+    migrate_set_state(&s->state, current_active_state,
+                      MIGRATION_STATUS_FAILED);
 }
 
 bool migrate_colo_enabled(void)
@@ -1640,7 +1644,8 @@ static void *migration_thread(void *opaque)
 
     s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
     current_active_state = MIGRATION_STATUS_ACTIVE;
-    migrate_set_state(s, MIGRATION_STATUS_SETUP, MIGRATION_STATUS_ACTIVE);
+    migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
+                      MIGRATION_STATUS_ACTIVE);
 
     trace_migration_thread_setup_complete();
 
@@ -1683,7 +1688,8 @@ static void *migration_thread(void *opaque)
         }
 
         if (qemu_file_get_error(s->file)) {
-            migrate_set_state(s, current_active_state, MIGRATION_STATUS_FAILED);
+            migrate_set_state(&s->state, current_active_state,
+                              MIGRATION_STATUS_FAILED);
             trace_migration_thread_file_err();
             break;
         }
@@ -1764,7 +1770,7 @@ void migrate_fd_connect(MigrationState *s)
     if (migrate_postcopy_ram()) {
         if (open_return_path_on_source(s)) {
             error_report("Unable to open return-path for postcopy");
-            migrate_set_state(s, MIGRATION_STATUS_SETUP,
+            migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
                               MIGRATION_STATUS_FAILED);
             migrate_fd_cleanup(s);
             return;
-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v12 05/38] migration: Add state records for migration incoming
  2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (3 preceding siblings ...)
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 04/38] migration: Export migrate_set_state() zhanghailiang
@ 2015-12-15  8:22 ` zhanghailiang
  2015-12-15 17:36   ` Dr. David Alan Gilbert
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 06/38] migration: Integrate COLO checkpoint process into migration zhanghailiang
                   ` (33 subsequent siblings)
  38 siblings, 1 reply; 94+ messages in thread
From: zhanghailiang @ 2015-12-15  8:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

The migration destination also needs to track its state;
COLO will use it.

Here we add a new member 'state' to MigrationIncomingState,
and also use migrate_set_state() to modify its value.
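
The transition pattern described above can be sketched in standalone C11.
This is only an illustration, not the QEMU implementation: it assumes that
migrate_set_state() behaves like a compare-and-swap on the state field
(the value changes only if it currently equals the expected old state).
All mis_sketch_* names and MIS_SKETCH_* values are hypothetical stand-ins
for MigrationIncomingState and the MIGRATION_STATUS_* constants.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the MIGRATION_STATUS_* values. */
enum mis_sketch_status {
    MIS_SKETCH_NONE,
    MIS_SKETCH_ACTIVE,
    MIS_SKETCH_COMPLETED,
    MIS_SKETCH_FAILED,
};

/* Hypothetical miniature of MigrationIncomingState: only the new field. */
struct mis_sketch_incoming {
    _Atomic int state;
};

/*
 * Assumed semantics: move *state from old_state to new_state only if it
 * currently holds old_state.  Returns true when the transition happened.
 */
static bool mis_sketch_set_state(_Atomic int *state, int old_state,
                                 int new_state)
{
    int expected = old_state;

    return atomic_compare_exchange_strong(state, &expected, new_state);
}

/*
 * Mirrors the order of transitions in the incoming path of this patch:
 * NONE -> ACTIVE before loading, then ACTIVE -> FAILED or COMPLETED.
 * load_ret plays the role of the loadvm return value.
 */
static int mis_sketch_run(int load_ret)
{
    struct mis_sketch_incoming mis;
    bool moved;

    atomic_init(&mis.state, MIS_SKETCH_NONE);

    moved = mis_sketch_set_state(&mis.state, MIS_SKETCH_NONE,
                                 MIS_SKETCH_ACTIVE);
    assert(moved);
    (void)moved; /* silence unused warning when NDEBUG is defined */

    mis_sketch_set_state(&mis.state, MIS_SKETCH_ACTIVE,
                         load_ret < 0 ? MIS_SKETCH_FAILED
                                      : MIS_SKETCH_COMPLETED);
    return atomic_load(&mis.state);
}
```

Passing the expected old state makes a stale transition a no-op, which
is why each call site in the diff names both the old and the new state.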

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v11:
- Split exporting migrate_set_state() part into a new patch (Juan's suggestion)

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 include/migration/migration.h |  1 +
 migration/migration.c         | 14 +++++++++-----
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 4b19e80..99dfa92 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -105,6 +105,7 @@ struct MigrationIncomingState {
     QemuMutex rp_mutex;    /* We send replies from multiple threads */
     void     *postcopy_tmp_page;
 
+    int state;
     /* See savevm.c */
     LoadStateEntry_Head loadvm_handlers;
 };
diff --git a/migration/migration.c b/migration/migration.c
index c9cd80d..d58ce98 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -112,6 +112,7 @@ MigrationIncomingState *migration_incoming_state_new(QEMUFile* f)
 {
     mis_current = g_new0(MigrationIncomingState, 1);
     mis_current->from_src_file = f;
+    mis_current->state = MIGRATION_STATUS_NONE;
     QLIST_INIT(&mis_current->loadvm_handlers);
     qemu_mutex_init(&mis_current->rp_mutex);
     qemu_event_init(&mis_current->main_thread_load_event, false);
@@ -332,8 +333,8 @@ static void process_incoming_migration_co(void *opaque)
 
     mis = migration_incoming_state_new(f);
     postcopy_state_set(POSTCOPY_INCOMING_NONE);
-    migrate_generate_event(MIGRATION_STATUS_ACTIVE);
-
+    migrate_set_state(&mis->state, MIGRATION_STATUS_NONE,
+                      MIGRATION_STATUS_ACTIVE);
     ret = qemu_loadvm_state(f);
 
     ps = postcopy_state_get();
@@ -362,7 +363,8 @@ static void process_incoming_migration_co(void *opaque)
     migration_incoming_state_destroy();
 
     if (ret < 0) {
-        migrate_generate_event(MIGRATION_STATUS_FAILED);
+        migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
+                          MIGRATION_STATUS_FAILED);
         error_report("load of migration failed: %s", strerror(-ret));
         migrate_decompress_threads_join();
         exit(EXIT_FAILURE);
@@ -371,7 +373,8 @@ static void process_incoming_migration_co(void *opaque)
     /* Make sure all file formats flush their mutable metadata */
     bdrv_invalidate_cache_all(&local_err);
     if (local_err) {
-        migrate_generate_event(MIGRATION_STATUS_FAILED);
+        migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
+                          MIGRATION_STATUS_FAILED);
         error_report_err(local_err);
         migrate_decompress_threads_join();
         exit(EXIT_FAILURE);
@@ -403,7 +406,8 @@ static void process_incoming_migration_co(void *opaque)
      * observer sees this event they might start to prod at the VM assuming
      * it's ready to use.
      */
-    migrate_generate_event(MIGRATION_STATUS_COMPLETED);
+    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
+                      MIGRATION_STATUS_COMPLETED);
 }
 
 void process_incoming_migration(QEMUFile *f)
-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v12 06/38] migration: Integrate COLO checkpoint process into migration
  2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (4 preceding siblings ...)
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 05/38] migration: Add state records for migration incoming zhanghailiang
@ 2015-12-15  8:22 ` zhanghailiang
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 07/38] migration: Integrate COLO checkpoint process into loadvm zhanghailiang
                   ` (32 subsequent siblings)
  38 siblings, 0 replies; 94+ messages in thread
From: zhanghailiang @ 2015-12-15  8:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

Add a migration state, MIGRATION_STATUS_COLO, which is entered
after the first live migration has finished successfully.

We reuse the migration thread, so if COLO is enabled by the user, the migration thread will
go into the process of colo.
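
The source-side flow described above can be sketched as a small standalone
state machine. This is a simplified model under stated assumptions, not the
code in this patch: locking, vm_start() and the checkpoint loop are left
out, and every src_sketch_* name and SRC_SKETCH_* value is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the MIGRATION_STATUS_* values. */
enum src_sketch_status {
    SRC_SKETCH_ACTIVE,
    SRC_SKETCH_COLO,
    SRC_SKETCH_COMPLETED,
};

#define SRC_SKETCH_HISTORY_MAX 4

/* Hypothetical miniature of MigrationState with a transition log. */
struct src_sketch_migration {
    int state;
    int history[SRC_SKETCH_HISTORY_MAX];
    size_t history_len;
};

/* Change state only when the current state matches old_state. */
static void src_sketch_transition(struct src_sketch_migration *s,
                                  int old_state, int new_state)
{
    if (s->state == old_state) {
        assert(s->history_len < SRC_SKETCH_HISTORY_MAX);
        s->state = new_state;
        s->history[s->history_len++] = new_state;
    }
}

/* Model of the COLO process: ACTIVE -> COLO -> COMPLETED. */
static void src_sketch_colo_process(struct src_sketch_migration *s)
{
    src_sketch_transition(s, SRC_SKETCH_ACTIVE, SRC_SKETCH_COLO);
    /* Placeholder: the checkpoint loop would run here. */
    src_sketch_transition(s, SRC_SKETCH_COLO, SRC_SKETCH_COMPLETED);
}

/*
 * What the migration thread does once the first live migration is done:
 * with COLO enabled it stays in the same thread and enters the COLO
 * process; otherwise it completes as usual.
 */
static void src_sketch_after_first_migration(struct src_sketch_migration *s,
                                             bool colo_enabled)
{
    if (colo_enabled) {
        src_sketch_colo_process(s);
    } else {
        src_sketch_transition(s, SRC_SKETCH_ACTIVE, SRC_SKETCH_COMPLETED);
    }
}
```

With COLO enabled the log records COLO and then COMPLETED; without it the
only recorded transition is COMPLETED.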

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v11:
- Rebase to master
- Add Reviewed-by tag
v10:
- Simplify process by dropping colo thread and reusing migration thread.
     (Dave's suggestion)

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 include/migration/colo.h |  3 +++
 migration/colo.c         | 31 +++++++++++++++++++++++++++++++
 migration/migration.c    | 30 ++++++++++++++++++++++++++----
 qapi-schema.json         |  4 +++-
 stubs/migration-colo.c   |  9 +++++++++
 trace-events             |  3 +++
 6 files changed, 75 insertions(+), 5 deletions(-)

diff --git a/include/migration/colo.h b/include/migration/colo.h
index 9b6662d..f462f06 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -19,4 +19,7 @@
 bool colo_supported(void);
 void colo_info_mig_init(void);
 
+void migrate_start_colo_process(MigrationState *s);
+bool migration_in_colo_state(void);
+
 #endif
diff --git a/migration/colo.c b/migration/colo.c
index 2c40d2e..cf0ccb8 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -10,9 +10,40 @@
  * later.  See the COPYING file in the top-level directory.
  */
 
+#include "sysemu/sysemu.h"
 #include "migration/colo.h"
+#include "trace.h"
 
 bool colo_supported(void)
 {
     return true;
 }
+
+bool migration_in_colo_state(void)
+{
+    MigrationState *s = migrate_get_current();
+
+    return (s->state == MIGRATION_STATUS_COLO);
+}
+
+static void colo_process_checkpoint(MigrationState *s)
+{
+    qemu_mutex_lock_iothread();
+    vm_start();
+    qemu_mutex_unlock_iothread();
+    trace_colo_vm_state_change("stop", "run");
+
+    /*TODO: COLO checkpoint savevm loop*/
+
+    migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
+                      MIGRATION_STATUS_COMPLETED);
+}
+
+void migrate_start_colo_process(MigrationState *s)
+{
+    qemu_mutex_unlock_iothread();
+    migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
+                      MIGRATION_STATUS_COLO);
+    colo_process_checkpoint(s);
+    qemu_mutex_lock_iothread();
+}
diff --git a/migration/migration.c b/migration/migration.c
index d58ce98..99b870d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -640,6 +640,10 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 
         get_xbzrle_cache_stats(info);
         break;
+    case MIGRATION_STATUS_COLO:
+        info->has_status = true;
+        /* TODO: display COLO specific information (checkpoint info etc.) */
+        break;
     case MIGRATION_STATUS_COMPLETED:
         get_xbzrle_cache_stats(info);
 
@@ -999,7 +1003,8 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     params.shared = has_inc && inc;
 
     if (migration_is_setup_or_active(s->state) ||
-        s->state == MIGRATION_STATUS_CANCELLING) {
+        s->state == MIGRATION_STATUS_CANCELLING ||
+        s->state == MIGRATION_STATUS_COLO) {
         error_setg(errp, QERR_MIGRATION_ACTIVE);
         return;
     }
@@ -1591,8 +1596,11 @@ static void migration_completion(MigrationState *s, int current_active_state,
         goto fail;
     }
 
-    migrate_set_state(&s->state, current_active_state,
-                      MIGRATION_STATUS_COMPLETED);
+    if (!migrate_colo_enabled()) {
+        migrate_set_state(&s->state, current_active_state,
+                          MIGRATION_STATUS_COMPLETED);
+    }
+
     return;
 
 fail:
@@ -1624,6 +1632,7 @@ static void *migration_thread(void *opaque)
     bool entered_postcopy = false;
     /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */
     enum MigrationStatus current_active_state = MIGRATION_STATUS_ACTIVE;
+    bool enable_colo = migrate_colo_enabled();
 
     rcu_register_thread();
 
@@ -1731,7 +1740,11 @@ static void *migration_thread(void *opaque)
     end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
 
     qemu_mutex_lock_iothread();
-    qemu_savevm_state_cleanup();
+    /* The resources allocated by migration will be reused in the COLO
+      process, so don't release them. */
+    if (!enable_colo) {
+        qemu_savevm_state_cleanup();
+    }
     if (s->state == MIGRATION_STATUS_COMPLETED) {
         uint64_t transferred_bytes = qemu_ftell(s->file);
         s->total_time = end_time - s->total_time;
@@ -1744,6 +1757,15 @@ static void *migration_thread(void *opaque)
         }
         runstate_set(RUN_STATE_POSTMIGRATE);
     } else {
+        if (s->state == MIGRATION_STATUS_ACTIVE && enable_colo) {
+            migrate_start_colo_process(s);
+            qemu_savevm_state_cleanup();
+            /*
+            * FIXME: the VM runs in COLO regardless of its old running state.
+            * After exiting COLO, it keeps running.
+            */
+            old_vm_running = true;
+        }
         if (old_vm_running && !entered_postcopy) {
             vm_start();
         }
diff --git a/qapi-schema.json b/qapi-schema.json
index d20c0ec..c9ff34e 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -436,12 +436,14 @@
 #
 # @failed: some error occurred during migration process.
 #
+# @colo: VM is in the process of fault tolerance. (since 2.6)
+#
 # Since: 2.3
 #
 ##
 { 'enum': 'MigrationStatus',
   'data': [ 'none', 'setup', 'cancelling', 'cancelled',
-            'active', 'postcopy-active', 'completed', 'failed' ] }
+            'active', 'postcopy-active', 'completed', 'failed', 'colo' ] }
 
 ##
 # @MigrationInfo
diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
index 3d817df..acddca6 100644
--- a/stubs/migration-colo.c
+++ b/stubs/migration-colo.c
@@ -16,3 +16,12 @@ bool colo_supported(void)
 {
     return false;
 }
+
+bool migration_in_colo_state(void)
+{
+    return false;
+}
+
+void migrate_start_colo_process(MigrationState *s)
+{
+}
diff --git a/trace-events b/trace-events
index 2fce98e..5565e79 100644
--- a/trace-events
+++ b/trace-events
@@ -1577,6 +1577,9 @@ postcopy_ram_incoming_cleanup_entry(void) ""
 postcopy_ram_incoming_cleanup_exit(void) ""
 postcopy_ram_incoming_cleanup_join(void) ""
 
+# migration/colo.c
+colo_vm_state_change(const char *old, const char *new) "Change '%s' => '%s'"
+
 # kvm-all.c
 kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
 kvm_vm_ioctl(int type, void *arg) "type 0x%x, arg %p"
-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v12 07/38] migration: Integrate COLO checkpoint process into loadvm
  2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (5 preceding siblings ...)
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 06/38] migration: Integrate COLO checkpoint process into migration zhanghailiang
@ 2015-12-15  8:22 ` zhanghailiang
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 08/38] migration: Rename the 'file' member of MigrationState zhanghailiang
                   ` (31 subsequent siblings)
  38 siblings, 0 replies; 94+ messages in thread
From: zhanghailiang @ 2015-12-15  8:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

Switch from the normal migration loadvm process to the COLO checkpoint process if
COLO mode is enabled.
We add three new members to struct MigrationIncomingState: 'have_colo_incoming_thread'
and 'colo_incoming_thread' record the COLO-related thread for the secondary VM, and
'migration_incoming_co' records the original migration incoming coroutine.
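
The hand-off on the secondary side can be sketched with plain POSIX
threads. This is a rough model under stated assumptions, not the QEMU
code: the real patch uses QemuThread and yields the incoming coroutine
before joining, while this sketch joins directly. All dst_sketch_* names
and DST_SKETCH_* values are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the MIGRATION_STATUS_* values. */
enum { DST_SKETCH_ACTIVE = 1, DST_SKETCH_COLO = 2 };

/* Hypothetical miniature of MigrationIncomingState plus the COLO flag. */
struct dst_sketch_incoming {
    int state;
    bool have_colo_incoming_thread;
    pthread_t colo_incoming_thread;
    bool colo_requested;
};

/* Thread body: enter the COLO state, run the (omitted) loop, exit COLO. */
static void *dst_sketch_colo_thread(void *opaque)
{
    struct dst_sketch_incoming *mis = opaque;

    assert(mis != NULL);
    mis->state = DST_SKETCH_COLO;
    /* Placeholder: the checkpoint restore loop would run here. */
    mis->colo_requested = false;
    return NULL;
}

/*
 * Called after the initial load succeeded.  If COLO was requested, start
 * a joinable thread, remember that it exists, and wait for it to exit
 * before the caller releases any resources.  Returns 0 on success or a
 * pthread error number.
 */
static int dst_sketch_after_loadvm(struct dst_sketch_incoming *mis)
{
    int err;

    if (!mis->colo_requested) {
        return 0;
    }
    err = pthread_create(&mis->colo_incoming_thread, NULL,
                         dst_sketch_colo_thread, mis);
    if (err) {
        return err;
    }
    mis->have_colo_incoming_thread = true;
    /* The real code yields the coroutine here; the sketch just joins. */
    return pthread_join(mis->colo_incoming_thread, NULL);
}
```

Older toolchains may need -pthread when building this sketch. Joining
before cleanup is the point: the thread still uses the incoming state, so
it must have exited before that state is destroyed.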

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v12:
- Add Reviewed-by tag
v11:
- We moved the call to bdrv_invalidate_cache_all(), but did the deletion
  in another patch. Fix it.
- Add documentation for colo in 'MigrationStatus' (Eric's review comment)
v10:
- Fix an fd leak bug found by Dave.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 include/migration/colo.h      |  7 +++++++
 include/migration/migration.h |  7 +++++++
 migration/colo-comm.c         | 10 ++++++++++
 migration/colo.c              | 22 ++++++++++++++++++++++
 migration/migration.c         | 31 +++++++++++++++++++++----------
 stubs/migration-colo.c        | 10 ++++++++++
 6 files changed, 77 insertions(+), 10 deletions(-)

diff --git a/include/migration/colo.h b/include/migration/colo.h
index f462f06..2676c4a 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -15,6 +15,8 @@
 
 #include "qemu-common.h"
 #include "migration/migration.h"
+#include "qemu/coroutine_int.h"
+#include "qemu/thread.h"
 
 bool colo_supported(void);
 void colo_info_mig_init(void);
@@ -22,4 +24,9 @@ void colo_info_mig_init(void);
 void migrate_start_colo_process(MigrationState *s);
 bool migration_in_colo_state(void);
 
+/* loadvm */
+bool migration_incoming_enable_colo(void);
+void migration_incoming_exit_colo(void);
+void *colo_process_incoming_thread(void *opaque);
+bool migration_incoming_in_colo_state(void);
 #endif
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 99dfa92..a57a734 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -22,6 +22,7 @@
 #include "migration/vmstate.h"
 #include "qapi-types.h"
 #include "exec/cpu-common.h"
+#include "qemu/coroutine_int.h"
 
 #define QEMU_VM_FILE_MAGIC           0x5145564d
 #define QEMU_VM_FILE_VERSION_COMPAT  0x00000002
@@ -106,6 +107,12 @@ struct MigrationIncomingState {
     void     *postcopy_tmp_page;
 
     int state;
+
+    bool have_colo_incoming_thread;
+    QemuThread colo_incoming_thread;
+    /* The coroutine we should enter (back) after failover */
+    Coroutine *migration_incoming_co;
+
     /* See savevm.c */
     LoadStateEntry_Head loadvm_handlers;
 };
diff --git a/migration/colo-comm.c b/migration/colo-comm.c
index fb407e0..30df3d3 100644
--- a/migration/colo-comm.c
+++ b/migration/colo-comm.c
@@ -48,3 +48,13 @@ void colo_info_mig_init(void)
 {
     vmstate_register(NULL, 0, &colo_state, &colo_info);
 }
+
+bool migration_incoming_enable_colo(void)
+{
+    return colo_info.colo_requested;
+}
+
+void migration_incoming_exit_colo(void)
+{
+    colo_info.colo_requested = 0;
+}
diff --git a/migration/colo.c b/migration/colo.c
index cf0ccb8..6880aa0 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -13,6 +13,7 @@
 #include "sysemu/sysemu.h"
 #include "migration/colo.h"
 #include "trace.h"
+#include "qemu/error-report.h"
 
 bool colo_supported(void)
 {
@@ -26,6 +27,13 @@ bool migration_in_colo_state(void)
     return (s->state == MIGRATION_STATUS_COLO);
 }
 
+bool migration_incoming_in_colo_state(void)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+
+    return mis && (mis->state == MIGRATION_STATUS_COLO);
+}
+
 static void colo_process_checkpoint(MigrationState *s)
 {
     qemu_mutex_lock_iothread();
@@ -47,3 +55,17 @@ void migrate_start_colo_process(MigrationState *s)
     colo_process_checkpoint(s);
     qemu_mutex_lock_iothread();
 }
+
+void *colo_process_incoming_thread(void *opaque)
+{
+    MigrationIncomingState *mis = opaque;
+
+    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
+                      MIGRATION_STATUS_COLO);
+
+    /* TODO: COLO checkpoint restore loop */
+
+    migration_incoming_exit_colo();
+
+    return NULL;
+}
diff --git a/migration/migration.c b/migration/migration.c
index 99b870d..d5691c2 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -358,6 +358,27 @@ static void process_incoming_migration_co(void *opaque)
         /* Else if something went wrong then just fall out of the normal exit */
     }
 
+    if (!ret) {
+        /* Make sure all file formats flush their mutable metadata */
+        bdrv_invalidate_cache_all(&local_err);
+        if (local_err) {
+            error_report_err(local_err);
+            migrate_decompress_threads_join();
+            exit(EXIT_FAILURE);
+        }
+    }
+    /* we get colo info, and know if we are in colo mode */
+    if (!ret && migration_incoming_enable_colo()) {
+        mis->migration_incoming_co = qemu_coroutine_self();
+        qemu_thread_create(&mis->colo_incoming_thread, "colo incoming",
+             colo_process_incoming_thread, mis, QEMU_THREAD_JOINABLE);
+        mis->have_colo_incoming_thread = true;
+        qemu_coroutine_yield();
+
+        /* Wait for checkpoint incoming thread exit before freeing resources */
+        qemu_thread_join(&mis->colo_incoming_thread);
+    }
+
     qemu_fclose(f);
     free_xbzrle_decoded_buf();
     migration_incoming_state_destroy();
@@ -370,16 +391,6 @@ static void process_incoming_migration_co(void *opaque)
         exit(EXIT_FAILURE);
     }
 
-    /* Make sure all file formats flush their mutable metadata */
-    bdrv_invalidate_cache_all(&local_err);
-    if (local_err) {
-        migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
-                          MIGRATION_STATUS_FAILED);
-        error_report_err(local_err);
-        migrate_decompress_threads_join();
-        exit(EXIT_FAILURE);
-    }
-
     /*
      * This must happen after all error conditions are dealt with and
      * we're sure the VM is going to be running on this host.
diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
index acddca6..c12516e 100644
--- a/stubs/migration-colo.c
+++ b/stubs/migration-colo.c
@@ -22,6 +22,16 @@ bool migration_in_colo_state(void)
     return false;
 }
 
+bool migration_incoming_in_colo_state(void)
+{
+    return false;
+}
+
 void migrate_start_colo_process(MigrationState *s)
 {
 }
+
+void *colo_process_incoming_thread(void *opaque)
+{
+    return NULL;
+}
-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v12 08/38] migration: Rename the 'file' member of MigrationState
  2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (6 preceding siblings ...)
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 07/38] migration: Integrate COLO checkpoint process into loadvm zhanghailiang
@ 2015-12-15  8:22 ` zhanghailiang
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 09/38] COLO/migration: Create a new communication path from destination to source zhanghailiang
                   ` (30 subsequent siblings)
  38 siblings, 0 replies; 94+ messages in thread
From: zhanghailiang @ 2015-12-15  8:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

Rename the 'file' member of MigrationState to 'to_dst_file'.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v12:
- Add Reviewed-by tag
- Add the missed modification for RDMA migration. (Found by Wen Congyang)
v11:
- Only rename 'file' member of MigrationState

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 include/migration/migration.h |  2 +-
 migration/exec.c              |  4 +--
 migration/fd.c                |  4 +--
 migration/migration.c         | 72 ++++++++++++++++++++++---------------------
 migration/postcopy-ram.c      |  6 ++--
 migration/rdma.c              |  2 +-
 migration/savevm.c            |  2 +-
 migration/tcp.c               |  4 +--
 migration/unix.c              |  4 +--
 9 files changed, 52 insertions(+), 48 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index a57a734..ba5bcec 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -140,7 +140,7 @@ struct MigrationState
     size_t xfer_limit;
     QemuThread thread;
     QEMUBH *cleanup_bh;
-    QEMUFile *file;
+    QEMUFile *to_dst_file;
     int parameters[MIGRATION_PARAMETER_MAX];
 
     int state;
diff --git a/migration/exec.c b/migration/exec.c
index 8406d2b..9037109 100644
--- a/migration/exec.c
+++ b/migration/exec.c
@@ -36,8 +36,8 @@
 
 void exec_start_outgoing_migration(MigrationState *s, const char *command, Error **errp)
 {
-    s->file = qemu_popen_cmd(command, "w");
-    if (s->file == NULL) {
+    s->to_dst_file = qemu_popen_cmd(command, "w");
+    if (s->to_dst_file == NULL) {
         error_setg_errno(errp, errno, "failed to popen the migration target");
         return;
     }
diff --git a/migration/fd.c b/migration/fd.c
index 3e4bed0..9a9d6c5 100644
--- a/migration/fd.c
+++ b/migration/fd.c
@@ -50,9 +50,9 @@ void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **
     }
 
     if (fd_is_socket(fd)) {
-        s->file = qemu_fopen_socket(fd, "wb");
+        s->to_dst_file = qemu_fopen_socket(fd, "wb");
     } else {
-        s->file = qemu_fdopen(fd, "wb");
+        s->to_dst_file = qemu_fdopen(fd, "wb");
     }
 
     migrate_fd_connect(s);
diff --git a/migration/migration.c b/migration/migration.c
index d5691c2..a1074c3 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -834,7 +834,7 @@ static void migrate_fd_cleanup(void *opaque)
 
     flush_page_queue(s);
 
-    if (s->file) {
+    if (s->to_dst_file) {
         trace_migrate_fd_cleanup();
         qemu_mutex_unlock_iothread();
         if (s->migration_thread_running) {
@@ -844,8 +844,8 @@ static void migrate_fd_cleanup(void *opaque)
         qemu_mutex_lock_iothread();
 
         migrate_compress_threads_join();
-        qemu_fclose(s->file);
-        s->file = NULL;
+        qemu_fclose(s->to_dst_file);
+        s->to_dst_file = NULL;
     }
 
     assert((s->state != MIGRATION_STATUS_ACTIVE) &&
@@ -862,7 +862,7 @@ static void migrate_fd_cleanup(void *opaque)
 void migrate_fd_error(MigrationState *s)
 {
     trace_migrate_fd_error();
-    assert(s->file == NULL);
+    assert(s->to_dst_file == NULL);
     migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
                       MIGRATION_STATUS_FAILED);
     notifier_list_notify(&migration_state_notifiers, s);
@@ -871,7 +871,7 @@ void migrate_fd_error(MigrationState *s)
 static void migrate_fd_cancel(MigrationState *s)
 {
     int old_state ;
-    QEMUFile *f = migrate_get_current()->file;
+    QEMUFile *f = migrate_get_current()->to_dst_file;
     trace_migrate_fd_cancel();
 
     if (s->rp_state.from_dst_file) {
@@ -942,7 +942,7 @@ MigrationState *migrate_init(const MigrationParams *params)
     s->bytes_xfer = 0;
     s->xfer_limit = 0;
     s->cleanup_bh = 0;
-    s->file = NULL;
+    s->to_dst_file = NULL;
     s->state = MIGRATION_STATUS_NONE;
     s->params = *params;
     s->rp_state.from_dst_file = NULL;
@@ -1122,8 +1122,9 @@ void qmp_migrate_set_speed(int64_t value, Error **errp)
 
     s = migrate_get_current();
     s->bandwidth_limit = value;
-    if (s->file) {
-        qemu_file_set_rate_limit(s->file, s->bandwidth_limit / XFER_LIMIT_RATIO);
+    if (s->to_dst_file) {
+        qemu_file_set_rate_limit(s->to_dst_file,
+                                 s->bandwidth_limit / XFER_LIMIT_RATIO);
     }
 }
 
@@ -1393,7 +1394,7 @@ out:
 static int open_return_path_on_source(MigrationState *ms)
 {
 
-    ms->rp_state.from_dst_file = qemu_file_get_return_path(ms->file);
+    ms->rp_state.from_dst_file = qemu_file_get_return_path(ms->to_dst_file);
     if (!ms->rp_state.from_dst_file) {
         return -1;
     }
@@ -1415,7 +1416,7 @@ static int await_return_path_close_on_source(MigrationState *ms)
      * rp_thread will exit, however if there's an error we need to cause
      * it to exit.
      */
-    if (qemu_file_get_error(ms->file) && ms->rp_state.from_dst_file) {
+    if (qemu_file_get_error(ms->to_dst_file) && ms->rp_state.from_dst_file) {
         /*
          * shutdown(2), if we have it, will cause it to unblock if it's stuck
          * waiting for the destination.
@@ -1458,7 +1459,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
      * Cause any non-postcopiable, but iterative devices to
      * send out their final data.
      */
-    qemu_savevm_state_complete_precopy(ms->file, true);
+    qemu_savevm_state_complete_precopy(ms->to_dst_file, true);
 
     /*
      * in Finish migrate and with the io-lock held everything should
@@ -1476,9 +1477,9 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
      * will notice we're in POSTCOPY_ACTIVE and not actually
      * wrap their state up here
      */
-    qemu_file_set_rate_limit(ms->file, INT64_MAX);
+    qemu_file_set_rate_limit(ms->to_dst_file, INT64_MAX);
     /* Ping just for debugging, helps line traces up */
-    qemu_savevm_send_ping(ms->file, 2);
+    qemu_savevm_send_ping(ms->to_dst_file, 2);
 
     /*
      * While loading the device state we may trigger page transfer
@@ -1512,7 +1513,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
     qsb = qemu_buf_get(fb);
 
     /* Now send that blob */
-    if (qemu_savevm_send_packaged(ms->file, qsb)) {
+    if (qemu_savevm_send_packaged(ms->to_dst_file, qsb)) {
         goto fail_closefb;
     }
     qemu_fclose(fb);
@@ -1524,9 +1525,9 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
      * Although this ping is just for debug, it could potentially be
      * used for getting a better measurement of downtime at the source.
      */
-    qemu_savevm_send_ping(ms->file, 4);
+    qemu_savevm_send_ping(ms->to_dst_file, 4);
 
-    ret = qemu_file_get_error(ms->file);
+    ret = qemu_file_get_error(ms->to_dst_file);
     if (ret) {
         error_report("postcopy_start: Migration stream errored");
         migrate_set_state(&ms->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
@@ -1569,8 +1570,8 @@ static void migration_completion(MigrationState *s, int current_active_state,
         if (!ret) {
             ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
             if (ret >= 0) {
-                qemu_file_set_rate_limit(s->file, INT64_MAX);
-                qemu_savevm_state_complete_precopy(s->file, false);
+                qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
+                qemu_savevm_state_complete_precopy(s->to_dst_file, false);
             }
         }
         qemu_mutex_unlock_iothread();
@@ -1581,7 +1582,7 @@ static void migration_completion(MigrationState *s, int current_active_state,
     } else if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
         trace_migration_completion_postcopy_end();
 
-        qemu_savevm_state_complete_postcopy(s->file);
+        qemu_savevm_state_complete_postcopy(s->to_dst_file);
         trace_migration_completion_postcopy_end_after_complete();
     }
 
@@ -1602,7 +1603,7 @@ static void migration_completion(MigrationState *s, int current_active_state,
         }
     }
 
-    if (qemu_file_get_error(s->file)) {
+    if (qemu_file_get_error(s->to_dst_file)) {
         trace_migration_completion_file_err();
         goto fail;
     }
@@ -1647,24 +1648,24 @@ static void *migration_thread(void *opaque)
 
     rcu_register_thread();
 
-    qemu_savevm_state_header(s->file);
+    qemu_savevm_state_header(s->to_dst_file);
 
     if (migrate_postcopy_ram()) {
         /* Now tell the dest that it should open its end so it can reply */
-        qemu_savevm_send_open_return_path(s->file);
+        qemu_savevm_send_open_return_path(s->to_dst_file);
 
         /* And do a ping that will make stuff easier to debug */
-        qemu_savevm_send_ping(s->file, 1);
+        qemu_savevm_send_ping(s->to_dst_file, 1);
 
         /*
          * Tell the destination that we *might* want to do postcopy later;
          * if the other end can't do postcopy it should fail now, nice and
          * early.
          */
-        qemu_savevm_send_postcopy_advise(s->file);
+        qemu_savevm_send_postcopy_advise(s->to_dst_file);
     }
 
-    qemu_savevm_state_begin(s->file, &s->params);
+    qemu_savevm_state_begin(s->to_dst_file, &s->params);
 
     s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
     current_active_state = MIGRATION_STATUS_ACTIVE;
@@ -1678,10 +1679,10 @@ static void *migration_thread(void *opaque)
         int64_t current_time;
         uint64_t pending_size;
 
-        if (!qemu_file_rate_limit(s->file)) {
+        if (!qemu_file_rate_limit(s->to_dst_file)) {
             uint64_t pend_post, pend_nonpost;
 
-            qemu_savevm_state_pending(s->file, max_size, &pend_nonpost,
+            qemu_savevm_state_pending(s->to_dst_file, max_size, &pend_nonpost,
                                       &pend_post);
             pending_size = pend_nonpost + pend_post;
             trace_migrate_pending(pending_size, max_size,
@@ -1702,7 +1703,7 @@ static void *migration_thread(void *opaque)
                     continue;
                 }
                 /* Just another iteration step */
-                qemu_savevm_state_iterate(s->file, entered_postcopy);
+                qemu_savevm_state_iterate(s->to_dst_file, entered_postcopy);
             } else {
                 trace_migration_thread_low_pending(pending_size);
                 migration_completion(s, current_active_state,
@@ -1711,7 +1712,7 @@ static void *migration_thread(void *opaque)
             }
         }
 
-        if (qemu_file_get_error(s->file)) {
+        if (qemu_file_get_error(s->to_dst_file)) {
             migrate_set_state(&s->state, current_active_state,
                               MIGRATION_STATUS_FAILED);
             trace_migration_thread_file_err();
@@ -1719,7 +1720,8 @@ static void *migration_thread(void *opaque)
         }
         current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
         if (current_time >= initial_time + BUFFER_DELAY) {
-            uint64_t transferred_bytes = qemu_ftell(s->file) - initial_bytes;
+            uint64_t transferred_bytes = qemu_ftell(s->to_dst_file) -
+                                         initial_bytes;
             uint64_t time_spent = current_time - initial_time;
             double bandwidth = (double)transferred_bytes / time_spent;
             max_size = bandwidth * migrate_max_downtime() / 1000000;
@@ -1735,11 +1737,11 @@ static void *migration_thread(void *opaque)
                 s->expected_downtime = s->dirty_bytes_rate / bandwidth;
             }
 
-            qemu_file_reset_rate_limit(s->file);
+            qemu_file_reset_rate_limit(s->to_dst_file);
             initial_time = current_time;
-            initial_bytes = qemu_ftell(s->file);
+            initial_bytes = qemu_ftell(s->to_dst_file);
         }
-        if (qemu_file_rate_limit(s->file)) {
+        if (qemu_file_rate_limit(s->to_dst_file)) {
             /* usleep expects microseconds */
             g_usleep((initial_time + BUFFER_DELAY - current_time)*1000);
         }
@@ -1757,7 +1759,7 @@ static void *migration_thread(void *opaque)
         qemu_savevm_state_cleanup();
     }
     if (s->state == MIGRATION_STATUS_COMPLETED) {
-        uint64_t transferred_bytes = qemu_ftell(s->file);
+        uint64_t transferred_bytes = qemu_ftell(s->to_dst_file);
         s->total_time = end_time - s->total_time;
         if (!entered_postcopy) {
             s->downtime = end_time - start_time;
@@ -1794,7 +1796,7 @@ void migrate_fd_connect(MigrationState *s)
     s->expected_downtime = max_downtime/1000000;
     s->cleanup_bh = qemu_bh_new(migrate_fd_cleanup, s);
 
-    qemu_file_set_rate_limit(s->file,
+    qemu_file_set_rate_limit(s->to_dst_file,
                              s->bandwidth_limit / XFER_LIMIT_RATIO);
 
     /* Notify before starting migration thread */
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 3946aa9..0c88006 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -727,7 +727,8 @@ void postcopy_discard_send_range(MigrationState *ms, PostcopyDiscardState *pds,
 
     if (pds->cur_entry == MAX_DISCARDS_PER_COMMAND) {
         /* Full set, ship it! */
-        qemu_savevm_send_postcopy_ram_discard(ms->file, pds->ramblock_name,
+        qemu_savevm_send_postcopy_ram_discard(ms->to_dst_file,
+                                              pds->ramblock_name,
                                               pds->cur_entry,
                                               pds->start_list,
                                               pds->length_list);
@@ -747,7 +748,8 @@ void postcopy_discard_send_finish(MigrationState *ms, PostcopyDiscardState *pds)
 {
     /* Anything unsent? */
     if (pds->cur_entry) {
-        qemu_savevm_send_postcopy_ram_discard(ms->file, pds->ramblock_name,
+        qemu_savevm_send_postcopy_ram_discard(ms->to_dst_file,
+                                              pds->ramblock_name,
                                               pds->cur_entry,
                                               pds->start_list,
                                               pds->length_list);
diff --git a/migration/rdma.c b/migration/rdma.c
index dcabb91..3314589 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3506,7 +3506,7 @@ void rdma_start_outgoing_migration(void *opaque,
 
     trace_rdma_start_outgoing_migration_after_rdma_connect();
 
-    s->file = qemu_fopen_rdma(rdma, "wb");
+    s->to_dst_file = qemu_fopen_rdma(rdma, "wb");
     migrate_fd_connect(s);
     return;
 err:
diff --git a/migration/savevm.c b/migration/savevm.c
index 0ad1b93..f102870 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1163,7 +1163,7 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
         .shared = 0
     };
     MigrationState *ms = migrate_init(&params);
-    ms->file = f;
+    ms->to_dst_file = f;
 
     if (qemu_savevm_state_blocked(errp)) {
         return -EINVAL;
diff --git a/migration/tcp.c b/migration/tcp.c
index ae89172..e083d68 100644
--- a/migration/tcp.c
+++ b/migration/tcp.c
@@ -39,11 +39,11 @@ static void tcp_wait_for_connect(int fd, Error *err, void *opaque)
 
     if (fd < 0) {
         DPRINTF("migrate connect error: %s\n", error_get_pretty(err));
-        s->file = NULL;
+        s->to_dst_file = NULL;
         migrate_fd_error(s);
     } else {
         DPRINTF("migrate connect success\n");
-        s->file = qemu_fopen_socket(fd, "wb");
+        s->to_dst_file = qemu_fopen_socket(fd, "wb");
         migrate_fd_connect(s);
     }
 }
diff --git a/migration/unix.c b/migration/unix.c
index b591813..5492dd6 100644
--- a/migration/unix.c
+++ b/migration/unix.c
@@ -39,11 +39,11 @@ static void unix_wait_for_connect(int fd, Error *err, void *opaque)
 
     if (fd < 0) {
         DPRINTF("migrate connect error: %s\n", error_get_pretty(err));
-        s->file = NULL;
+        s->to_dst_file = NULL;
         migrate_fd_error(s);
     } else {
         DPRINTF("migrate connect success\n");
-        s->file = qemu_fopen_socket(fd, "wb");
+        s->to_dst_file = qemu_fopen_socket(fd, "wb");
         migrate_fd_connect(s);
     }
 }
-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v12 09/38] COLO/migration: Create a new communication path from destination to source
  2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (7 preceding siblings ...)
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 08/38] migration: Rename the'file' member of MigrationState zhanghailiang
@ 2015-12-15  8:22 ` zhanghailiang
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 10/38] COLO: Implement colo checkpoint protocol zhanghailiang
                   ` (29 subsequent siblings)
  38 siblings, 0 replies; 94+ messages in thread
From: zhanghailiang @ 2015-12-15  8:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

This new communication path will be used for returning messages
from destination to source.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v12:
- Add Reviewed-by tag
v11:
- Rebase master to use qemu_file_get_return_path() for opening return path
v10:
- Fix the error log (Dave's suggestion).

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 migration/colo.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index 6880aa0..0ab9618 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -36,6 +36,15 @@ bool migration_incoming_in_colo_state(void)
 
 static void colo_process_checkpoint(MigrationState *s)
 {
+    int ret = 0;
+
+    s->rp_state.from_dst_file = qemu_file_get_return_path(s->to_dst_file);
+    if (!s->rp_state.from_dst_file) {
+        ret = -EINVAL;
+        error_report("Open QEMUFile from_dst_file failed");
+        goto out;
+    }
+
     qemu_mutex_lock_iothread();
     vm_start();
     qemu_mutex_unlock_iothread();
@@ -43,8 +52,16 @@ static void colo_process_checkpoint(MigrationState *s)
 
     /*TODO: COLO checkpoint savevm loop*/
 
+out:
+    if (ret < 0) {
+        error_report("%s: %s", __func__, strerror(-ret));
+    }
     migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
                       MIGRATION_STATUS_COMPLETED);
+
+    if (s->rp_state.from_dst_file) {
+        qemu_fclose(s->rp_state.from_dst_file);
+    }
 }
 
 void migrate_start_colo_process(MigrationState *s)
@@ -59,12 +76,34 @@ void migrate_start_colo_process(MigrationState *s)
 void *colo_process_incoming_thread(void *opaque)
 {
     MigrationIncomingState *mis = opaque;
+    int ret = 0;
 
     migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
                       MIGRATION_STATUS_COLO);
 
+    mis->to_src_file = qemu_file_get_return_path(mis->from_src_file);
+    if (!mis->to_src_file) {
+        ret = -EINVAL;
+        error_report("colo incoming thread: Open QEMUFile to_src_file failed");
+        goto out;
+    }
+    /* Note: the migration incoming coroutine sets the fd to non-blocking,
+    *  but here we are in the COLO incoming thread, so it is OK to set the
+    *  fd back to blocking.
+    */
+    qemu_set_block(qemu_get_fd(mis->from_src_file));
+
     /* TODO: COLO checkpoint restore loop */
 
+out:
+    if (ret < 0) {
+        error_report("colo incoming thread will exit, detect error: %s",
+                     strerror(-ret));
+    }
+
+    if (mis->to_src_file) {
+        qemu_fclose(mis->to_src_file);
+    }
     migration_incoming_exit_colo();
 
     return NULL;
-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v12 10/38] COLO: Implement colo checkpoint protocol
  2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (8 preceding siblings ...)
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 09/38] COLO/migration: Create a new communication path from destination to source zhanghailiang
@ 2015-12-15  8:22 ` zhanghailiang
  2015-12-18 14:52   ` Dr. David Alan Gilbert
  2015-12-19  8:54   ` Markus Armbruster
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 11/38] COLO: Add a new RunState RUN_STATE_COLO zhanghailiang
                   ` (28 subsequent siblings)
  38 siblings, 2 replies; 94+ messages in thread
From: zhanghailiang @ 2015-12-15  8:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

We need a user-defined communication protocol to control the checkpoint
process.

A new checkpoint request is initiated by the Primary VM, and the two sides
interact as shown below.
Checkpoint synchronization points:

                       Primary                         Secondary
                                                       initial work
'checkpoint-ready'     <------------------------------ @

'checkpoint-request'   @ ----------------------------->
                                                       Suspend (Only in hybrid mode)
'checkpoint-reply'     <------------------------------ @
                       Suspend&Save state
'vmstate-send'         @ ----------------------------->
                       Send state                      Receive state
'vmstate-received'     <------------------------------ @
                       Release packets                 Load state
'vmstate-loaded'       <------------------------------ @
                       Resume                          Resume (Only in hybrid mode)

                       Start Comparing (Only in hybrid mode)
NOTE:
 1) '@' marks the side that sends the message.
 2) Each sync-point is synchronized between the two sides with only
    one handshake (single direction) for low latency.
    If stricter synchronization is required, a sync-point in the
    opposite direction should be added.
 3) Since sync-points are single direction, the remote side may
    have moved well ahead by the time this side receives the sync-point.
 4) For now, we only support 'periodic' checkpoint mode, in which
    the Secondary VM is not running; 'hybrid' mode will be supported later.
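
As a rough illustration only (not part of this patch), the ordering of one
checkpoint round in the diagram can be written down as a table of expected
messages and senders. The names colo_msg, colo_side, colo_round and
colo_round_next() are hypothetical; only the message names are taken from
the diagram, and 'checkpoint-ready' is left out because it is sent once
before the first round.

```c
#include <stddef.h>

/* Hypothetical mirror of the per-round message names in the diagram. */
enum colo_msg {
    MSG_CHECKPOINT_REQUEST,
    MSG_CHECKPOINT_REPLY,
    MSG_VMSTATE_SEND,
    MSG_VMSTATE_RECEIVED,
    MSG_VMSTATE_LOADED,
};

enum colo_side { SIDE_PRIMARY, SIDE_SECONDARY };

struct colo_step {
    enum colo_msg msg;
    enum colo_side sender;   /* the '@' side in the diagram */
};

/* One checkpoint round, in the order the sync-points are exchanged. */
static const struct colo_step colo_round[] = {
    { MSG_CHECKPOINT_REQUEST, SIDE_PRIMARY },
    { MSG_CHECKPOINT_REPLY,   SIDE_SECONDARY },
    { MSG_VMSTATE_SEND,       SIDE_PRIMARY },
    { MSG_VMSTATE_RECEIVED,   SIDE_SECONDARY },
    { MSG_VMSTATE_LOADED,     SIDE_SECONDARY },
};

#define COLO_ROUND_LEN (sizeof(colo_round) / sizeof(colo_round[0]))

/*
 * Check a message against step 'pos' of the round.
 * Returns the next position (wrapping to 0 after the last step),
 * or -1 if the message or its sender is not the expected one.
 */
static int colo_round_next(int pos, enum colo_side sender, enum colo_msg msg)
{
    if (pos < 0 || (size_t)pos >= COLO_ROUND_LEN) {
        return -1;
    }
    if (colo_round[pos].msg != msg || colo_round[pos].sender != sender) {
        return -1;
    }
    return (pos + 1) % (int)COLO_ROUND_LEN;
}
```

Starting from position 0 and feeding the five messages in diagram order
brings the position back to 0, i.e. the next round can begin; any message
out of order, or from the wrong side, yields -1.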

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Cc: Eric Blake <eblake@redhat.com>
---
v12:
- Rename colo_ctl_put() to colo_put_cmd()
- Rename colo_ctl_get() to colo_get_check_cmd() and drop
  the third parameter
- Rename colo_ctl_get_cmd() to colo_get_cmd()
- Remove the useless 'invalid' member from the COLOCommand enum.
v11:
- Add missing 'checkpoint-ready' communication in comment.
- Use parameter to return 'value' for colo_ctl_get() (Dave's suggestion)
- Fix trace for colo_ctl_get() to trace both command and value
v10:
- Rename enum COLOCmd to COLOCommand (Eric's suggestion).
- Remove unused 'ram-steal'

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 migration/colo.c | 183 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 qapi-schema.json |  25 ++++++++
 trace-events     |   2 +
 3 files changed, 208 insertions(+), 2 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 0ab9618..0ce2a6e 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -10,10 +10,12 @@
  * later.  See the COPYING file in the top-level directory.
  */
 
+#include <unistd.h>
 #include "sysemu/sysemu.h"
 #include "migration/colo.h"
 #include "trace.h"
 #include "qemu/error-report.h"
+#include "qemu/sockets.h"
 
 bool colo_supported(void)
 {
@@ -34,6 +36,100 @@ bool migration_incoming_in_colo_state(void)
     return mis && (mis->state == MIGRATION_STATUS_COLO);
 }
 
+static int colo_put_cmd(QEMUFile *f, uint32_t cmd)
+{
+    int ret;
+
+    if (cmd >= COLO_COMMAND_MAX) {
+        error_report("%s: Invalid cmd", __func__);
+        return -EINVAL;
+    }
+    qemu_put_be32(f, cmd);
+    qemu_fflush(f);
+
+    ret = qemu_file_get_error(f);
+    trace_colo_put_cmd(COLOCommand_lookup[cmd]);
+
+    return ret;
+}
+
+static int colo_get_cmd(QEMUFile *f, uint32_t *cmd)
+{
+    int ret;
+
+    *cmd = qemu_get_be32(f);
+    ret = qemu_file_get_error(f);
+    if (ret < 0) {
+        return ret;
+    }
+    if (*cmd >= COLO_COMMAND_MAX) {
+        error_report("%s: Invalid cmd", __func__);
+        return -EINVAL;
+    }
+    trace_colo_get_cmd(COLOCommand_lookup[*cmd]);
+    return 0;
+}
+
+static int colo_get_check_cmd(QEMUFile *f, uint32_t expect_cmd)
+{
+    int ret;
+    uint32_t cmd;
+
+    ret = colo_get_cmd(f, &cmd);
+    if (ret < 0) {
+        return ret;
+    }
+    if (cmd != expect_cmd) {
+        error_report("Unexpected COLO command, expected: %u, but got: %u",
+                     expect_cmd, cmd);
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
+static int colo_do_checkpoint_transaction(MigrationState *s)
+{
+    int ret;
+
+    ret = colo_put_cmd(s->to_dst_file, COLO_COMMAND_CHECKPOINT_REQUEST);
+    if (ret < 0) {
+        goto out;
+    }
+
+    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
+                             COLO_COMMAND_CHECKPOINT_REPLY);
+    if (ret < 0) {
+        goto out;
+    }
+
+    /* TODO: suspend and save vm state to colo buffer */
+
+    ret = colo_put_cmd(s->to_dst_file, COLO_COMMAND_VMSTATE_SEND);
+    if (ret < 0) {
+        goto out;
+    }
+
+    /* TODO: send vmstate to Secondary */
+
+    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
+                             COLO_COMMAND_VMSTATE_RECEIVED);
+    if (ret < 0) {
+        goto out;
+    }
+
+    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
+                             COLO_COMMAND_VMSTATE_LOADED);
+    if (ret < 0) {
+        goto out;
+    }
+
+    /* TODO: resume Primary */
+
+out:
+    return ret;
+}
+
 static void colo_process_checkpoint(MigrationState *s)
 {
     int ret = 0;
@@ -45,12 +141,28 @@ static void colo_process_checkpoint(MigrationState *s)
         goto out;
     }
 
+    /*
+     * Wait for the Secondary to finish loading VM state and enter COLO
+     * restore.
+     */
+    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
+                             COLO_COMMAND_CHECKPOINT_READY);
+    if (ret < 0) {
+        goto out;
+    }
+
     qemu_mutex_lock_iothread();
     vm_start();
     qemu_mutex_unlock_iothread();
     trace_colo_vm_state_change("stop", "run");
 
-    /*TODO: COLO checkpoint savevm loop*/
+    while (s->state == MIGRATION_STATUS_COLO) {
+        /* start a colo checkpoint */
+        ret = colo_do_checkpoint_transaction(s);
+        if (ret < 0) {
+            goto out;
+        }
+    }
 
 out:
     if (ret < 0) {
@@ -73,6 +185,31 @@ void migrate_start_colo_process(MigrationState *s)
     qemu_mutex_lock_iothread();
 }
 
+/*
+ * return:
+ * 0: start a checkpoint
+ * -1: some error happened, exit colo restore
+ */
+static int colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request)
+{
+    int ret;
+    uint32_t cmd;
+
+    ret = colo_get_cmd(f, &cmd);
+    if (ret < 0) {
+        /* do failover ? */
+        return ret;
+    }
+
+    switch (cmd) {
+    case COLO_COMMAND_CHECKPOINT_REQUEST:
+        *checkpoint_request = 1;
+        return 0;
+    default:
+        return -EINVAL;
+    }
+}
+
 void *colo_process_incoming_thread(void *opaque)
 {
     MigrationIncomingState *mis = opaque;
@@ -93,7 +230,49 @@ void *colo_process_incoming_thread(void *opaque)
     */
     qemu_set_block(qemu_get_fd(mis->from_src_file));
 
-    /* TODO: COLO checkpoint restore loop */
+
+    ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_CHECKPOINT_READY);
+    if (ret < 0) {
+        goto out;
+    }
+
+    while (mis->state == MIGRATION_STATUS_COLO) {
+        int request = 0;
+
+        ret = colo_wait_handle_cmd(mis->from_src_file, &request);
+        if (ret < 0) {
+            break;
+        } else {
+            if (!request) {
+                continue;
+            }
+        }
+        /* FIXME: This is unnecessary for periodic checkpoint mode */
+        ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_CHECKPOINT_REPLY);
+        if (ret < 0) {
+            goto out;
+        }
+
+        ret = colo_get_check_cmd(mis->from_src_file,
+                                 COLO_COMMAND_VMSTATE_SEND);
+        if (ret < 0) {
+            goto out;
+        }
+
+        /* TODO: read migration data into colo buffer */
+
+        ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_VMSTATE_RECEIVED);
+        if (ret < 0) {
+            goto out;
+        }
+
+        /* TODO: load vm state */
+
+        ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_VMSTATE_LOADED);
+        if (ret < 0) {
+            goto out;
+        }
+    }
 
 out:
     if (ret < 0) {
diff --git a/qapi-schema.json b/qapi-schema.json
index c9ff34e..85f7800 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -720,6 +720,31 @@
 { 'command': 'migrate-start-postcopy' }
 
 ##
+# @COLOCommand
+#
+# The commands for COLO fault tolerance
+#
+# @checkpoint-ready: SVM is ready for checkpointing
+#
+# @checkpoint-request: PVM tells SVM to prepare for a new checkpoint
+#
+# @checkpoint-reply: SVM has received PVM's checkpoint request
+#
+# @vmstate-send: VM's state will be sent by PVM.
+#
+# @vmstate-size: The total size of VMstate.
+#
+# @vmstate-received: VM's state has been received by SVM.
+#
+# @vmstate-loaded: VM's state has been loaded by SVM.
+#
+# Since: 2.6
+##
+{ 'enum': 'COLOCommand',
+  'data': [ 'checkpoint-ready', 'checkpoint-request', 'checkpoint-reply',
+            'vmstate-send', 'vmstate-size','vmstate-received',
+            'vmstate-loaded' ] }
+
 # @MouseInfo:
 #
 # Information about a mouse device.
diff --git a/trace-events b/trace-events
index 5565e79..39fdd8d 100644
--- a/trace-events
+++ b/trace-events
@@ -1579,6 +1579,8 @@ postcopy_ram_incoming_cleanup_join(void) ""
 
 # migration/colo.c
 colo_vm_state_change(const char *old, const char *new) "Change '%s' => '%s'"
+colo_put_cmd(const char *msg) "Send '%s' cmd"
+colo_get_cmd(const char *msg) "Receive '%s' cmd"
 
 # kvm-all.c
 kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v12 11/38] COLO: Add a new RunState RUN_STATE_COLO
  2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (9 preceding siblings ...)
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 10/38] COLO: Implement colo checkpoint protocol zhanghailiang
@ 2015-12-15  8:22 ` zhanghailiang
  2015-12-19  9:27   ` Markus Armbruster
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 12/38] QEMUSizedBuffer: Introduce two help functions for qsb zhanghailiang
                   ` (27 subsequent siblings)
  38 siblings, 1 reply; 94+ messages in thread
From: zhanghailiang @ 2015-12-15  8:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, Markus Armbruster, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, hongyang.yang

The guest enters this state when it is paused to save/restore VM state
during a COLO checkpoint.
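
For illustration only, here is a minimal sketch of how a table of allowed
(from, to) pairs gates state changes. It uses a reduced, hypothetical enum
(run_state, RS_*) and a hypothetical helper transition_allowed(); the pairs
listed are a subset of the entries in the vl.c table touched by this patch,
not the full table.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced set of run states for the sketch. */
enum run_state { RS_PAUSED, RS_RUNNING, RS_FINISH_MIGRATE, RS_COLO, RS_MAX };

struct transition {
    enum run_state from;
    enum run_state to;
};

/* A subset of the allowed transitions, including the new COLO ones. */
static const struct transition transitions_def[] = {
    { RS_PAUSED,         RS_RUNNING },
    { RS_PAUSED,         RS_COLO },
    { RS_FINISH_MIGRATE, RS_COLO },
    { RS_COLO,           RS_RUNNING },
    { RS_RUNNING,        RS_COLO },
};

/* Return true only if (from, to) is listed in the table above. */
static bool transition_allowed(enum run_state from, enum run_state to)
{
    size_t i;

    for (i = 0; i < sizeof(transitions_def) / sizeof(transitions_def[0]); i++) {
        if (transitions_def[i].from == from && transitions_def[i].to == to) {
            return true;
        }
    }
    return false;
}
```

In this reduced table, a running guest may enter the COLO state and resume
from it, while a pair that is not listed (for example COLO to paused) is
rejected.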

Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 qapi-schema.json | 5 ++++-
 vl.c             | 8 ++++++++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/qapi-schema.json b/qapi-schema.json
index 85f7800..0423b47 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -154,12 +154,15 @@
 # @watchdog: the watchdog action is configured to pause and has been triggered
 #
 # @guest-panicked: guest has been panicked as a result of guest OS panic
+#
+# @colo: guest is paused to save/restore VM state under colo checkpoint (since
+# 2.6)
 ##
 { 'enum': 'RunState',
   'data': [ 'debug', 'inmigrate', 'internal-error', 'io-error', 'paused',
             'postmigrate', 'prelaunch', 'finish-migrate', 'restore-vm',
             'running', 'save-vm', 'shutdown', 'suspended', 'watchdog',
-            'guest-panicked' ] }
+            'guest-panicked', 'colo' ] }
 
 ##
 # @StatusInfo:
diff --git a/vl.c b/vl.c
index f84fde8..fca630b 100644
--- a/vl.c
+++ b/vl.c
@@ -594,6 +594,7 @@ static const RunStateTransition runstate_transitions_def[] = {
     { RUN_STATE_INMIGRATE, RUN_STATE_WATCHDOG },
     { RUN_STATE_INMIGRATE, RUN_STATE_GUEST_PANICKED },
     { RUN_STATE_INMIGRATE, RUN_STATE_FINISH_MIGRATE },
+    { RUN_STATE_INMIGRATE, RUN_STATE_COLO },
 
     { RUN_STATE_INTERNAL_ERROR, RUN_STATE_PAUSED },
     { RUN_STATE_INTERNAL_ERROR, RUN_STATE_FINISH_MIGRATE },
@@ -603,6 +604,7 @@ static const RunStateTransition runstate_transitions_def[] = {
 
     { RUN_STATE_PAUSED, RUN_STATE_RUNNING },
     { RUN_STATE_PAUSED, RUN_STATE_FINISH_MIGRATE },
+    { RUN_STATE_PAUSED, RUN_STATE_COLO},
 
     { RUN_STATE_POSTMIGRATE, RUN_STATE_RUNNING },
     { RUN_STATE_POSTMIGRATE, RUN_STATE_FINISH_MIGRATE },
@@ -613,9 +615,12 @@ static const RunStateTransition runstate_transitions_def[] = {
 
     { RUN_STATE_FINISH_MIGRATE, RUN_STATE_RUNNING },
     { RUN_STATE_FINISH_MIGRATE, RUN_STATE_POSTMIGRATE },
+    { RUN_STATE_FINISH_MIGRATE, RUN_STATE_COLO},
 
     { RUN_STATE_RESTORE_VM, RUN_STATE_RUNNING },
 
+    { RUN_STATE_COLO, RUN_STATE_RUNNING },
+
     { RUN_STATE_RUNNING, RUN_STATE_DEBUG },
     { RUN_STATE_RUNNING, RUN_STATE_INTERNAL_ERROR },
     { RUN_STATE_RUNNING, RUN_STATE_IO_ERROR },
@@ -626,6 +631,7 @@ static const RunStateTransition runstate_transitions_def[] = {
     { RUN_STATE_RUNNING, RUN_STATE_SHUTDOWN },
     { RUN_STATE_RUNNING, RUN_STATE_WATCHDOG },
     { RUN_STATE_RUNNING, RUN_STATE_GUEST_PANICKED },
+    { RUN_STATE_RUNNING, RUN_STATE_COLO},
 
     { RUN_STATE_SAVE_VM, RUN_STATE_RUNNING },
 
@@ -636,9 +642,11 @@ static const RunStateTransition runstate_transitions_def[] = {
     { RUN_STATE_RUNNING, RUN_STATE_SUSPENDED },
     { RUN_STATE_SUSPENDED, RUN_STATE_RUNNING },
     { RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE },
+    { RUN_STATE_SUSPENDED, RUN_STATE_COLO},
 
     { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
     { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
+    { RUN_STATE_WATCHDOG, RUN_STATE_COLO},
 
     { RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING },
     { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE },
-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v12 12/38] QEMUSizedBuffer: Introduce two help functions for qsb
  2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (10 preceding siblings ...)
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 11/38] COLO: Add a new RunState RUN_STATE_COLO zhanghailiang
@ 2015-12-15  8:22 ` zhanghailiang
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 13/38] COLO: Save PVM state to secondary side when do checkpoint zhanghailiang
                   ` (26 subsequent siblings)
  38 siblings, 0 replies; 94+ messages in thread
From: zhanghailiang @ 2015-12-15  8:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

Introduce two new QEMUSizedBuffer APIs, which COLO will use to buffer
VM state:
One is qsb_put_buffer(), which puts the content of a given QEMUSizedBuffer
into a QEMUFile; it is used to send buffered VM state to the secondary.
The other is qsb_fill_buffer(), which reads 'size' bytes of data from the
file into a qsb; it is used to read VM state from the socket into a buffer.
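
As a standalone sketch of the chunk walk that qsb_put_buffer() performs
(illustration only; it does not use the QEMU types), the function below
copies at most 'size' bytes out of a chain of struct iovec chunks into a
flat buffer. The name iov_chain_copy_out() and the flat destination buffer
are assumptions made for the example; the real helper writes each chunk to
a QEMUFile instead.

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Copy at most 'size' bytes from a chain of iovec chunks into 'dst',
 * one chunk at a time, taking min(chunk length, bytes still wanted)
 * from each chunk. 'dst' must be large enough for the bytes actually
 * copied. Returns the number of bytes copied.
 */
static size_t iov_chain_copy_out(const struct iovec *iov, int n_iov,
                                 unsigned char *dst, size_t size)
{
    size_t done = 0;
    int i;

    for (i = 0; i < n_iov && size > 0; i++) {
        size_t l = iov[i].iov_len < size ? iov[i].iov_len : size;

        memcpy(dst + done, iov[i].iov_base, l);
        done += l;
        size -= l;
    }
    return done;
}
```

The loop stops either when 'size' bytes have been taken or when the chain
is exhausted, so asking for more than the chain holds simply returns the
total chain length.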

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v11:
- size_t'ify these two help functions (Dave's suggestion)

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 include/migration/qemu-file.h |  3 ++-
 migration/qemu-file-buf.c     | 61 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index b5d08d2..ca6a582 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -150,7 +150,8 @@ ssize_t qsb_get_buffer(const QEMUSizedBuffer *, off_t start, size_t count,
                        uint8_t *buf);
 ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t *buf,
                      off_t pos, size_t count);
-
+void qsb_put_buffer(QEMUFile *f, QEMUSizedBuffer *qsb, size_t size);
+size_t qsb_fill_buffer(QEMUSizedBuffer *qsb, QEMUFile *f, size_t size);
 
 /*
  * For use on files opened with qemu_bufopen
diff --git a/migration/qemu-file-buf.c b/migration/qemu-file-buf.c
index 49516b8..c50a495 100644
--- a/migration/qemu-file-buf.c
+++ b/migration/qemu-file-buf.c
@@ -366,6 +366,67 @@ ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t *source,
     return count;
 }
 
+/**
+ * Put the content of a given QEMUSizedBuffer into QEMUFile.
+ *
+ * @f: A QEMUFile
+ * @qsb: A QEMUSizedBuffer
+ * @size: size of content to write
+ */
+void qsb_put_buffer(QEMUFile *f, QEMUSizedBuffer *qsb, size_t size)
+{
+    size_t l;
+    int i;
+
+    for (i = 0; i < qsb->n_iov && size > 0; i++) {
+        l = MIN(qsb->iov[i].iov_len, size);
+        qemu_put_buffer(f, qsb->iov[i].iov_base, l);
+        size -= l;
+    }
+}
+
+/*
+ * Read 'size' bytes of data from the file into qsb.
+ * Always fills from position 0; intended to be used after qsb_create().
+ *
+ * It will return size bytes unless there was an error, in which case it will
+ * return as many as it managed to read (assuming blocking fd's which
+ * all current QEMUFile are)
+ */
+size_t qsb_fill_buffer(QEMUSizedBuffer *qsb, QEMUFile *f, size_t size)
+{
+    ssize_t rc = qsb_grow(qsb, size);
+    ssize_t pending = size;
+    int i;
+    uint8_t *buf = NULL;
+
+    qsb->used = 0;
+
+    if (rc < 0) {
+        return rc;
+    }
+
+    for (i = 0; i < qsb->n_iov && pending > 0; i++) {
+        size_t doneone = 0;
+        /* read until iov full */
+        while (doneone < qsb->iov[i].iov_len && pending > 0) {
+            size_t readone = 0;
+
+            buf = (uint8_t *)qsb->iov[i].iov_base + doneone;
+            readone = qemu_get_buffer(f, buf,
+                                MIN(qsb->iov[i].iov_len - doneone, pending));
+            if (readone == 0) {
+                return qsb->used;
+            }
+            buf += readone;
+            doneone += readone;
+            pending -= readone;
+            qsb->used += readone;
+        }
+    }
+    return qsb->used;
+}
+
 typedef struct QEMUBuffer {
     QEMUSizedBuffer *qsb;
     QEMUFile *file;
-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v12 13/38] COLO: Save PVM state to secondary side when do checkpoint
  2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (11 preceding siblings ...)
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 12/38] QEMUSizedBuffer: Introduce two help functions for qsb zhanghailiang
@ 2015-12-15  8:22 ` zhanghailiang
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 14/38] ram: Split host_from_stream_offset() into two helper functions zhanghailiang
                   ` (25 subsequent siblings)
  38 siblings, 0 replies; 94+ messages in thread
From: zhanghailiang @ 2015-12-15  8:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

The main job of a checkpoint is to synchronize the SVM with the PVM.
A VM's state includes RAM and device state, so we migrate the PVM's
state to the SVM when doing a checkpoint, just as migration does.

We cache the PVM's state on the secondary side, using a QEMUSizedBuffer
to store the data. Because the secondary needs to know the size of the VM
state, the primary first stores the VM state temporarily in a qsb, gets
the data size by calling qsb_get_length(), and then migrates the data to
the qsb on the secondary side.
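
To make the size announcement concrete, here is a small standalone sketch
of the byte layout written by colo_put_cmd_value() in this patch: a 32-bit
command followed by a 64-bit value, both big-endian. The helper names
put_cmd_value()/get_cmd_value() and the fixed 12-byte array are assumptions
for the example; the patch itself goes through qemu_put_be32() and
qemu_put_be64() on a QEMUFile.

```c
#include <stddef.h>
#include <stdint.h>

#define CMD_VALUE_LEN 12   /* 4-byte command + 8-byte value */

/* Store a 32-bit command followed by a 64-bit value, both big-endian. */
static void put_cmd_value(uint8_t out[CMD_VALUE_LEN],
                          uint32_t cmd, uint64_t value)
{
    int i;

    for (i = 0; i < 4; i++) {
        out[i] = (uint8_t)(cmd >> (24 - 8 * i));
    }
    for (i = 0; i < 8; i++) {
        out[4 + i] = (uint8_t)(value >> (56 - 8 * i));
    }
}

/* Inverse of put_cmd_value(): rebuild command and value from the bytes. */
static void get_cmd_value(const uint8_t in[CMD_VALUE_LEN],
                          uint32_t *cmd, uint64_t *value)
{
    int i;

    *cmd = 0;
    *value = 0;
    for (i = 0; i < 4; i++) {
        *cmd = (*cmd << 8) | in[i];
    }
    for (i = 0; i < 8; i++) {
        *value = (*value << 8) | in[4 + i];
    }
}
```

With this prefix the receiver knows how many bytes of VM state follow
before it starts reading them into its own buffer.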

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v12:
- Replace the old colo_ctl_get() with the new helper function colo_put_cmd_value()
v11:
- Add Reviewed-by tag

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 migration/colo.c | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++++----
 migration/ram.c  | 39 +++++++++++++++++++-------
 2 files changed, 108 insertions(+), 15 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 0ce2a6e..42bc6ef 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -17,6 +17,9 @@
 #include "qemu/error-report.h"
 #include "qemu/sockets.h"
 
+/* colo buffer */
+#define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
+
 bool colo_supported(void)
 {
     return true;
@@ -53,6 +56,22 @@ static int colo_put_cmd(QEMUFile *f, uint32_t cmd)
     return ret;
 }
 
+static int colo_put_cmd_value(QEMUFile *f, uint32_t cmd, uint64_t value)
+{
+    int ret;
+
+    ret = colo_put_cmd(f, cmd);
+    if (ret < 0) {
+        return ret;
+    }
+    qemu_put_be64(f, value);
+    qemu_fflush(f);
+
+    ret = qemu_file_get_error(f);
+
+    return ret;
+}
+
 static int colo_get_cmd(QEMUFile *f, uint32_t *cmd)
 {
     int ret;
@@ -88,9 +107,12 @@ static int colo_get_check_cmd(QEMUFile *f, uint32_t expect_cmd)
     return 0;
 }
 
-static int colo_do_checkpoint_transaction(MigrationState *s)
+static int colo_do_checkpoint_transaction(MigrationState *s,
+                                          QEMUSizedBuffer *buffer)
 {
     int ret;
+    size_t size;
+    QEMUFile *trans = NULL;
 
     ret = colo_put_cmd(s->to_dst_file, COLO_COMMAND_CHECKPOINT_REQUEST);
     if (ret < 0) {
@@ -102,15 +124,47 @@ static int colo_do_checkpoint_transaction(MigrationState *s)
     if (ret < 0) {
         goto out;
     }
+    /* Reset colo buffer and open it for write */
+    qsb_set_length(buffer, 0);
+    trans = qemu_bufopen("w", buffer);
+    if (!trans) {
+        error_report("Open colo buffer for write failed");
+        goto out;
+    }
 
-    /* TODO: suspend and save vm state to colo buffer */
+    qemu_mutex_lock_iothread();
+    vm_stop_force_state(RUN_STATE_COLO);
+    qemu_mutex_unlock_iothread();
+    trace_colo_vm_state_change("run", "stop");
+
+    /* Disable block migration */
+    s->params.blk = 0;
+    s->params.shared = 0;
+    qemu_savevm_state_header(trans);
+    qemu_savevm_state_begin(trans, &s->params);
+    qemu_mutex_lock_iothread();
+    qemu_savevm_state_complete_precopy(trans, false);
+    qemu_mutex_unlock_iothread();
+
+    qemu_fflush(trans);
 
     ret = colo_put_cmd(s->to_dst_file, COLO_COMMAND_VMSTATE_SEND);
     if (ret < 0) {
         goto out;
     }
+    /* we send the total size of the vmstate first */
+    size = qsb_get_length(buffer);
+    ret = colo_put_cmd_value(s->to_dst_file, COLO_COMMAND_VMSTATE_SIZE, size);
+    if (ret < 0) {
+        goto out;
+    }
 
-    /* TODO: send vmstate to Secondary */
+    qsb_put_buffer(s->to_dst_file, buffer, size);
+    qemu_fflush(s->to_dst_file);
+    ret = qemu_file_get_error(s->to_dst_file);
+    if (ret < 0) {
+        goto out;
+    }
 
     ret = colo_get_check_cmd(s->rp_state.from_dst_file,
                              COLO_COMMAND_VMSTATE_RECEIVED);
@@ -124,14 +178,24 @@ static int colo_do_checkpoint_transaction(MigrationState *s)
         goto out;
     }
 
-    /* TODO: resume Primary */
+    ret = 0;
+    /* Resume primary guest */
+    qemu_mutex_lock_iothread();
+    vm_start();
+    qemu_mutex_unlock_iothread();
+    trace_colo_vm_state_change("stop", "run");
 
 out:
+    if (trans) {
+        qemu_fclose(trans);
+    }
+
     return ret;
 }
 
 static void colo_process_checkpoint(MigrationState *s)
 {
+    QEMUSizedBuffer *buffer = NULL;
     int ret = 0;
 
     s->rp_state.from_dst_file = qemu_file_get_return_path(s->to_dst_file);
@@ -151,6 +215,13 @@ static void colo_process_checkpoint(MigrationState *s)
         goto out;
     }
 
+    buffer = qsb_create(NULL, COLO_BUFFER_BASE_SIZE);
+    if (buffer == NULL) {
+        ret = -ENOMEM;
+        error_report("Failed to allocate colo buffer!");
+        goto out;
+    }
+
     qemu_mutex_lock_iothread();
     vm_start();
     qemu_mutex_unlock_iothread();
@@ -158,7 +229,7 @@ static void colo_process_checkpoint(MigrationState *s)
 
     while (s->state == MIGRATION_STATUS_COLO) {
         /* start a colo checkpoint */
-        ret = colo_do_checkpoint_transaction(s);
+        ret = colo_do_checkpoint_transaction(s, buffer);
         if (ret < 0) {
             goto out;
         }
@@ -171,6 +242,9 @@ out:
     migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
                       MIGRATION_STATUS_COMPLETED);
 
+    qsb_free(buffer);
+    buffer = NULL;
+
     if (s->rp_state.from_dst_file) {
         qemu_fclose(s->rp_state.from_dst_file);
     }
diff --git a/migration/ram.c b/migration/ram.c
index 0490f00..a709471 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -39,6 +39,7 @@
 #include "trace.h"
 #include "exec/ram_addr.h"
 #include "qemu/rcu_queue.h"
+#include "migration/colo.h"
 
 #ifdef DEBUG_MIGRATION_RAM
 #define DPRINTF(fmt, ...) \
@@ -1866,16 +1867,8 @@ err:
     return ret;
 }
 
-
-/* Each of ram_save_setup, ram_save_iterate and ram_save_complete has
- * long-running RCU critical section.  When rcu-reclaims in the code
- * start to become numerous it will be necessary to reduce the
- * granularity of these critical sections.
- */
-
-static int ram_save_setup(QEMUFile *f, void *opaque)
+static int ram_save_init_globals(void)
 {
-    RAMBlock *block;
     int64_t ram_bitmap_pages; /* Size of bitmap in pages, including gaps */
 
     dirty_rate_high_cnt = 0;
@@ -1940,6 +1933,31 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     migration_bitmap_sync();
     qemu_mutex_unlock_ramlist();
     qemu_mutex_unlock_iothread();
+    rcu_read_unlock();
+
+    return 0;
+}
+
+/* Each of ram_save_setup, ram_save_iterate and ram_save_complete has
+ * long-running RCU critical section.  When rcu-reclaims in the code
+ * start to become numerous it will be necessary to reduce the
+ * granularity of these critical sections.
+ */
+
+static int ram_save_setup(QEMUFile *f, void *opaque)
+{
+    RAMBlock *block;
+
+    /*
+     * migration has already setup the bitmap, reuse it.
+     */
+    if (!migration_in_colo_state()) {
+        if (ram_save_init_globals() < 0) {
+            return -1;
+        }
+    }
+
+    rcu_read_lock();
 
     qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
 
@@ -2041,7 +2059,8 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
     while (true) {
         int pages;
 
-        pages = ram_find_and_save_block(f, true, &bytes_transferred);
+        pages = ram_find_and_save_block(f, !migration_in_colo_state(),
+                                        &bytes_transferred);
         /* no more blocks to sent */
         if (pages == 0) {
             break;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v12 14/38] ram: Split host_from_stream_offset() into two helper functions
  2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (12 preceding siblings ...)
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 13/38] COLO: Save PVM state to secondary side when do checkpoint zhanghailiang
@ 2015-12-15  8:22 ` zhanghailiang
  2015-12-18 15:18   ` Dr. David Alan Gilbert
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 15/38] COLO: Load PVM's dirty pages into SVM's RAM cache temporarily zhanghailiang
                   ` (24 subsequent siblings)
  38 siblings, 1 reply; 94+ messages in thread
From: zhanghailiang @ 2015-12-15  8:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

Split host_from_stream_offset() into two parts:
one gets the RAM block, whose idstr may be read from the migration
stream; the other gets the host virtual address (hva) from the block and
the offset. In addition, the validity check is done in a new helper,
offset_in_ramblock().
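
The shape of the split can be sketched outside QEMU as follows. Block,
block_by_name(), offset_in_block() and host_from_block_offset() are
simplified, hypothetical stand-ins for RAMBlock and the helpers touched by
this patch; the real code reads the block name from the stream and keeps
the last block for RAM_SAVE_FLAG_CONTINUE, which is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for QEMU's RAMBlock; illustration only. */
typedef struct Block {
    const char *idstr;
    uint8_t *host;
    size_t used_length;
} Block;

/* Step 1: find the block by the name taken from the stream. */
static Block *block_by_name(Block *blocks, size_t n, const char *id)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(blocks[i].idstr, id) == 0) {
            return &blocks[i];
        }
    }
    return NULL;
}

/* Validity check, in the spirit of offset_in_ramblock(). */
static int offset_in_block(const Block *b, size_t offset)
{
    return b && b->host && offset < b->used_length;
}

/* Step 2: translate (block, offset) into a host pointer, or NULL. */
static void *host_from_block_offset(Block *b, size_t offset)
{
    return offset_in_block(b, offset) ? b->host + offset : NULL;
}
```

Keeping the lookup separate lets a caller translate the same offset
against a different base pointer of the same block later on.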

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
v12:
- Remove the offset parameter for ram_block_from_stream() and
  check the validity of the related value in a new helper. (Dave's suggestion)
v11:
- New patch

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 include/exec/ram_addr.h |  8 ++++++--
 migration/ram.c         | 40 +++++++++++++++++++++++++---------------
 2 files changed, 31 insertions(+), 17 deletions(-)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 7115154..2b31279 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -38,10 +38,14 @@ struct RAMBlock {
     int fd;
 };
 
+static inline bool offset_in_ramblock(RAMBlock *b, ram_addr_t offset)
+{
+    return (b && b->host && offset < b->used_length) ? true : false;
+}
+
 static inline void *ramblock_ptr(RAMBlock *block, ram_addr_t offset)
 {
-    assert(offset < block->used_length);
-    assert(block->host);
+    assert(offset_in_ramblock(block, offset));
     return (char *)block->host + offset;
 }
 
diff --git a/migration/ram.c b/migration/ram.c
index a709471..09fe6e6 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2138,28 +2138,24 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
  * Returns a pointer from within the RCU-protected ram_list.
  */
 /*
- * Read a RAMBlock ID from the stream f, find the host address of the
- * start of that block and add on 'offset'
+ * Read a RAMBlock ID from the stream f.
  *
  * f: Stream to read from
- * offset: Offset within the block
  * flags: Page flags (mostly to see if it's a continuation of previous block)
  */
-static inline void *host_from_stream_offset(QEMUFile *f,
-                                            ram_addr_t offset,
-                                            int flags)
+static inline RAMBlock *ram_block_from_stream(QEMUFile *f,
+                                              int flags)
 {
     static RAMBlock *block = NULL;
     char id[256];
     uint8_t len;
 
     if (flags & RAM_SAVE_FLAG_CONTINUE) {
-        if (!block || block->max_length <= offset) {
+        if (!block) {
             error_report("Ack, bad migration stream!");
             return NULL;
         }
-
-        return block->host + offset;
+        return block;
     }
 
     len = qemu_get_byte(f);
@@ -2167,12 +2163,22 @@ static inline void *host_from_stream_offset(QEMUFile *f,
     id[len] = 0;
 
     block = qemu_ram_block_by_name(id);
-    if (block && block->max_length > offset) {
-        return block->host + offset;
+    if (!block) {
+        error_report("Can't find block %s", id);
+        return NULL;
     }
 
-    error_report("Can't find block %s", id);
-    return NULL;
+    return block;
+}
+
+static inline void *host_from_ram_block_offset(RAMBlock *block,
+                                               ram_addr_t offset)
+{
+    if (!offset_in_ramblock(block, offset)) {
+        return NULL;
+    }
+
+    return block->host + offset;
 }
 
 /*
@@ -2319,7 +2325,9 @@ static int ram_load_postcopy(QEMUFile *f)
         trace_ram_load_postcopy_loop((uint64_t)addr, flags);
         place_needed = false;
         if (flags & (RAM_SAVE_FLAG_COMPRESS | RAM_SAVE_FLAG_PAGE)) {
-            host = host_from_stream_offset(f, addr, flags);
+            RAMBlock *block = ram_block_from_stream(f, flags);
+
+            host = host_from_ram_block_offset(block, addr);
             if (!host) {
                 error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
                 ret = -EINVAL;
@@ -2450,7 +2458,9 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 
         if (flags & (RAM_SAVE_FLAG_COMPRESS | RAM_SAVE_FLAG_PAGE |
                      RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
-            host = host_from_stream_offset(f, addr, flags);
+            RAMBlock *block = ram_block_from_stream(f, flags);
+
+            host = host_from_ram_block_offset(block, addr);
             if (!host) {
                 error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
                 ret = -EINVAL;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v12 15/38] COLO: Load PVM's dirty pages into SVM's RAM cache temporarily
  2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (13 preceding siblings ...)
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 14/38] ram: Split host_from_stream_offset() into two helper functions zhanghailiang
@ 2015-12-15  8:22 ` zhanghailiang
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 16/38] ram/COLO: Record the dirty pages that SVM received zhanghailiang
                   ` (23 subsequent siblings)
  38 siblings, 0 replies; 94+ messages in thread
From: zhanghailiang @ 2015-12-15  8:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

We should not load the PVM's state directly into the SVM, because errors
may occur while the SVM is receiving data, which would break the SVM.

We need to ensure that all data has been received before loading the state
into the SVM. We use extra memory to cache this data (the PVM's RAM). The
RAM cache on the secondary side is initially the same as the SVM's/PVM's
memory. During a checkpoint, we first cache the PVM's dirty pages in this
RAM cache, so the RAM cache is always the same as the PVM's memory at every
checkpoint; we then flush the cached RAM to the SVM after we have received
all of the PVM's state.
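
A toy model of the staging idea, for illustration only: ToyBlock and the
toy_cache_*() functions are hypothetical, use malloc() rather than
qemu_anon_ram_alloc(), and assume a 4 KiB page. Incoming pages only ever
touch the cache, so a failure part-way through a checkpoint leaves the
memory the guest runs on unchanged.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u   /* assumed page size for the sketch */

/* Toy model of one RAM block and its cache; not QEMU's RAMBlock. */
typedef struct {
    uint8_t *host;    /* memory the secondary guest runs on */
    uint8_t *cache;   /* staging copy that receives incoming pages */
    size_t len;
} ToyBlock;

/* Allocate the cache and make it an exact copy of guest memory. */
static int toy_cache_init(ToyBlock *b)
{
    b->cache = malloc(b->len);
    if (!b->cache) {
        return -1;
    }
    memcpy(b->cache, b->host, b->len);
    return 0;
}

/* An incoming page lands in the cache only; guest memory is untouched. */
static int toy_cache_load_page(ToyBlock *b, size_t offset,
                               const uint8_t *page)
{
    if (offset % TOY_PAGE_SIZE || offset >= b->len ||
        b->len - offset < TOY_PAGE_SIZE) {
        return -1;
    }
    memcpy(b->cache + offset, page, TOY_PAGE_SIZE);
    return 0;
}

/* Drop the cache again, e.g. when leaving COLO mode. */
static void toy_cache_release(ToyBlock *b)
{
    free(b->cache);
    b->cache = NULL;
}
```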

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v12:
- Fix minor error in error_report (Dave's comment)
- Add Reviewed-by tag
v11:
- Rename 'host_cache' to 'colo_cache' (Dave's suggestion)
v10:
- Split the process of dirty pages recording into a new patch

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 include/exec/ram_addr.h       |  1 +
 include/migration/migration.h |  4 +++
 migration/colo.c              |  9 ++++++
 migration/ram.c               | 73 ++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 86 insertions(+), 1 deletion(-)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 2b31279..962d322 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -26,6 +26,7 @@ struct RAMBlock {
     struct rcu_head rcu;
     struct MemoryRegion *mr;
     uint8_t *host;
+    uint8_t *colo_cache; /* For colo, VM's ram cache */
     ram_addr_t offset;
     ram_addr_t used_length;
     ram_addr_t max_length;
diff --git a/include/migration/migration.h b/include/migration/migration.h
index ba5bcec..e41372d 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -332,4 +332,8 @@ int ram_save_queue_pages(MigrationState *ms, const char *rbname,
 PostcopyState postcopy_state_get(void);
 /* Set the state and return the old state */
 PostcopyState postcopy_state_set(PostcopyState new_state);
+
+/* ram cache */
+int colo_init_ram_cache(void);
+void colo_release_ram_cache(void);
 #endif
diff --git a/migration/colo.c b/migration/colo.c
index 42bc6ef..5ff4946 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -304,6 +304,11 @@ void *colo_process_incoming_thread(void *opaque)
     */
     qemu_set_block(qemu_get_fd(mis->from_src_file));
 
+    ret = colo_init_ram_cache();
+    if (ret < 0) {
+        error_report("Failed to initialize ram cache");
+        goto out;
+    }
 
     ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_CHECKPOINT_READY);
     if (ret < 0) {
@@ -354,6 +359,10 @@ out:
                      strerror(-ret));
     }
 
+    qemu_mutex_lock_iothread();
+    colo_release_ram_cache();
+    qemu_mutex_unlock_iothread();
+
     if (mis->to_src_file) {
         qemu_fclose(mis->to_src_file);
     }
diff --git a/migration/ram.c b/migration/ram.c
index 09fe6e6..db5096a 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -223,6 +223,7 @@ static RAMBlock *last_sent_block;
 static ram_addr_t last_offset;
 static QemuMutex migration_bitmap_mutex;
 static uint64_t migration_dirty_pages;
+static bool ram_cache_enable;
 static uint32_t last_version;
 static bool ram_bulk_stage;
 
@@ -2181,6 +2182,20 @@ static inline void *host_from_ram_block_offset(RAMBlock *block,
     return block->host + offset;
 }
 
+static inline void *colo_cache_from_block_offset(RAMBlock *block,
+                                                 ram_addr_t offset)
+{
+    if (!offset_in_ramblock(block, offset)) {
+        return NULL;
+    }
+    if (!block->colo_cache) {
+        error_report("%s: colo_cache is NULL in block :%s",
+                     __func__, block->idstr);
+        return NULL;
+    }
+    return block->colo_cache + offset;
+}
+
 /*
  * If a page (or a whole RDMA chunk) has been
  * determined to be zero, then zap it.
@@ -2460,7 +2475,12 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
                      RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
             RAMBlock *block = ram_block_from_stream(f, flags);
 
-            host = host_from_ram_block_offset(block, addr);
+            /* After going into COLO, we should load the Page into colo_cache */
+            if (ram_cache_enable) {
+                host = colo_cache_from_block_offset(block, addr);
+            } else {
+                host = host_from_ram_block_offset(block, addr);
+            }
             if (!host) {
                 error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
                 ret = -EINVAL;
@@ -2556,6 +2576,57 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
     return ret;
 }
 
+/*
+ * colo cache: this is for secondary VM, we cache the whole
+ * memory of the secondary VM, it will be called after first migration.
+ */
+int colo_init_ram_cache(void)
+{
+    RAMBlock *block;
+
+    rcu_read_lock();
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+        block->colo_cache = qemu_anon_ram_alloc(block->used_length, NULL);
+        if (!block->colo_cache) {
+            error_report("%s: Can't alloc memory for colo cache of block %s,"
+                         "size 0x" RAM_ADDR_FMT, __func__, block->idstr,
+                         block->used_length);
+            goto out_locked;
+        }
+        memcpy(block->colo_cache, block->host, block->used_length);
+    }
+    rcu_read_unlock();
+    ram_cache_enable = true;
+    return 0;
+
+out_locked:
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+        if (block->colo_cache) {
+            qemu_anon_ram_free(block->colo_cache, block->used_length);
+            block->colo_cache = NULL;
+        }
+    }
+
+    rcu_read_unlock();
+    return -errno;
+}
+
+void colo_release_ram_cache(void)
+{
+    RAMBlock *block;
+
+    ram_cache_enable = false;
+
+    rcu_read_lock();
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+        if (block->colo_cache) {
+            qemu_anon_ram_free(block->colo_cache, block->used_length);
+            block->colo_cache = NULL;
+        }
+    }
+    rcu_read_unlock();
+}
+
 static SaveVMHandlers savevm_ram_handlers = {
     .save_live_setup = ram_save_setup,
     .save_live_iterate = ram_save_iterate,
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v12 16/38] ram/COLO: Record the dirty pages that SVM received
  2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (14 preceding siblings ...)
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 15/38] COLO: Load PVM's dirty pages into SVM's RAM cache temporarily zhanghailiang
@ 2015-12-15  8:22 ` zhanghailiang
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 17/38] COLO: Load VMState into qsb before restore it zhanghailiang
                   ` (22 subsequent siblings)
  38 siblings, 0 replies; 94+ messages in thread
From: zhanghailiang @ 2015-12-15  8:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

We record the addresses of the dirty pages that are received,
which helps when flushing the cached pages into the SVM.
We record them by reusing the migration dirty bitmap.
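
The bookkeeping amounts to "set the bit, and count the page only if the
bit was clear before". A minimal sketch, assuming a plain array of
unsigned long as the bitmap; toy_test_and_set_bit() and toy_record_dirty()
are hypothetical names, and unlike QEMU's test_and_set_bit() this version
is not atomic.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Non-atomic test-and-set: returns the old bit value (0 or 1). */
static int toy_test_and_set_bit(size_t nr, unsigned long *map)
{
    unsigned long mask = 1UL << (nr % BITS_PER_WORD);
    unsigned long *word = &map[nr / BITS_PER_WORD];
    int old = (*word & mask) != 0;

    *word |= mask;
    return old;
}

/* Mark a received page dirty; count it only the first time. */
static void toy_record_dirty(size_t page, unsigned long *map,
                             size_t *dirty_pages)
{
    if (!toy_test_and_set_bit(page, map)) {
        (*dirty_pages)++;
    }
}
```

Counting only on the 0 -> 1 transition keeps the counter equal to the
number of set bits even if the same page arrives more than once.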

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v12:
- Add Reviewed-by tag
v11:
- Split a new helper function from original
  host_from_stream_offset() (Dave's suggestion)
- Only do recording work in this patch
v10:
- New patch split from v9's patch 13
- Rebase to master to use 'migration_bitmap_rcu'

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 migration/ram.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index db5096a..3d5947b 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2185,6 +2185,9 @@ static inline void *host_from_ram_block_offset(RAMBlock *block,
 static inline void *colo_cache_from_block_offset(RAMBlock *block,
                                                  ram_addr_t offset)
 {
+    unsigned long *bitmap;
+    long k;
+
     if (!offset_in_ramblock(block, offset)) {
         return NULL;
     }
@@ -2193,6 +2196,17 @@ static inline void *colo_cache_from_block_offset(RAMBlock *block,
                      __func__, block->idstr);
         return NULL;
     }
+
+    k = (block->mr->ram_addr + offset) >> TARGET_PAGE_BITS;
+    bitmap = atomic_rcu_read(&migration_bitmap_rcu)->bmap;
+    /*
+    * During colo checkpoint, we need bitmap of these migrated pages.
+    * It helps us to decide which pages in ram cache should be flushed
+    * into VM's RAM later.
+    */
+    if (!test_and_set_bit(k, bitmap)) {
+        migration_dirty_pages++;
+    }
     return block->colo_cache + offset;
 }
 
@@ -2583,6 +2597,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 int colo_init_ram_cache(void)
 {
     RAMBlock *block;
+    int64_t ram_cache_pages = last_ram_offset() >> TARGET_PAGE_BITS;
 
     rcu_read_lock();
     QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
@@ -2597,6 +2612,15 @@ int colo_init_ram_cache(void)
     }
     rcu_read_unlock();
     ram_cache_enable = true;
+    /*
+    * Record the dirty pages sent by the PVM. This dirty bitmap is used to
+    * decide which pages in the cache should be flushed into SVM's RAM. Here
+    * we use the same name 'migration_bitmap_rcu' as for migration.
+    */
+    migration_bitmap_rcu = g_new0(struct BitmapRcu, 1);
+    migration_bitmap_rcu->bmap = bitmap_new(ram_cache_pages);
+    migration_dirty_pages = 0;
+
     return 0;
 
 out_locked:
@@ -2614,9 +2638,15 @@ out_locked:
 void colo_release_ram_cache(void)
 {
     RAMBlock *block;
+    struct BitmapRcu *bitmap = migration_bitmap_rcu;
 
     ram_cache_enable = false;
 
+    atomic_rcu_set(&migration_bitmap_rcu, NULL);
+    if (bitmap) {
+        call_rcu(bitmap, migration_bitmap_free, rcu);
+    }
+
     rcu_read_lock();
     QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
         if (block->colo_cache) {
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v12 17/38] COLO: Load VMState into qsb before restore it
  2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (15 preceding siblings ...)
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 16/38] ram/COLO: Record the dirty pages that SVM received zhanghailiang
@ 2015-12-15  8:22 ` zhanghailiang
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 18/38] COLO: Flush PVM's cached RAM into SVM's memory zhanghailiang
                   ` (21 subsequent siblings)
  38 siblings, 0 replies; 94+ messages in thread
From: zhanghailiang @ 2015-12-15  8:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

We should not destroy the state of the SVM (Secondary VM) until we have
received the whole state from the PVM (Primary VM), in case the primary
fails in the middle of sending the state. So here we cache the device state
on the secondary side before restoring it.

Besides, we should call qemu_system_reset() before loading the VM state,
which ensures the data is intact.
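
The ordering described above (receive everything, then reset, then load)
can be sketched with a toy device. ToyDevice and toy_apply_checkpoint()
are hypothetical and only model the control flow; they are not the
qemu_loadvm_state() path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy device state; stands in for whatever the real loader restores. */
typedef struct {
    uint8_t regs[8];
} ToyDevice;

/*
 * Apply a checkpoint only if all 'expected' bytes arrived.
 * 'got' is how many bytes the transport delivered into 'staging'.
 * Returns 0 on success, -1 if the checkpoint is incomplete or malformed.
 */
static int toy_apply_checkpoint(ToyDevice *dev, const uint8_t *staging,
                                size_t got, size_t expected)
{
    if (got != expected || expected != sizeof(dev->regs)) {
        return -1;              /* live state is left exactly as it was */
    }
    memset(dev, 0, sizeof(*dev));           /* reset first ... */
    memcpy(dev->regs, staging, expected);   /* ... then load the new state */
    return 0;
}
```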

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>

---
v12:
- Use the new helper colo_get_cmd_value() instead of colo_ctl_get()

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 migration/colo.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 66 insertions(+), 2 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 5ff4946..a4d49ff 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -107,6 +107,21 @@ static int colo_get_check_cmd(QEMUFile *f, uint32_t expect_cmd)
     return 0;
 }
 
+static int colo_get_cmd_value(QEMUFile *f, uint32_t expect_cmd, uint64_t *value)
+{
+    int ret;
+
+    ret = colo_get_check_cmd(f, expect_cmd);
+    if (ret < 0) {
+        return ret;
+    }
+
+    *value = qemu_get_be64(f);
+    ret = qemu_file_get_error(f);
+
+    return ret;
+}
+
 static int colo_do_checkpoint_transaction(MigrationState *s,
                                           QEMUSizedBuffer *buffer)
 {
@@ -287,7 +302,11 @@ static int colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request)
 void *colo_process_incoming_thread(void *opaque)
 {
     MigrationIncomingState *mis = opaque;
+    QEMUFile *fb = NULL;
+    QEMUSizedBuffer *buffer = NULL; /* Cache incoming device state */
+    uint64_t total_size;
     int ret = 0;
+    uint64_t value;
 
     migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
                       MIGRATION_STATUS_COLO);
@@ -310,6 +329,12 @@ void *colo_process_incoming_thread(void *opaque)
         goto out;
     }
 
+    buffer = qsb_create(NULL, COLO_BUFFER_BASE_SIZE);
+    if (buffer == NULL) {
+        error_report("Failed to allocate colo buffer!");
+        goto out;
+    }
+
     ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_CHECKPOINT_READY);
     if (ret < 0) {
         goto out;
@@ -338,19 +363,53 @@ void *colo_process_incoming_thread(void *opaque)
             goto out;
         }
 
-        /* TODO: read migration data into colo buffer */
+        /* read the VM state total size first */
+        ret = colo_get_cmd_value(mis->from_src_file,
+                                 COLO_COMMAND_VMSTATE_SIZE, &value);
+        if (ret < 0) {
+            error_report("%s: Failed to get vmstate size", __func__);
+            goto out;
+        }
+
+        /* read vm device state into colo buffer */
+        total_size = qsb_fill_buffer(buffer, mis->from_src_file, value);
+        if (total_size != value) {
+            error_report("Got %lu VMState data, less than expected %lu",
+                         total_size, value);
+            ret = -EINVAL;
+            goto out;
+        }
 
         ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_VMSTATE_RECEIVED);
         if (ret < 0) {
             goto out;
         }
 
-        /* TODO: load vm state */
+        /* open colo buffer for read */
+        fb = qemu_bufopen("r", buffer);
+        if (!fb) {
+            error_report("can't open colo buffer for read");
+            goto out;
+        }
+
+        qemu_mutex_lock_iothread();
+        qemu_system_reset(VMRESET_SILENT);
+        if (qemu_loadvm_state(fb) < 0) {
+            error_report("COLO: loadvm failed");
+            qemu_mutex_unlock_iothread();
+            goto out;
+        }
+        qemu_mutex_unlock_iothread();
+
+        /* TODO: flush vm state */
 
         ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_VMSTATE_LOADED);
         if (ret < 0) {
             goto out;
         }
+
+        qemu_fclose(fb);
+        fb = NULL;
     }
 
 out:
@@ -359,6 +418,11 @@ out:
                      strerror(-ret));
     }
 
+    if (fb) {
+        qemu_fclose(fb);
+    }
+    qsb_free(buffer);
+
     qemu_mutex_lock_iothread();
     colo_release_ram_cache();
     qemu_mutex_unlock_iothread();
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v12 18/38] COLO: Flush PVM's cached RAM into SVM's memory
  2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (16 preceding siblings ...)
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 17/38] COLO: Load VMState into qsb before restore it zhanghailiang
@ 2015-12-15  8:22 ` zhanghailiang
  2015-12-15 11:07   ` Changlong Xie
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 19/38] COLO: Add checkpoint-delay parameter for migrate-set-parameters zhanghailiang
                   ` (20 subsequent siblings)
  38 siblings, 1 reply; 94+ messages in thread
From: zhanghailiang @ 2015-12-15  8:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

While the VMs are running, the PVM may dirty some pages. We transfer the
PVM's dirty pages to the SVM and store them in the SVM's RAM cache at the
next checkpoint. So the content of the SVM's RAM cache is always the same
as the PVM's memory after a checkpoint.

Instead of flushing the entire content of the cached PVM RAM into the
SVM's memory, we do this in a more efficient way:
only flush pages that have been dirtied by the PVM since the last
checkpoint. In this way, we can ensure the SVM's memory is the same as the
PVM's.

Besides, we must ensure the RAM cache is flushed before loading the device
state.
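
A minimal sketch of a bitmap-driven flush, assuming one flat memory
region, a 4 KiB page and a plain unsigned long bitmap. toy_flush_dirty()
is a hypothetical name; the real code walks the RAMBlock list with
migration_bitmap_find_dirty() instead of scanning every bit.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u   /* assumed page size for the sketch */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/*
 * Copy every page whose bit is set from cache to host, clearing the bit.
 * Returns the number of pages copied.
 */
static size_t toy_flush_dirty(uint8_t *host, const uint8_t *cache,
                              size_t npages, unsigned long *map)
{
    size_t flushed = 0;

    for (size_t page = 0; page < npages; page++) {
        unsigned long mask = 1UL << (page % BITS_PER_WORD);
        unsigned long *word = &map[page / BITS_PER_WORD];

        if (*word & mask) {
            memcpy(host + page * TOY_PAGE_SIZE,
                   cache + page * TOY_PAGE_SIZE, TOY_PAGE_SIZE);
            *word &= ~mask;
            flushed++;
        }
    }
    return flushed;
}
```

The cost of a flush is then proportional to the number of dirty pages
copied rather than to the size of guest memory.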

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v12:
- Add a trace point in the end of colo_flush_ram_cache() (Dave's suggestion)
- Add Reviewed-by tag
v11:
- Move the place of 'need_flush' (Dave's suggestion)
- Remove unused 'DPRINTF("Flush ram_cache\n")'
v10:
- Trace the number of dirty pages received.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 include/migration/migration.h |  1 +
 migration/colo.c              |  2 --
 migration/ram.c               | 38 ++++++++++++++++++++++++++++++++++++++
 trace-events                  |  2 ++
 4 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index e41372d..221176b 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -336,4 +336,5 @@ PostcopyState postcopy_state_set(PostcopyState new_state);
 /* ram cache */
 int colo_init_ram_cache(void);
 void colo_release_ram_cache(void);
+void colo_flush_ram_cache(void);
 #endif
diff --git a/migration/colo.c b/migration/colo.c
index a4d49ff..e40cdb9 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -401,8 +401,6 @@ void *colo_process_incoming_thread(void *opaque)
         }
         qemu_mutex_unlock_iothread();
 
-        /* TODO: flush vm state */
-
         ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_VMSTATE_LOADED);
         if (ret < 0) {
             goto out;
diff --git a/migration/ram.c b/migration/ram.c
index 3d5947b..8ff7f7c 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2458,6 +2458,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
      * be atomic
      */
     bool postcopy_running = postcopy_state_get() >= POSTCOPY_INCOMING_LISTENING;
+    bool need_flush = false;
 
     seq_iter++;
 
@@ -2492,6 +2493,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
             /* After going into COLO, we should load the Page into colo_cache */
             if (ram_cache_enable) {
                 host = colo_cache_from_block_offset(block, addr);
+                need_flush = true;
             } else {
                 host = host_from_ram_block_offset(block, addr);
             }
@@ -2585,6 +2587,10 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
     }
 
     rcu_read_unlock();
+
+    if (!ret && ram_cache_enable && need_flush) {
+        colo_flush_ram_cache();
+    }
     DPRINTF("Completed load of VM with exit code %d seq iteration "
             "%" PRIu64 "\n", ret, seq_iter);
     return ret;
@@ -2657,6 +2663,38 @@ void colo_release_ram_cache(void)
     rcu_read_unlock();
 }
 
+/*
+ * Flush content of RAM cache into SVM's memory.
+ * Only flush the pages that were dirtied by the PVM, the SVM, or both.
+ */
+void colo_flush_ram_cache(void)
+{
+    RAMBlock *block = NULL;
+    void *dst_host;
+    void *src_host;
+    ram_addr_t offset = 0;
+
+    trace_colo_flush_ram_cache_begin(migration_dirty_pages);
+    rcu_read_lock();
+    block = QLIST_FIRST_RCU(&ram_list.blocks);
+    while (block) {
+        ram_addr_t ram_addr_abs;
+        offset = migration_bitmap_find_dirty(block, offset, &ram_addr_abs);
+        migration_bitmap_clear_dirty(ram_addr_abs);
+        if (offset >= block->used_length) {
+            offset = 0;
+            block = QLIST_NEXT_RCU(block, next);
+        } else {
+            dst_host = block->host + offset;
+            src_host = block->colo_cache + offset;
+            memcpy(dst_host, src_host, TARGET_PAGE_SIZE);
+        }
+    }
+    rcu_read_unlock();
+    trace_colo_flush_ram_cache_end();
+    assert(migration_dirty_pages == 0);
+}
+
 static SaveVMHandlers savevm_ram_handlers = {
     .save_live_setup = ram_save_setup,
     .save_live_iterate = ram_save_iterate,
diff --git a/trace-events b/trace-events
index 39fdd8d..7f76029 100644
--- a/trace-events
+++ b/trace-events
@@ -1264,6 +1264,8 @@ migration_throttle(void) ""
 ram_load_postcopy_loop(uint64_t addr, int flags) "@%" PRIx64 " %x"
 ram_postcopy_send_discard_bitmap(void) ""
 ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: %zx len: %zx"
+colo_flush_ram_cache_begin(uint64_t dirty_pages) "dirty_pages %" PRIu64
+colo_flush_ram_cache_end(void) ""
 
 # hw/display/qxl.c
 disable qxl_interface_set_mm_time(int qid, uint32_t mm_time) "%d %d"
-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v12 19/38] COLO: Add checkpoint-delay parameter for migrate-set-parameters
  2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (17 preceding siblings ...)
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 18/38] COLO: Flush PVM's cached RAM into SVM's memory zhanghailiang
@ 2015-12-15  8:22 ` zhanghailiang
  2015-12-19  9:33   ` Markus Armbruster
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 20/38] COLO: synchronize PVM's state to SVM periodically zhanghailiang
                   ` (19 subsequent siblings)
  38 siblings, 1 reply; 94+ messages in thread
From: zhanghailiang @ 2015-12-15  8:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, Markus Armbruster, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, Luiz Capitulino,
	hongyang.yang

Add the x-checkpoint-delay parameter to migrate-set-parameters, so that
we can control the checkpoint frequency when COLO is in periodic mode.
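
As a rough illustration of the value handling this patch adds, here is a
minimal standalone C sketch. It is not QEMU code: demo_set_checkpoint_delay
and DEMO_DEFAULT_CHECKPOINT_DELAY_MS are hypothetical names; only the 200 ms
default and the rejection of negative values are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Default delay between two checkpoints, in ms (value taken from the patch). */
#define DEMO_DEFAULT_CHECKPOINT_DELAY_MS 200

/*
 * Hypothetical helper: store a new checkpoint delay if one was supplied.
 * Returns false (and leaves *stored_ms untouched) for a negative value.
 * Like the patch's "< 0" check, this accepts zero.
 */
bool demo_set_checkpoint_delay(int64_t *stored_ms, bool has_delay,
                               int64_t delay_ms)
{
    assert(stored_ms != NULL);

    if (!has_delay) {
        return true;            /* optional parameter omitted: keep old value */
    }
    if (delay_ms < 0) {
        return false;           /* invalid: a delay cannot be negative */
    }
    *stored_ms = delay_ms;
    return true;
}
```

Going by the schema change in this patch, the corresponding QMP call would
presumably look like
{ "execute": "migrate-set-parameters", "arguments": { "x-checkpoint-delay": 500 } }.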

Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v12:
- Change checkpoint-delay to x-checkpoint-delay (Dave's suggestion)
- Add Reviewed-by tag
v11:
- Move this patch ahead of the patch that uses 'checkpoint_delay'
 (Dave's suggestion)
v10:
- Fix related qmp command

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 hmp.c                 |  7 +++++++
 migration/migration.c | 24 +++++++++++++++++++++++-
 qapi-schema.json      | 19 ++++++++++++++++---
 qmp-commands.hx       |  4 ++--
 4 files changed, 48 insertions(+), 6 deletions(-)

diff --git a/hmp.c b/hmp.c
index 2140605..ee87d38 100644
--- a/hmp.c
+++ b/hmp.c
@@ -284,6 +284,9 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
         monitor_printf(mon, " %s: %" PRId64,
             MigrationParameter_lookup[MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT],
             params->x_cpu_throttle_increment);
+        monitor_printf(mon, " %s: %" PRId64,
+            MigrationParameter_lookup[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY],
+            params->x_checkpoint_delay);
         monitor_printf(mon, "\n");
     }
 
@@ -1237,6 +1240,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
     bool has_decompress_threads = false;
     bool has_x_cpu_throttle_initial = false;
     bool has_x_cpu_throttle_increment = false;
+    bool has_x_checkpoint_delay = false;
     int i;
 
     for (i = 0; i < MIGRATION_PARAMETER_MAX; i++) {
@@ -1256,6 +1260,8 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
                 break;
             case MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT:
                 has_x_cpu_throttle_increment = true;
+            case MIGRATION_PARAMETER_X_CHECKPOINT_DELAY:
+                has_x_checkpoint_delay = true;
                 break;
             }
             qmp_migrate_set_parameters(has_compress_level, value,
@@ -1263,6 +1269,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
                                        has_decompress_threads, value,
                                        has_x_cpu_throttle_initial, value,
                                        has_x_cpu_throttle_increment, value,
+                                       has_x_checkpoint_delay, value,
                                        &err);
             break;
         }
diff --git a/migration/migration.c b/migration/migration.c
index a1074c3..8988358 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -56,6 +56,11 @@
 /* Migration XBZRLE default cache size */
 #define DEFAULT_MIGRATE_CACHE_SIZE (64 * 1024 * 1024)
 
+/* The delay time (in ms) between two COLO checkpoints
+ * Note: Please change this default value to 10000 when we support hybrid mode.
+ */
+#define DEFAULT_MIGRATE_X_CHECKPOINT_DELAY 200
+
 static NotifierList migration_state_notifiers =
     NOTIFIER_LIST_INITIALIZER(migration_state_notifiers);
 
@@ -91,6 +96,8 @@ MigrationState *migrate_get_current(void)
                 DEFAULT_MIGRATE_X_CPU_THROTTLE_INITIAL,
         .parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT] =
                 DEFAULT_MIGRATE_X_CPU_THROTTLE_INCREMENT,
+        .parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY] =
+                DEFAULT_MIGRATE_X_CHECKPOINT_DELAY,
     };
 
     if (!once) {
@@ -530,6 +537,8 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
             s->parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INITIAL];
     params->x_cpu_throttle_increment =
             s->parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT];
+    params->x_checkpoint_delay =
+            s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY];
 
     return params;
 }
@@ -736,7 +745,10 @@ void qmp_migrate_set_parameters(bool has_compress_level,
                                 bool has_x_cpu_throttle_initial,
                                 int64_t x_cpu_throttle_initial,
                                 bool has_x_cpu_throttle_increment,
-                                int64_t x_cpu_throttle_increment, Error **errp)
+                                int64_t x_cpu_throttle_increment,
+                                bool has_x_checkpoint_delay,
+                                int64_t x_checkpoint_delay,
+                                Error **errp)
 {
     MigrationState *s = migrate_get_current();
 
@@ -771,6 +783,11 @@ void qmp_migrate_set_parameters(bool has_compress_level,
                    "x_cpu_throttle_increment",
                    "an integer in the range of 1 to 99");
     }
+    if (has_x_checkpoint_delay && (x_checkpoint_delay < 0)) {
+        error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
+                    "x_checkpoint_delay",
+                    "is invalid, it should be positive");
+    }
 
     if (has_compress_level) {
         s->parameters[MIGRATION_PARAMETER_COMPRESS_LEVEL] = compress_level;
@@ -791,6 +808,11 @@ void qmp_migrate_set_parameters(bool has_compress_level,
         s->parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT] =
                                                     x_cpu_throttle_increment;
     }
+
+    if (has_x_checkpoint_delay) {
+        s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY] =
+                                                    x_checkpoint_delay;
+    }
 }
 
 void qmp_migrate_start_postcopy(Error **errp)
diff --git a/qapi-schema.json b/qapi-schema.json
index 0423b47..a5699a7 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -623,11 +623,16 @@
 # @x-cpu-throttle-increment: throttle percentage increase each time
 #                            auto-converge detects that migration is not making
 #                            progress. The default value is 10. (Since 2.5)
+#
+# @x-checkpoint-delay: The delay time (in ms) between two COLO checkpoints in
+#          periodic mode. (Since 2.6)
+#
 # Since: 2.4
 ##
 { 'enum': 'MigrationParameter',
   'data': ['compress-level', 'compress-threads', 'decompress-threads',
-           'x-cpu-throttle-initial', 'x-cpu-throttle-increment'] }
+           'x-cpu-throttle-initial', 'x-cpu-throttle-increment',
+           'x-checkpoint-delay' ] }
 
 #
 # @migrate-set-parameters
@@ -647,6 +652,9 @@
 # @x-cpu-throttle-increment: throttle percentage increase each time
 #                            auto-converge detects that migration is not making
 #                            progress. The default value is 10. (Since 2.5)
+#
+# @x-checkpoint-delay: the delay time between two checkpoints. (Since 2.6)
+#
 # Since: 2.4
 ##
 { 'command': 'migrate-set-parameters',
@@ -654,7 +662,8 @@
             '*compress-threads': 'int',
             '*decompress-threads': 'int',
             '*x-cpu-throttle-initial': 'int',
-            '*x-cpu-throttle-increment': 'int'} }
+            '*x-cpu-throttle-increment': 'int',
+            '*x-checkpoint-delay': 'int' } }
 
 #
 # @MigrationParameters
@@ -673,6 +682,8 @@
 #                            auto-converge detects that migration is not making
 #                            progress. The default value is 10. (Since 2.5)
 #
+# @x-checkpoint-delay: the delay time between two COLO checkpoints. (Since 2.6)
+#
 # Since: 2.4
 ##
 { 'struct': 'MigrationParameters',
@@ -680,7 +691,9 @@
             'compress-threads': 'int',
             'decompress-threads': 'int',
             'x-cpu-throttle-initial': 'int',
-            'x-cpu-throttle-increment': 'int'} }
+            'x-cpu-throttle-increment': 'int',
+            'x-checkpoint-delay': 'int'} }
+
 ##
 # @query-migrate-parameters
 #
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 91979b4..89756c9 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -3651,7 +3651,7 @@ Set migration parameters
 - "compress-level": set compression level during migration (json-int)
 - "compress-threads": set compression thread count for migration (json-int)
 - "decompress-threads": set decompression thread count for migration (json-int)
-
+- "x-checkpoint-delay": set the delay time for periodic checkpoint (json-int)
 Arguments:
 
 Example:
@@ -3664,7 +3664,7 @@ EQMP
     {
         .name       = "migrate-set-parameters",
         .args_type  =
-            "compress-level:i?,compress-threads:i?,decompress-threads:i?",
+            "compress-level:i?,compress-threads:i?,decompress-threads:i?,x-checkpoint-delay:i?",
         .mhandler.cmd_new = qmp_marshal_migrate_set_parameters,
     },
 SQMP
-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v12 20/38] COLO: synchronize PVM's state to SVM periodically
  2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (18 preceding siblings ...)
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 19/38] COLO: Add checkpoint-delay parameter for migrate-set-parameters zhanghailiang
@ 2015-12-15  8:22 ` zhanghailiang
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 21/38] COLO failover: Introduce a new command to trigger a failover zhanghailiang
                   ` (18 subsequent siblings)
  38 siblings, 0 replies; 94+ messages in thread
From: zhanghailiang @ 2015-12-15  8:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

Take a checkpoint periodically; the default interval is 200 ms.
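
The sleep-time arithmetic used by the loop can be sketched in isolation as
follows. This is a simplified standalone illustration, not the QEMU code
itself: demo_remaining_delay_ms is a hypothetical helper, and the timestamps
are plain millisecond values rather than readings of a QEMU clock.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return how long (in ms) to sleep before starting the next checkpoint,
 * given the time recorded after the previous checkpoint finished.
 * All names here are illustrative, not QEMU APIs.
 */
int64_t demo_remaining_delay_ms(int64_t now_ms, int64_t last_checkpoint_ms,
                                int64_t period_ms)
{
    int64_t elapsed_ms = now_ms - last_checkpoint_ms;

    assert(period_ms >= 0);
    if (elapsed_ms < period_ms) {
        /* Sleep only for the part of the period that has not yet passed. */
        return period_ms - elapsed_ms;
    }
    return 0; /* the period has already elapsed: checkpoint immediately */
}
```

With a 200 ms period, a call made 150 ms after the last checkpoint yields a
50 ms sleep, while a call made 300 ms after it yields no sleep at all.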

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v12:
- Add Reviewed-by tag
v11:
- Fix wrong sleep time for checkpoint period. (Dave's review comment)

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 migration/colo.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index e40cdb9..ca5df44 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -11,6 +11,7 @@
  */
 
 #include <unistd.h>
+#include "qemu/timer.h"
 #include "sysemu/sysemu.h"
 #include "migration/colo.h"
 #include "trace.h"
@@ -211,6 +212,7 @@ out:
 static void colo_process_checkpoint(MigrationState *s)
 {
     QEMUSizedBuffer *buffer = NULL;
+    int64_t current_time, checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
     int ret = 0;
 
     s->rp_state.from_dst_file = qemu_file_get_return_path(s->to_dst_file);
@@ -243,11 +245,21 @@ static void colo_process_checkpoint(MigrationState *s)
     trace_colo_vm_state_change("stop", "run");
 
     while (s->state == MIGRATION_STATUS_COLO) {
+        current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+        if (current_time - checkpoint_time <
+            s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY]) {
+            int64_t delay_ms;
+
+            delay_ms = s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY] -
+                       (current_time - checkpoint_time);
+            g_usleep(delay_ms * 1000);
+        }
         /* start a colo checkpoint */
         ret = colo_do_checkpoint_transaction(s, buffer);
         if (ret < 0) {
             goto out;
         }
+        checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
     }
 
 out:
-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v12 21/38] COLO failover: Introduce a new command to trigger a failover
  2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (19 preceding siblings ...)
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 20/38] COLO: synchronize PVM's state to SVM periodically zhanghailiang
@ 2015-12-15  8:22 ` zhanghailiang
  2015-12-18 15:27   ` Dr. David Alan Gilbert
  2015-12-19  9:38   ` Markus Armbruster
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 22/38] COLO failover: Introduce state to record failover process zhanghailiang
                   ` (17 subsequent siblings)
  38 siblings, 2 replies; 94+ messages in thread
From: zhanghailiang @ 2015-12-15  8:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, Markus Armbruster, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, Luiz Capitulino,
	hongyang.yang

We leave the choice of heartbeat solution to users. If the heartbeat is lost,
or if they detect some other error, they can use the experimental command
'x_colo_lost_heartbeat' to tell COLO to fail over, and COLO will act
accordingly.

For example, if the command is sent to the PVM, the Primary side will
exit COLO mode and take over operation. If sent to the Secondary, the
secondary will run failover work, then take over server operation to
become the new Primary.
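
The intended behaviour per side can be summarised with the small standalone
C sketch below. It only restates the description above; the DemoColoMode and
DemoAction types and demo_on_lost_heartbeat() are hypothetical names, and in
this patch the actual failover work is still a TODO.

```c
#include <assert.h>

/* Illustrative copies of the three COLO modes; not the QAPI-generated enum. */
typedef enum {
    DEMO_MODE_UNKNOWN,
    DEMO_MODE_PRIMARY,
    DEMO_MODE_SECONDARY
} DemoColoMode;

/* Hypothetical outcome of receiving the lost-heartbeat command. */
typedef enum {
    DEMO_ACT_REJECT,      /* not in COLO mode: report an error */
    DEMO_ACT_EXIT_COLO,   /* Primary: leave COLO mode and keep serving */
    DEMO_ACT_TAKE_OVER    /* Secondary: run failover work, then take over */
} DemoAction;

DemoAction demo_on_lost_heartbeat(DemoColoMode mode)
{
    switch (mode) {
    case DEMO_MODE_PRIMARY:
        return DEMO_ACT_EXIT_COLO;
    case DEMO_MODE_SECONDARY:
        return DEMO_ACT_TAKE_OVER;
    default:
        return DEMO_ACT_REJECT;
    }
}

/* Small self-check of the mapping described in the commit message. */
void demo_self_check(void)
{
    assert(demo_on_lost_heartbeat(DEMO_MODE_PRIMARY) == DEMO_ACT_EXIT_COLO);
    assert(demo_on_lost_heartbeat(DEMO_MODE_SECONDARY) == DEMO_ACT_TAKE_OVER);
    assert(demo_on_lost_heartbeat(DEMO_MODE_UNKNOWN) == DEMO_ACT_REJECT);
}
```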

Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
v11:
- Add more comments for x-colo-lost-heartbeat command (Eric's suggestion)
- Return 'enum' instead of 'int' for get_colo_mode() (Eric's suggestion)
v10:
- Rename command colo_lost_heartbeat to experimental 'x_colo_lost_heartbeat'

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 hmp-commands.hx              | 15 +++++++++++++++
 hmp.c                        |  8 ++++++++
 hmp.h                        |  1 +
 include/migration/colo.h     |  3 +++
 include/migration/failover.h | 20 ++++++++++++++++++++
 migration/Makefile.objs      |  2 +-
 migration/colo-comm.c        | 11 +++++++++++
 migration/colo-failover.c    | 41 +++++++++++++++++++++++++++++++++++++++++
 migration/colo.c             |  1 +
 qapi-schema.json             | 29 +++++++++++++++++++++++++++++
 qmp-commands.hx              | 19 +++++++++++++++++++
 stubs/migration-colo.c       |  8 ++++++++
 12 files changed, 157 insertions(+), 1 deletion(-)
 create mode 100644 include/migration/failover.h
 create mode 100644 migration/colo-failover.c

diff --git a/hmp-commands.hx b/hmp-commands.hx
index bb52e4d..a381b0b 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1039,6 +1039,21 @@ migration (or once already in postcopy).
 ETEXI
 
     {
+        .name       = "x_colo_lost_heartbeat",
+        .args_type  = "",
+        .params     = "",
+        .help       = "Tell COLO that heartbeat is lost,\n\t\t\t"
+                      "a failover or takeover is needed.",
+        .mhandler.cmd = hmp_x_colo_lost_heartbeat,
+    },
+
+STEXI
+@item x_colo_lost_heartbeat
+@findex x_colo_lost_heartbeat
+Tell COLO that heartbeat is lost, a failover or takeover is needed.
+ETEXI
+
+    {
         .name       = "client_migrate_info",
         .args_type  = "protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?",
         .params     = "protocol hostname port tls-port cert-subject",
diff --git a/hmp.c b/hmp.c
index ee87d38..dc6dc30 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1310,6 +1310,14 @@ void hmp_migrate_start_postcopy(Monitor *mon, const QDict *qdict)
     hmp_handle_error(mon, &err);
 }
 
+void hmp_x_colo_lost_heartbeat(Monitor *mon, const QDict *qdict)
+{
+    Error *err = NULL;
+
+    qmp_x_colo_lost_heartbeat(&err);
+    hmp_handle_error(mon, &err);
+}
+
 void hmp_set_password(Monitor *mon, const QDict *qdict)
 {
     const char *protocol  = qdict_get_str(qdict, "protocol");
diff --git a/hmp.h b/hmp.h
index a8c5b5a..864a300 100644
--- a/hmp.h
+++ b/hmp.h
@@ -70,6 +70,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_cache_size(Monitor *mon, const QDict *qdict);
 void hmp_client_migrate_info(Monitor *mon, const QDict *qdict);
 void hmp_migrate_start_postcopy(Monitor *mon, const QDict *qdict);
+void hmp_x_colo_lost_heartbeat(Monitor *mon, const QDict *qdict);
 void hmp_set_password(Monitor *mon, const QDict *qdict);
 void hmp_expire_password(Monitor *mon, const QDict *qdict);
 void hmp_eject(Monitor *mon, const QDict *qdict);
diff --git a/include/migration/colo.h b/include/migration/colo.h
index 2676c4a..ba27719 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -17,6 +17,7 @@
 #include "migration/migration.h"
 #include "qemu/coroutine_int.h"
 #include "qemu/thread.h"
+#include "qemu/main-loop.h"
 
 bool colo_supported(void);
 void colo_info_mig_init(void);
@@ -29,4 +30,6 @@ bool migration_incoming_enable_colo(void);
 void migration_incoming_exit_colo(void);
 void *colo_process_incoming_thread(void *opaque);
 bool migration_incoming_in_colo_state(void);
+
+COLOMode get_colo_mode(void);
 #endif
diff --git a/include/migration/failover.h b/include/migration/failover.h
new file mode 100644
index 0000000..1785b52
--- /dev/null
+++ b/include/migration/failover.h
@@ -0,0 +1,20 @@
+/*
+ *  COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ *  (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_FAILOVER_H
+#define QEMU_FAILOVER_H
+
+#include "qemu-common.h"
+
+void failover_request_active(Error **errp);
+
+#endif
diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index 81b5713..920d1e7 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -1,6 +1,6 @@
 common-obj-y += migration.o tcp.o
-common-obj-$(CONFIG_COLO) += colo.o
 common-obj-y += colo-comm.o
+common-obj-$(CONFIG_COLO) += colo.o colo-failover.o
 common-obj-y += vmstate.o
 common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o qemu-file-stdio.o
 common-obj-y += xbzrle.o postcopy-ram.o
diff --git a/migration/colo-comm.c b/migration/colo-comm.c
index 30df3d3..58a6488 100644
--- a/migration/colo-comm.c
+++ b/migration/colo-comm.c
@@ -20,6 +20,17 @@ typedef struct {
 
 static COLOInfo colo_info;
 
+COLOMode get_colo_mode(void)
+{
+    if (migration_in_colo_state()) {
+        return COLO_MODE_PRIMARY;
+    } else if (migration_incoming_in_colo_state()) {
+        return COLO_MODE_SECONDARY;
+    } else {
+        return COLO_MODE_UNKNOWN;
+    }
+}
+
 static void colo_info_pre_save(void *opaque)
 {
     COLOInfo *s = opaque;
diff --git a/migration/colo-failover.c b/migration/colo-failover.c
new file mode 100644
index 0000000..e3897c6
--- /dev/null
+++ b/migration/colo-failover.c
@@ -0,0 +1,41 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "migration/colo.h"
+#include "migration/failover.h"
+#include "qmp-commands.h"
+#include "qapi/qmp/qerror.h"
+
+static QEMUBH *failover_bh;
+
+static void colo_failover_bh(void *opaque)
+{
+    qemu_bh_delete(failover_bh);
+    failover_bh = NULL;
+    /*TODO: Do failover work */
+}
+
+void failover_request_active(Error **errp)
+{
+    failover_bh = qemu_bh_new(colo_failover_bh, NULL);
+    qemu_bh_schedule(failover_bh);
+}
+
+void qmp_x_colo_lost_heartbeat(Error **errp)
+{
+    if (get_colo_mode() == COLO_MODE_UNKNOWN) {
+        error_setg(errp, QERR_FEATURE_DISABLED, "colo");
+        return;
+    }
+
+    failover_request_active(errp);
+}
diff --git a/migration/colo.c b/migration/colo.c
index ca5df44..7098497 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -17,6 +17,7 @@
 #include "trace.h"
 #include "qemu/error-report.h"
 #include "qemu/sockets.h"
+#include "migration/failover.h"
 
 /* colo buffer */
 #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
diff --git a/qapi-schema.json b/qapi-schema.json
index a5699a7..feb7d53 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -761,6 +761,35 @@
             'vmstate-send', 'vmstate-size','vmstate-received',
             'vmstate-loaded' ] }
 
+##
+# @COLOMode
+#
+# The colo mode
+#
+# @unknown: unknown mode
+#
+# @primary: master side
+#
+# @secondary: slave side
+#
+# Since: 2.6
+##
+{ 'enum': 'COLOMode',
+  'data': [ 'unknown', 'primary', 'secondary'] }
+
+##
+# @x-colo-lost-heartbeat
+#
+# Tell qemu that heartbeat is lost, request it to do takeover procedures.
+# If this command is sent to the PVM, the Primary side will exit COLO mode.
+# If sent to the Secondary, the Secondary side will run failover work,
+# then takes over server operation to become the service VM.
+#
+# Since: 2.6
+##
+{ 'command': 'x-colo-lost-heartbeat' }
+
+##
 # @MouseInfo:
 #
 # Information about a mouse device.
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 89756c9..76ad208 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -805,6 +805,25 @@ Example:
 EQMP
 
     {
+        .name       = "x-colo-lost-heartbeat",
+        .args_type  = "",
+        .mhandler.cmd_new = qmp_marshal_x_colo_lost_heartbeat,
+    },
+
+SQMP
+x-colo-lost-heartbeat
+--------------------
+
+Tell COLO that heartbeat is lost, a failover or takeover is needed.
+
+Example:
+
+-> { "execute": "x-colo-lost-heartbeat" }
+<- { "return": {} }
+
+EQMP
+
+    {
         .name       = "client_migrate_info",
         .args_type  = "protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?",
         .params     = "protocol hostname port tls-port cert-subject",
diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
index c12516e..5028f63 100644
--- a/stubs/migration-colo.c
+++ b/stubs/migration-colo.c
@@ -11,6 +11,7 @@
  */
 
 #include "migration/colo.h"
+#include "qmp-commands.h"
 
 bool colo_supported(void)
 {
@@ -35,3 +36,10 @@ void *colo_process_incoming_thread(void *opaque)
 {
     return NULL;
 }
+
+void qmp_x_colo_lost_heartbeat(Error **errp)
+{
+    error_setg(errp, "COLO is not supported, please rerun configure"
+                     " with --enable-colo option in order to support"
+                     " COLO feature");
+}
-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v12 22/38] COLO failover: Introduce state to record failover process
  2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (20 preceding siblings ...)
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 21/38] COLO failover: Introduce a new command to trigger a failover zhanghailiang
@ 2015-12-15  8:22 ` zhanghailiang
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 23/38] COLO: Implement failover work for Primary VM zhanghailiang
                   ` (16 subsequent siblings)
  38 siblings, 0 replies; 94+ messages in thread
From: zhanghailiang @ 2015-12-15  8:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

When handling failover, we do different things depending on the stage of
the failover process. Here we introduce a global atomic variable to record
the failover status.

We add four failover statuses to indicate the different stages of the
failover process. Use the helpers to get and set the value.
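
The compare-and-swap pattern behind these helpers can be sketched as a
standalone C11 program fragment. Note the assumptions: this uses
<stdatomic.h> rather than QEMU's own atomic helpers, and every demo_* name
is hypothetical; only the four states and the "return the previous state"
convention follow the patch.

```c
#include <assert.h>
#include <stdatomic.h>

/* Illustrative copies of the four failover states from the patch. */
enum {
    DEMO_FAILOVER_NONE = 0,
    DEMO_FAILOVER_REQUEST = 1,    /* requested but not yet handled */
    DEMO_FAILOVER_HANDLING = 2,   /* failover is being handled */
    DEMO_FAILOVER_COMPLETED = 3   /* failover has finished */
};

/* Static storage is zero-initialised, i.e. starts as DEMO_FAILOVER_NONE. */
static atomic_int demo_failover_state;

/*
 * Try to move from old_state to new_state. Returns the state that was
 * observed before the attempt; the transition happened only if that
 * value equals old_state.
 */
int demo_failover_set_state(int old_state, int new_state)
{
    int expected = old_state;

    /* On failure, 'expected' is overwritten with the current state. */
    atomic_compare_exchange_strong(&demo_failover_state, &expected, new_state);
    return expected;
}

int demo_failover_get_state(void)
{
    return atomic_load(&demo_failover_state);
}
```

Because a transition succeeds only from the expected state, a second
request issued while one is already pending sees REQUEST rather than NONE
and can be rejected without taking a lock.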

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v11:
- fix several typos found by Dave
- Add Reviewed-by tag

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 include/migration/failover.h | 10 ++++++++++
 migration/colo-failover.c    | 37 +++++++++++++++++++++++++++++++++++++
 migration/colo.c             |  4 ++++
 trace-events                 |  1 +
 4 files changed, 52 insertions(+)

diff --git a/include/migration/failover.h b/include/migration/failover.h
index 1785b52..882c625 100644
--- a/include/migration/failover.h
+++ b/include/migration/failover.h
@@ -15,6 +15,16 @@
 
 #include "qemu-common.h"
 
+typedef enum COLOFailoverStatus {
+    FAILOVER_STATUS_NONE = 0,
+    FAILOVER_STATUS_REQUEST = 1, /* Request but not handled */
+    FAILOVER_STATUS_HANDLING = 2, /* In the process of handling failover */
+    FAILOVER_STATUS_COMPLETED = 3, /* Finish the failover process */
+} COLOFailoverStatus;
+
+void failover_init_state(void);
+int failover_set_state(int old_state, int new_state);
+int failover_get_state(void);
 void failover_request_active(Error **errp);
 
 #endif
diff --git a/migration/colo-failover.c b/migration/colo-failover.c
index e3897c6..1b1be24 100644
--- a/migration/colo-failover.c
+++ b/migration/colo-failover.c
@@ -14,22 +14,59 @@
 #include "migration/failover.h"
 #include "qmp-commands.h"
 #include "qapi/qmp/qerror.h"
+#include "qemu/error-report.h"
+#include "trace.h"
 
 static QEMUBH *failover_bh;
+static COLOFailoverStatus failover_state;
 
 static void colo_failover_bh(void *opaque)
 {
+    int old_state;
+
     qemu_bh_delete(failover_bh);
     failover_bh = NULL;
+    old_state = failover_set_state(FAILOVER_STATUS_REQUEST,
+                                   FAILOVER_STATUS_HANDLING);
+    if (old_state != FAILOVER_STATUS_REQUEST) {
+        error_report("Unkown error for failover, old_state=%d", old_state);
+        return;
+    }
     /*TODO: Do failover work */
 }
 
 void failover_request_active(Error **errp)
 {
+   if (failover_set_state(FAILOVER_STATUS_NONE, FAILOVER_STATUS_REQUEST)
+         != FAILOVER_STATUS_NONE) {
+        error_setg(errp, "COLO failover is already actived");
+        return;
+    }
     failover_bh = qemu_bh_new(colo_failover_bh, NULL);
     qemu_bh_schedule(failover_bh);
 }
 
+void failover_init_state(void)
+{
+    failover_state = FAILOVER_STATUS_NONE;
+}
+
+int failover_set_state(int old_state, int new_state)
+{
+    int old;
+
+    old = atomic_cmpxchg(&failover_state, old_state, new_state);
+    if (old == old_state) {
+        trace_colo_failover_set_state(new_state);
+    }
+    return old;
+}
+
+int failover_get_state(void)
+{
+    return atomic_read(&failover_state);
+}
+
 void qmp_x_colo_lost_heartbeat(Error **errp)
 {
     if (get_colo_mode() == COLO_MODE_UNKNOWN) {
diff --git a/migration/colo.c b/migration/colo.c
index 7098497..176384e 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -216,6 +216,8 @@ static void colo_process_checkpoint(MigrationState *s)
     int64_t current_time, checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
     int ret = 0;
 
+    failover_init_state();
+
     s->rp_state.from_dst_file = qemu_file_get_return_path(s->to_dst_file);
     if (!s->rp_state.from_dst_file) {
         ret = -EINVAL;
@@ -324,6 +326,8 @@ void *colo_process_incoming_thread(void *opaque)
     migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
                       MIGRATION_STATUS_COLO);
 
+    failover_init_state();
+
     mis->to_src_file = qemu_file_get_return_path(mis->from_src_file);
     if (!mis->to_src_file) {
         ret = -EINVAL;
diff --git a/trace-events b/trace-events
index 7f76029..3992b45 100644
--- a/trace-events
+++ b/trace-events
@@ -1583,6 +1583,7 @@ postcopy_ram_incoming_cleanup_join(void) ""
 colo_vm_state_change(const char *old, const char *new) "Change '%s' => '%s'"
 colo_put_cmd(const char *msg) "Send '%s' cmd"
 colo_get_cmd(const char *msg) "Receive '%s' cmd"
+colo_failover_set_state(int new_state) "new state %d"
 
 # kvm-all.c
 kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v12 23/38] COLO: Implement failover work for Primary VM
  2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (21 preceding siblings ...)
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 22/38] COLO failover: Introduce state to record failover process zhanghailiang
@ 2015-12-15  8:22 ` zhanghailiang
  2015-12-18 15:35   ` Dr. David Alan Gilbert
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 24/38] COLO: Implement failover work for Secondary VM zhanghailiang
                   ` (15 subsequent siblings)
  38 siblings, 1 reply; 94+ messages in thread
From: zhanghailiang @ 2015-12-15  8:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

For the PVM, when the user requests a failover, the colo thread exits its
loop while the failover BH does the cleanup work and resumes the VM.
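
The way primary_vm_do_failover() checks the return value of
failover_set_state() suggests a compare-and-swap that returns the previous
state. A minimal sketch of that pattern follows, assuming C11 atomics rather
than QEMU's own atomic helpers; every sketch_* name and ST_* value is a
hypothetical stand-in, not code from this series.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the COLOFailoverStatus values. */
enum { ST_NONE = 0, ST_REQUEST = 1, ST_HANDLING = 2, ST_COMPLETED = 3 };

static atomic_int sketch_state = ST_NONE;

/*
 * Move from old_state to new_state only if the current state is old_state.
 * Returns the state observed before the attempt, so the caller can detect
 * a failed transition by comparing the result with old_state.
 */
static int sketch_set_state(int old_state, int new_state)
{
    int expected = old_state;

    /* On failure, 'expected' is overwritten with the current state. */
    atomic_compare_exchange_strong(&sketch_state, &expected, new_state);
    return expected;
}

/* Any state other than NONE means a failover has been requested. */
static bool sketch_request_is_active(void)
{
    return atomic_load(&sketch_state) != ST_NONE;
}

/* Models the BH path: REQUEST -> HANDLING -> COMPLETED. */
static bool sketch_run_failover(void)
{
    if (sketch_set_state(ST_REQUEST, ST_HANDLING) != ST_REQUEST) {
        return false; /* nobody requested a failover */
    }
    /* ... cleanup work would happen here ... */
    return sketch_set_state(ST_HANDLING, ST_COMPLETED) == ST_HANDLING;
}
```

Because each transition names the state it expects to leave, a second
requester or an out-of-order call is rejected instead of silently
overwriting the state.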

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
v12:
- Fix error report and remove unnecessary check in primary_vm_do_failover()
 (Dave's suggestion)
v11:
- Don't call migration_end() in primary_vm_do_failover();
 the cleanup work will be done in migration_thread().
- Remove vm_start() in primary_vm_do_failover(), which is also done
  in migration_thread()
v10:
- Call migration_end() in primary_vm_do_failover()

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 include/migration/colo.h     |  3 +++
 include/migration/failover.h |  1 +
 migration/colo-failover.c    |  7 +++++-
 migration/colo.c             | 54 ++++++++++++++++++++++++++++++++++++++++++--
 4 files changed, 62 insertions(+), 3 deletions(-)

diff --git a/include/migration/colo.h b/include/migration/colo.h
index ba27719..0b02e95 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -32,4 +32,7 @@ void *colo_process_incoming_thread(void *opaque);
 bool migration_incoming_in_colo_state(void);
 
 COLOMode get_colo_mode(void);
+
+/* failover */
+void colo_do_failover(MigrationState *s);
 #endif
diff --git a/include/migration/failover.h b/include/migration/failover.h
index 882c625..fba3931 100644
--- a/include/migration/failover.h
+++ b/include/migration/failover.h
@@ -26,5 +26,6 @@ void failover_init_state(void);
 int failover_set_state(int old_state, int new_state);
 int failover_get_state(void);
 void failover_request_active(Error **errp);
+bool failover_request_is_active(void);
 
 #endif
diff --git a/migration/colo-failover.c b/migration/colo-failover.c
index 1b1be24..0c525da 100644
--- a/migration/colo-failover.c
+++ b/migration/colo-failover.c
@@ -32,7 +32,7 @@ static void colo_failover_bh(void *opaque)
         error_report("Unkown error for failover, old_state=%d", old_state);
         return;
     }
-    /*TODO: Do failover work */
+    colo_do_failover(NULL);
 }
 
 void failover_request_active(Error **errp)
@@ -67,6 +67,11 @@ int failover_get_state(void)
     return atomic_read(&failover_state);
 }
 
+bool failover_request_is_active(void)
+{
+    return failover_get_state() != FAILOVER_STATUS_NONE;
+}
+
 void qmp_x_colo_lost_heartbeat(Error **errp)
 {
     if (get_colo_mode() == COLO_MODE_UNKNOWN) {
diff --git a/migration/colo.c b/migration/colo.c
index 176384e..977c8d8 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -41,6 +41,40 @@ bool migration_incoming_in_colo_state(void)
     return mis && (mis->state == MIGRATION_STATUS_COLO);
 }
 
+static bool colo_runstate_is_stopped(void)
+{
+    return runstate_check(RUN_STATE_COLO) || !runstate_is_running();
+}
+
+static void primary_vm_do_failover(void)
+{
+    MigrationState *s = migrate_get_current();
+    int old_state;
+
+    migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
+                      MIGRATION_STATUS_COMPLETED);
+
+    old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
+                                   FAILOVER_STATUS_COMPLETED);
+    if (old_state != FAILOVER_STATUS_HANDLING) {
+        error_report("Incorrect state (%d) while doing failover for Primary VM",
+                     old_state);
+        return;
+    }
+}
+
+void colo_do_failover(MigrationState *s)
+{
+    /* Make sure the VM is stopped during failover */
+    if (!colo_runstate_is_stopped()) {
+        vm_stop_force_state(RUN_STATE_COLO);
+    }
+
+    if (get_colo_mode() == COLO_MODE_PRIMARY) {
+        primary_vm_do_failover();
+    }
+}
+
 static int colo_put_cmd(QEMUFile *f, uint32_t cmd)
 {
     int ret;
@@ -150,9 +184,22 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
     }
 
     qemu_mutex_lock_iothread();
+    if (failover_request_is_active()) {
+        qemu_mutex_unlock_iothread();
+        ret = -1;
+        goto out;
+    }
     vm_stop_force_state(RUN_STATE_COLO);
     qemu_mutex_unlock_iothread();
     trace_colo_vm_state_change("run", "stop");
+    /*
+     * failover request bh could be called after
+     * vm_stop_force_state so we check failover_request_is_active() again.
+     */
+    if (failover_request_is_active()) {
+        ret = -1;
+        goto out;
+    }
 
     /* Disable block migration */
     s->params.blk = 0;
@@ -248,6 +295,11 @@ static void colo_process_checkpoint(MigrationState *s)
     trace_colo_vm_state_change("stop", "run");
 
     while (s->state == MIGRATION_STATUS_COLO) {
+        if (failover_request_is_active()) {
+            error_report("failover request");
+            goto out;
+        }
+
         current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
         if (current_time - checkpoint_time <
             s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY]) {
@@ -269,8 +321,6 @@ out:
     if (ret < 0) {
         error_report("%s: %s", __func__, strerror(-ret));
     }
-    migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
-                      MIGRATION_STATUS_COMPLETED);
 
     qsb_free(buffer);
     buffer = NULL;
-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v12 24/38] COLO: Implement failover work for Secondary VM
  2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (22 preceding siblings ...)
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 23/38] COLO: Implement failover work for Primary VM zhanghailiang
@ 2015-12-15  8:22 ` zhanghailiang
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 25/38] qmp event: Add event notification for COLO error zhanghailiang
                   ` (14 subsequent siblings)
  38 siblings, 0 replies; 94+ messages in thread
From: zhanghailiang @ 2015-12-15  8:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

If the user requires the SVM to take over, the colo incoming thread should
exit its loop while the failover BH switches back to the migration incoming
coroutine.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v12:
- Improve error message as suggested by Dave
- Add Reviewed-by tag

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 migration/colo.c | 42 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 977c8d8..d1dd4e1 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -46,6 +46,33 @@ static bool colo_runstate_is_stopped(void)
     return runstate_check(RUN_STATE_COLO) || !runstate_is_running();
 }
 
+static void secondary_vm_do_failover(void)
+{
+    int old_state;
+    MigrationIncomingState *mis = migration_incoming_get_current();
+
+    migrate_set_state(&mis->state, MIGRATION_STATUS_COLO,
+                      MIGRATION_STATUS_COMPLETED);
+
+    if (!autostart) {
+        error_report("\"-S\" qemu option will be ignored in secondary side");
+        /* recover runstate to normal migration finish state */
+        autostart = true;
+    }
+
+    old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
+                                   FAILOVER_STATUS_COMPLETED);
+    if (old_state != FAILOVER_STATUS_HANDLING) {
+        error_report("Incorrect state (%d) while doing failover for "
+                     "secondary VM", old_state);
+        return;
+    }
+    /* For Secondary VM, jump to incoming co */
+    if (mis->migration_incoming_co) {
+        qemu_coroutine_enter(mis->migration_incoming_co, NULL);
+    }
+}
+
 static void primary_vm_do_failover(void)
 {
     MigrationState *s = migrate_get_current();
@@ -72,6 +99,8 @@ void colo_do_failover(MigrationState *s)
 
     if (get_colo_mode() == COLO_MODE_PRIMARY) {
         primary_vm_do_failover();
+    } else {
+        secondary_vm_do_failover();
     }
 }
 
@@ -418,6 +447,12 @@ void *colo_process_incoming_thread(void *opaque)
                 continue;
             }
         }
+
+        if (failover_request_is_active()) {
+            error_report("failover request");
+            goto out;
+        }
+
         /* FIXME: This is unnecessary for periodic checkpoint mode */
         ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_CHECKPOINT_REPLY);
         if (ret < 0) {
@@ -487,10 +522,11 @@ out:
         qemu_fclose(fb);
     }
     qsb_free(buffer);
-
-    qemu_mutex_lock_iothread();
+    /* Here the failover BH holds the global lock and will join the colo
+    * incoming thread, so it is not necessary to take the lock again;
+    * doing so would cause a deadlock.
+    */
     colo_release_ram_cache();
-    qemu_mutex_unlock_iothread();
 
     if (mis->to_src_file) {
         qemu_fclose(mis->to_src_file);
-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v12 25/38] qmp event: Add event notification for COLO error
  2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (23 preceding siblings ...)
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 24/38] COLO: Implement failover work for Secondary VM zhanghailiang
@ 2015-12-15  8:22 ` zhanghailiang
  2015-12-18 16:03   ` Eric Blake
  2015-12-19 10:02   ` Markus Armbruster
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 26/38] COLO failover: Shutdown related socket fd when do failover zhanghailiang
                   ` (13 subsequent siblings)
  38 siblings, 2 replies; 94+ messages in thread
From: zhanghailiang @ 2015-12-15  8:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, Markus Armbruster, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, Michael Roth, hongyang.yang

If an error happens during the VM's COLO FT stage, it is important to notify the
user of this event. Together with 'x-colo-lost-heartbeat', this lets users
intervene in COLO's failover work immediately.
Even if users don't want to get involved in COLO's failover decision,
it is still necessary to notify them that we have exited COLO mode.

Cc: Markus Armbruster <armbru@redhat.com>
Cc: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
v11:
- Fix several typos found by Eric

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 docs/qmp-events.txt | 17 +++++++++++++++++
 migration/colo.c    | 11 +++++++++++
 qapi-schema.json    | 16 ++++++++++++++++
 qapi/event.json     | 17 +++++++++++++++++
 4 files changed, 61 insertions(+)

diff --git a/docs/qmp-events.txt b/docs/qmp-events.txt
index d2f1ce4..19f68fc 100644
--- a/docs/qmp-events.txt
+++ b/docs/qmp-events.txt
@@ -184,6 +184,23 @@ Example:
 Note: The "ready to complete" status is always reset by a BLOCK_JOB_ERROR
 event.
 
+COLO_EXIT
+---------
+
+Emitted when VM finishes COLO mode due to some errors happening or
+at the request of users.
+
+Data:
+
+ - "mode": COLO mode, primary or secondary side (json-string)
+ - "reason": the exit reason, internal error or external request (json-string)
+ - "error": error message (json-string, optional)
+
+Example:
+
+{"timestamp": {"seconds": 2032141960, "microseconds": 417172},
+ "event": "COLO_EXIT", "data": {"mode": "primary", "reason": "request" } }
+
 DEVICE_DELETED
 --------------
 
diff --git a/migration/colo.c b/migration/colo.c
index d1dd4e1..d06c14f 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -18,6 +18,7 @@
 #include "qemu/error-report.h"
 #include "qemu/sockets.h"
 #include "migration/failover.h"
+#include "qapi-event.h"
 
 /* colo buffer */
 #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
@@ -349,6 +350,11 @@ static void colo_process_checkpoint(MigrationState *s)
 out:
     if (ret < 0) {
         error_report("%s: %s", __func__, strerror(-ret));
+        qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_ERROR,
+                                  true, strerror(-ret), NULL);
+    } else {
+        qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_REQUEST,
+                                  false, NULL, NULL);
     }
 
     qsb_free(buffer);
@@ -516,6 +522,11 @@ out:
     if (ret < 0) {
         error_report("colo incoming thread will exit, detect error: %s",
                      strerror(-ret));
+        qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_ERROR,
+                                  true, strerror(-ret), NULL);
+    } else {
+        qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_REQUEST,
+                                  false, NULL, NULL);
     }
 
     if (fb) {
diff --git a/qapi-schema.json b/qapi-schema.json
index feb7d53..f6ecb88 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -778,6 +778,22 @@
   'data': [ 'unknown', 'primary', 'secondary'] }
 
 ##
+# @COLOExitReason
+#
+# The reason for a COLO exit
+#
+# @unknown: unknown reason
+#
+# @request: COLO exit is due to an external request
+#
+# @error: COLO exit is due to an internal error
+#
+# Since: 2.6
+##
+{ 'enum': 'COLOExitReason',
+  'data': [ 'unknown', 'request', 'error'] }
+
+##
 # @x-colo-lost-heartbeat
 #
 # Tell qemu that heartbeat is lost, request it to do takeover procedures.
diff --git a/qapi/event.json b/qapi/event.json
index f0cef01..f63d456 100644
--- a/qapi/event.json
+++ b/qapi/event.json
@@ -255,6 +255,23 @@
   'data': {'status': 'MigrationStatus'}}
 
 ##
+# @COLO_EXIT
+#
+# Emitted when VM finishes COLO mode due to some errors happening or
+# at the request of users.
+#
+# @mode: which COLO mode the VM was in when it exited.
+#
+# @reason: describes the reason for the COLO exit.
+#
+# @error: #optional, error message. Only present on error happening.
+#
+# Since: 2.6
+##
+{ 'event': 'COLO_EXIT',
+  'data': {'mode': 'COLOMode', 'reason': 'COLOExitReason', '*error': 'str' } }
+
+##
 # @ACPI_DEVICE_OST
 #
 # Emitted when guest executes ACPI _OST method.
-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v12 26/38] COLO failover: Shutdown related socket fd when do failover
  2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (24 preceding siblings ...)
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 25/38] qmp event: Add event notification for COLO error zhanghailiang
@ 2015-12-15  8:22 ` zhanghailiang
  2015-12-15  9:44   ` Dr. David Alan Gilbert
  2015-12-15 10:23   ` Dr. David Alan Gilbert
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 27/38] COLO failover: Don't do failover during loading VM's state zhanghailiang
                   ` (12 subsequent siblings)
  38 siblings, 2 replies; 94+ messages in thread
From: zhanghailiang @ 2015-12-15  8:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

If the network connection between COLO's two sides is broken while the colo or
colo incoming thread is blocked in a 'read' or 'write' on the socket fd, the
thread will not detect the error until the connection times out, which can take
a long time.

Here we shut down all the related socket file descriptors in the failover BH to
wake up the blocked operation. Besides, we should close the corresponding file
descriptors only after the failover BH has shut them down, or there will be an error.
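
The effect this relies on can be shown in isolation. The sketch below is
not taken from the patch: it assumes a POSIX system and uses a local
AF_UNIX socket pair instead of the migration stream, and the function name
is hypothetical. It shows that a read() which would otherwise block
returns at once after shutdown(), while the descriptor itself stays valid
until close().

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns the result of read() on a socket that has been shut down but
 * not yet closed, or -2 if the socket pair could not be created.
 */
static long sketch_read_after_shutdown(void)
{
    int fds[2];
    char byte;
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0) {
        return -2;
    }

    /* Nothing was ever written, so a plain read() here would block. */
    shutdown(fds[0], SHUT_RDWR);

    /* After shutdown() the read returns immediately with end-of-file. */
    n = read(fds[0], &byte, 1);

    /* Close only after the shutdown has been issued. */
    close(fds[0]);
    close(fds[1]);
    return (long)n;
}
```

Keeping shutdown() and close() as separate steps is what lets another
thread be woken safely: the fd number cannot be reused by an unrelated
open() until the owner finally closes it.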

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
v12:
- Shut down the fd of both QEMUFiles even though they may share the same fd. (Dave's suggestion)
v11:
- Only shut down the fd once

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 migration/colo.c | 42 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 40 insertions(+), 2 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index d06c14f..58531e7 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -60,6 +60,18 @@ static void secondary_vm_do_failover(void)
         /* recover runstate to normal migration finish state */
         autostart = true;
     }
+    /*
+    * Make sure the colo incoming thread is not blocked in recv or send.
+    * If mis->from_src_file and mis->to_src_file use the same fd,
+    * the second shutdown() will return -1; we ignore this value,
+    * it is harmless.
+    */
+    if (mis->from_src_file) {
+        qemu_file_shutdown(mis->from_src_file);
+    }
+    if (mis->to_src_file) {
+        qemu_file_shutdown(mis->to_src_file);
+    }
 
     old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
                                    FAILOVER_STATUS_COMPLETED);
@@ -82,6 +94,18 @@ static void primary_vm_do_failover(void)
     migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
                       MIGRATION_STATUS_COMPLETED);
 
+    /*
+    * Make sure the colo thread is not blocked in recv or send.
+    * s->rp_state.from_dst_file and s->to_dst_file may use the same
+    * fd, but shutting it down twice is harmless.
+    */
+    if (s->to_dst_file) {
+        qemu_file_shutdown(s->to_dst_file);
+    }
+    if (s->rp_state.from_dst_file) {
+        qemu_file_shutdown(s->rp_state.from_dst_file);
+    }
+
     old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
                                    FAILOVER_STATUS_COMPLETED);
     if (old_state != FAILOVER_STATUS_HANDLING) {
@@ -348,7 +372,7 @@ static void colo_process_checkpoint(MigrationState *s)
     }
 
 out:
-    if (ret < 0) {
+    if (ret < 0 || (!ret && !failover_request_is_active())) {
         error_report("%s: %s", __func__, strerror(-ret));
         qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_ERROR,
                                   true, strerror(-ret), NULL);
@@ -360,6 +384,15 @@ out:
     qsb_free(buffer);
     buffer = NULL;
 
+    /* Hopefully this loop will not spin for long */
+    while (failover_get_state() != FAILOVER_STATUS_COMPLETED) {
+        ;
+    }
+    /*
+    * Must be called after the failover BH has completed,
+    * or the failover BH may shut down the wrong fd, one that
+    * was re-used by another thread after we released it here.
+    */
     if (s->rp_state.from_dst_file) {
         qemu_fclose(s->rp_state.from_dst_file);
     }
@@ -519,7 +552,7 @@ void *colo_process_incoming_thread(void *opaque)
     }
 
 out:
-    if (ret < 0) {
+    if (ret < 0 || (!ret && !failover_request_is_active())) {
         error_report("colo incoming thread will exit, detect error: %s",
                      strerror(-ret));
         qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_ERROR,
@@ -539,6 +572,11 @@ out:
     */
     colo_release_ram_cache();
 
+    /* Hopefully this loop will not spin for long */
+    while (failover_get_state() != FAILOVER_STATUS_COMPLETED) {
+        ;
+    }
+    /* Must be called after failover BH is completed */
     if (mis->to_src_file) {
         qemu_fclose(mis->to_src_file);
     }
-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v12 27/38] COLO failover: Don't do failover during loading VM's state
  2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (25 preceding siblings ...)
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 26/38] COLO failover: Shutdown related socket fd when do failover zhanghailiang
@ 2015-12-15  8:22 ` zhanghailiang
  2015-12-15 10:21   ` Dr. David Alan Gilbert
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 28/38] COLO: Process shutdown command for VM in COLO state zhanghailiang
                   ` (11 subsequent siblings)
  38 siblings, 1 reply; 94+ messages in thread
From: zhanghailiang @ 2015-12-15  8:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

We should not do failover work while the main thread is loading the
VM's state, otherwise it will destroy the consistency of the VM's memory and
device state.

Here we add a new failover status 'RELAUNCH', which means we should
relaunch the failover process.
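
The deferral can be pictured as a small state machine. The following is a
single-threaded sketch under simplifying assumptions: plain variables
instead of atomics, the REQUEST -> HANDLING step collapsed into one
assignment, and hypothetical rl_* and RL_* names throughout. It is not the
code from this patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the failover status values. */
enum { RL_NONE = 0, RL_REQUEST = 1, RL_HANDLING = 2,
       RL_COMPLETED = 3, RL_RELAUNCH = 4 };

static int rl_state = RL_NONE;
static bool rl_loading; /* true while the VM state is being loaded */

/* Models the failover BH on the secondary side. */
static void rl_do_failover(void)
{
    if (rl_state != RL_HANDLING) {
        return;
    }
    if (rl_loading) {
        /* Too early: remember that a failover must be retried later. */
        rl_state = RL_RELAUNCH;
        return;
    }
    rl_state = RL_COMPLETED;
}

/* Models a failover request followed by the BH running. */
static void rl_request(void)
{
    if (rl_state != RL_NONE) {
        return;
    }
    rl_state = RL_HANDLING; /* REQUEST -> HANDLING collapsed for brevity */
    rl_do_failover();
}

/*
 * Models the incoming thread once loading has finished. Returns true if
 * a deferred failover was relaunched and the thread should leave its loop.
 */
static bool rl_finish_load(void)
{
    rl_loading = false;
    if (rl_state == RL_RELAUNCH) {
        rl_state = RL_NONE;
        rl_request();
        return true;
    }
    return false;
}
```

A request that arrives during loading parks the state in RELAUNCH; once
loading ends, the state is reset to NONE and the request is replayed, so
the normal path to COMPLETED is reused rather than duplicated.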

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 include/migration/failover.h |  2 ++
 migration/colo.c             | 25 +++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/include/migration/failover.h b/include/migration/failover.h
index fba3931..e115d25 100644
--- a/include/migration/failover.h
+++ b/include/migration/failover.h
@@ -20,6 +20,8 @@ typedef enum COLOFailoverStatus {
     FAILOVER_STATUS_REQUEST = 1, /* Request but not handled */
     FAILOVER_STATUS_HANDLING = 2, /* In the process of handling failover */
     FAILOVER_STATUS_COMPLETED = 3, /* Finish the failover process */
+    /* Optional, Relaunch the failover process, again 'NONE' -> 'COMPLETED' */
+    FAILOVER_STATUS_RELAUNCH = 4,
 } COLOFailoverStatus;
 
 void failover_init_state(void);
diff --git a/migration/colo.c b/migration/colo.c
index 58531e7..f4bb661 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -20,6 +20,8 @@
 #include "migration/failover.h"
 #include "qapi-event.h"
 
+static bool vmstate_loading;
+
 /* colo buffer */
 #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
 
@@ -52,6 +54,19 @@ static void secondary_vm_do_failover(void)
     int old_state;
     MigrationIncomingState *mis = migration_incoming_get_current();
 
+    /* Cannot do failover while the VM is loading its VMstate, or
+      * it will break the secondary VM.
+      */
+    if (vmstate_loading) {
+        old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
+                                       FAILOVER_STATUS_RELAUNCH);
+        if (old_state != FAILOVER_STATUS_HANDLING) {
+            error_report("Unknown error while doing failover for secondary VM,"
+                         " old_state: %d", old_state);
+        }
+        return;
+    }
+
     migrate_set_state(&mis->state, MIGRATION_STATUS_COLO,
                       MIGRATION_STATUS_COMPLETED);
 
@@ -535,13 +550,23 @@ void *colo_process_incoming_thread(void *opaque)
 
         qemu_mutex_lock_iothread();
         qemu_system_reset(VMRESET_SILENT);
+        vmstate_loading = true;
         if (qemu_loadvm_state(fb) < 0) {
             error_report("COLO: loadvm failed");
+            vmstate_loading = false;
             qemu_mutex_unlock_iothread();
             goto out;
         }
+
+        vmstate_loading = false;
         qemu_mutex_unlock_iothread();
 
+        if (failover_get_state() == FAILOVER_STATUS_RELAUNCH) {
+            failover_set_state(FAILOVER_STATUS_RELAUNCH, FAILOVER_STATUS_NONE);
+            failover_request_active(NULL);
+            goto out;
+        }
+
         ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_VMSTATE_LOADED);
         if (ret < 0) {
             goto out;
-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v12 28/38] COLO: Process shutdown command for VM in COLO state
  2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (26 preceding siblings ...)
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 27/38] COLO failover: Don't do failover during loading VM's state zhanghailiang
@ 2015-12-15  8:22 ` zhanghailiang
  2015-12-15 11:31   ` Dr. David Alan Gilbert
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 29/38] COLO: Update the global runstate after going into colo state zhanghailiang
                   ` (10 subsequent siblings)
  38 siblings, 1 reply; 94+ messages in thread
From: zhanghailiang @ 2015-12-15  8:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	Paolo Bonzini, hongyang.yang

If the VM is in COLO FT state, we should do some extra work before the normal
shutdown process. The SVM will ignore the shutdown command if it is issued
directly to it; the PVM will forward the shutdown command to the SVM if it
receives this command.
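
The routing rule described above can be summarised as a pure decision
function. This is only an illustrative sketch: the enum, the function name
and the two boolean parameters are hypothetical and stand in for the
migration_incoming_in_colo_state() and migration_in_colo_state() checks.

```c
#include <stdbool.h>

/* Hypothetical outcome of a shutdown request. */
typedef enum {
    SD_IGNORE,              /* SVM: leave the decision to the primary */
    SD_DEFER_TO_CHECKPOINT, /* PVM: forward to the SVM at the next checkpoint */
    SD_IMMEDIATE            /* not in COLO: normal shutdown path */
} SketchShutdownAction;

static SketchShutdownAction sketch_route_shutdown(bool incoming_in_colo,
                                                  bool outgoing_in_colo)
{
    if (incoming_in_colo) {
        return SD_IGNORE;
    }
    if (outgoing_in_colo) {
        return SD_DEFER_TO_CHECKPOINT;
    }
    return SD_IMMEDIATE;
}
```

Checking the secondary case first means a shutdown issued directly to the
SVM never takes effect on its own; it only happens when the PVM forwards it.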

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 include/sysemu/sysemu.h |  3 +++
 migration/colo.c        | 25 +++++++++++++++++++++++--
 qapi-schema.json        |  4 +++-
 vl.c                    | 26 ++++++++++++++++++++++++--
 4 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 3bb8897..91eeda3 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -52,6 +52,8 @@ typedef enum WakeupReason {
     QEMU_WAKEUP_REASON_OTHER,
 } WakeupReason;
 
+extern int colo_shutdown_requested;
+
 void qemu_system_reset_request(void);
 void qemu_system_suspend_request(void);
 void qemu_register_suspend_notifier(Notifier *notifier);
@@ -59,6 +61,7 @@ void qemu_system_wakeup_request(WakeupReason reason);
 void qemu_system_wakeup_enable(WakeupReason reason, bool enabled);
 void qemu_register_wakeup_notifier(Notifier *notifier);
 void qemu_system_shutdown_request(void);
+void qemu_system_shutdown_request_core(void);
 void qemu_system_powerdown_request(void);
 void qemu_register_powerdown_notifier(Notifier *notifier);
 void qemu_system_debug_request(void);
diff --git a/migration/colo.c b/migration/colo.c
index f4bb661..a094991 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -231,6 +231,7 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
                                           QEMUSizedBuffer *buffer)
 {
     int ret;
+    int colo_shutdown;
     size_t size;
     QEMUFile *trans = NULL;
 
@@ -258,6 +259,7 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
         ret = -1;
         goto out;
     }
+    colo_shutdown = colo_shutdown_requested;
     vm_stop_force_state(RUN_STATE_COLO);
     qemu_mutex_unlock_iothread();
     trace_colo_vm_state_change("run", "stop");
@@ -311,6 +313,15 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
         goto out;
     }
 
+    if (colo_shutdown) {
+        colo_put_cmd(s->to_dst_file, COLO_COMMAND_GUEST_SHUTDOWN);
+        qemu_fflush(s->to_dst_file);
+        colo_shutdown_requested = 0;
+        qemu_system_shutdown_request_core();
+        /* Fix me: Just let the colo thread exit ? */
+        qemu_thread_exit(0);
+    }
+
     ret = 0;
     /* Resume primary guest */
     qemu_mutex_lock_iothread();
@@ -370,8 +381,9 @@ static void colo_process_checkpoint(MigrationState *s)
         }
 
         current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
-        if (current_time - checkpoint_time <
-            s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY]) {
+        if ((current_time - checkpoint_time <
+            s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY]) &&
+            !colo_shutdown_requested) {
             int64_t delay_ms;
 
             delay_ms = s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY] -
@@ -442,6 +454,15 @@ static int colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request)
     case COLO_COMMAND_CHECKPOINT_REQUEST:
         *checkpoint_request = 1;
         return 0;
+    case COLO_COMMAND_GUEST_SHUTDOWN:
+        qemu_mutex_lock_iothread();
+        vm_stop_force_state(RUN_STATE_COLO);
+        qemu_system_shutdown_request_core();
+        qemu_mutex_unlock_iothread();
+        /* the main thread will exit and terminate the whole
+        * process, do we need some cleanup?
+        */
+        qemu_thread_exit(0);
     default:
         return -EINVAL;
     }
diff --git a/qapi-schema.json b/qapi-schema.json
index f6ecb88..b5b1a02 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -754,12 +754,14 @@
 #
 # @vmstate-loaded: VM's state has been loaded by SVM.
 #
+# @guest-shutdown: shutdown request from PVM to SVM
+#
 # Since: 2.6
 ##
 { 'enum': 'COLOCommand',
   'data': [ 'checkpoint-ready', 'checkpoint-request', 'checkpoint-reply',
             'vmstate-send', 'vmstate-size','vmstate-received',
-            'vmstate-loaded' ] }
+            'vmstate-loaded', 'guest-shutdown' ] }
 
 ##
 # @COLOMode
diff --git a/vl.c b/vl.c
index fca630b..1a61300 100644
--- a/vl.c
+++ b/vl.c
@@ -1636,6 +1636,8 @@ static NotifierList wakeup_notifiers =
     NOTIFIER_LIST_INITIALIZER(wakeup_notifiers);
 static uint32_t wakeup_reason_mask = ~(1 << QEMU_WAKEUP_REASON_NONE);
 
+int colo_shutdown_requested;
+
 int qemu_shutdown_requested_get(void)
 {
     return shutdown_requested;
@@ -1767,6 +1769,10 @@ void qemu_system_guest_panicked(void)
 void qemu_system_reset_request(void)
 {
     if (no_reboot) {
+        qemu_system_shutdown_request();
+        if (!shutdown_requested) {/* colo handle it ? */
+            return;
+        }
         shutdown_requested = 1;
     } else {
         reset_requested = 1;
@@ -1840,14 +1846,30 @@ void qemu_system_killed(int signal, pid_t pid)
     qemu_notify_event();
 }
 
-void qemu_system_shutdown_request(void)
+void qemu_system_shutdown_request_core(void)
 {
-    trace_qemu_system_shutdown_request();
     replay_shutdown_request();
     shutdown_requested = 1;
     qemu_notify_event();
 }
 
+void qemu_system_shutdown_request(void)
+{
+    trace_qemu_system_shutdown_request();
+    /*
+    * If in colo mode, we need to do some significant work before responding
+    * to the shutdown request.
+    */
+    if (migration_incoming_in_colo_state()) {
+        return; /* primary's responsibility */
+    }
+    if (migration_in_colo_state()) {
+        colo_shutdown_requested = 1;
+        return;
+    }
+    qemu_system_shutdown_request_core();
+}
+
 static void qemu_system_powerdown(void)
 {
     qapi_event_send_powerdown(&error_abort);
-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v12 29/38] COLO: Update the global runstate after going into colo state
  2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (27 preceding siblings ...)
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 28/38] COLO: Process shutdown command for VM in COLO state zhanghailiang
@ 2015-12-15  8:22 ` zhanghailiang
  2015-12-15 11:52   ` Dr. David Alan Gilbert
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 30/38] savevm: Split load vm state function qemu_loadvm_state zhanghailiang
                   ` (9 subsequent siblings)
  38 siblings, 1 reply; 94+ messages in thread
From: zhanghailiang @ 2015-12-15  8:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

If we start qemu with -S, the runstate will change from 'prelaunch' to 'running'
after going into colo state.
So it is necessary to update the global runstate after going into colo state.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 migration/colo.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index a094991..62a0444 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -374,6 +374,11 @@ static void colo_process_checkpoint(MigrationState *s)
     qemu_mutex_unlock_iothread();
     trace_colo_vm_state_change("stop", "run");
 
+    ret = global_state_store();
+    if (ret < 0) {
+        goto out;
+    }
+
     while (s->state == MIGRATION_STATUS_COLO) {
         if (failover_request_is_active()) {
             error_report("failover request");
-- 
1.8.3.1

* [Qemu-devel] [PATCH COLO-Frame v12 30/38] savevm: Split load vm state function qemu_loadvm_state
  2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (28 preceding siblings ...)
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 29/38] COLO: Update the global runstate after going into colo state zhanghailiang
@ 2015-12-15  8:22 ` zhanghailiang
  2015-12-15 12:08   ` Dr. David Alan Gilbert
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 31/38] COLO: Separate the process of saving/loading ram and device state zhanghailiang
                   ` (8 subsequent siblings)
  38 siblings, 1 reply; 94+ messages in thread
From: zhanghailiang @ 2015-12-15  8:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

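The qemu_loadvm_state path is too long. Simplify it by moving the section
handling in qemu_loadvm_state_main() into two helper functions,
qemu_loadvm_section_start_full() and qemu_loadvm_section_part_end().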
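
The shape of the refactoring can be sketched in isolation. This is a minimal
model, not the QEMU code: every sketch_* name is hypothetical, the numeric
tag values are assumptions, and the helpers only count calls instead of
loading state.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative section-type tags; names mirror the patch, values are assumed. */
enum {
    SKETCH_VM_EOF           = 0x00,
    SKETCH_VM_SECTION_START = 0x01,
    SKETCH_VM_SECTION_PART  = 0x02,
    SKETCH_VM_SECTION_END   = 0x03,
    SKETCH_VM_SECTION_FULL  = 0x04,
};

/* Hypothetical counters standing in for the real per-section loading work. */
struct sketch_stats {
    int start_full;
    int part_end;
};

/* Stand-in for the START/FULL helper: just records that it ran. */
static int sketch_section_start_full(struct sketch_stats *st)
{
    st->start_full++;
    return 0;
}

/* Stand-in for the PART/END helper: just records that it ran. */
static int sketch_section_part_end(struct sketch_stats *st)
{
    st->part_end++;
    return 0;
}

/*
 * Walk a stream of type bytes until EOF and dispatch each one to a helper.
 * Returns 0 on EOF, -1 on an unknown type or if the stream ends without EOF.
 */
static int sketch_loadvm_main(const uint8_t *stream, size_t len,
                              struct sketch_stats *st)
{
    size_t i;
    int ret;

    for (i = 0; i < len; i++) {
        switch (stream[i]) {
        case SKETCH_VM_EOF:
            return 0;
        case SKETCH_VM_SECTION_START:
        case SKETCH_VM_SECTION_FULL:
            ret = sketch_section_start_full(st);
            if (ret < 0) {
                return ret;
            }
            break;
        case SKETCH_VM_SECTION_PART:
        case SKETCH_VM_SECTION_END:
            ret = sketch_section_part_end(st);
            if (ret < 0) {
                return ret;
            }
            break;
        default:
            return -1;
        }
    }
    return -1;
}
```

Keeping the loop as a thin dispatcher lets other callers reuse each helper
for the one section type they expect.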

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 migration/savevm.c | 161 ++++++++++++++++++++++++++++++++---------------------
 1 file changed, 97 insertions(+), 64 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index f102870..c7c26d8 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1710,90 +1710,123 @@ void loadvm_free_handlers(MigrationIncomingState *mis)
     }
 }
 
+static int
+qemu_loadvm_section_start_full(QEMUFile *f, MigrationIncomingState *mis)
+{
+    uint32_t instance_id, version_id, section_id;
+    SaveStateEntry *se;
+    LoadStateEntry *le;
+    char idstr[256];
+    int ret;
+
+    /* Read section start */
+    section_id = qemu_get_be32(f);
+    if (!qemu_get_counted_string(f, idstr)) {
+        error_report("Unable to read ID string for section %u",
+                     section_id);
+        return -EINVAL;
+    }
+    instance_id = qemu_get_be32(f);
+    version_id = qemu_get_be32(f);
+
+    trace_qemu_loadvm_state_section_startfull(section_id, idstr,
+            instance_id, version_id);
+    /* Find savevm section */
+    se = find_se(idstr, instance_id);
+    if (se == NULL) {
+        error_report("Unknown savevm section or instance '%s' %d",
+                     idstr, instance_id);
+        ret = -EINVAL;
+        return ret;
+    }
+
+    /* Validate version */
+    if (version_id > se->version_id) {
+        error_report("savevm: unsupported version %d for '%s' v%d",
+                     version_id, idstr, se->version_id);
+        ret = -EINVAL;
+        return ret;
+    }
+
+    /* Add entry */
+    le = g_malloc0(sizeof(*le));
+
+    le->se = se;
+    le->section_id = section_id;
+    le->version_id = version_id;
+    QLIST_INSERT_HEAD(&mis->loadvm_handlers, le, entry);
+
+    ret = vmstate_load(f, le->se, le->version_id);
+    if (ret < 0) {
+        error_report("error while loading state for instance 0x%x of"
+                     " device '%s'", instance_id, idstr);
+        return ret;
+    }
+    if (!check_section_footer(f, le)) {
+        ret = -EINVAL;
+        return ret;
+    }
+
+    return 0;
+}
+
+static int
+qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis)
+{
+    uint32_t section_id;
+    LoadStateEntry *le;
+    int ret;
+
+    section_id = qemu_get_be32(f);
+
+    trace_qemu_loadvm_state_section_partend(section_id);
+    QLIST_FOREACH(le, &mis->loadvm_handlers, entry) {
+        if (le->section_id == section_id) {
+            break;
+        }
+    }
+    if (le == NULL) {
+        error_report("Unknown savevm section %d", section_id);
+        ret = -EINVAL;
+        return ret;
+    }
+
+    ret = vmstate_load(f, le->se, le->version_id);
+    if (ret < 0) {
+        error_report("error while loading state section id %d(%s)",
+                     section_id, le->se->idstr);
+        return ret;
+    }
+    if (!check_section_footer(f, le)) {
+        ret = -EINVAL;
+        return ret;
+    }
+
+    return 0;
+}
+
 static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
 {
     uint8_t section_type;
     int ret;
 
     while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
-        uint32_t instance_id, version_id, section_id;
-        SaveStateEntry *se;
-        LoadStateEntry *le;
-        char idstr[256];
 
         trace_qemu_loadvm_state_section(section_type);
         switch (section_type) {
         case QEMU_VM_SECTION_START:
         case QEMU_VM_SECTION_FULL:
-            /* Read section start */
-            section_id = qemu_get_be32(f);
-            if (!qemu_get_counted_string(f, idstr)) {
-                error_report("Unable to read ID string for section %u",
-                            section_id);
-                return -EINVAL;
-            }
-            instance_id = qemu_get_be32(f);
-            version_id = qemu_get_be32(f);
-
-            trace_qemu_loadvm_state_section_startfull(section_id, idstr,
-                                                      instance_id, version_id);
-            /* Find savevm section */
-            se = find_se(idstr, instance_id);
-            if (se == NULL) {
-                error_report("Unknown savevm section or instance '%s' %d",
-                             idstr, instance_id);
-                return -EINVAL;
-            }
-
-            /* Validate version */
-            if (version_id > se->version_id) {
-                error_report("savevm: unsupported version %d for '%s' v%d",
-                             version_id, idstr, se->version_id);
-                return -EINVAL;
-            }
-
-            /* Add entry */
-            le = g_malloc0(sizeof(*le));
-
-            le->se = se;
-            le->section_id = section_id;
-            le->version_id = version_id;
-            QLIST_INSERT_HEAD(&mis->loadvm_handlers, le, entry);
-
-            ret = vmstate_load(f, le->se, le->version_id);
+            ret = qemu_loadvm_section_start_full(f, mis);
             if (ret < 0) {
-                error_report("error while loading state for instance 0x%x of"
-                             " device '%s'", instance_id, idstr);
                 return ret;
             }
-            if (!check_section_footer(f, le)) {
-                return -EINVAL;
-            }
             break;
         case QEMU_VM_SECTION_PART:
         case QEMU_VM_SECTION_END:
-            section_id = qemu_get_be32(f);
-
-            trace_qemu_loadvm_state_section_partend(section_id);
-            QLIST_FOREACH(le, &mis->loadvm_handlers, entry) {
-                if (le->section_id == section_id) {
-                    break;
-                }
-            }
-            if (le == NULL) {
-                error_report("Unknown savevm section %d", section_id);
-                return -EINVAL;
-            }
-
-            ret = vmstate_load(f, le->se, le->version_id);
+            ret = qemu_loadvm_section_part_end(f, mis);
             if (ret < 0) {
-                error_report("error while loading state section id %d(%s)",
-                             section_id, le->se->idstr);
                 return ret;
             }
-            if (!check_section_footer(f, le)) {
-                return -EINVAL;
-            }
             break;
         case QEMU_VM_COMMAND:
             ret = loadvm_process_command(f);
-- 
1.8.3.1

* [Qemu-devel] [PATCH COLO-Frame v12 31/38] COLO: Separate the process of saving/loading ram and device state
  2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (29 preceding siblings ...)
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 30/38] savevm: Split load vm state function qemu_loadvm_state zhanghailiang
@ 2015-12-15  8:22 ` zhanghailiang
  2015-12-18 10:53   ` Dr. David Alan Gilbert
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 32/38] COLO: Split qemu_savevm_state_begin out of checkpoint process zhanghailiang
                   ` (7 subsequent siblings)
  38 siblings, 1 reply; 94+ messages in thread
From: zhanghailiang @ 2015-12-15  8:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

Separate the saving/loading of RAM and device state during a checkpoint, and
add new helpers for saving and loading each of them. With this change, RAM can
be transferred directly from master to slave without staging it in a
QEMUSizedBuffer, which also reduces the extra memory used during a checkpoint.

In addition, move the colo_flush_ram_cache() call to its proper position after
the above change.
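
The resulting ordering on the wire can be sketched as follows. This is a toy
model under stated assumptions: the sketch_* names are hypothetical, the
"wire" is a fixed-size byte array rather than a QEMUFile, and the 4-byte
big-endian length prefix is an arbitrary choice for the sketch, not the COLO
protocol encoding.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_WIRE_MAX  256
#define SKETCH_STAGE_MAX 64

/* Hypothetical stand-in for the outgoing migration stream. */
struct sketch_wire {
    uint8_t data[SKETCH_WIRE_MAX];
    size_t len;
};

/* Append n bytes to the wire; returns -1 if they do not fit. */
static int sketch_put(struct sketch_wire *w, const void *buf, size_t n)
{
    if (n > SKETCH_WIRE_MAX - w->len) {
        return -1;
    }
    memcpy(w->data + w->len, buf, n);
    w->len += n;
    return 0;
}

/*
 * One checkpoint: device state is staged in a side buffer, RAM is written
 * straight to the wire, then the staged device state follows with a length
 * prefix so the receiver knows how much to read.
 */
static int sketch_send_checkpoint(struct sketch_wire *w,
                                  const uint8_t *ram, size_t ram_len,
                                  const uint8_t *dev, size_t dev_len)
{
    uint8_t staged[SKETCH_STAGE_MAX];
    uint8_t size_be[4];

    if (dev_len > sizeof(staged)) {
        return -1;
    }
    memcpy(staged, dev, dev_len);          /* device state goes to the buffer */

    if (sketch_put(w, ram, ram_len) < 0) { /* RAM bypasses the buffer */
        return -1;
    }

    size_be[0] = (uint8_t)((dev_len >> 24) & 0xff);
    size_be[1] = (uint8_t)((dev_len >> 16) & 0xff);
    size_be[2] = (uint8_t)((dev_len >> 8) & 0xff);
    size_be[3] = (uint8_t)(dev_len & 0xff);
    if (sketch_put(w, size_be, sizeof(size_be)) < 0) {
        return -1;
    }
    return sketch_put(w, staged, dev_len);
}
```

Only the small device state needs staging memory here; the RAM payload, which
is usually far larger, never passes through the side buffer.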

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
v11:
- Remove load configuration section in qemu_loadvm_state_begin()

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 include/sysemu/sysemu.h |   6 +++
 migration/colo.c        |  43 ++++++++++++----
 migration/ram.c         |   5 --
 migration/savevm.c      | 132 ++++++++++++++++++++++++++++++++++++++++++++++--
 4 files changed, 168 insertions(+), 18 deletions(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 91eeda3..5deae53 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -133,7 +133,13 @@ void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
                                            uint64_t *start_list,
                                            uint64_t *length_list);
 
+int qemu_save_ram_precopy(QEMUFile *f);
+int qemu_save_device_state(QEMUFile *f);
+
 int qemu_loadvm_state(QEMUFile *f);
+int qemu_loadvm_state_begin(QEMUFile *f);
+int qemu_load_ram_state(QEMUFile *f);
+int qemu_load_device_state(QEMUFile *f);
 
 typedef enum DisplayType
 {
diff --git a/migration/colo.c b/migration/colo.c
index 62a0444..d253d64 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -272,21 +272,32 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
         goto out;
     }
 
+    ret = colo_put_cmd(s->to_dst_file, COLO_COMMAND_VMSTATE_SEND);
+    if (ret < 0) {
+        goto out;
+    }
     /* Disable block migration */
     s->params.blk = 0;
     s->params.shared = 0;
-    qemu_savevm_state_header(trans);
-    qemu_savevm_state_begin(trans, &s->params);
-    qemu_mutex_lock_iothread();
-    qemu_savevm_state_complete_precopy(trans, false);
-    qemu_mutex_unlock_iothread();
-
-    qemu_fflush(trans);
+    qemu_savevm_state_begin(s->to_dst_file, &s->params);
+    ret = qemu_file_get_error(s->to_dst_file);
+    if (ret < 0) {
+        error_report("save vm state begin error\n");
+        goto out;
+    }
 
-    ret = colo_put_cmd(s->to_dst_file, COLO_COMMAND_VMSTATE_SEND);
+    qemu_mutex_lock_iothread();
+    /* Note: device state is saved into buffer */
+    ret = qemu_save_device_state(trans);
     if (ret < 0) {
+        error_report("save device state error\n");
+        qemu_mutex_unlock_iothread();
         goto out;
     }
+    qemu_fflush(trans);
+    qemu_save_ram_precopy(s->to_dst_file);
+    qemu_mutex_unlock_iothread();
+
     /* we send the total size of the vmstate first */
     size = qsb_get_length(buffer);
     ret = colo_put_cmd_value(s->to_dst_file, COLO_COMMAND_VMSTATE_SIZE, size);
@@ -545,6 +556,16 @@ void *colo_process_incoming_thread(void *opaque)
             goto out;
         }
 
+        ret = qemu_loadvm_state_begin(mis->from_src_file);
+        if (ret < 0) {
+            error_report("load vm state begin error, ret=%d", ret);
+            goto out;
+        }
+        ret = qemu_load_ram_state(mis->from_src_file);
+        if (ret < 0) {
+            error_report("load ram state error");
+            goto out;
+        }
         /* read the VM state total size first */
         ret = colo_get_cmd_value(mis->from_src_file,
                                  COLO_COMMAND_VMSTATE_SIZE, &value);
@@ -577,8 +598,10 @@ void *colo_process_incoming_thread(void *opaque)
         qemu_mutex_lock_iothread();
         qemu_system_reset(VMRESET_SILENT);
         vmstate_loading = true;
-        if (qemu_loadvm_state(fb) < 0) {
-            error_report("COLO: loadvm failed");
+        colo_flush_ram_cache();
+        ret = qemu_load_device_state(fb);
+        if (ret < 0) {
+            error_report("COLO: load device state failed\n");
             vmstate_loading = false;
             qemu_mutex_unlock_iothread();
             goto out;
diff --git a/migration/ram.c b/migration/ram.c
index 8ff7f7c..45d9332 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2458,7 +2458,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
      * be atomic
      */
     bool postcopy_running = postcopy_state_get() >= POSTCOPY_INCOMING_LISTENING;
-    bool need_flush = false;
 
     seq_iter++;
 
@@ -2493,7 +2492,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
             /* After going into COLO, we should load the Page into colo_cache */
             if (ram_cache_enable) {
                 host = colo_cache_from_block_offset(block, addr);
-                need_flush = true;
             } else {
                 host = host_from_ram_block_offset(block, addr);
             }
@@ -2588,9 +2586,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 
     rcu_read_unlock();
 
-    if (!ret  && ram_cache_enable && need_flush) {
-        colo_flush_ram_cache();
-    }
     DPRINTF("Completed load of VM with exit code %d seq iteration "
             "%" PRIu64 "\n", ret, seq_iter);
     return ret;
diff --git a/migration/savevm.c b/migration/savevm.c
index c7c26d8..94c0d10 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -50,6 +50,7 @@
 #include "qemu/iov.h"
 #include "block/snapshot.h"
 #include "block/qapi.h"
+#include "migration/colo.h"
 
 
 #ifndef ETH_P_RARP
@@ -923,6 +924,10 @@ void qemu_savevm_state_begin(QEMUFile *f,
             break;
         }
     }
+    if (migration_in_colo_state()) {
+        qemu_put_byte(f, QEMU_VM_EOF);
+        qemu_fflush(f);
+    }
 }
 
 /*
@@ -1192,13 +1197,44 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
     return ret;
 }
 
-static int qemu_save_device_state(QEMUFile *f)
+int qemu_save_ram_precopy(QEMUFile *f)
 {
     SaveStateEntry *se;
+    int ret = 0;
 
-    qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
-    qemu_put_be32(f, QEMU_VM_FILE_VERSION);
+    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+        if (!se->ops || !se->ops->save_live_complete_precopy) {
+            continue;
+        }
+        if (se->ops && se->ops->is_active) {
+            if (!se->ops->is_active(se->opaque)) {
+                continue;
+            }
+        }
+        trace_savevm_section_start(se->idstr, se->section_id);
+
+        save_section_header(f, se, QEMU_VM_SECTION_END);
 
+        ret = se->ops->save_live_complete_precopy(f, se->opaque);
+        trace_savevm_section_end(se->idstr, se->section_id, ret);
+        save_section_footer(f, se);
+        if (ret < 0) {
+            qemu_file_set_error(f, ret);
+            return ret;
+        }
+    }
+    qemu_put_byte(f, QEMU_VM_EOF);
+
+    return 0;
+}
+
+int qemu_save_device_state(QEMUFile *f)
+{
+    SaveStateEntry *se;
+
+    if (!migration_in_colo_state()) {
+        qemu_savevm_state_header(f);
+    }
     cpu_synchronize_all_states();
 
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
@@ -1938,6 +1974,96 @@ int qemu_loadvm_state(QEMUFile *f)
     return ret;
 }
 
+int qemu_loadvm_state_begin(QEMUFile *f)
+{
+    uint8_t section_type;
+    int ret = -1;
+    MigrationIncomingState *mis = migration_incoming_get_current();
+
+    if (!mis) {
+        error_report("qemu_loadvm_state_begin");
+        return -EINVAL;
+    }
+    /* CleanUp */
+    loadvm_free_handlers(mis);
+
+    if (qemu_savevm_state_blocked(NULL)) {
+        return -EINVAL;
+    }
+
+    while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
+        if (section_type != QEMU_VM_SECTION_START) {
+            error_report("QEMU_VM_SECTION_START");
+            ret = -EINVAL;
+            goto out;
+        }
+        ret = qemu_loadvm_section_start_full(f, mis);
+        if (ret < 0) {
+            goto out;
+        }
+    }
+    ret = qemu_file_get_error(f);
+    if (ret == 0) {
+        return 0;
+     }
+out:
+    return ret;
+}
+
+int qemu_load_ram_state(QEMUFile *f)
+{
+    uint8_t section_type;
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    int ret = -1;
+
+    while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
+        if (section_type != QEMU_VM_SECTION_PART &&
+            section_type != QEMU_VM_SECTION_END) {
+            error_report("load ram state, not get "
+                         "QEMU_VM_SECTION_PART or QEMU_VM_SECTION_END");
+            return -EINVAL;
+        }
+        ret = qemu_loadvm_section_part_end(f, mis);
+        if (ret < 0) {
+            goto out;
+        }
+    }
+    ret = qemu_file_get_error(f);
+    if (ret == 0) {
+        return 0;
+     }
+out:
+    return ret;
+}
+
+int qemu_load_device_state(QEMUFile *f)
+{
+    uint8_t section_type;
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    int ret = -1;
+
+    while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
+        if (section_type != QEMU_VM_SECTION_FULL) {
+            error_report("load device state error: "
+                         "Not get QEMU_VM_SECTION_FULL");
+            return -EINVAL;
+        }
+        ret = qemu_loadvm_section_start_full(f, mis);
+        if (ret < 0) {
+            goto out;
+        }
+    }
+
+    ret = qemu_file_get_error(f);
+
+    cpu_synchronize_all_post_init();
+    if (ret == 0) {
+        return 0;
+    }
+out:
+    return ret;
+}
+
 void hmp_savevm(Monitor *mon, const QDict *qdict)
 {
     BlockDriverState *bs, *bs1;
-- 
1.8.3.1

* [Qemu-devel] [PATCH COLO-Frame v12 32/38] COLO: Split qemu_savevm_state_begin out of checkpoint process
  2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (30 preceding siblings ...)
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 31/38] COLO: Separate the process of saving/loading ram and device state zhanghailiang
@ 2015-12-15  8:22 ` zhanghailiang
  2015-12-18 12:01   ` Dr. David Alan Gilbert
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 33/38] net/filter-buffer: Add default filter-buffer for each netdev zhanghailiang
                   ` (6 subsequent siblings)
  38 siblings, 1 reply; 94+ messages in thread
From: zhanghailiang @ 2015-12-15  8:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

It is unnecessary to call qemu_savevm_state_begin() in every checkpoint.
It mainly sets up devices and does the first device state pass, and this data
does not change during later checkpoints. So split it out of
colo_do_checkpoint_transaction(); this avoids transferring the same data
again in each later checkpoint.
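
The control-flow change amounts to hoisting one-time setup out of the
per-checkpoint path. A minimal sketch, assuming hypothetical sketch_* names
and counters in place of the real save routines:

```c
/* Hypothetical session state: counts how often each phase ran. */
struct sketch_session {
    int begin_calls;
    int checkpoints;
};

/* One-time setup, standing in for the "state begin" pass. */
static int sketch_prepare_before_save(struct sketch_session *s)
{
    s->begin_calls++;
    return 0;
}

/* Per-checkpoint work, standing in for one checkpoint transaction. */
static int sketch_do_checkpoint(struct sketch_session *s)
{
    s->checkpoints++;
    return 0;
}

/* Setup runs once before the loop; each round only does checkpoint work. */
static int sketch_process_checkpoints(struct sketch_session *s, int rounds)
{
    int i;
    int ret;

    ret = sketch_prepare_before_save(s);
    if (ret < 0) {
        return ret;
    }
    for (i = 0; i < rounds; i++) {
        ret = sketch_do_checkpoint(s);
        if (ret < 0) {
            return ret;
        }
    }
    return 0;
}
```

The receiving side has to make the matching change, otherwise it would wait
for setup data that is no longer sent with each checkpoint.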

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 migration/colo.c | 51 +++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 37 insertions(+), 14 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index d253d64..4571359 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -276,15 +276,6 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
     if (ret < 0) {
         goto out;
     }
-    /* Disable block migration */
-    s->params.blk = 0;
-    s->params.shared = 0;
-    qemu_savevm_state_begin(s->to_dst_file, &s->params);
-    ret = qemu_file_get_error(s->to_dst_file);
-    if (ret < 0) {
-        error_report("save vm state begin error\n");
-        goto out;
-    }
 
     qemu_mutex_lock_iothread();
     /* Note: device state is saved into buffer */
@@ -348,6 +339,21 @@ out:
     return ret;
 }
 
+static int colo_prepare_before_save(MigrationState *s)
+{
+    int ret;
+    /* Disable block migration */
+    s->params.blk = 0;
+    s->params.shared = 0;
+    qemu_savevm_state_begin(s->to_dst_file, &s->params);
+    ret = qemu_file_get_error(s->to_dst_file);
+    if (ret < 0) {
+        error_report("save vm state begin error\n");
+        return ret;
+    }
+    return 0;
+}
+
 static void colo_process_checkpoint(MigrationState *s)
 {
     QEMUSizedBuffer *buffer = NULL;
@@ -363,6 +369,11 @@ static void colo_process_checkpoint(MigrationState *s)
         goto out;
     }
 
+    ret = colo_prepare_before_save(s);
+    if (ret < 0) {
+        goto out;
+    }
+
     /*
      * Wait for Secondary finish loading vm states and enter COLO
      * restore.
@@ -484,6 +495,18 @@ static int colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request)
     }
 }
 
+static int colo_prepare_before_load(QEMUFile *f)
+{
+    int ret;
+
+    ret = qemu_loadvm_state_begin(f);
+    if (ret < 0) {
+        error_report("load vm state begin error, ret=%d", ret);
+        return ret;
+    }
+    return 0;
+}
+
 void *colo_process_incoming_thread(void *opaque)
 {
     MigrationIncomingState *mis = opaque;
@@ -522,6 +545,11 @@ void *colo_process_incoming_thread(void *opaque)
         goto out;
     }
 
+    ret = colo_prepare_before_load(mis->from_src_file);
+    if (ret < 0) {
+        goto out;
+    }
+
     ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_CHECKPOINT_READY);
     if (ret < 0) {
         goto out;
@@ -556,11 +584,6 @@ void *colo_process_incoming_thread(void *opaque)
             goto out;
         }
 
-        ret = qemu_loadvm_state_begin(mis->from_src_file);
-        if (ret < 0) {
-            error_report("load vm state begin error, ret=%d", ret);
-            goto out;
-        }
         ret = qemu_load_ram_state(mis->from_src_file);
         if (ret < 0) {
             error_report("load ram state error");
-- 
1.8.3.1

* [Qemu-devel] [PATCH COLO-Frame v12 33/38] net/filter-buffer: Add default filter-buffer for each netdev
  2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (31 preceding siblings ...)
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 32/38] COLO: Split qemu_savevm_state_begin out of checkpoint process zhanghailiang
@ 2015-12-15  8:22 ` zhanghailiang
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 34/38] filter-buffer: Accept zero interval zhanghailiang
                   ` (5 subsequent siblings)
  38 siblings, 0 replies; 94+ messages in thread
From: zhanghailiang @ 2015-12-15  8:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, Jason Wang, yunhong.jiang, eddie.dong,
	peter.huangpeng, dgilbert, zhanghailiang, arei.gonglei, stefanha,
	amit.shah, hongyang.yang

Add a default filter-buffer to each netdev (except vhost-net), which COLO
or Micro-checkpointing will use to buffer the VM's packets.
The name of the default filter-buffer is 'nop'.
The default filter-buffer is disabled by default and does not buffer any
packets, so it has no side effect on the netdev.
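
Why a disabled default filter has no side effect can be sketched with a toy
filter chain. The sketch_* names are hypothetical and the "filter" only
counts the packets it sees:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical filter: an on/off switch plus a count of packets seen. */
struct sketch_filter {
    bool enabled;
    int seen;
};

/* A filter is skipped whenever it is not enabled. */
static bool sketch_need_skip(const struct sketch_filter *nf)
{
    return !nf->enabled;
}

/*
 * Pass one packet down the chain. Disabled filters are never invoked.
 * Returns how many filters actually handled the packet.
 */
static int sketch_filter_receive(struct sketch_filter *filters, size_t n)
{
    size_t i;
    int visited = 0;

    for (i = 0; i < n; i++) {
        if (sketch_need_skip(&filters[i])) {
            continue;
        }
        filters[i].seen++;
        visited++;
    }
    return visited;
}
```

Because the check happens in the chain walk itself, a disabled filter costs
one branch per packet and never touches the packet.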

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Yang Hongyang <hongyang.yang@easystack.cn>
---
v12:
- Skip vhost-net when add default filter
- Don't go through filter layer if the filter is disabled.
v11:
- New patch

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 include/net/filter.h | 10 +++++++
 net/filter-buffer.c  | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 net/filter.c         |  6 +++-
 net/net.c            | 12 ++++++++
 4 files changed, 109 insertions(+), 1 deletion(-)

diff --git a/include/net/filter.h b/include/net/filter.h
index 2deda36..40aa38c 100644
--- a/include/net/filter.h
+++ b/include/net/filter.h
@@ -56,6 +56,8 @@ struct NetFilterState {
     NetClientState *netdev;
     NetFilterDirection direction;
     char info_str[256];
+    bool is_default;
+    bool enabled;
     QTAILQ_ENTRY(NetFilterState) next;
 };
 
@@ -74,4 +76,12 @@ ssize_t qemu_netfilter_pass_to_next(NetClientState *sender,
                                     int iovcnt,
                                     void *opaque);
 
+static inline bool qemu_need_skip_netfilter(NetFilterState *nf)
+{
+    return !nf->enabled;
+}
+
+void netdev_add_default_filter_buffer(const char *netdev_id,
+                                      NetFilterDirection direction,
+                                      Error **errp);
 #endif /* QEMU_NET_FILTER_H */
diff --git a/net/filter-buffer.c b/net/filter-buffer.c
index 57be149..9cf3544 100644
--- a/net/filter-buffer.c
+++ b/net/filter-buffer.c
@@ -14,6 +14,13 @@
 #include "qapi/qmp/qerror.h"
 #include "qapi-visit.h"
 #include "qom/object.h"
+#include "net/net.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/qmp-output-visitor.h"
+#include "qapi/qmp-input-visitor.h"
+#include "monitor/monitor.h"
+#include "qmp-commands.h"
+#include "net/vhost_net.h"
 
 #define TYPE_FILTER_BUFFER "filter-buffer"
 
@@ -102,6 +109,7 @@ static void filter_buffer_cleanup(NetFilterState *nf)
 static void filter_buffer_setup(NetFilterState *nf, Error **errp)
 {
     FilterBufferState *s = FILTER_BUFFER(nf);
+    char *path = object_get_canonical_path_component(OBJECT(nf));
 
     /*
      * We may want to accept zero interval when VM FT solutions like MC
@@ -114,6 +122,14 @@ static void filter_buffer_setup(NetFilterState *nf, Error **errp)
     }
 
     s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
+    nf->is_default = !strcmp(path, "nop");
+    /*
+    * For the default buffer filter, it will be disabled by default,
+    * So it will not buffer any packets.
+    */
+    if (nf->is_default) {
+        nf->enabled = false;
+    }
     if (s->interval) {
         timer_init_us(&s->release_timer, QEMU_CLOCK_VIRTUAL,
                       filter_buffer_release_timer, nf);
@@ -163,6 +179,72 @@ out:
     error_propagate(errp, local_err);
 }
 
+/*
+* This will be used by COLO or MC FT, for which they will need
+* to buffer the packets of VM's net devices, Here we add a default
+* buffer filter for each netdev. The name of default buffer filter is
+* 'nop'
+*/
+void netdev_add_default_filter_buffer(const char *netdev_id,
+                                      NetFilterDirection direction,
+                                      Error **errp)
+{
+    QmpOutputVisitor *qov;
+    QmpInputVisitor *qiv;
+    Visitor *ov, *iv;
+    QObject *obj = NULL;
+    QDict *qdict;
+    void *dummy = NULL;
+    const char *id = "nop";
+    char *queue = g_strdup(NetFilterDirection_lookup[direction]);
+    NetClientState *nc = qemu_find_netdev(netdev_id);
+    Error *err = NULL;
+
+    /* FIXME: Not support multiple queues */
+    if (!nc || nc->queue_index > 1) {
+        g_free(queue);
+        return;
+    }
+    /* Not support vhost-net */
+    if (get_vhost_net(nc)) {
+        g_free(queue);
+        return;
+    }
+    qov = qmp_output_visitor_new();
+    ov = qmp_output_get_visitor(qov);
+    visit_start_struct(ov,  &dummy, NULL, NULL, 0, &err);
+    if (err) {
+        goto out;
+    }
+    visit_type_str(ov, &nc->name, "netdev", &err);
+    if (err) {
+        goto out;
+    }
+    visit_type_str(ov, &queue, "queue", &err);
+    if (err) {
+        goto out;
+    }
+    visit_end_struct(ov, &err);
+    if (err) {
+        goto out;
+    }
+    obj = qmp_output_get_qobject(qov);
+    g_assert(obj != NULL);
+    qdict = qobject_to_qdict(obj);
+    qmp_output_visitor_cleanup(qov);
+
+    qiv = qmp_input_visitor_new(obj);
+    iv = qmp_input_get_visitor(qiv);
+    object_add(TYPE_FILTER_BUFFER, id, qdict, iv, &err);
+    qmp_input_visitor_cleanup(qiv);
+    qobject_decref(obj);
+out:
+    g_free(queue);
+    if (err) {
+        error_propagate(errp, err);
+    }
+}
+
 static void filter_buffer_init(Object *obj)
 {
     object_property_add(obj, "interval", "int",
diff --git a/net/filter.c b/net/filter.c
index 1365bad..0b1e408 100644
--- a/net/filter.c
+++ b/net/filter.c
@@ -163,7 +163,8 @@ static void netfilter_complete(UserCreatable *uc, Error **errp)
     }
 
     nf->netdev = ncs[0];
-
+    nf->is_default = false;
+    nf->enabled = true;
     if (nfc->setup) {
         nfc->setup(nf, &local_err);
         if (local_err) {
@@ -190,6 +191,9 @@ static void netfilter_complete(UserCreatable *uc, Error **errp)
         g_free(info);
     }
     object_property_iter_free(iter);
+    info = g_strdup_printf(",status=%s", nf->enabled ? "on" : "off");
+    g_strlcat(nf->info_str, info, sizeof(nf->info_str));
+    g_free(info);
 }
 
 static void netfilter_finalize(Object *obj)
diff --git a/net/net.c b/net/net.c
index ade6051..d04d872 100644
--- a/net/net.c
+++ b/net/net.c
@@ -581,6 +581,10 @@ static ssize_t filter_receive_iov(NetClientState *nc,
     NetFilterState *nf = NULL;
 
     QTAILQ_FOREACH(nf, &nc->filters, next) {
+        /* Don't go through filter if it is off */
+        if (qemu_need_skip_netfilter(nf)) {
+            continue;
+        }
         ret = qemu_netfilter_receive(nf, direction, sender, flags, iov,
                                      iovcnt, sent_cb);
         if (ret) {
@@ -1028,6 +1032,14 @@ static int net_client_init1(const void *object, int is_netdev, Error **errp)
         }
         return -1;
     }
+
+    if (is_netdev) {
+        const Netdev *netdev = object;
+
+        netdev_add_default_filter_buffer(netdev->id,
+                                         NET_FILTER_DIRECTION_RX,
+                                         errp);
+    }
     return 0;
 }
 
-- 
1.8.3.1

* [Qemu-devel] [PATCH COLO-Frame v12 34/38] filter-buffer: Accept zero interval
  2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (32 preceding siblings ...)
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 33/38] net/filter-buffer: Add default filter-buffer for each netdev zhanghailiang
@ 2015-12-15  8:22 ` zhanghailiang
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 35/38] filter-buffer: Introduce a helper function to enable/disable default filter zhanghailiang
                   ` (4 subsequent siblings)
  38 siblings, 0 replies; 94+ messages in thread
From: zhanghailiang @ 2015-12-15  8:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, Jason Wang, yunhong.jiang, eddie.dong,
	peter.huangpeng, dgilbert, zhanghailiang, arei.gonglei, stefanha,
	amit.shah, hongyang.yang

The 'interval' value of the default buffer filter is zero,
so a zero interval must be accepted.
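
One way to read the intended semantics, as a sketch with hypothetical
sketch_* names: a zero interval is valid and simply means no periodic release
timer is armed, so packets are released only on demand.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical buffer-filter state for the sketch. */
struct sketch_buffer_filter {
    uint32_t interval_us;
    bool timer_armed;
};

/*
 * Setup accepts any interval. A periodic release timer is armed only when
 * the interval is non-zero; zero means "release on demand".
 */
static int sketch_filter_setup(struct sketch_buffer_filter *f,
                               uint32_t interval_us)
{
    f->interval_us = interval_us;
    f->timer_armed = (interval_us != 0);
    return 0;
}
```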

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Reviewed-by: Yang Hongyang <hongyang.yang@easystack.cn>
Cc: Jason Wang <jasowang@redhat.com>
---
v12:
- Add Reviewed-by tag
v11:
- Add comment
v10:
- new patch

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 net/filter-buffer.c | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/net/filter-buffer.c b/net/filter-buffer.c
index 9cf3544..8abac94 100644
--- a/net/filter-buffer.c
+++ b/net/filter-buffer.c
@@ -111,16 +111,6 @@ static void filter_buffer_setup(NetFilterState *nf, Error **errp)
     FilterBufferState *s = FILTER_BUFFER(nf);
     char *path = object_get_canonical_path_component(OBJECT(nf));
 
-    /*
-     * We may want to accept zero interval when VM FT solutions like MC
-     * or COLO use this filter to release packets on demand.
-     */
-    if (!s->interval) {
-        error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "interval",
-                   "a non-zero interval");
-        return;
-    }
-
     s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
     nf->is_default = !strcmp(path, "nop");
     /*
-- 
1.8.3.1
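
To illustrate why a zero interval is worth accepting, here is a minimal
standalone sketch (not QEMU code). It assumes a non-zero interval arms a
periodic flush and that zero means packets are held until a caller releases
them on demand. The names demo_filter and demo_filter_setup are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a buffer filter; not QEMU's FilterBufferState. */
struct demo_filter {
    uint32_t interval_us;  /* 0 means "no periodic release" */
    bool timer_armed;      /* true when a periodic flush would be scheduled */
};

/* Accept any interval; only arm the periodic flush when it is non-zero. */
static void demo_filter_setup(struct demo_filter *f, uint32_t interval_us)
{
    f->interval_us = interval_us;
    f->timer_armed = interval_us != 0;
}
```

With interval 0 the filter stays passive, which is the mode an FT solution
such as COLO or MC would presumably rely on.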

* [Qemu-devel] [PATCH COLO-Frame v12 35/38] filter-buffer: Introduce a helper function to enable/disable default filter
  2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (33 preceding siblings ...)
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 34/38] filter-buffer: Accept zero interval zhanghailiang
@ 2015-12-15  8:22 ` zhanghailiang
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 36/38] filter-buffer: Introduce a helper function to release packets zhanghailiang
                   ` (3 subsequent siblings)
  38 siblings, 0 replies; 94+ messages in thread
From: zhanghailiang @ 2015-12-15  8:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, Jason Wang, yunhong.jiang, eddie.dong,
	peter.huangpeng, dgilbert, zhanghailiang, arei.gonglei, stefanha,
	amit.shah, hongyang.yang

The default buffer filter does not buffer packets by default,
but we need to buffer packets for COLO or Micro-checkpoint.
Here we add a helper function to enable/disable the filter's buffering
capability.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Yang Hongyang <hongyang.yang@easystack.cn>
---
v12:
- Rename the helper function to qemu_set_default_filters_status()
v11:
- New patch

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 include/net/filter.h |  1 +
 include/net/net.h    |  4 ++++
 net/filter-buffer.c  | 19 +++++++++++++++++++
 net/net.c            | 29 +++++++++++++++++++++++++++++
 4 files changed, 53 insertions(+)

diff --git a/include/net/filter.h b/include/net/filter.h
index 40aa38c..08aa604 100644
--- a/include/net/filter.h
+++ b/include/net/filter.h
@@ -84,4 +84,5 @@ static inline bool qemu_need_skip_netfilter(NetFilterState *nf)
 void netdev_add_default_filter_buffer(const char *netdev_id,
                                       NetFilterDirection direction,
                                       Error **errp);
+void qemu_set_default_filters_status(bool enable);
 #endif /* QEMU_NET_FILTER_H */
diff --git a/include/net/net.h b/include/net/net.h
index 7af3e15..5c65c45 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -125,6 +125,10 @@ NetClientState *qemu_find_vlan_client_by_name(Monitor *mon, int vlan_id,
                                               const char *client_str);
 typedef void (*qemu_nic_foreach)(NICState *nic, void *opaque);
 void qemu_foreach_nic(qemu_nic_foreach func, void *opaque);
+typedef void (*qemu_netfilter_foreach)(NetFilterState *nf, void *opaque,
+                                       Error **errp);
+void qemu_foreach_netfilter(qemu_netfilter_foreach func, void *opaque,
+                            Error **errp);
 int qemu_can_send_packet(NetClientState *nc);
 ssize_t qemu_sendv_packet(NetClientState *nc, const struct iovec *iov,
                           int iovcnt);
diff --git a/net/filter-buffer.c b/net/filter-buffer.c
index 8abac94..90a50cc 100644
--- a/net/filter-buffer.c
+++ b/net/filter-buffer.c
@@ -169,6 +169,25 @@ out:
     error_propagate(errp, local_err);
 }
 
+static void set_default_filter_status(NetFilterState *nf,
+                                      void *opaque,
+                                      Error **errp)
+{
+    if (!strcmp(object_get_typename(OBJECT(nf)), TYPE_FILTER_BUFFER)) {
+        bool *status = opaque;
+
+        if (nf->is_default) {
+            nf->enabled = *status;
+        }
+    }
+}
+
+void qemu_set_default_filters_status(bool enable)
+{
+    qemu_foreach_netfilter(set_default_filter_status,
+                           &enable, NULL);
+}
+
 /*
 * This will be used by COLO or MC FT, for which they will need
 * to buffer the packets of VM's net devices, Here we add a default
diff --git a/net/net.c b/net/net.c
index d04d872..75b828e 100644
--- a/net/net.c
+++ b/net/net.c
@@ -259,6 +259,35 @@ static char *assign_name(NetClientState *nc1, const char *model)
     return g_strdup_printf("%s.%d", model, id);
 }
 
+void qemu_foreach_netfilter(qemu_netfilter_foreach func, void *opaque,
+                            Error **errp)
+{
+    NetClientState *nc;
+    NetFilterState *nf;
+
+    QTAILQ_FOREACH(nc, &net_clients, next) {
+        if (nc->info->type == NET_CLIENT_OPTIONS_KIND_NIC) {
+            continue;
+        }
+        /* FIXME: Not support multiqueue */
+        if (nc->queue_index > 1) {
+            error_setg(errp, "%s: multiqueue is not supported", __func__);
+            return;
+        }
+        QTAILQ_FOREACH(nf, &nc->filters, next) {
+            if (func) {
+                Error *local_err = NULL;
+
+                func(nf, opaque, &local_err);
+                if (local_err) {
+                    error_propagate(errp, local_err);
+                    return;
+                }
+            }
+        }
+    }
+}
+
 static void qemu_net_client_destructor(NetClientState *nc)
 {
     g_free(nc);
-- 
1.8.3.1
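
The iterate-with-callback pattern used by this patch can be sketched in
isolation as follows. This is a simplified model, not the QEMU
implementation: it uses a plain singly linked list where QEMU uses QTAILQ,
drops the Error ** plumbing, and every demo_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified filter node. */
struct demo_nf {
    bool is_default;
    bool enabled;
    struct demo_nf *next;
};

typedef void (*demo_nf_fn)(struct demo_nf *nf, void *opaque);

/* Call func on every filter in the list. */
static void demo_foreach_nf(struct demo_nf *head, demo_nf_fn func,
                            void *opaque)
{
    for (struct demo_nf *nf = head; nf != NULL; nf = nf->next) {
        func(nf, opaque);
    }
}

/* Callback: only default filters follow the requested status. */
static void demo_set_status(struct demo_nf *nf, void *opaque)
{
    const bool *status = opaque;

    if (nf->is_default) {
        nf->enabled = *status;
    }
}

/* Enable or disable every default filter; user-added filters are untouched. */
static void demo_set_default_filters_status(struct demo_nf *head, bool enable)
{
    demo_foreach_nf(head, demo_set_status, &enable);
}
```

Passing the flag through the opaque pointer keeps the iterator generic, so
the same walker can serve other per-filter operations.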

* [Qemu-devel] [PATCH COLO-Frame v12 36/38] filter-buffer: Introduce a helper function to release packets
  2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (34 preceding siblings ...)
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 35/38] filter-buffer: Introduce a helper function to enable/disable default filter zhanghailiang
@ 2015-12-15  8:22 ` zhanghailiang
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 37/38] colo: Use default buffer-filter to buffer and " zhanghailiang
                   ` (2 subsequent siblings)
  38 siblings, 0 replies; 94+ messages in thread
From: zhanghailiang @ 2015-12-15  8:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, Jason Wang, yunhong.jiang, eddie.dong,
	peter.huangpeng, dgilbert, zhanghailiang, arei.gonglei, stefanha,
	amit.shah, hongyang.yang

We need to release all the packets from the VM in COLO or Micro-checkpoint.
Here we add a new helper function to release the packets buffered
by the default buffer-filter.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Yang Hongyang <hongyang.yang@easystack.cn>
---
v12:
- Rename this helper function
v11:
- New patch

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 include/net/filter.h |  1 +
 net/filter-buffer.c  | 18 ++++++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/include/net/filter.h b/include/net/filter.h
index 08aa604..52cb38b 100644
--- a/include/net/filter.h
+++ b/include/net/filter.h
@@ -84,5 +84,6 @@ static inline bool qemu_need_skip_netfilter(NetFilterState *nf)
 void netdev_add_default_filter_buffer(const char *netdev_id,
                                       NetFilterDirection direction,
                                       Error **errp);
+void qemu_release_default_filters_packets(void);
 void qemu_set_default_filters_status(bool enable);
 #endif /* QEMU_NET_FILTER_H */
diff --git a/net/filter-buffer.c b/net/filter-buffer.c
index 90a50cc..d53b251 100644
--- a/net/filter-buffer.c
+++ b/net/filter-buffer.c
@@ -169,6 +169,24 @@ out:
     error_propagate(errp, local_err);
 }
 
+static void release_default_filter_packets(NetFilterState *nf,
+                                           void *opaque,
+                                           Error **errp)
+{
+    if (!strcmp(object_get_typename(OBJECT(nf)), TYPE_FILTER_BUFFER)) {
+
+        if (nf->is_default) {
+            filter_buffer_flush(nf);
+        }
+    }
+}
+
+/* public APIs */
+void qemu_release_default_filters_packets(void)
+{
+    qemu_foreach_netfilter(release_default_filter_packets, NULL, NULL);
+}
+
 static void set_default_filter_status(NetFilterState *nf,
                                       void *opaque,
                                       Error **errp)
-- 
1.8.3.1
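
What "release" means for a buffering filter can be shown with a small
counter-based model. This is only a sketch under the assumption that an
enabled filter queues every packet and that a flush hands all queued packets
on in one go. The demo_* names are hypothetical and no real packet data is
involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a buffering filter that only counts packets. */
struct demo_buf_filter {
    bool enabled;      /* buffering switched on */
    size_t queued;     /* packets currently held back */
    size_t delivered;  /* packets passed on to the next stage */
};

/* A packet arrives: hold it if buffering is on, otherwise pass it through. */
static void demo_receive(struct demo_buf_filter *f)
{
    if (f->enabled) {
        f->queued++;
    } else {
        f->delivered++;
    }
}

/* Release everything that was held; returns the number of packets released. */
static size_t demo_flush(struct demo_buf_filter *f)
{
    size_t n = f->queued;

    f->delivered += n;
    f->queued = 0;
    return n;
}
```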

* [Qemu-devel] [PATCH COLO-Frame v12 37/38] colo: Use default buffer-filter to buffer and release packets
  2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (35 preceding siblings ...)
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 36/38] filter-buffer: Introduce a helper function to release packets zhanghailiang
@ 2015-12-15  8:22 ` zhanghailiang
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 38/38] COLO: Add block replication into colo process zhanghailiang
  2015-12-15 12:14 ` [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) Dr. David Alan Gilbert
  38 siblings, 0 replies; 94+ messages in thread
From: zhanghailiang @ 2015-12-15  8:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, Jason Wang, yunhong.jiang, eddie.dong,
	peter.huangpeng, dgilbert, zhanghailiang, arei.gonglei, stefanha,
	amit.shah, hongyang.yang

Enable the default filter to buffer packets, and release the
packets after a checkpoint.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Yang Hongyang <hongyang.yang@easystack.cn>
---
v12:
- Add a helper function to check whether all netdevs support buffering packets.
- Flush buffered packets when doing failover.
v11:
- Use new helper functions to buffer and release packets.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 include/net/net.h |  1 +
 migration/colo.c  | 24 +++++++++++++++++++++++-
 net/net.c         | 17 +++++++++++++++++
 3 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/include/net/net.h b/include/net/net.h
index 5c65c45..2eb9451 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -129,6 +129,7 @@ typedef void (*qemu_netfilter_foreach)(NetFilterState *nf, void *opaque,
                                        Error **errp);
 void qemu_foreach_netfilter(qemu_netfilter_foreach func, void *opaque,
                             Error **errp);
+bool qemu_netdev_support_netfilter(void);
 int qemu_can_send_packet(NetClientState *nc);
 ssize_t qemu_sendv_packet(NetClientState *nc, const struct iovec *iov,
                           int iovcnt);
diff --git a/migration/colo.c b/migration/colo.c
index 4571359..b7a7ad6 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -19,6 +19,8 @@
 #include "qemu/sockets.h"
 #include "migration/failover.h"
 #include "qapi-event.h"
+#include "net/filter.h"
+#include "net/net.h"
 
 static bool vmstate_loading;
 
@@ -128,6 +130,10 @@ static void primary_vm_do_failover(void)
                      old_state);
         return;
     }
+    /* Don't buffer any packets while exited COLO */
+    qemu_set_default_filters_status(false);
+    /* Flush the residuary buffered packts */
+    qemu_release_default_filters_packets();
 }
 
 void colo_do_failover(MigrationState *s)
@@ -315,6 +321,8 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
         goto out;
     }
 
+    qemu_release_default_filters_packets();
+
     if (colo_shutdown) {
         colo_put_cmd(s->to_dst_file, COLO_COMMAND_GUEST_SHUTDOWN);
         qemu_fflush(s->to_dst_file);
@@ -354,6 +362,17 @@ static int colo_prepare_before_save(MigrationState *s)
     return 0;
 }
 
+static int colo_init_buffer_filters(void)
+{
+    if (!qemu_netdev_support_netfilter()) {
+        return -EPERM;
+    }
+    /* Begin to buffer packets that sent by VM */
+    qemu_set_default_filters_status(true);
+
+    return 0;
+}
+
 static void colo_process_checkpoint(MigrationState *s)
 {
     QEMUSizedBuffer *buffer = NULL;
@@ -361,7 +380,10 @@ static void colo_process_checkpoint(MigrationState *s)
     int ret = 0;
 
     failover_init_state();
-
+    ret = colo_init_buffer_filters();
+    if (ret < 0) {
+        goto out;
+    }
     s->rp_state.from_dst_file = qemu_file_get_return_path(s->to_dst_file);
     if (!s->rp_state.from_dst_file) {
         ret = -EINVAL;
diff --git a/net/net.c b/net/net.c
index 75b828e..96d97ce 100644
--- a/net/net.c
+++ b/net/net.c
@@ -288,6 +288,23 @@ void qemu_foreach_netfilter(qemu_netfilter_foreach func, void *opaque,
     }
 }
 
+bool qemu_netdev_support_netfilter(void)
+{
+    NetClientState *nc;
+
+    QTAILQ_FOREACH(nc, &net_clients, next) {
+        if (nc->info->type == NET_CLIENT_OPTIONS_KIND_NIC) {
+            continue;
+        }
+        if (QTAILQ_EMPTY(&nc->filters)) {
+            error_report("netdev (%s) does not support filter", nc->name);
+            return false;
+        }
+    }
+
+    return true;
+}
+
 static void qemu_net_client_destructor(NetClientState *nc)
 {
     g_free(nc);
-- 
1.8.3.1
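
The ordering this patch sets up on the primary side (buffer from COLO start,
release after each checkpoint, stop buffering and flush on failover) can be
modelled as below. It is a sketch with hypothetical demo_* names, not the
code in migration/colo.c, and it counts packets instead of carrying them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the primary's network output under COLO. */
struct demo_colo_net {
    bool buffering;   /* default filters enabled */
    unsigned held;    /* packets waiting for the next checkpoint */
    unsigned sent;    /* packets released to the outside world */
};

static void demo_release(struct demo_colo_net *n)
{
    n->sent += n->held;
    n->held = 0;
}

/* Entering COLO: start holding guest output. */
static void demo_colo_enter(struct demo_colo_net *n)
{
    n->buffering = true;
}

/* The guest transmits one packet. */
static void demo_guest_tx(struct demo_colo_net *n)
{
    if (n->buffering) {
        n->held++;
    } else {
        n->sent++;
    }
}

/* Checkpoint finished: output produced before it may now leave. */
static void demo_checkpoint_done(struct demo_colo_net *n)
{
    demo_release(n);
}

/* Failover: stop buffering first, then flush what is left. */
static void demo_failover(struct demo_colo_net *n)
{
    n->buffering = false;
    demo_release(n);
}
```

Disabling before flushing mirrors the order in primary_vm_do_failover(), so
nothing new is queued behind the final flush.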

* [Qemu-devel] [PATCH COLO-Frame v12 38/38] COLO: Add block replication into colo process
  2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (36 preceding siblings ...)
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 37/38] colo: Use default buffer-filter to buffer and " zhanghailiang
@ 2015-12-15  8:22 ` zhanghailiang
  2015-12-15 12:14 ` [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) Dr. David Alan Gilbert
  38 siblings, 0 replies; 94+ messages in thread
From: zhanghailiang @ 2015-12-15  8:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

Make sure the master starts block replication after the slave's block replication has started.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 migration/colo.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 trace-events     |  2 ++
 2 files changed, 62 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index b7a7ad6..d748fb5 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -21,6 +21,7 @@
 #include "qapi-event.h"
 #include "net/filter.h"
 #include "net/net.h"
+#include "block/block_int.h"
 
 static bool vmstate_loading;
 
@@ -55,6 +56,7 @@ static void secondary_vm_do_failover(void)
 {
     int old_state;
     MigrationIncomingState *mis = migration_incoming_get_current();
+    Error *local_err = NULL;
 
     /* Can not do failover during the process of VM's loading VMstate, Or
       * it will break the secondary VM.
@@ -72,6 +74,12 @@ static void secondary_vm_do_failover(void)
     migrate_set_state(&mis->state, MIGRATION_STATUS_COLO,
                       MIGRATION_STATUS_COMPLETED);
 
+    bdrv_stop_replication_all(true, &local_err);
+    if (local_err) {
+        error_report_err(local_err);
+    }
+    trace_colo_stop_block_replication("failover");
+
     if (!autostart) {
         error_report("\"-S\" qemu option will be ignored in secondary side");
         /* recover runstate to normal migration finish state */
@@ -107,6 +115,7 @@ static void primary_vm_do_failover(void)
 {
     MigrationState *s = migrate_get_current();
     int old_state;
+    Error *local_err = NULL;
 
     migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
                       MIGRATION_STATUS_COMPLETED);
@@ -134,6 +143,12 @@ static void primary_vm_do_failover(void)
     qemu_set_default_filters_status(false);
     /* Flush the residuary buffered packts */
     qemu_release_default_filters_packets();
+
+    bdrv_stop_replication_all(true, &local_err);
+    if (local_err) {
+        error_report_err(local_err);
+    }
+    trace_colo_stop_block_replication("failover");
 }
 
 void colo_do_failover(MigrationState *s)
@@ -240,6 +255,7 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
     int colo_shutdown;
     size_t size;
     QEMUFile *trans = NULL;
+    Error *local_err = NULL;
 
     ret = colo_put_cmd(s->to_dst_file, COLO_COMMAND_CHECKPOINT_REQUEST);
     if (ret < 0) {
@@ -278,6 +294,16 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
         goto out;
     }
 
+    /* we call this api although this may do nothing on primary side */
+    qemu_mutex_lock_iothread();
+    bdrv_do_checkpoint_all(&local_err);
+    qemu_mutex_unlock_iothread();
+    if (local_err) {
+        error_report_err(local_err);
+        ret = -1;
+        goto out;
+    }
+
     ret = colo_put_cmd(s->to_dst_file, COLO_COMMAND_VMSTATE_SEND);
     if (ret < 0) {
         goto out;
@@ -324,6 +350,10 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
     qemu_release_default_filters_packets();
 
     if (colo_shutdown) {
+        qemu_mutex_lock_iothread();
+        bdrv_stop_replication_all(false, NULL);
+        trace_colo_stop_block_replication("shutdown");
+        qemu_mutex_unlock_iothread();
         colo_put_cmd(s->to_dst_file, COLO_COMMAND_GUEST_SHUTDOWN);
         qemu_fflush(s->to_dst_file);
         colo_shutdown_requested = 0;
@@ -378,6 +408,7 @@ static void colo_process_checkpoint(MigrationState *s)
     QEMUSizedBuffer *buffer = NULL;
     int64_t current_time, checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
     int ret = 0;
+    Error *local_err = NULL;
 
     failover_init_state();
     ret = colo_init_buffer_filters();
@@ -414,6 +445,15 @@ static void colo_process_checkpoint(MigrationState *s)
     }
 
     qemu_mutex_lock_iothread();
+    /* start block replication */
+    bdrv_start_replication_all(REPLICATION_MODE_PRIMARY, &local_err);
+    if (local_err) {
+        qemu_mutex_unlock_iothread();
+        error_report_err(local_err);
+        ret = -EINVAL;
+        goto out;
+    }
+    trace_colo_start_block_replication();
     vm_start();
     qemu_mutex_unlock_iothread();
     trace_colo_vm_state_change("stop", "run");
@@ -506,6 +546,8 @@ static int colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request)
     case COLO_COMMAND_GUEST_SHUTDOWN:
         qemu_mutex_lock_iothread();
         vm_stop_force_state(RUN_STATE_COLO);
+        bdrv_stop_replication_all(false, NULL);
+        trace_colo_stop_block_replication("shutdown");
         qemu_system_shutdown_request_core();
         qemu_mutex_unlock_iothread();
         /* the main thread will exit and termiante the whole
@@ -537,6 +579,7 @@ void *colo_process_incoming_thread(void *opaque)
     uint64_t  total_size;
     int ret = 0;
     uint64_t value;
+    Error *local_err = NULL;
 
     migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
                       MIGRATION_STATUS_COLO);
@@ -572,6 +615,16 @@ void *colo_process_incoming_thread(void *opaque)
         goto out;
     }
 
+    qemu_mutex_lock_iothread();
+    /* start block replication */
+    bdrv_start_replication_all(REPLICATION_MODE_SECONDARY, &local_err);
+    qemu_mutex_unlock_iothread();
+    if (local_err) {
+        error_report_err(local_err);
+        goto out;
+    }
+    trace_colo_start_block_replication();
+
     ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_CHECKPOINT_READY);
     if (ret < 0) {
         goto out;
@@ -651,6 +704,13 @@ void *colo_process_incoming_thread(void *opaque)
             qemu_mutex_unlock_iothread();
             goto out;
         }
+        /* discard colo disk buffer */
+        bdrv_do_checkpoint_all(&local_err);
+        if (local_err) {
+            vmstate_loading = false;
+            qemu_mutex_unlock_iothread();
+            goto out;
+        }
 
         vmstate_loading = false;
         qemu_mutex_unlock_iothread();
diff --git a/trace-events b/trace-events
index 3992b45..3951689 100644
--- a/trace-events
+++ b/trace-events
@@ -1584,6 +1584,8 @@ colo_vm_state_change(const char *old, const char *new) "Change '%s' => '%s'"
 colo_put_cmd(const char *msg) "Send '%s' cmd"
 colo_get_cmd(const char *msg) "Receive '%s' cmd"
 colo_failover_set_state(int new_state) "new state %d"
+colo_start_block_replication(void) "Block replication is started"
+colo_stop_block_replication(const char *reason) "Block replication is stopped(reason: '%s')"
 
 # kvm-all.c
 kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
-- 
1.8.3.1
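
The start-up ordering named in the commit message can be sketched as a tiny
handshake model. It assumes the primary waits for the secondary's
COLO_COMMAND_CHECKPOINT_READY message before it starts replication; the
hunks above show the secondary sending that command right after
bdrv_start_replication_all(), but the primary's wait is outside this
excerpt. All demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical shared view of both sides, for illustration only. */
struct demo_pair {
    bool secondary_replicating;
    bool ready_sent;            /* stands in for CHECKPOINT_READY */
    bool primary_replicating;
};

/* Secondary: start replication first, then announce readiness. */
static void demo_secondary_start(struct demo_pair *p)
{
    p->secondary_replicating = true;
    p->ready_sent = true;
}

/* Primary: refuse to start until the secondary has announced readiness. */
static int demo_primary_start(struct demo_pair *p)
{
    if (!p->ready_sent) {
        return -1;
    }
    p->primary_replicating = true;
    return 0;
}
```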

* Re: [Qemu-devel] [PATCH COLO-Frame v12 26/38] COLO failover: Shutdown related socket fd when do failover
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 26/38] COLO failover: Shutdown related socket fd when do failover zhanghailiang
@ 2015-12-15  9:44   ` Dr. David Alan Gilbert
  2015-12-15 10:23   ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 94+ messages in thread
From: Dr. David Alan Gilbert @ 2015-12-15  9:44 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> If the net connection between COLO's two sides is broken while colo/colo incoming
> thread is blocked in 'read'/'write' socket fd. It will not detect this error until
> connect timeout. It will be a long time.
> 
> Here we shutdown all the related socket file descriptors to wake up the blocking
> operation in failover BH. Besides, we should close the corresponding file descriptors
> after failvoer BH shutdown them, or there will be an error.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
> v12:
> - Shutdown both QEMUFile's fd though they may use the same fd. (Dave's suggestion)
> v11:
> - Only shutdown fd for once
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> ---
>  migration/colo.c | 42 ++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 40 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index d06c14f..58531e7 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -60,6 +60,18 @@ static void secondary_vm_do_failover(void)
>          /* recover runstate to normal migration finish state */
>          autostart = true;
>      }
> +    /*
> +    * Make sure colo incoming thread not block in recv or send,
> +    * If mis->from_src_file and mis->to_src_file use the same fd,
> +    * The second shutdown() will return -1, we ignore this value,
> +    * it is harmless.
> +    */
> +    if (mis->from_src_file) {
> +        qemu_file_shutdown(mis->from_src_file);
> +    }
> +    if (mis->to_src_file) {
> +        qemu_file_shutdown(mis->to_src_file);
> +    }
>  
>      old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
>                                     FAILOVER_STATUS_COMPLETED);
> @@ -82,6 +94,18 @@ static void primary_vm_do_failover(void)
>      migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
>                        MIGRATION_STATUS_COMPLETED);
>  
> +    /*
> +    * Make sure colo thread no block in recv or send,
> +    * The s->rp_state.from_dst_file and s->to_dst_file may use the
> +    * same fd, but we still shutdown the fd for twice, it is harmless.
> +    */
> +    if (s->to_dst_file) {
> +        qemu_file_shutdown(s->to_dst_file);
> +    }
> +    if (s->rp_state.from_dst_file) {
> +        qemu_file_shutdown(s->rp_state.from_dst_file);
> +    }
> +
>      old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
>                                     FAILOVER_STATUS_COMPLETED);
>      if (old_state != FAILOVER_STATUS_HANDLING) {
> @@ -348,7 +372,7 @@ static void colo_process_checkpoint(MigrationState *s)
>      }
>  
>  out:
> -    if (ret < 0) {
> +    if (ret < 0 || (!ret && !failover_request_is_active())) {
>          error_report("%s: %s", __func__, strerror(-ret));
>          qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_ERROR,
>                                    true, strerror(-ret), NULL);
> @@ -360,6 +384,15 @@ out:
>      qsb_free(buffer);
>      buffer = NULL;
>  
> +    /* Hope this not to be too long to loop here */
> +    while (failover_get_state() != FAILOVER_STATUS_COMPLETED) {
> +        ;
> +    }
> +    /*
> +    * Must be called after failover BH is completed,
> +    * Or the failover BH may shutdown the wrong fd, that
> +    * re-used by other thread after we release here.
> +    */
>      if (s->rp_state.from_dst_file) {
>          qemu_fclose(s->rp_state.from_dst_file);
>      }
> @@ -519,7 +552,7 @@ void *colo_process_incoming_thread(void *opaque)
>      }
>  
>  out:
> -    if (ret < 0) {
> +    if (ret < 0 || (!ret && !failover_request_is_active())) {
>          error_report("colo incoming thread will exit, detect error: %s",
>                       strerror(-ret));
>          qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_ERROR,
> @@ -539,6 +572,11 @@ out:
>      */
>      colo_release_ram_cache();
>  
> +    /* Hope this not to be too long to loop here */
> +    while (failover_get_state() != FAILOVER_STATUS_COMPLETED) {
> +        ;
> +    }
> +    /* Must be called after failover BH is completed */
>      if (mis->to_src_file) {
>          qemu_fclose(mis->to_src_file);
>      }
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
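
The effect the quoted patch relies on can be seen with plain POSIX calls:
once shutdown() has been called on a connected stream socket, read() on it
returns end-of-file instead of blocking. The sketch below only shows that
post-shutdown behaviour on a socketpair; it does not start a second thread
that is already blocked, and the helper name is hypothetical.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Shut down a connected socket and report what a following read() sees.
 * A return value of 0 means end-of-file, i.e. the reader is not left waiting.
 */
static ssize_t demo_read_after_shutdown(void)
{
    int sv[2];
    char byte;
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    shutdown(sv[0], SHUT_RDWR);
    n = read(sv[0], &byte, 1);
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

Note that shutdown() leaves the descriptor open, which is why the patch
still needs a separate close once the failover BH has finished.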

* Re: [Qemu-devel] [PATCH COLO-Frame v12 01/38] configure: Add parameter for configure to enable/disable COLO support
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 01/38] configure: Add parameter for configure to enable/disable COLO support zhanghailiang
@ 2015-12-15  9:46   ` Wen Congyang
  2015-12-15 11:19     ` Hailiang Zhang
  2015-12-15 11:31     ` Hailiang Zhang
  0 siblings, 2 replies; 94+ messages in thread
From: Wen Congyang @ 2015-12-15  9:46 UTC (permalink / raw)
  To: zhanghailiang, qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, arei.gonglei, stefanha, amit.shah, hongyang.yang

On 12/15/2015 04:22 PM, zhanghailiang wrote:
> configure --enable-colo/--disable-colo to switch COLO
> support on/off.
> COLO support is On by default.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
> v11:
> - Turn COLO on in default (Eric's suggestion)
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>

I think you forgot to remove this line.

Thanks
Wen Congyang

> ---
>  configure | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/configure b/configure
> index b9552fd..32e466f 100755
> --- a/configure
> +++ b/configure
> @@ -260,6 +260,7 @@ xfs=""
>  vhost_net="no"
>  vhost_scsi="no"
>  kvm="no"
> +colo="yes"
>  rdma=""
>  gprof="no"
>  debug_tcg="no"
> @@ -939,6 +940,10 @@ for opt do
>    ;;
>    --enable-kvm) kvm="yes"
>    ;;
> +  --disable-colo) colo="no"
> +  ;;
> +  --enable-colo) colo="yes"
> +  ;;
>    --disable-tcg-interpreter) tcg_interpreter="no"
>    ;;
>    --enable-tcg-interpreter) tcg_interpreter="yes"
> @@ -1362,6 +1367,7 @@ disabled with --disable-FEATURE, default is enabled if available:
>    fdt             fdt device tree
>    bluez           bluez stack connectivity
>    kvm             KVM acceleration support
> +  colo            COarse-grain LOck-stepping VM for Non-stop Service
>    rdma            RDMA-based migration support
>    uuid            uuid support
>    vde             support for vde network
> @@ -4792,6 +4798,7 @@ echo "Linux AIO support $linux_aio"
>  echo "ATTR/XATTR support $attr"
>  echo "Install blobs     $blobs"
>  echo "KVM support       $kvm"
> +echo "COLO support      $colo"
>  echo "RDMA support      $rdma"
>  echo "TCG interpreter   $tcg_interpreter"
>  echo "fdt support       $fdt"
> @@ -5381,6 +5388,10 @@ if have_backend "ftrace"; then
>  fi
>  echo "CONFIG_TRACE_FILE=$trace_file" >> $config_host_mak
>  
> +if test "$colo" = "yes"; then
> +  echo "CONFIG_COLO=y" >> $config_host_mak
> +fi
> +
>  if test "$rdma" = "yes" ; then
>    echo "CONFIG_RDMA=y" >> $config_host_mak
>  fi
> 

* Re: [Qemu-devel] [PATCH COLO-Frame v12 27/38] COLO failover: Don't do failover during loading VM's state
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 27/38] COLO failover: Don't do failover during loading VM's state zhanghailiang
@ 2015-12-15 10:21   ` Dr. David Alan Gilbert
  2015-12-25  1:02     ` Hailiang Zhang
  0 siblings, 1 reply; 94+ messages in thread
From: Dr. David Alan Gilbert @ 2015-12-15 10:21 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> We should not do failover work while the main thread is loading
> VM's state, otherwise it will destroy the consistent of VM's memory and
> device state.
> 
> Here we add a new failover status 'RELAUNCH' which means we should
> relaunch the process of failover.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> ---
>  include/migration/failover.h |  2 ++
>  migration/colo.c             | 25 +++++++++++++++++++++++++
>  2 files changed, 27 insertions(+)
> 
> diff --git a/include/migration/failover.h b/include/migration/failover.h
> index fba3931..e115d25 100644
> --- a/include/migration/failover.h
> +++ b/include/migration/failover.h
> @@ -20,6 +20,8 @@ typedef enum COLOFailoverStatus {
>      FAILOVER_STATUS_REQUEST = 1, /* Request but not handled */
>      FAILOVER_STATUS_HANDLING = 2, /* In the process of handling failover */
>      FAILOVER_STATUS_COMPLETED = 3, /* Finish the failover process */
> +    /* Optional, Relaunch the failover process, again 'NONE' -> 'COMPLETED' */
> +    FAILOVER_STATUS_RELAUNCH = 4,
>  } COLOFailoverStatus;
>  
>  void failover_init_state(void);
> diff --git a/migration/colo.c b/migration/colo.c
> index 58531e7..f4bb661 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -20,6 +20,8 @@
>  #include "migration/failover.h"
>  #include "qapi-event.h"
>  
> +static bool vmstate_loading;
> +
>  /* colo buffer */
>  #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
>  
> @@ -52,6 +54,19 @@ static void secondary_vm_do_failover(void)
>      int old_state;
>      MigrationIncomingState *mis = migration_incoming_get_current();
>  
> +    /* Can not do failover during the process of VM's loading VMstate, Or
> +      * it will break the secondary VM.
> +      */
> +    if (vmstate_loading) {
> +        old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
> +                                       FAILOVER_STATUS_RELAUNCH);
> +        if (old_state != FAILOVER_STATUS_HANDLING) {
> +            error_report("Unknow error while do failover for secondary VM,"
> +                         "old_state: %d", old_state);

Typo: 'Unknow' should be 'Unknown'; it would also be good for the message
to say that this happened during vmstate loading.

The state is being loaded from the qemu buffer, not the real file descriptor,
so we're guaranteed that the vmstate will finish loading; so yes, this is OK.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
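
For readers following the state transitions, here is a minimal standalone
model of the deferral described above. It is only a sketch: it assumes
failover_set_state() behaves as a compare-and-swap that returns the state
it saw, and relaunch_if_deferred() is a hypothetical stand-in for the check
the incoming thread makes once loading has finished (the real code calls
failover_request_active() at that point).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Values mirror the enum in the quoted failover.h hunk. */
enum {
    FAILOVER_STATUS_NONE = 0,
    FAILOVER_STATUS_REQUEST = 1,
    FAILOVER_STATUS_HANDLING = 2,
    FAILOVER_STATUS_COMPLETED = 3,
    FAILOVER_STATUS_RELAUNCH = 4,
};

static atomic_int failover_state = FAILOVER_STATUS_NONE;
static bool vmstate_loading;

/* Assumption: compare-and-swap that returns the state seen before the call. */
static int failover_set_state(int old_state, int new_state)
{
    int seen = old_state;

    atomic_compare_exchange_strong(&failover_state, &seen, new_state);
    return seen;
}

static int failover_get_state(void)
{
    return atomic_load(&failover_state);
}

/* Model of the failover BH on the secondary; true if failover completed. */
static bool secondary_do_failover(void)
{
    int next = vmstate_loading ? FAILOVER_STATUS_RELAUNCH
                               : FAILOVER_STATUS_COMPLETED;
    int old_state = failover_set_state(FAILOVER_STATUS_HANDLING, next);

    /* The patch reports an error here; the model simply asserts. */
    assert(old_state == FAILOVER_STATUS_HANDLING);
    return next == FAILOVER_STATUS_COMPLETED;
}

/* Hypothetical model of the check made after the vmstate has been loaded. */
static bool relaunch_if_deferred(void)
{
    if (failover_get_state() != FAILOVER_STATUS_RELAUNCH) {
        return false;
    }
    /* RELAUNCH -> NONE, then walk NONE -> REQUEST -> HANDLING again. */
    failover_set_state(FAILOVER_STATUS_RELAUNCH, FAILOVER_STATUS_NONE);
    failover_set_state(FAILOVER_STATUS_NONE, FAILOVER_STATUS_REQUEST);
    failover_set_state(FAILOVER_STATUS_REQUEST, FAILOVER_STATUS_HANDLING);
    return secondary_do_failover();
}
```

A request that arrives while vmstate_loading is set parks the state in
RELAUNCH; the second pass, made after loading, ends in COMPLETED.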


> +        }
> +        return;
> +    }
> +
>      migrate_set_state(&mis->state, MIGRATION_STATUS_COLO,
>                        MIGRATION_STATUS_COMPLETED);
>  
> @@ -535,13 +550,23 @@ void *colo_process_incoming_thread(void *opaque)
>  
>          qemu_mutex_lock_iothread();
>          qemu_system_reset(VMRESET_SILENT);
> +        vmstate_loading = true;
>          if (qemu_loadvm_state(fb) < 0) {
>              error_report("COLO: loadvm failed");
> +            vmstate_loading = false;
>              qemu_mutex_unlock_iothread();
>              goto out;
>          }
> +
> +        vmstate_loading = false;
>          qemu_mutex_unlock_iothread();
>  
> +        if (failover_get_state() == FAILOVER_STATUS_RELAUNCH) {
> +            failover_set_state(FAILOVER_STATUS_RELAUNCH, FAILOVER_STATUS_NONE);
> +            failover_request_active(NULL);
> +            goto out;
> +        }
> +
>          ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_VMSTATE_LOADED);
>          if (ret < 0) {
>              goto out;
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH COLO-Frame v12 26/38] COLO failover: Shutdown related socket fd when do failover
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 26/38] COLO failover: Shutdown related socket fd when do failover zhanghailiang
  2015-12-15  9:44   ` Dr. David Alan Gilbert
@ 2015-12-15 10:23   ` Dr. David Alan Gilbert
  2015-12-16  5:58     ` Hailiang Zhang
  1 sibling, 1 reply; 94+ messages in thread
From: Dr. David Alan Gilbert @ 2015-12-15 10:23 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> If the net connection between COLO's two sides is broken while colo/colo incoming
> thread is blocked in 'read'/'write' socket fd. It will not detect this error until
> connect timeout. It will be a long time.
> 
> Here we shutdown all the related socket file descriptors to wake up the blocking
> operation in failover BH. Besides, we should close the corresponding file descriptors
> after failvoer BH shutdown them, or there will be an error.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> ---
> v12:
> - Shutdown both QEMUFile's fd though they may use the same fd. (Dave's suggestion)
> v11:
> - Only shutdown fd for once
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> ---
>  migration/colo.c | 42 ++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 40 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index d06c14f..58531e7 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -60,6 +60,18 @@ static void secondary_vm_do_failover(void)
>          /* recover runstate to normal migration finish state */
>          autostart = true;
>      }
> +    /*
> +    * Make sure colo incoming thread not block in recv or send,
> +    * If mis->from_src_file and mis->to_src_file use the same fd,
> +    * The second shutdown() will return -1, we ignore this value,
> +    * it is harmless.
> +    */
> +    if (mis->from_src_file) {
> +        qemu_file_shutdown(mis->from_src_file);
> +    }
> +    if (mis->to_src_file) {
> +        qemu_file_shutdown(mis->to_src_file);
> +    }
>  
>      old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
>                                     FAILOVER_STATUS_COMPLETED);
> @@ -82,6 +94,18 @@ static void primary_vm_do_failover(void)
>      migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
>                        MIGRATION_STATUS_COMPLETED);
>  
> +    /*
> +    * Make sure colo thread no block in recv or send,
> +    * The s->rp_state.from_dst_file and s->to_dst_file may use the
> +    * same fd, but we still shutdown the fd for twice, it is harmless.
> +    */
> +    if (s->to_dst_file) {
> +        qemu_file_shutdown(s->to_dst_file);
> +    }
> +    if (s->rp_state.from_dst_file) {
> +        qemu_file_shutdown(s->rp_state.from_dst_file);
> +    }
> +
>      old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
>                                     FAILOVER_STATUS_COMPLETED);
>      if (old_state != FAILOVER_STATUS_HANDLING) {
> @@ -348,7 +372,7 @@ static void colo_process_checkpoint(MigrationState *s)
>      }
>  
>  out:
> -    if (ret < 0) {
> +    if (ret < 0 || (!ret && !failover_request_is_active())) {
>          error_report("%s: %s", __func__, strerror(-ret));
>          qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_ERROR,
>                                    true, strerror(-ret), NULL);
> @@ -360,6 +384,15 @@ out:
>      qsb_free(buffer);
>      buffer = NULL;
>  
> +    /* Hope this not to be too long to loop here */
> +    while (failover_get_state() != FAILOVER_STATUS_COMPLETED) {
> +        ;
> +    }
> +    /*
> +    * Must be called after failover BH is completed,
> +    * Or the failover BH may shutdown the wrong fd, that
> +    * re-used by other thread after we release here.
> +    */
>      if (s->rp_state.from_dst_file) {
>          qemu_fclose(s->rp_state.from_dst_file);
>      }
> @@ -519,7 +552,7 @@ void *colo_process_incoming_thread(void *opaque)
>      }
>  
>  out:
> -    if (ret < 0) {
> +    if (ret < 0 || (!ret && !failover_request_is_active())) {
>          error_report("colo incoming thread will exit, detect error: %s",
>                       strerror(-ret));
>          qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_ERROR,
> @@ -539,6 +572,11 @@ out:
>      */
>      colo_release_ram_cache();
>  
> +    /* Hope this not to be too long to loop here */
> +    while (failover_get_state() != FAILOVER_STATUS_COMPLETED) {
> +        ;
> +    }

Hmm, one thing I just noticed: if there was a failure earlier
in colo_process_incoming_thread (ret < 0) and it did a 'goto out',
then I think it gets stuck in this failover loop?

Dave
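
To make the concern concrete: with no failover requested, the state stays
at NONE, so a loop that waits for COMPLETED never exits. One possible guard,
sketched below, is to wait only while a failover is actually in flight.
failover_in_flight() is a hypothetical helper, not something from the
series, and it does not close the window where a failover is requested
after the check but before qemu_fclose().

```c
#include <assert.h>
#include <stdbool.h>

/* Values mirror the enum in failover.h as of this patch. */
enum {
    FAILOVER_STATUS_NONE = 0,
    FAILOVER_STATUS_REQUEST = 1,
    FAILOVER_STATUS_HANDLING = 2,
    FAILOVER_STATUS_COMPLETED = 3,
};

/*
 * Hypothetical helper: true only while a failover BH is pending or
 * running, i.e. the cases where the thread must wait before closing
 * its files. Intended use in the exit path:
 *
 *     while (failover_in_flight(failover_get_state())) {
 *         ;
 *     }
 */
static bool failover_in_flight(int state)
{
    assert(state >= FAILOVER_STATUS_NONE &&
           state <= FAILOVER_STATUS_COMPLETED);
    return state == FAILOVER_STATUS_REQUEST ||
           state == FAILOVER_STATUS_HANDLING;
}
```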

> +    /* Must be called after failover BH is completed */
>      if (mis->to_src_file) {
>          qemu_fclose(mis->to_src_file);
>      }
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH COLO-Frame v12 18/38] COLO: Flush PVM's cached RAM into SVM's memory
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 18/38] COLO: Flush PVM's cached RAM into SVM's memory zhanghailiang
@ 2015-12-15 11:07   ` Changlong Xie
  2015-12-25  3:03     ` Hailiang Zhang
  0 siblings, 1 reply; 94+ messages in thread
From: Changlong Xie @ 2015-12-15 11:07 UTC (permalink / raw)
  To: zhanghailiang, qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, dgilbert,
	peter.huangpeng, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

On 12/15/2015 04:22 PM, zhanghailiang wrote:
> During the time of VM's running, PVM may dirty some pages, we will transfer
> PVM's dirty pages to SVM and store them into SVM's RAM cache at next checkpoint
> time. So, the content of SVM's RAM cache will always be some with PVM's memory
"some" => "same"

Thanks
	-Xie
> after checkpoint.
>
> Instead of flushing all content of PVM's RAM cache into SVM's MEMORY,
> we do this in a more efficient way:
> Only flush any page that dirtied by PVM since last checkpoint.
> In this way, we can ensure SVM's memory same with PVM's.
>
> Besides, we must ensure flush RAM cache before load device state.
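
The "only flush dirty pages" idea can be sketched in isolation as below.
This is a simplified model, not the code in the patch: it assumes one dirty
byte per page instead of the migration bitmap, a fixed 4 KiB page size, and
a hypothetical helper name flush_dirty_pages().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size for this sketch */

/*
 * Copy every page marked dirty from the cache into guest RAM, clear its
 * dirty mark, and return how many pages were copied. Clean pages are
 * left untouched.
 */
static size_t flush_dirty_pages(uint8_t *ram, const uint8_t *cache,
                                uint8_t *dirty, size_t npages)
{
    size_t flushed = 0;

    assert(ram != NULL && cache != NULL && dirty != NULL);
    for (size_t i = 0; i < npages; i++) {
        if (!dirty[i]) {
            continue;
        }
        memcpy(ram + i * SKETCH_PAGE_SIZE, cache + i * SKETCH_PAGE_SIZE,
               SKETCH_PAGE_SIZE);
        dirty[i] = 0;
        flushed++;
    }
    return flushed;
}
```

After a flush no dirty marks remain, which corresponds to the
assert(migration_dirty_pages == 0) at the end of colo_flush_ram_cache().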
>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
> v12:
> - Add a trace point in the end of colo_flush_ram_cache() (Dave's suggestion)
> - Add Reviewed-by tag
> v11:
> - Move the place of 'need_flush' (Dave's suggestion)
> - Remove unused 'DPRINTF("Flush ram_cache\n")'
> v10:
> - trace the number of dirty pages that be received.
>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> ---
>   include/migration/migration.h |  1 +
>   migration/colo.c              |  2 --
>   migration/ram.c               | 38 ++++++++++++++++++++++++++++++++++++++
>   trace-events                  |  2 ++
>   4 files changed, 41 insertions(+), 2 deletions(-)
>
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index e41372d..221176b 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -336,4 +336,5 @@ PostcopyState postcopy_state_set(PostcopyState new_state);
>   /* ram cache */
>   int colo_init_ram_cache(void);
>   void colo_release_ram_cache(void);
> +void colo_flush_ram_cache(void);
>   #endif
> diff --git a/migration/colo.c b/migration/colo.c
> index a4d49ff..e40cdb9 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -401,8 +401,6 @@ void *colo_process_incoming_thread(void *opaque)
>           }
>           qemu_mutex_unlock_iothread();
>
> -        /* TODO: flush vm state */
> -
>           ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_VMSTATE_LOADED);
>           if (ret < 0) {
>               goto out;
> diff --git a/migration/ram.c b/migration/ram.c
> index 3d5947b..8ff7f7c 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2458,6 +2458,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>        * be atomic
>        */
>       bool postcopy_running = postcopy_state_get() >= POSTCOPY_INCOMING_LISTENING;
> +    bool need_flush = false;
>
>       seq_iter++;
>
> @@ -2492,6 +2493,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>               /* After going into COLO, we should load the Page into colo_cache */
>               if (ram_cache_enable) {
>                   host = colo_cache_from_block_offset(block, addr);
> +                need_flush = true;
>               } else {
>                   host = host_from_ram_block_offset(block, addr);
>               }
> @@ -2585,6 +2587,10 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>       }
>
>       rcu_read_unlock();
> +
> +    if (!ret  && ram_cache_enable && need_flush) {
> +        colo_flush_ram_cache();
> +    }
>       DPRINTF("Completed load of VM with exit code %d seq iteration "
>               "%" PRIu64 "\n", ret, seq_iter);
>       return ret;
> @@ -2657,6 +2663,38 @@ void colo_release_ram_cache(void)
>       rcu_read_unlock();
>   }
>
> +/*
> + * Flush content of RAM cache into SVM's memory.
> + * Only flush the pages that be dirtied by PVM or SVM or both.
> + */
> +void colo_flush_ram_cache(void)
> +{
> +    RAMBlock *block = NULL;
> +    void *dst_host;
> +    void *src_host;
> +    ram_addr_t offset = 0;
> +
> +    trace_colo_flush_ram_cache_begin(migration_dirty_pages);
> +    rcu_read_lock();
> +    block = QLIST_FIRST_RCU(&ram_list.blocks);
> +    while (block) {
> +        ram_addr_t ram_addr_abs;
> +        offset = migration_bitmap_find_dirty(block, offset, &ram_addr_abs);
> +        migration_bitmap_clear_dirty(ram_addr_abs);
> +        if (offset >= block->used_length) {
> +            offset = 0;
> +            block = QLIST_NEXT_RCU(block, next);
> +        } else {
> +            dst_host = block->host + offset;
> +            src_host = block->colo_cache + offset;
> +            memcpy(dst_host, src_host, TARGET_PAGE_SIZE);
> +        }
> +    }
> +    rcu_read_unlock();
> +    trace_colo_flush_ram_cache_end();
> +    assert(migration_dirty_pages == 0);
> +}
> +
>   static SaveVMHandlers savevm_ram_handlers = {
>       .save_live_setup = ram_save_setup,
>       .save_live_iterate = ram_save_iterate,
> diff --git a/trace-events b/trace-events
> index 39fdd8d..7f76029 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -1264,6 +1264,8 @@ migration_throttle(void) ""
>   ram_load_postcopy_loop(uint64_t addr, int flags) "@%" PRIx64 " %x"
>   ram_postcopy_send_discard_bitmap(void) ""
>   ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: %zx len: %zx"
> +colo_flush_ram_cache_begin(uint64_t dirty_pages) "dirty_pages %" PRIu64
> +colo_flush_ram_cache_end(void) ""
>
>   # hw/display/qxl.c
>   disable qxl_interface_set_mm_time(int qid, uint32_t mm_time) "%d %d"
>


* Re: [Qemu-devel] [PATCH COLO-Frame v12 01/38] configure: Add parameter for configure to enable/disable COLO support
  2015-12-15  9:46   ` Wen Congyang
@ 2015-12-15 11:19     ` Hailiang Zhang
  2015-12-15 11:31     ` Hailiang Zhang
  1 sibling, 0 replies; 94+ messages in thread
From: Hailiang Zhang @ 2015-12-15 11:19 UTC (permalink / raw)
  To: Wen Congyang, qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, arei.gonglei, stefanha, amit.shah, hongyang.yang

On 2015/12/15 17:46, Wen Congyang wrote:
> On 12/15/2015 04:22 PM, zhanghailiang wrote:
>> configure --enable-colo/--disable-colo to switch COLO
>> support on/off.
>> COLO support is On by default.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> ---
>> v11:
>> - Turn COLO on in default (Eric's suggestion)
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>
> I think you forgot to remove this line.
>

Hmm, odd, I found that some other patches in this series contain this redundant 'Signed-off-by' too.
Maybe I did something wrong when generating the patches; I will figure it out. Thanks.

> Thanks
> Wen Congyang
>
>> ---
>>   configure | 11 +++++++++++
>>   1 file changed, 11 insertions(+)
>>
>> diff --git a/configure b/configure
>> index b9552fd..32e466f 100755
>> --- a/configure
>> +++ b/configure
>> @@ -260,6 +260,7 @@ xfs=""
>>   vhost_net="no"
>>   vhost_scsi="no"
>>   kvm="no"
>> +colo="yes"
>>   rdma=""
>>   gprof="no"
>>   debug_tcg="no"
>> @@ -939,6 +940,10 @@ for opt do
>>     ;;
>>     --enable-kvm) kvm="yes"
>>     ;;
>> +  --disable-colo) colo="no"
>> +  ;;
>> +  --enable-colo) colo="yes"
>> +  ;;
>>     --disable-tcg-interpreter) tcg_interpreter="no"
>>     ;;
>>     --enable-tcg-interpreter) tcg_interpreter="yes"
>> @@ -1362,6 +1367,7 @@ disabled with --disable-FEATURE, default is enabled if available:
>>     fdt             fdt device tree
>>     bluez           bluez stack connectivity
>>     kvm             KVM acceleration support
>> +  colo            COarse-grain LOck-stepping VM for Non-stop Service
>>     rdma            RDMA-based migration support
>>     uuid            uuid support
>>     vde             support for vde network
>> @@ -4792,6 +4798,7 @@ echo "Linux AIO support $linux_aio"
>>   echo "ATTR/XATTR support $attr"
>>   echo "Install blobs     $blobs"
>>   echo "KVM support       $kvm"
>> +echo "COLO support      $colo"
>>   echo "RDMA support      $rdma"
>>   echo "TCG interpreter   $tcg_interpreter"
>>   echo "fdt support       $fdt"
>> @@ -5381,6 +5388,10 @@ if have_backend "ftrace"; then
>>   fi
>>   echo "CONFIG_TRACE_FILE=$trace_file" >> $config_host_mak
>>
>> +if test "$colo" = "yes"; then
>> +  echo "CONFIG_COLO=y" >> $config_host_mak
>> +fi
>> +
>>   if test "$rdma" = "yes" ; then
>>     echo "CONFIG_RDMA=y" >> $config_host_mak
>>   fi
>>
>
>
>
>
> .
>


* Re: [Qemu-devel] [PATCH COLO-Frame v12 28/38] COLO: Process shutdown command for VM in COLO state
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 28/38] COLO: Process shutdown command for VM in COLO state zhanghailiang
@ 2015-12-15 11:31   ` Dr. David Alan Gilbert
  2015-12-25  6:13     ` Hailiang Zhang
  0 siblings, 1 reply; 94+ messages in thread
From: Dr. David Alan Gilbert @ 2015-12-15 11:31 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, Paolo Bonzini,
	hongyang.yang

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> If VM is in COLO FT state, we should do some extra work before normal shutdown
> process. SVM will ignore the shutdown command if this command is issued directly
> to it, PVM will send the shutdown command to SVM if it gets this command.
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> ---
>  include/sysemu/sysemu.h |  3 +++
>  migration/colo.c        | 25 +++++++++++++++++++++++--
>  qapi-schema.json        |  4 +++-
>  vl.c                    | 26 ++++++++++++++++++++++++--
>  4 files changed, 53 insertions(+), 5 deletions(-)
> 
> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> index 3bb8897..91eeda3 100644
> --- a/include/sysemu/sysemu.h
> +++ b/include/sysemu/sysemu.h
> @@ -52,6 +52,8 @@ typedef enum WakeupReason {
>      QEMU_WAKEUP_REASON_OTHER,
>  } WakeupReason;
>  
> +extern int colo_shutdown_requested;
> +
>  void qemu_system_reset_request(void);
>  void qemu_system_suspend_request(void);
>  void qemu_register_suspend_notifier(Notifier *notifier);
> @@ -59,6 +61,7 @@ void qemu_system_wakeup_request(WakeupReason reason);
>  void qemu_system_wakeup_enable(WakeupReason reason, bool enabled);
>  void qemu_register_wakeup_notifier(Notifier *notifier);
>  void qemu_system_shutdown_request(void);
> +void qemu_system_shutdown_request_core(void);
>  void qemu_system_powerdown_request(void);
>  void qemu_register_powerdown_notifier(Notifier *notifier);
>  void qemu_system_debug_request(void);
> diff --git a/migration/colo.c b/migration/colo.c
> index f4bb661..a094991 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -231,6 +231,7 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
>                                            QEMUSizedBuffer *buffer)
>  {
>      int ret;
> +    int colo_shutdown;
>      size_t size;
>      QEMUFile *trans = NULL;
>  
> @@ -258,6 +259,7 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
>          ret = -1;
>          goto out;
>      }
> +    colo_shutdown = colo_shutdown_requested;
>      vm_stop_force_state(RUN_STATE_COLO);
>      qemu_mutex_unlock_iothread();
>      trace_colo_vm_state_change("run", "stop");
> @@ -311,6 +313,15 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
>          goto out;
>      }
>  
> +    if (colo_shutdown) {
> +        colo_put_cmd(s->to_dst_file, COLO_COMMAND_GUEST_SHUTDOWN);
> +        qemu_fflush(s->to_dst_file);
> +        colo_shutdown_requested = 0;
> +        qemu_system_shutdown_request_core();
> +        /* Fix me: Just let the colo thread exit ? */
> +        qemu_thread_exit(0);
> +    }
> +
>      ret = 0;
>      /* Resume primary guest */
>      qemu_mutex_lock_iothread();
> @@ -370,8 +381,9 @@ static void colo_process_checkpoint(MigrationState *s)
>          }
>  
>          current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> -        if (current_time - checkpoint_time <
> -            s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY]) {
> +        if ((current_time - checkpoint_time <
> +            s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY]) &&
> +            !colo_shutdown_requested) {
>              int64_t delay_ms;
>  
>              delay_ms = s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY] -
> @@ -442,6 +454,15 @@ static int colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request)
>      case COLO_COMMAND_CHECKPOINT_REQUEST:
>          *checkpoint_request = 1;
>          return 0;
> +    case COLO_COMMAND_GUEST_SHUTDOWN:
> +        qemu_mutex_lock_iothread();
> +        vm_stop_force_state(RUN_STATE_COLO);
> +        qemu_system_shutdown_request_core();
> +        qemu_mutex_unlock_iothread();
> +        /* the main thread will exit and termiante the whole

Typo: 'termiante' should be 'terminate'.

> +        * process, do we need some cleanup?
> +        */
> +        qemu_thread_exit(0);

Yes, I'm not sure how much real cleanup you need during shutdown;
I wonder how a shutdown will look to the management layers above;
if they don't realise it's a shutdown they might try and do a failover
when one side exits.

>      default:
>          return -EINVAL;
>      }
> diff --git a/qapi-schema.json b/qapi-schema.json
> index f6ecb88..b5b1a02 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -754,12 +754,14 @@
>  #
>  # @vmstate-loaded: VM's state has been loaded by SVM.
>  #
> +# @guest-shutdown: shutdown require from PVM to SVM
> +#
>  # Since: 2.6
>  ##
>  { 'enum': 'COLOCommand',
>    'data': [ 'checkpoint-ready', 'checkpoint-request', 'checkpoint-reply',
>              'vmstate-send', 'vmstate-size','vmstate-received',
> -            'vmstate-loaded' ] }
> +            'vmstate-loaded', 'guest-shutdown' ] }
>  
>  ##
>  # @COLOMode
> diff --git a/vl.c b/vl.c
> index fca630b..1a61300 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -1636,6 +1636,8 @@ static NotifierList wakeup_notifiers =
>      NOTIFIER_LIST_INITIALIZER(wakeup_notifiers);
>  static uint32_t wakeup_reason_mask = ~(1 << QEMU_WAKEUP_REASON_NONE);
>  
> +int colo_shutdown_requested;
> +
>  int qemu_shutdown_requested_get(void)
>  {
>      return shutdown_requested;
> @@ -1767,6 +1769,10 @@ void qemu_system_guest_panicked(void)
>  void qemu_system_reset_request(void)
>  {
>      if (no_reboot) {
> +        qemu_system_shutdown_request();
> +        if (!shutdown_requested) {/* colo handle it ? */
> +            return;
> +        }
>          shutdown_requested = 1;

Do we still need that 'shutdown_requested = 1'? It's already
true at this point, or we have already returned.
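
A reduced model of the no_reboot path shows why the extra assignment looks
redundant. The names below are local stand-ins, and colo_active collapses
the two COLO state checks into the primary-side case only, so treat this as
an illustration rather than the real vl.c logic.

```c
#include <assert.h>
#include <stdbool.h>

static bool no_reboot;
static bool colo_active;            /* stand-in for the COLO state checks */
static int shutdown_requested;
static int reset_requested;
static int colo_shutdown_requested;

/* Model of qemu_system_shutdown_request() after the patch. */
static void shutdown_request(void)
{
    if (colo_active) {
        colo_shutdown_requested = 1;    /* handled at the next checkpoint */
        return;
    }
    shutdown_requested = 1;
}

/* Model of qemu_system_reset_request() without the extra assignment. */
static void reset_request(void)
{
    if (no_reboot) {
        shutdown_request();
        /* Either the flag is already set, or COLO took the request. */
        assert(shutdown_requested || colo_shutdown_requested);
    } else {
        reset_requested = 1;
    }
}
```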

>      } else {
>          reset_requested = 1;
> @@ -1840,14 +1846,30 @@ void qemu_system_killed(int signal, pid_t pid)
>      qemu_notify_event();
>  }
>  
> -void qemu_system_shutdown_request(void)
> +void qemu_system_shutdown_request_core(void)
>  {
> -    trace_qemu_system_shutdown_request();
>      replay_shutdown_request();
>      shutdown_requested = 1;
>      qemu_notify_event();
>  }
>  
> +void qemu_system_shutdown_request(void)
> +{
> +    trace_qemu_system_shutdown_request();
> +    /*
> +    * if in colo mode, we need do some significant work before respond to the
> +    * shutdown request.
> +    */
> +    if (migration_incoming_in_colo_state()) {
> +        return ; /* primary's responsibility */
> +    }
> +    if (migration_in_colo_state()) {
> +        colo_shutdown_requested = 1;
> +        return;
> +    }

Try to move most of this into migration/colo*.c;
here you could just do:
    if (colo_shutdown()) {
        return;
    }

it's best to keep vl.c as simple as possible.

Dave
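
A rough sketch of what such a colo_shutdown() helper could look like, as a
standalone model. The two booleans are stand-ins for
migration_incoming_in_colo_state() and migration_in_colo_state(), and the
"returns true if COLO consumed the request" contract is an assumption about
the suggested interface, not something the series defines.

```c
#include <assert.h>
#include <stdbool.h>

static bool in_colo_incoming;   /* stand-in: secondary side is in COLO */
static bool in_colo_outgoing;   /* stand-in: primary side is in COLO */
static int colo_shutdown_requested;
static int shutdown_requested;

/*
 * Would live in migration/colo*.c. Returns true if COLO consumed the
 * shutdown request, so the caller must not shut down directly.
 */
static bool colo_shutdown(void)
{
    /* A VM cannot be primary and secondary at the same time. */
    assert(!(in_colo_incoming && in_colo_outgoing));

    if (in_colo_incoming) {
        return true;                    /* primary's responsibility */
    }
    if (in_colo_outgoing) {
        colo_shutdown_requested = 1;    /* forwarded at the next checkpoint */
        return true;
    }
    return false;
}

static void qemu_system_shutdown_request_core(void)
{
    shutdown_requested = 1;
}

/* vl.c then only needs the single check suggested above. */
static void qemu_system_shutdown_request(void)
{
    if (colo_shutdown()) {
        return;
    }
    qemu_system_shutdown_request_core();
}
```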

> +    qemu_system_shutdown_request_core();
> +}
> +
>  static void qemu_system_powerdown(void)
>  {
>      qapi_event_send_powerdown(&error_abort);
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH COLO-Frame v12 01/38] configure: Add parameter for configure to enable/disable COLO support
  2015-12-15  9:46   ` Wen Congyang
  2015-12-15 11:19     ` Hailiang Zhang
@ 2015-12-15 11:31     ` Hailiang Zhang
  1 sibling, 0 replies; 94+ messages in thread
From: Hailiang Zhang @ 2015-12-15 11:31 UTC (permalink / raw)
  To: Wen Congyang, qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, arei.gonglei, stefanha, amit.shah, hongyang.yang

On 2015/12/15 17:46, Wen Congyang wrote:
> On 12/15/2015 04:22 PM, zhanghailiang wrote:
>> configure --enable-colo/--disable-colo to switch COLO
>> support on/off.
>> COLO support is On by default.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> ---
>> v11:
>> - Turn COLO on in default (Eric's suggestion)
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>
> I think you forgot to remove this line.
>

Oops, I used the wrong git command to make these patches:
'git format-patch -s'. With the '-s' parameter, it adds
the 'Signed-off-by' tag automatically. I will take care of this in the next version. Thanks.

> Thanks
> Wen Congyang
>
>> ---
>>   configure | 11 +++++++++++
>>   1 file changed, 11 insertions(+)
>>
>> diff --git a/configure b/configure
>> index b9552fd..32e466f 100755
>> --- a/configure
>> +++ b/configure
>> @@ -260,6 +260,7 @@ xfs=""
>>   vhost_net="no"
>>   vhost_scsi="no"
>>   kvm="no"
>> +colo="yes"
>>   rdma=""
>>   gprof="no"
>>   debug_tcg="no"
>> @@ -939,6 +940,10 @@ for opt do
>>     ;;
>>     --enable-kvm) kvm="yes"
>>     ;;
>> +  --disable-colo) colo="no"
>> +  ;;
>> +  --enable-colo) colo="yes"
>> +  ;;
>>     --disable-tcg-interpreter) tcg_interpreter="no"
>>     ;;
>>     --enable-tcg-interpreter) tcg_interpreter="yes"
>> @@ -1362,6 +1367,7 @@ disabled with --disable-FEATURE, default is enabled if available:
>>     fdt             fdt device tree
>>     bluez           bluez stack connectivity
>>     kvm             KVM acceleration support
>> +  colo            COarse-grain LOck-stepping VM for Non-stop Service
>>     rdma            RDMA-based migration support
>>     uuid            uuid support
>>     vde             support for vde network
>> @@ -4792,6 +4798,7 @@ echo "Linux AIO support $linux_aio"
>>   echo "ATTR/XATTR support $attr"
>>   echo "Install blobs     $blobs"
>>   echo "KVM support       $kvm"
>> +echo "COLO support      $colo"
>>   echo "RDMA support      $rdma"
>>   echo "TCG interpreter   $tcg_interpreter"
>>   echo "fdt support       $fdt"
>> @@ -5381,6 +5388,10 @@ if have_backend "ftrace"; then
>>   fi
>>   echo "CONFIG_TRACE_FILE=$trace_file" >> $config_host_mak
>>
>> +if test "$colo" = "yes"; then
>> +  echo "CONFIG_COLO=y" >> $config_host_mak
>> +fi
>> +
>>   if test "$rdma" = "yes" ; then
>>     echo "CONFIG_RDMA=y" >> $config_host_mak
>>   fi
>>
>
>
>
>
> .
>


* Re: [Qemu-devel] [PATCH COLO-Frame v12 29/38] COLO: Update the global runstate after going into colo state
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 29/38] COLO: Update the global runstate after going into colo state zhanghailiang
@ 2015-12-15 11:52   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 94+ messages in thread
From: Dr. David Alan Gilbert @ 2015-12-15 11:52 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> If we start qemu with -S, the runstate will change from 'prelaunch' to 'running'
> after going into colo state.
> So it is necessary to update the global runstate after going into colo state.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/colo.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index a094991..62a0444 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -374,6 +374,11 @@ static void colo_process_checkpoint(MigrationState *s)
>      qemu_mutex_unlock_iothread();
>      trace_colo_vm_state_change("stop", "run");
>  
> +    ret = global_state_store();
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
>      while (s->state == MIGRATION_STATUS_COLO) {
>          if (failover_request_is_active()) {
>              error_report("failover request");
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH COLO-Frame v12 30/38] savevm: Split load vm state function qemu_loadvm_state
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 30/38] savevm: Split load vm state function qemu_loadvm_state zhanghailiang
@ 2015-12-15 12:08   ` Dr. David Alan Gilbert
  2015-12-25  6:37     ` Hailiang Zhang
  0 siblings, 1 reply; 94+ messages in thread
From: Dr. David Alan Gilbert @ 2015-12-15 12:08 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> qemu_loadvm_state is too long, and we can simplify it by splitting up
> with three helper functions.

Yes, good idea.

> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> ---
>  migration/savevm.c | 161 ++++++++++++++++++++++++++++++++---------------------
>  1 file changed, 97 insertions(+), 64 deletions(-)
> 
> diff --git a/migration/savevm.c b/migration/savevm.c
> index f102870..c7c26d8 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1710,90 +1710,123 @@ void loadvm_free_handlers(MigrationIncomingState *mis)
>      }
>  }
>  
> +static int
> +qemu_loadvm_section_start_full(QEMUFile *f, MigrationIncomingState *mis)
> +{
> +    uint32_t instance_id, version_id, section_id;
> +    SaveStateEntry *se;
> +    LoadStateEntry *le;
> +    char idstr[256];
> +    int ret;
> +
> +    /* Read section start */
> +    section_id = qemu_get_be32(f);
> +    if (!qemu_get_counted_string(f, idstr)) {
> +        error_report("Unable to read ID string for section %u",
> +                     section_id);
> +        return -EINVAL;
> +    }
> +    instance_id = qemu_get_be32(f);
> +    version_id = qemu_get_be32(f);
> +
> +    trace_qemu_loadvm_state_section_startfull(section_id, idstr,
> +            instance_id, version_id);
> +    /* Find savevm section */
> +    se = find_se(idstr, instance_id);
> +    if (se == NULL) {
> +        error_report("Unknown savevm section or instance '%s' %d",
> +                     idstr, instance_id);
> +        ret = -EINVAL;
> +        return ret;

Minor; you don't need 'ret' there, just return -EINVAL.
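
For illustration, the shape being suggested, shown on the version check in
isolation. check_section_version() is a hypothetical helper, and fprintf()
stands in for error_report() so the snippet builds on its own.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Returns 0 if the incoming version is supported, -EINVAL otherwise. */
static int check_section_version(const char *idstr, uint32_t version_id,
                                 uint32_t supported)
{
    assert(idstr != NULL);

    if (version_id > supported) {
        fprintf(stderr,
                "savevm: unsupported version %" PRIu32 " for '%s' v%" PRIu32
                "\n", version_id, idstr, supported);
        return -EINVAL;     /* returned directly, no 'ret' needed */
    }
    return 0;
}
```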

> +    }
> +
> +    /* Validate version */
> +    if (version_id > se->version_id) {
> +        error_report("savevm: unsupported version %d for '%s' v%d",
> +                     version_id, idstr, se->version_id);
> +        ret = -EINVAL;
> +        return ret;

same

> +    }
> +
> +    /* Add entry */
> +    le = g_malloc0(sizeof(*le));
> +
> +    le->se = se;
> +    le->section_id = section_id;
> +    le->version_id = version_id;
> +    QLIST_INSERT_HEAD(&mis->loadvm_handlers, le, entry);
> +
> +    ret = vmstate_load(f, le->se, le->version_id);
> +    if (ret < 0) {
> +        error_report("error while loading state for instance 0x%x of"
> +                     " device '%s'", instance_id, idstr);
> +        return ret;
> +    }
> +    if (!check_section_footer(f, le)) {
> +        ret = -EINVAL;
> +        return ret;

same.

> +    }
> +
> +    return 0;
> +}
> +
> +static int
> +qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis)
> +{
> +    uint32_t section_id;
> +    LoadStateEntry *le;
> +    int ret;
> +
> +    section_id = qemu_get_be32(f);
> +
> +    trace_qemu_loadvm_state_section_partend(section_id);
> +    QLIST_FOREACH(le, &mis->loadvm_handlers, entry) {
> +        if (le->section_id == section_id) {
> +            break;
> +        }
> +    }
> +    if (le == NULL) {
> +        error_report("Unknown savevm section %d", section_id);
> +        ret = -EINVAL;
> +        return ret;

same

> +    }
> +
> +    ret = vmstate_load(f, le->se, le->version_id);
> +    if (ret < 0) {
> +        error_report("error while loading state section id %d(%s)",
> +                     section_id, le->se->idstr);
> +        return ret;
> +    }
> +    if (!check_section_footer(f, le)) {
> +        ret = -EINVAL;
> +        return ret;

same

> +    }
> +
> +    return 0;
> +}
> +
>  static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
>  {
>      uint8_t section_type;
>      int ret;
>  
>      while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
> -        uint32_t instance_id, version_id, section_id;
> -        SaveStateEntry *se;
> -        LoadStateEntry *le;
> -        char idstr[256];
>  
>          trace_qemu_loadvm_state_section(section_type);
>          switch (section_type) {
>          case QEMU_VM_SECTION_START:
>          case QEMU_VM_SECTION_FULL:
> -            /* Read section start */
> -            section_id = qemu_get_be32(f);
> -            if (!qemu_get_counted_string(f, idstr)) {
> -                error_report("Unable to read ID string for section %u",
> -                            section_id);
> -                return -EINVAL;
> -            }
> -            instance_id = qemu_get_be32(f);
> -            version_id = qemu_get_be32(f);
> -
> -            trace_qemu_loadvm_state_section_startfull(section_id, idstr,
> -                                                      instance_id, version_id);
> -            /* Find savevm section */
> -            se = find_se(idstr, instance_id);
> -            if (se == NULL) {
> -                error_report("Unknown savevm section or instance '%s' %d",
> -                             idstr, instance_id);
> -                return -EINVAL;
> -            }
> -
> -            /* Validate version */
> -            if (version_id > se->version_id) {
> -                error_report("savevm: unsupported version %d for '%s' v%d",
> -                             version_id, idstr, se->version_id);
> -                return -EINVAL;
> -            }
> -
> -            /* Add entry */
> -            le = g_malloc0(sizeof(*le));
> -
> -            le->se = se;
> -            le->section_id = section_id;
> -            le->version_id = version_id;
> -            QLIST_INSERT_HEAD(&mis->loadvm_handlers, le, entry);
> -
> -            ret = vmstate_load(f, le->se, le->version_id);
> +            ret = qemu_loadvm_section_start_full(f, mis);
>              if (ret < 0) {
> -                error_report("error while loading state for instance 0x%x of"
> -                             " device '%s'", instance_id, idstr);
>                  return ret;
>              }
> -            if (!check_section_footer(f, le)) {
> -                return -EINVAL;
> -            }
>              break;
>          case QEMU_VM_SECTION_PART:
>          case QEMU_VM_SECTION_END:
> -            section_id = qemu_get_be32(f);
> -
> -            trace_qemu_loadvm_state_section_partend(section_id);
> -            QLIST_FOREACH(le, &mis->loadvm_handlers, entry) {
> -                if (le->section_id == section_id) {
> -                    break;
> -                }
> -            }
> -            if (le == NULL) {
> -                error_report("Unknown savevm section %d", section_id);
> -                return -EINVAL;
> -            }
> -
> -            ret = vmstate_load(f, le->se, le->version_id);
> +            ret = qemu_loadvm_section_part_end(f, mis);
>              if (ret < 0) {
> -                error_report("error while loading state section id %d(%s)",
> -                             section_id, le->se->idstr);
>                  return ret;
>              }
> -            if (!check_section_footer(f, le)) {
> -                return -EINVAL;
> -            }
>              break;
>          case QEMU_VM_COMMAND:
>              ret = loadvm_process_command(f);
> -- 
> 1.8.3.1


Other than the minor return fixups;

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
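
For reference, the return fixups asked for above could look roughly like
the sketch below. It is a standalone illustration, not the QEMU code:
struct demo_entry, demo_find(), demo_load() and demo_section_part_end()
are hypothetical stand-ins for LoadStateEntry, the QLIST_FOREACH lookup,
vmstate_load() and qemu_loadvm_section_part_end(). The point is only that
a constant error can be returned directly, while 'ret' is still needed
where the callee's error code is passed on.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for LoadStateEntry; not the QEMU definition. */
struct demo_entry {
    unsigned int section_id;
    int load_result;            /* what the fake loader will return */
    struct demo_entry *next;
};

/* Hypothetical lookup, standing in for the QLIST_FOREACH search. */
static struct demo_entry *demo_find(struct demo_entry *head, unsigned int id)
{
    struct demo_entry *le;

    for (le = head; le != NULL; le = le->next) {
        if (le->section_id == id) {
            break;
        }
    }
    return le;                  /* NULL when no entry matched */
}

/* Hypothetical loader, standing in for vmstate_load(). */
static int demo_load(const struct demo_entry *le)
{
    return le->load_result;
}

static int demo_section_part_end(struct demo_entry *head, unsigned int id)
{
    struct demo_entry *le = demo_find(head, id);
    int ret;

    if (le == NULL) {
        return -EINVAL;         /* constant error: no detour through 'ret' */
    }

    ret = demo_load(le);
    if (ret < 0) {
        return ret;             /* callee's error code: 'ret' is needed */
    }

    return 0;
}
```

The same shape applies to the footer checks: a failed check can simply
return -EINVAL.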

> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
  2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (37 preceding siblings ...)
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 38/38] COLO: Add block replication into colo process zhanghailiang
@ 2015-12-15 12:14 ` Dr. David Alan Gilbert
  2015-12-15 12:41   ` Hailiang Zhang
  38 siblings, 1 reply; 94+ messages in thread
From: Dr. David Alan Gilbert @ 2015-12-15 12:14 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> This is the 12th version of COLO.
> 
> As usual, this version of COLO is only support periodic checkpoint,
> just like MicroCheckpointing and Remus does.
> 
> Here is only COLO frame part, you can get the whole codes from github:
> https://github.com/coloft/qemu/commits/colo-v2.3-periodic-mode

Hi,
  Have you tried wiring in Zhang Chen's new userland colo proxy yet?
I'd like to start trying it out.

Dave

> Test procedure:
> 1. Startup qemu
> Primary side:
> #x86_64-softmmu/qemu-system-x86_64 -enable-kvm -boot c -m 2048 -smp 2 -qmp stdio -vnc :7 -name primary -cpu qemu64,+kvmclock -device piix3-usb-uhci -device usb-tablet -netdev tap,id=hn0,vhost=off -device virtio-net-pci,id=net-pci0,netdev=hn0 -drive if=virtio,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0.file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,children.0.driver=raw
> Secondary side:
> #x86_64-softmmu/qemu-system-x86_64 -boot c -m 2048 -smp 2 -qmp stdio -vnc :7 -name secondary -enable-kvm -cpu qemu64,+kvmclock -device piix3-usb-uhci -device usb-tablet -netdev tap,id=hn0,vhost=off -device virtio-net-pci,id=net-pci0,netdev=hn0 -drive if=none,id=colo-disk0,file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,driver=raw,node-name=node0 -drive if=virtio,id=active-disk0,throttling.bps-total=70000000,driver=replication,mode=secondary,file.driver=qcow2,file.file.filename=/mnt/ramfs/active_disk.img,file.backing.driver=qcow2,file.backing.file.filename=/mnt/ramfs/hidden_disk.img,file.backing.backing=colo-disk0 -incoming tcp:0:8888
> 2. On Secondary VM's QEMU monitor, issue command
> {'execute':'qmp_capabilities'}
> {'execute': 'nbd-server-start', 'arguments': {'addr': {'type': 'inet', 'data': {'host': '192.168.2.88', 'port': '8889'} } } }
> {'execute': 'nbd-server-add', 'arguments': {'device': 'colo-disk0', 'writable': true } }
> {'execute': 'trace-event-set-state', 'arguments': {'name': 'colo*', 'enable': true} }
> 
> 3. On Primary VM's QEMU monitor, issue command:
> {'execute':'qmp_capabilities'}
> {'execute': 'human-monitor-command', 'arguments': {'command-line': 'drive_add buddy driver=replication,mode=primary,file.driver=nbd,file.host=9.61.1.7,file.port=8889,file.export=colo-disk0,node-name=node0,if=none'}}
> {'execute':'x-blockdev-change', 'arguments':{'parent': 'colo-disk0', 'node': 'node0' } }
> {'execute': 'migrate-set-capabilities', 'arguments': {'capabilities': [ {'capability': 'x-colo', 'state': true } ] } }
> {'execute': 'migrate', 'arguments': {'uri': 'tcp:192.168.2.88:8888' } }
> 
> 4. After the above steps, you will see, whenever you make changes to PVM, SVM will be synced.
> You can by issue command '{ "execute": "migrate-set-parameters" , "arguments":{ "x-checkpoint-delay": 2000 } }'
> to change the checkpoint period time.
> 
> 5. Failover test
> You can kill Primary VM and run 'x_colo_lost_heartbeat' in Secondary VM's
> monitor at the same time, then SVM will failover and client will not feel this 
> change.
> 
> Before issuing '{ "execute": "x-colo-lost-heartbeat" }' command, we have to
> issue block related command to stop block replication.
> Primary:
>   Remove the nbd child from the quorum:
>   { 'execute': 'x-blockdev-change', 'arguments': {'parent': 'colo-disk0', 'child': 'children.1'}}
>   Note: there is no qmp command to remove the blockdev now
> 
> Secondary:
>   The primary host is down, so we should do the following thing:
>   { 'execute': 'nbd-server-stop' }
> 
> Please review, thanks.
> 
> TODO:
> 1. Implement packets compare module (proxy) in qemu (Doing)
> 2. Checkpoint based on proxy in qemu
> 3. The capability of continuous FT
> 
> v12:
>  - Fix the bug that default buffer filter broken vhost-net.
>  - Add an flag in struct NetFilterState to help skipping default
>   filter for packets travelling through filter layer.
>  - Remove the default failover treatment which may cause split-brain.
>  - Rename checkpoint-delay to x-checkpoint-delay.
>  - Check if all netdev supports default filter before going into COLO.
>  - Reconstruct send/receive helper functions in patch 10.
>  - Address serveral other comments from Dave 
> 
> v11:
>  - Re-implement buffer/release packets based on filter-buffer according
>    to Jason Wang's suggestion. (patch 34, patch 36 ~ patch 38)
>  - Rebase master to re-use some stuff introduced by post-copy.
>  - Address several comments from Eric and Dave, the fixing record can
>    be found in each patch.
> 
> v10:
>  - Rename 'colo_lost_heartbeat' command to experimental 'x_colo_lost_heartbeat'
>  - Rename migration capability 'colo' to 'x-colo' (Eric's suggestion)
>  - Simplify the process of primary side by dropping colo thread and reusing
>    migration thread. (Dave's suggestion)
>  - Add several netfilter related APIs to support buffer/release packets
>    for COLO (patch 32 ~ patch 36)
> 
> zhanghailiang (38):
>   configure: Add parameter for configure to enable/disable COLO support
>   migration: Introduce capability 'x-colo' to migration
>   COLO: migrate colo related info to secondary node
>   migration: Export migrate_set_state()
>   migration: Add state records for migration incoming
>   migration: Integrate COLO checkpoint process into migration
>   migration: Integrate COLO checkpoint process into loadvm
>   migration: Rename the'file' member of MigrationState
>   COLO/migration: Create a new communication path from destination to
>     source
>   COLO: Implement colo checkpoint protocol
>   COLO: Add a new RunState RUN_STATE_COLO
>   QEMUSizedBuffer: Introduce two help functions for qsb
>   COLO: Save PVM state to secondary side when do checkpoint
>   ram: Split host_from_stream_offset() into two helper functions
>   COLO: Load PVM's dirty pages into SVM's RAM cache temporarily
>   ram/COLO: Record the dirty pages that SVM received
>   COLO: Load VMState into qsb before restore it
>   COLO: Flush PVM's cached RAM into SVM's memory
>   COLO: Add checkpoint-delay parameter for migrate-set-parameters
>   COLO: synchronize PVM's state to SVM periodically
>   COLO failover: Introduce a new command to trigger a failover
>   COLO failover: Introduce state to record failover process
>   COLO: Implement failover work for Primary VM
>   COLO: Implement failover work for Secondary VM
>   qmp event: Add event notification for COLO error
>   COLO failover: Shutdown related socket fd when do failover
>   COLO failover: Don't do failover during loading VM's state
>   COLO: Process shutdown command for VM in COLO state
>   COLO: Update the global runstate after going into colo state
>   savevm: Split load vm state function qemu_loadvm_state
>   COLO: Separate the process of saving/loading ram and device state
>   COLO: Split qemu_savevm_state_begin out of checkpoint process
>   net/filter-buffer: Add default filter-buffer for each netdev
>   filter-buffer: Accept zero interval
>   filter-buffer: Introduce a helper function to enable/disable default
>     filter
>   filter-buffer: Introduce a helper function to release packets
>   colo: Use default buffer-filter to buffer and release packets
>   COLO: Add block replication into colo process
> 
>  configure                     |  11 +
>  docs/qmp-events.txt           |  17 +
>  hmp-commands.hx               |  15 +
>  hmp.c                         |  15 +
>  hmp.h                         |   1 +
>  include/exec/ram_addr.h       |   9 +-
>  include/migration/colo.h      |  38 +++
>  include/migration/failover.h  |  33 ++
>  include/migration/migration.h |  18 +-
>  include/migration/qemu-file.h |   3 +-
>  include/net/filter.h          |  12 +
>  include/net/net.h             |   5 +
>  include/sysemu/sysemu.h       |   9 +
>  migration/Makefile.objs       |   2 +
>  migration/colo-comm.c         |  71 ++++
>  migration/colo-failover.c     |  83 +++++
>  migration/colo.c              | 765 ++++++++++++++++++++++++++++++++++++++++++
>  migration/exec.c              |   4 +-
>  migration/fd.c                |   4 +-
>  migration/migration.c         | 216 ++++++++----
>  migration/postcopy-ram.c      |   6 +-
>  migration/qemu-file-buf.c     |  61 ++++
>  migration/ram.c               | 213 ++++++++++--
>  migration/rdma.c              |   2 +-
>  migration/savevm.c            | 295 ++++++++++++----
>  migration/tcp.c               |   4 +-
>  migration/unix.c              |   4 +-
>  net/filter-buffer.c           | 127 ++++++-
>  net/filter.c                  |   6 +-
>  net/net.c                     |  58 ++++
>  qapi-schema.json              | 106 +++++-
>  qapi/event.json               |  17 +
>  qmp-commands.hx               |  24 +-
>  stubs/Makefile.objs           |   1 +
>  stubs/migration-colo.c        |  45 +++
>  trace-events                  |  10 +
>  vl.c                          |  37 +-
>  37 files changed, 2152 insertions(+), 195 deletions(-)
>  create mode 100644 include/migration/colo.h
>  create mode 100644 include/migration/failover.h
>  create mode 100644 migration/colo-comm.c
>  create mode 100644 migration/colo-failover.c
>  create mode 100644 migration/colo.c
>  create mode 100644 stubs/migration-colo.c
> 
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
  2015-12-15 12:14 ` [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) Dr. David Alan Gilbert
@ 2015-12-15 12:41   ` Hailiang Zhang
  2015-12-17 10:52     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 94+ messages in thread
From: Hailiang Zhang @ 2015-12-15 12:41 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

On 2015/12/15 20:14, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> This is the 12th version of COLO.
>>
>> As usual, this version of COLO is only support periodic checkpoint,
>> just like MicroCheckpointing and Remus does.
>>
>> Here is only COLO frame part, you can get the whole codes from github:
>> https://github.com/coloft/qemu/commits/colo-v2.3-periodic-mode
>
> Hi,
>    Have you tried wiring in Zhang Chen's new userland colo proxy yet?
> I'd like to start trying it out.
>

Not yet. Actually, for the frame part, we can re-use most of the previous code that was
based on the kernel proxy. And yes, please, you are welcome to join us. ;)

> Dave
>
>> Test procedure:
>> 1. Startup qemu
>> Primary side:
>> #x86_64-softmmu/qemu-system-x86_64 -enable-kvm -boot c -m 2048 -smp 2 -qmp stdio -vnc :7 -name primary -cpu qemu64,+kvmclock -device piix3-usb-uhci -device usb-tablet -netdev tap,id=hn0,vhost=off -device virtio-net-pci,id=net-pci0,netdev=hn0 -drive if=virtio,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0.file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,children.0.driver=raw
>> Secondary side:
>> #x86_64-softmmu/qemu-system-x86_64 -boot c -m 2048 -smp 2 -qmp stdio -vnc :7 -name secondary -enable-kvm -cpu qemu64,+kvmclock -device piix3-usb-uhci -device usb-tablet -netdev tap,id=hn0,vhost=off -device virtio-net-pci,id=net-pci0,netdev=hn0 -drive if=none,id=colo-disk0,file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,driver=raw,node-name=node0 -drive if=virtio,id=active-disk0,throttling.bps-total=70000000,driver=replication,mode=secondary,file.driver=qcow2,file.file.filename=/mnt/ramfs/active_disk.img,file.backing.driver=qcow2,file.backing.file.filename=/mnt/ramfs/hidden_disk.img,file.backing.backing=colo-disk0 -incoming tcp:0:8888
>> 2. On Secondary VM's QEMU monitor, issue command
>> {'execute':'qmp_capabilities'}
>> {'execute': 'nbd-server-start', 'arguments': {'addr': {'type': 'inet', 'data': {'host': '192.168.2.88', 'port': '8889'} } } }
>> {'execute': 'nbd-server-add', 'arguments': {'device': 'colo-disk0', 'writable': true } }
>> {'execute': 'trace-event-set-state', 'arguments': {'name': 'colo*', 'enable': true} }
>>
>> 3. On Primary VM's QEMU monitor, issue command:
>> {'execute':'qmp_capabilities'}
>> {'execute': 'human-monitor-command', 'arguments': {'command-line': 'drive_add buddy driver=replication,mode=primary,file.driver=nbd,file.host=9.61.1.7,file.port=8889,file.export=colo-disk0,node-name=node0,if=none'}}
>> {'execute':'x-blockdev-change', 'arguments':{'parent': 'colo-disk0', 'node': 'node0' } }
>> {'execute': 'migrate-set-capabilities', 'arguments': {'capabilities': [ {'capability': 'x-colo', 'state': true } ] } }
>> {'execute': 'migrate', 'arguments': {'uri': 'tcp:192.168.2.88:8888' } }
>>
>> 4. After the above steps, you will see, whenever you make changes to PVM, SVM will be synced.
>> You can by issue command '{ "execute": "migrate-set-parameters" , "arguments":{ "x-checkpoint-delay": 2000 } }'
>> to change the checkpoint period time.
>>
>> 5. Failover test
>> You can kill Primary VM and run 'x_colo_lost_heartbeat' in Secondary VM's
>> monitor at the same time, then SVM will failover and client will not feel this
>> change.
>>
>> Before issuing '{ "execute": "x-colo-lost-heartbeat" }' command, we have to
>> issue block related command to stop block replication.
>> Primary:
>>    Remove the nbd child from the quorum:
>>    { 'execute': 'x-blockdev-change', 'arguments': {'parent': 'colo-disk0', 'child': 'children.1'}}
>>    Note: there is no qmp command to remove the blockdev now
>>
>> Secondary:
>>    The primary host is down, so we should do the following thing:
>>    { 'execute': 'nbd-server-stop' }
>>
>> Please review, thanks.
>>
>> TODO:
>> 1. Implement packets compare module (proxy) in qemu (Doing)
>> 2. Checkpoint based on proxy in qemu
>> 3. The capability of continuous FT
>>
>> v12:
>>   - Fix the bug that default buffer filter broken vhost-net.
>>   - Add an flag in struct NetFilterState to help skipping default
>>    filter for packets travelling through filter layer.
>>   - Remove the default failover treatment which may cause split-brain.
>>   - Rename checkpoint-delay to x-checkpoint-delay.
>>   - Check if all netdev supports default filter before going into COLO.
>>   - Reconstruct send/receive helper functions in patch 10.
>>   - Address serveral other comments from Dave
>>
>> v11:
>>   - Re-implement buffer/release packets based on filter-buffer according
>>     to Jason Wang's suggestion. (patch 34, patch 36 ~ patch 38)
>>   - Rebase master to re-use some stuff introduced by post-copy.
>>   - Address several comments from Eric and Dave, the fixing record can
>>     be found in each patch.
>>
>> v10:
>>   - Rename 'colo_lost_heartbeat' command to experimental 'x_colo_lost_heartbeat'
>>   - Rename migration capability 'colo' to 'x-colo' (Eric's suggestion)
>>   - Simplify the process of primary side by dropping colo thread and reusing
>>     migration thread. (Dave's suggestion)
>>   - Add several netfilter related APIs to support buffer/release packets
>>     for COLO (patch 32 ~ patch 36)
>>
>> zhanghailiang (38):
>>    configure: Add parameter for configure to enable/disable COLO support
>>    migration: Introduce capability 'x-colo' to migration
>>    COLO: migrate colo related info to secondary node
>>    migration: Export migrate_set_state()
>>    migration: Add state records for migration incoming
>>    migration: Integrate COLO checkpoint process into migration
>>    migration: Integrate COLO checkpoint process into loadvm
>>    migration: Rename the'file' member of MigrationState
>>    COLO/migration: Create a new communication path from destination to
>>      source
>>    COLO: Implement colo checkpoint protocol
>>    COLO: Add a new RunState RUN_STATE_COLO
>>    QEMUSizedBuffer: Introduce two help functions for qsb
>>    COLO: Save PVM state to secondary side when do checkpoint
>>    ram: Split host_from_stream_offset() into two helper functions
>>    COLO: Load PVM's dirty pages into SVM's RAM cache temporarily
>>    ram/COLO: Record the dirty pages that SVM received
>>    COLO: Load VMState into qsb before restore it
>>    COLO: Flush PVM's cached RAM into SVM's memory
>>    COLO: Add checkpoint-delay parameter for migrate-set-parameters
>>    COLO: synchronize PVM's state to SVM periodically
>>    COLO failover: Introduce a new command to trigger a failover
>>    COLO failover: Introduce state to record failover process
>>    COLO: Implement failover work for Primary VM
>>    COLO: Implement failover work for Secondary VM
>>    qmp event: Add event notification for COLO error
>>    COLO failover: Shutdown related socket fd when do failover
>>    COLO failover: Don't do failover during loading VM's state
>>    COLO: Process shutdown command for VM in COLO state
>>    COLO: Update the global runstate after going into colo state
>>    savevm: Split load vm state function qemu_loadvm_state
>>    COLO: Separate the process of saving/loading ram and device state
>>    COLO: Split qemu_savevm_state_begin out of checkpoint process
>>    net/filter-buffer: Add default filter-buffer for each netdev
>>    filter-buffer: Accept zero interval
>>    filter-buffer: Introduce a helper function to enable/disable default
>>      filter
>>    filter-buffer: Introduce a helper function to release packets
>>    colo: Use default buffer-filter to buffer and release packets
>>    COLO: Add block replication into colo process
>>
>>   configure                     |  11 +
>>   docs/qmp-events.txt           |  17 +
>>   hmp-commands.hx               |  15 +
>>   hmp.c                         |  15 +
>>   hmp.h                         |   1 +
>>   include/exec/ram_addr.h       |   9 +-
>>   include/migration/colo.h      |  38 +++
>>   include/migration/failover.h  |  33 ++
>>   include/migration/migration.h |  18 +-
>>   include/migration/qemu-file.h |   3 +-
>>   include/net/filter.h          |  12 +
>>   include/net/net.h             |   5 +
>>   include/sysemu/sysemu.h       |   9 +
>>   migration/Makefile.objs       |   2 +
>>   migration/colo-comm.c         |  71 ++++
>>   migration/colo-failover.c     |  83 +++++
>>   migration/colo.c              | 765 ++++++++++++++++++++++++++++++++++++++++++
>>   migration/exec.c              |   4 +-
>>   migration/fd.c                |   4 +-
>>   migration/migration.c         | 216 ++++++++----
>>   migration/postcopy-ram.c      |   6 +-
>>   migration/qemu-file-buf.c     |  61 ++++
>>   migration/ram.c               | 213 ++++++++++--
>>   migration/rdma.c              |   2 +-
>>   migration/savevm.c            | 295 ++++++++++++----
>>   migration/tcp.c               |   4 +-
>>   migration/unix.c              |   4 +-
>>   net/filter-buffer.c           | 127 ++++++-
>>   net/filter.c                  |   6 +-
>>   net/net.c                     |  58 ++++
>>   qapi-schema.json              | 106 +++++-
>>   qapi/event.json               |  17 +
>>   qmp-commands.hx               |  24 +-
>>   stubs/Makefile.objs           |   1 +
>>   stubs/migration-colo.c        |  45 +++
>>   trace-events                  |  10 +
>>   vl.c                          |  37 +-
>>   37 files changed, 2152 insertions(+), 195 deletions(-)
>>   create mode 100644 include/migration/colo.h
>>   create mode 100644 include/migration/failover.h
>>   create mode 100644 migration/colo-comm.c
>>   create mode 100644 migration/colo-failover.c
>>   create mode 100644 migration/colo.c
>>   create mode 100644 stubs/migration-colo.c
>>
>> --
>> 1.8.3.1
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

* Re: [Qemu-devel] [PATCH COLO-Frame v12 05/38] migration: Add state records for migration incoming
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 05/38] migration: Add state records for migration incoming zhanghailiang
@ 2015-12-15 17:36   ` Dr. David Alan Gilbert
  2015-12-16  5:37     ` Hailiang Zhang
  0 siblings, 1 reply; 94+ messages in thread
From: Dr. David Alan Gilbert @ 2015-12-15 17:36 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> For migration destination, we also need to know its state,
> we will use it in COLO.
> 
> Here we add a new member 'state' for MigrationIncomingState,
> and also use migrate_set_state() to modify its value.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Actually note there is a bug here; see below

> ---
> v11:
> - Split exporting migrate_set_state() part into a new patch (Juan's suggestion)
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> ---
>  include/migration/migration.h |  1 +
>  migration/migration.c         | 14 +++++++++-----
>  2 files changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index 4b19e80..99dfa92 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -105,6 +105,7 @@ struct MigrationIncomingState {
>      QemuMutex rp_mutex;    /* We send replies from multiple threads */
>      void     *postcopy_tmp_page;
>  
> +    int state;
>      /* See savevm.c */
>      LoadStateEntry_Head loadvm_handlers;
>  };
> diff --git a/migration/migration.c b/migration/migration.c
> index c9cd80d..d58ce98 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -112,6 +112,7 @@ MigrationIncomingState *migration_incoming_state_new(QEMUFile* f)
>  {
>      mis_current = g_new0(MigrationIncomingState, 1);
>      mis_current->from_src_file = f;
> +    mis_current->state = MIGRATION_STATUS_NONE;
>      QLIST_INIT(&mis_current->loadvm_handlers);
>      qemu_mutex_init(&mis_current->rp_mutex);
>      qemu_event_init(&mis_current->main_thread_load_event, false);
> @@ -332,8 +333,8 @@ static void process_incoming_migration_co(void *opaque)
>  
>      mis = migration_incoming_state_new(f);
>      postcopy_state_set(POSTCOPY_INCOMING_NONE);
> -    migrate_generate_event(MIGRATION_STATUS_ACTIVE);
> -
> +    migrate_set_state(&mis->state, MIGRATION_STATUS_NONE,
> +                      MIGRATION_STATUS_ACTIVE);
>      ret = qemu_loadvm_state(f);
>  
>      ps = postcopy_state_get();
> @@ -362,7 +363,8 @@ static void process_incoming_migration_co(void *opaque)
>      migration_incoming_state_destroy();

We're freeing mis now - we can't use the state later!

>  
>      if (ret < 0) {
> -        migrate_generate_event(MIGRATION_STATUS_FAILED);
> +        migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
> +                          MIGRATION_STATUS_FAILED);
>          error_report("load of migration failed: %s", strerror(-ret));
>          migrate_decompress_threads_join();
>          exit(EXIT_FAILURE);
> @@ -371,7 +373,8 @@ static void process_incoming_migration_co(void *opaque)
>      /* Make sure all file formats flush their mutable metadata */
>      bdrv_invalidate_cache_all(&local_err);
>      if (local_err) {
> -        migrate_generate_event(MIGRATION_STATUS_FAILED);
> +        migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
> +                          MIGRATION_STATUS_FAILED);
>          error_report_err(local_err);
>          migrate_decompress_threads_join();
>          exit(EXIT_FAILURE);
> @@ -403,7 +406,8 @@ static void process_incoming_migration_co(void *opaque)
>       * observer sees this event they might start to prod at the VM assuming
>       * it's ready to use.
>       */
> -    migrate_generate_event(MIGRATION_STATUS_COMPLETED);
> +    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
> +                      MIGRATION_STATUS_COMPLETED);

So I moved the migration_incoming_state_destroy()  to here in my world.
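
A minimal sketch of that ordering, using stand-in types rather than the
real QEMU ones (everything named demo_* is hypothetical, and
demo_set_state() only models the "change if currently in old_state"
behaviour I am assuming for migrate_set_state()). The state is updated
while the incoming-state object is still alive, and the destroy call
comes after the last use. In the real function the failure paths call
exit(); here they just fall through so the example can be run.

```c
#include <stdlib.h>

enum demo_status { DEMO_NONE, DEMO_ACTIVE, DEMO_FAILED, DEMO_COMPLETED };

/* Hypothetical stand-in for MigrationIncomingState. */
struct demo_incoming {
    int state;
};

static struct demo_incoming *demo_current;

static struct demo_incoming *demo_incoming_new(void)
{
    demo_current = calloc(1, sizeof(*demo_current));
    if (demo_current != NULL) {
        demo_current->state = DEMO_NONE;
    }
    return demo_current;
}

static void demo_incoming_destroy(void)
{
    free(demo_current);
    demo_current = NULL;        /* any later use would be a bug */
}

/* Move to new_state only if we are currently in old_state. */
static void demo_set_state(int *state, int old_state, int new_state)
{
    if (*state == old_state) {
        *state = new_state;
    }
}

/* Returns the final status; load_ret plays the role of the loadvm result. */
static int demo_process_incoming(int load_ret)
{
    struct demo_incoming *mis = demo_incoming_new();
    int final_state;

    if (mis == NULL) {
        return DEMO_FAILED;
    }

    demo_set_state(&mis->state, DEMO_NONE, DEMO_ACTIVE);

    if (load_ret < 0) {
        demo_set_state(&mis->state, DEMO_ACTIVE, DEMO_FAILED);
    } else {
        demo_set_state(&mis->state, DEMO_ACTIVE, DEMO_COMPLETED);
    }

    final_state = mis->state;   /* last use of mis */
    demo_incoming_destroy();    /* destroy only after the last use */
    return final_state;
}
```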

Dave

>  }
>  
>  void process_incoming_migration(QEMUFile *f)
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH COLO-Frame v12 05/38] migration: Add state records for migration incoming
  2015-12-15 17:36   ` Dr. David Alan Gilbert
@ 2015-12-16  5:37     ` Hailiang Zhang
  0 siblings, 0 replies; 94+ messages in thread
From: Hailiang Zhang @ 2015-12-16  5:37 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

On 2015/12/16 1:36, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> For migration destination, we also need to know its state,
>> we will use it in COLO.
>>
>> Here we add a new member 'state' for MigrationIncomingState,
>> and also use migrate_set_state() to modify its value.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>
> Actually note there is a bug here; see below
>
>> ---
>> v11:
>> - Split exporting migrate_set_state() part into a new patch (Juan's suggestion)
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> ---
>>   include/migration/migration.h |  1 +
>>   migration/migration.c         | 14 +++++++++-----
>>   2 files changed, 10 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/migration/migration.h b/include/migration/migration.h
>> index 4b19e80..99dfa92 100644
>> --- a/include/migration/migration.h
>> +++ b/include/migration/migration.h
>> @@ -105,6 +105,7 @@ struct MigrationIncomingState {
>>       QemuMutex rp_mutex;    /* We send replies from multiple threads */
>>       void     *postcopy_tmp_page;
>>
>> +    int state;
>>       /* See savevm.c */
>>       LoadStateEntry_Head loadvm_handlers;
>>   };
>> diff --git a/migration/migration.c b/migration/migration.c
>> index c9cd80d..d58ce98 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -112,6 +112,7 @@ MigrationIncomingState *migration_incoming_state_new(QEMUFile* f)
>>   {
>>       mis_current = g_new0(MigrationIncomingState, 1);
>>       mis_current->from_src_file = f;
>> +    mis_current->state = MIGRATION_STATUS_NONE;
>>       QLIST_INIT(&mis_current->loadvm_handlers);
>>       qemu_mutex_init(&mis_current->rp_mutex);
>>       qemu_event_init(&mis_current->main_thread_load_event, false);
>> @@ -332,8 +333,8 @@ static void process_incoming_migration_co(void *opaque)
>>
>>       mis = migration_incoming_state_new(f);
>>       postcopy_state_set(POSTCOPY_INCOMING_NONE);
>> -    migrate_generate_event(MIGRATION_STATUS_ACTIVE);
>> -
>> +    migrate_set_state(&mis->state, MIGRATION_STATUS_NONE,
>> +                      MIGRATION_STATUS_ACTIVE);
>>       ret = qemu_loadvm_state(f);
>>
>>       ps = postcopy_state_get();
>> @@ -362,7 +363,8 @@ static void process_incoming_migration_co(void *opaque)
>>       migration_incoming_state_destroy();
>
> We're freeing mis now - we can't use the state later!
>
>>
>>       if (ret < 0) {
>> -        migrate_generate_event(MIGRATION_STATUS_FAILED);
>> +        migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
>> +                          MIGRATION_STATUS_FAILED);
>>           error_report("load of migration failed: %s", strerror(-ret));
>>           migrate_decompress_threads_join();
>>           exit(EXIT_FAILURE);
>> @@ -371,7 +373,8 @@ static void process_incoming_migration_co(void *opaque)
>>       /* Make sure all file formats flush their mutable metadata */
>>       bdrv_invalidate_cache_all(&local_err);
>>       if (local_err) {
>> -        migrate_generate_event(MIGRATION_STATUS_FAILED);
>> +        migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
>> +                          MIGRATION_STATUS_FAILED);
>>           error_report_err(local_err);
>>           migrate_decompress_threads_join();
>>           exit(EXIT_FAILURE);
>> @@ -403,7 +406,8 @@ static void process_incoming_migration_co(void *opaque)
>>        * observer sees this event they might start to prod at the VM assuming
>>        * it's ready to use.
>>        */
>> -    migrate_generate_event(MIGRATION_STATUS_COMPLETED);
>> +    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
>> +                      MIGRATION_STATUS_COMPLETED);
>
> So I moved the migration_incoming_state_destroy()  to here in my world.
>

Yes, it is a bug; thank you for fixing it.
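
The ordering problem Dave points out can be illustrated with a small standalone C sketch. This is not QEMU code: IncomingState, set_state() and finish_incoming() are hypothetical stand-ins for MigrationIncomingState, migrate_set_state() and the tail of process_incoming_migration_co(), and the real helper does more than this plain compare-and-set. The only point is that the state object is destroyed after its last use, not before.

```c
#include <stdlib.h>

/* Illustrative status values; the names only loosely mirror the patch. */
enum { STATUS_NONE, STATUS_ACTIVE, STATUS_COMPLETED, STATUS_FAILED };

/* Hypothetical stand-in for MigrationIncomingState. */
typedef struct IncomingState {
    int state;
} IncomingState;

static IncomingState *incoming_new(void)
{
    IncomingState *mis = calloc(1, sizeof(*mis));

    if (mis) {
        mis->state = STATUS_NONE;
    }
    return mis;
}

/* Plain compare-and-set: only move to new_state if we are in old_state. */
static void set_state(int *state, int old_state, int new_state)
{
    if (*state == old_state) {
        *state = new_state;
    }
}

/*
 * Tail of a simplified incoming path. Every access to mis->state happens
 * before free(); freeing first and then calling set_state(&mis->state, ...)
 * would be a use-after-free.
 */
static int finish_incoming(IncomingState *mis, int load_ret)
{
    int final_state;

    if (!mis) {
        return STATUS_FAILED;
    }
    set_state(&mis->state, STATUS_NONE, STATUS_ACTIVE);
    if (load_ret < 0) {
        set_state(&mis->state, STATUS_ACTIVE, STATUS_FAILED);
    } else {
        set_state(&mis->state, STATUS_ACTIVE, STATUS_COMPLETED);
    }
    final_state = mis->state;
    free(mis);              /* destroy only after the last state update */
    return final_state;
}
```

Calling finish_incoming(incoming_new(), 0) walks NONE -> ACTIVE -> COMPLETED and only then releases the object, which matches the idea of moving the destroy call to the end.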

> Dave
>
>>   }
>>
>>   void process_incoming_migration(QEMUFile *f)
>> --
>> 1.8.3.1
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v12 26/38] COLO failover: Shutdown related socket fd when do failover
  2015-12-15 10:23   ` Dr. David Alan Gilbert
@ 2015-12-16  5:58     ` Hailiang Zhang
  0 siblings, 0 replies; 94+ messages in thread
From: Hailiang Zhang @ 2015-12-16  5:58 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

On 2015/12/15 18:23, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> If the network connection between COLO's two sides breaks while the colo/colo incoming
>> thread is blocked in 'read'/'write' on a socket fd, the error will not be detected until
>> the connection times out, which can take a long time.
>>
>> Here we shut down all the related socket file descriptors in the failover BH to wake up
>> the blocked operation. Besides, we should close the corresponding file descriptors only
>> after the failover BH has shut them down, or there will be an error.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> ---
>> v12:
>> - Shutdown both QEMUFile's fd though they may use the same fd. (Dave's suggestion)
>> v11:
>> - Only shutdown fd for once
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> ---
>>   migration/colo.c | 42 ++++++++++++++++++++++++++++++++++++++++--
>>   1 file changed, 40 insertions(+), 2 deletions(-)
>>
>> diff --git a/migration/colo.c b/migration/colo.c
>> index d06c14f..58531e7 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -60,6 +60,18 @@ static void secondary_vm_do_failover(void)
>>           /* recover runstate to normal migration finish state */
>>           autostart = true;
>>       }
>> +    /*
>> +    * Make sure colo incoming thread not block in recv or send,
>> +    * If mis->from_src_file and mis->to_src_file use the same fd,
>> +    * The second shutdown() will return -1, we ignore this value,
>> +    * it is harmless.
>> +    */
>> +    if (mis->from_src_file) {
>> +        qemu_file_shutdown(mis->from_src_file);
>> +    }
>> +    if (mis->to_src_file) {
>> +        qemu_file_shutdown(mis->to_src_file);
>> +    }
>>
>>       old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
>>                                      FAILOVER_STATUS_COMPLETED);
>> @@ -82,6 +94,18 @@ static void primary_vm_do_failover(void)
>>       migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
>>                         MIGRATION_STATUS_COMPLETED);
>>
>> +    /*
>> +    * Make sure colo thread no block in recv or send,
>> +    * The s->rp_state.from_dst_file and s->to_dst_file may use the
>> +    * same fd, but we still shutdown the fd for twice, it is harmless.
>> +    */
>> +    if (s->to_dst_file) {
>> +        qemu_file_shutdown(s->to_dst_file);
>> +    }
>> +    if (s->rp_state.from_dst_file) {
>> +        qemu_file_shutdown(s->rp_state.from_dst_file);
>> +    }
>> +
>>       old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
>>                                      FAILOVER_STATUS_COMPLETED);
>>       if (old_state != FAILOVER_STATUS_HANDLING) {
>> @@ -348,7 +372,7 @@ static void colo_process_checkpoint(MigrationState *s)
>>       }
>>
>>   out:
>> -    if (ret < 0) {
>> +    if (ret < 0 || (!ret && !failover_request_is_active())) {
>>           error_report("%s: %s", __func__, strerror(-ret));
>>           qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_ERROR,
>>                                     true, strerror(-ret), NULL);
>> @@ -360,6 +384,15 @@ out:
>>       qsb_free(buffer);
>>       buffer = NULL;
>>
>> +    /* Hope this not to be too long to loop here */
>> +    while (failover_get_state() != FAILOVER_STATUS_COMPLETED) {
>> +        ;
>> +    }
>> +    /*
>> +    * Must be called after failover BH is completed,
>> +    * Or the failover BH may shutdown the wrong fd, that
>> +    * re-used by other thread after we release here.
>> +    */
>>       if (s->rp_state.from_dst_file) {
>>           qemu_fclose(s->rp_state.from_dst_file);
>>       }
>> @@ -519,7 +552,7 @@ void *colo_process_incoming_thread(void *opaque)
>>       }
>>
>>   out:
>> -    if (ret < 0) {
>> +    if (ret < 0 || (!ret && !failover_request_is_active())) {
>>           error_report("colo incoming thread will exit, detect error: %s",
>>                        strerror(-ret));
>>           qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_ERROR,
>> @@ -539,6 +572,11 @@ out:
>>       */
>>       colo_release_ram_cache();
>>
>> +    /* Hope this not to be too long to loop here */
>> +    while (failover_get_state() != FAILOVER_STATUS_COMPLETED) {
>> +        ;
>> +    }
>
> Hmm, one thing I just noticed; if there was a failure earlier
> in colo_process_incoming_thread, ret <0, and it 'goto out'
> then I think it gets stuck in this failover loop?
>

Yes, it will get stuck here until the user kicks it out with the
'x-colo-lost-heartbeat' command. The Primary side has the same problem.
Since we can be sure the Secondary side has received the whole VM state by
the time we enter the colo incoming thread, we can assume it has already
entered the COLO state. If an error then happens, can we treat it as one
that happened in the normal COLO process? (I'm not sure about this, but the
busy loop is not good; using a semaphore to notify the incoming thread may
be better.)
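
The semaphore idea could look roughly like the following standalone sketch. It uses POSIX semaphores and pthreads only to stay self-contained; in-tree code would more likely use QEMU's own thread wrappers. FailoverSync, failover_handler() and cleanup_after_failover() are hypothetical names, and the two booleans merely mark where the shutdown and close calls would sit. Note that this only replaces the spin loop with a blocking wait; it does not by itself answer the question of what happens when no failover is ever requested.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>

/* Hypothetical shared state between the failover handler and the thread. */
typedef struct FailoverSync {
    sem_t done;            /* posted once the handler has finished */
    bool fds_shut_down;    /* where qemu_file_shutdown() would have run */
    bool fds_closed;       /* where qemu_fclose() would have run */
} FailoverSync;

/* Stand-in for the failover BH: shut the fds down, then signal. */
static void *failover_handler(void *opaque)
{
    FailoverSync *fs = opaque;

    fs->fds_shut_down = true;
    sem_post(&fs->done);
    return NULL;
}

/*
 * Stand-in for the tail of the incoming thread: sleep until the handler
 * is done instead of spinning, then close. Returns true if the shutdown
 * was observed before the close.
 */
static bool cleanup_after_failover(FailoverSync *fs)
{
    bool ordered;

    while (sem_wait(&fs->done) == -1 && errno == EINTR) {
        /* interrupted by a signal: wait again */
    }
    ordered = fs->fds_shut_down;
    fs->fds_closed = true;
    return ordered;
}

/* Small driver: run the handler in another thread and wait for it. */
static bool run_failover_demo(void)
{
    FailoverSync fs = { .fds_shut_down = false, .fds_closed = false };
    pthread_t tid;
    bool ordered;

    if (sem_init(&fs.done, 0, 0) != 0) {
        return false;
    }
    if (pthread_create(&tid, NULL, failover_handler, &fs) != 0) {
        sem_destroy(&fs.done);
        return false;
    }
    ordered = cleanup_after_failover(&fs);
    pthread_join(tid, NULL);
    sem_destroy(&fs.done);
    return ordered && fs.fds_closed;
}
```

Compared with polling failover_get_state() in a tight loop, the waiting thread here consumes no CPU while blocked, and the post/wait pair also orders the handler's writes before the close.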


> Dave
>
>> +    /* Must be called after failover BH is completed */
>>       if (mis->to_src_file) {
>>           qemu_fclose(mis->to_src_file);
>>       }
>> --
>> 1.8.3.1
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
  2015-12-15 12:41   ` Hailiang Zhang
@ 2015-12-17 10:52     ` Dr. David Alan Gilbert
  2015-12-18  1:10       ` Hailiang Zhang
  0 siblings, 1 reply; 94+ messages in thread
From: Dr. David Alan Gilbert @ 2015-12-17 10:52 UTC (permalink / raw)
  To: Hailiang Zhang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

* Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
> On 2015/12/15 20:14, Dr. David Alan Gilbert wrote:
> >* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> >>This is the 12th version of COLO.
> >>
> >>As usual, this version of COLO only supports periodic checkpoints,
> >>just like MicroCheckpointing and Remus do.
> >>
> >>Here is only the COLO frame part; you can get the whole code from github:
> >>https://github.com/coloft/qemu/commits/colo-v2.3-periodic-mode
> >
> >Hi,
> >   Have you tried wiring in Zhang Chen's new userland colo proxy yet?
> >I'd like to start trying it out.
> >
> 
> Not yet. Actually, for the frame part, we can re-use most of the previous code that was based on
> the kernel proxy. And yes, please, you are welcome to join us. ;)

Yes, that's certainly something I'll look at immediately at the start of the new year
(I'm out for 2 weeks from Friday).

I've just tested this series on my machines, and it works well.
Two things:
  1) I just posted a patch to add an HMP equivalent to x-blockdev-change
  2) With an older machine type (e.g. pc-i440fx-2.3), when I fail over to the
secondary I hit an 'invalid runstate transition: 'inmigrate' -> 'prelaunch'' error;
I guess this is something to do with global_state.
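
As a rough illustration of how an 'invalid runstate transition' style error can arise, here is a generic table-driven check in C. It is only a sketch: the RS_* names, the Transition type and, in particular, the entries in the table are made up for the example and are not taken from QEMU's actual transition table, which is not shown in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical run states, named after the ones in the error message. */
typedef enum {
    RS_INMIGRATE,
    RS_PRELAUNCH,
    RS_PAUSED,
    RS_RUNNING
} RunStateSketch;

typedef struct {
    RunStateSketch from;
    RunStateSketch to;
} Transition;

/* Illustrative whitelist only; any pair not listed is rejected. */
static const Transition allowed[] = {
    { RS_INMIGRATE, RS_RUNNING },
    { RS_INMIGRATE, RS_PAUSED },
    { RS_PRELAUNCH, RS_RUNNING },
};

/* Returns true if (from -> to) appears in the whitelist above. */
static bool transition_allowed(RunStateSketch from, RunStateSketch to)
{
    size_t i;

    for (i = 0; i < sizeof(allowed) / sizeof(allowed[0]); i++) {
        if (allowed[i].from == from && allowed[i].to == to) {
            return true;
        }
    }
    return false;
}
```

With a whitelist like this, a state restored from the migration stream that asks for a pair missing from the table (here RS_INMIGRATE -> RS_PRELAUNCH) is refused, which is the kind of failure the message above describes.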

Dave

> >Dave
> >
> >>Test procedure:
> >>1. Startup qemu
> >>Primary side:
> >>#x86_64-softmmu/qemu-system-x86_64 -enable-kvm -boot c -m 2048 -smp 2 -qmp stdio -vnc :7 -name primary -cpu qemu64,+kvmclock -device piix3-usb-uhci -device usb-tablet -netdev tap,id=hn0,vhost=off -device virtio-net-pci,id=net-pci0,netdev=hn0 -drive if=virtio,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0.file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,children.0.driver=raw
> >>Secondary side:
> >>#x86_64-softmmu/qemu-system-x86_64 -boot c -m 2048 -smp 2 -qmp stdio -vnc :7 -name secondary -enable-kvm -cpu qemu64,+kvmclock -device piix3-usb-uhci -device usb-tablet -netdev tap,id=hn0,vhost=off -device virtio-net-pci,id=net-pci0,netdev=hn0 -drive if=none,id=colo-disk0,file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,driver=raw,node-name=node0 -drive if=virtio,id=active-disk0,throttling.bps-total=70000000,driver=replication,mode=secondary,file.driver=qcow2,file.file.filename=/mnt/ramfs/active_disk.img,file.backing.driver=qcow2,file.backing.file.filename=/mnt/ramfs/hidden_disk.img,file.backing.backing=colo-disk0 -incoming tcp:0:8888
> >>2. On Secondary VM's QEMU monitor, issue command
> >>{'execute':'qmp_capabilities'}
> >>{'execute': 'nbd-server-start', 'arguments': {'addr': {'type': 'inet', 'data': {'host': '192.168.2.88', 'port': '8889'} } } }
> >>{'execute': 'nbd-server-add', 'arguments': {'device': 'colo-disk0', 'writable': true } }
> >>{'execute': 'trace-event-set-state', 'arguments': {'name': 'colo*', 'enable': true} }
> >>
> >>3. On Primary VM's QEMU monitor, issue command:
> >>{'execute':'qmp_capabilities'}
> >>{'execute': 'human-monitor-command', 'arguments': {'command-line': 'drive_add buddy driver=replication,mode=primary,file.driver=nbd,file.host=9.61.1.7,file.port=8889,file.export=colo-disk0,node-name=node0,if=none'}}
> >>{'execute':'x-blockdev-change', 'arguments':{'parent': 'colo-disk0', 'node': 'node0' } }
> >>{'execute': 'migrate-set-capabilities', 'arguments': {'capabilities': [ {'capability': 'x-colo', 'state': true } ] } }
> >>{'execute': 'migrate', 'arguments': {'uri': 'tcp:192.168.2.88:8888' } }
> >>
> >>4. After the above steps, whenever you make changes to the PVM, the SVM will be synced.
> >>You can issue the command '{ "execute": "migrate-set-parameters" , "arguments":{ "x-checkpoint-delay": 2000 } }'
> >>to change the checkpoint period.
> >>
> >>5. Failover test
> >>You can kill the Primary VM and run 'x_colo_lost_heartbeat' in the Secondary VM's
> >>monitor at the same time; the SVM will then fail over and the client will not notice
> >>the change.
> >>
> >>Before issuing the '{ "execute": "x-colo-lost-heartbeat" }' command, we have to
> >>issue block-related commands to stop block replication.
> >>Primary:
> >>   Remove the nbd child from the quorum:
> >>   { 'execute': 'x-blockdev-change', 'arguments': {'parent': 'colo-disk0', 'child': 'children.1'}}
> >>   Note: there is no qmp command to remove the blockdev now
> >>
> >>Secondary:
> >>   The primary host is down, so we should do the following thing:
> >>   { 'execute': 'nbd-server-stop' }
> >>
> >>Please review, thanks.
> >>
> >>TODO:
> >>1. Implement packets compare module (proxy) in qemu (Doing)
> >>2. Checkpoint based on proxy in qemu
> >>3. The capability of continuous FT
> >>
> >>v12:
> >>  - Fix the bug that the default buffer filter broke vhost-net.
> >>  - Add a flag in struct NetFilterState to help skip the default
> >>   filter for packets travelling through the filter layer.
> >>  - Remove the default failover treatment, which may cause split-brain.
> >>  - Rename checkpoint-delay to x-checkpoint-delay.
> >>  - Check whether all netdevs support the default filter before going into COLO.
> >>  - Reconstruct send/receive helper functions in patch 10.
> >>  - Address several other comments from Dave
> >>
> >>v11:
> >>  - Re-implement buffer/release packets based on filter-buffer according
> >>    to Jason Wang's suggestion. (patch 34, patch 36 ~ patch 38)
> >>  - Rebase master to re-use some stuff introduced by post-copy.
> >>  - Address several comments from Eric and Dave, the fixing record can
> >>    be found in each patch.
> >>
> >>v10:
> >>  - Rename 'colo_lost_heartbeat' command to experimental 'x_colo_lost_heartbeat'
> >>  - Rename migration capability 'colo' to 'x-colo' (Eric's suggestion)
> >>  - Simplify the process of primary side by dropping colo thread and reusing
> >>    migration thread. (Dave's suggestion)
> >>  - Add several netfilter related APIs to support buffer/release packets
> >>    for COLO (patch 32 ~ patch 36)
> >>
> >>zhanghailiang (38):
> >>   configure: Add parameter for configure to enable/disable COLO support
> >>   migration: Introduce capability 'x-colo' to migration
> >>   COLO: migrate colo related info to secondary node
> >>   migration: Export migrate_set_state()
> >>   migration: Add state records for migration incoming
> >>   migration: Integrate COLO checkpoint process into migration
> >>   migration: Integrate COLO checkpoint process into loadvm
> >>   migration: Rename the'file' member of MigrationState
> >>   COLO/migration: Create a new communication path from destination to
> >>     source
> >>   COLO: Implement colo checkpoint protocol
> >>   COLO: Add a new RunState RUN_STATE_COLO
> >>   QEMUSizedBuffer: Introduce two help functions for qsb
> >>   COLO: Save PVM state to secondary side when do checkpoint
> >>   ram: Split host_from_stream_offset() into two helper functions
> >>   COLO: Load PVM's dirty pages into SVM's RAM cache temporarily
> >>   ram/COLO: Record the dirty pages that SVM received
> >>   COLO: Load VMState into qsb before restore it
> >>   COLO: Flush PVM's cached RAM into SVM's memory
> >>   COLO: Add checkpoint-delay parameter for migrate-set-parameters
> >>   COLO: synchronize PVM's state to SVM periodically
> >>   COLO failover: Introduce a new command to trigger a failover
> >>   COLO failover: Introduce state to record failover process
> >>   COLO: Implement failover work for Primary VM
> >>   COLO: Implement failover work for Secondary VM
> >>   qmp event: Add event notification for COLO error
> >>   COLO failover: Shutdown related socket fd when do failover
> >>   COLO failover: Don't do failover during loading VM's state
> >>   COLO: Process shutdown command for VM in COLO state
> >>   COLO: Update the global runstate after going into colo state
> >>   savevm: Split load vm state function qemu_loadvm_state
> >>   COLO: Separate the process of saving/loading ram and device state
> >>   COLO: Split qemu_savevm_state_begin out of checkpoint process
> >>   net/filter-buffer: Add default filter-buffer for each netdev
> >>   filter-buffer: Accept zero interval
> >>   filter-buffer: Introduce a helper function to enable/disable default
> >>     filter
> >>   filter-buffer: Introduce a helper function to release packets
> >>   colo: Use default buffer-filter to buffer and release packets
> >>   COLO: Add block replication into colo process
> >>
> >>  configure                     |  11 +
> >>  docs/qmp-events.txt           |  17 +
> >>  hmp-commands.hx               |  15 +
> >>  hmp.c                         |  15 +
> >>  hmp.h                         |   1 +
> >>  include/exec/ram_addr.h       |   9 +-
> >>  include/migration/colo.h      |  38 +++
> >>  include/migration/failover.h  |  33 ++
> >>  include/migration/migration.h |  18 +-
> >>  include/migration/qemu-file.h |   3 +-
> >>  include/net/filter.h          |  12 +
> >>  include/net/net.h             |   5 +
> >>  include/sysemu/sysemu.h       |   9 +
> >>  migration/Makefile.objs       |   2 +
> >>  migration/colo-comm.c         |  71 ++++
> >>  migration/colo-failover.c     |  83 +++++
> >>  migration/colo.c              | 765 ++++++++++++++++++++++++++++++++++++++++++
> >>  migration/exec.c              |   4 +-
> >>  migration/fd.c                |   4 +-
> >>  migration/migration.c         | 216 ++++++++----
> >>  migration/postcopy-ram.c      |   6 +-
> >>  migration/qemu-file-buf.c     |  61 ++++
> >>  migration/ram.c               | 213 ++++++++++--
> >>  migration/rdma.c              |   2 +-
> >>  migration/savevm.c            | 295 ++++++++++++----
> >>  migration/tcp.c               |   4 +-
> >>  migration/unix.c              |   4 +-
> >>  net/filter-buffer.c           | 127 ++++++-
> >>  net/filter.c                  |   6 +-
> >>  net/net.c                     |  58 ++++
> >>  qapi-schema.json              | 106 +++++-
> >>  qapi/event.json               |  17 +
> >>  qmp-commands.hx               |  24 +-
> >>  stubs/Makefile.objs           |   1 +
> >>  stubs/migration-colo.c        |  45 +++
> >>  trace-events                  |  10 +
> >>  vl.c                          |  37 +-
> >>  37 files changed, 2152 insertions(+), 195 deletions(-)
> >>  create mode 100644 include/migration/colo.h
> >>  create mode 100644 include/migration/failover.h
> >>  create mode 100644 migration/colo-comm.c
> >>  create mode 100644 migration/colo-failover.c
> >>  create mode 100644 migration/colo.c
> >>  create mode 100644 stubs/migration-colo.c
> >>
> >>--
> >>1.8.3.1
> >>
> >>
> >--
> >Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
> >.
> >
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
  2015-12-17 10:52     ` Dr. David Alan Gilbert
@ 2015-12-18  1:10       ` Hailiang Zhang
  2015-12-18 15:47         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 94+ messages in thread
From: Hailiang Zhang @ 2015-12-18  1:10 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

On 2015/12/17 18:52, Dr. David Alan Gilbert wrote:
> * Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
>> On 2015/12/15 20:14, Dr. David Alan Gilbert wrote:
>>> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>>>> This is the 12th version of COLO.
>>>>
>>>> As usual, this version of COLO only supports periodic checkpoints,
>>>> just like MicroCheckpointing and Remus do.
>>>>
>>>> Here is only the COLO frame part; you can get the whole code from github:
>>>> https://github.com/coloft/qemu/commits/colo-v2.3-periodic-mode
>>>
>>> Hi,
>>>    Have you tried wiring in Zhang Chen's new userland colo proxy yet?
>>> I'd like to start trying it out.
>>>
>>
>> Not yet. Actually, for the frame part, we can re-use most of the previous code that was based on
>> the kernel proxy. And yes, please, you are welcome to join us. ;)
>
> Yes, that's certainly something I'll look at immediately at the start of the new year
> (I'm out for 2 weeks from Friday).
>

Great~

> I've just tested this series on my machines, and it works well.

Thank you for the testing.

> Two things:
>    1) I just posted a patch to add an HMP equivalent to x-blockdev-change
>    2) With an older machine type (e.g. pc-i440fx-2.3), when I fail over to the
> secondary I hit an 'invalid runstate transition: 'inmigrate' -> 'prelaunch'' error;
> I guess this is something to do with global_state.
>

Yes, we have fixed one problem related to global_state. I haven't tested COLO with
an older machine type. I will look into it; thanks for reporting it.

Hailiang

> Dave
>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v12 31/38] COLO: Separate the process of saving/loading ram and device state
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 31/38] COLO: Separate the process of saving/loading ram and device state zhanghailiang
@ 2015-12-18 10:53   ` Dr. David Alan Gilbert
  2015-12-28  3:46     ` Hailiang Zhang
  0 siblings, 1 reply; 94+ messages in thread
From: Dr. David Alan Gilbert @ 2015-12-18 10:53 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> We separate the saving/loading of RAM and device state when doing a checkpoint,
> and add new helpers to save/load RAM and device state. With this change, we can directly
> transfer RAM from master to slave without using a QEMUSizedBuffer as an intermediary,
> which also reduces the extra memory used during a checkpoint.
> 
> Besides, we move colo_flush_ram_cache to the proper position after the
> above change.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> ---
> v11:
> - Remove load configuration section in qemu_loadvm_state_begin()
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> ---
>  include/sysemu/sysemu.h |   6 +++
>  migration/colo.c        |  43 ++++++++++++----
>  migration/ram.c         |   5 --
>  migration/savevm.c      | 132 ++++++++++++++++++++++++++++++++++++++++++++++--
>  4 files changed, 168 insertions(+), 18 deletions(-)
> 
> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> index 91eeda3..5deae53 100644
> --- a/include/sysemu/sysemu.h
> +++ b/include/sysemu/sysemu.h
> @@ -133,7 +133,13 @@ void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
>                                             uint64_t *start_list,
>                                             uint64_t *length_list);
>  
> +int qemu_save_ram_precopy(QEMUFile *f);
> +int qemu_save_device_state(QEMUFile *f);
> +
>  int qemu_loadvm_state(QEMUFile *f);
> +int qemu_loadvm_state_begin(QEMUFile *f);
> +int qemu_load_ram_state(QEMUFile *f);
> +int qemu_load_device_state(QEMUFile *f);
>  
>  typedef enum DisplayType
>  {
> diff --git a/migration/colo.c b/migration/colo.c
> index 62a0444..d253d64 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -272,21 +272,32 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
>          goto out;
>      }
>  
> +    ret = colo_put_cmd(s->to_dst_file, COLO_COMMAND_VMSTATE_SEND);
> +    if (ret < 0) {
> +        goto out;
> +    }
>      /* Disable block migration */
>      s->params.blk = 0;
>      s->params.shared = 0;
> -    qemu_savevm_state_header(trans);
> -    qemu_savevm_state_begin(trans, &s->params);
> -    qemu_mutex_lock_iothread();
> -    qemu_savevm_state_complete_precopy(trans, false);
> -    qemu_mutex_unlock_iothread();
> -
> -    qemu_fflush(trans);
> +    qemu_savevm_state_begin(s->to_dst_file, &s->params);
> +    ret = qemu_file_get_error(s->to_dst_file);
> +    if (ret < 0) {
> +        error_report("save vm state begin error\n");

You don't need \n in error_report (Markus is trying to get rid
of all the existing cases where people do that!)
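
To illustrate why the trailing newline is unwanted, here is a minimal sketch. report_line() is a hypothetical stand-in that, like error_report(), appends the newline itself; it is not QEMU code.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for error_report(): formats the message into
 * 'out' and appends the terminating newline itself.
 */
static int report_line(char *out, size_t len, const char *msg)
{
    return snprintf(out, len, "%s\n", msg);
}
```

With this shape, a caller that passes "save vm state begin error\n" ends up with two newlines, which shows up as a stray blank line in the log.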

> +        goto out;
> +    }
>  
> -    ret = colo_put_cmd(s->to_dst_file, COLO_COMMAND_VMSTATE_SEND);
> +    qemu_mutex_lock_iothread();
> +    /* Note: device state is saved into buffer */
> +    ret = qemu_save_device_state(trans);
>      if (ret < 0) {
> +        error_report("save device state error\n");
> +        qemu_mutex_unlock_iothread();
>          goto out;
>      }
> +    qemu_fflush(trans);
> +    qemu_save_ram_precopy(s->to_dst_file);
> +    qemu_mutex_unlock_iothread();
> +

It's interesting you're saving the devices and then saving the RAM,
whereas in a normal migration we always save the RAM first and then
the devices;  I don't _think_ it makes any difference but I thought
I'd point it out.

>      /* we send the total size of the vmstate first */
>      size = qsb_get_length(buffer);
>      ret = colo_put_cmd_value(s->to_dst_file, COLO_COMMAND_VMSTATE_SIZE, size);
> @@ -545,6 +556,16 @@ void *colo_process_incoming_thread(void *opaque)
>              goto out;
>          }
>  
> +        ret = qemu_loadvm_state_begin(mis->from_src_file);
> +        if (ret < 0) {
> +            error_report("load vm state begin error, ret=%d", ret);
> +            goto out;
> +        }
> +        ret = qemu_load_ram_state(mis->from_src_file);
> +        if (ret < 0) {
> +            error_report("load ram state error");
> +            goto out;
> +        }
>          /* read the VM state total size first */
>          ret = colo_get_cmd_value(mis->from_src_file,
>                                   COLO_COMMAND_VMSTATE_SIZE, &value);
> @@ -577,8 +598,10 @@ void *colo_process_incoming_thread(void *opaque)
>          qemu_mutex_lock_iothread();
>          qemu_system_reset(VMRESET_SILENT);
>          vmstate_loading = true;
> -        if (qemu_loadvm_state(fb) < 0) {
> -            error_report("COLO: loadvm failed");
> +        colo_flush_ram_cache();
> +        ret = qemu_load_device_state(fb);
> +        if (ret < 0) {
> +            error_report("COLO: load device state failed\n");
>              vmstate_loading = false;
>              qemu_mutex_unlock_iothread();
>              goto out;
> diff --git a/migration/ram.c b/migration/ram.c
> index 8ff7f7c..45d9332 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2458,7 +2458,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>       * be atomic
>       */
>      bool postcopy_running = postcopy_state_get() >= POSTCOPY_INCOMING_LISTENING;
> -    bool need_flush = false;
>  
>      seq_iter++;
>  
> @@ -2493,7 +2492,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>              /* After going into COLO, we should load the Page into colo_cache */
>              if (ram_cache_enable) {
>                  host = colo_cache_from_block_offset(block, addr);
> -                need_flush = true;
>              } else {
>                  host = host_from_ram_block_offset(block, addr);
>              }
> @@ -2588,9 +2586,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>  
>      rcu_read_unlock();
>  
> -    if (!ret  && ram_cache_enable && need_flush) {
> -        colo_flush_ram_cache();
> -    }
>      DPRINTF("Completed load of VM with exit code %d seq iteration "
>              "%" PRIu64 "\n", ret, seq_iter);
>      return ret;
> diff --git a/migration/savevm.c b/migration/savevm.c
> index c7c26d8..94c0d10 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -50,6 +50,7 @@
>  #include "qemu/iov.h"
>  #include "block/snapshot.h"
>  #include "block/qapi.h"
> +#include "migration/colo.h"
>  
>  
>  #ifndef ETH_P_RARP
> @@ -923,6 +924,10 @@ void qemu_savevm_state_begin(QEMUFile *f,
>              break;
>          }
>      }
> +    if (migration_in_colo_state()) {
> +        qemu_put_byte(f, QEMU_VM_EOF);
> +        qemu_fflush(f);
> +    }
>  }
>  
>  /*
> @@ -1192,13 +1197,44 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
>      return ret;
>  }
>  
> -static int qemu_save_device_state(QEMUFile *f)
> +int qemu_save_ram_precopy(QEMUFile *f)
>  {
>      SaveStateEntry *se;
> +    int ret = 0;
>  
> -    qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
> -    qemu_put_be32(f, QEMU_VM_FILE_VERSION);
> +    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> +        if (!se->ops || !se->ops->save_live_complete_precopy) {
> +            continue;
> +        }
> +        if (se->ops && se->ops->is_active) {
> +            if (!se->ops->is_active(se->opaque)) {
> +                continue;
> +            }
> +        }
> +        trace_savevm_section_start(se->idstr, se->section_id);

Please update the trace_ names to match the function.

> +
> +        save_section_header(f, se, QEMU_VM_SECTION_END);
>  
> +        ret = se->ops->save_live_complete_precopy(f, se->opaque);
> +        trace_savevm_section_end(se->idstr, se->section_id, ret);
> +        save_section_footer(f, se);
> +        if (ret < 0) {
> +            qemu_file_set_error(f, ret);
> +            return ret;
> +        }
> +    }
> +    qemu_put_byte(f, QEMU_VM_EOF);
> +
> +    return 0;
> +}

OK, that function is a bit odd - you're relying on a device having
save_live_complete_precopy to let you know that it's a RAM block; it's
currently true but it'll get more interesting if anyone tries to
add any other postcopy devices.  Please add a comment at least
to point that out.
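
Roughly the kind of comment I mean, as an untested sketch. SaveOpsSketch, picked_by_ram_pass() and the two demo hooks are made-up names that only mirror the two checks in the loop above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Cut-down, hypothetical mirror of the two hooks the loop looks at. */
typedef struct SaveOpsSketch {
    int (*save_live_complete_precopy)(void *opaque);
    bool (*is_active)(void *opaque);
} SaveOpsSketch;

/*
 * NOTE: there is no explicit "this is RAM" flag. An entry is picked up
 * purely because it provides save_live_complete_precopy and is active.
 * That is only a proxy for "RAM"; any other device that grows this hook
 * would be saved by the RAM pass as well.
 */
static bool picked_by_ram_pass(const SaveOpsSketch *ops, void *opaque)
{
    if (!ops || !ops->save_live_complete_precopy) {
        return false;
    }
    if (ops->is_active && !ops->is_active(opaque)) {
        return false;
    }
    return true;
}

/* Demo hooks, only here so the sketch can be exercised. */
static int demo_complete(void *opaque)
{
    (void)opaque;
    return 0;
}

static bool demo_never_active(void *opaque)
{
    (void)opaque;
    return false;
}
```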

> +
> +int qemu_save_device_state(QEMUFile *f)
> +{
> +    SaveStateEntry *se;
> +
> +    if (!migration_in_colo_state()) {
> +        qemu_savevm_state_header(f);
> +    }
>      cpu_synchronize_all_states();
>  
>      QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> @@ -1938,6 +1974,96 @@ int qemu_loadvm_state(QEMUFile *f)
>      return ret;
>  }
>  
> +int qemu_loadvm_state_begin(QEMUFile *f)
> +{
> +    uint8_t section_type;
> +    int ret = -1;
> +    MigrationIncomingState *mis = migration_incoming_get_current();
> +
> +    if (!mis) {
> +        error_report("qemu_loadvm_state_begin");
> +        return -EINVAL;
> +    }

An odd error; how can that happen?

> +    /* CleanUp */
> +    loadvm_free_handlers(mis);

I don't understand why you do that here?

> +
> +    if (qemu_savevm_state_blocked(NULL)) {
> +        return -EINVAL;
> +    }

The other calls to that function print the error it returns!

> +    while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
> +        if (section_type != QEMU_VM_SECTION_START) {
> +            error_report("QEMU_VM_SECTION_START");
> +            ret = -EINVAL;
> +            goto out;
> +        }
> +        ret = qemu_loadvm_section_start_full(f, mis);
> +        if (ret < 0) {
> +            goto out;
> +        }
> +    }
> +    ret = qemu_file_get_error(f);
> +    if (ret == 0) {
> +        return 0;
> +     }

That 'if' isn't needed - just remove the 3 lines and it will do the same
thing!
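
A small sketch of the equivalence, with the file error passed in as a plain int so it stands alone; both function names are made up for the illustration.

```c
/* Shape of the tail as posted: the 'if' returns what 'out:' returns anyway. */
static int tail_as_posted(int file_error)
{
    int ret = file_error;

    if (ret == 0) {
        return 0;
    }
    return ret; /* this is the 'out:' label in the patch */
}

/* Same behaviour with the three lines dropped. */
static int tail_simplified(int file_error)
{
    return file_error;
}
```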

> +out:
> +    return ret;
> +}
> +
> +int qemu_load_ram_state(QEMUFile *f)
> +{
> +    uint8_t section_type;
> +    MigrationIncomingState *mis = migration_incoming_get_current();
> +    int ret = -1;
> +
> +    while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
> +        if (section_type != QEMU_VM_SECTION_PART &&
> +            section_type != QEMU_VM_SECTION_END) {
> +            error_report("load ram state, not get "
> +                         "QEMU_VM_SECTION_FULL or QEMU_VM_SECTION_END");
> +            return -EINVAL;
> +        }
> +        ret = qemu_loadvm_section_part_end(f, mis);
> +        if (ret < 0) {
> +            goto out;
> +        }
> +    }
> +    ret = qemu_file_get_error(f);
> +    if (ret == 0) {
> +        return 0;
> +     }
> +out:
> +    return ret;
> +}
> +
> +int qemu_load_device_state(QEMUFile *f)
> +{
> +    uint8_t section_type;
> +    MigrationIncomingState *mis = migration_incoming_get_current();
> +    int ret = -1;
> +
> +    while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
> +        if (section_type != QEMU_VM_SECTION_FULL) {
> +            error_report("load device state error: "
> +                         "Not get QEMU_VM_SECTION_FULL");
> +            return -EINVAL;
> +        }
> +        ret = qemu_loadvm_section_start_full(f, mis);
> +        if (ret < 0) {
> +            goto out;
> +        }
> +    }
> +
> +    ret = qemu_file_get_error(f);
> +
> +    cpu_synchronize_all_post_init();
> +    if (ret == 0) {
> +        return 0;
> +    }
> +out:
> +    return ret;
> +}
> +

These three functions are all very similar;  would it be easier
just to call qemu_loadvm_state_main ?  Perhaps add a flag/enum
parameter to it for it to check which section_types are allowed
in the 3 different cases?
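
Something along these lines is what I have in mind; this is an untested sketch, not a proposed implementation. LoadPhase, section_allowed() and load_sections() are hypothetical names, the SEC_* constants are local stand-ins for the QEMU_VM_* section types, and a byte array stands in for the QEMUFile.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the QEMU_VM_* section type bytes. */
enum {
    SEC_EOF   = 0x00,
    SEC_START = 0x01,
    SEC_PART  = 0x02,
    SEC_END   = 0x03,
    SEC_FULL  = 0x04,
};

/* Which of the three COLO load steps the caller is in (hypothetical). */
typedef enum {
    LOAD_PHASE_BEGIN,  /* only SECTION_START is expected */
    LOAD_PHASE_RAM,    /* only SECTION_PART / SECTION_END */
    LOAD_PHASE_DEVICE, /* only SECTION_FULL */
} LoadPhase;

static bool section_allowed(LoadPhase phase, uint8_t type)
{
    switch (phase) {
    case LOAD_PHASE_BEGIN:
        return type == SEC_START;
    case LOAD_PHASE_RAM:
        return type == SEC_PART || type == SEC_END;
    case LOAD_PHASE_DEVICE:
        return type == SEC_FULL;
    default:
        return false;
    }
}

/* One loop for all three cases; 'phase' decides what is legal. */
static int load_sections(const uint8_t *stream, size_t len, LoadPhase phase)
{
    for (size_t i = 0; i < len; i++) {
        uint8_t type = stream[i];

        if (type == SEC_EOF) {
            return 0;
        }
        if (!section_allowed(phase, type)) {
            return -EINVAL;
        }
        /* the real code would dispatch to the section loader here */
    }
    return -EIO; /* stream ended without an EOF marker */
}
```

The point is only that the per-phase check collapses into one helper, so the loop and the error handling exist once instead of three times.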

Dave

>  void hmp_savevm(Monitor *mon, const QDict *qdict)
>  {
>      BlockDriverState *bs, *bs1;
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH COLO-Frame v12 32/38] COLO: Split qemu_savevm_state_begin out of checkpoint process
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 32/38] COLO: Split qemu_savevm_state_begin out of checkpoint process zhanghailiang
@ 2015-12-18 12:01   ` Dr. David Alan Gilbert
  2015-12-28  7:29     ` Hailiang Zhang
  0 siblings, 1 reply; 94+ messages in thread
From: Dr. David Alan Gilbert @ 2015-12-18 12:01 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> It is unnecessary to call qemu_savevm_state_begin() in every checkpoint process.
> It mainly sets up devices and does the first device state pass. This data will
> not change during later checkpoints. So, we split it out of
> colo_do_checkpoint_transaction(); this way, we avoid transferring this data
> again in later checkpoints.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> ---
>  migration/colo.c | 51 +++++++++++++++++++++++++++++++++++++--------------
>  1 file changed, 37 insertions(+), 14 deletions(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index d253d64..4571359 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -276,15 +276,6 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
>      if (ret < 0) {
>          goto out;
>      }
> -    /* Disable block migration */
> -    s->params.blk = 0;
> -    s->params.shared = 0;
> -    qemu_savevm_state_begin(s->to_dst_file, &s->params);
> -    ret = qemu_file_get_error(s->to_dst_file);
> -    if (ret < 0) {
> -        error_report("save vm state begin error\n");
> -        goto out;
> -    }
>  
>      qemu_mutex_lock_iothread();
>      /* Note: device state is saved into buffer */
> @@ -348,6 +339,21 @@ out:
>      return ret;
>  }
>  
> +static int colo_prepare_before_save(MigrationState *s)
> +{
> +    int ret;
> +    /* Disable block migration */
> +    s->params.blk = 0;
> +    s->params.shared = 0;
> +    qemu_savevm_state_begin(s->to_dst_file, &s->params);
> +    ret = qemu_file_get_error(s->to_dst_file);
> +    if (ret < 0) {
> +        error_report("save vm state begin error\n");

The '\n' is not needed here either.

> +        return ret;
> +    }
> +    return 0;
> +}
> +
>  static void colo_process_checkpoint(MigrationState *s)
>  {
>      QEMUSizedBuffer *buffer = NULL;
> @@ -363,6 +369,11 @@ static void colo_process_checkpoint(MigrationState *s)
>          goto out;
>      }
>  
> +    ret = colo_prepare_before_save(s);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
>      /*
>       * Wait for Secondary finish loading vm states and enter COLO
>       * restore.
> @@ -484,6 +495,18 @@ static int colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request)
>      }
>  }
>  
> +static int colo_prepare_before_load(QEMUFile *f)
> +{
> +    int ret;
> +
> +    ret = qemu_loadvm_state_begin(f);
> +    if (ret < 0) {
> +        error_report("load vm state begin error, ret=%d", ret);
> +        return ret;

You can simplify these returns; remove this line.

> +    }
> +    return 0;

and make this return ret; same in a few places.


Other than those minor issues;

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
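
For the return simplification mentioned above, a minimal sketch. fake_loadvm_state_begin() is a hypothetical stand-in for qemu_loadvm_state_begin() that just hands back the value it is given, and fprintf() stands in for error_report().

```c
#include <stdio.h>

/* Hypothetical stand-in: returns whatever result the caller asks for. */
static int fake_loadvm_state_begin(int result)
{
    return result;
}

/* Report on failure, then fall through to a single 'return ret'. */
static int prepare_before_load_sketch(int result)
{
    int ret = fake_loadvm_state_begin(result);

    if (ret < 0) {
        fprintf(stderr, "load vm state begin error, ret=%d\n", ret);
    }
    return ret;
}
```

This is only equivalent to the posted version as long as the callee returns 0 or a negative value; a positive return would now be passed through instead of being turned into 0.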


> +}
> +
>  void *colo_process_incoming_thread(void *opaque)
>  {
>      MigrationIncomingState *mis = opaque;
> @@ -522,6 +545,11 @@ void *colo_process_incoming_thread(void *opaque)
>          goto out;
>      }
>  
> +    ret = colo_prepare_before_load(mis->from_src_file);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
>      ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_CHECKPOINT_READY);
>      if (ret < 0) {
>          goto out;
> @@ -556,11 +584,6 @@ void *colo_process_incoming_thread(void *opaque)
>              goto out;
>          }
>  
> -        ret = qemu_loadvm_state_begin(mis->from_src_file);
> -        if (ret < 0) {
> -            error_report("load vm state begin error, ret=%d", ret);
> -            goto out;
> -        }
>          ret = qemu_load_ram_state(mis->from_src_file);
>          if (ret < 0) {
>              error_report("load ram state error");
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH COLO-Frame v12 10/38] COLO: Implement colo checkpoint protocol
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 10/38] COLO: Implement colo checkpoint protocol zhanghailiang
@ 2015-12-18 14:52   ` Dr. David Alan Gilbert
  2015-12-28  7:34     ` Hailiang Zhang
  2015-12-19  8:54   ` Markus Armbruster
  1 sibling, 1 reply; 94+ messages in thread
From: Dr. David Alan Gilbert @ 2015-12-18 14:52 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> We need a user-defined communication protocol to control the checkpoint
> process.
> 
> A new checkpoint request is started by the Primary VM, and the interaction
> proceeds as below:
> Checkpoint synchronizing points,
> 
>                        Primary                         Secondary
>                                                        initial work
> 'checkpoint-ready'     <------------------------------ @
> 
> 'checkpoint-request'   @ ----------------------------->
>                                                        Suspend (Only in hybrid mode)
> 'checkpoint-reply'     <------------------------------ @
>                        Suspend&Save state
> 'vmstate-send'         @ ----------------------------->
>                        Send state                      Receive state
> 'vmstate-received'     <------------------------------ @
>                        Release packets                 Load state
> 'vmstate-loaded'       <------------------------------ @
>                        Resume                          Resume (Only in hybrid mode)
> 
>                        Start Comparing (Only in hybrid mode)
> NOTE:
>  1) '@' marks the side that sends the message.
>  2) Every sync-point is synchronized by the two sides with only
>     one handshake (single direction) for low latency.
>     If stricter synchronization is required, an opposite-direction
>     sync-point should be added.
>  3) Since sync-points are single direction, the remote side may
>     have gone forward a lot by the time this side receives the sync-point.
>  4) For now, we only support 'periodic' checkpoint, in which
>     the Secondary VM is not running; later we will support 'hybrid' mode.
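
As a reading aid for the diagram above, the command order can be written down as a small checker. This is an illustrative sketch only: CmdSketch and colo_trace_is_valid() are made-up names, the enum is a local copy of the COLOCommand names from the schema in this patch, and 'vmstate-size' is left out of the round because this patch does not exchange it yet.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local copy of the command names, in the order of the COLOCommand enum. */
typedef enum {
    CMD_CHECKPOINT_READY,
    CMD_CHECKPOINT_REQUEST,
    CMD_CHECKPOINT_REPLY,
    CMD_VMSTATE_SEND,
    CMD_VMSTATE_SIZE,
    CMD_VMSTATE_RECEIVED,
    CMD_VMSTATE_LOADED,
} CmdSketch;

/* One checkpoint round as it appears on the wire in this patch. */
static const CmdSketch round_order[] = {
    CMD_CHECKPOINT_REQUEST, /* Primary   -> Secondary */
    CMD_CHECKPOINT_REPLY,   /* Secondary -> Primary   */
    CMD_VMSTATE_SEND,       /* Primary   -> Secondary */
    CMD_VMSTATE_RECEIVED,   /* Secondary -> Primary   */
    CMD_VMSTATE_LOADED,     /* Secondary -> Primary   */
};

/*
 * A trace is valid if it starts with 'checkpoint-ready' and is then
 * followed by whole or partial rounds in the order above.
 */
static bool colo_trace_is_valid(const CmdSketch *trace, size_t n)
{
    const size_t round_len = sizeof(round_order) / sizeof(round_order[0]);
    size_t i;

    if (n == 0 || trace[0] != CMD_CHECKPOINT_READY) {
        return false;
    }
    for (i = 1; i < n; i++) {
        if (trace[i] != round_order[(i - 1) % round_len]) {
            return false;
        }
    }
    return true;
}
```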
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> Cc: Eric Blake <eblake@redhat.com>
> ---
> v12:
> - Rename colo_ctl_put() to colo_put_cmd()
> - Rename colo_ctl_get() to colo_get_check_cmd() and drop
>   the third parameter
> - Rename colo_ctl_get_cmd() to colo_get_cmd()
> - Remove useless 'invalid' member for COLOcommand enum.
> v11:
> - Add missing 'checkpoint-ready' communication in comment.
> - Use parameter to return 'value' for colo_ctl_get() (Dave's suggestion)
> - Fix trace for colo_ctl_get() to trace command and value both
> v10:
> - Rename enum COLOCmd to COLOCommand (Eric's suggestion).
> - Remove unused 'ram-steal'
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> ---
>  migration/colo.c | 183 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  qapi-schema.json |  25 ++++++++
>  trace-events     |   2 +
>  3 files changed, 208 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index 0ab9618..0ce2a6e 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -10,10 +10,12 @@
>   * later.  See the COPYING file in the top-level directory.
>   */
>  
> +#include <unistd.h>
>  #include "sysemu/sysemu.h"
>  #include "migration/colo.h"
>  #include "trace.h"
>  #include "qemu/error-report.h"
> +#include "qemu/sockets.h"
>  
>  bool colo_supported(void)
>  {
> @@ -34,6 +36,100 @@ bool migration_incoming_in_colo_state(void)
>      return mis && (mis->state == MIGRATION_STATUS_COLO);
>  }
>  
> +static int colo_put_cmd(QEMUFile *f, uint32_t cmd)
> +{
> +    int ret;
> +
> +    if (cmd >= COLO_COMMAND_MAX) {
> +        error_report("%s: Invalid cmd", __func__);
> +        return -EINVAL;
> +    }
> +    qemu_put_be32(f, cmd);
> +    qemu_fflush(f);
> +
> +    ret = qemu_file_get_error(f);
> +    trace_colo_put_cmd(COLOCommand_lookup[cmd]);
> +
> +    return ret;
> +}
> +
> +static int colo_get_cmd(QEMUFile *f, uint32_t *cmd)
> +{
> +    int ret;
> +
> +    *cmd = qemu_get_be32(f);
> +    ret = qemu_file_get_error(f);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    if (*cmd >= COLO_COMMAND_MAX) {
> +        error_report("%s: Invalid cmd", __func__);
> +        return -EINVAL;
> +    }
> +    trace_colo_get_cmd(COLOCommand_lookup[*cmd]);
> +    return 0;
> +}
> +
> +static int colo_get_check_cmd(QEMUFile *f, uint32_t expect_cmd)
> +{
> +    int ret;
> +    uint32_t cmd;
> +
> +    ret = colo_get_cmd(f, &cmd);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    if (cmd != expect_cmd) {
> +        error_report("Unexpect colo command, expect:%d, but got cmd:%d",
> +                     expect_cmd, cmd);

Those still need to be PRIu32

But other than that,

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
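
For the format-string point above, a minimal sketch of what the fixed message could look like. format_unexpected_cmd() is a hypothetical helper that writes into a buffer instead of calling error_report(), so that it stands alone.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* uint32_t values are printed with PRIu32 rather than %d. */
static int format_unexpected_cmd(char *buf, size_t len,
                                 uint32_t expect_cmd, uint32_t cmd)
{
    return snprintf(buf, len,
                    "Unexpected colo command, expect:%" PRIu32
                    ", but got cmd:%" PRIu32,
                    expect_cmd, cmd);
}
```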

> +        return -EINVAL;
> +    }
> +
> +    return 0;
> +}
> +
> +static int colo_do_checkpoint_transaction(MigrationState *s)
> +{
> +    int ret;
> +
> +    ret = colo_put_cmd(s->to_dst_file, COLO_COMMAND_CHECKPOINT_REQUEST);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
> +                             COLO_COMMAND_CHECKPOINT_REPLY);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    /* TODO: suspend and save vm state to colo buffer */
> +
> +    ret = colo_put_cmd(s->to_dst_file, COLO_COMMAND_VMSTATE_SEND);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    /* TODO: send vmstate to Secondary */
> +
> +    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
> +                             COLO_COMMAND_VMSTATE_RECEIVED);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
> +                             COLO_COMMAND_VMSTATE_LOADED);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    /* TODO: resume Primary */
> +
> +out:
> +    return ret;
> +}
> +
>  static void colo_process_checkpoint(MigrationState *s)
>  {
>      int ret = 0;
> @@ -45,12 +141,28 @@ static void colo_process_checkpoint(MigrationState *s)
>          goto out;
>      }
>  
> +    /*
> +     * Wait for Secondary finish loading vm states and enter COLO
> +     * restore.
> +     */
> +    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
> +                             COLO_COMMAND_CHECKPOINT_READY);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
>      qemu_mutex_lock_iothread();
>      vm_start();
>      qemu_mutex_unlock_iothread();
>      trace_colo_vm_state_change("stop", "run");
>  
> -    /*TODO: COLO checkpoint savevm loop*/
> +    while (s->state == MIGRATION_STATUS_COLO) {
> +        /* start a colo checkpoint */
> +        ret = colo_do_checkpoint_transaction(s);
> +        if (ret < 0) {
> +            goto out;
> +        }
> +    }
>  
>  out:
>      if (ret < 0) {
> @@ -73,6 +185,31 @@ void migrate_start_colo_process(MigrationState *s)
>      qemu_mutex_lock_iothread();
>  }
>  
> +/*
> + * return:
> + * 0: start a checkpoint
> + * -1: some error happened, exit colo restore
> + */
> +static int colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request)
> +{
> +    int ret;
> +    uint32_t cmd;
> +
> +    ret = colo_get_cmd(f, &cmd);
> +    if (ret < 0) {
> +        /* do failover ? */
> +        return ret;
> +    }
> +
> +    switch (cmd) {
> +    case COLO_COMMAND_CHECKPOINT_REQUEST:
> +        *checkpoint_request = 1;
> +        return 0;
> +    default:
> +        return -EINVAL;
> +    }
> +}
> +
>  void *colo_process_incoming_thread(void *opaque)
>  {
>      MigrationIncomingState *mis = opaque;
> @@ -93,7 +230,49 @@ void *colo_process_incoming_thread(void *opaque)
>      */
>      qemu_set_block(qemu_get_fd(mis->from_src_file));
>  
> -    /* TODO: COLO checkpoint restore loop */
> +
> +    ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_CHECKPOINT_READY);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    while (mis->state == MIGRATION_STATUS_COLO) {
> +        int request = 0;
> +        int ret = colo_wait_handle_cmd(mis->from_src_file, &request);
> +
> +        if (ret < 0) {
> +            break;
> +        } else {
> +            if (!request) {
> +                continue;
> +            }
> +        }
> +        /* FIXME: This is unnecessary for periodic checkpoint mode */
> +        ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_CHECKPOINT_REPLY);
> +        if (ret < 0) {
> +            goto out;
> +        }
> +
> +        ret = colo_get_check_cmd(mis->from_src_file,
> +                                 COLO_COMMAND_VMSTATE_SEND);
> +        if (ret < 0) {
> +            goto out;
> +        }
> +
> +        /* TODO: read migration data into colo buffer */
> +
> +        ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_VMSTATE_RECEIVED);
> +        if (ret < 0) {
> +            goto out;
> +        }
> +
> +        /* TODO: load vm state */
> +
> +        ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_VMSTATE_LOADED);
> +        if (ret < 0) {
> +            goto out;
> +        }
> +    }
>  
>  out:
>      if (ret < 0) {
> diff --git a/qapi-schema.json b/qapi-schema.json
> index c9ff34e..85f7800 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -720,6 +720,31 @@
>  { 'command': 'migrate-start-postcopy' }
>  
>  ##
> +# @COLOCommand
> +#
> +# The commands for COLO fault tolerance
> +#
> +# @checkpoint-ready: SVM is ready for checkpointing
> +#
> +# @checkpoint-request: PVM tells SVM to prepare for new checkpointing
> +#
> +# @checkpoint-reply: SVM gets PVM's checkpoint request
> +#
> +# @vmstate-send: VM's state will be sent by PVM.
> +#
> +# @vmstate-size: The total size of VMstate.
> +#
> +# @vmstate-received: VM's state has been received by SVM.
> +#
> +# @vmstate-loaded: VM's state has been loaded by SVM.
> +#
> +# Since: 2.6
> +##
> +{ 'enum': 'COLOCommand',
> +  'data': [ 'checkpoint-ready', 'checkpoint-request', 'checkpoint-reply',
> +            'vmstate-send', 'vmstate-size','vmstate-received',
> +            'vmstate-loaded' ] }
> +
>  # @MouseInfo:
>  #
>  # Information about a mouse device.
> diff --git a/trace-events b/trace-events
> index 5565e79..39fdd8d 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -1579,6 +1579,8 @@ postcopy_ram_incoming_cleanup_join(void) ""
>  
>  # migration/colo.c
>  colo_vm_state_change(const char *old, const char *new) "Change '%s' => '%s'"
> +colo_put_cmd(const char *msg) "Send '%s' cmd"
> +colo_get_cmd(const char *msg) "Receive '%s' cmd"
>  
>  # kvm-all.c
>  kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH COLO-Frame v12 14/38] ram: Split host_from_stream_offset() into two helper functions
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 14/38] ram: Split host_from_stream_offset() into two helper functions zhanghailiang
@ 2015-12-18 15:18   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 94+ messages in thread
From: Dr. David Alan Gilbert @ 2015-12-18 15:18 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> Split host_from_stream_offset() into two parts:
> one gets the RAM block, whose idstr may be read from the migration
> stream; the other gets the hva (host) address from the block and the offset.
> In addition, the validity check is done in a new helper, offset_in_ramblock().
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> ---
> v12:
> - Remove the offset parameter for ram_block_from_stream() and
>   check the validity of the related value in a new helper. (Dave's suggestion)
> v11:
> - New patch
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  include/exec/ram_addr.h |  8 ++++++--
>  migration/ram.c         | 40 +++++++++++++++++++++++++---------------
>  2 files changed, 31 insertions(+), 17 deletions(-)
> 
> diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
> index 7115154..2b31279 100644
> --- a/include/exec/ram_addr.h
> +++ b/include/exec/ram_addr.h
> @@ -38,10 +38,14 @@ struct RAMBlock {
>      int fd;
>  };
>  
> +static inline bool offset_in_ramblock(RAMBlock *b, ram_addr_t offset)
> +{
> +    return (b && b->host && offset < b->used_length) ? true : false;
> +}
> +
>  static inline void *ramblock_ptr(RAMBlock *block, ram_addr_t offset)
>  {
> -    assert(offset < block->used_length);
> -    assert(block->host);
> +    assert(offset_in_ramblock(block, offset));
>      return (char *)block->host + offset;
>  }
>  
> diff --git a/migration/ram.c b/migration/ram.c
> index a709471..09fe6e6 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2138,28 +2138,24 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
>   * Returns a pointer from within the RCU-protected ram_list.
>   */
>  /*
> - * Read a RAMBlock ID from the stream f, find the host address of the
> - * start of that block and add on 'offset'
> + * Read a RAMBlock ID from the stream f.
>   *
>   * f: Stream to read from
> - * offset: Offset within the block
>   * flags: Page flags (mostly to see if it's a continuation of previous block)
>   */
> -static inline void *host_from_stream_offset(QEMUFile *f,
> -                                            ram_addr_t offset,
> -                                            int flags)
> +static inline RAMBlock *ram_block_from_stream(QEMUFile *f,
> +                                              int flags)
>  {
>      static RAMBlock *block = NULL;
>      char id[256];
>      uint8_t len;
>  
>      if (flags & RAM_SAVE_FLAG_CONTINUE) {
> -        if (!block || block->max_length <= offset) {
> +        if (!block) {
>              error_report("Ack, bad migration stream!");
>              return NULL;
>          }
> -
> -        return block->host + offset;
> +        return block;
>      }
>  
>      len = qemu_get_byte(f);
> @@ -2167,12 +2163,22 @@ static inline void *host_from_stream_offset(QEMUFile *f,
>      id[len] = 0;
>  
>      block = qemu_ram_block_by_name(id);
> -    if (block && block->max_length > offset) {
> -        return block->host + offset;
> +    if (!block) {
> +        error_report("Can't find block %s", id);
> +        return NULL;
>      }
>  
> -    error_report("Can't find block %s", id);
> -    return NULL;
> +    return block;
> +}
> +
> +static inline void *host_from_ram_block_offset(RAMBlock *block,
> +                                               ram_addr_t offset)
> +{
> +    if (!offset_in_ramblock(block, offset)) {
> +        return NULL;
> +    }
> +
> +    return block->host + offset;
>  }
>  
>  /*
> @@ -2319,7 +2325,9 @@ static int ram_load_postcopy(QEMUFile *f)
>          trace_ram_load_postcopy_loop((uint64_t)addr, flags);
>          place_needed = false;
>          if (flags & (RAM_SAVE_FLAG_COMPRESS | RAM_SAVE_FLAG_PAGE)) {
> -            host = host_from_stream_offset(f, addr, flags);
> +            RAMBlock *block = ram_block_from_stream(f, flags);
> +
> +            host = host_from_ram_block_offset(block, addr);
>              if (!host) {
>                  error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
>                  ret = -EINVAL;
> @@ -2450,7 +2458,9 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>  
>          if (flags & (RAM_SAVE_FLAG_COMPRESS | RAM_SAVE_FLAG_PAGE |
>                       RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
> -            host = host_from_stream_offset(f, addr, flags);
> +            RAMBlock *block = ram_block_from_stream(f, flags);
> +
> +            host = host_from_ram_block_offset(block, addr);
>              if (!host) {
>                  error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
>                  ret = -EINVAL;
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH COLO-Frame v12 21/38] COLO failover: Introduce a new command to trigger a failover
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 21/38] COLO failover: Introduce a new command to trigger a failover zhanghailiang
@ 2015-12-18 15:27   ` Dr. David Alan Gilbert
  2015-12-19  9:38   ` Markus Armbruster
  1 sibling, 0 replies; 94+ messages in thread
From: Dr. David Alan Gilbert @ 2015-12-18 15:27 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, Markus Armbruster, yunhong.jiang,
	eddie.dong, peter.huangpeng, qemu-devel, arei.gonglei, stefanha,
	amit.shah, Luiz Capitulino, hongyang.yang

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> We leave it to users to choose whatever heartbeat solution they want. If the
> heartbeat is lost, or they detect other errors, they can use the experimental
> command 'x_colo_lost_heartbeat' to tell COLO to fail over, and COLO will act
> accordingly.
> 
> For example, if the command is sent to the PVM, the Primary side will
> exit COLO mode and take over operation. If sent to the Secondary, the
> secondary will run failover work, then take over server operation to
> become the new Primary.
> 
> Cc: Luiz Capitulino <lcapitulino@redhat.com>
> Cc: Eric Blake <eblake@redhat.com>
> Cc: Markus Armbruster <armbru@redhat.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
> v11:
> - Add more comments for x-colo-lost-heartbeat command (Eric's suggestion)
> - Return 'enum' instead of 'int' for get_colo_mode() (Eric's suggestion)
> v10:
> - Rename command colo_lost_hearbeat to experimental 'x_colo_lost_heartbeat'
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> ---
>  hmp-commands.hx              | 15 +++++++++++++++
>  hmp.c                        |  8 ++++++++
>  hmp.h                        |  1 +
>  include/migration/colo.h     |  3 +++
>  include/migration/failover.h | 20 ++++++++++++++++++++
>  migration/Makefile.objs      |  2 +-
>  migration/colo-comm.c        | 11 +++++++++++
>  migration/colo-failover.c    | 41 +++++++++++++++++++++++++++++++++++++++++
>  migration/colo.c             |  1 +
>  qapi-schema.json             | 29 +++++++++++++++++++++++++++++
>  qmp-commands.hx              | 19 +++++++++++++++++++
>  stubs/migration-colo.c       |  8 ++++++++
>  12 files changed, 157 insertions(+), 1 deletion(-)
>  create mode 100644 include/migration/failover.h
>  create mode 100644 migration/colo-failover.c
> 
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index bb52e4d..a381b0b 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -1039,6 +1039,21 @@ migration (or once already in postcopy).
>  ETEXI
>  
>      {
> +        .name       = "x_colo_lost_heartbeat",
> +        .args_type  = "",
> +        .params     = "",
> +        .help       = "Tell COLO that heartbeat is lost,\n\t\t\t"
> +                      "a failover or takeover is needed.",
> +        .mhandler.cmd = hmp_x_colo_lost_heartbeat,
> +    },
> +
> +STEXI
> +@item x_colo_lost_heartbeat
> +@findex x_colo_lost_heartbeat
> +Tell COLO that heartbeat is lost, a failover or takeover is needed.
> +ETEXI
> +
> +    {
>          .name       = "client_migrate_info",
>          .args_type  = "protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?",
>          .params     = "protocol hostname port tls-port cert-subject",
> diff --git a/hmp.c b/hmp.c
> index ee87d38..dc6dc30 100644
> --- a/hmp.c
> +++ b/hmp.c
> @@ -1310,6 +1310,14 @@ void hmp_migrate_start_postcopy(Monitor *mon, const QDict *qdict)
>      hmp_handle_error(mon, &err);
>  }
>  
> +void hmp_x_colo_lost_heartbeat(Monitor *mon, const QDict *qdict)
> +{
> +    Error *err = NULL;
> +
> +    qmp_x_colo_lost_heartbeat(&err);
> +    hmp_handle_error(mon, &err);
> +}
> +
>  void hmp_set_password(Monitor *mon, const QDict *qdict)
>  {
>      const char *protocol  = qdict_get_str(qdict, "protocol");
> diff --git a/hmp.h b/hmp.h
> index a8c5b5a..864a300 100644
> --- a/hmp.h
> +++ b/hmp.h
> @@ -70,6 +70,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict);
>  void hmp_migrate_set_cache_size(Monitor *mon, const QDict *qdict);
>  void hmp_client_migrate_info(Monitor *mon, const QDict *qdict);
>  void hmp_migrate_start_postcopy(Monitor *mon, const QDict *qdict);
> +void hmp_x_colo_lost_heartbeat(Monitor *mon, const QDict *qdict);
>  void hmp_set_password(Monitor *mon, const QDict *qdict);
>  void hmp_expire_password(Monitor *mon, const QDict *qdict);
>  void hmp_eject(Monitor *mon, const QDict *qdict);
> diff --git a/include/migration/colo.h b/include/migration/colo.h
> index 2676c4a..ba27719 100644
> --- a/include/migration/colo.h
> +++ b/include/migration/colo.h
> @@ -17,6 +17,7 @@
>  #include "migration/migration.h"
>  #include "qemu/coroutine_int.h"
>  #include "qemu/thread.h"
> +#include "qemu/main-loop.h"
>  
>  bool colo_supported(void);
>  void colo_info_mig_init(void);
> @@ -29,4 +30,6 @@ bool migration_incoming_enable_colo(void);
>  void migration_incoming_exit_colo(void);
>  void *colo_process_incoming_thread(void *opaque);
>  bool migration_incoming_in_colo_state(void);
> +
> +COLOMode get_colo_mode(void);
>  #endif
> diff --git a/include/migration/failover.h b/include/migration/failover.h
> new file mode 100644
> index 0000000..1785b52
> --- /dev/null
> +++ b/include/migration/failover.h
> @@ -0,0 +1,20 @@
> +/*
> + *  COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
> + *  (a.k.a. Fault Tolerance or Continuous Replication)
> + *
> + * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
> + * Copyright (c) 2015 FUJITSU LIMITED
> + * Copyright (c) 2015 Intel Corporation
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or
> + * later.  See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef QEMU_FAILOVER_H
> +#define QEMU_FAILOVER_H
> +
> +#include "qemu-common.h"
> +
> +void failover_request_active(Error **errp);
> +
> +#endif
> diff --git a/migration/Makefile.objs b/migration/Makefile.objs
> index 81b5713..920d1e7 100644
> --- a/migration/Makefile.objs
> +++ b/migration/Makefile.objs
> @@ -1,6 +1,6 @@
>  common-obj-y += migration.o tcp.o
> -common-obj-$(CONFIG_COLO) += colo.o
>  common-obj-y += colo-comm.o
> +common-obj-$(CONFIG_COLO) += colo.o colo-failover.o
>  common-obj-y += vmstate.o
>  common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o qemu-file-stdio.o
>  common-obj-y += xbzrle.o postcopy-ram.o
> diff --git a/migration/colo-comm.c b/migration/colo-comm.c
> index 30df3d3..58a6488 100644
> --- a/migration/colo-comm.c
> +++ b/migration/colo-comm.c
> @@ -20,6 +20,17 @@ typedef struct {
>  
>  static COLOInfo colo_info;
>  
> +COLOMode get_colo_mode(void)
> +{
> +    if (migration_in_colo_state()) {
> +        return COLO_MODE_PRIMARY;
> +    } else if (migration_incoming_in_colo_state()) {
> +        return COLO_MODE_SECONDARY;
> +    } else {
> +        return COLO_MODE_UNKNOWN;
> +    }
> +}
> +
>  static void colo_info_pre_save(void *opaque)
>  {
>      COLOInfo *s = opaque;
> diff --git a/migration/colo-failover.c b/migration/colo-failover.c
> new file mode 100644
> index 0000000..e3897c6
> --- /dev/null
> +++ b/migration/colo-failover.c
> @@ -0,0 +1,41 @@
> +/*
> + * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
> + * (a.k.a. Fault Tolerance or Continuous Replication)
> + *
> + * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
> + * Copyright (c) 2015 FUJITSU LIMITED
> + * Copyright (c) 2015 Intel Corporation
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or
> + * later.  See the COPYING file in the top-level directory.
> + */
> +
> +#include "migration/colo.h"
> +#include "migration/failover.h"
> +#include "qmp-commands.h"
> +#include "qapi/qmp/qerror.h"
> +
> +static QEMUBH *failover_bh;
> +
> +static void colo_failover_bh(void *opaque)
> +{
> +    qemu_bh_delete(failover_bh);
> +    failover_bh = NULL;
> +    /*TODO: Do failover work */
> +}
> +
> +void failover_request_active(Error **errp)
> +{
> +    failover_bh = qemu_bh_new(colo_failover_bh, NULL);
> +    qemu_bh_schedule(failover_bh);
> +}
> +
> +void qmp_x_colo_lost_heartbeat(Error **errp)
> +{
> +    if (get_colo_mode() == COLO_MODE_UNKNOWN) {
> +        error_setg(errp, QERR_FEATURE_DISABLED, "colo");
> +        return;
> +    }
> +
> +    failover_request_active(errp);
> +}
> diff --git a/migration/colo.c b/migration/colo.c
> index ca5df44..7098497 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -17,6 +17,7 @@
>  #include "trace.h"
>  #include "qemu/error-report.h"
>  #include "qemu/sockets.h"
> +#include "migration/failover.h"
>  
>  /* colo buffer */
>  #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
> diff --git a/qapi-schema.json b/qapi-schema.json
> index a5699a7..feb7d53 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -761,6 +761,35 @@
>              'vmstate-send', 'vmstate-size','vmstate-received',
>              'vmstate-loaded' ] }
>  
> +##
> +# @COLOMode
> +#
> +# The colo mode
> +#
> +# @unknown: unknown mode
> +#
> +# @primary: master side
> +#
> +# @secondary: slave side
> +#
> +# Since: 2.6
> +##
> +{ 'enum': 'COLOMode',
> +  'data': [ 'unknown', 'primary', 'secondary'] }
> +
> +##
> +# @x-colo-lost-heartbeat
> +#
> +# Tell qemu that heartbeat is lost, request it to do takeover procedures.
> +# If this command is sent to the PVM, the Primary side will exit COLO mode.
> +# If sent to the Secondary, the Secondary side will run failover work,
> +# then takes over server operation to become the service VM.
> +#
> +# Since: 2.6
> +##
> +{ 'command': 'x-colo-lost-heartbeat' }
> +
> +##
>  # @MouseInfo:
>  #
>  # Information about a mouse device.
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index 89756c9..76ad208 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -805,6 +805,25 @@ Example:
>  EQMP
>  
>      {
> +        .name       = "x-colo-lost-heartbeat",
> +        .args_type  = "",
> +        .mhandler.cmd_new = qmp_marshal_x_colo_lost_heartbeat,
> +    },
> +
> +SQMP
> +x-colo-lost-heartbeat
> +--------------------
> +
> +Tell COLO that heartbeat is lost, a failover or takeover is needed.
> +
> +Example:
> +
> +-> { "execute": "x-colo-lost-heartbeat" }
> +<- { "return": {} }
> +
> +EQMP
> +
> +    {
>          .name       = "client_migrate_info",
>          .args_type  = "protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?",
>          .params     = "protocol hostname port tls-port cert-subject",
> diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
> index c12516e..5028f63 100644
> --- a/stubs/migration-colo.c
> +++ b/stubs/migration-colo.c
> @@ -11,6 +11,7 @@
>   */
>  
>  #include "migration/colo.h"
> +#include "qmp-commands.h"
>  
>  bool colo_supported(void)
>  {
> @@ -35,3 +36,10 @@ void *colo_process_incoming_thread(void *opaque)
>  {
>      return NULL;
>  }
> +
> +void qmp_x_colo_lost_heartbeat(Error **errp)
> +{
> +    error_setg(errp, "COLO is not supported, please rerun configure"
> +                     " with --enable-colo option in order to support"
> +                     " COLO feature");
> +}
> -- 
> 1.8.3.1
> 
> 
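
For readers following the thread, the control flow of the new command can be
sketched outside QEMU as below. This is a minimal standalone illustration, not
the patch's code: colo_mode_from_flags() and lost_heartbeat_sketch() are
hypothetical names, the two boolean flags stand in for
migration_in_colo_state() and migration_incoming_in_colo_state(), and the
"scheduled" flag stands in for scheduling the failover bottom half.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the QAPI enum from the patch: 'unknown', 'primary', 'secondary'. */
typedef enum {
    COLO_MODE_UNKNOWN = 0,
    COLO_MODE_PRIMARY,
    COLO_MODE_SECONDARY,
} COLOMode;

/*
 * Hypothetical stand-in for get_colo_mode(): the outgoing migration state
 * is checked first, then the incoming one, as in the patch.
 */
COLOMode colo_mode_from_flags(bool outgoing_in_colo, bool incoming_in_colo)
{
    if (outgoing_in_colo) {
        return COLO_MODE_PRIMARY;
    }
    if (incoming_in_colo) {
        return COLO_MODE_SECONDARY;
    }
    return COLO_MODE_UNKNOWN;
}

/*
 * Hypothetical stand-in for qmp_x_colo_lost_heartbeat(): reject the request
 * when the VM is not in COLO mode, otherwise record that failover work was
 * deferred (the patch schedules a bottom half at this point).
 * Returns 0 on success, -1 on error with *errmsg set.
 */
int lost_heartbeat_sketch(COLOMode mode, bool *scheduled, const char **errmsg)
{
    assert(scheduled != NULL && errmsg != NULL);

    *scheduled = false;
    *errmsg = NULL;
    if (mode == COLO_MODE_UNKNOWN) {
        *errmsg = "COLO is not active";   /* illustrative message only */
        return -1;
    }
    *scheduled = true;
    return 0;
}
```

The command handler itself only validates the mode and defers the real work,
which keeps the monitor thread from blocking on failover.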
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH COLO-Frame v12 23/38] COLO: Implement failover work for Primary VM
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 23/38] COLO: Implement failover work for Primary VM zhanghailiang
@ 2015-12-18 15:35   ` Dr. David Alan Gilbert
  2015-12-28  7:39     ` Hailiang Zhang
  0 siblings, 1 reply; 94+ messages in thread
From: Dr. David Alan Gilbert @ 2015-12-18 15:35 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> For the PVM, if there is a failover request from the user, the COLO thread
> exits its loop while the failover BH does the cleanup work and resumes the VM.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> ---
> v12:
> - Fix error report and remove unnecessary check in primary_vm_do_failover()
>  (Dave's suggestion)
> v11:
> - Don't call migration_end() in primary_vm_do_failover(),
>  The cleanup work will be done in migration_thread().
> - Remove vm_start() in primary_vm_do_failover(), which is also done
>   in migration_thread()
> v10:
> - Call migration_end() in primary_vm_do_failover()
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> ---
>  include/migration/colo.h     |  3 +++
>  include/migration/failover.h |  1 +
>  migration/colo-failover.c    |  7 +++++-
>  migration/colo.c             | 54 ++++++++++++++++++++++++++++++++++++++++++--
>  4 files changed, 62 insertions(+), 3 deletions(-)
> 
> diff --git a/include/migration/colo.h b/include/migration/colo.h
> index ba27719..0b02e95 100644
> --- a/include/migration/colo.h
> +++ b/include/migration/colo.h
> @@ -32,4 +32,7 @@ void *colo_process_incoming_thread(void *opaque);
>  bool migration_incoming_in_colo_state(void);
>  
>  COLOMode get_colo_mode(void);
> +
> +/* failover */
> +void colo_do_failover(MigrationState *s);
>  #endif
> diff --git a/include/migration/failover.h b/include/migration/failover.h
> index 882c625..fba3931 100644
> --- a/include/migration/failover.h
> +++ b/include/migration/failover.h
> @@ -26,5 +26,6 @@ void failover_init_state(void);
>  int failover_set_state(int old_state, int new_state);
>  int failover_get_state(void);
>  void failover_request_active(Error **errp);
> +bool failover_request_is_active(void);
>  
>  #endif
> diff --git a/migration/colo-failover.c b/migration/colo-failover.c
> index 1b1be24..0c525da 100644
> --- a/migration/colo-failover.c
> +++ b/migration/colo-failover.c
> @@ -32,7 +32,7 @@ static void colo_failover_bh(void *opaque)
>          error_report("Unkown error for failover, old_state=%d", old_state);
>          return;
>      }
> -    /*TODO: Do failover work */
> +    colo_do_failover(NULL);
>  }
>  
>  void failover_request_active(Error **errp)
> @@ -67,6 +67,11 @@ int failover_get_state(void)
>      return atomic_read(&failover_state);
>  }
>  
> +bool failover_request_is_active(void)
> +{
> +    return ((failover_get_state() != FAILOVER_STATUS_NONE));

You can remove the two sets of brackets.
But other than that:

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>


> +}
> +
>  void qmp_x_colo_lost_heartbeat(Error **errp)
>  {
>      if (get_colo_mode() == COLO_MODE_UNKNOWN) {
> diff --git a/migration/colo.c b/migration/colo.c
> index 176384e..977c8d8 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -41,6 +41,40 @@ bool migration_incoming_in_colo_state(void)
>      return mis && (mis->state == MIGRATION_STATUS_COLO);
>  }
>  
> +static bool colo_runstate_is_stopped(void)
> +{
> +    return runstate_check(RUN_STATE_COLO) || !runstate_is_running();
> +}
> +
> +static void primary_vm_do_failover(void)
> +{
> +    MigrationState *s = migrate_get_current();
> +    int old_state;
> +
> +    migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
> +                      MIGRATION_STATUS_COMPLETED);
> +
> +    old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
> +                                   FAILOVER_STATUS_COMPLETED);
> +    if (old_state != FAILOVER_STATUS_HANDLING) {
> +        error_report("Incorrect state (%d) while doing failover for Primary VM",
> +                     old_state);
> +        return;
> +    }
> +}
> +
> +void colo_do_failover(MigrationState *s)
> +{
> +    /* Make sure vm stopped while failover */
> +    if (!colo_runstate_is_stopped()) {
> +        vm_stop_force_state(RUN_STATE_COLO);
> +    }
> +
> +    if (get_colo_mode() == COLO_MODE_PRIMARY) {
> +        primary_vm_do_failover();
> +    }
> +}
> +
>  static int colo_put_cmd(QEMUFile *f, uint32_t cmd)
>  {
>      int ret;
> @@ -150,9 +184,22 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
>      }
>  
>      qemu_mutex_lock_iothread();
> +    if (failover_request_is_active()) {
> +        qemu_mutex_unlock_iothread();
> +        ret = -1;
> +        goto out;
> +    }
>      vm_stop_force_state(RUN_STATE_COLO);
>      qemu_mutex_unlock_iothread();
>      trace_colo_vm_state_change("run", "stop");
> +    /*
> +     * failover request bh could be called after
> +     * vm_stop_force_state so we check failover_request_is_active() again.
> +     */
> +    if (failover_request_is_active()) {
> +        ret = -1;
> +        goto out;
> +    }
>  
>      /* Disable block migration */
>      s->params.blk = 0;
> @@ -248,6 +295,11 @@ static void colo_process_checkpoint(MigrationState *s)
>      trace_colo_vm_state_change("stop", "run");
>  
>      while (s->state == MIGRATION_STATUS_COLO) {
> +        if (failover_request_is_active()) {
> +            error_report("failover request");
> +            goto out;
> +        }
> +
>          current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>          if (current_time - checkpoint_time <
>              s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY]) {
> @@ -269,8 +321,6 @@ out:
>      if (ret < 0) {
>          error_report("%s: %s", __func__, strerror(-ret));
>      }
> -    migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
> -                      MIGRATION_STATUS_COMPLETED);
>  
>      qsb_free(buffer);
>      buffer = NULL;
> -- 
> 1.8.3.1
> 
> 
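
To make the bracket comment on failover_request_is_active() concrete, here is
a standalone sketch of the failover state helpers using C11 atomics instead of
QEMU's atomic_read()/atomic_cmpxchg(). It is an assumption-laden illustration:
FAILOVER_STATUS_REQUEST and the numeric enum values are guesses for this
example (only NONE, HANDLING and COMPLETED appear in the quoted hunks), and
the bodies of failover_set_state()/failover_get_state() are not taken from the
series.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed ordering; FAILOVER_STATUS_REQUEST is a guess for this sketch. */
typedef enum {
    FAILOVER_STATUS_NONE = 0,
    FAILOVER_STATUS_REQUEST,
    FAILOVER_STATUS_HANDLING,
    FAILOVER_STATUS_COMPLETED,
} FailoverStatus;

static atomic_int failover_state = FAILOVER_STATUS_NONE;

/*
 * Move from old_state to new_state only if the current state is old_state.
 * Returns the state observed before the attempt, so the caller can compare
 * it with old_state to see whether the transition happened.
 */
int failover_set_state(int old_state, int new_state)
{
    int expected = old_state;

    assert(new_state >= FAILOVER_STATUS_NONE &&
           new_state <= FAILOVER_STATUS_COMPLETED);

    /* On failure, 'expected' is overwritten with the current value. */
    atomic_compare_exchange_strong(&failover_state, &expected, new_state);
    return expected;
}

int failover_get_state(void)
{
    return atomic_load(&failover_state);
}

/* Same test as in the patch, without the redundant brackets. */
bool failover_request_is_active(void)
{
    return failover_get_state() != FAILOVER_STATUS_NONE;
}
```

Returning the previously observed state is what lets primary_vm_do_failover()
report "Incorrect state (%d)" when the HANDLING to COMPLETED transition does
not apply.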
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
  2015-12-18  1:10       ` Hailiang Zhang
@ 2015-12-18 15:47         ` Dr. David Alan Gilbert
  2015-12-23  1:24           ` Hailiang Zhang
  0 siblings, 1 reply; 94+ messages in thread
From: Dr. David Alan Gilbert @ 2015-12-18 15:47 UTC (permalink / raw)
  To: Hailiang Zhang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

* Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
> On 2015/12/17 18:52, Dr. David Alan Gilbert wrote:
> >* Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
> >>On 2015/12/15 20:14, Dr. David Alan Gilbert wrote:
> >>>* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> >>>>This is the 12th version of COLO.
> >>>>
> >>>>As usual, this version of COLO is only support periodic checkpoint,
> >>>>just like MicroCheckpointing and Remus does.
> >>>>
> >>>>Here is only COLO frame part, you can get the whole codes from github:
> >>>>https://github.com/coloft/qemu/commits/colo-v2.3-periodic-mode
> >>>
> >>>Hi,
> >>>   Have you tried wiring in Zhang Chen's new userland colo proxy yet?
> >>>I'd like to start trying it out.
> >>>
> >>
> >>Not yet, actually, for frame part, we can re-use most of the previous codes that based on
> >>kernel proxy. And, yes, please, you are welcome to join us. ;)
> >
> >Yes, that's certainly something I'll look at immediately at the start of the new year
> >(I'm out for 2 weeks from Friday).
> >
> 
> Great~
> 
> >I've just tested this series on my machines, and it works well.
> 
> Thank you for the testing.
> 
> >Two things:
> >   1) I just posted a patch to add an HMP equivalent to x-blockdev-change
> >   2) If you run with an older machine type (e.g. pc-i440fx-2.3) then if I failover to the
> >secondary then I hit a 'invalid runstate transition: 'inmigrate' -> 'prelaunch'';
> >I guess this is something to do with global_state.
> >
> 
> Yes, we have fixed one problem related to global_state. I didn't test COLO with
> older machine type. I will look into it, thanks for reporting it.

I think I've Reviewed-by or sent comments on all of the patches in this set with the exception of
  25 - that is QMP so I'll leave that for Eric
  33-37 that are Network related which I don't know much about, so I'll leave those for Jason
  38 - that's mostly block, so perhaps Stefan is best to look at that.

Getting closer!

Dave

> 
> Hailiang
> 
> >Dave
> >
> >>>Dave
> >>>
> >>>>Test procedure:
> >>>>1. Startup qemu
> >>>>Primary side:
> >>>>#x86_64-softmmu/qemu-system-x86_64 -enable-kvm -boot c -m 2048 -smp 2 -qmp stdio -vnc :7 -name primary -cpu qemu64,+kvmclock -device piix3-usb-uhci -device usb-tablet -netdev tap,id=hn0,vhost=off -device virtio-net-pci,id=net-pci0,netdev=hn0 -drive if=virtio,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0.file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,children.0.driver=raw
> >>>>Secondary side:
> >>>>#x86_64-softmmu/qemu-system-x86_64 -boot c -m 2048 -smp 2 -qmp stdio -vnc :7 -name secondary -enable-kvm -cpu qemu64,+kvmclock -device piix3-usb-uhci -device usb-tablet -netdev tap,id=hn0,vhost=off -device virtio-net-pci,id=net-pci0,netdev=hn0 -drive if=none,id=colo-disk0,file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,driver=raw,node-name=node0 -drive if=virtio,id=active-disk0,throttling.bps-total=70000000,driver=replication,mode=secondary,file.driver=qcow2,file.file.filename=/mnt/ramfs/active_disk.img,file.backing.driver=qcow2,file.backing.file.filename=/mnt/ramfs/hidden_disk.img,file.backing.backing=colo-disk0 -incoming tcp:0:8888
> >>>>2. On Secondary VM's QEMU monitor, issue command
> >>>>{'execute':'qmp_capabilities'}
> >>>>{'execute': 'nbd-server-start', 'arguments': {'addr': {'type': 'inet', 'data': {'host': '192.168.2.88', 'port': '8889'} } } }
> >>>>{'execute': 'nbd-server-add', 'arguments': {'device': 'colo-disk0', 'writable': true } }
> >>>>{'execute': 'trace-event-set-state', 'arguments': {'name': 'colo*', 'enable': true} }
> >>>>
> >>>>3. On Primary VM's QEMU monitor, issue command:
> >>>>{'execute':'qmp_capabilities'}
> >>>>{'execute': 'human-monitor-command', 'arguments': {'command-line': 'drive_add buddy driver=replication,mode=primary,file.driver=nbd,file.host=9.61.1.7,file.port=8889,file.export=colo-disk0,node-name=node0,if=none'}}
> >>>>{'execute':'x-blockdev-change', 'arguments':{'parent': 'colo-disk0', 'node': 'node0' } }
> >>>>{'execute': 'migrate-set-capabilities', 'arguments': {'capabilities': [ {'capability': 'x-colo', 'state': true } ] } }
> >>>>{'execute': 'migrate', 'arguments': {'uri': 'tcp:192.168.2.88:8888' } }
> >>>>
> >>>>4. After the above steps, you will see, whenever you make changes to PVM, SVM will be synced.
> >>>>You can by issue command '{ "execute": "migrate-set-parameters" , "arguments":{ "x-checkpoint-delay": 2000 } }'
> >>>>to change the checkpoint period time.
> >>>>
> >>>>5. Failover test
> >>>>You can kill Primary VM and run 'x_colo_lost_heartbeat' in Secondary VM's
> >>>>monitor at the same time, then SVM will failover and client will not feel this
> >>>>change.
> >>>>
> >>>>Before issuing '{ "execute": "x-colo-lost-heartbeat" }' command, we have to
> >>>>issue block related command to stop block replication.
> >>>>Primary:
> >>>>   Remove the nbd child from the quorum:
> >>>>   { 'execute': 'x-blockdev-change', 'arguments': {'parent': 'colo-disk0', 'child': 'children.1'}}
> >>>>   Note: there is no qmp command to remove the blockdev now
> >>>>
> >>>>Secondary:
> >>>>   The primary host is down, so we should do the following thing:
> >>>>   { 'execute': 'nbd-server-stop' }
> >>>>
> >>>>Please review, thanks.
> >>>>
> >>>>TODO:
> >>>>1. Implement packets compare module (proxy) in qemu (Doing)
> >>>>2. Checkpoint based on proxy in qemu
> >>>>3. The capability of continuous FT
> >>>>
> >>>>v12:
> >>>>  - Fix the bug that default buffer filter broken vhost-net.
> >>>>  - Add an flag in struct NetFilterState to help skipping default
> >>>>   filter for packets travelling through filter layer.
> >>>>  - Remove the default failover treatment which may cause split-brain.
> >>>>  - Rename checkpoint-delay to x-checkpoint-delay.
> >>>>  - Check if all netdev supports default filter before going into COLO.
> >>>>  - Reconstruct send/receive helper functions in patch 10.
> >>>>  - Address serveral other comments from Dave
> >>>>
> >>>>v11:
> >>>>  - Re-implement buffer/release packets based on filter-buffer according
> >>>>    to Jason Wang's suggestion. (patch 34, patch 36 ~ patch 38)
> >>>>  - Rebase master to re-use some stuff introduced by post-copy.
> >>>>  - Address several comments from Eric and Dave, the fixing record can
> >>>>    be found in each patch.
> >>>>
> >>>>v10:
> >>>>  - Rename 'colo_lost_heartbeat' command to experimental 'x_colo_lost_heartbeat'
> >>>>  - Rename migration capability 'colo' to 'x-colo' (Eric's suggestion)
> >>>>  - Simplify the process of primary side by dropping colo thread and reusing
> >>>>    migration thread. (Dave's suggestion)
> >>>>  - Add several netfilter related APIs to support buffer/release packets
> >>>>    for COLO (patch 32 ~ patch 36)
> >>>>
> >>>>zhanghailiang (38):
> >>>>   configure: Add parameter for configure to enable/disable COLO support
> >>>>   migration: Introduce capability 'x-colo' to migration
> >>>>   COLO: migrate colo related info to secondary node
> >>>>   migration: Export migrate_set_state()
> >>>>   migration: Add state records for migration incoming
> >>>>   migration: Integrate COLO checkpoint process into migration
> >>>>   migration: Integrate COLO checkpoint process into loadvm
> >>>>   migration: Rename the'file' member of MigrationState
> >>>>   COLO/migration: Create a new communication path from destination to
> >>>>     source
> >>>>   COLO: Implement colo checkpoint protocol
> >>>>   COLO: Add a new RunState RUN_STATE_COLO
> >>>>   QEMUSizedBuffer: Introduce two help functions for qsb
> >>>>   COLO: Save PVM state to secondary side when do checkpoint
> >>>>   ram: Split host_from_stream_offset() into two helper functions
> >>>>   COLO: Load PVM's dirty pages into SVM's RAM cache temporarily
> >>>>   ram/COLO: Record the dirty pages that SVM received
> >>>>   COLO: Load VMState into qsb before restore it
> >>>>   COLO: Flush PVM's cached RAM into SVM's memory
> >>>>   COLO: Add checkpoint-delay parameter for migrate-set-parameters
> >>>>   COLO: synchronize PVM's state to SVM periodically
> >>>>   COLO failover: Introduce a new command to trigger a failover
> >>>>   COLO failover: Introduce state to record failover process
> >>>>   COLO: Implement failover work for Primary VM
> >>>>   COLO: Implement failover work for Secondary VM
> >>>>   qmp event: Add event notification for COLO error
> >>>>   COLO failover: Shutdown related socket fd when do failover
> >>>>   COLO failover: Don't do failover during loading VM's state
> >>>>   COLO: Process shutdown command for VM in COLO state
> >>>>   COLO: Update the global runstate after going into colo state
> >>>>   savevm: Split load vm state function qemu_loadvm_state
> >>>>   COLO: Separate the process of saving/loading ram and device state
> >>>>   COLO: Split qemu_savevm_state_begin out of checkpoint process
> >>>>   net/filter-buffer: Add default filter-buffer for each netdev
> >>>>   filter-buffer: Accept zero interval
> >>>>   filter-buffer: Introduce a helper function to enable/disable default
> >>>>     filter
> >>>>   filter-buffer: Introduce a helper function to release packets
> >>>>   colo: Use default buffer-filter to buffer and release packets
> >>>>   COLO: Add block replication into colo process
> >>>>
> >>>>  configure                     |  11 +
> >>>>  docs/qmp-events.txt           |  17 +
> >>>>  hmp-commands.hx               |  15 +
> >>>>  hmp.c                         |  15 +
> >>>>  hmp.h                         |   1 +
> >>>>  include/exec/ram_addr.h       |   9 +-
> >>>>  include/migration/colo.h      |  38 +++
> >>>>  include/migration/failover.h  |  33 ++
> >>>>  include/migration/migration.h |  18 +-
> >>>>  include/migration/qemu-file.h |   3 +-
> >>>>  include/net/filter.h          |  12 +
> >>>>  include/net/net.h             |   5 +
> >>>>  include/sysemu/sysemu.h       |   9 +
> >>>>  migration/Makefile.objs       |   2 +
> >>>>  migration/colo-comm.c         |  71 ++++
> >>>>  migration/colo-failover.c     |  83 +++++
> >>>>  migration/colo.c              | 765 ++++++++++++++++++++++++++++++++++++++++++
> >>>>  migration/exec.c              |   4 +-
> >>>>  migration/fd.c                |   4 +-
> >>>>  migration/migration.c         | 216 ++++++++----
> >>>>  migration/postcopy-ram.c      |   6 +-
> >>>>  migration/qemu-file-buf.c     |  61 ++++
> >>>>  migration/ram.c               | 213 ++++++++++--
> >>>>  migration/rdma.c              |   2 +-
> >>>>  migration/savevm.c            | 295 ++++++++++++----
> >>>>  migration/tcp.c               |   4 +-
> >>>>  migration/unix.c              |   4 +-
> >>>>  net/filter-buffer.c           | 127 ++++++-
> >>>>  net/filter.c                  |   6 +-
> >>>>  net/net.c                     |  58 ++++
> >>>>  qapi-schema.json              | 106 +++++-
> >>>>  qapi/event.json               |  17 +
> >>>>  qmp-commands.hx               |  24 +-
> >>>>  stubs/Makefile.objs           |   1 +
> >>>>  stubs/migration-colo.c        |  45 +++
> >>>>  trace-events                  |  10 +
> >>>>  vl.c                          |  37 +-
> >>>>  37 files changed, 2152 insertions(+), 195 deletions(-)
> >>>>  create mode 100644 include/migration/colo.h
> >>>>  create mode 100644 include/migration/failover.h
> >>>>  create mode 100644 migration/colo-comm.c
> >>>>  create mode 100644 migration/colo-failover.c
> >>>>  create mode 100644 migration/colo.c
> >>>>  create mode 100644 stubs/migration-colo.c
> >>>>
> >>>>--
> >>>>1.8.3.1
> >>>>
> >>>>
> >>>--
> >>>Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >>>
> >>>.
> >>>
> >>
> >>
> >--
> >Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
> >.
> >
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH COLO-Frame v12 25/38] qmp event: Add event notification for COLO error
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 25/38] qmp event: Add event notification for COLO error zhanghailiang
@ 2015-12-18 16:03   ` Eric Blake
  2015-12-23  1:55     ` Hailiang Zhang
  2015-12-19 10:02   ` Markus Armbruster
  1 sibling, 1 reply; 94+ messages in thread
From: Eric Blake @ 2015-12-18 16:03 UTC (permalink / raw)
  To: zhanghailiang, qemu-devel
  Cc: lizhijian, quintela, Markus Armbruster, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, arei.gonglei, stefanha,
	amit.shah, Michael Roth, hongyang.yang

[-- Attachment #1: Type: text/plain, Size: 2563 bytes --]

On 12/15/2015 01:22 AM, zhanghailiang wrote:
> If some errors happen during VM's COLO FT stage, it's important to notify the users
> of this event. Together with 'colo_lost_heartbeat', users can intervene in COLO's
> failover work immediately.
> If users don't want to get involved in COLO's failover verdict,
> it is still necessary to notify users that we exited COLO mode.
> 
> Cc: Markus Armbruster <armbru@redhat.com>
> Cc: Michael Roth <mdroth@linux.vnet.ibm.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> ---
> v11:
> - Fix several typos found by Eric
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> ---

> +++ b/docs/qmp-events.txt
> @@ -184,6 +184,23 @@ Example:
>  Note: The "ready to complete" status is always reset by a BLOCK_JOB_ERROR
>  event.
>  
> +COLO_EXIT
> +---------
> +
> +Emitted when VM finishes COLO mode due to some errors happening or
> +at the request of users.
> +
> +Data:
> +
> + - "mode": COLO mode, primary or secondary side (json-string)
> + - "reason":  the exit reason, internal error or external request. (json-string)
> + - "error": error message (json-string, operation)

s/operation/optional/
May want to word it as:

- "error": error message for human consumption (json-string, optional)

to point out that machines shouldn't parse it.

> +++ b/migration/colo.c
> @@ -18,6 +18,7 @@
>  #include "qemu/error-report.h"
>  #include "qemu/sockets.h"
>  #include "migration/failover.h"
> +#include "qapi-event.h"
>  
>  /* colo buffer */
>  #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
> @@ -349,6 +350,11 @@ static void colo_process_checkpoint(MigrationState *s)
>  out:
>      if (ret < 0) {
>          error_report("%s: %s", __func__, strerror(-ret));

Unrelated: I mentioned in another thread that we may want to start
thinking about adding error_report_errno(); this would be another client.

> +++ b/qapi-schema.json
> @@ -778,6 +778,22 @@
>    'data': [ 'unknown', 'primary', 'secondary'] }
>  
>  ##
> +# @COLOExitReason
> +#
> +# The reason for a COLO exit
> +#
> +# @unknown: unknown reason
> +#

If we never return 'unknown', then it is not worth having it in the enum
(we can always add it later if we find a reason to have it; but adding
it now feels premature if the code base isn't using it).

Otherwise looks okay to me.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v12 10/38] COLO: Implement colo checkpoint protocol
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 10/38] COLO: Implement colo checkpoint protocol zhanghailiang
  2015-12-18 14:52   ` Dr. David Alan Gilbert
@ 2015-12-19  8:54   ` Markus Armbruster
  2015-12-22  7:00     ` Hailiang Zhang
  1 sibling, 1 reply; 94+ messages in thread
From: Markus Armbruster @ 2015-12-19  8:54 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, dgilbert,
	hongyang.yang

Jumping in at v12 for a bit of QAPI review (and whatever else caught my
eye nearby), please pardon my ignorance of COLO in general, and previous
review of this series in particular.

zhanghailiang <zhang.zhanghailiang@huawei.com> writes:

> We need communications protocol of user-defined to control the checkpoint
> process.
>
> The new checkpoint request is started by Primary VM, and the interactive process
> like below:
> Checkpoint synchronizing points,
>
>                        Primary                         Secondary
>                                                        initial work
> 'checkpoint-ready'     <------------------------------ @
>
> 'checkpoint-request'   @ ----------------------------->
>                                                        Suspend (Only in hybrid mode)
> 'checkpoint-reply'     <------------------------------ @
>                        Suspend&Save state
> 'vmstate-send'         @ ----------------------------->
>                        Send state                      Receive state
> 'vmstate-received'     <------------------------------ @
>                        Release packets                 Load state
> 'vmstate-load'         <------------------------------ @
>                        Resume                          Resume (Only in hybrid mode)

Long lines.  Easy to fix: shorten your arrows.

>                        Start Comparing (Only in hybrid mode)
> NOTE:
>  1) '@' who sends the message
>  2) Every sync-point is synchronized by two sides with only
>     one handshake(single direction) for low-latency.
>     If more strict synchronization is required, a opposite direction
>     sync-point should be added.
>  3) Since sync-points are single direction, the remote side may
>     go forward a lot when this side just receives the sync-point.
>  4) For now, we only support 'periodic' checkpoint, for which
>    the Secondary VM is not running, later we will support 'hybrid' mode.

Useful commit message, but shouldn't this explanation (also) be in the
source?
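
As a rough idea of how the per-checkpoint part of the diagram could be
recorded next to the code, here is a sketch that writes one round down
as a table and checks a trace against it. The message names are taken
from the COLOCommand enum in this patch; SketchStep, sketch_round and
sketch_check_round() are hypothetical names, not part of the series.

```c
#include <stddef.h>
#include <string.h>

typedef enum { FROM_PRIMARY, FROM_SECONDARY } SketchSender;

typedef struct {
    const char *name;       /* message name as in the QAPI enum */
    SketchSender sender;    /* the '@' side in the diagram */
} SketchStep;

/* One checkpoint round, after the initial 'checkpoint-ready'. */
static const SketchStep sketch_round[] = {
    { "checkpoint-request", FROM_PRIMARY },
    { "checkpoint-reply",   FROM_SECONDARY },
    { "vmstate-send",       FROM_PRIMARY },
    { "vmstate-received",   FROM_SECONDARY },
    { "vmstate-loaded",     FROM_SECONDARY },
};

#define SKETCH_ROUND_LEN (sizeof(sketch_round) / sizeof(sketch_round[0]))

/*
 * Compare @trace (an array of @n message names) with one round.
 * Returns -1 if it is exactly one complete round, otherwise the index
 * of the first step that is missing, wrong, or surplus.
 */
static int sketch_check_round(const char *const *trace, size_t n)
{
    size_t i;

    for (i = 0; i < SKETCH_ROUND_LEN; i++) {
        if (i >= n || strcmp(trace[i], sketch_round[i].name) != 0) {
            return (int)i;
        }
    }
    return n == SKETCH_ROUND_LEN ? -1 : (int)SKETCH_ROUND_LEN;
}
```

Keeping the order in one table makes it harder for the comment and the
two state machines to drift apart.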

> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> Cc: Eric Blake <eblake@redhat.com>
> ---
> v12:
> - Rename colo_ctl_put() to colo_put_cmd()
> - Rename colo_ctl_get() to colo_get_check_cmd() and drop
>   the third parameter
> - Rename colo_ctl_get_cmd() to colo_get_cmd()
> - Remove useless 'invalid' member for COLOcommand enum.
> v11:
> - Add missing 'checkpoint-ready' communication in comment.
> - Use parameter to return 'value' for colo_ctl_get() (Dave's suggestion)
> - Fix trace for colo_ctl_get() to trace command and value both
> v10:
> - Rename enum COLOCmd to COLOCommand (Eric's suggestion).
> - Remove unused 'ram-steal'
>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> ---
>  migration/colo.c | 183 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  qapi-schema.json |  25 ++++++++
>  trace-events     |   2 +
>  3 files changed, 208 insertions(+), 2 deletions(-)
>
> diff --git a/migration/colo.c b/migration/colo.c
> index 0ab9618..0ce2a6e 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -10,10 +10,12 @@
>   * later.  See the COPYING file in the top-level directory.
>   */
>  
> +#include <unistd.h>
>  #include "sysemu/sysemu.h"
>  #include "migration/colo.h"
>  #include "trace.h"
>  #include "qemu/error-report.h"
> +#include "qemu/sockets.h"
>  
>  bool colo_supported(void)
>  {
> @@ -34,6 +36,100 @@ bool migration_incoming_in_colo_state(void)
>      return mis && (mis->state == MIGRATION_STATUS_COLO);
>  }
>  
> +static int colo_put_cmd(QEMUFile *f, uint32_t cmd)
> +{
> +    int ret;
> +
> +    if (cmd >= COLO_COMMAND_MAX) {

Needs a trivial rebase due to commit 7fb1cf1.

> +        error_report("%s: Invalid cmd", __func__);
> +        return -EINVAL;

Can this run in a context with different error handling needs?

Or asked differently: who may ultimately handle this error?  Whoever
that may be, how does it need to report errors?

Peeking ahead: the immediate callers don't handle this error, they just
pass it on to their callers.

I'm asking because I'm trying to understand whether error_report() is
appropriate here, or whether you need to use error_setg(), and leave the
actual reporting to the spot that ultimately handles this error.
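
To make the distinction concrete, here is a small self-contained sketch
of the second style: the low-level helper only records what went wrong,
and whoever finally handles the failure decides how to report it.
SketchError and sketch_error_set() are simplified stand-ins invented
for this example, not QEMU's Error API, and SKETCH_CMD_MAX stands in
for COLO_COMMAND_MAX.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_CMD_MAX 7u  /* stand-in for COLO_COMMAND_MAX */

/* Simplified stand-in for an error object. */
typedef struct SketchError {
    char msg[128];
} SketchError;

/* Record a message in @err; a NULL @err means the caller ignores it. */
static void sketch_error_set(SketchError *err, const char *what,
                             unsigned value)
{
    if (err) {
        snprintf(err->msg, sizeof(err->msg), "%s: %u", what, value);
    }
}

/* Low-level helper: records the failure, prints nothing itself. */
static int sketch_put_cmd(uint32_t cmd, SketchError *err)
{
    if (cmd >= SKETCH_CMD_MAX) {
        sketch_error_set(err, "invalid command", cmd);
        return -EINVAL;
    }
    /* The real code would write @cmd to the stream here. */
    return 0;
}

/* Intermediate caller: passes the error on untouched. */
static int sketch_do_checkpoint(uint32_t cmd, SketchError *err)
{
    return sketch_put_cmd(cmd, err);
}
```

Only the outermost caller would then print err->msg, or hand it to a
monitor command, depending on the context it runs in.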

> +    }
> +    qemu_put_be32(f, cmd);
> +    qemu_fflush(f);
> +
> +    ret = qemu_file_get_error(f);
> +    trace_colo_put_cmd(COLOCommand_lookup[cmd]);
> +
> +    return ret;
> +}

Looks like @cmd is a COLOCommand.  Why is the parameter type uint32_t?

> +
> +static int colo_get_cmd(QEMUFile *f, uint32_t *cmd)
> +{
> +    int ret;
> +
> +    *cmd = qemu_get_be32(f);
> +    ret = qemu_file_get_error(f);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    if (*cmd >= COLO_COMMAND_MAX) {
> +        error_report("%s: Invalid cmd", __func__);
> +        return -EINVAL;
> +    }
> +    trace_colo_get_cmd(COLOCommand_lookup[*cmd]);
> +    return 0;
> +}

Same question.

The "get" in the name suggests the function returns the value gotten,
like similarly named functions elsewhere in migration/ do.
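
One possible shape, sketched under the assumption that a negative errno
return is acceptable here: decode the raw wire value once, and hand
back either the enum value or an error. SketchCommand and
sketch_decode_cmd() are hypothetical names for illustration; the real
enum is the generated COLOCommand.

```c
#include <errno.h>
#include <stdint.h>

/* Cut-down stand-in for the generated COLOCommand enum. */
typedef enum SketchCommand {
    SKETCH_COMMAND_CHECKPOINT_READY,
    SKETCH_COMMAND_CHECKPOINT_REQUEST,
    SKETCH_COMMAND_CHECKPOINT_REPLY,
    SKETCH_COMMAND_MAX
} SketchCommand;

/*
 * Decode a 32-bit value read from the stream.
 * Returns the command (>= 0, usable as SketchCommand) on success,
 * -EINVAL if @raw is out of range.
 */
static int sketch_decode_cmd(uint32_t raw)
{
    if (raw >= SKETCH_COMMAND_MAX) {
        return -EINVAL;
    }
    return (int)raw;
}
```

The uint32_t then stays confined to the wire format, and everything
above the decode step can work with the enum type.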

> +
> +static int colo_get_check_cmd(QEMUFile *f, uint32_t expect_cmd)
> +{
> +    int ret;
> +    uint32_t cmd;
> +
> +    ret = colo_get_cmd(f, &cmd);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    if (cmd != expect_cmd) {
> +        error_report("Unexpect colo command, expect:%d, but got cmd:%d",

Grammar nit: "Unexpected".  Suggest: "Unexpected COLO command %d,
expected %d".

> +                     expect_cmd, cmd);
> +        return -EINVAL;
> +    }
> +
> +    return 0;
> +}
> +
> +static int colo_do_checkpoint_transaction(MigrationState *s)
> +{
> +    int ret;
> +
> +    ret = colo_put_cmd(s->to_dst_file, COLO_COMMAND_CHECKPOINT_REQUEST);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
> +                             COLO_COMMAND_CHECKPOINT_REPLY);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    /* TODO: suspend and save vm state to colo buffer */
> +
> +    ret = colo_put_cmd(s->to_dst_file, COLO_COMMAND_VMSTATE_SEND);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    /* TODO: send vmstate to Secondary */
> +
> +    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
> +                             COLO_COMMAND_VMSTATE_RECEIVED);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
> +                             COLO_COMMAND_VMSTATE_LOADED);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    /* TODO: resume Primary */
> +
> +out:
> +    return ret;
> +}
> +
>  static void colo_process_checkpoint(MigrationState *s)
>  {
>      int ret = 0;
> @@ -45,12 +141,28 @@ static void colo_process_checkpoint(MigrationState *s)
>          goto out;
>      }
>  
> +    /*
> +     * Wait for Secondary finish loading vm states and enter COLO
> +     * restore.
> +     */
> +    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
> +                             COLO_COMMAND_CHECKPOINT_READY);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
>      qemu_mutex_lock_iothread();
>      vm_start();
>      qemu_mutex_unlock_iothread();
>      trace_colo_vm_state_change("stop", "run");
>  
> -    /*TODO: COLO checkpoint savevm loop*/
> +    while (s->state == MIGRATION_STATUS_COLO) {
> +        /* start a colo checkpoint */
> +        ret = colo_do_checkpoint_transaction(s);
> +        if (ret < 0) {
> +            goto out;
> +        }
> +    }
>  
>  out:
>      if (ret < 0) {
> @@ -73,6 +185,31 @@ void migrate_start_colo_process(MigrationState *s)
>      qemu_mutex_lock_iothread();
>  }
>  
> +/*
> + * return:
> + * 0: start a checkpoint
> + * -1: some error happened, exit colo restore
> + */

Suggest to make this a proper function comment, i.e.

/*
 * One line describing purpose
 * As many additional lines as it takes to further explain what it does,
 * preconditions, side effects, return values, error conditions.  Use
 * @name to refer to parameters.
 */

> +static int colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request)
> +{
> +    int ret;
> +    uint32_t cmd;
> +
> +    ret = colo_get_cmd(f, &cmd);
> +    if (ret < 0) {
> +        /* do failover ? */
> +        return ret;
> +    }
> +
> +    switch (cmd) {
> +    case COLO_COMMAND_CHECKPOINT_REQUEST:
> +        *checkpoint_request = 1;
> +        return 0;
> +    default:
> +        return -EINVAL;
> +    }

switch makes sense only if you're going to add cases.

Suggest to set *checkpoint_request = 0 on error, for robustness.
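
Roughly what I have in mind, as a standalone sketch. The stream read is
replaced by two plain parameters so the example compiles on its own;
sketch_handle_cmd() and SKETCH_CHECKPOINT_REQUEST are made-up names
standing in for colo_wait_handle_cmd() and
COLO_COMMAND_CHECKPOINT_REQUEST.

```c
#include <errno.h>
#include <stdint.h>

enum { SKETCH_CHECKPOINT_REQUEST = 1 };  /* stand-in command value */

/*
 * @get_ret: result of reading the command (negative errno on failure)
 * @cmd: the command that was read, valid only if @get_ret >= 0
 * Returns 0 and sets *@checkpoint_request to 1 for a checkpoint
 * request; returns a negative errno and leaves it 0 otherwise.
 */
static int sketch_handle_cmd(int get_ret, uint32_t cmd,
                             int *checkpoint_request)
{
    *checkpoint_request = 0;    /* defined on every path */

    if (get_ret < 0) {
        return get_ret;
    }
    if (cmd != SKETCH_CHECKPOINT_REQUEST) {
        return -EINVAL;
    }
    *checkpoint_request = 1;
    return 0;
}
```

This way the caller needs no initializer of its own, and a plain if
replaces the single-case switch.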

> +}
> +
>  void *colo_process_incoming_thread(void *opaque)
>  {
>      MigrationIncomingState *mis = opaque;
> @@ -93,7 +230,49 @@ void *colo_process_incoming_thread(void *opaque)
>      */
>      qemu_set_block(qemu_get_fd(mis->from_src_file));
>  
> -    /* TODO: COLO checkpoint restore loop */
> +
> +    ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_CHECKPOINT_READY);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    while (mis->state == MIGRATION_STATUS_COLO) {
> +        int request = 0;

Dead initialization.

> +        int ret = colo_wait_handle_cmd(mis->from_src_file, &request);
> +
> +        if (ret < 0) {
> +            break;
> +        } else {
> +            if (!request) {
> +                continue;
> +            }
> +        }

Convoluted nesting.  Suggest

        if (ret < 0) {
            break;
        }
        if (!request) {
            continue;
        }

Actually, !request can't happen, so I'd make it

        if (ret < 0) {
            break;
        }
        assert(request);

until it can happen.

> +        /* FIXME: This is unnecessary for periodic checkpoint mode */

When you add a FIXME, you should probably point to it in your commit
message.  May not be necessary when the FIXME goes away later in this
series.

Pretty much the same for TODO.

> +        ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_CHECKPOINT_REPLY);
> +        if (ret < 0) {
> +            goto out;

Above, you used break to "break" the loop on error.  Here, you use "goto
out".  Suggest to pick one and stick to it.

> +        }
> +
> +        ret = colo_get_check_cmd(mis->from_src_file,
> +                                 COLO_COMMAND_VMSTATE_SEND);
> +        if (ret < 0) {
> +            goto out;
> +        }
> +
> +        /* TODO: read migration data into colo buffer */
> +
> +        ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_VMSTATE_RECEIVED);
> +        if (ret < 0) {
> +            goto out;
> +        }
> +
> +        /* TODO: load vm state */
> +
> +        ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_VMSTATE_LOADED);
> +        if (ret < 0) {
> +            goto out;
> +        }
> +    }
>  
>  out:
>      if (ret < 0) {
> diff --git a/qapi-schema.json b/qapi-schema.json
> index c9ff34e..85f7800 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -720,6 +720,31 @@
>  { 'command': 'migrate-start-postcopy' }
>  
>  ##
> +# @COLOCommand
> +#
> +# The commands for COLO fault tolerance
> +#
> +# @checkpoint-ready: SVM is ready for checkpointing
> +#
> +# @checkpoint-request: PVM tells SVM to prepare for new checkpointing
> +#
> +# @checkpoint-reply: SVM gets PVM's checkpoint request
> +#
> +# @vmstate-send: VM's state will be sent by PVM.
> +#
> +# @vmstate-size: The total size of VMstate.
> +#
> +# @vmstate-received: VM's state has been received by SVM.
> +#
> +# @vmstate-loaded: VM's state has been loaded by SVM.
> +#
> +# Since: 2.6
> +##
> +{ 'enum': 'COLOCommand',
> +  'data': [ 'checkpoint-ready', 'checkpoint-request', 'checkpoint-reply',
> +            'vmstate-send', 'vmstate-size','vmstate-received',
> +            'vmstate-loaded' ] }
> +

Space after 'vmstate-size', please.

'vmstate-size' is not used in this patch.  You may want to add it with
its first use instead.

Should this enum really be named "COLOCommand"?  'checkpoint-ready',
'checkpoint-request', 'vmstate-send' look like commands to me, but the
others look like replies.


>  # @MouseInfo:
>  #
>  # Information about a mouse device.
> diff --git a/trace-events b/trace-events
> index 5565e79..39fdd8d 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -1579,6 +1579,8 @@ postcopy_ram_incoming_cleanup_join(void) ""
>  
>  # migration/colo.c
>  colo_vm_state_change(const char *old, const char *new) "Change '%s' => '%s'"
> +colo_put_cmd(const char *msg) "Send '%s' cmd"
> +colo_get_cmd(const char *msg) "Receive '%s' cmd"
>  
>  # kvm-all.c
>  kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"

I like how this commit creates just the two state machines, and leaves
filling in their actions to later commits.  Helps ignorant reviewers
like me :)

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v12 11/38] COLO: Add a new RunState RUN_STATE_COLO
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 11/38] COLO: Add a new RunState RUN_STATE_COLO zhanghailiang
@ 2015-12-19  9:27   ` Markus Armbruster
  2015-12-22 13:32     ` Hailiang Zhang
  0 siblings, 1 reply; 94+ messages in thread
From: Markus Armbruster @ 2015-12-19  9:27 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, dgilbert,
	hongyang.yang

zhanghailiang <zhang.zhanghailiang@huawei.com> writes:

> Guest will enter this state when paused to save/restore VM state
> under colo checkpoint.
>
> Cc: Eric Blake <eblake@redhat.com>
> Cc: Markus Armbruster <armbru@redhat.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>
> ---
>  qapi-schema.json | 5 ++++-
>  vl.c             | 8 ++++++++
>  2 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 85f7800..0423b47 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -154,12 +154,15 @@
>  # @watchdog: the watchdog action is configured to pause and has been triggered
>  #
>  # @guest-panicked: guest has been panicked as a result of guest OS panic
> +#
> +# @colo: guest is paused to save/restore VM state under colo checkpoint (since
> +# 2.6)
>  ##
>  { 'enum': 'RunState',
>    'data': [ 'debug', 'inmigrate', 'internal-error', 'io-error', 'paused',
>              'postmigrate', 'prelaunch', 'finish-migrate', 'restore-vm',
>              'running', 'save-vm', 'shutdown', 'suspended', 'watchdog',
> -            'guest-panicked' ] }
> +            'guest-panicked', 'colo' ] }
>  
>  ##
>  # @StatusInfo:
> diff --git a/vl.c b/vl.c
> index f84fde8..fca630b 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -594,6 +594,7 @@ static const RunStateTransition runstate_transitions_def[] = {
>      { RUN_STATE_INMIGRATE, RUN_STATE_WATCHDOG },
>      { RUN_STATE_INMIGRATE, RUN_STATE_GUEST_PANICKED },
>      { RUN_STATE_INMIGRATE, RUN_STATE_FINISH_MIGRATE },
> +    { RUN_STATE_INMIGRATE, RUN_STATE_COLO },
>  
>      { RUN_STATE_INTERNAL_ERROR, RUN_STATE_PAUSED },
>      { RUN_STATE_INTERNAL_ERROR, RUN_STATE_FINISH_MIGRATE },
> @@ -603,6 +604,7 @@ static const RunStateTransition runstate_transitions_def[] = {
>  
>      { RUN_STATE_PAUSED, RUN_STATE_RUNNING },
>      { RUN_STATE_PAUSED, RUN_STATE_FINISH_MIGRATE },
> +    { RUN_STATE_PAUSED, RUN_STATE_COLO},
>  
>      { RUN_STATE_POSTMIGRATE, RUN_STATE_RUNNING },
>      { RUN_STATE_POSTMIGRATE, RUN_STATE_FINISH_MIGRATE },
> @@ -613,9 +615,12 @@ static const RunStateTransition runstate_transitions_def[] = {
>  
>      { RUN_STATE_FINISH_MIGRATE, RUN_STATE_RUNNING },
>      { RUN_STATE_FINISH_MIGRATE, RUN_STATE_POSTMIGRATE },
> +    { RUN_STATE_FINISH_MIGRATE, RUN_STATE_COLO},
>  
>      { RUN_STATE_RESTORE_VM, RUN_STATE_RUNNING },
>  
> +    { RUN_STATE_COLO, RUN_STATE_RUNNING },
> +
>      { RUN_STATE_RUNNING, RUN_STATE_DEBUG },
>      { RUN_STATE_RUNNING, RUN_STATE_INTERNAL_ERROR },
>      { RUN_STATE_RUNNING, RUN_STATE_IO_ERROR },
> @@ -626,6 +631,7 @@ static const RunStateTransition runstate_transitions_def[] = {
>      { RUN_STATE_RUNNING, RUN_STATE_SHUTDOWN },
>      { RUN_STATE_RUNNING, RUN_STATE_WATCHDOG },
>      { RUN_STATE_RUNNING, RUN_STATE_GUEST_PANICKED },
> +    { RUN_STATE_RUNNING, RUN_STATE_COLO},
>  
>      { RUN_STATE_SAVE_VM, RUN_STATE_RUNNING },
>  
> @@ -636,9 +642,11 @@ static const RunStateTransition runstate_transitions_def[] = {
>      { RUN_STATE_RUNNING, RUN_STATE_SUSPENDED },
>      { RUN_STATE_SUSPENDED, RUN_STATE_RUNNING },
>      { RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE },
> +    { RUN_STATE_SUSPENDED, RUN_STATE_COLO},
>  
>      { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
>      { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
> +    { RUN_STATE_WATCHDOG, RUN_STATE_COLO},
>  
>      { RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING },
>      { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE },

Pardon my ignorance, but could you explain the new run state in a bit
more detail for me?

Your additions to runstate_transitions_def[] show we can go *from* state
'colo' only to state 'running', but we can go *to* state 'colo' from
various other states.  This may well be sane, but it's not *obviously*
sane :)
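
To spell out the asymmetry, here is a cut-down model of just the 'colo'
rows this patch adds, with a lookup and a count of the ways out of a
state. The SketchState names and both helper functions are invented for
this example; only the seven { from, to } pairs mirror the hunks above.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    S_INMIGRATE, S_PAUSED, S_FINISH_MIGRATE, S_RUNNING,
    S_SUSPENDED, S_WATCHDOG, S_COLO, S_MAX
} SketchState;

typedef struct {
    SketchState from, to;
} SketchTransition;

/* The transitions involving 'colo' added by this patch. */
static const SketchTransition sketch_transitions[] = {
    { S_INMIGRATE,      S_COLO },
    { S_PAUSED,         S_COLO },
    { S_FINISH_MIGRATE, S_COLO },
    { S_RUNNING,        S_COLO },
    { S_SUSPENDED,      S_COLO },
    { S_WATCHDOG,       S_COLO },
    { S_COLO,           S_RUNNING },
};

#define SKETCH_N (sizeof(sketch_transitions) / sizeof(sketch_transitions[0]))

/* True if the table lists a transition from @from to @to. */
static bool sketch_transition_ok(SketchState from, SketchState to)
{
    size_t i;

    for (i = 0; i < SKETCH_N; i++) {
        if (sketch_transitions[i].from == from &&
            sketch_transitions[i].to == to) {
            return true;
        }
    }
    return false;
}

/* Number of listed transitions that leave @from. */
static size_t sketch_count_exits(SketchState from)
{
    size_t i, n = 0;

    for (i = 0; i < SKETCH_N; i++) {
        n += sketch_transitions[i].from == from;
    }
    return n;
}
```

In this model sketch_count_exits(S_COLO) is 1 while six states lead
into S_COLO, so for example 'colo' to 'paused' is rejected.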

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v12 19/38] COLO: Add checkpoint-delay parameter for migrate-set-parameters
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 19/38] COLO: Add checkpoint-delay parameter for migrate-set-parameters zhanghailiang
@ 2015-12-19  9:33   ` Markus Armbruster
  2015-12-22 13:43     ` Hailiang Zhang
  0 siblings, 1 reply; 94+ messages in thread
From: Markus Armbruster @ 2015-12-19  9:33 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, Luiz Capitulino,
	dgilbert, hongyang.yang

zhanghailiang <zhang.zhanghailiang@huawei.com> writes:

> Add checkpoint-delay parameter for migrate-set-parameters, so that
> we can control the checkpoint frequency when COLO is in periodic mode.
>
> Cc: Luiz Capitulino <lcapitulino@redhat.com>
> Cc: Eric Blake <eblake@redhat.com>
> Cc: Markus Armbruster <armbru@redhat.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
> v12:
> - Change checkpoint-delay to x-checkpoint-delay (Dave's suggestion)
> - Add Reviewed-by tag
> v11:
> - Move this patch ahead of the patch where uses 'checkpoint_delay'
>  (Dave's suggestion)
> v10:
> - Fix related qmp command
>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> ---
>  hmp.c                 |  7 +++++++
>  migration/migration.c | 24 +++++++++++++++++++++++-
>  qapi-schema.json      | 19 ++++++++++++++++---
>  qmp-commands.hx       |  4 ++--
>  4 files changed, 48 insertions(+), 6 deletions(-)
>
> diff --git a/hmp.c b/hmp.c
> index 2140605..ee87d38 100644
> --- a/hmp.c
> +++ b/hmp.c
> @@ -284,6 +284,9 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
>          monitor_printf(mon, " %s: %" PRId64,
>              MigrationParameter_lookup[MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT],
>              params->x_cpu_throttle_increment);
> +        monitor_printf(mon, " %s: %" PRId64,
> +            MigrationParameter_lookup[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY],
> +            params->x_checkpoint_delay);
>          monitor_printf(mon, "\n");
>      }
>  
> @@ -1237,6 +1240,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
>      bool has_decompress_threads = false;
>      bool has_x_cpu_throttle_initial = false;
>      bool has_x_cpu_throttle_increment = false;
> +    bool has_x_checkpoint_delay = false;
>      int i;
>  
>      for (i = 0; i < MIGRATION_PARAMETER_MAX; i++) {
> @@ -1256,6 +1260,8 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
>                  break;
>              case MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT:
>                  has_x_cpu_throttle_increment = true;
> +            case MIGRATION_PARAMETER_X_CHECKPOINT_DELAY:
> +                has_x_checkpoint_delay = true;
>                  break;
>              }
>              qmp_migrate_set_parameters(has_compress_level, value,
> @@ -1263,6 +1269,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
>                                         has_decompress_threads, value,
>                                         has_x_cpu_throttle_initial, value,
>                                         has_x_cpu_throttle_increment, value,
> +                                       has_x_checkpoint_delay, value,
>                                         &err);
>              break;
>          }
> diff --git a/migration/migration.c b/migration/migration.c
> index a1074c3..8988358 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -56,6 +56,11 @@
>  /* Migration XBZRLE default cache size */
>  #define DEFAULT_MIGRATE_CACHE_SIZE (64 * 1024 * 1024)
>  
> +/* The delay time (in ms) between two COLO checkpoints
> + * Note: Please change this default value to 10000 when we support hybrid mode.
> + */
> +#define DEFAULT_MIGRATE_X_CHECKPOINT_DELAY 200
> +
>  static NotifierList migration_state_notifiers =
>      NOTIFIER_LIST_INITIALIZER(migration_state_notifiers);
>  
> @@ -91,6 +96,8 @@ MigrationState *migrate_get_current(void)
>                  DEFAULT_MIGRATE_X_CPU_THROTTLE_INITIAL,
>          .parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT] =
>                  DEFAULT_MIGRATE_X_CPU_THROTTLE_INCREMENT,
> +        .parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY] =
> +                DEFAULT_MIGRATE_X_CHECKPOINT_DELAY,
>      };
>  
>      if (!once) {
> @@ -530,6 +537,8 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
>              s->parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INITIAL];
>      params->x_cpu_throttle_increment =
>              s->parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT];
> +    params->x_checkpoint_delay =
> +            s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY];
>  
>      return params;
>  }
> @@ -736,7 +745,10 @@ void qmp_migrate_set_parameters(bool has_compress_level,
>                                  bool has_x_cpu_throttle_initial,
>                                  int64_t x_cpu_throttle_initial,
>                                  bool has_x_cpu_throttle_increment,
> -                                int64_t x_cpu_throttle_increment, Error **errp)
> +                                int64_t x_cpu_throttle_increment,
> +                                bool has_x_checkpoint_delay,
> +                                int64_t x_checkpoint_delay,
> +                                Error **errp)
>  {
>      MigrationState *s = migrate_get_current();
>  
> @@ -771,6 +783,11 @@ void qmp_migrate_set_parameters(bool has_compress_level,
>                     "x_cpu_throttle_increment",
>                     "an integer in the range of 1 to 99");
>      }
> +    if (has_x_checkpoint_delay && (x_checkpoint_delay < 0)) {
> +        error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
> +                    "x_checkpoint_delay",
> +                    "is invalid, it should be positive");
> +    }
>  
>      if (has_compress_level) {
>          s->parameters[MIGRATION_PARAMETER_COMPRESS_LEVEL] = compress_level;
> @@ -791,6 +808,11 @@ void qmp_migrate_set_parameters(bool has_compress_level,
>          s->parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT] =
>                                                      x_cpu_throttle_increment;
>      }
> +
> +    if (has_x_checkpoint_delay) {
> +        s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY] =
> +                                                    x_checkpoint_delay;
> +    }
>  }
>  
>  void qmp_migrate_start_postcopy(Error **errp)
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 0423b47..a5699a7 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -623,11 +623,16 @@
>  # @x-cpu-throttle-increment: throttle percentage increase each time
>  #                            auto-converge detects that migration is not making
>  #                            progress. The default value is 10. (Since 2.5)
> +#
> +# @x-checkpoint-delay: The delay time (in ms) between two COLO checkpoints in
> +#          periodic mode. (Since 2.6)
> +#
>  # Since: 2.4
>  ##
>  { 'enum': 'MigrationParameter',
>    'data': ['compress-level', 'compress-threads', 'decompress-threads',
> -           'x-cpu-throttle-initial', 'x-cpu-throttle-increment'] }
> +           'x-cpu-throttle-initial', 'x-cpu-throttle-increment',
> +           'x-checkpoint-delay' ] }
>  
>  #
>  # @migrate-set-parameters
> @@ -647,6 +652,9 @@
>  # @x-cpu-throttle-increment: throttle percentage increase each time
>  #                            auto-converge detects that migration is not making
>  #                            progress. The default value is 10. (Since 2.5)
> +#
> +# @x-checkpoint-delay: the delay time between two checkpoints. (Since 2.6)
> +#

Unit?  I guess it's ms, as above.

>  # Since: 2.4
>  ##
>  { 'command': 'migrate-set-parameters',
> @@ -654,7 +662,8 @@
>              '*compress-threads': 'int',
>              '*decompress-threads': 'int',
>              '*x-cpu-throttle-initial': 'int',
> -            '*x-cpu-throttle-increment': 'int'} }
> +            '*x-cpu-throttle-increment': 'int',
> +            '*x-checkpoint-delay': 'int' } }
>  
>  #
>  # @MigrationParameters
> @@ -673,6 +682,8 @@
>  #                            auto-converge detects that migration is not making
>  #                            progress. The default value is 10. (Since 2.5)
>  #
> +# @x-checkpoint-delay: the delay time between two COLO checkpoints. (Since 2.6)
> +#

Same question.

>  # Since: 2.4
>  ##
>  { 'struct': 'MigrationParameters',
> @@ -680,7 +691,9 @@
>              'compress-threads': 'int',
>              'decompress-threads': 'int',
>              'x-cpu-throttle-initial': 'int',
> -            'x-cpu-throttle-increment': 'int'} }
> +            'x-cpu-throttle-increment': 'int',
> +            'x-checkpoint-delay': 'int'} }
> +
>  ##
>  # @query-migrate-parameters
>  #

x-checkpoint-delay intentionally not added to MigrationInfo?

> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index 91979b4..89756c9 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -3651,7 +3651,7 @@ Set migration parameters
>  - "compress-level": set compression level during migration (json-int)
>  - "compress-threads": set compression thread count for migration (json-int)
>  - "decompress-threads": set decompression thread count for migration (json-int)
> -
> +- "x-checkpoint-delay": set the delay time for periodic checkpoint (json-int)

Unit?

>  Arguments:
>  
>  Example:
> @@ -3664,7 +3664,7 @@ EQMP
>      {
>          .name       = "migrate-set-parameters",
>          .args_type  =
> -            "compress-level:i?,compress-threads:i?,decompress-threads:i?",
> +            "compress-level:i?,compress-threads:i?,decompress-threads:i?,x-checkpoint-delay:i?",
>          .mhandler.cmd_new = qmp_marshal_migrate_set_parameters,
>      },
>  SQMP

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v12 21/38] COLO failover: Introduce a new command to trigger a failover
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 21/38] COLO failover: Introduce a new command to trigger a failover zhanghailiang
  2015-12-18 15:27   ` Dr. David Alan Gilbert
@ 2015-12-19  9:38   ` Markus Armbruster
  2015-12-22 13:50     ` Hailiang Zhang
  1 sibling, 1 reply; 94+ messages in thread
From: Markus Armbruster @ 2015-12-19  9:38 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, Luiz Capitulino,
	dgilbert, hongyang.yang

zhanghailiang <zhang.zhanghailiang@huawei.com> writes:

> We leave users to choose whatever heartbeat solution they want, if the heartbeat
> is lost, or other errors they detect, they can use experimental command
> 'x_colo_lost_heartbeat' to tell COLO to do failover, COLO will do operations
> accordingly.
>
> For example, if the command is sent to the PVM, the Primary side will
> exit COLO mode and take over operation. If sent to the Secondary, the
> secondary will run failover work, then take over server operation to
> become the new Primary.
>
> Cc: Luiz Capitulino <lcapitulino@redhat.com>
> Cc: Eric Blake <eblake@redhat.com>
> Cc: Markus Armbruster <armbru@redhat.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> ---
> v11:
> - Add more comments for x-colo-lost-heartbeat command (Eric's suggestion)
> - Return 'enum' instead of 'int' for get_colo_mode() (Eric's suggestion)
> v10:
> - Rename command colo_lost_hearbeat to experimental 'x_colo_lost_heartbeat'
>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> ---
>  hmp-commands.hx              | 15 +++++++++++++++
>  hmp.c                        |  8 ++++++++
>  hmp.h                        |  1 +
>  include/migration/colo.h     |  3 +++
>  include/migration/failover.h | 20 ++++++++++++++++++++
>  migration/Makefile.objs      |  2 +-
>  migration/colo-comm.c        | 11 +++++++++++
>  migration/colo-failover.c    | 41 +++++++++++++++++++++++++++++++++++++++++
>  migration/colo.c             |  1 +
>  qapi-schema.json             | 29 +++++++++++++++++++++++++++++
>  qmp-commands.hx              | 19 +++++++++++++++++++
>  stubs/migration-colo.c       |  8 ++++++++
>  12 files changed, 157 insertions(+), 1 deletion(-)
>  create mode 100644 include/migration/failover.h
>  create mode 100644 migration/colo-failover.c
>
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index bb52e4d..a381b0b 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -1039,6 +1039,21 @@ migration (or once already in postcopy).
>  ETEXI
>  
>      {
> +        .name       = "x_colo_lost_heartbeat",
> +        .args_type  = "",
> +        .params     = "",
> +        .help       = "Tell COLO that heartbeat is lost,\n\t\t\t"
> +                      "a failover or takeover is needed.",
> +        .mhandler.cmd = hmp_x_colo_lost_heartbeat,
> +    },
> +
> +STEXI
> +@item x_colo_lost_heartbeat
> +@findex x_colo_lost_heartbeat
> +Tell COLO that heartbeat is lost, a failover or takeover is needed.
> +ETEXI
> +
> +    {
>          .name       = "client_migrate_info",
>          .args_type  = "protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?",
>          .params     = "protocol hostname port tls-port cert-subject",
> diff --git a/hmp.c b/hmp.c
> index ee87d38..dc6dc30 100644
> --- a/hmp.c
> +++ b/hmp.c
> @@ -1310,6 +1310,14 @@ void hmp_migrate_start_postcopy(Monitor *mon, const QDict *qdict)
>      hmp_handle_error(mon, &err);
>  }
>  
> +void hmp_x_colo_lost_heartbeat(Monitor *mon, const QDict *qdict)
> +{
> +    Error *err = NULL;
> +
> +    qmp_x_colo_lost_heartbeat(&err);
> +    hmp_handle_error(mon, &err);
> +}
> +
>  void hmp_set_password(Monitor *mon, const QDict *qdict)
>  {
>      const char *protocol  = qdict_get_str(qdict, "protocol");
> diff --git a/hmp.h b/hmp.h
> index a8c5b5a..864a300 100644
> --- a/hmp.h
> +++ b/hmp.h
> @@ -70,6 +70,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict);
>  void hmp_migrate_set_cache_size(Monitor *mon, const QDict *qdict);
>  void hmp_client_migrate_info(Monitor *mon, const QDict *qdict);
>  void hmp_migrate_start_postcopy(Monitor *mon, const QDict *qdict);
> +void hmp_x_colo_lost_heartbeat(Monitor *mon, const QDict *qdict);
>  void hmp_set_password(Monitor *mon, const QDict *qdict);
>  void hmp_expire_password(Monitor *mon, const QDict *qdict);
>  void hmp_eject(Monitor *mon, const QDict *qdict);
> diff --git a/include/migration/colo.h b/include/migration/colo.h
> index 2676c4a..ba27719 100644
> --- a/include/migration/colo.h
> +++ b/include/migration/colo.h
> @@ -17,6 +17,7 @@
>  #include "migration/migration.h"
>  #include "qemu/coroutine_int.h"
>  #include "qemu/thread.h"
> +#include "qemu/main-loop.h"
>  
>  bool colo_supported(void);
>  void colo_info_mig_init(void);
> @@ -29,4 +30,6 @@ bool migration_incoming_enable_colo(void);
>  void migration_incoming_exit_colo(void);
>  void *colo_process_incoming_thread(void *opaque);
>  bool migration_incoming_in_colo_state(void);
> +
> +COLOMode get_colo_mode(void);
>  #endif
> diff --git a/include/migration/failover.h b/include/migration/failover.h
> new file mode 100644
> index 0000000..1785b52
> --- /dev/null
> +++ b/include/migration/failover.h
> @@ -0,0 +1,20 @@
> +/*
> + *  COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
> + *  (a.k.a. Fault Tolerance or Continuous Replication)
> + *
> + * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
> + * Copyright (c) 2015 FUJITSU LIMITED
> + * Copyright (c) 2015 Intel Corporation
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or
> + * later.  See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef QEMU_FAILOVER_H
> +#define QEMU_FAILOVER_H
> +
> +#include "qemu-common.h"
> +
> +void failover_request_active(Error **errp);
> +
> +#endif
> diff --git a/migration/Makefile.objs b/migration/Makefile.objs
> index 81b5713..920d1e7 100644
> --- a/migration/Makefile.objs
> +++ b/migration/Makefile.objs
> @@ -1,6 +1,6 @@
>  common-obj-y += migration.o tcp.o
> -common-obj-$(CONFIG_COLO) += colo.o
>  common-obj-y += colo-comm.o
> +common-obj-$(CONFIG_COLO) += colo.o colo-failover.o
>  common-obj-y += vmstate.o
>  common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o qemu-file-stdio.o
>  common-obj-y += xbzrle.o postcopy-ram.o
> diff --git a/migration/colo-comm.c b/migration/colo-comm.c
> index 30df3d3..58a6488 100644
> --- a/migration/colo-comm.c
> +++ b/migration/colo-comm.c
> @@ -20,6 +20,17 @@ typedef struct {
>  
>  static COLOInfo colo_info;
>  
> +COLOMode get_colo_mode(void)
> +{
> +    if (migration_in_colo_state()) {
> +        return COLO_MODE_PRIMARY;
> +    } else if (migration_incoming_in_colo_state()) {
> +        return COLO_MODE_SECONDARY;
> +    } else {
> +        return COLO_MODE_UNKNOWN;
> +    }
> +}
> +
>  static void colo_info_pre_save(void *opaque)
>  {
>      COLOInfo *s = opaque;
> diff --git a/migration/colo-failover.c b/migration/colo-failover.c
> new file mode 100644
> index 0000000..e3897c6
> --- /dev/null
> +++ b/migration/colo-failover.c
> @@ -0,0 +1,41 @@
> +/*
> + * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
> + * (a.k.a. Fault Tolerance or Continuous Replication)
> + *
> + * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
> + * Copyright (c) 2015 FUJITSU LIMITED
> + * Copyright (c) 2015 Intel Corporation
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or
> + * later.  See the COPYING file in the top-level directory.
> + */
> +
> +#include "migration/colo.h"
> +#include "migration/failover.h"
> +#include "qmp-commands.h"
> +#include "qapi/qmp/qerror.h"
> +
> +static QEMUBH *failover_bh;
> +
> +static void colo_failover_bh(void *opaque)
> +{
> +    qemu_bh_delete(failover_bh);
> +    failover_bh = NULL;
> +    /*TODO: Do failover work */
> +}
> +
> +void failover_request_active(Error **errp)
> +{
> +    failover_bh = qemu_bh_new(colo_failover_bh, NULL);
> +    qemu_bh_schedule(failover_bh);
> +}
> +
> +void qmp_x_colo_lost_heartbeat(Error **errp)
> +{
> +    if (get_colo_mode() == COLO_MODE_UNKNOWN) {
> +        error_setg(errp, QERR_FEATURE_DISABLED, "colo");
> +        return;
> +    }
> +
> +    failover_request_active(errp);
> +}
> diff --git a/migration/colo.c b/migration/colo.c
> index ca5df44..7098497 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -17,6 +17,7 @@
>  #include "trace.h"
>  #include "qemu/error-report.h"
>  #include "qemu/sockets.h"
> +#include "migration/failover.h"
>  
>  /* colo buffer */
>  #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
> diff --git a/qapi-schema.json b/qapi-schema.json
> index a5699a7..feb7d53 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -761,6 +761,35 @@
>              'vmstate-send', 'vmstate-size','vmstate-received',
>              'vmstate-loaded' ] }
>  
> +##
> +# @COLOMode
> +#
> +# The colo mode

This is rather terse for an ignorant reader like me.

> +#
> +# @unknown: unknown mode

What does "unknown mode" mean, and how can it happen?
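
Reading get_colo_mode() earlier in this patch, 'unknown' appears to be the
answer whenever the VM is in neither the outgoing nor the incoming COLO
state. A minimal standalone sketch of that decision, for reference; the enum
and function names are stand-ins invented here, and the two flags stand for
the results of migration_in_colo_state() and
migration_incoming_in_colo_state():

```c
#include <stdbool.h>

/* Local stand-in for the COLOMode enum added by this patch. */
typedef enum {
    MODE_UNKNOWN,
    MODE_PRIMARY,
    MODE_SECONDARY,
} ColoModeSketch;

/*
 * out_colo: stand-in for migration_in_colo_state()
 * in_colo:  stand-in for migration_incoming_in_colo_state()
 */
ColoModeSketch colo_mode_sketch(bool out_colo, bool in_colo)
{
    if (out_colo) {
        return MODE_PRIMARY;
    } else if (in_colo) {
        return MODE_SECONDARY;
    }
    /* Neither side is in COLO state: this is the 'unknown' case. */
    return MODE_UNKNOWN;
}
```

Under that reading, x-colo-lost-heartbeat sent to a VM that never entered
COLO sees 'unknown' and fails with QERR_FEATURE_DISABLED.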

> +#
> +# @primary: master side
> +#
> +# @secondary: slave side
> +#
> +# Since: 2.6
> +##
> +{ 'enum': 'COLOMode',
> +  'data': [ 'unknown', 'primary', 'secondary'] }
> +
> +##
> +# @x-colo-lost-heartbeat
> +#
> +# Tell qemu that heartbeat is lost, request it to do takeover procedures.
> +# If this command is sent to the PVM, the Primary side will exit COLO mode.
> +# If sent to the Secondary, the Secondary side will run failover work,
> +# then takes over server operation to become the service VM.
> +#
> +# Since: 2.6
> +##
> +{ 'command': 'x-colo-lost-heartbeat' }
> +
> +##
>  # @MouseInfo:
>  #
>  # Information about a mouse device.
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index 89756c9..76ad208 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -805,6 +805,25 @@ Example:
>  EQMP
>  
>      {
> +        .name       = "x-colo-lost-heartbeat",
> +        .args_type  = "",
> +        .mhandler.cmd_new = qmp_marshal_x_colo_lost_heartbeat,
> +    },
> +
> +SQMP
> +x-colo-lost-heartbeat
> +--------------------
> +
> +Tell COLO that heartbeat is lost, a failover or takeover is needed.
> +
> +Example:
> +
> +-> { "execute": "x-colo-lost-heartbeat" }
> +<- { "return": {} }
> +
> +EQMP
> +
> +    {
>          .name       = "client_migrate_info",
>          .args_type  = "protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?",
>          .params     = "protocol hostname port tls-port cert-subject",
> diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
> index c12516e..5028f63 100644
> --- a/stubs/migration-colo.c
> +++ b/stubs/migration-colo.c
> @@ -11,6 +11,7 @@
>   */
>  
>  #include "migration/colo.h"
> +#include "qmp-commands.h"
>  
>  bool colo_supported(void)
>  {
> @@ -35,3 +36,10 @@ void *colo_process_incoming_thread(void *opaque)
>  {
>      return NULL;
>  }
> +
> +void qmp_x_colo_lost_heartbeat(Error **errp)
> +{
> +    error_setg(errp, "COLO is not supported, please rerun configure"
> +                     " with --enable-colo option in order to support"
> +                     " COLO feature");
> +}


* Re: [Qemu-devel] [PATCH COLO-Frame v12 25/38] qmp event: Add event notification for COLO error
  2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 25/38] qmp event: Add event notification for COLO error zhanghailiang
  2015-12-18 16:03   ` Eric Blake
@ 2015-12-19 10:02   ` Markus Armbruster
  2015-12-21 21:14     ` [Qemu-devel] [Qemu-block] " John Snow
                       ` (2 more replies)
  1 sibling, 3 replies; 94+ messages in thread
From: Markus Armbruster @ 2015-12-19 10:02 UTC (permalink / raw)
  To: zhanghailiang
  Cc: Michael Roth, lizhijian, quintela, yunhong.jiang, eddie.dong,
	peter.huangpeng, qemu-devel, arei.gonglei, stefanha, amit.shah,
	qemu-block, dgilbert, hongyang.yang

Copying qemu-block because this seems related to generalising block jobs
to background jobs.

zhanghailiang <zhang.zhanghailiang@huawei.com> writes:

> If some errors happen during VM's COLO FT stage, it's important to notify the users
> of this event. Together with 'colo_lost_heartbeat', users can intervene in COLO's
> failover work immediately.
> If users don't want to get involved in COLO's failover verdict,
> it is still necessary to notify users that we exited COLO mode.
>
> Cc: Markus Armbruster <armbru@redhat.com>
> Cc: Michael Roth <mdroth@linux.vnet.ibm.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> ---
> v11:
> - Fix several typos found by Eric
>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> ---
>  docs/qmp-events.txt | 17 +++++++++++++++++
>  migration/colo.c    | 11 +++++++++++
>  qapi-schema.json    | 16 ++++++++++++++++
>  qapi/event.json     | 17 +++++++++++++++++
>  4 files changed, 61 insertions(+)
>
> diff --git a/docs/qmp-events.txt b/docs/qmp-events.txt
> index d2f1ce4..19f68fc 100644
> --- a/docs/qmp-events.txt
> +++ b/docs/qmp-events.txt
> @@ -184,6 +184,23 @@ Example:
>  Note: The "ready to complete" status is always reset by a BLOCK_JOB_ERROR
>  event.
>  
> +COLO_EXIT
> +---------
> +
> +Emitted when VM finishes COLO mode due to some errors happening or
> +at the request of users.

How would the event's recipient distinguish between "due to error" and
"at the user's request"?

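Going by the "reason" member documented in this hunk, a recipient would
presumably branch on that string. A minimal client-side sketch, assuming the
values are exactly the COLOExitReason names from this patch ('request',
'error', 'unknown'); classify_colo_exit() and the ExitReasonSketch enum are
hypothetical names, not QEMU API:

```c
#include <stddef.h>
#include <string.h>

typedef enum {
    EXIT_UNKNOWN,
    EXIT_REQUEST,
    EXIT_ERROR,
} ExitReasonSketch;

/*
 * Map the "reason" string of a COLO_EXIT event to an enum.
 * A missing or unrecognized string is treated as unknown.
 */
ExitReasonSketch classify_colo_exit(const char *reason)
{
    if (reason == NULL) {
        return EXIT_UNKNOWN;
    }
    if (strcmp(reason, "request") == 0) {
        return EXIT_REQUEST;
    }
    if (strcmp(reason, "error") == 0) {
        return EXIT_ERROR;
    }
    return EXIT_UNKNOWN;
}
```
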
> +
> +Data:
> +
> + - "mode": COLO mode, primary or secondary side (json-string)
> + - "reason":  the exit reason, internal error or external request. (json-string)
> + - "error": error message (json-string, optional)
> +
> +Example:
> +
> +{"timestamp": {"seconds": 2032141960, "microseconds": 417172},
> + "event": "COLO_EXIT", "data": {"mode": "primary", "reason": "request" } }
> +

Pardon my ignorance again...  Does "VM finishes COLO mode" mean we have
some kind of COLO background job, and it just finished for whatever
reason?

If yes, this COLO job could be an instance of the general background job
concept we're trying to grow from the existing block job concept.

I'm not asking you to rebase your work onto the background job
infrastructure, not least for the simple reason that it doesn't exist,
yet.  But I think it would be fruitful to compare your COLO job
management QMP interface with the one we have for block jobs.  Not only
may that avoid unnecessary inconsistency, it could also help shape the
general background job interface.

Quick overview of the block job QMP interface:

* Commands to create a job: block-commit, block-stream, drive-mirror,
  drive-backup.

* Get information on jobs: query-block-jobs

* Pause a job: block-job-pause

* Resume a job: block-job-resume

* Cancel a job: block-job-cancel

* Block job completion events: BLOCK_JOB_COMPLETED, BLOCK_JOB_CANCELLED

* Block job error event: BLOCK_JOB_ERROR

* Block job synchronous completion: event BLOCK_JOB_READY and command
  block-job-complete

>  DEVICE_DELETED
>  --------------
>  
> diff --git a/migration/colo.c b/migration/colo.c
> index d1dd4e1..d06c14f 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -18,6 +18,7 @@
>  #include "qemu/error-report.h"
>  #include "qemu/sockets.h"
>  #include "migration/failover.h"
> +#include "qapi-event.h"
>  
>  /* colo buffer */
>  #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
> @@ -349,6 +350,11 @@ static void colo_process_checkpoint(MigrationState *s)
>  out:
>      if (ret < 0) {
>          error_report("%s: %s", __func__, strerror(-ret));
> +        qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_ERROR,
> +                                  true, strerror(-ret), NULL);
> +    } else {
> +        qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_REQUEST,
> +                                  false, NULL, NULL);
>      }
>  
>      qsb_free(buffer);
> @@ -516,6 +522,11 @@ out:
>      if (ret < 0) {
>          error_report("colo incoming thread will exit, detect error: %s",
>                       strerror(-ret));
> +        qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_ERROR,
> +                                  true, strerror(-ret), NULL);
> +    } else {
> +        qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_REQUEST,
> +                                  false, NULL, NULL);
>      }
>  
>      if (fb) {
> diff --git a/qapi-schema.json b/qapi-schema.json
> index feb7d53..f6ecb88 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -778,6 +778,22 @@
>    'data': [ 'unknown', 'primary', 'secondary'] }
>  
>  ##
> +# @COLOExitReason
> +#
> +# The reason for a COLO exit
> +#
> +# @unknown: unknown reason

How can @unknown happen?

> +#
> +# @request: COLO exit is due to an external request
> +#
> +# @error: COLO exit is due to an internal error
> +#
> +# Since: 2.6
> +##
> +{ 'enum': 'COLOExitReason',
> +  'data': [ 'unknown', 'request', 'error'] }
> +
> +##
>  # @x-colo-lost-heartbeat
>  #
>  # Tell qemu that heartbeat is lost, request it to do takeover procedures.
> diff --git a/qapi/event.json b/qapi/event.json
> index f0cef01..f63d456 100644
> --- a/qapi/event.json
> +++ b/qapi/event.json
> @@ -255,6 +255,23 @@
>    'data': {'status': 'MigrationStatus'}}
>  
>  ##
> +# @COLO_EXIT
> +#
> +# Emitted when VM finishes COLO mode due to some errors happening or
> +# at the request of users.
> +#
> +# @mode: which COLO mode the VM was in when it exited.

Can we get 'unknown' here?

> +#
> +# @reason: describes the reason for the COLO exit.

Can we get 'unknown' here?

> +#
> +# @error: #optional, error message. Only present on error happening.
> +#
> +# Since: 2.6
> +##
> +{ 'event': 'COLO_EXIT',
> +  'data': {'mode': 'COLOMode', 'reason': 'COLOExitReason', '*error': 'str' } }
> +
> +##
>  # @ACPI_DEVICE_OST
>  #
>  # Emitted when guest executes ACPI _OST method.


* Re: [Qemu-devel] [Qemu-block] [PATCH COLO-Frame v12 25/38] qmp event: Add event notification for COLO error
  2015-12-19 10:02   ` Markus Armbruster
@ 2015-12-21 21:14     ` John Snow
  2015-12-23  3:14       ` Hailiang Zhang
  2015-12-23  1:24     ` [Qemu-devel] " Wen Congyang
  2015-12-23  3:10     ` [Qemu-devel] " Hailiang Zhang
  2 siblings, 1 reply; 94+ messages in thread
From: John Snow @ 2015-12-21 21:14 UTC (permalink / raw)
  To: Markus Armbruster, zhanghailiang
  Cc: qemu-block, lizhijian, quintela, qemu-devel, yunhong.jiang,
	eddie.dong, peter.huangpeng, Michael Roth, arei.gonglei,
	stefanha, amit.shah, dgilbert, hongyang.yang



On 12/19/2015 05:02 AM, Markus Armbruster wrote:
> Copying qemu-block because this seems related to generalising block jobs
> to background jobs.
> 
> zhanghailiang <zhang.zhanghailiang@huawei.com> writes:
> 
>> If some errors happen during VM's COLO FT stage, it's important to notify the users
>> of this event. Together with 'colo_lost_heartbeat', users can intervene in COLO's
>> failover work immediately.
>> If users don't want to get involved in COLO's failover verdict,
>> it is still necessary to notify users that we exited COLO mode.
>>
>> Cc: Markus Armbruster <armbru@redhat.com>
>> Cc: Michael Roth <mdroth@linux.vnet.ibm.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> ---
>> v11:
>> - Fix several typos found by Eric
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> ---
>>  docs/qmp-events.txt | 17 +++++++++++++++++
>>  migration/colo.c    | 11 +++++++++++
>>  qapi-schema.json    | 16 ++++++++++++++++
>>  qapi/event.json     | 17 +++++++++++++++++
>>  4 files changed, 61 insertions(+)
>>
>> diff --git a/docs/qmp-events.txt b/docs/qmp-events.txt
>> index d2f1ce4..19f68fc 100644
>> --- a/docs/qmp-events.txt
>> +++ b/docs/qmp-events.txt
>> @@ -184,6 +184,23 @@ Example:
>>  Note: The "ready to complete" status is always reset by a BLOCK_JOB_ERROR
>>  event.
>>  
>> +COLO_EXIT
>> +---------
>> +
>> +Emitted when VM finishes COLO mode due to some errors happening or
>> +at the request of users.
> 
> How would the event's recipient distinguish between "due to error" and
> "at the user's request"?
> 
>> +
>> +Data:
>> +
>> + - "mode": COLO mode, primary or secondary side (json-string)
>> + - "reason":  the exit reason, internal error or external request. (json-string)
>> + - "error": error message (json-string, optional)
>> +
>> +Example:
>> +
>> +{"timestamp": {"seconds": 2032141960, "microseconds": 417172},
>> + "event": "COLO_EXIT", "data": {"mode": "primary", "reason": "request" } }
>> +
> 
> Pardon my ignorance again...  Does "VM finishes COLO mode" mean we have
> some kind of COLO background job, and it just finished for whatever
> reason?
> 
> If yes, this COLO job could be an instance of the general background job
> concept we're trying to grow from the existing block job concept.
> 
> I'm not asking you to rebase your work onto the background job
> infrastructure, not least for the simple reason that it doesn't exist,
> yet.  But I think it would be fruitful to compare your COLO job
> management QMP interface with the one we have for block jobs.  Not only
> may that avoid unnecessary inconsistency, it could also help shape the
> general background job interface.
> 

Yes. The "background job" concept doesn't exist in a formal way outside
of the block layer yet, but we're looking to expand it as we re-tool the
block jobs themselves.

It may be the case that the COLO commands and events need to go in as
they are now, but later we can bring them back into the generalized job
infrastructure.

> Quick overview of the block job QMP interface:
> 
> * Commands to create a job: block-commit, block-stream, drive-mirror,
>   drive-backup.
> 
> * Get information on jobs: query-block-jobs
> 
> * Pause a job: block-job-pause
> 
> * Resume a job: block-job-resume
> 
> * Cancel a job: block-job-cancel
> 
> * Block job completion events: BLOCK_JOB_COMPLETED, BLOCK_JOB_CANCELLED
> 
> * Block job error event: BLOCK_JOB_ERROR
> 
> * Block job synchronous completion: event BLOCK_JOB_READY and command
>   block-job-complete
> 

The block-agnostic version of these commands would likely be:

query-jobs
job-pause
job-resume
job-cancel
job-complete

Events: JOB_COMPLETED, JOB_CANCELLED, JOB_ERROR, JOB_READY.


It looks like COLO_EXIT would be an instance of JOB_COMPLETED, and if it
occurred due to an error, we'd also see JOB_ERROR emitted.
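
To make that mapping concrete, here is a rough sketch. The JOB_* event names
are only the proposal above and do not exist yet, the function and enum names
are made up for illustration, and emitting JOB_ERROR before JOB_COMPLETED is
an assumption of this sketch:

```c
#include <stddef.h>

typedef enum {
    REASON_REQUEST,
    REASON_ERROR,
} ReasonSketch;

/*
 * Fill @events with the generic job events that would stand in for
 * COLO_EXIT under the mapping described above.
 * @events must have room for two entries.
 * Returns the number of entries stored.
 */
size_t colo_exit_as_job_events(ReasonSketch reason, const char *events[2])
{
    size_t n = 0;

    if (reason == REASON_ERROR) {
        events[n++] = "JOB_ERROR";      /* only on the error path */
    }
    events[n++] = "JOB_COMPLETED";      /* always: the job has ended */
    return n;
}
```
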

>>  DEVICE_DELETED
>>  --------------
>>  
>> diff --git a/migration/colo.c b/migration/colo.c
>> index d1dd4e1..d06c14f 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -18,6 +18,7 @@
>>  #include "qemu/error-report.h"
>>  #include "qemu/sockets.h"
>>  #include "migration/failover.h"
>> +#include "qapi-event.h"
>>  
>>  /* colo buffer */
>>  #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
>> @@ -349,6 +350,11 @@ static void colo_process_checkpoint(MigrationState *s)
>>  out:
>>      if (ret < 0) {
>>          error_report("%s: %s", __func__, strerror(-ret));
>> +        qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_ERROR,
>> +                                  true, strerror(-ret), NULL);
>> +    } else {
>> +        qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_REQUEST,
>> +                                  false, NULL, NULL);
>>      }
>>  
>>      qsb_free(buffer);
>> @@ -516,6 +522,11 @@ out:
>>      if (ret < 0) {
>>          error_report("colo incoming thread will exit, detect error: %s",
>>                       strerror(-ret));
>> +        qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_ERROR,
>> +                                  true, strerror(-ret), NULL);
>> +    } else {
>> +        qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_REQUEST,
>> +                                  false, NULL, NULL);
>>      }
>>  
>>      if (fb) {
>> diff --git a/qapi-schema.json b/qapi-schema.json
>> index feb7d53..f6ecb88 100644
>> --- a/qapi-schema.json
>> +++ b/qapi-schema.json
>> @@ -778,6 +778,22 @@
>>    'data': [ 'unknown', 'primary', 'secondary'] }
>>  
>>  ##
>> +# @COLOExitReason
>> +#
>> +# The reason for a COLO exit
>> +#
>> +# @unknown: unknown reason
> 
> How can @unknown happen?
> 
>> +#
>> +# @request: COLO exit is due to an external request
>> +#
>> +# @error: COLO exit is due to an internal error
>> +#
>> +# Since: 2.6
>> +##
>> +{ 'enum': 'COLOExitReason',
>> +  'data': [ 'unknown', 'request', 'error'] }
>> +
>> +##
>>  # @x-colo-lost-heartbeat
>>  #
>>  # Tell qemu that heartbeat is lost, request it to do takeover procedures.
>> diff --git a/qapi/event.json b/qapi/event.json
>> index f0cef01..f63d456 100644
>> --- a/qapi/event.json
>> +++ b/qapi/event.json
>> @@ -255,6 +255,23 @@
>>    'data': {'status': 'MigrationStatus'}}
>>  
>>  ##
>> +# @COLO_EXIT
>> +#
>> +# Emitted when VM finishes COLO mode due to some errors happening or
>> +# at the request of users.
>> +#
>> +# @mode: which COLO mode the VM was in when it exited.
> 
> Can we get 'unknown' here?
> 
>> +#
>> +# @reason: describes the reason for the COLO exit.
> 
> Can we get 'unknown' here?
> 
>> +#
>> +# @error: #optional, error message. Only present on error happening.
>> +#
>> +# Since: 2.6
>> +##
>> +{ 'event': 'COLO_EXIT',
>> +  'data': {'mode': 'COLOMode', 'reason': 'COLOExitReason', '*error': 'str' } }
>> +
>> +##
>>  # @ACPI_DEVICE_OST
>>  #
>>  # Emitted when guest executes ACPI _OST method.
> 

-- 
—js


* Re: [Qemu-devel] [PATCH COLO-Frame v12 10/38] COLO: Implement colo checkpoint protocol
  2015-12-19  8:54   ` Markus Armbruster
@ 2015-12-22  7:00     ` Hailiang Zhang
  2016-01-11 12:47       ` Markus Armbruster
  0 siblings, 1 reply; 94+ messages in thread
From: Hailiang Zhang @ 2015-12-22  7:00 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, qemu-devel,
	peter.huangpeng, arei.gonglei, stefanha, amit.shah, dgilbert,
	hongyang.yang

Hi Markus,

On 2015/12/19 16:54, Markus Armbruster wrote:
> Jumping in at v12 for a bit of QAPI review (and whatever else caught my
> eye nearby), please pardon my ignorance of COLO in general, and previous
> review of this series in particular.
>

Thanks all the same :)

> zhanghailiang <zhang.zhanghailiang@huawei.com> writes:
>
>> We need communications protocol of user-defined to control the checkpoint
>> process.
>>
>> The new checkpoint request is started by Primary VM, and the interactive process
>> like below:
>> Checkpoint synchronizing points,
>>
>>                         Primary                         Secondary
>>                                                         initial work
>> 'checkpoint-ready'     <------------------------------ @
>>
>> 'checkpoint-request'   @ ----------------------------->
>>                                                         Suspend (Only in hybrid mode)
>> 'checkpoint-reply'     <------------------------------ @
>>                         Suspend&Save state
>> 'vmstate-send'         @ ----------------------------->
>>                         Send state                      Receive state
>> 'vmstate-received'     <------------------------------ @
>>                         Release packets                 Load state
>> 'vmstate-load'         <------------------------------ @
>>                         Resume                          Resume (Only in hybrid mode)
>
> Long lines.  Easy to fix: shorten your arrows.
>
>>                         Start Comparing (Only in hybrid mode)

OK.
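
For reference, the per-checkpoint handshake from the diagram above can be
written down as a small table. This is only a sketch: SyncPoint,
checkpoint_round and next_sync_point() are hypothetical names, the last
message is spelled 'vmstate-loaded' to match COLO_COMMAND_VMSTATE_LOADED in
the code, and the one-time 'checkpoint-ready' sent by the Secondary before
the first round is left out:

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    const char *msg;      /* message name as used in the diagram */
    bool from_primary;    /* true: Primary sends; false: Secondary sends */
} SyncPoint;

/* One checkpoint round, in the order shown in the commit message. */
const SyncPoint checkpoint_round[] = {
    { "checkpoint-request", true  },
    { "checkpoint-reply",   false },
    { "vmstate-send",       true  },
    { "vmstate-received",   false },
    { "vmstate-loaded",     false },
};

#define ROUND_LEN (sizeof(checkpoint_round) / sizeof(checkpoint_round[0]))

/* Index of the sync-point that follows @step; wraps to start a new round. */
size_t next_sync_point(size_t step)
{
    return (step + 1) % ROUND_LEN;
}
```
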

>> NOTE:
>>   1) '@' who sends the message
>>   2) Every sync-point is synchronized by two sides with only
>>      one handshake(single direction) for low-latency.
>>      If more strict synchronization is required, a opposite direction
>>      sync-point should be added.
>>   3) Since sync-points are single direction, the remote side may
>>      go forward a lot when this side just receives the sync-point.
>>   4) For now, we only support 'periodic' checkpoint, for which
>>     the Secondary VM is not running, later we will support 'hybrid' mode.
>
> Useful commit message, but shouldn't this explanation (also) be in the
> source?
>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> Cc: Eric Blake <eblake@redhat.com>
>> ---
>> v12:
>> - Rename colo_ctl_put() to colo_put_cmd()
>> - Rename colo_ctl_get() to colo_get_check_cmd() and drop
>>    the third parameter
>> - Rename colo_ctl_get_cmd() to colo_get_cmd()
>> - Remove useless 'invalid' member for COLOcommand enum.
>> v11:
>> - Add missing 'checkpoint-ready' communication in comment.
>> - Use parameter to return 'value' for colo_ctl_get() (Dave's suggestion)
>> - Fix trace for colo_ctl_get() to trace command and value both
>> v10:
>> - Rename enum COLOCmd to COLOCommand (Eric's suggestion).
>> - Remove unused 'ram-steal'
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> ---
>>   migration/colo.c | 183 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>   qapi-schema.json |  25 ++++++++
>>   trace-events     |   2 +
>>   3 files changed, 208 insertions(+), 2 deletions(-)
>>
>> diff --git a/migration/colo.c b/migration/colo.c
>> index 0ab9618..0ce2a6e 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -10,10 +10,12 @@
>>    * later.  See the COPYING file in the top-level directory.
>>    */
>>
>> +#include <unistd.h>
>>   #include "sysemu/sysemu.h"
>>   #include "migration/colo.h"
>>   #include "trace.h"
>>   #include "qemu/error-report.h"
>> +#include "qemu/sockets.h"
>>
>>   bool colo_supported(void)
>>   {
>> @@ -34,6 +36,100 @@ bool migration_incoming_in_colo_state(void)
>>       return mis && (mis->state == MIGRATION_STATUS_COLO);
>>   }
>>
>> +static int colo_put_cmd(QEMUFile *f, uint32_t cmd)
>> +{
>> +    int ret;
>> +
>> +    if (cmd >= COLO_COMMAND_MAX) {
>
> Needs a trivial rebase due to commit 7fb1cf1.
>

>> +        error_report("%s: Invalid cmd", __func__);
>> +        return -EINVAL;
>
> Can this run in a context with different error handling needs?
>
> Or asked differently: who may ultimately handle this error?  Whoever
> that may be, how does it need to report errors?
>
> Peeking ahead: the immediate callers don't handle this error, they just
> pass it on their callers.
>
> I'm asking because I'm trying to understand whether error_report() is
> appropriate here, or whether you need to use error_setg(), and leave the
> actual reporting to the spot that ultimately handles this error.
>

Hmm, I know what you mean. We handle them all together after exiting the COLO process loop.
Using error_setg() seems to be a good idea; with this modification, we can also drop the return
value. I will fix it in the next version.
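
A rough sketch of that direction: the low-level helper only records the
failure, and the one place at the top of the loop reports it. ErrorSketch and
sketch_error_set() are tiny stand-ins written for this example, not QEMU's
real Error API, and CMD_MAX_SKETCH is an arbitrary bound:

```c
#include <stdio.h>
#include <stdlib.h>

/* Tiny stand-in for an error object; NOT QEMU's Error type. */
typedef struct ErrorSketch {
    char msg[128];
} ErrorSketch;

/* Stand-in for error_setg(): store a message if the caller asked for one. */
void sketch_error_set(ErrorSketch **errp, const char *msg)
{
    if (errp && !*errp) {
        *errp = calloc(1, sizeof(**errp));
        if (*errp) {
            snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
        }
    }
}

#define CMD_MAX_SKETCH 6u   /* arbitrary bound for the example */

/* Low-level helper: records the failure but does not report it. */
void put_cmd_sketch(unsigned cmd, ErrorSketch **errp)
{
    if (cmd >= CMD_MAX_SKETCH) {
        sketch_error_set(errp, "invalid COLO command");
        return;
    }
    /* ... the command would be written to the stream here ... */
}

/* Top of the call chain: the single place that reports. Returns 0 or -1. */
int checkpoint_loop_sketch(unsigned cmd)
{
    ErrorSketch *err = NULL;

    put_cmd_sketch(cmd, &err);
    if (err) {
        fprintf(stderr, "COLO: %s\n", err->msg);
        free(err);
        return -1;
    }
    return 0;
}
```
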


>> +    }
>> +    qemu_put_be32(f, cmd);
>> +    qemu_fflush(f);
>> +
>> +    ret = qemu_file_get_error(f);
>> +    trace_colo_put_cmd(COLOCommand_lookup[cmd]);
>> +
>> +    return ret;
>> +}
>
> Looks like @cmd is a COLOCommand.  Why is the parameter type uint32_t?
>

OK, I will change it to use enum COLOCommand.

>> +
>> +static int colo_get_cmd(QEMUFile *f, uint32_t *cmd)
>> +{
>> +    int ret;
>> +
>> +    *cmd = qemu_get_be32(f);
>> +    ret = qemu_file_get_error(f);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +    if (*cmd >= COLO_COMMAND_MAX) {
>> +        error_report("%s: Invalid cmd", __func__);
>> +        return -EINVAL;
>> +    }
>> +    trace_colo_get_cmd(COLOCommand_lookup[*cmd]);
>> +    return 0;
>> +}
>
> Same question.
>
> The "get" in the name suggests the function returns the value gotten,
> like similarly named function elsewhere in migration/ do.
>
Do you mean it should return the cmd value directly, rather than through a parameter?
Once we convert it to use error_setg() to indicate success or failure, we can do that.
I will fix it.
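
Something along these lines, as a sketch only. It decodes from a plain byte
buffer instead of a QEMUFile, uses a 'const char **' out-parameter as a
simplified stand-in for 'Error **errp', and the CmdSketch enum is a
shortened, hypothetical version of COLOCommand:

```c
#include <stddef.h>
#include <stdint.h>

typedef enum {
    CMD_CHECKPOINT_READY,
    CMD_CHECKPOINT_REQUEST,
    CMD_CHECKPOINT_REPLY,
    CMD_MAX,
} CmdSketch;

/*
 * Decode a big-endian 32-bit command from @buf and return it directly.
 * On an out-of-range value, set *@err_msg and return CMD_MAX;
 * on success, *@err_msg is set to NULL.
 */
CmdSketch get_cmd_sketch(const uint8_t buf[4], const char **err_msg)
{
    uint32_t raw = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
                   ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];

    if (raw >= (uint32_t)CMD_MAX) {
        *err_msg = "invalid COLO command";
        return CMD_MAX;
    }
    *err_msg = NULL;
    return (CmdSketch)raw;
}
```
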

>> +
>> +static int colo_get_check_cmd(QEMUFile *f, uint32_t expect_cmd)
>> +{
>> +    int ret;
>> +    uint32_t cmd;
>> +
>> +    ret = colo_get_cmd(f, &cmd);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +    if (cmd != expect_cmd) {
>> +        error_report("Unexpect colo command, expect:%d, but got cmd:%d",
>
> Grammar nit: "Unexpected".  Suggest: "Unexpected COLO command %d,
> expected %d".
>

Will fix it.

>> +                     expect_cmd, cmd);
>> +        return -EINVAL;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static int colo_do_checkpoint_transaction(MigrationState *s)
>> +{
>> +    int ret;
>> +
>> +    ret = colo_put_cmd(s->to_dst_file, COLO_COMMAND_CHECKPOINT_REQUEST);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>> +    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
>> +                             COLO_COMMAND_CHECKPOINT_REPLY);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>> +    /* TODO: suspend and save vm state to colo buffer */
>> +
>> +    ret = colo_put_cmd(s->to_dst_file, COLO_COMMAND_VMSTATE_SEND);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>> +    /* TODO: send vmstate to Secondary */
>> +
>> +    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
>> +                             COLO_COMMAND_VMSTATE_RECEIVED);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>> +    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
>> +                             COLO_COMMAND_VMSTATE_LOADED);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>> +    /* TODO: resume Primary */
>> +
>> +out:
>> +    return ret;
>> +}
>> +
>>   static void colo_process_checkpoint(MigrationState *s)
>>   {
>>       int ret = 0;
>> @@ -45,12 +141,28 @@ static void colo_process_checkpoint(MigrationState *s)
>>           goto out;
>>       }
>>
>> +    /*
>> +     * Wait for Secondary finish loading vm states and enter COLO
>> +     * restore.
>> +     */
>> +    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
>> +                             COLO_COMMAND_CHECKPOINT_READY);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>>       qemu_mutex_lock_iothread();
>>       vm_start();
>>       qemu_mutex_unlock_iothread();
>>       trace_colo_vm_state_change("stop", "run");
>>
>> -    /*TODO: COLO checkpoint savevm loop*/
>> +    while (s->state == MIGRATION_STATUS_COLO) {
>> +        /* start a colo checkpoint */
>> +        ret = colo_do_checkpoint_transaction(s);
>> +        if (ret < 0) {
>> +            goto out;
>> +        }
>> +    }
>>
>>   out:
>>       if (ret < 0) {
>> @@ -73,6 +185,31 @@ void migrate_start_colo_process(MigrationState *s)
>>       qemu_mutex_lock_iothread();
>>   }
>>
>> +/*
>> + * return:
>> + * 0: start a checkpoint
>> + * -1: some error happened, exit colo restore
>> + */
>
> Suggest to make this a proper function comment, i.e.
>

Good catch, I will fix it as you suggest.

> /*
>   * One line describing purpose
>   * As many additional lines as it takes to further explain what it does,
>   * preconditions, side effects, return values, error conditions.  Use
>   * @name to refer to parameters.
>   */
>
>> +static int colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request)
>> +{
>> +    int ret;
>> +    uint32_t cmd;
>> +
>> +    ret = colo_get_cmd(f, &cmd);
>> +    if (ret < 0) {
>> +        /* do failover ? */
>> +        return ret;
>> +    }
>> +
>> +    switch (cmd) {
>> +    case COLO_COMMAND_CHECKPOINT_REQUEST:
>> +        *checkpoint_request = 1;
>> +        return 0;
>> +    default:
>> +        return -EINVAL;
>> +    }
>
> switch makes sense only if you're going to add cases.
>

Yes, we will add COLO_COMMAND_GUEST_SHUTDOWN in a later patch,
and maybe more cases in the future.

> Suggest to set *checkpoint_request = 0 on error, for robustness.
>

OK.
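
A minimal sketch of what the reworked helper could look like, with the
function comment and the reset of *checkpoint_request on every exit path.
FakeFile, fake_colo_get_cmd() and wait_handle_cmd() are hypothetical
stand-ins so that the example compiles on its own; they are not the QEMU
types or the final code:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMUFile, only for this sketch. */
typedef struct FakeFile {
    int error;      /* negative errno to simulate a read failure, else 0 */
    uint32_t cmd;   /* command returned by the next read */
} FakeFile;

/* Reduced command set, names taken from the patch. */
enum { COLO_COMMAND_CHECKPOINT_READY, COLO_COMMAND_CHECKPOINT_REQUEST };

/* Stub for colo_get_cmd(): 0 and the command on success, negative errno on error. */
static int fake_colo_get_cmd(FakeFile *f, uint32_t *cmd)
{
    if (f->error < 0) {
        return f->error;
    }
    *cmd = f->cmd;
    return 0;
}

/*
 * Wait for a command from the Primary and handle it.
 * Sets *@checkpoint_request to 1 when a checkpoint is requested and to 0
 * in every other case, including errors.
 * Returns 0 on success, a negative errno value on failure.
 */
static int wait_handle_cmd(FakeFile *f, int *checkpoint_request)
{
    uint32_t cmd;
    int ret;

    *checkpoint_request = 0;    /* defined value on every exit path */

    ret = fake_colo_get_cmd(f, &cmd);
    if (ret < 0) {
        return ret;
    }

    switch (cmd) {
    case COLO_COMMAND_CHECKPOINT_REQUEST:
        *checkpoint_request = 1;
        return 0;
    default:
        return -EINVAL;
    }
}
```

With the reset done first, the caller never reads a stale value, whatever
the return code is.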

>> +}
>> +
>>   void *colo_process_incoming_thread(void *opaque)
>>   {
>>       MigrationIncomingState *mis = opaque;
>> @@ -93,7 +230,49 @@ void *colo_process_incoming_thread(void *opaque)
>>       */
>>       qemu_set_block(qemu_get_fd(mis->from_src_file));
>>
>> -    /* TODO: COLO checkpoint restore loop */
>> +
>> +    ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_CHECKPOINT_READY);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>> +    while (mis->state == MIGRATION_STATUS_COLO) {
>> +        int request = 0;
>
> Dead initialization.
>

>> +        int ret = colo_wait_handle_cmd(mis->from_src_file, &request);
>> +
>> +        if (ret < 0) {
>> +            break;
>> +        } else {
>> +            if (!request) {
>> +                continue;
>> +            }
>> +        }
>
> Convoluted nesting.  Suggest
>
>          if (ret < 0) {
>              break;
>          }
>          if (!request) {
>              continue;
>          }
>
> Actually, !request can't happen, so I'd make it.
>
>          if (ret < 0) {
>              break;
>          }
>          assert(request);
>
> until it can happen.
>

Yes, you are right, it should never happen.

>> +        /* FIXME: This is unnecessary for periodic checkpoint mode */
>
> When you add a FIXME, you should probably point to it in your commit
> message.  May not be necessary when the FIXME goes away later in this
> series.
>
> Pretty much the same for TODO.
>
>> +        ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_CHECKPOINT_REPLY);
>> +        if (ret < 0) {
>> +            goto out;
>
> Above, you used break to "break" the loop on error.  Here, you use "goto
> out".  Suggest to pick one and stick to it.
>
>> +        }
>> +
>> +        ret = colo_get_check_cmd(mis->from_src_file,
>> +                                 COLO_COMMAND_VMSTATE_SEND);
>> +        if (ret < 0) {
>> +            goto out;
>> +        }
>> +
>> +        /* TODO: read migration data into colo buffer */
>> +
>> +        ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_VMSTATE_RECEIVED);
>> +        if (ret < 0) {
>> +            goto out;
>> +        }
>> +
>> +        /* TODO: load vm state */
>> +
>> +        ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_VMSTATE_LOADED);
>> +        if (ret < 0) {
>> +            goto out;
>> +        }
>> +    }
>>
>>   out:
>>       if (ret < 0) {
>> diff --git a/qapi-schema.json b/qapi-schema.json
>> index c9ff34e..85f7800 100644
>> --- a/qapi-schema.json
>> +++ b/qapi-schema.json
>> @@ -720,6 +720,31 @@
>>   { 'command': 'migrate-start-postcopy' }
>>
>>   ##
>> +# @COLOCommand
>> +#
>> +# The commands for COLO fault tolerance
>> +#
>> +# @checkpoint-ready: SVM is ready for checkpointing
>> +#
>> +# @checkpoint-request: PVM tells SVM to prepare for new checkpointing
>> +#
>> +# @checkpoint-reply: SVM gets PVM's checkpoint request
>> +#
>> +# @vmstate-send: VM's state will be sent by PVM.
>> +#
>> +# @vmstate-size: The total size of VMstate.
>> +#
>> +# @vmstate-received: VM's state has been received by SVM.
>> +#
>> +# @vmstate-loaded: VM's state has been loaded by SVM.
>> +#
>> +# Since: 2.6
>> +##
>> +{ 'enum': 'COLOCommand',
>> +  'data': [ 'checkpoint-ready', 'checkpoint-request', 'checkpoint-reply',
>> +            'vmstate-send', 'vmstate-size','vmstate-received',
>> +            'vmstate-loaded' ] }
>> +
>
> Space after 'vmstate-size', please.
>

> 'vmstate-size' is not used in this patch.  You may want to add it with
> its first use instead.
>

OK, I will move it to the corresponding patch.

> Should this enum really be named "COLOCommand"?  'checkpoint-ready',
> 'checkpoint-request', 'vmstate-send' look like commands to me, but the
> others look like replies.
>

Yes, COLOCommand is not quite accurate. What about naming it COLOProtocol?

>
>>   # @MouseInfo:
>>   #
>>   # Information about a mouse device.
>> diff --git a/trace-events b/trace-events
>> index 5565e79..39fdd8d 100644
>> --- a/trace-events
>> +++ b/trace-events
>> @@ -1579,6 +1579,8 @@ postcopy_ram_incoming_cleanup_join(void) ""
>>
>>   # migration/colo.c
>>   colo_vm_state_change(const char *old, const char *new) "Change '%s' => '%s'"
>> +colo_put_cmd(const char *msg) "Send '%s' cmd"
>> +colo_get_cmd(const char *msg) "Receive '%s' cmd"
>>
>>   # kvm-all.c
>>   kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
>
> I like how this commit creates just the two state machines, and leaves
> filling in their actions to later commits.  Helps ignorant reviewers
> like me :)
>
>

Do you mean I should split this patch? That is, leave this patch with the simplest COLO process,
maybe just 'ready, request, reply', and add the other states in a later patch?
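
For reference, the message order that this patch implements for one
checkpoint round can be sketched as below. The Msg, Sender and Step types
and is_valid_round() are illustrative names only and do not exist in the
series:

```c
#include <stddef.h>

/* Messages exchanged in this patch (names shortened for the sketch). */
typedef enum {
    MSG_CHECKPOINT_READY,   /* sent once by the Secondary before the first round */
    MSG_CHECKPOINT_REQUEST,
    MSG_CHECKPOINT_REPLY,
    MSG_VMSTATE_SEND,
    MSG_VMSTATE_RECEIVED,
    MSG_VMSTATE_LOADED
} Msg;

typedef enum { FROM_PRIMARY, FROM_SECONDARY } Sender;

typedef struct {
    Msg msg;
    Sender sender;
} Step;

/* One checkpoint round, in the order used by the two loops in the patch. */
static const Step checkpoint_round[] = {
    { MSG_CHECKPOINT_REQUEST, FROM_PRIMARY   },
    { MSG_CHECKPOINT_REPLY,   FROM_SECONDARY },
    { MSG_VMSTATE_SEND,       FROM_PRIMARY   },
    { MSG_VMSTATE_RECEIVED,   FROM_SECONDARY },
    { MSG_VMSTATE_LOADED,     FROM_SECONDARY },
};

#define ROUND_LEN (sizeof(checkpoint_round) / sizeof(checkpoint_round[0]))

/* Return 1 if trace[0..n) is exactly one checkpoint round, else 0. */
static int is_valid_round(const Msg *trace, size_t n)
{
    size_t i;

    if (n != ROUND_LEN) {
        return 0;
    }
    for (i = 0; i < n; i++) {
        if (trace[i] != checkpoint_round[i].msg) {
            return 0;
        }
    }
    return 1;
}
```

A split patch would then start with only the first two entries of the
table and add the vmstate entries later.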

Thanks,
Hailiang

> .
>


* Re: [Qemu-devel] [PATCH COLO-Frame v12 11/38] COLO: Add a new RunState RUN_STATE_COLO
  2015-12-19  9:27   ` Markus Armbruster
@ 2015-12-22 13:32     ` Hailiang Zhang
  2016-01-11 13:16       ` Markus Armbruster
  0 siblings, 1 reply; 94+ messages in thread
From: Hailiang Zhang @ 2015-12-22 13:32 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, dgilbert,
	hongyang.yang

On 2015/12/19 17:27, Markus Armbruster wrote:
> zhanghailiang <zhang.zhanghailiang@huawei.com> writes:
>
>> Guest will enter this state when paused to save/restore VM state
>> under colo checkpoint.
>>
>> Cc: Eric Blake <eblake@redhat.com>
>> Cc: Markus Armbruster <armbru@redhat.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> Reviewed-by: Eric Blake <eblake@redhat.com>
>> ---
>>   qapi-schema.json | 5 ++++-
>>   vl.c             | 8 ++++++++
>>   2 files changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/qapi-schema.json b/qapi-schema.json
>> index 85f7800..0423b47 100644
>> --- a/qapi-schema.json
>> +++ b/qapi-schema.json
>> @@ -154,12 +154,15 @@
>>   # @watchdog: the watchdog action is configured to pause and has been triggered
>>   #
>>   # @guest-panicked: guest has been panicked as a result of guest OS panic
>> +#
>> +# @colo: guest is paused to save/restore VM state under colo checkpoint (since
>> +# 2.6)
>>   ##
>>   { 'enum': 'RunState',
>>     'data': [ 'debug', 'inmigrate', 'internal-error', 'io-error', 'paused',
>>               'postmigrate', 'prelaunch', 'finish-migrate', 'restore-vm',
>>               'running', 'save-vm', 'shutdown', 'suspended', 'watchdog',
>> -            'guest-panicked' ] }
>> +            'guest-panicked', 'colo' ] }
>>
>>   ##
>>   # @StatusInfo:
>> diff --git a/vl.c b/vl.c
>> index f84fde8..fca630b 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -594,6 +594,7 @@ static const RunStateTransition runstate_transitions_def[] = {
>>       { RUN_STATE_INMIGRATE, RUN_STATE_WATCHDOG },
>>       { RUN_STATE_INMIGRATE, RUN_STATE_GUEST_PANICKED },
>>       { RUN_STATE_INMIGRATE, RUN_STATE_FINISH_MIGRATE },
>> +    { RUN_STATE_INMIGRATE, RUN_STATE_COLO },
>>
>>       { RUN_STATE_INTERNAL_ERROR, RUN_STATE_PAUSED },
>>       { RUN_STATE_INTERNAL_ERROR, RUN_STATE_FINISH_MIGRATE },
>> @@ -603,6 +604,7 @@ static const RunStateTransition runstate_transitions_def[] = {
>>
>>       { RUN_STATE_PAUSED, RUN_STATE_RUNNING },
>>       { RUN_STATE_PAUSED, RUN_STATE_FINISH_MIGRATE },
>> +    { RUN_STATE_PAUSED, RUN_STATE_COLO},
>>
>>       { RUN_STATE_POSTMIGRATE, RUN_STATE_RUNNING },
>>       { RUN_STATE_POSTMIGRATE, RUN_STATE_FINISH_MIGRATE },
>> @@ -613,9 +615,12 @@ static const RunStateTransition runstate_transitions_def[] = {
>>
>>       { RUN_STATE_FINISH_MIGRATE, RUN_STATE_RUNNING },
>>       { RUN_STATE_FINISH_MIGRATE, RUN_STATE_POSTMIGRATE },
>> +    { RUN_STATE_FINISH_MIGRATE, RUN_STATE_COLO},
>>
>>       { RUN_STATE_RESTORE_VM, RUN_STATE_RUNNING },
>>
>> +    { RUN_STATE_COLO, RUN_STATE_RUNNING },
>> +
>>       { RUN_STATE_RUNNING, RUN_STATE_DEBUG },
>>       { RUN_STATE_RUNNING, RUN_STATE_INTERNAL_ERROR },
>>       { RUN_STATE_RUNNING, RUN_STATE_IO_ERROR },
>> @@ -626,6 +631,7 @@ static const RunStateTransition runstate_transitions_def[] = {
>>       { RUN_STATE_RUNNING, RUN_STATE_SHUTDOWN },
>>       { RUN_STATE_RUNNING, RUN_STATE_WATCHDOG },
>>       { RUN_STATE_RUNNING, RUN_STATE_GUEST_PANICKED },
>> +    { RUN_STATE_RUNNING, RUN_STATE_COLO},
>>
>>       { RUN_STATE_SAVE_VM, RUN_STATE_RUNNING },
>>
>> @@ -636,9 +642,11 @@ static const RunStateTransition runstate_transitions_def[] = {
>>       { RUN_STATE_RUNNING, RUN_STATE_SUSPENDED },
>>       { RUN_STATE_SUSPENDED, RUN_STATE_RUNNING },
>>       { RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE },
>> +    { RUN_STATE_SUSPENDED, RUN_STATE_COLO},
>>
>>       { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
>>       { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
>> +    { RUN_STATE_WATCHDOG, RUN_STATE_COLO},
>>
>>       { RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING },
>>       { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE },
>
> Pardon my ignorance, but could you explain the new run state in a bit
> more detail for me?
>

OK. Normally, we only need to switch between the COLO and RUNNING states.
But we can't forbid users from issuing other commands while the VM is in COLO state.

In every checkpoint, we have to pause the VM to send its state to the SVM. Before we
pause the VM, users may issue a 'stop' command, which changes the state to 'RUN_STATE_PAUSED'.
We don't want to abort the VM because of this command. (Actually, we will support stopping
the VM while it is in COLO state.) So we need the transition 'RUN_STATE_PAUSED -> RUN_STATE_COLO'.
We enter COLO state just after a full migration process, whose last state will be
'RUN_STATE_FINISH_MIGRATE' or 'RUN_STATE_INMIGRATE'. Before we enter the COLO loop, we may get
'x-colo-lost-heartbeat' and go into the 'RUN_STATE_COLO' pause, so we need the
transitions 'RUN_STATE_FINISH_MIGRATE -> RUN_STATE_COLO' and 'RUN_STATE_INMIGRATE -> RUN_STATE_COLO'.
The reason we need 'RUN_STATE_SUSPENDED -> RUN_STATE_COLO' is that the guest or users may issue a standby command.
We need to ensure the VM does not crash.

Actually, we may need more states that can go to the 'colo' state; maybe we should just follow the cases of
the 'MIGRATE' state.
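
To make the asymmetry easier to see, here is a small standalone sketch of
the 'colo' transitions added by this patch. The State enum, the table name
and transition_allowed() are made up for the example; the real table is
runstate_transitions_def[] in vl.c:

```c
#include <stdbool.h>
#include <stddef.h>

/* Reduced, illustrative copy of the run states touched by this patch. */
typedef enum {
    ST_INMIGRATE,
    ST_PAUSED,
    ST_FINISH_MIGRATE,
    ST_RUNNING,
    ST_SUSPENDED,
    ST_WATCHDOG,
    ST_COLO
} State;

typedef struct {
    State from;
    State to;
} Transition;

/* Only the transitions that involve the 'colo' state in this patch. */
static const Transition colo_transitions[] = {
    { ST_INMIGRATE,      ST_COLO },
    { ST_PAUSED,         ST_COLO },
    { ST_FINISH_MIGRATE, ST_COLO },
    { ST_RUNNING,        ST_COLO },
    { ST_SUSPENDED,      ST_COLO },
    { ST_WATCHDOG,       ST_COLO },
    { ST_COLO,           ST_RUNNING },   /* the only way out of 'colo' */
};

/* Return true if the table lists the transition from -> to. */
static bool transition_allowed(State from, State to)
{
    size_t i;
    size_t n = sizeof(colo_transitions) / sizeof(colo_transitions[0]);

    for (i = 0; i < n; i++) {
        if (colo_transitions[i].from == from && colo_transitions[i].to == to) {
            return true;
        }
    }
    return false;
}
```

Many states lead into 'colo' because a user command can change the run
state just before a checkpoint pauses the VM, while 'colo' itself only
returns to 'running'.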

Thanks,
zhanghailiang

> Your additions to runstate_transitions_def[] show we can go *from* state
> 'colo' only to state 'running', but we can go *to* state 'colo' from
> various other states.  This may well be sane, but it's not *obviously*
> sane :)
>
> .
>


* Re: [Qemu-devel] [PATCH COLO-Frame v12 19/38] COLO: Add checkpoint-delay parameter for migrate-set-parameters
  2015-12-19  9:33   ` Markus Armbruster
@ 2015-12-22 13:43     ` Hailiang Zhang
  0 siblings, 0 replies; 94+ messages in thread
From: Hailiang Zhang @ 2015-12-22 13:43 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, Luiz Capitulino,
	dgilbert, hongyang.yang

On 2015/12/19 17:33, Markus Armbruster wrote:
> zhanghailiang <zhang.zhanghailiang@huawei.com> writes:
>
>> Add checkpoint-delay parameter for migrate-set-parameters, so that
>> we can control the checkpoint frequency when COLO is in periodic mode.
>>
>> Cc: Luiz Capitulino <lcapitulino@redhat.com>
>> Cc: Eric Blake <eblake@redhat.com>
>> Cc: Markus Armbruster <armbru@redhat.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> ---
>> v12:
>> - Change checkpoint-delay to x-checkpoint-delay (Dave's suggestion)
>> - Add Reviewed-by tag
>> v11:
>> - Move this patch ahead of the patch where uses 'checkpoint_delay'
>>   (Dave's suggestion)
>> v10:
>> - Fix related qmp command
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> ---
>>   hmp.c                 |  7 +++++++
>>   migration/migration.c | 24 +++++++++++++++++++++++-
>>   qapi-schema.json      | 19 ++++++++++++++++---
>>   qmp-commands.hx       |  4 ++--
>>   4 files changed, 48 insertions(+), 6 deletions(-)
>>
>> diff --git a/hmp.c b/hmp.c
>> index 2140605..ee87d38 100644
>> --- a/hmp.c
>> +++ b/hmp.c
>> @@ -284,6 +284,9 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
>>           monitor_printf(mon, " %s: %" PRId64,
>>               MigrationParameter_lookup[MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT],
>>               params->x_cpu_throttle_increment);
>> +        monitor_printf(mon, " %s: %" PRId64,
>> +            MigrationParameter_lookup[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY],
>> +            params->x_checkpoint_delay);
>>           monitor_printf(mon, "\n");
>>       }
>>
>> @@ -1237,6 +1240,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
>>       bool has_decompress_threads = false;
>>       bool has_x_cpu_throttle_initial = false;
>>       bool has_x_cpu_throttle_increment = false;
>> +    bool has_x_checkpoint_delay = false;
>>       int i;
>>
>>       for (i = 0; i < MIGRATION_PARAMETER_MAX; i++) {
>> @@ -1256,6 +1260,8 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
>>                   break;
>>               case MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT:
>>                   has_x_cpu_throttle_increment = true;
>> +            case MIGRATION_PARAMETER_X_CHECKPOINT_DELAY:
>> +                has_x_checkpoint_delay = true;
>>                   break;
>>               }
>>               qmp_migrate_set_parameters(has_compress_level, value,
>> @@ -1263,6 +1269,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
>>                                          has_decompress_threads, value,
>>                                          has_x_cpu_throttle_initial, value,
>>                                          has_x_cpu_throttle_increment, value,
>> +                                       has_x_checkpoint_delay, value,
>>                                          &err);
>>               break;
>>           }
>> diff --git a/migration/migration.c b/migration/migration.c
>> index a1074c3..8988358 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -56,6 +56,11 @@
>>   /* Migration XBZRLE default cache size */
>>   #define DEFAULT_MIGRATE_CACHE_SIZE (64 * 1024 * 1024)
>>
>> +/* The delay time (in ms) between two COLO checkpoints
>> + * Note: Please change this default value to 10000 when we support hybrid mode.
>> + */
>> +#define DEFAULT_MIGRATE_X_CHECKPOINT_DELAY 200
>> +
>>   static NotifierList migration_state_notifiers =
>>       NOTIFIER_LIST_INITIALIZER(migration_state_notifiers);
>>
>> @@ -91,6 +96,8 @@ MigrationState *migrate_get_current(void)
>>                   DEFAULT_MIGRATE_X_CPU_THROTTLE_INITIAL,
>>           .parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT] =
>>                   DEFAULT_MIGRATE_X_CPU_THROTTLE_INCREMENT,
>> +        .parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY] =
>> +                DEFAULT_MIGRATE_X_CHECKPOINT_DELAY,
>>       };
>>
>>       if (!once) {
>> @@ -530,6 +537,8 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
>>               s->parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INITIAL];
>>       params->x_cpu_throttle_increment =
>>               s->parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT];
>> +    params->x_checkpoint_delay =
>> +            s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY];
>>
>>       return params;
>>   }
>> @@ -736,7 +745,10 @@ void qmp_migrate_set_parameters(bool has_compress_level,
>>                                   bool has_x_cpu_throttle_initial,
>>                                   int64_t x_cpu_throttle_initial,
>>                                   bool has_x_cpu_throttle_increment,
>> -                                int64_t x_cpu_throttle_increment, Error **errp)
>> +                                int64_t x_cpu_throttle_increment,
>> +                                bool has_x_checkpoint_delay,
>> +                                int64_t x_checkpoint_delay,
>> +                                Error **errp)
>>   {
>>       MigrationState *s = migrate_get_current();
>>
>> @@ -771,6 +783,11 @@ void qmp_migrate_set_parameters(bool has_compress_level,
>>                      "x_cpu_throttle_increment",
>>                      "an integer in the range of 1 to 99");
>>       }
>> +    if (has_x_checkpoint_delay && (x_checkpoint_delay < 0)) {
>> +        error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
>> +                    "x_checkpoint_delay",
>> +                    "is invalid, it should be positive");
>> +    }
>>
>>       if (has_compress_level) {
>>           s->parameters[MIGRATION_PARAMETER_COMPRESS_LEVEL] = compress_level;
>> @@ -791,6 +808,11 @@ void qmp_migrate_set_parameters(bool has_compress_level,
>>           s->parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT] =
>>                                                       x_cpu_throttle_increment;
>>       }
>> +
>> +    if (has_x_checkpoint_delay) {
>> +        s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY] =
>> +                                                    x_checkpoint_delay;
>> +    }
>>   }
>>
>>   void qmp_migrate_start_postcopy(Error **errp)
>> diff --git a/qapi-schema.json b/qapi-schema.json
>> index 0423b47..a5699a7 100644
>> --- a/qapi-schema.json
>> +++ b/qapi-schema.json
>> @@ -623,11 +623,16 @@
>>   # @x-cpu-throttle-increment: throttle percentage increase each time
>>   #                            auto-converge detects that migration is not making
>>   #                            progress. The default value is 10. (Since 2.5)
>> +#
>> +# @x-checkpoint-delay: The delay time (in ms) between two COLO checkpoints in
>> +#          periodic mode. (Since 2.6)
>> +#
>>   # Since: 2.4
>>   ##
>>   { 'enum': 'MigrationParameter',
>>     'data': ['compress-level', 'compress-threads', 'decompress-threads',
>> -           'x-cpu-throttle-initial', 'x-cpu-throttle-increment'] }
>> +           'x-cpu-throttle-initial', 'x-cpu-throttle-increment',
>> +           'x-checkpoint-delay' ] }
>>
>>   #
>>   # @migrate-set-parameters
>> @@ -647,6 +652,9 @@
>>   # @x-cpu-throttle-increment: throttle percentage increase each time
>>   #                            auto-converge detects that migration is not making
>>   #                            progress. The default value is 10. (Since 2.5)
>> +#
>> +# @x-checkpoint-delay: the delay time between two checkpoints. (Since 2.6)
>> +#
>
> Unit?  I guess it's ms, as above.
>

Yes, I will add it.

>>   # Since: 2.4
>>   ##
>>   { 'command': 'migrate-set-parameters',
>> @@ -654,7 +662,8 @@
>>               '*compress-threads': 'int',
>>               '*decompress-threads': 'int',
>>               '*x-cpu-throttle-initial': 'int',
>> -            '*x-cpu-throttle-increment': 'int'} }
>> +            '*x-cpu-throttle-increment': 'int',
>> +            '*x-checkpoint-delay': 'int' } }
>>
>>   #
>>   # @MigrationParameters
>> @@ -673,6 +682,8 @@
>>   #                            auto-converge detects that migration is not making
>>   #                            progress. The default value is 10. (Since 2.5)
>>   #
>> +# @x-checkpoint-delay: the delay time between two COLO checkpoints. (Since 2.6)
>> +#
>
> Same question.
>

OK.

>>   # Since: 2.4
>>   ##
>>   { 'struct': 'MigrationParameters',
>> @@ -680,7 +691,9 @@
>>               'compress-threads': 'int',
>>               'decompress-threads': 'int',
>>               'x-cpu-throttle-initial': 'int',
>> -            'x-cpu-throttle-increment': 'int'} }
>> +            'x-cpu-throttle-increment': 'int',
>> +            'x-checkpoint-delay': 'int'} }
>> +
>>   ##
>>   # @query-migrate-parameters
>>   #
>
> x-checkpoint-delay intentionally not added to MigrationInfo?
>

Yes. For now we show the value in query-migrate-parameters;
we don't export any COLO info in MigrationInfo yet. We will add it
later, after this series is merged.


>> diff --git a/qmp-commands.hx b/qmp-commands.hx
>> index 91979b4..89756c9 100644
>> --- a/qmp-commands.hx
>> +++ b/qmp-commands.hx
>> @@ -3651,7 +3651,7 @@ Set migration parameters
>>   - "compress-level": set compression level during migration (json-int)
>>   - "compress-threads": set compression thread count for migration (json-int)
>>   - "decompress-threads": set decompression thread count for migration (json-int)
>> -
>> +- "x-checkpoint-delay": set the delay time for periodic checkpoint (json-int)
>
> Unit?
>

As above, ms.

>>   Arguments:
>>
>>   Example:
>> @@ -3664,7 +3664,7 @@ EQMP
>>       {
>>           .name       = "migrate-set-parameters",
>>           .args_type  =
>> -            "compress-level:i?,compress-threads:i?,decompress-threads:i?",
>> +            "compress-level:i?,compress-threads:i?,decompress-threads:i?,x-checkpoint-delay:i?",
>>           .mhandler.cmd_new = qmp_marshal_migrate_set_parameters,
>>       },
>>   SQMP
>
> .
>

Thanks.
Hailiang


* Re: [Qemu-devel] [PATCH COLO-Frame v12 21/38] COLO failover: Introduce a new command to trigger a failover
  2015-12-19  9:38   ` Markus Armbruster
@ 2015-12-22 13:50     ` Hailiang Zhang
  2015-12-25  2:27       ` Hailiang Zhang
  0 siblings, 1 reply; 94+ messages in thread
From: Hailiang Zhang @ 2015-12-22 13:50 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, Luiz Capitulino,
	dgilbert, hongyang.yang

On 2015/12/19 17:38, Markus Armbruster wrote:
> zhanghailiang <zhang.zhanghailiang@huawei.com> writes:
>
>> We leave users to choose whatever heartbeat solution they want, if the heartbeat
>> is lost, or other errors they detect, they can use experimental command
>> 'x_colo_lost_heartbeat' to tell COLO to do failover, COLO will do operations
>> accordingly.
>>
>> For example, if the command is sent to the PVM, the Primary side will
>> exit COLO mode and take over operation. If sent to the Secondary, the
>> secondary will run failover work, then take over server operation to
>> become the new Primary.
>>
>> Cc: Luiz Capitulino <lcapitulino@redhat.com>
>> Cc: Eric Blake <eblake@redhat.com>
>> Cc: Markus Armbruster <armbru@redhat.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> ---
>> v11:
>> - Add more comments for x-colo-lost-heartbeat command (Eric's suggestion)
>> - Return 'enum' instead of 'int' for get_colo_mode() (Eric's suggestion)
>> v10:
>> - Rename command colo_lost_hearbeat to experimental 'x_colo_lost_heartbeat'
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> ---
>>   hmp-commands.hx              | 15 +++++++++++++++
>>   hmp.c                        |  8 ++++++++
>>   hmp.h                        |  1 +
>>   include/migration/colo.h     |  3 +++
>>   include/migration/failover.h | 20 ++++++++++++++++++++
>>   migration/Makefile.objs      |  2 +-
>>   migration/colo-comm.c        | 11 +++++++++++
>>   migration/colo-failover.c    | 41 +++++++++++++++++++++++++++++++++++++++++
>>   migration/colo.c             |  1 +
>>   qapi-schema.json             | 29 +++++++++++++++++++++++++++++
>>   qmp-commands.hx              | 19 +++++++++++++++++++
>>   stubs/migration-colo.c       |  8 ++++++++
>>   12 files changed, 157 insertions(+), 1 deletion(-)
>>   create mode 100644 include/migration/failover.h
>>   create mode 100644 migration/colo-failover.c
>>
>> diff --git a/hmp-commands.hx b/hmp-commands.hx
>> index bb52e4d..a381b0b 100644
>> --- a/hmp-commands.hx
>> +++ b/hmp-commands.hx
>> @@ -1039,6 +1039,21 @@ migration (or once already in postcopy).
>>   ETEXI
>>
>>       {
>> +        .name       = "x_colo_lost_heartbeat",
>> +        .args_type  = "",
>> +        .params     = "",
>> +        .help       = "Tell COLO that heartbeat is lost,\n\t\t\t"
>> +                      "a failover or takeover is needed.",
>> +        .mhandler.cmd = hmp_x_colo_lost_heartbeat,
>> +    },
>> +
>> +STEXI
>> +@item x_colo_lost_heartbeat
>> +@findex x_colo_lost_heartbeat
>> +Tell COLO that heartbeat is lost, a failover or takeover is needed.
>> +ETEXI
>> +
>> +    {
>>           .name       = "client_migrate_info",
>>           .args_type  = "protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?",
>>           .params     = "protocol hostname port tls-port cert-subject",
>> diff --git a/hmp.c b/hmp.c
>> index ee87d38..dc6dc30 100644
>> --- a/hmp.c
>> +++ b/hmp.c
>> @@ -1310,6 +1310,14 @@ void hmp_migrate_start_postcopy(Monitor *mon, const QDict *qdict)
>>       hmp_handle_error(mon, &err);
>>   }
>>
>> +void hmp_x_colo_lost_heartbeat(Monitor *mon, const QDict *qdict)
>> +{
>> +    Error *err = NULL;
>> +
>> +    qmp_x_colo_lost_heartbeat(&err);
>> +    hmp_handle_error(mon, &err);
>> +}
>> +
>>   void hmp_set_password(Monitor *mon, const QDict *qdict)
>>   {
>>       const char *protocol  = qdict_get_str(qdict, "protocol");
>> diff --git a/hmp.h b/hmp.h
>> index a8c5b5a..864a300 100644
>> --- a/hmp.h
>> +++ b/hmp.h
>> @@ -70,6 +70,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict);
>>   void hmp_migrate_set_cache_size(Monitor *mon, const QDict *qdict);
>>   void hmp_client_migrate_info(Monitor *mon, const QDict *qdict);
>>   void hmp_migrate_start_postcopy(Monitor *mon, const QDict *qdict);
>> +void hmp_x_colo_lost_heartbeat(Monitor *mon, const QDict *qdict);
>>   void hmp_set_password(Monitor *mon, const QDict *qdict);
>>   void hmp_expire_password(Monitor *mon, const QDict *qdict);
>>   void hmp_eject(Monitor *mon, const QDict *qdict);
>> diff --git a/include/migration/colo.h b/include/migration/colo.h
>> index 2676c4a..ba27719 100644
>> --- a/include/migration/colo.h
>> +++ b/include/migration/colo.h
>> @@ -17,6 +17,7 @@
>>   #include "migration/migration.h"
>>   #include "qemu/coroutine_int.h"
>>   #include "qemu/thread.h"
>> +#include "qemu/main-loop.h"
>>
>>   bool colo_supported(void);
>>   void colo_info_mig_init(void);
>> @@ -29,4 +30,6 @@ bool migration_incoming_enable_colo(void);
>>   void migration_incoming_exit_colo(void);
>>   void *colo_process_incoming_thread(void *opaque);
>>   bool migration_incoming_in_colo_state(void);
>> +
>> +COLOMode get_colo_mode(void);
>>   #endif
>> diff --git a/include/migration/failover.h b/include/migration/failover.h
>> new file mode 100644
>> index 0000000..1785b52
>> --- /dev/null
>> +++ b/include/migration/failover.h
>> @@ -0,0 +1,20 @@
>> +/*
>> + *  COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
>> + *  (a.k.a. Fault Tolerance or Continuous Replication)
>> + *
>> + * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
>> + * Copyright (c) 2015 FUJITSU LIMITED
>> + * Copyright (c) 2015 Intel Corporation
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or
>> + * later.  See the COPYING file in the top-level directory.
>> + */
>> +
>> +#ifndef QEMU_FAILOVER_H
>> +#define QEMU_FAILOVER_H
>> +
>> +#include "qemu-common.h"
>> +
>> +void failover_request_active(Error **errp);
>> +
>> +#endif
>> diff --git a/migration/Makefile.objs b/migration/Makefile.objs
>> index 81b5713..920d1e7 100644
>> --- a/migration/Makefile.objs
>> +++ b/migration/Makefile.objs
>> @@ -1,6 +1,6 @@
>>   common-obj-y += migration.o tcp.o
>> -common-obj-$(CONFIG_COLO) += colo.o
>>   common-obj-y += colo-comm.o
>> +common-obj-$(CONFIG_COLO) += colo.o colo-failover.o
>>   common-obj-y += vmstate.o
>>   common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o qemu-file-stdio.o
>>   common-obj-y += xbzrle.o postcopy-ram.o
>> diff --git a/migration/colo-comm.c b/migration/colo-comm.c
>> index 30df3d3..58a6488 100644
>> --- a/migration/colo-comm.c
>> +++ b/migration/colo-comm.c
>> @@ -20,6 +20,17 @@ typedef struct {
>>
>>   static COLOInfo colo_info;
>>
>> +COLOMode get_colo_mode(void)
>> +{
>> +    if (migration_in_colo_state()) {
>> +        return COLO_MODE_PRIMARY;
>> +    } else if (migration_incoming_in_colo_state()) {
>> +        return COLO_MODE_SECONDARY;
>> +    } else {
>> +        return COLO_MODE_UNKNOWN;
>> +    }
>> +}
>> +
>>   static void colo_info_pre_save(void *opaque)
>>   {
>>       COLOInfo *s = opaque;
>> diff --git a/migration/colo-failover.c b/migration/colo-failover.c
>> new file mode 100644
>> index 0000000..e3897c6
>> --- /dev/null
>> +++ b/migration/colo-failover.c
>> @@ -0,0 +1,41 @@
>> +/*
>> + * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
>> + * (a.k.a. Fault Tolerance or Continuous Replication)
>> + *
>> + * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
>> + * Copyright (c) 2015 FUJITSU LIMITED
>> + * Copyright (c) 2015 Intel Corporation
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or
>> + * later.  See the COPYING file in the top-level directory.
>> + */
>> +
>> +#include "migration/colo.h"
>> +#include "migration/failover.h"
>> +#include "qmp-commands.h"
>> +#include "qapi/qmp/qerror.h"
>> +
>> +static QEMUBH *failover_bh;
>> +
>> +static void colo_failover_bh(void *opaque)
>> +{
>> +    qemu_bh_delete(failover_bh);
>> +    failover_bh = NULL;
>> +    /*TODO: Do failover work */
>> +}
>> +
>> +void failover_request_active(Error **errp)
>> +{
>> +    failover_bh = qemu_bh_new(colo_failover_bh, NULL);
>> +    qemu_bh_schedule(failover_bh);
>> +}
>> +
>> +void qmp_x_colo_lost_heartbeat(Error **errp)
>> +{
>> +    if (get_colo_mode() == COLO_MODE_UNKNOWN) {
>> +        error_setg(errp, QERR_FEATURE_DISABLED, "colo");
>> +        return;
>> +    }
>> +
>> +    failover_request_active(errp);
>> +}
>> diff --git a/migration/colo.c b/migration/colo.c
>> index ca5df44..7098497 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -17,6 +17,7 @@
>>   #include "trace.h"
>>   #include "qemu/error-report.h"
>>   #include "qemu/sockets.h"
>> +#include "migration/failover.h"
>>
>>   /* colo buffer */
>>   #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
>> diff --git a/qapi-schema.json b/qapi-schema.json
>> index a5699a7..feb7d53 100644
>> --- a/qapi-schema.json
>> +++ b/qapi-schema.json
>> @@ -761,6 +761,35 @@
>>               'vmstate-send', 'vmstate-size','vmstate-received',
>>               'vmstate-loaded' ] }
>>
>> +##
>> +# @COLOMode
>> +#
>> +# The colo mode
>
> This is rather terse for an ignorant reader like me.
>

Hmm, this is used to distinguish the Primary and Secondary sides. I will
add more comments.

>> +#
>> +# @unknown: unknown mode
>
> What does "unknown mode" mean, and how can it happen?
>

It will never happen, so I will remove it. :)

>> +#
>> +# @primary: master side
>> +#
>> +# @secondary: slave side
>> +#
>> +# Since: 2.6
>> +##
>> +{ 'enum': 'COLOMode',
>> +  'data': [ 'unknown', 'primary', 'secondary'] }
>> +
>> +##
>> +# @x-colo-lost-heartbeat
>> +#
>> +# Tell qemu that heartbeat is lost, request it to do takeover procedures.
>> +# If this command is sent to the PVM, the Primary side will exit COLO mode.
>> +# If sent to the Secondary, the Secondary side will run failover work,
>> +# then takes over server operation to become the service VM.
>> +#
>> +# Since: 2.6
>> +##
>> +{ 'command': 'x-colo-lost-heartbeat' }
>> +
>> +##
>>   # @MouseInfo:
>>   #
>>   # Information about a mouse device.
>> diff --git a/qmp-commands.hx b/qmp-commands.hx
>> index 89756c9..76ad208 100644
>> --- a/qmp-commands.hx
>> +++ b/qmp-commands.hx
>> @@ -805,6 +805,25 @@ Example:
>>   EQMP
>>
>>       {
>> +        .name       = "x-colo-lost-heartbeat",
>> +        .args_type  = "",
>> +        .mhandler.cmd_new = qmp_marshal_x_colo_lost_heartbeat,
>> +    },
>> +
>> +SQMP
>> +x-colo-lost-heartbeat
>> +--------------------
>> +
>> +Tell COLO that heartbeat is lost, a failover or takeover is needed.
>> +
>> +Example:
>> +
>> +-> { "execute": "x-colo-lost-heartbeat" }
>> +<- { "return": {} }
>> +
>> +EQMP
>> +
>> +    {
>>           .name       = "client_migrate_info",
>>           .args_type  = "protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?",
>>           .params     = "protocol hostname port tls-port cert-subject",
>> diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
>> index c12516e..5028f63 100644
>> --- a/stubs/migration-colo.c
>> +++ b/stubs/migration-colo.c
>> @@ -11,6 +11,7 @@
>>    */
>>
>>   #include "migration/colo.h"
>> +#include "qmp-commands.h"
>>
>>   bool colo_supported(void)
>>   {
>> @@ -35,3 +36,10 @@ void *colo_process_incoming_thread(void *opaque)
>>   {
>>       return NULL;
>>   }
>> +
>> +void qmp_x_colo_lost_heartbeat(Error **errp)
>> +{
>> +    error_setg(errp, "COLO is not supported, please rerun configure"
>> +                     " with --enable-colo option in order to support"
>> +                     " COLO feature");
>> +}
>
> .
>
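
For anyone driving this from a management script, here is a rough Python
sketch of sending the command and checking the reply. It assumes only the
QMP wire format shown in the qmp-commands.hx example quoted above; the
helper names are hypothetical and are not part of QEMU. A build without
--enable-colo would hit the stub and answer with an error reply, which is
the second case handled here.

```python
import json


def build_lost_heartbeat():
    """Return the QMP request line for x-colo-lost-heartbeat (it takes no arguments)."""
    return json.dumps({"execute": "x-colo-lost-heartbeat"})


def check_reply(line):
    """Return (ok, description) for one QMP reply line.

    A success reply carries a "return" member. A failure carries an
    "error" object whose "desc" member is a human-readable string.
    """
    reply = json.loads(line)
    if "return" in reply:
        return True, ""
    err = reply.get("error", {})
    return False, err.get("desc", "unknown error")
```

The caller would write build_lost_heartbeat() to the QMP socket and pass
the next reply line to check_reply(). Asynchronous events may arrive in
between and would need to be skipped first, which this sketch does not do.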


* Re: [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
  2015-12-18 15:47         ` Dr. David Alan Gilbert
@ 2015-12-23  1:24           ` Hailiang Zhang
  0 siblings, 0 replies; 94+ messages in thread
From: Hailiang Zhang @ 2015-12-23  1:24 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

On 2015/12/18 23:47, Dr. David Alan Gilbert wrote:
> * Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
>> On 2015/12/17 18:52, Dr. David Alan Gilbert wrote:
>>> * Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
>>>> On 2015/12/15 20:14, Dr. David Alan Gilbert wrote:
>>>>> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>>>>>> This is the 12th version of COLO.
>>>>>>
>>>>>> As usual, this version of COLO is only support periodic checkpoint,
>>>>>> just like MicroCheckpointing and Remus does.
>>>>>>
>>>>>> Here is only COLO frame part, you can get the whole codes from github:
>>>>>> https://github.com/coloft/qemu/commits/colo-v2.3-periodic-mode
>>>>>
>>>>> Hi,
>>>>>    Have you tried wiring in Zhang Chen's new userland colo proxy yet?
>>>>> I'd like to start trying it out.
>>>>>
>>>>
>>>> Not yet, actually, for frame part, we can re-use most of the previous codes that based on
>>>> kernel proxy. And, yes, please, you are welcome to join us. ;)
>>>
>>> Yes, that's certainly something I'll look at immediately at the start of the new year
>>> (I'm out for 2 weeks from Friday).
>>>
>>
>> Great~
>>
>>> I've just tested this series on my machines, and it works well.
>>
>> Thank you for the testing.
>>
>>> Two things:
>>>    1) I just posted a patch to add an HMP equivalent to x-blockdev-change
>>>    2) If you run with an older machine type (e.g. pc-i440fx-2.3) then if I failover to the
>>> secondary then I hit a 'invalid runstate transition: 'inmigrate' -> 'prelaunch'';
>>> I guess this is something to do with global_state.
>>>
>>
>> Yes, we have fixed one problem related to global_state. I didn't test COLO with
>> older machine type. I will look into it, thanks for reporting it.
>

Hi Dave,

> I think I've Reviewed-by or sent comments on all of the patches in this set with the exception of
>    25 - that is QMP so I'll leave that for Eric
>    33-37 that are Network related which I don't know much about, so I'll leave those for Jason
>    38 - that's mostly block, so perhaps Stefan is best to look at that.
>

Thanks very much for your help. :)

Hailiang

> Getting closer!
>
> Dave
>
>>
>> Hailiang
>>
>>> Dave
>>>
>>>>> Dave
>>>>>
>>>>>> Test procedure:
>>>>>> 1. Startup qemu
>>>>>> Primary side:
>>>>>> #x86_64-softmmu/qemu-system-x86_64 -enable-kvm -boot c -m 2048 -smp 2 -qmp stdio -vnc :7 -name primary -cpu qemu64,+kvmclock -device piix3-usb-uhci -device usb-tablet -netdev tap,id=hn0,vhost=off -device virtio-net-pci,id=net-pci0,netdev=hn0 -drive if=virtio,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0.file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,children.0.driver=raw
>>>>>> Secondary side:
>>>>>> #x86_64-softmmu/qemu-system-x86_64 -boot c -m 2048 -smp 2 -qmp stdio -vnc :7 -name secondary -enable-kvm -cpu qemu64,+kvmclock -device piix3-usb-uhci -device usb-tablet -netdev tap,id=hn0,vhost=off -device virtio-net-pci,id=net-pci0,netdev=hn0 -drive if=none,id=colo-disk0,file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,driver=raw,node-name=node0 -drive if=virtio,id=active-disk0,throttling.bps-total=70000000,driver=replication,mode=secondary,file.driver=qcow2,file.file.filename=/mnt/ramfs/active_disk.img,file.backing.driver=qcow2,file.backing.file.filename=/mnt/ramfs/hidden_disk.img,file.backing.backing=colo-disk0 -incoming tcp:0:8888
>>>>>> 2. On Secondary VM's QEMU monitor, issue command
>>>>>> {'execute':'qmp_capabilities'}
>>>>>> {'execute': 'nbd-server-start', 'arguments': {'addr': {'type': 'inet', 'data': {'host': '192.168.2.88', 'port': '8889'} } } }
>>>>>> {'execute': 'nbd-server-add', 'arguments': {'device': 'colo-disk0', 'writable': true } }
>>>>>> {'execute': 'trace-event-set-state', 'arguments': {'name': 'colo*', 'enable': true} }
>>>>>>
>>>>>> 3. On Primary VM's QEMU monitor, issue command:
>>>>>> {'execute':'qmp_capabilities'}
>>>>>> {'execute': 'human-monitor-command', 'arguments': {'command-line': 'drive_add buddy driver=replication,mode=primary,file.driver=nbd,file.host=9.61.1.7,file.port=8889,file.export=colo-disk0,node-name=node0,if=none'}}
>>>>>> {'execute':'x-blockdev-change', 'arguments':{'parent': 'colo-disk0', 'node': 'node0' } }
>>>>>> {'execute': 'migrate-set-capabilities', 'arguments': {'capabilities': [ {'capability': 'x-colo', 'state': true } ] } }
>>>>>> {'execute': 'migrate', 'arguments': {'uri': 'tcp:192.168.2.88:8888' } }
>>>>>>
>>>>>> 4. After the above steps, you will see, whenever you make changes to PVM, SVM will be synced.
>>>>>> You can by issue command '{ "execute": "migrate-set-parameters" , "arguments":{ "x-checkpoint-delay": 2000 } }'
>>>>>> to change the checkpoint period time.
>>>>>>
>>>>>> 5. Failover test
>>>>>> You can kill Primary VM and run 'x_colo_lost_heartbeat' in Secondary VM's
>>>>>> monitor at the same time, then SVM will failover and client will not feel this
>>>>>> change.
>>>>>>
>>>>>> Before issuing '{ "execute": "x-colo-lost-heartbeat" }' command, we have to
>>>>>> issue block related command to stop block replication.
>>>>>> Primary:
>>>>>>    Remove the nbd child from the quorum:
>>>>>>    { 'execute': 'x-blockdev-change', 'arguments': {'parent': 'colo-disk0', 'child': 'children.1'}}
>>>>>>    Note: there is no qmp command to remove the blockdev now
>>>>>>
>>>>>> Secondary:
>>>>>>    The primary host is down, so we should do the following thing:
>>>>>>    { 'execute': 'nbd-server-stop' }
>>>>>>
>>>>>> Please review, thanks.
>>>>>>
>>>>>> TODO:
>>>>>> 1. Implement packets compare module (proxy) in qemu (Doing)
>>>>>> 2. Checkpoint based on proxy in qemu
>>>>>> 3. The capability of continuous FT
>>>>>>
>>>>>> v12:
>>>>>>   - Fix the bug that default buffer filter broken vhost-net.
>>>>>>   - Add an flag in struct NetFilterState to help skipping default
>>>>>>    filter for packets travelling through filter layer.
>>>>>>   - Remove the default failover treatment which may cause split-brain.
>>>>>>   - Rename checkpoint-delay to x-checkpoint-delay.
>>>>>>   - Check if all netdev supports default filter before going into COLO.
>>>>>>   - Reconstruct send/receive helper functions in patch 10.
>>>>>>   - Address serveral other comments from Dave
>>>>>>
>>>>>> v11:
>>>>>>   - Re-implement buffer/release packets based on filter-buffer according
>>>>>>     to Jason Wang's suggestion. (patch 34, patch 36 ~ patch 38)
>>>>>>   - Rebase master to re-use some stuff introduced by post-copy.
>>>>>>   - Address several comments from Eric and Dave, the fixing record can
>>>>>>     be found in each patch.
>>>>>>
>>>>>> v10:
>>>>>>   - Rename 'colo_lost_heartbeat' command to experimental 'x_colo_lost_heartbeat'
>>>>>>   - Rename migration capability 'colo' to 'x-colo' (Eric's suggestion)
>>>>>>   - Simplify the process of primary side by dropping colo thread and reusing
>>>>>>     migration thread. (Dave's suggestion)
>>>>>>   - Add several netfilter related APIs to support buffer/release packets
>>>>>>     for COLO (patch 32 ~ patch 36)
>>>>>>
>>>>>> zhanghailiang (38):
>>>>>>    configure: Add parameter for configure to enable/disable COLO support
>>>>>>    migration: Introduce capability 'x-colo' to migration
>>>>>>    COLO: migrate colo related info to secondary node
>>>>>>    migration: Export migrate_set_state()
>>>>>>    migration: Add state records for migration incoming
>>>>>>    migration: Integrate COLO checkpoint process into migration
>>>>>>    migration: Integrate COLO checkpoint process into loadvm
>>>>>>    migration: Rename the'file' member of MigrationState
>>>>>>    COLO/migration: Create a new communication path from destination to
>>>>>>      source
>>>>>>    COLO: Implement colo checkpoint protocol
>>>>>>    COLO: Add a new RunState RUN_STATE_COLO
>>>>>>    QEMUSizedBuffer: Introduce two help functions for qsb
>>>>>>    COLO: Save PVM state to secondary side when do checkpoint
>>>>>>    ram: Split host_from_stream_offset() into two helper functions
>>>>>>    COLO: Load PVM's dirty pages into SVM's RAM cache temporarily
>>>>>>    ram/COLO: Record the dirty pages that SVM received
>>>>>>    COLO: Load VMState into qsb before restore it
>>>>>>    COLO: Flush PVM's cached RAM into SVM's memory
>>>>>>    COLO: Add checkpoint-delay parameter for migrate-set-parameters
>>>>>>    COLO: synchronize PVM's state to SVM periodically
>>>>>>    COLO failover: Introduce a new command to trigger a failover
>>>>>>    COLO failover: Introduce state to record failover process
>>>>>>    COLO: Implement failover work for Primary VM
>>>>>>    COLO: Implement failover work for Secondary VM
>>>>>>    qmp event: Add event notification for COLO error
>>>>>>    COLO failover: Shutdown related socket fd when do failover
>>>>>>    COLO failover: Don't do failover during loading VM's state
>>>>>>    COLO: Process shutdown command for VM in COLO state
>>>>>>    COLO: Update the global runstate after going into colo state
>>>>>>    savevm: Split load vm state function qemu_loadvm_state
>>>>>>    COLO: Separate the process of saving/loading ram and device state
>>>>>>    COLO: Split qemu_savevm_state_begin out of checkpoint process
>>>>>>    net/filter-buffer: Add default filter-buffer for each netdev
>>>>>>    filter-buffer: Accept zero interval
>>>>>>    filter-buffer: Introduce a helper function to enable/disable default
>>>>>>      filter
>>>>>>    filter-buffer: Introduce a helper function to release packets
>>>>>>    colo: Use default buffer-filter to buffer and release packets
>>>>>>    COLO: Add block replication into colo process
>>>>>>
>>>>>>   configure                     |  11 +
>>>>>>   docs/qmp-events.txt           |  17 +
>>>>>>   hmp-commands.hx               |  15 +
>>>>>>   hmp.c                         |  15 +
>>>>>>   hmp.h                         |   1 +
>>>>>>   include/exec/ram_addr.h       |   9 +-
>>>>>>   include/migration/colo.h      |  38 +++
>>>>>>   include/migration/failover.h  |  33 ++
>>>>>>   include/migration/migration.h |  18 +-
>>>>>>   include/migration/qemu-file.h |   3 +-
>>>>>>   include/net/filter.h          |  12 +
>>>>>>   include/net/net.h             |   5 +
>>>>>>   include/sysemu/sysemu.h       |   9 +
>>>>>>   migration/Makefile.objs       |   2 +
>>>>>>   migration/colo-comm.c         |  71 ++++
>>>>>>   migration/colo-failover.c     |  83 +++++
>>>>>>   migration/colo.c              | 765 ++++++++++++++++++++++++++++++++++++++++++
>>>>>>   migration/exec.c              |   4 +-
>>>>>>   migration/fd.c                |   4 +-
>>>>>>   migration/migration.c         | 216 ++++++++----
>>>>>>   migration/postcopy-ram.c      |   6 +-
>>>>>>   migration/qemu-file-buf.c     |  61 ++++
>>>>>>   migration/ram.c               | 213 ++++++++++--
>>>>>>   migration/rdma.c              |   2 +-
>>>>>>   migration/savevm.c            | 295 ++++++++++++----
>>>>>>   migration/tcp.c               |   4 +-
>>>>>>   migration/unix.c              |   4 +-
>>>>>>   net/filter-buffer.c           | 127 ++++++-
>>>>>>   net/filter.c                  |   6 +-
>>>>>>   net/net.c                     |  58 ++++
>>>>>>   qapi-schema.json              | 106 +++++-
>>>>>>   qapi/event.json               |  17 +
>>>>>>   qmp-commands.hx               |  24 +-
>>>>>>   stubs/Makefile.objs           |   1 +
>>>>>>   stubs/migration-colo.c        |  45 +++
>>>>>>   trace-events                  |  10 +
>>>>>>   vl.c                          |  37 +-
>>>>>>   37 files changed, 2152 insertions(+), 195 deletions(-)
>>>>>>   create mode 100644 include/migration/colo.h
>>>>>>   create mode 100644 include/migration/failover.h
>>>>>>   create mode 100644 migration/colo-comm.c
>>>>>>   create mode 100644 migration/colo-failover.c
>>>>>>   create mode 100644 migration/colo.c
>>>>>>   create mode 100644 stubs/migration-colo.c
>>>>>>
>>>>>> --
>>>>>> 1.8.3.1
>>>>>>
>>>>>>
>>>>> --
>>>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>>>
>>>>> .
>>>>>
>>>>
>>>>
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>
>>> .
>>>
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>


* Re: [Qemu-devel] [PATCH COLO-Frame v12 25/38] qmp event: Add event notification for COLO error
  2015-12-19 10:02   ` Markus Armbruster
  2015-12-21 21:14     ` [Qemu-devel] [Qemu-block] " John Snow
@ 2015-12-23  1:24     ` Wen Congyang
  2016-01-05 19:21       ` [Qemu-devel] [Qemu-block] " John Snow
  2015-12-23  3:10     ` [Qemu-devel] " Hailiang Zhang
  2 siblings, 1 reply; 94+ messages in thread
From: Wen Congyang @ 2015-12-23  1:24 UTC (permalink / raw)
  To: Markus Armbruster, zhanghailiang
  Cc: qemu-block, lizhijian, quintela, qemu-devel, yunhong.jiang,
	eddie.dong, peter.huangpeng, Michael Roth, arei.gonglei,
	stefanha, amit.shah, dgilbert, hongyang.yang

On 12/19/2015 06:02 PM, Markus Armbruster wrote:
> Copying qemu-block because this seems related to generalising block jobs
> to background jobs.
> 
> zhanghailiang <zhang.zhanghailiang@huawei.com> writes:
> 
>> If some errors happen during VM's COLO FT stage, it's important to notify the users
>> of this event. Together with 'colo_lost_heartbeat', users can intervene in COLO's
>> failover work immediately.
>> If users don't want to get involved in COLO's failover verdict,
>> it is still necessary to notify users that we exited COLO mode.
>>
>> Cc: Markus Armbruster <armbru@redhat.com>
>> Cc: Michael Roth <mdroth@linux.vnet.ibm.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> ---
>> v11:
>> - Fix several typos found by Eric
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> ---
>>  docs/qmp-events.txt | 17 +++++++++++++++++
>>  migration/colo.c    | 11 +++++++++++
>>  qapi-schema.json    | 16 ++++++++++++++++
>>  qapi/event.json     | 17 +++++++++++++++++
>>  4 files changed, 61 insertions(+)
>>
>> diff --git a/docs/qmp-events.txt b/docs/qmp-events.txt
>> index d2f1ce4..19f68fc 100644
>> --- a/docs/qmp-events.txt
>> +++ b/docs/qmp-events.txt
>> @@ -184,6 +184,23 @@ Example:
>>  Note: The "ready to complete" status is always reset by a BLOCK_JOB_ERROR
>>  event.
>>  
>> +COLO_EXIT
>> +---------
>> +
>> +Emitted when VM finishes COLO mode due to some errors happening or
>> +at the request of users.
> 
> How would the event's recipient distinguish between "due to error" and
> "at the user's request"?
> 
>> +
>> +Data:
>> +
>> + - "mode": COLO mode, primary or secondary side (json-string)
>> + - "reason":  the exit reason, internal error or external request. (json-string)
>> + - "error": error message (json-string, operation)
>> +
>> +Example:
>> +
>> +{"timestamp": {"seconds": 2032141960, "microseconds": 417172},
>> + "event": "COLO_EXIT", "data": {"mode": "primary", "reason": "request" } }
>> +
> 
> Pardon my ignorance again...  Does "VM finishes COLO mode" means have
> some kind of COLO background job, and it just finished for whatever
> reason?
> 
> If yes, this COLO job could be an instance of the general background job
> concept we're trying to grow from the existing block job concept.
> 
> I'm not asking you to rebase your work onto the background job
> infrastructure, not least for the simple reason that it doesn't exist,
> yet.  But I think it would be fruitful to compare your COLO job
> management QMP interface with the one we have for block jobs.  Not only
> may that avoid unnecessary inconsistency, it could also help shape the
> general background job interface.

COLO is not a block job. If live migration is a background job, COLO
is also a background job.

> 
> Quick overview of the block job QMP interface:
> 
> * Commands to create a job: block-commit, block-stream, drive-mirror,
>   drive-backup.
> 
> * Get information on jobs: query-block-jobs
> 
> * Pause a job: block-job-pause
> 
> * Resume a job: block-job-resume
> 
> * Cancel a job: block-job-cancel
> 
> * Block job completion events: BLOCK_JOB_COMPLETED, BLOCK_JOB_CANCELLED
> 
> * Block job error event: BLOCK_JOB_ERROR
> 
> * Block job synchronous completion: event BLOCK_JOB_READY and command
>   block-job-complete

What is the background job infrastructure? Do you mean implementing all of the above
interfaces for each background job?

Thanks
Wen Congyang

> 
>>  DEVICE_DELETED
>>  --------------
>>  
>> diff --git a/migration/colo.c b/migration/colo.c
>> index d1dd4e1..d06c14f 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -18,6 +18,7 @@
>>  #include "qemu/error-report.h"
>>  #include "qemu/sockets.h"
>>  #include "migration/failover.h"
>> +#include "qapi-event.h"
>>  
>>  /* colo buffer */
>>  #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
>> @@ -349,6 +350,11 @@ static void colo_process_checkpoint(MigrationState *s)
>>  out:
>>      if (ret < 0) {
>>          error_report("%s: %s", __func__, strerror(-ret));
>> +        qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_ERROR,
>> +                                  true, strerror(-ret), NULL);
>> +    } else {
>> +        qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_REQUEST,
>> +                                  false, NULL, NULL);
>>      }
>>  
>>      qsb_free(buffer);
>> @@ -516,6 +522,11 @@ out:
>>      if (ret < 0) {
>>          error_report("colo incoming thread will exit, detect error: %s",
>>                       strerror(-ret));
>> +        qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_ERROR,
>> +                                  true, strerror(-ret), NULL);
>> +    } else {
>> +        qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_REQUEST,
>> +                                  false, NULL, NULL);
>>      }
>>  
>>      if (fb) {
>> diff --git a/qapi-schema.json b/qapi-schema.json
>> index feb7d53..f6ecb88 100644
>> --- a/qapi-schema.json
>> +++ b/qapi-schema.json
>> @@ -778,6 +778,22 @@
>>    'data': [ 'unknown', 'primary', 'secondary'] }
>>  
>>  ##
>> +# @COLOExitReason
>> +#
>> +# The reason for a COLO exit
>> +#
>> +# @unknown: unknown reason
> 
> How can @unknown happen?
> 
>> +#
>> +# @request: COLO exit is due to an external request
>> +#
>> +# @error: COLO exit is due to an internal error
>> +#
>> +# Since: 2.6
>> +##
>> +{ 'enum': 'COLOExitReason',
>> +  'data': [ 'unknown', 'request', 'error'] }
>> +
>> +##
>>  # @x-colo-lost-heartbeat
>>  #
>>  # Tell qemu that heartbeat is lost, request it to do takeover procedures.
>> diff --git a/qapi/event.json b/qapi/event.json
>> index f0cef01..f63d456 100644
>> --- a/qapi/event.json
>> +++ b/qapi/event.json
>> @@ -255,6 +255,23 @@
>>    'data': {'status': 'MigrationStatus'}}
>>  
>>  ##
>> +# @COLO_EXIT
>> +#
>> +# Emitted when VM finishes COLO mode due to some errors happening or
>> +# at the request of users.
>> +#
>> +# @mode: which COLO mode the VM was in when it exited.
> 
> Can we get 'unknown' here?
> 
>> +#
>> +# @reason: describes the reason for the COLO exit.
> 
> Can we get 'unknown' here?
> 
>> +#
>> +# @error: #optional, error message. Only present on error happening.
>> +#
>> +# Since: 2.6
>> +##
>> +{ 'event': 'COLO_EXIT',
>> +  'data': {'mode': 'COLOMode', 'reason': 'COLOExitReason', '*error': 'str' } }
>> +
>> +##
>>  # @ACPI_DEVICE_OST
>>  #
>>  # Emitted when guest executes ACPI _OST method.
> 
> 
> 
> .
> 


* Re: [Qemu-devel] [PATCH COLO-Frame v12 25/38] qmp event: Add event notification for COLO error
  2015-12-18 16:03   ` Eric Blake
@ 2015-12-23  1:55     ` Hailiang Zhang
  0 siblings, 0 replies; 94+ messages in thread
From: Hailiang Zhang @ 2015-12-23  1:55 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: lizhijian, quintela, Markus Armbruster, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, arei.gonglei, stefanha,
	amit.shah, Michael Roth, hongyang.yang

On 2015/12/19 0:03, Eric Blake wrote:
> On 12/15/2015 01:22 AM, zhanghailiang wrote:
>> If some errors happen during VM's COLO FT stage, it's important to notify the users
>> of this event. Together with 'colo_lost_heartbeat', users can intervene in COLO's
>> failover work immediately.
>> If users don't want to get involved in COLO's failover verdict,
>> it is still necessary to notify users that we exited COLO mode.
>>
>> Cc: Markus Armbruster <armbru@redhat.com>
>> Cc: Michael Roth <mdroth@linux.vnet.ibm.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> ---
>> v11:
>> - Fix several typos found by Eric
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> ---
>
>> +++ b/docs/qmp-events.txt
>> @@ -184,6 +184,23 @@ Example:
>>   Note: The "ready to complete" status is always reset by a BLOCK_JOB_ERROR
>>   event.
>>
>> +COLO_EXIT
>> +---------
>> +
>> +Emitted when VM finishes COLO mode due to some errors happening or
>> +at the request of users.
>> +
>> +Data:
>> +
>> + - "mode": COLO mode, primary or secondary side (json-string)
>> + - "reason":  the exit reason, internal error or external request. (json-string)
>> + - "error": error message (json-string, operation)
>
> s/operation/optional/
> May want to word it as:
>
> - "error": error message for human consumption (json-string, optional)
>
> to point out that machines shouldn't parse it.
>

Good idea, I will fix it like that.

>> +++ b/migration/colo.c
>> @@ -18,6 +18,7 @@
>>   #include "qemu/error-report.h"
>>   #include "qemu/sockets.h"
>>   #include "migration/failover.h"
>> +#include "qapi-event.h"
>>
>>   /* colo buffer */
>>   #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
>> @@ -349,6 +350,11 @@ static void colo_process_checkpoint(MigrationState *s)
>>   out:
>>       if (ret < 0) {
>>           error_report("%s: %s", __func__, strerror(-ret));
>
> Unrelated: I mentioned in another thread that we may want to start
> thinking about adding error_report_errno(); this would be another client.
>

Hmm, yes, we may need such a helper function.

>> +++ b/qapi-schema.json
>> @@ -778,6 +778,22 @@
>>     'data': [ 'unknown', 'primary', 'secondary'] }
>>
>>   ##
>> +# @COLOExitReason
>> +#
>> +# The reason for a COLO exit
>> +#
>> +# @unknown: unknown reason
>> +#
>
> If we never return 'unknown', then it is not worth having it in the enum
> (we can always add it later if we find a reason to have it; but adding
> it now feels premature if the code base isn't using it).
>

You are right, it should never happen. I will remove it in the next version, thanks.

> Otherwise looks okay to me.
>


* Re: [Qemu-devel] [PATCH COLO-Frame v12 25/38] qmp event: Add event notification for COLO error
  2015-12-19 10:02   ` Markus Armbruster
  2015-12-21 21:14     ` [Qemu-devel] [Qemu-block] " John Snow
  2015-12-23  1:24     ` [Qemu-devel] " Wen Congyang
@ 2015-12-23  3:10     ` Hailiang Zhang
  2016-01-11 13:24       ` Markus Armbruster
  2 siblings, 1 reply; 94+ messages in thread
From: Hailiang Zhang @ 2015-12-23  3:10 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Michael Roth, lizhijian, quintela, yunhong.jiang, eddie.dong,
	peter.huangpeng, qemu-devel, arei.gonglei, stefanha, amit.shah,
	qemu-block, dgilbert, hongyang.yang

On 2015/12/19 18:02, Markus Armbruster wrote:
> Copying qemu-block because this seems related to generalising block jobs
> to background jobs.
>

Er, this event is just used to let users know what happened to a VM with COLO FT
enabled. If users get this event, they can check further what went wrong and
decide which side should take over the work.

> zhanghailiang <zhang.zhanghailiang@huawei.com> writes:
>
>> If some errors happen during VM's COLO FT stage, it's important to notify the users
>> of this event. Together with 'colo_lost_heartbeat', users can intervene in COLO's
>> failover work immediately.
>> If users don't want to get involved in COLO's failover verdict,
>> it is still necessary to notify users that we exited COLO mode.
>>
>> Cc: Markus Armbruster <armbru@redhat.com>
>> Cc: Michael Roth <mdroth@linux.vnet.ibm.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> ---
>> v11:
>> - Fix several typos found by Eric
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> ---
>>   docs/qmp-events.txt | 17 +++++++++++++++++
>>   migration/colo.c    | 11 +++++++++++
>>   qapi-schema.json    | 16 ++++++++++++++++
>>   qapi/event.json     | 17 +++++++++++++++++
>>   4 files changed, 61 insertions(+)
>>
>> diff --git a/docs/qmp-events.txt b/docs/qmp-events.txt
>> index d2f1ce4..19f68fc 100644
>> --- a/docs/qmp-events.txt
>> +++ b/docs/qmp-events.txt
>> @@ -184,6 +184,23 @@ Example:
>>   Note: The "ready to complete" status is always reset by a BLOCK_JOB_ERROR
>>   event.
>>
>> +COLO_EXIT
>> +---------
>> +
>> +Emitted when VM finishes COLO mode due to some errors happening or
>> +at the request of users.
>
> How would the event's recipient distinguish between "due to error" and
> "at the user's request"?
>

If they get this event with 'reason' set to 'request', it is 'at the user's request'.
Otherwise it is 'due to error' (the value of 'reason' will be 'error', and we have an optional
error message which may help to figure out what happened).
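
On the recipient's side, that check could look roughly like the Python
sketch below. It assumes the event layout from the docs/qmp-events.txt hunk
in this patch. The function name is made up for illustration and does not
exist in QEMU. The optional 'error' string is passed through as text for a
human to read and is not parsed.

```python
import json


def classify_colo_exit(line):
    """Return (mode, reason, error_text) for a COLO_EXIT event line.

    Returns None for any other QMP message. error_text is None when the
    optional "error" member is absent, as in the 'request' case.
    """
    msg = json.loads(line)
    if msg.get("event") != "COLO_EXIT":
        return None
    data = msg["data"]
    return data["mode"], data["reason"], data.get("error")
```

A management tool would branch on the second element only: 'request' means
the exit was asked for, 'error' means it should look at the message and
decide which side takes over.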

>> +
>> +Data:
>> +
>> + - "mode": COLO mode, primary or secondary side (json-string)
>> + - "reason":  the exit reason, internal error or external request. (json-string)
>> + - "error": error message (json-string, operation)
>> +
>> +Example:
>> +
>> +{"timestamp": {"seconds": 2032141960, "microseconds": 417172},
>> + "event": "COLO_EXIT", "data": {"mode": "primary", "reason": "request" } }
>> +
>
> Pardon my ignorance again...  Does "VM finishes COLO mode" means have
> some kind of COLO background job, and it just finished for whatever
> reason?
>

See what I said above.

> If yes, this COLO job could be an instance of the general background job
> concept we're trying to grow from the existing block job concept.
>
> I'm not asking you to rebase your work onto the background job
> infrastructure, not least for the simple reason that it doesn't exist,
> yet.  But I think it would be fruitful to compare your COLO job
> management QMP interface with the one we have for block jobs.  Not only
> may that avoid unnecessary inconsistency, it could also help shape the
> general background job interface.
>

Interesting, I'm not very familiar with this block background job infrastructure.
If we consider COLO FT as a background job, we can certainly use it. I will have a look
at it.

> Quick overview of the block job QMP interface:
>
> * Commands to create a job: block-commit, block-stream, drive-mirror,
>    drive-backup.
>
> * Get information on jobs: query-block-jobs
>
> * Pause a job: block-job-pause
>
> * Resume a job: block-job-resume
>
> * Cancel a job: block-job-cancel
>
> * Block job completion events: BLOCK_JOB_COMPLETED, BLOCK_JOB_CANCELLED
>
> * Block job error event: BLOCK_JOB_ERROR
>
> * Block job synchronous completion: event BLOCK_JOB_READY and command
>    block-job-complete
>
>>   DEVICE_DELETED
>>   --------------
>>
>> diff --git a/migration/colo.c b/migration/colo.c
>> index d1dd4e1..d06c14f 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -18,6 +18,7 @@
>>   #include "qemu/error-report.h"
>>   #include "qemu/sockets.h"
>>   #include "migration/failover.h"
>> +#include "qapi-event.h"
>>
>>   /* colo buffer */
>>   #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
>> @@ -349,6 +350,11 @@ static void colo_process_checkpoint(MigrationState *s)
>>   out:
>>       if (ret < 0) {
>>           error_report("%s: %s", __func__, strerror(-ret));
>> +        qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_ERROR,
>> +                                  true, strerror(-ret), NULL);
>> +    } else {
>> +        qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_REQUEST,
>> +                                  false, NULL, NULL);
>>       }
>>
>>       qsb_free(buffer);
>> @@ -516,6 +522,11 @@ out:
>>       if (ret < 0) {
>>           error_report("colo incoming thread will exit, detect error: %s",
>>                        strerror(-ret));
>> +        qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_ERROR,
>> +                                  true, strerror(-ret), NULL);
>> +    } else {
>> +        qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_REQUEST,
>> +                                  false, NULL, NULL);
>>       }
>>
>>       if (fb) {
>> diff --git a/qapi-schema.json b/qapi-schema.json
>> index feb7d53..f6ecb88 100644
>> --- a/qapi-schema.json
>> +++ b/qapi-schema.json
>> @@ -778,6 +778,22 @@
>>     'data': [ 'unknown', 'primary', 'secondary'] }
>>
>>   ##
>> +# @COLOExitReason
>> +#
>> +# The reason for a COLO exit
>> +#
>> +# @unknown: unknown reason
>
> How can @unknown happen?
>

>> +#
>> +# @request: COLO exit is due to an external request
>> +#
>> +# @error: COLO exit is due to an internal error
>> +#
>> +# Since: 2.6
>> +##
>> +{ 'enum': 'COLOExitReason',
>> +  'data': [ 'unknown', 'request', 'error'] }
>> +
>> +##
>>   # @x-colo-lost-heartbeat
>>   #
>>   # Tell qemu that heartbeat is lost, request it to do takeover procedures.
>> diff --git a/qapi/event.json b/qapi/event.json
>> index f0cef01..f63d456 100644
>> --- a/qapi/event.json
>> +++ b/qapi/event.json
>> @@ -255,6 +255,23 @@
>>     'data': {'status': 'MigrationStatus'}}
>>
>>   ##
>> +# @COLO_EXIT
>> +#
>> +# Emitted when VM finishes COLO mode due to some errors happening or
>> +# at the request of users.
>> +#
>> +# @mode: which COLO mode the VM was in when it exited.
>
> Can we get 'unknown' here?
>

No, I will remove it :)

>> +#
>> +# @reason: describes the reason for the COLO exit.
>
> Can we get 'unknown' here?
>

No, it should never happen for now. I will remove it.

>> +#
>> +# @error: #optional, error message. Only present on error happening.
>> +#
>> +# Since: 2.6
>> +##
>> +{ 'event': 'COLO_EXIT',
>> +  'data': {'mode': 'COLOMode', 'reason': 'COLOExitReason', '*error': 'str' } }
>> +
>> +##
>>   # @ACPI_DEVICE_OST
>>   #
>>   # Emitted when guest executes ACPI _OST method.
>
> .
>


* Re: [Qemu-devel] [Qemu-block] [PATCH COLO-Frame v12 25/38] qmp event: Add event notification for COLO error
  2015-12-21 21:14     ` [Qemu-devel] [Qemu-block] " John Snow
@ 2015-12-23  3:14       ` Hailiang Zhang
  0 siblings, 0 replies; 94+ messages in thread
From: Hailiang Zhang @ 2015-12-23  3:14 UTC (permalink / raw)
  To: John Snow, Markus Armbruster
  Cc: qemu-block, lizhijian, quintela, qemu-devel, yunhong.jiang,
	eddie.dong, peter.huangpeng, Michael Roth, arei.gonglei,
	stefanha, amit.shah, dgilbert, hongyang.yang

On 2015/12/22 5:14, John Snow wrote:
>
>
> On 12/19/2015 05:02 AM, Markus Armbruster wrote:
>> Copying qemu-block because this seems related to generalising block jobs
>> to background jobs.
>>
>> zhanghailiang <zhang.zhanghailiang@huawei.com> writes:
>>
>>> If some errors happen during VM's COLO FT stage, it's important to notify the users
>>> of this event. Together with 'colo_lost_heartbeat', users can intervene in COLO's
>>> failover work immediately.
>>> If users don't want to get involved in COLO's failover verdict,
>>> it is still necessary to notify users that we exited COLO mode.
>>>
>>> Cc: Markus Armbruster <armbru@redhat.com>
>>> Cc: Michael Roth <mdroth@linux.vnet.ibm.com>
>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>>> ---
>>> v11:
>>> - Fix several typos found by Eric
>>>
>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>> ---
>>>   docs/qmp-events.txt | 17 +++++++++++++++++
>>>   migration/colo.c    | 11 +++++++++++
>>>   qapi-schema.json    | 16 ++++++++++++++++
>>>   qapi/event.json     | 17 +++++++++++++++++
>>>   4 files changed, 61 insertions(+)
>>>
>>> diff --git a/docs/qmp-events.txt b/docs/qmp-events.txt
>>> index d2f1ce4..19f68fc 100644
>>> --- a/docs/qmp-events.txt
>>> +++ b/docs/qmp-events.txt
>>> @@ -184,6 +184,23 @@ Example:
>>>   Note: The "ready to complete" status is always reset by a BLOCK_JOB_ERROR
>>>   event.
>>>
>>> +COLO_EXIT
>>> +---------
>>> +
>>> +Emitted when VM finishes COLO mode due to some errors happening or
>>> +at the request of users.
>>
>> How would the event's recipient distinguish between "due to error" and
>> "at the user's request"?
>>
>>> +
>>> +Data:
>>> +
>>> + - "mode": COLO mode, primary or secondary side (json-string)
>>> + - "reason":  the exit reason, internal error or external request. (json-string)
>>> + - "error": error message (json-string, operation)
>>> +
>>> +Example:
>>> +
>>> +{"timestamp": {"seconds": 2032141960, "microseconds": 417172},
>>> + "event": "COLO_EXIT", "data": {"mode": "primary", "reason": "request" } }
>>> +
>>
>> Pardon my ignorance again...  Does "VM finishes COLO mode" means have
>> some kind of COLO background job, and it just finished for whatever
>> reason?
>>
>> If yes, this COLO job could be an instance of the general background job
>> concept we're trying to grow from the existing block job concept.
>>
>> I'm not asking you to rebase your work onto the background job
>> infrastructure, not least for the simple reason that it doesn't exist,
>> yet.  But I think it would be fruitful to compare your COLO job
>> management QMP interface with the one we have for block jobs.  Not only
>> may that avoid unnecessary inconsistency, it could also help shape the
>> general background job interface.
>>
>
> Yes. The "background job" concept doesn't exist in a formal way outside
> of the block layer yet, but we're looking to expand it as we re-tool the
> block jobs themselves.
>
> It may be the case that the COLO commands and events need to go in as
> they are now, but later we can bring them back into the generalized job
> infrastructure.
>

Agreed. ;)

>> Quick overview of the block job QMP interface:
>>
>> * Commands to create a job: block-commit, block-stream, drive-mirror,
>>    drive-backup.
>>
>> * Get information on jobs: query-block-jobs
>>
>> * Pause a job: block-job-pause
>>
>> * Resume a job: block-job-resume
>>
>> * Cancel a job: block-job-cancel
>>
>> * Block job completion events: BLOCK_JOB_COMPLETED, BLOCK_JOB_CANCELLED
>>
>> * Block job error event: BLOCK_JOB_ERROR
>>
>> * Block job synchronous completion: event BLOCK_JOB_READY and command
>>    block-job-complete
>>
>
> The block-agnostic version of these commands would likely be:
>
> query-jobs
> job-pause
> job-resume
> job-cancel
> job-complete
>
> Events: JOB_COMPLETED, JOB_CANCELLED, JOB_ERROR, JOB_READY.
>
>
> It looks like COLO_EXIT would be an instance of JOB_COMPLETED, and if it
> occurred due to an error, we'd also see JOB_ERROR emitted.
>

Yes, if we use this job framework for COLO, COLO_EXIT will map onto those
events in that way.
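
To make the mapping described above concrete, here is a minimal sketch of
how a COLO exit could translate into the generic job events. The enum and
the function are hypothetical and only mirror the names used in this
thread; the generic job events do not exist yet, so nothing here is a real
QEMU interface.

```c
#include <assert.h>

/* Hypothetical mirror of the COLOExitReason values kept after review. */
typedef enum {
    COLO_EXIT_REASON_REQUEST,
    COLO_EXIT_REASON_ERROR,
} ColoExitReason;

/* Hypothetical flags standing in for the proposed generic job events. */
enum {
    JOB_EV_COMPLETED = 1 << 0,
    JOB_EV_ERROR     = 1 << 1,
};

/*
 * Return the set of job events a COLO exit would correspond to:
 * always JOB_COMPLETED, plus JOB_ERROR when the exit was caused
 * by an internal error.
 */
static int colo_exit_job_events(ColoExitReason reason)
{
    int events = JOB_EV_COMPLETED;

    assert(reason == COLO_EXIT_REASON_REQUEST ||
           reason == COLO_EXIT_REASON_ERROR);
    if (reason == COLO_EXIT_REASON_ERROR) {
        events |= JOB_EV_ERROR;
    }
    return events;
}
```

A management tool would then treat an exit with reason "request" as a plain
completion, and an exit with reason "error" as a completion accompanied by
an error report.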

>>>   DEVICE_DELETED
>>>   --------------
>>>
>>> diff --git a/migration/colo.c b/migration/colo.c
>>> index d1dd4e1..d06c14f 100644
>>> --- a/migration/colo.c
>>> +++ b/migration/colo.c
>>> @@ -18,6 +18,7 @@
>>>   #include "qemu/error-report.h"
>>>   #include "qemu/sockets.h"
>>>   #include "migration/failover.h"
>>> +#include "qapi-event.h"
>>>
>>>   /* colo buffer */
>>>   #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
>>> @@ -349,6 +350,11 @@ static void colo_process_checkpoint(MigrationState *s)
>>>   out:
>>>       if (ret < 0) {
>>>           error_report("%s: %s", __func__, strerror(-ret));
>>> +        qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_ERROR,
>>> +                                  true, strerror(-ret), NULL);
>>> +    } else {
>>> +        qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_REQUEST,
>>> +                                  false, NULL, NULL);
>>>       }
>>>
>>>       qsb_free(buffer);
>>> @@ -516,6 +522,11 @@ out:
>>>       if (ret < 0) {
>>>           error_report("colo incoming thread will exit, detect error: %s",
>>>                        strerror(-ret));
>>> +        qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_ERROR,
>>> +                                  true, strerror(-ret), NULL);
>>> +    } else {
>>> +        qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_REQUEST,
>>> +                                  false, NULL, NULL);
>>>       }
>>>
>>>       if (fb) {
>>> diff --git a/qapi-schema.json b/qapi-schema.json
>>> index feb7d53..f6ecb88 100644
>>> --- a/qapi-schema.json
>>> +++ b/qapi-schema.json
>>> @@ -778,6 +778,22 @@
>>>     'data': [ 'unknown', 'primary', 'secondary'] }
>>>
>>>   ##
>>> +# @COLOExitReason
>>> +#
>>> +# The reason for a COLO exit
>>> +#
>>> +# @unknown: unknown reason
>>
>> How can @unknown happen?
>>
>>> +#
>>> +# @request: COLO exit is due to an external request
>>> +#
>>> +# @error: COLO exit is due to an internal error
>>> +#
>>> +# Since: 2.6
>>> +##
>>> +{ 'enum': 'COLOExitReason',
>>> +  'data': [ 'unknown', 'request', 'error'] }
>>> +
>>> +##
>>>   # @x-colo-lost-heartbeat
>>>   #
>>>   # Tell qemu that heartbeat is lost, request it to do takeover procedures.
>>> diff --git a/qapi/event.json b/qapi/event.json
>>> index f0cef01..f63d456 100644
>>> --- a/qapi/event.json
>>> +++ b/qapi/event.json
>>> @@ -255,6 +255,23 @@
>>>     'data': {'status': 'MigrationStatus'}}
>>>
>>>   ##
>>> +# @COLO_EXIT
>>> +#
>>> +# Emitted when VM finishes COLO mode due to some errors happening or
>>> +# at the request of users.
>>> +#
>>> +# @mode: which COLO mode the VM was in when it exited.
>>
>> Can we get 'unknown' here?
>>
>>> +#
>>> +# @reason: describes the reason for the COLO exit.
>>
>> Can we get 'unknown' here?
>>
>>> +#
>>> +# @error: #optional, error message. Only present on error happening.
>>> +#
>>> +# Since: 2.6
>>> +##
>>> +{ 'event': 'COLO_EXIT',
>>> +  'data': {'mode': 'COLOMode', 'reason': 'COLOExitReason', '*error': 'str' } }
>>> +
>>> +##
>>>   # @ACPI_DEVICE_OST
>>>   #
>>>   # Emitted when guest executes ACPI _OST method.
>>
>


* Re: [Qemu-devel] [PATCH COLO-Frame v12 27/38] COLO failover: Don't do failover during loading VM's state
  2015-12-15 10:21   ` Dr. David Alan Gilbert
@ 2015-12-25  1:02     ` Hailiang Zhang
  0 siblings, 0 replies; 94+ messages in thread
From: Hailiang Zhang @ 2015-12-25  1:02 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

On 2015/12/15 18:21, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> We should not do failover work while the main thread is loading
>> VM's state, otherwise it will destroy the consistent of VM's memory and
>> device state.
>>
>> Here we add a new failover status 'RELAUNCH' which means we should
>> relaunch the process of failover.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> ---
>>   include/migration/failover.h |  2 ++
>>   migration/colo.c             | 25 +++++++++++++++++++++++++
>>   2 files changed, 27 insertions(+)
>>
>> diff --git a/include/migration/failover.h b/include/migration/failover.h
>> index fba3931..e115d25 100644
>> --- a/include/migration/failover.h
>> +++ b/include/migration/failover.h
>> @@ -20,6 +20,8 @@ typedef enum COLOFailoverStatus {
>>       FAILOVER_STATUS_REQUEST = 1, /* Request but not handled */
>>       FAILOVER_STATUS_HANDLING = 2, /* In the process of handling failover */
>>       FAILOVER_STATUS_COMPLETED = 3, /* Finish the failover process */
>> +    /* Optional, Relaunch the failover process, again 'NONE' -> 'COMPLETED' */
>> +    FAILOVER_STATUS_RELAUNCH = 4,
>>   } COLOFailoverStatus;
>>
>>   void failover_init_state(void);
>> diff --git a/migration/colo.c b/migration/colo.c
>> index 58531e7..f4bb661 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -20,6 +20,8 @@
>>   #include "migration/failover.h"
>>   #include "qapi-event.h"
>>
>> +static bool vmstate_loading;
>> +
>>   /* colo buffer */
>>   #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
>>
>> @@ -52,6 +54,19 @@ static void secondary_vm_do_failover(void)
>>       int old_state;
>>       MigrationIncomingState *mis = migration_incoming_get_current();
>>
>> +    /* Can not do failover during the process of VM's loading VMstate, Or
>> +      * it will break the secondary VM.
>> +      */
>> +    if (vmstate_loading) {
>> +        old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
>> +                                       FAILOVER_STATUS_RELAUNCH);
>> +        if (old_state != FAILOVER_STATUS_HANDLING) {
>> +            error_report("Unknow error while do failover for secondary VM,"
>> +                         "old_state: %d", old_state);
>
> Typo: 'Unknown' and it would be good to say it was during vmstate_loading.
>
> The state is being loaded from the qemu buffer, not the real file descriptor,
> so we're guaranteed that the vmstate will finish loading; so yes, this is OK.
>

I will fix it in the next version.

Thanks.
Hailiang
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>
>
>> +        }
>> +        return;
>> +    }
>> +
>>       migrate_set_state(&mis->state, MIGRATION_STATUS_COLO,
>>                         MIGRATION_STATUS_COMPLETED);
>>
>> @@ -535,13 +550,23 @@ void *colo_process_incoming_thread(void *opaque)
>>
>>           qemu_mutex_lock_iothread();
>>           qemu_system_reset(VMRESET_SILENT);
>> +        vmstate_loading = true;
>>           if (qemu_loadvm_state(fb) < 0) {
>>               error_report("COLO: loadvm failed");
>> +            vmstate_loading = false;
>>               qemu_mutex_unlock_iothread();
>>               goto out;
>>           }
>> +
>> +        vmstate_loading = false;
>>           qemu_mutex_unlock_iothread();
>>
>> +        if (failover_get_state() == FAILOVER_STATUS_RELAUNCH) {
>> +            failover_set_state(FAILOVER_STATUS_RELAUNCH, FAILOVER_STATUS_NONE);
>> +            failover_request_active(NULL);
>> +            goto out;
>> +        }
>> +
>>           ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_VMSTATE_LOADED);
>>           if (ret < 0) {
>>               goto out;
>> --
>> 1.8.3.1
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>
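
The RELAUNCH handling in the patch quoted above can be sketched as a small
compare-and-swap state machine. This is only an illustration under stated
assumptions: it uses C11 atomics instead of QEMU's own atomic helpers, the
value of FAILOVER_STATUS_NONE is assumed to be 0, and the two helper
functions (failover_try_defer, failover_take_relaunch) are hypothetical
names that do not appear in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef enum {
    FAILOVER_STATUS_NONE = 0,      /* assumed value, not shown in the hunk */
    FAILOVER_STATUS_REQUEST = 1,
    FAILOVER_STATUS_HANDLING = 2,
    FAILOVER_STATUS_COMPLETED = 3,
    FAILOVER_STATUS_RELAUNCH = 4,
} FailoverStatus;

static _Atomic int failover_state = FAILOVER_STATUS_NONE;

/*
 * Switch to new_state only if the current state is old_state.
 * Returns the state that was observed before the attempt.
 */
static int failover_set_state(int old_state, int new_state)
{
    int expected = old_state;

    /* On failure, 'expected' is overwritten with the actual state. */
    atomic_compare_exchange_strong(&failover_state, &expected, new_state);
    return expected;
}

/*
 * Failover side: if the VM state is still being loaded, mark the
 * failover for relaunch instead of running it. Returns true when the
 * failover was deferred.
 */
static bool failover_try_defer(bool vmstate_loading)
{
    if (!vmstate_loading) {
        return false;
    }
    return failover_set_state(FAILOVER_STATUS_HANDLING,
                              FAILOVER_STATUS_RELAUNCH)
           == FAILOVER_STATUS_HANDLING;
}

/*
 * Incoming thread side: after loading finishes, consume a pending
 * relaunch request and reset the state to NONE. Returns true when the
 * failover has to be requested again.
 */
static bool failover_take_relaunch(void)
{
    return failover_set_state(FAILOVER_STATUS_RELAUNCH,
                              FAILOVER_STATUS_NONE)
           == FAILOVER_STATUS_RELAUNCH;
}
```

The point of the compare-and-swap is that either side can detect an
unexpected state from the return value, which is what the error_report()
in the patch relies on.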


* Re: [Qemu-devel] [PATCH COLO-Frame v12 21/38] COLO failover: Introduce a new command to trigger a failover
  2015-12-22 13:50     ` Hailiang Zhang
@ 2015-12-25  2:27       ` Hailiang Zhang
  0 siblings, 0 replies; 94+ messages in thread
From: Hailiang Zhang @ 2015-12-25  2:27 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, Luiz Capitulino,
	dgilbert, hongyang.yang

On 2015/12/22 21:50, Hailiang Zhang wrote:
> On 2015/12/19 17:38, Markus Armbruster wrote:
>> zhanghailiang <zhang.zhanghailiang@huawei.com> writes:
>>
>>> We leave users to choose whatever heartbeat solution they want, if the heartbeat
>>> is lost, or other errors they detect, they can use experimental command
>>> 'x_colo_lost_heartbeat' to tell COLO to do failover, COLO will do operations
>>> accordingly.
>>>
>>> For example, if the command is sent to the PVM, the Primary side will
>>> exit COLO mode and take over operation. If sent to the Secondary, the
>>> secondary will run failover work, then take over server operation to
>>> become the new Primary.
>>>
>>> Cc: Luiz Capitulino <lcapitulino@redhat.com>
>>> Cc: Eric Blake <eblake@redhat.com>
>>> Cc: Markus Armbruster <armbru@redhat.com>
>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>>> ---
>>> v11:
>>> - Add more comments for x-colo-lost-heartbeat command (Eric's suggestion)
>>> - Return 'enum' instead of 'int' for get_colo_mode() (Eric's suggestion)
>>> v10:
>>> - Rename command colo_lost_hearbeat to experimental 'x_colo_lost_heartbeat'
>>>
>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>> ---
>>>   hmp-commands.hx              | 15 +++++++++++++++
>>>   hmp.c                        |  8 ++++++++
>>>   hmp.h                        |  1 +
>>>   include/migration/colo.h     |  3 +++
>>>   include/migration/failover.h | 20 ++++++++++++++++++++
>>>   migration/Makefile.objs      |  2 +-
>>>   migration/colo-comm.c        | 11 +++++++++++
>>>   migration/colo-failover.c    | 41 +++++++++++++++++++++++++++++++++++++++++
>>>   migration/colo.c             |  1 +
>>>   qapi-schema.json             | 29 +++++++++++++++++++++++++++++
>>>   qmp-commands.hx              | 19 +++++++++++++++++++
>>>   stubs/migration-colo.c       |  8 ++++++++
>>>   12 files changed, 157 insertions(+), 1 deletion(-)
>>>   create mode 100644 include/migration/failover.h
>>>   create mode 100644 migration/colo-failover.c
>>>
>>> diff --git a/hmp-commands.hx b/hmp-commands.hx
>>> index bb52e4d..a381b0b 100644
>>> --- a/hmp-commands.hx
>>> +++ b/hmp-commands.hx
>>> @@ -1039,6 +1039,21 @@ migration (or once already in postcopy).
>>>   ETEXI
>>>
>>>       {
>>> +        .name       = "x_colo_lost_heartbeat",
>>> +        .args_type  = "",
>>> +        .params     = "",
>>> +        .help       = "Tell COLO that heartbeat is lost,\n\t\t\t"
>>> +                      "a failover or takeover is needed.",
>>> +        .mhandler.cmd = hmp_x_colo_lost_heartbeat,
>>> +    },
>>> +
>>> +STEXI
>>> +@item x_colo_lost_heartbeat
>>> +@findex x_colo_lost_heartbeat
>>> +Tell COLO that heartbeat is lost, a failover or takeover is needed.
>>> +ETEXI
>>> +
>>> +    {
>>>           .name       = "client_migrate_info",
>>>           .args_type  = "protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?",
>>>           .params     = "protocol hostname port tls-port cert-subject",
>>> diff --git a/hmp.c b/hmp.c
>>> index ee87d38..dc6dc30 100644
>>> --- a/hmp.c
>>> +++ b/hmp.c
>>> @@ -1310,6 +1310,14 @@ void hmp_migrate_start_postcopy(Monitor *mon, const QDict *qdict)
>>>       hmp_handle_error(mon, &err);
>>>   }
>>>
>>> +void hmp_x_colo_lost_heartbeat(Monitor *mon, const QDict *qdict)
>>> +{
>>> +    Error *err = NULL;
>>> +
>>> +    qmp_x_colo_lost_heartbeat(&err);
>>> +    hmp_handle_error(mon, &err);
>>> +}
>>> +
>>>   void hmp_set_password(Monitor *mon, const QDict *qdict)
>>>   {
>>>       const char *protocol  = qdict_get_str(qdict, "protocol");
>>> diff --git a/hmp.h b/hmp.h
>>> index a8c5b5a..864a300 100644
>>> --- a/hmp.h
>>> +++ b/hmp.h
>>> @@ -70,6 +70,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict);
>>>   void hmp_migrate_set_cache_size(Monitor *mon, const QDict *qdict);
>>>   void hmp_client_migrate_info(Monitor *mon, const QDict *qdict);
>>>   void hmp_migrate_start_postcopy(Monitor *mon, const QDict *qdict);
>>> +void hmp_x_colo_lost_heartbeat(Monitor *mon, const QDict *qdict);
>>>   void hmp_set_password(Monitor *mon, const QDict *qdict);
>>>   void hmp_expire_password(Monitor *mon, const QDict *qdict);
>>>   void hmp_eject(Monitor *mon, const QDict *qdict);
>>> diff --git a/include/migration/colo.h b/include/migration/colo.h
>>> index 2676c4a..ba27719 100644
>>> --- a/include/migration/colo.h
>>> +++ b/include/migration/colo.h
>>> @@ -17,6 +17,7 @@
>>>   #include "migration/migration.h"
>>>   #include "qemu/coroutine_int.h"
>>>   #include "qemu/thread.h"
>>> +#include "qemu/main-loop.h"
>>>
>>>   bool colo_supported(void);
>>>   void colo_info_mig_init(void);
>>> @@ -29,4 +30,6 @@ bool migration_incoming_enable_colo(void);
>>>   void migration_incoming_exit_colo(void);
>>>   void *colo_process_incoming_thread(void *opaque);
>>>   bool migration_incoming_in_colo_state(void);
>>> +
>>> +COLOMode get_colo_mode(void);
>>>   #endif
>>> diff --git a/include/migration/failover.h b/include/migration/failover.h
>>> new file mode 100644
>>> index 0000000..1785b52
>>> --- /dev/null
>>> +++ b/include/migration/failover.h
>>> @@ -0,0 +1,20 @@
>>> +/*
>>> + *  COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
>>> + *  (a.k.a. Fault Tolerance or Continuous Replication)
>>> + *
>>> + * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
>>> + * Copyright (c) 2015 FUJITSU LIMITED
>>> + * Copyright (c) 2015 Intel Corporation
>>> + *
>>> + * This work is licensed under the terms of the GNU GPL, version 2 or
>>> + * later.  See the COPYING file in the top-level directory.
>>> + */
>>> +
>>> +#ifndef QEMU_FAILOVER_H
>>> +#define QEMU_FAILOVER_H
>>> +
>>> +#include "qemu-common.h"
>>> +
>>> +void failover_request_active(Error **errp);
>>> +
>>> +#endif
>>> diff --git a/migration/Makefile.objs b/migration/Makefile.objs
>>> index 81b5713..920d1e7 100644
>>> --- a/migration/Makefile.objs
>>> +++ b/migration/Makefile.objs
>>> @@ -1,6 +1,6 @@
>>>   common-obj-y += migration.o tcp.o
>>> -common-obj-$(CONFIG_COLO) += colo.o
>>>   common-obj-y += colo-comm.o
>>> +common-obj-$(CONFIG_COLO) += colo.o colo-failover.o
>>>   common-obj-y += vmstate.o
>>>   common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o qemu-file-stdio.o
>>>   common-obj-y += xbzrle.o postcopy-ram.o
>>> diff --git a/migration/colo-comm.c b/migration/colo-comm.c
>>> index 30df3d3..58a6488 100644
>>> --- a/migration/colo-comm.c
>>> +++ b/migration/colo-comm.c
>>> @@ -20,6 +20,17 @@ typedef struct {
>>>
>>>   static COLOInfo colo_info;
>>>
>>> +COLOMode get_colo_mode(void)
>>> +{
>>> +    if (migration_in_colo_state()) {
>>> +        return COLO_MODE_PRIMARY;
>>> +    } else if (migration_incoming_in_colo_state()) {
>>> +        return COLO_MODE_SECONDARY;
>>> +    } else {
>>> +        return COLO_MODE_UNKNOWN;
>>> +    }
>>> +}
>>> +
>>>   static void colo_info_pre_save(void *opaque)
>>>   {
>>>       COLOInfo *s = opaque;
>>> diff --git a/migration/colo-failover.c b/migration/colo-failover.c
>>> new file mode 100644
>>> index 0000000..e3897c6
>>> --- /dev/null
>>> +++ b/migration/colo-failover.c
>>> @@ -0,0 +1,41 @@
>>> +/*
>>> + * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
>>> + * (a.k.a. Fault Tolerance or Continuous Replication)
>>> + *
>>> + * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
>>> + * Copyright (c) 2015 FUJITSU LIMITED
>>> + * Copyright (c) 2015 Intel Corporation
>>> + *
>>> + * This work is licensed under the terms of the GNU GPL, version 2 or
>>> + * later.  See the COPYING file in the top-level directory.
>>> + */
>>> +
>>> +#include "migration/colo.h"
>>> +#include "migration/failover.h"
>>> +#include "qmp-commands.h"
>>> +#include "qapi/qmp/qerror.h"
>>> +
>>> +static QEMUBH *failover_bh;
>>> +
>>> +static void colo_failover_bh(void *opaque)
>>> +{
>>> +    qemu_bh_delete(failover_bh);
>>> +    failover_bh = NULL;
>>> +    /*TODO: Do failover work */
>>> +}
>>> +
>>> +void failover_request_active(Error **errp)
>>> +{
>>> +    failover_bh = qemu_bh_new(colo_failover_bh, NULL);
>>> +    qemu_bh_schedule(failover_bh);
>>> +}
>>> +
>>> +void qmp_x_colo_lost_heartbeat(Error **errp)
>>> +{
>>> +    if (get_colo_mode() == COLO_MODE_UNKNOWN) {
>>> +        error_setg(errp, QERR_FEATURE_DISABLED, "colo");
>>> +        return;
>>> +    }
>>> +
>>> +    failover_request_active(errp);
>>> +}
>>> diff --git a/migration/colo.c b/migration/colo.c
>>> index ca5df44..7098497 100644
>>> --- a/migration/colo.c
>>> +++ b/migration/colo.c
>>> @@ -17,6 +17,7 @@
>>>   #include "trace.h"
>>>   #include "qemu/error-report.h"
>>>   #include "qemu/sockets.h"
>>> +#include "migration/failover.h"
>>>
>>>   /* colo buffer */
>>>   #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
>>> diff --git a/qapi-schema.json b/qapi-schema.json
>>> index a5699a7..feb7d53 100644
>>> --- a/qapi-schema.json
>>> +++ b/qapi-schema.json
>>> @@ -761,6 +761,35 @@
>>>               'vmstate-send', 'vmstate-size','vmstate-received',
>>>               'vmstate-loaded' ] }
>>>
>>> +##
>>> +# @COLOMode
>>> +#
>>> +# The colo mode
>>
>> This is rather terse for an ignorant reader like me.
>>
>
> Hmm, this is used to distinguish Primary and Secondary sides, I will
> add more comments.
>
>>> +#
>>> +# @unknown: unknown mode
>>
>> What does "unknown mode" mean, and how can it happen?
>>
>
> It will never happen, i will remove it. :)
>

Er, I made a mistake: we do need this 'unknown' mode, which indicates that
we are not in COLO mode. I will add more comments about it.
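
As a rough illustration of why 'unknown' is still needed, the sketch below
derives the mode from two flags and rejects the lost-heartbeat request when
the VM is not in COLO mode. It follows the shape of get_colo_mode() and
qmp_x_colo_lost_heartbeat() in the quoted patch, but the two boolean
parameters and the integer return code are simplifications assumed for the
example.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    COLO_MODE_UNKNOWN,    /* the VM is not in COLO mode at all */
    COLO_MODE_PRIMARY,
    COLO_MODE_SECONDARY,
} ColoMode;

/*
 * Derive the COLO mode from the outgoing and incoming migration state.
 * The flags stand in for migration_in_colo_state() and
 * migration_incoming_in_colo_state().
 */
static ColoMode colo_mode_from_state(bool outgoing_in_colo,
                                     bool incoming_in_colo)
{
    if (outgoing_in_colo) {
        return COLO_MODE_PRIMARY;
    }
    if (incoming_in_colo) {
        return COLO_MODE_SECONDARY;
    }
    return COLO_MODE_UNKNOWN;
}

/*
 * Return 0 if a lost-heartbeat request may start a failover,
 * or -1 if it must be rejected because COLO is not active.
 */
static int lost_heartbeat_check(ColoMode mode)
{
    assert(mode == COLO_MODE_UNKNOWN || mode == COLO_MODE_PRIMARY ||
           mode == COLO_MODE_SECONDARY);
    return mode == COLO_MODE_UNKNOWN ? -1 : 0;
}
```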

>>> +#
>>> +# @primary: master side
>>> +#
>>> +# @secondary: slave side
>>> +#
>>> +# Since: 2.6
>>> +##
>>> +{ 'enum': 'COLOMode',
>>> +  'data': [ 'unknown', 'primary', 'secondary'] }
>>> +
>>> +##
>>> +# @x-colo-lost-heartbeat
>>> +#
>>> +# Tell qemu that heartbeat is lost, request it to do takeover procedures.
>>> +# If this command is sent to the PVM, the Primary side will exit COLO mode.
>>> +# If sent to the Secondary, the Secondary side will run failover work,
>>> +# then takes over server operation to become the service VM.
>>> +#
>>> +# Since: 2.6
>>> +##
>>> +{ 'command': 'x-colo-lost-heartbeat' }
>>> +
>>> +##
>>>   # @MouseInfo:
>>>   #
>>>   # Information about a mouse device.
>>> diff --git a/qmp-commands.hx b/qmp-commands.hx
>>> index 89756c9..76ad208 100644
>>> --- a/qmp-commands.hx
>>> +++ b/qmp-commands.hx
>>> @@ -805,6 +805,25 @@ Example:
>>>   EQMP
>>>
>>>       {
>>> +        .name       = "x-colo-lost-heartbeat",
>>> +        .args_type  = "",
>>> +        .mhandler.cmd_new = qmp_marshal_x_colo_lost_heartbeat,
>>> +    },
>>> +
>>> +SQMP
>>> +x-colo-lost-heartbeat
>>> +--------------------
>>> +
>>> +Tell COLO that heartbeat is lost, a failover or takeover is needed.
>>> +
>>> +Example:
>>> +
>>> +-> { "execute": "x-colo-lost-heartbeat" }
>>> +<- { "return": {} }
>>> +
>>> +EQMP
>>> +
>>> +    {
>>>           .name       = "client_migrate_info",
>>>           .args_type  = "protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?",
>>>           .params     = "protocol hostname port tls-port cert-subject",
>>> diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
>>> index c12516e..5028f63 100644
>>> --- a/stubs/migration-colo.c
>>> +++ b/stubs/migration-colo.c
>>> @@ -11,6 +11,7 @@
>>>    */
>>>
>>>   #include "migration/colo.h"
>>> +#include "qmp-commands.h"
>>>
>>>   bool colo_supported(void)
>>>   {
>>> @@ -35,3 +36,10 @@ void *colo_process_incoming_thread(void *opaque)
>>>   {
>>>       return NULL;
>>>   }
>>> +
>>> +void qmp_x_colo_lost_heartbeat(Error **errp)
>>> +{
>>> +    error_setg(errp, "COLO is not supported, please rerun configure"
>>> +                     " with --enable-colo option in order to support"
>>> +                     " COLO feature");
>>> +}
>>
>> .
>>
>


* Re: [Qemu-devel] [PATCH COLO-Frame v12 18/38] COLO: Flush PVM's cached RAM into SVM's memory
  2015-12-15 11:07   ` Changlong Xie
@ 2015-12-25  3:03     ` Hailiang Zhang
  0 siblings, 0 replies; 94+ messages in thread
From: Hailiang Zhang @ 2015-12-25  3:03 UTC (permalink / raw)
  To: Changlong Xie, qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, dgilbert,
	peter.huangpeng, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

On 2015/12/15 19:07, Changlong Xie wrote:
> On 12/15/2015 04:22 PM, zhanghailiang wrote:
>> During the time of VM's running, PVM may dirty some pages, we will transfer
>> PVM's dirty pages to SVM and store them into SVM's RAM cache at next checkpoint
>> time. So, the content of SVM's RAM cache will always be some with PVM's memory
> "some" => "same"
>

Fixed, thanks.

> Thanks
>      -Xie
>> after checkpoint.
>>
>> Instead of flushing all content of PVM's RAM cache into SVM's MEMORY,
>> we do this in a more efficient way:
>> Only flush any page that dirtied by PVM since last checkpoint.
>> In this way, we can ensure SVM's memory same with PVM's.
>>
>> Besides, we must ensure flush RAM cache before load device state.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> ---
>> v12:
>> - Add a trace point in the end of colo_flush_ram_cache() (Dave's suggestion)
>> - Add Reviewed-by tag
>> v11:
>> - Move the place of 'need_flush' (Dave's suggestion)
>> - Remove unused 'DPRINTF("Flush ram_cache\n")'
>> v10:
>> - trace the number of dirty pages that be received.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> ---
>>   include/migration/migration.h |  1 +
>>   migration/colo.c              |  2 --
>>   migration/ram.c               | 38 ++++++++++++++++++++++++++++++++++++++
>>   trace-events                  |  2 ++
>>   4 files changed, 41 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/migration/migration.h b/include/migration/migration.h
>> index e41372d..221176b 100644
>> --- a/include/migration/migration.h
>> +++ b/include/migration/migration.h
>> @@ -336,4 +336,5 @@ PostcopyState postcopy_state_set(PostcopyState new_state);
>>   /* ram cache */
>>   int colo_init_ram_cache(void);
>>   void colo_release_ram_cache(void);
>> +void colo_flush_ram_cache(void);
>>   #endif
>> diff --git a/migration/colo.c b/migration/colo.c
>> index a4d49ff..e40cdb9 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -401,8 +401,6 @@ void *colo_process_incoming_thread(void *opaque)
>>           }
>>           qemu_mutex_unlock_iothread();
>>
>> -        /* TODO: flush vm state */
>> -
>>           ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_VMSTATE_LOADED);
>>           if (ret < 0) {
>>               goto out;
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 3d5947b..8ff7f7c 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -2458,6 +2458,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>        * be atomic
>>        */
>>       bool postcopy_running = postcopy_state_get() >= POSTCOPY_INCOMING_LISTENING;
>> +    bool need_flush = false;
>>
>>       seq_iter++;
>>
>> @@ -2492,6 +2493,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>               /* After going into COLO, we should load the Page into colo_cache */
>>               if (ram_cache_enable) {
>>                   host = colo_cache_from_block_offset(block, addr);
>> +                need_flush = true;
>>               } else {
>>                   host = host_from_ram_block_offset(block, addr);
>>               }
>> @@ -2585,6 +2587,10 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>       }
>>
>>       rcu_read_unlock();
>> +
>> +    if (!ret  && ram_cache_enable && need_flush) {
>> +        colo_flush_ram_cache();
>> +    }
>>       DPRINTF("Completed load of VM with exit code %d seq iteration "
>>               "%" PRIu64 "\n", ret, seq_iter);
>>       return ret;
>> @@ -2657,6 +2663,38 @@ void colo_release_ram_cache(void)
>>       rcu_read_unlock();
>>   }
>>
>> +/*
>> + * Flush content of RAM cache into SVM's memory.
>> + * Only flush the pages that be dirtied by PVM or SVM or both.
>> + */
>> +void colo_flush_ram_cache(void)
>> +{
>> +    RAMBlock *block = NULL;
>> +    void *dst_host;
>> +    void *src_host;
>> +    ram_addr_t offset = 0;
>> +
>> +    trace_colo_flush_ram_cache_begin(migration_dirty_pages);
>> +    rcu_read_lock();
>> +    block = QLIST_FIRST_RCU(&ram_list.blocks);
>> +    while (block) {
>> +        ram_addr_t ram_addr_abs;
>> +        offset = migration_bitmap_find_dirty(block, offset, &ram_addr_abs);
>> +        migration_bitmap_clear_dirty(ram_addr_abs);
>> +        if (offset >= block->used_length) {
>> +            offset = 0;
>> +            block = QLIST_NEXT_RCU(block, next);
>> +        } else {
>> +            dst_host = block->host + offset;
>> +            src_host = block->colo_cache + offset;
>> +            memcpy(dst_host, src_host, TARGET_PAGE_SIZE);
>> +        }
>> +    }
>> +    rcu_read_unlock();
>> +    trace_colo_flush_ram_cache_end();
>> +    assert(migration_dirty_pages == 0);
>> +}
>> +
>>   static SaveVMHandlers savevm_ram_handlers = {
>>       .save_live_setup = ram_save_setup,
>>       .save_live_iterate = ram_save_iterate,
>> diff --git a/trace-events b/trace-events
>> index 39fdd8d..7f76029 100644
>> --- a/trace-events
>> +++ b/trace-events
>> @@ -1264,6 +1264,8 @@ migration_throttle(void) ""
>>   ram_load_postcopy_loop(uint64_t addr, int flags) "@%" PRIx64 " %x"
>>   ram_postcopy_send_discard_bitmap(void) ""
>>   ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: %zx len: %zx"
>> +colo_flush_ram_cache_begin(uint64_t dirty_pages) "dirty_pages %" PRIu64
>> +colo_flush_ram_cache_end(void) ""
>>
>>   # hw/display/qxl.c
>>   disable qxl_interface_set_mm_time(int qid, uint32_t mm_time) "%d %d"
>>
>
>
>
> .
>
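
The flush step in the patch quoted above copies only the dirtied pages from
the RAM cache into the SVM's memory and clears their dirty marks. A
simplified, standalone sketch of that idea follows. It is not the QEMU
code: it assumes one dirty byte per page instead of the migration bitmap, a
single flat memory region instead of a RAMBlock list, and a hypothetical
function name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy every page marked dirty from 'cache' into 'dst', clear its
 * dirty mark, and return the number of pages copied.
 * 'dirty' holds one byte per page (non-zero means dirty).
 */
static size_t flush_dirty_pages(uint8_t *dst, const uint8_t *cache,
                                uint8_t *dirty, size_t npages,
                                size_t page_size)
{
    size_t flushed = 0;

    assert(dst && cache && dirty);
    for (size_t i = 0; i < npages; i++) {
        if (!dirty[i]) {
            continue;               /* clean page: nothing to copy */
        }
        memcpy(dst + i * page_size, cache + i * page_size, page_size);
        dirty[i] = 0;
        flushed++;
    }
    return flushed;
}
```

After the call no page remains marked dirty, which corresponds to the
assert(migration_dirty_pages == 0) at the end of colo_flush_ram_cache().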


* Re: [Qemu-devel] [PATCH COLO-Frame v12 28/38] COLO: Process shutdown command for VM in COLO state
  2015-12-15 11:31   ` Dr. David Alan Gilbert
@ 2015-12-25  6:13     ` Hailiang Zhang
  0 siblings, 0 replies; 94+ messages in thread
From: Hailiang Zhang @ 2015-12-25  6:13 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, Paolo Bonzini,
	hongyang.yang

On 2015/12/15 19:31, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> If VM is in COLO FT state, we should do some extra work before normal shutdown
>> process. SVM will ignore the shutdown command if this command is issued directly
>> to it, PVM will send the shutdown command to SVM if it gets this command.
>>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> ---
>>   include/sysemu/sysemu.h |  3 +++
>>   migration/colo.c        | 25 +++++++++++++++++++++++--
>>   qapi-schema.json        |  4 +++-
>>   vl.c                    | 26 ++++++++++++++++++++++++--
>>   4 files changed, 53 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
>> index 3bb8897..91eeda3 100644
>> --- a/include/sysemu/sysemu.h
>> +++ b/include/sysemu/sysemu.h
>> @@ -52,6 +52,8 @@ typedef enum WakeupReason {
>>       QEMU_WAKEUP_REASON_OTHER,
>>   } WakeupReason;
>>
>> +extern int colo_shutdown_requested;
>> +
>>   void qemu_system_reset_request(void);
>>   void qemu_system_suspend_request(void);
>>   void qemu_register_suspend_notifier(Notifier *notifier);
>> @@ -59,6 +61,7 @@ void qemu_system_wakeup_request(WakeupReason reason);
>>   void qemu_system_wakeup_enable(WakeupReason reason, bool enabled);
>>   void qemu_register_wakeup_notifier(Notifier *notifier);
>>   void qemu_system_shutdown_request(void);
>> +void qemu_system_shutdown_request_core(void);
>>   void qemu_system_powerdown_request(void);
>>   void qemu_register_powerdown_notifier(Notifier *notifier);
>>   void qemu_system_debug_request(void);
>> diff --git a/migration/colo.c b/migration/colo.c
>> index f4bb661..a094991 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -231,6 +231,7 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
>>                                             QEMUSizedBuffer *buffer)
>>   {
>>       int ret;
>> +    int colo_shutdown;
>>       size_t size;
>>       QEMUFile *trans = NULL;
>>
>> @@ -258,6 +259,7 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
>>           ret = -1;
>>           goto out;
>>       }
>> +    colo_shutdown = colo_shutdown_requested;
>>       vm_stop_force_state(RUN_STATE_COLO);
>>       qemu_mutex_unlock_iothread();
>>       trace_colo_vm_state_change("run", "stop");
>> @@ -311,6 +313,15 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
>>           goto out;
>>       }
>>
>> +    if (colo_shutdown) {
>> +        colo_put_cmd(s->to_dst_file, COLO_COMMAND_GUEST_SHUTDOWN);
>> +        qemu_fflush(s->to_dst_file);
>> +        colo_shutdown_requested = 0;
>> +        qemu_system_shutdown_request_core();
>> +        /* Fix me: Just let the colo thread exit ? */
>> +        qemu_thread_exit(0);
>> +    }
>> +
>>       ret = 0;
>>       /* Resume primary guest */
>>       qemu_mutex_lock_iothread();
>> @@ -370,8 +381,9 @@ static void colo_process_checkpoint(MigrationState *s)
>>           }
>>
>>           current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>> -        if (current_time - checkpoint_time <
>> -            s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY]) {
>> +        if ((current_time - checkpoint_time <
>> +            s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY]) &&
>> +            !colo_shutdown_requested) {
>>               int64_t delay_ms;
>>
>>               delay_ms = s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY] -
>> @@ -442,6 +454,15 @@ static int colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request)
>>       case COLO_COMMAND_CHECKPOINT_REQUEST:
>>           *checkpoint_request = 1;
>>           return 0;
>> +    case COLO_COMMAND_GUEST_SHUTDOWN:
>> +        qemu_mutex_lock_iothread();
>> +        vm_stop_force_state(RUN_STATE_COLO);
>> +        qemu_system_shutdown_request_core();
>> +        qemu_mutex_unlock_iothread();
>> +        /* the main thread will exit and termiante the whole
>
> Typo 'termiante'
>

>> +        * process, do we need some cleanup?
>> +        */
>> +        qemu_thread_exit(0);
>
> Yes, I'm not sure how much real cleanup you need during shutdown;
> I wonder how a shutdown will look to the management layers above;
> if they don't realise it's a shutdown they might try and do a failover
> when one side exits.
>

I have tested this case several times, and it seems to work well.

>>       default:
>>           return -EINVAL;
>>       }
>> diff --git a/qapi-schema.json b/qapi-schema.json
>> index f6ecb88..b5b1a02 100644
>> --- a/qapi-schema.json
>> +++ b/qapi-schema.json
>> @@ -754,12 +754,14 @@
>>   #
>>   # @vmstate-loaded: VM's state has been loaded by SVM.
>>   #
>> +# @guest-shutdown: shutdown require from PVM to SVM
>> +#
>>   # Since: 2.6
>>   ##
>>   { 'enum': 'COLOCommand',
>>     'data': [ 'checkpoint-ready', 'checkpoint-request', 'checkpoint-reply',
>>               'vmstate-send', 'vmstate-size','vmstate-received',
>> -            'vmstate-loaded' ] }
>> +            'vmstate-loaded', 'guest-shutdown' ] }
>>
>>   ##
>>   # @COLOMode
>> diff --git a/vl.c b/vl.c
>> index fca630b..1a61300 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -1636,6 +1636,8 @@ static NotifierList wakeup_notifiers =
>>       NOTIFIER_LIST_INITIALIZER(wakeup_notifiers);
>>   static uint32_t wakeup_reason_mask = ~(1 << QEMU_WAKEUP_REASON_NONE);
>>
>> +int colo_shutdown_requested;
>> +
>>   int qemu_shutdown_requested_get(void)
>>   {
>>       return shutdown_requested;
>> @@ -1767,6 +1769,10 @@ void qemu_system_guest_panicked(void)
>>   void qemu_system_reset_request(void)
>>   {
>>       if (no_reboot) {
>> +        qemu_system_shutdown_request();
>> +        if (!shutdown_requested) {/* colo handle it ? */
>> +            return;
>> +        }
>>           shutdown_requested = 1;
>
> Do we still need that 'shutdown_requested = 1'  - it's already
> true at this point or it returned?
>

No, it is useless; I will remove it.

>>       } else {
>>           reset_requested = 1;
>> @@ -1840,14 +1846,30 @@ void qemu_system_killed(int signal, pid_t pid)
>>       qemu_notify_event();
>>   }
>>
>> -void qemu_system_shutdown_request(void)
>> +void qemu_system_shutdown_request_core(void)
>>   {
>> -    trace_qemu_system_shutdown_request();
>>       replay_shutdown_request();
>>       shutdown_requested = 1;
>>       qemu_notify_event();
>>   }
>>
>> +void qemu_system_shutdown_request(void)
>> +{
>> +    trace_qemu_system_shutdown_request();
>> +    /*
>> +    * if in colo mode, we need do some significant work before respond to the
>> +    * shutdown request.
>> +    */
>> +    if (migration_incoming_in_colo_state()) {
>> +        return ; /* primary's responsibility */
>> +    }
>> +    if (migration_in_colo_state()) {
>> +        colo_shutdown_requested = 1;
>> +        return;
>> +    }
>
> Try to move most of this into migration/colo*.c ;
> here you could just do:
>      if (colo_shutdown()) {
>          return;
>      }
>
> it's best to keep vl.c as simple as possible.
>

Yes, that is reasonable. I will fix it in the next version.
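
A minimal sketch of that refactoring, assuming a helper named colo_shutdown()
as suggested above. The two state predicates keep the names used in the patch,
but here they are stand-in stubs backed by hypothetical flags
(in_colo_secondary, in_colo_primary) so the fragment compiles on its own; the
trace and replay calls are left out.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the real QEMU state; illustration only. */
static bool in_colo_secondary;
static bool in_colo_primary;
static int colo_shutdown_requested;
static int shutdown_requested;

static bool migration_incoming_in_colo_state(void)
{
    return in_colo_secondary;
}

static bool migration_in_colo_state(void)
{
    return in_colo_primary;
}

/*
 * Would live in migration/colo*.c.  Returns true when COLO takes over
 * the shutdown request, so the caller must not shut down directly.
 */
static bool colo_shutdown(void)
{
    if (migration_incoming_in_colo_state()) {
        return true; /* secondary ignores it: primary's responsibility */
    }
    if (migration_in_colo_state()) {
        colo_shutdown_requested = 1; /* handled at the next checkpoint */
        return true;
    }
    return false;
}

static void qemu_system_shutdown_request_core(void)
{
    shutdown_requested = 1;
}

/* What stays in vl.c: one call into COLO, then the normal path. */
static void qemu_system_shutdown_request(void)
{
    if (colo_shutdown()) {
        return;
    }
    qemu_system_shutdown_request_core();
}
```

With this shape vl.c no longer needs to know which side of the COLO pair it is
running on; it only asks whether COLO has claimed the request.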

> Dave
>
>> +    qemu_system_shutdown_request_core();
>> +}
>> +
>>   static void qemu_system_powerdown(void)
>>   {
>>       qapi_event_send_powerdown(&error_abort);
>> --
>> 1.8.3.1
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>


* Re: [Qemu-devel] [PATCH COLO-Frame v12 30/38] savevm: Split load vm state function qemu_loadvm_state
  2015-12-15 12:08   ` Dr. David Alan Gilbert
@ 2015-12-25  6:37     ` Hailiang Zhang
  0 siblings, 0 replies; 94+ messages in thread
From: Hailiang Zhang @ 2015-12-25  6:37 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

On 2015/12/15 20:08, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> qemu_loadvm_state is too long, and we can simplify it by splitting up
>> with three helper functions.
>
> Yes, good idea.
>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> ---
>>   migration/savevm.c | 161 ++++++++++++++++++++++++++++++++---------------------
>>   1 file changed, 97 insertions(+), 64 deletions(-)
>>
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index f102870..c7c26d8 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -1710,90 +1710,123 @@ void loadvm_free_handlers(MigrationIncomingState *mis)
>>       }
>>   }
>>
>> +static int
>> +qemu_loadvm_section_start_full(QEMUFile *f, MigrationIncomingState *mis)
>> +{
>> +    uint32_t instance_id, version_id, section_id;
>> +    SaveStateEntry *se;
>> +    LoadStateEntry *le;
>> +    char idstr[256];
>> +    int ret;
>> +
>> +    /* Read section start */
>> +    section_id = qemu_get_be32(f);
>> +    if (!qemu_get_counted_string(f, idstr)) {
>> +        error_report("Unable to read ID string for section %u",
>> +                     section_id);
>> +        return -EINVAL;
>> +    }
>> +    instance_id = qemu_get_be32(f);
>> +    version_id = qemu_get_be32(f);
>> +
>> +    trace_qemu_loadvm_state_section_startfull(section_id, idstr,
>> +            instance_id, version_id);
>> +    /* Find savevm section */
>> +    se = find_se(idstr, instance_id);
>> +    if (se == NULL) {
>> +        error_report("Unknown savevm section or instance '%s' %d",
>> +                     idstr, instance_id);
>> +        ret = -EINVAL;
>> +        return ret;
>
> Minor; you don't need 'ret' there, just return -EINVAL.
>
>> +    }
>> +
>> +    /* Validate version */
>> +    if (version_id > se->version_id) {
>> +        error_report("savevm: unsupported version %d for '%s' v%d",
>> +                     version_id, idstr, se->version_id);
>> +        ret = -EINVAL;
>> +        return ret;
>
> same
>
>> +    }
>> +
>> +    /* Add entry */
>> +    le = g_malloc0(sizeof(*le));
>> +
>> +    le->se = se;
>> +    le->section_id = section_id;
>> +    le->version_id = version_id;
>> +    QLIST_INSERT_HEAD(&mis->loadvm_handlers, le, entry);
>> +
>> +    ret = vmstate_load(f, le->se, le->version_id);
>> +    if (ret < 0) {
>> +        error_report("error while loading state for instance 0x%x of"
>> +                     " device '%s'", instance_id, idstr);
>> +        return ret;
>> +    }
>> +    if (!check_section_footer(f, le)) {
>> +        ret = -EINVAL;
>> +        return ret;
>
> same.
>
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static int
>> +qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis)
>> +{
>> +    uint32_t section_id;
>> +    LoadStateEntry *le;
>> +    int ret;
>> +
>> +    section_id = qemu_get_be32(f);
>> +
>> +    trace_qemu_loadvm_state_section_partend(section_id);
>> +    QLIST_FOREACH(le, &mis->loadvm_handlers, entry) {
>> +        if (le->section_id == section_id) {
>> +            break;
>> +        }
>> +    }
>> +    if (le == NULL) {
>> +        error_report("Unknown savevm section %d", section_id);
>> +        ret = -EINVAL;
>> +        return ret;
>
> same
>
>> +    }
>> +
>> +    ret = vmstate_load(f, le->se, le->version_id);
>> +    if (ret < 0) {
>> +        error_report("error while loading state section id %d(%s)",
>> +                     section_id, le->se->idstr);
>> +        return ret;
>> +    }
>> +    if (!check_section_footer(f, le)) {
>> +        ret = -EINVAL;
>> +        return ret;
>
> same
>
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>>   static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
>>   {
>>       uint8_t section_type;
>>       int ret;
>>
>>       while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
>> -        uint32_t instance_id, version_id, section_id;
>> -        SaveStateEntry *se;
>> -        LoadStateEntry *le;
>> -        char idstr[256];
>>
>>           trace_qemu_loadvm_state_section(section_type);
>>           switch (section_type) {
>>           case QEMU_VM_SECTION_START:
>>           case QEMU_VM_SECTION_FULL:
>> -            /* Read section start */
>> -            section_id = qemu_get_be32(f);
>> -            if (!qemu_get_counted_string(f, idstr)) {
>> -                error_report("Unable to read ID string for section %u",
>> -                            section_id);
>> -                return -EINVAL;
>> -            }
>> -            instance_id = qemu_get_be32(f);
>> -            version_id = qemu_get_be32(f);
>> -
>> -            trace_qemu_loadvm_state_section_startfull(section_id, idstr,
>> -                                                      instance_id, version_id);
>> -            /* Find savevm section */
>> -            se = find_se(idstr, instance_id);
>> -            if (se == NULL) {
>> -                error_report("Unknown savevm section or instance '%s' %d",
>> -                             idstr, instance_id);
>> -                return -EINVAL;
>> -            }
>> -
>> -            /* Validate version */
>> -            if (version_id > se->version_id) {
>> -                error_report("savevm: unsupported version %d for '%s' v%d",
>> -                             version_id, idstr, se->version_id);
>> -                return -EINVAL;
>> -            }
>> -
>> -            /* Add entry */
>> -            le = g_malloc0(sizeof(*le));
>> -
>> -            le->se = se;
>> -            le->section_id = section_id;
>> -            le->version_id = version_id;
>> -            QLIST_INSERT_HEAD(&mis->loadvm_handlers, le, entry);
>> -
>> -            ret = vmstate_load(f, le->se, le->version_id);
>> +            ret = qemu_loadvm_section_start_full(f, mis);
>>               if (ret < 0) {
>> -                error_report("error while loading state for instance 0x%x of"
>> -                             " device '%s'", instance_id, idstr);
>>                   return ret;
>>               }
>> -            if (!check_section_footer(f, le)) {
>> -                return -EINVAL;
>> -            }
>>               break;
>>           case QEMU_VM_SECTION_PART:
>>           case QEMU_VM_SECTION_END:
>> -            section_id = qemu_get_be32(f);
>> -
>> -            trace_qemu_loadvm_state_section_partend(section_id);
>> -            QLIST_FOREACH(le, &mis->loadvm_handlers, entry) {
>> -                if (le->section_id == section_id) {
>> -                    break;
>> -                }
>> -            }
>> -            if (le == NULL) {
>> -                error_report("Unknown savevm section %d", section_id);
>> -                return -EINVAL;
>> -            }
>> -
>> -            ret = vmstate_load(f, le->se, le->version_id);
>> +            ret = qemu_loadvm_section_part_end(f, mis);
>>               if (ret < 0) {
>> -                error_report("error while loading state section id %d(%s)",
>> -                             section_id, le->se->idstr);
>>                   return ret;
>>               }
>> -            if (!check_section_footer(f, le)) {
>> -                return -EINVAL;
>> -            }
>>               break;
>>           case QEMU_VM_COMMAND:
>>               ret = loadvm_process_command(f);
>> --
>> 1.8.3.1
>
>
> Other than the minor return fixups;

I will fix them all in the next version.

> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>

Thanks.

>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>


* Re: [Qemu-devel] [PATCH COLO-Frame v12 31/38] COLO: Separate the process of saving/loading ram and device state
  2015-12-18 10:53   ` Dr. David Alan Gilbert
@ 2015-12-28  3:46     ` Hailiang Zhang
  0 siblings, 0 replies; 94+ messages in thread
From: Hailiang Zhang @ 2015-12-28  3:46 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

On 2015/12/18 18:53, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> We separate the process of saving/loading ram and device state when do checkpoint,
>> we add new helpers for save/load ram/device. With this change, we can directly
>> transfer ram from master to slave without using QEMUSizeBuffer as assistant,
>> which also reduce the size of extra memory been used during checkpoint.
>>
>> Besides, we move the colo_flush_ram_cache to the proper position after the
>> above change.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> ---
>> v11:
>> - Remove load configuration section in qemu_loadvm_state_begin()
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> ---
>>   include/sysemu/sysemu.h |   6 +++
>>   migration/colo.c        |  43 ++++++++++++----
>>   migration/ram.c         |   5 --
>>   migration/savevm.c      | 132 ++++++++++++++++++++++++++++++++++++++++++++++--
>>   4 files changed, 168 insertions(+), 18 deletions(-)
>>
>> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
>> index 91eeda3..5deae53 100644
>> --- a/include/sysemu/sysemu.h
>> +++ b/include/sysemu/sysemu.h
>> @@ -133,7 +133,13 @@ void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
>>                                              uint64_t *start_list,
>>                                              uint64_t *length_list);
>>
>> +int qemu_save_ram_precopy(QEMUFile *f);
>> +int qemu_save_device_state(QEMUFile *f);
>> +
>>   int qemu_loadvm_state(QEMUFile *f);
>> +int qemu_loadvm_state_begin(QEMUFile *f);
>> +int qemu_load_ram_state(QEMUFile *f);
>> +int qemu_load_device_state(QEMUFile *f);
>>
>>   typedef enum DisplayType
>>   {
>> diff --git a/migration/colo.c b/migration/colo.c
>> index 62a0444..d253d64 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -272,21 +272,32 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
>>           goto out;
>>       }
>>
>> +    ret = colo_put_cmd(s->to_dst_file, COLO_COMMAND_VMSTATE_SEND);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>>       /* Disable block migration */
>>       s->params.blk = 0;
>>       s->params.shared = 0;
>> -    qemu_savevm_state_header(trans);
>> -    qemu_savevm_state_begin(trans, &s->params);
>> -    qemu_mutex_lock_iothread();
>> -    qemu_savevm_state_complete_precopy(trans, false);
>> -    qemu_mutex_unlock_iothread();
>> -
>> -    qemu_fflush(trans);
>> +    qemu_savevm_state_begin(s->to_dst_file, &s->params);
>> +    ret = qemu_file_get_error(s->to_dst_file);
>> +    if (ret < 0) {
>> +        error_report("save vm state begin error\n");
>
> You don't need \n in error_report (Markus is trying to get rid
> of all the existing cases where people do that!)
>

I will remove it.

>> +        goto out;
>> +    }
>>
>> -    ret = colo_put_cmd(s->to_dst_file, COLO_COMMAND_VMSTATE_SEND);
>> +    qemu_mutex_lock_iothread();
>> +    /* Note: device state is saved into buffer */
>> +    ret = qemu_save_device_state(trans);
>>       if (ret < 0) {
>> +        error_report("save device state error\n");
>> +        qemu_mutex_unlock_iothread();
>>           goto out;
>>       }
>> +    qemu_fflush(trans);
>> +    qemu_save_ram_precopy(s->to_dst_file);
>> +    qemu_mutex_unlock_iothread();
>> +
>
> It's interesting you're saving the devices and then saving the RAM,
> where as in a normal migration we always save the RAM first and then
> the devices;  I don't _think_ it makes any difference but I thought
> I'd point it out.
>

Yes, you are right; I will adjust the order.

>>       /* we send the total size of the vmstate first */
>>       size = qsb_get_length(buffer);
>>       ret = colo_put_cmd_value(s->to_dst_file, COLO_COMMAND_VMSTATE_SIZE, size);
>> @@ -545,6 +556,16 @@ void *colo_process_incoming_thread(void *opaque)
>>               goto out;
>>           }
>>
>> +        ret = qemu_loadvm_state_begin(mis->from_src_file);
>> +        if (ret < 0) {
>> +            error_report("load vm state begin error, ret=%d", ret);
>> +            goto out;
>> +        }
>> +        ret = qemu_load_ram_state(mis->from_src_file);
>> +        if (ret < 0) {
>> +            error_report("load ram state error");
>> +            goto out;
>> +        }
>>           /* read the VM state total size first */
>>           ret = colo_get_cmd_value(mis->from_src_file,
>>                                    COLO_COMMAND_VMSTATE_SIZE, &value);
>> @@ -577,8 +598,10 @@ void *colo_process_incoming_thread(void *opaque)
>>           qemu_mutex_lock_iothread();
>>           qemu_system_reset(VMRESET_SILENT);
>>           vmstate_loading = true;
>> -        if (qemu_loadvm_state(fb) < 0) {
>> -            error_report("COLO: loadvm failed");
>> +        colo_flush_ram_cache();
>> +        ret = qemu_load_device_state(fb);
>> +        if (ret < 0) {
>> +            error_report("COLO: load device state failed\n");
>>               vmstate_loading = false;
>>               qemu_mutex_unlock_iothread();
>>               goto out;
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 8ff7f7c..45d9332 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -2458,7 +2458,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>        * be atomic
>>        */
>>       bool postcopy_running = postcopy_state_get() >= POSTCOPY_INCOMING_LISTENING;
>> -    bool need_flush = false;
>>
>>       seq_iter++;
>>
>> @@ -2493,7 +2492,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>               /* After going into COLO, we should load the Page into colo_cache */
>>               if (ram_cache_enable) {
>>                   host = colo_cache_from_block_offset(block, addr);
>> -                need_flush = true;
>>               } else {
>>                   host = host_from_ram_block_offset(block, addr);
>>               }
>> @@ -2588,9 +2586,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>
>>       rcu_read_unlock();
>>
>> -    if (!ret  && ram_cache_enable && need_flush) {
>> -        colo_flush_ram_cache();
>> -    }
>>       DPRINTF("Completed load of VM with exit code %d seq iteration "
>>               "%" PRIu64 "\n", ret, seq_iter);
>>       return ret;
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index c7c26d8..94c0d10 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -50,6 +50,7 @@
>>   #include "qemu/iov.h"
>>   #include "block/snapshot.h"
>>   #include "block/qapi.h"
>> +#include "migration/colo.h"
>>
>>
>>   #ifndef ETH_P_RARP
>> @@ -923,6 +924,10 @@ void qemu_savevm_state_begin(QEMUFile *f,
>>               break;
>>           }
>>       }
>> +    if (migration_in_colo_state()) {
>> +        qemu_put_byte(f, QEMU_VM_EOF);
>> +        qemu_fflush(f);
>> +    }
>>   }
>>
>>   /*
>> @@ -1192,13 +1197,44 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
>>       return ret;
>>   }
>>
>> -static int qemu_save_device_state(QEMUFile *f)
>> +int qemu_save_ram_precopy(QEMUFile *f)
>>   {
>>       SaveStateEntry *se;
>> +    int ret = 0;
>>
>> -    qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
>> -    qemu_put_be32(f, QEMU_VM_FILE_VERSION);
>> +    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
>> +        if (!se->ops || !se->ops->save_live_complete_precopy) {
>> +            continue;
>> +        }
>> +        if (se->ops && se->ops->is_active) {
>> +            if (!se->ops->is_active(se->opaque)) {
>> +                continue;
>> +            }
>> +        }
>> +        trace_savevm_section_start(se->idstr, se->section_id);
>
> Please update the trace_ names to match the function.
>

OK.

>> +
>> +        save_section_header(f, se, QEMU_VM_SECTION_END);
>>
>> +        ret = se->ops->save_live_complete_precopy(f, se->opaque);
>> +        trace_savevm_section_end(se->idstr, se->section_id, ret);
>> +        save_section_footer(f, se);
>> +        if (ret < 0) {
>> +            qemu_file_set_error(f, ret);
>> +            return ret;
>> +        }
>> +    }
>> +    qemu_put_byte(f, QEMU_VM_EOF);
>> +
>> +    return 0;
>> +}
>
> OK, that function is a bit odd - you're relying on a device having
> save_live_complete_precopy to let you know that it's a RAM block; it's

The function name is not accurate. It covers not only RAM but also any other
devices that use save_live_complete_precopy(); they should all be saved in this
function. For now, that is RAM only.

> currently true but it'll get more interesting if anyone tries to
> add any other postcopy devices.  Please add a comment at least
> to point that out.
>

OK.

>> +
>> +int qemu_save_device_state(QEMUFile *f)
>> +{
>> +    SaveStateEntry *se;
>> +
>> +    if (!migration_in_colo_state()) {
>> +        qemu_savevm_state_header(f);
>> +    }
>>       cpu_synchronize_all_states();
>>
>>       QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
>> @@ -1938,6 +1974,96 @@ int qemu_loadvm_state(QEMUFile *f)
>>       return ret;
>>   }
>>
>> +int qemu_loadvm_state_begin(QEMUFile *f)
>> +{
>> +    uint8_t section_type;
>> +    int ret = -1;
>> +    MigrationIncomingState *mis = migration_incoming_get_current();
>> +
>> +    if (!mis) {
>> +        error_report("qemu_loadvm_state_begin");
>> +        return -EINVAL;
>> +    }
>
> an odd error; how can that happen?
>

It should never happen, so the check is unnecessary. I will remove it.

>> +    /* CleanUp */
>> +    loadvm_free_handlers(mis);
>
> I don't understand why you do that here?
>

We do a full migration before entering COLO state, and while loading the VM
state we save the section info (LoadStateEntry) into a list; please see
qemu_loadvm_section_start_full().
We re-save the section info on every VM state load, so we need to clean up
the previous section info stored in the mis->loadvm_handlers list.
Hmm, besides, we should also release it after we finish loading the VM state
for every checkpoint. We don't do that in this version, so there is a memory
leak.
I will fix this bug in the next version.
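
The lifetime described above can be sketched roughly as follows. This is not
the QEMU code: LoadEntry, IncomingState, handlers_add(), handlers_free() and
load_checkpoint() are hypothetical names, and a plain singly linked list
stands in for mis->loadvm_handlers. The point is only that entries are dropped
both before a load (stale entries from the previous pass) and after it (so
nothing leaks between checkpoints).

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for LoadStateEntry. */
typedef struct LoadEntry {
    uint32_t section_id;
    struct LoadEntry *next;
} LoadEntry;

/* Hypothetical stand-in for the incoming state that owns the list. */
typedef struct IncomingState {
    LoadEntry *handlers;
} IncomingState;

/* Free every entry and leave the list empty. */
static void handlers_free(IncomingState *mis)
{
    while (mis->handlers) {
        LoadEntry *le = mis->handlers;
        mis->handlers = le->next;
        free(le);
    }
}

/* Record one section at the head of the list. */
static int handlers_add(IncomingState *mis, uint32_t section_id)
{
    LoadEntry *le = calloc(1, sizeof(*le));
    if (!le) {
        return -1;
    }
    le->section_id = section_id;
    le->next = mis->handlers;
    mis->handlers = le;
    return 0;
}

/* One checkpoint load; returns the number of sections registered. */
static int load_checkpoint(IncomingState *mis, const uint32_t *ids, size_t n)
{
    int registered = 0;
    size_t i;

    handlers_free(mis); /* drop anything left from an earlier load */
    for (i = 0; i < n; i++) {
        if (handlers_add(mis, ids[i]) < 0) {
            registered = -1;
            break;
        }
        registered++;
    }
    /* ... RAM and device state would be loaded here ... */
    handlers_free(mis); /* release before the next checkpoint */
    return registered;
}
```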

>> +
>> +    if (qemu_savevm_state_blocked(NULL)) {
>> +        return -EINVAL;
>> +    }
>
> The other calls to that function print the error it returns!
>

OK, I will fix it.

>> +    while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
>> +        if (section_type != QEMU_VM_SECTION_START) {
>> +            error_report("QEMU_VM_SECTION_START");
>> +            ret = -EINVAL;
>> +            goto out;
>> +        }
>> +        ret = qemu_loadvm_section_start_full(f, mis);
>> +        if (ret < 0) {
>> +            goto out;
>> +        }
>> +    }
>> +    ret = qemu_file_get_error(f);
>> +    if (ret == 0) {
>> +        return 0;
>> +     }
>
> That 'if' isn't needed - just remove the 3 lines and it will do the same
> thing!
>

OK.
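
To spell out why the 'if' can go: when ret is 0, returning 0 early and falling
through to 'return ret' give the same value. A small stand-alone comparison,
with hypothetical function names and plain ints in place of the QEMUFile
calls:

```c
/* Shape used in the patch: early 'return 0' before the out label. */
static int finish_verbose(int section_ret, int file_error)
{
    int ret = section_ret;

    if (ret < 0) {
        goto out;
    }
    ret = file_error;
    if (ret == 0) {
        return 0;
    }
out:
    return ret;
}

/* Same behaviour with the three lines removed. */
static int finish_simple(int section_ret, int file_error)
{
    int ret = section_ret;

    if (ret < 0) {
        goto out;
    }
    ret = file_error;
out:
    return ret;
}
```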

>> +out:
>> +    return ret;
>> +}
>> +
>> +int qemu_load_ram_state(QEMUFile *f)
>> +{
>> +    uint8_t section_type;
>> +    MigrationIncomingState *mis = migration_incoming_get_current();
>> +    int ret = -1;
>> +
>> +    while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
>> +        if (section_type != QEMU_VM_SECTION_PART &&
>> +            section_type != QEMU_VM_SECTION_END) {
>> +            error_report("load ram state, not get "
>> +                         "QEMU_VM_SECTION_FULL or QEMU_VM_SECTION_END");
>> +            return -EINVAL;
>> +        }
>> +        ret = qemu_loadvm_section_part_end(f, mis);
>> +        if (ret < 0) {
>> +            goto out;
>> +        }
>> +    }
>> +    ret = qemu_file_get_error(f);
>> +    if (ret == 0) {
>> +        return 0;
>> +     }
>> +out:
>> +    return ret;
>> +}
>> +
>> +int qemu_load_device_state(QEMUFile *f)
>> +{
>> +    uint8_t section_type;
>> +    MigrationIncomingState *mis = migration_incoming_get_current();
>> +    int ret = -1;
>> +
>> +    while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
>> +        if (section_type != QEMU_VM_SECTION_FULL) {
>> +            error_report("load device state error: "
>> +                         "Not get QEMU_VM_SECTION_FULL");
>> +            return -EINVAL;
>> +        }
>> +        ret = qemu_loadvm_section_start_full(f, mis);
>> +        if (ret < 0) {
>> +            goto out;
>> +        }
>> +    }
>> +
>> +    ret = qemu_file_get_error(f);
>> +
>> +    cpu_synchronize_all_post_init();
>> +    if (ret == 0) {
>> +        return 0;
>> +    }
>> +out:
>> +    return ret;
>> +}
>> +
>
> These three functions are all very similar;  would it be easier
> just to call qemu_loadvm_state_main ?  Perhaps add a flag/enum

Yes, we can call qemu_loadvm_state_main() directly in these three functions.
We don't have to change the definition of qemu_loadvm_state_main(), which is
simpler.
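
For comparison, the flag/enum variant raised in the review could look roughly
like the sketch below: one shared loop that takes a mask of permitted section
types. Everything here is illustrative. load_sections(), ALLOW() and the SEC_*
constants are hypothetical names with values chosen for this sketch, and a
byte array stands in for the QEMUFile stream.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Section type markers; names and values are for this sketch only. */
enum {
    SEC_EOF = 0,
    SEC_START = 1,
    SEC_PART = 2,
    SEC_END = 3,
    SEC_FULL = 4
};

#define ALLOW(t) (1u << (t))

/*
 * Walk a stream of section type bytes until SEC_EOF.  Any type not in
 * 'allowed' is rejected with -EINVAL.  '*loaded' counts the accepted
 * sections; a real loader would dispatch to the section handlers here.
 */
static int load_sections(const uint8_t *stream, size_t len,
                         unsigned allowed, size_t *loaded)
{
    size_t i;

    *loaded = 0;
    for (i = 0; i < len; i++) {
        uint8_t type = stream[i];

        if (type == SEC_EOF) {
            return 0;
        }
        if (type > SEC_FULL || !(allowed & ALLOW(type))) {
            return -EINVAL;
        }
        (*loaded)++;
    }
    return -EIO; /* ran out of data without seeing SEC_EOF */
}
```

The three callers would then differ only in the mask they pass, for example
ALLOW(SEC_START) for the begin phase, ALLOW(SEC_PART) | ALLOW(SEC_END) for
RAM, and ALLOW(SEC_FULL) for device state.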

Thanks.
Hailiang

> parameter to it for it to check which section_types are allowed
> in the 3 different cases?
>
> Dave
>
>>   void hmp_savevm(Monitor *mon, const QDict *qdict)
>>   {
>>       BlockDriverState *bs, *bs1;
>> --
>> 1.8.3.1
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>


* Re: [Qemu-devel] [PATCH COLO-Frame v12 32/38] COLO: Split qemu_savevm_state_begin out of checkpoint process
  2015-12-18 12:01   ` Dr. David Alan Gilbert
@ 2015-12-28  7:29     ` Hailiang Zhang
  0 siblings, 0 replies; 94+ messages in thread
From: Hailiang Zhang @ 2015-12-28  7:29 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

On 2015/12/18 20:01, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> It is unnecessary to call qemu_savevm_state_begin() in every checkponit process.
>> It mainly sets up devices and does the first device state pass. These data will
>> not change during the later checkpoint process. So, we split it out of
>> colo_do_checkpoint_transaction(), in this way, we can reduce these data
>> transferring in the later checkpoint.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> ---
>>   migration/colo.c | 51 +++++++++++++++++++++++++++++++++++++--------------
>>   1 file changed, 37 insertions(+), 14 deletions(-)
>>
>> diff --git a/migration/colo.c b/migration/colo.c
>> index d253d64..4571359 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -276,15 +276,6 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
>>       if (ret < 0) {
>>           goto out;
>>       }
>> -    /* Disable block migration */
>> -    s->params.blk = 0;
>> -    s->params.shared = 0;
>> -    qemu_savevm_state_begin(s->to_dst_file, &s->params);
>> -    ret = qemu_file_get_error(s->to_dst_file);
>> -    if (ret < 0) {
>> -        error_report("save vm state begin error\n");
>> -        goto out;
>> -    }
>>
>>       qemu_mutex_lock_iothread();
>>       /* Note: device state is saved into buffer */
>> @@ -348,6 +339,21 @@ out:
>>       return ret;
>>   }
>>
>> +static int colo_prepare_before_save(MigrationState *s)
>> +{
>> +    int ret;
>> +    /* Disable block migration */
>> +    s->params.blk = 0;
>> +    s->params.shared = 0;
>> +    qemu_savevm_state_begin(s->to_dst_file, &s->params);
>> +    ret = qemu_file_get_error(s->to_dst_file);
>> +    if (ret < 0) {
>> +        error_report("save vm state begin error\n");
>
>   '\n' again not needed.
>
>> +        return ret;
>> +    }
>> +    return 0;
>> +}
>> +
>>   static void colo_process_checkpoint(MigrationState *s)
>>   {
>>       QEMUSizedBuffer *buffer = NULL;
>> @@ -363,6 +369,11 @@ static void colo_process_checkpoint(MigrationState *s)
>>           goto out;
>>       }
>>
>> +    ret = colo_prepare_before_save(s);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>>       /*
>>        * Wait for Secondary finish loading vm states and enter COLO
>>        * restore.
>> @@ -484,6 +495,18 @@ static int colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request)
>>       }
>>   }
>>
>> +static int colo_prepare_before_load(QEMUFile *f)
>> +{
>> +    int ret;
>> +
>> +    ret = qemu_loadvm_state_begin(f);
>> +    if (ret < 0) {
>> +        error_report("load vm state begin error, ret=%d", ret);
>> +        return ret;
>
> You can simplify these returns; remove this line.
>
>> +    }
>> +    return 0;
>
> and make this return ret; same in a few places.
>
>
> Other than those minor issues;
>

I will fix them all in the next version.

> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>

Thanks.
Hailiang

>
>> +}
>> +
>>   void *colo_process_incoming_thread(void *opaque)
>>   {
>>       MigrationIncomingState *mis = opaque;
>> @@ -522,6 +545,11 @@ void *colo_process_incoming_thread(void *opaque)
>>           goto out;
>>       }
>>
>> +    ret = colo_prepare_before_load(mis->from_src_file);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>>       ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_CHECKPOINT_READY);
>>       if (ret < 0) {
>>           goto out;
>> @@ -556,11 +584,6 @@ void *colo_process_incoming_thread(void *opaque)
>>               goto out;
>>           }
>>
>> -        ret = qemu_loadvm_state_begin(mis->from_src_file);
>> -        if (ret < 0) {
>> -            error_report("load vm state begin error, ret=%d", ret);
>> -            goto out;
>> -        }
>>           ret = qemu_load_ram_state(mis->from_src_file);
>>           if (ret < 0) {
>>               error_report("load ram state error");
>> --
>> 1.8.3.1
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>
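
For readers following the "simplify these returns" comment in this message, a minimal sketch of the suggested shape. It assumes the callee returns 0 on success and a negative value on failure; sketch_loadvm_state_begin() is a hypothetical stand-in for qemu_loadvm_state_begin(), and fprintf() stands in for error_report().

```c
#include <stdio.h>

/* Hypothetical stand-in for qemu_loadvm_state_begin(): 0 or a negative errno. */
static int sketch_loadvm_state_begin(int simulated_result)
{
    return simulated_result;
}

/* Single exit path: report on failure, then return ret in both cases. */
static int sketch_prepare_before_load(int simulated_result)
{
    int ret = sketch_loadvm_state_begin(simulated_result);

    if (ret < 0) {
        /* fprintf() needs the newline; the review notes error_report() does not. */
        fprintf(stderr, "load vm state begin error, ret=%d\n", ret);
    }
    return ret;
}
```

This only matches the original behaviour if the callee never returns a positive value on success, which is an assumption here.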


* Re: [Qemu-devel] [PATCH COLO-Frame v12 10/38] COLO: Implement colo checkpoint protocol
  2015-12-18 14:52   ` Dr. David Alan Gilbert
@ 2015-12-28  7:34     ` Hailiang Zhang
  0 siblings, 0 replies; 94+ messages in thread
From: Hailiang Zhang @ 2015-12-28  7:34 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

On 2015/12/18 22:52, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> We need a user-defined communication protocol to control the checkpoint
>> process.
>>
>> A new checkpoint request is started by the Primary VM, and the interaction
>> proceeds as shown below:
>> Checkpoint synchronizing points,
>>
>>                         Primary                         Secondary
>>                                                         initial work
>> 'checkpoint-ready'     <------------------------------ @
>>
>> 'checkpoint-request'   @ ----------------------------->
>>                                                         Suspend (Only in hybrid mode)
>> 'checkpoint-reply'     <------------------------------ @
>>                         Suspend&Save state
>> 'vmstate-send'         @ ----------------------------->
>>                         Send state                      Receive state
>> 'vmstate-received'     <------------------------------ @
>>                         Release packets                 Load state
>> 'vmstate-loaded'       <------------------------------ @
>>                         Resume                          Resume (Only in hybrid mode)
>>
>>                         Start Comparing (Only in hybrid mode)
>> NOTE:
>>   1) '@' marks the side that sends the message.
>>   2) Every sync-point is synchronized by the two sides with only
>>      one handshake (single direction) for low latency.
>>      If stricter synchronization is required, a sync-point in the
>>      opposite direction should be added.
>>   3) Since sync-points are single direction, the remote side may
>>      have run far ahead by the time this side receives the sync-point.
>>   4) For now, we only support 'periodic' checkpoint mode, in which
>>      the Secondary VM is not running. Later we will support 'hybrid' mode.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> Cc: Eric Blake <eblake@redhat.com>
>> ---
>> v12:
>> - Rename colo_ctl_put() to colo_put_cmd()
>> - Rename colo_ctl_get() to colo_get_check_cmd() and drop
>>    the third parameter
>> - Rename colo_ctl_get_cmd() to colo_get_cmd()
>> - Remove useless 'invalid' member for COLOcommand enum.
>> v11:
>> - Add missing 'checkpoint-ready' communication in comment.
>> - Use parameter to return 'value' for colo_ctl_get() (Dave's suggestion)
>> - Fix trace for colo_ctl_get() to trace command and value both
>> v10:
>> - Rename enum COLOCmd to COLOCommand (Eric's suggestion).
>> - Remove unused 'ram-steal'
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> ---
>>   migration/colo.c | 183 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>   qapi-schema.json |  25 ++++++++
>>   trace-events     |   2 +
>>   3 files changed, 208 insertions(+), 2 deletions(-)
>>
>> diff --git a/migration/colo.c b/migration/colo.c
>> index 0ab9618..0ce2a6e 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -10,10 +10,12 @@
>>    * later.  See the COPYING file in the top-level directory.
>>    */
>>
>> +#include <unistd.h>
>>   #include "sysemu/sysemu.h"
>>   #include "migration/colo.h"
>>   #include "trace.h"
>>   #include "qemu/error-report.h"
>> +#include "qemu/sockets.h"
>>
>>   bool colo_supported(void)
>>   {
>> @@ -34,6 +36,100 @@ bool migration_incoming_in_colo_state(void)
>>       return mis && (mis->state == MIGRATION_STATUS_COLO);
>>   }
>>
>> +static int colo_put_cmd(QEMUFile *f, uint32_t cmd)
>> +{
>> +    int ret;
>> +
>> +    if (cmd >= COLO_COMMAND_MAX) {
>> +        error_report("%s: Invalid cmd", __func__);
>> +        return -EINVAL;
>> +    }
>> +    qemu_put_be32(f, cmd);
>> +    qemu_fflush(f);
>> +
>> +    ret = qemu_file_get_error(f);
>> +    trace_colo_put_cmd(COLOCommand_lookup[cmd]);
>> +
>> +    return ret;
>> +}
>> +
>> +static int colo_get_cmd(QEMUFile *f, uint32_t *cmd)
>> +{
>> +    int ret;
>> +
>> +    *cmd = qemu_get_be32(f);
>> +    ret = qemu_file_get_error(f);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +    if (*cmd >= COLO_COMMAND_MAX) {
>> +        error_report("%s: Invalid cmd", __func__);
>> +        return -EINVAL;
>> +    }
>> +    trace_colo_get_cmd(COLOCommand_lookup[*cmd]);
>> +    return 0;
>> +}
>> +
>> +static int colo_get_check_cmd(QEMUFile *f, uint32_t expect_cmd)
>> +{
>> +    int ret;
>> +    uint32_t cmd;
>> +
>> +    ret = colo_get_cmd(f, &cmd);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +    if (cmd != expect_cmd) {
>> +        error_report("Unexpect colo command, expect:%d, but got cmd:%d",
>> +                     expect_cmd, cmd);
>
> Those still need to be PRIu32
>

Er, I have changed the type to 'COLOCommand' according to Markus's comment.

> But other than that,
>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Thanks.
Hailiang
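
As an illustration of the PRIu32 remark above, here is a minimal sketch of formatting two uint32_t values portably. It applies only while the arguments stay uint32_t (not after a switch to the enum type); sketch_format_unexpected() is a hypothetical helper, and snprintf() stands in for error_report().

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format the mismatch message; returns what snprintf() returns. */
static int sketch_format_unexpected(char *buf, size_t len,
                                    uint32_t expect_cmd, uint32_t cmd)
{
    /* PRIu32 expands to the correct conversion for uint32_t on this platform. */
    return snprintf(buf, len,
                    "Unexpected colo command, expect:%" PRIu32
                    ", but got cmd:%" PRIu32, expect_cmd, cmd);
}
```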

>
>> +        return -EINVAL;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static int colo_do_checkpoint_transaction(MigrationState *s)
>> +{
>> +    int ret;
>> +
>> +    ret = colo_put_cmd(s->to_dst_file, COLO_COMMAND_CHECKPOINT_REQUEST);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>> +    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
>> +                             COLO_COMMAND_CHECKPOINT_REPLY);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>> +    /* TODO: suspend and save vm state to colo buffer */
>> +
>> +    ret = colo_put_cmd(s->to_dst_file, COLO_COMMAND_VMSTATE_SEND);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>> +    /* TODO: send vmstate to Secondary */
>> +
>> +    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
>> +                             COLO_COMMAND_VMSTATE_RECEIVED);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>> +    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
>> +                             COLO_COMMAND_VMSTATE_LOADED);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>> +    /* TODO: resume Primary */
>> +
>> +out:
>> +    return ret;
>> +}
>> +
>>   static void colo_process_checkpoint(MigrationState *s)
>>   {
>>       int ret = 0;
>> @@ -45,12 +141,28 @@ static void colo_process_checkpoint(MigrationState *s)
>>           goto out;
>>       }
>>
>> +    /*
>> +     * Wait for Secondary finish loading vm states and enter COLO
>> +     * restore.
>> +     */
>> +    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
>> +                             COLO_COMMAND_CHECKPOINT_READY);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>>       qemu_mutex_lock_iothread();
>>       vm_start();
>>       qemu_mutex_unlock_iothread();
>>       trace_colo_vm_state_change("stop", "run");
>>
>> -    /*TODO: COLO checkpoint savevm loop*/
>> +    while (s->state == MIGRATION_STATUS_COLO) {
>> +        /* start a colo checkpoint */
>> +        ret = colo_do_checkpoint_transaction(s);
>> +        if (ret < 0) {
>> +            goto out;
>> +        }
>> +    }
>>
>>   out:
>>       if (ret < 0) {
>> @@ -73,6 +185,31 @@ void migrate_start_colo_process(MigrationState *s)
>>       qemu_mutex_lock_iothread();
>>   }
>>
>> +/*
>> + * return:
>> + * 0: start a checkpoint
>> + * -1: some error happened, exit colo restore
>> + */
>> +static int colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request)
>> +{
>> +    int ret;
>> +    uint32_t cmd;
>> +
>> +    ret = colo_get_cmd(f, &cmd);
>> +    if (ret < 0) {
>> +        /* do failover ? */
>> +        return ret;
>> +    }
>> +
>> +    switch (cmd) {
>> +    case COLO_COMMAND_CHECKPOINT_REQUEST:
>> +        *checkpoint_request = 1;
>> +        return 0;
>> +    default:
>> +        return -EINVAL;
>> +    }
>> +}
>> +
>>   void *colo_process_incoming_thread(void *opaque)
>>   {
>>       MigrationIncomingState *mis = opaque;
>> @@ -93,7 +230,49 @@ void *colo_process_incoming_thread(void *opaque)
>>       */
>>       qemu_set_block(qemu_get_fd(mis->from_src_file));
>>
>> -    /* TODO: COLO checkpoint restore loop */
>> +
>> +    ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_CHECKPOINT_READY);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>> +    while (mis->state == MIGRATION_STATUS_COLO) {
>> +        int request = 0;
>> +        int ret = colo_wait_handle_cmd(mis->from_src_file, &request);
>> +
>> +        if (ret < 0) {
>> +            break;
>> +        } else {
>> +            if (!request) {
>> +                continue;
>> +            }
>> +        }
>> +        /* FIXME: This is unnecessary for periodic checkpoint mode */
>> +        ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_CHECKPOINT_REPLY);
>> +        if (ret < 0) {
>> +            goto out;
>> +        }
>> +
>> +        ret = colo_get_check_cmd(mis->from_src_file,
>> +                                 COLO_COMMAND_VMSTATE_SEND);
>> +        if (ret < 0) {
>> +            goto out;
>> +        }
>> +
>> +        /* TODO: read migration data into colo buffer */
>> +
>> +        ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_VMSTATE_RECEIVED);
>> +        if (ret < 0) {
>> +            goto out;
>> +        }
>> +
>> +        /* TODO: load vm state */
>> +
>> +        ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_VMSTATE_LOADED);
>> +        if (ret < 0) {
>> +            goto out;
>> +        }
>> +    }
>>
>>   out:
>>       if (ret < 0) {
>> diff --git a/qapi-schema.json b/qapi-schema.json
>> index c9ff34e..85f7800 100644
>> --- a/qapi-schema.json
>> +++ b/qapi-schema.json
>> @@ -720,6 +720,31 @@
>>   { 'command': 'migrate-start-postcopy' }
>>
>>   ##
>> +# @COLOCommand
>> +#
>> +# The commands for COLO fault tolerance
>> +#
>> +# @checkpoint-ready: SVM is ready for checkpointing
>> +#
>> +# @checkpoint-request: PVM tells SVM to prepare for new checkpointing
>> +#
>> +# @checkpoint-reply: SVM gets PVM's checkpoint request
>> +#
>> +# @vmstate-send: VM's state will be sent by PVM.
>> +#
>> +# @vmstate-size: The total size of VMstate.
>> +#
>> +# @vmstate-received: VM's state has been received by SVM.
>> +#
>> +# @vmstate-loaded: VM's state has been loaded by SVM.
>> +#
>> +# Since: 2.6
>> +##
>> +{ 'enum': 'COLOCommand',
>> +  'data': [ 'checkpoint-ready', 'checkpoint-request', 'checkpoint-reply',
>> +            'vmstate-send', 'vmstate-size','vmstate-received',
>> +            'vmstate-loaded' ] }
>> +
>>   # @MouseInfo:
>>   #
>>   # Information about a mouse device.
>> diff --git a/trace-events b/trace-events
>> index 5565e79..39fdd8d 100644
>> --- a/trace-events
>> +++ b/trace-events
>> @@ -1579,6 +1579,8 @@ postcopy_ram_incoming_cleanup_join(void) ""
>>
>>   # migration/colo.c
>>   colo_vm_state_change(const char *old, const char *new) "Change '%s' => '%s'"
>> +colo_put_cmd(const char *msg) "Send '%s' cmd"
>> +colo_get_cmd(const char *msg) "Receive '%s' cmd"
>>
>>   # kvm-all.c
>>   kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
>> --
>> 1.8.3.1
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>
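
The message order described in the commit message of this patch can be sketched as a small validity check. This is only an illustration under stated assumptions: the sketch_ names are hypothetical, 'vmstate-size' is left out because this patch does not use it, and the check looks at the order of messages on the wire without modelling which side sends each one.

```c
#include <stddef.h>

/* Message names mirror the COLOCommand enum in the quoted patch. */
enum sketch_msg {
    SKETCH_CHECKPOINT_READY,
    SKETCH_CHECKPOINT_REQUEST,
    SKETCH_CHECKPOINT_REPLY,
    SKETCH_VMSTATE_SEND,
    SKETCH_VMSTATE_RECEIVED,
    SKETCH_VMSTATE_LOADED
};

/*
 * Returns 1 if msgs[] is one 'checkpoint-ready' followed by zero or more
 * complete checkpoint rounds, and 0 otherwise.
 */
static int sketch_sequence_is_valid(const enum sketch_msg *msgs, size_t n)
{
    static const enum sketch_msg round[] = {
        SKETCH_CHECKPOINT_REQUEST, SKETCH_CHECKPOINT_REPLY,
        SKETCH_VMSTATE_SEND, SKETCH_VMSTATE_RECEIVED, SKETCH_VMSTATE_LOADED
    };
    const size_t round_len = sizeof(round) / sizeof(round[0]);
    size_t i;

    if (n == 0 || msgs[0] != SKETCH_CHECKPOINT_READY) {
        return 0;
    }
    if ((n - 1) % round_len != 0) {
        return 0; /* the last round was cut short */
    }
    for (i = 1; i < n; i++) {
        if (msgs[i] != round[(i - 1) % round_len]) {
            return 0;
        }
    }
    return 1;
}
```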


* Re: [Qemu-devel] [PATCH COLO-Frame v12 23/38] COLO: Implement failover work for Primary VM
  2015-12-18 15:35   ` Dr. David Alan Gilbert
@ 2015-12-28  7:39     ` Hailiang Zhang
  0 siblings, 0 replies; 94+ messages in thread
From: Hailiang Zhang @ 2015-12-28  7:39 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

On 2015/12/18 23:35, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> For PVM, if there is a failover request from users, the colo thread
>> will exit the loop while the failover BH does the cleanup work and
>> resumes the VM.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> ---
>> v12:
>> - Fix error report and remove unnecessary check in primary_vm_do_failover()
>>   (Dave's suggestion)
>> v11:
>> - Don't call migration_end() in primary_vm_do_failover(),
>>   The cleanup work will be done in migration_thread().
>> - Remove vm_start() in primary_vm_do_failover(), which is also done
>>    in migration_thread()
>> v10:
>> - Call migration_end() in primary_vm_do_failover()
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> ---
>>   include/migration/colo.h     |  3 +++
>>   include/migration/failover.h |  1 +
>>   migration/colo-failover.c    |  7 +++++-
>>   migration/colo.c             | 54 ++++++++++++++++++++++++++++++++++++++++++--
>>   4 files changed, 62 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/migration/colo.h b/include/migration/colo.h
>> index ba27719..0b02e95 100644
>> --- a/include/migration/colo.h
>> +++ b/include/migration/colo.h
>> @@ -32,4 +32,7 @@ void *colo_process_incoming_thread(void *opaque);
>>   bool migration_incoming_in_colo_state(void);
>>
>>   COLOMode get_colo_mode(void);
>> +
>> +/* failover */
>> +void colo_do_failover(MigrationState *s);
>>   #endif
>> diff --git a/include/migration/failover.h b/include/migration/failover.h
>> index 882c625..fba3931 100644
>> --- a/include/migration/failover.h
>> +++ b/include/migration/failover.h
>> @@ -26,5 +26,6 @@ void failover_init_state(void);
>>   int failover_set_state(int old_state, int new_state);
>>   int failover_get_state(void);
>>   void failover_request_active(Error **errp);
>> +bool failover_request_is_active(void);
>>
>>   #endif
>> diff --git a/migration/colo-failover.c b/migration/colo-failover.c
>> index 1b1be24..0c525da 100644
>> --- a/migration/colo-failover.c
>> +++ b/migration/colo-failover.c
>> @@ -32,7 +32,7 @@ static void colo_failover_bh(void *opaque)
>>           error_report("Unkown error for failover, old_state=%d", old_state);
>>           return;
>>       }
>> -    /*TODO: Do failover work */
>> +    colo_do_failover(NULL);
>>   }
>>
>>   void failover_request_active(Error **errp)
>> @@ -67,6 +67,11 @@ int failover_get_state(void)
>>       return atomic_read(&failover_state);
>>   }
>>
>> +bool failover_request_is_active(void)
>> +{
>> +    return ((failover_get_state() != FAILOVER_STATUS_NONE));
>
> You can remove the two sets of brackets.

OK :)

> But other than that:
>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>

Thanks.
Hailiang

>
>> +}
>> +
>>   void qmp_x_colo_lost_heartbeat(Error **errp)
>>   {
>>       if (get_colo_mode() == COLO_MODE_UNKNOWN) {
>> diff --git a/migration/colo.c b/migration/colo.c
>> index 176384e..977c8d8 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -41,6 +41,40 @@ bool migration_incoming_in_colo_state(void)
>>       return mis && (mis->state == MIGRATION_STATUS_COLO);
>>   }
>>
>> +static bool colo_runstate_is_stopped(void)
>> +{
>> +    return runstate_check(RUN_STATE_COLO) || !runstate_is_running();
>> +}
>> +
>> +static void primary_vm_do_failover(void)
>> +{
>> +    MigrationState *s = migrate_get_current();
>> +    int old_state;
>> +
>> +    migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
>> +                      MIGRATION_STATUS_COMPLETED);
>> +
>> +    old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
>> +                                   FAILOVER_STATUS_COMPLETED);
>> +    if (old_state != FAILOVER_STATUS_HANDLING) {
>> +        error_report("Incorrect state (%d) while doing failover for Primary VM",
>> +                     old_state);
>> +        return;
>> +    }
>> +}
>> +
>> +void colo_do_failover(MigrationState *s)
>> +{
>> +    /* Make sure vm stopped while failover */
>> +    if (!colo_runstate_is_stopped()) {
>> +        vm_stop_force_state(RUN_STATE_COLO);
>> +    }
>> +
>> +    if (get_colo_mode() == COLO_MODE_PRIMARY) {
>> +        primary_vm_do_failover();
>> +    }
>> +}
>> +
>>   static int colo_put_cmd(QEMUFile *f, uint32_t cmd)
>>   {
>>       int ret;
>> @@ -150,9 +184,22 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
>>       }
>>
>>       qemu_mutex_lock_iothread();
>> +    if (failover_request_is_active()) {
>> +        qemu_mutex_unlock_iothread();
>> +        ret = -1;
>> +        goto out;
>> +    }
>>       vm_stop_force_state(RUN_STATE_COLO);
>>       qemu_mutex_unlock_iothread();
>>       trace_colo_vm_state_change("run", "stop");
>> +    /*
>> +     * failover request bh could be called after
>> +     * vm_stop_force_state so we check failover_request_is_active() again.
>> +     */
>> +    if (failover_request_is_active()) {
>> +        ret = -1;
>> +        goto out;
>> +    }
>>
>>       /* Disable block migration */
>>       s->params.blk = 0;
>> @@ -248,6 +295,11 @@ static void colo_process_checkpoint(MigrationState *s)
>>       trace_colo_vm_state_change("stop", "run");
>>
>>       while (s->state == MIGRATION_STATUS_COLO) {
>> +        if (failover_request_is_active()) {
>> +            error_report("failover request");
>> +            goto out;
>> +        }
>> +
>>           current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>>           if (current_time - checkpoint_time <
>>               s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY]) {
>> @@ -269,8 +321,6 @@ out:
>>       if (ret < 0) {
>>           error_report("%s: %s", __func__, strerror(-ret));
>>       }
>> -    migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
>> -                      MIGRATION_STATUS_COMPLETED);
>>
>>       qsb_free(buffer);
>>       buffer = NULL;
>> --
>> 1.8.3.1
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>
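
The patch relies on failover_set_state() returning the previous state so that the caller can tell whether its transition took effect. A minimal sketch of that compare-and-swap pattern follows, written with C11 atomics rather than QEMU's own atomic helpers. All sketch_ names are hypothetical, and the SKETCH_FAILOVER_REQUESTED state is an assumption, since the body of failover_set_state() and the full state list are not shown in this message.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum sketch_failover_status {
    SKETCH_FAILOVER_NONE,
    SKETCH_FAILOVER_REQUESTED,   /* assumed intermediate state */
    SKETCH_FAILOVER_HANDLING,
    SKETCH_FAILOVER_COMPLETED
};

static atomic_int sketch_failover_state = SKETCH_FAILOVER_NONE;

/*
 * Switch to new_state only if the current state is old_state.
 * Returns the state seen before the attempt, so the switch happened
 * if and only if the return value equals old_state.
 */
static int sketch_failover_set_state(int old_state, int new_state)
{
    int expected = old_state;

    /* On failure, 'expected' is overwritten with the actual current state. */
    atomic_compare_exchange_strong(&sketch_failover_state, &expected,
                                   new_state);
    return expected;
}

static bool sketch_failover_request_is_active(void)
{
    return atomic_load(&sketch_failover_state) != SKETCH_FAILOVER_NONE;
}
```

A caller that gets back something other than old_state knows another path changed the state first, which is the case primary_vm_do_failover() reports as an error.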


* Re: [Qemu-devel] [Qemu-block] [PATCH COLO-Frame v12 25/38] qmp event: Add event notification for COLO error
  2015-12-23  1:24     ` [Qemu-devel] " Wen Congyang
@ 2016-01-05 19:21       ` John Snow
  0 siblings, 0 replies; 94+ messages in thread
From: John Snow @ 2016-01-05 19:21 UTC (permalink / raw)
  To: Wen Congyang, Markus Armbruster, zhanghailiang
  Cc: qemu-block, lizhijian, quintela, yunhong.jiang, eddie.dong,
	qemu-devel, peter.huangpeng, arei.gonglei, stefanha, amit.shah,
	Michael Roth, dgilbert, hongyang.yang



On 12/22/2015 08:24 PM, Wen Congyang wrote:
> On 12/19/2015 06:02 PM, Markus Armbruster wrote:
>> Copying qemu-block because this seems related to generalising block jobs
>> to background jobs.
>>
>> zhanghailiang <zhang.zhanghailiang@huawei.com> writes:
>>
>>> If some errors happen during VM's COLO FT stage, it's important to notify the users
>>> of this event. Together with 'colo_lost_heartbeat', users can intervene in COLO's
>>> failover work immediately.
>>> If users don't want to get involved in COLO's failover verdict,
>>> it is still necessary to notify users that we exited COLO mode.
>>>
>>> Cc: Markus Armbruster <armbru@redhat.com>
>>> Cc: Michael Roth <mdroth@linux.vnet.ibm.com>
>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>>> ---
>>> v11:
>>> - Fix several typos found by Eric
>>>
>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>> ---
>>>  docs/qmp-events.txt | 17 +++++++++++++++++
>>>  migration/colo.c    | 11 +++++++++++
>>>  qapi-schema.json    | 16 ++++++++++++++++
>>>  qapi/event.json     | 17 +++++++++++++++++
>>>  4 files changed, 61 insertions(+)
>>>
>>> diff --git a/docs/qmp-events.txt b/docs/qmp-events.txt
>>> index d2f1ce4..19f68fc 100644
>>> --- a/docs/qmp-events.txt
>>> +++ b/docs/qmp-events.txt
>>> @@ -184,6 +184,23 @@ Example:
>>>  Note: The "ready to complete" status is always reset by a BLOCK_JOB_ERROR
>>>  event.
>>>  
>>> +COLO_EXIT
>>> +---------
>>> +
>>> +Emitted when VM finishes COLO mode due to some errors happening or
>>> +at the request of users.
>>
>> How would the event's recipient distinguish between "due to error" and
>> "at the user's request"?
>>
>>> +
>>> +Data:
>>> +
>>> + - "mode": COLO mode, primary or secondary side (json-string)
>>> + - "reason":  the exit reason, internal error or external request. (json-string)
>>> + - "error": error message (json-string, optional)
>>> +
>>> +Example:
>>> +
>>> +{"timestamp": {"seconds": 2032141960, "microseconds": 417172},
>>> + "event": "COLO_EXIT", "data": {"mode": "primary", "reason": "request" } }
>>> +
>>
>> Pardon my ignorance again...  Does "VM finishes COLO mode" mean we have
>> some kind of COLO background job, and it just finished for whatever
>> reason?
>>
>> If yes, this COLO job could be an instance of the general background job
>> concept we're trying to grow from the existing block job concept.
>>
>> I'm not asking you to rebase your work onto the background job
>> infrastructure, not least for the simple reason that it doesn't exist,
>> yet.  But I think it would be fruitful to compare your COLO job
>> management QMP interface with the one we have for block jobs.  Not only
>> may that avoid unnecessary inconsistency, it could also help shape the
>> general background job interface.
> 
> COLO is not a block job. If live migration is a background job, COLO
> is also a background job.
> 

Right. We are contemplating expanding the "block job" subsystem to be a
generic "background job" system. Live Migration might be one target to
be converted into this Jobs API, COLO might also be a fit.

The framework doesn't exist yet, though.

>>
>> Quick overview of the block job QMP interface:
>>
>> * Commands to create a job: block-commit, block-stream, drive-mirror,
>>   drive-backup.
>>
>> * Get information on jobs: query-block-jobs
>>
>> * Pause a job: block-job-pause
>>
>> * Resume a job: block-job-resume
>>
>> * Cancel a job: block-job-cancel
>>
>> * Block job completion events: BLOCK_JOB_COMPLETED, BLOCK_JOB_CANCELLED
>>
>> * Block job error event: BLOCK_JOB_ERROR
>>
>> * Block job synchronous completion: event BLOCK_JOB_READY and command
>>   block-job-complete
> 
> What is background job infrastructure? Do you mean implement all the above
> interfaces for each background job?
> 
> Thanks
> Wen Congyang
> 

Markus is laying out how Block Jobs currently work for some background
on how the job system exists today. He's highlighting the commands to
create, query, pause, resume, and cancel jobs; as well as demonstrating
the QMP events that the Block Job system uses to indicate completion,
cancellation, error and convergence.

We're thinking of making a generic background job system that would
replace the blockjobs API with a new generic Jobs API that looks very
similar.

Something like this:

Commands:
query: query-jobs
pause: job-pause
resume: job-resume
cancel: job-cancel
complete: job-complete (finalizes a long running command that has converged)

Events:
completion: JOB_COMPLETED, JOB_CANCELLED
error: JOB_ERROR
convergence indicator: JOB_READY

The system doesn't exist yet, but your proposed events that indicate
success/failure etc for COLO caught Markus' attention as perhaps quite
neatly fitting into the above proposed system.

--js
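
To make the COLO_EXIT payload in the quoted patch concrete, here is a minimal sketch of building its "data" object, with the optional "error" member present only when a message is supplied. The sketch_ names are hypothetical, the value strings mirror the COLOMode and COLOExitReason enums from the patch, and no JSON escaping is done, so this is an illustration and not how QEMU's generated qapi_event_send_colo_exit() works.

```c
#include <stdio.h>

enum sketch_mode { SKETCH_MODE_UNKNOWN, SKETCH_MODE_PRIMARY, SKETCH_MODE_SECONDARY };
enum sketch_reason { SKETCH_REASON_UNKNOWN, SKETCH_REASON_REQUEST, SKETCH_REASON_ERROR };

static const char *const sketch_mode_name[] = { "unknown", "primary", "secondary" };
static const char *const sketch_reason_name[] = { "unknown", "request", "error" };

/*
 * Write the event data into buf. 'error' may be NULL, in which case the
 * member is omitted. Callers must pass valid enum values. Returns what
 * snprintf() returns.
 */
static int sketch_format_colo_exit(char *buf, size_t len,
                                   enum sketch_mode mode,
                                   enum sketch_reason reason,
                                   const char *error)
{
    if (error) {
        return snprintf(buf, len,
                        "{\"mode\": \"%s\", \"reason\": \"%s\", \"error\": \"%s\"}",
                        sketch_mode_name[mode], sketch_reason_name[reason],
                        error);
    }
    return snprintf(buf, len, "{\"mode\": \"%s\", \"reason\": \"%s\"}",
                    sketch_mode_name[mode], sketch_reason_name[reason]);
}
```

A recipient can then tell "due to error" from "at the user's request" by reading "reason", and should expect "error" only in the former case.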

>>
>>>  DEVICE_DELETED
>>>  --------------
>>>  
>>> diff --git a/migration/colo.c b/migration/colo.c
>>> index d1dd4e1..d06c14f 100644
>>> --- a/migration/colo.c
>>> +++ b/migration/colo.c
>>> @@ -18,6 +18,7 @@
>>>  #include "qemu/error-report.h"
>>>  #include "qemu/sockets.h"
>>>  #include "migration/failover.h"
>>> +#include "qapi-event.h"
>>>  
>>>  /* colo buffer */
>>>  #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
>>> @@ -349,6 +350,11 @@ static void colo_process_checkpoint(MigrationState *s)
>>>  out:
>>>      if (ret < 0) {
>>>          error_report("%s: %s", __func__, strerror(-ret));
>>> +        qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_ERROR,
>>> +                                  true, strerror(-ret), NULL);
>>> +    } else {
>>> +        qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_REQUEST,
>>> +                                  false, NULL, NULL);
>>>      }
>>>  
>>>      qsb_free(buffer);
>>> @@ -516,6 +522,11 @@ out:
>>>      if (ret < 0) {
>>>          error_report("colo incoming thread will exit, detect error: %s",
>>>                       strerror(-ret));
>>> +        qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_ERROR,
>>> +                                  true, strerror(-ret), NULL);
>>> +    } else {
>>> +        qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_REQUEST,
>>> +                                  false, NULL, NULL);
>>>      }
>>>  
>>>      if (fb) {
>>> diff --git a/qapi-schema.json b/qapi-schema.json
>>> index feb7d53..f6ecb88 100644
>>> --- a/qapi-schema.json
>>> +++ b/qapi-schema.json
>>> @@ -778,6 +778,22 @@
>>>    'data': [ 'unknown', 'primary', 'secondary'] }
>>>  
>>>  ##
>>> +# @COLOExitReason
>>> +#
>>> +# The reason for a COLO exit
>>> +#
>>> +# @unknown: unknown reason
>>
>> How can @unknown happen?
>>
>>> +#
>>> +# @request: COLO exit is due to an external request
>>> +#
>>> +# @error: COLO exit is due to an internal error
>>> +#
>>> +# Since: 2.6
>>> +##
>>> +{ 'enum': 'COLOExitReason',
>>> +  'data': [ 'unknown', 'request', 'error'] }
>>> +
>>> +##
>>>  # @x-colo-lost-heartbeat
>>>  #
>>>  # Tell qemu that heartbeat is lost, request it to do takeover procedures.
>>> diff --git a/qapi/event.json b/qapi/event.json
>>> index f0cef01..f63d456 100644
>>> --- a/qapi/event.json
>>> +++ b/qapi/event.json
>>> @@ -255,6 +255,23 @@
>>>    'data': {'status': 'MigrationStatus'}}
>>>  
>>>  ##
>>> +# @COLO_EXIT
>>> +#
>>> +# Emitted when VM finishes COLO mode due to some errors happening or
>>> +# at the request of users.
>>> +#
>>> +# @mode: which COLO mode the VM was in when it exited.
>>
>> Can we get 'unknown' here?
>>
>>> +#
>>> +# @reason: describes the reason for the COLO exit.
>>
>> Can we get 'unknown' here?
>>
>>> +#
>>> +# @error: #optional, error message. Only present on error happening.
>>> +#
>>> +# Since: 2.6
>>> +##
>>> +{ 'event': 'COLO_EXIT',
>>> +  'data': {'mode': 'COLOMode', 'reason': 'COLOExitReason', '*error': 'str' } }
>>> +
>>> +##
>>>  # @ACPI_DEVICE_OST
>>>  #
>>>  # Emitted when guest executes ACPI _OST method.
>>
>>
>>
>> .
>>
> 
> 
> 
> 


* Re: [Qemu-devel] [PATCH COLO-Frame v12 10/38] COLO: Implement colo checkpoint protocol
  2015-12-22  7:00     ` Hailiang Zhang
@ 2016-01-11 12:47       ` Markus Armbruster
  2016-01-12 12:57         ` Hailiang Zhang
  0 siblings, 1 reply; 94+ messages in thread
From: Markus Armbruster @ 2016-01-11 12:47 UTC (permalink / raw)
  To: Hailiang Zhang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, dgilbert,
	hongyang.yang

Hailiang Zhang <zhang.zhanghailiang@huawei.com> writes:

> Hi Markus,
>
> On 2015/12/19 16:54, Markus Armbruster wrote:
>> Jumping in at v12 for a bit of QAPI review (and whatever else caught my
>> eye nearby), please pardon my ignorance of COLO in general, and previous
>> review of this series in particular.
>>
>
> Thanks all the same :)
[...]
>>> diff --git a/migration/colo.c b/migration/colo.c
>>> index 0ab9618..0ce2a6e 100644
>>> --- a/migration/colo.c
>>> +++ b/migration/colo.c
>>> @@ -10,10 +10,12 @@
>>>    * later.  See the COPYING file in the top-level directory.
>>>    */
>>>
>>> +#include <unistd.h>
>>>   #include "sysemu/sysemu.h"
>>>   #include "migration/colo.h"
>>>   #include "trace.h"
>>>   #include "qemu/error-report.h"
>>> +#include "qemu/sockets.h"
>>>
>>>   bool colo_supported(void)
>>>   {
>>> @@ -34,6 +36,100 @@ bool migration_incoming_in_colo_state(void)
>>>       return mis && (mis->state == MIGRATION_STATUS_COLO);
>>>   }
>>>
>>> +static int colo_put_cmd(QEMUFile *f, uint32_t cmd)
>>> +{
>>> +    int ret;
>>> +
>>> +    if (cmd >= COLO_COMMAND_MAX) {
>>
>> Needs a trivial rebase due to commit 7fb1cf1.
>>
>
>>> +        error_report("%s: Invalid cmd", __func__);
>>> +        return -EINVAL;
>>
>> Can this run in a context with different error handling needs?
>>
>> Or asked differently: who may ultimately handle this error?  Whoever
>> that may be, how does it need to report errors?
>>
>> Peeking ahead: the immediate callers don't handle this error, they just
>> pass it on their callers.
>>
>> I'm asking because I'm trying to understand whether error_report() is
>> appropriate here, or whether you need to use error_setg(), and leave the
>> actual reporting to the spot that ultimately handles this error.
>>
>
> Hmm, I know what you mean: we handle them all together after exiting
> from the colo process loop.
> Using error_setg() seems to be a good idea. With this modification, we
> can also drop the return value. I will fix it in the next version.
>
>
>>> +    }
>>> +    qemu_put_be32(f, cmd);
>>> +    qemu_fflush(f);
>>> +
>>> +    ret = qemu_file_get_error(f);
>>> +    trace_colo_put_cmd(COLOCommand_lookup[cmd]);
>>> +
>>> +    return ret;
>>> +}
>>
>> Looks like @cmd is a COLOCommand.  Why is the parameter type uint32_t?
>>
>
> OK, I will change it to use enum COLOCommand.
>
>>> +
>>> +static int colo_get_cmd(QEMUFile *f, uint32_t *cmd)
>>> +{
>>> +    int ret;
>>> +
>>> +    *cmd = qemu_get_be32(f);
>>> +    ret = qemu_file_get_error(f);
>>> +    if (ret < 0) {
>>> +        return ret;
>>> +    }
>>> +    if (*cmd >= COLO_COMMAND_MAX) {
>>> +        error_report("%s: Invalid cmd", __func__);
>>> +        return -EINVAL;
>>> +    }
>>> +    trace_colo_get_cmd(COLOCommand_lookup[*cmd]);
>>> +    return 0;
>>> +}
>>
>> Same question.
>>
>> The "get" in the name suggests the function returns the value gotten,
>> like similarly named function elsewhere in migration/ do.
>>
> Do you mean it should return the cmd value directly, rather than through a parameter?
> After we convert it to use error_setg() to indicate success or failure, we
> can do that.
> I will fix it.

Sounds good to me.
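
To make the agreed direction concrete, here is a minimal, self-contained
sketch of the pattern: the receive function returns the message itself,
stores any error through an out-parameter, and leaves the reporting to
whoever ultimately handles the failure.  It deliberately does not use
QEMU's real Error API or QEMUFile; SketchError, SketchMessage and all
sketch_* functions are hypothetical names made up for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an error object; this is not QEMU's Error API. */
typedef struct SketchError {
    char msg[64];
} SketchError;

/* Hypothetical subset of the message enum discussed in this thread. */
typedef enum SketchMessage {
    SKETCH_MESSAGE_CHECKPOINT_READY,
    SKETCH_MESSAGE_CHECKPOINT_REQUEST,
    SKETCH_MESSAGE_CHECKPOINT_REPLY,
    SKETCH_MESSAGE__MAX,
} SketchMessage;

/* Store an error for the caller instead of printing it right here. */
void sketch_error_set(SketchError **errp, const char *text)
{
    SketchError *err;

    if (!errp) {
        return;                 /* caller is not interested in details */
    }
    assert(*errp == NULL);      /* never overwrite an earlier error */
    err = calloc(1, sizeof(*err));
    if (!err) {
        abort();
    }
    snprintf(err->msg, sizeof(err->msg), "%s", text);
    *errp = err;
}

/*
 * Decode a value read from the wire.  The message is the return value;
 * failure is signalled through errp, so this function prints nothing.
 * On failure, SKETCH_MESSAGE__MAX is returned.
 */
SketchMessage sketch_receive_message(uint32_t wire_value, SketchError **errp)
{
    if (wire_value >= SKETCH_MESSAGE__MAX) {
        sketch_error_set(errp, "invalid message value");
        return SKETCH_MESSAGE__MAX;
    }
    return (SketchMessage)wire_value;
}

/* The spot that ultimately handles the error decides how to report it. */
int sketch_handle_one(uint32_t wire_value)
{
    SketchError *err = NULL;
    SketchMessage msg = sketch_receive_message(wire_value, &err);

    if (err) {
        fprintf(stderr, "checkpoint loop: %s\n", err->msg);
        free(err);
        return -1;
    }
    (void)msg;                  /* a real loop would act on the message */
    return 0;
}
```

The point of the split is that sketch_receive_message() can be reused in
contexts with different reporting needs, because only the outermost caller
chooses between printing, forwarding, or ignoring the error.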

[...]
>>> diff --git a/qapi-schema.json b/qapi-schema.json
>>> index c9ff34e..85f7800 100644
>>> --- a/qapi-schema.json
>>> +++ b/qapi-schema.json
>>> @@ -720,6 +720,31 @@
>>>   { 'command': 'migrate-start-postcopy' }
>>>
>>>   ##
>>> +# @COLOCommand
>>> +#
>>> +# The commands for COLO fault tolerance
>>> +#
>>> +# @checkpoint-ready: SVM is ready for checkpointing
>>> +#
>>> +# @checkpoint-request: PVM tells SVM to prepare for new checkpointing
>>> +#
>>> +# @checkpoint-reply: SVM gets PVM's checkpoint request
>>> +#
>>> +# @vmstate-send: VM's state will be sent by PVM.
>>> +#
>>> +# @vmstate-size: The total size of VMstate.
>>> +#
>>> +# @vmstate-received: VM's state has been received by SVM.
>>> +#
>>> +# @vmstate-loaded: VM's state has been loaded by SVM.
>>> +#
>>> +# Since: 2.6
>>> +##
>>> +{ 'enum': 'COLOCommand',
>>> +  'data': [ 'checkpoint-ready', 'checkpoint-request', 'checkpoint-reply',
>>> +            'vmstate-send', 'vmstate-size','vmstate-received',
>>> +            'vmstate-loaded' ] }
>>> +
>>
>> Space after 'vmstate-size', please.
>>
>
>> 'vmstate-size' is not used in this patch.  You may want to add it with
>> its first use instead.
>>
>
> OK, I will move it to the corresponding patch.
>
>> Should this enum really be named "COLOCommand"?  'checkpoint-ready',
>> 'checkpoint-request', 'vmstate-send' look like commands to me, but the
>> others look like replies.
>>
>
> Yes, COLOCommand is not quite accurate. What about naming it COLOProtocol?

A protocol specifies valid sequences of messages, and what they mean.
This isn't a protocol, it's a message within a protocol.  COLOMessage?

>>
>>>   # @MouseInfo:
>>>   #
>>>   # Information about a mouse device.
>>> diff --git a/trace-events b/trace-events
>>> index 5565e79..39fdd8d 100644
>>> --- a/trace-events
>>> +++ b/trace-events
>>> @@ -1579,6 +1579,8 @@ postcopy_ram_incoming_cleanup_join(void) ""
>>>
>>>   # migration/colo.c
>>>   colo_vm_state_change(const char *old, const char *new) "Change '%s' => '%s'"
>>> +colo_put_cmd(const char *msg) "Send '%s' cmd"
>>> +colo_get_cmd(const char *msg) "Receive '%s' cmd"
>>>
>>>   # kvm-all.c
>>>   kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
>>
>> I like how this commit creates just the two state machines, and leaves
>> filling in their actions to later commits.  Helps ignorant reviewers
>> like me :)
>>
>>
>
> Do you mean I should split this patch? That is, leave this patch with
> the simplest COLO process, maybe just 'ready, request, reply', and add
> the other states in a later patch?

No, I *like* how you split up the work.


* Re: [Qemu-devel] [PATCH COLO-Frame v12 11/38] COLO: Add a new RunState RUN_STATE_COLO
  2015-12-22 13:32     ` Hailiang Zhang
@ 2016-01-11 13:16       ` Markus Armbruster
  2016-01-12 12:54         ` Hailiang Zhang
  0 siblings, 1 reply; 94+ messages in thread
From: Markus Armbruster @ 2016-01-11 13:16 UTC (permalink / raw)
  To: Hailiang Zhang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, qemu-devel,
	peter.huangpeng, arei.gonglei, stefanha, amit.shah, dgilbert,
	hongyang.yang

Hailiang Zhang <zhang.zhanghailiang@huawei.com> writes:

> On 2015/12/19 17:27, Markus Armbruster wrote:
>> zhanghailiang <zhang.zhanghailiang@huawei.com> writes:
>>
>>> Guest will enter this state when paused to save/restore VM state
>>> under colo checkpoint.
>>>
>>> Cc: Eric Blake <eblake@redhat.com>
>>> Cc: Markus Armbruster <armbru@redhat.com>
>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>>> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>> Reviewed-by: Eric Blake <eblake@redhat.com>
>>> ---
>>>   qapi-schema.json | 5 ++++-
>>>   vl.c             | 8 ++++++++
>>>   2 files changed, 12 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/qapi-schema.json b/qapi-schema.json
>>> index 85f7800..0423b47 100644
>>> --- a/qapi-schema.json
>>> +++ b/qapi-schema.json
>>> @@ -154,12 +154,15 @@
>>>   # @watchdog: the watchdog action is configured to pause and has been triggered
>>>   #
>>>   # @guest-panicked: guest has been panicked as a result of guest OS panic
>>> +#
>>> +# @colo: guest is paused to save/restore VM state under colo checkpoint (since
>>> +# 2.6)
>>>   ##
>>>   { 'enum': 'RunState',
>>>     'data': [ 'debug', 'inmigrate', 'internal-error', 'io-error', 'paused',
>>>               'postmigrate', 'prelaunch', 'finish-migrate', 'restore-vm',
>>>               'running', 'save-vm', 'shutdown', 'suspended', 'watchdog',
>>> -            'guest-panicked' ] }
>>> +            'guest-panicked', 'colo' ] }
>>>
>>>   ##
>>>   # @StatusInfo:
>>> diff --git a/vl.c b/vl.c
>>> index f84fde8..fca630b 100644
>>> --- a/vl.c
>>> +++ b/vl.c
>>> @@ -594,6 +594,7 @@ static const RunStateTransition runstate_transitions_def[] = {
>>>       { RUN_STATE_INMIGRATE, RUN_STATE_WATCHDOG },
>>>       { RUN_STATE_INMIGRATE, RUN_STATE_GUEST_PANICKED },
>>>       { RUN_STATE_INMIGRATE, RUN_STATE_FINISH_MIGRATE },
>>> +    { RUN_STATE_INMIGRATE, RUN_STATE_COLO },
>>>
>>>       { RUN_STATE_INTERNAL_ERROR, RUN_STATE_PAUSED },
>>>       { RUN_STATE_INTERNAL_ERROR, RUN_STATE_FINISH_MIGRATE },
>>> @@ -603,6 +604,7 @@ static const RunStateTransition runstate_transitions_def[] = {
>>>
>>>       { RUN_STATE_PAUSED, RUN_STATE_RUNNING },
>>>       { RUN_STATE_PAUSED, RUN_STATE_FINISH_MIGRATE },
>>> +    { RUN_STATE_PAUSED, RUN_STATE_COLO},
>>>
>>>       { RUN_STATE_POSTMIGRATE, RUN_STATE_RUNNING },
>>>       { RUN_STATE_POSTMIGRATE, RUN_STATE_FINISH_MIGRATE },
>>> @@ -613,9 +615,12 @@ static const RunStateTransition runstate_transitions_def[] = {
>>>
>>>       { RUN_STATE_FINISH_MIGRATE, RUN_STATE_RUNNING },
>>>       { RUN_STATE_FINISH_MIGRATE, RUN_STATE_POSTMIGRATE },
>>> +    { RUN_STATE_FINISH_MIGRATE, RUN_STATE_COLO},
>>>
>>>       { RUN_STATE_RESTORE_VM, RUN_STATE_RUNNING },
>>>
>>> +    { RUN_STATE_COLO, RUN_STATE_RUNNING },
>>> +
>>>       { RUN_STATE_RUNNING, RUN_STATE_DEBUG },
>>>       { RUN_STATE_RUNNING, RUN_STATE_INTERNAL_ERROR },
>>>       { RUN_STATE_RUNNING, RUN_STATE_IO_ERROR },
>>> @@ -626,6 +631,7 @@ static const RunStateTransition runstate_transitions_def[] = {
>>>       { RUN_STATE_RUNNING, RUN_STATE_SHUTDOWN },
>>>       { RUN_STATE_RUNNING, RUN_STATE_WATCHDOG },
>>>       { RUN_STATE_RUNNING, RUN_STATE_GUEST_PANICKED },
>>> +    { RUN_STATE_RUNNING, RUN_STATE_COLO},
>>>
>>>       { RUN_STATE_SAVE_VM, RUN_STATE_RUNNING },
>>>
>>> @@ -636,9 +642,11 @@ static const RunStateTransition runstate_transitions_def[] = {
>>>       { RUN_STATE_RUNNING, RUN_STATE_SUSPENDED },
>>>       { RUN_STATE_SUSPENDED, RUN_STATE_RUNNING },
>>>       { RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE },
>>> +    { RUN_STATE_SUSPENDED, RUN_STATE_COLO},
>>>
>>>       { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
>>>       { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
>>> +    { RUN_STATE_WATCHDOG, RUN_STATE_COLO},
>>>
>>>       { RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING },
>>>       { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE },
>>
>> Pardon my ignorance, but could you explain the new run state in a bit
>> more detail for me?
>>
>
> OK. Normally, we only need to switch between the COLO and RUNNING states.
> But we can't forbid users from issuing other commands while the VM is in
> the COLO state.
>
> At every checkpoint, we have to pause the VM to send its state to the SVM.
> Before we pause the VM, users may issue the 'stop' command, which changes
> the state to 'RUN_STATE_PAUSED'. We don't want to abort the VM because of
> this command. (Actually, we will support 'stop' while the VM is in the
> COLO state.) So we need the state transition
> 'RUN_STATE_PAUSED -> RUN_STATE_COLO'.

What's the next state then?

> We enter the COLO state just after a full migration, whose last state
> will be 'RUN_STATE_FINISH_MIGRATE' or 'RUN_STATE_INMIGRATE'. Before we
> enter the COLO loop, we may get 'x-colo-lost-heartbeat' and pause in
> 'RUN_STATE_COLO', so we need the state transitions
> 'RUN_STATE_FINISH_MIGRATE -> RUN_STATE_COLO' and
> 'RUN_STATE_INMIGRATE -> RUN_STATE_COLO'.
> We need RUN_STATE_SUSPENDED -> RUN_STATE_COLO because the guest or
> users may issue a standby command, and we need to ensure the VM does
> not crash.
>
> Actually, we may need more states that can go to the 'colo' state;
> maybe we should just follow the cases of the 'MIGRATE' state.

I believe we should fully work out the state transitions added by COLO.
I like to write that down in this form:

    (state, trigger) -> (action, state')

Example:

    (running, checkpoint) -> (begin-checkpointing, colo)

with a suitable explanation of 'checkpoint' and 'begin-checkpointing'.
    
For brevity, multiple

    (state1, trigger) -> (action, state')
    (state2, trigger) -> (action, state')
    ...
    (stateN, trigger) -> (action, state')

can be abbreviated to

    ({state1, state2, stateN}, trigger) -> (action, state')

Example:

    ({running, paused, ...}, checkpoint) -> (begin-checkpointing, colo)

For clarity, chains of state transitions should be described in the
order they happen.

Pictures showing the states connected with transition arrows labelled
with the trigger can help.

Two properties to check:

1. Correctness: every state transition thus written down does the right
   thing.

2. Completeness: for every pair (state, trigger), we got a state
   transition, or an explanation why it cannot happen.
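
As a rough illustration of the notation above, the following C sketch
writes each (state, trigger) -> (action, state') rule as one table row and
counts the (state, trigger) pairs that have no rule, which is the
completeness property.  Everything here is hypothetical: the state,
trigger and action names are invented for the example, and the
'checkpoint-done' trigger with its 'resume' action is an assumption, not
something taken from the COLO patches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states, triggers and actions, for illustration only. */
typedef enum { ST_RUNNING, ST_PAUSED, ST_COLO, ST__MAX } SketchState;
typedef enum { TR_CHECKPOINT, TR_CHECKPOINT_DONE, TR__MAX } SketchTrigger;
typedef enum { ACT_BEGIN_CHECKPOINTING, ACT_RESUME } SketchAction;

/* One row encodes: (state, trigger) -> (action, next) */
typedef struct SketchTransition {
    SketchState state;
    SketchTrigger trigger;
    SketchAction action;
    SketchState next;
} SketchTransition;

static const SketchTransition sketch_table[] = {
    /* ({running, paused}, checkpoint) -> (begin-checkpointing, colo) */
    { ST_RUNNING, TR_CHECKPOINT,      ACT_BEGIN_CHECKPOINTING, ST_COLO },
    { ST_PAUSED,  TR_CHECKPOINT,      ACT_BEGIN_CHECKPOINTING, ST_COLO },
    /* assumed rule: (colo, checkpoint-done) -> (resume, running) */
    { ST_COLO,    TR_CHECKPOINT_DONE, ACT_RESUME,              ST_RUNNING },
};

/* Return the rule for (state, trigger), or NULL if none is written down. */
const SketchTransition *sketch_find(SketchState state, SketchTrigger trigger)
{
    size_t i;

    assert(state < ST__MAX && trigger < TR__MAX);
    for (i = 0; i < sizeof(sketch_table) / sizeof(sketch_table[0]); i++) {
        if (sketch_table[i].state == state &&
            sketch_table[i].trigger == trigger) {
            return &sketch_table[i];
        }
    }
    return NULL;
}

/* Completeness check: count the pairs that still need a rule or an excuse. */
size_t sketch_count_unhandled(void)
{
    size_t missing = 0;
    int s, t;

    for (s = 0; s < ST__MAX; s++) {
        for (t = 0; t < TR__MAX; t++) {
            if (!sketch_find((SketchState)s, (SketchTrigger)t)) {
                missing++;
            }
        }
    }
    return missing;
}
```

With this toy table, three pairs remain unhandled; each of them would need
either a row of its own or a written explanation of why it cannot occur.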

> Thanks,
> zhanghailiang
>
>> Your additions to runstate_transitions_def[] show we can go *from* state
>> 'colo' only to state 'running', but we can go *to* state 'colo' from
>> various other states.  This may well be sane, but it's not *obviously*
>> sane :)


* Re: [Qemu-devel] [PATCH COLO-Frame v12 25/38] qmp event: Add event notification for COLO error
  2015-12-23  3:10     ` [Qemu-devel] " Hailiang Zhang
@ 2016-01-11 13:24       ` Markus Armbruster
  0 siblings, 0 replies; 94+ messages in thread
From: Markus Armbruster @ 2016-01-11 13:24 UTC (permalink / raw)
  To: Hailiang Zhang
  Cc: qemu-block, lizhijian, quintela, qemu-devel, yunhong.jiang,
	eddie.dong, peter.huangpeng, Michael Roth, arei.gonglei,
	stefanha, amit.shah, dgilbert, hongyang.yang

Hailiang Zhang <zhang.zhanghailiang@huawei.com> writes:

> On 2015/12/19 18:02, Markus Armbruster wrote:
>> Copying qemu-block because this seems related to generalising block jobs
>> to background jobs.
>>
>
> Er, this event is just used to help users know what happened to a VM with
> COLO FT on. If users get this event, they can check further what went
> wrong and decide which side should take over the work.
>
>> zhanghailiang <zhang.zhanghailiang@huawei.com> writes:
>>
>>> If some errors happen during VM's COLO FT stage, it's important to
>>> notify the users of this event. Together with 'colo_lost_heartbeat',
>>> users can intervene in COLO's failover work immediately.
>>> If users don't want to get involved in COLO's failover verdict,
>>> it is still necessary to notify users that we exited COLO mode.
>>>
>>> Cc: Markus Armbruster <armbru@redhat.com>
>>> Cc: Michael Roth <mdroth@linux.vnet.ibm.com>
>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>>> ---
>>> v11:
>>> - Fix several typos found by Eric
>>>
>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>> ---
>>>   docs/qmp-events.txt | 17 +++++++++++++++++
>>>   migration/colo.c    | 11 +++++++++++
>>>   qapi-schema.json    | 16 ++++++++++++++++
>>>   qapi/event.json     | 17 +++++++++++++++++
>>>   4 files changed, 61 insertions(+)
>>>
>>> diff --git a/docs/qmp-events.txt b/docs/qmp-events.txt
>>> index d2f1ce4..19f68fc 100644
>>> --- a/docs/qmp-events.txt
>>> +++ b/docs/qmp-events.txt
>>> @@ -184,6 +184,23 @@ Example:
>>>   Note: The "ready to complete" status is always reset by a BLOCK_JOB_ERROR
>>>   event.
>>>
>>> +COLO_EXIT
>>> +---------
>>> +
>>> +Emitted when VM finishes COLO mode due to some errors happening or
>>> +at the request of users.
>>
>> How would the event's recipient distinguish between "due to error" and
>> "at the user's request"?
>>
>
> If they get this event with 'reason' set to 'request', it is 'at the
> user's request'. Otherwise it is 'due to error': 'reason' will be
> 'error', and an optional error message may help to figure out what
> happened.

For what it's worth, block jobs use separate events BLOCK_JOB_CANCELLED
and BLOCK_JOB_ERROR.

>>> +
>>> +Data:
>>> +
>>> + - "mode": COLO mode, primary or secondary side (json-string)
>>> + - "reason": the exit reason, internal error or external request. (json-string)
>>> + - "error": error message (json-string, optional)
>>> +
>>> +Example:
>>> +
>>> +{"timestamp": {"seconds": 2032141960, "microseconds": 417172},
>>> + "event": "COLO_EXIT", "data": {"mode": "primary", "reason": "request" } }
>>> +
>>
>> Pardon my ignorance again...  Does "VM finishes COLO mode" mean we have
>> some kind of COLO background job, and it just finished for whatever
>> reason?
>>
>
> As I said above.
>
>> If yes, this COLO job could be an instance of the general background job
>> concept we're trying to grow from the existing block job concept.
>>
>> I'm not asking you to rebase your work onto the background job
>> infrastructure, not least for the simple reason that it doesn't exist,
>> yet.  But I think it would be fruitful to compare your COLO job
>> management QMP interface with the one we have for block jobs.  Not only
>> may that avoid unnecessary inconsistency, it could also help shape the
>> general background job interface.
>>
>
> Interesting, I'm not very familiar with the block background job
> infrastructure. If we consider COLO FT a background job, we can
> certainly use it. I will have a look at it.

Thanks!  Let's avoid unnecessary differences between COLO and block job
interfaces.  Later on, we can hopefully make them both use a common
background job infrastructure, and the smaller their differences are,
the easier that'll be.

[...]


* Re: [Qemu-devel] [PATCH COLO-Frame v12 11/38] COLO: Add a new RunState RUN_STATE_COLO
  2016-01-11 13:16       ` Markus Armbruster
@ 2016-01-12 12:54         ` Hailiang Zhang
  0 siblings, 0 replies; 94+ messages in thread
From: Hailiang Zhang @ 2016-01-12 12:54 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, qemu-devel,
	peter.huangpeng, arei.gonglei, stefanha, amit.shah, dgilbert,
	hongyang.yang

On 2016/1/11 21:16, Markus Armbruster wrote:
> Hailiang Zhang <zhang.zhanghailiang@huawei.com> writes:
>
>> On 2015/12/19 17:27, Markus Armbruster wrote:
>>> zhanghailiang <zhang.zhanghailiang@huawei.com> writes:
>>>
>>>> Guest will enter this state when paused to save/restore VM state
>>>> under colo checkpoint.
>>>>
>>>> Cc: Eric Blake <eblake@redhat.com>
>>>> Cc: Markus Armbruster <armbru@redhat.com>
>>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>>>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>>>> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>>> Reviewed-by: Eric Blake <eblake@redhat.com>
>>>> ---
>>>>    qapi-schema.json | 5 ++++-
>>>>    vl.c             | 8 ++++++++
>>>>    2 files changed, 12 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/qapi-schema.json b/qapi-schema.json
>>>> index 85f7800..0423b47 100644
>>>> --- a/qapi-schema.json
>>>> +++ b/qapi-schema.json
>>>> @@ -154,12 +154,15 @@
>>>>    # @watchdog: the watchdog action is configured to pause and has been triggered
>>>>    #
>>>>    # @guest-panicked: guest has been panicked as a result of guest OS panic
>>>> +#
>>>> +# @colo: guest is paused to save/restore VM state under colo checkpoint (since
>>>> +# 2.6)
>>>>    ##
>>>>    { 'enum': 'RunState',
>>>>      'data': [ 'debug', 'inmigrate', 'internal-error', 'io-error', 'paused',
>>>>                'postmigrate', 'prelaunch', 'finish-migrate', 'restore-vm',
>>>>                'running', 'save-vm', 'shutdown', 'suspended', 'watchdog',
>>>> -            'guest-panicked' ] }
>>>> +            'guest-panicked', 'colo' ] }
>>>>
>>>>    ##
>>>>    # @StatusInfo:
>>>> diff --git a/vl.c b/vl.c
>>>> index f84fde8..fca630b 100644
>>>> --- a/vl.c
>>>> +++ b/vl.c
>>>> @@ -594,6 +594,7 @@ static const RunStateTransition runstate_transitions_def[] = {
>>>>        { RUN_STATE_INMIGRATE, RUN_STATE_WATCHDOG },
>>>>        { RUN_STATE_INMIGRATE, RUN_STATE_GUEST_PANICKED },
>>>>        { RUN_STATE_INMIGRATE, RUN_STATE_FINISH_MIGRATE },
>>>> +    { RUN_STATE_INMIGRATE, RUN_STATE_COLO },
>>>>
>>>>        { RUN_STATE_INTERNAL_ERROR, RUN_STATE_PAUSED },
>>>>        { RUN_STATE_INTERNAL_ERROR, RUN_STATE_FINISH_MIGRATE },
>>>> @@ -603,6 +604,7 @@ static const RunStateTransition runstate_transitions_def[] = {
>>>>
>>>>        { RUN_STATE_PAUSED, RUN_STATE_RUNNING },
>>>>        { RUN_STATE_PAUSED, RUN_STATE_FINISH_MIGRATE },
>>>> +    { RUN_STATE_PAUSED, RUN_STATE_COLO},
>>>>
>>>>        { RUN_STATE_POSTMIGRATE, RUN_STATE_RUNNING },
>>>>        { RUN_STATE_POSTMIGRATE, RUN_STATE_FINISH_MIGRATE },
>>>> @@ -613,9 +615,12 @@ static const RunStateTransition runstate_transitions_def[] = {
>>>>
>>>>        { RUN_STATE_FINISH_MIGRATE, RUN_STATE_RUNNING },
>>>>        { RUN_STATE_FINISH_MIGRATE, RUN_STATE_POSTMIGRATE },
>>>> +    { RUN_STATE_FINISH_MIGRATE, RUN_STATE_COLO},
>>>>
>>>>        { RUN_STATE_RESTORE_VM, RUN_STATE_RUNNING },
>>>>
>>>> +    { RUN_STATE_COLO, RUN_STATE_RUNNING },
>>>> +
>>>>        { RUN_STATE_RUNNING, RUN_STATE_DEBUG },
>>>>        { RUN_STATE_RUNNING, RUN_STATE_INTERNAL_ERROR },
>>>>        { RUN_STATE_RUNNING, RUN_STATE_IO_ERROR },
>>>> @@ -626,6 +631,7 @@ static const RunStateTransition runstate_transitions_def[] = {
>>>>        { RUN_STATE_RUNNING, RUN_STATE_SHUTDOWN },
>>>>        { RUN_STATE_RUNNING, RUN_STATE_WATCHDOG },
>>>>        { RUN_STATE_RUNNING, RUN_STATE_GUEST_PANICKED },
>>>> +    { RUN_STATE_RUNNING, RUN_STATE_COLO},
>>>>
>>>>        { RUN_STATE_SAVE_VM, RUN_STATE_RUNNING },
>>>>
>>>> @@ -636,9 +642,11 @@ static const RunStateTransition runstate_transitions_def[] = {
>>>>        { RUN_STATE_RUNNING, RUN_STATE_SUSPENDED },
>>>>        { RUN_STATE_SUSPENDED, RUN_STATE_RUNNING },
>>>>        { RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE },
>>>> +    { RUN_STATE_SUSPENDED, RUN_STATE_COLO},
>>>>
>>>>        { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
>>>>        { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
>>>> +    { RUN_STATE_WATCHDOG, RUN_STATE_COLO},
>>>>
>>>>        { RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING },
>>>>        { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE },
>>>
>>> Pardon my ignorance, but could you explain the new run state in a bit
>>> more detail for me?
>>>
>>
>> OK. Normally, we only need to switch between the COLO and RUNNING states.
>> But we can't forbid users from issuing other commands while the VM is in
>> the COLO state.
>>
>> At every checkpoint, we have to pause the VM to send its state to the SVM.
>> Before we pause the VM, users may issue the 'stop' command, which changes
>> the state to 'RUN_STATE_PAUSED'. We don't want to abort the VM because of
>> this command. (Actually, we will support 'stop' while the VM is in the
>> COLO state.) So we need the state transition
>> 'RUN_STATE_PAUSED -> RUN_STATE_COLO'.
>
> What's the next state then?
>

We may switch to RUN_STATE_RUNNING. Actually, RUN_STATE_COLO is only used
here to indicate that the VM is stopped during the COLO process.

>> We enter the COLO state just after a full migration, whose last state
>> will be 'RUN_STATE_FINISH_MIGRATE' or 'RUN_STATE_INMIGRATE'. Before we
>> enter the COLO loop, we may get 'x-colo-lost-heartbeat' and pause in
>> 'RUN_STATE_COLO', so we need the state transitions
>> 'RUN_STATE_FINISH_MIGRATE -> RUN_STATE_COLO' and
>> 'RUN_STATE_INMIGRATE -> RUN_STATE_COLO'.
>> We need RUN_STATE_SUSPENDED -> RUN_STATE_COLO because the guest or
>> users may issue a standby command, and we need to ensure the VM does
>> not crash.
>>
>> Actually, we may need more states that can go to the 'colo' state;
>> maybe we should just follow the cases of the 'MIGRATE' state.
>
> I believe we should fully work out the state transitions added by COLO.
> I like to write that down in this form:
>
>      (state, trigger) -> (action, state')
>

I'm a little confused: in runstate_transitions_def, a state transition is
just a simple pair (state1, state2). We only switch to RUN_STATE_COLO when
we need to do something while the VM is paused.
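
For comparison with the (state, trigger) form, here is a small sketch of
the plain pair form: a list of (from, to) pairs expanded into a lookup
matrix that answers "is this transition allowed?".  It is a standalone toy
and not the code in vl.c; the RS_* names and sketch_* functions are
hypothetical, and the pairs are only a subset of those visible in the
patch above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, shortened run states; not QEMU's RunState enum. */
typedef enum {
    RS_RUNNING, RS_PAUSED, RS_FINISH_MIGRATE, RS_COLO, RS__MAX
} SketchRunState;

typedef struct SketchPair {
    SketchRunState from;
    SketchRunState to;
} SketchPair;

/* A subset of the (from, to) pairs that appear in the patch. */
static const SketchPair sketch_pairs[] = {
    { RS_PAUSED,         RS_RUNNING },
    { RS_PAUSED,         RS_COLO },
    { RS_FINISH_MIGRATE, RS_RUNNING },
    { RS_FINISH_MIGRATE, RS_COLO },
    { RS_COLO,           RS_RUNNING },
    { RS_RUNNING,        RS_COLO },
};

static bool sketch_valid[RS__MAX][RS__MAX];

/* Expand the pair list into a matrix indexed by [from][to]. */
void sketch_runstate_init(void)
{
    size_t i;

    for (i = 0; i < sizeof(sketch_pairs) / sizeof(sketch_pairs[0]); i++) {
        sketch_valid[sketch_pairs[i].from][sketch_pairs[i].to] = true;
    }
}

bool sketch_transition_allowed(SketchRunState from, SketchRunState to)
{
    assert(from < RS__MAX && to < RS__MAX);
    return sketch_valid[from][to];
}

/* How many states can be reached directly from 'from'? */
int sketch_count_successors(SketchRunState from)
{
    int to, n = 0;

    for (to = 0; to < RS__MAX; to++) {
        n += sketch_transition_allowed(from, (SketchRunState)to);
    }
    return n;
}

/* How many states can go directly to 'to'? */
int sketch_count_predecessors(SketchRunState to)
{
    int from, n = 0;

    for (from = 0; from < RS__MAX; from++) {
        n += sketch_transition_allowed((SketchRunState)from, to);
    }
    return n;
}
```

A pair table like this records only which moves are legal; it says nothing
about what triggers a move or what is done on the way, which is the extra
information the (state, trigger) -> (action, state') form asks for.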

> Example:
>
>      (running, checkpoint) -> (begin-checkpointing, colo)
>

Do you want me to add these new states to runstate_transitions_def?
What is the VM's real status (running or stopped) for 'checkpoint' and 'colo' here?

> with a suitable explanation of 'checkpoint' and 'begin-checkpointing'.
>
> For brevity, multiple
>
>      (state1, trigger) -> (action, state')
>      (state2, trigger) -> (action, state')
>      ...
>      (stateN, trigger) -> (action, state')
>
> can be abbreviated to
>
>      ({state1, state2, stateN}, trigger) -> (action, state')
>
> Example:
>
>      ({running, paused, ...}, checkpoint) -> (begin-checkpointing, colo)
>
> For clarity, chains of state transitions should be described in the
> order they happen.
>
> Pictures showing the states connected with transition arrows labelled
> with the trigger can help.
>
> Two properties to check:
>
> 1. Correctness: every state transition thus written down does the right
>     thing.
>
> 2. Completeness: for every pair (state, trigger), we got a state
>     transition, or an explanation why it cannot happen.
>
>> Thanks,
>> zhanghailiang
>>
>>> Your additions to runstate_transitions_def[] show we can go *from* state
>>> 'colo' only to state 'running', but we can go *to* state 'colo' from
>>> various other states.  This may well be sane, but it's not *obviously*
>>> sane :)
>
> .
>


* Re: [Qemu-devel] [PATCH COLO-Frame v12 10/38] COLO: Implement colo checkpoint protocol
  2016-01-11 12:47       ` Markus Armbruster
@ 2016-01-12 12:57         ` Hailiang Zhang
  0 siblings, 0 replies; 94+ messages in thread
From: Hailiang Zhang @ 2016-01-12 12:57 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, qemu-devel,
	peter.huangpeng, arei.gonglei, stefanha, amit.shah, dgilbert,
	hongyang.yang

On 2016/1/11 20:47, Markus Armbruster wrote:
> Hailiang Zhang <zhang.zhanghailiang@huawei.com> writes:
>
>> Hi Markus,
>>
>> On 2015/12/19 16:54, Markus Armbruster wrote:
>>> Jumping in at v12 for a bit of QAPI review (and whatever else caught my
>>> eye nearby), please pardon my ignorance of COLO in general, and previous
>>> review of this series in particular.
>>>
>>
>> Thanks all the same :)
> [...]
>>>> diff --git a/migration/colo.c b/migration/colo.c
>>>> index 0ab9618..0ce2a6e 100644
>>>> --- a/migration/colo.c
>>>> +++ b/migration/colo.c
>>>> @@ -10,10 +10,12 @@
>>>>     * later.  See the COPYING file in the top-level directory.
>>>>     */
>>>>
>>>> +#include <unistd.h>
>>>>    #include "sysemu/sysemu.h"
>>>>    #include "migration/colo.h"
>>>>    #include "trace.h"
>>>>    #include "qemu/error-report.h"
>>>> +#include "qemu/sockets.h"
>>>>
>>>>    bool colo_supported(void)
>>>>    {
>>>> @@ -34,6 +36,100 @@ bool migration_incoming_in_colo_state(void)
>>>>        return mis && (mis->state == MIGRATION_STATUS_COLO);
>>>>    }
>>>>
>>>> +static int colo_put_cmd(QEMUFile *f, uint32_t cmd)
>>>> +{
>>>> +    int ret;
>>>> +
>>>> +    if (cmd >= COLO_COMMAND_MAX) {
>>>
>>> Needs a trivial rebase due to commit 7fb1cf1.
>>>
>>
>>>> +        error_report("%s: Invalid cmd", __func__);
>>>> +        return -EINVAL;
>>>
>>> Can this run in a context with different error handling needs?
>>>
>>> Or asked differently: who may ultimately handle this error?  Whoever
>>> that may be, how does it need to report errors?
>>>
>>> Peeking ahead: the immediate callers don't handle this error, they just
>>> pass it on to their callers.
>>>
>>> I'm asking because I'm trying to understand whether error_report() is
>>> appropriate here, or whether you need to use error_setg(), and leave the
>>> actual reporting to the spot that ultimately handles this error.
>>>
>>
>> Hmm, I know what you mean: we handle them all together after exiting
>> the COLO process loop. Using error_setg() seems like a good idea; with
>> this change we can also drop the return value. I will fix it in the
>> next version.
>>
>>
>>>> +    }
>>>> +    qemu_put_be32(f, cmd);
>>>> +    qemu_fflush(f);
>>>> +
>>>> +    ret = qemu_file_get_error(f);
>>>> +    trace_colo_put_cmd(COLOCommand_lookup[cmd]);
>>>> +
>>>> +    return ret;
>>>> +}
>>>
>>> Looks like @cmd is a COLOCommand.  Why is the parameter type uint32_t?
>>>
>>
>> OK, I will change it to use enum COLOCommand.
>>
>>>> +
>>>> +static int colo_get_cmd(QEMUFile *f, uint32_t *cmd)
>>>> +{
>>>> +    int ret;
>>>> +
>>>> +    *cmd = qemu_get_be32(f);
>>>> +    ret = qemu_file_get_error(f);
>>>> +    if (ret < 0) {
>>>> +        return ret;
>>>> +    }
>>>> +    if (*cmd >= COLO_COMMAND_MAX) {
>>>> +        error_report("%s: Invalid cmd", __func__);
>>>> +        return -EINVAL;
>>>> +    }
>>>> +    trace_colo_get_cmd(COLOCommand_lookup[*cmd]);
>>>> +    return 0;
>>>> +}
>>>
>>> Same question.
>>>
>>> The "get" in the name suggests the function returns the value gotten,
>>> like similarly named functions elsewhere in migration/ do.
>>>
>> Do you mean it should return the cmd value directly rather than through
>> a parameter? Once we convert it to use error_setg() to indicate success
>> or failure, we can do that. I will fix it.
>
> Sounds good to me.
>

I have fixed them in v13 :)

> [...]
>>>> diff --git a/qapi-schema.json b/qapi-schema.json
>>>> index c9ff34e..85f7800 100644
>>>> --- a/qapi-schema.json
>>>> +++ b/qapi-schema.json
>>>> @@ -720,6 +720,31 @@
>>>>    { 'command': 'migrate-start-postcopy' }
>>>>
>>>>    ##
>>>> +# @COLOCommand
>>>> +#
>>>> +# The commands for COLO fault tolerance
>>>> +#
>>>> +# @checkpoint-ready: SVM is ready for checkpointing
>>>> +#
>>>> +# @checkpoint-request: PVM tells SVM to prepare for new checkpointing
>>>> +#
>>>> +# @checkpoint-reply: SVM gets PVM's checkpoint request
>>>> +#
>>>> +# @vmstate-send: VM's state will be sent by PVM.
>>>> +#
>>>> +# @vmstate-size: The total size of VMstate.
>>>> +#
>>>> +# @vmstate-received: VM's state has been received by SVM.
>>>> +#
>>>> +# @vmstate-loaded: VM's state has been loaded by SVM.
>>>> +#
>>>> +# Since: 2.6
>>>> +##
>>>> +{ 'enum': 'COLOCommand',
>>>> +  'data': [ 'checkpoint-ready', 'checkpoint-request', 'checkpoint-reply',
>>>> +            'vmstate-send', 'vmstate-size','vmstate-received',
>>>> +            'vmstate-loaded' ] }
>>>> +
>>>
>>> Space after 'vmstate-size', please.
>>>
>>
>>> 'vmstate-size' is not used in this patch.  You may want to add it with
>>> its first use instead.
>>>
>>
>> OK, I will move it to the corresponding patch.
>>
>>> Should this enum really be named "COLOCommand"?  'checkpoint-ready',
>>> 'checkpoint-request', 'vmstate-send' look like commands to me, but the
>>> others look like replies.
>>>
>>
>> Yes, COLOCommand is not quite accurate. What about naming it COLOProtocol?
>
> A protocol specifies valid sequences of messages, and what they mean.
> This isn't a protocol, it's a message within a protocol.  COLOMessage?
>

Yes, COLOMessage is more precise. I will fix it in the next version.

>>>
>>>>    # @MouseInfo:
>>>>    #
>>>>    # Information about a mouse device.
>>>> diff --git a/trace-events b/trace-events
>>>> index 5565e79..39fdd8d 100644
>>>> --- a/trace-events
>>>> +++ b/trace-events
>>>> @@ -1579,6 +1579,8 @@ postcopy_ram_incoming_cleanup_join(void) ""
>>>>
>>>>    # migration/colo.c
>>>>    colo_vm_state_change(const char *old, const char *new) "Change '%s' => '%s'"
>>>> +colo_put_cmd(const char *msg) "Send '%s' cmd"
>>>> +colo_get_cmd(const char *msg) "Receive '%s' cmd"
>>>>
>>>>    # kvm-all.c
>>>>    kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
>>>
>>> I like how this commit creates just the two state machines, and leaves
>>> filling in their actions to later commits.  Helps ignorant reviewers
>>> like me :)
>>>
>>>
>>
>> Do you mean I should split this patch? That is, leave this patch with
>> the simplest COLO process, maybe just 'ready, request, reply', and add
>> the other states in a later patch?
>
> No, I *like* how you split up the work.
>

OK.

Thanks.
Hailiang

> .
>


end of thread, other threads:[~2016-01-12 12:58 UTC | newest]

Thread overview: 94+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-15  8:22 [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 01/38] configure: Add parameter for configure to enable/disable COLO support zhanghailiang
2015-12-15  9:46   ` Wen Congyang
2015-12-15 11:19     ` Hailiang Zhang
2015-12-15 11:31     ` Hailiang Zhang
2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 02/38] migration: Introduce capability 'x-colo' to migration zhanghailiang
2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 03/38] COLO: migrate colo related info to secondary node zhanghailiang
2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 04/38] migration: Export migrate_set_state() zhanghailiang
2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 05/38] migration: Add state records for migration incoming zhanghailiang
2015-12-15 17:36   ` Dr. David Alan Gilbert
2015-12-16  5:37     ` Hailiang Zhang
2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 06/38] migration: Integrate COLO checkpoint process into migration zhanghailiang
2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 07/38] migration: Integrate COLO checkpoint process into loadvm zhanghailiang
2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 08/38] migration: Rename the'file' member of MigrationState zhanghailiang
2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 09/38] COLO/migration: Create a new communication path from destination to source zhanghailiang
2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 10/38] COLO: Implement colo checkpoint protocol zhanghailiang
2015-12-18 14:52   ` Dr. David Alan Gilbert
2015-12-28  7:34     ` Hailiang Zhang
2015-12-19  8:54   ` Markus Armbruster
2015-12-22  7:00     ` Hailiang Zhang
2016-01-11 12:47       ` Markus Armbruster
2016-01-12 12:57         ` Hailiang Zhang
2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 11/38] COLO: Add a new RunState RUN_STATE_COLO zhanghailiang
2015-12-19  9:27   ` Markus Armbruster
2015-12-22 13:32     ` Hailiang Zhang
2016-01-11 13:16       ` Markus Armbruster
2016-01-12 12:54         ` Hailiang Zhang
2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 12/38] QEMUSizedBuffer: Introduce two help functions for qsb zhanghailiang
2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 13/38] COLO: Save PVM state to secondary side when do checkpoint zhanghailiang
2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 14/38] ram: Split host_from_stream_offset() into two helper functions zhanghailiang
2015-12-18 15:18   ` Dr. David Alan Gilbert
2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 15/38] COLO: Load PVM's dirty pages into SVM's RAM cache temporarily zhanghailiang
2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 16/38] ram/COLO: Record the dirty pages that SVM received zhanghailiang
2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 17/38] COLO: Load VMState into qsb before restore it zhanghailiang
2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 18/38] COLO: Flush PVM's cached RAM into SVM's memory zhanghailiang
2015-12-15 11:07   ` Changlong Xie
2015-12-25  3:03     ` Hailiang Zhang
2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 19/38] COLO: Add checkpoint-delay parameter for migrate-set-parameters zhanghailiang
2015-12-19  9:33   ` Markus Armbruster
2015-12-22 13:43     ` Hailiang Zhang
2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 20/38] COLO: synchronize PVM's state to SVM periodically zhanghailiang
2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 21/38] COLO failover: Introduce a new command to trigger a failover zhanghailiang
2015-12-18 15:27   ` Dr. David Alan Gilbert
2015-12-19  9:38   ` Markus Armbruster
2015-12-22 13:50     ` Hailiang Zhang
2015-12-25  2:27       ` Hailiang Zhang
2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 22/38] COLO failover: Introduce state to record failover process zhanghailiang
2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 23/38] COLO: Implement failover work for Primary VM zhanghailiang
2015-12-18 15:35   ` Dr. David Alan Gilbert
2015-12-28  7:39     ` Hailiang Zhang
2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 24/38] COLO: Implement failover work for Secondary VM zhanghailiang
2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 25/38] qmp event: Add event notification for COLO error zhanghailiang
2015-12-18 16:03   ` Eric Blake
2015-12-23  1:55     ` Hailiang Zhang
2015-12-19 10:02   ` Markus Armbruster
2015-12-21 21:14     ` [Qemu-devel] [Qemu-block] " John Snow
2015-12-23  3:14       ` Hailiang Zhang
2015-12-23  1:24     ` [Qemu-devel] " Wen Congyang
2016-01-05 19:21       ` [Qemu-devel] [Qemu-block] " John Snow
2015-12-23  3:10     ` [Qemu-devel] " Hailiang Zhang
2016-01-11 13:24       ` Markus Armbruster
2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 26/38] COLO failover: Shutdown related socket fd when do failover zhanghailiang
2015-12-15  9:44   ` Dr. David Alan Gilbert
2015-12-15 10:23   ` Dr. David Alan Gilbert
2015-12-16  5:58     ` Hailiang Zhang
2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 27/38] COLO failover: Don't do failover during loading VM's state zhanghailiang
2015-12-15 10:21   ` Dr. David Alan Gilbert
2015-12-25  1:02     ` Hailiang Zhang
2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 28/38] COLO: Process shutdown command for VM in COLO state zhanghailiang
2015-12-15 11:31   ` Dr. David Alan Gilbert
2015-12-25  6:13     ` Hailiang Zhang
2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 29/38] COLO: Update the global runstate after going into colo state zhanghailiang
2015-12-15 11:52   ` Dr. David Alan Gilbert
2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 30/38] savevm: Split load vm state function qemu_loadvm_state zhanghailiang
2015-12-15 12:08   ` Dr. David Alan Gilbert
2015-12-25  6:37     ` Hailiang Zhang
2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 31/38] COLO: Separate the process of saving/loading ram and device state zhanghailiang
2015-12-18 10:53   ` Dr. David Alan Gilbert
2015-12-28  3:46     ` Hailiang Zhang
2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 32/38] COLO: Split qemu_savevm_state_begin out of checkpoint process zhanghailiang
2015-12-18 12:01   ` Dr. David Alan Gilbert
2015-12-28  7:29     ` Hailiang Zhang
2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 33/38] net/filter-buffer: Add default filter-buffer for each netdev zhanghailiang
2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 34/38] filter-buffer: Accept zero interval zhanghailiang
2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 35/38] filter-buffer: Introduce a helper function to enable/disable default filter zhanghailiang
2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 36/38] filter-buffer: Introduce a helper function to release packets zhanghailiang
2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 37/38] colo: Use default buffer-filter to buffer and " zhanghailiang
2015-12-15  8:22 ` [Qemu-devel] [PATCH COLO-Frame v12 38/38] COLO: Add block replication into colo process zhanghailiang
2015-12-15 12:14 ` [Qemu-devel] [PATCH COLO-Frame v12 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) Dr. David Alan Gilbert
2015-12-15 12:41   ` Hailiang Zhang
2015-12-17 10:52     ` Dr. David Alan Gilbert
2015-12-18  1:10       ` Hailiang Zhang
2015-12-18 15:47         ` Dr. David Alan Gilbert
2015-12-23  1:24           ` Hailiang Zhang
