All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service
@ 2015-02-12  3:16 zhanghailiang
  2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 01/27] configure: Add parameter for configure to enable/disable COLO support zhanghailiang
                   ` (29 more replies)
  0 siblings, 30 replies; 65+ messages in thread
From: zhanghailiang @ 2015-02-12  3:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: zhanghailiang, yunhong.jiang, eddie.dong, dgilbert,
	peter.huangpeng, stefanha, pbonzini

This is the 3th version of COLO, it is only COLO frame part, include: VM checkpoint,
failover, proxy API, block replication API, not include block replication.
The block part has been sent by wencongyang:
'[RFC PATCH 00/14] Block replication for continuous checkpoints'

You can get the integrated qemu colo patches from github:
https://github.com/coloft/qemu/commits/colo-v1.0

Compared with the previous version, we have realized all parts of COLO frame, 
and it is works now.

The main change since last version is, we use colo proxy mode instead of
colo agent, they are all used for network packets compare, but proxy is more
efficient, it is based on netfilter.
Another modification is we implement new block replication scheme, 
you can get more info from wencongyang's block patch series 

If you don't know about COLO, please refer to below link for detailed 
information.

The idea is presented in Xen summit 2012, and 2013,
and academia paper in SOCC 2013. It's also presented in KVM forum in 2013:
http://www.linux-kvm.org/wiki/images/1/1d/Kvm-forum-2013-COLO.pdf

Previous posted RFC proposal:
http://lists.nongnu.org/archive/html/qemu-devel/2014-06/msg05567.html
http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg04459.html

The below is the detail about test COLO, you can also get this info
from http://wiki.qemu.org/Features/COLO.
* Hardware requirements
There is at least one directly connected nic to forward the network requests
from client to secondary VM. The directly connected nic must not be used by
any other purpose.

* Network link topology
=================================normal ======================================
                                +--------+
                                |client  |
   master                       +----+---+                    slave
-------------------------+           |            + -------------------------+
   PVM                   |           +            |                          |
+-------+         +----[eth0]-----[switch]-----[eth0]---------+              |
|guest  |     +---+-+    |                        |       +---+-+            |
|     [tap0]--+ br0 |    |                        |       | br0 |            |
|       |     +-----+  [eth1]-----[forward]----[eth1]--+  +-----+      SVM   |
+-------+                |                        |    |            +-------+|
                         |                        |    |  +-----+   | guest ||
                       [eth2]---[checkpoint]---[eth2]  +--+br1  |-[tap0]    ||
                         |                        |       +-----+   |       ||
                         |                        |                 +-------+|
-------------------------+                        +--------------------------+
e.g.
master:
br0: 192.168.0.33
eth1: 192.168.1.33
eth2: 192.168.2.33

slave:
br0: 192.168.0.88
br1: no ip address
eth1: 192.168.1.88
eth2: 192.168.2.88
(Actually, you can also use eth0 as checkpoint channel)
Note: in normal, SVM will always be linked to br1 like above until
failover.

* Test environment prepare:
1. Set Up the Bridge and network environment
You must setup you network environment like above picture,
In master, setup a bridge br0, using command brctl, like:
# ifconfig eth0 down
# ifconfig eth0 0.0.0.0
# brctl addbr br0
# brctl addif br0 eth0
# ifconfig br0 192.168.0.33 netmask 255.255.255.0
# ifconfig eth0 up
In slave, setup two bridge br0, br1, commands are same with above,
please note that br1 is linked to eth1(the forward nic).

2.Qemu-ifup
We need a script to bring up the TAP interface.
You can find this info from http://en.wikibooks.org/wiki/QEMU/Networking.
Master:
root@master# cat /etc/qemu-ifup
#!/bin/sh
switch=br0
if [ -n "$1" ]; then
        ip link set $1 up
        brctl addif ${switch} $1
fi
Slave:
root@slave # cat /etc/qemu-ifup
#!/bin/sh
switch=br1  #in primary, switch is br0. in secondary switch is br1
if [ -n "$1" ]; then
        ip link set $1 up
        brctl addif ${switch} $1
fi 

3. Prepare host kernel
colo-proxy kernel module need cooperate with linux kernel.
You should put a kernel patch 'colo-patch-for-kernel.patch'
(It's based on linux kernel-3.19) which you can get from 
https://github.com/gao-feng/colo-proxy.git
and then compile kernel and intall the new kernel.

4. Proxy module
proxy module is used for network packets compare, you can also get the lastest
version from: https://github.com/gao-feng/colo-proxy.git.
You can compile and install it by using command 'make' && 'make install'.

5. Modified iptables
We have add a new rule to iptables command, so please get the patch from
https://github.com/gao-feng/colo-proxy/blob/master/COLO-library_for_iptables-1.4.21.patch
It is based on version 1.4.21.

6. Qemu colo
Checkout the latest colo branch from
https://github.com/coloft/qemu/commits/colo-v1.0
configure and make: 
# ./configure --target-list=x86_64-softmmu --enable-colo --enable-quorum 
# make

* Test steps:
1. load module
# modprobe nf_conntrack_colo (Other colo module will be automatically loaded by
script colo-proxy-script.sh)
# modprobe xt_mark
# modprobe kvm-intel

2. startup qemu
master:
# qemu-system-x86_64 -enable-kvm -netdev tap,id=hn0,colo_script=./scripts/colo-proxy-script.sh,colo_nicname=eth1 -device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive driver=quorum,read-pattern=first,children.0.file.filename=suse11_3.img,children.0.driver=raw,children.1.file.driver=nbd+colo,children.1.file.host=192.168.2.88,children.1.file.port=8889,children.1.file.export=colo1,children.1.driver=raw,if=virtio -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -S
slave:
# qemu-system-x86_64 -enable-kvm -netdev tap,id=hn0,colo_script=./scripts/colo-proxy-script.sh,colo_nicname=eth1 -device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive driver=blkcolo,export=colo1,backing.file.filename=suse11_3.img,backing.driver=raw,if=virtio -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -incoming tcp:0:8888

3. On Secondary VM's QEMU monitor, run
(qemu) nbd_server_start 192.168.2.88:8889 

4.on Primary VM's QEMU monitor, run following command:
(qemu) migrate_set_capability colo on
(qemu) migrate tcp:192.168.2.88:8888

5. done
You will see two runing VMs, whenever you make changes to PVM, SVM
will be synced to PVM's state.

6. failover test:
You can kill SVM (PVM) and run 'colo_lost_heartbeat' in PVM's (SVM's) monitor
at the same time, then PVM (SVM) will failover and client will not feel this
change.

It is still a framework, far away from commercial use,
so any comments/feedbacks are warmly welcomed ;)

PS: 
We (huawei) have cooperated with fujitsu on COLO work,
and we work mainly on COLO frame and fujitsu will focus on COLO block.

TODO list:
1) Optimize the process of checkpoint, shorten the time-consuming
2) Add more debug/stat info 
3) Strengthen failover 
4) The capability of continuous FT

v3:
- use proxy instead of colo agent to compare network packets
- add block replication
- Optimize failover disposal
- handle shutdown

v2:
- use QEMUSizedBuffer/QEMUFile as COLO buffer
- colo support is enabled by default
- add nic replication support
- addressed comments from Eric Blake and Dr. David Alan Gilbert

v1:
- implement the frame of colo


zhanghailiang (27):
  configure: Add parameter for configure to enable/disable COLO support
  migration: Introduce capability 'colo' to migration
  COLO: migrate colo related info to slave
  migration: Integrate COLO checkpoint process into migration
  migration: Integrate COLO checkpoint process into loadvm
  migration: Don't send vm description in COLO mode
  COLO: Implement colo checkpoint protocol
  COLO: Add a new RunState RUN_STATE_COLO
  QEMUSizedBuffer: Introduce two help functions for qsb
  COLO: Save VM state to slave when do checkpoint
  COLO RAM: Load PVM's dirty page into SVM's RAM cache temporarily
  COLO VMstate: Load VM state into qsb before restore it
  COLO RAM: Flush cached RAM into SVM's memory
  COLO failover: Introduce a new command to trigger a failover
  COLO failover: Implement COLO master/slave failover work
  COLO failover: Don't do failover during loading VM's state
  COLO: Add new command parameter 'colo_nicname' 'colo_script' for net
  COLO NIC: Init/remove colo nic devices when add/cleanup tap devices
  COLO NIC: Implement colo nic device interface configure()
  COLO NIC : Implement colo nic init/destroy function
  COLO NIC: Some init work related with proxy module
  COLO: Do checkpoint according to the result of net packets comparing
  COLO: Improve checkpoint efficiency by do additional periodic
    checkpoint
  COLO NIC: Implement NIC checkpoint and failover
  COLO: Disable qdev hotplug when VM is in COLO mode
  COLO: Implement shutdown checkpoint
  COLO: Add block replication into colo process

 arch_init.c                            | 196 ++++++++-
 configure                              |  14 +
 hmp-commands.hx                        |  15 +
 hmp.c                                  |   7 +
 hmp.h                                  |   1 +
 include/exec/cpu-all.h                 |   1 +
 include/migration/migration-colo.h     |  57 +++
 include/migration/migration-failover.h |  22 +
 include/migration/migration.h          |  14 +
 include/migration/qemu-file.h          |   3 +-
 include/net/colo-nic.h                 |  25 ++
 include/net/net.h                      |   4 +
 include/sysemu/sysemu.h                |   3 +
 migration/Makefile.objs                |   2 +
 migration/colo-comm.c                  |  81 ++++
 migration/colo-failover.c              |  48 +++
 migration/colo.c                       | 743 +++++++++++++++++++++++++++++++++
 migration/migration.c                  |  74 +++-
 migration/qemu-file-buf.c              |  57 +++
 net/Makefile.objs                      |   1 +
 net/colo-nic.c                         | 438 +++++++++++++++++++
 net/tap.c                              |  45 +-
 qapi-schema.json                       |  27 +-
 qemu-options.hx                        |  10 +-
 qmp-commands.hx                        |  19 +
 savevm.c                               |  10 +-
 scripts/colo-proxy-script.sh           |  88 ++++
 stubs/Makefile.objs                    |   1 +
 stubs/migration-colo.c                 |  49 +++
 vl.c                                   |  36 +-
 30 files changed, 2047 insertions(+), 44 deletions(-)
 create mode 100644 include/migration/migration-colo.h
 create mode 100644 include/migration/migration-failover.h
 create mode 100644 include/net/colo-nic.h
 create mode 100644 migration/colo-comm.c
 create mode 100644 migration/colo-failover.c
 create mode 100644 migration/colo.c
 create mode 100644 net/colo-nic.c
 create mode 100755 scripts/colo-proxy-script.sh
 create mode 100644 stubs/migration-colo.c

-- 
1.7.12.4

^ permalink raw reply	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH RFC v3 01/27] configure: Add parameter for configure to enable/disable COLO support
  2015-02-12  3:16 [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
@ 2015-02-12  3:16 ` zhanghailiang
  2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 02/27] migration: Introduce capability 'colo' to migration zhanghailiang
                   ` (28 subsequent siblings)
  29 siblings, 0 replies; 65+ messages in thread
From: zhanghailiang @ 2015-02-12  3:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: zhanghailiang, yunhong.jiang, eddie.dong, dgilbert,
	peter.huangpeng, Gonglei, stefanha, pbonzini, Yang Hongyang,
	Lai Jiangshan

configure --enable-colo/--disable-colo to switch COLO
support on/off.
COLO support is on by default.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 configure | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/configure b/configure
index 7ba4bcb..958429e 100755
--- a/configure
+++ b/configure
@@ -258,6 +258,7 @@ xfs=""
 vhost_net="no"
 vhost_scsi="no"
 kvm="no"
+colo="yes"
 rdma=""
 gprof="no"
 debug_tcg="no"
@@ -923,6 +924,10 @@ for opt do
   ;;
   --enable-kvm) kvm="yes"
   ;;
+  --disable-colo) colo="no"
+  ;;
+  --enable-colo) colo="yes"
+  ;;
   --disable-tcg-interpreter) tcg_interpreter="no"
   ;;
   --enable-tcg-interpreter) tcg_interpreter="yes"
@@ -1323,6 +1328,10 @@ Advanced options (experts only):
   --disable-slirp          disable SLIRP userspace network connectivity
   --disable-kvm            disable KVM acceleration support
   --enable-kvm             enable KVM acceleration support
+  --disable-colo           disable COarse-grain LOck-stepping Virtual
+                           Machines for Non-stop Service
+  --enable-colo            enable COarse-grain LOck-stepping Virtual
+                           Machines for Non-stop Service (default)
   --disable-rdma           disable RDMA-based migration support
   --enable-rdma            enable RDMA-based migration support
   --enable-tcg-interpreter enable TCG with bytecode interpreter (TCI)
@@ -4364,6 +4373,7 @@ echo "Linux AIO support $linux_aio"
 echo "ATTR/XATTR support $attr"
 echo "Install blobs     $blobs"
 echo "KVM support       $kvm"
+echo "COLO support      $colo"
 echo "RDMA support      $rdma"
 echo "TCG interpreter   $tcg_interpreter"
 echo "fdt support       $fdt"
@@ -4922,6 +4932,10 @@ if have_backend "ftrace"; then
 fi
 echo "CONFIG_TRACE_FILE=$trace_file" >> $config_host_mak
 
+if test "$colo" = "yes"; then
+  echo "CONFIG_COLO=y" >> $config_host_mak
+fi
+
 if test "$rdma" = "yes" ; then
   echo "CONFIG_RDMA=y" >> $config_host_mak
 fi
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH RFC v3 02/27] migration: Introduce capability 'colo' to migration
  2015-02-12  3:16 [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
  2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 01/27] configure: Add parameter for configure to enable/disable COLO support zhanghailiang
@ 2015-02-12  3:16 ` zhanghailiang
  2015-02-16 21:57   ` Eric Blake
  2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 03/27] COLO: migrate colo related info to slave zhanghailiang
                   ` (27 subsequent siblings)
  29 siblings, 1 reply; 65+ messages in thread
From: zhanghailiang @ 2015-02-12  3:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: zhanghailiang, yunhong.jiang, eddie.dong, dgilbert,
	peter.huangpeng, Gonglei, stefanha, pbonzini, Yang Hongyang,
	Lai Jiangshan

This capability allows Primary VM (PVM) to be continuously checkpointed
to secondary VM.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 include/migration/migration.h |  1 +
 migration/migration.c         | 15 +++++++++++++++
 qapi-schema.json              |  5 ++++-
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index f37348b..3f5c705 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -154,6 +154,7 @@ int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen);
 
 int migrate_use_xbzrle(void);
 int64_t migrate_xbzrle_cache_size(void);
+bool migrate_enable_colo(void);
 
 int64_t xbzrle_cache_resize(int64_t new_size);
 
diff --git a/migration/migration.c b/migration/migration.c
index b3adbc6..8403c8a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -276,6 +276,15 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
     }
 
     for (cap = params; cap; cap = cap->next) {
+#ifndef CONFIG_COLO
+        if (cap->value->capability == MIGRATION_CAPABILITY_COLO &&
+            cap->value->state) {
+            error_setg(errp, "COLO is not currently supported, please"
+                             " configure with --enable-colo option in order to"
+                             " support COLO feature");
+            continue;
+        }
+#endif
         s->enabled_capabilities[cap->value->capability] = cap->value->state;
     }
 }
@@ -585,6 +594,12 @@ int64_t migrate_xbzrle_cache_size(void)
     return s->xbzrle_cache_size;
 }
 
+bool migrate_enable_colo(void)
+{
+    MigrationState *s = migrate_get_current();
+    return s->enabled_capabilities[MIGRATION_CAPABILITY_COLO];
+}
+
 /* migration thread support */
 
 static void *migration_thread(void *opaque)
diff --git a/qapi-schema.json b/qapi-schema.json
index e16f8eb..8c59e50 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -494,10 +494,13 @@
 # @auto-converge: If enabled, QEMU will automatically throttle down the guest
 #          to speed up convergence of RAM migration. (since 1.6)
 #
+# @colo: If enabled, the migration will never end, and the VM will instead be
+#          continuously checkpointed. (since 2.3)
+#
 # Since: 1.2
 ##
 { 'enum': 'MigrationCapability',
-  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks'] }
+  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks', 'colo'] }
 
 ##
 # @MigrationCapabilityStatus
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH RFC v3 03/27] COLO: migrate colo related info to slave
  2015-02-12  3:16 [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
  2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 01/27] configure: Add parameter for configure to enable/disable COLO support zhanghailiang
  2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 02/27] migration: Introduce capability 'colo' to migration zhanghailiang
@ 2015-02-12  3:16 ` zhanghailiang
  2015-02-16 23:20   ` Eric Blake
  2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 04/27] migration: Integrate COLO checkpoint process into migration zhanghailiang
                   ` (26 subsequent siblings)
  29 siblings, 1 reply; 65+ messages in thread
From: zhanghailiang @ 2015-02-12  3:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: zhanghailiang, yunhong.jiang, eddie.dong, dgilbert,
	peter.huangpeng, Gonglei, stefanha, pbonzini, Yang Hongyang,
	Lai Jiangshan

We can know if we should go into COLO mode by the info that
has been migrated from PVM.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 include/migration/migration-colo.h | 21 ++++++++++++++
 migration/Makefile.objs            |  1 +
 migration/colo-comm.c              | 56 ++++++++++++++++++++++++++++++++++++++
 vl.c                               |  5 +++-
 4 files changed, 82 insertions(+), 1 deletion(-)
 create mode 100644 include/migration/migration-colo.h
 create mode 100644 migration/colo-comm.c

diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h
new file mode 100644
index 0000000..d52ebd0
--- /dev/null
+++ b/include/migration/migration-colo.h
@@ -0,0 +1,21 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_MIGRATION_COLO_H
+#define QEMU_MIGRATION_COLO_H
+
+#include "qemu-common.h"
+#include "migration/migration.h"
+
+void colo_info_mig_init(void);
+
+#endif
diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index d929e96..97b72ad 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -1,4 +1,5 @@
 common-obj-y += migration.o tcp.o
+common-obj-y += colo-comm.o
 common-obj-y += vmstate.o
 common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o qemu-file-stdio.o
 common-obj-y += xbzrle.o
diff --git a/migration/colo-comm.c b/migration/colo-comm.c
new file mode 100644
index 0000000..8caa948
--- /dev/null
+++ b/migration/colo-comm.c
@@ -0,0 +1,56 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later. See the COPYING file in the top-level directory.
+ *
+ */
+
+#include <migration/migration-colo.h>
+
+/* #define DEBUG_COLO */
+
+#ifdef DEBUG_COLO
+#define DPRINTF(fmt, ...) \
+    do { fprintf(stdout, "COLO: " fmt, ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+    do { } while (0)
+#endif
+
+static bool colo_requested;
+
+/* save */
+static void colo_info_save(QEMUFile *f, void *opaque)
+{
+    qemu_put_byte(f, migrate_enable_colo());
+}
+
+/* restore */
+static int colo_info_load(QEMUFile *f, void *opaque, int version_id)
+{
+    int value = qemu_get_byte(f);
+
+    if (value && !colo_requested) {
+        DPRINTF("COLO requested!\n");
+    }
+    colo_requested = value;
+
+    return 0;
+}
+
+static SaveVMHandlers savevm_colo_info_handlers = {
+    .save_state = colo_info_save,
+    .load_state = colo_info_load,
+};
+
+void colo_info_mig_init(void)
+{
+    register_savevm_live(NULL, "colo", -1, 1,
+                         &savevm_colo_info_handlers, NULL);
+}
diff --git a/vl.c b/vl.c
index 8c8f142..40badc4 100644
--- a/vl.c
+++ b/vl.c
@@ -89,6 +89,7 @@ int main(int argc, char **argv)
 #include "sysemu/dma.h"
 #include "audio/audio.h"
 #include "migration/migration.h"
+#include "migration/migration-colo.h"
 #include "sysemu/kvm.h"
 #include "qapi/qmp/qjson.h"
 #include "qemu/option.h"
@@ -4147,7 +4148,9 @@ int main(int argc, char **argv, char **envp)
 
     blk_mig_init();
     ram_mig_init();
-
+#ifdef CONFIG_COLO
+    colo_info_mig_init();
+#endif
     /* If the currently selected machine wishes to override the units-per-bus
      * property of its default HBA interface type, do so now. */
     if (machine_class->units_per_default_bus) {
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH RFC v3 04/27] migration: Integrate COLO checkpoint process into migration
  2015-02-12  3:16 [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (2 preceding siblings ...)
  2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 03/27] COLO: migrate colo related info to slave zhanghailiang
@ 2015-02-12  3:16 ` zhanghailiang
  2015-02-16 23:27   ` Eric Blake
  2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 05/27] migration: Integrate COLO checkpoint process into loadvm zhanghailiang
                   ` (25 subsequent siblings)
  29 siblings, 1 reply; 65+ messages in thread
From: zhanghailiang @ 2015-02-12  3:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: zhanghailiang, yunhong.jiang, eddie.dong, dgilbert,
	peter.huangpeng, Gonglei, stefanha, pbonzini, Lai Jiangshan

Add a migrate state: MIG_STATE_COLO, enter this migration state
after the first live migration successfully finished.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 include/migration/migration-colo.h |  2 ++
 include/migration/migration.h      | 13 +++++++
 migration/Makefile.objs            |  1 +
 migration/colo.c                   | 72 ++++++++++++++++++++++++++++++++++++++
 migration/migration.c              | 38 +++++++++++---------
 stubs/Makefile.objs                |  1 +
 stubs/migration-colo.c             | 17 +++++++++
 7 files changed, 128 insertions(+), 16 deletions(-)
 create mode 100644 migration/colo.c
 create mode 100644 stubs/migration-colo.c

diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h
index d52ebd0..b72662c 100644
--- a/include/migration/migration-colo.h
+++ b/include/migration/migration-colo.h
@@ -18,4 +18,6 @@
 
 void colo_info_mig_init(void);
 
+void colo_init_checkpointer(MigrationState *s);
+
 #endif
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 3f5c705..c4c98d2 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -65,6 +65,19 @@ struct MigrationState
     int64_t dirty_sync_count;
 };
 
+enum {
+    MIG_STATE_ERROR = -1,
+    MIG_STATE_NONE,
+    MIG_STATE_SETUP,
+    MIG_STATE_CANCELLING,
+    MIG_STATE_CANCELLED,
+    MIG_STATE_ACTIVE,
+    MIG_STATE_COLO,
+    MIG_STATE_COMPLETED,
+};
+
+void migrate_set_state(MigrationState *s, int old_state, int new_state);
+
 void process_incoming_migration(QEMUFile *f);
 
 void qemu_start_incoming_migration(const char *uri, Error **errp);
diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index 97b72ad..895583e 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -1,5 +1,6 @@
 common-obj-y += migration.o tcp.o
 common-obj-y += colo-comm.o
+common-obj-$(CONFIG_COLO) += colo.o
 common-obj-y += vmstate.o
 common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o qemu-file-stdio.o
 common-obj-y += xbzrle.o
diff --git a/migration/colo.c b/migration/colo.c
new file mode 100644
index 0000000..f40b0d8
--- /dev/null
+++ b/migration/colo.c
@@ -0,0 +1,72 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "sysemu/sysemu.h"
+#include "migration/migration-colo.h"
+#include "qemu/error-report.h"
+
+/* #define DEBUG_COLO */
+
+#ifdef DEBUG_COLO
+#define DPRINTF(fmt, ...) \
+do { fprintf(stdout, "colo: " fmt , ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) do {} while (0)
+#endif
+
+static QEMUBH *colo_bh;
+
+static void *colo_thread(void *opaque)
+{
+    MigrationState *s = opaque;
+
+    qemu_mutex_lock_iothread();
+    vm_start();
+    qemu_mutex_unlock_iothread();
+    DPRINTF("vm resume to run\n");
+
+
+    /*TODO: COLO checkpoint savevm loop*/
+
+    migrate_set_state(s, MIG_STATE_COLO, MIG_STATE_COMPLETED);
+
+    qemu_mutex_lock_iothread();
+    qemu_bh_schedule(s->cleanup_bh);
+    qemu_mutex_unlock_iothread();
+
+    return NULL;
+}
+
+static void colo_start_checkpointer(void *opaque)
+{
+    MigrationState *s = opaque;
+
+    if (colo_bh) {
+        qemu_bh_delete(colo_bh);
+        colo_bh = NULL;
+    }
+
+    qemu_mutex_unlock_iothread();
+    qemu_thread_join(&s->thread);
+    qemu_mutex_lock_iothread();
+
+    migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_COLO);
+
+    qemu_thread_create(&s->thread, "colo", colo_thread, s,
+                       QEMU_THREAD_JOINABLE);
+}
+
+void colo_init_checkpointer(MigrationState *s)
+{
+    colo_bh = qemu_bh_new(colo_start_checkpointer, s);
+    qemu_bh_schedule(colo_bh);
+}
diff --git a/migration/migration.c b/migration/migration.c
index 8403c8a..536ba01e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -25,16 +25,7 @@
 #include "qemu/thread.h"
 #include "qmp-commands.h"
 #include "trace.h"
-
-enum {
-    MIG_STATE_ERROR = -1,
-    MIG_STATE_NONE,
-    MIG_STATE_SETUP,
-    MIG_STATE_CANCELLING,
-    MIG_STATE_CANCELLED,
-    MIG_STATE_ACTIVE,
-    MIG_STATE_COMPLETED,
-};
+#include "migration/migration-colo.h"
 
 #define MAX_THROTTLE  (32 << 20)      /* Migration speed throttling */
 
@@ -227,6 +218,11 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 
         get_xbzrle_cache_stats(info);
         break;
+    case MIG_STATE_COLO:
+        info->has_status = true;
+        info->status = g_strdup("colo");
+        /* TODO: display COLO specific informations(checkpoint info etc.),*/
+        break;
     case MIG_STATE_COMPLETED:
         get_xbzrle_cache_stats(info);
 
@@ -270,7 +266,8 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
     MigrationState *s = migrate_get_current();
     MigrationCapabilityStatusList *cap;
 
-    if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP) {
+    if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP ||
+        s->state == MIG_STATE_COLO) {
         error_set(errp, QERR_MIGRATION_ACTIVE);
         return;
     }
@@ -291,7 +288,7 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
 
 /* shared migration helpers */
 
-static void migrate_set_state(MigrationState *s, int old_state, int new_state)
+void migrate_set_state(MigrationState *s, int old_state, int new_state)
 {
     if (atomic_cmpxchg(&s->state, old_state, new_state) == new_state) {
         trace_migrate_set_state(new_state);
@@ -437,7 +434,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     params.shared = has_inc && inc;
 
     if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP ||
-        s->state == MIG_STATE_CANCELLING) {
+        s->state == MIG_STATE_CANCELLING || s->state == MIG_STATE_COLO) {
         error_set(errp, QERR_MIGRATION_ACTIVE);
         return;
     }
@@ -611,6 +608,7 @@ static void *migration_thread(void *opaque)
     int64_t max_size = 0;
     int64_t start_time = initial_time;
     bool old_vm_running = false;
+    bool enable_colo = migrate_enable_colo();
 
     qemu_savevm_state_begin(s->file, &s->params);
 
@@ -647,7 +645,10 @@ static void *migration_thread(void *opaque)
                 }
 
                 if (!qemu_file_get_error(s->file)) {
-                    migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_COMPLETED);
+                    if (!enable_colo) {
+                        migrate_set_state(s, MIG_STATE_ACTIVE,
+                                          MIG_STATE_COMPLETED);
+                    }
                     break;
                 }
             }
@@ -697,11 +698,16 @@ static void *migration_thread(void *opaque)
         }
         runstate_set(RUN_STATE_POSTMIGRATE);
     } else {
-        if (old_vm_running) {
+        if (s->state == MIG_STATE_ACTIVE && enable_colo) {
+            colo_init_checkpointer(s);
+        } else if (old_vm_running) {
             vm_start();
         }
     }
-    qemu_bh_schedule(s->cleanup_bh);
+
+    if (!enable_colo) {
+        qemu_bh_schedule(s->cleanup_bh);
+    }
     qemu_mutex_unlock_iothread();
 
     return NULL;
diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index 5e347d0..9fe6b4c 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -40,3 +40,4 @@ stub-obj-$(CONFIG_WIN32) += fd-register.o
 stub-obj-y += cpus.o
 stub-obj-y += kvm.o
 stub-obj-y += qmp_pc_dimm_device_list.o
+stub-obj-y += migration-colo.o
diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
new file mode 100644
index 0000000..b2cff9c
--- /dev/null
+++ b/stubs/migration-colo.c
@@ -0,0 +1,17 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "migration/migration-colo.h"
+
+void colo_init_checkpointer(MigrationState *s)
+{
+}
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH RFC v3 05/27] migration: Integrate COLO checkpoint process into loadvm
  2015-02-12  3:16 [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (3 preceding siblings ...)
  2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 04/27] migration: Integrate COLO checkpoint process into migration zhanghailiang
@ 2015-02-12  3:16 ` zhanghailiang
  2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 06/27] migration: Don't send vm description in COLO mode zhanghailiang
                   ` (24 subsequent siblings)
  29 siblings, 0 replies; 65+ messages in thread
From: zhanghailiang @ 2015-02-12  3:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: zhanghailiang, Li Zhijian, yunhong.jiang, eddie.dong, dgilbert,
	peter.huangpeng, stefanha, pbonzini, Yang Hongyang,
	Lai Jiangshan

Switch from normal migration loadvm process into COLO checkpoint process if
COLO mode is enabled.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 include/migration/migration-colo.h | 14 +++++++++++++-
 migration/colo-comm.c              | 10 ++++++++++
 migration/colo.c                   | 15 ++++++++++++++-
 migration/migration.c              | 21 ++++++++++++++++++++-
 stubs/migration-colo.c             |  5 +++++
 5 files changed, 62 insertions(+), 3 deletions(-)

diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h
index b72662c..9f1443b 100644
--- a/include/migration/migration-colo.h
+++ b/include/migration/migration-colo.h
@@ -15,9 +15,21 @@
 
 #include "qemu-common.h"
 #include "migration/migration.h"
+#include "block/coroutine.h"
+#include "qemu/thread.h"
 
 void colo_info_mig_init(void);
 
-void colo_init_checkpointer(MigrationState *s);
+struct colo_incoming {
+    QEMUFile *file;
+    QemuThread thread;
+};
 
+void colo_init_checkpointer(MigrationState *s);
+/* loadvm */
+extern Coroutine *migration_incoming_co;
+bool loadvm_enable_colo(void);
+void loadvm_exit_colo(void);
+void *colo_process_incoming_checkpoints(void *opaque);
+bool loadvm_in_colo_state(void);
 #endif
diff --git a/migration/colo-comm.c b/migration/colo-comm.c
index 8caa948..038d12f 100644
--- a/migration/colo-comm.c
+++ b/migration/colo-comm.c
@@ -54,3 +54,13 @@ void colo_info_mig_init(void)
     register_savevm_live(NULL, "colo", -1, 1,
                          &savevm_colo_info_handlers, NULL);
 }
+
+bool loadvm_enable_colo(void)
+{
+    return colo_requested;
+}
+
+void loadvm_exit_colo(void)
+{
+    colo_requested = false;
+}
diff --git a/migration/colo.c b/migration/colo.c
index f40b0d8..b54bb52 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -24,7 +24,7 @@ do { fprintf(stdout, "colo: " fmt , ## __VA_ARGS__); } while (0)
 #endif
 
 static QEMUBH *colo_bh;
-
+static Coroutine *colo;
 static void *colo_thread(void *opaque)
 {
     MigrationState *s = opaque;
@@ -70,3 +70,16 @@ void colo_init_checkpointer(MigrationState *s)
     colo_bh = qemu_bh_new(colo_start_checkpointer, s);
     qemu_bh_schedule(colo_bh);
 }
+
+void *colo_process_incoming_checkpoints(void *opaque)
+{
+    colo = qemu_coroutine_self();
+    assert(colo != NULL);
+
+    /* TODO: COLO checkpoint restore loop */
+
+    colo = NULL;
+    loadvm_exit_colo();
+
+    return NULL;
+}
diff --git a/migration/migration.c b/migration/migration.c
index 536ba01e..81f911f 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -79,6 +79,7 @@ void qemu_start_incoming_migration(const char *uri, Error **errp)
     }
 }
 
+Coroutine *migration_incoming_co;
 static void process_incoming_migration_co(void *opaque)
 {
     QEMUFile *f = opaque;
@@ -86,7 +87,25 @@ static void process_incoming_migration_co(void *opaque)
     int ret;
 
     ret = qemu_loadvm_state(f);
-    qemu_fclose(f);
+
+    /* we get colo info, and know if we are in colo mode */
+    if (loadvm_enable_colo()) {
+        struct colo_incoming *colo_in = g_malloc0(sizeof(*colo_in));
+
+        colo_in->file = f;
+        migration_incoming_co = qemu_coroutine_self();
+        qemu_thread_create(&colo_in->thread, "colo incoming",
+             colo_process_incoming_checkpoints, colo_in, QEMU_THREAD_JOINABLE);
+        qemu_coroutine_yield();
+        migration_incoming_co = NULL;
+#if 0
+        /* FIXME  wait checkpoint incoming thread exit, and free resource */
+        qemu_thread_join(&colo_in->thread);
+        g_free(colo_in);
+#endif
+    } else {
+        qemu_fclose(f);
+    }
     free_xbzrle_decoded_buf();
     if (ret < 0) {
         error_report("load of migration failed: %s", strerror(-ret));
diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
index b2cff9c..7a3dbc5 100644
--- a/stubs/migration-colo.c
+++ b/stubs/migration-colo.c
@@ -15,3 +15,8 @@
 void colo_init_checkpointer(MigrationState *s)
 {
 }
+
+void *colo_process_incoming_checkpoints(void *opaque)
+{
+    return NULL;
+}
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH RFC v3 06/27] migration: Don't send vm description in COLO mode
  2015-02-12  3:16 [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (4 preceding siblings ...)
  2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 05/27] migration: Integrate COLO checkpoint process into loadvm zhanghailiang
@ 2015-02-12  3:16 ` zhanghailiang
  2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 07/27] COLO: Implement colo checkpoint protocol zhanghailiang
                   ` (23 subsequent siblings)
  29 siblings, 0 replies; 65+ messages in thread
From: zhanghailiang @ 2015-02-12  3:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: zhanghailiang, yunhong.jiang, eddie.dong, dgilbert,
	peter.huangpeng, stefanha, pbonzini, Yang Hongyang

Commit 8118f09 add vm description to the end of migration
stream, but in COLO mode, we use the migration channel to send
control code, so the additional info been send will cause slave
receive unexpect control msg, so just do not send vm description
when migrate under COLO mode.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 savevm.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/savevm.c b/savevm.c
index 8040766..7d79a4b 100644
--- a/savevm.c
+++ b/savevm.c
@@ -782,9 +782,11 @@ void qemu_savevm_state_complete(QEMUFile *f)
     qjson_finish(vmdesc);
     vmdesc_len = strlen(qjson_get_str(vmdesc));
 
-    qemu_put_byte(f, QEMU_VM_VMDESCRIPTION);
-    qemu_put_be32(f, vmdesc_len);
-    qemu_put_buffer(f, (uint8_t *)qjson_get_str(vmdesc), vmdesc_len);
+    if (!migrate_enable_colo()) {
+        qemu_put_byte(f, QEMU_VM_VMDESCRIPTION);
+        qemu_put_be32(f, vmdesc_len);
+        qemu_put_buffer(f, (uint8_t *)qjson_get_str(vmdesc), vmdesc_len);
+    }
     object_unref(OBJECT(vmdesc));
 
     qemu_fflush(f);
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH RFC v3 07/27] COLO: Implement colo checkpoint protocol
  2015-02-12  3:16 [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (5 preceding siblings ...)
  2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 06/27] migration: Don't send vm description in COLO mode zhanghailiang
@ 2015-02-12  3:16 ` zhanghailiang
  2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 08/27] COLO: Add a new RunState RUN_STATE_COLO zhanghailiang
                   ` (22 subsequent siblings)
  29 siblings, 0 replies; 65+ messages in thread
From: zhanghailiang @ 2015-02-12  3:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: zhanghailiang, Li Zhijian, yunhong.jiang, eddie.dong, dgilbert,
	peter.huangpeng, Gonglei, stefanha, pbonzini, Yang Hongyang,
	Lai Jiangshan

We need communications protocol of user-defined to control the checkpoint
process.

The new checkpoint request is started by Primary VM, and the interactive process
like below:
Checkpoint synchronizing points,

                  Primary                 Secondary
  NEW             @
                                          Suspend
  SUSPENDED                               @
                  Suspend&Save state
  SEND            @
                  Send state              Receive state
  RECEIVED                                @
                  Flush network           Load state
  LOADED                                  @
                  Resume                  Resume

                  Start Comparing
NOTE:
 1) '@' who sends the message
 2) Every sync-point is synchronized by two sides with only
    one handshake(single direction) for low-latency.
    If more strict synchronization is required, a opposite direction
    sync-point should be added.
 3) Since sync-points are single direction, the remote side may
    go forward a lot when this side just receives the sync-point.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 migration/colo.c | 231 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 228 insertions(+), 3 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index b54bb52..bb7a1b1 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -23,22 +23,174 @@ do { fprintf(stdout, "colo: " fmt , ## __VA_ARGS__); } while (0)
 #define DPRINTF(fmt, ...) do {} while (0)
 #endif
 
+enum {
+    COLO_READY = 0x46,
+
+    /*
+    * Checkpoint synchronizing points.
+    *
+    *                  Primary                 Secondary
+    *  NEW             @
+    *                                          Suspend
+    *  SUSPENDED                               @
+    *                  Suspend&Save state
+    *  SEND            @
+    *                  Send state              Receive state
+    *  RECEIVED                                @
+    *                  Flush network           Load state
+    *  LOADED                                  @
+    *                  Resume                  Resume
+    *
+    *                  Start Comparing
+    * NOTE:
+    * 1) '@' who sends the message
+    * 2) Every sync-point is synchronized by two sides with only
+    *    one handshake(single direction) for low-latency.
+    *    If more strict synchronization is required, a opposite direction
+    *    sync-point should be added.
+    * 3) Since sync-points are single direction, the remote side may
+    *    go forward a lot when this side just receives the sync-point.
+    */
+    COLO_CHECKPOINT_NEW,
+    COLO_CHECKPOINT_SUSPENDED,
+    COLO_CHECKPOINT_SEND,
+    COLO_CHECKPOINT_RECEIVED,
+    COLO_CHECKPOINT_LOADED,
+};
+
 static QEMUBH *colo_bh;
 static Coroutine *colo;
+
+/* colo checkpoint control helper */
+static int colo_ctl_put(QEMUFile *f, uint64_t request)
+{
+    int ret = 0;
+
+    qemu_put_be64(f, request);
+    qemu_fflush(f);
+
+    ret = qemu_file_get_error(f);
+
+    return ret;
+}
+
+static int colo_ctl_get_value(QEMUFile *f, uint64_t *value)
+{
+    int ret = 0;
+    uint64_t temp;
+
+    temp = qemu_get_be64(f);
+
+    ret = qemu_file_get_error(f);
+    if (ret < 0) {
+        return -1;
+    }
+
+    *value = temp;
+    return 0;
+}
+
+static int colo_ctl_get(QEMUFile *f, uint64_t require)
+{
+    int ret;
+    uint64_t value;
+
+    ret = colo_ctl_get_value(f, &value);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (value != require) {
+        error_report("unexpected state! expected: %"PRIu64
+                     ", received: %"PRIu64, require, value);
+        exit(1);
+    }
+
+    return ret;
+}
+
+static int do_colo_transaction(MigrationState *s, QEMUFile *control)
+{
+    int ret;
+
+    ret = colo_ctl_put(s->file, COLO_CHECKPOINT_NEW);
+    if (ret < 0) {
+        goto out;
+    }
+
+    ret = colo_ctl_get(control, COLO_CHECKPOINT_SUSPENDED);
+    if (ret < 0) {
+        goto out;
+    }
+
+    /* TODO: suspend and save vm state to colo buffer */
+
+    ret = colo_ctl_put(s->file, COLO_CHECKPOINT_SEND);
+    if (ret < 0) {
+        goto out;
+    }
+
+    /* TODO: send vmstate to slave */
+
+    ret = colo_ctl_get(control, COLO_CHECKPOINT_RECEIVED);
+    if (ret < 0) {
+        goto out;
+    }
+    DPRINTF("got COLO_CHECKPOINT_RECEIVED\n");
+    ret = colo_ctl_get(control, COLO_CHECKPOINT_LOADED);
+    if (ret < 0) {
+        goto out;
+    }
+    DPRINTF("got COLO_CHECKPOINT_LOADED\n");
+
+    /* TODO: resume master */
+
+out:
+    return ret;
+}
+
 static void *colo_thread(void *opaque)
 {
     MigrationState *s = opaque;
+    QEMUFile *colo_control = NULL;
+    int ret;
+
+    colo_control = qemu_fopen_socket(qemu_get_fd(s->file), "rb");
+    if (!colo_control) {
+        error_report("Open colo_control failed!");
+        goto out;
+    }
+
+    /*
+     * Wait for slave finish loading vm states and enter COLO
+     * restore.
+     */
+    ret = colo_ctl_get(colo_control, COLO_READY);
+    if (ret < 0) {
+        goto out;
+    }
+    DPRINTF("get COLO_READY\n");
 
     qemu_mutex_lock_iothread();
     vm_start();
     qemu_mutex_unlock_iothread();
     DPRINTF("vm resume to run\n");
 
+    while (s->state == MIG_STATE_COLO) {
+        /* start a colo checkpoint */
+        if (do_colo_transaction(s, colo_control)) {
+            goto out;
+        }
+    }
 
-    /*TODO: COLO checkpoint savevm loop*/
-
+out:
     migrate_set_state(s, MIG_STATE_COLO, MIG_STATE_COMPLETED);
 
+
+    if (colo_control) {
+        qemu_fclose(colo_control);
+    }
+
     qemu_mutex_lock_iothread();
     qemu_bh_schedule(s->cleanup_bh);
     qemu_mutex_unlock_iothread();
@@ -71,14 +223,87 @@ void colo_init_checkpointer(MigrationState *s)
     qemu_bh_schedule(colo_bh);
 }
 
+/*
+ * return:
+ * 0: start a checkpoint
+ * -1: some error happened, exit colo restore
+ */
+static int slave_wait_new_checkpoint(QEMUFile *f)
+{
+    int ret;
+    uint64_t cmd;
+
+    ret = colo_ctl_get_value(f, &cmd);
+    if (ret < 0) {
+        return -1;
+    }
+
+    switch (cmd) {
+    case COLO_CHECKPOINT_NEW:
+        return 0;
+    default:
+        return -1;
+    }
+}
+
 void *colo_process_incoming_checkpoints(void *opaque)
 {
+    struct colo_incoming *colo_in = opaque;
+    QEMUFile *f = colo_in->file;
+    int fd = qemu_get_fd(f);
+    QEMUFile *ctl = NULL;
+    int ret;
     colo = qemu_coroutine_self();
     assert(colo != NULL);
 
-    /* TODO: COLO checkpoint restore loop */
+    ctl = qemu_fopen_socket(fd, "wb");
+    if (!ctl) {
+        error_report("Can't open incoming channel!");
+        goto out;
+    }
+    ret = colo_ctl_put(ctl, COLO_READY);
+    if (ret < 0) {
+        goto out;
+    }
+    /* TODO: in COLO mode, slave is runing, so start the vm */
+    while (true) {
+        if (slave_wait_new_checkpoint(f)) {
+            break;
+        }
 
+        /* TODO: suspend guest */
+        ret = colo_ctl_put(ctl, COLO_CHECKPOINT_SUSPENDED);
+        if (ret < 0) {
+            goto out;
+        }
+
+        ret = colo_ctl_get(f, COLO_CHECKPOINT_SEND);
+        if (ret < 0) {
+            goto out;
+        }
+        DPRINTF("Got COLO_CHECKPOINT_SEND\n");
+
+        /* TODO: read migration data into colo buffer */
+
+        ret = colo_ctl_put(ctl, COLO_CHECKPOINT_RECEIVED);
+        if (ret < 0) {
+            goto out;
+        }
+        DPRINTF("Recived vm state\n");
+
+        /* TODO: load vm state */
+
+        ret = colo_ctl_put(ctl, COLO_CHECKPOINT_LOADED);
+        if (ret < 0) {
+            goto out;
+        }
+}
+
+out:
     colo = NULL;
+    if (ctl) {
+        qemu_fclose(ctl);
+    }
     loadvm_exit_colo();
 
     return NULL;
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH RFC v3 08/27] COLO: Add a new RunState RUN_STATE_COLO
  2015-02-12  3:16 [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (6 preceding siblings ...)
  2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 07/27] COLO: Implement colo checkpoint protocol zhanghailiang
@ 2015-02-12  3:16 ` zhanghailiang
  2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 09/27] QEMUSizedBuffer: Introduce two help functions for qsb zhanghailiang
                   ` (21 subsequent siblings)
  29 siblings, 0 replies; 65+ messages in thread
From: zhanghailiang @ 2015-02-12  3:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: zhanghailiang, yunhong.jiang, eddie.dong, dgilbert,
	peter.huangpeng, Gonglei, stefanha, pbonzini, Lai Jiangshan

Guest will enter this state when paused to save/restore VM state
under colo checkpoint.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 qapi-schema.json | 5 ++++-
 vl.c             | 8 ++++++++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/qapi-schema.json b/qapi-schema.json
index 8c59e50..0e7e21e 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -148,12 +148,15 @@
 # @watchdog: the watchdog action is configured to pause and has been triggered
 #
 # @guest-panicked: guest has been panicked as a result of guest OS panic
+#
+# @colo: guest is paused to save/restore VM state under colo checkpoint (since
+# 2.3)
 ##
 { 'enum': 'RunState',
   'data': [ 'debug', 'inmigrate', 'internal-error', 'io-error', 'paused',
             'postmigrate', 'prelaunch', 'finish-migrate', 'restore-vm',
             'running', 'save-vm', 'shutdown', 'suspended', 'watchdog',
-            'guest-panicked' ] }
+            'guest-panicked', 'colo' ] }
 
 ##
 # @StatusInfo:
diff --git a/vl.c b/vl.c
index 40badc4..aed26c1 100644
--- a/vl.c
+++ b/vl.c
@@ -552,6 +552,7 @@ static const RunStateTransition runstate_transitions_def[] = {
 
     { RUN_STATE_INMIGRATE, RUN_STATE_RUNNING },
     { RUN_STATE_INMIGRATE, RUN_STATE_PAUSED },
+    { RUN_STATE_INMIGRATE, RUN_STATE_COLO },
 
     { RUN_STATE_INTERNAL_ERROR, RUN_STATE_PAUSED },
     { RUN_STATE_INTERNAL_ERROR, RUN_STATE_FINISH_MIGRATE },
@@ -561,6 +562,7 @@ static const RunStateTransition runstate_transitions_def[] = {
 
     { RUN_STATE_PAUSED, RUN_STATE_RUNNING },
     { RUN_STATE_PAUSED, RUN_STATE_FINISH_MIGRATE },
+    { RUN_STATE_PAUSED, RUN_STATE_COLO},
 
     { RUN_STATE_POSTMIGRATE, RUN_STATE_RUNNING },
     { RUN_STATE_POSTMIGRATE, RUN_STATE_FINISH_MIGRATE },
@@ -571,9 +573,12 @@ static const RunStateTransition runstate_transitions_def[] = {
 
     { RUN_STATE_FINISH_MIGRATE, RUN_STATE_RUNNING },
     { RUN_STATE_FINISH_MIGRATE, RUN_STATE_POSTMIGRATE },
+    { RUN_STATE_FINISH_MIGRATE, RUN_STATE_COLO},
 
     { RUN_STATE_RESTORE_VM, RUN_STATE_RUNNING },
 
+    { RUN_STATE_COLO, RUN_STATE_RUNNING },
+
     { RUN_STATE_RUNNING, RUN_STATE_DEBUG },
     { RUN_STATE_RUNNING, RUN_STATE_INTERNAL_ERROR },
     { RUN_STATE_RUNNING, RUN_STATE_IO_ERROR },
@@ -584,6 +589,7 @@ static const RunStateTransition runstate_transitions_def[] = {
     { RUN_STATE_RUNNING, RUN_STATE_SHUTDOWN },
     { RUN_STATE_RUNNING, RUN_STATE_WATCHDOG },
     { RUN_STATE_RUNNING, RUN_STATE_GUEST_PANICKED },
+    { RUN_STATE_RUNNING, RUN_STATE_COLO},
 
     { RUN_STATE_SAVE_VM, RUN_STATE_RUNNING },
 
@@ -594,9 +600,11 @@ static const RunStateTransition runstate_transitions_def[] = {
     { RUN_STATE_RUNNING, RUN_STATE_SUSPENDED },
     { RUN_STATE_SUSPENDED, RUN_STATE_RUNNING },
     { RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE },
+    { RUN_STATE_SUSPENDED, RUN_STATE_COLO},
 
     { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
     { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
+    { RUN_STATE_WATCHDOG, RUN_STATE_COLO},
 
     { RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING },
     { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE },
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH RFC v3 09/27] QEMUSizedBuffer: Introduce two help functions for qsb
  2015-02-12  3:16 [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (7 preceding siblings ...)
  2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 08/27] COLO: Add a new RunState RUN_STATE_COLO zhanghailiang
@ 2015-02-12  3:16 ` zhanghailiang
  2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 10/27] COLO: Save VM state to slave when do checkpoint zhanghailiang
                   ` (20 subsequent siblings)
  29 siblings, 0 replies; 65+ messages in thread
From: zhanghailiang @ 2015-02-12  3:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: zhanghailiang, yunhong.jiang, eddie.dong, dgilbert,
	peter.huangpeng, stefanha, pbonzini, Yang Hongyang

Introduce two new QEMUSizedBuffer APIs which will be used by COLO to buffer
VM state:
One is qsb_put_buffer(), which put the content of a given QEMUSizedBuffer
into QEMUFile, this is used to send buffered VM state to secondary.
Another is qsb_fill_buffer(), read 'size' bytes of data from the file into
qsb, this is used to get VM state from socket into a buffer.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 include/migration/qemu-file.h |  3 ++-
 migration/qemu-file-buf.c     | 57 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index a923cec..07039e2 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -141,7 +141,8 @@ ssize_t qsb_get_buffer(const QEMUSizedBuffer *, off_t start, size_t count,
                        uint8_t *buf);
 ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t *buf,
                      off_t pos, size_t count);
-
+void qsb_put_buffer(QEMUFile *f, QEMUSizedBuffer *qsb, int size);
+int qsb_fill_buffer(QEMUSizedBuffer *qsb, QEMUFile *f, int size);
 
 /*
  * For use on files opened with qemu_bufopen
diff --git a/migration/qemu-file-buf.c b/migration/qemu-file-buf.c
index e97e0bd..78170bc 100644
--- a/migration/qemu-file-buf.c
+++ b/migration/qemu-file-buf.c
@@ -392,6 +392,63 @@ QEMUSizedBuffer *qsb_clone(const QEMUSizedBuffer *qsb)
     return out;
 }
 
+/**
+ * Put the content of a given QEMUSizedBuffer into QEMUFile.
+ *
+ * @f: A QEMUFile
+ * @qsb: A QEMUSizedBuffer
+ * @size: size of content to write
+ */
+void qsb_put_buffer(QEMUFile *f, QEMUSizedBuffer *qsb, int size)
+{
+    int i, l;
+
+    for (i = 0; i < qsb->n_iov && size > 0; i++) {
+        l = MIN(qsb->iov[i].iov_len, size);
+        qemu_put_buffer(f, qsb->iov[i].iov_base, l);
+        size -= l;
+    }
+}
+
+/*
+ * Read 'size' bytes of data from the file into qsb.
+ * always fill from pos 0 and used after qsb_create().
+ *
+ * It will return size bytes unless there was an error, in which case it will
+ * return as many as it managed to read (assuming blocking fd's which
+ * all current QEMUFile are)
+ */
+int qsb_fill_buffer(QEMUSizedBuffer *qsb, QEMUFile *f, int size)
+{
+    ssize_t rc = qsb_grow(qsb, size);
+    int pending = size, i;
+    qsb->used = 0;
+    uint8_t *buf = NULL;
+
+    if (rc < 0) {
+        return rc;
+    }
+
+    for (i = 0; i < qsb->n_iov && pending > 0; i++) {
+        int doneone = 0;
+        /* read until iov full */
+        while (doneone < qsb->iov[i].iov_len && pending > 0) {
+            int readone = 0;
+            buf = qsb->iov[i].iov_base;
+            readone = qemu_get_buffer(f, buf,
+                                MIN(qsb->iov[i].iov_len - doneone, pending));
+            if (readone == 0) {
+                return qsb->used;
+            }
+            buf += readone;
+            doneone += readone;
+            pending -= readone;
+            qsb->used += readone;
+        }
+    }
+    return qsb->used;
+}
+
 typedef struct QEMUBuffer {
     QEMUSizedBuffer *qsb;
     QEMUFile *file;
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH RFC v3 10/27] COLO: Save VM state to slave when do checkpoint
  2015-02-12  3:16 [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (8 preceding siblings ...)
  2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 09/27] QEMUSizedBuffer: Introduce two help functions for qsb zhanghailiang
@ 2015-02-12  3:16 ` zhanghailiang
  2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 11/27] COLO RAM: Load PVM's dirty page into SVM's RAM cache temporarily zhanghailiang
                   ` (19 subsequent siblings)
  29 siblings, 0 replies; 65+ messages in thread
From: zhanghailiang @ 2015-02-12  3:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: zhanghailiang, Li Zhijian, yunhong.jiang, eddie.dong, dgilbert,
	peter.huangpeng, Gonglei, stefanha, pbonzini, Yang Hongyang,
	Lai Jiangshan

We should save PVM's RAM/device to slave when needed.

For VM state, we  will cache them in slave, we use QEMUSizedBuffer
to store the data, we need know the data size of VM state, so in master,
we use qsb to store VM state temporarily, and then migrate the data to
slave.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 arch_init.c                        | 33 ++++++++++++++++--
 include/migration/migration-colo.h |  2 ++
 migration/colo.c                   | 68 +++++++++++++++++++++++++++++++++++---
 savevm.c                           |  2 +-
 stubs/migration-colo.c             |  5 +++
 5 files changed, 102 insertions(+), 8 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 89c8fa4..a07ca76 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -52,6 +52,7 @@
 #include "exec/ram_addr.h"
 #include "hw/acpi/acpi.h"
 #include "qemu/host-utils.h"
+#include "migration/migration-colo.h"
 
 #ifdef DEBUG_ARCH_INIT
 #define DPRINTF(fmt, ...) \
@@ -767,11 +768,17 @@ static void ram_migration_cancel(void *opaque)
 
 static void reset_ram_globals(void)
 {
-    last_seen_block = NULL;
     last_sent_block = NULL;
     last_offset = 0;
     last_version = ram_list.version;
-    ram_bulk_stage = true;
+    if (migrate_in_colo_state()) {
+        ram_bulk_stage = false;
+    } else {
+        ram_bulk_stage = true;
+    }
+    if (ram_bulk_stage) {
+        last_seen_block = NULL;
+    }
 }
 
 #define MAX_WAIT 50 /* ms, half buffered_file limit */
@@ -781,6 +788,18 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     RAMBlock *block;
     int64_t ram_bitmap_pages; /* Size of bitmap in pages, including gaps */
 
+    /*
+     * migration has already setup the bitmap, reuse it.
+     */
+    if (migrate_in_colo_state()) {
+        qemu_mutex_lock_ramlist();
+        bytes_transferred = 0;
+        migration_dirty_pages = 0;
+        reset_ram_globals();
+        DPRINTF("ram_save_setup for colo\n");
+        goto out_setup;
+    }
+
     mig_throttle_on = false;
     dirty_rate_high_cnt = 0;
     bitmap_sync_count = 0;
@@ -841,6 +860,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     migration_bitmap_sync();
     qemu_mutex_unlock_iothread();
 
+out_setup:
     qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
 
     QTAILQ_FOREACH(block, &ram_list.blocks, next) {
@@ -950,7 +970,14 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
     }
 
     ram_control_after_iterate(f, RAM_CONTROL_FINISH);
-    migration_end();
+
+    /*
+     * Since we need to reuse dirty bitmap in colo,
+     * don't cleanup the bitmap.
+     */
+    if (!migrate_enable_colo() || migration_has_failed(migrate_get_current())) {
+        migration_end();
+    }
 
     qemu_mutex_unlock_ramlist();
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h
index 9f1443b..0e281b9 100644
--- a/include/migration/migration-colo.h
+++ b/include/migration/migration-colo.h
@@ -26,6 +26,8 @@ struct colo_incoming {
 };
 
 void colo_init_checkpointer(MigrationState *s);
+bool migrate_in_colo_state(void);
+
 /* loadvm */
 extern Coroutine *migration_incoming_co;
 bool loadvm_enable_colo(void);
diff --git a/migration/colo.c b/migration/colo.c
index bb7a1b1..8de0303 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -60,6 +60,9 @@ enum {
 
 static QEMUBH *colo_bh;
 static Coroutine *colo;
+/* colo buffer */
+#define COLO_BUFFER_BASE_SIZE (1000*1000*4ULL)
+QEMUSizedBuffer *colo_buffer;
 
 /* colo checkpoint control helper */
 static int colo_ctl_put(QEMUFile *f, uint64_t request)
@@ -109,9 +112,17 @@ static int colo_ctl_get(QEMUFile *f, uint64_t require)
     return ret;
 }
 
+bool migrate_in_colo_state(void)
+{
+    MigrationState *s = migrate_get_current();
+    return (s->state == MIG_STATE_COLO);
+}
+
 static int do_colo_transaction(MigrationState *s, QEMUFile *control)
 {
     int ret;
+    size_t size;
+    QEMUFile *trans = NULL;
 
     ret = colo_ctl_put(s->file, COLO_CHECKPOINT_NEW);
     if (ret < 0) {
@@ -122,16 +133,47 @@ static int do_colo_transaction(MigrationState *s, QEMUFile *control)
     if (ret < 0) {
         goto out;
     }
+    /* Reset colo buffer and open it for write */
+    qsb_set_length(colo_buffer, 0);
+    trans = qemu_bufopen("w", colo_buffer);
+    if (!trans) {
+        error_report("Open colo buffer for write failed");
+        goto out;
+    }
+
+    /* suspend and save vm state to colo buffer */
+    qemu_mutex_lock_iothread();
+    vm_stop_force_state(RUN_STATE_COLO);
+    qemu_mutex_unlock_iothread();
+    DPRINTF("vm is stoped\n");
+
+    /* Disable block migration */
+    s->params.blk = 0;
+    s->params.shared = 0;
+    qemu_mutex_lock_iothread();
+    qemu_savevm_state_begin(trans, &s->params);
+    qemu_savevm_state_complete(trans);
+    qemu_mutex_unlock_iothread();
 
-    /* TODO: suspend and save vm state to colo buffer */
+    qemu_fflush(trans);
 
     ret = colo_ctl_put(s->file, COLO_CHECKPOINT_SEND);
     if (ret < 0) {
         goto out;
     }
+    /* we send the total size of the vmstate first */
+    size = qsb_get_length(colo_buffer);
+    ret = colo_ctl_put(s->file, size);
+    if (ret < 0) {
+        goto out;
+    }
 
-    /* TODO: send vmstate to slave */
-
+    qsb_put_buffer(s->file, colo_buffer, size);
+    qemu_fflush(s->file);
+    ret = qemu_file_get_error(s->file);
+    if (ret < 0) {
+        goto out;
+    }
     ret = colo_ctl_get(control, COLO_CHECKPOINT_RECEIVED);
     if (ret < 0) {
         goto out;
@@ -143,9 +185,18 @@ static int do_colo_transaction(MigrationState *s, QEMUFile *control)
     }
     DPRINTF("got COLO_CHECKPOINT_LOADED\n");
 
-    /* TODO: resume master */
+    ret = 0;
+    /* resume master */
+    qemu_mutex_lock_iothread();
+    vm_start();
+    qemu_mutex_unlock_iothread();
+    DPRINTF("vm resume to run again\n");
 
 out:
+    if (trans) {
+        qemu_fclose(trans);
+    }
+
     return ret;
 }
 
@@ -171,6 +222,12 @@ static void *colo_thread(void *opaque)
     }
     DPRINTF("get COLO_READY\n");
 
+    colo_buffer = qsb_create(NULL, COLO_BUFFER_BASE_SIZE);
+    if (colo_buffer == NULL) {
+        error_report("Failed to allocate colo buffer!");
+        goto out;
+    }
+
     qemu_mutex_lock_iothread();
     vm_start();
     qemu_mutex_unlock_iothread();
@@ -186,6 +243,9 @@ static void *colo_thread(void *opaque)
 out:
     migrate_set_state(s, MIG_STATE_COLO, MIG_STATE_COMPLETED);
 
+    if (colo_buffer) {
+        qsb_free(colo_buffer);
+    }
 
     if (colo_control) {
         qemu_fclose(colo_control);
diff --git a/savevm.c b/savevm.c
index 7d79a4b..4c8540a 100644
--- a/savevm.c
+++ b/savevm.c
@@ -42,7 +42,7 @@
 #include "qemu/iov.h"
 #include "block/snapshot.h"
 #include "block/qapi.h"
-
+#include "migration/migration-colo.h"
 
 #ifndef ETH_P_RARP
 #define ETH_P_RARP 0x8035
diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
index 7a3dbc5..274dfcf 100644
--- a/stubs/migration-colo.c
+++ b/stubs/migration-colo.c
@@ -20,3 +20,8 @@ void *colo_process_incoming_checkpoints(void *opaque)
 {
     return NULL;
 }
+
+bool migrate_in_colo_state(void)
+{
+    return false;
+}
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH RFC v3 11/27] COLO RAM: Load PVM's dirty page into SVM's RAM cache temporarily
  2015-02-12  3:16 [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (9 preceding siblings ...)
  2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 10/27] COLO: Save VM state to slave when do checkpoint zhanghailiang
@ 2015-02-12  3:16 ` zhanghailiang
  2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 12/27] COLO VMstate: Load VM state into qsb before restore it zhanghailiang
                   ` (18 subsequent siblings)
  29 siblings, 0 replies; 65+ messages in thread
From: zhanghailiang @ 2015-02-12  3:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: zhanghailiang, Li Zhijian, yunhong.jiang, eddie.dong, dgilbert,
	peter.huangpeng, Gonglei, stefanha, pbonzini, Yang Hongyang,
	Lai Jiangshan

The ram cache is initially the same as SVM/PVM's memory.

At checkpoint, we cache the dirty RAM of PVM into RAM cache in the slave
(so that RAM cache always the same as PVM's memory at every
checkpoint), we will flush cached RAM to SVM after we receive
all PVM's vmstate (RAM/device).

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 arch_init.c                        | 72 ++++++++++++++++++++++++++++++++++++--
 include/exec/cpu-all.h             |  1 +
 include/migration/migration-colo.h |  3 ++
 migration/colo.c                   | 27 +++++++++++---
 4 files changed, 96 insertions(+), 7 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index a07ca76..4a1d825 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -330,6 +330,7 @@ static RAMBlock *last_sent_block;
 static ram_addr_t last_offset;
 static unsigned long *migration_bitmap;
 static uint64_t migration_dirty_pages;
+static bool ram_cache_enable;
 static uint32_t last_version;
 static bool ram_bulk_stage;
 
@@ -1035,6 +1036,7 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
     return 0;
 }
 
+static void *memory_region_get_ram_cache_ptr(MemoryRegion *mr, RAMBlock *block);
 static inline void *host_from_stream_offset(QEMUFile *f,
                                             ram_addr_t offset,
                                             int flags)
@@ -1049,7 +1051,17 @@ static inline void *host_from_stream_offset(QEMUFile *f,
             return NULL;
         }
 
-        return memory_region_get_ram_ptr(block->mr) + offset;
+        if (ram_cache_enable) {
+            /*
+            * During colo checkpoint, we need bitmap of these migrated pages.
+            * It help us to decide which pages in ram cache should be flushed
+            * into VM's RAM later.
+            */
+            migration_bitmap_set_dirty(block->mr->ram_addr + offset);
+            return memory_region_get_ram_cache_ptr(block->mr, block) + offset;
+        } else {
+            return memory_region_get_ram_ptr(block->mr) + offset;
+        }
     }
 
     len = qemu_get_byte(f);
@@ -1058,8 +1070,14 @@ static inline void *host_from_stream_offset(QEMUFile *f,
 
     QTAILQ_FOREACH(block, &ram_list.blocks, next) {
         if (!strncmp(id, block->idstr, sizeof(id)) &&
-            block->max_length > offset) {
-            return memory_region_get_ram_ptr(block->mr) + offset;
+            block->used_length > offset) {
+            if (ram_cache_enable) {
+                migration_bitmap_set_dirty(block->mr->ram_addr + offset);
+                return memory_region_get_ram_cache_ptr(block->mr, block)
+                       + offset;
+            } else {
+                return memory_region_get_ram_ptr(block->mr) + offset;
+            }
         }
     }
 
@@ -1195,6 +1213,54 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
     return ret;
 }
 
+/*
+ * colo cache: this is for secondary VM, we cache the whole
+ * memory of the secondary VM, it will be called after first migration.
+ */
+void create_and_init_ram_cache(void)
+{
+    RAMBlock *block;
+
+    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+        block->host_cache = g_malloc(block->used_length);
+        memcpy(block->host_cache, block->host, block->used_length);
+    }
+
+    ram_cache_enable = true;
+}
+
+void release_ram_cache(void)
+{
+    RAMBlock *block;
+
+    ram_cache_enable = false;
+    if (migration_bitmap) {
+        memory_global_dirty_log_stop();
+        g_free(migration_bitmap);
+        migration_bitmap = NULL;
+    }
+
+    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+        g_free(block->host_cache);
+    }
+}
+
+static void *memory_region_get_ram_cache_ptr(MemoryRegion *mr, RAMBlock *block)
+{
+   if (mr->alias) {
+        return memory_region_get_ram_cache_ptr(mr->alias, block) +
+               mr->alias_offset;
+    }
+
+    assert(mr->terminates);
+
+    ram_addr_t addr = mr->ram_addr & TARGET_PAGE_MASK;
+
+    assert(addr - block->offset < block->used_length);
+
+    return block->host_cache + (addr - block->offset);
+}
+
 static SaveVMHandlers savevm_ram_handlers = {
     .save_live_setup = ram_save_setup,
     .save_live_iterate = ram_save_iterate,
diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index 2c48286..5c59c3c 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -270,6 +270,7 @@ typedef struct RAMBlock RAMBlock;
 struct RAMBlock {
     struct MemoryRegion *mr;
     uint8_t *host;
+    uint8_t *host_cache; /* For colo, VM's ram cache */
     ram_addr_t offset;
     ram_addr_t used_length;
     ram_addr_t max_length;
diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h
index 0e281b9..7d43aed 100644
--- a/include/migration/migration-colo.h
+++ b/include/migration/migration-colo.h
@@ -34,4 +34,7 @@ bool loadvm_enable_colo(void);
 void loadvm_exit_colo(void);
 void *colo_process_incoming_checkpoints(void *opaque);
 bool loadvm_in_colo_state(void);
+/* ram cache */
+void create_and_init_ram_cache(void);
+void release_ram_cache(void);
 #endif
diff --git a/migration/colo.c b/migration/colo.c
index 8de0303..cfde0f5 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -321,17 +321,29 @@ void *colo_process_incoming_checkpoints(void *opaque)
         error_report("Can't open incoming channel!");
         goto out;
     }
+
+    create_and_init_ram_cache();
+
     ret = colo_ctl_put(ctl, COLO_READY);
     if (ret < 0) {
         goto out;
     }
-    /* TODO: in COLO mode, slave is runing, so start the vm */
+    qemu_mutex_lock_iothread();
+    /* in COLO mode, slave is runing, so start the vm */
+    vm_start();
+    qemu_mutex_unlock_iothread();
+    DPRINTF("vm is start\n");
     while (true) {
         if (slave_wait_new_checkpoint(f)) {
             break;
         }
 
-        /* TODO: suspend guest */
+        /* suspend guest */
+        qemu_mutex_lock_iothread();
+        vm_stop_force_state(RUN_STATE_COLO);
+        qemu_mutex_unlock_iothread();
+        DPRINTF("suspend vm for checkpoint\n");
+
         ret = colo_ctl_put(ctl, COLO_CHECKPOINT_SUSPENDED);
         if (ret < 0) {
             goto out;
@@ -343,7 +355,7 @@ void *colo_process_incoming_checkpoints(void *opaque)
         }
         DPRINTF("Got COLO_CHECKPOINT_SEND\n");
 
-        /* TODO: read migration data into colo buffer */
+        /*TODO Load VM state */
 
         ret = colo_ctl_put(ctl, COLO_CHECKPOINT_RECEIVED);
         if (ret < 0) {
@@ -351,16 +363,23 @@ void *colo_process_incoming_checkpoints(void *opaque)
         }
         DPRINTF("Recived vm state\n");
 
-        /* TODO: load vm state */
+        /* TODO: flush vm state */
 
         ret = colo_ctl_put(ctl, COLO_CHECKPOINT_LOADED);
         if (ret < 0) {
             goto out;
         }
+
+        /* resume guest */
+        qemu_mutex_lock_iothread();
+        vm_start();
+        qemu_mutex_unlock_iothread();
+        DPRINTF("OK, vm runs again\n");
 }
 
 out:
     colo = NULL;
+    release_ram_cache();
     if (ctl) {
         qemu_fclose(ctl);
     }
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH RFC v3 12/27] COLO VMstate: Load VM state into qsb before restore it
  2015-02-12  3:16 [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (10 preceding siblings ...)
  2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 11/27] COLO RAM: Load PVM's dirty page into SVM's RAM cache temporarily zhanghailiang
@ 2015-02-12  3:16 ` zhanghailiang
  2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 13/27] COLO RAM: Flush cached RAM into SVM's memory zhanghailiang
                   ` (17 subsequent siblings)
  29 siblings, 0 replies; 65+ messages in thread
From: zhanghailiang @ 2015-02-12  3:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: zhanghailiang, yunhong.jiang, eddie.dong, dgilbert,
	peter.huangpeng, Gonglei, stefanha, pbonzini, Yang Hongyang

We should cache the device state to ensure the data is intact
before restore it.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 migration/colo.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 50 insertions(+), 3 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index cfde0f5..a0e1b7a 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -311,8 +311,10 @@ void *colo_process_incoming_checkpoints(void *opaque)
     struct colo_incoming *colo_in = opaque;
     QEMUFile *f = colo_in->file;
     int fd = qemu_get_fd(f);
-    QEMUFile *ctl = NULL;
+    QEMUFile *ctl = NULL, *fb = NULL;
     int ret;
+    uint64_t total_size;
+
     colo = qemu_coroutine_self();
     assert(colo != NULL);
 
@@ -328,6 +330,13 @@ void *colo_process_incoming_checkpoints(void *opaque)
     if (ret < 0) {
         goto out;
     }
+
+    colo_buffer = qsb_create(NULL, COLO_BUFFER_BASE_SIZE);
+    if (colo_buffer == NULL) {
+        error_report("Failed to allocate colo buffer!");
+        goto out;
+    }
+
     qemu_mutex_lock_iothread();
     /* in COLO mode, slave is runing, so start the vm */
     vm_start();
@@ -355,14 +364,39 @@ void *colo_process_incoming_checkpoints(void *opaque)
         }
         DPRINTF("Got COLO_CHECKPOINT_SEND\n");
 
-        /*TODO Load VM state */
+        /* read the VM state total size first */
+        ret = colo_ctl_get_value(f, &total_size);
+        if (ret < 0) {
+            goto out;
+        }
+        DPRINTF("vmstate total size = %ld\n", total_size);
+        /* read vm device state into colo buffer */
+        ret = qsb_fill_buffer(colo_buffer, f, total_size);
+        if (ret != total_size) {
+            error_report("can't get all migration data");
+            goto out;
+        }
 
         ret = colo_ctl_put(ctl, COLO_CHECKPOINT_RECEIVED);
         if (ret < 0) {
             goto out;
         }
         DPRINTF("Recived vm state\n");
+        /* open colo buffer for read */
+        fb = qemu_bufopen("r", colo_buffer);
+        if (!fb) {
+            error_report("can't open colo buffer for read");
+            goto out;
+        }
 
+        qemu_mutex_lock_iothread();
+        if (qemu_loadvm_state(fb) < 0) {
+            error_report("COLO: loadvm failed");
+            qemu_mutex_unlock_iothread();
+            goto out;
+        }
+        DPRINTF("Finish load all vm state to cache\n");
+        qemu_mutex_unlock_iothread();
         /* TODO: flush vm state */
 
         ret = colo_ctl_put(ctl, COLO_CHECKPOINT_LOADED);
@@ -375,14 +409,27 @@ void *colo_process_incoming_checkpoints(void *opaque)
         vm_start();
         qemu_mutex_unlock_iothread();
         DPRINTF("OK, vm runs again\n");
-}
+
+        qemu_fclose(fb);
+        fb = NULL;
+    }
 
 out:
     colo = NULL;
+
+    if (fb) {
+        qemu_fclose(fb);
+    }
+
     release_ram_cache();
     if (ctl) {
         qemu_fclose(ctl);
     }
+
+    if (colo_buffer) {
+        qsb_free(colo_buffer);
+    }
+
     loadvm_exit_colo();
 
     return NULL;
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH RFC v3 13/27] COLO RAM: Flush cached RAM into SVM's memory
  2015-02-12  3:16 [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (11 preceding siblings ...)
  2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 12/27] COLO VMstate: Load VM state into qsb before restore it zhanghailiang
@ 2015-02-12  3:17 ` zhanghailiang
  2015-03-11 19:08   ` Dr. David Alan Gilbert
  2015-03-11 20:07   ` Dr. David Alan Gilbert
  2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 14/27] COLO failover: Introduce a new command to trigger a failover zhanghailiang
                   ` (16 subsequent siblings)
  29 siblings, 2 replies; 65+ messages in thread
From: zhanghailiang @ 2015-02-12  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: zhanghailiang, Li Zhijian, yunhong.jiang, eddie.dong, dgilbert,
	peter.huangpeng, Gonglei, stefanha, pbonzini, Yang Hongyang,
	Lai Jiangshan

We only need to flush RAM that is both dirty on PVM and SVM since
last checkpoint. Besides, we must ensure flush RAM cache before load
device state.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>a
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 arch_init.c                        | 91 +++++++++++++++++++++++++++++++++++++-
 include/migration/migration-colo.h |  1 +
 migration/colo.c                   |  1 -
 3 files changed, 91 insertions(+), 2 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 4a1d825..f70de23 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -1100,6 +1100,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 {
     int flags = 0, ret = 0;
     static uint64_t seq_iter;
+    bool need_flush = false;
 
     seq_iter++;
 
@@ -1163,6 +1164,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
                 break;
             }
 
+            need_flush = true;
             ch = qemu_get_byte(f);
             ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
             break;
@@ -1174,6 +1176,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
                 break;
             }
 
+            need_flush = true;
             qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
             break;
         case RAM_SAVE_FLAG_XBZRLE:
@@ -1190,6 +1193,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
                 ret = -EINVAL;
                 break;
             }
+            need_flush = true;
             break;
         case RAM_SAVE_FLAG_EOS:
             /* normal exit */
@@ -1207,7 +1211,10 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
             ret = qemu_file_get_error(f);
         }
     }
-
+    if (!ret  && ram_cache_enable && need_flush) {
+        DPRINTF("Flush ram_cache\n");
+        colo_flush_ram_cache();
+    }
     DPRINTF("Completed load of VM with exit code %d seq iteration "
             "%" PRIu64 "\n", ret, seq_iter);
     return ret;
@@ -1220,6 +1227,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 void create_and_init_ram_cache(void)
 {
     RAMBlock *block;
+    int64_t ram_cache_pages = last_ram_offset() >> TARGET_PAGE_BITS;
 
     QTAILQ_FOREACH(block, &ram_list.blocks, next) {
         block->host_cache = g_malloc(block->used_length);
@@ -1227,6 +1235,14 @@ void create_and_init_ram_cache(void)
     }
 
     ram_cache_enable = true;
+    /*
+    * Start dirty log for slave VM, we will use this dirty bitmap together with
+    * VM's cache RAM dirty bitmap to decide which page in cache should be
+    * flushed into VM's RAM.
+    */
+    migration_bitmap = bitmap_new(ram_cache_pages);
+    migration_dirty_pages = 0;
+    memory_global_dirty_log_start();
 }
 
 void release_ram_cache(void)
@@ -1261,6 +1277,79 @@ static void *memory_region_get_ram_cache_ptr(MemoryRegion *mr, RAMBlock *block)
     return block->host_cache + (addr - block->offset);
 }
 
+static inline
+ram_addr_t host_bitmap_find_and_reset_dirty(MemoryRegion *mr,
+                                            ram_addr_t start)
+{
+    unsigned long base = mr->ram_addr >> TARGET_PAGE_BITS;
+    unsigned long nr = base + (start >> TARGET_PAGE_BITS);
+    unsigned long size = base + (int128_get64(mr->size) >> TARGET_PAGE_BITS);
+
+    unsigned long next;
+
+    next = find_next_bit(ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION],
+                         size, nr);
+    if (next < size) {
+        clear_bit(next, ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]);
+    }
+    return (next - base) << TARGET_PAGE_BITS;
+}
+
+void colo_flush_ram_cache(void)
+{
+    RAMBlock *block = NULL;
+    void *dst_host;
+    void *src_host;
+    ram_addr_t ca  = 0, ha = 0;
+    bool got_ca = 0, got_ha = 0;
+    int64_t host_dirty = 0, both_dirty = 0;
+
+    address_space_sync_dirty_bitmap(&address_space_memory);
+
+    block = QTAILQ_FIRST(&ram_list.blocks);
+    while (true) {
+        if (ca < block->used_length && ca <= ha) {
+            ca = migration_bitmap_find_and_reset_dirty(block->mr, ca);
+            if (ca < block->used_length) {
+                got_ca = 1;
+            }
+        }
+        if (ha < block->used_length && ha <= ca) {
+            ha = host_bitmap_find_and_reset_dirty(block->mr, ha);
+            if (ha < block->used_length && ha != ca) {
+                got_ha = 1;
+            }
+            host_dirty += (ha < block->used_length ? 1 : 0);
+            both_dirty += (ha < block->used_length && ha == ca ? 1 : 0);
+        }
+        if (ca >= block->used_length && ha >= block->used_length) {
+            ca = 0;
+            ha = 0;
+            block = QTAILQ_NEXT(block, next);
+            if (!block) {
+                break;
+            }
+        } else {
+            if (got_ha) {
+                got_ha = 0;
+                dst_host = memory_region_get_ram_ptr(block->mr) + ha;
+                src_host = memory_region_get_ram_cache_ptr(block->mr, block)
+                           + ha;
+                memcpy(dst_host, src_host, TARGET_PAGE_SIZE);
+            }
+            if (got_ca) {
+                got_ca = 0;
+                dst_host = memory_region_get_ram_ptr(block->mr) + ca;
+                src_host = memory_region_get_ram_cache_ptr(block->mr, block)
+                           + ca;
+                memcpy(dst_host, src_host, TARGET_PAGE_SIZE);
+            }
+        }
+    }
+
+    assert(migration_dirty_pages == 0);
+}
+
 static SaveVMHandlers savevm_ram_handlers = {
     .save_live_setup = ram_save_setup,
     .save_live_iterate = ram_save_iterate,
diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h
index 7d43aed..2084fe2 100644
--- a/include/migration/migration-colo.h
+++ b/include/migration/migration-colo.h
@@ -36,5 +36,6 @@ void *colo_process_incoming_checkpoints(void *opaque);
 bool loadvm_in_colo_state(void);
 /* ram cache */
 void create_and_init_ram_cache(void);
+void colo_flush_ram_cache(void);
 void release_ram_cache(void);
 #endif
diff --git a/migration/colo.c b/migration/colo.c
index a0e1b7a..5ff2ee8 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -397,7 +397,6 @@ void *colo_process_incoming_checkpoints(void *opaque)
         }
         DPRINTF("Finish load all vm state to cache\n");
         qemu_mutex_unlock_iothread();
-        /* TODO: flush vm state */
 
         ret = colo_ctl_put(ctl, COLO_CHECKPOINT_LOADED);
         if (ret < 0) {
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH RFC v3 14/27] COLO failover: Introduce a new command to trigger a failover
  2015-02-12  3:16 [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (12 preceding siblings ...)
  2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 13/27] COLO RAM: Flush cached RAM into SVM's memory zhanghailiang
@ 2015-02-12  3:17 ` zhanghailiang
  2015-02-16 23:47   ` Eric Blake
  2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 15/27] COLO failover: Implement COLO master/slave failover work zhanghailiang
                   ` (15 subsequent siblings)
  29 siblings, 1 reply; 65+ messages in thread
From: zhanghailiang @ 2015-02-12  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: zhanghailiang, Li Zhijian, yunhong.jiang, eddie.dong, dgilbert,
	peter.huangpeng, stefanha, pbonzini, Yang Hongyang,
	Lai Jiangshan

We leave users to use whatever heartbeat solution they want, if the heartbeat
is lost, or other errors they detect, they can use command
'colo_lost_heartbeat' to tell COLO to do failover, COLO will do operations
accordingly.

For example,
If send the command to PVM, Primary will exit COLO mode, and takeover,
if to Secondary, Secondary will do failover work and at last takeover server.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 hmp-commands.hx                        | 15 ++++++++++++++
 hmp.c                                  |  7 +++++++
 hmp.h                                  |  1 +
 include/migration/migration-colo.h     |  1 +
 include/migration/migration-failover.h | 20 ++++++++++++++++++
 migration/Makefile.objs                |  2 +-
 migration/colo-failover.c              | 38 ++++++++++++++++++++++++++++++++++
 migration/colo.c                       |  1 +
 qapi-schema.json                       |  9 ++++++++
 qmp-commands.hx                        | 19 +++++++++++++++++
 stubs/migration-colo.c                 |  8 +++++++
 11 files changed, 120 insertions(+), 1 deletion(-)
 create mode 100644 include/migration/migration-failover.h
 create mode 100644 migration/colo-failover.c

diff --git a/hmp-commands.hx b/hmp-commands.hx
index e37bc8b..b05e4da 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -985,6 +985,21 @@ Enable/Disable the usage of a capability @var{capability} for migration.
 ETEXI
 
     {
+        .name       = "colo_lost_heartbeat",
+        .args_type  = "",
+        .params     = "",
+        .help       = "Tell COLO that heartbeat is lost,\n\t\t\t"
+                      "a failover or takeover is needed.",
+        .mhandler.cmd = hmp_colo_lost_heartbeat,
+    },
+
+STEXI
+@item colo_lost_heartbeat
+@findex colo_lost_heartbeat
+Tell COLO that heartbeat is lost, a failover or takeover is needed.
+ETEXI
+
+    {
         .name       = "client_migrate_info",
         .args_type  = "protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?",
         .params     = "protocol hostname port tls-port cert-subject",
diff --git a/hmp.c b/hmp.c
index b47f331..aa99616 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1140,6 +1140,13 @@ void hmp_migrate_set_capability(Monitor *mon, const QDict *qdict)
     }
 }
 
+void hmp_colo_lost_heartbeat(Monitor *mon, const QDict *qdict)
+{
+    Error *err = NULL;
+    qmp_colo_lost_heartbeat(&err);
+    hmp_handle_error(mon, &err);
+}
+
 void hmp_set_password(Monitor *mon, const QDict *qdict)
 {
     const char *protocol  = qdict_get_str(qdict, "protocol");
diff --git a/hmp.h b/hmp.h
index 4bb5dca..a67fbf0 100644
--- a/hmp.h
+++ b/hmp.h
@@ -64,6 +64,7 @@ void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_capability(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_cache_size(Monitor *mon, const QDict *qdict);
+void hmp_colo_lost_heartbeat(Monitor *mon, const QDict *qdict);
 void hmp_set_password(Monitor *mon, const QDict *qdict);
 void hmp_expire_password(Monitor *mon, const QDict *qdict);
 void hmp_eject(Monitor *mon, const QDict *qdict);
diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h
index 2084fe2..27a515a 100644
--- a/include/migration/migration-colo.h
+++ b/include/migration/migration-colo.h
@@ -17,6 +17,7 @@
 #include "migration/migration.h"
 #include "block/coroutine.h"
 #include "qemu/thread.h"
+#include "qemu/main-loop.h"
 
 void colo_info_mig_init(void);
 
diff --git a/include/migration/migration-failover.h b/include/migration/migration-failover.h
new file mode 100644
index 0000000..5fd376a
--- /dev/null
+++ b/include/migration/migration-failover.h
@@ -0,0 +1,20 @@
+/*
+ *  COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ *  (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
+ * Copyright (C) 2015 FUJITSU LIMITED
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#ifndef MIGRATION_FAILOVER_H
+#define MIGRATION_FAILOVER_H
+
+#include "qemu-common.h"
+
+void failover_request_set(void);
+
+#endif
diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index 895583e..50d8392 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -1,6 +1,6 @@
 common-obj-y += migration.o tcp.o
 common-obj-y += colo-comm.o
-common-obj-$(CONFIG_COLO) += colo.o
+common-obj-$(CONFIG_COLO) += colo.o colo-failover.o
 common-obj-y += vmstate.o
 common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o qemu-file-stdio.o
 common-obj-y += xbzrle.o
diff --git a/migration/colo-failover.c b/migration/colo-failover.c
new file mode 100644
index 0000000..af78054
--- /dev/null
+++ b/migration/colo-failover.c
@@ -0,0 +1,38 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "migration/migration-colo.h"
+#include "migration/migration-failover.h"
+#include "qmp-commands.h"
+
+static bool failover_request;
+
+static QEMUBH *failover_bh;
+
+static void colo_failover_bh(void *opaque)
+{
+    qemu_bh_delete(failover_bh);
+    failover_bh = NULL;
+    /*TODO: Do failover work */
+}
+
+void failover_request_set(void)
+{
+    failover_request = true;
+    failover_bh = qemu_bh_new(colo_failover_bh, NULL);
+    qemu_bh_schedule(failover_bh);
+}
+
+void qmp_colo_lost_heartbeat(Error **errp)
+{
+    failover_request_set();
+}
diff --git a/migration/colo.c b/migration/colo.c
index 5ff2ee8..cd84e4d 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -13,6 +13,7 @@
 #include "sysemu/sysemu.h"
 #include "migration/migration-colo.h"
 #include "qemu/error-report.h"
+#include "migration/migration-failover.h"
 
 /* #define DEBUG_COLO */
 
diff --git a/qapi-schema.json b/qapi-schema.json
index 0e7e21e..4873561 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -543,6 +543,15 @@
 { 'command': 'query-migrate-capabilities', 'returns':   ['MigrationCapabilityStatus']}
 
 ##
+# @colo-lost-heartbeat
+#
+# Tell COLO that heartbeat is lost
+#
+# Since: 2.3
+##
+{ 'command': 'colo-lost-heartbeat' }
+
+##
 # @MouseInfo:
 #
 # Information about a mouse device.
diff --git a/qmp-commands.hx b/qmp-commands.hx
index a85d847..1b4a5ca 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -753,6 +753,25 @@ Example:
 EQMP
 
     {
+        .name       = "colo_lost_heartbeat",
+        .args_type  = "",
+        .mhandler.cmd_new = qmp_marshal_input_colo_lost_heartbeat,
+    },
+
+SQMP
+colo_lost_heartbeat
+--------------------
+
+Tell COLO that heartbeat is lost, a failover or takeover is needed.
+
+Example:
+
+-> { "execute": "colo_lost_heartbeat" }
+<- { "return": {} }
+
+EQMP
+
+    {
         .name       = "client_migrate_info",
         .args_type  = "protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?",
         .params     = "protocol hostname port tls-port cert-subject",
diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
index 274dfcf..a690b04 100644
--- a/stubs/migration-colo.c
+++ b/stubs/migration-colo.c
@@ -11,6 +11,7 @@
  */
 
 #include "migration/migration-colo.h"
+#include "qmp-commands.h"
 
 void colo_init_checkpointer(MigrationState *s)
 {
@@ -25,3 +26,10 @@ bool migrate_in_colo_state(void)
 {
     return false;
 }
+
+void qmp_colo_lost_heartbeat(Error **errp)
+{
+    error_setg(errp, "COLO is not supported, please rerun configure"
+                     " with --enable-colo option in order to support"
+                     " COLO feature");
+}
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH RFC v3 15/27] COLO failover: Implement COLO master/slave failover work
  2015-02-12  3:16 [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (13 preceding siblings ...)
  2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 14/27] COLO failover: Introduce a new command to trigger a failover zhanghailiang
@ 2015-02-12  3:17 ` zhanghailiang
  2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 16/27] COLO failover: Don't do failover during loading VM's state zhanghailiang
                   ` (14 subsequent siblings)
  29 siblings, 0 replies; 65+ messages in thread
From: zhanghailiang @ 2015-02-12  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: zhanghailiang, Li Zhijian, yunhong.jiang, eddie.dong, dgilbert,
	peter.huangpeng, stefanha, pbonzini, Lai Jiangshan

If failover is requested, after some cleanup work,
PVM or SVM will exit COLO mode, and resume to normal run.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 include/migration/migration-colo.h     |  14 ++++
 include/migration/migration-failover.h |   2 +
 migration/colo-comm.c                  |  10 +++
 migration/colo-failover.c              |  12 +++-
 migration/colo.c                       | 122 ++++++++++++++++++++++++++++++++-
 stubs/migration-colo.c                 |   5 ++
 6 files changed, 163 insertions(+), 2 deletions(-)

diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h
index 27a515a..3bdd1ae 100644
--- a/include/migration/migration-colo.h
+++ b/include/migration/migration-colo.h
@@ -21,6 +21,13 @@
 
 void colo_info_mig_init(void);
 
+/* Checkpoint control, called in migration/checkpoint thread */
+enum {
+    COLO_UNPROTECTED_MODE = 0,
+    COLO_PRIMARY_MODE,
+    COLO_SECONDARY_MODE,
+};
+
 struct colo_incoming {
     QEMUFile *file;
     QemuThread thread;
@@ -35,8 +42,15 @@ bool loadvm_enable_colo(void);
 void loadvm_exit_colo(void);
 void *colo_process_incoming_checkpoints(void *opaque);
 bool loadvm_in_colo_state(void);
+
+int get_colo_mode(void);
+
 /* ram cache */
 void create_and_init_ram_cache(void);
 void colo_flush_ram_cache(void);
 void release_ram_cache(void);
+
+/* failover */
+void colo_do_failover(MigrationState *s);
+
 #endif
diff --git a/include/migration/migration-failover.h b/include/migration/migration-failover.h
index 5fd376a..385fab3 100644
--- a/include/migration/migration-failover.h
+++ b/include/migration/migration-failover.h
@@ -16,5 +16,7 @@
 #include "qemu-common.h"
 
 void failover_request_set(void);
+void failover_request_clear(void);
+bool failover_request_is_set(void);
 
 #endif
diff --git a/migration/colo-comm.c b/migration/colo-comm.c
index 038d12f..57bc6cd 100644
--- a/migration/colo-comm.c
+++ b/migration/colo-comm.c
@@ -32,6 +32,16 @@ static void colo_info_save(QEMUFile *f, void *opaque)
 }
 
 /* restore */
+int get_colo_mode(void)
+{
+    if (migrate_in_colo_state()) {
+        return COLO_PRIMARY_MODE;
+    } else if (loadvm_in_colo_state()) {
+        return COLO_SECONDARY_MODE;
+    } else {
+        return COLO_UNPROTECTED_MODE;
+    }
+}
 static int colo_info_load(QEMUFile *f, void *opaque, int version_id)
 {
     int value = qemu_get_byte(f);
diff --git a/migration/colo-failover.c b/migration/colo-failover.c
index af78054..850b05c 100644
--- a/migration/colo-failover.c
+++ b/migration/colo-failover.c
@@ -22,7 +22,7 @@ static void colo_failover_bh(void *opaque)
 {
     qemu_bh_delete(failover_bh);
     failover_bh = NULL;
-    /*TODO: Do failover work */
+    colo_do_failover(NULL);
 }
 
 void failover_request_set(void)
@@ -32,6 +32,16 @@ void failover_request_set(void)
     qemu_bh_schedule(failover_bh);
 }
 
+void failover_request_clear(void)
+{
+    failover_request = false;
+}
+
+bool failover_request_is_set(void)
+{
+    return failover_request;
+}
+
 void qmp_colo_lost_heartbeat(Error **errp)
 {
     failover_request_set();
diff --git a/migration/colo.c b/migration/colo.c
index cd84e4d..bcde1ec 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -65,6 +65,68 @@ static Coroutine *colo;
 #define COLO_BUFFER_BASE_SIZE (1000*1000*4ULL)
 QEMUSizedBuffer *colo_buffer;
 
+static bool colo_runstate_is_stopped(void)
+{
+    return runstate_check(RUN_STATE_COLO) || !runstate_is_running();
+}
+
+/*
+ * there are two way to entry this function
+ * 1. From colo checkpoint incoming thread, in this case
+ * we should protect it by iothread lock
+ * 2. From user command, because hmp/qmp command
+ * was happened in main loop, iothread lock will cause a
+ * dead lock.
+ */
+static void slave_do_failover(void)
+{
+    DPRINTF("do_failover!\n");
+
+    colo = NULL;
+
+    if (!autostart) {
+        error_report("\"-S\" qemu option will be ignored in colo slave side");
+        /* recover runstate to normal migration finish state */
+        autostart = true;
+    }
+
+    /* On slave side, jump to incoming co */
+    if (migration_incoming_co) {
+        qemu_coroutine_enter(migration_incoming_co, NULL);
+    }
+}
+
+static void master_do_failover(void)
+{
+    MigrationState *s = migrate_get_current();
+
+    if (!colo_runstate_is_stopped()) {
+        vm_stop_force_state(RUN_STATE_COLO);
+    }
+
+    if (s->state != MIG_STATE_ERROR) {
+        migrate_set_state(s, MIG_STATE_COLO, MIG_STATE_COMPLETED);
+    }
+
+    vm_start();
+}
+
+static bool failover_completed;
+void colo_do_failover(MigrationState *s)
+{
+    /* Make sure vm stopped while failover */
+    if (!colo_runstate_is_stopped()) {
+        vm_stop_force_state(RUN_STATE_COLO);
+    }
+
+    if (get_colo_mode() == COLO_SECONDARY_MODE) {
+        slave_do_failover();
+    } else {
+        master_do_failover();
+    }
+    failover_completed = true;
+}
+
 /* colo checkpoint control helper */
 static int colo_ctl_put(QEMUFile *f, uint64_t request)
 {
@@ -142,11 +204,23 @@ static int do_colo_transaction(MigrationState *s, QEMUFile *control)
         goto out;
     }
 
+    if (failover_request_is_set()) {
+        ret = -1;
+        goto out;
+    }
     /* suspend and save vm state to colo buffer */
     qemu_mutex_lock_iothread();
     vm_stop_force_state(RUN_STATE_COLO);
     qemu_mutex_unlock_iothread();
     DPRINTF("vm is stoped\n");
+    /*
+     * failover request bh could be called after
+     * vm_stop_force_state so we check failover_request_is_set() again.
+     */
+    if (failover_request_is_set()) {
+        ret = -1;
+        goto out;
+    }
 
     /* Disable block migration */
     s->params.blk = 0;
@@ -242,7 +316,18 @@ static void *colo_thread(void *opaque)
     }
 
 out:
-    migrate_set_state(s, MIG_STATE_COLO, MIG_STATE_COMPLETED);
+    fprintf(stderr, "colo: some error happens in colo_thread\n");
+    qemu_mutex_lock_iothread();
+    if (!failover_request_is_set()) {
+        error_report("master takeover from checkpoint channel");
+        failover_request_set();
+    }
+    qemu_mutex_unlock_iothread();
+
+    while (!failover_completed) {
+        ;
+    }
+    failover_request_clear();
 
     if (colo_buffer) {
         qsb_free(colo_buffer);
@@ -284,6 +369,11 @@ void colo_init_checkpointer(MigrationState *s)
     qemu_bh_schedule(colo_bh);
 }
 
+bool loadvm_in_colo_state(void)
+{
+    return colo != NULL;
+}
+
 /*
  * return:
  * 0: start a checkpoint
@@ -347,6 +437,10 @@ void *colo_process_incoming_checkpoints(void *opaque)
         if (slave_wait_new_checkpoint(f)) {
             break;
         }
+        if (failover_request_is_set()) {
+            error_report("failover request from heartbeat channel");
+            goto out;
+        }
 
         /* suspend guest */
         qemu_mutex_lock_iothread();
@@ -415,6 +509,32 @@ void *colo_process_incoming_checkpoints(void *opaque)
     }
 
 out:
+    fprintf(stderr, "Detect some error or get a failover request\n");
+    /* determine whether we need to failover */
+    if (!failover_request_is_set()) {
+        /*
+        * TODO: Here, maybe we should raise a qmp event to the user,
+        * It can help user to know what happens, and help deciding whether to
+        * do failover.
+        */
+        usleep(2000 * 1000);
+    }
+    /* check flag again*/
+    if (!failover_request_is_set()) {
+        /*
+        * We assume that master is still alive according to heartbeat,
+        * just kill slave
+        */
+        error_report("SVM is going to exit!");
+        exit(1);
+    } else {
+        /* if we went here, means master may dead, we are doing failover */
+        while (!failover_completed) {
+            ;
+        }
+        failover_request_clear();
+    }
+
     colo = NULL;
 
     if (fb) {
diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
index a690b04..c3514c8 100644
--- a/stubs/migration-colo.c
+++ b/stubs/migration-colo.c
@@ -27,6 +27,11 @@ bool migrate_in_colo_state(void)
     return false;
 }
 
+bool loadvm_in_colo_state(void)
+{
+    return false;
+}
+
 void qmp_colo_lost_heartbeat(Error **errp)
 {
     error_setg(errp, "COLO is not supported, please rerun configure"
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH RFC v3 16/27] COLO failover: Don't do failover during loading VM's state
  2015-02-12  3:16 [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (14 preceding siblings ...)
  2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 15/27] COLO failover: Implement COLO master/slave failover work zhanghailiang
@ 2015-02-12  3:17 ` zhanghailiang
  2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 17/27] COLO: Add new command parameter 'colo_nicname' 'colo_script' for net zhanghailiang
                   ` (13 subsequent siblings)
  29 siblings, 0 replies; 65+ messages in thread
From: zhanghailiang @ 2015-02-12  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: zhanghailiang, Li Zhijian, yunhong.jiang, eddie.dong, dgilbert,
	peter.huangpeng, stefanha, pbonzini, Lai Jiangshan

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 migration/colo.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/migration/colo.c b/migration/colo.c
index bcde1ec..82459ec 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -60,6 +60,7 @@ enum {
 };
 
 static QEMUBH *colo_bh;
+static bool vmstate_loading;
 static Coroutine *colo;
 /* colo buffer */
 #define COLO_BUFFER_BASE_SIZE (1000*1000*4ULL)
@@ -80,7 +81,10 @@ static bool colo_runstate_is_stopped(void)
  */
 static void slave_do_failover(void)
 {
-    DPRINTF("do_failover!\n");
+    /* Wait for incoming thread loading vmstate */
+    while (vmstate_loading) {
+        ;
+    }
 
     colo = NULL;
 
@@ -114,6 +118,7 @@ static void master_do_failover(void)
 static bool failover_completed;
 void colo_do_failover(MigrationState *s)
 {
+    DPRINTF("do_failover!\n");
     /* Make sure vm stopped while failover */
     if (!colo_runstate_is_stopped()) {
         vm_stop_force_state(RUN_STATE_COLO);
@@ -485,12 +490,15 @@ void *colo_process_incoming_checkpoints(void *opaque)
         }
 
         qemu_mutex_lock_iothread();
+        vmstate_loading = true;
         if (qemu_loadvm_state(fb) < 0) {
             error_report("COLO: loadvm failed");
+            vmstate_loading = false;
             qemu_mutex_unlock_iothread();
             goto out;
         }
         DPRINTF("Finish load all vm state to cache\n");
+        vmstate_loading = false;
         qemu_mutex_unlock_iothread();
 
         ret = colo_ctl_put(ctl, COLO_CHECKPOINT_LOADED);
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH RFC v3 17/27] COLO: Add new command parameter 'colo_nicname' 'colo_script' for net
  2015-02-12  3:16 [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (15 preceding siblings ...)
  2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 16/27] COLO failover: Don't do failover during loading VM's state zhanghailiang
@ 2015-02-12  3:17 ` zhanghailiang
  2015-02-16 23:50   ` Eric Blake
  2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 18/27] COLO NIC: Init/remove colo nic devices when add/cleanup tap devices zhanghailiang
                   ` (12 subsequent siblings)
  29 siblings, 1 reply; 65+ messages in thread
From: zhanghailiang @ 2015-02-12  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: zhanghailiang, Li Zhijian, yunhong.jiang, eddie.dong, dgilbert,
	peter.huangpeng, Gao feng, stefanha, pbonzini

The 'colo_nicname' should be assigned with network name,
for exmple, 'eth2'. It will be parameter of 'colo_script',
'colo_script' should be assigned with an scirpt path.

We parse these parameter in tap.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 include/net/net.h |  4 ++++
 net/tap.c         | 27 ++++++++++++++++++++++++---
 qapi-schema.json  |  8 +++++++-
 qemu-options.hx   | 10 +++++++++-
 4 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/include/net/net.h b/include/net/net.h
index 008d610..6095c28 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -84,6 +84,10 @@ struct NetClientState {
     char *model;
     char *name;
     char info_str[256];
+    char colo_script[1024];
+    char colo_nicname[128];
+    char ifname[128];
+    char ifb[2][128];
     unsigned receive_disabled : 1;
     NetClientDestructor *destructor;
     unsigned int queue_index;
diff --git a/net/tap.c b/net/tap.c
index 1fe0edf..f744dcc 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -607,6 +607,7 @@ static int net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
 {
     TAPState *s;
     int vhostfd;
+    NetClientState *nc = NULL;
 
     s = net_tap_fd_init(peer, model, name, fd, vnet_hdr);
     if (!s) {
@@ -634,6 +635,17 @@ static int net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
         }
     }
 
+    nc = &(s->nc);
+    snprintf(nc->ifname, sizeof(nc->ifname), "%s", ifname);
+    if (tap->has_colo_script) {
+        snprintf(nc->colo_script, sizeof(nc->colo_script), "%s",
+                 tap->colo_script);
+    }
+    if (tap->has_colo_nicname) {
+        snprintf(nc->colo_nicname, sizeof(nc->colo_nicname), "%s",
+                 tap->colo_nicname);
+    }
+
     if (tap->has_vhost ? tap->vhost :
         vhostfdname || (tap->has_vhostforce && tap->vhostforce)) {
         VhostNetOptions options;
@@ -750,9 +762,10 @@ int net_init_tap(const NetClientOptions *opts, const char *name,
 
         if (tap->has_ifname || tap->has_script || tap->has_downscript ||
             tap->has_vnet_hdr || tap->has_helper || tap->has_queues ||
-            tap->has_vhostfd) {
+            tap->has_vhostfd || tap->has_colo_script || tap->has_colo_nicname) {
             error_report("ifname=, script=, downscript=, vnet_hdr=, "
                          "helper=, queues=, and vhostfd= "
+                         "colo_script=, and colo_nicname= "
                          "are invalid with fds=");
             return -1;
         }
@@ -791,9 +804,11 @@ int net_init_tap(const NetClientOptions *opts, const char *name,
         }
     } else if (tap->has_helper) {
         if (tap->has_ifname || tap->has_script || tap->has_downscript ||
-            tap->has_vnet_hdr || tap->has_queues || tap->has_vhostfds) {
+            tap->has_vnet_hdr || tap->has_queues || tap->has_vhostfds ||
+            tap->has_colo_script || tap->has_colo_nicname) {
             error_report("ifname=, script=, downscript=, and vnet_hdr= "
-                         "queues=, and vhostfds= are invalid with helper=");
+                         "queues=, vhostfds=, colo_script=, and "
+                         "colo_nicname= are invalid with helper=");
             return -1;
         }
 
@@ -812,6 +827,12 @@ int net_init_tap(const NetClientOptions *opts, const char *name,
             return -1;
         }
     } else {
+        if (queues > 1 && (tap->has_colo_script || tap->has_colo_nicname)) {
+            error_report("queues > 1 is invalid if colo_script or "
+                         "colo_nicname is specified");
+            return -1;
+        }
+
         if (tap->has_vhostfds) {
             error_report("vhostfds= is invalid if fds= wasn't specified");
             return -1;
diff --git a/qapi-schema.json b/qapi-schema.json
index 4873561..779acd2 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -2101,6 +2101,10 @@
 #
 # @queues: #optional number of queues to be created for multiqueue capable tap
 #
+# @colo_nicname: #optional the host physical nic for QEMU (Since 2.3)
+#
+# @colo_script: #optional the script file which used by COLO (Since 2.3)
+#
 # Since 1.2
 ##
 { 'type': 'NetdevTapOptions',
@@ -2117,7 +2121,9 @@
     '*vhostfd':    'str',
     '*vhostfds':   'str',
     '*vhostforce': 'bool',
-    '*queues':     'uint32'} }
+    '*queues':     'uint32',
+    '*colo_nicname':  'str',
+    '*colo_script':   'str'} }
 
 ##
 # @NetdevSocketOptions
diff --git a/qemu-options.hx b/qemu-options.hx
index 85ca3ad..3e9757c 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1418,7 +1418,11 @@ DEF("net", HAS_ARG, QEMU_OPTION_net,
     "-net tap[,vlan=n][,name=str],ifname=name\n"
     "                connect the host TAP network interface to VLAN 'n'\n"
 #else
-    "-net tap[,vlan=n][,name=str][,fd=h][,fds=x:y:...:z][,ifname=name][,script=file][,downscript=dfile][,helper=helper][,sndbuf=nbytes][,vnet_hdr=on|off][,vhost=on|off][,vhostfd=h][,vhostfds=x:y:...:z][,vhostforce=on|off][,queues=n]\n"
+    "-net tap[,vlan=n][,name=str][,fd=h][,fds=x:y:...:z][,ifname=name][,script=file][,downscript=dfile][,helper=helper][,sndbuf=nbytes][,vnet_hdr=on|off][,vhost=on|off][,vhostfd=h][,vhostfds=x:y:...:z][,vhostforce=on|off][,queues=n]"
+#ifdef CONFIG_COLO
+    "[,colo_nicname=nicname][,colo_script=scriptfile]"
+#endif
+    "\n"
     "                connect the host TAP network interface to VLAN 'n'\n"
     "                use network scripts 'file' (default=" DEFAULT_NETWORK_SCRIPT ")\n"
     "                to configure it and 'dfile' (default=" DEFAULT_NETWORK_DOWN_SCRIPT ")\n"
@@ -1438,6 +1442,10 @@ DEF("net", HAS_ARG, QEMU_OPTION_net,
     "                use 'vhostfd=h' to connect to an already opened vhost net device\n"
     "                use 'vhostfds=x:y:...:z to connect to multiple already opened vhost net devices\n"
     "                use 'queues=n' to specify the number of queues to be created for multiqueue TAP\n"
+#ifdef CONFIG_COLO
+    "                use 'colo_nicname=nicname' to specify the host physical nic for QEMU\n"
+    "                use 'colo_script=scriptfile' to specify script file when colo is enabled\n"
+#endif
     "-net bridge[,vlan=n][,name=str][,br=bridge][,helper=helper]\n"
     "                connects a host TAP network interface to a host bridge device 'br'\n"
     "                (default=" DEFAULT_BRIDGE_INTERFACE ") using the program 'helper'\n"
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH RFC v3 18/27] COLO NIC: Init/remove colo nic devices when add/cleanup tap devices
  2015-02-12  3:16 [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (16 preceding siblings ...)
  2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 17/27] COLO: Add new command parameter 'colo_nicname' 'colo_script' for net zhanghailiang
@ 2015-02-12  3:17 ` zhanghailiang
  2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 19/27] COLO NIC: Implement colo nic device interface configure() zhanghailiang
                   ` (11 subsequent siblings)
  29 siblings, 0 replies; 65+ messages in thread
From: zhanghailiang @ 2015-02-12  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: zhanghailiang, Li Zhijian, yunhong.jiang, eddie.dong, dgilbert,
	peter.huangpeng, Gao feng, stefanha, pbonzini

When COLO mode, we will do some init work for nic that will be used for COLO.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 include/net/colo-nic.h | 20 ++++++++++++++
 net/Makefile.objs      |  1 +
 net/colo-nic.c         | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++
 net/tap.c              | 18 +++++++++----
 stubs/migration-colo.c |  9 +++++++
 5 files changed, 116 insertions(+), 5 deletions(-)
 create mode 100644 include/net/colo-nic.h
 create mode 100644 net/colo-nic.c

diff --git a/include/net/colo-nic.h b/include/net/colo-nic.h
new file mode 100644
index 0000000..d35ee17
--- /dev/null
+++ b/include/net/colo-nic.h
@@ -0,0 +1,20 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef COLO_NIC_H
+#define COLO_NIC_H
+
+void colo_add_nic_devices(NetClientState *nc);
+void colo_remove_nic_devices(NetClientState *nc);
+
+#endif
diff --git a/net/Makefile.objs b/net/Makefile.objs
index ec19cb3..73f4a81 100644
--- a/net/Makefile.objs
+++ b/net/Makefile.objs
@@ -13,3 +13,4 @@ common-obj-$(CONFIG_HAIKU) += tap-haiku.o
 common-obj-$(CONFIG_SLIRP) += slirp.o
 common-obj-$(CONFIG_VDE) += vde.o
 common-obj-$(CONFIG_NETMAP) += netmap.o
+common-obj-$(CONFIG_COLO) += colo-nic.o
diff --git a/net/colo-nic.c b/net/colo-nic.c
new file mode 100644
index 0000000..965af49
--- /dev/null
+++ b/net/colo-nic.c
@@ -0,0 +1,73 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ *
+ */
+#include "include/migration/migration.h"
+#include "migration/migration-colo.h"
+#include "net/net.h"
+#include "net/colo-nic.h"
+#include "qemu/error-report.h"
+
+
+typedef struct nic_device {
+    NetClientState *nc;
+    bool (*support_colo)(NetClientState *nc);
+    int (*configure)(NetClientState *nc, bool up, int side, int index);
+    QTAILQ_ENTRY(nic_device) next;
+    bool is_up;
+} nic_device;
+
+
+
+QTAILQ_HEAD(, nic_device) nic_devices = QTAILQ_HEAD_INITIALIZER(nic_devices);
+static int colo_nic_side = -1;
+
+/*
+* colo_proxy_script usage
+* ./colo_proxy_script master/slave install/uninstall phy_if virt_if index
+*/
+static bool colo_nic_support(NetClientState *nc)
+{
+    return nc && nc->colo_script[0] && nc->colo_nicname[0];
+}
+
+void colo_add_nic_devices(NetClientState *nc)
+{
+    struct nic_device *nic = g_malloc0(sizeof(*nic));
+
+    nic->support_colo = colo_nic_support;
+    nic->configure = NULL;
+    /*
+     * TODO
+     * only support "-netdev tap,colo_scripte..."  options
+     * "-net nic -net tap..." options is not supported
+     */
+    nic->nc = nc;
+
+    QTAILQ_INSERT_TAIL(&nic_devices, nic, next);
+}
+
+void colo_remove_nic_devices(NetClientState *nc)
+{
+    struct nic_device *nic, *next_nic;
+
+    if (!nc || colo_nic_side == -1) {
+        return;
+    }
+
+    QTAILQ_FOREACH_SAFE(nic, &nic_devices, next, next_nic) {
+        if (nic->nc == nc) {
+            QTAILQ_REMOVE(&nic_devices, nic, next);
+            g_free(nic);
+        }
+    }
+    colo_nic_side = -1;
+}
diff --git a/net/tap.c b/net/tap.c
index f744dcc..3d24142 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -41,6 +41,7 @@
 #include "qemu/error-report.h"
 
 #include "net/tap.h"
+#include "net/colo-nic.h"
 
 #include "net/vhost_net.h"
 
@@ -296,6 +297,8 @@ static void tap_cleanup(NetClientState *nc)
 
     qemu_purge_queued_packets(nc);
 
+    colo_remove_nic_devices(nc);
+
     if (s->down_script[0])
         launch_script(s->down_script, s->down_script_arg, s->fd);
 
@@ -603,7 +606,7 @@ static int net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
                             const char *model, const char *name,
                             const char *ifname, const char *script,
                             const char *downscript, const char *vhostfdname,
-                            int vnet_hdr, int fd)
+                            int vnet_hdr, int fd, bool setup_colo)
 {
     TAPState *s;
     int vhostfd;
@@ -646,6 +649,10 @@ static int net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
                  tap->colo_nicname);
     }
 
+    if (setup_colo) {
+        colo_add_nic_devices(nc);
+    }
+
     if (tap->has_vhost ? tap->vhost :
         vhostfdname || (tap->has_vhostforce && tap->vhostforce)) {
         VhostNetOptions options;
@@ -752,7 +759,7 @@ int net_init_tap(const NetClientOptions *opts, const char *name,
 
         if (net_init_tap_one(tap, peer, "tap", name, NULL,
                              script, downscript,
-                             vhostfdname, vnet_hdr, fd)) {
+                             vhostfdname, vnet_hdr, fd, true)) {
             return -1;
         }
     } else if (tap->has_fds) {
@@ -798,7 +805,7 @@ int net_init_tap(const NetClientOptions *opts, const char *name,
             if (net_init_tap_one(tap, peer, "tap", name, ifname,
                                  script, downscript,
                                  tap->has_vhostfds ? vhost_fds[i] : NULL,
-                                 vnet_hdr, fd)) {
+                                 vnet_hdr, fd, false)) {
                 return -1;
             }
         }
@@ -822,7 +829,7 @@ int net_init_tap(const NetClientOptions *opts, const char *name,
 
         if (net_init_tap_one(tap, peer, "bridge", name, ifname,
                              script, downscript, vhostfdname,
-                             vnet_hdr, fd)) {
+                             vnet_hdr, fd, false)) {
             close(fd);
             return -1;
         }
@@ -865,7 +872,8 @@ int net_init_tap(const NetClientOptions *opts, const char *name,
             if (net_init_tap_one(tap, peer, "tap", name, ifname,
                                  i >= 1 ? "no" : script,
                                  i >= 1 ? "no" : downscript,
-                                 vhostfdname, vnet_hdr, fd)) {
+                                 vhostfdname, vnet_hdr, fd,
+                                 i == 0)) {
                 close(fd);
                 return -1;
             }
diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
index c3514c8..7512435 100644
--- a/stubs/migration-colo.c
+++ b/stubs/migration-colo.c
@@ -12,6 +12,7 @@
 
 #include "migration/migration-colo.h"
 #include "qmp-commands.h"
+#include "net/colo-nic.h"
 
 void colo_init_checkpointer(MigrationState *s)
 {
@@ -32,6 +33,14 @@ bool loadvm_in_colo_state(void)
     return false;
 }
 
+void colo_add_nic_devices(NetClientState *nc)
+{
+}
+
+void colo_remove_nic_devices(NetClientState *nc)
+{
+}
+
 void qmp_colo_lost_heartbeat(Error **errp)
 {
     error_setg(errp, "COLO is not supported, please rerun configure"
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH RFC v3 19/27] COLO NIC: Implement colo nic device interface configure()
  2015-02-12  3:16 [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (17 preceding siblings ...)
  2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 18/27] COLO NIC: Init/remove colo nic devices when add/cleanup tap devices zhanghailiang
@ 2015-02-12  3:17 ` zhanghailiang
  2015-02-16 12:03   ` Dr. David Alan Gilbert
  2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 20/27] COLO NIC : Implement colo nic init/destroy function zhanghailiang
                   ` (10 subsequent siblings)
  29 siblings, 1 reply; 65+ messages in thread
From: zhanghailiang @ 2015-02-12  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: zhanghailiang, Li Zhijian, yunhong.jiang, eddie.dong, dgilbert,
	peter.huangpeng, Gao feng, stefanha, pbonzini

Implement colo nic device interface configure()
add a script to configure nic devices:
${QEMU_SCRIPT_DIR}/colo-proxy-script.sh

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 net/colo-nic.c               | 56 +++++++++++++++++++++++++++-
 scripts/colo-proxy-script.sh | 88 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 143 insertions(+), 1 deletion(-)
 create mode 100755 scripts/colo-proxy-script.sh

diff --git a/net/colo-nic.c b/net/colo-nic.c
index 965af49..f8fc35d 100644
--- a/net/colo-nic.c
+++ b/net/colo-nic.c
@@ -39,12 +39,66 @@ static bool colo_nic_support(NetClientState *nc)
     return nc && nc->colo_script[0] && nc->colo_nicname[0];
 }
 
+static int launch_colo_script(char *argv[])
+{
+    int pid, status;
+    char *script = argv[0];
+
+    /* try to launch network script */
+    pid = fork();
+    if (pid == 0) {
+        execv(script, argv);
+        _exit(1);
+    } else if (pid > 0) {
+        while (waitpid(pid, &status, 0) != pid) {
+            /* loop */
+        }
+
+        if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
+            return 0;
+        }
+    }
+    return -1;
+}
+
+static int colo_nic_configure(NetClientState *nc,
+            bool up, int side, int index)
+{
+    int i, argc = 6;
+    char *argv[7], index_str[32];
+    char **parg;
+
+    if (!nc && index <= 0) {
+        error_report("Can not parse colo_script or colo_nicname");
+        return -1;
+    }
+
+    parg = argv;
+    *parg++ = nc->colo_script;
+    *parg++ = (char *)(side == COLO_SECONDARY_MODE ? "slave" : "master");
+    *parg++ = (char *)(up ? "install" : "uninstall");
+    *parg++ = nc->colo_nicname;
+    *parg++ = nc->ifname;
+    sprintf(index_str, "%d", index);
+    *parg++ = index_str;
+    *parg = NULL;
+
+    for (i = 0; i < argc; i++) {
+        if (!argv[i][0]) {
+            error_report("Can not get colo_script argument");
+            return -1;
+        }
+    }
+
+    return launch_colo_script(argv);
+}
+
 void colo_add_nic_devices(NetClientState *nc)
 {
     struct nic_device *nic = g_malloc0(sizeof(*nic));
 
     nic->support_colo = colo_nic_support;
-    nic->configure = NULL;
+    nic->configure = colo_nic_configure;
     /*
      * TODO
      * only support "-netdev tap,colo_scripte..."  options
diff --git a/scripts/colo-proxy-script.sh b/scripts/colo-proxy-script.sh
new file mode 100755
index 0000000..c7aa53f
--- /dev/null
+++ b/scripts/colo-proxy-script.sh
@@ -0,0 +1,88 @@
+#!/bin/sh
+#usage: ./colo-proxy-script.sh master/slave install/uninstall phy_if virt_if index
+#.e.g ./colo-proxy-script.sh master install eth2 tap0 1
+
+side=$1
+action=$2
+phy_if=$3
+virt_if=$4
+index=$5
+br=br1
+failover_br=br0
+
+script_usage()
+{
+    echo -n "usage: ./colo-proxy-script.sh master/slave "
+    echo -e "install/uninstall phy_if virt_if index\n"
+}
+
+master_install()
+{
+    tc qdisc add dev $virt_if root handle 1: prio
+    tc filter add dev $virt_if parent 1: protocol ip prio 10 u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $phy_if
+    tc filter add dev $virt_if parent 1: protocol arp prio 11 u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $phy_if
+    tc filter add dev $virt_if parent 1: protocol ipv6 prio 12 u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $phy_if
+
+    modprobe nf_conntrack_ipv4
+    modprobe xt_PMYCOLO sec_dev=$phy_if
+
+    /usr/local/sbin/iptables -t mangle -I PREROUTING -m physdev --physdev-in $virt_if -j PMYCOLO --index $index
+    /usr/local/sbin/ip6tables -t mangle -I PREROUTING -m physdev --physdev-in $virt_if -j PMYCOLO --index $index
+    /usr/local/sbin/arptables -I INPUT -i $phy_if -j MARK --set-mark $index
+}
+
+master_uninstall()
+{
+    tc filter del dev $virt_if parent 1: protocol ip prio 10 u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $phy_if
+    tc filter del dev $virt_if parent 1: protocol arp prio 11 u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $phy_if
+    tc filter del dev $virt_if parent 1: protocol ipv6 prio 12 u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $phy_if
+    tc qdisc del dev $virt_if root handle 1: prio
+
+    /usr/local/sbin/iptables -t mangle -F
+    /usr/local/sbin/ip6tables -t mangle -F
+    /usr/local/sbin/arptables -F
+    rmmod xt_PMYCOLO
+}
+
+slave_install()
+{
+    brctl addif $br $phy_if
+    modprobe xt_SECCOLO
+
+    /usr/local/sbin/iptables -t mangle -I PREROUTING -m physdev --physdev-in $virt_if -j SECCOLO --index $index
+    /usr/local/sbin/ip6tables -t mangle -I PREROUTING -m physdev --physdev-in $virt_if -j SECCOLO --index $index
+}
+
+
+slave_uninstall()
+{
+    brctl delif $br $phy_if
+    brctl delif $br $virt_if
+    brctl addif $failover_br $virt_if
+
+    /usr/local/sbin/iptables -t mangle -F
+    /usr/local/sbin/ip6tables -t mangle -F
+    rmmod xt_SECCOLO
+} 
+
+if [ $# -ne 5 ]; then
+    script_usage
+    exit 1
+fi
+
+if [ "x$side" != "xmaster" ] && [ "x$side" != "xslave" ]; then
+    script_usage
+    exit 2
+fi
+
+if [ "x$action" != "xinstall" ] && [ "x$action" != "xuninstall" ]; then
+    script_usage
+    exit 3
+fi
+
+if [ $index -lt 0 ] || [ $index -gt 100 ]; then
+    echo "index overflow"
+    exit 4
+fi
+
+${side}_${action}
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH RFC v3 20/27] COLO NIC : Implement colo nic init/destroy function
  2015-02-12  3:16 [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (18 preceding siblings ...)
  2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 19/27] COLO NIC: Implement colo nic device interface configure() zhanghailiang
@ 2015-02-12  3:17 ` zhanghailiang
  2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 21/27] COLO NIC: Some init work related with proxy module zhanghailiang
                   ` (9 subsequent siblings)
  29 siblings, 0 replies; 65+ messages in thread
From: zhanghailiang @ 2015-02-12  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: zhanghailiang, Li Zhijian, yunhong.jiang, eddie.dong, dgilbert,
	peter.huangpeng, Gao feng, stefanha, pbonzini

When in colo mode, call colo nic init/destroy function.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 include/net/colo-nic.h |  2 ++
 migration/colo.c       | 17 +++++++++++
 net/colo-nic.c         | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 99 insertions(+)

diff --git a/include/net/colo-nic.h b/include/net/colo-nic.h
index d35ee17..40dbcfb 100644
--- a/include/net/colo-nic.h
+++ b/include/net/colo-nic.h
@@ -14,6 +14,8 @@
 #ifndef COLO_NIC_H
 #define COLO_NIC_H
 
+int colo_proxy_init(int side);
+void colo_proxy_destroy(int side);
 void colo_add_nic_devices(NetClientState *nc);
 void colo_remove_nic_devices(NetClientState *nc);
 
diff --git a/migration/colo.c b/migration/colo.c
index 82459ec..9f8a873 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -14,6 +14,7 @@
 #include "migration/migration-colo.h"
 #include "qemu/error-report.h"
 #include "migration/migration-failover.h"
+#include "net/colo-nic.h"
 
 /* #define DEBUG_COLO */
 
@@ -286,6 +287,12 @@ static void *colo_thread(void *opaque)
     QEMUFile *colo_control = NULL;
     int ret;
 
+    if (colo_proxy_init(COLO_PRIMARY_MODE) != 0) {
+        error_report("Init colo proxy error");
+        goto out;
+    }
+    DPRINTF("proxy init complete\n");
+
     colo_control = qemu_fopen_socket(qemu_get_fd(s->file), "rb");
     if (!colo_control) {
         error_report("Open colo_control failed!");
@@ -346,6 +353,8 @@ out:
     qemu_bh_schedule(s->cleanup_bh);
     qemu_mutex_unlock_iothread();
 
+    colo_proxy_destroy(COLO_PRIMARY_MODE);
+
     return NULL;
 }
 
@@ -414,6 +423,13 @@ void *colo_process_incoming_checkpoints(void *opaque)
     colo = qemu_coroutine_self();
     assert(colo != NULL);
 
+     /* configure the network */
+    if (colo_proxy_init(COLO_SECONDARY_MODE) != 0) {
+        error_report("Init colo proxy error\n");
+        goto out;
+    }
+    DPRINTF("proxy init complete\n");
+
     ctl = qemu_fopen_socket(fd, "wb");
     if (!ctl) {
         error_report("Can't open incoming channel!");
@@ -560,5 +576,6 @@ out:
 
     loadvm_exit_colo();
 
+    colo_proxy_destroy(COLO_SECONDARY_MODE);
     return NULL;
 }
diff --git a/net/colo-nic.c b/net/colo-nic.c
index f8fc35d..a4719ce 100644
--- a/net/colo-nic.c
+++ b/net/colo-nic.c
@@ -26,6 +26,12 @@ typedef struct nic_device {
 } nic_device;
 
 
+typedef struct colo_proxy {
+    int sockfd;
+    int index;
+} colo_proxy;
+
+static colo_proxy cp_info = {-1, -1};
 
 QTAILQ_HEAD(, nic_device) nic_devices = QTAILQ_HEAD_INITIALIZER(nic_devices);
 static int colo_nic_side = -1;
@@ -93,6 +99,60 @@ static int colo_nic_configure(NetClientState *nc,
     return launch_colo_script(argv);
 }
 
+static int configure_one_nic(NetClientState *nc,
+             bool up, int side, int index)
+{
+    struct nic_device *nic;
+
+    assert(nc);
+
+    QTAILQ_FOREACH(nic, &nic_devices, next) {
+        if (nic->nc == nc) {
+            if (!nic->support_colo || !nic->support_colo(nic->nc)
+                || !nic->configure) {
+                return -1;
+            }
+            if (up == nic->is_up) {
+                return 0;
+            }
+
+            if (nic->configure(nic->nc, up, side, index) && up) {
+                return -1;
+            }
+            nic->is_up = up;
+            return 0;
+        }
+    }
+
+    return -1;
+}
+
+static int configure_nic(int side, int index)
+{
+    struct nic_device *nic;
+
+    if (QTAILQ_EMPTY(&nic_devices)) {
+        return -1;
+    }
+
+    QTAILQ_FOREACH(nic, &nic_devices, next) {
+        if (configure_one_nic(nic->nc, 1, side, index)) {
+            return -1;
+        }
+    }
+
+    return 0;
+}
+
+static void teardown_nic(int side, int index)
+{
+    struct nic_device *nic;
+
+    QTAILQ_FOREACH(nic, &nic_devices, next) {
+        configure_one_nic(nic->nc, 0, side, index);
+    }
+}
+
 void colo_add_nic_devices(NetClientState *nc)
 {
     struct nic_device *nic = g_malloc0(sizeof(*nic));
@@ -119,9 +179,29 @@ void colo_remove_nic_devices(NetClientState *nc)
 
     QTAILQ_FOREACH_SAFE(nic, &nic_devices, next, next_nic) {
         if (nic->nc == nc) {
+            configure_one_nic(nc, 0, colo_nic_side, cp_info.index);
             QTAILQ_REMOVE(&nic_devices, nic, next);
             g_free(nic);
         }
     }
     colo_nic_side = -1;
 }
+
+int colo_proxy_init(int side)
+{
+    int ret = -1;
+
+    ret = configure_nic(side, cp_info.index);
+    if (ret != 0) {
+        error_report("excute colo-proxy-script failed");
+    }
+    colo_nic_side = side;
+    return ret;
+}
+
+void colo_proxy_destroy(int side)
+{
+    teardown_nic(side, cp_info.index);
+    cp_info.index = -1;
+    colo_nic_side = -1;
+}
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH RFC v3 21/27] COLO NIC: Some init work related with proxy module
  2015-02-12  3:16 [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (19 preceding siblings ...)
  2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 20/27] COLO NIC : Implement colo nic init/destroy function zhanghailiang
@ 2015-02-12  3:17 ` zhanghailiang
  2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 22/27] COLO: Do checkpoint according to the result of net packets comparing zhanghailiang
                   ` (8 subsequent siblings)
  29 siblings, 0 replies; 65+ messages in thread
From: zhanghailiang @ 2015-02-12  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: zhanghailiang, yunhong.jiang, eddie.dong, dgilbert,
	peter.huangpeng, Gao feng, stefanha, pbonzini

Implement communication protocol with proxy module by using
netlink, and do some init work.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
---
 net/colo-nic.c | 171 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 171 insertions(+)

diff --git a/net/colo-nic.c b/net/colo-nic.c
index a4719ce..38d9bf5 100644
--- a/net/colo-nic.c
+++ b/net/colo-nic.c
@@ -15,7 +15,19 @@
 #include "net/net.h"
 #include "net/colo-nic.h"
 #include "qemu/error-report.h"
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <linux/netlink.h>
 
+#define NETLINK_COLO 28
+
+enum colo_netlink_op {
+    COLO_QUERY_CHECKPOINT = (NLMSG_MIN_TYPE + 1),
+    COLO_CHECKPOINT,
+    COLO_FAILOVER,
+    COLO_PROXY_INIT,
+    COLO_PROXY_RESET, /* UNUSED, will be used for continuous FT */
+};
 
 typedef struct nic_device {
     NetClientState *nc;
@@ -177,6 +189,12 @@ void colo_remove_nic_devices(NetClientState *nc)
         return;
     }
 
+    /* close netlink socket before cleanup tap device. */
+    if (cp_info.sockfd >= 0) {
+        close(cp_info.sockfd);
+        cp_info.sockfd = -1;
+    }
+
     QTAILQ_FOREACH_SAFE(nic, &nic_devices, next, next_nic) {
         if (nic->nc == nc) {
             configure_one_nic(nc, 0, colo_nic_side, cp_info.index);
@@ -187,20 +205,173 @@ void colo_remove_nic_devices(NetClientState *nc)
     colo_nic_side = -1;
 }
 
+static int colo_proxy_send(uint8_t *buff, uint64_t size, int type)
+{
+    struct sockaddr_nl sa;
+    struct nlmsghdr msg;
+    struct iovec iov;
+    struct msghdr mh;
+    int ret;
+
+    memset(&sa, 0, sizeof(sa));
+    sa.nl_family = AF_NETLINK;
+    sa.nl_pid = 0;
+    sa.nl_groups = 0;
+
+    msg.nlmsg_len = NLMSG_SPACE(0);
+    msg.nlmsg_flags = NLM_F_REQUEST;
+    if (type == COLO_PROXY_INIT) {
+        msg.nlmsg_flags |= NLM_F_ACK;
+    }
+    msg.nlmsg_seq = 0;
+    /* This is untrusty */
+    msg.nlmsg_pid = cp_info.index;
+    msg.nlmsg_type = type;
+
+    iov.iov_base = &msg;
+    iov.iov_len = msg.nlmsg_len;
+
+    mh.msg_name = &sa;
+    mh.msg_namelen = sizeof(sa);
+    mh.msg_iov = &iov;
+    mh.msg_iovlen = 1;
+    mh.msg_control = NULL;
+    mh.msg_controllen = 0;
+    mh.msg_flags = 0;
+
+    ret = sendmsg(cp_info.sockfd, &mh, 0);
+    if (ret <= 0) {
+        error_report("can't send msg to kernel by netlink: %s",
+                     strerror(errno));
+    }
+
+    return ret;
+}
+
+/* error: return -1, otherwise return 0 */
+static int64_t colo_proxy_recv(uint8_t **buff, int flags)
+{
+    struct sockaddr_nl sa;
+    struct iovec iov;
+    struct msghdr mh = {
+        .msg_name = &sa,
+        .msg_namelen = sizeof(sa),
+        .msg_iov = &iov,
+        .msg_iovlen = 1,
+    };
+    uint8_t *tmp = g_malloc(16384);
+    uint32_t size = 16384;
+    int64_t len = 0;
+    int ret;
+
+    iov.iov_base = tmp;
+    iov.iov_len = size;
+next:
+   ret = recvmsg(cp_info.sockfd, &mh, flags);
+    if (ret <= 0) {
+        goto out;
+    }
+
+    len += ret;
+    if (mh.msg_flags & MSG_TRUNC) {
+        size += 16384;
+        tmp = g_realloc(tmp, size);
+        iov.iov_base = tmp + len;
+        iov.iov_len = size - len;
+        goto next;
+    }
+
+    *buff = tmp;
+    return len;
+
+out:
+    g_free(tmp);
+    *buff = NULL;
+    return ret;
+}
+
 int colo_proxy_init(int side)
 {
+    int skfd = 0;
+    struct sockaddr_nl sa;
+    struct nlmsghdr *h;
+    struct timeval tv = {0, 500000}; /* timeout for recvmsg from kernel */
+    int i = 1;
     int ret = -1;
+    uint8_t *buff = NULL;
+    int64_t size;
+
+    skfd = socket(PF_NETLINK, SOCK_RAW, NETLINK_COLO);
+    if (skfd < 0) {
+        error_report("can not create a netlink socket: %s", strerror(errno));
+        goto out;
+    }
+    cp_info.sockfd = skfd;
+    memset(&sa, 0, sizeof(sa));
+    sa.nl_family = AF_NETLINK;
+    sa.nl_groups = 0;
+retry:
+    sa.nl_pid = i++;
+
+    if (i > 10) {
+        error_report("netlink bind error");
+        goto out;
+    }
+
+    ret = bind(skfd, (struct sockaddr *)&sa, sizeof(sa));
+    if (ret < 0 && errno == EADDRINUSE) {
+        error_report("colo index %d has already in used", sa.nl_pid);
+        goto retry;
+    }
+
+    cp_info.index = sa.nl_pid;
+    ret = colo_proxy_send(NULL, 0, COLO_PROXY_INIT);
+    if (ret < 0) {
+        goto out;
+    }
+    setsockopt(cp_info.sockfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
+    ret = -1;
+    size = colo_proxy_recv(&buff, 0);
+    /* disable SO_RCVTIMEO */
+    tv.tv_usec = 0;
+    setsockopt(cp_info.sockfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
+    if (size < 0) {
+        error_report("Can't recv msg from kernel by netlink: %s",
+                     strerror(errno));
+        goto out;
+    }
+
+    if (size) {
+        h = (struct nlmsghdr *)buff;
+
+        if (h->nlmsg_type == NLMSG_ERROR) {
+            struct nlmsgerr *err = (struct nlmsgerr *)NLMSG_DATA(h);
+            if (size - sizeof(*h) < sizeof(*err)) {
+                goto out;
+            }
+            ret = -err->error;
+            if (ret) {
+                goto out;
+            }
+        }
+    }
 
     ret = configure_nic(side, cp_info.index);
     if (ret != 0) {
         error_report("excute colo-proxy-script failed");
     }
     colo_nic_side = side;
+
+out:
+    g_free(buff);
     return ret;
 }
 
 void colo_proxy_destroy(int side)
 {
+    if (cp_info.sockfd >= 0) {
+        close(cp_info.sockfd);
+    }
     teardown_nic(side, cp_info.index);
     cp_info.index = -1;
     colo_nic_side = -1;
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH RFC v3 22/27] COLO: Do checkpoint according to the result of net packets comparing
  2015-02-12  3:16 [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (20 preceding siblings ...)
  2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 21/27] COLO NIC: Some init work related with proxy module zhanghailiang
@ 2015-02-12  3:17 ` zhanghailiang
  2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 23/27] COLO: Improve checkpoint efficiency by do additional periodic checkpoint zhanghailiang
                   ` (7 subsequent siblings)
  29 siblings, 0 replies; 65+ messages in thread
From: zhanghailiang @ 2015-02-12  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: zhanghailiang, yunhong.jiang, eddie.dong, dgilbert,
	peter.huangpeng, Gao feng, stefanha, pbonzini

Only do checkpoint, when the VMs' output net packets are inconsistent.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
---
 include/net/colo-nic.h |  2 ++
 migration/colo.c       | 23 +++++++++++++++++++++++
 net/colo-nic.c         | 41 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 66 insertions(+)

diff --git a/include/net/colo-nic.h b/include/net/colo-nic.h
index 40dbcfb..67c9807 100644
--- a/include/net/colo-nic.h
+++ b/include/net/colo-nic.h
@@ -19,4 +19,6 @@ void colo_proxy_destroy(int side);
 void colo_add_nic_devices(NetClientState *nc);
 void colo_remove_nic_devices(NetClientState *nc);
 
+int colo_proxy_compare(void);
+
 #endif
diff --git a/migration/colo.c b/migration/colo.c
index 9f8a873..3e13611 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -24,6 +24,12 @@ do { fprintf(stdout, "colo: " fmt , ## __VA_ARGS__); } while (0)
 #else
 #define DPRINTF(fmt, ...) do {} while (0)
 #endif
+/*
+ * force checkpoint timer: unit ms
+ * this is large because COLO checkpoint will mostly depend on
+ * COLO compare module.
+ */
+#define CHKPOINT_TIMER 10000
 
 enum {
     COLO_READY = 0x46,
@@ -321,6 +327,23 @@ static void *colo_thread(void *opaque)
     DPRINTF("vm resume to run\n");
 
     while (s->state == MIG_STATE_COLO) {
+        int proxy_checkpoint_req;
+
+        /* wait for a colo checkpoint */
+        proxy_checkpoint_req = colo_proxy_compare();
+        if (proxy_checkpoint_req < 0) {
+            goto out;
+        } else if (!proxy_checkpoint_req) {
+            /*
+             * No checkpoint is needed, wait for 1ms and then
+             * check if we need checkpoint again
+             */
+            usleep(1000);
+            continue;
+        } else {
+            DPRINTF("Net packets is not consistent!!!\n");
+        }
+
         /* start a colo checkpoint */
         if (do_colo_transaction(s, colo_control)) {
             goto out;
diff --git a/net/colo-nic.c b/net/colo-nic.c
index 38d9bf5..563d661 100644
--- a/net/colo-nic.c
+++ b/net/colo-nic.c
@@ -37,6 +37,9 @@ typedef struct nic_device {
     bool is_up;
 } nic_device;
 
+typedef struct colo_msg {
+    bool is_checkpoint;
+} colo_msg;
 
 typedef struct colo_proxy {
     int sockfd;
@@ -376,3 +379,41 @@ void colo_proxy_destroy(int side)
     cp_info.index = -1;
     colo_nic_side = -1;
 }
+/*
+do checkpoint: return 1
+error: return -1
+do not checkpoint: return 0
+*/
+int colo_proxy_compare(void)
+{
+    uint8_t *buff;
+    int64_t size;
+    struct nlmsghdr *h;
+    struct colo_msg *m;
+    int ret = -1;
+
+    size = colo_proxy_recv(&buff, MSG_DONTWAIT);
+
+    /* timeout, return no checkpoint message. */
+    if (size <= 0) {
+        return 0;
+    }
+
+    h = (struct nlmsghdr *) buff;
+
+    if (h->nlmsg_type == NLMSG_ERROR) {
+        goto out;
+    }
+
+    if (h->nlmsg_len < NLMSG_LENGTH(sizeof(*m))) {
+        goto out;
+    }
+
+    m = NLMSG_DATA(h);
+
+    ret = m->is_checkpoint ? 1 : 0;
+
+out:
+    g_free(buff);
+    return ret;
+}
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH RFC v3 23/27] COLO: Improve checkpoint efficiency by do additional periodic checkpoint
  2015-02-12  3:16 [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (21 preceding siblings ...)
  2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 22/27] COLO: Do checkpoint according to the result of net packets comparing zhanghailiang
@ 2015-02-12  3:17 ` zhanghailiang
  2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 24/27] COLO NIC: Implement NIC checkpoint and failover zhanghailiang
                   ` (6 subsequent siblings)
  29 siblings, 0 replies; 65+ messages in thread
From: zhanghailiang @ 2015-02-12  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: zhanghailiang, yunhong.jiang, eddie.dong, dgilbert,
	peter.huangpeng, stefanha, pbonzini, Yang Hongyang

Besides normal checkpoint which according to the result of net packets
comparing, We do additional checkpoint periodically, it will reduce the number
of dirty pages when do one checkpoint, if we don't do checkpoint for a long
time (This is a special case when the net packets is always consistent).

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 migration/colo.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 3e13611..579aabf 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -10,6 +10,7 @@
  * later.  See the COPYING file in the top-level directory.
  */
 
+#include "qemu/timer.h"
 #include "sysemu/sysemu.h"
 #include "migration/migration-colo.h"
 #include "qemu/error-report.h"
@@ -290,6 +291,8 @@ out:
 static void *colo_thread(void *opaque)
 {
     MigrationState *s = opaque;
+    int64_t start_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+    int64_t current_time;
     QEMUFile *colo_control = NULL;
     int ret;
 
@@ -338,8 +341,14 @@ static void *colo_thread(void *opaque)
              * No checkpoint is needed, wait for 1ms and then
              * check if we need checkpoint again
              */
-            usleep(1000);
-            continue;
+            current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+            if (current_time - start_time < CHKPOINT_TIMER) {
+                if (failover_request_is_set()) {
+                    goto out;
+                }
+                usleep(1000);
+                continue;
+            }
         } else {
             DPRINTF("Net packets is not consistent!!!\n");
         }
@@ -348,6 +357,8 @@ static void *colo_thread(void *opaque)
         if (do_colo_transaction(s, colo_control)) {
             goto out;
         }
+
+        start_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
     }
 
 out:
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH RFC v3 24/27] COLO NIC: Implement NIC checkpoint and failover
  2015-02-12  3:16 [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (22 preceding siblings ...)
  2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 23/27] COLO: Improve checkpoint efficiency by do additional periodic checkpoint zhanghailiang
@ 2015-02-12  3:17 ` zhanghailiang
  2015-03-05 17:12   ` Dr. David Alan Gilbert
  2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 25/27] COLO: Disable qdev hotplug when VM is in COLO mode zhanghailiang
                   ` (5 subsequent siblings)
  29 siblings, 1 reply; 65+ messages in thread
From: zhanghailiang @ 2015-02-12  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: zhanghailiang, yunhong.jiang, eddie.dong, dgilbert,
	peter.huangpeng, Gao feng, stefanha, pbonzini

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
---
 include/net/colo-nic.h |  3 ++-
 migration/colo.c       | 22 ++++++++++++++++++----
 net/colo-nic.c         | 19 +++++++++++++++++++
 3 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/include/net/colo-nic.h b/include/net/colo-nic.h
index 67c9807..ddc21cd 100644
--- a/include/net/colo-nic.h
+++ b/include/net/colo-nic.h
@@ -20,5 +20,6 @@ void colo_add_nic_devices(NetClientState *nc);
 void colo_remove_nic_devices(NetClientState *nc);
 
 int colo_proxy_compare(void);
-
+int colo_proxy_failover(void);
+int colo_proxy_checkpoint(void);
 #endif
diff --git a/migration/colo.c b/migration/colo.c
index 579aabf..874971c 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -94,6 +94,11 @@ static void slave_do_failover(void)
         ;
     }
 
+    if (colo_proxy_failover() != 0) {
+        error_report("colo proxy failed to do failover");
+    }
+    colo_proxy_destroy(COLO_SECONDARY_MODE);
+
     colo = NULL;
 
     if (!autostart) {
@@ -115,7 +120,7 @@ static void master_do_failover(void)
     if (!colo_runstate_is_stopped()) {
         vm_stop_force_state(RUN_STATE_COLO);
     }
-
+    colo_proxy_destroy(COLO_PRIMARY_MODE);
     if (s->state != MIG_STATE_ERROR) {
         migrate_set_state(s, MIG_STATE_COLO, MIG_STATE_COMPLETED);
     }
@@ -245,6 +250,11 @@ static int do_colo_transaction(MigrationState *s, QEMUFile *control)
 
     qemu_fflush(trans);
 
+    ret = colo_proxy_checkpoint();
+    if (ret < 0) {
+        goto out;
+    }
+
     ret = colo_ctl_put(s->file, COLO_CHECKPOINT_SEND);
     if (ret < 0) {
         goto out;
@@ -387,8 +397,6 @@ out:
     qemu_bh_schedule(s->cleanup_bh);
     qemu_mutex_unlock_iothread();
 
-    colo_proxy_destroy(COLO_PRIMARY_MODE);
-
     return NULL;
 }
 
@@ -508,6 +516,12 @@ void *colo_process_incoming_checkpoints(void *opaque)
             goto out;
         }
 
+        ret = colo_proxy_checkpoint();
+        if (ret < 0) {
+                goto out;
+        }
+        DPRINTF("proxy begin to do checkpoint\n");
+
         ret = colo_ctl_get(f, COLO_CHECKPOINT_SEND);
         if (ret < 0) {
             goto out;
@@ -584,6 +598,7 @@ out:
         * just kill slave
         */
         error_report("SVM is going to exit!");
+        colo_proxy_destroy(COLO_SECONDARY_MODE);
         exit(1);
     } else {
         /* if we went here, means master may dead, we are doing failover */
@@ -610,6 +625,5 @@ out:
 
     loadvm_exit_colo();
 
-    colo_proxy_destroy(COLO_SECONDARY_MODE);
     return NULL;
 }
diff --git a/net/colo-nic.c b/net/colo-nic.c
index 563d661..02a454d 100644
--- a/net/colo-nic.c
+++ b/net/colo-nic.c
@@ -379,6 +379,25 @@ void colo_proxy_destroy(int side)
     cp_info.index = -1;
     colo_nic_side = -1;
 }
+
+int colo_proxy_failover(void)
+{
+    if (colo_proxy_send(NULL, 0, COLO_FAILOVER) < 0) {
+        return -1;
+    }
+
+    return 0;
+}
+
+int colo_proxy_checkpoint(void)
+{
+    if (colo_proxy_send(NULL, 0, COLO_CHECKPOINT) < 0) {
+        return -1;
+    }
+
+    return 0;
+}
+
 /*
 do checkpoint: return 1
 error: return -1
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH RFC v3 25/27] COLO: Disable qdev hotplug when VM is in COLO mode
  2015-02-12  3:16 [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (23 preceding siblings ...)
  2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 24/27] COLO NIC: Implement NIC checkpoint and failover zhanghailiang
@ 2015-02-12  3:17 ` zhanghailiang
  2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 26/27] COLO: Implement shutdown checkpoint zhanghailiang
                   ` (4 subsequent siblings)
  29 siblings, 0 replies; 65+ messages in thread
From: zhanghailiang @ 2015-02-12  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: zhanghailiang, yunhong.jiang, eddie.dong, dgilbert,
	peter.huangpeng, stefanha, pbonzini, Yang Hongyang

COLO do not support qdev hotplug migration, disable it.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 migration/colo.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index 874971c..aadecc5 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -10,6 +10,7 @@
  * later.  See the COPYING file in the top-level directory.
  */
 
+#include "hw/qdev-core.h"
 #include "qemu/timer.h"
 #include "sysemu/sysemu.h"
 #include "migration/migration-colo.h"
@@ -301,6 +302,7 @@ out:
 static void *colo_thread(void *opaque)
 {
     MigrationState *s = opaque;
+    int dev_hotplug = qdev_hotplug;
     int64_t start_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
     int64_t current_time;
     QEMUFile *colo_control = NULL;
@@ -318,6 +320,8 @@ static void *colo_thread(void *opaque)
         goto out;
     }
 
+    qdev_hotplug = 0;
+
     /*
      * Wait for slave finish loading vm states and enter COLO
      * restore.
@@ -397,6 +401,8 @@ out:
     qemu_bh_schedule(s->cleanup_bh);
     qemu_mutex_unlock_iothread();
 
+    qdev_hotplug = dev_hotplug;
+
     return NULL;
 }
 
@@ -458,10 +464,13 @@ void *colo_process_incoming_checkpoints(void *opaque)
     struct colo_incoming *colo_in = opaque;
     QEMUFile *f = colo_in->file;
     int fd = qemu_get_fd(f);
+    int dev_hotplug = qdev_hotplug;
     QEMUFile *ctl = NULL, *fb = NULL;
     int ret;
     uint64_t total_size;
 
+    qdev_hotplug = 0;
+
     colo = qemu_coroutine_self();
     assert(colo != NULL);
 
@@ -625,5 +634,7 @@ out:
 
     loadvm_exit_colo();
 
+    qdev_hotplug = dev_hotplug;
+
     return NULL;
 }
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH RFC v3 26/27] COLO: Implement shutdown checkpoint
  2015-02-12  3:16 [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (24 preceding siblings ...)
  2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 25/27] COLO: Disable qdev hotplug when VM is in COLO mode zhanghailiang
@ 2015-02-12  3:17 ` zhanghailiang
  2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 27/27] COLO: Add block replication into colo process zhanghailiang
                   ` (3 subsequent siblings)
  29 siblings, 0 replies; 65+ messages in thread
From: zhanghailiang @ 2015-02-12  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: zhanghailiang, Li Zhijian, yunhong.jiang, eddie.dong, dgilbert,
	peter.huangpeng, stefanha, pbonzini, Lai Jiangshan

For SVM, we forbid it shutdown directly when in COLO mode,
FOR PVM's shutdown, we should do some work to ensure the consistent action
between PVM and SVM.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 include/migration/migration-colo.h |  1 +
 include/sysemu/sysemu.h            |  3 +++
 migration/colo-comm.c              |  5 +++++
 migration/colo.c                   | 19 +++++++++++++++++++
 vl.c                               | 23 +++++++++++++++++++++--
 5 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h
index 3bdd1ae..5747c0d 100644
--- a/include/migration/migration-colo.h
+++ b/include/migration/migration-colo.h
@@ -43,6 +43,7 @@ void loadvm_exit_colo(void);
 void *colo_process_incoming_checkpoints(void *opaque);
 bool loadvm_in_colo_state(void);
 
+bool vm_in_colo_state(void);
 int get_colo_mode(void);
 
 /* ram cache */
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 748d059..045d11d 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -51,6 +51,8 @@ typedef enum WakeupReason {
     QEMU_WAKEUP_REASON_OTHER,
 } WakeupReason;
 
+extern int colo_shutdown_requested;
+
 void qemu_system_reset_request(void);
 void qemu_system_suspend_request(void);
 void qemu_register_suspend_notifier(Notifier *notifier);
@@ -58,6 +60,7 @@ void qemu_system_wakeup_request(WakeupReason reason);
 void qemu_system_wakeup_enable(WakeupReason reason, bool enabled);
 void qemu_register_wakeup_notifier(Notifier *notifier);
 void qemu_system_shutdown_request(void);
+void qemu_system_shutdown_request_core(void);
 void qemu_system_powerdown_request(void);
 void qemu_register_powerdown_notifier(Notifier *notifier);
 void qemu_system_debug_request(void);
diff --git a/migration/colo-comm.c b/migration/colo-comm.c
index 57bc6cd..90c109b 100644
--- a/migration/colo-comm.c
+++ b/migration/colo-comm.c
@@ -32,6 +32,11 @@ static void colo_info_save(QEMUFile *f, void *opaque)
 }
 
 /* restore */
+bool vm_in_colo_state(void)
+{
+    return migrate_in_colo_state() || loadvm_in_colo_state();
+}
+
 int get_colo_mode(void)
 {
     if (migrate_in_colo_state()) {
diff --git a/migration/colo.c b/migration/colo.c
index aadecc5..d5baf87 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -66,6 +66,8 @@ enum {
     COLO_CHECKPOINT_SEND,
     COLO_CHECKPOINT_RECEIVED,
     COLO_CHECKPOINT_LOADED,
+
+    COLO_GUEST_SHUTDOWN
 };
 
 static QEMUBH *colo_bh;
@@ -284,6 +286,13 @@ static int do_colo_transaction(MigrationState *s, QEMUFile *control)
     }
     DPRINTF("got COLO_CHECKPOINT_LOADED\n");
 
+    if (colo_shutdown_requested) {
+        colo_ctl_put(s->file, COLO_GUEST_SHUTDOWN);
+        qemu_fflush(s->file);
+        colo_shutdown_requested = 0;
+        qemu_system_shutdown_request_core();
+    }
+
     ret = 0;
     /* resume master */
     qemu_mutex_lock_iothread();
@@ -454,6 +463,16 @@ static int slave_wait_new_checkpoint(QEMUFile *f)
     switch (cmd) {
     case COLO_CHECKPOINT_NEW:
         return 0;
+    case COLO_GUEST_SHUTDOWN:
+        qemu_mutex_lock_iothread();
+        qemu_system_shutdown_request_core();
+        qemu_mutex_unlock_iothread();
+        /* the main thread will exit and termiante the whole
+        * process, do we need some cleanup?
+        */
+        for (;;) {
+            ;
+        }
     default:
         return -1;
     }
diff --git a/vl.c b/vl.c
index aed26c1..74ffb57 100644
--- a/vl.c
+++ b/vl.c
@@ -1528,6 +1528,8 @@ static NotifierList wakeup_notifiers =
     NOTIFIER_LIST_INITIALIZER(wakeup_notifiers);
 static uint32_t wakeup_reason_mask = ~(1 << QEMU_WAKEUP_REASON_NONE);
 
+int colo_shutdown_requested;
+
 int qemu_shutdown_requested_get(void)
 {
     return shutdown_requested;
@@ -1644,6 +1646,10 @@ void qemu_system_reset(bool report)
 void qemu_system_reset_request(void)
 {
     if (no_reboot) {
+        if (vm_in_colo_state()) {
+            colo_shutdown_requested = 1;
+            return;
+        }
         shutdown_requested = 1;
     } else {
         reset_requested = 1;
@@ -1712,13 +1718,26 @@ void qemu_system_killed(int signal, pid_t pid)
     qemu_system_shutdown_request();
 }
 
-void qemu_system_shutdown_request(void)
+void qemu_system_shutdown_request_core(void)
 {
-    trace_qemu_system_shutdown_request();
     shutdown_requested = 1;
     qemu_notify_event();
 }
 
+void qemu_system_shutdown_request(void)
+{
+    trace_qemu_system_shutdown_request();
+    /*
+    * if in colo mode, we need do some significant work before respond to the
+    * shutdown request.
+    */
+    if (vm_in_colo_state()) {
+        colo_shutdown_requested = 1;
+        return;
+    }
+    qemu_system_shutdown_request_core();
+}
+
 static void qemu_system_powerdown(void)
 {
     qapi_event_send_powerdown(&error_abort);
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Qemu-devel] [PATCH RFC v3 27/27] COLO: Add block replication into colo process
  2015-02-12  3:16 [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (25 preceding siblings ...)
  2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 26/27] COLO: Implement shutdown checkpoint zhanghailiang
@ 2015-02-12  3:17 ` zhanghailiang
  2015-02-16 13:11 ` [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service Dr. David Alan Gilbert
                   ` (2 subsequent siblings)
  29 siblings, 0 replies; 65+ messages in thread
From: zhanghailiang @ 2015-02-12  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: zhanghailiang, yunhong.jiang, eddie.dong, dgilbert,
	peter.huangpeng, stefanha, pbonzini, Yang Hongyang

Make sure master start block replication after slave's block replication started

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 migration/colo.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 89 insertions(+), 5 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index d5baf87..042dec8 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -17,6 +17,8 @@
 #include "qemu/error-report.h"
 #include "migration/migration-failover.h"
 #include "net/colo-nic.h"
+#include "block/block.h"
+#include "sysemu/block-backend.h"
 
 /* #define DEBUG_COLO */
 
@@ -82,6 +84,66 @@ static bool colo_runstate_is_stopped(void)
     return runstate_check(RUN_STATE_COLO) || !runstate_is_running();
 }
 
+static int blk_start_replication(bool primary)
+{
+    int mode = primary ? COLO_PRIMARY_MODE : COLO_SECONDARY_MODE;
+    BlockBackend *blk, *temp;
+    int ret = 0;
+
+    for (blk = blk_next(NULL); blk; blk = blk_next(blk)) {
+        if (blk_is_read_only(blk)) {
+            continue;
+        }
+        ret = bdrv_start_replication(blk_bs(blk), mode);
+        if (ret) {
+            return 0;
+        }
+    }
+
+    if (ret < 0) {
+        for (temp = blk_next(NULL); temp != blk; temp = blk_next(temp)) {
+            bdrv_stop_replication(blk_bs(temp));
+        }
+    }
+
+    return ret;
+}
+
+static int blk_do_checkpoint(void)
+{
+    BlockBackend *blk;
+    int ret = 0;
+
+    for (blk = blk_next(NULL); blk; blk = blk_next(blk)) {
+        if (blk_is_read_only(blk)) {
+            continue;
+        }
+
+        if (bdrv_do_checkpoint(blk_bs(blk))) {
+            ret = -1;
+        }
+    }
+
+    return ret;
+}
+
+static int blk_stop_replication(void)
+{
+    BlockBackend *blk;
+    int ret = 0;
+
+    for (blk = blk_next(NULL); blk; blk = blk_next(blk)) {
+        if (blk_is_read_only(blk)) {
+            continue;
+        }
+        if (bdrv_stop_replication(blk_bs(blk))) {
+            ret = -1;
+        }
+    }
+
+    return ret;
+}
+
 /*
  * there are two way to entry this function
  * 1. From colo checkpoint incoming thread, in this case
@@ -101,6 +163,7 @@ static void slave_do_failover(void)
         error_report("colo proxy failed to do failover");
     }
     colo_proxy_destroy(COLO_SECONDARY_MODE);
+    blk_stop_replication();
 
     colo = NULL;
 
@@ -128,6 +191,8 @@ static void master_do_failover(void)
         migrate_set_state(s, MIG_STATE_COLO, MIG_STATE_COMPLETED);
     }
 
+    blk_stop_replication();
+
     vm_start();
 }
 
@@ -258,6 +323,9 @@ static int do_colo_transaction(MigrationState *s, QEMUFile *control)
         goto out;
     }
 
+    /* we call this api although this may do nothing on primary side */
+    blk_do_checkpoint();
+
     ret = colo_ctl_put(s->file, COLO_CHECKPOINT_SEND);
     if (ret < 0) {
         goto out;
@@ -347,6 +415,12 @@ static void *colo_thread(void *opaque)
         goto out;
     }
 
+    /* start block replication */
+    ret = blk_start_replication(true);
+    if (ret) {
+        goto out;
+    }
+
     qemu_mutex_lock_iothread();
     vm_start();
     qemu_mutex_unlock_iothread();
@@ -508,17 +582,24 @@ void *colo_process_incoming_checkpoints(void *opaque)
 
     create_and_init_ram_cache();
 
-    ret = colo_ctl_put(ctl, COLO_READY);
-    if (ret < 0) {
-        goto out;
-    }
-
     colo_buffer = qsb_create(NULL, COLO_BUFFER_BASE_SIZE);
     if (colo_buffer == NULL) {
         error_report("Failed to allocate colo buffer!");
         goto out;
     }
 
+    /* start block replication */
+    ret = blk_start_replication(false);
+    if (ret) {
+        goto out;
+    }
+    DPRINTF("finish block replication\n");
+
+    ret = colo_ctl_put(ctl, COLO_READY);
+    if (ret < 0) {
+        goto out;
+    }
+
     qemu_mutex_lock_iothread();
     /* in COLO mode, slave is runing, so start the vm */
     vm_start();
@@ -593,6 +674,9 @@ void *colo_process_incoming_checkpoints(void *opaque)
         vmstate_loading = false;
         qemu_mutex_unlock_iothread();
 
+        /* discard colo disk buffer */
+        blk_do_checkpoint();
+
         ret = colo_ctl_put(ctl, COLO_CHECKPOINT_LOADED);
         if (ret < 0) {
             goto out;
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 19/27] COLO NIC: Implement colo nic device interface configure()
  2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 19/27] COLO NIC: Implement colo nic device interface configure() zhanghailiang
@ 2015-02-16 12:03   ` Dr. David Alan Gilbert
  2015-02-25  3:44     ` zhanghailiang
  0 siblings, 1 reply; 65+ messages in thread
From: Dr. David Alan Gilbert @ 2015-02-16 12:03 UTC (permalink / raw)
  To: zhanghailiang
  Cc: Li Zhijian, yunhong.jiang, eddie.dong, qemu-devel,
	peter.huangpeng, Gao feng, stefanha, pbonzini

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> Implement colo nic device interface configure()
> add a script to configure nic devices:
> ${QEMU_SCRIPT_DIR}/colo-proxy-script.sh

Do you have some more documentation of the new colo-proxy?  I've
been reading the kernel module source and I can see that it's
a nice idea to do the sequence number adjustment on the host,
that reduces the need to modify the guest kernel; I was trying to
figure out how you synchronise the master/slave idea of sequence numbers -
is that purely from the 'ack' that's duplicated back to the secondary?
If you were unlucky and the 'ack' packet was lost on the duplicated
link from the primary to secondary how would you recover?
What about TCP connections setup before colo was activated?

The other thought is that passing the 'sec_dev' as a module parameter
gives you an artificial limitation; it forces all of the pairs
to be between the same pair of hosts.   If the 'sec_dev' was a parameter
to the connection then you could have different slaves associated with
each guest on the primary host.

Dave
P.S. You probably need to clean the debug messages up in the kernel module!

> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> ---
>  net/colo-nic.c               | 56 +++++++++++++++++++++++++++-
>  scripts/colo-proxy-script.sh | 88 ++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 143 insertions(+), 1 deletion(-)
>  create mode 100755 scripts/colo-proxy-script.sh
> 
> diff --git a/net/colo-nic.c b/net/colo-nic.c
> index 965af49..f8fc35d 100644
> --- a/net/colo-nic.c
> +++ b/net/colo-nic.c
> @@ -39,12 +39,66 @@ static bool colo_nic_support(NetClientState *nc)
>      return nc && nc->colo_script[0] && nc->colo_nicname[0];
>  }
>  
> +static int launch_colo_script(char *argv[])
> +{
> +    int pid, status;
> +    char *script = argv[0];
> +
> +    /* try to launch network script */
> +    pid = fork();
> +    if (pid == 0) {
> +        execv(script, argv);
> +        _exit(1);
> +    } else if (pid > 0) {
> +        while (waitpid(pid, &status, 0) != pid) {
> +            /* loop */
> +        }
> +
> +        if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
> +            return 0;
> +        }
> +    }
> +    return -1;
> +}
> +
> +static int colo_nic_configure(NetClientState *nc,
> +            bool up, int side, int index)
> +{
> +    int i, argc = 6;
> +    char *argv[7], index_str[32];
> +    char **parg;
> +
> +    if (!nc && index <= 0) {
> +        error_report("Can not parse colo_script or colo_nicname");
> +        return -1;
> +    }
> +
> +    parg = argv;
> +    *parg++ = nc->colo_script;
> +    *parg++ = (char *)(side == COLO_SECONDARY_MODE ? "slave" : "master");
> +    *parg++ = (char *)(up ? "install" : "uninstall");
> +    *parg++ = nc->colo_nicname;
> +    *parg++ = nc->ifname;
> +    sprintf(index_str, "%d", index);
> +    *parg++ = index_str;
> +    *parg = NULL;
> +
> +    for (i = 0; i < argc; i++) {
> +        if (!argv[i][0]) {
> +            error_report("Can not get colo_script argument");
> +            return -1;
> +        }
> +    }
> +
> +    return launch_colo_script(argv);
> +}
> +
>  void colo_add_nic_devices(NetClientState *nc)
>  {
>      struct nic_device *nic = g_malloc0(sizeof(*nic));
>  
>      nic->support_colo = colo_nic_support;
> -    nic->configure = NULL;
> +    nic->configure = colo_nic_configure;
>      /*
>       * TODO
>       * only support "-netdev tap,colo_scripte..."  options
> diff --git a/scripts/colo-proxy-script.sh b/scripts/colo-proxy-script.sh
> new file mode 100755
> index 0000000..c7aa53f
> --- /dev/null
> +++ b/scripts/colo-proxy-script.sh
> @@ -0,0 +1,88 @@
> +#!/bin/sh
> +#usage: ./colo-proxy-script.sh master/slave install/uninstall phy_if virt_if index
> +#.e.g ./colo-proxy-script.sh master install eth2 tap0 1
> +
> +side=$1
> +action=$2
> +phy_if=$3
> +virt_if=$4
> +index=$5
> +br=br1
> +failover_br=br0
> +
> +script_usage()
> +{
> +    echo -n "usage: ./colo-proxy-script.sh master/slave "
> +    echo -e "install/uninstall phy_if virt_if index\n"
> +}
> +
> +master_install()
> +{
> +    tc qdisc add dev $virt_if root handle 1: prio
> +    tc filter add dev $virt_if parent 1: protocol ip prio 10 u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $phy_if
> +    tc filter add dev $virt_if parent 1: protocol arp prio 11 u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $phy_if
> +    tc filter add dev $virt_if parent 1: protocol ipv6 prio 12 u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $phy_if
> +
> +    modprobe nf_conntrack_ipv4
> +    modprobe xt_PMYCOLO sec_dev=$phy_if
> +
> +    /usr/local/sbin/iptables -t mangle -I PREROUTING -m physdev --physdev-in $virt_if -j PMYCOLO --index $index
> +    /usr/local/sbin/ip6tables -t mangle -I PREROUTING -m physdev --physdev-in $virt_if -j PMYCOLO --index $index
> +    /usr/local/sbin/arptables -I INPUT -i $phy_if -j MARK --set-mark $index
> +}
> +
> +master_uninstall()
> +{
> +    tc filter del dev $virt_if parent 1: protocol ip prio 10 u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $phy_if
> +    tc filter del dev $virt_if parent 1: protocol arp prio 11 u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $phy_if
> +    tc filter del dev $virt_if parent 1: protocol ipv6 prio 12 u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $phy_if
> +    tc qdisc del dev $virt_if root handle 1: prio
> +
> +    /usr/local/sbin/iptables -t mangle -F
> +    /usr/local/sbin/ip6tables -t mangle -F
> +    /usr/local/sbin/arptables -F
> +    rmmod xt_PMYCOLO
> +}
> +
> +slave_install()
> +{
> +    brctl addif $br $phy_if
> +    modprobe xt_SECCOLO
> +
> +    /usr/local/sbin/iptables -t mangle -I PREROUTING -m physdev --physdev-in $virt_if -j SECCOLO --index $index
> +    /usr/local/sbin/ip6tables -t mangle -I PREROUTING -m physdev --physdev-in $virt_if -j SECCOLO --index $index
> +}
> +
> +
> +slave_uninstall()
> +{
> +    brctl delif $br $phy_if
> +    brctl delif $br $virt_if
> +    brctl addif $failover_br $virt_if
> +
> +    /usr/local/sbin/iptables -t mangle -F
> +    /usr/local/sbin/ip6tables -t mangle -F
> +    rmmod xt_SECCOLO
> +} 
> +
> +if [ $# -ne 5 ]; then
> +    script_usage
> +    exit 1
> +fi
> +
> +if [ "x$side" != "xmaster" ] && [ "x$side" != "xslave" ]; then
> +    script_usage
> +    exit 2
> +fi
> +
> +if [ "x$action" != "xinstall" ] && [ "x$action" != "xuninstall" ]; then
> +    script_usage
> +    exit 3
> +fi
> +
> +if [ $index -lt 0 ] || [ $index -gt 100 ]; then
> +    echo "index overflow"
> +    exit 4
> +fi
> +
> +${side}_${action}
> -- 
> 1.7.12.4
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service
  2015-02-12  3:16 [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (26 preceding siblings ...)
  2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 27/27] COLO: Add block replication into colo process zhanghailiang
@ 2015-02-16 13:11 ` Dr. David Alan Gilbert
  2015-02-25  5:17   ` Gao feng
  2015-02-24 11:08 ` Dr. David Alan Gilbert
  2015-02-24 20:13 ` Dr. David Alan Gilbert
  29 siblings, 1 reply; 65+ messages in thread
From: Dr. David Alan Gilbert @ 2015-02-16 13:11 UTC (permalink / raw)
  To: zhanghailiang
  Cc: yunhong.jiang, eddie.dong, qemu-devel, dgilbert, stefanha,
	pbonzini, peter.huangpeng

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> This is the 3th version of COLO, it is only COLO frame part, include: VM checkpoint,
> failover, proxy API, block replication API, not include block replication.
> The block part has been sent by wencongyang:
> '[RFC PATCH 00/14] Block replication for continuous checkpoints'
> 
> You can get the integrated qemu colo patches from github:
> https://github.com/coloft/qemu/commits/colo-v1.0
> 
> Compared with the previous version, we have realized all parts of COLO frame, 
> and it is works now.
> 
> The main change since last version is, we use colo proxy mode instead of
> colo agent, they are all used for network packets compare, but proxy is more
> efficient, it is based on netfilter.
> Another modification is we implement new block replication scheme, 
> you can get more info from wencongyang's block patch series 
> 
> If you don't know about COLO, please refer to below link for detailed 
> information.
> 
> The idea is presented in Xen summit 2012, and 2013,
> and academia paper in SOCC 2013. It's also presented in KVM forum in 2013:
> http://www.linux-kvm.org/wiki/images/1/1d/Kvm-forum-2013-COLO.pdf
> 
> Previous posted RFC proposal:
> http://lists.nongnu.org/archive/html/qemu-devel/2014-06/msg05567.html
> http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg04459.html
> 
> The below is the detail about test COLO, you can also get this info
> from http://wiki.qemu.org/Features/COLO.
> * Hardware requirements
> There is at least one directly connected nic to forward the network requests
> from client to secondary VM. The directly connected nic must not be used by
> any other purpose.
> 
> * Network link topology
> =================================normal ======================================
>                                 +--------+
>                                 |client  |
>    master                       +----+---+                    slave
> -------------------------+           |            + -------------------------+
>    PVM                   |           +            |                          |
> +-------+         +----[eth0]-----[switch]-----[eth0]---------+              |
> |guest  |     +---+-+    |                        |       +---+-+            |
> |     [tap0]--+ br0 |    |                        |       | br0 |            |
> |       |     +-----+  [eth1]-----[forward]----[eth1]--+  +-----+      SVM   |
> +-------+                |                        |    |            +-------+|
>                          |                        |    |  +-----+   | guest ||
>                        [eth2]---[checkpoint]---[eth2]  +--+br1  |-[tap0]    ||
>                          |                        |       +-----+   |       ||
>                          |                        |                 +-------+|
> -------------------------+                        +--------------------------+
> e.g.
> master:
> br0: 192.168.0.33
> eth1: 192.168.1.33
> eth2: 192.168.2.33
> 
> slave:
> br0: 192.168.0.88
> br1: no ip address
> eth1: 192.168.1.88
> eth2: 192.168.2.88
> (Actually, you can also use eth0 as checkpoint channel)
> Note: in normal, SVM will always be linked to br1 like above until
> failover.

Why does eth1 need IP addresses?  Isn't the traffic on eth1 just a copy of the
traffic on eth0 for the proxy modules to compare/forward?
Wouldn't any ARP traffic or the like generated from having IPs on those
interfaces confuse the comparison process?
(Similarly for the bridges, is it best to turn off STP and the like
to stop the bridges adding extra packets on eth1/eth0 ?)

Dave

> * Test environment prepare:
> 1. Set Up the Bridge and network environment
> You must setup you network environment like above picture,
> In master, setup a bridge br0, using command brctl, like:
> # ifconfig eth0 down
> # ifconfig eth0 0.0.0.0
> # brctl addbr br0
> # brctl addif br0 eth0
> # ifconfig br0 192.168.0.33 netmask 255.255.255.0
> # ifconfig eth0 up
> In slave, setup two bridge br0, br1, commands are same with above,
> please note that br1 is linked to eth1(the forward nic).
> 
> 2.Qemu-ifup
> We need a script to bring up the TAP interface.
> You can find this info from http://en.wikibooks.org/wiki/QEMU/Networking.
> Master:
> root@master# cat /etc/qemu-ifup
> #!/bin/sh
> switch=br0
> if [ -n "$1" ]; then
>         ip link set $1 up
>         brctl addif ${switch} $1
> fi
> Slave:
> root@slave # cat /etc/qemu-ifup
> #!/bin/sh
> switch=br1  #in primary, switch is br0. in secondary switch is br1
> if [ -n "$1" ]; then
>         ip link set $1 up
>         brctl addif ${switch} $1
> fi 
> 
> 3. Prepare host kernel
> colo-proxy kernel module need cooperate with linux kernel.
> You should put a kernel patch 'colo-patch-for-kernel.patch'
> (It's based on linux kernel-3.19) which you can get from 
> https://github.com/gao-feng/colo-proxy.git
> and then compile kernel and intall the new kernel.
> 
> 4. Proxy module
> proxy module is used for network packets compare, you can also get the lastest
> version from: https://github.com/gao-feng/colo-proxy.git.
> You can compile and install it by using command 'make' && 'make install'.
> 
> 5. Modified iptables
> We have add a new rule to iptables command, so please get the patch from
> https://github.com/gao-feng/colo-proxy/blob/master/COLO-library_for_iptables-1.4.21.patch
> It is based on version 1.4.21.
> 
> 6. Qemu colo
> Checkout the latest colo branch from
> https://github.com/coloft/qemu/commits/colo-v1.0
> configure and make: 
> # ./configure --target-list=x86_64-softmmu --enable-colo --enable-quorum 
> # make
> 
> * Test steps:
> 1. load module
> # modprobe nf_conntrack_colo (Other colo module will be automatically loaded by
> script colo-proxy-script.sh)
> # modprobe xt_mark
> # modprobe kvm-intel
> 
> 2. startup qemu
> master:
> # qemu-system-x86_64 -enable-kvm -netdev tap,id=hn0,colo_script=./scripts/colo-proxy-script.sh,colo_nicname=eth1 -device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive driver=quorum,read-pattern=first,children.0.file.filename=suse11_3.img,children.0.driver=raw,children.1.file.driver=nbd+colo,children.1.file.host=192.168.2.88,children.1.file.port=8889,children.1.file.export=colo1,children.1.driver=raw,if=virtio -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -S
> slave:
> # qemu-system-x86_64 -enable-kvm -netdev tap,id=hn0,colo_script=./scripts/colo-proxy-script.sh,colo_nicname=eth1 -device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive driver=blkcolo,export=colo1,backing.file.filename=suse11_3.img,backing.driver=raw,if=virtio -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -incoming tcp:0:8888
> 
> 3. On Secondary VM's QEMU monitor, run
> (qemu) nbd_server_start 192.168.2.88:8889 
> 
> 4.on Primary VM's QEMU monitor, run following command:
> (qemu) migrate_set_capability colo on
> (qemu) migrate tcp:192.168.2.88:8888
> 
> 5. done
> You will see two runing VMs, whenever you make changes to PVM, SVM
> will be synced to PVM's state.
> 
> 6. failover test:
> You can kill SVM (PVM) and run 'colo_lost_heartbeat' in PVM's (SVM's) monitor
> at the same time, then PVM (SVM) will failover and client will not feel this
> change.
> 
> It is still a framework, far away from commercial use,
> so any comments/feedbacks are warmly welcomed ;)
> 
> PS: 
> We (huawei) have cooperated with fujitsu on COLO work,
> and we work mainly on COLO frame and fujitsu will focus on COLO block.
> 
> TODO list:
> 1) Optimize the process of checkpoint, shorten the time-consuming
> 2) Add more debug/stat info 
> 3) Strengthen failover 
> 4) The capability of continuous FT
> 
> v3:
> - use proxy instead of colo agent to compare network packets
> - add block replication
> - Optimize failover disposal
> - handle shutdown
> 
> v2:
> - use QEMUSizedBuffer/QEMUFile as COLO buffer
> - colo support is enabled by default
> - add nic replication support
> - addressed comments from Eric Blake and Dr. David Alan Gilbert
> 
> v1:
> - implement the frame of colo
> 
> 
> zhanghailiang (27):
>   configure: Add parameter for configure to enable/disable COLO support
>   migration: Introduce capability 'colo' to migration
>   COLO: migrate colo related info to slave
>   migration: Integrate COLO checkpoint process into migration
>   migration: Integrate COLO checkpoint process into loadvm
>   migration: Don't send vm description in COLO mode
>   COLO: Implement colo checkpoint protocol
>   COLO: Add a new RunState RUN_STATE_COLO
>   QEMUSizedBuffer: Introduce two help functions for qsb
>   COLO: Save VM state to slave when do checkpoint
>   COLO RAM: Load PVM's dirty page into SVM's RAM cache temporarily
>   COLO VMstate: Load VM state into qsb before restore it
>   COLO RAM: Flush cached RAM into SVM's memory
>   COLO failover: Introduce a new command to trigger a failover
>   COLO failover: Implement COLO master/slave failover work
>   COLO failover: Don't do failover during loading VM's state
>   COLO: Add new command parameter 'colo_nicname' 'colo_script' for net
>   COLO NIC: Init/remove colo nic devices when add/cleanup tap devices
>   COLO NIC: Implement colo nic device interface configure()
>   COLO NIC : Implement colo nic init/destroy function
>   COLO NIC: Some init work related with proxy module
>   COLO: Do checkpoint according to the result of net packets comparing
>   COLO: Improve checkpoint efficiency by do additional periodic
>     checkpoint
>   COLO NIC: Implement NIC checkpoint and failover
>   COLO: Disable qdev hotplug when VM is in COLO mode
>   COLO: Implement shutdown checkpoint
>   COLO: Add block replication into colo process
> 
>  arch_init.c                            | 196 ++++++++-
>  configure                              |  14 +
>  hmp-commands.hx                        |  15 +
>  hmp.c                                  |   7 +
>  hmp.h                                  |   1 +
>  include/exec/cpu-all.h                 |   1 +
>  include/migration/migration-colo.h     |  57 +++
>  include/migration/migration-failover.h |  22 +
>  include/migration/migration.h          |  14 +
>  include/migration/qemu-file.h          |   3 +-
>  include/net/colo-nic.h                 |  25 ++
>  include/net/net.h                      |   4 +
>  include/sysemu/sysemu.h                |   3 +
>  migration/Makefile.objs                |   2 +
>  migration/colo-comm.c                  |  81 ++++
>  migration/colo-failover.c              |  48 +++
>  migration/colo.c                       | 743 +++++++++++++++++++++++++++++++++
>  migration/migration.c                  |  74 +++-
>  migration/qemu-file-buf.c              |  57 +++
>  net/Makefile.objs                      |   1 +
>  net/colo-nic.c                         | 438 +++++++++++++++++++
>  net/tap.c                              |  45 +-
>  qapi-schema.json                       |  27 +-
>  qemu-options.hx                        |  10 +-
>  qmp-commands.hx                        |  19 +
>  savevm.c                               |  10 +-
>  scripts/colo-proxy-script.sh           |  88 ++++
>  stubs/Makefile.objs                    |   1 +
>  stubs/migration-colo.c                 |  49 +++
>  vl.c                                   |  36 +-
>  30 files changed, 2047 insertions(+), 44 deletions(-)
>  create mode 100644 include/migration/migration-colo.h
>  create mode 100644 include/migration/migration-failover.h
>  create mode 100644 include/net/colo-nic.h
>  create mode 100644 migration/colo-comm.c
>  create mode 100644 migration/colo-failover.c
>  create mode 100644 migration/colo.c
>  create mode 100644 net/colo-nic.c
>  create mode 100755 scripts/colo-proxy-script.sh
>  create mode 100644 stubs/migration-colo.c
> 
> -- 
> 1.7.12.4
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 02/27] migration: Introduce capability 'colo' to migration
  2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 02/27] migration: Introduce capability 'colo' to migration zhanghailiang
@ 2015-02-16 21:57   ` Eric Blake
  2015-02-25  9:19     ` zhanghailiang
  0 siblings, 1 reply; 65+ messages in thread
From: Eric Blake @ 2015-02-16 21:57 UTC (permalink / raw)
  To: zhanghailiang, qemu-devel
  Cc: Lai Jiangshan, yunhong.jiang, eddie.dong, dgilbert,
	peter.huangpeng, Gonglei, stefanha, pbonzini, Yang Hongyang

[-- Attachment #1: Type: text/plain, Size: 1804 bytes --]

On 02/11/2015 08:16 PM, zhanghailiang wrote:
> This capability allows Primary VM (PVM) to be continuously checkpointed
> to secondary VM.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> ---
>  include/migration/migration.h |  1 +
>  migration/migration.c         | 15 +++++++++++++++
>  qapi-schema.json              |  5 ++++-
>  3 files changed, 20 insertions(+), 1 deletion(-)
> 

> +++ b/migration/migration.c
> @@ -276,6 +276,15 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
>      }
>  
>      for (cap = params; cap; cap = cap->next) {
> +#ifndef CONFIG_COLO
> +        if (cap->value->capability == MIGRATION_CAPABILITY_COLO &&
> +            cap->value->state) {
> +            error_setg(errp, "COLO is not currently supported, please"
> +                             " configure with --enable-colo option in order to"
> +                             " support COLO feature");
> +            continue;
> +        }
> +#endif
>          s->enabled_capabilities[cap->value->capability] = cap->value->state;
>      }

Yuck.  This means that probing whether colo is supported requires a
usage test (try setting the capability with migrate-set-capabilities and
see if it fails) instead of a query test (list the current capabilities;
if colo is in the set then it is supported).  Can you figure out a way
to avoid exposing the colo capability if !CONFIG_COLO, so that
query-migate-capabilities is sufficient to learn if colo is supported?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 03/27] COLO: migrate colo related info to slave
  2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 03/27] COLO: migrate colo related info to slave zhanghailiang
@ 2015-02-16 23:20   ` Eric Blake
  2015-02-25  6:21     ` zhanghailiang
  0 siblings, 1 reply; 65+ messages in thread
From: Eric Blake @ 2015-02-16 23:20 UTC (permalink / raw)
  To: zhanghailiang, qemu-devel
  Cc: Lai Jiangshan, yunhong.jiang, eddie.dong, dgilbert,
	peter.huangpeng, Gonglei, stefanha, pbonzini, Yang Hongyang

[-- Attachment #1: Type: text/plain, Size: 1429 bytes --]

On 02/11/2015 08:16 PM, zhanghailiang wrote:
> We can know if we should go into COLO mode by the info that
> has been migrated from PVM.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> ---
>  include/migration/migration-colo.h | 21 ++++++++++++++
>  migration/Makefile.objs            |  1 +
>  migration/colo-comm.c              | 56 ++++++++++++++++++++++++++++++++++++++
>  vl.c                               |  5 +++-
>  4 files changed, 82 insertions(+), 1 deletion(-)
>  create mode 100644 include/migration/migration-colo.h
>  create mode 100644 migration/colo-comm.c

> +
> +/* #define DEBUG_COLO */
> +
> +#ifdef DEBUG_COLO
> +#define DPRINTF(fmt, ...) \
> +    do { fprintf(stdout, "COLO: " fmt, ## __VA_ARGS__); } while (0)
> +#else
> +#define DPRINTF(fmt, ...) \
> +    do { } while (0)
> +#endif

This is not very good (that is, it is a great way to write stale
debugging statements that tend to bit-rot, and later fail to compile
when you turn debug on).  Better is a usage pattern that enforces that
the debug compiles but has no impact.  For example, see how block/ssh.c
defines DPRINTF.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 04/27] migration: Integrate COLO checkpoint process into migration
  2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 04/27] migration: Integrate COLO checkpoint process into migration zhanghailiang
@ 2015-02-16 23:27   ` Eric Blake
  2015-02-25  6:43     ` zhanghailiang
  0 siblings, 1 reply; 65+ messages in thread
From: Eric Blake @ 2015-02-16 23:27 UTC (permalink / raw)
  To: zhanghailiang, qemu-devel
  Cc: Lai Jiangshan, yunhong.jiang, eddie.dong, dgilbert,
	peter.huangpeng, Gonglei, stefanha, pbonzini

[-- Attachment #1: Type: text/plain, Size: 2879 bytes --]

On 02/11/2015 08:16 PM, zhanghailiang wrote:
> Add a migrate state: MIG_STATE_COLO, enter this migration state
> after the first live migration successfully finished.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> ---
>  include/migration/migration-colo.h |  2 ++
>  include/migration/migration.h      | 13 +++++++
>  migration/Makefile.objs            |  1 +
>  migration/colo.c                   | 72 ++++++++++++++++++++++++++++++++++++++
>  migration/migration.c              | 38 +++++++++++---------
>  stubs/Makefile.objs                |  1 +
>  stubs/migration-colo.c             | 17 +++++++++
>  7 files changed, 128 insertions(+), 16 deletions(-)
>  create mode 100644 migration/colo.c
>  create mode 100644 stubs/migration-colo.c
> 

> +++ b/include/migration/migration.h
> @@ -65,6 +65,19 @@ struct MigrationState
>      int64_t dirty_sync_count;
>  };
>  
> +enum {
> +    MIG_STATE_ERROR = -1,
> +    MIG_STATE_NONE,
> +    MIG_STATE_SETUP,
> +    MIG_STATE_CANCELLING,
> +    MIG_STATE_CANCELLED,
> +    MIG_STATE_ACTIVE,
> +    MIG_STATE_COLO,
> +    MIG_STATE_COMPLETED,
> +};

Is the new state intended to be user-visible?  If so, wouldn't it be
better to expose this enum via qapi-schema.json?


> +
> +/* #define DEBUG_COLO */
> +
> +#ifdef DEBUG_COLO
> +#define DPRINTF(fmt, ...) \
> +do { fprintf(stdout, "colo: " fmt , ## __VA_ARGS__); } while (0)
> +#else
> +#define DPRINTF(fmt, ...) do {} while (0)
> +#endif
> +

Same comment as in 3/27 about avoiding bit-rotting debug statements.  Or
even better,...

> +static QEMUBH *colo_bh;
> +
> +static void *colo_thread(void *opaque)
> +{
> +    MigrationState *s = opaque;
> +
> +    qemu_mutex_lock_iothread();
> +    vm_start();
> +    qemu_mutex_unlock_iothread();
> +    DPRINTF("vm resume to run\n");

...why not add tracepoints instead of using DPRINTF?


> @@ -227,6 +218,11 @@ MigrationInfo *qmp_query_migrate(Error **errp)
>  
>          get_xbzrle_cache_stats(info);
>          break;
> +    case MIG_STATE_COLO:
> +        info->has_status = true;
> +        info->status = g_strdup("colo");
> +        /* TODO: display COLO specific informations(checkpoint info etc.),*/
> +        break;

Uggh.  We REALLY need to fix MigrationInfo to convert 'status' to use an
enum type, instead of an open-coded 'str' (such a conversion is
backwards compatible, and better documented).  Then it would be more
obvious that you are adding an enum value.  Doing the conversion would
be a good prerequisite patch.

s/informations(checkpoint info etc.),/information (checkpoint info etc.)/

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 14/27] COLO failover: Introduce a new command to trigger a failover
  2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 14/27] COLO failover: Introduce a new command to trigger a failover zhanghailiang
@ 2015-02-16 23:47   ` Eric Blake
  2015-02-25  7:04     ` zhanghailiang
  0 siblings, 1 reply; 65+ messages in thread
From: Eric Blake @ 2015-02-16 23:47 UTC (permalink / raw)
  To: zhanghailiang, qemu-devel
  Cc: Lai Jiangshan, Li Zhijian, yunhong.jiang, eddie.dong, dgilbert,
	peter.huangpeng, stefanha, pbonzini, Yang Hongyang

[-- Attachment #1: Type: text/plain, Size: 2817 bytes --]

On 02/11/2015 08:17 PM, zhanghailiang wrote:
> We leave users to use whatever heartbeat solution they want, if the heartbeat
> is lost, or other errors they detect, they can use command
> 'colo_lost_heartbeat' to tell COLO to do failover, COLO will do operations
> accordingly.
> 
> For example,
> If send the command to PVM, Primary will exit COLO mode, and takeover,
> if to Secondary, Secondary will do failover work and at last takeover server.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> ---
>  hmp-commands.hx                        | 15 ++++++++++++++
>  hmp.c                                  |  7 +++++++
>  hmp.h                                  |  1 +
>  include/migration/migration-colo.h     |  1 +
>  include/migration/migration-failover.h | 20 ++++++++++++++++++
>  migration/Makefile.objs                |  2 +-
>  migration/colo-failover.c              | 38 ++++++++++++++++++++++++++++++++++
>  migration/colo.c                       |  1 +
>  qapi-schema.json                       |  9 ++++++++
>  qmp-commands.hx                        | 19 +++++++++++++++++
>  stubs/migration-colo.c                 |  8 +++++++
>  11 files changed, 120 insertions(+), 1 deletion(-)
>  create mode 100644 include/migration/migration-failover.h
>  create mode 100644 migration/colo-failover.c
> 
> diff --git a/hmp-commands.hx b/hmp-commands.hx

> +++ b/qapi-schema.json
> @@ -543,6 +543,15 @@
>  { 'command': 'query-migrate-capabilities', 'returns':   ['MigrationCapabilityStatus']}
>  
>  ##
> +# @colo-lost-heartbeat
> +#
> +# Tell COLO that heartbeat is lost
> +#
> +# Since: 2.3
> +##
> +{ 'command': 'colo-lost-heartbeat' }

Okay...

> +
> +##
>  # @MouseInfo:
>  #
>  # Information about a mouse device.
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index a85d847..1b4a5ca 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -753,6 +753,25 @@ Example:
>  EQMP
>  
>      {
> +        .name       = "colo_lost_heartbeat",

...but documented incorrectly (this should use '-' to match the command
name in the .json file, not '_')

> +        .args_type  = "",
> +        .mhandler.cmd_new = qmp_marshal_input_colo_lost_heartbeat,
> +    },
> +
> +SQMP
> +colo_lost_heartbeat
> +--------------------
> +
> +Tell COLO that heartbeat is lost, a failover or takeover is needed.
> +
> +Example:
> +
> +-> { "execute": "colo_lost_heartbeat" }
> +<- { "return": {} }

This example won't work unless you fix the spelling.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 17/27] COLO: Add new command parameter 'colo_nicname' 'colo_script' for net
  2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 17/27] COLO: Add new command parameter 'colo_nicname' 'colo_script' for net zhanghailiang
@ 2015-02-16 23:50   ` Eric Blake
  2015-02-24  9:50     ` Wen Congyang
  2015-02-25  7:50     ` zhanghailiang
  0 siblings, 2 replies; 65+ messages in thread
From: Eric Blake @ 2015-02-16 23:50 UTC (permalink / raw)
  To: zhanghailiang, qemu-devel
  Cc: Li Zhijian, yunhong.jiang, eddie.dong, dgilbert, peter.huangpeng,
	Gao feng, stefanha, pbonzini

[-- Attachment #1: Type: text/plain, Size: 1659 bytes --]

On 02/11/2015 08:17 PM, zhanghailiang wrote:
> The 'colo_nicname' should be assigned with network name,
> for exmple, 'eth2'. It will be parameter of 'colo_script',

s/exmple/example/

> 'colo_script' should be assigned with an scirpt path.

s/an scirpt/a script/

> 
> We parse these parameter in tap.

Script files are in general very hard to secure.  Libvirt marks any
domain that uses a script file for controlling networking as tainted,
because it cannot guarantee that the script did not do arbitrary
actions.  Can you come up with any better solution that does not require
a script file, such as having management software responsible for
passing in an already-opened fd?

> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> ---
>  include/net/net.h |  4 ++++
>  net/tap.c         | 27 ++++++++++++++++++++++++---
>  qapi-schema.json  |  8 +++++++-
>  qemu-options.hx   | 10 +++++++++-
>  4 files changed, 44 insertions(+), 5 deletions(-)
> 

> +++ b/qapi-schema.json
> @@ -2101,6 +2101,10 @@
>  #
>  # @queues: #optional number of queues to be created for multiqueue capable tap
>  #
> +# @colo_nicname: #optional the host physical nic for QEMU (Since 2.3)
> +#
> +# @colo_script: #optional the script file which used by COLO (Since 2.3)

s/_/-/ in both parameter names, please.  Since they are optional, it
might be worth documenting what they default to when not present.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 17/27] COLO: Add new command parameter 'colo_nicname' 'colo_script' for net
  2015-02-16 23:50   ` Eric Blake
@ 2015-02-24  9:50     ` Wen Congyang
  2015-02-24 16:30       ` Eric Blake
  2015-02-25  7:50     ` zhanghailiang
  1 sibling, 1 reply; 65+ messages in thread
From: Wen Congyang @ 2015-02-24  9:50 UTC (permalink / raw)
  To: Eric Blake, zhanghailiang, qemu-devel
  Cc: Li Zhijian, yunhong.jiang, eddie.dong, dgilbert, peter.huangpeng,
	Gao feng, stefanha, pbonzini

On 02/17/2015 07:50 AM, Eric Blake wrote:
> On 02/11/2015 08:17 PM, zhanghailiang wrote:
>> The 'colo_nicname' should be assigned with network name,
>> for exmple, 'eth2'. It will be parameter of 'colo_script',
> 
> s/exmple/example/
> 
>> 'colo_script' should be assigned with an scirpt path.
> 
> s/an scirpt/a script/
> 
>>
>> We parse these parameter in tap.
> 
> Script files are in general very hard to secure.  Libvirt marks any
> domain that uses a script file for controlling networking as tainted,
> because it cannot guarantee that the script did not do arbitrary
> actions.  Can you come up with any better solution that does not require
> a script file, such as having management software responsible for
> passing in an already-opened fd?

Do you mean that opening the script in libvirt?

Thanks
Wen Congyang

> 
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> ---
>>  include/net/net.h |  4 ++++
>>  net/tap.c         | 27 ++++++++++++++++++++++++---
>>  qapi-schema.json  |  8 +++++++-
>>  qemu-options.hx   | 10 +++++++++-
>>  4 files changed, 44 insertions(+), 5 deletions(-)
>>
> 
>> +++ b/qapi-schema.json
>> @@ -2101,6 +2101,10 @@
>>  #
>>  # @queues: #optional number of queues to be created for multiqueue capable tap
>>  #
>> +# @colo_nicname: #optional the host physical nic for QEMU (Since 2.3)
>> +#
>> +# @colo_script: #optional the script file which used by COLO (Since 2.3)
> 
> s/_/-/ in both parameter names, please.  Since they are optional, it
> might be worth documenting what they default to when not present.
> 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service
  2015-02-12  3:16 [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (27 preceding siblings ...)
  2015-02-16 13:11 ` [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service Dr. David Alan Gilbert
@ 2015-02-24 11:08 ` Dr. David Alan Gilbert
  2015-02-24 20:13 ` Dr. David Alan Gilbert
  29 siblings, 0 replies; 65+ messages in thread
From: Dr. David Alan Gilbert @ 2015-02-24 11:08 UTC (permalink / raw)
  To: zhanghailiang
  Cc: yunhong.jiang, eddie.dong, qemu-devel, peter.huangpeng, gaofeng,
	stefanha, pbonzini

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> 3. Prepare host kernel
> colo-proxy kernel module need cooperate with linux kernel.
> You should put a kernel patch 'colo-patch-for-kernel.patch'
> (It's based on linux kernel-3.19) which you can get from 
> https://github.com/gao-feng/colo-proxy.git
> and then compile kernel and intall the new kernel.
> 
> 4. Proxy module
> proxy module is used for network packets compare, you can also get the lastest
> version from: https://github.com/gao-feng/colo-proxy.git.
> You can compile and install it by using command 'make' && 'make install'.

I'm seeing an rcu hang when a COLO enabled qemu quits:

Feb 24 05:29:14 virtlab413 kernel: INFO: task qemu-system-x86:13033 blocked for more than 120 seconds.
Feb 24 05:29:14 virtlab413 kernel:      Tainted: G           OE  3.18.0uf-colo-00028-g75d30f0-dirty #18
Feb 24 05:29:14 virtlab413 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 24 05:29:14 virtlab413 kernel: qemu-system-x86 D 0000000000000000     0 13033  13004 0x00000080
Feb 24 05:29:14 virtlab413 kernel: ffff880ff5837b38 0000000000000096 ffff880ff2f35b00 0000000000012d40
Feb 24 05:29:14 virtlab413 kernel: ffff880ff5837fd8 0000000000012d40 ffff88100bdc16c0 ffff880ff2f35b00
Feb 24 05:29:14 virtlab413 kernel: 0000000000000000 7fffffffffffffff ffff880ff5837ca0 ffff880ff5837c98
Feb 24 05:29:14 virtlab413 kernel: Call Trace:
Feb 24 05:29:14 virtlab413 kernel: [<ffffffff81669d29>] schedule+0x29/0x70
Feb 24 05:29:14 virtlab413 kernel: [<ffffffff8166f16c>] schedule_timeout+0x1ec/0x350
Feb 24 05:29:14 virtlab413 kernel: [<ffffffff8166b4f2>] ? wait_for_completion+0x32/0x120
Feb 24 05:29:15 virtlab413 kernel: [<ffffffff8166b5a4>] wait_for_completion+0xe4/0x120
Feb 24 05:29:15 virtlab413 kernel: [<ffffffff8108e110>] ? wake_up_state+0x20/0x20
Feb 24 05:29:15 virtlab413 kernel: [<ffffffff810c2460>] ? rcu_barrier+0x20/0x20
Feb 24 05:29:15 virtlab413 kernel: [<ffffffff810beb3c>] wait_rcu_gp+0x5c/0x80
Feb 24 05:29:15 virtlab413 kernel: [<ffffffff810beac0>] ? ftrace_raw_output_rcu_utilization+0x50/0x50
Feb 24 05:29:15 virtlab413 kernel: [<ffffffff810c29bf>] synchronize_rcu.part.54+0x1f/0x40
Feb 24 05:29:15 virtlab413 kernel: [<ffffffff810c29f8>] synchronize_rcu+0x18/0x20
Feb 24 05:29:15 virtlab413 kernel: [<ffffffffa078d175>] colo_node_unregister+0x45/0x70 [nf_conntrack_colo]
Feb 24 05:29:15 virtlab413 kernel: [<ffffffffa078d9b5>] colonl_close_event+0xa5/0xac [nf_conntrack_colo]
Feb 24 05:29:15 virtlab413 kernel: [<ffffffffa078d948>] ? colonl_close_event+0x38/0xac [nf_conntrack_colo]
Feb 24 05:29:15 virtlab413 kernel: [<ffffffff81080b25>] ? __atomic_notifier_call_chain+0x5/0xa0
Feb 24 05:29:15 virtlab413 kernel: [<ffffffff8108035c>] notifier_call_chain+0x4c/0x70
Feb 24 05:29:15 virtlab413 kernel: [<ffffffff81080b82>] __atomic_notifier_call_chain+0x62/0xa0
Feb 24 05:29:15 virtlab413 kernel: [<ffffffff81080b25>] ? __atomic_notifier_call_chain+0x5/0xa0
Feb 24 05:29:15 virtlab413 kernel: [<ffffffff8150c02d>] ? skb_dequeue+0x5d/0x80
Feb 24 05:29:15 virtlab413 kernel: [<ffffffff81080bd6>] atomic_notifier_call_chain+0x16/0x20
Feb 24 05:29:15 virtlab413 kernel: [<ffffffff8155eb02>] netlink_release+0x302/0x340
Feb 24 05:29:15 virtlab413 kernel: [<ffffffff81503c8f>] sock_release+0x1f/0x90
Feb 24 05:29:15 virtlab413 kernel: [<ffffffff81503d12>] sock_close+0x12/0x20
Feb 24 05:29:15 virtlab413 kernel: [<ffffffff811dce53>] __fput+0xd3/0x210
Feb 24 05:29:15 virtlab413 kernel: [<ffffffff811dcfde>] ____fput+0xe/0x10
Feb 24 05:29:15 virtlab413 kernel: [<ffffffff8107d5a7>] task_work_run+0xa7/0xe0
Feb 24 05:29:15 virtlab413 kernel: [<ffffffff81002dd7>] do_notify_resume+0x97/0xb0
Feb 24 05:29:15 virtlab413 kernel: [<ffffffff81671047>] int_signal+0x12/0x17
Feb 24 05:29:15 virtlab413 kernel: INFO: lockdep is turned off.
Feb 24 05:29:15 virtlab413 kernel: INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 4, t=240014 jiffies, g=60214, c=60213, q=0)
Feb 24 05:29:15 virtlab413 kernel: INFO: Stall ended before state dump start
Feb 24 05:31:15 virtlab413 kernel: INFO: task qemu-system-x86:13033 blocked for more than 120 seconds.
Feb 24 05:31:15 virtlab413 kernel:      Tainted: G           OE  3.18.0uf-colo-00028-g75d30f0-dirty #18
Feb 24 05:31:15 virtlab413 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 24 05:31:15 virtlab413 kernel: qemu-system-x86 D 0000000000000000     0 13033  13004 0x00000080
Feb 24 05:31:15 virtlab413 kernel: ffff880ff5837b38 0000000000000096 ffff880ff2f35b00 0000000000012d40
Feb 24 05:31:15 virtlab413 kernel: ffff880ff5837fd8 0000000000012d40 ffff88100bdc16c0 ffff880ff2f35b00
Feb 24 05:31:15 virtlab413 kernel: 0000000000000000 7fffffffffffffff ffff880ff5837ca0 ffff880ff5837c98
Feb 24 05:31:15 virtlab413 kernel: Call Trace:
Feb 24 05:31:15 virtlab413 kernel: [<ffffffff81669d29>] schedule+0x29/0x70
Feb 24 05:31:15 virtlab413 kernel: [<ffffffff8166f16c>] schedule_timeout+0x1ec/0x350
Feb 24 05:31:15 virtlab413 kernel: [<ffffffff8166b4f2>] ? wait_for_completion+0x32/0x120
Feb 24 05:31:15 virtlab413 kernel: [<ffffffff8166b5a4>] wait_for_completion+0xe4/0x120
Feb 24 05:31:15 virtlab413 kernel: [<ffffffff8108e110>] ? wake_up_state+0x20/0x20
Feb 24 05:31:15 virtlab413 kernel: [<ffffffff810c2460>] ? rcu_barrier+0x20/0x20
Feb 24 05:31:15 virtlab413 kernel: [<ffffffff810beb3c>] wait_rcu_gp+0x5c/0x80
Feb 24 05:31:15 virtlab413 kernel: [<ffffffff810beac0>] ? ftrace_raw_output_rcu_utilization+0x50/0x50
Feb 24 05:31:15 virtlab413 kernel: [<ffffffff810c29bf>] synchronize_rcu.part.54+0x1f/0x40
Feb 24 05:31:15 virtlab413 kernel: [<ffffffff810c29f8>] synchronize_rcu+0x18/0x20
Feb 24 05:31:15 virtlab413 kernel: [<ffffffffa078d175>] colo_node_unregister+0x45/0x70 [nf_conntrack_colo]
Feb 24 05:31:15 virtlab413 kernel: [<ffffffffa078d9b5>] colonl_close_event+0xa5/0xac [nf_conntrack_colo]
Feb 24 05:31:15 virtlab413 kernel: [<ffffffffa078d948>] ? colonl_close_event+0x38/0xac [nf_conntrack_colo]
Feb 24 05:31:15 virtlab413 kernel: [<ffffffff81080b25>] ? __atomic_notifier_call_chain+0x5/0xa0
Feb 24 05:31:15 virtlab413 kernel: [<ffffffff8108035c>] notifier_call_chain+0x4c/0x70
Feb 24 05:31:15 virtlab413 kernel: [<ffffffff81080b82>] __atomic_notifier_call_chain+0x62/0xa0
Feb 24 05:31:15 virtlab413 kernel: [<ffffffff81080b25>] ? __atomic_notifier_call_chain+0x5/0xa0
Feb 24 05:31:15 virtlab413 kernel: [<ffffffff8150c02d>] ? skb_dequeue+0x5d/0x80
Feb 24 05:31:15 virtlab413 kernel: [<ffffffff81080bd6>] atomic_notifier_call_chain+0x16/0x20
Feb 24 05:31:15 virtlab413 kernel: [<ffffffff8155eb02>] netlink_release+0x302/0x340
Feb 24 05:31:15 virtlab413 kernel: [<ffffffff81503c8f>] sock_release+0x1f/0x90
Feb 24 05:31:15 virtlab413 kernel: [<ffffffff81503d12>] sock_close+0x12/0x20
Feb 24 05:31:15 virtlab413 kernel: [<ffffffff811dce53>] __fput+0xd3/0x210
Feb 24 05:31:15 virtlab413 kernel: [<ffffffff811dcfde>] ____fput+0xe/0x10
Feb 24 05:31:15 virtlab413 kernel: [<ffffffff8107d5a7>] task_work_run+0xa7/0xe0
Feb 24 05:31:15 virtlab413 kernel: [<ffffffff81002dd7>] do_notify_resume+0x97/0xb0
Feb 24 05:31:15 virtlab413 kernel: [<ffffffff81671047>] int_signal+0x12/0x17

> 
> 5. Modified iptables
> We have add a new rule to iptables command, so please get the patch from
> https://github.com/gao-feng/colo-proxy/blob/master/COLO-library_for_iptables-1.4.21.patch
> It is based on version 1.4.21.

I see there's also an arptables patch as well that I built.

Dave

> 
> 6. Qemu colo
> Checkout the latest colo branch from
> https://github.com/coloft/qemu/commits/colo-v1.0
> configure and make: 
> # ./configure --target-list=x86_64-softmmu --enable-colo --enable-quorum 
> # make
> 
> * Test steps:
> 1. load module
> # modprobe nf_conntrack_colo (Other colo module will be automatically loaded by
> script colo-proxy-script.sh)
> # modprobe xt_mark
> # modprobe kvm-intel
> 
> 2. startup qemu
> master:
> # qemu-system-x86_64 -enable-kvm -netdev tap,id=hn0,colo_script=./scripts/colo-proxy-script.sh,colo_nicname=eth1 -device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive driver=quorum,read-pattern=first,children.0.file.filename=suse11_3.img,children.0.driver=raw,children.1.file.driver=nbd+colo,children.1.file.host=192.168.2.88,children.1.file.port=8889,children.1.file.export=colo1,children.1.driver=raw,if=virtio -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -S
> slave:
> # qemu-system-x86_64 -enable-kvm -netdev tap,id=hn0,colo_script=./scripts/colo-proxy-script.sh,colo_nicname=eth1 -device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive driver=blkcolo,export=colo1,backing.file.filename=suse11_3.img,backing.driver=raw,if=virtio -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -incoming tcp:0:8888
> 
> 3. On Secondary VM's QEMU monitor, run
> (qemu) nbd_server_start 192.168.2.88:8889 
> 
> 4.on Primary VM's QEMU monitor, run following command:
> (qemu) migrate_set_capability colo on
> (qemu) migrate tcp:192.168.2.88:8888
> 
> 5. done
> You will see two runing VMs, whenever you make changes to PVM, SVM
> will be synced to PVM's state.
> 
> 6. failover test:
> You can kill SVM (PVM) and run 'colo_lost_heartbeat' in PVM's (SVM's) monitor
> at the same time, then PVM (SVM) will failover and client will not feel this
> change.
> 
> It is still a framework, far away from commercial use,
> so any comments/feedbacks are warmly welcomed ;)
> 
> PS: 
> We (huawei) have cooperated with fujitsu on COLO work,
> and we work mainly on COLO frame and fujitsu will focus on COLO block.
> 
> TODO list:
> 1) Optimize the process of checkpoint, shorten the time-consuming
> 2) Add more debug/stat info 
> 3) Strengthen failover 
> 4) The capability of continuous FT
> 
> v3:
> - use proxy instead of colo agent to compare network packets
> - add block replication
> - Optimize failover disposal
> - handle shutdown
> 
> v2:
> - use QEMUSizedBuffer/QEMUFile as COLO buffer
> - colo support is enabled by default
> - add nic replication support
> - addressed comments from Eric Blake and Dr. David Alan Gilbert
> 
> v1:
> - implement the frame of colo
> 
> 
> zhanghailiang (27):
>   configure: Add parameter for configure to enable/disable COLO support
>   migration: Introduce capability 'colo' to migration
>   COLO: migrate colo related info to slave
>   migration: Integrate COLO checkpoint process into migration
>   migration: Integrate COLO checkpoint process into loadvm
>   migration: Don't send vm description in COLO mode
>   COLO: Implement colo checkpoint protocol
>   COLO: Add a new RunState RUN_STATE_COLO
>   QEMUSizedBuffer: Introduce two help functions for qsb
>   COLO: Save VM state to slave when do checkpoint
>   COLO RAM: Load PVM's dirty page into SVM's RAM cache temporarily
>   COLO VMstate: Load VM state into qsb before restore it
>   COLO RAM: Flush cached RAM into SVM's memory
>   COLO failover: Introduce a new command to trigger a failover
>   COLO failover: Implement COLO master/slave failover work
>   COLO failover: Don't do failover during loading VM's state
>   COLO: Add new command parameter 'colo_nicname' 'colo_script' for net
>   COLO NIC: Init/remove colo nic devices when add/cleanup tap devices
>   COLO NIC: Implement colo nic device interface configure()
>   COLO NIC : Implement colo nic init/destroy function
>   COLO NIC: Some init work related with proxy module
>   COLO: Do checkpoint according to the result of net packets comparing
>   COLO: Improve checkpoint efficiency by do additional periodic
>     checkpoint
>   COLO NIC: Implement NIC checkpoint and failover
>   COLO: Disable qdev hotplug when VM is in COLO mode
>   COLO: Implement shutdown checkpoint
>   COLO: Add block replication into colo process
> 
>  arch_init.c                            | 196 ++++++++-
>  configure                              |  14 +
>  hmp-commands.hx                        |  15 +
>  hmp.c                                  |   7 +
>  hmp.h                                  |   1 +
>  include/exec/cpu-all.h                 |   1 +
>  include/migration/migration-colo.h     |  57 +++
>  include/migration/migration-failover.h |  22 +
>  include/migration/migration.h          |  14 +
>  include/migration/qemu-file.h          |   3 +-
>  include/net/colo-nic.h                 |  25 ++
>  include/net/net.h                      |   4 +
>  include/sysemu/sysemu.h                |   3 +
>  migration/Makefile.objs                |   2 +
>  migration/colo-comm.c                  |  81 ++++
>  migration/colo-failover.c              |  48 +++
>  migration/colo.c                       | 743 +++++++++++++++++++++++++++++++++
>  migration/migration.c                  |  74 +++-
>  migration/qemu-file-buf.c              |  57 +++
>  net/Makefile.objs                      |   1 +
>  net/colo-nic.c                         | 438 +++++++++++++++++++
>  net/tap.c                              |  45 +-
>  qapi-schema.json                       |  27 +-
>  qemu-options.hx                        |  10 +-
>  qmp-commands.hx                        |  19 +
>  savevm.c                               |  10 +-
>  scripts/colo-proxy-script.sh           |  88 ++++
>  stubs/Makefile.objs                    |   1 +
>  stubs/migration-colo.c                 |  49 +++
>  vl.c                                   |  36 +-
>  30 files changed, 2047 insertions(+), 44 deletions(-)
>  create mode 100644 include/migration/migration-colo.h
>  create mode 100644 include/migration/migration-failover.h
>  create mode 100644 include/net/colo-nic.h
>  create mode 100644 migration/colo-comm.c
>  create mode 100644 migration/colo-failover.c
>  create mode 100644 migration/colo.c
>  create mode 100644 net/colo-nic.c
>  create mode 100755 scripts/colo-proxy-script.sh
>  create mode 100644 stubs/migration-colo.c
> 
> -- 
> 1.7.12.4
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 17/27] COLO: Add new command parameter 'colo_nicname' 'colo_script' for net
  2015-02-24  9:50     ` Wen Congyang
@ 2015-02-24 16:30       ` Eric Blake
  2015-02-24 17:24         ` Daniel P. Berrange
  0 siblings, 1 reply; 65+ messages in thread
From: Eric Blake @ 2015-02-24 16:30 UTC (permalink / raw)
  To: Wen Congyang, zhanghailiang, qemu-devel
  Cc: Li Zhijian, yunhong.jiang, eddie.dong, dgilbert, peter.huangpeng,
	Gao feng, stefanha, pbonzini

[-- Attachment #1: Type: text/plain, Size: 784 bytes --]

On 02/24/2015 02:50 AM, Wen Congyang wrote:
>> Script files are in general very hard to secure.  Libvirt marks any
>> domain that uses a script file for controlling networking as tainted,
>> because it cannot guarantee that the script did not do arbitrary
>> actions.  Can you come up with any better solution that does not require
>> a script file, such as having management software responsible for
>> passing in an already-opened fd?
> 
> Do you mean that opening the script in libvirt?
> 

No, I mean a solution that needs no script file at all.  Have libvirt
pre-open the TAP device you will need, then pass in the fd that will be
used for the colo NIC.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 17/27] COLO: Add new command parameter 'colo_nicname' 'colo_script' for net
  2015-02-24 16:30       ` Eric Blake
@ 2015-02-24 17:24         ` Daniel P. Berrange
  2015-02-25  8:21           ` zhanghailiang
  0 siblings, 1 reply; 65+ messages in thread
From: Daniel P. Berrange @ 2015-02-24 17:24 UTC (permalink / raw)
  To: Eric Blake
  Cc: zhanghailiang, Li Zhijian, yunhong.jiang, eddie.dong, qemu-devel,
	dgilbert, Gao feng, stefanha, pbonzini, peter.huangpeng

On Tue, Feb 24, 2015 at 09:30:56AM -0700, Eric Blake wrote:
> On 02/24/2015 02:50 AM, Wen Congyang wrote:
> >> Script files are in general very hard to secure.  Libvirt marks any
> >> domain that uses a script file for controlling networking as tainted,
> >> because it cannot guarantee that the script did not do arbitrary
> >> actions.  Can you come up with any better solution that does not require
> >> a script file, such as having management software responsible for
> >> passing in an already-opened fd?
> > 
> > Do you mean that opening the script in libvirt?
> > 
> 
> No, I mean a solution that needs no script file at all.  Have libvirt
> pre-open the TAP device you will need, then pass in the fd that will be
> used for the colo NIC.

Agreed, we really must not add new features that require executing
arbitrary blackbox shell scripts to QEMU, when we know that reslts in
a flawed security model. And just pushing the script execution upto
libvirt is not really a satisfactory solution either.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service
  2015-02-12  3:16 [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
                   ` (28 preceding siblings ...)
  2015-02-24 11:08 ` Dr. David Alan Gilbert
@ 2015-02-24 20:13 ` Dr. David Alan Gilbert
  2015-02-25  3:20   ` Gao feng
  29 siblings, 1 reply; 65+ messages in thread
From: Dr. David Alan Gilbert @ 2015-02-24 20:13 UTC (permalink / raw)
  To: zhanghailiang
  Cc: yunhong.jiang, eddie.dong, qemu-devel, peter.huangpeng, stefanha,
	pbonzini

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> This is the 3th version of COLO, it is only COLO frame part, include: VM checkpoint,
> failover, proxy API, block replication API, not include block replication.
> The block part has been sent by wencongyang:
> '[RFC PATCH 00/14] Block replication for continuous checkpoints'
> 
> You can get the integrated qemu colo patches from github:
> https://github.com/coloft/qemu/commits/colo-v1.0
> 
> Compared with the previous version, we have realized all parts of COLO frame, 
> and it is works now.
> 
> The main change since last version is, we use colo proxy mode instead of
> colo agent, they are all used for network packets compare, but proxy is more
> efficient, it is based on netfilter.
> Another modification is we implement new block replication scheme, 
> you can get more info from wencongyang's block patch series 
> 
> If you don't know about COLO, please refer to below link for detailed 
> information.
> 
> The idea is presented in Xen summit 2012, and 2013,
> and academia paper in SOCC 2013. It's also presented in KVM forum in 2013:
> http://www.linux-kvm.org/wiki/images/1/1d/Kvm-forum-2013-COLO.pdf
> 
> Previous posted RFC proposal:
> http://lists.nongnu.org/archive/html/qemu-devel/2014-06/msg05567.html
> http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg04459.html
> 
> The below is the detail about test COLO, you can also get this info
> from http://wiki.qemu.org/Features/COLO.
> * Hardware requirements
> There is at least one directly connected nic to forward the network requests
> from client to secondary VM. The directly connected nic must not be used by
> any other purpose.
> 
> * Network link topology
> =================================normal ======================================
>                                 +--------+
>                                 |client  |
>    master                       +----+---+                    slave
> -------------------------+           |            + -------------------------+
>    PVM                   |           +            |                          |
> +-------+         +----[eth0]-----[switch]-----[eth0]---------+              |
> |guest  |     +---+-+    |                        |       +---+-+            |
> |     [tap0]--+ br0 |    |                        |       | br0 |            |
> |       |     +-----+  [eth1]-----[forward]----[eth1]--+  +-----+      SVM   |
> +-------+                |                        |    |            +-------+|
>                          |                        |    |  +-----+   | guest ||
>                        [eth2]---[checkpoint]---[eth2]  +--+br1  |-[tap0]    ||
>                          |                        |       +-----+   |       ||
>                          |                        |                 +-------+|
> -------------------------+                        +--------------------------+
> e.g.
> master:
> br0: 192.168.0.33
> eth1: 192.168.1.33
> eth2: 192.168.2.33
> 
> slave:
> br0: 192.168.0.88
> br1: no ip address
> eth1: 192.168.1.88
> eth2: 192.168.2.88
> (Actually, you can also use eth0 as checkpoint channel)
> Note: in normal, SVM will always be linked to br1 like above until
> failover.
> 
> * Test environment prepare:
> 1. Set Up the Bridge and network environment
> You must setup you network environment like above picture,
> In master, setup a bridge br0, using command brctl, like:
> # ifconfig eth0 down
> # ifconfig eth0 0.0.0.0
> # brctl addbr br0
> # brctl addif br0 eth0
> # ifconfig br0 192.168.0.33 netmask 255.255.255.0
> # ifconfig eth0 up
> In slave, setup two bridge br0, br1, commands are same with above,
> please note that br1 is linked to eth1(the forward nic).
>
> 2.Qemu-ifup
> We need a script to bring up the TAP interface.
> You can find this info from http://en.wikibooks.org/wiki/QEMU/Networking.
> Master:
> root@master# cat /etc/qemu-ifup
> #!/bin/sh
> switch=br0
> if [ -n "$1" ]; then
>         ip link set $1 up
>         brctl addif ${switch} $1
> fi
> Slave:
> root@slave # cat /etc/qemu-ifup
> #!/bin/sh
> switch=br1  #in primary, switch is br0. in secondary switch is br1
> if [ -n "$1" ]; then
>         ip link set $1 up
>         brctl addif ${switch} $1
> fi 
> 
> 3. Prepare host kernel
> colo-proxy kernel module need cooperate with linux kernel.
> You should put a kernel patch 'colo-patch-for-kernel.patch'
> (It's based on linux kernel-3.19) which you can get from 
> https://github.com/gao-feng/colo-proxy.git
> and then compile kernel and intall the new kernel.
> 
> 4. Proxy module
> proxy module is used for network packets compare, you can also get the lastest
> version from: https://github.com/gao-feng/colo-proxy.git.
> You can compile and install it by using command 'make' && 'make install'.
> 
> 5. Modified iptables
> We have add a new rule to iptables command, so please get the patch from
> https://github.com/gao-feng/colo-proxy/blob/master/COLO-library_for_iptables-1.4.21.patch
> It is based on version 1.4.21.

I'm getting closer but I don't think I'm getting packets from the secondary
to the primary yet; it looks like the primary is holding onto it's packets
until the end of the 10second checkpoint.
Still debugging that but I'd take any tips.

> 2. startup qemu
> master:
> # qemu-system-x86_64 -enable-kvm -netdev tap,id=hn0,colo_script=./scripts/colo-proxy-script.sh,colo_nicname=eth1 -device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive driver=quorum,read-pattern=first,children.0.file.filename=suse11_3.img,children.0.driver=raw,children.1.file.driver=nbd+colo,children.1.file.host=192.168.2.88,children.1.file.port=8889,children.1.file.export=colo1,children.1.driver=raw,if=virtio -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -S
> slave:
> # qemu-system-x86_64 -enable-kvm -netdev tap,id=hn0,colo_script=./scripts/colo-proxy-script.sh,colo_nicname=eth1 -device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive driver=blkcolo,export=colo1,backing.file.filename=suse11_3.img,backing.driver=raw,if=virtio -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -incoming tcp:0:8888
> 
> 3. On Secondary VM's QEMU monitor, run
> (qemu) nbd_server_start 192.168.2.88:8889 
> 
> 4.on Primary VM's QEMU monitor, run following command:
> (qemu) migrate_set_capability colo on
> (qemu) migrate tcp:192.168.2.88:8888
> 
> 5. done
> You will see two runing VMs, whenever you make changes to PVM, SVM
> will be synced to PVM's state.
> 
> 6. failover test:
> You can kill SVM (PVM) and run 'colo_lost_heartbeat' in PVM's (SVM's) monitor
> at the same time, then PVM (SVM) will failover and client will not feel this
> change.

I've not got that to work yet; so far if I kill the PVM the SVM quits
shortly after.  (I've tried both 'q' in the PVMs monitor and also kill -9).

Dave

> It is still a framework, far away from commercial use,
> so any comments/feedbacks are warmly welcomed ;)
> 
> PS: 
> We (huawei) have cooperated with fujitsu on COLO work,
> and we work mainly on COLO frame and fujitsu will focus on COLO block.
> 
> TODO list:
> 1) Optimize the process of checkpoint, shorten the time-consuming
> 2) Add more debug/stat info 
> 3) Strengthen failover 
> 4) The capability of continuous FT
> 
> v3:
> - use proxy instead of colo agent to compare network packets
> - add block replication
> - Optimize failover disposal
> - handle shutdown
> 
> v2:
> - use QEMUSizedBuffer/QEMUFile as COLO buffer
> - colo support is enabled by default
> - add nic replication support
> - addressed comments from Eric Blake and Dr. David Alan Gilbert
> 
> v1:
> - implement the frame of colo
> 
> 
> zhanghailiang (27):
>   configure: Add parameter for configure to enable/disable COLO support
>   migration: Introduce capability 'colo' to migration
>   COLO: migrate colo related info to slave
>   migration: Integrate COLO checkpoint process into migration
>   migration: Integrate COLO checkpoint process into loadvm
>   migration: Don't send vm description in COLO mode
>   COLO: Implement colo checkpoint protocol
>   COLO: Add a new RunState RUN_STATE_COLO
>   QEMUSizedBuffer: Introduce two help functions for qsb
>   COLO: Save VM state to slave when do checkpoint
>   COLO RAM: Load PVM's dirty page into SVM's RAM cache temporarily
>   COLO VMstate: Load VM state into qsb before restore it
>   COLO RAM: Flush cached RAM into SVM's memory
>   COLO failover: Introduce a new command to trigger a failover
>   COLO failover: Implement COLO master/slave failover work
>   COLO failover: Don't do failover during loading VM's state
>   COLO: Add new command parameter 'colo_nicname' 'colo_script' for net
>   COLO NIC: Init/remove colo nic devices when add/cleanup tap devices
>   COLO NIC: Implement colo nic device interface configure()
>   COLO NIC : Implement colo nic init/destroy function
>   COLO NIC: Some init work related with proxy module
>   COLO: Do checkpoint according to the result of net packets comparing
>   COLO: Improve checkpoint efficiency by do additional periodic
>     checkpoint
>   COLO NIC: Implement NIC checkpoint and failover
>   COLO: Disable qdev hotplug when VM is in COLO mode
>   COLO: Implement shutdown checkpoint
>   COLO: Add block replication into colo process
> 
>  arch_init.c                            | 196 ++++++++-
>  configure                              |  14 +
>  hmp-commands.hx                        |  15 +
>  hmp.c                                  |   7 +
>  hmp.h                                  |   1 +
>  include/exec/cpu-all.h                 |   1 +
>  include/migration/migration-colo.h     |  57 +++
>  include/migration/migration-failover.h |  22 +
>  include/migration/migration.h          |  14 +
>  include/migration/qemu-file.h          |   3 +-
>  include/net/colo-nic.h                 |  25 ++
>  include/net/net.h                      |   4 +
>  include/sysemu/sysemu.h                |   3 +
>  migration/Makefile.objs                |   2 +
>  migration/colo-comm.c                  |  81 ++++
>  migration/colo-failover.c              |  48 +++
>  migration/colo.c                       | 743 +++++++++++++++++++++++++++++++++
>  migration/migration.c                  |  74 +++-
>  migration/qemu-file-buf.c              |  57 +++
>  net/Makefile.objs                      |   1 +
>  net/colo-nic.c                         | 438 +++++++++++++++++++
>  net/tap.c                              |  45 +-
>  qapi-schema.json                       |  27 +-
>  qemu-options.hx                        |  10 +-
>  qmp-commands.hx                        |  19 +
>  savevm.c                               |  10 +-
>  scripts/colo-proxy-script.sh           |  88 ++++
>  stubs/Makefile.objs                    |   1 +
>  stubs/migration-colo.c                 |  49 +++
>  vl.c                                   |  36 +-
>  30 files changed, 2047 insertions(+), 44 deletions(-)
>  create mode 100644 include/migration/migration-colo.h
>  create mode 100644 include/migration/migration-failover.h
>  create mode 100644 include/net/colo-nic.h
>  create mode 100644 migration/colo-comm.c
>  create mode 100644 migration/colo-failover.c
>  create mode 100644 migration/colo.c
>  create mode 100644 net/colo-nic.c
>  create mode 100755 scripts/colo-proxy-script.sh
>  create mode 100644 stubs/migration-colo.c
> 
> -- 
> 1.7.12.4
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service
  2015-02-24 20:13 ` Dr. David Alan Gilbert
@ 2015-02-25  3:20   ` Gao feng
  0 siblings, 0 replies; 65+ messages in thread
From: Gao feng @ 2015-02-25  3:20 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, zhanghailiang
  Cc: yunhong.jiang, eddie.dong, qemu-devel, peter.huangpeng, stefanha,
	pbonzini

On 02/25/2015 04:13 AM, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> This is the 3th version of COLO, it is only COLO frame part, include: VM checkpoint,
>> failover, proxy API, block replication API, not include block replication.
>> The block part has been sent by wencongyang:
>> '[RFC PATCH 00/14] Block replication for continuous checkpoints'
>>
>> You can get the integrated qemu colo patches from github:
>> https://github.com/coloft/qemu/commits/colo-v1.0
>>
>> Compared with the previous version, we have realized all parts of COLO frame, 
>> and it is works now.
>>
>> The main change since last version is, we use colo proxy mode instead of
>> colo agent, they are all used for network packets compare, but proxy is more
>> efficient, it is based on netfilter.
>> Another modification is we implement new block replication scheme, 
>> you can get more info from wencongyang's block patch series 
>>
>> If you don't know about COLO, please refer to below link for detailed 
>> information.
>>
>> The idea is presented in Xen summit 2012, and 2013,
>> and academia paper in SOCC 2013. It's also presented in KVM forum in 2013:
>> http://www.linux-kvm.org/wiki/images/1/1d/Kvm-forum-2013-COLO.pdf
>>
>> Previous posted RFC proposal:
>> http://lists.nongnu.org/archive/html/qemu-devel/2014-06/msg05567.html
>> http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg04459.html
>>
>> The below is the detail about test COLO, you can also get this info
>> from http://wiki.qemu.org/Features/COLO.
>> * Hardware requirements
>> There is at least one directly connected nic to forward the network requests
>> from client to secondary VM. The directly connected nic must not be used by
>> any other purpose.
>>
>> * Network link topology
>> =================================normal ======================================
>>                                 +--------+
>>                                 |client  |
>>    master                       +----+---+                    slave
>> -------------------------+           |            + -------------------------+
>>    PVM                   |           +            |                          |
>> +-------+         +----[eth0]-----[switch]-----[eth0]---------+              |
>> |guest  |     +---+-+    |                        |       +---+-+            |
>> |     [tap0]--+ br0 |    |                        |       | br0 |            |
>> |       |     +-----+  [eth1]-----[forward]----[eth1]--+  +-----+      SVM   |
>> +-------+                |                        |    |            +-------+|
>>                          |                        |    |  +-----+   | guest ||
>>                        [eth2]---[checkpoint]---[eth2]  +--+br1  |-[tap0]    ||
>>                          |                        |       +-----+   |       ||
>>                          |                        |                 +-------+|
>> -------------------------+                        +--------------------------+
>> e.g.
>> master:
>> br0: 192.168.0.33
>> eth1: 192.168.1.33
>> eth2: 192.168.2.33
>>
>> slave:
>> br0: 192.168.0.88
>> br1: no ip address
>> eth1: 192.168.1.88
>> eth2: 192.168.2.88
>> (Actually, you can also use eth0 as checkpoint channel)
>> Note: in normal, SVM will always be linked to br1 like above until
>> failover.
>>
>> * Test environment prepare:
>> 1. Set Up the Bridge and network environment
>> You must setup you network environment like above picture,
>> In master, setup a bridge br0, using command brctl, like:
>> # ifconfig eth0 down
>> # ifconfig eth0 0.0.0.0
>> # brctl addbr br0
>> # brctl addif br0 eth0
>> # ifconfig br0 192.168.0.33 netmask 255.255.255.0
>> # ifconfig eth0 up
>> In slave, setup two bridge br0, br1, commands are same with above,
>> please note that br1 is linked to eth1(the forward nic).
>>
>> 2.Qemu-ifup
>> We need a script to bring up the TAP interface.
>> You can find this info from http://en.wikibooks.org/wiki/QEMU/Networking.
>> Master:
>> root@master# cat /etc/qemu-ifup
>> #!/bin/sh
>> switch=br0
>> if [ -n "$1" ]; then
>>         ip link set $1 up
>>         brctl addif ${switch} $1
>> fi
>> Slave:
>> root@slave # cat /etc/qemu-ifup
>> #!/bin/sh
>> switch=br1  #in primary, switch is br0. in secondary switch is br1
>> if [ -n "$1" ]; then
>>         ip link set $1 up
>>         brctl addif ${switch} $1
>> fi 
>>
>> 3. Prepare host kernel
>> colo-proxy kernel module need cooperate with linux kernel.
>> You should put a kernel patch 'colo-patch-for-kernel.patch'
>> (It's based on linux kernel-3.19) which you can get from 
>> https://github.com/gao-feng/colo-proxy.git
>> and then compile kernel and intall the new kernel.
>>
>> 4. Proxy module
>> proxy module is used for network packets compare, you can also get the lastest
>> version from: https://github.com/gao-feng/colo-proxy.git.
>> You can compile and install it by using command 'make' && 'make install'.
>>
>> 5. Modified iptables
>> We have add a new rule to iptables command, so please get the patch from
>> https://github.com/gao-feng/colo-proxy/blob/master/COLO-library_for_iptables-1.4.21.patch
>> It is based on version 1.4.21.
> 
> I'm getting closer but I don't think I'm getting packets from the secondary
> to the primary yet; it looks like the primary is holding onto it's packets
> until the end of the 10second checkpoint.
> Still debugging that but I'd take any tips.
> 
Hi David,

you can use tcpdump on secondary's tap device to see if secondary guest already
sent out the packet. and tcpdump on primary's forward device to see if the secondary
guest's packets are sent to primary node.

thanks,
Gao

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 19/27] COLO NIC: Implement colo nic device interface configure()
  2015-02-16 12:03   ` Dr. David Alan Gilbert
@ 2015-02-25  3:44     ` zhanghailiang
  2015-02-25  9:08       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 65+ messages in thread
From: zhanghailiang @ 2015-02-25  3:44 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: hangaohuai, Li Zhijian, yunhong.jiang, eddie.dong,
	peter.huangpeng, qemu-devel, Gao feng, stefanha, pbonzini

On 2015/2/16 20:03, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> Implement colo nic device interface configure()
>> add a script to configure nic devices:
>> ${QEMU_SCRIPT_DIR}/colo-proxy-script.sh
>
> Do you have some more documentation of the new colo-proxy?  I've

Yes, gaofeng is writing it now...

> been reading the kernel module source and I can see that it's
> a nice idea to do the sequence number adjustment on the host,
> that reduces the need to modify the guest kernel; I was trying to
> figure out how you synchronise the master/slave idea of sequence numbers -
> is that purely from the 'ack' that's duplicated back to the secondary?

Yes, you've got it :)

> If you were unlucky and the 'ack' packet was lost on the duplicated
> link from the primary to secondary how would you recover?

The 'ack' packet will be consider to be lost, because the primary will not
respond to this 'ack' packet until it got secondary's response,
and client will resend it ('ack' packet).

> What about TCP connections setup before colo was activated?
>

Actually, now, we only support activate colo before guest is startup (for test procedure,
'-S' is needed for qemu command line).

> The other thought is that passing the 'sec_dev' as a module parameter
> gives you an artificial limitation; it forces all of the pairs
> to be between the same pair of hosts.   If the 'sec_dev' was a parameter
> to the connection then you could have different slaves associated with
> each guest on the primary host.
>

Hmm, do you mean we should pass this 'sec_dev' as a parameter from qemu to proxy module by
maybe ioctl ?
Yes, it is ugly to pass this 'sec_dev' directly to module as parameter.
We will consider this, thanks ;)

> Dave
> P.S. You probably need to clean the debug messages up in the kernel module!
>

OK, will do that.

>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> ---
>>   net/colo-nic.c               | 56 +++++++++++++++++++++++++++-
>>   scripts/colo-proxy-script.sh | 88 ++++++++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 143 insertions(+), 1 deletion(-)
>>   create mode 100755 scripts/colo-proxy-script.sh
>>
>> diff --git a/net/colo-nic.c b/net/colo-nic.c
>> index 965af49..f8fc35d 100644
>> --- a/net/colo-nic.c
>> +++ b/net/colo-nic.c
>> @@ -39,12 +39,66 @@ static bool colo_nic_support(NetClientState *nc)
>>       return nc && nc->colo_script[0] && nc->colo_nicname[0];
>>   }
>>
>> +static int launch_colo_script(char *argv[])
>> +{
>> +    int pid, status;
>> +    char *script = argv[0];
>> +
>> +    /* try to launch network script */
>> +    pid = fork();
>> +    if (pid == 0) {
>> +        execv(script, argv);
>> +        _exit(1);
>> +    } else if (pid > 0) {
>> +        while (waitpid(pid, &status, 0) != pid) {
>> +            /* loop */
>> +        }
>> +
>> +        if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
>> +            return 0;
>> +        }
>> +    }
>> +    return -1;
>> +}
>> +
>> +static int colo_nic_configure(NetClientState *nc,
>> +            bool up, int side, int index)
>> +{
>> +    int i, argc = 6;
>> +    char *argv[7], index_str[32];
>> +    char **parg;
>> +
>> +    if (!nc && index <= 0) {
>> +        error_report("Can not parse colo_script or colo_nicname");
>> +        return -1;
>> +    }
>> +
>> +    parg = argv;
>> +    *parg++ = nc->colo_script;
>> +    *parg++ = (char *)(side == COLO_SECONDARY_MODE ? "slave" : "master");
>> +    *parg++ = (char *)(up ? "install" : "uninstall");
>> +    *parg++ = nc->colo_nicname;
>> +    *parg++ = nc->ifname;
>> +    sprintf(index_str, "%d", index);
>> +    *parg++ = index_str;
>> +    *parg = NULL;
>> +
>> +    for (i = 0; i < argc; i++) {
>> +        if (!argv[i][0]) {
>> +            error_report("Can not get colo_script argument");
>> +            return -1;
>> +        }
>> +    }
>> +
>> +    return launch_colo_script(argv);
>> +}
>> +
>>   void colo_add_nic_devices(NetClientState *nc)
>>   {
>>       struct nic_device *nic = g_malloc0(sizeof(*nic));
>>
>>       nic->support_colo = colo_nic_support;
>> -    nic->configure = NULL;
>> +    nic->configure = colo_nic_configure;
>>       /*
>>        * TODO
>>        * only support "-netdev tap,colo_scripte..."  options
>> diff --git a/scripts/colo-proxy-script.sh b/scripts/colo-proxy-script.sh
>> new file mode 100755
>> index 0000000..c7aa53f
>> --- /dev/null
>> +++ b/scripts/colo-proxy-script.sh
>> @@ -0,0 +1,88 @@
>> +#!/bin/sh
>> +#usage: ./colo-proxy-script.sh master/slave install/uninstall phy_if virt_if index
>> +#.e.g ./colo-proxy-script.sh master install eth2 tap0 1
>> +
>> +side=$1
>> +action=$2
>> +phy_if=$3
>> +virt_if=$4
>> +index=$5
>> +br=br1
>> +failover_br=br0
>> +
>> +script_usage()
>> +{
>> +    echo -n "usage: ./colo-proxy-script.sh master/slave "
>> +    echo -e "install/uninstall phy_if virt_if index\n"
>> +}
>> +
>> +master_install()
>> +{
>> +    tc qdisc add dev $virt_if root handle 1: prio
>> +    tc filter add dev $virt_if parent 1: protocol ip prio 10 u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $phy_if
>> +    tc filter add dev $virt_if parent 1: protocol arp prio 11 u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $phy_if
>> +    tc filter add dev $virt_if parent 1: protocol ipv6 prio 12 u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $phy_if
>> +
>> +    modprobe nf_conntrack_ipv4
>> +    modprobe xt_PMYCOLO sec_dev=$phy_if
>> +
>> +    /usr/local/sbin/iptables -t mangle -I PREROUTING -m physdev --physdev-in $virt_if -j PMYCOLO --index $index
>> +    /usr/local/sbin/ip6tables -t mangle -I PREROUTING -m physdev --physdev-in $virt_if -j PMYCOLO --index $index
>> +    /usr/local/sbin/arptables -I INPUT -i $phy_if -j MARK --set-mark $index
>> +}
>> +
>> +master_uninstall()
>> +{
>> +    tc filter del dev $virt_if parent 1: protocol ip prio 10 u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $phy_if
>> +    tc filter del dev $virt_if parent 1: protocol arp prio 11 u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $phy_if
>> +    tc filter del dev $virt_if parent 1: protocol ipv6 prio 12 u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $phy_if
>> +    tc qdisc del dev $virt_if root handle 1: prio
>> +
>> +    /usr/local/sbin/iptables -t mangle -F
>> +    /usr/local/sbin/ip6tables -t mangle -F
>> +    /usr/local/sbin/arptables -F
>> +    rmmod xt_PMYCOLO
>> +}
>> +
>> +slave_install()
>> +{
>> +    brctl addif $br $phy_if
>> +    modprobe xt_SECCOLO
>> +
>> +    /usr/local/sbin/iptables -t mangle -I PREROUTING -m physdev --physdev-in $virt_if -j SECCOLO --index $index
>> +    /usr/local/sbin/ip6tables -t mangle -I PREROUTING -m physdev --physdev-in $virt_if -j SECCOLO --index $index
>> +}
>> +
>> +
>> +slave_uninstall()
>> +{
>> +    brctl delif $br $phy_if
>> +    brctl delif $br $virt_if
>> +    brctl addif $failover_br $virt_if
>> +
>> +    /usr/local/sbin/iptables -t mangle -F
>> +    /usr/local/sbin/ip6tables -t mangle -F
>> +    rmmod xt_SECCOLO
>> +}
>> +
>> +if [ $# -ne 5 ]; then
>> +    script_usage
>> +    exit 1
>> +fi
>> +
>> +if [ "x$side" != "xmaster" ] && [ "x$side" != "xslave" ]; then
>> +    script_usage
>> +    exit 2
>> +fi
>> +
>> +if [ "x$action" != "xinstall" ] && [ "x$action" != "xuninstall" ]; then
>> +    script_usage
>> +    exit 3
>> +fi
>> +
>> +if [ $index -lt 0 ] || [ $index -gt 100 ]; then
>> +    echo "index overflow"
>> +    exit 4
>> +fi
>> +
>> +${side}_${action}
>> --
>> 1.7.12.4
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service
  2015-02-16 13:11 ` [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service Dr. David Alan Gilbert
@ 2015-02-25  5:17   ` Gao feng
  0 siblings, 0 replies; 65+ messages in thread
From: Gao feng @ 2015-02-25  5:17 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, zhanghailiang
  Cc: yunhong.jiang, eddie.dong, qemu-devel, peter.huangpeng, stefanha,
	pbonzini

On 02/16/2015 09:11 PM, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> > This is the 3th version of COLO, it is only COLO frame part, include: VM checkpoint,
>> > failover, proxy API, block replication API, not include block replication.
>> > The block part has been sent by wencongyang:
>> > '[RFC PATCH 00/14] Block replication for continuous checkpoints'
>> > 
>> > You can get the integrated qemu colo patches from github:
>> > https://github.com/coloft/qemu/commits/colo-v1.0
>> > 
>> > Compared with the previous version, we have realized all parts of COLO frame, 
>> > and it is works now.
>> > 
>> > The main change since last version is, we use colo proxy mode instead of
>> > colo agent, they are all used for network packets compare, but proxy is more
>> > efficient, it is based on netfilter.
>> > Another modification is we implement new block replication scheme, 
>> > you can get more info from wencongyang's block patch series 
>> > 
>> > If you don't know about COLO, please refer to below link for detailed 
>> > information.
>> > 
>> > The idea is presented in Xen summit 2012, and 2013,
>> > and academia paper in SOCC 2013. It's also presented in KVM forum in 2013:
>> > http://www.linux-kvm.org/wiki/images/1/1d/Kvm-forum-2013-COLO.pdf
>> > 
>> > Previous posted RFC proposal:
>> > http://lists.nongnu.org/archive/html/qemu-devel/2014-06/msg05567.html
>> > http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg04459.html
>> > 
>> > The below is the detail about test COLO, you can also get this info
>> > from http://wiki.qemu.org/Features/COLO.
>> > * Hardware requirements
>> > There is at least one directly connected nic to forward the network requests
>> > from client to secondary VM. The directly connected nic must not be used by
>> > any other purpose.
>> > 
>> > * Network link topology
>> > =================================normal ======================================
>> >                                 +--------+
>> >                                 |client  |
>> >    master                       +----+---+                    slave
>> > -------------------------+           |            + -------------------------+
>> >    PVM                   |           +            |                          |
>> > +-------+         +----[eth0]-----[switch]-----[eth0]---------+              |
>> > |guest  |     +---+-+    |                        |       +---+-+            |
>> > |     [tap0]--+ br0 |    |                        |       | br0 |            |
>> > |       |     +-----+  [eth1]-----[forward]----[eth1]--+  +-----+      SVM   |
>> > +-------+                |                        |    |            +-------+|
>> >                          |                        |    |  +-----+   | guest ||
>> >                        [eth2]---[checkpoint]---[eth2]  +--+br1  |-[tap0]    ||
>> >                          |                        |       +-----+   |       ||
>> >                          |                        |                 +-------+|
>> > -------------------------+                        +--------------------------+
>> > e.g.
>> > master:
>> > br0: 192.168.0.33
>> > eth1: 192.168.1.33
>> > eth2: 192.168.2.33
>> > 
>> > slave:
>> > br0: 192.168.0.88
>> > br1: no ip address
>> > eth1: 192.168.1.88
>> > eth2: 192.168.2.88
>> > (Actually, you can also use eth0 as checkpoint channel)
>> > Note: in normal, SVM will always be linked to br1 like above until
>> > failover.
> Why does eth1 need IP addresses? 

Yes, eth1 shouldn't have IP address.

> Isn't the traffic on eth1 just a copy of the
> traffic on eth0 for the proxy modules to compare/forward?

you are right.

> Wouldn't any ARP traffic or the like generated from having IPs on those
> interfaces confuse the comparison process?

Yes, the arp packet sending from slaver eth1 may confuse the proxy.

> (Similarly for the bridges, is it best to turn off STP and the like
> to stop the bridges adding extra packets on eth1/eth0 ?)
> 

I haven't checked if enable STP will cause some problem, but proxy only
handle the packets witch has related netfilter conntrack. in my test,
proxy runs well when STP is enabled.

Thanks,
Gao

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 03/27] COLO: migrate colo related info to slave
  2015-02-16 23:20   ` Eric Blake
@ 2015-02-25  6:21     ` zhanghailiang
  0 siblings, 0 replies; 65+ messages in thread
From: zhanghailiang @ 2015-02-25  6:21 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: hangaohuai, Lai Jiangshan, yunhong.jiang, eddie.dong,
	peter.huangpeng, dgilbert, Gonglei, stefanha, pbonzini,
	Yang Hongyang

On 2015/2/17 7:20, Eric Blake wrote:
> On 02/11/2015 08:16 PM, zhanghailiang wrote:
>> We can know if we should go into COLO mode by the info that
>> has been migrated from PVM.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> ---
>>   include/migration/migration-colo.h | 21 ++++++++++++++
>>   migration/Makefile.objs            |  1 +
>>   migration/colo-comm.c              | 56 ++++++++++++++++++++++++++++++++++++++
>>   vl.c                               |  5 +++-
>>   4 files changed, 82 insertions(+), 1 deletion(-)
>>   create mode 100644 include/migration/migration-colo.h
>>   create mode 100644 migration/colo-comm.c
>
>> +
>> +/* #define DEBUG_COLO */
>> +
>> +#ifdef DEBUG_COLO
>> +#define DPRINTF(fmt, ...) \
>> +    do { fprintf(stdout, "COLO: " fmt, ## __VA_ARGS__); } while (0)
>> +#else
>> +#define DPRINTF(fmt, ...) \
>> +    do { } while (0)
>> +#endif
>
> This is not very good (that is, it is a great way to write stale
> debugging statements that tend to bit-rot, and later fail to compile
> when you turn debug on).  Better is a usage pattern that enforces that
> the debug compiles but has no impact.  For example, see how block/ssh.c
> defines DPRINTF.
>

OK, will fix that in next version, thanks.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 04/27] migration: Integrate COLO checkpoint process into migration
  2015-02-16 23:27   ` Eric Blake
@ 2015-02-25  6:43     ` zhanghailiang
  0 siblings, 0 replies; 65+ messages in thread
From: zhanghailiang @ 2015-02-25  6:43 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: hangaohuai, Lai Jiangshan, yunhong.jiang, eddie.dong,
	peter.huangpeng, dgilbert, Gonglei, stefanha, pbonzini

On 2015/2/17 7:27, Eric Blake wrote:
> On 02/11/2015 08:16 PM, zhanghailiang wrote:
>> Add a migrate state: MIG_STATE_COLO, enter this migration state
>> after the first live migration successfully finished.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>> ---
>>   include/migration/migration-colo.h |  2 ++
>>   include/migration/migration.h      | 13 +++++++
>>   migration/Makefile.objs            |  1 +
>>   migration/colo.c                   | 72 ++++++++++++++++++++++++++++++++++++++
>>   migration/migration.c              | 38 +++++++++++---------
>>   stubs/Makefile.objs                |  1 +
>>   stubs/migration-colo.c             | 17 +++++++++
>>   7 files changed, 128 insertions(+), 16 deletions(-)
>>   create mode 100644 migration/colo.c
>>   create mode 100644 stubs/migration-colo.c
>>
>
>> +++ b/include/migration/migration.h
>> @@ -65,6 +65,19 @@ struct MigrationState
>>       int64_t dirty_sync_count;
>>   };
>>
>> +enum {
>> +    MIG_STATE_ERROR = -1,
>> +    MIG_STATE_NONE,
>> +    MIG_STATE_SETUP,
>> +    MIG_STATE_CANCELLING,
>> +    MIG_STATE_CANCELLED,
>> +    MIG_STATE_ACTIVE,
>> +    MIG_STATE_COLO,
>> +    MIG_STATE_COMPLETED,
>> +};
>
> Is the new state intended to be user-visible?  If so, wouldn't it be
> better to expose this enum via qapi-schema.json?
>

No, for now it is only used internally.

>
>> +
>> +/* #define DEBUG_COLO */
>> +
>> +#ifdef DEBUG_COLO
>> +#define DPRINTF(fmt, ...) \
>> +do { fprintf(stdout, "colo: " fmt , ## __VA_ARGS__); } while (0)
>> +#else
>> +#define DPRINTF(fmt, ...) do {} while (0)
>> +#endif
>> +
>
> Same comment as in 3/27 about avoiding bit-rotting debug statements.  Or
> even better,...
>

OK, will fix it.

>> +static QEMUBH *colo_bh;
>> +
>> +static void *colo_thread(void *opaque)
>> +{
>> +    MigrationState *s = opaque;
>> +
>> +    qemu_mutex_lock_iothread();
>> +    vm_start();
>> +    qemu_mutex_unlock_iothread();
>> +    DPRINTF("vm resume to run\n");
>
> ...why not add tracepoints instead of using DPRINTF?
>

Hmm, we will change it to using tracepoints, for now, we use DPRINTF just for convenience.

>
>> @@ -227,6 +218,11 @@ MigrationInfo *qmp_query_migrate(Error **errp)
>>
>>           get_xbzrle_cache_stats(info);
>>           break;
>> +    case MIG_STATE_COLO:
>> +        info->has_status = true;
>> +        info->status = g_strdup("colo");
>> +        /* TODO: display COLO specific informations(checkpoint info etc.),*/
>> +        break;
>
> Uggh.  We REALLY need to fix MigrationInfo to convert 'status' to use an
> enum type, instead of an open-coded 'str' (such a conversion is
> backwards compatible, and better documented).  Then it would be more
> obvious that you are adding an enum value.  Doing the conversion would
> be a good prerequisite patch.
>

Good idea, i will do this, send a patch like that. ;)

> s/informations(checkpoint info etc.),/information (checkpoint info etc.)/
>

Will fix it, thanks.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 14/27] COLO failover: Introduce a new command to trigger a failover
  2015-02-16 23:47   ` Eric Blake
@ 2015-02-25  7:04     ` zhanghailiang
  2015-02-25  7:16       ` Hongyang Yang
                         ` (2 more replies)
  0 siblings, 3 replies; 65+ messages in thread
From: zhanghailiang @ 2015-02-25  7:04 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: hangaohuai, Lai Jiangshan, Li Zhijian, yunhong.jiang, eddie.dong,
	peter.huangpeng, dgilbert, stefanha, pbonzini, Yang Hongyang

On 2015/2/17 7:47, Eric Blake wrote:
> On 02/11/2015 08:17 PM, zhanghailiang wrote:
>> We leave users to use whatever heartbeat solution they want, if the heartbeat
>> is lost, or other errors they detect, they can use command
>> 'colo_lost_heartbeat' to tell COLO to do failover, COLO will do operations
>> accordingly.
>>
>> For example,
>> If send the command to PVM, Primary will exit COLO mode, and takeover,
>> if to Secondary, Secondary will do failover work and at last takeover server.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> ---
>>   hmp-commands.hx                        | 15 ++++++++++++++
>>   hmp.c                                  |  7 +++++++
>>   hmp.h                                  |  1 +
>>   include/migration/migration-colo.h     |  1 +
>>   include/migration/migration-failover.h | 20 ++++++++++++++++++
>>   migration/Makefile.objs                |  2 +-
>>   migration/colo-failover.c              | 38 ++++++++++++++++++++++++++++++++++
>>   migration/colo.c                       |  1 +
>>   qapi-schema.json                       |  9 ++++++++
>>   qmp-commands.hx                        | 19 +++++++++++++++++
>>   stubs/migration-colo.c                 |  8 +++++++
>>   11 files changed, 120 insertions(+), 1 deletion(-)
>>   create mode 100644 include/migration/migration-failover.h
>>   create mode 100644 migration/colo-failover.c
>>
>> diff --git a/hmp-commands.hx b/hmp-commands.hx
>
>> +++ b/qapi-schema.json
>> @@ -543,6 +543,15 @@
>>   { 'command': 'query-migrate-capabilities', 'returns':   ['MigrationCapabilityStatus']}
>>
>>   ##
>> +# @colo-lost-heartbeat
>> +#
>> +# Tell COLO that heartbeat is lost
>> +#
>> +# Since: 2.3
>> +##
>> +{ 'command': 'colo-lost-heartbeat' }
>
> Okay...
>
>> +
>> +##
>>   # @MouseInfo:
>>   #
>>   # Information about a mouse device.
>> diff --git a/qmp-commands.hx b/qmp-commands.hx
>> index a85d847..1b4a5ca 100644
>> --- a/qmp-commands.hx
>> +++ b/qmp-commands.hx
>> @@ -753,6 +753,25 @@ Example:
>>   EQMP
>>
>>       {
>> +        .name       = "colo_lost_heartbeat",
>
> ...but documented incorrectly (this should use '-' to match the command
> name in the .json file, not '_')
>

Er, yes, you are right, here it should be 'colo-lost-heartbeat' in qmp-commands.hx,
but 'colo_lost_heartbeat' in hmp-commands.hx, it is a little confused for me,
why it should be like this?

i will fix it.

>> +        .args_type  = "",
>> +        .mhandler.cmd_new = qmp_marshal_input_colo_lost_heartbeat,
>> +    },
>> +
>> +SQMP
>> +colo_lost_heartbeat
>> +--------------------
>> +
>> +Tell COLO that heartbeat is lost, a failover or takeover is needed.
>> +
>> +Example:
>> +
>> +-> { "execute": "colo_lost_heartbeat" }
>> +<- { "return": {} }
>
> This example won't work unless you fix the spelling.
>

Should here also be changed to 'colo-lost-heartbeat' ?

Thanks,
zhanghailiang

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 14/27] COLO failover: Introduce a new command to trigger a failover
  2015-02-25  7:04     ` zhanghailiang
@ 2015-02-25  7:16       ` Hongyang Yang
  2015-02-25  7:40       ` Wen Congyang
  2015-03-06 16:10       ` Eric Blake
  2 siblings, 0 replies; 65+ messages in thread
From: Hongyang Yang @ 2015-02-25  7:16 UTC (permalink / raw)
  To: zhanghailiang, Eric Blake, qemu-devel
  Cc: hangaohuai, Lai Jiangshan, Li Zhijian, yunhong.jiang, eddie.dong,
	peter.huangpeng, dgilbert, stefanha, pbonzini



在 02/25/2015 03:04 PM, zhanghailiang 写道:
> On 2015/2/17 7:47, Eric Blake wrote:
>> On 02/11/2015 08:17 PM, zhanghailiang wrote:
>>> We leave users to use whatever heartbeat solution they want, if the heartbeat
>>> is lost, or other errors they detect, they can use command
>>> 'colo_lost_heartbeat' to tell COLO to do failover, COLO will do operations
>>> accordingly.
>>>
>>> For example,
>>> If send the command to PVM, Primary will exit COLO mode, and takeover,
>>> if to Secondary, Secondary will do failover work and at last takeover server.
>>>
>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>>> ---
>>>   hmp-commands.hx                        | 15 ++++++++++++++
>>>   hmp.c                                  |  7 +++++++
>>>   hmp.h                                  |  1 +
>>>   include/migration/migration-colo.h     |  1 +
>>>   include/migration/migration-failover.h | 20 ++++++++++++++++++
>>>   migration/Makefile.objs                |  2 +-
>>>   migration/colo-failover.c              | 38 ++++++++++++++++++++++++++++++++++
>>>   migration/colo.c                       |  1 +
>>>   qapi-schema.json                       |  9 ++++++++
>>>   qmp-commands.hx                        | 19 +++++++++++++++++
>>>   stubs/migration-colo.c                 |  8 +++++++
>>>   11 files changed, 120 insertions(+), 1 deletion(-)
>>>   create mode 100644 include/migration/migration-failover.h
>>>   create mode 100644 migration/colo-failover.c
>>>
>>> diff --git a/hmp-commands.hx b/hmp-commands.hx
>>
>>> +++ b/qapi-schema.json
>>> @@ -543,6 +543,15 @@
>>>   { 'command': 'query-migrate-capabilities', 'returns':
>>> ['MigrationCapabilityStatus']}
>>>
>>>   ##
>>> +# @colo-lost-heartbeat
>>> +#
>>> +# Tell COLO that heartbeat is lost
>>> +#
>>> +# Since: 2.3
>>> +##
>>> +{ 'command': 'colo-lost-heartbeat' }
>>
>> Okay...
>>
>>> +
>>> +##
>>>   # @MouseInfo:
>>>   #
>>>   # Information about a mouse device.
>>> diff --git a/qmp-commands.hx b/qmp-commands.hx
>>> index a85d847..1b4a5ca 100644
>>> --- a/qmp-commands.hx
>>> +++ b/qmp-commands.hx
>>> @@ -753,6 +753,25 @@ Example:
>>>   EQMP
>>>
>>>       {
>>> +        .name       = "colo_lost_heartbeat",
>>
>> ...but documented incorrectly (this should use '-' to match the command
>> name in the .json file, not '_')
>>
>
> Er, yes, you are right, here it should be 'colo-lost-heartbeat' in qmp-commands.hx,
> but 'colo_lost_heartbeat' in hmp-commands.hx, it is a little confused for me,
> why it should be like this?
>
> i will fix it.
>
>>> +        .args_type  = "",
>>> +        .mhandler.cmd_new = qmp_marshal_input_colo_lost_heartbeat,
>>> +    },
>>> +
>>> +SQMP
>>> +colo_lost_heartbeat
>>> +--------------------
>>> +
>>> +Tell COLO that heartbeat is lost, a failover or takeover is needed.
>>> +
>>> +Example:
>>> +
>>> +-> { "execute": "colo_lost_heartbeat" }
>>> +<- { "return": {} }
>>
>> This example won't work unless you fix the spelling.
>>
>
> Should here also be changed to 'colo-lost-heartbeat' ?

No...

>
> Thanks,
> zhanghailiang
>
>
> .
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 14/27] COLO failover: Introduce a new command to trigger a failover
  2015-02-25  7:04     ` zhanghailiang
  2015-02-25  7:16       ` Hongyang Yang
@ 2015-02-25  7:40       ` Wen Congyang
  2015-03-06 16:10       ` Eric Blake
  2 siblings, 0 replies; 65+ messages in thread
From: Wen Congyang @ 2015-02-25  7:40 UTC (permalink / raw)
  To: zhanghailiang, Eric Blake, qemu-devel
  Cc: hangaohuai, Lai Jiangshan, Li Zhijian, yunhong.jiang, eddie.dong,
	peter.huangpeng, dgilbert, stefanha, pbonzini, Yang Hongyang

On 02/25/2015 03:04 PM, zhanghailiang wrote:
> On 2015/2/17 7:47, Eric Blake wrote:
>> On 02/11/2015 08:17 PM, zhanghailiang wrote:
>>> We leave users to use whatever heartbeat solution they want, if the heartbeat
>>> is lost, or other errors they detect, they can use command
>>> 'colo_lost_heartbeat' to tell COLO to do failover, COLO will do operations
>>> accordingly.
>>>
>>> For example,
>>> If send the command to PVM, Primary will exit COLO mode, and takeover,
>>> if to Secondary, Secondary will do failover work and at last takeover server.
>>>
>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>>> ---
>>>   hmp-commands.hx                        | 15 ++++++++++++++
>>>   hmp.c                                  |  7 +++++++
>>>   hmp.h                                  |  1 +
>>>   include/migration/migration-colo.h     |  1 +
>>>   include/migration/migration-failover.h | 20 ++++++++++++++++++
>>>   migration/Makefile.objs                |  2 +-
>>>   migration/colo-failover.c              | 38 ++++++++++++++++++++++++++++++++++
>>>   migration/colo.c                       |  1 +
>>>   qapi-schema.json                       |  9 ++++++++
>>>   qmp-commands.hx                        | 19 +++++++++++++++++
>>>   stubs/migration-colo.c                 |  8 +++++++
>>>   11 files changed, 120 insertions(+), 1 deletion(-)
>>>   create mode 100644 include/migration/migration-failover.h
>>>   create mode 100644 migration/colo-failover.c
>>>
>>> diff --git a/hmp-commands.hx b/hmp-commands.hx
>>
>>> +++ b/qapi-schema.json
>>> @@ -543,6 +543,15 @@
>>>   { 'command': 'query-migrate-capabilities', 'returns':   ['MigrationCapabilityStatus']}
>>>
>>>   ##
>>> +# @colo-lost-heartbeat
>>> +#
>>> +# Tell COLO that heartbeat is lost
>>> +#
>>> +# Since: 2.3
>>> +##
>>> +{ 'command': 'colo-lost-heartbeat' }
>>
>> Okay...
>>
>>> +
>>> +##
>>>   # @MouseInfo:
>>>   #
>>>   # Information about a mouse device.
>>> diff --git a/qmp-commands.hx b/qmp-commands.hx
>>> index a85d847..1b4a5ca 100644
>>> --- a/qmp-commands.hx
>>> +++ b/qmp-commands.hx
>>> @@ -753,6 +753,25 @@ Example:
>>>   EQMP
>>>
>>>       {
>>> +        .name       = "colo_lost_heartbeat",
>>
>> ...but documented incorrectly (this should use '-' to match the command
>> name in the .json file, not '_')
>>
> 
> Er, yes, you are right, here it should be 'colo-lost-heartbeat' in qmp-commands.hx,
> but 'colo_lost_heartbeat' in hmp-commands.hx, it is a little confused for me,
> why it should be like this?
> 
> i will fix it.
> 
>>> +        .args_type  = "",
>>> +        .mhandler.cmd_new = qmp_marshal_input_colo_lost_heartbeat,
>>> +    },
>>> +
>>> +SQMP
>>> +colo_lost_heartbeat

same here

>>> +--------------------
>>> +
>>> +Tell COLO that heartbeat is lost, a failover or takeover is needed.
>>> +
>>> +Example:
>>> +
>>> +-> { "execute": "colo_lost_heartbeat" }
>>> +<- { "return": {} }
>>
>> This example won't work unless you fix the spelling.
>>
> 
> Should here also be changed to 'colo-lost-heartbeat' ?

Yes.

Thanks
Wen Congyang

> 
> Thanks,
> zhanghailiang
> 
> 
> 
> .
> 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 17/27] COLO: Add new command parameter 'colo_nicname' 'colo_script' for net
  2015-02-16 23:50   ` Eric Blake
  2015-02-24  9:50     ` Wen Congyang
@ 2015-02-25  7:50     ` zhanghailiang
  1 sibling, 0 replies; 65+ messages in thread
From: zhanghailiang @ 2015-02-25  7:50 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: hangaohuai, Li Zhijian, yunhong.jiang, eddie.dong,
	peter.huangpeng, dgilbert, Gao feng, stefanha, pbonzini

On 2015/2/17 7:50, Eric Blake wrote:
> On 02/11/2015 08:17 PM, zhanghailiang wrote:
>> The 'colo_nicname' should be assigned with network name,
>> for exmple, 'eth2'. It will be parameter of 'colo_script',
>
> s/exmple/example/
>
>> 'colo_script' should be assigned with an scirpt path.
>
> s/an scirpt/a script/
>
>>
>> We parse these parameter in tap.
>
> Script files are in general very hard to secure.  Libvirt marks any
> domain that uses a script file for controlling networking as tainted,
> because it cannot guarantee that the script did not do arbitrary
> actions.  Can you come up with any better solution that does not require
> a script file, such as having management software responsible for
> passing in an already-opened fd?
>

Hmm, it is a good idea to discard the script, i will look into it later ;)

>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> ---
>>   include/net/net.h |  4 ++++
>>   net/tap.c         | 27 ++++++++++++++++++++++++---
>>   qapi-schema.json  |  8 +++++++-
>>   qemu-options.hx   | 10 +++++++++-
>>   4 files changed, 44 insertions(+), 5 deletions(-)
>>
>
>> +++ b/qapi-schema.json
>> @@ -2101,6 +2101,10 @@
>>   #
>>   # @queues: #optional number of queues to be created for multiqueue capable tap
>>   #
>> +# @colo_nicname: #optional the host physical nic for QEMU (Since 2.3)
>> +#
>> +# @colo_script: #optional the script file which used by COLO (Since 2.3)
>
> s/_/-/ in both parameter names, please.  Since they are optional, it
> might be worth documenting what they default to when not present.
>

OK, will fix that. thanks.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 17/27] COLO: Add new command parameter 'colo_nicname' 'colo_script' for net
  2015-02-24 17:24         ` Daniel P. Berrange
@ 2015-02-25  8:21           ` zhanghailiang
  2015-02-25 10:09             ` Daniel P. Berrange
  0 siblings, 1 reply; 65+ messages in thread
From: zhanghailiang @ 2015-02-25  8:21 UTC (permalink / raw)
  To: Daniel P. Berrange, Eric Blake
  Cc: hangaohuai, Li Zhijian, yunhong.jiang, eddie.dong,
	peter.huangpeng, qemu-devel, Gao feng, stefanha, pbonzini,
	dgilbert

On 2015/2/25 1:24, Daniel P. Berrange wrote:
> On Tue, Feb 24, 2015 at 09:30:56AM -0700, Eric Blake wrote:
>> On 02/24/2015 02:50 AM, Wen Congyang wrote:
>>>> Script files are in general very hard to secure.  Libvirt marks any
>>>> domain that uses a script file for controlling networking as tainted,
>>>> because it cannot guarantee that the script did not do arbitrary
>>>> actions.  Can you come up with any better solution that does not require
>>>> a script file, such as having management software responsible for
>>>> passing in an already-opened fd?
>>>
>>> Do you mean that opening the script in libvirt?
>>>
>>
>> No, I mean a solution that needs no script file at all.  Have libvirt
>> pre-open the TAP device you will need, then pass in the fd that will be
>> used for the colo NIC.
>
> Agreed, we really must not add new features that require executing
> arbitrary blackbox shell scripts to QEMU, when we know that reslts in
> a flawed security model. And just pushing the script execution upto
> libvirt is not really a satisfactory solution either.
>

Hmm, this script is mainly used for controlling net packet forward by using tc
command and setting iptable rules for colo by using iptables command.
Is there any API for linux iptables and tc (traffic control) ?

Thanks,
zhanghailiang

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 19/27] COLO NIC: Implement colo nic device interface configure()
  2015-02-25  3:44     ` zhanghailiang
@ 2015-02-25  9:08       ` Dr. David Alan Gilbert
  2015-02-25  9:38         ` zhanghailiang
  0 siblings, 1 reply; 65+ messages in thread
From: Dr. David Alan Gilbert @ 2015-02-25  9:08 UTC (permalink / raw)
  To: zhanghailiang
  Cc: Li Zhijian, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, Gao feng, stefanha, pbonzini

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> On 2015/2/16 20:03, Dr. David Alan Gilbert wrote:
> >* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> >>Implement colo nic device interface configure()
> >>add a script to configure nic devices:
> >>${QEMU_SCRIPT_DIR}/colo-proxy-script.sh
> >
> >Do you have some more documentation of the new colo-proxy?  I've
> 
> Yes, gaofeng is writing it now...

Great.

> >been reading the kernel module source and I can see that it's
> >a nice idea to do the sequence number adjustment on the host,
> >that reduces the need to modify the guest kernel; I was trying to
> >figure out how you synchronise the master/slave idea of sequence numbers -
> >is that purely from the 'ack' that's duplicated back to the secondary?
> 
> Yes, you've got it :)
> 
> >If you were unlucky and the 'ack' packet was lost on the duplicated
> >link from the primary to secondary how would you recover?
> 
> The 'ack' packet will be consider to be lost, because the primary will not
> respond to this 'ack' packet until it got secondary's response,
> and client will resend it ('ack' packet).
> 
> >What about TCP connections setup before colo was activated?
> >
> 
> Actually, now, we only support activate colo before guest is startup (for test procedure,
> '-S' is needed for qemu command line).

Consider this:
     1) Start primary
     2) Start secondary
     3) Start the colo pairing
     4) Primary fails
     5) Colo failover to secondary

Now we have only the old secondary running; we'd really like to get back to
having a pair of fault-tolerant hosts, so it would be good to be able to:

     6) Make the old secondary the new primary
     7) Add a new secondary
     8) Start colo-pairing to the new secondary

You could theoretically do this with colo-agent, but not with colo-proxy.

> >The other thought is that passing the 'sec_dev' as a module parameter
> >gives you an artificial limitation; it forces all of the pairs
> >to be between the same pair of hosts.   If the 'sec_dev' was a parameter
> >to the connection then you could have different slaves associated with
> >each guest on the primary host.
> >
> 
> Hmm, do you mean we should pass this 'sec_dev' as a parameter from qemu to proxy module by
> maybe ioctl ?

Yes, ioctl or tc or whatever; and make it per-guest.

> Yes, it is ugly to pass this 'sec_dev' directly to module as parameter.
> We will consider this, thanks ;)

Thanks!

> >Dave
> >P.S. You probably need to clean the debug messages up in the kernel module!
> >
> 
> OK, will do that.

Thanks.

Dave

> 
> >>Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> >>Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
> >>Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> >>---
> >>  net/colo-nic.c               | 56 +++++++++++++++++++++++++++-
> >>  scripts/colo-proxy-script.sh | 88 ++++++++++++++++++++++++++++++++++++++++++++
> >>  2 files changed, 143 insertions(+), 1 deletion(-)
> >>  create mode 100755 scripts/colo-proxy-script.sh
> >>
> >>diff --git a/net/colo-nic.c b/net/colo-nic.c
> >>index 965af49..f8fc35d 100644
> >>--- a/net/colo-nic.c
> >>+++ b/net/colo-nic.c
> >>@@ -39,12 +39,66 @@ static bool colo_nic_support(NetClientState *nc)
> >>      return nc && nc->colo_script[0] && nc->colo_nicname[0];
> >>  }
> >>
> >>+static int launch_colo_script(char *argv[])
> >>+{
> >>+    int pid, status;
> >>+    char *script = argv[0];
> >>+
> >>+    /* try to launch network script */
> >>+    pid = fork();
> >>+    if (pid == 0) {
> >>+        execv(script, argv);
> >>+        _exit(1);
> >>+    } else if (pid > 0) {
> >>+        while (waitpid(pid, &status, 0) != pid) {
> >>+            /* loop */
> >>+        }
> >>+
> >>+        if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
> >>+            return 0;
> >>+        }
> >>+    }
> >>+    return -1;
> >>+}
> >>+
> >>+static int colo_nic_configure(NetClientState *nc,
> >>+            bool up, int side, int index)
> >>+{
> >>+    int i, argc = 6;
> >>+    char *argv[7], index_str[32];
> >>+    char **parg;
> >>+
> >>+    if (!nc && index <= 0) {
> >>+        error_report("Can not parse colo_script or colo_nicname");
> >>+        return -1;
> >>+    }
> >>+
> >>+    parg = argv;
> >>+    *parg++ = nc->colo_script;
> >>+    *parg++ = (char *)(side == COLO_SECONDARY_MODE ? "slave" : "master");
> >>+    *parg++ = (char *)(up ? "install" : "uninstall");
> >>+    *parg++ = nc->colo_nicname;
> >>+    *parg++ = nc->ifname;
> >>+    sprintf(index_str, "%d", index);
> >>+    *parg++ = index_str;
> >>+    *parg = NULL;
> >>+
> >>+    for (i = 0; i < argc; i++) {
> >>+        if (!argv[i][0]) {
> >>+            error_report("Can not get colo_script argument");
> >>+            return -1;
> >>+        }
> >>+    }
> >>+
> >>+    return launch_colo_script(argv);
> >>+}
> >>+
> >>  void colo_add_nic_devices(NetClientState *nc)
> >>  {
> >>      struct nic_device *nic = g_malloc0(sizeof(*nic));
> >>
> >>      nic->support_colo = colo_nic_support;
> >>-    nic->configure = NULL;
> >>+    nic->configure = colo_nic_configure;
> >>      /*
> >>       * TODO
> >>       * only support "-netdev tap,colo_scripte..."  options
> >>diff --git a/scripts/colo-proxy-script.sh b/scripts/colo-proxy-script.sh
> >>new file mode 100755
> >>index 0000000..c7aa53f
> >>--- /dev/null
> >>+++ b/scripts/colo-proxy-script.sh
> >>@@ -0,0 +1,88 @@
> >>+#!/bin/sh
> >>+#usage: ./colo-proxy-script.sh master/slave install/uninstall phy_if virt_if index
> >>+#.e.g ./colo-proxy-script.sh master install eth2 tap0 1
> >>+
> >>+side=$1
> >>+action=$2
> >>+phy_if=$3
> >>+virt_if=$4
> >>+index=$5
> >>+br=br1
> >>+failover_br=br0
> >>+
> >>+script_usage()
> >>+{
> >>+    echo -n "usage: ./colo-proxy-script.sh master/slave "
> >>+    echo -e "install/uninstall phy_if virt_if index\n"
> >>+}
> >>+
> >>+master_install()
> >>+{
> >>+    tc qdisc add dev $virt_if root handle 1: prio
> >>+    tc filter add dev $virt_if parent 1: protocol ip prio 10 u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $phy_if
> >>+    tc filter add dev $virt_if parent 1: protocol arp prio 11 u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $phy_if
> >>+    tc filter add dev $virt_if parent 1: protocol ipv6 prio 12 u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $phy_if
> >>+
> >>+    modprobe nf_conntrack_ipv4
> >>+    modprobe xt_PMYCOLO sec_dev=$phy_if
> >>+
> >>+    /usr/local/sbin/iptables -t mangle -I PREROUTING -m physdev --physdev-in $virt_if -j PMYCOLO --index $index
> >>+    /usr/local/sbin/ip6tables -t mangle -I PREROUTING -m physdev --physdev-in $virt_if -j PMYCOLO --index $index
> >>+    /usr/local/sbin/arptables -I INPUT -i $phy_if -j MARK --set-mark $index
> >>+}
> >>+
> >>+master_uninstall()
> >>+{
> >>+    tc filter del dev $virt_if parent 1: protocol ip prio 10 u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $phy_if
> >>+    tc filter del dev $virt_if parent 1: protocol arp prio 11 u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $phy_if
> >>+    tc filter del dev $virt_if parent 1: protocol ipv6 prio 12 u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $phy_if
> >>+    tc qdisc del dev $virt_if root handle 1: prio
> >>+
> >>+    /usr/local/sbin/iptables -t mangle -F
> >>+    /usr/local/sbin/ip6tables -t mangle -F
> >>+    /usr/local/sbin/arptables -F
> >>+    rmmod xt_PMYCOLO
> >>+}
> >>+
> >>+slave_install()
> >>+{
> >>+    brctl addif $br $phy_if
> >>+    modprobe xt_SECCOLO
> >>+
> >>+    /usr/local/sbin/iptables -t mangle -I PREROUTING -m physdev --physdev-in $virt_if -j SECCOLO --index $index
> >>+    /usr/local/sbin/ip6tables -t mangle -I PREROUTING -m physdev --physdev-in $virt_if -j SECCOLO --index $index
> >>+}
> >>+
> >>+
> >>+slave_uninstall()
> >>+{
> >>+    brctl delif $br $phy_if
> >>+    brctl delif $br $virt_if
> >>+    brctl addif $failover_br $virt_if
> >>+
> >>+    /usr/local/sbin/iptables -t mangle -F
> >>+    /usr/local/sbin/ip6tables -t mangle -F
> >>+    rmmod xt_SECCOLO
> >>+}
> >>+
> >>+if [ $# -ne 5 ]; then
> >>+    script_usage
> >>+    exit 1
> >>+fi
> >>+
> >>+if [ "x$side" != "xmaster" ] && [ "x$side" != "xslave" ]; then
> >>+    script_usage
> >>+    exit 2
> >>+fi
> >>+
> >>+if [ "x$action" != "xinstall" ] && [ "x$action" != "xuninstall" ]; then
> >>+    script_usage
> >>+    exit 3
> >>+fi
> >>+
> >>+if [ $index -lt 0 ] || [ $index -gt 100 ]; then
> >>+    echo "index overflow"
> >>+    exit 4
> >>+fi
> >>+
> >>+${side}_${action}
> >>--
> >>1.7.12.4
> >>
> >>
> >--
> >Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
> >.
> >
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 02/27] migration: Introduce capability 'colo' to migration
  2015-02-16 21:57   ` Eric Blake
@ 2015-02-25  9:19     ` zhanghailiang
  0 siblings, 0 replies; 65+ messages in thread
From: zhanghailiang @ 2015-02-25  9:19 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: hangaohuai, Lai Jiangshan, yunhong.jiang, eddie.dong,
	peter.huangpeng, dgilbert, Gonglei, stefanha, pbonzini,
	Yang Hongyang

On 2015/2/17 5:57, Eric Blake wrote:
> On 02/11/2015 08:16 PM, zhanghailiang wrote:
>> This capability allows Primary VM (PVM) to be continuously checkpointed
>> to secondary VM.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>> ---
>>   include/migration/migration.h |  1 +
>>   migration/migration.c         | 15 +++++++++++++++
>>   qapi-schema.json              |  5 ++++-
>>   3 files changed, 20 insertions(+), 1 deletion(-)
>>
>
>> +++ b/migration/migration.c
>> @@ -276,6 +276,15 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
>>       }
>>
>>       for (cap = params; cap; cap = cap->next) {
>> +#ifndef CONFIG_COLO
>> +        if (cap->value->capability == MIGRATION_CAPABILITY_COLO &&
>> +            cap->value->state) {
>> +            error_setg(errp, "COLO is not currently supported, please"
>> +                             " configure with --enable-colo option in order to"
>> +                             " support COLO feature");
>> +            continue;
>> +        }
>> +#endif
>>           s->enabled_capabilities[cap->value->capability] = cap->value->state;
>>       }
>
> Yuck.  This means that probing whether colo is supported requires a
> usage test (try setting the capability with migrate-set-capabilities and
> see if it fails) instead of a query test (list the current capabilities;
> if colo is in the set then it is supported).  Can you figure out a way
> to avoid exposing the colo capability if !CONFIG_COLO, so that
> query-migate-capabilities is sufficient to learn if colo is supported?
>

What about using colo_supported() function to instead of compile macro ?
Like what we have done in v2 version:
in colo.c

bool colo_supported(void)
{
     return true;
}

in stubs/migration-colo.c
bool colo_supported(void)
{
     return false;
}

And then we call this function in qmp_migrate_set_capabilities
@@ -292,15 +295,13 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
      }

      for (cap = params; cap; cap = cap->next) {
-#ifndef CONFIG_COLO
          if (cap->value->capability == MIGRATION_CAPABILITY_COLO &&
-            cap->value->state) {
+            cap->value->state && !colo_supported()) {
              error_setg(errp, "COLO is not currently supported, please"
                               " configure with --enable-colo option in order to"
                               " support COLO feature");
              continue;
          }
-#endif
          s->enabled_capabilities[cap->value->capability] = cap->value->state;
      }
  }

For qmp_query_migrate_capabilities we call it like:

@@ -158,6 +158,9 @@ MigrationCapabilityStatusList *qmp_query_migrate_capabilities(Error **errp)

      caps = NULL; /* silence compiler warning */
      for (i = 0; i < MIGRATION_CAPABILITY_MAX; i++) {
+        if (i == MIGRATION_CAPABILITY_COLO && !colo_supported()) {
+            continue;
+        }

Thanks,
zhanghailiang

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 19/27] COLO NIC: Implement colo nic device interface configure()
  2015-02-25  9:08       ` Dr. David Alan Gilbert
@ 2015-02-25  9:38         ` zhanghailiang
  2015-02-25  9:40           ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 65+ messages in thread
From: zhanghailiang @ 2015-02-25  9:38 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: hangaohuai, Li Zhijian, yunhong.jiang, eddie.dong,
	peter.huangpeng, qemu-devel, Gao feng, stefanha, pbonzini

On 2015/2/25 17:08, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> On 2015/2/16 20:03, Dr. David Alan Gilbert wrote:
>>> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>>>> Implement colo nic device interface configure()
>>>> add a script to configure nic devices:
>>>> ${QEMU_SCRIPT_DIR}/colo-proxy-script.sh
>>>
>>> Do you have some more documentation of the new colo-proxy?  I've
>>
>> Yes, gaofeng is writing it now...
>
> Great.
>
>>> been reading the kernel module source and I can see that it's
>>> a nice idea to do the sequence number adjustment on the host,
>>> that reduces the need to modify the guest kernel; I was trying to
>>> figure out how you synchronise the master/slave idea of sequence numbers -
>>> is that purely from the 'ack' that's duplicated back to the secondary?
>>
>> Yes, you've got it :)
>>
>>> If you were unlucky and the 'ack' packet was lost on the duplicated
>>> link from the primary to secondary how would you recover?
>>
>> The 'ack' packet will be consider to be lost, because the primary will not
>> respond to this 'ack' packet until it got secondary's response,
>> and client will resend it ('ack' packet).
>>
>>> What about TCP connections setup before colo was activated?
>>>
>>
>> Actually, now, we only support activate colo before guest is startup (for test procedure,
>> '-S' is needed for qemu command line).
>
> Consider this:
>       1) Start primary
>       2) Start secondary
>       3) Start the colo pairing
>       4) Primary fails
>       5) Colo failover to secondary
>
> Now we have only the old secondary running; we'd really like to get back to
> having a pair of fault-tolerant hosts, so it would be good to be able to:
>
>       6) Make the old secondary the new primary
>       7) Add a new secondary
>       8) Start colo-pairing to the new secondary
>

Er, what you described is continuous FT, yes, it is in our TODO list.

> You could theoretically do this with colo-agent, but not with colo-proxy.
>

We have decided using colo-proxy, it has more advantages.

>>> The other thought is that passing the 'sec_dev' as a module parameter
>>> gives you an artificial limitation; it forces all of the pairs
>>> to be between the same pair of hosts.   If the 'sec_dev' was a parameter
>>> to the connection then you could have different slaves associated with
>>> each guest on the primary host.
>>>
>>
>> Hmm, do you mean we should pass this 'sec_dev' as a parameter from qemu to proxy module by
>> maybe ioctl ?
>
> Yes, ioctl or tc or whatever; and make it per-guest.
>

OK. Thanks.

>> Yes, it is ugly to pass this 'sec_dev' directly to module as parameter.
>> We will consider this, thanks ;)
>
> Thanks!
>
>>> Dave
>>> P.S. You probably need to clean the debug messages up in the kernel module!
>>>
>>
>> OK, will do that.
>
> Thanks.
>
> Dave
>
>>
>>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>>> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
>>>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>>>> ---
>>>>   net/colo-nic.c               | 56 +++++++++++++++++++++++++++-
>>>>   scripts/colo-proxy-script.sh | 88 ++++++++++++++++++++++++++++++++++++++++++++
>>>>   2 files changed, 143 insertions(+), 1 deletion(-)
>>>>   create mode 100755 scripts/colo-proxy-script.sh
>>>>
>>>> diff --git a/net/colo-nic.c b/net/colo-nic.c
>>>> index 965af49..f8fc35d 100644
>>>> --- a/net/colo-nic.c
>>>> +++ b/net/colo-nic.c
>>>> @@ -39,12 +39,66 @@ static bool colo_nic_support(NetClientState *nc)
>>>>       return nc && nc->colo_script[0] && nc->colo_nicname[0];
>>>>   }
>>>>
>>>> +static int launch_colo_script(char *argv[])
>>>> +{
>>>> +    int pid, status;
>>>> +    char *script = argv[0];
>>>> +
>>>> +    /* try to launch network script */
>>>> +    pid = fork();
>>>> +    if (pid == 0) {
>>>> +        execv(script, argv);
>>>> +        _exit(1);
>>>> +    } else if (pid > 0) {
>>>> +        while (waitpid(pid, &status, 0) != pid) {
>>>> +            /* loop */
>>>> +        }
>>>> +
>>>> +        if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
>>>> +            return 0;
>>>> +        }
>>>> +    }
>>>> +    return -1;
>>>> +}
>>>> +
>>>> +static int colo_nic_configure(NetClientState *nc,
>>>> +            bool up, int side, int index)
>>>> +{
>>>> +    int i, argc = 6;
>>>> +    char *argv[7], index_str[32];
>>>> +    char **parg;
>>>> +
>>>> +    if (!nc && index <= 0) {
>>>> +        error_report("Can not parse colo_script or colo_nicname");
>>>> +        return -1;
>>>> +    }
>>>> +
>>>> +    parg = argv;
>>>> +    *parg++ = nc->colo_script;
>>>> +    *parg++ = (char *)(side == COLO_SECONDARY_MODE ? "slave" : "master");
>>>> +    *parg++ = (char *)(up ? "install" : "uninstall");
>>>> +    *parg++ = nc->colo_nicname;
>>>> +    *parg++ = nc->ifname;
>>>> +    sprintf(index_str, "%d", index);
>>>> +    *parg++ = index_str;
>>>> +    *parg = NULL;
>>>> +
>>>> +    for (i = 0; i < argc; i++) {
>>>> +        if (!argv[i][0]) {
>>>> +            error_report("Can not get colo_script argument");
>>>> +            return -1;
>>>> +        }
>>>> +    }
>>>> +
>>>> +    return launch_colo_script(argv);
>>>> +}
>>>> +
>>>>   void colo_add_nic_devices(NetClientState *nc)
>>>>   {
>>>>       struct nic_device *nic = g_malloc0(sizeof(*nic));
>>>>
>>>>       nic->support_colo = colo_nic_support;
>>>> -    nic->configure = NULL;
>>>> +    nic->configure = colo_nic_configure;
>>>>       /*
>>>>        * TODO
>>>>        * only support "-netdev tap,colo_scripte..."  options
>>>> diff --git a/scripts/colo-proxy-script.sh b/scripts/colo-proxy-script.sh
>>>> new file mode 100755
>>>> index 0000000..c7aa53f
>>>> --- /dev/null
>>>> +++ b/scripts/colo-proxy-script.sh
>>>> @@ -0,0 +1,88 @@
>>>> +#!/bin/sh
>>>> +#usage: ./colo-proxy-script.sh master/slave install/uninstall phy_if virt_if index
>>>> +#.e.g ./colo-proxy-script.sh master install eth2 tap0 1
>>>> +
>>>> +side=$1
>>>> +action=$2
>>>> +phy_if=$3
>>>> +virt_if=$4
>>>> +index=$5
>>>> +br=br1
>>>> +failover_br=br0
>>>> +
>>>> +script_usage()
>>>> +{
>>>> +    echo -n "usage: ./colo-proxy-script.sh master/slave "
>>>> +    echo -e "install/uninstall phy_if virt_if index\n"
>>>> +}
>>>> +
>>>> +master_install()
>>>> +{
>>>> +    tc qdisc add dev $virt_if root handle 1: prio
>>>> +    tc filter add dev $virt_if parent 1: protocol ip prio 10 u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $phy_if
>>>> +    tc filter add dev $virt_if parent 1: protocol arp prio 11 u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $phy_if
>>>> +    tc filter add dev $virt_if parent 1: protocol ipv6 prio 12 u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $phy_if
>>>> +
>>>> +    modprobe nf_conntrack_ipv4
>>>> +    modprobe xt_PMYCOLO sec_dev=$phy_if
>>>> +
>>>> +    /usr/local/sbin/iptables -t mangle -I PREROUTING -m physdev --physdev-in $virt_if -j PMYCOLO --index $index
>>>> +    /usr/local/sbin/ip6tables -t mangle -I PREROUTING -m physdev --physdev-in $virt_if -j PMYCOLO --index $index
>>>> +    /usr/local/sbin/arptables -I INPUT -i $phy_if -j MARK --set-mark $index
>>>> +}
>>>> +
>>>> +master_uninstall()
>>>> +{
>>>> +    tc filter del dev $virt_if parent 1: protocol ip prio 10 u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $phy_if
>>>> +    tc filter del dev $virt_if parent 1: protocol arp prio 11 u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $phy_if
>>>> +    tc filter del dev $virt_if parent 1: protocol ipv6 prio 12 u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $phy_if
>>>> +    tc qdisc del dev $virt_if root handle 1: prio
>>>> +
>>>> +    /usr/local/sbin/iptables -t mangle -F
>>>> +    /usr/local/sbin/ip6tables -t mangle -F
>>>> +    /usr/local/sbin/arptables -F
>>>> +    rmmod xt_PMYCOLO
>>>> +}
>>>> +
>>>> +slave_install()
>>>> +{
>>>> +    brctl addif $br $phy_if
>>>> +    modprobe xt_SECCOLO
>>>> +
>>>> +    /usr/local/sbin/iptables -t mangle -I PREROUTING -m physdev --physdev-in $virt_if -j SECCOLO --index $index
>>>> +    /usr/local/sbin/ip6tables -t mangle -I PREROUTING -m physdev --physdev-in $virt_if -j SECCOLO --index $index
>>>> +}
>>>> +
>>>> +
>>>> +slave_uninstall()
>>>> +{
>>>> +    brctl delif $br $phy_if
>>>> +    brctl delif $br $virt_if
>>>> +    brctl addif $failover_br $virt_if
>>>> +
>>>> +    /usr/local/sbin/iptables -t mangle -F
>>>> +    /usr/local/sbin/ip6tables -t mangle -F
>>>> +    rmmod xt_SECCOLO
>>>> +}
>>>> +
>>>> +if [ $# -ne 5 ]; then
>>>> +    script_usage
>>>> +    exit 1
>>>> +fi
>>>> +
>>>> +if [ "x$side" != "xmaster" ] && [ "x$side" != "xslave" ]; then
>>>> +    script_usage
>>>> +    exit 2
>>>> +fi
>>>> +
>>>> +if [ "x$action" != "xinstall" ] && [ "x$action" != "xuninstall" ]; then
>>>> +    script_usage
>>>> +    exit 3
>>>> +fi
>>>> +
>>>> +if [ $index -lt 0 ] || [ $index -gt 100 ]; then
>>>> +    echo "index overflow"
>>>> +    exit 4
>>>> +fi
>>>> +
>>>> +${side}_${action}
>>>> --
>>>> 1.7.12.4
>>>>
>>>>
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>
>>> .
>>>
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 19/27] COLO NIC: Implement colo nic device interface configure()
  2015-02-25  9:38         ` zhanghailiang
@ 2015-02-25  9:40           ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 65+ messages in thread
From: Dr. David Alan Gilbert @ 2015-02-25  9:40 UTC (permalink / raw)
  To: zhanghailiang
  Cc: hangaohuai, Li Zhijian, yunhong.jiang, eddie.dong,
	peter.huangpeng, qemu-devel, Gao feng, stefanha, pbonzini

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> On 2015/2/25 17:08, Dr. David Alan Gilbert wrote:
> >* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> >>On 2015/2/16 20:03, Dr. David Alan Gilbert wrote:
> >>>* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> >>>>Implement colo nic device interface configure()
> >>>>add a script to configure nic devices:
> >>>>${QEMU_SCRIPT_DIR}/colo-proxy-script.sh
> >>>
> >>>Do you have some more documentation of the new colo-proxy?  I've
> >>
> >>Yes, gaofeng is writing it now...
> >
> >Great.
> >
> >>>been reading the kernel module source and I can see that it's
> >>>a nice idea to do the sequence number adjustment on the host,
> >>>that reduces the need to modify the guest kernel; I was trying to
> >>>figure out how you synchronise the master/slave idea of sequence numbers -
> >>>is that purely from the 'ack' that's duplicated back to the secondary?
> >>
> >>Yes, you've got it :)
> >>
> >>>If you were unlucky and the 'ack' packet was lost on the duplicated
> >>>link from the primary to secondary how would you recover?
> >>
> >>The 'ack' packet will be consider to be lost, because the primary will not
> >>respond to this 'ack' packet until it got secondary's response,
> >>and client will resend it ('ack' packet).
> >>
> >>>What about TCP connections setup before colo was activated?
> >>>
> >>
> >>Actually, now, we only support activate colo before guest is startup (for test procedure,
> >>'-S' is needed for qemu command line).
> >
> >Consider this:
> >      1) Start primary
> >      2) Start secondary
> >      3) Start the colo pairing
> >      4) Primary fails
> >      5) Colo failover to secondary
> >
> >Now we have only the old secondary running; we'd really like to get back to
> >having a pair of fault-tolerant hosts, so it would be good to be able to:
> >
> >      6) Make the old secondary the new primary
> >      7) Add a new secondary
> >      8) Start colo-pairing to the new secondary
> >
> 
> Er, what you described is continuous FT, yes, it is in our TODO list.

Ah, OK, I wondered what that meant.

Dave

> >You could theoretically do this with colo-agent, but not with colo-proxy.
> >
> 
> We have decided using colo-proxy, it has more advantages.
> 
> >>>The other thought is that passing the 'sec_dev' as a module parameter
> >>>gives you an artificial limitation; it forces all of the pairs
> >>>to be between the same pair of hosts.   If the 'sec_dev' was a parameter
> >>>to the connection then you could have different slaves associated with
> >>>each guest on the primary host.
> >>>
> >>
> >>Hmm, do you mean we should pass this 'sec_dev' as a parameter from qemu to proxy module by
> >>maybe ioctl ?
> >
> >Yes, ioctl or tc or whatever; and make it per-guest.
> >
> 
> OK. Thanks.
> 
> >>Yes, it is ugly to pass this 'sec_dev' directly to module as parameter.
> >>We will consider this, thanks ;)
> >
> >Thanks!
> >
> >>>Dave
> >>>P.S. You probably need to clean the debug messages up in the kernel module!
> >>>
> >>
> >>OK, will do that.
> >
> >Thanks.
> >
> >Dave
> >
> >>
> >>>>Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> >>>>Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
> >>>>Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> >>>>---
> >>>>  net/colo-nic.c               | 56 +++++++++++++++++++++++++++-
> >>>>  scripts/colo-proxy-script.sh | 88 ++++++++++++++++++++++++++++++++++++++++++++
> >>>>  2 files changed, 143 insertions(+), 1 deletion(-)
> >>>>  create mode 100755 scripts/colo-proxy-script.sh
> >>>>
> >>>>diff --git a/net/colo-nic.c b/net/colo-nic.c
> >>>>index 965af49..f8fc35d 100644
> >>>>--- a/net/colo-nic.c
> >>>>+++ b/net/colo-nic.c
> >>>>@@ -39,12 +39,66 @@ static bool colo_nic_support(NetClientState *nc)
> >>>>      return nc && nc->colo_script[0] && nc->colo_nicname[0];
> >>>>  }
> >>>>
> >>>>+static int launch_colo_script(char *argv[])
> >>>>+{
> >>>>+    int pid, status;
> >>>>+    char *script = argv[0];
> >>>>+
> >>>>+    /* try to launch network script */
> >>>>+    pid = fork();
> >>>>+    if (pid == 0) {
> >>>>+        execv(script, argv);
> >>>>+        _exit(1);
> >>>>+    } else if (pid > 0) {
> >>>>+        while (waitpid(pid, &status, 0) != pid) {
> >>>>+            /* loop */
> >>>>+        }
> >>>>+
> >>>>+        if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
> >>>>+            return 0;
> >>>>+        }
> >>>>+    }
> >>>>+    return -1;
> >>>>+}
> >>>>+
> >>>>+static int colo_nic_configure(NetClientState *nc,
> >>>>+            bool up, int side, int index)
> >>>>+{
> >>>>+    int i, argc = 6;
> >>>>+    char *argv[7], index_str[32];
> >>>>+    char **parg;
> >>>>+
> >>>>+    if (!nc && index <= 0) {
> >>>>+        error_report("Can not parse colo_script or colo_nicname");
> >>>>+        return -1;
> >>>>+    }
> >>>>+
> >>>>+    parg = argv;
> >>>>+    *parg++ = nc->colo_script;
> >>>>+    *parg++ = (char *)(side == COLO_SECONDARY_MODE ? "slave" : "master");
> >>>>+    *parg++ = (char *)(up ? "install" : "uninstall");
> >>>>+    *parg++ = nc->colo_nicname;
> >>>>+    *parg++ = nc->ifname;
> >>>>+    sprintf(index_str, "%d", index);
> >>>>+    *parg++ = index_str;
> >>>>+    *parg = NULL;
> >>>>+
> >>>>+    for (i = 0; i < argc; i++) {
> >>>>+        if (!argv[i][0]) {
> >>>>+            error_report("Can not get colo_script argument");
> >>>>+            return -1;
> >>>>+        }
> >>>>+    }
> >>>>+
> >>>>+    return launch_colo_script(argv);
> >>>>+}
> >>>>+
> >>>>  void colo_add_nic_devices(NetClientState *nc)
> >>>>  {
> >>>>      struct nic_device *nic = g_malloc0(sizeof(*nic));
> >>>>
> >>>>      nic->support_colo = colo_nic_support;
> >>>>-    nic->configure = NULL;
> >>>>+    nic->configure = colo_nic_configure;
> >>>>      /*
> >>>>       * TODO
> >>>>       * only support "-netdev tap,colo_scripte..."  options
> >>>>diff --git a/scripts/colo-proxy-script.sh b/scripts/colo-proxy-script.sh
> >>>>new file mode 100755
> >>>>index 0000000..c7aa53f
> >>>>--- /dev/null
> >>>>+++ b/scripts/colo-proxy-script.sh
> >>>>@@ -0,0 +1,88 @@
> >>>>+#!/bin/sh
> >>>>+#usage: ./colo-proxy-script.sh master/slave install/uninstall phy_if virt_if index
> >>>>+#.e.g ./colo-proxy-script.sh master install eth2 tap0 1
> >>>>+
> >>>>+side=$1
> >>>>+action=$2
> >>>>+phy_if=$3
> >>>>+virt_if=$4
> >>>>+index=$5
> >>>>+br=br1
> >>>>+failover_br=br0
> >>>>+
> >>>>+script_usage()
> >>>>+{
> >>>>+    echo -n "usage: ./colo-proxy-script.sh master/slave "
> >>>>+    echo -e "install/uninstall phy_if virt_if index\n"
> >>>>+}
> >>>>+
> >>>>+master_install()
> >>>>+{
> >>>>+    tc qdisc add dev $virt_if root handle 1: prio
> >>>>+    tc filter add dev $virt_if parent 1: protocol ip prio 10 u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $phy_if
> >>>>+    tc filter add dev $virt_if parent 1: protocol arp prio 11 u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $phy_if
> >>>>+    tc filter add dev $virt_if parent 1: protocol ipv6 prio 12 u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $phy_if
> >>>>+
> >>>>+    modprobe nf_conntrack_ipv4
> >>>>+    modprobe xt_PMYCOLO sec_dev=$phy_if
> >>>>+
> >>>>+    /usr/local/sbin/iptables -t mangle -I PREROUTING -m physdev --physdev-in $virt_if -j PMYCOLO --index $index
> >>>>+    /usr/local/sbin/ip6tables -t mangle -I PREROUTING -m physdev --physdev-in $virt_if -j PMYCOLO --index $index
> >>>>+    /usr/local/sbin/arptables -I INPUT -i $phy_if -j MARK --set-mark $index
> >>>>+}
> >>>>+
> >>>>+master_uninstall()
> >>>>+{
> >>>>+    tc filter del dev $virt_if parent 1: protocol ip prio 10 u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $phy_if
> >>>>+    tc filter del dev $virt_if parent 1: protocol arp prio 11 u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $phy_if
> >>>>+    tc filter del dev $virt_if parent 1: protocol ipv6 prio 12 u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $phy_if
> >>>>+    tc qdisc del dev $virt_if root handle 1: prio
> >>>>+
> >>>>+    /usr/local/sbin/iptables -t mangle -F
> >>>>+    /usr/local/sbin/ip6tables -t mangle -F
> >>>>+    /usr/local/sbin/arptables -F
> >>>>+    rmmod xt_PMYCOLO
> >>>>+}
> >>>>+
> >>>>+slave_install()
> >>>>+{
> >>>>+    brctl addif $br $phy_if
> >>>>+    modprobe xt_SECCOLO
> >>>>+
> >>>>+    /usr/local/sbin/iptables -t mangle -I PREROUTING -m physdev --physdev-in $virt_if -j SECCOLO --index $index
> >>>>+    /usr/local/sbin/ip6tables -t mangle -I PREROUTING -m physdev --physdev-in $virt_if -j SECCOLO --index $index
> >>>>+}
> >>>>+
> >>>>+
> >>>>+slave_uninstall()
> >>>>+{
> >>>>+    brctl delif $br $phy_if
> >>>>+    brctl delif $br $virt_if
> >>>>+    brctl addif $failover_br $virt_if
> >>>>+
> >>>>+    /usr/local/sbin/iptables -t mangle -F
> >>>>+    /usr/local/sbin/ip6tables -t mangle -F
> >>>>+    rmmod xt_SECCOLO
> >>>>+}
> >>>>+
> >>>>+if [ $# -ne 5 ]; then
> >>>>+    script_usage
> >>>>+    exit 1
> >>>>+fi
> >>>>+
> >>>>+if [ "x$side" != "xmaster" ] && [ "x$side" != "xslave" ]; then
> >>>>+    script_usage
> >>>>+    exit 2
> >>>>+fi
> >>>>+
> >>>>+if [ "x$action" != "xinstall" ] && [ "x$action" != "xuninstall" ]; then
> >>>>+    script_usage
> >>>>+    exit 3
> >>>>+fi
> >>>>+
> >>>>+if [ $index -lt 0 ] || [ $index -gt 100 ]; then
> >>>>+    echo "index overflow"
> >>>>+    exit 4
> >>>>+fi
> >>>>+
> >>>>+${side}_${action}
> >>>>--
> >>>>1.7.12.4
> >>>>
> >>>>
> >>>--
> >>>Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >>>
> >>>.
> >>>
> >>
> >>
> >--
> >Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
> >.
> >
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 17/27] COLO: Add new command parameter 'colo_nicname' 'colo_script' for net
  2015-02-25  8:21           ` zhanghailiang
@ 2015-02-25 10:09             ` Daniel P. Berrange
  0 siblings, 0 replies; 65+ messages in thread
From: Daniel P. Berrange @ 2015-02-25 10:09 UTC (permalink / raw)
  To: zhanghailiang
  Cc: hangaohuai, Li Zhijian, yunhong.jiang, eddie.dong,
	peter.huangpeng, qemu-devel, Gao feng, stefanha, pbonzini,
	dgilbert

On Wed, Feb 25, 2015 at 04:21:15PM +0800, zhanghailiang wrote:
> On 2015/2/25 1:24, Daniel P. Berrange wrote:
> >On Tue, Feb 24, 2015 at 09:30:56AM -0700, Eric Blake wrote:
> >>On 02/24/2015 02:50 AM, Wen Congyang wrote:
> >>>>Script files are in general very hard to secure.  Libvirt marks any
> >>>>domain that uses a script file for controlling networking as tainted,
> >>>>because it cannot guarantee that the script did not do arbitrary
> >>>>actions.  Can you come up with any better solution that does not require
> >>>>a script file, such as having management software responsible for
> >>>>passing in an already-opened fd?
> >>>
> >>>Do you mean that opening the script in libvirt?
> >>>
> >>
> >>No, I mean a solution that needs no script file at all.  Have libvirt
> >>pre-open the TAP device you will need, then pass in the fd that will be
> >>used for the colo NIC.
> >
> >Agreed, we really must not add new features that require executing
> >arbitrary blackbox shell scripts to QEMU, when we know that reslts in
> >a flawed security model. And just pushing the script execution upto
> >libvirt is not really a satisfactory solution either.
> >
> 
> Hmm, this script is mainly used for controlling net packet forward by using tc
> command and setting iptable rules for colo by using iptables command.
> Is there any API for linux iptables and tc (traffic control) ?

I think you'll need to explain in detail exactly what the requirements
are in terms of firewall and traffic shaping setup.  Libvirt itself
already applies firewall and traffic shaping rules to guests, when
instructed by the mgmt application todo so. So if this new feature
requires specific settings for firewall / traffic shaping, then it
will be neccessary to update libvirt to make it do the right thing.
You can't have two separate bits of code both modifying the firewall
and traffic shaping rules for the same guest as it will end in
disaster

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 24/27] COLO NIC: Implement NIC checkpoint and failover
  2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 24/27] COLO NIC: Implement NIC checkpoint and failover zhanghailiang
@ 2015-03-05 17:12   ` Dr. David Alan Gilbert
  2015-03-06  2:35     ` zhanghailiang
  0 siblings, 1 reply; 65+ messages in thread
From: Dr. David Alan Gilbert @ 2015-03-05 17:12 UTC (permalink / raw)
  To: zhanghailiang
  Cc: yunhong.jiang, eddie.dong, qemu-devel, dgilbert, Gao feng,
	stefanha, pbonzini, peter.huangpeng

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
> ---
>  include/net/colo-nic.h |  3 ++-
>  migration/colo.c       | 22 ++++++++++++++++++----
>  net/colo-nic.c         | 19 +++++++++++++++++++
>  3 files changed, 39 insertions(+), 5 deletions(-)
> 
> diff --git a/include/net/colo-nic.h b/include/net/colo-nic.h
> index 67c9807..ddc21cd 100644
> --- a/include/net/colo-nic.h
> +++ b/include/net/colo-nic.h
> @@ -20,5 +20,6 @@ void colo_add_nic_devices(NetClientState *nc);
>  void colo_remove_nic_devices(NetClientState *nc);
>  
>  int colo_proxy_compare(void);
> -
> +int colo_proxy_failover(void);
> +int colo_proxy_checkpoint(void);
>  #endif
> diff --git a/migration/colo.c b/migration/colo.c
> index 579aabf..874971c 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -94,6 +94,11 @@ static void slave_do_failover(void)
>          ;
>      }
>  
> +    if (colo_proxy_failover() != 0) {
> +        error_report("colo proxy failed to do failover");
> +    }
> +    colo_proxy_destroy(COLO_SECONDARY_MODE);

I'm not sure if this is the best thing to do on a secondary failover.
If I understand correctly, when it's running, we have:


-------+
       |                    br0---eth0
       |
 slave +-tun - xt_SECCOLO - br1---eth1
       |
-------+

what I think that colo-proxy-destroy  is doing is rewiring that as:


-------+
       |     +--------------br0---eth0
       |     |
 slave +-tun +              br1---eth1
       |
-------+

but now we've lost the sequence number adjustment data that
was held in xt_SECCOLO and so you are likely to break existing TCP
connections.

Also, I don't think colo-proxy-script is passed a flag to let it
know whether the reason it's doing a slave_uninstall is due to
a failover or a simple shutdown; and so it assumes it has
to do the rewire for a failover.
(Actually the script in the qemu repo is newer than the script in
the colo-proxy repo, that one doesn't have the rewire at all).

Dave

> +
>      colo = NULL;
>  
>      if (!autostart) {
> @@ -115,7 +120,7 @@ static void master_do_failover(void)
>      if (!colo_runstate_is_stopped()) {
>          vm_stop_force_state(RUN_STATE_COLO);
>      }
> -
> +    colo_proxy_destroy(COLO_PRIMARY_MODE);
>      if (s->state != MIG_STATE_ERROR) {
>          migrate_set_state(s, MIG_STATE_COLO, MIG_STATE_COMPLETED);
>      }
> @@ -245,6 +250,11 @@ static int do_colo_transaction(MigrationState *s, QEMUFile *control)
>  
>      qemu_fflush(trans);
>  
> +    ret = colo_proxy_checkpoint();
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
>      ret = colo_ctl_put(s->file, COLO_CHECKPOINT_SEND);
>      if (ret < 0) {
>          goto out;
> @@ -387,8 +397,6 @@ out:
>      qemu_bh_schedule(s->cleanup_bh);
>      qemu_mutex_unlock_iothread();
>  
> -    colo_proxy_destroy(COLO_PRIMARY_MODE);
> -
>      return NULL;
>  }
>  
> @@ -508,6 +516,12 @@ void *colo_process_incoming_checkpoints(void *opaque)
>              goto out;
>          }
>  
> +        ret = colo_proxy_checkpoint();
> +        if (ret < 0) {
> +                goto out;
> +        }
> +        DPRINTF("proxy begin to do checkpoint\n");
> +
>          ret = colo_ctl_get(f, COLO_CHECKPOINT_SEND);
>          if (ret < 0) {
>              goto out;
> @@ -584,6 +598,7 @@ out:
>          * just kill slave
>          */
>          error_report("SVM is going to exit!");
> +        colo_proxy_destroy(COLO_SECONDARY_MODE);
>          exit(1);
>      } else {
>          /* if we went here, means master may dead, we are doing failover */
> @@ -610,6 +625,5 @@ out:
>  
>      loadvm_exit_colo();
>  
> -    colo_proxy_destroy(COLO_SECONDARY_MODE);
>      return NULL;
>  }
> diff --git a/net/colo-nic.c b/net/colo-nic.c
> index 563d661..02a454d 100644
> --- a/net/colo-nic.c
> +++ b/net/colo-nic.c
> @@ -379,6 +379,25 @@ void colo_proxy_destroy(int side)
>      cp_info.index = -1;
>      colo_nic_side = -1;
>  }
> +
> +int colo_proxy_failover(void)
> +{
> +    if (colo_proxy_send(NULL, 0, COLO_FAILOVER) < 0) {
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +int colo_proxy_checkpoint(void)
> +{
> +    if (colo_proxy_send(NULL, 0, COLO_CHECKPOINT) < 0) {
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
>  /*
>  do checkpoint: return 1
>  error: return -1
> -- 
> 1.7.12.4
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 24/27] COLO NIC: Implement NIC checkpoint and failover
  2015-03-05 17:12   ` Dr. David Alan Gilbert
@ 2015-03-06  2:35     ` zhanghailiang
  0 siblings, 0 replies; 65+ messages in thread
From: zhanghailiang @ 2015-03-06  2:35 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: hangaohuai, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, Gao feng, stefanha, pbonzini

On 2015/3/6 1:12, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
>> ---
>>   include/net/colo-nic.h |  3 ++-
>>   migration/colo.c       | 22 ++++++++++++++++++----
>>   net/colo-nic.c         | 19 +++++++++++++++++++
>>   3 files changed, 39 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/net/colo-nic.h b/include/net/colo-nic.h
>> index 67c9807..ddc21cd 100644
>> --- a/include/net/colo-nic.h
>> +++ b/include/net/colo-nic.h
>> @@ -20,5 +20,6 @@ void colo_add_nic_devices(NetClientState *nc);
>>   void colo_remove_nic_devices(NetClientState *nc);
>>
>>   int colo_proxy_compare(void);
>> -
>> +int colo_proxy_failover(void);
>> +int colo_proxy_checkpoint(void);
>>   #endif
>> diff --git a/migration/colo.c b/migration/colo.c
>> index 579aabf..874971c 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -94,6 +94,11 @@ static void slave_do_failover(void)
>>           ;
>>       }
>>
>> +    if (colo_proxy_failover() != 0) {
>> +        error_report("colo proxy failed to do failover");
>> +    }
>> +    colo_proxy_destroy(COLO_SECONDARY_MODE);
>

Hi, Dave

> I'm not sure if this is the best thing to do on a secondary failover.
> If I understand correctly, when it's running, we have:
>
>
> -------+
>         |                    br0---eth0
>         |
>   slave +-tun - xt_SECCOLO - br1---eth1
>         |
> -------+
>
> what I think that colo-proxy-destroy  is doing is rewiring that as:
>
>
> -------+
>         |     +--------------br0---eth0
>         |     |
>   slave +-tun +              br1---eth1
>         |
> -------+
>

Yes, you got it.

> but now we've lost the sequence number adjustment data that
> was held in xt_SECCOLO and so you are likely to break existing TCP
> connections.
>

In our test, we didn't come across the 'break existing TCP connections' situation,
We only adjust the sequence number at the beginning of building connection, after
the connection is build, this data in xt_SECCOLO is useless ...

> Also, I don't think colo-proxy-script is passed a flag to let it
> know whether the reason it's doing a slave_uninstall is due to
> a failover or a simple shutdown; and so it assumes it has
> to do the rewire for a failover.
> (Actually the script in the qemu repo is newer than the script in
> the colo-proxy repo, that one doesn't have the rewire at all).
>

You are right, we should distinguish between shutdown and failover for the slave_uninstall,
Actually, using script to do the corresponding work maybe not so appropriate,
we are trying to fix the net-related part.

Thanks,
zhanghailiang
> Dave
>
>> +
>>       colo = NULL;
>>
>>       if (!autostart) {
>> @@ -115,7 +120,7 @@ static void master_do_failover(void)
>>       if (!colo_runstate_is_stopped()) {
>>           vm_stop_force_state(RUN_STATE_COLO);
>>       }
>> -
>> +    colo_proxy_destroy(COLO_PRIMARY_MODE);
>>       if (s->state != MIG_STATE_ERROR) {
>>           migrate_set_state(s, MIG_STATE_COLO, MIG_STATE_COMPLETED);
>>       }
>> @@ -245,6 +250,11 @@ static int do_colo_transaction(MigrationState *s, QEMUFile *control)
>>
>>       qemu_fflush(trans);
>>
>> +    ret = colo_proxy_checkpoint();
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>>       ret = colo_ctl_put(s->file, COLO_CHECKPOINT_SEND);
>>       if (ret < 0) {
>>           goto out;
>> @@ -387,8 +397,6 @@ out:
>>       qemu_bh_schedule(s->cleanup_bh);
>>       qemu_mutex_unlock_iothread();
>>
>> -    colo_proxy_destroy(COLO_PRIMARY_MODE);
>> -
>>       return NULL;
>>   }
>>
>> @@ -508,6 +516,12 @@ void *colo_process_incoming_checkpoints(void *opaque)
>>               goto out;
>>           }
>>
>> +        ret = colo_proxy_checkpoint();
>> +        if (ret < 0) {
>> +                goto out;
>> +        }
>> +        DPRINTF("proxy begin to do checkpoint\n");
>> +
>>           ret = colo_ctl_get(f, COLO_CHECKPOINT_SEND);
>>           if (ret < 0) {
>>               goto out;
>> @@ -584,6 +598,7 @@ out:
>>           * just kill slave
>>           */
>>           error_report("SVM is going to exit!");
>> +        colo_proxy_destroy(COLO_SECONDARY_MODE);
>>           exit(1);
>>       } else {
>>           /* if we went here, means master may dead, we are doing failover */
>> @@ -610,6 +625,5 @@ out:
>>
>>       loadvm_exit_colo();
>>
>> -    colo_proxy_destroy(COLO_SECONDARY_MODE);
>>       return NULL;
>>   }
>> diff --git a/net/colo-nic.c b/net/colo-nic.c
>> index 563d661..02a454d 100644
>> --- a/net/colo-nic.c
>> +++ b/net/colo-nic.c
>> @@ -379,6 +379,25 @@ void colo_proxy_destroy(int side)
>>       cp_info.index = -1;
>>       colo_nic_side = -1;
>>   }
>> +
>> +int colo_proxy_failover(void)
>> +{
>> +    if (colo_proxy_send(NULL, 0, COLO_FAILOVER) < 0) {
>> +        return -1;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +int colo_proxy_checkpoint(void)
>> +{
>> +    if (colo_proxy_send(NULL, 0, COLO_CHECKPOINT) < 0) {
>> +        return -1;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>>   /*
>>   do checkpoint: return 1
>>   error: return -1
>> --
>> 1.7.12.4
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 14/27] COLO failover: Introduce a new command to trigger a failover
  2015-02-25  7:04     ` zhanghailiang
  2015-02-25  7:16       ` Hongyang Yang
  2015-02-25  7:40       ` Wen Congyang
@ 2015-03-06 16:10       ` Eric Blake
  2015-03-09  1:15         ` zhanghailiang
  2 siblings, 1 reply; 65+ messages in thread
From: Eric Blake @ 2015-03-06 16:10 UTC (permalink / raw)
  To: zhanghailiang, qemu-devel
  Cc: hangaohuai, Lai Jiangshan, Li Zhijian, yunhong.jiang, eddie.dong,
	peter.huangpeng, dgilbert, stefanha, pbonzini, Yang Hongyang

[-- Attachment #1: Type: text/plain, Size: 1379 bytes --]

On 02/25/2015 12:04 AM, zhanghailiang wrote:

>>> +++ b/qmp-commands.hx
>>> @@ -753,6 +753,25 @@ Example:
>>>   EQMP
>>>
>>>       {
>>> +        .name       = "colo_lost_heartbeat",
>>
>> ...but documented incorrectly (this should use '-' to match the command
>> name in the .json file, not '_')
>>
> 
> Er, yes, you are right, here it should be 'colo-lost-heartbeat' in
> qmp-commands.hx,
> but 'colo_lost_heartbeat' in hmp-commands.hx, it is a little confused
> for me,
> why it should be like this?

Historical madness.  HMP has traditionally used '_' (and relied on
tab-completion to allow users to skip having to use the shift key),
while QMP has traditionally used '-' (in all but the oldest interfaces).

> 
> i will fix it.
> 
>>> +        .args_type  = "",
>>> +        .mhandler.cmd_new = qmp_marshal_input_colo_lost_heartbeat,
>>> +    },
>>> +
>>> +SQMP
>>> +colo_lost_heartbeat
>>> +--------------------
>>> +
>>> +Tell COLO that heartbeat is lost, a failover or takeover is needed.
>>> +
>>> +Example:
>>> +
>>> +-> { "execute": "colo_lost_heartbeat" }
>>> +<- { "return": {} }
>>
>> This example won't work unless you fix the spelling.
>>
> 
> Should here also be changed to 'colo-lost-heartbeat' ?

Yes.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 14/27] COLO failover: Introduce a new command to trigger a failover
  2015-03-06 16:10       ` Eric Blake
@ 2015-03-09  1:15         ` zhanghailiang
  0 siblings, 0 replies; 65+ messages in thread
From: zhanghailiang @ 2015-03-09  1:15 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: hangaohuai, Lai Jiangshan, Li Zhijian, yunhong.jiang, eddie.dong,
	peter.huangpeng, dgilbert, stefanha, pbonzini, Yang Hongyang

On 2015/3/7 0:10, Eric Blake wrote:
> On 02/25/2015 12:04 AM, zhanghailiang wrote:
>
>>>> +++ b/qmp-commands.hx
>>>> @@ -753,6 +753,25 @@ Example:
>>>>    EQMP
>>>>
>>>>        {
>>>> +        .name       = "colo_lost_heartbeat",
>>>
>>> ...but documented incorrectly (this should use '-' to match the command
>>> name in the .json file, not '_')
>>>
>>
>> Er, yes, you are right, here it should be 'colo-lost-heartbeat' in
>> qmp-commands.hx,
>> but 'colo_lost_heartbeat' in hmp-commands.hx, it is a little confused
>> for me,
>> why it should be like this?
>
> Historical madness.  HMP has traditionally used '_' (and relied on
> tab-completion to allow users to skip having to use the shift key),
> while QMP has traditionally used '-' (in all but the oldest interfaces).
>

Got it.

>>
>> i will fix it.
>>
>>>> +        .args_type  = "",
>>>> +        .mhandler.cmd_new = qmp_marshal_input_colo_lost_heartbeat,
>>>> +    },
>>>> +
>>>> +SQMP
>>>> +colo_lost_heartbeat
>>>> +--------------------
>>>> +
>>>> +Tell COLO that heartbeat is lost, a failover or takeover is needed.
>>>> +
>>>> +Example:
>>>> +
>>>> +-> { "execute": "colo_lost_heartbeat" }
>>>> +<- { "return": {} }
>>>
>>> This example won't work unless you fix the spelling.
>>>
>>
>> Should here also be changed to 'colo-lost-heartbeat' ?
>
> Yes.
>

OK, thanks.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 13/27] COLO RAM: Flush cached RAM into SVM's memory
  2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 13/27] COLO RAM: Flush cached RAM into SVM's memory zhanghailiang
@ 2015-03-11 19:08   ` Dr. David Alan Gilbert
  2015-03-12  2:02     ` zhanghailiang
  2015-03-11 20:07   ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 65+ messages in thread
From: Dr. David Alan Gilbert @ 2015-03-11 19:08 UTC (permalink / raw)
  To: zhanghailiang
  Cc: Lai Jiangshan, Li Zhijian, yunhong.jiang, eddie.dong, qemu-devel,
	peter.huangpeng, Gonglei, stefanha, pbonzini, Yang Hongyang

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> We only need to flush RAM that is both dirty on PVM and SVM since
> last checkpoint. Besides, we must ensure flush RAM cache before load
> device state.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>a
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>

This could do with some more comments; colo_flush_ram_cache is quite complex.
See below.

> ---
>  arch_init.c                        | 91 +++++++++++++++++++++++++++++++++++++-
>  include/migration/migration-colo.h |  1 +
>  migration/colo.c                   |  1 -
>  3 files changed, 91 insertions(+), 2 deletions(-)
> 
> diff --git a/arch_init.c b/arch_init.c
> index 4a1d825..f70de23 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -1100,6 +1100,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>  {
>      int flags = 0, ret = 0;
>      static uint64_t seq_iter;
> +    bool need_flush = false;
>  
>      seq_iter++;
>  
> @@ -1163,6 +1164,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>                  break;
>              }
>  
> +            need_flush = true;
>              ch = qemu_get_byte(f);
>              ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
>              break;
> @@ -1174,6 +1176,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>                  break;
>              }
>  
> +            need_flush = true;
>              qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
>              break;
>          case RAM_SAVE_FLAG_XBZRLE:
> @@ -1190,6 +1193,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>                  ret = -EINVAL;
>                  break;
>              }
> +            need_flush = true;
>              break;
>          case RAM_SAVE_FLAG_EOS:
>              /* normal exit */
> @@ -1207,7 +1211,10 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>              ret = qemu_file_get_error(f);
>          }
>      }
> -
> +    if (!ret  && ram_cache_enable && need_flush) {
> +        DPRINTF("Flush ram_cache\n");
> +        colo_flush_ram_cache();
> +    }
>      DPRINTF("Completed load of VM with exit code %d seq iteration "
>              "%" PRIu64 "\n", ret, seq_iter);
>      return ret;
> @@ -1220,6 +1227,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>  void create_and_init_ram_cache(void)
>  {
>      RAMBlock *block;
> +    int64_t ram_cache_pages = last_ram_offset() >> TARGET_PAGE_BITS;
>  
>      QTAILQ_FOREACH(block, &ram_list.blocks, next) {
>          block->host_cache = g_malloc(block->used_length);
> @@ -1227,6 +1235,14 @@ void create_and_init_ram_cache(void)
>      }
>  
>      ram_cache_enable = true;
> +    /*
> +    * Start dirty log for slave VM, we will use this dirty bitmap together with
> +    * VM's cache RAM dirty bitmap to decide which page in cache should be
> +    * flushed into VM's RAM.
> +    */
> +    migration_bitmap = bitmap_new(ram_cache_pages);
> +    migration_dirty_pages = 0;
> +    memory_global_dirty_log_start();
>  }
>  
>  void release_ram_cache(void)
> @@ -1261,6 +1277,79 @@ static void *memory_region_get_ram_cache_ptr(MemoryRegion *mr, RAMBlock *block)
>      return block->host_cache + (addr - block->offset);
>  }
>  
> +static inline
> +ram_addr_t host_bitmap_find_and_reset_dirty(MemoryRegion *mr,
> +                                            ram_addr_t start)
> +{
> +    unsigned long base = mr->ram_addr >> TARGET_PAGE_BITS;
> +    unsigned long nr = base + (start >> TARGET_PAGE_BITS);
> +    unsigned long size = base + (int128_get64(mr->size) >> TARGET_PAGE_BITS);
> +
> +    unsigned long next;
> +
> +    next = find_next_bit(ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION],
> +                         size, nr);
> +    if (next < size) {
> +        clear_bit(next, ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]);
> +    }
> +    return (next - base) << TARGET_PAGE_BITS;
> +}
> +
> +void colo_flush_ram_cache(void)
> +{
> +    RAMBlock *block = NULL;
> +    void *dst_host;
> +    void *src_host;
> +    ram_addr_t ca  = 0, ha = 0;
> +    bool got_ca = 0, got_ha = 0;
> +    int64_t host_dirty = 0, both_dirty = 0;
> +
> +    address_space_sync_dirty_bitmap(&address_space_memory);
> +
> +    block = QTAILQ_FIRST(&ram_list.blocks);
> +    while (true) {
> +        if (ca < block->used_length && ca <= ha) {
> +            ca = migration_bitmap_find_and_reset_dirty(block->mr, ca);
> +            if (ca < block->used_length) {
> +                got_ca = 1;
> +            }
> +        }
> +        if (ha < block->used_length && ha <= ca) {
> +            ha = host_bitmap_find_and_reset_dirty(block->mr, ha);
> +            if (ha < block->used_length && ha != ca) {
> +                got_ha = 1;
> +            }
> +            host_dirty += (ha < block->used_length ? 1 : 0);
> +            both_dirty += (ha < block->used_length && ha == ca ? 1 : 0);
> +        }
> +        if (ca >= block->used_length && ha >= block->used_length) {
> +            ca = 0;
> +            ha = 0;
> +            block = QTAILQ_NEXT(block, next);
> +            if (!block) {
> +                break;
> +            }
> +        } else {
> +            if (got_ha) {
> +                got_ha = 0;
> +                dst_host = memory_region_get_ram_ptr(block->mr) + ha;
> +                src_host = memory_region_get_ram_cache_ptr(block->mr, block)
> +                           + ha;
> +                memcpy(dst_host, src_host, TARGET_PAGE_SIZE);
> +            }
> +            if (got_ca) {
> +                got_ca = 0;
> +                dst_host = memory_region_get_ram_ptr(block->mr) + ca;
> +                src_host = memory_region_get_ram_cache_ptr(block->mr, block)
> +                           + ca;
> +                memcpy(dst_host, src_host, TARGET_PAGE_SIZE);
> +            }

Both of these cases are copying from the ram_cache to the main RAM; what
copies from main RAM into the RAM cache, other than create_and_init_ram_cache?

I can see create_and_init_ram_cache creates the initial copy at startup,
and I can see the code that feeds the memory from the PVM into the SVM via
the RAM cache; but don't you need to take a copy of the SVM memory before
you start running each checkpoint, in case the SVM changes a page that
the PVM didn't change (SVM dirty, PVM isn't dirty) and then when you load
that new checkpoint how do you restore that SVM page to be the same as the
PVM (i.e. the same as at the start of that checkpoint)?

Does that rely on a previous checkpoint receiving the new page from the PVM
to update the ram cache?

Dave

> +        }
> +    }
> +
> +    assert(migration_dirty_pages == 0);
> +}
> +
>  static SaveVMHandlers savevm_ram_handlers = {
>      .save_live_setup = ram_save_setup,
>      .save_live_iterate = ram_save_iterate,
> diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h
> index 7d43aed..2084fe2 100644
> --- a/include/migration/migration-colo.h
> +++ b/include/migration/migration-colo.h
> @@ -36,5 +36,6 @@ void *colo_process_incoming_checkpoints(void *opaque);
>  bool loadvm_in_colo_state(void);
>  /* ram cache */
>  void create_and_init_ram_cache(void);
> +void colo_flush_ram_cache(void);
>  void release_ram_cache(void);
>  #endif
> diff --git a/migration/colo.c b/migration/colo.c
> index a0e1b7a..5ff2ee8 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -397,7 +397,6 @@ void *colo_process_incoming_checkpoints(void *opaque)
>          }
>          DPRINTF("Finish load all vm state to cache\n");
>          qemu_mutex_unlock_iothread();
> -        /* TODO: flush vm state */
>  
>          ret = colo_ctl_put(ctl, COLO_CHECKPOINT_LOADED);
>          if (ret < 0) {
> -- 
> 1.7.12.4
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 13/27] COLO RAM: Flush cached RAM into SVM's memory
  2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 13/27] COLO RAM: Flush cached RAM into SVM's memory zhanghailiang
  2015-03-11 19:08   ` Dr. David Alan Gilbert
@ 2015-03-11 20:07   ` Dr. David Alan Gilbert
  2015-03-12  2:27     ` zhanghailiang
  1 sibling, 1 reply; 65+ messages in thread
From: Dr. David Alan Gilbert @ 2015-03-11 20:07 UTC (permalink / raw)
  To: zhanghailiang
  Cc: Lai Jiangshan, Li Zhijian, yunhong.jiang, eddie.dong, qemu-devel,
	dgilbert, Gonglei, stefanha, pbonzini, peter.huangpeng,
	Yang Hongyang

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> We only need to flush RAM that is both dirty on PVM and SVM since
> last checkpoint. Besides, we must ensure flush RAM cache before load
> device state.

Actually with a follow up to my previous question, can you explain the 'both'
in that description.

If a page was dirty on just the PVM, but not the SVM, you would have to copy
the new PVM page into the SVM ram before executing with the newly received device
state, otherwise the device state would be inconsistent with the RAM state.

Dave

> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>a
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> ---
>  arch_init.c                        | 91 +++++++++++++++++++++++++++++++++++++-
>  include/migration/migration-colo.h |  1 +
>  migration/colo.c                   |  1 -
>  3 files changed, 91 insertions(+), 2 deletions(-)
> 
> diff --git a/arch_init.c b/arch_init.c
> index 4a1d825..f70de23 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -1100,6 +1100,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>  {
>      int flags = 0, ret = 0;
>      static uint64_t seq_iter;
> +    bool need_flush = false;
>  
>      seq_iter++;
>  
> @@ -1163,6 +1164,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>                  break;
>              }
>  
> +            need_flush = true;
>              ch = qemu_get_byte(f);
>              ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
>              break;
> @@ -1174,6 +1176,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>                  break;
>              }
>  
> +            need_flush = true;
>              qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
>              break;
>          case RAM_SAVE_FLAG_XBZRLE:
> @@ -1190,6 +1193,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>                  ret = -EINVAL;
>                  break;
>              }
> +            need_flush = true;
>              break;
>          case RAM_SAVE_FLAG_EOS:
>              /* normal exit */
> @@ -1207,7 +1211,10 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>              ret = qemu_file_get_error(f);
>          }
>      }
> -
> +    if (!ret  && ram_cache_enable && need_flush) {
> +        DPRINTF("Flush ram_cache\n");
> +        colo_flush_ram_cache();
> +    }
>      DPRINTF("Completed load of VM with exit code %d seq iteration "
>              "%" PRIu64 "\n", ret, seq_iter);
>      return ret;
> @@ -1220,6 +1227,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>  void create_and_init_ram_cache(void)
>  {
>      RAMBlock *block;
> +    int64_t ram_cache_pages = last_ram_offset() >> TARGET_PAGE_BITS;
>  
>      QTAILQ_FOREACH(block, &ram_list.blocks, next) {
>          block->host_cache = g_malloc(block->used_length);
> @@ -1227,6 +1235,14 @@ void create_and_init_ram_cache(void)
>      }
>  
>      ram_cache_enable = true;
> +    /*
> +    * Start dirty log for slave VM, we will use this dirty bitmap together with
> +    * VM's cache RAM dirty bitmap to decide which page in cache should be
> +    * flushed into VM's RAM.
> +    */
> +    migration_bitmap = bitmap_new(ram_cache_pages);
> +    migration_dirty_pages = 0;
> +    memory_global_dirty_log_start();
>  }
>  
>  void release_ram_cache(void)
> @@ -1261,6 +1277,79 @@ static void *memory_region_get_ram_cache_ptr(MemoryRegion *mr, RAMBlock *block)
>      return block->host_cache + (addr - block->offset);
>  }
>  
> +static inline
> +ram_addr_t host_bitmap_find_and_reset_dirty(MemoryRegion *mr,
> +                                            ram_addr_t start)
> +{
> +    unsigned long base = mr->ram_addr >> TARGET_PAGE_BITS;
> +    unsigned long nr = base + (start >> TARGET_PAGE_BITS);
> +    unsigned long size = base + (int128_get64(mr->size) >> TARGET_PAGE_BITS);
> +
> +    unsigned long next;
> +
> +    next = find_next_bit(ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION],
> +                         size, nr);
> +    if (next < size) {
> +        clear_bit(next, ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]);
> +    }
> +    return (next - base) << TARGET_PAGE_BITS;
> +}
> +
> +void colo_flush_ram_cache(void)
> +{
> +    RAMBlock *block = NULL;
> +    void *dst_host;
> +    void *src_host;
> +    ram_addr_t ca  = 0, ha = 0;
> +    bool got_ca = 0, got_ha = 0;
> +    int64_t host_dirty = 0, both_dirty = 0;
> +
> +    address_space_sync_dirty_bitmap(&address_space_memory);
> +
> +    block = QTAILQ_FIRST(&ram_list.blocks);
> +    while (true) {
> +        if (ca < block->used_length && ca <= ha) {
> +            ca = migration_bitmap_find_and_reset_dirty(block->mr, ca);
> +            if (ca < block->used_length) {
> +                got_ca = 1;
> +            }
> +        }
> +        if (ha < block->used_length && ha <= ca) {
> +            ha = host_bitmap_find_and_reset_dirty(block->mr, ha);
> +            if (ha < block->used_length && ha != ca) {
> +                got_ha = 1;
> +            }
> +            host_dirty += (ha < block->used_length ? 1 : 0);
> +            both_dirty += (ha < block->used_length && ha == ca ? 1 : 0);
> +        }
> +        if (ca >= block->used_length && ha >= block->used_length) {
> +            ca = 0;
> +            ha = 0;
> +            block = QTAILQ_NEXT(block, next);
> +            if (!block) {
> +                break;
> +            }
> +        } else {
> +            if (got_ha) {
> +                got_ha = 0;
> +                dst_host = memory_region_get_ram_ptr(block->mr) + ha;
> +                src_host = memory_region_get_ram_cache_ptr(block->mr, block)
> +                           + ha;
> +                memcpy(dst_host, src_host, TARGET_PAGE_SIZE);
> +            }
> +            if (got_ca) {
> +                got_ca = 0;
> +                dst_host = memory_region_get_ram_ptr(block->mr) + ca;
> +                src_host = memory_region_get_ram_cache_ptr(block->mr, block)
> +                           + ca;
> +                memcpy(dst_host, src_host, TARGET_PAGE_SIZE);
> +            }
> +        }
> +    }
> +
> +    assert(migration_dirty_pages == 0);
> +}
> +
>  static SaveVMHandlers savevm_ram_handlers = {
>      .save_live_setup = ram_save_setup,
>      .save_live_iterate = ram_save_iterate,
> diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h
> index 7d43aed..2084fe2 100644
> --- a/include/migration/migration-colo.h
> +++ b/include/migration/migration-colo.h
> @@ -36,5 +36,6 @@ void *colo_process_incoming_checkpoints(void *opaque);
>  bool loadvm_in_colo_state(void);
>  /* ram cache */
>  void create_and_init_ram_cache(void);
> +void colo_flush_ram_cache(void);
>  void release_ram_cache(void);
>  #endif
> diff --git a/migration/colo.c b/migration/colo.c
> index a0e1b7a..5ff2ee8 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -397,7 +397,6 @@ void *colo_process_incoming_checkpoints(void *opaque)
>          }
>          DPRINTF("Finish load all vm state to cache\n");
>          qemu_mutex_unlock_iothread();
> -        /* TODO: flush vm state */
>  
>          ret = colo_ctl_put(ctl, COLO_CHECKPOINT_LOADED);
>          if (ret < 0) {
> -- 
> 1.7.12.4
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 13/27] COLO RAM: Flush cached RAM into SVM's memory
  2015-03-11 19:08   ` Dr. David Alan Gilbert
@ 2015-03-12  2:02     ` zhanghailiang
  2015-03-12 11:49       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 65+ messages in thread
From: zhanghailiang @ 2015-03-12  2:02 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: hangaohuai, Lai Jiangshan, Li Zhijian, yunhong.jiang, eddie.dong,
	peter.huangpeng, qemu-devel, Gonglei, stefanha, pbonzini,
	Yang Hongyang

On 2015/3/12 3:08, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> We only need to flush RAM that is both dirty on PVM and SVM since
>> last checkpoint. Besides, we must ensure flush RAM cache before load
>> device state.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>a
>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>
> This could do with some more comments; colo_flush_ram_cache is quite complex.
> See below.
>
>> ---
>>   arch_init.c                        | 91 +++++++++++++++++++++++++++++++++++++-
>>   include/migration/migration-colo.h |  1 +
>>   migration/colo.c                   |  1 -
>>   3 files changed, 91 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch_init.c b/arch_init.c
>> index 4a1d825..f70de23 100644
>> --- a/arch_init.c
>> +++ b/arch_init.c
>> @@ -1100,6 +1100,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>   {
>>       int flags = 0, ret = 0;
>>       static uint64_t seq_iter;
>> +    bool need_flush = false;
>>
>>       seq_iter++;
>>
>> @@ -1163,6 +1164,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>                   break;
>>               }
>>
>> +            need_flush = true;
>>               ch = qemu_get_byte(f);
>>               ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
>>               break;
>> @@ -1174,6 +1176,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>                   break;
>>               }
>>
>> +            need_flush = true;
>>               qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
>>               break;
>>           case RAM_SAVE_FLAG_XBZRLE:
>> @@ -1190,6 +1193,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>                   ret = -EINVAL;
>>                   break;
>>               }
>> +            need_flush = true;
>>               break;
>>           case RAM_SAVE_FLAG_EOS:
>>               /* normal exit */
>> @@ -1207,7 +1211,10 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>               ret = qemu_file_get_error(f);
>>           }
>>       }
>> -
>> +    if (!ret  && ram_cache_enable && need_flush) {
>> +        DPRINTF("Flush ram_cache\n");
>> +        colo_flush_ram_cache();
>> +    }
>>       DPRINTF("Completed load of VM with exit code %d seq iteration "
>>               "%" PRIu64 "\n", ret, seq_iter);
>>       return ret;
>> @@ -1220,6 +1227,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>   void create_and_init_ram_cache(void)
>>   {
>>       RAMBlock *block;
>> +    int64_t ram_cache_pages = last_ram_offset() >> TARGET_PAGE_BITS;
>>
>>       QTAILQ_FOREACH(block, &ram_list.blocks, next) {
>>           block->host_cache = g_malloc(block->used_length);
>> @@ -1227,6 +1235,14 @@ void create_and_init_ram_cache(void)
>>       }
>>
>>       ram_cache_enable = true;
>> +    /*
>> +    * Start dirty log for slave VM, we will use this dirty bitmap together with
>> +    * VM's cache RAM dirty bitmap to decide which page in cache should be
>> +    * flushed into VM's RAM.
>> +    */
>> +    migration_bitmap = bitmap_new(ram_cache_pages);
>> +    migration_dirty_pages = 0;
>> +    memory_global_dirty_log_start();
>>   }
>>
>>   void release_ram_cache(void)
>> @@ -1261,6 +1277,79 @@ static void *memory_region_get_ram_cache_ptr(MemoryRegion *mr, RAMBlock *block)
>>       return block->host_cache + (addr - block->offset);
>>   }
>>
>> +static inline
>> +ram_addr_t host_bitmap_find_and_reset_dirty(MemoryRegion *mr,
>> +                                            ram_addr_t start)
>> +{
>> +    unsigned long base = mr->ram_addr >> TARGET_PAGE_BITS;
>> +    unsigned long nr = base + (start >> TARGET_PAGE_BITS);
>> +    unsigned long size = base + (int128_get64(mr->size) >> TARGET_PAGE_BITS);
>> +
>> +    unsigned long next;
>> +
>> +    next = find_next_bit(ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION],
>> +                         size, nr);
>> +    if (next < size) {
>> +        clear_bit(next, ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]);
>> +    }
>> +    return (next - base) << TARGET_PAGE_BITS;
>> +}
>> +
>> +void colo_flush_ram_cache(void)
>> +{
>> +    RAMBlock *block = NULL;
>> +    void *dst_host;
>> +    void *src_host;
>> +    ram_addr_t ca  = 0, ha = 0;
>> +    bool got_ca = 0, got_ha = 0;
>> +    int64_t host_dirty = 0, both_dirty = 0;
>> +
>> +    address_space_sync_dirty_bitmap(&address_space_memory);
>> +
>> +    block = QTAILQ_FIRST(&ram_list.blocks);
>> +    while (true) {
>> +        if (ca < block->used_length && ca <= ha) {
>> +            ca = migration_bitmap_find_and_reset_dirty(block->mr, ca);
>> +            if (ca < block->used_length) {
>> +                got_ca = 1;
>> +            }
>> +        }
>> +        if (ha < block->used_length && ha <= ca) {
>> +            ha = host_bitmap_find_and_reset_dirty(block->mr, ha);
>> +            if (ha < block->used_length && ha != ca) {
>> +                got_ha = 1;
>> +            }
>> +            host_dirty += (ha < block->used_length ? 1 : 0);
>> +            both_dirty += (ha < block->used_length && ha == ca ? 1 : 0);
>> +        }
>> +        if (ca >= block->used_length && ha >= block->used_length) {
>> +            ca = 0;
>> +            ha = 0;
>> +            block = QTAILQ_NEXT(block, next);
>> +            if (!block) {
>> +                break;
>> +            }
>> +        } else {
>> +            if (got_ha) {
>> +                got_ha = 0;
>> +                dst_host = memory_region_get_ram_ptr(block->mr) + ha;
>> +                src_host = memory_region_get_ram_cache_ptr(block->mr, block)
>> +                           + ha;
>> +                memcpy(dst_host, src_host, TARGET_PAGE_SIZE);
>> +            }
>> +            if (got_ca) {
>> +                got_ca = 0;
>> +                dst_host = memory_region_get_ram_ptr(block->mr) + ca;
>> +                src_host = memory_region_get_ram_cache_ptr(block->mr, block)
>> +                           + ca;
>> +                memcpy(dst_host, src_host, TARGET_PAGE_SIZE);
>> +            }
>
> Both of these cases are copying from the ram_cache to the main RAM; what
> copies from main RAM into the RAM cache, other than create_and_init_ram_cache?
>
> I can see create_and_init_ram_cache creates the initial copy at startup,
> and I can see the code that feeds the memory from the PVM into the SVM via
> the RAM cache; but don't you need to take a copy of the SVM memory before
> you start running each checkpoint, in case the SVM changes a page that
> the PVM didn't change (SVM dirty, PVM isn't dirty) and then when you load
> that new checkpoint how do you restore that SVM page to be the same as the
> PVM (i.e. the same as at the start of that checkpoint)?
>

Er, one thing is clear: after a round of checkpoint, before PVM and SVM continue to
run, the memory of PVM and SVM should be completely the same, and at the same time,
the content of SVM's RAM cache is SAME with both of them.

During the time of VM's running, PVM/SVM may dirty some pages, we will transfer PVM's
dirty pages to SVM and store them into SVM's RAM cache at next checkpoint time.
So, the content of SVM's RAM cache will always be some with PVM's memory after checkpoint.
Yes, we can certainly flush all content of SVM's RAM cache into SVM's MEMORY, to ensure
SVM's memory same with PVM's. But, is it inefficient?

The better way to do it is:
(1) Log SVM's dirty pages
(2) Only flush the page that either dirtied by PVM or either dirtied by SVM.

> Does that rely on a previous checkpoint receiving the new page from the PVM
> to update the ram cache?

Yes, we never clean the content of ram cache during VM's colo lifecycle.

> Dave
>
>> +        }
>> +    }
>> +
>> +    assert(migration_dirty_pages == 0);
>> +}
>> +
>>   static SaveVMHandlers savevm_ram_handlers = {
>>       .save_live_setup = ram_save_setup,
>>       .save_live_iterate = ram_save_iterate,
>> diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h
>> index 7d43aed..2084fe2 100644
>> --- a/include/migration/migration-colo.h
>> +++ b/include/migration/migration-colo.h
>> @@ -36,5 +36,6 @@ void *colo_process_incoming_checkpoints(void *opaque);
>>   bool loadvm_in_colo_state(void);
>>   /* ram cache */
>>   void create_and_init_ram_cache(void);
>> +void colo_flush_ram_cache(void);
>>   void release_ram_cache(void);
>>   #endif
>> diff --git a/migration/colo.c b/migration/colo.c
>> index a0e1b7a..5ff2ee8 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -397,7 +397,6 @@ void *colo_process_incoming_checkpoints(void *opaque)
>>           }
>>           DPRINTF("Finish load all vm state to cache\n");
>>           qemu_mutex_unlock_iothread();
>> -        /* TODO: flush vm state */
>>
>>           ret = colo_ctl_put(ctl, COLO_CHECKPOINT_LOADED);
>>           if (ret < 0) {
>> --
>> 1.7.12.4
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 13/27] COLO RAM: Flush cached RAM into SVM's memory
  2015-03-11 20:07   ` Dr. David Alan Gilbert
@ 2015-03-12  2:27     ` zhanghailiang
  2015-03-12  9:51       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 65+ messages in thread
From: zhanghailiang @ 2015-03-12  2:27 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: hangaohuai, Lai Jiangshan, Li Zhijian, yunhong.jiang, eddie.dong,
	peter.huangpeng, qemu-devel, Gonglei, stefanha, pbonzini,
	Yang Hongyang

On 2015/3/12 4:07, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> We only need to flush RAM that is both dirty on PVM and SVM since
>> last checkpoint. Besides, we must ensure flush RAM cache before load
>> device state.
>
> Actually with a follow up to my previous question, can you explain the 'both'
> in that description.
>

The description is wrong,
It should be 'any page that dirtied by PVM or SVM'. Sorry for my poor english.

> If a page was dirty on just the PVM, but not the SVM, you would have to copy
> the new PVM page into the SVM ram before executing with the newly received device
> state, otherwise the device state would be inconsistent with the RAM state.
>
> Dave
>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>a
>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> ---
>>   arch_init.c                        | 91 +++++++++++++++++++++++++++++++++++++-
>>   include/migration/migration-colo.h |  1 +
>>   migration/colo.c                   |  1 -
>>   3 files changed, 91 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch_init.c b/arch_init.c
>> index 4a1d825..f70de23 100644
>> --- a/arch_init.c
>> +++ b/arch_init.c
>> @@ -1100,6 +1100,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>   {
>>       int flags = 0, ret = 0;
>>       static uint64_t seq_iter;
>> +    bool need_flush = false;
>>
>>       seq_iter++;
>>
>> @@ -1163,6 +1164,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>                   break;
>>               }
>>
>> +            need_flush = true;
>>               ch = qemu_get_byte(f);
>>               ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
>>               break;
>> @@ -1174,6 +1176,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>                   break;
>>               }
>>
>> +            need_flush = true;
>>               qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
>>               break;
>>           case RAM_SAVE_FLAG_XBZRLE:
>> @@ -1190,6 +1193,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>                   ret = -EINVAL;
>>                   break;
>>               }
>> +            need_flush = true;
>>               break;
>>           case RAM_SAVE_FLAG_EOS:
>>               /* normal exit */
>> @@ -1207,7 +1211,10 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>               ret = qemu_file_get_error(f);
>>           }
>>       }
>> -
>> +    if (!ret  && ram_cache_enable && need_flush) {
>> +        DPRINTF("Flush ram_cache\n");
>> +        colo_flush_ram_cache();
>> +    }
>>       DPRINTF("Completed load of VM with exit code %d seq iteration "
>>               "%" PRIu64 "\n", ret, seq_iter);
>>       return ret;
>> @@ -1220,6 +1227,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>   void create_and_init_ram_cache(void)
>>   {
>>       RAMBlock *block;
>> +    int64_t ram_cache_pages = last_ram_offset() >> TARGET_PAGE_BITS;
>>
>>       QTAILQ_FOREACH(block, &ram_list.blocks, next) {
>>           block->host_cache = g_malloc(block->used_length);
>> @@ -1227,6 +1235,14 @@ void create_and_init_ram_cache(void)
>>       }
>>
>>       ram_cache_enable = true;
>> +    /*
>> +    * Start dirty log for slave VM, we will use this dirty bitmap together with
>> +    * VM's cache RAM dirty bitmap to decide which page in cache should be
>> +    * flushed into VM's RAM.
>> +    */
>> +    migration_bitmap = bitmap_new(ram_cache_pages);
>> +    migration_dirty_pages = 0;
>> +    memory_global_dirty_log_start();
>>   }
>>
>>   void release_ram_cache(void)
>> @@ -1261,6 +1277,79 @@ static void *memory_region_get_ram_cache_ptr(MemoryRegion *mr, RAMBlock *block)
>>       return block->host_cache + (addr - block->offset);
>>   }
>>
>> +static inline
>> +ram_addr_t host_bitmap_find_and_reset_dirty(MemoryRegion *mr,
>> +                                            ram_addr_t start)
>> +{
>> +    unsigned long base = mr->ram_addr >> TARGET_PAGE_BITS;
>> +    unsigned long nr = base + (start >> TARGET_PAGE_BITS);
>> +    unsigned long size = base + (int128_get64(mr->size) >> TARGET_PAGE_BITS);
>> +
>> +    unsigned long next;
>> +
>> +    next = find_next_bit(ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION],
>> +                         size, nr);
>> +    if (next < size) {
>> +        clear_bit(next, ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]);
>> +    }
>> +    return (next - base) << TARGET_PAGE_BITS;
>> +}
>> +
>> +void colo_flush_ram_cache(void)
>> +{
>> +    RAMBlock *block = NULL;
>> +    void *dst_host;
>> +    void *src_host;
>> +    ram_addr_t ca  = 0, ha = 0;
>> +    bool got_ca = 0, got_ha = 0;
>> +    int64_t host_dirty = 0, both_dirty = 0;
>> +
>> +    address_space_sync_dirty_bitmap(&address_space_memory);
>> +
>> +    block = QTAILQ_FIRST(&ram_list.blocks);
>> +    while (true) {
>> +        if (ca < block->used_length && ca <= ha) {
>> +            ca = migration_bitmap_find_and_reset_dirty(block->mr, ca);
>> +            if (ca < block->used_length) {
>> +                got_ca = 1;
>> +            }
>> +        }
>> +        if (ha < block->used_length && ha <= ca) {
>> +            ha = host_bitmap_find_and_reset_dirty(block->mr, ha);
>> +            if (ha < block->used_length && ha != ca) {
>> +                got_ha = 1;
>> +            }
>> +            host_dirty += (ha < block->used_length ? 1 : 0);
>> +            both_dirty += (ha < block->used_length && ha == ca ? 1 : 0);
>> +        }
>> +        if (ca >= block->used_length && ha >= block->used_length) {
>> +            ca = 0;
>> +            ha = 0;
>> +            block = QTAILQ_NEXT(block, next);
>> +            if (!block) {
>> +                break;
>> +            }
>> +        } else {
>> +            if (got_ha) {
>> +                got_ha = 0;
>> +                dst_host = memory_region_get_ram_ptr(block->mr) + ha;
>> +                src_host = memory_region_get_ram_cache_ptr(block->mr, block)
>> +                           + ha;
>> +                memcpy(dst_host, src_host, TARGET_PAGE_SIZE);
>> +            }
>> +            if (got_ca) {
>> +                got_ca = 0;
>> +                dst_host = memory_region_get_ram_ptr(block->mr) + ca;
>> +                src_host = memory_region_get_ram_cache_ptr(block->mr, block)
>> +                           + ca;
>> +                memcpy(dst_host, src_host, TARGET_PAGE_SIZE);
>> +            }
>> +        }
>> +    }
>> +
>> +    assert(migration_dirty_pages == 0);
>> +}
>> +
>>   static SaveVMHandlers savevm_ram_handlers = {
>>       .save_live_setup = ram_save_setup,
>>       .save_live_iterate = ram_save_iterate,
>> diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h
>> index 7d43aed..2084fe2 100644
>> --- a/include/migration/migration-colo.h
>> +++ b/include/migration/migration-colo.h
>> @@ -36,5 +36,6 @@ void *colo_process_incoming_checkpoints(void *opaque);
>>   bool loadvm_in_colo_state(void);
>>   /* ram cache */
>>   void create_and_init_ram_cache(void);
>> +void colo_flush_ram_cache(void);
>>   void release_ram_cache(void);
>>   #endif
>> diff --git a/migration/colo.c b/migration/colo.c
>> index a0e1b7a..5ff2ee8 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -397,7 +397,6 @@ void *colo_process_incoming_checkpoints(void *opaque)
>>           }
>>           DPRINTF("Finish load all vm state to cache\n");
>>           qemu_mutex_unlock_iothread();
>> -        /* TODO: flush vm state */
>>
>>           ret = colo_ctl_put(ctl, COLO_CHECKPOINT_LOADED);
>>           if (ret < 0) {
>> --
>> 1.7.12.4
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 13/27] COLO RAM: Flush cached RAM into SVM's memory
  2015-03-12  2:27     ` zhanghailiang
@ 2015-03-12  9:51       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 65+ messages in thread
From: Dr. David Alan Gilbert @ 2015-03-12  9:51 UTC (permalink / raw)
  To: zhanghailiang
  Cc: hangaohuai, Lai Jiangshan, Li Zhijian, yunhong.jiang, eddie.dong,
	peter.huangpeng, qemu-devel, Gonglei, stefanha, pbonzini,
	Yang Hongyang

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> On 2015/3/12 4:07, Dr. David Alan Gilbert wrote:
> >* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> >>We only need to flush RAM that is both dirty on PVM and SVM since
> >>last checkpoint. Besides, we must ensure flush RAM cache before load
> >>device state.
> >
> >Actually with a follow up to my previous question, can you explain the 'both'
> >in that description.
> >
> 
> The description is wrong,
> It should be 'any page that dirtied by PVM or SVM'. Sorry for my poor english.

That's fine; thank you for the clarification.

Dave

> 
> >If a page was dirty on just the PVM, but not the SVM, you would have to copy
> >the new PVM page into the SVM ram before executing with the newly received device
> >state, otherwise the device state would be inconsistent with the RAM state.
> >
> >Dave
> >
> >>Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>a
> >>Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> >>Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> >>Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> >>Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> >>---
> >>  arch_init.c                        | 91 +++++++++++++++++++++++++++++++++++++-
> >>  include/migration/migration-colo.h |  1 +
> >>  migration/colo.c                   |  1 -
> >>  3 files changed, 91 insertions(+), 2 deletions(-)
> >>
> >>diff --git a/arch_init.c b/arch_init.c
> >>index 4a1d825..f70de23 100644
> >>--- a/arch_init.c
> >>+++ b/arch_init.c
> >>@@ -1100,6 +1100,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> >>  {
> >>      int flags = 0, ret = 0;
> >>      static uint64_t seq_iter;
> >>+    bool need_flush = false;
> >>
> >>      seq_iter++;
> >>
> >>@@ -1163,6 +1164,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> >>                  break;
> >>              }
> >>
> >>+            need_flush = true;
> >>              ch = qemu_get_byte(f);
> >>              ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
> >>              break;
> >>@@ -1174,6 +1176,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> >>                  break;
> >>              }
> >>
> >>+            need_flush = true;
> >>              qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
> >>              break;
> >>          case RAM_SAVE_FLAG_XBZRLE:
> >>@@ -1190,6 +1193,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> >>                  ret = -EINVAL;
> >>                  break;
> >>              }
> >>+            need_flush = true;
> >>              break;
> >>          case RAM_SAVE_FLAG_EOS:
> >>              /* normal exit */
> >>@@ -1207,7 +1211,10 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> >>              ret = qemu_file_get_error(f);
> >>          }
> >>      }
> >>-
> >>+    if (!ret  && ram_cache_enable && need_flush) {
> >>+        DPRINTF("Flush ram_cache\n");
> >>+        colo_flush_ram_cache();
> >>+    }
> >>      DPRINTF("Completed load of VM with exit code %d seq iteration "
> >>              "%" PRIu64 "\n", ret, seq_iter);
> >>      return ret;
> >>@@ -1220,6 +1227,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> >>  void create_and_init_ram_cache(void)
> >>  {
> >>      RAMBlock *block;
> >>+    int64_t ram_cache_pages = last_ram_offset() >> TARGET_PAGE_BITS;
> >>
> >>      QTAILQ_FOREACH(block, &ram_list.blocks, next) {
> >>          block->host_cache = g_malloc(block->used_length);
> >>@@ -1227,6 +1235,14 @@ void create_and_init_ram_cache(void)
> >>      }
> >>
> >>      ram_cache_enable = true;
> >>+    /*
> >>+    * Start dirty log for slave VM, we will use this dirty bitmap together with
> >>+    * VM's cache RAM dirty bitmap to decide which page in cache should be
> >>+    * flushed into VM's RAM.
> >>+    */
> >>+    migration_bitmap = bitmap_new(ram_cache_pages);
> >>+    migration_dirty_pages = 0;
> >>+    memory_global_dirty_log_start();
> >>  }
> >>
> >>  void release_ram_cache(void)
> >>@@ -1261,6 +1277,79 @@ static void *memory_region_get_ram_cache_ptr(MemoryRegion *mr, RAMBlock *block)
> >>      return block->host_cache + (addr - block->offset);
> >>  }
> >>
> >>+static inline
> >>+ram_addr_t host_bitmap_find_and_reset_dirty(MemoryRegion *mr,
> >>+                                            ram_addr_t start)
> >>+{
> >>+    unsigned long base = mr->ram_addr >> TARGET_PAGE_BITS;
> >>+    unsigned long nr = base + (start >> TARGET_PAGE_BITS);
> >>+    unsigned long size = base + (int128_get64(mr->size) >> TARGET_PAGE_BITS);
> >>+
> >>+    unsigned long next;
> >>+
> >>+    next = find_next_bit(ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION],
> >>+                         size, nr);
> >>+    if (next < size) {
> >>+        clear_bit(next, ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]);
> >>+    }
> >>+    return (next - base) << TARGET_PAGE_BITS;
> >>+}
> >>+
> >>+void colo_flush_ram_cache(void)
> >>+{
> >>+    RAMBlock *block = NULL;
> >>+    void *dst_host;
> >>+    void *src_host;
> >>+    ram_addr_t ca  = 0, ha = 0;
> >>+    bool got_ca = 0, got_ha = 0;
> >>+    int64_t host_dirty = 0, both_dirty = 0;
> >>+
> >>+    address_space_sync_dirty_bitmap(&address_space_memory);
> >>+
> >>+    block = QTAILQ_FIRST(&ram_list.blocks);
> >>+    while (true) {
> >>+        if (ca < block->used_length && ca <= ha) {
> >>+            ca = migration_bitmap_find_and_reset_dirty(block->mr, ca);
> >>+            if (ca < block->used_length) {
> >>+                got_ca = 1;
> >>+            }
> >>+        }
> >>+        if (ha < block->used_length && ha <= ca) {
> >>+            ha = host_bitmap_find_and_reset_dirty(block->mr, ha);
> >>+            if (ha < block->used_length && ha != ca) {
> >>+                got_ha = 1;
> >>+            }
> >>+            host_dirty += (ha < block->used_length ? 1 : 0);
> >>+            both_dirty += (ha < block->used_length && ha == ca ? 1 : 0);
> >>+        }
> >>+        if (ca >= block->used_length && ha >= block->used_length) {
> >>+            ca = 0;
> >>+            ha = 0;
> >>+            block = QTAILQ_NEXT(block, next);
> >>+            if (!block) {
> >>+                break;
> >>+            }
> >>+        } else {
> >>+            if (got_ha) {
> >>+                got_ha = 0;
> >>+                dst_host = memory_region_get_ram_ptr(block->mr) + ha;
> >>+                src_host = memory_region_get_ram_cache_ptr(block->mr, block)
> >>+                           + ha;
> >>+                memcpy(dst_host, src_host, TARGET_PAGE_SIZE);
> >>+            }
> >>+            if (got_ca) {
> >>+                got_ca = 0;
> >>+                dst_host = memory_region_get_ram_ptr(block->mr) + ca;
> >>+                src_host = memory_region_get_ram_cache_ptr(block->mr, block)
> >>+                           + ca;
> >>+                memcpy(dst_host, src_host, TARGET_PAGE_SIZE);
> >>+            }
> >>+        }
> >>+    }
> >>+
> >>+    assert(migration_dirty_pages == 0);
> >>+}
> >>+
> >>  static SaveVMHandlers savevm_ram_handlers = {
> >>      .save_live_setup = ram_save_setup,
> >>      .save_live_iterate = ram_save_iterate,
> >>diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h
> >>index 7d43aed..2084fe2 100644
> >>--- a/include/migration/migration-colo.h
> >>+++ b/include/migration/migration-colo.h
> >>@@ -36,5 +36,6 @@ void *colo_process_incoming_checkpoints(void *opaque);
> >>  bool loadvm_in_colo_state(void);
> >>  /* ram cache */
> >>  void create_and_init_ram_cache(void);
> >>+void colo_flush_ram_cache(void);
> >>  void release_ram_cache(void);
> >>  #endif
> >>diff --git a/migration/colo.c b/migration/colo.c
> >>index a0e1b7a..5ff2ee8 100644
> >>--- a/migration/colo.c
> >>+++ b/migration/colo.c
> >>@@ -397,7 +397,6 @@ void *colo_process_incoming_checkpoints(void *opaque)
> >>          }
> >>          DPRINTF("Finish load all vm state to cache\n");
> >>          qemu_mutex_unlock_iothread();
> >>-        /* TODO: flush vm state */
> >>
> >>          ret = colo_ctl_put(ctl, COLO_CHECKPOINT_LOADED);
> >>          if (ret < 0) {
> >>--
> >>1.7.12.4
> >>
> >>
> >--
> >Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
> >.
> >
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 13/27] COLO RAM: Flush cached RAM into SVM's memory
  2015-03-12  2:02     ` zhanghailiang
@ 2015-03-12 11:49       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 65+ messages in thread
From: Dr. David Alan Gilbert @ 2015-03-12 11:49 UTC (permalink / raw)
  To: zhanghailiang
  Cc: hangaohuai, Lai Jiangshan, Li Zhijian, yunhong.jiang, eddie.dong,
	peter.huangpeng, qemu-devel, Gonglei, stefanha, pbonzini,
	Yang Hongyang

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> On 2015/3/12 3:08, Dr. David Alan Gilbert wrote:
> >* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> >>We only need to flush RAM that is both dirty on PVM and SVM since
> >>last checkpoint. Besides, we must ensure flush RAM cache before load
> >>device state.
> >>
> >>Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>a
> >>Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> >>Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> >>Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> >>Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> >
> >This could do with some more comments; colo_flush_ram_cache is quite complex.
> >See below.
> >
> >>---
> >>  arch_init.c                        | 91 +++++++++++++++++++++++++++++++++++++-
> >>  include/migration/migration-colo.h |  1 +
> >>  migration/colo.c                   |  1 -
> >>  3 files changed, 91 insertions(+), 2 deletions(-)
> >>
> >>diff --git a/arch_init.c b/arch_init.c
> >>index 4a1d825..f70de23 100644
> >>--- a/arch_init.c
> >>+++ b/arch_init.c
> >>@@ -1100,6 +1100,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> >>  {
> >>      int flags = 0, ret = 0;
> >>      static uint64_t seq_iter;
> >>+    bool need_flush = false;
> >>
> >>      seq_iter++;
> >>
> >>@@ -1163,6 +1164,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> >>                  break;
> >>              }
> >>
> >>+            need_flush = true;
> >>              ch = qemu_get_byte(f);
> >>              ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
> >>              break;
> >>@@ -1174,6 +1176,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> >>                  break;
> >>              }
> >>
> >>+            need_flush = true;
> >>              qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
> >>              break;
> >>          case RAM_SAVE_FLAG_XBZRLE:
> >>@@ -1190,6 +1193,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> >>                  ret = -EINVAL;
> >>                  break;
> >>              }
> >>+            need_flush = true;
> >>              break;
> >>          case RAM_SAVE_FLAG_EOS:
> >>              /* normal exit */
> >>@@ -1207,7 +1211,10 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> >>              ret = qemu_file_get_error(f);
> >>          }
> >>      }
> >>-
> >>+    if (!ret  && ram_cache_enable && need_flush) {
> >>+        DPRINTF("Flush ram_cache\n");
> >>+        colo_flush_ram_cache();
> >>+    }
> >>      DPRINTF("Completed load of VM with exit code %d seq iteration "
> >>              "%" PRIu64 "\n", ret, seq_iter);
> >>      return ret;
> >>@@ -1220,6 +1227,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> >>  void create_and_init_ram_cache(void)
> >>  {
> >>      RAMBlock *block;
> >>+    int64_t ram_cache_pages = last_ram_offset() >> TARGET_PAGE_BITS;
> >>
> >>      QTAILQ_FOREACH(block, &ram_list.blocks, next) {
> >>          block->host_cache = g_malloc(block->used_length);
> >>@@ -1227,6 +1235,14 @@ void create_and_init_ram_cache(void)
> >>      }
> >>
> >>      ram_cache_enable = true;
> >>+    /*
> >>+    * Start dirty log for slave VM, we will use this dirty bitmap together with
> >>+    * VM's cache RAM dirty bitmap to decide which page in cache should be
> >>+    * flushed into VM's RAM.
> >>+    */
> >>+    migration_bitmap = bitmap_new(ram_cache_pages);
> >>+    migration_dirty_pages = 0;
> >>+    memory_global_dirty_log_start();
> >>  }
> >>
> >>  void release_ram_cache(void)
> >>@@ -1261,6 +1277,79 @@ static void *memory_region_get_ram_cache_ptr(MemoryRegion *mr, RAMBlock *block)
> >>      return block->host_cache + (addr - block->offset);
> >>  }
> >>
> >>+static inline
> >>+ram_addr_t host_bitmap_find_and_reset_dirty(MemoryRegion *mr,
> >>+                                            ram_addr_t start)
> >>+{
> >>+    unsigned long base = mr->ram_addr >> TARGET_PAGE_BITS;
> >>+    unsigned long nr = base + (start >> TARGET_PAGE_BITS);
> >>+    unsigned long size = base + (int128_get64(mr->size) >> TARGET_PAGE_BITS);
> >>+
> >>+    unsigned long next;
> >>+
> >>+    next = find_next_bit(ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION],
> >>+                         size, nr);
> >>+    if (next < size) {
> >>+        clear_bit(next, ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]);
> >>+    }
> >>+    return (next - base) << TARGET_PAGE_BITS;
> >>+}
> >>+
> >>+void colo_flush_ram_cache(void)
> >>+{
> >>+    RAMBlock *block = NULL;
> >>+    void *dst_host;
> >>+    void *src_host;
> >>+    ram_addr_t ca  = 0, ha = 0;
> >>+    bool got_ca = 0, got_ha = 0;
> >>+    int64_t host_dirty = 0, both_dirty = 0;
> >>+
> >>+    address_space_sync_dirty_bitmap(&address_space_memory);
> >>+
> >>+    block = QTAILQ_FIRST(&ram_list.blocks);
> >>+    while (true) {
> >>+        if (ca < block->used_length && ca <= ha) {
> >>+            ca = migration_bitmap_find_and_reset_dirty(block->mr, ca);
> >>+            if (ca < block->used_length) {
> >>+                got_ca = 1;
> >>+            }
> >>+        }
> >>+        if (ha < block->used_length && ha <= ca) {
> >>+            ha = host_bitmap_find_and_reset_dirty(block->mr, ha);
> >>+            if (ha < block->used_length && ha != ca) {
> >>+                got_ha = 1;
> >>+            }
> >>+            host_dirty += (ha < block->used_length ? 1 : 0);
> >>+            both_dirty += (ha < block->used_length && ha == ca ? 1 : 0);
> >>+        }
> >>+        if (ca >= block->used_length && ha >= block->used_length) {
> >>+            ca = 0;
> >>+            ha = 0;
> >>+            block = QTAILQ_NEXT(block, next);
> >>+            if (!block) {
> >>+                break;
> >>+            }
> >>+        } else {
> >>+            if (got_ha) {
> >>+                got_ha = 0;
> >>+                dst_host = memory_region_get_ram_ptr(block->mr) + ha;
> >>+                src_host = memory_region_get_ram_cache_ptr(block->mr, block)
> >>+                           + ha;
> >>+                memcpy(dst_host, src_host, TARGET_PAGE_SIZE);
> >>+            }
> >>+            if (got_ca) {
> >>+                got_ca = 0;
> >>+                dst_host = memory_region_get_ram_ptr(block->mr) + ca;
> >>+                src_host = memory_region_get_ram_cache_ptr(block->mr, block)
> >>+                           + ca;
> >>+                memcpy(dst_host, src_host, TARGET_PAGE_SIZE);
> >>+            }
> >
> >Both of these cases are copying from the ram_cache to the main RAM; what
> >copies from main RAM into the RAM cache, other than create_and_init_ram_cache?
> >
> >I can see create_and_init_ram_cache creates the initial copy at startup,
> >and I can see the code that feeds the memory from the PVM into the SVM via
> >the RAM cache; but don't you need to take a copy of the SVM memory before
> >you start running each checkpoint, in case the SVM changes a page that
> >the PVM didn't change (SVM dirty, PVM isn't dirty) and then when you load
> >that new checkpoint how do you restore that SVM page to be the same as the
> >PVM (i.e. the same as at the start of that checkpoint)?
> >
> 
> Er, one thing is clear: after a round of checkpoint, before PVM and SVM continue to
> run, the memory of PVM and SVM should be completely the same, and at the same time,
> the content of SVM's RAM cache is SAME with both of them.
> 
> During the time of VM's running, PVM/SVM may dirty some pages, we will transfer PVM's
> dirty pages to SVM and store them into SVM's RAM cache at next checkpoint time.
> So, the content of SVM's RAM cache will always be some with PVM's memory after checkpoint.
> Yes, we can certainly flush all content of SVM's RAM cache into SVM's MEMORY, to ensure
> SVM's memory same with PVM's. But, is it inefficient?
> 
> The better way to do it is:
> (1) Log SVM's dirty pages
> (2) Only flush the page that either dirtied by PVM or either dirtied by SVM.
> 
> >Does that rely on a previous checkpoint receiving the new page from the PVM
> >to update the ram cache?
> 
> Yes, we never clean the content of ram cache during VM's colo lifecycle.

OK, yes that's more efficient than the original idea that I'd understood.

Dave

> 
> >Dave
> >
> >>+        }
> >>+    }
> >>+
> >>+    assert(migration_dirty_pages == 0);
> >>+}
> >>+
> >>  static SaveVMHandlers savevm_ram_handlers = {
> >>      .save_live_setup = ram_save_setup,
> >>      .save_live_iterate = ram_save_iterate,
> >>diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h
> >>index 7d43aed..2084fe2 100644
> >>--- a/include/migration/migration-colo.h
> >>+++ b/include/migration/migration-colo.h
> >>@@ -36,5 +36,6 @@ void *colo_process_incoming_checkpoints(void *opaque);
> >>  bool loadvm_in_colo_state(void);
> >>  /* ram cache */
> >>  void create_and_init_ram_cache(void);
> >>+void colo_flush_ram_cache(void);
> >>  void release_ram_cache(void);
> >>  #endif
> >>diff --git a/migration/colo.c b/migration/colo.c
> >>index a0e1b7a..5ff2ee8 100644
> >>--- a/migration/colo.c
> >>+++ b/migration/colo.c
> >>@@ -397,7 +397,6 @@ void *colo_process_incoming_checkpoints(void *opaque)
> >>          }
> >>          DPRINTF("Finish load all vm state to cache\n");
> >>          qemu_mutex_unlock_iothread();
> >>-        /* TODO: flush vm state */
> >>
> >>          ret = colo_ctl_put(ctl, COLO_CHECKPOINT_LOADED);
> >>          if (ret < 0) {
> >>--
> >>1.7.12.4
> >>
> >>
> >--
> >Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
> >.
> >
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 65+ messages in thread

end of thread, other threads:[~2015-03-12 11:50 UTC | newest]

Thread overview: 65+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-12  3:16 [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 01/27] configure: Add parameter for configure to enable/disable COLO support zhanghailiang
2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 02/27] migration: Introduce capability 'colo' to migration zhanghailiang
2015-02-16 21:57   ` Eric Blake
2015-02-25  9:19     ` zhanghailiang
2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 03/27] COLO: migrate colo related info to slave zhanghailiang
2015-02-16 23:20   ` Eric Blake
2015-02-25  6:21     ` zhanghailiang
2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 04/27] migration: Integrate COLO checkpoint process into migration zhanghailiang
2015-02-16 23:27   ` Eric Blake
2015-02-25  6:43     ` zhanghailiang
2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 05/27] migration: Integrate COLO checkpoint process into loadvm zhanghailiang
2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 06/27] migration: Don't send vm description in COLO mode zhanghailiang
2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 07/27] COLO: Implement colo checkpoint protocol zhanghailiang
2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 08/27] COLO: Add a new RunState RUN_STATE_COLO zhanghailiang
2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 09/27] QEMUSizedBuffer: Introduce two help functions for qsb zhanghailiang
2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 10/27] COLO: Save VM state to slave when do checkpoint zhanghailiang
2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 11/27] COLO RAM: Load PVM's dirty page into SVM's RAM cache temporarily zhanghailiang
2015-02-12  3:16 ` [Qemu-devel] [PATCH RFC v3 12/27] COLO VMstate: Load VM state into qsb before restore it zhanghailiang
2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 13/27] COLO RAM: Flush cached RAM into SVM's memory zhanghailiang
2015-03-11 19:08   ` Dr. David Alan Gilbert
2015-03-12  2:02     ` zhanghailiang
2015-03-12 11:49       ` Dr. David Alan Gilbert
2015-03-11 20:07   ` Dr. David Alan Gilbert
2015-03-12  2:27     ` zhanghailiang
2015-03-12  9:51       ` Dr. David Alan Gilbert
2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 14/27] COLO failover: Introduce a new command to trigger a failover zhanghailiang
2015-02-16 23:47   ` Eric Blake
2015-02-25  7:04     ` zhanghailiang
2015-02-25  7:16       ` Hongyang Yang
2015-02-25  7:40       ` Wen Congyang
2015-03-06 16:10       ` Eric Blake
2015-03-09  1:15         ` zhanghailiang
2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 15/27] COLO failover: Implement COLO master/slave failover work zhanghailiang
2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 16/27] COLO failover: Don't do failover during loading VM's state zhanghailiang
2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 17/27] COLO: Add new command parameter 'colo_nicname' 'colo_script' for net zhanghailiang
2015-02-16 23:50   ` Eric Blake
2015-02-24  9:50     ` Wen Congyang
2015-02-24 16:30       ` Eric Blake
2015-02-24 17:24         ` Daniel P. Berrange
2015-02-25  8:21           ` zhanghailiang
2015-02-25 10:09             ` Daniel P. Berrange
2015-02-25  7:50     ` zhanghailiang
2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 18/27] COLO NIC: Init/remove colo nic devices when add/cleanup tap devices zhanghailiang
2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 19/27] COLO NIC: Implement colo nic device interface configure() zhanghailiang
2015-02-16 12:03   ` Dr. David Alan Gilbert
2015-02-25  3:44     ` zhanghailiang
2015-02-25  9:08       ` Dr. David Alan Gilbert
2015-02-25  9:38         ` zhanghailiang
2015-02-25  9:40           ` Dr. David Alan Gilbert
2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 20/27] COLO NIC : Implement colo nic init/destroy function zhanghailiang
2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 21/27] COLO NIC: Some init work related with proxy module zhanghailiang
2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 22/27] COLO: Do checkpoint according to the result of net packets comparing zhanghailiang
2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 23/27] COLO: Improve checkpoint efficiency by do additional periodic checkpoint zhanghailiang
2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 24/27] COLO NIC: Implement NIC checkpoint and failover zhanghailiang
2015-03-05 17:12   ` Dr. David Alan Gilbert
2015-03-06  2:35     ` zhanghailiang
2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 25/27] COLO: Disable qdev hotplug when VM is in COLO mode zhanghailiang
2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 26/27] COLO: Implement shutdown checkpoint zhanghailiang
2015-02-12  3:17 ` [Qemu-devel] [PATCH RFC v3 27/27] COLO: Add block replication into colo process zhanghailiang
2015-02-16 13:11 ` [Qemu-devel] [PATCH RFC v3 00/27] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service Dr. David Alan Gilbert
2015-02-25  5:17   ` Gao feng
2015-02-24 11:08 ` Dr. David Alan Gilbert
2015-02-24 20:13 ` Dr. David Alan Gilbert
2015-02-25  3:20   ` Gao feng

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.