From: zhanghailiang
Date: Tue, 3 Nov 2015 19:56:18 +0800
Message-ID: <1446551816-15768-1-git-send-email-zhang.zhanghailiang@huawei.com>
Subject: [Qemu-devel] [PATCH COLO-Frame v10 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
To: qemu-devel@nongnu.org
Cc: lizhijian@cn.fujitsu.com, quintela@redhat.com, yunhong.jiang@intel.com, eddie.dong@intel.com, peter.huangpeng@huawei.com, dgilbert@redhat.com, arei.gonglei@huawei.com, stefanha@redhat.com, amit.shah@redhat.com, zhanghailiang

This is the 10th version of COLO.

This version of COLO still supports only periodic checkpoints, as MicroCheckpointing and Remus do. We call this the 'periodic' mode. The normal 'colo' mode is based on a packet compare module, which compares the network output of the Primary and Secondary VMs. That mode is not supported yet; its compare module, 'proxy', is still under development.

The 'periodic' mode is based on netfilter, which has already been merged. It uses 'filter-buffer' to buffer and release packets. Patches 32 to 36 export several netfilter APIs to support this capability.

As usual, this series contains only the COLO frame part. You can get the complete code from GitHub:
https://github.com/coloft/qemu/commits/colo-v2.1-periodic-mode

Test procedure:

1. Start QEMU.

   Primary side:
   #x86_64-softmmu/qemu-system-x86_64 -enable-kvm -netdev tap,id=hn0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown -device virtio-net-pci,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:66 -boot c -drive if=virtio,id=colo-disk1,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0.file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,children.0.driver=raw -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -S

   Secondary side:
   #x86_64-softmmu/qemu-system-x86_64 -enable-kvm -netdev tap,id=hn0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown -device virtio-net-pci,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:66 -boot c -drive if=none,id=colo-disk1,file.filename=/dev/null,driver=raw -drive if=virtio,id=active-disk1,throttling.bps-total=70000000,driver=replication,mode=secondary,file.driver=qcow2,file.file.filename=/mnt/ramfs/active_disk.img,file.backing.allow-write-backing-file=on,file.backing.driver=qcow2,file.backing.file.filename=/mnt/ramfs/hidden_disk.img,file.backing.backing.allow-write-backing-file=on,file.backing.backing.file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,file.backing.backing.driver=raw,file.backing.backing.node-name=sdisk -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -incoming tcp:0:8881

2. On the Secondary VM's QEMU monitor, issue these commands:
   (qemu) blockdev_remove_medium colo-disk1
   (qemu) blockdev_insert_medium colo-disk1 sdisk
   (qemu) nbd_server_start 192.168.2.88:8880
   (qemu) nbd_server_add -w colo-disk1

3. On the Primary VM's QEMU monitor, issue these commands:
   (qemu) drive_add buddy driver=replication,mode=primary,file.driver=nbd,file.host=192.168.2.88,file.port=8880,file.export=colo-disk1,node-name=test3,if=none
   (qemu) blockdev_change add colo-disk1 buddy test3
   (qemu) migrate_set_capability x-colo on
   (qemu) migrate tcp:192.168.2.88:8881

4. After the above steps, any change you make to the PVM will be synced to the SVM.
   You can change the checkpoint period by issuing the command "migrate_set_parameter checkpoint-delay 2000".

5. Failover test
   Kill the Primary VM and, at the same time, run 'x_colo_lost_heartbeat' in the Secondary VM's monitor. The SVM will then take over, and the client will not notice the change.

COLO is a completely new feature that is still at an early stage. Your comments and feedback are warmly welcome.

TODO:
1. Implement the packet compare module (proxy) in QEMU
2. Proxy-based checkpointing in QEMU
3. Support for continuous FT

v10:
- Rename the 'colo_lost_heartbeat' command to the experimental 'x_colo_lost_heartbeat'
- Rename the migration capability 'colo' to 'x-colo' (Eric's suggestion)
- Simplify the Primary side's process by dropping the COLO thread and reusing the migration thread (Dave's suggestion)
- Add several netfilter-related APIs to support buffering/releasing packets for COLO (patches 32 to 36)

zhanghailiang (38):
  configure: Add parameter for configure to enable/disable COLO support
  migration: Introduce capability 'x-colo' to migration
  COLO: migrate colo related info to secondary node
  migration: Add state records for migration incoming
  migration: Integrate COLO checkpoint process into migration
  migration: Integrate COLO checkpoint process into loadvm
  migration: Rename the'file' member of MigrationState and MigrationIncomingState
  COLO/migration: establish a new communication path from destination to source
  COLO: Implement colo checkpoint protocol
  COLO: Add a new RunState RUN_STATE_COLO
  QEMUSizedBuffer: Introduce two help functions for qsb
  COLO: Save PVM state to secondary side when do checkpoint
  COLO: Load PVM's dirty pages into SVM's RAM cache temporarily
  COLO: Load VMState into qsb before restore it
  ram/COLO: Record pages received from PVM by re-using migration dirty bitmap
  COLO: Flush PVM's cached RAM into SVM's memory
  COLO: synchronize PVM's state to SVM periodically
  COLO failover: Introduce a new command to trigger a failover
  COLO failover: Introduce state to record failover process
  COLO: Implement failover work for Primary VM
  COLO: Implement failover work for Secondary VM
  COLO: implement default failover treatment
  qmp event: Add event notification for COLO error
  COLO failover: Shutdown related socket fd when do failover
  COLO failover: Don't do failover during loading VM's state
  COLO: Control the checkpoint delay time by migrate-set-parameters command
  COLO: Process shutdown command for VM in COLO state
  COLO: Update the global runstate after going into colo state
  savevm: Split load vm state function qemu_loadvm_state
  COLO: Separate the process of saving/loading ram and device state
  COLO: Split qemu_savevm_state_begin out of checkpoint process
  netfilter: Add a public API to release all the buffered packets
  netfilter: Introduce an API to delete the timer of all buffer-filters
  filter-buffer: Accept zero interval
  netfilter: Introduce a API to automatically add filter-buffer for each netdev
  netfilter: Introduce an API to delete all the automatically added netfilters
  colo: Use the netfilter to buffer and release packets
  COLO: Add block replication into colo process

 configure                     |  11 +
 docs/qmp-events.txt           |  17 +
 hmp-commands.hx               |  15 +
 hmp.c                         |  15 +
 hmp.h                         |   1 +
 include/exec/ram_addr.h       |   1 +
 include/migration/colo.h      |  44 +++
 include/migration/failover.h  |  33 ++
 include/migration/migration.h |  17 +-
 include/migration/qemu-file.h |   3 +-
 include/net/filter.h          |   5 +
 include/net/net.h             |   7 +
 include/sysemu/sysemu.h       |   8 +
 migration/Makefile.objs       |   2 +
 migration/colo-comm.c         |  71 ++++
 migration/colo-failover.c     |  83 +++++
 migration/colo.c              | 788 ++++++++++++++++++++++++++++++++++++++++++
 migration/exec.c              |   4 +-
 migration/fd.c                |   4 +-
 migration/migration.c         | 178 +++++++---
 migration/qemu-file-buf.c     |  58 ++++
 migration/ram.c               | 177 +++++++++-
 migration/savevm.c            | 309 +++++++++++++----
 migration/tcp.c               |   4 +-
 migration/unix.c              |   4 +-
 net/filter-buffer.c           | 143 +++++++-
 net/filter.c                  |  15 +
 net/net.c                     |  44 +++
 qapi-schema.json              | 105 +++++-
 qapi/event.json               |  17 +
 qmp-commands.hx               |  22 +-
 stubs/Makefile.objs           |   1 +
 stubs/migration-colo.c        |  45 +++
 trace-events                  |   9 +
 vl.c                          |  37 +-
 35 files changed, 2130 insertions(+), 167 deletions(-)
 create mode 100644 include/migration/colo.h
 create mode 100644 include/migration/failover.h
 create mode 100644 migration/colo-comm.c
 create mode 100644 migration/colo-failover.c
 create mode 100644 migration/colo.c
 create mode 100644 stubs/migration-colo.c

-- 
1.8.3.1
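To illustrate the 'periodic' mode described at the top of this letter, here is a rough, self-contained C model. It follows a Remus-style output-commit idea: outgoing packets are held back, as filter-buffer does, and are released only when a checkpoint completes. This is not code from the series. ColoSim, guest_send(), do_checkpoint() and run_periodic() are hypothetical names. Time is simulated in abstract ticks, and checkpoint_delay only mirrors the role of the "checkpoint-delay" parameter; no particular unit is implied.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of COLO 'periodic' mode (not QEMU code): packets the
 * guest sends during a checkpoint period are held back, as filter-buffer
 * does, and are released to clients only once a checkpoint completes. */

#define MAX_BUFFERED 64
#define MAX_RELEASED 256

typedef struct {
    int buffered[MAX_BUFFERED];  /* held back during the current period */
    size_t n_buffered;
    int released[MAX_RELEASED];  /* packets clients have actually received */
    size_t n_released;
    int checkpoints_done;
} ColoSim;

/* The guest transmits a packet: queue it instead of sending it out. */
static void guest_send(ColoSim *s, int packet_id)
{
    assert(s->n_buffered < MAX_BUFFERED);
    s->buffered[s->n_buffered++] = packet_id;
}

/* End of a checkpoint period. In COLO this is where the PVM's state would
 * be transferred to the Secondary side; this model only counts the
 * checkpoint, then releases everything buffered since the previous one. */
static void do_checkpoint(ColoSim *s)
{
    s->checkpoints_done++;
    for (size_t i = 0; i < s->n_buffered; i++) {
        assert(s->n_released < MAX_RELEASED);
        s->released[s->n_released++] = s->buffered[i];
    }
    s->n_buffered = 0;
}

/* Simulate duration_ticks ticks: the guest sends one packet every
 * send_interval ticks and a checkpoint fires every checkpoint_delay ticks.
 * A packet sent on a checkpoint tick is queued before the checkpoint runs. */
static void run_periodic(ColoSim *s, int duration_ticks, int send_interval,
                         int checkpoint_delay)
{
    int next_id = 0;
    for (int t = 1; t <= duration_ticks; t++) {
        if (t % send_interval == 0) {
            guest_send(s, next_id++);
        }
        if (t % checkpoint_delay == 0) {
            do_checkpoint(s);
        }
    }
}
```

For example, suppose the model sends a packet every 500 ticks and checkpoints every 2000 ticks. Calling run_periodic(&s, 5000, 500, 2000) on a zero-initialized ColoSim performs two checkpoints and releases 8 of the 10 packets. The last two stay buffered until the next checkpoint. The model shows the basic trade-off: a longer checkpoint period means fewer checkpoints but delays the guest's network output.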