[Qemu-devel] [PATCH COLO-Frame v10 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)

From: zhanghailiang @ 2015-11-03 11:56 UTC
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, arei.gonglei, stefanha, amit.shah, zhanghailiang

This is the 10th version of COLO.

This version of COLO still supports only periodic checkpointing,
just as MicroCheckpointing and Remus do. We call this 'periodic' mode.
The normal 'colo' mode is based on a packet-compare module, which
is not supported yet. The compare module, 'proxy', is still under
development.

The 'periodic' mode is based on netfilter, which has already been merged.
It uses 'filter-buffer' to buffer and release packets.
Patches 32-36 export several netfilter APIs to support
this capability.
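
To illustrate the buffer-and-release idea behind 'periodic' mode, here is a minimal
Python sketch. It is only a toy model and not QEMU code: the class BufferedLink and
its methods are hypothetical names made up for this example. It assumes the usual
periodic-checkpoint rule that output produced since the last checkpoint is held back
and released only once the checkpoint has been taken.

```python
from collections import deque


class BufferedLink:
    """Toy model of output buffering in periodic mode (hypothetical names)."""

    def __init__(self):
        self.pending = deque()  # packets sent since the last checkpoint
        self.released = []      # packets that have actually left the host

    def send(self, packet):
        # Guest output is queued instead of being put on the wire.
        self.pending.append(packet)

    def checkpoint(self, sync_state):
        # Sync the primary's state to the secondary first, then release the
        # buffered output, so clients never see output from unsynced state.
        sync_state()
        while self.pending:
            self.released.append(self.pending.popleft())


def run_demo():
    link = BufferedLink()
    synced = []
    link.send("pkt-1")
    link.send("pkt-2")
    before = list(link.released)  # nothing is released before the checkpoint
    link.checkpoint(lambda: synced.append("checkpoint-1"))
    return before, list(link.released), synced
```

In this model a longer checkpoint period means packets wait longer in the buffer,
which is the latency cost of periodic mode.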

As usual, this series contains only the COLO frame part; the complete code is available on GitHub:
https://github.com/coloft/qemu/commits/colo-v2.1-periodic-mode

Test procedure:
1. Start QEMU
Primary side:
#x86_64-softmmu/qemu-system-x86_64 -enable-kvm -netdev tap,id=hn0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown -device virtio-net-pci,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:66 -boot c -drive if=virtio,id=colo-disk1,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0.file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,children.0.driver=raw -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -S

Secondary side:
#x86_64-softmmu/qemu-system-x86_64 -enable-kvm -netdev tap,id=hn0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown -device virtio-net-pci,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:66 -boot c -drive if=none,id=colo-disk1,file.filename=/dev/null,driver=raw -drive if=virtio,id=active-disk1,throttling.bps-total=70000000,driver=replication,mode=secondary,file.driver=qcow2,file.file.filename=/mnt/ramfs/active_disk.img,file.backing.allow-write-backing-file=on,file.backing.driver=qcow2,file.backing.file.filename=/mnt/ramfs/hidden_disk.img,file.backing.backing.allow-write-backing-file=on,file.backing.backing.file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,file.backing.backing.driver=raw,file.backing.backing.node-name=sdisk -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -incoming tcp:0:8881

2. On the Secondary VM's QEMU monitor, issue the following commands:
(qemu) blockdev_remove_medium colo-disk1
(qemu) blockdev_insert_medium colo-disk1 sdisk
(qemu) nbd_server_start 192.168.2.88:8880
(qemu) nbd_server_add -w colo-disk1

3. On the Primary VM's QEMU monitor, issue the following commands:
(qemu) drive_add buddy driver=replication,mode=primary,file.driver=nbd,file.host=192.168.2.88,file.port=8880,file.export=colo-disk1,node-name=test3,if=none
(qemu) blockdev_change add colo-disk1 buddy test3
(qemu) migrate_set_capability x-colo on
(qemu) migrate tcp:192.168.2.88:8881

4. After the above steps, whenever you make changes to the PVM, the SVM will be synced.
You can change the checkpoint period by issuing the command
"migrate_set_parameter checkpoint-delay 2000".

5. Failover test
You can kill the Primary VM and, at the same time, run 'x_colo_lost_heartbeat' in the
Secondary VM's monitor; the SVM will then take over and the client will not notice
the change.
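
Step 4 changes the checkpoint period from the HMP monitor. For scripted testing,
the same change could probably be sent over QMP with the 'migrate-set-parameters'
command. The Python sketch below only builds the JSON request. The QMP argument
name 'checkpoint-delay' is an assumption copied from the HMP parameter name in
step 4, and the helper function name is made up for this example.

```python
import json


def checkpoint_delay_command(delay):
    """Build a QMP request that sets the COLO checkpoint period.

    Assumption: the QMP argument is named 'checkpoint-delay', mirroring
    the HMP parameter used in step 4 of the test procedure.
    """
    request = {
        "execute": "migrate-set-parameters",
        "arguments": {"checkpoint-delay": delay},
    }
    return json.dumps(request)


if __name__ == "__main__":
    # Same value as the HMP example in step 4.
    print(checkpoint_delay_command(2000))
```

The resulting string would be written to the QMP socket after the usual
capabilities negotiation.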

COLO is a completely new feature that is still at an early stage;
your comments and feedback are warmly welcome.

TODO:
1. Implement the packet-compare module (proxy) in QEMU
2. Implement proxy-based checkpointing in QEMU
3. Support continuous FT

v10:
 - Rename 'colo_lost_heartbeat' command to experimental 'x_colo_lost_heartbeat'
 - Rename migration capability 'colo' to 'x-colo' (Eric's suggestion)
 - Simplify the primary-side process by dropping the colo thread and reusing
   the migration thread. (Dave's suggestion)
 - Add several netfilter-related APIs to support buffering and releasing
   packets for COLO (patches 32-36)

zhanghailiang (38):
  configure: Add parameter for configure to enable/disable COLO support
  migration: Introduce capability 'x-colo' to migration
  COLO: migrate colo related info to secondary node
  migration: Add state records for migration incoming
  migration: Integrate COLO checkpoint process into migration
  migration: Integrate COLO checkpoint process into loadvm
  migration: Rename the'file' member of MigrationState and
    MigrationIncomingState
  COLO/migration: establish a new communication path from destination to
    source
  COLO: Implement colo checkpoint protocol
  COLO: Add a new RunState RUN_STATE_COLO
  QEMUSizedBuffer: Introduce two help functions for qsb
  COLO: Save PVM state to secondary side when do checkpoint
  COLO: Load PVM's dirty pages into SVM's RAM cache temporarily
  COLO: Load VMState into qsb before restore it
  ram/COLO: Record pages received from PVM by re-using migration dirty
    bitmap
  COLO: Flush PVM's cached RAM into SVM's memory
  COLO: synchronize PVM's state to SVM periodically
  COLO failover: Introduce a new command to trigger a failover
  COLO failover: Introduce state to record failover process
  COLO: Implement failover work for Primary VM
  COLO: Implement failover work for Secondary VM
  COLO: implement default failover treatment
  qmp event: Add event notification for COLO error
  COLO failover: Shutdown related socket fd when do failover
  COLO failover: Don't do failover during loading VM's state
  COLO: Control the checkpoint delay time by migrate-set-parameters
    command
  COLO: Process shutdown command for VM in COLO state
  COLO: Update the global runstate after going into colo state
  savevm: Split load vm state function qemu_loadvm_state
  COLO: Separate the process of saving/loading ram and device state
  COLO: Split qemu_savevm_state_begin out of checkpoint process
  netfilter: Add a public API to release all the buffered packets
  netfilter: Introduce an API to delete the timer of all buffer-filters
  filter-buffer: Accept zero interval
  netfilter: Introduce a API to automatically add filter-buffer for each
    netdev
  netfilter: Introduce an API to delete all the automatically added
    netfilters
  colo: Use the netfilter to buffer and release packets
  COLO: Add block replication into colo process

 configure                     |  11 +
 docs/qmp-events.txt           |  17 +
 hmp-commands.hx               |  15 +
 hmp.c                         |  15 +
 hmp.h                         |   1 +
 include/exec/ram_addr.h       |   1 +
 include/migration/colo.h      |  44 +++
 include/migration/failover.h  |  33 ++
 include/migration/migration.h |  17 +-
 include/migration/qemu-file.h |   3 +-
 include/net/filter.h          |   5 +
 include/net/net.h             |   7 +
 include/sysemu/sysemu.h       |   8 +
 migration/Makefile.objs       |   2 +
 migration/colo-comm.c         |  71 ++++
 migration/colo-failover.c     |  83 +++++
 migration/colo.c              | 788 ++++++++++++++++++++++++++++++++++++++++++
 migration/exec.c              |   4 +-
 migration/fd.c                |   4 +-
 migration/migration.c         | 178 +++++++---
 migration/qemu-file-buf.c     |  58 ++++
 migration/ram.c               | 177 +++++++++-
 migration/savevm.c            | 309 +++++++++++++----
 migration/tcp.c               |   4 +-
 migration/unix.c              |   4 +-
 net/filter-buffer.c           | 143 +++++++-
 net/filter.c                  |  15 +
 net/net.c                     |  44 +++
 qapi-schema.json              | 105 +++++-
 qapi/event.json               |  17 +
 qmp-commands.hx               |  22 +-
 stubs/Makefile.objs           |   1 +
 stubs/migration-colo.c        |  45 +++
 trace-events                  |   9 +
 vl.c                          |  37 +-
 35 files changed, 2130 insertions(+), 167 deletions(-)
 create mode 100644 include/migration/colo.h
 create mode 100644 include/migration/failover.h
 create mode 100644 migration/colo-comm.c
 create mode 100644 migration/colo-failover.c
 create mode 100644 migration/colo.c
 create mode 100644 stubs/migration-colo.c

-- 
1.8.3.1
