* [Qemu-devel] [PATCH COLO-Frame v11 00/39] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

This is the 11th version of COLO.

As usual, this version of COLO only supports periodic checkpoints,
just like MicroCheckpointing and Remus do.

The 'periodic' mode is based on the netfilter framework, which has already
been merged. It uses the 'filter-buffer' to buffer and release packets.
We add a default buffer filter named 'nop' to each netdev.
The 'nop' buffer filter does not buffer any packets by default,
so it has no side effect on the netdev.
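
For reference, a buffer filter can also be attached to a netdev manually
with the already-merged -object syntax; the values below are purely
illustrative:
  -object filter-buffer,id=f0,netdev=hn0,queue=rx,interval=400000
The default 'nop' filter added by this series is created automatically
and stays pass-through until COLO starts buffering on it.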

As usual, this series contains only the COLO frame part; you can get the complete code from GitHub:
https://github.com/coloft/qemu/commits/colo-v2.2-periodic-mode

Test procedure:
1. Start QEMU
Primary side:
#x86_64-softmmu/qemu-system-x86_64 -enable-kvm -boot c -m 2048 -smp 8 -qmp stdio -vnc :7 -name primary -cpu qemu64,+kvmclock -device piix3-usb-uhci -device usb-tablet -netdev tap,id=hn0 -device virtio-net-pci,id=net-pci0,netdev=hn0 -drive if=virtio,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0.file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,children.0.driver=raw

Secondary side:
#x86_64-softmmu/qemu-system-x86_64 -enable-kvm -boot c -m 2048 -smp 8 -qmp stdio -vnc :7 -name secondary -cpu qemu64,+kvmclock -device piix3-usb-uhci -device usb-tablet -netdev tap,id=hn0 -device virtio-net-pci,id=net-pci0,netdev=hn0 -drive if=none,id=colo-disk0,file.filename=/dev/null,driver=raw -drive if=virtio,id=active-disk0,throttling.bps-total=70000000,driver=replication,mode=secondary,file.driver=qcow2,file.file.filename=/mnt/ramfs/active_disk.img,file.backing.allow-write-backing-file=on,file.backing.driver=qcow2,file.backing.file.filename=/mnt/ramfs/hidden_disk.img,file.backing.backing.allow-write-backing-file=on,file.backing.backing.file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,file.backing.backing.driver=raw,file.backing.backing.node-name=node0 -incoming tcp:0:8888

2. On the Secondary VM's QEMU monitor, issue the following commands
(Note: the HMP commands are unstable, so we recommend using the QMP commands here):
{'execute':'qmp_capabilities'}
{'execute': 'blockdev-remove-medium', 'arguments': {'device': 'colo-disk0'} }
{'execute': 'blockdev-insert-medium', 'arguments': {'device': 'colo-disk0', 'node-name': 'node0'} }
{'execute': 'nbd-server-start', 'arguments': {'addr': {'type': 'inet', 'data': {'host': '192.168.2.88', 'port': '8889'} } } }
{'execute': 'nbd-server-add', 'arguments': {'device': 'colo-disk0', 'writable': true } }
{'execute': 'trace-event-set-state', 'arguments': {'name': 'colo*', 'enable': true} }

3. On the Primary VM's QEMU monitor, issue the following commands:
{'execute':'qmp_capabilities'}
{'execute': 'human-monitor-command', 'arguments': {'command-line': 'drive_add buddy driver=replication,mode=primary,file.driver=nbd,file.host=9.61.1.7,file.port=8889,file.export=colo-disk0,node-name=node0,if=none'}}
{'execute':'x-blockdev-change', 'arguments':{'parent': 'colo-disk0', 'node': 'node0' } }
{'execute': 'migrate-set-capabilities', 'arguments': {'capabilities': [ {'capability': 'x-colo', 'state': true } ] } }
{'execute': 'migrate', 'arguments': {'uri': 'tcp:192.168.2.88:8888' } }

4. After the above steps, whenever you make changes to the PVM, the SVM will be synchronized.
You can issue the command '{ "execute": "migrate-set-parameters" , "arguments":{ "checkpoint-delay": 2000 } }'
to change the checkpoint period.

5. Failover test
Kill the Primary VM and run 'x_colo_lost_heartbeat' in the Secondary VM's
monitor at the same time; the SVM will then fail over and clients will not
notice the change.
The QMP command is '{ "execute": "x-colo-lost-heartbeat" }'.

COLO is a totally new feature which is still at an early stage;
your comments and feedback are warmly welcomed.

TODO:
1. Implement the packet comparing module (proxy) in QEMU
2. Checkpointing based on the proxy in QEMU
3. The capability of continuous FT

v11:
 - Re-implement buffer/release packets based on filter-buffer according
   to Jason Wang's suggestion. (patch 34, patch 36 ~ patch 38)
 - Rebase on master to reuse some stuff introduced by post-copy.
 - Address several comments from Eric and Dave; the fixing records can
   be found in each patch.

v10:
 - Rename 'colo_lost_heartbeat' command to experimental 'x_colo_lost_heartbeat'
 - Rename migration capability 'colo' to 'x-colo' (Eric's suggestion)
 - Simplify the process of primary side by dropping colo thread and reusing
   migration thread. (Dave's suggestion)
 - Add several netfilter related APIs to support buffer/release packets
   for COLO (patch 32 ~ patch 36)

zhanghailiang (39):
  configure: Add parameter for configure to enable/disable COLO support
  migration: Introduce capability 'x-colo' to migration
  COLO: migrate colo related info to secondary node
  migration: Export migrate_set_state()
  migration: Add state records for migration incoming
  migration: Integrate COLO checkpoint process into migration
  migration: Integrate COLO checkpoint process into loadvm
  migration: Rename the 'file' member of MigrationState
  COLO/migration: Create a new communication path from destination to
    source
  COLO: Implement colo checkpoint protocol
  COLO: Add a new RunState RUN_STATE_COLO
  QEMUSizedBuffer: Introduce two help functions for qsb
  COLO: Save PVM state to secondary side when do checkpoint
  ram: Split host_from_stream_offset() into two helper functions
  COLO: Load PVM's dirty pages into SVM's RAM cache temporarily
  ram/COLO: Record the dirty pages that SVM received
  COLO: Load VMState into qsb before restore it
  COLO: Flush PVM's cached RAM into SVM's memory
  COLO: Add checkpoint-delay parameter for migrate-set-parameters
  COLO: synchronize PVM's state to SVM periodically
  COLO failover: Introduce a new command to trigger a failover
  COLO failover: Introduce state to record failover process
  COLO: Implement failover work for Primary VM
  COLO: Implement failover work for Secondary VM
  COLO: implement default failover treatment
  qmp event: Add event notification for COLO error
  COLO failover: Shutdown related socket fd when do failover
  COLO failover: Don't do failover during loading VM's state
  COLO: Process shutdown command for VM in COLO state
  COLO: Update the global runstate after going into colo state
  savevm: Split load vm state function qemu_loadvm_state
  COLO: Separate the process of saving/loading ram and device state
  COLO: Split qemu_savevm_state_begin out of checkpoint process
  net/filter-buffer: Add default filter-buffer for each netdev
  filter-buffer: Accept zero interval
  filter-buffer: Introduce a helper function to enable/disable default
    filter
  filter-buffer: Introduce a helper function to release packets
  colo: Use default buffer-filter to buffer and release packets
  COLO: Add block replication into colo process

 configure                     |  11 +
 docs/qmp-events.txt           |  17 +
 hmp-commands.hx               |  15 +
 hmp.c                         |  15 +
 hmp.h                         |   1 +
 include/exec/ram_addr.h       |   1 +
 include/migration/colo.h      |  38 +++
 include/migration/failover.h  |  33 ++
 include/migration/migration.h |  18 +-
 include/migration/qemu-file.h |   3 +-
 include/net/filter.h          |   5 +
 include/net/net.h             |   4 +
 include/sysemu/sysemu.h       |   9 +
 migration/Makefile.objs       |   2 +
 migration/colo-comm.c         |  71 ++++
 migration/colo-failover.c     |  83 +++++
 migration/colo.c              | 773 ++++++++++++++++++++++++++++++++++++++++++
 migration/exec.c              |   4 +-
 migration/fd.c                |   4 +-
 migration/migration.c         | 215 ++++++++----
 migration/postcopy-ram.c      |   6 +-
 migration/qemu-file-buf.c     |  61 ++++
 migration/ram.c               | 195 ++++++++++-
 migration/savevm.c            | 295 ++++++++++++----
 migration/tcp.c               |   4 +-
 migration/unix.c              |   4 +-
 net/filter-buffer.c           | 123 ++++++-
 net/net.c                     |  37 ++
 qapi-schema.json              | 110 +++++-
 qapi/event.json               |  17 +
 qmp-commands.hx               |  22 +-
 stubs/Makefile.objs           |   1 +
 stubs/migration-colo.c        |  45 +++
 trace-events                  |   9 +
 vl.c                          |  37 +-
 35 files changed, 2105 insertions(+), 183 deletions(-)
 create mode 100644 include/migration/colo.h
 create mode 100644 include/migration/failover.h
 create mode 100644 migration/colo-comm.c
 create mode 100644 migration/colo-failover.c
 create mode 100644 migration/colo.c
 create mode 100644 stubs/migration-colo.c

-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v11 01/39] configure: Add parameter for configure to enable/disable COLO support
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

Add configure options --enable-colo/--disable-colo to switch COLO
support on/off.
COLO support is on by default.
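
For example (the target list below is only illustrative):

  ./configure --target-list=x86_64-softmmu --enable-colo
  ./configure --target-list=x86_64-softmmu --disable-colo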

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
---
v11:
- Turn COLO on by default (Eric's suggestion)
---
 configure | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/configure b/configure
index 71d6cbc..7e0d947 100755
--- a/configure
+++ b/configure
@@ -260,6 +260,7 @@ xfs=""
 vhost_net="no"
 vhost_scsi="no"
 kvm="no"
+colo="yes"
 rdma=""
 gprof="no"
 debug_tcg="no"
@@ -937,6 +938,10 @@ for opt do
   ;;
   --enable-kvm) kvm="yes"
   ;;
+  --disable-colo) colo="no"
+  ;;
+  --enable-colo) colo="yes"
+  ;;
   --disable-tcg-interpreter) tcg_interpreter="no"
   ;;
   --enable-tcg-interpreter) tcg_interpreter="yes"
@@ -1360,6 +1365,7 @@ disabled with --disable-FEATURE, default is enabled if available:
   fdt             fdt device tree
   bluez           bluez stack connectivity
   kvm             KVM acceleration support
+  colo            COarse-grain LOck-stepping VM for Non-stop Service
   rdma            RDMA-based migration support
   uuid            uuid support
   vde             support for vde network
@@ -4760,6 +4766,7 @@ echo "Linux AIO support $linux_aio"
 echo "ATTR/XATTR support $attr"
 echo "Install blobs     $blobs"
 echo "KVM support       $kvm"
+echo "COLO support      $colo"
 echo "RDMA support      $rdma"
 echo "TCG interpreter   $tcg_interpreter"
 echo "fdt support       $fdt"
@@ -5349,6 +5356,10 @@ if have_backend "ftrace"; then
 fi
 echo "CONFIG_TRACE_FILE=$trace_file" >> $config_host_mak
 
+if test "$colo" = "yes"; then
+  echo "CONFIG_COLO=y" >> $config_host_mak
+fi
+
 if test "$rdma" = "yes" ; then
   echo "CONFIG_RDMA=y" >> $config_host_mak
 fi
-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v11 02/39] migration: Introduce capability 'x-colo' to migration
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, Markus Armbruster, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, hongyang.yang

We add a helper function colo_supported() to indicate whether COLO is
supported or not, and use it to control whether the 'x-colo' string is
shown to users. They can use the QMP command 'query-migrate-capabilities'
or the HMP command 'info migrate_capabilities' to learn whether COLO is
supported.
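
A minimal QMP sketch of how a user could check for and enable the
capability (assuming QEMU was configured with --enable-colo):

{'execute': 'qmp_capabilities'}
{'execute': 'query-migrate-capabilities'}
{'execute': 'migrate-set-capabilities', 'arguments': {'capabilities': [ {'capability': 'x-colo', 'state': true } ] } }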

Cc: Juan Quintela <quintela@redhat.com>
Cc: Amit Shah <amit.shah@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
v10:
- Rename capability 'colo' to experimental 'x-colo' (Eric's suggestion).
- Rename migrate_enable_colo() to migrate_colo_enabled() (Eric's suggestion).
---
 include/migration/colo.h      | 20 ++++++++++++++++++++
 include/migration/migration.h |  1 +
 migration/Makefile.objs       |  1 +
 migration/colo.c              | 18 ++++++++++++++++++
 migration/migration.c         | 17 +++++++++++++++++
 qapi-schema.json              |  6 +++++-
 qmp-commands.hx               |  1 +
 stubs/Makefile.objs           |  1 +
 stubs/migration-colo.c        | 18 ++++++++++++++++++
 9 files changed, 82 insertions(+), 1 deletion(-)
 create mode 100644 include/migration/colo.h
 create mode 100644 migration/colo.c
 create mode 100644 stubs/migration-colo.c

diff --git a/include/migration/colo.h b/include/migration/colo.h
new file mode 100644
index 0000000..c60a590
--- /dev/null
+++ b/include/migration/colo.h
@@ -0,0 +1,20 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_COLO_H
+#define QEMU_COLO_H
+
+#include "qemu-common.h"
+
+bool colo_supported(void);
+
+#endif
diff --git a/include/migration/migration.h b/include/migration/migration.h
index fd018b7..1f004e4 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -268,6 +268,7 @@ int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen);
 
 int migrate_use_xbzrle(void);
 int64_t migrate_xbzrle_cache_size(void);
+bool migrate_colo_enabled(void);
 
 int64_t xbzrle_cache_resize(int64_t new_size);
 
diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index 0cac6d7..65ecc35 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -1,4 +1,5 @@
 common-obj-y += migration.o tcp.o
+common-obj-$(CONFIG_COLO) += colo.o
 common-obj-y += vmstate.o
 common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o qemu-file-stdio.o
 common-obj-y += xbzrle.o postcopy-ram.o
diff --git a/migration/colo.c b/migration/colo.c
new file mode 100644
index 0000000..2c40d2e
--- /dev/null
+++ b/migration/colo.c
@@ -0,0 +1,18 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "migration/colo.h"
+
+bool colo_supported(void)
+{
+    return true;
+}
diff --git a/migration/migration.c b/migration/migration.c
index 1a42aee..efacf52 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -33,6 +33,7 @@
 #include "qom/cpu.h"
 #include "exec/memory.h"
 #include "exec/address-spaces.h"
+#include "migration/colo.h"
 
 #define MAX_THROTTLE  (32 << 20)      /* Migration transfer speed throttling */
 
@@ -480,6 +481,9 @@ MigrationCapabilityStatusList *qmp_query_migrate_capabilities(Error **errp)
 
     caps = NULL; /* silence compiler warning */
     for (i = 0; i < MIGRATION_CAPABILITY_MAX; i++) {
+        if (i == MIGRATION_CAPABILITY_X_COLO && !colo_supported()) {
+            continue;
+        }
         if (head == NULL) {
             head = g_malloc0(sizeof(*caps));
             caps = head;
@@ -679,6 +683,13 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
     }
 
     for (cap = params; cap; cap = cap->next) {
+        if (cap->value->capability == MIGRATION_CAPABILITY_X_COLO &&
+            !colo_supported()) {
+            error_setg(errp, "COLO is not currently supported, please"
+                             " configure with --enable-colo option in order to"
+                             " support COLO feature");
+            continue;
+        }
         s->enabled_capabilities[cap->value->capability] = cap->value->state;
     }
 
@@ -1581,6 +1592,12 @@ fail:
     migrate_set_state(s, current_active_state, MIGRATION_STATUS_FAILED);
 }
 
+bool migrate_colo_enabled(void)
+{
+    MigrationState *s = migrate_get_current();
+    return s->enabled_capabilities[MIGRATION_CAPABILITY_X_COLO];
+}
+
 /*
  * Master migration thread on the source VM.
  * It drives the migration and pumps the data down the outgoing channel.
diff --git a/qapi-schema.json b/qapi-schema.json
index 8b1a423..d20c0ec 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -546,11 +546,15 @@
 #          been migrated, pulling the remaining pages along as needed. NOTE: If
 #          the migration fails during postcopy the VM will fail.  (since 2.5)
 #
+# @x-colo: If enabled, migration will never end, and the state of the VM on the
+#        primary side will be migrated continuously to the VM on secondary
+#        side. (since 2.6)
+#
 # Since: 1.2
 ##
 { 'enum': 'MigrationCapability',
   'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
-           'compress', 'events', 'x-postcopy-ram'] }
+           'compress', 'events', 'x-postcopy-ram', 'x-colo'] }
 
 ##
 # @MigrationCapabilityStatus
diff --git a/qmp-commands.hx b/qmp-commands.hx
index e209688..92042d8 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -3625,6 +3625,7 @@ Query current migration capabilities
          - "rdma-pin-all" : RDMA Pin Page state (json-bool)
          - "auto-converge" : Auto Converge state (json-bool)
          - "zero-blocks" : Zero Blocks state (json-bool)
+         - "x-colo" : COarse-Grain LOck Stepping for Non-stop Service (json-bool)
 
 Arguments:
 
diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index d7898a0..12eb4c6 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -38,3 +38,4 @@ stub-obj-y += qmp_pc_dimm_device_list.o
 stub-obj-y += target-monitor-defs.o
 stub-obj-y += target-get-monitor-def.o
 stub-obj-y += vhost.o
+stub-obj-y += migration-colo.o
diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
new file mode 100644
index 0000000..3d817df
--- /dev/null
+++ b/stubs/migration-colo.c
@@ -0,0 +1,18 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "migration/colo.h"
+
+bool colo_supported(void)
+{
+    return false;
+}
-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v11 03/39] COLO: migrate colo related info to secondary node
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

The destination can know whether the VM should go into COLO mode by
referring to the info that has been migrated from the PVM.

We skip this section if COLO is not enabled (i.e.
'migrate_set_capability colo off'), so that compatibility with normal
migration is not broken, regardless of the --enable-colo/--disable-colo
setting on the source/destination.
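
For illustration, a later patch in this series checks the migrated flag
on the destination side, roughly:

    if (migration_incoming_enable_colo()) {
        /* switch from the normal loadvm path into the COLO path */
    }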

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v11:
- Add Reviewed-by tag
v10:
- Use VMSTATE_BOOL instead of VMSTATE_UINT32 for 'colo_requested' (Dave's suggestion).
---
 include/migration/colo.h |  2 ++
 migration/Makefile.objs  |  1 +
 migration/colo-comm.c    | 50 ++++++++++++++++++++++++++++++++++++++++++++++++
 vl.c                     |  3 ++-
 4 files changed, 55 insertions(+), 1 deletion(-)
 create mode 100644 migration/colo-comm.c

diff --git a/include/migration/colo.h b/include/migration/colo.h
index c60a590..9b6662d 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -14,7 +14,9 @@
 #define QEMU_COLO_H
 
 #include "qemu-common.h"
+#include "migration/migration.h"
 
 bool colo_supported(void);
+void colo_info_mig_init(void);
 
 #endif
diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index 65ecc35..81b5713 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -1,5 +1,6 @@
 common-obj-y += migration.o tcp.o
 common-obj-$(CONFIG_COLO) += colo.o
+common-obj-y += colo-comm.o
 common-obj-y += vmstate.o
 common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o qemu-file-stdio.o
 common-obj-y += xbzrle.o postcopy-ram.o
diff --git a/migration/colo-comm.c b/migration/colo-comm.c
new file mode 100644
index 0000000..fb407e0
--- /dev/null
+++ b/migration/colo-comm.c
@@ -0,0 +1,50 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later. See the COPYING file in the top-level directory.
+ *
+ */
+
+#include <migration/colo.h>
+#include "trace.h"
+
+typedef struct {
+     bool colo_requested;
+} COLOInfo;
+
+static COLOInfo colo_info;
+
+static void colo_info_pre_save(void *opaque)
+{
+    COLOInfo *s = opaque;
+
+    s->colo_requested = migrate_colo_enabled();
+}
+
+static bool colo_info_need(void *opaque)
+{
+   return migrate_colo_enabled();
+}
+
+static const VMStateDescription colo_state = {
+     .name = "COLOState",
+     .version_id = 1,
+     .minimum_version_id = 1,
+     .pre_save = colo_info_pre_save,
+     .needed = colo_info_need,
+     .fields = (VMStateField[]) {
+         VMSTATE_BOOL(colo_requested, COLOInfo),
+         VMSTATE_END_OF_LIST()
+        },
+};
+
+void colo_info_mig_init(void)
+{
+    vmstate_register(NULL, 0, &colo_state, &colo_info);
+}
diff --git a/vl.c b/vl.c
index 525929b..122199f 100644
--- a/vl.c
+++ b/vl.c
@@ -91,6 +91,7 @@ int main(int argc, char **argv)
 #include "sysemu/dma.h"
 #include "audio/audio.h"
 #include "migration/migration.h"
+#include "migration/colo.h"
 #include "sysemu/kvm.h"
 #include "qapi/qmp/qjson.h"
 #include "qemu/option.h"
@@ -4450,7 +4451,7 @@ int main(int argc, char **argv, char **envp)
 
     blk_mig_init();
     ram_mig_init();
-
+    colo_info_mig_init();
     /* If the currently selected machine wishes to override the units-per-bus
      * property of its default HBA interface type, do so now. */
     if (machine_class->units_per_default_bus) {
-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v11 04/39] migration: Export migrate_set_state()
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

Fix the first parameter of migrate_set_state() and export the function.
We will use it later.
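
For illustration, after this change the same helper can also operate on
a plain 'int' state field, e.g. the incoming side's state added in a
later patch of this series:

    migrate_set_state(&mis->state, MIGRATION_STATUS_NONE,
                      MIGRATION_STATUS_ACTIVE);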

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
v11:
- New patch which is split from patch
  'migration: Add state records for migration incoming' (Juan's suggestion)
---
 include/migration/migration.h |  2 ++
 migration/migration.c         | 36 +++++++++++++++++++++---------------
 2 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 1f004e4..4b19e80 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -169,6 +169,8 @@ struct MigrationState
     RAMBlock *last_req_rb;
 };
 
+void migrate_set_state(int *state, int old_state, int new_state);
+
 void process_incoming_migration(QEMUFile *f);
 
 void qemu_start_incoming_migration(const char *uri, Error **errp);
diff --git a/migration/migration.c b/migration/migration.c
index efacf52..7d98a3e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -798,9 +798,9 @@ void qmp_migrate_start_postcopy(Error **errp)
 
 /* shared migration helpers */
 
-static void migrate_set_state(MigrationState *s, int old_state, int new_state)
+void migrate_set_state(int *state, int old_state, int new_state)
 {
-    if (atomic_cmpxchg(&s->state, old_state, new_state) == old_state) {
+    if (atomic_cmpxchg(state, old_state, new_state) == old_state) {
         trace_migrate_set_state(new_state);
         migrate_generate_event(new_state);
     }
@@ -833,7 +833,7 @@ static void migrate_fd_cleanup(void *opaque)
            (s->state != MIGRATION_STATUS_POSTCOPY_ACTIVE));
 
     if (s->state == MIGRATION_STATUS_CANCELLING) {
-        migrate_set_state(s, MIGRATION_STATUS_CANCELLING,
+        migrate_set_state(&s->state, MIGRATION_STATUS_CANCELLING,
                           MIGRATION_STATUS_CANCELLED);
     }
 
@@ -844,7 +844,8 @@ void migrate_fd_error(MigrationState *s)
 {
     trace_migrate_fd_error();
     assert(s->file == NULL);
-    migrate_set_state(s, MIGRATION_STATUS_SETUP, MIGRATION_STATUS_FAILED);
+    migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
+                      MIGRATION_STATUS_FAILED);
     notifier_list_notify(&migration_state_notifiers, s);
 }
 
@@ -864,7 +865,7 @@ static void migrate_fd_cancel(MigrationState *s)
         if (!migration_is_setup_or_active(old_state)) {
             break;
         }
-        migrate_set_state(s, old_state, MIGRATION_STATUS_CANCELLING);
+        migrate_set_state(&s->state, old_state, MIGRATION_STATUS_CANCELLING);
     } while (s->state != MIGRATION_STATUS_CANCELLING);
 
     /*
@@ -938,7 +939,7 @@ MigrationState *migrate_init(const MigrationParams *params)
     s->migration_thread_running = false;
     s->last_req_rb = NULL;
 
-    migrate_set_state(s, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP);
+    migrate_set_state(&s->state, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP);
 
     QSIMPLEQ_INIT(&s->src_page_requests);
 
@@ -1037,7 +1038,8 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     } else {
         error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "uri",
                    "a valid migration protocol");
-        migrate_set_state(s, MIGRATION_STATUS_SETUP, MIGRATION_STATUS_FAILED);
+        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
+                          MIGRATION_STATUS_FAILED);
         return;
     }
 
@@ -1416,7 +1418,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
     int ret;
     const QEMUSizedBuffer *qsb;
     int64_t time_at_stop = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
-    migrate_set_state(ms, MIGRATION_STATUS_ACTIVE,
+    migrate_set_state(&ms->state, MIGRATION_STATUS_ACTIVE,
                       MIGRATION_STATUS_POSTCOPY_ACTIVE);
 
     trace_postcopy_start();
@@ -1507,7 +1509,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
     ret = qemu_file_get_error(ms->file);
     if (ret) {
         error_report("postcopy_start: Migration stream errored");
-        migrate_set_state(ms, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+        migrate_set_state(&ms->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
                               MIGRATION_STATUS_FAILED);
     }
 
@@ -1516,7 +1518,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
 fail_closefb:
     qemu_fclose(fb);
 fail:
-    migrate_set_state(ms, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+    migrate_set_state(&ms->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
                           MIGRATION_STATUS_FAILED);
     qemu_mutex_unlock_iothread();
     return -1;
@@ -1585,11 +1587,13 @@ static void migration_completion(MigrationState *s, int current_active_state,
         goto fail;
     }
 
-    migrate_set_state(s, current_active_state, MIGRATION_STATUS_COMPLETED);
+    migrate_set_state(&s->state, current_active_state,
+                      MIGRATION_STATUS_COMPLETED);
     return;
 
 fail:
-    migrate_set_state(s, current_active_state, MIGRATION_STATUS_FAILED);
+    migrate_set_state(&s->state, current_active_state,
+                      MIGRATION_STATUS_FAILED);
 }
 
 bool migrate_colo_enabled(void)
@@ -1640,7 +1644,8 @@ static void *migration_thread(void *opaque)
 
     s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
     current_active_state = MIGRATION_STATUS_ACTIVE;
-    migrate_set_state(s, MIGRATION_STATUS_SETUP, MIGRATION_STATUS_ACTIVE);
+    migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
+                      MIGRATION_STATUS_ACTIVE);
 
     trace_migration_thread_setup_complete();
 
@@ -1683,7 +1688,8 @@ static void *migration_thread(void *opaque)
         }
 
         if (qemu_file_get_error(s->file)) {
-            migrate_set_state(s, current_active_state, MIGRATION_STATUS_FAILED);
+            migrate_set_state(&s->state, current_active_state,
+                              MIGRATION_STATUS_FAILED);
             trace_migration_thread_file_err();
             break;
         }
@@ -1764,7 +1770,7 @@ void migrate_fd_connect(MigrationState *s)
     if (migrate_postcopy_ram()) {
         if (open_return_path_on_source(s)) {
             error_report("Unable to open return-path for postcopy");
-            migrate_set_state(s, MIGRATION_STATUS_SETUP,
+            migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
                               MIGRATION_STATUS_FAILED);
             migrate_fd_cleanup(s);
             return;
-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v11 05/39] migration: Add state records for migration incoming
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

For the migration destination, we also need to know its state;
we will use it in COLO.

Here we add a new member 'state' to MigrationIncomingState,
and also use migrate_set_state() to modify its value.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v11:
- Split exporting migrate_set_state() part into a new patch (Juan's suggestion)
---
 include/migration/migration.h |  1 +
 migration/migration.c         | 14 +++++++++-----
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 4b19e80..99dfa92 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -105,6 +105,7 @@ struct MigrationIncomingState {
     QemuMutex rp_mutex;    /* We send replies from multiple threads */
     void     *postcopy_tmp_page;
 
+    int state;
     /* See savevm.c */
     LoadStateEntry_Head loadvm_handlers;
 };
diff --git a/migration/migration.c b/migration/migration.c
index 7d98a3e..390381d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -112,6 +112,7 @@ MigrationIncomingState *migration_incoming_state_new(QEMUFile* f)
 {
     mis_current = g_new0(MigrationIncomingState, 1);
     mis_current->from_src_file = f;
+    mis_current->state = MIGRATION_STATUS_NONE;
     QLIST_INIT(&mis_current->loadvm_handlers);
     qemu_mutex_init(&mis_current->rp_mutex);
     qemu_event_init(&mis_current->main_thread_load_event, false);
@@ -332,8 +333,8 @@ static void process_incoming_migration_co(void *opaque)
 
     mis = migration_incoming_state_new(f);
     postcopy_state_set(POSTCOPY_INCOMING_NONE);
-    migrate_generate_event(MIGRATION_STATUS_ACTIVE);
-
+    migrate_set_state(&mis->state, MIGRATION_STATUS_NONE,
+                      MIGRATION_STATUS_ACTIVE);
     ret = qemu_loadvm_state(f);
 
     ps = postcopy_state_get();
@@ -362,7 +363,8 @@ static void process_incoming_migration_co(void *opaque)
     migration_incoming_state_destroy();
 
     if (ret < 0) {
-        migrate_generate_event(MIGRATION_STATUS_FAILED);
+        migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
+                          MIGRATION_STATUS_FAILED);
         error_report("load of migration failed: %s", strerror(-ret));
         migrate_decompress_threads_join();
         exit(EXIT_FAILURE);
@@ -371,7 +373,8 @@ static void process_incoming_migration_co(void *opaque)
     /* Make sure all file formats flush their mutable metadata */
     bdrv_invalidate_cache_all(&local_err);
     if (local_err) {
-        migrate_generate_event(MIGRATION_STATUS_FAILED);
+        migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
+                          MIGRATION_STATUS_FAILED);
         error_report_err(local_err);
         migrate_decompress_threads_join();
         exit(EXIT_FAILURE);
@@ -403,7 +406,8 @@ static void process_incoming_migration_co(void *opaque)
      * observer sees this event they might start to prod at the VM assuming
      * it's ready to use.
      */
-    migrate_generate_event(MIGRATION_STATUS_COMPLETED);
+    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
+                      MIGRATION_STATUS_COMPLETED);
 }
 
 void process_incoming_migration(QEMUFile *f)
-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v11 06/39] migration: Integrate COLO checkpoint process into migration
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

Add a migration state, MIGRATION_STATUS_COLO, which is entered after the
first live migration has finished successfully.

We reuse the migration thread, so if COLO is enabled by the user, the
migration thread will go into the COLO process.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v11:
- Rebase to master
- Add Reviewed-by tag
v10:
- Simplify process by dropping colo thread and reusing migration thread.
     (Dave's suggestion)
---
 include/migration/colo.h |  3 +++
 migration/colo.c         | 31 +++++++++++++++++++++++++++++++
 migration/migration.c    | 30 ++++++++++++++++++++++++++----
 qapi-schema.json         |  4 +++-
 stubs/migration-colo.c   |  9 +++++++++
 trace-events             |  3 +++
 6 files changed, 75 insertions(+), 5 deletions(-)

diff --git a/include/migration/colo.h b/include/migration/colo.h
index 9b6662d..f462f06 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -19,4 +19,7 @@
 bool colo_supported(void);
 void colo_info_mig_init(void);
 
+void migrate_start_colo_process(MigrationState *s);
+bool migration_in_colo_state(void);
+
 #endif
diff --git a/migration/colo.c b/migration/colo.c
index 2c40d2e..cf0ccb8 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -10,9 +10,40 @@
  * later.  See the COPYING file in the top-level directory.
  */
 
+#include "sysemu/sysemu.h"
 #include "migration/colo.h"
+#include "trace.h"
 
 bool colo_supported(void)
 {
     return true;
 }
+
+bool migration_in_colo_state(void)
+{
+    MigrationState *s = migrate_get_current();
+
+    return (s->state == MIGRATION_STATUS_COLO);
+}
+
+static void colo_process_checkpoint(MigrationState *s)
+{
+    qemu_mutex_lock_iothread();
+    vm_start();
+    qemu_mutex_unlock_iothread();
+    trace_colo_vm_state_change("stop", "run");
+
+    /*TODO: COLO checkpoint savevm loop*/
+
+    migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
+                      MIGRATION_STATUS_COMPLETED);
+}
+
+void migrate_start_colo_process(MigrationState *s)
+{
+    qemu_mutex_unlock_iothread();
+    migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
+                      MIGRATION_STATUS_COLO);
+    colo_process_checkpoint(s);
+    qemu_mutex_lock_iothread();
+}
diff --git a/migration/migration.c b/migration/migration.c
index 390381d..46fe8a9 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -640,6 +640,10 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 
         get_xbzrle_cache_stats(info);
         break;
+    case MIGRATION_STATUS_COLO:
+        info->has_status = true;
+        /* TODO: display COLO specific information (checkpoint info etc.) */
+        break;
     case MIGRATION_STATUS_COMPLETED:
         get_xbzrle_cache_stats(info);
 
@@ -999,7 +1003,8 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     params.shared = has_inc && inc;
 
     if (migration_is_setup_or_active(s->state) ||
-        s->state == MIGRATION_STATUS_CANCELLING) {
+        s->state == MIGRATION_STATUS_CANCELLING ||
+        s->state == MIGRATION_STATUS_COLO) {
         error_setg(errp, QERR_MIGRATION_ACTIVE);
         return;
     }
@@ -1591,8 +1596,11 @@ static void migration_completion(MigrationState *s, int current_active_state,
         goto fail;
     }
 
-    migrate_set_state(&s->state, current_active_state,
-                      MIGRATION_STATUS_COMPLETED);
+    if (!migrate_colo_enabled()) {
+        migrate_set_state(&s->state, current_active_state,
+                          MIGRATION_STATUS_COMPLETED);
+    }
+
     return;
 
 fail:
@@ -1624,6 +1632,7 @@ static void *migration_thread(void *opaque)
     bool entered_postcopy = false;
     /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */
     enum MigrationStatus current_active_state = MIGRATION_STATUS_ACTIVE;
+    bool enable_colo = migrate_colo_enabled();
 
     rcu_register_thread();
 
@@ -1731,7 +1740,11 @@ static void *migration_thread(void *opaque)
     end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
 
     qemu_mutex_lock_iothread();
-    qemu_savevm_state_cleanup();
+    /* The resource has been allocated by migration will be reused in COLO
+      process, so don't release them. */
+    if (!enable_colo) {
+        qemu_savevm_state_cleanup();
+    }
     if (s->state == MIGRATION_STATUS_COMPLETED) {
         uint64_t transferred_bytes = qemu_ftell(s->file);
         s->total_time = end_time - s->total_time;
@@ -1744,6 +1757,15 @@ static void *migration_thread(void *opaque)
         }
         runstate_set(RUN_STATE_POSTMIGRATE);
     } else {
+        if (s->state == MIGRATION_STATUS_ACTIVE && enable_colo) {
+            migrate_start_colo_process(s);
+            qemu_savevm_state_cleanup();
+            /*
+            * Fixme: we will run VM in COLO no matter its old running state.
+            * After exited COLO, we will keep running.
+            */
+            old_vm_running = true;
+        }
         if (old_vm_running && !entered_postcopy) {
             vm_start();
         }
diff --git a/qapi-schema.json b/qapi-schema.json
index d20c0ec..24b35f3 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -434,6 +434,8 @@
 #
 # @completed: migration is finished.
 #
+# @colo: VM is in the process of fault tolerance. (since 2.6)
+#
 # @failed: some error occurred during migration process.
 #
 # Since: 2.3
@@ -441,7 +443,7 @@
 ##
 { 'enum': 'MigrationStatus',
   'data': [ 'none', 'setup', 'cancelling', 'cancelled',
-            'active', 'postcopy-active', 'completed', 'failed' ] }
+            'active', 'postcopy-active', 'completed', 'failed', 'colo' ] }
 
 ##
 # @MigrationInfo
diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
index 3d817df..acddca6 100644
--- a/stubs/migration-colo.c
+++ b/stubs/migration-colo.c
@@ -16,3 +16,12 @@ bool colo_supported(void)
 {
     return false;
 }
+
+bool migration_in_colo_state(void)
+{
+    return false;
+}
+
+void migrate_start_colo_process(MigrationState *s)
+{
+}
diff --git a/trace-events b/trace-events
index 0b0ff02..c98d473 100644
--- a/trace-events
+++ b/trace-events
@@ -1577,6 +1577,9 @@ postcopy_ram_incoming_cleanup_entry(void) ""
 postcopy_ram_incoming_cleanup_exit(void) ""
 postcopy_ram_incoming_cleanup_join(void) ""
 
+# migration/colo.c
+colo_vm_state_change(const char *old, const char *new) "Change '%s' => '%s'"
+
 # kvm-all.c
 kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
 kvm_vm_ioctl(int type, void *arg) "type 0x%x, arg %p"
-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v11 07/39] migration: Integrate COLO checkpoint process into loadvm
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

Switch from the normal migration loadvm process into the COLO checkpoint
process if COLO mode is enabled.
We add three new members to struct MigrationIncomingState:
'have_colo_incoming_thread' and 'colo_incoming_thread' record the COLO
related thread for the secondary VM, and 'migration_incoming_co' records
the original migration incoming coroutine.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
v11:
- We moved the call site of bdrv_invalidate_cache_all(), but the deletion of
  the old call was done in another patch; fix that here.
- Add documentation for colo in 'MigrationStatus' (Eric's review comment)
v10:
- Fix an fd leak bug found by Dave.
---
 include/migration/colo.h      |  7 +++++++
 include/migration/migration.h |  7 +++++++
 migration/colo-comm.c         | 10 ++++++++++
 migration/colo.c              | 22 ++++++++++++++++++++++
 migration/migration.c         | 31 +++++++++++++++++++++----------
 qapi-schema.json              |  2 ++
 stubs/migration-colo.c        | 10 ++++++++++
 7 files changed, 79 insertions(+), 10 deletions(-)

diff --git a/include/migration/colo.h b/include/migration/colo.h
index f462f06..2676c4a 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -15,6 +15,8 @@
 
 #include "qemu-common.h"
 #include "migration/migration.h"
+#include "qemu/coroutine_int.h"
+#include "qemu/thread.h"
 
 bool colo_supported(void);
 void colo_info_mig_init(void);
@@ -22,4 +24,9 @@ void colo_info_mig_init(void);
 void migrate_start_colo_process(MigrationState *s);
 bool migration_in_colo_state(void);
 
+/* loadvm */
+bool migration_incoming_enable_colo(void);
+void migration_incoming_exit_colo(void);
+void *colo_process_incoming_thread(void *opaque);
+bool migration_incoming_in_colo_state(void);
 #endif
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 99dfa92..a57a734 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -22,6 +22,7 @@
 #include "migration/vmstate.h"
 #include "qapi-types.h"
 #include "exec/cpu-common.h"
+#include "qemu/coroutine_int.h"
 
 #define QEMU_VM_FILE_MAGIC           0x5145564d
 #define QEMU_VM_FILE_VERSION_COMPAT  0x00000002
@@ -106,6 +107,12 @@ struct MigrationIncomingState {
     void     *postcopy_tmp_page;
 
     int state;
+
+    bool have_colo_incoming_thread;
+    QemuThread colo_incoming_thread;
+    /* The coroutine we should enter (back) after failover */
+    Coroutine *migration_incoming_co;
+
     /* See savevm.c */
     LoadStateEntry_Head loadvm_handlers;
 };
diff --git a/migration/colo-comm.c b/migration/colo-comm.c
index fb407e0..30df3d3 100644
--- a/migration/colo-comm.c
+++ b/migration/colo-comm.c
@@ -48,3 +48,13 @@ void colo_info_mig_init(void)
 {
     vmstate_register(NULL, 0, &colo_state, &colo_info);
 }
+
+bool migration_incoming_enable_colo(void)
+{
+    return colo_info.colo_requested;
+}
+
+void migration_incoming_exit_colo(void)
+{
+    colo_info.colo_requested = 0;
+}
diff --git a/migration/colo.c b/migration/colo.c
index cf0ccb8..6880aa0 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -13,6 +13,7 @@
 #include "sysemu/sysemu.h"
 #include "migration/colo.h"
 #include "trace.h"
+#include "qemu/error-report.h"
 
 bool colo_supported(void)
 {
@@ -26,6 +27,13 @@ bool migration_in_colo_state(void)
     return (s->state == MIGRATION_STATUS_COLO);
 }
 
+bool migration_incoming_in_colo_state(void)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+
+    return mis && (mis->state == MIGRATION_STATUS_COLO);
+}
+
 static void colo_process_checkpoint(MigrationState *s)
 {
     qemu_mutex_lock_iothread();
@@ -47,3 +55,17 @@ void migrate_start_colo_process(MigrationState *s)
     colo_process_checkpoint(s);
     qemu_mutex_lock_iothread();
 }
+
+void *colo_process_incoming_thread(void *opaque)
+{
+    MigrationIncomingState *mis = opaque;
+
+    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
+                      MIGRATION_STATUS_COLO);
+
+    /* TODO: COLO checkpoint restore loop */
+
+    migration_incoming_exit_colo();
+
+    return NULL;
+}
diff --git a/migration/migration.c b/migration/migration.c
index 46fe8a9..41eac0d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -358,6 +358,27 @@ static void process_incoming_migration_co(void *opaque)
         /* Else if something went wrong then just fall out of the normal exit */
     }
 
+    if (!ret) {
+        /* Make sure all file formats flush their mutable metadata */
+        bdrv_invalidate_cache_all(&local_err);
+        if (local_err) {
+            error_report_err(local_err);
+            migrate_decompress_threads_join();
+            exit(EXIT_FAILURE);
+        }
+    }
+    /* we get colo info, and know if we are in colo mode */
+    if (!ret && migration_incoming_enable_colo()) {
+        mis->migration_incoming_co = qemu_coroutine_self();
+        qemu_thread_create(&mis->colo_incoming_thread, "colo incoming",
+             colo_process_incoming_thread, mis, QEMU_THREAD_JOINABLE);
+        mis->have_colo_incoming_thread = true;
+        qemu_coroutine_yield();
+
+        /* Wait checkpoint incoming thread exit before free resource */
+        qemu_thread_join(&mis->colo_incoming_thread);
+    }
+
     qemu_fclose(f);
     free_xbzrle_decoded_buf();
     migration_incoming_state_destroy();
@@ -370,16 +391,6 @@ static void process_incoming_migration_co(void *opaque)
         exit(EXIT_FAILURE);
     }
 
-    /* Make sure all file formats flush their mutable metadata */
-    bdrv_invalidate_cache_all(&local_err);
-    if (local_err) {
-        migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
-                          MIGRATION_STATUS_FAILED);
-        error_report_err(local_err);
-        migrate_decompress_threads_join();
-        exit(EXIT_FAILURE);
-    }
-
     /*
      * This must happen after all error conditions are dealt with and
      * we're sure the VM is going to be running on this host.
diff --git a/qapi-schema.json b/qapi-schema.json
index 24b35f3..c2f3b63 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -438,6 +438,8 @@
 #
 # @failed: some error occurred during migration process.
 #
+# @colo: VM is in the process of fault tolerance. (since 2.6)
+#
 # Since: 2.3
 #
 ##
diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
index acddca6..c12516e 100644
--- a/stubs/migration-colo.c
+++ b/stubs/migration-colo.c
@@ -22,6 +22,16 @@ bool migration_in_colo_state(void)
     return false;
 }
 
+bool migration_incoming_in_colo_state(void)
+{
+    return false;
+}
+
 void migrate_start_colo_process(MigrationState *s)
 {
 }
+
+void *colo_process_incoming_thread(void *opaque)
+{
+    return NULL;
+}
-- 
1.8.3.1


* [Qemu-devel] [PATCH COLO-Frame v11 08/39] migration: Rename the 'file' member of MigrationState
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

Rename the 'file' member of MigrationState to 'to_dst_file'.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v11:
- Only rename 'file' member of MigrationState
---
 include/migration/migration.h |  2 +-
 migration/exec.c              |  4 +--
 migration/fd.c                |  4 +--
 migration/migration.c         | 72 ++++++++++++++++++++++---------------------
 migration/postcopy-ram.c      |  6 ++--
 migration/savevm.c            |  2 +-
 migration/tcp.c               |  4 +--
 migration/unix.c              |  4 +--
 8 files changed, 51 insertions(+), 47 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index a57a734..ba5bcec 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -140,7 +140,7 @@ struct MigrationState
     size_t xfer_limit;
     QemuThread thread;
     QEMUBH *cleanup_bh;
-    QEMUFile *file;
+    QEMUFile *to_dst_file;
     int parameters[MIGRATION_PARAMETER_MAX];
 
     int state;
diff --git a/migration/exec.c b/migration/exec.c
index 8406d2b..9037109 100644
--- a/migration/exec.c
+++ b/migration/exec.c
@@ -36,8 +36,8 @@
 
 void exec_start_outgoing_migration(MigrationState *s, const char *command, Error **errp)
 {
-    s->file = qemu_popen_cmd(command, "w");
-    if (s->file == NULL) {
+    s->to_dst_file = qemu_popen_cmd(command, "w");
+    if (s->to_dst_file == NULL) {
         error_setg_errno(errp, errno, "failed to popen the migration target");
         return;
     }
diff --git a/migration/fd.c b/migration/fd.c
index 3e4bed0..9a9d6c5 100644
--- a/migration/fd.c
+++ b/migration/fd.c
@@ -50,9 +50,9 @@ void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **
     }
 
     if (fd_is_socket(fd)) {
-        s->file = qemu_fopen_socket(fd, "wb");
+        s->to_dst_file = qemu_fopen_socket(fd, "wb");
     } else {
-        s->file = qemu_fdopen(fd, "wb");
+        s->to_dst_file = qemu_fdopen(fd, "wb");
     }
 
     migrate_fd_connect(s);
diff --git a/migration/migration.c b/migration/migration.c
index 41eac0d..a4c690d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -834,7 +834,7 @@ static void migrate_fd_cleanup(void *opaque)
 
     flush_page_queue(s);
 
-    if (s->file) {
+    if (s->to_dst_file) {
         trace_migrate_fd_cleanup();
         qemu_mutex_unlock_iothread();
         if (s->migration_thread_running) {
@@ -844,8 +844,8 @@ static void migrate_fd_cleanup(void *opaque)
         qemu_mutex_lock_iothread();
 
         migrate_compress_threads_join();
-        qemu_fclose(s->file);
-        s->file = NULL;
+        qemu_fclose(s->to_dst_file);
+        s->to_dst_file = NULL;
     }
 
     assert((s->state != MIGRATION_STATUS_ACTIVE) &&
@@ -862,7 +862,7 @@ static void migrate_fd_cleanup(void *opaque)
 void migrate_fd_error(MigrationState *s)
 {
     trace_migrate_fd_error();
-    assert(s->file == NULL);
+    assert(s->to_dst_file == NULL);
     migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
                       MIGRATION_STATUS_FAILED);
     notifier_list_notify(&migration_state_notifiers, s);
@@ -871,7 +871,7 @@ void migrate_fd_error(MigrationState *s)
 static void migrate_fd_cancel(MigrationState *s)
 {
     int old_state ;
-    QEMUFile *f = migrate_get_current()->file;
+    QEMUFile *f = migrate_get_current()->to_dst_file;
     trace_migrate_fd_cancel();
 
     if (s->rp_state.from_dst_file) {
@@ -942,7 +942,7 @@ MigrationState *migrate_init(const MigrationParams *params)
     s->bytes_xfer = 0;
     s->xfer_limit = 0;
     s->cleanup_bh = 0;
-    s->file = NULL;
+    s->to_dst_file = NULL;
     s->state = MIGRATION_STATUS_NONE;
     s->params = *params;
     s->rp_state.from_dst_file = NULL;
@@ -1122,8 +1122,9 @@ void qmp_migrate_set_speed(int64_t value, Error **errp)
 
     s = migrate_get_current();
     s->bandwidth_limit = value;
-    if (s->file) {
-        qemu_file_set_rate_limit(s->file, s->bandwidth_limit / XFER_LIMIT_RATIO);
+    if (s->to_dst_file) {
+        qemu_file_set_rate_limit(s->to_dst_file,
+                                 s->bandwidth_limit / XFER_LIMIT_RATIO);
     }
 }
 
@@ -1393,7 +1394,7 @@ out:
 static int open_return_path_on_source(MigrationState *ms)
 {
 
-    ms->rp_state.from_dst_file = qemu_file_get_return_path(ms->file);
+    ms->rp_state.from_dst_file = qemu_file_get_return_path(ms->to_dst_file);
     if (!ms->rp_state.from_dst_file) {
         return -1;
     }
@@ -1415,7 +1416,7 @@ static int await_return_path_close_on_source(MigrationState *ms)
      * rp_thread will exit, however if there's an error we need to cause
      * it to exit.
      */
-    if (qemu_file_get_error(ms->file) && ms->rp_state.from_dst_file) {
+    if (qemu_file_get_error(ms->to_dst_file) && ms->rp_state.from_dst_file) {
         /*
          * shutdown(2), if we have it, will cause it to unblock if it's stuck
          * waiting for the destination.
@@ -1458,7 +1459,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
      * Cause any non-postcopiable, but iterative devices to
      * send out their final data.
      */
-    qemu_savevm_state_complete_precopy(ms->file, true);
+    qemu_savevm_state_complete_precopy(ms->to_dst_file, true);
 
     /*
      * in Finish migrate and with the io-lock held everything should
@@ -1476,9 +1477,9 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
      * will notice we're in POSTCOPY_ACTIVE and not actually
      * wrap their state up here
      */
-    qemu_file_set_rate_limit(ms->file, INT64_MAX);
+    qemu_file_set_rate_limit(ms->to_dst_file, INT64_MAX);
     /* Ping just for debugging, helps line traces up */
-    qemu_savevm_send_ping(ms->file, 2);
+    qemu_savevm_send_ping(ms->to_dst_file, 2);
 
     /*
      * While loading the device state we may trigger page transfer
@@ -1512,7 +1513,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
     qsb = qemu_buf_get(fb);
 
     /* Now send that blob */
-    if (qemu_savevm_send_packaged(ms->file, qsb)) {
+    if (qemu_savevm_send_packaged(ms->to_dst_file, qsb)) {
         goto fail_closefb;
     }
     qemu_fclose(fb);
@@ -1524,9 +1525,9 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
      * Although this ping is just for debug, it could potentially be
      * used for getting a better measurement of downtime at the source.
      */
-    qemu_savevm_send_ping(ms->file, 4);
+    qemu_savevm_send_ping(ms->to_dst_file, 4);
 
-    ret = qemu_file_get_error(ms->file);
+    ret = qemu_file_get_error(ms->to_dst_file);
     if (ret) {
         error_report("postcopy_start: Migration stream errored");
         migrate_set_state(&ms->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
@@ -1569,8 +1570,8 @@ static void migration_completion(MigrationState *s, int current_active_state,
         if (!ret) {
             ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
             if (ret >= 0) {
-                qemu_file_set_rate_limit(s->file, INT64_MAX);
-                qemu_savevm_state_complete_precopy(s->file, false);
+                qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
+                qemu_savevm_state_complete_precopy(s->to_dst_file, false);
             }
         }
         qemu_mutex_unlock_iothread();
@@ -1581,7 +1582,7 @@ static void migration_completion(MigrationState *s, int current_active_state,
     } else if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
         trace_migration_completion_postcopy_end();
 
-        qemu_savevm_state_complete_postcopy(s->file);
+        qemu_savevm_state_complete_postcopy(s->to_dst_file);
         trace_migration_completion_postcopy_end_after_complete();
     }
 
@@ -1602,7 +1603,7 @@ static void migration_completion(MigrationState *s, int current_active_state,
         }
     }
 
-    if (qemu_file_get_error(s->file)) {
+    if (qemu_file_get_error(s->to_dst_file)) {
         trace_migration_completion_file_err();
         goto fail;
     }
@@ -1647,24 +1648,24 @@ static void *migration_thread(void *opaque)
 
     rcu_register_thread();
 
-    qemu_savevm_state_header(s->file);
+    qemu_savevm_state_header(s->to_dst_file);
 
     if (migrate_postcopy_ram()) {
         /* Now tell the dest that it should open its end so it can reply */
-        qemu_savevm_send_open_return_path(s->file);
+        qemu_savevm_send_open_return_path(s->to_dst_file);
 
         /* And do a ping that will make stuff easier to debug */
-        qemu_savevm_send_ping(s->file, 1);
+        qemu_savevm_send_ping(s->to_dst_file, 1);
 
         /*
          * Tell the destination that we *might* want to do postcopy later;
          * if the other end can't do postcopy it should fail now, nice and
          * early.
          */
-        qemu_savevm_send_postcopy_advise(s->file);
+        qemu_savevm_send_postcopy_advise(s->to_dst_file);
     }
 
-    qemu_savevm_state_begin(s->file, &s->params);
+    qemu_savevm_state_begin(s->to_dst_file, &s->params);
 
     s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
     current_active_state = MIGRATION_STATUS_ACTIVE;
@@ -1678,10 +1679,10 @@ static void *migration_thread(void *opaque)
         int64_t current_time;
         uint64_t pending_size;
 
-        if (!qemu_file_rate_limit(s->file)) {
+        if (!qemu_file_rate_limit(s->to_dst_file)) {
             uint64_t pend_post, pend_nonpost;
 
-            qemu_savevm_state_pending(s->file, max_size, &pend_nonpost,
+            qemu_savevm_state_pending(s->to_dst_file, max_size, &pend_nonpost,
                                       &pend_post);
             pending_size = pend_nonpost + pend_post;
             trace_migrate_pending(pending_size, max_size,
@@ -1702,7 +1703,7 @@ static void *migration_thread(void *opaque)
                     continue;
                 }
                 /* Just another iteration step */
-                qemu_savevm_state_iterate(s->file, entered_postcopy);
+                qemu_savevm_state_iterate(s->to_dst_file, entered_postcopy);
             } else {
                 trace_migration_thread_low_pending(pending_size);
                 migration_completion(s, current_active_state,
@@ -1711,7 +1712,7 @@ static void *migration_thread(void *opaque)
             }
         }
 
-        if (qemu_file_get_error(s->file)) {
+        if (qemu_file_get_error(s->to_dst_file)) {
             migrate_set_state(&s->state, current_active_state,
                               MIGRATION_STATUS_FAILED);
             trace_migration_thread_file_err();
@@ -1719,7 +1720,8 @@ static void *migration_thread(void *opaque)
         }
         current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
         if (current_time >= initial_time + BUFFER_DELAY) {
-            uint64_t transferred_bytes = qemu_ftell(s->file) - initial_bytes;
+            uint64_t transferred_bytes = qemu_ftell(s->to_dst_file) -
+                                         initial_bytes;
             uint64_t time_spent = current_time - initial_time;
             double bandwidth = transferred_bytes / time_spent;
             max_size = bandwidth * migrate_max_downtime() / 1000000;
@@ -1735,11 +1737,11 @@ static void *migration_thread(void *opaque)
                 s->expected_downtime = s->dirty_bytes_rate / bandwidth;
             }
 
-            qemu_file_reset_rate_limit(s->file);
+            qemu_file_reset_rate_limit(s->to_dst_file);
             initial_time = current_time;
-            initial_bytes = qemu_ftell(s->file);
+            initial_bytes = qemu_ftell(s->to_dst_file);
         }
-        if (qemu_file_rate_limit(s->file)) {
+        if (qemu_file_rate_limit(s->to_dst_file)) {
             /* usleep expects microseconds */
             g_usleep((initial_time + BUFFER_DELAY - current_time)*1000);
         }
@@ -1757,7 +1759,7 @@ static void *migration_thread(void *opaque)
         qemu_savevm_state_cleanup();
     }
     if (s->state == MIGRATION_STATUS_COMPLETED) {
-        uint64_t transferred_bytes = qemu_ftell(s->file);
+        uint64_t transferred_bytes = qemu_ftell(s->to_dst_file);
         s->total_time = end_time - s->total_time;
         if (!entered_postcopy) {
             s->downtime = end_time - start_time;
@@ -1794,7 +1796,7 @@ void migrate_fd_connect(MigrationState *s)
     s->expected_downtime = max_downtime/1000000;
     s->cleanup_bh = qemu_bh_new(migrate_fd_cleanup, s);
 
-    qemu_file_set_rate_limit(s->file,
+    qemu_file_set_rate_limit(s->to_dst_file,
                              s->bandwidth_limit / XFER_LIMIT_RATIO);
 
     /* Notify before starting migration thread */
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 22d6b18..f74016d 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -733,7 +733,8 @@ void postcopy_discard_send_range(MigrationState *ms, PostcopyDiscardState *pds,
 
     if (pds->cur_entry == MAX_DISCARDS_PER_COMMAND) {
         /* Full set, ship it! */
-        qemu_savevm_send_postcopy_ram_discard(ms->file, pds->ramblock_name,
+        qemu_savevm_send_postcopy_ram_discard(ms->to_dst_file,
+                                              pds->ramblock_name,
                                               pds->cur_entry,
                                               pds->start_list,
                                               pds->length_list);
@@ -753,7 +754,8 @@ void postcopy_discard_send_finish(MigrationState *ms, PostcopyDiscardState *pds)
 {
     /* Anything unsent? */
     if (pds->cur_entry) {
-        qemu_savevm_send_postcopy_ram_discard(ms->file, pds->ramblock_name,
+        qemu_savevm_send_postcopy_ram_discard(ms->to_dst_file,
+                                              pds->ramblock_name,
                                               pds->cur_entry,
                                               pds->start_list,
                                               pds->length_list);
diff --git a/migration/savevm.c b/migration/savevm.c
index 0ad1b93..f102870 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1163,7 +1163,7 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
         .shared = 0
     };
     MigrationState *ms = migrate_init(&params);
-    ms->file = f;
+    ms->to_dst_file = f;
 
     if (qemu_savevm_state_blocked(errp)) {
         return -EINVAL;
diff --git a/migration/tcp.c b/migration/tcp.c
index ae89172..e083d68 100644
--- a/migration/tcp.c
+++ b/migration/tcp.c
@@ -39,11 +39,11 @@ static void tcp_wait_for_connect(int fd, Error *err, void *opaque)
 
     if (fd < 0) {
         DPRINTF("migrate connect error: %s\n", error_get_pretty(err));
-        s->file = NULL;
+        s->to_dst_file = NULL;
         migrate_fd_error(s);
     } else {
         DPRINTF("migrate connect success\n");
-        s->file = qemu_fopen_socket(fd, "wb");
+        s->to_dst_file = qemu_fopen_socket(fd, "wb");
         migrate_fd_connect(s);
     }
 }
diff --git a/migration/unix.c b/migration/unix.c
index b591813..5492dd6 100644
--- a/migration/unix.c
+++ b/migration/unix.c
@@ -39,11 +39,11 @@ static void unix_wait_for_connect(int fd, Error *err, void *opaque)
 
     if (fd < 0) {
         DPRINTF("migrate connect error: %s\n", error_get_pretty(err));
-        s->file = NULL;
+        s->to_dst_file = NULL;
         migrate_fd_error(s);
     } else {
         DPRINTF("migrate connect success\n");
-        s->file = qemu_fopen_socket(fd, "wb");
+        s->to_dst_file = qemu_fopen_socket(fd, "wb");
         migrate_fd_connect(s);
     }
 }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v11 09/39] COLO/migration: Create a new communication path from destination to source
  2015-11-24  9:25 [Qemu-devel] [PATCH COLO-Frame v11 00/39] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (7 preceding siblings ...)
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 08/39] migration: Rename the 'file' member of MigrationState zhanghailiang
@ 2015-11-24  9:25 ` zhanghailiang
  2015-11-24 18:40   ` Dr. David Alan Gilbert
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 10/39] COLO: Implement colo checkpoint protocol zhanghailiang
                   ` (29 subsequent siblings)
  38 siblings, 1 reply; 95+ messages in thread
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

This new communication path will be used for returning messages
from destination to source.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v11:
- Rebase master to use qemu_file_get_return_path() for opening return path
v10:
- fix the error log (Dave's suggestion).
---
 migration/colo.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index 6880aa0..0ab9618 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -36,6 +36,15 @@ bool migration_incoming_in_colo_state(void)
 
 static void colo_process_checkpoint(MigrationState *s)
 {
+    int ret = 0;
+
+    s->rp_state.from_dst_file = qemu_file_get_return_path(s->to_dst_file);
+    if (!s->rp_state.from_dst_file) {
+        ret = -EINVAL;
+        error_report("Open QEMUFile from_dst_file failed");
+        goto out;
+    }
+
     qemu_mutex_lock_iothread();
     vm_start();
     qemu_mutex_unlock_iothread();
@@ -43,8 +52,16 @@ static void colo_process_checkpoint(MigrationState *s)
 
     /*TODO: COLO checkpoint savevm loop*/
 
+out:
+    if (ret < 0) {
+        error_report("%s: %s", __func__, strerror(-ret));
+    }
     migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
                       MIGRATION_STATUS_COMPLETED);
+
+    if (s->rp_state.from_dst_file) {
+        qemu_fclose(s->rp_state.from_dst_file);
+    }
 }
 
 void migrate_start_colo_process(MigrationState *s)
@@ -59,12 +76,34 @@ void migrate_start_colo_process(MigrationState *s)
 void *colo_process_incoming_thread(void *opaque)
 {
     MigrationIncomingState *mis = opaque;
+    int ret = 0;
 
     migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
                       MIGRATION_STATUS_COLO);
 
+    mis->to_src_file = qemu_file_get_return_path(mis->from_src_file);
+    if (!mis->to_src_file) {
+        ret = -EINVAL;
+        error_report("colo incoming thread: Open QEMUFile to_src_file failed");
+        goto out;
+    }
+    /* Note: We set the fd to unblocked in migration incoming coroutine,
+    *  But here we are in the colo incoming thread, so it is ok to set the
+    *  fd back to blocked.
+    */
+    qemu_set_block(qemu_get_fd(mis->from_src_file));
+
     /* TODO: COLO checkpoint restore loop */
 
+out:
+    if (ret < 0) {
+        error_report("colo incoming thread will exit, detect error: %s",
+                     strerror(-ret));
+    }
+
+    if (mis->to_src_file) {
+        qemu_fclose(mis->to_src_file);
+    }
     migration_incoming_exit_colo();
 
     return NULL;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v11 10/39] COLO: Implement colo checkpoint protocol
  2015-11-24  9:25 [Qemu-devel] [PATCH COLO-Frame v11 00/39] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (8 preceding siblings ...)
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 09/39] COLO/migration: Create a new communication path from destination to source zhanghailiang
@ 2015-11-24  9:25 ` zhanghailiang
  2015-11-24 19:00   ` Dr. David Alan Gilbert
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 11/39] COLO: Add a new RunState RUN_STATE_COLO zhanghailiang
                   ` (28 subsequent siblings)
  38 siblings, 1 reply; 95+ messages in thread
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

We need a user-defined communication protocol to control the checkpoint
process.

The new checkpoint request is started by the Primary VM, and the interaction
proceeds as below (a sketch of the on-wire message framing follows the notes):
Checkpoint synchronization points:

                       Primary                         Secondary
                                                       initial work
'checkpoint-ready'     <------------------------------ @

'checkpoint-request'   @ ----------------------------->
                                                       Suspend (Only in hybrid mode)
'checkpoint-reply'     <------------------------------ @
                       Suspend&Save state
'vmstate-send'         @ ----------------------------->
                       Send state                      Receive state
'vmstate-received'     <------------------------------ @
                       Release packets                 Load state
'vmstate-load'         <------------------------------ @
                       Resume                          Resume (Only in hybrid mode)

                       Start Comparing (Only in hybrid mode)
NOTE:
 1) '@' marks the side that sends the message.
 2) Every sync-point is synchronized by the two sides with only
    one handshake (single direction) for low latency.
    If stricter synchronization is required, an opposite-direction
    sync-point should be added.
 3) Since sync-points are single direction, the remote side may
    have moved far ahead by the time this side receives the sync-point.
 4) For now, we only support 'periodic' checkpoints, for which
    the Secondary VM is not running; 'hybrid' mode will be supported later.
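
Each message on this channel is a fixed 12-byte frame: a 32-bit big-endian
command followed by a 64-bit big-endian value (see colo_ctl_put() and
colo_ctl_get() below).  The following standalone sketch only illustrates that
framing; the pack helper and the literal command number are illustrative,
not part of the patch:

    #include <stdint.h>
    #include <stdio.h>

    /* Pack one COLO control message: be32 command followed by be64 value. */
    static void pack_colo_msg(uint8_t out[12], uint32_t cmd, uint64_t value)
    {
        int i;

        for (i = 0; i < 4; i++) {
            out[i] = (uint8_t)(cmd >> (8 * (3 - i)));
        }
        for (i = 0; i < 8; i++) {
            out[4 + i] = (uint8_t)(value >> (8 * (7 - i)));
        }
    }

    int main(void)
    {
        uint8_t buf[12];
        int i;

        /* 'checkpoint-request' is the third entry of the COLOCommand enum
         * added below, so it would be numbered 2; every command carries
         * value 0 for now. */
        pack_colo_msg(buf, 2, 0);
        for (i = 0; i < 12; i++) {
            printf("%02x ", buf[i]);
        }
        printf("\n");
        return 0;
    }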

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Cc: Eric Blake <eblake@redhat.com>
---
v11:
- Add missing 'checkpoint-ready' communication in comment.
- Use parameter to return 'value' for colo_ctl_get() (Dave's suggestion)
- Fix trace for colo_ctl_get() to trace command and value both
v10:
- Rename enum COLOCmd to COLOCommand (Eric's suggestion).
- Remove unused 'ram-steal'
---
 migration/colo.c | 198 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 qapi-schema.json |  27 ++++++++
 trace-events     |   2 +
 3 files changed, 225 insertions(+), 2 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 0ab9618..c045d61 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -10,10 +10,12 @@
  * later.  See the COPYING file in the top-level directory.
  */
 
+#include <unistd.h>
 #include "sysemu/sysemu.h"
 #include "migration/colo.h"
 #include "trace.h"
 #include "qemu/error-report.h"
+#include "qemu/sockets.h"
 
 bool colo_supported(void)
 {
@@ -34,9 +36,107 @@ bool migration_incoming_in_colo_state(void)
     return mis && (mis->state == MIGRATION_STATUS_COLO);
 }
 
+/* colo checkpoint control helper */
+static int colo_ctl_put(QEMUFile *f, uint32_t cmd, uint64_t value)
+{
+    int ret = 0;
+
+    qemu_put_be32(f, cmd);
+    qemu_put_be64(f, value);
+    qemu_fflush(f);
+
+    ret = qemu_file_get_error(f);
+    trace_colo_ctl_put(COLOCommand_lookup[cmd], value);
+
+    return ret;
+}
+
+static int colo_ctl_get_cmd(QEMUFile *f, uint32_t *cmd)
+{
+    int ret = 0;
+
+    *cmd = qemu_get_be32(f);
+    ret = qemu_file_get_error(f);
+    if (ret < 0) {
+        return ret;
+    }
+    if (*cmd >= COLO_COMMAND_MAX) {
+        error_report("Invalid colo command, got cmd:%d", *cmd);
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
+static int colo_ctl_get(QEMUFile *f, uint32_t require, uint64_t *value)
+{
+    int ret;
+    uint32_t cmd;
+
+    ret = colo_ctl_get_cmd(f, &cmd);
+    if (ret < 0) {
+        return ret;
+    }
+    if (cmd != require) {
+        error_report("Unexpect colo command, expect:%d, but got cmd:%d",
+                     require, cmd);
+        return -EINVAL;
+    }
+
+    *value = qemu_get_be64(f);
+    trace_colo_ctl_get(COLOCommand_lookup[cmd], *value);
+    ret = qemu_file_get_error(f);
+
+    return ret;
+}
+
+static int colo_do_checkpoint_transaction(MigrationState *s)
+{
+    int ret;
+    uint64_t value;
+
+    ret = colo_ctl_put(s->to_dst_file, COLO_COMMAND_CHECKPOINT_REQUEST, 0);
+    if (ret < 0) {
+        goto out;
+    }
+
+    ret = colo_ctl_get(s->rp_state.from_dst_file,
+                       COLO_COMMAND_CHECKPOINT_REPLY, &value);
+    if (ret < 0) {
+        goto out;
+    }
+
+    /* TODO: suspend and save vm state to colo buffer */
+
+    ret = colo_ctl_put(s->to_dst_file, COLO_COMMAND_VMSTATE_SEND, 0);
+    if (ret < 0) {
+        goto out;
+    }
+
+    /* TODO: send vmstate to Secondary */
+
+    ret = colo_ctl_get(s->rp_state.from_dst_file,
+                       COLO_COMMAND_VMSTATE_RECEIVED, &value);
+    if (ret < 0) {
+        goto out;
+    }
+
+    ret = colo_ctl_get(s->rp_state.from_dst_file,
+                       COLO_COMMAND_VMSTATE_LOADED, &value);
+    if (ret < 0) {
+        goto out;
+    }
+
+    /* TODO: resume Primary */
+
+out:
+    return ret;
+}
+
 static void colo_process_checkpoint(MigrationState *s)
 {
     int ret = 0;
+    uint64_t value;
 
     s->rp_state.from_dst_file = qemu_file_get_return_path(s->to_dst_file);
     if (!s->rp_state.from_dst_file) {
@@ -45,12 +145,28 @@ static void colo_process_checkpoint(MigrationState *s)
         goto out;
     }
 
+    /*
+     * Wait for Secondary finish loading vm states and enter COLO
+     * restore.
+     */
+    ret = colo_ctl_get(s->rp_state.from_dst_file,
+                       COLO_COMMAND_CHECKPOINT_READY, &value);
+    if (ret < 0) {
+        goto out;
+    }
+
     qemu_mutex_lock_iothread();
     vm_start();
     qemu_mutex_unlock_iothread();
     trace_colo_vm_state_change("stop", "run");
 
-    /*TODO: COLO checkpoint savevm loop*/
+    while (s->state == MIGRATION_STATUS_COLO) {
+        /* start a colo checkpoint */
+        ret = colo_do_checkpoint_transaction(s);
+        if (ret < 0) {
+            goto out;
+        }
+    }
 
 out:
     if (ret < 0) {
@@ -73,10 +189,46 @@ void migrate_start_colo_process(MigrationState *s)
     qemu_mutex_lock_iothread();
 }
 
+/*
+ * return:
+ * 0: start a checkpoint
+ * -1: some error happened, exit colo restore
+ */
+static int colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request)
+{
+    int ret;
+    uint32_t cmd;
+    uint64_t value;
+
+    ret = colo_ctl_get_cmd(f, &cmd);
+    if (ret < 0) {
+        /* do failover ? */
+        return ret;
+    }
+    /* Fix me: this value should be 0, which is not so good,
+     * should be used for checking ?
+     */
+    value = qemu_get_be64(f);
+    if (value != 0) {
+        error_report("Got unexpected value %" PRIu64 " for '%s' command",
+                     value, COLOCommand_lookup[cmd]);
+        return -EINVAL;
+    }
+
+    switch (cmd) {
+    case COLO_COMMAND_CHECKPOINT_REQUEST:
+        *checkpoint_request = 1;
+        return 0;
+    default:
+        return -EINVAL;
+    }
+}
+
 void *colo_process_incoming_thread(void *opaque)
 {
     MigrationIncomingState *mis = opaque;
     int ret = 0;
+    uint64_t value;
 
     migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
                       MIGRATION_STATUS_COLO);
@@ -93,7 +245,49 @@ void *colo_process_incoming_thread(void *opaque)
     */
     qemu_set_block(qemu_get_fd(mis->from_src_file));
 
-    /* TODO: COLO checkpoint restore loop */
+
+    ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_CHECKPOINT_READY, 0);
+    if (ret < 0) {
+        goto out;
+    }
+
+    while (mis->state == MIGRATION_STATUS_COLO) {
+        int request = 0;
+        int ret = colo_wait_handle_cmd(mis->from_src_file, &request);
+
+        if (ret < 0) {
+            break;
+        } else {
+            if (!request) {
+                continue;
+            }
+        }
+        /* FIXME: This is unnecessary for periodic checkpoint mode */
+        ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_CHECKPOINT_REPLY, 0);
+        if (ret < 0) {
+            goto out;
+        }
+
+        ret = colo_ctl_get(mis->from_src_file, COLO_COMMAND_VMSTATE_SEND,
+                           &value);
+        if (ret < 0) {
+            goto out;
+        }
+
+        /* TODO: read migration data into colo buffer */
+
+        ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_VMSTATE_RECEIVED, 0);
+        if (ret < 0) {
+            goto out;
+        }
+
+        /* TODO: load vm state */
+
+        ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_VMSTATE_LOADED, 0);
+        if (ret < 0) {
+            goto out;
+        }
+    }
 
 out:
     if (ret < 0) {
diff --git a/qapi-schema.json b/qapi-schema.json
index c2f3b63..48a87af 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -722,6 +722,33 @@
 { 'command': 'migrate-start-postcopy' }
 
 ##
+# @COLOCommand
+#
+# The commands for COLO fault tolerance
+#
+# @invalid: unknown command
+#
+# @checkpoint-ready: SVM is ready for checkpointing
+#
+# @checkpoint-request: PVM tells SVM to prepare for new checkpointing
+#
+# @checkpoint-reply: SVM gets PVM's checkpoint request
+#
+# @vmstate-send: VM's state will be sent by PVM.
+#
+# @vmstate-size: The total size of VMstate.
+#
+# @vmstate-received: VM's state has been received by SVM
+#
+# @vmstate-loaded: VM's state has been loaded by SVM
+#
+# Since: 2.6
+##
+{ 'enum': 'COLOCommand',
+  'data': [ 'invalid', 'checkpoint-ready', 'checkpoint-request',
+            'checkpoint-reply', 'vmstate-send', 'vmstate-size',
+            'vmstate-received', 'vmstate-loaded' ] }
+
 # @MouseInfo:
 #
 # Information about a mouse device.
diff --git a/trace-events b/trace-events
index c98d473..f8a0959 100644
--- a/trace-events
+++ b/trace-events
@@ -1579,6 +1579,8 @@ postcopy_ram_incoming_cleanup_join(void) ""
 
 # migration/colo.c
 colo_vm_state_change(const char *old, const char *new) "Change '%s' => '%s'"
+colo_ctl_put(const char *msg, uint64_t value) "Send '%s' cmd, value: %" PRIu64""
+colo_ctl_get(const char *msg, uint64_t value) "Receive '%s' cmd, value: %" PRIu64""
 
 # kvm-all.c
 kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v11 11/39] COLO: Add a new RunState RUN_STATE_COLO
  2015-11-24  9:25 [Qemu-devel] [PATCH COLO-Frame v11 00/39] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (9 preceding siblings ...)
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 10/39] COLO: Implement colo checkpoint protocol zhanghailiang
@ 2015-11-24  9:25 ` zhanghailiang
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 12/39] QEMUSizedBuffer: Introduce two help functions for qsb zhanghailiang
                   ` (27 subsequent siblings)
  38 siblings, 0 replies; 95+ messages in thread
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, Markus Armbruster, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, hongyang.yang

The guest enters this state when it is paused to save/restore VM state
during a COLO checkpoint.

Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 qapi-schema.json | 5 ++++-
 vl.c             | 8 ++++++++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/qapi-schema.json b/qapi-schema.json
index 48a87af..5ee089d 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -154,12 +154,15 @@
 # @watchdog: the watchdog action is configured to pause and has been triggered
 #
 # @guest-panicked: guest has been panicked as a result of guest OS panic
+#
+# @colo: guest is paused to save/restore VM state under colo checkpoint (since
+# 2.6)
 ##
 { 'enum': 'RunState',
   'data': [ 'debug', 'inmigrate', 'internal-error', 'io-error', 'paused',
             'postmigrate', 'prelaunch', 'finish-migrate', 'restore-vm',
             'running', 'save-vm', 'shutdown', 'suspended', 'watchdog',
-            'guest-panicked' ] }
+            'guest-panicked', 'colo' ] }
 
 ##
 # @StatusInfo:
diff --git a/vl.c b/vl.c
index 122199f..083a60c 100644
--- a/vl.c
+++ b/vl.c
@@ -594,6 +594,7 @@ static const RunStateTransition runstate_transitions_def[] = {
     { RUN_STATE_INMIGRATE, RUN_STATE_WATCHDOG },
     { RUN_STATE_INMIGRATE, RUN_STATE_GUEST_PANICKED },
     { RUN_STATE_INMIGRATE, RUN_STATE_FINISH_MIGRATE },
+    { RUN_STATE_INMIGRATE, RUN_STATE_COLO },
 
     { RUN_STATE_INTERNAL_ERROR, RUN_STATE_PAUSED },
     { RUN_STATE_INTERNAL_ERROR, RUN_STATE_FINISH_MIGRATE },
@@ -603,6 +604,7 @@ static const RunStateTransition runstate_transitions_def[] = {
 
     { RUN_STATE_PAUSED, RUN_STATE_RUNNING },
     { RUN_STATE_PAUSED, RUN_STATE_FINISH_MIGRATE },
+    { RUN_STATE_PAUSED, RUN_STATE_COLO},
 
     { RUN_STATE_POSTMIGRATE, RUN_STATE_RUNNING },
     { RUN_STATE_POSTMIGRATE, RUN_STATE_FINISH_MIGRATE },
@@ -613,9 +615,12 @@ static const RunStateTransition runstate_transitions_def[] = {
 
     { RUN_STATE_FINISH_MIGRATE, RUN_STATE_RUNNING },
     { RUN_STATE_FINISH_MIGRATE, RUN_STATE_POSTMIGRATE },
+    { RUN_STATE_FINISH_MIGRATE, RUN_STATE_COLO},
 
     { RUN_STATE_RESTORE_VM, RUN_STATE_RUNNING },
 
+    { RUN_STATE_COLO, RUN_STATE_RUNNING },
+
     { RUN_STATE_RUNNING, RUN_STATE_DEBUG },
     { RUN_STATE_RUNNING, RUN_STATE_INTERNAL_ERROR },
     { RUN_STATE_RUNNING, RUN_STATE_IO_ERROR },
@@ -626,6 +631,7 @@ static const RunStateTransition runstate_transitions_def[] = {
     { RUN_STATE_RUNNING, RUN_STATE_SHUTDOWN },
     { RUN_STATE_RUNNING, RUN_STATE_WATCHDOG },
     { RUN_STATE_RUNNING, RUN_STATE_GUEST_PANICKED },
+    { RUN_STATE_RUNNING, RUN_STATE_COLO},
 
     { RUN_STATE_SAVE_VM, RUN_STATE_RUNNING },
 
@@ -636,9 +642,11 @@ static const RunStateTransition runstate_transitions_def[] = {
     { RUN_STATE_RUNNING, RUN_STATE_SUSPENDED },
     { RUN_STATE_SUSPENDED, RUN_STATE_RUNNING },
     { RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE },
+    { RUN_STATE_SUSPENDED, RUN_STATE_COLO},
 
     { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
     { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
+    { RUN_STATE_WATCHDOG, RUN_STATE_COLO},
 
     { RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING },
     { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE },
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v11 12/39] QEMUSizedBuffer: Introduce two help functions for qsb
  2015-11-24  9:25 [Qemu-devel] [PATCH COLO-Frame v11 00/39] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (10 preceding siblings ...)
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 11/39] COLO: Add a new RunState RUN_STATE_COLO zhanghailiang
@ 2015-11-24  9:25 ` zhanghailiang
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 13/39] COLO: Save PVM state to secondary side when do checkpoint zhanghailiang
                   ` (26 subsequent siblings)
  38 siblings, 0 replies; 95+ messages in thread
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

Introduce two new QEMUSizedBuffer APIs which will be used by COLO to buffer
VM state:
One is qsb_put_buffer(), which puts the content of a given QEMUSizedBuffer
into a QEMUFile; it is used to send the buffered VM state to the secondary.
The other is qsb_fill_buffer(), which reads 'size' bytes of data from the
file into the qsb; it is used to read the VM state from the socket into a buffer.
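
A rough sketch of how the two helpers pair up across the checkpoint channel
(the function names send_buffered_state()/recv_buffered_state() and the idea
that the primary announces 'size' in advance are assumptions for
illustration; error handling is minimal):

    /* Primary side: 'qsb' already holds the VM state, saved into it via
     * qemu_bufopen("w", qsb); push it onto the outgoing stream 'f'. */
    static void send_buffered_state(QEMUFile *f, QEMUSizedBuffer *qsb)
    {
        size_t size = qsb_get_length(qsb);

        qsb_put_buffer(f, qsb, size);
        qemu_fflush(f);
    }

    /* Secondary side: 'size' was announced by the primary; read the whole
     * blob into a local buffer before anything is loaded into the SVM. */
    static int recv_buffered_state(QEMUFile *f, QEMUSizedBuffer *qsb,
                                   size_t size)
    {
        if (qsb_fill_buffer(qsb, f, size) != size) {
            return -1;   /* short read: keep the current SVM state */
        }
        return 0;
    }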

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v11:
- size_t'ify these two helper functions (Dave's suggestion)
---
 include/migration/qemu-file.h |  3 ++-
 migration/qemu-file-buf.c     | 61 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index b5d08d2..ca6a582 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -150,7 +150,8 @@ ssize_t qsb_get_buffer(const QEMUSizedBuffer *, off_t start, size_t count,
                        uint8_t *buf);
 ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t *buf,
                      off_t pos, size_t count);
-
+void qsb_put_buffer(QEMUFile *f, QEMUSizedBuffer *qsb, size_t size);
+size_t qsb_fill_buffer(QEMUSizedBuffer *qsb, QEMUFile *f, size_t size);
 
 /*
  * For use on files opened with qemu_bufopen
diff --git a/migration/qemu-file-buf.c b/migration/qemu-file-buf.c
index 49516b8..c50a495 100644
--- a/migration/qemu-file-buf.c
+++ b/migration/qemu-file-buf.c
@@ -366,6 +366,67 @@ ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t *source,
     return count;
 }
 
+/**
+ * Put the content of a given QEMUSizedBuffer into QEMUFile.
+ *
+ * @f: A QEMUFile
+ * @qsb: A QEMUSizedBuffer
+ * @size: size of content to write
+ */
+void qsb_put_buffer(QEMUFile *f, QEMUSizedBuffer *qsb, size_t size)
+{
+    size_t l;
+    int i;
+
+    for (i = 0; i < qsb->n_iov && size > 0; i++) {
+        l = MIN(qsb->iov[i].iov_len, size);
+        qemu_put_buffer(f, qsb->iov[i].iov_base, l);
+        size -= l;
+    }
+}
+
+/*
+ * Read 'size' bytes of data from the file into qsb.
+ * always fill from pos 0 and used after qsb_create().
+ *
+ * It will return size bytes unless there was an error, in which case it will
+ * return as many as it managed to read (assuming blocking fd's which
+ * all current QEMUFile are)
+ */
+size_t qsb_fill_buffer(QEMUSizedBuffer *qsb, QEMUFile *f, size_t size)
+{
+    ssize_t rc = qsb_grow(qsb, size);
+    ssize_t pending = size;
+    int i;
+    uint8_t *buf = NULL;
+
+    qsb->used = 0;
+
+    if (rc < 0) {
+        return rc;
+    }
+
+    for (i = 0; i < qsb->n_iov && pending > 0; i++) {
+        size_t doneone = 0;
+        /* read until iov full */
+        while (doneone < qsb->iov[i].iov_len && pending > 0) {
+            size_t readone = 0;
+
+            buf = qsb->iov[i].iov_base;
+            readone = qemu_get_buffer(f, buf,
+                                MIN(qsb->iov[i].iov_len - doneone, pending));
+            if (readone == 0) {
+                return qsb->used;
+            }
+            buf += readone;
+            doneone += readone;
+            pending -= readone;
+            qsb->used += readone;
+        }
+    }
+    return qsb->used;
+}
+
 typedef struct QEMUBuffer {
     QEMUSizedBuffer *qsb;
     QEMUFile *file;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v11 13/39] COLO: Save PVM state to secondary side when do checkpoint
  2015-11-24  9:25 [Qemu-devel] [PATCH COLO-Frame v11 00/39] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (11 preceding siblings ...)
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 12/39] QEMUSizedBuffer: Introduce two help functions for qsb zhanghailiang
@ 2015-11-24  9:25 ` zhanghailiang
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 14/39] ram: Split host_from_stream_offset() into two helper functions zhanghailiang
                   ` (25 subsequent siblings)
  38 siblings, 0 replies; 95+ messages in thread
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

The main task of a checkpoint is to synchronize the SVM with the PVM.
The VM's state includes RAM and device state, so when we do a checkpoint
we migrate the PVM's state to the SVM, just as ordinary migration does.

We cache the PVM's state on the secondary side in a QEMUSizedBuffer.
Because we need to know the size of the VM state, the primary side also
stores the VM state temporarily in a qsb, obtains its size with
qsb_get_length(), and then migrates the data into the qsb on the
secondary side.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v11:
- Add Reviewed-by tag
---
 migration/colo.c | 68 +++++++++++++++++++++++++++++++++++++++++++++++++++-----
 migration/ram.c  | 39 +++++++++++++++++++++++---------
 2 files changed, 92 insertions(+), 15 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index c045d61..012d8e5 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -17,6 +17,9 @@
 #include "qemu/error-report.h"
 #include "qemu/sockets.h"
 
+/* colo buffer */
+#define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
+
 bool colo_supported(void)
 {
     return true;
@@ -90,10 +93,13 @@ static int colo_ctl_get(QEMUFile *f, uint32_t require, uint64_t *value)
     return ret;
 }
 
-static int colo_do_checkpoint_transaction(MigrationState *s)
+static int colo_do_checkpoint_transaction(MigrationState *s,
+                                          QEMUSizedBuffer *buffer)
 {
     int ret;
     uint64_t value;
+    size_t size;
+    QEMUFile *trans = NULL;
 
     ret = colo_ctl_put(s->to_dst_file, COLO_COMMAND_CHECKPOINT_REQUEST, 0);
     if (ret < 0) {
@@ -105,15 +111,47 @@ static int colo_do_checkpoint_transaction(MigrationState *s)
     if (ret < 0) {
         goto out;
     }
+    /* Reset colo buffer and open it for write */
+    qsb_set_length(buffer, 0);
+    trans = qemu_bufopen("w", buffer);
+    if (!trans) {
+        error_report("Open colo buffer for write failed");
+        goto out;
+    }
 
-    /* TODO: suspend and save vm state to colo buffer */
+    qemu_mutex_lock_iothread();
+    vm_stop_force_state(RUN_STATE_COLO);
+    qemu_mutex_unlock_iothread();
+    trace_colo_vm_state_change("run", "stop");
+
+    /* Disable block migration */
+    s->params.blk = 0;
+    s->params.shared = 0;
+    qemu_savevm_state_header(trans);
+    qemu_savevm_state_begin(trans, &s->params);
+    qemu_mutex_lock_iothread();
+    qemu_savevm_state_complete_precopy(trans, false);
+    qemu_mutex_unlock_iothread();
+
+    qemu_fflush(trans);
 
     ret = colo_ctl_put(s->to_dst_file, COLO_COMMAND_VMSTATE_SEND, 0);
     if (ret < 0) {
         goto out;
     }
+    /* we send the total size of the vmstate first */
+    size = qsb_get_length(buffer);
+    ret = colo_ctl_put(s->to_dst_file, COLO_COMMAND_VMSTATE_SIZE, size);
+    if (ret < 0) {
+        goto out;
+    }
 
-    /* TODO: send vmstate to Secondary */
+    qsb_put_buffer(s->to_dst_file, buffer, size);
+    qemu_fflush(s->to_dst_file);
+    ret = qemu_file_get_error(s->to_dst_file);
+    if (ret < 0) {
+        goto out;
+    }
 
     ret = colo_ctl_get(s->rp_state.from_dst_file,
                        COLO_COMMAND_VMSTATE_RECEIVED, &value);
@@ -127,14 +165,24 @@ static int colo_do_checkpoint_transaction(MigrationState *s)
         goto out;
     }
 
-    /* TODO: resume Primary */
+    ret = 0;
+    /* Resume primary guest */
+    qemu_mutex_lock_iothread();
+    vm_start();
+    qemu_mutex_unlock_iothread();
+    trace_colo_vm_state_change("stop", "run");
 
 out:
+    if (trans) {
+        qemu_fclose(trans);
+    }
+
     return ret;
 }
 
 static void colo_process_checkpoint(MigrationState *s)
 {
+    QEMUSizedBuffer *buffer = NULL;
     int ret = 0;
     uint64_t value;
 
@@ -155,6 +203,13 @@ static void colo_process_checkpoint(MigrationState *s)
         goto out;
     }
 
+    buffer = qsb_create(NULL, COLO_BUFFER_BASE_SIZE);
+    if (buffer == NULL) {
+        ret = -ENOMEM;
+        error_report("Failed to allocate colo buffer!");
+        goto out;
+    }
+
     qemu_mutex_lock_iothread();
     vm_start();
     qemu_mutex_unlock_iothread();
@@ -162,7 +217,7 @@ static void colo_process_checkpoint(MigrationState *s)
 
     while (s->state == MIGRATION_STATUS_COLO) {
         /* start a colo checkpoint */
-        ret = colo_do_checkpoint_transaction(s);
+        ret = colo_do_checkpoint_transaction(s, buffer);
         if (ret < 0) {
             goto out;
         }
@@ -175,6 +230,9 @@ out:
     migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
                       MIGRATION_STATUS_COMPLETED);
 
+    qsb_free(buffer);
+    buffer = NULL;
+
     if (s->rp_state.from_dst_file) {
         qemu_fclose(s->rp_state.from_dst_file);
     }
diff --git a/migration/ram.c b/migration/ram.c
index 1eb155a..cfe78aa 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -39,6 +39,7 @@
 #include "trace.h"
 #include "exec/ram_addr.h"
 #include "qemu/rcu_queue.h"
+#include "migration/colo.h"
 
 #ifdef DEBUG_MIGRATION_RAM
 #define DPRINTF(fmt, ...) \
@@ -1857,16 +1858,8 @@ err:
     return ret;
 }
 
-
-/* Each of ram_save_setup, ram_save_iterate and ram_save_complete has
- * long-running RCU critical section.  When rcu-reclaims in the code
- * start to become numerous it will be necessary to reduce the
- * granularity of these critical sections.
- */
-
-static int ram_save_setup(QEMUFile *f, void *opaque)
+static int ram_save_init_globals(void)
 {
-    RAMBlock *block;
     int64_t ram_bitmap_pages; /* Size of bitmap in pages, including gaps */
 
     dirty_rate_high_cnt = 0;
@@ -1931,6 +1924,31 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     migration_bitmap_sync();
     qemu_mutex_unlock_ramlist();
     qemu_mutex_unlock_iothread();
+    rcu_read_unlock();
+
+    return 0;
+}
+
+/* Each of ram_save_setup, ram_save_iterate and ram_save_complete has
+ * long-running RCU critical section.  When rcu-reclaims in the code
+ * start to become numerous it will be necessary to reduce the
+ * granularity of these critical sections.
+ */
+
+static int ram_save_setup(QEMUFile *f, void *opaque)
+{
+    RAMBlock *block;
+
+    /*
+     * migration has already setup the bitmap, reuse it.
+     */
+    if (!migration_in_colo_state()) {
+        if (ram_save_init_globals() < 0) {
+            return -1;
+         }
+    }
+
+    rcu_read_lock();
 
     qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
 
@@ -2032,7 +2050,8 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
     while (true) {
         int pages;
 
-        pages = ram_find_and_save_block(f, true, &bytes_transferred);
+        pages = ram_find_and_save_block(f, !migration_in_colo_state(),
+                                        &bytes_transferred);
         /* no more blocks to sent */
         if (pages == 0) {
             break;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v11 14/39] ram: Split host_from_stream_offset() into two helper functions
  2015-11-24  9:25 [Qemu-devel] [PATCH COLO-Frame v11 00/39] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (12 preceding siblings ...)
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 13/39] COLO: Save PVM state to secondary side when do checkpoint zhanghailiang
@ 2015-11-24  9:25 ` zhanghailiang
  2015-12-01 18:19   ` Dr. David Alan Gilbert
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 15/39] COLO: Load PVM's dirty pages into SVM's RAM cache temporarily zhanghailiang
                   ` (24 subsequent siblings)
  38 siblings, 1 reply; 95+ messages in thread
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

Split host_from_stream_offset() into two parts:
One gets the RAM block (whose idstr may be read from the migration
stream); the other gets the HVA (host address) from the block and the offset.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
v11:
- New patch
---
 migration/ram.c | 29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index cfe78aa..a161620 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2136,9 +2136,9 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
  * offset: Offset within the block
  * flags: Page flags (mostly to see if it's a continuation of previous block)
  */
-static inline void *host_from_stream_offset(QEMUFile *f,
-                                            ram_addr_t offset,
-                                            int flags)
+static inline RAMBlock *ram_block_from_stream(QEMUFile *f,
+                                              ram_addr_t offset,
+                                              int flags)
 {
     static RAMBlock *block = NULL;
     char id[256];
@@ -2150,22 +2150,31 @@ static inline void *host_from_stream_offset(QEMUFile *f,
             return NULL;
         }
 
-        return block->host + offset;
+        return block;
     }
-
     len = qemu_get_byte(f);
     qemu_get_buffer(f, (uint8_t *)id, len);
     id[len] = 0;
 
     block = qemu_ram_block_by_name(id);
     if (block && block->max_length > offset) {
-        return block->host + offset;
+        return block;
     }
 
     error_report("Can't find block %s", id);
     return NULL;
 }
 
+static inline void *host_from_ram_block_offset(RAMBlock *block,
+                                               ram_addr_t offset)
+{
+    if (!block) {
+        return NULL;
+    }
+
+    return block->host + offset;
+}
+
 /*
  * If a page (or a whole RDMA chunk) has been
  * determined to be zero, then zap it.
@@ -2310,7 +2319,9 @@ static int ram_load_postcopy(QEMUFile *f)
         trace_ram_load_postcopy_loop((uint64_t)addr, flags);
         place_needed = false;
         if (flags & (RAM_SAVE_FLAG_COMPRESS | RAM_SAVE_FLAG_PAGE)) {
-            host = host_from_stream_offset(f, addr, flags);
+            RAMBlock *block = ram_block_from_stream(f, addr, flags);
+
+            host = host_from_ram_block_offset(block, addr);
             if (!host) {
                 error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
                 ret = -EINVAL;
@@ -2441,7 +2452,9 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 
         if (flags & (RAM_SAVE_FLAG_COMPRESS | RAM_SAVE_FLAG_PAGE |
                      RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
-            host = host_from_stream_offset(f, addr, flags);
+            RAMBlock *block = ram_block_from_stream(f, addr, flags);
+
+            host = host_from_ram_block_offset(block, addr);
             if (!host) {
                 error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
                 ret = -EINVAL;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v11 15/39] COLO: Load PVM's dirty pages into SVM's RAM cache temporarily
  2015-11-24  9:25 [Qemu-devel] [PATCH COLO-Frame v11 00/39] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (13 preceding siblings ...)
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 14/39] ram: Split host_from_stream_offset() into two helper functions zhanghailiang
@ 2015-11-24  9:25 ` zhanghailiang
  2015-12-01 19:02   ` Dr. David Alan Gilbert
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 16/39] ram/COLO: Record the dirty pages that SVM received zhanghailiang
                   ` (23 subsequent siblings)
  38 siblings, 1 reply; 95+ messages in thread
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

We should not load the PVM's state directly into the SVM, because errors may
happen while the SVM is receiving data, which would break the SVM.

We need to receive all of the data before loading the state into the SVM, so
we use extra memory to cache this data (the PVM's RAM). The RAM cache on the
secondary side is initially identical to the SVM/PVM's memory. During each
checkpoint we first write the PVM's dirty pages into this RAM cache, so the
cache matches the PVM's memory at every checkpoint; we then flush the cached
RAM into the SVM once we have received all of the PVM's state.
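
The flush step itself lands in a later patch of this series ("COLO: Flush
PVM's cached RAM into SVM's memory"); a self-contained toy illustration of
the scheme (not QEMU code; the page count, page size and dirty flags are
made up for the example) looks like this:

    #include <stdint.h>
    #include <string.h>
    #include <stdbool.h>

    #define PAGES      8
    #define PAGE_SIZE  16

    static uint8_t live_ram[PAGES][PAGE_SIZE];   /* the SVM's real memory */
    static uint8_t ram_cache[PAGES][PAGE_SIZE];  /* plays the colo_cache role */
    static bool    dirty[PAGES];                 /* pages touched this round */

    /* A checkpoint page arrives: only the cache is updated. */
    static void receive_page(unsigned idx, const uint8_t *data)
    {
        memcpy(ram_cache[idx], data, PAGE_SIZE);
        dirty[idx] = true;
    }

    /* Once the whole checkpoint has arrived, copy the touched pages
     * into the live memory in one go. */
    static void flush_cache_to_live_ram(void)
    {
        unsigned i;

        for (i = 0; i < PAGES; i++) {
            if (dirty[i]) {
                memcpy(live_ram[i], ram_cache[i], PAGE_SIZE);
                dirty[i] = false;
            }
        }
    }

    int main(void)
    {
        uint8_t page[PAGE_SIZE] = { 0xab };

        receive_page(3, page);       /* a failure here leaves live_ram intact */
        flush_cache_to_live_ram();   /* done only after everything arrived */
        return 0;
    }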

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
v11:
- Rename 'host_cache' to 'colo_cache' (Dave's suggestion)
v10:
- Split the process of dirty pages recording into a new patch
---
 include/exec/ram_addr.h       |  1 +
 include/migration/migration.h |  4 +++
 migration/colo.c              | 10 +++++++
 migration/ram.c               | 69 ++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 83 insertions(+), 1 deletion(-)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 7115154..bb44f66 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -26,6 +26,7 @@ struct RAMBlock {
     struct rcu_head rcu;
     struct MemoryRegion *mr;
     uint8_t *host;
+    uint8_t *colo_cache; /* For colo, VM's ram cache */
     ram_addr_t offset;
     ram_addr_t used_length;
     ram_addr_t max_length;
diff --git a/include/migration/migration.h b/include/migration/migration.h
index ba5bcec..e41372d 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -332,4 +332,8 @@ int ram_save_queue_pages(MigrationState *ms, const char *rbname,
 PostcopyState postcopy_state_get(void);
 /* Set the state and return the old state */
 PostcopyState postcopy_state_set(PostcopyState new_state);
+
+/* ram cache */
+int colo_init_ram_cache(void);
+void colo_release_ram_cache(void);
 #endif
diff --git a/migration/colo.c b/migration/colo.c
index 012d8e5..6e933fa 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -304,6 +304,12 @@ void *colo_process_incoming_thread(void *opaque)
     qemu_set_block(qemu_get_fd(mis->from_src_file));
 
 
+    ret = colo_init_ram_cache();
+    if (ret < 0) {
+        error_report("Failed to initialize ram cache");
+        goto out;
+    }
+
     ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_CHECKPOINT_READY, 0);
     if (ret < 0) {
         goto out;
@@ -353,6 +359,10 @@ out:
                      strerror(-ret));
     }
 
+    qemu_mutex_lock_iothread();
+    colo_release_ram_cache();
+    qemu_mutex_unlock_iothread();
+
     if (mis->to_src_file) {
         qemu_fclose(mis->to_src_file);
     }
diff --git a/migration/ram.c b/migration/ram.c
index a161620..9d946a1 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -223,6 +223,7 @@ static RAMBlock *last_sent_block;
 static ram_addr_t last_offset;
 static QemuMutex migration_bitmap_mutex;
 static uint64_t migration_dirty_pages;
+static bool ram_cache_enable;
 static uint32_t last_version;
 static bool ram_bulk_stage;
 
@@ -2175,6 +2176,16 @@ static inline void *host_from_ram_block_offset(RAMBlock *block,
     return block->host + offset;
 }
 
+static inline void *colo_cache_from_block_offset(RAMBlock *block,
+                                                 ram_addr_t offset)
+{
+    if (!block) {
+        return NULL;
+    }
+
+    return block->colo_cache + offset;
+}
+
 /*
  * If a page (or a whole RDMA chunk) has been
  * determined to be zero, then zap it.
@@ -2454,7 +2465,12 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
                      RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
             RAMBlock *block = ram_block_from_stream(f, addr, flags);
 
-            host = host_from_ram_block_offset(block, addr);
+            /* After going into COLO, we should load the Page into colo_cache */
+            if (ram_cache_enable) {
+                host = colo_cache_from_block_offset(block, addr);
+            } else {
+                host = host_from_ram_block_offset(block, addr);
+            }
             if (!host) {
                 error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
                 ret = -EINVAL;
@@ -2550,6 +2566,57 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
     return ret;
 }
 
+/*
+ * colo cache: this is for secondary VM, we cache the whole
+ * memory of the secondary VM, it will be called after first migration.
+ */
+int colo_init_ram_cache(void)
+{
+    RAMBlock *block;
+
+    rcu_read_lock();
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+        block->colo_cache = qemu_anon_ram_alloc(block->used_length, NULL);
+        if (!block->colo_cache) {
+            error_report("%s: Can't alloc memory for colo cache of block %s,"
+                         "size %zu", __func__, block->idstr,
+                         block->used_length);
+            goto out_locked;
+        }
+        memcpy(block->colo_cache, block->host, block->used_length);
+    }
+    rcu_read_unlock();
+    ram_cache_enable = true;
+    return 0;
+
+out_locked:
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+        if (block->colo_cache) {
+            qemu_anon_ram_free(block->colo_cache, block->used_length);
+            block->colo_cache = NULL;
+        }
+    }
+
+    rcu_read_unlock();
+    return -errno;
+}
+
+void colo_release_ram_cache(void)
+{
+    RAMBlock *block;
+
+    ram_cache_enable = false;
+
+    rcu_read_lock();
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+        if (block->colo_cache) {
+            qemu_anon_ram_free(block->colo_cache, block->used_length);
+            block->colo_cache = NULL;
+        }
+    }
+    rcu_read_unlock();
+}
+
 static SaveVMHandlers savevm_ram_handlers = {
     .save_live_setup = ram_save_setup,
     .save_live_iterate = ram_save_iterate,
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v11 16/39] ram/COLO: Record the dirty pages that SVM received
  2015-11-24  9:25 [Qemu-devel] [PATCH COLO-Frame v11 00/39] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (14 preceding siblings ...)
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 15/39] COLO: Load PVM's dirty pages into SVM's RAM cache temporarily zhanghailiang
@ 2015-11-24  9:25 ` zhanghailiang
  2015-12-01 19:36   ` Dr. David Alan Gilbert
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 17/39] COLO: Load VMState into qsb before restore it zhanghailiang
                   ` (22 subsequent siblings)
  38 siblings, 1 reply; 95+ messages in thread
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

Record the addresses of the dirty pages that are received; this helps
when flushing the cached pages into the SVM.
We record them by re-using the migration dirty bitmap.
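
As a worked example of the bit index computed below: with the usual 4 KiB
target pages (TARGET_PAGE_BITS = 12, assumed here for illustration), a page
received for a block whose ram_addr is 0x200000 at offset 0x3000 sets bit
(0x200000 + 0x3000) >> 12 = 0x203 in the bitmap:

    #include <stdio.h>

    #define TARGET_PAGE_BITS 12   /* assumed 4 KiB pages for the example */

    int main(void)
    {
        unsigned long ram_addr = 0x200000;  /* block->mr->ram_addr */
        unsigned long offset   = 0x3000;    /* offset within the block */
        unsigned long bit      = (ram_addr + offset) >> TARGET_PAGE_BITS;

        printf("dirty bit index: 0x%lx\n", bit);   /* prints 0x203 */
        return 0;
    }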

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
v11:
- Split a new helper function from original
  host_from_stream_offset() (Dave's suggestion)
- Only do recording work in this patch
v10:
- New patch split from v9's patch 13
- Rebase to master to use 'migration_bitmap_rcu'
---
 migration/ram.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 9d946a1..da6bbd6 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2153,6 +2153,7 @@ static inline RAMBlock *ram_block_from_stream(QEMUFile *f,
 
         return block;
     }
+
     len = qemu_get_byte(f);
     qemu_get_buffer(f, (uint8_t *)id, len);
     id[len] = 0;
@@ -2179,10 +2180,23 @@ static inline void *host_from_ram_block_offset(RAMBlock *block,
 static inline void *colo_cache_from_block_offset(RAMBlock *block,
                                                  ram_addr_t offset)
 {
+    unsigned long *bitmap;
+    long k;
+
     if (!block) {
         return NULL;
     }
 
+    k = (block->mr->ram_addr + offset) >> TARGET_PAGE_BITS;
+    bitmap = atomic_rcu_read(&migration_bitmap_rcu)->bmap;
+    /*
+    * During a COLO checkpoint, we need a bitmap of these migrated pages.
+    * It helps us decide which pages in the RAM cache should be flushed
+    * into the VM's RAM later.
+    */
+    if (!test_and_set_bit(k, bitmap)) {
+        migration_dirty_pages++;
+    }
     return block->colo_cache + offset;
 }
 
@@ -2573,6 +2587,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 int colo_init_ram_cache(void)
 {
     RAMBlock *block;
+    int64_t ram_cache_pages = last_ram_offset() >> TARGET_PAGE_BITS;
 
     rcu_read_lock();
     QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
@@ -2587,6 +2602,15 @@ int colo_init_ram_cache(void)
     }
     rcu_read_unlock();
     ram_cache_enable = true;
+    /*
+    * Record the dirty pages that are sent by the PVM; we use this dirty bitmap
+    * to decide which pages in the cache should be flushed into the SVM's RAM.
+    * Here we use the same name 'migration_bitmap_rcu' as for migration.
+    */
+    migration_bitmap_rcu = g_new0(struct BitmapRcu, 1);
+    migration_bitmap_rcu->bmap = bitmap_new(ram_cache_pages);
+    migration_dirty_pages = 0;
+
     return 0;
 
 out_locked:
@@ -2604,9 +2628,15 @@ out_locked:
 void colo_release_ram_cache(void)
 {
     RAMBlock *block;
+    struct BitmapRcu *bitmap = migration_bitmap_rcu;
 
     ram_cache_enable = false;
 
+    atomic_rcu_set(&migration_bitmap_rcu, NULL);
+    if (bitmap) {
+        call_rcu(bitmap, migration_bitmap_free, rcu);
+    }
+
     rcu_read_lock();
     QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
         if (block->colo_cache) {
-- 
1.8.3.1

* [Qemu-devel] [PATCH COLO-Frame v11 17/39] COLO: Load VMState into qsb before restore it
  2015-11-24  9:25 [Qemu-devel] [PATCH COLO-Frame v11 00/39] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (15 preceding siblings ...)
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 16/39] ram/COLO: Record the dirty pages that SVM received zhanghailiang
@ 2015-11-24  9:25 ` zhanghailiang
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 18/39] COLO: Flush PVM's cached RAM into SVM's memory zhanghailiang
                   ` (21 subsequent siblings)
  38 siblings, 0 replies; 95+ messages in thread
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

We should not destroy the state of the SVM (Secondary VM) until we have received
the whole state from the PVM (Primary VM), in case the primary fails in the middle
of sending the state. So here we cache the device state on the Secondary before
restoring it.

Besides, we should call qemu_system_reset() before loading the VM state,
which ensures the loaded data is intact.
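
A minimal sketch of the intended ordering (illustration only; the real code in
the diff below adds error handling and the checkpoint control messages):

    /* Cache the announced amount of device state first, then reset and load,
     * so a truncated stream can never destroy the SVM's current state.
     */
    buffer = qsb_create(NULL, COLO_BUFFER_BASE_SIZE);
    total_size = qsb_fill_buffer(buffer, mis->from_src_file, value);
    fb = qemu_bufopen("r", buffer);          /* read-only view of the cache */

    qemu_mutex_lock_iothread();
    qemu_system_reset(VMRESET_SILENT);       /* start from a clean machine */
    qemu_loadvm_state(fb);                   /* restore from the complete cache */
    qemu_mutex_unlock_iothread();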

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/colo.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 50 insertions(+), 2 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 6e933fa..5ac8ff2 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -285,6 +285,9 @@ static int colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request)
 void *colo_process_incoming_thread(void *opaque)
 {
     MigrationIncomingState *mis = opaque;
+    QEMUFile *fb = NULL;
+    QEMUSizedBuffer *buffer = NULL; /* Cache incoming device state */
+    uint64_t  total_size;
     int ret = 0;
     uint64_t value;
 
@@ -310,6 +313,12 @@ void *colo_process_incoming_thread(void *opaque)
         goto out;
     }
 
+    buffer = qsb_create(NULL, COLO_BUFFER_BASE_SIZE);
+    if (buffer == NULL) {
+        error_report("Failed to allocate colo buffer!");
+        goto out;
+    }
+
     ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_CHECKPOINT_READY, 0);
     if (ret < 0) {
         goto out;
@@ -338,19 +347,53 @@ void *colo_process_incoming_thread(void *opaque)
             goto out;
         }
 
-        /* TODO: read migration data into colo buffer */
+        /* read the VM state total size first */
+        ret = colo_ctl_get(mis->from_src_file,
+                           COLO_COMMAND_VMSTATE_SIZE, &value);
+        if (ret < 0) {
+            error_report("%s: Failed to get vmstate size", __func__);
+            goto out;
+        }
+
+        /* read vm device state into colo buffer */
+        total_size = qsb_fill_buffer(buffer, mis->from_src_file, value);
+        if (total_size != value) {
+            error_report("Got %lu VMState data, less than expected %lu",
+                         total_size, value);
+            ret = -EINVAL;
+            goto out;
+        }
 
         ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_VMSTATE_RECEIVED, 0);
         if (ret < 0) {
             goto out;
         }
 
-        /* TODO: load vm state */
+        /* open colo buffer for read */
+        fb = qemu_bufopen("r", buffer);
+        if (!fb) {
+            error_report("can't open colo buffer for read");
+            goto out;
+        }
+
+        qemu_mutex_lock_iothread();
+        qemu_system_reset(VMRESET_SILENT);
+        if (qemu_loadvm_state(fb) < 0) {
+            error_report("COLO: loadvm failed");
+            qemu_mutex_unlock_iothread();
+            goto out;
+        }
+        qemu_mutex_unlock_iothread();
+
+        /* TODO: flush vm state */
 
         ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_VMSTATE_LOADED, 0);
         if (ret < 0) {
             goto out;
         }
+
+        qemu_fclose(fb);
+        fb = NULL;
     }
 
 out:
@@ -359,6 +402,11 @@ out:
                      strerror(-ret));
     }
 
+    if (fb) {
+        qemu_fclose(fb);
+    }
+    qsb_free(buffer);
+
     qemu_mutex_lock_iothread();
     colo_release_ram_cache();
     qemu_mutex_unlock_iothread();
-- 
1.8.3.1

* [Qemu-devel] [PATCH COLO-Frame v11 18/39] COLO: Flush PVM's cached RAM into SVM's memory
  2015-11-24  9:25 [Qemu-devel] [PATCH COLO-Frame v11 00/39] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (16 preceding siblings ...)
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 17/39] COLO: Load VMState into qsb before restore it zhanghailiang
@ 2015-11-24  9:25 ` zhanghailiang
  2015-11-27  5:29   ` Li Zhijian
  2015-12-01 20:06   ` Dr. David Alan Gilbert
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 19/39] COLO: Add checkpoint-delay parameter for migrate-set-parameters zhanghailiang
                   ` (20 subsequent siblings)
  38 siblings, 2 replies; 95+ messages in thread
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

While the VM is running, the PVM may dirty some pages; we transfer the PVM's
dirty pages to the SVM and store them into the SVM's RAM cache at the next
checkpoint time. So the content of the SVM's RAM cache is always the same as
the PVM's memory after a checkpoint.

Instead of flushing the whole content of the RAM cache into the SVM's memory,
we do this in a more efficient way:
only flush the pages that were dirtied by the PVM since the last checkpoint.
In this way, we can ensure that the SVM's memory stays the same as the PVM's.

Besides, we must flush the RAM cache before loading the device state.
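
In essence the flush walks the dirty bitmap per RAMBlock and copies only the
marked pages from the cache into the SVM's memory; a minimal sketch of the
per-page step (see colo_flush_ram_cache() in the diff for the full loop):

    offset = migration_bitmap_find_dirty(block, offset, &ram_addr_abs);
    migration_bitmap_clear_dirty(ram_addr_abs);
    if (offset < block->used_length) {
        memcpy(block->host + offset,           /* SVM's real RAM       */
               block->colo_cache + offset,     /* page sent by the PVM */
               TARGET_PAGE_SIZE);
    }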

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
v11:
- Move the place of 'need_flush' (Dave's suggestion)
- Remove unused 'DPRINTF("Flush ram_cache\n")'
v10:
- trace the number of dirty pages that be received.
---
 include/migration/migration.h |  1 +
 migration/colo.c              |  2 --
 migration/ram.c               | 37 +++++++++++++++++++++++++++++++++++++
 trace-events                  |  1 +
 4 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index e41372d..221176b 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -336,4 +336,5 @@ PostcopyState postcopy_state_set(PostcopyState new_state);
 /* ram cache */
 int colo_init_ram_cache(void);
 void colo_release_ram_cache(void);
+void colo_flush_ram_cache(void);
 #endif
diff --git a/migration/colo.c b/migration/colo.c
index 5ac8ff2..4095d97 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -385,8 +385,6 @@ void *colo_process_incoming_thread(void *opaque)
         }
         qemu_mutex_unlock_iothread();
 
-        /* TODO: flush vm state */
-
         ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_VMSTATE_LOADED, 0);
         if (ret < 0) {
             goto out;
diff --git a/migration/ram.c b/migration/ram.c
index da6bbd6..4f37144 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2448,6 +2448,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
      * be atomic
      */
     bool postcopy_running = postcopy_state_get() >= POSTCOPY_INCOMING_LISTENING;
+    bool need_flush = false;
 
     seq_iter++;
 
@@ -2482,6 +2483,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
             /* After going into COLO, we should load the Page into colo_cache */
             if (ram_cache_enable) {
                 host = colo_cache_from_block_offset(block, addr);
+                need_flush = true;
             } else {
                 host = host_from_ram_block_offset(block, addr);
             }
@@ -2575,6 +2577,10 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
     }
 
     rcu_read_unlock();
+
+    if (!ret  && ram_cache_enable && need_flush) {
+        colo_flush_ram_cache();
+    }
     DPRINTF("Completed load of VM with exit code %d seq iteration "
             "%" PRIu64 "\n", ret, seq_iter);
     return ret;
@@ -2647,6 +2653,37 @@ void colo_release_ram_cache(void)
     rcu_read_unlock();
 }
 
+/*
+ * Flush content of RAM cache into SVM's memory.
+ * Only flush the pages that are dirtied by the PVM, the SVM, or both.
+ */
+void colo_flush_ram_cache(void)
+{
+    RAMBlock *block = NULL;
+    void *dst_host;
+    void *src_host;
+    ram_addr_t  offset = 0;
+
+    trace_colo_flush_ram_cache(migration_dirty_pages);
+    rcu_read_lock();
+    block = QLIST_FIRST_RCU(&ram_list.blocks);
+    while (block) {
+        ram_addr_t ram_addr_abs;
+        offset = migration_bitmap_find_dirty(block, offset, &ram_addr_abs);
+        migration_bitmap_clear_dirty(ram_addr_abs);
+        if (offset >= block->used_length) {
+            offset = 0;
+            block = QLIST_NEXT_RCU(block, next);
+        } else {
+            dst_host = block->host + offset;
+            src_host = block->colo_cache + offset;
+            memcpy(dst_host, src_host, TARGET_PAGE_SIZE);
+        }
+    }
+    rcu_read_unlock();
+    assert(migration_dirty_pages == 0);
+}
+
 static SaveVMHandlers savevm_ram_handlers = {
     .save_live_setup = ram_save_setup,
     .save_live_iterate = ram_save_iterate,
diff --git a/trace-events b/trace-events
index f8a0959..f158d2a 100644
--- a/trace-events
+++ b/trace-events
@@ -1264,6 +1264,7 @@ migration_throttle(void) ""
 ram_load_postcopy_loop(uint64_t addr, int flags) "@%" PRIx64 " %x"
 ram_postcopy_send_discard_bitmap(void) ""
 ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: %zx len: %zx"
+colo_flush_ram_cache(uint64_t dirty_pages) "dirty_pages %" PRIu64""
 
 # hw/display/qxl.c
 disable qxl_interface_set_mm_time(int qid, uint32_t mm_time) "%d %d"
-- 
1.8.3.1

* [Qemu-devel] [PATCH COLO-Frame v11 19/39] COLO: Add checkpoint-delay parameter for migrate-set-parameters
  2015-11-24  9:25 [Qemu-devel] [PATCH COLO-Frame v11 00/39] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (17 preceding siblings ...)
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 18/39] COLO: Flush PVM's cached RAM into SVM's memory zhanghailiang
@ 2015-11-24  9:25 ` zhanghailiang
  2015-12-09 18:50   ` Dr. David Alan Gilbert
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 20/39] COLO: synchronize PVM's state to SVM periodically zhanghailiang
                   ` (19 subsequent siblings)
  38 siblings, 1 reply; 95+ messages in thread
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, Markus Armbruster, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, Luiz Capitulino,
	hongyang.yang

Add checkpoint-delay parameter for migrate-set-parameters, so that
we can control the checkpoint frequency when COLO is in periodic mode.
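
For illustration, after this change the interval can be adjusted at runtime
like any other migration parameter, e.g. (value in milliseconds, chosen
arbitrarily here):

{'execute': 'migrate-set-parameters', 'arguments': {'checkpoint-delay': 1000} }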

Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
v11:
- Move this patch ahead of the patch where uses 'checkpoint_delay'
 (Dave's suggestion)
v10:
- Fix related qmp command
---
 hmp.c                 |  7 +++++++
 migration/migration.c | 23 ++++++++++++++++++++++-
 qapi-schema.json      | 19 ++++++++++++++++---
 qmp-commands.hx       |  2 +-
 4 files changed, 46 insertions(+), 5 deletions(-)

diff --git a/hmp.c b/hmp.c
index 2140605..ec038f1 100644
--- a/hmp.c
+++ b/hmp.c
@@ -284,6 +284,9 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
         monitor_printf(mon, " %s: %" PRId64,
             MigrationParameter_lookup[MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT],
             params->x_cpu_throttle_increment);
+        monitor_printf(mon, " %s: %" PRId64,
+            MigrationParameter_lookup[MIGRATION_PARAMETER_CHECKPOINT_DELAY],
+            params->checkpoint_delay);
         monitor_printf(mon, "\n");
     }
 
@@ -1237,6 +1240,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
     bool has_decompress_threads = false;
     bool has_x_cpu_throttle_initial = false;
     bool has_x_cpu_throttle_increment = false;
+    bool has_checkpoint_delay = false;
     int i;
 
     for (i = 0; i < MIGRATION_PARAMETER_MAX; i++) {
@@ -1256,6 +1260,8 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
                 break;
             case MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT:
                 has_x_cpu_throttle_increment = true;
+            case MIGRATION_PARAMETER_CHECKPOINT_DELAY:
+                has_checkpoint_delay = true;
                 break;
             }
             qmp_migrate_set_parameters(has_compress_level, value,
@@ -1263,6 +1269,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
                                        has_decompress_threads, value,
                                        has_x_cpu_throttle_initial, value,
                                        has_x_cpu_throttle_increment, value,
+                                       has_checkpoint_delay, value,
                                        &err);
             break;
         }
diff --git a/migration/migration.c b/migration/migration.c
index a4c690d..48545f3 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -56,6 +56,11 @@
 /* Migration XBZRLE default cache size */
 #define DEFAULT_MIGRATE_CACHE_SIZE (64 * 1024 * 1024)
 
+/* The delay time (in ms) between two COLO checkpoints
+ * Note: Please change this default value to 10000 when we support hybrid mode.
+ */
+#define DEFAULT_MIGRATE_CHECKPOINT_DELAY 200
+
 static NotifierList migration_state_notifiers =
     NOTIFIER_LIST_INITIALIZER(migration_state_notifiers);
 
@@ -91,6 +96,8 @@ MigrationState *migrate_get_current(void)
                 DEFAULT_MIGRATE_X_CPU_THROTTLE_INITIAL,
         .parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT] =
                 DEFAULT_MIGRATE_X_CPU_THROTTLE_INCREMENT,
+        .parameters[MIGRATION_PARAMETER_CHECKPOINT_DELAY] =
+                DEFAULT_MIGRATE_CHECKPOINT_DELAY,
     };
 
     if (!once) {
@@ -530,6 +537,8 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
             s->parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INITIAL];
     params->x_cpu_throttle_increment =
             s->parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT];
+    params->checkpoint_delay =
+            s->parameters[MIGRATION_PARAMETER_CHECKPOINT_DELAY];
 
     return params;
 }
@@ -736,7 +745,10 @@ void qmp_migrate_set_parameters(bool has_compress_level,
                                 bool has_x_cpu_throttle_initial,
                                 int64_t x_cpu_throttle_initial,
                                 bool has_x_cpu_throttle_increment,
-                                int64_t x_cpu_throttle_increment, Error **errp)
+                                int64_t x_cpu_throttle_increment,
+                                bool has_checkpoint_delay,
+                                int64_t checkpoint_delay,
+                                Error **errp)
 {
     MigrationState *s = migrate_get_current();
 
@@ -771,6 +783,11 @@ void qmp_migrate_set_parameters(bool has_compress_level,
                    "x_cpu_throttle_increment",
                    "an integer in the range of 1 to 99");
     }
+    if (has_checkpoint_delay && (checkpoint_delay < 0)) {
+        error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
+                    "checkpoint_delay",
+                    "a non-negative integer value");
+    }
 
     if (has_compress_level) {
         s->parameters[MIGRATION_PARAMETER_COMPRESS_LEVEL] = compress_level;
@@ -791,6 +808,10 @@ void qmp_migrate_set_parameters(bool has_compress_level,
         s->parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT] =
                                                     x_cpu_throttle_increment;
     }
+
+    if (has_checkpoint_delay) {
+        s->parameters[MIGRATION_PARAMETER_CHECKPOINT_DELAY] = checkpoint_delay;
+    }
 }
 
 void qmp_migrate_start_postcopy(Error **errp)
diff --git a/qapi-schema.json b/qapi-schema.json
index 5ee089d..b4b47ed 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -625,11 +625,16 @@
 # @x-cpu-throttle-increment: throttle percentage increase each time
 #                            auto-converge detects that migration is not making
 #                            progress. The default value is 10. (Since 2.5)
+#
+# @checkpoint-delay: The delay time (in ms) between two COLO checkpoints in
+#          periodic mode. (Since 2.6)
+#
 # Since: 2.4
 ##
 { 'enum': 'MigrationParameter',
   'data': ['compress-level', 'compress-threads', 'decompress-threads',
-           'x-cpu-throttle-initial', 'x-cpu-throttle-increment'] }
+           'x-cpu-throttle-initial', 'x-cpu-throttle-increment',
+           'checkpoint-delay' ] }
 
 #
 # @migrate-set-parameters
@@ -649,6 +654,9 @@
 # @x-cpu-throttle-increment: throttle percentage increase each time
 #                            auto-converge detects that migration is not making
 #                            progress. The default value is 10. (Since 2.5)
+#
+# @checkpoint-delay: the delay time (in ms) between two COLO checkpoints. (Since 2.6)
+#
 # Since: 2.4
 ##
 { 'command': 'migrate-set-parameters',
@@ -656,7 +664,8 @@
             '*compress-threads': 'int',
             '*decompress-threads': 'int',
             '*x-cpu-throttle-initial': 'int',
-            '*x-cpu-throttle-increment': 'int'} }
+            '*x-cpu-throttle-increment': 'int',
+            '*checkpoint-delay': 'int' } }
 
 #
 # @MigrationParameters
@@ -675,6 +684,8 @@
 #                            auto-converge detects that migration is not making
 #                            progress. The default value is 10. (Since 2.5)
 #
+# @checkpoint-delay: the delay time (in ms) between two COLO checkpoints. (Since 2.6)
+#
 # Since: 2.4
 ##
 { 'struct': 'MigrationParameters',
@@ -682,7 +693,9 @@
             'compress-threads': 'int',
             'decompress-threads': 'int',
             'x-cpu-throttle-initial': 'int',
-            'x-cpu-throttle-increment': 'int'} }
+            'x-cpu-throttle-increment': 'int',
+            'checkpoint-delay': 'int'} }
+
 ##
 # @query-migrate-parameters
 #
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 92042d8..c39f2f0 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -3664,7 +3664,7 @@ EQMP
     {
         .name       = "migrate-set-parameters",
         .args_type  =
-            "compress-level:i?,compress-threads:i?,decompress-threads:i?",
+            "compress-level:i?,compress-threads:i?,decompress-threads:i?,checkpoint-delay:i?",
         .mhandler.cmd_new = qmp_marshal_migrate_set_parameters,
     },
 SQMP
-- 
1.8.3.1

* [Qemu-devel] [PATCH COLO-Frame v11 20/39] COLO: synchronize PVM's state to SVM periodically
  2015-11-24  9:25 [Qemu-devel] [PATCH COLO-Frame v11 00/39] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (18 preceding siblings ...)
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 19/39] COLO: Add checkpoint-delay parameter for migrate-set-parameters zhanghailiang
@ 2015-11-24  9:25 ` zhanghailiang
  2015-12-09 18:53   ` Dr. David Alan Gilbert
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 21/39] COLO failover: Introduce a new command to trigger a failover zhanghailiang
                   ` (18 subsequent siblings)
  38 siblings, 1 reply; 95+ messages in thread
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

Do checkpoints periodically; the default interval is 200ms.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
v11:
- Fix wrong sleep time for checkpoint period. (Dave's review comment)
---
 migration/colo.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index 4095d97..5b7ff87 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -11,6 +11,7 @@
  */
 
 #include <unistd.h>
+#include "qemu/timer.h"
 #include "sysemu/sysemu.h"
 #include "migration/colo.h"
 #include "trace.h"
@@ -183,6 +184,7 @@ out:
 static void colo_process_checkpoint(MigrationState *s)
 {
     QEMUSizedBuffer *buffer = NULL;
+    int64_t current_time, checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
     int ret = 0;
     uint64_t value;
 
@@ -216,11 +218,21 @@ static void colo_process_checkpoint(MigrationState *s)
     trace_colo_vm_state_change("stop", "run");
 
     while (s->state == MIGRATION_STATUS_COLO) {
+        current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+        if (current_time - checkpoint_time <
+            s->parameters[MIGRATION_PARAMETER_CHECKPOINT_DELAY]) {
+            int64_t delay_ms;
+
+            delay_ms = s->parameters[MIGRATION_PARAMETER_CHECKPOINT_DELAY] -
+                       (current_time - checkpoint_time);
+            g_usleep(delay_ms * 1000);
+        }
         /* start a colo checkpoint */
         ret = colo_do_checkpoint_transaction(s, buffer);
         if (ret < 0) {
             goto out;
         }
+        checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
     }
 
 out:
-- 
1.8.3.1

* [Qemu-devel] [PATCH COLO-Frame v11 21/39] COLO failover: Introduce a new command to trigger a failover
  2015-11-24  9:25 [Qemu-devel] [PATCH COLO-Frame v11 00/39] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (19 preceding siblings ...)
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 20/39] COLO: synchronize PVM's state to SVM periodically zhanghailiang
@ 2015-11-24  9:25 ` zhanghailiang
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 22/39] COLO failover: Introduce state to record failover process zhanghailiang
                   ` (17 subsequent siblings)
  38 siblings, 0 replies; 95+ messages in thread
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, Markus Armbruster, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, Luiz Capitulino,
	hongyang.yang

We leave users to choose whatever heartbeat solution they want; if the heartbeat
is lost, or they detect other errors, they can use the experimental command
'x_colo_lost_heartbeat' to tell COLO to do a failover, and COLO will act
accordingly.

For example, if the command is sent to the PVM, the Primary side will
exit COLO mode and take over operation. If sent to the Secondary, the
Secondary will run the failover work and then take over server operation to
become the new Primary.

Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
v11:
- Add more comments for x-colo-lost-heartbeat command (Eric's suggestion)
- Return 'enum' instead of 'int' for get_colo_mode() (Eric's suggestion)
v10:
- Rename command colo_lost_hearbeat to experimental 'x_colo_lost_heartbeat'
---
 hmp-commands.hx              | 15 +++++++++++++++
 hmp.c                        |  8 ++++++++
 hmp.h                        |  1 +
 include/migration/colo.h     |  3 +++
 include/migration/failover.h | 20 ++++++++++++++++++++
 migration/Makefile.objs      |  2 +-
 migration/colo-comm.c        | 11 +++++++++++
 migration/colo-failover.c    | 41 +++++++++++++++++++++++++++++++++++++++++
 migration/colo.c             |  1 +
 qapi-schema.json             | 29 +++++++++++++++++++++++++++++
 qmp-commands.hx              | 19 +++++++++++++++++++
 stubs/migration-colo.c       |  8 ++++++++
 12 files changed, 157 insertions(+), 1 deletion(-)
 create mode 100644 include/migration/failover.h
 create mode 100644 migration/colo-failover.c

diff --git a/hmp-commands.hx b/hmp-commands.hx
index bb52e4d..a381b0b 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1039,6 +1039,21 @@ migration (or once already in postcopy).
 ETEXI
 
     {
+        .name       = "x_colo_lost_heartbeat",
+        .args_type  = "",
+        .params     = "",
+        .help       = "Tell COLO that heartbeat is lost,\n\t\t\t"
+                      "a failover or takeover is needed.",
+        .mhandler.cmd = hmp_x_colo_lost_heartbeat,
+    },
+
+STEXI
+@item x_colo_lost_heartbeat
+@findex x_colo_lost_heartbeat
+Tell COLO that heartbeat is lost, a failover or takeover is needed.
+ETEXI
+
+    {
         .name       = "client_migrate_info",
         .args_type  = "protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?",
         .params     = "protocol hostname port tls-port cert-subject",
diff --git a/hmp.c b/hmp.c
index ec038f1..ce8a23c 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1310,6 +1310,14 @@ void hmp_migrate_start_postcopy(Monitor *mon, const QDict *qdict)
     hmp_handle_error(mon, &err);
 }
 
+void hmp_x_colo_lost_heartbeat(Monitor *mon, const QDict *qdict)
+{
+    Error *err = NULL;
+
+    qmp_x_colo_lost_heartbeat(&err);
+    hmp_handle_error(mon, &err);
+}
+
 void hmp_set_password(Monitor *mon, const QDict *qdict)
 {
     const char *protocol  = qdict_get_str(qdict, "protocol");
diff --git a/hmp.h b/hmp.h
index a8c5b5a..864a300 100644
--- a/hmp.h
+++ b/hmp.h
@@ -70,6 +70,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_cache_size(Monitor *mon, const QDict *qdict);
 void hmp_client_migrate_info(Monitor *mon, const QDict *qdict);
 void hmp_migrate_start_postcopy(Monitor *mon, const QDict *qdict);
+void hmp_x_colo_lost_heartbeat(Monitor *mon, const QDict *qdict);
 void hmp_set_password(Monitor *mon, const QDict *qdict);
 void hmp_expire_password(Monitor *mon, const QDict *qdict);
 void hmp_eject(Monitor *mon, const QDict *qdict);
diff --git a/include/migration/colo.h b/include/migration/colo.h
index 2676c4a..ba27719 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -17,6 +17,7 @@
 #include "migration/migration.h"
 #include "qemu/coroutine_int.h"
 #include "qemu/thread.h"
+#include "qemu/main-loop.h"
 
 bool colo_supported(void);
 void colo_info_mig_init(void);
@@ -29,4 +30,6 @@ bool migration_incoming_enable_colo(void);
 void migration_incoming_exit_colo(void);
 void *colo_process_incoming_thread(void *opaque);
 bool migration_incoming_in_colo_state(void);
+
+COLOMode get_colo_mode(void);
 #endif
diff --git a/include/migration/failover.h b/include/migration/failover.h
new file mode 100644
index 0000000..1785b52
--- /dev/null
+++ b/include/migration/failover.h
@@ -0,0 +1,20 @@
+/*
+ *  COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ *  (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_FAILOVER_H
+#define QEMU_FAILOVER_H
+
+#include "qemu-common.h"
+
+void failover_request_active(Error **errp);
+
+#endif
diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index 81b5713..920d1e7 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -1,6 +1,6 @@
 common-obj-y += migration.o tcp.o
-common-obj-$(CONFIG_COLO) += colo.o
 common-obj-y += colo-comm.o
+common-obj-$(CONFIG_COLO) += colo.o colo-failover.o
 common-obj-y += vmstate.o
 common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o qemu-file-stdio.o
 common-obj-y += xbzrle.o postcopy-ram.o
diff --git a/migration/colo-comm.c b/migration/colo-comm.c
index 30df3d3..58a6488 100644
--- a/migration/colo-comm.c
+++ b/migration/colo-comm.c
@@ -20,6 +20,17 @@ typedef struct {
 
 static COLOInfo colo_info;
 
+COLOMode get_colo_mode(void)
+{
+    if (migration_in_colo_state()) {
+        return COLO_MODE_PRIMARY;
+    } else if (migration_incoming_in_colo_state()) {
+        return COLO_MODE_SECONDARY;
+    } else {
+        return COLO_MODE_UNKNOWN;
+    }
+}
+
 static void colo_info_pre_save(void *opaque)
 {
     COLOInfo *s = opaque;
diff --git a/migration/colo-failover.c b/migration/colo-failover.c
new file mode 100644
index 0000000..e3897c6
--- /dev/null
+++ b/migration/colo-failover.c
@@ -0,0 +1,41 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "migration/colo.h"
+#include "migration/failover.h"
+#include "qmp-commands.h"
+#include "qapi/qmp/qerror.h"
+
+static QEMUBH *failover_bh;
+
+static void colo_failover_bh(void *opaque)
+{
+    qemu_bh_delete(failover_bh);
+    failover_bh = NULL;
+    /*TODO: Do failover work */
+}
+
+void failover_request_active(Error **errp)
+{
+    failover_bh = qemu_bh_new(colo_failover_bh, NULL);
+    qemu_bh_schedule(failover_bh);
+}
+
+void qmp_x_colo_lost_heartbeat(Error **errp)
+{
+    if (get_colo_mode() == COLO_MODE_UNKNOWN) {
+        error_setg(errp, QERR_FEATURE_DISABLED, "colo");
+        return;
+    }
+
+    failover_request_active(errp);
+}
diff --git a/migration/colo.c b/migration/colo.c
index 5b7ff87..2c4fc1c 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -17,6 +17,7 @@
 #include "trace.h"
 #include "qemu/error-report.h"
 #include "qemu/sockets.h"
+#include "migration/failover.h"
 
 /* colo buffer */
 #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
diff --git a/qapi-schema.json b/qapi-schema.json
index b4b47ed..2ed63e0 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -765,6 +765,35 @@
             'checkpoint-reply', 'vmstate-send', 'vmstate-size',
             'vmstate-received', 'vmstate-loaded' ] }
 
+##
+# @COLOMode
+#
+# The colo mode
+#
+# @unknown: unknown mode
+#
+# @primary: master side
+#
+# @secondary: slave side
+#
+# Since: 2.6
+##
+{ 'enum': 'COLOMode',
+  'data': [ 'unknown', 'primary', 'secondary'] }
+
+##
+# @x-colo-lost-heartbeat
+#
+# Tell qemu that heartbeat is lost, request it to do takeover procedures.
+# If this command is sent to the PVM, the Primary side will exit COLO mode.
+# If sent to the Secondary, the Secondary side will run failover work,
+# then take over server operation to become the service VM.
+#
+# Since: 2.6
+##
+{ 'command': 'x-colo-lost-heartbeat' }
+
+##
 # @MouseInfo:
 #
 # Information about a mouse device.
diff --git a/qmp-commands.hx b/qmp-commands.hx
index c39f2f0..6e79511 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -805,6 +805,25 @@ Example:
 EQMP
 
     {
+        .name       = "x-colo-lost-heartbeat",
+        .args_type  = "",
+        .mhandler.cmd_new = qmp_marshal_x_colo_lost_heartbeat,
+    },
+
+SQMP
+x-colo-lost-heartbeat
+--------------------
+
+Tell COLO that heartbeat is lost, a failover or takeover is needed.
+
+Example:
+
+-> { "execute": "x-colo-lost-heartbeat" }
+<- { "return": {} }
+
+EQMP
+
+    {
         .name       = "client_migrate_info",
         .args_type  = "protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?",
         .params     = "protocol hostname port tls-port cert-subject",
diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
index c12516e..5028f63 100644
--- a/stubs/migration-colo.c
+++ b/stubs/migration-colo.c
@@ -11,6 +11,7 @@
  */
 
 #include "migration/colo.h"
+#include "qmp-commands.h"
 
 bool colo_supported(void)
 {
@@ -35,3 +36,10 @@ void *colo_process_incoming_thread(void *opaque)
 {
     return NULL;
 }
+
+void qmp_x_colo_lost_heartbeat(Error **errp)
+{
+    error_setg(errp, "COLO is not supported, please rerun configure"
+                     " with --enable-colo option in order to support"
+                     " COLO feature");
+}
-- 
1.8.3.1

* [Qemu-devel] [PATCH COLO-Frame v11 22/39] COLO failover: Introduce state to record failover process
  2015-11-24  9:25 [Qemu-devel] [PATCH COLO-Frame v11 00/39] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (20 preceding siblings ...)
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 21/39] COLO failover: Introduce a new command to trigger a failover zhanghailiang
@ 2015-11-24  9:25 ` zhanghailiang
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 23/39] COLO: Implement failover work for Primary VM zhanghailiang
                   ` (16 subsequent siblings)
  38 siblings, 0 replies; 95+ messages in thread
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

When handling failover, we do different things according to the stage of the
failover process, so here we introduce a global atomic variable to record the
status of failover.

We add four failover statuses to indicate the different stages of the failover
process. You should use the helpers to get and set the value.
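
A hedged sketch of how a caller is expected to use the helper (the
compare-and-swap returns the previous status, so a caller can detect that it
lost the race); this mirrors failover_request_active() in the diff below:

    /* Request a failover exactly once; a concurrent second request sees a
     * status other than NONE and backs off.
     */
    if (failover_set_state(FAILOVER_STATUS_NONE,
                           FAILOVER_STATUS_REQUEST) != FAILOVER_STATUS_NONE) {
        return;     /* a failover was already requested or is in progress */
    }
    /* ... schedule the failover bottom half ... */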

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v11:
- fix several typos found by Dave
- Add Reviewed-by tag
---
 include/migration/failover.h | 10 ++++++++++
 migration/colo-failover.c    | 37 +++++++++++++++++++++++++++++++++++++
 migration/colo.c             |  4 ++++
 trace-events                 |  1 +
 4 files changed, 52 insertions(+)

diff --git a/include/migration/failover.h b/include/migration/failover.h
index 1785b52..882c625 100644
--- a/include/migration/failover.h
+++ b/include/migration/failover.h
@@ -15,6 +15,16 @@
 
 #include "qemu-common.h"
 
+typedef enum COLOFailoverStatus {
+    FAILOVER_STATUS_NONE = 0,
+    FAILOVER_STATUS_REQUEST = 1, /* Request but not handled */
+    FAILOVER_STATUS_HANDLING = 2, /* In the process of handling failover */
+    FAILOVER_STATUS_COMPLETED = 3, /* Finish the failover process */
+} COLOFailoverStatus;
+
+void failover_init_state(void);
+int failover_set_state(int old_state, int new_state);
+int failover_get_state(void);
 void failover_request_active(Error **errp);
 
 #endif
diff --git a/migration/colo-failover.c b/migration/colo-failover.c
index e3897c6..1b1be24 100644
--- a/migration/colo-failover.c
+++ b/migration/colo-failover.c
@@ -14,22 +14,59 @@
 #include "migration/failover.h"
 #include "qmp-commands.h"
 #include "qapi/qmp/qerror.h"
+#include "qemu/error-report.h"
+#include "trace.h"
 
 static QEMUBH *failover_bh;
+static COLOFailoverStatus failover_state;
 
 static void colo_failover_bh(void *opaque)
 {
+    int old_state;
+
     qemu_bh_delete(failover_bh);
     failover_bh = NULL;
+    old_state = failover_set_state(FAILOVER_STATUS_REQUEST,
+                                   FAILOVER_STATUS_HANDLING);
+    if (old_state != FAILOVER_STATUS_REQUEST) {
+        error_report("Unkown error for failover, old_state=%d", old_state);
+        return;
+    }
     /*TODO: Do failover work */
 }
 
 void failover_request_active(Error **errp)
 {
+    if (failover_set_state(FAILOVER_STATUS_NONE, FAILOVER_STATUS_REQUEST)
+        != FAILOVER_STATUS_NONE) {
+        error_setg(errp, "COLO failover is already activated");
+        return;
+    }
     failover_bh = qemu_bh_new(colo_failover_bh, NULL);
     qemu_bh_schedule(failover_bh);
 }
 
+void failover_init_state(void)
+{
+    failover_state = FAILOVER_STATUS_NONE;
+}
+
+int failover_set_state(int old_state, int new_state)
+{
+    int old;
+
+    old = atomic_cmpxchg(&failover_state, old_state, new_state);
+    if (old == old_state) {
+        trace_colo_failover_set_state(new_state);
+    }
+    return old;
+}
+
+int failover_get_state(void)
+{
+    return atomic_read(&failover_state);
+}
+
 void qmp_x_colo_lost_heartbeat(Error **errp)
 {
     if (get_colo_mode() == COLO_MODE_UNKNOWN) {
diff --git a/migration/colo.c b/migration/colo.c
index 2c4fc1c..cedfc63 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -189,6 +189,8 @@ static void colo_process_checkpoint(MigrationState *s)
     int ret = 0;
     uint64_t value;
 
+    failover_init_state();
+
     s->rp_state.from_dst_file = qemu_file_get_return_path(s->to_dst_file);
     if (!s->rp_state.from_dst_file) {
         ret = -EINVAL;
@@ -307,6 +309,8 @@ void *colo_process_incoming_thread(void *opaque)
     migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
                       MIGRATION_STATUS_COLO);
 
+    failover_init_state();
+
     mis->to_src_file = qemu_file_get_return_path(mis->from_src_file);
     if (!mis->to_src_file) {
         ret = -EINVAL;
diff --git a/trace-events b/trace-events
index f158d2a..b80c1e0 100644
--- a/trace-events
+++ b/trace-events
@@ -1582,6 +1582,7 @@ postcopy_ram_incoming_cleanup_join(void) ""
 colo_vm_state_change(const char *old, const char *new) "Change '%s' => '%s'"
 colo_ctl_put(const char *msg, uint64_t value) "Send '%s' cmd, value: %" PRIu64""
 colo_ctl_get(const char *msg, uint64_t value) "Receive '%s' cmd, value: %" PRIu64""
+colo_failover_set_state(int new_state) "new state %d"
 
 # kvm-all.c
 kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
-- 
1.8.3.1

* [Qemu-devel] [PATCH COLO-Frame v11 23/39] COLO: Implement failover work for Primary VM
  2015-11-24  9:25 [Qemu-devel] [PATCH COLO-Frame v11 00/39] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (21 preceding siblings ...)
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 22/39] COLO failover: Introduce state to record failover process zhanghailiang
@ 2015-11-24  9:25 ` zhanghailiang
  2015-12-10 18:34   ` Dr. David Alan Gilbert
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 24/39] COLO: Implement failover work for Secondary VM zhanghailiang
                   ` (15 subsequent siblings)
  38 siblings, 1 reply; 95+ messages in thread
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

For the PVM, if there is a failover request from users,
the COLO thread will exit its loop while the failover BH does the
cleanup work and resumes the VM.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
v11:
- Don't call migration_end() in primary_vm_do_failover(),
 The cleanup work will be done in migration_thread().
- Remove vm_start() in primary_vm_do_failover() which also been done
  in migraiton_thread()
v10:
- Call migration_end() in primary_vm_do_failover()
---
 include/migration/colo.h     |  3 +++
 include/migration/failover.h |  1 +
 migration/colo-failover.c    |  7 +++++-
 migration/colo.c             | 56 ++++++++++++++++++++++++++++++++++++++++++--
 4 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/include/migration/colo.h b/include/migration/colo.h
index ba27719..0b02e95 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -32,4 +32,7 @@ void *colo_process_incoming_thread(void *opaque);
 bool migration_incoming_in_colo_state(void);
 
 COLOMode get_colo_mode(void);
+
+/* failover */
+void colo_do_failover(MigrationState *s);
 #endif
diff --git a/include/migration/failover.h b/include/migration/failover.h
index 882c625..fba3931 100644
--- a/include/migration/failover.h
+++ b/include/migration/failover.h
@@ -26,5 +26,6 @@ void failover_init_state(void);
 int failover_set_state(int old_state, int new_state);
 int failover_get_state(void);
 void failover_request_active(Error **errp);
+bool failover_request_is_active(void);
 
 #endif
diff --git a/migration/colo-failover.c b/migration/colo-failover.c
index 1b1be24..0c525da 100644
--- a/migration/colo-failover.c
+++ b/migration/colo-failover.c
@@ -32,7 +32,7 @@ static void colo_failover_bh(void *opaque)
         error_report("Unkown error for failover, old_state=%d", old_state);
         return;
     }
-    /*TODO: Do failover work */
+    colo_do_failover(NULL);
 }
 
 void failover_request_active(Error **errp)
@@ -67,6 +67,11 @@ int failover_get_state(void)
     return atomic_read(&failover_state);
 }
 
+bool failover_request_is_active(void)
+{
+    return ((failover_get_state() != FAILOVER_STATUS_NONE));
+}
+
 void qmp_x_colo_lost_heartbeat(Error **errp)
 {
     if (get_colo_mode() == COLO_MODE_UNKNOWN) {
diff --git a/migration/colo.c b/migration/colo.c
index cedfc63..7a42fc6 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -41,6 +41,42 @@ bool migration_incoming_in_colo_state(void)
     return mis && (mis->state == MIGRATION_STATUS_COLO);
 }
 
+static bool colo_runstate_is_stopped(void)
+{
+    return runstate_check(RUN_STATE_COLO) || !runstate_is_running();
+}
+
+static void primary_vm_do_failover(void)
+{
+    MigrationState *s = migrate_get_current();
+    int old_state;
+
+    if (s->state != MIGRATION_STATUS_FAILED) {
+        migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
+                          MIGRATION_STATUS_COMPLETED);
+    }
+
+    old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
+                                   FAILOVER_STATUS_COMPLETED);
+    if (old_state != FAILOVER_STATUS_HANDLING) {
+        error_report("Serious error while do failover for Primary VM,"
+                     "old_state: %d", old_state);
+        return;
+    }
+}
+
+void colo_do_failover(MigrationState *s)
+{
+    /* Make sure vm stopped while failover */
+    if (!colo_runstate_is_stopped()) {
+        vm_stop_force_state(RUN_STATE_COLO);
+    }
+
+    if (get_colo_mode() == COLO_MODE_PRIMARY) {
+        primary_vm_do_failover();
+    }
+}
+
 /* colo checkpoint control helper */
 static int colo_ctl_put(QEMUFile *f, uint32_t cmd, uint64_t value)
 {
@@ -122,9 +158,22 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
     }
 
     qemu_mutex_lock_iothread();
+    if (failover_request_is_active()) {
+        qemu_mutex_unlock_iothread();
+        ret = -1;
+        goto out;
+    }
     vm_stop_force_state(RUN_STATE_COLO);
     qemu_mutex_unlock_iothread();
     trace_colo_vm_state_change("run", "stop");
+    /*
+     * failover request bh could be called after
+     * vm_stop_force_state so we check failover_request_is_active() again.
+     */
+    if (failover_request_is_active()) {
+        ret = -1;
+        goto out;
+    }
 
     /* Disable block migration */
     s->params.blk = 0;
@@ -221,6 +270,11 @@ static void colo_process_checkpoint(MigrationState *s)
     trace_colo_vm_state_change("stop", "run");
 
     while (s->state == MIGRATION_STATUS_COLO) {
+        if (failover_request_is_active()) {
+            error_report("failover request");
+            goto out;
+        }
+
         current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
         if (current_time - checkpoint_time <
             s->parameters[MIGRATION_PARAMETER_CHECKPOINT_DELAY]) {
@@ -242,8 +296,6 @@ out:
     if (ret < 0) {
         error_report("%s: %s", __func__, strerror(-ret));
     }
-    migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
-                      MIGRATION_STATUS_COMPLETED);
 
     qsb_free(buffer);
     buffer = NULL;
-- 
1.8.3.1

* [Qemu-devel] [PATCH COLO-Frame v11 24/39] COLO: Implement failover work for Secondary VM
  2015-11-24  9:25 [Qemu-devel] [PATCH COLO-Frame v11 00/39] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (22 preceding siblings ...)
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 23/39] COLO: Implement failover work for Primary VM zhanghailiang
@ 2015-11-24  9:25 ` zhanghailiang
  2015-12-10 18:50   ` Dr. David Alan Gilbert
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 25/39] COLO: implement default failover treatment zhanghailiang
                   ` (14 subsequent siblings)
  38 siblings, 1 reply; 95+ messages in thread
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

If users require the SVM to take over work, the COLO incoming thread should
exit from its loop, while the failover BH helps it get back to the migration
incoming coroutine.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 migration/colo.c | 42 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 7a42fc6..f31e957 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -46,6 +46,33 @@ static bool colo_runstate_is_stopped(void)
     return runstate_check(RUN_STATE_COLO) || !runstate_is_running();
 }
 
+static void secondary_vm_do_failover(void)
+{
+    int old_state;
+    MigrationIncomingState *mis = migration_incoming_get_current();
+
+    migrate_set_state(&mis->state, MIGRATION_STATUS_COLO,
+                      MIGRATION_STATUS_COMPLETED);
+
+    if (!autostart) {
+        error_report("\"-S\" qemu option will be ignored in secondary side");
+        /* recover runstate to normal migration finish state */
+        autostart = true;
+    }
+
+    old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
+                                   FAILOVER_STATUS_COMPLETED);
+    if (old_state != FAILOVER_STATUS_HANDLING) {
+        error_report("Serious error while do failover for secondary VM,"
+                     "old_state: %d", old_state);
+        return;
+    }
+    /* For Secondary VM, jump to incoming co */
+    if (mis->migration_incoming_co) {
+        qemu_coroutine_enter(mis->migration_incoming_co, NULL);
+    }
+}
+
 static void primary_vm_do_failover(void)
 {
     MigrationState *s = migrate_get_current();
@@ -74,6 +101,8 @@ void colo_do_failover(MigrationState *s)
 
     if (get_colo_mode() == COLO_MODE_PRIMARY) {
         primary_vm_do_failover();
+    } else {
+        secondary_vm_do_failover();
     }
 }
 
@@ -404,6 +433,12 @@ void *colo_process_incoming_thread(void *opaque)
                 continue;
             }
         }
+
+        if (failover_request_is_active()) {
+            error_report("failover request");
+            goto out;
+        }
+
         /* FIXME: This is unnecessary for periodic checkpoint mode */
         ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_CHECKPOINT_REPLY, 0);
         if (ret < 0) {
@@ -473,10 +508,11 @@ out:
         qemu_fclose(fb);
     }
     qsb_free(buffer);
-
-    qemu_mutex_lock_iothread();
+    /* Here we can be sure that the failover BH holds the global lock and
+    * will join the colo incoming thread, so it is not necessary to take
+    * the lock again here, or there would be a deadlock.
+    */
     colo_release_ram_cache();
-    qemu_mutex_unlock_iothread();
 
     if (mis->to_src_file) {
         qemu_fclose(mis->to_src_file);
-- 
1.8.3.1

* [Qemu-devel] [PATCH COLO-Frame v11 25/39] COLO: implement default failover treatment
  2015-11-24  9:25 [Qemu-devel] [PATCH COLO-Frame v11 00/39] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (23 preceding siblings ...)
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 24/39] COLO: Implement failover work for Secondary VM zhanghailiang
@ 2015-11-24  9:25 ` zhanghailiang
  2015-12-10 19:01   ` Dr. David Alan Gilbert
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 26/39] qmp event: Add event notification for COLO error zhanghailiang
                   ` (13 subsequent siblings)
  38 siblings, 1 reply; 95+ messages in thread
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

If we detect some error in COLO, we will wait for some time,
hoping the users also detect it. If users don't issue a failover command,
we will go into the default failover procedure, in which the PVM takes over
the work while the SVM exits by default.
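
The default treatment boils down to polling for a user-issued failover for
DEFAULT_FAILOVER_DELAY ms and then falling back to the built-in decision; a
minimal sketch of that wait (the real code in the diff also reports errors):

    while (current_time - error_time <= DEFAULT_FAILOVER_DELAY) {
        if (failover_request_is_active()) {
            break;                              /* the user intervened */
        }
        usleep(100 * 1000);
        current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
    }
    if (!failover_request_is_active()) {
        failover_request_active(NULL);          /* PVM takes over by default */
    }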

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 migration/colo.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index f31e957..1e6d3dd 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -19,6 +19,14 @@
 #include "qemu/sockets.h"
 #include "migration/failover.h"
 
+/*
+ * The delay time before qemu begins the default failover treatment.
+ * Unit: ms
+ * Fix me: This value should be changeable via the command
+ * 'migrate-set-parameters'
+ */
+#define DEFAULT_FAILOVER_DELAY 2000
+
 /* colo buffer */
 #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
 
@@ -264,6 +272,7 @@ static void colo_process_checkpoint(MigrationState *s)
 {
     QEMUSizedBuffer *buffer = NULL;
     int64_t current_time, checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+    int64_t error_time;
     int ret = 0;
     uint64_t value;
 
@@ -322,8 +331,25 @@ static void colo_process_checkpoint(MigrationState *s)
     }
 
 out:
+    current_time = error_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
     if (ret < 0) {
         error_report("%s: %s", __func__, strerror(-ret));
+        /* Give users time to get involved in this verdict */
+        while (current_time - error_time <= DEFAULT_FAILOVER_DELAY) {
+            if (failover_request_is_active()) {
+                error_report("Primary VM will take over work");
+                break;
+            }
+            usleep(100 * 1000);
+            current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+        }
+
+        qemu_mutex_lock_iothread();
+        if (!failover_request_is_active()) {
+            error_report("Primary VM will take over work in default");
+            failover_request_active(NULL);
+        }
+        qemu_mutex_unlock_iothread();
     }
 
     qsb_free(buffer);
@@ -384,6 +410,7 @@ void *colo_process_incoming_thread(void *opaque)
     QEMUFile *fb = NULL;
     QEMUSizedBuffer *buffer = NULL; /* Cache incoming device state */
     uint64_t  total_size;
+    int64_t error_time, current_time;
     int ret = 0;
     uint64_t value;
 
@@ -499,9 +526,28 @@ void *colo_process_incoming_thread(void *opaque)
     }
 
 out:
+    current_time = error_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
     if (ret < 0) {
         error_report("colo incoming thread will exit, detect error: %s",
                      strerror(-ret));
+        /* Give users time to get involved in this verdict */
+        while (current_time - error_time <= DEFAULT_FAILOVER_DELAY) {
+            if (failover_request_is_active()) {
+                error_report("Secondary VM will take over work");
+                break;
+            }
+            usleep(100 * 1000);
+            current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+        }
+        /* check the flag again */
+        if (!failover_request_is_active()) {
+            /*
+            * We assume that Primary VM is still alive according to
+            * heartbeat, just kill Secondary VM
+            */
+            error_report("SVM is going to exit in default!");
+            exit(1);
+        }
     }
 
     if (fb) {
-- 
1.8.3.1

* [Qemu-devel] [PATCH COLO-Frame v11 26/39] qmp event: Add event notification for COLO error
  2015-11-24  9:25 [Qemu-devel] [PATCH COLO-Frame v11 00/39] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (24 preceding siblings ...)
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 25/39] COLO: implement default failover treatment zhanghailiang
@ 2015-11-24  9:25 ` zhanghailiang
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 27/39] COLO failover: Shutdown related socket fd when do failover zhanghailiang
                   ` (12 subsequent siblings)
  38 siblings, 0 replies; 95+ messages in thread
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, Markus Armbruster, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, Michael Roth, hongyang.yang

If some errors happen during VM's COLO FT stage, it's important to notify the users
of this event. Together with 'colo_lost_heartbeat', users can intervene in COLO's
failover work immediately.
If users don't want to get involved in COLO's failover verdict,
it is still necessary to notify users that we exited COLO mode.

Cc: Markus Armbruster <armbru@redhat.com>
Cc: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
v11:
- Fix several typos found by Eric
---
 docs/qmp-events.txt | 17 +++++++++++++++++
 migration/colo.c    | 13 +++++++++++++
 qapi-schema.json    | 16 ++++++++++++++++
 qapi/event.json     | 17 +++++++++++++++++
 4 files changed, 63 insertions(+)

diff --git a/docs/qmp-events.txt b/docs/qmp-events.txt
index d2f1ce4..19f68fc 100644
--- a/docs/qmp-events.txt
+++ b/docs/qmp-events.txt
@@ -184,6 +184,23 @@ Example:
 Note: The "ready to complete" status is always reset by a BLOCK_JOB_ERROR
 event.
 
+COLO_EXIT
+---------
+
+Emitted when VM finishes COLO mode due to some errors happening or
+at the request of users.
+
+Data:
+
+ - "mode": COLO mode, primary or secondary side (json-string)
+ - "reason":  the exit reason, internal error or external request. (json-string)
+ - "error": error message (json-string, operation)
+
+Example:
+
+{"timestamp": {"seconds": 2032141960, "microseconds": 417172},
+ "event": "COLO_EXIT", "data": {"mode": "primary", "reason": "request" } }
+
 DEVICE_DELETED
 --------------
 
diff --git a/migration/colo.c b/migration/colo.c
index 1e6d3dd..4cd7b00 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -18,6 +18,7 @@
 #include "qemu/error-report.h"
 #include "qemu/sockets.h"
 #include "migration/failover.h"
+#include "qapi-event.h"
 
 /*
  * The delay time before qemu begin the procedure of default failover treatment.
@@ -334,6 +335,9 @@ out:
     current_time = error_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
     if (ret < 0) {
         error_report("%s: %s", __func__, strerror(-ret));
+        qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_ERROR,
+                                  true, strerror(-ret), NULL);
+
         /* Give users time to get involved in this verdict */
         while (current_time - error_time <= DEFAULT_FAILOVER_DELAY) {
             if (failover_request_is_active()) {
@@ -350,6 +354,9 @@ out:
             failover_request_active(NULL);
         }
         qemu_mutex_unlock_iothread();
+    } else {
+        qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_REQUEST,
+                                  false, NULL, NULL);
     }
 
     qsb_free(buffer);
@@ -530,6 +537,9 @@ out:
     if (ret < 0) {
         error_report("colo incoming thread will exit, detect error: %s",
                      strerror(-ret));
+        qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_ERROR,
+                                  true, strerror(-ret), NULL);
+
         /* Give users time to get involved in this verdict */
         while (current_time - error_time <= DEFAULT_FAILOVER_DELAY) {
             if (failover_request_is_active()) {
@@ -548,6 +558,9 @@ out:
             error_report("SVM is going to exit in default!");
             exit(1);
         }
+    } else {
+        qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_REQUEST,
+                                  false, NULL, NULL);
     }
 
     if (fb) {
diff --git a/qapi-schema.json b/qapi-schema.json
index 2ed63e0..981e9b2 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -782,6 +782,22 @@
   'data': [ 'unknown', 'primary', 'secondary'] }
 
 ##
+# @COLOExitReason
+#
+# The reason for a COLO exit
+#
+# @unknown: unknown reason
+#
+# @request: COLO exit is due to an external request
+#
+# @error: COLO exit is due to an internal error
+#
+# Since: 2.6
+##
+{ 'enum': 'COLOExitReason',
+  'data': [ 'unknown', 'request', 'error'] }
+
+##
 # @x-colo-lost-heartbeat
 #
 # Tell qemu that heartbeat is lost, request it to do takeover procedures.
diff --git a/qapi/event.json b/qapi/event.json
index f0cef01..f63d456 100644
--- a/qapi/event.json
+++ b/qapi/event.json
@@ -255,6 +255,23 @@
   'data': {'status': 'MigrationStatus'}}
 
 ##
+# @COLO_EXIT
+#
+# Emitted when the VM leaves COLO mode, either because an error occurred
+# or at the request of the user.
+#
+# @mode: which COLO mode the VM was in when it exited.
+#
+# @reason: describes the reason for the COLO exit.
+#
+# @error: #optional error message, only present when the exit was caused by an error.
+#
+# Since: 2.6
+##
+{ 'event': 'COLO_EXIT',
+  'data': {'mode': 'COLOMode', 'reason': 'COLOExitReason', '*error': 'str' } }
+
+##
 # @ACPI_DEVICE_OST
 #
 # Emitted when guest executes ACPI _OST method.
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v11 27/39] COLO failover: Shutdown related socket fd when do failover
  2015-11-24  9:25 [Qemu-devel] [PATCH COLO-Frame v11 00/39] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (25 preceding siblings ...)
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 26/39] qmp event: Add event notification for COLO error zhanghailiang
@ 2015-11-24  9:25 ` zhanghailiang
  2015-12-10 20:03   ` Dr. David Alan Gilbert
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 28/39] COLO failover: Don't do failover during loading VM's state zhanghailiang
                   ` (11 subsequent siblings)
  38 siblings, 1 reply; 95+ messages in thread
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

If the network connection between COLO's two sides breaks while the colo/colo
incoming thread is blocked reading or writing a socket fd, the error will not be
detected until the connection times out, which can take a long time.

Here we shut down all the related socket file descriptors in the failover BH to
wake up the blocking operation. Besides, we must only close the corresponding
file descriptors after the failover BH has shut them down, or an error will
occur.
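
As a minimal standalone sketch of the mechanism this relies on (plain POSIX
sockets and pthreads, not QEMU code): shutdown() immediately wakes a thread
blocked in recv() on the same socket, while close() has to wait until that
thread has returned.

#include <pthread.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

static int fds[2];

/* Stands in for the colo/colo incoming thread blocked on the connection */
static void *reader(void *arg)
{
    char buf[16];
    ssize_t n = recv(fds[0], buf, sizeof(buf), 0); /* blocks here */
    printf("recv woke up, returned %zd\n", n);     /* 0 == EOF after shutdown */
    return NULL;
}

int main(void)
{
    pthread_t t;

    socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
    pthread_create(&t, NULL, reader, NULL);
    sleep(1);                    /* let the reader block in recv() */

    shutdown(fds[0], SHUT_RDWR); /* failover path: wake the blocked thread */
    pthread_join(t, NULL);       /* wait until the thread has returned ... */
    close(fds[0]);               /* ... and only then close the fds */
    close(fds[1]);
    return 0;
}

The patch does the same thing with qemu_file_shutdown() in the failover path
and keeps the qemu_fclose() calls behind the FAILOVER_STATUS_COMPLETED wait.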

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
v11:
- Only shutdown fd for once
---
 migration/colo.c | 31 +++++++++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 4cd7b00..994b80d 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -68,6 +68,14 @@ static void secondary_vm_do_failover(void)
         /* recover runstate to normal migration finish state */
         autostart = true;
     }
+    /*
+     * Make sure the COLO incoming thread does not block in recv();
+     * mis->from_src_file and mis->to_src_file use the same fd,
+     * so we only need to shut it down once.
+     */
+    if (mis->from_src_file) {
+        qemu_file_shutdown(mis->from_src_file);
+    }
 
     old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
                                    FAILOVER_STATUS_COMPLETED);
@@ -92,6 +100,15 @@ static void primary_vm_do_failover(void)
                           MIGRATION_STATUS_COMPLETED);
     }
 
+    /*
+     * Make sure the COLO thread does not block in recv();
+     * s->rp_state.from_dst_file and s->to_dst_file use the
+     * same fd, so we only need to shut it down once.
+     */
+    if (s->to_dst_file) {
+        qemu_file_shutdown(s->to_dst_file);
+    }
+
     old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
                                    FAILOVER_STATUS_COMPLETED);
     if (old_state != FAILOVER_STATUS_HANDLING) {
@@ -333,7 +350,7 @@ static void colo_process_checkpoint(MigrationState *s)
 
 out:
     current_time = error_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
-    if (ret < 0) {
+    if (ret < 0 || (!ret && !failover_request_is_active())) {
         error_report("%s: %s", __func__, strerror(-ret));
         qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_ERROR,
                                   true, strerror(-ret), NULL);
@@ -362,6 +379,11 @@ out:
     qsb_free(buffer);
     buffer = NULL;
 
+    /* Hopefully we will not loop here for too long */
+    while (failover_get_state() != FAILOVER_STATUS_COMPLETED) {
+        ;
+    }
+    /* Must be called after failover BH is completed */
     if (s->rp_state.from_dst_file) {
         qemu_fclose(s->rp_state.from_dst_file);
     }
@@ -534,7 +556,7 @@ void *colo_process_incoming_thread(void *opaque)
 
 out:
     current_time = error_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
-    if (ret < 0) {
+    if (ret < 0 || (!ret && !failover_request_is_active())) {
         error_report("colo incoming thread will exit, detect error: %s",
                      strerror(-ret));
         qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_ERROR,
@@ -573,6 +595,11 @@ out:
     */
     colo_release_ram_cache();
 
+    /* Hopefully we will not loop here for too long */
+    while (failover_get_state() != FAILOVER_STATUS_COMPLETED) {
+        ;
+    }
+    /* Must be called after failover BH is completed */
     if (mis->to_src_file) {
         qemu_fclose(mis->to_src_file);
     }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v11 28/39] COLO failover: Don't do failover during loading VM's state
  2015-11-24  9:25 [Qemu-devel] [PATCH COLO-Frame v11 00/39] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (26 preceding siblings ...)
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 27/39] COLO failover: Shutdown related socket fd when do failover zhanghailiang
@ 2015-11-24  9:25 ` zhanghailiang
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 29/39] COLO: Process shutdown command for VM in COLO state zhanghailiang
                   ` (10 subsequent siblings)
  38 siblings, 0 replies; 95+ messages in thread
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

We should not do failover work while the main thread is loading the
VM's state, otherwise it will destroy the consistency of the VM's memory and
device state.

Here we add a new failover status 'RELAUNCH', which means the failover
process should be relaunched.
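
A toy model of the new transition (standalone C11 atomics rather than QEMU's
atomic helpers; the enum values mirror the series, and it assumes
failover_set_state() has compare-and-swap semantics, which is how the patch
uses it): the state is only swapped if it still holds the expected value and
the previous value is returned, so the failover BH and the incoming thread can
never both claim the same transition.

#include <stdatomic.h>
#include <stdio.h>

typedef enum {
    FAILOVER_STATUS_NONE,
    FAILOVER_STATUS_REQUEST,
    FAILOVER_STATUS_HANDLING,
    FAILOVER_STATUS_COMPLETED,
    FAILOVER_STATUS_RELAUNCH,
} FailoverStatus;

static _Atomic FailoverStatus failover_state = FAILOVER_STATUS_NONE;

/* Swap to new_state only if the state is still old_state; return what it was */
static FailoverStatus failover_set_state(FailoverStatus old_state,
                                         FailoverStatus new_state)
{
    atomic_compare_exchange_strong(&failover_state, &old_state, new_state);
    return old_state; /* holds the previous value whether or not we won */
}

int main(void)
{
    /* failover BH has started handling a request ... */
    failover_state = FAILOVER_STATUS_HANDLING;

    /* ... but vmstate_loading is true, so postpone: HANDLING -> RELAUNCH */
    FailoverStatus old = failover_set_state(FAILOVER_STATUS_HANDLING,
                                            FAILOVER_STATUS_RELAUNCH);
    printf("transition %s (old state %d, now %d)\n",
           old == FAILOVER_STATUS_HANDLING ? "succeeded" : "lost the race",
           old, (int)failover_state);
    return 0;
}

After qemu_loadvm_state() finishes, the incoming thread checks for RELAUNCH,
resets it to NONE and re-arms the failover request, which is the
'NONE' -> 'COMPLETED' round trip mentioned in failover.h.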

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 include/migration/failover.h |  2 ++
 migration/colo.c             | 25 +++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/include/migration/failover.h b/include/migration/failover.h
index fba3931..e115d25 100644
--- a/include/migration/failover.h
+++ b/include/migration/failover.h
@@ -20,6 +20,8 @@ typedef enum COLOFailoverStatus {
     FAILOVER_STATUS_REQUEST = 1, /* Request but not handled */
     FAILOVER_STATUS_HANDLING = 2, /* In the process of handling failover */
     FAILOVER_STATUS_COMPLETED = 3, /* Finish the failover process */
+    /* Optional: relaunch the failover process, going 'NONE' -> 'COMPLETED' again */
+    FAILOVER_STATUS_RELAUNCH = 4,
 } COLOFailoverStatus;
 
 void failover_init_state(void);
diff --git a/migration/colo.c b/migration/colo.c
index 994b80d..e247d70 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -28,6 +28,8 @@
  */
 #define DEFAULT_FAILOVER_DELAY 2000
 
+static bool vmstate_loading;
+
 /* colo buffer */
 #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
 
@@ -60,6 +62,19 @@ static void secondary_vm_do_failover(void)
     int old_state;
     MigrationIncomingState *mis = migration_incoming_get_current();
 
+    /*
+     * We cannot do failover while the VM's state is being loaded,
+     * or it will break the secondary VM.
+     */
+    if (vmstate_loading) {
+        old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
+                                       FAILOVER_STATUS_RELAUNCH);
+        if (old_state != FAILOVER_STATUS_HANDLING) {
+            error_report("Unknow error while do failover for secondary VM,"
+                         "old_state: %d", old_state);
+        }
+        return;
+    }
+
     migrate_set_state(&mis->state, MIGRATION_STATUS_COLO,
                       MIGRATION_STATUS_COMPLETED);
 
@@ -538,13 +553,23 @@ void *colo_process_incoming_thread(void *opaque)
 
         qemu_mutex_lock_iothread();
         qemu_system_reset(VMRESET_SILENT);
+        vmstate_loading = true;
         if (qemu_loadvm_state(fb) < 0) {
             error_report("COLO: loadvm failed");
+            vmstate_loading = false;
             qemu_mutex_unlock_iothread();
             goto out;
         }
+
+        vmstate_loading = false;
         qemu_mutex_unlock_iothread();
 
+        if (failover_get_state() == FAILOVER_STATUS_RELAUNCH) {
+            failover_set_state(FAILOVER_STATUS_RELAUNCH, FAILOVER_STATUS_NONE);
+            failover_request_active(NULL);
+            goto out;
+        }
+
         ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_VMSTATE_LOADED, 0);
         if (ret < 0) {
             goto out;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v11 29/39] COLO: Process shutdown command for VM in COLO state
  2015-11-24  9:25 [Qemu-devel] [PATCH COLO-Frame v11 00/39] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (27 preceding siblings ...)
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 28/39] COLO failover: Don't do failover during loading VM's state zhanghailiang
@ 2015-11-24  9:25 ` zhanghailiang
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 30/39] COLO: Update the global runstate after going into colo state zhanghailiang
                   ` (9 subsequent siblings)
  38 siblings, 0 replies; 95+ messages in thread
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	Paolo Bonzini, hongyang.yang

If the VM is in COLO FT state, we should do some extra work before the normal
shutdown process. The SVM will ignore a shutdown command issued directly to it;
the PVM will forward the shutdown command to the SVM when it receives one.
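
A standalone model of the routing decision (plain C, not the vl.c code; the
flag names just mirror the patch): once COLO is running, a shutdown request is
either ignored on the secondary, deferred to the next checkpoint on the
primary, or handled immediately when COLO is not involved.

#include <stdbool.h>
#include <stdio.h>

static bool in_colo_primary;
static bool in_colo_secondary;
static int colo_shutdown_requested;
static int shutdown_requested;

static void shutdown_request(void)
{
    if (in_colo_secondary) {
        return;                      /* primary's responsibility */
    }
    if (in_colo_primary) {
        colo_shutdown_requested = 1; /* handled at the next checkpoint */
        return;
    }
    shutdown_requested = 1;          /* normal, immediate shutdown path */
}

int main(void)
{
    in_colo_primary = true;
    shutdown_request();
    printf("deferred to checkpoint: %d, immediate: %d\n",
           colo_shutdown_requested, shutdown_requested);
    return 0;
}

On the primary, colo_do_checkpoint_transaction() then notices the pending
request, sends COLO_COMMAND_GUEST_SHUTDOWN to the secondary and only
afterwards triggers the core shutdown path.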

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 include/sysemu/sysemu.h |  3 +++
 migration/colo.c        | 25 +++++++++++++++++++++++--
 qapi-schema.json        |  4 +++-
 vl.c                    | 26 ++++++++++++++++++++++++--
 4 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 3bb8897..91eeda3 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -52,6 +52,8 @@ typedef enum WakeupReason {
     QEMU_WAKEUP_REASON_OTHER,
 } WakeupReason;
 
+extern int colo_shutdown_requested;
+
 void qemu_system_reset_request(void);
 void qemu_system_suspend_request(void);
 void qemu_register_suspend_notifier(Notifier *notifier);
@@ -59,6 +61,7 @@ void qemu_system_wakeup_request(WakeupReason reason);
 void qemu_system_wakeup_enable(WakeupReason reason, bool enabled);
 void qemu_register_wakeup_notifier(Notifier *notifier);
 void qemu_system_shutdown_request(void);
+void qemu_system_shutdown_request_core(void);
 void qemu_system_powerdown_request(void);
 void qemu_register_powerdown_notifier(Notifier *notifier);
 void qemu_system_debug_request(void);
diff --git a/migration/colo.c b/migration/colo.c
index e247d70..771ff42 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -206,6 +206,7 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
 {
     int ret;
     uint64_t value;
+    int colo_shutdown;
     size_t size;
     QEMUFile *trans = NULL;
 
@@ -233,6 +234,7 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
         ret = -1;
         goto out;
     }
+    colo_shutdown = colo_shutdown_requested;
     vm_stop_force_state(RUN_STATE_COLO);
     qemu_mutex_unlock_iothread();
     trace_colo_vm_state_change("run", "stop");
@@ -286,6 +288,15 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
         goto out;
     }
 
+    if (colo_shutdown) {
+        colo_ctl_put(s->to_dst_file, COLO_COMMAND_GUEST_SHUTDOWN, 0);
+        qemu_fflush(s->to_dst_file);
+        colo_shutdown_requested = 0;
+        qemu_system_shutdown_request_core();
+        /* Fix me: Just let the colo thread exit ? */
+        qemu_thread_exit(0);
+    }
+
     ret = 0;
     /* Resume primary guest */
     qemu_mutex_lock_iothread();
@@ -347,8 +358,9 @@ static void colo_process_checkpoint(MigrationState *s)
         }
 
         current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
-        if (current_time - checkpoint_time <
-            s->parameters[MIGRATION_PARAMETER_CHECKPOINT_DELAY]) {
+        if ((current_time - checkpoint_time <
+            s->parameters[MIGRATION_PARAMETER_CHECKPOINT_DELAY]) &&
+            !colo_shutdown_requested) {
             int64_t delay_ms;
 
             delay_ms = s->parameters[MIGRATION_PARAMETER_CHECKPOINT_DELAY] -
@@ -443,6 +455,15 @@ static int colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request)
     case COLO_COMMAND_CHECKPOINT_REQUEST:
         *checkpoint_request = 1;
         return 0;
+    case COLO_COMMAND_GUEST_SHUTDOWN:
+        qemu_mutex_lock_iothread();
+        vm_stop_force_state(RUN_STATE_COLO);
+        qemu_system_shutdown_request_core();
+        qemu_mutex_unlock_iothread();
+        /*
+         * The main thread will exit and terminate the whole
+         * process; do we need some cleanup here?
+         */
+        qemu_thread_exit(0);
     default:
         return -EINVAL;
     }
diff --git a/qapi-schema.json b/qapi-schema.json
index 981e9b2..cabca1e 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -758,12 +758,14 @@
 #
 # @vmstate-loaded: VM's state has been loaded by SVM
 #
+# @guest-shutdown: shutdown request sent from the PVM to the SVM
+#
 # Since: 2.6
 ##
 { 'enum': 'COLOCommand',
   'data': [ 'invalid', 'checkpoint-ready', 'checkpoint-request',
             'checkpoint-reply', 'vmstate-send', 'vmstate-size',
-            'vmstate-received', 'vmstate-loaded' ] }
+            'vmstate-received', 'vmstate-loaded', 'guest-shutdown' ] }
 
 ##
 # @COLOMode
diff --git a/vl.c b/vl.c
index 083a60c..8dc34ce 100644
--- a/vl.c
+++ b/vl.c
@@ -1636,6 +1636,8 @@ static NotifierList wakeup_notifiers =
     NOTIFIER_LIST_INITIALIZER(wakeup_notifiers);
 static uint32_t wakeup_reason_mask = ~(1 << QEMU_WAKEUP_REASON_NONE);
 
+int colo_shutdown_requested;
+
 int qemu_shutdown_requested_get(void)
 {
     return shutdown_requested;
@@ -1767,6 +1769,10 @@ void qemu_system_guest_panicked(void)
 void qemu_system_reset_request(void)
 {
     if (no_reboot) {
+        qemu_system_shutdown_request();
+        if (!shutdown_requested) { /* COLO handled it? */
+            return;
+        }
         shutdown_requested = 1;
     } else {
         reset_requested = 1;
@@ -1840,14 +1846,30 @@ void qemu_system_killed(int signal, pid_t pid)
     qemu_notify_event();
 }
 
-void qemu_system_shutdown_request(void)
+void qemu_system_shutdown_request_core(void)
 {
-    trace_qemu_system_shutdown_request();
     replay_shutdown_request();
     shutdown_requested = 1;
     qemu_notify_event();
 }
 
+void qemu_system_shutdown_request(void)
+{
+    trace_qemu_system_shutdown_request();
+    /*
+     * If we are in COLO mode, we need to do some significant work before
+     * responding to the shutdown request.
+     */
+    if (migration_incoming_in_colo_state()) {
+        return; /* primary's responsibility */
+    }
+    if (migration_in_colo_state()) {
+        colo_shutdown_requested = 1;
+        return;
+    }
+    qemu_system_shutdown_request_core();
+}
+
 static void qemu_system_powerdown(void)
 {
     qapi_event_send_powerdown(&error_abort);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v11 30/39] COLO: Update the global runstate after going into colo state
  2015-11-24  9:25 [Qemu-devel] [PATCH COLO-Frame v11 00/39] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (28 preceding siblings ...)
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 29/39] COLO: Process shutdown command for VM in COLO state zhanghailiang
@ 2015-11-24  9:25 ` zhanghailiang
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 31/39] savevm: Split load vm state function qemu_loadvm_state zhanghailiang
                   ` (8 subsequent siblings)
  38 siblings, 0 replies; 95+ messages in thread
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

If we start qemu with -S, the runstate changes from 'prelaunch' to 'running'
after going into COLO state, so it is necessary to update the global runstate
at that point.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 migration/colo.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index 771ff42..3866e86 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -351,6 +351,11 @@ static void colo_process_checkpoint(MigrationState *s)
     qemu_mutex_unlock_iothread();
     trace_colo_vm_state_change("stop", "run");
 
+    ret = global_state_store();
+    if (ret < 0) {
+        goto out;
+    }
+
     while (s->state == MIGRATION_STATUS_COLO) {
         if (failover_request_is_active()) {
             error_report("failover request");
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v11 31/39] savevm: Split load vm state function qemu_loadvm_state
  2015-11-24  9:25 [Qemu-devel] [PATCH COLO-Frame v11 00/39] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (29 preceding siblings ...)
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 30/39] COLO: Update the global runstate after going into colo state zhanghailiang
@ 2015-11-24  9:25 ` zhanghailiang
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 32/39] COLO: Separate the process of saving/loading ram and device state zhanghailiang
                   ` (7 subsequent siblings)
  38 siblings, 0 replies; 95+ messages in thread
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

qemu_loadvm_state is too long; we can simplify it by splitting it up
into three helper functions.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 migration/savevm.c | 161 ++++++++++++++++++++++++++++++++---------------------
 1 file changed, 97 insertions(+), 64 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index f102870..c7c26d8 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1710,90 +1710,123 @@ void loadvm_free_handlers(MigrationIncomingState *mis)
     }
 }
 
+static int
+qemu_loadvm_section_start_full(QEMUFile *f, MigrationIncomingState *mis)
+{
+    uint32_t instance_id, version_id, section_id;
+    SaveStateEntry *se;
+    LoadStateEntry *le;
+    char idstr[256];
+    int ret;
+
+    /* Read section start */
+    section_id = qemu_get_be32(f);
+    if (!qemu_get_counted_string(f, idstr)) {
+        error_report("Unable to read ID string for section %u",
+                     section_id);
+        return -EINVAL;
+    }
+    instance_id = qemu_get_be32(f);
+    version_id = qemu_get_be32(f);
+
+    trace_qemu_loadvm_state_section_startfull(section_id, idstr,
+            instance_id, version_id);
+    /* Find savevm section */
+    se = find_se(idstr, instance_id);
+    if (se == NULL) {
+        error_report("Unknown savevm section or instance '%s' %d",
+                     idstr, instance_id);
+        ret = -EINVAL;
+        return ret;
+    }
+
+    /* Validate version */
+    if (version_id > se->version_id) {
+        error_report("savevm: unsupported version %d for '%s' v%d",
+                     version_id, idstr, se->version_id);
+        ret = -EINVAL;
+        return ret;
+    }
+
+    /* Add entry */
+    le = g_malloc0(sizeof(*le));
+
+    le->se = se;
+    le->section_id = section_id;
+    le->version_id = version_id;
+    QLIST_INSERT_HEAD(&mis->loadvm_handlers, le, entry);
+
+    ret = vmstate_load(f, le->se, le->version_id);
+    if (ret < 0) {
+        error_report("error while loading state for instance 0x%x of"
+                     " device '%s'", instance_id, idstr);
+        return ret;
+    }
+    if (!check_section_footer(f, le)) {
+        ret = -EINVAL;
+        return ret;
+    }
+
+    return 0;
+}
+
+static int
+qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis)
+{
+    uint32_t section_id;
+    LoadStateEntry *le;
+    int ret;
+
+    section_id = qemu_get_be32(f);
+
+    trace_qemu_loadvm_state_section_partend(section_id);
+    QLIST_FOREACH(le, &mis->loadvm_handlers, entry) {
+        if (le->section_id == section_id) {
+            break;
+        }
+    }
+    if (le == NULL) {
+        error_report("Unknown savevm section %d", section_id);
+        ret = -EINVAL;
+        return ret;
+    }
+
+    ret = vmstate_load(f, le->se, le->version_id);
+    if (ret < 0) {
+        error_report("error while loading state section id %d(%s)",
+                     section_id, le->se->idstr);
+        return ret;
+    }
+    if (!check_section_footer(f, le)) {
+        ret = -EINVAL;
+        return ret;
+    }
+
+    return 0;
+}
+
 static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
 {
     uint8_t section_type;
     int ret;
 
     while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
-        uint32_t instance_id, version_id, section_id;
-        SaveStateEntry *se;
-        LoadStateEntry *le;
-        char idstr[256];
 
         trace_qemu_loadvm_state_section(section_type);
         switch (section_type) {
         case QEMU_VM_SECTION_START:
         case QEMU_VM_SECTION_FULL:
-            /* Read section start */
-            section_id = qemu_get_be32(f);
-            if (!qemu_get_counted_string(f, idstr)) {
-                error_report("Unable to read ID string for section %u",
-                            section_id);
-                return -EINVAL;
-            }
-            instance_id = qemu_get_be32(f);
-            version_id = qemu_get_be32(f);
-
-            trace_qemu_loadvm_state_section_startfull(section_id, idstr,
-                                                      instance_id, version_id);
-            /* Find savevm section */
-            se = find_se(idstr, instance_id);
-            if (se == NULL) {
-                error_report("Unknown savevm section or instance '%s' %d",
-                             idstr, instance_id);
-                return -EINVAL;
-            }
-
-            /* Validate version */
-            if (version_id > se->version_id) {
-                error_report("savevm: unsupported version %d for '%s' v%d",
-                             version_id, idstr, se->version_id);
-                return -EINVAL;
-            }
-
-            /* Add entry */
-            le = g_malloc0(sizeof(*le));
-
-            le->se = se;
-            le->section_id = section_id;
-            le->version_id = version_id;
-            QLIST_INSERT_HEAD(&mis->loadvm_handlers, le, entry);
-
-            ret = vmstate_load(f, le->se, le->version_id);
+            ret = qemu_loadvm_section_start_full(f, mis);
             if (ret < 0) {
-                error_report("error while loading state for instance 0x%x of"
-                             " device '%s'", instance_id, idstr);
                 return ret;
             }
-            if (!check_section_footer(f, le)) {
-                return -EINVAL;
-            }
             break;
         case QEMU_VM_SECTION_PART:
         case QEMU_VM_SECTION_END:
-            section_id = qemu_get_be32(f);
-
-            trace_qemu_loadvm_state_section_partend(section_id);
-            QLIST_FOREACH(le, &mis->loadvm_handlers, entry) {
-                if (le->section_id == section_id) {
-                    break;
-                }
-            }
-            if (le == NULL) {
-                error_report("Unknown savevm section %d", section_id);
-                return -EINVAL;
-            }
-
-            ret = vmstate_load(f, le->se, le->version_id);
+            ret = qemu_loadvm_section_part_end(f, mis);
             if (ret < 0) {
-                error_report("error while loading state section id %d(%s)",
-                             section_id, le->se->idstr);
                 return ret;
             }
-            if (!check_section_footer(f, le)) {
-                return -EINVAL;
-            }
             break;
         case QEMU_VM_COMMAND:
             ret = loadvm_process_command(f);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v11 32/39] COLO: Separate the process of saving/loading ram and device state
  2015-11-24  9:25 [Qemu-devel] [PATCH COLO-Frame v11 00/39] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (30 preceding siblings ...)
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 31/39] savevm: Split load vm state function qemu_loadvm_state zhanghailiang
@ 2015-11-24  9:25 ` zhanghailiang
  2015-11-27  5:10   ` Li Zhijian
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 33/39] COLO: Split qemu_savevm_state_begin out of checkpoint process zhanghailiang
                   ` (6 subsequent siblings)
  38 siblings, 1 reply; 95+ messages in thread
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

We separate the process of saving/loading ram and device state when doing a
checkpoint, adding new helpers to save/load the ram and device state. With this
change, we can transfer ram directly from master to slave without using a
QEMUSizedBuffer as an intermediary, which also reduces the amount of extra
memory used during a checkpoint.

Besides, we move colo_flush_ram_cache to the proper position after the
above change.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
v11:
- Remove load configuration section in qemu_loadvm_state_begin()
---
 include/sysemu/sysemu.h |   6 +++
 migration/colo.c        |  43 ++++++++++++----
 migration/ram.c         |   5 --
 migration/savevm.c      | 132 ++++++++++++++++++++++++++++++++++++++++++++++--
 4 files changed, 168 insertions(+), 18 deletions(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 91eeda3..5deae53 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -133,7 +133,13 @@ void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
                                            uint64_t *start_list,
                                            uint64_t *length_list);
 
+int qemu_save_ram_precopy(QEMUFile *f);
+int qemu_save_device_state(QEMUFile *f);
+
 int qemu_loadvm_state(QEMUFile *f);
+int qemu_loadvm_state_begin(QEMUFile *f);
+int qemu_load_ram_state(QEMUFile *f);
+int qemu_load_device_state(QEMUFile *f);
 
 typedef enum DisplayType
 {
diff --git a/migration/colo.c b/migration/colo.c
index 3866e86..f7f349b 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -247,21 +247,32 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
         goto out;
     }
 
+    ret = colo_ctl_put(s->to_dst_file, COLO_COMMAND_VMSTATE_SEND, 0);
+    if (ret < 0) {
+        goto out;
+    }
     /* Disable block migration */
     s->params.blk = 0;
     s->params.shared = 0;
-    qemu_savevm_state_header(trans);
-    qemu_savevm_state_begin(trans, &s->params);
-    qemu_mutex_lock_iothread();
-    qemu_savevm_state_complete_precopy(trans, false);
-    qemu_mutex_unlock_iothread();
-
-    qemu_fflush(trans);
+    qemu_savevm_state_begin(s->to_dst_file, &s->params);
+    ret = qemu_file_get_error(s->to_dst_file);
+    if (ret < 0) {
+        error_report("save vm state begin error\n");
+        goto out;
+    }
 
-    ret = colo_ctl_put(s->to_dst_file, COLO_COMMAND_VMSTATE_SEND, 0);
+    qemu_mutex_lock_iothread();
+    /* Note: device state is saved into buffer */
+    ret = qemu_save_device_state(trans);
     if (ret < 0) {
+        error_report("save device state error\n");
+        qemu_mutex_unlock_iothread();
         goto out;
     }
+    qemu_fflush(trans);
+    qemu_save_ram_precopy(s->to_dst_file);
+    qemu_mutex_unlock_iothread();
+
     /* we send the total size of the vmstate first */
     size = qsb_get_length(buffer);
     ret = colo_ctl_put(s->to_dst_file, COLO_COMMAND_VMSTATE_SIZE, size);
@@ -548,6 +559,16 @@ void *colo_process_incoming_thread(void *opaque)
             goto out;
         }
 
+        ret = qemu_loadvm_state_begin(mis->from_src_file);
+        if (ret < 0) {
+            error_report("load vm state begin error, ret=%d", ret);
+            goto out;
+        }
+        ret = qemu_load_ram_state(mis->from_src_file);
+        if (ret < 0) {
+            error_report("load ram state error");
+            goto out;
+        }
         /* read the VM state total size first */
         ret = colo_ctl_get(mis->from_src_file,
                            COLO_COMMAND_VMSTATE_SIZE, &value);
@@ -580,8 +601,10 @@ void *colo_process_incoming_thread(void *opaque)
         qemu_mutex_lock_iothread();
         qemu_system_reset(VMRESET_SILENT);
         vmstate_loading = true;
-        if (qemu_loadvm_state(fb) < 0) {
-            error_report("COLO: loadvm failed");
+        colo_flush_ram_cache();
+        ret = qemu_load_device_state(fb);
+        if (ret < 0) {
+            error_report("COLO: load device state failed\n");
             vmstate_loading = false;
             qemu_mutex_unlock_iothread();
             goto out;
diff --git a/migration/ram.c b/migration/ram.c
index 4f37144..06a738b 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2448,7 +2448,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
      * be atomic
      */
     bool postcopy_running = postcopy_state_get() >= POSTCOPY_INCOMING_LISTENING;
-    bool need_flush = false;
 
     seq_iter++;
 
@@ -2483,7 +2482,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
             /* After going into COLO, we should load the Page into colo_cache */
             if (ram_cache_enable) {
                 host = colo_cache_from_block_offset(block, addr);
-                need_flush = true;
             } else {
                 host = host_from_ram_block_offset(block, addr);
             }
@@ -2578,9 +2576,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 
     rcu_read_unlock();
 
-    if (!ret  && ram_cache_enable && need_flush) {
-        colo_flush_ram_cache();
-    }
     DPRINTF("Completed load of VM with exit code %d seq iteration "
             "%" PRIu64 "\n", ret, seq_iter);
     return ret;
diff --git a/migration/savevm.c b/migration/savevm.c
index c7c26d8..949caf0 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -50,6 +50,7 @@
 #include "qemu/iov.h"
 #include "block/snapshot.h"
 #include "block/qapi.h"
+#include "migration/colo.h"
 
 
 #ifndef ETH_P_RARP
@@ -923,6 +924,10 @@ void qemu_savevm_state_begin(QEMUFile *f,
             break;
         }
     }
+    if (migration_in_colo_state()) {
+        qemu_put_byte(f, QEMU_VM_EOF);
+        qemu_fflush(f);
+    }
 }
 
 /*
@@ -1192,13 +1197,44 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
     return ret;
 }
 
-static int qemu_save_device_state(QEMUFile *f)
+int qemu_save_ram_precopy(QEMUFile *f)
 {
     SaveStateEntry *se;
+    int ret = 0;
 
-    qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
-    qemu_put_be32(f, QEMU_VM_FILE_VERSION);
+    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+        if (!se->ops || !se->ops->save_live_complete_precopy) {
+            continue;
+        }
+        if (se->ops && se->ops->is_active) {
+            if (!se->ops->is_active(se->opaque)) {
+                continue;
+            }
+        }
+        trace_savevm_section_start(se->idstr, se->section_id);
 
+        save_section_header(f, se, QEMU_VM_SECTION_END);
+
+        ret = se->ops->save_live_complete_precopy(f, se->opaque);
+        trace_savevm_section_end(se->idstr, se->section_id, ret);
+        save_section_footer(f, se);
+        if (ret < 0) {
+            qemu_file_set_error(f, ret);
+            return ret;
+        }
+    }
+    qemu_put_byte(f, QEMU_VM_EOF);
+
+    return 0;
+}
+
+int qemu_save_device_state(QEMUFile *f)
+{
+    SaveStateEntry *se;
+
+    if (!migration_in_colo_state()) {
+        qemu_savevm_state_header(f);
+    }
     cpu_synchronize_all_states();
 
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
@@ -1938,6 +1974,96 @@ int qemu_loadvm_state(QEMUFile *f)
     return ret;
 }
 
+int qemu_loadvm_state_begin(QEMUFile *f)
+{
+    uint8_t section_type;
+    int ret = -1;
+    MigrationIncomingState *mis = migration_incoming_get_current();
+
+    if (!mis) {
+        error_report("qemu_loadvm_state_begin");
+        return -EINVAL;
+    }
+    /* CleanUp */
+    loadvm_free_handlers(mis);
+
+    if (qemu_savevm_state_blocked(NULL)) {
+        return -EINVAL;
+    }
+
+    while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
+        if (section_type != QEMU_VM_SECTION_START) {
+            error_report("QEMU_VM_SECTION_START");
+            ret = -EINVAL;
+            goto out;
+        }
+        ret = qemu_loadvm_section_start_full(f, mis);
+        if (ret < 0) {
+            goto out;
+        }
+    }
+    ret = qemu_file_get_error(f);
+    if (ret == 0) {
+        return 0;
+    }
+out:
+    return ret;
+}
+
+int qemu_load_ram_state(QEMUFile *f)
+{
+    uint8_t section_type;
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    int ret = -1;
+
+    while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
+        if (section_type != QEMU_VM_SECTION_PART &&
+            section_type != QEMU_VM_SECTION_END) {
+            error_report("load ram state, not get "
+                         "QEMU_VM_SECTION_FULL or QEMU_VM_SECTION_END");
+            return -EINVAL;
+        }
+        ret = qemu_loadvm_section_part_end(f, mis);
+        if (ret < 0) {
+            goto out;
+        }
+    }
+    ret = qemu_file_get_error(f);
+    if (ret == 0) {
+        return 0;
+    }
+out:
+    return ret;
+}
+
+int qemu_load_device_state(QEMUFile *f)
+{
+    uint8_t section_type;
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    int ret = -1;
+
+    while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
+        if (section_type != QEMU_VM_SECTION_FULL) {
+            error_report("load device state error: "
+                         "Not get QEMU_VM_SECTION_FULL");
+            return -EINVAL;
+        }
+        ret = qemu_loadvm_section_start_full(f, mis);
+        if (ret < 0) {
+            goto out;
+        }
+    }
+
+    ret = qemu_file_get_error(f);
+
+    cpu_synchronize_all_post_init();
+    if (ret == 0) {
+        return 0;
+    }
+out:
+    return ret;
+}
+
 void hmp_savevm(Monitor *mon, const QDict *qdict)
 {
     BlockDriverState *bs, *bs1;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v11 33/39] COLO: Split qemu_savevm_state_begin out of checkpoint process
  2015-11-24  9:25 [Qemu-devel] [PATCH COLO-Frame v11 00/39] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (31 preceding siblings ...)
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 32/39] COLO: Separate the process of saving/loading ram and device state zhanghailiang
@ 2015-11-24  9:25 ` zhanghailiang
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 34/39] net/filter-buffer: Add default filter-buffer for each netdev zhanghailiang
                   ` (5 subsequent siblings)
  38 siblings, 0 replies; 95+ messages in thread
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

It is unnecessary to call qemu_savevm_state_begin() in every checkpoint process.
It mainly sets up devices and does the first device state pass, and this data
will not change during later checkpoints. So we split it out of
colo_do_checkpoint_transaction(); this way we avoid transferring this data
again in every later checkpoint.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 migration/colo.c | 51 +++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 37 insertions(+), 14 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index f7f349b..79a8d6b 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -251,15 +251,6 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
     if (ret < 0) {
         goto out;
     }
-    /* Disable block migration */
-    s->params.blk = 0;
-    s->params.shared = 0;
-    qemu_savevm_state_begin(s->to_dst_file, &s->params);
-    ret = qemu_file_get_error(s->to_dst_file);
-    if (ret < 0) {
-        error_report("save vm state begin error\n");
-        goto out;
-    }
 
     qemu_mutex_lock_iothread();
     /* Note: device state is saved into buffer */
@@ -323,6 +314,21 @@ out:
     return ret;
 }
 
+static int colo_prepare_before_save(MigrationState *s)
+{
+    int ret;
+    /* Disable block migration */
+    s->params.blk = 0;
+    s->params.shared = 0;
+    qemu_savevm_state_begin(s->to_dst_file, &s->params);
+    ret = qemu_file_get_error(s->to_dst_file);
+    if (ret < 0) {
+        error_report("save vm state begin error\n");
+        return ret;
+    }
+    return 0;
+}
+
 static void colo_process_checkpoint(MigrationState *s)
 {
     QEMUSizedBuffer *buffer = NULL;
@@ -340,6 +346,11 @@ static void colo_process_checkpoint(MigrationState *s)
         goto out;
     }
 
+    ret = colo_prepare_before_save(s);
+    if (ret < 0) {
+        goto out;
+    }
+
     /*
      * Wait for Secondary finish loading vm states and enter COLO
      * restore.
@@ -485,6 +496,18 @@ static int colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request)
     }
 }
 
+static int colo_prepare_before_load(QEMUFile *f)
+{
+    int ret;
+
+    ret = qemu_loadvm_state_begin(f);
+    if (ret < 0) {
+        error_report("load vm state begin error, ret=%d", ret);
+        return ret;
+    }
+    return 0;
+}
+
 void *colo_process_incoming_thread(void *opaque)
 {
     MigrationIncomingState *mis = opaque;
@@ -525,6 +548,11 @@ void *colo_process_incoming_thread(void *opaque)
         goto out;
     }
 
+    ret = colo_prepare_before_load(mis->from_src_file);
+    if (ret < 0) {
+        goto out;
+    }
+
     ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_CHECKPOINT_READY, 0);
     if (ret < 0) {
         goto out;
@@ -559,11 +587,6 @@ void *colo_process_incoming_thread(void *opaque)
             goto out;
         }
 
-        ret = qemu_loadvm_state_begin(mis->from_src_file);
-        if (ret < 0) {
-            error_report("load vm state begin error, ret=%d", ret);
-            goto out;
-        }
         ret = qemu_load_ram_state(mis->from_src_file);
         if (ret < 0) {
             error_report("load ram state error");
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v11 34/39] net/filter-buffer: Add default filter-buffer for each netdev
  2015-11-24  9:25 [Qemu-devel] [PATCH COLO-Frame v11 00/39] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (32 preceding siblings ...)
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 33/39] COLO: Split qemu_savevm_state_begin out of checkpoint process zhanghailiang
@ 2015-11-24  9:25 ` zhanghailiang
  2015-11-27 11:39   ` Yang Hongyang
                     ` (2 more replies)
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 35/39] filter-buffer: Accept zero interval zhanghailiang
                   ` (4 subsequent siblings)
  38 siblings, 3 replies; 95+ messages in thread
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, Jason Wang, yunhong.jiang, eddie.dong,
	peter.huangpeng, dgilbert, zhanghailiang, arei.gonglei, stefanha,
	amit.shah, hongyang.yang

We add a default filter-buffer to each netdev, which will be used by COLO
or Micro-checkpointing to buffer the VM's packets. The name of the default
filter-buffer is 'nop'.
By default this filter does not buffer any packets, so it has no side effect
on the netdev.
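
A standalone sketch of the convention the 'nop' filter relies on (illustrative
only, not the real NetFilterState callback signature): returning 0 from the
receive hook lets the packet continue down the chain untouched, while
returning its length claims the packet so the filter can queue it and release
it later.

#include <stdio.h>
#include <string.h>

static int buffering_enabled;

/* Models filter_buffer_receive_iov() for a single flat packet */
static size_t filter_receive(const char *pkt, size_t len)
{
    if (!buffering_enabled) {
        return 0;          /* pass through untouched: no side effect */
    }
    /* a real filter would queue the packet here and flush it later */
    return len;            /* claim the packet: sender treats it as sent */
}

int main(void)
{
    const char pkt[] = "ping";

    printf("disabled -> %zu\n", filter_receive(pkt, sizeof(pkt)));
    buffering_enabled = 1;
    printf("enabled  -> %zu\n", filter_receive(pkt, sizeof(pkt)));
    return 0;
}

This is why the default filter can be attached to every netdev without
affecting traffic until something flips its enable_buffer flag.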

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Yang Hongyang <hongyang.yang@easystack.cn>
---
v11:
- New patch
---
 include/net/filter.h |  3 +++
 net/filter-buffer.c  | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 net/net.c            |  8 ++++++
 3 files changed, 85 insertions(+)

diff --git a/include/net/filter.h b/include/net/filter.h
index 2deda36..01a7e90 100644
--- a/include/net/filter.h
+++ b/include/net/filter.h
@@ -74,4 +74,7 @@ ssize_t qemu_netfilter_pass_to_next(NetClientState *sender,
                                     int iovcnt,
                                     void *opaque);
 
+void netdev_add_default_filter_buffer(const char *netdev_id,
+                                      NetFilterDirection direction,
+                                      Error **errp);
 #endif /* QEMU_NET_FILTER_H */
diff --git a/net/filter-buffer.c b/net/filter-buffer.c
index 57be149..195af68 100644
--- a/net/filter-buffer.c
+++ b/net/filter-buffer.c
@@ -14,6 +14,12 @@
 #include "qapi/qmp/qerror.h"
 #include "qapi-visit.h"
 #include "qom/object.h"
+#include "net/net.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/qmp-output-visitor.h"
+#include "qapi/qmp-input-visitor.h"
+#include "monitor/monitor.h"
+#include "qmp-commands.h"
 
 #define TYPE_FILTER_BUFFER "filter-buffer"
 
@@ -26,6 +32,8 @@ typedef struct FilterBufferState {
     NetQueue *incoming_queue;
     uint32_t interval;
     QEMUTimer release_timer;
+    bool is_default;
+    bool enable_buffer;
 } FilterBufferState;
 
 static void filter_buffer_flush(NetFilterState *nf)
@@ -65,6 +73,10 @@ static ssize_t filter_buffer_receive_iov(NetFilterState *nf,
 {
     FilterBufferState *s = FILTER_BUFFER(nf);
 
+    /* Don't buffer any packets if the filter is not enabled */
+    if (!s->enable_buffer) {
+        return 0;
+    }
     /*
      * We return size when buffer a packet, the sender will take it as
      * a already sent packet, so sent_cb should not be called later.
@@ -102,6 +114,7 @@ static void filter_buffer_cleanup(NetFilterState *nf)
 static void filter_buffer_setup(NetFilterState *nf, Error **errp)
 {
     FilterBufferState *s = FILTER_BUFFER(nf);
+    char *path = object_get_canonical_path_component(OBJECT(nf));
 
     /*
      * We may want to accept zero interval when VM FT solutions like MC
@@ -114,6 +127,7 @@ static void filter_buffer_setup(NetFilterState *nf, Error **errp)
     }
 
     s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
+    s->is_default = !strcmp(path, "nop");
     if (s->interval) {
         timer_init_us(&s->release_timer, QEMU_CLOCK_VIRTUAL,
                       filter_buffer_release_timer, nf);
@@ -163,6 +177,66 @@ out:
     error_propagate(errp, local_err);
 }
 
+/*
+ * This will be used by COLO or MC FT, which need to buffer the packets
+ * of the VM's net devices. Here we add a default buffer filter for each
+ * netdev; the name of the default buffer filter is 'nop'.
+ */
+void netdev_add_default_filter_buffer(const char *netdev_id,
+                                      NetFilterDirection direction,
+                                      Error **errp)
+{
+    QmpOutputVisitor *qov;
+    QmpInputVisitor *qiv;
+    Visitor *ov, *iv;
+    QObject *obj = NULL;
+    QDict *qdict;
+    void *dummy = NULL;
+    const char *id = "nop";
+    char *queue = g_strdup(NetFilterDirection_lookup[direction]);
+    NetClientState *nc = qemu_find_netdev(netdev_id);
+    Error *err = NULL;
+
+    /* FIXME: Not support multiple queues */
+    if (!nc || nc->queue_index > 1) {
+        return;
+    }
+    qov = qmp_output_visitor_new();
+    ov = qmp_output_get_visitor(qov);
+    visit_start_struct(ov,  &dummy, NULL, NULL, 0, &err);
+    if (err) {
+        goto out;
+    }
+    visit_type_str(ov, &nc->name, "netdev", &err);
+    if (err) {
+        goto out;
+    }
+    visit_type_str(ov, &queue, "queue", &err);
+    if (err) {
+        goto out;
+    }
+    visit_end_struct(ov, &err);
+    if (err) {
+        goto out;
+    }
+    obj = qmp_output_get_qobject(qov);
+    g_assert(obj != NULL);
+    qdict = qobject_to_qdict(obj);
+    qmp_output_visitor_cleanup(qov);
+
+    qiv = qmp_input_visitor_new(obj);
+    iv = qmp_input_get_visitor(qiv);
+    object_add(TYPE_FILTER_BUFFER, id, qdict, iv, &err);
+    qmp_input_visitor_cleanup(qiv);
+    qobject_decref(obj);
+out:
+    g_free(queue);
+    if (err) {
+        error_propagate(errp, err);
+    }
+}
+
 static void filter_buffer_init(Object *obj)
 {
     object_property_add(obj, "interval", "int",
diff --git a/net/net.c b/net/net.c
index ade6051..b36d49f 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1028,6 +1028,14 @@ static int net_client_init1(const void *object, int is_netdev, Error **errp)
         }
         return -1;
     }
+
+    if (is_netdev) {
+        const Netdev *netdev = object;
+
+        netdev_add_default_filter_buffer(netdev->id,
+                                         NET_FILTER_DIRECTION_RX,
+                                         errp);
+    }
     return 0;
 }
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v11 35/39] filter-buffer: Accept zero interval
  2015-11-24  9:25 [Qemu-devel] [PATCH COLO-Frame v11 00/39] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (33 preceding siblings ...)
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 34/39] net/filter-buffer: Add default filter-buffer for each netdev zhanghailiang
@ 2015-11-24  9:25 ` zhanghailiang
  2015-11-27 11:42   ` Yang Hongyang
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 36/39] filter-buffer: Introduce a helper function to enable/disable default filter zhanghailiang
                   ` (3 subsequent siblings)
  38 siblings, 1 reply; 95+ messages in thread
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, Jason Wang, yunhong.jiang, eddie.dong,
	peter.huangpeng, dgilbert, zhanghailiang, arei.gonglei, stefanha,
	amit.shah, hongyang.yang

For the default buffer filter, the 'interval' value is zero,
so we should accept a zero interval here.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Yang Hongyang <hongyang.yang@easystack.cn>
---
v11:
- Add comment
v10:
- new patch
---
 net/filter-buffer.c | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/net/filter-buffer.c b/net/filter-buffer.c
index 195af68..5cab4fc 100644
--- a/net/filter-buffer.c
+++ b/net/filter-buffer.c
@@ -116,16 +116,6 @@ static void filter_buffer_setup(NetFilterState *nf, Error **errp)
     FilterBufferState *s = FILTER_BUFFER(nf);
     char *path = object_get_canonical_path_component(OBJECT(nf));
 
-    /*
-     * We may want to accept zero interval when VM FT solutions like MC
-     * or COLO use this filter to release packets on demand.
-     */
-    if (!s->interval) {
-        error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "interval",
-                   "a non-zero interval");
-        return;
-    }
-
     s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
     s->is_default = !strcmp(path, "nop");
     if (s->interval) {
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v11 36/39] filter-buffer: Introduce a helper function to enable/disable default filter
  2015-11-24  9:25 [Qemu-devel] [PATCH COLO-Frame v11 00/39] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (34 preceding siblings ...)
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 35/39] filter-buffer: Accept zero interval zhanghailiang
@ 2015-11-24  9:25 ` zhanghailiang
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 37/39] filter-buffer: Introduce a helper function to release packets zhanghailiang
                   ` (2 subsequent siblings)
  38 siblings, 0 replies; 95+ messages in thread
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, Jason Wang, yunhong.jiang, eddie.dong,
	peter.huangpeng, dgilbert, zhanghailiang, arei.gonglei, stefanha,
	amit.shah, hongyang.yang

The default buffer filter does not buffer packets by default,
but we need it to buffer packets for COLO or Micro-checkpointing.
Here we add a helper function to enable/disable the filter's
buffering capability.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Yang Hongyang <hongyang.yang@easystack.cn>
---
v11:
- New patch
---
 include/net/filter.h |  1 +
 include/net/net.h    |  4 ++++
 net/filter-buffer.c  | 20 ++++++++++++++++++++
 net/net.c            | 29 +++++++++++++++++++++++++++++
 4 files changed, 54 insertions(+)

diff --git a/include/net/filter.h b/include/net/filter.h
index 01a7e90..b57b29a 100644
--- a/include/net/filter.h
+++ b/include/net/filter.h
@@ -77,4 +77,5 @@ ssize_t qemu_netfilter_pass_to_next(NetClientState *sender,
 void netdev_add_default_filter_buffer(const char *netdev_id,
                                       NetFilterDirection direction,
                                       Error **errp);
+void qemu_set_default_filter_buffers(bool enable_buffer);
 #endif /* QEMU_NET_FILTER_H */
diff --git a/include/net/net.h b/include/net/net.h
index 7af3e15..5c65c45 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -125,6 +125,10 @@ NetClientState *qemu_find_vlan_client_by_name(Monitor *mon, int vlan_id,
                                               const char *client_str);
 typedef void (*qemu_nic_foreach)(NICState *nic, void *opaque);
 void qemu_foreach_nic(qemu_nic_foreach func, void *opaque);
+typedef void (*qemu_netfilter_foreach)(NetFilterState *nf, void *opaque,
+                                       Error **errp);
+void qemu_foreach_netfilter(qemu_netfilter_foreach func, void *opaque,
+                            Error **errp);
 int qemu_can_send_packet(NetClientState *nc);
 ssize_t qemu_sendv_packet(NetClientState *nc, const struct iovec *iov,
                           int iovcnt);
diff --git a/net/filter-buffer.c b/net/filter-buffer.c
index 5cab4fc..7115a2b 100644
--- a/net/filter-buffer.c
+++ b/net/filter-buffer.c
@@ -167,6 +167,26 @@ out:
     error_propagate(errp, local_err);
 }
 
+static void default_filter_set_buffer_flag(NetFilterState *nf,
+                                           void *opaque,
+                                           Error **errp)
+{
+    if (!strcmp(object_get_typename(OBJECT(nf)), TYPE_FILTER_BUFFER)) {
+        FilterBufferState *s = FILTER_BUFFER(nf);
+        bool *flag = opaque;
+
+        if (s->is_default) {
+            s->enable_buffer = *flag;
+        }
+    }
+}
+
+void qemu_set_default_filter_buffers(bool enable_buffer)
+{
+    qemu_foreach_netfilter(default_filter_set_buffer_flag,
+                           &enable_buffer, NULL);
+}
+
 /*
 * This will be used by COLO or MC FT, for which they will need
 * to buffer the packets of VM's net devices, Here we add a default
diff --git a/net/net.c b/net/net.c
index b36d49f..aa8e128 100644
--- a/net/net.c
+++ b/net/net.c
@@ -259,6 +259,35 @@ static char *assign_name(NetClientState *nc1, const char *model)
     return g_strdup_printf("%s.%d", model, id);
 }
 
+void qemu_foreach_netfilter(qemu_netfilter_foreach func, void *opaque,
+                            Error **errp)
+{
+    NetClientState *nc;
+    NetFilterState *nf;
+
+    QTAILQ_FOREACH(nc, &net_clients, next) {
+        if (nc->info->type == NET_CLIENT_OPTIONS_KIND_NIC) {
+            continue;
+        }
+        /* FIXME: Not support multiqueue */
+        if (nc->queue_index > 1) {
+            error_setg(errp, "%s: multiqueue is not supported", __func__);
+            return;
+        }
+        QTAILQ_FOREACH(nf, &nc->filters, next) {
+            if (func) {
+                Error *local_err = NULL;
+
+                func(nf, opaque, &local_err);
+                if (local_err) {
+                    error_propagate(errp, local_err);
+                    return;
+                }
+            }
+        }
+    }
+}
+
 static void qemu_net_client_destructor(NetClientState *nc)
 {
     g_free(nc);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v11 37/39] filter-buffer: Introduce a helper function to release packets
  2015-11-24  9:25 [Qemu-devel] [PATCH COLO-Frame v11 00/39] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (35 preceding siblings ...)
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 36/39] filter-buffer: Introduce a helper function to enable/disable default filter zhanghailiang
@ 2015-11-24  9:25 ` zhanghailiang
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 38/39] colo: Use default buffer-filter to buffer and " zhanghailiang
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 39/39] COLO: Add block replication into colo process zhanghailiang
  38 siblings, 0 replies; 95+ messages in thread
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, Jason Wang, yunhong.jiang, eddie.dong,
	peter.huangpeng, dgilbert, zhanghailiang, arei.gonglei, stefanha,
	amit.shah, hongyang.yang

We need to release all the packets from the VM in COLO or Micro-checkpointing,
so here we add a new helper function to release the packets that are buffered
by the default buffer-filter.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Yang Hongyang <hongyang.yang@easystack.cn>
---
v11:
- New patch
---
 include/net/filter.h |  1 +
 net/filter-buffer.c  | 19 +++++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/include/net/filter.h b/include/net/filter.h
index b57b29a..e1e8d9b 100644
--- a/include/net/filter.h
+++ b/include/net/filter.h
@@ -77,5 +77,6 @@ ssize_t qemu_netfilter_pass_to_next(NetClientState *sender,
 void netdev_add_default_filter_buffer(const char *netdev_id,
                                       NetFilterDirection direction,
                                       Error **errp);
+void qemu_release_default_filters_packets(void);
 void qemu_set_default_filter_buffers(bool enable_buffer);
 #endif /* QEMU_NET_FILTER_H */
diff --git a/net/filter-buffer.c b/net/filter-buffer.c
index 7115a2b..06f2190 100644
--- a/net/filter-buffer.c
+++ b/net/filter-buffer.c
@@ -167,6 +167,25 @@ out:
     error_propagate(errp, local_err);
 }
 
+static void default_filter_release_packets(NetFilterState *nf,
+                                           void *opaque,
+                                           Error **errp)
+{
+    if (!strcmp(object_get_typename(OBJECT(nf)), TYPE_FILTER_BUFFER)) {
+        FilterBufferState *s = FILTER_BUFFER(nf);
+
+        if (s->is_default) {
+            filter_buffer_flush(nf);
+        }
+    }
+}
+
+/* public APIs */
+void qemu_release_default_filters_packets(void)
+{
+    qemu_foreach_netfilter(default_filter_release_packets, NULL, NULL);
+}
+
 static void default_filter_set_buffer_flag(NetFilterState *nf,
                                            void *opaque,
                                            Error **errp)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v11 38/39] colo: Use default buffer-filter to buffer and release packets
  2015-11-24  9:25 [Qemu-devel] [PATCH COLO-Frame v11 00/39] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (36 preceding siblings ...)
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 37/39] filter-buffer: Introduce a helper function to release packets zhanghailiang
@ 2015-11-24  9:25 ` zhanghailiang
  2015-11-27 12:51   ` Yang Hongyang
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 39/39] COLO: Add block replication into colo process zhanghailiang
  38 siblings, 1 reply; 95+ messages in thread
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, Jason Wang, yunhong.jiang, eddie.dong,
	peter.huangpeng, dgilbert, zhanghailiang, arei.gonglei, stefanha,
	amit.shah, hongyang.yang

Enable the default filter to buffer packets and release the
buffered packets after each checkpoint.
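
For orientation, here is a rough sketch of the call ordering this patch sets
up, pieced together from the hunks below and the helpers introduced in the two
preceding patches; it is only an illustration of the lifecycle, not code
proposed for the tree:

/* Assumed to sit inside the QEMU tree, next to migration/colo.c. */
#include <stdbool.h>
#include "net/filter.h"

static void colo_packet_buffering_lifecycle(bool failover_requested)
{
    /* When COLO starts: begin buffering packets the primary VM sends out. */
    qemu_set_default_filter_buffers(true);

    while (!failover_requested) {
        /* ... run one checkpoint: pause, send VM state, wait for the
         *     secondary to report 'vmstate-loaded' ... */

        /* Once the checkpoint has committed, flush the packets that were
         * buffered since the previous checkpoint. */
        qemu_release_default_filters_packets();
    }

    /* On failover, stop buffering so the surviving VM reaches the network
     * directly again. */
    qemu_set_default_filter_buffers(false);
}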

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Yang Hongyang <hongyang.yang@easystack.cn>
---
v11:
- Use new helper functions to buffer and release packets.
---
 migration/colo.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index 79a8d6b..b1b7905 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -19,6 +19,7 @@
 #include "qemu/sockets.h"
 #include "migration/failover.h"
 #include "qapi-event.h"
+#include "net/filter.h"
 
 /*
  * The delay time before qemu begin the procedure of default failover treatment.
@@ -131,6 +132,8 @@ static void primary_vm_do_failover(void)
                      "old_state: %d", old_state);
         return;
     }
+    /* Don't buffer any packets while exited COLO */
+    qemu_set_default_filter_buffers(false);
 }
 
 void colo_do_failover(MigrationState *s)
@@ -290,6 +293,8 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
         goto out;
     }
 
+    qemu_release_default_filters_packets();
+
     if (colo_shutdown) {
         colo_ctl_put(s->to_dst_file, COLO_COMMAND_GUEST_SHUTDOWN, 0);
         qemu_fflush(s->to_dst_file);
@@ -367,6 +372,8 @@ static void colo_process_checkpoint(MigrationState *s)
         error_report("Failed to allocate colo buffer!");
         goto out;
     }
+    /* Begin to buffer packets that sent by VM */
+    qemu_set_default_filter_buffers(true);
 
     qemu_mutex_lock_iothread();
     vm_start();
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [Qemu-devel] [PATCH COLO-Frame v11 39/39] COLO: Add block replication into colo process
  2015-11-24  9:25 [Qemu-devel] [PATCH COLO-Frame v11 00/39] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
                   ` (37 preceding siblings ...)
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 38/39] colo: Use default buffer-filter to buffer and " zhanghailiang
@ 2015-11-24  9:25 ` zhanghailiang
  38 siblings, 0 replies; 95+ messages in thread
From: zhanghailiang @ 2015-11-24  9:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, zhanghailiang, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

Make sure the master starts block replication only after the slave's block
replication has started: the slave starts replication before sending the
'checkpoint-ready' message, and the master starts its side just before
resuming the VM, so the ordering is enforced by the existing handshake.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 migration/colo.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 trace-events     |  2 ++
 2 files changed, 62 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index b1b7905..c534ff9 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -20,6 +20,7 @@
 #include "migration/failover.h"
 #include "qapi-event.h"
 #include "net/filter.h"
+#include "block/block_int.h"
 
 /*
  * The delay time before qemu begin the procedure of default failover treatment.
@@ -62,6 +63,7 @@ static void secondary_vm_do_failover(void)
 {
     int old_state;
     MigrationIncomingState *mis = migration_incoming_get_current();
+    Error *local_err = NULL;
 
     /* Can not do failover during the process of VM's loading VMstate, Or
       * it will break the secondary VM.
@@ -79,6 +81,12 @@ static void secondary_vm_do_failover(void)
     migrate_set_state(&mis->state, MIGRATION_STATUS_COLO,
                       MIGRATION_STATUS_COMPLETED);
 
+    bdrv_stop_replication_all(true, &local_err);
+    if (local_err) {
+        error_report_err(local_err);
+    }
+    trace_colo_stop_block_replication("failover");
+
     if (!autostart) {
         error_report("\"-S\" qemu option will be ignored in secondary side");
         /* recover runstate to normal migration finish state */
@@ -110,6 +118,7 @@ static void primary_vm_do_failover(void)
 {
     MigrationState *s = migrate_get_current();
     int old_state;
+    Error *local_err = NULL;
 
     if (s->state != MIGRATION_STATUS_FAILED) {
         migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
@@ -134,6 +143,12 @@ static void primary_vm_do_failover(void)
     }
     /* Don't buffer any packets while exited COLO */
     qemu_set_default_filter_buffers(false);
+
+    bdrv_stop_replication_all(true, &local_err);
+    if (local_err) {
+        error_report_err(local_err);
+    }
+    trace_colo_stop_block_replication("failover");
 }
 
 void colo_do_failover(MigrationState *s)
@@ -212,6 +227,7 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
     int colo_shutdown;
     size_t size;
     QEMUFile *trans = NULL;
+    Error *local_err = NULL;
 
     ret = colo_ctl_put(s->to_dst_file, COLO_COMMAND_CHECKPOINT_REQUEST, 0);
     if (ret < 0) {
@@ -250,6 +266,16 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
         goto out;
     }
 
+    /* we call this api although this may do nothing on primary side */
+    qemu_mutex_lock_iothread();
+    bdrv_do_checkpoint_all(&local_err);
+    qemu_mutex_unlock_iothread();
+    if (local_err) {
+        error_report_err(local_err);
+        ret = -1;
+        goto out;
+    }
+
     ret = colo_ctl_put(s->to_dst_file, COLO_COMMAND_VMSTATE_SEND, 0);
     if (ret < 0) {
         goto out;
@@ -296,6 +322,10 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
     qemu_release_default_filters_packets();
 
     if (colo_shutdown) {
+        qemu_mutex_lock_iothread();
+        bdrv_stop_replication_all(false, NULL);
+        trace_colo_stop_block_replication("shutdown");
+        qemu_mutex_unlock_iothread();
         colo_ctl_put(s->to_dst_file, COLO_COMMAND_GUEST_SHUTDOWN, 0);
         qemu_fflush(s->to_dst_file);
         colo_shutdown_requested = 0;
@@ -341,6 +371,7 @@ static void colo_process_checkpoint(MigrationState *s)
     int64_t error_time;
     int ret = 0;
     uint64_t value;
+    Error *local_err = NULL;
 
     failover_init_state();
 
@@ -376,6 +407,15 @@ static void colo_process_checkpoint(MigrationState *s)
     qemu_set_default_filter_buffers(true);
 
     qemu_mutex_lock_iothread();
+    /* start block replication */
+    bdrv_start_replication_all(REPLICATION_MODE_PRIMARY, &local_err);
+    if (local_err) {
+        qemu_mutex_unlock_iothread();
+        error_report_err(local_err);
+        ret = -EINVAL;
+        goto out;
+    }
+    trace_colo_start_block_replication();
     vm_start();
     qemu_mutex_unlock_iothread();
     trace_colo_vm_state_change("stop", "run");
@@ -492,6 +532,8 @@ static int colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request)
     case COLO_COMMAND_GUEST_SHUTDOWN:
         qemu_mutex_lock_iothread();
         vm_stop_force_state(RUN_STATE_COLO);
+        bdrv_stop_replication_all(false, NULL);
+        trace_colo_stop_block_replication("shutdown");
         qemu_system_shutdown_request_core();
         qemu_mutex_unlock_iothread();
         /* the main thread will exit and termiante the whole
@@ -524,6 +566,7 @@ void *colo_process_incoming_thread(void *opaque)
     int64_t error_time, current_time;
     int ret = 0;
     uint64_t value;
+    Error *local_err = NULL;
 
     migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
                       MIGRATION_STATUS_COLO);
@@ -560,6 +603,16 @@ void *colo_process_incoming_thread(void *opaque)
         goto out;
     }
 
+    qemu_mutex_lock_iothread();
+    /* start block replication */
+    bdrv_start_replication_all(REPLICATION_MODE_SECONDARY, &local_err);
+    qemu_mutex_unlock_iothread();
+    if (local_err) {
+        error_report_err(local_err);
+        goto out;
+    }
+    trace_colo_start_block_replication();
+
     ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_CHECKPOINT_READY, 0);
     if (ret < 0) {
         goto out;
@@ -639,6 +692,13 @@ void *colo_process_incoming_thread(void *opaque)
             qemu_mutex_unlock_iothread();
             goto out;
         }
+        /* discard colo disk buffer */
+        bdrv_do_checkpoint_all(&local_err);
+        qemu_mutex_unlock_iothread();
+        if (local_err) {
+            vmstate_loading = false;
+            goto out;
+        }
 
         vmstate_loading = false;
         qemu_mutex_unlock_iothread();
diff --git a/trace-events b/trace-events
index b80c1e0..5f95b3c 100644
--- a/trace-events
+++ b/trace-events
@@ -1583,6 +1583,8 @@ colo_vm_state_change(const char *old, const char *new) "Change '%s' => '%s'"
 colo_ctl_put(const char *msg, uint64_t value) "Send '%s' cmd, value: %" PRIu64""
 colo_ctl_get(const char *msg, uint64_t value) "Receive '%s' cmd, value: %" PRIu64""
 colo_failover_set_state(int new_state) "new state %d"
+colo_start_block_replication(void) "Block replication is started"
+colo_stop_block_replication(const char *reason) "Block replication is stopped(reason: '%s')"
 
 # kvm-all.c
 kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 04/39] migration: Export migrate_set_state()
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 04/39] migration: Export migrate_set_state() zhanghailiang
@ 2015-11-24 17:31   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 95+ messages in thread
From: Dr. David Alan Gilbert @ 2015-11-24 17:31 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> Fix the first parameter of migrate_set_state(), and export it.
> We will use it in later.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
> v11:
> - New patch which is split from patch
>   'migration: Add state records for migration incoming' (Juan's suggestion)
> ---
>  include/migration/migration.h |  2 ++
>  migration/migration.c         | 36 +++++++++++++++++++++---------------
>  2 files changed, 23 insertions(+), 15 deletions(-)
> 
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index 1f004e4..4b19e80 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -169,6 +169,8 @@ struct MigrationState
>      RAMBlock *last_req_rb;
>  };
>  
> +void migrate_set_state(int *state, int old_state, int new_state);
> +
>  void process_incoming_migration(QEMUFile *f);
>  
>  void qemu_start_incoming_migration(const char *uri, Error **errp);
> diff --git a/migration/migration.c b/migration/migration.c
> index efacf52..7d98a3e 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -798,9 +798,9 @@ void qmp_migrate_start_postcopy(Error **errp)
>  
>  /* shared migration helpers */
>  
> -static void migrate_set_state(MigrationState *s, int old_state, int new_state)
> +void migrate_set_state(int *state, int old_state, int new_state)
>  {
> -    if (atomic_cmpxchg(&s->state, old_state, new_state) == old_state) {
> +    if (atomic_cmpxchg(state, old_state, new_state) == old_state) {
>          trace_migrate_set_state(new_state);
>          migrate_generate_event(new_state);
>      }
> @@ -833,7 +833,7 @@ static void migrate_fd_cleanup(void *opaque)
>             (s->state != MIGRATION_STATUS_POSTCOPY_ACTIVE));
>  
>      if (s->state == MIGRATION_STATUS_CANCELLING) {
> -        migrate_set_state(s, MIGRATION_STATUS_CANCELLING,
> +        migrate_set_state(&s->state, MIGRATION_STATUS_CANCELLING,
>                            MIGRATION_STATUS_CANCELLED);
>      }
>  
> @@ -844,7 +844,8 @@ void migrate_fd_error(MigrationState *s)
>  {
>      trace_migrate_fd_error();
>      assert(s->file == NULL);
> -    migrate_set_state(s, MIGRATION_STATUS_SETUP, MIGRATION_STATUS_FAILED);
> +    migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
> +                      MIGRATION_STATUS_FAILED);
>      notifier_list_notify(&migration_state_notifiers, s);
>  }
>  
> @@ -864,7 +865,7 @@ static void migrate_fd_cancel(MigrationState *s)
>          if (!migration_is_setup_or_active(old_state)) {
>              break;
>          }
> -        migrate_set_state(s, old_state, MIGRATION_STATUS_CANCELLING);
> +        migrate_set_state(&s->state, old_state, MIGRATION_STATUS_CANCELLING);
>      } while (s->state != MIGRATION_STATUS_CANCELLING);
>  
>      /*
> @@ -938,7 +939,7 @@ MigrationState *migrate_init(const MigrationParams *params)
>      s->migration_thread_running = false;
>      s->last_req_rb = NULL;
>  
> -    migrate_set_state(s, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP);
> +    migrate_set_state(&s->state, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP);
>  
>      QSIMPLEQ_INIT(&s->src_page_requests);
>  
> @@ -1037,7 +1038,8 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
>      } else {
>          error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "uri",
>                     "a valid migration protocol");
> -        migrate_set_state(s, MIGRATION_STATUS_SETUP, MIGRATION_STATUS_FAILED);
> +        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
> +                          MIGRATION_STATUS_FAILED);
>          return;
>      }
>  
> @@ -1416,7 +1418,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
>      int ret;
>      const QEMUSizedBuffer *qsb;
>      int64_t time_at_stop = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> -    migrate_set_state(ms, MIGRATION_STATUS_ACTIVE,
> +    migrate_set_state(&ms->state, MIGRATION_STATUS_ACTIVE,
>                        MIGRATION_STATUS_POSTCOPY_ACTIVE);
>  
>      trace_postcopy_start();
> @@ -1507,7 +1509,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
>      ret = qemu_file_get_error(ms->file);
>      if (ret) {
>          error_report("postcopy_start: Migration stream errored");
> -        migrate_set_state(ms, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> +        migrate_set_state(&ms->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
>                                MIGRATION_STATUS_FAILED);
>      }
>  
> @@ -1516,7 +1518,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
>  fail_closefb:
>      qemu_fclose(fb);
>  fail:
> -    migrate_set_state(ms, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> +    migrate_set_state(&ms->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
>                            MIGRATION_STATUS_FAILED);
>      qemu_mutex_unlock_iothread();
>      return -1;
> @@ -1585,11 +1587,13 @@ static void migration_completion(MigrationState *s, int current_active_state,
>          goto fail;
>      }
>  
> -    migrate_set_state(s, current_active_state, MIGRATION_STATUS_COMPLETED);
> +    migrate_set_state(&s->state, current_active_state,
> +                      MIGRATION_STATUS_COMPLETED);
>      return;
>  
>  fail:
> -    migrate_set_state(s, current_active_state, MIGRATION_STATUS_FAILED);
> +    migrate_set_state(&s->state, current_active_state,
> +                      MIGRATION_STATUS_FAILED);
>  }
>  
>  bool migrate_colo_enabled(void)
> @@ -1640,7 +1644,8 @@ static void *migration_thread(void *opaque)
>  
>      s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
>      current_active_state = MIGRATION_STATUS_ACTIVE;
> -    migrate_set_state(s, MIGRATION_STATUS_SETUP, MIGRATION_STATUS_ACTIVE);
> +    migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
> +                      MIGRATION_STATUS_ACTIVE);
>  
>      trace_migration_thread_setup_complete();
>  
> @@ -1683,7 +1688,8 @@ static void *migration_thread(void *opaque)
>          }
>  
>          if (qemu_file_get_error(s->file)) {
> -            migrate_set_state(s, current_active_state, MIGRATION_STATUS_FAILED);
> +            migrate_set_state(&s->state, current_active_state,
> +                              MIGRATION_STATUS_FAILED);
>              trace_migration_thread_file_err();
>              break;
>          }
> @@ -1764,7 +1770,7 @@ void migrate_fd_connect(MigrationState *s)
>      if (migrate_postcopy_ram()) {
>          if (open_return_path_on_source(s)) {
>              error_report("Unable to open return-path for postcopy");
> -            migrate_set_state(s, MIGRATION_STATUS_SETUP,
> +            migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
>                                MIGRATION_STATUS_FAILED);
>              migrate_fd_cleanup(s);
>              return;
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 07/39] migration: Integrate COLO checkpoint process into loadvm
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 07/39] migration: Integrate COLO checkpoint process into loadvm zhanghailiang
@ 2015-11-24 18:14   ` Dr. David Alan Gilbert
  2015-11-25  6:39     ` zhanghailiang
  0 siblings, 1 reply; 95+ messages in thread
From: Dr. David Alan Gilbert @ 2015-11-24 18:14 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> Switch from normal migration loadvm process into COLO checkpoint process if
> COLO mode is enabled.
> We add three new members to struct MigrationIncomingState, 'have_colo_incoming_thread'
> and 'colo_incoming_thread' record the colo related threads for secondary VM,
> 'migration_incoming_co' records the original migration incoming coroutine.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>

Minor comment that needs fixing, see below, but otherwise:

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
> v11:
> - We moved the place of bdrv_invalidate_cache_all(), but done the deleting work
>   in other patch. Fix it.
> - Add documentation for colo in 'MigrationStatus' (Eric's review comment)
> v10:
> - fix a bug about fd leak which is found by Dave.
> ---
>  include/migration/colo.h      |  7 +++++++
>  include/migration/migration.h |  7 +++++++
>  migration/colo-comm.c         | 10 ++++++++++
>  migration/colo.c              | 22 ++++++++++++++++++++++
>  migration/migration.c         | 31 +++++++++++++++++++++----------
>  qapi-schema.json              |  2 ++
>  stubs/migration-colo.c        | 10 ++++++++++
>  7 files changed, 79 insertions(+), 10 deletions(-)
> 
> diff --git a/include/migration/colo.h b/include/migration/colo.h
> index f462f06..2676c4a 100644
> --- a/include/migration/colo.h
> +++ b/include/migration/colo.h
> @@ -15,6 +15,8 @@
>  
>  #include "qemu-common.h"
>  #include "migration/migration.h"
> +#include "qemu/coroutine_int.h"
> +#include "qemu/thread.h"
>  
>  bool colo_supported(void);
>  void colo_info_mig_init(void);
> @@ -22,4 +24,9 @@ void colo_info_mig_init(void);
>  void migrate_start_colo_process(MigrationState *s);
>  bool migration_in_colo_state(void);
>  
> +/* loadvm */
> +bool migration_incoming_enable_colo(void);
> +void migration_incoming_exit_colo(void);
> +void *colo_process_incoming_thread(void *opaque);
> +bool migration_incoming_in_colo_state(void);
>  #endif
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index 99dfa92..a57a734 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -22,6 +22,7 @@
>  #include "migration/vmstate.h"
>  #include "qapi-types.h"
>  #include "exec/cpu-common.h"
> +#include "qemu/coroutine_int.h"
>  
>  #define QEMU_VM_FILE_MAGIC           0x5145564d
>  #define QEMU_VM_FILE_VERSION_COMPAT  0x00000002
> @@ -106,6 +107,12 @@ struct MigrationIncomingState {
>      void     *postcopy_tmp_page;
>  
>      int state;
> +
> +    bool have_colo_incoming_thread;
> +    QemuThread colo_incoming_thread;
> +    /* The coroutine we should enter (back) after failover */
> +    Coroutine *migration_incoming_co;
> +
>      /* See savevm.c */
>      LoadStateEntry_Head loadvm_handlers;
>  };
> diff --git a/migration/colo-comm.c b/migration/colo-comm.c
> index fb407e0..30df3d3 100644
> --- a/migration/colo-comm.c
> +++ b/migration/colo-comm.c
> @@ -48,3 +48,13 @@ void colo_info_mig_init(void)
>  {
>      vmstate_register(NULL, 0, &colo_state, &colo_info);
>  }
> +
> +bool migration_incoming_enable_colo(void)
> +{
> +    return colo_info.colo_requested;
> +}
> +
> +void migration_incoming_exit_colo(void)
> +{
> +    colo_info.colo_requested = 0;
> +}
> diff --git a/migration/colo.c b/migration/colo.c
> index cf0ccb8..6880aa0 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -13,6 +13,7 @@
>  #include "sysemu/sysemu.h"
>  #include "migration/colo.h"
>  #include "trace.h"
> +#include "qemu/error-report.h"
>  
>  bool colo_supported(void)
>  {
> @@ -26,6 +27,13 @@ bool migration_in_colo_state(void)
>      return (s->state == MIGRATION_STATUS_COLO);
>  }
>  
> +bool migration_incoming_in_colo_state(void)
> +{
> +    MigrationIncomingState *mis = migration_incoming_get_current();
> +
> +    return mis && (mis->state == MIGRATION_STATUS_COLO);
> +}
> +
>  static void colo_process_checkpoint(MigrationState *s)
>  {
>      qemu_mutex_lock_iothread();
> @@ -47,3 +55,17 @@ void migrate_start_colo_process(MigrationState *s)
>      colo_process_checkpoint(s);
>      qemu_mutex_lock_iothread();
>  }
> +
> +void *colo_process_incoming_thread(void *opaque)
> +{
> +    MigrationIncomingState *mis = opaque;
> +
> +    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
> +                      MIGRATION_STATUS_COLO);
> +
> +    /* TODO: COLO checkpoint restore loop */
> +
> +    migration_incoming_exit_colo();
> +
> +    return NULL;
> +}
> diff --git a/migration/migration.c b/migration/migration.c
> index 46fe8a9..41eac0d 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -358,6 +358,27 @@ static void process_incoming_migration_co(void *opaque)
>          /* Else if something went wrong then just fall out of the normal exit */
>      }
>  
> +    if (!ret) {
> +        /* Make sure all file formats flush their mutable metadata */
> +        bdrv_invalidate_cache_all(&local_err);
> +        if (local_err) {
> +            error_report_err(local_err);
> +            migrate_decompress_threads_join();
> +            exit(EXIT_FAILURE);
> +        }
> +    }
> +    /* we get colo info, and know if we are in colo mode */
> +    if (!ret && migration_incoming_enable_colo()) {
> +        mis->migration_incoming_co = qemu_coroutine_self();
> +        qemu_thread_create(&mis->colo_incoming_thread, "colo incoming",
> +             colo_process_incoming_thread, mis, QEMU_THREAD_JOINABLE);
> +        mis->have_colo_incoming_thread = true;
> +        qemu_coroutine_yield();
> +
> +        /* Wait checkpoint incoming thread exit before free resource */
> +        qemu_thread_join(&mis->colo_incoming_thread);
> +    }
> +
>      qemu_fclose(f);
>      free_xbzrle_decoded_buf();
>      migration_incoming_state_destroy();
> @@ -370,16 +391,6 @@ static void process_incoming_migration_co(void *opaque)
>          exit(EXIT_FAILURE);
>      }
>  
> -    /* Make sure all file formats flush their mutable metadata */
> -    bdrv_invalidate_cache_all(&local_err);
> -    if (local_err) {
> -        migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
> -                          MIGRATION_STATUS_FAILED);
> -        error_report_err(local_err);
> -        migrate_decompress_threads_join();
> -        exit(EXIT_FAILURE);
> -    }
> -
>      /*
>       * This must happen after all error conditions are dealt with and
>       * we're sure the VM is going to be running on this host.
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 24b35f3..c2f3b63 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -438,6 +438,8 @@
>  #
>  # @failed: some error occurred during migration process.
>  #
> +# @colo: VM is in the process of fault tolerance. (since 2.6)
> +#
>  # Since: 2.3
>  #
>  ##

That belongs in the previous patch I think; the previous patch
added before the @failed, but this change is correct, it just
needs to go with the actual addition.

Dave

> diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
> index acddca6..c12516e 100644
> --- a/stubs/migration-colo.c
> +++ b/stubs/migration-colo.c
> @@ -22,6 +22,16 @@ bool migration_in_colo_state(void)
>      return false;
>  }
>  
> +bool migration_incoming_in_colo_state(void)
> +{
> +    return false;
> +}
> +
>  void migrate_start_colo_process(MigrationState *s)
>  {
>  }
> +
> +void *colo_process_incoming_thread(void *opaque)
> +{
> +    return NULL;
> +}
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 08/39] migration: Rename the 'file' member of MigrationState
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 08/39] migration: Rename the 'file' member of MigrationState zhanghailiang
@ 2015-11-24 18:26   ` Dr. David Alan Gilbert
  2015-11-25  6:48     ` zhanghailiang
  2015-12-10  6:41   ` Wen Congyang
  1 sibling, 1 reply; 95+ messages in thread
From: Dr. David Alan Gilbert @ 2015-11-24 18:26 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> Rename the 'file' member of MigrationState to 'to_dst_file'.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
> v11:
> - Only rename 'file' member of MigrationState

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

You should be able to post this one separately.

Dave

> ---
>  include/migration/migration.h |  2 +-
>  migration/exec.c              |  4 +--
>  migration/fd.c                |  4 +--
>  migration/migration.c         | 72 ++++++++++++++++++++++---------------------
>  migration/postcopy-ram.c      |  6 ++--
>  migration/savevm.c            |  2 +-
>  migration/tcp.c               |  4 +--
>  migration/unix.c              |  4 +--
>  8 files changed, 51 insertions(+), 47 deletions(-)
> 
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index a57a734..ba5bcec 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -140,7 +140,7 @@ struct MigrationState
>      size_t xfer_limit;
>      QemuThread thread;
>      QEMUBH *cleanup_bh;
> -    QEMUFile *file;
> +    QEMUFile *to_dst_file;
>      int parameters[MIGRATION_PARAMETER_MAX];
>  
>      int state;
> diff --git a/migration/exec.c b/migration/exec.c
> index 8406d2b..9037109 100644
> --- a/migration/exec.c
> +++ b/migration/exec.c
> @@ -36,8 +36,8 @@
>  
>  void exec_start_outgoing_migration(MigrationState *s, const char *command, Error **errp)
>  {
> -    s->file = qemu_popen_cmd(command, "w");
> -    if (s->file == NULL) {
> +    s->to_dst_file = qemu_popen_cmd(command, "w");
> +    if (s->to_dst_file == NULL) {
>          error_setg_errno(errp, errno, "failed to popen the migration target");
>          return;
>      }
> diff --git a/migration/fd.c b/migration/fd.c
> index 3e4bed0..9a9d6c5 100644
> --- a/migration/fd.c
> +++ b/migration/fd.c
> @@ -50,9 +50,9 @@ void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **
>      }
>  
>      if (fd_is_socket(fd)) {
> -        s->file = qemu_fopen_socket(fd, "wb");
> +        s->to_dst_file = qemu_fopen_socket(fd, "wb");
>      } else {
> -        s->file = qemu_fdopen(fd, "wb");
> +        s->to_dst_file = qemu_fdopen(fd, "wb");
>      }
>  
>      migrate_fd_connect(s);
> diff --git a/migration/migration.c b/migration/migration.c
> index 41eac0d..a4c690d 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -834,7 +834,7 @@ static void migrate_fd_cleanup(void *opaque)
>  
>      flush_page_queue(s);
>  
> -    if (s->file) {
> +    if (s->to_dst_file) {
>          trace_migrate_fd_cleanup();
>          qemu_mutex_unlock_iothread();
>          if (s->migration_thread_running) {
> @@ -844,8 +844,8 @@ static void migrate_fd_cleanup(void *opaque)
>          qemu_mutex_lock_iothread();
>  
>          migrate_compress_threads_join();
> -        qemu_fclose(s->file);
> -        s->file = NULL;
> +        qemu_fclose(s->to_dst_file);
> +        s->to_dst_file = NULL;
>      }
>  
>      assert((s->state != MIGRATION_STATUS_ACTIVE) &&
> @@ -862,7 +862,7 @@ static void migrate_fd_cleanup(void *opaque)
>  void migrate_fd_error(MigrationState *s)
>  {
>      trace_migrate_fd_error();
> -    assert(s->file == NULL);
> +    assert(s->to_dst_file == NULL);
>      migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
>                        MIGRATION_STATUS_FAILED);
>      notifier_list_notify(&migration_state_notifiers, s);
> @@ -871,7 +871,7 @@ void migrate_fd_error(MigrationState *s)
>  static void migrate_fd_cancel(MigrationState *s)
>  {
>      int old_state ;
> -    QEMUFile *f = migrate_get_current()->file;
> +    QEMUFile *f = migrate_get_current()->to_dst_file;
>      trace_migrate_fd_cancel();
>  
>      if (s->rp_state.from_dst_file) {
> @@ -942,7 +942,7 @@ MigrationState *migrate_init(const MigrationParams *params)
>      s->bytes_xfer = 0;
>      s->xfer_limit = 0;
>      s->cleanup_bh = 0;
> -    s->file = NULL;
> +    s->to_dst_file = NULL;
>      s->state = MIGRATION_STATUS_NONE;
>      s->params = *params;
>      s->rp_state.from_dst_file = NULL;
> @@ -1122,8 +1122,9 @@ void qmp_migrate_set_speed(int64_t value, Error **errp)
>  
>      s = migrate_get_current();
>      s->bandwidth_limit = value;
> -    if (s->file) {
> -        qemu_file_set_rate_limit(s->file, s->bandwidth_limit / XFER_LIMIT_RATIO);
> +    if (s->to_dst_file) {
> +        qemu_file_set_rate_limit(s->to_dst_file,
> +                                 s->bandwidth_limit / XFER_LIMIT_RATIO);
>      }
>  }
>  
> @@ -1393,7 +1394,7 @@ out:
>  static int open_return_path_on_source(MigrationState *ms)
>  {
>  
> -    ms->rp_state.from_dst_file = qemu_file_get_return_path(ms->file);
> +    ms->rp_state.from_dst_file = qemu_file_get_return_path(ms->to_dst_file);
>      if (!ms->rp_state.from_dst_file) {
>          return -1;
>      }
> @@ -1415,7 +1416,7 @@ static int await_return_path_close_on_source(MigrationState *ms)
>       * rp_thread will exit, however if there's an error we need to cause
>       * it to exit.
>       */
> -    if (qemu_file_get_error(ms->file) && ms->rp_state.from_dst_file) {
> +    if (qemu_file_get_error(ms->to_dst_file) && ms->rp_state.from_dst_file) {
>          /*
>           * shutdown(2), if we have it, will cause it to unblock if it's stuck
>           * waiting for the destination.
> @@ -1458,7 +1459,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
>       * Cause any non-postcopiable, but iterative devices to
>       * send out their final data.
>       */
> -    qemu_savevm_state_complete_precopy(ms->file, true);
> +    qemu_savevm_state_complete_precopy(ms->to_dst_file, true);
>  
>      /*
>       * in Finish migrate and with the io-lock held everything should
> @@ -1476,9 +1477,9 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
>       * will notice we're in POSTCOPY_ACTIVE and not actually
>       * wrap their state up here
>       */
> -    qemu_file_set_rate_limit(ms->file, INT64_MAX);
> +    qemu_file_set_rate_limit(ms->to_dst_file, INT64_MAX);
>      /* Ping just for debugging, helps line traces up */
> -    qemu_savevm_send_ping(ms->file, 2);
> +    qemu_savevm_send_ping(ms->to_dst_file, 2);
>  
>      /*
>       * While loading the device state we may trigger page transfer
> @@ -1512,7 +1513,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
>      qsb = qemu_buf_get(fb);
>  
>      /* Now send that blob */
> -    if (qemu_savevm_send_packaged(ms->file, qsb)) {
> +    if (qemu_savevm_send_packaged(ms->to_dst_file, qsb)) {
>          goto fail_closefb;
>      }
>      qemu_fclose(fb);
> @@ -1524,9 +1525,9 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
>       * Although this ping is just for debug, it could potentially be
>       * used for getting a better measurement of downtime at the source.
>       */
> -    qemu_savevm_send_ping(ms->file, 4);
> +    qemu_savevm_send_ping(ms->to_dst_file, 4);
>  
> -    ret = qemu_file_get_error(ms->file);
> +    ret = qemu_file_get_error(ms->to_dst_file);
>      if (ret) {
>          error_report("postcopy_start: Migration stream errored");
>          migrate_set_state(&ms->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> @@ -1569,8 +1570,8 @@ static void migration_completion(MigrationState *s, int current_active_state,
>          if (!ret) {
>              ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
>              if (ret >= 0) {
> -                qemu_file_set_rate_limit(s->file, INT64_MAX);
> -                qemu_savevm_state_complete_precopy(s->file, false);
> +                qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
> +                qemu_savevm_state_complete_precopy(s->to_dst_file, false);
>              }
>          }
>          qemu_mutex_unlock_iothread();
> @@ -1581,7 +1582,7 @@ static void migration_completion(MigrationState *s, int current_active_state,
>      } else if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
>          trace_migration_completion_postcopy_end();
>  
> -        qemu_savevm_state_complete_postcopy(s->file);
> +        qemu_savevm_state_complete_postcopy(s->to_dst_file);
>          trace_migration_completion_postcopy_end_after_complete();
>      }
>  
> @@ -1602,7 +1603,7 @@ static void migration_completion(MigrationState *s, int current_active_state,
>          }
>      }
>  
> -    if (qemu_file_get_error(s->file)) {
> +    if (qemu_file_get_error(s->to_dst_file)) {
>          trace_migration_completion_file_err();
>          goto fail;
>      }
> @@ -1647,24 +1648,24 @@ static void *migration_thread(void *opaque)
>  
>      rcu_register_thread();
>  
> -    qemu_savevm_state_header(s->file);
> +    qemu_savevm_state_header(s->to_dst_file);
>  
>      if (migrate_postcopy_ram()) {
>          /* Now tell the dest that it should open its end so it can reply */
> -        qemu_savevm_send_open_return_path(s->file);
> +        qemu_savevm_send_open_return_path(s->to_dst_file);
>  
>          /* And do a ping that will make stuff easier to debug */
> -        qemu_savevm_send_ping(s->file, 1);
> +        qemu_savevm_send_ping(s->to_dst_file, 1);
>  
>          /*
>           * Tell the destination that we *might* want to do postcopy later;
>           * if the other end can't do postcopy it should fail now, nice and
>           * early.
>           */
> -        qemu_savevm_send_postcopy_advise(s->file);
> +        qemu_savevm_send_postcopy_advise(s->to_dst_file);
>      }
>  
> -    qemu_savevm_state_begin(s->file, &s->params);
> +    qemu_savevm_state_begin(s->to_dst_file, &s->params);
>  
>      s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
>      current_active_state = MIGRATION_STATUS_ACTIVE;
> @@ -1678,10 +1679,10 @@ static void *migration_thread(void *opaque)
>          int64_t current_time;
>          uint64_t pending_size;
>  
> -        if (!qemu_file_rate_limit(s->file)) {
> +        if (!qemu_file_rate_limit(s->to_dst_file)) {
>              uint64_t pend_post, pend_nonpost;
>  
> -            qemu_savevm_state_pending(s->file, max_size, &pend_nonpost,
> +            qemu_savevm_state_pending(s->to_dst_file, max_size, &pend_nonpost,
>                                        &pend_post);
>              pending_size = pend_nonpost + pend_post;
>              trace_migrate_pending(pending_size, max_size,
> @@ -1702,7 +1703,7 @@ static void *migration_thread(void *opaque)
>                      continue;
>                  }
>                  /* Just another iteration step */
> -                qemu_savevm_state_iterate(s->file, entered_postcopy);
> +                qemu_savevm_state_iterate(s->to_dst_file, entered_postcopy);
>              } else {
>                  trace_migration_thread_low_pending(pending_size);
>                  migration_completion(s, current_active_state,
> @@ -1711,7 +1712,7 @@ static void *migration_thread(void *opaque)
>              }
>          }
>  
> -        if (qemu_file_get_error(s->file)) {
> +        if (qemu_file_get_error(s->to_dst_file)) {
>              migrate_set_state(&s->state, current_active_state,
>                                MIGRATION_STATUS_FAILED);
>              trace_migration_thread_file_err();
> @@ -1719,7 +1720,8 @@ static void *migration_thread(void *opaque)
>          }
>          current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>          if (current_time >= initial_time + BUFFER_DELAY) {
> -            uint64_t transferred_bytes = qemu_ftell(s->file) - initial_bytes;
> +            uint64_t transferred_bytes = qemu_ftell(s->to_dst_file) -
> +                                         initial_bytes;
>              uint64_t time_spent = current_time - initial_time;
>              double bandwidth = transferred_bytes / time_spent;
>              max_size = bandwidth * migrate_max_downtime() / 1000000;
> @@ -1735,11 +1737,11 @@ static void *migration_thread(void *opaque)
>                  s->expected_downtime = s->dirty_bytes_rate / bandwidth;
>              }
>  
> -            qemu_file_reset_rate_limit(s->file);
> +            qemu_file_reset_rate_limit(s->to_dst_file);
>              initial_time = current_time;
> -            initial_bytes = qemu_ftell(s->file);
> +            initial_bytes = qemu_ftell(s->to_dst_file);
>          }
> -        if (qemu_file_rate_limit(s->file)) {
> +        if (qemu_file_rate_limit(s->to_dst_file)) {
>              /* usleep expects microseconds */
>              g_usleep((initial_time + BUFFER_DELAY - current_time)*1000);
>          }
> @@ -1757,7 +1759,7 @@ static void *migration_thread(void *opaque)
>          qemu_savevm_state_cleanup();
>      }
>      if (s->state == MIGRATION_STATUS_COMPLETED) {
> -        uint64_t transferred_bytes = qemu_ftell(s->file);
> +        uint64_t transferred_bytes = qemu_ftell(s->to_dst_file);
>          s->total_time = end_time - s->total_time;
>          if (!entered_postcopy) {
>              s->downtime = end_time - start_time;
> @@ -1794,7 +1796,7 @@ void migrate_fd_connect(MigrationState *s)
>      s->expected_downtime = max_downtime/1000000;
>      s->cleanup_bh = qemu_bh_new(migrate_fd_cleanup, s);
>  
> -    qemu_file_set_rate_limit(s->file,
> +    qemu_file_set_rate_limit(s->to_dst_file,
>                               s->bandwidth_limit / XFER_LIMIT_RATIO);
>  
>      /* Notify before starting migration thread */
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 22d6b18..f74016d 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -733,7 +733,8 @@ void postcopy_discard_send_range(MigrationState *ms, PostcopyDiscardState *pds,
>  
>      if (pds->cur_entry == MAX_DISCARDS_PER_COMMAND) {
>          /* Full set, ship it! */
> -        qemu_savevm_send_postcopy_ram_discard(ms->file, pds->ramblock_name,
> +        qemu_savevm_send_postcopy_ram_discard(ms->to_dst_file,
> +                                              pds->ramblock_name,
>                                                pds->cur_entry,
>                                                pds->start_list,
>                                                pds->length_list);
> @@ -753,7 +754,8 @@ void postcopy_discard_send_finish(MigrationState *ms, PostcopyDiscardState *pds)
>  {
>      /* Anything unsent? */
>      if (pds->cur_entry) {
> -        qemu_savevm_send_postcopy_ram_discard(ms->file, pds->ramblock_name,
> +        qemu_savevm_send_postcopy_ram_discard(ms->to_dst_file,
> +                                              pds->ramblock_name,
>                                                pds->cur_entry,
>                                                pds->start_list,
>                                                pds->length_list);
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 0ad1b93..f102870 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1163,7 +1163,7 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
>          .shared = 0
>      };
>      MigrationState *ms = migrate_init(&params);
> -    ms->file = f;
> +    ms->to_dst_file = f;
>  
>      if (qemu_savevm_state_blocked(errp)) {
>          return -EINVAL;
> diff --git a/migration/tcp.c b/migration/tcp.c
> index ae89172..e083d68 100644
> --- a/migration/tcp.c
> +++ b/migration/tcp.c
> @@ -39,11 +39,11 @@ static void tcp_wait_for_connect(int fd, Error *err, void *opaque)
>  
>      if (fd < 0) {
>          DPRINTF("migrate connect error: %s\n", error_get_pretty(err));
> -        s->file = NULL;
> +        s->to_dst_file = NULL;
>          migrate_fd_error(s);
>      } else {
>          DPRINTF("migrate connect success\n");
> -        s->file = qemu_fopen_socket(fd, "wb");
> +        s->to_dst_file = qemu_fopen_socket(fd, "wb");
>          migrate_fd_connect(s);
>      }
>  }
> diff --git a/migration/unix.c b/migration/unix.c
> index b591813..5492dd6 100644
> --- a/migration/unix.c
> +++ b/migration/unix.c
> @@ -39,11 +39,11 @@ static void unix_wait_for_connect(int fd, Error *err, void *opaque)
>  
>      if (fd < 0) {
>          DPRINTF("migrate connect error: %s\n", error_get_pretty(err));
> -        s->file = NULL;
> +        s->to_dst_file = NULL;
>          migrate_fd_error(s);
>      } else {
>          DPRINTF("migrate connect success\n");
> -        s->file = qemu_fopen_socket(fd, "wb");
> +        s->to_dst_file = qemu_fopen_socket(fd, "wb");
>          migrate_fd_connect(s);
>      }
>  }
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 09/39] COLO/migration: Create a new communication path from destination to source
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 09/39] COLO/migration: Create a new communication path from destination to source zhanghailiang
@ 2015-11-24 18:40   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 95+ messages in thread
From: Dr. David Alan Gilbert @ 2015-11-24 18:40 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> This new communication path will be used for returning messages
> from destination to source.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
> v11:
> - Rebase master to use qemu_file_get_return_path() for opening return path
> v10:
> - fix the the error log (Dave's suggestion).
> ---
>  migration/colo.c | 39 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 39 insertions(+)
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index 6880aa0..0ab9618 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -36,6 +36,15 @@ bool migration_incoming_in_colo_state(void)
>  
>  static void colo_process_checkpoint(MigrationState *s)
>  {
> +    int ret = 0;
> +
> +    s->rp_state.from_dst_file = qemu_file_get_return_path(s->to_dst_file);
> +    if (!s->rp_state.from_dst_file) {
> +        ret = -EINVAL;
> +        error_report("Open QEMUFile from_dst_file failed");
> +        goto out;
> +    }
> +
>      qemu_mutex_lock_iothread();
>      vm_start();
>      qemu_mutex_unlock_iothread();
> @@ -43,8 +52,16 @@ static void colo_process_checkpoint(MigrationState *s)
>  
>      /*TODO: COLO checkpoint savevm loop*/
>  
> +out:
> +    if (ret < 0) {
> +        error_report("%s: %s", __func__, strerror(-ret));
> +    }
>      migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
>                        MIGRATION_STATUS_COMPLETED);
> +
> +    if (s->rp_state.from_dst_file) {
> +        qemu_fclose(s->rp_state.from_dst_file);
> +    }
>  }
>  
>  void migrate_start_colo_process(MigrationState *s)
> @@ -59,12 +76,34 @@ void migrate_start_colo_process(MigrationState *s)
>  void *colo_process_incoming_thread(void *opaque)
>  {
>      MigrationIncomingState *mis = opaque;
> +    int ret = 0;
>  
>      migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
>                        MIGRATION_STATUS_COLO);
>  
> +    mis->to_src_file = qemu_file_get_return_path(mis->from_src_file);
> +    if (!mis->to_src_file) {
> +        ret = -EINVAL;
> +        error_report("colo incoming thread: Open QEMUFile to_src_file failed");
> +        goto out;
> +    }
> +    /* Note: We set the fd to unblocked in migration incoming coroutine,
> +    *  But here we are in the colo incoming thread, so it is ok to set the
> +    *  fd back to blocked.
> +    */
> +    qemu_set_block(qemu_get_fd(mis->from_src_file));
> +
>      /* TODO: COLO checkpoint restore loop */
>  
> +out:
> +    if (ret < 0) {
> +        error_report("colo incoming thread will exit, detect error: %s",
> +                     strerror(-ret));
> +    }
> +
> +    if (mis->to_src_file) {
> +        qemu_fclose(mis->to_src_file);
> +    }
>      migration_incoming_exit_colo();
>  
>      return NULL;
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 10/39] COLO: Implement colo checkpoint protocol
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 10/39] COLO: Implement colo checkpoint protocol zhanghailiang
@ 2015-11-24 19:00   ` Dr. David Alan Gilbert
  2015-11-25 14:01     ` Eric Blake
  2015-11-26  7:12     ` Hailiang Zhang
  0 siblings, 2 replies; 95+ messages in thread
From: Dr. David Alan Gilbert @ 2015-11-24 19:00 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> We need communications protocol of user-defined to control the checkpoint
> process.
> 
> The new checkpoint request is started by Primary VM, and the interactive process
> like below:
> Checkpoint synchronizing points,
> 
>                        Primary                         Secondary
>                                                        initial work
> 'checkpoint-ready'     <------------------------------ @
> 
> 'checkpoint-request'   @ ----------------------------->
>                                                        Suspend (Only in hybrid mode)
> 'checkpoint-reply'     <------------------------------ @
>                        Suspend&Save state
> 'vmstate-send'         @ ----------------------------->
>                        Send state                      Receive state
> 'vmstate-received'     <------------------------------ @
>                        Release packets                 Load state
> 'vmstate-load'         <------------------------------ @
>                        Resume                          Resume (Only in hybrid mode)
> 
>                        Start Comparing (Only in hybrid mode)
> NOTE:
>  1) '@' who sends the message
>  2) Every sync-point is synchronized by two sides with only
>     one handshake(single direction) for low-latency.
>     If more strict synchronization is required, a opposite direction
>     sync-point should be added.
>  3) Since sync-points are single direction, the remote side may
>     go forward a lot when this side just receives the sync-point.
>  4) For now, we only support 'periodic' checkpoint, for which
>    the Secondary VM is not running, later we will support 'hybrid' mode.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> Cc: Eric Blake <eblake@redhat.com>
> ---
> v11:
> - Add missing 'checkpoint-ready' communication in comment.
> - Use parameter to return 'value' for colo_ctl_get() (Dave's suggestion)
> - Fix trace for colo_ctl_get() to trace command and value both
> v10:
> - Rename enum COLOCmd to COLOCommand (Eric's suggestion).
> - Remove unused 'ram-steal'
> ---
>  migration/colo.c | 198 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  qapi-schema.json |  27 ++++++++
>  trace-events     |   2 +
>  3 files changed, 225 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index 0ab9618..c045d61 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -10,10 +10,12 @@
>   * later.  See the COPYING file in the top-level directory.
>   */
>  
> +#include <unistd.h>
>  #include "sysemu/sysemu.h"
>  #include "migration/colo.h"
>  #include "trace.h"
>  #include "qemu/error-report.h"
> +#include "qemu/sockets.h"
>  
>  bool colo_supported(void)
>  {
> @@ -34,9 +36,107 @@ bool migration_incoming_in_colo_state(void)
>      return mis && (mis->state == MIGRATION_STATUS_COLO);
>  }
>  
> +/* colo checkpoint control helper */
> +static int colo_ctl_put(QEMUFile *f, uint32_t cmd, uint64_t value)
> +{
> +    int ret = 0;
> +
> +    qemu_put_be32(f, cmd);
> +    qemu_put_be64(f, value);
> +    qemu_fflush(f);
> +
> +    ret = qemu_file_get_error(f);
> +    trace_colo_ctl_put(COLOCommand_lookup[cmd], value);
> +
> +    return ret;
> +}
> +
> +static int colo_ctl_get_cmd(QEMUFile *f, uint32_t *cmd)
> +{
> +    int ret = 0;
> +
> +    *cmd = qemu_get_be32(f);
> +    ret = qemu_file_get_error(f);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    if (*cmd >= COLO_COMMAND_MAX) {
> +        error_report("Invalid colo command, got cmd:%d", *cmd);
> +        return -EINVAL;
> +    }
> +
> +    return 0;
> +}
> +
> +static int colo_ctl_get(QEMUFile *f, uint32_t require, uint64_t *value)
> +{
> +    int ret;
> +    uint32_t cmd;
> +
> +    ret = colo_ctl_get_cmd(f, &cmd);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    if (cmd != require) {
> +        error_report("Unexpect colo command, expect:%d, but got cmd:%d",
> +                     require, cmd);

I think you need to use PRIu32 rather than %d  since they are uint32_t
(I doubt it will break on anything, but it's correct).
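
A standalone sketch of the suggested adjustment, with printf() standing in for
QEMU's error_report() so the snippet compiles on its own:

#include <inttypes.h>
#include <stdio.h>

static void report_unexpected_cmd(uint32_t require, uint32_t cmd)
{
    /* PRIu32 matches the uint32_t arguments; only the format specifiers
     * change relative to the patch above. */
    printf("Unexpect colo command, expect:%" PRIu32 ", but got cmd:%" PRIu32 "\n",
           require, cmd);
}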

> +        return -EINVAL;
> +    }
> +
> +    *value = qemu_get_be64(f);
> +    trace_colo_ctl_get(COLOCommand_lookup[cmd], *value);
> +    ret = qemu_file_get_error(f);
> +
> +    return ret;
> +}
> +
> +static int colo_do_checkpoint_transaction(MigrationState *s)
> +{
> +    int ret;
> +    uint64_t value;
> +
> +    ret = colo_ctl_put(s->to_dst_file, COLO_COMMAND_CHECKPOINT_REQUEST, 0);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    ret = colo_ctl_get(s->rp_state.from_dst_file,
> +                       COLO_COMMAND_CHECKPOINT_REPLY, &value);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    /* TODO: suspend and save vm state to colo buffer */
> +
> +    ret = colo_ctl_put(s->to_dst_file, COLO_COMMAND_VMSTATE_SEND, 0);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    /* TODO: send vmstate to Secondary */
> +
> +    ret = colo_ctl_get(s->rp_state.from_dst_file,
> +                       COLO_COMMAND_VMSTATE_RECEIVED, &value);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    ret = colo_ctl_get(s->rp_state.from_dst_file,
> +                       COLO_COMMAND_VMSTATE_LOADED, &value);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    /* TODO: resume Primary */
> +
> +out:
> +    return ret;
> +}
> +
>  static void colo_process_checkpoint(MigrationState *s)
>  {
>      int ret = 0;
> +    uint64_t value;
>  
>      s->rp_state.from_dst_file = qemu_file_get_return_path(s->to_dst_file);
>      if (!s->rp_state.from_dst_file) {
> @@ -45,12 +145,28 @@ static void colo_process_checkpoint(MigrationState *s)
>          goto out;
>      }
>  
> +    /*
> +     * Wait for Secondary finish loading vm states and enter COLO
> +     * restore.
> +     */
> +    ret = colo_ctl_get(s->rp_state.from_dst_file,
> +                       COLO_COMMAND_CHECKPOINT_READY, &value);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
>      qemu_mutex_lock_iothread();
>      vm_start();
>      qemu_mutex_unlock_iothread();
>      trace_colo_vm_state_change("stop", "run");
>  
> -    /*TODO: COLO checkpoint savevm loop*/
> +    while (s->state == MIGRATION_STATUS_COLO) {
> +        /* start a colo checkpoint */
> +        ret = colo_do_checkpoint_transaction(s);
> +        if (ret < 0) {
> +            goto out;
> +        }
> +    }
>  
>  out:
>      if (ret < 0) {
> @@ -73,10 +189,46 @@ void migrate_start_colo_process(MigrationState *s)
>      qemu_mutex_lock_iothread();
>  }
>  
> +/*
> + * return:
> + * 0: start a checkpoint
> + * -1: some error happened, exit colo restore
> + */
> +static int colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request)
> +{
> +    int ret;
> +    uint32_t cmd;
> +    uint64_t value;
> +
> +    ret = colo_ctl_get_cmd(f, &cmd);
> +    if (ret < 0) {
> +        /* do failover ? */
> +        return ret;
> +    }
> +    /* Fix me: this value should be 0, which is not so good,
> +     * should be used for checking ?
> +     */

You could use something simple like an incrementing value, if you just
wanted to make it a little more robust against problems.
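
Something like the following, say (a hypothetical sketch only;
'colo_checkpoint_seq' is an invented per-side counter, not part of this series):

    /* hypothetical counter, bumped once per control message on each side */
    static uint64_t colo_checkpoint_seq;

    /* sender side: colo_ctl_put(f, cmd, colo_checkpoint_seq++); */

    /* receiver side, replacing the 'value != 0' test: */
    value = qemu_get_be64(f);
    if (value != colo_checkpoint_seq) {
        error_report("Unexpected sequence %" PRIu64 ", expected %" PRIu64,
                     value, colo_checkpoint_seq);
        return -EINVAL;
    }
    colo_checkpoint_seq++;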

> +    value = qemu_get_be64(f);
> +    if (value != 0) {
> +        error_report("Got unexpected value %" PRIu64 " for '%s' command",
> +                     value, COLOCommand_lookup[cmd]);
> +        return -EINVAL;
> +    }
> +
> +    switch (cmd) {
> +    case COLO_COMMAND_CHECKPOINT_REQUEST:
> +        *checkpoint_request = 1;
> +        return 0;
> +    default:
> +        return -EINVAL;
> +    }
> +}
> +
>  void *colo_process_incoming_thread(void *opaque)
>  {
>      MigrationIncomingState *mis = opaque;
>      int ret = 0;
> +    uint64_t value;
>  
>      migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
>                        MIGRATION_STATUS_COLO);
> @@ -93,7 +245,49 @@ void *colo_process_incoming_thread(void *opaque)
>      */
>      qemu_set_block(qemu_get_fd(mis->from_src_file));
>  
> -    /* TODO: COLO checkpoint restore loop */
> +
> +    ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_CHECKPOINT_READY, 0);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    while (mis->state == MIGRATION_STATUS_COLO) {
> +        int request = 0;
> +        int ret = colo_wait_handle_cmd(mis->from_src_file, &request);
> +
> +        if (ret < 0) {
> +            break;
> +        } else {
> +            if (!request) {
> +                continue;
> +            }
> +        }
> +        /* FIXME: This is unnecessary for periodic checkpoint mode */
> +        ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_CHECKPOINT_REPLY, 0);
> +        if (ret < 0) {
> +            goto out;
> +        }
> +
> +        ret = colo_ctl_get(mis->from_src_file, COLO_COMMAND_VMSTATE_SEND,
> +                           &value);
> +        if (ret < 0) {
> +            goto out;
> +        }
> +
> +        /* TODO: read migration data into colo buffer */
> +
> +        ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_VMSTATE_RECEIVED, 0);
> +        if (ret < 0) {
> +            goto out;
> +        }
> +
> +        /* TODO: load vm state */
> +
> +        ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_VMSTATE_LOADED, 0);
> +        if (ret < 0) {
> +            goto out;
> +        }
> +    }
>  
>  out:
>      if (ret < 0) {
> diff --git a/qapi-schema.json b/qapi-schema.json
> index c2f3b63..48a87af 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -722,6 +722,33 @@
>  { 'command': 'migrate-start-postcopy' }
>  
>  ##
> +# @COLOCommand
> +#
> +# The commands for COLO fault tolerance
> +#
> +# @invalid: unknown command
> +#
> +# @checkpoint-ready: SVM is ready for checkpointing
> +#
> +# @checkpoint-request: PVM tells SVM to prepare for new checkpointing
> +#
> +# @checkpoint-reply: SVM gets PVM's checkpoint request
> +#
> +# @vmstate-send: VM's state will be sent by PVM.
> +#
> +# @vmstate-size: The total size of VMstate.
> +#
> +# @vmstate-received: VM's state has been received by SVM
> +#
> +# @vmstate-loaded: VM's state has been loaded by SVM
> +#
> +# Since: 2.6
> +##
> +{ 'enum': 'COLOCommand',
> +  'data': [ 'invalid', 'checkpoint-ready', 'checkpoint-request',
> +            'checkpoint-reply', 'vmstate-send', 'vmstate-size',
> +            'vmstate-received', 'vmstate-loaded' ] }
> +
>  # @MouseInfo:
>  #
>  # Information about a mouse device.
> diff --git a/trace-events b/trace-events
> index c98d473..f8a0959 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -1579,6 +1579,8 @@ postcopy_ram_incoming_cleanup_join(void) ""
>  
>  # migration/colo.c
>  colo_vm_state_change(const char *old, const char *new) "Change '%s' => '%s'"
> +colo_ctl_put(const char *msg, uint64_t value) "Send '%s' cmd, value: %" PRIu64""
> +colo_ctl_get(const char *msg, uint64_t value) "Receive '%s' cmd, value: %" PRIu64""

I don't think you need the "" at the end of the line
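
i.e. presumably just:

    colo_ctl_put(const char *msg, uint64_t value) "Send '%s' cmd, value: %" PRIu64
    colo_ctl_get(const char *msg, uint64_t value) "Receive '%s' cmd, value: %" PRIu64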

Dave

>  
>  # kvm-all.c
>  kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 07/39] migration: Integrate COLO checkpoint process into loadvm
  2015-11-24 18:14   ` Dr. David Alan Gilbert
@ 2015-11-25  6:39     ` zhanghailiang
  0 siblings, 0 replies; 95+ messages in thread
From: zhanghailiang @ 2015-11-25  6:39 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

On 2015/11/25 2:14, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> Switch from normal migration loadvm process into COLO checkpoint process if
>> COLO mode is enabled.
>> We add three new members to struct MigrationIncomingState, 'have_colo_incoming_thread'
>> and 'colo_incoming_thread' record the colo related threads for secondary VM,
>> 'migration_incoming_co' records the original migration incoming coroutine.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>
> Minor comment that needs fixing, see below, but otherwise:
>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>
>> ---
>> v11:
>> - We moved the place of bdrv_invalidate_cache_all(), but done the deleting work
>>    in other patch. Fix it.
>> - Add documentation for colo in 'MigrationStatus' (Eric's review comment)
>> v10:
>> - fix a bug about fd leak which is found by Dave.
>> ---
>>   include/migration/colo.h      |  7 +++++++
>>   include/migration/migration.h |  7 +++++++
>>   migration/colo-comm.c         | 10 ++++++++++
>>   migration/colo.c              | 22 ++++++++++++++++++++++
>>   migration/migration.c         | 31 +++++++++++++++++++++----------
>>   qapi-schema.json              |  2 ++
>>   stubs/migration-colo.c        | 10 ++++++++++
>>   7 files changed, 79 insertions(+), 10 deletions(-)
>>
>> diff --git a/include/migration/colo.h b/include/migration/colo.h
>> index f462f06..2676c4a 100644
>> --- a/include/migration/colo.h
>> +++ b/include/migration/colo.h
>> @@ -15,6 +15,8 @@
>>
>>   #include "qemu-common.h"
>>   #include "migration/migration.h"
>> +#include "qemu/coroutine_int.h"
>> +#include "qemu/thread.h"
>>
>>   bool colo_supported(void);
>>   void colo_info_mig_init(void);
>> @@ -22,4 +24,9 @@ void colo_info_mig_init(void);
>>   void migrate_start_colo_process(MigrationState *s);
>>   bool migration_in_colo_state(void);
>>
>> +/* loadvm */
>> +bool migration_incoming_enable_colo(void);
>> +void migration_incoming_exit_colo(void);
>> +void *colo_process_incoming_thread(void *opaque);
>> +bool migration_incoming_in_colo_state(void);
>>   #endif
>> diff --git a/include/migration/migration.h b/include/migration/migration.h
>> index 99dfa92..a57a734 100644
>> --- a/include/migration/migration.h
>> +++ b/include/migration/migration.h
>> @@ -22,6 +22,7 @@
>>   #include "migration/vmstate.h"
>>   #include "qapi-types.h"
>>   #include "exec/cpu-common.h"
>> +#include "qemu/coroutine_int.h"
>>
>>   #define QEMU_VM_FILE_MAGIC           0x5145564d
>>   #define QEMU_VM_FILE_VERSION_COMPAT  0x00000002
>> @@ -106,6 +107,12 @@ struct MigrationIncomingState {
>>       void     *postcopy_tmp_page;
>>
>>       int state;
>> +
>> +    bool have_colo_incoming_thread;
>> +    QemuThread colo_incoming_thread;
>> +    /* The coroutine we should enter (back) after failover */
>> +    Coroutine *migration_incoming_co;
>> +
>>       /* See savevm.c */
>>       LoadStateEntry_Head loadvm_handlers;
>>   };
>> diff --git a/migration/colo-comm.c b/migration/colo-comm.c
>> index fb407e0..30df3d3 100644
>> --- a/migration/colo-comm.c
>> +++ b/migration/colo-comm.c
>> @@ -48,3 +48,13 @@ void colo_info_mig_init(void)
>>   {
>>       vmstate_register(NULL, 0, &colo_state, &colo_info);
>>   }
>> +
>> +bool migration_incoming_enable_colo(void)
>> +{
>> +    return colo_info.colo_requested;
>> +}
>> +
>> +void migration_incoming_exit_colo(void)
>> +{
>> +    colo_info.colo_requested = 0;
>> +}
>> diff --git a/migration/colo.c b/migration/colo.c
>> index cf0ccb8..6880aa0 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -13,6 +13,7 @@
>>   #include "sysemu/sysemu.h"
>>   #include "migration/colo.h"
>>   #include "trace.h"
>> +#include "qemu/error-report.h"
>>
>>   bool colo_supported(void)
>>   {
>> @@ -26,6 +27,13 @@ bool migration_in_colo_state(void)
>>       return (s->state == MIGRATION_STATUS_COLO);
>>   }
>>
>> +bool migration_incoming_in_colo_state(void)
>> +{
>> +    MigrationIncomingState *mis = migration_incoming_get_current();
>> +
>> +    return mis && (mis->state == MIGRATION_STATUS_COLO);
>> +}
>> +
>>   static void colo_process_checkpoint(MigrationState *s)
>>   {
>>       qemu_mutex_lock_iothread();
>> @@ -47,3 +55,17 @@ void migrate_start_colo_process(MigrationState *s)
>>       colo_process_checkpoint(s);
>>       qemu_mutex_lock_iothread();
>>   }
>> +
>> +void *colo_process_incoming_thread(void *opaque)
>> +{
>> +    MigrationIncomingState *mis = opaque;
>> +
>> +    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
>> +                      MIGRATION_STATUS_COLO);
>> +
>> +    /* TODO: COLO checkpoint restore loop */
>> +
>> +    migration_incoming_exit_colo();
>> +
>> +    return NULL;
>> +}
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 46fe8a9..41eac0d 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -358,6 +358,27 @@ static void process_incoming_migration_co(void *opaque)
>>           /* Else if something went wrong then just fall out of the normal exit */
>>       }
>>
>> +    if (!ret) {
>> +        /* Make sure all file formats flush their mutable metadata */
>> +        bdrv_invalidate_cache_all(&local_err);
>> +        if (local_err) {
>> +            error_report_err(local_err);
>> +            migrate_decompress_threads_join();
>> +            exit(EXIT_FAILURE);
>> +        }
>> +    }
>> +    /* we get colo info, and know if we are in colo mode */
>> +    if (!ret && migration_incoming_enable_colo()) {
>> +        mis->migration_incoming_co = qemu_coroutine_self();
>> +        qemu_thread_create(&mis->colo_incoming_thread, "colo incoming",
>> +             colo_process_incoming_thread, mis, QEMU_THREAD_JOINABLE);
>> +        mis->have_colo_incoming_thread = true;
>> +        qemu_coroutine_yield();
>> +
>> +        /* Wait checkpoint incoming thread exit before free resource */
>> +        qemu_thread_join(&mis->colo_incoming_thread);
>> +    }
>> +
>>       qemu_fclose(f);
>>       free_xbzrle_decoded_buf();
>>       migration_incoming_state_destroy();
>> @@ -370,16 +391,6 @@ static void process_incoming_migration_co(void *opaque)
>>           exit(EXIT_FAILURE);
>>       }
>>
>> -    /* Make sure all file formats flush their mutable metadata */
>> -    bdrv_invalidate_cache_all(&local_err);
>> -    if (local_err) {
>> -        migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
>> -                          MIGRATION_STATUS_FAILED);
>> -        error_report_err(local_err);
>> -        migrate_decompress_threads_join();
>> -        exit(EXIT_FAILURE);
>> -    }
>> -
>>       /*
>>        * This must happen after all error conditions are dealt with and
>>        * we're sure the VM is going to be running on this host.
>> diff --git a/qapi-schema.json b/qapi-schema.json
>> index 24b35f3..c2f3b63 100644
>> --- a/qapi-schema.json
>> +++ b/qapi-schema.json
>> @@ -438,6 +438,8 @@
>>   #
>>   # @failed: some error occurred during migration process.
>>   #
>> +# @colo: VM is in the process of fault tolerance. (since 2.6)
>> +#
>>   # Since: 2.3
>>   #
>>   ##
>
> That belongs in the previous patch I think; the previous patch
> added before the @failed, but this change is correct, it just
> needs to go with the actual addition.
>

Good catch, I will fix it in the next version.

Thanks.
zhanghailiang

>
>> diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
>> index acddca6..c12516e 100644
>> --- a/stubs/migration-colo.c
>> +++ b/stubs/migration-colo.c
>> @@ -22,6 +22,16 @@ bool migration_in_colo_state(void)
>>       return false;
>>   }
>>
>> +bool migration_incoming_in_colo_state(void)
>> +{
>> +    return false;
>> +}
>> +
>>   void migrate_start_colo_process(MigrationState *s)
>>   {
>>   }
>> +
>> +void *colo_process_incoming_thread(void *opaque)
>> +{
>> +    return NULL;
>> +}
>> --
>> 1.8.3.1
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 08/39] migration: Rename the'file' member of MigrationState
  2015-11-24 18:26   ` Dr. David Alan Gilbert
@ 2015-11-25  6:48     ` zhanghailiang
  0 siblings, 0 replies; 95+ messages in thread
From: zhanghailiang @ 2015-11-25  6:48 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

On 2015/11/25 2:26, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> Rename the 'file' member of MigrationState to 'to_dst_file'.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> ---
>> v11:
>> - Only rename 'file' member of MigrationState
>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>
> You should be able to post this one separately.
>

OK, I will post this patch together with the other independent migration
patches as a prerequisite series. Thanks.

> Dave
>
>> ---
>>   include/migration/migration.h |  2 +-
>>   migration/exec.c              |  4 +--
>>   migration/fd.c                |  4 +--
>>   migration/migration.c         | 72 ++++++++++++++++++++++---------------------
>>   migration/postcopy-ram.c      |  6 ++--
>>   migration/savevm.c            |  2 +-
>>   migration/tcp.c               |  4 +--
>>   migration/unix.c              |  4 +--
>>   8 files changed, 51 insertions(+), 47 deletions(-)
>>
>> diff --git a/include/migration/migration.h b/include/migration/migration.h
>> index a57a734..ba5bcec 100644
>> --- a/include/migration/migration.h
>> +++ b/include/migration/migration.h
>> @@ -140,7 +140,7 @@ struct MigrationState
>>       size_t xfer_limit;
>>       QemuThread thread;
>>       QEMUBH *cleanup_bh;
>> -    QEMUFile *file;
>> +    QEMUFile *to_dst_file;
>>       int parameters[MIGRATION_PARAMETER_MAX];
>>
>>       int state;
>> diff --git a/migration/exec.c b/migration/exec.c
>> index 8406d2b..9037109 100644
>> --- a/migration/exec.c
>> +++ b/migration/exec.c
>> @@ -36,8 +36,8 @@
>>
>>   void exec_start_outgoing_migration(MigrationState *s, const char *command, Error **errp)
>>   {
>> -    s->file = qemu_popen_cmd(command, "w");
>> -    if (s->file == NULL) {
>> +    s->to_dst_file = qemu_popen_cmd(command, "w");
>> +    if (s->to_dst_file == NULL) {
>>           error_setg_errno(errp, errno, "failed to popen the migration target");
>>           return;
>>       }
>> diff --git a/migration/fd.c b/migration/fd.c
>> index 3e4bed0..9a9d6c5 100644
>> --- a/migration/fd.c
>> +++ b/migration/fd.c
>> @@ -50,9 +50,9 @@ void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **
>>       }
>>
>>       if (fd_is_socket(fd)) {
>> -        s->file = qemu_fopen_socket(fd, "wb");
>> +        s->to_dst_file = qemu_fopen_socket(fd, "wb");
>>       } else {
>> -        s->file = qemu_fdopen(fd, "wb");
>> +        s->to_dst_file = qemu_fdopen(fd, "wb");
>>       }
>>
>>       migrate_fd_connect(s);
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 41eac0d..a4c690d 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -834,7 +834,7 @@ static void migrate_fd_cleanup(void *opaque)
>>
>>       flush_page_queue(s);
>>
>> -    if (s->file) {
>> +    if (s->to_dst_file) {
>>           trace_migrate_fd_cleanup();
>>           qemu_mutex_unlock_iothread();
>>           if (s->migration_thread_running) {
>> @@ -844,8 +844,8 @@ static void migrate_fd_cleanup(void *opaque)
>>           qemu_mutex_lock_iothread();
>>
>>           migrate_compress_threads_join();
>> -        qemu_fclose(s->file);
>> -        s->file = NULL;
>> +        qemu_fclose(s->to_dst_file);
>> +        s->to_dst_file = NULL;
>>       }
>>
>>       assert((s->state != MIGRATION_STATUS_ACTIVE) &&
>> @@ -862,7 +862,7 @@ static void migrate_fd_cleanup(void *opaque)
>>   void migrate_fd_error(MigrationState *s)
>>   {
>>       trace_migrate_fd_error();
>> -    assert(s->file == NULL);
>> +    assert(s->to_dst_file == NULL);
>>       migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
>>                         MIGRATION_STATUS_FAILED);
>>       notifier_list_notify(&migration_state_notifiers, s);
>> @@ -871,7 +871,7 @@ void migrate_fd_error(MigrationState *s)
>>   static void migrate_fd_cancel(MigrationState *s)
>>   {
>>       int old_state ;
>> -    QEMUFile *f = migrate_get_current()->file;
>> +    QEMUFile *f = migrate_get_current()->to_dst_file;
>>       trace_migrate_fd_cancel();
>>
>>       if (s->rp_state.from_dst_file) {
>> @@ -942,7 +942,7 @@ MigrationState *migrate_init(const MigrationParams *params)
>>       s->bytes_xfer = 0;
>>       s->xfer_limit = 0;
>>       s->cleanup_bh = 0;
>> -    s->file = NULL;
>> +    s->to_dst_file = NULL;
>>       s->state = MIGRATION_STATUS_NONE;
>>       s->params = *params;
>>       s->rp_state.from_dst_file = NULL;
>> @@ -1122,8 +1122,9 @@ void qmp_migrate_set_speed(int64_t value, Error **errp)
>>
>>       s = migrate_get_current();
>>       s->bandwidth_limit = value;
>> -    if (s->file) {
>> -        qemu_file_set_rate_limit(s->file, s->bandwidth_limit / XFER_LIMIT_RATIO);
>> +    if (s->to_dst_file) {
>> +        qemu_file_set_rate_limit(s->to_dst_file,
>> +                                 s->bandwidth_limit / XFER_LIMIT_RATIO);
>>       }
>>   }
>>
>> @@ -1393,7 +1394,7 @@ out:
>>   static int open_return_path_on_source(MigrationState *ms)
>>   {
>>
>> -    ms->rp_state.from_dst_file = qemu_file_get_return_path(ms->file);
>> +    ms->rp_state.from_dst_file = qemu_file_get_return_path(ms->to_dst_file);
>>       if (!ms->rp_state.from_dst_file) {
>>           return -1;
>>       }
>> @@ -1415,7 +1416,7 @@ static int await_return_path_close_on_source(MigrationState *ms)
>>        * rp_thread will exit, however if there's an error we need to cause
>>        * it to exit.
>>        */
>> -    if (qemu_file_get_error(ms->file) && ms->rp_state.from_dst_file) {
>> +    if (qemu_file_get_error(ms->to_dst_file) && ms->rp_state.from_dst_file) {
>>           /*
>>            * shutdown(2), if we have it, will cause it to unblock if it's stuck
>>            * waiting for the destination.
>> @@ -1458,7 +1459,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
>>        * Cause any non-postcopiable, but iterative devices to
>>        * send out their final data.
>>        */
>> -    qemu_savevm_state_complete_precopy(ms->file, true);
>> +    qemu_savevm_state_complete_precopy(ms->to_dst_file, true);
>>
>>       /*
>>        * in Finish migrate and with the io-lock held everything should
>> @@ -1476,9 +1477,9 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
>>        * will notice we're in POSTCOPY_ACTIVE and not actually
>>        * wrap their state up here
>>        */
>> -    qemu_file_set_rate_limit(ms->file, INT64_MAX);
>> +    qemu_file_set_rate_limit(ms->to_dst_file, INT64_MAX);
>>       /* Ping just for debugging, helps line traces up */
>> -    qemu_savevm_send_ping(ms->file, 2);
>> +    qemu_savevm_send_ping(ms->to_dst_file, 2);
>>
>>       /*
>>        * While loading the device state we may trigger page transfer
>> @@ -1512,7 +1513,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
>>       qsb = qemu_buf_get(fb);
>>
>>       /* Now send that blob */
>> -    if (qemu_savevm_send_packaged(ms->file, qsb)) {
>> +    if (qemu_savevm_send_packaged(ms->to_dst_file, qsb)) {
>>           goto fail_closefb;
>>       }
>>       qemu_fclose(fb);
>> @@ -1524,9 +1525,9 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
>>        * Although this ping is just for debug, it could potentially be
>>        * used for getting a better measurement of downtime at the source.
>>        */
>> -    qemu_savevm_send_ping(ms->file, 4);
>> +    qemu_savevm_send_ping(ms->to_dst_file, 4);
>>
>> -    ret = qemu_file_get_error(ms->file);
>> +    ret = qemu_file_get_error(ms->to_dst_file);
>>       if (ret) {
>>           error_report("postcopy_start: Migration stream errored");
>>           migrate_set_state(&ms->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
>> @@ -1569,8 +1570,8 @@ static void migration_completion(MigrationState *s, int current_active_state,
>>           if (!ret) {
>>               ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
>>               if (ret >= 0) {
>> -                qemu_file_set_rate_limit(s->file, INT64_MAX);
>> -                qemu_savevm_state_complete_precopy(s->file, false);
>> +                qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
>> +                qemu_savevm_state_complete_precopy(s->to_dst_file, false);
>>               }
>>           }
>>           qemu_mutex_unlock_iothread();
>> @@ -1581,7 +1582,7 @@ static void migration_completion(MigrationState *s, int current_active_state,
>>       } else if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
>>           trace_migration_completion_postcopy_end();
>>
>> -        qemu_savevm_state_complete_postcopy(s->file);
>> +        qemu_savevm_state_complete_postcopy(s->to_dst_file);
>>           trace_migration_completion_postcopy_end_after_complete();
>>       }
>>
>> @@ -1602,7 +1603,7 @@ static void migration_completion(MigrationState *s, int current_active_state,
>>           }
>>       }
>>
>> -    if (qemu_file_get_error(s->file)) {
>> +    if (qemu_file_get_error(s->to_dst_file)) {
>>           trace_migration_completion_file_err();
>>           goto fail;
>>       }
>> @@ -1647,24 +1648,24 @@ static void *migration_thread(void *opaque)
>>
>>       rcu_register_thread();
>>
>> -    qemu_savevm_state_header(s->file);
>> +    qemu_savevm_state_header(s->to_dst_file);
>>
>>       if (migrate_postcopy_ram()) {
>>           /* Now tell the dest that it should open its end so it can reply */
>> -        qemu_savevm_send_open_return_path(s->file);
>> +        qemu_savevm_send_open_return_path(s->to_dst_file);
>>
>>           /* And do a ping that will make stuff easier to debug */
>> -        qemu_savevm_send_ping(s->file, 1);
>> +        qemu_savevm_send_ping(s->to_dst_file, 1);
>>
>>           /*
>>            * Tell the destination that we *might* want to do postcopy later;
>>            * if the other end can't do postcopy it should fail now, nice and
>>            * early.
>>            */
>> -        qemu_savevm_send_postcopy_advise(s->file);
>> +        qemu_savevm_send_postcopy_advise(s->to_dst_file);
>>       }
>>
>> -    qemu_savevm_state_begin(s->file, &s->params);
>> +    qemu_savevm_state_begin(s->to_dst_file, &s->params);
>>
>>       s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
>>       current_active_state = MIGRATION_STATUS_ACTIVE;
>> @@ -1678,10 +1679,10 @@ static void *migration_thread(void *opaque)
>>           int64_t current_time;
>>           uint64_t pending_size;
>>
>> -        if (!qemu_file_rate_limit(s->file)) {
>> +        if (!qemu_file_rate_limit(s->to_dst_file)) {
>>               uint64_t pend_post, pend_nonpost;
>>
>> -            qemu_savevm_state_pending(s->file, max_size, &pend_nonpost,
>> +            qemu_savevm_state_pending(s->to_dst_file, max_size, &pend_nonpost,
>>                                         &pend_post);
>>               pending_size = pend_nonpost + pend_post;
>>               trace_migrate_pending(pending_size, max_size,
>> @@ -1702,7 +1703,7 @@ static void *migration_thread(void *opaque)
>>                       continue;
>>                   }
>>                   /* Just another iteration step */
>> -                qemu_savevm_state_iterate(s->file, entered_postcopy);
>> +                qemu_savevm_state_iterate(s->to_dst_file, entered_postcopy);
>>               } else {
>>                   trace_migration_thread_low_pending(pending_size);
>>                   migration_completion(s, current_active_state,
>> @@ -1711,7 +1712,7 @@ static void *migration_thread(void *opaque)
>>               }
>>           }
>>
>> -        if (qemu_file_get_error(s->file)) {
>> +        if (qemu_file_get_error(s->to_dst_file)) {
>>               migrate_set_state(&s->state, current_active_state,
>>                                 MIGRATION_STATUS_FAILED);
>>               trace_migration_thread_file_err();
>> @@ -1719,7 +1720,8 @@ static void *migration_thread(void *opaque)
>>           }
>>           current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>>           if (current_time >= initial_time + BUFFER_DELAY) {
>> -            uint64_t transferred_bytes = qemu_ftell(s->file) - initial_bytes;
>> +            uint64_t transferred_bytes = qemu_ftell(s->to_dst_file) -
>> +                                         initial_bytes;
>>               uint64_t time_spent = current_time - initial_time;
>>               double bandwidth = transferred_bytes / time_spent;
>>               max_size = bandwidth * migrate_max_downtime() / 1000000;
>> @@ -1735,11 +1737,11 @@ static void *migration_thread(void *opaque)
>>                   s->expected_downtime = s->dirty_bytes_rate / bandwidth;
>>               }
>>
>> -            qemu_file_reset_rate_limit(s->file);
>> +            qemu_file_reset_rate_limit(s->to_dst_file);
>>               initial_time = current_time;
>> -            initial_bytes = qemu_ftell(s->file);
>> +            initial_bytes = qemu_ftell(s->to_dst_file);
>>           }
>> -        if (qemu_file_rate_limit(s->file)) {
>> +        if (qemu_file_rate_limit(s->to_dst_file)) {
>>               /* usleep expects microseconds */
>>               g_usleep((initial_time + BUFFER_DELAY - current_time)*1000);
>>           }
>> @@ -1757,7 +1759,7 @@ static void *migration_thread(void *opaque)
>>           qemu_savevm_state_cleanup();
>>       }
>>       if (s->state == MIGRATION_STATUS_COMPLETED) {
>> -        uint64_t transferred_bytes = qemu_ftell(s->file);
>> +        uint64_t transferred_bytes = qemu_ftell(s->to_dst_file);
>>           s->total_time = end_time - s->total_time;
>>           if (!entered_postcopy) {
>>               s->downtime = end_time - start_time;
>> @@ -1794,7 +1796,7 @@ void migrate_fd_connect(MigrationState *s)
>>       s->expected_downtime = max_downtime/1000000;
>>       s->cleanup_bh = qemu_bh_new(migrate_fd_cleanup, s);
>>
>> -    qemu_file_set_rate_limit(s->file,
>> +    qemu_file_set_rate_limit(s->to_dst_file,
>>                                s->bandwidth_limit / XFER_LIMIT_RATIO);
>>
>>       /* Notify before starting migration thread */
>> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
>> index 22d6b18..f74016d 100644
>> --- a/migration/postcopy-ram.c
>> +++ b/migration/postcopy-ram.c
>> @@ -733,7 +733,8 @@ void postcopy_discard_send_range(MigrationState *ms, PostcopyDiscardState *pds,
>>
>>       if (pds->cur_entry == MAX_DISCARDS_PER_COMMAND) {
>>           /* Full set, ship it! */
>> -        qemu_savevm_send_postcopy_ram_discard(ms->file, pds->ramblock_name,
>> +        qemu_savevm_send_postcopy_ram_discard(ms->to_dst_file,
>> +                                              pds->ramblock_name,
>>                                                 pds->cur_entry,
>>                                                 pds->start_list,
>>                                                 pds->length_list);
>> @@ -753,7 +754,8 @@ void postcopy_discard_send_finish(MigrationState *ms, PostcopyDiscardState *pds)
>>   {
>>       /* Anything unsent? */
>>       if (pds->cur_entry) {
>> -        qemu_savevm_send_postcopy_ram_discard(ms->file, pds->ramblock_name,
>> +        qemu_savevm_send_postcopy_ram_discard(ms->to_dst_file,
>> +                                              pds->ramblock_name,
>>                                                 pds->cur_entry,
>>                                                 pds->start_list,
>>                                                 pds->length_list);
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index 0ad1b93..f102870 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -1163,7 +1163,7 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
>>           .shared = 0
>>       };
>>       MigrationState *ms = migrate_init(&params);
>> -    ms->file = f;
>> +    ms->to_dst_file = f;
>>
>>       if (qemu_savevm_state_blocked(errp)) {
>>           return -EINVAL;
>> diff --git a/migration/tcp.c b/migration/tcp.c
>> index ae89172..e083d68 100644
>> --- a/migration/tcp.c
>> +++ b/migration/tcp.c
>> @@ -39,11 +39,11 @@ static void tcp_wait_for_connect(int fd, Error *err, void *opaque)
>>
>>       if (fd < 0) {
>>           DPRINTF("migrate connect error: %s\n", error_get_pretty(err));
>> -        s->file = NULL;
>> +        s->to_dst_file = NULL;
>>           migrate_fd_error(s);
>>       } else {
>>           DPRINTF("migrate connect success\n");
>> -        s->file = qemu_fopen_socket(fd, "wb");
>> +        s->to_dst_file = qemu_fopen_socket(fd, "wb");
>>           migrate_fd_connect(s);
>>       }
>>   }
>> diff --git a/migration/unix.c b/migration/unix.c
>> index b591813..5492dd6 100644
>> --- a/migration/unix.c
>> +++ b/migration/unix.c
>> @@ -39,11 +39,11 @@ static void unix_wait_for_connect(int fd, Error *err, void *opaque)
>>
>>       if (fd < 0) {
>>           DPRINTF("migrate connect error: %s\n", error_get_pretty(err));
>> -        s->file = NULL;
>> +        s->to_dst_file = NULL;
>>           migrate_fd_error(s);
>>       } else {
>>           DPRINTF("migrate connect success\n");
>> -        s->file = qemu_fopen_socket(fd, "wb");
>> +        s->to_dst_file = qemu_fopen_socket(fd, "wb");
>>           migrate_fd_connect(s);
>>       }
>>   }
>> --
>> 1.8.3.1
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 10/39] COLO: Implement colo checkpoint protocol
  2015-11-24 19:00   ` Dr. David Alan Gilbert
@ 2015-11-25 14:01     ` Eric Blake
  2015-11-26  6:52       ` Hailiang Zhang
  2015-11-26  7:12     ` Hailiang Zhang
  1 sibling, 1 reply; 95+ messages in thread
From: Eric Blake @ 2015-11-25 14:01 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

On 11/24/2015 12:00 PM, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> We need communications protocol of user-defined to control the checkpoint
>> process.
>>

>> +static int colo_ctl_get(QEMUFile *f, uint32_t require, uint64_t *value)
>> +{
>> +    int ret;
>> +    uint32_t cmd;
>> +
>> +    ret = colo_ctl_get_cmd(f, &cmd);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +    if (cmd != require) {
>> +        error_report("Unexpect colo command, expect:%d, but got cmd:%d",
>> +                     require, cmd);
> 
> I think you need to use PRIu32 rather than %d  since they are uint32_t
> (I doubt it will break on anything, but it's correct).

32-bit cygwin uses 'long' for uint32_t, so there ARE platforms where it
is absolutely necessary to use PRIu32 for printing (even if qemu isn't
ported to cygwin).


>> +++ b/qapi-schema.json
>> @@ -722,6 +722,33 @@
>>  { 'command': 'migrate-start-postcopy' }
>>  
>>  ##
>> +# @COLOCommand
>> +#
>> +# The commands for COLO fault tolerance
>> +#
>> +# @invalid: unknown command
>> +#
>> +# @checkpoint-ready: SVM is ready for checkpointing
>> +#
>> +# @checkpoint-request: PVM tells SVM to prepare for new checkpointing
>> +#
>> +# @checkpoint-reply: SVM gets PVM's checkpoint request
>> +#
>> +# @vmstate-send: VM's state will be sent by PVM.
>> +#
>> +# @vmstate-size: The total size of VMstate.
>> +#
>> +# @vmstate-received: VM's state has been received by SVM
>> +#
>> +# @vmstate-loaded: VM's state has been loaded by SVM

Inconsistent use of trailing '.'

Also, do we really need 'invalid'?
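
If 'invalid' does go away, the enum would presumably end up as something like
this (illustration only; note the on-wire numbering of the commands would shift):

    { 'enum': 'COLOCommand',
      'data': [ 'checkpoint-ready', 'checkpoint-request', 'checkpoint-reply',
                'vmstate-send', 'vmstate-size', 'vmstate-received',
                'vmstate-loaded' ] }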

>> +#
>> +# Since: 2.6
>> +##
>> +{ 'enum': 'COLOCommand',
>> +  'data': [ 'invalid', 'checkpoint-ready', 'checkpoint-request',
>> +            'checkpoint-reply', 'vmstate-send', 'vmstate-size',
>> +            'vmstate-received', 'vmstate-loaded' ] }
>> +


-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 10/39] COLO: Implement colo checkpoint protocol
  2015-11-25 14:01     ` Eric Blake
@ 2015-11-26  6:52       ` Hailiang Zhang
  0 siblings, 0 replies; 95+ messages in thread
From: Hailiang Zhang @ 2015-11-26  6:52 UTC (permalink / raw)
  To: Eric Blake, Dr. David Alan Gilbert
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

On 2015/11/25 22:01, Eric Blake wrote:
> On 11/24/2015 12:00 PM, Dr. David Alan Gilbert wrote:
>> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>>> We need communications protocol of user-defined to control the checkpoint
>>> process.
>>>
>
>>> +static int colo_ctl_get(QEMUFile *f, uint32_t require, uint64_t *value)
>>> +{
>>> +    int ret;
>>> +    uint32_t cmd;
>>> +
>>> +    ret = colo_ctl_get_cmd(f, &cmd);
>>> +    if (ret < 0) {
>>> +        return ret;
>>> +    }
>>> +    if (cmd != require) {
>>> +        error_report("Unexpect colo command, expect:%d, but got cmd:%d",
>>> +                     require, cmd);
>>
>> I think you need to use PRIu32 rather than %d  since they are uint32_t
>> (I doubt it will break on anything, but it's correct).
>
> 32-bit cygwin uses 'long' for uint32_t, so there ARE platforms where it
> is absolutely necessary to use PRIu32 for printing (even if qemu isn't
> ported to cygwin).
>

Got it, I will fix it.

>
>>> +++ b/qapi-schema.json
>>> @@ -722,6 +722,33 @@
>>>   { 'command': 'migrate-start-postcopy' }
>>>
>>>   ##
>>> +# @COLOCommand
>>> +#
>>> +# The commands for COLO fault tolerance
>>> +#
>>> +# @invalid: unknown command
>>> +#
>>> +# @checkpoint-ready: SVM is ready for checkpointing
>>> +#
>>> +# @checkpoint-request: PVM tells SVM to prepare for new checkpointing
>>> +#
>>> +# @checkpoint-reply: SVM gets PVM's checkpoint request
>>> +#
>>> +# @vmstate-send: VM's state will be sent by PVM.
>>> +#
>>> +# @vmstate-size: The total size of VMstate.
>>> +#
>>> +# @vmstate-received: VM's state has been received by SVM
>>> +#
>>> +# @vmstate-loaded: VM's state has been loaded by SVM
>
> Inconsistent use of trailing '.'
>
> Also, do we really need 'invalid'?
>

No, we can remove it. Thanks.

>>> +#
>>> +# Since: 2.6
>>> +##
>>> +{ 'enum': 'COLOCommand',
>>> +  'data': [ 'invalid', 'checkpoint-ready', 'checkpoint-request',
>>> +            'checkpoint-reply', 'vmstate-send', 'vmstate-size',
>>> +            'vmstate-received', 'vmstate-loaded' ] }
>>> +
>
>

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 10/39] COLO: Implement colo checkpoint protocol
  2015-11-24 19:00   ` Dr. David Alan Gilbert
  2015-11-25 14:01     ` Eric Blake
@ 2015-11-26  7:12     ` Hailiang Zhang
  1 sibling, 0 replies; 95+ messages in thread
From: Hailiang Zhang @ 2015-11-26  7:12 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

On 2015/11/25 3:00, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> We need communications protocol of user-defined to control the checkpoint
>> process.
>>
>> The new checkpoint request is started by Primary VM, and the interactive process
>> like below:
>> Checkpoint synchronizing points,
>>
>>                         Primary                         Secondary
>>                                                         initial work
>> 'checkpoint-ready'     <------------------------------ @
>>
>> 'checkpoint-request'   @ ----------------------------->
>>                                                         Suspend (Only in hybrid mode)
>> 'checkpoint-reply'     <------------------------------ @
>>                         Suspend&Save state
>> 'vmstate-send'         @ ----------------------------->
>>                         Send state                      Receive state
>> 'vmstate-received'     <------------------------------ @
>>                         Release packets                 Load state
>> 'vmstate-load'         <------------------------------ @
>>                         Resume                          Resume (Only in hybrid mode)
>>
>>                         Start Comparing (Only in hybrid mode)
>> NOTE:
>>   1) '@' who sends the message
>>   2) Every sync-point is synchronized by two sides with only
>>      one handshake(single direction) for low-latency.
>>      If more strict synchronization is required, a opposite direction
>>      sync-point should be added.
>>   3) Since sync-points are single direction, the remote side may
>>      go forward a lot when this side just receives the sync-point.
>>   4) For now, we only support 'periodic' checkpoint, for which
>>     the Secondary VM is not running, later we will support 'hybrid' mode.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> Cc: Eric Blake <eblake@redhat.com>
>> ---
>> v11:
>> - Add missing 'checkpoint-ready' communication in comment.
>> - Use parameter to return 'value' for colo_ctl_get() (Dave's suggestion)
>> - Fix trace for colo_ctl_get() to trace command and value both
>> v10:
>> - Rename enum COLOCmd to COLOCommand (Eric's suggestion).
>> - Remove unused 'ram-steal'
>> ---
>>   migration/colo.c | 198 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>   qapi-schema.json |  27 ++++++++
>>   trace-events     |   2 +
>>   3 files changed, 225 insertions(+), 2 deletions(-)
>>
>> diff --git a/migration/colo.c b/migration/colo.c
>> index 0ab9618..c045d61 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -10,10 +10,12 @@
>>    * later.  See the COPYING file in the top-level directory.
>>    */
>>
>> +#include <unistd.h>
>>   #include "sysemu/sysemu.h"
>>   #include "migration/colo.h"
>>   #include "trace.h"
>>   #include "qemu/error-report.h"
>> +#include "qemu/sockets.h"
>>
>>   bool colo_supported(void)
>>   {
>> @@ -34,9 +36,107 @@ bool migration_incoming_in_colo_state(void)
>>       return mis && (mis->state == MIGRATION_STATUS_COLO);
>>   }
>>
>> +/* colo checkpoint control helper */
>> +static int colo_ctl_put(QEMUFile *f, uint32_t cmd, uint64_t value)
>> +{
>> +    int ret = 0;
>> +
>> +    qemu_put_be32(f, cmd);
>> +    qemu_put_be64(f, value);
>> +    qemu_fflush(f);
>> +
>> +    ret = qemu_file_get_error(f);
>> +    trace_colo_ctl_put(COLOCommand_lookup[cmd], value);
>> +
>> +    return ret;
>> +}
>> +
>> +static int colo_ctl_get_cmd(QEMUFile *f, uint32_t *cmd)
>> +{
>> +    int ret = 0;
>> +
>> +    *cmd = qemu_get_be32(f);
>> +    ret = qemu_file_get_error(f);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +    if (*cmd >= COLO_COMMAND_MAX) {
>> +        error_report("Invalid colo command, got cmd:%d", *cmd);
>> +        return -EINVAL;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static int colo_ctl_get(QEMUFile *f, uint32_t require, uint64_t *value)
>> +{
>> +    int ret;
>> +    uint32_t cmd;
>> +
>> +    ret = colo_ctl_get_cmd(f, &cmd);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +    if (cmd != require) {
>> +        error_report("Unexpect colo command, expect:%d, but got cmd:%d",
>> +                     require, cmd);
>
> I think you need to use PRIu32 rather than %d  since they are uint32_t
> (I doubt it will break on anything, but it's correct).
>

>> +        return -EINVAL;
>> +    }
>> +
>> +    *value = qemu_get_be64(f);
>> +    trace_colo_ctl_get(COLOCommand_lookup[cmd], *value);
>> +    ret = qemu_file_get_error(f);
>> +
>> +    return ret;
>> +}
>> +
>> +static int colo_do_checkpoint_transaction(MigrationState *s)
>> +{
>> +    int ret;
>> +    uint64_t value;
>> +
>> +    ret = colo_ctl_put(s->to_dst_file, COLO_COMMAND_CHECKPOINT_REQUEST, 0);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>> +    ret = colo_ctl_get(s->rp_state.from_dst_file,
>> +                       COLO_COMMAND_CHECKPOINT_REPLY, &value);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>> +    /* TODO: suspend and save vm state to colo buffer */
>> +
>> +    ret = colo_ctl_put(s->to_dst_file, COLO_COMMAND_VMSTATE_SEND, 0);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>> +    /* TODO: send vmstate to Secondary */
>> +
>> +    ret = colo_ctl_get(s->rp_state.from_dst_file,
>> +                       COLO_COMMAND_VMSTATE_RECEIVED, &value);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>> +    ret = colo_ctl_get(s->rp_state.from_dst_file,
>> +                       COLO_COMMAND_VMSTATE_LOADED, &value);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>> +    /* TODO: resume Primary */
>> +
>> +out:
>> +    return ret;
>> +}
>> +
>>   static void colo_process_checkpoint(MigrationState *s)
>>   {
>>       int ret = 0;
>> +    uint64_t value;
>>
>>       s->rp_state.from_dst_file = qemu_file_get_return_path(s->to_dst_file);
>>       if (!s->rp_state.from_dst_file) {
>> @@ -45,12 +145,28 @@ static void colo_process_checkpoint(MigrationState *s)
>>           goto out;
>>       }
>>
>> +    /*
>> +     * Wait for Secondary finish loading vm states and enter COLO
>> +     * restore.
>> +     */
>> +    ret = colo_ctl_get(s->rp_state.from_dst_file,
>> +                       COLO_COMMAND_CHECKPOINT_READY, &value);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>>       qemu_mutex_lock_iothread();
>>       vm_start();
>>       qemu_mutex_unlock_iothread();
>>       trace_colo_vm_state_change("stop", "run");
>>
>> -    /*TODO: COLO checkpoint savevm loop*/
>> +    while (s->state == MIGRATION_STATUS_COLO) {
>> +        /* start a colo checkpoint */
>> +        ret = colo_do_checkpoint_transaction(s);
>> +        if (ret < 0) {
>> +            goto out;
>> +        }
>> +    }
>>
>>   out:
>>       if (ret < 0) {
>> @@ -73,10 +189,46 @@ void migrate_start_colo_process(MigrationState *s)
>>       qemu_mutex_lock_iothread();
>>   }
>>
>> +/*
>> + * return:
>> + * 0: start a checkpoint
>> + * -1: some error happened, exit colo restore
>> + */
>> +static int colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request)
>> +{
>> +    int ret;
>> +    uint32_t cmd;
>> +    uint64_t value;
>> +
>> +    ret = colo_ctl_get_cmd(f, &cmd);
>> +    if (ret < 0) {
>> +        /* do failover ? */
>> +        return ret;
>> +    }
>> +    /* Fix me: this value should be 0, which is not so good,
>> +     * should be used for checking ?
>> +     */
>
> You could use something simple like an incrementing value, if you just
> wanted to make it a little more robust against problems.
>

Hmm, that is a good option, but we would have to keep a variable to record the
increasing value.
Checking the length of the corresponding command's string seems better. :)
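
Roughly like this, perhaps (a hypothetical sketch of that idea, reusing the
generated COLOCommand_lookup[] table):

    /* sender: encode the command name length instead of 0 */
    ret = colo_ctl_put(f, cmd, strlen(COLOCommand_lookup[cmd]));

    /* receiver, replacing the 'value != 0' test: */
    value = qemu_get_be64(f);
    if (value != strlen(COLOCommand_lookup[cmd])) {
        error_report("Got unexpected value %" PRIu64 " for '%s' command",
                     value, COLOCommand_lookup[cmd]);
        return -EINVAL;
    }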

>> +    value = qemu_get_be64(f);
>> +    if (value != 0) {
>> +        error_report("Got unexpected value %" PRIu64 " for '%s' command",
>> +                     value, COLOCommand_lookup[cmd]);
>> +        return -EINVAL;
>> +    }
>> +
>> +    switch (cmd) {
>> +    case COLO_COMMAND_CHECKPOINT_REQUEST:
>> +        *checkpoint_request = 1;
>> +        return 0;
>> +    default:
>> +        return -EINVAL;
>> +    }
>> +}
>> +
>>   void *colo_process_incoming_thread(void *opaque)
>>   {
>>       MigrationIncomingState *mis = opaque;
>>       int ret = 0;
>> +    uint64_t value;
>>
>>       migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
>>                         MIGRATION_STATUS_COLO);
>> @@ -93,7 +245,49 @@ void *colo_process_incoming_thread(void *opaque)
>>       */
>>       qemu_set_block(qemu_get_fd(mis->from_src_file));
>>
>> -    /* TODO: COLO checkpoint restore loop */
>> +
>> +    ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_CHECKPOINT_READY, 0);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>> +    while (mis->state == MIGRATION_STATUS_COLO) {
>> +        int request = 0;
>> +        int ret = colo_wait_handle_cmd(mis->from_src_file, &request);
>> +
>> +        if (ret < 0) {
>> +            break;
>> +        } else {
>> +            if (!request) {
>> +                continue;
>> +            }
>> +        }
>> +        /* FIXME: This is unnecessary for periodic checkpoint mode */
>> +        ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_CHECKPOINT_REPLY, 0);
>> +        if (ret < 0) {
>> +            goto out;
>> +        }
>> +
>> +        ret = colo_ctl_get(mis->from_src_file, COLO_COMMAND_VMSTATE_SEND,
>> +                           &value);
>> +        if (ret < 0) {
>> +            goto out;
>> +        }
>> +
>> +        /* TODO: read migration data into colo buffer */
>> +
>> +        ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_VMSTATE_RECEIVED, 0);
>> +        if (ret < 0) {
>> +            goto out;
>> +        }
>> +
>> +        /* TODO: load vm state */
>> +
>> +        ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_VMSTATE_LOADED, 0);
>> +        if (ret < 0) {
>> +            goto out;
>> +        }
>> +    }
>>
>>   out:
>>       if (ret < 0) {
>> diff --git a/qapi-schema.json b/qapi-schema.json
>> index c2f3b63..48a87af 100644
>> --- a/qapi-schema.json
>> +++ b/qapi-schema.json
>> @@ -722,6 +722,33 @@
>>   { 'command': 'migrate-start-postcopy' }
>>
>>   ##
>> +# @COLOCommand
>> +#
>> +# The commands for COLO fault tolerance
>> +#
>> +# @invalid: unknown command
>> +#
>> +# @checkpoint-ready: SVM is ready for checkpointing
>> +#
>> +# @checkpoint-request: PVM tells SVM to prepare for new checkpointing
>> +#
>> +# @checkpoint-reply: SVM gets PVM's checkpoint request
>> +#
>> +# @vmstate-send: VM's state will be sent by PVM.
>> +#
>> +# @vmstate-size: The total size of VMstate.
>> +#
>> +# @vmstate-received: VM's state has been received by SVM
>> +#
>> +# @vmstate-loaded: VM's state has been loaded by SVM
>> +#
>> +# Since: 2.6
>> +##
>> +{ 'enum': 'COLOCommand',
>> +  'data': [ 'invalid', 'checkpoint-ready', 'checkpoint-request',
>> +            'checkpoint-reply', 'vmstate-send', 'vmstate-size',
>> +            'vmstate-received', 'vmstate-loaded' ] }
>> +
>>   # @MouseInfo:
>>   #
>>   # Information about a mouse device.
>> diff --git a/trace-events b/trace-events
>> index c98d473..f8a0959 100644
>> --- a/trace-events
>> +++ b/trace-events
>> @@ -1579,6 +1579,8 @@ postcopy_ram_incoming_cleanup_join(void) ""
>>
>>   # migration/colo.c
>>   colo_vm_state_change(const char *old, const char *new) "Change '%s' => '%s'"
>> +colo_ctl_put(const char *msg, uint64_t value) "Send '%s' cmd, value: %" PRIu64""
>> +colo_ctl_get(const char *msg, uint64_t value) "Receive '%s' cmd, value: %" PRIu64""
>
> I don't think you need the "" at the end of the line
>

OK, will fix in next version.

Thanks,
Hailiang

> Dave
>
>>
>>   # kvm-all.c
>>   kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
>> --
>> 1.8.3.1
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 32/39] COLO: Separate the process of saving/loading ram and device state
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 32/39] COLO: Separate the process of saving/loading ram and device state zhanghailiang
@ 2015-11-27  5:10   ` Li Zhijian
  2015-12-01 12:07     ` Hailiang Zhang
  0 siblings, 1 reply; 95+ messages in thread
From: Li Zhijian @ 2015-11-27  5:10 UTC (permalink / raw)
  To: zhanghailiang, qemu-devel
  Cc: quintela, yunhong.jiang, eddie.dong, peter.huangpeng, dgilbert,
	arei.gonglei, stefanha, amit.shah, hongyang.yang

On 11/24/2015 05:25 PM, zhanghailiang wrote:
> We separate the process of saving/loading ram and device state when do checkpoint,
> we add new helpers for save/load ram/device. With this change, we can directly
> transfer ram from master to slave without using QEMUSizeBuffer as assistant,
> which also reduce the size of extra memory been used during checkpoint.
>
> Besides, we move the colo_flush_ram_cache to the proper position after the
> above change.
>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> ---
> v11:
> - Remove load configuration section in qemu_loadvm_state_begin()
> ---
>   include/sysemu/sysemu.h |   6 +++
>   migration/colo.c        |  43 ++++++++++++----
>   migration/ram.c         |   5 --
>   migration/savevm.c      | 132 ++++++++++++++++++++++++++++++++++++++++++++++--
>   4 files changed, 168 insertions(+), 18 deletions(-)
>
> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> index 91eeda3..5deae53 100644
> --- a/include/sysemu/sysemu.h
> +++ b/include/sysemu/sysemu.h
> @@ -133,7 +133,13 @@ void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
>                                              uint64_t *start_list,
>                                              uint64_t *length_list);
>   
> +int qemu_save_ram_precopy(QEMUFile *f);
> +int qemu_save_device_state(QEMUFile *f);
> +
>   int qemu_loadvm_state(QEMUFile *f);
> +int qemu_loadvm_state_begin(QEMUFile *f);
> +int qemu_load_ram_state(QEMUFile *f);
> +int qemu_load_device_state(QEMUFile *f);
>   
>   typedef enum DisplayType
>   {
> diff --git a/migration/colo.c b/migration/colo.c
> index 3866e86..f7f349b 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -247,21 +247,32 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
>           goto out;
>       }
>   
> +    ret = colo_ctl_put(s->to_dst_file, COLO_COMMAND_VMSTATE_SEND, 0);
> +    if (ret < 0) {
> +        goto out;
> +    }
>       /* Disable block migration */
>       s->params.blk = 0;
>       s->params.shared = 0;
> -    qemu_savevm_state_header(trans);
> -    qemu_savevm_state_begin(trans, &s->params);
> -    qemu_mutex_lock_iothread();
> -    qemu_savevm_state_complete_precopy(trans, false);
> -    qemu_mutex_unlock_iothread();
> -
> -    qemu_fflush(trans);
> +    qemu_savevm_state_begin(s->to_dst_file, &s->params);
> +    ret = qemu_file_get_error(s->to_dst_file);
> +    if (ret < 0) {
> +        error_report("save vm state begin error\n");
> +        goto out;
> +    }
>   
> -    ret = colo_ctl_put(s->to_dst_file, COLO_COMMAND_VMSTATE_SEND, 0);
> +    qemu_mutex_lock_iothread();
> +    /* Note: device state is saved into buffer */
> +    ret = qemu_save_device_state(trans);
>       if (ret < 0) {
> +        error_report("save device state error\n");
> +        qemu_mutex_unlock_iothread();
>           goto out;
>       }
> +    qemu_fflush(trans);
> +    qemu_save_ram_precopy(s->to_dst_file);
> +    qemu_mutex_unlock_iothread();
> +
>       /* we send the total size of the vmstate first */
>       size = qsb_get_length(buffer);
>       ret = colo_ctl_put(s->to_dst_file, COLO_COMMAND_VMSTATE_SIZE, size);
> @@ -548,6 +559,16 @@ void *colo_process_incoming_thread(void *opaque)
>               goto out;
>           }
>   
> +        ret = qemu_loadvm_state_begin(mis->from_src_file);
> +        if (ret < 0) {
> +            error_report("load vm state begin error, ret=%d", ret);
> +            goto out;
> +        }
> +        ret = qemu_load_ram_state(mis->from_src_file);
> +        if (ret < 0) {
> +            error_report("load ram state error");
> +            goto out;
> +        }
>           /* read the VM state total size first */
>           ret = colo_ctl_get(mis->from_src_file,
>                              COLO_COMMAND_VMSTATE_SIZE, &value);
> @@ -580,8 +601,10 @@ void *colo_process_incoming_thread(void *opaque)
>           qemu_mutex_lock_iothread();
>           qemu_system_reset(VMRESET_SILENT);
>           vmstate_loading = true;
> -        if (qemu_loadvm_state(fb) < 0) {
> -            error_report("COLO: loadvm failed");
> +        colo_flush_ram_cache();
> +        ret = qemu_load_device_state(fb);
> +        if (ret < 0) {
> +            error_report("COLO: load device state failed\n");
>               vmstate_loading = false;
>               qemu_mutex_unlock_iothread();
>               goto out;
> diff --git a/migration/ram.c b/migration/ram.c
> index 4f37144..06a738b 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2448,7 +2448,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>        * be atomic
>        */
>       bool postcopy_running = postcopy_state_get() >= POSTCOPY_INCOMING_LISTENING;
> -    bool need_flush = false;
>   
>       seq_iter++;
>   
> @@ -2483,7 +2482,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>               /* After going into COLO, we should load the Page into colo_cache */
>               if (ram_cache_enable) {
>                   host = colo_cache_from_block_offset(block, addr);
> -                need_flush = true;
>               } else {
>                   host = host_from_ram_block_offset(block, addr);
>               }
> @@ -2578,9 +2576,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>   
>       rcu_read_unlock();
>   
> -    if (!ret  && ram_cache_enable && need_flush) {
> -        colo_flush_ram_cache();
> -    }
>       DPRINTF("Completed load of VM with exit code %d seq iteration "
>               "%" PRIu64 "\n", ret, seq_iter);
>       return ret;
> diff --git a/migration/savevm.c b/migration/savevm.c
> index c7c26d8..949caf0 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -50,6 +50,7 @@
>   #include "qemu/iov.h"
>   #include "block/snapshot.h"
>   #include "block/qapi.h"
> +#include "migration/colo.h"
>   
>   
>   #ifndef ETH_P_RARP
> @@ -923,6 +924,10 @@ void qemu_savevm_state_begin(QEMUFile *f,
>               break;
>           }
>       }
> +    if (migration_in_colo_state()) {
> +        qemu_put_byte(f, QEMU_VM_EOF);
> +        qemu_fflush(f);
> +    }
>   }
>   
>   /*
> @@ -1192,13 +1197,44 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
>       return ret;
>   }
>   
> -static int qemu_save_device_state(QEMUFile *f)
> +int qemu_save_ram_precopy(QEMUFile *f)
>   {
>       SaveStateEntry *se;
> +    int ret = 0;
>   
> -    qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
> -    qemu_put_be32(f, QEMU_VM_FILE_VERSION);
> +    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> +        if (!se->ops || !se->ops->save_live_complete_precopy) {
> +            continue;
> +        }
> +        if (se->ops && se->ops->is_active) {
> +            if (!se->ops->is_active(se->opaque)) {
> +                continue;
> +            }
> +        }
> +        trace_savevm_section_start(se->idstr, se->section_id);
>   
> +        save_section_header(f, se, QEMU_VM_SECTION_END);
> +
> +        ret = se->ops->save_live_complete_precopy(f, se->opaque);
> +        trace_savevm_section_end(se->idstr, se->section_id, ret);
> +        save_section_footer(f, se);
> +        if (ret < 0) {
> +            qemu_file_set_error(f, ret);
> +            return ret;
> +        }
> +    }
> +    qemu_put_byte(f, QEMU_VM_EOF);
> +
> +    return 0;
> +}
> +
> +int qemu_save_device_state(QEMUFile *f)
> +{
> +    SaveStateEntry *se;
> +
> +    if (!migration_in_colo_state()) {
> +        qemu_savevm_state_header(f);
> +    }
>       cpu_synchronize_all_states();
>   
>       QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> @@ -1938,6 +1974,96 @@ int qemu_loadvm_state(QEMUFile *f)
>       return ret;
>   }
>   
> +int qemu_loadvm_state_begin(QEMUFile *f)
> +{
> +    uint8_t section_type;
> +    int ret = -1;
> +    MigrationIncomingState *mis = migration_incoming_get_current();
> +
> +    if (!mis) {
> +        error_report("qemu_loadvm_state_begin");
> +        return -EINVAL;
> +    }
> +    /* CleanUp */
> +    loadvm_free_handlers(mis);
> +
> +    if (qemu_savevm_state_blocked(NULL)) {
> +        return -EINVAL;
> +    }
> +
> +    while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
> +        if (section_type != QEMU_VM_SECTION_START) {
> +            error_report("QEMU_VM_SECTION_START");
> +            ret = -EINVAL;
> +            goto out;
> +        }
> +        ret = qemu_loadvm_section_start_full(f, mis);
> +        if (ret < 0) {
> +            goto out;
> +        }
> +    }
> +    ret = qemu_file_get_error(f);
> +    if (ret == 0) {
> +        return 0;
> +     }
> +out:
> +    return ret;
> +}
> +
> +int qemu_load_ram_state(QEMUFile *f)
> +{
> +    uint8_t section_type;
> +    MigrationIncomingState *mis = migration_incoming_get_current();
> +    int ret = -1;
> +
> +    while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
> +        if (section_type != QEMU_VM_SECTION_PART &&
> +            section_type != QEMU_VM_SECTION_END) {
> +            error_report("load ram state, not get "
> +                         "QEMU_VM_SECTION_FULL or QEMU_VM_SECTION_END");
> +            return -EINVAL;
> +        }
> +        ret = qemu_loadvm_section_part_end(f, mis);
> +        if (ret < 0) {
> +            goto out;
> +        }
> +    }
> +    ret = qemu_file_get_error(f);
> +    if (ret == 0) {
> +        return 0;
> +     }
> +out:
> +    return ret;
> +}
> +
> +int qemu_load_device_state(QEMUFile *f)
> +{
> +    uint8_t section_type;
> +    MigrationIncomingState *mis = migration_incoming_get_current();
> +    int ret = -1;
> +
> +    while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
> +        if (section_type != QEMU_VM_SECTION_FULL) {
> +            error_report("load device state error: "
> +                         "Not get QEMU_VM_SECTION_FULL");
> +            return -EINVAL;
> +        }
> +         ret = qemu_loadvm_section_start_full(f, mis);
> +         if (ret < 0) {
> +            goto out;
> +         }

coding style: the above 4 lines are not aligned

> +    }
> +
> +     ret = qemu_file_get_error(f);
> +
> +    cpu_synchronize_all_post_init();
> +     if (ret == 0) {
> +        return 0;
> +     }

alignment issue
  
thanks
Li

> +out:
> +    return ret;
> +}
> +
>   void hmp_savevm(Monitor *mon, const QDict *qdict)
>   {
>       BlockDriverState *bs, *bs1;

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 18/39] COLO: Flush PVM's cached RAM into SVM's memory
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 18/39] COLO: Flush PVM's cached RAM into SVM's memory zhanghailiang
@ 2015-11-27  5:29   ` Li Zhijian
  2015-12-01 12:02     ` Hailiang Zhang
  2015-12-01 20:06   ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 95+ messages in thread
From: Li Zhijian @ 2015-11-27  5:29 UTC (permalink / raw)
  To: zhanghailiang, qemu-devel
  Cc: quintela, yunhong.jiang, eddie.dong, peter.huangpeng, dgilbert,
	arei.gonglei, stefanha, amit.shah, hongyang.yang

On 11/24/2015 05:25 PM, zhanghailiang wrote:

> While the VM is running, the PVM may dirty some pages; we will transfer the
> PVM's dirty pages to the SVM and store them into the SVM's RAM cache at the
> next checkpoint time. So, the content of the SVM's RAM cache will always be
> the same as the PVM's memory after a checkpoint.
>
> Instead of flushing all the content of the PVM's RAM cache into the SVM's memory,
> we do this in a more efficient way:
> only flush the pages that were dirtied by the PVM since the last checkpoint.
> In this way, we can ensure the SVM's memory is the same as the PVM's.
>
> Besides, we must ensure the RAM cache is flushed before the device state is loaded.
>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> ---
> v11:
> - Move the place of 'need_flush' (Dave's suggestion)
> - Remove unused 'DPRINTF("Flush ram_cache\n")'
> v10:
> - trace the number of dirty pages that be received.
> ---
>   include/migration/migration.h |  1 +
>   migration/colo.c              |  2 --
>   migration/ram.c               | 37 +++++++++++++++++++++++++++++++++++++
>   trace-events                  |  1 +
>   4 files changed, 39 insertions(+), 2 deletions(-)
>
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index e41372d..221176b 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -336,4 +336,5 @@ PostcopyState postcopy_state_set(PostcopyState new_state);
>   /* ram cache */
>   int colo_init_ram_cache(void);
>   void colo_release_ram_cache(void);
> +void colo_flush_ram_cache(void);
>   #endif
> diff --git a/migration/colo.c b/migration/colo.c
> index 5ac8ff2..4095d97 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -385,8 +385,6 @@ void *colo_process_incoming_thread(void *opaque)
>           }
>           qemu_mutex_unlock_iothread();
>   
> -        /* TODO: flush vm state */
> -
>           ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_VMSTATE_LOADED, 0);
>           if (ret < 0) {
>               goto out;
> diff --git a/migration/ram.c b/migration/ram.c
> index da6bbd6..4f37144 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2448,6 +2448,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>        * be atomic
>        */
>       bool postcopy_running = postcopy_state_get() >= POSTCOPY_INCOMING_LISTENING;
> +    bool need_flush = false;
>   
>       seq_iter++;
>   
> @@ -2482,6 +2483,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>               /* After going into COLO, we should load the Page into colo_cache */
>               if (ram_cache_enable) {
>                   host = colo_cache_from_block_offset(block, addr);
> +                need_flush = true;
>               } else {
>                   host = host_from_ram_block_offset(block, addr);
>               }
> @@ -2575,6 +2577,10 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>       }
>   
>       rcu_read_unlock();
> +
> +    if (!ret  && ram_cache_enable && need_flush) {
> +        colo_flush_ram_cache();
> +    }
>       DPRINTF("Completed load of VM with exit code %d seq iteration "
>               "%" PRIu64 "\n", ret, seq_iter);
>       return ret;
> @@ -2647,6 +2653,37 @@ void colo_release_ram_cache(void)
>       rcu_read_unlock();
>   }
>   
> +/*
> + * Flush content of RAM cache into SVM's memory.
> + * Only flush the pages that be dirtied by PVM or SVM or both.
> + */
> +void colo_flush_ram_cache(void)
> +{
> +    RAMBlock *block = NULL;
> +    void *dst_host;
> +    void *src_host;
> +    ram_addr_t  offset = 0;

coding style: double space

thanks
Li

> +
> +    trace_colo_flush_ram_cache(migration_dirty_pages);
> +    rcu_read_lock();
> +    block = QLIST_FIRST_RCU(&ram_list.blocks);
> +    while (block) {
> +        ram_addr_t ram_addr_abs;
> +        offset = migration_bitmap_find_dirty(block, offset, &ram_addr_abs);
> +        migration_bitmap_clear_dirty(ram_addr_abs);
> +        if (offset >= block->used_length) {
> +            offset = 0;
> +            block = QLIST_NEXT_RCU(block, next);
> +        } else {
> +            dst_host = block->host + offset;
> +            src_host = block->colo_cache + offset;
> +            memcpy(dst_host, src_host, TARGET_PAGE_SIZE);
> +        }
> +    }
> +    rcu_read_unlock();
> +    assert(migration_dirty_pages == 0);
> +}
> +
>   static SaveVMHandlers savevm_ram_handlers = {
>       .save_live_setup = ram_save_setup,
>       .save_live_iterate = ram_save_iterate,
> diff --git a/trace-events b/trace-events
> index f8a0959..f158d2a 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -1264,6 +1264,7 @@ migration_throttle(void) ""
>   ram_load_postcopy_loop(uint64_t addr, int flags) "@%" PRIx64 " %x"
>   ram_postcopy_send_discard_bitmap(void) ""
>   ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: %zx len: %zx"
> +colo_flush_ram_cache(uint64_t dirty_pages) "dirty_pages %" PRIu64""
>   
>   # hw/display/qxl.c
>   disable qxl_interface_set_mm_time(int qid, uint32_t mm_time) "%d %d"


-- 
Best regards.
Li Zhijian (8555)

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 34/39] net/filter-buffer: Add default filter-buffer for each netdev
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 34/39] net/filter-buffer: Add default filter-buffer for each netdev zhanghailiang
@ 2015-11-27 11:39   ` Yang Hongyang
  2015-11-28  5:55     ` Hailiang Zhang
  2015-11-30  1:19   ` Li Zhijian
  2015-12-03  1:17   ` Wen Congyang
  2 siblings, 1 reply; 95+ messages in thread
From: Yang Hongyang @ 2015-11-27 11:39 UTC (permalink / raw)
  To: zhanghailiang, qemu-devel
  Cc: lizhijian, quintela, Jason Wang, yunhong.jiang, eddie.dong,
	peter.huangpeng, dgilbert, arei.gonglei, stefanha, amit.shah

On 2015-11-24 17:25, zhanghailiang wrote:
> We add a default filter-buffer to each netdev, which will be used by COLO
> or Micro-checkpointing to buffer the VM's packets. The name of the default
> filter-buffer is 'nop'.
> By default, the filter-buffer will not buffer any packets,
> so it has no side effect on the netdev.
>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Yang Hongyang <hongyang.yang@easystack.cn>
> ---
> v11:
> - New patch
> ---
[...]
>   static void filter_buffer_flush(NetFilterState *nf)
> @@ -65,6 +73,10 @@ static ssize_t filter_buffer_receive_iov(NetFilterState *nf,
>   {
>       FilterBufferState *s = FILTER_BUFFER(nf);
>
> +    /* Don't buffer any packets if the filter is not enabled */
> +    if (!s->enable_buffer) {
> +        return 0;
> +    }
>       /*
>        * We return size when buffer a packet, the sender will take it as
>        * a already sent packet, so sent_cb should not be called later.
> @@ -102,6 +114,7 @@ static void filter_buffer_cleanup(NetFilterState *nf)
>   static void filter_buffer_setup(NetFilterState *nf, Error **errp)
>   {
>       FilterBufferState *s = FILTER_BUFFER(nf);
> +    char *path = object_get_canonical_path_component(OBJECT(nf));

path should be freed after use.

>
>       /*
>        * We may want to accept zero interval when VM FT solutions like MC
> @@ -114,6 +127,7 @@ static void filter_buffer_setup(NetFilterState *nf, Error **errp)
>       }
>
>       s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
> +    s->is_default = !strcmp(path, "nop");

free(path);

>       if (s->interval) {
>           timer_init_us(&s->release_timer, QEMU_CLOCK_VIRTUAL,
>                         filter_buffer_release_timer, nf);
> @@ -163,6 +177,66 @@ out:
>       error_propagate(errp, local_err);
>   }
>
> +/*
> +* This will be used by COLO or MC FT, for which they will need
> +* to buffer the packets of VM's net devices, Here we add a default
> +* buffer filter for each netdev. The name of default buffer filter is
> +* 'nop'
> +*/
> +void netdev_add_default_filter_buffer(const char *netdev_id,
> +                                      NetFilterDirection direction,
> +                                      Error **errp)
> +{
> +    QmpOutputVisitor *qov;
> +    QmpInputVisitor *qiv;
> +    Visitor *ov, *iv;
> +    QObject *obj = NULL;
> +    QDict *qdict;
> +    void *dummy = NULL;
> +    const char *id = "nop";
> +    char *queue = g_strdup(NetFilterDirection_lookup[direction]);
> +    NetClientState *nc = qemu_find_netdev(netdev_id);
> +    Error *err = NULL;
> +
> +    /* FIXME: Not support multiple queues */
> +    if (!nc || nc->queue_index > 1) {
> +        return;
> +    }
> +    qov = qmp_output_visitor_new();
> +    ov = qmp_output_get_visitor(qov);
> +    visit_start_struct(ov,  &dummy, NULL, NULL, 0, &err);
> +    if (err) {
> +        goto out;
> +    }
> +    visit_type_str(ov, &nc->name, "netdev", &err);
> +    if (err) {
> +        goto out;
> +    }
> +    visit_type_str(ov, &queue, "queue", &err);
> +    if (err) {
> +        goto out;
> +    }
> +    visit_end_struct(ov, &err);
> +    if (err) {
> +        goto out;
> +    }
> +    obj = qmp_output_get_qobject(qov);
> +    g_assert(obj != NULL);
> +    qdict = qobject_to_qdict(obj);
> +    qmp_output_visitor_cleanup(qov);
> +
> +    qiv = qmp_input_visitor_new(obj);
> +    iv = qmp_input_get_visitor(qiv);
> +    object_add(TYPE_FILTER_BUFFER, id, qdict, iv, &err);
> +    qmp_input_visitor_cleanup(qiv);
> +    qobject_decref(obj);
> +out:
> +    g_free(queue);
> +    if (err) {
> +        error_propagate(errp, err);
> +    }
> +}
> +
>   static void filter_buffer_init(Object *obj)
>   {
>       object_property_add(obj, "interval", "int",
> diff --git a/net/net.c b/net/net.c
> index ade6051..b36d49f 100644
> --- a/net/net.c
> +++ b/net/net.c
> @@ -1028,6 +1028,14 @@ static int net_client_init1(const void *object, int is_netdev, Error **errp)
>           }
>           return -1;
>       }
> +
> +    if (is_netdev) {
> +        const Netdev *netdev = object;
> +
> +        netdev_add_default_filter_buffer(netdev->id,
> +                                         NET_FILTER_DIRECTION_RX,
> +                                         errp);
> +    }

I'm not sure if adding this default filter to every netdev is a good idea,
because in most cases they do not need the default filter; although it is a
nop, packets still go through the filter. Maybe we can add an argument to
netdev to easily turn it on, for example,
   -netdev default-filter=true

If it is not supplied, it defaults to off, that is, by default no default
filter will be attached to the netdev.
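
A rough sketch of how that opt-in might look in net_client_init1() (just to
illustrate the idea; the 'default_filter' member of Netdev and its option
parsing are hypothetical and not part of this series, only
netdev_add_default_filter_buffer() is taken from the patch):

    if (is_netdev) {
        const Netdev *netdev = object;

        /* hypothetical opt-in: only attach the 'nop' filter when requested */
        if (netdev->has_default_filter && netdev->default_filter) {
            netdev_add_default_filter_buffer(netdev->id,
                                             NET_FILTER_DIRECTION_RX,
                                             errp);
        }
    }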

>       return 0;
>   }
>
>

-- 
Thanks,
Yang

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 35/39] filter-buffer: Accept zero interval
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 35/39] filter-buffer: Accept zero interval zhanghailiang
@ 2015-11-27 11:42   ` Yang Hongyang
  0 siblings, 0 replies; 95+ messages in thread
From: Yang Hongyang @ 2015-11-27 11:42 UTC (permalink / raw)
  To: zhanghailiang, qemu-devel
  Cc: lizhijian, quintela, Jason Wang, yunhong.jiang, eddie.dong,
	peter.huangpeng, dgilbert, arei.gonglei, stefanha, amit.shah

On 2015-11-24 17:25, zhanghailiang wrote:
> For default buffer filter, its 'interval' value is zero,
> so here we should accept zero interval.
>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Yang Hongyang <hongyang.yang@easystack.cn>

Reviewed-by: Yang Hongyang <hongyang.yang@easystack.cn>

> ---
> v11:
> - Add comment
> v10:
> - new patch
> ---
>   net/filter-buffer.c | 10 ----------
>   1 file changed, 10 deletions(-)
>
> diff --git a/net/filter-buffer.c b/net/filter-buffer.c
> index 195af68..5cab4fc 100644
> --- a/net/filter-buffer.c
> +++ b/net/filter-buffer.c
> @@ -116,16 +116,6 @@ static void filter_buffer_setup(NetFilterState *nf, Error **errp)
>       FilterBufferState *s = FILTER_BUFFER(nf);
>       char *path = object_get_canonical_path_component(OBJECT(nf));
>
> -    /*
> -     * We may want to accept zero interval when VM FT solutions like MC
> -     * or COLO use this filter to release packets on demand.
> -     */
> -    if (!s->interval) {
> -        error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "interval",
> -                   "a non-zero interval");
> -        return;
> -    }
> -
>       s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
>       s->is_default = !strcmp(path, "nop");
>       if (s->interval) {
>

-- 
Thanks,
Yang

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 38/39] colo: Use default buffer-filter to buffer and release packets
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 38/39] colo: Use default buffer-filter to buffer and " zhanghailiang
@ 2015-11-27 12:51   ` Yang Hongyang
  2015-11-28  6:15     ` Hailiang Zhang
  0 siblings, 1 reply; 95+ messages in thread
From: Yang Hongyang @ 2015-11-27 12:51 UTC (permalink / raw)
  To: zhanghailiang, qemu-devel
  Cc: lizhijian, quintela, Jason Wang, yunhong.jiang, eddie.dong,
	peter.huangpeng, dgilbert, arei.gonglei, stefanha, amit.shah



On 2015-11-24 17:25, zhanghailiang wrote:
> Enable default filter to buffer packets and release the
> packets after a checkpoint.
>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Yang Hongyang <hongyang.yang@easystack.cn>
> ---
> v11:
> - Use new helper functions to buffer and release packets.
> ---
>   migration/colo.c | 7 +++++++
>   1 file changed, 7 insertions(+)
>
> diff --git a/migration/colo.c b/migration/colo.c
> index 79a8d6b..b1b7905 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -19,6 +19,7 @@
>   #include "qemu/sockets.h"
>   #include "migration/failover.h"
>   #include "qapi-event.h"
> +#include "net/filter.h"
>
>   /*
>    * The delay time before qemu begin the procedure of default failover treatment.
> @@ -131,6 +132,8 @@ static void primary_vm_do_failover(void)
>                        "old_state: %d", old_state);
>           return;
>       }
> +    /* Don't buffer any packets while exited COLO */
> +    qemu_set_default_filter_buffers(false);

You might need to release the buffered packets before exiting COLO?
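
Something like this ordering in primary_vm_do_failover(), I mean (only a
sketch; qemu_release_default_filters_packets() is the helper this patch
already calls in colo_do_checkpoint_transaction()):

    /* sketch: flush buffered packets out before buffering is switched off */
    qemu_release_default_filters_packets();
    /* Don't buffer any packets after COLO has exited */
    qemu_set_default_filter_buffers(false);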

>   }
>
>   void colo_do_failover(MigrationState *s)
> @@ -290,6 +293,8 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
>           goto out;
>       }
>
> +    qemu_release_default_filters_packets();
> +
>       if (colo_shutdown) {
>           colo_ctl_put(s->to_dst_file, COLO_COMMAND_GUEST_SHUTDOWN, 0);
>           qemu_fflush(s->to_dst_file);
> @@ -367,6 +372,8 @@ static void colo_process_checkpoint(MigrationState *s)
>           error_report("Failed to allocate colo buffer!");
>           goto out;
>       }
> +    /* Begin to buffer packets that sent by VM */
> +    qemu_set_default_filter_buffers(true);
>
>       qemu_mutex_lock_iothread();
>       vm_start();
>

-- 
Thanks,
Yang

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 34/39] net/filter-buffer: Add default filter-buffer for each netdev
  2015-11-27 11:39   ` Yang Hongyang
@ 2015-11-28  5:55     ` Hailiang Zhang
  0 siblings, 0 replies; 95+ messages in thread
From: Hailiang Zhang @ 2015-11-28  5:55 UTC (permalink / raw)
  To: Yang Hongyang, qemu-devel
  Cc: lizhijian, quintela, Jason Wang, yunhong.jiang, eddie.dong,
	peter.huangpeng, dgilbert, arei.gonglei, stefanha, amit.shah

On 2015/11/27 19:39, Yang Hongyang wrote:
> On 2015-11-24 17:25, zhanghailiang wrote:
>> We add a default filter-buffer to each netdev, which will be used by COLO
>> or Micro-checkpointing to buffer the VM's packets. The name of the default
>> filter-buffer is 'nop'.
>> By default, the filter-buffer will not buffer any packets,
>> so it has no side effect on the netdev.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Cc: Jason Wang <jasowang@redhat.com>
>> Cc: Yang Hongyang <hongyang.yang@easystack.cn>
>> ---
>> v11:
>> - New patch
>> ---
> [...]
>>   static void filter_buffer_flush(NetFilterState *nf)
>> @@ -65,6 +73,10 @@ static ssize_t filter_buffer_receive_iov(NetFilterState *nf,
>>   {
>>       FilterBufferState *s = FILTER_BUFFER(nf);
>>
>> +    /* Don't buffer any packets if the filter is not enabled */
>> +    if (!s->enable_buffer) {
>> +        return 0;
>> +    }
>>       /*
>>        * We return size when buffer a packet, the sender will take it as
>>        * a already sent packet, so sent_cb should not be called later.
>> @@ -102,6 +114,7 @@ static void filter_buffer_cleanup(NetFilterState *nf)
>>   static void filter_buffer_setup(NetFilterState *nf, Error **errp)
>>   {
>>       FilterBufferState *s = FILTER_BUFFER(nf);
>> +    char *path = object_get_canonical_path_component(OBJECT(nf));
>
> path should be freed after use.
>
>>
>>       /*
>>        * We may want to accept zero interval when VM FT solutions like MC
>> @@ -114,6 +127,7 @@ static void filter_buffer_setup(NetFilterState *nf, Error **errp)
>>       }
>>
>>       s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
>> +    s->is_default = !strcmp(path, "nop");
>
> free(path);

Good catch, :)

>
>>       if (s->interval) {
>>           timer_init_us(&s->release_timer, QEMU_CLOCK_VIRTUAL,
>>                         filter_buffer_release_timer, nf);
>> @@ -163,6 +177,66 @@ out:
>>       error_propagate(errp, local_err);
>>   }
>>
>> +/*
>> +* This will be used by COLO or MC FT, for which they will need
>> +* to buffer the packets of VM's net devices, Here we add a default
>> +* buffer filter for each netdev. The name of default buffer filter is
>> +* 'nop'
>> +*/
>> +void netdev_add_default_filter_buffer(const char *netdev_id,
>> +                                      NetFilterDirection direction,
>> +                                      Error **errp)
>> +{
>> +    QmpOutputVisitor *qov;
>> +    QmpInputVisitor *qiv;
>> +    Visitor *ov, *iv;
>> +    QObject *obj = NULL;
>> +    QDict *qdict;
>> +    void *dummy = NULL;
>> +    const char *id = "nop";
>> +    char *queue = g_strdup(NetFilterDirection_lookup[direction]);
>> +    NetClientState *nc = qemu_find_netdev(netdev_id);
>> +    Error *err = NULL;
>> +
>> +    /* FIXME: Not support multiple queues */
>> +    if (!nc || nc->queue_index > 1) {
>> +        return;
>> +    }
>> +    qov = qmp_output_visitor_new();
>> +    ov = qmp_output_get_visitor(qov);
>> +    visit_start_struct(ov,  &dummy, NULL, NULL, 0, &err);
>> +    if (err) {
>> +        goto out;
>> +    }
>> +    visit_type_str(ov, &nc->name, "netdev", &err);
>> +    if (err) {
>> +        goto out;
>> +    }
>> +    visit_type_str(ov, &queue, "queue", &err);
>> +    if (err) {
>> +        goto out;
>> +    }
>> +    visit_end_struct(ov, &err);
>> +    if (err) {
>> +        goto out;
>> +    }
>> +    obj = qmp_output_get_qobject(qov);
>> +    g_assert(obj != NULL);
>> +    qdict = qobject_to_qdict(obj);
>> +    qmp_output_visitor_cleanup(qov);
>> +
>> +    qiv = qmp_input_visitor_new(obj);
>> +    iv = qmp_input_get_visitor(qiv);
>> +    object_add(TYPE_FILTER_BUFFER, id, qdict, iv, &err);
>> +    qmp_input_visitor_cleanup(qiv);
>> +    qobject_decref(obj);
>> +out:
>> +    g_free(queue);
>> +    if (err) {
>> +        error_propagate(errp, err);
>> +    }
>> +}
>> +
>>   static void filter_buffer_init(Object *obj)
>>   {
>>       object_property_add(obj, "interval", "int",
>> diff --git a/net/net.c b/net/net.c
>> index ade6051..b36d49f 100644
>> --- a/net/net.c
>> +++ b/net/net.c
>> @@ -1028,6 +1028,14 @@ static int net_client_init1(const void *object, int is_netdev, Error **errp)
>>           }
>>           return -1;
>>       }
>> +
>> +    if (is_netdev) {
>> +        const Netdev *netdev = object;
>> +
>> +        netdev_add_default_filter_buffer(netdev->id,
>> +                                         NET_FILTER_DIRECTION_RX,
>> +                                         errp);
>> +    }
>
> I'm not sure if adding this default filter to every netdev is a good idea,
> because in most cases they do not need the default filter; although it is a
> nop, packets still go through the filter. Maybe we can add an argument to
> netdev to easily turn it on, for example,
>    -netdev default-filter=true
>

Hmm, adding a default filter was suggested by Jason, and yes, packets still go
through the default filter even though it does nothing, which is not graceful.

> If it is not supplied, it defaults to off, that is, by default no default
> filter will be attached to the netdev.
>

That's a good idea, and we should also show the on/off info in the 'info network' command.

Thanks,
zhanghailiang

>>       return 0;
>>   }
>>
>>
>

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 38/39] colo: Use default buffer-filter to buffer and release packets
  2015-11-27 12:51   ` Yang Hongyang
@ 2015-11-28  6:15     ` Hailiang Zhang
  0 siblings, 0 replies; 95+ messages in thread
From: Hailiang Zhang @ 2015-11-28  6:15 UTC (permalink / raw)
  To: Yang Hongyang, qemu-devel
  Cc: lizhijian, quintela, Jason Wang, yunhong.jiang, eddie.dong,
	peter.huangpeng, dgilbert, arei.gonglei, stefanha, amit.shah

On 2015/11/27 20:51, Yang Hongyang wrote:
>
>
> On 2015-11-24 17:25, zhanghailiang wrote:
>> Enable default filter to buffer packets and release the
>> packets after a checkpoint.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Cc: Jason Wang <jasowang@redhat.com>
>> Cc: Yang Hongyang <hongyang.yang@easystack.cn>
>> ---
>> v11:
>> - Use new helper functions to buffer and release packets.
>> ---
>>   migration/colo.c | 7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/migration/colo.c b/migration/colo.c
>> index 79a8d6b..b1b7905 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -19,6 +19,7 @@
>>   #include "qemu/sockets.h"
>>   #include "migration/failover.h"
>>   #include "qapi-event.h"
>> +#include "net/filter.h"
>>
>>   /*
>>    * The delay time before qemu begin the procedure of default failover treatment.
>> @@ -131,6 +132,8 @@ static void primary_vm_do_failover(void)
>>                        "old_state: %d", old_state);
>>           return;
>>       }
>> +    /* Don't buffer any packets while exited COLO */
>> +    qemu_set_default_filter_buffers(false);
>
> You might need to release the buffered packets before exiting COLO?
>

Yes, we should flush the queue and release the buffered packets.
I will fix this in the next version, thanks.

>>   }
>>
>>   void colo_do_failover(MigrationState *s)
>> @@ -290,6 +293,8 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
>>           goto out;
>>       }
>>
>> +    qemu_release_default_filters_packets();
>> +
>>       if (colo_shutdown) {
>>           colo_ctl_put(s->to_dst_file, COLO_COMMAND_GUEST_SHUTDOWN, 0);
>>           qemu_fflush(s->to_dst_file);
>> @@ -367,6 +372,8 @@ static void colo_process_checkpoint(MigrationState *s)
>>           error_report("Failed to allocate colo buffer!");
>>           goto out;
>>       }
>> +    /* Begin to buffer packets that sent by VM */
>> +    qemu_set_default_filter_buffers(true);
>>
>>       qemu_mutex_lock_iothread();
>>       vm_start();
>>
>

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 34/39] net/filter-buffer: Add default filter-buffer for each netdev
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 34/39] net/filter-buffer: Add default filter-buffer for each netdev zhanghailiang
  2015-11-27 11:39   ` Yang Hongyang
@ 2015-11-30  1:19   ` Li Zhijian
  2015-12-01  8:56     ` Hailiang Zhang
  2015-12-03  1:17   ` Wen Congyang
  2 siblings, 1 reply; 95+ messages in thread
From: Li Zhijian @ 2015-11-30  1:19 UTC (permalink / raw)
  To: zhanghailiang, qemu-devel
  Cc: quintela, Jason Wang, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, arei.gonglei, stefanha, amit.shah, hongyang.yang

On 11/24/2015 05:25 PM, zhanghailiang wrote:
> We add a default filter-buffer to each netdev, which will be used by COLO
> or Micro-checkpointing to buffer the VM's packets. The name of the default
> filter-buffer is 'nop'.
> By default, the filter-buffer will not buffer any packets,
> so it has no side effect on the netdev.
>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Yang Hongyang <hongyang.yang@easystack.cn>
> ---
> v11:
> - New patch
> ---
>   include/net/filter.h |  3 +++
>   net/filter-buffer.c  | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>   net/net.c            |  8 ++++++
>   3 files changed, 85 insertions(+)
>
> diff --git a/include/net/filter.h b/include/net/filter.h
> index 2deda36..01a7e90 100644
> --- a/include/net/filter.h
> +++ b/include/net/filter.h
> @@ -74,4 +74,7 @@ ssize_t qemu_netfilter_pass_to_next(NetClientState *sender,
>                                       int iovcnt,
>                                       void *opaque);
>   
> +void netdev_add_default_filter_buffer(const char *netdev_id,
> +                                      NetFilterDirection direction,
> +                                      Error **errp);
>   #endif /* QEMU_NET_FILTER_H */
> diff --git a/net/filter-buffer.c b/net/filter-buffer.c
> index 57be149..195af68 100644
> --- a/net/filter-buffer.c
> +++ b/net/filter-buffer.c
> @@ -14,6 +14,12 @@
>   #include "qapi/qmp/qerror.h"
>   #include "qapi-visit.h"
>   #include "qom/object.h"
> +#include "net/net.h"
> +#include "qapi/qmp/qdict.h"
> +#include "qapi/qmp-output-visitor.h"
> +#include "qapi/qmp-input-visitor.h"
> +#include "monitor/monitor.h"
> +#include "qmp-commands.h"
>   
>   #define TYPE_FILTER_BUFFER "filter-buffer"
>   
> @@ -26,6 +32,8 @@ typedef struct FilterBufferState {
>       NetQueue *incoming_queue;
>       uint32_t interval;
>       QEMUTimer release_timer;
> +    bool is_default;
> +    bool enable_buffer;
>   } FilterBufferState;
>   
>   static void filter_buffer_flush(NetFilterState *nf)
> @@ -65,6 +73,10 @@ static ssize_t filter_buffer_receive_iov(NetFilterState *nf,
>   {
>       FilterBufferState *s = FILTER_BUFFER(nf);
>   
> +    /* Don't buffer any packets if the filter is not enabled */
> +    if (!s->enable_buffer) {
> +        return 0;
> +    }
>       /*
>        * We return size when buffer a packet, the sender will take it as
>        * a already sent packet, so sent_cb should not be called later.
> @@ -102,6 +114,7 @@ static void filter_buffer_cleanup(NetFilterState *nf)
>   static void filter_buffer_setup(NetFilterState *nf, Error **errp)
>   {
>       FilterBufferState *s = FILTER_BUFFER(nf);
> +    char *path = object_get_canonical_path_component(OBJECT(nf));
>   
>       /*
>        * We may want to accept zero interval when VM FT solutions like MC
> @@ -114,6 +127,7 @@ static void filter_buffer_setup(NetFilterState *nf, Error **errp)
>       }
>   
>       s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
> +    s->is_default = !strcmp(path, "nop");
>       if (s->interval) {
>           timer_init_us(&s->release_timer, QEMU_CLOCK_VIRTUAL,
>                         filter_buffer_release_timer, nf);
> @@ -163,6 +177,66 @@ out:
>       error_propagate(errp, local_err);
>   }
>   
> +/*
> +* This will be used by COLO or MC FT, for which they will need
> +* to buffer the packets of VM's net devices, Here we add a default
> +* buffer filter for each netdev. The name of default buffer filter is
> +* 'nop'
> +*/
> +void netdev_add_default_filter_buffer(const char *netdev_id,
> +                                      NetFilterDirection direction,
> +                                      Error **errp)
> +{
> +    QmpOutputVisitor *qov;
> +    QmpInputVisitor *qiv;
> +    Visitor *ov, *iv;
> +    QObject *obj = NULL;
> +    QDict *qdict;
> +    void *dummy = NULL;
> +    const char *id = "nop";
> +    char *queue = g_strdup(NetFilterDirection_lookup[direction]);
> +    NetClientState *nc = qemu_find_netdev(netdev_id);
> +    Error *err = NULL;
> +
> +    /* FIXME: Not support multiple queues */
> +    if (!nc || nc->queue_index > 1) {

'queue' memory leaks?


Thanks
Li

> +        return;
> +    }
> +    qov = qmp_output_visitor_new();
> +    ov = qmp_output_get_visitor(qov);
> +    visit_start_struct(ov,  &dummy, NULL, NULL, 0, &err);
> +    if (err) {
> +        goto out;
> +    }
> +    visit_type_str(ov, &nc->name, "netdev", &err);
> +    if (err) {
> +        goto out;
> +    }
> +    visit_type_str(ov, &queue, "queue", &err);
> +    if (err) {
> +        goto out;
> +    }
> +    visit_end_struct(ov, &err);
> +    if (err) {
> +        goto out;
> +    }
> +    obj = qmp_output_get_qobject(qov);
> +    g_assert(obj != NULL);
> +    qdict = qobject_to_qdict(obj);
> +    qmp_output_visitor_cleanup(qov);
> +
> +    qiv = qmp_input_visitor_new(obj);
> +    iv = qmp_input_get_visitor(qiv);
> +    object_add(TYPE_FILTER_BUFFER, id, qdict, iv, &err);
> +    qmp_input_visitor_cleanup(qiv);
> +    qobject_decref(obj);
> +out:
> +    g_free(queue);
> +    if (err) {
> +        error_propagate(errp, err);
> +    }
> +}
> +
>   static void filter_buffer_init(Object *obj)
>   {
>       object_property_add(obj, "interval", "int",
> diff --git a/net/net.c b/net/net.c
> index ade6051..b36d49f 100644
> --- a/net/net.c
> +++ b/net/net.c
> @@ -1028,6 +1028,14 @@ static int net_client_init1(const void *object, int is_netdev, Error **errp)
>           }
>           return -1;
>       }
> +
> +    if (is_netdev) {
> +        const Netdev *netdev = object;
> +
> +        netdev_add_default_filter_buffer(netdev->id,
> +                                         NET_FILTER_DIRECTION_RX,
> +                                         errp);
> +    }
>       return 0;
>   }
>   

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 34/39] net/filter-buffer: Add default filter-buffer for each netdev
  2015-11-30  1:19   ` Li Zhijian
@ 2015-12-01  8:56     ` Hailiang Zhang
  0 siblings, 0 replies; 95+ messages in thread
From: Hailiang Zhang @ 2015-12-01  8:56 UTC (permalink / raw)
  To: Li Zhijian, qemu-devel
  Cc: quintela, Jason Wang, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, arei.gonglei, stefanha, amit.shah, hongyang.yang

On 2015/11/30 9:19, Li Zhijian wrote:
> On 11/24/2015 05:25 PM, zhanghailiang wrote:
>> We add a default filter-buffer to each netdev, which will be used by COLO
>> or Micro-checkpointing to buffer the VM's packets. The name of the default
>> filter-buffer is 'nop'.
>> By default, the filter-buffer will not buffer any packets,
>> so it has no side effect on the netdev.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Cc: Jason Wang <jasowang@redhat.com>
>> Cc: Yang Hongyang <hongyang.yang@easystack.cn>
>> ---
>> v11:
>> - New patch
>> ---
>>   include/net/filter.h |  3 +++
>>   net/filter-buffer.c  | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   net/net.c            |  8 ++++++
>>   3 files changed, 85 insertions(+)
>>
>> diff --git a/include/net/filter.h b/include/net/filter.h
>> index 2deda36..01a7e90 100644
>> --- a/include/net/filter.h
>> +++ b/include/net/filter.h
>> @@ -74,4 +74,7 @@ ssize_t qemu_netfilter_pass_to_next(NetClientState *sender,
>>                                       int iovcnt,
>>                                       void *opaque);
>> +void netdev_add_default_filter_buffer(const char *netdev_id,
>> +                                      NetFilterDirection direction,
>> +                                      Error **errp);
>>   #endif /* QEMU_NET_FILTER_H */
>> diff --git a/net/filter-buffer.c b/net/filter-buffer.c
>> index 57be149..195af68 100644
>> --- a/net/filter-buffer.c
>> +++ b/net/filter-buffer.c
>> @@ -14,6 +14,12 @@
>>   #include "qapi/qmp/qerror.h"
>>   #include "qapi-visit.h"
>>   #include "qom/object.h"
>> +#include "net/net.h"
>> +#include "qapi/qmp/qdict.h"
>> +#include "qapi/qmp-output-visitor.h"
>> +#include "qapi/qmp-input-visitor.h"
>> +#include "monitor/monitor.h"
>> +#include "qmp-commands.h"
>>   #define TYPE_FILTER_BUFFER "filter-buffer"
>> @@ -26,6 +32,8 @@ typedef struct FilterBufferState {
>>       NetQueue *incoming_queue;
>>       uint32_t interval;
>>       QEMUTimer release_timer;
>> +    bool is_default;
>> +    bool enable_buffer;
>>   } FilterBufferState;
>>   static void filter_buffer_flush(NetFilterState *nf)
>> @@ -65,6 +73,10 @@ static ssize_t filter_buffer_receive_iov(NetFilterState *nf,
>>   {
>>       FilterBufferState *s = FILTER_BUFFER(nf);
>> +    /* Don't buffer any packets if the filter is not enabled */
>> +    if (!s->enable_buffer) {
>> +        return 0;
>> +    }
>>       /*
>>        * We return size when buffer a packet, the sender will take it as
>>        * a already sent packet, so sent_cb should not be called later.
>> @@ -102,6 +114,7 @@ static void filter_buffer_cleanup(NetFilterState *nf)
>>   static void filter_buffer_setup(NetFilterState *nf, Error **errp)
>>   {
>>       FilterBufferState *s = FILTER_BUFFER(nf);
>> +    char *path = object_get_canonical_path_component(OBJECT(nf));
>>       /*
>>        * We may want to accept zero interval when VM FT solutions like MC
>> @@ -114,6 +127,7 @@ static void filter_buffer_setup(NetFilterState *nf, Error **errp)
>>       }
>>       s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
>> +    s->is_default = !strcmp(path, "nop");
>>       if (s->interval) {
>>           timer_init_us(&s->release_timer, QEMU_CLOCK_VIRTUAL,
>>                         filter_buffer_release_timer, nf);
>> @@ -163,6 +177,66 @@ out:
>>       error_propagate(errp, local_err);
>>   }
>> +/*
>> +* This will be used by COLO or MC FT, for which they will need
>> +* to buffer the packets of VM's net devices, Here we add a default
>> +* buffer filter for each netdev. The name of default buffer filter is
>> +* 'nop'
>> +*/
>> +void netdev_add_default_filter_buffer(const char *netdev_id,
>> +                                      NetFilterDirection direction,
>> +                                      Error **errp)
>> +{
>> +    QmpOutputVisitor *qov;
>> +    QmpInputVisitor *qiv;
>> +    Visitor *ov, *iv;
>> +    QObject *obj = NULL;
>> +    QDict *qdict;
>> +    void *dummy = NULL;
>> +    const char *id = "nop";
>> +    char *queue = g_strdup(NetFilterDirection_lookup[direction]);
>> +    NetClientState *nc = qemu_find_netdev(netdev_id);
>> +    Error *err = NULL;
>> +
>> +    /* FIXME: Not support multiple queues */
>> +    if (!nc || nc->queue_index > 1) {
>
> 'queue' memory leaks?
>

Good catch, I will fix it in the next version. Thanks.

>
> Thanks
> Li
>
>> +        return;
>> +    }
>> +    qov = qmp_output_visitor_new();
>> +    ov = qmp_output_get_visitor(qov);
>> +    visit_start_struct(ov,  &dummy, NULL, NULL, 0, &err);
>> +    if (err) {
>> +        goto out;
>> +    }
>> +    visit_type_str(ov, &nc->name, "netdev", &err);
>> +    if (err) {
>> +        goto out;
>> +    }
>> +    visit_type_str(ov, &queue, "queue", &err);
>> +    if (err) {
>> +        goto out;
>> +    }
>> +    visit_end_struct(ov, &err);
>> +    if (err) {
>> +        goto out;
>> +    }
>> +    obj = qmp_output_get_qobject(qov);
>> +    g_assert(obj != NULL);
>> +    qdict = qobject_to_qdict(obj);
>> +    qmp_output_visitor_cleanup(qov);
>> +
>> +    qiv = qmp_input_visitor_new(obj);
>> +    iv = qmp_input_get_visitor(qiv);
>> +    object_add(TYPE_FILTER_BUFFER, id, qdict, iv, &err);
>> +    qmp_input_visitor_cleanup(qiv);
>> +    qobject_decref(obj);
>> +out:
>> +    g_free(queue);
>> +    if (err) {
>> +        error_propagate(errp, err);
>> +    }
>> +}
>> +
>>   static void filter_buffer_init(Object *obj)
>>   {
>>       object_property_add(obj, "interval", "int",
>> diff --git a/net/net.c b/net/net.c
>> index ade6051..b36d49f 100644
>> --- a/net/net.c
>> +++ b/net/net.c
>> @@ -1028,6 +1028,14 @@ static int net_client_init1(const void *object, int is_netdev, Error **errp)
>>           }
>>           return -1;
>>       }
>> +
>> +    if (is_netdev) {
>> +        const Netdev *netdev = object;
>> +
>> +        netdev_add_default_filter_buffer(netdev->id,
>> +                                         NET_FILTER_DIRECTION_RX,
>> +                                         errp);
>> +    }
>>       return 0;
>>   }
>
>
>
> .
>

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 18/39] COLO: Flush PVM's cached RAM into SVM's memory
  2015-11-27  5:29   ` Li Zhijian
@ 2015-12-01 12:02     ` Hailiang Zhang
  0 siblings, 0 replies; 95+ messages in thread
From: Hailiang Zhang @ 2015-12-01 12:02 UTC (permalink / raw)
  To: Li Zhijian, qemu-devel
  Cc: quintela, yunhong.jiang, eddie.dong, peter.huangpeng, dgilbert,
	arei.gonglei, stefanha, amit.shah, hongyang.yang

On 2015/11/27 13:29, Li Zhijian wrote:
> On 11/24/2015 05:25 PM, zhanghailiang wrote:
>
>> While the VM is running, the PVM may dirty some pages; we will transfer the
>> PVM's dirty pages to the SVM and store them into the SVM's RAM cache at the
>> next checkpoint time. So, the content of the SVM's RAM cache will always be
>> the same as the PVM's memory after a checkpoint.
>>
>> Instead of flushing all the content of the PVM's RAM cache into the SVM's memory,
>> we do this in a more efficient way:
>> only flush the pages that were dirtied by the PVM since the last checkpoint.
>> In this way, we can ensure the SVM's memory is the same as the PVM's.
>>
>> Besides, we must ensure the RAM cache is flushed before the device state is loaded.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> ---
>> v11:
>> - Move the place of 'need_flush' (Dave's suggestion)
>> - Remove unused 'DPRINTF("Flush ram_cache\n")'
>> v10:
>> - trace the number of dirty pages that be received.
>> ---
>>   include/migration/migration.h |  1 +
>>   migration/colo.c              |  2 --
>>   migration/ram.c               | 37 +++++++++++++++++++++++++++++++++++++
>>   trace-events                  |  1 +
>>   4 files changed, 39 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/migration/migration.h b/include/migration/migration.h
>> index e41372d..221176b 100644
>> --- a/include/migration/migration.h
>> +++ b/include/migration/migration.h
>> @@ -336,4 +336,5 @@ PostcopyState postcopy_state_set(PostcopyState new_state);
>>   /* ram cache */
>>   int colo_init_ram_cache(void);
>>   void colo_release_ram_cache(void);
>> +void colo_flush_ram_cache(void);
>>   #endif
>> diff --git a/migration/colo.c b/migration/colo.c
>> index 5ac8ff2..4095d97 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -385,8 +385,6 @@ void *colo_process_incoming_thread(void *opaque)
>>           }
>>           qemu_mutex_unlock_iothread();
>> -        /* TODO: flush vm state */
>> -
>>           ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_VMSTATE_LOADED, 0);
>>           if (ret < 0) {
>>               goto out;
>> diff --git a/migration/ram.c b/migration/ram.c
>> index da6bbd6..4f37144 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -2448,6 +2448,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>        * be atomic
>>        */
>>       bool postcopy_running = postcopy_state_get() >= POSTCOPY_INCOMING_LISTENING;
>> +    bool need_flush = false;
>>       seq_iter++;
>> @@ -2482,6 +2483,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>               /* After going into COLO, we should load the Page into colo_cache */
>>               if (ram_cache_enable) {
>>                   host = colo_cache_from_block_offset(block, addr);
>> +                need_flush = true;
>>               } else {
>>                   host = host_from_ram_block_offset(block, addr);
>>               }
>> @@ -2575,6 +2577,10 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>       }
>>       rcu_read_unlock();
>> +
>> +    if (!ret  && ram_cache_enable && need_flush) {
>> +        colo_flush_ram_cache();
>> +    }
>>       DPRINTF("Completed load of VM with exit code %d seq iteration "
>>               "%" PRIu64 "\n", ret, seq_iter);
>>       return ret;
>> @@ -2647,6 +2653,37 @@ void colo_release_ram_cache(void)
>>       rcu_read_unlock();
>>   }
>> +/*
>> + * Flush content of RAM cache into SVM's memory.
>> + * Only flush the pages that be dirtied by PVM or SVM or both.
>> + */
>> +void colo_flush_ram_cache(void)
>> +{
>> +    RAMBlock *block = NULL;
>> +    void *dst_host;
>> +    void *src_host;
>> +    ram_addr_t  offset = 0;
>
> coding style: double space
>

Fixed, thanks.

> thanks
> Li
>
>> +
>> +    trace_colo_flush_ram_cache(migration_dirty_pages);
>> +    rcu_read_lock();
>> +    block = QLIST_FIRST_RCU(&ram_list.blocks);
>> +    while (block) {
>> +        ram_addr_t ram_addr_abs;
>> +        offset = migration_bitmap_find_dirty(block, offset, &ram_addr_abs);
>> +        migration_bitmap_clear_dirty(ram_addr_abs);
>> +        if (offset >= block->used_length) {
>> +            offset = 0;
>> +            block = QLIST_NEXT_RCU(block, next);
>> +        } else {
>> +            dst_host = block->host + offset;
>> +            src_host = block->colo_cache + offset;
>> +            memcpy(dst_host, src_host, TARGET_PAGE_SIZE);
>> +        }
>> +    }
>> +    rcu_read_unlock();
>> +    assert(migration_dirty_pages == 0);
>> +}
>> +
>>   static SaveVMHandlers savevm_ram_handlers = {
>>       .save_live_setup = ram_save_setup,
>>       .save_live_iterate = ram_save_iterate,
>> diff --git a/trace-events b/trace-events
>> index f8a0959..f158d2a 100644
>> --- a/trace-events
>> +++ b/trace-events
>> @@ -1264,6 +1264,7 @@ migration_throttle(void) ""
>>   ram_load_postcopy_loop(uint64_t addr, int flags) "@%" PRIx64 " %x"
>>   ram_postcopy_send_discard_bitmap(void) ""
>>   ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: %zx len: %zx"
>> +colo_flush_ram_cache(uint64_t dirty_pages) "dirty_pages %" PRIu64""
>>   # hw/display/qxl.c
>>   disable qxl_interface_set_mm_time(int qid, uint32_t mm_time) "%d %d"
>
>

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 32/39] COLO: Separate the process of saving/loading ram and device state
  2015-11-27  5:10   ` Li Zhijian
@ 2015-12-01 12:07     ` Hailiang Zhang
  0 siblings, 0 replies; 95+ messages in thread
From: Hailiang Zhang @ 2015-12-01 12:07 UTC (permalink / raw)
  To: Li Zhijian, qemu-devel
  Cc: quintela, yunhong.jiang, eddie.dong, peter.huangpeng, dgilbert,
	arei.gonglei, stefanha, amit.shah, hongyang.yang

On 2015/11/27 13:10, Li Zhijian wrote:
> On 11/24/2015 05:25 PM, zhanghailiang wrote:
>> We separate the process of saving/loading ram and device state when doing a checkpoint,
>> and we add new helpers to save/load the ram/device state. With this change, we can directly
>> transfer ram from master to slave without using a QEMUSizeBuffer as an assistant,
>> which also reduces the amount of extra memory used during a checkpoint.
>>
>> Besides, we move the colo_flush_ram_cache to the proper position after the
>> above change.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> ---
>> v11:
>> - Remove load configuration section in qemu_loadvm_state_begin()
>> ---
>>   include/sysemu/sysemu.h |   6 +++
>>   migration/colo.c        |  43 ++++++++++++----
>>   migration/ram.c         |   5 --
>>   migration/savevm.c      | 132 ++++++++++++++++++++++++++++++++++++++++++++++--
>>   4 files changed, 168 insertions(+), 18 deletions(-)
>>
>> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
>> index 91eeda3..5deae53 100644
>> --- a/include/sysemu/sysemu.h
>> +++ b/include/sysemu/sysemu.h
>> @@ -133,7 +133,13 @@ void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
>>                                              uint64_t *start_list,
>>                                              uint64_t *length_list);
>> +int qemu_save_ram_precopy(QEMUFile *f);
>> +int qemu_save_device_state(QEMUFile *f);
>> +
>>   int qemu_loadvm_state(QEMUFile *f);
>> +int qemu_loadvm_state_begin(QEMUFile *f);
>> +int qemu_load_ram_state(QEMUFile *f);
>> +int qemu_load_device_state(QEMUFile *f);
>>   typedef enum DisplayType
>>   {
>> diff --git a/migration/colo.c b/migration/colo.c
>> index 3866e86..f7f349b 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -247,21 +247,32 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
>>           goto out;
>>       }
>> +    ret = colo_ctl_put(s->to_dst_file, COLO_COMMAND_VMSTATE_SEND, 0);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>>       /* Disable block migration */
>>       s->params.blk = 0;
>>       s->params.shared = 0;
>> -    qemu_savevm_state_header(trans);
>> -    qemu_savevm_state_begin(trans, &s->params);
>> -    qemu_mutex_lock_iothread();
>> -    qemu_savevm_state_complete_precopy(trans, false);
>> -    qemu_mutex_unlock_iothread();
>> -
>> -    qemu_fflush(trans);
>> +    qemu_savevm_state_begin(s->to_dst_file, &s->params);
>> +    ret = qemu_file_get_error(s->to_dst_file);
>> +    if (ret < 0) {
>> +        error_report("save vm state begin error\n");
>> +        goto out;
>> +    }
>> -    ret = colo_ctl_put(s->to_dst_file, COLO_COMMAND_VMSTATE_SEND, 0);
>> +    qemu_mutex_lock_iothread();
>> +    /* Note: device state is saved into buffer */
>> +    ret = qemu_save_device_state(trans);
>>       if (ret < 0) {
>> +        error_report("save device state error\n");
>> +        qemu_mutex_unlock_iothread();
>>           goto out;
>>       }
>> +    qemu_fflush(trans);
>> +    qemu_save_ram_precopy(s->to_dst_file);
>> +    qemu_mutex_unlock_iothread();
>> +
>>       /* we send the total size of the vmstate first */
>>       size = qsb_get_length(buffer);
>>       ret = colo_ctl_put(s->to_dst_file, COLO_COMMAND_VMSTATE_SIZE, size);
>> @@ -548,6 +559,16 @@ void *colo_process_incoming_thread(void *opaque)
>>               goto out;
>>           }
>> +        ret = qemu_loadvm_state_begin(mis->from_src_file);
>> +        if (ret < 0) {
>> +            error_report("load vm state begin error, ret=%d", ret);
>> +            goto out;
>> +        }
>> +        ret = qemu_load_ram_state(mis->from_src_file);
>> +        if (ret < 0) {
>> +            error_report("load ram state error");
>> +            goto out;
>> +        }
>>           /* read the VM state total size first */
>>           ret = colo_ctl_get(mis->from_src_file,
>>                              COLO_COMMAND_VMSTATE_SIZE, &value);
>> @@ -580,8 +601,10 @@ void *colo_process_incoming_thread(void *opaque)
>>           qemu_mutex_lock_iothread();
>>           qemu_system_reset(VMRESET_SILENT);
>>           vmstate_loading = true;
>> -        if (qemu_loadvm_state(fb) < 0) {
>> -            error_report("COLO: loadvm failed");
>> +        colo_flush_ram_cache();
>> +        ret = qemu_load_device_state(fb);
>> +        if (ret < 0) {
>> +            error_report("COLO: load device state failed\n");
>>               vmstate_loading = false;
>>               qemu_mutex_unlock_iothread();
>>               goto out;
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 4f37144..06a738b 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -2448,7 +2448,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>        * be atomic
>>        */
>>       bool postcopy_running = postcopy_state_get() >= POSTCOPY_INCOMING_LISTENING;
>> -    bool need_flush = false;
>>       seq_iter++;
>> @@ -2483,7 +2482,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>               /* After going into COLO, we should load the Page into colo_cache */
>>               if (ram_cache_enable) {
>>                   host = colo_cache_from_block_offset(block, addr);
>> -                need_flush = true;
>>               } else {
>>                   host = host_from_ram_block_offset(block, addr);
>>               }
>> @@ -2578,9 +2576,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>       rcu_read_unlock();
>> -    if (!ret  && ram_cache_enable && need_flush) {
>> -        colo_flush_ram_cache();
>> -    }
>>       DPRINTF("Completed load of VM with exit code %d seq iteration "
>>               "%" PRIu64 "\n", ret, seq_iter);
>>       return ret;
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index c7c26d8..949caf0 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -50,6 +50,7 @@
>>   #include "qemu/iov.h"
>>   #include "block/snapshot.h"
>>   #include "block/qapi.h"
>> +#include "migration/colo.h"
>>   #ifndef ETH_P_RARP
>> @@ -923,6 +924,10 @@ void qemu_savevm_state_begin(QEMUFile *f,
>>               break;
>>           }
>>       }
>> +    if (migration_in_colo_state()) {
>> +        qemu_put_byte(f, QEMU_VM_EOF);
>> +        qemu_fflush(f);
>> +    }
>>   }
>>   /*
>> @@ -1192,13 +1197,44 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
>>       return ret;
>>   }
>> -static int qemu_save_device_state(QEMUFile *f)
>> +int qemu_save_ram_precopy(QEMUFile *f)
>>   {
>>       SaveStateEntry *se;
>> +    int ret = 0;
>> -    qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
>> -    qemu_put_be32(f, QEMU_VM_FILE_VERSION);
>> +    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
>> +        if (!se->ops || !se->ops->save_live_complete_precopy) {
>> +            continue;
>> +        }
>> +        if (se->ops && se->ops->is_active) {
>> +            if (!se->ops->is_active(se->opaque)) {
>> +                continue;
>> +            }
>> +        }
>> +        trace_savevm_section_start(se->idstr, se->section_id);
>> +        save_section_header(f, se, QEMU_VM_SECTION_END);
>> +
>> +        ret = se->ops->save_live_complete_precopy(f, se->opaque);
>> +        trace_savevm_section_end(se->idstr, se->section_id, ret);
>> +        save_section_footer(f, se);
>> +        if (ret < 0) {
>> +            qemu_file_set_error(f, ret);
>> +            return ret;
>> +        }
>> +    }
>> +    qemu_put_byte(f, QEMU_VM_EOF);
>> +
>> +    return 0;
>> +}
>> +
>> +int qemu_save_device_state(QEMUFile *f)
>> +{
>> +    SaveStateEntry *se;
>> +
>> +    if (!migration_in_colo_state()) {
>> +        qemu_savevm_state_header(f);
>> +    }
>>       cpu_synchronize_all_states();
>>       QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
>> @@ -1938,6 +1974,96 @@ int qemu_loadvm_state(QEMUFile *f)
>>       return ret;
>>   }
>> +int qemu_loadvm_state_begin(QEMUFile *f)
>> +{
>> +    uint8_t section_type;
>> +    int ret = -1;
>> +    MigrationIncomingState *mis = migration_incoming_get_current();
>> +
>> +    if (!mis) {
>> +        error_report("qemu_loadvm_state_begin");
>> +        return -EINVAL;
>> +    }
>> +    /* CleanUp */
>> +    loadvm_free_handlers(mis);
>> +
>> +    if (qemu_savevm_state_blocked(NULL)) {
>> +        return -EINVAL;
>> +    }
>> +
>> +    while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
>> +        if (section_type != QEMU_VM_SECTION_START) {
>> +            error_report("QEMU_VM_SECTION_START");
>> +            ret = -EINVAL;
>> +            goto out;
>> +        }
>> +        ret = qemu_loadvm_section_start_full(f, mis);
>> +        if (ret < 0) {
>> +            goto out;
>> +        }
>> +    }
>> +    ret = qemu_file_get_error(f);
>> +    if (ret == 0) {
>> +        return 0;
>> +     }
>> +out:
>> +    return ret;
>> +}
>> +
>> +int qemu_load_ram_state(QEMUFile *f)
>> +{
>> +    uint8_t section_type;
>> +    MigrationIncomingState *mis = migration_incoming_get_current();
>> +    int ret = -1;
>> +
>> +    while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
>> +        if (section_type != QEMU_VM_SECTION_PART &&
>> +            section_type != QEMU_VM_SECTION_END) {
>> +            error_report("load ram state, not get "
>> +                         "QEMU_VM_SECTION_FULL or QEMU_VM_SECTION_END");
>> +            return -EINVAL;
>> +        }
>> +        ret = qemu_loadvm_section_part_end(f, mis);
>> +        if (ret < 0) {
>> +            goto out;
>> +        }
>> +    }
>> +    ret = qemu_file_get_error(f);
>> +    if (ret == 0) {
>> +        return 0;
>> +     }
>> +out:
>> +    return ret;
>> +}
>> +
>> +int qemu_load_device_state(QEMUFile *f)
>> +{
>> +    uint8_t section_type;
>> +    MigrationIncomingState *mis = migration_incoming_get_current();
>> +    int ret = -1;
>> +
>> +    while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
>> +        if (section_type != QEMU_VM_SECTION_FULL) {
>> +            error_report("load device state error: "
>> +                         "Not get QEMU_VM_SECTION_FULL");
>> +            return -EINVAL;
>> +        }
>> +         ret = qemu_loadvm_section_start_full(f, mis);
>> +         if (ret < 0) {
>> +            goto out;
>> +         }
>
> coding style: the above 4 lines are not aligned
>

>> +    }
>> +
>> +     ret = qemu_file_get_error(f);
>> +
>> +    cpu_synchronize_all_post_init();
>> +     if (ret == 0) {
>> +        return 0;
>> +     }
>
> aligned issue
>

I will fix them all in next version, thanks.


> thanks
> Li
>
>> +out:
>> +    return ret;
>> +}
>> +
>>   void hmp_savevm(Monitor *mon, const QDict *qdict)
>>   {
>>       BlockDriverState *bs, *bs1;
>
>
>
>
> .
>

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 14/39] ram: Split host_from_stream_offset() into two helper functions
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 14/39] ram: Split host_from_stream_offset() into two helper functions zhanghailiang
@ 2015-12-01 18:19   ` Dr. David Alan Gilbert
  2015-12-03  7:19     ` Hailiang Zhang
  0 siblings, 1 reply; 95+ messages in thread
From: Dr. David Alan Gilbert @ 2015-12-01 18:19 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> Split host_from_stream_offset() into two parts:
> One gets the RAM block (whose idstr may be read from the migration stream);
> the other gets the hva (host address) from the block and the offset.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>

OK, I see why you're doing this from the next patch.

> ---
> v11:
> - New patch
> ---
>  migration/ram.c | 29 +++++++++++++++++++++--------
>  1 file changed, 21 insertions(+), 8 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index cfe78aa..a161620 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2136,9 +2136,9 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
>   * offset: Offset within the block
>   * flags: Page flags (mostly to see if it's a continuation of previous block)
>   */
> -static inline void *host_from_stream_offset(QEMUFile *f,
> -                                            ram_addr_t offset,
> -                                            int flags)
> +static inline RAMBlock *ram_block_from_stream(QEMUFile *f,
> +                                              ram_addr_t offset,
> +                                              int flags)
>  {
>      static RAMBlock *block = NULL;
>      char id[256];
> @@ -2150,22 +2150,31 @@ static inline void *host_from_stream_offset(QEMUFile *f,
>              return NULL;
>          }
>  
> -        return block->host + offset;
> +        return block;
>      }
> -
>      len = qemu_get_byte(f);
>      qemu_get_buffer(f, (uint8_t *)id, len);
>      id[len] = 0;
>  
>      block = qemu_ram_block_by_name(id);
>      if (block && block->max_length > offset) {
> -        return block->host + offset;
> +        return block;
>      }
>  
>      error_report("Can't find block %s", id);
>      return NULL;
>  }
>  
> +static inline void *host_from_ram_block_offset(RAMBlock *block,
> +                                               ram_addr_t offset)
> +{
> +    if (!block) {
> +        return NULL;
> +    }
> +
> +    return block->host + offset;
> +}

That's almost the same as ramblock_ptr in include/exec/ram_addr.h, but
that one asserts rather than returning NULL on errors.

I'm not sure about this, but can I suggest:

   ram_block_from_stream(QEMUFile *f, int flags)

  doesn't have the offset; just finds the block and handles the CONT.

   bool offset_in_ramblock(RAMBlock *b, ram_addr_t offset);

  actually does the check; put this in exec.c, and declare it in include/exec/ram_addr.h

   void *ramblock_ptr_try(RAMBlock *block, ram_addr_t offset)
  which returns NULL if offset_in_ramblock fails, and otherwise returns the result
of ramblock_ptr - again put that in include/exec/ram_addr.h

(I'm not sure about this - I almost suggested changing ramblock_ptr to not do
the checks, and just add a call to assert(offset_in_ramblock) before each use, but
that sounded too painful).

Hmm - we check here for block->max_length > offset - whereas the check in
ram_addr.h uses used_length - I wonder if we should be using used_length?
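
Roughly what I have in mind, as a sketch only (not final code, and using
used_length per the above):

    /* exec.c */
    bool offset_in_ramblock(RAMBlock *b, ram_addr_t offset)
    {
        return b && offset < b->used_length;
    }

    /* include/exec/ram_addr.h */
    bool offset_in_ramblock(RAMBlock *b, ram_addr_t offset);

    static inline void *ramblock_ptr_try(RAMBlock *block, ram_addr_t offset)
    {
        /* Returns NULL instead of asserting when the block/offset is bad. */
        if (!offset_in_ramblock(block, offset)) {
            return NULL;
        }
        return block->host + offset;
    }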

Dave

> +
>  /*
>   * If a page (or a whole RDMA chunk) has been
>   * determined to be zero, then zap it.
> @@ -2310,7 +2319,9 @@ static int ram_load_postcopy(QEMUFile *f)
>          trace_ram_load_postcopy_loop((uint64_t)addr, flags);
>          place_needed = false;
>          if (flags & (RAM_SAVE_FLAG_COMPRESS | RAM_SAVE_FLAG_PAGE)) {
> -            host = host_from_stream_offset(f, addr, flags);
> +            RAMBlock *block = ram_block_from_stream(f, addr, flags);
> +
> +            host = host_from_ram_block_offset(block, addr);
>              if (!host) {
>                  error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
>                  ret = -EINVAL;
> @@ -2441,7 +2452,9 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>  
>          if (flags & (RAM_SAVE_FLAG_COMPRESS | RAM_SAVE_FLAG_PAGE |
>                       RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
> -            host = host_from_stream_offset(f, addr, flags);
> +            RAMBlock *block = ram_block_from_stream(f, addr, flags);
> +
> +            host = host_from_ram_block_offset(block, addr);
>              if (!host) {
>                  error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
>                  ret = -EINVAL;
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 15/39] COLO: Load PVM's dirty pages into SVM's RAM cache temporarily
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 15/39] COLO: Load PVM's dirty pages into SVM's RAM cache temporarily zhanghailiang
@ 2015-12-01 19:02   ` Dr. David Alan Gilbert
  2015-12-03  8:25     ` Hailiang Zhang
  0 siblings, 1 reply; 95+ messages in thread
From: Dr. David Alan Gilbert @ 2015-12-01 19:02 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> We should not load PVM's state directly into SVM, because errors may happen
> while SVM is receiving data, which would break SVM.
> 
> We need to ensure that all data has been received before loading the state
> into SVM. We use extra memory to cache this data (PVM's RAM). The RAM cache on
> the secondary side is initially the same as SVM/PVM's memory. During each
> checkpoint, we first cache PVM's dirty pages in this RAM cache, so the RAM
> cache is always the same as PVM's memory at every checkpoint; we then flush
> this cached RAM to SVM after we have received all of PVM's state.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> ---
> v11:
> - Rename 'host_cache' to 'colo_cache' (Dave's suggestion)
> v10:
> - Split the process of dirty pages recording into a new patch
> ---
>  include/exec/ram_addr.h       |  1 +
>  include/migration/migration.h |  4 +++
>  migration/colo.c              | 10 +++++++
>  migration/ram.c               | 69 ++++++++++++++++++++++++++++++++++++++++++-
>  4 files changed, 83 insertions(+), 1 deletion(-)
> 
> diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
> index 7115154..bb44f66 100644
> --- a/include/exec/ram_addr.h
> +++ b/include/exec/ram_addr.h
> @@ -26,6 +26,7 @@ struct RAMBlock {
>      struct rcu_head rcu;
>      struct MemoryRegion *mr;
>      uint8_t *host;
> +    uint8_t *colo_cache; /* For colo, VM's ram cache */
>      ram_addr_t offset;
>      ram_addr_t used_length;
>      ram_addr_t max_length;
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index ba5bcec..e41372d 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -332,4 +332,8 @@ int ram_save_queue_pages(MigrationState *ms, const char *rbname,
>  PostcopyState postcopy_state_get(void);
>  /* Set the state and return the old state */
>  PostcopyState postcopy_state_set(PostcopyState new_state);
> +
> +/* ram cache */
> +int colo_init_ram_cache(void);
> +void colo_release_ram_cache(void);
>  #endif
> diff --git a/migration/colo.c b/migration/colo.c
> index 012d8e5..6e933fa 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -304,6 +304,12 @@ void *colo_process_incoming_thread(void *opaque)
>      qemu_set_block(qemu_get_fd(mis->from_src_file));
>  
>  
> +    ret = colo_init_ram_cache();
> +    if (ret < 0) {
> +        error_report("Failed to initialize ram cache");
> +        goto out;
> +    }
> +
>      ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_CHECKPOINT_READY, 0);
>      if (ret < 0) {
>          goto out;
> @@ -353,6 +359,10 @@ out:
>                       strerror(-ret));
>      }
>  
> +    qemu_mutex_lock_iothread();
> +    colo_release_ram_cache();
> +    qemu_mutex_unlock_iothread();
> +
>      if (mis->to_src_file) {
>          qemu_fclose(mis->to_src_file);
>      }
> diff --git a/migration/ram.c b/migration/ram.c
> index a161620..9d946a1 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -223,6 +223,7 @@ static RAMBlock *last_sent_block;
>  static ram_addr_t last_offset;
>  static QemuMutex migration_bitmap_mutex;
>  static uint64_t migration_dirty_pages;
> +static bool ram_cache_enable;
>  static uint32_t last_version;
>  static bool ram_bulk_stage;
>  
> @@ -2175,6 +2176,16 @@ static inline void *host_from_ram_block_offset(RAMBlock *block,
>      return block->host + offset;
>  }
>  
> +static inline void *colo_cache_from_block_offset(RAMBlock *block,
> +                                                 ram_addr_t offset)
> +{
> +    if (!block) {
> +        return NULL;
> +    }
> +
> +    return block->colo_cache + offset;
> +}
> +
>  /*
>   * If a page (or a whole RDMA chunk) has been
>   * determined to be zero, then zap it.
> @@ -2454,7 +2465,12 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>                       RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
>              RAMBlock *block = ram_block_from_stream(f, addr, flags);
>  
> -            host = host_from_ram_block_offset(block, addr);
> +            /* After going into COLO, we should load the Page into colo_cache */
> +            if (ram_cache_enable) {
> +                host = colo_cache_from_block_offset(block, addr);
> +            } else {
> +                host = host_from_ram_block_offset(block, addr);
> +            }
>              if (!host) {
>                  error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
>                  ret = -EINVAL;
> @@ -2550,6 +2566,57 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>      return ret;
>  }
>  
> +/*
> + * colo cache: this is for secondary VM, we cache the whole
> + * memory of the secondary VM, it will be called after first migration.
> + */
> +int colo_init_ram_cache(void)
> +{
> +    RAMBlock *block;
> +
> +    rcu_read_lock();
> +    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
> +        block->colo_cache = qemu_anon_ram_alloc(block->used_length, NULL);
> +        if (!block->colo_cache) {
> +            error_report("%s: Can't alloc memory for colo cache of block %s,"
> +                         "size %zu", __func__, block->idstr,
> +                         block->used_length);

Minor one that I didn't spot before;  I think that has to be RAM_ADDR_FMT instead of %zu
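
i.e. something like (only the format specifier changes):

    error_report("%s: Can't alloc memory for colo cache of block %s,"
                 "size " RAM_ADDR_FMT, __func__, block->idstr,
                 block->used_length);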

However, other than that;

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Dave

> +            goto out_locked;
> +        }
> +        memcpy(block->colo_cache, block->host, block->used_length);
> +    }
> +    rcu_read_unlock();
> +    ram_cache_enable = true;
> +    return 0;
> +
> +out_locked:
> +    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
> +        if (block->colo_cache) {
> +            qemu_anon_ram_free(block->colo_cache, block->used_length);
> +            block->colo_cache = NULL;
> +        }
> +    }
> +
> +    rcu_read_unlock();
> +    return -errno;
> +}
> +
> +void colo_release_ram_cache(void)
> +{
> +    RAMBlock *block;
> +
> +    ram_cache_enable = false;
> +
> +    rcu_read_lock();
> +    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
> +        if (block->colo_cache) {
> +            qemu_anon_ram_free(block->colo_cache, block->used_length);
> +            block->colo_cache = NULL;
> +        }
> +    }
> +    rcu_read_unlock();
> +}
> +
>  static SaveVMHandlers savevm_ram_handlers = {
>      .save_live_setup = ram_save_setup,
>      .save_live_iterate = ram_save_iterate,
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 16/39] ram/COLO: Record the dirty pages that SVM received
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 16/39] ram/COLO: Record the dirty pages that SVM received zhanghailiang
@ 2015-12-01 19:36   ` Dr. David Alan Gilbert
  2015-12-03  8:29     ` Hailiang Zhang
  0 siblings, 1 reply; 95+ messages in thread
From: Dr. David Alan Gilbert @ 2015-12-01 19:36 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> We record the addresses of the dirty pages that are received;
> this will help when flushing the pages that are cached into SVM.
> We record them by re-using the migration dirty bitmap.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> ---
> v11:
> - Split a new helper function from original
>   host_from_stream_offset() (Dave's suggestion)
> - Only do recording work in this patch
> v10:
> - New patch split from v9's patch 13
> - Rebase to master to use 'migration_bitmap_rcu'
> ---
>  migration/ram.c | 30 ++++++++++++++++++++++++++++++
>  1 file changed, 30 insertions(+)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 9d946a1..da6bbd6 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2153,6 +2153,7 @@ static inline RAMBlock *ram_block_from_stream(QEMUFile *f,
>  
>          return block;
>      }
> +
>      len = qemu_get_byte(f);
>      qemu_get_buffer(f, (uint8_t *)id, len);
>      id[len] = 0;

That blank should probably go, but other than that:

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Dave

> @@ -2179,10 +2180,23 @@ static inline void *host_from_ram_block_offset(RAMBlock *block,
>  static inline void *colo_cache_from_block_offset(RAMBlock *block,
>                                                   ram_addr_t offset)
>  {
> +    unsigned long *bitmap;
> +    long k;
> +
>      if (!block) {
>          return NULL;
>      }
>  
> +    k = (block->mr->ram_addr + offset) >> TARGET_PAGE_BITS;
> +    bitmap = atomic_rcu_read(&migration_bitmap_rcu)->bmap;
> +    /*
> +    * During colo checkpoint, we need bitmap of these migrated pages.
> +    * It help us to decide which pages in ram cache should be flushed
> +    * into VM's RAM later.
> +    */
> +    if (!test_and_set_bit(k, bitmap)) {
> +        migration_dirty_pages++;
> +    }
>      return block->colo_cache + offset;
>  }
>  
> @@ -2573,6 +2587,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>  int colo_init_ram_cache(void)
>  {
>      RAMBlock *block;
> +    int64_t ram_cache_pages = last_ram_offset() >> TARGET_PAGE_BITS;
>  
>      rcu_read_lock();
>      QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
> @@ -2587,6 +2602,15 @@ int colo_init_ram_cache(void)
>      }
>      rcu_read_unlock();
>      ram_cache_enable = true;
> +    /*
> +    * Record the dirty pages that sent by PVM, we use this dirty bitmap together
> +    * with to decide which page in cache should be flushed into SVM's RAM. Here
> +    * we use the same name 'migration_bitmap_rcu' as for migration.
> +    */
> +    migration_bitmap_rcu = g_new0(struct BitmapRcu, 1);
> +    migration_bitmap_rcu->bmap = bitmap_new(ram_cache_pages);
> +    migration_dirty_pages = 0;
> +
>      return 0;
>  
>  out_locked:
> @@ -2604,9 +2628,15 @@ out_locked:
>  void colo_release_ram_cache(void)
>  {
>      RAMBlock *block;
> +    struct BitmapRcu *bitmap = migration_bitmap_rcu;
>  
>      ram_cache_enable = false;
>  
> +    atomic_rcu_set(&migration_bitmap_rcu, NULL);
> +    if (bitmap) {
> +        call_rcu(bitmap, migration_bitmap_free, rcu);
> +    }
> +
>      rcu_read_lock();
>      QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
>          if (block->colo_cache) {
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 18/39] COLO: Flush PVM's cached RAM into SVM's memory
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 18/39] COLO: Flush PVM's cached RAM into SVM's memory zhanghailiang
  2015-11-27  5:29   ` Li Zhijian
@ 2015-12-01 20:06   ` Dr. David Alan Gilbert
  2015-12-03  8:50     ` Hailiang Zhang
  1 sibling, 1 reply; 95+ messages in thread
From: Dr. David Alan Gilbert @ 2015-12-01 20:06 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> While the VM is running, PVM may dirty some pages; we transfer PVM's dirty
> pages to SVM and store them in SVM's RAM cache at the next checkpoint time.
> So the content of SVM's RAM cache will always be the same as PVM's memory
> after a checkpoint.
> 
> Instead of flushing the whole content of the RAM cache into SVM's memory,
> we do this in a more efficient way:
> only flush pages that have been dirtied by PVM since the last checkpoint.
> In this way, we can ensure SVM's memory is the same as PVM's.
> 
> Besides, we must ensure the RAM cache is flushed before loading the device state.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> ---
> v11:
> - Move the place of 'need_flush' (Dave's suggestion)
> - Remove unused 'DPRINTF("Flush ram_cache\n")'
> v10:
> - trace the number of dirty pages that be received.
> ---
>  include/migration/migration.h |  1 +
>  migration/colo.c              |  2 --
>  migration/ram.c               | 37 +++++++++++++++++++++++++++++++++++++
>  trace-events                  |  1 +
>  4 files changed, 39 insertions(+), 2 deletions(-)
> 
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index e41372d..221176b 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -336,4 +336,5 @@ PostcopyState postcopy_state_set(PostcopyState new_state);
>  /* ram cache */
>  int colo_init_ram_cache(void);
>  void colo_release_ram_cache(void);
> +void colo_flush_ram_cache(void);
>  #endif
> diff --git a/migration/colo.c b/migration/colo.c
> index 5ac8ff2..4095d97 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -385,8 +385,6 @@ void *colo_process_incoming_thread(void *opaque)
>          }
>          qemu_mutex_unlock_iothread();
>  
> -        /* TODO: flush vm state */
> -

Might have been better to put the TODO in a place that needed to be changed!

>          ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_VMSTATE_LOADED, 0);
>          if (ret < 0) {
>              goto out;
> diff --git a/migration/ram.c b/migration/ram.c
> index da6bbd6..4f37144 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2448,6 +2448,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>       * be atomic
>       */
>      bool postcopy_running = postcopy_state_get() >= POSTCOPY_INCOMING_LISTENING;
> +    bool need_flush = false;
>  
>      seq_iter++;
>  
> @@ -2482,6 +2483,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>              /* After going into COLO, we should load the Page into colo_cache */
>              if (ram_cache_enable) {
>                  host = colo_cache_from_block_offset(block, addr);
> +                need_flush = true;
>              } else {
>                  host = host_from_ram_block_offset(block, addr);
>              }
> @@ -2575,6 +2577,10 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>      }
>  
>      rcu_read_unlock();
> +
> +    if (!ret  && ram_cache_enable && need_flush) {
> +        colo_flush_ram_cache();
> +    }
>      DPRINTF("Completed load of VM with exit code %d seq iteration "
>              "%" PRIu64 "\n", ret, seq_iter);
>      return ret;
> @@ -2647,6 +2653,37 @@ void colo_release_ram_cache(void)
>      rcu_read_unlock();
>  }
>  
> +/*
> + * Flush content of RAM cache into SVM's memory.
> + * Only flush the pages that be dirtied by PVM or SVM or both.
> + */
> +void colo_flush_ram_cache(void)
> +{
> +    RAMBlock *block = NULL;
> +    void *dst_host;
> +    void *src_host;
> +    ram_addr_t  offset = 0;
> +
> +    trace_colo_flush_ram_cache(migration_dirty_pages);
> +    rcu_read_lock();
> +    block = QLIST_FIRST_RCU(&ram_list.blocks);
> +    while (block) {
> +        ram_addr_t ram_addr_abs;
> +        offset = migration_bitmap_find_dirty(block, offset, &ram_addr_abs);
> +        migration_bitmap_clear_dirty(ram_addr_abs);
> +        if (offset >= block->used_length) {
> +            offset = 0;
> +            block = QLIST_NEXT_RCU(block, next);
> +        } else {
> +            dst_host = block->host + offset;
> +            src_host = block->colo_cache + offset;
> +            memcpy(dst_host, src_host, TARGET_PAGE_SIZE);
> +        }
> +    }
> +    rcu_read_unlock();

If you added a trace point here as well, it would make it very easy
to measure how long the flush was taking.
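
e.g. a paired trace point after the loop (the name here is only a sketch, it
is not in the patch):

    trace_colo_flush_ram_cache_end();

with a matching line in trace-events:

    colo_flush_ram_cache_end(void) ""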

> +    assert(migration_dirty_pages == 0);
> +}
> +
>  static SaveVMHandlers savevm_ram_handlers = {
>      .save_live_setup = ram_save_setup,
>      .save_live_iterate = ram_save_iterate,
> diff --git a/trace-events b/trace-events
> index f8a0959..f158d2a 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -1264,6 +1264,7 @@ migration_throttle(void) ""
>  ram_load_postcopy_loop(uint64_t addr, int flags) "@%" PRIx64 " %x"
>  ram_postcopy_send_discard_bitmap(void) ""
>  ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: %zx len: %zx"
> +colo_flush_ram_cache(uint64_t dirty_pages) "dirty_pages %" PRIu64""

Minor; I think you can remove the "" at the end.
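
i.e.:

    colo_flush_ram_cache(uint64_t dirty_pages) "dirty_pages %" PRIu64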

Other than those minor things (and those double spaces!):

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Dave

>  
>  # hw/display/qxl.c
>  disable qxl_interface_set_mm_time(int qid, uint32_t mm_time) "%d %d"
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 34/39] net/filter-buffer: Add default filter-buffer for each netdev
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 34/39] net/filter-buffer: Add default filter-buffer for each netdev zhanghailiang
  2015-11-27 11:39   ` Yang Hongyang
  2015-11-30  1:19   ` Li Zhijian
@ 2015-12-03  1:17   ` Wen Congyang
  2015-12-03  3:53     ` Hailiang Zhang
  2 siblings, 1 reply; 95+ messages in thread
From: Wen Congyang @ 2015-12-03  1:17 UTC (permalink / raw)
  To: zhanghailiang, qemu-devel
  Cc: lizhijian, quintela, Jason Wang, yunhong.jiang, eddie.dong,
	peter.huangpeng, dgilbert, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

On 11/24/2015 05:25 PM, zhanghailiang wrote:
> We add a default filter-buffer to each netdev, which will be used by COLO
> or Micro-checkpointing to buffer the VM's packets. The name of the default
> filter-buffer is 'nop'.
> The default filter-buffer does not buffer any packets by default,
> so it has no side effect on the netdev.

No, filter-buffer doesn't support vhost, so if you add default filter-buffer
for each netdev, you can't use vhost.

Thanks
Wen Congyang

> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Yang Hongyang <hongyang.yang@easystack.cn>
> ---
> v11:
> - New patch
> ---
>  include/net/filter.h |  3 +++
>  net/filter-buffer.c  | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  net/net.c            |  8 ++++++
>  3 files changed, 85 insertions(+)
> 
> diff --git a/include/net/filter.h b/include/net/filter.h
> index 2deda36..01a7e90 100644
> --- a/include/net/filter.h
> +++ b/include/net/filter.h
> @@ -74,4 +74,7 @@ ssize_t qemu_netfilter_pass_to_next(NetClientState *sender,
>                                      int iovcnt,
>                                      void *opaque);
>  
> +void netdev_add_default_filter_buffer(const char *netdev_id,
> +                                      NetFilterDirection direction,
> +                                      Error **errp);
>  #endif /* QEMU_NET_FILTER_H */
> diff --git a/net/filter-buffer.c b/net/filter-buffer.c
> index 57be149..195af68 100644
> --- a/net/filter-buffer.c
> +++ b/net/filter-buffer.c
> @@ -14,6 +14,12 @@
>  #include "qapi/qmp/qerror.h"
>  #include "qapi-visit.h"
>  #include "qom/object.h"
> +#include "net/net.h"
> +#include "qapi/qmp/qdict.h"
> +#include "qapi/qmp-output-visitor.h"
> +#include "qapi/qmp-input-visitor.h"
> +#include "monitor/monitor.h"
> +#include "qmp-commands.h"
>  
>  #define TYPE_FILTER_BUFFER "filter-buffer"
>  
> @@ -26,6 +32,8 @@ typedef struct FilterBufferState {
>      NetQueue *incoming_queue;
>      uint32_t interval;
>      QEMUTimer release_timer;
> +    bool is_default;
> +    bool enable_buffer;
>  } FilterBufferState;
>  
>  static void filter_buffer_flush(NetFilterState *nf)
> @@ -65,6 +73,10 @@ static ssize_t filter_buffer_receive_iov(NetFilterState *nf,
>  {
>      FilterBufferState *s = FILTER_BUFFER(nf);
>  
> +    /* Don't buffer any packets if the filter is not enabled */
> +    if (!s->enable_buffer) {
> +        return 0;
> +    }
>      /*
>       * We return size when buffer a packet, the sender will take it as
>       * a already sent packet, so sent_cb should not be called later.
> @@ -102,6 +114,7 @@ static void filter_buffer_cleanup(NetFilterState *nf)
>  static void filter_buffer_setup(NetFilterState *nf, Error **errp)
>  {
>      FilterBufferState *s = FILTER_BUFFER(nf);
> +    char *path = object_get_canonical_path_component(OBJECT(nf));
>  
>      /*
>       * We may want to accept zero interval when VM FT solutions like MC
> @@ -114,6 +127,7 @@ static void filter_buffer_setup(NetFilterState *nf, Error **errp)
>      }
>  
>      s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
> +    s->is_default = !strcmp(path, "nop");
>      if (s->interval) {
>          timer_init_us(&s->release_timer, QEMU_CLOCK_VIRTUAL,
>                        filter_buffer_release_timer, nf);
> @@ -163,6 +177,66 @@ out:
>      error_propagate(errp, local_err);
>  }
>  
> +/*
> +* This will be used by COLO or MC FT, for which they will need
> +* to buffer the packets of VM's net devices, Here we add a default
> +* buffer filter for each netdev. The name of default buffer filter is
> +* 'nop'
> +*/
> +void netdev_add_default_filter_buffer(const char *netdev_id,
> +                                      NetFilterDirection direction,
> +                                      Error **errp)
> +{
> +    QmpOutputVisitor *qov;
> +    QmpInputVisitor *qiv;
> +    Visitor *ov, *iv;
> +    QObject *obj = NULL;
> +    QDict *qdict;
> +    void *dummy = NULL;
> +    const char *id = "nop";
> +    char *queue = g_strdup(NetFilterDirection_lookup[direction]);
> +    NetClientState *nc = qemu_find_netdev(netdev_id);
> +    Error *err = NULL;
> +
> +    /* FIXME: Not support multiple queues */
> +    if (!nc || nc->queue_index > 1) {
> +        return;
> +    }
> +    qov = qmp_output_visitor_new();
> +    ov = qmp_output_get_visitor(qov);
> +    visit_start_struct(ov,  &dummy, NULL, NULL, 0, &err);
> +    if (err) {
> +        goto out;
> +    }
> +    visit_type_str(ov, &nc->name, "netdev", &err);
> +    if (err) {
> +        goto out;
> +    }
> +    visit_type_str(ov, &queue, "queue", &err);
> +    if (err) {
> +        goto out;
> +    }
> +    visit_end_struct(ov, &err);
> +    if (err) {
> +        goto out;
> +    }
> +    obj = qmp_output_get_qobject(qov);
> +    g_assert(obj != NULL);
> +    qdict = qobject_to_qdict(obj);
> +    qmp_output_visitor_cleanup(qov);
> +
> +    qiv = qmp_input_visitor_new(obj);
> +    iv = qmp_input_get_visitor(qiv);
> +    object_add(TYPE_FILTER_BUFFER, id, qdict, iv, &err);
> +    qmp_input_visitor_cleanup(qiv);
> +    qobject_decref(obj);
> +out:
> +    g_free(queue);
> +    if (err) {
> +        error_propagate(errp, err);
> +    }
> +}
> +
>  static void filter_buffer_init(Object *obj)
>  {
>      object_property_add(obj, "interval", "int",
> diff --git a/net/net.c b/net/net.c
> index ade6051..b36d49f 100644
> --- a/net/net.c
> +++ b/net/net.c
> @@ -1028,6 +1028,14 @@ static int net_client_init1(const void *object, int is_netdev, Error **errp)
>          }
>          return -1;
>      }
> +
> +    if (is_netdev) {
> +        const Netdev *netdev = object;
> +
> +        netdev_add_default_filter_buffer(netdev->id,
> +                                         NET_FILTER_DIRECTION_RX,
> +                                         errp);
> +    }
>      return 0;
>  }
>  
> 

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 34/39] net/filter-buffer: Add default filter-buffer for each netdev
  2015-12-03  1:17   ` Wen Congyang
@ 2015-12-03  3:53     ` Hailiang Zhang
  2015-12-03  6:25       ` Wen Congyang
  0 siblings, 1 reply; 95+ messages in thread
From: Hailiang Zhang @ 2015-12-03  3:53 UTC (permalink / raw)
  To: Wen Congyang, qemu-devel
  Cc: lizhijian, quintela, Jason Wang, yunhong.jiang, eddie.dong,
	peter.huangpeng, dgilbert, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

On 2015/12/3 9:17, Wen Congyang wrote:
> On 11/24/2015 05:25 PM, zhanghailiang wrote:
>> We add a default filter-buffer to each netdev, which will be used by COLO
>> or Micro-checkpointing to buffer the VM's packets. The name of the default
>> filter-buffer is 'nop'.
>> The default filter-buffer does not buffer any packets by default,
>> so it has no side effect on the netdev.
>
> No, filter-buffer doesn't support vhost, so if you add default filter-buffer
> for each netdev, you can't use vhost.
>

Have you tested it ? Did the default filter-buffer break vhost ?
It's not supposed to break vhost, I will look into it. Thanks.

> Thanks
> Wen Congyang
>
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Cc: Jason Wang <jasowang@redhat.com>
>> Cc: Yang Hongyang <hongyang.yang@easystack.cn>
>> ---
>> v11:
>> - New patch
>> ---
>>   include/net/filter.h |  3 +++
>>   net/filter-buffer.c  | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   net/net.c            |  8 ++++++
>>   3 files changed, 85 insertions(+)
>>
>> diff --git a/include/net/filter.h b/include/net/filter.h
>> index 2deda36..01a7e90 100644
>> --- a/include/net/filter.h
>> +++ b/include/net/filter.h
>> @@ -74,4 +74,7 @@ ssize_t qemu_netfilter_pass_to_next(NetClientState *sender,
>>                                       int iovcnt,
>>                                       void *opaque);
>>
>> +void netdev_add_default_filter_buffer(const char *netdev_id,
>> +                                      NetFilterDirection direction,
>> +                                      Error **errp);
>>   #endif /* QEMU_NET_FILTER_H */
>> diff --git a/net/filter-buffer.c b/net/filter-buffer.c
>> index 57be149..195af68 100644
>> --- a/net/filter-buffer.c
>> +++ b/net/filter-buffer.c
>> @@ -14,6 +14,12 @@
>>   #include "qapi/qmp/qerror.h"
>>   #include "qapi-visit.h"
>>   #include "qom/object.h"
>> +#include "net/net.h"
>> +#include "qapi/qmp/qdict.h"
>> +#include "qapi/qmp-output-visitor.h"
>> +#include "qapi/qmp-input-visitor.h"
>> +#include "monitor/monitor.h"
>> +#include "qmp-commands.h"
>>
>>   #define TYPE_FILTER_BUFFER "filter-buffer"
>>
>> @@ -26,6 +32,8 @@ typedef struct FilterBufferState {
>>       NetQueue *incoming_queue;
>>       uint32_t interval;
>>       QEMUTimer release_timer;
>> +    bool is_default;
>> +    bool enable_buffer;
>>   } FilterBufferState;
>>
>>   static void filter_buffer_flush(NetFilterState *nf)
>> @@ -65,6 +73,10 @@ static ssize_t filter_buffer_receive_iov(NetFilterState *nf,
>>   {
>>       FilterBufferState *s = FILTER_BUFFER(nf);
>>
>> +    /* Don't buffer any packets if the filter is not enabled */
>> +    if (!s->enable_buffer) {
>> +        return 0;
>> +    }
>>       /*
>>        * We return size when buffer a packet, the sender will take it as
>>        * a already sent packet, so sent_cb should not be called later.
>> @@ -102,6 +114,7 @@ static void filter_buffer_cleanup(NetFilterState *nf)
>>   static void filter_buffer_setup(NetFilterState *nf, Error **errp)
>>   {
>>       FilterBufferState *s = FILTER_BUFFER(nf);
>> +    char *path = object_get_canonical_path_component(OBJECT(nf));
>>
>>       /*
>>        * We may want to accept zero interval when VM FT solutions like MC
>> @@ -114,6 +127,7 @@ static void filter_buffer_setup(NetFilterState *nf, Error **errp)
>>       }
>>
>>       s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
>> +    s->is_default = !strcmp(path, "nop");
>>       if (s->interval) {
>>           timer_init_us(&s->release_timer, QEMU_CLOCK_VIRTUAL,
>>                         filter_buffer_release_timer, nf);
>> @@ -163,6 +177,66 @@ out:
>>       error_propagate(errp, local_err);
>>   }
>>
>> +/*
>> +* This will be used by COLO or MC FT, for which they will need
>> +* to buffer the packets of VM's net devices, Here we add a default
>> +* buffer filter for each netdev. The name of default buffer filter is
>> +* 'nop'
>> +*/
>> +void netdev_add_default_filter_buffer(const char *netdev_id,
>> +                                      NetFilterDirection direction,
>> +                                      Error **errp)
>> +{
>> +    QmpOutputVisitor *qov;
>> +    QmpInputVisitor *qiv;
>> +    Visitor *ov, *iv;
>> +    QObject *obj = NULL;
>> +    QDict *qdict;
>> +    void *dummy = NULL;
>> +    const char *id = "nop";
>> +    char *queue = g_strdup(NetFilterDirection_lookup[direction]);
>> +    NetClientState *nc = qemu_find_netdev(netdev_id);
>> +    Error *err = NULL;
>> +
>> +    /* FIXME: Not support multiple queues */
>> +    if (!nc || nc->queue_index > 1) {
>> +        return;
>> +    }
>> +    qov = qmp_output_visitor_new();
>> +    ov = qmp_output_get_visitor(qov);
>> +    visit_start_struct(ov,  &dummy, NULL, NULL, 0, &err);
>> +    if (err) {
>> +        goto out;
>> +    }
>> +    visit_type_str(ov, &nc->name, "netdev", &err);
>> +    if (err) {
>> +        goto out;
>> +    }
>> +    visit_type_str(ov, &queue, "queue", &err);
>> +    if (err) {
>> +        goto out;
>> +    }
>> +    visit_end_struct(ov, &err);
>> +    if (err) {
>> +        goto out;
>> +    }
>> +    obj = qmp_output_get_qobject(qov);
>> +    g_assert(obj != NULL);
>> +    qdict = qobject_to_qdict(obj);
>> +    qmp_output_visitor_cleanup(qov);
>> +
>> +    qiv = qmp_input_visitor_new(obj);
>> +    iv = qmp_input_get_visitor(qiv);
>> +    object_add(TYPE_FILTER_BUFFER, id, qdict, iv, &err);
>> +    qmp_input_visitor_cleanup(qiv);
>> +    qobject_decref(obj);
>> +out:
>> +    g_free(queue);
>> +    if (err) {
>> +        error_propagate(errp, err);
>> +    }
>> +}
>> +
>>   static void filter_buffer_init(Object *obj)
>>   {
>>       object_property_add(obj, "interval", "int",
>> diff --git a/net/net.c b/net/net.c
>> index ade6051..b36d49f 100644
>> --- a/net/net.c
>> +++ b/net/net.c
>> @@ -1028,6 +1028,14 @@ static int net_client_init1(const void *object, int is_netdev, Error **errp)
>>           }
>>           return -1;
>>       }
>> +
>> +    if (is_netdev) {
>> +        const Netdev *netdev = object;
>> +
>> +        netdev_add_default_filter_buffer(netdev->id,
>> +                                         NET_FILTER_DIRECTION_RX,
>> +                                         errp);
>> +    }
>>       return 0;
>>   }
>>
>>
>
>
>
>
> .
>

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 34/39] net/filter-buffer: Add default filter-buffer for each netdev
  2015-12-03  3:53     ` Hailiang Zhang
@ 2015-12-03  6:25       ` Wen Congyang
  2015-12-03  6:48         ` Hailiang Zhang
  0 siblings, 1 reply; 95+ messages in thread
From: Wen Congyang @ 2015-12-03  6:25 UTC (permalink / raw)
  To: Hailiang Zhang, qemu-devel
  Cc: lizhijian, quintela, Jason Wang, yunhong.jiang, eddie.dong,
	peter.huangpeng, dgilbert, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

On 12/03/2015 11:53 AM, Hailiang Zhang wrote:
> On 2015/12/3 9:17, Wen Congyang wrote:
>> On 11/24/2015 05:25 PM, zhanghailiang wrote:
>>> We add a default filter-buffer to each netdev, which will be used by COLO
>>> or Micro-checkpointing to buffer the VM's packets. The name of the default
>>> filter-buffer is 'nop'.
>>> The default filter-buffer does not buffer any packets by default,
>>> so it has no side effect on the netdev.
>>
>> No, filter-buffer doesn't support vhost, so if you add default filter-buffer
>> for each netdev, you can't use vhost.
>>
> 
> Have you tested it ? Did the default filter-buffer break vhost ?
> It's not supposed to break vhost, I will look into it. Thanks.

Yes, I have tested it. When I want to start a normal vm with vhost, I get
the following error messages:

qemu-system-x86_64: -netdev tap,id=hn0,queues=1,vhost=on: Vhost is not supported

Thanks
Wen Congyang
> 
>> Thanks
>> Wen Congyang
>>
>>>
>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>> Cc: Jason Wang <jasowang@redhat.com>
>>> Cc: Yang Hongyang <hongyang.yang@easystack.cn>
>>> ---
>>> v11:
>>> - New patch
>>> ---
>>>   include/net/filter.h |  3 +++
>>>   net/filter-buffer.c  | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>   net/net.c            |  8 ++++++
>>>   3 files changed, 85 insertions(+)
>>>
>>> diff --git a/include/net/filter.h b/include/net/filter.h
>>> index 2deda36..01a7e90 100644
>>> --- a/include/net/filter.h
>>> +++ b/include/net/filter.h
>>> @@ -74,4 +74,7 @@ ssize_t qemu_netfilter_pass_to_next(NetClientState *sender,
>>>                                       int iovcnt,
>>>                                       void *opaque);
>>>
>>> +void netdev_add_default_filter_buffer(const char *netdev_id,
>>> +                                      NetFilterDirection direction,
>>> +                                      Error **errp);
>>>   #endif /* QEMU_NET_FILTER_H */
>>> diff --git a/net/filter-buffer.c b/net/filter-buffer.c
>>> index 57be149..195af68 100644
>>> --- a/net/filter-buffer.c
>>> +++ b/net/filter-buffer.c
>>> @@ -14,6 +14,12 @@
>>>   #include "qapi/qmp/qerror.h"
>>>   #include "qapi-visit.h"
>>>   #include "qom/object.h"
>>> +#include "net/net.h"
>>> +#include "qapi/qmp/qdict.h"
>>> +#include "qapi/qmp-output-visitor.h"
>>> +#include "qapi/qmp-input-visitor.h"
>>> +#include "monitor/monitor.h"
>>> +#include "qmp-commands.h"
>>>
>>>   #define TYPE_FILTER_BUFFER "filter-buffer"
>>>
>>> @@ -26,6 +32,8 @@ typedef struct FilterBufferState {
>>>       NetQueue *incoming_queue;
>>>       uint32_t interval;
>>>       QEMUTimer release_timer;
>>> +    bool is_default;
>>> +    bool enable_buffer;
>>>   } FilterBufferState;
>>>
>>>   static void filter_buffer_flush(NetFilterState *nf)
>>> @@ -65,6 +73,10 @@ static ssize_t filter_buffer_receive_iov(NetFilterState *nf,
>>>   {
>>>       FilterBufferState *s = FILTER_BUFFER(nf);
>>>
>>> +    /* Don't buffer any packets if the filter is not enabled */
>>> +    if (!s->enable_buffer) {
>>> +        return 0;
>>> +    }
>>>       /*
>>>        * We return size when buffer a packet, the sender will take it as
>>>        * a already sent packet, so sent_cb should not be called later.
>>> @@ -102,6 +114,7 @@ static void filter_buffer_cleanup(NetFilterState *nf)
>>>   static void filter_buffer_setup(NetFilterState *nf, Error **errp)
>>>   {
>>>       FilterBufferState *s = FILTER_BUFFER(nf);
>>> +    char *path = object_get_canonical_path_component(OBJECT(nf));
>>>
>>>       /*
>>>        * We may want to accept zero interval when VM FT solutions like MC
>>> @@ -114,6 +127,7 @@ static void filter_buffer_setup(NetFilterState *nf, Error **errp)
>>>       }
>>>
>>>       s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
>>> +    s->is_default = !strcmp(path, "nop");
>>>       if (s->interval) {
>>>           timer_init_us(&s->release_timer, QEMU_CLOCK_VIRTUAL,
>>>                         filter_buffer_release_timer, nf);
>>> @@ -163,6 +177,66 @@ out:
>>>       error_propagate(errp, local_err);
>>>   }
>>>
>>> +/*
>>> +* This will be used by COLO or MC FT, for which they will need
>>> +* to buffer the packets of VM's net devices, Here we add a default
>>> +* buffer filter for each netdev. The name of default buffer filter is
>>> +* 'nop'
>>> +*/
>>> +void netdev_add_default_filter_buffer(const char *netdev_id,
>>> +                                      NetFilterDirection direction,
>>> +                                      Error **errp)
>>> +{
>>> +    QmpOutputVisitor *qov;
>>> +    QmpInputVisitor *qiv;
>>> +    Visitor *ov, *iv;
>>> +    QObject *obj = NULL;
>>> +    QDict *qdict;
>>> +    void *dummy = NULL;
>>> +    const char *id = "nop";
>>> +    char *queue = g_strdup(NetFilterDirection_lookup[direction]);
>>> +    NetClientState *nc = qemu_find_netdev(netdev_id);
>>> +    Error *err = NULL;
>>> +
>>> +    /* FIXME: Not support multiple queues */
>>> +    if (!nc || nc->queue_index > 1) {
>>> +        return;
>>> +    }
>>> +    qov = qmp_output_visitor_new();
>>> +    ov = qmp_output_get_visitor(qov);
>>> +    visit_start_struct(ov,  &dummy, NULL, NULL, 0, &err);
>>> +    if (err) {
>>> +        goto out;
>>> +    }
>>> +    visit_type_str(ov, &nc->name, "netdev", &err);
>>> +    if (err) {
>>> +        goto out;
>>> +    }
>>> +    visit_type_str(ov, &queue, "queue", &err);
>>> +    if (err) {
>>> +        goto out;
>>> +    }
>>> +    visit_end_struct(ov, &err);
>>> +    if (err) {
>>> +        goto out;
>>> +    }
>>> +    obj = qmp_output_get_qobject(qov);
>>> +    g_assert(obj != NULL);
>>> +    qdict = qobject_to_qdict(obj);
>>> +    qmp_output_visitor_cleanup(qov);
>>> +
>>> +    qiv = qmp_input_visitor_new(obj);
>>> +    iv = qmp_input_get_visitor(qiv);
>>> +    object_add(TYPE_FILTER_BUFFER, id, qdict, iv, &err);
>>> +    qmp_input_visitor_cleanup(qiv);
>>> +    qobject_decref(obj);
>>> +out:
>>> +    g_free(queue);
>>> +    if (err) {
>>> +        error_propagate(errp, err);
>>> +    }
>>> +}
>>> +
>>>   static void filter_buffer_init(Object *obj)
>>>   {
>>>       object_property_add(obj, "interval", "int",
>>> diff --git a/net/net.c b/net/net.c
>>> index ade6051..b36d49f 100644
>>> --- a/net/net.c
>>> +++ b/net/net.c
>>> @@ -1028,6 +1028,14 @@ static int net_client_init1(const void *object, int is_netdev, Error **errp)
>>>           }
>>>           return -1;
>>>       }
>>> +
>>> +    if (is_netdev) {
>>> +        const Netdev *netdev = object;
>>> +
>>> +        netdev_add_default_filter_buffer(netdev->id,
>>> +                                         NET_FILTER_DIRECTION_RX,
>>> +                                         errp);
>>> +    }
>>>       return 0;
>>>   }
>>>
>>>
>>
>>
>>
>>
>> .
>>
> 
> 
> 
> 
> .
> 

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 34/39] net/filter-buffer: Add default filter-buffer for each netdev
  2015-12-03  6:25       ` Wen Congyang
@ 2015-12-03  6:48         ` Hailiang Zhang
  2015-12-03  7:21           ` Yang Hongyang
  0 siblings, 1 reply; 95+ messages in thread
From: Hailiang Zhang @ 2015-12-03  6:48 UTC (permalink / raw)
  To: Wen Congyang, qemu-devel
  Cc: lizhijian, quintela, Jason Wang, yunhong.jiang, eddie.dong,
	peter.huangpeng, dgilbert, arei.gonglei, stefanha, amit.shah,
	hongyang.yang

On 2015/12/3 14:25, Wen Congyang wrote:
> On 12/03/2015 11:53 AM, Hailiang Zhang wrote:
>> On 2015/12/3 9:17, Wen Congyang wrote:
>>> On 11/24/2015 05:25 PM, zhanghailiang wrote:
>>>> We add a default filter-buffer to each netdev, which will be used by COLO
>>>> or Micro-checkpointing to buffer the VM's packets. The name of the default
>>>> filter-buffer is 'nop'.
>>>> The default filter-buffer does not buffer any packets by default,
>>>> so it has no side effect on the netdev.
>>>
>>> No, filter-buffer doesn't support vhost, so if you add default filter-buffer
>>> for each netdev, you can't use vhost.
>>>
>>
>> Have you tested it ? Did the default filter-buffer break vhost ?
>> It's not supposed to break vhost, I will look into it. Thanks.
>
> Yes, I have tested it. When I want to start a normal vm with vhost, I get
> the following error messages:
>
> qemu-system-x86_64: -netdev tap,id=hn0,queues=1,vhost=on: Vhost is not supported
>

Hmm, that is reported by the filter in nextfilter_complete(); I will investigate it.
We hope the default buffer filter has no side effect on the netdev.

Thanks,
Hailiang

>>
>>> Thanks
>>> Wen Congyang
>>>
>>>>
>>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>>> Cc: Jason Wang <jasowang@redhat.com>
>>>> Cc: Yang Hongyang <hongyang.yang@easystack.cn>
>>>> ---
>>>> v11:
>>>> - New patch
>>>> ---
>>>>    include/net/filter.h |  3 +++
>>>>    net/filter-buffer.c  | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>    net/net.c            |  8 ++++++
>>>>    3 files changed, 85 insertions(+)
>>>>
>>>> diff --git a/include/net/filter.h b/include/net/filter.h
>>>> index 2deda36..01a7e90 100644
>>>> --- a/include/net/filter.h
>>>> +++ b/include/net/filter.h
>>>> @@ -74,4 +74,7 @@ ssize_t qemu_netfilter_pass_to_next(NetClientState *sender,
>>>>                                        int iovcnt,
>>>>                                        void *opaque);
>>>>
>>>> +void netdev_add_default_filter_buffer(const char *netdev_id,
>>>> +                                      NetFilterDirection direction,
>>>> +                                      Error **errp);
>>>>    #endif /* QEMU_NET_FILTER_H */
>>>> diff --git a/net/filter-buffer.c b/net/filter-buffer.c
>>>> index 57be149..195af68 100644
>>>> --- a/net/filter-buffer.c
>>>> +++ b/net/filter-buffer.c
>>>> @@ -14,6 +14,12 @@
>>>>    #include "qapi/qmp/qerror.h"
>>>>    #include "qapi-visit.h"
>>>>    #include "qom/object.h"
>>>> +#include "net/net.h"
>>>> +#include "qapi/qmp/qdict.h"
>>>> +#include "qapi/qmp-output-visitor.h"
>>>> +#include "qapi/qmp-input-visitor.h"
>>>> +#include "monitor/monitor.h"
>>>> +#include "qmp-commands.h"
>>>>
>>>>    #define TYPE_FILTER_BUFFER "filter-buffer"
>>>>
>>>> @@ -26,6 +32,8 @@ typedef struct FilterBufferState {
>>>>        NetQueue *incoming_queue;
>>>>        uint32_t interval;
>>>>        QEMUTimer release_timer;
>>>> +    bool is_default;
>>>> +    bool enable_buffer;
>>>>    } FilterBufferState;
>>>>
>>>>    static void filter_buffer_flush(NetFilterState *nf)
>>>> @@ -65,6 +73,10 @@ static ssize_t filter_buffer_receive_iov(NetFilterState *nf,
>>>>    {
>>>>        FilterBufferState *s = FILTER_BUFFER(nf);
>>>>
>>>> +    /* Don't buffer any packets if the filter is not enabled */
>>>> +    if (!s->enable_buffer) {
>>>> +        return 0;
>>>> +    }
>>>>        /*
>>>>         * We return size when buffer a packet, the sender will take it as
>>>>         * a already sent packet, so sent_cb should not be called later.
>>>> @@ -102,6 +114,7 @@ static void filter_buffer_cleanup(NetFilterState *nf)
>>>>    static void filter_buffer_setup(NetFilterState *nf, Error **errp)
>>>>    {
>>>>        FilterBufferState *s = FILTER_BUFFER(nf);
>>>> +    char *path = object_get_canonical_path_component(OBJECT(nf));
>>>>
>>>>        /*
>>>>         * We may want to accept zero interval when VM FT solutions like MC
>>>> @@ -114,6 +127,7 @@ static void filter_buffer_setup(NetFilterState *nf, Error **errp)
>>>>        }
>>>>
>>>>        s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
>>>> +    s->is_default = !strcmp(path, "nop");
>>>>        if (s->interval) {
>>>>            timer_init_us(&s->release_timer, QEMU_CLOCK_VIRTUAL,
>>>>                          filter_buffer_release_timer, nf);
>>>> @@ -163,6 +177,66 @@ out:
>>>>        error_propagate(errp, local_err);
>>>>    }
>>>>
>>>> +/*
>>>> +* This will be used by COLO or MC FT, for which they will need
>>>> +* to buffer the packets of VM's net devices, Here we add a default
>>>> +* buffer filter for each netdev. The name of default buffer filter is
>>>> +* 'nop'
>>>> +*/
>>>> +void netdev_add_default_filter_buffer(const char *netdev_id,
>>>> +                                      NetFilterDirection direction,
>>>> +                                      Error **errp)
>>>> +{
>>>> +    QmpOutputVisitor *qov;
>>>> +    QmpInputVisitor *qiv;
>>>> +    Visitor *ov, *iv;
>>>> +    QObject *obj = NULL;
>>>> +    QDict *qdict;
>>>> +    void *dummy = NULL;
>>>> +    const char *id = "nop";
>>>> +    char *queue = g_strdup(NetFilterDirection_lookup[direction]);
>>>> +    NetClientState *nc = qemu_find_netdev(netdev_id);
>>>> +    Error *err = NULL;
>>>> +
>>>> +    /* FIXME: Not support multiple queues */
>>>> +    if (!nc || nc->queue_index > 1) {
>>>> +        return;
>>>> +    }
>>>> +    qov = qmp_output_visitor_new();
>>>> +    ov = qmp_output_get_visitor(qov);
>>>> +    visit_start_struct(ov,  &dummy, NULL, NULL, 0, &err);
>>>> +    if (err) {
>>>> +        goto out;
>>>> +    }
>>>> +    visit_type_str(ov, &nc->name, "netdev", &err);
>>>> +    if (err) {
>>>> +        goto out;
>>>> +    }
>>>> +    visit_type_str(ov, &queue, "queue", &err);
>>>> +    if (err) {
>>>> +        goto out;
>>>> +    }
>>>> +    visit_end_struct(ov, &err);
>>>> +    if (err) {
>>>> +        goto out;
>>>> +    }
>>>> +    obj = qmp_output_get_qobject(qov);
>>>> +    g_assert(obj != NULL);
>>>> +    qdict = qobject_to_qdict(obj);
>>>> +    qmp_output_visitor_cleanup(qov);
>>>> +
>>>> +    qiv = qmp_input_visitor_new(obj);
>>>> +    iv = qmp_input_get_visitor(qiv);
>>>> +    object_add(TYPE_FILTER_BUFFER, id, qdict, iv, &err);
>>>> +    qmp_input_visitor_cleanup(qiv);
>>>> +    qobject_decref(obj);
>>>> +out:
>>>> +    g_free(queue);
>>>> +    if (err) {
>>>> +        error_propagate(errp, err);
>>>> +    }
>>>> +}
>>>> +
>>>>    static void filter_buffer_init(Object *obj)
>>>>    {
>>>>        object_property_add(obj, "interval", "int",
>>>> diff --git a/net/net.c b/net/net.c
>>>> index ade6051..b36d49f 100644
>>>> --- a/net/net.c
>>>> +++ b/net/net.c
>>>> @@ -1028,6 +1028,14 @@ static int net_client_init1(const void *object, int is_netdev, Error **errp)
>>>>            }
>>>>            return -1;
>>>>        }
>>>> +
>>>> +    if (is_netdev) {
>>>> +        const Netdev *netdev = object;
>>>> +
>>>> +        netdev_add_default_filter_buffer(netdev->id,
>>>> +                                         NET_FILTER_DIRECTION_RX,
>>>> +                                         errp);
>>>> +    }
>>>>        return 0;
>>>>    }
>>>>
>>>>
>>>
>>>
>>>
>>>
>>> .
>>>
>>
>>
>>
>>
>> .
>>
>
>
>
>
> .
>

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 14/39] ram: Split host_from_stream_offset() into two helper functions
  2015-12-01 18:19   ` Dr. David Alan Gilbert
@ 2015-12-03  7:19     ` Hailiang Zhang
  2015-12-03  7:29       ` Hailiang Zhang
  0 siblings, 1 reply; 95+ messages in thread
From: Hailiang Zhang @ 2015-12-03  7:19 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

On 2015/12/2 2:19, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> Split host_from_stream_offset() into two parts:
>> One gets the RAM block (whose idstr may be read from the migration stream);
>> the other gets the hva (host address) from the block and the offset.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>
> OK, I see why you're doing this from the next patch.
>
>> ---
>> v11:
>> - New patch
>> ---
>>   migration/ram.c | 29 +++++++++++++++++++++--------
>>   1 file changed, 21 insertions(+), 8 deletions(-)
>>
>> diff --git a/migration/ram.c b/migration/ram.c
>> index cfe78aa..a161620 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -2136,9 +2136,9 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
>>    * offset: Offset within the block
>>    * flags: Page flags (mostly to see if it's a continuation of previous block)
>>    */
>> -static inline void *host_from_stream_offset(QEMUFile *f,
>> -                                            ram_addr_t offset,
>> -                                            int flags)
>> +static inline RAMBlock *ram_block_from_stream(QEMUFile *f,
>> +                                              ram_addr_t offset,
>> +                                              int flags)
>>   {
>>       static RAMBlock *block = NULL;
>>       char id[256];
>> @@ -2150,22 +2150,31 @@ static inline void *host_from_stream_offset(QEMUFile *f,
>>               return NULL;
>>           }
>>
>> -        return block->host + offset;
>> +        return block;
>>       }
>> -
>>       len = qemu_get_byte(f);
>>       qemu_get_buffer(f, (uint8_t *)id, len);
>>       id[len] = 0;
>>
>>       block = qemu_ram_block_by_name(id);
>>       if (block && block->max_length > offset) {
>> -        return block->host + offset;
>> +        return block;
>>       }
>>
>>       error_report("Can't find block %s", id);
>>       return NULL;
>>   }
>>
>> +static inline void *host_from_ram_block_offset(RAMBlock *block,
>> +                                               ram_addr_t offset)
>> +{
>> +    if (!block) {
>> +        return NULL;
>> +    }
>> +
>> +    return block->host + offset;
>> +}
>
> That's almost the same as ramblock_ptr in include/exec/ram_addr.h, but
> it assert's rather than doing NULL on errors.
>

Hmm, I didn't notice that before.

> I'm not sure about this, but can I suggest:
>
>     ram_block_from_stream(QEMUFile *f, int flags)
>
>    doesn't have the offset; just finds the block and handles the CONT.
>
>     bool offset_in_ramblock(RAMBlock *b, ram_addr_t offset);
>
>    actually does the check; put this in exec.c, and declare it in include/exec/ram_addr.h
>
>     void *ramblock_ptr_try(RAMBlock *block, ram_addr_t offset)
>    which returns NULL if offset_in_ramblock fails, and otherwise returns the result
> of ramblock_ptr - again put that in include/exec/ram_addr.h
>

That sounds good. I will add ramblock_ptr_try() and keep ramblock_ptr(). They
behave differently when the check fails, and I don't want to change that.

> (I'm not sure about this - I almost suggested changing ramblock_ptr to not do
> the checks, and just add a call to assert(offset_in_ramblock) before each use, but
> that sounded too painful).
>

Yes, that's not so good.

> Hmm - we check here for block->max_length > offset - where as the check in
> ram_addr.h is used_length - I wonder if we should be using used_length?
>

Er, I think we should use used_length instead of max_length.

Thanks,
zhanghailiang

> Dave
>
>> +
>>   /*
>>    * If a page (or a whole RDMA chunk) has been
>>    * determined to be zero, then zap it.
>> @@ -2310,7 +2319,9 @@ static int ram_load_postcopy(QEMUFile *f)
>>           trace_ram_load_postcopy_loop((uint64_t)addr, flags);
>>           place_needed = false;
>>           if (flags & (RAM_SAVE_FLAG_COMPRESS | RAM_SAVE_FLAG_PAGE)) {
>> -            host = host_from_stream_offset(f, addr, flags);
>> +            RAMBlock *block = ram_block_from_stream(f, addr, flags);
>> +
>> +            host = host_from_ram_block_offset(block, addr);
>>               if (!host) {
>>                   error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
>>                   ret = -EINVAL;
>> @@ -2441,7 +2452,9 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>
>>           if (flags & (RAM_SAVE_FLAG_COMPRESS | RAM_SAVE_FLAG_PAGE |
>>                        RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
>> -            host = host_from_stream_offset(f, addr, flags);
>> +            RAMBlock *block = ram_block_from_stream(f, addr, flags);
>> +
>> +            host = host_from_ram_block_offset(block, addr);
>>               if (!host) {
>>                   error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
>>                   ret = -EINVAL;
>> --
>> 1.8.3.1
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 34/39] net/filter-buffer: Add default filter-buffer for each netdev
  2015-12-03  6:48         ` Hailiang Zhang
@ 2015-12-03  7:21           ` Yang Hongyang
  2015-12-03  8:37             ` Hailiang Zhang
  2015-12-07  7:38             ` Hailiang Zhang
  0 siblings, 2 replies; 95+ messages in thread
From: Yang Hongyang @ 2015-12-03  7:21 UTC (permalink / raw)
  To: Hailiang Zhang, Wen Congyang, qemu-devel
  Cc: lizhijian, quintela, Jason Wang, yunhong.jiang, eddie.dong,
	peter.huangpeng, dgilbert, arei.gonglei, stefanha, amit.shah



On 2015年12月03日 14:48, Hailiang Zhang wrote:
> On 2015/12/3 14:25, Wen Congyang wrote:
>> On 12/03/2015 11:53 AM, Hailiang Zhang wrote:
>>> On 2015/12/3 9:17, Wen Congyang wrote:
>>>> On 11/24/2015 05:25 PM, zhanghailiang wrote:
>>>>> We add each netdev a default filter-buffer, which will be used for
>>>>> COLO
>>>>> or Micro-checkpoint to buffer VM's packets. The name of default
>>>>> filter-buffer
>>>>> is 'nop'.
>>>>> For the default filter-buffer, it will not buffer any packets in
>>>>> default.
>>>>> So it has no side effect for the netdev.
>>>>
>>>> No, filter-buffer doesn't support vhost, so if you add default
>>>> filter-buffer
>>>> for each netdev, you can't use vhost.
>>>>
>>>
>>> Have you tested it ? Did the default filter-buffer break vhost ?
>>> It's not supposed to break vhost, I will look into it. Thanks.
>>
>> Yes, I have tested it. When I want to start a normal vm with vhost, I get
>> the following error messages:
>>
>> qemu-system-x86_64: -netdev tap,id=hn0,queues=1,vhost=on: Vhost is not
>> supported
>>
>
> Hmm, that is reported by filter in nextfilter_complete(), i will
> investigate it.
> We hope the default buffer filter has no side effect on netdev.

I think you'd better add an option to netdev to turn the default
filter on/off; that would solve the problem.

>
> Thanks,
> Hailiang
>
>>>
>>>> Thanks
>>>> Wen Congyang
>>>>
>>>>>
>>>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>>>> Cc: Jason Wang <jasowang@redhat.com>
>>>>> Cc: Yang Hongyang <hongyang.yang@easystack.cn>
>>>>> ---
>>>>> v11:
>>>>> - New patch
>>>>> ---
>>>>>    include/net/filter.h |  3 +++
>>>>>    net/filter-buffer.c  | 74
>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>    net/net.c            |  8 ++++++
>>>>>    3 files changed, 85 insertions(+)
>>>>>
>>>>> diff --git a/include/net/filter.h b/include/net/filter.h
>>>>> index 2deda36..01a7e90 100644
>>>>> --- a/include/net/filter.h
>>>>> +++ b/include/net/filter.h
>>>>> @@ -74,4 +74,7 @@ ssize_t
>>>>> qemu_netfilter_pass_to_next(NetClientState *sender,
>>>>>                                        int iovcnt,
>>>>>                                        void *opaque);
>>>>>
>>>>> +void netdev_add_default_filter_buffer(const char *netdev_id,
>>>>> +                                      NetFilterDirection direction,
>>>>> +                                      Error **errp);
>>>>>    #endif /* QEMU_NET_FILTER_H */
>>>>> diff --git a/net/filter-buffer.c b/net/filter-buffer.c
>>>>> index 57be149..195af68 100644
>>>>> --- a/net/filter-buffer.c
>>>>> +++ b/net/filter-buffer.c
>>>>> @@ -14,6 +14,12 @@
>>>>>    #include "qapi/qmp/qerror.h"
>>>>>    #include "qapi-visit.h"
>>>>>    #include "qom/object.h"
>>>>> +#include "net/net.h"
>>>>> +#include "qapi/qmp/qdict.h"
>>>>> +#include "qapi/qmp-output-visitor.h"
>>>>> +#include "qapi/qmp-input-visitor.h"
>>>>> +#include "monitor/monitor.h"
>>>>> +#include "qmp-commands.h"
>>>>>
>>>>>    #define TYPE_FILTER_BUFFER "filter-buffer"
>>>>>
>>>>> @@ -26,6 +32,8 @@ typedef struct FilterBufferState {
>>>>>        NetQueue *incoming_queue;
>>>>>        uint32_t interval;
>>>>>        QEMUTimer release_timer;
>>>>> +    bool is_default;
>>>>> +    bool enable_buffer;
>>>>>    } FilterBufferState;
>>>>>
>>>>>    static void filter_buffer_flush(NetFilterState *nf)
>>>>> @@ -65,6 +73,10 @@ static ssize_t
>>>>> filter_buffer_receive_iov(NetFilterState *nf,
>>>>>    {
>>>>>        FilterBufferState *s = FILTER_BUFFER(nf);
>>>>>
>>>>> +    /* Don't buffer any packets if the filter is not enabled */
>>>>> +    if (!s->enable_buffer) {
>>>>> +        return 0;
>>>>> +    }
>>>>>        /*
>>>>>         * We return size when buffer a packet, the sender will take
>>>>> it as
>>>>>         * a already sent packet, so sent_cb should not be called
>>>>> later.
>>>>> @@ -102,6 +114,7 @@ static void
>>>>> filter_buffer_cleanup(NetFilterState *nf)
>>>>>    static void filter_buffer_setup(NetFilterState *nf, Error **errp)
>>>>>    {
>>>>>        FilterBufferState *s = FILTER_BUFFER(nf);
>>>>> +    char *path = object_get_canonical_path_component(OBJECT(nf));
>>>>>
>>>>>        /*
>>>>>         * We may want to accept zero interval when VM FT solutions
>>>>> like MC
>>>>> @@ -114,6 +127,7 @@ static void filter_buffer_setup(NetFilterState
>>>>> *nf, Error **errp)
>>>>>        }
>>>>>
>>>>>        s->incoming_queue =
>>>>> qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
>>>>> +    s->is_default = !strcmp(path, "nop");
>>>>>        if (s->interval) {
>>>>>            timer_init_us(&s->release_timer, QEMU_CLOCK_VIRTUAL,
>>>>>                          filter_buffer_release_timer, nf);
>>>>> @@ -163,6 +177,66 @@ out:
>>>>>        error_propagate(errp, local_err);
>>>>>    }
>>>>>
>>>>> +/*
>>>>> +* This will be used by COLO or MC FT, for which they will need
>>>>> +* to buffer the packets of VM's net devices, Here we add a default
>>>>> +* buffer filter for each netdev. The name of default buffer filter is
>>>>> +* 'nop'
>>>>> +*/
>>>>> +void netdev_add_default_filter_buffer(const char *netdev_id,
>>>>> +                                      NetFilterDirection direction,
>>>>> +                                      Error **errp)
>>>>> +{
>>>>> +    QmpOutputVisitor *qov;
>>>>> +    QmpInputVisitor *qiv;
>>>>> +    Visitor *ov, *iv;
>>>>> +    QObject *obj = NULL;
>>>>> +    QDict *qdict;
>>>>> +    void *dummy = NULL;
>>>>> +    const char *id = "nop";
>>>>> +    char *queue = g_strdup(NetFilterDirection_lookup[direction]);
>>>>> +    NetClientState *nc = qemu_find_netdev(netdev_id);
>>>>> +    Error *err = NULL;
>>>>> +
>>>>> +    /* FIXME: Not support multiple queues */
>>>>> +    if (!nc || nc->queue_index > 1) {
>>>>> +        return;
>>>>> +    }
>>>>> +    qov = qmp_output_visitor_new();
>>>>> +    ov = qmp_output_get_visitor(qov);
>>>>> +    visit_start_struct(ov,  &dummy, NULL, NULL, 0, &err);
>>>>> +    if (err) {
>>>>> +        goto out;
>>>>> +    }
>>>>> +    visit_type_str(ov, &nc->name, "netdev", &err);
>>>>> +    if (err) {
>>>>> +        goto out;
>>>>> +    }
>>>>> +    visit_type_str(ov, &queue, "queue", &err);
>>>>> +    if (err) {
>>>>> +        goto out;
>>>>> +    }
>>>>> +    visit_end_struct(ov, &err);
>>>>> +    if (err) {
>>>>> +        goto out;
>>>>> +    }
>>>>> +    obj = qmp_output_get_qobject(qov);
>>>>> +    g_assert(obj != NULL);
>>>>> +    qdict = qobject_to_qdict(obj);
>>>>> +    qmp_output_visitor_cleanup(qov);
>>>>> +
>>>>> +    qiv = qmp_input_visitor_new(obj);
>>>>> +    iv = qmp_input_get_visitor(qiv);
>>>>> +    object_add(TYPE_FILTER_BUFFER, id, qdict, iv, &err);
>>>>> +    qmp_input_visitor_cleanup(qiv);
>>>>> +    qobject_decref(obj);
>>>>> +out:
>>>>> +    g_free(queue);
>>>>> +    if (err) {
>>>>> +        error_propagate(errp, err);
>>>>> +    }
>>>>> +}
>>>>> +
>>>>>    static void filter_buffer_init(Object *obj)
>>>>>    {
>>>>>        object_property_add(obj, "interval", "int",
>>>>> diff --git a/net/net.c b/net/net.c
>>>>> index ade6051..b36d49f 100644
>>>>> --- a/net/net.c
>>>>> +++ b/net/net.c
>>>>> @@ -1028,6 +1028,14 @@ static int net_client_init1(const void
>>>>> *object, int is_netdev, Error **errp)
>>>>>            }
>>>>>            return -1;
>>>>>        }
>>>>> +
>>>>> +    if (is_netdev) {
>>>>> +        const Netdev *netdev = object;
>>>>> +
>>>>> +        netdev_add_default_filter_buffer(netdev->id,
>>>>> +                                         NET_FILTER_DIRECTION_RX,
>>>>> +                                         errp);
>>>>> +    }
>>>>>        return 0;
>>>>>    }
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>> .
>>>>
>>>
>>>
>>>
>>>
>>> .
>>>
>>
>>
>>
>>
>> .
>>
>
>
>

-- 
Thanks,
Yang

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 14/39] ram: Split host_from_stream_offset() into two helper functions
  2015-12-03  7:19     ` Hailiang Zhang
@ 2015-12-03  7:29       ` Hailiang Zhang
  0 siblings, 0 replies; 95+ messages in thread
From: Hailiang Zhang @ 2015-12-03  7:29 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

On 2015/12/3 15:19, Hailiang Zhang wrote:
> On 2015/12/2 2:19, Dr. David Alan Gilbert wrote:
>> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>>> Split host_from_stream_offset() into two parts:
>>> One is to get ram block, which the block idstr may be get from migration
>>> stream, the other is to get hva (host) address from block and the offset.
>>>
>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>
>> OK, I see why you're doing this from the next patch.
>>
>>> ---
>>> v11:
>>> - New patch
>>> ---
>>>   migration/ram.c | 29 +++++++++++++++++++++--------
>>>   1 file changed, 21 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/migration/ram.c b/migration/ram.c
>>> index cfe78aa..a161620 100644
>>> --- a/migration/ram.c
>>> +++ b/migration/ram.c
>>> @@ -2136,9 +2136,9 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
>>>    * offset: Offset within the block
>>>    * flags: Page flags (mostly to see if it's a continuation of previous block)
>>>    */
>>> -static inline void *host_from_stream_offset(QEMUFile *f,
>>> -                                            ram_addr_t offset,
>>> -                                            int flags)
>>> +static inline RAMBlock *ram_block_from_stream(QEMUFile *f,
>>> +                                              ram_addr_t offset,
>>> +                                              int flags)
>>>   {
>>>       static RAMBlock *block = NULL;
>>>       char id[256];
>>> @@ -2150,22 +2150,31 @@ static inline void *host_from_stream_offset(QEMUFile *f,
>>>               return NULL;
>>>           }
>>>
>>> -        return block->host + offset;
>>> +        return block;
>>>       }
>>> -
>>>       len = qemu_get_byte(f);
>>>       qemu_get_buffer(f, (uint8_t *)id, len);
>>>       id[len] = 0;
>>>
>>>       block = qemu_ram_block_by_name(id);
>>>       if (block && block->max_length > offset) {
>>> -        return block->host + offset;
>>> +        return block;
>>>       }
>>>
>>>       error_report("Can't find block %s", id);
>>>       return NULL;
>>>   }
>>>
>>> +static inline void *host_from_ram_block_offset(RAMBlock *block,
>>> +                                               ram_addr_t offset)
>>> +{
>>> +    if (!block) {
>>> +        return NULL;
>>> +    }
>>> +
>>> +    return block->host + offset;
>>> +}
>>
>> That's almost the same as ramblock_ptr in include/exec/ram_addr.h, but
>> it assert's rather than doing NULL on errors.
>>
>
> Hmm, i didn't notice that before.
>
>> I'm not sure about this, but can I suggest:
>>
>>     ram_block_from_stream(QEMUFile *f, int flags)
>>
>>    doesn't have the offset; just finds the block and handles the CONT.
>>
>>     bool offset_in_ramblock(RAMBlock *b, ram_addr_t offset);
>>
>>    actually does the check; put this in exec.c, and declare it in include/exec/ram_addr.h
>>
>>     void *ramblock_ptr_try(RAMBlock *block, ram_addr_t offset)
>>    which returns NULL if offset_in_ramblock fails, and otherwise returns the result
>> of ramblock_ptr - again put that in include/exec/ram_addr.h
>>
>
> That sounds good. I will add the ramblock_ptr_try() and keep the ramblock_ptr(). They

Keeping the name host_from_ram_block_offset() seems better; it corresponds with the
name colo_cache_from_block_offset() in the next patch. ;)

> have different behavior when the check failed. I'm not supposed to break it.
>
>> (I'm not sure about this - I almost suggested changing ramblock_ptr to not do
>> the checks, and just add a call to assert(offset_in_ramblock) before each use, but
>> that sounded too painful).
>>
>
> Yes, that's not so good.
>
>> Hmm - we check here for block->max_length > offset - where as the check in
>> ram_addr.h is used_length - I wonder if we should be using used_length?
>>
>
> Er, i think we should use used_length instead of max_length.
>
> Thanks,
> zhanghailiang
>
>> Dave
>>
>>> +
>>>   /*
>>>    * If a page (or a whole RDMA chunk) has been
>>>    * determined to be zero, then zap it.
>>> @@ -2310,7 +2319,9 @@ static int ram_load_postcopy(QEMUFile *f)
>>>           trace_ram_load_postcopy_loop((uint64_t)addr, flags);
>>>           place_needed = false;
>>>           if (flags & (RAM_SAVE_FLAG_COMPRESS | RAM_SAVE_FLAG_PAGE)) {
>>> -            host = host_from_stream_offset(f, addr, flags);
>>> +            RAMBlock *block = ram_block_from_stream(f, addr, flags);
>>> +
>>> +            host = host_from_ram_block_offset(block, addr);
>>>               if (!host) {
>>>                   error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
>>>                   ret = -EINVAL;
>>> @@ -2441,7 +2452,9 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>>
>>>           if (flags & (RAM_SAVE_FLAG_COMPRESS | RAM_SAVE_FLAG_PAGE |
>>>                        RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
>>> -            host = host_from_stream_offset(f, addr, flags);
>>> +            RAMBlock *block = ram_block_from_stream(f, addr, flags);
>>> +
>>> +            host = host_from_ram_block_offset(block, addr);
>>>               if (!host) {
>>>                   error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
>>>                   ret = -EINVAL;
>>> --
>>> 1.8.3.1
>>>
>>>
>> --
>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>
>> .
>>
>

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 15/39] COLO: Load PVM's dirty pages into SVM's RAM cache temporarily
  2015-12-01 19:02   ` Dr. David Alan Gilbert
@ 2015-12-03  8:25     ` Hailiang Zhang
  0 siblings, 0 replies; 95+ messages in thread
From: Hailiang Zhang @ 2015-12-03  8:25 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

On 2015/12/2 3:02, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> We should not load PVM's state directly into SVM, because there maybe some
>> errors happen when SVM is receving data, which will break SVM.
>>
>> We need to ensure receving all data before load the state into SVM. We use
>> an extra memory to cache these data (PVM's ram). The ram cache in secondary side
>> is initially the same as SVM/PVM's memory. And in the process of checkpoint,
>> we cache the dirty pages of PVM into this ram cache firstly, so this ram cache
>> always the same as PVM's memory at every checkpoint, then we flush this cached ram
>> to SVM after we receive all PVM's state.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> ---
>> v11:
>> - Rename 'host_cache' to 'colo_cache' (Dave's suggestion)
>> v10:
>> - Split the process of dirty pages recording into a new patch
>> ---
>>   include/exec/ram_addr.h       |  1 +
>>   include/migration/migration.h |  4 +++
>>   migration/colo.c              | 10 +++++++
>>   migration/ram.c               | 69 ++++++++++++++++++++++++++++++++++++++++++-
>>   4 files changed, 83 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
>> index 7115154..bb44f66 100644
>> --- a/include/exec/ram_addr.h
>> +++ b/include/exec/ram_addr.h
>> @@ -26,6 +26,7 @@ struct RAMBlock {
>>       struct rcu_head rcu;
>>       struct MemoryRegion *mr;
>>       uint8_t *host;
>> +    uint8_t *colo_cache; /* For colo, VM's ram cache */
>>       ram_addr_t offset;
>>       ram_addr_t used_length;
>>       ram_addr_t max_length;
>> diff --git a/include/migration/migration.h b/include/migration/migration.h
>> index ba5bcec..e41372d 100644
>> --- a/include/migration/migration.h
>> +++ b/include/migration/migration.h
>> @@ -332,4 +332,8 @@ int ram_save_queue_pages(MigrationState *ms, const char *rbname,
>>   PostcopyState postcopy_state_get(void);
>>   /* Set the state and return the old state */
>>   PostcopyState postcopy_state_set(PostcopyState new_state);
>> +
>> +/* ram cache */
>> +int colo_init_ram_cache(void);
>> +void colo_release_ram_cache(void);
>>   #endif
>> diff --git a/migration/colo.c b/migration/colo.c
>> index 012d8e5..6e933fa 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -304,6 +304,12 @@ void *colo_process_incoming_thread(void *opaque)
>>       qemu_set_block(qemu_get_fd(mis->from_src_file));
>>
>>
>> +    ret = colo_init_ram_cache();
>> +    if (ret < 0) {
>> +        error_report("Failed to initialize ram cache");
>> +        goto out;
>> +    }
>> +
>>       ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_CHECKPOINT_READY, 0);
>>       if (ret < 0) {
>>           goto out;
>> @@ -353,6 +359,10 @@ out:
>>                        strerror(-ret));
>>       }
>>
>> +    qemu_mutex_lock_iothread();
>> +    colo_release_ram_cache();
>> +    qemu_mutex_unlock_iothread();
>> +
>>       if (mis->to_src_file) {
>>           qemu_fclose(mis->to_src_file);
>>       }
>> diff --git a/migration/ram.c b/migration/ram.c
>> index a161620..9d946a1 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -223,6 +223,7 @@ static RAMBlock *last_sent_block;
>>   static ram_addr_t last_offset;
>>   static QemuMutex migration_bitmap_mutex;
>>   static uint64_t migration_dirty_pages;
>> +static bool ram_cache_enable;
>>   static uint32_t last_version;
>>   static bool ram_bulk_stage;
>>
>> @@ -2175,6 +2176,16 @@ static inline void *host_from_ram_block_offset(RAMBlock *block,
>>       return block->host + offset;
>>   }
>>
>> +static inline void *colo_cache_from_block_offset(RAMBlock *block,
>> +                                                 ram_addr_t offset)
>> +{
>> +    if (!block) {
>> +        return NULL;
>> +    }
>> +
>> +    return block->colo_cache + offset;
>> +}
>> +
>>   /*
>>    * If a page (or a whole RDMA chunk) has been
>>    * determined to be zero, then zap it.
>> @@ -2454,7 +2465,12 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>                        RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
>>               RAMBlock *block = ram_block_from_stream(f, addr, flags);
>>
>> -            host = host_from_ram_block_offset(block, addr);
>> +            /* After going into COLO, we should load the Page into colo_cache */
>> +            if (ram_cache_enable) {
>> +                host = colo_cache_from_block_offset(block, addr);
>> +            } else {
>> +                host = host_from_ram_block_offset(block, addr);
>> +            }
>>               if (!host) {
>>                   error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
>>                   ret = -EINVAL;
>> @@ -2550,6 +2566,57 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>       return ret;
>>   }
>>
>> +/*
>> + * colo cache: this is for secondary VM, we cache the whole
>> + * memory of the secondary VM, it will be called after first migration.
>> + */
>> +int colo_init_ram_cache(void)
>> +{
>> +    RAMBlock *block;
>> +
>> +    rcu_read_lock();
>> +    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
>> +        block->colo_cache = qemu_anon_ram_alloc(block->used_length, NULL);
>> +        if (!block->colo_cache) {
>> +            error_report("%s: Can't alloc memory for colo cache of block %s,"
>> +                         "size %zu", __func__, block->idstr,
>> +                         block->used_length);
>
> Minor one that I didn't spot before;  I think that has to be RAM_ADDR_FMT instead of %zu
>

OK, I will fix it, thanks.
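
Just to spell out the fix being discussed, the line would become something
like this (sketch only, keeping the rest of the call unchanged):

error_report("%s: Can't alloc memory for colo cache of block %s,"
             "size " RAM_ADDR_FMT, __func__, block->idstr,
             block->used_length);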

> However, other than that;
>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>
> Dave
>
>> +            goto out_locked;
>> +        }
>> +        memcpy(block->colo_cache, block->host, block->used_length);
>> +    }
>> +    rcu_read_unlock();
>> +    ram_cache_enable = true;
>> +    return 0;
>> +
>> +out_locked:
>> +    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
>> +        if (block->colo_cache) {
>> +            qemu_anon_ram_free(block->colo_cache, block->used_length);
>> +            block->colo_cache = NULL;
>> +        }
>> +    }
>> +
>> +    rcu_read_unlock();
>> +    return -errno;
>> +}
>> +
>> +void colo_release_ram_cache(void)
>> +{
>> +    RAMBlock *block;
>> +
>> +    ram_cache_enable = false;
>> +
>> +    rcu_read_lock();
>> +    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
>> +        if (block->colo_cache) {
>> +            qemu_anon_ram_free(block->colo_cache, block->used_length);
>> +            block->colo_cache = NULL;
>> +        }
>> +    }
>> +    rcu_read_unlock();
>> +}
>> +
>>   static SaveVMHandlers savevm_ram_handlers = {
>>       .save_live_setup = ram_save_setup,
>>       .save_live_iterate = ram_save_iterate,
>> --
>> 1.8.3.1
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 16/39] ram/COLO: Record the dirty pages that SVM received
  2015-12-01 19:36   ` Dr. David Alan Gilbert
@ 2015-12-03  8:29     ` Hailiang Zhang
  0 siblings, 0 replies; 95+ messages in thread
From: Hailiang Zhang @ 2015-12-03  8:29 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

On 2015/12/2 3:36, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> We record the address of the dirty pages that received,
>> it will help flushing pages that cached into SVM.
>> We record them by re-using migration dirty bitmap.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> ---
>> v11:
>> - Split a new helper function from original
>>    host_from_stream_offset() (Dave's suggestion)
>> - Only do recording work in this patch
>> v10:
>> - New patch split from v9's patch 13
>> - Rebase to master to use 'migration_bitmap_rcu'
>> ---
>>   migration/ram.c | 30 ++++++++++++++++++++++++++++++
>>   1 file changed, 30 insertions(+)
>>
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 9d946a1..da6bbd6 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -2153,6 +2153,7 @@ static inline RAMBlock *ram_block_from_stream(QEMUFile *f,
>>
>>           return block;
>>       }
>> +
>>       len = qemu_get_byte(f);
>>       qemu_get_buffer(f, (uint8_t *)id, len);
>>       id[len] = 0;
>
> That blank should probably go, but other than that:
>

I will remove it, thanks.

> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>
> Dave
>
>> @@ -2179,10 +2180,23 @@ static inline void *host_from_ram_block_offset(RAMBlock *block,
>>   static inline void *colo_cache_from_block_offset(RAMBlock *block,
>>                                                    ram_addr_t offset)
>>   {
>> +    unsigned long *bitmap;
>> +    long k;
>> +
>>       if (!block) {
>>           return NULL;
>>       }
>>
>> +    k = (block->mr->ram_addr + offset) >> TARGET_PAGE_BITS;
>> +    bitmap = atomic_rcu_read(&migration_bitmap_rcu)->bmap;
>> +    /*
>> +    * During colo checkpoint, we need bitmap of these migrated pages.
>> +    * It help us to decide which pages in ram cache should be flushed
>> +    * into VM's RAM later.
>> +    */
>> +    if (!test_and_set_bit(k, bitmap)) {
>> +        migration_dirty_pages++;
>> +    }
>>       return block->colo_cache + offset;
>>   }
>>
>> @@ -2573,6 +2587,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>   int colo_init_ram_cache(void)
>>   {
>>       RAMBlock *block;
>> +    int64_t ram_cache_pages = last_ram_offset() >> TARGET_PAGE_BITS;
>>
>>       rcu_read_lock();
>>       QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
>> @@ -2587,6 +2602,15 @@ int colo_init_ram_cache(void)
>>       }
>>       rcu_read_unlock();
>>       ram_cache_enable = true;
>> +    /*
>> +    * Record the dirty pages that sent by PVM, we use this dirty bitmap together
>> +    * with to decide which page in cache should be flushed into SVM's RAM. Here
>> +    * we use the same name 'migration_bitmap_rcu' as for migration.
>> +    */
>> +    migration_bitmap_rcu = g_new0(struct BitmapRcu, 1);
>> +    migration_bitmap_rcu->bmap = bitmap_new(ram_cache_pages);
>> +    migration_dirty_pages = 0;
>> +
>>       return 0;
>>
>>   out_locked:
>> @@ -2604,9 +2628,15 @@ out_locked:
>>   void colo_release_ram_cache(void)
>>   {
>>       RAMBlock *block;
>> +    struct BitmapRcu *bitmap = migration_bitmap_rcu;
>>
>>       ram_cache_enable = false;
>>
>> +    atomic_rcu_set(&migration_bitmap_rcu, NULL);
>> +    if (bitmap) {
>> +        call_rcu(bitmap, migration_bitmap_free, rcu);
>> +    }
>> +
>>       rcu_read_lock();
>>       QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
>>           if (block->colo_cache) {
>> --
>> 1.8.3.1
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 34/39] net/filter-buffer: Add default filter-buffer for each netdev
  2015-12-03  7:21           ` Yang Hongyang
@ 2015-12-03  8:37             ` Hailiang Zhang
  2015-12-07  7:38             ` Hailiang Zhang
  1 sibling, 0 replies; 95+ messages in thread
From: Hailiang Zhang @ 2015-12-03  8:37 UTC (permalink / raw)
  To: Yang Hongyang, Wen Congyang, qemu-devel
  Cc: lizhijian, quintela, Jason Wang, yunhong.jiang, eddie.dong,
	peter.huangpeng, dgilbert, arei.gonglei, stefanha, amit.shah

On 2015/12/3 15:21, Yang Hongyang wrote:
>
>
> On 2015年12月03日 14:48, Hailiang Zhang wrote:
>> On 2015/12/3 14:25, Wen Congyang wrote:
>>> On 12/03/2015 11:53 AM, Hailiang Zhang wrote:
>>>> On 2015/12/3 9:17, Wen Congyang wrote:
>>>>> On 11/24/2015 05:25 PM, zhanghailiang wrote:
>>>>>> We add each netdev a default filter-buffer, which will be used for
>>>>>> COLO
>>>>>> or Micro-checkpoint to buffer VM's packets. The name of default
>>>>>> filter-buffer
>>>>>> is 'nop'.
>>>>>> For the default filter-buffer, it will not buffer any packets in
>>>>>> default.
>>>>>> So it has no side effect for the netdev.
>>>>>
>>>>> No, filter-buffer doesn't support vhost, so if you add default
>>>>> filter-buffer
>>>>> for each netdev, you can't use vhost.
>>>>>
>>>>
>>>> Have you tested it ? Did the default filter-buffer break vhost ?
>>>> It's not supposed to break vhost, I will look into it. Thanks.
>>>
>>> Yes, I have tested it. When I want to start a normal vm with vhost, I get
>>> the following error messages:
>>>
>>> qemu-system-x86_64: -netdev tap,id=hn0,queues=1,vhost=on: Vhost is not
>>> supported
>>>
>>
>> Hmm, that is reported by filter in nextfilter_complete(), i will
>> investigate it.
>> We hope the default buffer filter has no side effect on netdev.
>
> I think you'd better add an option to netdev to turn on/off default
> filter, that will solve the problem.
>

OK, I will try to fix it like that. Thanks.

>>
>> Thanks,
>> Hailiang
>>
>>>>
>>>>> Thanks
>>>>> Wen Congyang
>>>>>
>>>>>>
>>>>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>>>>> Cc: Jason Wang <jasowang@redhat.com>
>>>>>> Cc: Yang Hongyang <hongyang.yang@easystack.cn>
>>>>>> ---
>>>>>> v11:
>>>>>> - New patch
>>>>>> ---
>>>>>>    include/net/filter.h |  3 +++
>>>>>>    net/filter-buffer.c  | 74
>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>    net/net.c            |  8 ++++++
>>>>>>    3 files changed, 85 insertions(+)
>>>>>>
>>>>>> diff --git a/include/net/filter.h b/include/net/filter.h
>>>>>> index 2deda36..01a7e90 100644
>>>>>> --- a/include/net/filter.h
>>>>>> +++ b/include/net/filter.h
>>>>>> @@ -74,4 +74,7 @@ ssize_t
>>>>>> qemu_netfilter_pass_to_next(NetClientState *sender,
>>>>>>                                        int iovcnt,
>>>>>>                                        void *opaque);
>>>>>>
>>>>>> +void netdev_add_default_filter_buffer(const char *netdev_id,
>>>>>> +                                      NetFilterDirection direction,
>>>>>> +                                      Error **errp);
>>>>>>    #endif /* QEMU_NET_FILTER_H */
>>>>>> diff --git a/net/filter-buffer.c b/net/filter-buffer.c
>>>>>> index 57be149..195af68 100644
>>>>>> --- a/net/filter-buffer.c
>>>>>> +++ b/net/filter-buffer.c
>>>>>> @@ -14,6 +14,12 @@
>>>>>>    #include "qapi/qmp/qerror.h"
>>>>>>    #include "qapi-visit.h"
>>>>>>    #include "qom/object.h"
>>>>>> +#include "net/net.h"
>>>>>> +#include "qapi/qmp/qdict.h"
>>>>>> +#include "qapi/qmp-output-visitor.h"
>>>>>> +#include "qapi/qmp-input-visitor.h"
>>>>>> +#include "monitor/monitor.h"
>>>>>> +#include "qmp-commands.h"
>>>>>>
>>>>>>    #define TYPE_FILTER_BUFFER "filter-buffer"
>>>>>>
>>>>>> @@ -26,6 +32,8 @@ typedef struct FilterBufferState {
>>>>>>        NetQueue *incoming_queue;
>>>>>>        uint32_t interval;
>>>>>>        QEMUTimer release_timer;
>>>>>> +    bool is_default;
>>>>>> +    bool enable_buffer;
>>>>>>    } FilterBufferState;
>>>>>>
>>>>>>    static void filter_buffer_flush(NetFilterState *nf)
>>>>>> @@ -65,6 +73,10 @@ static ssize_t
>>>>>> filter_buffer_receive_iov(NetFilterState *nf,
>>>>>>    {
>>>>>>        FilterBufferState *s = FILTER_BUFFER(nf);
>>>>>>
>>>>>> +    /* Don't buffer any packets if the filter is not enabled */
>>>>>> +    if (!s->enable_buffer) {
>>>>>> +        return 0;
>>>>>> +    }
>>>>>>        /*
>>>>>>         * We return size when buffer a packet, the sender will take
>>>>>> it as
>>>>>>         * a already sent packet, so sent_cb should not be called
>>>>>> later.
>>>>>> @@ -102,6 +114,7 @@ static void
>>>>>> filter_buffer_cleanup(NetFilterState *nf)
>>>>>>    static void filter_buffer_setup(NetFilterState *nf, Error **errp)
>>>>>>    {
>>>>>>        FilterBufferState *s = FILTER_BUFFER(nf);
>>>>>> +    char *path = object_get_canonical_path_component(OBJECT(nf));
>>>>>>
>>>>>>        /*
>>>>>>         * We may want to accept zero interval when VM FT solutions
>>>>>> like MC
>>>>>> @@ -114,6 +127,7 @@ static void filter_buffer_setup(NetFilterState
>>>>>> *nf, Error **errp)
>>>>>>        }
>>>>>>
>>>>>>        s->incoming_queue =
>>>>>> qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
>>>>>> +    s->is_default = !strcmp(path, "nop");
>>>>>>        if (s->interval) {
>>>>>>            timer_init_us(&s->release_timer, QEMU_CLOCK_VIRTUAL,
>>>>>>                          filter_buffer_release_timer, nf);
>>>>>> @@ -163,6 +177,66 @@ out:
>>>>>>        error_propagate(errp, local_err);
>>>>>>    }
>>>>>>
>>>>>> +/*
>>>>>> +* This will be used by COLO or MC FT, for which they will need
>>>>>> +* to buffer the packets of VM's net devices, Here we add a default
>>>>>> +* buffer filter for each netdev. The name of default buffer filter is
>>>>>> +* 'nop'
>>>>>> +*/
>>>>>> +void netdev_add_default_filter_buffer(const char *netdev_id,
>>>>>> +                                      NetFilterDirection direction,
>>>>>> +                                      Error **errp)
>>>>>> +{
>>>>>> +    QmpOutputVisitor *qov;
>>>>>> +    QmpInputVisitor *qiv;
>>>>>> +    Visitor *ov, *iv;
>>>>>> +    QObject *obj = NULL;
>>>>>> +    QDict *qdict;
>>>>>> +    void *dummy = NULL;
>>>>>> +    const char *id = "nop";
>>>>>> +    char *queue = g_strdup(NetFilterDirection_lookup[direction]);
>>>>>> +    NetClientState *nc = qemu_find_netdev(netdev_id);
>>>>>> +    Error *err = NULL;
>>>>>> +
>>>>>> +    /* FIXME: Not support multiple queues */
>>>>>> +    if (!nc || nc->queue_index > 1) {
>>>>>> +        return;
>>>>>> +    }
>>>>>> +    qov = qmp_output_visitor_new();
>>>>>> +    ov = qmp_output_get_visitor(qov);
>>>>>> +    visit_start_struct(ov,  &dummy, NULL, NULL, 0, &err);
>>>>>> +    if (err) {
>>>>>> +        goto out;
>>>>>> +    }
>>>>>> +    visit_type_str(ov, &nc->name, "netdev", &err);
>>>>>> +    if (err) {
>>>>>> +        goto out;
>>>>>> +    }
>>>>>> +    visit_type_str(ov, &queue, "queue", &err);
>>>>>> +    if (err) {
>>>>>> +        goto out;
>>>>>> +    }
>>>>>> +    visit_end_struct(ov, &err);
>>>>>> +    if (err) {
>>>>>> +        goto out;
>>>>>> +    }
>>>>>> +    obj = qmp_output_get_qobject(qov);
>>>>>> +    g_assert(obj != NULL);
>>>>>> +    qdict = qobject_to_qdict(obj);
>>>>>> +    qmp_output_visitor_cleanup(qov);
>>>>>> +
>>>>>> +    qiv = qmp_input_visitor_new(obj);
>>>>>> +    iv = qmp_input_get_visitor(qiv);
>>>>>> +    object_add(TYPE_FILTER_BUFFER, id, qdict, iv, &err);
>>>>>> +    qmp_input_visitor_cleanup(qiv);
>>>>>> +    qobject_decref(obj);
>>>>>> +out:
>>>>>> +    g_free(queue);
>>>>>> +    if (err) {
>>>>>> +        error_propagate(errp, err);
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>>    static void filter_buffer_init(Object *obj)
>>>>>>    {
>>>>>>        object_property_add(obj, "interval", "int",
>>>>>> diff --git a/net/net.c b/net/net.c
>>>>>> index ade6051..b36d49f 100644
>>>>>> --- a/net/net.c
>>>>>> +++ b/net/net.c
>>>>>> @@ -1028,6 +1028,14 @@ static int net_client_init1(const void
>>>>>> *object, int is_netdev, Error **errp)
>>>>>>            }
>>>>>>            return -1;
>>>>>>        }
>>>>>> +
>>>>>> +    if (is_netdev) {
>>>>>> +        const Netdev *netdev = object;
>>>>>> +
>>>>>> +        netdev_add_default_filter_buffer(netdev->id,
>>>>>> +                                         NET_FILTER_DIRECTION_RX,
>>>>>> +                                         errp);
>>>>>> +    }
>>>>>>        return 0;
>>>>>>    }
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> .
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>> .
>>>>
>>>
>>>
>>>
>>>
>>> .
>>>
>>
>>
>>
>

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 18/39] COLO: Flush PVM's cached RAM into SVM's memory
  2015-12-01 20:06   ` Dr. David Alan Gilbert
@ 2015-12-03  8:50     ` Hailiang Zhang
  0 siblings, 0 replies; 95+ messages in thread
From: Hailiang Zhang @ 2015-12-03  8:50 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

On 2015/12/2 4:06, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> During the time of VM's running, PVM may dirty some pages, we will transfer
>> PVM's dirty pages to SVM and store them into SVM's RAM cache at next checkpoint
>> time. So, the content of SVM's RAM cache will always be some with PVM's memory
>> after checkpoint.
>>
>> Instead of flushing all content of PVM's RAM cache into SVM's MEMORY,
>> we do this in a more efficient way:
>> Only flush any page that dirtied by PVM since last checkpoint.
>> In this way, we can ensure SVM's memory same with PVM's.
>>
>> Besides, we must ensure flush RAM cache before load device state.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> ---
>> v11:
>> - Move the place of 'need_flush' (Dave's suggestion)
>> - Remove unused 'DPRINTF("Flush ram_cache\n")'
>> v10:
>> - trace the number of dirty pages that be received.
>> ---
>>   include/migration/migration.h |  1 +
>>   migration/colo.c              |  2 --
>>   migration/ram.c               | 37 +++++++++++++++++++++++++++++++++++++
>>   trace-events                  |  1 +
>>   4 files changed, 39 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/migration/migration.h b/include/migration/migration.h
>> index e41372d..221176b 100644
>> --- a/include/migration/migration.h
>> +++ b/include/migration/migration.h
>> @@ -336,4 +336,5 @@ PostcopyState postcopy_state_set(PostcopyState new_state);
>>   /* ram cache */
>>   int colo_init_ram_cache(void);
>>   void colo_release_ram_cache(void);
>> +void colo_flush_ram_cache(void);
>>   #endif
>> diff --git a/migration/colo.c b/migration/colo.c
>> index 5ac8ff2..4095d97 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -385,8 +385,6 @@ void *colo_process_incoming_thread(void *opaque)
>>           }
>>           qemu_mutex_unlock_iothread();
>>
>> -        /* TODO: flush vm state */
>> -
>
> Might have been better to put the TODO in a place that needed to be changed!
>

OK~

>>           ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_VMSTATE_LOADED, 0);
>>           if (ret < 0) {
>>               goto out;
>> diff --git a/migration/ram.c b/migration/ram.c
>> index da6bbd6..4f37144 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -2448,6 +2448,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>        * be atomic
>>        */
>>       bool postcopy_running = postcopy_state_get() >= POSTCOPY_INCOMING_LISTENING;
>> +    bool need_flush = false;
>>
>>       seq_iter++;
>>
>> @@ -2482,6 +2483,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>               /* After going into COLO, we should load the Page into colo_cache */
>>               if (ram_cache_enable) {
>>                   host = colo_cache_from_block_offset(block, addr);
>> +                need_flush = true;
>>               } else {
>>                   host = host_from_ram_block_offset(block, addr);
>>               }
>> @@ -2575,6 +2577,10 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>       }
>>
>>       rcu_read_unlock();
>> +
>> +    if (!ret  && ram_cache_enable && need_flush) {
>> +        colo_flush_ram_cache();
>> +    }
>>       DPRINTF("Completed load of VM with exit code %d seq iteration "
>>               "%" PRIu64 "\n", ret, seq_iter);
>>       return ret;
>> @@ -2647,6 +2653,37 @@ void colo_release_ram_cache(void)
>>       rcu_read_unlock();
>>   }
>>
>> +/*
>> + * Flush content of RAM cache into SVM's memory.
>> + * Only flush the pages that be dirtied by PVM or SVM or both.
>> + */
>> +void colo_flush_ram_cache(void)
>> +{
>> +    RAMBlock *block = NULL;
>> +    void *dst_host;
>> +    void *src_host;
>> +    ram_addr_t  offset = 0;
>> +
>> +    trace_colo_flush_ram_cache(migration_dirty_pages);
>> +    rcu_read_lock();
>> +    block = QLIST_FIRST_RCU(&ram_list.blocks);
>> +    while (block) {
>> +        ram_addr_t ram_addr_abs;
>> +        offset = migration_bitmap_find_dirty(block, offset, &ram_addr_abs);
>> +        migration_bitmap_clear_dirty(ram_addr_abs);
>> +        if (offset >= block->used_length) {
>> +            offset = 0;
>> +            block = QLIST_NEXT_RCU(block, next);
>> +        } else {
>> +            dst_host = block->host + offset;
>> +            src_host = block->colo_cache + offset;
>> +            memcpy(dst_host, src_host, TARGET_PAGE_SIZE);
>> +        }
>> +    }
>> +    rcu_read_unlock();
>
> If you added a trace point here as well, it would make it very easy
> to measure how long the flush was taking.
>

Good idea, I will add it in the next version.
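
A rough sketch of what that could look like (the begin/end trace point names
here are only illustrative, not taken from a posted patch):

# trace-events
colo_flush_ram_cache_begin(uint64_t dirty_pages) "dirty_pages %" PRIu64
colo_flush_ram_cache_end(void) ""

/* migration/ram.c: colo_flush_ram_cache() */
trace_colo_flush_ram_cache_begin(migration_dirty_pages);
rcu_read_lock();
/* ... existing flush loop ... */
rcu_read_unlock();
trace_colo_flush_ram_cache_end();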

>> +    assert(migration_dirty_pages == 0);
>> +}
>> +
>>   static SaveVMHandlers savevm_ram_handlers = {
>>       .save_live_setup = ram_save_setup,
>>       .save_live_iterate = ram_save_iterate,
>> diff --git a/trace-events b/trace-events
>> index f8a0959..f158d2a 100644
>> --- a/trace-events
>> +++ b/trace-events
>> @@ -1264,6 +1264,7 @@ migration_throttle(void) ""
>>   ram_load_postcopy_loop(uint64_t addr, int flags) "@%" PRIx64 " %x"
>>   ram_postcopy_send_discard_bitmap(void) ""
>>   ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: %zx len: %zx"
>> +colo_flush_ram_cache(uint64_t dirty_pages) "dirty_pages %" PRIu64""
>
> Minor; I think you can remove the "" at the end.
>
> Other than  those minor things (and those double-space!):
>

OK, I will fix them all in the next version, thanks.

> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>
> Dave
>
>>
>>   # hw/display/qxl.c
>>   disable qxl_interface_set_mm_time(int qid, uint32_t mm_time) "%d %d"
>> --
>> 1.8.3.1
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 34/39] net/filter-buffer: Add default filter-buffer for each netdev
  2015-12-03  7:21           ` Yang Hongyang
  2015-12-03  8:37             ` Hailiang Zhang
@ 2015-12-07  7:38             ` Hailiang Zhang
  2015-12-08  1:49               ` Yang Hongyang
  1 sibling, 1 reply; 95+ messages in thread
From: Hailiang Zhang @ 2015-12-07  7:38 UTC (permalink / raw)
  To: Yang Hongyang, Wen Congyang, qemu-devel
  Cc: lizhijian, quintela, Jason Wang, yunhong.jiang, eddie.dong,
	peter.huangpeng, dgilbert, arei.gonglei, stefanha, amit.shah

On 2015/12/3 15:21, Yang Hongyang wrote:
>
>
> On 2015年12月03日 14:48, Hailiang Zhang wrote:
>> On 2015/12/3 14:25, Wen Congyang wrote:
>>> On 12/03/2015 11:53 AM, Hailiang Zhang wrote:
>>>> On 2015/12/3 9:17, Wen Congyang wrote:
>>>>> On 11/24/2015 05:25 PM, zhanghailiang wrote:
>>>>>> We add each netdev a default filter-buffer, which will be used for
>>>>>> COLO
>>>>>> or Micro-checkpoint to buffer VM's packets. The name of default
>>>>>> filter-buffer
>>>>>> is 'nop'.
>>>>>> For the default filter-buffer, it will not buffer any packets in
>>>>>> default.
>>>>>> So it has no side effect for the netdev.
>>>>>
>>>>> No, filter-buffer doesn't support vhost, so if you add default
>>>>> filter-buffer
>>>>> for each netdev, you can't use vhost.
>>>>>
>>>>
>>>> Have you tested it ? Did the default filter-buffer break vhost ?
>>>> It's not supposed to break vhost, I will look into it. Thanks.
>>>
>>> Yes, I have tested it. When I want to start a normal vm with vhost, I get
>>> the following error messages:
>>>
>>> qemu-system-x86_64: -netdev tap,id=hn0,queues=1,vhost=on: Vhost is not
>>> supported
>>>
>>
>> Hmm, that is reported by filter in nextfilter_complete(), i will
>> investigate it.
>> We hope the default buffer filter has no side effect on netdev.
>
> I think you'd better add an option to netdev to turn on/off default
> filter, that will solve the problem.
>

Hi Hongyang,

I have fixed this problem in another way: we simply skip adding the default
filter for a vhost-user netdev. And by the way, I didn't add any option to
netdev to turn the default filter on/off, since I'd like to control the
default filter entirely in the filter layer. I added an 'enabled' and an
'is-default' flag to struct NetFilterState; the default buffer filter's
status is 'off' (disabled), so it will not buffer any packets, and we skip
it in filter_receive_iov() so that packets do not go through the filter at all.
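
In rough form the idea looks like this (a sketch only; the fields follow the
description above and the final patch may differ):

/* include/net/filter.h (sketch): new flags in NetFilterState */
struct NetFilterState {
    /* ... existing fields ... */
    bool is_default;   /* filter added automatically for the netdev */
    bool enabled;      /* the default filter starts disabled */
};

/* net/net.c (sketch): skip disabled filters when delivering packets */
static ssize_t filter_receive_iov(NetClientState *nc,
                                  NetFilterDirection direction,
                                  NetClientState *sender,
                                  unsigned flags,
                                  const struct iovec *iov,
                                  int iovcnt,
                                  NetPacketSent *sent_cb)
{
    ssize_t ret = 0;
    NetFilterState *nf = NULL;

    QTAILQ_FOREACH(nf, &nc->filters, next) {
        if (!nf->enabled) {
            continue;   /* a disabled filter is a pure no-op */
        }
        ret = qemu_netfilter_receive(nf, direction, sender, flags, iov,
                                     iovcnt, sent_cb);
        if (ret) {
            return ret;
        }
    }
    return ret;
}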

Besides, for the hmp command 'info network', we will show the on/off status for each
filter, like this (or maybe we should not show a filter that is disabled?):
(qemu) info network
net-pci0: index=0,type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
  \ bn0: index=0,type=tap,ifname=tap0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown
filters:
   - nop: type=filter-buffer,interval=0,netdev=bn0,queue=rx,status=off
   - f0: type=filter-buffer,interval=1000,netdev=bn0,queue=rx,status=on

Is this acceptable?

Thanks.
Hailiang

>>
>> Thanks,
>> Hailiang
>>
>>>>
>>>>> Thanks
>>>>> Wen Congyang
>>>>>
>>>>>>
>>>>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>>>>> Cc: Jason Wang <jasowang@redhat.com>
>>>>>> Cc: Yang Hongyang <hongyang.yang@easystack.cn>
>>>>>> ---
>>>>>> v11:
>>>>>> - New patch
>>>>>> ---
>>>>>>    include/net/filter.h |  3 +++
>>>>>>    net/filter-buffer.c  | 74
>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>    net/net.c            |  8 ++++++
>>>>>>    3 files changed, 85 insertions(+)
>>>>>>
>>>>>> diff --git a/include/net/filter.h b/include/net/filter.h
>>>>>> index 2deda36..01a7e90 100644
>>>>>> --- a/include/net/filter.h
>>>>>> +++ b/include/net/filter.h
>>>>>> @@ -74,4 +74,7 @@ ssize_t
>>>>>> qemu_netfilter_pass_to_next(NetClientState *sender,
>>>>>>                                        int iovcnt,
>>>>>>                                        void *opaque);
>>>>>>
>>>>>> +void netdev_add_default_filter_buffer(const char *netdev_id,
>>>>>> +                                      NetFilterDirection direction,
>>>>>> +                                      Error **errp);
>>>>>>    #endif /* QEMU_NET_FILTER_H */
>>>>>> diff --git a/net/filter-buffer.c b/net/filter-buffer.c
>>>>>> index 57be149..195af68 100644
>>>>>> --- a/net/filter-buffer.c
>>>>>> +++ b/net/filter-buffer.c
>>>>>> @@ -14,6 +14,12 @@
>>>>>>    #include "qapi/qmp/qerror.h"
>>>>>>    #include "qapi-visit.h"
>>>>>>    #include "qom/object.h"
>>>>>> +#include "net/net.h"
>>>>>> +#include "qapi/qmp/qdict.h"
>>>>>> +#include "qapi/qmp-output-visitor.h"
>>>>>> +#include "qapi/qmp-input-visitor.h"
>>>>>> +#include "monitor/monitor.h"
>>>>>> +#include "qmp-commands.h"
>>>>>>
>>>>>>    #define TYPE_FILTER_BUFFER "filter-buffer"
>>>>>>
>>>>>> @@ -26,6 +32,8 @@ typedef struct FilterBufferState {
>>>>>>        NetQueue *incoming_queue;
>>>>>>        uint32_t interval;
>>>>>>        QEMUTimer release_timer;
>>>>>> +    bool is_default;
>>>>>> +    bool enable_buffer;
>>>>>>    } FilterBufferState;
>>>>>>
>>>>>>    static void filter_buffer_flush(NetFilterState *nf)
>>>>>> @@ -65,6 +73,10 @@ static ssize_t
>>>>>> filter_buffer_receive_iov(NetFilterState *nf,
>>>>>>    {
>>>>>>        FilterBufferState *s = FILTER_BUFFER(nf);
>>>>>>
>>>>>> +    /* Don't buffer any packets if the filter is not enabled */
>>>>>> +    if (!s->enable_buffer) {
>>>>>> +        return 0;
>>>>>> +    }
>>>>>>        /*
>>>>>>         * We return size when buffer a packet, the sender will take
>>>>>> it as
>>>>>>         * a already sent packet, so sent_cb should not be called
>>>>>> later.
>>>>>> @@ -102,6 +114,7 @@ static void
>>>>>> filter_buffer_cleanup(NetFilterState *nf)
>>>>>>    static void filter_buffer_setup(NetFilterState *nf, Error **errp)
>>>>>>    {
>>>>>>        FilterBufferState *s = FILTER_BUFFER(nf);
>>>>>> +    char *path = object_get_canonical_path_component(OBJECT(nf));
>>>>>>
>>>>>>        /*
>>>>>>         * We may want to accept zero interval when VM FT solutions
>>>>>> like MC
>>>>>> @@ -114,6 +127,7 @@ static void filter_buffer_setup(NetFilterState
>>>>>> *nf, Error **errp)
>>>>>>        }
>>>>>>
>>>>>>        s->incoming_queue =
>>>>>> qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
>>>>>> +    s->is_default = !strcmp(path, "nop");
>>>>>>        if (s->interval) {
>>>>>>            timer_init_us(&s->release_timer, QEMU_CLOCK_VIRTUAL,
>>>>>>                          filter_buffer_release_timer, nf);
>>>>>> @@ -163,6 +177,66 @@ out:
>>>>>>        error_propagate(errp, local_err);
>>>>>>    }
>>>>>>
>>>>>> +/*
>>>>>> +* This will be used by COLO or MC FT, for which they will need
>>>>>> +* to buffer the packets of VM's net devices, Here we add a default
>>>>>> +* buffer filter for each netdev. The name of default buffer filter is
>>>>>> +* 'nop'
>>>>>> +*/
>>>>>> +void netdev_add_default_filter_buffer(const char *netdev_id,
>>>>>> +                                      NetFilterDirection direction,
>>>>>> +                                      Error **errp)
>>>>>> +{
>>>>>> +    QmpOutputVisitor *qov;
>>>>>> +    QmpInputVisitor *qiv;
>>>>>> +    Visitor *ov, *iv;
>>>>>> +    QObject *obj = NULL;
>>>>>> +    QDict *qdict;
>>>>>> +    void *dummy = NULL;
>>>>>> +    const char *id = "nop";
>>>>>> +    char *queue = g_strdup(NetFilterDirection_lookup[direction]);
>>>>>> +    NetClientState *nc = qemu_find_netdev(netdev_id);
>>>>>> +    Error *err = NULL;
>>>>>> +
>>>>>> +    /* FIXME: Not support multiple queues */
>>>>>> +    if (!nc || nc->queue_index > 1) {
>>>>>> +        return;
>>>>>> +    }
>>>>>> +    qov = qmp_output_visitor_new();
>>>>>> +    ov = qmp_output_get_visitor(qov);
>>>>>> +    visit_start_struct(ov,  &dummy, NULL, NULL, 0, &err);
>>>>>> +    if (err) {
>>>>>> +        goto out;
>>>>>> +    }
>>>>>> +    visit_type_str(ov, &nc->name, "netdev", &err);
>>>>>> +    if (err) {
>>>>>> +        goto out;
>>>>>> +    }
>>>>>> +    visit_type_str(ov, &queue, "queue", &err);
>>>>>> +    if (err) {
>>>>>> +        goto out;
>>>>>> +    }
>>>>>> +    visit_end_struct(ov, &err);
>>>>>> +    if (err) {
>>>>>> +        goto out;
>>>>>> +    }
>>>>>> +    obj = qmp_output_get_qobject(qov);
>>>>>> +    g_assert(obj != NULL);
>>>>>> +    qdict = qobject_to_qdict(obj);
>>>>>> +    qmp_output_visitor_cleanup(qov);
>>>>>> +
>>>>>> +    qiv = qmp_input_visitor_new(obj);
>>>>>> +    iv = qmp_input_get_visitor(qiv);
>>>>>> +    object_add(TYPE_FILTER_BUFFER, id, qdict, iv, &err);
>>>>>> +    qmp_input_visitor_cleanup(qiv);
>>>>>> +    qobject_decref(obj);
>>>>>> +out:
>>>>>> +    g_free(queue);
>>>>>> +    if (err) {
>>>>>> +        error_propagate(errp, err);
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>>    static void filter_buffer_init(Object *obj)
>>>>>>    {
>>>>>>        object_property_add(obj, "interval", "int",
>>>>>> diff --git a/net/net.c b/net/net.c
>>>>>> index ade6051..b36d49f 100644
>>>>>> --- a/net/net.c
>>>>>> +++ b/net/net.c
>>>>>> @@ -1028,6 +1028,14 @@ static int net_client_init1(const void
>>>>>> *object, int is_netdev, Error **errp)
>>>>>>            }
>>>>>>            return -1;
>>>>>>        }
>>>>>> +
>>>>>> +    if (is_netdev) {
>>>>>> +        const Netdev *netdev = object;
>>>>>> +
>>>>>> +        netdev_add_default_filter_buffer(netdev->id,
>>>>>> +                                         NET_FILTER_DIRECTION_RX,
>>>>>> +                                         errp);
>>>>>> +    }
>>>>>>        return 0;
>>>>>>    }
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> .
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>> .
>>>>
>>>
>>>
>>>
>>>
>>> .
>>>
>>
>>
>>
>

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 34/39] net/filter-buffer: Add default filter-buffer for each netdev
  2015-12-07  7:38             ` Hailiang Zhang
@ 2015-12-08  1:49               ` Yang Hongyang
  0 siblings, 0 replies; 95+ messages in thread
From: Yang Hongyang @ 2015-12-08  1:49 UTC (permalink / raw)
  To: Hailiang Zhang, qemu-devel, Jason Wang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, dgilbert,
	peter.huangpeng, arei.gonglei, stefanha, amit.shah

On 2015年12月07日 15:38, Hailiang Zhang wrote:
> On 2015/12/3 15:21, Yang Hongyang wrote:
>>
>>
>> On 2015年12月03日 14:48, Hailiang Zhang wrote:
>>> On 2015/12/3 14:25, Wen Congyang wrote:
>>>> On 12/03/2015 11:53 AM, Hailiang Zhang wrote:
>>>>> On 2015/12/3 9:17, Wen Congyang wrote:
>>>>>> On 11/24/2015 05:25 PM, zhanghailiang wrote:
>>>>>>> We add each netdev a default filter-buffer, which will be used for
>>>>>>> COLO
>>>>>>> or Micro-checkpoint to buffer VM's packets. The name of default
>>>>>>> filter-buffer
>>>>>>> is 'nop'.
>>>>>>> For the default filter-buffer, it will not buffer any packets in
>>>>>>> default.
>>>>>>> So it has no side effect for the netdev.
>>>>>>
>>>>>> No, filter-buffer doesn't support vhost, so if you add default
>>>>>> filter-buffer
>>>>>> for each netdev, you can't use vhost.
>>>>>>
>>>>>
>>>>> Have you tested it ? Did the default filter-buffer break vhost ?
>>>>> It's not supposed to break vhost, I will look into it. Thanks.
>>>>
>>>> Yes, I have tested it. When I want to start a normal vm with vhost,
>>>> I get
>>>> the following error messages:
>>>>
>>>> qemu-system-x86_64: -netdev tap,id=hn0,queues=1,vhost=on: Vhost is not
>>>> supported
>>>>
>>>
>>> Hmm, that is reported by filter in nextfilter_complete(), i will
>>> investigate it.
>>> We hope the default buffer filter has no side effect on netdev.
>>
>> I think you'd better add an option to netdev to turn on/off default
>> filter, that will solve the problem.
>>
>
> Hi Hongyang,
>
> I have fix this problem in another way, just skip add filter for vhost-user
> netdev. And by the way, I didn't add any option to netdev to control
> the default filter's on/off, since i'd like to control the default filter
> just in filter layer, i add an 'enabled' and an 'is-default' flag in
> struct NetFilterState, and for the default filter buffer, its status is
> 'off' (disabled),
> so it will not buffer any packets, we will skip it in
> filter_receive_iov() to ensure
> the packets will not go through the process of filter.
>
> Besides, for hmp command 'info network', we will show the on/off status
> for filter, just
> like: (Or, maybe we don't show the filter that is disabled ?)
> (qemu) info network
> net-pci0: index=0,type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
>   \ bn0:
> index=0,type=tap,ifname=tap0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown
>
> filters:
>    - nop: type=filter-buffer,interval=0,netdev=bn0,queue=rx,status=off
>    - f0: type=filter-buffer,interval=1000,netdev=bn0,queue=rx,status=on
>
> Is this acceptable?

Hi Hailiang, thanks for the explanation.
It's up to Jason though, since he's the network maintainer :)

>
> Thanks.
> Hailiang
>
>>>
>>> Thanks,
>>> Hailiang
>>>
>>>>>
>>>>>> Thanks
>>>>>> Wen Congyang
>>>>>>
>>>>>>>
>>>>>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>>>>>> Cc: Jason Wang <jasowang@redhat.com>
>>>>>>> Cc: Yang Hongyang <hongyang.yang@easystack.cn>
>>>>>>> ---
>>>>>>> v11:
>>>>>>> - New patch
>>>>>>> ---
>>>>>>>    include/net/filter.h |  3 +++
>>>>>>>    net/filter-buffer.c  | 74
>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>    net/net.c            |  8 ++++++
>>>>>>>    3 files changed, 85 insertions(+)
>>>>>>>
>>>>>>> diff --git a/include/net/filter.h b/include/net/filter.h
>>>>>>> index 2deda36..01a7e90 100644
>>>>>>> --- a/include/net/filter.h
>>>>>>> +++ b/include/net/filter.h
>>>>>>> @@ -74,4 +74,7 @@ ssize_t
>>>>>>> qemu_netfilter_pass_to_next(NetClientState *sender,
>>>>>>>                                        int iovcnt,
>>>>>>>                                        void *opaque);
>>>>>>>
>>>>>>> +void netdev_add_default_filter_buffer(const char *netdev_id,
>>>>>>> +                                      NetFilterDirection direction,
>>>>>>> +                                      Error **errp);
>>>>>>>    #endif /* QEMU_NET_FILTER_H */
>>>>>>> diff --git a/net/filter-buffer.c b/net/filter-buffer.c
>>>>>>> index 57be149..195af68 100644
>>>>>>> --- a/net/filter-buffer.c
>>>>>>> +++ b/net/filter-buffer.c
>>>>>>> @@ -14,6 +14,12 @@
>>>>>>>    #include "qapi/qmp/qerror.h"
>>>>>>>    #include "qapi-visit.h"
>>>>>>>    #include "qom/object.h"
>>>>>>> +#include "net/net.h"
>>>>>>> +#include "qapi/qmp/qdict.h"
>>>>>>> +#include "qapi/qmp-output-visitor.h"
>>>>>>> +#include "qapi/qmp-input-visitor.h"
>>>>>>> +#include "monitor/monitor.h"
>>>>>>> +#include "qmp-commands.h"
>>>>>>>
>>>>>>>    #define TYPE_FILTER_BUFFER "filter-buffer"
>>>>>>>
>>>>>>> @@ -26,6 +32,8 @@ typedef struct FilterBufferState {
>>>>>>>        NetQueue *incoming_queue;
>>>>>>>        uint32_t interval;
>>>>>>>        QEMUTimer release_timer;
>>>>>>> +    bool is_default;
>>>>>>> +    bool enable_buffer;
>>>>>>>    } FilterBufferState;
>>>>>>>
>>>>>>>    static void filter_buffer_flush(NetFilterState *nf)
>>>>>>> @@ -65,6 +73,10 @@ static ssize_t
>>>>>>> filter_buffer_receive_iov(NetFilterState *nf,
>>>>>>>    {
>>>>>>>        FilterBufferState *s = FILTER_BUFFER(nf);
>>>>>>>
>>>>>>> +    /* Don't buffer any packets if the filter is not enabled */
>>>>>>> +    if (!s->enable_buffer) {
>>>>>>> +        return 0;
>>>>>>> +    }
>>>>>>>        /*
>>>>>>>         * We return size when buffer a packet, the sender will take
>>>>>>> it as
>>>>>>>         * a already sent packet, so sent_cb should not be called
>>>>>>> later.
>>>>>>> @@ -102,6 +114,7 @@ static void
>>>>>>> filter_buffer_cleanup(NetFilterState *nf)
>>>>>>>    static void filter_buffer_setup(NetFilterState *nf, Error **errp)
>>>>>>>    {
>>>>>>>        FilterBufferState *s = FILTER_BUFFER(nf);
>>>>>>> +    char *path = object_get_canonical_path_component(OBJECT(nf));
>>>>>>>
>>>>>>>        /*
>>>>>>>         * We may want to accept zero interval when VM FT solutions
>>>>>>> like MC
>>>>>>> @@ -114,6 +127,7 @@ static void filter_buffer_setup(NetFilterState
>>>>>>> *nf, Error **errp)
>>>>>>>        }
>>>>>>>
>>>>>>>        s->incoming_queue =
>>>>>>> qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
>>>>>>> +    s->is_default = !strcmp(path, "nop");
>>>>>>>        if (s->interval) {
>>>>>>>            timer_init_us(&s->release_timer, QEMU_CLOCK_VIRTUAL,
>>>>>>>                          filter_buffer_release_timer, nf);
>>>>>>> @@ -163,6 +177,66 @@ out:
>>>>>>>        error_propagate(errp, local_err);
>>>>>>>    }
>>>>>>>
>>>>>>> +/*
>>>>>>> +* This will be used by COLO or MC FT, for which they will need
>>>>>>> +* to buffer the packets of VM's net devices, Here we add a default
>>>>>>> +* buffer filter for each netdev. The name of default buffer
>>>>>>> filter is
>>>>>>> +* 'nop'
>>>>>>> +*/
>>>>>>> +void netdev_add_default_filter_buffer(const char *netdev_id,
>>>>>>> +                                      NetFilterDirection direction,
>>>>>>> +                                      Error **errp)
>>>>>>> +{
>>>>>>> +    QmpOutputVisitor *qov;
>>>>>>> +    QmpInputVisitor *qiv;
>>>>>>> +    Visitor *ov, *iv;
>>>>>>> +    QObject *obj = NULL;
>>>>>>> +    QDict *qdict;
>>>>>>> +    void *dummy = NULL;
>>>>>>> +    const char *id = "nop";
>>>>>>> +    char *queue = g_strdup(NetFilterDirection_lookup[direction]);
>>>>>>> +    NetClientState *nc = qemu_find_netdev(netdev_id);
>>>>>>> +    Error *err = NULL;
>>>>>>> +
>>>>>>> +    /* FIXME: Not support multiple queues */
>>>>>>> +    if (!nc || nc->queue_index > 1) {
>>>>>>> +        return;
>>>>>>> +    }
>>>>>>> +    qov = qmp_output_visitor_new();
>>>>>>> +    ov = qmp_output_get_visitor(qov);
>>>>>>> +    visit_start_struct(ov,  &dummy, NULL, NULL, 0, &err);
>>>>>>> +    if (err) {
>>>>>>> +        goto out;
>>>>>>> +    }
>>>>>>> +    visit_type_str(ov, &nc->name, "netdev", &err);
>>>>>>> +    if (err) {
>>>>>>> +        goto out;
>>>>>>> +    }
>>>>>>> +    visit_type_str(ov, &queue, "queue", &err);
>>>>>>> +    if (err) {
>>>>>>> +        goto out;
>>>>>>> +    }
>>>>>>> +    visit_end_struct(ov, &err);
>>>>>>> +    if (err) {
>>>>>>> +        goto out;
>>>>>>> +    }
>>>>>>> +    obj = qmp_output_get_qobject(qov);
>>>>>>> +    g_assert(obj != NULL);
>>>>>>> +    qdict = qobject_to_qdict(obj);
>>>>>>> +    qmp_output_visitor_cleanup(qov);
>>>>>>> +
>>>>>>> +    qiv = qmp_input_visitor_new(obj);
>>>>>>> +    iv = qmp_input_get_visitor(qiv);
>>>>>>> +    object_add(TYPE_FILTER_BUFFER, id, qdict, iv, &err);
>>>>>>> +    qmp_input_visitor_cleanup(qiv);
>>>>>>> +    qobject_decref(obj);
>>>>>>> +out:
>>>>>>> +    g_free(queue);
>>>>>>> +    if (err) {
>>>>>>> +        error_propagate(errp, err);
>>>>>>> +    }
>>>>>>> +}
>>>>>>> +
>>>>>>>    static void filter_buffer_init(Object *obj)
>>>>>>>    {
>>>>>>>        object_property_add(obj, "interval", "int",
>>>>>>> diff --git a/net/net.c b/net/net.c
>>>>>>> index ade6051..b36d49f 100644
>>>>>>> --- a/net/net.c
>>>>>>> +++ b/net/net.c
>>>>>>> @@ -1028,6 +1028,14 @@ static int net_client_init1(const void
>>>>>>> *object, int is_netdev, Error **errp)
>>>>>>>            }
>>>>>>>            return -1;
>>>>>>>        }
>>>>>>> +
>>>>>>> +    if (is_netdev) {
>>>>>>> +        const Netdev *netdev = object;
>>>>>>> +
>>>>>>> +        netdev_add_default_filter_buffer(netdev->id,
>>>>>>> +                                         NET_FILTER_DIRECTION_RX,
>>>>>>> +                                         errp);
>>>>>>> +    }
>>>>>>>        return 0;
>>>>>>>    }
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> .
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> .
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>> .
>>>>
>>>
>>>
>>>
>>
>
>
>

-- 
Thanks,
Yang

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 19/39] COLO: Add checkpoint-delay parameter for migrate-set-parameters
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 19/39] COLO: Add checkpoint-delay parameter for migrate-set-parameters zhanghailiang
@ 2015-12-09 18:50   ` Dr. David Alan Gilbert
  2015-12-11  3:20     ` Hailiang Zhang
  0 siblings, 1 reply; 95+ messages in thread
From: Dr. David Alan Gilbert @ 2015-12-09 18:50 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, Markus Armbruster, yunhong.jiang,
	eddie.dong, peter.huangpeng, qemu-devel, arei.gonglei, stefanha,
	amit.shah, Luiz Capitulino, hongyang.yang

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> Add checkpoint-delay parameter for migrate-set-parameters, so that
> we can control the checkpoint frequency when COLO is in periodic mode.
> 
> Cc: Luiz Capitulino <lcapitulino@redhat.com>
> Cc: Eric Blake <eblake@redhat.com>
> Cc: Markus Armbruster <armbru@redhat.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>

It's probably a good idea to make this x-  but other than that:

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
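
For illustration, once this is in, the delay can be tuned at run time like the
other migration parameters; a sketch using the name as posted in this version
(it would become x-checkpoint-delay if the x- prefix is taken up):

{'execute': 'migrate-set-parameters', 'arguments': {'checkpoint-delay': 10000 } }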


> ---
> v11:
> - Move this patch ahead of the patch where uses 'checkpoint_delay'
>  (Dave's suggestion)
> v10:
> - Fix related qmp command
> ---
>  hmp.c                 |  7 +++++++
>  migration/migration.c | 23 ++++++++++++++++++++++-
>  qapi-schema.json      | 19 ++++++++++++++++---
>  qmp-commands.hx       |  2 +-
>  4 files changed, 46 insertions(+), 5 deletions(-)
> 
> diff --git a/hmp.c b/hmp.c
> index 2140605..ec038f1 100644
> --- a/hmp.c
> +++ b/hmp.c
> @@ -284,6 +284,9 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
>          monitor_printf(mon, " %s: %" PRId64,
>              MigrationParameter_lookup[MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT],
>              params->x_cpu_throttle_increment);
> +        monitor_printf(mon, " %s: %" PRId64,
> +            MigrationParameter_lookup[MIGRATION_PARAMETER_CHECKPOINT_DELAY],
> +            params->checkpoint_delay);
>          monitor_printf(mon, "\n");
>      }
>  
> @@ -1237,6 +1240,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
>      bool has_decompress_threads = false;
>      bool has_x_cpu_throttle_initial = false;
>      bool has_x_cpu_throttle_increment = false;
> +    bool has_checkpoint_delay = false;
>      int i;
>  
>      for (i = 0; i < MIGRATION_PARAMETER_MAX; i++) {
> @@ -1256,6 +1260,8 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
>                  break;
>              case MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT:
>                  has_x_cpu_throttle_increment = true;
> +            case MIGRATION_PARAMETER_CHECKPOINT_DELAY:
> +                has_checkpoint_delay = true;
>                  break;
>              }
>              qmp_migrate_set_parameters(has_compress_level, value,
> @@ -1263,6 +1269,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
>                                         has_decompress_threads, value,
>                                         has_x_cpu_throttle_initial, value,
>                                         has_x_cpu_throttle_increment, value,
> +                                       has_checkpoint_delay, value,
>                                         &err);
>              break;
>          }
> diff --git a/migration/migration.c b/migration/migration.c
> index a4c690d..48545f3 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -56,6 +56,11 @@
>  /* Migration XBZRLE default cache size */
>  #define DEFAULT_MIGRATE_CACHE_SIZE (64 * 1024 * 1024)
>  
> +/* The delay time (in ms) between two COLO checkpoints
> + * Note: Please change this default value to 10000 when we support hybrid mode.
> + */
> +#define DEFAULT_MIGRATE_CHECKPOINT_DELAY 200
> +
>  static NotifierList migration_state_notifiers =
>      NOTIFIER_LIST_INITIALIZER(migration_state_notifiers);
>  
> @@ -91,6 +96,8 @@ MigrationState *migrate_get_current(void)
>                  DEFAULT_MIGRATE_X_CPU_THROTTLE_INITIAL,
>          .parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT] =
>                  DEFAULT_MIGRATE_X_CPU_THROTTLE_INCREMENT,
> +        .parameters[MIGRATION_PARAMETER_CHECKPOINT_DELAY] =
> +                DEFAULT_MIGRATE_CHECKPOINT_DELAY,
>      };
>  
>      if (!once) {
> @@ -530,6 +537,8 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
>              s->parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INITIAL];
>      params->x_cpu_throttle_increment =
>              s->parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT];
> +    params->checkpoint_delay =
> +            s->parameters[MIGRATION_PARAMETER_CHECKPOINT_DELAY];
>  
>      return params;
>  }
> @@ -736,7 +745,10 @@ void qmp_migrate_set_parameters(bool has_compress_level,
>                                  bool has_x_cpu_throttle_initial,
>                                  int64_t x_cpu_throttle_initial,
>                                  bool has_x_cpu_throttle_increment,
> -                                int64_t x_cpu_throttle_increment, Error **errp)
> +                                int64_t x_cpu_throttle_increment,
> +                                bool has_checkpoint_delay,
> +                                int64_t checkpoint_delay,
> +                                Error **errp)
>  {
>      MigrationState *s = migrate_get_current();
>  
> @@ -771,6 +783,11 @@ void qmp_migrate_set_parameters(bool has_compress_level,
>                     "x_cpu_throttle_increment",
>                     "an integer in the range of 1 to 99");
>      }
> +    if (has_checkpoint_delay && (checkpoint_delay < 0)) {
> +        error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
> +                    "checkpoint_delay",
> +                    "is invalid, it should be positive");
> +    }
>  
>      if (has_compress_level) {
>          s->parameters[MIGRATION_PARAMETER_COMPRESS_LEVEL] = compress_level;
> @@ -791,6 +808,10 @@ void qmp_migrate_set_parameters(bool has_compress_level,
>          s->parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT] =
>                                                      x_cpu_throttle_increment;
>      }
> +
> +    if (has_checkpoint_delay) {
> +        s->parameters[MIGRATION_PARAMETER_CHECKPOINT_DELAY] = checkpoint_delay;
> +    }
>  }
>  
>  void qmp_migrate_start_postcopy(Error **errp)
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 5ee089d..b4b47ed 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -625,11 +625,16 @@
>  # @x-cpu-throttle-increment: throttle percentage increase each time
>  #                            auto-converge detects that migration is not making
>  #                            progress. The default value is 10. (Since 2.5)
> +#
> +# @checkpoint-delay: The delay time (in ms) between two COLO checkpoints in
> +#          periodic mode. (Since 2.6)
> +#
>  # Since: 2.4
>  ##
>  { 'enum': 'MigrationParameter',
>    'data': ['compress-level', 'compress-threads', 'decompress-threads',
> -           'x-cpu-throttle-initial', 'x-cpu-throttle-increment'] }
> +           'x-cpu-throttle-initial', 'x-cpu-throttle-increment',
> +           'checkpoint-delay' ] }
>  
>  #
>  # @migrate-set-parameters
> @@ -649,6 +654,9 @@
>  # @x-cpu-throttle-increment: throttle percentage increase each time
>  #                            auto-converge detects that migration is not making
>  #                            progress. The default value is 10. (Since 2.5)
> +#
> +# @checkpoint-delay: the delay time between two checkpoints. (Since 2.6)
> +#
>  # Since: 2.4
>  ##
>  { 'command': 'migrate-set-parameters',
> @@ -656,7 +664,8 @@
>              '*compress-threads': 'int',
>              '*decompress-threads': 'int',
>              '*x-cpu-throttle-initial': 'int',
> -            '*x-cpu-throttle-increment': 'int'} }
> +            '*x-cpu-throttle-increment': 'int',
> +            '*checkpoint-delay': 'int' } }
>  
>  #
>  # @MigrationParameters
> @@ -675,6 +684,8 @@
>  #                            auto-converge detects that migration is not making
>  #                            progress. The default value is 10. (Since 2.5)
>  #
> +# @checkpoint-delay: the delay time between two COLO checkpoints
> +#
>  # Since: 2.4
>  ##
>  { 'struct': 'MigrationParameters',
> @@ -682,7 +693,9 @@
>              'compress-threads': 'int',
>              'decompress-threads': 'int',
>              'x-cpu-throttle-initial': 'int',
> -            'x-cpu-throttle-increment': 'int'} }
> +            'x-cpu-throttle-increment': 'int',
> +            'checkpoint-delay': 'int'} }
> +
>  ##
>  # @query-migrate-parameters
>  #
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index 92042d8..c39f2f0 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -3664,7 +3664,7 @@ EQMP
>      {
>          .name       = "migrate-set-parameters",
>          .args_type  =
> -            "compress-level:i?,compress-threads:i?,decompress-threads:i?",
> +            "compress-level:i?,compress-threads:i?,decompress-threads:i?,checkpoint-delay:i?",
>          .mhandler.cmd_new = qmp_marshal_migrate_set_parameters,
>      },
>  SQMP
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 20/39] COLO: synchronize PVM's state to SVM periodically
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 20/39] COLO: synchronize PVM's state to SVM periodically zhanghailiang
@ 2015-12-09 18:53   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 95+ messages in thread
From: Dr. David Alan Gilbert @ 2015-12-09 18:53 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> Do checkpoints periodically; the default interval is 200ms.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
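
To make the timing concrete: with the default checkpoint-delay of 200 ms, if a
checkpoint transaction itself took 50 ms, the loop below computes
delay_ms = 200 - 50 = 150 and sleeps g_usleep(150 * 1000) before starting the
next checkpoint.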

> ---
> v11:
> - Fix wrong sleep time for checkpoint period. (Dave's review comment)
> ---
>  migration/colo.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index 4095d97..5b7ff87 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -11,6 +11,7 @@
>   */
>  
>  #include <unistd.h>
> +#include "qemu/timer.h"
>  #include "sysemu/sysemu.h"
>  #include "migration/colo.h"
>  #include "trace.h"
> @@ -183,6 +184,7 @@ out:
>  static void colo_process_checkpoint(MigrationState *s)
>  {
>      QEMUSizedBuffer *buffer = NULL;
> +    int64_t current_time, checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>      int ret = 0;
>      uint64_t value;
>  
> @@ -216,11 +218,21 @@ static void colo_process_checkpoint(MigrationState *s)
>      trace_colo_vm_state_change("stop", "run");
>  
>      while (s->state == MIGRATION_STATUS_COLO) {
> +        current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> +        if (current_time - checkpoint_time <
> +            s->parameters[MIGRATION_PARAMETER_CHECKPOINT_DELAY]) {
> +            int64_t delay_ms;
> +
> +            delay_ms = s->parameters[MIGRATION_PARAMETER_CHECKPOINT_DELAY] -
> +                       (current_time - checkpoint_time);
> +            g_usleep(delay_ms * 1000);
> +        }
>          /* start a colo checkpoint */
>          ret = colo_do_checkpoint_transaction(s, buffer);
>          if (ret < 0) {
>              goto out;
>          }
> +        checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>      }
>  
>  out:
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 08/39] migration: Rename the'file' member of MigrationState
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 08/39] migration: Rename the'file' member of MigrationState zhanghailiang
  2015-11-24 18:26   ` Dr. David Alan Gilbert
@ 2015-12-10  6:41   ` Wen Congyang
  2015-12-11  3:40     ` Hailiang Zhang
  1 sibling, 1 reply; 95+ messages in thread
From: Wen Congyang @ 2015-12-10  6:41 UTC (permalink / raw)
  To: zhanghailiang, qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, arei.gonglei, stefanha, amit.shah, hongyang.yang

On 11/24/2015 05:25 PM, zhanghailiang wrote:
> Rename the 'file' member of MigrationState to 'to_dst_file'.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
> v11:
> - Only rename 'file' member of MigrationState

You forgot to update migration/rdma.c.

Thanks
Wen Congyang

> ---
>  include/migration/migration.h |  2 +-
>  migration/exec.c              |  4 +--
>  migration/fd.c                |  4 +--
>  migration/migration.c         | 72 ++++++++++++++++++++++---------------------
>  migration/postcopy-ram.c      |  6 ++--
>  migration/savevm.c            |  2 +-
>  migration/tcp.c               |  4 +--
>  migration/unix.c              |  4 +--
>  8 files changed, 51 insertions(+), 47 deletions(-)
> 
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index a57a734..ba5bcec 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -140,7 +140,7 @@ struct MigrationState
>      size_t xfer_limit;
>      QemuThread thread;
>      QEMUBH *cleanup_bh;
> -    QEMUFile *file;
> +    QEMUFile *to_dst_file;
>      int parameters[MIGRATION_PARAMETER_MAX];
>  
>      int state;
> diff --git a/migration/exec.c b/migration/exec.c
> index 8406d2b..9037109 100644
> --- a/migration/exec.c
> +++ b/migration/exec.c
> @@ -36,8 +36,8 @@
>  
>  void exec_start_outgoing_migration(MigrationState *s, const char *command, Error **errp)
>  {
> -    s->file = qemu_popen_cmd(command, "w");
> -    if (s->file == NULL) {
> +    s->to_dst_file = qemu_popen_cmd(command, "w");
> +    if (s->to_dst_file == NULL) {
>          error_setg_errno(errp, errno, "failed to popen the migration target");
>          return;
>      }
> diff --git a/migration/fd.c b/migration/fd.c
> index 3e4bed0..9a9d6c5 100644
> --- a/migration/fd.c
> +++ b/migration/fd.c
> @@ -50,9 +50,9 @@ void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **
>      }
>  
>      if (fd_is_socket(fd)) {
> -        s->file = qemu_fopen_socket(fd, "wb");
> +        s->to_dst_file = qemu_fopen_socket(fd, "wb");
>      } else {
> -        s->file = qemu_fdopen(fd, "wb");
> +        s->to_dst_file = qemu_fdopen(fd, "wb");
>      }
>  
>      migrate_fd_connect(s);
> diff --git a/migration/migration.c b/migration/migration.c
> index 41eac0d..a4c690d 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -834,7 +834,7 @@ static void migrate_fd_cleanup(void *opaque)
>  
>      flush_page_queue(s);
>  
> -    if (s->file) {
> +    if (s->to_dst_file) {
>          trace_migrate_fd_cleanup();
>          qemu_mutex_unlock_iothread();
>          if (s->migration_thread_running) {
> @@ -844,8 +844,8 @@ static void migrate_fd_cleanup(void *opaque)
>          qemu_mutex_lock_iothread();
>  
>          migrate_compress_threads_join();
> -        qemu_fclose(s->file);
> -        s->file = NULL;
> +        qemu_fclose(s->to_dst_file);
> +        s->to_dst_file = NULL;
>      }
>  
>      assert((s->state != MIGRATION_STATUS_ACTIVE) &&
> @@ -862,7 +862,7 @@ static void migrate_fd_cleanup(void *opaque)
>  void migrate_fd_error(MigrationState *s)
>  {
>      trace_migrate_fd_error();
> -    assert(s->file == NULL);
> +    assert(s->to_dst_file == NULL);
>      migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
>                        MIGRATION_STATUS_FAILED);
>      notifier_list_notify(&migration_state_notifiers, s);
> @@ -871,7 +871,7 @@ void migrate_fd_error(MigrationState *s)
>  static void migrate_fd_cancel(MigrationState *s)
>  {
>      int old_state ;
> -    QEMUFile *f = migrate_get_current()->file;
> +    QEMUFile *f = migrate_get_current()->to_dst_file;
>      trace_migrate_fd_cancel();
>  
>      if (s->rp_state.from_dst_file) {
> @@ -942,7 +942,7 @@ MigrationState *migrate_init(const MigrationParams *params)
>      s->bytes_xfer = 0;
>      s->xfer_limit = 0;
>      s->cleanup_bh = 0;
> -    s->file = NULL;
> +    s->to_dst_file = NULL;
>      s->state = MIGRATION_STATUS_NONE;
>      s->params = *params;
>      s->rp_state.from_dst_file = NULL;
> @@ -1122,8 +1122,9 @@ void qmp_migrate_set_speed(int64_t value, Error **errp)
>  
>      s = migrate_get_current();
>      s->bandwidth_limit = value;
> -    if (s->file) {
> -        qemu_file_set_rate_limit(s->file, s->bandwidth_limit / XFER_LIMIT_RATIO);
> +    if (s->to_dst_file) {
> +        qemu_file_set_rate_limit(s->to_dst_file,
> +                                 s->bandwidth_limit / XFER_LIMIT_RATIO);
>      }
>  }
>  
> @@ -1393,7 +1394,7 @@ out:
>  static int open_return_path_on_source(MigrationState *ms)
>  {
>  
> -    ms->rp_state.from_dst_file = qemu_file_get_return_path(ms->file);
> +    ms->rp_state.from_dst_file = qemu_file_get_return_path(ms->to_dst_file);
>      if (!ms->rp_state.from_dst_file) {
>          return -1;
>      }
> @@ -1415,7 +1416,7 @@ static int await_return_path_close_on_source(MigrationState *ms)
>       * rp_thread will exit, however if there's an error we need to cause
>       * it to exit.
>       */
> -    if (qemu_file_get_error(ms->file) && ms->rp_state.from_dst_file) {
> +    if (qemu_file_get_error(ms->to_dst_file) && ms->rp_state.from_dst_file) {
>          /*
>           * shutdown(2), if we have it, will cause it to unblock if it's stuck
>           * waiting for the destination.
> @@ -1458,7 +1459,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
>       * Cause any non-postcopiable, but iterative devices to
>       * send out their final data.
>       */
> -    qemu_savevm_state_complete_precopy(ms->file, true);
> +    qemu_savevm_state_complete_precopy(ms->to_dst_file, true);
>  
>      /*
>       * in Finish migrate and with the io-lock held everything should
> @@ -1476,9 +1477,9 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
>       * will notice we're in POSTCOPY_ACTIVE and not actually
>       * wrap their state up here
>       */
> -    qemu_file_set_rate_limit(ms->file, INT64_MAX);
> +    qemu_file_set_rate_limit(ms->to_dst_file, INT64_MAX);
>      /* Ping just for debugging, helps line traces up */
> -    qemu_savevm_send_ping(ms->file, 2);
> +    qemu_savevm_send_ping(ms->to_dst_file, 2);
>  
>      /*
>       * While loading the device state we may trigger page transfer
> @@ -1512,7 +1513,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
>      qsb = qemu_buf_get(fb);
>  
>      /* Now send that blob */
> -    if (qemu_savevm_send_packaged(ms->file, qsb)) {
> +    if (qemu_savevm_send_packaged(ms->to_dst_file, qsb)) {
>          goto fail_closefb;
>      }
>      qemu_fclose(fb);
> @@ -1524,9 +1525,9 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
>       * Although this ping is just for debug, it could potentially be
>       * used for getting a better measurement of downtime at the source.
>       */
> -    qemu_savevm_send_ping(ms->file, 4);
> +    qemu_savevm_send_ping(ms->to_dst_file, 4);
>  
> -    ret = qemu_file_get_error(ms->file);
> +    ret = qemu_file_get_error(ms->to_dst_file);
>      if (ret) {
>          error_report("postcopy_start: Migration stream errored");
>          migrate_set_state(&ms->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> @@ -1569,8 +1570,8 @@ static void migration_completion(MigrationState *s, int current_active_state,
>          if (!ret) {
>              ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
>              if (ret >= 0) {
> -                qemu_file_set_rate_limit(s->file, INT64_MAX);
> -                qemu_savevm_state_complete_precopy(s->file, false);
> +                qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
> +                qemu_savevm_state_complete_precopy(s->to_dst_file, false);
>              }
>          }
>          qemu_mutex_unlock_iothread();
> @@ -1581,7 +1582,7 @@ static void migration_completion(MigrationState *s, int current_active_state,
>      } else if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
>          trace_migration_completion_postcopy_end();
>  
> -        qemu_savevm_state_complete_postcopy(s->file);
> +        qemu_savevm_state_complete_postcopy(s->to_dst_file);
>          trace_migration_completion_postcopy_end_after_complete();
>      }
>  
> @@ -1602,7 +1603,7 @@ static void migration_completion(MigrationState *s, int current_active_state,
>          }
>      }
>  
> -    if (qemu_file_get_error(s->file)) {
> +    if (qemu_file_get_error(s->to_dst_file)) {
>          trace_migration_completion_file_err();
>          goto fail;
>      }
> @@ -1647,24 +1648,24 @@ static void *migration_thread(void *opaque)
>  
>      rcu_register_thread();
>  
> -    qemu_savevm_state_header(s->file);
> +    qemu_savevm_state_header(s->to_dst_file);
>  
>      if (migrate_postcopy_ram()) {
>          /* Now tell the dest that it should open its end so it can reply */
> -        qemu_savevm_send_open_return_path(s->file);
> +        qemu_savevm_send_open_return_path(s->to_dst_file);
>  
>          /* And do a ping that will make stuff easier to debug */
> -        qemu_savevm_send_ping(s->file, 1);
> +        qemu_savevm_send_ping(s->to_dst_file, 1);
>  
>          /*
>           * Tell the destination that we *might* want to do postcopy later;
>           * if the other end can't do postcopy it should fail now, nice and
>           * early.
>           */
> -        qemu_savevm_send_postcopy_advise(s->file);
> +        qemu_savevm_send_postcopy_advise(s->to_dst_file);
>      }
>  
> -    qemu_savevm_state_begin(s->file, &s->params);
> +    qemu_savevm_state_begin(s->to_dst_file, &s->params);
>  
>      s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
>      current_active_state = MIGRATION_STATUS_ACTIVE;
> @@ -1678,10 +1679,10 @@ static void *migration_thread(void *opaque)
>          int64_t current_time;
>          uint64_t pending_size;
>  
> -        if (!qemu_file_rate_limit(s->file)) {
> +        if (!qemu_file_rate_limit(s->to_dst_file)) {
>              uint64_t pend_post, pend_nonpost;
>  
> -            qemu_savevm_state_pending(s->file, max_size, &pend_nonpost,
> +            qemu_savevm_state_pending(s->to_dst_file, max_size, &pend_nonpost,
>                                        &pend_post);
>              pending_size = pend_nonpost + pend_post;
>              trace_migrate_pending(pending_size, max_size,
> @@ -1702,7 +1703,7 @@ static void *migration_thread(void *opaque)
>                      continue;
>                  }
>                  /* Just another iteration step */
> -                qemu_savevm_state_iterate(s->file, entered_postcopy);
> +                qemu_savevm_state_iterate(s->to_dst_file, entered_postcopy);
>              } else {
>                  trace_migration_thread_low_pending(pending_size);
>                  migration_completion(s, current_active_state,
> @@ -1711,7 +1712,7 @@ static void *migration_thread(void *opaque)
>              }
>          }
>  
> -        if (qemu_file_get_error(s->file)) {
> +        if (qemu_file_get_error(s->to_dst_file)) {
>              migrate_set_state(&s->state, current_active_state,
>                                MIGRATION_STATUS_FAILED);
>              trace_migration_thread_file_err();
> @@ -1719,7 +1720,8 @@ static void *migration_thread(void *opaque)
>          }
>          current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>          if (current_time >= initial_time + BUFFER_DELAY) {
> -            uint64_t transferred_bytes = qemu_ftell(s->file) - initial_bytes;
> +            uint64_t transferred_bytes = qemu_ftell(s->to_dst_file) -
> +                                         initial_bytes;
>              uint64_t time_spent = current_time - initial_time;
>              double bandwidth = transferred_bytes / time_spent;
>              max_size = bandwidth * migrate_max_downtime() / 1000000;
> @@ -1735,11 +1737,11 @@ static void *migration_thread(void *opaque)
>                  s->expected_downtime = s->dirty_bytes_rate / bandwidth;
>              }
>  
> -            qemu_file_reset_rate_limit(s->file);
> +            qemu_file_reset_rate_limit(s->to_dst_file);
>              initial_time = current_time;
> -            initial_bytes = qemu_ftell(s->file);
> +            initial_bytes = qemu_ftell(s->to_dst_file);
>          }
> -        if (qemu_file_rate_limit(s->file)) {
> +        if (qemu_file_rate_limit(s->to_dst_file)) {
>              /* usleep expects microseconds */
>              g_usleep((initial_time + BUFFER_DELAY - current_time)*1000);
>          }
> @@ -1757,7 +1759,7 @@ static void *migration_thread(void *opaque)
>          qemu_savevm_state_cleanup();
>      }
>      if (s->state == MIGRATION_STATUS_COMPLETED) {
> -        uint64_t transferred_bytes = qemu_ftell(s->file);
> +        uint64_t transferred_bytes = qemu_ftell(s->to_dst_file);
>          s->total_time = end_time - s->total_time;
>          if (!entered_postcopy) {
>              s->downtime = end_time - start_time;
> @@ -1794,7 +1796,7 @@ void migrate_fd_connect(MigrationState *s)
>      s->expected_downtime = max_downtime/1000000;
>      s->cleanup_bh = qemu_bh_new(migrate_fd_cleanup, s);
>  
> -    qemu_file_set_rate_limit(s->file,
> +    qemu_file_set_rate_limit(s->to_dst_file,
>                               s->bandwidth_limit / XFER_LIMIT_RATIO);
>  
>      /* Notify before starting migration thread */
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 22d6b18..f74016d 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -733,7 +733,8 @@ void postcopy_discard_send_range(MigrationState *ms, PostcopyDiscardState *pds,
>  
>      if (pds->cur_entry == MAX_DISCARDS_PER_COMMAND) {
>          /* Full set, ship it! */
> -        qemu_savevm_send_postcopy_ram_discard(ms->file, pds->ramblock_name,
> +        qemu_savevm_send_postcopy_ram_discard(ms->to_dst_file,
> +                                              pds->ramblock_name,
>                                                pds->cur_entry,
>                                                pds->start_list,
>                                                pds->length_list);
> @@ -753,7 +754,8 @@ void postcopy_discard_send_finish(MigrationState *ms, PostcopyDiscardState *pds)
>  {
>      /* Anything unsent? */
>      if (pds->cur_entry) {
> -        qemu_savevm_send_postcopy_ram_discard(ms->file, pds->ramblock_name,
> +        qemu_savevm_send_postcopy_ram_discard(ms->to_dst_file,
> +                                              pds->ramblock_name,
>                                                pds->cur_entry,
>                                                pds->start_list,
>                                                pds->length_list);
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 0ad1b93..f102870 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1163,7 +1163,7 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
>          .shared = 0
>      };
>      MigrationState *ms = migrate_init(&params);
> -    ms->file = f;
> +    ms->to_dst_file = f;
>  
>      if (qemu_savevm_state_blocked(errp)) {
>          return -EINVAL;
> diff --git a/migration/tcp.c b/migration/tcp.c
> index ae89172..e083d68 100644
> --- a/migration/tcp.c
> +++ b/migration/tcp.c
> @@ -39,11 +39,11 @@ static void tcp_wait_for_connect(int fd, Error *err, void *opaque)
>  
>      if (fd < 0) {
>          DPRINTF("migrate connect error: %s\n", error_get_pretty(err));
> -        s->file = NULL;
> +        s->to_dst_file = NULL;
>          migrate_fd_error(s);
>      } else {
>          DPRINTF("migrate connect success\n");
> -        s->file = qemu_fopen_socket(fd, "wb");
> +        s->to_dst_file = qemu_fopen_socket(fd, "wb");
>          migrate_fd_connect(s);
>      }
>  }
> diff --git a/migration/unix.c b/migration/unix.c
> index b591813..5492dd6 100644
> --- a/migration/unix.c
> +++ b/migration/unix.c
> @@ -39,11 +39,11 @@ static void unix_wait_for_connect(int fd, Error *err, void *opaque)
>  
>      if (fd < 0) {
>          DPRINTF("migrate connect error: %s\n", error_get_pretty(err));
> -        s->file = NULL;
> +        s->to_dst_file = NULL;
>          migrate_fd_error(s);
>      } else {
>          DPRINTF("migrate connect success\n");
> -        s->file = qemu_fopen_socket(fd, "wb");
> +        s->to_dst_file = qemu_fopen_socket(fd, "wb");
>          migrate_fd_connect(s);
>      }
>  }
> 

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 23/39] COLO: Implement failover work for Primary VM
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 23/39] COLO: Implement failover work for Primary VM zhanghailiang
@ 2015-12-10 18:34   ` Dr. David Alan Gilbert
  2015-12-11  7:54     ` Hailiang Zhang
  0 siblings, 1 reply; 95+ messages in thread
From: Dr. David Alan Gilbert @ 2015-12-10 18:34 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> For the PVM, if there is a failover request from users, the COLO
> thread will exit the loop, while the failover BH does the cleanup
> work and resumes the VM.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> ---
> v11:
> - Don't call migration_end() in primary_vm_do_failover(),
>  The cleanup work will be done in migration_thread().
> - Remove vm_start() in primary_vm_do_failover() which also been done
>   in migraiton_thread()
> v10:
> - Call migration_end() in primary_vm_do_failover()
> ---
>  include/migration/colo.h     |  3 +++
>  include/migration/failover.h |  1 +
>  migration/colo-failover.c    |  7 +++++-
>  migration/colo.c             | 56 ++++++++++++++++++++++++++++++++++++++++++--
>  4 files changed, 64 insertions(+), 3 deletions(-)
> 
> diff --git a/include/migration/colo.h b/include/migration/colo.h
> index ba27719..0b02e95 100644
> --- a/include/migration/colo.h
> +++ b/include/migration/colo.h
> @@ -32,4 +32,7 @@ void *colo_process_incoming_thread(void *opaque);
>  bool migration_incoming_in_colo_state(void);
>  
>  COLOMode get_colo_mode(void);
> +
> +/* failover */
> +void colo_do_failover(MigrationState *s);
>  #endif
> diff --git a/include/migration/failover.h b/include/migration/failover.h
> index 882c625..fba3931 100644
> --- a/include/migration/failover.h
> +++ b/include/migration/failover.h
> @@ -26,5 +26,6 @@ void failover_init_state(void);
>  int failover_set_state(int old_state, int new_state);
>  int failover_get_state(void);
>  void failover_request_active(Error **errp);
> +bool failover_request_is_active(void);
>  
>  #endif
> diff --git a/migration/colo-failover.c b/migration/colo-failover.c
> index 1b1be24..0c525da 100644
> --- a/migration/colo-failover.c
> +++ b/migration/colo-failover.c
> @@ -32,7 +32,7 @@ static void colo_failover_bh(void *opaque)
>          error_report("Unkown error for failover, old_state=%d", old_state);
>          return;
>      }
> -    /*TODO: Do failover work */
> +    colo_do_failover(NULL);
>  }
>  
>  void failover_request_active(Error **errp)
> @@ -67,6 +67,11 @@ int failover_get_state(void)
>      return atomic_read(&failover_state);
>  }
>  
> +bool failover_request_is_active(void)
> +{
> +    return ((failover_get_state() != FAILOVER_STATUS_NONE));
> +}
> +
>  void qmp_x_colo_lost_heartbeat(Error **errp)
>  {
>      if (get_colo_mode() == COLO_MODE_UNKNOWN) {
> diff --git a/migration/colo.c b/migration/colo.c
> index cedfc63..7a42fc6 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -41,6 +41,42 @@ bool migration_incoming_in_colo_state(void)
>      return mis && (mis->state == MIGRATION_STATUS_COLO);
>  }
>  
> +static bool colo_runstate_is_stopped(void)
> +{
> +    return runstate_check(RUN_STATE_COLO) || !runstate_is_running();
> +}
> +
> +static void primary_vm_do_failover(void)
> +{
> +    MigrationState *s = migrate_get_current();
> +    int old_state;
> +
> +    if (s->state != MIGRATION_STATUS_FAILED) {
> +        migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
> +                          MIGRATION_STATUS_COMPLETED);
> +    }

That's a little odd; it will only move to completed if the current
state is MIGRATION_STATUS_COLO, but you only do it if the state isn't
FAILED.  You could remove the if and just call migrate_set_state
like that and it would be safe as long as you really only expect
to have to deal with MIGRATION_STATUS_COLO.
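
For reference, the reason the unconditional call would be safe is that
migrate_set_state() is effectively a compare-and-swap on the state field;
roughly, from memory (a sketch, not the exact code):

static void migrate_set_state(int *state, int old_state, int new_state)
{
    /* only transition if *state still holds the value the caller expected */
    if (atomic_cmpxchg(state, old_state, new_state) == old_state) {
        trace_migrate_set_state(new_state);
    }
}

so if the state were already FAILED, the COLO -> COMPLETED swap would simply
do nothing.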

> +    old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
> +                                   FAILOVER_STATUS_COMPLETED);
> +    if (old_state != FAILOVER_STATUS_HANDLING) {
> +        error_report("Serious error while do failover for Primary VM,"
> +                     "old_state: %d", old_state);

It's generally better to state the reason rather than say it's 'serious';
so something like:
   'Incorrect state (%d) while doing failover for Primary VM'
tells you more, and looks less scary!

> +        return;
> +    }
> +}
> +
> +void colo_do_failover(MigrationState *s)
> +{
> +    /* Make sure vm stopped while failover */
> +    if (!colo_runstate_is_stopped()) {
> +        vm_stop_force_state(RUN_STATE_COLO);
> +    }

That feels like a race: couldn't we be just at the end
of taking a checkpoint and about to restart when you do
the if, so the check reads that the VM is currently stopped,
but then it gets restarted by the time you have a chance to
do anything?
I see in patch 13 (Save PVM....) you have:
     qemu_mutex_lock_iothread();
     vm_start();
     qemu_mutex_unlock_iothread();

So maybe if that code is executed just before the
failover, it would stop at the _lock_; we would
run here, but then as soon as we finish, wouldn't it vm_start
there?

> +    if (get_colo_mode() == COLO_MODE_PRIMARY) {
> +        primary_vm_do_failover();
> +    }
> +}
> +
>  /* colo checkpoint control helper */
>  static int colo_ctl_put(QEMUFile *f, uint32_t cmd, uint64_t value)
>  {
> @@ -122,9 +158,22 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
>      }
>  
>      qemu_mutex_lock_iothread();
> +    if (failover_request_is_active()) {
> +        qemu_mutex_unlock_iothread();
> +        ret = -1;
> +        goto out;
> +    }
>      vm_stop_force_state(RUN_STATE_COLO);
>      qemu_mutex_unlock_iothread();
>      trace_colo_vm_state_change("run", "stop");
> +    /*
> +     * failover request bh could be called after
> +     * vm_stop_force_state so we check failover_request_is_active() again.
> +     */
> +    if (failover_request_is_active()) {
> +        ret = -1;
> +        goto out;
> +    }

I'm confused about why the check is needed specifically here;
for example, can't that happen at any point where we're outside of the
iothread lock?  e.g. couldn't failover be set just a couple of
lines lower, let's say just after the s->params.blk = 0 ?

>      /* Disable block migration */
>      s->params.blk = 0;
> @@ -221,6 +270,11 @@ static void colo_process_checkpoint(MigrationState *s)
>      trace_colo_vm_state_change("stop", "run");
>  
>      while (s->state == MIGRATION_STATUS_COLO) {
> +        if (failover_request_is_active()) {
> +            error_report("failover request");
> +            goto out;
> +        }
> +
>          current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>          if (current_time - checkpoint_time <
>              s->parameters[MIGRATION_PARAMETER_CHECKPOINT_DELAY]) {
> @@ -242,8 +296,6 @@ out:
>      if (ret < 0) {
>          error_report("%s: %s", __func__, strerror(-ret));
>      }
> -    migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
> -                      MIGRATION_STATUS_COMPLETED);

I had to think about this; but yes, I guess the only way out is via the failover
which does the completed above.

Dave

>      qsb_free(buffer);
>      buffer = NULL;
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 24/39] COLO: Implement failover work for Secondary VM
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 24/39] COLO: Implement failover work for Secondary VM zhanghailiang
@ 2015-12-10 18:50   ` Dr. David Alan Gilbert
  2015-12-11  8:27     ` Hailiang Zhang
  0 siblings, 1 reply; 95+ messages in thread
From: Dr. David Alan Gilbert @ 2015-12-10 18:50 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> If users require the SVM to take over work, the COLO incoming thread
> should exit from its loop, while the failover BH jumps back to the
> migration incoming coroutine.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> ---
>  migration/colo.c | 42 +++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 39 insertions(+), 3 deletions(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index 7a42fc6..f31e957 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -46,6 +46,33 @@ static bool colo_runstate_is_stopped(void)
>      return runstate_check(RUN_STATE_COLO) || !runstate_is_running();
>  }
>  
> +static void secondary_vm_do_failover(void)
> +{
> +    int old_state;
> +    MigrationIncomingState *mis = migration_incoming_get_current();
> +
> +    migrate_set_state(&mis->state, MIGRATION_STATUS_COLO,
> +                      MIGRATION_STATUS_COMPLETED);
> +
> +    if (!autostart) {
> +        error_report("\"-S\" qemu option will be ignored in secondary side");
> +        /* recover runstate to normal migration finish state */
> +        autostart = true;
> +    }

You might find libvirt will need something different for it to be
involved during the failover; but for now OK.

> +    old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
> +                                   FAILOVER_STATUS_COMPLETED);
> +    if (old_state != FAILOVER_STATUS_HANDLING) {
> +        error_report("Serious error while do failover for secondary VM,"
> +                     "old_state: %d", old_state);

Same suggestion as previous patch just to improve the error message.

> +        return;
> +    }
> +    /* For Secondary VM, jump to incoming co */
> +    if (mis->migration_incoming_co) {
> +        qemu_coroutine_enter(mis->migration_incoming_co, NULL);
> +    }
> +}
> +
>  static void primary_vm_do_failover(void)
>  {
>      MigrationState *s = migrate_get_current();
> @@ -74,6 +101,8 @@ void colo_do_failover(MigrationState *s)
>  
>      if (get_colo_mode() == COLO_MODE_PRIMARY) {
>          primary_vm_do_failover();
> +    } else {
> +        secondary_vm_do_failover();
>      }
>  }
>  
> @@ -404,6 +433,12 @@ void *colo_process_incoming_thread(void *opaque)
>                  continue;
>              }
>          }
> +
> +        if (failover_request_is_active()) {
> +            error_report("failover request");
> +            goto out;
> +        }
> +
>          /* FIXME: This is unnecessary for periodic checkpoint mode */
>          ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_CHECKPOINT_REPLY, 0);
>          if (ret < 0) {
> @@ -473,10 +508,11 @@ out:
>          qemu_fclose(fb);
>      }
>      qsb_free(buffer);
> -
> -    qemu_mutex_lock_iothread();
> +    /* Here, we can ensure BH is hold the global lock, and will join colo
> +    * incoming thread, so here it is not necessary to lock here again,
> +    * or there will be a deadlock error.
> +    */
>      colo_release_ram_cache();
> -    qemu_mutex_unlock_iothread();

OK, I think I understand that - because we know there is a failover request
active, it must be holding the lock?

Other than the error message improvement:

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Dave

>  
>      if (mis->to_src_file) {
>          qemu_fclose(mis->to_src_file);
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 25/39] COLO: implement default failover treatment
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 25/39] COLO: implement default failover treatment zhanghailiang
@ 2015-12-10 19:01   ` Dr. David Alan Gilbert
  2015-12-11  9:48     ` Hailiang Zhang
  0 siblings, 1 reply; 95+ messages in thread
From: Dr. David Alan Gilbert @ 2015-12-10 19:01 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> If we detect some error in COLO, we will wait for some time,
> hoping users also detect it. If users don't issue a failover command,
> we will go into the default failover procedure, in which the PVM takes
> over work while the SVM exits by default.

I'm not sure this is needed, especially on the SVM.  I don't see any harm
in the SVM waiting forever to be told what to do - it could be told to
failover or quit; I don't see any benefit to it automatically exiting.

On the primary, I can understand it if you don't have some automated error
detection system (but I think that's rare);
but you really would want to make that failover delay configurable,
so that you could turn it off in a system that did have failure detection,
because automatically restarting the primary after it had caused a failover
to the secondary would be very bad.

Dave

> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> ---
>  migration/colo.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 46 insertions(+)
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index f31e957..1e6d3dd 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -19,6 +19,14 @@
>  #include "qemu/sockets.h"
>  #include "migration/failover.h"
>  
> +/*
> + * The delay time before qemu begin the procedure of default failover treatment.
> + * Unit: ms
> + * Fix me: This value should be able to change by command
> + * 'migrate-set-parameters'
> + */
> +#define DEFAULT_FAILOVER_DELAY 2000
> +
>  /* colo buffer */
>  #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
>  
> @@ -264,6 +272,7 @@ static void colo_process_checkpoint(MigrationState *s)
>  {
>      QEMUSizedBuffer *buffer = NULL;
>      int64_t current_time, checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> +    int64_t error_time;
>      int ret = 0;
>      uint64_t value;
>  
> @@ -322,8 +331,25 @@ static void colo_process_checkpoint(MigrationState *s)
>      }
>  
>  out:
> +    current_time = error_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>      if (ret < 0) {
>          error_report("%s: %s", __func__, strerror(-ret));
> +        /* Give users time to get involved in this verdict */
> +        while (current_time - error_time <= DEFAULT_FAILOVER_DELAY) {
> +            if (failover_request_is_active()) {
> +                error_report("Primary VM will take over work");
> +                break;
> +            }
> +            usleep(100 * 1000);
> +            current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> +        }
> +
> +        qemu_mutex_lock_iothread();
> +        if (!failover_request_is_active()) {
> +            error_report("Primary VM will take over work in default");
> +            failover_request_active(NULL);
> +        }
> +        qemu_mutex_unlock_iothread();
>      }
>  
>      qsb_free(buffer);
> @@ -384,6 +410,7 @@ void *colo_process_incoming_thread(void *opaque)
>      QEMUFile *fb = NULL;
>      QEMUSizedBuffer *buffer = NULL; /* Cache incoming device state */
>      uint64_t  total_size;
> +    int64_t error_time, current_time;
>      int ret = 0;
>      uint64_t value;
>  
> @@ -499,9 +526,28 @@ void *colo_process_incoming_thread(void *opaque)
>      }
>  
>  out:
> +    current_time = error_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>      if (ret < 0) {
>          error_report("colo incoming thread will exit, detect error: %s",
>                       strerror(-ret));
> +        /* Give users time to get involved in this verdict */
> +        while (current_time - error_time <= DEFAULT_FAILOVER_DELAY) {
> +            if (failover_request_is_active()) {
> +                error_report("Secondary VM will take over work");
> +                break;
> +            }
> +            usleep(100 * 1000);
> +            current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> +        }
> +        /* check flag again*/
> +        if (!failover_request_is_active()) {
> +            /*
> +            * We assume that Primary VM is still alive according to
> +            * heartbeat, just kill Secondary VM
> +            */
> +            error_report("SVM is going to exit in default!");
> +            exit(1);
> +        }
>      }
>  
>      if (fb) {
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 27/39] COLO failover: Shutdown related socket fd when do failover
  2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 27/39] COLO failover: Shutdown related socket fd when do failover zhanghailiang
@ 2015-12-10 20:03   ` Dr. David Alan Gilbert
  2015-12-11  8:57     ` Hailiang Zhang
  0 siblings, 1 reply; 95+ messages in thread
From: Dr. David Alan Gilbert @ 2015-12-10 20:03 UTC (permalink / raw)
  To: zhanghailiang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> If the net connection between COLO's two sides is broken while the colo / colo incoming
> thread is blocked in a 'read'/'write' on the socket fd, it will not detect this error
> until the connection times out, which can take a long time.
> 
> Here we shut down all the related socket file descriptors in the failover BH to wake up
> the blocking operation. Besides, we should close the corresponding file descriptors only
> after the failover BH has shut them down, or there will be an error.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> ---
> v11:
> - Only shutdown fd for once
> ---
>  migration/colo.c | 31 +++++++++++++++++++++++++++++--
>  1 file changed, 29 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index 4cd7b00..994b80d 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -68,6 +68,14 @@ static void secondary_vm_do_failover(void)
>          /* recover runstate to normal migration finish state */
>          autostart = true;
>      }
> +    /*
> +    * Make sure colo incoming thread not block in recv,
> +    * mis->from_src_file and mis->to_src_file use the same fd,
> +    * so here we only need to shutdown it for once.
> +    */

That assumption is only valid for the current socket transport;
for example I have a (partially-working) RDMA transport where the
forward and reverse paths are quite separate.
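
For what it's worth, the wake-up trick the commit message relies on is the
usual POSIX one for sockets; a generic sketch (not the QEMUFile code):

#include <sys/socket.h>

/* called from the failover BH: a recv() blocked on the same socket in the
 * colo thread returns immediately (EOF) once this runs */
static void wake_blocked_reader(int fd)
{
    shutdown(fd, SHUT_RDWR);
    /* close(fd) is left to the descriptor's owner, later, once the blocked
     * thread has actually stopped using it */
}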

> +    if (mis->from_src_file) {
> +        qemu_file_shutdown(mis->from_src_file);
> +    }
>  
>      old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
>                                     FAILOVER_STATUS_COMPLETED);
> @@ -92,6 +100,15 @@ static void primary_vm_do_failover(void)
>                            MIGRATION_STATUS_COMPLETED);
>      }
>  
> +    /*
> +    * Make sure colo thread no block in recv,
> +    * Besides, s->rp_state.from_dst_file and s->to_dst_file use the
> +    * same fd, so here we only need to shutdown it for once.
> +    */
> +    if (s->to_dst_file) {
> +        qemu_file_shutdown(s->to_dst_file);
> +    }
> +
>      old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
>                                     FAILOVER_STATUS_COMPLETED);
>      if (old_state != FAILOVER_STATUS_HANDLING) {
> @@ -333,7 +350,7 @@ static void colo_process_checkpoint(MigrationState *s)
>  
>  out:
>      current_time = error_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> -    if (ret < 0) {
> +    if (ret < 0 || (!ret && !failover_request_is_active())) {

Why is this needed - when would you get a 0 return that is actually an error?

>          error_report("%s: %s", __func__, strerror(-ret));
>          qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_ERROR,
>                                    true, strerror(-ret), NULL);
> @@ -362,6 +379,11 @@ out:
>      qsb_free(buffer);
>      buffer = NULL;
>  
> +    /* Hope this not to be too long to loop here */
> +    while (failover_get_state() != FAILOVER_STATUS_COMPLETED) {
> +        ;
> +    }
> +    /* Must be called after failover BH is completed */

Is this the right patch for this?

Dave

>      if (s->rp_state.from_dst_file) {
>          qemu_fclose(s->rp_state.from_dst_file);
>      }
> @@ -534,7 +556,7 @@ void *colo_process_incoming_thread(void *opaque)
>  
>  out:
>      current_time = error_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> -    if (ret < 0) {
> +    if (ret < 0 || (!ret && !failover_request_is_active())) {
>          error_report("colo incoming thread will exit, detect error: %s",
>                       strerror(-ret));
>          qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_ERROR,
> @@ -573,6 +595,11 @@ out:
>      */
>      colo_release_ram_cache();
>  
> +    /* Hope this not to be too long to loop here */
> +    while (failover_get_state() != FAILOVER_STATUS_COMPLETED) {
> +        ;
> +    }
> +    /* Must be called after failover BH is completed */
>      if (mis->to_src_file) {
>          qemu_fclose(mis->to_src_file);
>      }
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 19/39] COLO: Add checkpoint-delay parameter for migrate-set-parameters
  2015-12-09 18:50   ` Dr. David Alan Gilbert
@ 2015-12-11  3:20     ` Hailiang Zhang
  0 siblings, 0 replies; 95+ messages in thread
From: Hailiang Zhang @ 2015-12-11  3:20 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: lizhijian, quintela, Markus Armbruster, yunhong.jiang,
	eddie.dong, peter.huangpeng, qemu-devel, arei.gonglei, stefanha,
	amit.shah, Luiz Capitulino, hongyang.yang

On 2015/12/10 2:50, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> Add checkpoint-delay parameter for migrate-set-parameters, so that
>> we can control the checkpoint frequency when COLO is in periodic mode.
>>
>> Cc: Luiz Capitulino <lcapitulino@redhat.com>
>> Cc: Eric Blake <eblake@redhat.com>
>> Cc: Markus Armbruster <armbru@redhat.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>
> It's probably a good idea to make this x-  but other than that:
>

OK, I will fix it, thanks.

> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>
>
>> ---
>> v11:
>> - Move this patch ahead of the patch where uses 'checkpoint_delay'
>>   (Dave's suggestion)
>> v10:
>> - Fix related qmp command
>> ---
>>   hmp.c                 |  7 +++++++
>>   migration/migration.c | 23 ++++++++++++++++++++++-
>>   qapi-schema.json      | 19 ++++++++++++++++---
>>   qmp-commands.hx       |  2 +-
>>   4 files changed, 46 insertions(+), 5 deletions(-)
>>
>> diff --git a/hmp.c b/hmp.c
>> index 2140605..ec038f1 100644
>> --- a/hmp.c
>> +++ b/hmp.c
>> @@ -284,6 +284,9 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
>>           monitor_printf(mon, " %s: %" PRId64,
>>               MigrationParameter_lookup[MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT],
>>               params->x_cpu_throttle_increment);
>> +        monitor_printf(mon, " %s: %" PRId64,
>> +            MigrationParameter_lookup[MIGRATION_PARAMETER_CHECKPOINT_DELAY],
>> +            params->checkpoint_delay);
>>           monitor_printf(mon, "\n");
>>       }
>>
>> @@ -1237,6 +1240,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
>>       bool has_decompress_threads = false;
>>       bool has_x_cpu_throttle_initial = false;
>>       bool has_x_cpu_throttle_increment = false;
>> +    bool has_checkpoint_delay = false;
>>       int i;
>>
>>       for (i = 0; i < MIGRATION_PARAMETER_MAX; i++) {
>> @@ -1256,6 +1260,8 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
>>                   break;
>>               case MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT:
>>                   has_x_cpu_throttle_increment = true;
>> +            case MIGRATION_PARAMETER_CHECKPOINT_DELAY:
>> +                has_checkpoint_delay = true;
>>                   break;
>>               }
>>               qmp_migrate_set_parameters(has_compress_level, value,
>> @@ -1263,6 +1269,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
>>                                          has_decompress_threads, value,
>>                                          has_x_cpu_throttle_initial, value,
>>                                          has_x_cpu_throttle_increment, value,
>> +                                       has_checkpoint_delay, value,
>>                                          &err);
>>               break;
>>           }
>> diff --git a/migration/migration.c b/migration/migration.c
>> index a4c690d..48545f3 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -56,6 +56,11 @@
>>   /* Migration XBZRLE default cache size */
>>   #define DEFAULT_MIGRATE_CACHE_SIZE (64 * 1024 * 1024)
>>
>> +/* The delay time (in ms) between two COLO checkpoints
>> + * Note: Please change this default value to 10000 when we support hybrid mode.
>> + */
>> +#define DEFAULT_MIGRATE_CHECKPOINT_DELAY 200
>> +
>>   static NotifierList migration_state_notifiers =
>>       NOTIFIER_LIST_INITIALIZER(migration_state_notifiers);
>>
>> @@ -91,6 +96,8 @@ MigrationState *migrate_get_current(void)
>>                   DEFAULT_MIGRATE_X_CPU_THROTTLE_INITIAL,
>>           .parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT] =
>>                   DEFAULT_MIGRATE_X_CPU_THROTTLE_INCREMENT,
>> +        .parameters[MIGRATION_PARAMETER_CHECKPOINT_DELAY] =
>> +                DEFAULT_MIGRATE_CHECKPOINT_DELAY,
>>       };
>>
>>       if (!once) {
>> @@ -530,6 +537,8 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
>>               s->parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INITIAL];
>>       params->x_cpu_throttle_increment =
>>               s->parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT];
>> +    params->checkpoint_delay =
>> +            s->parameters[MIGRATION_PARAMETER_CHECKPOINT_DELAY];
>>
>>       return params;
>>   }
>> @@ -736,7 +745,10 @@ void qmp_migrate_set_parameters(bool has_compress_level,
>>                                   bool has_x_cpu_throttle_initial,
>>                                   int64_t x_cpu_throttle_initial,
>>                                   bool has_x_cpu_throttle_increment,
>> -                                int64_t x_cpu_throttle_increment, Error **errp)
>> +                                int64_t x_cpu_throttle_increment,
>> +                                bool has_checkpoint_delay,
>> +                                int64_t checkpoint_delay,
>> +                                Error **errp)
>>   {
>>       MigrationState *s = migrate_get_current();
>>
>> @@ -771,6 +783,11 @@ void qmp_migrate_set_parameters(bool has_compress_level,
>>                      "x_cpu_throttle_increment",
>>                      "an integer in the range of 1 to 99");
>>       }
>> +    if (has_checkpoint_delay && (checkpoint_delay < 0)) {
>> +        error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
>> +                    "checkpoint_delay",
>> +                    "is invalid, it should be positive");
>> +    }
>>
>>       if (has_compress_level) {
>>           s->parameters[MIGRATION_PARAMETER_COMPRESS_LEVEL] = compress_level;
>> @@ -791,6 +808,10 @@ void qmp_migrate_set_parameters(bool has_compress_level,
>>           s->parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT] =
>>                                                       x_cpu_throttle_increment;
>>       }
>> +
>> +    if (has_checkpoint_delay) {
>> +        s->parameters[MIGRATION_PARAMETER_CHECKPOINT_DELAY] = checkpoint_delay;
>> +    }
>>   }
>>
>>   void qmp_migrate_start_postcopy(Error **errp)
>> diff --git a/qapi-schema.json b/qapi-schema.json
>> index 5ee089d..b4b47ed 100644
>> --- a/qapi-schema.json
>> +++ b/qapi-schema.json
>> @@ -625,11 +625,16 @@
>>   # @x-cpu-throttle-increment: throttle percentage increase each time
>>   #                            auto-converge detects that migration is not making
>>   #                            progress. The default value is 10. (Since 2.5)
>> +#
>> +# @checkpoint-delay: The delay time (in ms) between two COLO checkpoints in
>> +#          periodic mode. (Since 2.6)
>> +#
>>   # Since: 2.4
>>   ##
>>   { 'enum': 'MigrationParameter',
>>     'data': ['compress-level', 'compress-threads', 'decompress-threads',
>> -           'x-cpu-throttle-initial', 'x-cpu-throttle-increment'] }
>> +           'x-cpu-throttle-initial', 'x-cpu-throttle-increment',
>> +           'checkpoint-delay' ] }
>>
>>   #
>>   # @migrate-set-parameters
>> @@ -649,6 +654,9 @@
>>   # @x-cpu-throttle-increment: throttle percentage increase each time
>>   #                            auto-converge detects that migration is not making
>>   #                            progress. The default value is 10. (Since 2.5)
>> +#
>> +# @checkpoint-delay: the delay time between two checkpoints. (Since 2.6)
>> +#
>>   # Since: 2.4
>>   ##
>>   { 'command': 'migrate-set-parameters',
>> @@ -656,7 +664,8 @@
>>               '*compress-threads': 'int',
>>               '*decompress-threads': 'int',
>>               '*x-cpu-throttle-initial': 'int',
>> -            '*x-cpu-throttle-increment': 'int'} }
>> +            '*x-cpu-throttle-increment': 'int',
>> +            '*checkpoint-delay': 'int' } }
>>
>>   #
>>   # @MigrationParameters
>> @@ -675,6 +684,8 @@
>>   #                            auto-converge detects that migration is not making
>>   #                            progress. The default value is 10. (Since 2.5)
>>   #
>> +# @checkpoint-delay: the delay time between two COLO checkpoints
>> +#
>>   # Since: 2.4
>>   ##
>>   { 'struct': 'MigrationParameters',
>> @@ -682,7 +693,9 @@
>>               'compress-threads': 'int',
>>               'decompress-threads': 'int',
>>               'x-cpu-throttle-initial': 'int',
>> -            'x-cpu-throttle-increment': 'int'} }
>> +            'x-cpu-throttle-increment': 'int',
>> +            'checkpoint-delay': 'int'} }
>> +
>>   ##
>>   # @query-migrate-parameters
>>   #
>> diff --git a/qmp-commands.hx b/qmp-commands.hx
>> index 92042d8..c39f2f0 100644
>> --- a/qmp-commands.hx
>> +++ b/qmp-commands.hx
>> @@ -3664,7 +3664,7 @@ EQMP
>>       {
>>           .name       = "migrate-set-parameters",
>>           .args_type  =
>> -            "compress-level:i?,compress-threads:i?,decompress-threads:i?",
>> +            "compress-level:i?,compress-threads:i?,decompress-threads:i?,checkpoint-delay:i?",
>>           .mhandler.cmd_new = qmp_marshal_migrate_set_parameters,
>>       },
>>   SQMP
>> --
>> 1.8.3.1
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 08/39] migration: Rename the'file' member of MigrationState
  2015-12-10  6:41   ` Wen Congyang
@ 2015-12-11  3:40     ` Hailiang Zhang
  0 siblings, 0 replies; 95+ messages in thread
From: Hailiang Zhang @ 2015-12-11  3:40 UTC (permalink / raw)
  To: Wen Congyang, qemu-devel
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	dgilbert, arei.gonglei, stefanha, amit.shah, hongyang.yang

On 2015/12/10 14:41, Wen Congyang wrote:
> On 11/24/2015 05:25 PM, zhanghailiang wrote:
>> Rename the 'file' member of MigrationState to 'to_dst_file'.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> ---
>> v11:
>> - Only rename 'file' member of MigrationState
>
> You forgot to update migration/rdma.c.
>

Good catch, I will fix it, thanks.

> Thanks
> Wen Congyang
>
>> ---
>>   include/migration/migration.h |  2 +-
>>   migration/exec.c              |  4 +--
>>   migration/fd.c                |  4 +--
>>   migration/migration.c         | 72 ++++++++++++++++++++++---------------------
>>   migration/postcopy-ram.c      |  6 ++--
>>   migration/savevm.c            |  2 +-
>>   migration/tcp.c               |  4 +--
>>   migration/unix.c              |  4 +--
>>   8 files changed, 51 insertions(+), 47 deletions(-)
>>
>> diff --git a/include/migration/migration.h b/include/migration/migration.h
>> index a57a734..ba5bcec 100644
>> --- a/include/migration/migration.h
>> +++ b/include/migration/migration.h
>> @@ -140,7 +140,7 @@ struct MigrationState
>>       size_t xfer_limit;
>>       QemuThread thread;
>>       QEMUBH *cleanup_bh;
>> -    QEMUFile *file;
>> +    QEMUFile *to_dst_file;
>>       int parameters[MIGRATION_PARAMETER_MAX];
>>
>>       int state;
>> diff --git a/migration/exec.c b/migration/exec.c
>> index 8406d2b..9037109 100644
>> --- a/migration/exec.c
>> +++ b/migration/exec.c
>> @@ -36,8 +36,8 @@
>>
>>   void exec_start_outgoing_migration(MigrationState *s, const char *command, Error **errp)
>>   {
>> -    s->file = qemu_popen_cmd(command, "w");
>> -    if (s->file == NULL) {
>> +    s->to_dst_file = qemu_popen_cmd(command, "w");
>> +    if (s->to_dst_file == NULL) {
>>           error_setg_errno(errp, errno, "failed to popen the migration target");
>>           return;
>>       }
>> diff --git a/migration/fd.c b/migration/fd.c
>> index 3e4bed0..9a9d6c5 100644
>> --- a/migration/fd.c
>> +++ b/migration/fd.c
>> @@ -50,9 +50,9 @@ void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **
>>       }
>>
>>       if (fd_is_socket(fd)) {
>> -        s->file = qemu_fopen_socket(fd, "wb");
>> +        s->to_dst_file = qemu_fopen_socket(fd, "wb");
>>       } else {
>> -        s->file = qemu_fdopen(fd, "wb");
>> +        s->to_dst_file = qemu_fdopen(fd, "wb");
>>       }
>>
>>       migrate_fd_connect(s);
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 41eac0d..a4c690d 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -834,7 +834,7 @@ static void migrate_fd_cleanup(void *opaque)
>>
>>       flush_page_queue(s);
>>
>> -    if (s->file) {
>> +    if (s->to_dst_file) {
>>           trace_migrate_fd_cleanup();
>>           qemu_mutex_unlock_iothread();
>>           if (s->migration_thread_running) {
>> @@ -844,8 +844,8 @@ static void migrate_fd_cleanup(void *opaque)
>>           qemu_mutex_lock_iothread();
>>
>>           migrate_compress_threads_join();
>> -        qemu_fclose(s->file);
>> -        s->file = NULL;
>> +        qemu_fclose(s->to_dst_file);
>> +        s->to_dst_file = NULL;
>>       }
>>
>>       assert((s->state != MIGRATION_STATUS_ACTIVE) &&
>> @@ -862,7 +862,7 @@ static void migrate_fd_cleanup(void *opaque)
>>   void migrate_fd_error(MigrationState *s)
>>   {
>>       trace_migrate_fd_error();
>> -    assert(s->file == NULL);
>> +    assert(s->to_dst_file == NULL);
>>       migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
>>                         MIGRATION_STATUS_FAILED);
>>       notifier_list_notify(&migration_state_notifiers, s);
>> @@ -871,7 +871,7 @@ void migrate_fd_error(MigrationState *s)
>>   static void migrate_fd_cancel(MigrationState *s)
>>   {
>>       int old_state ;
>> -    QEMUFile *f = migrate_get_current()->file;
>> +    QEMUFile *f = migrate_get_current()->to_dst_file;
>>       trace_migrate_fd_cancel();
>>
>>       if (s->rp_state.from_dst_file) {
>> @@ -942,7 +942,7 @@ MigrationState *migrate_init(const MigrationParams *params)
>>       s->bytes_xfer = 0;
>>       s->xfer_limit = 0;
>>       s->cleanup_bh = 0;
>> -    s->file = NULL;
>> +    s->to_dst_file = NULL;
>>       s->state = MIGRATION_STATUS_NONE;
>>       s->params = *params;
>>       s->rp_state.from_dst_file = NULL;
>> @@ -1122,8 +1122,9 @@ void qmp_migrate_set_speed(int64_t value, Error **errp)
>>
>>       s = migrate_get_current();
>>       s->bandwidth_limit = value;
>> -    if (s->file) {
>> -        qemu_file_set_rate_limit(s->file, s->bandwidth_limit / XFER_LIMIT_RATIO);
>> +    if (s->to_dst_file) {
>> +        qemu_file_set_rate_limit(s->to_dst_file,
>> +                                 s->bandwidth_limit / XFER_LIMIT_RATIO);
>>       }
>>   }
>>
>> @@ -1393,7 +1394,7 @@ out:
>>   static int open_return_path_on_source(MigrationState *ms)
>>   {
>>
>> -    ms->rp_state.from_dst_file = qemu_file_get_return_path(ms->file);
>> +    ms->rp_state.from_dst_file = qemu_file_get_return_path(ms->to_dst_file);
>>       if (!ms->rp_state.from_dst_file) {
>>           return -1;
>>       }
>> @@ -1415,7 +1416,7 @@ static int await_return_path_close_on_source(MigrationState *ms)
>>        * rp_thread will exit, however if there's an error we need to cause
>>        * it to exit.
>>        */
>> -    if (qemu_file_get_error(ms->file) && ms->rp_state.from_dst_file) {
>> +    if (qemu_file_get_error(ms->to_dst_file) && ms->rp_state.from_dst_file) {
>>           /*
>>            * shutdown(2), if we have it, will cause it to unblock if it's stuck
>>            * waiting for the destination.
>> @@ -1458,7 +1459,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
>>        * Cause any non-postcopiable, but iterative devices to
>>        * send out their final data.
>>        */
>> -    qemu_savevm_state_complete_precopy(ms->file, true);
>> +    qemu_savevm_state_complete_precopy(ms->to_dst_file, true);
>>
>>       /*
>>        * in Finish migrate and with the io-lock held everything should
>> @@ -1476,9 +1477,9 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
>>        * will notice we're in POSTCOPY_ACTIVE and not actually
>>        * wrap their state up here
>>        */
>> -    qemu_file_set_rate_limit(ms->file, INT64_MAX);
>> +    qemu_file_set_rate_limit(ms->to_dst_file, INT64_MAX);
>>       /* Ping just for debugging, helps line traces up */
>> -    qemu_savevm_send_ping(ms->file, 2);
>> +    qemu_savevm_send_ping(ms->to_dst_file, 2);
>>
>>       /*
>>        * While loading the device state we may trigger page transfer
>> @@ -1512,7 +1513,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
>>       qsb = qemu_buf_get(fb);
>>
>>       /* Now send that blob */
>> -    if (qemu_savevm_send_packaged(ms->file, qsb)) {
>> +    if (qemu_savevm_send_packaged(ms->to_dst_file, qsb)) {
>>           goto fail_closefb;
>>       }
>>       qemu_fclose(fb);
>> @@ -1524,9 +1525,9 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
>>        * Although this ping is just for debug, it could potentially be
>>        * used for getting a better measurement of downtime at the source.
>>        */
>> -    qemu_savevm_send_ping(ms->file, 4);
>> +    qemu_savevm_send_ping(ms->to_dst_file, 4);
>>
>> -    ret = qemu_file_get_error(ms->file);
>> +    ret = qemu_file_get_error(ms->to_dst_file);
>>       if (ret) {
>>           error_report("postcopy_start: Migration stream errored");
>>           migrate_set_state(&ms->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
>> @@ -1569,8 +1570,8 @@ static void migration_completion(MigrationState *s, int current_active_state,
>>           if (!ret) {
>>               ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
>>               if (ret >= 0) {
>> -                qemu_file_set_rate_limit(s->file, INT64_MAX);
>> -                qemu_savevm_state_complete_precopy(s->file, false);
>> +                qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
>> +                qemu_savevm_state_complete_precopy(s->to_dst_file, false);
>>               }
>>           }
>>           qemu_mutex_unlock_iothread();
>> @@ -1581,7 +1582,7 @@ static void migration_completion(MigrationState *s, int current_active_state,
>>       } else if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
>>           trace_migration_completion_postcopy_end();
>>
>> -        qemu_savevm_state_complete_postcopy(s->file);
>> +        qemu_savevm_state_complete_postcopy(s->to_dst_file);
>>           trace_migration_completion_postcopy_end_after_complete();
>>       }
>>
>> @@ -1602,7 +1603,7 @@ static void migration_completion(MigrationState *s, int current_active_state,
>>           }
>>       }
>>
>> -    if (qemu_file_get_error(s->file)) {
>> +    if (qemu_file_get_error(s->to_dst_file)) {
>>           trace_migration_completion_file_err();
>>           goto fail;
>>       }
>> @@ -1647,24 +1648,24 @@ static void *migration_thread(void *opaque)
>>
>>       rcu_register_thread();
>>
>> -    qemu_savevm_state_header(s->file);
>> +    qemu_savevm_state_header(s->to_dst_file);
>>
>>       if (migrate_postcopy_ram()) {
>>           /* Now tell the dest that it should open its end so it can reply */
>> -        qemu_savevm_send_open_return_path(s->file);
>> +        qemu_savevm_send_open_return_path(s->to_dst_file);
>>
>>           /* And do a ping that will make stuff easier to debug */
>> -        qemu_savevm_send_ping(s->file, 1);
>> +        qemu_savevm_send_ping(s->to_dst_file, 1);
>>
>>           /*
>>            * Tell the destination that we *might* want to do postcopy later;
>>            * if the other end can't do postcopy it should fail now, nice and
>>            * early.
>>            */
>> -        qemu_savevm_send_postcopy_advise(s->file);
>> +        qemu_savevm_send_postcopy_advise(s->to_dst_file);
>>       }
>>
>> -    qemu_savevm_state_begin(s->file, &s->params);
>> +    qemu_savevm_state_begin(s->to_dst_file, &s->params);
>>
>>       s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
>>       current_active_state = MIGRATION_STATUS_ACTIVE;
>> @@ -1678,10 +1679,10 @@ static void *migration_thread(void *opaque)
>>           int64_t current_time;
>>           uint64_t pending_size;
>>
>> -        if (!qemu_file_rate_limit(s->file)) {
>> +        if (!qemu_file_rate_limit(s->to_dst_file)) {
>>               uint64_t pend_post, pend_nonpost;
>>
>> -            qemu_savevm_state_pending(s->file, max_size, &pend_nonpost,
>> +            qemu_savevm_state_pending(s->to_dst_file, max_size, &pend_nonpost,
>>                                         &pend_post);
>>               pending_size = pend_nonpost + pend_post;
>>               trace_migrate_pending(pending_size, max_size,
>> @@ -1702,7 +1703,7 @@ static void *migration_thread(void *opaque)
>>                       continue;
>>                   }
>>                   /* Just another iteration step */
>> -                qemu_savevm_state_iterate(s->file, entered_postcopy);
>> +                qemu_savevm_state_iterate(s->to_dst_file, entered_postcopy);
>>               } else {
>>                   trace_migration_thread_low_pending(pending_size);
>>                   migration_completion(s, current_active_state,
>> @@ -1711,7 +1712,7 @@ static void *migration_thread(void *opaque)
>>               }
>>           }
>>
>> -        if (qemu_file_get_error(s->file)) {
>> +        if (qemu_file_get_error(s->to_dst_file)) {
>>               migrate_set_state(&s->state, current_active_state,
>>                                 MIGRATION_STATUS_FAILED);
>>               trace_migration_thread_file_err();
>> @@ -1719,7 +1720,8 @@ static void *migration_thread(void *opaque)
>>           }
>>           current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>>           if (current_time >= initial_time + BUFFER_DELAY) {
>> -            uint64_t transferred_bytes = qemu_ftell(s->file) - initial_bytes;
>> +            uint64_t transferred_bytes = qemu_ftell(s->to_dst_file) -
>> +                                         initial_bytes;
>>               uint64_t time_spent = current_time - initial_time;
>>               double bandwidth = transferred_bytes / time_spent;
>>               max_size = bandwidth * migrate_max_downtime() / 1000000;
>> @@ -1735,11 +1737,11 @@ static void *migration_thread(void *opaque)
>>                   s->expected_downtime = s->dirty_bytes_rate / bandwidth;
>>               }
>>
>> -            qemu_file_reset_rate_limit(s->file);
>> +            qemu_file_reset_rate_limit(s->to_dst_file);
>>               initial_time = current_time;
>> -            initial_bytes = qemu_ftell(s->file);
>> +            initial_bytes = qemu_ftell(s->to_dst_file);
>>           }
>> -        if (qemu_file_rate_limit(s->file)) {
>> +        if (qemu_file_rate_limit(s->to_dst_file)) {
>>               /* usleep expects microseconds */
>>               g_usleep((initial_time + BUFFER_DELAY - current_time)*1000);
>>           }
>> @@ -1757,7 +1759,7 @@ static void *migration_thread(void *opaque)
>>           qemu_savevm_state_cleanup();
>>       }
>>       if (s->state == MIGRATION_STATUS_COMPLETED) {
>> -        uint64_t transferred_bytes = qemu_ftell(s->file);
>> +        uint64_t transferred_bytes = qemu_ftell(s->to_dst_file);
>>           s->total_time = end_time - s->total_time;
>>           if (!entered_postcopy) {
>>               s->downtime = end_time - start_time;
>> @@ -1794,7 +1796,7 @@ void migrate_fd_connect(MigrationState *s)
>>       s->expected_downtime = max_downtime/1000000;
>>       s->cleanup_bh = qemu_bh_new(migrate_fd_cleanup, s);
>>
>> -    qemu_file_set_rate_limit(s->file,
>> +    qemu_file_set_rate_limit(s->to_dst_file,
>>                                s->bandwidth_limit / XFER_LIMIT_RATIO);
>>
>>       /* Notify before starting migration thread */
>> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
>> index 22d6b18..f74016d 100644
>> --- a/migration/postcopy-ram.c
>> +++ b/migration/postcopy-ram.c
>> @@ -733,7 +733,8 @@ void postcopy_discard_send_range(MigrationState *ms, PostcopyDiscardState *pds,
>>
>>       if (pds->cur_entry == MAX_DISCARDS_PER_COMMAND) {
>>           /* Full set, ship it! */
>> -        qemu_savevm_send_postcopy_ram_discard(ms->file, pds->ramblock_name,
>> +        qemu_savevm_send_postcopy_ram_discard(ms->to_dst_file,
>> +                                              pds->ramblock_name,
>>                                                 pds->cur_entry,
>>                                                 pds->start_list,
>>                                                 pds->length_list);
>> @@ -753,7 +754,8 @@ void postcopy_discard_send_finish(MigrationState *ms, PostcopyDiscardState *pds)
>>   {
>>       /* Anything unsent? */
>>       if (pds->cur_entry) {
>> -        qemu_savevm_send_postcopy_ram_discard(ms->file, pds->ramblock_name,
>> +        qemu_savevm_send_postcopy_ram_discard(ms->to_dst_file,
>> +                                              pds->ramblock_name,
>>                                                 pds->cur_entry,
>>                                                 pds->start_list,
>>                                                 pds->length_list);
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index 0ad1b93..f102870 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -1163,7 +1163,7 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
>>           .shared = 0
>>       };
>>       MigrationState *ms = migrate_init(&params);
>> -    ms->file = f;
>> +    ms->to_dst_file = f;
>>
>>       if (qemu_savevm_state_blocked(errp)) {
>>           return -EINVAL;
>> diff --git a/migration/tcp.c b/migration/tcp.c
>> index ae89172..e083d68 100644
>> --- a/migration/tcp.c
>> +++ b/migration/tcp.c
>> @@ -39,11 +39,11 @@ static void tcp_wait_for_connect(int fd, Error *err, void *opaque)
>>
>>       if (fd < 0) {
>>           DPRINTF("migrate connect error: %s\n", error_get_pretty(err));
>> -        s->file = NULL;
>> +        s->to_dst_file = NULL;
>>           migrate_fd_error(s);
>>       } else {
>>           DPRINTF("migrate connect success\n");
>> -        s->file = qemu_fopen_socket(fd, "wb");
>> +        s->to_dst_file = qemu_fopen_socket(fd, "wb");
>>           migrate_fd_connect(s);
>>       }
>>   }
>> diff --git a/migration/unix.c b/migration/unix.c
>> index b591813..5492dd6 100644
>> --- a/migration/unix.c
>> +++ b/migration/unix.c
>> @@ -39,11 +39,11 @@ static void unix_wait_for_connect(int fd, Error *err, void *opaque)
>>
>>       if (fd < 0) {
>>           DPRINTF("migrate connect error: %s\n", error_get_pretty(err));
>> -        s->file = NULL;
>> +        s->to_dst_file = NULL;
>>           migrate_fd_error(s);
>>       } else {
>>           DPRINTF("migrate connect success\n");
>> -        s->file = qemu_fopen_socket(fd, "wb");
>> +        s->to_dst_file = qemu_fopen_socket(fd, "wb");
>>           migrate_fd_connect(s);
>>       }
>>   }
>>
>
>
>
>
> .
>

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 23/39] COLO: Implement failover work for Primary VM
  2015-12-10 18:34   ` Dr. David Alan Gilbert
@ 2015-12-11  7:54     ` Hailiang Zhang
  2015-12-11  9:22       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 95+ messages in thread
From: Hailiang Zhang @ 2015-12-11  7:54 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

On 2015/12/11 2:34, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> For PVM, if there is failover request from users.
>> The colo thread will exit the loop while the failover BH does the
>> cleanup work and resumes VM.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> ---
>> v11:
>> - Don't call migration_end() in primary_vm_do_failover(),
>>   The cleanup work will be done in migration_thread().
>> - Remove vm_start() in primary_vm_do_failover() which also been done
>>    in migraiton_thread()
>> v10:
>> - Call migration_end() in primary_vm_do_failover()
>> ---
>>   include/migration/colo.h     |  3 +++
>>   include/migration/failover.h |  1 +
>>   migration/colo-failover.c    |  7 +++++-
>>   migration/colo.c             | 56 ++++++++++++++++++++++++++++++++++++++++++--
>>   4 files changed, 64 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/migration/colo.h b/include/migration/colo.h
>> index ba27719..0b02e95 100644
>> --- a/include/migration/colo.h
>> +++ b/include/migration/colo.h
>> @@ -32,4 +32,7 @@ void *colo_process_incoming_thread(void *opaque);
>>   bool migration_incoming_in_colo_state(void);
>>
>>   COLOMode get_colo_mode(void);
>> +
>> +/* failover */
>> +void colo_do_failover(MigrationState *s);
>>   #endif
>> diff --git a/include/migration/failover.h b/include/migration/failover.h
>> index 882c625..fba3931 100644
>> --- a/include/migration/failover.h
>> +++ b/include/migration/failover.h
>> @@ -26,5 +26,6 @@ void failover_init_state(void);
>>   int failover_set_state(int old_state, int new_state);
>>   int failover_get_state(void);
>>   void failover_request_active(Error **errp);
>> +bool failover_request_is_active(void);
>>
>>   #endif
>> diff --git a/migration/colo-failover.c b/migration/colo-failover.c
>> index 1b1be24..0c525da 100644
>> --- a/migration/colo-failover.c
>> +++ b/migration/colo-failover.c
>> @@ -32,7 +32,7 @@ static void colo_failover_bh(void *opaque)
>>           error_report("Unkown error for failover, old_state=%d", old_state);
>>           return;
>>       }
>> -    /*TODO: Do failover work */
>> +    colo_do_failover(NULL);
>>   }
>>
>>   void failover_request_active(Error **errp)
>> @@ -67,6 +67,11 @@ int failover_get_state(void)
>>       return atomic_read(&failover_state);
>>   }
>>
>> +bool failover_request_is_active(void)
>> +{
>> +    return ((failover_get_state() != FAILOVER_STATUS_NONE));
>> +}
>> +
>>   void qmp_x_colo_lost_heartbeat(Error **errp)
>>   {
>>       if (get_colo_mode() == COLO_MODE_UNKNOWN) {
>> diff --git a/migration/colo.c b/migration/colo.c
>> index cedfc63..7a42fc6 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -41,6 +41,42 @@ bool migration_incoming_in_colo_state(void)
>>       return mis && (mis->state == MIGRATION_STATUS_COLO);
>>   }
>>
>> +static bool colo_runstate_is_stopped(void)
>> +{
>> +    return runstate_check(RUN_STATE_COLO) || !runstate_is_running();
>> +}
>> +
>> +static void primary_vm_do_failover(void)
>> +{
>> +    MigrationState *s = migrate_get_current();
>> +    int old_state;
>> +
>> +    if (s->state != MIGRATION_STATUS_FAILED) {
>> +        migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
>> +                          MIGRATION_STATUS_COMPLETED);
>> +    }
>
> That's a little odd; it will only move to completed if the current
> state is MIGRATION_STATUS_COLO, but you only do it if the state isn't
> FAILED.  You could remove the if and just call migrate_set_state
> like that and it would be safe as long as you really only expect
> to have to deal with MIGRATION_STATUS_COLO.
>

Yes, you are right, I will remove the check and set the state directly.

>> +    old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
>> +                                   FAILOVER_STATUS_COMPLETED);
>> +    if (old_state != FAILOVER_STATUS_HANDLING) {
>> +        error_report("Serious error while do failover for Primary VM,"
>> +                     "old_state: %d", old_state);
>
> It's generally better to state the reason rather than say it's 'serious';
> so something like:
>     'Incorrect state (%d) while doing failover for Primary VM'
> tells you more, and looks less scary!
>

Ha, OK, I will fix it.

>> +        return;
>> +    }
>> +}
>> +
>> +void colo_do_failover(MigrationState *s)
>> +{
>> +    /* Make sure vm stopped while failover */
>> +    if (!colo_runstate_is_stopped()) {
>> +        vm_stop_force_state(RUN_STATE_COLO);
>> +    }
>
> That feels like a race? couldn't we be just at the end
> of taking a checkpoint and about to restart when you do
> the if, so it reads that it's currently stopped but
> then it restarts it by the time you have a chance to
> do anything?

Do you mean that after we have stopped the VM in failover(), but before the
failover work is done, the migration (checkpoint) thread may start the VM
in the meantime? colo_do_failover() runs in BH context, so it holds
the iothread lock, and the checkpoint thread has no chance to execute vm_start().
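
To make that concrete, here are the two critical sections side by side, as I
read them from the patches (this is only an illustration of the ordering, not
new code from the series):

    /* checkpoint thread, after a checkpoint has been sent (patch 13) */
    qemu_mutex_lock_iothread();
    vm_start();
    qemu_mutex_unlock_iothread();

    /* failover BH, run by the main loop with the iothread lock already held */
    if (!colo_runstate_is_stopped()) {
        vm_stop_force_state(RUN_STATE_COLO);
    }
    primary_vm_do_failover();

Both pieces run under the same iothread lock, so the vm_start() above cannot
interleave with the failover BH; one of them simply runs to completion first.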

> I see in patch 13 (Save PVM....) you have:
>       qemu_mutex_lock_iothread();
>       vm_start();
>       qemu_mutex_unlock_iothread();
>
> So maybe if that code is executed just before the
> failover, then it would stop at the _lock_, we would
> run here but then as soon as we finish wouldn't it vm_start
> there?
>

But it seems that we do not actually need to stop the VM to do the failover work.
I'm not so sure; I will investigate whether we can do it without stopping the VM.

>> +    if (get_colo_mode() == COLO_MODE_PRIMARY) {
>> +        primary_vm_do_failover();
>> +    }
>> +}
>> +
>>   /* colo checkpoint control helper */
>>   static int colo_ctl_put(QEMUFile *f, uint32_t cmd, uint64_t value)
>>   {
>> @@ -122,9 +158,22 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
>>       }
>>
>>       qemu_mutex_lock_iothread();
>> +    if (failover_request_is_active()) {
>> +        qemu_mutex_unlock_iothread();
>> +        ret = -1;
>> +        goto out;
>> +    }
>>       vm_stop_force_state(RUN_STATE_COLO);
>>       qemu_mutex_unlock_iothread();
>>       trace_colo_vm_state_change("run", "stop");
>> +    /*
>> +     * failover request bh could be called after
>> +     * vm_stop_force_state so we check failover_request_is_active() again.
>> +     */
>> +    if (failover_request_is_active()) {
>> +        ret = -1;
>> +        goto out;
>> +    }
>
> I'm confused about why the check is needed specifically here;
> for example can't that happen at any point where we're outside of the
> iothread lock?  e.g. couldn't we set failover just a couple of
> lines lower, lets say just after the s->params.blk= 0 ?
>

Yes, you are right, failover could happen at any point where we are not
holding the iothread lock. We should check the failover status after every
important step. I'll add more checks in the next version ...


>>       /* Disable block migration */
>>       s->params.blk = 0;
>> @@ -221,6 +270,11 @@ static void colo_process_checkpoint(MigrationState *s)
>>       trace_colo_vm_state_change("stop", "run");
>>
>>       while (s->state == MIGRATION_STATUS_COLO) {
>> +        if (failover_request_is_active()) {
>> +            error_report("failover request");
>> +            goto out;
>> +        }
>> +
>>           current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>>           if (current_time - checkpoint_time <
>>               s->parameters[MIGRATION_PARAMETER_CHECKPOINT_DELAY]) {
>> @@ -242,8 +296,6 @@ out:
>>       if (ret < 0) {
>>           error_report("%s: %s", __func__, strerror(-ret));
>>       }
>> -    migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
>> -                      MIGRATION_STATUS_COMPLETED);
>
> I had to think about this; but yes, I guess the only way out is via the failover
> which does the completed above.
>

Yes, thanks.

Hailiang

> Dave
>
>>       qsb_free(buffer);
>>       buffer = NULL;
>> --
>> 1.8.3.1
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 24/39] COLO: Implement failover work for Secondary VM
  2015-12-10 18:50   ` Dr. David Alan Gilbert
@ 2015-12-11  8:27     ` Hailiang Zhang
  0 siblings, 0 replies; 95+ messages in thread
From: Hailiang Zhang @ 2015-12-11  8:27 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

On 2015/12/11 2:50, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> If users require SVM to takeover work, colo incoming thread should
>> exit from loop while failover BH helps backing to migration incoming
>> coroutine.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> ---
>>   migration/colo.c | 42 +++++++++++++++++++++++++++++++++++++++---
>>   1 file changed, 39 insertions(+), 3 deletions(-)
>>
>> diff --git a/migration/colo.c b/migration/colo.c
>> index 7a42fc6..f31e957 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -46,6 +46,33 @@ static bool colo_runstate_is_stopped(void)
>>       return runstate_check(RUN_STATE_COLO) || !runstate_is_running();
>>   }
>>
>> +static void secondary_vm_do_failover(void)
>> +{
>> +    int old_state;
>> +    MigrationIncomingState *mis = migration_incoming_get_current();
>> +
>> +    migrate_set_state(&mis->state, MIGRATION_STATUS_COLO,
>> +                      MIGRATION_STATUS_COMPLETED);
>> +
>> +    if (!autostart) {
>> +        error_report("\"-S\" qemu option will be ignored in secondary side");
>> +        /* recover runstate to normal migration finish state */
>> +        autostart = true;
>> +    }
>
> You might find libvirt will need something different for it to be
> involved during the failover; but for now OK.
>

>> +    old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
>> +                                   FAILOVER_STATUS_COMPLETED);
>> +    if (old_state != FAILOVER_STATUS_HANDLING) {
>> +        error_report("Serious error while do failover for secondary VM,"
>> +                     "old_state: %d", old_state);
>
> Same suggestion as previous patch just to improve the error message.
>

OK, I will fix it in the next version.

>> +        return;
>> +    }
>> +    /* For Secondary VM, jump to incoming co */
>> +    if (mis->migration_incoming_co) {
>> +        qemu_coroutine_enter(mis->migration_incoming_co, NULL);
>> +    }
>> +}
>> +
>>   static void primary_vm_do_failover(void)
>>   {
>>       MigrationState *s = migrate_get_current();
>> @@ -74,6 +101,8 @@ void colo_do_failover(MigrationState *s)
>>
>>       if (get_colo_mode() == COLO_MODE_PRIMARY) {
>>           primary_vm_do_failover();
>> +    } else {
>> +        secondary_vm_do_failover();
>>       }
>>   }
>>
>> @@ -404,6 +433,12 @@ void *colo_process_incoming_thread(void *opaque)
>>                   continue;
>>               }
>>           }
>> +
>> +        if (failover_request_is_active()) {
>> +            error_report("failover request");
>> +            goto out;
>> +        }
>> +
>>           /* FIXME: This is unnecessary for periodic checkpoint mode */
>>           ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_CHECKPOINT_REPLY, 0);
>>           if (ret < 0) {
>> @@ -473,10 +508,11 @@ out:
>>           qemu_fclose(fb);
>>       }
>>       qsb_free(buffer);
>> -
>> -    qemu_mutex_lock_iothread();
>> +    /* Here, we can ensure BH is hold the global lock, and will join colo
>> +    * incoming thread, so here it is not necessary to lock here again,
>> +    * or there will be a deadlock error.
>> +    */
>>       colo_release_ram_cache();
>> -    qemu_mutex_unlock_iothread();
>
> OK, I think I understand that - because we know there is a failover request
> active, then it must be holding the lock?
>

Yes, we come here only after failover has happened. The Secondary VM does
failover in a BH while holding the iothread lock, and at the end it enters
migration_incoming_co, which waits for the colo incoming thread to finish.
So this thread must not try to take the iothread lock here, or there would be
a deadlock.
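
If it helps to see the constraint in isolation, here is a tiny standalone
pthread model of it (not QEMU code; the lock and thread below are invented
names that just play the roles of the iothread lock and the colo incoming
thread):

    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t iothread_lock = PTHREAD_MUTEX_INITIALIZER;

    /* plays the role of the colo incoming thread's exit path */
    static void *incoming_thread(void *arg)
    {
        (void)arg;
        /*
         * If we called pthread_mutex_lock(&iothread_lock) here, we would
         * block forever: main() below already holds the lock and is
         * sitting in pthread_join() waiting for us.
         */
        printf("incoming thread: cleaning up without taking the lock\n");
        return NULL;
    }

    int main(void)
    {
        pthread_t t;

        pthread_create(&t, NULL, incoming_thread, NULL);

        pthread_mutex_lock(&iothread_lock);   /* "failover BH" holds the lock */
        pthread_join(t, NULL);                /* ...and waits for the thread  */
        pthread_mutex_unlock(&iothread_lock);

        printf("joined cleanly because the thread never took the lock\n");
        return 0;
    }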

> Other than the error message improvement:
>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>
> Dave
>

Thanks,
Hailiang

>>
>>       if (mis->to_src_file) {
>>           qemu_fclose(mis->to_src_file);
>> --
>> 1.8.3.1
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 27/39] COLO failover: Shutdown related socket fd when do failover
  2015-12-10 20:03   ` Dr. David Alan Gilbert
@ 2015-12-11  8:57     ` Hailiang Zhang
  2015-12-11  9:18       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 95+ messages in thread
From: Hailiang Zhang @ 2015-12-11  8:57 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

On 2015/12/11 4:03, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> If the net connection between COLO's two sides is broken while colo/colo incoming
>> thread is blocked in 'read'/'write' socket fd. It will not detect this error until
>> connect timeout. It will be a long time.
>>
>> Here we shutdown all the related socket file descriptors to wake up the blocking
>> operation in failover BH. Besides, we should close the corresponding file descriptors
>> after failvoer BH shutdown them, or there will be an error.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> ---
>> v11:
>> - Only shutdown fd for once
>> ---
>>   migration/colo.c | 31 +++++++++++++++++++++++++++++--
>>   1 file changed, 29 insertions(+), 2 deletions(-)
>>
>> diff --git a/migration/colo.c b/migration/colo.c
>> index 4cd7b00..994b80d 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -68,6 +68,14 @@ static void secondary_vm_do_failover(void)
>>           /* recover runstate to normal migration finish state */
>>           autostart = true;
>>       }
>> +    /*
>> +    * Make sure colo incoming thread not block in recv,
>> +    * mis->from_src_file and mis->to_src_file use the same fd,
>> +    * so here we only need to shutdown it for once.
>> +    */
>
> That assumption is only valid for the current socket transport;
> for example I have a (partially-working) RDMA transport where the
> forward and reverse paths are quite separate.
>

Yes, you are right, but we have really hit this stuck case many times in testing,
and it is easy to reproduce. So I think it's necessary to keep this code.

>> +    if (mis->from_src_file) {
>> +        qemu_file_shutdown(mis->from_src_file);
>> +    }
>>
>>       old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
>>                                      FAILOVER_STATUS_COMPLETED);
>> @@ -92,6 +100,15 @@ static void primary_vm_do_failover(void)
>>                             MIGRATION_STATUS_COMPLETED);
>>       }
>>
>> +    /*
>> +    * Make sure colo thread no block in recv,
>> +    * Besides, s->rp_state.from_dst_file and s->to_dst_file use the
>> +    * same fd, so here we only need to shutdown it for once.
>> +    */
>> +    if (s->to_dst_file) {
>> +        qemu_file_shutdown(s->to_dst_file);
>> +    }
>> +
>>       old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
>>                                      FAILOVER_STATUS_COMPLETED);
>>       if (old_state != FAILOVER_STATUS_HANDLING) {
>> @@ -333,7 +350,7 @@ static void colo_process_checkpoint(MigrationState *s)
>>
>>   out:
>>       current_time = error_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>> -    if (ret < 0) {
>> +    if (ret < 0 || (!ret && !failover_request_is_active())) {
>
> Why is this needed - when would you get a 0 return that is actually an error?
>

If we shut down the fd, the read will return 0, so ret will be 0; this only
happens when we trigger failover actively.
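
For what it's worth, that is just the normal shutdown(2) behaviour, so it can
be seen outside QEMU as well. A minimal standalone sketch (plain POSIX sockets
and pthreads; none of the names below are QEMU code):

    #include <stdio.h>
    #include <pthread.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static void *reader(void *arg)
    {
        int fd = *(int *)arg;
        char buf[16];
        /* blocks here, nothing is ever sent from the other end */
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        printf("recv returned %zd\n", n);   /* prints 0 after the shutdown */
        return NULL;
    }

    int main(void)
    {
        int sv[2];
        pthread_t t;

        socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
        pthread_create(&t, NULL, reader, &sv[0]);
        sleep(1);                    /* let the reader block in recv() */
        shutdown(sv[0], SHUT_RDWR);  /* roughly what qemu_file_shutdown() does
                                      * for a socket-backed QEMUFile */
        pthread_join(t, NULL);
        close(sv[0]);
        close(sv[1]);
        return 0;
    }

So after an active failover the blocked read wakes up and returns 0, exactly
like a clean EOF, and only the failover state tells the two cases apart.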

>>           error_report("%s: %s", __func__, strerror(-ret));
>>           qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_ERROR,
>>                                     true, strerror(-ret), NULL);
>> @@ -362,6 +379,11 @@ out:
>>       qsb_free(buffer);
>>       buffer = NULL;
>>
>> +    /* Hope this not to be too long to loop here */
>> +    while (failover_get_state() != FAILOVER_STATUS_COMPLETED) {
>> +        ;
>> +    }
>> +    /* Must be called after failover BH is completed */
>
> Is this the right patch for this?
>

Yes, we must ensure that close() is called after the shutdown() in failover,
or we may end up shutting down the wrong 'fd': the 'fd' may be re-used by
another thread after we close it here, but before the shutdown() in failover.
This is a race that rarely happens. I'll add more comments here.
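
The fd re-use that the race depends on is easy to show in isolation. A
standalone sketch (again not QEMU code; the paths are arbitrary):

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        int a = open("/dev/null", O_RDONLY);
        printf("first fd:  %d\n", a);
        close(a);                               /* like the qemu_fclose() here */
        int b = open("/dev/zero", O_RDONLY);    /* another thread opens something */
        printf("second fd: %d\n", b);           /* usually the same number */
        /* a shutdown() issued now against the old 'a' would hit 'b' instead */
        close(b);
        return 0;
    }

That is why the close has to wait for FAILOVER_STATUS_COMPLETED, i.e. for the
failover BH to have already issued its shutdown(), before the fd is released.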

Thanks,
Hailiang


>>       if (s->rp_state.from_dst_file) {
>>           qemu_fclose(s->rp_state.from_dst_file);
>>       }
>> @@ -534,7 +556,7 @@ void *colo_process_incoming_thread(void *opaque)
>>
>>   out:
>>       current_time = error_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>> -    if (ret < 0) {
>> +    if (ret < 0 || (!ret && !failover_request_is_active())) {
>>           error_report("colo incoming thread will exit, detect error: %s",
>>                        strerror(-ret));
>>           qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_ERROR,
>> @@ -573,6 +595,11 @@ out:
>>       */
>>       colo_release_ram_cache();
>>
>> +    /* Hope this not to be too long to loop here */
>> +    while (failover_get_state() != FAILOVER_STATUS_COMPLETED) {
>> +        ;
>> +    }
>> +    /* Must be called after failover BH is completed */
>>       if (mis->to_src_file) {
>>           qemu_fclose(mis->to_src_file);
>>       }
>> --
>> 1.8.3.1
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 27/39] COLO failover: Shutdown related socket fd when do failover
  2015-12-11  8:57     ` Hailiang Zhang
@ 2015-12-11  9:18       ` Dr. David Alan Gilbert
  2015-12-11  9:29         ` Hailiang Zhang
  0 siblings, 1 reply; 95+ messages in thread
From: Dr. David Alan Gilbert @ 2015-12-11  9:18 UTC (permalink / raw)
  To: Hailiang Zhang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

* Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
> On 2015/12/11 4:03, Dr. David Alan Gilbert wrote:
> >* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> >>If the net connection between COLO's two sides is broken while colo/colo incoming
> >>thread is blocked in 'read'/'write' socket fd. It will not detect this error until
> >>connect timeout. It will be a long time.
> >>
> >>Here we shutdown all the related socket file descriptors to wake up the blocking
> >>operation in failover BH. Besides, we should close the corresponding file descriptors
> >>after failvoer BH shutdown them, or there will be an error.
> >>
> >>Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> >>Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> >>---
> >>v11:
> >>- Only shutdown fd for once
> >>---
> >>  migration/colo.c | 31 +++++++++++++++++++++++++++++--
> >>  1 file changed, 29 insertions(+), 2 deletions(-)
> >>
> >>diff --git a/migration/colo.c b/migration/colo.c
> >>index 4cd7b00..994b80d 100644
> >>--- a/migration/colo.c
> >>+++ b/migration/colo.c
> >>@@ -68,6 +68,14 @@ static void secondary_vm_do_failover(void)
> >>          /* recover runstate to normal migration finish state */
> >>          autostart = true;
> >>      }
> >>+    /*
> >>+    * Make sure colo incoming thread not block in recv,
> >>+    * mis->from_src_file and mis->to_src_file use the same fd,
> >>+    * so here we only need to shutdown it for once.
> >>+    */
> >
> >That assumption is only valid for the current socket transport;
> >for example I have a (partially-working) RDMA transport where the
> >forward and reverse paths are quite separate.
> >
> 
> Yes, you are right, but we have really hit this stuck case many times in testing,
> and it is easy to reproduce. So I think it's necessary to keep this code.

Yes, I agree you must shutdown from_src_file, but I mean do you also need to
shutdown to_src_file ?  On sockets they share an fd, but that might not always be
true, so you might have to shutdown both QEMUFile's - but only if you can
get blocked on them.
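
i.e. roughly something like the following (untested and purely illustrative;
with the current socket transport the second call is redundant because both
QEMUFiles wrap the same fd, but a transport with separate forward and reverse
channels would need it):

    /* wake up anything blocked on either direction */
    if (mis->from_src_file) {
        qemu_file_shutdown(mis->from_src_file);
    }
    if (mis->to_src_file) {
        qemu_file_shutdown(mis->to_src_file);
    }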

> 
> >>+    if (mis->from_src_file) {
> >>+        qemu_file_shutdown(mis->from_src_file);
> >>+    }
> >>
> >>      old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
> >>                                     FAILOVER_STATUS_COMPLETED);
> >>@@ -92,6 +100,15 @@ static void primary_vm_do_failover(void)
> >>                            MIGRATION_STATUS_COMPLETED);
> >>      }
> >>
> >>+    /*
> >>+    * Make sure colo thread no block in recv,
> >>+    * Besides, s->rp_state.from_dst_file and s->to_dst_file use the
> >>+    * same fd, so here we only need to shutdown it for once.
> >>+    */
> >>+    if (s->to_dst_file) {
> >>+        qemu_file_shutdown(s->to_dst_file);
> >>+    }
> >>+
> >>      old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
> >>                                     FAILOVER_STATUS_COMPLETED);
> >>      if (old_state != FAILOVER_STATUS_HANDLING) {
> >>@@ -333,7 +350,7 @@ static void colo_process_checkpoint(MigrationState *s)
> >>
> >>  out:
> >>      current_time = error_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> >>-    if (ret < 0) {
> >>+    if (ret < 0 || (!ret && !failover_request_is_active())) {
> >
> >Why is this needed - when would you get a 0 return that is actually an error?
> >
> 
> If we shut down the fd, the read will return 0, so ret will be 0; this only
> happens when we trigger failover actively.

OK

> >>          error_report("%s: %s", __func__, strerror(-ret));
> >>          qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_ERROR,
> >>                                    true, strerror(-ret), NULL);
> >>@@ -362,6 +379,11 @@ out:
> >>      qsb_free(buffer);
> >>      buffer = NULL;
> >>
> >>+    /* Hope this not to be too long to loop here */
> >>+    while (failover_get_state() != FAILOVER_STATUS_COMPLETED) {
> >>+        ;
> >>+    }
> >>+    /* Must be called after failover BH is completed */
> >
> >Is this the right patch for this?
> >
> 
> Yes, we must ensure that close() is called after the shutdown() in failover,
> or we may end up shutting down the wrong 'fd': the 'fd' may be re-used by
> another thread after we close it here, but before the shutdown() in failover.
> This is a race that rarely happens. I'll add more comments here.

Thanks, that makes sense; closing the wrong fd can be really hard to debug :-)

Dave

> 
> Thanks,
> Hailiang
> 
> 
> >>      if (s->rp_state.from_dst_file) {
> >>          qemu_fclose(s->rp_state.from_dst_file);
> >>      }
> >>@@ -534,7 +556,7 @@ void *colo_process_incoming_thread(void *opaque)
> >>
> >>  out:
> >>      current_time = error_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> >>-    if (ret < 0) {
> >>+    if (ret < 0 || (!ret && !failover_request_is_active())) {
> >>          error_report("colo incoming thread will exit, detect error: %s",
> >>                       strerror(-ret));
> >>          qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_ERROR,
> >>@@ -573,6 +595,11 @@ out:
> >>      */
> >>      colo_release_ram_cache();
> >>
> >>+    /* Hope this not to be too long to loop here */
> >>+    while (failover_get_state() != FAILOVER_STATUS_COMPLETED) {
> >>+        ;
> >>+    }
> >>+    /* Must be called after failover BH is completed */
> >>      if (mis->to_src_file) {
> >>          qemu_fclose(mis->to_src_file);
> >>      }
> >>--
> >>1.8.3.1
> >>
> >>
> >--
> >Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
> >.
> >
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 23/39] COLO: Implement failover work for Primary VM
  2015-12-11  7:54     ` Hailiang Zhang
@ 2015-12-11  9:22       ` Dr. David Alan Gilbert
  2015-12-11  9:38         ` Hailiang Zhang
  0 siblings, 1 reply; 95+ messages in thread
From: Dr. David Alan Gilbert @ 2015-12-11  9:22 UTC (permalink / raw)
  To: Hailiang Zhang
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

* Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
> On 2015/12/11 2:34, Dr. David Alan Gilbert wrote:
> >* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> >>For PVM, if there is failover request from users.
> >>The colo thread will exit the loop while the failover BH does the
> >>cleanup work and resumes VM.
> >>
> >>Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> >>Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> >>---
> >>v11:
> >>- Don't call migration_end() in primary_vm_do_failover(),
> >>  The cleanup work will be done in migration_thread().
> >>- Remove vm_start() in primary_vm_do_failover() which also been done
> >>   in migraiton_thread()
> >>v10:
> >>- Call migration_end() in primary_vm_do_failover()
> >>---
> >>  include/migration/colo.h     |  3 +++
> >>  include/migration/failover.h |  1 +
> >>  migration/colo-failover.c    |  7 +++++-
> >>  migration/colo.c             | 56 ++++++++++++++++++++++++++++++++++++++++++--
> >>  4 files changed, 64 insertions(+), 3 deletions(-)
> >>
> >>diff --git a/include/migration/colo.h b/include/migration/colo.h
> >>index ba27719..0b02e95 100644
> >>--- a/include/migration/colo.h
> >>+++ b/include/migration/colo.h
> >>@@ -32,4 +32,7 @@ void *colo_process_incoming_thread(void *opaque);
> >>  bool migration_incoming_in_colo_state(void);
> >>
> >>  COLOMode get_colo_mode(void);
> >>+
> >>+/* failover */
> >>+void colo_do_failover(MigrationState *s);
> >>  #endif
> >>diff --git a/include/migration/failover.h b/include/migration/failover.h
> >>index 882c625..fba3931 100644
> >>--- a/include/migration/failover.h
> >>+++ b/include/migration/failover.h
> >>@@ -26,5 +26,6 @@ void failover_init_state(void);
> >>  int failover_set_state(int old_state, int new_state);
> >>  int failover_get_state(void);
> >>  void failover_request_active(Error **errp);
> >>+bool failover_request_is_active(void);
> >>
> >>  #endif
> >>diff --git a/migration/colo-failover.c b/migration/colo-failover.c
> >>index 1b1be24..0c525da 100644
> >>--- a/migration/colo-failover.c
> >>+++ b/migration/colo-failover.c
> >>@@ -32,7 +32,7 @@ static void colo_failover_bh(void *opaque)
> >>          error_report("Unkown error for failover, old_state=%d", old_state);
> >>          return;
> >>      }
> >>-    /*TODO: Do failover work */
> >>+    colo_do_failover(NULL);
> >>  }
> >>
> >>  void failover_request_active(Error **errp)
> >>@@ -67,6 +67,11 @@ int failover_get_state(void)
> >>      return atomic_read(&failover_state);
> >>  }
> >>
> >>+bool failover_request_is_active(void)
> >>+{
> >>+    return ((failover_get_state() != FAILOVER_STATUS_NONE));
> >>+}
> >>+
> >>  void qmp_x_colo_lost_heartbeat(Error **errp)
> >>  {
> >>      if (get_colo_mode() == COLO_MODE_UNKNOWN) {
> >>diff --git a/migration/colo.c b/migration/colo.c
> >>index cedfc63..7a42fc6 100644
> >>--- a/migration/colo.c
> >>+++ b/migration/colo.c
> >>@@ -41,6 +41,42 @@ bool migration_incoming_in_colo_state(void)
> >>      return mis && (mis->state == MIGRATION_STATUS_COLO);
> >>  }
> >>
> >>+static bool colo_runstate_is_stopped(void)
> >>+{
> >>+    return runstate_check(RUN_STATE_COLO) || !runstate_is_running();
> >>+}
> >>+
> >>+static void primary_vm_do_failover(void)
> >>+{
> >>+    MigrationState *s = migrate_get_current();
> >>+    int old_state;
> >>+
> >>+    if (s->state != MIGRATION_STATUS_FAILED) {
> >>+        migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
> >>+                          MIGRATION_STATUS_COMPLETED);
> >>+    }
> >
> >That's a little odd; it will only move to completed if the current
> >state is MIGRATION_STATUS_COLO, but you only do it if the state isn't
> >FAILED.  You could remove the if and just call migrate_set_state
> >like that and it would be safe as long as you really only expect
> >to have to deal with MIGRATION_STATUS_COLO.
> >
> 
> Yes, you are right, I will remove the check and set the state directly.
> 
> >>+    old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
> >>+                                   FAILOVER_STATUS_COMPLETED);
> >>+    if (old_state != FAILOVER_STATUS_HANDLING) {
> >>+        error_report("Serious error while do failover for Primary VM,"
> >>+                     "old_state: %d", old_state);
> >
> >It's generally better to state the reason rather than say it's 'serious';
> >so something like:
> >    'Incorrect state (%d) while doing failover for Primary VM'
> >tells you more, and looks less scary!
> >
> 
> Ha, OK, I will fix it.
> 
> >>+        return;
> >>+    }
> >>+}
> >>+
> >>+void colo_do_failover(MigrationState *s)
> >>+{
> >>+    /* Make sure vm stopped while failover */
> >>+    if (!colo_runstate_is_stopped()) {
> >>+        vm_stop_force_state(RUN_STATE_COLO);
> >>+    }
> >
> >That feels like a race? couldn't we be just at the end
> >of taking a checkpoint and about to restart when you do
> >the if, so it reads that it's currently stopped but
> >then it restarts it by the time you have a chance to
> >do anything?
> 
> Do you mean that after we have stopped the VM in failover(), but before the
> failover work is done, the migration (checkpoint) thread may start the VM
> in the meantime? colo_do_failover() runs in BH context, so it holds
> the iothread lock, and the checkpoint thread has no chance to execute vm_start().

What happens if the failover code is executed just before the checkpoint thread
takes the _lock_? What happens when the failover code then finishes?
(I'm not sure this is a problem - I just thought I should check).


> >I see in patch 13 (Save PVM....) you have:
> >      qemu_mutex_lock_iothread();
> >      vm_start();
> >      qemu_mutex_unlock_iothread();
> >
> >So maybe if that code is executed just before the
> >failover, then it would stop at the _lock_, we would
> >run here but then as soon as we finish wouldn't it vm_start
> >there?
> >
> 
> But it seems that we do not actually need to stop the VM to do the failover work.
> I'm not so sure; I will investigate whether we can do it without stopping the VM.

Stopping it seems safer.

> 
> >>+    if (get_colo_mode() == COLO_MODE_PRIMARY) {
> >>+        primary_vm_do_failover();
> >>+    }
> >>+}
> >>+
> >>  /* colo checkpoint control helper */
> >>  static int colo_ctl_put(QEMUFile *f, uint32_t cmd, uint64_t value)
> >>  {
> >>@@ -122,9 +158,22 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
> >>      }
> >>
> >>      qemu_mutex_lock_iothread();
> >>+    if (failover_request_is_active()) {
> >>+        qemu_mutex_unlock_iothread();
> >>+        ret = -1;
> >>+        goto out;
> >>+    }
> >>      vm_stop_force_state(RUN_STATE_COLO);
> >>      qemu_mutex_unlock_iothread();
> >>      trace_colo_vm_state_change("run", "stop");
> >>+    /*
> >>+     * failover request bh could be called after
> >>+     * vm_stop_force_state so we check failover_request_is_active() again.
> >>+     */
> >>+    if (failover_request_is_active()) {
> >>+        ret = -1;
> >>+        goto out;
> >>+    }
> >
> >I'm confused about why the check is needed specifically here;
> >for example can't that happen at any point where we're outside of the
> >iothread lock?  e.g. couldn't we set failover just a couple of
> >lines lower, let's say just after the s->params.blk = 0 ?
> >
> 
> Yes, you are right, failover could happen at any place where we are not
> holding the iothread lock. We should check the failover status after every
> important step. I'll add more checks in the next version ...
> 
> 
> >>      /* Disable block migration */
> >>      s->params.blk = 0;
> >>@@ -221,6 +270,11 @@ static void colo_process_checkpoint(MigrationState *s)
> >>      trace_colo_vm_state_change("stop", "run");
> >>
> >>      while (s->state == MIGRATION_STATUS_COLO) {
> >>+        if (failover_request_is_active()) {
> >>+            error_report("failover request");
> >>+            goto out;
> >>+        }
> >>+
> >>          current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> >>          if (current_time - checkpoint_time <
> >>              s->parameters[MIGRATION_PARAMETER_CHECKPOINT_DELAY]) {
> >>@@ -242,8 +296,6 @@ out:
> >>      if (ret < 0) {
> >>          error_report("%s: %s", __func__, strerror(-ret));
> >>      }
> >>-    migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
> >>-                      MIGRATION_STATUS_COMPLETED);
> >
> >I had to think about this; but yes, I guess the only way out is via the failover
> >which does the completed above.
> >
> 
> Yes, thanks.
> 
> Hailiang
> 
> >Dave
> >
> >>      qsb_free(buffer);
> >>      buffer = NULL;
> >>--
> >>1.8.3.1
> >>
> >>
> >--
> >Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
> >.
> >
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 27/39] COLO failover: Shutdown related socket fd when do failover
  2015-12-11  9:18       ` Dr. David Alan Gilbert
@ 2015-12-11  9:29         ` Hailiang Zhang
  0 siblings, 0 replies; 95+ messages in thread
From: Hailiang Zhang @ 2015-12-11  9:29 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

On 2015/12/11 17:18, Dr. David Alan Gilbert wrote:
> * Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
>> On 2015/12/11 4:03, Dr. David Alan Gilbert wrote:
>>> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>>>> If the network connection between COLO's two sides is broken while the colo/colo incoming
>>>> thread is blocked in a 'read'/'write' on the socket fd, the error will not be detected until
>>>> the connection times out, which can take a long time.
>>>>
>>>> Here we shut down all the related socket file descriptors in the failover BH to wake up
>>>> the blocking operations. Besides, we should close the corresponding file descriptors only
>>>> after the failover BH has shut them down, or there will be an error.
>>>>
>>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>>>> ---
>>>> v11:
>>>> - Only shutdown fd for once
>>>> ---
>>>>   migration/colo.c | 31 +++++++++++++++++++++++++++++--
>>>>   1 file changed, 29 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/migration/colo.c b/migration/colo.c
>>>> index 4cd7b00..994b80d 100644
>>>> --- a/migration/colo.c
>>>> +++ b/migration/colo.c
>>>> @@ -68,6 +68,14 @@ static void secondary_vm_do_failover(void)
>>>>           /* recover runstate to normal migration finish state */
>>>>           autostart = true;
>>>>       }
>>>> +    /*
>>>> +    * Make sure colo incoming thread not block in recv,
>>>> +    * mis->from_src_file and mis->to_src_file use the same fd,
>>>> +    * so here we only need to shutdown it for once.
>>>> +    */
>>>
>>> That assumption is only valid for the current socket transport;
>>> for example I have a (partially-working) RDMA transport where the
>>> forward and reverse paths are quite separate.
>>>
>>
>> Yes, you are right, but we have really hit this bad stuck case many times in testing,
>> and it is easy to reproduce. So I think it's necessary to keep this code.
>
> Yes, I agree you must shut down from_src_file, but I mean do you also need to
> shut down to_src_file?  On sockets they share an fd, but that might not always be

Yes, in that case we need to shut it down too; I will add it. Thanks.
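Something like this, I think (just a sketch against the current patch, not tested):

    /*
     * Make sure the colo incoming thread is not blocked in recv/send.
     * On the socket transport from_src_file and to_src_file share one fd,
     * so the second shutdown should be redundant but harmless there; for
     * transports where the two paths are separate it is needed.
     */
    if (mis->from_src_file) {
        qemu_file_shutdown(mis->from_src_file);
    }
    if (mis->to_src_file) {
        qemu_file_shutdown(mis->to_src_file);
    }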

> true, so you might have to shut down both QEMUFiles - but only if you can
> get blocked on them.
>
>>
>>>> +    if (mis->from_src_file) {
>>>> +        qemu_file_shutdown(mis->from_src_file);
>>>> +    }
>>>>
>>>>       old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
>>>>                                      FAILOVER_STATUS_COMPLETED);
>>>> @@ -92,6 +100,15 @@ static void primary_vm_do_failover(void)
>>>>                             MIGRATION_STATUS_COMPLETED);
>>>>       }
>>>>
>>>> +    /*
>>>> +    * Make sure colo thread no block in recv,
>>>> +    * Besides, s->rp_state.from_dst_file and s->to_dst_file use the
>>>> +    * same fd, so here we only need to shutdown it for once.
>>>> +    */
>>>> +    if (s->to_dst_file) {
>>>> +        qemu_file_shutdown(s->to_dst_file);
>>>> +    }
>>>> +
>>>>       old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
>>>>                                      FAILOVER_STATUS_COMPLETED);
>>>>       if (old_state != FAILOVER_STATUS_HANDLING) {
>>>> @@ -333,7 +350,7 @@ static void colo_process_checkpoint(MigrationState *s)
>>>>
>>>>   out:
>>>>       current_time = error_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>>>> -    if (ret < 0) {
>>>> +    if (ret < 0 || (!ret && !failover_request_is_active())) {
>>>
>>> Why is this needed - when would you get a 0 return that is actually an error?
>>>
>>
>> If we shut down the fd, the read will return 0 and so ret will be 0; this only happens
>> when we actively do a failover.
>
> OK
>
>>>>           error_report("%s: %s", __func__, strerror(-ret));
>>>>           qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_ERROR,
>>>>                                     true, strerror(-ret), NULL);
>>>> @@ -362,6 +379,11 @@ out:
>>>>       qsb_free(buffer);
>>>>       buffer = NULL;
>>>>
>>>> +    /* Hope this not to be too long to loop here */
>>>> +    while (failover_get_state() != FAILOVER_STATUS_COMPLETED) {
>>>> +        ;
>>>> +    }
>>>> +    /* Must be called after failover BH is completed */
>>>
>>> Is this the right patch for this?
>>>
>>
>> Yes, we must ensure that close() is called after the shutdown() in failover,
>> or we may shut down the wrong 'fd': the 'fd' may have been re-used by another
>> thread after we close it here but before shutdown() runs in failover. This
>> is a race that rarely happens. I'll add more comments here.
>
> Thanks, that makes sense; closing the wrong fd can be really hard to debug :-)
>
> Dave
>
>>
>> Thanks,
>> Hailiang
>>
>>
>>>>       if (s->rp_state.from_dst_file) {
>>>>           qemu_fclose(s->rp_state.from_dst_file);
>>>>       }
>>>> @@ -534,7 +556,7 @@ void *colo_process_incoming_thread(void *opaque)
>>>>
>>>>   out:
>>>>       current_time = error_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>>>> -    if (ret < 0) {
>>>> +    if (ret < 0 || (!ret && !failover_request_is_active())) {
>>>>           error_report("colo incoming thread will exit, detect error: %s",
>>>>                        strerror(-ret));
>>>>           qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_ERROR,
>>>> @@ -573,6 +595,11 @@ out:
>>>>       */
>>>>       colo_release_ram_cache();
>>>>
>>>> +    /* Hope this not to be too long to loop here */
>>>> +    while (failover_get_state() != FAILOVER_STATUS_COMPLETED) {
>>>> +        ;
>>>> +    }
>>>> +    /* Must be called after failover BH is completed */
>>>>       if (mis->to_src_file) {
>>>>           qemu_fclose(mis->to_src_file);
>>>>       }
>>>> --
>>>> 1.8.3.1
>>>>
>>>>
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>
>>> .
>>>
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 23/39] COLO: Implement failover work for Primary VM
  2015-12-11  9:22       ` Dr. David Alan Gilbert
@ 2015-12-11  9:38         ` Hailiang Zhang
  0 siblings, 0 replies; 95+ messages in thread
From: Hailiang Zhang @ 2015-12-11  9:38 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

On 2015/12/11 17:22, Dr. David Alan Gilbert wrote:
> * Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
>> On 2015/12/11 2:34, Dr. David Alan Gilbert wrote:
>>> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>>>> For the PVM, if there is a failover request from users,
>>>> the colo thread will exit the loop while the failover BH does the
>>>> cleanup work and resumes the VM.
>>>>
>>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>>>> ---
>>>> v11:
>>>> - Don't call migration_end() in primary_vm_do_failover(),
>>>>   The cleanup work will be done in migration_thread().
>>>> - Remove vm_start() in primary_vm_do_failover() which also been done
>>>>    in migraiton_thread()
>>>> v10:
>>>> - Call migration_end() in primary_vm_do_failover()
>>>> ---
>>>>   include/migration/colo.h     |  3 +++
>>>>   include/migration/failover.h |  1 +
>>>>   migration/colo-failover.c    |  7 +++++-
>>>>   migration/colo.c             | 56 ++++++++++++++++++++++++++++++++++++++++++--
>>>>   4 files changed, 64 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/include/migration/colo.h b/include/migration/colo.h
>>>> index ba27719..0b02e95 100644
>>>> --- a/include/migration/colo.h
>>>> +++ b/include/migration/colo.h
>>>> @@ -32,4 +32,7 @@ void *colo_process_incoming_thread(void *opaque);
>>>>   bool migration_incoming_in_colo_state(void);
>>>>
>>>>   COLOMode get_colo_mode(void);
>>>> +
>>>> +/* failover */
>>>> +void colo_do_failover(MigrationState *s);
>>>>   #endif
>>>> diff --git a/include/migration/failover.h b/include/migration/failover.h
>>>> index 882c625..fba3931 100644
>>>> --- a/include/migration/failover.h
>>>> +++ b/include/migration/failover.h
>>>> @@ -26,5 +26,6 @@ void failover_init_state(void);
>>>>   int failover_set_state(int old_state, int new_state);
>>>>   int failover_get_state(void);
>>>>   void failover_request_active(Error **errp);
>>>> +bool failover_request_is_active(void);
>>>>
>>>>   #endif
>>>> diff --git a/migration/colo-failover.c b/migration/colo-failover.c
>>>> index 1b1be24..0c525da 100644
>>>> --- a/migration/colo-failover.c
>>>> +++ b/migration/colo-failover.c
>>>> @@ -32,7 +32,7 @@ static void colo_failover_bh(void *opaque)
>>>>           error_report("Unkown error for failover, old_state=%d", old_state);
>>>>           return;
>>>>       }
>>>> -    /*TODO: Do failover work */
>>>> +    colo_do_failover(NULL);
>>>>   }
>>>>
>>>>   void failover_request_active(Error **errp)
>>>> @@ -67,6 +67,11 @@ int failover_get_state(void)
>>>>       return atomic_read(&failover_state);
>>>>   }
>>>>
>>>> +bool failover_request_is_active(void)
>>>> +{
>>>> +    return ((failover_get_state() != FAILOVER_STATUS_NONE));
>>>> +}
>>>> +
>>>>   void qmp_x_colo_lost_heartbeat(Error **errp)
>>>>   {
>>>>       if (get_colo_mode() == COLO_MODE_UNKNOWN) {
>>>> diff --git a/migration/colo.c b/migration/colo.c
>>>> index cedfc63..7a42fc6 100644
>>>> --- a/migration/colo.c
>>>> +++ b/migration/colo.c
>>>> @@ -41,6 +41,42 @@ bool migration_incoming_in_colo_state(void)
>>>>       return mis && (mis->state == MIGRATION_STATUS_COLO);
>>>>   }
>>>>
>>>> +static bool colo_runstate_is_stopped(void)
>>>> +{
>>>> +    return runstate_check(RUN_STATE_COLO) || !runstate_is_running();
>>>> +}
>>>> +
>>>> +static void primary_vm_do_failover(void)
>>>> +{
>>>> +    MigrationState *s = migrate_get_current();
>>>> +    int old_state;
>>>> +
>>>> +    if (s->state != MIGRATION_STATUS_FAILED) {
>>>> +        migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
>>>> +                          MIGRATION_STATUS_COMPLETED);
>>>> +    }
>>>
>>> That's a little odd; it will only move to completed if the current
>>> state is MIGRATION_STATUS_COLO, but you only do it if the state isn't
>>> FAILED.  You could remove the if and just call migrate_set_state
>>> like that and it would be safe as long as you really only expect
>>> to have to deal with MIGRATION_STATUS_COLO.
>>>
>>
>> Yes, you are right, I will remove the check and set the state directly.
>>
>>>> +    old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
>>>> +                                   FAILOVER_STATUS_COMPLETED);
>>>> +    if (old_state != FAILOVER_STATUS_HANDLING) {
>>>> +        error_report("Serious error while do failover for Primary VM,"
>>>> +                     "old_state: %d", old_state);
>>>
>>> It's generally better to state the reason rather than say it's 'serious';
>>> so something like:
>>>     'Incorrect state (%d) while doing failover for Primary VM'
>>> tells you more, and looks less scary!
>>>
>>
>> Ha, OK, I will fix it.
>>
>>>> +        return;
>>>> +    }
>>>> +}
>>>> +
>>>> +void colo_do_failover(MigrationState *s)
>>>> +{
>>>> +    /* Make sure vm stopped while failover */
>>>> +    if (!colo_runstate_is_stopped()) {
>>>> +        vm_stop_force_state(RUN_STATE_COLO);
>>>> +    }
>>>
>>> That feels like a race? couldn't we be just at the end
>>> of taking a checkpoint and about to restart when you do
>>> the if, so it reads that it's currently stopped but
>>> then it restarts it by the time you have a chance to
>>> do anything?
>>
>> Do you mean that after we have stopped the VM in failover(), but before the failover
>> work is done, the migration (checkpoint) thread may start the VM in the meantime?
>> colo_do_failover() runs in BH context, which holds
>> the __lock__, so the checkpoint thread has no chance to execute vm_start().
>
> What happens if the failover code runs just before the checkpoint thread
> takes the _lock_? When the failover code finishes, what happens then?
> (I'm not sure this is a problem - I just thought I should check.)
>

Er, the colo loop will exit, and the failover flag will finally be detected in the colo process thread,
so it's OK in this case. But yes, we should add more checks for the failover status.
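For example, roughly like this around the vm_start() that resumes the PVM after a
checkpoint (only a sketch of the extra check I have in mind, not the final patch):

    qemu_mutex_lock_iothread();
    /*
     * The failover BH also runs with the iothread lock held, so checking
     * here means we cannot restart the VM once a failover has already been
     * handled; bail out of the checkpoint loop instead.
     */
    if (failover_request_is_active()) {
        qemu_mutex_unlock_iothread();
        ret = -1;
        goto out;
    }
    vm_start();
    qemu_mutex_unlock_iothread();
    trace_colo_vm_state_change("stop", "run");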

Thanks
Hailiang


>
>>> I see in patch 13 (Save PVM....) you have:
>>>       qemu_mutex_lock_iothread();
>>>       vm_start();
>>>       qemu_mutex_unlock_iothread();
>>>
>>> So maybe if that code is executed just before the
>>> failover, then it would stop at the _lock_, we would
>>> run here but then as soon as we finish wouldn't it vm_start
>>> there?
>>>
>>
>> But it seems that we don't need to stop the VM to do the failover work.
>> I'm not so sure; I will investigate whether we can do it without stopping the VM.
>
> Stopping it seems safer.
>

OK.

>>
>>>> +    if (get_colo_mode() == COLO_MODE_PRIMARY) {
>>>> +        primary_vm_do_failover();
>>>> +    }
>>>> +}
>>>> +
>>>>   /* colo checkpoint control helper */
>>>>   static int colo_ctl_put(QEMUFile *f, uint32_t cmd, uint64_t value)
>>>>   {
>>>> @@ -122,9 +158,22 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
>>>>       }
>>>>
>>>>       qemu_mutex_lock_iothread();
>>>> +    if (failover_request_is_active()) {
>>>> +        qemu_mutex_unlock_iothread();
>>>> +        ret = -1;
>>>> +        goto out;
>>>> +    }
>>>>       vm_stop_force_state(RUN_STATE_COLO);
>>>>       qemu_mutex_unlock_iothread();
>>>>       trace_colo_vm_state_change("run", "stop");
>>>> +    /*
>>>> +     * failover request bh could be called after
>>>> +     * vm_stop_force_state so we check failover_request_is_active() again.
>>>> +     */
>>>> +    if (failover_request_is_active()) {
>>>> +        ret = -1;
>>>> +        goto out;
>>>> +    }
>>>
>>> I'm confused about why the check is needed specifically here;
>>> for example can't that happen at any point where we're outside of the
>>> iothread lock?  e.g. couldn't we set failover just a couple of
>>> lines lower, let's say just after the s->params.blk = 0 ?
>>>
>>
>> Yes, you are right, failover could happen at any place where we are not
>> holding the iothread lock. We should check the failover status after every
>> important step. I'll add more checks in the next version ...
>>
>>
>>>>       /* Disable block migration */
>>>>       s->params.blk = 0;
>>>> @@ -221,6 +270,11 @@ static void colo_process_checkpoint(MigrationState *s)
>>>>       trace_colo_vm_state_change("stop", "run");
>>>>
>>>>       while (s->state == MIGRATION_STATUS_COLO) {
>>>> +        if (failover_request_is_active()) {
>>>> +            error_report("failover request");
>>>> +            goto out;
>>>> +        }
>>>> +
>>>>           current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>>>>           if (current_time - checkpoint_time <
>>>>               s->parameters[MIGRATION_PARAMETER_CHECKPOINT_DELAY]) {
>>>> @@ -242,8 +296,6 @@ out:
>>>>       if (ret < 0) {
>>>>           error_report("%s: %s", __func__, strerror(-ret));
>>>>       }
>>>> -    migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
>>>> -                      MIGRATION_STATUS_COMPLETED);
>>>
>>> I had to think about this; but yes, I guess the only way out is via the failover
>>> which does the completed above.
>>>
>>
>> Yes, thanks.
>>
>> Hailiang
>>
>>> Dave
>>>
>>>>       qsb_free(buffer);
>>>>       buffer = NULL;
>>>> --
>>>> 1.8.3.1
>>>>
>>>>
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>
>>> .
>>>
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Frame v11 25/39] COLO: implement default failover treatment
  2015-12-10 19:01   ` Dr. David Alan Gilbert
@ 2015-12-11  9:48     ` Hailiang Zhang
  0 siblings, 0 replies; 95+ messages in thread
From: Hailiang Zhang @ 2015-12-11  9:48 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: lizhijian, quintela, yunhong.jiang, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, stefanha, amit.shah, hongyang.yang

On 2015/12/11 3:01, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> If we detect some error in colo, we will wait for some time,
>> hoping users also detect it. If users don't issue a failover command,
>> we will go into the default failover procedure, in which the PVM takes over
>> the work while the SVM exits by default.
>
> I'm not sure this is needed; especially on the SVM.  I don't see any harm
> in the SVM waiting forever to be told what to do - it could be told to
> failover or quit; I don't see any benefit to it automatically exiting.
>
> On the primary, I can understand it if you don't have some automated error
> detection system (but I think that's rare);
> but you really would want to make that failover delay configurable
> so that you could turn it off in a system that did have failure detection;
> because automatically restarting the primary after it had caused a failover
> to the secondary would be very bad.

Yes, automatically restarting the PVM may cause split-brain. I'll drop
this patch temporarily.
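
If we bring it back later, I agree the delay should come from migrate-set-parameters
instead of a #define, roughly like the existing checkpoint-delay parameter. Just a
sketch, and MIGRATION_PARAMETER_X_FAILOVER_DELAY below is only a made-up name:

    /* hypothetical parameter, mirroring MIGRATION_PARAMETER_CHECKPOINT_DELAY */
    int64_t failover_delay =
        s->parameters[MIGRATION_PARAMETER_X_FAILOVER_DELAY];

    while (current_time - error_time <= failover_delay) {
        if (failover_request_is_active()) {
            error_report("Primary VM will take over work");
            break;
        }
        usleep(100 * 1000);
        current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
    }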

Thanks.
Hailiang

>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> ---
>>   migration/colo.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 46 insertions(+)
>>
>> diff --git a/migration/colo.c b/migration/colo.c
>> index f31e957..1e6d3dd 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -19,6 +19,14 @@
>>   #include "qemu/sockets.h"
>>   #include "migration/failover.h"
>>
>> +/*
>> + * The delay time before qemu begin the procedure of default failover treatment.
>> + * Unit: ms
>> + * Fix me: This value should be able to change by command
>> + * 'migrate-set-parameters'
>> + */
>> +#define DEFAULT_FAILOVER_DELAY 2000
>> +
>>   /* colo buffer */
>>   #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
>>
>> @@ -264,6 +272,7 @@ static void colo_process_checkpoint(MigrationState *s)
>>   {
>>       QEMUSizedBuffer *buffer = NULL;
>>       int64_t current_time, checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>> +    int64_t error_time;
>>       int ret = 0;
>>       uint64_t value;
>>
>> @@ -322,8 +331,25 @@ static void colo_process_checkpoint(MigrationState *s)
>>       }
>>
>>   out:
>> +    current_time = error_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>>       if (ret < 0) {
>>           error_report("%s: %s", __func__, strerror(-ret));
>> +        /* Give users time to get involved in this verdict */
>> +        while (current_time - error_time <= DEFAULT_FAILOVER_DELAY) {
>> +            if (failover_request_is_active()) {
>> +                error_report("Primary VM will take over work");
>> +                break;
>> +            }
>> +            usleep(100 * 1000);
>> +            current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>> +        }
>> +
>> +        qemu_mutex_lock_iothread();
>> +        if (!failover_request_is_active()) {
>> +            error_report("Primary VM will take over work in default");
>> +            failover_request_active(NULL);
>> +        }
>> +        qemu_mutex_unlock_iothread();
>>       }
>>
>>       qsb_free(buffer);
>> @@ -384,6 +410,7 @@ void *colo_process_incoming_thread(void *opaque)
>>       QEMUFile *fb = NULL;
>>       QEMUSizedBuffer *buffer = NULL; /* Cache incoming device state */
>>       uint64_t  total_size;
>> +    int64_t error_time, current_time;
>>       int ret = 0;
>>       uint64_t value;
>>
>> @@ -499,9 +526,28 @@ void *colo_process_incoming_thread(void *opaque)
>>       }
>>
>>   out:
>> +    current_time = error_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>>       if (ret < 0) {
>>           error_report("colo incoming thread will exit, detect error: %s",
>>                        strerror(-ret));
>> +        /* Give users time to get involved in this verdict */
>> +        while (current_time - error_time <= DEFAULT_FAILOVER_DELAY) {
>> +            if (failover_request_is_active()) {
>> +                error_report("Secondary VM will take over work");
>> +                break;
>> +            }
>> +            usleep(100 * 1000);
>> +            current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>> +        }
>> +        /* check flag again*/
>> +        if (!failover_request_is_active()) {
>> +            /*
>> +            * We assume that Primary VM is still alive according to
>> +            * heartbeat, just kill Secondary VM
>> +            */
>> +            error_report("SVM is going to exit in default!");
>> +            exit(1);
>> +        }
>>       }
>>
>>       if (fb) {
>> --
>> 1.8.3.1
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

^ permalink raw reply	[flat|nested] 95+ messages in thread

end of thread, other threads:[~2015-12-11  9:49 UTC | newest]

Thread overview: 95+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-24  9:25 [Qemu-devel] [PATCH COLO-Frame v11 00/39] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 01/39] configure: Add parameter for configure to enable/disable COLO support zhanghailiang
2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 02/39] migration: Introduce capability 'x-colo' to migration zhanghailiang
2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 03/39] COLO: migrate colo related info to secondary node zhanghailiang
2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 04/39] migration: Export migrate_set_state() zhanghailiang
2015-11-24 17:31   ` Dr. David Alan Gilbert
2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 05/39] migration: Add state records for migration incoming zhanghailiang
2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 06/39] migration: Integrate COLO checkpoint process into migration zhanghailiang
2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 07/39] migration: Integrate COLO checkpoint process into loadvm zhanghailiang
2015-11-24 18:14   ` Dr. David Alan Gilbert
2015-11-25  6:39     ` zhanghailiang
2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 08/39] migration: Rename the'file' member of MigrationState zhanghailiang
2015-11-24 18:26   ` Dr. David Alan Gilbert
2015-11-25  6:48     ` zhanghailiang
2015-12-10  6:41   ` Wen Congyang
2015-12-11  3:40     ` Hailiang Zhang
2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 09/39] COLO/migration: Create a new communication path from destination to source zhanghailiang
2015-11-24 18:40   ` Dr. David Alan Gilbert
2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 10/39] COLO: Implement colo checkpoint protocol zhanghailiang
2015-11-24 19:00   ` Dr. David Alan Gilbert
2015-11-25 14:01     ` Eric Blake
2015-11-26  6:52       ` Hailiang Zhang
2015-11-26  7:12     ` Hailiang Zhang
2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 11/39] COLO: Add a new RunState RUN_STATE_COLO zhanghailiang
2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 12/39] QEMUSizedBuffer: Introduce two help functions for qsb zhanghailiang
2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 13/39] COLO: Save PVM state to secondary side when do checkpoint zhanghailiang
2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 14/39] ram: Split host_from_stream_offset() into two helper functions zhanghailiang
2015-12-01 18:19   ` Dr. David Alan Gilbert
2015-12-03  7:19     ` Hailiang Zhang
2015-12-03  7:29       ` Hailiang Zhang
2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 15/39] COLO: Load PVM's dirty pages into SVM's RAM cache temporarily zhanghailiang
2015-12-01 19:02   ` Dr. David Alan Gilbert
2015-12-03  8:25     ` Hailiang Zhang
2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 16/39] ram/COLO: Record the dirty pages that SVM received zhanghailiang
2015-12-01 19:36   ` Dr. David Alan Gilbert
2015-12-03  8:29     ` Hailiang Zhang
2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 17/39] COLO: Load VMState into qsb before restore it zhanghailiang
2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 18/39] COLO: Flush PVM's cached RAM into SVM's memory zhanghailiang
2015-11-27  5:29   ` Li Zhijian
2015-12-01 12:02     ` Hailiang Zhang
2015-12-01 20:06   ` Dr. David Alan Gilbert
2015-12-03  8:50     ` Hailiang Zhang
2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 19/39] COLO: Add checkpoint-delay parameter for migrate-set-parameters zhanghailiang
2015-12-09 18:50   ` Dr. David Alan Gilbert
2015-12-11  3:20     ` Hailiang Zhang
2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 20/39] COLO: synchronize PVM's state to SVM periodically zhanghailiang
2015-12-09 18:53   ` Dr. David Alan Gilbert
2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 21/39] COLO failover: Introduce a new command to trigger a failover zhanghailiang
2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 22/39] COLO failover: Introduce state to record failover process zhanghailiang
2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 23/39] COLO: Implement failover work for Primary VM zhanghailiang
2015-12-10 18:34   ` Dr. David Alan Gilbert
2015-12-11  7:54     ` Hailiang Zhang
2015-12-11  9:22       ` Dr. David Alan Gilbert
2015-12-11  9:38         ` Hailiang Zhang
2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 24/39] COLO: Implement failover work for Secondary VM zhanghailiang
2015-12-10 18:50   ` Dr. David Alan Gilbert
2015-12-11  8:27     ` Hailiang Zhang
2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 25/39] COLO: implement default failover treatment zhanghailiang
2015-12-10 19:01   ` Dr. David Alan Gilbert
2015-12-11  9:48     ` Hailiang Zhang
2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 26/39] qmp event: Add event notification for COLO error zhanghailiang
2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 27/39] COLO failover: Shutdown related socket fd when do failover zhanghailiang
2015-12-10 20:03   ` Dr. David Alan Gilbert
2015-12-11  8:57     ` Hailiang Zhang
2015-12-11  9:18       ` Dr. David Alan Gilbert
2015-12-11  9:29         ` Hailiang Zhang
2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 28/39] COLO failover: Don't do failover during loading VM's state zhanghailiang
2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 29/39] COLO: Process shutdown command for VM in COLO state zhanghailiang
2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 30/39] COLO: Update the global runstate after going into colo state zhanghailiang
2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 31/39] savevm: Split load vm state function qemu_loadvm_state zhanghailiang
2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 32/39] COLO: Separate the process of saving/loading ram and device state zhanghailiang
2015-11-27  5:10   ` Li Zhijian
2015-12-01 12:07     ` Hailiang Zhang
2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 33/39] COLO: Split qemu_savevm_state_begin out of checkpoint process zhanghailiang
2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 34/39] net/filter-buffer: Add default filter-buffer for each netdev zhanghailiang
2015-11-27 11:39   ` Yang Hongyang
2015-11-28  5:55     ` Hailiang Zhang
2015-11-30  1:19   ` Li Zhijian
2015-12-01  8:56     ` Hailiang Zhang
2015-12-03  1:17   ` Wen Congyang
2015-12-03  3:53     ` Hailiang Zhang
2015-12-03  6:25       ` Wen Congyang
2015-12-03  6:48         ` Hailiang Zhang
2015-12-03  7:21           ` Yang Hongyang
2015-12-03  8:37             ` Hailiang Zhang
2015-12-07  7:38             ` Hailiang Zhang
2015-12-08  1:49               ` Yang Hongyang
2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 35/39] filter-buffer: Accept zero interval zhanghailiang
2015-11-27 11:42   ` Yang Hongyang
2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 36/39] filter-buffer: Introduce a helper function to enable/disable default filter zhanghailiang
2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 37/39] filter-buffer: Introduce a helper function to release packets zhanghailiang
2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 38/39] colo: Use default buffer-filter to buffer and " zhanghailiang
2015-11-27 12:51   ` Yang Hongyang
2015-11-28  6:15     ` Hailiang Zhang
2015-11-24  9:25 ` [Qemu-devel] [PATCH COLO-Frame v11 39/39] COLO: Add block replication into colo process zhanghailiang
