* [Qemu-devel] [PATCH COLO-Frame v14 00/40] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
@ 2016-02-06  9:28 zhanghailiang
From: zhanghailiang @ 2016-02-06  9:28 UTC
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, armbru, yunhong.jiang,
	eddie.dong, peter.huangpeng, dgilbert, zhanghailiang,
	arei.gonglei, stefanha, amit.shah, zhangchen.fnst, hongyang.yang

This is the 14th version of COLO (it still supports only periodic checkpoints).

This series contains only the COLO frame part; the complete code is available on GitHub:
https://github.com/coloft/qemu/commits/colo-v2.5-periodic-mode

This series has few changes apart from the network-related part, which we
have re-implemented according to Jason's suggestion. Most of the other
parts have been reviewed by Dave.

QEMU is approaching soft freeze for 2.6. We hope the COLO prototype can be
merged in 2.6, but we are not sure whether we have enough time to catch this
train. So please help us; thanks very much.

Test procedure:
1. Start QEMU
Primary side:
#x86_64-softmmu/qemu-system-x86_64 -enable-kvm -boot c -m 2048 -smp 2 -qmp stdio -vnc :7 -name primary -cpu qemu64,+kvmclock -device piix3-usb-uhci -device usb-tablet -netdev tap,id=hn0,vhost=off -device virtio-net-pci,id=net-pci0,netdev=hn0 -drive if=virtio,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0.file.filename=/mnt/sdd/rhel_6.5_64_2U_ide,children.0.driver=raw
Secondary side:
#x86_64-softmmu/qemu-system-x86_64 -boot c -m 2048 -smp 2 -qmp stdio -vnc :7 -name secondary -enable-kvm -cpu qemu64,+kvmclock -device piix3-usb-uhci -device usb-tablet -netdev tap,id=hn0,vhost=off -device virtio-net-pci,id=net-pci0,netdev=hn0 -drive if=none,id=colo-disk0,file.filename=/mnt/sdd/rhel_6.5_64_2U_ide,driver=raw,node-name=node0 -drive if=virtio,id=active-disk0,throttling.bps-total=70000000,driver=replication,mode=secondary,file.driver=qcow2,file.file.filename=/mnt/ramfs/active_disk.img,file.backing.driver=qcow2,file.backing.file.filename=/mnt/ramfs/hidden_disk.img,file.backing.backing=colo-disk0 -incoming tcp:0:8888
2. On the Secondary VM's QEMU monitor, issue the following commands:
{'execute':'qmp_capabilities'}
{'execute': 'nbd-server-start', 'arguments': {'addr': {'type': 'inet', 'data': {'host': '192.168.2.88', 'port': '8889'} } } }
{'execute': 'nbd-server-add', 'arguments': {'device': 'colo-disk0', 'writable': true } }
{'execute': 'trace-event-set-state', 'arguments': {'name': 'colo*', 'enable': true} }
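
The secondary-side commands above could also be generated by a script rather
than typed by hand. The following is only a minimal sketch: the function names
are made up for illustration, the optional trace-event-set-state command is
left out, and the output is strict JSON with double quotes instead of the
single-quoted form shown above.

```python
import json


def secondary_setup_commands(nbd_host, nbd_port, export_device):
    """Return the QMP commands from step 2, in the order they are issued."""
    return [
        {"execute": "qmp_capabilities"},
        {
            "execute": "nbd-server-start",
            "arguments": {
                "addr": {
                    "type": "inet",
                    # The original command passes the port as a string.
                    "data": {"host": nbd_host, "port": str(nbd_port)},
                }
            },
        },
        {
            "execute": "nbd-server-add",
            "arguments": {"device": export_device, "writable": True},
        },
    ]


def to_qmp_lines(commands):
    """Serialize each command as one line of JSON (hypothetical helper)."""
    return [json.dumps(cmd) for cmd in commands]


if __name__ == "__main__":
    cmds = secondary_setup_commands("192.168.2.88", 8889, "colo-disk0")
    for line in to_qmp_lines(cmds):
        print(line)
```

Each printed line can be pasted into the '-qmp stdio' monitor one at a time.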

3. On the Primary VM's QEMU monitor, issue the following commands:
{'execute':'qmp_capabilities'}
{'execute': 'human-monitor-command', 'arguments': {'command-line': 'drive_add buddy driver=replication,mode=primary,file.driver=nbd,file.host=9.61.1.7,file.port=8889,file.export=colo-disk0,node-name=node0,if=none,id=blk-buddy0'}}
{'execute':'x-blockdev-change', 'arguments':{'parent': 'colo-disk0', 'node': 'node0' } }
{'execute': 'migrate-set-capabilities', 'arguments': {'capabilities': [ {'capability': 'x-colo', 'state': true } ] } }
{'execute': 'migrate', 'arguments': {'uri': 'tcp:192.168.2.88:8888' } }
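
The long drive_add line in step 3 is easy to mistype. As a rough sketch, it can
be assembled from a table of options and wrapped in human-monitor-command. The
helper names below are hypothetical, and the sketch does not escape commas
inside option values.

```python
import json


def build_drive_add(opts):
    """Join options into the 'drive_add buddy k=v,...' string used in step 3."""
    # Python dicts preserve insertion order, so options appear as listed.
    return "drive_add buddy " + ",".join(f"{k}={v}" for k, v in opts.items())


def hmp_via_qmp(command_line):
    """Wrap an HMP command line in the QMP human-monitor-command."""
    return {
        "execute": "human-monitor-command",
        "arguments": {"command-line": command_line},
    }


# Option values copied from the command in step 3.
replication_opts = {
    "driver": "replication",
    "mode": "primary",
    "file.driver": "nbd",
    "file.host": "9.61.1.7",
    "file.port": 8889,
    "file.export": "colo-disk0",
    "node-name": "node0",
    "if": "none",
    "id": "blk-buddy0",
}

if __name__ == "__main__":
    print(json.dumps(hmp_via_qmp(build_drive_add(replication_opts))))
```

Keeping the options in a table makes it easier to change the NBD host or port
for a different test setup.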

4. After the above steps, whenever you make changes to the PVM, the SVM will be
synced. To change the checkpoint period, issue the command
'{ "execute": "migrate-set-parameters" , "arguments":{ "x-checkpoint-delay": 2000 } }'.
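
For scripted tests that vary the checkpoint period, the command can be built
by a small helper. This is only a sketch: the function name is made up, and
rejecting negative or non-integer values is an assumption of the sketch, not
something this series specifies.

```python
import json


def set_checkpoint_delay(delay):
    """Build the migrate-set-parameters command that changes the period."""
    # bool is a subclass of int in Python, so reject it explicitly.
    # Assumption of this sketch: only non-negative integers are valid.
    if isinstance(delay, bool) or not isinstance(delay, int) or delay < 0:
        raise ValueError("x-checkpoint-delay must be a non-negative integer")
    return {
        "execute": "migrate-set-parameters",
        "arguments": {"x-checkpoint-delay": delay},
    }


if __name__ == "__main__":
    print(json.dumps(set_checkpoint_delay(2000)))
```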

5. Failover test
You can kill the Primary VM and run 'x_colo_lost_heartbeat' in the Secondary
VM's monitor at the same time; the SVM will then take over and the client will
not notice the change.

Before issuing the '{ "execute": "x-colo-lost-heartbeat" }' command, we have to
issue block-related commands to stop block replication.
Primary:
  Remove the nbd child from the quorum:
  { 'execute': 'x-blockdev-change', 'arguments': {'parent': 'colo-disk0', 'child': 'children.1'}}
  { 'execute': 'human-monitor-command','arguments': {'command-line': 'drive_del blk-buddy0'}}
  Note: there is currently no QMP command to remove the blockdev

Secondary:
  The primary host is down, so we should do the following:
  { 'execute': 'nbd-server-stop' }
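
The ordering described above (stop block replication first, then trigger the
failover) can be sketched as a helper that returns the commands for one side.
The function name and the 'side' argument are hypothetical; the commands
themselves are the ones listed above.

```python
def failover_commands(side):
    """Return the QMP commands to issue on 'side', block teardown first."""
    if side == "primary":
        teardown = [
            # Remove the NBD child from the quorum, then delete the drive.
            {"execute": "x-blockdev-change",
             "arguments": {"parent": "colo-disk0", "child": "children.1"}},
            {"execute": "human-monitor-command",
             "arguments": {"command-line": "drive_del blk-buddy0"}},
        ]
    elif side == "secondary":
        # The primary host is down, so only the NBD server has to be stopped.
        teardown = [{"execute": "nbd-server-stop"}]
    else:
        raise ValueError("side must be 'primary' or 'secondary'")
    # The failover itself is always requested last.
    return teardown + [{"execute": "x-colo-lost-heartbeat"}]


if __name__ == "__main__":
    for cmd in failover_commands("secondary"):
        print(cmd)
```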

TODO:
1. Checkpoint based on proxy in qemu
2. The capability of continuous FT
3. Optimize the VM's downtime during checkpoint

v14:
 - Re-implement the network processing based on netfilter (Jason Wang)
 - Rename 'COLOCommand' to 'COLOMessage'. (Markus's suggestion)
 - Split two new patches (patch 27/28) from patch 29
 - Fix some other comments from Dave and Markus.

v13:
 - Refactor colo_*_cmd helper functions to use 'Error **errp' parameter
   instead of return value to indicate success or failure. (patch 10)
 - Remove the optional error message for COLO_EXIT event. (patch 25)
 - Use semaphore to notify colo/colo incoming loop that failover work is
   finished. (patch 26)
 - Move COLO shutdown related codes to colo.c file. (patch 28)
 - Fix memory leak bug for colo incoming loop. (new patch 31)
 - Re-use some existing helper functions to implement the process of
   saving/loading ram and device. (patch 32)
 - Fix some other comments from Dave and Markus.

zhanghailiang (40):
  configure: Add parameter for configure to enable/disable COLO support
  migration: Introduce capability 'x-colo' to migration
  COLO: migrate colo related info to secondary node
  migration: Integrate COLO checkpoint process into migration
  migration: Integrate COLO checkpoint process into loadvm
  COLO/migration: Create a new communication path from destination to
    source
  COLO: Implement colo checkpoint protocol
  COLO: Add a new RunState RUN_STATE_COLO
  QEMUSizedBuffer: Introduce two help functions for qsb
  COLO: Save PVM state to secondary side when do checkpoint
  COLO: Load PVM's dirty pages into SVM's RAM cache temporarily
  ram/COLO: Record the dirty pages that SVM received
  COLO: Load VMState into qsb before restore it
  COLO: Flush PVM's cached RAM into SVM's memory
  COLO: Add checkpoint-delay parameter for migrate-set-parameters
  COLO: synchronize PVM's state to SVM periodically
  COLO failover: Introduce a new command to trigger a failover
  COLO failover: Introduce state to record failover process
  COLO: Implement failover work for Primary VM
  COLO: Implement failover work for Secondary VM
  qmp event: Add COLO_EXIT event to notify users while exited from COLO
  COLO failover: Shutdown related socket fd when do failover
  COLO failover: Don't do failover during loading VM's state
  COLO: Process shutdown command for VM in COLO state
  COLO: Update the global runstate after going into colo state
  savevm: Introduce two helper functions for save/find loadvm_handlers
    entry
  migration/savevm: Add new helpers to process the different stages of
    loadvm
  migration/savevm: Export two helper functions for savevm process
  COLO: Separate the process of saving/loading ram and device state
  COLO: Split qemu_savevm_state_begin out of checkpoint process
  net/filter: Add a 'status' property for filter object
  net/filter: Introduce a helper to add a filter to the netdev
  filter-buffer: Accept zero interval
  net: Add notifier/callback for netdev init
  COLO/filter: add each netdev a buffer filter
  net/filter: Add a helper to traverse all the filters
  COLO: enable buffer filters for PVM
  filter-buffer: make filter_buffer_flush() public
  COLO: flush buffered packets in checkpoint process or exit COLO
  COLO: Add block replication into colo process

 configure                     |  11 +
 docs/qmp-events.txt           |  16 +
 hmp-commands.hx               |  15 +
 hmp.c                         |  15 +
 hmp.h                         |   1 +
 include/exec/ram_addr.h       |   1 +
 include/migration/colo.h      |  42 +++
 include/migration/failover.h  |  33 ++
 include/migration/migration.h |  16 +
 include/migration/qemu-file.h |   3 +-
 include/net/filter.h          |  12 +
 include/net/net.h             |   8 +
 include/sysemu/sysemu.h       |   9 +
 migration/Makefile.objs       |   2 +
 migration/colo-comm.c         |  76 ++++
 migration/colo-failover.c     |  83 +++++
 migration/colo.c              | 846 ++++++++++++++++++++++++++++++++++++++++++
 migration/migration.c         | 109 +++++-
 migration/qemu-file-buf.c     |  61 +++
 migration/ram.c               | 175 ++++++++-
 migration/savevm.c            | 114 ++++--
 net/filter-buffer.c           |  14 +-
 net/filter.c                  |  79 ++++
 net/net.c                     |  57 +++
 qapi-schema.json              | 104 +++++-
 qapi/event.json               |  15 +
 qmp-commands.hx               |  23 +-
 stubs/Makefile.objs           |   1 +
 stubs/migration-colo.c        |  54 +++
 trace-events                  |   8 +
 vl.c                          |  31 +-
 31 files changed, 1959 insertions(+), 75 deletions(-)
 create mode 100644 include/migration/colo.h
 create mode 100644 include/migration/failover.h
 create mode 100644 migration/colo-comm.c
 create mode 100644 migration/colo-failover.c
 create mode 100644 migration/colo.c
 create mode 100644 stubs/migration-colo.c

-- 
1.8.3.1


Thread overview: 60+ messages
2016-02-06  9:28 [Qemu-devel] [PATCH COLO-Frame v14 00/40] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 01/40] configure: Add parameter for configure to enable/disable COLO support zhanghailiang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 02/40] migration: Introduce capability 'x-colo' to migration zhanghailiang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 03/40] COLO: migrate colo related info to secondary node zhanghailiang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 04/40] migration: Integrate COLO checkpoint process into migration zhanghailiang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 05/40] migration: Integrate COLO checkpoint process into loadvm zhanghailiang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 06/40] COLO/migration: Create a new communication path from destination to source zhanghailiang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 07/40] COLO: Implement colo checkpoint protocol zhanghailiang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 08/40] COLO: Add a new RunState RUN_STATE_COLO zhanghailiang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 09/40] QEMUSizedBuffer: Introduce two help functions for qsb zhanghailiang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 10/40] COLO: Save PVM state to secondary side when do checkpoint zhanghailiang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 11/40] COLO: Load PVM's dirty pages into SVM's RAM cache temporarily zhanghailiang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 12/40] ram/COLO: Record the dirty pages that SVM received zhanghailiang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 13/40] COLO: Load VMState into qsb before restore it zhanghailiang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 14/40] COLO: Flush PVM's cached RAM into SVM's memory zhanghailiang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 15/40] COLO: Add checkpoint-delay parameter for migrate-set-parameters zhanghailiang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 16/40] COLO: synchronize PVM's state to SVM periodically zhanghailiang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 17/40] COLO failover: Introduce a new command to trigger a failover zhanghailiang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 18/40] COLO failover: Introduce state to record failover process zhanghailiang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 19/40] COLO: Implement failover work for Primary VM zhanghailiang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 20/40] COLO: Implement failover work for Secondary VM zhanghailiang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 21/40] qmp event: Add COLO_EXIT event to notify users while exited from COLO zhanghailiang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 22/40] COLO failover: Shutdown related socket fd when do failover zhanghailiang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 23/40] COLO failover: Don't do failover during loading VM's state zhanghailiang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 24/40] COLO: Process shutdown command for VM in COLO state zhanghailiang
2016-02-12 15:09   ` Dr. David Alan Gilbert
2016-02-16  6:17     ` Hailiang Zhang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 25/40] COLO: Update the global runstate after going into colo state zhanghailiang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 26/40] savevm: Introduce two helper functions for save/find loadvm_handlers entry zhanghailiang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 27/40] migration/savevm: Add new helpers to process the different stages of loadvm zhanghailiang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 28/40] migration/savevm: Export two helper functions for savevm process zhanghailiang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 29/40] COLO: Separate the process of saving/loading ram and device state zhanghailiang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 30/40] COLO: Split qemu_savevm_state_begin out of checkpoint process zhanghailiang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 31/40] net/filter: Add a 'status' property for filter object zhanghailiang
2016-02-18  3:00   ` Jason Wang
2016-02-18  3:27     ` Hailiang Zhang
2016-02-23  8:34       ` Jason Wang
2016-02-23  9:37         ` Hailiang Zhang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 32/40] net/filter: Introduce a helper to add a filter to the netdev zhanghailiang
2016-02-18  3:19   ` Jason Wang
2016-02-18  3:30     ` Hailiang Zhang
2016-02-23  8:36       ` Jason Wang
2016-02-23 11:39         ` Hailiang Zhang
2016-02-23 11:50           ` Hailiang Zhang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 33/40] filter-buffer: Accept zero interval zhanghailiang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 34/40] net: Add notifier/callback for netdev init zhanghailiang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 35/40] COLO/filter: add each netdev a buffer filter zhanghailiang
2016-02-18  3:23   ` Jason Wang
2016-02-18  4:07     ` Hailiang Zhang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 36/40] net/filter: Add a helper to traverse all the filters zhanghailiang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 37/40] COLO: enable buffer filters for PVM zhanghailiang
2016-02-18  3:31   ` Jason Wang
2016-02-18  3:46     ` Hailiang Zhang
2016-02-18  7:30       ` Hailiang Zhang
2016-02-23  8:38         ` Jason Wang
2016-02-23  9:10           ` Hailiang Zhang
2016-02-23  8:37       ` Jason Wang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 38/40] filter-buffer: make filter_buffer_flush() public zhanghailiang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 39/40] COLO: flush buffered packets in checkpoint process or exit COLO zhanghailiang
2016-02-06  9:28 ` [Qemu-devel] [PATCH COLO-Frame v14 40/40] COLO: Add block replication into colo process zhanghailiang
