* [Qemu-devel] [PATCH COLO-Frame v13 00/39] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
@ 2015-12-29  7:08 zhanghailiang
From: zhanghailiang @ 2015-12-29  7:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: xiecl.fnst, lizhijian, quintela, yunhong.jiang, eddie.dong,
	peter.huangpeng, dgilbert, zhanghailiang, arei.gonglei, stefanha,
	amit.shah, zhangchen.fnst, hongyang.yang

This is the 13th version of COLO (it still supports only periodic checkpoints).

This series contains only the COLO frame part; the complete code is available on GitHub:
https://github.com/coloft/qemu/commits/colo-v2.4-periodic-mode

Please ignore patches 4 and 5, which Dave has already picked into another series.

Test procedure:
1. Start up QEMU
Primary side:
#x86_64-softmmu/qemu-system-x86_64 -enable-kvm -boot c -m 2048 -smp 2 -qmp stdio -vnc :7 -name primary -cpu qemu64,+kvmclock -device piix3-usb-uhci -device usb-tablet -netdev tap,id=hn0,vhost=off -device virtio-net-pci,id=net-pci0,netdev=hn0 -drive if=virtio,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0.file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,children.0.driver=raw
Secondary side:
#x86_64-softmmu/qemu-system-x86_64 -boot c -m 2048 -smp 2 -qmp stdio -vnc :7 -name secondary -enable-kvm -cpu qemu64,+kvmclock -device piix3-usb-uhci -device usb-tablet -netdev tap,id=hn0,vhost=off -device virtio-net-pci,id=net-pci0,netdev=hn0 -drive if=none,id=colo-disk0,file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,driver=raw,node-name=node0 -drive if=virtio,id=active-disk0,throttling.bps-total=70000000,driver=replication,mode=secondary,file.driver=qcow2,file.file.filename=/mnt/ramfs/active_disk.img,file.backing.driver=qcow2,file.backing.file.filename=/mnt/ramfs/hidden_disk.img,file.backing.backing=colo-disk0 -incoming tcp:0:8888
2. On the Secondary VM's QEMU monitor, issue the following commands:
{'execute':'qmp_capabilities'}
{'execute': 'nbd-server-start', 'arguments': {'addr': {'type': 'inet', 'data': {'host': '192.168.2.88', 'port': '8889'} } } }
{'execute': 'nbd-server-add', 'arguments': {'device': 'colo-disk0', 'writable': true } }
{'execute': 'trace-event-set-state', 'arguments': {'name': 'colo*', 'enable': true} }

3. On the Primary VM's QEMU monitor, issue the following commands:
{'execute':'qmp_capabilities'}
{'execute': 'human-monitor-command', 'arguments': {'command-line': 'drive_add buddy driver=replication,mode=primary,file.driver=nbd,file.host=9.61.1.7,file.port=8889,file.export=colo-disk0,node-name=node0,if=none'}}
{'execute':'x-blockdev-change', 'arguments':{'parent': 'colo-disk0', 'node': 'node0' } }
{'execute': 'migrate-set-capabilities', 'arguments': {'capabilities': [ {'capability': 'x-colo', 'state': true } ] } }
{'execute': 'migrate', 'arguments': {'uri': 'tcp:192.168.2.88:8888' } }

4. After the above steps, whenever you make changes to the PVM, the SVM will be synced.
You can change the checkpoint period by issuing the command
'{ "execute": "migrate-set-parameters" , "arguments":{ "x-checkpoint-delay": 2000 } }'.

5. Failover test
You can kill the Primary VM and run 'x_colo_lost_heartbeat' in the Secondary VM's
monitor at the same time; the SVM will then fail over and the client will not
notice the change.

Before issuing the '{ "execute": "x-colo-lost-heartbeat" }' command, we have to
issue a block-related command to stop block replication.
Primary:
  Remove the nbd child from the quorum:
  { 'execute': 'x-blockdev-change', 'arguments': {'parent': 'colo-disk0', 'child': 'children.1'}}
  Note: there is currently no QMP command to remove the blockdev.

Secondary:
  The primary host is down, so we should stop the NBD server:
  { 'execute': 'nbd-server-stop' }
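
The ordering described in steps 4 and 5 can be sketched as a small script that
builds the QMP commands as strict-JSON lines. This is only an illustration:
the helper names (qmp, set_checkpoint_delay, failover_commands) are
hypothetical and not part of this series, and it assumes the "Primary" and
"Secondary" items above list the block command for whichever side is doing
the failover. The lines still have to be written to that side's QMP monitor
by hand or by your own tooling.

```python
import json


def qmp(execute, arguments=None):
    """Serialize one QMP command as a strict-JSON line (hypothetical helper)."""
    cmd = {"execute": execute}
    if arguments is not None:
        cmd["arguments"] = arguments
    return json.dumps(cmd)


def set_checkpoint_delay(delay):
    """Build the migrate-set-parameters command from step 4.

    'delay' is passed through unchanged as x-checkpoint-delay
    (2000 in the example above).
    """
    return qmp("migrate-set-parameters", {"x-checkpoint-delay": delay})


def failover_commands(side, parent="colo-disk0", child="children.1"):
    """Return the commands for the side doing failover, in the order to issue them.

    Block replication is stopped first, then x-colo-lost-heartbeat is sent.
    The default parent/child names are the ones used in the test procedure.
    """
    if side == "primary":
        # Remove the nbd child from the quorum.
        stop_replication = qmp("x-blockdev-change",
                               {"parent": parent, "child": child})
    elif side == "secondary":
        # The primary host is down, so stop the NBD server.
        stop_replication = qmp("nbd-server-stop")
    else:
        raise ValueError("side must be 'primary' or 'secondary'")
    return [stop_replication, qmp("x-colo-lost-heartbeat")]


if __name__ == "__main__":
    print(set_checkpoint_delay(2000))
    for line in failover_commands("secondary"):
        print(line)
```

Keeping the block command and the heartbeat command in one list makes it harder
to send x-colo-lost-heartbeat before block replication has been stopped.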

Please review, thanks.

TODO:
1. Checkpoint based on proxy in qemu
2. The capability of continuous FT
3. Optimize the VM's downtime during checkpoint

v13:
 - Refactor colo_*_cmd helper functions to use 'Error **errp' parameter
  instead of return value to indicate success or failure. (patch 10)
 - Remove the optional error message for COLO_EXIT event. (patch 25)
 - Use semaphore to notify colo/colo incoming loop that failover work is
   finished. (patch 26)
 - Move COLO shutdown related code to colo.c file. (patch 28)
 - Fix memory leak bug for colo incoming loop. (new patch 31)
 - Re-use some existing helper functions to implement the process of
   saving/loading ram and device. (patch 32)
 - Fix some other comments from Dave and Markus.

v12:
 - Fix the bug where the default buffer filter broke vhost-net.
 - Add a flag in struct NetFilterState to help skip the default
   filter for packets travelling through the filter layer.
 - Remove the default failover treatment which may cause split-brain.
 - Rename checkpoint-delay to x-checkpoint-delay.
 - Check whether all netdevs support the default filter before going into COLO.
 - Reconstruct send/receive helper functions in patch 10.
 - Address several other comments from Dave.

v11:
 - Re-implement buffer/release packets based on filter-buffer according
   to Jason Wang's suggestion. (patch 34, patch 36 ~ patch 38)
 - Rebase onto master to re-use some stuff introduced by post-copy.
 - Address several comments from Eric and Dave, the fixing record can
   be found in each patch.

v10:
 - Rename 'colo_lost_heartbeat' command to experimental 'x_colo_lost_heartbeat'
 - Rename migration capability 'colo' to 'x-colo' (Eric's suggestion)
 - Simplify the process of primary side by dropping colo thread and reusing
   migration thread. (Dave's suggestion)
 - Add several netfilter related APIs to support buffer/release packets
   for COLO (patch 32 ~ patch 36)

zhanghailiang (39):
  configure: Add parameter for configure to enable/disable COLO support
  migration: Introduce capability 'x-colo' to migration
  COLO: migrate colo related info to secondary node
  migration: Export migrate_set_state()
  migration: Add state records for migration incoming
  migration: Integrate COLO checkpoint process into migration
  migration: Integrate COLO checkpoint process into loadvm
  migration: Rename the'file' member of MigrationState
  COLO/migration: Create a new communication path from destination to
    source
  COLO: Implement colo checkpoint protocol
  COLO: Add a new RunState RUN_STATE_COLO
  QEMUSizedBuffer: Introduce two help functions for qsb
  COLO: Save PVM state to secondary side when do checkpoint
  ram: Split host_from_stream_offset() into two helper functions
  COLO: Load PVM's dirty pages into SVM's RAM cache temporarily
  ram/COLO: Record the dirty pages that SVM received
  COLO: Load VMState into qsb before restore it
  COLO: Flush PVM's cached RAM into SVM's memory
  COLO: Add checkpoint-delay parameter for migrate-set-parameters
  COLO: synchronize PVM's state to SVM periodically
  COLO failover: Introduce a new command to trigger a failover
  COLO failover: Introduce state to record failover process
  COLO: Implement failover work for Primary VM
  COLO: Implement failover work for Secondary VM
  qmp event: Add COLO_EXIT event to notify users while exited from COLO
  COLO failover: Shutdown related socket fd when do failover
  COLO failover: Don't do failover during loading VM's state
  COLO: Process shutdown command for VM in COLO state
  COLO: Update the global runstate after going into colo state
  savevm: Split load vm state function qemu_loadvm_state
  savevm: Introduce two helper functions for save/find loadvm_handlers
    entry
  COLO: Separate the process of saving/loading ram and device state
  COLO: Split qemu_savevm_state_begin out of checkpoint process
  net/filter-buffer: Add default filter-buffer for each netdev
  filter-buffer: Accept zero interval
  filter-buffer: Introduce a helper function to enable/disable default
    filter
  filter-buffer: Introduce a helper function to release packets
  colo: Use default buffer-filter to buffer and release packets
  COLO: Add block replication into colo process

 configure                     |  11 +
 docs/qmp-events.txt           |  16 +
 hmp-commands.hx               |  15 +
 hmp.c                         |  15 +
 hmp.h                         |   1 +
 include/exec/ram_addr.h       |   9 +-
 include/migration/colo.h      |  40 +++
 include/migration/failover.h  |  33 ++
 include/migration/migration.h |  21 +-
 include/migration/qemu-file.h |   3 +-
 include/net/filter.h          |  12 +
 include/net/net.h             |   5 +
 include/sysemu/sysemu.h       |   9 +
 migration/Makefile.objs       |   2 +
 migration/colo-comm.c         |  71 ++++
 migration/colo-failover.c     |  83 +++++
 migration/colo.c              | 804 ++++++++++++++++++++++++++++++++++++++++++
 migration/exec.c              |   4 +-
 migration/fd.c                |   4 +-
 migration/migration.c         | 216 ++++++++----
 migration/postcopy-ram.c      |   6 +-
 migration/qemu-file-buf.c     |  61 ++++
 migration/ram.c               | 213 +++++++++--
 migration/rdma.c              |   2 +-
 migration/savevm.c            | 248 +++++++++----
 migration/tcp.c               |   4 +-
 migration/unix.c              |   4 +-
 net/filter-buffer.c           | 127 ++++++-
 net/filter.c                  |   6 +-
 net/net.c                     |  58 +++
 qapi-schema.json              | 104 +++++-
 qapi/event.json               |  15 +
 qmp-commands.hx               |  24 +-
 stubs/Makefile.objs           |   1 +
 stubs/migration-colo.c        |  45 +++
 trace-events                  |  10 +
 vl.c                          |  30 +-
 37 files changed, 2135 insertions(+), 197 deletions(-)
 create mode 100644 include/migration/colo.h
 create mode 100644 include/migration/failover.h
 create mode 100644 migration/colo-comm.c
 create mode 100644 migration/colo-failover.c
 create mode 100644 migration/colo.c
 create mode 100644 stubs/migration-colo.c

-- 
1.8.3.1


Thread overview: 63+ messages
2015-12-29  7:08 [Qemu-devel] [PATCH COLO-Frame v13 00/39] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
2015-12-29  7:08 ` [Qemu-devel] [PATCH COLO-Frame v13 01/39] configure: Add parameter for configure to enable/disable COLO support zhanghailiang
2015-12-29  7:08 ` [Qemu-devel] [PATCH COLO-Frame v13 02/39] migration: Introduce capability 'x-colo' to migration zhanghailiang
2015-12-29  7:08 ` [Qemu-devel] [PATCH COLO-Frame v13 03/39] COLO: migrate colo related info to secondary node zhanghailiang
2015-12-29  7:09 ` [Qemu-devel] [PATCH COLO-Frame v13 04/39] migration: Export migrate_set_state() zhanghailiang
2015-12-29  7:09 ` [Qemu-devel] [PATCH COLO-Frame v13 05/39] migration: Add state records for migration incoming zhanghailiang
2015-12-29  7:09 ` [Qemu-devel] [PATCH COLO-Frame v13 06/39] migration: Integrate COLO checkpoint process into migration zhanghailiang
2015-12-29  7:09 ` [Qemu-devel] [PATCH COLO-Frame v13 07/39] migration: Integrate COLO checkpoint process into loadvm zhanghailiang
2015-12-29  7:09 ` [Qemu-devel] [PATCH COLO-Frame v13 08/39] migration: Rename the'file' member of MigrationState zhanghailiang
2015-12-29  7:09 ` [Qemu-devel] [PATCH COLO-Frame v13 09/39] COLO/migration: Create a new communication path from destination to source zhanghailiang
2015-12-29  7:09 ` [Qemu-devel] [PATCH COLO-Frame v13 10/39] COLO: Implement colo checkpoint protocol zhanghailiang
2016-01-29 13:08   ` Dr. David Alan Gilbert
2016-01-30  8:51     ` Hailiang Zhang
2015-12-29  7:09 ` [Qemu-devel] [PATCH COLO-Frame v13 11/39] COLO: Add a new RunState RUN_STATE_COLO zhanghailiang
2015-12-29  7:09 ` [Qemu-devel] [PATCH COLO-Frame v13 12/39] QEMUSizedBuffer: Introduce two help functions for qsb zhanghailiang
2015-12-29  7:09 ` [Qemu-devel] [PATCH COLO-Frame v13 13/39] COLO: Save PVM state to secondary side when do checkpoint zhanghailiang
2015-12-29  7:09 ` [Qemu-devel] [PATCH COLO-Frame v13 14/39] ram: Split host_from_stream_offset() into two helper functions zhanghailiang
2015-12-29  7:09 ` [Qemu-devel] [PATCH COLO-Frame v13 15/39] COLO: Load PVM's dirty pages into SVM's RAM cache temporarily zhanghailiang
2015-12-29  7:09 ` [Qemu-devel] [PATCH COLO-Frame v13 16/39] ram/COLO: Record the dirty pages that SVM received zhanghailiang
2015-12-29  7:09 ` [Qemu-devel] [PATCH COLO-Frame v13 17/39] COLO: Load VMState into qsb before restore it zhanghailiang
2016-01-04 19:00   ` Dr. David Alan Gilbert
2016-01-11  1:16     ` Hailiang Zhang
2015-12-29  7:09 ` [Qemu-devel] [PATCH COLO-Frame v13 18/39] COLO: Flush PVM's cached RAM into SVM's memory zhanghailiang
2015-12-29  7:09 ` [Qemu-devel] [PATCH COLO-Frame v13 19/39] COLO: Add checkpoint-delay parameter for migrate-set-parameters zhanghailiang
2015-12-29  7:09 ` [Qemu-devel] [PATCH COLO-Frame v13 20/39] COLO: synchronize PVM's state to SVM periodically zhanghailiang
2015-12-29  7:09 ` [Qemu-devel] [PATCH COLO-Frame v13 21/39] COLO failover: Introduce a new command to trigger a failover zhanghailiang
2015-12-29  7:09 ` [Qemu-devel] [PATCH COLO-Frame v13 22/39] COLO failover: Introduce state to record failover process zhanghailiang
2015-12-29  7:09 ` [Qemu-devel] [PATCH COLO-Frame v13 23/39] COLO: Implement failover work for Primary VM zhanghailiang
2015-12-29  7:09 ` [Qemu-devel] [PATCH COLO-Frame v13 24/39] COLO: Implement failover work for Secondary VM zhanghailiang
2015-12-29  7:09 ` [Qemu-devel] [PATCH COLO-Frame v13 25/39] qmp event: Add COLO_EXIT event to notify users while exited from COLO zhanghailiang
2015-12-29  7:09 ` [Qemu-devel] [PATCH COLO-Frame v13 26/39] COLO failover: Shutdown related socket fd when do failover zhanghailiang
2015-12-29  7:09 ` [Qemu-devel] [PATCH COLO-Frame v13 27/39] COLO failover: Don't do failover during loading VM's state zhanghailiang
2015-12-29  7:09 ` [Qemu-devel] [PATCH COLO-Frame v13 28/39] COLO: Process shutdown command for VM in COLO state zhanghailiang
2016-01-26 19:55   ` Dr. David Alan Gilbert
2016-01-27  9:54     ` Hailiang Zhang
2015-12-29  7:09 ` [Qemu-devel] [PATCH COLO-Frame v13 29/39] COLO: Update the global runstate after going into colo state zhanghailiang
2015-12-29  7:09 ` [Qemu-devel] [PATCH COLO-Frame v13 30/39] savevm: Split load vm state function qemu_loadvm_state zhanghailiang
2015-12-29  7:09 ` [Qemu-devel] [PATCH COLO-Frame v13 31/39] savevm: Introduce two helper functions for save/find loadvm_handlers entry zhanghailiang
2016-01-26 19:59   ` Dr. David Alan Gilbert
2015-12-29  7:09 ` [Qemu-devel] [PATCH COLO-Frame v13 32/39] COLO: Separate the process of saving/loading ram and device state zhanghailiang
2016-01-27 14:14   ` Dr. David Alan Gilbert
2016-01-30 10:23     ` Hailiang Zhang
2015-12-29  7:09 ` [Qemu-devel] [PATCH COLO-Frame v13 33/39] COLO: Split qemu_savevm_state_begin out of checkpoint process zhanghailiang
2015-12-29  7:09 ` [Qemu-devel] [PATCH COLO-Frame v13 34/39] net/filter-buffer: Add default filter-buffer for each netdev zhanghailiang
2016-01-11  1:26   ` Hailiang Zhang
2016-01-19  1:46     ` Hailiang Zhang
2016-01-19  3:19   ` Jason Wang
2016-01-19  8:39     ` Hailiang Zhang
2016-01-20  2:39       ` Jason Wang
2016-01-20  7:14         ` Hailiang Zhang
2016-01-20  9:15           ` Jason Wang
2016-01-20  9:27             ` Hailiang Zhang
2015-12-29  7:09 ` [Qemu-devel] [PATCH COLO-Frame v13 35/39] filter-buffer: Accept zero interval zhanghailiang
2016-01-19  3:21   ` Jason Wang
2016-01-19  8:40     ` Hailiang Zhang
2015-12-29  7:09 ` [Qemu-devel] [PATCH COLO-Frame v13 36/39] filter-buffer: Introduce a helper function to enable/disable default filter zhanghailiang
2016-01-19  3:35   ` Jason Wang
2016-01-19  8:44     ` Hailiang Zhang
2015-12-29  7:09 ` [Qemu-devel] [PATCH COLO-Frame v13 37/39] filter-buffer: Introduce a helper function to release packets zhanghailiang
2015-12-29  7:09 ` [Qemu-devel] [PATCH COLO-Frame v13 38/39] colo: Use default buffer-filter to buffer and " zhanghailiang
2016-01-19  3:59   ` Jason Wang
2015-12-29  7:09 ` [Qemu-devel] [PATCH COLO-Frame v13 39/39] COLO: Add block replication into colo process zhanghailiang
2015-12-29  7:14 ` [Qemu-devel] [PATCH COLO-Frame v13 00/39] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) Hailiang Zhang
