From: Hailiang Zhang <zhang.zhanghailiang@huawei.com>
To: qemu-devel@nongnu.org
Cc: xiecl.fnst@cn.fujitsu.com, lizhijian@cn.fujitsu.com,
	quintela@redhat.com, Markus Armbruster <armbru@redhat.com>,
	yunhong.jiang@intel.com, eddie.dong@intel.com,
	peter.huangpeng@huawei.com, dgilbert@redhat.com,
	arei.gonglei@huawei.com, stefanha@redhat.com,
	amit.shah@redhat.com, zhangchen.fnst@cn.fujitsu.com,
	hongyang.yang@easystack.cn
Subject: Re: [Qemu-devel] [PATCH COLO-Frame v13 00/39] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
Date: Tue, 29 Dec 2015 15:14:34 +0800
Message-ID: <568232DA.10100@huawei.com>
In-Reply-To: <1451372975-5048-1-git-send-email-zhang.zhanghailiang@huawei.com>

Cc: Markus Armbruster <armbru@redhat.com>

On 2015/12/29 15:08, zhanghailiang wrote:
> This is the 13th version of COLO (it still only supports periodic checkpoints).
>
> This series contains only the COLO frame part; you can get the complete code from GitHub:
> https://github.com/coloft/qemu/commits/colo-v2.4-periodic-mode
>
> Please ignore patches 4 and 5, which Dave has picked up into another series.
>
> Test procedure:
> 1. Start QEMU
> Primary side:
> #x86_64-softmmu/qemu-system-x86_64 -enable-kvm -boot c -m 2048 -smp 2 -qmp stdio -vnc :7 -name primary -cpu qemu64,+kvmclock -device piix3-usb-uhci -device usb-tablet -netdev tap,id=hn0,vhost=off -device virtio-net-pci,id=net-pci0,netdev=hn0 -drive if=virtio,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0.file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,children.0.driver=raw
> Secondary side:
> #x86_64-softmmu/qemu-system-x86_64 -boot c -m 2048 -smp 2 -qmp stdio -vnc :7 -name secondary -enable-kvm -cpu qemu64,+kvmclock -device piix3-usb-uhci -device usb-tablet -netdev tap,id=hn0,vhost=off -device virtio-net-pci,id=net-pci0,netdev=hn0 -drive if=none,id=colo-disk0,file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,driver=raw,node-name=node0 -drive if=virtio,id=active-disk0,throttling.bps-total=70000000,driver=replication,mode=secondary,file.driver=qcow2,file.file.filename=/mnt/ramfs/active_disk.img,file.backing.driver=qcow2,file.backing.file.filename=/mnt/ramfs/hidden_disk.img,file.backing.backing=colo-disk0 -incoming tcp:0:8888
> 2. On the Secondary VM's QEMU monitor, issue the following commands:
> {'execute':'qmp_capabilities'}
> {'execute': 'nbd-server-start', 'arguments': {'addr': {'type': 'inet', 'data': {'host': '192.168.2.88', 'port': '8889'} } } }
> {'execute': 'nbd-server-add', 'arguments': {'device': 'colo-disk0', 'writable': true } }
> {'execute': 'trace-event-set-state', 'arguments': {'name': 'colo*', 'enable': true} }
>
> 3. On the Primary VM's QEMU monitor, issue the following commands:
> {'execute':'qmp_capabilities'}
> {'execute': 'human-monitor-command', 'arguments': {'command-line': 'drive_add buddy driver=replication,mode=primary,file.driver=nbd,file.host=192.168.2.88,file.port=8889,file.export=colo-disk0,node-name=node0,if=none'}}
> {'execute':'x-blockdev-change', 'arguments':{'parent': 'colo-disk0', 'node': 'node0' } }
> {'execute': 'migrate-set-capabilities', 'arguments': {'capabilities': [ {'capability': 'x-colo', 'state': true } ] } }
> {'execute': 'migrate', 'arguments': {'uri': 'tcp:192.168.2.88:8888' } }
>
> 4. After the above steps, any changes you make to the PVM will be synced to the SVM.
> You can change the checkpoint period by issuing the command
> '{ "execute": "migrate-set-parameters" , "arguments":{ "x-checkpoint-delay": 2000 } }'.
>
> 5. Failover test
> You can kill the Primary VM and run 'x_colo_lost_heartbeat' in the Secondary VM's
> monitor at the same time; the SVM will then fail over and the client will not
> notice the change.
>
> Before issuing the '{ "execute": "x-colo-lost-heartbeat" }' command, we have to
> issue block-related commands to stop block replication.
> Primary:
>    Remove the nbd child from the quorum:
>    { 'execute': 'x-blockdev-change', 'arguments': {'parent': 'colo-disk0', 'child': 'children.1'}}
>    Note: there is currently no QMP command to remove the blockdev.
>
> Secondary:
>    The primary host is down, so we should stop the NBD server:
>    { 'execute': 'nbd-server-stop' }
>
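Step 5 depends on ordering: on the Secondary, block replication (the NBD server) is stopped before failover is triggered. A minimal C sketch of scripting that order follows. `qmp_send_sequence()` and `colo_secondary_failover` are hypothetical names; the sketch assumes `out` is a stream on a QMP monitor that has already completed `qmp_capabilities` negotiation (as in step 2), and for brevity it only writes commands without reading the QMP replies.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical list of the Secondary-side commands from step 5, in the
 * required order: stop the NBD server before triggering failover. */
static const char *const colo_secondary_failover[] = {
    "{ 'execute': 'nbd-server-stop' }",
    "{ 'execute': 'x-colo-lost-heartbeat' }",
};

/* Write each command on its own line to out (for example, a FILE *
 * wrapping an already-negotiated QMP socket) and flush.
 * Returns the number of commands written, or -1 on error.
 * Simplification: a real client would read each QMP reply before
 * sending the next command. */
static int qmp_send_sequence(FILE *out, const char *const *cmds, size_t n)
{
    assert(out != NULL && (cmds != NULL || n == 0));
    for (size_t i = 0; i < n; i++) {
        if (strchr(cmds[i], '\n') != NULL) {
            return -1;              /* keep one command per line */
        }
        if (fprintf(out, "%s\n", cmds[i]) < 0) {
            return -1;
        }
    }
    return fflush(out) == 0 ? (int)n : -1;
}
```

Keeping the sequence in a single array makes the ordering constraint explicit; the Primary-side 'x-blockdev-change' command would go to the Primary's own monitor separately.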
> Please review, thanks.
>
> TODO:
> 1. Proxy-based checkpointing in QEMU
> 2. Continuous FT capability
> 3. Reduce the VM's downtime during checkpoints
>
> v13:
>   - Refactor the colo_*_cmd helper functions to use an 'Error **errp' parameter
>     instead of the return value to indicate success or failure. (patch 10)
>   - Remove the optional error message from the COLO_EXIT event. (patch 25)
>   - Use a semaphore to notify the colo/colo incoming loop that failover work
>     has finished. (patch 26)
>   - Move COLO shutdown-related code into colo.c. (patch 28)
>   - Fix a memory leak in the colo incoming loop. (new patch 31)
>   - Reuse some existing helper functions to implement saving/loading of
>     RAM and device state. (patch 32)
>   - Address some other comments from Dave and Markus.
>
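The first v13 item moves error reporting from return values to QEMU's 'Error **errp' convention: the callee fills in an error object only if the caller passed somewhere to store it, and the caller checks a local error pointer. The self-contained sketch below illustrates the pattern; `Error`, `error_set_sketch()`, `colo_put_cmd_sketch()` and `colo_do_checkpoint_sketch()` are simplified, hypothetical stand-ins, not the real definitions in include/qapi/error.h or migration/colo.c.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, minimal stand-in for QEMU's Error object. */
typedef struct Error {
    char msg[128];
} Error;

/* Stand-in for error_setg(): only record an error if the caller
 * passed a non-NULL errp; never overwrite an existing error. */
static void error_set_sketch(Error **errp, const char *fmt, ...)
{
    if (!errp) {
        return;
    }
    assert(*errp == NULL);
    Error *err = calloc(1, sizeof(*err));
    if (!err) {
        abort();
    }
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(err->msg, sizeof(err->msg), fmt, ap);
    va_end(ap);
    *errp = err;
}

/* Hypothetical command IDs, for illustration only. */
enum { COLO_CMD_CHECKPOINT_REQUEST = 1, COLO_CMD_MAX = 8 };

/* Send one command; failures are reported through errp rather than
 * through a negative return value. link_up simulates the transport. */
static void colo_put_cmd_sketch(int cmd, int link_up, Error **errp)
{
    if (cmd <= 0 || cmd >= COLO_CMD_MAX) {
        error_set_sketch(errp, "invalid COLO command %d", cmd);
        return;
    }
    if (!link_up) {
        error_set_sketch(errp, "failed to send COLO command %d", cmd);
        return;
    }
    /* ... write cmd to the migration stream here ... */
}

/* Caller side: collect the error locally, report it, then free it. */
static int colo_do_checkpoint_sketch(int link_up)
{
    Error *local_err = NULL;

    colo_put_cmd_sketch(COLO_CMD_CHECKPOINT_REQUEST, link_up, &local_err);
    if (local_err) {
        fprintf(stderr, "checkpoint failed: %s\n", local_err->msg);
        free(local_err);
        return -1;
    }
    return 0;
}
```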
> v12:
>   - Fix a bug where the default buffer filter broke vhost-net.
>   - Add a flag in struct NetFilterState to help skip the default
>     filter for packets travelling through the filter layer.
>   - Remove the default failover handling, which may cause split-brain.
>   - Rename checkpoint-delay to x-checkpoint-delay.
>   - Check that all netdevs support the default filter before going into COLO.
>   - Restructure the send/receive helper functions in patch 10.
>   - Address several other comments from Dave.
>
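The v12 items about vhost-net and the pre-COLO netdev check can be pictured as a scan over the configured netdevs that refuses to enter COLO if any of them cannot carry the default buffer filter. This is a hedged sketch: `NetdevInfo` and `colo_netdevs_support_default_filter()` are hypothetical, and it assumes a vhost-enabled netdev is the case that cannot be filtered (the test procedure above does run with vhost=off).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a netdev for this sketch. */
typedef struct NetdevInfo {
    const char *id;
    bool vhost;     /* assumption: vhost bypasses QEMU's filter layer */
} NetdevInfo;

/* Return true only if every netdev can carry the default buffer filter.
 * On failure, store the id of the first offending netdev in *bad_id
 * (if bad_id is non-NULL) and return false. */
static bool colo_netdevs_support_default_filter(const NetdevInfo *nd, size_t n,
                                                const char **bad_id)
{
    assert(nd != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (nd[i].vhost) {
            if (bad_id) {
                *bad_id = nd[i].id;
            }
            return false;
        }
    }
    return true;
}
```

Failing early, before any checkpoint is taken, avoids entering COLO with a netdev whose output could not be buffered.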
> v11:
>   - Re-implement packet buffering/releasing based on filter-buffer, following
>     Jason Wang's suggestion. (patch 34, patch 36 ~ patch 38)
>   - Rebase onto master to reuse some infrastructure introduced by post-copy.
>   - Address several comments from Eric and Dave; the change log can
>     be found in each patch.
>
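The filter-buffer based buffering in v11 follows, roughly, the output-commit idea used by checkpointing FT schemes: packets the PVM sends are held back and released only once the checkpoint that covers them has completed. The sketch below shows that hold/release cycle with a plain FIFO. `PacketBuffer`, `pbuf_hold()`, `pbuf_release_all()` and `deliver_count_bytes()` are hypothetical names, and a linked list stands in for whatever queue net/filter-buffer.c actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical held packet. */
typedef struct Packet {
    struct Packet *next;
    size_t len;
    unsigned char data[];
} Packet;

typedef struct PacketBuffer {
    Packet *head;
    Packet **tail;   /* points at the last node's next field, or at head */
    size_t count;
} PacketBuffer;

static void pbuf_init(PacketBuffer *b)
{
    b->head = NULL;
    b->tail = &b->head;
    b->count = 0;
}

/* Hold an outgoing PVM packet until the next checkpoint completes. */
static int pbuf_hold(PacketBuffer *b, const void *data, size_t len)
{
    assert(b != NULL && (data != NULL || len == 0));
    Packet *p = malloc(sizeof(*p) + len);
    if (!p) {
        return -1;
    }
    p->next = NULL;
    p->len = len;
    if (len > 0) {
        memcpy(p->data, data, len);
    }
    *b->tail = p;
    b->tail = &p->next;
    b->count++;
    return 0;
}

/* After a checkpoint, release held packets in FIFO order through
 * deliver(), free them, and return how many were released. */
static size_t pbuf_release_all(PacketBuffer *b,
                               void (*deliver)(const unsigned char *, size_t, void *),
                               void *opaque)
{
    size_t n = 0;
    Packet *p = b->head;

    while (p) {
        Packet *next = p->next;
        deliver(p->data, p->len, opaque);
        free(p);
        p = next;
        n++;
    }
    pbuf_init(b);
    return n;
}

/* Example deliver callback: sums released bytes into *(size_t *)opaque. */
static void deliver_count_bytes(const unsigned char *data, size_t len, void *opaque)
{
    (void)data;
    *(size_t *)opaque += len;
}
```

With a zero interval (patch 35), the filter presumably never flushes on its own timer, which leaves the release step to the COLO checkpoint code.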
> v10:
>   - Rename 'colo_lost_heartbeat' command to experimental 'x_colo_lost_heartbeat'
>   - Rename migration capability 'colo' to 'x-colo' (Eric's suggestion)
>   - Simplify the primary side by dropping the colo thread and reusing the
>     migration thread. (Dave's suggestion)
>   - Add several netfilter-related APIs to support buffering/releasing packets
>     for COLO (patch 32 ~ patch 36)
>
> zhanghailiang (39):
>    configure: Add parameter for configure to enable/disable COLO support
>    migration: Introduce capability 'x-colo' to migration
>    COLO: migrate colo related info to secondary node
>    migration: Export migrate_set_state()
>    migration: Add state records for migration incoming
>    migration: Integrate COLO checkpoint process into migration
>    migration: Integrate COLO checkpoint process into loadvm
>    migration: Rename the'file' member of MigrationState
>    COLO/migration: Create a new communication path from destination to
>      source
>    COLO: Implement colo checkpoint protocol
>    COLO: Add a new RunState RUN_STATE_COLO
>    QEMUSizedBuffer: Introduce two help functions for qsb
>    COLO: Save PVM state to secondary side when do checkpoint
>    ram: Split host_from_stream_offset() into two helper functions
>    COLO: Load PVM's dirty pages into SVM's RAM cache temporarily
>    ram/COLO: Record the dirty pages that SVM received
>    COLO: Load VMState into qsb before restore it
>    COLO: Flush PVM's cached RAM into SVM's memory
>    COLO: Add checkpoint-delay parameter for migrate-set-parameters
>    COLO: synchronize PVM's state to SVM periodically
>    COLO failover: Introduce a new command to trigger a failover
>    COLO failover: Introduce state to record failover process
>    COLO: Implement failover work for Primary VM
>    COLO: Implement failover work for Secondary VM
>    qmp event: Add COLO_EXIT event to notify users while exited from COLO
>    COLO failover: Shutdown related socket fd when do failover
>    COLO failover: Don't do failover during loading VM's state
>    COLO: Process shutdown command for VM in COLO state
>    COLO: Update the global runstate after going into colo state
>    savevm: Split load vm state function qemu_loadvm_state
>    savevm: Introduce two helper functions for save/find loadvm_handlers
>      entry
>    COLO: Separate the process of saving/loading ram and device state
>    COLO: Split qemu_savevm_state_begin out of checkpoint process
>    net/filter-buffer: Add default filter-buffer for each netdev
>    filter-buffer: Accept zero interval
>    filter-buffer: Introduce a helper function to enable/disable default
>      filter
>    filter-buffer: Introduce a helper function to release packets
>    colo: Use default buffer-filter to buffer and release packets
>    COLO: Add block replication into colo process
>
>   configure                     |  11 +
>   docs/qmp-events.txt           |  16 +
>   hmp-commands.hx               |  15 +
>   hmp.c                         |  15 +
>   hmp.h                         |   1 +
>   include/exec/ram_addr.h       |   9 +-
>   include/migration/colo.h      |  40 +++
>   include/migration/failover.h  |  33 ++
>   include/migration/migration.h |  21 +-
>   include/migration/qemu-file.h |   3 +-
>   include/net/filter.h          |  12 +
>   include/net/net.h             |   5 +
>   include/sysemu/sysemu.h       |   9 +
>   migration/Makefile.objs       |   2 +
>   migration/colo-comm.c         |  71 ++++
>   migration/colo-failover.c     |  83 +++++
>   migration/colo.c              | 804 ++++++++++++++++++++++++++++++++++++++++++
>   migration/exec.c              |   4 +-
>   migration/fd.c                |   4 +-
>   migration/migration.c         | 216 ++++++++----
>   migration/postcopy-ram.c      |   6 +-
>   migration/qemu-file-buf.c     |  61 ++++
>   migration/ram.c               | 213 +++++++++--
>   migration/rdma.c              |   2 +-
>   migration/savevm.c            | 248 +++++++++----
>   migration/tcp.c               |   4 +-
>   migration/unix.c              |   4 +-
>   net/filter-buffer.c           | 127 ++++++-
>   net/filter.c                  |   6 +-
>   net/net.c                     |  58 +++
>   qapi-schema.json              | 104 +++++-
>   qapi/event.json               |  15 +
>   qmp-commands.hx               |  24 +-
>   stubs/Makefile.objs           |   1 +
>   stubs/migration-colo.c        |  45 +++
>   trace-events                  |  10 +
>   vl.c                          |  30 +-
>   37 files changed, 2135 insertions(+), 197 deletions(-)
>   create mode 100644 include/migration/colo.h
>   create mode 100644 include/migration/failover.h
>   create mode 100644 migration/colo-comm.c
>   create mode 100644 migration/colo-failover.c
>   create mode 100644 migration/colo.c
>   create mode 100644 stubs/migration-colo.c
>
