[RFC Patch v2 00/16] COarse-grain LOck-stepping Virtual Machines for Non-stop Service

* [RFC Patch v2 00/16] COarse-grain LOck-stepping Virtual Machines for Non-stop Service
@ 2013-07-11  8:35 Wen Congyang
  2013-07-11  8:35 ` [RFC Patch v2 01/16] xen: introduce new hypercall to reset vcpu Wen Congyang
                   ` (17 more replies)
  0 siblings, 18 replies; 30+ messages in thread
From: Wen Congyang @ 2013-07-11  8:35 UTC (permalink / raw)
  To: Dong Eddie, Lai Jiangshan, xen-devl, Shriram Rajagopalan
  Cc: Jiang Yunhong, Wen Congyang, Ye Wei, Xu Yao, Hong Tao

Virtual machine (VM) replication is a well known technique for providing
application-agnostic software-implemented hardware fault tolerance -
"non-stop service". Currently, remus provides this function, but it buffers
all output packets, and the latency is unacceptable.

In xen summit 2012, We introduce a new VM replication solution: colo
(COarse-grain LOck-stepping virtual machine). The presentation is in
the following URL:
http://www.slideshare.net/xen_com_mgr/colo-coarsegrain-lockstepping-virtual-machines-for-nonstop-service

Here is the summary of the solution:
>From the client's point of view, as long as the client observes identical
responses from the primary and secondary VMs, according to the service
semantics, then the secondary VM(SVM) is a valid replica of the primary
VM(PVM), and can successfully take over when a hardware failure of the
PVM is detected.

This patchset is RFC, and implements the frame of colo:
1. Both PVM and SVM are running
2. do checkpoint only when the output packets from PVM and SVM are different
3. cache write requests from SVM

ChangeLog from v1 to v2:
1. update block-remus to support colo
2. split large patch to small one
3. fix some bugs
4. add a new hypercall for colo

Changelog:
  Patch 1: optimize the dirty pages transfer speed.
  Patch 2-3: allow SVM running after checkpoint
  Patch 4-5: modification for colo on the master side(wait a new checkpoint,
             communicate with slaver when doing checkoint)
  Patch 6-7: implement colo's user interface

Wen Congyang (16):
  xen: introduce new hypercall to reset vcpu
  block-remus: introduce colo mode
  block-remus: introduce a interface to allow the user specify which
    mode the backup end uses
  dominfo.completeRestore() will be called more than once in colo mode
  xc_domain_restore: introduce restore_callbacks for colo
  colo: implement restore_callbacks init()/free()
  colo: implement restore_callbacks get_page()
  colo: implement restore_callbacks flush_memory
  colo: implement restore_callbacks update_p2m()
  colo: implement restore_callbacks finish_restore()
  xc_restore: implement for colo
  XendCheckpoint: implement colo
  xc_domain_save: flush cache before calling callbacks->postcopy()
  add callback to configure network for colo
  xc_domain_save: implement save_callbacks for colo
  remus: implement colo mode

 tools/blktap2/drivers/block-remus.c               |  188 ++++-
 tools/libxc/Makefile                              |    8 +-
 tools/libxc/xc_domain_restore.c                   |  264 ++++--
 tools/libxc/xc_domain_restore_colo.c              |  939 +++++++++++++++++++++
 tools/libxc/xc_domain_save.c                      |   23 +-
 tools/libxc/xc_save_restore_colo.h                |   14 +
 tools/libxc/xenguest.h                            |   51 ++
 tools/libxl/Makefile                              |    2 +-
 tools/python/xen/lowlevel/checkpoint/checkpoint.c |  322 +++++++-
 tools/python/xen/lowlevel/checkpoint/checkpoint.h |    1 +
 tools/python/xen/remus/device.py                  |    8 +
 tools/python/xen/remus/image.py                   |    8 +-
 tools/python/xen/remus/save.py                    |   13 +-
 tools/python/xen/xend/XendCheckpoint.py           |  127 ++-
 tools/python/xen/xend/XendDomainInfo.py           |   13 +-
 tools/remus/remus                                 |   28 +-
 tools/xcutils/Makefile                            |    4 +-
 tools/xcutils/xc_restore.c                        |   36 +-
 xen/arch/x86/domain.c                             |   57 ++
 xen/arch/x86/x86_64/entry.S                       |    4 +
 xen/include/public/xen.h                          |    1 +
 21 files changed, 1947 insertions(+), 164 deletions(-)
 create mode 100644 tools/libxc/xc_domain_restore_colo.c
 create mode 100644 tools/libxc/xc_save_restore_colo.h

-- 
1.7.4

^ permalink raw reply	[flat|nested] 30+ messages in thread