From: Andrew Cooper
Subject: Re: [RFC Patch v2 00/16] COarse-grain LOck-stepping Virtual Machines for Non-stop Service
Date: Thu, 11 Jul 2013 10:37:55 +0100
Message-ID: <51DE7CF3.7050609@citrix.com>
References: <1373531748-12547-1-git-send-email-wency@cn.fujitsu.com>
In-Reply-To: <1373531748-12547-1-git-send-email-wency@cn.fujitsu.com>
To: Wen Congyang
Cc: Lai Jiangshan, Jiang Yunhong, Dong Eddie, Ye Wei, xen-devl, Hong Tao, Xu Yao, Shriram Rajagopalan
List-Id: xen-devel@lists.xenproject.org

On 11/07/13 09:35, Wen Congyang wrote:
> Virtual machine (VM) replication is a well-known technique for providing
> application-agnostic, software-implemented hardware fault tolerance, or
> "non-stop service". Currently, Remus provides this function, but it buffers
> all output packets, and the resulting latency is unacceptable.
>
> At Xen Summit 2012, we introduced a new VM replication solution: COLO
> (COarse-grain LOck-stepping virtual machine). The presentation is at
> the following URL:
> http://www.slideshare.net/xen_com_mgr/colo-coarsegrain-lockstepping-virtual-machines-for-nonstop-service
>
> Here is a summary of the solution:
> From the client's point of view, as long as the client observes identical
> responses from the primary and secondary VMs, according to the service
> semantics, the secondary VM (SVM) is a valid replica of the primary
> VM (PVM) and can successfully take over when a hardware failure of the
> PVM is detected.

How set in stone are you about the terms PVM and SVM?  SVM already has a
specific meaning in Xen: AMD's Secure Virtual Machine extensions, which
allow for HVM guests.  As a lesser problem, PVM is sometimes used to mean
PV, as a mirror of HVM.

~Andrew

> This patchset is an RFC and implements the framework of COLO:
> 1. Both the PVM and the SVM are running.
> 2. A checkpoint is taken only when the output packets from the PVM and SVM differ.
> 3. Write requests from the SVM are cached.
>
> ChangeLog from v1 to v2:
> 1. Update block-remus to support COLO.
> 2. Split the large patch into smaller ones.
> 3. Fix some bugs.
> 4. Add a new hypercall for COLO.
>
> Changelog:
> Patch 1: optimize the dirty page transfer speed.
> Patch 2-3: allow the SVM to keep running after a checkpoint.
> Patch 4-5: modifications for COLO on the master side (wait for a new
> checkpoint, communicate with the slave when doing a checkpoint).
> Patch 6-7: implement COLO's user interface.
>
>
> Wen Congyang (16):
>   xen: introduce new hypercall to reset vcpu
>   block-remus: introduce colo mode
>   block-remus: introduce a interface to allow the user specify which
>     mode the backup end uses
>   dominfo.completeRestore() will be called more than once in colo mode
>   xc_domain_restore: introduce restore_callbacks for colo
>   colo: implement restore_callbacks init()/free()
>   colo: implement restore_callbacks get_page()
>   colo: implement restore_callbacks flush_memory
>   colo: implement restore_callbacks update_p2m()
>   colo: implement restore_callbacks finish_restore()
>   xc_restore: implement for colo
>   XendCheckpoint: implement colo
>   xc_domain_save: flush cache before calling callbacks->postcopy()
>   add callback to configure network for colo
>   xc_domain_save: implement save_callbacks for colo
>   remus: implement colo mode
>
>  tools/blktap2/drivers/block-remus.c               | 188 ++++-
>  tools/libxc/Makefile                              |   8 +-
>  tools/libxc/xc_domain_restore.c                   | 264 ++++--
>  tools/libxc/xc_domain_restore_colo.c              | 939 +++++++++++++++++++++
>  tools/libxc/xc_domain_save.c                      |  23 +-
>  tools/libxc/xc_save_restore_colo.h                |  14 +
>  tools/libxc/xenguest.h                            |  51 ++
>  tools/libxl/Makefile                              |   2 +-
>  tools/python/xen/lowlevel/checkpoint/checkpoint.c | 322 +++++++-
>  tools/python/xen/lowlevel/checkpoint/checkpoint.h |   1 +
>  tools/python/xen/remus/device.py                  |   8 +
>  tools/python/xen/remus/image.py                   |   8 +-
>  tools/python/xen/remus/save.py                    |  13 +-
>  tools/python/xen/xend/XendCheckpoint.py           | 127 ++-
>  tools/python/xen/xend/XendDomainInfo.py           |  13 +-
>  tools/remus/remus                                 |  28 +-
>  tools/xcutils/Makefile                            |   4 +-
>  tools/xcutils/xc_restore.c                        |  36 +-
>  xen/arch/x86/domain.c                             |  57 ++
>  xen/arch/x86/x86_64/entry.S                       |   4 +
>  xen/include/public/xen.h                          |   1 +
>  21 files changed, 1947 insertions(+), 164 deletions(-)
>  create mode 100644 tools/libxc/xc_domain_restore_colo.c
>  create mode 100644 tools/libxc/xc_save_restore_colo.h
>
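The core COLO rule in the quoted cover letter (keep both the PVM and the SVM running, and checkpoint only when their output packets differ) can be sketched in C as below. This is a minimal illustration, not code from the patchset: `struct colo_packet`, `colo_packets_match()` and `colo_need_checkpoint()` are hypothetical names, and the sketch assumes packets are compared pairwise, in order, by strict byte equality, whereas the cover letter describes comparison "according to the service semantics".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical representation of one outgoing packet captured from a VM. */
struct colo_packet {
    const unsigned char *data;
    size_t len;
};

/* Assumption: two responses are "identical" only if byte-for-byte equal.
 * The cover letter compares according to service semantics instead. */
static bool colo_packets_match(const struct colo_packet *pvm,
                               const struct colo_packet *svm)
{
    assert(pvm != NULL && svm != NULL);
    if (pvm->len != svm->len)
        return false;
    /* Skip memcmp for empty packets so NULL data pointers are never passed. */
    return pvm->len == 0 || memcmp(pvm->data, svm->data, pvm->len) == 0;
}

/* Returns true if a checkpoint is needed: the VMs emitted a different number
 * of packets, or any pair of corresponding packets differs. */
static bool colo_need_checkpoint(const struct colo_packet *pvm_out, size_t pvm_count,
                                 const struct colo_packet *svm_out, size_t svm_count)
{
    if (pvm_count != svm_count)
        return true;
    for (size_t i = 0; i < pvm_count; i++) {
        if (!colo_packets_match(&pvm_out[i], &svm_out[i]))
            return true;
    }
    return false; /* Outputs agree: both VMs keep running, no checkpoint. */
}
```

Strict byte equality is the most conservative choice: a looser, semantics-aware comparison would treat more packet pairs as matching and therefore trigger fewer checkpoints.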