From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Cooper
Subject: Re: [PATCH 23/27] tools/libxl: [RFC] Write checkpoint records into the stream
Date: Wed, 17 Jun 2015 10:55:04 +0100
Message-ID: <558143F8.6040102@citrix.com>
References: <1434375880-30914-1-git-send-email-andrew.cooper3@citrix.com> <1434375880-30914-24-git-send-email-andrew.cooper3@citrix.com> <1434466981.13744.212.camel@citrix.com> <5580466B.609@citrix.com> <1434526208.3342.120.camel@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <1434526208.3342.120.camel@citrix.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Ian Campbell
Cc: Wei Liu, Yang Hongyang, Ian Jackson, Xen-devel
List-Id: xen-devel@lists.xenproject.org

On 17/06/15 08:30, Ian Campbell wrote:
> On Tue, 2015-06-16 at 16:53 +0100, Andrew Cooper wrote:
>> On 16/06/15 16:03, Ian Campbell wrote:
>>> On Mon, 2015-06-15 at 14:44 +0100, Andrew Cooper wrote:
>>>> when signalled to do so by libxl__remus_domain_checkpoint_callback()
>>> I think I saw that Remus wasn't currently working, so I'll let you and
>>> Hongyang thrash something out before I spend too much effort reviewing
>>> these last few RFC bits. Unless you think it is worth my having a look
>>> now?
>>>
>>>
>> Remus was broken by patch 19 in the series, and this patch forms part of
>> fixing it again.
>>
>> I can't find a way of fixing the layering violation in both plain
>> migration and Remus, in a readable, bisectable way.
>>
>> Remus requires identical source and destination toolstacks, and the
>> Remus maintainers are happy enough with the "break it and fix it up in
>> the same series" approach.
>>
>> Now that the series is complete, there is some shuffling room to reduce
>> the window of breakage, but short of folding patches 19, 21, 23-25
>> together, Remus will break.
> The report I was referring to thinking I'd seen was that Remus was still
> broken even after the complete series was applied i.e. there was still
> more to be done.

That is because I was blind-coding Remus support without an ability to test.

>
> I'm happy with the transient breakage in this series on this occasion,
> but I was proposing not to review until Remus was thought to be working
> OK at the end.

It is mostly fixed now. I just need to fix the failover handling, and have
instructions on how to do so.

~Andrew