From mboxrd@z Thu Jan 1 00:00:00 1970 From: Yang Hongyang Subject: Re: [PATCH Remus v5 1/2] libxc/save: implement Remus checkpointed save Date: Fri, 15 May 2015 10:12:17 +0800 Message-ID: <55555601.90409@cn.fujitsu.com> References: <1431597974-15624-1-git-send-email-yanghy@cn.fujitsu.com> <1431597974-15624-2-git-send-email-yanghy@cn.fujitsu.com> <1431607637.13579.73.camel@citrix.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1431607637.13579.73.camel@citrix.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Ian Campbell Cc: wei.liu2@citrix.com, eddie.dong@intel.com, wency@cn.fujitsu.com, andrew.cooper3@citrix.com, yunhong.jiang@intel.com, ian.jackson@eu.citrix.com, xen-devel@lists.xen.org, guijianfeng@cn.fujitsu.com, rshriram@cs.ubc.ca List-Id: xen-devel@lists.xenproject.org On 05/14/2015 08:47 PM, Ian Campbell wrote: > On Thu, 2015-05-14 at 18:06 +0800, Yang Hongyang wrote: >> With Remus, the save flow should be: >> live migration->{ periodically save(checkpointed save) } >> >> Signed-off-by: Yang Hongyang >> Reviewed-by: Andrew Cooper >> CC: Ian Campbell >> CC: Ian Jackson >> CC: Wei Liu >> --- >> tools/libxc/xc_sr_save.c | 80 ++++++++++++++++++++++++++++++++++++------------ >> 1 file changed, 61 insertions(+), 19 deletions(-) >> >> diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c >> index 1d0a46d..1c5d199 100644 >> --- a/tools/libxc/xc_sr_save.c >> +++ b/tools/libxc/xc_sr_save.c >> @@ -57,6 +57,16 @@ static int write_end_record(struct xc_sr_context *ctx) >> } >> >> /* >> + * Writes an CHECKPOINT record into the stream. > > "a CHECKPOINT" > >> + */ >> +static int write_checkpoint_record(struct xc_sr_context *ctx) >> +{ >> + struct xc_sr_record checkpoint = { REC_TYPE_CHECKPOINT, 0, NULL }; >> + >> + return write_record(ctx, &checkpoint); >> +} >> + >> +/* >> * Writes a batch of memory as a PAGE_DATA record into the stream. The batch >> * is constructed in ctx->save.batch_pfns. >> * >> @@ -467,6 +477,14 @@ static int send_domain_memory_live(struct xc_sr_context *ctx) >> DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap, >> &ctx->save.dirty_bitmap_hbuf); >> >> + /* >> + * With Remus, we will enter checkpointed save after live migration. >> + * In checkpointed save loop, we skip the live part and pause straight >> + * away to send dirty pages between checkpoints. >> + */ >> + if ( !ctx->save.live ) >> + goto last_iter; > > Rather than use goto would it work to refactor everything from here to > the label into some sort of helper and just call that in the "actually > live" case? > > Or perhaps everything from the label to the end should be a helper > function which the caller can also use in thecheckpoint case instead of > calling send_domain_memory_live (and which s_d_m_l also calls of > course). I'm going to refactor the send_domain_memory_live() as follows: split the send_domain_memory_live() into three helper function: - send_memory_live() do the actually live case - suspend_and_send_dirty() suspend the guest and send dirty pages - send_memory_verify() then: - send_domain_memory_live() combination of those three helper functions - send_domain_momory_checkpointed() calls suspend_and_send_dirty() and send_memory_verify() - send_domain_memory_nonlive() stay as it is Does it make sense? > >> + if ( ctx->save.checkpointed ) >> + { >> + if ( ctx->save.live ) >> + { >> + /* End of live migration, we are sending checkpointed stream */ >> + ctx->save.live = false; > > I think I'm misunderstanding either the purpose of this code or the > comment (or both). > > Is it the case that a checkpoint starts with an iteration of live (to > transfer everything over) and then drops into sending periodical > non-live updates at each checkpoint? > > If so then I think a more useful comment would be: > > /* > * We have now completed the initial live portion of the > checkpoint > * process. Therefore switch into periodically sending synchronous > * batches of pages. > */ > This is much better, Thank you! > Personally I don't have a problem with just a direct assignment without > the if, since assigning false to an already flase value is a nop. Will drop the if. > >> + } >> + >> + rc = write_checkpoint_record(ctx); >> + if ( rc ) >> + goto err; >> + >> + ctx->save.callbacks->postcopy(ctx->save.callbacks->data); >> + >> + rc = ctx->save.callbacks->checkpoint(ctx->save.callbacks->data); >> + if ( rc > 0 ) >> + xc_report_progress_single(xch, "Checkpointed save"); >> + else >> + ctx->save.checkpointed = false; >> + } >> + } while ( ctx->save.checkpointed ); >> >> xc_report_progress_single(xch, "End of stream"); >> > > > . > -- Thanks, Yang.