* [PATCH Remus v3 0/3] Remus support for Migration-v2 @ 2015-05-13 8:34 Yang Hongyang 2015-05-13 8:34 ` [PATCH Remus v3 1/3] libxc/save: implement Remus checkpointed save Yang Hongyang ` (2 more replies) 0 siblings, 3 replies; 13+ messages in thread From: Yang Hongyang @ 2015-05-13 8:34 UTC (permalink / raw) To: xen-devel Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang, eddie.dong, guijianfeng, rshriram, ian.jackson This patchset implement the Remus support for Migration v2 but without memory compressing. Git tree available at: https://github.com/macrosheep/xen/tree/Remus-newmig-v3 This patchset is based on [PATCH v5 00/14] Misc patches to aid migration v2 Remus support https://github.com/macrosheep/xen/tree/misc-remus-v5 Yang Hongyang (3): libxc/save: implement Remus checkpointed save libxc/restore: add checkpointed flag to the restore context libxc/restore: implement Remus checkpointed restore tools/libxc/xc_sr_common.h | 16 ++++++ tools/libxc/xc_sr_restore.c | 125 +++++++++++++++++++++++++++++++++++++++----- tools/libxc/xc_sr_save.c | 73 +++++++++++++++++++------- 3 files changed, 183 insertions(+), 31 deletions(-) -- 1.9.1 ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH Remus v3 1/3] libxc/save: implement Remus checkpointed save 2015-05-13 8:34 [PATCH Remus v3 0/3] Remus support for Migration-v2 Yang Hongyang @ 2015-05-13 8:34 ` Yang Hongyang 2015-05-13 15:14 ` Andrew Cooper 2015-05-13 8:35 ` [PATCH Remus v3 2/3] libxc/restore: add checkpointed flag to the restore context Yang Hongyang 2015-05-13 8:35 ` [PATCH Remus v3 3/3] libxc/restore: implement Remus checkpointed restore Yang Hongyang 2 siblings, 1 reply; 13+ messages in thread From: Yang Hongyang @ 2015-05-13 8:34 UTC (permalink / raw) To: xen-devel Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang, eddie.dong, guijianfeng, rshriram, ian.jackson With Remus, the save flow should be: live migration->{ periodically save(checkpointed save) } Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> CC: Andrew Cooper <andrew.cooper3@citrix.com> --- tools/libxc/xc_sr_save.c | 73 +++++++++++++++++++++++++++++++++++------------- 1 file changed, 54 insertions(+), 19 deletions(-) diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c index 1d0a46d..3890c4d 100644 --- a/tools/libxc/xc_sr_save.c +++ b/tools/libxc/xc_sr_save.c @@ -57,6 +57,16 @@ static int write_end_record(struct xc_sr_context *ctx) } /* + * Writes an CHECKPOINT record into the stream. + */ +static int write_checkpoint_record(struct xc_sr_context *ctx) +{ + struct xc_sr_record checkpoint = { REC_TYPE_CHECKPOINT, 0, NULL }; + + return write_record(ctx, &checkpoint); +} + +/* * Writes a batch of memory as a PAGE_DATA record into the stream. The batch * is constructed in ctx->save.batch_pfns. 
* @@ -467,6 +477,9 @@ static int send_domain_memory_live(struct xc_sr_context *ctx) DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap, &ctx->save.dirty_bitmap_hbuf); + if ( !ctx->save.live ) + goto last_iter; + rc = enable_logdirty(ctx); if ( rc ) goto out; @@ -505,6 +518,7 @@ static int send_domain_memory_live(struct xc_sr_context *ctx) goto out; } + last_iter: rc = suspend_domain(ctx); if ( rc ) goto out; @@ -667,29 +681,50 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type) if ( rc ) goto err; - rc = ctx->save.ops.start_of_checkpoint(ctx); - if ( rc ) - goto err; + do { + rc = ctx->save.ops.start_of_checkpoint(ctx); + if ( rc ) + goto err; - if ( ctx->save.live ) - rc = send_domain_memory_live(ctx); - else - rc = send_domain_memory_nonlive(ctx); + if ( ctx->save.live || ctx->save.checkpointed ) + rc = send_domain_memory_live(ctx); + else + rc = send_domain_memory_nonlive(ctx); - if ( rc ) - goto err; + if ( rc ) + goto err; - if ( !ctx->dominfo.shutdown || - (ctx->dominfo.shutdown_reason != SHUTDOWN_suspend) ) - { - ERROR("Domain has not been suspended"); - rc = -1; - goto err; - } + if ( !ctx->dominfo.shutdown || + (ctx->dominfo.shutdown_reason != SHUTDOWN_suspend) ) + { + ERROR("Domain has not been suspended"); + rc = -1; + goto err; + } - rc = ctx->save.ops.end_of_checkpoint(ctx); - if ( rc ) - goto err; + rc = ctx->save.ops.end_of_checkpoint(ctx); + if ( rc ) + goto err; + + if ( ctx->save.checkpointed ) { + if ( ctx->save.live ) { + /* End of live migration, we are sending checkpointed stream */ + ctx->save.live = false; + } + + rc = write_checkpoint_record(ctx); + if ( rc ) + goto err; + + ctx->save.callbacks->postcopy(ctx->save.callbacks->data); + + rc = ctx->save.callbacks->checkpoint(ctx->save.callbacks->data); + if ( rc > 0 ) + IPRINTF("Checkpointed save"); + else + ctx->save.checkpointed = false; + } + } while ( ctx->save.checkpointed ); xc_report_progress_single(xch, "End of stream"); -- 1.9.1 ^ permalink raw reply related 
* Re: [PATCH Remus v3 1/3] libxc/save: implement Remus checkpointed save 2015-05-13 8:34 ` [PATCH Remus v3 1/3] libxc/save: implement Remus checkpointed save Yang Hongyang @ 2015-05-13 15:14 ` Andrew Cooper 2015-05-14 1:08 ` Yang Hongyang 0 siblings, 1 reply; 13+ messages in thread From: Andrew Cooper @ 2015-05-13 15:14 UTC (permalink / raw) To: Yang Hongyang, xen-devel Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang, eddie.dong, guijianfeng, rshriram On 13/05/15 09:34, Yang Hongyang wrote: > With Remus, the save flow should be: > live migration->{ periodically save(checkpointed save) } > > Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> > CC: Ian Campbell <Ian.Campbell@citrix.com> > CC: Ian Jackson <Ian.Jackson@eu.citrix.com> > CC: Wei Liu <wei.liu2@citrix.com> > CC: Andrew Cooper <andrew.cooper3@citrix.com> > --- > tools/libxc/xc_sr_save.c | 73 +++++++++++++++++++++++++++++++++++------------- > 1 file changed, 54 insertions(+), 19 deletions(-) > > diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c > index 1d0a46d..3890c4d 100644 > --- a/tools/libxc/xc_sr_save.c > +++ b/tools/libxc/xc_sr_save.c > @@ -57,6 +57,16 @@ static int write_end_record(struct xc_sr_context *ctx) > } > > /* > + * Writes an CHECKPOINT record into the stream. > + */ > +static int write_checkpoint_record(struct xc_sr_context *ctx) > +{ > + struct xc_sr_record checkpoint = { REC_TYPE_CHECKPOINT, 0, NULL }; > + > + return write_record(ctx, &checkpoint); > +} > + > +/* > * Writes a batch of memory as a PAGE_DATA record into the stream. The batch > * is constructed in ctx->save.batch_pfns. > * > @@ -467,6 +477,9 @@ static int send_domain_memory_live(struct xc_sr_context *ctx) > DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap, > &ctx->save.dirty_bitmap_hbuf); > > + if ( !ctx->save.live ) > + goto last_iter; > + I really want to avoid this code getting into the same spiders web as it used to be with goto, but I cannot suggest a better alternative. 
However, please leave a comment explaining that subsequent checkpointed loop skip the live part and pause straight away. > rc = enable_logdirty(ctx); > if ( rc ) > goto out; > @@ -505,6 +518,7 @@ static int send_domain_memory_live(struct xc_sr_context *ctx) > goto out; > } > > + last_iter: > rc = suspend_domain(ctx); > if ( rc ) > goto out; > @@ -667,29 +681,50 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type) > if ( rc ) > goto err; > > - rc = ctx->save.ops.start_of_checkpoint(ctx); > - if ( rc ) > - goto err; > + do { > + rc = ctx->save.ops.start_of_checkpoint(ctx); > + if ( rc ) > + goto err; > > - if ( ctx->save.live ) > - rc = send_domain_memory_live(ctx); > - else > - rc = send_domain_memory_nonlive(ctx); > + if ( ctx->save.live || ctx->save.checkpointed ) > + rc = send_domain_memory_live(ctx); > + else > + rc = send_domain_memory_nonlive(ctx); > > - if ( rc ) > - goto err; > + if ( rc ) > + goto err; > > - if ( !ctx->dominfo.shutdown || > - (ctx->dominfo.shutdown_reason != SHUTDOWN_suspend) ) > - { > - ERROR("Domain has not been suspended"); > - rc = -1; > - goto err; > - } > + if ( !ctx->dominfo.shutdown || > + (ctx->dominfo.shutdown_reason != SHUTDOWN_suspend) ) > + { > + ERROR("Domain has not been suspended"); > + rc = -1; > + goto err; > + } > > - rc = ctx->save.ops.end_of_checkpoint(ctx); > - if ( rc ) > - goto err; > + rc = ctx->save.ops.end_of_checkpoint(ctx); > + if ( rc ) > + goto err; > + > + if ( ctx->save.checkpointed ) { > + if ( ctx->save.live ) { Coding style. libxc is admittedly inconsistent, but follows hypervisor style, so { on the next line. 
> + /* End of live migration, we are sending checkpointed stream */ > + ctx->save.live = false; > + } > + > + rc = write_checkpoint_record(ctx); > + if ( rc ) > + goto err; > + > + ctx->save.callbacks->postcopy(ctx->save.callbacks->data); > + > + rc = ctx->save.callbacks->checkpoint(ctx->save.callbacks->data); > + if ( rc > 0 ) > + IPRINTF("Checkpointed save"); This should probably be progress information , so xc_report_progress_single() Otherwise, Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> (I think this code is a bit of an improvement over xc_domain_save.c) > + else > + ctx->save.checkpointed = false; > + } > + } while ( ctx->save.checkpointed ); > > xc_report_progress_single(xch, "End of stream"); > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH Remus v3 1/3] libxc/save: implement Remus checkpointed save 2015-05-13 15:14 ` Andrew Cooper @ 2015-05-14 1:08 ` Yang Hongyang 0 siblings, 0 replies; 13+ messages in thread From: Yang Hongyang @ 2015-05-14 1:08 UTC (permalink / raw) To: Andrew Cooper, xen-devel Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang, eddie.dong, guijianfeng, rshriram On 05/13/2015 11:14 PM, Andrew Cooper wrote: > On 13/05/15 09:34, Yang Hongyang wrote: >> With Remus, the save flow should be: >> live migration->{ periodically save(checkpointed save) } >> >> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> >> CC: Ian Campbell <Ian.Campbell@citrix.com> >> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> >> CC: Wei Liu <wei.liu2@citrix.com> >> CC: Andrew Cooper <andrew.cooper3@citrix.com> >> --- >> tools/libxc/xc_sr_save.c | 73 +++++++++++++++++++++++++++++++++++------------- >> 1 file changed, 54 insertions(+), 19 deletions(-) >> >> diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c >> index 1d0a46d..3890c4d 100644 >> --- a/tools/libxc/xc_sr_save.c >> +++ b/tools/libxc/xc_sr_save.c >> @@ -57,6 +57,16 @@ static int write_end_record(struct xc_sr_context *ctx) >> } >> >> /* >> + * Writes an CHECKPOINT record into the stream. >> + */ >> +static int write_checkpoint_record(struct xc_sr_context *ctx) >> +{ >> + struct xc_sr_record checkpoint = { REC_TYPE_CHECKPOINT, 0, NULL }; >> + >> + return write_record(ctx, &checkpoint); >> +} >> + >> +/* >> * Writes a batch of memory as a PAGE_DATA record into the stream. The batch >> * is constructed in ctx->save.batch_pfns. >> * >> @@ -467,6 +477,9 @@ static int send_domain_memory_live(struct xc_sr_context *ctx) >> DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap, >> &ctx->save.dirty_bitmap_hbuf); >> >> + if ( !ctx->save.live ) >> + goto last_iter; >> + > > I really want to avoid this code getting into the same spiders web as it > used to be with goto, but I cannot suggest a better alternative. 
> > However, please leave a comment explaining that subsequent checkpointed > loop skip the live part and pause straight away. OK > >> rc = enable_logdirty(ctx); >> if ( rc ) >> goto out; >> @@ -505,6 +518,7 @@ static int send_domain_memory_live(struct xc_sr_context *ctx) >> goto out; >> } >> >> + last_iter: >> rc = suspend_domain(ctx); >> if ( rc ) >> goto out; >> @@ -667,29 +681,50 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type) >> if ( rc ) >> goto err; >> >> - rc = ctx->save.ops.start_of_checkpoint(ctx); >> - if ( rc ) >> - goto err; >> + do { >> + rc = ctx->save.ops.start_of_checkpoint(ctx); >> + if ( rc ) >> + goto err; >> >> - if ( ctx->save.live ) >> - rc = send_domain_memory_live(ctx); >> - else >> - rc = send_domain_memory_nonlive(ctx); >> + if ( ctx->save.live || ctx->save.checkpointed ) >> + rc = send_domain_memory_live(ctx); >> + else >> + rc = send_domain_memory_nonlive(ctx); >> >> - if ( rc ) >> - goto err; >> + if ( rc ) >> + goto err; >> >> - if ( !ctx->dominfo.shutdown || >> - (ctx->dominfo.shutdown_reason != SHUTDOWN_suspend) ) >> - { >> - ERROR("Domain has not been suspended"); >> - rc = -1; >> - goto err; >> - } >> + if ( !ctx->dominfo.shutdown || >> + (ctx->dominfo.shutdown_reason != SHUTDOWN_suspend) ) >> + { >> + ERROR("Domain has not been suspended"); >> + rc = -1; >> + goto err; >> + } >> >> - rc = ctx->save.ops.end_of_checkpoint(ctx); >> - if ( rc ) >> - goto err; >> + rc = ctx->save.ops.end_of_checkpoint(ctx); >> + if ( rc ) >> + goto err; >> + >> + if ( ctx->save.checkpointed ) { >> + if ( ctx->save.live ) { > > Coding style. libxc is admittedly inconsistent, but follows hypervisor > style, so { on the next line. 
OK > >> + /* End of live migration, we are sending checkpointed stream */ >> + ctx->save.live = false; >> + } >> + >> + rc = write_checkpoint_record(ctx); >> + if ( rc ) >> + goto err; >> + >> + ctx->save.callbacks->postcopy(ctx->save.callbacks->data); >> + >> + rc = ctx->save.callbacks->checkpoint(ctx->save.callbacks->data); >> + if ( rc > 0 ) >> + IPRINTF("Checkpointed save"); > > This should probably be progress information , so > xc_report_progress_single() Will fix. > > Otherwise, Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> > > (I think this code is a bit of an improvement over xc_domain_save.c) > >> + else >> + ctx->save.checkpointed = false; >> + } >> + } while ( ctx->save.checkpointed ); >> >> xc_report_progress_single(xch, "End of stream"); >> > > . > -- Thanks, Yang. ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH Remus v3 2/3] libxc/restore: add checkpointed flag to the restore context 2015-05-13 8:34 [PATCH Remus v3 0/3] Remus support for Migration-v2 Yang Hongyang 2015-05-13 8:34 ` [PATCH Remus v3 1/3] libxc/save: implement Remus checkpointed save Yang Hongyang @ 2015-05-13 8:35 ` Yang Hongyang 2015-05-13 15:15 ` Andrew Cooper 2015-05-13 8:35 ` [PATCH Remus v3 3/3] libxc/restore: implement Remus checkpointed restore Yang Hongyang 2 siblings, 1 reply; 13+ messages in thread From: Yang Hongyang @ 2015-05-13 8:35 UTC (permalink / raw) To: xen-devel Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang, eddie.dong, guijianfeng, rshriram, ian.jackson add checkpointed flag to the restore context. Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> CC: Andrew Cooper <andrew.cooper3@citrix.com> --- tools/libxc/xc_sr_common.h | 3 +++ tools/libxc/xc_sr_restore.c | 1 + 2 files changed, 4 insertions(+) diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h index 0ba9728..f8121e7 100644 --- a/tools/libxc/xc_sr_common.h +++ b/tools/libxc/xc_sr_common.h @@ -205,6 +205,9 @@ struct xc_sr_context uint32_t guest_type; uint32_t guest_page_size; + /* Plain VM, or checkpoints over time. */ + bool checkpointed; + /* * Xenstore and Console parameters. 
* INPUT: evtchn & domid diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c index 8022c3d..0e512ec 100644 --- a/tools/libxc/xc_sr_restore.c +++ b/tools/libxc/xc_sr_restore.c @@ -604,6 +604,7 @@ int xc_domain_restore2(xc_interface *xch, int io_fd, uint32_t dom, ctx.restore.console_domid = console_domid; ctx.restore.xenstore_evtchn = store_evtchn; ctx.restore.xenstore_domid = store_domid; + ctx.restore.checkpointed = checkpointed_stream; ctx.restore.callbacks = callbacks; IPRINTF("In experimental %s", __func__); -- 1.9.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH Remus v3 2/3] libxc/restore: add checkpointed flag to the restore context 2015-05-13 8:35 ` [PATCH Remus v3 2/3] libxc/restore: add checkpointed flag to the restore context Yang Hongyang @ 2015-05-13 15:15 ` Andrew Cooper 2015-05-14 1:09 ` Yang Hongyang 0 siblings, 1 reply; 13+ messages in thread From: Andrew Cooper @ 2015-05-13 15:15 UTC (permalink / raw) To: Yang Hongyang, xen-devel Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang, eddie.dong, guijianfeng, rshriram On 13/05/15 09:35, Yang Hongyang wrote: > add checkpointed flag to the restore context. > > Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> > CC: Ian Campbell <Ian.Campbell@citrix.com> > CC: Ian Jackson <Ian.Jackson@eu.citrix.com> > CC: Wei Liu <wei.liu2@citrix.com> > CC: Andrew Cooper <andrew.cooper3@citrix.com> This appears unused, given the new restore code. ~Andrew > --- > tools/libxc/xc_sr_common.h | 3 +++ > tools/libxc/xc_sr_restore.c | 1 + > 2 files changed, 4 insertions(+) > > diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h > index 0ba9728..f8121e7 100644 > --- a/tools/libxc/xc_sr_common.h > +++ b/tools/libxc/xc_sr_common.h > @@ -205,6 +205,9 @@ struct xc_sr_context > uint32_t guest_type; > uint32_t guest_page_size; > > + /* Plain VM, or checkpoints over time. */ > + bool checkpointed; > + > /* > * Xenstore and Console parameters. > * INPUT: evtchn & domid > diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c > index 8022c3d..0e512ec 100644 > --- a/tools/libxc/xc_sr_restore.c > +++ b/tools/libxc/xc_sr_restore.c > @@ -604,6 +604,7 @@ int xc_domain_restore2(xc_interface *xch, int io_fd, uint32_t dom, > ctx.restore.console_domid = console_domid; > ctx.restore.xenstore_evtchn = store_evtchn; > ctx.restore.xenstore_domid = store_domid; > + ctx.restore.checkpointed = checkpointed_stream; > ctx.restore.callbacks = callbacks; > > IPRINTF("In experimental %s", __func__); ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH Remus v3 2/3] libxc/restore: add checkpointed flag to the restore context 2015-05-13 15:15 ` Andrew Cooper @ 2015-05-14 1:09 ` Yang Hongyang 2015-05-14 7:18 ` Andrew Cooper 0 siblings, 1 reply; 13+ messages in thread From: Yang Hongyang @ 2015-05-14 1:09 UTC (permalink / raw) To: Andrew Cooper, xen-devel Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang, eddie.dong, guijianfeng, rshriram On 05/13/2015 11:15 PM, Andrew Cooper wrote: > On 13/05/15 09:35, Yang Hongyang wrote: >> add checkpointed flag to the restore context. >> >> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> >> CC: Ian Campbell <Ian.Campbell@citrix.com> >> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> >> CC: Wei Liu <wei.liu2@citrix.com> >> CC: Andrew Cooper <andrew.cooper3@citrix.com> > > This appears unused, given the new restore code. This is used by handle_checkpoint() for the sanity check. > > ~Andrew > >> --- >> tools/libxc/xc_sr_common.h | 3 +++ >> tools/libxc/xc_sr_restore.c | 1 + >> 2 files changed, 4 insertions(+) >> >> diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h >> index 0ba9728..f8121e7 100644 >> --- a/tools/libxc/xc_sr_common.h >> +++ b/tools/libxc/xc_sr_common.h >> @@ -205,6 +205,9 @@ struct xc_sr_context >> uint32_t guest_type; >> uint32_t guest_page_size; >> >> + /* Plain VM, or checkpoints over time. */ >> + bool checkpointed; >> + >> /* >> * Xenstore and Console parameters. 
>> * INPUT: evtchn & domid >> diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c >> index 8022c3d..0e512ec 100644 >> --- a/tools/libxc/xc_sr_restore.c >> +++ b/tools/libxc/xc_sr_restore.c >> @@ -604,6 +604,7 @@ int xc_domain_restore2(xc_interface *xch, int io_fd, uint32_t dom, >> ctx.restore.console_domid = console_domid; >> ctx.restore.xenstore_evtchn = store_evtchn; >> ctx.restore.xenstore_domid = store_domid; >> + ctx.restore.checkpointed = checkpointed_stream; >> ctx.restore.callbacks = callbacks; >> >> IPRINTF("In experimental %s", __func__); > > . > -- Thanks, Yang. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH Remus v3 2/3] libxc/restore: add checkpointed flag to the restore context 2015-05-14 1:09 ` Yang Hongyang @ 2015-05-14 7:18 ` Andrew Cooper 0 siblings, 0 replies; 13+ messages in thread From: Andrew Cooper @ 2015-05-14 7:18 UTC (permalink / raw) To: Yang Hongyang, xen-devel Cc: wei.liu2, ian.campbell, wency, eddie.dong, yunhong.jiang, ian.jackson, guijianfeng, rshriram On 14/05/2015 02:09, Yang Hongyang wrote: > > > On 05/13/2015 11:15 PM, Andrew Cooper wrote: >> On 13/05/15 09:35, Yang Hongyang wrote: >>> add checkpointed flag to the restore context. >>> >>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> >>> CC: Ian Campbell <Ian.Campbell@citrix.com> >>> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> >>> CC: Wei Liu <wei.liu2@citrix.com> >>> CC: Andrew Cooper <andrew.cooper3@citrix.com> >> >> This appears unused, given the new restore code. > > This is used by handle_checkpoint() for the sanity check. Ah yes, and I have realised that it will need to be used when I fix up the qemu record handling in the future, so Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> > >> >> ~Andrew >> >>> --- >>> tools/libxc/xc_sr_common.h | 3 +++ >>> tools/libxc/xc_sr_restore.c | 1 + >>> 2 files changed, 4 insertions(+) >>> >>> diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h >>> index 0ba9728..f8121e7 100644 >>> --- a/tools/libxc/xc_sr_common.h >>> +++ b/tools/libxc/xc_sr_common.h >>> @@ -205,6 +205,9 @@ struct xc_sr_context >>> uint32_t guest_type; >>> uint32_t guest_page_size; >>> >>> + /* Plain VM, or checkpoints over time. */ >>> + bool checkpointed; >>> + >>> /* >>> * Xenstore and Console parameters. 
>>> * INPUT: evtchn & domid >>> diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c >>> index 8022c3d..0e512ec 100644 >>> --- a/tools/libxc/xc_sr_restore.c >>> +++ b/tools/libxc/xc_sr_restore.c >>> @@ -604,6 +604,7 @@ int xc_domain_restore2(xc_interface *xch, int >>> io_fd, uint32_t dom, >>> ctx.restore.console_domid = console_domid; >>> ctx.restore.xenstore_evtchn = store_evtchn; >>> ctx.restore.xenstore_domid = store_domid; >>> + ctx.restore.checkpointed = checkpointed_stream; >>> ctx.restore.callbacks = callbacks; >>> >>> IPRINTF("In experimental %s", __func__); >> >> . >> > ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH Remus v3 3/3] libxc/restore: implement Remus checkpointed restore 2015-05-13 8:34 [PATCH Remus v3 0/3] Remus support for Migration-v2 Yang Hongyang 2015-05-13 8:34 ` [PATCH Remus v3 1/3] libxc/save: implement Remus checkpointed save Yang Hongyang 2015-05-13 8:35 ` [PATCH Remus v3 2/3] libxc/restore: add checkpointed flag to the restore context Yang Hongyang @ 2015-05-13 8:35 ` Yang Hongyang 2015-05-13 16:16 ` Andrew Cooper 2 siblings, 1 reply; 13+ messages in thread From: Yang Hongyang @ 2015-05-13 8:35 UTC (permalink / raw) To: xen-devel Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang, eddie.dong, guijianfeng, rshriram, ian.jackson With Remus, the restore flow should be: the first full migration stream -> { periodically restore stream } Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> --- tools/libxc/xc_sr_common.h | 13 +++++ tools/libxc/xc_sr_restore.c | 124 +++++++++++++++++++++++++++++++++++++++----- 2 files changed, 125 insertions(+), 12 deletions(-) diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h index f8121e7..3740b89 100644 --- a/tools/libxc/xc_sr_common.h +++ b/tools/libxc/xc_sr_common.h @@ -208,6 +208,19 @@ struct xc_sr_context /* Plain VM, or checkpoints over time. */ bool checkpointed; + /* Currently buffering records between a checkpoint */ + bool buffer_all_records; + +/* + * With Remus, we buffer the records sent by primary at checkpoint, + * in case the primary will fail, we can recover from the last + * checkpoint state. + * This should be enough because primary only send dirty pages at + * checkpoint. + */ +#define MAX_BUF_RECORDS 1024 + struct xc_sr_record *buffered_records[MAX_BUF_RECORDS]; + /* * Xenstore and Console parameters. 
* INPUT: evtchn & domid diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c index 0e512ec..85534a8 100644 --- a/tools/libxc/xc_sr_restore.c +++ b/tools/libxc/xc_sr_restore.c @@ -468,10 +468,83 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec) return rc; } +static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec); +static int handle_checkpoint(struct xc_sr_context *ctx) +{ + xc_interface *xch = ctx->xch; + int rc = 0, i; + struct xc_sr_record *rec; + + if ( !ctx->restore.checkpointed ) + { + ERROR("Found checkpoint in non-checkpointed stream"); + rc = -1; + goto err; + } + + if ( ctx->restore.buffer_all_records ) + { + IPRINTF("All records buffered"); + + /* + * We need to set buffer_all_records to false in + * order to process records instead of buffer records. + * buffer_all_records should be set back to true after + * we successfully processed all records. + */ + ctx->restore.buffer_all_records = false; + i = 0; + rec = ctx->restore.buffered_records[i++]; + while (rec) + { + rc = process_record(ctx, rec); + free(rec); + ctx->restore.buffered_records[i-1] = NULL; + if ( rc ) + goto err; + + rec = ctx->restore.buffered_records[i++]; + } + IPRINTF("All records processed"); + ctx->restore.buffer_all_records = true; + } + else + ctx->restore.buffer_all_records = true; + + err: + return rc; +} + static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec) { xc_interface *xch = ctx->xch; - int rc = 0; + int rc = 0, i; + struct xc_sr_record *buf_rec; + + if ( ctx->restore.buffer_all_records && + rec->type != REC_TYPE_END && + rec->type != REC_TYPE_CHECKPOINT ) + { + buf_rec = malloc(sizeof(struct xc_sr_record)); + if (!buf_rec) + { + ERROR("Unable to allocate memory for record"); + return -1; + } + memcpy(buf_rec, rec, sizeof(struct xc_sr_record)); + + for ( i = 0; i < MAX_BUF_RECORDS; i++ ) + if ( !ctx->restore.buffered_records[i] ) + break; + + if ( i >= MAX_BUF_RECORDS ) + { + 
ERROR("There are too many records within a checkpoint"); + return -1; + } + ctx->restore.buffered_records[i] = buf_rec; + return 0; + } switch ( rec->type ) { @@ -487,6 +560,10 @@ static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec) ctx->restore.verify = true; break; + case REC_TYPE_CHECKPOINT: + rc = handle_checkpoint(ctx); + break; + default: rc = ctx->restore.ops.process_record(ctx, rec); break; @@ -520,7 +597,7 @@ static int restore(struct xc_sr_context *ctx) { xc_interface *xch = ctx->xch; struct xc_sr_record rec; - int rc, saved_rc = 0, saved_errno = 0; + int rc, saved_rc = 0, saved_errno = 0, i; IPRINTF("Restoring domain"); @@ -541,7 +618,27 @@ static int restore(struct xc_sr_context *ctx) { rc = read_record(ctx, &rec); if ( rc ) - goto err; + { + if ( ctx->restore.buffer_all_records ) + goto err_buf; + else + goto err; + } + +#ifdef XG_LIBXL_HVM_COMPAT + if ( ctx->dominfo.hvm && + (rec.type == REC_TYPE_END || rec.type == REC_TYPE_CHECKPOINT) ) + { + rc = read_qemu(ctx); + if ( rc ) + { + if ( ctx->restore.buffer_all_records ) + goto err_buf; + else + goto err; + } + } +#endif rc = process_record(ctx, &rec); if ( rc ) @@ -549,15 +646,11 @@ static int restore(struct xc_sr_context *ctx) } while ( rec.type != REC_TYPE_END ); -#ifdef XG_LIBXL_HVM_COMPAT - if ( ctx->dominfo.hvm ) - { - rc = read_qemu(ctx); - if ( rc ) - goto err; - } -#endif - + err_buf: + /* + * With Remus, if we reach here, there must be some error on primary, + * failover from the last checkpoint state. 
+ */ rc = ctx->restore.ops.stream_complete(ctx); if ( rc ) goto err; @@ -571,6 +664,13 @@ static int restore(struct xc_sr_context *ctx) PERROR("Restore failed"); done: + for ( i = 0; i < MAX_BUF_RECORDS; i++) + { + if ( ctx->restore.buffered_records[i] ) { + free(ctx->restore.buffered_records[i]->data); + free(ctx->restore.buffered_records[i]); + } + } free(ctx->restore.populated_pfns); rc = ctx->restore.ops.cleanup(ctx); if ( rc ) -- 1.9.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH Remus v3 3/3] libxc/restore: implement Remus checkpointed restore 2015-05-13 8:35 ` [PATCH Remus v3 3/3] libxc/restore: implement Remus checkpointed restore Yang Hongyang @ 2015-05-13 16:16 ` Andrew Cooper 2015-05-13 16:38 ` Andrew Cooper 2015-05-14 5:42 ` Yang Hongyang 0 siblings, 2 replies; 13+ messages in thread From: Andrew Cooper @ 2015-05-13 16:16 UTC (permalink / raw) To: Yang Hongyang, xen-devel Cc: wei.liu2, ian.campbell, wency, guijianfeng, yunhong.jiang, eddie.dong, rshriram, ian.jackson On 13/05/15 09:35, Yang Hongyang wrote: > With Remus, the restore flow should be: > the first full migration stream -> { periodically restore stream } > > Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> > CC: Ian Campbell <Ian.Campbell@citrix.com> > CC: Ian Jackson <Ian.Jackson@eu.citrix.com> > CC: Wei Liu <wei.liu2@citrix.com> > --- > tools/libxc/xc_sr_common.h | 13 +++++ > tools/libxc/xc_sr_restore.c | 124 +++++++++++++++++++++++++++++++++++++++----- > 2 files changed, 125 insertions(+), 12 deletions(-) > > diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h > index f8121e7..3740b89 100644 > --- a/tools/libxc/xc_sr_common.h > +++ b/tools/libxc/xc_sr_common.h > @@ -208,6 +208,19 @@ struct xc_sr_context > /* Plain VM, or checkpoints over time. */ > bool checkpointed; > > + /* Currently buffering records between a checkpoint */ > + bool buffer_all_records; > + > +/* > + * With Remus, we buffer the records sent by primary at checkpoint, "sent by the primary" > + * in case the primary will fail, we can recover from the last > + * checkpoint state. > + * This should be enough because primary only send dirty pages at > + * checkpoint. > + */ > +#define MAX_BUF_RECORDS 1024 As a general point, it occurs to me that this might be better as a linked list, rather than having a hard limit. 
A burst of activity on the primary could potentially hit this limit > + struct xc_sr_record *buffered_records[MAX_BUF_RECORDS]; Your data handling will be much more simple if this were struct xc_sr_record *buffered_records; and you do a one-time allocation of MAX_BUF_RECORDS * sizeof(struct xc_sr_record), although it will require an index integer as well. It would probably be best to factor out setup() and clean() just like you did for the save side. > + > /* > * Xenstore and Console parameters. > * INPUT: evtchn & domid > diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c > index 0e512ec..85534a8 100644 > --- a/tools/libxc/xc_sr_restore.c > +++ b/tools/libxc/xc_sr_restore.c > @@ -468,10 +468,83 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec) > return rc; > } > > +static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec); > +static int handle_checkpoint(struct xc_sr_context *ctx) > +{ > + xc_interface *xch = ctx->xch; > + int rc = 0, i; Always an unsigned i for populated_pfnsindexing an array please. > + struct xc_sr_record *rec; > + > + if ( !ctx->restore.checkpointed ) > + { > + ERROR("Found checkpoint in non-checkpointed stream"); > + rc = -1; > + goto err; > + } > + > + if ( ctx->restore.buffer_all_records ) > + { > + IPRINTF("All records buffered"); > + > + /* > + * We need to set buffer_all_records to false in > + * order to process records instead of buffer records. > + * buffer_all_records should be set back to true after > + * we successfully processed all records. > + */ > + ctx->restore.buffer_all_records = false; > + i = 0; > + rec = ctx->restore.buffered_records[i++]; > + while (rec) Style (hypervisor style). 
> + { > + rc = process_record(ctx, rec); > + free(rec); > + ctx->restore.buffered_records[i-1] = NULL; > + if ( rc ) > + goto err; > + > + rec = ctx->restore.buffered_records[i++]; > + } > + IPRINTF("All records processed"); > + ctx->restore.buffer_all_records = true; > + } > + else > + ctx->restore.buffer_all_records = true; > + > + err: > + return rc; > +} > + > static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec) > { > xc_interface *xch = ctx->xch; > - int rc = 0; > + int rc = 0, i; > + struct xc_sr_record *buf_rec; > + > + if ( ctx->restore.buffer_all_records && > + rec->type != REC_TYPE_END && > + rec->type != REC_TYPE_CHECKPOINT ) > + { > + buf_rec = malloc(sizeof(struct xc_sr_record)); > + if (!buf_rec) > + { > + ERROR("Unable to allocate memory for record"); > + return -1; > + } > + memcpy(buf_rec, rec, sizeof(struct xc_sr_record)); > + > + for ( i = 0; i < MAX_BUF_RECORDS; i++ ) > + if ( !ctx->restore.buffered_records[i] ) > + break; > + > + if ( i >= MAX_BUF_RECORDS ) > + { > + ERROR("There are too many records within a checkpoint"); > + return -1; > + } > + ctx->restore.buffered_records[i] = buf_rec; > + return 0; > + } > > switch ( rec->type ) > { > @@ -487,6 +560,10 @@ static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec) > ctx->restore.verify = true; > break; > > + case REC_TYPE_CHECKPOINT: > + rc = handle_checkpoint(ctx); > + break; > + > default: > rc = ctx->restore.ops.process_record(ctx, rec); > break; > @@ -520,7 +597,7 @@ static int restore(struct xc_sr_context *ctx) > { > xc_interface *xch = ctx->xch; > struct xc_sr_record rec; > - int rc, saved_rc = 0, saved_errno = 0; > + int rc, saved_rc = 0, saved_errno = 0, i; > > IPRINTF("Restoring domain"); > > @@ -541,7 +618,27 @@ static int restore(struct xc_sr_context *ctx) > { > rc = read_record(ctx, &rec); > if ( rc ) > - goto err; > + { > + if ( ctx->restore.buffer_all_records ) > + goto err_buf; Please can we choose a label sufficiently different 
to "err". "resume_from_checkpoint" perhaps? ~Andrew > + else > + goto err; > + } > + > +#ifdef XG_LIBXL_HVM_COMPAT > + if ( ctx->dominfo.hvm && > + (rec.type == REC_TYPE_END || rec.type == REC_TYPE_CHECKPOINT) ) > + { > + rc = read_qemu(ctx); > + if ( rc ) > + { > + if ( ctx->restore.buffer_all_records ) > + goto err_buf; > + else > + goto err; > + } > + } > +#endif > > rc = process_record(ctx, &rec); > if ( rc ) > @@ -549,15 +646,11 @@ static int restore(struct xc_sr_context *ctx) > > } while ( rec.type != REC_TYPE_END ); > > -#ifdef XG_LIBXL_HVM_COMPAT > - if ( ctx->dominfo.hvm ) > - { > - rc = read_qemu(ctx); > - if ( rc ) > - goto err; > - } > -#endif > - > + err_buf: > + /* > + * With Remus, if we reach here, there must be some error on primary, > + * failover from the last checkpoint state. > + */ > rc = ctx->restore.ops.stream_complete(ctx); > if ( rc ) > goto err; > @@ -571,6 +664,13 @@ static int restore(struct xc_sr_context *ctx) > PERROR("Restore failed"); > > done: > + for ( i = 0; i < MAX_BUF_RECORDS; i++) > + { > + if ( ctx->restore.buffered_records[i] ) { > + free(ctx->restore.buffered_records[i]->data); > + free(ctx->restore.buffered_records[i]); > + } > + } > free(ctx->restore.populated_pfns); > rc = ctx->restore.ops.cleanup(ctx); > if ( rc ) ^ permalink raw reply [flat|nested] 13+ messages in thread
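For readers following the thread, the buffering scheme under review above can be sketched as a small state machine: while buffering is active, every record except END and CHECKPOINT is held back, and a CHECKPOINT record flushes the buffer in order. The following standalone sketch uses hypothetical simplified types (`struct record`, `apply_record`) rather than the real libxc `xc_sr_*` structures, and merges the first-checkpoint and subsequent-checkpoint paths for brevity:

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_BUF_RECORDS 1024
#define REC_TYPE_PAGE_DATA  1
#define REC_TYPE_CHECKPOINT 2
#define REC_TYPE_END        3

struct record { unsigned type; size_t length; void *data; };

struct restore_ctx {
    int buffer_all_records;                   /* buffering mode on/off */
    struct record *buffered[MAX_BUF_RECORDS]; /* records held back until flush */
};

/* Stand-in for the real per-record handlers. */
static int apply_record(struct record *rec)
{
    printf("applying record type %u\n", rec->type);
    return 0;
}

/* Flush every buffered record in order, as handle_checkpoint() does. */
static int flush_buffered(struct restore_ctx *ctx)
{
    for (unsigned i = 0; i < MAX_BUF_RECORDS && ctx->buffered[i]; i++) {
        int rc = apply_record(ctx->buffered[i]);
        free(ctx->buffered[i]);
        ctx->buffered[i] = NULL;
        if (rc)
            return rc;
    }
    return 0;
}

/* Mirror of the patched process_record(): while buffering is active,
 * everything except END and CHECKPOINT is held back. */
static int process_record(struct restore_ctx *ctx, struct record *rec)
{
    if (ctx->buffer_all_records &&
        rec->type != REC_TYPE_END && rec->type != REC_TYPE_CHECKPOINT) {
        for (unsigned i = 0; i < MAX_BUF_RECORDS; i++) {
            if (!ctx->buffered[i]) {
                struct record *copy = malloc(sizeof(*copy));
                if (!copy)
                    return -1;
                *copy = *rec;
                ctx->buffered[i] = copy;
                return 0;
            }
        }
        return -1;  /* too many records within a checkpoint */
    }

    if (rec->type == REC_TYPE_CHECKPOINT) {
        int rc = flush_buffered(ctx);
        ctx->buffer_all_records = 1;  /* buffer until the next checkpoint */
        return rc;
    }

    return apply_record(rec);
}
```

On error mid-buffer, the un-applied records are simply discarded and the domain resumes from the last flushed checkpoint, which is exactly the failover behaviour the patch comments describe.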
* Re: [PATCH Remus v3 3/3] libxc/restore: implement Remus checkpointed restore 2015-05-13 16:16 ` Andrew Cooper @ 2015-05-13 16:38 ` Andrew Cooper 2015-05-14 5:42 ` Yang Hongyang 1 sibling, 0 replies; 13+ messages in thread From: Andrew Cooper @ 2015-05-13 16:38 UTC (permalink / raw) To: Yang Hongyang, xen-devel Cc: wei.liu2, ian.campbell, wency, guijianfeng, yunhong.jiang, eddie.dong, rshriram, ian.jackson On 13/05/15 17:16, Andrew Cooper wrote: > >> + >> /* >> * Xenstore and Console parameters. >> * INPUT: evtchn & domid >> diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c >> index 0e512ec..85534a8 100644 >> --- a/tools/libxc/xc_sr_restore.c >> +++ b/tools/libxc/xc_sr_restore.c >> @@ -468,10 +468,83 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec) >> return rc; >> } >> >> +static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec); >> +static int handle_checkpoint(struct xc_sr_context *ctx) >> +{ >> + xc_interface *xch = ctx->xch; >> + int rc = 0, i; > Always an unsigned i for populated_pfnsindexing an array please. Erm - not sure what happened here. Ignore the "populated_pfns" please. Sorry. ~Andrew ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH Remus v3 3/3] libxc/restore: implement Remus checkpointed restore 2015-05-13 16:16 ` Andrew Cooper 2015-05-13 16:38 ` Andrew Cooper @ 2015-05-14 5:42 ` Yang Hongyang 2015-05-14 7:19 ` Andrew Cooper 1 sibling, 1 reply; 13+ messages in thread From: Yang Hongyang @ 2015-05-14 5:42 UTC (permalink / raw) To: Andrew Cooper, xen-devel Cc: wei.liu2, ian.campbell, wency, guijianfeng, yunhong.jiang, eddie.dong, rshriram, ian.jackson On 05/14/2015 12:16 AM, Andrew Cooper wrote: > On 13/05/15 09:35, Yang Hongyang wrote: >> With Remus, the restore flow should be: >> the first full migration stream -> { periodically restore stream } >> >> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> >> CC: Ian Campbell <Ian.Campbell@citrix.com> >> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> >> CC: Wei Liu <wei.liu2@citrix.com> >> --- >> tools/libxc/xc_sr_common.h | 13 +++++ >> tools/libxc/xc_sr_restore.c | 124 +++++++++++++++++++++++++++++++++++++++----- >> 2 files changed, 125 insertions(+), 12 deletions(-) >> >> diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h >> index f8121e7..3740b89 100644 >> --- a/tools/libxc/xc_sr_common.h >> +++ b/tools/libxc/xc_sr_common.h >> @@ -208,6 +208,19 @@ struct xc_sr_context >> /* Plain VM, or checkpoints over time. */ >> bool checkpointed; >> >> + /* Currently buffering records between a checkpoint */ >> + bool buffer_all_records; >> + >> +/* >> + * With Remus, we buffer the records sent by primary at checkpoint, > > "sent by the primary" > >> + * in case the primary will fail, we can recover from the last >> + * checkpoint state. >> + * This should be enough because primary only send dirty pages at >> + * checkpoint. >> + */ >> +#define MAX_BUF_RECORDS 1024 > > As a general point, it occurs to me that this might be better as a > linked list, rather than having a hard limit. 
A burst of activity on > the primary could potentially hit this limit. There are only a few records at every checkpoint in my test, mostly under 10, probably because I don't do many operations in the guest. This limit can be adjusted later by further testing. Even with a linked list, we still need a limit IMO, otherwise it will eat too much memory. > >> + struct xc_sr_record *buffered_records[MAX_BUF_RECORDS]; > > Your data handling will be much more simple if this were struct > xc_sr_record *buffered_records; and you do a one-time allocation of > MAX_BUF_RECORDS * sizeof(struct xc_sr_record), although it will require > an index integer as well. Good point, it will avoid the alloc/free of the struct xc_sr_record, I will add a 'buffered_rec_num' to the context also. > > It would probably be best to factor out setup() and clean() just like > you did for the save side. OK. > >> + > >> /* > >> * Xenstore and Console parameters. > >> * INPUT: evtchn & domid > >> diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c > >> index 0e512ec..85534a8 100644 > >> --- a/tools/libxc/xc_sr_restore.c [...] > >> @@ -541,7 +618,27 @@ static int restore(struct xc_sr_context *ctx) > >> { > >> rc = read_record(ctx, &rec); > >> if ( rc ) > >> - goto err; > >> + { > >> + if ( ctx->restore.buffer_all_records ) > >> + goto err_buf; > > > > Please can we choose a label sufficiently different to "err". > > > > "resume_from_checkpoint" perhaps? I think "remus_failover" is better? If you don't have objections, I will use it as the label.
> > ~Andrew > >> + else >> + goto err; >> + } >> + >> +#ifdef XG_LIBXL_HVM_COMPAT >> + if ( ctx->dominfo.hvm && >> + (rec.type == REC_TYPE_END || rec.type == REC_TYPE_CHECKPOINT) ) >> + { >> + rc = read_qemu(ctx); >> + if ( rc ) >> + { >> + if ( ctx->restore.buffer_all_records ) >> + goto err_buf; >> + else >> + goto err; >> + } >> + } >> +#endif >> >> rc = process_record(ctx, &rec); >> if ( rc ) >> @@ -549,15 +646,11 @@ static int restore(struct xc_sr_context *ctx) >> >> } while ( rec.type != REC_TYPE_END ); >> >> -#ifdef XG_LIBXL_HVM_COMPAT >> - if ( ctx->dominfo.hvm ) >> - { >> - rc = read_qemu(ctx); >> - if ( rc ) >> - goto err; >> - } >> -#endif >> - >> + err_buf: >> + /* >> + * With Remus, if we reach here, there must be some error on primary, >> + * failover from the last checkpoint state. >> + */ >> rc = ctx->restore.ops.stream_complete(ctx); >> if ( rc ) >> goto err; >> @@ -571,6 +664,13 @@ static int restore(struct xc_sr_context *ctx) >> PERROR("Restore failed"); >> >> done: >> + for ( i = 0; i < MAX_BUF_RECORDS; i++) >> + { >> + if ( ctx->restore.buffered_records[i] ) { >> + free(ctx->restore.buffered_records[i]->data); >> + free(ctx->restore.buffered_records[i]); >> + } >> + } >> free(ctx->restore.populated_pfns); >> rc = ctx->restore.ops.cleanup(ctx); >> if ( rc ) > > . > -- Thanks, Yang. ^ permalink raw reply [flat|nested] 13+ messages in thread
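Andrew's suggestion in the exchange above — one flat allocation made at setup time, plus a `buffered_rec_num` index, instead of an array of individually malloc'd record pointers — might look roughly like the sketch below. The types and helper names here are hypothetical simplifications for illustration, not the actual libxc code; the revised series would be authoritative:

```c
#include <stdlib.h>

#define MAX_BUF_RECORDS 1024

struct record { unsigned type; size_t length; void *data; };

struct buffer {
    struct record *records;      /* one flat allocation, made once at setup */
    unsigned buffered_rec_num;   /* index of the next free slot */
};

/* One-time allocation, as suggested for a factored-out setup(). */
static int buffer_setup(struct buffer *buf)
{
    buf->records = calloc(MAX_BUF_RECORDS, sizeof(struct record));
    buf->buffered_rec_num = 0;
    return buf->records ? 0 : -1;
}

/* Buffering a record is now a struct copy into the next slot; no
 * per-record malloc/free of the record header is needed. */
static int buffer_record(struct buffer *buf, const struct record *rec)
{
    if (buf->buffered_rec_num >= MAX_BUF_RECORDS)
        return -1;  /* too many records within a checkpoint */
    buf->records[buf->buffered_rec_num++] = *rec;
    return 0;
}

/* Counterpart cleanup(), freeing any still-buffered payloads. */
static void buffer_cleanup(struct buffer *buf)
{
    for (unsigned i = 0; i < buf->buffered_rec_num; i++)
        free(buf->records[i].data);
    free(buf->records);
    buf->records = NULL;
    buf->buffered_rec_num = 0;
}
```

Besides removing an allocation per record, this also simplifies the cleanup loop at the end of restore(): instead of scanning all MAX_BUF_RECORDS slots for non-NULL pointers, cleanup only walks the first `buffered_rec_num` entries.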
* Re: [PATCH Remus v3 3/3] libxc/restore: implement Remus checkpointed restore 2015-05-14 5:42 ` Yang Hongyang @ 2015-05-14 7:19 ` Andrew Cooper 0 siblings, 0 replies; 13+ messages in thread From: Andrew Cooper @ 2015-05-14 7:19 UTC (permalink / raw) To: Yang Hongyang, xen-devel Cc: wei.liu2, ian.campbell, wency, guijianfeng, yunhong.jiang, eddie.dong, rshriram, ian.jackson On 14/05/2015 06:42, Yang Hongyang wrote: > >>> @@ -541,7 +618,27 @@ static int restore(struct xc_sr_context *ctx) >>> { >>> rc = read_record(ctx, &rec); >>> if ( rc ) >>> - goto err; >>> + { >>> + if ( ctx->restore.buffer_all_records ) >>> + goto err_buf; >> >> Please can we choose a label sufficiently different to "err". >> >> "resume_from_checkpoint" perhaps? > > I think "remus_failover" is better? If you don't have objections, I will > use it as the label. Fine by me. ~Andrew ^ permalink raw reply [flat|nested] 13+ messages in thread