* [PATCH] libxl/remus: fix the return value of the checkpoint callback
@ 2015-07-15 10:32 Yang Hongyang
2015-07-15 12:13 ` Ian Campbell
0 siblings, 1 reply; 10+ messages in thread
From: Yang Hongyang @ 2015-07-15 10:32 UTC (permalink / raw)
To: xen-devel; +Cc: Wei Liu, Yang Hongyang, Ian Jackson, Ian Campbell
In checkpoint callback, we wait for the interval and then start
another checkpoint, so the ERROR_TIMEDOUT should be intended
and should not treat as error.
This patch is based on
[PATCH v8 --for 4.6 COLO 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
---
tools/libxl/libxl_remus.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
index 46dcc3c..ffc92a7 100644
--- a/tools/libxl/libxl_remus.c
+++ b/tools/libxl/libxl_remus.c
@@ -355,11 +355,14 @@ static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
* (xc_domain_save.c). in order to continue executing the infinite loop
* (suspend, checkpoint, resume) in xc_domain_save().
*/
-
- if (rc)
+ if (rc == ERROR_TIMEDOUT) {
+ /* This is intended, we set the timeout and start another checkpoint */
+ libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, 1);
+ } else {
dss->rc = rc;
-
- libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, !rc);
+ libxl__xc_domain_saverestore_async_callback_done(egc,
+ &dss->sws.shs, !rc);
+ }
}
/*---------------------- remus callbacks (restore) -----------------------*/
--
1.9.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] libxl/remus: fix the return value of the checkpoint callback
2015-07-15 10:32 [PATCH] libxl/remus: fix the return value of the checkpoint callback Yang Hongyang
@ 2015-07-15 12:13 ` Ian Campbell
2015-07-15 12:30 ` Yang Hongyang
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Ian Campbell @ 2015-07-15 12:13 UTC (permalink / raw)
To: Yang Hongyang; +Cc: Wei Liu, Ian Jackson, xen-devel
On Wed, 2015-07-15 at 18:32 +0800, Yang Hongyang wrote:
> In checkpoint callback, we wait for the interval and then start
> another checkpoint, so the ERROR_TIMEDOUT should be intended
> and should not treat as error.
>
> This patch is based on
> [PATCH v8 --for 4.6 COLO 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service
Does that mean it won't apply to current staging?
I think we probably want this fix ASAP rather than waiting for that
series?
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
> CC: Ian Campbell <Ian.Campbell@citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> ---
> tools/libxl/libxl_remus.c | 11 +++++++----
> 1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
> index 46dcc3c..ffc92a7 100644
> --- a/tools/libxl/libxl_remus.c
> +++ b/tools/libxl/libxl_remus.c
> @@ -355,11 +355,14 @@ static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
> * (xc_domain_save.c). in order to continue executing the infinite loop
> * (suspend, checkpoint, resume) in xc_domain_save().
> */
> -
> - if (rc)
> + if (rc == ERROR_TIMEDOUT) {
> + /* This is intended, we set the timeout and start another checkpoint */
> + libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, 1);
Please wrap this (slightly) overlong line (and probably the comment too
which is borderline AFAICT).
> + } else {
> dss->rc = rc;
> -
> - libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, !rc);
> + libxl__xc_domain_saverestore_async_callback_done(egc,
> + &dss->sws.shs, !rc);
> + }
> }
>
> /*---------------------- remus callbacks (restore) -----------------------*/
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] libxl/remus: fix the return value of the checkpoint callback
2015-07-15 12:13 ` Ian Campbell
@ 2015-07-15 12:30 ` Yang Hongyang
2015-07-15 13:35 ` [PATCH] libxl: events: Do not abort remus with ERROR_TIMEOUT Ian Jackson
2015-07-15 13:37 ` [PATCH] libxl/remus: fix the return value of the checkpoint callback Ian Jackson
2 siblings, 0 replies; 10+ messages in thread
From: Yang Hongyang @ 2015-07-15 12:30 UTC (permalink / raw)
To: Ian Campbell; +Cc: Wei Liu, Ian Jackson, xen-devel
On 07/15/2015 08:13 PM, Ian Campbell wrote:
> On Wed, 2015-07-15 at 18:32 +0800, Yang Hongyang wrote:
>> In checkpoint callback, we wait for the interval and then start
>> another checkpoint, so the ERROR_TIMEDOUT should be intended
>> and should not treat as error.
>>
>> This patch is based on
>> [PATCH v8 --for 4.6 COLO 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service
>
> Does that mean it won't apply to current staging?
No. This can apply on top of colo pre series.
>
> I think we probably want this fix ASAP rather than waiting for that
> series?
I can resubmit the patch apply to staging, but the colo pre series will need
to be rebased...
>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
>> CC: Ian Campbell <Ian.Campbell@citrix.com>
>> CC: Wei Liu <wei.liu2@citrix.com>
>> ---
>> tools/libxl/libxl_remus.c | 11 +++++++----
>> 1 file changed, 7 insertions(+), 4 deletions(-)
>>
>> diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
>> index 46dcc3c..ffc92a7 100644
>> --- a/tools/libxl/libxl_remus.c
>> +++ b/tools/libxl/libxl_remus.c
>> @@ -355,11 +355,14 @@ static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
>> * (xc_domain_save.c). in order to continue executing the infinite loop
>> * (suspend, checkpoint, resume) in xc_domain_save().
>> */
>> -
>> - if (rc)
>> + if (rc == ERROR_TIMEDOUT) {
>> + /* This is intended, we set the timeout and start another checkpoint */
>> + libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, 1);
>
> Please wrap this (slightly) overlong line (and probably the comment too
> which is borderline AFAICT).
>
>> + } else {
>> dss->rc = rc;
>> -
>> - libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, !rc);
>> + libxl__xc_domain_saverestore_async_callback_done(egc,
>> + &dss->sws.shs, !rc);
>> + }
>> }
>>
>> /*---------------------- remus callbacks (restore) -----------------------*/
>
>
> .
>
--
Thanks,
Yang.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH] libxl: events: Do not abort remus with ERROR_TIMEOUT
2015-07-15 12:13 ` Ian Campbell
2015-07-15 12:30 ` Yang Hongyang
@ 2015-07-15 13:35 ` Ian Jackson
2015-07-15 13:45 ` Ian Campbell
` (2 more replies)
2015-07-15 13:37 ` [PATCH] libxl/remus: fix the return value of the checkpoint callback Ian Jackson
2 siblings, 3 replies; 10+ messages in thread
From: Ian Jackson @ 2015-07-15 13:35 UTC (permalink / raw)
To: xen-devel; +Cc: Wei Liu, Yang Hongyang, Ian Jackson, Ian Campbell
When the timeout set for prompting the next remus iteration fires, we
should not treat the ERROR_TIMEDOUT as an error.
Bug in 31c836f4 "libxl: events: Permit timeouts to signal ao abort".
Reported-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Yang Hongyang <yanghy@cn.fujitsu.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
---
tools/libxl/libxl_dom.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 81adb3d..4cb247a 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -2024,6 +2024,9 @@ static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
STATE_AO_GC(dss->ao);
+ if (rc == ERROR_TIMEDOUT) /* As intended */
+ rc = 0;
+
/*
* Time to checkpoint the guest again. We return 1 to libxc
* (xc_domain_save.c). in order to continue executing the infinite loop
--
1.7.10.4
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] libxl/remus: fix the return value of the checkpoint callback
2015-07-15 12:13 ` Ian Campbell
2015-07-15 12:30 ` Yang Hongyang
2015-07-15 13:35 ` [PATCH] libxl: events: Do not abort remus with ERROR_TIMEOUT Ian Jackson
@ 2015-07-15 13:37 ` Ian Jackson
2015-07-16 2:45 ` Yang Hongyang
2 siblings, 1 reply; 10+ messages in thread
From: Ian Jackson @ 2015-07-15 13:37 UTC (permalink / raw)
To: Ian Campbell; +Cc: Yang Hongyang, Wei Liu, xen-devel
Ian Campbell writes ("Re: [PATCH] libxl/remus: fix the return value of the checkpoint callback"):
> Does that mean it won't apply to current staging?
Indeed it doesn't.
> I think we probably want this fix ASAP rather than waiting for that
> series?
Yes. Patch just sent. Untested but fairly obvious. Yang, do you
want to test this, or do you want us to apply it as-is ? I don't have
a remus test setup.
This is the second rc-handling bug in 31c836f4 "libxl: events: Permit
timeouts to signal ao abort". I am going to re-read that patch to see
if I can find any more.
Ian.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] libxl: events: Do not abort remus with ERROR_TIMEOUT
2015-07-15 13:35 ` [PATCH] libxl: events: Do not abort remus with ERROR_TIMEOUT Ian Jackson
@ 2015-07-15 13:45 ` Ian Campbell
2015-07-15 13:47 ` Wei Liu
2015-07-16 2:44 ` Yang Hongyang
2 siblings, 0 replies; 10+ messages in thread
From: Ian Campbell @ 2015-07-15 13:45 UTC (permalink / raw)
To: Ian Jackson; +Cc: xen-devel, Wei Liu, Yang Hongyang
On Wed, 2015-07-15 at 14:35 +0100, Ian Jackson wrote:
> When the timeout set for prompting the next remus iteration fires, we
> should not treat the ERROR_TIMEDOUT as an error.
>
> Bug in 31c836f4 "libxl: events: Permit timeouts to signal ao abort".
>
> Reported-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> CC: Yang Hongyang <yanghy@cn.fujitsu.com>
> CC: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] libxl: events: Do not abort remus with ERROR_TIMEOUT
2015-07-15 13:35 ` [PATCH] libxl: events: Do not abort remus with ERROR_TIMEOUT Ian Jackson
2015-07-15 13:45 ` Ian Campbell
@ 2015-07-15 13:47 ` Wei Liu
2015-07-16 2:44 ` Yang Hongyang
2 siblings, 0 replies; 10+ messages in thread
From: Wei Liu @ 2015-07-15 13:47 UTC (permalink / raw)
To: Ian Jackson; +Cc: xen-devel, Wei Liu, Ian Campbell, Yang Hongyang
On Wed, Jul 15, 2015 at 02:35:56PM +0100, Ian Jackson wrote:
> When the timeout set for prompting the next remus iteration fires, we
> should not treat the ERROR_TIMEDOUT as an error.
>
> Bug in 31c836f4 "libxl: events: Permit timeouts to signal ao abort".
>
> Reported-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> CC: Yang Hongyang <yanghy@cn.fujitsu.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> CC: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
> ---
> tools/libxl/libxl_dom.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index 81adb3d..4cb247a 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -2024,6 +2024,9 @@ static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
>
> STATE_AO_GC(dss->ao);
>
> + if (rc == ERROR_TIMEDOUT) /* As intended */
> + rc = 0;
> +
> /*
> * Time to checkpoint the guest again. We return 1 to libxc
> * (xc_domain_save.c). in order to continue executing the infinite loop
> --
> 1.7.10.4
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] libxl: events: Do not abort remus with ERROR_TIMEOUT
2015-07-15 13:35 ` [PATCH] libxl: events: Do not abort remus with ERROR_TIMEOUT Ian Jackson
2015-07-15 13:45 ` Ian Campbell
2015-07-15 13:47 ` Wei Liu
@ 2015-07-16 2:44 ` Yang Hongyang
2015-07-16 15:51 ` Ian Campbell
2 siblings, 1 reply; 10+ messages in thread
From: Yang Hongyang @ 2015-07-16 2:44 UTC (permalink / raw)
To: Ian Jackson, xen-devel; +Cc: Wei Liu, Ian Campbell
On 07/15/2015 09:35 PM, Ian Jackson wrote:
> When the timeout set for prompting the next remus iteration fires, we
> should not treat the ERROR_TIMEDOUT as an error.
>
> Bug in 31c836f4 "libxl: events: Permit timeouts to signal ao abort".
>
> Reported-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> CC: Yang Hongyang <yanghy@cn.fujitsu.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> CC: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> ---
> tools/libxl/libxl_dom.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index 81adb3d..4cb247a 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -2024,6 +2024,9 @@ static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
>
> STATE_AO_GC(dss->ao);
>
> + if (rc == ERROR_TIMEDOUT) /* As intended */
> + rc = 0;
> +
> /*
> * Time to checkpoint the guest again. We return 1 to libxc
> * (xc_domain_save.c). in order to continue executing the infinite loop
>
--
Thanks,
Yang.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] libxl/remus: fix the return value of the checkpoint callback
2015-07-15 13:37 ` [PATCH] libxl/remus: fix the return value of the checkpoint callback Ian Jackson
@ 2015-07-16 2:45 ` Yang Hongyang
0 siblings, 0 replies; 10+ messages in thread
From: Yang Hongyang @ 2015-07-16 2:45 UTC (permalink / raw)
To: Ian Jackson, Ian Campbell; +Cc: Wei Liu, xen-devel
On 07/15/2015 09:37 PM, Ian Jackson wrote:
> Ian Campbell writes ("Re: [PATCH] libxl/remus: fix the return value of the checkpoint callback"):
>> Does that mean it won't apply to current staging?
>
> Indeed it doesn't.
>
>> I think we probably want this fix ASAP rather than waiting for that
>> series?
>
> Yes. Patch just sent. Untested but fairly obvious. Yang, do you
> want to test this, or do you want us to apply it as-is ? I don't have
> a remus test setup.
Please apply, thanks!
>
> This is the second rc-handling bug in 31c836f4 "libxl: events: Permit
> timeouts to signal ao abort". I am going to re-read that patch to see
> if I can find any more.
>
> Ian.
> .
>
--
Thanks,
Yang.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] libxl: events: Do not abort remus with ERROR_TIMEOUT
2015-07-16 2:44 ` Yang Hongyang
@ 2015-07-16 15:51 ` Ian Campbell
0 siblings, 0 replies; 10+ messages in thread
From: Ian Campbell @ 2015-07-16 15:51 UTC (permalink / raw)
To: Yang Hongyang; +Cc: Wei Liu, xen-devel, Ian Jackson
On Thu, 2015-07-16 at 10:44 +0800, Yang Hongyang wrote:
>
> On 07/15/2015 09:35 PM, Ian Jackson wrote:
> > When the timeout set for prompting the next remus iteration fires, we
> > should not treat the ERROR_TIMEDOUT as an error.
> >
> > Bug in 31c836f4 "libxl: events: Permit timeouts to signal ao abort".
> >
> > Reported-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> > Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> > CC: Yang Hongyang <yanghy@cn.fujitsu.com>
> > CC: Wei Liu <wei.liu2@citrix.com>
> > CC: Ian Campbell <ian.campbell@citrix.com>
>
> Acked-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Applied, thanks.
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2015-07-16 15:51 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-15 10:32 [PATCH] libxl/remus: fix the return value of the checkpoint callback Yang Hongyang
2015-07-15 12:13 ` Ian Campbell
2015-07-15 12:30 ` Yang Hongyang
2015-07-15 13:35 ` [PATCH] libxl: events: Do not abort remus with ERROR_TIMEOUT Ian Jackson
2015-07-15 13:45 ` Ian Campbell
2015-07-15 13:47 ` Wei Liu
2015-07-16 2:44 ` Yang Hongyang
2015-07-16 15:51 ` Ian Campbell
2015-07-15 13:37 ` [PATCH] libxl/remus: fix the return value of the checkpoint callback Ian Jackson
2015-07-16 2:45 ` Yang Hongyang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).