xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] libxl/remus: fix the return value of the checkpoint callback
@ 2015-07-15 10:32 Yang Hongyang
  2015-07-15 12:13 ` Ian Campbell
  0 siblings, 1 reply; 10+ messages in thread
From: Yang Hongyang @ 2015-07-15 10:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Yang Hongyang, Ian Jackson, Ian Campbell

In checkpoint callback, we wait for the interval and then start
another checkpoint, so the ERROR_TIMEDOUT should be intended
and should not treat as error.

This patch is based on
[PATCH v8 --for 4.6 COLO 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl_remus.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
index 46dcc3c..ffc92a7 100644
--- a/tools/libxl/libxl_remus.c
+++ b/tools/libxl/libxl_remus.c
@@ -355,11 +355,14 @@ static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
      * (xc_domain_save.c). in order to continue executing the infinite loop
      * (suspend, checkpoint, resume) in xc_domain_save().
      */
-
-    if (rc)
+    if (rc == ERROR_TIMEDOUT) {
+        /* This is intended, we set the timeout and start another checkpoint */
+        libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, 1);
+    } else {
         dss->rc = rc;
-
-    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, !rc);
+        libxl__xc_domain_saverestore_async_callback_done(egc,
+                                                         &dss->sws.shs, !rc);
+    }
 }
 
 /*---------------------- remus callbacks (restore) -----------------------*/
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] libxl/remus: fix the return value of the checkpoint callback
  2015-07-15 10:32 [PATCH] libxl/remus: fix the return value of the checkpoint callback Yang Hongyang
@ 2015-07-15 12:13 ` Ian Campbell
  2015-07-15 12:30   ` Yang Hongyang
                     ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Ian Campbell @ 2015-07-15 12:13 UTC (permalink / raw)
  To: Yang Hongyang; +Cc: Wei Liu, Ian Jackson, xen-devel

On Wed, 2015-07-15 at 18:32 +0800, Yang Hongyang wrote:
> In checkpoint callback, we wait for the interval and then start
> another checkpoint, so the ERROR_TIMEDOUT should be intended
> and should not treat as error.
> 
> This patch is based on
> [PATCH v8 --for 4.6 COLO 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service

Does that mean it won't apply to current staging?

I think we probably want this fix ASAP rather than waiting for that
series?

> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
> CC: Ian Campbell <Ian.Campbell@citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> ---
>  tools/libxl/libxl_remus.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
> index 46dcc3c..ffc92a7 100644
> --- a/tools/libxl/libxl_remus.c
> +++ b/tools/libxl/libxl_remus.c
> @@ -355,11 +355,14 @@ static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
>       * (xc_domain_save.c). in order to continue executing the infinite loop
>       * (suspend, checkpoint, resume) in xc_domain_save().
>       */
> -
> -    if (rc)
> +    if (rc == ERROR_TIMEDOUT) {
> +        /* This is intended, we set the timeout and start another checkpoint */
> +        libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, 1);

Please wrap this (slightly) overlong line (and probably the comment too
which is borderline AFAICT).

> +    } else {
>          dss->rc = rc;
> -
> -    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, !rc);
> +        libxl__xc_domain_saverestore_async_callback_done(egc,
> +                                                         &dss->sws.shs, !rc);
> +    }
>  }
>  
>  /*---------------------- remus callbacks (restore) -----------------------*/

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] libxl/remus: fix the return value of the checkpoint callback
  2015-07-15 12:13 ` Ian Campbell
@ 2015-07-15 12:30   ` Yang Hongyang
  2015-07-15 13:35   ` [PATCH] libxl: events: Do not abort remus with ERROR_TIMEOUT Ian Jackson
  2015-07-15 13:37   ` [PATCH] libxl/remus: fix the return value of the checkpoint callback Ian Jackson
  2 siblings, 0 replies; 10+ messages in thread
From: Yang Hongyang @ 2015-07-15 12:30 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Wei Liu, Ian Jackson, xen-devel



On 07/15/2015 08:13 PM, Ian Campbell wrote:
> On Wed, 2015-07-15 at 18:32 +0800, Yang Hongyang wrote:
>> In checkpoint callback, we wait for the interval and then start
>> another checkpoint, so the ERROR_TIMEDOUT should be intended
>> and should not treat as error.
>>
>> This patch is based on
>> [PATCH v8 --for 4.6 COLO 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service
>
> Does that mean it won't apply to current staging?

No. This can apply on top of colo pre series.

>
> I think we probably want this fix ASAP rather than waiting for that
> series?

I can resubmit the patch apply to staging, but the colo pre series will need
to be rebased...

>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
>> CC: Ian Campbell <Ian.Campbell@citrix.com>
>> CC: Wei Liu <wei.liu2@citrix.com>
>> ---
>>   tools/libxl/libxl_remus.c | 11 +++++++----
>>   1 file changed, 7 insertions(+), 4 deletions(-)
>>
>> diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
>> index 46dcc3c..ffc92a7 100644
>> --- a/tools/libxl/libxl_remus.c
>> +++ b/tools/libxl/libxl_remus.c
>> @@ -355,11 +355,14 @@ static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
>>        * (xc_domain_save.c). in order to continue executing the infinite loop
>>        * (suspend, checkpoint, resume) in xc_domain_save().
>>        */
>> -
>> -    if (rc)
>> +    if (rc == ERROR_TIMEDOUT) {
>> +        /* This is intended, we set the timeout and start another checkpoint */
>> +        libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, 1);
>
> Please wrap this (slightly) overlong line (and probably the comment too
> which is borderline AFAICT).
>
>> +    } else {
>>           dss->rc = rc;
>> -
>> -    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, !rc);
>> +        libxl__xc_domain_saverestore_async_callback_done(egc,
>> +                                                         &dss->sws.shs, !rc);
>> +    }
>>   }
>>
>>   /*---------------------- remus callbacks (restore) -----------------------*/
>
>
> .
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH] libxl: events: Do not abort remus with ERROR_TIMEOUT
  2015-07-15 12:13 ` Ian Campbell
  2015-07-15 12:30   ` Yang Hongyang
@ 2015-07-15 13:35   ` Ian Jackson
  2015-07-15 13:45     ` Ian Campbell
                       ` (2 more replies)
  2015-07-15 13:37   ` [PATCH] libxl/remus: fix the return value of the checkpoint callback Ian Jackson
  2 siblings, 3 replies; 10+ messages in thread
From: Ian Jackson @ 2015-07-15 13:35 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Yang Hongyang, Ian Jackson, Ian Campbell

When the timeout set for prompting the next remus iteration fires, we
should not treat the ERROR_TIMEDOUT as an error.

Bug in 31c836f4 "libxl: events: Permit timeouts to signal ao abort".

Reported-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Yang Hongyang <yanghy@cn.fujitsu.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
---
 tools/libxl/libxl_dom.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 81adb3d..4cb247a 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -2024,6 +2024,9 @@ static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
 
     STATE_AO_GC(dss->ao);
 
+    if (rc == ERROR_TIMEDOUT) /* As intended */
+        rc = 0;
+
     /*
      * Time to checkpoint the guest again. We return 1 to libxc
      * (xc_domain_save.c). in order to continue executing the infinite loop
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] libxl/remus: fix the return value of the checkpoint callback
  2015-07-15 12:13 ` Ian Campbell
  2015-07-15 12:30   ` Yang Hongyang
  2015-07-15 13:35   ` [PATCH] libxl: events: Do not abort remus with ERROR_TIMEOUT Ian Jackson
@ 2015-07-15 13:37   ` Ian Jackson
  2015-07-16  2:45     ` Yang Hongyang
  2 siblings, 1 reply; 10+ messages in thread
From: Ian Jackson @ 2015-07-15 13:37 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Yang Hongyang, Wei Liu, xen-devel

Ian Campbell writes ("Re: [PATCH] libxl/remus: fix the return value of the checkpoint callback"):
> Does that mean it won't apply to current staging?

Indeed it doesn't.

> I think we probably want this fix ASAP rather than waiting for that
> series?

Yes.  Patch just sent.  Untested but fairly obvious.  Yang, do you
want to test this, or do you want us to apply it as-is ?  I don't have
a remus test setup.

This is the second rc-handling bug in 31c836f4 "libxl: events: Permit
timeouts to signal ao abort".  I am going to re-read that patch to see
if I can find any more.

Ian.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] libxl: events: Do not abort remus with ERROR_TIMEOUT
  2015-07-15 13:35   ` [PATCH] libxl: events: Do not abort remus with ERROR_TIMEOUT Ian Jackson
@ 2015-07-15 13:45     ` Ian Campbell
  2015-07-15 13:47     ` Wei Liu
  2015-07-16  2:44     ` Yang Hongyang
  2 siblings, 0 replies; 10+ messages in thread
From: Ian Campbell @ 2015-07-15 13:45 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Wei Liu, Yang Hongyang

On Wed, 2015-07-15 at 14:35 +0100, Ian Jackson wrote:
> When the timeout set for prompting the next remus iteration fires, we
> should not treat the ERROR_TIMEDOUT as an error.
> 
> Bug in 31c836f4 "libxl: events: Permit timeouts to signal ao abort".
> 
> Reported-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> CC: Yang Hongyang <yanghy@cn.fujitsu.com>
> CC: Wei Liu <wei.liu2@citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] libxl: events: Do not abort remus with ERROR_TIMEOUT
  2015-07-15 13:35   ` [PATCH] libxl: events: Do not abort remus with ERROR_TIMEOUT Ian Jackson
  2015-07-15 13:45     ` Ian Campbell
@ 2015-07-15 13:47     ` Wei Liu
  2015-07-16  2:44     ` Yang Hongyang
  2 siblings, 0 replies; 10+ messages in thread
From: Wei Liu @ 2015-07-15 13:47 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Wei Liu, Ian Campbell, Yang Hongyang

On Wed, Jul 15, 2015 at 02:35:56PM +0100, Ian Jackson wrote:
> When the timeout set for prompting the next remus iteration fires, we
> should not treat the ERROR_TIMEDOUT as an error.
> 
> Bug in 31c836f4 "libxl: events: Permit timeouts to signal ao abort".
> 
> Reported-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> CC: Yang Hongyang <yanghy@cn.fujitsu.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> CC: Ian Campbell <ian.campbell@citrix.com>

Acked-by: Wei Liu <wei.liu2@citrix.com>

> ---
>  tools/libxl/libxl_dom.c |    3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index 81adb3d..4cb247a 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -2024,6 +2024,9 @@ static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
>  
>      STATE_AO_GC(dss->ao);
>  
> +    if (rc == ERROR_TIMEDOUT) /* As intended */
> +        rc = 0;
> +
>      /*
>       * Time to checkpoint the guest again. We return 1 to libxc
>       * (xc_domain_save.c). in order to continue executing the infinite loop
> -- 
> 1.7.10.4

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] libxl: events: Do not abort remus with ERROR_TIMEOUT
  2015-07-15 13:35   ` [PATCH] libxl: events: Do not abort remus with ERROR_TIMEOUT Ian Jackson
  2015-07-15 13:45     ` Ian Campbell
  2015-07-15 13:47     ` Wei Liu
@ 2015-07-16  2:44     ` Yang Hongyang
  2015-07-16 15:51       ` Ian Campbell
  2 siblings, 1 reply; 10+ messages in thread
From: Yang Hongyang @ 2015-07-16  2:44 UTC (permalink / raw)
  To: Ian Jackson, xen-devel; +Cc: Wei Liu, Ian Campbell



On 07/15/2015 09:35 PM, Ian Jackson wrote:
> When the timeout set for prompting the next remus iteration fires, we
> should not treat the ERROR_TIMEDOUT as an error.
>
> Bug in 31c836f4 "libxl: events: Permit timeouts to signal ao abort".
>
> Reported-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> CC: Yang Hongyang <yanghy@cn.fujitsu.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> CC: Ian Campbell <ian.campbell@citrix.com>

Acked-by: Yang Hongyang <yanghy@cn.fujitsu.com>

> ---
>   tools/libxl/libxl_dom.c |    3 +++
>   1 file changed, 3 insertions(+)
>
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index 81adb3d..4cb247a 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -2024,6 +2024,9 @@ static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
>
>       STATE_AO_GC(dss->ao);
>
> +    if (rc == ERROR_TIMEDOUT) /* As intended */
> +        rc = 0;
> +
>       /*
>        * Time to checkpoint the guest again. We return 1 to libxc
>        * (xc_domain_save.c). in order to continue executing the infinite loop
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] libxl/remus: fix the return value of the checkpoint callback
  2015-07-15 13:37   ` [PATCH] libxl/remus: fix the return value of the checkpoint callback Ian Jackson
@ 2015-07-16  2:45     ` Yang Hongyang
  0 siblings, 0 replies; 10+ messages in thread
From: Yang Hongyang @ 2015-07-16  2:45 UTC (permalink / raw)
  To: Ian Jackson, Ian Campbell; +Cc: Wei Liu, xen-devel



On 07/15/2015 09:37 PM, Ian Jackson wrote:
> Ian Campbell writes ("Re: [PATCH] libxl/remus: fix the return value of the checkpoint callback"):
>> Does that mean it won't apply to current staging?
>
> Indeed it doesn't.
>
>> I think we probably want this fix ASAP rather than waiting for that
>> series?
>
> Yes.  Patch just sent.  Untested but fairly obvious.  Yang, do you
> want to test this, or do you want us to apply it as-is ?  I don't have
> a remus test setup.

Please apply, thanks!

>
> This is the second rc-handling bug in 31c836f4 "libxl: events: Permit
> timeouts to signal ao abort".  I am going to re-read that patch to see
> if I can find any more.
>
> Ian.
> .
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] libxl: events: Do not abort remus with ERROR_TIMEOUT
  2015-07-16  2:44     ` Yang Hongyang
@ 2015-07-16 15:51       ` Ian Campbell
  0 siblings, 0 replies; 10+ messages in thread
From: Ian Campbell @ 2015-07-16 15:51 UTC (permalink / raw)
  To: Yang Hongyang; +Cc: Wei Liu, xen-devel, Ian Jackson

On Thu, 2015-07-16 at 10:44 +0800, Yang Hongyang wrote:
> 
> On 07/15/2015 09:35 PM, Ian Jackson wrote:
> > When the timeout set for prompting the next remus iteration fires, we
> > should not treat the ERROR_TIMEDOUT as an error.
> >
> > Bug in 31c836f4 "libxl: events: Permit timeouts to signal ao abort".
> >
> > Reported-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> > Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> > CC: Yang Hongyang <yanghy@cn.fujitsu.com>
> > CC: Wei Liu <wei.liu2@citrix.com>
> > CC: Ian Campbell <ian.campbell@citrix.com>
> 
> Acked-by: Yang Hongyang <yanghy@cn.fujitsu.com>

Applied, thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2015-07-16 15:51 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-15 10:32 [PATCH] libxl/remus: fix the return value of the checkpoint callback Yang Hongyang
2015-07-15 12:13 ` Ian Campbell
2015-07-15 12:30   ` Yang Hongyang
2015-07-15 13:35   ` [PATCH] libxl: events: Do not abort remus with ERROR_TIMEOUT Ian Jackson
2015-07-15 13:45     ` Ian Campbell
2015-07-15 13:47     ` Wei Liu
2015-07-16  2:44     ` Yang Hongyang
2015-07-16 15:51       ` Ian Campbell
2015-07-15 13:37   ` [PATCH] libxl/remus: fix the return value of the checkpoint callback Ian Jackson
2015-07-16  2:45     ` Yang Hongyang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).