All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/5] migration/remus: bug fix and cleanup
@ 2016-01-08  6:38 Wen Congyang
  2016-01-08  6:38 ` [PATCH v3 1/5] remus: don't call stream_continue() when doing failover Wen Congyang
                   ` (4 more replies)
  0 siblings, 5 replies; 24+ messages in thread
From: Wen Congyang @ 2016-01-08  6:38 UTC (permalink / raw)
  To: xen devel, Andrew Cooper
  Cc: Changlong Xie, Wei Liu, Ian Campbell, Wen Congyang, Ian Jackson,
	Shriram Rajagopalan, Yang Hongyang


Wen Congyang (5):
  remus: don't call stream_continue() when doing failover
  remus: resume immediately if libxl__xc_domain_save_done() completes
  tools/libxc: don't send end record if remus fails
  tools/libxc: error handling for the postcopy() callback
  tools/libxl: remove unused function libxl__domain_save_device_model()

 tools/libxc/xc_sr_save.c         |  6 ++-
 tools/libxl/libxl.c              |  4 --
 tools/libxl/libxl_dom.c          | 91 ----------------------------------------
 tools/libxl/libxl_internal.h     |  6 ---
 tools/libxl/libxl_stream_read.c  | 21 +++++++---
 tools/libxl/libxl_stream_write.c | 13 +++++-
 6 files changed, 31 insertions(+), 110 deletions(-)

-- 
2.5.0

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v3 1/5] remus: don't call stream_continue() when doing failover
  2016-01-08  6:38 [PATCH v3 0/5] migration/remus: bug fix and cleanup Wen Congyang
@ 2016-01-08  6:38 ` Wen Congyang
  2016-01-08 16:20   ` Ian Campbell
  2016-01-26  6:37   ` Yang Hongyang
  2016-01-08  6:38 ` [PATCH v3 2/5] remus: resume immediately if libxl__xc_domain_save_done() completes Wen Congyang
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 24+ messages in thread
From: Wen Congyang @ 2016-01-08  6:38 UTC (permalink / raw)
  To: xen devel, Andrew Cooper
  Cc: Changlong Xie, Wei Liu, Ian Campbell, Wen Congyang, Ian Jackson,
	Shriram Rajagopalan, Yang Hongyang

stream_continue() is used for migration to read emulator
xenstore data and emulator context. For remus, if we do
failover, we have read it in the checkpoint cycle, and
we only need to complete the stream.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 tools/libxl/libxl_stream_read.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
index 258dec4..65219d5 100644
--- a/tools/libxl/libxl_stream_read.c
+++ b/tools/libxl/libxl_stream_read.c
@@ -758,6 +758,9 @@ void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
     libxl__stream_read_state *stream = &dcs->srs;
     STATE_AO_GC(dcs->ao);
 
+    /* convenience aliases */
+    const int checkpointed_stream = dcs->restore_params.checkpointed_stream;
+
     if (rc)
         goto err;
 
@@ -777,11 +780,19 @@ void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
      * If the stream is not still alive, we must not continue any work.
      */
     if (libxl__stream_read_inuse(stream)) {
-        /*
-         * Libxc has indicated that it is done with the stream.  Resume reading
-         * libxl records from it.
-         */
-        stream_continue(egc, stream);
+        if (checkpointed_stream) {
+            /*
+             * Failover from primary. Domain state is currently at a
+             * consistent checkpoint, ready to go.
+             */
+            stream_complete(egc, stream, 0);
+        } else {
+            /*
+             * Libxc has indicated that it is done with the stream.
+             * Resume reading libxl records from it.
+             */
+            stream_continue(egc, stream);
+        }
     }
 }
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 2/5] remus: resume immediately if libxl__xc_domain_save_done() completes
  2016-01-08  6:38 [PATCH v3 0/5] migration/remus: bug fix and cleanup Wen Congyang
  2016-01-08  6:38 ` [PATCH v3 1/5] remus: don't call stream_continue() when doing failover Wen Congyang
@ 2016-01-08  6:38 ` Wen Congyang
  2016-01-08  9:52   ` Andrew Cooper
                     ` (2 more replies)
  2016-01-08  6:38 ` [PATCH v3 3/5] tools/libxc: don't send end record if remus fails Wen Congyang
                   ` (2 subsequent siblings)
  4 siblings, 3 replies; 24+ messages in thread
From: Wen Congyang @ 2016-01-08  6:38 UTC (permalink / raw)
  To: xen devel, Andrew Cooper
  Cc: Changlong Xie, Wei Liu, Ian Campbell, Wen Congyang, Ian Jackson,
	Shriram Rajagopalan, Yang Hongyang

For example: if the secondary host is down, and we fail to send the data to
the secondary host. xc_domain_save() returns 0. So in the function
libxl__xc_domain_save_done(), rc is 0(the helper program exits normally),
and retval is 0(it is xc_domain_save()'s return value). In such case, we
just need to complete the stream.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl_stream_write.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/tools/libxl/libxl_stream_write.c b/tools/libxl/libxl_stream_write.c
index 80d9208..82e7719 100644
--- a/tools/libxl/libxl_stream_write.c
+++ b/tools/libxl/libxl_stream_write.c
@@ -354,8 +354,17 @@ void libxl__xc_domain_save_done(libxl__egc *egc, void *dss_void,
      * alive, and check_all_finished() may have torn it down around us.
      * If the stream is not still alive, we must not continue any work.
      */
-    if (libxl__stream_write_inuse(stream))
-        write_emulator_xenstore_record(egc, stream);
+    if (libxl__stream_write_inuse(stream)) {
+        if (dss->remus)
+            /*
+             * For remus, if libxl__xc_domain_save_done() completes,
+             * there was an error sending data to the secondary.
+             * Resume the primary ASAP.
+             */
+            stream_complete(egc, stream, 0);
+        else
+            write_emulator_xenstore_record(egc, stream);
+    }
 }
 
 static void write_emulator_xenstore_record(libxl__egc *egc,
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 3/5] tools/libxc: don't send end record if remus fails
  2016-01-08  6:38 [PATCH v3 0/5] migration/remus: bug fix and cleanup Wen Congyang
  2016-01-08  6:38 ` [PATCH v3 1/5] remus: don't call stream_continue() when doing failover Wen Congyang
  2016-01-08  6:38 ` [PATCH v3 2/5] remus: resume immediately if libxl__xc_domain_save_done() completes Wen Congyang
@ 2016-01-08  6:38 ` Wen Congyang
  2016-01-26  6:44   ` Yang Hongyang
  2016-01-08  6:38 ` [PATCH v3 4/5] tools/libxc: error handling for the postcopy() callback Wen Congyang
  2016-01-08  6:38 ` [PATCH v3 5/5] tools/libxl: remove unused function libxl__domain_save_device_model() Wen Congyang
  4 siblings, 1 reply; 24+ messages in thread
From: Wen Congyang @ 2016-01-08  6:38 UTC (permalink / raw)
  To: xen devel, Andrew Cooper
  Cc: Changlong Xie, Wei Liu, Ian Campbell, Wen Congyang, Ian Jackson,
	Shriram Rajagopalan, Yang Hongyang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 tools/libxc/xc_sr_save.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index 88d85ef..e532168 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -795,7 +795,7 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type)
 
             rc = ctx->save.callbacks->checkpoint(ctx->save.callbacks->data);
             if ( rc <= 0 )
-                ctx->save.checkpointed = false;
+                goto err;
         }
     } while ( ctx->save.checkpointed );
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 4/5] tools/libxc: error handling for the postcopy() callback
  2016-01-08  6:38 [PATCH v3 0/5] migration/remus: bug fix and cleanup Wen Congyang
                   ` (2 preceding siblings ...)
  2016-01-08  6:38 ` [PATCH v3 3/5] tools/libxc: don't send end record if remus fails Wen Congyang
@ 2016-01-08  6:38 ` Wen Congyang
  2016-01-26  6:45   ` Yang Hongyang
  2016-01-08  6:38 ` [PATCH v3 5/5] tools/libxl: remove unused function libxl__domain_save_device_model() Wen Congyang
  4 siblings, 1 reply; 24+ messages in thread
From: Wen Congyang @ 2016-01-08  6:38 UTC (permalink / raw)
  To: xen devel, Andrew Cooper
  Cc: Changlong Xie, Wei Liu, Ian Campbell, Wen Congyang, Ian Jackson,
	Shriram Rajagopalan, Yang Hongyang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 tools/libxc/xc_sr_save.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index e532168..e4ba560 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -791,7 +791,9 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type)
             if ( rc )
                 goto err;
 
-            ctx->save.callbacks->postcopy(ctx->save.callbacks->data);
+            rc = ctx->save.callbacks->postcopy(ctx->save.callbacks->data);
+            if ( rc <= 0 )
+                goto err;
 
             rc = ctx->save.callbacks->checkpoint(ctx->save.callbacks->data);
             if ( rc <= 0 )
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 5/5] tools/libxl: remove unused function libxl__domain_save_device_model()
  2016-01-08  6:38 [PATCH v3 0/5] migration/remus: bug fix and cleanup Wen Congyang
                   ` (3 preceding siblings ...)
  2016-01-08  6:38 ` [PATCH v3 4/5] tools/libxc: error handling for the postcopy() callback Wen Congyang
@ 2016-01-08  6:38 ` Wen Congyang
  2016-01-08 16:30   ` Ian Campbell
  4 siblings, 1 reply; 24+ messages in thread
From: Wen Congyang @ 2016-01-08  6:38 UTC (permalink / raw)
  To: xen devel, Andrew Cooper
  Cc: Changlong Xie, Wei Liu, Ian Campbell, Wen Congyang, Ian Jackson,
	Shriram Rajagopalan, Yang Hongyang

After the commit d77570e7, libxl__domain_save_device_model() can
be dropped.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl.c          |  4 --
 tools/libxl/libxl_dom.c      | 91 --------------------------------------------
 tools/libxl/libxl_internal.h |  6 ---
 3 files changed, 101 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 9207621..c286881 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -1551,10 +1551,6 @@ static void stubdom_destroy_callback(libxl__egc *egc,
     dds->stubdom_finished = 1;
     savefile = libxl__device_model_savefile(gc, dis->domid);
     rc = libxl__remove_file(gc, savefile);
-    /*
-     * On suspend libxl__domain_save_device_model will have already
-     * unlinked the save file.
-     */
     if (rc) {
         LOG(ERROR, "failed to remove device-model savefile %s", savefile);
     }
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 47971a9..2269998 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1785,97 +1785,6 @@ static void stream_done(libxl__egc *egc,
     domain_save_done(egc, sws->dss, rc);
 }
 
-static void save_device_model_datacopier_done(libxl__egc *egc,
-     libxl__datacopier_state *dc, int rc, int onwrite, int errnoval);
-
-void libxl__domain_save_device_model(libxl__egc *egc,
-                                     libxl__domain_suspend_state *dss,
-                                     libxl__save_device_model_cb *callback)
-{
-    STATE_AO_GC(dss->ao);
-    struct stat st;
-    uint32_t qemu_state_len;
-    int rc;
-
-    dss->save_dm_callback = callback;
-
-    /* Convenience aliases */
-    const char *const filename = dss->dm_savefile;
-    const int fd = dss->fd;
-
-    libxl__datacopier_state *dc = &dss->save_dm_datacopier;
-    memset(dc, 0, sizeof(*dc));
-    dc->readwhat = GCSPRINTF("qemu save file %s", filename);
-    dc->ao = ao;
-    dc->readfd = -1;
-    dc->writefd = fd;
-    dc->maxsz = INT_MAX;
-    dc->bytes_to_read = -1;
-    dc->copywhat = GCSPRINTF("qemu save file for domain %"PRIu32, dss->domid);
-    dc->writewhat = "save/migration stream";
-    dc->callback = save_device_model_datacopier_done;
-
-    dc->readfd = open(filename, O_RDONLY);
-    if (dc->readfd < 0) {
-        LOGE(ERROR, "unable to open %s", dc->readwhat);
-        rc = ERROR_FAIL;
-        goto out;
-    }
-
-    if (fstat(dc->readfd, &st))
-    {
-        LOGE(ERROR, "unable to fstat %s", dc->readwhat);
-        rc = ERROR_FAIL;
-        goto out;
-    }
-
-    if (!S_ISREG(st.st_mode)) {
-        LOG(ERROR, "%s is not a plain file!", dc->readwhat);
-        rc = ERROR_FAIL;
-        goto out;
-    }
-
-    qemu_state_len = st.st_size;
-    LOG(DEBUG, "%s is %d bytes", dc->readwhat, qemu_state_len);
-
-    rc = libxl__datacopier_start(dc);
-    if (rc) goto out;
-
-    libxl__datacopier_prefixdata(egc, dc,
-                                 QEMU_SIGNATURE, strlen(QEMU_SIGNATURE));
-
-    libxl__datacopier_prefixdata(egc, dc,
-                                 &qemu_state_len, sizeof(qemu_state_len));
-    return;
-
- out:
-    save_device_model_datacopier_done(egc, dc, rc, -1, EIO);
-}
-
-static void save_device_model_datacopier_done(libxl__egc *egc,
-     libxl__datacopier_state *dc, int our_rc, int onwrite, int errnoval)
-{
-    libxl__domain_suspend_state *dss =
-        CONTAINER_OF(dc, *dss, save_dm_datacopier);
-    STATE_AO_GC(dss->ao);
-
-    /* Convenience aliases */
-    const char *const filename = dss->dm_savefile;
-    int rc;
-
-    libxl__datacopier_kill(dc);
-
-    if (dc->readfd >= 0) {
-        close(dc->readfd);
-        dc->readfd = -1;
-    }
-
-    rc = libxl__remove_file(gc, filename);
-    if (!our_rc) our_rc = rc;
-
-    dss->save_dm_callback(egc, dss, our_rc);
-}
-
 static void libxl__remus_teardown(libxl__egc *egc,
                                   libxl__domain_suspend_state *dss,
                                   int rc);
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index a556a38..233d44a 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3103,9 +3103,6 @@ struct libxl__domain_suspend_state {
     libxl__logdirty_switch logdirty;
     void (*callback_common_done)(libxl__egc*,
                                  struct libxl__domain_suspend_state*, int ok);
-    /* private for libxl__domain_save_device_model */
-    libxl__save_device_model_cb *save_dm_callback;
-    libxl__datacopier_state save_dm_datacopier;
 };
 
 
@@ -3498,9 +3495,6 @@ static inline bool libxl__save_helper_inuse(const libxl__save_helper_state *shs)
 /* Each time the dm needs to be saved, we must call suspend and then save */
 _hidden int libxl__domain_suspend_device_model(libxl__gc *gc,
                                            libxl__domain_suspend_state *dss);
-_hidden void libxl__domain_save_device_model(libxl__egc *egc,
-                                     libxl__domain_suspend_state *dss,
-                                     libxl__save_device_model_cb *callback);
 
 _hidden const char *libxl__device_model_savefile(libxl__gc *gc, uint32_t domid);
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 2/5] remus: resume immediately if libxl__xc_domain_save_done() completes
  2016-01-08  6:38 ` [PATCH v3 2/5] remus: resume immediately if libxl__xc_domain_save_done() completes Wen Congyang
@ 2016-01-08  9:52   ` Andrew Cooper
  2016-01-08 16:27   ` Ian Campbell
  2016-01-26  6:41   ` Yang Hongyang
  2 siblings, 0 replies; 24+ messages in thread
From: Andrew Cooper @ 2016-01-08  9:52 UTC (permalink / raw)
  To: Wen Congyang, xen devel
  Cc: Changlong Xie, Wei Liu, Ian Campbell, Ian Jackson,
	Shriram Rajagopalan, Yang Hongyang

On 08/01/16 06:38, Wen Congyang wrote:
> For example: if the secondary host is down, and we fail to send the data to
> the secondary host. xc_domain_save() returns 0. So in the function
> libxl__xc_domain_save_done(), rc is 0(the helper program exits normally),
> and retval is 0(it is xc_domain_save()'s return value). In such case, we
> just need to complete the stream.
>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 1/5] remus: don't call stream_continue() when doing failover
  2016-01-08  6:38 ` [PATCH v3 1/5] remus: don't call stream_continue() when doing failover Wen Congyang
@ 2016-01-08 16:20   ` Ian Campbell
  2016-01-12  1:36     ` Wen Congyang
  2016-01-26  6:37   ` Yang Hongyang
  1 sibling, 1 reply; 24+ messages in thread
From: Ian Campbell @ 2016-01-08 16:20 UTC (permalink / raw)
  To: Wen Congyang, xen devel, Andrew Cooper
  Cc: Shriram Rajagopalan, Wei Liu, Changlong Xie, Ian Jackson, Yang Hongyang

On Fri, 2016-01-08 at 14:38 +0800, Wen Congyang wrote:
> stream_continue() is used for migration to read emulator
> xenstore data and emulator context. For remus, if we do
> failover, we have read it in the checkpoint cycle, and
> we only need to complete the stream.
> 
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>  tools/libxl/libxl_stream_read.c | 21 ++++++++++++++++-----
>  1 file changed, 16 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/libxl/libxl_stream_read.c
> b/tools/libxl/libxl_stream_read.c
> index 258dec4..65219d5 100644
> --- a/tools/libxl/libxl_stream_read.c
> +++ b/tools/libxl/libxl_stream_read.c
> @@ -758,6 +758,9 @@ void libxl__xc_domain_restore_done(libxl__egc *egc,
> void *dcs_void,
>      libxl__stream_read_state *stream = &dcs->srs;
>      STATE_AO_GC(dcs->ao);
>  
> +    /* convenience aliases */
> +    const int checkpointed_stream = dcs-
> >restore_params.checkpointed_stream;
> +
>      if (rc)
>          goto err;
>  
> @@ -777,11 +780,19 @@ void libxl__xc_domain_restore_done(libxl__egc *egc,
> void *dcs_void,
>       * If the stream is not still alive, we must not continue any work.
>       */
>      if (libxl__stream_read_inuse(stream)) {
> -        /*
> -         * Libxc has indicated that it is done with the stream.  Resume
> reading
> -         * libxl records from it.
> -         */
> -        stream_continue(egc, stream);
> +        if (checkpointed_stream) {
> +            /*
> +             * Failover from primary. Domain state is currently at a
> +             * consistent checkpoint, ready to go.

This implies that the stream is currently at a consistent point. Whereas
what I think is meant is that things have failed (perhaps halfway through a
checkpoint, i.e. not at a consistent state), therefore we stop and continue
with the previous fully consistent checkpoint (which may have been earlier
in the stream, not at the current point). Is that right?

And what does "ready to go" mean? Does it mean that we will return back to
the next higher level or that we go somewhere else first?

The big comment about flow control at the top of this file doesn't seem to
cover the checkpoint case, if it did I suspect I would have found the
answers there.

> +             */
> +            stream_complete(egc, stream, 0);
> +        } else {
> +            /*
> +             * Libxc has indicated that it is done with the stream.
> +             * Resume reading libxl records from it.
> +             */
> +            stream_continue(egc, stream);
> +        }
>      }
>  }
>  
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 2/5] remus: resume immediately if libxl__xc_domain_save_done() completes
  2016-01-08  6:38 ` [PATCH v3 2/5] remus: resume immediately if libxl__xc_domain_save_done() completes Wen Congyang
  2016-01-08  9:52   ` Andrew Cooper
@ 2016-01-08 16:27   ` Ian Campbell
  2016-01-12  1:40     ` Wen Congyang
  2016-01-26  6:41   ` Yang Hongyang
  2 siblings, 1 reply; 24+ messages in thread
From: Ian Campbell @ 2016-01-08 16:27 UTC (permalink / raw)
  To: Wen Congyang, xen devel, Andrew Cooper
  Cc: Shriram Rajagopalan, Wei Liu, Changlong Xie, Ian Jackson, Yang Hongyang

On Fri, 2016-01-08 at 14:38 +0800, Wen Congyang wrote:
> For example: if the secondary host is down, and we fail to send the data
> to
> the secondary host. xc_domain_save() returns 0.

Just to be check: On failure in this way xc_domain_save() returns 0 (i.e.
success)?

>  So in the function
> libxl__xc_domain_save_done(), rc is 0(the helper program exits normally),
> and retval is 0(it is xc_domain_save()'s return value). In such case, we
> just need to complete the stream.
> 
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> ---
>  tools/libxl/libxl_stream_write.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/libxl/libxl_stream_write.c
> b/tools/libxl/libxl_stream_write.c
> index 80d9208..82e7719 100644
> --- a/tools/libxl/libxl_stream_write.c
> +++ b/tools/libxl/libxl_stream_write.c
> @@ -354,8 +354,17 @@ void libxl__xc_domain_save_done(libxl__egc *egc,
> void *dss_void,
>       * alive, and check_all_finished() may have torn it down around us.
>       * If the stream is not still alive, we must not continue any work.
>       */
> -    if (libxl__stream_write_inuse(stream))
> -        write_emulator_xenstore_record(egc, stream);
> +    if (libxl__stream_write_inuse(stream)) {
> +        if (dss->remus)
> +            /*
> +             * For remus, if libxl__xc_domain_save_done() completes,
> +             * there was an error sending data to the secondary.
> +             * Resume the primary ASAP.
> +             */
> +            stream_complete(egc, stream, 0);

Is there an indication to the caller that things have failed in this way?
Would that information be of use to the caller?

Or does the called infer this has happened because
otherwise libxl_domain_remus_start is not supposed to return?

> +        else
> +            write_emulator_xenstore_record(egc, stream);
> +    }
>  }
>  
>  static void write_emulator_xenstore_record(libxl__egc *egc,

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 5/5] tools/libxl: remove unused function libxl__domain_save_device_model()
  2016-01-08  6:38 ` [PATCH v3 5/5] tools/libxl: remove unused function libxl__domain_save_device_model() Wen Congyang
@ 2016-01-08 16:30   ` Ian Campbell
  0 siblings, 0 replies; 24+ messages in thread
From: Ian Campbell @ 2016-01-08 16:30 UTC (permalink / raw)
  To: Wen Congyang, xen devel, Andrew Cooper
  Cc: Shriram Rajagopalan, Ian Jackson, Changlong Xie, Wei Liu, Yang Hongyang

On Fri, 2016-01-08 at 14:38 +0800, Wen Congyang wrote:
> After the commit d77570e7, libxl__domain_save_device_model() can
                                                               ^ is
completely unused and can ...

apart from that:

> be dropped.
> 
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 1/5] remus: don't call stream_continue() when doing failover
  2016-01-08 16:20   ` Ian Campbell
@ 2016-01-12  1:36     ` Wen Congyang
  2016-01-14 10:19       ` Ian Campbell
  0 siblings, 1 reply; 24+ messages in thread
From: Wen Congyang @ 2016-01-12  1:36 UTC (permalink / raw)
  To: Ian Campbell, xen devel, Andrew Cooper
  Cc: Shriram Rajagopalan, Wei Liu, Changlong Xie, Ian Jackson, Yang Hongyang

On 01/09/2016 12:20 AM, Ian Campbell wrote:
> On Fri, 2016-01-08 at 14:38 +0800, Wen Congyang wrote:
>> stream_continue() is used for migration to read emulator
>> xenstore data and emulator context. For remus, if we do
>> failover, we have read it in the checkpoint cycle, and
>> we only need to complete the stream.
>>
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>>  tools/libxl/libxl_stream_read.c | 21 ++++++++++++++++-----
>>  1 file changed, 16 insertions(+), 5 deletions(-)
>>
>> diff --git a/tools/libxl/libxl_stream_read.c
>> b/tools/libxl/libxl_stream_read.c
>> index 258dec4..65219d5 100644
>> --- a/tools/libxl/libxl_stream_read.c
>> +++ b/tools/libxl/libxl_stream_read.c
>> @@ -758,6 +758,9 @@ void libxl__xc_domain_restore_done(libxl__egc *egc,
>> void *dcs_void,
>>      libxl__stream_read_state *stream = &dcs->srs;
>>      STATE_AO_GC(dcs->ao);
>>  
>> +    /* convenience aliases */
>> +    const int checkpointed_stream = dcs-
>>> restore_params.checkpointed_stream;
>> +
>>      if (rc)
>>          goto err;
>>  
>> @@ -777,11 +780,19 @@ void libxl__xc_domain_restore_done(libxl__egc *egc,
>> void *dcs_void,
>>       * If the stream is not still alive, we must not continue any work.
>>       */
>>      if (libxl__stream_read_inuse(stream)) {
>> -        /*
>> -         * Libxc has indicated that it is done with the stream.  Resume
>> reading
>> -         * libxl records from it.
>> -         */
>> -        stream_continue(egc, stream);
>> +        if (checkpointed_stream) {
>> +            /*
>> +             * Failover from primary. Domain state is currently at a
>> +             * consistent checkpoint, ready to go.
> 
> This implies that the stream is currently at a consistent point. Whereas
> what I think is meant is that things have failed (perhaps halfway through a
> checkpoint, i.e. not at a consistent state), therefore we stop and continue
> with the previous fully consistent checkpoint (which may have been earlier
> in the stream, not at the current point). Is that right?

The state is always consistent, because we buffer the state until all state are
received.

> 
> And what does "ready to go" mean? Does it mean that we will return back to
> the next higher level or that we go somewhere else first?

stream's callback will be called to resume the guest.

> 
> The big comment about flow control at the top of this file doesn't seem to
> cover the checkpoint case, if it did I suspect I would have found the
> answers there.

We buffer the state in xc_sr_restore.c.

Thanks
Wen Congyang

> 
>> +             */
>> +            stream_complete(egc, stream, 0);
>> +        } else {
>> +            /*
>> +             * Libxc has indicated that it is done with the stream.
>> +             * Resume reading libxl records from it.
>> +             */
>> +            stream_continue(egc, stream);
>> +        }
>>      }
>>  }
>>  
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 2/5] remus: resume immediately if libxl__xc_domain_save_done() completes
  2016-01-08 16:27   ` Ian Campbell
@ 2016-01-12  1:40     ` Wen Congyang
  2016-01-14 10:21       ` Ian Campbell
  0 siblings, 1 reply; 24+ messages in thread
From: Wen Congyang @ 2016-01-12  1:40 UTC (permalink / raw)
  To: Ian Campbell, xen devel, Andrew Cooper
  Cc: Shriram Rajagopalan, Wei Liu, Changlong Xie, Ian Jackson, Yang Hongyang

On 01/09/2016 12:27 AM, Ian Campbell wrote:
> On Fri, 2016-01-08 at 14:38 +0800, Wen Congyang wrote:
>> For example: if the secondary host is down, and we fail to send the data
>> to
>> the secondary host. xc_domain_save() returns 0.
> 
> Just to be check: On failure in this way xc_domain_save() returns 0 (i.e.
> success)?

Yes, it returns 0. I am not sure the return value is right.

> 
>>  So in the function
>> libxl__xc_domain_save_done(), rc is 0(the helper program exits normally),
>> and retval is 0(it is xc_domain_save()'s return value). In such case, we
>> just need to complete the stream.
>>
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> ---
>>  tools/libxl/libxl_stream_write.c | 13 +++++++++++--
>>  1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/libxl/libxl_stream_write.c
>> b/tools/libxl/libxl_stream_write.c
>> index 80d9208..82e7719 100644
>> --- a/tools/libxl/libxl_stream_write.c
>> +++ b/tools/libxl/libxl_stream_write.c
>> @@ -354,8 +354,17 @@ void libxl__xc_domain_save_done(libxl__egc *egc,
>> void *dss_void,
>>       * alive, and check_all_finished() may have torn it down around us.
>>       * If the stream is not still alive, we must not continue any work.
>>       */
>> -    if (libxl__stream_write_inuse(stream))
>> -        write_emulator_xenstore_record(egc, stream);
>> +    if (libxl__stream_write_inuse(stream)) {
>> +        if (dss->remus)
>> +            /*
>> +             * For remus, if libxl__xc_domain_save_done() completes,
>> +             * there was an error sending data to the secondary.
>> +             * Resume the primary ASAP.
>> +             */
>> +            stream_complete(egc, stream, 0);
> 
> Is there an indication to the caller that things have failed in this way?
> Would that information be of use to the caller?

For remus, when we come here, something is wrong regardless of the return value.

> 
> Or does the called infer this has happened because
> otherwise libxl_domain_remus_start is not supposed to return?

Yes, libxl_domain_remus_start() should not return unless somethins is wrong.

Thanks
Wen Congyang

> 
>> +        else
>> +            write_emulator_xenstore_record(egc, stream);
>> +    }
>>  }
>>  
>>  static void write_emulator_xenstore_record(libxl__egc *egc,
> 
> 
> .
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 1/5] remus: don't call stream_continue() when doing failover
  2016-01-12  1:36     ` Wen Congyang
@ 2016-01-14 10:19       ` Ian Campbell
  0 siblings, 0 replies; 24+ messages in thread
From: Ian Campbell @ 2016-01-14 10:19 UTC (permalink / raw)
  To: Wen Congyang, xen devel, Andrew Cooper
  Cc: Shriram Rajagopalan, Wei Liu, Changlong Xie, Ian Jackson, Yang Hongyang

On Tue, 2016-01-12 at 09:36 +0800, Wen Congyang wrote:
> On 01/09/2016 12:20 AM, Ian Campbell wrote:
> > On Fri, 2016-01-08 at 14:38 +0800, Wen Congyang wrote:
> > > stream_continue() is used for migration to read emulator
> > > xenstore data and emulator context. For remus, if we do
> > > failover, we have read it in the checkpoint cycle, and
> > > we only need to complete the stream.
> > > 
> > > Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> > > Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> > > ---
> > >  tools/libxl/libxl_stream_read.c | 21 ++++++++++++++++-----
> > >  1 file changed, 16 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/tools/libxl/libxl_stream_read.c
> > > b/tools/libxl/libxl_stream_read.c
> > > index 258dec4..65219d5 100644
> > > --- a/tools/libxl/libxl_stream_read.c
> > > +++ b/tools/libxl/libxl_stream_read.c
> > > @@ -758,6 +758,9 @@ void libxl__xc_domain_restore_done(libxl__egc
> > > *egc,
> > > void *dcs_void,
> > >      libxl__stream_read_state *stream = &dcs->srs;
> > >      STATE_AO_GC(dcs->ao);
> > >  
> > > +    /* convenience aliases */
> > > +    const int checkpointed_stream = dcs-
> > > > restore_params.checkpointed_stream;
> > > +
> > >      if (rc)
> > >          goto err;
> > >  
> > > @@ -777,11 +780,19 @@ void libxl__xc_domain_restore_done(libxl__egc
> > > *egc,
> > > void *dcs_void,
> > >       * If the stream is not still alive, we must not continue any
> > > work.
> > >       */
> > >      if (libxl__stream_read_inuse(stream)) {
> > > -        /*
> > > -         * Libxc has indicated that it is done with the
> > > stream.  Resume
> > > reading
> > > -         * libxl records from it.
> > > -         */
> > > -        stream_continue(egc, stream);
> > > +        if (checkpointed_stream) {
> > > +            /*
> > > +             * Failover from primary. Domain state is currently at a
> > > +             * consistent checkpoint, ready to go.
> > 
> > This implies that the stream is currently at a consistent point.
> > Whereas
> > what I think is meant is that things have failed (perhaps halfway
> > through a
> > checkpoint, i.e. not at a consistent state), therefore we stop and
> > continue
> > with the previous fully consistent checkpoint (which may have been
> > earlier
> > in the stream, not at the current point). Is that right?
> 
> The state is always consistent, because we buffer the state until all state are
> received.

I think the comment is misleading, it implies that the state up to and
including the current point is consistent, when really there is buffered
partial state which is not consistent (because it is partial). This
misleadingness is mainly because of the "at a consistent checkpoint"
wording I think.


> 
> > 
> > And what does "ready to go" mean? Does it mean that we will return back
> > to
> > the next higher level or that we go somewhere else first?
> 
> stream's callback will be called to resume the guest.

Please update the comment.

> > 
> > The big comment about flow control at the top of this file doesn't seem to
> > cover the checkpoint case, if it did I suspect I would have found the
> > answers there.
> 
> We buffer the state in xc_sr_restore.c.

Perhaps I was too subtle: Please update the comment about flow control at
the top of the file.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 2/5] remus: resume immediately if libxl__xc_domain_save_done() completes
  2016-01-12  1:40     ` Wen Congyang
@ 2016-01-14 10:21       ` Ian Campbell
  2016-01-15  5:44         ` Wen Congyang
  0 siblings, 1 reply; 24+ messages in thread
From: Ian Campbell @ 2016-01-14 10:21 UTC (permalink / raw)
  To: Wen Congyang, xen devel, Andrew Cooper
  Cc: Shriram Rajagopalan, Wei Liu, Changlong Xie, Ian Jackson, Yang Hongyang

On Tue, 2016-01-12 at 09:40 +0800, Wen Congyang wrote:
> On 01/09/2016 12:27 AM, Ian Campbell wrote:
> > On Fri, 2016-01-08 at 14:38 +0800, Wen Congyang wrote:
> > > For example: if the secondary host is down, and we fail to send the
> > > data
> > > to
> > > the secondary host. xc_domain_save() returns 0.
> > 
> > Just to be check: On failure in this way xc_domain_save() returns 0
> > (i.e.
> > success)?
> 
> Yes, it returns 0. I am not sure the return value is right.
> 
> > 
> > >  So in the function
> > > libxl__xc_domain_save_done(), rc is 0(the helper program exits
> > > normally),
> > > and retval is 0(it is xc_domain_save()'s return value). In such case,
> > > we
> > > just need to complete the stream.
> > > 
> > > Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> > > ---
> > >  tools/libxl/libxl_stream_write.c | 13 +++++++++++--
> > >  1 file changed, 11 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/tools/libxl/libxl_stream_write.c
> > > b/tools/libxl/libxl_stream_write.c
> > > index 80d9208..82e7719 100644
> > > --- a/tools/libxl/libxl_stream_write.c
> > > +++ b/tools/libxl/libxl_stream_write.c
> > > @@ -354,8 +354,17 @@ void libxl__xc_domain_save_done(libxl__egc *egc,
> > > void *dss_void,
> > >       * alive, and check_all_finished() may have torn it down around
> > > us.
> > >       * If the stream is not still alive, we must not continue any
> > > work.
> > >       */
> > > -    if (libxl__stream_write_inuse(stream))
> > > -        write_emulator_xenstore_record(egc, stream);
> > > +    if (libxl__stream_write_inuse(stream)) {
> > > +        if (dss->remus)
> > > +            /*
> > > +             * For remus, if libxl__xc_domain_save_done() completes,
> > > +             * there was an error sending data to the secondary.
> > > +             * Resume the primary ASAP.
> > > +             */
> > > +            stream_complete(egc, stream, 0);
> > 
> > Is there an indication to the caller that things have failed in this
> > way?
> > Would that information be of use to the caller?
> 
> For remus, when we come here, something is wrong regardless of the return
> value.

But does the caller know this? Can it tell.

> 
> > 
> > Or does the called infer this has happened because
> > otherwise libxl_domain_remus_start is not supposed to return?
> 
> Yes, libxl_domain_remus_start() should not return unless somethins is
> wrong.

This really ought to be documented somewhere.



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 2/5] remus: resume immediately if libxl__xc_domain_save_done() completes
  2016-01-14 10:21       ` Ian Campbell
@ 2016-01-15  5:44         ` Wen Congyang
  2016-01-15  9:48           ` Ian Campbell
  0 siblings, 1 reply; 24+ messages in thread
From: Wen Congyang @ 2016-01-15  5:44 UTC (permalink / raw)
  To: Ian Campbell, xen devel, Andrew Cooper
  Cc: Shriram Rajagopalan, Wei Liu, Changlong Xie, Ian Jackson, Yang Hongyang

On 01/14/2016 06:21 PM, Ian Campbell wrote:
> On Tue, 2016-01-12 at 09:40 +0800, Wen Congyang wrote:
>> On 01/09/2016 12:27 AM, Ian Campbell wrote:
>>> On Fri, 2016-01-08 at 14:38 +0800, Wen Congyang wrote:
>>>> For example: if the secondary host is down, and we fail to send the
>>>> data
>>>> to
>>>> the secondary host. xc_domain_save() returns 0.
>>>
>>> Just to be check: On failure in this way xc_domain_save() returns 0
>>> (i.e.
>>> success)?
>>
>> Yes, it returns 0. I am not sure the return value is right.
>>
>>>
>>>>  So in the function
>>>> libxl__xc_domain_save_done(), rc is 0(the helper program exits
>>>> normally),
>>>> and retval is 0(it is xc_domain_save()'s return value). In such case,
>>>> we
>>>> just need to complete the stream.
>>>>
>>>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>>>> ---
>>>>  tools/libxl/libxl_stream_write.c | 13 +++++++++++--
>>>>  1 file changed, 11 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/tools/libxl/libxl_stream_write.c
>>>> b/tools/libxl/libxl_stream_write.c
>>>> index 80d9208..82e7719 100644
>>>> --- a/tools/libxl/libxl_stream_write.c
>>>> +++ b/tools/libxl/libxl_stream_write.c
>>>> @@ -354,8 +354,17 @@ void libxl__xc_domain_save_done(libxl__egc *egc,
>>>> void *dss_void,
>>>>       * alive, and check_all_finished() may have torn it down around
>>>> us.
>>>>       * If the stream is not still alive, we must not continue any
>>>> work.
>>>>       */
>>>> -    if (libxl__stream_write_inuse(stream))
>>>> -        write_emulator_xenstore_record(egc, stream);
>>>> +    if (libxl__stream_write_inuse(stream)) {
>>>> +        if (dss->remus)
>>>> +            /*
>>>> +             * For remus, if libxl__xc_domain_save_done() completes,
>>>> +             * there was an error sending data to the secondary.
>>>> +             * Resume the primary ASAP.
>>>> +             */
>>>> +            stream_complete(egc, stream, 0);
>>>
>>> Is there an indication to the caller that things have failed in this
>>> way?
>>> Would that information be of use to the caller?
>>
>> For remus, when we come here, something is wrong regardless of the return
>> value.
> 
> But does the caller know this? Can it tell.
> 
>>
>>>
>>> Or does the called infer this has happened because
>>> otherwise libxl_domain_remus_start is not supposed to return?
>>
>> Yes, libxl_domain_remus_start() should not return unless somethins is
>> wrong.
> 
> This really ought to be documented somewhere.

libxl_domain_remus_start():
    /* Point of no return */
    libxl__remus_setup(egc, dss);
    return AO_INPROGRESS;

Thanks
Wen Congyang

> 
> 
> 
> 
> .
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 2/5] remus: resume immediately if libxl__xc_domain_save_done() completes
  2016-01-15  5:44         ` Wen Congyang
@ 2016-01-15  9:48           ` Ian Campbell
  2016-01-15  9:54             ` Wen Congyang
  0 siblings, 1 reply; 24+ messages in thread
From: Ian Campbell @ 2016-01-15  9:48 UTC (permalink / raw)
  To: Wen Congyang, xen devel, Andrew Cooper
  Cc: Shriram Rajagopalan, Wei Liu, Changlong Xie, Ian Jackson, Yang Hongyang

On Fri, 2016-01-15 at 13:44 +0800, Wen Congyang wrote:
> On 01/14/2016 06:21 PM, Ian Campbell wrote:
> > On Tue, 2016-01-12 at 09:40 +0800, Wen Congyang wrote:
> > > On 01/09/2016 12:27 AM, Ian Campbell wrote:
> > > > On Fri, 2016-01-08 at 14:38 +0800, Wen Congyang wrote:
> > > > > For example: if the secondary host is down, and we fail to send
> > > > > the
> > > > > data
> > > > > to
> > > > > the secondary host. xc_domain_save() returns 0.
> > > > 
> > > > Just to be check: On failure in this way xc_domain_save() returns 0
> > > > (i.e.
> > > > success)?
> > > 
> > > Yes, it returns 0. I am not sure the return value is right.
> > > 
> > > > 
> > > > >  So in the function
> > > > > libxl__xc_domain_save_done(), rc is 0(the helper program exits
> > > > > normally),
> > > > > and retval is 0(it is xc_domain_save()'s return value). In such
> > > > > case,
> > > > > we
> > > > > just need to complete the stream.
> > > > > 
> > > > > Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> > > > > ---
> > > > >  tools/libxl/libxl_stream_write.c | 13 +++++++++++--
> > > > >  1 file changed, 11 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/tools/libxl/libxl_stream_write.c
> > > > > b/tools/libxl/libxl_stream_write.c
> > > > > index 80d9208..82e7719 100644
> > > > > --- a/tools/libxl/libxl_stream_write.c
> > > > > +++ b/tools/libxl/libxl_stream_write.c
> > > > > @@ -354,8 +354,17 @@ void libxl__xc_domain_save_done(libxl__egc
> > > > > *egc,
> > > > > void *dss_void,
> > > > >       * alive, and check_all_finished() may have torn it down
> > > > > around
> > > > > us.
> > > > >       * If the stream is not still alive, we must not continue
> > > > > any
> > > > > work.
> > > > >       */
> > > > > -    if (libxl__stream_write_inuse(stream))
> > > > > -        write_emulator_xenstore_record(egc, stream);
> > > > > +    if (libxl__stream_write_inuse(stream)) {
> > > > > +        if (dss->remus)
> > > > > +            /*
> > > > > +             * For remus, if libxl__xc_domain_save_done()
> > > > > completes,
> > > > > +             * there was an error sending data to the secondary.
> > > > > +             * Resume the primary ASAP.
> > > > > +             */
> > > > > +            stream_complete(egc, stream, 0);
> > > > 
> > > > Is there an indication to the caller that things have failed in
> > > > this
> > > > way?
> > > > Would that information be of use to the caller?
> > > 
> > > For remus, when we come here, something is wrong regardless of the
> > > return
> > > value.
> > 
> > But does the caller know this? Can it tell.
> > 
> > > 
> > > > 
> > > > Or does the called infer this has happened because
> > > > otherwise libxl_domain_remus_start is not supposed to return?
> > > 
> > > Yes, libxl_domain_remus_start() should not return unless somethins is
> > > wrong.
> > 
> > This really ought to be documented somewhere.
> 
> libxl_domain_remus_start():
>     /* Point of no return */
>     libxl__remus_setup(egc, dss);
>     return AO_INPROGRESS;

This is (obviously) not documentation.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 2/5] remus: resume immediately if libxl__xc_domain_save_done() completes
  2016-01-15  9:48           ` Ian Campbell
@ 2016-01-15  9:54             ` Wen Congyang
  0 siblings, 0 replies; 24+ messages in thread
From: Wen Congyang @ 2016-01-15  9:54 UTC (permalink / raw)
  To: Ian Campbell, xen devel, Andrew Cooper
  Cc: Shriram Rajagopalan, Wei Liu, Changlong Xie, Ian Jackson, Yang Hongyang

On 01/15/2016 05:48 PM, Ian Campbell wrote:
> On Fri, 2016-01-15 at 13:44 +0800, Wen Congyang wrote:
>> On 01/14/2016 06:21 PM, Ian Campbell wrote:
>>> On Tue, 2016-01-12 at 09:40 +0800, Wen Congyang wrote:
>>>> On 01/09/2016 12:27 AM, Ian Campbell wrote:
>>>>> On Fri, 2016-01-08 at 14:38 +0800, Wen Congyang wrote:
>>>>>> For example: if the secondary host is down, and we fail to send
>>>>>> the
>>>>>> data
>>>>>> to
>>>>>> the secondary host. xc_domain_save() returns 0.
>>>>>
>>>>> Just to be check: On failure in this way xc_domain_save() returns 0
>>>>> (i.e.
>>>>> success)?
>>>>
>>>> Yes, it returns 0. I am not sure the return value is right.
>>>>
>>>>>
>>>>>>  So in the function
>>>>>> libxl__xc_domain_save_done(), rc is 0(the helper program exits
>>>>>> normally),
>>>>>> and retval is 0(it is xc_domain_save()'s return value). In such
>>>>>> case,
>>>>>> we
>>>>>> just need to complete the stream.
>>>>>>
>>>>>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>>>>>> ---
>>>>>>  tools/libxl/libxl_stream_write.c | 13 +++++++++++--
>>>>>>  1 file changed, 11 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/tools/libxl/libxl_stream_write.c
>>>>>> b/tools/libxl/libxl_stream_write.c
>>>>>> index 80d9208..82e7719 100644
>>>>>> --- a/tools/libxl/libxl_stream_write.c
>>>>>> +++ b/tools/libxl/libxl_stream_write.c
>>>>>> @@ -354,8 +354,17 @@ void libxl__xc_domain_save_done(libxl__egc
>>>>>> *egc,
>>>>>> void *dss_void,
>>>>>>       * alive, and check_all_finished() may have torn it down
>>>>>> around
>>>>>> us.
>>>>>>       * If the stream is not still alive, we must not continue
>>>>>> any
>>>>>> work.
>>>>>>       */
>>>>>> -    if (libxl__stream_write_inuse(stream))
>>>>>> -        write_emulator_xenstore_record(egc, stream);
>>>>>> +    if (libxl__stream_write_inuse(stream)) {
>>>>>> +        if (dss->remus)
>>>>>> +            /*
>>>>>> +             * For remus, if libxl__xc_domain_save_done()
>>>>>> completes,
>>>>>> +             * there was an error sending data to the secondary.
>>>>>> +             * Resume the primary ASAP.
>>>>>> +             */
>>>>>> +            stream_complete(egc, stream, 0);
>>>>>
>>>>> Is there an indication to the caller that things have failed in
>>>>> this
>>>>> way?
>>>>> Would that information be of use to the caller?
>>>>
>>>> For remus, when we come here, something is wrong regardless of the
>>>> return
>>>> value.
>>>
>>> But does the caller know this? Can it tell.
>>>
>>>>
>>>>>
>>>>> Or does the called infer this has happened because
>>>>> otherwise libxl_domain_remus_start is not supposed to return?
>>>>
>>>> Yes, libxl_domain_remus_start() should not return unless somethins is
>>>> wrong.
>>>
>>> This really ought to be documented somewhere.
>>
>> libxl_domain_remus_start():
>>     /* Point of no return */
>>     libxl__remus_setup(egc, dss);
>>     return AO_INPROGRESS;
> 
> This is (obviously) not documentation.

OK, I will update the comment.

Thanks
Wen Congyang

> 
> 
> 
> .
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 1/5] remus: don't call stream_continue() when doing failover
  2016-01-08  6:38 ` [PATCH v3 1/5] remus: don't call stream_continue() when doing failover Wen Congyang
  2016-01-08 16:20   ` Ian Campbell
@ 2016-01-26  6:37   ` Yang Hongyang
  1 sibling, 0 replies; 24+ messages in thread
From: Yang Hongyang @ 2016-01-26  6:37 UTC (permalink / raw)
  To: Wen Congyang, xen devel, Andrew Cooper
  Cc: Shriram Rajagopalan, Wei Liu, Changlong Xie, Ian Jackson, Ian Campbell



On 01/08/2016 02:38 PM, Wen Congyang wrote:
> stream_continue() is used for migration to read emulator
> xenstore data and emulator context. For remus, if we do
> failover, we have read it in the checkpoint cycle, and
> we only need to complete the stream.

Reviewed-by: Yang Hongyang <hongyang.yang@easystack.cn>

>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>   tools/libxl/libxl_stream_read.c | 21 ++++++++++++++++-----
>   1 file changed, 16 insertions(+), 5 deletions(-)
>
> diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
> index 258dec4..65219d5 100644
> --- a/tools/libxl/libxl_stream_read.c
> +++ b/tools/libxl/libxl_stream_read.c
> @@ -758,6 +758,9 @@ void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
>       libxl__stream_read_state *stream = &dcs->srs;
>       STATE_AO_GC(dcs->ao);
>
> +    /* convenience aliases */
> +    const int checkpointed_stream = dcs->restore_params.checkpointed_stream;
> +
>       if (rc)
>           goto err;
>
> @@ -777,11 +780,19 @@ void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
>        * If the stream is not still alive, we must not continue any work.
>        */
>       if (libxl__stream_read_inuse(stream)) {
> -        /*
> -         * Libxc has indicated that it is done with the stream.  Resume reading
> -         * libxl records from it.
> -         */
> -        stream_continue(egc, stream);
> +        if (checkpointed_stream) {
> +            /*
> +             * Failover from primary. Domain state is currently at a
> +             * consistent checkpoint, ready to go.
> +             */
> +            stream_complete(egc, stream, 0);
> +        } else {
> +            /*
> +             * Libxc has indicated that it is done with the stream.
> +             * Resume reading libxl records from it.
> +             */
> +            stream_continue(egc, stream);
> +        }
>       }
>   }
>
>

-- 
Thanks,
Yang

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 2/5] remus: resume immediately if libxl__xc_domain_save_done() completes
  2016-01-08  6:38 ` [PATCH v3 2/5] remus: resume immediately if libxl__xc_domain_save_done() completes Wen Congyang
  2016-01-08  9:52   ` Andrew Cooper
  2016-01-08 16:27   ` Ian Campbell
@ 2016-01-26  6:41   ` Yang Hongyang
  2 siblings, 0 replies; 24+ messages in thread
From: Yang Hongyang @ 2016-01-26  6:41 UTC (permalink / raw)
  To: Wen Congyang, xen devel, Andrew Cooper
  Cc: Shriram Rajagopalan, Ian Jackson, Changlong Xie, Wei Liu, Ian Campbell



On 01/08/2016 02:38 PM, Wen Congyang wrote:
> For example: if the secondary host is down, and we fail to send the data to
> the secondary host. xc_domain_save() returns 0. So in the function
> libxl__xc_domain_save_done(), rc is 0(the helper program exits normally),
> and retval is 0(it is xc_domain_save()'s return value). In such case, we
> just need to complete the stream.
>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> ---
>   tools/libxl/libxl_stream_write.c | 13 +++++++++++--
>   1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/tools/libxl/libxl_stream_write.c b/tools/libxl/libxl_stream_write.c
> index 80d9208..82e7719 100644
> --- a/tools/libxl/libxl_stream_write.c
> +++ b/tools/libxl/libxl_stream_write.c
> @@ -354,8 +354,17 @@ void libxl__xc_domain_save_done(libxl__egc *egc, void *dss_void,
>        * alive, and check_all_finished() may have torn it down around us.
>        * If the stream is not still alive, we must not continue any work.
>        */
> -    if (libxl__stream_write_inuse(stream))
> -        write_emulator_xenstore_record(egc, stream);
> +    if (libxl__stream_write_inuse(stream)) {
> +        if (dss->remus)
> +            /*
> +             * For remus, if libxl__xc_domain_save_done() completes,

xc_domain_save() completes.

> +             * there was an error sending data to the secondary.
> +             * Resume the primary ASAP.
> +             */
> +            stream_complete(egc, stream, 0);
> +        else
> +            write_emulator_xenstore_record(egc, stream);
> +    }
>   }
>
>   static void write_emulator_xenstore_record(libxl__egc *egc,
>

-- 
Thanks,
Yang

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 3/5] tools/libxc: don't send end record if remus fails
  2016-01-08  6:38 ` [PATCH v3 3/5] tools/libxc: don't send end record if remus fails Wen Congyang
@ 2016-01-26  6:44   ` Yang Hongyang
  0 siblings, 0 replies; 24+ messages in thread
From: Yang Hongyang @ 2016-01-26  6:44 UTC (permalink / raw)
  To: Wen Congyang, xen devel, Andrew Cooper
  Cc: Shriram Rajagopalan, Ian Jackson, Changlong Xie, Wei Liu, Ian Campbell

some commit message would be better.

Reviewed-by: Yang Hongyang <hongyang.yang@easystack.cn>

On 01/08/2016 02:38 PM, Wen Congyang wrote:
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>   tools/libxc/xc_sr_save.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
> index 88d85ef..e532168 100644
> --- a/tools/libxc/xc_sr_save.c
> +++ b/tools/libxc/xc_sr_save.c
> @@ -795,7 +795,7 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type)
>
>               rc = ctx->save.callbacks->checkpoint(ctx->save.callbacks->data);
>               if ( rc <= 0 )
> -                ctx->save.checkpointed = false;
> +                goto err;
>           }
>       } while ( ctx->save.checkpointed );
>
>

-- 
Thanks,
Yang

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 4/5] tools/libxc: error handling for the postcopy() callback
  2016-01-08  6:38 ` [PATCH v3 4/5] tools/libxc: error handling for the postcopy() callback Wen Congyang
@ 2016-01-26  6:45   ` Yang Hongyang
  2016-01-26  6:48     ` Wen Congyang
  0 siblings, 1 reply; 24+ messages in thread
From: Yang Hongyang @ 2016-01-26  6:45 UTC (permalink / raw)
  To: Wen Congyang, xen devel, Andrew Cooper
  Cc: Shriram Rajagopalan, Ian Jackson, Changlong Xie, Wei Liu, Ian Campbell

ditto

Reviewed-by: Yang Hongyang <hongyang.yang@easystack.cn>

On 01/08/2016 02:38 PM, Wen Congyang wrote:
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>   tools/libxc/xc_sr_save.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
> index e532168..e4ba560 100644
> --- a/tools/libxc/xc_sr_save.c
> +++ b/tools/libxc/xc_sr_save.c
> @@ -791,7 +791,9 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type)
>               if ( rc )
>                   goto err;
>
> -            ctx->save.callbacks->postcopy(ctx->save.callbacks->data);
> +            rc = ctx->save.callbacks->postcopy(ctx->save.callbacks->data);
> +            if ( rc <= 0 )
> +                goto err;
>
>               rc = ctx->save.callbacks->checkpoint(ctx->save.callbacks->data);
>               if ( rc <= 0 )
>

-- 
Thanks,
Yang

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 4/5] tools/libxc: error handling for the postcopy() callback
  2016-01-26  6:45   ` Yang Hongyang
@ 2016-01-26  6:48     ` Wen Congyang
  2016-01-26  7:02       ` Yang Hongyang
  0 siblings, 1 reply; 24+ messages in thread
From: Wen Congyang @ 2016-01-26  6:48 UTC (permalink / raw)
  To: Yang Hongyang, xen devel, Andrew Cooper
  Cc: Shriram Rajagopalan, Ian Jackson, Changlong Xie, Wei Liu, Ian Campbell

On 01/26/2016 02:45 PM, Yang Hongyang wrote:
> ditto
> 
> Reviewed-by: Yang Hongyang <hongyang.yang@easystack.cn>

The newest version is v5, and this series is in the staging now.

Thanks for your review.
Wen Congyang

> 
> On 01/08/2016 02:38 PM, Wen Congyang wrote:
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>>   tools/libxc/xc_sr_save.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
>> index e532168..e4ba560 100644
>> --- a/tools/libxc/xc_sr_save.c
>> +++ b/tools/libxc/xc_sr_save.c
>> @@ -791,7 +791,9 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type)
>>               if ( rc )
>>                   goto err;
>>
>> -            ctx->save.callbacks->postcopy(ctx->save.callbacks->data);
>> +            rc = ctx->save.callbacks->postcopy(ctx->save.callbacks->data);
>> +            if ( rc <= 0 )
>> +                goto err;
>>
>>               rc = ctx->save.callbacks->checkpoint(ctx->save.callbacks->data);
>>               if ( rc <= 0 )
>>
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 4/5] tools/libxc: error handling for the postcopy() callback
  2016-01-26  6:48     ` Wen Congyang
@ 2016-01-26  7:02       ` Yang Hongyang
  2016-01-26 16:08         ` Ian Campbell
  0 siblings, 1 reply; 24+ messages in thread
From: Yang Hongyang @ 2016-01-26  7:02 UTC (permalink / raw)
  To: Wen Congyang, xen devel, Andrew Cooper
  Cc: Shriram Rajagopalan, Ian Jackson, Changlong Xie, Wei Liu, Ian Campbell



On 01/26/2016 02:48 PM, Wen Congyang wrote:
> On 01/26/2016 02:45 PM, Yang Hongyang wrote:
>> ditto
>>
>> Reviewed-by: Yang Hongyang <hongyang.yang@easystack.cn>
>
> The newest version is v5, and this series is in the staging now.

Sorry for the noise...I saw the series too late, please ignore my
comments...

>
> Thanks for your review.
> Wen Congyang
>
>>
>> On 01/08/2016 02:38 PM, Wen Congyang wrote:
>>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>>> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>> ---
>>>    tools/libxc/xc_sr_save.c | 4 +++-
>>>    1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
>>> index e532168..e4ba560 100644
>>> --- a/tools/libxc/xc_sr_save.c
>>> +++ b/tools/libxc/xc_sr_save.c
>>> @@ -791,7 +791,9 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type)
>>>                if ( rc )
>>>                    goto err;
>>>
>>> -            ctx->save.callbacks->postcopy(ctx->save.callbacks->data);
>>> +            rc = ctx->save.callbacks->postcopy(ctx->save.callbacks->data);
>>> +            if ( rc <= 0 )
>>> +                goto err;
>>>
>>>                rc = ctx->save.callbacks->checkpoint(ctx->save.callbacks->data);
>>>                if ( rc <= 0 )
>>>
>>
>
>
>
>

-- 
Thanks,
Yang

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 4/5] tools/libxc: error handling for the postcopy() callback
  2016-01-26  7:02       ` Yang Hongyang
@ 2016-01-26 16:08         ` Ian Campbell
  0 siblings, 0 replies; 24+ messages in thread
From: Ian Campbell @ 2016-01-26 16:08 UTC (permalink / raw)
  To: Yang Hongyang, Wen Congyang, xen devel, Andrew Cooper
  Cc: Shriram Rajagopalan, Ian Jackson, Changlong Xie, Wei Liu

On Tue, 2016-01-26 at 15:02 +0800, Yang Hongyang wrote:
> 
> On 01/26/2016 02:48 PM, Wen Congyang wrote:
> > On 01/26/2016 02:45 PM, Yang Hongyang wrote:
> > > ditto
> > > 
> > > Reviewed-by: Yang Hongyang <hongyang.yang@easystack.cn>
> > 
> > The newest version is v5, and this series is in the staging now.
> 
> Sorry for the noise...I saw the series too late, please ignore my
> comments...

Is an update to the MAINTAINERS file required?

Ian.

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2016-01-26 16:08 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-08  6:38 [PATCH v3 0/5] migration/remus: bug fix and cleanup Wen Congyang
2016-01-08  6:38 ` [PATCH v3 1/5] remus: don't call stream_continue() when doing failover Wen Congyang
2016-01-08 16:20   ` Ian Campbell
2016-01-12  1:36     ` Wen Congyang
2016-01-14 10:19       ` Ian Campbell
2016-01-26  6:37   ` Yang Hongyang
2016-01-08  6:38 ` [PATCH v3 2/5] remus: resume immediately if libxl__xc_domain_save_done() completes Wen Congyang
2016-01-08  9:52   ` Andrew Cooper
2016-01-08 16:27   ` Ian Campbell
2016-01-12  1:40     ` Wen Congyang
2016-01-14 10:21       ` Ian Campbell
2016-01-15  5:44         ` Wen Congyang
2016-01-15  9:48           ` Ian Campbell
2016-01-15  9:54             ` Wen Congyang
2016-01-26  6:41   ` Yang Hongyang
2016-01-08  6:38 ` [PATCH v3 3/5] tools/libxc: don't send end record if remus fails Wen Congyang
2016-01-26  6:44   ` Yang Hongyang
2016-01-08  6:38 ` [PATCH v3 4/5] tools/libxc: error handling for the postcopy() callback Wen Congyang
2016-01-26  6:45   ` Yang Hongyang
2016-01-26  6:48     ` Wen Congyang
2016-01-26  7:02       ` Yang Hongyang
2016-01-26 16:08         ` Ian Campbell
2016-01-08  6:38 ` [PATCH v3 5/5] tools/libxl: remove unused function libxl__domain_save_device_model() Wen Congyang
2016-01-08 16:30   ` Ian Campbell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.