* [PATCH v2 COLOPre 00/13] Prerequisite patches for COLO
@ 2015-06-08  3:43 Yang Hongyang
  2015-06-08  3:43 ` [PATCH v2 COLOPre 01/13] libxc/restore: fix error handle of process_record Yang Hongyang
                   ` (12 more replies)
  0 siblings, 13 replies; 106+ messages in thread
From: Yang Hongyang @ 2015-06-08  3:43 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

This patchset contains prerequisite patches for the COLO feature. For what
COLO is, refer to http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping

This patchset is based on:
[PATCH v2 0/6] Misc cleanups for libxl

You can also get the patchset from:
    https://github.com/macrosheep/xen/tree/colo-v6

v1->v2:
 - Rebased to [PATCH v2 0/6] Misc cleanups for libxl
 - Add a bugfix for the error handling of process_record

Wen Congyang (4):
  tools/libxc: support to resume uncooperative HVM guests
  tools/libxl: Introduce a new internal API libxl__domain_unpause()
  tools/libxl: Update libxl_save_msgs_gen.pl to support return data from
    xl to xc
  tools/libxl: Add back channel to allow migration target send data back

Yang Hongyang (9):
  libxc/restore: fix error handle of process_record
  libxc/restore: zero ioreq page only one time
  tools/libxc: export xc_bitops.h
  tools/libxl: introduce a new API libxl__domain_restore() to load qemu
    state
  tools/libxl: Update libxl__domain_unpause() to support qemu-xen
  tools/libxl: introduce libxl__domain_common_switch_qemu_logdirty()
  tools/libxl: rename remus device to checkpoint device
  tools/libxl: adjust the indentation
  tools/libxl: don't touch remus in checkpoint_device

 tools/libxc/include/xc_bitops.h       |  76 ++++++++
 tools/libxc/xc_bitops.h               |  76 --------
 tools/libxc/xc_resume.c               |  22 ++-
 tools/libxc/xc_sr_restore.c           |  28 +--
 tools/libxc/xc_sr_restore_x86_hvm.c   |   3 +-
 tools/libxl/Makefile                  |   2 +-
 tools/libxl/libxl.c                   |  62 +++++--
 tools/libxl/libxl_checkpoint_device.c | 282 +++++++++++++++++++++++++++++
 tools/libxl/libxl_create.c            |  14 +-
 tools/libxl/libxl_dom_save.c          | 128 +++++++++----
 tools/libxl/libxl_internal.h          | 171 ++++++++++--------
 tools/libxl/libxl_netbuffer.c         | 117 ++++++------
 tools/libxl/libxl_nonetbuffer.c       |  10 +-
 tools/libxl/libxl_qmp.c               |  10 ++
 tools/libxl/libxl_remus.c             | 140 ++++++++++-----
 tools/libxl/libxl_remus_device.c      | 327 ----------------------------------
 tools/libxl/libxl_remus_disk_drbd.c   |  56 +++---
 tools/libxl/libxl_save_callout.c      |  31 ++++
 tools/libxl/libxl_save_helper.c       |  17 ++
 tools/libxl/libxl_save_msgs_gen.pl    |  65 ++++++-
 tools/libxl/libxl_types.idl           |  11 +-
 tools/libxl/xl_cmdimpl.c              |   7 +
 22 files changed, 970 insertions(+), 685 deletions(-)
 create mode 100644 tools/libxc/include/xc_bitops.h
 delete mode 100644 tools/libxc/xc_bitops.h
 create mode 100644 tools/libxl/libxl_checkpoint_device.c
 delete mode 100644 tools/libxl/libxl_remus_device.c

-- 
1.9.1

^ permalink raw reply	[flat|nested] 106+ messages in thread

* [PATCH v2 COLOPre 01/13] libxc/restore: fix error handle of process_record
  2015-06-08  3:43 [PATCH v2 COLOPre 00/13] Prerequisite patches for COLO Yang Hongyang
@ 2015-06-08  3:43 ` Yang Hongyang
  2015-06-08  9:24   ` Andrew Cooper
  2015-06-10 14:55   ` Ian Campbell
  2015-06-08  3:43 ` [PATCH v2 COLOPre 02/13] tools/libxc: support to resume uncooperative HVM guests Yang Hongyang
                   ` (11 subsequent siblings)
  12 siblings, 2 replies; 106+ messages in thread
From: Yang Hongyang @ 2015-06-08  3:43 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

If the return value is RECORD_NOT_PROCESSED and the record is an optional
one, restore will still fail. This patch fixes that.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
---
 tools/libxc/xc_sr_restore.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index 9e27dba..2d2edd3 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -560,19 +560,6 @@ static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec)
     free(rec->data);
     rec->data = NULL;
 
-    if ( rc == RECORD_NOT_PROCESSED )
-    {
-        if ( rec->type & REC_TYPE_OPTIONAL )
-            DPRINTF("Ignoring optional record %#x (%s)",
-                    rec->type, rec_type_to_str(rec->type));
-        else
-        {
-            ERROR("Mandatory record %#x (%s) not handled",
-                  rec->type, rec_type_to_str(rec->type));
-            rc = -1;
-        }
-    }
-
     return rc;
 }
 
@@ -678,7 +665,20 @@ static int restore(struct xc_sr_context *ctx)
         else
         {
             rc = process_record(ctx, &rec);
-            if ( rc )
+            if ( rc == RECORD_NOT_PROCESSED )
+            {
+                if ( rec.type & REC_TYPE_OPTIONAL )
+                    DPRINTF("Ignoring optional record %#x (%s)",
+                            rec.type, rec_type_to_str(rec.type));
+                else
+                {
+                    ERROR("Mandatory record %#x (%s) not handled",
+                          rec.type, rec_type_to_str(rec.type));
+                    rc = -1;
+                    goto err;
+                }
+            }
+            else if ( rc )
                 goto err;
         }
 
-- 
1.9.1


* [PATCH v2 COLOPre 02/13] tools/libxc: support to resume uncooperative HVM guests
  2015-06-08  3:43 [PATCH v2 COLOPre 00/13] Prerequisite patches for COLO Yang Hongyang
  2015-06-08  3:43 ` [PATCH v2 COLOPre 01/13] libxc/restore: fix error handle of process_record Yang Hongyang
@ 2015-06-08  3:43 ` Yang Hongyang
  2015-06-10 15:18   ` Ian Campbell
  2015-06-08  3:43 ` [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time Yang Hongyang
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 106+ messages in thread
From: Yang Hongyang @ 2015-06-08  3:43 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

From: Wen Congyang <wency@cn.fujitsu.com>

For PVHVM, the hypercall return code is 0, and the guest can be resumed
in a new domain context.
Normally we suspend and resume a PVHVM guest like this:
1. suspend it via the event channel
2. modify the hypercall return code to 1
3. the guest knows that the suspend was cancelled, so we use the fast
   path to resume it

Under COLO, we will have updated the guest's state (modified memory, CPU
registers, device status, ...). In this case we cannot use the fast path
to resume it. Keep the return code 0, and use the slow path to resume
the guest instead. Since we have updated the guest state, we call it a
new domain context.

For plain HVM guests, the hypercall is a NOP.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 tools/libxc/xc_resume.c | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c
index e67bebd..bd82334 100644
--- a/tools/libxc/xc_resume.c
+++ b/tools/libxc/xc_resume.c
@@ -109,6 +109,23 @@ static int xc_domain_resume_cooperative(xc_interface *xch, uint32_t domid)
     return do_domctl(xch, &domctl);
 }
 
+static int xc_domain_resume_hvm(xc_interface *xch, uint32_t domid)
+{
+    DECLARE_DOMCTL;
+
+    /*
+     * If it is PVHVM, the hypercall return code is 0, because this
+     * is not a fast path resume, we do not modify_returncode as in
+     * xc_domain_resume_cooperative.
+     * (resuming it in a new domain context)
+     *
+     * If it is a HVM, the hypercall is a NOP.
+     */
+    domctl.cmd = XEN_DOMCTL_resumedomain;
+    domctl.domain = domid;
+    return do_domctl(xch, &domctl);
+}
+
 static int xc_domain_resume_any(xc_interface *xch, uint32_t domid)
 {
     DECLARE_DOMCTL;
@@ -138,10 +155,7 @@ static int xc_domain_resume_any(xc_interface *xch, uint32_t domid)
      */
 #if defined(__i386__) || defined(__x86_64__)
     if ( info.hvm )
-    {
-        ERROR("Cannot resume uncooperative HVM guests");
-        return rc;
-    }
+        return xc_domain_resume_hvm(xch, domid);
 
     if ( xc_domain_get_guest_width(xch, domid, &dinfo->guest_width) != 0 )
     {
-- 
1.9.1


* [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
  2015-06-08  3:43 [PATCH v2 COLOPre 00/13] Prerequisite patches for COLO Yang Hongyang
  2015-06-08  3:43 ` [PATCH v2 COLOPre 01/13] libxc/restore: fix error handle of process_record Yang Hongyang
  2015-06-08  3:43 ` [PATCH v2 COLOPre 02/13] tools/libxc: support to resume uncooperative HVM guests Yang Hongyang
@ 2015-06-08  3:43 ` Yang Hongyang
  2015-06-08  9:46   ` Andrew Cooper
  2015-06-08  3:43 ` [PATCH v2 COLOPre 04/13] tools/libxc: export xc_bitops.h Yang Hongyang
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 106+ messages in thread
From: Yang Hongyang @ 2015-06-08  3:43 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

The ioreq page contains the event channel, which is set when we resume
the secondary vm for the first time. The hypervisor checks whether the
event channel is corrupted, so we must not zero the ioreq page more
than once.

ioreq->state is always STATE_IOREQ_NONE after the vm is suspended, so
it is fine to zero the page only once.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Wen congyang <wency@cn.fujitsu.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
---
 tools/libxc/xc_sr_restore_x86_hvm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/libxc/xc_sr_restore_x86_hvm.c b/tools/libxc/xc_sr_restore_x86_hvm.c
index 6f5af0e..06177e0 100644
--- a/tools/libxc/xc_sr_restore_x86_hvm.c
+++ b/tools/libxc/xc_sr_restore_x86_hvm.c
@@ -78,7 +78,8 @@ static int handle_hvm_params(struct xc_sr_context *ctx,
             break;
         case HVM_PARAM_IOREQ_PFN:
         case HVM_PARAM_BUFIOREQ_PFN:
-            xc_clear_domain_page(xch, ctx->domid, entry->value);
+            if ( !ctx->restore.buffer_all_records )
+                xc_clear_domain_page(xch, ctx->domid, entry->value);
             break;
         }
 
-- 
1.9.1


* [PATCH v2 COLOPre 04/13] tools/libxc: export xc_bitops.h
  2015-06-08  3:43 [PATCH v2 COLOPre 00/13] Prerequisite patches for COLO Yang Hongyang
                   ` (2 preceding siblings ...)
  2015-06-08  3:43 ` [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time Yang Hongyang
@ 2015-06-08  3:43 ` Yang Hongyang
  2015-06-08 10:04   ` Yang Hongyang
  2015-06-10 15:20   ` Ian Campbell
  2015-06-08  3:43 ` [PATCH v2 COLOPre 05/13] tools/libxl: introduce a new API libxl__domain_restore() to load qemu state Yang Hongyang
                   ` (8 subsequent siblings)
  12 siblings, 2 replies; 106+ messages in thread
From: Yang Hongyang @ 2015-06-08  3:43 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

Under COLO, the secondary sends dirty page bitmap information to the
primary at every checkpoint, so libxl needs to get/test the dirty page
bitmap. Expose xc_bitops.h for libxl to use.

NOTE:
  You need to run "make clean" and re-run configure to get this compiled.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 tools/libxc/include/xc_bitops.h | 76 +++++++++++++++++++++++++++++++++++++++++
 tools/libxc/xc_bitops.h         | 76 -----------------------------------------
 2 files changed, 76 insertions(+), 76 deletions(-)
 create mode 100644 tools/libxc/include/xc_bitops.h
 delete mode 100644 tools/libxc/xc_bitops.h

diff --git a/tools/libxc/include/xc_bitops.h b/tools/libxc/include/xc_bitops.h
new file mode 100644
index 0000000..cd749f4
--- /dev/null
+++ b/tools/libxc/include/xc_bitops.h
@@ -0,0 +1,76 @@
+#ifndef XC_BITOPS_H
+#define XC_BITOPS_H 1
+
+/* bitmap operations for single threaded access */
+
+#include <stdlib.h>
+#include <string.h>
+
+#define BITS_PER_LONG (sizeof(unsigned long) * 8)
+#define ORDER_LONG (sizeof(unsigned long) == 4 ? 5 : 6)
+
+#define BITMAP_ENTRY(_nr,_bmap) ((_bmap))[(_nr)/BITS_PER_LONG]
+#define BITMAP_SHIFT(_nr) ((_nr) % BITS_PER_LONG)
+
+/* calculate required space for number of longs needed to hold nr_bits */
+static inline int bitmap_size(int nr_bits)
+{
+    int nr_long, nr_bytes;
+    nr_long = (nr_bits + BITS_PER_LONG - 1) >> ORDER_LONG;
+    nr_bytes = nr_long * sizeof(unsigned long);
+    return nr_bytes;
+}
+
+static inline unsigned long *bitmap_alloc(int nr_bits)
+{
+    return calloc(1, bitmap_size(nr_bits));
+}
+
+static inline void bitmap_set(unsigned long *addr, int nr_bits)
+{
+    memset(addr, 0xff, bitmap_size(nr_bits));
+}
+
+static inline void bitmap_clear(unsigned long *addr, int nr_bits)
+{
+    memset(addr, 0, bitmap_size(nr_bits));
+}
+
+static inline int test_bit(int nr, unsigned long *addr)
+{
+    return (BITMAP_ENTRY(nr, addr) >> BITMAP_SHIFT(nr)) & 1;
+}
+
+static inline void clear_bit(int nr, unsigned long *addr)
+{
+    BITMAP_ENTRY(nr, addr) &= ~(1UL << BITMAP_SHIFT(nr));
+}
+
+static inline void set_bit(int nr, unsigned long *addr)
+{
+    BITMAP_ENTRY(nr, addr) |= (1UL << BITMAP_SHIFT(nr));
+}
+
+static inline int test_and_clear_bit(int nr, unsigned long *addr)
+{
+    int oldbit = test_bit(nr, addr);
+    clear_bit(nr, addr);
+    return oldbit;
+}
+
+static inline int test_and_set_bit(int nr, unsigned long *addr)
+{
+    int oldbit = test_bit(nr, addr);
+    set_bit(nr, addr);
+    return oldbit;
+}
+
+static inline void bitmap_or(unsigned long *dst, const unsigned long *other,
+                             int nr_bits)
+{
+    int i, nr_longs = (bitmap_size(nr_bits) / sizeof(unsigned long));
+    for ( i = 0; i < nr_longs; ++i )
+        dst[i] |= other[i];
+}
+
+#endif  /* XC_BITOPS_H */
diff --git a/tools/libxc/xc_bitops.h b/tools/libxc/xc_bitops.h
deleted file mode 100644
index cd749f4..0000000
--- a/tools/libxc/xc_bitops.h
+++ /dev/null
@@ -1,76 +0,0 @@
-#ifndef XC_BITOPS_H
-#define XC_BITOPS_H 1
-
-/* bitmap operations for single threaded access */
-
-#include <stdlib.h>
-#include <string.h>
-
-#define BITS_PER_LONG (sizeof(unsigned long) * 8)
-#define ORDER_LONG (sizeof(unsigned long) == 4 ? 5 : 6)
-
-#define BITMAP_ENTRY(_nr,_bmap) ((_bmap))[(_nr)/BITS_PER_LONG]
-#define BITMAP_SHIFT(_nr) ((_nr) % BITS_PER_LONG)
-
-/* calculate required space for number of longs needed to hold nr_bits */
-static inline int bitmap_size(int nr_bits)
-{
-    int nr_long, nr_bytes;
-    nr_long = (nr_bits + BITS_PER_LONG - 1) >> ORDER_LONG;
-    nr_bytes = nr_long * sizeof(unsigned long);
-    return nr_bytes;
-}
-
-static inline unsigned long *bitmap_alloc(int nr_bits)
-{
-    return calloc(1, bitmap_size(nr_bits));
-}
-
-static inline void bitmap_set(unsigned long *addr, int nr_bits)
-{
-    memset(addr, 0xff, bitmap_size(nr_bits));
-}
-
-static inline void bitmap_clear(unsigned long *addr, int nr_bits)
-{
-    memset(addr, 0, bitmap_size(nr_bits));
-}
-
-static inline int test_bit(int nr, unsigned long *addr)
-{
-    return (BITMAP_ENTRY(nr, addr) >> BITMAP_SHIFT(nr)) & 1;
-}
-
-static inline void clear_bit(int nr, unsigned long *addr)
-{
-    BITMAP_ENTRY(nr, addr) &= ~(1UL << BITMAP_SHIFT(nr));
-}
-
-static inline void set_bit(int nr, unsigned long *addr)
-{
-    BITMAP_ENTRY(nr, addr) |= (1UL << BITMAP_SHIFT(nr));
-}
-
-static inline int test_and_clear_bit(int nr, unsigned long *addr)
-{
-    int oldbit = test_bit(nr, addr);
-    clear_bit(nr, addr);
-    return oldbit;
-}
-
-static inline int test_and_set_bit(int nr, unsigned long *addr)
-{
-    int oldbit = test_bit(nr, addr);
-    set_bit(nr, addr);
-    return oldbit;
-}
-
-static inline void bitmap_or(unsigned long *dst, const unsigned long *other,
-                             int nr_bits)
-{
-    int i, nr_longs = (bitmap_size(nr_bits) / sizeof(unsigned long));
-    for ( i = 0; i < nr_longs; ++i )
-        dst[i] |= other[i];
-}
-
-#endif  /* XC_BITOPS_H */
-- 
1.9.1


* [PATCH v2 COLOPre 05/13] tools/libxl: introduce a new API libxl__domain_restore() to load qemu state
  2015-06-08  3:43 [PATCH v2 COLOPre 00/13] Prerequisite patches for COLO Yang Hongyang
                   ` (3 preceding siblings ...)
  2015-06-08  3:43 ` [PATCH v2 COLOPre 04/13] tools/libxc: export xc_bitops.h Yang Hongyang
@ 2015-06-08  3:43 ` Yang Hongyang
  2015-06-10 15:35   ` Ian Campbell
  2015-06-08  3:43 ` [PATCH v2 COLOPre 06/13] tools/libxl: Introduce a new internal API libxl__domain_unpause() Yang Hongyang
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 106+ messages in thread
From: Yang Hongyang @ 2015-06-08  3:43 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

The secondary vm runs in COLO mode, so we do the following things again
and again:
1. suspend both the primary vm and the secondary vm
2. sync the state
3. resume both the primary vm and the secondary vm
We send qemu's state each time in step 2, and the secondary's qemu must
read it each time before the secondary vm is resumed. Introduce a new
API libxl__domain_restore() to do this. This API should be called
before resuming the secondary vm.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl_dom_save.c | 47 ++++++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_internal.h |  4 ++++
 tools/libxl/libxl_qmp.c      | 10 ++++++++++
 3 files changed, 61 insertions(+)

diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
index 74a6bae..f9627f8 100644
--- a/tools/libxl/libxl_dom_save.c
+++ b/tools/libxl/libxl_dom_save.c
@@ -663,6 +663,53 @@ int libxl__toolstack_restore(uint32_t domid, const uint8_t *buf,
     return 0;
 }
 
+int libxl__domain_restore(libxl__gc *gc, uint32_t domid)
+{
+    int rc = 0;
+
+    libxl_domain_type type = libxl__domain_type(gc, domid);
+    if (type != LIBXL_DOMAIN_TYPE_HVM) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    rc = libxl__domain_restore_device_model(gc, domid);
+    if (rc)
+        LOG(ERROR, "failed to restore device mode for domain %u:%d",
+            domid, rc);
+out:
+    return rc;
+}
+
+int libxl__domain_restore_device_model(libxl__gc *gc, uint32_t domid)
+{
+    char *state_file;
+    int rc;
+
+    switch (libxl__device_model_version_running(gc, domid)) {
+    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
+        /* not supported now */
+        rc = ERROR_INVAL;
+        break;
+    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
+        /*
+         * This function may be called too many times for the same gc,
+         * so we use NOGC, and free the memory before return to avoid
+         * OOM.
+         */
+        state_file = libxl__sprintf(NOGC,
+                                    XC_DEVICE_MODEL_RESTORE_FILE".%d",
+                                    domid);
+        rc = libxl__qmp_restore(gc, domid, state_file);
+        free(state_file);
+        break;
+    default:
+        rc = ERROR_INVAL;
+    }
+
+    return rc;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 1905195..20364c6 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1022,6 +1022,7 @@ _hidden int libxl__domain_rename(libxl__gc *gc, uint32_t domid,
 
 _hidden int libxl__toolstack_restore(uint32_t domid, const uint8_t *buf,
                                      uint32_t size, void *data);
+_hidden int libxl__domain_restore_device_model(libxl__gc *gc, uint32_t domid);
 _hidden int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid);
 
 _hidden const char *libxl__userdata_path(libxl__gc *gc, uint32_t domid,
@@ -1039,6 +1040,7 @@ _hidden int libxl__userdata_store(libxl__gc *gc, uint32_t domid,
                                   const char *userdata_userid,
                                   const uint8_t *data, int datalen);
 
+_hidden int libxl__domain_restore(libxl__gc *gc, uint32_t domid);
 _hidden int libxl__domain_resume(libxl__gc *gc, uint32_t domid,
                                  int suspend_cancel);
 _hidden int libxl__domain_s3_resume(libxl__gc *gc, int domid);
@@ -1651,6 +1653,8 @@ _hidden int libxl__qmp_stop(libxl__gc *gc, int domid);
 _hidden int libxl__qmp_resume(libxl__gc *gc, int domid);
 /* Save current QEMU state into fd. */
 _hidden int libxl__qmp_save(libxl__gc *gc, int domid, const char *filename);
+/* Load current QEMU state from fd. */
+_hidden int libxl__qmp_restore(libxl__gc *gc, int domid, const char *filename);
 /* Set dirty bitmap logging status */
 _hidden int libxl__qmp_set_global_dirty_log(libxl__gc *gc, int domid, bool enable);
 _hidden int libxl__qmp_insert_cdrom(libxl__gc *gc, int domid, const libxl_device_disk *disk);
diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c
index 9aa7e2e..a6f1a21 100644
--- a/tools/libxl/libxl_qmp.c
+++ b/tools/libxl/libxl_qmp.c
@@ -892,6 +892,16 @@ int libxl__qmp_save(libxl__gc *gc, int domid, const char *filename)
                            NULL, NULL);
 }
 
+int libxl__qmp_restore(libxl__gc *gc, int domid, const char *state_file)
+{
+    libxl__json_object *args = NULL;
+
+    qmp_parameters_add_string(gc, &args, "filename", state_file);
+
+    return qmp_run_command(gc, domid, "xen-load-devices-state", args,
+                           NULL, NULL);
+}
+
 static int qmp_change(libxl__gc *gc, libxl__qmp_handler *qmp,
                       char *device, char *target, char *arg)
 {
-- 
1.9.1


* [PATCH v2 COLOPre 06/13] tools/libxl: Introduce a new internal API libxl__domain_unpause()
  2015-06-08  3:43 [PATCH v2 COLOPre 00/13] Prerequisite patches for COLO Yang Hongyang
                   ` (4 preceding siblings ...)
  2015-06-08  3:43 ` [PATCH v2 COLOPre 05/13] tools/libxl: introduce a new API libxl__domain_restore() to load qemu state Yang Hongyang
@ 2015-06-08  3:43 ` Yang Hongyang
  2015-06-10 15:37   ` Ian Campbell
  2015-06-08  3:43 ` [PATCH v2 COLOPre 07/13] tools/libxl: Update libxl__domain_unpause() to support qemu-xen Yang Hongyang
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 106+ messages in thread
From: Yang Hongyang @ 2015-06-08  3:43 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

From: Wen Congyang <wency@cn.fujitsu.com>

The guest is paused after libxl_domain_create_restore(). The secondary
vm runs in COLO mode, so we need to unpause the guest from within
libxl. The current API libxl_domain_unpause() is a public API, not an
internal one. Introduce an internal equivalent to support this.
No functional change.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 tools/libxl/libxl.c          | 20 ++++++++++++++------
 tools/libxl/libxl_internal.h |  1 +
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index ba2da92..d5691dc 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -933,9 +933,8 @@ out:
     return AO_INPROGRESS;
 }
 
-int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid)
+int libxl__domain_unpause(libxl__gc *gc, uint32_t domid)
 {
-    GC_INIT(ctx);
     char *path;
     char *state;
     int ret, rc = 0;
@@ -947,7 +946,7 @@ int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid)
     }
 
     if (type == LIBXL_DOMAIN_TYPE_HVM) {
-        uint32_t dm_domid = libxl_get_stubdom_id(ctx, domid);
+        uint32_t dm_domid = libxl_get_stubdom_id(CTX, domid);
 
         path = libxl__device_model_xs_path(gc, dm_domid, domid, "/state");
         state = libxl__xs_read(gc, XBT_NULL, path);
@@ -957,12 +956,21 @@ int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid)
                                          NULL, NULL, NULL);
         }
     }
-    ret = xc_domain_unpause(ctx->xch, domid);
-    if (ret<0) {
-        LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "unpausing domain %d", domid);
+
+    ret = xc_domain_unpause(CTX->xch, domid);
+    if (ret < 0) {
+        LIBXL__LOG_ERRNO(CTX, LIBXL__LOG_ERROR, "unpausing domain %d", domid);
         rc = ERROR_FAIL;
     }
  out:
+    return rc;
+}
+
+int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid)
+{
+    GC_INIT(ctx);
+    int rc = libxl__domain_unpause(gc, domid);
+
     GC_FREE;
     return rc;
 }
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 20364c6..366470f 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1044,6 +1044,7 @@ _hidden int libxl__domain_restore(libxl__gc *gc, uint32_t domid);
 _hidden int libxl__domain_resume(libxl__gc *gc, uint32_t domid,
                                  int suspend_cancel);
 _hidden int libxl__domain_s3_resume(libxl__gc *gc, int domid);
+_hidden int libxl__domain_unpause(libxl__gc *gc, uint32_t domid);
 
 /* returns 0 or 1, or a libxl error code */
 _hidden int libxl__domain_pvcontrol_available(libxl__gc *gc, uint32_t domid);
-- 
1.9.1


* [PATCH v2 COLOPre 07/13] tools/libxl: Update libxl__domain_unpause() to support qemu-xen
  2015-06-08  3:43 [PATCH v2 COLOPre 00/13] Prerequisite patches for COLO Yang Hongyang
                   ` (5 preceding siblings ...)
  2015-06-08  3:43 ` [PATCH v2 COLOPre 06/13] tools/libxl: Introduce a new internal API libxl__domain_unpause() Yang Hongyang
@ 2015-06-08  3:43 ` Yang Hongyang
  2015-06-12 12:33   ` Wei Liu
  2015-06-08  3:43 ` [PATCH v2 COLOPre 08/13] tools/libxl: introduce libxl__domain_common_switch_qemu_logdirty() Yang Hongyang
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 106+ messages in thread
From: Yang Hongyang @ 2015-06-08  3:43 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

Currently, libxl__domain_unpause() only supports
qemu-xen-traditional. Update it to support qemu-xen.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 tools/libxl/libxl.c | 42 +++++++++++++++++++++++++++++++++---------
 1 file changed, 33 insertions(+), 9 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index d5691dc..5c843c2 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -933,10 +933,37 @@ out:
     return AO_INPROGRESS;
 }
 
-int libxl__domain_unpause(libxl__gc *gc, uint32_t domid)
+static int libxl__domain_unpause_device_model(libxl__gc *gc, uint32_t domid)
 {
     char *path;
     char *state;
+
+    switch (libxl__device_model_version_running(gc, domid)) {
+    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL: {
+        uint32_t dm_domid = libxl_get_stubdom_id(CTX, domid);
+
+        path = libxl__device_model_xs_path(gc, dm_domid, domid, "/state");
+        state = libxl__xs_read(gc, XBT_NULL, path);
+        if (state != NULL && !strcmp(state, "paused")) {
+            libxl__qemu_traditional_cmd(gc, domid, "continue");
+            libxl__wait_for_device_model_deprecated(gc, domid, "running",
+                                                    NULL, NULL, NULL);
+        }
+        break;
+    }
+    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
+        if (libxl__qmp_resume(gc, domid))
+            return ERROR_FAIL;
+        break;
+    default:
+        return ERROR_INVAL;
+    }
+
+    return 0;
+}
+
+int libxl__domain_unpause(libxl__gc *gc, uint32_t domid)
+{
     int ret, rc = 0;
 
     libxl_domain_type type = libxl__domain_type(gc, domid);
@@ -946,14 +973,11 @@ int libxl__domain_unpause(libxl__gc *gc, uint32_t domid)
     }
 
     if (type == LIBXL_DOMAIN_TYPE_HVM) {
-        uint32_t dm_domid = libxl_get_stubdom_id(CTX, domid);
-
-        path = libxl__device_model_xs_path(gc, dm_domid, domid, "/state");
-        state = libxl__xs_read(gc, XBT_NULL, path);
-        if (state != NULL && !strcmp(state, "paused")) {
-            libxl__qemu_traditional_cmd(gc, domid, "continue");
-            libxl__wait_for_device_model_deprecated(gc, domid, "running",
-                                         NULL, NULL, NULL);
+        rc = libxl__domain_unpause_device_model(gc, domid);
+        if (rc < 0) {
+            LOG(ERROR, "failed to unpause device model for domain %u:%d",
+                domid, rc);
+            goto out;
         }
     }
 
-- 
1.9.1


* [PATCH v2 COLOPre 08/13] tools/libxl: introduce libxl__domain_common_switch_qemu_logdirty()
  2015-06-08  3:43 [PATCH v2 COLOPre 00/13] Prerequisite patches for COLO Yang Hongyang
                   ` (6 preceding siblings ...)
  2015-06-08  3:43 ` [PATCH v2 COLOPre 07/13] tools/libxl: Update libxl__domain_unpause() to support qemu-xen Yang Hongyang
@ 2015-06-08  3:43 ` Yang Hongyang
  2015-06-16 10:45   ` Ian Campbell
  2015-06-08  3:43 ` [PATCH v2 COLOPre 09/13] tools/libxl: Update libxl_save_msgs_gen.pl to support return data from xl to xc Yang Hongyang
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 106+ messages in thread
From: Yang Hongyang @ 2015-06-08  3:43 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

The secondary vm runs in COLO mode, and we need to send the secondary
vm's dirty page information to the master at each checkpoint, so we
have to enable qemu logdirty on the secondary.

libxl__domain_suspend_common_switch_qemu_logdirty() enables qemu
logdirty, but it uses domain_save_state and calls
libxl__xc_domain_saverestore_async_callback_done() before it exits,
so it cannot be used for the secondary vm.

Rework libxl__domain_suspend_common_switch_qemu_logdirty() to
introduce a new API, libxl__domain_common_switch_qemu_logdirty().
The new API uses only libxl__logdirty_switch and calls lds->callback
before it exits.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl_dom_save.c | 78 ++++++++++++++++++++++++++------------------
 tools/libxl/libxl_internal.h |  8 +++++
 2 files changed, 54 insertions(+), 32 deletions(-)

diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
index f9627f8..c15e9f1 100644
--- a/tools/libxl/libxl_dom_save.c
+++ b/tools/libxl/libxl_dom_save.c
@@ -44,7 +44,7 @@ static void switch_logdirty_timeout(libxl__egc *egc, libxl__ev_time *ev,
 static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch*,
                             const char *watch_path, const char *event_path);
 static void switch_logdirty_done(libxl__egc *egc,
-                                 libxl__domain_save_state *dss, int ok);
+                                 libxl__logdirty_switch *lds, int ok);
 
 static void logdirty_init(libxl__logdirty_switch *lds)
 {
@@ -54,13 +54,10 @@ static void logdirty_init(libxl__logdirty_switch *lds)
 }
 
 static void domain_suspend_switch_qemu_xen_traditional_logdirty
-                               (int domid, unsigned enable,
-                                libxl__save_helper_state *shs)
+                               (libxl__egc *egc, int domid, unsigned enable,
+                                libxl__logdirty_switch *lds)
 {
-    libxl__egc *egc = shs->egc;
-    libxl__domain_save_state *dss = CONTAINER_OF(shs, *dss, shs);
-    libxl__logdirty_switch *lds = &dss->logdirty;
-    STATE_AO_GC(dss->ao);
+    STATE_AO_GC(lds->ao);
     int rc;
     xs_transaction_t t = 0;
     const char *got;
@@ -122,64 +119,81 @@ static void domain_suspend_switch_qemu_xen_traditional_logdirty
  out:
     LOG(ERROR,"logdirty switch failed (rc=%d), aborting suspend",rc);
     libxl__xs_transaction_abort(gc, &t);
-    switch_logdirty_done(egc,dss,-1);
+    switch_logdirty_done(egc,lds,-1);
 }
 
 static void domain_suspend_switch_qemu_xen_logdirty
-                               (int domid, unsigned enable,
-                                libxl__save_helper_state *shs)
+                               (libxl__egc *egc, int domid, unsigned enable,
+                                libxl__logdirty_switch *lds)
 {
-    libxl__egc *egc = shs->egc;
-    libxl__domain_save_state *dss = CONTAINER_OF(shs, *dss, shs);
-    STATE_AO_GC(dss->ao);
+    STATE_AO_GC(lds->ao);
     int rc;
 
     rc = libxl__qmp_set_global_dirty_log(gc, domid, enable);
     if (!rc) {
-        libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0);
+        lds->callback(egc, lds, 0);
     } else {
         LOG(ERROR,"logdirty switch failed (rc=%d), aborting suspend",rc);
-        libxl__xc_domain_saverestore_async_callback_done(egc, shs, -1);
+        lds->callback(egc, lds, -1);
     }
 }
 
+static void libxl__domain_suspend_switch_qemu_logdirty_done
+                        (libxl__egc *egc, libxl__logdirty_switch *lds, int rc)
+{
+    libxl__domain_save_state *dss = CONTAINER_OF(lds, *dss, logdirty);
+
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, rc);
+}
+
 void libxl__domain_suspend_common_switch_qemu_logdirty
                                (int domid, unsigned enable, void *user)
 {
     libxl__save_helper_state *shs = user;
     libxl__egc *egc = shs->egc;
     libxl__domain_save_state *dss = CONTAINER_OF(shs, *dss, shs);
-    STATE_AO_GC(dss->ao);
+
+    /* convenience aliases */
+    libxl__logdirty_switch *const lds = &dss->logdirty;
+
+    lds->callback = libxl__domain_suspend_switch_qemu_logdirty_done;
+    libxl__domain_common_switch_qemu_logdirty(egc, domid, enable, lds);
+}
+
+void libxl__domain_common_switch_qemu_logdirty(libxl__egc *egc,
+                                               int domid, unsigned enable,
+                                               libxl__logdirty_switch *lds)
+{
+    STATE_AO_GC(lds->ao);
 
     switch (libxl__device_model_version_running(gc, domid)) {
     case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
-        domain_suspend_switch_qemu_xen_traditional_logdirty(domid, enable, shs);
+        domain_suspend_switch_qemu_xen_traditional_logdirty(egc, domid, enable,
+                                                            lds);
         break;
     case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
-        domain_suspend_switch_qemu_xen_logdirty(domid, enable, shs);
+        domain_suspend_switch_qemu_xen_logdirty(egc, domid, enable, lds);
         break;
     default:
         LOG(ERROR,"logdirty switch failed"
             ", no valid device model version found, aborting suspend");
-        libxl__xc_domain_saverestore_async_callback_done(egc, shs, -1);
+        lds->callback(egc, lds, -1);
     }
 }
 static void switch_logdirty_timeout(libxl__egc *egc, libxl__ev_time *ev,
                                     const struct timeval *requested_abs)
 {
-    libxl__domain_save_state *dss = CONTAINER_OF(ev, *dss, logdirty.timeout);
-    STATE_AO_GC(dss->ao);
+    libxl__logdirty_switch *lds = CONTAINER_OF(ev, *lds, timeout);
+    STATE_AO_GC(lds->ao);
     LOG(ERROR,"logdirty switch: wait for device model timed out");
-    switch_logdirty_done(egc,dss,-1);
+    switch_logdirty_done(egc,lds,-1);
 }
 
 static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch *watch,
                             const char *watch_path, const char *event_path)
 {
-    libxl__domain_save_state *dss =
-        CONTAINER_OF(watch, *dss, logdirty.watch);
-    libxl__logdirty_switch *lds = &dss->logdirty;
-    STATE_AO_GC(dss->ao);
+    libxl__logdirty_switch *lds = CONTAINER_OF(watch, *lds, watch);
+    STATE_AO_GC(lds->ao);
     const char *got;
     xs_transaction_t t = 0;
     int rc;
@@ -223,24 +237,23 @@ static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch *watch,
     libxl__xs_transaction_abort(gc, &t);
 
     if (!rc) {
-        switch_logdirty_done(egc,dss,0);
+        switch_logdirty_done(egc,lds,0);
     } else if (rc < 0) {
         LOG(ERROR,"logdirty switch: failed (rc=%d)",rc);
-        switch_logdirty_done(egc,dss,-1);
+        switch_logdirty_done(egc,lds,-1);
     }
 }
 
 static void switch_logdirty_done(libxl__egc *egc,
-                                 libxl__domain_save_state *dss,
+                                 libxl__logdirty_switch *lds,
                                  int broke)
 {
-    STATE_AO_GC(dss->ao);
-    libxl__logdirty_switch *lds = &dss->logdirty;
+    STATE_AO_GC(lds->ao);
 
     libxl__ev_xswatch_deregister(gc, &lds->watch);
     libxl__ev_time_deregister(gc, &lds->timeout);
 
-    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, broke);
+    lds->callback(egc, lds, broke);
 }
 
 /*----- callbacks, called by xc_domain_save -----*/
@@ -357,6 +370,7 @@ void libxl__domain_save(libxl__egc *egc, libxl__domain_save_state *dss)
     libxl__domain_suspend_state *dsps = &dss->dsps;
 
     logdirty_init(&dss->logdirty);
+    dss->logdirty.ao = ao;
     libxl__xswait_init(&dsps->pvcontrol);
     libxl__ev_evtchn_init(&dsps->guest_evtchn);
     libxl__ev_xswatch_init(&dsps->guest_watch);
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 366470f..0b62107 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2837,6 +2837,11 @@ typedef void libxl__save_device_model_cb(libxl__egc*,
                                          libxl__domain_save_state*, int rc);
 
 typedef struct libxl__logdirty_switch {
+    /* set by caller of libxl__domain_common_switch_qemu_logdirty */
+    libxl__ao *ao;
+    void (*callback)(libxl__egc *egc, struct libxl__logdirty_switch *lds,
+                     int rc);
+
     const char *cmd;
     const char *cmd_path;
     const char *ret_path;
@@ -3179,6 +3184,9 @@ void libxl__xc_domain_saverestore_async_callback_done(libxl__egc *egc,
 
 _hidden void libxl__domain_suspend_common_switch_qemu_logdirty
                                (int domid, unsigned int enable, void *data);
+_hidden void libxl__domain_common_switch_qemu_logdirty(libxl__egc *egc,
+                                               int domid, unsigned enable,
+                                               libxl__logdirty_switch *lds);
 _hidden int libxl__toolstack_save(uint32_t domid, uint8_t **buf,
         uint32_t *len, void *data);
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v2 COLOPre 09/13] tools/libxl: Update libxl_save_msgs_gen.pl to support return data from xl to xc
  2015-06-08  3:43 [PATCH v2 COLOPre 00/13] Prerequisite patches for COLO Yang Hongyang
                   ` (7 preceding siblings ...)
  2015-06-08  3:43 ` [PATCH v2 COLOPre 08/13] tools/libxl: introduce libxl__domain_common_switch_qemu_logdirty() Yang Hongyang
@ 2015-06-08  3:43 ` Yang Hongyang
  2015-06-16 10:49   ` Ian Campbell
  2015-06-16 11:05   ` Ian Jackson
  2015-06-08  3:43 ` [PATCH v2 COLOPre 10/13] tools/libxl: Add back channel to allow migration target send data back Yang Hongyang
                   ` (3 subsequent siblings)
  12 siblings, 2 replies; 106+ messages in thread
From: Yang Hongyang @ 2015-06-08  3:43 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

From: Wen Congyang <wency@cn.fujitsu.com>

Currently, all callbacks return either an integer value or void, so
we cannot return data to xc via a callback. Update
libxl_save_msgs_gen.pl to support this case.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl_internal.h       |  3 ++
 tools/libxl/libxl_save_callout.c   | 31 ++++++++++++++++++
 tools/libxl/libxl_save_helper.c    | 17 ++++++++++
 tools/libxl/libxl_save_msgs_gen.pl | 65 ++++++++++++++++++++++++++++++++++----
 4 files changed, 109 insertions(+), 7 deletions(-)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 0b62107..fbbae93 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3180,6 +3180,9 @@ _hidden void libxl__xc_domain_save_done(libxl__egc*, void *dss_void,
  * When they are ready to indicate completion, they call this. */
 void libxl__xc_domain_saverestore_async_callback_done(libxl__egc *egc,
                            libxl__save_helper_state *shs, int return_value);
+void libxl__xc_domain_saverestore_async_callback_done_with_data(libxl__egc *egc,
+                           libxl__save_helper_state *shs,
+                           const void *data, uint64_t size);
 
 
 _hidden void libxl__domain_suspend_common_switch_qemu_logdirty
diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index cd342b9..5c691eb 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -145,6 +145,15 @@ void libxl__xc_domain_saverestore_async_callback_done(libxl__egc *egc,
     shs->egc = 0;
 }
 
+void libxl__xc_domain_saverestore_async_callback_done_with_data(libxl__egc *egc,
+                           libxl__save_helper_state *shs,
+                           const void *data, uint64_t size)
+{
+    shs->egc = egc;
+    libxl__srm_callout_sendreply_data(data, size, shs);
+    shs->egc = 0;
+}
+
 /*----- helper execution -----*/
 
 static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs,
@@ -370,6 +379,28 @@ void libxl__srm_callout_sendreply(int r, void *user)
         helper_failed(egc, shs, ERROR_FAIL);
 }
 
+void libxl__srm_callout_sendreply_data(const void *data, uint64_t size, void *user)
+{
+    libxl__save_helper_state *shs = user;
+    libxl__egc *egc = shs->egc;
+    STATE_AO_GC(shs->ao);
+    int errnoval;
+
+    errnoval = libxl_write_exactly(CTX, libxl__carefd_fd(shs->pipes[0]),
+                                   &size, sizeof(size), shs->stdin_what,
+                                   "callback return data length");
+    if (errnoval)
+        goto out;
+
+    errnoval = libxl_write_exactly(CTX, libxl__carefd_fd(shs->pipes[0]),
+                                   data, size, shs->stdin_what,
+                                   "callback return data");
+
+out:
+    if (errnoval)
+        helper_failed(egc, shs, ERROR_FAIL);
+}
+
 void libxl__srm_callout_callback_log(uint32_t level, uint32_t errnoval,
                   const char *context, const char *formatted, void *user)
 {
diff --git a/tools/libxl/libxl_save_helper.c b/tools/libxl/libxl_save_helper.c
index 74826a1..44c5807 100644
--- a/tools/libxl/libxl_save_helper.c
+++ b/tools/libxl/libxl_save_helper.c
@@ -155,6 +155,23 @@ int helper_getreply(void *user)
     return v;
 }
 
+uint8_t *helper_getreply_data(void *user)
+{
+    uint64_t size;
+    int r = read_exactly(0, &size, sizeof(size));
+    uint8_t *data;
+
+    if (r <= 0)
+        exit(-2);
+
+    data = helper_allocbuf(size, user);
+    r = read_exactly(0, data, size);
+    if (r <= 0)
+        exit(-2);
+
+    return data;
+}
+
 /*----- other callbacks -----*/
 
 static int toolstack_save_fd;
diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
index 6b4b65e..41ee000 100755
--- a/tools/libxl/libxl_save_msgs_gen.pl
+++ b/tools/libxl/libxl_save_msgs_gen.pl
@@ -15,6 +15,7 @@ our @msgs = (
     #         and its null-ness needs to be passed through to the helper's xc
     #   W  - needs a return value; callback is synchronous
     #   A  - needs a return value; callback is asynchronous
+    #   B  - return value is a pointer
     [  1, 'sr',     "log",                   [qw(uint32_t level
                                                  uint32_t errnoval
                                                  STRING context
@@ -99,23 +100,28 @@ our $libxl = "libxl__srm";
 our $callback = "${libxl}_callout_callback";
 our $receiveds = "${libxl}_callout_received";
 our $sendreply = "${libxl}_callout_sendreply";
+our $sendreply_data = "${libxl}_callout_sendreply_data";
 our $getcallbacks = "${libxl}_callout_get_callbacks";
 our $enumcallbacks = "${libxl}_callout_enumcallbacks";
 sub cbtype ($) { "${libxl}_".$_[0]."_autogen_callbacks"; };
 
 f_decl($sendreply, 'callout', 'void', "(int r, void *user)");
+f_decl($sendreply_data, 'callout', 'void',
+       "(const void *data, uint64_t size, void *user)");
 
 our $helper = "helper";
 our $encode = "${helper}_stub";
 our $allocbuf = "${helper}_allocbuf";
 our $transmit = "${helper}_transmitmsg";
 our $getreply = "${helper}_getreply";
+our $getreply_data = "${helper}_getreply_data";
 our $setcallbacks = "${helper}_setcallbacks";
 
 f_decl($allocbuf, 'helper', 'unsigned char *', '(int len, void *user)');
 f_decl($transmit, 'helper', 'void',
        '(unsigned char *msg_freed, int len, void *user)');
 f_decl($getreply, 'helper', 'int', '(void *user)');
+f_decl($getreply_data, 'helper', 'uint8_t *', '(void *user)');
 
 sub typeid ($) { my ($t) = @_; $t =~ s/\W/_/; return $t; };
 
@@ -259,12 +265,36 @@ foreach my $msginfo (@msgs) {
 
     $f_more_sr->("    case $msgnum: { /* $name */\n");
     if ($flags =~ m/W/) {
-        $f_more_sr->("        int r;\n");
+        if ($flags =~ m/B/) {
+            $f_more_sr->("        uint8_t *data;\n".
+                         "        uint64_t size;\n");
+        } else {
+            $f_more_sr->("        int r;\n");
+        }
     }
 
-    my $c_rtype_helper = $flags =~ m/[WA]/ ? 'int' : 'void';
-    my $c_rtype_callout = $flags =~ m/W/ ? 'int' : 'void';
+    my $c_rtype_helper;
+    if ($flags =~ m/[WA]/) {
+        if ($flags =~ m/B/) {
+            $c_rtype_helper = 'uint8_t *'
+        } else {
+            $c_rtype_helper = 'int'
+        }
+    } else {
+        $c_rtype_helper = 'void';
+    }
+    my $c_rtype_callout;
+    if ($flags =~ m/W/) {
+        if ($flags =~ m/B/) {
+            $c_rtype_callout = 'uint8_t *';
+        } else {
+            $c_rtype_callout = 'int';
+        }
+    } else {
+        $c_rtype_callout = 'void';
+    }
     my $c_decl = '(';
+    my $c_helper_decl = '';
     my $c_callback_args = '';
 
     f_more("${encode}_$name",
@@ -305,7 +335,15 @@ END_ALWAYS
         f_more("${encode}_$name", "	${typeid}_put(buf, &len, $c_args);\n");
     }
     $f_more_sr->($c_recv);
+    $c_helper_decl = $c_decl;
+    if ($flags =~ m/W/ and $flags =~ m/B/) {
+        $c_decl .= "uint64_t *size, "
+    }
     $c_decl .= "void *user)";
+    $c_helper_decl .= "void *user)";
+    if ($flags =~ m/W/ and $flags =~ m/B/) {
+        $c_callback_args .= "&size, "
+    }
     $c_callback_args .= "user";
 
     $f_more_sr->("        if (msg != endmsg) return 0;\n");
@@ -326,10 +364,12 @@ END_ALWAYS
     my $c_make_callback = "$c_callback($c_callback_args)";
     if ($flags !~ m/W/) {
 	$f_more_sr->("        $c_make_callback;\n");
+    } elsif ($flags =~ m/B/) {
+        $f_more_sr->("        data = $c_make_callback;\n".
+                     "        $sendreply_data(data, size, user);\n");
     } else {
         $f_more_sr->("        r = $c_make_callback;\n".
                      "        $sendreply(r, user);\n");
-	f_decl($sendreply, 'callout', 'void', '(int r, void *user)');
     }
     if ($flags =~ m/x/) {
         my $c_v = "(1u<<$msgnum)";
@@ -340,7 +380,7 @@ END_ALWAYS
     }
     $f_more_sr->("        return 1;\n    }\n\n");
     f_decl("${callback}_$name", 'callout', $c_rtype_callout, $c_decl);
-    f_decl("${encode}_$name", 'helper', $c_rtype_helper, $c_decl);
+    f_decl("${encode}_$name", 'helper', $c_rtype_helper, $c_helper_decl);
     f_more("${encode}_$name",
 "        if (buf) break;
         buf = ${helper}_allocbuf(len, user);
@@ -352,12 +392,23 @@ END_ALWAYS
     ${transmit}(buf, len, user);
 ");
     if ($flags =~ m/[WA]/) {
-	f_more("${encode}_$name",
-               (<<END_ALWAYS.($debug ? <<END_DEBUG : '').<<END_ALWAYS));
+        if ($flags =~ m/B/) {
+            f_more("${encode}_$name",
+                   (<<END_ALWAYS.($debug ? <<END_DEBUG : '')));
+    uint8_t *r = ${helper}_getreply_data(user);
+END_ALWAYS
+    fprintf(stderr,"libxl-save-helper: $name got reply data\\n");
+END_DEBUG
+        } else {
+            f_more("${encode}_$name",
+                   (<<END_ALWAYS.($debug ? <<END_DEBUG : '')));
     int r = ${helper}_getreply(user);
 END_ALWAYS
     fprintf(stderr,"libxl-save-helper: $name got reply %d\\n",r);
 END_DEBUG
+    }
+
+    f_more("${encode}_$name", (<<END_ALWAYS));
     return r;
 END_ALWAYS
     }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v2 COLOPre 10/13] tools/libxl: Add back channel to allow migration target send data back
  2015-06-08  3:43 [PATCH v2 COLOPre 00/13] Prerequisite patches for COLO Yang Hongyang
                   ` (8 preceding siblings ...)
  2015-06-08  3:43 ` [PATCH v2 COLOPre 09/13] tools/libxl: Update libxl_save_msgs_gen.pl to support return data from xl to xc Yang Hongyang
@ 2015-06-08  3:43 ` Yang Hongyang
  2015-06-12 12:54   ` Wei Liu
  2015-06-08  3:43 ` [PATCH v2 COLOPre 11/13] tools/libxl: rename remus device to checkpoint device Yang Hongyang
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 106+ messages in thread
From: Yang Hongyang @ 2015-06-08  3:43 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

From: Wen Congyang <wency@cn.fujitsu.com>

In COLO mode, the slave needs to send data back to the master, but
the io_fd can only be written on the master side and only read on
the slave side. Save recv_fd in libxl__domain_save_state, and
send_fd in libxl__domain_create_state.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 tools/libxl/libxl.c          |  2 +-
 tools/libxl/libxl_create.c   | 14 ++++++++++----
 tools/libxl/libxl_internal.h |  2 ++
 tools/libxl/libxl_types.idl  |  7 +++++++
 tools/libxl/xl_cmdimpl.c     |  7 +++++++
 5 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 5c843c2..36b97fe 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -832,7 +832,7 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
     dss->callback = remus_failover_cb;
     dss->domid = domid;
     dss->fd = send_fd;
-    /* TODO do something with recv_fd */
+    dss->recv_fd = recv_fd;
     dss->type = type;
     dss->live = 1;
     dss->debug = 0;
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 86384d2..bd8149c 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1577,8 +1577,8 @@ static void domain_create_cb(libxl__egc *egc,
                              int rc, uint32_t domid);
 
 static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
-                            uint32_t *domid,
-                            int restore_fd, int checkpointed_stream,
+                            uint32_t *domid, int restore_fd,
+                            int send_fd, int checkpointed_stream,
                             const libxl_asyncop_how *ao_how,
                             const libxl_asyncprogress_how *aop_console_how)
 {
@@ -1591,6 +1591,7 @@ static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
     libxl_domain_config_init(&cdcs->dcs.guest_config_saved);
     libxl_domain_config_copy(ctx, &cdcs->dcs.guest_config_saved, d_config);
     cdcs->dcs.restore_fd = restore_fd;
+    cdcs->dcs.send_fd = send_fd;
     cdcs->dcs.callback = domain_create_cb;
     cdcs->dcs.checkpointed_stream = checkpointed_stream;
     libxl__ao_progress_gethow(&cdcs->dcs.aop_console_how, aop_console_how);
@@ -1619,7 +1620,7 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
                             const libxl_asyncop_how *ao_how,
                             const libxl_asyncprogress_how *aop_console_how)
 {
-    return do_domain_create(ctx, d_config, domid, -1, 0,
+    return do_domain_create(ctx, d_config, domid, -1, -1, 0,
                             ao_how, aop_console_how);
 }
 
@@ -1629,7 +1630,12 @@ int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
                                 const libxl_asyncop_how *ao_how,
                                 const libxl_asyncprogress_how *aop_console_how)
 {
-    return do_domain_create(ctx, d_config, domid, restore_fd,
+    int send_fd = -1;
+
+    if (params->checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_COLO)
+        send_fd = params->send_fd;
+
+    return do_domain_create(ctx, d_config, domid, restore_fd, send_fd,
                             params->checkpointed_stream, ao_how, aop_console_how);
 }
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index fbbae93..6d214b5 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2874,6 +2874,7 @@ struct libxl__domain_save_state {
 
     uint32_t domid;
     int fd;
+    int recv_fd;
     libxl_domain_type type;
     int live;
     int debug;
@@ -3143,6 +3144,7 @@ struct libxl__domain_create_state {
     libxl_domain_config *guest_config;
     libxl_domain_config guest_config_saved; /* vanilla config */
     int restore_fd;
+    int send_fd;
     libxl__domain_create_cb *callback;
     libxl_asyncprogress_how aop_console_how;
     /* private to domain_create */
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 23f27d4..8a3d7ba 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -198,6 +198,12 @@ libxl_viridian_enlightenment = Enumeration("viridian_enlightenment", [
     (3, "reference_tsc"),
     ])
 
+libxl_checkpointed_stream = Enumeration("checkpointed_stream", [
+    (0, "NONE"),
+    (1, "REMUS"),
+    (2, "COLO"),
+    ], init_val = 0)
+
 #
 # Complex libxl types
 #
@@ -346,6 +352,7 @@ libxl_domain_create_info = Struct("domain_create_info",[
 
 libxl_domain_restore_params = Struct("domain_restore_params", [
     ("checkpointed_stream", integer),
+    ("send_fd", integer),
     ])
 
 libxl_domain_sched_params = Struct("domain_sched_params",[
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index c858068..adfadd1 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -154,6 +154,7 @@ struct domain_create {
     const char *extra_config; /* extra config string */
     const char *restore_file;
     int migrate_fd; /* -1 means none */
+    int send_fd; /* -1 means none */
     char **migration_domname_r; /* from malloc */
 };
 
@@ -2533,6 +2534,7 @@ static uint32_t create_domain(struct domain_create *dom_info)
     void *config_data = 0;
     int config_len = 0;
     int restore_fd = -1;
+    int send_fd = -1;
     const libxl_asyncprogress_how *autoconnect_console_how;
     struct save_file_header hdr;
 
@@ -2549,6 +2551,7 @@ static uint32_t create_domain(struct domain_create *dom_info)
         if (migrate_fd >= 0) {
             restore_source = "<incoming migration stream>";
             restore_fd = migrate_fd;
+            send_fd = dom_info->send_fd;
         } else {
             restore_source = restore_file;
             restore_fd = open(restore_file, O_RDONLY);
@@ -2723,6 +2726,7 @@ start:
         libxl_domain_restore_params_init(&params);
 
         params.checkpointed_stream = dom_info->checkpointed_stream;
+        params.send_fd = send_fd;
         ret = libxl_domain_create_restore(ctx, &d_config,
                                           &domid, restore_fd,
                                           &params,
@@ -4266,6 +4270,7 @@ static void migrate_receive(int debug, int daemonize, int monitor,
     dom_info.monitor = monitor;
     dom_info.paused = 1;
     dom_info.migrate_fd = recv_fd;
+    dom_info.send_fd = send_fd;
     dom_info.migration_domname_r = &migration_domname;
     dom_info.checkpointed_stream = remus;
 
@@ -4436,6 +4441,7 @@ int main_restore(int argc, char **argv)
     dom_info.config_file = config_file;
     dom_info.restore_file = checkpoint_file;
     dom_info.migrate_fd = -1;
+    dom_info.send_fd = -1;
     dom_info.vnc = vnc;
     dom_info.vncautopass = vncautopass;
     dom_info.console_autoconnect = console_autoconnect;
@@ -4886,6 +4892,7 @@ int main_create(int argc, char **argv)
     dom_info.config_file = filename;
     dom_info.extra_config = extra_config;
     dom_info.migrate_fd = -1;
+    dom_info.send_fd = -1;
     dom_info.vnc = vnc;
     dom_info.vncautopass = vncautopass;
     dom_info.console_autoconnect = console_autoconnect;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v2 COLOPre 11/13] tools/libxl: rename remus device to checkpoint device
  2015-06-08  3:43 [PATCH v2 COLOPre 00/13] Prerequisite patches for COLO Yang Hongyang
                   ` (9 preceding siblings ...)
  2015-06-08  3:43 ` [PATCH v2 COLOPre 10/13] tools/libxl: Add back channel to allow migration target send data back Yang Hongyang
@ 2015-06-08  3:43 ` Yang Hongyang
  2015-06-12 13:30   ` Wei Liu
  2015-06-08  3:43 ` [PATCH v2 COLOPre 12/13] tools/libxl: adjust the indentation Yang Hongyang
  2015-06-08  3:43 ` [PATCH v2 COLOPre 13/13] tools/libxl: don't touch remus in checkpoint_device Yang Hongyang
  12 siblings, 1 reply; 106+ messages in thread
From: Yang Hongyang @ 2015-06-08  3:43 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

This patch was auto-generated by the following commands:
 1. git mv tools/libxl/libxl_remus_device.c tools/libxl/libxl_checkpoint_device.c
 2. perl -pi -e 's/libxl_remus_device/libxl_checkpoint_device/g' tools/libxl/Makefile
 3. perl -pi -e 's/\blibxl__remus_devices/libxl__checkpoint_devices/g' tools/libxl/*.[ch]
 4. perl -pi -e 's/\blibxl__remus_device\b/libxl__checkpoint_device/g' tools/libxl/*.[ch]
 5. perl -pi -e 's/\blibxl__remus_device_instance_ops\b/libxl__checkpoint_device_instance_ops/g' tools/libxl/*.[ch]
 6. perl -pi -e 's/\blibxl__remus_callback\b/libxl__checkpoint_callback/g' tools/libxl/*.[ch]
 7. perl -pi -e 's/\bremus_device_init\b/checkpoint_device_init/g' tools/libxl/*.[ch]
 8. perl -pi -e 's/\bremus_devices_setup\b/checkpoint_devices_setup/g' tools/libxl/*.[ch]
 9. perl -pi -e 's/\bdefine_remus_checkpoint_api\b/define_checkpoint_api/g' tools/libxl/*.[ch]
10. perl -pi -e 's/\brds\b/cds/g' tools/libxl/*.[ch]
11. perl -pi -e 's/REMUS_DEVICE/CHECKPOINT_DEVICE/g' tools/libxl/*.[ch] tools/libxl/*.idl
12. perl -pi -e 's/REMUS_DEVOPS/CHECKPOINT_DEVOPS/g' tools/libxl/*.[ch] tools/libxl/*.idl
13. perl -pi -e 's/\bremus\b/checkpoint/g' tools/libxl/libxl_checkpoint_device.[ch]
14. perl -pi -e 's/\bremus device/checkpoint device/g' tools/libxl/libxl_internal.h
15. perl -pi -e 's/\bRemus device/checkpoint device/g' tools/libxl/libxl_internal.h
16. perl -pi -e 's/\bremus abstract/checkpoint abstract/g' tools/libxl/libxl_internal.h
17. perl -pi -e 's/\bremus invocation/checkpoint invocation/g' tools/libxl/libxl_internal.h
18. perl -pi -e 's/\blibxl__remus_device_\(/libxl__checkpoint_device_(/g' tools/libxl/libxl_internal.h

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 tools/libxl/Makefile                  |   2 +-
 tools/libxl/libxl_checkpoint_device.c | 327 ++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_internal.h          | 112 ++++++------
 tools/libxl/libxl_netbuffer.c         | 108 +++++------
 tools/libxl/libxl_nonetbuffer.c       |  10 +-
 tools/libxl/libxl_remus.c             |  76 ++++----
 tools/libxl/libxl_remus_device.c      | 327 ----------------------------------
 tools/libxl/libxl_remus_disk_drbd.c   |  52 +++---
 tools/libxl/libxl_types.idl           |   4 +-
 9 files changed, 509 insertions(+), 509 deletions(-)
 create mode 100644 tools/libxl/libxl_checkpoint_device.c
 delete mode 100644 tools/libxl/libxl_remus_device.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index df51b22..cd63dac 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -56,7 +56,7 @@ else
 LIBXL_OBJS-y += libxl_nonetbuffer.o
 endif
 
-LIBXL_OBJS-y += libxl_remus.o libxl_remus_device.o libxl_remus_disk_drbd.o
+LIBXL_OBJS-y += libxl_remus.o libxl_checkpoint_device.o libxl_remus_disk_drbd.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_psr.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o libxl_libfdt_compat.o
diff --git a/tools/libxl/libxl_checkpoint_device.c b/tools/libxl/libxl_checkpoint_device.c
new file mode 100644
index 0000000..109cd23
--- /dev/null
+++ b/tools/libxl/libxl_checkpoint_device.c
@@ -0,0 +1,327 @@
+/*
+ * Copyright (C) 2014 FUJITSU LIMITED
+ * Author: Yang Hongyang <yanghy@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+extern const libxl__checkpoint_device_instance_ops remus_device_nic;
+extern const libxl__checkpoint_device_instance_ops remus_device_drbd_disk;
+static const libxl__checkpoint_device_instance_ops *remus_ops[] = {
+    &remus_device_nic,
+    &remus_device_drbd_disk,
+    NULL,
+};
+
+/*----- helper functions -----*/
+
+static int init_device_subkind(libxl__checkpoint_devices_state *cds)
+{
+    /* init device subkind-specific state in the libxl ctx */
+    int rc;
+    STATE_AO_GC(cds->ao);
+
+    if (libxl__netbuffer_enabled(gc)) {
+        rc = init_subkind_nic(cds);
+        if (rc) goto out;
+    }
+
+    rc = init_subkind_drbd_disk(cds);
+    if (rc) goto out;
+
+    rc = 0;
+out:
+    return rc;
+}
+
+static void cleanup_device_subkind(libxl__checkpoint_devices_state *cds)
+{
+    /* cleanup device subkind-specific state in the libxl ctx */
+    STATE_AO_GC(cds->ao);
+
+    if (libxl__netbuffer_enabled(gc))
+        cleanup_subkind_nic(cds);
+
+    cleanup_subkind_drbd_disk(cds);
+}
+
+/*----- setup() and teardown() -----*/
+
+/* callbacks */
+
+static void all_devices_setup_cb(libxl__egc *egc,
+                                 libxl__multidev *multidev,
+                                 int rc);
+static void device_setup_iterate(libxl__egc *egc,
+                                 libxl__ao_device *aodev);
+static void devices_teardown_cb(libxl__egc *egc,
+                                libxl__multidev *multidev,
+                                int rc);
+
+/* checkpoint device setup and teardown */
+
+static libxl__checkpoint_device* checkpoint_device_init(libxl__egc *egc,
+                                              libxl__checkpoint_devices_state *cds,
+                                              libxl__device_kind kind,
+                                              void *libxl_dev)
+{
+    libxl__checkpoint_device *dev = NULL;
+
+    STATE_AO_GC(cds->ao);
+    GCNEW(dev);
+    dev->backend_dev = libxl_dev;
+    dev->kind = kind;
+    dev->cds = cds;
+
+    return dev;
+}
+
+static void checkpoint_devices_setup(libxl__egc *egc,
+                                libxl__checkpoint_devices_state *cds);
+
+void libxl__checkpoint_devices_setup(libxl__egc *egc,
+                                     libxl__checkpoint_devices_state *cds)
+{
+    int i, rc;
+
+    STATE_AO_GC(cds->ao);
+
+    rc = init_device_subkind(cds);
+    if (rc)
+        goto out;
+
+    cds->num_devices = 0;
+    cds->num_nics = 0;
+    cds->num_disks = 0;
+
+    if (cds->device_kind_flags & (1 << LIBXL__DEVICE_KIND_VIF))
+        cds->nics = libxl_device_nic_list(CTX, cds->domid, &cds->num_nics);
+
+    if (cds->device_kind_flags & (1 << LIBXL__DEVICE_KIND_VBD))
+        cds->disks = libxl_device_disk_list(CTX, cds->domid, &cds->num_disks);
+
+    if (cds->num_nics == 0 && cds->num_disks == 0)
+        goto out;
+
+    GCNEW_ARRAY(cds->devs, cds->num_nics + cds->num_disks);
+
+    for (i = 0; i < cds->num_nics; i++) {
+        cds->devs[cds->num_devices++] = checkpoint_device_init(egc, cds,
+                                                LIBXL__DEVICE_KIND_VIF,
+                                                &cds->nics[i]);
+    }
+
+    for (i = 0; i < cds->num_disks; i++) {
+        cds->devs[cds->num_devices++] = checkpoint_device_init(egc, cds,
+                                                LIBXL__DEVICE_KIND_VBD,
+                                                &cds->disks[i]);
+    }
+
+    checkpoint_devices_setup(egc, cds);
+
+    return;
+
+out:
+    cds->callback(egc, cds, rc);
+}
+
+static void checkpoint_devices_setup(libxl__egc *egc,
+                                libxl__checkpoint_devices_state *cds)
+{
+    int i, rc;
+
+    STATE_AO_GC(cds->ao);
+
+    libxl__multidev_begin(ao, &cds->multidev);
+    cds->multidev.callback = all_devices_setup_cb;
+    for (i = 0; i < cds->num_devices; i++) {
+        libxl__checkpoint_device *dev = cds->devs[i];
+        dev->ops_index = -1;
+        libxl__multidev_prepare_with_aodev(&cds->multidev, &dev->aodev);
+
+        dev->aodev.rc = ERROR_CHECKPOINT_DEVICE_NOT_SUPPORTED;
+        dev->aodev.callback = device_setup_iterate;
+        device_setup_iterate(egc, &dev->aodev);
+    }
+
+    rc = 0;
+    libxl__multidev_prepared(egc, &cds->multidev, rc);
+}
+
+
+static void device_setup_iterate(libxl__egc *egc, libxl__ao_device *aodev)
+{
+    libxl__checkpoint_device *dev = CONTAINER_OF(aodev, *dev, aodev);
+    EGC_GC;
+
+    if (aodev->rc != ERROR_CHECKPOINT_DEVICE_NOT_SUPPORTED &&
+        aodev->rc != ERROR_CHECKPOINT_DEVOPS_DOES_NOT_MATCH)
+        /* might be success or disaster */
+        goto out;
+
+    do {
+        dev->ops = remus_ops[++dev->ops_index];
+        if (!dev->ops) {
+            libxl_device_nic *nic = NULL;
+            libxl_device_disk *disk = NULL;
+            uint32_t domid;
+            int devid;
+            if (dev->kind == LIBXL__DEVICE_KIND_VIF) {
+                nic = (libxl_device_nic *)dev->backend_dev;
+                domid = nic->backend_domid;
+                devid = nic->devid;
+            } else if (dev->kind == LIBXL__DEVICE_KIND_VBD) {
+                disk = (libxl_device_disk *)dev->backend_dev;
+                domid = disk->backend_domid;
+                devid = libxl__device_disk_dev_number(disk->vdev, NULL, NULL);
+            } else {
+                LOG(ERROR, "device kind not handled by checkpoint: %s",
+                    libxl__device_kind_to_string(dev->kind));
+                aodev->rc = ERROR_FAIL;
+                goto out;
+            }
+            LOG(ERROR, "device not handled by checkpoint"
+                " (device=%s:%"PRId32"/%"PRId32")",
+                libxl__device_kind_to_string(dev->kind),
+                domid, devid);
+            aodev->rc = ERROR_CHECKPOINT_DEVICE_NOT_SUPPORTED;
+            goto out;
+        }
+    } while (dev->ops->kind != dev->kind);
+
+    /* found the next ops_index to try */
+    assert(dev->aodev.callback == device_setup_iterate);
+    dev->ops->setup(egc, dev);
+    return;
+
+ out:
+    libxl__multidev_one_callback(egc, aodev);
+}
+
+static void all_devices_setup_cb(libxl__egc *egc,
+                                 libxl__multidev *multidev,
+                                 int rc)
+{
+    STATE_AO_GC(multidev->ao);
+
+    /* Convenience aliases */
+    libxl__checkpoint_devices_state *const cds =
+                            CONTAINER_OF(multidev, *cds, multidev);
+
+    cds->callback(egc, cds, rc);
+}
+
+void libxl__checkpoint_devices_teardown(libxl__egc *egc,
+                                   libxl__checkpoint_devices_state *cds)
+{
+    int i;
+    libxl__checkpoint_device *dev;
+
+    STATE_AO_GC(cds->ao);
+
+    libxl__multidev_begin(ao, &cds->multidev);
+    cds->multidev.callback = devices_teardown_cb;
+    for (i = 0; i < cds->num_devices; i++) {
+        dev = cds->devs[i];
+        if (!dev->ops || !dev->matched)
+            continue;
+
+        libxl__multidev_prepare_with_aodev(&cds->multidev, &dev->aodev);
+        dev->ops->teardown(egc, dev);
+    }
+
+    libxl__multidev_prepared(egc, &cds->multidev, 0);
+}
+
+static void devices_teardown_cb(libxl__egc *egc,
+                                libxl__multidev *multidev,
+                                int rc)
+{
+    int i;
+
+    STATE_AO_GC(multidev->ao);
+
+    /* Convenience aliases */
+    libxl__checkpoint_devices_state *const cds =
+                            CONTAINER_OF(multidev, *cds, multidev);
+
+    /* clean nic */
+    for (i = 0; i < cds->num_nics; i++)
+        libxl_device_nic_dispose(&cds->nics[i]);
+    free(cds->nics);
+    cds->nics = NULL;
+    cds->num_nics = 0;
+
+    /* clean disk */
+    for (i = 0; i < cds->num_disks; i++)
+        libxl_device_disk_dispose(&cds->disks[i]);
+    free(cds->disks);
+    cds->disks = NULL;
+    cds->num_disks = 0;
+
+    cleanup_device_subkind(cds);
+
+    cds->callback(egc, cds, rc);
+}
+
+/*----- checkpointing APIs -----*/
+
+/* callbacks */
+
+static void devices_checkpoint_cb(libxl__egc *egc,
+                                  libxl__multidev *multidev,
+                                  int rc);
+
+/* API implementations */
+
+#define define_checkpoint_api(api)                                \
+void libxl__checkpoint_devices_##api(libxl__egc *egc,                        \
+                                libxl__checkpoint_devices_state *cds)        \
+{                                                                       \
+    int i;                                                              \
+    libxl__checkpoint_device *dev;                                           \
+                                                                        \
+    STATE_AO_GC(cds->ao);                                               \
+                                                                        \
+    libxl__multidev_begin(ao, &cds->multidev);                          \
+    cds->multidev.callback = devices_checkpoint_cb;                     \
+    for (i = 0; i < cds->num_devices; i++) {                            \
+        dev = cds->devs[i];                                             \
+        if (!dev->matched || !dev->ops->api)                            \
+            continue;                                                   \
+        libxl__multidev_prepare_with_aodev(&cds->multidev, &dev->aodev);\
+        dev->ops->api(egc,dev);                                         \
+    }                                                                   \
+                                                                        \
+    libxl__multidev_prepared(egc, &cds->multidev, 0);                   \
+}
+
+define_checkpoint_api(postsuspend);
+
+define_checkpoint_api(preresume);
+
+define_checkpoint_api(commit);
+
+static void devices_checkpoint_cb(libxl__egc *egc,
+                                  libxl__multidev *multidev,
+                                  int rc)
+{
+    STATE_AO_GC(multidev->ao);
+
+    /* Convenience aliases */
+    libxl__checkpoint_devices_state *const cds =
+                            CONTAINER_OF(multidev, *cds, multidev);
+
+    cds->callback(egc, cds, rc);
+}
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 6d214b5..5399601 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2649,9 +2649,9 @@ typedef struct libxl__save_helper_state {
                       * marshalling and xc callback functions */
 } libxl__save_helper_state;
 
-/*----- remus device related state structure -----*/
+/*----- checkpoint device related state structure -----*/
 /*
- * The abstract Remus device layer exposes a common
+ * The abstract checkpoint device layer exposes a common
  * set of API to [external] libxl for manipulating devices attached to
  * a guest protected by Remus. The device layer also exposes a set of
  * [internal] interfaces that every device type must implement.
@@ -2659,34 +2659,34 @@ typedef struct libxl__save_helper_state {
  * The following API are exposed to libxl:
  *
  * One-time configuration operations:
- *  +libxl__remus_devices_setup
+ *  +libxl__checkpoint_devices_setup
  *    > Enable output buffering for NICs, setup disk replication, etc.
- *  +libxl__remus_devices_teardown
+ *  +libxl__checkpoint_devices_teardown
  *    > Disable output buffering and disk replication; teardown any
  *       associated external setups like qdiscs for NICs.
  *
  * Operations executed every checkpoint (in order of invocation):
- *  +libxl__remus_devices_postsuspend
- *  +libxl__remus_devices_preresume
- *  +libxl__remus_devices_commit
+ *  +libxl__checkpoint_devices_postsuspend
+ *  +libxl__checkpoint_devices_preresume
+ *  +libxl__checkpoint_devices_commit
  *
  * Each device type needs to implement the interfaces specified in
- * the libxl__remus_device_instance_ops if it wishes to support Remus.
+ * the libxl__checkpoint_device_instance_ops if it wishes to support Remus.
  *
- * The high-level control flow through the Remus device layer is shown below:
+ * The high-level control flow through the checkpoint device layer is
+ * shown below:
  *
  * xl remus
  *  |->  libxl_domain_remus_start
- *    |-> libxl__remus_devices_setup
- *      |-> Per-checkpoint libxl__remus_devices_[postsuspend,preresume,commit]
+ *    |-> libxl__checkpoint_devices_setup
+ *      |-> Per-checkpoint libxl__checkpoint_devices_[postsuspend,preresume,commit]
  *        ...
  *        |-> On backup failure, network error or other internal errors:
- *            libxl__remus_devices_teardown
+ *            libxl__checkpoint_devices_teardown
  */
 
-typedef struct libxl__remus_device libxl__remus_device;
-typedef struct libxl__remus_devices_state libxl__remus_devices_state;
-typedef struct libxl__remus_device_instance_ops libxl__remus_device_instance_ops;
+typedef struct libxl__checkpoint_device libxl__checkpoint_device;
+typedef struct libxl__checkpoint_devices_state libxl__checkpoint_devices_state;
+typedef struct libxl__checkpoint_device_instance_ops libxl__checkpoint_device_instance_ops;
 
 /*
  * Interfaces to be implemented by every device subkind that wishes to
@@ -2696,7 +2696,7 @@ typedef struct libxl__remus_device_instance_ops libxl__remus_device_instance_ops
  * synchronous and call dev->aodev.callback directly (as the last
  * thing they do).
  */
-struct libxl__remus_device_instance_ops {
+struct libxl__checkpoint_device_instance_ops {
     /* the device kind this ops belongs to... */
     libxl__device_kind kind;
 
@@ -2707,12 +2707,12 @@ struct libxl__remus_device_instance_ops {
      * Asynchronous.
      */
 
-    void (*postsuspend)(libxl__egc *egc, libxl__remus_device *dev);
-    void (*preresume)(libxl__egc *egc, libxl__remus_device *dev);
-    void (*commit)(libxl__egc *egc, libxl__remus_device *dev);
+    void (*postsuspend)(libxl__egc *egc, libxl__checkpoint_device *dev);
+    void (*preresume)(libxl__egc *egc, libxl__checkpoint_device *dev);
+    void (*commit)(libxl__egc *egc, libxl__checkpoint_device *dev);
 
     /*
-     * setup() and teardown() are refer to the actual remus device.
+     * setup() and teardown() refer to the actual checkpoint device.
      * Asynchronous.
      * teardown is called even if setup fails.
      */
@@ -2721,45 +2721,45 @@ struct libxl__remus_device_instance_ops {
      * device. If matched, the device will then be managed with this set of
      * subkind operations.
      * Yields 0 if the device successfully set up.
-     * REMUS_DEVOPS_DOES_NOT_MATCH if the ops does not match the device.
+     * ERROR_CHECKPOINT_DEVOPS_DOES_NOT_MATCH if the ops does not match
+     * the device.
      * any other rc indicates failure.
      */
-    void (*setup)(libxl__egc *egc, libxl__remus_device *dev);
-    void (*teardown)(libxl__egc *egc, libxl__remus_device *dev);
+    void (*setup)(libxl__egc *egc, libxl__checkpoint_device *dev);
+    void (*teardown)(libxl__egc *egc, libxl__checkpoint_device *dev);
 };
 
-int init_subkind_nic(libxl__remus_devices_state *rds);
-void cleanup_subkind_nic(libxl__remus_devices_state *rds);
-int init_subkind_drbd_disk(libxl__remus_devices_state *rds);
-void cleanup_subkind_drbd_disk(libxl__remus_devices_state *rds);
+int init_subkind_nic(libxl__checkpoint_devices_state *cds);
+void cleanup_subkind_nic(libxl__checkpoint_devices_state *cds);
+int init_subkind_drbd_disk(libxl__checkpoint_devices_state *cds);
+void cleanup_subkind_drbd_disk(libxl__checkpoint_devices_state *cds);
 
-typedef void libxl__remus_callback(libxl__egc *,
-                                   libxl__remus_devices_state *, int rc);
+typedef void libxl__checkpoint_callback(libxl__egc *,
+                                   libxl__checkpoint_devices_state *, int rc);
 
 /*
- * State associated with a remus invocation, including parameters
- * passed to the remus abstract device layer by the remus
+ * State associated with a checkpoint invocation, including parameters
+ * passed to the checkpoint abstract device layer by the remus
  * save/restore machinery.
  */
-struct libxl__remus_devices_state {
-    /*---- must be set by caller of libxl__remus_device_(setup|teardown) ----*/
+struct libxl__checkpoint_devices_state {
+    /*---- must be set by caller of libxl__checkpoint_device_(setup|teardown) ----*/
 
     libxl__ao *ao;
     uint32_t domid;
-    libxl__remus_callback *callback;
+    libxl__checkpoint_callback *callback;
     int device_kind_flags;
 
     /*----- private for abstract layer only -----*/
 
     int num_devices;
     /*
-     * this array is allocated before setup the remus devices by the
-     * remus abstract layer.
-     * devs may be NULL, means there's no remus devices that has been set up.
+     * this array is allocated by the checkpoint abstract layer before
+     * the checkpoint devices are set up.
+     * devs may be NULL, which means no checkpoint devices have been set up.
      * the size of this array is 'num_devices', which is the total number
      * of libxl nic devices and disk devices(num_nics + num_disks).
      */
-    libxl__remus_device **devs;
+    libxl__checkpoint_device **devs;
 
     libxl_device_nic *nics;
     int num_nics;
@@ -2781,20 +2781,20 @@ struct libxl__remus_devices_state {
 
 /*
  * Information about a single device being handled by remus.
- * Allocated by the remus abstract layer.
+ * Allocated by the checkpoint abstract layer.
  */
-struct libxl__remus_device {
+struct libxl__checkpoint_device {
     /*----- shared between abstract and concrete layers -----*/
     /*
      * if this is true, that means the subkind ops match the device
      */
     bool matched;
 
-    /*----- set by remus device abstruct layer -----*/
-    /* libxl__device_* which this remus device related to */
+    /*----- set by checkpoint device abstract layer -----*/
+    /* libxl__device_* which this checkpoint device relates to */
     const void *backend_dev;
     libxl__device_kind kind;
-    libxl__remus_devices_state *rds;
+    libxl__checkpoint_devices_state *cds;
     libxl__ao_device aodev;
 
     /*----- private for abstract layer only -----*/
@@ -2805,7 +2805,7 @@ struct libxl__remus_device {
      * individual devices.
      */
     int ops_index;
-    const libxl__remus_device_instance_ops *ops;
+    const libxl__checkpoint_device_instance_ops *ops;
 
     /*----- private for concrete (device-specific) layer -----*/
 
@@ -2813,17 +2813,17 @@ struct libxl__remus_device {
     void *concrete_data;
 };
 
-/* the following 5 APIs are async ops, call rds->callback when done */
-_hidden void libxl__remus_devices_setup(libxl__egc *egc,
-                                        libxl__remus_devices_state *rds);
-_hidden void libxl__remus_devices_teardown(libxl__egc *egc,
-                                           libxl__remus_devices_state *rds);
-_hidden void libxl__remus_devices_postsuspend(libxl__egc *egc,
-                                              libxl__remus_devices_state *rds);
-_hidden void libxl__remus_devices_preresume(libxl__egc *egc,
-                                            libxl__remus_devices_state *rds);
-_hidden void libxl__remus_devices_commit(libxl__egc *egc,
-                                         libxl__remus_devices_state *rds);
+/* the following 5 APIs are async ops, call cds->callback when done */
+_hidden void libxl__checkpoint_devices_setup(libxl__egc *egc,
+                                        libxl__checkpoint_devices_state *cds);
+_hidden void libxl__checkpoint_devices_teardown(libxl__egc *egc,
+                                           libxl__checkpoint_devices_state *cds);
+_hidden void libxl__checkpoint_devices_postsuspend(libxl__egc *egc,
+                                              libxl__checkpoint_devices_state *cds);
+_hidden void libxl__checkpoint_devices_preresume(libxl__egc *egc,
+                                            libxl__checkpoint_devices_state *cds);
+_hidden void libxl__checkpoint_devices_commit(libxl__egc *egc,
+                                         libxl__checkpoint_devices_state *cds);
 _hidden int libxl__netbuffer_enabled(libxl__gc *gc);
 
 /*----- Domain suspend (save) state structure -----*/
@@ -2883,7 +2883,7 @@ struct libxl__domain_save_state {
     libxl__domain_suspend_state dsps;
     int hvm;
     int xcflags;
-    libxl__remus_devices_state rds;
+    libxl__checkpoint_devices_state cds;
     libxl__ev_time checkpoint_timeout; /* used for Remus checkpoint */
     int interval; /* checkpoint interval (for Remus) */
     libxl__save_helper_state shs;
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
index 71c6531..86afba6 100644
--- a/tools/libxl/libxl_netbuffer.c
+++ b/tools/libxl/libxl_netbuffer.c
@@ -38,21 +38,21 @@ int libxl__netbuffer_enabled(libxl__gc *gc)
     return 1;
 }
 
-int init_subkind_nic(libxl__remus_devices_state *rds)
+int init_subkind_nic(libxl__checkpoint_devices_state *cds)
 {
     int rc, ret;
-    libxl__domain_save_state *dss = CONTAINER_OF(rds, *dss, rds);
+    libxl__domain_save_state *dss = CONTAINER_OF(cds, *dss, cds);
 
-    STATE_AO_GC(rds->ao);
+    STATE_AO_GC(cds->ao);
 
-    rds->nlsock = nl_socket_alloc();
-    if (!rds->nlsock) {
+    cds->nlsock = nl_socket_alloc();
+    if (!cds->nlsock) {
         LOG(ERROR, "cannot allocate nl socket");
         rc = ERROR_FAIL;
         goto out;
     }
 
-    ret = nl_connect(rds->nlsock, NETLINK_ROUTE);
+    ret = nl_connect(cds->nlsock, NETLINK_ROUTE);
     if (ret) {
         LOG(ERROR, "failed to open netlink socket: %s",
             nl_geterror(ret));
@@ -61,7 +61,7 @@ int init_subkind_nic(libxl__remus_devices_state *rds)
     }
 
     /* get list of all qdiscs installed on network devs. */
-    ret = rtnl_qdisc_alloc_cache(rds->nlsock, &rds->qdisc_cache);
+    ret = rtnl_qdisc_alloc_cache(cds->nlsock, &cds->qdisc_cache);
     if (ret) {
         LOG(ERROR, "failed to allocate qdisc cache: %s",
             nl_geterror(ret));
@@ -70,9 +70,9 @@ int init_subkind_nic(libxl__remus_devices_state *rds)
     }
 
     if (dss->remus->netbufscript) {
-        rds->netbufscript = libxl__strdup(gc, dss->remus->netbufscript);
+        cds->netbufscript = libxl__strdup(gc, dss->remus->netbufscript);
     } else {
-        rds->netbufscript = GCSPRINTF("%s/remus-netbuf-setup",
+        cds->netbufscript = GCSPRINTF("%s/remus-netbuf-setup",
                                       libxl__xen_script_dir_path());
     }
 
@@ -82,22 +82,22 @@ out:
     return rc;
 }
 
-void cleanup_subkind_nic(libxl__remus_devices_state *rds)
+void cleanup_subkind_nic(libxl__checkpoint_devices_state *cds)
 {
-    STATE_AO_GC(rds->ao);
+    STATE_AO_GC(cds->ao);
 
     /* free qdisc cache */
-    if (rds->qdisc_cache) {
-        nl_cache_clear(rds->qdisc_cache);
-        nl_cache_free(rds->qdisc_cache);
-        rds->qdisc_cache = NULL;
+    if (cds->qdisc_cache) {
+        nl_cache_clear(cds->qdisc_cache);
+        nl_cache_free(cds->qdisc_cache);
+        cds->qdisc_cache = NULL;
     }
 
     /* close & free nlsock */
-    if (rds->nlsock) {
-        nl_close(rds->nlsock);
-        nl_socket_free(rds->nlsock);
-        rds->nlsock = NULL;
+    if (cds->nlsock) {
+        nl_close(cds->nlsock);
+        nl_socket_free(cds->nlsock);
+        cds->nlsock = NULL;
     }
 }
 
@@ -111,17 +111,17 @@ void cleanup_subkind_nic(libxl__remus_devices_state *rds)
  * it must ONLY be used for remus because if driver domains
  * were in use it would constitute a security vulnerability.
  */
-static const char *get_vifname(libxl__remus_device *dev,
+static const char *get_vifname(libxl__checkpoint_device *dev,
                                const libxl_device_nic *nic)
 {
     const char *vifname = NULL;
     const char *path;
     int rc;
 
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
     /* Convenience aliases */
-    const uint32_t domid = dev->rds->domid;
+    const uint32_t domid = dev->cds->domid;
 
     path = GCSPRINTF("%s/backend/vif/%d/%d/vifname",
                      libxl__xs_get_dompath(gc, 0), domid, nic->devid);
@@ -144,19 +144,19 @@ static void free_qdisc(libxl__remus_device_nic *remus_nic)
     remus_nic->qdisc = NULL;
 }
 
-static int init_qdisc(libxl__remus_devices_state *rds,
+static int init_qdisc(libxl__checkpoint_devices_state *cds,
                       libxl__remus_device_nic *remus_nic)
 {
     int rc, ret, ifindex;
     struct rtnl_link *ifb = NULL;
     struct rtnl_qdisc *qdisc = NULL;
 
-    STATE_AO_GC(rds->ao);
+    STATE_AO_GC(cds->ao);
 
     /* Now that we have brought up REMUS_IFB device with plug qdisc for
      * this vif, so we need to refill the qdisc cache.
      */
-    ret = nl_cache_refill(rds->nlsock, rds->qdisc_cache);
+    ret = nl_cache_refill(cds->nlsock, cds->qdisc_cache);
     if (ret) {
         LOG(ERROR, "cannot refill qdisc cache: %s", nl_geterror(ret));
         rc = ERROR_FAIL;
@@ -164,7 +164,7 @@ static int init_qdisc(libxl__remus_devices_state *rds,
     }
 
     /* get a handle to the REMUS_IFB interface */
-    ret = rtnl_link_get_kernel(rds->nlsock, 0, remus_nic->ifb, &ifb);
+    ret = rtnl_link_get_kernel(cds->nlsock, 0, remus_nic->ifb, &ifb);
     if (ret) {
         LOG(ERROR, "cannot obtain handle for %s: %s", remus_nic->ifb,
             nl_geterror(ret));
@@ -187,7 +187,7 @@ static int init_qdisc(libxl__remus_devices_state *rds,
      * There is no need to explicitly free this qdisc as its just a
      * reference from the qdisc cache we allocated earlier.
      */
-    qdisc = rtnl_qdisc_get_by_parent(rds->qdisc_cache, ifindex, TC_H_ROOT);
+    qdisc = rtnl_qdisc_get_by_parent(cds->qdisc_cache, ifindex, TC_H_ROOT);
     if (qdisc) {
         const char *tc_kind = rtnl_tc_get_kind(TC_CAST(qdisc));
         /* Sanity check: Ensure that the root qdisc is a plug qdisc. */
@@ -231,19 +231,19 @@ static void netbuf_teardown_script_cb(libxl__egc *egc,
  * $REMUS_IFB (for teardown)
  * setup/teardown as command line arg.
  */
-static void setup_async_exec(libxl__remus_device *dev, char *op)
+static void setup_async_exec(libxl__checkpoint_device *dev, char *op)
 {
     int arraysize, nr = 0;
     char **env = NULL, **args = NULL;
     libxl__remus_device_nic *remus_nic = dev->concrete_data;
-    libxl__remus_devices_state *rds = dev->rds;
+    libxl__checkpoint_devices_state *cds = dev->cds;
     libxl__async_exec_state *aes = &dev->aodev.aes;
 
-    STATE_AO_GC(rds->ao);
+    STATE_AO_GC(cds->ao);
 
     /* Convenience aliases */
-    char *const script = libxl__strdup(gc, rds->netbufscript);
-    const uint32_t domid = rds->domid;
+    char *const script = libxl__strdup(gc, cds->netbufscript);
+    const uint32_t domid = cds->domid;
     const int dev_id = remus_nic->devid;
     const char *const vif = remus_nic->vif;
     const char *const ifb = remus_nic->ifb;
@@ -269,7 +269,7 @@ static void setup_async_exec(libxl__remus_device *dev, char *op)
     args[nr++] = NULL;
     assert(nr == arraysize);
 
-    aes->ao = dev->rds->ao;
+    aes->ao = dev->cds->ao;
     aes->what = GCSPRINTF("%s %s", args[0], args[1]);
     aes->env = env;
     aes->args = args;
@@ -286,13 +286,13 @@ static void setup_async_exec(libxl__remus_device *dev, char *op)
 
 /* setup() and teardown() */
 
-static void nic_setup(libxl__egc *egc, libxl__remus_device *dev)
+static void nic_setup(libxl__egc *egc, libxl__checkpoint_device *dev)
 {
     int rc;
     libxl__remus_device_nic *remus_nic;
     const libxl_device_nic *nic = dev->backend_dev;
 
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
     /*
      * thers's no subkind of nic devices, so nic ops is always matched
@@ -330,16 +330,16 @@ static void netbuf_setup_script_cb(libxl__egc *egc,
                                    int status)
 {
     libxl__ao_device *aodev = CONTAINER_OF(aes, *aodev, aes);
-    libxl__remus_device *dev = CONTAINER_OF(aodev, *dev, aodev);
+    libxl__checkpoint_device *dev = CONTAINER_OF(aodev, *dev, aodev);
     libxl__remus_device_nic *remus_nic = dev->concrete_data;
-    libxl__remus_devices_state *rds = dev->rds;
+    libxl__checkpoint_devices_state *cds = dev->cds;
     const char *out_path_base, *hotplug_error = NULL;
     int rc;
 
-    STATE_AO_GC(rds->ao);
+    STATE_AO_GC(cds->ao);
 
     /* Convenience aliases */
-    const uint32_t domid = rds->domid;
+    const uint32_t domid = cds->domid;
     const int devid = remus_nic->devid;
     const char *const vif = remus_nic->vif;
     const char **const ifb = &remus_nic->ifb;
@@ -373,7 +373,7 @@ static void netbuf_setup_script_cb(libxl__egc *egc,
 
     if (hotplug_error) {
         LOG(ERROR, "netbuf script %s setup failed for vif %s: %s",
-            rds->netbufscript, vif, hotplug_error);
+            cds->netbufscript, vif, hotplug_error);
         rc = ERROR_FAIL;
         goto out;
     }
@@ -384,17 +384,17 @@ static void netbuf_setup_script_cb(libxl__egc *egc,
     }
 
     LOG(DEBUG, "%s will buffer packets from vif %s", *ifb, vif);
-    rc = init_qdisc(rds, remus_nic);
+    rc = init_qdisc(cds, remus_nic);
 
 out:
     aodev->rc = rc;
     aodev->callback(egc, aodev);
 }
 
-static void nic_teardown(libxl__egc *egc, libxl__remus_device *dev)
+static void nic_teardown(libxl__egc *egc, libxl__checkpoint_device *dev)
 {
     int rc;
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
     setup_async_exec(dev, "teardown");
 
@@ -415,7 +415,7 @@ static void netbuf_teardown_script_cb(libxl__egc *egc,
 {
     int rc;
     libxl__ao_device *aodev = CONTAINER_OF(aes, *aodev, aes);
-    libxl__remus_device *dev = CONTAINER_OF(aodev, *dev, aodev);
+    libxl__checkpoint_device *dev = CONTAINER_OF(aodev, *dev, aodev);
     libxl__remus_device_nic *remus_nic = dev->concrete_data;
 
     if (status)
@@ -440,12 +440,12 @@ enum {
 /* API implementations */
 
 static int remus_netbuf_op(libxl__remus_device_nic *remus_nic,
-                           libxl__remus_devices_state *rds,
+                           libxl__checkpoint_devices_state *cds,
                            int buffer_op)
 {
     int rc, ret;
 
-    STATE_AO_GC(rds->ao);
+    STATE_AO_GC(cds->ao);
 
     if (buffer_op == tc_buffer_start)
         ret = rtnl_qdisc_plug_buffer(remus_nic->qdisc);
@@ -457,7 +457,7 @@ static int remus_netbuf_op(libxl__remus_device_nic *remus_nic,
         goto out;
     }
 
-    ret = rtnl_qdisc_add(rds->nlsock, remus_nic->qdisc, NLM_F_REQUEST);
+    ret = rtnl_qdisc_add(cds->nlsock, remus_nic->qdisc, NLM_F_REQUEST);
     if (ret) {
         rc = ERROR_FAIL;
         goto out;
@@ -474,33 +474,33 @@ out:
     return rc;
 }
 
-static void nic_postsuspend(libxl__egc *egc, libxl__remus_device *dev)
+static void nic_postsuspend(libxl__egc *egc, libxl__checkpoint_device *dev)
 {
     int rc;
     libxl__remus_device_nic *remus_nic = dev->concrete_data;
 
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
-    rc = remus_netbuf_op(remus_nic, dev->rds, tc_buffer_start);
+    rc = remus_netbuf_op(remus_nic, dev->cds, tc_buffer_start);
 
     dev->aodev.rc = rc;
     dev->aodev.callback(egc, &dev->aodev);
 }
 
-static void nic_commit(libxl__egc *egc, libxl__remus_device *dev)
+static void nic_commit(libxl__egc *egc, libxl__checkpoint_device *dev)
 {
     int rc;
     libxl__remus_device_nic *remus_nic = dev->concrete_data;
 
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
-    rc = remus_netbuf_op(remus_nic, dev->rds, tc_buffer_release);
+    rc = remus_netbuf_op(remus_nic, dev->cds, tc_buffer_release);
 
     dev->aodev.rc = rc;
     dev->aodev.callback(egc, &dev->aodev);
 }
 
-const libxl__remus_device_instance_ops remus_device_nic = {
+const libxl__checkpoint_device_instance_ops remus_device_nic = {
     .kind = LIBXL__DEVICE_KIND_VIF,
     .setup = nic_setup,
     .teardown = nic_teardown,
diff --git a/tools/libxl/libxl_nonetbuffer.c b/tools/libxl/libxl_nonetbuffer.c
index 3c659c2..4b68152 100644
--- a/tools/libxl/libxl_nonetbuffer.c
+++ b/tools/libxl/libxl_nonetbuffer.c
@@ -22,25 +22,25 @@ int libxl__netbuffer_enabled(libxl__gc *gc)
     return 0;
 }
 
-int init_subkind_nic(libxl__remus_devices_state *rds)
+int init_subkind_nic(libxl__checkpoint_devices_state *cds)
 {
     return 0;
 }
 
-void cleanup_subkind_nic(libxl__remus_devices_state *rds)
+void cleanup_subkind_nic(libxl__checkpoint_devices_state *cds)
 {
     return;
 }
 
-static void nic_setup(libxl__egc *egc, libxl__remus_device *dev)
+static void nic_setup(libxl__egc *egc, libxl__checkpoint_device *dev)
 {
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
     dev->aodev.rc = ERROR_FAIL;
     dev->aodev.callback(egc, &dev->aodev);
 }
 
-const libxl__remus_device_instance_ops remus_device_nic = {
+const libxl__checkpoint_device_instance_ops remus_device_nic = {
     .kind = LIBXL__DEVICE_KIND_VIF,
     .setup = nic_setup,
 };
diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
index d6cb832..ba26cde 100644
--- a/tools/libxl/libxl_remus.c
+++ b/tools/libxl/libxl_remus.c
@@ -20,14 +20,14 @@
 /*----- Remus setup and teardown -----*/
 
 static void remus_setup_done(libxl__egc *egc,
-                             libxl__remus_devices_state *rds, int rc);
+                             libxl__checkpoint_devices_state *cds, int rc);
 static void remus_setup_failed(libxl__egc *egc,
-                               libxl__remus_devices_state *rds, int rc);
+                               libxl__checkpoint_devices_state *cds, int rc);
 
 void libxl__remus_setup(libxl__egc *egc, libxl__domain_save_state *dss)
 {
     /* Convenience aliases */
-    libxl__remus_devices_state *const rds = &dss->rds;
+    libxl__checkpoint_devices_state *const cds = &dss->cds;
     const libxl_domain_remus_info *const info = dss->remus;
 
     STATE_AO_GC(dss->ao);
@@ -37,17 +37,17 @@ void libxl__remus_setup(libxl__egc *egc, libxl__domain_save_state *dss)
             LOG(ERROR, "Remus: No support for network buffering");
             goto out;
         }
-        rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VIF);
+        cds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VIF);
     }
 
     if (libxl_defbool_val(info->diskbuf))
-        rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VBD);
+        cds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VBD);
 
-    rds->ao = ao;
-    rds->domid = dss->domid;
-    rds->callback = remus_setup_done;
+    cds->ao = ao;
+    cds->domid = dss->domid;
+    cds->callback = remus_setup_done;
 
-    libxl__remus_devices_setup(egc, rds);
+    libxl__checkpoint_devices_setup(egc, cds);
     return;
 
 out:
@@ -55,9 +55,9 @@ out:
 }
 
 static void remus_setup_done(libxl__egc *egc,
-                             libxl__remus_devices_state *rds, int rc)
+                                   libxl__checkpoint_devices_state *cds, int rc)
 {
-    libxl__domain_save_state *dss = CONTAINER_OF(rds, *dss, rds);
+    libxl__domain_save_state *dss = CONTAINER_OF(cds, *dss, cds);
     STATE_AO_GC(dss->ao);
 
     if (!rc) {
@@ -67,14 +67,14 @@ static void remus_setup_done(libxl__egc *egc,
 
     LOG(ERROR, "Remus: failed to setup device for guest with domid %u, rc %d",
         dss->domid, rc);
-    rds->callback = remus_setup_failed;
-    libxl__remus_devices_teardown(egc, rds);
+    cds->callback = remus_setup_failed;
+    libxl__checkpoint_devices_teardown(egc, cds);
 }
 
 static void remus_setup_failed(libxl__egc *egc,
-                               libxl__remus_devices_state *rds, int rc)
+                               libxl__checkpoint_devices_state *cds, int rc)
 {
-    libxl__domain_save_state *dss = CONTAINER_OF(rds, *dss, rds);
+    libxl__domain_save_state *dss = CONTAINER_OF(cds, *dss, cds);
     STATE_AO_GC(dss->ao);
 
     if (rc)
@@ -85,7 +85,7 @@ static void remus_setup_failed(libxl__egc *egc,
 }
 
 static void remus_teardown_done(libxl__egc *egc,
-                                libxl__remus_devices_state *rds,
+                                libxl__checkpoint_devices_state *cds,
                                 int rc);
 void libxl__remus_teardown(libxl__egc *egc,
                            libxl__domain_save_state *dss,
@@ -95,15 +95,15 @@ void libxl__remus_teardown(libxl__egc *egc,
 
     LOG(WARN, "Remus: Domain suspend terminated with rc %d,"
         " teardown Remus devices...", rc);
-    dss->rds.callback = remus_teardown_done;
-    libxl__remus_devices_teardown(egc, &dss->rds);
+    dss->cds.callback = remus_teardown_done;
+    libxl__checkpoint_devices_teardown(egc, &dss->cds);
 }
 
 static void remus_teardown_done(libxl__egc *egc,
-                                libxl__remus_devices_state *rds,
+                                libxl__checkpoint_devices_state *cds,
                                 int rc)
 {
-    libxl__domain_save_state *dss = CONTAINER_OF(rds, *dss, rds);
+    libxl__domain_save_state *dss = CONTAINER_OF(cds, *dss, cds);
     STATE_AO_GC(dss->ao);
 
     if (rc)
@@ -117,10 +117,10 @@ static void remus_teardown_done(libxl__egc *egc,
 static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
                                 libxl__domain_suspend_state *dsps, int ok);
 static void remus_devices_postsuspend_cb(libxl__egc *egc,
-                                         libxl__remus_devices_state *rds,
+                                         libxl__checkpoint_devices_state *cds,
                                          int rc);
 static void remus_devices_preresume_cb(libxl__egc *egc,
-                                       libxl__remus_devices_state *rds,
+                                       libxl__checkpoint_devices_state *cds,
                                        int rc);
 
 void libxl__remus_domain_suspend_callback(void *data)
@@ -142,9 +142,9 @@ static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
     if (!ok)
         goto out;
 
-    libxl__remus_devices_state *const rds = &dss->rds;
-    rds->callback = remus_devices_postsuspend_cb;
-    libxl__remus_devices_postsuspend(egc, rds);
+    libxl__checkpoint_devices_state *const cds = &dss->cds;
+    cds->callback = remus_devices_postsuspend_cb;
+    libxl__checkpoint_devices_postsuspend(egc, cds);
     return;
 
 out:
@@ -152,11 +152,11 @@ out:
 }
 
 static void remus_devices_postsuspend_cb(libxl__egc *egc,
-                                         libxl__remus_devices_state *rds,
+                                         libxl__checkpoint_devices_state *cds,
                                          int rc)
 {
     int ok = 0;
-    libxl__domain_save_state *dss = CONTAINER_OF(rds, *dss, rds);
+    libxl__domain_save_state *dss = CONTAINER_OF(cds, *dss, cds);
 
     if (rc)
         goto out;
@@ -174,17 +174,17 @@ void libxl__remus_domain_resume_callback(void *data)
     libxl__domain_save_state *dss = CONTAINER_OF(shs, *dss, shs);
     STATE_AO_GC(dss->ao);
 
-    libxl__remus_devices_state *const rds = &dss->rds;
-    rds->callback = remus_devices_preresume_cb;
-    libxl__remus_devices_preresume(egc, rds);
+    libxl__checkpoint_devices_state *const cds = &dss->cds;
+    cds->callback = remus_devices_preresume_cb;
+    libxl__checkpoint_devices_preresume(egc, cds);
 }
 
 static void remus_devices_preresume_cb(libxl__egc *egc,
-                                       libxl__remus_devices_state *rds,
+                                       libxl__checkpoint_devices_state *cds,
                                        int rc)
 {
     int ok = 0;
-    libxl__domain_save_state *dss = CONTAINER_OF(rds, *dss, rds);
+    libxl__domain_save_state *dss = CONTAINER_OF(cds, *dss, cds);
     STATE_AO_GC(dss->ao);
 
     if (rc)
@@ -206,7 +206,7 @@ out:
 static void remus_checkpoint_dm_saved(libxl__egc *egc,
                                       libxl__domain_save_state *dss, int rc);
 static void remus_devices_commit_cb(libxl__egc *egc,
-                                    libxl__remus_devices_state *rds,
+                                    libxl__checkpoint_devices_state *cds,
                                     int rc);
 static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
                                   const struct timeval *requested_abs);
@@ -230,7 +230,7 @@ static void remus_checkpoint_dm_saved(libxl__egc *egc,
                                       libxl__domain_save_state *dss, int rc)
 {
     /* Convenience aliases */
-    libxl__remus_devices_state *const rds = &dss->rds;
+    libxl__checkpoint_devices_state *const cds = &dss->cds;
 
     STATE_AO_GC(dss->ao);
 
@@ -239,8 +239,8 @@ static void remus_checkpoint_dm_saved(libxl__egc *egc,
         goto out;
     }
 
-    rds->callback = remus_devices_commit_cb;
-    libxl__remus_devices_commit(egc, rds);
+    cds->callback = remus_devices_commit_cb;
+    libxl__checkpoint_devices_commit(egc, cds);
 
     return;
 
@@ -249,10 +249,10 @@ out:
 }
 
 static void remus_devices_commit_cb(libxl__egc *egc,
-                                    libxl__remus_devices_state *rds,
+                                    libxl__checkpoint_devices_state *cds,
                                     int rc)
 {
-    libxl__domain_save_state *dss = CONTAINER_OF(rds, *dss, rds);
+    libxl__domain_save_state *dss = CONTAINER_OF(cds, *dss, cds);
 
     STATE_AO_GC(dss->ao);
 
diff --git a/tools/libxl/libxl_remus_device.c b/tools/libxl/libxl_remus_device.c
deleted file mode 100644
index a6cb7f6..0000000
--- a/tools/libxl/libxl_remus_device.c
+++ /dev/null
@@ -1,327 +0,0 @@
-/*
- * Copyright (C) 2014 FUJITSU LIMITED
- * Author: Yang Hongyang <yanghy@cn.fujitsu.com>
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU Lesser General Public License as published
- * by the Free Software Foundation; version 2.1 only. with the special
- * exception on linking described in file LICENSE.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU Lesser General Public License for more details.
- */
-
-#include "libxl_osdeps.h" /* must come before any other headers */
-
-#include "libxl_internal.h"
-
-extern const libxl__remus_device_instance_ops remus_device_nic;
-extern const libxl__remus_device_instance_ops remus_device_drbd_disk;
-static const libxl__remus_device_instance_ops *remus_ops[] = {
-    &remus_device_nic,
-    &remus_device_drbd_disk,
-    NULL,
-};
-
-/*----- helper functions -----*/
-
-static int init_device_subkind(libxl__remus_devices_state *rds)
-{
-    /* init device subkind-specific state in the libxl ctx */
-    int rc;
-    STATE_AO_GC(rds->ao);
-
-    if (libxl__netbuffer_enabled(gc)) {
-        rc = init_subkind_nic(rds);
-        if (rc) goto out;
-    }
-
-    rc = init_subkind_drbd_disk(rds);
-    if (rc) goto out;
-
-    rc = 0;
-out:
-    return rc;
-}
-
-static void cleanup_device_subkind(libxl__remus_devices_state *rds)
-{
-    /* cleanup device subkind-specific state in the libxl ctx */
-    STATE_AO_GC(rds->ao);
-
-    if (libxl__netbuffer_enabled(gc))
-        cleanup_subkind_nic(rds);
-
-    cleanup_subkind_drbd_disk(rds);
-}
-
-/*----- setup() and teardown() -----*/
-
-/* callbacks */
-
-static void all_devices_setup_cb(libxl__egc *egc,
-                                 libxl__multidev *multidev,
-                                 int rc);
-static void device_setup_iterate(libxl__egc *egc,
-                                 libxl__ao_device *aodev);
-static void devices_teardown_cb(libxl__egc *egc,
-                                libxl__multidev *multidev,
-                                int rc);
-
-/* remus device setup and teardown */
-
-static libxl__remus_device* remus_device_init(libxl__egc *egc,
-                                              libxl__remus_devices_state *rds,
-                                              libxl__device_kind kind,
-                                              void *libxl_dev)
-{
-    libxl__remus_device *dev = NULL;
-
-    STATE_AO_GC(rds->ao);
-    GCNEW(dev);
-    dev->backend_dev = libxl_dev;
-    dev->kind = kind;
-    dev->rds = rds;
-
-    return dev;
-}
-
-static void remus_devices_setup(libxl__egc *egc,
-                                libxl__remus_devices_state *rds);
-
-void libxl__remus_devices_setup(libxl__egc *egc, libxl__remus_devices_state *rds)
-{
-    int i, rc;
-
-    STATE_AO_GC(rds->ao);
-
-    rc = init_device_subkind(rds);
-    if (rc)
-        goto out;
-
-    rds->num_devices = 0;
-    rds->num_nics = 0;
-    rds->num_disks = 0;
-
-    if (rds->device_kind_flags & (1 << LIBXL__DEVICE_KIND_VIF))
-        rds->nics = libxl_device_nic_list(CTX, rds->domid, &rds->num_nics);
-
-    if (rds->device_kind_flags & (1 << LIBXL__DEVICE_KIND_VBD))
-        rds->disks = libxl_device_disk_list(CTX, rds->domid, &rds->num_disks);
-
-    if (rds->num_nics == 0 && rds->num_disks == 0)
-        goto out;
-
-    GCNEW_ARRAY(rds->devs, rds->num_nics + rds->num_disks);
-
-    for (i = 0; i < rds->num_nics; i++) {
-        rds->devs[rds->num_devices++] = remus_device_init(egc, rds,
-                                                LIBXL__DEVICE_KIND_VIF,
-                                                &rds->nics[i]);
-    }
-
-    for (i = 0; i < rds->num_disks; i++) {
-        rds->devs[rds->num_devices++] = remus_device_init(egc, rds,
-                                                LIBXL__DEVICE_KIND_VBD,
-                                                &rds->disks[i]);
-    }
-
-    remus_devices_setup(egc, rds);
-
-    return;
-
-out:
-    rds->callback(egc, rds, rc);
-}
-
-static void remus_devices_setup(libxl__egc *egc,
-                                libxl__remus_devices_state *rds)
-{
-    int i, rc;
-
-    STATE_AO_GC(rds->ao);
-
-    libxl__multidev_begin(ao, &rds->multidev);
-    rds->multidev.callback = all_devices_setup_cb;
-    for (i = 0; i < rds->num_devices; i++) {
-        libxl__remus_device *dev = rds->devs[i];
-        dev->ops_index = -1;
-        libxl__multidev_prepare_with_aodev(&rds->multidev, &dev->aodev);
-
-        dev->aodev.rc = ERROR_REMUS_DEVICE_NOT_SUPPORTED;
-        dev->aodev.callback = device_setup_iterate;
-        device_setup_iterate(egc,&dev->aodev);
-    }
-
-    rc = 0;
-    libxl__multidev_prepared(egc, &rds->multidev, rc);
-}
-
-
-static void device_setup_iterate(libxl__egc *egc, libxl__ao_device *aodev)
-{
-    libxl__remus_device *dev = CONTAINER_OF(aodev, *dev, aodev);
-    EGC_GC;
-
-    if (aodev->rc != ERROR_REMUS_DEVICE_NOT_SUPPORTED &&
-        aodev->rc != ERROR_REMUS_DEVOPS_DOES_NOT_MATCH)
-        /* might be success or disaster */
-        goto out;
-
-    do {
-        dev->ops = remus_ops[++dev->ops_index];
-        if (!dev->ops) {
-            libxl_device_nic * nic = NULL;
-            libxl_device_disk * disk = NULL;
-            uint32_t domid;
-            int devid;
-            if (dev->kind == LIBXL__DEVICE_KIND_VIF) {
-                nic = (libxl_device_nic *)dev->backend_dev;
-                domid = nic->backend_domid;
-                devid = nic->devid;
-            } else if (dev->kind == LIBXL__DEVICE_KIND_VBD) {
-                disk = (libxl_device_disk *)dev->backend_dev;
-                domid = disk->backend_domid;
-                devid = libxl__device_disk_dev_number(disk->vdev, NULL, NULL);
-            } else {
-                LOG(ERROR,"device kind not handled by remus: %s",
-                    libxl__device_kind_to_string(dev->kind));
-                aodev->rc = ERROR_FAIL;
-                goto out;
-            }
-            LOG(ERROR,"device not handled by remus"
-                " (device=%s:%"PRId32"/%"PRId32")",
-                libxl__device_kind_to_string(dev->kind),
-                domid, devid);
-            aodev->rc = ERROR_REMUS_DEVICE_NOT_SUPPORTED;
-            goto out;
-        }
-    } while (dev->ops->kind != dev->kind);
-
-    /* found the next ops_index to try */
-    assert(dev->aodev.callback == device_setup_iterate);
-    dev->ops->setup(egc,dev);
-    return;
-
- out:
-    libxl__multidev_one_callback(egc,aodev);
-}
-
-static void all_devices_setup_cb(libxl__egc *egc,
-                                 libxl__multidev *multidev,
-                                 int rc)
-{
-    STATE_AO_GC(multidev->ao);
-
-    /* Convenience aliases */
-    libxl__remus_devices_state *const rds =
-                            CONTAINER_OF(multidev, *rds, multidev);
-
-    rds->callback(egc, rds, rc);
-}
-
-void libxl__remus_devices_teardown(libxl__egc *egc,
-                                   libxl__remus_devices_state *rds)
-{
-    int i;
-    libxl__remus_device *dev;
-
-    STATE_AO_GC(rds->ao);
-
-    libxl__multidev_begin(ao, &rds->multidev);
-    rds->multidev.callback = devices_teardown_cb;
-    for (i = 0; i < rds->num_devices; i++) {
-        dev = rds->devs[i];
-        if (!dev->ops || !dev->matched)
-            continue;
-
-        libxl__multidev_prepare_with_aodev(&rds->multidev, &dev->aodev);
-        dev->ops->teardown(egc,dev);
-    }
-
-    libxl__multidev_prepared(egc, &rds->multidev, 0);
-}
-
-static void devices_teardown_cb(libxl__egc *egc,
-                                libxl__multidev *multidev,
-                                int rc)
-{
-    int i;
-
-    STATE_AO_GC(multidev->ao);
-
-    /* Convenience aliases */
-    libxl__remus_devices_state *const rds =
-                            CONTAINER_OF(multidev, *rds, multidev);
-
-    /* clean nic */
-    for (i = 0; i < rds->num_nics; i++)
-        libxl_device_nic_dispose(&rds->nics[i]);
-    free(rds->nics);
-    rds->nics = NULL;
-    rds->num_nics = 0;
-
-    /* clean disk */
-    for (i = 0; i < rds->num_disks; i++)
-        libxl_device_disk_dispose(&rds->disks[i]);
-    free(rds->disks);
-    rds->disks = NULL;
-    rds->num_disks = 0;
-
-    cleanup_device_subkind(rds);
-
-    rds->callback(egc, rds, rc);
-}
-
-/*----- checkpointing APIs -----*/
-
-/* callbacks */
-
-static void devices_checkpoint_cb(libxl__egc *egc,
-                                  libxl__multidev *multidev,
-                                  int rc);
-
-/* API implementations */
-
-#define define_remus_checkpoint_api(api)                                \
-void libxl__remus_devices_##api(libxl__egc *egc,                        \
-                                libxl__remus_devices_state *rds)        \
-{                                                                       \
-    int i;                                                              \
-    libxl__remus_device *dev;                                           \
-                                                                        \
-    STATE_AO_GC(rds->ao);                                               \
-                                                                        \
-    libxl__multidev_begin(ao, &rds->multidev);                          \
-    rds->multidev.callback = devices_checkpoint_cb;                     \
-    for (i = 0; i < rds->num_devices; i++) {                            \
-        dev = rds->devs[i];                                             \
-        if (!dev->matched || !dev->ops->api)                            \
-            continue;                                                   \
-        libxl__multidev_prepare_with_aodev(&rds->multidev, &dev->aodev);\
-        dev->ops->api(egc,dev);                                         \
-    }                                                                   \
-                                                                        \
-    libxl__multidev_prepared(egc, &rds->multidev, 0);                   \
-}
-
-define_remus_checkpoint_api(postsuspend);
-
-define_remus_checkpoint_api(preresume);
-
-define_remus_checkpoint_api(commit);
-
-static void devices_checkpoint_cb(libxl__egc *egc,
-                                  libxl__multidev *multidev,
-                                  int rc)
-{
-    STATE_AO_GC(multidev->ao);
-
-    /* Convenience aliases */
-    libxl__remus_devices_state *const rds =
-                            CONTAINER_OF(multidev, *rds, multidev);
-
-    rds->callback(egc, rds, rc);
-}
diff --git a/tools/libxl/libxl_remus_disk_drbd.c b/tools/libxl/libxl_remus_disk_drbd.c
index afe9b61..50b897d 100644
--- a/tools/libxl/libxl_remus_disk_drbd.c
+++ b/tools/libxl/libxl_remus_disk_drbd.c
@@ -26,30 +26,30 @@ typedef struct libxl__remus_drbd_disk {
     int ackwait;
 } libxl__remus_drbd_disk;
 
-int init_subkind_drbd_disk(libxl__remus_devices_state *rds)
+int init_subkind_drbd_disk(libxl__checkpoint_devices_state *cds)
 {
-    STATE_AO_GC(rds->ao);
+    STATE_AO_GC(cds->ao);
 
-    rds->drbd_probe_script = GCSPRINTF("%s/block-drbd-probe",
+    cds->drbd_probe_script = GCSPRINTF("%s/block-drbd-probe",
                                        libxl__xen_script_dir_path());
 
     return 0;
 }
 
-void cleanup_subkind_drbd_disk(libxl__remus_devices_state *rds)
+void cleanup_subkind_drbd_disk(libxl__checkpoint_devices_state *cds)
 {
     return;
 }
 
 /*----- helper functions, for async calls -----*/
 static void drbd_async_call(libxl__egc *egc,
-                            libxl__remus_device *dev,
-                            void func(libxl__remus_device *),
+                            libxl__checkpoint_device *dev,
+                            void func(libxl__checkpoint_device *),
                             libxl__ev_child_callback callback)
 {
     int pid = -1, rc;
     libxl__ao_device *aodev = &dev->aodev;
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
     /* Fork and call */
     pid = libxl__ev_child_fork(gc, &aodev->child, callback);
@@ -82,21 +82,21 @@ static void match_async_exec_cb(libxl__egc *egc,
 
 /* implementations */
 
-static void match_async_exec(libxl__egc *egc, libxl__remus_device *dev);
+static void match_async_exec(libxl__egc *egc, libxl__checkpoint_device *dev);
 
-static void drbd_setup(libxl__egc *egc, libxl__remus_device *dev)
+static void drbd_setup(libxl__egc *egc, libxl__checkpoint_device *dev)
 {
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
     match_async_exec(egc, dev);
 }
 
-static void match_async_exec(libxl__egc *egc, libxl__remus_device *dev)
+static void match_async_exec(libxl__egc *egc, libxl__checkpoint_device *dev)
 {
     int arraysize, nr = 0, rc;
     const libxl_device_disk *disk = dev->backend_dev;
     libxl__async_exec_state *aes = &dev->aodev.aes;
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
     /* setup env & args */
     arraysize = 1;
@@ -107,12 +107,12 @@ static void match_async_exec(libxl__egc *egc, libxl__remus_device *dev)
     arraysize = 3;
     nr = 0;
     GCNEW_ARRAY(aes->args, arraysize);
-    aes->args[nr++] = dev->rds->drbd_probe_script;
+    aes->args[nr++] = dev->cds->drbd_probe_script;
     aes->args[nr++] = disk->pdev_path;
     aes->args[nr++] = NULL;
     assert(nr <= arraysize);
 
-    aes->ao = dev->rds->ao;
+    aes->ao = dev->cds->ao;
     aes->what = GCSPRINTF("%s %s", aes->args[0], aes->args[1]);
     aes->timeout_ms = LIBXL_HOTPLUG_TIMEOUT * 1000;
     aes->callback = match_async_exec_cb;
@@ -137,14 +137,14 @@ static void match_async_exec_cb(libxl__egc *egc,
 {
     int rc;
     libxl__ao_device *aodev = CONTAINER_OF(aes, *aodev, aes);
-    libxl__remus_device *dev = CONTAINER_OF(aodev, *dev, aodev);
+    libxl__checkpoint_device *dev = CONTAINER_OF(aodev, *dev, aodev);
     libxl__remus_drbd_disk *drbd_disk;
     const libxl_device_disk *disk = dev->backend_dev;
 
     STATE_AO_GC(aodev->ao);
 
     if (status) {
-        rc = ERROR_REMUS_DEVOPS_DOES_NOT_MATCH;
+        rc = ERROR_CHECKPOINT_DEVOPS_DOES_NOT_MATCH;
         /* BUG: seems to assume that any exit status means `no match' */
         /* BUG: exit status will have been logged as an error */
         goto out;
@@ -169,10 +169,10 @@ out:
     aodev->callback(egc, aodev);
 }
 
-static void drbd_teardown(libxl__egc *egc, libxl__remus_device *dev)
+static void drbd_teardown(libxl__egc *egc, libxl__checkpoint_device *dev)
 {
     libxl__remus_drbd_disk *drbd_disk = dev->concrete_data;
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
     close(drbd_disk->ctl_fd);
     dev->aodev.rc = 0;
@@ -189,9 +189,9 @@ static void checkpoint_async_call_done(libxl__egc *egc,
 /* API implementations */
 
 /* this op will not wait and block, so implement as sync op */
-static void drbd_postsuspend(libxl__egc *egc, libxl__remus_device *dev)
+static void drbd_postsuspend(libxl__egc *egc, libxl__checkpoint_device *dev)
 {
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
     libxl__remus_drbd_disk *rdd = dev->concrete_data;
 
@@ -205,16 +205,16 @@ static void drbd_postsuspend(libxl__egc *egc, libxl__remus_device *dev)
 }
 
 
-static void drbd_preresume_async(libxl__remus_device *dev);
+static void drbd_preresume_async(libxl__checkpoint_device *dev);
 
-static void drbd_preresume(libxl__egc *egc, libxl__remus_device *dev)
+static void drbd_preresume(libxl__egc *egc, libxl__checkpoint_device *dev)
 {
-    STATE_AO_GC(dev->rds->ao);
+    STATE_AO_GC(dev->cds->ao);
 
     drbd_async_call(egc, dev, drbd_preresume_async, checkpoint_async_call_done);
 }
 
-static void drbd_preresume_async(libxl__remus_device *dev)
+static void drbd_preresume_async(libxl__checkpoint_device *dev)
 {
     libxl__remus_drbd_disk *rdd = dev->concrete_data;
     int ackwait = rdd->ackwait;
@@ -233,7 +233,7 @@ static void checkpoint_async_call_done(libxl__egc *egc,
 {
     int rc;
     libxl__ao_device *aodev = CONTAINER_OF(child, *aodev, child);
-    libxl__remus_device *dev = CONTAINER_OF(aodev, *dev, aodev);
+    libxl__checkpoint_device *dev = CONTAINER_OF(aodev, *dev, aodev);
     libxl__remus_drbd_disk *rdd = dev->concrete_data;
 
     STATE_AO_GC(aodev->ao);
@@ -251,7 +251,7 @@ out:
     aodev->callback(egc, aodev);
 }
 
-const libxl__remus_device_instance_ops remus_device_drbd_disk = {
+const libxl__checkpoint_device_instance_ops remus_device_drbd_disk = {
     .kind = LIBXL__DEVICE_KIND_VBD,
     .setup = drbd_setup,
     .teardown = drbd_teardown,
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 8a3d7ba..375c546 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -61,8 +61,8 @@ libxl_error = Enumeration("error", [
     (-15, "LOCK_FAIL"),
     (-16, "JSON_CONFIG_EMPTY"),
     (-17, "DEVICE_EXISTS"),
-    (-18, "REMUS_DEVOPS_DOES_NOT_MATCH"),
-    (-19, "REMUS_DEVICE_NOT_SUPPORTED"),
+    (-18, "CHECKPOINT_DEVOPS_DOES_NOT_MATCH"),
+    (-19, "CHECKPOINT_DEVICE_NOT_SUPPORTED"),
     (-20, "VNUMA_CONFIG_INVALID"),
     (-21, "DOMAIN_NOTFOUND"),
     ], value_namespace = "")
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v2 COLOPre 12/13] tools/libxl: adjust the indentation
  2015-06-08  3:43 [PATCH v2 COLOPre 00/13] Prerequisite patches for COLO Yang Hongyang
                   ` (10 preceding siblings ...)
  2015-06-08  3:43 ` [PATCH v2 COLOPre 11/13] tools/libxl: rename remus device to checkpoint device Yang Hongyang
@ 2015-06-08  3:43 ` Yang Hongyang
  2015-06-16 10:53   ` Ian Campbell
  2015-06-08  3:43 ` [PATCH v2 COLOPre 13/13] tools/libxl: don't touch remus in checkpoint_device Yang Hongyang
  12 siblings, 1 reply; 106+ messages in thread
From: Yang Hongyang @ 2015-06-08  3:43 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

This is just tidying up after the previous automatic renaming.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl_checkpoint_device.c | 21 +++++++++++----------
 tools/libxl/libxl_internal.h          | 19 +++++++++++--------
 2 files changed, 22 insertions(+), 18 deletions(-)

diff --git a/tools/libxl/libxl_checkpoint_device.c b/tools/libxl/libxl_checkpoint_device.c
index 109cd23..226f159 100644
--- a/tools/libxl/libxl_checkpoint_device.c
+++ b/tools/libxl/libxl_checkpoint_device.c
@@ -73,9 +73,9 @@ static void devices_teardown_cb(libxl__egc *egc,
 /* checkpoint device setup and teardown */
 
 static libxl__checkpoint_device* checkpoint_device_init(libxl__egc *egc,
-                                              libxl__checkpoint_devices_state *cds,
-                                              libxl__device_kind kind,
-                                              void *libxl_dev)
+                                        libxl__checkpoint_devices_state *cds,
+                                        libxl__device_kind kind,
+                                        void *libxl_dev)
 {
     libxl__checkpoint_device *dev = NULL;
 
@@ -89,9 +89,10 @@ static libxl__checkpoint_device* checkpoint_device_init(libxl__egc *egc,
 }
 
 static void checkpoint_devices_setup(libxl__egc *egc,
-                                libxl__checkpoint_devices_state *cds);
+                                     libxl__checkpoint_devices_state *cds);
 
-void libxl__checkpoint_devices_setup(libxl__egc *egc, libxl__checkpoint_devices_state *cds)
+void libxl__checkpoint_devices_setup(libxl__egc *egc,
+                                     libxl__checkpoint_devices_state *cds)
 {
     int i, rc;
 
@@ -137,7 +138,7 @@ out:
 }
 
 static void checkpoint_devices_setup(libxl__egc *egc,
-                                libxl__checkpoint_devices_state *cds)
+                                     libxl__checkpoint_devices_state *cds)
 {
     int i, rc;
 
@@ -285,12 +286,12 @@ static void devices_checkpoint_cb(libxl__egc *egc,
 
 /* API implementations */
 
-#define define_checkpoint_api(api)                                \
-void libxl__checkpoint_devices_##api(libxl__egc *egc,                        \
-                                libxl__checkpoint_devices_state *cds)        \
+#define define_checkpoint_api(api)                                      \
+void libxl__checkpoint_devices_##api(libxl__egc *egc,                   \
+                                libxl__checkpoint_devices_state *cds)   \
 {                                                                       \
     int i;                                                              \
-    libxl__checkpoint_device *dev;                                           \
+    libxl__checkpoint_device *dev;                                      \
                                                                         \
     STATE_AO_GC(cds->ao);                                               \
                                                                         \
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 5399601..3a1360b 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2673,7 +2673,8 @@ typedef struct libxl__save_helper_state {
  * Each device type needs to implement the interfaces specified in
  * the libxl__checkpoint_device_instance_ops if it wishes to support Remus.
  *
- * The high-level control flow through the checkpoint device layer is shown below:
+ * The high-level control flow through the checkpoint device layer is shown
+ * below:
  *
  * xl remus
  *  |->  libxl_domain_remus_start
@@ -2734,7 +2735,8 @@ int init_subkind_drbd_disk(libxl__checkpoint_devices_state *cds);
 void cleanup_subkind_drbd_disk(libxl__checkpoint_devices_state *cds);
 
 typedef void libxl__checkpoint_callback(libxl__egc *,
-                                   libxl__checkpoint_devices_state *, int rc);
+                                        libxl__checkpoint_devices_state *,
+                                        int rc);
 
 /*
  * State associated with a checkpoint invocation, including parameters
@@ -2742,7 +2744,7 @@ typedef void libxl__checkpoint_callback(libxl__egc *,
  * save/restore machinery.
  */
 struct libxl__checkpoint_devices_state {
-    /*---- must be set by caller of libxl__checkpoint_device_(setup|teardown) ----*/
+    /*-- must be set by caller of libxl__checkpoint_device_(setup|teardown) --*/
 
     libxl__ao *ao;
     uint32_t domid;
@@ -2755,7 +2757,8 @@ struct libxl__checkpoint_devices_state {
     /*
      * this array is allocated before setup the checkpoint devices by the
      * checkpoint abstract layer.
-     * devs may be NULL, means there's no checkpoint devices that has been set up.
+     * devs may be NULL, means there's no checkpoint devices that has been
+     * set up.
      * the size of this array is 'num_devices', which is the total number
      * of libxl nic devices and disk devices(num_nics + num_disks).
      */
@@ -2817,13 +2820,13 @@ struct libxl__checkpoint_device {
 _hidden void libxl__checkpoint_devices_setup(libxl__egc *egc,
                                         libxl__checkpoint_devices_state *cds);
 _hidden void libxl__checkpoint_devices_teardown(libxl__egc *egc,
-                                           libxl__checkpoint_devices_state *cds);
+                                        libxl__checkpoint_devices_state *cds);
 _hidden void libxl__checkpoint_devices_postsuspend(libxl__egc *egc,
-                                              libxl__checkpoint_devices_state *cds);
+                                        libxl__checkpoint_devices_state *cds);
 _hidden void libxl__checkpoint_devices_preresume(libxl__egc *egc,
-                                            libxl__checkpoint_devices_state *cds);
+                                        libxl__checkpoint_devices_state *cds);
 _hidden void libxl__checkpoint_devices_commit(libxl__egc *egc,
-                                         libxl__checkpoint_devices_state *cds);
+                                        libxl__checkpoint_devices_state *cds);
 _hidden int libxl__netbuffer_enabled(libxl__gc *gc);
 
 /*----- Domain suspend (save) state structure -----*/
-- 
1.9.1


* [PATCH v2 COLOPre 13/13] tools/libxl: don't touch remus in checkpoint_device
  2015-06-08  3:43 [PATCH v2 COLOPre 00/13] Prerequisite patches for COLO Yang Hongyang
                   ` (11 preceding siblings ...)
  2015-06-08  3:43 ` [PATCH v2 COLOPre 12/13] tools/libxl: adjust the indentation Yang Hongyang
@ 2015-06-08  3:43 ` Yang Hongyang
  2015-06-12 13:28   ` Wei Liu
  12 siblings, 1 reply; 106+ messages in thread
From: Yang Hongyang @ 2015-06-08  3:43 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

The checkpoint device is an abstraction layer for checkpointing, and
COLO can use it to do checkpoints as well. However, some code in the
checkpoint device still touches Remus directly:
1. remus_ops: we use the Remus ops directly in the checkpoint
   device. Store them in the checkpoint device state instead.
2. the concrete layer's private members: add a new structure,
   remus state, and move them into it.
3. init/cleanup of device subkinds: we call (init|cleanup)_subkind_nic
   and (init|cleanup)_subkind_drbd_disk directly in the checkpoint
   device. Call them before calling libxl__checkpoint_devices_setup()
   or after calling libxl__checkpoint_devices_teardown() instead.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 tools/libxl/libxl.c                   |  2 +-
 tools/libxl/libxl_checkpoint_device.c | 52 ++-------------------
 tools/libxl/libxl_dom_save.c          |  3 +-
 tools/libxl/libxl_internal.h          | 40 ++++++++++------
 tools/libxl/libxl_netbuffer.c         | 51 +++++++++++---------
 tools/libxl/libxl_remus.c             | 88 ++++++++++++++++++++++++++++-------
 tools/libxl/libxl_remus_disk_drbd.c   |  8 ++--
 7 files changed, 135 insertions(+), 109 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 36b97fe..10d3d82 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -841,7 +841,7 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
     assert(info);
 
     /* Point of no return */
-    libxl__remus_setup(egc, dss);
+    libxl__remus_setup(egc, &dss->rs);
     return AO_INPROGRESS;
 
  out:
diff --git a/tools/libxl/libxl_checkpoint_device.c b/tools/libxl/libxl_checkpoint_device.c
index 226f159..0a16dbb 100644
--- a/tools/libxl/libxl_checkpoint_device.c
+++ b/tools/libxl/libxl_checkpoint_device.c
@@ -17,46 +17,6 @@
 
 #include "libxl_internal.h"
 
-extern const libxl__checkpoint_device_instance_ops remus_device_nic;
-extern const libxl__checkpoint_device_instance_ops remus_device_drbd_disk;
-static const libxl__checkpoint_device_instance_ops *remus_ops[] = {
-    &remus_device_nic,
-    &remus_device_drbd_disk,
-    NULL,
-};
-
-/*----- helper functions -----*/
-
-static int init_device_subkind(libxl__checkpoint_devices_state *cds)
-{
-    /* init device subkind-specific state in the libxl ctx */
-    int rc;
-    STATE_AO_GC(cds->ao);
-
-    if (libxl__netbuffer_enabled(gc)) {
-        rc = init_subkind_nic(cds);
-        if (rc) goto out;
-    }
-
-    rc = init_subkind_drbd_disk(cds);
-    if (rc) goto out;
-
-    rc = 0;
-out:
-    return rc;
-}
-
-static void cleanup_device_subkind(libxl__checkpoint_devices_state *cds)
-{
-    /* cleanup device subkind-specific state in the libxl ctx */
-    STATE_AO_GC(cds->ao);
-
-    if (libxl__netbuffer_enabled(gc))
-        cleanup_subkind_nic(cds);
-
-    cleanup_subkind_drbd_disk(cds);
-}
-
 /*----- setup() and teardown() -----*/
 
 /* callbacks */
@@ -94,14 +54,10 @@ static void checkpoint_devices_setup(libxl__egc *egc,
 void libxl__checkpoint_devices_setup(libxl__egc *egc,
                                      libxl__checkpoint_devices_state *cds)
 {
-    int i, rc;
+    int i;
 
     STATE_AO_GC(cds->ao);
 
-    rc = init_device_subkind(cds);
-    if (rc)
-        goto out;
-
     cds->num_devices = 0;
     cds->num_nics = 0;
     cds->num_disks = 0;
@@ -134,7 +90,7 @@ void libxl__checkpoint_devices_setup(libxl__egc *egc,
     return;
 
 out:
-    cds->callback(egc, cds, rc);
+    cds->callback(egc, cds, 0);
 }
 
 static void checkpoint_devices_setup(libxl__egc *egc,
@@ -172,7 +128,7 @@ static void device_setup_iterate(libxl__egc *egc, libxl__ao_device *aodev)
         goto out;
 
     do {
-        dev->ops = remus_ops[++dev->ops_index];
+        dev->ops = dev->cds->ops[++dev->ops_index];
         if (!dev->ops) {
             libxl_device_nic * nic = NULL;
             libxl_device_disk * disk = NULL;
@@ -271,8 +227,6 @@ static void devices_teardown_cb(libxl__egc *egc,
     cds->disks = NULL;
     cds->num_disks = 0;
 
-    cleanup_device_subkind(cds);
-
     cds->callback(egc, cds, rc);
 }
 
diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
index c15e9f1..cb3d8db 100644
--- a/tools/libxl/libxl_dom_save.c
+++ b/tools/libxl/libxl_dom_save.c
@@ -402,7 +402,6 @@ void libxl__domain_save(libxl__egc *egc, libxl__domain_save_state *dss)
     dsps->dm_savefile = libxl__device_model_savefile(gc, domid);
 
     if (r_info != NULL) {
-        dss->interval = r_info->interval;
         dss->xcflags |= XCFLAGS_CHECKPOINTED;
         if (libxl_defbool_val(r_info->compression))
             dss->xcflags |= XCFLAGS_CHECKPOINT_COMPRESS;
@@ -601,7 +600,7 @@ static void domain_save_done(libxl__egc *egc,
      * from sending checkpoints. Teardown the network buffers and
      * release netlink resources.  This is an async op.
      */
-    libxl__remus_teardown(egc, dss, rc);
+    libxl__remus_teardown(egc, &dss->rs, rc);
 }
 
 /*==================== Domain restore ====================*/
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 3a1360b..0ebb104 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2750,6 +2750,8 @@ struct libxl__checkpoint_devices_state {
     uint32_t domid;
     libxl__checkpoint_callback *callback;
     int device_kind_flags;
+    /* The ops must be pointer array, and the last ops must be NULL */
+    const libxl__checkpoint_device_instance_ops **ops;
 
     /*----- private for abstract layer only -----*/
 
@@ -2770,16 +2772,6 @@ struct libxl__checkpoint_devices_state {
     int num_disks;
 
     libxl__multidev multidev;
-
-    /*----- private for concrete (device-specific) layer only -----*/
-
-    /* private for nic device subkind ops */
-    char *netbufscript;
-    struct nl_sock *nlsock;
-    struct nl_cache *qdisc_cache;
-
-    /* private for drbd disk subkind ops */
-    char *drbd_probe_script;
 };
 
 /*
@@ -2827,6 +2819,26 @@ _hidden void libxl__checkpoint_devices_preresume(libxl__egc *egc,
                                         libxl__checkpoint_devices_state *cds);
 _hidden void libxl__checkpoint_devices_commit(libxl__egc *egc,
                                         libxl__checkpoint_devices_state *cds);
+
+/*----- Remus related state structure -----*/
+typedef struct libxl__remus_state libxl__remus_state;
+struct libxl__remus_state {
+    /* private */
+    libxl__ev_time checkpoint_timeout; /* used for Remus checkpoint */
+    int interval; /* checkpoint interval */
+
+    /* abstract layer */
+    libxl__checkpoint_devices_state cds;
+
+    /*----- private for concrete (device-specific) layer only -----*/
+    /* private for nic device subkind ops */
+    char *netbufscript;
+    struct nl_sock *nlsock;
+    struct nl_cache *qdisc_cache;
+
+    /* private for drbd disk subkind ops */
+    char *drbd_probe_script;
+};
 _hidden int libxl__netbuffer_enabled(libxl__gc *gc);
 
 /*----- Domain suspend (save) state structure -----*/
@@ -2886,9 +2898,7 @@ struct libxl__domain_save_state {
     libxl__domain_suspend_state dsps;
     int hvm;
     int xcflags;
-    libxl__checkpoint_devices_state cds;
-    libxl__ev_time checkpoint_timeout; /* used for Remus checkpoint */
-    int interval; /* checkpoint interval (for Remus) */
+    libxl__remus_state rs;
     libxl__save_helper_state shs;
     libxl__logdirty_switch logdirty;
     /* private for libxl__domain_save_device_model */
@@ -3231,9 +3241,9 @@ _hidden void libxl__remus_domain_resume_callback(void *data);
 _hidden void libxl__remus_domain_checkpoint_callback(void *data);
 /* Remus setup and teardown*/
 _hidden void libxl__remus_setup(libxl__egc *egc,
-                                libxl__domain_save_state *dss);
+                                libxl__remus_state *rs);
 _hidden void libxl__remus_teardown(libxl__egc *egc,
-                                   libxl__domain_save_state *dss,
+                                   libxl__remus_state *rs,
                                    int rc);
 
 /*
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
index 86afba6..1d01e10 100644
--- a/tools/libxl/libxl_netbuffer.c
+++ b/tools/libxl/libxl_netbuffer.c
@@ -41,18 +41,19 @@ int libxl__netbuffer_enabled(libxl__gc *gc)
 int init_subkind_nic(libxl__checkpoint_devices_state *cds)
 {
     int rc, ret;
-    libxl__domain_save_state *dss = CONTAINER_OF(cds, *dss, cds);
+    libxl__remus_state *rs = CONTAINER_OF(cds, *rs, cds);
+    libxl__domain_save_state *dss = CONTAINER_OF(rs, *dss, rs);
 
     STATE_AO_GC(cds->ao);
 
-    cds->nlsock = nl_socket_alloc();
-    if (!cds->nlsock) {
+    rs->nlsock = nl_socket_alloc();
+    if (!rs->nlsock) {
         LOG(ERROR, "cannot allocate nl socket");
         rc = ERROR_FAIL;
         goto out;
     }
 
-    ret = nl_connect(cds->nlsock, NETLINK_ROUTE);
+    ret = nl_connect(rs->nlsock, NETLINK_ROUTE);
     if (ret) {
         LOG(ERROR, "failed to open netlink socket: %s",
             nl_geterror(ret));
@@ -61,7 +62,7 @@ int init_subkind_nic(libxl__checkpoint_devices_state *cds)
     }
 
     /* get list of all qdiscs installed on network devs. */
-    ret = rtnl_qdisc_alloc_cache(cds->nlsock, &cds->qdisc_cache);
+    ret = rtnl_qdisc_alloc_cache(rs->nlsock, &rs->qdisc_cache);
     if (ret) {
         LOG(ERROR, "failed to allocate qdisc cache: %s",
             nl_geterror(ret));
@@ -70,10 +71,10 @@ int init_subkind_nic(libxl__checkpoint_devices_state *cds)
     }
 
     if (dss->remus->netbufscript) {
-        cds->netbufscript = libxl__strdup(gc, dss->remus->netbufscript);
+        rs->netbufscript = libxl__strdup(gc, dss->remus->netbufscript);
     } else {
-        cds->netbufscript = GCSPRINTF("%s/remus-netbuf-setup",
-                                      libxl__xen_script_dir_path());
+        rs->netbufscript = GCSPRINTF("%s/remus-netbuf-setup",
+                                     libxl__xen_script_dir_path());
     }
 
     rc = 0;
@@ -84,20 +85,22 @@ out:
 
 void cleanup_subkind_nic(libxl__checkpoint_devices_state *cds)
 {
+    libxl__remus_state *rs = CONTAINER_OF(cds, *rs, cds);
+
     STATE_AO_GC(cds->ao);
 
     /* free qdisc cache */
-    if (cds->qdisc_cache) {
-        nl_cache_clear(cds->qdisc_cache);
-        nl_cache_free(cds->qdisc_cache);
-        cds->qdisc_cache = NULL;
+    if (rs->qdisc_cache) {
+        nl_cache_clear(rs->qdisc_cache);
+        nl_cache_free(rs->qdisc_cache);
+        rs->qdisc_cache = NULL;
     }
 
     /* close & free nlsock */
-    if (cds->nlsock) {
-        nl_close(cds->nlsock);
-        nl_socket_free(cds->nlsock);
-        cds->nlsock = NULL;
+    if (rs->nlsock) {
+        nl_close(rs->nlsock);
+        nl_socket_free(rs->nlsock);
+        rs->nlsock = NULL;
     }
 }
 
@@ -150,13 +153,14 @@ static int init_qdisc(libxl__checkpoint_devices_state *cds,
     int rc, ret, ifindex;
     struct rtnl_link *ifb = NULL;
     struct rtnl_qdisc *qdisc = NULL;
+    libxl__remus_state *rs = CONTAINER_OF(cds, *rs, cds);
 
     STATE_AO_GC(cds->ao);
 
     /* Now that we have brought up REMUS_IFB device with plug qdisc for
      * this vif, so we need to refill the qdisc cache.
      */
-    ret = nl_cache_refill(cds->nlsock, cds->qdisc_cache);
+    ret = nl_cache_refill(rs->nlsock, rs->qdisc_cache);
     if (ret) {
         LOG(ERROR, "cannot refill qdisc cache: %s", nl_geterror(ret));
         rc = ERROR_FAIL;
@@ -164,7 +168,7 @@ static int init_qdisc(libxl__checkpoint_devices_state *cds,
     }
 
     /* get a handle to the REMUS_IFB interface */
-    ret = rtnl_link_get_kernel(cds->nlsock, 0, remus_nic->ifb, &ifb);
+    ret = rtnl_link_get_kernel(rs->nlsock, 0, remus_nic->ifb, &ifb);
     if (ret) {
         LOG(ERROR, "cannot obtain handle for %s: %s", remus_nic->ifb,
             nl_geterror(ret));
@@ -187,7 +191,7 @@ static int init_qdisc(libxl__checkpoint_devices_state *cds,
      * There is no need to explicitly free this qdisc as its just a
      * reference from the qdisc cache we allocated earlier.
      */
-    qdisc = rtnl_qdisc_get_by_parent(cds->qdisc_cache, ifindex, TC_H_ROOT);
+    qdisc = rtnl_qdisc_get_by_parent(rs->qdisc_cache, ifindex, TC_H_ROOT);
     if (qdisc) {
         const char *tc_kind = rtnl_tc_get_kind(TC_CAST(qdisc));
         /* Sanity check: Ensure that the root qdisc is a plug qdisc. */
@@ -238,11 +242,12 @@ static void setup_async_exec(libxl__checkpoint_device *dev, char *op)
     libxl__remus_device_nic *remus_nic = dev->concrete_data;
     libxl__checkpoint_devices_state *cds = dev->cds;
     libxl__async_exec_state *aes = &dev->aodev.aes;
+    libxl__remus_state *rs = CONTAINER_OF(cds, *rs, cds);
 
     STATE_AO_GC(cds->ao);
 
     /* Convenience aliases */
-    char *const script = libxl__strdup(gc, cds->netbufscript);
+    char *const script = libxl__strdup(gc, rs->netbufscript);
     const uint32_t domid = cds->domid;
     const int dev_id = remus_nic->devid;
     const char *const vif = remus_nic->vif;
@@ -333,6 +338,7 @@ static void netbuf_setup_script_cb(libxl__egc *egc,
     libxl__checkpoint_device *dev = CONTAINER_OF(aodev, *dev, aodev);
     libxl__remus_device_nic *remus_nic = dev->concrete_data;
     libxl__checkpoint_devices_state *cds = dev->cds;
+    libxl__remus_state *rs = CONTAINER_OF(cds, *rs, cds);
     const char *out_path_base, *hotplug_error = NULL;
     int rc;
 
@@ -373,7 +379,7 @@ static void netbuf_setup_script_cb(libxl__egc *egc,
 
     if (hotplug_error) {
         LOG(ERROR, "netbuf script %s setup failed for vif %s: %s",
-            cds->netbufscript, vif, hotplug_error);
+            rs->netbufscript, vif, hotplug_error);
         rc = ERROR_FAIL;
         goto out;
     }
@@ -444,6 +450,7 @@ static int remus_netbuf_op(libxl__remus_device_nic *remus_nic,
                            int buffer_op)
 {
     int rc, ret;
+    libxl__remus_state *rs = CONTAINER_OF(cds, *rs, cds);
 
     STATE_AO_GC(cds->ao);
 
@@ -457,7 +464,7 @@ static int remus_netbuf_op(libxl__remus_device_nic *remus_nic,
         goto out;
     }
 
-    ret = rtnl_qdisc_add(cds->nlsock, remus_nic->qdisc, NLM_F_REQUEST);
+    ret = rtnl_qdisc_add(rs->nlsock, remus_nic->qdisc, NLM_F_REQUEST);
     if (ret) {
         rc = ERROR_FAIL;
         goto out;
diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
index ba26cde..fdba4ee 100644
--- a/tools/libxl/libxl_remus.c
+++ b/tools/libxl/libxl_remus.c
@@ -17,6 +17,46 @@
 
 #include "libxl_internal.h"
 
+extern const libxl__checkpoint_device_instance_ops remus_device_nic;
+extern const libxl__checkpoint_device_instance_ops remus_device_drbd_disk;
+static const libxl__checkpoint_device_instance_ops *remus_ops[] = {
+    &remus_device_nic,
+    &remus_device_drbd_disk,
+    NULL,
+};
+
+/*----- helper functions -----*/
+
+static int init_device_subkind(libxl__checkpoint_devices_state *cds)
+{
+    /* init device subkind-specific state in the libxl ctx */
+    int rc;
+    STATE_AO_GC(cds->ao);
+
+    if (libxl__netbuffer_enabled(gc)) {
+        rc = init_subkind_nic(cds);
+        if (rc) goto out;
+    }
+
+    rc = init_subkind_drbd_disk(cds);
+    if (rc) goto out;
+
+    rc = 0;
+out:
+    return rc;
+}
+
+static void cleanup_device_subkind(libxl__checkpoint_devices_state *cds)
+{
+    /* cleanup device subkind-specific state in the libxl ctx */
+    STATE_AO_GC(cds->ao);
+
+    if (libxl__netbuffer_enabled(gc))
+        cleanup_subkind_nic(cds);
+
+    cleanup_subkind_drbd_disk(cds);
+}
+
 /*----- Remus setup and teardown -----*/
 
 static void remus_setup_done(libxl__egc *egc,
@@ -24,10 +64,12 @@ static void remus_setup_done(libxl__egc *egc,
 static void remus_setup_failed(libxl__egc *egc,
                                libxl__checkpoint_devices_state *cds, int rc);
 
-void libxl__remus_setup(libxl__egc *egc, libxl__domain_save_state *dss)
+void libxl__remus_setup(libxl__egc *egc, libxl__remus_state *rs)
 {
+    libxl__domain_save_state *dss = CONTAINER_OF(rs, *dss, rs);
+
     /* Convenience aliases */
-    libxl__checkpoint_devices_state *const cds = &dss->cds;
+    libxl__checkpoint_devices_state *const cds = &rs->cds;
     const libxl_domain_remus_info *const info = dss->remus;
 
     STATE_AO_GC(dss->ao);
@@ -46,6 +88,14 @@ void libxl__remus_setup(libxl__egc *egc, libxl__domain_save_state *dss)
     cds->ao = ao;
     cds->domid = dss->domid;
     cds->callback = remus_setup_done;
+    cds->ops = remus_ops;
+    rs->interval = info->interval;
+
+    if (init_device_subkind(cds)) {
+        LOG(ERROR, "Remus: failed to init device subkind for guest %u",
+            dss->domid);
+        goto out;
+    }
 
     libxl__checkpoint_devices_setup(egc, cds);
     return;
@@ -57,7 +107,7 @@ out:
 static void remus_setup_done(libxl__egc *egc,
                                    libxl__checkpoint_devices_state *cds, int rc)
 {
-    libxl__domain_save_state *dss = CONTAINER_OF(cds, *dss, cds);
+    libxl__domain_save_state *dss = CONTAINER_OF(cds, *dss, rs.cds);
     STATE_AO_GC(dss->ao);
 
     if (!rc) {
@@ -74,13 +124,15 @@ static void remus_setup_done(libxl__egc *egc,
 static void remus_setup_failed(libxl__egc *egc,
                                libxl__checkpoint_devices_state *cds, int rc)
 {
-    libxl__domain_save_state *dss = CONTAINER_OF(cds, *dss, cds);
+    libxl__domain_save_state *dss = CONTAINER_OF(cds, *dss, rs.cds);
     STATE_AO_GC(dss->ao);
 
     if (rc)
         LOG(ERROR, "Remus: failed to teardown device after setup failed"
             " for guest with domid %u, rc %d", dss->domid, rc);
 
+    cleanup_device_subkind(cds);
+
     dss->callback(egc, dss, rc);
 }
 
@@ -88,28 +140,30 @@ static void remus_teardown_done(libxl__egc *egc,
                                 libxl__checkpoint_devices_state *cds,
                                 int rc);
 void libxl__remus_teardown(libxl__egc *egc,
-                           libxl__domain_save_state *dss,
+                           libxl__remus_state *rs,
                            int rc)
 {
     EGC_GC;
 
     LOG(WARN, "Remus: Domain suspend terminated with rc %d,"
         " teardown Remus devices...", rc);
-    dss->cds.callback = remus_teardown_done;
-    libxl__checkpoint_devices_teardown(egc, &dss->cds);
+    rs->cds.callback = remus_teardown_done;
+    libxl__checkpoint_devices_teardown(egc, &rs->cds);
 }
 
 static void remus_teardown_done(libxl__egc *egc,
                                 libxl__checkpoint_devices_state *cds,
                                 int rc)
 {
-    libxl__domain_save_state *dss = CONTAINER_OF(cds, *dss, cds);
+    libxl__domain_save_state *dss = CONTAINER_OF(cds, *dss, rs.cds);
     STATE_AO_GC(dss->ao);
 
     if (rc)
         LOG(ERROR, "Remus: failed to teardown device for guest with domid %u,"
             " rc %d", dss->domid, rc);
 
+    cleanup_device_subkind(cds);
+
     dss->callback(egc, dss, rc);
 }
 
@@ -142,7 +196,7 @@ static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
     if (!ok)
         goto out;
 
-    libxl__checkpoint_devices_state *const cds = &dss->cds;
+    libxl__checkpoint_devices_state *const cds = &dss->rs.cds;
     cds->callback = remus_devices_postsuspend_cb;
     libxl__checkpoint_devices_postsuspend(egc, cds);
     return;
@@ -156,7 +210,7 @@ static void remus_devices_postsuspend_cb(libxl__egc *egc,
                                          int rc)
 {
     int ok = 0;
-    libxl__domain_save_state *dss = CONTAINER_OF(cds, *dss, cds);
+    libxl__domain_save_state *dss = CONTAINER_OF(cds, *dss, rs.cds);
 
     if (rc)
         goto out;
@@ -174,7 +228,7 @@ void libxl__remus_domain_resume_callback(void *data)
     libxl__domain_save_state *dss = CONTAINER_OF(shs, *dss, shs);
     STATE_AO_GC(dss->ao);
 
-    libxl__checkpoint_devices_state *const cds = &dss->cds;
+    libxl__checkpoint_devices_state *const cds = &dss->rs.cds;
     cds->callback = remus_devices_preresume_cb;
     libxl__checkpoint_devices_preresume(egc, cds);
 }
@@ -184,7 +238,7 @@ static void remus_devices_preresume_cb(libxl__egc *egc,
                                        int rc)
 {
     int ok = 0;
-    libxl__domain_save_state *dss = CONTAINER_OF(cds, *dss, cds);
+    libxl__domain_save_state *dss = CONTAINER_OF(cds, *dss, rs.cds);
     STATE_AO_GC(dss->ao);
 
     if (rc)
@@ -230,7 +284,7 @@ static void remus_checkpoint_dm_saved(libxl__egc *egc,
                                       libxl__domain_save_state *dss, int rc)
 {
     /* Convenience aliases */
-    libxl__checkpoint_devices_state *const cds = &dss->cds;
+    libxl__checkpoint_devices_state *const cds = &dss->rs.cds;
 
     STATE_AO_GC(dss->ao);
 
@@ -252,7 +306,7 @@ static void remus_devices_commit_cb(libxl__egc *egc,
                                     libxl__checkpoint_devices_state *cds,
                                     int rc)
 {
-    libxl__domain_save_state *dss = CONTAINER_OF(cds, *dss, cds);
+    libxl__domain_save_state *dss = CONTAINER_OF(cds, *dss, rs.cds);
 
     STATE_AO_GC(dss->ao);
 
@@ -270,9 +324,9 @@ static void remus_devices_commit_cb(libxl__egc *egc,
      */
 
     /* Set checkpoint interval timeout */
-    rc = libxl__ev_time_register_rel(gc, &dss->checkpoint_timeout,
+    rc = libxl__ev_time_register_rel(gc, &dss->rs.checkpoint_timeout,
                                      remus_next_checkpoint,
-                                     dss->interval);
+                                     dss->rs.interval);
 
     if (rc)
         goto out;
@@ -287,7 +341,7 @@ static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
                                   const struct timeval *requested_abs)
 {
     libxl__domain_save_state *dss =
-                            CONTAINER_OF(ev, *dss, checkpoint_timeout);
+                            CONTAINER_OF(ev, *dss, rs.checkpoint_timeout);
 
     STATE_AO_GC(dss->ao);
 
diff --git a/tools/libxl/libxl_remus_disk_drbd.c b/tools/libxl/libxl_remus_disk_drbd.c
index 50b897d..a8d8949 100644
--- a/tools/libxl/libxl_remus_disk_drbd.c
+++ b/tools/libxl/libxl_remus_disk_drbd.c
@@ -28,10 +28,11 @@ typedef struct libxl__remus_drbd_disk {
 
 int init_subkind_drbd_disk(libxl__checkpoint_devices_state *cds)
 {
+    libxl__remus_state *rs = CONTAINER_OF(cds, *rs, cds);
     STATE_AO_GC(cds->ao);
 
-    cds->drbd_probe_script = GCSPRINTF("%s/block-drbd-probe",
-                                       libxl__xen_script_dir_path());
+    rs->drbd_probe_script = GCSPRINTF("%s/block-drbd-probe",
+                                      libxl__xen_script_dir_path());
 
     return 0;
 }
@@ -96,6 +97,7 @@ static void match_async_exec(libxl__egc *egc, libxl__checkpoint_device *dev)
     int arraysize, nr = 0, rc;
     const libxl_device_disk *disk = dev->backend_dev;
     libxl__async_exec_state *aes = &dev->aodev.aes;
+    libxl__remus_state *rs = CONTAINER_OF(dev->cds, *rs, cds);
     STATE_AO_GC(dev->cds->ao);
 
     /* setup env & args */
@@ -107,7 +109,7 @@ static void match_async_exec(libxl__egc *egc, libxl__checkpoint_device *dev)
     arraysize = 3;
     nr = 0;
     GCNEW_ARRAY(aes->args, arraysize);
-    aes->args[nr++] = dev->cds->drbd_probe_script;
+    aes->args[nr++] = rs->drbd_probe_script;
     aes->args[nr++] = disk->pdev_path;
     aes->args[nr++] = NULL;
     assert(nr <= arraysize);
-- 
1.9.1


* Re: [PATCH v2 COLOPre 01/13] libxc/restore: fix error handle of process_record
  2015-06-08  3:43 ` [PATCH v2 COLOPre 01/13] libxc/restore: fix error handle of process_record Yang Hongyang
@ 2015-06-08  9:24   ` Andrew Cooper
  2015-06-08  9:37     ` Yang Hongyang
  2015-06-10 14:55   ` Ian Campbell
  1 sibling, 1 reply; 106+ messages in thread
From: Andrew Cooper @ 2015-06-08  9:24 UTC (permalink / raw)
  To: Yang Hongyang, xen-devel
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram

On 08/06/15 04:43, Yang Hongyang wrote:
> If the err is RECORD_NOT_PROCESSED, and it is an optional record,
> restore will still fail. The patch fix this.
>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> CC: Ian Campbell <Ian.Campbell@citrix.com>
> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>  tools/libxc/xc_sr_restore.c | 28 ++++++++++++++--------------
>  1 file changed, 14 insertions(+), 14 deletions(-)
>
> diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
> index 9e27dba..2d2edd3 100644
> --- a/tools/libxc/xc_sr_restore.c
> +++ b/tools/libxc/xc_sr_restore.c
> @@ -560,19 +560,6 @@ static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec)
>      free(rec->data);
>      rec->data = NULL;
>  
> -    if ( rc == RECORD_NOT_PROCESSED )
> -    {
> -        if ( rec->type & REC_TYPE_OPTIONAL )
> -            DPRINTF("Ignoring optional record %#x (%s)",
> -                    rec->type, rec_type_to_str(rec->type));

You would be best setting rc to 0 here, rather than moving the logic out
of process_record().

~Andrew

> -        else
> -        {
> -            ERROR("Mandatory record %#x (%s) not handled",
> -                  rec->type, rec_type_to_str(rec->type));
> -            rc = -1;
> -        }
> -    }
> -
>      return rc;
>  }
>  
> @@ -678,7 +665,20 @@ static int restore(struct xc_sr_context *ctx)
>          else
>          {
>              rc = process_record(ctx, &rec);
> -            if ( rc )
> +            if ( rc == RECORD_NOT_PROCESSED )
> +            {
> +                if ( rec.type & REC_TYPE_OPTIONAL )
> +                    DPRINTF("Ignoring optional record %#x (%s)",
> +                            rec.type, rec_type_to_str(rec.type));
> +                else
> +                {
> +                    ERROR("Mandatory record %#x (%s) not handled",
> +                          rec.type, rec_type_to_str(rec.type));
> +                    rc = -1;
> +                    goto err;
> +                }
> +            }
> +            else if ( rc )
>                  goto err;
>          }
>  


* Re: [PATCH v2 COLOPre 01/13] libxc/restore: fix error handle of process_record
  2015-06-08  9:24   ` Andrew Cooper
@ 2015-06-08  9:37     ` Yang Hongyang
  2015-06-08  9:39       ` Andrew Cooper
  0 siblings, 1 reply; 106+ messages in thread
From: Yang Hongyang @ 2015-06-08  9:37 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram



On 06/08/2015 05:24 PM, Andrew Cooper wrote:
> On 08/06/15 04:43, Yang Hongyang wrote:
>> If the err is RECORD_NOT_PROCESSED, and it is an optional record,
>> restore will still fail. The patch fix this.
>>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> CC: Ian Campbell <Ian.Campbell@citrix.com>
>> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
>> CC: Wei Liu <wei.liu2@citrix.com>
>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>>   tools/libxc/xc_sr_restore.c | 28 ++++++++++++++--------------
>>   1 file changed, 14 insertions(+), 14 deletions(-)
>>
>> diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
>> index 9e27dba..2d2edd3 100644
>> --- a/tools/libxc/xc_sr_restore.c
>> +++ b/tools/libxc/xc_sr_restore.c
>> @@ -560,19 +560,6 @@ static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec)
>>       free(rec->data);
>>       rec->data = NULL;
>>
>> -    if ( rc == RECORD_NOT_PROCESSED )
>> -    {
>> -        if ( rec->type & REC_TYPE_OPTIONAL )
>> -            DPRINTF("Ignoring optional record %#x (%s)",
>> -                    rec->type, rec_type_to_str(rec->type));
>
> You would be best setting rc to 0 here, rather than moving the logic out
> of process_record().

There will be another error type in COLO, indicating a failover, that
needs to be handled in restore(), so I moved the error handling down to
avoid duplicating code... Otherwise RECORD_NOT_PROCESSED would be
handled in process_record(), while the other error type returned from
process_record() would be handled in restore()...

>
> ~Andrew
>
>> -        else
>> -        {
>> -            ERROR("Mandatory record %#x (%s) not handled",
>> -                  rec->type, rec_type_to_str(rec->type));
>> -            rc = -1;
>> -        }
>> -    }
>> -
>>       return rc;
>>   }
>>
>> @@ -678,7 +665,20 @@ static int restore(struct xc_sr_context *ctx)
>>           else
>>           {
>>               rc = process_record(ctx, &rec);
>> -            if ( rc )
>> +            if ( rc == RECORD_NOT_PROCESSED )
>> +            {
>> +                if ( rec.type & REC_TYPE_OPTIONAL )
>> +                    DPRINTF("Ignoring optional record %#x (%s)",
>> +                            rec.type, rec_type_to_str(rec.type));
>> +                else
>> +                {
>> +                    ERROR("Mandatory record %#x (%s) not handled",
>> +                          rec.type, rec_type_to_str(rec.type));
>> +                    rc = -1;
>> +                    goto err;
>> +                }
>> +            }
>> +            else if ( rc )
>>                   goto err;
>>           }
>>
>
> .
>

-- 
Thanks,
Yang.


* Re: [PATCH v2 COLOPre 01/13] libxc/restore: fix error handle of process_record
  2015-06-08  9:37     ` Yang Hongyang
@ 2015-06-08  9:39       ` Andrew Cooper
  0 siblings, 0 replies; 106+ messages in thread
From: Andrew Cooper @ 2015-06-08  9:39 UTC (permalink / raw)
  To: Yang Hongyang, xen-devel
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram

On 08/06/15 10:37, Yang Hongyang wrote:
>
>
> On 06/08/2015 05:24 PM, Andrew Cooper wrote:
>> On 08/06/15 04:43, Yang Hongyang wrote:
>>> If the err is RECORD_NOT_PROCESSED, and it is an optional record,
>>> restore will still fail. The patch fix this.
>>>
>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>>> CC: Ian Campbell <Ian.Campbell@citrix.com>
>>> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
>>> CC: Wei Liu <wei.liu2@citrix.com>
>>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
>>> ---
>>>   tools/libxc/xc_sr_restore.c | 28 ++++++++++++++--------------
>>>   1 file changed, 14 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
>>> index 9e27dba..2d2edd3 100644
>>> --- a/tools/libxc/xc_sr_restore.c
>>> +++ b/tools/libxc/xc_sr_restore.c
>>> @@ -560,19 +560,6 @@ static int process_record(struct xc_sr_context
>>> *ctx, struct xc_sr_record *rec)
>>>       free(rec->data);
>>>       rec->data = NULL;
>>>
>>> -    if ( rc == RECORD_NOT_PROCESSED )
>>> -    {
>>> -        if ( rec->type & REC_TYPE_OPTIONAL )
>>> -            DPRINTF("Ignoring optional record %#x (%s)",
>>> -                    rec->type, rec_type_to_str(rec->type));
>>
>> You would be best setting rc to 0 here, rather than moving the logic out
>> of process_record().
>
> There will be another error type in COLO, which indicates a failover,
> that
> needs to be handled in restore(), so I moved the error handling down to
> avoid duplex code...Otherwise, in process_record, RECORD_NOT_PROCESSED is
> handled, and in restore another error type returned from
> process_record is
> handled...

Ah ok - I will wait till I get that far through the series.

~Andrew

>
>>
>> ~Andrew
>>
>>> -        else
>>> -        {
>>> -            ERROR("Mandatory record %#x (%s) not handled",
>>> -                  rec->type, rec_type_to_str(rec->type));
>>> -            rc = -1;
>>> -        }
>>> -    }
>>> -
>>>       return rc;
>>>   }
>>>
>>> @@ -678,7 +665,20 @@ static int restore(struct xc_sr_context *ctx)
>>>           else
>>>           {
>>>               rc = process_record(ctx, &rec);
>>> -            if ( rc )
>>> +            if ( rc == RECORD_NOT_PROCESSED )
>>> +            {
>>> +                if ( rec.type & REC_TYPE_OPTIONAL )
>>> +                    DPRINTF("Ignoring optional record %#x (%s)",
>>> +                            rec.type, rec_type_to_str(rec.type));
>>> +                else
>>> +                {
>>> +                    ERROR("Mandatory record %#x (%s) not handled",
>>> +                          rec.type, rec_type_to_str(rec.type));
>>> +                    rc = -1;
>>> +                    goto err;
>>> +                }
>>> +            }
>>> +            else if ( rc )
>>>                   goto err;
>>>           }
>>>
>>
>> .
>>
>

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
  2015-06-08  3:43 ` [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time Yang Hongyang
@ 2015-06-08  9:46   ` Andrew Cooper
  2015-06-08  9:49     ` Andrew Cooper
  2015-06-08  9:58     ` Yang Hongyang
  0 siblings, 2 replies; 106+ messages in thread
From: Andrew Cooper @ 2015-06-08  9:46 UTC (permalink / raw)
  To: Yang Hongyang, xen-devel
  Cc: wei.liu2, ian.campbell, wency, guijianfeng, yunhong.jiang,
	eddie.dong, rshriram, ian.jackson

On 08/06/15 04:43, Yang Hongyang wrote:
> ioreq page contains evtchn which will be set when we resume the
> secondary vm the first time. The hypervisor will check if the
> evtchn is corrupted, so we cannot zero the ioreq page more
> than one time.
>
> The ioreq->state is always STATE_IOREQ_NONE after the vm is
> suspended, so it is OK if we only zero it one time.
>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> Signed-off-by: Wen congyang <wency@cn.fujitsu.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>

The issue here is that we are running the restore algorithm over a
domain which has already been running in Xen for a while.  This is a
brand new use case, as far as I am aware.

Does the qemu process associated with this domain get frozen while the
secondary is being reset, or does the process get destroyed and recreated?

I have a gut feeling that it would be safer to clear all of the page
other than the event channel, but that depends on exactly what else is
going on.  What we absolutely don't want is an update to this page from
the primary with an in-progress IOREQ.

~Andrew

> ---
>  tools/libxc/xc_sr_restore_x86_hvm.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/tools/libxc/xc_sr_restore_x86_hvm.c b/tools/libxc/xc_sr_restore_x86_hvm.c
> index 6f5af0e..06177e0 100644
> --- a/tools/libxc/xc_sr_restore_x86_hvm.c
> +++ b/tools/libxc/xc_sr_restore_x86_hvm.c
> @@ -78,7 +78,8 @@ static int handle_hvm_params(struct xc_sr_context *ctx,
>              break;
>          case HVM_PARAM_IOREQ_PFN:
>          case HVM_PARAM_BUFIOREQ_PFN:
> -            xc_clear_domain_page(xch, ctx->domid, entry->value);
> +            if ( !ctx->restore.buffer_all_records )
> +                xc_clear_domain_page(xch, ctx->domid, entry->value);
>              break;
>          }
>  


* Re: [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
  2015-06-08  9:46   ` Andrew Cooper
@ 2015-06-08  9:49     ` Andrew Cooper
  2015-06-08  9:58     ` Yang Hongyang
  1 sibling, 0 replies; 106+ messages in thread
From: Andrew Cooper @ 2015-06-08  9:49 UTC (permalink / raw)
  To: Yang Hongyang, xen-devel
  Cc: wei.liu2, ian.campbell, wency, guijianfeng, yunhong.jiang,
	eddie.dong, rshriram, ian.jackson

On 08/06/15 10:46, Andrew Cooper wrote:
> On 08/06/15 04:43, Yang Hongyang wrote:
>> ioreq page contains evtchn which will be set when we resume the
>> secondary vm the first time. The hypervisor will check if the
>> evtchn is corrupted, so we cannot zero the ioreq page more
>> than one time.
>>
>> The ioreq->state is always STATE_IOREQ_NONE after the vm is
>> suspended, so it is OK if we only zero it one time.
>>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> Signed-off-by: Wen congyang <wency@cn.fujitsu.com>
>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> The issue here is that we are running the restore algorithm over a
> domain which has already been running in Xen for a while.  This is a
> brand new usecase, as far as I am aware.
>
> Does the qemu process associated with this domain get frozen while the
> secondary is being reset, or does the process get destroyed and recreated.
>
> I have a gut feeling that it would be safer to clear all of the page
> other than the event channel, but that depends on exactly what else is
> going on.  We absolutely don't want to do is have an update to this page
> from the primary with an in-progress IOREQ.

Or actually worse, an update from the primary with a different event
channel in it.  There is no requirement or guarantee that the bufioreq
event channels are the same on either side.

~Andrew


* Re: [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
  2015-06-08  9:46   ` Andrew Cooper
  2015-06-08  9:49     ` Andrew Cooper
@ 2015-06-08  9:58     ` Yang Hongyang
  2015-06-08 10:15       ` Andrew Cooper
  1 sibling, 1 reply; 106+ messages in thread
From: Yang Hongyang @ 2015-06-08  9:58 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: wei.liu2, ian.campbell, wency, guijianfeng, yunhong.jiang,
	eddie.dong, rshriram, ian.jackson



On 06/08/2015 05:46 PM, Andrew Cooper wrote:
> On 08/06/15 04:43, Yang Hongyang wrote:
>> ioreq page contains evtchn which will be set when we resume the
>> secondary vm the first time. The hypervisor will check if the
>> evtchn is corrupted, so we cannot zero the ioreq page more
>> than one time.
>>
>> The ioreq->state is always STATE_IOREQ_NONE after the vm is
>> suspended, so it is OK if we only zero it one time.
>>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> Signed-off-by: Wen congyang <wency@cn.fujitsu.com>
>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
>
> The issue here is that we are running the restore algorithm over a
> domain which has already been running in Xen for a while.  This is a
> brand new usecase, as far as I am aware.

Exactly.

>
> Does the qemu process associated with this domain get frozen while the
> secondary is being reset, or does the process get destroyed and recreated.

What do you mean by reset? Do you mean the secondary is suspended at a checkpoint?

>
> I have a gut feeling that it would be safer to clear all of the page
> other than the event channel, but that depends on exactly what else is
> going on.  We absolutely don't want to do is have an update to this page
> from the primary with an in-progress IOREQ.
>
> ~Andrew
>
>> ---
>>   tools/libxc/xc_sr_restore_x86_hvm.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/libxc/xc_sr_restore_x86_hvm.c b/tools/libxc/xc_sr_restore_x86_hvm.c
>> index 6f5af0e..06177e0 100644
>> --- a/tools/libxc/xc_sr_restore_x86_hvm.c
>> +++ b/tools/libxc/xc_sr_restore_x86_hvm.c
>> @@ -78,7 +78,8 @@ static int handle_hvm_params(struct xc_sr_context *ctx,
>>               break;
>>           case HVM_PARAM_IOREQ_PFN:
>>           case HVM_PARAM_BUFIOREQ_PFN:
>> -            xc_clear_domain_page(xch, ctx->domid, entry->value);
>> +            if ( !ctx->restore.buffer_all_records )
>> +                xc_clear_domain_page(xch, ctx->domid, entry->value);
>>               break;
>>           }
>>
>
> .
>

-- 
Thanks,
Yang.


* Re: [PATCH v2 COLOPre 04/13] tools/libxc: export xc_bitops.h
  2015-06-08  3:43 ` [PATCH v2 COLOPre 04/13] tools/libxc: export xc_bitops.h Yang Hongyang
@ 2015-06-08 10:04   ` Yang Hongyang
  2015-06-10 15:20   ` Ian Campbell
  1 sibling, 0 replies; 106+ messages in thread
From: Yang Hongyang @ 2015-06-08 10:04 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

Just to note that xc_bitops.h needs a cleanup, as Andy pointed out in v1...
It will be done in v3.

On 06/08/2015 11:43 AM, Yang Hongyang wrote:
> When we are under COLO, we will send dirty page bitmap info from
> secondary to primary at every checkpoint. So we need to get/test
> the dirty page bitmap. We just expose xc_bitops.h for libxl use.
>
> NOTE:
>    Need to make clean and rerun configure to get it compiled.
>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> ---
>   tools/libxc/include/xc_bitops.h | 76 +++++++++++++++++++++++++++++++++++++++++
>   tools/libxc/xc_bitops.h         | 76 -----------------------------------------
>   2 files changed, 76 insertions(+), 76 deletions(-)
>   create mode 100644 tools/libxc/include/xc_bitops.h
>   delete mode 100644 tools/libxc/xc_bitops.h
>
> diff --git a/tools/libxc/include/xc_bitops.h b/tools/libxc/include/xc_bitops.h
> new file mode 100644
> index 0000000..cd749f4
> --- /dev/null
> +++ b/tools/libxc/include/xc_bitops.h
> @@ -0,0 +1,76 @@
> +#ifndef XC_BITOPS_H
> +#define XC_BITOPS_H 1
> +
> +/* bitmap operations for single threaded access */
> +
> +#include <stdlib.h>
> +#include <string.h>
> +
> +#define BITS_PER_LONG (sizeof(unsigned long) * 8)
> +#define ORDER_LONG (sizeof(unsigned long) == 4 ? 5 : 6)
> +
> +#define BITMAP_ENTRY(_nr,_bmap) ((_bmap))[(_nr)/BITS_PER_LONG]
> +#define BITMAP_SHIFT(_nr) ((_nr) % BITS_PER_LONG)
> +
> +/* calculate required space for number of longs needed to hold nr_bits */
> +static inline int bitmap_size(int nr_bits)
> +{
> +    int nr_long, nr_bytes;
> +    nr_long = (nr_bits + BITS_PER_LONG - 1) >> ORDER_LONG;
> +    nr_bytes = nr_long * sizeof(unsigned long);
> +    return nr_bytes;
> +}
> +
> +static inline unsigned long *bitmap_alloc(int nr_bits)
> +{
> +    return calloc(1, bitmap_size(nr_bits));
> +}
> +
> +static inline void bitmap_set(unsigned long *addr, int nr_bits)
> +{
> +    memset(addr, 0xff, bitmap_size(nr_bits));
> +}
> +
> +static inline void bitmap_clear(unsigned long *addr, int nr_bits)
> +{
> +    memset(addr, 0, bitmap_size(nr_bits));
> +}
> +
> +static inline int test_bit(int nr, unsigned long *addr)
> +{
> +    return (BITMAP_ENTRY(nr, addr) >> BITMAP_SHIFT(nr)) & 1;
> +}
> +
> +static inline void clear_bit(int nr, unsigned long *addr)
> +{
> +    BITMAP_ENTRY(nr, addr) &= ~(1UL << BITMAP_SHIFT(nr));
> +}
> +
> +static inline void set_bit(int nr, unsigned long *addr)
> +{
> +    BITMAP_ENTRY(nr, addr) |= (1UL << BITMAP_SHIFT(nr));
> +}
> +
> +static inline int test_and_clear_bit(int nr, unsigned long *addr)
> +{
> +    int oldbit = test_bit(nr, addr);
> +    clear_bit(nr, addr);
> +    return oldbit;
> +}
> +
> +static inline int test_and_set_bit(int nr, unsigned long *addr)
> +{
> +    int oldbit = test_bit(nr, addr);
> +    set_bit(nr, addr);
> +    return oldbit;
> +}
> +
> +static inline void bitmap_or(unsigned long *dst, const unsigned long *other,
> +                             int nr_bits)
> +{
> +    int i, nr_longs = (bitmap_size(nr_bits) / sizeof(unsigned long));
> +    for ( i = 0; i < nr_longs; ++i )
> +        dst[i] |= other[i];
> +}
> +
> +#endif  /* XC_BITOPS_H */
> diff --git a/tools/libxc/xc_bitops.h b/tools/libxc/xc_bitops.h
> deleted file mode 100644
> index cd749f4..0000000
> --- a/tools/libxc/xc_bitops.h
> +++ /dev/null
> @@ -1,76 +0,0 @@
> -#ifndef XC_BITOPS_H
> -#define XC_BITOPS_H 1
> -
> -/* bitmap operations for single threaded access */
> -
> -#include <stdlib.h>
> -#include <string.h>
> -
> -#define BITS_PER_LONG (sizeof(unsigned long) * 8)
> -#define ORDER_LONG (sizeof(unsigned long) == 4 ? 5 : 6)
> -
> -#define BITMAP_ENTRY(_nr,_bmap) ((_bmap))[(_nr)/BITS_PER_LONG]
> -#define BITMAP_SHIFT(_nr) ((_nr) % BITS_PER_LONG)
> -
> -/* calculate required space for number of longs needed to hold nr_bits */
> -static inline int bitmap_size(int nr_bits)
> -{
> -    int nr_long, nr_bytes;
> -    nr_long = (nr_bits + BITS_PER_LONG - 1) >> ORDER_LONG;
> -    nr_bytes = nr_long * sizeof(unsigned long);
> -    return nr_bytes;
> -}
> -
> -static inline unsigned long *bitmap_alloc(int nr_bits)
> -{
> -    return calloc(1, bitmap_size(nr_bits));
> -}
> -
> -static inline void bitmap_set(unsigned long *addr, int nr_bits)
> -{
> -    memset(addr, 0xff, bitmap_size(nr_bits));
> -}
> -
> -static inline void bitmap_clear(unsigned long *addr, int nr_bits)
> -{
> -    memset(addr, 0, bitmap_size(nr_bits));
> -}
> -
> -static inline int test_bit(int nr, unsigned long *addr)
> -{
> -    return (BITMAP_ENTRY(nr, addr) >> BITMAP_SHIFT(nr)) & 1;
> -}
> -
> -static inline void clear_bit(int nr, unsigned long *addr)
> -{
> -    BITMAP_ENTRY(nr, addr) &= ~(1UL << BITMAP_SHIFT(nr));
> -}
> -
> -static inline void set_bit(int nr, unsigned long *addr)
> -{
> -    BITMAP_ENTRY(nr, addr) |= (1UL << BITMAP_SHIFT(nr));
> -}
> -
> -static inline int test_and_clear_bit(int nr, unsigned long *addr)
> -{
> -    int oldbit = test_bit(nr, addr);
> -    clear_bit(nr, addr);
> -    return oldbit;
> -}
> -
> -static inline int test_and_set_bit(int nr, unsigned long *addr)
> -{
> -    int oldbit = test_bit(nr, addr);
> -    set_bit(nr, addr);
> -    return oldbit;
> -}
> -
> -static inline void bitmap_or(unsigned long *dst, const unsigned long *other,
> -                             int nr_bits)
> -{
> -    int i, nr_longs = (bitmap_size(nr_bits) / sizeof(unsigned long));
> -    for ( i = 0; i < nr_longs; ++i )
> -        dst[i] |= other[i];
> -}
> -
> -#endif  /* XC_BITOPS_H */
>

-- 
Thanks,
Yang.


* Re: [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
  2015-06-08  9:58     ` Yang Hongyang
@ 2015-06-08 10:15       ` Andrew Cooper
  2015-06-09  0:59         ` Yang Hongyang
  0 siblings, 1 reply; 106+ messages in thread
From: Andrew Cooper @ 2015-06-08 10:15 UTC (permalink / raw)
  To: Yang Hongyang, xen-devel
  Cc: wei.liu2, ian.campbell, wency, guijianfeng, yunhong.jiang,
	eddie.dong, rshriram, ian.jackson

On 08/06/15 10:58, Yang Hongyang wrote:
>
>
> On 06/08/2015 05:46 PM, Andrew Cooper wrote:
>> On 08/06/15 04:43, Yang Hongyang wrote:
>>> ioreq page contains evtchn which will be set when we resume the
>>> secondary vm the first time. The hypervisor will check if the
>>> evtchn is corrupted, so we cannot zero the ioreq page more
>>> than one time.
>>>
>>> The ioreq->state is always STATE_IOREQ_NONE after the vm is
>>> suspended, so it is OK if we only zero it one time.
>>>
>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>>> Signed-off-by: Wen congyang <wency@cn.fujitsu.com>
>>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
>>
>> The issue here is that we are running the restore algorithm over a
>> domain which has already been running in Xen for a while.  This is a
>> brand new usecase, as far as I am aware.
>
> Exactly.
>
>>
>> Does the qemu process associated with this domain get frozen while the
>> secondary is being reset, or does the process get destroyed and
>> recreated.
>
> What do you mean by reset? do you mean secondary is suspended at
> checkpoint?

Well - at the point that the buffered records are being processed, we
are in the process of resetting the state of the secondary to match the
primary.

~Andrew

>
>>
>> I have a gut feeling that it would be safer to clear all of the page
>> other than the event channel, but that depends on exactly what else is
>> going on.  We absolutely don't want to do is have an update to this page
>> from the primary with an in-progress IOREQ.
>>
>> ~Andrew
>>
>>> ---
>>>   tools/libxc/xc_sr_restore_x86_hvm.c | 3 ++-
>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/tools/libxc/xc_sr_restore_x86_hvm.c
>>> b/tools/libxc/xc_sr_restore_x86_hvm.c
>>> index 6f5af0e..06177e0 100644
>>> --- a/tools/libxc/xc_sr_restore_x86_hvm.c
>>> +++ b/tools/libxc/xc_sr_restore_x86_hvm.c
>>> @@ -78,7 +78,8 @@ static int handle_hvm_params(struct xc_sr_context
>>> *ctx,
>>>               break;
>>>           case HVM_PARAM_IOREQ_PFN:
>>>           case HVM_PARAM_BUFIOREQ_PFN:
>>> -            xc_clear_domain_page(xch, ctx->domid, entry->value);
>>> +            if ( !ctx->restore.buffer_all_records )
>>> +                xc_clear_domain_page(xch, ctx->domid, entry->value);
>>>               break;
>>>           }
>>>
>>
>> .
>>
>


* Re: [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
  2015-06-08 10:15       ` Andrew Cooper
@ 2015-06-09  0:59         ` Yang Hongyang
  2015-06-09  7:30           ` Andrew Cooper
  0 siblings, 1 reply; 106+ messages in thread
From: Yang Hongyang @ 2015-06-09  0:59 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: wei.liu2, ian.campbell, wency, guijianfeng, yunhong.jiang,
	eddie.dong, rshriram, ian.jackson



On 06/08/2015 06:15 PM, Andrew Cooper wrote:
> On 08/06/15 10:58, Yang Hongyang wrote:
>>
>>
>> On 06/08/2015 05:46 PM, Andrew Cooper wrote:
>>> On 08/06/15 04:43, Yang Hongyang wrote:
>>>> ioreq page contains evtchn which will be set when we resume the
>>>> secondary vm the first time. The hypervisor will check if the
>>>> evtchn is corrupted, so we cannot zero the ioreq page more
>>>> than one time.
>>>>
>>>> The ioreq->state is always STATE_IOREQ_NONE after the vm is
>>>> suspended, so it is OK if we only zero it one time.
>>>>
>>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>>>> Signed-off-by: Wen congyang <wency@cn.fujitsu.com>
>>>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
>>>
>>> The issue here is that we are running the restore algorithm over a
>>> domain which has already been running in Xen for a while.  This is a
>>> brand new usecase, as far as I am aware.
>>
>> Exactly.
>>
>>>
>>> Does the qemu process associated with this domain get frozen while the
>>> secondary is being reset, or does the process get destroyed and
>>> recreated.
>>
>> What do you mean by reset? do you mean secondary is suspended at
>> checkpoint?
>
> Well - at the point that the buffered records are being processed, we
> are in the process of resetting the state of the secondary to match the
> primary.

Yes, at this point the qemu process associated with this domain is frozen.
The suspend callback will call libxl__qmp_stop (vm_stop() in qemu) to pause
qemu. After we have processed all records, qemu will be restored with the
received state; that's why we add a libxl__qmp_restore (qemu_load_vmstate()
in qemu) API to restore qemu with the received state. Currently in libxl,
qemu can only start with the received state; there's no API to load a
received state into a qemu that has already been running for a while.

>
> ~Andrew
>
>>
>>>
>>> I have a gut feeling that it would be safer to clear all of the page
>>> other than the event channel, but that depends on exactly what else is
>>> going on.  We absolutely don't want to do is have an update to this page
>>> from the primary with an in-progress IOREQ.
>>>
>>> ~Andrew
>>>
>>>> ---
>>>>    tools/libxc/xc_sr_restore_x86_hvm.c | 3 ++-
>>>>    1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/tools/libxc/xc_sr_restore_x86_hvm.c
>>>> b/tools/libxc/xc_sr_restore_x86_hvm.c
>>>> index 6f5af0e..06177e0 100644
>>>> --- a/tools/libxc/xc_sr_restore_x86_hvm.c
>>>> +++ b/tools/libxc/xc_sr_restore_x86_hvm.c
>>>> @@ -78,7 +78,8 @@ static int handle_hvm_params(struct xc_sr_context
>>>> *ctx,
>>>>                break;
>>>>            case HVM_PARAM_IOREQ_PFN:
>>>>            case HVM_PARAM_BUFIOREQ_PFN:
>>>> -            xc_clear_domain_page(xch, ctx->domid, entry->value);
>>>> +            if ( !ctx->restore.buffer_all_records )
>>>> +                xc_clear_domain_page(xch, ctx->domid, entry->value);
>>>>                break;
>>>>            }
>>>>
>>>
>>> .
>>>
>>
>
> .
>

-- 
Thanks,
Yang.


* Re: [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
  2015-06-09  0:59         ` Yang Hongyang
@ 2015-06-09  7:30           ` Andrew Cooper
  2015-06-10  5:26             ` Yang Hongyang
  0 siblings, 1 reply; 106+ messages in thread
From: Andrew Cooper @ 2015-06-09  7:30 UTC (permalink / raw)
  To: Yang Hongyang, xen-devel
  Cc: wei.liu2, ian.campbell, wency, guijianfeng, yunhong.jiang,
	eddie.dong, rshriram, ian.jackson

On 09/06/2015 01:59, Yang Hongyang wrote:
>
>
> On 06/08/2015 06:15 PM, Andrew Cooper wrote:
>> On 08/06/15 10:58, Yang Hongyang wrote:
>>>
>>>
>>> On 06/08/2015 05:46 PM, Andrew Cooper wrote:
>>>> On 08/06/15 04:43, Yang Hongyang wrote:
>>>>> ioreq page contains evtchn which will be set when we resume the
>>>>> secondary vm the first time. The hypervisor will check if the
>>>>> evtchn is corrupted, so we cannot zero the ioreq page more
>>>>> than one time.
>>>>>
>>>>> The ioreq->state is always STATE_IOREQ_NONE after the vm is
>>>>> suspended, so it is OK if we only zero it one time.
>>>>>
>>>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>>>>> Signed-off-by: Wen congyang <wency@cn.fujitsu.com>
>>>>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>
>>>> The issue here is that we are running the restore algorithm over a
>>>> domain which has already been running in Xen for a while.  This is a
>>>> brand new usecase, as far as I am aware.
>>>
>>> Exactly.
>>>
>>>>
>>>> Does the qemu process associated with this domain get frozen while the
>>>> secondary is being reset, or does the process get destroyed and
>>>> recreated.
>>>
>>> What do you mean by reset? do you mean secondary is suspended at
>>> checkpoint?
>>
>> Well - at the point that the buffered records are being processed, we
>> are in the process of resetting the state of the secondary to match the
>> primary.
>
> Yes, at this point, the qemu process associated with this domain is
> frozen.
> the suspend callback will call libxl__qmp_stop(vm_stop() in qemu) to
> pause
> qemu. After we processed all records, qemu will be restored with the
> received
> state, that's why we add a libxl__qmp_restore(qemu_load_vmstate() in
> qemu)
> api to restore qemu with received state. Currently in libxl, qemu only
> start
> with the received state, there's no api to load received state while
> qemu is
> running for a while.

Now that I consider this more, it is absolutely wrong not to zero the page
here.  The event channel in the page is not guaranteed to be the same
between the primary and secondary, and we don't want to unexpectedly
find a pending/in-flight ioreq.

Either qemu needs to take care of re-initialising the event channels
back to appropriate values, or Xen should tolerate the channels
disappearing.

~Andrew


* Re: [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
  2015-06-09  7:30           ` Andrew Cooper
@ 2015-06-10  5:26             ` Yang Hongyang
  2015-06-10  7:44               ` Andrew Cooper
  0 siblings, 1 reply; 106+ messages in thread
From: Yang Hongyang @ 2015-06-10  5:26 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: wei.liu2, ian.campbell, wency, guijianfeng, yunhong.jiang,
	eddie.dong, rshriram, ian.jackson



On 06/09/2015 03:30 PM, Andrew Cooper wrote:
> On 09/06/2015 01:59, Yang Hongyang wrote:
>>
>>
>> On 06/08/2015 06:15 PM, Andrew Cooper wrote:
>>> On 08/06/15 10:58, Yang Hongyang wrote:
>>>>
>>>>
>>>> On 06/08/2015 05:46 PM, Andrew Cooper wrote:
>>>>> On 08/06/15 04:43, Yang Hongyang wrote:
>>>>>> ioreq page contains evtchn which will be set when we resume the
>>>>>> secondary vm the first time. The hypervisor will check if the
>>>>>> evtchn is corrupted, so we cannot zero the ioreq page more
>>>>>> than one time.
>>>>>>
>>>>>> The ioreq->state is always STATE_IOREQ_NONE after the vm is
>>>>>> suspended, so it is OK if we only zero it one time.
>>>>>>
>>>>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>>>>>> Signed-off-by: Wen congyang <wency@cn.fujitsu.com>
>>>>>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>>
>>>>> The issue here is that we are running the restore algorithm over a
>>>>> domain which has already been running in Xen for a while.  This is a
>>>>> brand new usecase, as far as I am aware.
>>>>
>>>> Exactly.
>>>>
>>>>>
>>>>> Does the qemu process associated with this domain get frozen while the
>>>>> secondary is being reset, or does the process get destroyed and
>>>>> recreated.
>>>>
>>>> What do you mean by reset? do you mean secondary is suspended at
>>>> checkpoint?
>>>
>>> Well - at the point that the buffered records are being processed, we
>>> are in the process of resetting the state of the secondary to match the
>>> primary.
>>
>> Yes, at this point, the qemu process associated with this domain is
>> frozen.
>> the suspend callback will call libxl__qmp_stop(vm_stop() in qemu) to
>> pause
>> qemu. After we processed all records, qemu will be restored with the
>> received
>> state, that's why we add a libxl__qmp_restore(qemu_load_vmstate() in
>> qemu)
>> api to restore qemu with received state. Currently in libxl, qemu only
>> start
>> with the received state, there's no api to load received state while
>> qemu is
>> running for a while.
>
> Now I consider this more, it is absolutely wrong to not zero the page
> here.  The event channel in the page is not guaranteed to be the same
> between the primary and secondary,

That's why we don't zero it on the secondary.

> and we don't want to unexpectedly
> find a pending/in-flight ioreq.

ioreq->state is always STATE_IOREQ_NONE after the vm is suspended; there
should be no pending/in-flight ioreq at a checkpoint.

>
> Either qemu needs to take care of re-initialising the event channels
> back to appropriate values, or Xen should tolerate the channels
> disappearing.
>
> ~Andrew
> .
>

-- 
Thanks,
Yang.


* Re: [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
  2015-06-10  5:26             ` Yang Hongyang
@ 2015-06-10  7:44               ` Andrew Cooper
  2015-06-10  9:06                 ` Wen Congyang
  0 siblings, 1 reply; 106+ messages in thread
From: Andrew Cooper @ 2015-06-10  7:44 UTC (permalink / raw)
  To: Yang Hongyang, xen-devel
  Cc: wei.liu2, ian.campbell, wency, guijianfeng, yunhong.jiang,
	eddie.dong, rshriram, ian.jackson

On 10/06/2015 06:26, Yang Hongyang wrote:
>
>
> On 06/09/2015 03:30 PM, Andrew Cooper wrote:
>> On 09/06/2015 01:59, Yang Hongyang wrote:
>>>
>>>
>>> On 06/08/2015 06:15 PM, Andrew Cooper wrote:
>>>> On 08/06/15 10:58, Yang Hongyang wrote:
>>>>>
>>>>>
>>>>> On 06/08/2015 05:46 PM, Andrew Cooper wrote:
>>>>>> On 08/06/15 04:43, Yang Hongyang wrote:
>>>>>>> ioreq page contains evtchn which will be set when we resume the
>>>>>>> secondary vm the first time. The hypervisor will check if the
>>>>>>> evtchn is corrupted, so we cannot zero the ioreq page more
>>>>>>> than one time.
>>>>>>>
>>>>>>> The ioreq->state is always STATE_IOREQ_NONE after the vm is
>>>>>>> suspended, so it is OK if we only zero it one time.
>>>>>>>
>>>>>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>>>>>>> Signed-off-by: Wen congyang <wency@cn.fujitsu.com>
>>>>>>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>>>
>>>>>> The issue here is that we are running the restore algorithm over a
>>>>>> domain which has already been running in Xen for a while.  This is a
>>>>>> brand new usecase, as far as I am aware.
>>>>>
>>>>> Exactly.
>>>>>
>>>>>>
>>>>>> Does the qemu process associated with this domain get frozen
>>>>>> while the
>>>>>> secondary is being reset, or does the process get destroyed and
>>>>>> recreated.
>>>>>
>>>>> What do you mean by reset? do you mean secondary is suspended at
>>>>> checkpoint?
>>>>
>>>> Well - at the point that the buffered records are being processed, we
>>>> are in the process of resetting the state of the secondary to match
>>>> the
>>>> primary.
>>>
>>> Yes, at this point, the qemu process associated with this domain is
>>> frozen.
>>> the suspend callback will call libxl__qmp_stop(vm_stop() in qemu) to
>>> pause
>>> qemu. After we processed all records, qemu will be restored with the
>>> received
>>> state, that's why we add a libxl__qmp_restore(qemu_load_vmstate() in
>>> qemu)
>>> api to restore qemu with received state. Currently in libxl, qemu only
>>> start
>>> with the received state, there's no api to load received state while
>>> qemu is
>>> running for a while.
>>
>> Now I consider this more, it is absolutely wrong to not zero the page
>> here.  The event channel in the page is not guaranteed to be the same
>> between the primary and secondary,
>
> That's why we don't zero it on secondary.

I think you missed my point.  Apologies for the double negative.   It
must, under all circumstances, be zeroed at this point, for safety reasons.

The page in question is subject to logdirty just like any other guest
pages, which means that if the guest writes to it naturally (i.e. not a
Xen or Qemu write, both of whom have magic mappings which are not
subject to logdirty), it will be transmitted in the stream.  As the
event channel could be different, the lack of zeroing it at this point
means that the event channel would be wrong as opposed to simply
missing.  This is a worse position to be in.

>
>> and we don't want to unexpectedly
>> find a pending/in-flight ioreq.
>
> ioreq->state is always STATE_IOREQ_NONE after the vm is suspended, there
> should be no pending/in-flight ioreq at checkpoint.

In the common case perhaps, but we must consider the exceptional case. 
The exceptional case here is some corruption which happens to appear as
an in-flight ioreq.

>
>>
>> Either qemu needs to take care of re-initialising the event channels
>> back to appropriate values, or Xen should tolerate the channels
>> disappearing.

I still stand by this statement.  I believe it is the only safe way of
solving the issue you have discovered.

~Andrew


* Re: [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
  2015-06-10  7:44               ` Andrew Cooper
@ 2015-06-10  9:06                 ` Wen Congyang
  2015-06-10 10:08                   ` Andrew Cooper
  2015-06-10 10:40                   ` Paul Durrant
  0 siblings, 2 replies; 106+ messages in thread
From: Wen Congyang @ 2015-06-10  9:06 UTC (permalink / raw)
  To: Andrew Cooper, Yang Hongyang, xen-devel, Paul Durrant
  Cc: wei.liu2, ian.campbell, guijianfeng, yunhong.jiang, eddie.dong,
	rshriram, ian.jackson

Cc: Paul Durrant

On 06/10/2015 03:44 PM, Andrew Cooper wrote:
> On 10/06/2015 06:26, Yang Hongyang wrote:
>>
>>
>> On 06/09/2015 03:30 PM, Andrew Cooper wrote:
>>> On 09/06/2015 01:59, Yang Hongyang wrote:
>>>>
>>>>
>>>> On 06/08/2015 06:15 PM, Andrew Cooper wrote:
>>>>> On 08/06/15 10:58, Yang Hongyang wrote:
>>>>>>
>>>>>>
>>>>>> On 06/08/2015 05:46 PM, Andrew Cooper wrote:
>>>>>>> On 08/06/15 04:43, Yang Hongyang wrote:
>>>>>>>> ioreq page contains evtchn which will be set when we resume the
>>>>>>>> secondary vm the first time. The hypervisor will check if the
>>>>>>>> evtchn is corrupted, so we cannot zero the ioreq page more
>>>>>>>> than one time.
>>>>>>>>
>>>>>>>> The ioreq->state is always STATE_IOREQ_NONE after the vm is
>>>>>>>> suspended, so it is OK if we only zero it one time.
>>>>>>>>
>>>>>>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>>>>>>>> Signed-off-by: Wen congyang <wency@cn.fujitsu.com>
>>>>>>>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>>>>
>>>>>>> The issue here is that we are running the restore algorithm over a
>>>>>>> domain which has already been running in Xen for a while.  This is a
>>>>>>> brand new usecase, as far as I am aware.
>>>>>>
>>>>>> Exactly.
>>>>>>
>>>>>>>
>>>>>>> Does the qemu process associated with this domain get frozen
>>>>>>> while the
>>>>>>> secondary is being reset, or does the process get destroyed and
>>>>>>> recreated.
>>>>>>
>>>>>> What do you mean by reset? do you mean secondary is suspended at
>>>>>> checkpoint?
>>>>>
>>>>> Well - at the point that the buffered records are being processed, we
>>>>> are in the process of resetting the state of the secondary to match
>>>>> the
>>>>> primary.
>>>>
>>>> Yes, at this point, the qemu process associated with this domain is
>>>> frozen.
>>>> the suspend callback will call libxl__qmp_stop(vm_stop() in qemu) to
>>>> pause
>>>> qemu. After we processed all records, qemu will be restored with the
>>>> received
>>>> state, that's why we add a libxl__qmp_restore(qemu_load_vmstate() in
>>>> qemu)
>>>> api to restore qemu with received state. Currently in libxl, qemu only
>>>> start
>>>> with the received state, there's no api to load received state while
>>>> qemu is
>>>> running for a while.
>>>
>>> Now I consider this more, it is absolutely wrong to not zero the page
>>> here.  The event channel in the page is not guaranteed to be the same
>>> between the primary and secondary,
>>
>> That's why we don't zero it on secondary.
> 
> I think you missed my point.  Apologies for the double negative.   It
> must, under all circumstances, be zeroed at this point, for safety reasons.
> 
> The page in question is subject to logdirty just like any other guest
> pages, which means that if the guest writes to it naturally (i.e. not a
> Xen or Qemu write, both of whom have magic mappings which are not
> subject to logdirty), it will be transmitted in the stream.  As the
> event channel could be different, the lack of zeroing it at this point
> means that the event channel would be wrong as opposed to simply
> missing.  This is a worse position to be in.

The guest should not access this page. I am not sure if the guest can
access the ioreq page.

But in the exceptional case, the ioreq page is dirtied and is copied to
the secondary vm. The ioreq page will then contain a wrong event channel;
the hypervisor will check it, and if the event channel is wrong, it will
crash the guest.

> 
>>
>>> and we don't want to unexpectedly
>>> find a pending/in-flight ioreq.
>>
>> ioreq->state is always STATE_IOREQ_NONE after the vm is suspended, there
>> should be no pending/in-flight ioreq at checkpoint.
> 
> In the common case perhaps, but we must consider the exceptional case. 
> The exceptional case here is some corruption which happens to appear as
> an in-flight ioreq.

If the state is not STATE_IOREQ_NONE, it may be a hypervisor bug. If the hypervisor
has a bug, anything can happen. I think we should trust the hypervisor.

> 
>>
>>>
>>> Either qemu needs to take care of re-initialising the event channels
>>> back to appropriate values, or Xen should tolerate the channels
>>> disappearing.
> 
> I still stand by this statement.  I believe it is the only safe way of
> solving the issue you have discovered.

Add a new qemu monitor command to update ioreq page?

Thanks
Wen Congyang

> 
> ~Andrew
> .
> 

^ permalink raw reply	[flat|nested] 106+ messages in thread
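The hazard Wen describes above, the ioreq page getting dirtied and shipped
in the stream with the primary's event channel still in it, is exactly what
zeroing on the restore side guards against. A minimal sketch of the effect
follows; the two-field struct is a simplified stand-in for Xen's ioreq_t
(see xen/include/public/hvm/ioreq.h), modelling only `state` and the
completion event channel `vp_eport`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STATE_IOREQ_NONE 0

/* Simplified stand-in for Xen's ioreq_t (xen/include/public/hvm/ioreq.h). */
typedef struct {
    uint32_t vp_eport; /* event channel Xen signals on completion */
    uint8_t  state;    /* STATE_IOREQ_NONE when nothing is in flight */
} ioreq_t;

/* A page restored from the primary may carry the primary's event channel
 * and arbitrary junk in `state`.  Zeroing discards both, leaving a
 * "missing" value rather than a "wrong" one -- the safer failure mode
 * Andrew argues for above. */
static void zero_ioreq_page(ioreq_t *page, unsigned int nr_vcpus)
{
    memset(page, 0, nr_vcpus * sizeof(*page));
}
```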

* Re: [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
  2015-06-10  9:06                 ` Wen Congyang
@ 2015-06-10 10:08                   ` Andrew Cooper
  2015-06-10 10:35                     ` Paul Durrant
  2015-06-10 10:40                   ` Paul Durrant
  1 sibling, 1 reply; 106+ messages in thread
From: Andrew Cooper @ 2015-06-10 10:08 UTC (permalink / raw)
  To: Wen Congyang, Yang Hongyang, xen-devel, Paul Durrant
  Cc: wei.liu2, ian.campbell, guijianfeng, yunhong.jiang, eddie.dong,
	rshriram, ian.jackson

On 10/06/15 10:06, Wen Congyang wrote:
> Cc: Paul Durrant
>
> On 06/10/2015 03:44 PM, Andrew Cooper wrote:
>> On 10/06/2015 06:26, Yang Hongyang wrote:
>>>
>>> On 06/09/2015 03:30 PM, Andrew Cooper wrote:
>>>> On 09/06/2015 01:59, Yang Hongyang wrote:
>>>>>
>>>>> On 06/08/2015 06:15 PM, Andrew Cooper wrote:
>>>>>> On 08/06/15 10:58, Yang Hongyang wrote:
>>>>>>>
>>>>>>> On 06/08/2015 05:46 PM, Andrew Cooper wrote:
>>>>>>>> On 08/06/15 04:43, Yang Hongyang wrote:
>>>>>>>>> ioreq page contains evtchn which will be set when we resume the
>>>>>>>>> secondary vm the first time. The hypervisor will check if the
>>>>>>>>> evtchn is corrupted, so we cannot zero the ioreq page more
>>>>>>>>> than one time.
>>>>>>>>>
>>>>>>>>> The ioreq->state is always STATE_IOREQ_NONE after the vm is
>>>>>>>>> suspended, so it is OK if we only zero it one time.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>>>>>>>>> Signed-off-by: Wen congyang <wency@cn.fujitsu.com>
>>>>>>>>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>>>>> The issue here is that we are running the restore algorithm over a
>>>>>>>> domain which has already been running in Xen for a while.  This is a
>>>>>>>> brand new usecase, as far as I am aware.
>>>>>>> Exactly.
>>>>>>>
>>>>>>>> Does the qemu process associated with this domain get frozen
>>>>>>>> while the
>>>>>>>> secondary is being reset, or does the process get destroyed and
>>>>>>>> recreated.
>>>>>>> What do you mean by reset? do you mean secondary is suspended at
>>>>>>> checkpoint?
>>>>>> Well - at the point that the buffered records are being processed, we
>>>>>> are in the process of resetting the state of the secondary to match
>>>>>> the
>>>>>> primary.
>>>>> Yes, at this point, the qemu process associated with this domain is
>>>>> frozen.
>>>>> the suspend callback will call libxl__qmp_stop(vm_stop() in qemu) to
>>>>> pause
>>>>> qemu. After we processed all records, qemu will be restored with the
>>>>> received
>>>>> state, that's why we add a libxl__qmp_restore(qemu_load_vmstate() in
>>>>> qemu)
>>>>> api to restore qemu with received state. Currently in libxl, qemu only
>>>>> start
>>>>> with the received state, there's no api to load received state while
>>>>> qemu is
>>>>> running for a while.
>>>> Now I consider this more, it is absolutely wrong to not zero the page
>>>> here.  The event channel in the page is not guaranteed to be the same
>>>> between the primary and secondary,
>>> That's why we don't zero it on secondary.
>> I think you missed my point.  Apologies for the double negative.   It
>> must, under all circumstances, be zeroed at this point, for safety reasons.
>>
>> The page in question is subject to logdirty just like any other guest
>> pages, which means that if the guest writes to it naturally (i.e. not a
>> Xen or Qemu write, both of whom have magic mappings which are not
>> subject to logdirty), it will be transmitted in the stream.  As the
>> event channel could be different, the lack of zeroing it at this point
>> means that the event channel would be wrong as opposed to simply
>> missing.  This is a worse position to be in.
> The guest should not access this page. I am not sure if the guest can
> access the ioreq page.

"should not" and "can't" are two very different things.  We have had
XSAs covering the fact that the guest can write to these pages in the past.

In practice, a guest can't actually query the appropriate hvmparam, but
it can rely on the fact that the domain builder is incredibly
predictable in this regard.

>
> But in the exceptional case, the ioreq page is dirtied, and is copied to
> the secondary vm. The ioreq page will contain a wrong event channel, the
> hypervisor will check it: if the event channel is wrong, the guest will
> be crashed.

This is my point.  It is completely legitimate for the event channels to
be different between the primary and secondary, which means that we
should be capable of dealing cleanly with the fallout when the bufioreq
page does appear as a dirty update.

>
>>>> and we don't want to unexpectedly
>>>> find a pending/in-flight ioreq.
>>> ioreq->state is always STATE_IOREQ_NONE after the vm is suspended, there
>>> should be no pending/in-flight ioreq at checkpoint.
>> In the common case perhaps, but we must consider the exceptional case. 
>> The exceptional case here is some corruption which happens to appear as
>> an in-flight ioreq.
> If the state is not STATE_IOREQ_NONE, it may be a hypervisor bug. If the hypervisor
> has a bug, anything can happen. I think we should trust the hypervisor.

In the worst case, the contents of the pages can be completely
arbitrary.  Zeroing of the pages is to cover the case where there is
junk present, so Xen doesn't crash the guest due to a bad ioreq state.

I think Xen's behaviour is legitimate here.  If it observes wonky ioreq
state, all bets are off.

>
>>>> Either qemu needs to take care of re-initialising the event channels
>>>> back to appropriate values, or Xen should tolerate the channels
>>>> disappearing.
>> I still stand by this statement.  I believe it is the only safe way of
>> solving the issue you have discovered.
> Add a new qemu monitor command to update ioreq page?

Who/what actually complains about the event channel?  I can't see any
event channels in the ABI for the pages.

~Andrew

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
  2015-06-10 10:08                   ` Andrew Cooper
@ 2015-06-10 10:35                     ` Paul Durrant
  0 siblings, 0 replies; 106+ messages in thread
From: Paul Durrant @ 2015-06-10 10:35 UTC (permalink / raw)
  To: Andrew Cooper, Wen Congyang, Yang Hongyang, xen-devel
  Cc: Wei Liu, Ian Campbell, guijianfeng, yunhong.jiang, Eddie Dong,
	rshriram, Ian Jackson

> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: 10 June 2015 11:08
> To: Wen Congyang; Yang Hongyang; xen-devel@lists.xen.org; Paul Durrant
> Cc: Wei Liu; Ian Campbell; yunhong.jiang@intel.com; Eddie Dong;
> guijianfeng@cn.fujitsu.com; rshriram@cs.ubc.ca; Ian Jackson
> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq
> page only one time
> 
> On 10/06/15 10:06, Wen Congyang wrote:
> > Cc: Paul Durrant
> >
> > On 06/10/2015 03:44 PM, Andrew Cooper wrote:
> >> On 10/06/2015 06:26, Yang Hongyang wrote:
> >>>
> >>> On 06/09/2015 03:30 PM, Andrew Cooper wrote:
> >>>> On 09/06/2015 01:59, Yang Hongyang wrote:
> >>>>>
> >>>>> On 06/08/2015 06:15 PM, Andrew Cooper wrote:
> >>>>>> On 08/06/15 10:58, Yang Hongyang wrote:
> >>>>>>>
> >>>>>>> On 06/08/2015 05:46 PM, Andrew Cooper wrote:
> >>>>>>>> On 08/06/15 04:43, Yang Hongyang wrote:
> >>>>>>>>> ioreq page contains evtchn which will be set when we resume
> the
> >>>>>>>>> secondary vm the first time. The hypervisor will check if the
> >>>>>>>>> evtchn is corrupted, so we cannot zero the ioreq page more
> >>>>>>>>> than one time.
> >>>>>>>>>
> >>>>>>>>> The ioreq->state is always STATE_IOREQ_NONE after the vm is
> >>>>>>>>> suspended, so it is OK if we only zero it one time.
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> >>>>>>>>> Signed-off-by: Wen congyang <wency@cn.fujitsu.com>
> >>>>>>>>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> >>>>>>>> The issue here is that we are running the restore algorithm over a
> >>>>>>>> domain which has already been running in Xen for a while.  This is
> a
> >>>>>>>> brand new usecase, as far as I am aware.
> >>>>>>> Exactly.
> >>>>>>>
> >>>>>>>> Does the qemu process associated with this domain get frozen
> >>>>>>>> while the
> >>>>>>>> secondary is being reset, or does the process get destroyed and
> >>>>>>>> recreated.
> >>>>>>> What do you mean by reset? do you mean secondary is suspended
> at
> >>>>>>> checkpoint?
> >>>>>> Well - at the point that the buffered records are being processed,
> we
> >>>>>> are in the process of resetting the state of the secondary to match
> >>>>>> the
> >>>>>> primary.
> >>>>> Yes, at this point, the qemu process associated with this domain is
> >>>>> frozen.
> >>>>> the suspend callback will call libxl__qmp_stop(vm_stop() in qemu) to
> >>>>> pause
> >>>>> qemu. After we processed all records, qemu will be restored with the
> >>>>> received
> >>>>> state, that's why we add a libxl__qmp_restore(qemu_load_vmstate()
> in
> >>>>> qemu)
> >>>>> api to restore qemu with received state. Currently in libxl, qemu only
> >>>>> start
> >>>>> with the received state, there's no api to load received state while
> >>>>> qemu is
> >>>>> running for a while.
> >>>> Now I consider this more, it is absolutely wrong to not zero the page
> >>>> here.  The event channel in the page is not guaranteed to be the same
> >>>> between the primary and secondary,
> >>> That's why we don't zero it on secondary.
> >> I think you missed my point.  Apologies for the double negative.   It
> >> must, under all circumstances, be zeroed at this point, for safety reasons.
> >>
> >> The page in question is subject to logdirty just like any other guest
> >> pages, which means that if the guest writes to it naturally (i.e. not a
> >> Xen or Qemu write, both of whom have magic mappings which are not
> >> subject to logdirty), it will be transmitted in the stream.  As the
> >> event channel could be different, the lack of zeroing it at this point
> >> means that the event channel would be wrong as opposed to simply
> >> missing.  This is a worse position to be in.
> > The guest should not access this page. I am not sure if the guest can
> > access the ioreq page.
> 
> "should not" and "can't" are two very different things.  We have had
> XSAs covering the fact that the guest can write to these pages in the past.
> 
> In practice, a guest can't actually query the appropriate hvmparam, but
> it can rely on the fact that the domain builder is incredibly
> predictable in this regard.
> 
> >
> > But in the exceptional case, the ioreq page is dirtied, and is copied to
> > the secondary vm. The ioreq page will contain a wrong event channel, the
> > hypervisor will check it: if the event channel is wrong, the guest will
> > be crashed.
> 
> This is my point.  It is completely legitimate for the event channels to
> be different between the primary and secondary, which means that we
> should be capable of dealing cleanly with the fallout when the bufioreq
> page does appear as dirty update.
> 
> >
> >>>> and we don't want to unexpectedly
> >>>> find a pending/in-flight ioreq.
> >>> ioreq->state is always STATE_IOREQ_NONE after the vm is suspended,
> there
> >>> should be no pending/in-flight ioreq at checkpoint.
> >> In the common case perhaps, but we must consider the exceptional case.
> >> The exceptional case here is some corruption which happens to appear as
> >> an in-flight ioreq.
> > If the state is not STATE_IOREQ_NONE, it may be a hypervisor bug. If the
> hypervisor
> > has a bug, anything can happen. I think we should trust the hypervisor.
> 
> In the worst case, the contents of the pages can be completely
> arbitrary.  Zeroing of the pages is to cover the case where there is
> junk present, so Xen doesn't crash the guest due to a bad ioreq state.
> 
> I think Xen's behaviour is legitimate here.  If it observes wonky ioreq
> state, all bets are off.
> 
> >
> >>>> Either qemu needs to take care of re-initialising the event channels
> >>>> back to appropriate values, or Xen should tolerate the channels
> >>>> disappearing.
> >> I still stand by this statement.  I believe it is the only safe way of
> >> solving the issue you have discovered.
> > Add a new qemu monitor command to update ioreq page?
> 
> Who/what actually complains about the event channel?  I can't see any
> event channels in the ABI for the pages.
> 

QEMU only samples the event channels from the shared ioreq page at startup, as it does not expect them to change during its lifetime.

  Paul

> ~Andrew

^ permalink raw reply	[flat|nested] 106+ messages in thread
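Paul's observation explains why a changed event channel goes unnoticed:
the device model reads `vp_eport` from the shared page once at startup
(in real QEMU this is the per-vcpu bind loop, roughly
`xc_evtchn_bind_interdomain()`; treat that call name as an assumption
here) and never re-reads it. The sketch below models that one-shot
behaviour with a toy device-model struct.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_VCPUS 8

typedef struct { uint32_t vp_eport; } ioreq_t;

typedef struct {
    uint32_t bound_port[MAX_VCPUS]; /* ports sampled once at startup */
    unsigned int nr_vcpus;
} device_model;

/* One-shot sampling: after startup the device model never looks at the
 * shared page's ports again, so a checkpoint that later rewrites the
 * page does not change what the device model is listening on. */
static void dm_startup(device_model *dm, const ioreq_t *shared,
                       unsigned int nr_vcpus)
{
    dm->nr_vcpus = nr_vcpus;
    for (unsigned int i = 0; i < nr_vcpus; i++)
        dm->bound_port[i] = shared[i].vp_eport; /* real code binds an evtchn here */
}
```

This is the crux of the disagreement: the page's ports can legitimately
change across a COLO reset, but nothing in the running device model will
notice.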

* Re: [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
  2015-06-10  9:06                 ` Wen Congyang
  2015-06-10 10:08                   ` Andrew Cooper
@ 2015-06-10 10:40                   ` Paul Durrant
  2015-06-10 10:54                     ` Wen Congyang
  1 sibling, 1 reply; 106+ messages in thread
From: Paul Durrant @ 2015-06-10 10:40 UTC (permalink / raw)
  To: Wen Congyang, Andrew Cooper, Yang Hongyang, xen-devel
  Cc: Wei Liu, Ian Campbell, guijianfeng, yunhong.jiang, Eddie Dong,
	rshriram, Ian Jackson

> -----Original Message-----
> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
> Sent: 10 June 2015 10:06
> To: Andrew Cooper; Yang Hongyang; xen-devel@lists.xen.org; Paul Durrant
> Cc: Wei Liu; Ian Campbell; yunhong.jiang@intel.com; Eddie Dong;
> guijianfeng@cn.fujitsu.com; rshriram@cs.ubc.ca; Ian Jackson
> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq
> page only one time
> 
> Cc: Paul Durrant
> 
> On 06/10/2015 03:44 PM, Andrew Cooper wrote:
> > On 10/06/2015 06:26, Yang Hongyang wrote:
> >>
> >>
> >> On 06/09/2015 03:30 PM, Andrew Cooper wrote:
> >>> On 09/06/2015 01:59, Yang Hongyang wrote:
> >>>>
> >>>>
> >>>> On 06/08/2015 06:15 PM, Andrew Cooper wrote:
> >>>>> On 08/06/15 10:58, Yang Hongyang wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 06/08/2015 05:46 PM, Andrew Cooper wrote:
> >>>>>>> On 08/06/15 04:43, Yang Hongyang wrote:
> >>>>>>>> ioreq page contains evtchn which will be set when we resume the
> >>>>>>>> secondary vm the first time. The hypervisor will check if the
> >>>>>>>> evtchn is corrupted, so we cannot zero the ioreq page more
> >>>>>>>> than one time.
> >>>>>>>>
> >>>>>>>> The ioreq->state is always STATE_IOREQ_NONE after the vm is
> >>>>>>>> suspended, so it is OK if we only zero it one time.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> >>>>>>>> Signed-off-by: Wen congyang <wency@cn.fujitsu.com>
> >>>>>>>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> >>>>>>>
> >>>>>>> The issue here is that we are running the restore algorithm over a
> >>>>>>> domain which has already been running in Xen for a while.  This is a
> >>>>>>> brand new usecase, as far as I am aware.
> >>>>>>
> >>>>>> Exactly.
> >>>>>>
> >>>>>>>
> >>>>>>> Does the qemu process associated with this domain get frozen
> >>>>>>> while the
> >>>>>>> secondary is being reset, or does the process get destroyed and
> >>>>>>> recreated.
> >>>>>>
> >>>>>> What do you mean by reset? do you mean secondary is suspended
> at
> >>>>>> checkpoint?
> >>>>>
> >>>>> Well - at the point that the buffered records are being processed, we
> >>>>> are in the process of resetting the state of the secondary to match
> >>>>> the
> >>>>> primary.
> >>>>
> >>>> Yes, at this point, the qemu process associated with this domain is
> >>>> frozen.
> >>>> the suspend callback will call libxl__qmp_stop(vm_stop() in qemu) to
> >>>> pause
> >>>> qemu. After we processed all records, qemu will be restored with the
> >>>> received
> >>>> state, that's why we add a libxl__qmp_restore(qemu_load_vmstate()
> in
> >>>> qemu)
> >>>> api to restore qemu with received state. Currently in libxl, qemu only
> >>>> start
> >>>> with the received state, there's no api to load received state while
> >>>> qemu is
> >>>> running for a while.
> >>>
> >>> Now I consider this more, it is absolutely wrong to not zero the page
> >>> here.  The event channel in the page is not guaranteed to be the same
> >>> between the primary and secondary,
> >>
> >> That's why we don't zero it on secondary.
> >
> > I think you missed my point.  Apologies for the double negative.   It
> > must, under all circumstances, be zeroed at this point, for safety reasons.
> >
> > The page in question is subject to logdirty just like any other guest
> > pages, which means that if the guest writes to it naturally (i.e. not a
> > Xen or Qemu write, both of whom have magic mappings which are not
> > subject to logdirty), it will be transmitted in the stream.  As the
> > event channel could be different, the lack of zeroing it at this point
> > means that the event channel would be wrong as opposed to simply
> > missing.  This is a worse position to be in.
> 
> The guest should not access this page. I am not sure if the guest can
> access the ioreq page.
> 
> But in the exceptional case, the ioreq page is dirtied, and is copied to
> the secondary vm. The ioreq page will contain a wrong event channel, the
> hypervisor will check it: if the event channel is wrong, the guest will
> be crashed.
> 
> >
> >>
> >>> and we don't want to unexpectedly
> >>> find a pending/in-flight ioreq.
> >>
> >> ioreq->state is always STATE_IOREQ_NONE after the vm is suspended,
> there
> >> should be no pending/in-flight ioreq at checkpoint.
> >
> > In the common case perhaps, but we must consider the exceptional case.
> > The exceptional case here is some corruption which happens to appear as
> > an in-flight ioreq.
> 
> If the state is not STATE_IOREQ_NONE, it may be a hypervisor bug. If the
> hypervisor
> has a bug, anything can happen. I think we should trust the hypervisor.
> 
> >
> >>
> >>>
> >>> Either qemu needs to take care of re-initialising the event channels
> >>> back to appropriate values, or Xen should tolerate the channels
> >>> disappearing.
> >
> > I still stand by this statement.  I believe it is the only safe way of
> > solving the issue you have discovered.
> 
> Add a new qemu monitor command to update ioreq page?
>

If you're attaching to a 'new' VM (i.e. one with an updated image) then I suspect you're going to have to destroy and re-create the ioreq server so that the shared page gets re-populated with the correct event channels. Either that or you're going to have to ensure that the page is not part of the restored image and sample the new one that Xen should have set up.

  Paul 

 
> Thanks
> Wen Congyang
> 
> >
> > ~Andrew
> > .
> >

^ permalink raw reply	[flat|nested] 106+ messages in thread
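Paul's suggested fix maps onto the ioreq-server calls libxenctrl exposes
(approximately `xc_hvm_destroy_ioreq_server()`,
`xc_hvm_create_ioreq_server()` and `xc_hvm_get_ioreq_server_info()`; the
exact names and signatures are an assumption in this sketch). The model
below captures only the effect that matters for COLO: destroy-and-recreate
makes Xen repopulate the shared page with freshly allocated event
channels, so stale ports from the restored image can never survive.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t vp_eport; } ioreq_t;

/* Stand-in for Xen's event-channel allocator: each allocation yields a
 * port that has never been handed out before. */
static uint32_t next_port = 100;

/* Modelled effect of destroying and re-creating the ioreq server: the
 * shared page is repopulated with ports Xen just allocated, replacing
 * whatever the restored image happened to contain. */
static void recreate_ioreq_server(ioreq_t *shared, unsigned int nr_vcpus)
{
    for (unsigned int i = 0; i < nr_vcpus; i++)
        shared[i].vp_eport = next_port++;
}
```

Note the design choice this implies: the device model must then re-sample
the page after re-creation, which is why Wen proposes doing it from a
monitor command at the point qemu's state is reloaded.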

* Re: [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
  2015-06-10 10:40                   ` Paul Durrant
@ 2015-06-10 10:54                     ` Wen Congyang
  2015-06-10 10:58                       ` Paul Durrant
  0 siblings, 1 reply; 106+ messages in thread
From: Wen Congyang @ 2015-06-10 10:54 UTC (permalink / raw)
  To: Paul Durrant, Andrew Cooper, Yang Hongyang, xen-devel
  Cc: Wei Liu, Ian Campbell, guijianfeng, yunhong.jiang, Eddie Dong,
	rshriram, Ian Jackson

On 06/10/2015 06:40 PM, Paul Durrant wrote:
>> -----Original Message-----
>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
>> Sent: 10 June 2015 10:06
>> To: Andrew Cooper; Yang Hongyang; xen-devel@lists.xen.org; Paul Durrant
>> Cc: Wei Liu; Ian Campbell; yunhong.jiang@intel.com; Eddie Dong;
>> guijianfeng@cn.fujitsu.com; rshriram@cs.ubc.ca; Ian Jackson
>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq
>> page only one time
>>
>> Cc: Paul Durrant
>>
>> On 06/10/2015 03:44 PM, Andrew Cooper wrote:
>>> On 10/06/2015 06:26, Yang Hongyang wrote:
>>>>
>>>>
>>>> On 06/09/2015 03:30 PM, Andrew Cooper wrote:
>>>>> On 09/06/2015 01:59, Yang Hongyang wrote:
>>>>>>
>>>>>>
>>>>>> On 06/08/2015 06:15 PM, Andrew Cooper wrote:
>>>>>>> On 08/06/15 10:58, Yang Hongyang wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 06/08/2015 05:46 PM, Andrew Cooper wrote:
>>>>>>>>> On 08/06/15 04:43, Yang Hongyang wrote:
>>>>>>>>>> ioreq page contains evtchn which will be set when we resume the
>>>>>>>>>> secondary vm the first time. The hypervisor will check if the
>>>>>>>>>> evtchn is corrupted, so we cannot zero the ioreq page more
>>>>>>>>>> than one time.
>>>>>>>>>>
>>>>>>>>>> The ioreq->state is always STATE_IOREQ_NONE after the vm is
>>>>>>>>>> suspended, so it is OK if we only zero it one time.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>>>>>>>>>> Signed-off-by: Wen congyang <wency@cn.fujitsu.com>
>>>>>>>>>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>>>>>>
>>>>>>>>> The issue here is that we are running the restore algorithm over a
>>>>>>>>> domain which has already been running in Xen for a while.  This is a
>>>>>>>>> brand new usecase, as far as I am aware.
>>>>>>>>
>>>>>>>> Exactly.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Does the qemu process associated with this domain get frozen
>>>>>>>>> while the
>>>>>>>>> secondary is being reset, or does the process get destroyed and
>>>>>>>>> recreated.
>>>>>>>>
>>>>>>>> What do you mean by reset? do you mean secondary is suspended
>> at
>>>>>>>> checkpoint?
>>>>>>>
>>>>>>> Well - at the point that the buffered records are being processed, we
>>>>>>> are in the process of resetting the state of the secondary to match
>>>>>>> the
>>>>>>> primary.
>>>>>>
>>>>>> Yes, at this point, the qemu process associated with this domain is
>>>>>> frozen.
>>>>>> the suspend callback will call libxl__qmp_stop(vm_stop() in qemu) to
>>>>>> pause
>>>>>> qemu. After we processed all records, qemu will be restored with the
>>>>>> received
>>>>>> state, that's why we add a libxl__qmp_restore(qemu_load_vmstate()
>> in
>>>>>> qemu)
>>>>>> api to restore qemu with received state. Currently in libxl, qemu only
>>>>>> start
>>>>>> with the received state, there's no api to load received state while
>>>>>> qemu is
>>>>>> running for a while.
>>>>>
>>>>> Now I consider this more, it is absolutely wrong to not zero the page
>>>>> here.  The event channel in the page is not guaranteed to be the same
>>>>> between the primary and secondary,
>>>>
>>>> That's why we don't zero it on secondary.
>>>
>>> I think you missed my point.  Apologies for the double negative.   It
>>> must, under all circumstances, be zeroed at this point, for safety reasons.
>>>
>>> The page in question is subject to logdirty just like any other guest
>>> pages, which means that if the guest writes to it naturally (i.e. not a
>>> Xen or Qemu write, both of whom have magic mappings which are not
>>> subject to logdirty), it will be transmitted in the stream.  As the
>>> event channel could be different, the lack of zeroing it at this point
>>> means that the event channel would be wrong as opposed to simply
>>> missing.  This is a worse position to be in.
>>
>> The guest should not access this page. I am not sure if the guest can
>> access the ioreq page.
>>
>> But in the exceptional case, the ioreq page is dirtied, and is copied to
>> the secondary vm. The ioreq page will contain a wrong event channel, the
>> hypervisor will check it: if the event channel is wrong, the guest will
>> be crashed.
>>
>>>
>>>>
>>>>> and we don't want to unexpectedly
>>>>> find a pending/in-flight ioreq.
>>>>
>>>> ioreq->state is always STATE_IOREQ_NONE after the vm is suspended,
>> there
>>>> should be no pending/in-flight ioreq at checkpoint.
>>>
>>> In the common case perhaps, but we must consider the exceptional case.
>>> The exceptional case here is some corruption which happens to appear as
>>> an in-flight ioreq.
>>
>> If the state is not STATE_IOREQ_NONE, it may be a hypervisor bug. If the
>> hypervisor
>> has a bug, anything can happen. I think we should trust the hypervisor.
>>
>>>
>>>>
>>>>>
>>>>> Either qemu needs to take care of re-initialising the event channels
>>>>> back to appropriate values, or Xen should tolerate the channels
>>>>> disappearing.
>>>
>>> I still stand by this statement.  I believe it is the only safe way of
>>> solving the issue you have discovered.
>>
>> Add a new qemu monitor command to update ioreq page?
>>
> 
> If you're attaching to a 'new' VM (i.e. one with an updated image) then I suspect you're going to have to destroy and re-create the ioreq server so that the shared page gets re-populated with the correct event channels. Either that or you're going to have to ensure that the page is not part of the restored image and sample the new one that Xen should have set up.


I agree. I will try to add a new qemu monitor command (or do it when
updating qemu's state) to destroy and re-create the ioreq server.

Thanks
Wen Congyang

> 
>   Paul 
> 
>  
>> Thanks
>> Wen Congyang
>>
>>>
>>> ~Andrew
>>> .
>>>
> 
> .
> 

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
  2015-06-10 10:54                     ` Wen Congyang
@ 2015-06-10 10:58                       ` Paul Durrant
  2015-06-10 11:37                         ` Wen Congyang
  0 siblings, 1 reply; 106+ messages in thread
From: Paul Durrant @ 2015-06-10 10:58 UTC (permalink / raw)
  To: Wen Congyang, Andrew Cooper, Yang Hongyang, xen-devel
  Cc: Wei Liu, Ian Campbell, guijianfeng, yunhong.jiang, Eddie Dong,
	rshriram, Ian Jackson

> -----Original Message-----
> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
> Sent: 10 June 2015 11:55
> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-devel@lists.xen.org
> Cc: Wei Liu; Ian Campbell; yunhong.jiang@intel.com; Eddie Dong;
> guijianfeng@cn.fujitsu.com; rshriram@cs.ubc.ca; Ian Jackson
> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq
> page only one time
> 
> On 06/10/2015 06:40 PM, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
> >> Sent: 10 June 2015 10:06
> >> To: Andrew Cooper; Yang Hongyang; xen-devel@lists.xen.org; Paul
> Durrant
> >> Cc: Wei Liu; Ian Campbell; yunhong.jiang@intel.com; Eddie Dong;
> >> guijianfeng@cn.fujitsu.com; rshriram@cs.ubc.ca; Ian Jackson
> >> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
> ioreq
> >> page only one time
> >>
> >> Cc: Paul Durrant
> >>
> >> On 06/10/2015 03:44 PM, Andrew Cooper wrote:
> >>> On 10/06/2015 06:26, Yang Hongyang wrote:
> >>>>
> >>>>
> >>>> On 06/09/2015 03:30 PM, Andrew Cooper wrote:
> >>>>> On 09/06/2015 01:59, Yang Hongyang wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 06/08/2015 06:15 PM, Andrew Cooper wrote:
> >>>>>>> On 08/06/15 10:58, Yang Hongyang wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 06/08/2015 05:46 PM, Andrew Cooper wrote:
> >>>>>>>>> On 08/06/15 04:43, Yang Hongyang wrote:
> >>>>>>>>>> ioreq page contains evtchn which will be set when we resume
> the
> >>>>>>>>>> secondary vm the first time. The hypervisor will check if the
> >>>>>>>>>> evtchn is corrupted, so we cannot zero the ioreq page more
> >>>>>>>>>> than one time.
> >>>>>>>>>>
> >>>>>>>>>> The ioreq->state is always STATE_IOREQ_NONE after the vm is
> >>>>>>>>>> suspended, so it is OK if we only zero it one time.
> >>>>>>>>>>
> >>>>>>>>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> >>>>>>>>>> Signed-off-by: Wen congyang <wency@cn.fujitsu.com>
> >>>>>>>>>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> >>>>>>>>>
> >>>>>>>>> The issue here is that we are running the restore algorithm over
> a
> >>>>>>>>> domain which has already been running in Xen for a while.  This
> is a
> >>>>>>>>> brand new usecase, as far as I am aware.
> >>>>>>>>
> >>>>>>>> Exactly.
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Does the qemu process associated with this domain get frozen
> >>>>>>>>> while the
> >>>>>>>>> secondary is being reset, or does the process get destroyed and
> >>>>>>>>> recreated.
> >>>>>>>>
> >>>>>>>> What do you mean by reset? do you mean secondary is
> suspended
> >> at
> >>>>>>>> checkpoint?
> >>>>>>>
> >>>>>>> Well - at the point that the buffered records are being processed,
> we
> >>>>>>> are in the process of resetting the state of the secondary to match
> >>>>>>> the
> >>>>>>> primary.
> >>>>>>
> >>>>>> Yes, at this point, the qemu process associated with this domain is
> >>>>>> frozen. The suspend callback will call libxl__qmp_stop() (vm_stop() in
> >>>>>> qemu) to pause qemu. After we have processed all records, qemu will be
> >>>>>> restored with the received state; that's why we add a
> >>>>>> libxl__qmp_restore() (qemu_load_vmstate() in qemu) api to restore qemu
> >>>>>> with the received state. Currently in libxl, qemu can only start with
> >>>>>> the received state; there is no api to load received state while qemu
> >>>>>> has been running for a while.
> >>>>>
> >>>>> Now I consider this more, it is absolutely wrong to not zero the page
> >>>>> here.  The event channel in the page is not guaranteed to be the same
> >>>>> between the primary and secondary,
> >>>>
> >>>> That's why we don't zero it on secondary.
> >>>
> >>> I think you missed my point.  Apologies for the double negative.   It
> >>> must, under all circumstances, be zeroed at this point, for safety reasons.
> >>>
> >>> The page in question is subject to logdirty just like any other guest
> >>> pages, which means that if the guest writes to it naturally (i.e. not a
> >>> Xen or Qemu write, both of whom have magic mappings which are not
> >>> subject to logdirty), it will be transmitted in the stream.  As the
> >>> event channel could be different, the lack of zeroing it at this point
> >>> means that the event channel would be wrong as opposed to simply
> >>> missing.  This is a worse position to be in.
> >>
> >> The guest should not access this page. I am not sure if the guest can
> >> access the ioreq page.
> >>
> >> But in the exceptional case, the ioreq page is dirtied, and is copied to
> >> the secondary vm. The ioreq page will contain a wrong event channel, the
> >> hypervisor will check it: if the event channel is wrong, the guest will
> >> be crashed.
> >>
> >>>
> >>>>
> >>>>> and we don't want to unexpectedly
> >>>>> find a pending/in-flight ioreq.
> >>>>
> >>>> ioreq->state is always STATE_IOREQ_NONE after the vm is suspended,
> >>>> there should be no pending/in-flight ioreq at checkpoint.
> >>>
> >>> In the common case perhaps, but we must consider the exceptional case.
> >>> The exceptional case here is some corruption which happens to appear as
> >>> an in-flight ioreq.
> >>
> >> If the state is not STATE_IOREQ_NONE, it may be a hypervisor bug. If the
> >> hypervisor has a bug, anything can happen. I think we should trust the
> >> hypervisor.
> >>
> >>>
> >>>>
> >>>>>
> >>>>> Either qemu needs to take care of re-initialising the event channels
> >>>>> back to appropriate values, or Xen should tolerate the channels
> >>>>> disappearing.
> >>>
> >>> I still stand by this statement.  I believe it is the only safe way of
> >>> solving the issue you have discovered.
> >>
> >> Add a new qemu monitor command to update ioreq page?
> >>
> >
> > If you're attaching to a 'new' VM (i.e. one with an updated image) then I
> > suspect you're going to have to destroy and re-create the ioreq server so
> > that the shared page gets re-populated with the correct event channels.
> > Either that or you're going to have to ensure that the page is not part of
> > the restored image and sample the new one that Xen should have set up.
> 
> 
> I agree with it. I will try to add a new qemu monitor command (or do it
> when updating qemu's state) to destroy and re-create it.

The slightly tricky part of that is that you're going to have to cache and replay all the registrations that were done on the old instance, but you need to do that in any case as it's not state that is transferred in the VM save record.

  Paul

> 
> Thanks
> Wen Congyang
> 
> >
> >   Paul
> >
> >
> >> Thanks
> >> Wen Congyang
> >>
> >>>
> >>> ~Andrew
> >>> .
> >>>
> >
> > .
> >
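The exchange above turns on two facts: the ioreq page must be wiped for safety (stale contents could look like an in-flight request), while the event channel in it is only meaningful on the local host and must come from the freshly created ioreq server, never from the migration stream. A toy model of that rule; the struct below is a simplified stand-in, not Xen's real `ioreq_t` layout:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { STATE_IOREQ_NONE = 0, STATE_IOREQ_READY = 1 };

/* Simplified stand-in for Xen's ioreq_t; only the fields under discussion. */
struct ioreq {
    uint64_t addr;      /* possibly stale in-flight request data */
    uint8_t  state;     /* STATE_IOREQ_* */
    uint32_t vp_eport;  /* event channel port: only valid locally */
};

/*
 * On reset of the secondary: wipe whatever the stream (or corruption)
 * left behind, then take the event channel from the locally re-created
 * ioreq server instead of trusting any transmitted value.
 */
static void reset_ioreq(struct ioreq *io, uint32_t local_eport)
{
    memset(io, 0, sizeof(*io));   /* state becomes STATE_IOREQ_NONE */
    io->vp_eport = local_eport;   /* re-sampled, never streamed */
}
```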

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
  2015-06-10 10:58                       ` Paul Durrant
@ 2015-06-10 11:37                         ` Wen Congyang
  2015-06-10 11:47                           ` Paul Durrant
  0 siblings, 1 reply; 106+ messages in thread
From: Wen Congyang @ 2015-06-10 11:37 UTC (permalink / raw)
  To: Paul Durrant, Andrew Cooper, Yang Hongyang, xen-devel
  Cc: Wei Liu, Ian Campbell, guijianfeng, yunhong.jiang, Eddie Dong,
	rshriram, Ian Jackson

On 06/10/2015 06:58 PM, Paul Durrant wrote:
>> -----Original Message-----
>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
>> Sent: 10 June 2015 11:55
>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-devel@lists.xen.org
>> Cc: Wei Liu; Ian Campbell; yunhong.jiang@intel.com; Eddie Dong;
>> guijianfeng@cn.fujitsu.com; rshriram@cs.ubc.ca; Ian Jackson
>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq
>> page only one time
>>
>> On 06/10/2015 06:40 PM, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
>>>> Sent: 10 June 2015 10:06
>>>> To: Andrew Cooper; Yang Hongyang; xen-devel@lists.xen.org; Paul
>> Durrant
>>>> Cc: Wei Liu; Ian Campbell; yunhong.jiang@intel.com; Eddie Dong;
>>>> guijianfeng@cn.fujitsu.com; rshriram@cs.ubc.ca; Ian Jackson
>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
>> ioreq
>>>> page only one time
>>>>
>>>> Cc: Paul Durrant
>>>>
>>>> On 06/10/2015 03:44 PM, Andrew Cooper wrote:
>>>>> On 10/06/2015 06:26, Yang Hongyang wrote:
>>>>>>
>>>>>>
>>>>>> On 06/09/2015 03:30 PM, Andrew Cooper wrote:
>>>>>>> On 09/06/2015 01:59, Yang Hongyang wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 06/08/2015 06:15 PM, Andrew Cooper wrote:
>>>>>>>>> On 08/06/15 10:58, Yang Hongyang wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 06/08/2015 05:46 PM, Andrew Cooper wrote:
>>>>>>>>>>> On 08/06/15 04:43, Yang Hongyang wrote:
>>>>>>>>>>>> ioreq page contains evtchn which will be set when we resume
>> the
>>>>>>>>>>>> secondary vm the first time. The hypervisor will check if the
>>>>>>>>>>>> evtchn is corrupted, so we cannot zero the ioreq page more
>>>>>>>>>>>> than one time.
>>>>>>>>>>>>
>>>>>>>>>>>> The ioreq->state is always STATE_IOREQ_NONE after the vm is
>>>>>>>>>>>> suspended, so it is OK if we only zero it one time.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>>>>>>>>>>>> Signed-off-by: Wen congyang <wency@cn.fujitsu.com>
>>>>>>>>>>>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>>>>>>>>
>>>>>>>>>>> The issue here is that we are running the restore algorithm over
>> a
>>>>>>>>>>> domain which has already been running in Xen for a while.  This
>> is a
>>>>>>>>>>> brand new usecase, as far as I am aware.
>>>>>>>>>>
>>>>>>>>>> Exactly.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Does the qemu process associated with this domain get frozen
>>>>>>>>>>> while the
>>>>>>>>>>> secondary is being reset, or does the process get destroyed and
>>>>>>>>>>> recreated.
>>>>>>>>>>
>>>>>>>>>> What do you mean by reset? do you mean secondary is
>> suspended
>>>> at
>>>>>>>>>> checkpoint?
>>>>>>>>>
>>>>>>>>> Well - at the point that the buffered records are being processed,
>> we
>>>>>>>>> are in the process of resetting the state of the secondary to match
>>>>>>>>> the
>>>>>>>>> primary.
>>>>>>>>
>>>>>>>> Yes, at this point, the qemu process associated with this domain is
>>>>>>>> frozen.
>>>>>>>> the suspend callback will call libxl__qmp_stop(vm_stop() in qemu)
>> to
>>>>>>>> pause
>>>>>>>> qemu. After we processed all records, qemu will be restored with
>> the
>>>>>>>> received
>>>>>>>> state, that's why we add a
>> libxl__qmp_restore(qemu_load_vmstate()
>>>> in
>>>>>>>> qemu)
>>>>>>>> api to restore qemu with received state. Currently in libxl, qemu only
>>>>>>>> start
>>>>>>>> with the received state, there's no api to load received state while
>>>>>>>> qemu is
>>>>>>>> running for a while.
>>>>>>>
>>>>>>> Now I consider this more, it is absolutely wrong to not zero the page
>>>>>>> here.  The event channel in the page is not guaranteed to be the
>> same
>>>>>>> between the primary and secondary,
>>>>>>
>>>>>> That's why we don't zero it on secondary.
>>>>>
>>>>> I think you missed my point.  Apologies for the double negative.   It
>>>>> must, under all circumstances, be zeroed at this point, for safety
>> reasons.
>>>>>
>>>>> The page in question is subject to logdirty just like any other guest
>>>>> pages, which means that if the guest writes to it naturally (i.e. not a
>>>>> Xen or Qemu write, both of whom have magic mappings which are not
>>>>> subject to logdirty), it will be transmitted in the stream.  As the
>>>>> event channel could be different, the lack of zeroing it at this point
>>>>> means that the event channel would be wrong as opposed to simply
>>>>> missing.  This is a worse position to be in.
>>>>
>>>> The guest should not access this page. I am not sure if the guest can
>>>> access the ioreq page.
>>>>
>>>> But in the exceptional case, the ioreq page is dirtied, and is copied to
>>>> the secondary vm. The ioreq page will contain a wrong event channel, the
>>>> hypervisor will check it: if the event channel is wrong, the guest will
>>>> be crashed.
>>>>
>>>>>
>>>>>>
>>>>>>> and we don't want to unexpectedly
>>>>>>> find a pending/in-flight ioreq.
>>>>>>
>>>>>> ioreq->state is always STATE_IOREQ_NONE after the vm is suspended,
>>>> there
>>>>>> should be no pending/in-flight ioreq at checkpoint.
>>>>>
>>>>> In the common case perhaps, but we must consider the exceptional
>> case.
>>>>> The exceptional case here is some corruption which happens to appear
>> as
>>>>> an in-flight ioreq.
>>>>
>>>> If the state is STATE_IOREQ_NONE, it may be hypervisor's bug. If the
>>>> hypervisor
>>>> has a bug, anything can happen. I think we should trust the hypervisor.
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Either qemu needs to take care of re-initialising the event channels
>>>>>>> back to appropriate values, or Xen should tolerate the channels
>>>>>>> disappearing.
>>>>>
>>>>> I still stand by this statement.  I believe it is the only safe way of
>>>>> solving the issue you have discovered.
>>>>
>>>> Add a new qemu monitor command to update ioreq page?
>>>>
>>>
>>> If you're attaching to a 'new' VM (i.e one with an updated image) then I
>> suspect you're going to have to destroy and re-create the ioreq server so
>> that the shared page gets re-populated with the correct event channels.
>> Either that or you're going to have to ensure that the page is not part of
>> restored image and sample the new one that Xen should have set up.
>>
>>
>> I agree with it. I will try to add a new qemu monitor command(or do it when
>> updating qemu's state) to destroy and re-create it.
> 
> The slightly tricky part of that is that you're going to have to cache and replay all the registrations that were done on the old instance, but you need to do that in any case as it's not state that is transferred in the VM save record.

Why do we have to cache and replay all the registrations that were done on the old instance?
We will set the guest to a new state; the old state should be dropped.

Thanks
Wen Congyang

> 
>   Paul
> 
>>
>> Thanks
>> Wen Congyang
>>
>>>
>>>   Paul
>>>
>>>
>>>> Thanks
>>>> Wen Congyang
>>>>
>>>>>
>>>>> ~Andrew
>>>>> .
>>>>>
>>>
>>> .
>>>
> 
> .
> 

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
  2015-06-10 11:37                         ` Wen Congyang
@ 2015-06-10 11:47                           ` Paul Durrant
  2015-06-11  1:13                             ` Wen Congyang
  0 siblings, 1 reply; 106+ messages in thread
From: Paul Durrant @ 2015-06-10 11:47 UTC (permalink / raw)
  To: Wen Congyang, Andrew Cooper, Yang Hongyang, xen-devel
  Cc: Wei Liu, Ian Campbell, guijianfeng, yunhong.jiang, Eddie Dong,
	rshriram, Ian Jackson

> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
> bounces@lists.xen.org] On Behalf Of Wen Congyang
> Sent: 10 June 2015 12:38
> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-devel@lists.xen.org
> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian Jackson
> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq
> page only one time
> 
> On 06/10/2015 06:58 PM, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
> >> Sent: 10 June 2015 11:55
> >> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-
> devel@lists.xen.org
> >> Cc: Wei Liu; Ian Campbell; yunhong.jiang@intel.com; Eddie Dong;
> >> guijianfeng@cn.fujitsu.com; rshriram@cs.ubc.ca; Ian Jackson
> >> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
> ioreq
> >> page only one time
> >>
> >> On 06/10/2015 06:40 PM, Paul Durrant wrote:
> >>>> -----Original Message-----
> >>>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
> >>>> Sent: 10 June 2015 10:06
> >>>> To: Andrew Cooper; Yang Hongyang; xen-devel@lists.xen.org; Paul
> >> Durrant
> >>>> Cc: Wei Liu; Ian Campbell; yunhong.jiang@intel.com; Eddie Dong;
> >>>> guijianfeng@cn.fujitsu.com; rshriram@cs.ubc.ca; Ian Jackson
> >>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
> >> ioreq
> >>>> page only one time
> >>>>
> >>>> Cc: Paul Durrant
> >>>>
> >>>> On 06/10/2015 03:44 PM, Andrew Cooper wrote:
> >>>>> On 10/06/2015 06:26, Yang Hongyang wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 06/09/2015 03:30 PM, Andrew Cooper wrote:
> >>>>>>> On 09/06/2015 01:59, Yang Hongyang wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 06/08/2015 06:15 PM, Andrew Cooper wrote:
> >>>>>>>>> On 08/06/15 10:58, Yang Hongyang wrote:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 06/08/2015 05:46 PM, Andrew Cooper wrote:
> >>>>>>>>>>> On 08/06/15 04:43, Yang Hongyang wrote:
> >>>>>>>>>>>> ioreq page contains evtchn which will be set when we
> resume
> >> the
> >>>>>>>>>>>> secondary vm the first time. The hypervisor will check if the
> >>>>>>>>>>>> evtchn is corrupted, so we cannot zero the ioreq page more
> >>>>>>>>>>>> than one time.
> >>>>>>>>>>>>
> >>>>>>>>>>>> The ioreq->state is always STATE_IOREQ_NONE after the vm
> is
> >>>>>>>>>>>> suspended, so it is OK if we only zero it one time.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> >>>>>>>>>>>> Signed-off-by: Wen congyang <wency@cn.fujitsu.com>
> >>>>>>>>>>>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> >>>>>>>>>>>
> >>>>>>>>>>> The issue here is that we are running the restore algorithm
> over
> >> a
> >>>>>>>>>>> domain which has already been running in Xen for a while.
> This
> >> is a
> >>>>>>>>>>> brand new usecase, as far as I am aware.
> >>>>>>>>>>
> >>>>>>>>>> Exactly.
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Does the qemu process associated with this domain get
> frozen
> >>>>>>>>>>> while the
> >>>>>>>>>>> secondary is being reset, or does the process get destroyed
> and
> >>>>>>>>>>> recreated.
> >>>>>>>>>>
> >>>>>>>>>> What do you mean by reset? do you mean secondary is
> >> suspended
> >>>> at
> >>>>>>>>>> checkpoint?
> >>>>>>>>>
> >>>>>>>>> Well - at the point that the buffered records are being
> processed,
> >> we
> >>>>>>>>> are in the process of resetting the state of the secondary to
> match
> >>>>>>>>> the
> >>>>>>>>> primary.
> >>>>>>>>
> >>>>>>>> Yes, at this point, the qemu process associated with this domain is
> >>>>>>>> frozen.
> >>>>>>>> the suspend callback will call libxl__qmp_stop(vm_stop() in
> qemu)
> >> to
> >>>>>>>> pause
> >>>>>>>> qemu. After we processed all records, qemu will be restored with
> >> the
> >>>>>>>> received
> >>>>>>>> state, that's why we add a
> >> libxl__qmp_restore(qemu_load_vmstate()
> >>>> in
> >>>>>>>> qemu)
> >>>>>>>> api to restore qemu with received state. Currently in libxl, qemu
> only
> >>>>>>>> start
> >>>>>>>> with the received state, there's no api to load received state while
> >>>>>>>> qemu is
> >>>>>>>> running for a while.
> >>>>>>>
> >>>>>>> Now I consider this more, it is absolutely wrong to not zero the
> page
> >>>>>>> here.  The event channel in the page is not guaranteed to be the
> >> same
> >>>>>>> between the primary and secondary,
> >>>>>>
> >>>>>> That's why we don't zero it on secondary.
> >>>>>
> >>>>> I think you missed my point.  Apologies for the double negative.   It
> >>>>> must, under all circumstances, be zeroed at this point, for safety
> >> reasons.
> >>>>>
> >>>>> The page in question is subject to logdirty just like any other guest
> >>>>> pages, which means that if the guest writes to it naturally (i.e. not a
> >>>>> Xen or Qemu write, both of whom have magic mappings which are
> not
> >>>>> subject to logdirty), it will be transmitted in the stream.  As the
> >>>>> event channel could be different, the lack of zeroing it at this point
> >>>>> means that the event channel would be wrong as opposed to simply
> >>>>> missing.  This is a worse position to be in.
> >>>>
> >>>> The guest should not access this page. I am not sure if the guest can
> >>>> access the ioreq page.
> >>>>
> >>>> But in the exceptional case, the ioreq page is dirtied, and is copied to
> >>>> the secondary vm. The ioreq page will contain a wrong event channel,
> the
> >>>> hypervisor will check it: if the event channel is wrong, the guest will
> >>>> be crashed.
> >>>>
> >>>>>
> >>>>>>
> >>>>>>> and we don't want to unexpectedly
> >>>>>>> find a pending/in-flight ioreq.
> >>>>>>
> >>>>>> ioreq->state is always STATE_IOREQ_NONE after the vm is
> suspended,
> >>>> there
> >>>>>> should be no pending/in-flight ioreq at checkpoint.
> >>>>>
> >>>>> In the common case perhaps, but we must consider the exceptional
> >> case.
> >>>>> The exceptional case here is some corruption which happens to
> appear
> >> as
> >>>>> an in-flight ioreq.
> >>>>
> >>>> If the state is STATE_IOREQ_NONE, it may be hypervisor's bug. If the
> >>>> hypervisor
> >>>> has a bug, anything can happen. I think we should trust the hypervisor.
> >>>>
> >>>>>
> >>>>>>
> >>>>>>>
> >>>>>>> Either qemu needs to take care of re-initialising the event channels
> >>>>>>> back to appropriate values, or Xen should tolerate the channels
> >>>>>>> disappearing.
> >>>>>
> >>>>> I still stand by this statement.  I believe it is the only safe way of
> >>>>> solving the issue you have discovered.
> >>>>
> >>>> Add a new qemu monitor command to update ioreq page?
> >>>>
> >>>
> >>> If you're attaching to a 'new' VM (i.e one with an updated image) then I
> >> suspect you're going to have to destroy and re-create the ioreq server so
> >> that the shared page gets re-populated with the correct event channels.
> >> Either that or you're going to have to ensure that the page is not part of
> >> restored image and sample the new one that Xen should have set up.
> >>
> >>
> >> I agree with it. I will try to add a new qemu monitor command(or do it
> when
> >> updating qemu's state) to destroy and re-create it.
> >
> > The slightly tricky part of that is that you're going to have to cache and
> replay all the registrations that were done on the old instance, but you need
> to do that in any case as it's not state that is transferred in the VM save
> record.
> 
> Why do we have to cache and replay all the registrations that were done on
> the old instance?

Do you not have device models that need to continue to function? When the ioreq server is torn down, all MMIO, port IO and PCI config ranges that were mapped to it will disappear.

  Paul

> We will set to the guest to a new state, the old state should be dropped.
> 
> Thanks
> Wen Congyang
> 
> >
> >   Paul
> >
> >>
> >> Thanks
> >> Wen Congyang
> >>
> >>>
> >>>   Paul
> >>>
> >>>
> >>>> Thanks
> >>>> Wen Congyang
> >>>>
> >>>>>
> >>>>> ~Andrew
> >>>>> .
> >>>>>
> >>>
> >>> .
> >>>
> >
> > .
> >
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
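Paul's point above is that the range registrations (MMIO, port IO, PCI config) are not part of the VM save record, so whoever re-creates the ioreq server must have logged them against the old instance and replay them against the new one. A rough sketch of that bookkeeping; the structures and names are hypothetical, not qemu's or Xen's actual interfaces:

```c
#include <assert.h>

#define MAX_RANGES 16

struct range { unsigned long start, end; };

/* Log of every range registration made against the old ioreq server. */
struct range_log {
    struct range r[MAX_RANGES];
    int n;
};

/* Record a registration at the time it is made against the live server. */
static int log_map_range(struct range_log *log,
                         unsigned long start, unsigned long end)
{
    if (log->n >= MAX_RANGES)
        return -1;
    log->r[log->n++] = (struct range){ start, end };
    return 0;
}

/* After destroy/re-create, replay all logged ranges on the new server. */
static int replay_ranges(const struct range_log *log,
                         int (*map)(unsigned long, unsigned long))
{
    for (int i = 0; i < log->n; i++)
        if (map(log->r[i].start, log->r[i].end))
            return -1;
    return 0;
}

/* Stand-in for the new server's mapping call, counting for the example. */
static int mapped;
static int fake_map(unsigned long start, unsigned long end)
{
    (void)start; (void)end;
    mapped++;
    return 0;
}
```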

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 01/13] libxc/restore: fix error handle of process_record
  2015-06-08  3:43 ` [PATCH v2 COLOPre 01/13] libxc/restore: fix error handle of process_record Yang Hongyang
  2015-06-08  9:24   ` Andrew Cooper
@ 2015-06-10 14:55   ` Ian Campbell
  2015-06-11  2:10     ` Yang Hongyang
  1 sibling, 1 reply; 106+ messages in thread
From: Ian Campbell @ 2015-06-10 14:55 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: wei.liu2, eddie.dong, wency, andrew.cooper3, yunhong.jiang,
	ian.jackson, xen-devel, guijianfeng, rshriram

On Mon, 2015-06-08 at 11:43 +0800, Yang Hongyang wrote:
> If err is RECORD_NOT_PROCESSED and it is an optional record, restore
> will still fail. This patch fixes that.

Whichever approach you take to fixing this, please say _how_ the change
fixes it; it's not at all clear why moving this code should matter.

And if there is an ulterior motive behind the move, please say that too.

> 
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> CC: Ian Campbell <Ian.Campbell@citrix.com>
> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>  tools/libxc/xc_sr_restore.c | 28 ++++++++++++++--------------
>  1 file changed, 14 insertions(+), 14 deletions(-)
> 
> diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
> index 9e27dba..2d2edd3 100644
> --- a/tools/libxc/xc_sr_restore.c
> +++ b/tools/libxc/xc_sr_restore.c
> @@ -560,19 +560,6 @@ static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec)
>      free(rec->data);
>      rec->data = NULL;
>  
> -    if ( rc == RECORD_NOT_PROCESSED )
> -    {
> -        if ( rec->type & REC_TYPE_OPTIONAL )
> -            DPRINTF("Ignoring optional record %#x (%s)",
> -                    rec->type, rec_type_to_str(rec->type));
> -        else
> -        {
> -            ERROR("Mandatory record %#x (%s) not handled",
> -                  rec->type, rec_type_to_str(rec->type));
> -            rc = -1;
> -        }
> -    }
> -
>      return rc;
>  }
>  
> @@ -678,7 +665,20 @@ static int restore(struct xc_sr_context *ctx)
>          else
>          {
>              rc = process_record(ctx, &rec);
> -            if ( rc )
> +            if ( rc == RECORD_NOT_PROCESSED )
> +            {
> +                if ( rec.type & REC_TYPE_OPTIONAL )
> +                    DPRINTF("Ignoring optional record %#x (%s)",
> +                            rec.type, rec_type_to_str(rec.type));
> +                else
> +                {
> +                    ERROR("Mandatory record %#x (%s) not handled",
> +                          rec.type, rec_type_to_str(rec.type));
> +                    rc = -1;
> +                    goto err;
> +                }
> +            }
> +            else if ( rc )
>                  goto err;
>          }
>  
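The bug is visible in the removed hunk: on an optional record the old process_record() logged "Ignoring optional record" but still returned RECORD_NOT_PROCESSED, so the caller's `if ( rc ) goto err` failed the restore anyway. With the check moved into restore(), optional records fall through. A minimal model of the new control flow (the constant values here are illustrative, and the handler is mocked):

```c
#include <assert.h>

#define RECORD_NOT_PROCESSED 1
#define REC_TYPE_OPTIONAL    (1U << 31)

/* Mock handler: pretend no code recognised this record type. */
static int process_record(unsigned type)
{
    (void)type;
    return RECORD_NOT_PROCESSED;
}

/* Caller-side policy, as the patch moves it into restore(): an
 * unprocessed record is fatal only when it is mandatory. */
static int restore_one(unsigned type)
{
    int rc = process_record(type);

    if (rc == RECORD_NOT_PROCESSED)
        return (type & REC_TYPE_OPTIONAL) ? 0 : -1;
    return rc;
}
```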

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 02/13] tools/libxc: support to resume uncooperative HVM guests
  2015-06-08  3:43 ` [PATCH v2 COLOPre 02/13] tools/libxc: support to resume uncooperative HVM guests Yang Hongyang
@ 2015-06-10 15:18   ` Ian Campbell
  2015-06-11  2:42     ` Wen Congyang
  0 siblings, 1 reply; 106+ messages in thread
From: Ian Campbell @ 2015-06-10 15:18 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: wei.liu2, eddie.dong, wency, andrew.cooper3, yunhong.jiang,
	ian.jackson, xen-devel, guijianfeng, rshriram

On Mon, 2015-06-08 at 11:43 +0800, Yang Hongyang wrote:
> From: Wen Congyang <wency@cn.fujitsu.com>
> 
> For PVHVM, the hypercall return code is 0, and it can be resumed
> in a new domain context.
> We suspend and resume PVHVM like this:
> 1. suspend it via evtchn
> 2. modify the return code to 1
> 3. the guest knows that the suspend is cancelled, so we will use the fast
>    path to resume it.
> 
> Under COLO, we will update the guest's state (modify memory, cpu registers,
> device status...). In this case, we cannot use the fast path to resume it.
> Keep the return code 0, and use a slow path to resume the guest. We have
> updated the guest state, so we call it a new domain context.
> 
> For HVM, the hypercall is a NOP.

This doesn't match my reading of domain_resume on the Xen side, which is
the ultimate effect of this hypercall. It seems to unpause the domain
(and all vcpus) regardless of the domain type, including PVHVM vs HVM
(which isn't something Xen is generally aware of anyway).

I also can't really follow the stuff about PVHVM vs HVM vs uncooperative
guests, and I certainly can't see where the PVHVM vs HVM distinction is
made in this patch.

Ian.


> 
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> ---
>  tools/libxc/xc_resume.c | 22 ++++++++++++++++++----
>  1 file changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c
> index e67bebd..bd82334 100644
> --- a/tools/libxc/xc_resume.c
> +++ b/tools/libxc/xc_resume.c
> @@ -109,6 +109,23 @@ static int xc_domain_resume_cooperative(xc_interface *xch, uint32_t domid)
>      return do_domctl(xch, &domctl);
>  }
>  
> +static int xc_domain_resume_hvm(xc_interface *xch, uint32_t domid)
> +{
> +    DECLARE_DOMCTL;
> +
> +    /*
> +     * If it is PVHVM, the hypercall return code is 0, because this
> +     * is not a fast path resume, we do not modify_returncode as in
> +     * xc_domain_resume_cooperative.
> +     * (resuming it in a new domain context)
> +     *
> +     * If it is a HVM, the hypercall is a NOP.
> +     */
> +    domctl.cmd = XEN_DOMCTL_resumedomain;
> +    domctl.domain = domid;
> +    return do_domctl(xch, &domctl);
> +}
> +
>  static int xc_domain_resume_any(xc_interface *xch, uint32_t domid)
>  {
>      DECLARE_DOMCTL;
> @@ -138,10 +155,7 @@ static int xc_domain_resume_any(xc_interface *xch, uint32_t domid)
>       */
>  #if defined(__i386__) || defined(__x86_64__)
>      if ( info.hvm )
> -    {
> -        ERROR("Cannot resume uncooperative HVM guests");
> -        return rc;
> -    }
> +        return xc_domain_resume_hvm(xch, domid);
>  
>      if ( xc_domain_get_guest_width(xch, domid, &dinfo->guest_width) != 0 )
>      {
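The distinction the commit message draws comes down to the value the guest sees returned from its suspend hypercall: 1 (set via modify_returncode in the cooperative path) means the suspend was cancelled and the guest takes the fast resume path, while 0 means it is being resumed in a new domain context and must take the slow path, which is what COLO needs after rewriting the guest state. A toy model of just that decision; the names are illustrative, not Xen's interface:

```c
#include <assert.h>

enum resume_path { RESUME_SLOW = 0, RESUME_FAST = 1 };

/*
 * suspend_rc models the value the guest observes after its suspend
 * hypercall: 1 means "suspend cancelled" (fast path), 0 means
 * "resumed in a new domain context" (slow path).
 */
static enum resume_path guest_resume_path(int suspend_rc)
{
    return suspend_rc ? RESUME_FAST : RESUME_SLOW;
}
```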

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 04/13] tools/libxc: export xc_bitops.h
  2015-06-08  3:43 ` [PATCH v2 COLOPre 04/13] tools/libxc: export xc_bitops.h Yang Hongyang
  2015-06-08 10:04   ` Yang Hongyang
@ 2015-06-10 15:20   ` Ian Campbell
  2015-06-11  2:07     ` Yang Hongyang
  1 sibling, 1 reply; 106+ messages in thread
From: Ian Campbell @ 2015-06-10 15:20 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: wei.liu2, eddie.dong, wency, andrew.cooper3, yunhong.jiang,
	ian.jackson, xen-devel, guijianfeng, rshriram

On Mon, 2015-06-08 at 11:43 +0800, Yang Hongyang wrote:
> When we are under COLO, we will send dirty page bitmap info from
> secondary to primary at every checkpoint.

... and this is a _libxl_ operation? Is that the right layer here?

>  So we need to get/test
> the dirty page bitmap. We just expose xc_bitops.h for libxl use.
> 
> NOTE:
>   Need to make clean and rerun configure to get it compiled.
> 
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> ---
>  tools/libxc/include/xc_bitops.h | 76 +++++++++++++++++++++++++++++++++++++++++
>  tools/libxc/xc_bitops.h         | 76 -----------------------------------------
>  2 files changed, 76 insertions(+), 76 deletions(-)
>  create mode 100644 tools/libxc/include/xc_bitops.h
>  delete mode 100644 tools/libxc/xc_bitops.h
> 
> diff --git a/tools/libxc/include/xc_bitops.h b/tools/libxc/include/xc_bitops.h
> new file mode 100644
> index 0000000..cd749f4
> --- /dev/null
> +++ b/tools/libxc/include/xc_bitops.h
> @@ -0,0 +1,76 @@
> +#ifndef XC_BITOPS_H
> +#define XC_BITOPS_H 1
> +
> +/* bitmap operations for single threaded access */
> +
> +#include <stdlib.h>
> +#include <string.h>
> +
> +#define BITS_PER_LONG (sizeof(unsigned long) * 8)
> +#define ORDER_LONG (sizeof(unsigned long) == 4 ? 5 : 6)
> +
> +#define BITMAP_ENTRY(_nr,_bmap) ((_bmap))[(_nr)/BITS_PER_LONG]
> +#define BITMAP_SHIFT(_nr) ((_nr) % BITS_PER_LONG)
> +
> +/* calculate required space for number of longs needed to hold nr_bits */
> +static inline int bitmap_size(int nr_bits)
> +{
> +    int nr_long, nr_bytes;
> +    nr_long = (nr_bits + BITS_PER_LONG - 1) >> ORDER_LONG;
> +    nr_bytes = nr_long * sizeof(unsigned long);
> +    return nr_bytes;
> +}
> +
> +static inline unsigned long *bitmap_alloc(int nr_bits)
> +{
> +    return calloc(1, bitmap_size(nr_bits));
> +}
> +
> +static inline void bitmap_set(unsigned long *addr, int nr_bits)
> +{
> +    memset(addr, 0xff, bitmap_size(nr_bits));
> +}
> +
> +static inline void bitmap_clear(unsigned long *addr, int nr_bits)
> +{
> +    memset(addr, 0, bitmap_size(nr_bits));
> +}
> +
> +static inline int test_bit(int nr, unsigned long *addr)
> +{
> +    return (BITMAP_ENTRY(nr, addr) >> BITMAP_SHIFT(nr)) & 1;
> +}
> +
> +static inline void clear_bit(int nr, unsigned long *addr)
> +{
> +    BITMAP_ENTRY(nr, addr) &= ~(1UL << BITMAP_SHIFT(nr));
> +}
> +
> +static inline void set_bit(int nr, unsigned long *addr)
> +{
> +    BITMAP_ENTRY(nr, addr) |= (1UL << BITMAP_SHIFT(nr));
> +}
> +
> +static inline int test_and_clear_bit(int nr, unsigned long *addr)
> +{
> +    int oldbit = test_bit(nr, addr);
> +    clear_bit(nr, addr);
> +    return oldbit;
> +}
> +
> +static inline int test_and_set_bit(int nr, unsigned long *addr)
> +{
> +    int oldbit = test_bit(nr, addr);
> +    set_bit(nr, addr);
> +    return oldbit;
> +}
> +
> +static inline void bitmap_or(unsigned long *dst, const unsigned long *other,
> +                             int nr_bits)
> +{
> +    int i, nr_longs = (bitmap_size(nr_bits) / sizeof(unsigned long));
> +    for ( i = 0; i < nr_longs; ++i )
> +        dst[i] |= other[i];
> +}
> +
> +#endif  /* XC_BITOPS_H */
> diff --git a/tools/libxc/xc_bitops.h b/tools/libxc/xc_bitops.h
> deleted file mode 100644
> index cd749f4..0000000
> --- a/tools/libxc/xc_bitops.h
> +++ /dev/null
> @@ -1,76 +0,0 @@
> -#ifndef XC_BITOPS_H
> -#define XC_BITOPS_H 1
> -
> -/* bitmap operations for single threaded access */
> -
> -#include <stdlib.h>
> -#include <string.h>
> -
> -#define BITS_PER_LONG (sizeof(unsigned long) * 8)
> -#define ORDER_LONG (sizeof(unsigned long) == 4 ? 5 : 6)
> -
> -#define BITMAP_ENTRY(_nr,_bmap) ((_bmap))[(_nr)/BITS_PER_LONG]
> -#define BITMAP_SHIFT(_nr) ((_nr) % BITS_PER_LONG)
> -
> -/* calculate required space for number of longs needed to hold nr_bits */
> -static inline int bitmap_size(int nr_bits)
> -{
> -    int nr_long, nr_bytes;
> -    nr_long = (nr_bits + BITS_PER_LONG - 1) >> ORDER_LONG;
> -    nr_bytes = nr_long * sizeof(unsigned long);
> -    return nr_bytes;
> -}
> -
> -static inline unsigned long *bitmap_alloc(int nr_bits)
> -{
> -    return calloc(1, bitmap_size(nr_bits));
> -}
> -
> -static inline void bitmap_set(unsigned long *addr, int nr_bits)
> -{
> -    memset(addr, 0xff, bitmap_size(nr_bits));
> -}
> -
> -static inline void bitmap_clear(unsigned long *addr, int nr_bits)
> -{
> -    memset(addr, 0, bitmap_size(nr_bits));
> -}
> -
> -static inline int test_bit(int nr, unsigned long *addr)
> -{
> -    return (BITMAP_ENTRY(nr, addr) >> BITMAP_SHIFT(nr)) & 1;
> -}
> -
> -static inline void clear_bit(int nr, unsigned long *addr)
> -{
> -    BITMAP_ENTRY(nr, addr) &= ~(1UL << BITMAP_SHIFT(nr));
> -}
> -
> -static inline void set_bit(int nr, unsigned long *addr)
> -{
> -    BITMAP_ENTRY(nr, addr) |= (1UL << BITMAP_SHIFT(nr));
> -}
> -
> -static inline int test_and_clear_bit(int nr, unsigned long *addr)
> -{
> -    int oldbit = test_bit(nr, addr);
> -    clear_bit(nr, addr);
> -    return oldbit;
> -}
> -
> -static inline int test_and_set_bit(int nr, unsigned long *addr)
> -{
> -    int oldbit = test_bit(nr, addr);
> -    set_bit(nr, addr);
> -    return oldbit;
> -}
> -
> -static inline void bitmap_or(unsigned long *dst, const unsigned long *other,
> -                             int nr_bits)
> -{
> -    int i, nr_longs = (bitmap_size(nr_bits) / sizeof(unsigned long));
> -    for ( i = 0; i < nr_longs; ++i )
> -        dst[i] |= other[i];
> -}
> -
> -#endif  /* XC_BITOPS_H */
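Per the commit message, the point of exporting this header is so libxl can get and test a dirty page bitmap at each COLO checkpoint. A small self-contained usage sketch, reproducing only the helpers it needs from the header above (the pfn values are made up for illustration):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Subset of xc_bitops.h, reproduced from the patch above. */
#define BITS_PER_LONG (sizeof(unsigned long) * 8)
#define ORDER_LONG (sizeof(unsigned long) == 4 ? 5 : 6)
#define BITMAP_ENTRY(_nr,_bmap) ((_bmap))[(_nr)/BITS_PER_LONG]
#define BITMAP_SHIFT(_nr) ((_nr) % BITS_PER_LONG)

static int bitmap_size(int nr_bits)
{
    int nr_long = (nr_bits + BITS_PER_LONG - 1) >> ORDER_LONG;
    return nr_long * sizeof(unsigned long);
}

static unsigned long *bitmap_alloc(int nr_bits)
{
    return calloc(1, bitmap_size(nr_bits));
}

static void set_bit(int nr, unsigned long *addr)
{
    BITMAP_ENTRY(nr, addr) |= (1UL << BITMAP_SHIFT(nr));
}

static int test_bit(int nr, unsigned long *addr)
{
    return (BITMAP_ENTRY(nr, addr) >> BITMAP_SHIFT(nr)) & 1;
}

static int test_and_clear_bit(int nr, unsigned long *addr)
{
    int oldbit = test_bit(nr, addr);
    BITMAP_ENTRY(nr, addr) &= ~(1UL << BITMAP_SHIFT(nr));
    return oldbit;
}
```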

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 05/13] tools/libxl: introduce a new API libxl__domain_restore() to load qemu state
  2015-06-08  3:43 ` [PATCH v2 COLOPre 05/13] tools/libxl: introduce a new API libxl__domain_restore() to load qemu state Yang Hongyang
@ 2015-06-10 15:35   ` Ian Campbell
  2015-06-11  2:09     ` Yang Hongyang
  0 siblings, 1 reply; 106+ messages in thread
From: Ian Campbell @ 2015-06-10 15:35 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: wei.liu2, wency, andrew.cooper3, yunhong.jiang, eddie.dong,
	xen-devel, guijianfeng, rshriram, ian.jackson

On Mon, 2015-06-08 at 11:43 +0800, Yang Hongyang wrote:
> The secondary vm runs in COLO mode, so we will do
> the following things again and again:
> 1. suspend both the primary vm and the secondary vm
> 2. sync the state
> 3. resume both the primary vm and the secondary vm
> We will send qemu's state each time in step 2, and the
> slave's qemu should read it each time before resuming the
> secondary vm. Introduce a new API, libxl__domain_restore(),
> to do this. This API should be called before resuming the
> secondary vm.

Is this a preexisting qemu interface or one to be added?

> 
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> ---
>  tools/libxl/libxl_dom_save.c | 47 ++++++++++++++++++++++++++++++++++++++++++++
>  tools/libxl/libxl_internal.h |  4 ++++
>  tools/libxl/libxl_qmp.c      | 10 ++++++++++
>  3 files changed, 61 insertions(+)
> 
> diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
> index 74a6bae..f9627f8 100644
> --- a/tools/libxl/libxl_dom_save.c
> +++ b/tools/libxl/libxl_dom_save.c
> @@ -663,6 +663,53 @@ int libxl__toolstack_restore(uint32_t domid, const uint8_t *buf,
>      return 0;
>  }
>  
> +int libxl__domain_restore(libxl__gc *gc, uint32_t domid)
> +{
> +    int rc = 0;
> +
> +    libxl_domain_type type = libxl__domain_type(gc, domid);
> +    if (type != LIBXL_DOMAIN_TYPE_HVM) {
> +        rc = ERROR_FAIL;
> +        goto out;
> +    }
> +
> +    rc = libxl__domain_restore_device_model(gc, domid);
> +    if (rc)
> +        LOG(ERROR, "failed to restore device mode for domain %u:%d",
> +            domid, rc);
> +out:
> +    return rc;
> +}
> +
> +int libxl__domain_restore_device_model(libxl__gc *gc, uint32_t domid)
> +{
> +    char *state_file;
> +    int rc;
> +
> +    switch (libxl__device_model_version_running(gc, domid)) {
> +    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
> +        /* not supported now */
> +        rc = ERROR_INVAL;
> +        break;
> +    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
> +        /*
> +         * This function may be called too many times for the same gc,
> +         * so we use NOGC, and free the memory before return to avoid
> +         * OOM.
> +         */
> +        state_file = libxl__sprintf(NOGC,
> +                                    XC_DEVICE_MODEL_RESTORE_FILE".%d",
> +                                    domid);
> +        rc = libxl__qmp_restore(gc, domid, state_file);
> +        free(state_file);
> +        break;
> +    default:
> +        rc = ERROR_INVAL;
> +    }
> +
> +    return rc;
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index 1905195..20364c6 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -1022,6 +1022,7 @@ _hidden int libxl__domain_rename(libxl__gc *gc, uint32_t domid,
>  
>  _hidden int libxl__toolstack_restore(uint32_t domid, const uint8_t *buf,
>                                       uint32_t size, void *data);
> +_hidden int libxl__domain_restore_device_model(libxl__gc *gc, uint32_t domid);
>  _hidden int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid);
>  
>  _hidden const char *libxl__userdata_path(libxl__gc *gc, uint32_t domid,
> @@ -1039,6 +1040,7 @@ _hidden int libxl__userdata_store(libxl__gc *gc, uint32_t domid,
>                                    const char *userdata_userid,
>                                    const uint8_t *data, int datalen);
>  
> +_hidden int libxl__domain_restore(libxl__gc *gc, uint32_t domid);
>  _hidden int libxl__domain_resume(libxl__gc *gc, uint32_t domid,
>                                   int suspend_cancel);
>  _hidden int libxl__domain_s3_resume(libxl__gc *gc, int domid);
> @@ -1651,6 +1653,8 @@ _hidden int libxl__qmp_stop(libxl__gc *gc, int domid);
>  _hidden int libxl__qmp_resume(libxl__gc *gc, int domid);
>  /* Save current QEMU state into fd. */
>  _hidden int libxl__qmp_save(libxl__gc *gc, int domid, const char *filename);
> +/* Load current QEMU state from fd. */
> +_hidden int libxl__qmp_restore(libxl__gc *gc, int domid, const char *filename);
>  /* Set dirty bitmap logging status */
>  _hidden int libxl__qmp_set_global_dirty_log(libxl__gc *gc, int domid, bool enable);
>  _hidden int libxl__qmp_insert_cdrom(libxl__gc *gc, int domid, const libxl_device_disk *disk);
> diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c
> index 9aa7e2e..a6f1a21 100644
> --- a/tools/libxl/libxl_qmp.c
> +++ b/tools/libxl/libxl_qmp.c
> @@ -892,6 +892,16 @@ int libxl__qmp_save(libxl__gc *gc, int domid, const char *filename)
>                             NULL, NULL);
>  }
>  
> +int libxl__qmp_restore(libxl__gc *gc, int domid, const char *state_file)
> +{
> +    libxl__json_object *args = NULL;
> +
> +    qmp_parameters_add_string(gc, &args, "filename", state_file);
> +
> +    return qmp_run_command(gc, domid, "xen-load-devices-state", args,
> +                           NULL, NULL);
> +}
> +
>  static int qmp_change(libxl__gc *gc, libxl__qmp_handler *qmp,
>                        char *device, char *target, char *arg)
>  {


* Re: [PATCH v2 COLOPre 06/13] tools/libxl: Introduce a new internal API libxl__domain_unpause()
  2015-06-08  3:43 ` [PATCH v2 COLOPre 06/13] tools/libxl: Introduce a new internal API libxl__domain_unpause() Yang Hongyang
@ 2015-06-10 15:37   ` Ian Campbell
  2015-06-11  2:21     ` Yang Hongyang
  0 siblings, 1 reply; 106+ messages in thread
From: Ian Campbell @ 2015-06-10 15:37 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: wei.liu2, wency, andrew.cooper3, yunhong.jiang, eddie.dong,
	xen-devel, guijianfeng, rshriram, ian.jackson

On Mon, 2015-06-08 at 11:43 +0800, Yang Hongyang wrote:
> From: Wen Congyang <wency@cn.fujitsu.com>
> 
> The guest is paused after libxl_domain_create_restore().
> The secondary vm runs in COLO mode, so we need to unpause
> the guest. The existing libxl_domain_unpause() is a public
> API, not an internal one. Introduce an internal API to
> support this. No functional change.

In general there is nothing wrong with using a public function
internally. Is there some special consideration here?

> 
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> ---
>  tools/libxl/libxl.c          | 20 ++++++++++++++------
>  tools/libxl/libxl_internal.h |  1 +
>  2 files changed, 15 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index ba2da92..d5691dc 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -933,9 +933,8 @@ out:
>      return AO_INPROGRESS;
>  }
>  
> -int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid)
> +int libxl__domain_unpause(libxl__gc *gc, uint32_t domid)
>  {
> -    GC_INIT(ctx);
>      char *path;
>      char *state;
>      int ret, rc = 0;
> @@ -947,7 +946,7 @@ int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid)
>      }
>  
>      if (type == LIBXL_DOMAIN_TYPE_HVM) {
> -        uint32_t dm_domid = libxl_get_stubdom_id(ctx, domid);
> +        uint32_t dm_domid = libxl_get_stubdom_id(CTX, domid);
>  
>          path = libxl__device_model_xs_path(gc, dm_domid, domid, "/state");
>          state = libxl__xs_read(gc, XBT_NULL, path);
> @@ -957,12 +956,21 @@ int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid)
>                                           NULL, NULL, NULL);
>          }
>      }
> -    ret = xc_domain_unpause(ctx->xch, domid);
> -    if (ret<0) {
> -        LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "unpausing domain %d", domid);
> +
> +    ret = xc_domain_unpause(CTX->xch, domid);
> +    if (ret < 0) {
> +        LIBXL__LOG_ERRNO(CTX, LIBXL__LOG_ERROR, "unpausing domain %d", domid);
>          rc = ERROR_FAIL;
>      }
>   out:
> +    return rc;
> +}
> +
> +int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid)
> +{
> +    GC_INIT(ctx);
> +    int rc = libxl__domain_unpause(gc, domid);
> +
>      GC_FREE;
>      return rc;
>  }
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index 20364c6..366470f 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -1044,6 +1044,7 @@ _hidden int libxl__domain_restore(libxl__gc *gc, uint32_t domid);
>  _hidden int libxl__domain_resume(libxl__gc *gc, uint32_t domid,
>                                   int suspend_cancel);
>  _hidden int libxl__domain_s3_resume(libxl__gc *gc, int domid);
> +_hidden int libxl__domain_unpause(libxl__gc *gc, uint32_t domid);
>  
>  /* returns 0 or 1, or a libxl error code */
>  _hidden int libxl__domain_pvcontrol_available(libxl__gc *gc, uint32_t domid);


* Re: [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
  2015-06-10 11:47                           ` Paul Durrant
@ 2015-06-11  1:13                             ` Wen Congyang
  2015-06-11  8:32                               ` Paul Durrant
  0 siblings, 1 reply; 106+ messages in thread
From: Wen Congyang @ 2015-06-11  1:13 UTC (permalink / raw)
  To: Paul Durrant, Andrew Cooper, Yang Hongyang, xen-devel
  Cc: Wei Liu, Ian Campbell, guijianfeng, yunhong.jiang, Eddie Dong,
	rshriram, Ian Jackson

On 06/10/2015 07:47 PM, Paul Durrant wrote:
>> -----Original Message-----
>> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
>> bounces@lists.xen.org] On Behalf Of Wen Congyang
>> Sent: 10 June 2015 12:38
>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-devel@lists.xen.org
>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
>> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian Jackson
>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq
>> page only one time
>>
>> On 06/10/2015 06:58 PM, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
>>>> Sent: 10 June 2015 11:55
>>>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-
>> devel@lists.xen.org
>>>> Cc: Wei Liu; Ian Campbell; yunhong.jiang@intel.com; Eddie Dong;
>>>> guijianfeng@cn.fujitsu.com; rshriram@cs.ubc.ca; Ian Jackson
>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
>> ioreq
>>>> page only one time
>>>>
>>>> On 06/10/2015 06:40 PM, Paul Durrant wrote:
>>>>>> -----Original Message-----
>>>>>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
>>>>>> Sent: 10 June 2015 10:06
>>>>>> To: Andrew Cooper; Yang Hongyang; xen-devel@lists.xen.org; Paul
>>>> Durrant
>>>>>> Cc: Wei Liu; Ian Campbell; yunhong.jiang@intel.com; Eddie Dong;
>>>>>> guijianfeng@cn.fujitsu.com; rshriram@cs.ubc.ca; Ian Jackson
>>>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
>>>> ioreq
>>>>>> page only one time
>>>>>>
>>>>>> Cc: Paul Durrant
>>>>>>
>>>>>> On 06/10/2015 03:44 PM, Andrew Cooper wrote:
>>>>>>> On 10/06/2015 06:26, Yang Hongyang wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 06/09/2015 03:30 PM, Andrew Cooper wrote:
>>>>>>>>> On 09/06/2015 01:59, Yang Hongyang wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 06/08/2015 06:15 PM, Andrew Cooper wrote:
>>>>>>>>>>> On 08/06/15 10:58, Yang Hongyang wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 06/08/2015 05:46 PM, Andrew Cooper wrote:
>>>>>>>>>>>>> On 08/06/15 04:43, Yang Hongyang wrote:
>>>>>>>>>>>>>> ioreq page contains evtchn which will be set when we
>> resume
>>>> the
>>>>>>>>>>>>>> secondary vm the first time. The hypervisor will check if the
>>>>>>>>>>>>>> evtchn is corrupted, so we cannot zero the ioreq page more
>>>>>>>>>>>>>> than one time.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The ioreq->state is always STATE_IOREQ_NONE after the vm
>> is
>>>>>>>>>>>>>> suspended, so it is OK if we only zero it one time.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>>>>>>>>>>>>>> Signed-off-by: Wen congyang <wency@cn.fujitsu.com>
>>>>>>>>>>>>>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>> The issue here is that we are running the restore algorithm
>> over
>>>> a
>>>>>>>>>>>>> domain which has already been running in Xen for a while.
>> This
>>>> is a
>>>>>>>>>>>>> brand new usecase, as far as I am aware.
>>>>>>>>>>>>
>>>>>>>>>>>> Exactly.
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Does the qemu process associated with this domain get
>> frozen
>>>>>>>>>>>>> while the
>>>>>>>>>>>>> secondary is being reset, or does the process get destroyed
>> and
>>>>>>>>>>>>> recreated.
>>>>>>>>>>>>
>>>>>>>>>>>> What do you mean by reset? do you mean secondary is
>>>> suspended
>>>>>> at
>>>>>>>>>>>> checkpoint?
>>>>>>>>>>>
>>>>>>>>>>> Well - at the point that the buffered records are being
>> processed,
>>>> we
>>>>>>>>>>> are in the process of resetting the state of the secondary to
>> match
>>>>>>>>>>> the
>>>>>>>>>>> primary.
>>>>>>>>>>
>>>>>>>>>> Yes, at this point, the qemu process associated with this domain is
>>>>>>>>>> frozen.
>>>>>>>>>> the suspend callback will call libxl__qmp_stop(vm_stop() in
>> qemu)
>>>> to
>>>>>>>>>> pause
>>>>>>>>>> qemu. After we processed all records, qemu will be restored with
>>>> the
>>>>>>>>>> received
>>>>>>>>>> state, that's why we add a
>>>> libxl__qmp_restore(qemu_load_vmstate()
>>>>>> in
>>>>>>>>>> qemu)
>>>>>>>>>> api to restore qemu with received state. Currently in libxl, qemu
>> only
>>>>>>>>>> start
>>>>>>>>>> with the received state, there's no api to load received state while
>>>>>>>>>> qemu is
>>>>>>>>>> running for a while.
>>>>>>>>>
>>>>>>>>> Now I consider this more, it is absolutely wrong to not zero the
>> page
>>>>>>>>> here.  The event channel in the page is not guaranteed to be the
>>>> same
>>>>>>>>> between the primary and secondary,
>>>>>>>>
>>>>>>>> That's why we don't zero it on secondary.
>>>>>>>
>>>>>>> I think you missed my point.  Apologies for the double negative.   It
>>>>>>> must, under all circumstances, be zeroed at this point, for safety
>>>> reasons.
>>>>>>>
>>>>>>> The page in question is subject to logdirty just like any other guest
>>>>>>> pages, which means that if the guest writes to it naturally (i.e. not a
>>>>>>> Xen or Qemu write, both of whom have magic mappings which are
>> not
>>>>>>> subject to logdirty), it will be transmitted in the stream.  As the
>>>>>>> event channel could be different, the lack of zeroing it at this point
>>>>>>> means that the event channel would be wrong as opposed to simply
>>>>>>> missing.  This is a worse position to be in.
>>>>>>
>>>>>> The guest should not access this page. I am not sure if the guest can
>>>>>> access the ioreq page.
>>>>>>
>>>>>> But in the exceptional case, the ioreq page is dirtied, and is copied to
>>>>>> the secondary vm. The ioreq page will contain a wrong event channel,
>> the
>>>>>> hypervisor will check it: if the event channel is wrong, the guest will
>>>>>> be crashed.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>> and we don't want to unexpectedly
>>>>>>>>> find a pending/in-flight ioreq.
>>>>>>>>
>>>>>>>> ioreq->state is always STATE_IOREQ_NONE after the vm is
>> suspended,
>>>>>> there
>>>>>>>> should be no pending/in-flight ioreq at checkpoint.
>>>>>>>
>>>>>>> In the common case perhaps, but we must consider the exceptional
>>>> case.
>>>>>>> The exceptional case here is some corruption which happens to
>> appear
>>>> as
>>>>>>> an in-flight ioreq.
>>>>>>
>>>>>> If the state is STATE_IOREQ_NONE, it may be hypervisor's bug. If the
>>>>>> hypervisor
>>>>>> has a bug, anything can happen. I think we should trust the hypervisor.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Either qemu needs to take care of re-initialising the event channels
>>>>>>>>> back to appropriate values, or Xen should tolerate the channels
>>>>>>>>> disappearing.
>>>>>>>
>>>>>>> I still stand by this statement.  I believe it is the only safe way of
>>>>>>> solving the issue you have discovered.
>>>>>>
>>>>>> Add a new qemu monitor command to update ioreq page?
>>>>>>
>>>>>
>>>>> If you're attaching to a 'new' VM (i.e one with an updated image) then I
>>>> suspect you're going to have to destroy and re-create the ioreq server so
>>>> that the shared page gets re-populated with the correct event channels.
>>>> Either that or you're going to have to ensure that the page is not part of
>>>> restored image and sample the new one that Xen should have set up.
>>>>
>>>>
>>>> I agree with it. I will try to add a new qemu monitor command(or do it
>> when
>>>> updating qemu's state) to destroy and re-create it.
>>>
>>> The slightly tricky part of that is that you're going to have to cache and
>> replay all the registrations that were done on the old instance, but you need
>> to do that in any case as it's not state that is transferred in the VM save
>> record.
>>
>> Why do we have to cache and replay all the registrations that were done on
>> the old instance?
> 
> Do you not have device models that need to continue to function? When the ioreq server is torn down, all MMIO, port IO and PCI config ranges that were mapped to it will disappear.

Yes, I won't know what needs to be done until I implement and test it.

I have some questions about it:
1. Can the guest access the ioreq page? What will happen if the guest modifies
   the page unexpectedly?
2. If the ioreq page is dirtied by the guest, it will be transferred from the primary
   to the secondary during the checkpoint. The evtchn is then invalid; I think the best
   behavior is to crash the guest, not let it continue to run.

Thanks
Wen Congyang

> 
>   Paul
> 
>> We will set the guest to a new state; the old state should be dropped.
>>
>> Thanks
>> Wen Congyang
>>
>>>
>>>   Paul
>>>
>>>>
>>>> Thanks
>>>> Wen Congyang
>>>>
>>>>>
>>>>>   Paul
>>>>>
>>>>>
>>>>>> Thanks
>>>>>> Wen Congyang
>>>>>>
>>>>>>>
>>>>>>> ~Andrew
>>>>>>> .
>>>>>>>
>>>>>
>>>>> .
>>>>>
>>>
>>> .
>>>
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel
> 


* Re: [PATCH v2 COLOPre 04/13] tools/libxc: export xc_bitops.h
  2015-06-10 15:20   ` Ian Campbell
@ 2015-06-11  2:07     ` Yang Hongyang
  2015-06-11  8:41       ` Ian Campbell
  0 siblings, 1 reply; 106+ messages in thread
From: Yang Hongyang @ 2015-06-11  2:07 UTC (permalink / raw)
  To: Ian Campbell
  Cc: wei.liu2, eddie.dong, wency, andrew.cooper3, yunhong.jiang,
	ian.jackson, xen-devel, guijianfeng, rshriram



On 06/10/2015 11:20 PM, Ian Campbell wrote:
> On Mon, 2015-06-08 at 11:43 +0800, Yang Hongyang wrote:
>> When we are under COLO, we will send dirty page bitmap info from
>> secondary to primary at every checkpoint.
>
> ... and this is a _libxl_ operation? Is that the right layer here?

For the first question, yes, this is done in the suspend callback on
the restore side. We do this in libxl because we have currently only
added a back channel on the libxl side; there is no back channel in libxc.

Thinking about this more, if we do it in the libxc part, the code will
be less complex: we can drop the 4th and 9th patches of this series and
also get rid of the get_dirty_pfn() callback. Instead we will add a patch
to add a back channel in libxc.

For the second question, I'm not sure. What's Andrew's opinion? Which
is the right layer for this operation, libxl or libxc?

>
>>   So we need to get/test
>> the dirty page bitmap. We just expose xc_bitops.h for libxl use.
>>
>> NOTE:
>>    Need to make clean and rerun configure to get it compiled.
>>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> ---
>>   tools/libxc/include/xc_bitops.h | 76 +++++++++++++++++++++++++++++++++++++++++
>>   tools/libxc/xc_bitops.h         | 76 -----------------------------------------
>>   2 files changed, 76 insertions(+), 76 deletions(-)
>>   create mode 100644 tools/libxc/include/xc_bitops.h
>>   delete mode 100644 tools/libxc/xc_bitops.h
>>
>> diff --git a/tools/libxc/include/xc_bitops.h b/tools/libxc/include/xc_bitops.h
>> new file mode 100644
>> index 0000000..cd749f4
>> --- /dev/null
>> +++ b/tools/libxc/include/xc_bitops.h
>> @@ -0,0 +1,76 @@
>> +#ifndef XC_BITOPS_H
>> +#define XC_BITOPS_H 1
>> +
>> +/* bitmap operations for single threaded access */
>> +
>> +#include <stdlib.h>
>> +#include <string.h>
>> +
>> +#define BITS_PER_LONG (sizeof(unsigned long) * 8)
>> +#define ORDER_LONG (sizeof(unsigned long) == 4 ? 5 : 6)
>> +
>> +#define BITMAP_ENTRY(_nr,_bmap) ((_bmap))[(_nr)/BITS_PER_LONG]
>> +#define BITMAP_SHIFT(_nr) ((_nr) % BITS_PER_LONG)
>> +
>> +/* calculate required space for number of longs needed to hold nr_bits */
>> +static inline int bitmap_size(int nr_bits)
>> +{
>> +    int nr_long, nr_bytes;
>> +    nr_long = (nr_bits + BITS_PER_LONG - 1) >> ORDER_LONG;
>> +    nr_bytes = nr_long * sizeof(unsigned long);
>> +    return nr_bytes;
>> +}
>> +
>> +static inline unsigned long *bitmap_alloc(int nr_bits)
>> +{
>> +    return calloc(1, bitmap_size(nr_bits));
>> +}
>> +
>> +static inline void bitmap_set(unsigned long *addr, int nr_bits)
>> +{
>> +    memset(addr, 0xff, bitmap_size(nr_bits));
>> +}
>> +
>> +static inline void bitmap_clear(unsigned long *addr, int nr_bits)
>> +{
>> +    memset(addr, 0, bitmap_size(nr_bits));
>> +}
>> +
>> +static inline int test_bit(int nr, unsigned long *addr)
>> +{
>> +    return (BITMAP_ENTRY(nr, addr) >> BITMAP_SHIFT(nr)) & 1;
>> +}
>> +
>> +static inline void clear_bit(int nr, unsigned long *addr)
>> +{
>> +    BITMAP_ENTRY(nr, addr) &= ~(1UL << BITMAP_SHIFT(nr));
>> +}
>> +
>> +static inline void set_bit(int nr, unsigned long *addr)
>> +{
>> +    BITMAP_ENTRY(nr, addr) |= (1UL << BITMAP_SHIFT(nr));
>> +}
>> +
>> +static inline int test_and_clear_bit(int nr, unsigned long *addr)
>> +{
>> +    int oldbit = test_bit(nr, addr);
>> +    clear_bit(nr, addr);
>> +    return oldbit;
>> +}
>> +
>> +static inline int test_and_set_bit(int nr, unsigned long *addr)
>> +{
>> +    int oldbit = test_bit(nr, addr);
>> +    set_bit(nr, addr);
>> +    return oldbit;
>> +}
>> +
>> +static inline void bitmap_or(unsigned long *dst, const unsigned long *other,
>> +                             int nr_bits)
>> +{
>> +    int i, nr_longs = (bitmap_size(nr_bits) / sizeof(unsigned long));
>> +    for ( i = 0; i < nr_longs; ++i )
>> +        dst[i] |= other[i];
>> +}
>> +
>> +#endif  /* XC_BITOPS_H */
>> diff --git a/tools/libxc/xc_bitops.h b/tools/libxc/xc_bitops.h
>> deleted file mode 100644
>> index cd749f4..0000000
>> --- a/tools/libxc/xc_bitops.h
>> +++ /dev/null
>> @@ -1,76 +0,0 @@
>> -#ifndef XC_BITOPS_H
>> -#define XC_BITOPS_H 1
>> -
>> -/* bitmap operations for single threaded access */
>> -
>> -#include <stdlib.h>
>> -#include <string.h>
>> -
>> -#define BITS_PER_LONG (sizeof(unsigned long) * 8)
>> -#define ORDER_LONG (sizeof(unsigned long) == 4 ? 5 : 6)
>> -
>> -#define BITMAP_ENTRY(_nr,_bmap) ((_bmap))[(_nr)/BITS_PER_LONG]
>> -#define BITMAP_SHIFT(_nr) ((_nr) % BITS_PER_LONG)
>> -
>> -/* calculate required space for number of longs needed to hold nr_bits */
>> -static inline int bitmap_size(int nr_bits)
>> -{
>> -    int nr_long, nr_bytes;
>> -    nr_long = (nr_bits + BITS_PER_LONG - 1) >> ORDER_LONG;
>> -    nr_bytes = nr_long * sizeof(unsigned long);
>> -    return nr_bytes;
>> -}
>> -
>> -static inline unsigned long *bitmap_alloc(int nr_bits)
>> -{
>> -    return calloc(1, bitmap_size(nr_bits));
>> -}
>> -
>> -static inline void bitmap_set(unsigned long *addr, int nr_bits)
>> -{
>> -    memset(addr, 0xff, bitmap_size(nr_bits));
>> -}
>> -
>> -static inline void bitmap_clear(unsigned long *addr, int nr_bits)
>> -{
>> -    memset(addr, 0, bitmap_size(nr_bits));
>> -}
>> -
>> -static inline int test_bit(int nr, unsigned long *addr)
>> -{
>> -    return (BITMAP_ENTRY(nr, addr) >> BITMAP_SHIFT(nr)) & 1;
>> -}
>> -
>> -static inline void clear_bit(int nr, unsigned long *addr)
>> -{
>> -    BITMAP_ENTRY(nr, addr) &= ~(1UL << BITMAP_SHIFT(nr));
>> -}
>> -
>> -static inline void set_bit(int nr, unsigned long *addr)
>> -{
>> -    BITMAP_ENTRY(nr, addr) |= (1UL << BITMAP_SHIFT(nr));
>> -}
>> -
>> -static inline int test_and_clear_bit(int nr, unsigned long *addr)
>> -{
>> -    int oldbit = test_bit(nr, addr);
>> -    clear_bit(nr, addr);
>> -    return oldbit;
>> -}
>> -
>> -static inline int test_and_set_bit(int nr, unsigned long *addr)
>> -{
>> -    int oldbit = test_bit(nr, addr);
>> -    set_bit(nr, addr);
>> -    return oldbit;
>> -}
>> -
>> -static inline void bitmap_or(unsigned long *dst, const unsigned long *other,
>> -                             int nr_bits)
>> -{
>> -    int i, nr_longs = (bitmap_size(nr_bits) / sizeof(unsigned long));
>> -    for ( i = 0; i < nr_longs; ++i )
>> -        dst[i] |= other[i];
>> -}
>> -
>> -#endif  /* XC_BITOPS_H */
>
>
> .
>

-- 
Thanks,
Yang.


* Re: [PATCH v2 COLOPre 05/13] tools/libxl: introduce a new API libxl__domain_restore() to load qemu state
  2015-06-10 15:35   ` Ian Campbell
@ 2015-06-11  2:09     ` Yang Hongyang
  2015-06-11  8:43       ` Ian Campbell
  0 siblings, 1 reply; 106+ messages in thread
From: Yang Hongyang @ 2015-06-11  2:09 UTC (permalink / raw)
  To: Ian Campbell
  Cc: wei.liu2, wency, andrew.cooper3, yunhong.jiang, eddie.dong,
	xen-devel, guijianfeng, rshriram, ian.jackson



On 06/10/2015 11:35 PM, Ian Campbell wrote:
> On Mon, 2015-06-08 at 11:43 +0800, Yang Hongyang wrote:
>> The secondary vm runs in COLO mode, so we will do
>> the following things again and again:
>> 1. suspend both the primary vm and the secondary vm
>> 2. sync the state
>> 3. resume both the primary vm and the secondary vm
>> We will send qemu's state each time in step 2, and the
>> slave's qemu should read it each time before resuming the
>> secondary vm. Introduce a new API, libxl__domain_restore(),
>> to do this. This API should be called before resuming the
>> secondary vm.
>
> Is this a preexisting qemu interface or one to be added?

We added the qemu interface "xen-load-devices-state";
it is not in qemu upstream yet.

>
>>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> ---
>>   tools/libxl/libxl_dom_save.c | 47 ++++++++++++++++++++++++++++++++++++++++++++
>>   tools/libxl/libxl_internal.h |  4 ++++
>>   tools/libxl/libxl_qmp.c      | 10 ++++++++++
>>   3 files changed, 61 insertions(+)
>>
>> diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
>> index 74a6bae..f9627f8 100644
>> --- a/tools/libxl/libxl_dom_save.c
>> +++ b/tools/libxl/libxl_dom_save.c
>> @@ -663,6 +663,53 @@ int libxl__toolstack_restore(uint32_t domid, const uint8_t *buf,
>>       return 0;
>>   }
>>
>> +int libxl__domain_restore(libxl__gc *gc, uint32_t domid)
>> +{
>> +    int rc = 0;
>> +
>> +    libxl_domain_type type = libxl__domain_type(gc, domid);
>> +    if (type != LIBXL_DOMAIN_TYPE_HVM) {
>> +        rc = ERROR_FAIL;
>> +        goto out;
>> +    }
>> +
>> +    rc = libxl__domain_restore_device_model(gc, domid);
>> +    if (rc)
>> +        LOG(ERROR, "failed to restore device mode for domain %u:%d",
>> +            domid, rc);
>> +out:
>> +    return rc;
>> +}
>> +
>> +int libxl__domain_restore_device_model(libxl__gc *gc, uint32_t domid)
>> +{
>> +    char *state_file;
>> +    int rc;
>> +
>> +    switch (libxl__device_model_version_running(gc, domid)) {
>> +    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
>> +        /* not supported now */
>> +        rc = ERROR_INVAL;
>> +        break;
>> +    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
>> +        /*
>> +         * This function may be called too many times for the same gc,
>> +         * so we use NOGC, and free the memory before return to avoid
>> +         * OOM.
>> +         */
>> +        state_file = libxl__sprintf(NOGC,
>> +                                    XC_DEVICE_MODEL_RESTORE_FILE".%d",
>> +                                    domid);
>> +        rc = libxl__qmp_restore(gc, domid, state_file);
>> +        free(state_file);
>> +        break;
>> +    default:
>> +        rc = ERROR_INVAL;
>> +    }
>> +
>> +    return rc;
>> +}
>> +
>>   /*
>>    * Local variables:
>>    * mode: C
>> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
>> index 1905195..20364c6 100644
>> --- a/tools/libxl/libxl_internal.h
>> +++ b/tools/libxl/libxl_internal.h
>> @@ -1022,6 +1022,7 @@ _hidden int libxl__domain_rename(libxl__gc *gc, uint32_t domid,
>>
>>   _hidden int libxl__toolstack_restore(uint32_t domid, const uint8_t *buf,
>>                                        uint32_t size, void *data);
>> +_hidden int libxl__domain_restore_device_model(libxl__gc *gc, uint32_t domid);
>>   _hidden int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid);
>>
>>   _hidden const char *libxl__userdata_path(libxl__gc *gc, uint32_t domid,
>> @@ -1039,6 +1040,7 @@ _hidden int libxl__userdata_store(libxl__gc *gc, uint32_t domid,
>>                                     const char *userdata_userid,
>>                                     const uint8_t *data, int datalen);
>>
>> +_hidden int libxl__domain_restore(libxl__gc *gc, uint32_t domid);
>>   _hidden int libxl__domain_resume(libxl__gc *gc, uint32_t domid,
>>                                    int suspend_cancel);
>>   _hidden int libxl__domain_s3_resume(libxl__gc *gc, int domid);
>> @@ -1651,6 +1653,8 @@ _hidden int libxl__qmp_stop(libxl__gc *gc, int domid);
>>   _hidden int libxl__qmp_resume(libxl__gc *gc, int domid);
>>   /* Save current QEMU state into fd. */
>>   _hidden int libxl__qmp_save(libxl__gc *gc, int domid, const char *filename);
>> +/* Load current QEMU state from fd. */
>> +_hidden int libxl__qmp_restore(libxl__gc *gc, int domid, const char *filename);
>>   /* Set dirty bitmap logging status */
>>   _hidden int libxl__qmp_set_global_dirty_log(libxl__gc *gc, int domid, bool enable);
>>   _hidden int libxl__qmp_insert_cdrom(libxl__gc *gc, int domid, const libxl_device_disk *disk);
>> diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c
>> index 9aa7e2e..a6f1a21 100644
>> --- a/tools/libxl/libxl_qmp.c
>> +++ b/tools/libxl/libxl_qmp.c
>> @@ -892,6 +892,16 @@ int libxl__qmp_save(libxl__gc *gc, int domid, const char *filename)
>>                              NULL, NULL);
>>   }
>>
>> +int libxl__qmp_restore(libxl__gc *gc, int domid, const char *state_file)
>> +{
>> +    libxl__json_object *args = NULL;
>> +
>> +    qmp_parameters_add_string(gc, &args, "filename", state_file);
>> +
>> +    return qmp_run_command(gc, domid, "xen-load-devices-state", args,
>> +                           NULL, NULL);
>> +}
>> +
>>   static int qmp_change(libxl__gc *gc, libxl__qmp_handler *qmp,
>>                         char *device, char *target, char *arg)
>>   {
>
>
> .
>

-- 
Thanks,
Yang.


* Re: [PATCH v2 COLOPre 01/13] libxc/restore: fix error handle of process_record
  2015-06-10 14:55   ` Ian Campbell
@ 2015-06-11  2:10     ` Yang Hongyang
  0 siblings, 0 replies; 106+ messages in thread
From: Yang Hongyang @ 2015-06-11  2:10 UTC (permalink / raw)
  To: Ian Campbell
  Cc: wei.liu2, eddie.dong, wency, andrew.cooper3, yunhong.jiang,
	ian.jackson, xen-devel, guijianfeng, rshriram



On 06/10/2015 10:55 PM, Ian Campbell wrote:
> On Mon, 2015-06-08 at 11:43 +0800, Yang Hongyang wrote:
>> If the error is RECORD_NOT_PROCESSED and the record is optional,
>> restore will still fail. This patch fixes that.
>
> Whichever approach you take to fixing this, please say _how_ the change
> fixes it; it's not at all clear why moving this code should matter.
>
> And if there is an ulterior motive behind the move, please say that too.

Okay, will describe this in the next version, thank you!

>
>>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> CC: Ian Campbell <Ian.Campbell@citrix.com>
>> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
>> CC: Wei Liu <wei.liu2@citrix.com>
>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>>   tools/libxc/xc_sr_restore.c | 28 ++++++++++++++--------------
>>   1 file changed, 14 insertions(+), 14 deletions(-)
>>
>> diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
>> index 9e27dba..2d2edd3 100644
>> --- a/tools/libxc/xc_sr_restore.c
>> +++ b/tools/libxc/xc_sr_restore.c
>> @@ -560,19 +560,6 @@ static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec)
>>       free(rec->data);
>>       rec->data = NULL;
>>
>> -    if ( rc == RECORD_NOT_PROCESSED )
>> -    {
>> -        if ( rec->type & REC_TYPE_OPTIONAL )
>> -            DPRINTF("Ignoring optional record %#x (%s)",
>> -                    rec->type, rec_type_to_str(rec->type));
>> -        else
>> -        {
>> -            ERROR("Mandatory record %#x (%s) not handled",
>> -                  rec->type, rec_type_to_str(rec->type));
>> -            rc = -1;
>> -        }
>> -    }
>> -
>>       return rc;
>>   }
>>
>> @@ -678,7 +665,20 @@ static int restore(struct xc_sr_context *ctx)
>>           else
>>           {
>>               rc = process_record(ctx, &rec);
>> -            if ( rc )
>> +            if ( rc == RECORD_NOT_PROCESSED )
>> +            {
>> +                if ( rec.type & REC_TYPE_OPTIONAL )
>> +                    DPRINTF("Ignoring optional record %#x (%s)",
>> +                            rec.type, rec_type_to_str(rec.type));
>> +                else
>> +                {
>> +                    ERROR("Mandatory record %#x (%s) not handled",
>> +                          rec.type, rec_type_to_str(rec.type));
>> +                    rc = -1;
>> +                    goto err;
>> +                }
>> +            }
>> +            else if ( rc )
>>                   goto err;
>>           }
>>
>
>
> .
>

-- 
Thanks,
Yang.


* Re: [PATCH v2 COLOPre 06/13] tools/libxl: Introduce a new internal API libxl__domain_unpause()
  2015-06-10 15:37   ` Ian Campbell
@ 2015-06-11  2:21     ` Yang Hongyang
  2015-06-11  8:43       ` Ian Campbell
  0 siblings, 1 reply; 106+ messages in thread
From: Yang Hongyang @ 2015-06-11  2:21 UTC (permalink / raw)
  To: Ian Campbell
  Cc: wei.liu2, wency, andrew.cooper3, yunhong.jiang, eddie.dong,
	xen-devel, guijianfeng, rshriram, ian.jackson



On 06/10/2015 11:37 PM, Ian Campbell wrote:
> On Mon, 2015-06-08 at 11:43 +0800, Yang Hongyang wrote:
>> From: Wen Congyang <wency@cn.fujitsu.com>
>>
>> The guest is paused after libxl_domain_create_restore().
>> The secondary vm is running in colo mode, so we need to unpause
>> the guest. The current API libxl_domain_unpause() is
>> not an internal API. Introduce a new internal API to support this.
>> No functional change.
>
> In general there is nothing wrong with using a public function
> internally. Is there some special consideration here?

It's just that we thought it's better to use internal functions for
internal purposes.
Most of the public functions take ctx as the first param, while the internal
functions take gc/egc as the first param (although we can get ctx from the gc
and call public functions when needed).
If it doesn't matter, we can drop this patch.

>
>>
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> ---
>>   tools/libxl/libxl.c          | 20 ++++++++++++++------
>>   tools/libxl/libxl_internal.h |  1 +
>>   2 files changed, 15 insertions(+), 6 deletions(-)
>>
>> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
>> index ba2da92..d5691dc 100644
>> --- a/tools/libxl/libxl.c
>> +++ b/tools/libxl/libxl.c
>> @@ -933,9 +933,8 @@ out:
>>       return AO_INPROGRESS;
>>   }
>>
>> -int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid)
>> +int libxl__domain_unpause(libxl__gc *gc, uint32_t domid)
>>   {
>> -    GC_INIT(ctx);
>>       char *path;
>>       char *state;
>>       int ret, rc = 0;
>> @@ -947,7 +946,7 @@ int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid)
>>       }
>>
>>       if (type == LIBXL_DOMAIN_TYPE_HVM) {
>> -        uint32_t dm_domid = libxl_get_stubdom_id(ctx, domid);
>> +        uint32_t dm_domid = libxl_get_stubdom_id(CTX, domid);
>>
>>           path = libxl__device_model_xs_path(gc, dm_domid, domid, "/state");
>>           state = libxl__xs_read(gc, XBT_NULL, path);
>> @@ -957,12 +956,21 @@ int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid)
>>                                            NULL, NULL, NULL);
>>           }
>>       }
>> -    ret = xc_domain_unpause(ctx->xch, domid);
>> -    if (ret<0) {
>> -        LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "unpausing domain %d", domid);
>> +
>> +    ret = xc_domain_unpause(CTX->xch, domid);
>> +    if (ret < 0) {
>> +        LIBXL__LOG_ERRNO(CTX, LIBXL__LOG_ERROR, "unpausing domain %d", domid);
>>           rc = ERROR_FAIL;
>>       }
>>    out:
>> +    return rc;
>> +}
>> +
>> +int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid)
>> +{
>> +    GC_INIT(ctx);
>> +    int rc = libxl__domain_unpause(gc, domid);
>> +
>>       GC_FREE;
>>       return rc;
>>   }
>> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
>> index 20364c6..366470f 100644
>> --- a/tools/libxl/libxl_internal.h
>> +++ b/tools/libxl/libxl_internal.h
>> @@ -1044,6 +1044,7 @@ _hidden int libxl__domain_restore(libxl__gc *gc, uint32_t domid);
>>   _hidden int libxl__domain_resume(libxl__gc *gc, uint32_t domid,
>>                                    int suspend_cancel);
>>   _hidden int libxl__domain_s3_resume(libxl__gc *gc, int domid);
>> +_hidden int libxl__domain_unpause(libxl__gc *gc, uint32_t domid);
>>
>>   /* returns 0 or 1, or a libxl error code */
>>   _hidden int libxl__domain_pvcontrol_available(libxl__gc *gc, uint32_t domid);
>
>
> .
>

-- 
Thanks,
Yang.


* Re: [PATCH v2 COLOPre 02/13] tools/libxc: support to resume uncooperative HVM guests
  2015-06-10 15:18   ` Ian Campbell
@ 2015-06-11  2:42     ` Wen Congyang
  2015-06-11  8:44       ` Ian Campbell
  0 siblings, 1 reply; 106+ messages in thread
From: Wen Congyang @ 2015-06-11  2:42 UTC (permalink / raw)
  To: Ian Campbell, Yang Hongyang
  Cc: wei.liu2, eddie.dong, andrew.cooper3, yunhong.jiang, ian.jackson,
	xen-devel, guijianfeng, rshriram

On 06/10/2015 11:18 PM, Ian Campbell wrote:
> On Mon, 2015-06-08 at 11:43 +0800, Yang Hongyang wrote:
>> From: Wen Congyang <wency@cn.fujitsu.com>
>>
>> For PVHVM, the hypercall return code is 0, and it can be resumed
>> in a new domain context.
>> We suspend PVHVM and resume it like this:
>> 1. suspend it via evtchn
>> 2. modify the return code to 1
>> 3. the guest knows that the suspend was cancelled, so we will use the
>>    fast path to resume it.
>>
>> Under COLO, we will update the guest's state (modify memory, cpu registers,
>> device status...). In this case, we cannot use the fast path to resume it.
>> Keep the return code 0, and use a slow path to resume the guest. We have
>> updated the guest state, so we call it a new domain context.
>>
>> For HVM, the hypercall is a NOP.
> 
> This doesn't match my reading of domain_resume on the Xen side, which is
> the ultimate effect of this hypercall. It seems to unpause the domain
> (and all vcpus) regardless of the domain type, including PVHVM vs HVM
> (which isn't something Xen is generally aware of anyway).
> 
> I also can't really follow the stuff about PVHVM vs HVM vs uncooperative
> guests, and I certainly can't see where the PVHVM vs HVM distinction is
> made in this patch.

Sorry for my mistake. I read the code again:

1. Suspend:
a. PVHVM and PV: we use the same way to suspend the guest (send the suspend
   request to the guest)
b. pure HVM: we call xc_domain_shutdown(..., SHUTDOWN_suspend) to suspend
   the guest
c. ???: suspending the guest via the XenBus control node
I don't know in which case we would take path c.

2. Resume:
a. fast path
   In this case, we don't change the guest's state.
   PV: modify the return code to 1, then call the domctl XEN_DOMCTL_resumedomain
   PVHVM: same as PV
   HVM: do nothing in modify_returncode, then call the domctl XEN_DOMCTL_resumedomain
b. slow path
   In this case, we have changed the guest's state.
   PV: update the start info and reset all secondary CPU states, then call the
   domctl XEN_DOMCTL_resumedomain
   PVHVM and HVM cannot be resumed this way.

For PVHVM, in my test, only calling the domctl XEN_DOMCTL_resumedomain
works. I am not sure whether we should also update the start info and reset
all secondary CPU states.

For a pure HVM guest, in my test, only calling the domctl
XEN_DOMCTL_resumedomain works.

So we can call libxl__domain_resume(..., 1) if we don't change the guest
state, and libxl__domain_resume(..., 0) otherwise.

Any suggestions are welcome.

Thanks
Wen Congyang


> 
> Ian.
> 
> 
>>
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> ---
>>  tools/libxc/xc_resume.c | 22 ++++++++++++++++++----
>>  1 file changed, 18 insertions(+), 4 deletions(-)
>>
>> diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c
>> index e67bebd..bd82334 100644
>> --- a/tools/libxc/xc_resume.c
>> +++ b/tools/libxc/xc_resume.c
>> @@ -109,6 +109,23 @@ static int xc_domain_resume_cooperative(xc_interface *xch, uint32_t domid)
>>      return do_domctl(xch, &domctl);
>>  }
>>  
>> +static int xc_domain_resume_hvm(xc_interface *xch, uint32_t domid)
>> +{
>> +    DECLARE_DOMCTL;
>> +
>> +    /*
>> +     * If it is PVHVM, the hypercall return code is 0, because this
>> +     * is not a fast path resume, we do not modify_returncode as in
>> +     * xc_domain_resume_cooperative.
>> +     * (resuming it in a new domain context)
>> +     *
>> +     * If it is a HVM, the hypercall is a NOP.
>> +     */
>> +    domctl.cmd = XEN_DOMCTL_resumedomain;
>> +    domctl.domain = domid;
>> +    return do_domctl(xch, &domctl);
>> +}
>> +
>>  static int xc_domain_resume_any(xc_interface *xch, uint32_t domid)
>>  {
>>      DECLARE_DOMCTL;
>> @@ -138,10 +155,7 @@ static int xc_domain_resume_any(xc_interface *xch, uint32_t domid)
>>       */
>>  #if defined(__i386__) || defined(__x86_64__)
>>      if ( info.hvm )
>> -    {
>> -        ERROR("Cannot resume uncooperative HVM guests");
>> -        return rc;
>> -    }
>> +        return xc_domain_resume_hvm(xch, domid);
>>  
>>      if ( xc_domain_get_guest_width(xch, domid, &dinfo->guest_width) != 0 )
>>      {
> 
> 
> .
> 


* Re: [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
  2015-06-11  1:13                             ` Wen Congyang
@ 2015-06-11  8:32                               ` Paul Durrant
  2015-06-11  8:48                                 ` Wen Congyang
  0 siblings, 1 reply; 106+ messages in thread
From: Paul Durrant @ 2015-06-11  8:32 UTC (permalink / raw)
  To: Wen Congyang, Andrew Cooper, Yang Hongyang, xen-devel
  Cc: Wei Liu, Ian Campbell, guijianfeng, yunhong.jiang, Eddie Dong,
	rshriram, Ian Jackson

> -----Original Message-----
> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
> Sent: 11 June 2015 02:14
> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-devel@lists.xen.org
> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian Jackson
> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq
> page only one time
> 
> On 06/10/2015 07:47 PM, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
> >> bounces@lists.xen.org] On Behalf Of Wen Congyang
> >> Sent: 10 June 2015 12:38
> >> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-
> devel@lists.xen.org
> >> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
> >> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian Jackson
> >> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
> ioreq
> >> page only one time
> >>
> >> On 06/10/2015 06:58 PM, Paul Durrant wrote:
> >>>> -----Original Message-----
> >>>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
> >>>> Sent: 10 June 2015 11:55
> >>>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-
> >> devel@lists.xen.org
> >>>> Cc: Wei Liu; Ian Campbell; yunhong.jiang@intel.com; Eddie Dong;
> >>>> guijianfeng@cn.fujitsu.com; rshriram@cs.ubc.ca; Ian Jackson
> >>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
> >> ioreq
> >>>> page only one time
> >>>>
> >>>> On 06/10/2015 06:40 PM, Paul Durrant wrote:
> >>>>>> -----Original Message-----
> >>>>>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
> >>>>>> Sent: 10 June 2015 10:06
> >>>>>> To: Andrew Cooper; Yang Hongyang; xen-devel@lists.xen.org; Paul
> >>>> Durrant
> >>>>>> Cc: Wei Liu; Ian Campbell; yunhong.jiang@intel.com; Eddie Dong;
> >>>>>> guijianfeng@cn.fujitsu.com; rshriram@cs.ubc.ca; Ian Jackson
> >>>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore:
> zero
> >>>> ioreq
> >>>>>> page only one time
> >>>>>>
> >>>>>> Cc: Paul Durrant
> >>>>>>
> >>>>>> On 06/10/2015 03:44 PM, Andrew Cooper wrote:
> >>>>>>> On 10/06/2015 06:26, Yang Hongyang wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 06/09/2015 03:30 PM, Andrew Cooper wrote:
> >>>>>>>>> On 09/06/2015 01:59, Yang Hongyang wrote:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 06/08/2015 06:15 PM, Andrew Cooper wrote:
> >>>>>>>>>>> On 08/06/15 10:58, Yang Hongyang wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 06/08/2015 05:46 PM, Andrew Cooper wrote:
> >>>>>>>>>>>>> On 08/06/15 04:43, Yang Hongyang wrote:
> >>>>>>>>>>>>>> ioreq page contains evtchn which will be set when we
> >> resume
> >>>> the
> >>>>>>>>>>>>>> secondary vm the first time. The hypervisor will check if
> the
> >>>>>>>>>>>>>> evtchn is corrupted, so we cannot zero the ioreq page
> more
> >>>>>>>>>>>>>> than one time.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The ioreq->state is always STATE_IOREQ_NONE after the
> vm
> >> is
> >>>>>>>>>>>>>> suspended, so it is OK if we only zero it one time.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> >>>>>>>>>>>>>> Signed-off-by: Wen congyang <wency@cn.fujitsu.com>
> >>>>>>>>>>>>>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The issue here is that we are running the restore algorithm
> >> over
> >>>> a
> >>>>>>>>>>>>> domain which has already been running in Xen for a while.
> >> This
> >>>> is a
> >>>>>>>>>>>>> brand new usecase, as far as I am aware.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Exactly.
> >>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Does the qemu process associated with this domain get
> >> frozen
> >>>>>>>>>>>>> while the
> >>>>>>>>>>>>> secondary is being reset, or does the process get destroyed
> >> and
> >>>>>>>>>>>>> recreated.
> >>>>>>>>>>>>
> >>>>>>>>>>>> What do you mean by reset? do you mean secondary is
> >>>> suspended
> >>>>>> at
> >>>>>>>>>>>> checkpoint?
> >>>>>>>>>>>
> >>>>>>>>>>> Well - at the point that the buffered records are being
> >> processed,
> >>>> we
> >>>>>>>>>>> are in the process of resetting the state of the secondary to
> >> match
> >>>>>>>>>>> the
> >>>>>>>>>>> primary.
> >>>>>>>>>>
> >>>>>>>>>> Yes, at this point, the qemu process associated with this
> domain is
> >>>>>>>>>> frozen.
> >>>>>>>>>> the suspend callback will call libxl__qmp_stop(vm_stop() in
> >> qemu)
> >>>> to
> >>>>>>>>>> pause
> >>>>>>>>>> qemu. After we processed all records, qemu will be restored
> with
> >>>> the
> >>>>>>>>>> received
> >>>>>>>>>> state, that's why we add a
> >>>> libxl__qmp_restore(qemu_load_vmstate()
> >>>>>> in
> >>>>>>>>>> qemu)
> >>>>>>>>>> api to restore qemu with received state. Currently in libxl,
> qemu
> >> only
> >>>>>>>>>> start
> >>>>>>>>>> with the received state, there's no api to load received state
> while
> >>>>>>>>>> qemu is
> >>>>>>>>>> running for a while.
> >>>>>>>>>
> >>>>>>>>> Now I consider this more, it is absolutely wrong to not zero the
> >> page
> >>>>>>>>> here.  The event channel in the page is not guaranteed to be the
> >>>> same
> >>>>>>>>> between the primary and secondary,
> >>>>>>>>
> >>>>>>>> That's why we don't zero it on secondary.
> >>>>>>>
> >>>>>>> I think you missed my point.  Apologies for the double negative.   It
> >>>>>>> must, under all circumstances, be zeroed at this point, for safety
> >>>> reasons.
> >>>>>>>
> >>>>>>> The page in question is subject to logdirty just like any other guest
> >>>>>>> pages, which means that if the guest writes to it naturally (i.e. not a
> >>>>>>> Xen or Qemu write, both of whom have magic mappings which are
> >> not
> >>>>>>> subject to logdirty), it will be transmitted in the stream.  As the
> >>>>>>> event channel could be different, the lack of zeroing it at this point
> >>>>>>> means that the event channel would be wrong as opposed to
> simply
> >>>>>>> missing.  This is a worse position to be in.
> >>>>>>
> >>>>>> The guest should not access this page. I am not sure if the guest can
> >>>>>> access the ioreq page.
> >>>>>>
> >>>>>> But in the exceptional case, the ioreq page is dirtied, and is copied to
> >>>>>> the secondary vm. The ioreq page will contain a wrong event
> channel,
> >> the
> >>>>>> hypervisor will check it: if the event channel is wrong, the guest will
> >>>>>> be crashed.
> >>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>>> and we don't want to unexpectedly
> >>>>>>>>> find a pending/in-flight ioreq.
> >>>>>>>>
> >>>>>>>> ioreq->state is always STATE_IOREQ_NONE after the vm is
> >> suspended,
> >>>>>> there
> >>>>>>>> should be no pending/in-flight ioreq at checkpoint.
> >>>>>>>
> >>>>>>> In the common case perhaps, but we must consider the
> exceptional
> >>>> case.
> >>>>>>> The exceptional case here is some corruption which happens to
> >> appear
> >>>> as
> >>>>>>> an in-flight ioreq.
> >>>>>>
> >>>>>> If the state is STATE_IOREQ_NONE, it may be hypervisor's bug. If the
> >>>>>> hypervisor
> >>>>>> has a bug, anything can happen. I think we should trust the
> hypervisor.
> >>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Either qemu needs to take care of re-initialising the event
> channels
> >>>>>>>>> back to appropriate values, or Xen should tolerate the channels
> >>>>>>>>> disappearing.
> >>>>>>>
> >>>>>>> I still stand by this statement.  I believe it is the only safe way of
> >>>>>>> solving the issue you have discovered.
> >>>>>>
> >>>>>> Add a new qemu monitor command to update ioreq page?
> >>>>>>
> >>>>>
> >>>>> If you're attaching to a 'new' VM (i.e one with an updated image) then
> I
> >>>> suspect you're going to have to destroy and re-create the ioreq server
> so
> >>>> that the shared page gets re-populated with the correct event
> channels.
> >>>> Either that or you're going to have to ensure that the page is not part of
> >>>> restored image and sample the new one that Xen should have set up.
> >>>>
> >>>>
> >>>> I agree with it. I will try to add a new qemu monitor command(or do it
> >> when
> >>>> updating qemu's state) to destroy and re-create it.
> >>>
> >>> The slightly tricky part of that is that you're going to have to cache and
> >> replay all the registrations that were done on the old instance, but you
> need
> >> to do that in any case as it's not state that is transferred in the VM save
> >> record.
> >>
> >> Why do we have to cache and replay all the registrations that were done
> on
> >> the old instance?
> >
> > Do you not have device models that you need to continue to function?
> When the ioreq server is torn down then all MMIO, port IO and PCI config
> ranges that were mapped to it will disappear.
> 
> Yes, I don't know which should be done until I implement and test it.
> 
> I have some questions about it:
> 1. Can the guest access the ioreq page? If the page is modified by the guest
>    unexpectedly, what will happen?

No, the guest cannot modify the pages once a non-default ioreq server is active. The pages are removed from the guest P2M when it is activated, which is one of the reasons for modifying QEMU to not behave as a legacy default server.

> 2. If the ioreq page is dirtied by the guest, it will be transferred from the
>    primary to the secondary during a checkpoint. The evtchn will then be
>    invalid; I think the best behavior is to crash the guest rather than let
>    it continue to run.

As I said, the pages are not in the P2M if the server is active so they will not be transferred as part of the VM state. However, this presents a problem; at the far end, the emulator will not be able to hook into the guest. So, when the source domain is paused, the ioreq server needs to be torn down (so that its pages are re-inserted into the P2M and marked dirty for transfer). This is what happens in a normal migration. One extra problem you have is that the source domain is not then killed, it is resumed along with the emulator. Thus, on resume, the emulator needs to create a new ioreq server and re-register all its device models with that new server.
I don't know the detail of what you do at the far end, but if you always start a new emulator instance using the QEMU save record then you should be fine (just like with a normal migration).

  Paul
 
> 
> Thanks
> Wen Congyang
> 
> >
> >   Paul
> >
> >> We will set to the guest to a new state, the old state should be dropped.
> >>
> >> Thanks
> >> Wen Congyang
> >>
> >>>
> >>>   Paul
> >>>
> >>>>
> >>>> Thanks
> >>>> Wen Congyang
> >>>>
> >>>>>
> >>>>>   Paul
> >>>>>
> >>>>>
> >>>>>> Thanks
> >>>>>> Wen Congyang
> >>>>>>
> >>>>>>>
> >>>>>>> ~Andrew
> >>>>>>> .
> >>>>>>>
> >>>>>
> >>>>> .
> >>>>>
> >>>
> >>> .
> >>>
> >>
> >>
> >> _______________________________________________
> >> Xen-devel mailing list
> >> Xen-devel@lists.xen.org
> >> http://lists.xen.org/xen-devel
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > http://lists.xen.org/xen-devel
> > .
> >


* Re: [PATCH v2 COLOPre 04/13] tools/libxc: export xc_bitops.h
  2015-06-11  2:07     ` Yang Hongyang
@ 2015-06-11  8:41       ` Ian Campbell
  2015-06-11 10:45         ` Andrew Cooper
  0 siblings, 1 reply; 106+ messages in thread
From: Ian Campbell @ 2015-06-11  8:41 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: wei.liu2, eddie.dong, wency, andrew.cooper3, yunhong.jiang,
	ian.jackson, xen-devel, guijianfeng, rshriram

On Thu, 2015-06-11 at 10:07 +0800, Yang Hongyang wrote:
> 
> On 06/10/2015 11:20 PM, Ian Campbell wrote:
> > On Mon, 2015-06-08 at 11:43 +0800, Yang Hongyang wrote:
> >> When we are under COLO, we will send dirty page bitmap info from
> >> secondary to primary at every checkpoint.
> >
> > ... and this is a _libxl_ operation? Is that the right layer here?
> 
> For the first question, yes, this is done in the suspend callback on the
> restore side. We do this in libxl because currently we have only added a
> back channel on the libxl side; there is no back channel in libxc.
> 
> Thinking about this more, if we do this in the libxc part, the code will be
> less complex: we can drop the 4th & 9th patches of this series and also
> get rid of the get_dirty_pfn() callback; instead we will add a patch to
> add a back channel in libxc.

That sounds better to me, but let's see what Andrew thinks.

> 
> For the second question, I'm not sure; what's Andrew's opinion? Which
> is the right layer for this operation, libxl or libxc?
> 
> >
> >>   So we need to get/test
> >> the dirty page bitmap. We just expose xc_bitops.h for libxl use.
> >>
> >> NOTE:
> >>    Need to make clean and rerun configure to get it compiled.
> >>
> >> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> >> ---
> >>   tools/libxc/include/xc_bitops.h | 76 +++++++++++++++++++++++++++++++++++++++++
> >>   tools/libxc/xc_bitops.h         | 76 -----------------------------------------
> >>   2 files changed, 76 insertions(+), 76 deletions(-)
> >>   create mode 100644 tools/libxc/include/xc_bitops.h
> >>   delete mode 100644 tools/libxc/xc_bitops.h
> >>
> >> diff --git a/tools/libxc/include/xc_bitops.h b/tools/libxc/include/xc_bitops.h
> >> new file mode 100644
> >> index 0000000..cd749f4
> >> --- /dev/null
> >> +++ b/tools/libxc/include/xc_bitops.h
> >> @@ -0,0 +1,76 @@
> >> +#ifndef XC_BITOPS_H
> >> +#define XC_BITOPS_H 1
> >> +
> >> +/* bitmap operations for single threaded access */
> >> +
> >> +#include <stdlib.h>
> >> +#include <string.h>
> >> +
> >> +#define BITS_PER_LONG (sizeof(unsigned long) * 8)
> >> +#define ORDER_LONG (sizeof(unsigned long) == 4 ? 5 : 6)
> >> +
> >> +#define BITMAP_ENTRY(_nr,_bmap) ((_bmap))[(_nr)/BITS_PER_LONG]
> >> +#define BITMAP_SHIFT(_nr) ((_nr) % BITS_PER_LONG)
> >> +
> >> +/* calculate required space for number of longs needed to hold nr_bits */
> >> +static inline int bitmap_size(int nr_bits)
> >> +{
> >> +    int nr_long, nr_bytes;
> >> +    nr_long = (nr_bits + BITS_PER_LONG - 1) >> ORDER_LONG;
> >> +    nr_bytes = nr_long * sizeof(unsigned long);
> >> +    return nr_bytes;
> >> +}
> >> +
> >> +static inline unsigned long *bitmap_alloc(int nr_bits)
> >> +{
> >> +    return calloc(1, bitmap_size(nr_bits));
> >> +}
> >> +
> >> +static inline void bitmap_set(unsigned long *addr, int nr_bits)
> >> +{
> >> +    memset(addr, 0xff, bitmap_size(nr_bits));
> >> +}
> >> +
> >> +static inline void bitmap_clear(unsigned long *addr, int nr_bits)
> >> +{
> >> +    memset(addr, 0, bitmap_size(nr_bits));
> >> +}
> >> +
> >> +static inline int test_bit(int nr, unsigned long *addr)
> >> +{
> >> +    return (BITMAP_ENTRY(nr, addr) >> BITMAP_SHIFT(nr)) & 1;
> >> +}
> >> +
> >> +static inline void clear_bit(int nr, unsigned long *addr)
> >> +{
> >> +    BITMAP_ENTRY(nr, addr) &= ~(1UL << BITMAP_SHIFT(nr));
> >> +}
> >> +
> >> +static inline void set_bit(int nr, unsigned long *addr)
> >> +{
> >> +    BITMAP_ENTRY(nr, addr) |= (1UL << BITMAP_SHIFT(nr));
> >> +}
> >> +
> >> +static inline int test_and_clear_bit(int nr, unsigned long *addr)
> >> +{
> >> +    int oldbit = test_bit(nr, addr);
> >> +    clear_bit(nr, addr);
> >> +    return oldbit;
> >> +}
> >> +
> >> +static inline int test_and_set_bit(int nr, unsigned long *addr)
> >> +{
> >> +    int oldbit = test_bit(nr, addr);
> >> +    set_bit(nr, addr);
> >> +    return oldbit;
> >> +}
> >> +
> >> +static inline void bitmap_or(unsigned long *dst, const unsigned long *other,
> >> +                             int nr_bits)
> >> +{
> >> +    int i, nr_longs = (bitmap_size(nr_bits) / sizeof(unsigned long));
> >> +    for ( i = 0; i < nr_longs; ++i )
> >> +        dst[i] |= other[i];
> >> +}
> >> +
> >> +#endif  /* XC_BITOPS_H */
> >> diff --git a/tools/libxc/xc_bitops.h b/tools/libxc/xc_bitops.h
> >> deleted file mode 100644
> >> index cd749f4..0000000
> >> --- a/tools/libxc/xc_bitops.h
> >> +++ /dev/null
> >> @@ -1,76 +0,0 @@
> >> -#ifndef XC_BITOPS_H
> >> -#define XC_BITOPS_H 1
> >> -
> >> -/* bitmap operations for single threaded access */
> >> -
> >> -#include <stdlib.h>
> >> -#include <string.h>
> >> -
> >> -#define BITS_PER_LONG (sizeof(unsigned long) * 8)
> >> -#define ORDER_LONG (sizeof(unsigned long) == 4 ? 5 : 6)
> >> -
> >> -#define BITMAP_ENTRY(_nr,_bmap) ((_bmap))[(_nr)/BITS_PER_LONG]
> >> -#define BITMAP_SHIFT(_nr) ((_nr) % BITS_PER_LONG)
> >> -
> >> -/* calculate required space for number of longs needed to hold nr_bits */
> >> -static inline int bitmap_size(int nr_bits)
> >> -{
> >> -    int nr_long, nr_bytes;
> >> -    nr_long = (nr_bits + BITS_PER_LONG - 1) >> ORDER_LONG;
> >> -    nr_bytes = nr_long * sizeof(unsigned long);
> >> -    return nr_bytes;
> >> -}
> >> -
> >> -static inline unsigned long *bitmap_alloc(int nr_bits)
> >> -{
> >> -    return calloc(1, bitmap_size(nr_bits));
> >> -}
> >> -
> >> -static inline void bitmap_set(unsigned long *addr, int nr_bits)
> >> -{
> >> -    memset(addr, 0xff, bitmap_size(nr_bits));
> >> -}
> >> -
> >> -static inline void bitmap_clear(unsigned long *addr, int nr_bits)
> >> -{
> >> -    memset(addr, 0, bitmap_size(nr_bits));
> >> -}
> >> -
> >> -static inline int test_bit(int nr, unsigned long *addr)
> >> -{
> >> -    return (BITMAP_ENTRY(nr, addr) >> BITMAP_SHIFT(nr)) & 1;
> >> -}
> >> -
> >> -static inline void clear_bit(int nr, unsigned long *addr)
> >> -{
> >> -    BITMAP_ENTRY(nr, addr) &= ~(1UL << BITMAP_SHIFT(nr));
> >> -}
> >> -
> >> -static inline void set_bit(int nr, unsigned long *addr)
> >> -{
> >> -    BITMAP_ENTRY(nr, addr) |= (1UL << BITMAP_SHIFT(nr));
> >> -}
> >> -
> >> -static inline int test_and_clear_bit(int nr, unsigned long *addr)
> >> -{
> >> -    int oldbit = test_bit(nr, addr);
> >> -    clear_bit(nr, addr);
> >> -    return oldbit;
> >> -}
> >> -
> >> -static inline int test_and_set_bit(int nr, unsigned long *addr)
> >> -{
> >> -    int oldbit = test_bit(nr, addr);
> >> -    set_bit(nr, addr);
> >> -    return oldbit;
> >> -}
> >> -
> >> -static inline void bitmap_or(unsigned long *dst, const unsigned long *other,
> >> -                             int nr_bits)
> >> -{
> >> -    int i, nr_longs = (bitmap_size(nr_bits) / sizeof(unsigned long));
> >> -    for ( i = 0; i < nr_longs; ++i )
> >> -        dst[i] |= other[i];
> >> -}
> >> -
> >> -#endif  /* XC_BITOPS_H */
> >
> >
> > .
> >
> 


* Re: [PATCH v2 COLOPre 05/13] tools/libxl: introduce a new API libxl__domain_restore() to load qemu state
  2015-06-11  2:09     ` Yang Hongyang
@ 2015-06-11  8:43       ` Ian Campbell
  2015-06-11  8:55         ` Yang Hongyang
  0 siblings, 1 reply; 106+ messages in thread
From: Ian Campbell @ 2015-06-11  8:43 UTC (permalink / raw)
  To: Yang Hongyang, Stefano Stabellini, Anthony Perard
  Cc: wei.liu2, wency, andrew.cooper3, yunhong.jiang, eddie.dong,
	xen-devel, guijianfeng, rshriram, ian.jackson

On Thu, 2015-06-11 at 10:09 +0800, Yang Hongyang wrote:
> 
> On 06/10/2015 11:35 PM, Ian Campbell wrote:
> > On Mon, 2015-06-08 at 11:43 +0800, Yang Hongyang wrote:
>> The secondary vm is running in colo mode, so we will do
>> the following things again and again:
>> 1. suspend both the primary vm and the secondary vm
>> 2. sync the state
>> 3. resume both the primary vm and the secondary vm
>> We will send qemu's state each time in step 2, and the
>> slave's qemu should read it each time before resuming the
>> secondary vm. Introduce a new API libxl__domain_restore()
>> to do this. This API should be called before resuming the
>> secondary vm.
> >
> > Is this a preexisting qemu interface or one to be added?
> 
> We added the qemu interface "xen-load-devices-state";
> it's not in qemu upstream yet.

OK, please mention this dependency in the commit text since we will want
to be sure the interface is going to be accepted in this form by QEMU
upstream before we start using it. Please also CC the QEMU maintainers
on this patch in the future (by adding Cc: below the S-o-b if you don't
want to spam them the whole series), I've added them here now.

In particular "devices" seems odd to me, perhaps
"xen-load-device-state"?

> 
> >
> >>
> >> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> >> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> >> ---
> >>   tools/libxl/libxl_dom_save.c | 47 ++++++++++++++++++++++++++++++++++++++++++++
> >>   tools/libxl/libxl_internal.h |  4 ++++
> >>   tools/libxl/libxl_qmp.c      | 10 ++++++++++
> >>   3 files changed, 61 insertions(+)
> >>
> >> diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
> >> index 74a6bae..f9627f8 100644
> >> --- a/tools/libxl/libxl_dom_save.c
> >> +++ b/tools/libxl/libxl_dom_save.c
> >> @@ -663,6 +663,53 @@ int libxl__toolstack_restore(uint32_t domid, const uint8_t *buf,
> >>       return 0;
> >>   }
> >>
> >> +int libxl__domain_restore(libxl__gc *gc, uint32_t domid)
> >> +{
> >> +    int rc = 0;
> >> +
> >> +    libxl_domain_type type = libxl__domain_type(gc, domid);
> >> +    if (type != LIBXL_DOMAIN_TYPE_HVM) {
> >> +        rc = ERROR_FAIL;
> >> +        goto out;
> >> +    }
> >> +
> >> +    rc = libxl__domain_restore_device_model(gc, domid);
> >> +    if (rc)
> >> +        LOG(ERROR, "failed to restore device model for domain %u:%d",
> >> +            domid, rc);
> >> +out:
> >> +    return rc;
> >> +}
> >> +
> >> +int libxl__domain_restore_device_model(libxl__gc *gc, uint32_t domid)
> >> +{
> >> +    char *state_file;
> >> +    int rc;
> >> +
> >> +    switch (libxl__device_model_version_running(gc, domid)) {
> >> +    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
> >> +        /* not supported now */
> >> +        rc = ERROR_INVAL;
> >> +        break;
> >> +    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
> >> +        /*
> >> +         * This function may be called too many times for the same gc,
> >> +         * so we use NOGC, and free the memory before return to avoid
> >> +         * OOM.
> >> +         */
> >> +        state_file = libxl__sprintf(NOGC,
> >> +                                    XC_DEVICE_MODEL_RESTORE_FILE".%d",
> >> +                                    domid);
> >> +        rc = libxl__qmp_restore(gc, domid, state_file);
> >> +        free(state_file);
> >> +        break;
> >> +    default:
> >> +        rc = ERROR_INVAL;
> >> +    }
> >> +
> >> +    return rc;
> >> +}
> >> +
> >>   /*
> >>    * Local variables:
> >>    * mode: C
> >> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> >> index 1905195..20364c6 100644
> >> --- a/tools/libxl/libxl_internal.h
> >> +++ b/tools/libxl/libxl_internal.h
> >> @@ -1022,6 +1022,7 @@ _hidden int libxl__domain_rename(libxl__gc *gc, uint32_t domid,
> >>
> >>   _hidden int libxl__toolstack_restore(uint32_t domid, const uint8_t *buf,
> >>                                        uint32_t size, void *data);
> >> +_hidden int libxl__domain_restore_device_model(libxl__gc *gc, uint32_t domid);
> >>   _hidden int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid);
> >>
> >>   _hidden const char *libxl__userdata_path(libxl__gc *gc, uint32_t domid,
> >> @@ -1039,6 +1040,7 @@ _hidden int libxl__userdata_store(libxl__gc *gc, uint32_t domid,
> >>                                     const char *userdata_userid,
> >>                                     const uint8_t *data, int datalen);
> >>
> >> +_hidden int libxl__domain_restore(libxl__gc *gc, uint32_t domid);
> >>   _hidden int libxl__domain_resume(libxl__gc *gc, uint32_t domid,
> >>                                    int suspend_cancel);
> >>   _hidden int libxl__domain_s3_resume(libxl__gc *gc, int domid);
> >> @@ -1651,6 +1653,8 @@ _hidden int libxl__qmp_stop(libxl__gc *gc, int domid);
> >>   _hidden int libxl__qmp_resume(libxl__gc *gc, int domid);
> >>   /* Save current QEMU state into fd. */
> >>   _hidden int libxl__qmp_save(libxl__gc *gc, int domid, const char *filename);
> >> +/* Load current QEMU state from fd. */
> >> +_hidden int libxl__qmp_restore(libxl__gc *gc, int domid, const char *filename);
> >>   /* Set dirty bitmap logging status */
> >>   _hidden int libxl__qmp_set_global_dirty_log(libxl__gc *gc, int domid, bool enable);
> >>   _hidden int libxl__qmp_insert_cdrom(libxl__gc *gc, int domid, const libxl_device_disk *disk);
> >> diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c
> >> index 9aa7e2e..a6f1a21 100644
> >> --- a/tools/libxl/libxl_qmp.c
> >> +++ b/tools/libxl/libxl_qmp.c
> >> @@ -892,6 +892,16 @@ int libxl__qmp_save(libxl__gc *gc, int domid, const char *filename)
> >>                              NULL, NULL);
> >>   }
> >>
> >> +int libxl__qmp_restore(libxl__gc *gc, int domid, const char *state_file)
> >> +{
> >> +    libxl__json_object *args = NULL;
> >> +
> >> +    qmp_parameters_add_string(gc, &args, "filename", state_file);
> >> +
> >> +    return qmp_run_command(gc, domid, "xen-load-devices-state", args,
> >> +                           NULL, NULL);
> >> +}
> >> +
> >>   static int qmp_change(libxl__gc *gc, libxl__qmp_handler *qmp,
> >>                         char *device, char *target, char *arg)
> >>   {
> >
> >
> > .
> >
> 
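The suspend/sync/resume cycle described in the commit message above can be sketched as a single checkpoint iteration. This is an illustrative model with stubbed operations, not the actual libxl code; every function name below is invented for the sketch (the comments note which real libxl steps they stand in for):

```c
/* Stubbed checkpoint operations; in libxl these would be the suspend,
 * state-transfer, libxl__domain_restore() and unpause steps.
 * Each returns 0 on success. */
static int suspend_primary(void)    { return 0; }
static int suspend_secondary(void)  { return 0; }
static int sync_state(void)         { return 0; }  /* also sends qemu's state */
static int load_qemu_state(void)    { return 0; }  /* libxl__domain_restore() */
static int resume_primary(void)     { return 0; }
static int resume_secondary(void)   { return 0; }

/* One COLO checkpoint iteration: suspend both VMs, sync the state,
 * reload qemu's device state on the slave, then resume both VMs. */
static int colo_checkpoint_once(void)
{
    if (suspend_primary() || suspend_secondary())
        return -1;
    if (sync_state())
        return -1;
    /* must happen before the secondary vm is resumed */
    if (load_qemu_state())
        return -1;
    if (resume_primary() || resume_secondary())
        return -1;
    return 0;
}
```

In COLO this loop repeats for every checkpoint, which is why the qemu state must be reloadable on an already-running slave qemu rather than only at start-of-day.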

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 06/13] tools/libxl: Introduce a new internal API libxl__domain_unpause()
  2015-06-11  2:21     ` Yang Hongyang
@ 2015-06-11  8:43       ` Ian Campbell
  2015-06-11  9:09         ` Wen Congyang
  0 siblings, 1 reply; 106+ messages in thread
From: Ian Campbell @ 2015-06-11  8:43 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: wei.liu2, wency, andrew.cooper3, yunhong.jiang, eddie.dong,
	xen-devel, guijianfeng, rshriram, ian.jackson

On Thu, 2015-06-11 at 10:21 +0800, Yang Hongyang wrote:
> 
> On 06/10/2015 11:37 PM, Ian Campbell wrote:
> > On Mon, 2015-06-08 at 11:43 +0800, Yang Hongyang wrote:
> >> From: Wen Congyang <wency@cn.fujitsu.com>
> >>
> >> The guest is paused after libxl_domain_create_restore().
> >> Secondary vm is running in colo mode. So we need to unpause
> >> the guest. The current API libxl_domain_unpause() is
> >> not an internal API. Introduce a new API to support it.
> >> No functional change.
> >
> > In general there is nothing wrong with using a public function
> > internally. Is there some special consideration here?
> 
> It's just that we thought it better to use internal functions for
> internal purposes.
> Most of the public functions take ctx as the first param, while the internal
> functions take gc/egc as the first param (although we can get ctx from gcs
> and call public functions when needed).
> If it doesn't matter, we can drop this patch.

It doesn't matter so you can, yes.

> 
> >
> >>
> >> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> >> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> >> ---
> >>   tools/libxl/libxl.c          | 20 ++++++++++++++------
> >>   tools/libxl/libxl_internal.h |  1 +
> >>   2 files changed, 15 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> >> index ba2da92..d5691dc 100644
> >> --- a/tools/libxl/libxl.c
> >> +++ b/tools/libxl/libxl.c
> >> @@ -933,9 +933,8 @@ out:
> >>       return AO_INPROGRESS;
> >>   }
> >>
> >> -int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid)
> >> +int libxl__domain_unpause(libxl__gc *gc, uint32_t domid)
> >>   {
> >> -    GC_INIT(ctx);
> >>       char *path;
> >>       char *state;
> >>       int ret, rc = 0;
> >> @@ -947,7 +946,7 @@ int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid)
> >>       }
> >>
> >>       if (type == LIBXL_DOMAIN_TYPE_HVM) {
> >> -        uint32_t dm_domid = libxl_get_stubdom_id(ctx, domid);
> >> +        uint32_t dm_domid = libxl_get_stubdom_id(CTX, domid);
> >>
> >>           path = libxl__device_model_xs_path(gc, dm_domid, domid, "/state");
> >>           state = libxl__xs_read(gc, XBT_NULL, path);
> >> @@ -957,12 +956,21 @@ int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid)
> >>                                            NULL, NULL, NULL);
> >>           }
> >>       }
> >> -    ret = xc_domain_unpause(ctx->xch, domid);
> >> -    if (ret<0) {
> >> -        LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "unpausing domain %d", domid);
> >> +
> >> +    ret = xc_domain_unpause(CTX->xch, domid);
> >> +    if (ret < 0) {
> >> +        LIBXL__LOG_ERRNO(CTX, LIBXL__LOG_ERROR, "unpausing domain %d", domid);
> >>           rc = ERROR_FAIL;
> >>       }
> >>    out:
> >> +    return rc;
> >> +}
> >> +
> >> +int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid)
> >> +{
> >> +    GC_INIT(ctx);
> >> +    int rc = libxl__domain_unpause(gc, domid);
> >> +
> >>       GC_FREE;
> >>       return rc;
> >>   }
> >> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> >> index 20364c6..366470f 100644
> >> --- a/tools/libxl/libxl_internal.h
> >> +++ b/tools/libxl/libxl_internal.h
> >> @@ -1044,6 +1044,7 @@ _hidden int libxl__domain_restore(libxl__gc *gc, uint32_t domid);
> >>   _hidden int libxl__domain_resume(libxl__gc *gc, uint32_t domid,
> >>                                    int suspend_cancel);
> >>   _hidden int libxl__domain_s3_resume(libxl__gc *gc, int domid);
> >> +_hidden int libxl__domain_unpause(libxl__gc *gc, uint32_t domid);
> >>
> >>   /* returns 0 or 1, or a libxl error code */
> >>   _hidden int libxl__domain_pvcontrol_available(libxl__gc *gc, uint32_t domid);
> >
> >
> > .
> >
> 
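The refactoring in the patch above follows a common libxl pattern: the public entry point becomes a thin GC_INIT/GC_FREE wrapper around an internal gc-taking function. A minimal stand-alone model of that pattern follows; the types here are simplified stand-ins, not the real libxl definitions:

```c
/* Simplified stand-ins for libxl_ctx and libxl__gc. */
typedef struct { int dummy; } ctx_t;
typedef struct { ctx_t *ctx; } gc_t;

/* Internal variant: takes a gc and does the real work.
 * (Pretend domid 0 cannot be unpaused, just to have a failure case.) */
static int domain_unpause_internal(gc_t *gc, unsigned domid)
{
    (void)gc;
    return domid > 0 ? 0 : -1;
}

/* Public variant: sets up a gc from the ctx, delegates, tears down.
 * This mirrors GC_INIT(ctx) / GC_FREE in libxl_domain_unpause(). */
static int domain_unpause_public(ctx_t *ctx, unsigned domid)
{
    gc_t gc = { ctx };
    int rc = domain_unpause_internal(&gc, domid);
    /* GC_FREE would release gc-allocated memory here */
    return rc;
}
```

The design point under discussion is exactly this split: internal callers already hold a gc, so calling the internal variant avoids re-initialising one, at the cost of an extra function when the public one would have worked.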


* Re: [PATCH v2 COLOPre 02/13] tools/libxc: support to resume uncooperative HVM guests
  2015-06-11  2:42     ` Wen Congyang
@ 2015-06-11  8:44       ` Ian Campbell
  2015-06-11  8:56         ` Wen Congyang
  0 siblings, 1 reply; 106+ messages in thread
From: Ian Campbell @ 2015-06-11  8:44 UTC (permalink / raw)
  To: Wen Congyang
  Cc: wei.liu2, eddie.dong, andrew.cooper3, yunhong.jiang, ian.jackson,
	xen-devel, guijianfeng, rshriram, Yang Hongyang

On Thu, 2015-06-11 at 10:42 +0800, Wen Congyang wrote:
> On 06/10/2015 11:18 PM, Ian Campbell wrote:
> > On Mon, 2015-06-08 at 11:43 +0800, Yang Hongyang wrote:
> >> From: Wen Congyang <wency@cn.fujitsu.com>
> >>
> >> For PVHVM, the hypercall return code is 0, and it can be resumed
> >> in a new domain context.
> >> we suspend and resume a PVHVM guest like this:
> >> 1. suspend it via evtchn
> >> 2. modify the return code to 1
> >> 3. the guest knows that the suspend is cancelled, so we use the fast
> >>    path to resume it.
> >>
> >> Under COLO, we will update the guest's state (modify memory, CPU registers,
> >> device status...). In this case, we cannot use the fast path to resume it.
> >> Keep the return code 0, and use a slow path to resume the guest. We have
> >> updated the guest state, so we call it a new domain context.
> >>
> >> For HVM, the hypercall is a NOP.
> > 
> > This doesn't match my reading of domain_resume on the Xen side, which is
> > the ultimate effect of this hypercall. It seems to unpause the domain
> > (and all vcpus) regardless of the domain type, including PVHVM vs HVM
> > (which isn't something Xen is generally aware of anyway).
> > 
> > I also can't really follow the stuff about PVHVM vs HVM vs uncooperative
> > guests, and I certainly can't see where the PVHVM vs HVM distinction is
> > made in this patch.
> 
> Sorry for my mistake. I read the codes again:
> 
> 1. suspend
> a. PVHVM and PV: we use the same way to suspend the guest (send the suspend
>    request to the guest)
> b. pure HVM: we call xc_domain_shutdown(..., SHUTDOWN_suspend) to suspend
>    the guest
> c. ???: suspending the guest via XenBus control node

AFAIK c is another option under a; it depends on whether the guest
supports evtchn or not. If not, the xenstore variant will be used.

> I don't know in which case we would go to c.
> 
> 2. Resume:
> a. fast path
>    In this case, we don't change the guest's state.
>    PV: modify the return code to 1, and then call the domctl: XEN_DOMCTL_resumedomain
>    PVHVM: same as PV
>    HVM: do nothing in modify_returncode, and then call the domctl: XEN_DOMCTL_resumedomain
> b. slow path
>    In this case, we have changed the guest's state.
>    PV: update the start info, and reset all secondary CPU states. Then call the
>    domctl: XEN_DOMCTL_resumedomain
>    PVHVM and HVM cannot be resumed.
> 
> For PVHVM, in my test, only call the domctl: XEN_DOMCTL_resumedomain
> can work. I am not sure if we should update start info and reset all secondary CPU
> states.
> 
> For pure HVM guest, in my test, only call the domctl: XEN_DOMCTL_resumedomain can
> work.
> 
> So we can call libxl__domain_resume(..., 1) if we don't change the guest state, otherwise
> call libxl__domain_resume(..., 0).
> 
> Any suggestion is welcomed.
> 
> Thanks
> Wen Congyang
> 
> 
> > 
> > Ian.
> > 
> > 
> >>
> >> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> >> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> >> ---
> >>  tools/libxc/xc_resume.c | 22 ++++++++++++++++++----
> >>  1 file changed, 18 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c
> >> index e67bebd..bd82334 100644
> >> --- a/tools/libxc/xc_resume.c
> >> +++ b/tools/libxc/xc_resume.c
> >> @@ -109,6 +109,23 @@ static int xc_domain_resume_cooperative(xc_interface *xch, uint32_t domid)
> >>      return do_domctl(xch, &domctl);
> >>  }
> >>  
> >> +static int xc_domain_resume_hvm(xc_interface *xch, uint32_t domid)
> >> +{
> >> +    DECLARE_DOMCTL;
> >> +
> >> +    /*
> >> +     * If it is PVHVM, the hypercall return code is 0, because this
> >> +     * is not a fast path resume, we do not modify_returncode as in
> >> +     * xc_domain_resume_cooperative.
> >> +     * (resuming it in a new domain context)
> >> +     *
> >> +     * If it is a HVM, the hypercall is a NOP.
> >> +     */
> >> +    domctl.cmd = XEN_DOMCTL_resumedomain;
> >> +    domctl.domain = domid;
> >> +    return do_domctl(xch, &domctl);
> >> +}
> >> +
> >>  static int xc_domain_resume_any(xc_interface *xch, uint32_t domid)
> >>  {
> >>      DECLARE_DOMCTL;
> >> @@ -138,10 +155,7 @@ static int xc_domain_resume_any(xc_interface *xch, uint32_t domid)
> >>       */
> >>  #if defined(__i386__) || defined(__x86_64__)
> >>      if ( info.hvm )
> >> -    {
> >> -        ERROR("Cannot resume uncooperative HVM guests");
> >> -        return rc;
> >> -    }
> >> +        return xc_domain_resume_hvm(xch, domid);
> >>  
> >>      if ( xc_domain_get_guest_width(xch, domid, &dinfo->guest_width) != 0 )
> >>      {
> > 
> > 
> > .
> > 
> 
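The fast/slow resume dispatch summarised in the reply above can be sketched as a small decision function. The enum and function names below are invented for illustration; they only mirror the behaviour described in the thread (xc_domain_resume_cooperative, xc_domain_resume_hvm and xc_domain_resume_any in libxc):

```c
#include <stdbool.h>

typedef enum {
    RESUME_FAST,      /* modify_returncode to 1; the guest treats the
                         suspend as cancelled (xc_domain_resume_cooperative) */
    RESUME_SLOW_PV,   /* fix up start info, reset secondary vCPU state,
                         then XEN_DOMCTL_resumedomain (xc_domain_resume_any) */
    RESUME_SLOW_HVM,  /* guest state was replaced; just issue
                         XEN_DOMCTL_resumedomain (xc_domain_resume_hvm) */
} resume_path;

/* The fast path is only valid when the guest's state was NOT modified
 * while it was suspended; under COLO the state is always modified, so
 * the slow path must be taken. */
static resume_path pick_resume_path(bool state_unchanged, bool hvm)
{
    if (state_unchanged)
        return RESUME_FAST;
    return hvm ? RESUME_SLOW_HVM : RESUME_SLOW_PV;
}
```

This is the distinction behind the suggested libxl__domain_resume(..., 1) vs libxl__domain_resume(..., 0) calling convention: the flag says whether the guest state was left untouched.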


* Re: [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
  2015-06-11  8:32                               ` Paul Durrant
@ 2015-06-11  8:48                                 ` Wen Congyang
  2015-06-11 10:20                                   ` Paul Durrant
  0 siblings, 1 reply; 106+ messages in thread
From: Wen Congyang @ 2015-06-11  8:48 UTC (permalink / raw)
  To: Paul Durrant, Andrew Cooper, Yang Hongyang, xen-devel
  Cc: Wei Liu, Ian Campbell, guijianfeng, yunhong.jiang, Eddie Dong,
	rshriram, Ian Jackson

On 06/11/2015 04:32 PM, Paul Durrant wrote:
>> -----Original Message-----
>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
>> Sent: 11 June 2015 02:14
>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-devel@lists.xen.org
>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
>> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian Jackson
>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq
>> page only one time
>>
>> On 06/10/2015 07:47 PM, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
>>>> bounces@lists.xen.org] On Behalf Of Wen Congyang
>>>> Sent: 10 June 2015 12:38
>>>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-
>> devel@lists.xen.org
>>>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
>>>> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian Jackson
>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
>> ioreq
>>>> page only one time
>>>>
>>>> On 06/10/2015 06:58 PM, Paul Durrant wrote:
>>>>>> -----Original Message-----
>>>>>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
>>>>>> Sent: 10 June 2015 11:55
>>>>>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-
>>>> devel@lists.xen.org
>>>>>> Cc: Wei Liu; Ian Campbell; yunhong.jiang@intel.com; Eddie Dong;
>>>>>> guijianfeng@cn.fujitsu.com; rshriram@cs.ubc.ca; Ian Jackson
>>>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
>>>> ioreq
>>>>>> page only one time
>>>>>>
>>>>>> On 06/10/2015 06:40 PM, Paul Durrant wrote:
>>>>>>>> -----Original Message-----
>>>>>>>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
>>>>>>>> Sent: 10 June 2015 10:06
>>>>>>>> To: Andrew Cooper; Yang Hongyang; xen-devel@lists.xen.org; Paul
>>>>>> Durrant
>>>>>>>> Cc: Wei Liu; Ian Campbell; yunhong.jiang@intel.com; Eddie Dong;
>>>>>>>> guijianfeng@cn.fujitsu.com; rshriram@cs.ubc.ca; Ian Jackson
>>>>>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore:
>> zero
>>>>>> ioreq
>>>>>>>> page only one time
>>>>>>>>
>>>>>>>> Cc: Paul Durrant
>>>>>>>>
>>>>>>>> On 06/10/2015 03:44 PM, Andrew Cooper wrote:
>>>>>>>>> On 10/06/2015 06:26, Yang Hongyang wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 06/09/2015 03:30 PM, Andrew Cooper wrote:
>>>>>>>>>>> On 09/06/2015 01:59, Yang Hongyang wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 06/08/2015 06:15 PM, Andrew Cooper wrote:
>>>>>>>>>>>>> On 08/06/15 10:58, Yang Hongyang wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 06/08/2015 05:46 PM, Andrew Cooper wrote:
>>>>>>>>>>>>>>> On 08/06/15 04:43, Yang Hongyang wrote:
>>>>>>>>>>>>>>>> ioreq page contains evtchn which will be set when we
>>>> resume
>>>>>> the
>>>>>>>>>>>>>>>> secondary vm the first time. The hypervisor will check if
>> the
>>>>>>>>>>>>>>>> evtchn is corrupted, so we cannot zero the ioreq page
>> more
>>>>>>>>>>>>>>>> than one time.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The ioreq->state is always STATE_IOREQ_NONE after the
>> vm
>>>> is
>>>>>>>>>>>>>>>> suspended, so it is OK if we only zero it one time.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>>>>>>>>>>>>>>>> Signed-off-by: Wen congyang <wency@cn.fujitsu.com>
>>>>>>>>>>>>>>>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The issue here is that we are running the restore algorithm
>>>> over
>>>>>> a
>>>>>>>>>>>>>>> domain which has already been running in Xen for a while.
>>>> This
>>>>>> is a
>>>>>>>>>>>>>>> brand new usecase, as far as I am aware.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Exactly.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Does the qemu process associated with this domain get
>>>> frozen
>>>>>>>>>>>>>>> while the
>>>>>>>>>>>>>>> secondary is being reset, or does the process get destroyed
>>>> and
>>>>>>>>>>>>>>> recreated.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What do you mean by reset? do you mean secondary is
>>>>>> suspended
>>>>>>>> at
>>>>>>>>>>>>>> checkpoint?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Well - at the point that the buffered records are being
>>>> processed,
>>>>>> we
>>>>>>>>>>>>> are in the process of resetting the state of the secondary to
>>>> match
>>>>>>>>>>>>> the
>>>>>>>>>>>>> primary.
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, at this point, the qemu process associated with this
>> domain is
>>>>>>>>>>>> frozen.
>>>>>>>>>>>> the suspend callback will call libxl__qmp_stop(vm_stop() in
>>>> qemu)
>>>>>> to
>>>>>>>>>>>> pause
>>>>>>>>>>>> qemu. After we processed all records, qemu will be restored
>> with
>>>>>> the
>>>>>>>>>>>> received
>>>>>>>>>>>> state, that's why we add a
>>>>>> libxl__qmp_restore(qemu_load_vmstate()
>>>>>>>> in
>>>>>>>>>>>> qemu)
>>>>>>>>>>>> api to restore qemu with received state. Currently in libxl,
>> qemu
>>>> only
>>>>>>>>>>>> start
>>>>>>>>>>>> with the received state, there's no api to load received state
>> while
>>>>>>>>>>>> qemu is
>>>>>>>>>>>> running for a while.
>>>>>>>>>>>
>>>>>>>>>>> Now I consider this more, it is absolutely wrong to not zero the
>>>> page
>>>>>>>>>>> here.  The event channel in the page is not guaranteed to be the
>>>>>> same
>>>>>>>>>>> between the primary and secondary,
>>>>>>>>>>
>>>>>>>>>> That's why we don't zero it on secondary.
>>>>>>>>>
>>>>>>>>> I think you missed my point.  Apologies for the double negative.   It
>>>>>>>>> must, under all circumstances, be zeroed at this point, for safety
>>>>>> reasons.
>>>>>>>>>
>>>>>>>>> The page in question is subject to logdirty just like any other guest
>>>>>>>>> pages, which means that if the guest writes to it naturally (i.e. not a
>>>>>>>>> Xen or Qemu write, both of whom have magic mappings which are
>>>> not
>>>>>>>>> subject to logdirty), it will be transmitted in the stream.  As the
>>>>>>>>> event channel could be different, the lack of zeroing it at this point
>>>>>>>>> means that the event channel would be wrong as opposed to
>> simply
>>>>>>>>> missing.  This is a worse position to be in.
>>>>>>>>
>>>>>>>> The guest should not access this page. I am not sure if the guest can
>>>>>>>> access the ioreq page.
>>>>>>>>
>>>>>>>> But in the exceptional case, the ioreq page is dirtied, and is copied to
>>>>>>>> the secondary vm. The ioreq page will contain a wrong event
>> channel,
>>>> the
>>>>>>>> hypervisor will check it: if the event channel is wrong, the guest will
>>>>>>>> be crashed.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> and we don't want to unexpectedly
>>>>>>>>>>> find a pending/in-flight ioreq.
>>>>>>>>>>
>>>>>>>>>> ioreq->state is always STATE_IOREQ_NONE after the vm is
>>>> suspended,
>>>>>>>> there
>>>>>>>>>> should be no pending/in-flight ioreq at checkpoint.
>>>>>>>>>
>>>>>>>>> In the common case perhaps, but we must consider the
>> exceptional
>>>>>> case.
>>>>>>>>> The exceptional case here is some corruption which happens to
>>>> appear
>>>>>> as
>>>>>>>>> an in-flight ioreq.
>>>>>>>>
>>>>>>>> If the state is not STATE_IOREQ_NONE, it may be a hypervisor bug. If
>>>>>>>> the hypervisor has a bug, anything can happen. I think we should
>>>>>>>> trust the hypervisor.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Either qemu needs to take care of re-initialising the event
>> channels
>>>>>>>>>>> back to appropriate values, or Xen should tolerate the channels
>>>>>>>>>>> disappearing.
>>>>>>>>>
>>>>>>>>> I still stand by this statement.  I believe it is the only safe way of
>>>>>>>>> solving the issue you have discovered.
>>>>>>>>
>>>>>>>> Add a new qemu monitor command to update ioreq page?
>>>>>>>>
>>>>>>>
>>>>>>> If you're attaching to a 'new' VM (i.e one with an updated image) then
>> I
>>>>>> suspect you're going to have to destroy and re-create the ioreq server
>> so
>>>>>> that the shared page gets re-populated with the correct event
>> channels.
>>>>>> Either that or you're going to have to ensure that the page is not part of
>>>>>> restored image and sample the new one that Xen should have set up.
>>>>>>
>>>>>>
>>>>>> I agree with it. I will try to add a new qemu monitor command(or do it
>>>> when
>>>>>> updating qemu's state) to destroy and re-create it.
>>>>>
>>>>> The slightly tricky part of that is that you're going to have to cache and
>>>> replay all the registrations that were done on the old instance, but you
>> need
>>>> to do that in any case as it's not state that is transferred in the VM save
>>>> record.
>>>>
>>>> Why do we have to cache and replay all the registrations that were done
>> on
>>>> the old instance?
>>>
>>> Do you not have device models that you need to continue to function?
>> When the ioreq server is torn down then all MMIO, port IO and PCI config
>> ranges that were mapped to it will disappear.
>>
>> Yes, I don't known which should be done unless I implement and test it.
>>
>> I have some questions about it:
>> 1. Can the guest access the ioreq page? If the page is modified by the
>>    guest unexpectedly, what will happen?
> 
> No, the guest cannot modify the pages once a non-default ioreq server is active. The pages are removed from the guest P2M when it is activated, which is one of the reasons for modifying QEMU to not behave as a legacy default server.
> 
>> 2. If the ioreq page is dirtied by the guest, it will be transferred from
>>    primary to secondary during the checkpoint. The evtchn is then invalid;
>>    I think the best behavior is to crash the guest rather than let it
>>    continue to run.
> 
> As I said, the pages are not in the P2M if the server is active so they will not be transferred as part of the VM state. However, this presents a problem; at the far end, the emulator will not be able to hook into the guest. So, when the source domain is paused, the ioreq server needs to be torn down (so that its pages are re-inserted into the P2M and marked dirty for transfer). This is what happens in a normal migration. One extra problem you have is that the source domain is not then killed, it is resumed along with the emulator. Thus, on resume, the emulator needs to create a new ioreq server and re-register all its device models with that new server.
> I don't know the detail of what you do at the far end, but if you always start a new emulator instance using the QEMU save record then you should be fine (just like with a normal migration).

I can't find the code where the ioreq server is torn down when the source domain is paused. Which function does it?

In our implementation, we don't start a new emulator. The code works, but some bugs may not be triggered.

Thanks
Wen Congyang

> 
>   Paul
>  
>>
>> Thanks
>> Wen Congyang
>>
>>>
>>>   Paul
>>>
>>>> We will set to the guest to a new state, the old state should be dropped.
>>>>
>>>> Thanks
>>>> Wen Congyang
>>>>
>>>>>
>>>>>   Paul
>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> Wen Congyang
>>>>>>
>>>>>>>
>>>>>>>   Paul
>>>>>>>
>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Wen Congyang
>>>>>>>>
>>>>>>>>>
>>>>>>>>> ~Andrew
>>>>>>>>> .
>>>>>>>>>
>>>>>>>
>>>>>>> .
>>>>>>>
>>>>>
>>>>> .
>>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Xen-devel mailing list
>>>> Xen-devel@lists.xen.org
>>>> http://lists.xen.org/xen-devel
>>>
>>> _______________________________________________
>>> Xen-devel mailing list
>>> Xen-devel@lists.xen.org
>>> http://lists.xen.org/xen-devel
>>> .
>>>
> 
> .
> 
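Paul's point about having to "cache and replay all the registrations" when an ioreq server is destroyed and re-created can be modelled in isolation. The structures below are invented for illustration and do not match Xen's actual ioreq server data layout; they only show the cache-then-replay idea:

```c
#include <stdint.h>
#include <string.h>

#define MAX_RANGES 8

struct io_range { uint64_t start, end; };

/* Toy model of an ioreq server's registered MMIO/port/PCI config ranges. */
struct ioreq_server {
    struct io_range ranges[MAX_RANGES];
    int nr_ranges;
};

/* Toolstack/emulator-side cache of everything registered with the server,
 * kept so the registrations can be replayed onto a new server instance
 * (this state is not part of the VM save record). */
struct range_cache {
    struct io_range ranges[MAX_RANGES];
    int nr_ranges;
};

/* Registering a range with the live server also records it in the cache. */
static int map_range(struct ioreq_server *s, struct range_cache *c,
                     uint64_t start, uint64_t end)
{
    if (s->nr_ranges >= MAX_RANGES)
        return -1;
    s->ranges[s->nr_ranges++] = (struct io_range){ start, end };
    c->ranges[c->nr_ranges++] = (struct io_range){ start, end };
    return 0;
}

/* After tearing the old server down and creating a fresh one, replay
 * every cached registration onto the new instance. */
static void replay_ranges(struct ioreq_server *fresh,
                          const struct range_cache *c)
{
    memcpy(fresh->ranges, c->ranges, sizeof(c->ranges));
    fresh->nr_ranges = c->nr_ranges;
}
```

Without the replay step, all MMIO, port I/O and PCI config ranges mapped to the old server would simply disappear when it is torn down, which is exactly the failure mode Paul warns about.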


* Re: [PATCH v2 COLOPre 05/13] tools/libxl: introduce a new API libxl__domain_restore() to load qemu state
  2015-06-11  8:43       ` Ian Campbell
@ 2015-06-11  8:55         ` Yang Hongyang
  2015-06-11  9:41           ` Ian Campbell
  0 siblings, 1 reply; 106+ messages in thread
From: Yang Hongyang @ 2015-06-11  8:55 UTC (permalink / raw)
  To: Ian Campbell, Stefano Stabellini, Anthony Perard
  Cc: wei.liu2, wency, andrew.cooper3, yunhong.jiang, eddie.dong,
	xen-devel, guijianfeng, rshriram, ian.jackson



On 06/11/2015 04:43 PM, Ian Campbell wrote:
> On Thu, 2015-06-11 at 10:09 +0800, Yang Hongyang wrote:
>>
>> On 06/10/2015 11:35 PM, Ian Campbell wrote:
>>> On Mon, 2015-06-08 at 11:43 +0800, Yang Hongyang wrote:
>>>> Secondary vm is running in colo mode. So we will do
>>>> the following things again and again:
>>>> 1. suspend both primary vm and secondary vm
>>>> 2. sync the state
>>>> 3. resume both primary vm and secondary vm
>>>> We will send qemu's state each time in step2, and
>>>> slave's qemu should read it each time before resuming
>>>> secondary vm. Introduce a new API libxl__domain_restore()
>>>> to do it. This API should be called before resuming
>>>> secondary vm.
>>>
>>> Is this a preexisting qemu interface or one to be added?
>>
>> We added the qemu interface "xen-load-devices-state";
>> it's not in qemu upstream yet.
>
> OK, please mention this dependency in the commit text since we will want
> to be sure the interface is going to be accepted in this form by QEMU
> upstream before we start using it. Please also CC the QEMU maintainers
> on this patch in the future (by adding Cc: below the S-o-b if you don't
> want to spam them the whole series), I've added them here now.
>
> In particular "devices" seems odd to me, perhaps
> "xen-load-device-state"?

This API is the inverse of "xen-save-devices-state"; we used the name
"xen-load-devices-state" to follow the existing naming style...

>
>>
>>>
>>>>
>>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>>>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>>>> ---
>>>>    tools/libxl/libxl_dom_save.c | 47 ++++++++++++++++++++++++++++++++++++++++++++
>>>>    tools/libxl/libxl_internal.h |  4 ++++
>>>>    tools/libxl/libxl_qmp.c      | 10 ++++++++++
>>>>    3 files changed, 61 insertions(+)
>>>>
>>>> diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
>>>> index 74a6bae..f9627f8 100644
>>>> --- a/tools/libxl/libxl_dom_save.c
>>>> +++ b/tools/libxl/libxl_dom_save.c
>>>> @@ -663,6 +663,53 @@ int libxl__toolstack_restore(uint32_t domid, const uint8_t *buf,
>>>>        return 0;
>>>>    }
>>>>
>>>> +int libxl__domain_restore(libxl__gc *gc, uint32_t domid)
>>>> +{
>>>> +    int rc = 0;
>>>> +
>>>> +    libxl_domain_type type = libxl__domain_type(gc, domid);
>>>> +    if (type != LIBXL_DOMAIN_TYPE_HVM) {
>>>> +        rc = ERROR_FAIL;
>>>> +        goto out;
>>>> +    }
>>>> +
>>>> +    rc = libxl__domain_restore_device_model(gc, domid);
>>>> +    if (rc)
>>>> +        LOG(ERROR, "failed to restore device model for domain %u:%d",
>>>> +            domid, rc);
>>>> +out:
>>>> +    return rc;
>>>> +}
>>>> +
>>>> +int libxl__domain_restore_device_model(libxl__gc *gc, uint32_t domid)
>>>> +{
>>>> +    char *state_file;
>>>> +    int rc;
>>>> +
>>>> +    switch (libxl__device_model_version_running(gc, domid)) {
>>>> +    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
>>>> +        /* not supported now */
>>>> +        rc = ERROR_INVAL;
>>>> +        break;
>>>> +    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
>>>> +        /*
>>>> +         * This function may be called too many times for the same gc,
>>>> +         * so we use NOGC, and free the memory before return to avoid
>>>> +         * OOM.
>>>> +         */
>>>> +        state_file = libxl__sprintf(NOGC,
>>>> +                                    XC_DEVICE_MODEL_RESTORE_FILE".%d",
>>>> +                                    domid);
>>>> +        rc = libxl__qmp_restore(gc, domid, state_file);
>>>> +        free(state_file);
>>>> +        break;
>>>> +    default:
>>>> +        rc = ERROR_INVAL;
>>>> +    }
>>>> +
>>>> +    return rc;
>>>> +}
>>>> +
>>>>    /*
>>>>     * Local variables:
>>>>     * mode: C
>>>> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
>>>> index 1905195..20364c6 100644
>>>> --- a/tools/libxl/libxl_internal.h
>>>> +++ b/tools/libxl/libxl_internal.h
>>>> @@ -1022,6 +1022,7 @@ _hidden int libxl__domain_rename(libxl__gc *gc, uint32_t domid,
>>>>
>>>>    _hidden int libxl__toolstack_restore(uint32_t domid, const uint8_t *buf,
>>>>                                         uint32_t size, void *data);
>>>> +_hidden int libxl__domain_restore_device_model(libxl__gc *gc, uint32_t domid);
>>>>    _hidden int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid);
>>>>
>>>>    _hidden const char *libxl__userdata_path(libxl__gc *gc, uint32_t domid,
>>>> @@ -1039,6 +1040,7 @@ _hidden int libxl__userdata_store(libxl__gc *gc, uint32_t domid,
>>>>                                      const char *userdata_userid,
>>>>                                      const uint8_t *data, int datalen);
>>>>
>>>> +_hidden int libxl__domain_restore(libxl__gc *gc, uint32_t domid);
>>>>    _hidden int libxl__domain_resume(libxl__gc *gc, uint32_t domid,
>>>>                                     int suspend_cancel);
>>>>    _hidden int libxl__domain_s3_resume(libxl__gc *gc, int domid);
>>>> @@ -1651,6 +1653,8 @@ _hidden int libxl__qmp_stop(libxl__gc *gc, int domid);
>>>>    _hidden int libxl__qmp_resume(libxl__gc *gc, int domid);
>>>>    /* Save current QEMU state into fd. */
>>>>    _hidden int libxl__qmp_save(libxl__gc *gc, int domid, const char *filename);
>>>> +/* Load current QEMU state from fd. */
>>>> +_hidden int libxl__qmp_restore(libxl__gc *gc, int domid, const char *filename);
>>>>    /* Set dirty bitmap logging status */
>>>>    _hidden int libxl__qmp_set_global_dirty_log(libxl__gc *gc, int domid, bool enable);
>>>>    _hidden int libxl__qmp_insert_cdrom(libxl__gc *gc, int domid, const libxl_device_disk *disk);
>>>> diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c
>>>> index 9aa7e2e..a6f1a21 100644
>>>> --- a/tools/libxl/libxl_qmp.c
>>>> +++ b/tools/libxl/libxl_qmp.c
>>>> @@ -892,6 +892,16 @@ int libxl__qmp_save(libxl__gc *gc, int domid, const char *filename)
>>>>                               NULL, NULL);
>>>>    }
>>>>
>>>> +int libxl__qmp_restore(libxl__gc *gc, int domid, const char *state_file)
>>>> +{
>>>> +    libxl__json_object *args = NULL;
>>>> +
>>>> +    qmp_parameters_add_string(gc, &args, "filename", state_file);
>>>> +
>>>> +    return qmp_run_command(gc, domid, "xen-load-devices-state", args,
>>>> +                           NULL, NULL);
>>>> +}
>>>> +
>>>>    static int qmp_change(libxl__gc *gc, libxl__qmp_handler *qmp,
>>>>                          char *device, char *target, char *arg)
>>>>    {
>>>
>>>
>>> .
>>>
>>
>
>
> .
>

-- 
Thanks,
Yang.

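[Editor's note] For reference, the QMP traffic behind the libxl__qmp_restore() added above is a single command object written to QEMU's QMP socket. The sketch below only formats that wire message; it is illustrative (the helper name and buffer handling are not libxl code), assuming only the command name taken from the patch:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative helper (not libxl code): format the QMP command that
 * libxl__qmp_restore() would issue for a given QEMU state file.
 * qmp_run_command() serialises roughly this JSON object. */
static int build_load_state_cmd(char *buf, size_t len, const char *state_file)
{
    return snprintf(buf, len,
                    "{ \"execute\": \"xen-load-devices-state\", "
                    "\"arguments\": { \"filename\": \"%s\" } }",
                    state_file);
}
```

The "filename" argument mirrors what qmp_parameters_add_string() attaches in the patch; everything else about the exchange (greeting, capabilities negotiation, response parsing) is handled by libxl's QMP machinery.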
^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 02/13] tools/libxc: support to resume uncooperative HVM guests
  2015-06-11  8:44       ` Ian Campbell
@ 2015-06-11  8:56         ` Wen Congyang
  2015-06-11  9:41           ` Ian Campbell
  0 siblings, 1 reply; 106+ messages in thread
From: Wen Congyang @ 2015-06-11  8:56 UTC (permalink / raw)
  To: Ian Campbell
  Cc: wei.liu2, eddie.dong, andrew.cooper3, yunhong.jiang, ian.jackson,
	xen-devel, guijianfeng, rshriram, Yang Hongyang

On 06/11/2015 04:44 PM, Ian Campbell wrote:
> On Thu, 2015-06-11 at 10:42 +0800, Wen Congyang wrote:
>> On 06/10/2015 11:18 PM, Ian Campbell wrote:
>>> On Mon, 2015-06-08 at 11:43 +0800, Yang Hongyang wrote:
>>>> From: Wen Congyang <wency@cn.fujitsu.com>
>>>>
>>>> For PVHVM, the hypercall return code is 0, and it can be resumed
>>>> in a new domain context.
>>>> We suspend and resume a PVHVM guest like this:
>>>> 1. suspend it via evtchn
>>>> 2. modify the return code to 1
>>>> 3. the guest knows that the suspend is cancelled, and we will use the
>>>>    fast path to resume it.
>>>>
>>>> Under COLO, we will update the guest's state (modify memory, cpu's registers,
>>>> device status...). In this case, we cannot use the fast path to resume it.
>>>> Keep the return code 0, and use a slow path to resume the guest. We have
>>>> updated the guest state, so we call it a new domain context.
>>>>
>>>> For HVM, the hypercall is a NOP.
>>>
>>> This doesn't match my reading of domain_resume on the Xen side, which is
>>> the ultimate effect of this hypercall. It seems to unpause the domain
>>> (and all vcpus) regardless of the domain type, including PVHVM vs HVM
>>> (which isn't something Xen is generally aware of anyway).
>>>
>>> I also can't really follow the stuff about PVHVM vs HVM vs uncooperative
>>> guests, and I certainly can't see where the PVHVM vs HVM distinction is
>>> made in this patch.
>>
>> Sorry for my mistake. I read the codes again:
>>
>> 1. suspend
>> a. PVHVM and PV: we use the same way to suspend the guest (send the suspend
>>    request to the guest)
>> b. pure HVM: we call xc_domain_shutdown(..., SHUTDOWN_suspend) to suspend
>>    the guest
>> c. ???: suspending the guest via XenBus control node
> 
> AFAIK c is another option under a, it depends on whether the guest
> supports evtchn or not, if not then the xenstore variant will be used.

I remember it now. IIRC, the behavior in the guest is the same. Is that right?

Thanks
Wen Congyang

> 
>> I don't know in which case we would go to c.
>>
>> 2. Resume:
>> a. fast path
>>    In this case, we don't change the guest's state.
>>    PV: modify the return code to 1, and then call the domctl: XEN_DOMCTL_resumedomain
>>    PVHVM: same as PV
>>    HVM: do nothing in modify_returncode, and then call the domctl: XEN_DOMCTL_resumedomain
>> b. slow path
>>    In this case, we have changed the guest's state.
>>    PV: update the start info, and reset all secondary CPU states. Then call the
>>    domctl: XEN_DOMCTL_resumedomain
>>    PVHVM and HVM can not be resumed.
>>
>> For PVHVM, in my test, only calling the domctl XEN_DOMCTL_resumedomain
>> works. I am not sure if we should update the start info and reset all
>> secondary CPU states.
>>
>> For a pure HVM guest, in my test, only calling the domctl
>> XEN_DOMCTL_resumedomain works.
>>
>> So we can call libxl__domain_resume(..., 1) if we don't change the guest state, otherwise
>> call libxl__domain_resume(..., 0).
>>
>> Any suggestion is welcomed.
>>
>> Thanks
>> Wen Congyang
>>
>>
>>>
>>> Ian.
>>>
>>>
>>>>
>>>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>>>> ---
>>>>  tools/libxc/xc_resume.c | 22 ++++++++++++++++++----
>>>>  1 file changed, 18 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c
>>>> index e67bebd..bd82334 100644
>>>> --- a/tools/libxc/xc_resume.c
>>>> +++ b/tools/libxc/xc_resume.c
>>>> @@ -109,6 +109,23 @@ static int xc_domain_resume_cooperative(xc_interface *xch, uint32_t domid)
>>>>      return do_domctl(xch, &domctl);
>>>>  }
>>>>  
>>>> +static int xc_domain_resume_hvm(xc_interface *xch, uint32_t domid)
>>>> +{
>>>> +    DECLARE_DOMCTL;
>>>> +
>>>> +    /*
>>>> +     * If it is PVHVM, the hypercall return code is 0, because this
>>>> +     * is not a fast path resume, we do not modify_returncode as in
>>>> +     * xc_domain_resume_cooperative.
>>>> +     * (resuming it in a new domain context)
>>>> +     *
>>>> +     * If it is a HVM, the hypercall is a NOP.
>>>> +     */
>>>> +    domctl.cmd = XEN_DOMCTL_resumedomain;
>>>> +    domctl.domain = domid;
>>>> +    return do_domctl(xch, &domctl);
>>>> +}
>>>> +
>>>>  static int xc_domain_resume_any(xc_interface *xch, uint32_t domid)
>>>>  {
>>>>      DECLARE_DOMCTL;
>>>> @@ -138,10 +155,7 @@ static int xc_domain_resume_any(xc_interface *xch, uint32_t domid)
>>>>       */
>>>>  #if defined(__i386__) || defined(__x86_64__)
>>>>      if ( info.hvm )
>>>> -    {
>>>> -        ERROR("Cannot resume uncooperative HVM guests");
>>>> -        return rc;
>>>> -    }
>>>> +        return xc_domain_resume_hvm(xch, domid);
>>>>  
>>>>      if ( xc_domain_get_guest_width(xch, domid, &dinfo->guest_width) != 0 )
>>>>      {
>>>
>>>
>>> .
>>>
>>
> 
> 
> .
> 

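[Editor's note] The resume decision table Wen lays out above can be condensed into a few lines. This is only a sketch of the selection logic — the enum and function names are invented for illustration; the real code is spread across libxl__domain_resume() and xc_domain_resume():

```c
#include <string.h>

enum guest_type { GUEST_PV, GUEST_PVHVM, GUEST_HVM };

/* Sketch of the path selection discussed above: the fast path
 * (modify_returncode + XEN_DOMCTL_resumedomain) is only valid when the
 * guest state was NOT changed; once COLO has rewritten memory and
 * registers, PV needs the full slow path (update start info, reset
 * secondary vcpus), while PVHVM/HVM guests just get the domctl. */
static const char *pick_resume_path(enum guest_type t, int state_changed)
{
    if (!state_changed)
        return "fast";
    if (t == GUEST_PV)
        return "slow";
    return "domctl-only";
}
```

This matches the proposal at the end of the quoted mail: callers that did not change guest state use libxl__domain_resume(..., 1) (fast), and COLO uses libxl__domain_resume(..., 0).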
^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 06/13] tools/libxl: Introduce a new internal API libxl__domain_unpause()
  2015-06-11  8:43       ` Ian Campbell
@ 2015-06-11  9:09         ` Wen Congyang
  2015-06-11  9:42           ` Ian Campbell
  0 siblings, 1 reply; 106+ messages in thread
From: Wen Congyang @ 2015-06-11  9:09 UTC (permalink / raw)
  To: Ian Campbell, Yang Hongyang
  Cc: wei.liu2, andrew.cooper3, yunhong.jiang, eddie.dong, xen-devel,
	guijianfeng, rshriram, ian.jackson

On 06/11/2015 04:43 PM, Ian Campbell wrote:
> On Thu, 2015-06-11 at 10:21 +0800, Yang Hongyang wrote:
>>
>> On 06/10/2015 11:37 PM, Ian Campbell wrote:
>>> On Mon, 2015-06-08 at 11:43 +0800, Yang Hongyang wrote:
>>>> From: Wen Congyang <wency@cn.fujitsu.com>
>>>>
>>>> The guest is paused after libxl_domain_create_restore().
>>>> Secondary vm is running in colo mode. So we need to unpause
>>>> the guest. The current API libxl_domain_unpause() is
>>>> not an internal API. Introduce a new API to support it.
>>>> No functional change.
>>>
>>> In general there is nothing wrong with using a public function
>>> internally. Is there some special consideration here?
>>
>> It's just that we thought it's better to use internal functions for
>> internal purpose.
>> Most of the public functions take ctx as the first param, while the internal
>> functions take gc/egc as the first param (although we can get ctx from the gc
>> and call public functions when needed).
>> If it doesn't matter, we can drop this patch.
> 
> It doesn't matter so you can, yes.

If the public API creates a new AO, is it safe to call it directly?

Thanks
Wen Congyang

> 
>>
>>>
>>>>
>>>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>>>> ---
>>>>   tools/libxl/libxl.c          | 20 ++++++++++++++------
>>>>   tools/libxl/libxl_internal.h |  1 +
>>>>   2 files changed, 15 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
>>>> index ba2da92..d5691dc 100644
>>>> --- a/tools/libxl/libxl.c
>>>> +++ b/tools/libxl/libxl.c
>>>> @@ -933,9 +933,8 @@ out:
>>>>       return AO_INPROGRESS;
>>>>   }
>>>>
>>>> -int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid)
>>>> +int libxl__domain_unpause(libxl__gc *gc, uint32_t domid)
>>>>   {
>>>> -    GC_INIT(ctx);
>>>>       char *path;
>>>>       char *state;
>>>>       int ret, rc = 0;
>>>> @@ -947,7 +946,7 @@ int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid)
>>>>       }
>>>>
>>>>       if (type == LIBXL_DOMAIN_TYPE_HVM) {
>>>> -        uint32_t dm_domid = libxl_get_stubdom_id(ctx, domid);
>>>> +        uint32_t dm_domid = libxl_get_stubdom_id(CTX, domid);
>>>>
>>>>           path = libxl__device_model_xs_path(gc, dm_domid, domid, "/state");
>>>>           state = libxl__xs_read(gc, XBT_NULL, path);
>>>> @@ -957,12 +956,21 @@ int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid)
>>>>                                            NULL, NULL, NULL);
>>>>           }
>>>>       }
>>>> -    ret = xc_domain_unpause(ctx->xch, domid);
>>>> -    if (ret<0) {
>>>> -        LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "unpausing domain %d", domid);
>>>> +
>>>> +    ret = xc_domain_unpause(CTX->xch, domid);
>>>> +    if (ret < 0) {
>>>> +        LIBXL__LOG_ERRNO(CTX, LIBXL__LOG_ERROR, "unpausing domain %d", domid);
>>>>           rc = ERROR_FAIL;
>>>>       }
>>>>    out:
>>>> +    return rc;
>>>> +}
>>>> +
>>>> +int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid)
>>>> +{
>>>> +    GC_INIT(ctx);
>>>> +    int rc = libxl__domain_unpause(gc, domid);
>>>> +
>>>>       GC_FREE;
>>>>       return rc;
>>>>   }
>>>> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
>>>> index 20364c6..366470f 100644
>>>> --- a/tools/libxl/libxl_internal.h
>>>> +++ b/tools/libxl/libxl_internal.h
>>>> @@ -1044,6 +1044,7 @@ _hidden int libxl__domain_restore(libxl__gc *gc, uint32_t domid);
>>>>   _hidden int libxl__domain_resume(libxl__gc *gc, uint32_t domid,
>>>>                                    int suspend_cancel);
>>>>   _hidden int libxl__domain_s3_resume(libxl__gc *gc, int domid);
>>>> +_hidden int libxl__domain_unpause(libxl__gc *gc, uint32_t domid);
>>>>
>>>>   /* returns 0 or 1, or a libxl error code */
>>>>   _hidden int libxl__domain_pvcontrol_available(libxl__gc *gc, uint32_t domid);
>>>
>>>
>>> .
>>>
>>
> 
> 
> .
> 

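[Editor's note] The refactoring in the patch above follows libxl's standard shape: the internal variant takes a gc, and the public entry point is a thin wrapper that owns the gc's lifetime. A standalone caricature of that pattern — the types and names below are stand-ins, not real libxl definitions:

```c
/* Stand-in for libxl's allocation context; the public wrapper plays the
 * role of GC_INIT/GC_FREE around the internal call. */
struct fake_gc { int allocations; };

static int internal_unpause(struct fake_gc *gc, unsigned int domid)
{
    gc->allocations++;        /* internal code may allocate from the gc */
    return domid ? 0 : -1;    /* pretend domain 0 cannot be unpaused */
}

int public_unpause(unsigned int domid)
{
    struct fake_gc gc = { 0 };               /* GC_INIT(ctx) */
    int rc = internal_unpause(&gc, domid);
    /* GC_FREE would release everything tracked in the gc here */
    return rc;
}
```

The design question in the thread is exactly whether this split is worth it when the public function is safe to call internally anyway.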
^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 05/13] tools/libxl: introduce a new API libxl__domain_restore() to load qemu state
  2015-06-11  8:55         ` Yang Hongyang
@ 2015-06-11  9:41           ` Ian Campbell
  0 siblings, 0 replies; 106+ messages in thread
From: Ian Campbell @ 2015-06-11  9:41 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: wei.liu2, wency, Stefano Stabellini, andrew.cooper3,
	yunhong.jiang, eddie.dong, xen-devel, rshriram, guijianfeng,
	Anthony Perard, ian.jackson

On Thu, 2015-06-11 at 16:55 +0800, Yang Hongyang wrote:
> 
> On 06/11/2015 04:43 PM, Ian Campbell wrote:
> > On Thu, 2015-06-11 at 10:09 +0800, Yang Hongyang wrote:
> >>
> >> On 06/10/2015 11:35 PM, Ian Campbell wrote:
> >>> On Mon, 2015-06-08 at 11:43 +0800, Yang Hongyang wrote:
> >>>> Secondary vm is running in colo mode. So we will do
> >>>> the following things again and again:
> >>>> 1. suspend both primary vm and secondary vm
> >>>> 2. sync the state
> >>>> 3. resume both primary vm and secondary vm
> >>>> We will send qemu's state each time in step2, and
> >>>> slave's qemu should read it each time before resuming
> >>>> secondary vm. Introduce a new API libxl__domain_restore()
> >>>> to do it. This API should be called before resuming
> >>>> secondary vm.
> >>>
> >>> Is this a preexisting qemu interface or one to be added?
> >>
> >> We added the qemu interface "xen-load-devices-state",
> >> it's not in qemu upstream yet.
> >
> > OK, please mention this dependency in the commit text since we will want
> > to be sure the interface is going to be accepted in this form by QEMU
> > upstream before we start using it. Please also CC the QEMU maintainers
> > on this patch in the future (by adding Cc: below the S-o-b if you don't
> > want to spam them the whole series), I've added them here now.
> >
> > In particular "devices" seems odd to me, perhaps
> > "xen-load-device-state"?
> 
> This API is the inverse operation of "xen-save-devices-state"; we used the name
> "xen-load-devices-state" in order to follow the existing naming style...

OK, I guess it'll do then, thanks.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 02/13] tools/libxc: support to resume uncooperative HVM guests
  2015-06-11  8:56         ` Wen Congyang
@ 2015-06-11  9:41           ` Ian Campbell
  0 siblings, 0 replies; 106+ messages in thread
From: Ian Campbell @ 2015-06-11  9:41 UTC (permalink / raw)
  To: Wen Congyang
  Cc: wei.liu2, ian.jackson, yunhong.jiang, eddie.dong, xen-devel,
	guijianfeng, andrew.cooper3, rshriram, Yang Hongyang

On Thu, 2015-06-11 at 16:56 +0800, Wen Congyang wrote:
> On 06/11/2015 04:44 PM, Ian Campbell wrote:
> > On Thu, 2015-06-11 at 10:42 +0800, Wen Congyang wrote:
> >> On 06/10/2015 11:18 PM, Ian Campbell wrote:
> >>> On Mon, 2015-06-08 at 11:43 +0800, Yang Hongyang wrote:
> >>>> From: Wen Congyang <wency@cn.fujitsu.com>
> >>>>
> >>>> For PVHVM, the hypercall return code is 0, and it can be resumed
> >>>> in a new domain context.
> >>>> We suspend and resume a PVHVM guest like this:
> >>>> 1. suspend it via evtchn
> >>>> 2. modify the return code to 1
> >>>> 3. the guest knows that the suspend is cancelled, and we will use the
> >>>>    fast path to resume it.
> >>>>
> >>>> Under COLO, we will update the guest's state (modify memory, cpu's registers,
> >>>> device status...). In this case, we cannot use the fast path to resume it.
> >>>> Keep the return code 0, and use a slow path to resume the guest. We have
> >>>> updated the guest state, so we call it a new domain context.
> >>>>
> >>>> For HVM, the hypercall is a NOP.
> >>>
> >>> This doesn't match my reading of domain_resume on the Xen side, which is
> >>> the ultimate effect of this hypercall. It seems to unpause the domain
> >>> (and all vcpus) regardless of the domain type, including PVHVM vs HVM
> >>> (which isn't something Xen is generally aware of anyway).
> >>>
> >>> I also can't really follow the stuff about PVHVM vs HVM vs uncooperative
> >>> guests, and I certainly can't see where the PVHVM vs HVM distinction is
> >>> made in this patch.
> >>
> >> Sorry for my mistake. I read the codes again:
> >>
> >> 1. suspend
> >> a. PVHVM and PV: we use the same way to suspend the guest (send the suspend
> >>    request to the guest)
> >> b. pure HVM: we call xc_domain_shutdown(..., SHUTDOWN_suspend) to suspend
> >>    the guest
> >> c. ???: suspending the guest via XenBus control node
> > 
> > AFAIK c is another option under a, it depends on whether the guest
> > supports evtchn or not, if not then the xenstore variant will be used.
> 
> I remember it now. IIRC, the behavior in the guest is the same. Is that right?

I _think_ so, but I don't know for sure, you'd have to check.

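[Editor's note] The suspend-path split discussed above — event channel when the guest registered one, the xenstore control node as fallback, and SHUTDOWN_suspend for pure HVM — amounts to a small selection. This is a sketch under stated assumptions: the flag and function names are invented, and the real logic lives in libxl's suspend machinery:

```c
enum suspend_method {
    SUSPEND_EVTCHN,              /* (a) PV/PVHVM with a suspend event channel */
    SUSPEND_XENSTORE,            /* (c) fall back to the control/shutdown node */
    SUSPEND_SHUTDOWN_HYPERCALL,  /* (b) pure HVM: SHUTDOWN_suspend */
};

/* Sketch: pick how to ask the guest to suspend, per the a/b/c cases
 * enumerated in the quoted mail. */
static enum suspend_method pick_suspend_method(int is_pure_hvm,
                                               int has_suspend_evtchn)
{
    if (is_pure_hvm)
        return SUSPEND_SHUTDOWN_HYPERCALL;
    return has_suspend_evtchn ? SUSPEND_EVTCHN : SUSPEND_XENSTORE;
}
```

As Ian notes, (a) and (c) should look the same from inside the guest; only the delivery mechanism of the suspend request differs.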
^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 06/13] tools/libxl: Introduce a new internal API libxl__domain_unpause()
  2015-06-11  9:09         ` Wen Congyang
@ 2015-06-11  9:42           ` Ian Campbell
  2015-06-11  9:48             ` Wen Congyang
  2015-06-12 11:23             ` Ian Jackson
  0 siblings, 2 replies; 106+ messages in thread
From: Ian Campbell @ 2015-06-11  9:42 UTC (permalink / raw)
  To: Wen Congyang
  Cc: wei.liu2, andrew.cooper3, yunhong.jiang, eddie.dong, xen-devel,
	guijianfeng, rshriram, Yang Hongyang, ian.jackson

On Thu, 2015-06-11 at 17:09 +0800, Wen Congyang wrote:
> On 06/11/2015 04:43 PM, Ian Campbell wrote:
> > On Thu, 2015-06-11 at 10:21 +0800, Yang Hongyang wrote:
> >>
> >> On 06/10/2015 11:37 PM, Ian Campbell wrote:
> >>> On Mon, 2015-06-08 at 11:43 +0800, Yang Hongyang wrote:
> >>>> From: Wen Congyang <wency@cn.fujitsu.com>
> >>>>
> >>>> The guest is paused after libxl_domain_create_restore().
> >>>> Secondary vm is running in colo mode. So we need to unpause
> >>>> the guest. The current API libxl_domain_unpause() is
> >>>> not an internal API. Introduce a new API to support it.
> >>>> No functional change.
> >>>
> >>> In general there is nothing wrong with using a public function
> >>> internally. Is there some special consideration here?
> >>
> >> It's just that we thought it's better to use internal functions for
> >> internal purpose.
> >> Most of the public functions take ctx as the first param, while the internal
> >> functions take gc/egc as the first param (although we can get ctx from the gc
> >> and call public functions when needed).
> >> If it doesn't matter, we can drop this patch.
> > 
> > It doesn't matter so you can, yes.
> 
> If the public API creates a new AO, is it safe to call it directly?

A public function which takes an ao_how is, I believe, an exception to
this rule and should be annotated with LIBXL_EXTERNAL_CALLERS_ONLY to
prevent accidents.

I don't think libxl_domain_unpause is such a function though.

Ian.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 06/13] tools/libxl: Introduce a new internal API libxl__domain_unpause()
  2015-06-11  9:42           ` Ian Campbell
@ 2015-06-11  9:48             ` Wen Congyang
  2015-06-12 11:23             ` Ian Jackson
  1 sibling, 0 replies; 106+ messages in thread
From: Wen Congyang @ 2015-06-11  9:48 UTC (permalink / raw)
  To: Ian Campbell
  Cc: wei.liu2, andrew.cooper3, yunhong.jiang, eddie.dong, xen-devel,
	guijianfeng, rshriram, Yang Hongyang, ian.jackson

On 06/11/2015 05:42 PM, Ian Campbell wrote:
> On Thu, 2015-06-11 at 17:09 +0800, Wen Congyang wrote:
>> On 06/11/2015 04:43 PM, Ian Campbell wrote:
>>> On Thu, 2015-06-11 at 10:21 +0800, Yang Hongyang wrote:
>>>>
>>>> On 06/10/2015 11:37 PM, Ian Campbell wrote:
>>>>> On Mon, 2015-06-08 at 11:43 +0800, Yang Hongyang wrote:
>>>>>> From: Wen Congyang <wency@cn.fujitsu.com>
>>>>>>
>>>>>> The guest is paused after libxl_domain_create_restore().
>>>>>> Secondary vm is running in colo mode. So we need to unpause
>>>>>> the guest. The current API libxl_domain_unpause() is
>>>>>> not an internal API. Introduce a new API to support it.
>>>>>> No functional change.
>>>>>
>>>>> In general there is nothing wrong with using a public function
>>>>> internally. Is there some special consideration here?
>>>>
>>>> It's just that we thought it's better to use internal functions for
>>>> internal purpose.
>>>> Most of the public functions take ctx as the first param, while the internal
>>>> functions take gc/egc as the first param (although we can get ctx from the gc
>>>> and call public functions when needed).
>>>> If it doesn't matter, we can drop this patch.
>>>
>>> It doesn't matter so you can, yes.
>>
>> If the public API creates a new AO, is it safe to call it directly?
> 
> A public function which takes an ao_how is, I believe, an exception to
> this rule and should be annotated with LIBXL_EXTERNAL_CALLERS_ONLY to
> prevent accidents.
> 
> I don't think libxl_domain_unpause is such a function though.

OK, we will drop this patch.

Thanks
Wen Congyang

> 
> Ian.
> 
> 
> .
> 

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
  2015-06-11  8:48                                 ` Wen Congyang
@ 2015-06-11 10:20                                   ` Paul Durrant
  2015-06-11 11:14                                     ` Wen Congyang
  2015-06-11 12:58                                     ` Yang Hongyang
  0 siblings, 2 replies; 106+ messages in thread
From: Paul Durrant @ 2015-06-11 10:20 UTC (permalink / raw)
  To: Wen Congyang, Andrew Cooper, Yang Hongyang, xen-devel
  Cc: Wei Liu, Ian Campbell, guijianfeng, yunhong.jiang, Eddie Dong,
	rshriram, Ian Jackson

> -----Original Message-----
> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
> Sent: 11 June 2015 09:48
> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-devel@lists.xen.org
> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian Jackson
> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq
> page only one time
> 
> On 06/11/2015 04:32 PM, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
> >> Sent: 11 June 2015 02:14
> >> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-
> devel@lists.xen.org
> >> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
> >> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian Jackson
> >> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
> ioreq
> >> page only one time
> >>
> >> On 06/10/2015 07:47 PM, Paul Durrant wrote:
> >>>> -----Original Message-----
> >>>> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
> >>>> bounces@lists.xen.org] On Behalf Of Wen Congyang
> >>>> Sent: 10 June 2015 12:38
> >>>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-
> >> devel@lists.xen.org
> >>>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
> >>>> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian Jackson
> >>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
> >> ioreq
> >>>> page only one time
> >>>>
> >>>> On 06/10/2015 06:58 PM, Paul Durrant wrote:
> >>>>>> -----Original Message-----
> >>>>>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
> >>>>>> Sent: 10 June 2015 11:55
> >>>>>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-
> >>>> devel@lists.xen.org
> >>>>>> Cc: Wei Liu; Ian Campbell; yunhong.jiang@intel.com; Eddie Dong;
> >>>>>> guijianfeng@cn.fujitsu.com; rshriram@cs.ubc.ca; Ian Jackson
> >>>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore:
> zero
> >>>> ioreq
> >>>>>> page only one time
> >>>>>>
> >>>>>> On 06/10/2015 06:40 PM, Paul Durrant wrote:
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
> >>>>>>>> Sent: 10 June 2015 10:06
> >>>>>>>> To: Andrew Cooper; Yang Hongyang; xen-devel@lists.xen.org;
> Paul
> >>>>>> Durrant
> >>>>>>>> Cc: Wei Liu; Ian Campbell; yunhong.jiang@intel.com; Eddie Dong;
> >>>>>>>> guijianfeng@cn.fujitsu.com; rshriram@cs.ubc.ca; Ian Jackson
> >>>>>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore:
> >> zero
> >>>>>> ioreq
> >>>>>>>> page only one time
> >>>>>>>>
> >>>>>>>> Cc: Paul Durrant
> >>>>>>>>
> >>>>>>>> On 06/10/2015 03:44 PM, Andrew Cooper wrote:
> >>>>>>>>> On 10/06/2015 06:26, Yang Hongyang wrote:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 06/09/2015 03:30 PM, Andrew Cooper wrote:
> >>>>>>>>>>> On 09/06/2015 01:59, Yang Hongyang wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 06/08/2015 06:15 PM, Andrew Cooper wrote:
> >>>>>>>>>>>>> On 08/06/15 10:58, Yang Hongyang wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 06/08/2015 05:46 PM, Andrew Cooper wrote:
> >>>>>>>>>>>>>>> On 08/06/15 04:43, Yang Hongyang wrote:
> >>>>>>>>>>>>>>>> ioreq page contains evtchn which will be set when we
> >>>> resume
> >>>>>> the
> >>>>>>>>>>>>>>>> secondary vm the first time. The hypervisor will check if
> >> the
> >>>>>>>>>>>>>>>> evtchn is corrupted, so we cannot zero the ioreq page
> >> more
> >>>>>>>>>>>>>>>> than one time.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> The ioreq->state is always STATE_IOREQ_NONE after
> the
> >> vm
> >>>> is
> >>>>>>>>>>>>>>>> suspended, so it is OK if we only zero it one time.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Signed-off-by: Yang Hongyang
> <yanghy@cn.fujitsu.com>
> >>>>>>>>>>>>>>>> Signed-off-by: Wen congyang
> <wency@cn.fujitsu.com>
> >>>>>>>>>>>>>>>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> The issue here is that we are running the restore
> algorithm
> >>>> over
> >>>>>> a
> >>>>>>>>>>>>>>> domain which has already been running in Xen for a
> while.
> >>>> This
> >>>>>> is a
> >>>>>>>>>>>>>>> brand new usecase, as far as I am aware.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Exactly.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Does the qemu process associated with this domain get
> >>>> frozen
> >>>>>>>>>>>>>>> while the
> >>>>>>>>>>>>>>> secondary is being reset, or does the process get
> destroyed
> >>>> and
> >>>>>>>>>>>>>>> recreated.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> What do you mean by reset? do you mean secondary is
> >>>>>> suspended
> >>>>>>>> at
> >>>>>>>>>>>>>> checkpoint?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Well - at the point that the buffered records are being
> >>>> processed,
> >>>>>> we
> >>>>>>>>>>>>> are in the process of resetting the state of the secondary to
> >>>> match
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>> primary.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Yes, at this point, the qemu process associated with this
> >> domain is
> >>>>>>>>>>>> frozen.
> >>>>>>>>>>>> the suspend callback will call libxl__qmp_stop(vm_stop() in
> >>>> qemu)
> >>>>>> to
> >>>>>>>>>>>> pause
> >>>>>>>>>>>> qemu. After we processed all records, qemu will be restored
> >> with
> >>>>>> the
> >>>>>>>>>>>> received
> >>>>>>>>>>>> state, that's why we add a
> >>>>>> libxl__qmp_restore(qemu_load_vmstate()
> >>>>>>>> in
> >>>>>>>>>>>> qemu)
> >>>>>>>>>>>> api to restore qemu with received state. Currently in libxl,
> >> qemu
> >>>> only
> >>>>>>>>>>>> start
> >>>>>>>>>>>> with the received state, there's no api to load received state
> >> while
> >>>>>>>>>>>> qemu is
> >>>>>>>>>>>> running for a while.
> >>>>>>>>>>>
> >>>>>>>>>>> Now I consider this more, it is absolutely wrong to not zero
> the
> >>>> page
> >>>>>>>>>>> here.  The event channel in the page is not guaranteed to be
> the
> >>>>>> same
> >>>>>>>>>>> between the primary and secondary,
> >>>>>>>>>>
> >>>>>>>>>> That's why we don't zero it on secondary.
> >>>>>>>>>
> >>>>>>>>> I think you missed my point.  Apologies for the double negative.
> It
> >>>>>>>>> must, under all circumstances, be zeroed at this point, for safety
> >>>>>> reasons.
> >>>>>>>>>
> >>>>>>>>> The page in question is subject to logdirty just like any other
> guest
> >>>>>>>>> pages, which means that if the guest writes to it naturally (i.e.
> not a
> >>>>>>>>> Xen or Qemu write, both of whom have magic mappings which
> are
> >>>> not
> >>>>>>>>> subject to logdirty), it will be transmitted in the stream.  As the
> >>>>>>>>> event channel could be different, the lack of zeroing it at this
> point
> >>>>>>>>> means that the event channel would be wrong as opposed to
> >> simply
> >>>>>>>>> missing.  This is a worse position to be in.
> >>>>>>>>
> >>>>>>>> The guest should not access this page. I am not sure if the guest
> can
> >>>>>>>> access the ioreq page.
> >>>>>>>>
> >>>>>>>> But in the exceptional case, the ioreq page is dirtied, and is copied
> to
> >>>>>>>> the secondary vm. The ioreq page will contain a wrong event
> >> channel,
> >>>> the
> >>>>>>>> hypervisor will check it: if the event channel is wrong, the guest
> will
> >>>>>>>> be crashed.
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> and we don't want to unexpectedly
> >>>>>>>>>>> find a pending/in-flight ioreq.
> >>>>>>>>>>
> >>>>>>>>>> ioreq->state is always STATE_IOREQ_NONE after the vm is
> >>>> suspended,
> >>>>>>>> there
> >>>>>>>>>> should be no pending/in-flight ioreq at checkpoint.
> >>>>>>>>>
> >>>>>>>>> In the common case perhaps, but we must consider the
> >> exceptional
> >>>>>> case.
> >>>>>>>>> The exceptional case here is some corruption which happens to
> >>>> appear
> >>>>>> as
> >>>>>>>>> an in-flight ioreq.
> >>>>>>>>
> >>>>>>>> If the state is STATE_IOREQ_NONE, it may be hypervisor's bug. If
> the
> >>>>>>>> hypervisor
> >>>>>>>> has a bug, anything can happen. I think we should trust the
> >> hypervisor.
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Either qemu needs to take care of re-initialising the event
> >> channels
> >>>>>>>>>>> back to appropriate values, or Xen should tolerate the
> channels
> >>>>>>>>>>> disappearing.
> >>>>>>>>>
> >>>>>>>>> I still stand by this statement.  I believe it is the only safe way of
> >>>>>>>>> solving the issue you have discovered.
> >>>>>>>>
> >>>>>>>> Add a new qemu monitor command to update ioreq page?
> >>>>>>>>
> >>>>>>>
> >>>>>>> If you're attaching to a 'new' VM (i.e one with an updated image)
> then
> >> I
> >>>>>> suspect you're going to have to destroy and re-create the ioreq
> server
> >> so
> >>>>>> that the shared page gets re-populated with the correct event
> >> channels.
> >>>>>> Either that or you're going to have to ensure that the page is not part
> of
> >>>>>> restored image and sample the new one that Xen should have set
> up.
> >>>>>>
> >>>>>>
> >>>>>> I agree with it. I will try to add a new qemu monitor command(or do
> it
> >>>> when
> >>>>>> updating qemu's state) to destroy and re-create it.
> >>>>>
> >>>>> The slightly tricky part of that is that you're going to have to cache and
> >>>> replay all the registrations that were done on the old instance, but you
> >> need
> >>>> to do that in any case as it's not state that is transferred in the VM save
> >>>> record.
> >>>>
> >>>> Why do we have to cache and replay all the registrations that were
> done
> >> on
> >>>> the old instance?
> >>>
> >>> Do you not have device models that you need to continue to function?
> >> When the ioreq server is torn down then all MMIO, port IO and PCI config
> >> ranges that were mapped to it will disappear.
> >>
> >> Yes, I don't know what should be done until I implement and test it.
> >>
> >> I have some questions about it:
> >> 1. Can the guest access the ioreq page? If the page is modified by the
> >>    guest unexpectedly, what will happen?
> >
> > No, the guest cannot modify the pages once a non-default ioreq server is
> active. The pages are removed from the guest P2M when it is activated,
> which is one of the reasons for modifying QEMU to not behave as a legacy
> default server.
> >
> >> 2. If the ioreq page is dirtied by the guest, it will be transferred from
> >>    primary to secondary during checkpoint. The evtchn is invalid; I think
> >>    the best behavior is to crash the guest, not to continue running.
> >
> > As I said, the pages are not in the P2M if the server is active so they will not
> be transferred as part of the VM state. However, this presents a problem; at
> the far end, the emulator will not be able to hook into the guest. So, when
> the source domain is paused, the ioreq server needs to be torn down (so
> that its pages are re-inserted into the P2M and marked dirty for transfer).
> This is what happens in a normal migration. One extra problem you have is
> that the source domain is not then killed, it is resumed along with the
> emulator. Thus, on resume, the emulator needs to create a new ioreq server
> and re-register all its device models with that new server.
> > I don't know the detail of what you do at the far end, but if you always start
> a new emulator instance using the QEMU save record then you should be
> fine (just like with a normal migration).
> 
> I don't find the codes where the ioreq server is torn down when the source
> domain is paused. Which function?

Sorry, I overstated that. By 'torn down' I meant disabled. The function that does it is:

static void xen_hvm_change_state_handler(void *opaque, int running,
                                         RunState rstate)
{
    XenIOState *state = opaque;

    if (running) {
        xen_main_loop_prepare(state);
    }

    xen_set_ioreq_server_state(xen_xc, xen_domid,
                               state->ioservid,
                               (rstate == RUN_STATE_RUNNING));
}

> 
> In our implementation, we don't start a new emulator. The code can work,
> but some bugs may not be triggered.
> 

How do you reconcile the incoming QEMU save record with the running emulator state?

  Paul

> Thanks
> Wen Congyang
> 
> >
> >   Paul
> >
> >>
> >> Thanks
> >> Wen Congyang
> >>
> >>>
> >>>   Paul
> >>>
> >>>> We will set to the guest to a new state, the old state should be
> dropped.
> >>>>
> >>>> Thanks
> >>>> Wen Congyang
> >>>>
> >>>>>
> >>>>>   Paul
> >>>>>
> >>>>>>
> >>>>>> Thanks
> >>>>>> Wen Congyang
> >>>>>>
> >>>>>>>
> >>>>>>>   Paul
> >>>>>>>
> >>>>>>>
> >>>>>>>> Thanks
> >>>>>>>> Wen Congyang
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> ~Andrew
> >>>>>>>>> .
> >>>>>>>>>
> >>>>>>>
> >>>>>>> .
> >>>>>>>
> >>>>>
> >>>>> .
> >>>>>
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> Xen-devel mailing list
> >>>> Xen-devel@lists.xen.org
> >>>> http://lists.xen.org/xen-devel
> >>>
> >>> .
> >>>
> >
> > .
> >

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 04/13] tools/libxc: export xc_bitops.h
  2015-06-11  8:41       ` Ian Campbell
@ 2015-06-11 10:45         ` Andrew Cooper
  2015-06-11 10:55           ` Ian Campbell
  0 siblings, 1 reply; 106+ messages in thread
From: Andrew Cooper @ 2015-06-11 10:45 UTC (permalink / raw)
  To: Ian Campbell, Yang Hongyang
  Cc: wei.liu2, wency, ian.jackson, yunhong.jiang, eddie.dong,
	xen-devel, guijianfeng, rshriram

On 11/06/15 09:41, Ian Campbell wrote:
> On Thu, 2015-06-11 at 10:07 +0800, Yang Hongyang wrote:
>> On 06/10/2015 11:20 PM, Ian Campbell wrote:
>>> On Mon, 2015-06-08 at 11:43 +0800, Yang Hongyang wrote:
>>>> When we are under COLO, we will send dirty page bitmap info from
>>>> secondary to primary at every checkpoint.
>>> ... and this is a _libxl_ operation? Is that the right layer here?
>> For the first question, Yes, this is done in the suspend callback on
>> restore side. We do this in libxl because currently we only added a
>> back channel on libxl side. There're no back channel in libxc.
>>
>> By considering this more, if we do this in libxc part, the code will be
>> less complex: we can drop the 4th & 9th patch of this series and also
>> get rid of the get_dirty_pfn() callback. instead we will add a patch to
>> add back channel in libxc.
> That sounds better to me, but lets see what Andrew thinks.
>
>> For the second question, I'm not sure, what's Andrew's opinion? which
>> is the right layer to do this operation, libxl or libxc?

There are a number of bits of information which it would be useful to send
over "the backchannel".

Some are definitely more appropriate at the libxc level, but others are
more appropriate at the libxl level.

If you recall from the hackathon, there was an Alibaba usecase where
they wanted a positive success/fail indication from the receiving side
that the VM has started up successfully, before choosing between cleaning
up or continuing the VM on the sending side.  This would have to be a
libxl-level backchannel.

Whatever happens, backchannel-wise, it should be a sensibly
type/length/chunk'd stream.  (I think there is a spec or two floating
around somewhere which might be a good start ;p)  There should probably
be a bit of active negotiation at the start of the backchannel to a)
confirm you have the correct backchannel and b) confirm the backchannel
is actually functioning.
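
As a minimal sketch of what such a type/length/chunk'd record header with an
initial negotiation record could look like: the record types, field layout,
and helper names below are illustrative assumptions, not an existing Xen
stream format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical backchannel record types (assumed, for illustration). */
#define BACKCH_REC_NEGOTIATE   1u  /* handshake: confirm the channel works */
#define BACKCH_REC_DIRTY_PFNS  2u  /* e.g. secondary's dirty bitmap reply  */

/* Fixed-size header preceding each chunk of payload. */
struct backch_rec_hdr {
    uint32_t type;    /* one of BACKCH_REC_* */
    uint32_t length;  /* payload bytes following the header */
};

/* Serialise a header into buf; returns bytes written. */
size_t backch_put_hdr(uint8_t *buf, uint32_t type, uint32_t length)
{
    struct backch_rec_hdr hdr = { .type = type, .length = length };

    memcpy(buf, &hdr, sizeof(hdr));
    return sizeof(hdr);
}

/* Parse a header back out of buf. */
void backch_get_hdr(const uint8_t *buf, struct backch_rec_hdr *hdr)
{
    memcpy(hdr, buf, sizeof(*hdr));
}
```

Under this scheme each end would send a BACKCH_REC_NEGOTIATE record first and
refuse to proceed if the peer's handshake record does not arrive or parse.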

The data on "the backchannel" is always going to be in reply to an
action taking place in the primary channel, but there are complications
in that the libxc bit is inherently a blocking model.  In terms of
coordination, I am leaning towards the view that it is easier and
cleaner for each level to maintain its own backchannel communication.
The libxc bits can expect to read some records out of the backchannel at
each checkpoint and take appropriate actions before starting the next
checkpoint.
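
The per-checkpoint pattern described above could be sketched like this; the
record layout, the type values, and the in-memory buffer standing in for the
backchannel fd are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed record header and end-of-checkpoint marker. */
struct bc_hdr {
    uint32_t type;
    uint32_t length;
};

#define BC_REC_END_CHECKPOINT 0u

/*
 * Consume records from an in-memory stream until the end-of-checkpoint
 * record; returns the number of non-end records handled.  Real code
 * would instead do blocking reads from the backchannel fd before
 * starting the next checkpoint.
 */
unsigned drain_checkpoint_records(const uint8_t *stream, size_t len)
{
    unsigned handled = 0;
    size_t off = 0;
    struct bc_hdr hdr;

    while (off + sizeof(hdr) <= len) {
        memcpy(&hdr, stream + off, sizeof(hdr));
        off += sizeof(hdr) + hdr.length;   /* skip header and payload */
        if (hdr.type == BC_REC_END_CHECKPOINT)
            break;
        handled++;                         /* act on the record here */
    }
    return handled;
}
```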

Thoughts?

~Andrew


* Re: [PATCH v2 COLOPre 04/13] tools/libxc: export xc_bitops.h
  2015-06-11 10:45         ` Andrew Cooper
@ 2015-06-11 10:55           ` Ian Campbell
  2015-06-15  1:50             ` Yang Hongyang
  0 siblings, 1 reply; 106+ messages in thread
From: Ian Campbell @ 2015-06-11 10:55 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: wei.liu2, wency, ian.jackson, yunhong.jiang, eddie.dong,
	xen-devel, guijianfeng, rshriram, Yang Hongyang

On Thu, 2015-06-11 at 11:45 +0100, Andrew Cooper wrote:
> On 11/06/15 09:41, Ian Campbell wrote:
> > On Thu, 2015-06-11 at 10:07 +0800, Yang Hongyang wrote:
> >> On 06/10/2015 11:20 PM, Ian Campbell wrote:
> >>> On Mon, 2015-06-08 at 11:43 +0800, Yang Hongyang wrote:
> >>>> When we are under COLO, we will send dirty page bitmap info from
> >>>> secondary to primary at every checkpoint.
> >>> ... and this is a _libxl_ operation? Is that the right layer here?
> >> For the first question, Yes, this is done in the suspend callback on
> >> restore side. We do this in libxl because currently we only added a
> >> back channel on libxl side. There're no back channel in libxc.
> >>
> >> By considering this more, if we do this in libxc part, the code will be
> >> less complex: we can drop the 4th & 9th patch of this series and also
> >> get rid of the get_dirty_pfn() callback. instead we will add a patch to
> >> add back channel in libxc.
> > That sounds better to me, but lets see what Andrew thinks.
> >
> >> For the second question, I'm not sure, what's Andrew's opinion? which
> >> is the right layer to do this operation, libxl or libxc?
> 
> There are a number of bits of information which would be useful going in
> "the backchannel".
> 
> Some are definitely more appropriate at the libxc level, but others are
> more appropriate at the libxl.
> 
> If you recall from the hackathon, there was an Alibaba usecase where
> they wanted a positive success/fail from the receiving side that the VM
> has started up successfully before choosing between cleaning up or
> continuing the VM on the sending side.  This would have to be a libxl
> level backchannel.

FWIW this particular case is currently an xl level backchannel, but I
think your general point stands.

> Whatever happens, backchannel wise, it should be a sensibly
> type/length/chunk'd stream.  (I think there is a spec or two floating
> around somewhere which might be a good start ;p)  There should probably
> be a bit of active negotiation at the start of the backchannel to a)
> confirm you have the correct backchannel and b) the backchannel is
> actually functioning.
> 
> The data on "the backchannel" is always going to be in reply to an
> action taking place in the primary channel, but there are complications
> in that the libxc bit is inherently a blocking model.  In terms of
> coordination, I am leaning towards the view of it being easier and
> cleaner for each level to maintain its own backchannel communication. 
> The libxc bits can expect to read some records out of the backchannel at
> each checkpoint and take appropriate actions before starting the next
> checkpoint.
> 
> Thoughts?
> 
> ~Andrew
> 


* Re: [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
  2015-06-11 10:20                                   ` Paul Durrant
@ 2015-06-11 11:14                                     ` Wen Congyang
  2015-06-11 12:54                                       ` Yang Hongyang
  2015-06-11 12:58                                     ` Yang Hongyang
  1 sibling, 1 reply; 106+ messages in thread
From: Wen Congyang @ 2015-06-11 11:14 UTC (permalink / raw)
  To: Paul Durrant, Andrew Cooper, Yang Hongyang, xen-devel
  Cc: Wei Liu, Ian Campbell, guijianfeng, yunhong.jiang, Eddie Dong,
	rshriram, Ian Jackson

On 06/11/2015 06:20 PM, Paul Durrant wrote:
>> -----Original Message-----
>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
>> Sent: 11 June 2015 09:48
>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-devel@lists.xen.org
>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
>> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian Jackson
>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq
>> page only one time
>>
>> On 06/11/2015 04:32 PM, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
>>>> Sent: 11 June 2015 02:14
>>>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-
>> devel@lists.xen.org
>>>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
>>>> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian Jackson
>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
>> ioreq
>>>> page only one time
>>>>
>>>> On 06/10/2015 07:47 PM, Paul Durrant wrote:
>>>>>> -----Original Message-----
>>>>>> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
>>>>>> bounces@lists.xen.org] On Behalf Of Wen Congyang
>>>>>> Sent: 10 June 2015 12:38
>>>>>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-
>>>> devel@lists.xen.org
>>>>>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
>>>>>> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian Jackson
>>>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
>>>> ioreq
>>>>>> page only one time
>>>>>>
>>>>>> On 06/10/2015 06:58 PM, Paul Durrant wrote:
>>>>>>>> -----Original Message-----
>>>>>>>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
>>>>>>>> Sent: 10 June 2015 11:55
>>>>>>>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-
>>>>>> devel@lists.xen.org
>>>>>>>> Cc: Wei Liu; Ian Campbell; yunhong.jiang@intel.com; Eddie Dong;
>>>>>>>> guijianfeng@cn.fujitsu.com; rshriram@cs.ubc.ca; Ian Jackson
>>>>>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore:
>> zero
>>>>>> ioreq
>>>>>>>> page only one time
>>>>>>>>
>>>>>>>> On 06/10/2015 06:40 PM, Paul Durrant wrote:
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
>>>>>>>>>> Sent: 10 June 2015 10:06
>>>>>>>>>> To: Andrew Cooper; Yang Hongyang; xen-devel@lists.xen.org;
>> Paul
>>>>>>>> Durrant
>>>>>>>>>> Cc: Wei Liu; Ian Campbell; yunhong.jiang@intel.com; Eddie Dong;
>>>>>>>>>> guijianfeng@cn.fujitsu.com; rshriram@cs.ubc.ca; Ian Jackson
>>>>>>>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore:
>>>> zero
>>>>>>>> ioreq
>>>>>>>>>> page only one time
>>>>>>>>>>
>>>>>>>>>> Cc: Paul Durrant
>>>>>>>>>>
>>>>>>>>>> On 06/10/2015 03:44 PM, Andrew Cooper wrote:
>>>>>>>>>>> On 10/06/2015 06:26, Yang Hongyang wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 06/09/2015 03:30 PM, Andrew Cooper wrote:
>>>>>>>>>>>>> On 09/06/2015 01:59, Yang Hongyang wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 06/08/2015 06:15 PM, Andrew Cooper wrote:
>>>>>>>>>>>>>>> On 08/06/15 10:58, Yang Hongyang wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 06/08/2015 05:46 PM, Andrew Cooper wrote:
>>>>>>>>>>>>>>>>> On 08/06/15 04:43, Yang Hongyang wrote:
>>>>>>>>>>>>>>>>>> ioreq page contains evtchn which will be set when we
>>>>>> resume
>>>>>>>> the
>>>>>>>>>>>>>>>>>> secondary vm the first time. The hypervisor will check if
>>>> the
>>>>>>>>>>>>>>>>>> evtchn is corrupted, so we cannot zero the ioreq page
>>>> more
>>>>>>>>>>>>>>>>>> than one time.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The ioreq->state is always STATE_IOREQ_NONE after
>> the
>>>> vm
>>>>>> is
>>>>>>>>>>>>>>>>>> suspended, so it is OK if we only zero it one time.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Signed-off-by: Yang Hongyang
>> <yanghy@cn.fujitsu.com>
>>>>>>>>>>>>>>>>>> Signed-off-by: Wen congyang
>> <wency@cn.fujitsu.com>
>>>>>>>>>>>>>>>>>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The issue here is that we are running the restore
>> algorithm
>>>>>> over
>>>>>>>> a
>>>>>>>>>>>>>>>>> domain which has already been running in Xen for a
>> while.
>>>>>> This
>>>>>>>> is a
>>>>>>>>>>>>>>>>> brand new usecase, as far as I am aware.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Exactly.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Does the qemu process associated with this domain get
>>>>>> frozen
>>>>>>>>>>>>>>>>> while the
>>>>>>>>>>>>>>>>> secondary is being reset, or does the process get
>> destroyed
>>>>>> and
>>>>>>>>>>>>>>>>> recreated.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> What do you mean by reset? do you mean secondary is
>>>>>>>> suspended
>>>>>>>>>> at
>>>>>>>>>>>>>>>> checkpoint?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Well - at the point that the buffered records are being
>>>>>> processed,
>>>>>>>> we
>>>>>>>>>>>>>>> are in the process of resetting the state of the secondary to
>>>>>> match
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> primary.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, at this point, the qemu process associated with this
>>>> domain is
>>>>>>>>>>>>>> frozen.
>>>>>>>>>>>>>> the suspend callback will call libxl__qmp_stop(vm_stop() in
>>>>>> qemu)
>>>>>>>> to
>>>>>>>>>>>>>> pause
>>>>>>>>>>>>>> qemu. After we processed all records, qemu will be restored
>>>> with
>>>>>>>> the
>>>>>>>>>>>>>> received
>>>>>>>>>>>>>> state, that's why we add a
>>>>>>>> libxl__qmp_restore(qemu_load_vmstate()
>>>>>>>>>> in
>>>>>>>>>>>>>> qemu)
>>>>>>>>>>>>>> api to restore qemu with received state. Currently in libxl,
>>>> qemu
>>>>>> only
>>>>>>>>>>>>>> start
>>>>>>>>>>>>>> with the received state, there's no api to load received state
>>>> while
>>>>>>>>>>>>>> qemu is
>>>>>>>>>>>>>> running for a while.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Now I consider this more, it is absolutely wrong to not zero
>> the
>>>>>> page
>>>>>>>>>>>>> here.  The event channel in the page is not guaranteed to be
>> the
>>>>>>>> same
>>>>>>>>>>>>> between the primary and secondary,
>>>>>>>>>>>>
>>>>>>>>>>>> That's why we don't zero it on secondary.
>>>>>>>>>>>
>>>>>>>>>>> I think you missed my point.  Apologies for the double negative.
>> It
>>>>>>>>>>> must, under all circumstances, be zeroed at this point, for safety
>>>>>>>> reasons.
>>>>>>>>>>>
>>>>>>>>>>> The page in question is subject to logdirty just like any other
>> guest
>>>>>>>>>>> pages, which means that if the guest writes to it naturally (i.e.
>> not a
>>>>>>>>>>> Xen or Qemu write, both of whom have magic mappings which
>> are
>>>>>> not
>>>>>>>>>>> subject to logdirty), it will be transmitted in the stream.  As the
>>>>>>>>>>> event channel could be different, the lack of zeroing it at this
>> point
>>>>>>>>>>> means that the event channel would be wrong as opposed to
>>>> simply
>>>>>>>>>>> missing.  This is a worse position to be in.
>>>>>>>>>>
>>>>>>>>>> The guest should not access this page. I am not sure if the guest
>> can
>>>>>>>>>> access the ioreq page.
>>>>>>>>>>
>>>>>>>>>> But in the exceptional case, the ioreq page is dirtied, and is copied
>> to
>>>>>>>>>> the secondary vm. The ioreq page will contain a wrong event
>>>> channel,
>>>>>> the
>>>>>>>>>> hypervisor will check it: if the event channel is wrong, the guest
>> will
>>>>>>>>>> be crashed.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> and we don't want to unexpectedly
>>>>>>>>>>>>> find a pending/in-flight ioreq.
>>>>>>>>>>>>
>>>>>>>>>>>> ioreq->state is always STATE_IOREQ_NONE after the vm is
>>>>>> suspended,
>>>>>>>>>> there
>>>>>>>>>>>> should be no pending/in-flight ioreq at checkpoint.
>>>>>>>>>>>
>>>>>>>>>>> In the common case perhaps, but we must consider the
>>>> exceptional
>>>>>>>> case.
>>>>>>>>>>> The exceptional case here is some corruption which happens to
>>>>>> appear
>>>>>>>> as
>>>>>>>>>>> an in-flight ioreq.
>>>>>>>>>>
>>>>>>>>>> If the state is STATE_IOREQ_NONE, it may be hypervisor's bug. If
>> the
>>>>>>>>>> hypervisor
>>>>>>>>>> has a bug, anything can happen. I think we should trust the
>>>> hypervisor.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Either qemu needs to take care of re-initialising the event
>>>> channels
>>>>>>>>>>>>> back to appropriate values, or Xen should tolerate the
>> channels
>>>>>>>>>>>>> disappearing.
>>>>>>>>>>>
>>>>>>>>>>> I still stand by this statement.  I believe it is the only safe way of
>>>>>>>>>>> solving the issue you have discovered.
>>>>>>>>>>
>>>>>>>>>> Add a new qemu monitor command to update ioreq page?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> If you're attaching to a 'new' VM (i.e one with an updated image)
>> then
>>>> I
>>>>>>>> suspect you're going to have to destroy and re-create the ioreq
>> server
>>>> so
>>>>>>>> that the shared page gets re-populated with the correct event
>>>> channels.
>>>>>>>> Either that or you're going to have to ensure that the page is not part
>> of
>>>>>>>> restored image and sample the new one that Xen should have set
>> up.
>>>>>>>>
>>>>>>>>
>>>>>>>> I agree with it. I will try to add a new qemu monitor command(or do
>> it
>>>>>> when
>>>>>>>> updating qemu's state) to destroy and re-create it.
>>>>>>>
>>>>>>> The slightly tricky part of that is that you're going to have to cache and
>>>>>> replay all the registrations that were done on the old instance, but you
>>>> need
>>>>>> to do that in any case as it's not state that is transferred in the VM save
>>>>>> record.
>>>>>>
>>>>>> Why do we have to cache and replay all the registrations that were
>> done
>>>> on
>>>>>> the old instance?
>>>>>
>>>>> Do you not have device models that you need to continue to function?
>>>> When the ioreq server is torn down then all MMIO, port IO and PCI config
>>>> ranges that were mapped to it will disappear.
>>>>
>>>> Yes, I don't known which should be done unless I implement and test it.
>>>>
>>>> I have some questions about it:
>>>> 1. Can guest access the ioreq page? If the page is modified by the guest
>>>> unexpectedly,
>>>>    what will happen?
>>>
>>> No, the guest cannot modify the pages once a non-default ioreq server is
>> active. The pages are removed from the guest P2M when it is activated,
>> which is one of the reasons for modifying QEMU to not behave as a legacy
>> default server.
>>>
>>>> 2. If the ioreq page is dirtied by the guest, it will be transfered from
>> primary
>>>>    to secondary during checkpoint. The evtchn is invalid, I think the best
>>>> behavior
>>>>    is that: make the guest crashed, not continue to run.
>>>
>>> As I said, the pages are not in the P2M if the server is active so they will not
>> be transferred as part of the VM state. However, this presents a problem; at
>> the far end, the emulator will not be able to hook into the guest. So, when
>> the source domain is paused, the ioreq server needs to be torn down (so
>> that its pages are re-inserted into the P2M and marked dirty for transfer).
>> This is what happens in a normal migration. One extra problem you have is
>> that the source domain is not then killed, it is resumed along with the
>> emulator. Thus, on resume, the emulator needs to create a new ioreq server
>> and re-register all its device models with that new server.
>>> I don't know the detail of what you do at the far end, but if you always start
>> a new emulator instance using the QEMU save record then you should be
>> fine (just like with a normal migration).
>>
>> I don't find the codes where the ioreq server is torn down when the source
>> domain is paused. Which function?
> 
> Sorry, I overstated that. By 'torn down' I meant disabled. The function that does it is:
> 
> static void xen_hvm_change_state_handler(void *opaque, int running,
>                                          RunState rstate)
> {
>     XenIOState *state = opaque;
> 
>     if (running) {
>         xen_main_loop_prepare(state);
>     }
> 
>     xen_set_ioreq_server_state(xen_xc, xen_domid,
>                                state->ioservid,
>                                (rstate == RUN_STATE_RUNNING));
> }
> 

Yes, I see it now.

If the ioreq page is cleared, can xen_main_loop_prepare() put the correct
evtchn back into the ioreq page? If so, I think COLO can work without this patch.

In the hypervisor, the ioreq page is cleared before being re-inserted into the
P2M, so the far end always receives a zeroed ioreq page.

IIRC, there was only the default ioreq server when I wrote this patch, and in
that case there is no xen_main_loop_prepare(). It is OK if COLO cannot work
with an older version of qemu.

Yang, can you test COLO without this patch? I suspect it will work, and then we
can drop this patch.

Thanks
Wen Congyang

>>
>> In our implementation, we don't start a new emulator. The codes can work,
>> but some bugs may be not triggered.
>>
> 
> How do you reconcile the incoming QEMU save record with the running emulator state?
> 
>   Paul
> 
>> Thanks
>> Wen Congyang
>>
>>>
>>>   Paul
>>>
>>>>
>>>> Thanks
>>>> Wen Congyang
>>>>
>>>>>
>>>>>   Paul
>>>>>
>>>>>> We will set to the guest to a new state, the old state should be
>> dropped.
>>>>>>
>>>>>> Thanks
>>>>>> Wen Congyang
>>>>>>
>>>>>>>
>>>>>>>   Paul
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Wen Congyang
>>>>>>>>
>>>>>>>>>
>>>>>>>>>   Paul
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Wen Congyang
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ~Andrew
>>>>>>>>>>> .
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> .
>>>>>>>>>
>>>>>>>
>>>>>>> .
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> .
>>>>>
>>>
>>> .
>>>
> 
> 
> .
> 


* Re: [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
  2015-06-11 11:14                                     ` Wen Congyang
@ 2015-06-11 12:54                                       ` Yang Hongyang
  2015-06-12  3:39                                         ` Yang Hongyang
  0 siblings, 1 reply; 106+ messages in thread
From: Yang Hongyang @ 2015-06-11 12:54 UTC (permalink / raw)
  To: Wen Congyang, Paul Durrant, Andrew Cooper, xen-devel
  Cc: Wei Liu, Ian Campbell, guijianfeng, yunhong.jiang, Eddie Dong,
	rshriram, Ian Jackson



On 06/11/2015 07:14 PM, Wen Congyang wrote:
> On 06/11/2015 06:20 PM, Paul Durrant wrote:
>>> -----Original Message-----
>>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
>>> Sent: 11 June 2015 09:48
>>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-devel@lists.xen.org
>>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
>>> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian Jackson
>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq
>>> page only one time
>>>
>>> On 06/11/2015 04:32 PM, Paul Durrant wrote:
>>>>> -----Original Message-----
>>>>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
>>>>> Sent: 11 June 2015 02:14
>>>>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-
>>> devel@lists.xen.org
>>>>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
>>>>> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian Jackson
>>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
>>> ioreq
>>>>> page only one time
>>>>>
>>>>> On 06/10/2015 07:47 PM, Paul Durrant wrote:
>>>>>>> -----Original Message-----
>>>>>>> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
>>>>>>> bounces@lists.xen.org] On Behalf Of Wen Congyang
>>>>>>> Sent: 10 June 2015 12:38
>>>>>>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-
>>>>> devel@lists.xen.org
>>>>>>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
>>>>>>> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian Jackson
>>>>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
>>>>> ioreq
>>>>>>> page only one time
>>>>>>>
>>>>>>> On 06/10/2015 06:58 PM, Paul Durrant wrote:
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
>>>>>>>>> Sent: 10 June 2015 11:55
>>>>>>>>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-
>>>>>>> devel@lists.xen.org
>>>>>>>>> Cc: Wei Liu; Ian Campbell; yunhong.jiang@intel.com; Eddie Dong;
>>>>>>>>> guijianfeng@cn.fujitsu.com; rshriram@cs.ubc.ca; Ian Jackson
>>>>>>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore:
>>> zero
>>>>>>> ioreq
>>>>>>>>> page only one time
>>>>>>>>>
>>>>>>>>> On 06/10/2015 06:40 PM, Paul Durrant wrote:
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
>>>>>>>>>>> Sent: 10 June 2015 10:06
>>>>>>>>>>> To: Andrew Cooper; Yang Hongyang; xen-devel@lists.xen.org;
>>> Paul
>>>>>>>>> Durrant
>>>>>>>>>>> Cc: Wei Liu; Ian Campbell; yunhong.jiang@intel.com; Eddie Dong;
>>>>>>>>>>> guijianfeng@cn.fujitsu.com; rshriram@cs.ubc.ca; Ian Jackson
>>>>>>>>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore:
>>>>> zero
>>>>>>>>> ioreq
>>>>>>>>>>> page only one time
>>>>>>>>>>>
>>>>>>>>>>> Cc: Paul Durrant
>>>>>>>>>>>
>>>>>>>>>>> On 06/10/2015 03:44 PM, Andrew Cooper wrote:
>>>>>>>>>>>> On 10/06/2015 06:26, Yang Hongyang wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 06/09/2015 03:30 PM, Andrew Cooper wrote:
>>>>>>>>>>>>>> On 09/06/2015 01:59, Yang Hongyang wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 06/08/2015 06:15 PM, Andrew Cooper wrote:
>>>>>>>>>>>>>>>> On 08/06/15 10:58, Yang Hongyang wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 06/08/2015 05:46 PM, Andrew Cooper wrote:
>>>>>>>>>>>>>>>>>> On 08/06/15 04:43, Yang Hongyang wrote:
>>>>>>>>>>>>>>>>>>> ioreq page contains evtchn which will be set when we
>>>>>>> resume
>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> secondary vm the first time. The hypervisor will check if
>>>>> the
>>>>>>>>>>>>>>>>>>> evtchn is corrupted, so we cannot zero the ioreq page
>>>>> more
>>>>>>>>>>>>>>>>>>> than one time.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The ioreq->state is always STATE_IOREQ_NONE after
>>> the
>>>>> vm
>>>>>>> is
>>>>>>>>>>>>>>>>>>> suspended, so it is OK if we only zero it one time.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Signed-off-by: Yang Hongyang
>>> <yanghy@cn.fujitsu.com>
>>>>>>>>>>>>>>>>>>> Signed-off-by: Wen congyang
>>> <wency@cn.fujitsu.com>
>>>>>>>>>>>>>>>>>>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The issue here is that we are running the restore
>>> algorithm
>>>>>>> over
>>>>>>>>> a
>>>>>>>>>>>>>>>>>> domain which has already been running in Xen for a
>>> while.
>>>>>>> This
>>>>>>>>> is a
>>>>>>>>>>>>>>>>>> brand new usecase, as far as I am aware.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Exactly.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Does the qemu process associated with this domain get
>>>>>>> frozen
>>>>>>>>>>>>>>>>>> while the
>>>>>>>>>>>>>>>>>> secondary is being reset, or does the process get
>>> destroyed
>>>>>>> and
>>>>>>>>>>>>>>>>>> recreated.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> What do you mean by reset? do you mean secondary is
>>>>>>>>> suspended
>>>>>>>>>>> at
>>>>>>>>>>>>>>>>> checkpoint?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Well - at the point that the buffered records are being
>>>>>>> processed,
>>>>>>>>> we
>>>>>>>>>>>>>>>> are in the process of resetting the state of the secondary to
>>>>>>> match
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> primary.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes, at this point, the qemu process associated with this
>>>>> domain is
>>>>>>>>>>>>>>> frozen.
>>>>>>>>>>>>>>> the suspend callback will call libxl__qmp_stop(vm_stop() in
>>>>>>> qemu)
>>>>>>>>> to
>>>>>>>>>>>>>>> pause
>>>>>>>>>>>>>>> qemu. After we processed all records, qemu will be restored
>>>>> with
>>>>>>>>> the
>>>>>>>>>>>>>>> received
>>>>>>>>>>>>>>> state, that's why we add a
>>>>>>>>> libxl__qmp_restore(qemu_load_vmstate()
>>>>>>>>>>> in
>>>>>>>>>>>>>>> qemu)
>>>>>>>>>>>>>>> api to restore qemu with received state. Currently in libxl,
>>>>> qemu
>>>>>>> only
>>>>>>>>>>>>>>> start
>>>>>>>>>>>>>>> with the received state, there's no api to load received state
>>>>> while
>>>>>>>>>>>>>>> qemu is
>>>>>>>>>>>>>>> running for a while.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Now I consider this more, it is absolutely wrong to not zero
>>> the
>>>>>>> page
>>>>>>>>>>>>>> here.  The event channel in the page is not guaranteed to be
>>> the
>>>>>>>>> same
>>>>>>>>>>>>>> between the primary and secondary,
>>>>>>>>>>>>>
>>>>>>>>>>>>> That's why we don't zero it on secondary.
>>>>>>>>>>>>
>>>>>>>>>>>> I think you missed my point.  Apologies for the double negative.
>>> It
>>>>>>>>>>>> must, under all circumstances, be zeroed at this point, for safety
>>>>>>>>> reasons.
>>>>>>>>>>>>
>>>>>>>>>>>> The page in question is subject to logdirty just like any other
>>> guest
>>>>>>>>>>>> pages, which means that if the guest writes to it naturally (i.e.
>>> not a
>>>>>>>>>>>> Xen or Qemu write, both of whom have magic mappings which
>>> are
>>>>>>> not
>>>>>>>>>>>> subject to logdirty), it will be transmitted in the stream.  As the
>>>>>>>>>>>> event channel could be different, the lack of zeroing it at this
>>> point
>>>>>>>>>>>> means that the event channel would be wrong as opposed to
>>>>> simply
>>>>>>>>>>>> missing.  This is a worse position to be in.
>>>>>>>>>>>
>>>>>>>>>>> The guest should not access this page. I am not sure if the guest
>>> can
>>>>>>>>>>> access the ioreq page.
>>>>>>>>>>>
>>>>>>>>>>> But in the exceptional case, the ioreq page is dirtied, and is copied
>>> to
>>>>>>>>>>> the secondary vm. The ioreq page will contain a wrong event
>>>>> channel,
>>>>>>> the
>>>>>>>>>>> hypervisor will check it: if the event channel is wrong, the guest
>>> will
>>>>>>>>>>> be crashed.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> and we don't want to unexpectedly
>>>>>>>>>>>>>> find a pending/in-flight ioreq.
>>>>>>>>>>>>>
>>>>>>>>>>>>> ioreq->state is always STATE_IOREQ_NONE after the vm is
>>>>>>> suspended,
>>>>>>>>>>> there
>>>>>>>>>>>>> should be no pending/in-flight ioreq at checkpoint.
>>>>>>>>>>>>
>>>>>>>>>>>> In the common case perhaps, but we must consider the
>>>>> exceptional
>>>>>>>>> case.
>>>>>>>>>>>> The exceptional case here is some corruption which happens to
>>>>>>> appear
>>>>>>>>> as
>>>>>>>>>>>> an in-flight ioreq.
>>>>>>>>>>>
>>>>>>>>>>> If the state is STATE_IOREQ_NONE, it may be hypervisor's bug. If
>>> the
>>>>>>>>>>> hypervisor
>>>>>>>>>>> has a bug, anything can happen. I think we should trust the
>>>>> hypervisor.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Either qemu needs to take care of re-initialising the event
>>>>> channels
>>>>>>>>>>>>>> back to appropriate values, or Xen should tolerate the
>>> channels
>>>>>>>>>>>>>> disappearing.
>>>>>>>>>>>>
>>>>>>>>>>>> I still stand by this statement.  I believe it is the only safe way of
>>>>>>>>>>>> solving the issue you have discovered.
>>>>>>>>>>>
>>>>>>>>>>> Add a new qemu monitor command to update ioreq page?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> If you're attaching to a 'new' VM (i.e one with an updated image)
>>> then
>>>>> I
>>>>>>>>> suspect you're going to have to destroy and re-create the ioreq
>>> server
>>>>> so
>>>>>>>>> that the shared page gets re-populated with the correct event
>>>>> channels.
>>>>>>>>> Either that or you're going to have to ensure that the page is not part
>>> of
>>>>>>>>> restored image and sample the new one that Xen should have set
>>> up.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I agree with it. I will try to add a new qemu monitor command(or do
>>> it
>>>>>>> when
>>>>>>>>> updating qemu's state) to destroy and re-create it.
>>>>>>>>
>>>>>>>> The slightly tricky part of that is that you're going to have to cache and
>>>>>>> replay all the registrations that were done on the old instance, but you
>>>>> need
>>>>>>> to do that in any case as it's not state that is transferred in the VM save
>>>>>>> record.
>>>>>>>
>>>>>>> Why do we have to cache and replay all the registrations that were
>>> done
>>>>> on
>>>>>>> the old instance?
>>>>>>
>>>>>> Do you not have device models that you need to continue to function?
>>>>>> When the ioreq server is torn down then all MMIO, port IO and PCI config
>>>>>> ranges that were mapped to it will disappear.
>>>>>
>>>>> Yes, I don't know what should be done unless I implement and test it.
>>>>>
>>>>> I have some questions about it:
>>>>> 1. Can the guest access the ioreq page? If the page is modified by the
>>>>>    guest unexpectedly, what will happen?
>>>>
>>>> No, the guest cannot modify the pages once a non-default ioreq server is
>>>> active. The pages are removed from the guest P2M when it is activated,
>>>> which is one of the reasons for modifying QEMU to not behave as a legacy
>>>> default server.
>>>>
>>>>> 2. If the ioreq page is dirtied by the guest, it will be transferred from
>>>>>    the primary to the secondary during a checkpoint. The evtchn is invalid;
>>>>>    I think the best behavior is to crash the guest, not let it continue to
>>>>>    run.
>>>>
>>>> As I said, the pages are not in the P2M if the server is active, so they
>>>> will not be transferred as part of the VM state. However, this presents a
>>>> problem: at the far end, the emulator will not be able to hook into the
>>>> guest. So, when the source domain is paused, the ioreq server needs to be
>>>> torn down (so that its pages are re-inserted into the P2M and marked dirty
>>>> for transfer). This is what happens in a normal migration. One extra
>>>> problem you have is that the source domain is not then killed; it is
>>>> resumed along with the emulator. Thus, on resume, the emulator needs to
>>>> create a new ioreq server and re-register all its device models with that
>>>> new server.
>>>> I don't know the details of what you do at the far end, but if you always
>>>> start a new emulator instance using the QEMU save record then you should
>>>> be fine (just like with a normal migration).
>>>
>>> I can't find the code where the ioreq server is torn down when the source
>>> domain is paused. Which function?
>>
>> Sorry, I overstated that. By 'torn down' I meant disabled. The function that does it is:
>>
>> static void xen_hvm_change_state_handler(void *opaque, int running,
>>                                           RunState rstate)
>> {
>>      XenIOState *state = opaque;
>>
>>      if (running) {
>>          xen_main_loop_prepare(state);
>>      }
>>
>>      xen_set_ioreq_server_state(xen_xc, xen_domid,
>>                                 state->ioservid,
>>                                 (rstate == RUN_STATE_RUNNING));
>> }
>>
>
> Yes, I see it now.
>
> If the ioreq page is cleared, can xen_main_loop_prepare() put the correct
> evtchn in the ioreq page? If so, I think COLO can work without this patch.
>
> In the hypervisor, the ioreq page is cleared before re-inserted into the P2M.
> So the far end always gets the zeroed ioreq page.
>
> IIRC, there was only the default ioreq server when I wrote this patch. In
> that case we don't have xen_main_loop_prepare(). It is OK if COLO cannot
> work with an older version of qemu.
>
> Yang, can you test COLO without this patch? I guess it can work, and we can drop
> this patch now.

Ok, this really is a historical patch...

>
> Thanks
> Wen Congyang
>
>>>
>>> In our implementation, we don't start a new emulator. The codes can work,
>>> but some bugs may be not triggered.
>>>
>>
>> How do you reconcile the incoming QEMU save record with the running emulator state?
>>
>>    Paul
>>
>>> Thanks
>>> Wen Congyang
>>>
>>>>>>> We will set the guest to a new state; the old state should be dropped.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Wen Congyang
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Xen-devel mailing list
>>>>>>> Xen-devel@lists.xen.org
>>>>>>> http://lists.xen.org/xen-devel

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
  2015-06-11 10:20                                   ` Paul Durrant
  2015-06-11 11:14                                     ` Wen Congyang
@ 2015-06-11 12:58                                     ` Yang Hongyang
  2015-06-11 13:25                                       ` Paul Durrant
  1 sibling, 1 reply; 106+ messages in thread
From: Yang Hongyang @ 2015-06-11 12:58 UTC (permalink / raw)
  To: Paul Durrant, Wen Congyang, Andrew Cooper, xen-devel
  Cc: Wei Liu, Ian Campbell, guijianfeng, yunhong.jiang, Eddie Dong,
	rshriram, Ian Jackson



On 06/11/2015 06:20 PM, Paul Durrant wrote:
>> -----Original Message-----
>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
>> Sent: 11 June 2015 09:48
>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-devel@lists.xen.org
>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
[...]
>
>>
>> In our implementation, we don't start a new emulator. The code can work,
>> but some bugs may not be triggered.
>>
>
> How do you reconcile the incoming QEMU save record with the running emulator state?

We introduce a qmp command "xen-load-devices-state" (libxl__qmp_restore) which
can restore the emulator state. The steps to restore the emulator state at a
checkpoint are:

1. libxl__qmp_stop    -> vm_stop() in qemu
2. libxl__qmp_restore -> load_vmstate() in qemu
3. libxl__qmp_resume  -> vm_start() in qemu
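The three libxl calls above each correspond to one QMP command: the standard
"stop" and "cont" commands, plus the "xen-load-devices-state" command this
series introduces. As a minimal sketch (the helper name and the "filename"
argument are assumptions for illustration; the real libxl code manages the
state file itself), the per-checkpoint command sequence looks like:

```python
import json

def checkpoint_restore_cmds(state_file):
    """Return the QMP commands issued at each checkpoint, in order.

    "stop"/"cont" are standard QMP commands; "xen-load-devices-state"
    is the command added by this series. The "filename" argument is an
    assumption here for illustration.
    """
    return [
        {"execute": "stop"},                                  # libxl__qmp_stop
        {"execute": "xen-load-devices-state",
         "arguments": {"filename": state_file}},              # libxl__qmp_restore
        {"execute": "cont"},                                  # libxl__qmp_resume
    ]

# Each command would go over the QMP socket as a single JSON object:
wire = "\n".join(json.dumps(c) for c in checkpoint_restore_cmds("/tmp/qemu-state"))
```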


-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
  2015-06-11 12:58                                     ` Yang Hongyang
@ 2015-06-11 13:25                                       ` Paul Durrant
  2015-06-12  3:22                                         ` Wen Congyang
  0 siblings, 1 reply; 106+ messages in thread
From: Paul Durrant @ 2015-06-11 13:25 UTC (permalink / raw)
  To: Yang Hongyang, Wen Congyang, Andrew Cooper, xen-devel
  Cc: Wei Liu, Ian Campbell, guijianfeng, yunhong.jiang, Eddie Dong,
	rshriram, Ian Jackson

> -----Original Message-----
> From: Yang Hongyang [mailto:yanghy@cn.fujitsu.com]
> Sent: 11 June 2015 13:59
> To: Paul Durrant; Wen Congyang; Andrew Cooper; xen-devel@lists.xen.org
> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian Jackson
> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq
> page only one time
> 
> 
> 
> On 06/11/2015 06:20 PM, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
> >> Sent: 11 June 2015 09:48
> >> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-
> devel@lists.xen.org
> >> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
> [...]
> >
> >>
> >> In our implementation, we don't start a new emulator. The code can work,
> >> but some bugs may not be triggered.
> >>
> >
> > How do you reconcile the incoming QEMU save record with the running
> > emulator state?
> 
> We introduce a qmp command "xen-load-devices-state" (libxl__qmp_restore)
> which can restore the emulator state. The steps to restore the emulator
> state at a checkpoint are:
> 
> 1. libxl__qmp_stop    -> vm_stop() in qemu
> 2. libxl__qmp_restore -> load_vmstate() in qemu
> 3. libxl__qmp_resume  -> vm_start() in qemu
> 

Ok, that sounds like the ideal time to hook back into Xen by creating a new ioreq server.

  Paul


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
  2015-06-11 13:25                                       ` Paul Durrant
@ 2015-06-12  3:22                                         ` Wen Congyang
  2015-06-12  7:41                                           ` Paul Durrant
  0 siblings, 1 reply; 106+ messages in thread
From: Wen Congyang @ 2015-06-12  3:22 UTC (permalink / raw)
  To: Paul Durrant, Yang Hongyang, Andrew Cooper, xen-devel
  Cc: Wei Liu, Ian Campbell, guijianfeng, yunhong.jiang, Eddie Dong,
	rshriram, Ian Jackson

On 06/11/2015 09:25 PM, Paul Durrant wrote:
>> -----Original Message-----
>> From: Yang Hongyang [mailto:yanghy@cn.fujitsu.com]
>> Sent: 11 June 2015 13:59
>> To: Paul Durrant; Wen Congyang; Andrew Cooper; xen-devel@lists.xen.org
>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
>> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian Jackson
>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq
>> page only one time
>>
>>
>>
>> On 06/11/2015 06:20 PM, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
>>>> Sent: 11 June 2015 09:48
>>>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-
>> devel@lists.xen.org
>>>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
>> [...]
>>>
>>>>
>>>> In our implementation, we don't start a new emulator. The codes can
>> work,
>>>> but some bugs may be not triggered.
>>>>
>>>
>>> How do you reconcile the incoming QEMU save record with the running
>> emulator state?
>>
>> We introduce a qmp command "xen-load-devices-
>> state"(libxl__qmp_restore) which
>> can restore the emulator state. The step of resotre emulator state at a
>> checkpoint is:
>>
>> 1. libxl__qmp_stop    -> vm_stop() in qemu
>> 2. libxl__qmp_restore -> load_vmstate() in qemu
>> 3. libxl__qmp_resume  -> vm_start() in qemu
>>
> 
> Ok, that sounds like the ideal time to hook back into Xen by creating a new ioreq server.

I have some questions about ioreq server:
1. If we use an old version of Xen and the newest version of QEMU, is it OK?
   Is the default ioreq server created when the guest is created?
   xen_create_ioreq_server() does nothing, and xen_get_ioreq_server_info()
   will get the default ioreq server information. Is that right?
2. Why do we create a default ioreq server when getting the HVM param if
   there is already a non-default ioreq server?
3. At the far end, we will clear the ioreq page, and this ioreq page is used
   for the default ioreq server, is that right?

Thanks
Wen Congyang

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
  2015-06-11 12:54                                       ` Yang Hongyang
@ 2015-06-12  3:39                                         ` Yang Hongyang
  0 siblings, 0 replies; 106+ messages in thread
From: Yang Hongyang @ 2015-06-12  3:39 UTC (permalink / raw)
  To: Wen Congyang, Paul Durrant, Andrew Cooper, xen-devel
  Cc: Wei Liu, Ian Campbell, guijianfeng, yunhong.jiang, Eddie Dong,
	rshriram, Ian Jackson



On 06/11/2015 08:54 PM, Yang Hongyang wrote:
[...]
>> this patch now.
>
> Ok, this really is a historical patch...
>

Having tested, it is ok to drop this patch now.


-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
  2015-06-12  3:22                                         ` Wen Congyang
@ 2015-06-12  7:41                                           ` Paul Durrant
  2015-06-12 10:26                                             ` Wen Congyang
  0 siblings, 1 reply; 106+ messages in thread
From: Paul Durrant @ 2015-06-12  7:41 UTC (permalink / raw)
  To: Wen Congyang, Yang Hongyang, Andrew Cooper, xen-devel
  Cc: Wei Liu, Ian Campbell, guijianfeng, yunhong.jiang, Eddie Dong,
	rshriram, Ian Jackson

> -----Original Message-----
> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
> Sent: 12 June 2015 04:22
> To: Paul Durrant; Yang Hongyang; Andrew Cooper; xen-devel@lists.xen.org
> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian Jackson
> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq
> page only one time
> 
> On 06/11/2015 09:25 PM, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Yang Hongyang [mailto:yanghy@cn.fujitsu.com]
> >> Sent: 11 June 2015 13:59
> >> To: Paul Durrant; Wen Congyang; Andrew Cooper; xen-
> devel@lists.xen.org
> >> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
> >> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian Jackson
> >> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
> ioreq
> >> page only one time
> >>
> >>
> >>
> >> On 06/11/2015 06:20 PM, Paul Durrant wrote:
> >>>> -----Original Message-----
> >>>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
> >>>> Sent: 11 June 2015 09:48
> >>>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-
> >> devel@lists.xen.org
> >>>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
> >> [...]
> >>>
> >>>>
> >>>> In our implementation, we don't start a new emulator. The code can
> >>>> work, but some bugs may not be triggered.
> >>>>
> >>>
> >>> How do you reconcile the incoming QEMU save record with the running
> >> emulator state?
> >>
> >> We introduce a qmp command "xen-load-devices-state" (libxl__qmp_restore)
> >> which can restore the emulator state. The steps to restore the emulator
> >> state at a checkpoint are:
> >>
> >> 1. libxl__qmp_stop    -> vm_stop() in qemu
> >> 2. libxl__qmp_restore -> load_vmstate() in qemu
> >> 3. libxl__qmp_resume  -> vm_start() in qemu
> >>
> >
> > Ok, that sounds like the ideal time to hook back into Xen by creating a new
> ioreq server.
> 
> I have some questions about ioreq server:
> 1. If we use an old version of Xen and the newest version of QEMU, is it OK?
>    Is the default ioreq server created when the guest is created?
>    xen_create_ioreq_server() does nothing, and xen_get_ioreq_server_info()
>    will get the default ioreq server information. Is that right?

No. It's not compatible in that direction. A new Xen will work with an old QEMU but not the other way round.

> 2. Why do we create a default ioreq server when getting the HVM param if
>    there is already a non-default ioreq server?

If something reads the 'legacy' HVM params then that is Xen's trigger to create the default server. Any 'new' emulator should be using the ioreq server hypercalls so the default server will not be needed.

> 3. At the far end, we will clear the ioreq page, and this ioreq page is used
>    for the default ioreq server, is that right?

Yes, AFAIK it's only the 'magic' pages that get cleared at the far end - and that includes the default server pages. Other ioreq servers will have their pages cleared on re-insertion to the P2M at the source end when the server is disabled.

  Paul


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
  2015-06-12  7:41                                           ` Paul Durrant
@ 2015-06-12 10:26                                             ` Wen Congyang
  2015-06-12 10:54                                               ` Paul Durrant
  0 siblings, 1 reply; 106+ messages in thread
From: Wen Congyang @ 2015-06-12 10:26 UTC (permalink / raw)
  To: Paul Durrant, Yang Hongyang, Andrew Cooper, xen-devel
  Cc: Wei Liu, Ian Campbell, guijianfeng, yunhong.jiang, Eddie Dong,
	rshriram, Ian Jackson

On 06/12/2015 03:41 PM, Paul Durrant wrote:
>> -----Original Message-----
>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
>> Sent: 12 June 2015 04:22
>> To: Paul Durrant; Yang Hongyang; Andrew Cooper; xen-devel@lists.xen.org
>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
>> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian Jackson
>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq
>> page only one time
>>
>> On 06/11/2015 09:25 PM, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Yang Hongyang [mailto:yanghy@cn.fujitsu.com]
>>>> Sent: 11 June 2015 13:59
>>>> To: Paul Durrant; Wen Congyang; Andrew Cooper; xen-
>> devel@lists.xen.org
>>>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
>>>> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian Jackson
>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
>> ioreq
>>>> page only one time
>>>>
>>>>
>>>>
>>>> On 06/11/2015 06:20 PM, Paul Durrant wrote:
>>>>>> -----Original Message-----
>>>>>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
>>>>>> Sent: 11 June 2015 09:48
>>>>>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-
>>>> devel@lists.xen.org
>>>>>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
>>>> [...]
>>>>>
>>>>>>
>>>>>> In our implementation, we don't start a new emulator. The codes can
>>>> work,
>>>>>> but some bugs may be not triggered.
>>>>>>
>>>>>
>>>>> How do you reconcile the incoming QEMU save record with the running
>>>> emulator state?
>>>>
>>>> We introduce a qmp command "xen-load-devices-
>>>> state"(libxl__qmp_restore) which
>>>> can restore the emulator state. The step of resotre emulator state at a
>>>> checkpoint is:
>>>>
>>>> 1. libxl__qmp_stop    -> vm_stop() in qemu
>>>> 2. libxl__qmp_restore -> load_vmstate() in qemu
>>>> 3. libxl__qmp_resume  -> vm_start() in qemu
>>>>
>>>
>>> Ok, that sounds like the ideal time to hook back into Xen by creating a new
>> ioreq server.
>>
>> I have some questions about ioreq server:
>> 1. If we use old version xen and newest version qemu, is it OK? Is default
>>    ioreq server created when the guest is created. xen_create_ioreq_server()
>> does
>>    nothing, and xen_get_ioreq_server_info() will get the default ioreq server
>> information.
>>    Is it right?
> 
> No. It's not compatible in that direction. A new Xen will work with an old QEMU but not the other way round.
> 
>> 2. Why we create a default ioreq server when getting the hvm param if there
>> is already a
>>    not default ioreq server?
> 
> If something reads the 'legacy' HVM params then that is Xen's trigger to create the default server. Any 'new' emulator should be using the ioreq server hypercalls so the default server will not be needed.

If there are two ioreq servers (the default ioreq server and an ioreq server
created by the emulator), can the guest still work correctly in this case?
Is there any application (not an emulator) that uses libxenctrl directly?

Thanks
Wen Congyang


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
  2015-06-12 10:26                                             ` Wen Congyang
@ 2015-06-12 10:54                                               ` Paul Durrant
  2015-06-12 11:09                                                 ` Wen Congyang
  0 siblings, 1 reply; 106+ messages in thread
From: Paul Durrant @ 2015-06-12 10:54 UTC (permalink / raw)
  To: Wen Congyang, Yang Hongyang, Andrew Cooper, xen-devel
  Cc: Wei Liu, Ian Campbell, guijianfeng, yunhong.jiang, Eddie Dong,
	rshriram, Ian Jackson

> -----Original Message-----
> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
> Sent: 12 June 2015 11:26
> To: Paul Durrant; Yang Hongyang; Andrew Cooper; xen-devel@lists.xen.org
> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian Jackson
> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq
> page only one time
> 
> On 06/12/2015 03:41 PM, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
> >> Sent: 12 June 2015 04:22
> >> To: Paul Durrant; Yang Hongyang; Andrew Cooper; xen-
> devel@lists.xen.org
> >> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
> >> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian Jackson
> >> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
> ioreq
> >> page only one time
> >>
> >> On 06/11/2015 09:25 PM, Paul Durrant wrote:
> >>>> -----Original Message-----
> >>>> From: Yang Hongyang [mailto:yanghy@cn.fujitsu.com]
> >>>> Sent: 11 June 2015 13:59
> >>>> To: Paul Durrant; Wen Congyang; Andrew Cooper; xen-
> >> devel@lists.xen.org
> >>>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
> >>>> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian Jackson
> >>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
> >> ioreq
> >>>> page only one time
> >>>>
> >>>>
> >>>>
> >>>> On 06/11/2015 06:20 PM, Paul Durrant wrote:
> >>>>>> -----Original Message-----
> >>>>>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
> >>>>>> Sent: 11 June 2015 09:48
> >>>>>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-
> >>>> devel@lists.xen.org
> >>>>>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
> >>>> [...]
> >>>>>
> >>>>>>
> >>>>>> In our implementation, we don't start a new emulator. The codes
> can
> >>>> work,
> >>>>>> but some bugs may be not triggered.
> >>>>>>
> >>>>>
> >>>>> How do you reconcile the incoming QEMU save record with the
> running
> >>>> emulator state?
> >>>>
> >>>> We introduce a qmp command "xen-load-devices-
> >>>> state"(libxl__qmp_restore) which
> >>>> can restore the emulator state. The step of resotre emulator state at a
> >>>> checkpoint is:
> >>>>
> >>>> 1. libxl__qmp_stop    -> vm_stop() in qemu
> >>>> 2. libxl__qmp_restore -> load_vmstate() in qemu
> >>>> 3. libxl__qmp_resume  -> vm_start() in qemu
> >>>>
> >>>
> >>> Ok, that sounds like the ideal time to hook back into Xen by creating a
> new
> >> ioreq server.
> >>
> >> I have some questions about ioreq server:
> >> 1. If we use old version xen and newest version qemu, is it OK? Is default
> >>    ioreq server created when the guest is created.
> xen_create_ioreq_server()
> >> does
> >>    nothing, and xen_get_ioreq_server_info() will get the default ioreq
> server
> >> information.
> >>    Is it right?
> >
> > No. It's not compatible in that direction. A new Xen will work with an old
> QEMU but not the other way round.
> >
> >> 2. Why we create a default ioreq server when getting the hvm param if
> there
> >> is already a
> >>    not default ioreq server?
> >
> > If something reads the 'legacy' HVM params then that is Xen's trigger to
> create the default server. Any 'new' emulator should be using the ioreq
> server hypercalls so the default server will not be needed.
> 
> If there are two ioreq servers (the default ioreq server and an ioreq server
> created by the emulator), can the guest still work correctly in this case?

You mean a secondary emulator? Yes, that's why there is the notion of default ioreq server... to allow a secondary emulator to be used even when an old QEMU is in use.

> Is there any application (not an emulator) that uses libxenctrl directly?
> 

What do you mean by application? Toolstacks may use libxenctrl.

  Paul


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
  2015-06-12 10:54                                               ` Paul Durrant
@ 2015-06-12 11:09                                                 ` Wen Congyang
  2015-06-12 11:48                                                   ` Paul Durrant
  0 siblings, 1 reply; 106+ messages in thread
From: Wen Congyang @ 2015-06-12 11:09 UTC (permalink / raw)
  To: Paul Durrant, Yang Hongyang, Andrew Cooper, xen-devel
  Cc: Wei Liu, Ian Campbell, guijianfeng, yunhong.jiang, Eddie Dong,
	rshriram, Ian Jackson

On 06/12/2015 06:54 PM, Paul Durrant wrote:
>> -----Original Message-----
>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
>> Sent: 12 June 2015 11:26
>> To: Paul Durrant; Yang Hongyang; Andrew Cooper; xen-devel@lists.xen.org
>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
>> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian Jackson
>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq
>> page only one time
>>
>> On 06/12/2015 03:41 PM, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
>>>> Sent: 12 June 2015 04:22
>>>> To: Paul Durrant; Yang Hongyang; Andrew Cooper; xen-
>> devel@lists.xen.org
>>>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
>>>> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian Jackson
>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
>> ioreq
>>>> page only one time
>>>>
>>>> On 06/11/2015 09:25 PM, Paul Durrant wrote:
>>>>>> -----Original Message-----
>>>>>> From: Yang Hongyang [mailto:yanghy@cn.fujitsu.com]
>>>>>> Sent: 11 June 2015 13:59
>>>>>> To: Paul Durrant; Wen Congyang; Andrew Cooper; xen-
>>>> devel@lists.xen.org
>>>>>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
>>>>>> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian Jackson
>>>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
>>>> ioreq
>>>>>> page only one time
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 06/11/2015 06:20 PM, Paul Durrant wrote:
>>>>>>>> -----Original Message-----
>>>>>>>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
>>>>>>>> Sent: 11 June 2015 09:48
>>>>>>>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-
>>>>>> devel@lists.xen.org
>>>>>>>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
>>>>>> [...]
>>>>>>>
>>>>>>>>
>>>>>>>> In our implementation, we don't start a new emulator. The codes
>> can
>>>>>> work,
>>>>>>>> but some bugs may be not triggered.
>>>>>>>>
>>>>>>>
>>>>>>> How do you reconcile the incoming QEMU save record with the
>> running
>>>>>> emulator state?
>>>>>>
>>>>>> We introduce a qmp command "xen-load-devices-
>>>>>> state"(libxl__qmp_restore) which
> >>>>>> can restore the emulator state. The step of restoring emulator state at a
>>>>>> checkpoint is:
>>>>>>
>>>>>> 1. libxl__qmp_stop    -> vm_stop() in qemu
>>>>>> 2. libxl__qmp_restore -> load_vmstate() in qemu
>>>>>> 3. libxl__qmp_resume  -> vm_start() in qemu
>>>>>>
>>>>>
>>>>> Ok, that sounds like the ideal time to hook back into Xen by creating a
>> new
>>>> ioreq server.
>>>>
>>>> I have some questions about ioreq server:
>>>> 1. If we use old version xen and newest version qemu, is it OK? Is default
>>>>    ioreq server created when the guest is created.
>> xen_create_ioreq_server()
>>>> does
>>>>    nothing, and xen_get_ioreq_server_info() will get the default ioreq
>> server
>>>> information.
>>>>    Is it right?
>>>
>>> No. It's not compatible in that direction. A new Xen will work with an old
>> QEMU but not the other way round.

If Xen is the newest and QEMU is old, how is the default ioreq server created
for the emulator? Will the old QEMU call xc_get_hvm_param(), and will the
hypervisor then create a default ioreq server?
Get is a 'read-only' operation, so creating the default ioreq server in it
seems very strange.

>>>
>>>> 2. Why we create a default ioreq server when getting the hvm param if
>> there
>>>> is already a
>>>>    not default ioreq server?
>>>
>>> If something reads the 'legacy' HVM params then that is Xen's trigger to
>> create the default server. Any 'new' emulator should be using the ioreq
>> server hypercalls so the default server will not be needed.
>>
>> If there are two ioreq servers: default ioreq server, and a ioreq server
>> created by emulator. The guest can work it correctly in
>> this case?
> 
> You mean a secondary emulator? Yes, that's why there is the notion of default ioreq server... to allow a secondary emulator to be used even when an old QEMU is in use.

No, only one emulator. Can we run more than one emulator for one HVM guest?
How is that done?

> 
>> Is there any application(not emulator) that uses the libxenctrl
>> directly?
>>
> 
> What do you mean by application? Toolstacks may use libxenctrl.

For example: libvirt. I know it uses libxl now. Is there any similar application
that uses libxenctrl directly?

Thanks
Wen Congyang

> 
>   Paul
> 
>> Thanks
>> Wen Congyang
>>
>>>
>>>> 3. In the far end, we will clear the ioreq page, and this ioreq page is used
>> for
>>>> default
>>>>    ioreq server, is it right?
>>>
>>> Yes, AFAIK it's only the 'magic' pages that get cleared at the far end - and
>> that includes the default server pages. Other ioreq servers will have their
>> pages cleared on re-insertion to the P2M at the source end when the server
>> is disabled.
>>>
>>>   Paul
>>>
>>>>
>>>> Thanks
>>>> Wen Congyang
>>>>
>>>>>
>>>>>   Paul
>>>>>
>>>>>>>
>>>>>>>    Paul
>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Wen Congyang
>>>>>>>>
>>>>>>>>>
>>>>>>>>>    Paul
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Wen Congyang
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>    Paul
>>>>>>>>>>>
>>>>>>>>>>>> We will set to the guest to a new state, the old state should be
>>>>>>>> dropped.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>> Wen Congyang
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>    Paul
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>> Wen Congyang
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>    Paul
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>> Wen Congyang
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ~Andrew
>>>>>>>>>>>>>>>>> .
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> .
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> .
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> Xen-devel mailing list
>>>>>>>>>>>> Xen-devel@lists.xen.org
>>>>>>>>>>>> http://lists.xen.org/xen-devel
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Xen-devel mailing list
>>>>>>>>>>> Xen-devel@lists.xen.org
>>>>>>>>>>> http://lists.xen.org/xen-devel
>>>>>>>>>>> .
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> .
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Xen-devel mailing list
>>>>>>> Xen-devel@lists.xen.org
>>>>>>> http://lists.xen.org/xen-devel
>>>>>>> .
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Thanks,
>>>>>> Yang.
>>>>> .
>>>>>
>>>
>>> .
>>>
> 
> .
> 

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 06/13] tools/libxl: Introduce a new internal API libxl__domain_unpause()
  2015-06-11  9:42           ` Ian Campbell
  2015-06-11  9:48             ` Wen Congyang
@ 2015-06-12 11:23             ` Ian Jackson
  1 sibling, 0 replies; 106+ messages in thread
From: Ian Jackson @ 2015-06-12 11:23 UTC (permalink / raw)
  To: Ian Campbell
  Cc: wei.liu2, Wen Congyang, andrew.cooper3, yunhong.jiang,
	eddie.dong, xen-devel, guijianfeng, rshriram, Yang Hongyang

Ian Campbell writes ("Re: [Xen-devel] [PATCH v2 COLOPre 06/13] tools/libxl: Introduce a new internal API libxl__domain_unpause()"):
> On Thu, 2015-06-11 at 17:09 +0800, Wen Congyang wrote:
> > If the public API creates a new AO, it is safe to call it directly?
> 
> A public function which takes an ao_how is, I believe, an exception to
> this rule and should be annotated with LIBXL_EXTERNAL_CALLERS_ONLY to
> prevent accidents.

Yes.

> I don't think libxl_domain_unpause is such a function though.

Indeed.

Ian.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
  2015-06-12 11:09                                                 ` Wen Congyang
@ 2015-06-12 11:48                                                   ` Paul Durrant
  2015-06-12 15:04                                                     ` Wen Congyang
  0 siblings, 1 reply; 106+ messages in thread
From: Paul Durrant @ 2015-06-12 11:48 UTC (permalink / raw)
  To: Wen Congyang, Yang Hongyang, Andrew Cooper, xen-devel
  Cc: Wei Liu, Ian Campbell, guijianfeng, yunhong.jiang, Eddie Dong,
	rshriram, Ian Jackson

> -----Original Message-----
> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
> Sent: 12 June 2015 12:10
> To: Paul Durrant; Yang Hongyang; Andrew Cooper; xen-devel@lists.xen.org
> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian Jackson
> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq
> page only one time
> 
> On 06/12/2015 06:54 PM, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
> >> Sent: 12 June 2015 11:26
> >> To: Paul Durrant; Yang Hongyang; Andrew Cooper; xen-
> devel@lists.xen.org
> >> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
> >> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian Jackson
> >> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
> ioreq
> >> page only one time
> >>
> >> On 06/12/2015 03:41 PM, Paul Durrant wrote:
> >>>> -----Original Message-----
> >>>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
> >>>> Sent: 12 June 2015 04:22
> >>>> To: Paul Durrant; Yang Hongyang; Andrew Cooper; xen-
> >> devel@lists.xen.org
> >>>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
> >>>> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian Jackson
> >>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
> >> ioreq
> >>>> page only one time
> >>>>
> >>>> On 06/11/2015 09:25 PM, Paul Durrant wrote:
> >>>>>> -----Original Message-----
> >>>>>> From: Yang Hongyang [mailto:yanghy@cn.fujitsu.com]
> >>>>>> Sent: 11 June 2015 13:59
> >>>>>> To: Paul Durrant; Wen Congyang; Andrew Cooper; xen-
> >>>> devel@lists.xen.org
> >>>>>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
> >>>>>> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian
> Jackson
> >>>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore:
> zero
> >>>> ioreq
> >>>>>> page only one time
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 06/11/2015 06:20 PM, Paul Durrant wrote:
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
> >>>>>>>> Sent: 11 June 2015 09:48
> >>>>>>>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-
> >>>>>> devel@lists.xen.org
> >>>>>>>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
> >>>>>> [...]
> >>>>>>>
> >>>>>>>>
> >>>>>>>> In our implementation, we don't start a new emulator. The codes
> >> can
> >>>>>> work,
> >>>>>>>> but some bugs may be not triggered.
> >>>>>>>>
> >>>>>>>
> >>>>>>> How do you reconcile the incoming QEMU save record with the
> >> running
> >>>>>> emulator state?
> >>>>>>
> >>>>>> We introduce a qmp command "xen-load-devices-
> >>>>>> state"(libxl__qmp_restore) which
> >>>>>> can restore the emulator state. The step of restoring emulator state at
> a
> >>>>>> checkpoint is:
> >>>>>>
> >>>>>> 1. libxl__qmp_stop    -> vm_stop() in qemu
> >>>>>> 2. libxl__qmp_restore -> load_vmstate() in qemu
> >>>>>> 3. libxl__qmp_resume  -> vm_start() in qemu
> >>>>>>
> >>>>>
> >>>>> Ok, that sounds like the ideal time to hook back into Xen by creating a
> >> new
> >>>> ioreq server.
> >>>>
> >>>> I have some questions about ioreq server:
> >>>> 1. If we use old version xen and newest version qemu, is it OK? Is
> default
> >>>>    ioreq server created when the guest is created.
> >> xen_create_ioreq_server()
> >>>> does
> >>>>    nothing, and xen_get_ioreq_server_info() will get the default ioreq
> >> server
> >>>> information.
> >>>>    Is it right?
> >>>
> >>> No. It's not compatible in that direction. A new Xen will work with an old
> >> QEMU but not the other way round.
> 
> If Xen is the newest and QEMU is old, how is the default ioreq server created
> for the emulator? Will the old QEMU call xc_get_hvm_param(), and will the
> hypervisor then create a default ioreq server?
> Get is a 'read-only' operation, so creating the default ioreq server in it
> seems very strange.
> 

Might be strange, but that's the way it's done.

> >>>
> >>>> 2. Why we create a default ioreq server when getting the hvm param if
> >> there
> >>>> is already a
> >>>>    not default ioreq server?
> >>>
> >>> If something reads the 'legacy' HVM params then that is Xen's trigger to
> >> create the default server. Any 'new' emulator should be using the ioreq
> >> server hypercalls so the default server will not be needed.
> >>
> >> If there are two ioreq servers: default ioreq server, and a ioreq server
> >> created by emulator. The guest can work it correctly in
> >> this case?
> >
> > You mean a secondary emulator? Yes, that's why there is the notion of
> default ioreq server... to allow a secondary emulator to be used even when
> an old QEMU is in use.
> 
> No, only one emulator. Can we run more than one emulator for one HVM
> guest? How is that done?
> 

Yes, more than one emulator can run. There's nothing in libxl to do it, but we do it in XenServer using XAPI.

> >
> >> Is there any application(not emulator) that uses the libxenctrl
> >> directly?
> >>
> >
> > What do you mean by application? Toolstacks may use libxenctrl.
> 
> For example: libvirt. I know it uses libxl now. Is there any similar application
> that uses libxenctrl directly?

Well, XAPI uses it for one.

  Paul

> 
> Thanks
> Wen Congyang
> 

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 07/13] tools/libxl: Update libxl__domain_unpause() to support qemu-xen
  2015-06-08  3:43 ` [PATCH v2 COLOPre 07/13] tools/libxl: Update libxl__domain_unpause() to support qemu-xen Yang Hongyang
@ 2015-06-12 12:33   ` Wei Liu
  2015-06-15  1:29     ` Yang Hongyang
  0 siblings, 1 reply; 106+ messages in thread
From: Wei Liu @ 2015-06-12 12:33 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, xen-devel, guijianfeng, rshriram, ian.jackson

On Mon, Jun 08, 2015 at 11:43:11AM +0800, Yang Hongyang wrote:
> Currently, libxl__domain_unpause() only supports
> qemu-xen-traditional. Update it to support qemu-xen.
> 
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>

This looks very similar to an existing function called
libxl__domain_resume_device_model. Maybe you don't need to invent a new
function.

> ---
>  tools/libxl/libxl.c | 42 +++++++++++++++++++++++++++++++++---------
>  1 file changed, 33 insertions(+), 9 deletions(-)
> 
> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index d5691dc..5c843c2 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -933,10 +933,37 @@ out:
>      return AO_INPROGRESS;
>  }
>  
> -int libxl__domain_unpause(libxl__gc *gc, uint32_t domid)
> +static int libxl__domain_unpause_device_model(libxl__gc *gc, uint32_t domid)
>  {
>      char *path;
>      char *state;
> +
> +    switch (libxl__device_model_version_running(gc, domid)) {
> +    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL: {
> +        uint32_t dm_domid = libxl_get_stubdom_id(CTX, domid);
> +
> +        path = libxl__device_model_xs_path(gc, dm_domid, domid, "/state");
> +        state = libxl__xs_read(gc, XBT_NULL, path);
> +        if (state != NULL && !strcmp(state, "paused")) {

The only difference between your function and
libxl__domain_resume_device_model is the check for the "state" node. I
think you can just add the check to libxl__domain_resume_device_model
and use that function.

Wei.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 10/13] tools/libxl: Add back channel to allow migration target send data back
  2015-06-08  3:43 ` [PATCH v2 COLOPre 10/13] tools/libxl: Add back channel to allow migration target send data back Yang Hongyang
@ 2015-06-12 12:54   ` Wei Liu
  2015-06-12 15:04     ` Ian Jackson
  2015-06-15  1:33     ` Yang Hongyang
  0 siblings, 2 replies; 106+ messages in thread
From: Wei Liu @ 2015-06-12 12:54 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, xen-devel, guijianfeng, rshriram, ian.jackson

On Mon, Jun 08, 2015 at 11:43:14AM +0800, Yang Hongyang wrote:
> From: Wen Congyang <wency@cn.fujitsu.com>
> 
> In colo mode, slave needs to send data to master, but the io_fd
> only can be written in master, and only can be read in slave.
> Save recv_fd in domain_suspend_state, and send_fd in
> domain_create_state.
> 

You failed to mention in the commit message that new structures are introduced
in the IDL.

> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> ---
>  tools/libxl/libxl.c          |  2 +-
>  tools/libxl/libxl_create.c   | 14 ++++++++++----
>  tools/libxl/libxl_internal.h |  2 ++
>  tools/libxl/libxl_types.idl  |  7 +++++++
>  tools/libxl/xl_cmdimpl.c     |  7 +++++++

You also need to add LIBXL_HAVE in libxl.h.
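For reference, such a guard is usually just a commented define near the top of libxl.h; the exact macro name below is an assumption, not necessarily the one this series should use:

```c
/*
 * LIBXL_HAVE_CREATE_RESTORE_SEND_FD (assumed name)
 *
 * If this is defined, libxl_domain_restore_params contains the send_fd
 * field used by the COLO back channel.
 */
#define LIBXL_HAVE_CREATE_RESTORE_SEND_FD 1
```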

>  5 files changed, 27 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index 5c843c2..36b97fe 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -832,7 +832,7 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
>      dss->callback = remus_failover_cb;
>      dss->domid = domid;
>      dss->fd = send_fd;
> -    /* TODO do something with recv_fd */
> +    dss->recv_fd = recv_fd;
>      dss->type = type;
>      dss->live = 1;
>      dss->debug = 0;
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> index 86384d2..bd8149c 100644
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -1577,8 +1577,8 @@ static void domain_create_cb(libxl__egc *egc,
>                               int rc, uint32_t domid);
>  
>  static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
> -                            uint32_t *domid,
> -                            int restore_fd, int checkpointed_stream,
> +                            uint32_t *domid, int restore_fd,
> +                            int send_fd, int checkpointed_stream,
>                              const libxl_asyncop_how *ao_how,
>                              const libxl_asyncprogress_how *aop_console_how)
>  {
> @@ -1591,6 +1591,7 @@ static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
>      libxl_domain_config_init(&cdcs->dcs.guest_config_saved);
>      libxl_domain_config_copy(ctx, &cdcs->dcs.guest_config_saved, d_config);
>      cdcs->dcs.restore_fd = restore_fd;
> +    cdcs->dcs.send_fd = send_fd;
>      cdcs->dcs.callback = domain_create_cb;
>      cdcs->dcs.checkpointed_stream = checkpointed_stream;
>      libxl__ao_progress_gethow(&cdcs->dcs.aop_console_how, aop_console_how);
> @@ -1619,7 +1620,7 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
>                              const libxl_asyncop_how *ao_how,
>                              const libxl_asyncprogress_how *aop_console_how)
>  {
> -    return do_domain_create(ctx, d_config, domid, -1, 0,
> +    return do_domain_create(ctx, d_config, domid, -1, -1, 0,
>                              ao_how, aop_console_how);
>  }
>  
> @@ -1629,7 +1630,12 @@ int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
>                                  const libxl_asyncop_how *ao_how,
>                                  const libxl_asyncprogress_how *aop_console_how)
>  {
> -    return do_domain_create(ctx, d_config, domid, restore_fd,
> +    int send_fd = -1;
> +
> +    if (params->checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_COLO)
> +        send_fd = params->send_fd;
> +
> +    return do_domain_create(ctx, d_config, domid, restore_fd, send_fd,
>                              params->checkpointed_stream, ao_how, aop_console_how);
>  }
>  
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index fbbae93..6d214b5 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -2874,6 +2874,7 @@ struct libxl__domain_save_state {
>  
>      uint32_t domid;
>      int fd;
> +    int recv_fd;
>      libxl_domain_type type;
>      int live;
>      int debug;
> @@ -3143,6 +3144,7 @@ struct libxl__domain_create_state {
>      libxl_domain_config *guest_config;
>      libxl_domain_config guest_config_saved; /* vanilla config */
>      int restore_fd;
> +    int send_fd;
>      libxl__domain_create_cb *callback;
>      libxl_asyncprogress_how aop_console_how;
>      /* private to domain_create */
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index 23f27d4..8a3d7ba 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -198,6 +198,12 @@ libxl_viridian_enlightenment = Enumeration("viridian_enlightenment", [
>      (3, "reference_tsc"),
>      ])
>  
> +libxl_checkpointed_stream = Enumeration("checkpointed_stream", [
> +    (0, "NONE"),
> +    (1, "REMUS"),
> +    (2, "COLO"),
> +    ], init_val = 0)

The default init_val is 0, so you don't need to write it down.
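That is, the enumeration from the hunk above can simply be written as:

```
libxl_checkpointed_stream = Enumeration("checkpointed_stream", [
    (0, "NONE"),
    (1, "REMUS"),
    (2, "COLO"),
    ])
```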

> +
>  #
>  # Complex libxl types
>  #
> @@ -346,6 +352,7 @@ libxl_domain_create_info = Struct("domain_create_info",[
>  
>  libxl_domain_restore_params = Struct("domain_restore_params", [
>      ("checkpointed_stream", integer),
> +    ("send_fd", integer),

I'm not entirely sure if we want to bury an extra argument here.

After looking at code I think you're trying to work around API
limitation. I think we are safe to extend the API -- we've already done
that before. See libxl.h around line 990.

Ian and Ian, what do you think?

>      ])
>  
>  libxl_domain_sched_params = Struct("domain_sched_params",[
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index c858068..adfadd1 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c

I would also suggest you rename the last argument of migrate_receive in
this file from "remus" to "checkpointed_stream_type", since the semantics
of that parameter have changed.

Wei.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 13/13] tools/libxl: don't touch remus in checkpoint_device
  2015-06-08  3:43 ` [PATCH v2 COLOPre 13/13] tools/libxl: don't touch remus in checkpoint_device Yang Hongyang
@ 2015-06-12 13:28   ` Wei Liu
  2015-06-15  1:46     ` Yang Hongyang
  0 siblings, 1 reply; 106+ messages in thread
From: Wei Liu @ 2015-06-12 13:28 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, xen-devel, guijianfeng, rshriram, ian.jackson

On Mon, Jun 08, 2015 at 11:43:17AM +0800, Yang Hongyang wrote:
> Checkpoint device is an abstract layer to do checkpoint.
> COLO can also use it to do checkpoint. But there are
> still some codes in checkpoint device which touch remus:
> 1. remus_ops: we use remus ops directly in checkpoint
>    device. Store it in checkpoint device state.
> 2. concrete layer's private member: add a new structure
>    remus state, and move them to remus state.
> 3. init/cleanup device subkind: we call (init|cleanup)_subkind_nic
>    and (init|cleanup)_subkind_drbd_disk directly in checkpoint
>    device. Call them before calling libxl__checkpoint_devices_setup()
>    or after calling libxl__checkpoint_devices_teardown().
> 

From the look of it this patch is mostly refactoring and doesn't involve
functional changes, right? If so, please state that in the commit message.

I suppose this needs review from remus maintainer.

Wei.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 11/13] tools/libxl: rename remus device to checkpoint device
  2015-06-08  3:43 ` [PATCH v2 COLOPre 11/13] tools/libxl: rename remus device to checkpoint device Yang Hongyang
@ 2015-06-12 13:30   ` Wei Liu
  2015-06-12 13:35     ` Wei Liu
  0 siblings, 1 reply; 106+ messages in thread
From: Wei Liu @ 2015-06-12 13:30 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, xen-devel, guijianfeng, rshriram, ian.jackson

On Mon, Jun 08, 2015 at 11:43:15AM +0800, Yang Hongyang wrote:
[...]
> 
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> ---
>  tools/libxl/Makefile                  |   2 +-
>  tools/libxl/libxl_checkpoint_device.c | 327 ++++++++++++++++++++++++++++++++++
>  tools/libxl/libxl_internal.h          | 112 ++++++------
>  tools/libxl/libxl_netbuffer.c         | 108 +++++------
>  tools/libxl/libxl_nonetbuffer.c       |  10 +-
>  tools/libxl/libxl_remus.c             |  76 ++++----
>  tools/libxl/libxl_remus_device.c      | 327 ----------------------------------
>  tools/libxl/libxl_remus_disk_drbd.c   |  52 +++---
>  tools/libxl/libxl_types.idl           |   4 +-

The only thing I need to complain about is the change to the IDL.

>      .kind = LIBXL__DEVICE_KIND_VBD,
>      .setup = drbd_setup,
>      .teardown = drbd_teardown,
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index 8a3d7ba..375c546 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -61,8 +61,8 @@ libxl_error = Enumeration("error", [
>      (-15, "LOCK_FAIL"),
>      (-16, "JSON_CONFIG_EMPTY"),
>      (-17, "DEVICE_EXISTS"),
> -    (-18, "REMUS_DEVOPS_DOES_NOT_MATCH"),
> -    (-19, "REMUS_DEVICE_NOT_SUPPORTED"),
> +    (-18, "CHECKPOINT_DEVOPS_DOES_NOT_MATCH"),
> +    (-19, "CHECKPOINT_DEVICE_NOT_SUPPORTED"),

You should add two new error numbers.

Wei.

>      (-20, "VNUMA_CONFIG_INVALID"),
>      (-21, "DOMAIN_NOTFOUND"),
>      ], value_namespace = "")
> -- 
> 1.9.1
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 11/13] tools/libxl: rename remus device to checkpoint device
  2015-06-12 13:30   ` Wei Liu
@ 2015-06-12 13:35     ` Wei Liu
  2015-06-12 14:57       ` Ian Jackson
  0 siblings, 1 reply; 106+ messages in thread
From: Wei Liu @ 2015-06-12 13:35 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, xen-devel, guijianfeng, rshriram, ian.jackson

On Fri, Jun 12, 2015 at 02:30:46PM +0100, Wei Liu wrote:
> On Mon, Jun 08, 2015 at 11:43:15AM +0800, Yang Hongyang wrote:
> [...]
> > 
> > Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> > Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> > ---
> >  tools/libxl/Makefile                  |   2 +-
> >  tools/libxl/libxl_checkpoint_device.c | 327 ++++++++++++++++++++++++++++++++++
> >  tools/libxl/libxl_internal.h          | 112 ++++++------
> >  tools/libxl/libxl_netbuffer.c         | 108 +++++------
> >  tools/libxl/libxl_nonetbuffer.c       |  10 +-
> >  tools/libxl/libxl_remus.c             |  76 ++++----
> >  tools/libxl/libxl_remus_device.c      | 327 ----------------------------------
> >  tools/libxl/libxl_remus_disk_drbd.c   |  52 +++---
> >  tools/libxl/libxl_types.idl           |   4 +-
> 
> This only thing I need to complain is the change to IDL.
> 
> >      .kind = LIBXL__DEVICE_KIND_VBD,
> >      .setup = drbd_setup,
> >      .teardown = drbd_teardown,
> > diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> > index 8a3d7ba..375c546 100644
> > --- a/tools/libxl/libxl_types.idl
> > +++ b/tools/libxl/libxl_types.idl
> > @@ -61,8 +61,8 @@ libxl_error = Enumeration("error", [
> >      (-15, "LOCK_FAIL"),
> >      (-16, "JSON_CONFIG_EMPTY"),
> >      (-17, "DEVICE_EXISTS"),
> > -    (-18, "REMUS_DEVOPS_DOES_NOT_MATCH"),
> > -    (-19, "REMUS_DEVICE_NOT_SUPPORTED"),
> > +    (-18, "CHECKPOINT_DEVOPS_DOES_NOT_MATCH"),
> > +    (-19, "CHECKPOINT_DEVICE_NOT_SUPPORTED"),
> 
> You should add two new error numbers.
> 

And in that case you might also need to go through all places to make
sure the correct error numbers are returned. I.e. the old remus code path
still returns the REMUS error code and the new CHECKPOINT code path returns
the new error code.

I merely speak from an API backward-compatibility point of view. If you
think what I suggest doesn't make sense, please let me know.

Wei.
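Wei's suggestion can be sketched as the C enum the IDL would generate. Everything below is illustrative, not actual libxl code: the existing REMUS values are kept for old callers and two new CHECKPOINT values (with assumed numbers -22/-23) are appended.

```c
/* Hedged sketch of Wei's suggestion: instead of renaming -18/-19, keep the
 * existing REMUS values for old callers and append two new CHECKPOINT
 * values. All names and values here are assumptions, not actual libxl code. */
enum libxl_error_sketch {
    ERROR_REMUS_DEVOPS_DOES_NOT_MATCH      = -18, /* kept: old Remus path  */
    ERROR_REMUS_DEVICE_NOT_SUPPORTED       = -19, /* kept: old Remus path  */
    ERROR_VNUMA_CONFIG_INVALID             = -20,
    ERROR_DOMAIN_NOTFOUND                  = -21,
    ERROR_CHECKPOINT_DEVOPS_DOES_NOT_MATCH = -22, /* new: checkpoint path  */
    ERROR_CHECKPOINT_DEVICE_NOT_SUPPORTED  = -23  /* new: checkpoint path  */
};
```

Under this scheme old Remus callers would keep seeing -18/-19, while the generic checkpoint code would return the new values.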


> Wei.
> 
> >      (-20, "VNUMA_CONFIG_INVALID"),
> >      (-21, "DOMAIN_NOTFOUND"),
> >      ], value_namespace = "")

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 11/13] tools/libxl: rename remus device to checkpoint device
  2015-06-12 13:35     ` Wei Liu
@ 2015-06-12 14:57       ` Ian Jackson
  2015-06-15  1:45         ` Yang Hongyang
  0 siblings, 1 reply; 106+ messages in thread
From: Ian Jackson @ 2015-06-12 14:57 UTC (permalink / raw)
  To: Wei Liu
  Cc: ian.campbell, wency, andrew.cooper3, yunhong.jiang, eddie.dong,
	xen-devel, guijianfeng, rshriram, Yang Hongyang, ian.jackson

Wei Liu writes ("Re: [Xen-devel] [PATCH v2 COLOPre 11/13] tools/libxl: rename remus device to checkpoint device"):
> On Fri, Jun 12, 2015 at 02:30:46PM +0100, Wei Liu wrote:
> > On Mon, Jun 08, 2015 at 11:43:15AM +0800, Yang Hongyang wrote:
> > > -    (-18, "REMUS_DEVOPS_DOES_NOT_MATCH"),
> > > -    (-19, "REMUS_DEVICE_NOT_SUPPORTED"),
> > > +    (-18, "CHECKPOINT_DEVOPS_DOES_NOT_MATCH"),
> > > +    (-19, "CHECKPOINT_DEVICE_NOT_SUPPORTED"),
> > 
> > You should add two new error numbers.
> > 
> 
> And in that case you might also need to go through all places to make
> sure the correct error numbers are returned. I.e. the old remus code path
> still returns the REMUS error code and the new CHECKPOINT code path
> returns the new error code.
> 
> I merely speak from an API backward-compatibility point of view. If you
> think what I suggest doesn't make sense, please let me know.

To me this line of reasoning prompts the question: what would be wrong
with leaving the word REMUS in the error names, and simply updating the
descriptions?

After all, AFAICT the circumstances are very similar.  I don't think it
makes sense to require libxl to do something like
   rc = were_we_doing_colo_not_remus ? CHECKPOINT_BLAH : REMUS_BLAH;

Please do contradict me if I have misunderstood...

Ian.
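Ian's objection can be made concrete with a small self-contained sketch (the constants reuse the illustrative values from earlier in the thread; `in_colo_mode` and both functions are hypothetical): with split codes, every error site has to branch on the mode, whereas a single shared code needs no branch.

```c
/* Sketch of the branching Ian argues against; all names are hypothetical. */
enum {
    ERROR_REMUS_DEVOPS_DOES_NOT_MATCH      = -18,
    ERROR_CHECKPOINT_DEVOPS_DOES_NOT_MATCH = -22
};

/* With split codes, every error site must know which mode it is in: */
static int devops_mismatch_rc(int in_colo_mode)
{
    return in_colo_mode ? ERROR_CHECKPOINT_DEVOPS_DOES_NOT_MATCH
                        : ERROR_REMUS_DEVOPS_DOES_NOT_MATCH;
}

/* Keeping the single (REMUS-named) code needs no such branch: */
static int devops_mismatch_rc_shared(void)
{
    return ERROR_REMUS_DEVOPS_DOES_NOT_MATCH; /* description covers both */
}
```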

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 10/13] tools/libxl: Add back channel to allow migration target send data back
  2015-06-12 12:54   ` Wei Liu
@ 2015-06-12 15:04     ` Ian Jackson
  2015-06-15  1:38       ` Yang Hongyang
  2015-06-15  1:33     ` Yang Hongyang
  1 sibling, 1 reply; 106+ messages in thread
From: Ian Jackson @ 2015-06-12 15:04 UTC (permalink / raw)
  To: Wei Liu
  Cc: ian.campbell, wency, andrew.cooper3, yunhong.jiang, eddie.dong,
	xen-devel, guijianfeng, rshriram, Yang Hongyang

Wei Liu writes ("Re: [Xen-devel] [PATCH v2 COLOPre 10/13] tools/libxl: Add back channel to allow migration target send data back"):
> On Mon, Jun 08, 2015 at 11:43:14AM +0800, Yang Hongyang wrote:
> > From: Wen Congyang <wency@cn.fujitsu.com>
> > 
> > In COLO mode, the slave needs to send data to the master, but the io_fd
> > can only be written on the master side and only read on the slave side.
> > Save recv_fd in domain_suspend_state, and send_fd in
> > domain_create_state.
...
> >  libxl_domain_restore_params = Struct("domain_restore_params", [
> >      ("checkpointed_stream", integer),
> > +    ("send_fd", integer),
> 
> I'm not entirely sure if we want to bury an extra argument here.
> 
> After looking at the code I think you're trying to work around an API
> limitation. I think we are safe to extend the API -- we've already done
> that before. See libxl.h around line 990.
> 
> Ian and Ian, what do you think?

I agree with you, Wei.  I don't think an fd should be in
libxl_domain_restore_params at all.

We need to understand what the API semantics are.  Are we going to
introduce a new libxl API entrypoint?  We already have
libxl_domain_remus_start.

Ian.
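The two API shapes under discussion can be sketched side by side. Both the structure and the entrypoint below are hypothetical illustrations, not the actual libxl API:

```c
/* Sketch of the two API shapes under discussion; all names are hypothetical.
 *
 * (a) The patch's approach: smuggle the back-channel fd through the
 *     restore parameters structure. */
struct domain_restore_params_sketch {
    int checkpointed_stream;
    int send_fd;              /* back channel buried in a data structure */
};

/* (b) A dedicated entrypoint (by analogy with libxl_domain_remus_start)
 *     that takes both stream fds explicitly. */
static int domain_colo_restore_sketch(int recv_fd, int send_fd)
{
    if (recv_fd < 0 || send_fd < 0)
        return -1;            /* both channels are required for COLO */
    return 0;
}
```

Option (b) makes the fd requirements part of the function signature rather than hiding them in a parameters struct, which is the concern raised above.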

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
  2015-06-12 11:48                                                   ` Paul Durrant
@ 2015-06-12 15:04                                                     ` Wen Congyang
  2015-06-12 15:31                                                       ` Paul Durrant
  0 siblings, 1 reply; 106+ messages in thread
From: Wen Congyang @ 2015-06-12 15:04 UTC (permalink / raw)
  To: Paul Durrant, Wen Congyang, Yang Hongyang, Andrew Cooper, xen-devel
  Cc: Wei Liu, Ian Campbell, guijianfeng, yunhong.jiang, Eddie Dong,
	rshriram, Ian Jackson

At 2015/6/12 19:48, Paul Durrant Wrote:
>> -----Original Message-----
>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
>> Sent: 12 June 2015 12:10
>> To: Paul Durrant; Yang Hongyang; Andrew Cooper; xen-devel@lists.xen.org
>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
>> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian Jackson
>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq
>> page only one time
>>
>> On 06/12/2015 06:54 PM, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
>>>> Sent: 12 June 2015 11:26
>>>> To: Paul Durrant; Yang Hongyang; Andrew Cooper; xen-
>> devel@lists.xen.org
>>>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
>>>> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian Jackson
>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
>> ioreq
>>>> page only one time
>>>>
>>>> On 06/12/2015 03:41 PM, Paul Durrant wrote:
>>>>>> -----Original Message-----
>>>>>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
>>>>>> Sent: 12 June 2015 04:22
>>>>>> To: Paul Durrant; Yang Hongyang; Andrew Cooper; xen-
>>>> devel@lists.xen.org
>>>>>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
>>>>>> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian Jackson
>>>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
>>>> ioreq
>>>>>> page only one time
>>>>>>
>>>>>> On 06/11/2015 09:25 PM, Paul Durrant wrote:
>>>>>>>> -----Original Message-----
>>>>>>>> From: Yang Hongyang [mailto:yanghy@cn.fujitsu.com]
>>>>>>>> Sent: 11 June 2015 13:59
>>>>>>>> To: Paul Durrant; Wen Congyang; Andrew Cooper; xen-
>>>>>> devel@lists.xen.org
>>>>>>>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
>>>>>>>> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian
>> Jackson
>>>>>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore:
>> zero
>>>>>> ioreq
>>>>>>>> page only one time
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 06/11/2015 06:20 PM, Paul Durrant wrote:
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
>>>>>>>>>> Sent: 11 June 2015 09:48
>>>>>>>>>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-
>>>>>>>> devel@lists.xen.org
>>>>>>>>>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
>>>>>>>> [...]
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> In our implementation, we don't start a new emulator. The code
>>>>>>>>>> can work, but some bugs may not be triggered.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> How do you reconcile the incoming QEMU save record with the
>>>> running
>>>>>>>> emulator state?
>>>>>>>>
>>>>>>>> We introduce a qmp command "xen-load-devices-
>>>>>>>> state"(libxl__qmp_restore) which
>>>>>>>> can restore the emulator state. The steps to restore emulator state
>>>>>>>> at a checkpoint are:
>>>>>>>>
>>>>>>>> 1. libxl__qmp_stop    -> vm_stop() in qemu
>>>>>>>> 2. libxl__qmp_restore -> load_vmstate() in qemu
>>>>>>>> 3. libxl__qmp_resume  -> vm_start() in qemu
>>>>>>>>
>>>>>>>
>>>>>>> Ok, that sounds like the ideal time to hook back into Xen by creating a
>>>> new
>>>>>> ioreq server.
>>>>>>
>>>>>> I have some questions about ioreq server:
>>>>>> 1. If we use an old version of xen and the newest version of qemu, is
>>>>>>     it OK? Is the default ioreq server created when the guest is
>>>>>>     created? xen_create_ioreq_server() does nothing, and
>>>>>>     xen_get_ioreq_server_info() will get the default ioreq server
>>>>>>     information. Is that right?
>>>>>
>>>>> No. It's not compatible in that direction. A new Xen will work with an old
>>>> QEMU but not the other way round.
>>
>> If the xen is newest and the qemu is old, how is the default ioreq server
>> created for the emulator? Will the old qemu call xc_get_hvm_param(), and
>> will the hypervisor then create a default ioreq server?
>> get is a 'readonly' operation, and creating the default ioreq server in it
>> is very strange.
>>
>
> Might be strange, but that's the way it's done.

If I only want to get the default ioreq server pfn, and don't want to create
the default ioreq server, how can I do it?

Thanks
Wen Congyang
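The three-step checkpoint sequence quoted above (stop, load the device-model state, resume) can be modelled with a self-contained mock. The real helpers (libxl__qmp_stop/restore/resume) issue QMP commands to the emulator; everything here is only a stub to show the ordering.

```c
#include <string.h>

/* Mock of the quoted checkpoint sequence; the real libxl helpers speak
 * QMP to qemu and are only stubbed here. */
static char emulator_state[16] = "running";
static int  vmstate_loaded = 0;

static void qmp_stop(void)    { strcpy(emulator_state, "paused");  } /* vm_stop()      */
static void qmp_restore(void) { vmstate_loaded = 1;                } /* load_vmstate() */
static void qmp_resume(void)  { strcpy(emulator_state, "running"); } /* vm_start()     */

/* 1. pause the emulator, 2. load the incoming device-model state,
 * 3. let the guest run again. */
static void restore_emulator_at_checkpoint(void)
{
    qmp_stop();
    qmp_restore();
    qmp_resume();
}
```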

>
>>>>>
>>>>>> 2. Why do we create a default ioreq server when getting the hvm param
>>>>>> if there is already a non-default ioreq server?
>>>>>
>>>>> If something reads the 'legacy' HVM params then that is Xen's trigger to
>>>> create the default server. Any 'new' emulator should be using the ioreq
>>>> server hypercalls so the default server will not be needed.
>>>>
>>>> If there are two ioreq servers (the default ioreq server and an ioreq
>>>> server created by the emulator), can the guest work correctly in this
>>>> case?
>>>
>>> You mean a secondary emulator? Yes, that's why there is the notion of
>> default ioreq server... to allow a secondary emulator to be used even when
>> an old QEMU is in use.
>>
>> No, only one emulator. Can we run more than one emulator for one hvm
>> guest? How would we do it?
>>
>
> Yes, more than one emulator can run. There's nothing in libxl to do it, but we do it in XenServer using XAPI.
>
>>>
>>>> Is there any application (not an emulator) that uses libxenctrl
>>>> directly?
>>>>
>>>
>>> What do you mean by application? Toolstacks may use libxenctrl.
>>
>> For example: libvirt. I know it uses libxl now. Is there any similar
>> application which uses libxenctrl?
>
> Well, XAPI uses it for one.
>
>    Paul
>
>>
>> Thanks
>> Wen Congyang
>>
>>>
>>>    Paul
>>>
>>>> Thanks
>>>> Wen Congyang
>>>>
>>>>>
>>>>>> 3. At the far end, we will clear the ioreq page, and this ioreq page
>>>>>>     is used for the default ioreq server, is that right?
>>>>>
>>>>> Yes, AFAIK it's only the 'magic' pages that get cleared at the far end,
>>>>> and that includes the default server pages. Other ioreq servers will
>>>>> have their pages cleared on re-insertion into the P2M at the source end
>>>>> when the server is disabled.
>>>>>
>>>>>    Paul
>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> Wen Congyang
>>>>>>
>>>>>>>
>>>>>>>    Paul
>>>>>>>
>>>>>>>>>
>>>>>>>>>     Paul
>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Wen Congyang
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>     Paul
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>> Wen Congyang
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>     Paul
>>>>>>>>>>>>>
>>>>>>>>>>>>>> We will set the guest to a new state; the old state should
>>>>>>>>>>>>>> be dropped.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>> Wen Congyang
>>>>>>>>>>>>>>

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
  2015-06-12 15:04                                                     ` Wen Congyang
@ 2015-06-12 15:31                                                       ` Paul Durrant
  2015-06-13  5:58                                                         ` Wen Congyang
  0 siblings, 1 reply; 106+ messages in thread
From: Paul Durrant @ 2015-06-12 15:31 UTC (permalink / raw)
  To: Wen Congyang, Wen Congyang, Yang Hongyang, Andrew Cooper, xen-devel
  Cc: Wei Liu, Ian Campbell, guijianfeng, yunhong.jiang, Eddie Dong,
	rshriram, Ian Jackson

> -----Original Message-----
> From: Wen Congyang [mailto:ghostwcy@gmail.com]
> Sent: 12 June 2015 16:05
> To: Paul Durrant; Wen Congyang; Yang Hongyang; Andrew Cooper; xen-
> devel@lists.xen.org
> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian Jackson
> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq
> page only one time
> 
> At 2015/6/12 19:48, Paul Durrant Wrote:
> >> -----Original Message-----
> >> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
> >> Sent: 12 June 2015 12:10
> >> To: Paul Durrant; Yang Hongyang; Andrew Cooper; xen-
> devel@lists.xen.org
> >> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
> >> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian Jackson
> >> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
> ioreq
> >> page only one time
> >>
> >> On 06/12/2015 06:54 PM, Paul Durrant wrote:
> >>>> -----Original Message-----
> >>>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
> >>>> Sent: 12 June 2015 11:26
> >>>> To: Paul Durrant; Yang Hongyang; Andrew Cooper; xen-
> >> devel@lists.xen.org
> >>>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
> >>>> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian Jackson
> >>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
> >> ioreq
> >>>> page only one time
> >>>>
> >>>> On 06/12/2015 03:41 PM, Paul Durrant wrote:
> >>>>>> -----Original Message-----
> >>>>>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
> >>>>>> Sent: 12 June 2015 04:22
> >>>>>> To: Paul Durrant; Yang Hongyang; Andrew Cooper; xen-
> >>>> devel@lists.xen.org
> >>>>>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
> >>>>>> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian
> Jackson
> >>>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore:
> zero
> >>>> ioreq
> >>>>>> page only one time
> >>>>>>
> >>>>>> On 06/11/2015 09:25 PM, Paul Durrant wrote:
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Yang Hongyang [mailto:yanghy@cn.fujitsu.com]
> >>>>>>>> Sent: 11 June 2015 13:59
> >>>>>>>> To: Paul Durrant; Wen Congyang; Andrew Cooper; xen-
> >>>>>> devel@lists.xen.org
> >>>>>>>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
> >>>>>>>> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian
> >> Jackson
> >>>>>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore:
> >> zero
> >>>>>> ioreq
> >>>>>>>> page only one time
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 06/11/2015 06:20 PM, Paul Durrant wrote:
> >>>>>>>>>> -----Original Message-----
> >>>>>>>>>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
> >>>>>>>>>> Sent: 11 June 2015 09:48
> >>>>>>>>>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-
> >>>>>>>> devel@lists.xen.org
> >>>>>>>>>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
> >>>>>>>> [...]
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> In our implementation, we don't start a new emulator. The
> codes
> >>>> can
> >>>>>>>> work,
> >>>>>>>>>> but some bugs may be not triggered.
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> How do you reconcile the incoming QEMU save record with the
> >>>> running
> >>>>>>>> emulator state?
> >>>>>>>>
> >>>>>>>> We introduce a qmp command "xen-load-devices-
> >>>>>>>> state"(libxl__qmp_restore) which
> >>>>>>>> can restore the emulator state. The steps to restore emulator
> >>>>>>>> state at a checkpoint are:
> >>>>>>>>
> >>>>>>>> 1. libxl__qmp_stop    -> vm_stop() in qemu
> >>>>>>>> 2. libxl__qmp_restore -> load_vmstate() in qemu
> >>>>>>>> 3. libxl__qmp_resume  -> vm_start() in qemu
> >>>>>>>>
> >>>>>>>
> >>>>>>> Ok, that sounds like the ideal time to hook back into Xen by
> creating a
> >>>> new
> >>>>>> ioreq server.
> >>>>>>
> >>>>>> I have some questions about ioreq server:
> >>>>>> 1. If we use old version xen and newest version qemu, is it OK? Is
> >> default
> >>>>>>     ioreq server created when the guest is created.
> >>>> xen_create_ioreq_server()
> >>>>>> does
> >>>>>>     nothing, and xen_get_ioreq_server_info() will get the default
> ioreq
> >>>> server
> >>>>>> information.
> >>>>>>     Is it right?
> >>>>>
> >>>>> No. It's not compatible in that direction. A new Xen will work with an
> old
> >>>> QEMU but not the other way round.
> >>
> >> If the xen is newest, and qemu is old, how is the default ioreq server
> created
> >> for the emulator? Will the old qemu call xc_get_hvm_param(), and the
> >> hypervisor
> >> then creates a default ioreq server?
> >> get is a 'readonly' operation, and creating the default ioreq server in it is
> >> very strange.
> >>
> >
> > Might be strange, but that's the way it's done.
> 
> If I only want to get the default ioreq server pfn, and don't want to
> create the default ioreq server, how can I do it?

At the moment you can't. If you read the pfns then you will instantiate the server, which I agree is a bit of a problem if you're only doing a domain save.

  Paul

> 
> Thanks
> Wen Congyang
> 
> >
> >>>>>
> >>>>>> 2. Why we create a default ioreq server when getting the hvm
> param if
> >>>> there
> >>>>>> is already a
> >>>>>>     not default ioreq server?
> >>>>>
> >>>>> If something reads the 'legacy' HVM params then that is Xen's trigger
> to
> >>>> create the default server. Any 'new' emulator should be using the ioreq
> >>>> server hypercalls so the default server will not be needed.
> >>>>
> >>>> If there are two ioreq servers: default ioreq server, and a ioreq server
> >>>> created by emulator. The guest can work it correctly in
> >>>> this case?
> >>>
> >>> You mean a secondary emulator? Yes, that's why there is the notion of
> >> default ioreq server... to allow a secondary emulator to be used even
> when
> >> an old QEMU is in use.
> >>
> >> No, only one emulator. Can we run more than one emulator for one hvm
> >> guest? How
> >> to do it?
> >>
> >
> > Yes, more than one emulator can run. There's nothing in libxl to do it, but
> we do it in XenServer using XAPI.
> >
> >>>
> >>>> Is there any application(not emulator) that uses the libxenctrl
> >>>> directly?
> >>>>
> >>>
> >>> What do you mean by application? Toolstacks may use libxenctrl.
> >>
> >> For example: libvirt. I know it uses libxl now. Is there any similar
> application
> >> which uses libxenctrl.
> >
> > Well, XAPI uses it for one.
> >
> >    Paul
> >
> >>
> >> Thanks
> >> Wen Congyang
> >>
> >>>
> >>>    Paul
> >>>
> >>>> Thanks
> >>>> Wen Congyang
> >>>>
> >>>>>
> >>>>>> 3. In the far end, we will clear the ioreq page, and this ioreq page is
> >> used
> >>>> for
> >>>>>> default
> >>>>>>     ioreq server, is it right?
> >>>>>
> >>>>> Yes, AFAIK it's only the 'magic' pages that get cleared at the far end -
> and
> >>>> that includes the default server pages. Other ioreq servers will have
> their
> >>>> pages cleared on re-insertion to the P2M at the source end when the
> >> server
> >>>> is disabled.
> >>>>>
> >>>>>    Paul
> >>>>>
> >>>>>>
> >>>>>> Thanks
> >>>>>> Wen Congyang
> >>>>>>

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
  2015-06-12 15:31                                                       ` Paul Durrant
@ 2015-06-13  5:58                                                         ` Wen Congyang
  0 siblings, 0 replies; 106+ messages in thread
From: Wen Congyang @ 2015-06-13  5:58 UTC (permalink / raw)
  To: Paul Durrant, Wen Congyang, Yang Hongyang, Andrew Cooper, xen-devel
  Cc: Wei Liu, Ian Campbell, guijianfeng, yunhong.jiang, Eddie Dong,
	rshriram, Ian Jackson

At 2015/6/12 23:31, Paul Durrant Wrote:
>> -----Original Message-----
>> From: Wen Congyang [mailto:ghostwcy@gmail.com]
>> Sent: 12 June 2015 16:05
>> To: Paul Durrant; Wen Congyang; Yang Hongyang; Andrew Cooper; xen-
>> devel@lists.xen.org
>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
>> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian Jackson
>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq
>> page only one time
>>
>> At 2015/6/12 19:48, Paul Durrant Wrote:
>>>> -----Original Message-----
>>>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
>>>> Sent: 12 June 2015 12:10
>>>> To: Paul Durrant; Yang Hongyang; Andrew Cooper; xen-
>> devel@lists.xen.org
>>>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
>>>> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian Jackson
>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
>> ioreq
>>>> page only one time
>>>>
>>>> On 06/12/2015 06:54 PM, Paul Durrant wrote:
>>>>>> -----Original Message-----
>>>>>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
>>>>>> Sent: 12 June 2015 11:26
>>>>>> To: Paul Durrant; Yang Hongyang; Andrew Cooper; xen-
>>>> devel@lists.xen.org
>>>>>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
>>>>>> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian Jackson
>>>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
>>>> ioreq
>>>>>> page only one time
>>>>>>
>>>>>> On 06/12/2015 03:41 PM, Paul Durrant wrote:
>>>>>>>> -----Original Message-----
>>>>>>>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
>>>>>>>> Sent: 12 June 2015 04:22
>>>>>>>> To: Paul Durrant; Yang Hongyang; Andrew Cooper; xen-
>>>>>> devel@lists.xen.org
>>>>>>>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
>>>>>>>> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian
>> Jackson
>>>>>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore:
>> zero
>>>>>> ioreq
>>>>>>>> page only one time
>>>>>>>>
>>>>>>>> On 06/11/2015 09:25 PM, Paul Durrant wrote:
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Yang Hongyang [mailto:yanghy@cn.fujitsu.com]
>>>>>>>>>> Sent: 11 June 2015 13:59
>>>>>>>>>> To: Paul Durrant; Wen Congyang; Andrew Cooper; xen-
>>>>>>>> devel@lists.xen.org
>>>>>>>>>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
>>>>>>>>>> yunhong.jiang@intel.com; Eddie Dong; rshriram@cs.ubc.ca; Ian
>>>> Jackson
>>>>>>>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore:
>>>> zero
>>>>>>>> ioreq
>>>>>>>>>> page only one time
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 06/11/2015 06:20 PM, Paul Durrant wrote:
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: Wen Congyang [mailto:wency@cn.fujitsu.com]
>>>>>>>>>>>> Sent: 11 June 2015 09:48
>>>>>>>>>>>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-
>>>>>>>>>> devel@lists.xen.org
>>>>>>>>>>>> Cc: Wei Liu; Ian Campbell; guijianfeng@cn.fujitsu.com;
>>>>>>>>>> [...]
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> In our implementation, we don't start a new emulator. The
>> codes
>>>>>> can
>>>>>>>>>> work,
>>>>>>>>>>>> but some bugs may be not triggered.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> How do you reconcile the incoming QEMU save record with the
>>>>>> running
>>>>>>>>>> emulator state?
>>>>>>>>>>
>>>>>>>>>> We introduce a qmp command "xen-load-devices-
>>>>>>>>>> state"(libxl__qmp_restore) which
>>>>>>>>>> can restore the emulator state. The steps to restore emulator
>>>>>>>>>> state at a checkpoint are:
>>>>>>>>>>
>>>>>>>>>> 1. libxl__qmp_stop    -> vm_stop() in qemu
>>>>>>>>>> 2. libxl__qmp_restore -> load_vmstate() in qemu
>>>>>>>>>> 3. libxl__qmp_resume  -> vm_start() in qemu
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Ok, that sounds like the ideal time to hook back into Xen by
>> creating a
>>>>>> new
>>>>>>>> ioreq server.
>>>>>>>>
>>>>>>>> I have some questions about ioreq server:
>>>>>>>> 1. If we use old version xen and newest version qemu, is it OK? Is
>>>> default
>>>>>>>>      ioreq server created when the guest is created.
>>>>>> xen_create_ioreq_server()
>>>>>>>> does
>>>>>>>>      nothing, and xen_get_ioreq_server_info() will get the default
>> ioreq
>>>>>> server
>>>>>>>> information.
>>>>>>>>      Is it right?
>>>>>>>
>>>>>>> No. It's not compatible in that direction. A new Xen will work with an
>> old
>>>>>> QEMU but not the other way round.
>>>>
>>>> If the xen is newest, and qemu is old, how is the default ioreq server
>> created
>>>> for the emulator? Will the old qemu call xc_get_hvm_param(), and the
>>>> hypervisor
>>>> then creates a default ioreq server?
>>>> get is a 'readonly' operation, and creating the default ioreq server in it is
>>>> very strange.
>>>>
>>>
>>> Might be strange, but that's the way it's done.
>>
>> If I only want to get the default ioreq server pfn, and don't want to
>> create the default ioreq server, how can I do it?
>
> At the moment you can't. If you read the pfns then you will instantiate the server, which I agree is a bit of a problem if you're only doing a domain save.

Yes, but we have to instantiate the server so that an old version of qemu can
work. Can we add a new interface to just get the ioreq pfn and the buffered
ioreq pfn/evtchn? For example HVM_PARAM_DEFAULT_IOREQ_PFN.

Thanks
Wen Congyang
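Wen's proposal amounts to splitting the query from its side effect. The distinction can be shown with a self-contained mock; HVM_PARAM_DEFAULT_IOREQ_PFN is only a proposed name, and every function and value below is hypothetical, not hypervisor code.

```c
#include <stdint.h>

/* Mock of the behaviour under discussion; all names and values are
 * hypothetical illustrations, not the real hypervisor interface. */
static int default_server_exists = 0;
static const uint64_t DEFAULT_IOREQ_PFN = 0xfeff0; /* example value only */

/* Today's semantics: reading the legacy param instantiates the default
 * ioreq server as a side effect, even during a plain domain save. */
static uint64_t get_param_legacy(void)
{
    if (!default_server_exists)
        default_server_exists = 1;   /* hidden creation on a "read" */
    return DEFAULT_IOREQ_PFN;
}

/* Proposed semantics: a pure read that never creates the server. */
static uint64_t get_param_readonly(void)
{
    return DEFAULT_IOREQ_PFN;
}
```

With the read-only variant, a toolstack doing a domain save could query the pfn without instantiating anything, which is the problem Paul acknowledges above.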

>
>    Paul
>
>>
>> Thanks
>> Wen Congyang
>>
>>>
>>>>>>>
>>>>>>>> 2. Why we create a default ioreq server when getting the hvm
>> param if
>>>>>> there
>>>>>>>> is already a
>>>>>>>>      not default ioreq server?
>>>>>>>
>>>>>>> If something reads the 'legacy' HVM params then that is Xen's trigger
>> to
>>>>>> create the default server. Any 'new' emulator should be using the ioreq
>>>>>> server hypercalls so the default server will not be needed.
>>>>>>
>>>>>> If there are two ioreq servers: default ioreq server, and a ioreq server
>>>>>> created by emulator. The guest can work it correctly in
>>>>>> this case?
>>>>>
>>>>> You mean a secondary emulator? Yes, that's why there is the notion of
>>>> default ioreq server... to allow a secondary emulator to be used even
>> when
>>>> an old QEMU is in use.
>>>>
>>>> No, only one emulator. Can we run more than one emulator for one hvm
>>>> guest? How
>>>> to do it?
>>>>
>>>
>>> Yes, more than one emulator can run. There's nothing in libxl to do it, but
>> we do it in XenServer using XAPI.
>>>
>>>>>
>>>>>> Is there any application(not emulator) that uses the libxenctrl
>>>>>> directly?
>>>>>>
>>>>>
>>>>> What do you mean by application? Toolstacks may use libxenctrl.
>>>>
>>>> For example: libvirt. I know it uses libxl now. Is there any similar
>> application
>>>> which uses libxenctrl.
>>>
>>> Well, XAPI uses it for one.
>>>
>>>     Paul
>>>
>>>>
>>>> Thanks
>>>> Wen Congyang
>>>>
>>>>>
>>>>>     Paul
>>>>>
>>>>>> Thanks
>>>>>> Wen Congyang
>>>>>>
>>>>>>>
>>>>>>>> 3. In the far end, we will clear the ioreq page, and this ioreq page is
>>>> used
>>>>>> for
>>>>>>>> default
>>>>>>>>      ioreq server, is it right?
>>>>>>>
>>>>>>> Yes, AFAIK it's only the 'magic' pages that get cleared at the far end -
>> and
>>>>>> that includes the default server pages. Other ioreq servers will have
>> their
>>>>>> pages cleared on re-insertion to the P2M at the source end when the
>>>> server
>>>>>> is disabled.
>>>>>>>
>>>>>>>     Paul
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Wen Congyang
>>>>>>>>

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 07/13] tools/libxl: Update libxl__domain_unpause() to support qemu-xen
  2015-06-12 12:33   ` Wei Liu
@ 2015-06-15  1:29     ` Yang Hongyang
  2015-06-15 16:22       ` Wei Liu
  0 siblings, 1 reply; 106+ messages in thread
From: Yang Hongyang @ 2015-06-15  1:29 UTC (permalink / raw)
  To: Wei Liu
  Cc: ian.campbell, wency, andrew.cooper3, yunhong.jiang, eddie.dong,
	xen-devel, guijianfeng, rshriram, ian.jackson



On 06/12/2015 08:33 PM, Wei Liu wrote:
> On Mon, Jun 08, 2015 at 11:43:11AM +0800, Yang Hongyang wrote:
>> Currently, libxl__domain_unpause() only supports
>> qemu-xen-traditional. Update it to support qemu-xen.
>>
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>
> This looks very similar to an existing function called
> libxl__domain_resume_device_model. Maybe you don't need to invent a new
> function.
>
>> ---
>>   tools/libxl/libxl.c | 42 +++++++++++++++++++++++++++++++++---------
>>   1 file changed, 33 insertions(+), 9 deletions(-)
>>
>> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
>> index d5691dc..5c843c2 100644
>> --- a/tools/libxl/libxl.c
>> +++ b/tools/libxl/libxl.c
>> @@ -933,10 +933,37 @@ out:
>>       return AO_INPROGRESS;
>>   }
>>
>> -int libxl__domain_unpause(libxl__gc *gc, uint32_t domid)
>> +static int libxl__domain_unpause_device_model(libxl__gc *gc, uint32_t domid)
>>   {
>>       char *path;
>>       char *state;
>> +
>> +    switch (libxl__device_model_version_running(gc, domid)) {
>> +    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL: {
>> +        uint32_t dm_domid = libxl_get_stubdom_id(CTX, domid);
>> +
>> +        path = libxl__device_model_xs_path(gc, dm_domid, domid, "/state");
>> +        state = libxl__xs_read(gc, XBT_NULL, path);
>> +        if (state != NULL && !strcmp(state, "paused")) {
>
> The only difference between your function and
> libxl__domain_unpause_device_model is the check for "state" node. I
> think you can just add the check to libxl__domain_resume_device_model
> and use that function.

I'm not sure whether changing the existing function's behavior will affect
the existing callers; if there's no problem doing so, I will do as you
said in the next version.

>
> Wei.
> .
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 10/13] tools/libxl: Add back channel to allow migration target send data back
  2015-06-12 12:54   ` Wei Liu
  2015-06-12 15:04     ` Ian Jackson
@ 2015-06-15  1:33     ` Yang Hongyang
  1 sibling, 0 replies; 106+ messages in thread
From: Yang Hongyang @ 2015-06-15  1:33 UTC (permalink / raw)
  To: Wei Liu
  Cc: ian.campbell, wency, andrew.cooper3, yunhong.jiang, eddie.dong,
	xen-devel, guijianfeng, rshriram, ian.jackson



On 06/12/2015 08:54 PM, Wei Liu wrote:
> On Mon, Jun 08, 2015 at 11:43:14AM +0800, Yang Hongyang wrote:
>> From: Wen Congyang <wency@cn.fujitsu.com>
>>
>> In COLO mode, the slave needs to send data to the master, but the io_fd
>> can only be written on the master side and only read on the slave side.
>> Save recv_fd in domain_suspend_state, and send_fd in
>> domain_create_state.
>>
>
> You failed to mention in commit message new structures are introduced in
> IDL.
>
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> ---
>>   tools/libxl/libxl.c          |  2 +-
>>   tools/libxl/libxl_create.c   | 14 ++++++++++----
>>   tools/libxl/libxl_internal.h |  2 ++
>>   tools/libxl/libxl_types.idl  |  7 +++++++
>>   tools/libxl/xl_cmdimpl.c     |  7 +++++++
>
> You also need to add LIBXL_HAVE in libxl.h.
>
>>   5 files changed, 27 insertions(+), 5 deletions(-)
>>
>> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
>> index 5c843c2..36b97fe 100644
>> --- a/tools/libxl/libxl.c
>> +++ b/tools/libxl/libxl.c
>> @@ -832,7 +832,7 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
>>       dss->callback = remus_failover_cb;
>>       dss->domid = domid;
>>       dss->fd = send_fd;
>> -    /* TODO do something with recv_fd */
>> +    dss->recv_fd = recv_fd;
>>       dss->type = type;
>>       dss->live = 1;
>>       dss->debug = 0;
>> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
>> index 86384d2..bd8149c 100644
>> --- a/tools/libxl/libxl_create.c
>> +++ b/tools/libxl/libxl_create.c
>> @@ -1577,8 +1577,8 @@ static void domain_create_cb(libxl__egc *egc,
>>                                int rc, uint32_t domid);
>>
>>   static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
>> -                            uint32_t *domid,
>> -                            int restore_fd, int checkpointed_stream,
>> +                            uint32_t *domid, int restore_fd,
>> +                            int send_fd, int checkpointed_stream,
>>                               const libxl_asyncop_how *ao_how,
>>                               const libxl_asyncprogress_how *aop_console_how)
>>   {
>> @@ -1591,6 +1591,7 @@ static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
>>       libxl_domain_config_init(&cdcs->dcs.guest_config_saved);
>>       libxl_domain_config_copy(ctx, &cdcs->dcs.guest_config_saved, d_config);
>>       cdcs->dcs.restore_fd = restore_fd;
>> +    cdcs->dcs.send_fd = send_fd;
>>       cdcs->dcs.callback = domain_create_cb;
>>       cdcs->dcs.checkpointed_stream = checkpointed_stream;
>>       libxl__ao_progress_gethow(&cdcs->dcs.aop_console_how, aop_console_how);
>> @@ -1619,7 +1620,7 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
>>                               const libxl_asyncop_how *ao_how,
>>                               const libxl_asyncprogress_how *aop_console_how)
>>   {
>> -    return do_domain_create(ctx, d_config, domid, -1, 0,
>> +    return do_domain_create(ctx, d_config, domid, -1, -1, 0,
>>                               ao_how, aop_console_how);
>>   }
>>
>> @@ -1629,7 +1630,12 @@ int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
>>                                   const libxl_asyncop_how *ao_how,
>>                                   const libxl_asyncprogress_how *aop_console_how)
>>   {
>> -    return do_domain_create(ctx, d_config, domid, restore_fd,
>> +    int send_fd = -1;
>> +
>> +    if (params->checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_COLO)
>> +        send_fd = params->send_fd;
>> +
>> +    return do_domain_create(ctx, d_config, domid, restore_fd, send_fd,
>>                               params->checkpointed_stream, ao_how, aop_console_how);
>>   }
>>
>> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
>> index fbbae93..6d214b5 100644
>> --- a/tools/libxl/libxl_internal.h
>> +++ b/tools/libxl/libxl_internal.h
>> @@ -2874,6 +2874,7 @@ struct libxl__domain_save_state {
>>
>>       uint32_t domid;
>>       int fd;
>> +    int recv_fd;
>>       libxl_domain_type type;
>>       int live;
>>       int debug;
>> @@ -3143,6 +3144,7 @@ struct libxl__domain_create_state {
>>       libxl_domain_config *guest_config;
>>       libxl_domain_config guest_config_saved; /* vanilla config */
>>       int restore_fd;
>> +    int send_fd;
>>       libxl__domain_create_cb *callback;
>>       libxl_asyncprogress_how aop_console_how;
>>       /* private to domain_create */
>> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
>> index 23f27d4..8a3d7ba 100644
>> --- a/tools/libxl/libxl_types.idl
>> +++ b/tools/libxl/libxl_types.idl
>> @@ -198,6 +198,12 @@ libxl_viridian_enlightenment = Enumeration("viridian_enlightenment", [
>>       (3, "reference_tsc"),
>>       ])
>>
>> +libxl_checkpointed_stream = Enumeration("checkpointed_stream", [
>> +    (0, "NONE"),
>> +    (1, "REMUS"),
>> +    (2, "COLO"),
>> +    ], init_val = 0)
>
> The default init_val is 0 so you don't need to write it down.

Okay.

>
>> +
>>   #
>>   # Complex libxl types
>>   #
>> @@ -346,6 +352,7 @@ libxl_domain_create_info = Struct("domain_create_info",[
>>
>>   libxl_domain_restore_params = Struct("domain_restore_params", [
>>       ("checkpointed_stream", integer),
>> +    ("send_fd", integer),
>
> I'm not entirely sure if we want to bury an extra argument here.
>
> After looking at code I think you're trying to work around API
> limitation. I think we are safe to extend the API -- we've already done
> that before. See libxl.h around line 990.
>
> Ian and Ian, what do you think?
>
>>       ])
>>
>>   libxl_domain_sched_params = Struct("domain_sched_params",[
>> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
>> index c858068..adfadd1 100644
>> --- a/tools/libxl/xl_cmdimpl.c
>> +++ b/tools/libxl/xl_cmdimpl.c
>
> I would also suggest you rename the last argument of migrate_receive in
> this file from "remus" to "checkpointed_stream_type" since the semantics
> of that parameter has changed.

I've renamed it to "checkpointed".

>
> Wei.
> .
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 10/13] tools/libxl: Add back channel to allow migration target send data back
  2015-06-12 15:04     ` Ian Jackson
@ 2015-06-15  1:38       ` Yang Hongyang
  2015-06-16 10:52         ` Ian Campbell
  0 siblings, 1 reply; 106+ messages in thread
From: Yang Hongyang @ 2015-06-15  1:38 UTC (permalink / raw)
  To: Ian Jackson, Wei Liu
  Cc: ian.campbell, wency, andrew.cooper3, yunhong.jiang, eddie.dong,
	xen-devel, guijianfeng, rshriram



On 06/12/2015 11:04 PM, Ian Jackson wrote:
> Wei Liu writes ("Re: [Xen-devel] [PATCH v2 COLOPre 10/13] tools/libxl: Add back channel to allow migration target send data back"):
>> On Mon, Jun 08, 2015 at 11:43:14AM +0800, Yang Hongyang wrote:
>>> From: Wen Congyang <wency@cn.fujitsu.com>
>>>
>>> In COLO mode, the slave needs to send data to the master, but the io_fd
>>> can only be written on the master side and only read on the slave side.
>>> Save recv_fd in domain_suspend_state, and send_fd in
>>> domain_create_state.
> ...
>>>   libxl_domain_restore_params = Struct("domain_restore_params", [
>>>       ("checkpointed_stream", integer),
>>> +    ("send_fd", integer),
>>
>> I'm not entirely sure if we want to bury an extra argument here.
>>
>> After looking at code I think you're trying to work around API
>> limitation. I think we are safe to extend the API -- we've already done
>> that before. See libxl.h around line 990.
>>
>> Ian and Ian, what do you think?
>
> I agree with you, Wei.  I don't think an fd should be in
> libxl_domain_restore_params at all.

Then I'll just extend the params of libxl_domain_create_restore().

>
> We need to understand what the API semantics are.  Are are going to
> introduce a new libxl API entrypoint ?  We already have
> libxl_domain_remus_start.

We use libxl_domain_remus_start for COLO. COLO is an option of "xl remus".

>
> Ian.
> .
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 11/13] tools/libxl: rename remus device to checkpoint device
  2015-06-12 14:57       ` Ian Jackson
@ 2015-06-15  1:45         ` Yang Hongyang
  2015-06-15 16:24           ` Wei Liu
  0 siblings, 1 reply; 106+ messages in thread
From: Yang Hongyang @ 2015-06-15  1:45 UTC (permalink / raw)
  To: Ian Jackson, Wei Liu
  Cc: ian.campbell, wency, andrew.cooper3, yunhong.jiang, eddie.dong,
	xen-devel, guijianfeng, rshriram



On 06/12/2015 10:57 PM, Ian Jackson wrote:
> Wei Liu writes ("Re: [Xen-devel] [PATCH v2 COLOPre 11/13] tools/libxl: rename remus device to checkpoint device"):
>> On Fri, Jun 12, 2015 at 02:30:46PM +0100, Wei Liu wrote:
>>> On Mon, Jun 08, 2015 at 11:43:15AM +0800, Yang Hongyang wrote:
>>>> -    (-18, "REMUS_DEVOPS_DOES_NOT_MATCH"),
>>>> -    (-19, "REMUS_DEVICE_NOT_SUPPORTED"),
>>>> +    (-18, "CHECKPOINT_DEVOPS_DOES_NOT_MATCH"),
>>>> +    (-19, "CHECKPOINT_DEVICE_NOT_SUPPORTED"),
>>>
>>> You should add two new error numbers.
>>>
>>
>> And in that case you might also need to go through all places to make
>> sure the correct error numbers are return. I.e. old remus code path
>> still returns REMUS error code and new CHECKPOINT code path returns new
>> error code.
>>
>> I merely speak from API backward compatibility point of view. If you
>> think what I suggest doesn't make sense, please let me know.
>
> To me this line of reasoning prompts me to ask: what would be wrong with
> leaving the word REMUS in the error names, and simply updating the
> descriptions?
>
> After all AFAICT the circumstances are very similar.  I don't think it
> makes sense to require libxl to do something like
>     rc = were_we_doing_colo_not_remus ? CHECKPOINT_BLAH : REMUS_BLAH;
>
> Please do contradict me if I have misunderstood...

COLO and REMUS are both checkpoint devices. We use the checkpoint device
layer as a more abstract layer for both COLO and REMUS. As for the error
codes, they can be used by both COLO and REMUS, so we don't distinguish
whether we are doing COLO or REMUS; users are aware of what they are
executing (COLO or REMUS).

>
> Ian.
> .
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 13/13] tools/libxl: don't touch remus in checkpoint_device
  2015-06-12 13:28   ` Wei Liu
@ 2015-06-15  1:46     ` Yang Hongyang
  0 siblings, 0 replies; 106+ messages in thread
From: Yang Hongyang @ 2015-06-15  1:46 UTC (permalink / raw)
  To: Wei Liu
  Cc: ian.campbell, wency, andrew.cooper3, yunhong.jiang, eddie.dong,
	xen-devel, guijianfeng, rshriram, ian.jackson



On 06/12/2015 09:28 PM, Wei Liu wrote:
> On Mon, Jun 08, 2015 at 11:43:17AM +0800, Yang Hongyang wrote:
>> The checkpoint device is an abstract layer for checkpointing.
>> COLO can also use it to do checkpointing. But there is
>> still some code in the checkpoint device which touches remus:
>> 1. remus_ops: we use the remus ops directly in the checkpoint
>>     device. Store it in the checkpoint device state instead.
>> 2. concrete layer's private members: add a new structure,
>>     remus state, and move them into it.
>> 3. init/cleanup device subkind: we call (init|cleanup)_subkind_nic
>>     and (init|cleanup)_subkind_drbd_disk directly in the checkpoint
>>     device. Call them before calling libxl__checkpoint_devices_setup()
>>     or after calling libxl__checkpoint_devices_teardown() instead.
>>
>
> From the look of it this patch is mostly refactoring and doesn't involve
> functional changes, right? If so please state that in commit message.

Yes, it is refactoring with no functional changes; I will mention that in the
next version.

>
> I suppose this needs review from remus maintainer.
>
> Wei.
> .
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 04/13] tools/libxc: export xc_bitops.h
  2015-06-11 10:55           ` Ian Campbell
@ 2015-06-15  1:50             ` Yang Hongyang
  0 siblings, 0 replies; 106+ messages in thread
From: Yang Hongyang @ 2015-06-15  1:50 UTC (permalink / raw)
  To: Ian Campbell, Andrew Cooper
  Cc: wei.liu2, wency, ian.jackson, yunhong.jiang, eddie.dong,
	xen-devel, guijianfeng, rshriram



On 06/11/2015 06:55 PM, Ian Campbell wrote:
> On Thu, 2015-06-11 at 11:45 +0100, Andrew Cooper wrote:
>> On 11/06/15 09:41, Ian Campbell wrote:
>>> On Thu, 2015-06-11 at 10:07 +0800, Yang Hongyang wrote:
>>>> On 06/10/2015 11:20 PM, Ian Campbell wrote:
>>>>> On Mon, 2015-06-08 at 11:43 +0800, Yang Hongyang wrote:
>>>>>> When we are under COLO, we will send dirty page bitmap info from
>>>>>> secondary to primary at every checkpoint.
>>>>> ... and this is a _libxl_ operation? Is that the right layer here?
>>>> For the first question, yes, this is done in the suspend callback on the
>>>> restore side. We do this in libxl because currently we only added a
>>>> back channel on the libxl side. There is no back channel in libxc.
>>>>
>>>> On further consideration, if we do this in the libxc part, the code will
>>>> be less complex: we can drop the 4th & 9th patches of this series and
>>>> also get rid of the get_dirty_pfn() callback; instead we will add a
>>>> patch to add a back channel in libxc.
>>>> add back channel in libxc.
>>> That sounds better to me, but lets see what Andrew thinks.
>>>
>>>> For the second question, I'm not sure, what's Andrew's opinion? which
>>>> is the right layer to do this operation, libxl or libxc?
>>
>> There are a number of bits of information which would be useful going in
>> "the backchannel".
>>
>> Some are definitely more appropriate at the libxc level, but others are
>> more appropriate at the libxl.
>>
>> If you recall from the hackathon, there was an Alibaba usecase where
>> they wanted a positive success/fail from the receiving side that the VM
>> has started up successfully before choosing between cleaning up or
>> continuing the VM on the sending side.  This would have to be a libxl
>> level backchannel.
>
> FWIW this particular case is currently an xl level backchannel, but I
> think your general point stands.

So do you both agree that we should add a backchannel to libxc and move this
operation to the libxc layer? What are the other tools maintainers' opinions?

>
>> Whatever happens, backchannel wise, it should be a sensibly
>> type/length/chunk'd stream.  (I think there is a spec or two floating
>> around somewhere which might be a good start ;p)  There should probably
>> be a bit of active negotiation at the start of the backchannel to a)
>> confirm you have the correct backchannel and b) the backchannel is
>> actually functioning.
>>
>> The data on "the backchannel" is always going to be in reply to an
>> action taking place in the primary channel, but there are complications
>> in that the libxc bit is inherently a blocking model.  In terms of
>> coordination, I am leaning towards the view of it being easier and
>> cleaner for each level to maintain its own backchannel communication.
>> The libxc bits can expect to read some records out of the backchannel at
>> each checkpoint and take appropriate actions before starting the next
>> checkpoint.
>>
>> Thoughts?
>>
>> ~Andrew
>>
>
>
> .
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 07/13] tools/libxl: Update libxl__domain_unpause() to support qemu-xen
  2015-06-15  1:29     ` Yang Hongyang
@ 2015-06-15 16:22       ` Wei Liu
  2015-06-17  9:02         ` Yang Hongyang
  0 siblings, 1 reply; 106+ messages in thread
From: Wei Liu @ 2015-06-15 16:22 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: Wei Liu, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, xen-devel, guijianfeng, rshriram, ian.jackson

On Mon, Jun 15, 2015 at 09:29:55AM +0800, Yang Hongyang wrote:
> 
> 
> On 06/12/2015 08:33 PM, Wei Liu wrote:
> >On Mon, Jun 08, 2015 at 11:43:11AM +0800, Yang Hongyang wrote:
> >>Currently, libxl__domain_unpause() only supports
> >>qemu-xen-traditional. Update it to support qemu-xen.
> >>
> >>Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> >>Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> >
> >This looks very similar to an existing function called
> >libxl__domain_resume_device_model. Maybe you don't need to invent a new
> >function.
> >
> >>---
> >>  tools/libxl/libxl.c | 42 +++++++++++++++++++++++++++++++++---------
> >>  1 file changed, 33 insertions(+), 9 deletions(-)
> >>
> >>diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> >>index d5691dc..5c843c2 100644
> >>--- a/tools/libxl/libxl.c
> >>+++ b/tools/libxl/libxl.c
> >>@@ -933,10 +933,37 @@ out:
> >>      return AO_INPROGRESS;
> >>  }
> >>
> >>-int libxl__domain_unpause(libxl__gc *gc, uint32_t domid)
> >>+static int libxl__domain_unpause_device_model(libxl__gc *gc, uint32_t domid)
> >>  {
> >>      char *path;
> >>      char *state;
> >>+
> >>+    switch (libxl__device_model_version_running(gc, domid)) {
> >>+    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL: {
> >>+        uint32_t dm_domid = libxl_get_stubdom_id(CTX, domid);
> >>+
> >>+        path = libxl__device_model_xs_path(gc, dm_domid, domid, "/state");
> >>+        state = libxl__xs_read(gc, XBT_NULL, path);
> >>+        if (state != NULL && !strcmp(state, "paused")) {
> >
> >The only difference between your function and
> >libxl__domain_unpause_device_model is the check for "state" node. I
> >think you can just add the check to libxl__domain_resume_device_model
> >and use that function.
> 
> I'm not sure whether changing the existing function's behavior will affect
> the existing callers; if there's no problem doing so, I will do as you
> said in the next version.
> 

Qemu-dm currently has several states. libxl__domain_resume_device_model
doesn't check the state and writes unconditionally. I think checking
before writing would be an improvement.

Wei.

> >
> >Wei.
> >.
> >
> 
> -- 
> Thanks,
> Yang.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 11/13] tools/libxl: rename remus device to checkpoint device
  2015-06-15  1:45         ` Yang Hongyang
@ 2015-06-15 16:24           ` Wei Liu
  2015-06-16 10:53             ` Ian Campbell
  0 siblings, 1 reply; 106+ messages in thread
From: Wei Liu @ 2015-06-15 16:24 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: Wei Liu, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	Ian Jackson, xen-devel, eddie.dong, guijianfeng, rshriram

On Mon, Jun 15, 2015 at 09:45:54AM +0800, Yang Hongyang wrote:
> 
> 
> On 06/12/2015 10:57 PM, Ian Jackson wrote:
> >Wei Liu writes ("Re: [Xen-devel] [PATCH v2 COLOPre 11/13] tools/libxl: rename remus device to checkpoint device"):
> >>On Fri, Jun 12, 2015 at 02:30:46PM +0100, Wei Liu wrote:
> >>>On Mon, Jun 08, 2015 at 11:43:15AM +0800, Yang Hongyang wrote:
> >>>>-    (-18, "REMUS_DEVOPS_DOES_NOT_MATCH"),
> >>>>-    (-19, "REMUS_DEVICE_NOT_SUPPORTED"),
> >>>>+    (-18, "CHECKPOINT_DEVOPS_DOES_NOT_MATCH"),
> >>>>+    (-19, "CHECKPOINT_DEVICE_NOT_SUPPORTED"),
> >>>
> >>>You should add two new error numbers.
> >>>
> >>
> >>And in that case you might also need to go through all places to make
> >>sure the correct error numbers are return. I.e. old remus code path
> >>still returns REMUS error code and new CHECKPOINT code path returns new
> >>error code.
> >>
> >>I merely speak from API backward compatibility point of view. If you
> >>think what I suggest doesn't make sense, please let me know.
> >
> >To me this line of reasoning prompts me to ask: what would be wrong with
> >leaving the word REMUS in the error names, and simply updating the
> >descriptions?
> >
> >After all AFAICT the circumstances are very similar.  I don't think it
> >makes sense to require libxl to do something like
> >    rc = were_we_doing_colo_not_remus ? CHECKPOINT_BLAH : REMUS_BLAH;
> >
> >Please do contradict me if I have misunderstood...
> 
> COLO and REMUS are both checkpoint devices. We use the checkpoint device
> layer as a more abstract layer for both COLO and REMUS. As for the error
> codes, they can be used by both COLO and REMUS, so we don't distinguish
> whether we are doing COLO or REMUS; users are aware of what they are
> executing (COLO or REMUS).
> 

Right. So continue using REMUS_ error code is fine.

Wei.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 08/13] tools/libxl: introduce libxl__domain_common_switch_qemu_logdirty()
  2015-06-08  3:43 ` [PATCH v2 COLOPre 08/13] tools/libxl: introduce libxl__domain_common_switch_qemu_logdirty() Yang Hongyang
@ 2015-06-16 10:45   ` Ian Campbell
  0 siblings, 0 replies; 106+ messages in thread
From: Ian Campbell @ 2015-06-16 10:45 UTC (permalink / raw)
  To: Yang Hongyang, ian.jackson
  Cc: wei.liu2, wency, andrew.cooper3, yunhong.jiang, eddie.dong,
	xen-devel, guijianfeng, rshriram

On Mon, 2015-06-08 at 11:43 +0800, Yang Hongyang wrote:
> The secondary vm runs in COLO mode; we need to send the
> secondary vm's dirty page information to the master at each checkpoint,
> so we have to enable qemu logdirty on the secondary.
> 
> libxl__domain_suspend_common_switch_qemu_logdirty() is used to enable
> qemu logdirty. But it uses domain_save_state, and calls
> libxl__xc_domain_saverestore_async_callback_done()
> before it exits. This cannot be used for the secondary vm.
> 
> Update libxl__domain_suspend_common_switch_qemu_logdirty() to
> introduce a new API, libxl__domain_common_switch_qemu_logdirty().
> This API only uses libxl__logdirty_switch, and calls
> lds->callback before it exits.
> 
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 09/13] tools/libxl: Update libxl_save_msgs_gen.pl to support return data from xl to xc
  2015-06-08  3:43 ` [PATCH v2 COLOPre 09/13] tools/libxl: Update libxl_save_msgs_gen.pl to support return data from xl to xc Yang Hongyang
@ 2015-06-16 10:49   ` Ian Campbell
  2015-06-16 10:54     ` Wen Congyang
  2015-06-16 11:01     ` Ian Jackson
  2015-06-16 11:05   ` Ian Jackson
  1 sibling, 2 replies; 106+ messages in thread
From: Ian Campbell @ 2015-06-16 10:49 UTC (permalink / raw)
  To: Yang Hongyang, ian.jackson
  Cc: wei.liu2, wency, andrew.cooper3, yunhong.jiang, eddie.dong,
	xen-devel, guijianfeng, rshriram

On Mon, 2015-06-08 at 11:43 +0800, Yang Hongyang wrote:
> From: Wen Congyang <wency@cn.fujitsu.com>
> 
>  Currently, all callbacks return an integer value or void. We cannot
>  return some data to xc via callback. Update libxl_save_msgs_gen.pl
>  to support this case.

(Odd two space indent from somewhere)

Can you be more specific about what kind of data you need to pass back
here? Is it an opaque "void *" blob or is it potentially more
structured? That is useful to know in case there might be a better way
to achieve the goal.

Ian.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 10/13] tools/libxl: Add back channel to allow migration target send data back
  2015-06-15  1:38       ` Yang Hongyang
@ 2015-06-16 10:52         ` Ian Campbell
  2015-06-16 10:58           ` Ian Jackson
  0 siblings, 1 reply; 106+ messages in thread
From: Ian Campbell @ 2015-06-16 10:52 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: Wei Liu, eddie.dong, wency, andrew.cooper3, yunhong.jiang,
	Ian Jackson, xen-devel, guijianfeng, rshriram

On Mon, 2015-06-15 at 09:38 +0800, Yang Hongyang wrote:
> 
> On 06/12/2015 11:04 PM, Ian Jackson wrote:
> > Wei Liu writes ("Re: [Xen-devel] [PATCH v2 COLOPre 10/13] tools/libxl: Add back channel to allow migration target send data back"):
> >> On Mon, Jun 08, 2015 at 11:43:14AM +0800, Yang Hongyang wrote:
> >>> From: Wen Congyang <wency@cn.fujitsu.com>
> >>>
> >>> In COLO mode, the slave needs to send data to the master, but the io_fd
> >>> can only be written on the master side and only read on the slave side.
> >>> Save recv_fd in domain_suspend_state, and send_fd in
> >>> domain_create_state.
> > ...
> >>>   libxl_domain_restore_params = Struct("domain_restore_params", [
> >>>       ("checkpointed_stream", integer),
> >>> +    ("send_fd", integer),
> >>
> >> I'm not entirely sure if we want to bury an extra argument here.
> >>
> >> After looking at code I think you're trying to work around API
> >> limitation. I think we are safe to extend the API -- we've already done
> >> that before. See libxl.h around line 990.
> >>
> >> Ian and Ian, what do you think?
> >
> > I agree with you, Wei.  I don't think an fd should be in
> > libxl_domain_restore_params at all.
> 
> Then I'll just extend the params of libxl_domain_create_restore().

One question first: is it really to be expected that send_fd and recv_fd
will be different, as opposed to the bidirectional data going over the
same socket?

> 
> >
> > We need to understand what the API semantics are.  Are are going to
> > introduce a new libxl API entrypoint ?  We already have
> > libxl_domain_remus_start.
> 
> We use libxl_domain_remus_start for COLO. COLO is an option of "xl remus".
> 
> >
> > Ian.
> > .
> >
> 

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 11/13] tools/libxl: rename remus device to checkpoint device
  2015-06-15 16:24           ` Wei Liu
@ 2015-06-16 10:53             ` Ian Campbell
  2015-06-25  5:00               ` Yang Hongyang
  0 siblings, 1 reply; 106+ messages in thread
From: Ian Campbell @ 2015-06-16 10:53 UTC (permalink / raw)
  To: Wei Liu
  Cc: eddie.dong, wency, andrew.cooper3, yunhong.jiang, Ian Jackson,
	xen-devel, guijianfeng, rshriram, Yang Hongyang

On Mon, 2015-06-15 at 17:24 +0100, Wei Liu wrote:
> On Mon, Jun 15, 2015 at 09:45:54AM +0800, Yang Hongyang wrote:
> > 
> > 
> > On 06/12/2015 10:57 PM, Ian Jackson wrote:
> > >Wei Liu writes ("Re: [Xen-devel] [PATCH v2 COLOPre 11/13] tools/libxl: rename remus device to checkpoint device"):
> > >>On Fri, Jun 12, 2015 at 02:30:46PM +0100, Wei Liu wrote:
> > >>>On Mon, Jun 08, 2015 at 11:43:15AM +0800, Yang Hongyang wrote:
> > >>>>-    (-18, "REMUS_DEVOPS_DOES_NOT_MATCH"),
> > >>>>-    (-19, "REMUS_DEVICE_NOT_SUPPORTED"),
> > >>>>+    (-18, "CHECKPOINT_DEVOPS_DOES_NOT_MATCH"),
> > >>>>+    (-19, "CHECKPOINT_DEVICE_NOT_SUPPORTED"),
> > >>>
> > >>>You should add two new error numbers.
> > >>>
> > >>
> > >>And in that case you might also need to go through all places to make
> > >>sure the correct error numbers are return. I.e. old remus code path
> > >>still returns REMUS error code and new CHECKPOINT code path returns new
> > >>error code.
> > >>
> > >>I merely speak from API backward compatibility point of view. If you
> > >>think what I suggest doesn't make sense, please let me know.
> > >
> > >To me this line of reasoning prompts me to ask: what would be wrong with
> > >leaving the word REMUS in the error names, and simply updating the
> > >descriptions?
> > >
> > >After all AFAICT the circumstances are very similar.  I don't think it
> > >makes sense to require libxl to do something like
> > >    rc = were_we_doing_colo_not_remus ? CHECKPOINT_BLAH : REMUS_BLAH;
> > >
> > >Please do contradict me if I have misunderstood...
> > 
> > COLO and REMUS are both checkpoint devices. We use the checkpoint device
> > layer as a more abstract layer for both COLO and REMUS. As for the error
> > codes, they can be used by both COLO and REMUS, so we don't distinguish
> > whether we are doing COLO or REMUS; users are aware of what they are
> > executing (COLO or REMUS).
> > 
> 
> Right. So continue using REMUS_ error code is fine.

Seems like it would also be OK to switch the name and then in libxl.h

#if LIBXL_API_VERSION < 0xWHENEVER
#define REMUS_BLAH CHECKPOINT_BLAH
#define ...
#endif

_If_ we think the new names make more sense going fwd...

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v2 COLOPre 12/13] tools/libxl: adjust the indentation
  2015-06-08  3:43 ` [PATCH v2 COLOPre 12/13] tools/libxl: adjust the indentation Yang Hongyang
@ 2015-06-16 10:53   ` Ian Campbell
  0 siblings, 0 replies; 106+ messages in thread
From: Ian Campbell @ 2015-06-16 10:53 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: wei.liu2, wency, andrew.cooper3, yunhong.jiang, eddie.dong,
	xen-devel, guijianfeng, rshriram, ian.jackson

On Mon, 2015-06-08 at 11:43 +0800, Yang Hongyang wrote:
> This is just tidying up after the previous automatic renaming.
> 
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

* Re: [PATCH v2 COLOPre 09/13] tools/libxl: Update libxl_save_msgs_gen.pl to support return data from xl to xc
  2015-06-16 10:49   ` Ian Campbell
@ 2015-06-16 10:54     ` Wen Congyang
  2015-06-16 10:56       ` Ian Jackson
  2015-06-16 11:01     ` Ian Jackson
  1 sibling, 1 reply; 106+ messages in thread
From: Wen Congyang @ 2015-06-16 10:54 UTC (permalink / raw)
  To: Ian Campbell, Yang Hongyang, ian.jackson
  Cc: wei.liu2, andrew.cooper3, yunhong.jiang, eddie.dong, xen-devel,
	guijianfeng, rshriram

On 06/16/2015 06:49 PM, Ian Campbell wrote:
> On Mon, 2015-06-08 at 11:43 +0800, Yang Hongyang wrote:
>> From: Wen Congyang <wency@cn.fujitsu.com>
>>
>>  Currently, all callbacks return an integer value or void. We cannot
>>  return some data to xc via callback. Update libxl_save_msgs_gen.pl
>>  to support this case.
> 
> (Odd two space indent from somewhere)
> 
> Can you be more specific about what kind of data you need to pass back
> here? Is it an opaque "void *" blob or is it potentially more
> structured? That is useful to know in case there might be a better way
> to achieve the goal.

The secondary vm's dirty bitmap.

Thanks
Wen Congyang

> 
> Ian.
> 
> .
> 

* Re: [PATCH v2 COLOPre 09/13] tools/libxl: Update libxl_save_msgs_gen.pl to support return data from xl to xc
  2015-06-16 10:54     ` Wen Congyang
@ 2015-06-16 10:56       ` Ian Jackson
  0 siblings, 0 replies; 106+ messages in thread
From: Ian Jackson @ 2015-06-16 10:56 UTC (permalink / raw)
  To: Wen Congyang
  Cc: wei.liu2, Ian Campbell, andrew.cooper3, yunhong.jiang,
	eddie.dong, xen-devel, guijianfeng, rshriram, Yang Hongyang

Wen Congyang writes ("Re: [Xen-devel] [PATCH v2 COLOPre 09/13] tools/libxl: Update libxl_save_msgs_gen.pl to support return data from xl to xc"):
> On 06/16/2015 06:49 PM, Ian Campbell wrote:
> > Can you be more specific about what kind of data you need to pass back
> > here? Is it an opaque "void *" blob or is it potentially more
> > structured? That is useful to know in case there might be a better way
> > to achieve the goal.
> 
> The secondary vm's dirty bitmap.

This is going to be quite large, potentially.  Is assembling it in
memory in one go really the right approach ?

Also, I'm not sure why this information needs to be plumbed through to
libxl ?  Maybe it would be better to offer libxc a backchannel,
directly.

Ian.

* Re: [PATCH v2 COLOPre 10/13] tools/libxl: Add back channel to allow migration target send data back
  2015-06-16 10:52         ` Ian Campbell
@ 2015-06-16 10:58           ` Ian Jackson
  0 siblings, 0 replies; 106+ messages in thread
From: Ian Jackson @ 2015-06-16 10:58 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Wei Liu, wency, andrew.cooper3, yunhong.jiang, eddie.dong,
	xen-devel, guijianfeng, rshriram, Yang Hongyang

Ian Campbell writes ("Re: [Xen-devel] [PATCH v2 COLOPre 10/13] tools/libxl: Add back channel to allow migration target send data back"):
> On Mon, 2015-06-15 at 09:38 +0800, Yang Hongyang wrote:
> > Then I'll just extend the params of libxl_domain_create_restore().
> 
> One question first: is it really to be expected that send_fd and recv_fd
> will be different, as opposed to the bidirectional data going over the
> same socket?

If this is supposed to be able to work over a pipe, and ssh, then yes.

Even if you give ssh copies of the same socketpair fd on 0 and 1,
sshd's child gets ends of two different pipes on 0 and 1.

Ian.

* Re: [PATCH v2 COLOPre 09/13] tools/libxl: Update libxl_save_msgs_gen.pl to support return data from xl to xc
  2015-06-16 10:49   ` Ian Campbell
  2015-06-16 10:54     ` Wen Congyang
@ 2015-06-16 11:01     ` Ian Jackson
  1 sibling, 0 replies; 106+ messages in thread
From: Ian Jackson @ 2015-06-16 11:01 UTC (permalink / raw)
  To: Ian Campbell
  Cc: wei.liu2, wency, andrew.cooper3, yunhong.jiang, eddie.dong,
	xen-devel, guijianfeng, rshriram, Yang Hongyang

Ian Campbell writes ("Re: [Xen-devel] [PATCH v2 COLOPre 09/13] tools/libxl: Update libxl_save_msgs_gen.pl to support return data from xl to xc"):
> On Mon, 2015-06-08 at 11:43 +0800, Yang Hongyang wrote:
> > From: Wen Congyang <wency@cn.fujitsu.com>
> > 
> >  Currently, all callbacks return an integer value or void. We cannot
> >  return some data to xc via callback. Update libxl_save_msgs_gen.pl
> >  to support this case.

For some reason I have only v2 of this in my INBOX.  Should I start
trying to complain to corporate IT again ?

Ian.

* Re: [PATCH v2 COLOPre 09/13] tools/libxl: Update libxl_save_msgs_gen.pl to support return data from xl to xc
  2015-06-08  3:43 ` [PATCH v2 COLOPre 09/13] tools/libxl: Update libxl_save_msgs_gen.pl to support return data from xl to xc Yang Hongyang
  2015-06-16 10:49   ` Ian Campbell
@ 2015-06-16 11:05   ` Ian Jackson
  2015-06-16 14:19     ` Yang Hongyang
  1 sibling, 1 reply; 106+ messages in thread
From: Ian Jackson @ 2015-06-16 11:05 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, xen-devel, guijianfeng, rshriram

Yang Hongyang writes ("[Xen-devel] [PATCH v2 COLOPre 09/13] tools/libxl: Update libxl_save_msgs_gen.pl to support return data from xl to xc"):
> From: Wen Congyang <wency@cn.fujitsu.com>
> 
>  Currently, all callbacks return an integer value or void. We cannot
>  return some data to xc via callback. Update libxl_save_msgs_gen.pl
>  to support this case.

Thanks for this.  I would have some comments on the details, but first
I want to properly understand your use case.  So while I'm the author
and maintainer of this save helper, I won't review this in detail just
yet.  I'm following the thread about what this is for...

Thanks,
Ian.

* Re: [PATCH v2 COLOPre 09/13] tools/libxl: Update libxl_save_msgs_gen.pl to support return data from xl to xc
  2015-06-16 11:05   ` Ian Jackson
@ 2015-06-16 14:19     ` Yang Hongyang
  0 siblings, 0 replies; 106+ messages in thread
From: Yang Hongyang @ 2015-06-16 14:19 UTC (permalink / raw)
  To: Ian Jackson
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, xen-devel, guijianfeng, rshriram

Hi Ian,

On 06/16/2015 07:05 PM, Ian Jackson wrote:
> Yang Hongyang writes ("[Xen-devel] [PATCH v2 COLOPre 09/13] tools/libxl: Update libxl_save_msgs_gen.pl to support return data from xl to xc"):
>> From: Wen Congyang <wency@cn.fujitsu.com>
>>
>>   Currently, all callbacks return an integer value or void. We cannot
>>   return some data to xc via callback. Update libxl_save_msgs_gen.pl
>>   to support this case.
>
> Thanks for this.  I would have some comments on the details, but first
> I want to properly understand your use case.  So while I'm the author
> and maintainer of this save helper, I won't review this in detail just
> yet.  I'm following the thread about what this is for...

We need to send the secondary's dirty page pfns back to the primary. The
primary will then send the pages that were dirtied on the primary and/or the
secondary to the secondary. In this way the secondary's memory will be
consistent with the primary's.

As we discussed in [PATCH v2 COLOPre 04/13] tools/libxc: export xc_bitops.h,
if we move this operation to the libxc layer, this patch could be dropped.

>
> Thanks,
> Ian.
> .
>

-- 
Thanks,
Yang.

* Re: [PATCH v2 COLOPre 07/13] tools/libxl: Update libxl__domain_unpause() to support qemu-xen
  2015-06-15 16:22       ` Wei Liu
@ 2015-06-17  9:02         ` Yang Hongyang
  0 siblings, 0 replies; 106+ messages in thread
From: Yang Hongyang @ 2015-06-17  9:02 UTC (permalink / raw)
  To: Wei Liu
  Cc: ian.campbell, wency, andrew.cooper3, yunhong.jiang, eddie.dong,
	xen-devel, guijianfeng, rshriram, ian.jackson



On 06/16/2015 12:22 AM, Wei Liu wrote:
> On Mon, Jun 15, 2015 at 09:29:55AM +0800, Yang Hongyang wrote:
>>
>>
>> On 06/12/2015 08:33 PM, Wei Liu wrote:
>>> On Mon, Jun 08, 2015 at 11:43:11AM +0800, Yang Hongyang wrote:
>>>> Currently, libxl__domain_unpause() only supports
>>>> qemu-xen-traditional. Update it to support qemu-xen.
>>>>
>>>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>>>
>>> This looks very similar to an existing function called
>>> libxl__domain_resume_device_model. Maybe you don't need to invent a new
>>> function.
>>>
>>>> ---
>>>>   tools/libxl/libxl.c | 42 +++++++++++++++++++++++++++++++++---------
>>>>   1 file changed, 33 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
>>>> index d5691dc..5c843c2 100644
>>>> --- a/tools/libxl/libxl.c
>>>> +++ b/tools/libxl/libxl.c
>>>> @@ -933,10 +933,37 @@ out:
>>>>       return AO_INPROGRESS;
>>>>   }
>>>>
>>>> -int libxl__domain_unpause(libxl__gc *gc, uint32_t domid)
>>>> +static int libxl__domain_unpause_device_model(libxl__gc *gc, uint32_t domid)
>>>>   {
>>>>       char *path;
>>>>       char *state;
>>>> +
>>>> +    switch (libxl__device_model_version_running(gc, domid)) {
>>>> +    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL: {
>>>> +        uint32_t dm_domid = libxl_get_stubdom_id(CTX, domid);
>>>> +
>>>> +        path = libxl__device_model_xs_path(gc, dm_domid, domid, "/state");
>>>> +        state = libxl__xs_read(gc, XBT_NULL, path);
>>>> +        if (state != NULL && !strcmp(state, "paused")) {
>>>
>>> The only difference between your function and
>>> libxl__domain_resume_device_model is the check of the "state" node. I
>>> think you can just add the check to libxl__domain_resume_device_model
>>> and use that function.
>>
>> I'm not sure whether changing the existing function's behavior will affect
>> the existing callers; if there's no problem doing so, I will do as you
>> suggested in the next version.
>>
>
> Qemu-dm currently has several states. libxl__domain_resume_device_model
> doesn't check the state and writes unconditionally. I think checking
> before writing would be an improvement.

fixed, thank you!

>
> Wei.
>
>>>
>>> Wei.
>>> .
>>>
>>
>> --
>> Thanks,
>> Yang.
> .
>

-- 
Thanks,
Yang.

* Re: [PATCH v2 COLOPre 11/13] tools/libxl: rename remus device to checkpoint device
  2015-06-16 10:53             ` Ian Campbell
@ 2015-06-25  5:00               ` Yang Hongyang
  2015-06-25  9:09                 ` Wei Liu
  0 siblings, 1 reply; 106+ messages in thread
From: Yang Hongyang @ 2015-06-25  5:00 UTC (permalink / raw)
  To: Ian Campbell, Wei Liu
  Cc: eddie.dong, wency, andrew.cooper3, yunhong.jiang, Ian Jackson,
	xen-devel, guijianfeng, rshriram



On 06/16/2015 06:53 PM, Ian Campbell wrote:
> On Mon, 2015-06-15 at 17:24 +0100, Wei Liu wrote:
>> On Mon, Jun 15, 2015 at 09:45:54AM +0800, Yang Hongyang wrote:
>>>
>>>
>>> On 06/12/2015 10:57 PM, Ian Jackson wrote:
>>>> Wei Liu writes ("Re: [Xen-devel] [PATCH v2 COLOPre 11/13] tools/libxl: rename remus device to checkpoint device"):
>>>>> On Fri, Jun 12, 2015 at 02:30:46PM +0100, Wei Liu wrote:
>>>>>> On Mon, Jun 08, 2015 at 11:43:15AM +0800, Yang Hongyang wrote:
>>>>>>> -    (-18, "REMUS_DEVOPS_DOES_NOT_MATCH"),
>>>>>>> -    (-19, "REMUS_DEVICE_NOT_SUPPORTED"),
>>>>>>> +    (-18, "CHECKPOINT_DEVOPS_DOES_NOT_MATCH"),
>>>>>>> +    (-19, "CHECKPOINT_DEVICE_NOT_SUPPORTED"),
>>>>>>
>>>>>> You should add two new error numbers.
>>>>>>
>>>>>
>>>>> And in that case you might also need to go through all places to make
>>>>> sure the correct error numbers are returned. I.e. the old remus code
>>>>> path still returns the REMUS error codes and the new CHECKPOINT code
>>>>> path returns the new error codes.
>>>>>
>>>>> I merely speak from API backward compatibility point of view. If you
>>>>> think what I suggest doesn't make sense, please let me know.
>>>>
>>>> To me this line of reasoning prompts me to ask: what would be wrong
>>>> with leaving the word REMUS in the error names, and simply updating
>>>> the descriptions ?
>>>>
>>>> After all AFAICT the circumstances are very similar.  I don't think it
>>>> makes sense to require libxl to do something like
>>>>     rc = were_we_doing_colo_not_remus ? CHECKPOINT_BLAH : REMUS_BLAH;
>>>>
>>>> Please do contradict me if I have misunderstood...
>>>
>>> COLO and REMUS are both checkpoint devices. We use the checkpoint device
>>> layer as a more abstract layer for both COLO and REMUS; as for the error
>>> codes, they can be used by both COLO and REMUS. So we don't distinguish
>>> whether we are doing COLO or REMUS; users are aware of what they're
>>> executing (COLO or REMUS).
>>>
>>
>> Right. So continuing to use the REMUS_ error codes is fine.
>
> Seems like it would also be OK to switch the name and then in libxl.h
>
> #if LIBXL_API_VERSION < 0xWHENEVER
> #define REMUS_BLAH CHECKPOINT_BLAH
> #define ...
> #endif
>
> _If_ we think the new names make more sense going fwd...

Well, I think the new names are better, and I also think it is safe to just
rename them: I don't find any other users of these error codes except
Remus/COLO; they are only used by Remus/COLO internally.

>
>
> .
>

-- 
Thanks,
Yang.

* Re: [PATCH v2 COLOPre 11/13] tools/libxl: rename remus device to checkpoint device
  2015-06-25  5:00               ` Yang Hongyang
@ 2015-06-25  9:09                 ` Wei Liu
  2015-06-25  9:16                   ` Yang Hongyang
  0 siblings, 1 reply; 106+ messages in thread
From: Wei Liu @ 2015-06-25  9:09 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: Wei Liu, Ian Campbell, wency, andrew.cooper3, yunhong.jiang,
	Ian Jackson, xen-devel, eddie.dong, guijianfeng, rshriram

On Thu, Jun 25, 2015 at 01:00:14PM +0800, Yang Hongyang wrote:
> 
> 
> On 06/16/2015 06:53 PM, Ian Campbell wrote:
> >On Mon, 2015-06-15 at 17:24 +0100, Wei Liu wrote:
> >>On Mon, Jun 15, 2015 at 09:45:54AM +0800, Yang Hongyang wrote:
> >>>
> >>>
> >>>On 06/12/2015 10:57 PM, Ian Jackson wrote:
> >>>>Wei Liu writes ("Re: [Xen-devel] [PATCH v2 COLOPre 11/13] tools/libxl: rename remus device to checkpoint device"):
> >>>>>On Fri, Jun 12, 2015 at 02:30:46PM +0100, Wei Liu wrote:
> >>>>>>On Mon, Jun 08, 2015 at 11:43:15AM +0800, Yang Hongyang wrote:
> >>>>>>>-    (-18, "REMUS_DEVOPS_DOES_NOT_MATCH"),
> >>>>>>>-    (-19, "REMUS_DEVICE_NOT_SUPPORTED"),
> >>>>>>>+    (-18, "CHECKPOINT_DEVOPS_DOES_NOT_MATCH"),
> >>>>>>>+    (-19, "CHECKPOINT_DEVICE_NOT_SUPPORTED"),
> >>>>>>
> >>>>>>You should add two new error numbers.
> >>>>>>
> >>>>>
> >>>>>And in that case you might also need to go through all places to make
> >>>>>sure the correct error numbers are returned. I.e. the old remus code
> >>>>>path still returns the REMUS error codes and the new CHECKPOINT code
> >>>>>path returns the new error codes.
> >>>>>
> >>>>>I merely speak from API backward compatibility point of view. If you
> >>>>>think what I suggest doesn't make sense, please let me know.
> >>>>
> >>>>To me this line of reasoning prompts me to ask: what would be wrong
> >>>>with leaving the word REMUS in the error names, and simply updating
> >>>>the descriptions ?
> >>>>
> >>>>After all AFAICT the circumstances are very similar.  I don't think it
> >>>>makes sense to require libxl to do something like
> >>>>    rc = were_we_doing_colo_not_remus ? CHECKPOINT_BLAH : REMUS_BLAH;
> >>>>
> >>>>Please do contradict me if I have misunderstood...
> >>>
> >>>COLO and REMUS are both checkpoint devices. We use the checkpoint device
> >>>layer as a more abstract layer for both COLO and REMUS; as for the error
> >>>codes, they can be used by both COLO and REMUS. So we don't distinguish
> >>>whether we are doing COLO or REMUS; users are aware of what they're
> >>>executing (COLO or REMUS).
> >>>
> >>
> >>Right. So continuing to use the REMUS_ error codes is fine.
> >
> >Seems like it would also be OK to switch the name and then in libxl.h
> >
> >#if LIBXL_API_VERSION < 0xWHENEVER
> >#define REMUS_BLAH CHECKPOINT_BLAH
> >#define ...
> >#endif
> >
> >_If_ we think the new names make more sense going fwd...
> 
> Well, I think the new names are better, and I also think it is safe to just
> rename them: I don't find any other users of these error codes except
> Remus/COLO; they are only used by Remus/COLO internally.
> 

The main point is, this is an externally visible interface. Some users might
also have developed their own solutions based on remus. In their code they
check for the REMUS_$FOO error codes.

I agree renaming to CHECKPOINT even for the sake of matching API names
is good. I think Ian's suggestion should be the simplest way of moving
forward.

Wei.

> >
> >
> >.
> >
> 
> -- 
> Thanks,
> Yang.

* Re: [PATCH v2 COLOPre 11/13] tools/libxl: rename remus device to checkpoint device
  2015-06-25  9:09                 ` Wei Liu
@ 2015-06-25  9:16                   ` Yang Hongyang
  0 siblings, 0 replies; 106+ messages in thread
From: Yang Hongyang @ 2015-06-25  9:16 UTC (permalink / raw)
  To: Wei Liu
  Cc: Ian Campbell, wency, andrew.cooper3, yunhong.jiang, Ian Jackson,
	xen-devel, eddie.dong, guijianfeng, rshriram



On 06/25/2015 05:09 PM, Wei Liu wrote:
> On Thu, Jun 25, 2015 at 01:00:14PM +0800, Yang Hongyang wrote:
>>
>>
>> On 06/16/2015 06:53 PM, Ian Campbell wrote:
>>> On Mon, 2015-06-15 at 17:24 +0100, Wei Liu wrote:
>>>> On Mon, Jun 15, 2015 at 09:45:54AM +0800, Yang Hongyang wrote:
>>>>>
>>>>>
>>>>> On 06/12/2015 10:57 PM, Ian Jackson wrote:
>>>>>> Wei Liu writes ("Re: [Xen-devel] [PATCH v2 COLOPre 11/13] tools/libxl: rename remus device to checkpoint device"):
>>>>>>> On Fri, Jun 12, 2015 at 02:30:46PM +0100, Wei Liu wrote:
>>>>>>>> On Mon, Jun 08, 2015 at 11:43:15AM +0800, Yang Hongyang wrote:
>>>>>>>>> -    (-18, "REMUS_DEVOPS_DOES_NOT_MATCH"),
>>>>>>>>> -    (-19, "REMUS_DEVICE_NOT_SUPPORTED"),
>>>>>>>>> +    (-18, "CHECKPOINT_DEVOPS_DOES_NOT_MATCH"),
>>>>>>>>> +    (-19, "CHECKPOINT_DEVICE_NOT_SUPPORTED"),
>>>>>>>>
>>>>>>>> You should add two new error numbers.
>>>>>>>>
>>>>>>>
>>>>>>> And in that case you might also need to go through all places to make
>>>>>>> sure the correct error numbers are returned. I.e. the old remus code
>>>>>>> path still returns the REMUS error codes and the new CHECKPOINT code
>>>>>>> path returns the new error codes.
>>>>>>>
>>>>>>> I merely speak from API backward compatibility point of view. If you
>>>>>>> think what I suggest doesn't make sense, please let me know.
>>>>>>
>>>>>> To me this line of reasoning prompts me to ask: what would be wrong
>>>>>> with leaving the word REMUS in the error names, and simply updating
>>>>>> the descriptions ?
>>>>>>
>>>>>> After all AFAICT the circumstances are very similar.  I don't think it
>>>>>> makes sense to require libxl to do something like
>>>>>>     rc = were_we_doing_colo_not_remus ? CHECKPOINT_BLAH : REMUS_BLAH;
>>>>>>
>>>>>> Please do contradict me if I have misunderstood...
>>>>>
>>>>> COLO and REMUS are both checkpoint devices. We use the checkpoint device
>>>>> layer as a more abstract layer for both COLO and REMUS; as for the error
>>>>> codes, they can be used by both COLO and REMUS. So we don't distinguish
>>>>> whether we are doing COLO or REMUS; users are aware of what they're
>>>>> executing (COLO or REMUS).
>>>>>
>>>>
>>>> Right. So continuing to use the REMUS_ error codes is fine.
>>>
>>> Seems like it would also be OK to switch the name and then in libxl.h
>>>
>>> #if LIBXL_API_VERSION < 0xWHENEVER
>>> #define REMUS_BLAH CHECKPOINT_BLAH
>>> #define ...
>>> #endif
>>>
>>> _If_ we think the new names make more sense going fwd...
>>
>> Well, I think the new names are better, and I also think it is safe to just
>> rename them: I don't find any other users of these error codes except
>> Remus/COLO; they are only used by Remus/COLO internally.
>>
>
> The main point is, this is an externally visible interface. Some users might
> also have developed their own solutions based on remus. In their code they
> check for the REMUS_$FOO error codes.
>
> I agree renaming to CHECKPOINT even for the sake of matching API names
> is good. I think Ian's suggestion should be the simplest way of moving
> forward.

Ok, I will add an extra patch to deal with this backward-compatibility issue
in the next version.

>
> Wei.
>
>>>
>>>
>>> .
>>>
>>
>> --
>> Thanks,
>> Yang.
> .
>

-- 
Thanks,
Yang.

end of thread, other threads:[~2015-06-25  9:16 UTC | newest]

Thread overview: 106+ messages
2015-06-08  3:43 [PATCH v2 COLOPre 00/13] Prerequisite patches for COLO Yang Hongyang
2015-06-08  3:43 ` [PATCH v2 COLOPre 01/13] libxc/restore: fix error handle of process_record Yang Hongyang
2015-06-08  9:24   ` Andrew Cooper
2015-06-08  9:37     ` Yang Hongyang
2015-06-08  9:39       ` Andrew Cooper
2015-06-10 14:55   ` Ian Campbell
2015-06-11  2:10     ` Yang Hongyang
2015-06-08  3:43 ` [PATCH v2 COLOPre 02/13] tools/libxc: support to resume uncooperative HVM guests Yang Hongyang
2015-06-10 15:18   ` Ian Campbell
2015-06-11  2:42     ` Wen Congyang
2015-06-11  8:44       ` Ian Campbell
2015-06-11  8:56         ` Wen Congyang
2015-06-11  9:41           ` Ian Campbell
2015-06-08  3:43 ` [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time Yang Hongyang
2015-06-08  9:46   ` Andrew Cooper
2015-06-08  9:49     ` Andrew Cooper
2015-06-08  9:58     ` Yang Hongyang
2015-06-08 10:15       ` Andrew Cooper
2015-06-09  0:59         ` Yang Hongyang
2015-06-09  7:30           ` Andrew Cooper
2015-06-10  5:26             ` Yang Hongyang
2015-06-10  7:44               ` Andrew Cooper
2015-06-10  9:06                 ` Wen Congyang
2015-06-10 10:08                   ` Andrew Cooper
2015-06-10 10:35                     ` Paul Durrant
2015-06-10 10:40                   ` Paul Durrant
2015-06-10 10:54                     ` Wen Congyang
2015-06-10 10:58                       ` Paul Durrant
2015-06-10 11:37                         ` Wen Congyang
2015-06-10 11:47                           ` Paul Durrant
2015-06-11  1:13                             ` Wen Congyang
2015-06-11  8:32                               ` Paul Durrant
2015-06-11  8:48                                 ` Wen Congyang
2015-06-11 10:20                                   ` Paul Durrant
2015-06-11 11:14                                     ` Wen Congyang
2015-06-11 12:54                                       ` Yang Hongyang
2015-06-12  3:39                                         ` Yang Hongyang
2015-06-11 12:58                                     ` Yang Hongyang
2015-06-11 13:25                                       ` Paul Durrant
2015-06-12  3:22                                         ` Wen Congyang
2015-06-12  7:41                                           ` Paul Durrant
2015-06-12 10:26                                             ` Wen Congyang
2015-06-12 10:54                                               ` Paul Durrant
2015-06-12 11:09                                                 ` Wen Congyang
2015-06-12 11:48                                                   ` Paul Durrant
2015-06-12 15:04                                                     ` Wen Congyang
2015-06-12 15:31                                                       ` Paul Durrant
2015-06-13  5:58                                                         ` Wen Congyang
2015-06-08  3:43 ` [PATCH v2 COLOPre 04/13] tools/libxc: export xc_bitops.h Yang Hongyang
2015-06-08 10:04   ` Yang Hongyang
2015-06-10 15:20   ` Ian Campbell
2015-06-11  2:07     ` Yang Hongyang
2015-06-11  8:41       ` Ian Campbell
2015-06-11 10:45         ` Andrew Cooper
2015-06-11 10:55           ` Ian Campbell
2015-06-15  1:50             ` Yang Hongyang
2015-06-08  3:43 ` [PATCH v2 COLOPre 05/13] tools/libxl: introduce a new API libxl__domain_restore() to load qemu state Yang Hongyang
2015-06-10 15:35   ` Ian Campbell
2015-06-11  2:09     ` Yang Hongyang
2015-06-11  8:43       ` Ian Campbell
2015-06-11  8:55         ` Yang Hongyang
2015-06-11  9:41           ` Ian Campbell
2015-06-08  3:43 ` [PATCH v2 COLOPre 06/13] tools/libxl: Introduce a new internal API libxl__domain_unpause() Yang Hongyang
2015-06-10 15:37   ` Ian Campbell
2015-06-11  2:21     ` Yang Hongyang
2015-06-11  8:43       ` Ian Campbell
2015-06-11  9:09         ` Wen Congyang
2015-06-11  9:42           ` Ian Campbell
2015-06-11  9:48             ` Wen Congyang
2015-06-12 11:23             ` Ian Jackson
2015-06-08  3:43 ` [PATCH v2 COLOPre 07/13] tools/libxl: Update libxl__domain_unpause() to support qemu-xen Yang Hongyang
2015-06-12 12:33   ` Wei Liu
2015-06-15  1:29     ` Yang Hongyang
2015-06-15 16:22       ` Wei Liu
2015-06-17  9:02         ` Yang Hongyang
2015-06-08  3:43 ` [PATCH v2 COLOPre 08/13] tools/libxl: introduce libxl__domain_common_switch_qemu_logdirty() Yang Hongyang
2015-06-16 10:45   ` Ian Campbell
2015-06-08  3:43 ` [PATCH v2 COLOPre 09/13] tools/libxl: Update libxl_save_msgs_gen.pl to support return data from xl to xc Yang Hongyang
2015-06-16 10:49   ` Ian Campbell
2015-06-16 10:54     ` Wen Congyang
2015-06-16 10:56       ` Ian Jackson
2015-06-16 11:01     ` Ian Jackson
2015-06-16 11:05   ` Ian Jackson
2015-06-16 14:19     ` Yang Hongyang
2015-06-08  3:43 ` [PATCH v2 COLOPre 10/13] tools/libxl: Add back channel to allow migration target send data back Yang Hongyang
2015-06-12 12:54   ` Wei Liu
2015-06-12 15:04     ` Ian Jackson
2015-06-15  1:38       ` Yang Hongyang
2015-06-16 10:52         ` Ian Campbell
2015-06-16 10:58           ` Ian Jackson
2015-06-15  1:33     ` Yang Hongyang
2015-06-08  3:43 ` [PATCH v2 COLOPre 11/13] tools/libxl: rename remus device to checkpoint device Yang Hongyang
2015-06-12 13:30   ` Wei Liu
2015-06-12 13:35     ` Wei Liu
2015-06-12 14:57       ` Ian Jackson
2015-06-15  1:45         ` Yang Hongyang
2015-06-15 16:24           ` Wei Liu
2015-06-16 10:53             ` Ian Campbell
2015-06-25  5:00               ` Yang Hongyang
2015-06-25  9:09                 ` Wei Liu
2015-06-25  9:16                   ` Yang Hongyang
2015-06-08  3:43 ` [PATCH v2 COLOPre 12/13] tools/libxl: adjust the indentation Yang Hongyang
2015-06-16 10:53   ` Ian Campbell
2015-06-08  3:43 ` [PATCH v2 COLOPre 13/13] tools/libxl: don't touch remus in checkpoint_device Yang Hongyang
2015-06-12 13:28   ` Wei Liu
2015-06-15  1:46     ` Yang Hongyang
