* [PATCH v8 --for 4.6 COLO 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service
@ 2015-07-15  9:18 Yang Hongyang
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 01/25] docs: add colo readme Yang Hongyang
                   ` (24 more replies)
  0 siblings, 25 replies; 46+ messages in thread
From: Yang Hongyang @ 2015-07-15  9:18 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

This patchset implements the COLO feature for Xen.
For details on installing and using COLO, refer to:
  http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping

In this series, we've rebased to the latest libxl migration v2.

This patchset is based on:
  [PATCH v4 --for 4.6 COLOPre 00/25] Prerequisite patches for COLO

Only HVM guests are supported for now. The code is also hosted on GitHub:
  https://github.com/macrosheep/xen/tree/colo-v8

Changelog from v7 to v8:
1. Rebased to the latest libxl migration v2.

Changelog from v6 to v7:
1. Ported to Libxl migration v2
2. Send dirty bitmap from secondary to primary on libxc side
3. Address review comments

Changelog from v5 to v6:
1. based on migration v2(libxc)
2. split the patchset into prerequisite patchset and this main patchset.

Changelog from v4 to v5:
1. rebase to the latest xen upstream
2. disk replication: blktap2->qdisk
3. nic replication: colo-agent->colo-proxy

Changelog from v3 to v4:
1. rebase to newest xen
2. bug fix

Changelog from v2 to v3:
1. rebase to newest remus
2. add nic replication support

Changelog from v1 to v2:
1. rebase to newest remus
2. add disk replication support


Wen Congyang (7):
  docs/libxl: Introduce COLO_CONTEXT to support migration v2 colo
    streams
  secondary vm suspend/resume/checkpoint code
  primary vm suspend/resume/checkpoint code
  send store mfn and console mfn to xl before resuming secondary vm
  implement the cmdline for COLO
  Support colo mode for qemu disk
  COLO: use qemu block replication

Yang Hongyang (18):
  docs: add colo readme
  libxc/migration: Specification update for DIRTY_BITMAP records
  libxc/migration: export read_record for common use
  tools/libxl: add back channel support to write stream
  tools/libxl: write colo_context records into the stream
  tools/libxl: add back channel support to read stream
  tools/libxl: handle colo_context records in a libxl migration v2 read
    stream
  tools/libx{l,c}: introduce should_checkpoint callback
  tools/libx{l,c}: add postcopy/suspend callback to restore side
  libxc/restore: support COLO restore
  libxc/restore: send dirty bitmap to primary when checkpoint under colo
  libxc/save: support COLO save
  COLO proxy: implement setup/teardown of COLO proxy module
  COLO proxy: preresume, postresume and checkpoint
  COLO nic: implement COLO nic subkind
  setup and control colo proxy on primary side
  setup and control colo proxy on secondary side
  cmdline switches and config vars to control colo-proxy

 docs/README.colo                         |    9 +
 docs/man/xl.conf.pod.5                   |    6 +
 docs/man/xl.pod.1                        |   11 +-
 docs/misc/xl-disk-configuration.txt      |   38 ++
 docs/specs/libxc-migration-stream.pandoc |   24 +-
 docs/specs/libxl-migration-stream.pandoc |   22 +-
 tools/hotplug/Linux/Makefile             |    1 +
 tools/hotplug/Linux/colo-proxy-setup     |  131 ++++
 tools/libxc/include/xenguest.h           |   36 ++
 tools/libxc/xc_sr_common.c               |   50 ++
 tools/libxc/xc_sr_common.h               |   36 +-
 tools/libxc/xc_sr_restore.c              |  244 +++++--
 tools/libxc/xc_sr_save.c                 |  104 ++-
 tools/libxc/xc_sr_stream_format.h        |    1 +
 tools/libxl/Makefile                     |    4 +
 tools/libxl/libxl.c                      |   77 ++-
 tools/libxl/libxl_colo.h                 |   42 ++
 tools/libxl/libxl_colo_nic.c             |  320 ++++++++++
 tools/libxl/libxl_colo_proxy.c           |  267 ++++++++
 tools/libxl/libxl_colo_qdisk.c           |  209 ++++++
 tools/libxl/libxl_colo_restore.c         | 1024 ++++++++++++++++++++++++++++++
 tools/libxl/libxl_colo_save.c            |  709 +++++++++++++++++++++
 tools/libxl/libxl_create.c               |  153 ++++-
 tools/libxl/libxl_device.c               |   38 ++
 tools/libxl/libxl_dm.c                   |  257 +++++++-
 tools/libxl/libxl_dom_save.c             |   14 +-
 tools/libxl/libxl_internal.h             |  217 +++++--
 tools/libxl/libxl_qmp.c                  |   31 +
 tools/libxl/libxl_save_callout.c         |    7 +-
 tools/libxl/libxl_save_msgs_gen.pl       |   11 +-
 tools/libxl/libxl_sr_stream_format.h     |   11 +
 tools/libxl/libxl_stream_read.c          |   68 ++
 tools/libxl/libxl_stream_write.c         |  103 +++
 tools/libxl/libxl_types.idl              |    8 +
 tools/libxl/libxlu_disk_l.l              |    5 +
 tools/libxl/xl.c                         |    3 +
 tools/libxl/xl.h                         |    1 +
 tools/libxl/xl_cmdimpl.c                 |  101 ++-
 tools/libxl/xl_cmdtable.c                |    4 +-
 tools/python/xen/migration/libxl.py      |    9 +
 40 files changed, 4224 insertions(+), 182 deletions(-)
 create mode 100644 docs/README.colo
 create mode 100755 tools/hotplug/Linux/colo-proxy-setup
 create mode 100644 tools/libxl/libxl_colo.h
 create mode 100644 tools/libxl/libxl_colo_nic.c
 create mode 100644 tools/libxl/libxl_colo_proxy.c
 create mode 100644 tools/libxl/libxl_colo_qdisk.c
 create mode 100644 tools/libxl/libxl_colo_restore.c
 create mode 100644 tools/libxl/libxl_colo_save.c

-- 
1.9.1


* [PATCH v8 --for 4.6 COLO 01/25] docs: add colo readme
  2015-07-15  9:18 [PATCH v8 --for 4.6 COLO 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
@ 2015-07-15  9:18 ` Yang Hongyang
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 02/25] docs/libxl: Introduce COLO_CONTEXT to support migration v2 colo streams Yang Hongyang
                   ` (23 subsequent siblings)
  24 siblings, 0 replies; 46+ messages in thread
From: Yang Hongyang @ 2015-07-15  9:18 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

Add a COLO readme. For details, refer to
http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
 docs/README.colo | 9 +++++++++
 1 file changed, 9 insertions(+)
 create mode 100644 docs/README.colo

diff --git a/docs/README.colo b/docs/README.colo
new file mode 100644
index 0000000..466eb72
--- /dev/null
+++ b/docs/README.colo
@@ -0,0 +1,9 @@
+COLO FT/HA (COarse-grain LOck-stepping Virtual Machines for Non-stop Service)
+project is a high availability solution. Both primary VM (PVM) and secondary VM
+(SVM) run in parallel. They receive the same request from client, and generate
+response in parallel too. If the response packets from PVM and SVM are
+identical, they are released immediately. Otherwise, a VM checkpoint (on demand)
+is conducted.
+
+See the website at http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
+for details.
-- 
1.9.1
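
The decision rule described in the readme above (release matching
responses, otherwise checkpoint) can be sketched in C. This is only an
illustration of the rule, not the colo-proxy implementation; the names
colo_action and colo_compare are hypothetical and do not appear in the
series.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names: not taken from the COLO proxy sources. */
enum colo_action {
    COLO_RELEASE_PACKET,   /* outputs match: release the response */
    COLO_DO_CHECKPOINT,    /* outputs differ: run an on-demand checkpoint */
};

/* Compare one response packet from the PVM with the one from the SVM. */
static enum colo_action colo_compare(const void *pvm_pkt, size_t pvm_len,
                                     const void *svm_pkt, size_t svm_len)
{
    assert(pvm_pkt != NULL && svm_pkt != NULL);

    if (pvm_len == svm_len && memcmp(pvm_pkt, svm_pkt, pvm_len) == 0)
        return COLO_RELEASE_PACKET;

    return COLO_DO_CHECKPOINT;
}
```

A byte-for-byte comparison is the simplest reading of "identical"; a
real proxy may compare at a different granularity.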


* [PATCH v8 --for 4.6 COLO 02/25] docs/libxl: Introduce COLO_CONTEXT to support migration v2 colo streams
  2015-07-15  9:18 [PATCH v8 --for 4.6 COLO 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 01/25] docs: add colo readme Yang Hongyang
@ 2015-07-15  9:18 ` Yang Hongyang
  2015-07-15 16:52   ` Andrew Cooper
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 03/25] libxc/migration: Specification update for DIRTY_BITMAP records Yang Hongyang
                   ` (22 subsequent siblings)
  24 siblings, 1 reply; 46+ messages in thread
From: Yang Hongyang @ 2015-07-15  9:18 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

From: Wen Congyang <wency@cn.fujitsu.com>

COLO_CONTEXT is the negotiation record for COLO.
Primary->Secondary:
control_id      0x00000000: Secondary VM is out of sync, start a new checkpoint
Secondary->Primary:
                0x00000001: Secondary VM is suspended
                0x00000002: Secondary VM is ready
                0x00000003: Secondary VM is resumed

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 docs/specs/libxl-migration-stream.pandoc | 22 +++++++++++++++++++++-
 tools/libxl/libxl_sr_stream_format.h     | 11 +++++++++++
 tools/python/xen/migration/libxl.py      |  9 +++++++++
 3 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/docs/specs/libxl-migration-stream.pandoc b/docs/specs/libxl-migration-stream.pandoc
index c24a434..5986273 100644
--- a/docs/specs/libxl-migration-stream.pandoc
+++ b/docs/specs/libxl-migration-stream.pandoc
@@ -121,7 +121,9 @@ type         0x00000000: END
 
              0x00000004: CHECKPOINT_END
 
-             0x00000005 - 0x7FFFFFFF: Reserved for future _mandatory_
+             0x00000005: COLO_CONTEXT
+
+             0x00000006 - 0x7FFFFFFF: Reserved for future _mandatory_
              records.
 
              0x80000000 - 0xFFFFFFFF: Reserved for future _optional_
@@ -215,3 +217,21 @@ A checkpoint end record marks the end of a checkpoint in the image.
     +-------------------------------------------------+
 
 The end record contains no fields; its body_length is 0.
+
+COLO\_CONTEXT
+--------------
+
+A COLO context record contains the control information for COLO.
+
+     0     1     2     3     4     5     6     7 octet
+    +------------------------+------------------------+
+    | control_id             | padding                |
+    +------------------------+------------------------+
+
+--------------------------------------------------------------------
+Field            Description
+------------     ---------------------------------------------------
+control_id       0x00000000: Secondary VM is out of sync, start a new checkpoint
+                 0x00000001: Secondary VM is suspended
+                 0x00000002: Secondary VM is ready
+                 0x00000003: Secondary VM is resumed
diff --git a/tools/libxl/libxl_sr_stream_format.h b/tools/libxl/libxl_sr_stream_format.h
index 3f3c497..1dd2ac4 100644
--- a/tools/libxl/libxl_sr_stream_format.h
+++ b/tools/libxl/libxl_sr_stream_format.h
@@ -36,6 +36,7 @@ typedef struct libxl__sr_rec_hdr
 #define REC_TYPE_XENSTORE_DATA       0x00000002U
 #define REC_TYPE_EMULATOR_CONTEXT    0x00000003U
 #define REC_TYPE_CHECKPOINT_END      0x00000004U
+#define REC_TYPE_COLO_CONTEXT        0x00000005U
 
 typedef struct libxl__sr_emulator_hdr
 {
@@ -47,6 +48,16 @@ typedef struct libxl__sr_emulator_hdr
 #define EMULATOR_QEMU_TRADITIONAL    0x00000001U
 #define EMULATOR_QEMU_UPSTREAM       0x00000002U
 
+typedef struct libxl_sr_colo_context
+{
+    uint32_t id;
+} libxl_sr_colo_context;
+
+#define COLO_NEW_CHECKPOINT          0x00000000U
+#define COLO_SVM_SUSPENDED           0x00000001U
+#define COLO_SVM_READY               0x00000002U
+#define COLO_SVM_RESUMED             0x00000003U
+
 #endif /* LIBXL__SR_STREAM_FORMAT_H */
 
 /*
diff --git a/tools/python/xen/migration/libxl.py b/tools/python/xen/migration/libxl.py
index 415502e..57031c6 100644
--- a/tools/python/xen/migration/libxl.py
+++ b/tools/python/xen/migration/libxl.py
@@ -37,6 +37,7 @@ REC_TYPE_libxc_context    = 0x00000001
 REC_TYPE_xenstore_data    = 0x00000002
 REC_TYPE_emulator_context = 0x00000003
 REC_TYPE_checkpoint_end   = 0x00000004
+REC_TYPE_colo_context     = 0x00000005
 
 rec_type_to_str = {
     REC_TYPE_end              : "End",
@@ -44,6 +45,7 @@ rec_type_to_str = {
     REC_TYPE_xenstore_data    : "Xenstore data",
     REC_TYPE_emulator_context : "Emulator context",
     REC_TYPE_checkpoint_end   : "Checkpoint end",
+    REC_TYPE_colo_context     : "COLO context"
 }
 
 # emulator_context
@@ -184,6 +186,11 @@ class VerifyLibxl(VerifyBase):
         if len(content) != 0:
             raise RecordError("Checkpoint end record with non-zero length")
 
+    def verify_record_colo_context(self, content):
+        """ COLO context """
+        if len(content) == 0:
+            raise RecordError("COLO context record with zero length")
+
 
 record_verifiers = {
     REC_TYPE_end:
@@ -196,4 +203,6 @@ record_verifiers = {
         VerifyLibxl.verify_record_emulator_context,
     REC_TYPE_checkpoint_end:
         VerifyLibxl.verify_record_checkpoint_end,
+    REC_TYPE_colo_context:
+        VerifyLibxl.verify_record_colo_context,
 }
-- 
1.9.1
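
As a reading aid for the control_id values introduced above, here is a
small C sketch that maps an id to its description and reports which
side sends it. The COLO_* values are copied from the patch; the helper
names colo_control_id_to_str and colo_id_sent_by_primary are
hypothetical and are not part of libxl.

```c
#include <stddef.h>
#include <stdint.h>

/* Values copied from the libxl_sr_stream_format.h hunk above. */
#define COLO_NEW_CHECKPOINT 0x00000000U
#define COLO_SVM_SUSPENDED  0x00000001U
#define COLO_SVM_READY      0x00000002U
#define COLO_SVM_RESUMED    0x00000003U

/* Hypothetical helper: describe a control_id, or return NULL if unknown. */
static const char *colo_control_id_to_str(uint32_t id)
{
    static const char *const names[] = {
        [COLO_NEW_CHECKPOINT] =
            "Secondary VM is out of sync, start a new checkpoint",
        [COLO_SVM_SUSPENDED]  = "Secondary VM is suspended",
        [COLO_SVM_READY]      = "Secondary VM is ready",
        [COLO_SVM_RESUMED]    = "Secondary VM is resumed",
    };

    if (id >= sizeof(names) / sizeof(names[0]))
        return NULL;

    return names[id];
}

/* Hypothetical helper: nonzero if the id travels Primary->Secondary. */
static int colo_id_sent_by_primary(uint32_t id)
{
    return id == COLO_NEW_CHECKPOINT;
}
```

Per the commit message, only COLO_NEW_CHECKPOINT flows from primary to
secondary; the other three ids flow back over the back channel.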


* [PATCH v8 --for 4.6 COLO 03/25] libxc/migration: Specification update for DIRTY_BITMAP records
  2015-07-15  9:18 [PATCH v8 --for 4.6 COLO 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 01/25] docs: add colo readme Yang Hongyang
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 02/25] docs/libxl: Introduce COLO_CONTEXT to support migration v2 colo streams Yang Hongyang
@ 2015-07-15  9:18 ` Yang Hongyang
  2015-07-15 17:13   ` Andrew Cooper
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 04/25] libxc/migration: export read_record for common use Yang Hongyang
                   ` (21 subsequent siblings)
  24 siblings, 1 reply; 46+ messages in thread
From: Yang Hongyang @ 2015-07-15  9:18 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

Used by the secondary to send its dirty bitmap to the primary under COLO.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 docs/specs/libxc-migration-stream.pandoc | 24 +++++++++++++++++++++++-
 tools/libxc/xc_sr_common.c               |  1 +
 tools/libxc/xc_sr_stream_format.h        |  1 +
 3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/docs/specs/libxc-migration-stream.pandoc b/docs/specs/libxc-migration-stream.pandoc
index 68fa513..480d357 100644
--- a/docs/specs/libxc-migration-stream.pandoc
+++ b/docs/specs/libxc-migration-stream.pandoc
@@ -227,7 +227,9 @@ type         0x00000000: END
 
              0x0000000E: CHECKPOINT
 
-             0x0000000F - 0x7FFFFFFF: Reserved for future _mandatory_
+             0x0000000F: DIRTY_BITMAP
+
+             0x00000010 - 0x7FFFFFFF: Reserved for future _mandatory_
              records.
 
              0x80000000 - 0xFFFFFFFF: Reserved for future _optional_
@@ -601,6 +603,26 @@ CHECKPOINT record or an END record.
 
 \clearpage
 
+DIRTY_BITMAP
+------------
+
+A dirty_bitmap record is used for secondary to send its dirty bitmap
+to primary while doing a checkpoint under COLO. This record only exists
+in back channel.
+
+     0     1     2     3     4     5     6     7 octet
+    +-------------------------------------------------+
+    | pfn[0]                                          |
+    +-------------------------------------------------+
+    ...
+    +-------------------------------------------------+
+    | pfn[C-1]                                        |
+    +-------------------------------------------------+
+
+The count of the pfn is: record->length/sizeof(uint64_t).
+
+\clearpage
+
 Layout
 ======
 
diff --git a/tools/libxc/xc_sr_common.c b/tools/libxc/xc_sr_common.c
index 945cfa6..becc0f4 100644
--- a/tools/libxc/xc_sr_common.c
+++ b/tools/libxc/xc_sr_common.c
@@ -35,6 +35,7 @@ static const char *mandatory_rec_types[] =
     [REC_TYPE_X86_PV_VCPU_MSRS]     = "x86 PV vcpu msrs",
     [REC_TYPE_VERIFY]               = "Verify",
     [REC_TYPE_CHECKPOINT]           = "Checkpoint",
+    [REC_TYPE_DIRTY_BITMAP]         = "Dirty bitmap",
 };
 
 const char *rec_type_to_str(uint32_t type)
diff --git a/tools/libxc/xc_sr_stream_format.h b/tools/libxc/xc_sr_stream_format.h
index 6d0f8fd..43a0209 100644
--- a/tools/libxc/xc_sr_stream_format.h
+++ b/tools/libxc/xc_sr_stream_format.h
@@ -75,6 +75,7 @@ struct xc_sr_rhdr
 #define REC_TYPE_X86_PV_VCPU_MSRS     0x0000000cU
 #define REC_TYPE_VERIFY               0x0000000dU
 #define REC_TYPE_CHECKPOINT           0x0000000eU
+#define REC_TYPE_DIRTY_BITMAP         0x0000000fU
 
 #define REC_TYPE_OPTIONAL             0x80000000U
 
-- 
1.9.1
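
The spec text above gives the entry count as
record->length/sizeof(uint64_t). A minimal C sketch of that arithmetic
follows, assuming the body is read in host byte order; the helper names
dirty_bitmap_pfn_count and dirty_bitmap_pfn are hypothetical and do not
exist in libxc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: number of pfn entries in a DIRTY_BITMAP body of
 * the given length, or -1 if the length is not a multiple of 8 octets.
 */
static long dirty_bitmap_pfn_count(uint32_t length)
{
    if (length % sizeof(uint64_t))
        return -1;

    return (long)(length / sizeof(uint64_t));
}

/*
 * Hypothetical helper: fetch entry idx from the record body.
 * memcpy() avoids assuming the buffer is 8-byte aligned.
 */
static uint64_t dirty_bitmap_pfn(const void *body, size_t idx)
{
    uint64_t pfn;

    assert(body != NULL);
    memcpy(&pfn, (const uint8_t *)body + idx * sizeof(pfn), sizeof(pfn));

    return pfn;
}
```

The caller is expected to check idx against the count first; the sketch
does no bounds checking of its own.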


* [PATCH v8 --for 4.6 COLO 04/25] libxc/migration: export read_record for common use
  2015-07-15  9:18 [PATCH v8 --for 4.6 COLO 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (2 preceding siblings ...)
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 03/25] libxc/migration: Specification update for DIRTY_BITMAP records Yang Hongyang
@ 2015-07-15  9:18 ` Yang Hongyang
  2015-07-15 17:14   ` Andrew Cooper
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 05/25] tools/libxl: add back channel support to write stream Yang Hongyang
                   ` (20 subsequent siblings)
  24 siblings, 1 reply; 46+ messages in thread
From: Yang Hongyang @ 2015-07-15  9:18 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

read_record() can be used by the primary to read the dirty bitmap
record sent by the secondary under COLO.
On the save side, the back channel fd must be passed to read_record()
instead of ctx->fd, so add an fd parameter to it.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
---
 tools/libxc/xc_sr_common.c  | 49 +++++++++++++++++++++++++++++++++++
 tools/libxc/xc_sr_common.h  | 14 ++++++++++
 tools/libxc/xc_sr_restore.c | 63 +--------------------------------------------
 3 files changed, 64 insertions(+), 62 deletions(-)

diff --git a/tools/libxc/xc_sr_common.c b/tools/libxc/xc_sr_common.c
index becc0f4..0ee607c 100644
--- a/tools/libxc/xc_sr_common.c
+++ b/tools/libxc/xc_sr_common.c
@@ -89,6 +89,55 @@ int write_split_record(struct xc_sr_context *ctx, struct xc_sr_record *rec,
     return -1;
 }
 
+int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec)
+{
+    xc_interface *xch = ctx->xch;
+    struct xc_sr_rhdr rhdr;
+    size_t datasz;
+
+    if ( read_exact(fd, &rhdr, sizeof(rhdr)) )
+    {
+        PERROR("Failed to read Record Header from stream");
+        return -1;
+    }
+    else if ( rhdr.length > REC_LENGTH_MAX )
+    {
+        ERROR("Record (0x%08x, %s) length %#x exceeds max (%#x)", rhdr.type,
+              rec_type_to_str(rhdr.type), rhdr.length, REC_LENGTH_MAX);
+        return -1;
+    }
+
+    datasz = ROUNDUP(rhdr.length, REC_ALIGN_ORDER);
+
+    if ( datasz )
+    {
+        rec->data = malloc(datasz);
+
+        if ( !rec->data )
+        {
+            ERROR("Unable to allocate %zu bytes for record data (0x%08x, %s)",
+                  datasz, rhdr.type, rec_type_to_str(rhdr.type));
+            return -1;
+        }
+
+        if ( read_exact(fd, rec->data, datasz) )
+        {
+            free(rec->data);
+            rec->data = NULL;
+            PERROR("Failed to read %zu bytes of data for record (0x%08x, %s)",
+                   datasz, rhdr.type, rec_type_to_str(rhdr.type));
+            return -1;
+        }
+    }
+    else
+        rec->data = NULL;
+
+    rec->type   = rhdr.type;
+    rec->length = rhdr.length;
+
+    return 0;
+};
+
 static void __attribute__((unused)) build_assertions(void)
 {
     XC_BUILD_BUG_ON(sizeof(struct xc_sr_ihdr) != 24);
diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index 28755ac..632160e 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -356,6 +356,20 @@ static inline int write_record(struct xc_sr_context *ctx,
 }
 
 /*
+ * Reads a record from the stream, and fills in the record structure.
+ *
+ * Returns 0 on success and non-0 on failure.
+ *
+ * On success, the records type and size shall be valid.
+ * - If size is 0, data shall be NULL.
+ * - If size is non-0, data shall be a buffer allocated by malloc() which must
+ *   be passed to free() by the caller.
+ *
+ * On failure, the contents of the record structure are undefined.
+ */
+int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec);
+
+/*
  * This would ideally be private in restore.c, but is needed by
  * x86_pv_localise_page() if we receive pagetables frames ahead of the
  * contents of the frames they point at.
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index 504463e..d53694b 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -69,67 +69,6 @@ static int read_headers(struct xc_sr_context *ctx)
 }
 
 /*
- * Reads a record from the stream, and fills in the record structure.
- *
- * Returns 0 on success and non-0 on failure.
- *
- * On success, the records type and size shall be valid.
- * - If size is 0, data shall be NULL.
- * - If size is non-0, data shall be a buffer allocated by malloc() which must
- *   be passed to free() by the caller.
- *
- * On failure, the contents of the record structure are undefined.
- */
-static int read_record(struct xc_sr_context *ctx, struct xc_sr_record *rec)
-{
-    xc_interface *xch = ctx->xch;
-    struct xc_sr_rhdr rhdr;
-    size_t datasz;
-
-    if ( read_exact(ctx->fd, &rhdr, sizeof(rhdr)) )
-    {
-        PERROR("Failed to read Record Header from stream");
-        return -1;
-    }
-    else if ( rhdr.length > REC_LENGTH_MAX )
-    {
-        ERROR("Record (0x%08x, %s) length %#x exceeds max (%#x)", rhdr.type,
-              rec_type_to_str(rhdr.type), rhdr.length, REC_LENGTH_MAX);
-        return -1;
-    }
-
-    datasz = ROUNDUP(rhdr.length, REC_ALIGN_ORDER);
-
-    if ( datasz )
-    {
-        rec->data = malloc(datasz);
-
-        if ( !rec->data )
-        {
-            ERROR("Unable to allocate %zu bytes for record data (0x%08x, %s)",
-                  datasz, rhdr.type, rec_type_to_str(rhdr.type));
-            return -1;
-        }
-
-        if ( read_exact(ctx->fd, rec->data, datasz) )
-        {
-            free(rec->data);
-            rec->data = NULL;
-            PERROR("Failed to read %zu bytes of data for record (0x%08x, %s)",
-                   datasz, rhdr.type, rec_type_to_str(rhdr.type));
-            return -1;
-        }
-    }
-    else
-        rec->data = NULL;
-
-    rec->type   = rhdr.type;
-    rec->length = rhdr.length;
-
-    return 0;
-};
-
-/*
  * Is a pfn populated?
  */
 static bool pfn_is_populated(const struct xc_sr_context *ctx, xen_pfn_t pfn)
@@ -644,7 +583,7 @@ static int restore(struct xc_sr_context *ctx)
 
     do
     {
-        rc = read_record(ctx, &rec);
+        rc = read_record(ctx, ctx->fd, &rec);
         if ( rc )
         {
             if ( ctx->restore.buffer_all_records )
-- 
1.9.1
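
To show why read_record() takes an explicit fd, here is a standalone C
sketch of the same read pattern: fixed header first, then a body whose
size is the record length rounded up to the alignment. It is not the
libxc code. All demo_* names are hypothetical, and DEMO_ALIGN_ORDER and
DEMO_LENGTH_MAX are assumed values chosen for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

#define DEMO_ALIGN_ORDER 3            /* assumed: bodies padded to 8 octets */
#define DEMO_LENGTH_MAX  (8U << 20)   /* assumed upper bound for the sketch */
#define DEMO_ROUNDUP(x, order) \
    (((size_t)(x) + ((size_t)1 << (order)) - 1) & \
     ~(((size_t)1 << (order)) - 1))

struct demo_rhdr   { uint32_t type, length; };
struct demo_record { uint32_t type, length; void *data; };

/* Read exactly len bytes, retrying on short reads and EINTR. */
static int demo_read_exact(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;

    while (len) {
        ssize_t n = read(fd, p, len);

        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;                /* error or premature EOF */
        p += n;
        len -= (size_t)n;
    }

    return 0;
}

/* Read one record from fd; on success the caller frees rec->data. */
static int demo_read_record(int fd, struct demo_record *rec)
{
    struct demo_rhdr rhdr;
    size_t datasz;

    assert(rec != NULL);
    rec->data = NULL;

    if (demo_read_exact(fd, &rhdr, sizeof(rhdr)) ||
        rhdr.length > DEMO_LENGTH_MAX)
        return -1;

    /* The padding is consumed together with the body. */
    datasz = DEMO_ROUNDUP(rhdr.length, DEMO_ALIGN_ORDER);
    if (datasz) {
        rec->data = malloc(datasz);
        if (!rec->data)
            return -1;
        if (demo_read_exact(fd, rec->data, datasz)) {
            free(rec->data);
            rec->data = NULL;
            return -1;
        }
    }

    rec->type = rhdr.type;
    rec->length = rhdr.length;

    return 0;
}
```

Because the fd is a parameter, the same function can serve the main
stream on the restore side and the back channel on the save side.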


* [PATCH v8 --for 4.6 COLO 05/25] tools/libxl: add back channel support to write stream
  2015-07-15  9:18 [PATCH v8 --for 4.6 COLO 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (3 preceding siblings ...)
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 04/25] libxc/migration: export read_record for common use Yang Hongyang
@ 2015-07-15  9:18 ` Yang Hongyang
  2015-07-15 17:25   ` Andrew Cooper
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 06/25] tools/libxl: write colo_context records into the stream Yang Hongyang
                   ` (19 subsequent siblings)
  24 siblings, 1 reply; 46+ messages in thread
From: Yang Hongyang @ 2015-07-15  9:18 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

Add back channel support to the write stream. A write stream marked
as a back channel is used by the secondary to send records back to
the primary.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 tools/libxl/libxl_dom_save.c     |  1 +
 tools/libxl/libxl_internal.h     |  1 +
 tools/libxl/libxl_stream_write.c | 16 ++++++++++++++++
 3 files changed, 18 insertions(+)

diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
index 9b7159f..25813ce 100644
--- a/tools/libxl/libxl_dom_save.c
+++ b/tools/libxl/libxl_dom_save.c
@@ -445,6 +445,7 @@ void libxl__domain_save(libxl__egc *egc, libxl__domain_save_state *dss)
     dss->sws.ao  = dss->ao;
     dss->sws.dss = dss;
     dss->sws.fd  = dss->fd;
+    dss->sws.back_channel = false;
     dss->sws.completion_callback = stream_done;
 
     libxl__stream_write_start(egc, &dss->sws);
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 9c81d8d..a83d6a5 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2989,6 +2989,7 @@ struct libxl__stream_write_state {
     libxl__ao *ao;
     libxl__domain_save_state *dss;
     int fd;
+    bool back_channel;
     void (*completion_callback)(libxl__egc *egc,
                                 libxl__stream_write_state *sws,
                                 int rc);
diff --git a/tools/libxl/libxl_stream_write.c b/tools/libxl/libxl_stream_write.c
index 16f667a..df55277 100644
--- a/tools/libxl/libxl_stream_write.c
+++ b/tools/libxl/libxl_stream_write.c
@@ -47,6 +47,13 @@
  *  - Toolstack record
  *  - if (hvm), Qemu record
  *  - Checkpoint end record
+ *
+ * For back channel stream:
+ * - libxl__stream_write_start()
+ *    - Set up the stream to running state
+ *
+ * - Add a new API to write the record. When the record is written
+ *   out, call stream->checkpoint_callback() to return.
  */
 
 /* Success/error/cleanup handling. */
@@ -178,6 +185,9 @@ void libxl__stream_write_start(libxl__egc *egc,
 
     stream->running = true;
 
+    if (stream->back_channel)
+        return;
+
     dc->ao        = ao;
     dc->readfd    = -1;
     dc->writewhat = "save/migration stream";
@@ -207,6 +217,7 @@ void libxl__stream_write_start_checkpoint(libxl__egc *egc,
 {
     assert(stream->running);
     assert(!stream->in_checkpoint);
+    assert(!stream->back_channel);
     stream->in_checkpoint = true;
 
     write_toolstack_record(egc, stream);
@@ -500,6 +511,11 @@ static void stream_done(libxl__egc *egc,
     assert(stream->running);
     stream->running = false;
 
+    if (stream->back_channel) {
+        stream->completion_callback(egc, stream, stream->rc);
+        return;
+    }
+
     if (stream->emu_carefd)
         libxl__carefd_close(stream->emu_carefd);
     free(stream->emu_body);
-- 
1.9.1


* [PATCH v8 --for 4.6 COLO 06/25] tools/libxl: write colo_context records into the stream
  2015-07-15  9:18 [PATCH v8 --for 4.6 COLO 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (4 preceding siblings ...)
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 05/25] tools/libxl: add back channel support to write stream Yang Hongyang
@ 2015-07-15  9:18 ` Yang Hongyang
  2015-07-15 17:35   ` Andrew Cooper
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 07/25] tools/libxl: add back channel support to read stream Yang Hongyang
                   ` (18 subsequent siblings)
  24 siblings, 1 reply; 46+ messages in thread
From: Yang Hongyang @ 2015-07-15  9:18 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

Write colo_context records into the stream. Both primary and
secondary use this to send COLO context.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl_internal.h     |  5 +++
 tools/libxl/libxl_stream_write.c | 87 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 92 insertions(+)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index a83d6a5..2634836 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3000,6 +3000,7 @@ struct libxl__stream_write_state {
     int rc;
     bool running;
     bool in_checkpoint;
+    bool in_colo_context;
     libxl__save_helper_state shs;
 
     /* Main stream-writing data. */
@@ -3019,6 +3020,10 @@ _hidden void libxl__stream_write_start(libxl__egc *egc,
 _hidden void
 libxl__stream_write_start_checkpoint(libxl__egc *egc,
                                      libxl__stream_write_state *stream);
+_hidden void
+libxl__stream_write_colo_context(libxl__egc *egc,
+                                 libxl__stream_write_state *stream,
+                                 libxl_sr_colo_context *colo_context);
 _hidden void libxl__stream_write_abort(libxl__egc *egc,
                                        libxl__stream_write_state *stream,
                                        int rc);
diff --git a/tools/libxl/libxl_stream_write.c b/tools/libxl/libxl_stream_write.c
index df55277..e7a32c4 100644
--- a/tools/libxl/libxl_stream_write.c
+++ b/tools/libxl/libxl_stream_write.c
@@ -96,6 +96,16 @@ static void write_checkpoint_end_record(libxl__egc *egc,
 static void checkpoint_end_record_done(libxl__egc *egc,
                                        libxl__stream_write_state *stream);
 
+/* COLO context */
+static void write_colo_context(libxl__egc *egc,
+                               libxl__stream_write_state *stream,
+                               libxl_sr_colo_context *colo_context);
+static void write_colo_context_done(libxl__egc *egc,
+                                    libxl__datacopier_state *dc,
+                                    int rc, int onwrite, int errnoval);
+static void colo_context_done(libxl__egc *egc,
+                              libxl__stream_write_state *stream, int rc);
+
 /*----- Helpers -----*/
 
 static void write_done(libxl__egc *egc,
@@ -500,6 +510,11 @@ static void stream_complete(libxl__egc *egc,
         return;
     }
 
+    if (stream->in_colo_context) {
+        colo_context_done(egc, stream, rc);
+        return;
+    }
+
     if (!stream->rc)
         stream->rc = rc;
     stream_done(egc, stream);
@@ -555,6 +570,78 @@ static void check_all_finished(libxl__egc *egc,
     stream->completion_callback(egc, stream, stream->rc);
 }
 
+/*----- COLO context -----*/
+void libxl__stream_write_colo_context(libxl__egc *egc,
+                                      libxl__stream_write_state *stream,
+                                      libxl_sr_colo_context *colo_context)
+{
+    assert(stream->running);
+    assert(!stream->in_checkpoint);
+    assert(!stream->in_colo_context);
+    stream->in_colo_context = true;
+
+    write_colo_context(egc, stream, colo_context);
+}
+
+static void write_colo_context(libxl__egc *egc,
+                               libxl__stream_write_state *stream,
+                               libxl_sr_colo_context *colo_context)
+{
+    static const uint8_t zero_padding[1U << REC_ALIGN_ORDER] = { 0 };
+    libxl__datacopier_state *dc = &stream->dc;
+    STATE_AO_GC(stream->ao);
+    struct libxl__sr_rec_hdr rec = { REC_TYPE_COLO_CONTEXT, 0 };
+    int rc = 0;
+    uint32_t padding_len;
+
+    dc->copywhat = "colo context record";
+    dc->writewhat = "save/migration stream";
+    dc->callback = write_colo_context_done;
+
+    rc = libxl__datacopier_start(dc);
+    if (rc)
+        goto err;
+
+    rec.length = sizeof(*colo_context);
+
+    libxl__datacopier_prefixdata(egc, dc, &rec, sizeof(rec));
+    libxl__datacopier_prefixdata(egc, dc, colo_context, rec.length);
+
+    padding_len = ROUNDUP(rec.length, REC_ALIGN_ORDER) - rec.length;
+    if (padding_len)
+        libxl__datacopier_prefixdata(egc, dc, zero_padding, padding_len);
+
+    return;
+
+ err:
+    assert(rc);
+    stream_complete(egc, stream, rc);
+}
+
+static void write_colo_context_done(libxl__egc *egc,
+                                    libxl__datacopier_state *dc,
+                                    int rc, int onwrite, int errnoval)
+{
+    libxl__stream_write_state *stream = CONTAINER_OF(dc, *stream, dc);
+    STATE_AO_GC(stream->ao);
+
+    if (rc || onwrite || errnoval) {
+        stream_complete(egc, stream, rc ?: ERROR_FAIL);
+        return;
+    }
+
+    colo_context_done(egc, stream, rc);
+    return;
+}
+
+static void colo_context_done(libxl__egc *egc,
+                              libxl__stream_write_state *stream, int rc)
+{
+    assert(stream->in_colo_context);
+    stream->in_colo_context = false;
+    stream->checkpoint_callback(egc, stream, rc);
+}
+
 /*
  * Local variables:
  * mode: C
-- 
1.9.1

* [PATCH v8 --for 4.6 COLO 07/25] tools/libxl: add back channel support to read stream
  2015-07-15  9:18 [PATCH v8 --for 4.6 COLO 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (5 preceding siblings ...)
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 06/25] tools/libxl: write colo_context records into the stream Yang Hongyang
@ 2015-07-15  9:18 ` Yang Hongyang
  2015-07-15 17:38   ` Andrew Cooper
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 08/25] tools/libxl: handle colo_context records in a libxl migration v2 " Yang Hongyang
                   ` (17 subsequent siblings)
  24 siblings, 1 reply; 46+ messages in thread
From: Yang Hongyang @ 2015-07-15  9:18 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

This is used by the primary to read records sent by the secondary.
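
As a rough illustration of the intent (a sketch, not libxl code): a
back-channel read stream is only marked running at start, and a record
read is armed later, on request. All sketch_* names are hypothetical;
only the `running` and `back_channel` flags come from the patch, and
the "continue" entry point is assumed from the comment added below.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for libxl__stream_read_state; only `running`
 * and `back_channel` mirror fields named in the patch. */
struct sketch_read_stream {
    bool running;
    bool back_channel;
    int reads_armed;   /* how many reads have been set up so far */
};

/* Mirrors the early return added to libxl__stream_read_start(). */
static void sketch_stream_start(struct sketch_read_stream *s)
{
    s->running = true;
    if (s->back_channel)
        return;            /* wait until the caller asks for a record */
    s->reads_armed++;      /* normal stream: start reading at once */
}

/* Assumed counterpart of libxl__stream_read_continue(). */
static void sketch_stream_continue(struct sketch_read_stream *s)
{
    assert(s->running);
    s->reads_armed++;      /* set up reading the next record */
}
```

With back_channel set, sketch_stream_start() arms no read; the first
read happens only when sketch_stream_continue() is called.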

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 tools/libxl/libxl_create.c      |  1 +
 tools/libxl/libxl_internal.h    |  1 +
 tools/libxl/libxl_stream_read.c | 17 +++++++++++++++++
 3 files changed, 19 insertions(+)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 1d4b13b..1af7103 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -978,6 +978,7 @@ static void domcreate_bootloader_done(libxl__egc *egc,
     dcs->srs.dcs = dcs;
     dcs->srs.fd = restore_fd;
     dcs->srs.legacy = (dcs->restore_params.stream_version == 1);
+    dcs->srs.back_channel = false;
     dcs->srs.completion_callback = domcreate_stream_done;
 
     libxl__stream_read_start(egc, &dcs->srs);
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 2634836..05cee04 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3358,6 +3358,7 @@ struct libxl__stream_read_state {
     libxl__domain_create_state *dcs;
     int fd;
     bool legacy;
+    bool back_channel;
     void (*completion_callback)(libxl__egc *egc,
                                 libxl__stream_read_state *srs,
                                 int rc);
diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
index 2d17403..b924f05 100644
--- a/tools/libxl/libxl_stream_read.c
+++ b/tools/libxl/libxl_stream_read.c
@@ -104,6 +104,15 @@
  * Depending on the contents of the stream, there are likely to be several
  * parallel tasks being managed.  check_all_finished() is used to join all
  * tasks in both success and error cases.
+ *
+ * For back channel stream:
+ * - libxl__stream_read_start()
+ *    - Set up the stream to running state
+ *
+ * - libxl__stream_read_continue()
+ *    - Set up reading the next record from a started stream.
+ *      Add code to process_record() to handle the record,
+ *      then call stream->checkpoint_callback() to return.
  */
 
 /* Success/error/cleanup handling. */
@@ -200,6 +209,9 @@ void libxl__stream_read_start(libxl__egc *egc,
     stream->running = true;
     stream->phase   = SRS_PHASE_NORMAL;
 
+    if (stream->back_channel)
+        return;
+
     if (stream->legacy) {
         /* Convert the legacy stream. */
         libxl__conversion_helper_state *chs = &stream->chs;
@@ -700,6 +712,11 @@ static void stream_done(libxl__egc *egc,
     assert(!stream->in_checkpoint);
     stream->running = false;
 
+    if (stream->back_channel) {
+        stream->completion_callback(egc, stream, stream->rc);
+        return;
+    }
+
     if (stream->incoming_record)
         free_record(stream->incoming_record);
 
-- 
1.9.1

* [PATCH v8 --for 4.6 COLO 08/25] tools/libxl: handle colo_context records in a libxl migration v2 read stream
  2015-07-15  9:18 [PATCH v8 --for 4.6 COLO 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (6 preceding siblings ...)
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 07/25] tools/libxl: add back channel support to read stream Yang Hongyang
@ 2015-07-15  9:18 ` Yang Hongyang
  2015-07-15 17:44   ` Andrew Cooper
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 09/25] tools/libx{l, c}: introduce should_checkpoint callback Yang Hongyang
                   ` (16 subsequent siblings)
  24 siblings, 1 reply; 46+ messages in thread
From: Yang Hongyang @ 2015-07-15  9:18 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

Read a colo_context and call stream->checkpoint_callback to handle it.
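
For readers unfamiliar with the record framing, the following is a
minimal sketch of writing and reading one such record in memory. It
assumes a header of two 32-bit fields (type, length) and padding of
the body to 8 bytes; the alignment order, the type value and every
sketch_* name are assumptions made for illustration, not values taken
from libxl_sr_stream_format.h. The `expecting` flag plays the role of
stream->in_colo_context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ALIGN_ORDER      3u       /* assumed: pad records to 8 bytes */
#define SKETCH_REC_COLO_CONTEXT 0x1000u  /* hypothetical type value */

struct sketch_rec_hdr { uint32_t type; uint32_t length; };
struct sketch_colo_context { uint32_t id; };

/* Round n up to a multiple of 2^order. */
static size_t sketch_roundup(size_t n, unsigned order)
{
    size_t mask = ((size_t)1 << order) - 1;
    return (n + mask) & ~mask;
}

/* Write header + body + zero padding; returns bytes written, 0 if no room. */
static size_t sketch_write_colo_context(uint8_t *buf, size_t cap, uint32_t id)
{
    struct sketch_rec_hdr hdr = { SKETCH_REC_COLO_CONTEXT,
                                  sizeof(struct sketch_colo_context) };
    struct sketch_colo_context body = { id };
    size_t total = sizeof(hdr) + sketch_roundup(hdr.length, SKETCH_ALIGN_ORDER);

    if (cap < total)
        return 0;
    memset(buf, 0, total);                  /* padding bytes stay zero */
    memcpy(buf, &hdr, sizeof(hdr));
    memcpy(buf + sizeof(hdr), &body, sizeof(body));
    return total;
}

/* Parse one record. A COLO_CONTEXT record is accepted only while the
 * caller expects one; returns bytes consumed, 0 on any error. */
static size_t sketch_read_colo_context(const uint8_t *buf, size_t len,
                                       int expecting, uint32_t *id)
{
    struct sketch_rec_hdr hdr;
    struct sketch_colo_context body;
    size_t total;

    if (len < sizeof(hdr))
        return 0;
    memcpy(&hdr, buf, sizeof(hdr));
    if (hdr.type != SKETCH_REC_COLO_CONTEXT || !expecting)
        return 0;
    if (hdr.length != sizeof(body))
        return 0;
    total = sizeof(hdr) + sketch_roundup(hdr.length, SKETCH_ALIGN_ORDER);
    if (len < total)
        return 0;
    memcpy(&body, buf + sizeof(hdr), sizeof(body));
    *id = body.id;
    return total;
}
```

A 4-byte body therefore occupies 16 bytes on the wire in this sketch:
8 bytes of header, 4 of body and 4 of zero padding.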

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 tools/libxl/libxl_internal.h    |  3 +++
 tools/libxl/libxl_stream_read.c | 51 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 54 insertions(+)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 05cee04..1be2a4a 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3369,6 +3369,7 @@ struct libxl__stream_read_state {
     int rc;
     bool running;
     bool in_checkpoint;
+    bool in_colo_context;
     libxl__save_helper_state shs;
     libxl__conversion_helper_state chs;
 
@@ -3396,6 +3397,8 @@ _hidden void libxl__stream_read_start(libxl__egc *egc,
                                       libxl__stream_read_state *stream);
 _hidden void libxl__stream_read_start_checkpoint(libxl__egc *egc,
                                                  libxl__stream_read_state *stream);
+_hidden void libxl__stream_read_colo_context(libxl__egc *egc,
+                                             libxl__stream_read_state *stream);
 _hidden void libxl__stream_read_abort(libxl__egc *egc,
                                       libxl__stream_read_state *stream, int rc);
 static inline bool
diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
index b924f05..ab47251 100644
--- a/tools/libxl/libxl_stream_read.c
+++ b/tools/libxl/libxl_stream_read.c
@@ -152,6 +152,13 @@ static void write_emulator_done(libxl__egc *egc,
                                 libxl__datacopier_state *dc,
                                 int rc, int onwrite, int errnoval);
 
+/* Handlers for colo context mini-loop */
+static void handle_colo_context(libxl__egc *egc,
+                                libxl__stream_read_state *stream,
+                                libxl__sr_record_buf *rec);
+static void colo_context_done(libxl__egc *egc,
+                              libxl__stream_read_state *stream, int rc);
+
 /*----- Helpers -----*/
 
 /* Helper to set up reading some data from the stream. */
@@ -569,6 +576,15 @@ static bool process_record(libxl__egc *egc,
         checkpoint_done(egc, stream, 0);
         break;
 
+    case REC_TYPE_COLO_CONTEXT:
+        if (!stream->in_colo_context) {
+            LOG(ERROR, "Unexpected COLO_CONTEXT record in stream");
+            rc = ERROR_FAIL;
+            goto err;
+        }
+        handle_colo_context(egc, stream, rec);
+        break;
+
     default:
         LOG(ERROR, "Unrecognised record 0x%08x", rec->hdr.type);
         rc = ERROR_FAIL;
@@ -678,6 +694,11 @@ static void stream_complete(libxl__egc *egc,
         return;
     }
 
+    if (stream->in_colo_context) {
+        colo_context_done(egc, stream, rc);
+        return;
+    }
+
     if (!stream->rc)
         stream->rc = rc;
     stream_done(egc, stream);
@@ -794,6 +815,36 @@ static void check_all_finished(libxl__egc *egc,
     stream->completion_callback(egc, stream, stream->rc);
 }
 
+/*----- COLO context handlers -----*/
+
+void libxl__stream_read_colo_context(libxl__egc *egc,
+                                     libxl__stream_read_state *stream)
+{
+    assert(stream->running);
+    assert(!stream->in_checkpoint);
+    assert(!stream->in_colo_context);
+    stream->in_colo_context = true;
+
+    setup_read_record(egc, stream);
+}
+
+static void handle_colo_context(libxl__egc *egc,
+                                libxl__stream_read_state *stream,
+                                libxl__sr_record_buf *rec)
+{
+    libxl_sr_colo_context *colo_context = rec->body;
+
+    colo_context_done(egc, stream, colo_context->id);
+}
+
+static void colo_context_done(libxl__egc *egc,
+                              libxl__stream_read_state *stream, int rc)
+{
+    assert(stream->in_colo_context);
+    stream->in_colo_context = false;
+    stream->checkpoint_callback(egc, stream, rc);
+}
+
 /*
  * Local variables:
  * mode: C
-- 
1.9.1

* [PATCH v8 --for 4.6 COLO 09/25] tools/libx{l, c}: introduce should_checkpoint callback
  2015-07-15  9:18 [PATCH v8 --for 4.6 COLO 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (7 preceding siblings ...)
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 08/25] tools/libxl: handle colo_context records in a libxl migration v2 " Yang Hongyang
@ 2015-07-15  9:18 ` Yang Hongyang
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 10/25] tools/libx{l, c}: add postcopy/suspend callback to restore side Yang Hongyang
                   ` (15 subsequent siblings)
  24 siblings, 0 replies; 46+ messages in thread
From: Yang Hongyang @ 2015-07-15  9:18 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

Under COLO, checkpoints are taken on demand. If this callback
returns 1, we take another checkpoint; 0 indicates an unexpected
error.
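
A minimal sketch of how a caller might drive the two callbacks,
assuming should_checkpoint is consulted once after each successful
checkpoint. The struct and function names are hypothetical and only
model the return-value contract (1 continues, anything else stops);
this is not the libxc save/restore loop.

```c
#include <assert.h>

/* Hypothetical subset of the callback struct: the two hooks plus the
 * opaque pointer passed to each of them. */
struct sketch_callbacks {
    int (*checkpoint)(void *data);
    int (*should_checkpoint)(void *data);
    void *data;
};

/* Take checkpoints until either hook stops returning 1.
 * Returns the number of checkpoints taken. */
static int sketch_checkpoint_loop(const struct sketch_callbacks *cb)
{
    int taken = 0;

    for (;;) {
        if (cb->checkpoint(cb->data) != 1)
            break;
        taken++;
        /* Called after the checkpoint callback, as in the patch. */
        if (cb->should_checkpoint(cb->data) != 1)
            break;
    }
    return taken;
}

/* Demo hooks: always checkpoint successfully, and ask for another
 * round while the counter behind `data` is still positive. */
static int sketch_demo_checkpoint(void *data)
{
    (void)data;
    return 1;
}

static int sketch_demo_should_checkpoint(void *data)
{
    int *remaining = data;
    return (*remaining)-- > 0;
}
```

With the demo hooks and a counter of 2, the loop takes three
checkpoints: the initial one plus two requested rounds.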

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 tools/libxc/include/xenguest.h     | 18 ++++++++++++++++++
 tools/libxl/libxl_save_msgs_gen.pl |  7 ++++---
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 4056955..fa06d9b 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -63,6 +63,15 @@ struct save_callbacks {
      * 1: take another checkpoint */
     int (*checkpoint)(void* data);
 
+    /*
+     * Called after the checkpoint callback.
+     *
+     * returns:
+     * 0: terminate checkpointing gracefully
+     * 1: take another checkpoint
+     */
+    int (*should_checkpoint)(void* data);
+
     /* Enable qemu-dm logging dirty pages to xen */
     int (*switch_qemu_logdirty)(int domid, unsigned enable, void *data); /* HVM only */
 
@@ -112,6 +121,15 @@ struct restore_callbacks {
 #define XGR_CHECKPOINT_FAILOVER 2 /* Failover and resume VM */
     int (*checkpoint)(void* data);
 
+    /*
+     * Called after the checkpoint callback.
+     *
+     * returns:
+     * 0: terminate checkpointing gracefully
+     * 1: take another checkpoint
+     */
+    int (*should_checkpoint)(void* data);
+
     /* to be provided as the last argument to each callback function */
     void* data;
 };
diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
index d6d2967..9107a86 100755
--- a/tools/libxl/libxl_save_msgs_gen.pl
+++ b/tools/libxl/libxl_save_msgs_gen.pl
@@ -26,11 +26,12 @@ our @msgs = (
     [  3, 'scxA',   "suspend", [] ],
     [  4, 'scxA',   "postcopy", [] ],
     [  5, 'srcxA',  "checkpoint", [] ],
-    [  6, 'scxA',   "switch_qemu_logdirty",  [qw(int domid
+    [  6, 'srcxA',  "should_checkpoint", [] ],
+    [  7, 'scxA',   "switch_qemu_logdirty",  [qw(int domid
                                               unsigned enable)] ],
-    [  7, 'r',      "restore_results",       ['unsigned long', 'store_mfn',
+    [  8, 'r',      "restore_results",       ['unsigned long', 'store_mfn',
                                               'unsigned long', 'console_mfn'] ],
-    [  8, 'srW',    "complete",              [qw(int retval
+    [  9, 'srW',    "complete",              [qw(int retval
                                                  int errnoval)] ],
 );
 
-- 
1.9.1

* [PATCH v8 --for 4.6 COLO 10/25] tools/libx{l, c}: add postcopy/suspend callback to restore side
  2015-07-15  9:18 [PATCH v8 --for 4.6 COLO 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (8 preceding siblings ...)
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 09/25] tools/libx{l, c}: introduce should_checkpoint callback Yang Hongyang
@ 2015-07-15  9:18 ` Yang Hongyang
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 11/25] secondary vm suspend/resume/checkpoint code Yang Hongyang
                   ` (14 subsequent siblings)
  24 siblings, 0 replies; 46+ messages in thread
From: Yang Hongyang @ 2015-07-15  9:18 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

Under COLO the secondary (restore side) is running, so the restore
side also needs postcopy/suspend callbacks.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 tools/libxc/include/xenguest.h     | 10 ++++++++++
 tools/libxl/libxl_save_msgs_gen.pl |  4 ++--
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index fa06d9b..1e7e1bb 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -114,6 +114,16 @@ struct restore_callbacks {
     int (*toolstack_restore)(uint32_t domid, const uint8_t *buf,
             uint32_t size, void* data);
 
+    /* Called after a new checkpoint to suspend the guest.
+     */
+    int (*suspend)(void* data);
+
+    /* Called after the secondary vm is ready to resume.
+     * Callback function resumes the guest & the device model,
+     * returns to xc_domain_restore.
+     */
+    int (*postcopy)(void* data);
+
     /* A checkpoint record has been found in the stream.
      * returns: */
 #define XGR_CHECKPOINT_ERROR    0 /* Terminate processing */
diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
index 9107a86..7c9859b 100755
--- a/tools/libxl/libxl_save_msgs_gen.pl
+++ b/tools/libxl/libxl_save_msgs_gen.pl
@@ -23,8 +23,8 @@ our @msgs = (
                                                  STRING doing_what),
                                                 'unsigned long', 'done',
                                                 'unsigned long', 'total'] ],
-    [  3, 'scxA',   "suspend", [] ],
-    [  4, 'scxA',   "postcopy", [] ],
+    [  3, 'srcxA',  "suspend", [] ],
+    [  4, 'srcxA',  "postcopy", [] ],
     [  5, 'srcxA',  "checkpoint", [] ],
     [  6, 'srcxA',  "should_checkpoint", [] ],
     [  7, 'scxA',   "switch_qemu_logdirty",  [qw(int domid
-- 
1.9.1

* [PATCH v8 --for 4.6 COLO 11/25] secondary vm suspend/resume/checkpoint code
  2015-07-15  9:18 [PATCH v8 --for 4.6 COLO 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (9 preceding siblings ...)
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 10/25] tools/libx{l, c}: add postcopy/suspend callback to restore side Yang Hongyang
@ 2015-07-15  9:18 ` Yang Hongyang
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 12/25] primary " Yang Hongyang
                   ` (13 subsequent siblings)
  24 siblings, 0 replies; 46+ messages in thread
From: Yang Hongyang @ 2015-07-15  9:18 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

From: Wen Congyang <wency@cn.fujitsu.com>

The secondary vm runs in colo mode, so we repeat the following
steps for every checkpoint:
1. Resume secondary vm
   a. Send LIBXL_COLO_SVM_READY to master.
   b. If it is not the first resume, call libxl__checkpoint_devices_preresume().
   c. If it is the first resume (resume right after live migration),
      - call libxl__xc_domain_restore_done() to build the secondary vm.
      - enable secondary vm's logdirty.
      - call libxl__domain_resume() to resume secondary vm.
      - call libxl__checkpoint_devices_setup() to set up checkpoint devices.
   d. Send LIBXL_COLO_SVM_RESUMED to master.
2. Wait for a new checkpoint
   a. Call libxl__checkpoint_devices_commit().
   b. Read LIBXL_COLO_NEW_CHECKPOINT from master.
3. Suspend secondary vm
   a. Suspend secondary vm.
   b. Call libxl__checkpoint_devices_postsuspend().
   c. Send LIBXL_COLO_SVM_SUSPENDED to master.
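
The cycle above can be modelled as a small state machine. This is a
toy sketch only: the sk_* names are hypothetical mirrors of the
LIBXL_COLO_* messages and status values, the device and domain calls
are reduced to comments, and the step order follows the list in this
commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirrors of the LIBXL_COLO_* messages and status values. */
enum sk_msg { SK_SVM_READY, SK_SVM_RESUMED, SK_NEW_CHECKPOINT, SK_SVM_SUSPENDED };
enum sk_status { SK_SETUPED, SK_SUSPENDED, SK_RESUMED };

struct sk_svm {
    enum sk_status status;
    int first_resume;        /* nonzero until the first resume is done */
    enum sk_msg sent[8];     /* trace of messages sent to the primary */
    size_t nsent;
};

static void sk_send(struct sk_svm *s, enum sk_msg m)
{
    assert(s->nsent < sizeof(s->sent) / sizeof(s->sent[0]));
    s->sent[s->nsent++] = m;
}

/* Step 1: resume the secondary vm. */
static void sk_resume(struct sk_svm *s)
{
    sk_send(s, SK_SVM_READY);
    if (s->first_resume) {
        /* build the vm, enable logdirty, resume, set up devices */
        s->first_resume = 0;
    } else {
        /* the checkpoint devices' preresume hook would run here */
    }
    s->status = SK_RESUMED;
    sk_send(s, SK_SVM_RESUMED);
}

/* Step 2: commit device state, then wait for the primary's request. */
static int sk_wait_checkpoint(const struct sk_svm *s, enum sk_msg received)
{
    (void)s;
    return received == SK_NEW_CHECKPOINT;
}

/* Step 3: suspend the secondary vm and report it. */
static void sk_suspend(struct sk_svm *s)
{
    /* suspend the guest, then run the devices' postsuspend hook */
    s->status = SK_SUSPENDED;
    sk_send(s, SK_SVM_SUSPENDED);
}

/* One full round; returns 0 if no new checkpoint was requested. */
static int sk_cycle(struct sk_svm *s, enum sk_msg received)
{
    sk_resume(s);
    if (!sk_wait_checkpoint(s, received))
        return 0;
    sk_suspend(s);
    return 1;
}
```

Starting from SK_SETUPED with first_resume set, one successful round
sends READY, RESUMED and SUSPENDED in that order and leaves the model
in SK_SUSPENDED.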

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 tools/libxl/Makefile             |   1 +
 tools/libxl/libxl_colo.h         |  27 ++
 tools/libxl/libxl_colo_restore.c | 991 +++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_create.c       | 111 ++++-
 tools/libxl/libxl_internal.h     |  19 +
 tools/libxl/libxl_save_callout.c |   7 +-
 6 files changed, 1154 insertions(+), 2 deletions(-)
 create mode 100644 tools/libxl/libxl_colo.h
 create mode 100644 tools/libxl/libxl_colo_restore.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 3cb3ae9..97b3753 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -63,6 +63,7 @@ LIBXL_OBJS-y += libxl_no_convert_callout.o
 endif
 
 LIBXL_OBJS-y += libxl_remus.o libxl_checkpoint_device.o libxl_remus_disk_drbd.o
+LIBXL_OBJS-y += libxl_colo_restore.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_psr.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o libxl_libfdt_compat.o
diff --git a/tools/libxl/libxl_colo.h b/tools/libxl/libxl_colo.h
new file mode 100644
index 0000000..54dc835
--- /dev/null
+++ b/tools/libxl/libxl_colo.h
@@ -0,0 +1,27 @@
+/*
+ * Copyright (C) 2014 FUJITSU LIMITED
+ * Author: Wen Congyang <wency@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#ifndef LIBXL_COLO_H
+#define LIBXL_COLO_H
+
+extern void libxl__colo_restore_done(libxl__egc *egc, void *dcs_void,
+                                     int ret, int retval, int errnoval);
+extern void libxl__colo_restore_setup(libxl__egc *egc,
+                                      libxl__colo_restore_state *crs);
+extern void libxl__colo_restore_teardown(libxl__egc *egc,
+                                         libxl__colo_restore_state *crs,
+                                         int rc);
+
+#endif
diff --git a/tools/libxl/libxl_colo_restore.c b/tools/libxl/libxl_colo_restore.c
new file mode 100644
index 0000000..5cda0b2
--- /dev/null
+++ b/tools/libxl/libxl_colo_restore.c
@@ -0,0 +1,991 @@
+/*
+ * Copyright (C) 2014 FUJITSU LIMITED
+ * Author: Wen Congyang <wency@cn.fujitsu.com>
+ *         Yang Hongyang <yanghy@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+#include "libxl_colo.h"
+#include "libxl_sr_stream_format.h"
+
+enum {
+    LIBXL_COLO_SETUPED,
+    LIBXL_COLO_SUSPENDED,
+    LIBXL_COLO_RESUMED,
+};
+
+typedef struct libxl__colo_restore_checkpoint_state libxl__colo_restore_checkpoint_state;
+struct libxl__colo_restore_checkpoint_state {
+    libxl__domain_suspend_state dsps;
+    libxl__logdirty_switch lds;
+    libxl__colo_restore_state *crs;
+    libxl__stream_write_state sws;
+    int status;
+    bool preresume;
+    /* used for teardown */
+    int teardown_devices;
+    int saved_rc;
+
+    void (*callback)(libxl__egc *,
+                     libxl__colo_restore_checkpoint_state *,
+                     int);
+};
+
+
+static void libxl__colo_restore_domain_resume_callback(void *data);
+static void libxl__colo_restore_domain_checkpoint_callback(void *data);
+static void libxl__colo_restore_domain_should_checkpoint_callback(void *data);
+static void libxl__colo_restore_domain_suspend_callback(void *data);
+
+static const libxl__checkpoint_device_instance_ops *colo_restore_ops[] = {
+    NULL,
+};
+
+/* ===================== colo: common functions ===================== */
+static void colo_enable_logdirty(libxl__colo_restore_state *crs, libxl__egc *egc)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+    libxl__colo_restore_checkpoint_state *crcs = crs->crcs;
+
+    /* Convenience aliases */
+    const uint32_t domid = crs->domid;
+    libxl__logdirty_switch *const lds = &crcs->lds;
+
+    STATE_AO_GC(crs->ao);
+
+    /* we need to know which pages are dirty to restore the guest */
+    if (xc_shadow_control(CTX->xch, domid,
+                          XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY,
+                          NULL, 0, NULL, 0, NULL) < 0) {
+        LOG(ERROR, "cannot enable secondary vm's logdirty");
+        lds->callback(egc, lds, ERROR_FAIL);
+        return;
+    }
+
+    if (crs->hvm) {
+        libxl__domain_common_switch_qemu_logdirty(egc, domid, 1, lds);
+        return;
+    }
+
+    lds->callback(egc, lds, 0);
+}
+
+static void colo_disable_logdirty(libxl__colo_restore_state *crs,
+                                  libxl__egc *egc)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+    libxl__colo_restore_checkpoint_state *crcs = crs->crcs;
+
+    /* Convenience aliases */
+    const uint32_t domid = crs->domid;
+    libxl__logdirty_switch *const lds = &crcs->lds;
+
+    STATE_AO_GC(crs->ao);
+
+    /* log-dirty tracking is no longer needed once we fail over */
+    if (xc_shadow_control(CTX->xch, domid, XEN_DOMCTL_SHADOW_OP_OFF,
+                          NULL, 0, NULL, 0, NULL) < 0)
+        LOG(WARN, "cannot disable secondary vm's logdirty");
+
+    if (crs->hvm) {
+        libxl__domain_common_switch_qemu_logdirty(egc, domid, 0, lds);
+        return;
+    }
+
+    lds->callback(egc, lds, 0);
+}
+
+static void colo_resume_vm(libxl__egc *egc,
+                           libxl__colo_restore_checkpoint_state *crcs,
+                           int restore_device_model)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crcs->crs, *dcs, crs);
+    int rc;
+
+    /* Convenience aliases */
+    libxl__colo_restore_state *const crs = crcs->crs;
+
+    STATE_AO_GC(crs->ao);
+
+    if (!crs->saved_cb) {
+        /* TODO: sync mmu for hvm? */
+        if (restore_device_model) {
+            rc = libxl__domain_restore_device_model(gc, crs->domid);
+            if (rc) {
+                LOG(ERROR, "cannot restore device model for secondary vm");
+                crcs->callback(egc, crcs, rc);
+                return;
+            }
+        }
+        rc = libxl__domain_resume(gc, crs->domid, 0);
+        if (rc)
+            LOG(ERROR, "cannot resume secondary vm");
+
+        crcs->callback(egc, crcs, rc);
+        return;
+    }
+
+    /*
+     * TODO: get store mfn and console mfn
+     *  We should call the callback restore_results in
+     *  xc_domain_restore() before resuming the guest.
+     */
+    libxl__xc_domain_restore_done(egc, dcs, 0, 0, 0);
+
+    return;
+}
+
+static int init_device_subkind(libxl__checkpoint_devices_state *cds)
+{
+    /* init device subkind-specific state in the libxl ctx */
+    int rc;
+    STATE_AO_GC(cds->ao);
+
+    rc = 0;
+    return rc;
+}
+
+static void cleanup_device_subkind(libxl__checkpoint_devices_state *cds)
+{
+    /* cleanup device subkind-specific state in the libxl ctx */
+    STATE_AO_GC(cds->ao);
+}
+
+
+/* ================ colo: setup restore environment ================ */
+static void libxl__colo_domain_create_cb(libxl__egc *egc,
+                                         libxl__domain_create_state *dcs,
+                                         int rc, uint32_t domid);
+
+static int init_dsps(libxl__domain_suspend_state *dsps)
+{
+    int rc = ERROR_FAIL;
+    libxl_domain_type type;
+
+    STATE_AO_GC(dsps->ao);
+
+    type = libxl__domain_type(gc, dsps->domid);
+    if (type == LIBXL_DOMAIN_TYPE_INVALID)
+        goto out;
+
+    libxl__xswait_init(&dsps->pvcontrol);
+    libxl__ev_evtchn_init(&dsps->guest_evtchn);
+    libxl__ev_xswatch_init(&dsps->guest_watch);
+    libxl__ev_time_init(&dsps->guest_timeout);
+
+    if (type == LIBXL_DOMAIN_TYPE_HVM)
+        dsps->hvm = 1;
+    else
+        dsps->hvm = 0;
+
+    dsps->guest_evtchn.port = -1;
+    dsps->guest_evtchn_lockfd = -1;
+    dsps->guest_responded = 0;
+    dsps->dm_savefile = libxl__device_model_savefile(gc, dsps->domid);
+
+    /* Secondary vm is not created, so we cannot get evtchn port */
+
+    rc = 0;
+
+out:
+    return rc;
+}
+
+void libxl__colo_restore_setup(libxl__egc *egc,
+                               libxl__colo_restore_state *crs)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+    libxl__colo_restore_checkpoint_state *crcs;
+    int rc = ERROR_FAIL;
+
+    /* Convenience aliases */
+    libxl__srm_restore_autogen_callbacks *const callbacks =
+        &dcs->srs.shs.callbacks.restore.a;
+    const int domid = crs->domid;
+
+    STATE_AO_GC(crs->ao);
+
+    GCNEW(crcs);
+    crs->crcs = crcs;
+    crcs->crs = crs;
+
+    /* setup dsps */
+    crcs->dsps.ao = ao;
+    crcs->dsps.domid = domid;
+    if (init_dsps(&crcs->dsps))
+        goto err;
+
+    callbacks->suspend = libxl__colo_restore_domain_suspend_callback;
+    callbacks->postcopy = libxl__colo_restore_domain_resume_callback;
+    callbacks->checkpoint = libxl__colo_restore_domain_checkpoint_callback;
+    callbacks->should_checkpoint = libxl__colo_restore_domain_should_checkpoint_callback;
+
+    /*
+     * Secondary vm is running in colo mode, so we need to call
+     * libxl__xc_domain_restore_done() to create secondary vm.
+     * But we will exit in domain_create_cb(). So replace the
+     * callback here.
+     */
+    crs->saved_cb = dcs->callback;
+    dcs->callback = libxl__colo_domain_create_cb;
+    crcs->status = LIBXL_COLO_SETUPED;
+
+    libxl__logdirty_init(&crcs->lds);
+    crcs->lds.ao = ao;
+
+    crcs->sws.fd = crs->send_fd;
+    crcs->sws.ao = ao;
+    crcs->sws.back_channel = true;
+
+    libxl__stream_write_start(egc, &crcs->sws);
+
+    rc = 0;
+
+out:
+    crs->callback(egc, crs, rc);
+    return;
+
+err:
+    goto out;
+}
+
+static void libxl__colo_domain_create_cb(libxl__egc *egc,
+                                         libxl__domain_create_state *dcs,
+                                         int rc, uint32_t domid)
+{
+    libxl__colo_restore_checkpoint_state *crcs = dcs->crs.crcs;
+
+    crcs->callback(egc, crcs, rc);
+}
+
+
+/* ================ colo: teardown restore environment ================ */
+static void colo_restore_teardown_done(libxl__egc *egc,
+                                       libxl__checkpoint_devices_state *cds,
+                                       int rc);
+static void do_failover_done(libxl__egc *egc,
+                             libxl__colo_restore_checkpoint_state* crcs,
+                             int rc);
+static void colo_disable_logdirty_done(libxl__egc *egc,
+                                       libxl__logdirty_switch *lds,
+                                       int rc);
+
+static void do_failover(libxl__egc *egc, libxl__colo_restore_state *crs)
+{
+    libxl__colo_restore_checkpoint_state *crcs = crs->crcs;
+
+    /* Convenience aliases */
+    const int status = crcs->status;
+    libxl__logdirty_switch *const lds = &crcs->lds;
+
+    STATE_AO_GC(crs->ao);
+
+    switch(status) {
+    case LIBXL_COLO_SETUPED:
+        /* We don't enable logdirty now */
+        colo_resume_vm(egc, crcs, 0);
+        return;
+    case LIBXL_COLO_SUSPENDED:
+    case LIBXL_COLO_RESUMED:
+        /* disable logdirty first */
+        lds->callback = colo_disable_logdirty_done;
+        colo_disable_logdirty(crs, egc);
+        return;
+    default:
+        LOG(ERROR, "invalid status: %d", status);
+        crcs->callback(egc, crcs, ERROR_FAIL);
+    }
+}
+
+void libxl__colo_restore_teardown(libxl__egc *egc,
+                                  libxl__colo_restore_state *crs,
+                                  int rc)
+{
+    libxl__colo_restore_checkpoint_state *crcs = crs->crcs;
+
+    EGC_GC;
+
+    /* TODO: abort the stream if it is in use. */
+
+    crcs->saved_rc = rc;
+    if (!crcs->teardown_devices) {
+        colo_restore_teardown_done(egc, &crs->cds, 0);
+        return;
+    }
+
+    crs->cds.callback = colo_restore_teardown_done;
+    libxl__checkpoint_devices_teardown(egc, &crs->cds);
+}
+
+static void colo_restore_teardown_done(libxl__egc *egc,
+                                       libxl__checkpoint_devices_state *cds,
+                                       int rc)
+{
+    libxl__colo_restore_state *crs = CONTAINER_OF(cds, *crs, cds);
+    libxl__colo_restore_checkpoint_state *crcs = crs->crcs;
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+
+    EGC_GC;
+
+    if (rc)
+        LOG(ERROR, "COLO: failed to teardown device for guest with domid %u,"
+            " rc %d", cds->domid, rc);
+
+    if (crcs->teardown_devices)
+        cleanup_device_subkind(cds);
+
+    rc = crcs->saved_rc;
+    if (!rc) {
+        crcs->callback = do_failover_done;
+        do_failover(egc, crs);
+        return;
+    }
+
+    if (crs->saved_cb) {
+        dcs->callback = crs->saved_cb;
+        crs->saved_cb = NULL;
+    }
+    crs->callback(egc, crs, rc);
+}
+
+static void do_failover_done(libxl__egc *egc,
+                             libxl__colo_restore_checkpoint_state* crcs,
+                             int rc)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crcs->crs, *dcs, crs);
+
+    /* Convenience aliases */
+    libxl__colo_restore_state *const crs = crcs->crs;
+
+    STATE_AO_GC(crs->ao);
+
+    if (rc)
+        LOG(ERROR, "cannot do failover");
+
+    if (crs->saved_cb) {
+        dcs->callback = crs->saved_cb;
+        crs->saved_cb = NULL;
+    }
+
+    crs->callback(egc, crs, rc);
+}
+
+static void colo_disable_logdirty_done(libxl__egc *egc,
+                                       libxl__logdirty_switch *lds,
+                                       int rc)
+{
+    libxl__colo_restore_checkpoint_state *crcs = CONTAINER_OF(lds, *crcs, lds);
+
+    STATE_AO_GC(lds->ao);
+
+    if (rc)
+        LOG(WARN, "cannot disable logdirty");
+
+    if (crcs->status == LIBXL_COLO_SUSPENDED) {
+        /*
+         * failover when reading state from master, so no need to
+         * call libxl__domain_restore().
+         */
+        colo_resume_vm(egc, crcs, 0);
+        return;
+    }
+
+    /* If we cannot disable logdirty, we still can do failover */
+    crcs->callback(egc, crcs, 0);
+}
+
+/*
+ * checkpoint callbacks are called in the following order:
+ * 1. resume
+ * 2. should_checkpoint
+ * 3. suspend
+ * 4. checkpoint
+ */
+static void colo_common_write_stream_done(libxl__egc *egc,
+                                          libxl__stream_write_state *stream,
+                                          int rc);
+static void colo_common_read_stream_done(libxl__egc *egc,
+                                         libxl__stream_read_state *stream,
+                                         int rc);
+/* ===================== colo: resume secondary vm ===================== */
+/*
+ * Do the following things when resuming secondary vm the first time:
+ *  1. resume secondary vm
+ *  2. enable log dirty
+ *  3. setup checkpoint devices
+ *  4. write LIBXL_COLO_SVM_READY
+ *  5. unpause secondary vm
+ *  6. write LIBXL_COLO_SVM_RESUMED
+ *
+ * Do the following things when resuming secondary vm:
+ *  1. write LIBXL_COLO_SVM_READY
+ *  2. resume secondary vm
+ *  3. write LIBXL_COLO_SVM_RESUMED
+ */
+static void colo_send_svm_ready(libxl__egc *egc,
+                                libxl__colo_restore_checkpoint_state *crcs);
+static void colo_send_svm_ready_done(libxl__egc *egc,
+                                     libxl__colo_restore_checkpoint_state *crcs,
+                                     int rc);
+static void colo_restore_preresume_cb(libxl__egc *egc,
+                                      libxl__checkpoint_devices_state *cds,
+                                      int rc);
+static void colo_restore_resume_vm(libxl__egc *egc,
+                                   libxl__colo_restore_checkpoint_state *crcs);
+static void colo_resume_vm_done(libxl__egc *egc,
+                                libxl__colo_restore_checkpoint_state *crcs,
+                                int rc);
+static void colo_write_svm_resumed(libxl__egc *egc,
+                                   libxl__colo_restore_checkpoint_state *crcs);
+static void colo_enable_logdirty_done(libxl__egc *egc,
+                                      libxl__logdirty_switch *lds,
+                                      int rc);
+static void colo_reenable_logdirty(libxl__egc *egc,
+                                   libxl__logdirty_switch *lds,
+                                   int rc);
+static void colo_reenable_logdirty_done(libxl__egc *egc,
+                                        libxl__logdirty_switch *lds,
+                                        int rc);
+static void colo_setup_checkpoint_devices(libxl__egc *egc,
+                                          libxl__colo_restore_state *crs);
+static void colo_restore_setup_cds_done(libxl__egc *egc,
+                                        libxl__checkpoint_devices_state *cds,
+                                        int rc);
+static void colo_unpause_svm(libxl__egc *egc,
+                             libxl__colo_restore_checkpoint_state *crcs);
+
+static void libxl__colo_restore_domain_resume_callback(void *data)
+{
+    libxl__save_helper_state *shs = data;
+    libxl__stream_read_state *srs = CONTAINER_OF(shs, *srs, shs);
+    libxl__domain_create_state *dcs = CONTAINER_OF(srs, *dcs, srs);
+    libxl__colo_restore_checkpoint_state *crcs = dcs->crs.crcs;
+
+    if (crcs->teardown_devices)
+        colo_send_svm_ready(shs->egc, crcs);
+    else
+        colo_restore_resume_vm(shs->egc, crcs);
+}
+
+static void colo_send_svm_ready(libxl__egc *egc,
+                               libxl__colo_restore_checkpoint_state *crcs)
+{
+    libxl_sr_colo_context colo_context = { .id = COLO_SVM_READY };
+
+    crcs->callback = colo_send_svm_ready_done;
+    crcs->sws.checkpoint_callback = colo_common_write_stream_done;
+    libxl__stream_write_colo_context(egc, &crcs->sws, &colo_context);
+}
+
+static void colo_send_svm_ready_done(libxl__egc *egc,
+                                     libxl__colo_restore_checkpoint_state *crcs,
+                                     int rc)
+{
+    /* Convenience aliases */
+    libxl__checkpoint_devices_state *cds = &crcs->crs->cds;
+
+    if (!crcs->preresume) {
+        crcs->preresume = true;
+        colo_unpause_svm(egc, crcs);
+        return;
+    }
+
+    cds->callback = colo_restore_preresume_cb;
+    libxl__checkpoint_devices_preresume(egc, cds);
+}
+
+static void colo_restore_preresume_cb(libxl__egc *egc,
+                                      libxl__checkpoint_devices_state *cds,
+                                      int rc)
+{
+    libxl__colo_restore_state *crs = CONTAINER_OF(cds, *crs, cds);
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+    libxl__colo_restore_checkpoint_state *crcs = crs->crcs;
+
+    /* Convenience aliases */
+    libxl__save_helper_state *const shs = &dcs->srs.shs;
+
+    STATE_AO_GC(crs->ao);
+
+    if (rc) {
+        LOG(ERROR, "preresume fails");
+        goto out;
+    }
+
+    colo_restore_resume_vm(egc, crcs);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0);
+}
+
+static void colo_restore_resume_vm(libxl__egc *egc,
+                                   libxl__colo_restore_checkpoint_state *crcs)
+{
+
+    crcs->callback = colo_resume_vm_done;
+    colo_resume_vm(egc, crcs, 1);
+}
+
+static void colo_resume_vm_done(libxl__egc *egc,
+                                libxl__colo_restore_checkpoint_state *crcs,
+                                int rc)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crcs->crs, *dcs, crs);
+
+    /* Convenience aliases */
+    libxl__colo_restore_state *const crs = crcs->crs;
+    libxl__logdirty_switch *const lds = &crcs->lds;
+    libxl__save_helper_state *const shs = &dcs->srs.shs;
+
+    STATE_AO_GC(crs->ao);
+
+    if (rc) {
+        LOG(ERROR, "cannot resume secondary vm");
+        goto out;
+    }
+
+    crcs->status = LIBXL_COLO_RESUMED;
+
+    /* avoid calling libxl__xc_domain_restore_done() more than once */
+    if (crs->saved_cb) {
+        dcs->callback = crs->saved_cb;
+        crs->saved_cb = NULL;
+
+        lds->callback = colo_enable_logdirty_done;
+        colo_enable_logdirty(crs, egc);
+        return;
+    }
+
+    colo_write_svm_resumed(egc, crcs);
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0);
+}
+
+static void colo_write_svm_resumed(libxl__egc *egc,
+                                   libxl__colo_restore_checkpoint_state *crcs)
+{
+    libxl_sr_colo_context colo_context = { .id = COLO_SVM_RESUMED };
+
+    crcs->callback = NULL;
+    crcs->sws.checkpoint_callback = colo_common_write_stream_done;
+    libxl__stream_write_colo_context(egc, &crcs->sws, &colo_context);
+}
+
+static void colo_enable_logdirty_done(libxl__egc *egc,
+                                      libxl__logdirty_switch *lds,
+                                      int rc)
+{
+    libxl__colo_restore_checkpoint_state *crcs = CONTAINER_OF(lds, *crcs, lds);
+
+    /* Convenience aliases */
+    libxl__colo_restore_state *const crs = crcs->crs;
+
+    STATE_AO_GC(crs->ao);
+
+    if (rc) {
+        /*
+         * log-dirty already enabled? There's no test op,
+         * so attempt to disable then reenable it
+         */
+        lds->callback = colo_reenable_logdirty;
+        colo_disable_logdirty(crs, egc);
+        return;
+    }
+
+    colo_setup_checkpoint_devices(egc, crs);
+}
+
+static void colo_reenable_logdirty(libxl__egc *egc,
+                                   libxl__logdirty_switch *lds,
+                                   int rc)
+{
+    libxl__colo_restore_checkpoint_state *crcs = CONTAINER_OF(lds, *crcs, lds);
+    libxl__domain_create_state *dcs = CONTAINER_OF(crcs->crs, *dcs, crs);
+
+    /* Convenience aliases */
+    libxl__colo_restore_state *const crs = crcs->crs;
+    libxl__save_helper_state *const shs = &dcs->srs.shs;
+
+    STATE_AO_GC(crs->ao);
+
+    if (rc) {
+        LOG(ERROR, "cannot enable logdirty");
+        goto out;
+    }
+
+    lds->callback = colo_reenable_logdirty_done;
+    colo_enable_logdirty(crs, egc);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0);
+}
+
+static void colo_reenable_logdirty_done(libxl__egc *egc,
+                                        libxl__logdirty_switch *lds,
+                                        int rc)
+{
+    libxl__colo_restore_checkpoint_state *crcs = CONTAINER_OF(lds, *crcs, lds);
+    libxl__domain_create_state *dcs = CONTAINER_OF(crcs->crs, *dcs, crs);
+
+    /* Convenience aliases */
+    libxl__save_helper_state *const shs = &dcs->srs.shs;
+
+    STATE_AO_GC(crcs->crs->ao);
+
+    if (rc) {
+        LOG(ERROR, "cannot enable logdirty");
+        goto out;
+    }
+
+    colo_setup_checkpoint_devices(egc, crcs->crs);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0);
+}
+
+/*
+ * We cannot setup checkpoint devices in libxl__colo_restore_setup(),
+ * because the guest is not ready.
+ */
+static void colo_setup_checkpoint_devices(libxl__egc *egc,
+                                          libxl__colo_restore_state *crs)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+    libxl__colo_restore_checkpoint_state *crcs = crs->crcs;
+
+    /* Convenience aliases */
+    libxl__checkpoint_devices_state *cds = &crs->cds;
+    libxl__save_helper_state *const shs = &dcs->srs.shs;
+
+    STATE_AO_GC(crs->ao);
+
+    /* TODO: disk/nic support */
+    cds->device_kind_flags = 0;
+    cds->callback = colo_restore_setup_cds_done;
+    cds->ao = ao;
+    cds->domid = crs->domid;
+    cds->ops = colo_restore_ops;
+
+    if (init_device_subkind(cds))
+        goto out;
+
+    crcs->teardown_devices = 1;
+
+    libxl__checkpoint_devices_setup(egc, cds);
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0);
+}
+
+static void colo_restore_setup_cds_done(libxl__egc *egc,
+                                        libxl__checkpoint_devices_state *cds,
+                                        int rc)
+{
+    libxl__colo_restore_state *crs = CONTAINER_OF(cds, *crs, cds);
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+    libxl__colo_restore_checkpoint_state *crcs = crs->crcs;
+
+    /* Convenience aliases */
+    libxl__save_helper_state *const shs = &dcs->srs.shs;
+
+    STATE_AO_GC(cds->ao);
+
+    if (rc) {
+        LOG(ERROR, "COLO: failed to setup device for guest with domid %u",
+            cds->domid);
+        goto out;
+    }
+
+    colo_send_svm_ready(egc, crcs);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0);
+}
+
+static void colo_unpause_svm(libxl__egc *egc,
+                             libxl__colo_restore_checkpoint_state *crcs)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crcs->crs, *dcs, crs);
+    int rc;
+
+    /* Convenience aliases */
+    const uint32_t domid = crcs->crs->domid;
+    libxl__save_helper_state *const shs = &dcs->srs.shs;
+
+    STATE_AO_GC(crcs->crs->ao);
+
+    /* We have enabled secondary vm's logdirty, so we can unpause it now */
+    rc = libxl_domain_unpause(CTX, domid);
+    if (rc) {
+        LOG(ERROR, "cannot unpause secondary vm");
+        goto out;
+    }
+
+    colo_write_svm_resumed(egc, crcs);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0);
+}
+
+
+/* ===================== colo: wait new checkpoint ===================== */
+static void colo_restore_commit_cb(libxl__egc *egc,
+                                   libxl__checkpoint_devices_state *cds,
+                                   int rc);
+static void colo_stream_read_done(libxl__egc *egc,
+                                  libxl__colo_restore_checkpoint_state *crcs,
+                                  int id);
+
+static void libxl__colo_restore_domain_should_checkpoint_callback(void *data)
+{
+    libxl__save_helper_state *shs = data;
+    libxl__stream_read_state *srs = CONTAINER_OF(shs, *srs, shs);
+    libxl__domain_create_state *dcs = CONTAINER_OF(srs, *dcs, srs);
+
+    /* Convenience aliases */
+    libxl__checkpoint_devices_state *cds = &dcs->crs.cds;
+
+    cds->callback = colo_restore_commit_cb;
+    libxl__checkpoint_devices_commit(shs->egc, cds);
+}
+
+static void colo_restore_commit_cb(libxl__egc *egc,
+                                   libxl__checkpoint_devices_state *cds,
+                                   int rc)
+{
+    libxl__colo_restore_state *crs = CONTAINER_OF(cds, *crs, cds);
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+    libxl__colo_restore_checkpoint_state *crcs = crs->crcs;
+
+    STATE_AO_GC(cds->ao);
+
+    if (rc) {
+        LOG(ERROR, "commit fails");
+        goto out;
+    }
+
+    crcs->callback = colo_stream_read_done;
+    dcs->srs.checkpoint_callback = colo_common_read_stream_done;
+    libxl__stream_read_colo_context(egc, &dcs->srs);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dcs->srs.shs, 0);
+}
+
+static void colo_stream_read_done(libxl__egc *egc,
+                                  libxl__colo_restore_checkpoint_state *crcs,
+                                  int id)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crcs->crs, *dcs, crs);
+    int ok = 0;
+
+    STATE_AO_GC(dcs->ao);
+
+    if (id != COLO_NEW_CHECKPOINT) {
+        LOG(ERROR, "invalid section: %d", id);
+        goto out;
+    }
+
+    ok = 1;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dcs->srs.shs, ok);
+}
+
+
+/* ===================== colo: suspend secondary vm ===================== */
+/*
+ * Do the following things when suspending secondary vm:
+ *  1. suspend secondary vm
+ *  2. send LIBXL_COLO_SVM_SUSPENDED
+ */
+static void colo_suspend_vm_done(libxl__egc *egc,
+                                 libxl__domain_suspend_state *dsps,
+                                 int rc);
+static void colo_restore_postsuspend_cb(libxl__egc *egc,
+                                        libxl__checkpoint_devices_state *cds,
+                                        int rc);
+
+static void libxl__colo_restore_domain_suspend_callback(void *data)
+{
+    libxl__save_helper_state *shs = data;
+    libxl__stream_read_state *srs = CONTAINER_OF(shs, *srs, shs);
+    libxl__domain_create_state *dcs = CONTAINER_OF(srs, *dcs, srs);
+    libxl__colo_restore_checkpoint_state *crcs = dcs->crs.crcs;
+
+    STATE_AO_GC(dcs->ao);
+
+    /* Convenience aliases */
+    libxl__domain_suspend_state *const dsps = &crcs->dsps;
+
+    /* suspend secondary vm */
+    dsps->callback_common_done = colo_suspend_vm_done;
+
+    libxl__domain_suspend(shs->egc, dsps);
+}
+
+static void colo_suspend_vm_done(libxl__egc *egc,
+                                 libxl__domain_suspend_state *dsps,
+                                 int rc)
+{
+    libxl__colo_restore_checkpoint_state *crcs = CONTAINER_OF(dsps, *crcs, dsps);
+    libxl__colo_restore_state *crs = crcs->crs;
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+
+    /* Convenience aliases */
+    libxl__checkpoint_devices_state *cds = &crs->cds;
+
+    STATE_AO_GC(crs->ao);
+
+    if (rc) {
+        LOG(ERROR, "cannot suspend secondary vm");
+        goto out;
+    }
+
+    crcs->status = LIBXL_COLO_SUSPENDED;
+
+    cds->callback = colo_restore_postsuspend_cb;
+    libxl__checkpoint_devices_postsuspend(egc, cds);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dcs->srs.shs, !rc);
+}
+
+static void colo_restore_postsuspend_cb(libxl__egc *egc,
+                                        libxl__checkpoint_devices_state *cds,
+                                        int rc)
+{
+    libxl__colo_restore_state *crs = CONTAINER_OF(cds, *crs, cds);
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+    libxl__colo_restore_checkpoint_state *crcs = crs->crcs;
+    libxl_sr_colo_context colo_context = { .id = COLO_SVM_SUSPENDED };
+
+    STATE_AO_GC(crs->ao);
+
+    if (rc) {
+        LOG(ERROR, "postsuspend fails");
+        goto out;
+    }
+
+    crcs->callback = NULL;
+    crcs->sws.checkpoint_callback = colo_common_write_stream_done;
+    libxl__stream_write_colo_context(egc, &crcs->sws, &colo_context);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dcs->srs.shs, !rc);
+}
+
+
+/* ======================== colo: checkpoint ======================= */
+/*
+ * Do the following things when checkpointing secondary vm:
+ *  1. read toolstack context
+ *  2. read emulator context
+ */
+static void libxl__colo_restore_domain_checkpoint_callback(void *data)
+{
+    libxl__save_helper_state *shs = data;
+    libxl__stream_read_state *srs = CONTAINER_OF(shs, *srs, shs);
+    libxl__domain_create_state *dcs = CONTAINER_OF(srs, *dcs, srs);
+    libxl__colo_restore_checkpoint_state *crcs = dcs->crs.crcs;
+
+    crcs->callback = NULL;
+    dcs->srs.checkpoint_callback = colo_common_read_stream_done;
+    libxl__stream_read_start_checkpoint(shs->egc, &dcs->srs);
+}
+
+/* ===================== colo: common callback ===================== */
+static void colo_common_write_stream_done(libxl__egc *egc,
+                                          libxl__stream_write_state *stream,
+                                          int rc)
+{
+    libxl__colo_restore_checkpoint_state *crcs =
+        CONTAINER_OF(stream, *crcs, sws);
+    libxl__domain_create_state *dcs = CONTAINER_OF(crcs->crs, *dcs, crs);
+    int ok;
+
+    STATE_AO_GC(stream->ao);
+
+    if (rc < 0) {
+        /* TODO: it may be an internal error, but we don't know */
+        LOG(ERROR, "sending data fails");
+        ok = 2;
+        goto out;
+    }
+
+    if (!crcs->callback) {
+        /* Everything is OK */
+        ok = 1;
+        goto out;
+    }
+
+    crcs->callback(egc, crcs, 0);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dcs->srs.shs, ok);
+}
+
+static void colo_common_read_stream_done(libxl__egc *egc,
+                                         libxl__stream_read_state *stream,
+                                         int rc)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(stream, *dcs, srs);
+    libxl__colo_restore_checkpoint_state *crcs = dcs->crs.crcs;
+    int ok;
+
+    STATE_AO_GC(stream->ao);
+
+    if (rc < 0) {
+        /* TODO: it may be an internal error, but we don't know */
+        LOG(ERROR, "reading data fails");
+        ok = 2;
+        goto out;
+    }
+
+    if (!crcs->callback) {
+        /* Everything is OK */
+        ok = 1;
+        goto out;
+    }
+
+    /* rc contains the id */
+    crcs->callback(egc, crcs, rc);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dcs->srs.shs, ok);
+}
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 1af7103..bf4b55d 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -19,6 +19,7 @@
 
 #include "libxl_internal.h"
 #include "libxl_arch.h"
+#include "libxl_colo.h"
 
 #include <xc_dom.h>
 #include <xenguest.h>
@@ -927,6 +928,93 @@ static void domcreate_console_available(libxl__egc *egc,
                                         dcs->aop_console_how.for_event));
 }
 
+static void libxl__colo_restore_teardown_done(libxl__egc *egc,
+                                              libxl__colo_restore_state *crs,
+                                              int rc)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+    STATE_AO_GC(crs->ao);
+
+    /* convenience aliases */
+    libxl__save_helper_state *const shs = &dcs->srs.shs;
+    const int domid = crs->domid;
+    const libxl_ctx *const ctx = libxl__gc_owner(gc);
+    xc_interface *const xch = ctx->xch;
+
+    if (!rc)
+        /* failover, no need to destroy the secondary vm */
+        goto out;
+
+    if (shs->retval)
+        /*
+         * shs->retval stores the return value of xc_domain_restore().
+         * If it is not 0, we have destroyed the secondary vm in
+         * xc_domain_restore();
+         */
+        goto out;
+
+    xc_domain_destroy(xch, domid);
+
+out:
+    dcs->callback(egc, dcs, rc, crs->domid);
+}
+
+void libxl__colo_restore_done(libxl__egc *egc, void *dcs_void,
+                              int ret, int retval, int errnoval)
+{
+    libxl__domain_create_state *dcs = dcs_void;
+    int rc = 1;
+
+    /* convenience aliases */
+    libxl__colo_restore_state *const crs = &dcs->crs;
+    STATE_AO_GC(crs->ao);
+
+    /* teardown and failover */
+    crs->callback = libxl__colo_restore_teardown_done;
+
+    if (ret == 0 && retval == 0)
+        rc = 0;
+
+    LOG(INFO, "%s", rc ? "colo fails" : "failover");
+    libxl__colo_restore_teardown(egc, crs, rc);
+}
+
+static void libxl__colo_restore_cp_done(libxl__egc *egc,
+                                        libxl__colo_restore_state *crs,
+                                        int rc)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+    int ok = 0;
+
+    /* convenience aliases */
+    libxl__save_helper_state *const shs = &dcs->srs.shs;
+
+    if (!rc)
+        ok = 1;
+
+    libxl__xc_domain_saverestore_async_callback_done(shs->egc, shs, ok);
+}
+
+static void libxl__colo_restore_setup_done(libxl__egc *egc,
+                                           libxl__colo_restore_state *crs,
+                                           int rc)
+{
+    libxl__domain_create_state *dcs = CONTAINER_OF(crs, *dcs, crs);
+
+    /* convenience aliases */
+    STATE_AO_GC(crs->ao);
+
+    if (rc) {
+        LOG(ERROR, "colo restore setup fails: %d", rc);
+        libxl__xc_domain_restore_done(egc, dcs, rc, 0, 0);
+        return;
+    }
+
+    crs->callback = libxl__colo_restore_cp_done;
+    /*TODO COLO*/
+    libxl__stream_read_start(egc, &dcs->srs);
+}
+
 static void domcreate_bootloader_done(libxl__egc *egc,
                                       libxl__bootloader_state *bl,
                                       int rc)
@@ -941,6 +1029,9 @@ static void domcreate_bootloader_done(libxl__egc *egc,
     libxl__domain_build_state *const state = &dcs->build_state;
     libxl__srm_restore_autogen_callbacks *const callbacks =
         &dcs->srs.shs.callbacks.restore.a;
+    const int checkpointed_stream = dcs->restore_params.checkpointed_stream;
+    libxl__colo_restore_state *const crs = &dcs->crs;
+    libxl_domain_build_info *const info = &d_config->b_info;
 
     if (rc) {
         domcreate_rebuild_done(egc, dcs, rc);
@@ -970,6 +1061,13 @@ static void domcreate_bootloader_done(libxl__egc *egc,
     /* Restore */
     callbacks->checkpoint = libxl__remus_domain_restore_checkpoint_callback;
 
+    /* COLO only supports HVM now */
+    if (info->type != LIBXL_DOMAIN_TYPE_HVM &&
+        checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_COLO) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
     rc = libxl__build_pre(gc, domid, d_config, state);
     if (rc)
         goto out;
@@ -981,7 +1079,18 @@ static void domcreate_bootloader_done(libxl__egc *egc,
     dcs->srs.back_channel = false;
     dcs->srs.completion_callback = domcreate_stream_done;
 
-    libxl__stream_read_start(egc, &dcs->srs);
+    /* colo restore setup */
+    if (checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_COLO) {
+        crs->ao = ao;
+        crs->domid = domid;
+        crs->send_fd = dcs->send_fd;
+        crs->recv_fd = restore_fd;
+        crs->hvm = (info->type == LIBXL_DOMAIN_TYPE_HVM);
+        crs->callback = libxl__colo_restore_setup_done;
+        libxl__colo_restore_setup(egc, crs);
+    } else
+        libxl__stream_read_start(egc, &dcs->srs);
+
     return;
 
  out:
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 1be2a4a..597866d 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3407,6 +3407,24 @@ libxl__stream_read_inuse(const libxl__stream_read_state *stream)
     return stream->running;
 }
 
+/* colo related structure */
+typedef struct libxl__colo_restore_state libxl__colo_restore_state;
+typedef void libxl__colo_callback(libxl__egc *,
+                                  libxl__colo_restore_state *, int rc);
+struct libxl__colo_restore_state {
+    /* must be set by caller of libxl__colo_(setup|teardown) */
+    libxl__ao *ao;
+    uint32_t domid;
+    int send_fd;
+    int recv_fd;
+    int hvm;
+    libxl__colo_callback *callback;
+
+    /* private, colo restore checkpoint state */
+    libxl__domain_create_cb *saved_cb;
+    void *crcs;
+    libxl__checkpoint_devices_state cds;
+};
 
 struct libxl__domain_create_state {
     /* filled in by user */
@@ -3421,6 +3439,7 @@ struct libxl__domain_create_state {
     /* private to domain_create */
     int guest_domid;
     libxl__domain_build_state build_state;
+    libxl__colo_restore_state crs;
     libxl__bootloader_state bl;
     libxl__stub_dm_spawn_state dmss;
         /* If we're not doing stubdom, we use only dmss.dm,
diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index f8c6cf0..e9f3b2d 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -15,6 +15,7 @@
 #include "libxl_osdeps.h"
 
 #include "libxl_internal.h"
+#include "libxl_colo.h"
 
 /* stream_fd is as from the caller (eventually, the application).
  * It may be 0, 1 or 2, in which case we need to dup it elsewhere.
@@ -68,7 +69,11 @@ void libxl__xc_domain_restore(libxl__egc *egc, libxl__domain_create_state *dcs,
     shs->ao = ao;
     shs->domid = domid;
     shs->recv_callback = libxl__srm_callout_received_restore;
-    shs->completion_callback = libxl__xc_domain_restore_done;
+    if (dcs->restore_params.checkpointed_stream ==
+                                                LIBXL_CHECKPOINTED_STREAM_COLO)
+        shs->completion_callback = libxl__colo_restore_done;
+    else
+        shs->completion_callback = libxl__xc_domain_restore_done;
     shs->caller_state = dcs;
     shs->need_results = 1;
     shs->toolstack_data_file = 0;
-- 
1.9.1
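
Most callbacks in the patch above recover their enclosing state from a
pointer to an embedded member, for example
"crs = CONTAINER_OF(cds, *crs, cds)". The idiom can be sketched in plain C
as below. This is a simplified stand-in, not libxl's macro: container_of,
inner_state, outer_state and outer_domid_from_inner are hypothetical names,
and the stand-in takes a type name where the patch passes an expression
such as *crs.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in: step back from a member pointer to its container. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical inner state, embedded by value in the outer state. */
struct inner_state {
    int rc;
};

/* Hypothetical outer state, playing the role of the restore state. */
struct outer_state {
    int domid;
    struct inner_state inner;
};

/* A callback that only receives the inner pointer can still reach domid. */
int outer_domid_from_inner(struct inner_state *in)
{
    assert(in != NULL);
    struct outer_state *out = container_of(in, struct outer_state, inner);
    return out->domid;
}
```

This only works because the member is embedded by value in the outer
structure; a member reached through a separate allocation cannot be walked
back this way.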


* [PATCH v8 --for 4.6 COLO 12/25] primary vm suspend/resume/checkpoint code
  2015-07-15  9:18 [PATCH v8 --for 4.6 COLO 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (10 preceding siblings ...)
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 11/25] secondary vm suspend/resume/checkpoint code Yang Hongyang
@ 2015-07-15  9:18 ` Yang Hongyang
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 13/25] libxc/restore: support COLO restore Yang Hongyang
                   ` (12 subsequent siblings)
  24 siblings, 0 replies; 46+ messages in thread
From: Yang Hongyang @ 2015-07-15  9:18 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

From: Wen Congyang <wency@cn.fujitsu.com>

The primary side repeats the following steps in a loop:
1. Suspend primary vm
   a. Suspend primary vm
   b. Do postsuspend
   c. Read LIBXL_COLO_SVM_SUSPENDED sent by secondary
2. Resume primary vm
   a. Read LIBXL_COLO_SVM_READY from secondary
   b. Do preresume
   c. Resume primary vm
   d. Read LIBXL_COLO_SVM_RESUMED from secondary
3. Wait for a new checkpoint
   a. Wait for a new checkpoint (not implemented)
   b. Send LIBXL_COLO_NEW_CHECKPOINT to secondary
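
As a rough illustration only, the round listed above can be written as a
table of steps that wraps around. This is a sketch, not code from the
patch: colo_step, primary_round, colo_primary_step and
colo_primary_reads_per_round are hypothetical names, and the table just
mirrors the order given in this message.

```c
#include <assert.h>
#include <stddef.h>

/* One step of a primary-side round. Names are illustrative only. */
struct colo_step {
    const char *what;
    int from_secondary; /* 1 if the step reads a message from the secondary */
};

/* Steps in the order listed in the commit message. */
static const struct colo_step primary_round[] = {
    { "suspend primary vm",        0 },
    { "postsuspend",               0 },
    { "read SVM_SUSPENDED",        1 },
    { "read SVM_READY",            1 },
    { "preresume",                 0 },
    { "resume primary vm",         0 },
    { "read SVM_RESUMED",          1 },
    { "wait for a new checkpoint", 0 },
    { "send NEW_CHECKPOINT",       0 },
};

#define ROUND_LEN (sizeof(primary_round) / sizeof(primary_round[0]))

/* Return the n-th step overall; rounds repeat, so the index wraps. */
const struct colo_step *colo_primary_step(size_t n)
{
    return &primary_round[n % ROUND_LEN];
}

/* Count how many messages the primary reads from the secondary per round. */
int colo_primary_reads_per_round(void)
{
    int reads = 0;
    for (size_t i = 0; i < ROUND_LEN; i++)
        reads += primary_round[i].from_secondary;
    assert(reads <= (int)ROUND_LEN);
    return reads;
}
```

In this sketch every read from the secondary is a point where the primary
blocks until the peer reports its state, which is why the two sides stay
in step.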

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 tools/libxl/Makefile          |   2 +-
 tools/libxl/libxl.c           |   6 +-
 tools/libxl/libxl_colo.h      |  10 +
 tools/libxl/libxl_colo_save.c | 569 ++++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_dom_save.c  |  13 +-
 tools/libxl/libxl_internal.h  | 167 +++++++------
 tools/libxl/libxl_types.idl   |   1 +
 7 files changed, 689 insertions(+), 79 deletions(-)
 create mode 100644 tools/libxl/libxl_colo_save.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 97b3753..71bf7a2 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -63,7 +63,7 @@ LIBXL_OBJS-y += libxl_no_convert_callout.o
 endif
 
 LIBXL_OBJS-y += libxl_remus.o libxl_checkpoint_device.o libxl_remus_disk_drbd.o
-LIBXL_OBJS-y += libxl_colo_restore.o
+LIBXL_OBJS-y += libxl_colo_restore.o libxl_colo_save.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_psr.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o libxl_libfdt_compat.o
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 5502709..c040909 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -17,6 +17,7 @@
 #include "libxl_osdeps.h"
 
 #include "libxl_internal.h"
+#include "libxl_colo.h"
 
 #define PAGE_TO_MEMKB(pages) ((pages) * 4)
 #define BACKEND_STRING_SIZE 5
@@ -845,7 +846,10 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
     assert(info);
 
     /* Point of no return */
-    libxl__remus_setup(egc, &dss->rs);
+    if (libxl_defbool_val(info->colo))
+        libxl__colo_save_setup(egc, &dss->css);
+    else
+        libxl__remus_setup(egc, &dss->rs);
     return AO_INPROGRESS;
 
  out:
diff --git a/tools/libxl/libxl_colo.h b/tools/libxl/libxl_colo.h
index 54dc835..49a430b 100644
--- a/tools/libxl/libxl_colo.h
+++ b/tools/libxl/libxl_colo.h
@@ -24,4 +24,14 @@ extern void libxl__colo_restore_teardown(libxl__egc *egc,
                                          libxl__colo_restore_state *crs,
                                          int rc);
 
+extern void libxl__colo_save_domain_suspend_callback(void *data);
+extern void libxl__colo_save_domain_checkpoint_callback(void *data);
+extern void libxl__colo_save_domain_resume_callback(void *data);
+extern void libxl__colo_save_domain_should_checkpoint_callback(void *data);
+extern void libxl__colo_save_setup(libxl__egc *egc,
+                                   libxl__colo_save_state *css);
+extern void libxl__colo_save_teardown(libxl__egc *egc,
+                                      libxl__colo_save_state *css,
+                                      int rc);
+
 #endif
diff --git a/tools/libxl/libxl_colo_save.c b/tools/libxl/libxl_colo_save.c
new file mode 100644
index 0000000..f0ab565
--- /dev/null
+++ b/tools/libxl/libxl_colo_save.c
@@ -0,0 +1,569 @@
+/*
+ * Copyright (C) 2014 FUJITSU LIMITED
+ * Author: Wen Congyang <wency@cn.fujitsu.com>
+ *         Yang Hongyang <yanghy@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+#include "libxl_colo.h"
+
+static const libxl__checkpoint_device_instance_ops *colo_ops[] = {
+    NULL,
+};
+
+/* ================= helper functions ================= */
+static int init_device_subkind(libxl__checkpoint_devices_state *cds)
+{
+    /* init device subkind-specific state in the libxl ctx */
+    int rc;
+    STATE_AO_GC(cds->ao);
+
+    rc = 0;
+    return rc;
+}
+
+static void cleanup_device_subkind(libxl__checkpoint_devices_state *cds)
+{
+    /* cleanup device subkind-specific state in the libxl ctx */
+    STATE_AO_GC(cds->ao);
+}
+
+/* ================= colo: setup save environment ================= */
+static void colo_save_setup_done(libxl__egc *egc,
+                                 libxl__checkpoint_devices_state *cds,
+                                 int rc);
+static void colo_save_setup_failed(libxl__egc *egc,
+                                   libxl__checkpoint_devices_state *cds,
+                                   int rc);
+
+void libxl__colo_save_setup(libxl__egc *egc, libxl__colo_save_state *css)
+{
+    libxl__domain_save_state *dss = CONTAINER_OF(css, *dss, css);
+
+    /* Convenience aliases */
+    libxl__checkpoint_devices_state *const cds = &css->cds;
+
+    STATE_AO_GC(dss->ao);
+
+    if (dss->type != LIBXL_DOMAIN_TYPE_HVM) {
+        LOG(ERROR, "COLO only supports hvm now");
+        goto out;
+    }
+
+    css->send_fd = dss->fd;
+    css->recv_fd = dss->recv_fd;
+    css->svm_running = false;
+
+    /* TODO: disk/nic support */
+    cds->device_kind_flags = 0;
+    cds->ops = colo_ops;
+    cds->callback = colo_save_setup_done;
+    cds->ao = ao;
+    cds->domid = dss->domid;
+
+    css->srs.ao = ao;
+    css->srs.fd = css->recv_fd;
+    css->srs.back_channel = true;
+    libxl__stream_read_start(egc, &css->srs);
+
+    if (init_device_subkind(cds))
+        goto out;
+
+    libxl__checkpoint_devices_setup(egc, &css->cds);
+
+    return;
+
+out:
+    libxl__ao_complete(egc, ao, ERROR_FAIL);
+}
+
+static void colo_save_setup_done(libxl__egc *egc,
+                                 libxl__checkpoint_devices_state *cds,
+                                 int rc)
+{
+    libxl__colo_save_state *css = CONTAINER_OF(cds, *css, cds);
+    libxl__domain_save_state *dss = CONTAINER_OF(css, *dss, css);
+    STATE_AO_GC(cds->ao);
+
+    if (!rc) {
+        libxl__domain_save(egc, dss);
+        return;
+    }
+
+    LOG(ERROR, "COLO: failed to setup device for guest with domid %u",
+        dss->domid);
+    css->cds.callback = colo_save_setup_failed;
+    libxl__checkpoint_devices_teardown(egc, &css->cds);
+}
+
+static void colo_save_setup_failed(libxl__egc *egc,
+                                   libxl__checkpoint_devices_state *cds,
+                                   int rc)
+{
+    STATE_AO_GC(cds->ao);
+
+    if (rc)
+        LOG(ERROR, "COLO: failed to teardown device after setup failed"
+            " for guest with domid %u, rc %d", cds->domid, rc);
+
+    cleanup_device_subkind(cds);
+    libxl__ao_complete(egc, ao, rc);
+}
+
+
+/* ================= colo: teardown save environment ================= */
+static void colo_teardown_done(libxl__egc *egc,
+                               libxl__checkpoint_devices_state *cds,
+                               int rc);
+
+void libxl__colo_save_teardown(libxl__egc *egc,
+                               libxl__colo_save_state *css,
+                               int rc)
+{
+    libxl__domain_save_state *dss = CONTAINER_OF(css, *dss, css);
+
+    STATE_AO_GC(css->cds.ao);
+
+    LOG(WARN, "COLO: Domain suspend terminated with rc %d,"
+        " teardown COLO devices...", rc);
+    dss->css.cds.callback = colo_teardown_done;
+    libxl__checkpoint_devices_teardown(egc, &dss->css.cds);
+    return;
+}
+
+static void colo_teardown_done(libxl__egc *egc,
+                               libxl__checkpoint_devices_state *cds,
+                               int rc)
+{
+    libxl__colo_save_state *css = CONTAINER_OF(cds, *css, cds);
+    libxl__domain_save_state *dss = CONTAINER_OF(css, *dss, css);
+
+    cleanup_device_subkind(cds);
+    dss->callback(egc, dss, rc);
+}
+
+/*
+ * checkpoint callbacks are called in the following order:
+ * 1. suspend
+ * 2. resume
+ * 3. checkpoint
+ */
+static void colo_common_write_stream_done(libxl__egc *egc,
+                                          libxl__stream_write_state *stream,
+                                          int rc);
+static void colo_common_read_stream_done(libxl__egc *egc,
+                                         libxl__stream_read_state *stream,
+                                         int rc);
+/* ===================== colo: suspend primary vm ===================== */
+
+static void colo_read_svm_suspended_done(libxl__egc *egc,
+                                         libxl__colo_save_state *css,
+                                         int id);
+/*
+ * Do the following things when suspending primary vm:
+ * 1. suspend primary vm
+ * 2. do postsuspend
+ * 3. read COLO_SVM_SUSPENDED
+ * 4. read secondary vm's dirty pages
+ */
+static void colo_suspend_primary_vm_done(libxl__egc *egc,
+                                         libxl__domain_suspend_state *dsps,
+                                         int rc);
+static void colo_postsuspend_cb(libxl__egc *egc,
+                                libxl__checkpoint_devices_state *cds,
+                                int rc);
+
+void libxl__colo_save_domain_suspend_callback(void *data)
+{
+    libxl__save_helper_state *shs = data;
+    libxl__egc *egc = shs->egc;
+    libxl__stream_write_state *sws = CONTAINER_OF(shs, *sws, shs);
+    libxl__domain_save_state *dss = sws->dss;
+
+    /* Convenience aliases */
+    libxl__domain_suspend_state *dsps = &dss->dsps;
+
+    dsps->callback_common_done = colo_suspend_primary_vm_done;
+    libxl__domain_suspend(egc, dsps);
+}
+
+static void colo_suspend_primary_vm_done(libxl__egc *egc,
+                                         libxl__domain_suspend_state *dsps,
+                                         int rc)
+{
+    libxl__domain_save_state *dss = CONTAINER_OF(dsps, *dss, dsps);
+
+    STATE_AO_GC(dsps->ao);
+
+    if (rc) {
+        LOG(ERROR, "cannot suspend primary vm");
+        goto out;
+    }
+
+    /* Convenience aliases */
+    libxl__checkpoint_devices_state *const cds = &dss->css.cds;
+
+    cds->callback = colo_postsuspend_cb;
+    libxl__checkpoint_devices_postsuspend(egc, cds);
+    return;
+
+out:
+    dss->rc = rc;
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, !rc);
+}
+
+static void colo_postsuspend_cb(libxl__egc *egc,
+                                libxl__checkpoint_devices_state *cds,
+                                int rc)
+{
+    libxl__colo_save_state *css = CONTAINER_OF(cds, *css, cds);
+    libxl__domain_save_state *dss = CONTAINER_OF(css, *dss, css);
+
+    STATE_AO_GC(cds->ao);
+
+    if (rc) {
+        LOG(ERROR, "postsuspend failed");
+        goto out;
+    }
+
+    if (!css->svm_running) {
+        rc = 0;
+        goto out;
+    }
+
+    /*
+     * read COLO_SVM_SUSPENDED
+     */
+    css->callback = colo_read_svm_suspended_done;
+    css->srs.checkpoint_callback = colo_common_read_stream_done;
+    libxl__stream_read_colo_context(egc, &css->srs);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, !rc);
+}
+
+static void colo_read_svm_suspended_done(libxl__egc *egc,
+                                         libxl__colo_save_state *css,
+                                         int id)
+{
+    int ok = 0;
+    libxl__domain_save_state *dss = CONTAINER_OF(css, *dss, css);
+
+    STATE_AO_GC(css->cds.ao);
+
+    if (id != COLO_SVM_SUSPENDED) {
+        LOG(ERROR, "invalid section: %d, expected: %d", id, COLO_SVM_SUSPENDED);
+        goto out;
+    }
+
+    ok = 1;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, ok);
+}
+
+
+/* ===================== colo: send tailbuf ========================== */
+void libxl__colo_save_domain_checkpoint_callback(void *data)
+{
+    libxl__save_helper_state *shs = data;
+    libxl__stream_write_state *sws = CONTAINER_OF(shs, *sws, shs);
+    libxl__domain_save_state *dss = sws->dss;
+
+    /* Convenience aliases */
+    libxl__colo_save_state *const css = &dss->css;
+
+    /* write toolstack and emulator context, checkpoint end */
+    css->callback = NULL;
+    dss->sws.checkpoint_callback = colo_common_write_stream_done;
+    libxl__stream_write_start_checkpoint(shs->egc, &dss->sws);
+}
+
+/* ===================== colo: resume primary vm ===================== */
+/*
+ * Do the following things when resuming primary vm:
+ *  1. read COLO_SVM_READY
+ *  2. do preresume
+ *  3. resume primary vm
+ *  4. read COLO_SVM_RESUMED
+ */
+static void colo_preresume_dm_saved(libxl__egc *egc,
+                                    libxl__domain_save_state *dss, int rc);
+static void colo_read_svm_ready_done(libxl__egc *egc,
+                                     libxl__colo_save_state *css,
+                                     int id);
+static void colo_preresume_cb(libxl__egc *egc,
+                              libxl__checkpoint_devices_state *cds,
+                              int rc);
+static void colo_read_svm_resumed_done(libxl__egc *egc,
+                                       libxl__colo_save_state *css,
+                                       int id);
+
+void libxl__colo_save_domain_resume_callback(void *data)
+{
+    libxl__save_helper_state *shs = data;
+    libxl__egc *egc = shs->egc;
+    libxl__stream_write_state *sws = CONTAINER_OF(shs, *sws, shs);
+    libxl__domain_save_state *dss = sws->dss;
+
+    /* This would go into tailbuf. */
+    if (dss->hvm) {
+        libxl__domain_save_device_model(egc, dss, colo_preresume_dm_saved);
+    } else {
+        colo_preresume_dm_saved(egc, dss, 0);
+    }
+
+    return;
+}
+
+static void colo_preresume_dm_saved(libxl__egc *egc,
+                                    libxl__domain_save_state *dss, int rc)
+{
+    /* Convenience aliases */
+    libxl__colo_save_state *const css = &dss->css;
+
+    STATE_AO_GC(css->cds.ao);
+
+    if (rc) {
+        LOG(ERROR, "Failed to save device model. Terminating COLO..");
+        goto out;
+    }
+
+    /* read COLO_SVM_READY */
+    css->callback = colo_read_svm_ready_done;
+    css->srs.checkpoint_callback = colo_common_read_stream_done;
+    libxl__stream_read_colo_context(egc, &css->srs);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, 0);
+}
+
+static void colo_read_svm_ready_done(libxl__egc *egc,
+                                     libxl__colo_save_state *css,
+                                     int id)
+{
+    libxl__domain_save_state *dss = CONTAINER_OF(css, *dss, css);
+
+    STATE_AO_GC(css->cds.ao);
+
+    if (id != COLO_SVM_READY) {
+        LOG(ERROR, "invalid section: %d, expected: %d", id, COLO_SVM_READY);
+        goto out;
+    }
+
+    css->svm_running = true;
+    css->cds.callback = colo_preresume_cb;
+    libxl__checkpoint_devices_preresume(egc, &css->cds);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, 0);
+}
+
+static void colo_preresume_cb(libxl__egc *egc,
+                              libxl__checkpoint_devices_state *cds,
+                              int rc)
+{
+    libxl__colo_save_state *css = CONTAINER_OF(cds, *css, cds);
+    libxl__domain_save_state *dss = CONTAINER_OF(css, *dss, css);
+
+    STATE_AO_GC(cds->ao);
+
+    if (rc) {
+        LOG(ERROR, "preresume fails");
+        goto out;
+    }
+
+    /* Resumes the domain and the device model */
+    if (libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1)) {
+        LOG(ERROR, "cannot resume primary vm");
+        goto out;
+    }
+
+    /* read COLO_SVM_RESUMED */
+    css->callback = colo_read_svm_resumed_done;
+    css->srs.checkpoint_callback = colo_common_read_stream_done;
+    libxl__stream_read_colo_context(egc, &css->srs);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, 0);
+}
+
+static void colo_read_svm_resumed_done(libxl__egc *egc,
+                                       libxl__colo_save_state *css,
+                                       int id)
+{
+    int ok = 0;
+    libxl__domain_save_state *dss = CONTAINER_OF(css, *dss, css);
+
+    STATE_AO_GC(css->cds.ao);
+
+    if (id != COLO_SVM_RESUMED) {
+        LOG(ERROR, "invalid section: %d, expected: %d", id, COLO_SVM_RESUMED);
+        goto out;
+    }
+
+    ok = 1;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, ok);
+}
+
+
+/* ===================== colo: wait new checkpoint ===================== */
+/*
+ * Do the following things:
+ * 1. do commit
+ * 2. wait for a new checkpoint
+ * 3. write COLO_NEW_CHECKPOINT
+ */
+static void colo_device_commit_cb(libxl__egc *egc,
+                                  libxl__checkpoint_devices_state *cds,
+                                  int rc);
+static void colo_start_new_checkpoint(libxl__egc *egc,
+                                      libxl__checkpoint_devices_state *cds,
+                                      int rc);
+
+void libxl__colo_save_domain_should_checkpoint_callback(void *data)
+{
+    libxl__save_helper_state *shs = data;
+    libxl__stream_write_state *sws = CONTAINER_OF(shs, *sws, shs);
+    libxl__domain_save_state *dss = sws->dss;
+    libxl__egc *egc = dss->sws.shs.egc;
+
+    /* Convenience aliases */
+    libxl__checkpoint_devices_state *const cds = &dss->css.cds;
+
+    cds->callback = colo_device_commit_cb;
+    libxl__checkpoint_devices_commit(egc, cds);
+}
+
+static void colo_device_commit_cb(libxl__egc *egc,
+                                  libxl__checkpoint_devices_state *cds,
+                                  int rc)
+{
+    libxl__colo_save_state *css = CONTAINER_OF(cds, *css, cds);
+    libxl__domain_save_state *dss = CONTAINER_OF(css, *dss, css);
+
+    STATE_AO_GC(cds->ao);
+
+    if (rc) {
+        LOG(ERROR, "commit fails");
+        goto out;
+    }
+
+    /* TODO: wait for a new checkpoint */
+    colo_start_new_checkpoint(egc, cds, 0);
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, 0);
+}
+
+static void colo_start_new_checkpoint(libxl__egc *egc,
+                                      libxl__checkpoint_devices_state *cds,
+                                      int rc)
+{
+    libxl__colo_save_state *css = CONTAINER_OF(cds, *css, cds);
+    libxl__domain_save_state *dss = CONTAINER_OF(css, *dss, css);
+    libxl_sr_colo_context colo_context = { .id = COLO_NEW_CHECKPOINT };
+
+    if (rc)
+        goto out;
+
+    /* write COLO_NEW_CHECKPOINT */
+    css->callback = NULL;
+    dss->sws.checkpoint_callback = colo_common_write_stream_done;
+    libxl__stream_write_colo_context(egc, &dss->sws, &colo_context);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, 0);
+}
+
+
+/* ===================== colo: common callback ===================== */
+static void colo_common_write_stream_done(libxl__egc *egc,
+                                          libxl__stream_write_state *stream,
+                                          int rc)
+{
+    libxl__domain_save_state *dss = CONTAINER_OF(stream, *dss, sws);
+    int ok;
+
+    /* Convenience aliases */
+    libxl__colo_save_state *const css = &dss->css;
+
+    STATE_AO_GC(stream->ao);
+
+    if (rc < 0) {
+        /* TODO: it may be an internal error, but we don't know */
+        LOG(ERROR, "sending data fails");
+        ok = 2;
+        goto out;
+    }
+
+    if (!css->callback) {
+        /* Everything is OK */
+        ok = 1;
+        goto out;
+    }
+
+    css->callback(egc, css, 0);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, ok);
+}
+
+static void colo_common_read_stream_done(libxl__egc *egc,
+                                         libxl__stream_read_state *stream,
+                                         int rc)
+{
+    libxl__colo_save_state *css = CONTAINER_OF(stream, *css, srs);
+    libxl__domain_save_state *dss = CONTAINER_OF(css, *dss, css);
+    int ok;
+
+    STATE_AO_GC(stream->ao);
+
+    if (rc < 0) {
+        /* TODO: it may be an internal error, but we don't know */
+        LOG(ERROR, "reading data fails");
+        ok = 2;
+        goto out;
+    }
+
+    if (!css->callback) {
+        /* Everything is OK */
+        ok = 1;
+        goto out;
+    }
+
+    /* rc contains the id */
+    css->callback(egc, css, rc);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, ok);
+}
diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
index 25813ce..851dd24 100644
--- a/tools/libxl/libxl_dom_save.c
+++ b/tools/libxl/libxl_dom_save.c
@@ -16,6 +16,7 @@
 #include "libxl_osdeps.h" /* must come before any other headers */
 
 #include "libxl_internal.h"
+#include "libxl_colo.h"
 
 struct libxl__physmap_info {
     uint64_t phys_offset;
@@ -437,6 +438,11 @@ void libxl__domain_save(libxl__egc *egc, libxl__domain_save_state *dss)
         callbacks->suspend = libxl__remus_domain_suspend_callback;
         callbacks->postcopy = libxl__remus_domain_resume_callback;
         callbacks->checkpoint = libxl__remus_domain_save_checkpoint_callback;
+    } else if (dss->checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_COLO) {
+        callbacks->suspend = libxl__colo_save_domain_suspend_callback;
+        callbacks->postcopy = libxl__colo_save_domain_resume_callback;
+        callbacks->checkpoint = libxl__colo_save_domain_checkpoint_callback;
+        callbacks->should_checkpoint = libxl__colo_save_domain_should_checkpoint_callback;
     } else
         callbacks->suspend = libxl__domain_suspend_callback;
 
@@ -573,12 +579,15 @@ static void domain_save_done(libxl__egc *egc,
     }
 
     /*
-     * With Remus, if we reach this point, it means either
+     * With Remus/COLO, if we reach this point, it means either
      * backup died or some network error occurred preventing us
      * from sending checkpoints. Teardown the network buffers and
      * release netlink resources.  This is an async op.
      */
-    libxl__remus_teardown(egc, &dss->rs, rc);
+    if (libxl_defbool_val(dss->remus->colo))
+        libxl__colo_save_teardown(egc, &dss->css, rc);
+    else
+        libxl__remus_teardown(egc, &dss->rs, rc);
 }
 
 /*========================= Domain restore ============================*/
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 597866d..c429852 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2745,7 +2745,7 @@ typedef struct libxl__save_helper_state {
 /*
  * The abstract checkpoint device layer exposes a common
  * set of API to [external] libxl for manipulating devices attached to
- * a guest protected by Remus. The device layer also exposes a set of
+ * a guest protected by Remus/COLO. The device layer also exposes a set of
  * [internal] interfaces that every device type must implement.
  *
  * The following API are exposed to libxl:
@@ -2763,7 +2763,7 @@ typedef struct libxl__save_helper_state {
  *  +libxl__checkpoint_devices_commit
  *
  * Each device type needs to implement the interfaces specified in
- * the libxl__checkpoint_device_instance_ops if it wishes to support Remus.
+ * the libxl__checkpoint_device_instance_ops if it wishes to support Remus/COLO.
  *
  * The high-level control flow through the checkpoint device layer is shown
  * below:
@@ -2783,7 +2783,7 @@ typedef struct libxl__checkpoint_device_instance_ops libxl__checkpoint_device_in
 
 /*
  * Interfaces to be implemented by every device subkind that wishes to
- * support Remus. Functions must be implemented unless otherwise
+ * support Remus/COLO. Functions must be implemented unless otherwise
  * stated. Many of these functions are asynchronous. They call
  * dev->aodev.callback when done.  The actual implementations may be
  * synchronous and call dev->aodev.callback directly (as the last
@@ -2962,6 +2962,89 @@ static inline bool libxl__conversion_helper_inuse
                     (const libxl__conversion_helper_state *chs)
 { return libxl__ev_child_inuse(&chs->child); }
 
+/* State for manipulating a libxl migration v2 stream */
+typedef struct libxl__domain_create_state libxl__domain_create_state;
+
+typedef void libxl__domain_create_cb(libxl__egc *egc,
+                                     libxl__domain_create_state*,
+                                     int rc, uint32_t domid);
+
+typedef struct libxl__stream_read_state libxl__stream_read_state;
+
+typedef struct libxl__sr_record_buf {
+    /* private to stream read helper */
+    LIBXL_STAILQ_ENTRY(struct libxl__sr_record_buf) entry;
+    libxl__sr_rec_hdr hdr;
+    void *body; /* iff hdr.length != 0 */
+} libxl__sr_record_buf;
+
+struct libxl__stream_read_state {
+    /* filled by the user */
+    libxl__ao *ao;
+    libxl__domain_create_state *dcs;
+    int fd;
+    bool legacy;
+    bool back_channel;
+    void (*completion_callback)(libxl__egc *egc,
+                                libxl__stream_read_state *srs,
+                                int rc);
+    void (*checkpoint_callback)(libxl__egc *egc,
+                                libxl__stream_read_state *srs,
+                                int rc);
+    /* Private */
+    int rc;
+    bool running;
+    bool in_checkpoint;
+    bool in_colo_context;
+    libxl__save_helper_state shs;
+    libxl__conversion_helper_state chs;
+
+    /* Main stream-reading data. */
+    libxl__datacopier_state dc; /* Only used when reading a record */
+    libxl__sr_hdr hdr;
+    LIBXL_STAILQ_HEAD(, libxl__sr_record_buf) record_queue; /* NOGC */
+    enum {
+        SRS_PHASE_NORMAL,
+        SRS_PHASE_BUFFERING,
+        SRS_PHASE_UNBUFFERING,
+    } phase;
+    bool recursion_guard;
+
+    /* Only used while actively reading a record from the stream. */
+    libxl__sr_record_buf *incoming_record; /* NOGC */
+
+    /* Both only used when processing an EMULATOR record. */
+    libxl__datacopier_state emu_dc;
+    libxl__carefd *emu_carefd;
+};
+
+_hidden void libxl__stream_read_init(libxl__stream_read_state *stream);
+_hidden void libxl__stream_read_start(libxl__egc *egc,
+                                      libxl__stream_read_state *stream);
+_hidden void libxl__stream_read_start_checkpoint(libxl__egc *egc,
+                                                 libxl__stream_read_state *stream);
+_hidden void libxl__stream_read_colo_context(libxl__egc *egc,
+                                             libxl__stream_read_state *stream);
+_hidden void libxl__stream_read_abort(libxl__egc *egc,
+                                      libxl__stream_read_state *stream, int rc);
+static inline bool
+libxl__stream_read_inuse(const libxl__stream_read_state *stream)
+{
+    return stream->running;
+}
+
+/*----- colo related state structure -----*/
+typedef struct libxl__colo_save_state libxl__colo_save_state;
+struct libxl__colo_save_state {
+    libxl__checkpoint_devices_state cds;
+    int send_fd;
+    int recv_fd;
+
+    /* private */
+    libxl__stream_read_state srs;
+    void (*callback)(libxl__egc *, libxl__colo_save_state *, int);
+    bool svm_running;
+};
 
 /*----- Domain suspend (save) state structure -----*/
 /*
@@ -3089,7 +3172,12 @@ struct libxl__domain_save_state {
     int hvm;
     int xcflags;
     libxl__domain_suspend_state dsps;
-    libxl__remus_state rs;
+    union {
+        /* for Remus */
+        libxl__remus_state rs;
+        /* for COLO */
+        libxl__colo_save_state css;
+    };
     libxl__stream_write_state sws;
     libxl__logdirty_switch logdirty;
     /* private for libxl__domain_save_device_model */
@@ -3336,77 +3424,6 @@ _hidden int libxl__destroy_qdisk_backend(libxl__gc *gc, uint32_t domid);
 
 /*----- Domain creation -----*/
 
-typedef struct libxl__domain_create_state libxl__domain_create_state;
-
-typedef void libxl__domain_create_cb(libxl__egc *egc,
-                                     libxl__domain_create_state*,
-                                     int rc, uint32_t domid);
-
-/* State for manipulating a libxl migration v2 stream */
-typedef struct libxl__stream_read_state libxl__stream_read_state;
-
-typedef struct libxl__sr_record_buf {
-    /* private to stream read helper */
-    LIBXL_STAILQ_ENTRY(struct libxl__sr_record_buf) entry;
-    libxl__sr_rec_hdr hdr;
-    void *body; /* iff hdr.length != 0 */
-} libxl__sr_record_buf;
-
-struct libxl__stream_read_state {
-    /* filled by the user */
-    libxl__ao *ao;
-    libxl__domain_create_state *dcs;
-    int fd;
-    bool legacy;
-    bool back_channel;
-    void (*completion_callback)(libxl__egc *egc,
-                                libxl__stream_read_state *srs,
-                                int rc);
-    void (*checkpoint_callback)(libxl__egc *egc,
-                                libxl__stream_read_state *srs,
-                                int rc);
-    /* Private */
-    int rc;
-    bool running;
-    bool in_checkpoint;
-    bool in_colo_context;
-    libxl__save_helper_state shs;
-    libxl__conversion_helper_state chs;
-
-    /* Main stream-reading data. */
-    libxl__datacopier_state dc; /* Only used when reading a record */
-    libxl__sr_hdr hdr;
-    LIBXL_STAILQ_HEAD(, libxl__sr_record_buf) record_queue; /* NOGC */
-    enum {
-        SRS_PHASE_NORMAL,
-        SRS_PHASE_BUFFERING,
-        SRS_PHASE_UNBUFFERING,
-    } phase;
-    bool recursion_guard;
-
-    /* Only used while actively reading a record from the stream. */
-    libxl__sr_record_buf *incoming_record; /* NOGC */
-
-    /* Both only used when processing an EMULATOR record. */
-    libxl__datacopier_state emu_dc;
-    libxl__carefd *emu_carefd;
-};
-
-_hidden void libxl__stream_read_init(libxl__stream_read_state *stream);
-_hidden void libxl__stream_read_start(libxl__egc *egc,
-                                      libxl__stream_read_state *stream);
-_hidden void libxl__stream_read_start_checkpoint(libxl__egc *egc,
-                                                 libxl__stream_read_state *stream);
-_hidden void libxl__stream_read_colo_context(libxl__egc *egc,
-                                             libxl__stream_read_state *stream);
-_hidden void libxl__stream_read_abort(libxl__egc *egc,
-                                      libxl__stream_read_state *stream, int rc);
-static inline bool
-libxl__stream_read_inuse(const libxl__stream_read_state *stream)
-{
-    return stream->running;
-}
-
 /* colo related structure */
 typedef struct libxl__colo_restore_state libxl__colo_restore_state;
 typedef void libxl__colo_callback(libxl__egc *,
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 1d676ef..19bd2c1 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -721,6 +721,7 @@ libxl_domain_remus_info = Struct("domain_remus_info",[
     ("netbuf",       libxl_defbool),
     ("netbufscript", string),
     ("diskbuf",      libxl_defbool),
+    ("colo",         libxl_defbool)
     ])
 
 libxl_event_type = Enumeration("event_type", [
-- 
1.9.1


* [PATCH v8 --for 4.6 COLO 13/25] libxc/restore: support COLO restore
  2015-07-15  9:18 [PATCH v8 --for 4.6 COLO 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (11 preceding siblings ...)
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 12/25] primary " Yang Hongyang
@ 2015-07-15  9:18 ` Yang Hongyang
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 14/25] libxc/restore: send dirty bitmap to primary when checkpoint under colo Yang Hongyang
                   ` (11 subsequent siblings)
  24 siblings, 0 replies; 46+ messages in thread
From: Yang Hongyang @ 2015-07-15  9:18 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

At each checkpoint, once the secondary VM's state is consistent with
the primary's, call the postcopy (resume), should_checkpoint and
suspend callbacks, in that order.
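
For illustration only (not part of the patch): a minimal standalone
sketch of the return-value convention that handle_checkpoint() applies
to these callbacks. A return of 1 means success, 2 means the channel to
the peer is broken, and anything else is an unspecified error. The
struct colo_callbacks type, map_callback_ret(), colo_checkpoint_round()
and the cb_* helpers are hypothetical names, and the numeric value of
BROKEN_CHANNEL is an assumption, since the patch only references the
name.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value for illustration; the patch only uses the name. */
#define BROKEN_CHANNEL 2

/* Hypothetical stand-in for the restore callbacks used under COLO. */
struct colo_callbacks {
    int (*postcopy)(void *data);          /* resume secondary vm */
    int (*should_checkpoint)(void *data); /* wait for a new checkpoint */
    int (*suspend)(void *data);           /* suspend secondary vm */
    void *data;
};

/* Map a callback return value to an rc: 1 -> 0, 2 -> BROKEN_CHANNEL, else -1. */
static int map_callback_ret(int ret)
{
    if ( ret == 1 )
        return 0;
    if ( ret == 2 )
        return BROKEN_CHANNEL;
    return -1;
}

/* One round: resume, wait, suspend. Stops at the first failing callback. */
static int colo_checkpoint_round(const struct colo_callbacks *cb)
{
    int rc;

    rc = map_callback_ret(cb->postcopy(cb->data));
    if ( rc )
        return rc;

    rc = map_callback_ret(cb->should_checkpoint(cb->data));
    if ( rc )
        return rc;

    return map_callback_ret(cb->suspend(cb->data));
}

/* Trivial callbacks; each counts its invocation through data. */
static int cb_ok(void *data)     { ++*(int *)data; return 1; }
static int cb_broken(void *data) { ++*(int *)data; return 2; }
static int cb_fail(void *data)   { ++*(int *)data; return 0; }
```

With this shape, a broken channel reported by any callback ends the
round early and is kept distinct from a generic failure.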

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
---
 tools/libxc/xc_sr_common.h  | 16 ++++++++++--
 tools/libxc/xc_sr_restore.c | 60 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 74 insertions(+), 2 deletions(-)

diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index 632160e..c5603ff 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -167,6 +167,18 @@ struct xc_sr_context
 
     xc_dominfo_t dominfo;
 
+    /*
+     * migration stream
+     * 0: Plain VM
+     * 1: Remus
+     * 2: COLO
+     */
+    enum {
+        MIG_STREAM_PLAIN,
+        MIG_STREAM_REMUS,
+        MIG_STREAM_COLO,
+    } migration_stream;
+
     union /* Common save or restore data. */
     {
         struct /* Save data. */
@@ -209,13 +221,13 @@ struct xc_sr_context
             uint32_t guest_page_size;
 
             /* Plain VM, or checkpoints over time. */
-            bool checkpointed;
+            int checkpointed;
 
             /* Currently buffering records between a checkpoint */
             bool buffer_all_records;
 
 /*
- * With Remus, we buffer the records sent by the primary at checkpoint,
+ * With Remus/COLO, we buffer the records sent by the primary at checkpoint,
  * in case the primary will fail, we can recover from the last
  * checkpoint state.
  * This should be enough for most of the cases because primary only send
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index d53694b..696bf30 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -454,6 +454,49 @@ static int handle_checkpoint(struct xc_sr_context *ctx)
     else
         ctx->restore.buffer_all_records = true;
 
+    if ( ctx->restore.checkpointed == MIG_STREAM_COLO )
+    {
+#define HANDLE_CALLBACK_RETURN_VALUE(ret)                   \
+    do {                                                    \
+        if ( ret == 1 )                                     \
+            rc = 0; /* Success */                           \
+        else                                                \
+        {                                                   \
+            if ( ret == 2 )                                 \
+                rc = BROKEN_CHANNEL;                        \
+            else                                            \
+                rc = -1; /* Some unspecified error */       \
+            goto err;                                       \
+        }                                                   \
+    } while (0)
+
+        /* COLO */
+
+        /* We need to resume guest */
+        rc = ctx->restore.ops.stream_complete(ctx);
+        if ( rc )
+            goto err;
+
+        /* TODO: call restore_results */
+
+        /* Resume secondary vm */
+        ret = ctx->restore.callbacks->postcopy(ctx->restore.callbacks->data);
+        HANDLE_CALLBACK_RETURN_VALUE(ret);
+
+        /* Wait for a new checkpoint */
+        ret = ctx->restore.callbacks->should_checkpoint(
+                                                ctx->restore.callbacks->data);
+        HANDLE_CALLBACK_RETURN_VALUE(ret);
+
+        /* suspend secondary vm */
+        ret = ctx->restore.callbacks->suspend(ctx->restore.callbacks->data);
+        HANDLE_CALLBACK_RETURN_VALUE(ret);
+
+#undef HANDLE_CALLBACK_RETURN_VALUE
+
+        /* TODO: send dirty bitmap to primary */
+    }
+
  err:
     return rc;
 }
@@ -625,6 +668,15 @@ static int restore(struct xc_sr_context *ctx)
     } while ( rec.type != REC_TYPE_END );
 
  remus_failover:
+
+    if ( ctx->restore.checkpointed == MIG_STREAM_COLO )
+    {
+        /* With COLO, we have already called stream_complete */
+        rc = 0;
+        IPRINTF("COLO Failover");
+        goto done;
+    }
+
     /*
      * With Remus, if we reach here, there must be some error on primary,
      * failover from the last checkpoint state.
@@ -679,6 +731,14 @@ int xc_domain_restore2(xc_interface *xch, int io_fd, uint32_t dom,
     if ( checkpointed_stream )
         assert(callbacks->checkpoint);
 
+    if ( ctx.restore.checkpointed == MIG_STREAM_COLO )
+    {
+        /* this is COLO restore */
+        assert(callbacks->suspend &&
+               callbacks->postcopy &&
+               callbacks->should_checkpoint);
+    }
+
     IPRINTF("In experimental %s", __func__);
     DPRINTF("fd %d, dom %u, hvm %u, pae %u, superpages %d"
             ", checkpointed_stream %d", io_fd, dom, hvm, pae,
-- 
1.9.1


* [PATCH v8 --for 4.6 COLO 14/25] libxc/restore: send dirty bitmap to primary when checkpoint under colo
  2015-07-15  9:18 [PATCH v8 --for 4.6 COLO 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (12 preceding siblings ...)
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 13/25] libxc/restore: support COLO restore Yang Hongyang
@ 2015-07-15  9:18 ` Yang Hongyang
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 15/25] send store mfn and console mfn to xl before resuming secondary vm Yang Hongyang
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 46+ messages in thread
From: Yang Hongyang @ 2015-07-15  9:18 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

At each COLO checkpoint, send the secondary's dirty bitmap to the primary.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
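For readers who want the idea in isolation: the commit message above boils down to walking the dirty bitmap and emitting one pfn per set bit, which then becomes the payload of a dirty bitmap record. The following is only a minimal standalone sketch of that conversion, not the libxc code. The names bit_is_set() and dirty_bitmap_to_pfns() are hypothetical, and the record header and writev() framing are left out.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for libxc's test_bit(). */
static int bit_is_set(const unsigned long *bitmap, uint64_t nr)
{
    return (bitmap[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL;
}

/*
 * Convert a dirty bitmap covering nbits pfns into a malloc()ed list of
 * dirty pfns. Returns the number of entries, or -1 on allocation failure.
 * The caller frees *out.
 */
static long dirty_bitmap_to_pfns(const unsigned long *bitmap, uint64_t nbits,
                                 uint64_t **out)
{
    uint64_t i, *pfns;
    long count = 0, written = 0;

    assert(bitmap && out);

    /* First pass: count the dirty pfns so the list can be sized exactly. */
    for ( i = 0; i < nbits; i++ )
        if ( bit_is_set(bitmap, i) )
            count++;

    /* malloc(0) may return NULL, so always ask for at least one slot. */
    pfns = malloc((size_t)(count ? count : 1) * sizeof(*pfns));
    if ( !pfns )
        return -1;

    /* Second pass: record each dirty pfn in ascending order. */
    for ( i = 0; i < nbits; i++ )
        if ( bit_is_set(bitmap, i) )
            pfns[written++] = i;

    *out = pfns;
    return count;
}
```

The two-pass shape mirrors the patch: the payload length (count * sizeof(uint64_t)) has to be known before the record header is written.
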
 tools/libxc/xc_sr_common.h  |   4 ++
 tools/libxc/xc_sr_restore.c | 120 +++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 123 insertions(+), 1 deletion(-)

diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index c5603ff..7fc2021 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -213,6 +213,10 @@ struct xc_sr_context
             struct xc_sr_restore_ops ops;
             struct restore_callbacks *callbacks;
 
+            int send_fd;
+            unsigned long p2m_size;
+            xc_hypercall_buffer_t dirty_bitmap_hbuf;
+
             /* From Image Header. */
             uint32_t format_version;
 
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index 696bf30..8b13d8d 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -409,6 +409,92 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
     return rc;
 }
 
+/*
+ * Send dirty_bitmap to primary.
+ */
+static int send_dirty_bitmap(struct xc_sr_context *ctx)
+{
+    xc_interface *xch = ctx->xch;
+    int rc = -1;
+    unsigned count, written;
+    uint64_t i, *pfns = NULL;
+    struct iovec *iov = NULL;
+    xc_shadow_op_stats_t stats = { 0, ctx->restore.p2m_size };
+    struct xc_sr_record rec =
+    {
+        .type = REC_TYPE_DIRTY_BITMAP,
+    };
+    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
+                                    &ctx->restore.dirty_bitmap_hbuf);
+
+    if ( xc_shadow_control(
+             xch, ctx->domid, XEN_DOMCTL_SHADOW_OP_CLEAN,
+             HYPERCALL_BUFFER(dirty_bitmap), ctx->restore.p2m_size,
+             NULL, 0, &stats) != ctx->restore.p2m_size )
+    {
+        PERROR("Failed to retrieve logdirty bitmap");
+        goto err;
+    }
+
+    for ( i = 0, count = 0; i < ctx->restore.p2m_size; i++ )
+    {
+        if ( test_bit(i, dirty_bitmap) )
+            count++;
+    }
+
+
+    pfns = malloc(count * sizeof(*pfns));
+    if ( !pfns )
+    {
+        ERROR("Unable to allocate %zu bytes of memory for dirty pfn list",
+              count * sizeof(*pfns));
+        goto err;
+    }
+
+    for ( i = 0, written = 0; i < ctx->restore.p2m_size; ++i )
+    {
+        if ( !test_bit(i, dirty_bitmap) )
+            continue;
+
+        if ( written >= count )
+        {
+            ERROR("Dirty pfn list exceeds the expected count");
+            goto err;
+        }
+
+        pfns[written++] = i;
+    }
+
+    /* iovec[] for writev(). */
+    iov = malloc(3 * sizeof(*iov));
+    if ( !iov )
+    {
+        ERROR("Unable to allocate memory for sending dirty bitmap");
+        goto err;
+    }
+
+    rec.length = count * sizeof(*pfns);
+
+    iov[0].iov_base = &rec.type;
+    iov[0].iov_len = sizeof(rec.type);
+
+    iov[1].iov_base = &rec.length;
+    iov[1].iov_len = sizeof(rec.length);
+
+    iov[2].iov_base = pfns;
+    iov[2].iov_len = count * sizeof(*pfns);
+
+    if ( writev_exact(ctx->restore.send_fd, iov, 3) )
+    {
+        PERROR("Failed to write dirty bitmap to stream");
+        goto err;
+    }
+
+    rc = 0;
+ err:
+    return rc;
+}
+
 static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec);
 static int handle_checkpoint(struct xc_sr_context *ctx)
 {
@@ -494,7 +580,9 @@ static int handle_checkpoint(struct xc_sr_context *ctx)
 
 #undef HANDLE_CALLBACK_RETURN_VALUE
 
-        /* TODO: send dirty bitmap to primary */
+        rc = send_dirty_bitmap(ctx);
+        if ( rc )
+            goto err;
     }
 
  err:
@@ -566,6 +654,21 @@ static int setup(struct xc_sr_context *ctx)
 {
     xc_interface *xch = ctx->xch;
     int rc;
+    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
+                                    &ctx->restore.dirty_bitmap_hbuf);
+
+    if ( ctx->restore.checkpointed == MIG_STREAM_COLO )
+    {
+        dirty_bitmap = xc_hypercall_buffer_alloc_pages(xch, dirty_bitmap,
+                                NRPAGES(bitmap_size(ctx->restore.p2m_size)));
+
+        if ( !dirty_bitmap )
+        {
+            ERROR("Unable to allocate memory for dirty bitmap");
+            rc = -1;
+            goto err;
+        }
+    }
 
     rc = ctx->restore.ops.setup(ctx);
     if ( rc )
@@ -599,10 +702,15 @@ static void cleanup(struct xc_sr_context *ctx)
 {
     xc_interface *xch = ctx->xch;
     unsigned i;
+    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
+                                    &ctx->restore.dirty_bitmap_hbuf);
 
     for ( i = 0; i < ctx->restore.buffered_rec_num; i++ )
         free(ctx->restore.buffered_records[i].data);
 
+    if ( ctx->restore.checkpointed == MIG_STREAM_COLO )
+        xc_hypercall_buffer_free_pages(xch, dirty_bitmap,
+                                   NRPAGES(bitmap_size(ctx->restore.p2m_size)));
     free(ctx->restore.buffered_records);
     free(ctx->restore.populated_pfns);
     if ( ctx->restore.ops.cleanup(ctx) )
@@ -713,6 +821,7 @@ int xc_domain_restore2(xc_interface *xch, int io_fd, uint32_t dom,
                        int checkpointed_stream,
                        struct restore_callbacks *callbacks, int back_fd)
 {
+    xen_pfn_t nr_pfns;
     struct xc_sr_context ctx =
         {
             .xch = xch,
@@ -726,6 +835,7 @@ int xc_domain_restore2(xc_interface *xch, int io_fd, uint32_t dom,
     ctx.restore.xenstore_domid = store_domid;
     ctx.restore.checkpointed = checkpointed_stream;
     ctx.restore.callbacks = callbacks;
+    ctx.restore.send_fd = back_fd;
 
     /* Sanity checks for callbacks. */
     if ( checkpointed_stream )
@@ -761,6 +871,14 @@ int xc_domain_restore2(xc_interface *xch, int io_fd, uint32_t dom,
     if ( read_headers(&ctx) )
         return -1;
 
+    if ( xc_domain_nr_gpfns(xch, dom, &nr_pfns) < 0 )
+    {
+        PERROR("Unable to obtain the guest p2m size");
+        return -1;
+    }
+
+    ctx.restore.p2m_size = nr_pfns;
+
     if ( ctx.dominfo.hvm )
     {
         ctx.restore.ops = restore_ops_x86_hvm;
-- 
1.9.1


* [PATCH v8 --for 4.6 COLO 15/25] send store mfn and console mfn to xl before resuming secondary vm
  2015-07-15  9:18 [PATCH v8 --for 4.6 COLO 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (13 preceding siblings ...)
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 14/25] libxc/restore: send dirty bitmap to primary when checkpoint under colo Yang Hongyang
@ 2015-07-15  9:18 ` Yang Hongyang
  2015-07-15 18:15   ` Andrew Cooper
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 16/25] libxc/save: support COLO save Yang Hongyang
                   ` (9 subsequent siblings)
  24 siblings, 1 reply; 46+ messages in thread
From: Yang Hongyang @ 2015-07-15  9:18 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

From: Wen Congyang <wency@cn.fujitsu.com>

We call libxl__xc_domain_restore_done() to rebuild the secondary VM, and
rebuilding it needs the store mfn and console mfn. So make restore_results
a function pointer in the callback struct and in struct
{save,restore}_callbacks, and use this callback to send the store mfn and
console mfn to xl.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
---
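To illustrate the callback shape described above, here is a small standalone sketch. It assumes a cut-down struct with only the two fields that matter here. The demo_* names and the toolstack state struct are hypothetical and are not libxl or libxc types.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-in for struct restore_callbacks; only the fields used here. */
struct demo_restore_callbacks {
    void (*restore_results)(unsigned long store_mfn, unsigned long console_mfn,
                            void *data);
    void *data; /* passed as the last argument to each callback */
};

/* Hypothetical toolstack-side state filled in by the callback. */
struct demo_toolstack_state {
    unsigned long store_mfn;
    unsigned long console_mfn;
    int have_results;
};

/* Toolstack side: remember the values for the later domain rebuild. */
static void demo_restore_results(unsigned long store_mfn,
                                 unsigned long console_mfn, void *data)
{
    struct demo_toolstack_state *st = data;

    st->store_mfn = store_mfn;
    st->console_mfn = console_mfn;
    st->have_results = 1;
}

/* Restore side: report the values just before resuming the secondary. */
static int demo_report_before_resume(const struct demo_restore_callbacks *cb,
                                     unsigned long store_gfn,
                                     unsigned long console_gfn)
{
    if ( !cb || !cb->restore_results )
        return -1;

    cb->restore_results(store_gfn, console_gfn, cb->data);
    return 0;
}
```

The patch itself asserts that the callback is present for COLO instead of returning an error; the NULL check here is only to keep the sketch safe to call on its own.
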
 tools/libxc/include/xenguest.h     | 8 ++++++++
 tools/libxc/xc_sr_restore.c        | 7 +++++--
 tools/libxl/libxl_colo_restore.c   | 5 -----
 tools/libxl/libxl_create.c         | 2 ++
 tools/libxl/libxl_save_msgs_gen.pl | 2 +-
 5 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 1e7e1bb..d7bdfb5 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -140,6 +140,14 @@ struct restore_callbacks {
      */
     int (*should_checkpoint)(void* data);
 
+    /*
+     * Callback to send the store mfn and console mfn to xl when the
+     * guest must be resumed before xc_domain_restore() returns, as
+     * COLO does.
+     */
+    void (*restore_results)(unsigned long store_mfn, unsigned long console_mfn,
+                            void *data);
+
     /* to be provided as the last argument to each callback function */
     void* data;
 };
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index 8b13d8d..fe81acb 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -563,7 +563,9 @@ static int handle_checkpoint(struct xc_sr_context *ctx)
         if ( rc )
             goto err;
 
-        /* TODO: call restore_results */
+        ctx->restore.callbacks->restore_results(ctx->restore.xenstore_gfn,
+                                                ctx->restore.console_gfn,
+                                                ctx->restore.callbacks->data);
 
         /* Resume secondary vm */
         ret = ctx->restore.callbacks->postcopy(ctx->restore.callbacks->data);
@@ -846,7 +848,8 @@ int xc_domain_restore2(xc_interface *xch, int io_fd, uint32_t dom,
         /* this is COLO restore */
         assert(callbacks->suspend &&
                callbacks->postcopy &&
-               callbacks->should_checkpoint);
+               callbacks->should_checkpoint &&
+               callbacks->restore_results);
     }
 
     IPRINTF("In experimental %s", __func__);
diff --git a/tools/libxl/libxl_colo_restore.c b/tools/libxl/libxl_colo_restore.c
index 5cda0b2..99f06ab 100644
--- a/tools/libxl/libxl_colo_restore.c
+++ b/tools/libxl/libxl_colo_restore.c
@@ -137,11 +137,6 @@ static void colo_resume_vm(libxl__egc *egc,
         return;
     }
 
-    /*
-     * TODO: get store mfn and console mfn
-     *  We should call the callback restore_results in
-     *  xc_domain_restore() before resuming the guest.
-     */
     libxl__xc_domain_restore_done(egc, dcs, 0, 0, 0);
 
     return;
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index bf4b55d..34e9362 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1080,6 +1080,8 @@ static void domcreate_bootloader_done(libxl__egc *egc,
     dcs->srs.completion_callback = domcreate_stream_done;
 
     /* colo restore setup */
+    callbacks->restore_results = libxl__srm_callout_callback_restore_results;
+
     if (checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_COLO) {
         crs->ao = ao;
         crs->domid = domid;
diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
index 7c9859b..e8943b9 100755
--- a/tools/libxl/libxl_save_msgs_gen.pl
+++ b/tools/libxl/libxl_save_msgs_gen.pl
@@ -29,7 +29,7 @@ our @msgs = (
     [  6, 'srcxA',  "should_checkpoint", [] ],
     [  7, 'scxA',   "switch_qemu_logdirty",  [qw(int domid
                                               unsigned enable)] ],
-    [  8, 'r',      "restore_results",       ['unsigned long', 'store_mfn',
+    [  8, 'rcx',    "restore_results",       ['unsigned long', 'store_mfn',
                                               'unsigned long', 'console_mfn'] ],
     [  9, 'srW',    "complete",              [qw(int retval
                                                  int errnoval)] ],
-- 
1.9.1


* [PATCH v8 --for 4.6 COLO 16/25] libxc/save: support COLO save
  2015-07-15  9:18 [PATCH v8 --for 4.6 COLO 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (14 preceding siblings ...)
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 15/25] send store mfn and console mfn to xl before resuming secondary vm Yang Hongyang
@ 2015-07-15  9:18 ` Yang Hongyang
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 17/25] implement the cmdline for COLO Yang Hongyang
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 46+ messages in thread
From: Yang Hongyang @ 2015-07-15  9:18 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

After suspending the primary VM, get the secondary VM's dirty bitmap and
send to the secondary every page that is dirty on either the primary or
the secondary.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
---
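The merge step described above can be sketched on its own as follows. This is a minimal illustration under stated assumptions, not the libxc implementation: merge_pfn_list() is a hypothetical name, the pfn list is assumed to be already read from the record payload, and the bounds check rejects any pfn that is not strictly below p2m_size.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * OR a list of dirty pfns received from the secondary into the primary's
 * dirty bitmap, which covers p2m_size pfns. Returns 0 on success, or -1 if
 * any pfn is out of range, in which case the bitmap is left untouched.
 */
static int merge_pfn_list(unsigned long *bitmap, uint64_t p2m_size,
                          const uint64_t *pfns, size_t count)
{
    size_t i;

    assert(bitmap && (pfns || count == 0));

    /* Validate everything first so a bad record cannot half-apply. */
    for ( i = 0; i < count; i++ )
        if ( pfns[i] >= p2m_size )
            return -1;

    for ( i = 0; i < count; i++ )
        bitmap[pfns[i] / BITS_PER_LONG] |= 1UL << (pfns[i] % BITS_PER_LONG);

    return 0;
}
```

After the merge, the bitmap holds the union of both sides' dirty pages, which is what the subsequent send of dirty pages walks.
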
 tools/libxc/xc_sr_common.h |   2 +
 tools/libxc/xc_sr_save.c   | 104 +++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 102 insertions(+), 4 deletions(-)

diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index 7fc2021..5f2d99b 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -183,6 +183,8 @@ struct xc_sr_context
     {
         struct /* Save data. */
         {
+            int recv_fd;
+
             struct xc_sr_save_ops ops;
             struct save_callbacks *callbacks;
 
diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index d12e5b1..6f13706 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -515,6 +515,58 @@ static int send_memory_live(struct xc_sr_context *ctx)
     return rc;
 }
 
+static int merge_secondary_dirty_bitmap(struct xc_sr_context *ctx)
+{
+    xc_interface *xch = ctx->xch;
+    struct xc_sr_record rec = { 0 };
+    uint64_t *pfns = NULL;
+    uint64_t pfn;
+    unsigned count, i;
+    int rc;
+    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
+                                    &ctx->save.dirty_bitmap_hbuf);
+
+    rc = read_record(ctx, ctx->save.recv_fd, &rec);
+    if ( rc )
+        goto err;
+
+    if ( rec.type != REC_TYPE_DIRTY_BITMAP )
+    {
+        ERROR("Expected a dirty bitmap record, but received type %u", rec.type);
+        rc = -1;
+        goto err;
+    }
+
+    if ( rec.length % sizeof(*pfns) )
+    {
+        ERROR("Invalid dirty bitmap record length %u", rec.length);
+        rc = -1;
+        goto err;
+    }
+
+    count = rec.length / sizeof(*pfns);
+    pfns = rec.data;
+
+    for ( i = 0; i < count; i++ )
+    {
+        pfn = pfns[i];
+        if ( pfn >= ctx->save.p2m_size )
+        {
+            ERROR("Invalid pfn 0x%"PRIx64, pfn);
+            rc = -1;
+            goto err;
+        }
+
+        set_bit(pfn, dirty_bitmap);
+    }
+
+    rc = 0;
+
+ err:
+    free(rec.data);
+    return rc;
+}
+
 /*
  * Suspend the domain and send dirty memory.
  * This is the last iteration of the live migration and the
@@ -555,6 +607,16 @@ static int suspend_and_send_dirty(struct xc_sr_context *ctx)
 
     bitmap_or(dirty_bitmap, ctx->save.deferred_pages, ctx->save.p2m_size);
 
+    if ( !ctx->save.live && ctx->save.checkpointed == MIG_STREAM_COLO )
+    {
+        rc = merge_secondary_dirty_bitmap(ctx);
+        if ( rc )
+        {
+            PERROR("Failed to get secondary vm's dirty pages");
+            goto out;
+        }
+    }
+
     rc = send_dirty_pages(ctx, stats.dirty_count + ctx->save.nr_deferred_pages);
     if ( rc )
         goto out;
@@ -784,11 +846,42 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type)
             if ( rc )
                 goto err;
 
-            ctx->save.callbacks->postcopy(ctx->save.callbacks->data);
+            if ( ctx->save.checkpointed == MIG_STREAM_COLO )
+            {
+                rc = ctx->save.callbacks->checkpoint(ctx->save.callbacks->data);
+                if ( !rc )
+                {
+                    rc = -1;
+                    goto err;
+                }
+            }
 
-            rc = ctx->save.callbacks->checkpoint(ctx->save.callbacks->data);
-            if ( rc <= 0 )
-                ctx->save.checkpointed = false;
+            rc = ctx->save.callbacks->postcopy(ctx->save.callbacks->data);
+            if ( !rc )
+            {
+                rc = -1;
+                goto err;
+            }
+
+            if ( ctx->save.checkpointed == MIG_STREAM_COLO )
+            {
+                rc = ctx->save.callbacks->should_checkpoint(
+                                                    ctx->save.callbacks->data);
+                if ( rc <= 0 )
+                    ctx->save.checkpointed = false;
+            }
+            else if ( ctx->save.checkpointed == MIG_STREAM_REMUS )
+            {
+                rc = ctx->save.callbacks->checkpoint(ctx->save.callbacks->data);
+                if ( rc <= 0 )
+                    ctx->save.checkpointed = false;
+            }
+            else
+            {
+                ERROR("Unknown checkpointed stream");
+                rc = -1;
+                goto err;
+            }
         }
     } while ( ctx->save.checkpointed );
 
@@ -835,6 +928,7 @@ int xc_domain_save2(xc_interface *xch, int io_fd, uint32_t dom,
     ctx.save.live  = !!(flags & XCFLAGS_LIVE);
     ctx.save.debug = !!(flags & XCFLAGS_DEBUG);
     ctx.save.checkpointed = checkpointed_stream;
+    ctx.save.recv_fd = back_fd;
 
     /*
      * TODO: Find some time to better tweak the live migration algorithm.
@@ -850,6 +944,8 @@ int xc_domain_save2(xc_interface *xch, int io_fd, uint32_t dom,
         assert(callbacks->switch_qemu_logdirty);
     if ( ctx.save.checkpointed )
         assert(callbacks->checkpoint && callbacks->postcopy);
+    if ( ctx.save.checkpointed == MIG_STREAM_COLO )
+        assert(callbacks->should_checkpoint);
 
     IPRINTF("In experimental %s", __func__);
     DPRINTF("fd %d, dom %u, max_iters %u, max_factor %u, flags %u, hvm %d",
-- 
1.9.1


* [PATCH v8 --for 4.6 COLO 17/25] implement the cmdline for COLO
  2015-07-15  9:18 [PATCH v8 --for 4.6 COLO 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (15 preceding siblings ...)
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 16/25] libxc/save: support COLO save Yang Hongyang
@ 2015-07-15  9:18 ` Yang Hongyang
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 18/25] Support colo mode for qemu disk Yang Hongyang
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 46+ messages in thread
From: Yang Hongyang @ 2015-07-15  9:18 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

From: Wen Congyang <wency@cn.fujitsu.com>

Add a new option -c to the command 'xl remus'. If you want
to use COLO HA instead of Remus HA, please use -c option.

Update man pages to reflect the addition of a new option to
'xl remus' command.

Also add a new option -c to the internal command 'xl migrate-receive'.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
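As a reading aid, the option rules this patch introduces (-c conflicts with -i and -b, compression must be off under COLO, and Remus keeps its 200 default interval) can be condensed into one function. The sketch below is an assumption-laden summary that folds the xl and libxl checks together. The demo_* types are hypothetical and do not use libxl_defbool.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical three-state flag, loosely modelled on a defbool. */
enum demo_tristate { DEMO_DEFAULT = 0, DEMO_FALSE, DEMO_TRUE };

struct demo_remus_opts {
    bool colo;                      /* -c given */
    bool blackhole;                 /* -b given */
    int interval;                   /* 0 means -i was not given */
    enum demo_tristate compression; /* DEMO_FALSE when -u was given */
};

/* Apply defaults and reject conflicting options. Returns 0 or -1. */
static int demo_finalise_opts(struct demo_remus_opts *o)
{
    if ( o->colo )
    {
        /* COLO has no checkpoint interval and cannot blackhole. */
        if ( o->interval || o->blackhole )
            return -1;

        /* Explicitly requested compression is an error under COLO. */
        if ( o->compression == DEMO_TRUE )
            return -1;

        o->compression = DEMO_FALSE;
        return 0;
    }

    /* Remus defaults: 200 interval, compression on. */
    if ( !o->interval )
        o->interval = 200;
    if ( o->compression == DEMO_DEFAULT )
        o->compression = DEMO_TRUE;

    return 0;
}
```

Treating 0 as "interval not given" follows the patch, which only applies the 200 default when COLO is off and no interval was set.
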
 docs/man/xl.pod.1         | 12 ++++++++--
 tools/libxl/libxl.c       | 23 ++++++++++++++++--
 tools/libxl/xl_cmdimpl.c  | 61 ++++++++++++++++++++++++++++++++++++-----------
 tools/libxl/xl_cmdtable.c |  4 +++-
 4 files changed, 81 insertions(+), 19 deletions(-)

diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index f22c3f3..2cd34bb 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -447,12 +447,15 @@ Print huge (!) amount of debug during the migration process.
 
 =item B<remus> [I<OPTIONS>] I<domain-id> I<host>
 
-Enable Remus HA for domain. By default B<xl> relies on ssh as a transport
-mechanism between the two hosts.
+Enable Remus HA or COLO HA for domain. By default B<xl> relies on ssh as a
+transport mechanism between the two hosts.
 
 N.B: Remus support in xl is still in experimental (proof-of-concept) phase.
      Disk replication support is limited to DRBD disks.
 
+     COLO support in xl is still in experimental (proof-of-concept) phase.
+     There is no support for network or disk at the moment.
+
 B<OPTIONS>
 
 =over 4
@@ -498,6 +501,11 @@ Disable network output buffering. Requires enabling unsafe mode.
 
 Disable disk replication. Requires enabling unsafe mode.
 
+=item B<-c>
+
+Enable COLO HA. This conflicts with B<-i> and B<-b>, and memory
+checkpoint compression must be disabled.
+
 =back
 
 =item B<pause> I<domain-id>
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index c040909..791f364 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -814,12 +814,28 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
         goto out;
     }
 
+    /* The caller must set this defbool */
+    if (libxl_defbool_is_default(info->colo)) {
+        LOG(ERROR, "colo mode must be enabled/disabled");
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
     libxl_defbool_setdefault(&info->allow_unsafe, false);
     libxl_defbool_setdefault(&info->blackhole, false);
-    libxl_defbool_setdefault(&info->compression, true);
+    libxl_defbool_setdefault(&info->compression,
+                             !libxl_defbool_val(info->colo));
     libxl_defbool_setdefault(&info->netbuf, true);
     libxl_defbool_setdefault(&info->diskbuf, true);
 
+    if (libxl_defbool_val(info->colo)) {
+        if (libxl_defbool_val(info->compression)) {
+            LOG(ERROR, "cannot use memory checkpoint compression in COLO mode");
+            rc = ERROR_FAIL;
+            goto out;
+        }
+    }
+
     if (!libxl_defbool_val(info->allow_unsafe) &&
         (libxl_defbool_val(info->blackhole) ||
          !libxl_defbool_val(info->netbuf) ||
@@ -841,7 +857,10 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
     dss->live = 1;
     dss->debug = 0;
     dss->remus = info;
-    dss->checkpointed_stream = LIBXL_CHECKPOINTED_STREAM_REMUS;
+    if (libxl_defbool_val(info->colo))
+        dss->checkpointed_stream = LIBXL_CHECKPOINTED_STREAM_COLO;
+    else
+        dss->checkpointed_stream = LIBXL_CHECKPOINTED_STREAM_REMUS;
 
     assert(info);
 
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index ace4a65..45ec435 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -4292,6 +4292,8 @@ static void migrate_receive(int debug, int daemonize, int monitor,
     char rc_buf;
     char *migration_domname;
     struct domain_create dom_info;
+    const char *ha = checkpointed == LIBXL_CHECKPOINTED_STREAM_COLO ?
+                     "COLO" : "Remus";
 
     signal(SIGPIPE, SIG_IGN);
     /* if we get SIGPIPE we'd rather just have it as an error */
@@ -4312,6 +4314,9 @@ static void migrate_receive(int debug, int daemonize, int monitor,
     dom_info.send_fd = send_fd;
     dom_info.migration_domname_r = &migration_domname;
     dom_info.checkpointed_stream = checkpointed;
+    if (checkpointed == LIBXL_CHECKPOINTED_STREAM_COLO)
+        /* COLO uses stdout to send control message to master */
+        dom_info.quiet = 1;
 
     rc = create_domain(&dom_info);
     if (rc < 0) {
@@ -4326,8 +4331,8 @@ static void migrate_receive(int debug, int daemonize, int monitor,
         /* If we are here, it means that the sender (primary) has crashed.
          * TODO: Split-Brain Check.
          */
-        fprintf(stderr, "migration target: Remus Failover for domain %u\n",
-                domid);
+        fprintf(stderr, "migration target: %s Failover for domain %u\n",
+                ha, domid);
 
         /*
          * If domain renaming fails, lets just continue (as we need the domain
@@ -4343,16 +4348,20 @@ static void migrate_receive(int debug, int daemonize, int monitor,
             rc = libxl_domain_rename(ctx, domid, migration_domname,
                                      common_domname);
             if (rc)
-                fprintf(stderr, "migration target (Remus): "
+                fprintf(stderr, "migration target (%s): "
                         "Failed to rename domain from %s to %s:%d\n",
-                        migration_domname, common_domname, rc);
+                        ha, migration_domname, common_domname, rc);
         }
 
+        if (checkpointed == LIBXL_CHECKPOINTED_STREAM_COLO)
+            /* The guest is running after failover in COLO mode */
+            exit(rc ? -ERROR_FAIL: 0);
+
         rc = libxl_domain_unpause(ctx, domid);
         if (rc)
-            fprintf(stderr, "migration target (Remus): "
+            fprintf(stderr, "migration target (%s): "
                     "Failed to unpause domain %s (id: %u):%d\n",
-                    common_domname, domid, rc);
+                    ha, common_domname, domid, rc);
 
         exit(rc ? -ERROR_FAIL: 0);
     }
@@ -4498,7 +4507,7 @@ int main_migrate_receive(int argc, char **argv)
     int checkpointed = LIBXL_CHECKPOINTED_STREAM_NONE;
     int opt;
 
-    SWITCH_FOREACH_OPT(opt, "Fedr", NULL, "migrate-receive", 0) {
+    SWITCH_FOREACH_OPT(opt, "Fedrc", NULL, "migrate-receive", 0) {
     case 'F':
         daemonize = 0;
         break;
@@ -4512,6 +4521,9 @@ int main_migrate_receive(int argc, char **argv)
     case 'r':
         checkpointed = LIBXL_CHECKPOINTED_STREAM_REMUS;
         break;
+    case 'c':
+        checkpointed = LIBXL_CHECKPOINTED_STREAM_COLO;
+        break;
     }
 
     if (argc-optind != 0) {
@@ -7877,11 +7889,8 @@ int main_remus(int argc, char **argv)
     int config_len;
 
     memset(&r_info, 0, sizeof(libxl_domain_remus_info));
-    /* Defaults */
-    r_info.interval = 200;
-    libxl_defbool_setdefault(&r_info.blackhole, false);
 
-    SWITCH_FOREACH_OPT(opt, "Fbundi:s:N:e", NULL, "remus", 2) {
+    SWITCH_FOREACH_OPT(opt, "Fbundi:s:N:ec", NULL, "remus", 2) {
     case 'i':
         r_info.interval = atoi(optarg);
         break;
@@ -7909,11 +7918,32 @@ int main_remus(int argc, char **argv)
     case 'e':
         daemonize = 0;
         break;
+    case 'c':
+        libxl_defbool_set(&r_info.colo, true);
     }
 
     domid = find_domain(argv[optind]);
     host = argv[optind + 1];
 
+    /* Defaults */
+    libxl_defbool_setdefault(&r_info.blackhole, false);
+    libxl_defbool_setdefault(&r_info.colo, false);
+    if (!libxl_defbool_val(r_info.colo) && !r_info.interval)
+        r_info.interval = 200;
+
+    if (libxl_defbool_val(r_info.colo)) {
+        if (r_info.interval || libxl_defbool_val(r_info.blackhole)) {
+            fprintf(stderr, "Option -c conflicts with -i or -b\n");
+            exit(-1);
+        }
+
+        if (libxl_defbool_is_default(r_info.compression)) {
+            fprintf(stderr, "COLO can't be used with memory compression. "
+                    "Disabling memory checkpoint compression now...\n");
+            libxl_defbool_set(&r_info.compression, false);
+        }
+    }
+
     if (!r_info.netbufscript)
         r_info.netbufscript = default_remus_netbufscript;
 
@@ -7928,8 +7958,9 @@ int main_remus(int argc, char **argv)
         if (!ssh_command[0]) {
             rune = host;
         } else {
-            if (asprintf(&rune, "exec %s %s xl migrate-receive -r %s",
+            if (asprintf(&rune, "exec %s %s xl migrate-receive %s %s",
                          ssh_command, host,
+                         libxl_defbool_val(r_info.colo) ? "-c" : "-r",
                          daemonize ? "" : " -e") < 0)
                 return 1;
         }
@@ -7958,7 +7989,8 @@ int main_remus(int argc, char **argv)
      * domain to force failover
      */
     if (libxl_domain_info(ctx, 0, domid)) {
-        fprintf(stderr, "Remus: Primary domain has been destroyed.\n");
+        fprintf(stderr, "%s: Primary domain has been destroyed.\n",
+                libxl_defbool_val(r_info.colo) ? "COLO" : "Remus");
         close(send_fd);
         return 0;
     }
@@ -7970,7 +8002,8 @@ int main_remus(int argc, char **argv)
     if (rc == ERROR_GUEST_TIMEDOUT)
         fprintf(stderr, "Failed to suspend domain at primary.\n");
     else {
-        fprintf(stderr, "Remus: Backup failed? resuming domain at primary.\n");
+        fprintf(stderr, "%s: Backup failed? resuming domain at primary.\n",
+                libxl_defbool_val(r_info.colo) ? "COLO" : "Remus");
         libxl_domain_resume(ctx, domid, 1, 0);
     }
 
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index 54dbecc..7e65a11 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -499,7 +499,9 @@ struct cmd_spec cmd_table[] = {
       "-b                      Replicate memory checkpoints to /dev/null (blackhole).\n"
       "                        Works only in unsafe mode.\n"
       "-n                      Disable network output buffering. Works only in unsafe mode.\n"
-      "-d                      Disable disk replication. Works only in unsafe mode."
+      "-d                      Disable disk replication. Works only in unsafe mode.\n"
+      "-c                      Enable COLO HA. Conflicts with -i and -b, and memory\n"
+      "                        checkpoint compression must be disabled."
     },
 #endif
     { "devd",
-- 
1.9.1


* [PATCH v8 --for 4.6 COLO 18/25] Support colo mode for qemu disk
  2015-07-15  9:18 [PATCH v8 --for 4.6 COLO 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (16 preceding siblings ...)
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 17/25] implement the cmdline for COLO Yang Hongyang
@ 2015-07-15  9:18 ` Yang Hongyang
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 19/25] COLO: use qemu block replication Yang Hongyang
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 46+ messages in thread
From: Yang Hongyang @ 2015-07-15  9:18 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

From: Wen Congyang <wency@cn.fujitsu.com>

Usage: disk = ['...,colo,colo-params=xxx,active-disk=xxx,hidden-disk=xxx...']
The format of colo-params: host:port:exportname=xx
For QEMU block replication details:
http://wiki.qemu.org/Features/BlockReplication

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
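To make the colo-params format above concrete, here is a minimal standalone parser for a host:port:exportname=xx string. It is only an illustration under assumptions: demo_parse_colo_params() and its struct are hypothetical, field sizes are arbitrary, and a host containing ':' (such as a bare IPv6 address) is not handled. The series does its own parsing inside libxl.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the three pieces of a colo-params value. */
struct demo_colo_params {
    char host[64];
    int port;
    char exportname[64];
};

/*
 * Parse "host:port:exportname=name". Returns 0 on success and -1 if the
 * string is malformed, has trailing characters, or the port is out of range.
 */
static int demo_parse_colo_params(const char *s, struct demo_colo_params *p)
{
    char tail;

    if ( !s || !p )
        return -1;

    /* The final %c only converts if unexpected text follows the name. */
    if ( sscanf(s, "%63[^:]:%d:exportname=%63[^,:]%c",
                p->host, &p->port, p->exportname, &tail) != 3 )
        return -1;

    if ( p->port <= 0 || p->port > 65535 )
        return -1;

    return 0;
}
```

For example, "192.168.0.2:8889:exportname=disk0" would split into host 192.168.0.2, port 8889 and export name disk0.
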
 docs/man/xl.pod.1                   |   2 +-
 docs/misc/xl-disk-configuration.txt |  38 ++++++
 tools/libxl/libxl.c                 |  42 +++++-
 tools/libxl/libxl_create.c          |  25 +++-
 tools/libxl/libxl_device.c          |  38 ++++++
 tools/libxl/libxl_dm.c              | 257 +++++++++++++++++++++++++++++++++++-
 tools/libxl/libxl_types.idl         |   5 +
 tools/libxl/libxlu_disk_l.l         |   5 +
 8 files changed, 403 insertions(+), 9 deletions(-)

diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index 2cd34bb..1effce7 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -454,7 +454,7 @@ N.B: Remus support in xl is still in experimental (proof-of-concept) phase.
      Disk replication support is limited to DRBD disks.
 
      COLO support in xl is still in experimental (proof-of-concept) phase.
-     There is no support for network or disk at the moment.
+     There is no support for network at the moment.
 
 B<OPTIONS>
 
diff --git a/docs/misc/xl-disk-configuration.txt b/docs/misc/xl-disk-configuration.txt
index 6a2118d..e366e8d 100644
--- a/docs/misc/xl-disk-configuration.txt
+++ b/docs/misc/xl-disk-configuration.txt
@@ -234,6 +234,44 @@ were intentionally created non-sparse to avoid fragmentation of the
 file.
 
 
+===============
+COLO PARAMETERS
+===============
+
+
+colo
+----
+
+Enable COLO HA for the disk. For details of QEMU block replication,
+please refer to:
+http://wiki.qemu.org/Features/BlockReplication
+
+
+colo-params=host:port:exportname=<name>
+---------------------------------------
+
+Description:           Secondary host's address and port information.
+                       An NBD server runs on the secondary host;
+                       exportname is the NBD server's disk export name.
+Mandatory:             Yes when COLO enabled
+
+
+active-disk
+-----------
+
+Description:           Used by the secondary. The secondary guest's writes
+                       are buffered in this disk.
+Mandatory:             Yes when COLO enabled
+
+
+hidden-disk
+-----------
+
+Description:           Used by the secondary. It buffers the original
+                       content that is modified by the primary VM.
+Mandatory:             Yes when COLO enabled
+
+
 ============================================
 DEPRECATED PARAMETERS, PREFIXES AND SYNTAXES
 ============================================
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 791f364..c6cc5aa 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -2256,6 +2256,8 @@ int libxl__device_disk_setdefault(libxl__gc *gc, libxl_device_disk *disk)
     int rc;
 
     libxl_defbool_setdefault(&disk->discard_enable, !!disk->readwrite);
+    libxl_defbool_setdefault(&disk->colo_enable, false);
+    libxl_defbool_setdefault(&disk->colo_restore_enable, false);
 
     rc = libxl__resolve_domid(gc, disk->backend_domname, &disk->backend_domid);
     if (rc < 0) return rc;
@@ -2456,6 +2458,14 @@ static void device_disk_add(libxl__egc *egc, uint32_t domid,
                 flexarray_append(back, "params");
                 flexarray_append(back, libxl__sprintf(gc, "%s:%s",
                               libxl__device_disk_string_of_format(disk->format), disk->pdev_path));
+                if (libxl_defbool_val(disk->colo_enable)) {
+                    flexarray_append(back, "colo-params");
+                    flexarray_append(back, libxl__sprintf(gc, "%s", disk->colo_params));
+                    flexarray_append(back, "active-disk");
+                    flexarray_append(back, libxl__sprintf(gc, "%s", disk->active_disk));
+                    flexarray_append(back, "hidden-disk");
+                    flexarray_append(back, libxl__sprintf(gc, "%s", disk->hidden_disk));
+                }
                 assert(device->backend_kind == LIBXL__DEVICE_KIND_QDISK);
                 break;
             default:
@@ -2570,7 +2580,10 @@ static int libxl__device_disk_from_xs_be(libxl__gc *gc,
         goto cleanup;
     }
 
-    /* "params" may not be present; but everything else must be. */
+    /*
+     * "params" and "colo-params" may not be present; but everything
+     * else must be.
+     */
     tmp = xs_read(ctx->xsh, XBT_NULL,
                   libxl__sprintf(gc, "%s/params", be_path), &len);
     if (tmp && strchr(tmp, ':')) {
@@ -2580,6 +2593,33 @@ static int libxl__device_disk_from_xs_be(libxl__gc *gc,
         disk->pdev_path = tmp;
     }
 
+    tmp = xs_read(ctx->xsh, XBT_NULL,
+                  libxl__sprintf(gc, "%s/colo-params", be_path), &len);
+    if (tmp) {
+        libxl_defbool_set(&disk->colo_enable, true);
+        disk->colo_params = tmp;
+    } else {
+        libxl_defbool_set(&disk->colo_enable, false);
+    }
+
+    if (libxl_defbool_val(disk->colo_enable)) {
+        tmp = xs_read(ctx->xsh, XBT_NULL,
+                      libxl__sprintf(gc, "%s/active-disk", be_path), &len);
+        if (!tmp) {
+            LOG(ERROR, "Missing xenstore node %s/active-disk", be_path);
+            goto cleanup;
+        }
+        disk->active_disk = tmp;
+
+        tmp = xs_read(ctx->xsh, XBT_NULL,
+                      libxl__sprintf(gc, "%s/hidden-disk", be_path), &len);
+        if (!tmp) {
+            LOG(ERROR, "Missing xenstore node %s/hidden-disk", be_path);
+            goto cleanup;
+        }
+        disk->hidden_disk = tmp;
+    }
+
 
     tmp = libxl__xs_read(gc, XBT_NULL,
                          libxl__sprintf(gc, "%s/type", be_path));
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 34e9362..d99d5ef 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1647,12 +1647,29 @@ static void domain_create_cb(libxl__egc *egc,
 
     libxl__ao_complete(egc, ao, rc);
 }
-    
+
+static void set_disk_colo_restore(libxl_domain_config *d_config)
+{
+    int i;
+
+    for (i = 0; i < d_config->num_disks; i++)
+        libxl_defbool_set(&d_config->disks[i].colo_restore_enable, true);
+}
+
+static void unset_disk_colo_restore(libxl_domain_config *d_config)
+{
+    int i;
+
+    for (i = 0; i < d_config->num_disks; i++)
+        libxl_defbool_set(&d_config->disks[i].colo_restore_enable, false);
+}
+
 int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
                             uint32_t *domid,
                             const libxl_asyncop_how *ao_how,
                             const libxl_asyncprogress_how *aop_console_how)
 {
+    unset_disk_colo_restore(d_config);
     return do_domain_create(ctx, d_config, domid, -1, -1, NULL,
                             ao_how, aop_console_how);
 }
@@ -1663,6 +1680,12 @@ int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
                                 const libxl_asyncop_how *ao_how,
                                 const libxl_asyncprogress_how *aop_console_how)
 {
+    if (params->checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_COLO) {
+        set_disk_colo_restore(d_config);
+    } else {
+        unset_disk_colo_restore(d_config);
+    }
+
     return do_domain_create(ctx, d_config, domid, restore_fd, send_fd, params,
                             ao_how, aop_console_how);
 }
diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c
index 2493972..5fcd45b 100644
--- a/tools/libxl/libxl_device.c
+++ b/tools/libxl/libxl_device.c
@@ -196,6 +196,10 @@ static int disk_try_backend(disk_try_backend_args *a,
             goto bad_format;
         }
 
+        if (libxl_defbool_val(a->disk->colo_enable) ||
+            a->disk->active_disk || a->disk->hidden_disk)
+            goto bad_colo;
+
         if (a->disk->backend_domid != LIBXL_TOOLSTACK_DOMID) {
             LOG(DEBUG, "Disk vdev=%s, is using a storage driver domain, "
                        "skipping physical device check", a->disk->vdev);
@@ -218,6 +222,10 @@ static int disk_try_backend(disk_try_backend_args *a,
     case LIBXL_DISK_BACKEND_TAP:
         if (a->disk->script) goto bad_script;
 
+        if (libxl_defbool_val(a->disk->colo_enable) ||
+            a->disk->active_disk || a->disk->hidden_disk)
+            goto bad_colo;
+
         if (a->disk->is_cdrom) {
             LOG(DEBUG, "Disk vdev=%s, backend tap unsuitable for cdroms",
                        a->disk->vdev);
@@ -236,6 +244,16 @@ static int disk_try_backend(disk_try_backend_args *a,
 
     case LIBXL_DISK_BACKEND_QDISK:
         if (a->disk->script) goto bad_script;
+        if (libxl_defbool_val(a->disk->colo_enable)) {
+            if (!a->disk->colo_params)
+                goto bad_colo_params;
+
+            if (!a->disk->active_disk)
+                goto bad_active_disk;
+
+            if (!a->disk->hidden_disk)
+                goto bad_hidden_disk;
+        }
         return backend;
 
     default:
@@ -256,6 +274,26 @@ static int disk_try_backend(disk_try_backend_args *a,
     LOG(DEBUG, "Disk vdev=%s, backend %s not compatible with script=...",
         a->disk->vdev, libxl_disk_backend_to_string(backend));
     return 0;
+
+ bad_colo:
+    LOG(DEBUG, "Disk vdev=%s, backend %s not compatible with colo",
+        a->disk->vdev, libxl_disk_backend_to_string(backend));
+    return 0;
+
+ bad_colo_params:
+    LOG(DEBUG, "Disk vdev=%s, backend %s needs colo-params=... for colo",
+        a->disk->vdev, libxl_disk_backend_to_string(backend));
+    return 0;
+
+ bad_active_disk:
+    LOG(DEBUG, "Disk vdev=%s, backend %s needs active-disk=... for colo",
+        a->disk->vdev, libxl_disk_backend_to_string(backend));
+    return 0;
+
+ bad_hidden_disk:
+    LOG(DEBUG, "Disk vdev=%s, backend %s needs hidden-disk=... for colo",
+        a->disk->vdev, libxl_disk_backend_to_string(backend));
+    return 0;
 }
 
 int libxl__device_disk_set_backend(libxl__gc *gc, libxl_device_disk *disk) {
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index ad434f0..94bab23 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -427,6 +427,211 @@ static char *dm_spice_options(libxl__gc *gc,
     return opt;
 }
 
+/* colo mode */
+enum {
+    LIBXL__COLO_NONE = 0,
+    LIBXL__COLO_PRIMARY,
+    LIBXL__COLO_SECONDARY,
+};
+
+/* The format of colo-params: host:port:exportname=xx */
+static int parse_colo_params(libxl__gc *gc, const char *colo_params,
+                             const char **host, const char **port,
+                             const char **exportname)
+{
+    const char *delim;
+
+    delim = strstr(colo_params, ":");
+    if (!delim)
+        return 1;
+    if (delim == colo_params)
+        return 1;
+    *host = libxl__strndup(gc, colo_params, delim - colo_params);
+    colo_params = delim + 1;
+
+    delim = strstr(colo_params, ":");
+    if (!delim)
+        return 1;
+    if (delim == colo_params)
+        return 1;
+    *port = libxl__strndup(gc, colo_params, delim - colo_params);
+    colo_params = delim + 1;
+
+    if (strncmp(colo_params, "exportname=", strlen("exportname=")))
+        return 1;
+    *exportname = colo_params + strlen("exportname=");
+    if ((*exportname)[0] == 0)
+        return 1;
+
+    return 0;
+}
+
+static char *qemu_disk_scsi_drive_string(libxl__gc *gc, const char *pdev_path,
+                                         int unit, const char *format,
+                                         const libxl_device_disk *disk,
+                                         const char *nbd_target,
+                                         int colo_mode)
+{
+    char *drive = NULL;
+    const char *host = NULL, *port = NULL, *exportname = NULL;
+    libxl_ctx *ctx = libxl__gc_owner(gc);
+    const char *colo_params = disk->colo_params;
+    const char *active_disk = disk->active_disk;
+    const char *hidden_disk = disk->hidden_disk;
+
+    switch (colo_mode) {
+    case LIBXL__COLO_NONE:
+        drive = libxl__sprintf
+            (gc, "file=%s,if=scsi,bus=0,unit=%d,format=%s,cache=writeback",
+             pdev_path, unit, format);
+        break;
+    case LIBXL__COLO_PRIMARY:
+        /*
+         * primary:
+         *  -drive if=scsi,bus=0,unit=x,cache=writeback,driver=quorum,\
+         *  children.0.file.filename=pdev_path,\
+         *  children.0.driver=format,\
+         *  children.1.file.host=host,\
+         *  children.1.file.port=port,\
+         *  children.1.file.export=exportname,\
+         *  children.1.file.driver=nbd+colo,\
+         *  children.1.driver=raw,\
+         *  children.1.ignore-errors=on,\
+         *  read-pattern=fifo
+         */
+
+        if (parse_colo_params(gc, colo_params, &host, &port, &exportname))
+            break;
+
+        drive = libxl__sprintf
+            (gc, "if=scsi,bus=0,unit=%d,cache=writeback,driver=quorum,"
+                 "children.0.file.filename=%s,"
+                 "children.0.driver=%s,"
+                 "children.1.file.host=%s,"
+                 "children.1.file.port=%s,"
+                 "children.1.file.export=%s,"
+                 "children.1.file.driver=nbd+colo,"
+                 "children.1.driver=raw,"
+                 "children.1.ignore-errors=on,"
+                 "read-pattern=fifo",
+             unit, pdev_path, format, host, port, exportname);
+        break;
+    case LIBXL__COLO_SECONDARY:
+        /*
+         * secondary:
+         *  -drive if=scsi,bus=0,unit=x,cache=writeback,driver=qcow2+colo,\
+         *  file=active_disk,\
+         *  backing_reference.drive_id=nbd_target,\
+         *  backing_reference.hidden-disk.file.filename=hidden_disk,\
+         *  backing_reference.hidden-disk.allow-write-backing-file=on,\
+         *  export=exportname,
+         */
+
+        if (parse_colo_params(gc, colo_params, &host, &port, &exportname))
+            break;
+
+        drive = libxl__sprintf
+            (gc, "if=scsi,bus=0,unit=%d,cache=writeback,driver=qcow2+colo,"
+                 "file=%s,"
+                 "backing_reference.drive_id=%s,"
+                 "backing_reference.hidden-disk.file.filename=%s,"
+                 "backing_reference.hidden-disk.allow-write-backing-file=on,"
+                 "export=%s",
+             unit, active_disk, nbd_target, hidden_disk, exportname);
+        break;
+    default:
+        abort();
+    }
+
+    if (!drive)
+        LIBXL__LOG(ctx, LIBXL__LOG_WARNING,
+                   "colo-params is invalid for %s", pdev_path);
+    return drive;
+}
+
+static char *qemu_disk_ide_drive_string(libxl__gc *gc, const char *pdev_path,
+                                        int unit, const char *format,
+                                        const libxl_device_disk *disk,
+                                        const char *nbd_target,
+                                        int colo_mode)
+{
+    char *drive = NULL;
+    const char *host = NULL, *port = NULL, *exportname = NULL;
+    libxl_ctx *ctx = libxl__gc_owner(gc);
+    const char *colo_params = disk->colo_params;
+    const char *active_disk = disk->active_disk;
+    const char *hidden_disk = disk->hidden_disk;
+
+    switch (colo_mode) {
+    case LIBXL__COLO_NONE:
+        drive = libxl__sprintf
+            (gc, "file=%s,if=ide,index=%d,media=disk,format=%s,cache=writeback",
+             pdev_path, unit, format);
+        break;
+    case LIBXL__COLO_PRIMARY:
+        /*
+         * primary:
+         *  -drive if=ide,index=x,media=disk,cache=writeback,driver=quorum,\
+         *  children.0.file.filename=pdev_path,\
+         *  children.0.driver=format,\
+         *  children.1.file.host=host,\
+         *  children.1.file.port=port,\
+         *  children.1.file.export=exportname,\
+         *  children.1.file.driver=nbd+colo,\
+         *  children.1.driver=raw,\
+         *  children.1.ignore-errors=on,\
+         *  read-pattern=fifo
+         */
+
+        if (parse_colo_params(gc, colo_params, &host, &port, &exportname))
+            break;
+
+        drive = libxl__sprintf
+            (gc, "if=ide,index=%d,media=disk,cache=writeback,driver=quorum,"
+                 "children.0.file.filename=%s,"
+                 "children.0.driver=%s,"
+                 "children.1.file.host=%s,"
+                 "children.1.file.port=%s,"
+                 "children.1.file.export=%s,"
+                 "children.1.file.driver=nbd+colo,"
+                 "children.1.driver=raw,"
+                 "children.1.ignore-errors=on,"
+                 "read-pattern=fifo",
+             unit, pdev_path, format, host, port, exportname);
+        break;
+    case LIBXL__COLO_SECONDARY:
+        /*
+         * secondary:
+         *  -drive if=ide,index=x,media=disk,cache=writeback,driver=qcow2+colo,\
+         *  file=active_disk,\
+         *  backing_reference.drive_id=nbd_target,\
+         *  backing_reference.hidden-disk.file.filename=hidden_disk,\
+         *  backing_reference.hidden-disk.allow-write-backing-file=on,\
+         *  export=exportname,
+         */
+
+        if (parse_colo_params(gc, colo_params, &host, &port, &exportname))
+            break;
+
+        drive = libxl__sprintf
+            (gc, "if=ide,index=%d,media=disk,cache=writeback,driver=qcow2+colo,"
+                 "file=%s,"
+                 "backing_reference.drive_id=%s,"
+                 "backing_reference.hidden-disk.file.filename=%s,"
+                 "backing_reference.hidden-disk.allow-write-backing-file=on,"
+                 "export=%s",
+             unit, active_disk, nbd_target, hidden_disk, exportname);
+        break;
+    default:
+        abort();
+    }
+
+    if (!drive)
+        LIBXL__LOG(ctx, LIBXL__LOG_WARNING,
+                   "colo-params is invalid for %s", pdev_path);
+    return drive;
+}
+
 static int libxl__build_device_model_args_new(libxl__gc *gc,
                                         const char *dm, int guest_domid,
                                         const libxl_domain_config *guest_config,
@@ -827,6 +1032,8 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
             const char *format = qemu_disk_format_string(disks[i].format);
             char *drive;
             const char *pdev_path;
+            int colo_mode;
+            char *drive_id;
 
             if (dev_number == -1) {
                 LIBXL__LOG(ctx, LIBXL__LOG_WARNING, "unable to determine"
@@ -870,10 +1077,43 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
                  * For other disks we translate devices 0..3 into
                  * hd[a-d] and ignore the rest.
                  */
-                if (strncmp(disks[i].vdev, "sd", 2) == 0)
+                if (libxl_defbool_val(disks[i].colo_enable)) {
+                    if (libxl_defbool_val(disks[i].colo_restore_enable))
+                        colo_mode = LIBXL__COLO_SECONDARY;
+                    else
+                        colo_mode = LIBXL__COLO_PRIMARY;
+                } else {
+                    colo_mode = LIBXL__COLO_NONE;
+                }
+
+                if (colo_mode == LIBXL__COLO_SECONDARY) {
+                    /*
+                     * -drive if=none,driver=format,file=pdev_path,\
+                     * id=nbd_targetx
+                     */
+                    if (strncmp(disks[i].vdev, "sd", 2) == 0) {
+                        drive_id = libxl__sprintf(gc, "nbd_target%d", disk + 4);
+                    } else if (disk < 4) {
+                        drive_id = libxl__sprintf(gc, "nbd_target%d", disk);
+                    } else {
+                        continue; /* Do not emulate this disk */
+                    }
                     drive = libxl__sprintf
-                        (gc, "file=%s,if=scsi,bus=0,unit=%d,format=%s,cache=writeback",
-                         pdev_path, disk, format);
+                        (gc, "if=none,driver=%s,file=%s,id=%s",
+                         format, pdev_path, drive_id);
+
+                    flexarray_append(dm_args, "-drive");
+                    flexarray_append(dm_args, drive);
+                } else {
+                    drive_id = NULL;
+                }
+
+                if (strncmp(disks[i].vdev, "sd", 2) == 0)
+                    drive = qemu_disk_scsi_drive_string(gc, pdev_path, disk,
+                                                        format,
+                                                        &disks[i],
+                                                        drive_id,
+                                                        colo_mode);
                 else if (disk < 6 && b_info->u.hvm.hdtype == LIBXL_HDTYPE_AHCI) {
                     flexarray_vappend(dm_args, "-drive",
                         GCSPRINTF("file=%s,if=none,id=ahcidisk-%d,format=%s,cache=writeback",
@@ -882,11 +1122,16 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
                         disk, disk), NULL);
                     continue;
                 } else if (disk < 4)
-                    drive = libxl__sprintf
-                        (gc, "file=%s,if=ide,index=%d,media=disk,format=%s,cache=writeback",
-                         pdev_path, disk, format);
+                    drive = qemu_disk_ide_drive_string(gc, pdev_path, disk,
+                                                       format,
+                                                       &disks[i],
+                                                       drive_id,
+                                                       colo_mode);
                 else
                     continue; /* Do not emulate this disk */
+
+                if (!drive)
+                    continue;
             }
 
             flexarray_append(dm_args, "-drive");
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 19bd2c1..5eb9a38 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -542,6 +542,11 @@ libxl_device_disk = Struct("device_disk", [
     ("is_cdrom", integer),
     ("direct_io_safe", bool),
     ("discard_enable", libxl_defbool),
+    ("colo_enable", libxl_defbool),
+    ("colo_restore_enable", libxl_defbool),
+    ("colo_params", string),
+    ("active_disk", string),
+    ("hidden_disk", string)
     ])
 
 libxl_device_nic = Struct("device_nic", [
diff --git a/tools/libxl/libxlu_disk_l.l b/tools/libxl/libxlu_disk_l.l
index 1a5deb5..566aa1e 100644
--- a/tools/libxl/libxlu_disk_l.l
+++ b/tools/libxl/libxlu_disk_l.l
@@ -176,6 +176,11 @@ script=[^,]*,?	{ STRIP(','); SAVESTRING("script", script, FROMEQUALS); }
 direct-io-safe,? { DPC->disk->direct_io_safe = 1; }
 discard,?	{ libxl_defbool_set(&DPC->disk->discard_enable, true); }
 no-discard,?	{ libxl_defbool_set(&DPC->disk->discard_enable, false); }
+colo,?		{ libxl_defbool_set(&DPC->disk->colo_enable, true); }
+no-colo,?	{ libxl_defbool_set(&DPC->disk->colo_enable, false); }
+colo-params=[^,]*,?	{ STRIP(','); SAVESTRING("colo-params", colo_params, FROMEQUALS); }
+active-disk=[^,]*,?	{ STRIP(','); SAVESTRING("active-disk", active_disk, FROMEQUALS); }
+hidden-disk=[^,]*,?	{ STRIP(','); SAVESTRING("hidden-disk", hidden_disk, FROMEQUALS); }
 
  /* the target magic parameter, eats the rest of the string */
 
-- 
1.9.1
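
For readers following the libxl_dm.c hunk in this patch, the standalone C
sketch below restates two small decisions it makes: how the COLO mode is
derived from the colo_enable and colo_restore_enable flags, and how the
secondary's nbd_target drive id is numbered (SCSI disks are offset by 4, other
disks use indexes 0 to 3). The enum and function names here are hypothetical
stand-ins, and plain bool replaces libxl_defbool; this is not libxl code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of the patch's LIBXL__COLO_* values. */
enum colo_mode { COLO_NONE = 0, COLO_PRIMARY, COLO_SECONDARY };

/*
 * A COLO-enabled disk is the secondary when the domain is being created
 * from a COLO restore stream, and the primary otherwise.
 */
enum colo_mode colo_mode_for(bool colo_enable, bool colo_restore_enable)
{
    if (!colo_enable)
        return COLO_NONE;
    return colo_restore_enable ? COLO_SECONDARY : COLO_PRIMARY;
}

/*
 * Write "nbd_target<N>" into buf. Returns 0 on success, or -1 if the disk
 * would not be emulated or buf is too small.
 */
int nbd_target_id(const char *vdev, int disk, char *buf, size_t buflen)
{
    int n, written;

    assert(vdev != NULL && buf != NULL);

    if (strncmp(vdev, "sd", 2) == 0)
        n = disk + 4;          /* SCSI disks: ids start after the IDE range */
    else if (disk < 4)
        n = disk;              /* first four non-SCSI disks */
    else
        return -1;             /* not emulated */

    written = snprintf(buf, buflen, "nbd_target%d", n);
    if (written < 0 || (size_t)written >= buflen)
        return -1;
    return 0;
}
```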

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v8 --for 4.6 COLO 19/25] COLO: use qemu block replication
  2015-07-15  9:18 [PATCH v8 --for 4.6 COLO 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (17 preceding siblings ...)
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 18/25] Support colo mode for qemu disk Yang Hongyang
@ 2015-07-15  9:18 ` Yang Hongyang
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 20/25] COLO proxy: implement setup/teardown of COLO proxy module Yang Hongyang
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 46+ messages in thread
From: Yang Hongyang @ 2015-07-15  9:18 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

From: Wen Congyang <wency@cn.fujitsu.com>

Use qemu block replication as our block replication solution.
Note that the guest must be paused before starting COLO; otherwise
the disks won't be consistent between primary and secondary.
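
On the secondary side, the setup code in this patch passes the host:port
prefix of colo-params to libxl__qmp_block_start_replication(). A minimal
standalone sketch of that split follows, assuming the same
host:port:exportname=xx format. The function name colo_split_addr() is
hypothetical, and malloc() stands in for libxl's gc allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return a malloc'ed copy of everything before ":exportname=" and point
 * *export_name at the name that follows it (inside colo_params).
 * Returns NULL if the marker is missing, the name is empty, or
 * allocation fails. The caller frees the returned string.
 */
char *colo_split_addr(const char *colo_params, const char **export_name)
{
    static const char marker[] = ":exportname=";
    const char *m;
    size_t len;
    char *addr;

    assert(colo_params != NULL && export_name != NULL);

    m = strstr(colo_params, marker);
    if (!m || m[strlen(marker)] == '\0')
        return NULL;

    /* The address is the prefix that ends where the marker begins. */
    len = (size_t)(m - colo_params);
    addr = malloc(len + 1);
    if (!addr)
        return NULL;
    memcpy(addr, colo_params, len);
    addr[len] = '\0';

    *export_name = m + strlen(marker);
    return addr;
}
```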

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
for commit message,
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 tools/libxl/Makefile             |   1 +
 tools/libxl/libxl_colo_qdisk.c   | 209 +++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_colo_restore.c |  20 +++-
 tools/libxl/libxl_colo_save.c    |  36 ++++++-
 tools/libxl/libxl_internal.h     |  18 ++++
 tools/libxl/libxl_qmp.c          |  31 ++++++
 6 files changed, 311 insertions(+), 4 deletions(-)
 create mode 100644 tools/libxl/libxl_colo_qdisk.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 71bf7a2..e91ae79 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -64,6 +64,7 @@ endif
 
 LIBXL_OBJS-y += libxl_remus.o libxl_checkpoint_device.o libxl_remus_disk_drbd.o
 LIBXL_OBJS-y += libxl_colo_restore.o libxl_colo_save.o
+LIBXL_OBJS-y += libxl_colo_qdisk.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_psr.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o libxl_libfdt_compat.o
diff --git a/tools/libxl/libxl_colo_qdisk.c b/tools/libxl/libxl_colo_qdisk.c
new file mode 100644
index 0000000..d73572e
--- /dev/null
+++ b/tools/libxl/libxl_colo_qdisk.c
@@ -0,0 +1,209 @@
+/*
+ * Copyright (C) 2015 FUJITSU LIMITED
+ * Author: Wen Congyang <wency@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+typedef struct libxl__colo_qdisk {
+    libxl__checkpoint_device *dev;
+} libxl__colo_qdisk;
+
+/* ========== init() and cleanup() ========== */
+int init_subkind_qdisk(libxl__checkpoint_devices_state *cds)
+{
+    /*
+     * We don't know if we use qemu block replication, so
+     * we cannot start block replication here.
+     */
+    return 0;
+}
+
+void cleanup_subkind_qdisk(libxl__checkpoint_devices_state *cds)
+{
+}
+
+/* ========== setup() and teardown() ========== */
+static void colo_qdisk_setup(libxl__egc *egc, libxl__checkpoint_device *dev,
+                             bool primary)
+{
+    const libxl_device_disk *disk = dev->backend_dev;
+    const char *addr = NULL;
+    const char *export_name;
+    int ret, rc = 0;
+
+    /* Convenience aliases */
+    libxl__checkpoint_devices_state *const cds = dev->cds;
+    const char *colo_params = disk->colo_params;
+    const int domid = cds->domid;
+
+    EGC_GC;
+
+    if (disk->backend != LIBXL_DISK_BACKEND_QDISK ||
+        !libxl_defbool_val(disk->colo_enable)) {
+        rc = ERROR_CHECKPOINT_DEVOPS_DOES_NOT_MATCH;
+        goto out;
+    }
+
+    export_name = strstr(colo_params, ":exportname=");
+    if (!export_name) {
+        rc = ERROR_CHECKPOINT_DEVOPS_DOES_NOT_MATCH;
+        goto out;
+    }
+    export_name += strlen(":exportname=");
+    if (export_name[0] == 0) {
+        rc = ERROR_CHECKPOINT_DEVOPS_DOES_NOT_MATCH;
+        goto out;
+    }
+
+    dev->matched = 1;
+
+    if (primary) {
+        /* NBD server is not ready, so we cannot start block replication now */
+        goto out;
+    } else {
+        libxl__colo_restore_state *crs = CONTAINER_OF(cds, *crs, cds);
+        int len;
+
+        if (crs->qdisk_setuped)
+            goto out;
+
+        crs->qdisk_setuped = true;
+
+        len = export_name - strlen(":exportname=") - colo_params;
+        addr = libxl__strndup(gc, colo_params, len);
+    }
+
+    ret = libxl__qmp_block_start_replication(gc, domid, primary, addr);
+    if (ret)
+        rc = ERROR_FAIL;
+
+out:
+    dev->aodev.rc = rc;
+    dev->aodev.callback(egc, &dev->aodev);
+}
+
+static void colo_qdisk_teardown(libxl__egc *egc, libxl__checkpoint_device *dev,
+                                bool primary)
+{
+    int ret, rc = 0;
+
+    /* Convenience aliases */
+    libxl__checkpoint_devices_state *const cds = dev->cds;
+    const int domid = cds->domid;
+
+    EGC_GC;
+
+    if (primary) {
+        libxl__colo_save_state *css = CONTAINER_OF(cds, *css, cds);
+
+        if (!css->qdisk_setuped)
+            goto out;
+
+        css->qdisk_setuped = false;
+    } else {
+        libxl__colo_restore_state *crs = CONTAINER_OF(cds, *crs, cds);
+
+        if (!crs->qdisk_setuped)
+            goto out;
+
+        crs->qdisk_setuped = false;
+    }
+
+    ret = libxl__qmp_block_stop_replication(gc, domid, primary);
+    if (ret)
+        rc = ERROR_FAIL;
+
+out:
+    dev->aodev.rc = rc;
+    dev->aodev.callback(egc, &dev->aodev);
+}
+
+/* ========== checkpointing APIs ========== */
+/* should be called after libxl__checkpoint_device_instance_ops.preresume */
+int colo_qdisk_preresume(libxl_ctx *ctx, domid_t domid)
+{
+    GC_INIT(ctx);
+    int ret;
+
+    ret = libxl__qmp_block_do_checkpoint(gc, domid);
+
+    GC_FREE;
+    return ret;
+}
+
+static void colo_qdisk_save_preresume(libxl__egc *egc,
+                                      libxl__checkpoint_device *dev)
+{
+    libxl__colo_save_state *css = CONTAINER_OF(dev->cds, *css, cds);
+    int ret, rc = 0;
+
+    /* Convenience aliases */
+    const int domid = dev->cds->domid;
+
+    EGC_GC;
+
+    if (css->qdisk_setuped)
+        goto out;
+
+    css->qdisk_setuped = true;
+
+    ret = libxl__qmp_block_start_replication(gc, domid, true, NULL);
+    if (ret)
+        rc = ERROR_FAIL;
+
+out:
+    dev->aodev.rc = rc;
+    dev->aodev.callback(egc, &dev->aodev);
+}
+
+/* ======== primary ======== */
+static void colo_qdisk_save_setup(libxl__egc *egc,
+                                  libxl__checkpoint_device *dev)
+{
+    colo_qdisk_setup(egc, dev, true);
+}
+
+static void colo_qdisk_save_teardown(libxl__egc *egc,
+                                   libxl__checkpoint_device *dev)
+{
+    colo_qdisk_teardown(egc, dev, true);
+}
+
+const libxl__checkpoint_device_instance_ops colo_save_device_qdisk = {
+    .kind = LIBXL__DEVICE_KIND_VBD,
+    .setup = colo_qdisk_save_setup,
+    .teardown = colo_qdisk_save_teardown,
+    .preresume = colo_qdisk_save_preresume,
+};
+
+/* ======== secondary ======== */
+static void colo_qdisk_restore_setup(libxl__egc *egc,
+                                     libxl__checkpoint_device *dev)
+{
+    colo_qdisk_setup(egc, dev, false);
+}
+
+static void colo_qdisk_restore_teardown(libxl__egc *egc,
+                                      libxl__checkpoint_device *dev)
+{
+    colo_qdisk_teardown(egc, dev, false);
+}
+
+const libxl__checkpoint_device_instance_ops colo_restore_device_qdisk = {
+    .kind = LIBXL__DEVICE_KIND_VBD,
+    .setup = colo_qdisk_restore_setup,
+    .teardown = colo_qdisk_restore_teardown,
+};
diff --git a/tools/libxl/libxl_colo_restore.c b/tools/libxl/libxl_colo_restore.c
index 99f06ab..96ea0b9 100644
--- a/tools/libxl/libxl_colo_restore.c
+++ b/tools/libxl/libxl_colo_restore.c
@@ -49,7 +49,10 @@ static void libxl__colo_restore_domain_checkpoint_callback(void *data);
 static void libxl__colo_restore_domain_should_checkpoint_callback(void *data);
 static void libxl__colo_restore_domain_suspend_callback(void *data);
 
+extern const libxl__checkpoint_device_instance_ops colo_restore_device_qdisk;
+
 static const libxl__checkpoint_device_instance_ops *colo_restore_ops[] = {
+    &colo_restore_device_qdisk,
     NULL,
 };
 
@@ -148,7 +151,11 @@ static int init_device_subkind(libxl__checkpoint_devices_state *cds)
     int rc;
     STATE_AO_GC(cds->ao);
 
+    rc = init_subkind_qdisk(cds);
+    if (rc)  goto out;
+
     rc = 0;
+out:
     return rc;
 }
 
@@ -156,6 +163,8 @@ static void cleanup_device_subkind(libxl__checkpoint_devices_state *cds)
 {
     /* cleanup device subkind-specific state in the libxl ctx */
     STATE_AO_GC(cds->ao);
+
+    cleanup_subkind_qdisk(cds);
 }
 
 
@@ -215,6 +224,7 @@ void libxl__colo_restore_setup(libxl__egc *egc,
     GCNEW(crcs);
     crs->crcs = crcs;
     crcs->crs = crs;
+    crs->qdisk_setuped = false;
 
     /* setup dsps */
     crcs->dsps.ao = ao;
@@ -519,6 +529,12 @@ static void colo_restore_preresume_cb(libxl__egc *egc,
         goto out;
     }
 
+    rc = colo_qdisk_preresume(CTX, crs->domid);
+    if (rc) {
+        LOG(ERROR, "colo_qdisk_preresume() fails");
+        goto out;
+    }
+
     colo_restore_resume_vm(egc, crcs);
 
     return;
@@ -674,8 +690,8 @@ static void colo_setup_checkpoint_devices(libxl__egc *egc,
 
     STATE_AO_GC(crs->ao);
 
-    /* TODO: disk/nic support */
-    cds->device_kind_flags = 0;
+    /* TODO: nic support */
+    cds->device_kind_flags = (1 << LIBXL__DEVICE_KIND_VBD);
     cds->callback = colo_restore_setup_cds_done;
     cds->ao = ao;
     cds->domid = crs->domid;
diff --git a/tools/libxl/libxl_colo_save.c b/tools/libxl/libxl_colo_save.c
index f0ab565..1245da7 100644
--- a/tools/libxl/libxl_colo_save.c
+++ b/tools/libxl/libxl_colo_save.c
@@ -19,7 +19,10 @@
 #include "libxl_internal.h"
 #include "libxl_colo.h"
 
+extern const libxl__checkpoint_device_instance_ops colo_save_device_qdisk;
+
 static const libxl__checkpoint_device_instance_ops *colo_ops[] = {
+    &colo_save_device_qdisk,
     NULL,
 };
 
@@ -30,7 +33,11 @@ static int init_device_subkind(libxl__checkpoint_devices_state *cds)
     int rc;
     STATE_AO_GC(cds->ao);
 
+    rc = init_subkind_qdisk(cds);
+    if (rc) goto out;
+
     rc = 0;
+out:
     return rc;
 }
 
@@ -38,6 +45,8 @@ static void cleanup_device_subkind(libxl__checkpoint_devices_state *cds)
 {
     /* cleanup device subkind-specific state in the libxl ctx */
     STATE_AO_GC(cds->ao);
+
+    cleanup_subkind_qdisk(cds);
 }
 
 /* ================= colo: setup save environment ================= */
@@ -65,9 +74,11 @@ void libxl__colo_save_setup(libxl__egc *egc, libxl__colo_save_state *css)
     css->send_fd = dss->fd;
     css->recv_fd = dss->recv_fd;
     css->svm_running = false;
+    css->paused = true;
+    css->qdisk_setuped = false;
 
-    /* TODO: disk/nic support */
-    cds->device_kind_flags = 0;
+    /* TODO: nic support */
+    cds->device_kind_flags = (1 << LIBXL__DEVICE_KIND_VBD);
     cds->ops = colo_ops;
     cds->callback = colo_save_setup_done;
     cds->ao = ao;
@@ -391,12 +402,33 @@ static void colo_preresume_cb(libxl__egc *egc,
         goto out;
     }
 
+    if (!css->paused) {
+        rc = colo_qdisk_preresume(CTX, dss->domid);
+        if (rc) {
+            LOG(ERROR, "colo_qdisk_preresume() fails");
+            goto out;
+        }
+    }
+
     /* Resumes the domain and the device model */
     if (libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1)) {
         LOG(ERROR, "cannot resume primary vm");
         goto out;
     }
 
+    /*
+     * The guest should be paused before doing colo because there is
+     * no disk migration.
+     */
+    if (css->paused) {
+        rc = libxl_domain_unpause(CTX, dss->domid);
+        if (rc) {
+            LOG(ERROR, "cannot unpause primary vm");
+            goto out;
+        }
+        css->paused = false;
+    }
+
     /* read COLO_SVM_RESUMED */
     css->callback = colo_read_svm_resumed_done;
     css->srs.checkpoint_callback = colo_common_read_stream_done;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index c429852..898e42c 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1710,6 +1710,14 @@ _hidden int libxl__qmp_set_global_dirty_log(libxl__gc *gc, int domid, bool enabl
 _hidden int libxl__qmp_insert_cdrom(libxl__gc *gc, int domid, const libxl_device_disk *disk);
 /* Add a virtual CPU */
 _hidden int libxl__qmp_cpu_add(libxl__gc *gc, int domid, int index);
+/* Start block replication */
+_hidden int libxl__qmp_block_start_replication(libxl__gc *gc, int domid,
+                                               bool primary, const char *addr);
+/* Do block checkpoint */
+_hidden int libxl__qmp_block_do_checkpoint(libxl__gc *gc, int domid);
+/* Stop block replication */
+_hidden int libxl__qmp_block_stop_replication(libxl__gc *gc, int domid,
+                                              bool primary);
 /* close and free the QMP handler */
 _hidden void libxl__qmp_close(libxl__qmp_handler *qmp);
 /* remove the socket file, if the file has already been removed,
@@ -2825,6 +2833,9 @@ int init_subkind_nic(libxl__checkpoint_devices_state *cds);
 void cleanup_subkind_nic(libxl__checkpoint_devices_state *cds);
 int init_subkind_drbd_disk(libxl__checkpoint_devices_state *cds);
 void cleanup_subkind_drbd_disk(libxl__checkpoint_devices_state *cds);
+int init_subkind_qdisk(libxl__checkpoint_devices_state *cds);
+void cleanup_subkind_qdisk(libxl__checkpoint_devices_state *cds);
+int colo_qdisk_preresume(libxl_ctx *ctx, domid_t domid);
 
 typedef void libxl__checkpoint_callback(libxl__egc *,
                                         libxl__checkpoint_devices_state *,
@@ -3044,6 +3055,10 @@ struct libxl__colo_save_state {
     libxl__stream_read_state srs;
     void (*callback)(libxl__egc *, libxl__colo_save_state *, int);
     bool svm_running;
+    bool paused;
+
+    /* private, used by qdisk block replication */
+    bool qdisk_setuped;
 };
 
 /*----- Domain suspend (save) state structure -----*/
@@ -3441,6 +3456,9 @@ struct libxl__colo_restore_state {
     libxl__domain_create_cb *saved_cb;
     void *crcs;
     libxl__checkpoint_devices_state cds;
+
+    /* private, used by qdisk block replication */
+    bool qdisk_setuped;
 };
 
 struct libxl__domain_create_state {
diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c
index 080cb9f..a8e7a8f 100644
--- a/tools/libxl/libxl_qmp.c
+++ b/tools/libxl/libxl_qmp.c
@@ -977,6 +977,37 @@ int libxl__qmp_cpu_add(libxl__gc *gc, int domid, int idx)
     return qmp_run_command(gc, domid, "cpu-add", args, NULL, NULL);
 }
 
+int libxl__qmp_block_start_replication(libxl__gc *gc, int domid,
+                                       bool primary, const char *addr)
+{
+    libxl__json_object *args = NULL;
+
+    qmp_parameters_add_bool(gc, &args, "enable", true);
+    qmp_parameters_add_bool(gc, &args, "primary", primary);
+    if (!primary)
+        qmp_parameters_add_string(gc, &args, "addr", addr);
+
+    return qmp_run_command(gc, domid, "xen-set-block-replication", args,
+                           NULL, NULL);
+}
+
+int libxl__qmp_block_do_checkpoint(libxl__gc *gc, int domid)
+{
+    return qmp_run_command(gc, domid, "xen-do-block-checkpoint", NULL,
+                           NULL, NULL);
+}
+
+int libxl__qmp_block_stop_replication(libxl__gc *gc, int domid, bool primary)
+{
+    libxl__json_object *args = NULL;
+
+    qmp_parameters_add_bool(gc, &args, "enable", false);
+    qmp_parameters_add_bool(gc, &args, "primary", primary);
+
+    return qmp_run_command(gc, domid, "xen-set-block-replication", args,
+                           NULL, NULL);
+}
+
 int libxl__qmp_initializations(libxl__gc *gc, uint32_t domid,
                                const libxl_domain_config *guest_config)
 {
-- 
1.9.1

* [PATCH v8 --for 4.6 COLO 20/25] COLO proxy: implement setup/teardown of COLO proxy module
  2015-07-15  9:18 [PATCH v8 --for 4.6 COLO 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (18 preceding siblings ...)
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 19/25] COLO: use qemu block replication Yang Hongyang
@ 2015-07-15  9:18 ` Yang Hongyang
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 21/25] COLO proxy: preresume, postresume and checkpoint Yang Hongyang
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 46+ messages in thread
From: Yang Hongyang @ 2015-07-15  9:18 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

Implement setup and teardown of the COLO proxy module.
We use netlink to communicate with the proxy module.
About the colo-proxy module:
  https://lkml.org/lkml/2015/6/18/32
How to use it:
  http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
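Note (illustration only, not part of the patch): the exchange with the
proxy module is a header-only netlink request. When NLM_F_ACK is set, the
reply is an NLMSG_ERROR message whose error field is 0 for a positive
acknowledgement and a negative errno otherwise. The sketch below shows
that framing on plain buffers. colo_fill_request() and colo_parse_ack()
are hypothetical names that do not exist in libxl, and the bounds checks
are an assumption about what a defensive caller might want.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/netlink.h>

/* Hypothetical helper: build the header-only request sent to the proxy. */
static void colo_fill_request(struct nlmsghdr *h, uint16_t type,
                              uint32_t index, int want_ack)
{
    /* A request without payload is exactly one netlink header long. */
    assert(NLMSG_LENGTH(0) == sizeof(*h));

    memset(h, 0, sizeof(*h));
    h->nlmsg_len = NLMSG_LENGTH(0);
    h->nlmsg_flags = NLM_F_REQUEST;
    if (want_ack)
        h->nlmsg_flags |= NLM_F_ACK;
    h->nlmsg_type = type;
    h->nlmsg_pid = index;      /* the index the socket was bound to */
}

/*
 * Hypothetical helper: interpret a reply buffer of len bytes.
 * Returns 0 for a positive ACK or a non-error message, a positive errno
 * value if the kernel reported an error, and -1 if the buffer is malformed.
 */
static int colo_parse_ack(const void *buf, size_t len)
{
    const struct nlmsghdr *h = buf;
    const struct nlmsgerr *err;

    if (len < sizeof(*h) || h->nlmsg_len < sizeof(*h) || h->nlmsg_len > len)
        return -1;
    if (h->nlmsg_type != NLMSG_ERROR)
        return 0;
    if (h->nlmsg_len < NLMSG_LENGTH(sizeof(*err)))
        return -1;

    err = NLMSG_DATA(h);
    return -err->error;        /* the kernel stores errors as negative errno */
}
```

Checking the lengths before dereferencing the nlmsgerr payload keeps a
short or truncated reply from being read past its end.
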
 tools/libxl/Makefile           |   1 +
 tools/libxl/libxl_colo.h       |   2 +
 tools/libxl/libxl_colo_proxy.c | 210 +++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_internal.h   |  12 +++
 4 files changed, 225 insertions(+)
 create mode 100644 tools/libxl/libxl_colo_proxy.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index e91ae79..d7a3540 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -65,6 +65,7 @@ endif
 LIBXL_OBJS-y += libxl_remus.o libxl_checkpoint_device.o libxl_remus_disk_drbd.o
 LIBXL_OBJS-y += libxl_colo_restore.o libxl_colo_save.o
 LIBXL_OBJS-y += libxl_colo_qdisk.o
+LIBXL_OBJS-y += libxl_colo_proxy.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_psr.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o libxl_libfdt_compat.o
diff --git a/tools/libxl/libxl_colo.h b/tools/libxl/libxl_colo.h
index 49a430b..46ca4cf 100644
--- a/tools/libxl/libxl_colo.h
+++ b/tools/libxl/libxl_colo.h
@@ -34,4 +34,6 @@ extern void libxl__colo_save_teardown(libxl__egc *egc,
                                       libxl__colo_save_state *css,
                                       int rc);
 
+extern int colo_proxy_setup(libxl__colo_proxy_state *cps);
+extern void colo_proxy_teardown(libxl__colo_proxy_state *cps);
 #endif
diff --git a/tools/libxl/libxl_colo_proxy.c b/tools/libxl/libxl_colo_proxy.c
new file mode 100644
index 0000000..9f1243e
--- /dev/null
+++ b/tools/libxl/libxl_colo_proxy.c
@@ -0,0 +1,210 @@
+/*
+ * Copyright (C) 2015 FUJITSU LIMITED
+ * Author: Yang Hongyang <yanghy@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+#include "libxl_colo.h"
+#include <linux/netlink.h>
+
+#define NETLINK_COLO 28
+
+enum colo_netlink_op {
+    COLO_QUERY_CHECKPOINT = (NLMSG_MIN_TYPE + 1),
+    COLO_CHECKPOINT,
+    COLO_FAILOVER,
+    COLO_PROXY_INIT,
+    COLO_PROXY_RESET, /* UNUSED, will be used for continuous FT */
+};
+
+/* ========= colo-proxy: helper functions ========== */
+
+static int colo_proxy_send(libxl__colo_proxy_state *cps, uint8_t *buff, uint64_t size, int type)
+{
+    struct sockaddr_nl sa;
+    struct nlmsghdr msg;
+    struct iovec iov;
+    struct msghdr mh;
+    int ret;
+
+    STATE_AO_GC(cps->ao);
+
+    memset(&sa, 0, sizeof(sa));
+    sa.nl_family = AF_NETLINK;
+    sa.nl_pid = 0;
+    sa.nl_groups = 0;
+
+    msg.nlmsg_len = NLMSG_SPACE(0);
+    msg.nlmsg_flags = NLM_F_REQUEST;
+    if (type == COLO_PROXY_INIT) {
+        msg.nlmsg_flags |= NLM_F_ACK;
+    }
+    msg.nlmsg_seq = 0;
+    /* This is untrusted */
+    msg.nlmsg_pid = cps->index;
+    msg.nlmsg_type = type;
+
+    iov.iov_base = &msg;
+    iov.iov_len = msg.nlmsg_len;
+
+    mh.msg_name = &sa;
+    mh.msg_namelen = sizeof(sa);
+    mh.msg_iov = &iov;
+    mh.msg_iovlen = 1;
+    mh.msg_control = NULL;
+    mh.msg_controllen = 0;
+    mh.msg_flags = 0;
+
+    ret = sendmsg(cps->sock_fd, &mh, 0);
+    if (ret <= 0) {
+        LOG(ERROR, "can't send msg to kernel by netlink: %s",
+            strerror(errno));
+    }
+
+    return ret;
+}
+
+/* failure: return recvmsg() result (<= 0); success: return bytes received */
+static int64_t colo_proxy_recv(libxl__colo_proxy_state *cps, uint8_t **buff, int flags)
+{
+    struct sockaddr_nl sa;
+    struct iovec iov;
+    struct msghdr mh = {
+        .msg_name = &sa,
+        .msg_namelen = sizeof(sa),
+        .msg_iov = &iov,
+        .msg_iovlen = 1,
+    };
+    uint32_t size = 16384;
+    int64_t len = 0;
+    int ret;
+
+    STATE_AO_GC(cps->ao);
+    uint8_t *tmp = libxl__malloc(NOGC, size);
+
+    iov.iov_base = tmp;
+    iov.iov_len = size;
+next:
+    ret = recvmsg(cps->sock_fd, &mh, flags);
+    if (ret <= 0) {
+        goto out;
+    }
+
+    len += ret;
+    if (mh.msg_flags & MSG_TRUNC) {
+        size += 16384;
+        tmp = libxl__realloc(NOGC, tmp, size);
+        iov.iov_base = tmp + len;
+        iov.iov_len = size - len;
+        goto next;
+    }
+
+    *buff = tmp;
+    return len;
+
+out:
+    free(tmp);
+    *buff = NULL;
+    return ret;
+}
+
+/* ========= colo-proxy: setup and teardown ========== */
+
+int colo_proxy_setup(libxl__colo_proxy_state *cps)
+{
+    int skfd = 0;
+    struct sockaddr_nl sa;
+    struct nlmsghdr *h;
+    struct timeval tv = {0, 500000}; /* timeout for recvmsg from kernel */
+    int i = 1;
+    int ret = ERROR_FAIL;
+    uint8_t *buff = NULL;
+    int64_t size;
+
+    STATE_AO_GC(cps->ao);
+
+    skfd = socket(PF_NETLINK, SOCK_RAW, NETLINK_COLO);
+    if (skfd < 0) {
+        LOG(ERROR, "can not create a netlink socket: %s", strerror(errno));
+        goto out;
+    }
+    cps->sock_fd = skfd;
+    memset(&sa, 0, sizeof(sa));
+    sa.nl_family = AF_NETLINK;
+    sa.nl_groups = 0;
+retry:
+    sa.nl_pid = i++;
+
+    if (i > 10) {
+        LOG(ERROR, "netlink bind error");
+        goto out;
+    }
+
+    ret = bind(skfd, (struct sockaddr *)&sa, sizeof(sa));
+    if (ret < 0 && errno == EADDRINUSE) {
+        LOG(ERROR, "colo index %d is already in use", sa.nl_pid);
+        goto retry;
+    }
+
+    cps->index = sa.nl_pid;
+    ret = colo_proxy_send(cps, NULL, 0, COLO_PROXY_INIT);
+    if (ret < 0) {
+        goto out;
+    }
+    setsockopt(cps->sock_fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
+    ret = -1;
+    size = colo_proxy_recv(cps, &buff, 0);
+    /* disable SO_RCVTIMEO */
+    tv.tv_usec = 0;
+    setsockopt(cps->sock_fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
+    if (size < 0) {
+        LOG(ERROR, "Can't recv msg from kernel by netlink: %s",
+            strerror(errno));
+        goto out;
+    }
+
+    if (size) {
+        h = (struct nlmsghdr *)buff;
+
+        if (h->nlmsg_type == NLMSG_ERROR) {
+            struct nlmsgerr *err = (struct nlmsgerr *)NLMSG_DATA(h);
+            if (size - sizeof(*h) < sizeof(*err)) {
+                goto out;
+            }
+            ret = -err->error;
+            if (ret) {
+                goto out;
+            }
+        }
+    }
+
+    ret = 0;
+
+out:
+    free(buff);
+    if (ret) {
+        close(cps->sock_fd);
+        cps->sock_fd = -1;
+    }
+    return ret;
+}
+
+void colo_proxy_teardown(libxl__colo_proxy_state *cps)
+{
+    if (cps->sock_fd >= 0) {
+        close(cps->sock_fd);
+        cps->sock_fd = -1;
+    }
+}
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 898e42c..23305f2 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3045,6 +3045,15 @@ libxl__stream_read_inuse(const libxl__stream_read_state *stream)
 }
 
 /*----- colo related state structure -----*/
+typedef struct libxl__colo_proxy_state libxl__colo_proxy_state;
+struct libxl__colo_proxy_state {
+    /* set by caller of colo_proxy_setup */
+    libxl__ao *ao;
+
+    int sock_fd;
+    int index;
+};
+
 typedef struct libxl__colo_save_state libxl__colo_save_state;
 struct libxl__colo_save_state {
     libxl__checkpoint_devices_state cds;
@@ -3059,6 +3068,9 @@ struct libxl__colo_save_state {
 
     /* private, used by qdisk block replication */
     bool qdisk_setuped;
+
+    /* private, used by colo-proxy */
+    libxl__colo_proxy_state cps;
 };
 
 /*----- Domain suspend (save) state structure -----*/
-- 
1.9.1

* [PATCH v8 --for 4.6 COLO 21/25] COLO proxy: preresume, postresume and checkpoint
  2015-07-15  9:18 [PATCH v8 --for 4.6 COLO 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (19 preceding siblings ...)
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 20/25] COLO proxy: implement setup/teardown of COLO proxy module Yang Hongyang
@ 2015-07-15  9:18 ` Yang Hongyang
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 22/25] COLO nic: implement COLO nic subkind Yang Hongyang
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 46+ messages in thread
From: Yang Hongyang @ 2015-07-15  9:18 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

Implement the preresume, postresume and checkpoint operations of the
COLO proxy.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
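Note (illustration only, not part of the patch): colo_proxy_checkpoint()
maps a received netlink message onto three results: 1 (checkpoint
requested), 0 (nothing to do) and -1 (error). The sketch below shows the
same mapping on a plain buffer so it can be exercised without the kernel
module. colo_decode_checkpoint() is a hypothetical name, and the extra
check of nlmsg_len against the received size is an assumption about what
a defensive parser might add.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <linux/netlink.h>

/* Payload layout used by the patch: a single flag. */
struct colo_msg {
    bool is_checkpoint;
};

/*
 * Hypothetical helper: decode size bytes received from the proxy.
 * Returns 1 to checkpoint, 0 for no checkpoint, -1 on error.
 */
static int colo_decode_checkpoint(const void *buf, long size)
{
    const struct nlmsghdr *h = buf;
    const struct colo_msg *m;

    if (size <= 0)
        return 0;                       /* nothing pending */
    assert(buf != NULL);                /* data was received, so it must exist */

    if ((size_t)size < sizeof(*h) || h->nlmsg_len > (size_t)size)
        return -1;                      /* truncated message */
    if (h->nlmsg_type == NLMSG_ERROR)
        return -1;                      /* kernel reported an error */
    if (h->nlmsg_len < NLMSG_LENGTH(sizeof(*m)))
        return -1;                      /* payload too short */

    m = NLMSG_DATA(h);
    return m->is_checkpoint ? 1 : 0;
}
```

A caller would typically treat -1 as a reason to stop replication and 1
as a request to start a new checkpoint immediately.
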
 tools/libxl/libxl_colo.h       |  3 +++
 tools/libxl/libxl_colo_proxy.c | 57 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+)

diff --git a/tools/libxl/libxl_colo.h b/tools/libxl/libxl_colo.h
index 46ca4cf..4e5f02a 100644
--- a/tools/libxl/libxl_colo.h
+++ b/tools/libxl/libxl_colo.h
@@ -36,4 +36,7 @@ extern void libxl__colo_save_teardown(libxl__egc *egc,
 
 extern int colo_proxy_setup(libxl__colo_proxy_state *cps);
 extern void colo_proxy_teardown(libxl__colo_proxy_state *cps);
+extern void colo_proxy_preresume(libxl__colo_proxy_state *cps);
+extern void colo_proxy_postresume(libxl__colo_proxy_state *cps);
+extern int colo_proxy_checkpoint(libxl__colo_proxy_state *cps);
 #endif
diff --git a/tools/libxl/libxl_colo_proxy.c b/tools/libxl/libxl_colo_proxy.c
index 9f1243e..c8ff722 100644
--- a/tools/libxl/libxl_colo_proxy.c
+++ b/tools/libxl/libxl_colo_proxy.c
@@ -208,3 +208,60 @@ void colo_proxy_teardown(libxl__colo_proxy_state *cps)
         cps->sock_fd = -1;
     }
 }
+
+/* ========= colo-proxy: preresume, postresume and checkpoint ========== */
+
+void colo_proxy_preresume(libxl__colo_proxy_state *cps)
+{
+    colo_proxy_send(cps, NULL, 0, COLO_CHECKPOINT);
+    /* TODO: need to handle if the call fails... */
+}
+
+void colo_proxy_postresume(libxl__colo_proxy_state *cps)
+{
+    /* nothing to do... */
+}
+
+
+typedef struct colo_msg {
+    bool is_checkpoint;
+} colo_msg;
+
+/*
+ * do checkpoint: return 1
+ * error: return -1
+ * do not checkpoint: return 0
+ */
+int colo_proxy_checkpoint(libxl__colo_proxy_state *cps)
+{
+    uint8_t *buff;
+    int64_t size;
+    struct nlmsghdr *h;
+    struct colo_msg *m;
+    int ret = -1;
+
+    size = colo_proxy_recv(cps, &buff, MSG_DONTWAIT);
+
+    /* timeout, return no checkpoint message. */
+    if (size <= 0) {
+        return 0;
+    }
+
+    h = (struct nlmsghdr *) buff;
+
+    if (h->nlmsg_type == NLMSG_ERROR) {
+        goto out;
+    }
+
+    if (h->nlmsg_len < NLMSG_LENGTH(sizeof(*m))) {
+        goto out;
+    }
+
+    m = NLMSG_DATA(h);
+
+    ret = m->is_checkpoint ? 1 : 0;
+
+out:
+    free(buff);
+    return ret;
+}
-- 
1.9.1

* [PATCH v8 --for 4.6 COLO 22/25] COLO nic: implement COLO nic subkind
  2015-07-15  9:18 [PATCH v8 --for 4.6 COLO 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (20 preceding siblings ...)
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 21/25] COLO proxy: preresume, postresume and checkpoint Yang Hongyang
@ 2015-07-15  9:18 ` Yang Hongyang
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 23/25] setup and control colo proxy on primary side Yang Hongyang
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 46+ messages in thread
From: Yang Hongyang @ 2015-07-15  9:18 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

Implement the COLO nic subkind.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
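Note (illustration only, not part of the patch): the hotplug script
receives its parameters as a flat, NULL-terminated array of alternating
keys and values (vifname, forwarddev, mode, index, bridge). The sketch
below builds such an array with plain C strings and looks a key up
again. colo_build_env() and colo_env_lookup() are hypothetical names,
and the sample values in the usage note are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Five key/value pairs plus the terminating NULL. */
#define COLO_ENV_SLOTS 11

/* Hypothetical helper: fill env[] in the order the script expects. */
static int colo_build_env(const char *env[COLO_ENV_SLOTS],
                          const char *vifname, const char *forwarddev,
                          int is_primary, const char *index,
                          const char *bridge)
{
    int nr = 0;

    env[nr++] = "vifname";
    env[nr++] = vifname;
    env[nr++] = "forwarddev";
    env[nr++] = forwarddev;
    env[nr++] = "mode";
    env[nr++] = is_primary ? "primary" : "secondary";
    env[nr++] = "index";
    env[nr++] = index;
    env[nr++] = "bridge";
    env[nr++] = bridge;
    env[nr++] = NULL;

    /* Same guard as in the patch: every slot must be filled exactly once. */
    assert(nr == COLO_ENV_SLOTS);
    return nr;
}

/* Hypothetical helper: return the value stored for key, or NULL. */
static const char *colo_env_lookup(const char *const *env, const char *key)
{
    size_t i;

    for (i = 0; env[i] && env[i + 1]; i += 2) {
        if (!strcmp(env[i], key))
            return env[i + 1];
    }
    return NULL;
}
```

For example, building the array for a primary-side vif named "vif1.0"
with forward device "eth1" and then looking up "mode" yields "primary".
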
 tools/hotplug/Linux/Makefile         |   1 +
 tools/hotplug/Linux/colo-proxy-setup | 131 ++++++++++++++
 tools/libxl/Makefile                 |   1 +
 tools/libxl/libxl_colo_nic.c         | 320 +++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_internal.h         |   5 +
 tools/libxl/libxl_types.idl          |   1 +
 6 files changed, 459 insertions(+)
 create mode 100755 tools/hotplug/Linux/colo-proxy-setup
 create mode 100644 tools/libxl/libxl_colo_nic.c

diff --git a/tools/hotplug/Linux/Makefile b/tools/hotplug/Linux/Makefile
index bc8ee5e..71b6475 100644
--- a/tools/hotplug/Linux/Makefile
+++ b/tools/hotplug/Linux/Makefile
@@ -26,6 +26,7 @@ XEN_SCRIPTS += block-iscsi
 XEN_SCRIPTS += block-tap
 XEN_SCRIPTS += block-drbd-probe
 XEN_SCRIPTS += $(XEN_SCRIPTS-y)
+XEN_SCRIPTS += colo-proxy-setup
 
 SUBDIRS-$(CONFIG_SYSTEMD) += systemd
 
diff --git a/tools/hotplug/Linux/colo-proxy-setup b/tools/hotplug/Linux/colo-proxy-setup
new file mode 100755
index 0000000..3096a9c
--- /dev/null
+++ b/tools/hotplug/Linux/colo-proxy-setup
@@ -0,0 +1,131 @@
+#! /bin/bash
+
+dir=$(dirname "$0")
+. "$dir/xen-hotplug-common.sh"
+. "$dir/hotplugpath.sh"
+. "$dir/xen-network-ft.sh"
+
+findCommand "$@"
+
+if [ "$command" != "setup" -a  "$command" != "teardown" ]
+then
+    echo "Invalid command: $command"
+    log err "Invalid command: $command"
+    exit 1
+fi
+
+evalVariables "$@"
+
+: ${vifname:?}
+: ${forwarddev:?}
+: ${mode:?}
+: ${index:?}
+: ${bridge:?}
+
+forwardbr="colobr0"
+
+if [ "$mode" != "primary" -a "$mode" != "secondary" ]
+then
+    echo "Invalid mode: $mode"
+    log err "Invalid mode: $mode"
+    exit 1
+fi
+
+if [ $index -lt 0 ] || [ $index -gt 100 ]; then
+    echo "index overflow"
+    exit 1
+fi
+
+function setup_primary()
+{
+    do_without_error tc qdisc add dev $vifname root handle 1: prio
+    do_without_error tc filter add dev $vifname parent 1: protocol ip prio 10 \
+        u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc filter add dev $vifname parent 1: protocol arp prio 11 \
+        u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc filter add dev $vifname parent 1: protocol ipv6 prio \
+        12 u32 match u32 0 0 flowid 1:2 action mirred egress mirror \
+        dev $forwarddev
+
+    do_without_error modprobe nf_conntrack_ipv4
+    do_without_error modprobe xt_PMYCOLO sec_dev=$forwarddev
+
+    do_without_error iptables -t mangle -I PREROUTING -m physdev --physdev-in \
+        $vifname -j PMYCOLO --index $index
+    do_without_error ip6tables -t mangle -I PREROUTING -m physdev --physdev-in \
+        $vifname -j PMYCOLO --index $index
+    do_without_error arptables -I INPUT -i $forwarddev -j MARK --set-mark $index
+}
+
+function teardown_primary()
+{
+    do_without_error tc filter del dev $vifname parent 1: protocol ip prio 10 u32 match u32 \
+        0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc filter del dev $vifname parent 1: protocol arp prio 11 u32 match u32 \
+        0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc filter del dev $vifname parent 1: protocol ipv6 prio 12 u32 match u32 \
+        0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc qdisc del dev $vifname root handle 1: prio
+
+    do_without_error iptables -t mangle -F
+    do_without_error ip6tables -t mangle -F
+    do_without_error arptables -F
+    do_without_error rmmod xt_PMYCOLO
+}
+
+function setup_secondary()
+{
+    do_without_error brctl delif $bridge $vifname
+    do_without_error brctl addbr $forwardbr
+    do_without_error brctl addif $forwardbr $vifname
+    do_without_error brctl addif $forwardbr $forwarddev
+    do_without_error modprobe xt_SECCOLO
+
+    do_without_error iptables -t mangle -I PREROUTING -m physdev --physdev-in \
+        $vifname -j SECCOLO --index $index
+    do_without_error ip6tables -t mangle -I PREROUTING -m physdev --physdev-in \
+        $vifname -j SECCOLO --index $index
+}
+
+function teardown_secondary()
+{
+    do_without_error brctl delif $forwardbr $forwarddev
+    do_without_error brctl delif $forwardbr $vifname
+    do_without_error brctl delbr $forwardbr
+    do_without_error brctl addif $bridge $vifname
+
+    do_without_error iptables -t mangle -F
+    do_without_error ip6tables -t mangle -F
+    do_without_error rmmod xt_SECCOLO
+}
+
+case "$command" in
+    setup)
+        if [ "$mode" = "primary" ]
+        then
+            setup_primary
+        else
+            setup_secondary
+        fi
+
+        success
+        ;;
+    teardown)
+        if [ "$mode" = "primary" ]
+        then
+            teardown_primary
+        else
+            teardown_secondary
+        fi
+        ;;
+esac
+
+if [ "$mode" = "primary" ]
+then
+    log debug "Successful colo-proxy-setup $command for $vifname." \
+              " vifname: $vifname, index: $index, forwarddev: $forwarddev."
+else
+    log debug "Successful colo-proxy-setup $command for $vifname." \
+              " vifname: $vifname, index: $index, forwarddev: $forwarddev,"\
+              " forwardbr: $forwardbr."
+fi
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index d7a3540..2a180fb 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -66,6 +66,7 @@ LIBXL_OBJS-y += libxl_remus.o libxl_checkpoint_device.o libxl_remus_disk_drbd.o
 LIBXL_OBJS-y += libxl_colo_restore.o libxl_colo_save.o
 LIBXL_OBJS-y += libxl_colo_qdisk.o
 LIBXL_OBJS-y += libxl_colo_proxy.o
+LIBXL_OBJS-y += libxl_colo_nic.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_psr.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o libxl_libfdt_compat.o
diff --git a/tools/libxl/libxl_colo_nic.c b/tools/libxl/libxl_colo_nic.c
new file mode 100644
index 0000000..9c4d469
--- /dev/null
+++ b/tools/libxl/libxl_colo_nic.c
@@ -0,0 +1,320 @@
+/*
+ * Copyright (C) 2014 FUJITSU LIMITED
+ * Author: Wen Congyang <wency@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+typedef struct libxl__colo_device_nic {
+    int devid;
+    const char *vif;
+} libxl__colo_device_nic;
+
+enum {
+    primary,
+    secondary,
+};
+
+
+/* ========== init() and cleanup() ========== */
+int init_subkind_colo_nic(libxl__checkpoint_devices_state *cds)
+{
+    return 0;
+}
+
+void cleanup_subkind_colo_nic(libxl__checkpoint_devices_state *cds)
+{
+}
+
+/* ========== helper functions ========== */
+static void colo_save_setup_script_cb(libxl__egc *egc,
+                                      libxl__async_exec_state *aes,
+                                      int rc, int status);
+static void colo_save_teardown_script_cb(libxl__egc *egc,
+                                         libxl__async_exec_state *aes,
+                                         int rc, int status);
+
+/*
+ * If the device has a vifname, then use that instead of
+ * the vifX.Y format.
+ * it must ONLY be used for remus because if driver domains
+ * were in use it would constitute a security vulnerability.
+ */
+static const char *get_vifname(libxl__checkpoint_device *dev,
+                               const libxl_device_nic *nic)
+{
+    const char *vifname = NULL;
+    const char *path;
+    int rc;
+
+    STATE_AO_GC(dev->cds->ao);
+
+    /* Convenience aliases */
+    const uint32_t domid = dev->cds->domid;
+
+    path = GCSPRINTF("%s/backend/vif/%d/%d/vifname",
+                     libxl__xs_get_dompath(gc, 0), domid, nic->devid);
+    rc = libxl__xs_read_checked(gc, XBT_NULL, path, &vifname);
+    if (!rc && !vifname) {
+        vifname = libxl__device_nic_devname(gc, domid,
+                                            nic->devid,
+                                            nic->nictype);
+    }
+
+    return vifname;
+}
+
+/*
+ * the script needs the following env & args
+ * $vifname
+ * $forwarddev
+ * $mode(primary/secondary)
+ * $index
+ * $bridge
+ * setup/teardown as command line arg.
+ */
+static void setup_async_exec(libxl__checkpoint_device *dev, char *op, int side,
+                             char *colo_proxy_script)
+{
+    int arraysize, nr = 0;
+    char **env = NULL, **args = NULL;
+    libxl__colo_device_nic *colo_nic = dev->concrete_data;
+    libxl__checkpoint_devices_state *cds = dev->cds;
+    libxl__async_exec_state *aes = &dev->aodev.aes;
+    const libxl_device_nic *nic = dev->backend_dev;
+    libxl__colo_save_state *css = CONTAINER_OF(dev->cds, *css, cds);
+
+    STATE_AO_GC(cds->ao);
+
+    /* Convenience aliases */
+    const char *const vif = colo_nic->vif;
+
+    arraysize = 11;
+    GCNEW_ARRAY(env, arraysize);
+    env[nr++] = "vifname";
+    env[nr++] = libxl__strdup(gc, vif);
+    env[nr++] = "forwarddev";
+    env[nr++] = libxl__strdup(gc, nic->forwarddev);
+    env[nr++] = "mode";
+    if (side == primary)
+        env[nr++] = "primary";
+    else
+        env[nr++] = "secondary";
+    env[nr++] = "index";
+    env[nr++] = GCSPRINTF("%d", css->cps.index);
+    env[nr++] = "bridge";
+    env[nr++] = libxl__strdup(gc, nic->bridge);
+    env[nr++] = NULL;
+    assert(nr == arraysize);
+
+    arraysize = 3; nr = 0;
+    GCNEW_ARRAY(args, arraysize);
+    args[nr++] = colo_proxy_script;
+    args[nr++] = op;
+    args[nr++] = NULL;
+    assert(nr == arraysize);
+
+    aes->ao = dev->cds->ao;
+    aes->what = GCSPRINTF("%s %s", args[0], args[1]);
+    aes->env = env;
+    aes->args = args;
+    aes->timeout_ms = LIBXL_HOTPLUG_TIMEOUT * 1000;
+    aes->stdfds[0] = -1;
+    aes->stdfds[1] = -1;
+    aes->stdfds[2] = -1;
+
+    if (!strcmp(op, "teardown"))
+        aes->callback = colo_save_teardown_script_cb;
+    else
+        aes->callback = colo_save_setup_script_cb;
+}
+
+/* ========== setup() and teardown() ========== */
+static void colo_nic_setup(libxl__egc *egc, libxl__checkpoint_device *dev,
+                           int side, char *colo_proxy_script)
+{
+    int rc;
+    libxl__colo_device_nic *colo_nic;
+    const libxl_device_nic *nic = dev->backend_dev;
+
+    STATE_AO_GC(dev->cds->ao);
+
+    /*
+     * there's no subkind of nic devices, so nic ops are always matched
+     * with nic devices; we can begin to set up the nic device
+     */
+    dev->matched = 1;
+
+    if (!nic->forwarddev) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    GCNEW(colo_nic);
+    dev->concrete_data = colo_nic;
+    colo_nic->devid = nic->devid;
+    colo_nic->vif = get_vifname(dev, nic);
+    if (!colo_nic->vif) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    setup_async_exec(dev, "setup", side, colo_proxy_script);
+    rc = libxl__async_exec_start(&dev->aodev.aes);
+    if (rc)
+        goto out;
+
+    return;
+
+out:
+    dev->aodev.rc = rc;
+    dev->aodev.callback(egc, &dev->aodev);
+}
+
+static void colo_save_setup_script_cb(libxl__egc *egc,
+                                      libxl__async_exec_state *aes,
+                                      int rc, int status)
+{
+    libxl__ao_device *aodev = CONTAINER_OF(aes, *aodev, aes);
+    libxl__checkpoint_device *dev = CONTAINER_OF(aodev, *dev, aodev);
+    libxl__colo_device_nic *colo_nic = dev->concrete_data;
+    libxl__checkpoint_devices_state *cds = dev->cds;
+    const char *out_path_base, *hotplug_error = NULL;
+
+    STATE_AO_GC(cds->ao);
+
+    /* Convenience aliases */
+    const uint32_t domid = cds->domid;
+    const int devid = colo_nic->devid;
+    const char *const vif = colo_nic->vif;
+
+    if (status && !rc)
+        rc = ERROR_FAIL;
+    if (rc)
+        goto out;
+
+    out_path_base = GCSPRINTF("%s/colo_proxy/%d",
+                              libxl__xs_libxl_path(gc, domid), devid);
+
+    rc = libxl__xs_read_checked(gc, XBT_NULL,
+                                GCSPRINTF("%s/hotplug-error", out_path_base),
+                                &hotplug_error);
+    if (rc)
+        goto out;
+
+    if (hotplug_error) {
+        LOG(ERROR, "colo_proxy script %s setup failed for vif %s: %s",
+            aes->args[0], vif, hotplug_error);
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    if (status) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    rc = 0;
+
+out:
+    aodev->rc = rc;
+    aodev->callback(egc, aodev);
+}
+
+static void colo_nic_teardown(libxl__egc *egc, libxl__checkpoint_device *dev,
+                              int side, char *colo_proxy_script)
+{
+    int rc;
+    libxl__colo_device_nic *colo_nic = dev->concrete_data;
+    STATE_AO_GC(dev->cds->ao);
+
+    if (!colo_nic || !colo_nic->vif) {
+        /* colo nic has not yet been set up, just return */
+        rc = 0;
+        goto out;
+    }
+
+    setup_async_exec(dev, "teardown", side, colo_proxy_script);
+
+    rc = libxl__async_exec_start(&dev->aodev.aes);
+    if (rc)
+        goto out;
+
+    return;
+
+out:
+    dev->aodev.rc = rc;
+    dev->aodev.callback(egc, &dev->aodev);
+}
+
+static void colo_save_teardown_script_cb(libxl__egc *egc,
+                                         libxl__async_exec_state *aes,
+                                         int rc, int status)
+{
+    libxl__ao_device *aodev = CONTAINER_OF(aes, *aodev, aes);
+
+    if (status && !rc)
+        rc = ERROR_FAIL;
+    else
+        rc = 0;
+
+    aodev->rc = rc;
+    aodev->callback(egc, aodev);
+}
+
+/* ======== primary ======== */
+static void colo_nic_save_setup(libxl__egc *egc, libxl__checkpoint_device *dev)
+{
+    libxl__colo_save_state *css = CONTAINER_OF(dev->cds, *css, cds);
+
+    colo_nic_setup(egc, dev, primary, css->colo_proxy_script);
+}
+
+static void colo_nic_save_teardown(libxl__egc *egc,
+                                   libxl__checkpoint_device *dev)
+{
+    libxl__colo_save_state *css = CONTAINER_OF(dev->cds, *css, cds);
+
+    colo_nic_teardown(egc, dev, primary, css->colo_proxy_script);
+}
+
+const libxl__checkpoint_device_instance_ops colo_save_device_nic = {
+    .kind = LIBXL__DEVICE_KIND_VIF,
+    .setup = colo_nic_save_setup,
+    .teardown = colo_nic_save_teardown,
+};
+
+/* ======== secondary ======== */
+static void colo_nic_restore_setup(libxl__egc *egc,
+                                   libxl__checkpoint_device *dev)
+{
+    libxl__colo_restore_state *crs = CONTAINER_OF(dev->cds, *crs, cds);
+
+    colo_nic_setup(egc, dev, secondary, crs->colo_proxy_script);
+}
+
+static void colo_nic_restore_teardown(libxl__egc *egc,
+                                      libxl__checkpoint_device *dev)
+{
+    libxl__colo_restore_state *crs = CONTAINER_OF(dev->cds, *crs, cds);
+
+    colo_nic_teardown(egc, dev, secondary, crs->colo_proxy_script);
+}
+
+const libxl__checkpoint_device_instance_ops colo_restore_device_nic = {
+    .kind = LIBXL__DEVICE_KIND_VIF,
+    .setup = colo_nic_restore_setup,
+    .teardown = colo_nic_restore_teardown,
+};
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 23305f2..65fac32 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2836,6 +2836,8 @@ void cleanup_subkind_drbd_disk(libxl__checkpoint_devices_state *cds);
 int init_subkind_qdisk(libxl__checkpoint_devices_state *cds);
 void cleanup_subkind_qdisk(libxl__checkpoint_devices_state *cds);
 int colo_qdisk_preresume(libxl_ctx *ctx, domid_t domid);
+int init_subkind_colo_nic(libxl__checkpoint_devices_state *cds);
+void cleanup_subkind_colo_nic(libxl__checkpoint_devices_state *cds);
 
 typedef void libxl__checkpoint_callback(libxl__egc *,
                                         libxl__checkpoint_devices_state *,
@@ -3059,6 +3061,7 @@ struct libxl__colo_save_state {
     libxl__checkpoint_devices_state cds;
     int send_fd;
     int recv_fd;
+    char *colo_proxy_script;
 
     /* private */
     libxl__stream_read_state srs;
@@ -3463,6 +3466,7 @@ struct libxl__colo_restore_state {
     int recv_fd;
     int hvm;
     libxl__colo_callback *callback;
+    char *colo_proxy_script;
 
     /* private, colo restore checkpoint state */
     libxl__domain_create_cb *saved_cb;
@@ -3485,6 +3489,7 @@ struct libxl__domain_create_state {
     libxl_asyncprogress_how aop_console_how;
     /* private to domain_create */
     int guest_domid;
+    const char *colo_proxy_script;
     libxl__domain_build_state build_state;
     libxl__colo_restore_state crs;
     libxl__bootloader_state bl;
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 5eb9a38..d835d50 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -564,6 +564,7 @@ libxl_device_nic = Struct("device_nic", [
     ("rate_bytes_per_interval", uint64),
     ("rate_interval_usecs", uint32),
     ("gatewaydev", string),
+    ("forwarddev", string)
     ])
 
 libxl_device_pci = Struct("device_pci", [
-- 
1.9.1


* [PATCH v8 --for 4.6 COLO 23/25] setup and control colo proxy on primary side
  2015-07-15  9:18 [PATCH v8 --for 4.6 COLO 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (21 preceding siblings ...)
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 22/25] COLO nic: implement COLO nic subkind Yang Hongyang
@ 2015-07-15  9:18 ` Yang Hongyang
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 24/25] setup and control colo proxy on secondary side Yang Hongyang
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 25/25] cmdline switches and config vars to control colo-proxy Yang Hongyang
  24 siblings, 0 replies; 46+ messages in thread
From: Yang Hongyang @ 2015-07-15  9:18 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

Set up and control the COLO proxy on the primary side.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 tools/libxl/libxl_colo_save.c | 124 +++++++++++++++++++++++++++++++++++++++---
 tools/libxl/libxl_internal.h  |   1 +
 2 files changed, 117 insertions(+), 8 deletions(-)

diff --git a/tools/libxl/libxl_colo_save.c b/tools/libxl/libxl_colo_save.c
index 1245da7..50a880b 100644
--- a/tools/libxl/libxl_colo_save.c
+++ b/tools/libxl/libxl_colo_save.c
@@ -19,9 +19,11 @@
 #include "libxl_internal.h"
 #include "libxl_colo.h"
 
+extern const libxl__checkpoint_device_instance_ops colo_save_device_nic;
 extern const libxl__checkpoint_device_instance_ops colo_save_device_qdisk;
 
 static const libxl__checkpoint_device_instance_ops *colo_ops[] = {
+    &colo_save_device_nic,
     &colo_save_device_qdisk,
     NULL,
 };
@@ -33,9 +35,15 @@ static int init_device_subkind(libxl__checkpoint_devices_state *cds)
     int rc;
     STATE_AO_GC(cds->ao);
 
-    rc = init_subkind_qdisk(cds);
+    rc = init_subkind_colo_nic(cds);
     if (rc) goto out;
 
+    rc = init_subkind_qdisk(cds);
+    if (rc) {
+        cleanup_subkind_colo_nic(cds);
+        goto out;
+    }
+
     rc = 0;
 out:
     return rc;
@@ -46,6 +54,7 @@ static void cleanup_device_subkind(libxl__checkpoint_devices_state *cds)
     /* cleanup device subkind-specific state in the libxl ctx */
     STATE_AO_GC(cds->ao);
 
+    cleanup_subkind_colo_nic(cds);
     cleanup_subkind_qdisk(cds);
 }
 
@@ -76,9 +85,16 @@ void libxl__colo_save_setup(libxl__egc *egc, libxl__colo_save_state *css)
     css->svm_running = false;
     css->paused = true;
     css->qdisk_setuped = false;
+    libxl__ev_child_init(&css->child);
 
-    /* TODO: nic support */
-    cds->device_kind_flags = (1 << LIBXL__DEVICE_KIND_VBD);
+    if (dss->remus->netbufscript)
+        css->colo_proxy_script = libxl__strdup(gc, dss->remus->netbufscript);
+    else
+        css->colo_proxy_script = GCSPRINTF("%s/colo-proxy-setup",
+                                           libxl__xen_script_dir_path());
+
+    cds->device_kind_flags = (1 << LIBXL__DEVICE_KIND_VIF) |
+                             (1 << LIBXL__DEVICE_KIND_VBD);
     cds->ops = colo_ops;
     cds->callback = colo_save_setup_done;
     cds->ao = ao;
@@ -88,6 +104,12 @@ void libxl__colo_save_setup(libxl__egc *egc, libxl__colo_save_state *css)
     css->srs.fd = css->recv_fd;
     css->srs.back_channel = true;
     libxl__stream_read_start(egc, &css->srs);
+    css->cps.ao = ao;
+    if (colo_proxy_setup(&css->cps)) {
+        LOG(ERROR, "COLO: failed to setup colo proxy for guest with domid %u",
+            cds->domid);
+        goto out;
+    }
 
     if (init_device_subkind(cds))
         goto out;
@@ -162,6 +184,7 @@ static void colo_teardown_done(libxl__egc *egc,
     libxl__domain_save_state *dss = CONTAINER_OF(css, *dss, css);
 
     cleanup_device_subkind(cds);
+    colo_proxy_teardown(&css->cps);
     dss->callback(egc, dss, rc);
 }
 
@@ -378,6 +401,8 @@ static void colo_read_svm_ready_done(libxl__egc *egc,
         goto out;
     }
 
+    colo_proxy_preresume(&css->cps);
+
     css->svm_running = true;
     css->cds.callback = colo_preresume_cb;
     libxl__checkpoint_devices_preresume(egc, &css->cds);
@@ -454,6 +479,8 @@ static void colo_read_svm_resumed_done(libxl__egc *egc,
         goto out;
     }
 
+    colo_proxy_postresume(&css->cps);
+
     ok = 1;
 
 out:
@@ -462,6 +489,91 @@ out:
 
 
 /* ===================== colo: wait new checkpoint ===================== */
+
+static void colo_start_new_checkpoint(libxl__egc *egc,
+                                      libxl__checkpoint_devices_state *cds,
+                                      int rc);
+static void colo_proxy_async_wait_for_checkpoint(libxl__colo_save_state *css);
+static void colo_proxy_async_call_done(libxl__egc *egc,
+                                       libxl__ev_child *child,
+                                       int pid,
+                                       int status);
+
+static void colo_proxy_async_call(libxl__egc *egc,
+                                  libxl__colo_save_state *css,
+                                  void func(libxl__colo_save_state *),
+                                  libxl__ev_child_callback callback)
+{
+    int pid = -1, rc;
+
+    STATE_AO_GC(css->cds.ao);
+
+    /* Fork and call */
+    pid = libxl__ev_child_fork(gc, &css->child, callback);
+    if (pid == -1) {
+        LOG(ERROR, "unable to fork");
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    if (!pid) {
+        /* child */
+        func(css);
+        /* notreached */
+        abort();
+    }
+
+    return;
+
+out:
+    callback(egc, &css->child, -1, 1);
+}
+
+static void colo_proxy_wait_for_checkpoint(libxl__egc *egc,
+                                           libxl__colo_save_state *css)
+{
+    colo_proxy_async_call(egc, css,
+                          colo_proxy_async_wait_for_checkpoint,
+                          colo_proxy_async_call_done);
+}
+
+static void colo_proxy_async_wait_for_checkpoint(libxl__colo_save_state *css)
+{
+    int req;
+
+again:
+    req = colo_proxy_checkpoint(&css->cps);
+    if (req < 0) {
+        /* an error occurred */
+        _exit(1);
+    } else if (!req) {
+        /* no checkpoint is needed, wait for 1ms and then check again */
+        usleep(1000);
+        goto again;
+    } else {
+        /* net packets are not consistent; start a new checkpoint */
+        _exit(0);
+    }
+}
+
+static void colo_proxy_async_call_done(libxl__egc *egc,
+                                       libxl__ev_child *child,
+                                       int pid,
+                                       int status)
+{
+    libxl__colo_save_state *css = CONTAINER_OF(child, *css, child);
+
+    EGC_GC;
+
+    if (status) {
+        LOG(ERROR, "failed to wait for new checkpoint");
+        colo_start_new_checkpoint(egc, &css->cds, ERROR_FAIL);
+        return;
+    }
+
+    colo_start_new_checkpoint(egc, &css->cds, 0);
+}
+
 /*
  * Do the following things:
  * 1. do commit
@@ -471,9 +583,6 @@ out:
 static void colo_device_commit_cb(libxl__egc *egc,
                                   libxl__checkpoint_devices_state *cds,
                                   int rc);
-static void colo_start_new_checkpoint(libxl__egc *egc,
-                                      libxl__checkpoint_devices_state *cds,
-                                      int rc);
 
 void libxl__colo_save_domain_should_checkpoint_callback(void *data)
 {
@@ -503,8 +612,7 @@ static void colo_device_commit_cb(libxl__egc *egc,
         goto out;
     }
 
-    /* TODO: wait a new checkpoint */
-    colo_start_new_checkpoint(egc, cds, 0);
+    colo_proxy_wait_for_checkpoint(egc, css);
     return;
 
 out:
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 65fac32..d12297d 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3074,6 +3074,7 @@ struct libxl__colo_save_state {
 
     /* private, used by colo-proxy */
     libxl__colo_proxy_state cps;
+    libxl__ev_child child;
 };
 
 /*----- Domain suspend (save) state structure -----*/
-- 
1.9.1


* [PATCH v8 --for 4.6 COLO 24/25] setup and control colo proxy on secondary side
  2015-07-15  9:18 [PATCH v8 --for 4.6 COLO 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (22 preceding siblings ...)
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 23/25] setup and control colo proxy on primary side Yang Hongyang
@ 2015-07-15  9:18 ` Yang Hongyang
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 25/25] cmdline switches and config vars to control colo-proxy Yang Hongyang
  24 siblings, 0 replies; 46+ messages in thread
From: Yang Hongyang @ 2015-07-15  9:18 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

Set up and control the COLO proxy on the secondary side.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 tools/libxl/libxl_colo_restore.c | 28 +++++++++++++++++++++++++---
 tools/libxl/libxl_internal.h     |  3 +++
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/tools/libxl/libxl_colo_restore.c b/tools/libxl/libxl_colo_restore.c
index 96ea0b9..da546f9 100644
--- a/tools/libxl/libxl_colo_restore.c
+++ b/tools/libxl/libxl_colo_restore.c
@@ -49,9 +49,11 @@ static void libxl__colo_restore_domain_checkpoint_callback(void *data);
 static void libxl__colo_restore_domain_should_checkpoint_callback(void *data);
 static void libxl__colo_restore_domain_suspend_callback(void *data);
 
+extern const libxl__checkpoint_device_instance_ops colo_restore_device_nic;
 extern const libxl__checkpoint_device_instance_ops colo_restore_device_qdisk;
 
 static const libxl__checkpoint_device_instance_ops *colo_restore_ops[] = {
+    &colo_restore_device_nic,
     &colo_restore_device_qdisk,
     NULL,
 };
@@ -151,8 +153,14 @@ static int init_device_subkind(libxl__checkpoint_devices_state *cds)
     int rc;
     STATE_AO_GC(cds->ao);
 
+    rc = init_subkind_colo_nic(cds);
+    if (rc) goto out;
+
     rc = init_subkind_qdisk(cds);
-    if (rc)  goto out;
+    if (rc) {
+        cleanup_subkind_colo_nic(cds);
+        goto out;
+    }
 
     rc = 0;
 out:
@@ -164,6 +172,7 @@ static void cleanup_device_subkind(libxl__checkpoint_devices_state *cds)
     /* cleanup device subkind-specific state in the libxl ctx */
     STATE_AO_GC(cds->ao);
 
+    cleanup_subkind_colo_nic(cds);
     cleanup_subkind_qdisk(cds);
 }
 
@@ -351,6 +360,8 @@ static void colo_restore_teardown_done(libxl__egc *egc,
     if (crcs->teardown_devices)
         cleanup_device_subkind(cds);
 
+    colo_proxy_teardown(&crs->cps);
+
     rc = crcs->saved_rc;
     if (!rc) {
         crcs->callback = do_failover_done;
@@ -535,6 +546,8 @@ static void colo_restore_preresume_cb(libxl__egc *egc,
         goto out;
     }
 
+    colo_proxy_preresume(&crs->cps);
+
     colo_restore_resume_vm(egc, crcs);
 
     return;
@@ -571,6 +584,8 @@ static void colo_resume_vm_done(libxl__egc *egc,
 
     crcs->status = LIBXL_COLO_RESUMED;
 
+    colo_proxy_postresume(&crs->cps);
+
     /* avoid calling libxl__xc_domain_restore_done() more than once */
     if (crs->saved_cb) {
         dcs->callback = crs->saved_cb;
@@ -690,13 +705,20 @@ static void colo_setup_checkpoint_devices(libxl__egc *egc,
 
     STATE_AO_GC(crs->ao);
 
-    /* TODO: nic support */
-    cds->device_kind_flags = (1 << LIBXL__DEVICE_KIND_VBD);
+    cds->device_kind_flags = (1 << LIBXL__DEVICE_KIND_VIF) |
+                             (1 << LIBXL__DEVICE_KIND_VBD);
     cds->callback = colo_restore_setup_cds_done;
     cds->ao = ao;
     cds->domid = crs->domid;
     cds->ops = colo_restore_ops;
 
+    crs->cps.ao = ao;
+    if (colo_proxy_setup(&crs->cps)) {
+        LOG(ERROR, "COLO: failed to setup colo proxy for guest with domid %u",
+            cds->domid);
+        goto out;
+    }
+
     if (init_device_subkind(cds))
         goto out;
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index d12297d..33a93a1 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3476,6 +3476,9 @@ struct libxl__colo_restore_state {
 
     /* private, used by qdisk block replication */
     bool qdisk_setuped;
+
+    /* private, used by colo proxy */
+    libxl__colo_proxy_state cps;
 };
 
 struct libxl__domain_create_state {
-- 
1.9.1


* [PATCH v8 --for 4.6 COLO 25/25] cmdline switches and config vars to control colo-proxy
  2015-07-15  9:18 [PATCH v8 --for 4.6 COLO 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
                   ` (23 preceding siblings ...)
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 24/25] setup and control colo proxy on secondary side Yang Hongyang
@ 2015-07-15  9:18 ` Yang Hongyang
  24 siblings, 0 replies; 46+ messages in thread
From: Yang Hongyang @ 2015-07-15  9:18 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, andrew.cooper3, yunhong.jiang,
	eddie.dong, guijianfeng, rshriram, ian.jackson

Add a cmdline switch (-n) to the 'xl migrate-receive' command to specify
a domain-specific hotplug script to set up the COLO proxy.

Add a new config var 'colo.default.proxyscript' to xl.conf that
allows the user to override the default global script used to
set up the COLO proxy.
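
As a rough illustration of the resulting precedence (per-domain script
first, then the xl.conf default, `colo.default.proxyscript` in the diff
below, then the built-in `colo-proxy-setup` under the Xen script
directory), here is a minimal standalone sketch. The helper name
`pick_colo_proxy_script` and its parameters are hypothetical and do not
appear in the patch; in the series itself the choice is split between
xl (main_remus) and libxl.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch.
 *
 * per_domain:   script given on the command line, or NULL
 * conf_default: script configured in xl.conf, or NULL
 * script_dir:   directory holding the stock hotplug scripts
 *
 * Returns a malloc'd path that the caller must free, or NULL if the
 * allocation fails.
 */
static char *pick_colo_proxy_script(const char *per_domain,
                                    const char *conf_default,
                                    const char *script_dir)
{
    const char *chosen = per_domain ? per_domain : conf_default;
    char *out;
    size_t len;

    assert(script_dir != NULL);

    if (chosen) {
        /* An explicit choice wins over the built-in default. */
        len = strlen(chosen) + 1;
        out = malloc(len);
        if (out)
            memcpy(out, chosen, len);
        return out;
    }

    /* Fall back to <script_dir>/colo-proxy-setup. */
    len = strlen(script_dir) + strlen("/colo-proxy-setup") + 1;
    out = malloc(len);
    if (out)
        snprintf(out, len, "%s/colo-proxy-setup", script_dir);
    return out;
}
```

With both overrides NULL and a script directory of /etc/xen/scripts,
the sketch yields /etc/xen/scripts/colo-proxy-setup, which matches the
default documented in xl.conf.pod.5 below.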

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 docs/man/xl.conf.pod.5      |  6 ++++++
 docs/man/xl.pod.1           |  1 -
 tools/libxl/libxl.c         |  6 ++++++
 tools/libxl/libxl_create.c  | 14 +++++++++++--
 tools/libxl/libxl_types.idl |  1 +
 tools/libxl/xl.c            |  3 +++
 tools/libxl/xl.h            |  1 +
 tools/libxl/xl_cmdimpl.c    | 50 ++++++++++++++++++++++++++++++++++-----------
 8 files changed, 67 insertions(+), 15 deletions(-)

diff --git a/docs/man/xl.conf.pod.5 b/docs/man/xl.conf.pod.5
index 8ae19bb..8f7fd28 100644
--- a/docs/man/xl.conf.pod.5
+++ b/docs/man/xl.conf.pod.5
@@ -111,6 +111,12 @@ Configures the default script used by Remus to setup network buffering.
 
 Default: C</etc/xen/scripts/remus-netbuf-setup>
 
+=item B<colo.default.proxyscript="PATH">
+
+Configures the default script used by COLO to set up colo-proxy.
+
+Default: C</etc/xen/scripts/colo-proxy-setup>
+
 =item B<output_format="json|sxp">
 
 Configures the default output format used by xl when printing "machine
diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index 1effce7..a7ac32f 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -454,7 +454,6 @@ N.B: Remus support in xl is still in experimental (proof-of-concept) phase.
      Disk replication support is limited to DRBD disks.
 
      COLO support in xl is still in experimental (proof-of-concept) phase.
-     There is no support for network at the moment.
 
 B<OPTIONS>
 
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index c6cc5aa..75372ea 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -3305,6 +3305,11 @@ void libxl__device_nic_add(libxl__egc *egc, uint32_t domid,
         flexarray_append(back, nic->ifname);
     }
 
+    if (nic->forwarddev) {
+        flexarray_append(back, "forwarddev");
+        flexarray_append(back, nic->forwarddev);
+    }
+
     flexarray_append(back, "mac");
     flexarray_append(back,libxl__sprintf(gc,
                                     LIBXL_MAC_FMT, LIBXL_MAC_BYTES(nic->mac)));
@@ -3428,6 +3433,7 @@ static int libxl__device_nic_from_xs_be(libxl__gc *gc,
     nic->ip = READ_BACKEND(NOGC, "ip");
     nic->bridge = READ_BACKEND(NOGC, "bridge");
     nic->script = READ_BACKEND(NOGC, "script");
+    nic->forwarddev = READ_BACKEND(NOGC, "forwarddev");
 
     /* vif_ioemu nics use the same xenstore entries as vif interfaces */
     tmp = READ_BACKEND(gc, "type");
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index d99d5ef..7de2e89 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1089,6 +1089,11 @@ static void domcreate_bootloader_done(libxl__egc *egc,
         crs->recv_fd = restore_fd;
         crs->hvm = (info->type == LIBXL_DOMAIN_TYPE_HVM);
         crs->callback = libxl__colo_restore_setup_done;
+        if (dcs->colo_proxy_script)
+            crs->colo_proxy_script = libxl__strdup(gc, dcs->colo_proxy_script);
+        else
+            crs->colo_proxy_script = GCSPRINTF("%s/colo-proxy-setup",
+                                               libxl__xen_script_dir_path());
         libxl__colo_restore_setup(egc, crs);
     } else
         libxl__stream_read_start(egc, &dcs->srs);
@@ -1612,6 +1617,7 @@ static void domain_create_cb(libxl__egc *egc,
 static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
                             uint32_t *domid, int restore_fd, int send_fd,
                             const libxl_domain_restore_params *params,
+                            const char *colo_proxy_script,
                             const libxl_asyncop_how *ao_how,
                             const libxl_asyncprogress_how *aop_console_how)
 {
@@ -1628,6 +1634,7 @@ static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
     if (restore_fd > -1)
         cdcs->dcs.restore_params = *params;
     cdcs->dcs.callback = domain_create_cb;
+    cdcs->dcs.colo_proxy_script = colo_proxy_script;
     libxl__ao_progress_gethow(&cdcs->dcs.aop_console_how, aop_console_how);
     cdcs->domid_out = domid;
 
@@ -1670,7 +1677,7 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
                             const libxl_asyncprogress_how *aop_console_how)
 {
     unset_disk_colo_restore(d_config);
-    return do_domain_create(ctx, d_config, domid, -1, -1, NULL,
+    return do_domain_create(ctx, d_config, domid, -1, -1, NULL, NULL,
                             ao_how, aop_console_how);
 }
 
@@ -1680,14 +1687,17 @@ int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
                                 const libxl_asyncop_how *ao_how,
                                 const libxl_asyncprogress_how *aop_console_how)
 {
+    char *colo_proxy_script = NULL;
+
     if (params->checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_COLO) {
+        colo_proxy_script = params->colo_proxy_script;
         set_disk_colo_restore(d_config);
     } else {
         unset_disk_colo_restore(d_config);
     }
 
     return do_domain_create(ctx, d_config, domid, restore_fd, send_fd, params,
-                            ao_how, aop_console_how);
+                            colo_proxy_script, ao_how, aop_console_how);
 }
 
 /*
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index d835d50..6e7e358 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -362,6 +362,7 @@ libxl_domain_create_info = Struct("domain_create_info",[
 libxl_domain_restore_params = Struct("domain_restore_params", [
     ("checkpointed_stream", integer),
     ("stream_version", uint32, {'init_val': '1'}),
+    ("colo_proxy_script", string),
     ])
 
 libxl_domain_sched_params = Struct("domain_sched_params",[
diff --git a/tools/libxl/xl.c b/tools/libxl/xl.c
index f014306..f44f04f 100644
--- a/tools/libxl/xl.c
+++ b/tools/libxl/xl.c
@@ -45,6 +45,7 @@ char *default_bridge = NULL;
 char *default_gatewaydev = NULL;
 char *default_vifbackend = NULL;
 char *default_remus_netbufscript = NULL;
+char *default_colo_proxy_script = NULL;
 enum output_format default_output_format = OUTPUT_FORMAT_JSON;
 int claim_mode = 1;
 bool progress_use_cr = 0;
@@ -179,6 +180,8 @@ static void parse_global_config(const char *configfile,
 
     xlu_cfg_replace_string (config, "remus.default.netbufscript",
         &default_remus_netbufscript, 0);
+    xlu_cfg_replace_string (config, "colo.default.proxyscript",
+        &default_colo_proxy_script, 0);
 
     xlu_cfg_destroy(config);
 }
diff --git a/tools/libxl/xl.h b/tools/libxl/xl.h
index 13bccba..aa17dd5 100644
--- a/tools/libxl/xl.h
+++ b/tools/libxl/xl.h
@@ -182,6 +182,7 @@ extern char *default_bridge;
 extern char *default_gatewaydev;
 extern char *default_vifbackend;
 extern char *default_remus_netbufscript;
+extern char *default_colo_proxy_script;
 extern char *blkdev_start;
 
 enum output_format {
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 45ec435..3d99555 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -156,6 +156,7 @@ struct domain_create {
     const char *config_file;
     char *extra_config; /* extra config string */
     const char *restore_file;
+    char *colo_proxy_script;
     int migrate_fd; /* -1 means none */
     int send_fd; /* -1 means none */
     char **migration_domname_r; /* from malloc */
@@ -993,6 +994,8 @@ static int parse_nic_config(libxl_device_nic *nic, XLU_Config **config, char *to
         replace_string(&nic->model, oparg);
     } else if (MATCH_OPTION("rate", token, oparg)) {
         parse_vif_rate(config, oparg, nic);
+    } else if (MATCH_OPTION("forwarddev", token, oparg)) {
+        replace_string(&nic->forwarddev, oparg);
     } else if (MATCH_OPTION("accel", token, oparg)) {
         fprintf(stderr, "the accel parameter for vifs is currently not supported\n");
     } else {
@@ -2764,6 +2767,7 @@ start:
         params.checkpointed_stream = dom_info->checkpointed_stream;
         params.stream_version =
             (hdr.mandatory_flags & XL_MANDATORY_FLAG_STREAMv2) ? 2 : 1;
+        params.colo_proxy_script = dom_info->colo_proxy_script;
 
         ret = libxl_domain_create_restore(ctx, &d_config,
                                           &domid, restore_fd,
@@ -4285,7 +4289,8 @@ static void migrate_domain(uint32_t domid, const char *rune, int debug,
 }
 
 static void migrate_receive(int debug, int daemonize, int monitor,
-                            int send_fd, int recv_fd, int checkpointed)
+                            int send_fd, int recv_fd, int checkpointed,
+                            char *colo_proxy_script)
 {
     uint32_t domid;
     int rc, rc2;
@@ -4314,6 +4319,7 @@ static void migrate_receive(int debug, int daemonize, int monitor,
     dom_info.send_fd = send_fd;
     dom_info.migration_domname_r = &migration_domname;
     dom_info.checkpointed_stream = checkpointed;
+    dom_info.colo_proxy_script = colo_proxy_script;
     if (checkpointed == LIBXL_CHECKPOINTED_STREAM_COLO)
         /* COLO uses stdout to send control message to master */
         dom_info.quiet = 1;
@@ -4506,8 +4512,9 @@ int main_migrate_receive(int argc, char **argv)
     int debug = 0, daemonize = 1, monitor = 1;
     int checkpointed = LIBXL_CHECKPOINTED_STREAM_NONE;
     int opt;
+    char *script = NULL;
 
-    SWITCH_FOREACH_OPT(opt, "Fedrc", NULL, "migrate-receive", 0) {
+    SWITCH_FOREACH_OPT(opt, "Fedrcn:", NULL, "migrate-receive", 0) {
     case 'F':
         daemonize = 0;
         break;
@@ -4524,6 +4531,9 @@ int main_migrate_receive(int argc, char **argv)
     case 'c':
         checkpointed = LIBXL_CHECKPOINTED_STREAM_COLO;
         break;
+    case 'n':
+        script = optarg;
+        break;
     }
 
     if (argc-optind != 0) {
@@ -4532,7 +4542,7 @@ int main_migrate_receive(int argc, char **argv)
     }
     migrate_receive(debug, daemonize, monitor,
                     STDOUT_FILENO, STDIN_FILENO,
-                    checkpointed);
+                    checkpointed, script);
 
     return 0;
 }
@@ -7932,8 +7942,10 @@ int main_remus(int argc, char **argv)
         r_info.interval = 200;
 
     if (libxl_defbool_val(r_info.colo)) {
-        if (r_info.interval || libxl_defbool_val(r_info.blackhole)) {
-            perror("Option -c conflicts with -i or -b");
+        if (r_info.interval || libxl_defbool_val(r_info.blackhole) ||
+            !libxl_defbool_is_default(r_info.netbuf) ||
+            !libxl_defbool_is_default(r_info.diskbuf)) {
+            perror("Option -c conflicts with -i, -d, -n or -b");
             exit(-1);
         }
 
@@ -7944,8 +7956,12 @@ int main_remus(int argc, char **argv)
         }
     }
 
-    if (!r_info.netbufscript)
-        r_info.netbufscript = default_remus_netbufscript;
+    if (!r_info.netbufscript) {
+        if (libxl_defbool_val(r_info.colo))
+            r_info.netbufscript = default_colo_proxy_script;
+        else
+            r_info.netbufscript = default_remus_netbufscript;
+    }
 
     if (libxl_defbool_val(r_info.blackhole)) {
         send_fd = open("/dev/null", O_RDWR, 0644);
@@ -7958,11 +7974,21 @@ int main_remus(int argc, char **argv)
         if (!ssh_command[0]) {
             rune = host;
         } else {
-            if (asprintf(&rune, "exec %s %s xl migrate-receive %s %s",
-                         ssh_command, host,
-                         libxl_defbool_val(r_info.colo) ? "-c" : "-r",
-                         daemonize ? "" : " -e") < 0)
-                return 1;
+            if (!libxl_defbool_val(r_info.colo)) {
+                if (asprintf(&rune, "exec %s %s xl migrate-receive %s %s",
+                             ssh_command, host,
+                             "-r",
+                             daemonize ? "" : " -e") < 0)
+                    return 1;
+            } else {
+                if (asprintf(&rune, "exec %s %s xl migrate-receive %s %s %s %s",
+                             ssh_command, host,
+                             "-c",
+                             r_info.netbufscript ? "-n" : "",
+                             r_info.netbufscript ? r_info.netbufscript : "",
+                             daemonize ? "" : " -e") < 0)
+                    return 1;
+            }
         }
 
         save_domain_core_begin(domid, NULL, &config_data, &config_len);
-- 
1.9.1


* Re: [PATCH v8 --for 4.6 COLO 02/25] docs/libxl: Introduce COLO_CONTEXT to support migration v2 colo streams
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 02/25] docs/libxl: Introduce COLO_CONTEXT to support migration v2 colo streams Yang Hongyang
@ 2015-07-15 16:52   ` Andrew Cooper
  2015-07-16  6:32     ` Yang Hongyang
  0 siblings, 1 reply; 46+ messages in thread
From: Andrew Cooper @ 2015-07-15 16:52 UTC (permalink / raw)
  To: Yang Hongyang, xen-devel
  Cc: wei.liu2, ian.campbell, wency, guijianfeng, yunhong.jiang,
	eddie.dong, rshriram, ian.jackson

On 15/07/15 10:18, Yang Hongyang wrote:
> From: Wen Congyang <wency@cn.fujitsu.com>
>
> It is the negotiation record for COLO.
> Primary->Secondary:
> control_id      0x00000000: Secondary VM is out of sync, start a new checkpoint
> Secondary->Primary:
>                 0x00000001: Secondary VM is suspended
>                 0x00000002: Secondary VM is ready
>                 0x00000003: Secondary VM is resumed
>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> ---
>  docs/specs/libxl-migration-stream.pandoc | 22 +++++++++++++++++++++-
>  tools/libxl/libxl_sr_stream_format.h     | 11 +++++++++++
>  tools/python/xen/migration/libxl.py      |  9 +++++++++
>  3 files changed, 41 insertions(+), 1 deletion(-)
>
> diff --git a/docs/specs/libxl-migration-stream.pandoc b/docs/specs/libxl-migration-stream.pandoc
> index c24a434..5986273 100644
> --- a/docs/specs/libxl-migration-stream.pandoc
> +++ b/docs/specs/libxl-migration-stream.pandoc
> @@ -121,7 +121,9 @@ type         0x00000000: END
>  
>               0x00000004: CHECKPOINT_END
>  
> -             0x00000005 - 0x7FFFFFFF: Reserved for future _mandatory_
> +             0x00000005: COLO_CONTEXT
> +
> +             0x00000006 - 0x7FFFFFFF: Reserved for future _mandatory_
>               records.
>  
>               0x80000000 - 0xFFFFFFFF: Reserved for future _optional_
> @@ -215,3 +217,21 @@ A checkpoint end record marks the end of a checkpoint in the image.
>      +-------------------------------------------------+
>  
>  The end record contains no fields; its body_length is 0.
> +
> +COLO\_CONTEXT
> +--------------
> +
> +A COLO context record contains the control information for COLO.
> +
> +     0     1     2     3     4     5     6     7 octet
> +    +------------------------+------------------------+
> +    | control_id             | padding                |
> +    +------------------------+------------------------+
> +
> +--------------------------------------------------------------------
> +Field            Description
> +------------     ---------------------------------------------------
> +control_id       0x00000000: Secondary VM is out of sync, start a new checkpoint
> +                 0x00000001: Secondary VM is suspended
> +                 0x00000002: Secondary VM is ready
> +                 0x00000003: Secondary VM is resumed

This style of table in pandoc needs to be terminated with a line of
-------, just like the head of the table.

Also, I wonder at the name "COLO_CONTEXT".  CONTEXT implies an
associated blob of data, but this is not the case here.  Here, it is
more of a status update, with expected actions on some states.
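
For illustration only, the record body described in the spec hunk above can
be sketched in C as follows. This is not the code from the patch: the names
`colo_control_id`, `colo_status_rec`, `colo_id_is_known()` and
`colo_id_from_secondary()` are hypothetical. Only the field layout (a 32-bit
control_id followed by 32 bits of padding) and the four values are taken
from the quoted text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the values are the control_id values listed above. */
enum colo_control_id {
    COLO_NEW_CHECKPOINT = 0x00000000, /* Primary -> Secondary */
    COLO_SVM_SUSPENDED  = 0x00000001, /* Secondary -> Primary */
    COLO_SVM_READY      = 0x00000002, /* Secondary -> Primary */
    COLO_SVM_RESUMED    = 0x00000003, /* Secondary -> Primary */
};

/* Record body: 4 octets of control_id followed by 4 octets of padding. */
struct colo_status_rec {
    uint32_t control_id;
    uint32_t padding;
};

/* True if id is one of the four values defined in the spec text. */
static bool colo_id_is_known(uint32_t id)
{
    return id <= (uint32_t)COLO_SVM_RESUMED;
}

/* True if id is a value the secondary sends back to the primary. */
static bool colo_id_from_secondary(uint32_t id)
{
    return id >= (uint32_t)COLO_SVM_SUSPENDED &&
           id <= (uint32_t)COLO_SVM_RESUMED;
}
```

Seen this way, the body carries a state code and nothing else, which is why
a name suggesting a status update may fit better than CONTEXT.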

~Andrew

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v8 --for 4.6 COLO 03/25] libxc/migration: Specification update for DIRTY_BITMAP records
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 03/25] libxc/migration: Specification update for DIRTY_BITMAP records Yang Hongyang
@ 2015-07-15 17:13   ` Andrew Cooper
  2015-07-16  7:18     ` Yang Hongyang
  0 siblings, 1 reply; 46+ messages in thread
From: Andrew Cooper @ 2015-07-15 17:13 UTC (permalink / raw)
  To: Yang Hongyang, xen-devel
  Cc: wei.liu2, ian.campbell, wency, guijianfeng, yunhong.jiang,
	eddie.dong, rshriram, ian.jackson

On 15/07/15 10:18, Yang Hongyang wrote:
> Used by secondary to send its dirty bitmap to primary under COLO.
>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> ---
>  docs/specs/libxc-migration-stream.pandoc | 24 +++++++++++++++++++++++-
>  tools/libxc/xc_sr_common.c               |  1 +
>  tools/libxc/xc_sr_stream_format.h        |  1 +
>  3 files changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/docs/specs/libxc-migration-stream.pandoc b/docs/specs/libxc-migration-stream.pandoc
> index 68fa513..480d357 100644
> --- a/docs/specs/libxc-migration-stream.pandoc
> +++ b/docs/specs/libxc-migration-stream.pandoc
> @@ -227,7 +227,9 @@ type         0x00000000: END
>  
>               0x0000000E: CHECKPOINT
>  
> -             0x0000000F - 0x7FFFFFFF: Reserved for future _mandatory_
> +             0x0000000F: DIRTY_BITMAP
> +
> +             0x00000010 - 0x7FFFFFFF: Reserved for future _mandatory_
>               records.
>  
>               0x80000000 - 0xFFFFFFFF: Reserved for future _optional_
> @@ -601,6 +603,26 @@ CHECKPOINT record or an END record.
>  
>  \clearpage
>  
> +DIRTY_BITMAP
> +------------

I would name this DIRTY_PFN_LIST or similar, as the content of data
isn't actually a bitmap.

> +
> +A dirty_bitmap record is used for secondary to send it's dirty bitmap
> +to primary while doing a checkpoint under COLO. This record only exists
> +in back channel.

This section should purely be a description of the content.  i.e.

"A DIRTY\_xxx record is used to convey information about dirty memory in
the VM.  It is an unordered list of PFNs."

> +
> +     0     1     2     3     4     5     6     7 octet
> +    +-------------------------------------------------+
> +    | pfn[0]                                          |
> +    +-------------------------------------------------+
> +    ...
> +    +-------------------------------------------------+
> +    | pfn[C-1]                                        |
> +    +-------------------------------------------------+
> +
> +The count of the pfn is: record->length/sizeof(uint64_t).

"The count of pfns is", although I would like to hope that this is
obvious from the diagram.

Down here, there should be more description of record circumstances,
e.g. currently only applicable in the backchannel of a checkpointed stream.

Also please put some validation logic for this in
tools/python/xen/migration/libxc.py
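
The check being asked for amounts to verifying that the record body is a
whole number of 64-bit PFNs. A minimal sketch in C, for illustration (the
real validation would live in the Python verifier named above; the function
name `dirty_pfn_count()` is hypothetical, and treating an empty list as
valid is an assumption):

```c
#include <stdint.h>

/*
 * Hypothetical helper: return the number of PFNs in a record body of
 * body_length octets, or -1 if the length is not a multiple of 8.
 * Assumption: an empty list (length 0) is accepted as valid.
 */
static long dirty_pfn_count(uint32_t body_length)
{
    if (body_length % sizeof(uint64_t) != 0)
        return -1;

    return (long)(body_length / sizeof(uint64_t));
}
```

This mirrors the "record->length/sizeof(uint64_t)" formula quoted above and
rejects a body that would end in a partial PFN.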

~Andrew

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v8 --for 4.6 COLO 04/25] libxc/migration: export read_record for common use
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 04/25] libxc/migration: export read_record for common use Yang Hongyang
@ 2015-07-15 17:14   ` Andrew Cooper
  0 siblings, 0 replies; 46+ messages in thread
From: Andrew Cooper @ 2015-07-15 17:14 UTC (permalink / raw)
  To: Yang Hongyang, xen-devel
  Cc: wei.liu2, ian.campbell, wency, guijianfeng, yunhong.jiang,
	eddie.dong, rshriram, ian.jackson

On 15/07/15 10:18, Yang Hongyang wrote:
> read_record() could be used by primary to read dirty bitmap
> record sent by secondary under COLO.
> When used by save side, we need to pass the backchannel fd
> instead of ctx->fd to read_record(), so we added a fd param to
> it.
>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v8 --for 4.6 COLO 05/25] tools/libxl: add back channel support to write stream
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 05/25] tools/libxl: add back channel support to write stream Yang Hongyang
@ 2015-07-15 17:25   ` Andrew Cooper
  2015-07-16  7:21     ` Yang Hongyang
  0 siblings, 1 reply; 46+ messages in thread
From: Andrew Cooper @ 2015-07-15 17:25 UTC (permalink / raw)
  To: Yang Hongyang, xen-devel
  Cc: wei.liu2, ian.campbell, wency, guijianfeng, yunhong.jiang,
	eddie.dong, rshriram, ian.jackson

On 15/07/15 10:18, Yang Hongyang wrote:
> Add back channel support to write stream. If the write stream is
> a back channel stream, this means the write stream is used by
> Secondary to send some records back.
>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> ---
>  tools/libxl/libxl_dom_save.c     |  1 +
>  tools/libxl/libxl_internal.h     |  1 +
>  tools/libxl/libxl_stream_write.c | 16 ++++++++++++++++
>  3 files changed, 18 insertions(+)
>
> diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
> index 9b7159f..25813ce 100644
> --- a/tools/libxl/libxl_dom_save.c
> +++ b/tools/libxl/libxl_dom_save.c
> @@ -445,6 +445,7 @@ void libxl__domain_save(libxl__egc *egc, libxl__domain_save_state *dss)
>      dss->sws.ao  = dss->ao;
>      dss->sws.dss = dss;
>      dss->sws.fd  = dss->fd;
> +    dss->sws.back_channel = false;
>      dss->sws.completion_callback = stream_done;
>  
>      libxl__stream_write_start(egc, &dss->sws);
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index 9c81d8d..a83d6a5 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -2989,6 +2989,7 @@ struct libxl__stream_write_state {
>      libxl__ao *ao;
>      libxl__domain_save_state *dss;
>      int fd;
> +    bool back_channel;
>      void (*completion_callback)(libxl__egc *egc,
>                                  libxl__stream_write_state *sws,
>                                  int rc);
> diff --git a/tools/libxl/libxl_stream_write.c b/tools/libxl/libxl_stream_write.c
> index 16f667a..df55277 100644
> --- a/tools/libxl/libxl_stream_write.c
> +++ b/tools/libxl/libxl_stream_write.c
> @@ -47,6 +47,13 @@
>   *  - Toolstack record
>   *  - if (hvm), Qemu record
>   *  - Checkpoint end record
> + *
> + * For back channel stream:
> + * - libxl__stream_write_start()
> + *    - Set up the stream to running state
> + *
> + * - Add a new API to write the record. When the record is written
> + *   out, call stream->checkpoint_callback() to return.
>   */
>  
>  /* Success/error/cleanup handling. */
> @@ -178,6 +185,9 @@ void libxl__stream_write_start(libxl__egc *egc,
>  
>      stream->running = true;
>  
> +    if (stream->back_channel)
> +        return;
> +

Some of the setup below should not be skipped.

While it makes the diff bigger, the end result would be more logical as

dc->ao = ao;
dc->readfd = -1;
dc->writefd = stream->fd;
dc->maxsz = -1;

if (!stream->back_channel) {
    /* Write the stream header. */
    dc->writewhat = "save/migration stream";
    dc->callback = stream_header_done;

    rc = libxl__datacopier_start(dc);
    ....
}

This splits the object setup from the action of sending the stream header,
which are currently mixed.
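
A self-contained mock of that shape, for illustration. The types
`mock_dc` and `mock_stream` and the function `mock_stream_write_start()`
are stand-ins invented here, not libxl types, and the `started` flag merely
stands in for kicking off the datacopier:

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the datacopier state; not the real libxl structure. */
struct mock_dc {
    int readfd;
    int writefd;
    long maxsz;
    const char *writewhat;
    bool started;
};

/* Stand-in for the write stream state. */
struct mock_stream {
    int fd;
    bool back_channel;
    bool running;
    struct mock_dc dc;
};

static void mock_stream_write_start(struct mock_stream *s)
{
    struct mock_dc *dc = &s->dc;

    s->running = true;

    /* Object setup: done for every stream, back channel or not. */
    dc->readfd = -1;
    dc->writefd = s->fd;
    dc->maxsz = -1;
    dc->writewhat = NULL;
    dc->started = false;

    /* Action: only a forward stream writes the stream header. */
    if (!s->back_channel) {
        dc->writewhat = "save/migration stream";
        dc->started = true; /* stands in for starting the datacopier */
    }
}
```

With this split, a back channel stream still ends up with a usable writefd,
which the early return in the patch would skip.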

What happened to the backchannel negotiation header?

~Andrew

>      dc->ao        = ao;
>      dc->readfd    = -1;
>      dc->writewhat = "save/migration stream";
> @@ -207,6 +217,7 @@ void libxl__stream_write_start_checkpoint(libxl__egc *egc,
>  {
>      assert(stream->running);
>      assert(!stream->in_checkpoint);
> +    assert(!stream->back_channel);
>      stream->in_checkpoint = true;
>  
>      write_toolstack_record(egc, stream);
> @@ -500,6 +511,11 @@ static void stream_done(libxl__egc *egc,
>      assert(stream->running);
>      stream->running = false;
>  
> +    if (stream->back_channel) {
> +        stream->completion_callback(egc, stream, stream->rc);
> +        return;
> +    }
> +
>      if (stream->emu_carefd)
>          libxl__carefd_close(stream->emu_carefd);
>      free(stream->emu_body);

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v8 --for 4.6 COLO 06/25] tools/libxl: write colo_context records into the stream
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 06/25] tools/libxl: write colo_context records into the stream Yang Hongyang
@ 2015-07-15 17:35   ` Andrew Cooper
  2015-07-16  7:24     ` Yang Hongyang
  0 siblings, 1 reply; 46+ messages in thread
From: Andrew Cooper @ 2015-07-15 17:35 UTC (permalink / raw)
  To: Yang Hongyang, xen-devel
  Cc: wei.liu2, ian.campbell, wency, guijianfeng, yunhong.jiang,
	eddie.dong, rshriram, ian.jackson

On 15/07/15 10:18, Yang Hongyang wrote:
> write colo_context records into the stream, used by both
> primary and secondary to send colo context.
>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> ---
>  tools/libxl/libxl_internal.h     |  5 +++
>  tools/libxl/libxl_stream_write.c | 87 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 92 insertions(+)
>
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index a83d6a5..2634836 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -3000,6 +3000,7 @@ struct libxl__stream_write_state {
>      int rc;
>      bool running;
>      bool in_checkpoint;
> +    bool in_colo_context;
>      libxl__save_helper_state shs;
>  
>      /* Main stream-writing data. */
> @@ -3019,6 +3020,10 @@ _hidden void libxl__stream_write_start(libxl__egc *egc,
>  _hidden void
>  libxl__stream_write_start_checkpoint(libxl__egc *egc,
>                                       libxl__stream_write_state *stream);
> +_hidden void
> +libxl__stream_write_colo_context(libxl__egc *egc,
> +                                 libxl__stream_write_state *stream,
> +                                 libxl_sr_colo_context *colo_context);
>  _hidden void libxl__stream_write_abort(libxl__egc *egc,
>                                         libxl__stream_write_state *stream,
>                                         int rc);
> diff --git a/tools/libxl/libxl_stream_write.c b/tools/libxl/libxl_stream_write.c
> index df55277..e7a32c4 100644
> --- a/tools/libxl/libxl_stream_write.c
> +++ b/tools/libxl/libxl_stream_write.c
> @@ -96,6 +96,16 @@ static void write_checkpoint_end_record(libxl__egc *egc,
>  static void checkpoint_end_record_done(libxl__egc *egc,
>                                         libxl__stream_write_state *stream);
>  
> +/* COLO context */
> +static void write_colo_context(libxl__egc *egc,
> +                               libxl__stream_write_state *stream,
> +                               libxl_sr_colo_context *colo_context);
> +static void write_colo_context_done(libxl__egc *egc,
> +                                    libxl__datacopier_state *dc,
> +                                    int rc, int onwrite, int errnoval);
> +static void colo_context_done(libxl__egc *egc,
> +                              libxl__stream_write_state *stream, int rc);
> +
>  /*----- Helpers -----*/
>  
>  static void write_done(libxl__egc *egc,
> @@ -500,6 +510,11 @@ static void stream_complete(libxl__egc *egc,
>          return;
>      }
>  
> +    if (stream->in_colo_context) {
> +        colo_context_done(egc, stream, rc);
> +        return;
> +    }

Please follow the same style as stream->in_checkpoint: assert(rc) and
explain how the error comes back around.

> +
>      if (!stream->rc)
>          stream->rc = rc;
>      stream_done(egc, stream);
> @@ -555,6 +570,78 @@ static void check_all_finished(libxl__egc *egc,
>      stream->completion_callback(egc, stream, stream->rc);
>  }
>  
> +/*----- COLO context -----*/
> +void libxl__stream_write_colo_context(libxl__egc *egc,
> +                                      libxl__stream_write_state *stream,
> +                                      libxl_sr_colo_context *colo_context)
> +{
> +    assert(stream->running);
> +    assert(!stream->in_checkpoint);
> +    assert(!stream->in_colo_context);
> +    stream->in_colo_context = true;
> +
> +    write_colo_context(egc, stream, colo_context);

Use setup_write() here, which will remove most of the rest of this
patch, and cover all your error handling for you.

You want a small pair of functions such as the write_checkpoint_end()
pair.  See write_toolstack_record() for an example using setup_write()
with a body.

Also, to preempt Ian Jackson's review, use FILLZERO() over "= { ... }".

~Andrew

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v8 --for 4.6 COLO 07/25] tools/libxl: add back channel support to read stream
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 07/25] tools/libxl: add back channel support to read stream Yang Hongyang
@ 2015-07-15 17:38   ` Andrew Cooper
  2015-07-16  7:25     ` Yang Hongyang
  0 siblings, 1 reply; 46+ messages in thread
From: Andrew Cooper @ 2015-07-15 17:38 UTC (permalink / raw)
  To: Yang Hongyang, xen-devel
  Cc: wei.liu2, ian.campbell, wency, guijianfeng, yunhong.jiang,
	eddie.dong, rshriram, ian.jackson

On 15/07/15 10:18, Yang Hongyang wrote:
> This is used by primary to read records sent by secondary.
>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> ---
>  tools/libxl/libxl_create.c      |  1 +
>  tools/libxl/libxl_internal.h    |  1 +
>  tools/libxl/libxl_stream_read.c | 17 +++++++++++++++++
>  3 files changed, 19 insertions(+)
>
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> index 1d4b13b..1af7103 100644
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -978,6 +978,7 @@ static void domcreate_bootloader_done(libxl__egc *egc,
>      dcs->srs.dcs = dcs;
>      dcs->srs.fd = restore_fd;
>      dcs->srs.legacy = (dcs->restore_params.stream_version == 1);
> +    dcs->srs.back_channel = false;
>      dcs->srs.completion_callback = domcreate_stream_done;
>  
>      libxl__stream_read_start(egc, &dcs->srs);
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index 2634836..05cee04 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -3358,6 +3358,7 @@ struct libxl__stream_read_state {
>      libxl__domain_create_state *dcs;
>      int fd;
>      bool legacy;
> +    bool back_channel;
>      void (*completion_callback)(libxl__egc *egc,
>                                  libxl__stream_read_state *srs,
>                                  int rc);
> diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
> index 2d17403..b924f05 100644
> --- a/tools/libxl/libxl_stream_read.c
> +++ b/tools/libxl/libxl_stream_read.c
> @@ -104,6 +104,15 @@
>   * Depending on the contents of the stream, there are likely to be several
>   * parallel tasks being managed.  check_all_finished() is used to join all
>   * tasks in both success and error cases.
> + *
> + * For back channel stream:
> + * - libxl__stream_read_start()
> + *    - Set up the stream to running state
> + *
> + * - libxl__stream_read_continue()
> + *     - Set up reading the next record from a started stream.
> + *       Add some codes to process_record() to handle the record.
> + *       Then call stream->checkpoint_callback() to return.
>   */
>  
>  /* Success/error/cleanup handling. */
> @@ -200,6 +209,9 @@ void libxl__stream_read_start(libxl__egc *egc,
>      stream->running = true;
>      stream->phase   = SRS_PHASE_NORMAL;
>  
> +    if (stream->back_channel)
> +        return;
> +
>      if (stream->legacy) {
>          /* Convert the legacy stream. */
>          libxl__conversion_helper_state *chs = &stream->chs;
> @@ -700,6 +712,11 @@ static void stream_done(libxl__egc *egc,
>      assert(!stream->in_checkpoint);
>      stream->running = false;
>  
> +    if (stream->back_channel) {
> +        stream->completion_callback(egc, stream, stream->rc);
> +        return;
> +    }
> +

This should be in stream_complete() not stream_done().  stream_done() is
strictly called once, and cleans stuff up.

~Andrew

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v8 --for 4.6 COLO 08/25] tools/libxl: handle colo_context records in a libxl migration v2 read stream
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 08/25] tools/libxl: handle colo_context records in a libxl migration v2 " Yang Hongyang
@ 2015-07-15 17:44   ` Andrew Cooper
  2015-07-16  7:52     ` Yang Hongyang
  0 siblings, 1 reply; 46+ messages in thread
From: Andrew Cooper @ 2015-07-15 17:44 UTC (permalink / raw)
  To: Yang Hongyang, xen-devel
  Cc: wei.liu2, ian.campbell, wency, guijianfeng, yunhong.jiang,
	eddie.dong, rshriram, ian.jackson

On 15/07/15 10:18, Yang Hongyang wrote:
> Read a colo_context and call stream->checkpoint_callback to handle it.
>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> ---
>  tools/libxl/libxl_internal.h    |  3 +++
>  tools/libxl/libxl_stream_read.c | 51 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 54 insertions(+)
>
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index 05cee04..1be2a4a 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -3369,6 +3369,7 @@ struct libxl__stream_read_state {
>      int rc;
>      bool running;
>      bool in_checkpoint;
> +    bool in_colo_context;
>      libxl__save_helper_state shs;
>      libxl__conversion_helper_state chs;
>  
> @@ -3396,6 +3397,8 @@ _hidden void libxl__stream_read_start(libxl__egc *egc,
>                                        libxl__stream_read_state *stream);
>  _hidden void libxl__stream_read_start_checkpoint(libxl__egc *egc,
>                                                   libxl__stream_read_state *stream);
> +_hidden void libxl__stream_read_colo_context(libxl__egc *egc,
> +                                             libxl__stream_read_state *stream);
>  _hidden void libxl__stream_read_abort(libxl__egc *egc,
>                                        libxl__stream_read_state *stream, int rc);
>  static inline bool
> diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
> index b924f05..ab47251 100644
> --- a/tools/libxl/libxl_stream_read.c
> +++ b/tools/libxl/libxl_stream_read.c
> @@ -152,6 +152,13 @@ static void write_emulator_done(libxl__egc *egc,
>                                  libxl__datacopier_state *dc,
>                                  int rc, int onwrite, int errnoval);
>  
> +/* Handlers for colo context mini-loop */
> +static void handle_colo_context(libxl__egc *egc,
> +                                libxl__stream_read_state *stream,
> +                                libxl__sr_record_buf *rec);
> +static void colo_context_done(libxl__egc *egc,
> +                              libxl__stream_read_state *stream, int rc);
> +
>  /*----- Helpers -----*/
>  
>  /* Helper to set up reading some data from the stream. */
> @@ -569,6 +576,15 @@ static bool process_record(libxl__egc *egc,
>          checkpoint_done(egc, stream, 0);
>          break;
>  
> +    case REC_TYPE_COLO_CONTEXT:
> +        if (!stream->in_colo_context) {
> +            LOG(ERROR, "Unexpected COLO_CONTEXT record in stream");
> +            rc = ERROR_FAIL;
> +            goto err;
> +        }
> +        handle_colo_context(egc, stream, rec);
> +        break;
> +
>      default:
>          LOG(ERROR, "Unrecognised record 0x%08x", rec->hdr.type);
>          rc = ERROR_FAIL;
> @@ -678,6 +694,11 @@ static void stream_complete(libxl__egc *egc,
>          return;
>      }
>  
> +    if (stream->in_colo_context) {
> +        colo_context_done(egc, stream, rc);
> +        return;
> +    }
> +
>      if (!stream->rc)
>          stream->rc = rc;
>      stream_done(egc, stream);
> @@ -794,6 +815,36 @@ static void check_all_finished(libxl__egc *egc,
>      stream->completion_callback(egc, stream, stream->rc);
>  }
>  
> +/*----- COLO context handlers -----*/
> +
> +void libxl__stream_read_colo_context(libxl__egc *egc,
> +                                     libxl__stream_read_state *stream)

A name like this makes the erroneous assumption that a COLO\_CONTEXT
record is what is going to be found next in the stream.

Where and when is a COLO\_CONTEXT record expected, and is it only in the
backchannel?

> +{
> +    assert(stream->running);
> +    assert(!stream->in_checkpoint);
> +    assert(!stream->in_colo_context);
> +    stream->in_colo_context = true;
> +
> +    setup_read_record(egc, stream);
> +}
> +
> +static void handle_colo_context(libxl__egc *egc,
> +                                libxl__stream_read_state *stream,
> +                                libxl__sr_record_buf *rec)
> +{
> +    libxl_sr_colo_context *colo_context = rec->body;
> +
> +    colo_context_done(egc, stream, colo_context->id);

A handler this trivial should just be done in the switch statement in
process_record().  No need for its own function for a single forward call.

~Andrew

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v8 --for 4.6 COLO 15/25] send store mfn and console mfn to xl before resuming secondary vm
  2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 15/25] send store mfn and console mfn to xl before resuming secondary vm Yang Hongyang
@ 2015-07-15 18:15   ` Andrew Cooper
  2015-07-16  7:56     ` Yang Hongyang
  0 siblings, 1 reply; 46+ messages in thread
From: Andrew Cooper @ 2015-07-15 18:15 UTC (permalink / raw)
  To: Yang Hongyang, xen-devel
  Cc: wei.liu2, ian.campbell, wency, guijianfeng, yunhong.jiang,
	eddie.dong, rshriram, ian.jackson

On 15/07/15 10:18, Yang Hongyang wrote:
> From: Wen Congyang <wency@cn.fujitsu.com>
>
> We will call libxl__xc_domain_restore_done() to rebuild secondary vm. But
> we need store mfn and console mfn when rebuilding secondary vm. So make
> restore_results a function pointer in callback struct and struct
> {save,restore}_callbacks, and use this callback to send store mfn and
> console mfn to xl.
>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>  tools/libxc/include/xenguest.h     | 8 ++++++++
>  tools/libxc/xc_sr_restore.c        | 7 +++++--
>  tools/libxl/libxl_colo_restore.c   | 5 -----
>  tools/libxl/libxl_create.c         | 2 ++
>  tools/libxl/libxl_save_msgs_gen.pl | 2 +-
>  5 files changed, 16 insertions(+), 8 deletions(-)
>
> diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
> index 1e7e1bb..d7bdfb5 100644
> --- a/tools/libxc/include/xenguest.h
> +++ b/tools/libxc/include/xenguest.h
> @@ -140,6 +140,14 @@ struct restore_callbacks {
>       */
>      int (*should_checkpoint)(void* data);
>  
> +    /*
> +     * callback to send store mfn and console mfn to xl
> +     * if we want to resume vm before xc_domain_save()
> +     * exits.
> +     */
> +    void (*restore_results)(unsigned long store_mfn, unsigned long console_mfn,
> +                            void *data);

These need to be xen_pfn_t to be usable on ARM.  Also, they are gfns,
not mfns.

A whole lot of terminology in this area is wrong.  The top of
xen/include/xen/mm.h is the authoritative description of terms, starting
from c/s e758ed1
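
A minimal sketch of the callback with both changes applied, for
illustration. The `xen_pfn_t` typedef below is a local stand-in (the real
type comes from the Xen public headers, and is assumed here to be 64 bits
wide, as it is believed to be on ARM); `restore_callbacks_sketch`,
`saved_results` and `save_results()` are hypothetical names:

```c
#include <stdint.h>

/* Stand-in only: the real xen_pfn_t comes from the Xen public headers. */
typedef uint64_t xen_pfn_t;

/* Hypothetical cut-down callback struct, with gfn naming. */
struct restore_callbacks_sketch {
    void (*restore_results)(xen_pfn_t store_gfn, xen_pfn_t console_gfn,
                            void *data);
    void *data;
};

/* Hypothetical consumer state. */
struct saved_results {
    xen_pfn_t store_gfn;
    xen_pfn_t console_gfn;
};

/* Record both frame numbers without narrowing them. */
static void save_results(xen_pfn_t store_gfn, xen_pfn_t console_gfn,
                         void *data)
{
    struct saved_results *r = data;

    r->store_gfn = store_gfn;
    r->console_gfn = console_gfn;
}
```

With `unsigned long` parameters, a frame number above 32 bits would be
truncated on a toolstack where `unsigned long` is 32 bits wide; a dedicated
frame-number type avoids that.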

~Andrew

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v8 --for 4.6 COLO 02/25] docs/libxl: Introduce COLO_CONTEXT to support migration v2 colo streams
  2015-07-15 16:52   ` Andrew Cooper
@ 2015-07-16  6:32     ` Yang Hongyang
  2015-07-16  9:45       ` Andrew Cooper
  0 siblings, 1 reply; 46+ messages in thread
From: Yang Hongyang @ 2015-07-16  6:32 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: wei.liu2, ian.campbell, wency, guijianfeng, yunhong.jiang,
	eddie.dong, rshriram, ian.jackson



On 07/16/2015 12:52 AM, Andrew Cooper wrote:
> On 15/07/15 10:18, Yang Hongyang wrote:
>> From: Wen Congyang <wency@cn.fujitsu.com>
>>
>> It is the negotiation record for COLO.
>> Primary->Secondary:
>> control_id      0x00000000: Secondary VM is out of sync, start a new checkpoint
>> Secondary->Primary:
>>                  0x00000001: Secondary VM is suspended
>>                  0x00000002: Secondary VM is ready
>>                  0x00000003: Secondary VM is resumed
>>
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> ---
>>   docs/specs/libxl-migration-stream.pandoc | 22 +++++++++++++++++++++-
>>   tools/libxl/libxl_sr_stream_format.h     | 11 +++++++++++
>>   tools/python/xen/migration/libxl.py      |  9 +++++++++
>>   3 files changed, 41 insertions(+), 1 deletion(-)
>>
>> diff --git a/docs/specs/libxl-migration-stream.pandoc b/docs/specs/libxl-migration-stream.pandoc
>> index c24a434..5986273 100644
>> --- a/docs/specs/libxl-migration-stream.pandoc
>> +++ b/docs/specs/libxl-migration-stream.pandoc
>> @@ -121,7 +121,9 @@ type         0x00000000: END
>>
>>                0x00000004: CHECKPOINT_END
>>
>> -             0x00000005 - 0x7FFFFFFF: Reserved for future _mandatory_
>> +             0x00000005: COLO_CONTEXT
>> +
>> +             0x00000006 - 0x7FFFFFFF: Reserved for future _mandatory_
>>                records.
>>
>>                0x80000000 - 0xFFFFFFFF: Reserved for future _optional_
>> @@ -215,3 +217,21 @@ A checkpoint end record marks the end of a checkpoint in the image.
>>       +-------------------------------------------------+
>>
>>   The end record contains no fields; its body_length is 0.
>> +
>> +COLO\_CONTEXT
>> +--------------
>> +
>> +A COLO context record contains the control information for COLO.
>> +
>> +     0     1     2     3     4     5     6     7 octet
>> +    +------------------------+------------------------+
>> +    | control_id             | padding                |
>> +    +------------------------+------------------------+
>> +
>> +--------------------------------------------------------------------
>> +Field            Description
>> +------------     ---------------------------------------------------
>> +control_id       0x00000000: Secondary VM is out of sync, start a new checkpoint
>> +                 0x00000001: Secondary VM is suspended
>> +                 0x00000002: Secondary VM is ready
>> +                 0x00000003: Secondary VM is resumed
>
> This style of table in pandoc need to be terminated with a line of
> -------, just like the head of the table.

Ok

>
> Also, I wonder at the name "COLO_CONTEXT".  CONTEXT implies an
> associated blob of data, but this is not the case here.  Here, it is
> more of a status update, with expected actions on some states.

True, could you suggest a better name? Sorry for my bad English...

>
> ~Andrew
> .
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v8 --for 4.6 COLO 03/25] libxc/migration: Specification update for DIRTY_BITMAP records
  2015-07-15 17:13   ` Andrew Cooper
@ 2015-07-16  7:18     ` Yang Hongyang
  0 siblings, 0 replies; 46+ messages in thread
From: Yang Hongyang @ 2015-07-16  7:18 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: wei.liu2, ian.campbell, wency, guijianfeng, yunhong.jiang,
	eddie.dong, rshriram, ian.jackson


On 07/16/2015 01:13 AM, Andrew Cooper wrote:
> On 15/07/15 10:18, Yang Hongyang wrote:
>> Used by secondary to send its dirty bitmap to primary under COLO.
>>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> ---
>>   docs/specs/libxc-migration-stream.pandoc | 24 +++++++++++++++++++++++-
>>   tools/libxc/xc_sr_common.c               |  1 +
>>   tools/libxc/xc_sr_stream_format.h        |  1 +
>>   3 files changed, 25 insertions(+), 1 deletion(-)
>>
>> diff --git a/docs/specs/libxc-migration-stream.pandoc b/docs/specs/libxc-migration-stream.pandoc
>> index 68fa513..480d357 100644
>> --- a/docs/specs/libxc-migration-stream.pandoc
>> +++ b/docs/specs/libxc-migration-stream.pandoc
>> @@ -227,7 +227,9 @@ type         0x00000000: END
>>
>>                0x0000000E: CHECKPOINT
>>
>> -             0x0000000F - 0x7FFFFFFF: Reserved for future _mandatory_
>> +             0x0000000F: DIRTY_BITMAP
>> +
>> +             0x00000010 - 0x7FFFFFFF: Reserved for future _mandatory_
>>                records.
>>
>>                0x80000000 - 0xFFFFFFFF: Reserved for future _optional_
>> @@ -601,6 +603,26 @@ CHECKPOINT record or an END record.
>>
>>   \clearpage
>>
>> +DIRTY_BITMAP
>> +------------
>
> I would name this DIRTY_PFN_LIST or similar, as the content of data
> isn't actually a bitmap.

This sounds better, thanks!

>
>> +
>> +A dirty_bitmap record is used for secondary to send it's dirty bitmap
>> +to primary while doing a checkpoint under COLO. This record only exists
>> +in back channel.
>
> This section should purely be a description of the content.  i.e.
>
> "A DIRTY\_xxx record is used to convey information about dirty memory in
> the VM.  It is an unordered list of PFNs."

Ok.

>
>> +
>> +     0     1     2     3     4     5     6     7 octet
>> +    +-------------------------------------------------+
>> +    | pfn[0]                                          |
>> +    +-------------------------------------------------+
>> +    ...
>> +    +-------------------------------------------------+
>> +    | pfn[C-1]                                        |
>> +    +-------------------------------------------------+
>> +
>> +The count of the pfn is: record->length/sizeof(uint64_t).
>
> "The count of pfns is", although I would like to hope that this is
> obvious from the diagram.
>
> Down here, there should be more description of record circumstances,
> e.g. currently only applicable in the backchannel of a checkpointed stream.
>
> Also please put some validation logic for this in
> tools/python/xen/migration/libxc.py

Will do, thank you.

>
> ~Andrew
> .
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v8 --for 4.6 COLO 05/25] tools/libxl: add back channel support to write stream
  2015-07-15 17:25   ` Andrew Cooper
@ 2015-07-16  7:21     ` Yang Hongyang
  0 siblings, 0 replies; 46+ messages in thread
From: Yang Hongyang @ 2015-07-16  7:21 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: wei.liu2, ian.campbell, wency, guijianfeng, yunhong.jiang,
	eddie.dong, rshriram, ian.jackson



On 07/16/2015 01:25 AM, Andrew Cooper wrote:
> On 15/07/15 10:18, Yang Hongyang wrote:
>> Add back channel support to write stream. If the write stream is
>> a back channel stream, this means the write stream is used by
>> Secondary to send some records back.
>>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> ---
>>   tools/libxl/libxl_dom_save.c     |  1 +
>>   tools/libxl/libxl_internal.h     |  1 +
>>   tools/libxl/libxl_stream_write.c | 16 ++++++++++++++++
>>   3 files changed, 18 insertions(+)
>>
>> diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
>> index 9b7159f..25813ce 100644
>> --- a/tools/libxl/libxl_dom_save.c
>> +++ b/tools/libxl/libxl_dom_save.c
>> @@ -445,6 +445,7 @@ void libxl__domain_save(libxl__egc *egc, libxl__domain_save_state *dss)
>>       dss->sws.ao  = dss->ao;
>>       dss->sws.dss = dss;
>>       dss->sws.fd  = dss->fd;
>> +    dss->sws.back_channel = false;
>>       dss->sws.completion_callback = stream_done;
>>
>>       libxl__stream_write_start(egc, &dss->sws);
>> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
>> index 9c81d8d..a83d6a5 100644
>> --- a/tools/libxl/libxl_internal.h
>> +++ b/tools/libxl/libxl_internal.h
>> @@ -2989,6 +2989,7 @@ struct libxl__stream_write_state {
>>       libxl__ao *ao;
>>       libxl__domain_save_state *dss;
>>       int fd;
>> +    bool back_channel;
>>       void (*completion_callback)(libxl__egc *egc,
>>                                   libxl__stream_write_state *sws,
>>                                   int rc);
>> diff --git a/tools/libxl/libxl_stream_write.c b/tools/libxl/libxl_stream_write.c
>> index 16f667a..df55277 100644
>> --- a/tools/libxl/libxl_stream_write.c
>> +++ b/tools/libxl/libxl_stream_write.c
>> @@ -47,6 +47,13 @@
>>    *  - Toolstack record
>>    *  - if (hvm), Qemu record
>>    *  - Checkpoint end record
>> + *
>> + * For back channel stream:
>> + * - libxl__stream_write_start()
>> + *    - Set up the stream to running state
>> + *
>> + * - Add a new API to write the record. When the record is written
>> + *   out, call stream->checkpoint_callback() to return.
>>    */
>>
>>   /* Success/error/cleanup handling. */
>> @@ -178,6 +185,9 @@ void libxl__stream_write_start(libxl__egc *egc,
>>
>>       stream->running = true;
>>
>> +    if (stream->back_channel)
>> +        return;
>> +
>
> Some of the setup below should not be skipped.
>
> While it makes the diff bigger, the end result would be more logical as
>
> dc->ao = ao;
> dc->readfd = -1;
> dc->writefd = stream->fd;
> dc->maxsz = -1;
>
> if (!stream->back_channel) {
>      /* Write the stream header. */
>      dc->writewhat = "save/migration stream";
>      dc->callback = stream_header_done;
>
>      rc = libxl__datacopier_start(dc);
>      ....
> }
>
> To split the object setup from the action of sending the stream header,
> which are currently mixed.
>
> What happened to the backchannel negotiation header?

There's no negotiation header currently. If the back channel is broken,
COLO will simply fail.
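
To check my understanding of the suggested split between object setup and
sending the header, here is a rough, self-contained sketch. The struct and
function names are stand-ins invented for illustration, not the real libxl
types, and the datacopier start is reduced to setting writewhat.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in types for illustration only; the real libxl structures differ. */
struct datacopier {
    int readfd;
    int writefd;
    long maxsz;
    const char *writewhat;   /* NULL until a header write is queued */
};

struct write_stream {
    int fd;
    bool back_channel;
    bool running;
    struct datacopier dc;
};

/*
 * Object setup happens for every stream; only a forward (non back
 * channel) stream goes on to queue the stream header.
 */
void write_stream_start(struct write_stream *stream)
{
    struct datacopier *dc = &stream->dc;

    assert(!stream->running);
    stream->running = true;

    /* Common setup, needed by later record writes on either channel. */
    dc->readfd = -1;
    dc->writefd = stream->fd;
    dc->maxsz = -1;
    dc->writewhat = NULL;

    if (!stream->back_channel) {
        /* The real code would start the datacopier here. */
        dc->writewhat = "save/migration stream";
    }
}
```

With this shape a back channel stream still ends up with a usable
datacopier, it just never writes a header.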

>
> ~Andrew
>
>>       dc->ao        = ao;
>>       dc->readfd    = -1;
>>       dc->writewhat = "save/migration stream";
>> @@ -207,6 +217,7 @@ void libxl__stream_write_start_checkpoint(libxl__egc *egc,
>>   {
>>       assert(stream->running);
>>       assert(!stream->in_checkpoint);
>> +    assert(!stream->back_channel);
>>       stream->in_checkpoint = true;
>>
>>       write_toolstack_record(egc, stream);
>> @@ -500,6 +511,11 @@ static void stream_done(libxl__egc *egc,
>>       assert(stream->running);
>>       stream->running = false;
>>
>> +    if (stream->back_channel) {
>> +        stream->completion_callback(egc, stream, stream->rc);
>> +        return;
>> +    }
>> +
>>       if (stream->emu_carefd)
>>           libxl__carefd_close(stream->emu_carefd);
>>       free(stream->emu_body);
>
> .
>

-- 
Thanks,
Yang.


* Re: [PATCH v8 --for 4.6 COLO 06/25] tools/libxl: write colo_context records into the stream
  2015-07-15 17:35   ` Andrew Cooper
@ 2015-07-16  7:24     ` Yang Hongyang
  0 siblings, 0 replies; 46+ messages in thread
From: Yang Hongyang @ 2015-07-16  7:24 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: wei.liu2, ian.campbell, wency, guijianfeng, yunhong.jiang,
	eddie.dong, rshriram, ian.jackson



On 07/16/2015 01:35 AM, Andrew Cooper wrote:
> On 15/07/15 10:18, Yang Hongyang wrote:
>> write colo_context records into the stream, used by both
>> primary and secondary to send colo context.
>>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> ---
>>   tools/libxl/libxl_internal.h     |  5 +++
>>   tools/libxl/libxl_stream_write.c | 87 ++++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 92 insertions(+)
>>
>> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
>> index a83d6a5..2634836 100644
>> --- a/tools/libxl/libxl_internal.h
>> +++ b/tools/libxl/libxl_internal.h
>> @@ -3000,6 +3000,7 @@ struct libxl__stream_write_state {
>>       int rc;
>>       bool running;
>>       bool in_checkpoint;
>> +    bool in_colo_context;
>>       libxl__save_helper_state shs;
>>
>>       /* Main stream-writing data. */
>> @@ -3019,6 +3020,10 @@ _hidden void libxl__stream_write_start(libxl__egc *egc,
>>   _hidden void
>>   libxl__stream_write_start_checkpoint(libxl__egc *egc,
>>                                        libxl__stream_write_state *stream);
>> +_hidden void
>> +libxl__stream_write_colo_context(libxl__egc *egc,
>> +                                 libxl__stream_write_state *stream,
>> +                                 libxl_sr_colo_context *colo_context);
>>   _hidden void libxl__stream_write_abort(libxl__egc *egc,
>>                                          libxl__stream_write_state *stream,
>>                                          int rc);
>> diff --git a/tools/libxl/libxl_stream_write.c b/tools/libxl/libxl_stream_write.c
>> index df55277..e7a32c4 100644
>> --- a/tools/libxl/libxl_stream_write.c
>> +++ b/tools/libxl/libxl_stream_write.c
>> @@ -96,6 +96,16 @@ static void write_checkpoint_end_record(libxl__egc *egc,
>>   static void checkpoint_end_record_done(libxl__egc *egc,
>>                                          libxl__stream_write_state *stream);
>>
>> +/* COLO context */
>> +static void write_colo_context(libxl__egc *egc,
>> +                               libxl__stream_write_state *stream,
>> +                               libxl_sr_colo_context *colo_context);
>> +static void write_colo_context_done(libxl__egc *egc,
>> +                                    libxl__datacopier_state *dc,
>> +                                    int rc, int onwrite, int errnoval);
>> +static void colo_context_done(libxl__egc *egc,
>> +                              libxl__stream_write_state *stream, int rc);
>> +
>>   /*----- Helpers -----*/
>>
>>   static void write_done(libxl__egc *egc,
>> @@ -500,6 +510,11 @@ static void stream_complete(libxl__egc *egc,
>>           return;
>>       }
>>
>> +    if (stream->in_colo_context) {
>> +        colo_context_done(egc, stream, rc);
>> +        return;
>> +    }
>
> Please follow the same style as stream->in_checkpoint, asserting(rc) and
> explaining how the error comes back around.
>
>> +
>>       if (!stream->rc)
>>           stream->rc = rc;
>>       stream_done(egc, stream);
>> @@ -555,6 +570,78 @@ static void check_all_finished(libxl__egc *egc,
>>       stream->completion_callback(egc, stream, stream->rc);
>>   }
>>
>> +/*----- COLO context -----*/
>> +void libxl__stream_write_colo_context(libxl__egc *egc,
>> +                                      libxl__stream_write_state *stream,
>> +                                      libxl_sr_colo_context *colo_context)
>> +{
>> +    assert(stream->running);
>> +    assert(!stream->in_checkpoint);
>> +    assert(!stream->in_colo_context);
>> +    stream->in_colo_context = true;
>> +
>> +    write_colo_context(egc, stream, colo_context);
>
> Use setup_write() here, which will remove most of the rest of this
> patch, and cover all your error handling for you.

setup_write() was introduced in your v4 series, I think? So I missed
that part when rebasing...

>
> You want a small pair of functions such as the write_checkpoint_end()
> pair.  See write_toolstack_record() for an example using setup_write()
> with a body.
>
> Also, to preempt Ian Jacksons review, use FILLZERO() over "= { ... }"

Ok, thank you for the kind explanation, will address this in the
next version.

>
> ~Andrew
> .
>

-- 
Thanks,
Yang.


* Re: [PATCH v8 --for 4.6 COLO 07/25] tools/libxl: add back channel support to read stream
  2015-07-15 17:38   ` Andrew Cooper
@ 2015-07-16  7:25     ` Yang Hongyang
  0 siblings, 0 replies; 46+ messages in thread
From: Yang Hongyang @ 2015-07-16  7:25 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: wei.liu2, ian.campbell, wency, guijianfeng, yunhong.jiang,
	eddie.dong, rshriram, ian.jackson

On 07/16/2015 01:38 AM, Andrew Cooper wrote:
> On 15/07/15 10:18, Yang Hongyang wrote:
>> This is used by primary to read records sent by secondary.
>>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> ---
>>   tools/libxl/libxl_create.c      |  1 +
>>   tools/libxl/libxl_internal.h    |  1 +
>>   tools/libxl/libxl_stream_read.c | 17 +++++++++++++++++
>>   3 files changed, 19 insertions(+)
>>
>> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
>> index 1d4b13b..1af7103 100644
>> --- a/tools/libxl/libxl_create.c
>> +++ b/tools/libxl/libxl_create.c
>> @@ -978,6 +978,7 @@ static void domcreate_bootloader_done(libxl__egc *egc,
>>       dcs->srs.dcs = dcs;
>>       dcs->srs.fd = restore_fd;
>>       dcs->srs.legacy = (dcs->restore_params.stream_version == 1);
>> +    dcs->srs.back_channel = false;
>>       dcs->srs.completion_callback = domcreate_stream_done;
>>
>>       libxl__stream_read_start(egc, &dcs->srs);
>> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
>> index 2634836..05cee04 100644
>> --- a/tools/libxl/libxl_internal.h
>> +++ b/tools/libxl/libxl_internal.h
>> @@ -3358,6 +3358,7 @@ struct libxl__stream_read_state {
>>       libxl__domain_create_state *dcs;
>>       int fd;
>>       bool legacy;
>> +    bool back_channel;
>>       void (*completion_callback)(libxl__egc *egc,
>>                                   libxl__stream_read_state *srs,
>>                                   int rc);
>> diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
>> index 2d17403..b924f05 100644
>> --- a/tools/libxl/libxl_stream_read.c
>> +++ b/tools/libxl/libxl_stream_read.c
>> @@ -104,6 +104,15 @@
>>    * Depending on the contents of the stream, there are likely to be several
>>    * parallel tasks being managed.  check_all_finished() is used to join all
>>    * tasks in both success and error cases.
>> + *
>> + * For back channel stream:
>> + * - libxl__stream_read_start()
>> + *    - Set up the stream to running state
>> + *
>> + * - libxl__stream_read_continue()
>> + *     - Set up reading the next record from a started stream.
>> + *       Add some codes to process_record() to handle the record.
>> + *       Then call stream->checkpoint_callback() to return.
>>    */
>>
>>   /* Success/error/cleanup handling. */
>> @@ -200,6 +209,9 @@ void libxl__stream_read_start(libxl__egc *egc,
>>       stream->running = true;
>>       stream->phase   = SRS_PHASE_NORMAL;
>>
>> +    if (stream->back_channel)
>> +        return;
>> +
>>       if (stream->legacy) {
>>           /* Convert the legacy stream. */
>>           libxl__conversion_helper_state *chs = &stream->chs;
>> @@ -700,6 +712,11 @@ static void stream_done(libxl__egc *egc,
>>       assert(!stream->in_checkpoint);
>>       stream->running = false;
>>
>> +    if (stream->back_channel) {
>> +        stream->completion_callback(egc, stream, stream->rc);
>> +        return;
>> +    }
>> +
>
> This should be in stream_complete() not stream_done().  stream_done() is
> strictly called once, and cleans stuff up.

Ok, will look into the code closely.

>
> ~Andrew
> .
>

-- 
Thanks,
Yang.


* Re: [PATCH v8 --for 4.6 COLO 08/25] tools/libxl: handle colo_context records in a libxl migration v2 read stream
  2015-07-15 17:44   ` Andrew Cooper
@ 2015-07-16  7:52     ` Yang Hongyang
  0 siblings, 0 replies; 46+ messages in thread
From: Yang Hongyang @ 2015-07-16  7:52 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: wei.liu2, ian.campbell, wency, guijianfeng, yunhong.jiang,
	eddie.dong, rshriram, ian.jackson



On 07/16/2015 01:44 AM, Andrew Cooper wrote:
> On 15/07/15 10:18, Yang Hongyang wrote:
[...]
>> +
>>       if (!stream->rc)
>>           stream->rc = rc;
>>       stream_done(egc, stream);
>> @@ -794,6 +815,36 @@ static void check_all_finished(libxl__egc *egc,
>>       stream->completion_callback(egc, stream, stream->rc);
>>   }
>>
>> +/*----- COLO context handlers -----*/
>> +
>> +void libxl__stream_read_colo_context(libxl__egc *egc,
>> +                                     libxl__stream_read_state *stream)
>
> A name like this makes the erroneous assumption that a COLO\_CONTEXT
> record is what is going to be found next in the stream.
>
> Where and when is a COLO\_CONTEXT record expected, and is it only in the
> backchannel?

A COLO_CONTEXT is actually a set of control commands (I'm not sure "command"
is a suitable description here) which are used to sync the checkpoint steps
between primary and secondary. So it is not only in the back channel.

control_id       0x00000000: Secondary VM is out of sync, start a new checkpoint
                 0x00000001: Secondary VM is suspended
                 0x00000002: Secondary VM is ready
                 0x00000003: Secondary VM is resumed

First boot:
When doing COLO, the primary must be started with -p, and then COLO is
started. The first step is a live migration; after the migration, once the
secondary is ready, we resume both sides.


   control_id      Primary                 Secondary
                   start with -p
                   live migrate
                                           Receive&load state
   0x00000002                              @
                   Resume                  Resume
   0x00000003                              @
                   Start Comparing Packets


At checkpoint:

   control_id      Primary                 Secondary
   0x00000000      @
                                           Suspend
   0x00000001                              @
                   Suspend
                   Send state              Receive state
                   Flush Network           Load state
                   Resume                  Resume
   0x00000003                              @
                   Start Comparing Packets

NOTE:
  1) '@' marks the side that sends the message.
  2) Every sync-point is synchronized by the two sides with only
     one handshake (single direction) for low latency.
     If stricter synchronization is required, a sync-point in the
     opposite direction should be added.
  3) Since sync-points are single direction, the remote side may
     have gone forward a lot by the time this side receives the
     sync-point.

>
>> +{
>> +    assert(stream->running);
>> +    assert(!stream->in_checkpoint);
>> +    assert(!stream->in_colo_context);
>> +    stream->in_colo_context = true;
>> +
>> +    setup_read_record(egc, stream);
>> +}
>> +
>> +static void handle_colo_context(libxl__egc *egc,
>> +                                libxl__stream_read_state *stream,
>> +                                libxl__sr_record_buf *rec)
>> +{
>> +    libxl_sr_colo_context *colo_context = rec->body;
>> +
>> +    colo_context_done(egc, stream, colo_context->id);
>
> A handler this trivial should just be done in the switch statement in
> process_record().  No need for its own function for a single forward call.

Ok

>
> ~Andrew
> .
>

-- 
Thanks,
Yang.


* Re: [PATCH v8 --for 4.6 COLO 15/25] send store mfn and console mfn to xl before resuming secondary vm
  2015-07-15 18:15   ` Andrew Cooper
@ 2015-07-16  7:56     ` Yang Hongyang
  2015-07-16  9:49       ` Andrew Cooper
  0 siblings, 1 reply; 46+ messages in thread
From: Yang Hongyang @ 2015-07-16  7:56 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: wei.liu2, ian.campbell, wency, guijianfeng, yunhong.jiang,
	eddie.dong, rshriram, ian.jackson



On 07/16/2015 02:15 AM, Andrew Cooper wrote:
> On 15/07/15 10:18, Yang Hongyang wrote:
>> From: Wen Congyang <wency@cn.fujitsu.com>
>>
>> We will call libxl__xc_domain_restore_done() to rebuild secondary vm. But
>> we need store mfn and console mfn when rebuilding secondary vm. So make
>> restore_results a function pointer in callback struct and struct
>> {save,restore}_callbacks, and use this callback to send store mfn and
>> console mfn to xl.
>>
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>>   tools/libxc/include/xenguest.h     | 8 ++++++++
>>   tools/libxc/xc_sr_restore.c        | 7 +++++--
>>   tools/libxl/libxl_colo_restore.c   | 5 -----
>>   tools/libxl/libxl_create.c         | 2 ++
>>   tools/libxl/libxl_save_msgs_gen.pl | 2 +-
>>   5 files changed, 16 insertions(+), 8 deletions(-)
>>
>> diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
>> index 1e7e1bb..d7bdfb5 100644
>> --- a/tools/libxc/include/xenguest.h
>> +++ b/tools/libxc/include/xenguest.h
>> @@ -140,6 +140,14 @@ struct restore_callbacks {
>>        */
>>       int (*should_checkpoint)(void* data);
>>
>> +    /*
>> +     * callback to send store mfn and console mfn to xl
>> +     * if we want to resume vm before xc_domain_save()
>> +     * exits.
>> +     */
>> +    void (*restore_results)(unsigned long store_mfn, unsigned long console_mfn,
>> +                            void *data);
>
> These need to be xen_pfn_t to be usable in arm.  Also, they are gfn's,
> not mfn's
>
> A whole lot of terminology in this area is wrong.  The top of
> xen/include/xen/mm.h is the authoritative description of terms, starting
> from c/s e758ed1

The existing restore_results seems unchanged, still unsigned long?
See helper_stub_restore_results in tools/libxl/libxl_save_helper.c.

>
> ~Andrew
> .
>

-- 
Thanks,
Yang.


* Re: [PATCH v8 --for 4.6 COLO 02/25] docs/libxl: Introduce COLO_CONTEXT to support migration v2 colo streams
  2015-07-16  6:32     ` Yang Hongyang
@ 2015-07-16  9:45       ` Andrew Cooper
  2015-07-16  9:47         ` Andrew Cooper
  2015-07-16 10:11         ` Yang Hongyang
  0 siblings, 2 replies; 46+ messages in thread
From: Andrew Cooper @ 2015-07-16  9:45 UTC (permalink / raw)
  To: Yang Hongyang, xen-devel
  Cc: wei.liu2, ian.campbell, wency, guijianfeng, yunhong.jiang,
	eddie.dong, rshriram, ian.jackson

On 16/07/15 07:32, Yang Hongyang wrote:
>
>
> On 07/16/2015 12:52 AM, Andrew Cooper wrote:
>> On 15/07/15 10:18, Yang Hongyang wrote:
>>> From: Wen Congyang <wency@cn.fujitsu.com>
>>>
>>> It is the negotiation record for COLO.
>>> Primary->Secondary:
>>> control_id      0x00000000: Secondary VM is out of sync, start a new
>>> checkpoint
>>> Secondary->Primary:
>>>                  0x00000001: Secondary VM is suspended
>>>                  0x00000002: Secondary VM is ready
>>>                  0x00000003: Secondary VM is resumed
>>>
>>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>>> ---
>>>   docs/specs/libxl-migration-stream.pandoc | 22 +++++++++++++++++++++-
>>>   tools/libxl/libxl_sr_stream_format.h     | 11 +++++++++++
>>>   tools/python/xen/migration/libxl.py      |  9 +++++++++
>>>   3 files changed, 41 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/docs/specs/libxl-migration-stream.pandoc
>>> b/docs/specs/libxl-migration-stream.pandoc
>>> index c24a434..5986273 100644
>>> --- a/docs/specs/libxl-migration-stream.pandoc
>>> +++ b/docs/specs/libxl-migration-stream.pandoc
>>> @@ -121,7 +121,9 @@ type         0x00000000: END
>>>
>>>                0x00000004: CHECKPOINT_END
>>>
>>> -             0x00000005 - 0x7FFFFFFF: Reserved for future _mandatory_
>>> +             0x00000005: COLO_CONTEXT
>>> +
>>> +             0x00000006 - 0x7FFFFFFF: Reserved for future _mandatory_
>>>                records.
>>>
>>>                0x80000000 - 0xFFFFFFFF: Reserved for future _optional_
>>> @@ -215,3 +217,21 @@ A checkpoint end record marks the end of a
>>> checkpoint in the image.
>>>       +-------------------------------------------------+
>>>
>>>   The end record contains no fields; its body_length is 0.
>>> +
>>> +COLO\_CONTEXT
>>> +--------------
>>> +
>>> +A COLO context record contains the control information for COLO.
>>> +
>>> +     0     1     2     3     4     5     6     7 octet
>>> +    +------------------------+------------------------+
>>> +    | control_id             | padding                |
>>> +    +------------------------+------------------------+
>>> +
>>> +--------------------------------------------------------------------
>>> +Field            Description
>>> +------------     ---------------------------------------------------
>>> +control_id       0x00000000: Secondary VM is out of sync, start a
>>> new checkpoint
>>> +                 0x00000001: Secondary VM is suspended
>>> +                 0x00000002: Secondary VM is ready
>>> +                 0x00000003: Secondary VM is resumed
>>
>> This style of table in pandoc need to be terminated with a line of
>> -------, just like the head of the table.
>
> Ok
>
>>
>> Also, I wonder at the name "COLO_CONTEXT".  CONTEXT implies an
>> associated blob of data, but this is not the case here.  Here, it is
>> more of a status update, with expected actions on some states.
>
> True, could you suggest a better name? sorry for my bad English...

In hindsight, I would also avoid putting COLO in the name.

How about CHECKPOINT_SECONDARY_STATE ?

You also want to note that this record should currently only be found in
the libxl backchannel.

~Andrew


* Re: [PATCH v8 --for 4.6 COLO 02/25] docs/libxl: Introduce COLO_CONTEXT to support migration v2 colo streams
  2015-07-16  9:45       ` Andrew Cooper
@ 2015-07-16  9:47         ` Andrew Cooper
  2015-07-16 10:11         ` Yang Hongyang
  1 sibling, 0 replies; 46+ messages in thread
From: Andrew Cooper @ 2015-07-16  9:47 UTC (permalink / raw)
  To: Yang Hongyang, xen-devel
  Cc: wei.liu2, ian.campbell, wency, guijianfeng, yunhong.jiang,
	eddie.dong, rshriram, ian.jackson

On 16/07/15 10:45, Andrew Cooper wrote:
> On 16/07/15 07:32, Yang Hongyang wrote:
>>
>> On 07/16/2015 12:52 AM, Andrew Cooper wrote:
>>> On 15/07/15 10:18, Yang Hongyang wrote:
>>>> From: Wen Congyang <wency@cn.fujitsu.com>
>>>>
>>>> It is the negotiation record for COLO.
>>>> Primary->Secondary:
>>>> control_id      0x00000000: Secondary VM is out of sync, start a new
>>>> checkpoint
>>>> Secondary->Primary:
>>>>                  0x00000001: Secondary VM is suspended
>>>>                  0x00000002: Secondary VM is ready
>>>>                  0x00000003: Secondary VM is resumed
>>>>
>>>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>>>> ---
>>>>   docs/specs/libxl-migration-stream.pandoc | 22 +++++++++++++++++++++-
>>>>   tools/libxl/libxl_sr_stream_format.h     | 11 +++++++++++
>>>>   tools/python/xen/migration/libxl.py      |  9 +++++++++
>>>>   3 files changed, 41 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/docs/specs/libxl-migration-stream.pandoc
>>>> b/docs/specs/libxl-migration-stream.pandoc
>>>> index c24a434..5986273 100644
>>>> --- a/docs/specs/libxl-migration-stream.pandoc
>>>> +++ b/docs/specs/libxl-migration-stream.pandoc
>>>> @@ -121,7 +121,9 @@ type         0x00000000: END
>>>>
>>>>                0x00000004: CHECKPOINT_END
>>>>
>>>> -             0x00000005 - 0x7FFFFFFF: Reserved for future _mandatory_
>>>> +             0x00000005: COLO_CONTEXT
>>>> +
>>>> +             0x00000006 - 0x7FFFFFFF: Reserved for future _mandatory_
>>>>                records.
>>>>
>>>>                0x80000000 - 0xFFFFFFFF: Reserved for future _optional_
>>>> @@ -215,3 +217,21 @@ A checkpoint end record marks the end of a
>>>> checkpoint in the image.
>>>>       +-------------------------------------------------+
>>>>
>>>>   The end record contains no fields; its body_length is 0.
>>>> +
>>>> +COLO\_CONTEXT
>>>> +--------------
>>>> +
>>>> +A COLO context record contains the control information for COLO.
>>>> +
>>>> +     0     1     2     3     4     5     6     7 octet
>>>> +    +------------------------+------------------------+
>>>> +    | control_id             | padding                |
>>>> +    +------------------------+------------------------+
>>>> +
>>>> +--------------------------------------------------------------------
>>>> +Field            Description
>>>> +------------     ---------------------------------------------------
>>>> +control_id       0x00000000: Secondary VM is out of sync, start a
>>>> new checkpoint
>>>> +                 0x00000001: Secondary VM is suspended
>>>> +                 0x00000002: Secondary VM is ready
>>>> +                 0x00000003: Secondary VM is resumed
>>> This style of table in pandoc need to be terminated with a line of
>>> -------, just like the head of the table.
>> Ok
>>
>>> Also, I wonder at the name "COLO_CONTEXT".  CONTEXT implies an
>>> associated blob of data, but this is not the case here.  Here, it is
>>> more of a status update, with expected actions on some states.
>> True, could you suggest a better name? sorry for my bad English...
> In hindsight, I would also avoid putting COLO in the name.
>
> How about CHECKPOINT_SECONDARY_STATE ?
>
> You also want to note that this record should currently only be found in
> the libxl backchannel.

Sorry - ignore this final sentence.  I have just read another of your
replies.

~Andrew


* Re: [PATCH v8 --for 4.6 COLO 15/25] send store mfn and console mfn to xl before resuming secondary vm
  2015-07-16  7:56     ` Yang Hongyang
@ 2015-07-16  9:49       ` Andrew Cooper
  0 siblings, 0 replies; 46+ messages in thread
From: Andrew Cooper @ 2015-07-16  9:49 UTC (permalink / raw)
  To: Yang Hongyang, xen-devel
  Cc: wei.liu2, ian.campbell, wency, guijianfeng, yunhong.jiang,
	eddie.dong, rshriram, ian.jackson

On 16/07/15 08:56, Yang Hongyang wrote:
>
>
> On 07/16/2015 02:15 AM, Andrew Cooper wrote:
>> On 15/07/15 10:18, Yang Hongyang wrote:
>>> From: Wen Congyang <wency@cn.fujitsu.com>
>>>
>>> We will call libxl__xc_domain_restore_done() to rebuild secondary
>>> vm. But
>>> we need store mfn and console mfn when rebuilding secondary vm. So make
>>> restore_results a function pointer in callback struct and struct
>>> {save,restore}_callbacks, and use this callback to send store mfn and
>>> console mfn to xl.
>>>
>>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
>>> ---
>>>   tools/libxc/include/xenguest.h     | 8 ++++++++
>>>   tools/libxc/xc_sr_restore.c        | 7 +++++--
>>>   tools/libxl/libxl_colo_restore.c   | 5 -----
>>>   tools/libxl/libxl_create.c         | 2 ++
>>>   tools/libxl/libxl_save_msgs_gen.pl | 2 +-
>>>   5 files changed, 16 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/tools/libxc/include/xenguest.h
>>> b/tools/libxc/include/xenguest.h
>>> index 1e7e1bb..d7bdfb5 100644
>>> --- a/tools/libxc/include/xenguest.h
>>> +++ b/tools/libxc/include/xenguest.h
>>> @@ -140,6 +140,14 @@ struct restore_callbacks {
>>>        */
>>>       int (*should_checkpoint)(void* data);
>>>
>>> +    /*
>>> +     * callback to send store mfn and console mfn to xl
>>> +     * if we want to resume vm before xc_domain_save()
>>> +     * exits.
>>> +     */
>>> +    void (*restore_results)(unsigned long store_mfn, unsigned long
>>> console_mfn,
>>> +                            void *data);
>>
>> These need to be xen_pfn_t to be usable in arm.  Also, they are gfn's,
>> not mfn's
>>
>> A whole lot of terminology in this area is wrong.  The top of
>> xen/include/xen/mm.h is the authoritative description of terms, starting
>> from c/s e758ed1
>
> the existing restore_results seems unchanged, still unsigned long?
> see tools/libxl/libxl_save_helper.c helper_stub_restore_results

I have this

http://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=commitdiff;h=2f91b4d39fd0bbbd3e2b0d3f451250d154505f9d

in my cleanup series, which I can extend all the way up the
helper_stub_restore_results() chain.

~Andrew


* Re: [PATCH v8 --for 4.6 COLO 02/25] docs/libxl: Introduce COLO_CONTEXT to support migration v2 colo streams
  2015-07-16  9:45       ` Andrew Cooper
  2015-07-16  9:47         ` Andrew Cooper
@ 2015-07-16 10:11         ` Yang Hongyang
  2015-07-16 10:20           ` Andrew Cooper
  1 sibling, 1 reply; 46+ messages in thread
From: Yang Hongyang @ 2015-07-16 10:11 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: wei.liu2, ian.campbell, wency, guijianfeng, yunhong.jiang,
	eddie.dong, rshriram, ian.jackson



On 07/16/2015 05:45 PM, Andrew Cooper wrote:
> On 16/07/15 07:32, Yang Hongyang wrote:
>>
>>
>> On 07/16/2015 12:52 AM, Andrew Cooper wrote:
>>> On 15/07/15 10:18, Yang Hongyang wrote:
>>>> From: Wen Congyang <wency@cn.fujitsu.com>
>>>>
>>>> It is the negotiation record for COLO.
>>>> Primary->Secondary:
>>>> control_id      0x00000000: Secondary VM is out of sync, start a new
>>>> checkpoint
>>>> Secondary->Primary:
>>>>                   0x00000001: Secondary VM is suspended
>>>>                   0x00000002: Secondary VM is ready
>>>>                   0x00000003: Secondary VM is resumed
>>>>
>>>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>>>> ---
>>>>    docs/specs/libxl-migration-stream.pandoc | 22 +++++++++++++++++++++-
>>>>    tools/libxl/libxl_sr_stream_format.h     | 11 +++++++++++
>>>>    tools/python/xen/migration/libxl.py      |  9 +++++++++
>>>>    3 files changed, 41 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/docs/specs/libxl-migration-stream.pandoc
>>>> b/docs/specs/libxl-migration-stream.pandoc
>>>> index c24a434..5986273 100644
>>>> --- a/docs/specs/libxl-migration-stream.pandoc
>>>> +++ b/docs/specs/libxl-migration-stream.pandoc
>>>> @@ -121,7 +121,9 @@ type         0x00000000: END
>>>>
>>>>                 0x00000004: CHECKPOINT_END
>>>>
>>>> -             0x00000005 - 0x7FFFFFFF: Reserved for future _mandatory_
>>>> +             0x00000005: COLO_CONTEXT
>>>> +
>>>> +             0x00000006 - 0x7FFFFFFF: Reserved for future _mandatory_
>>>>                 records.
>>>>
>>>>                 0x80000000 - 0xFFFFFFFF: Reserved for future _optional_
>>>> @@ -215,3 +217,21 @@ A checkpoint end record marks the end of a
>>>> checkpoint in the image.
>>>>        +-------------------------------------------------+
>>>>
>>>>    The end record contains no fields; its body_length is 0.
>>>> +
>>>> +COLO\_CONTEXT
>>>> +--------------
>>>> +
>>>> +A COLO context record contains the control information for COLO.
>>>> +
>>>> +     0     1     2     3     4     5     6     7 octet
>>>> +    +------------------------+------------------------+
>>>> +    | control_id             | padding                |
>>>> +    +------------------------+------------------------+
>>>> +
>>>> +--------------------------------------------------------------------
>>>> +Field            Description
>>>> +------------     ---------------------------------------------------
>>>> +control_id       0x00000000: Secondary VM is out of sync, start a
>>>> new checkpoint
>>>> +                 0x00000001: Secondary VM is suspended
>>>> +                 0x00000002: Secondary VM is ready
>>>> +                 0x00000003: Secondary VM is resumed
>>>
>>> This style of table in pandoc need to be terminated with a line of
>>> -------, just like the head of the table.
>>
>> Ok
>>
>>>
>>> Also, I wonder at the name "COLO_CONTEXT".  CONTEXT implies an
>>> associated blob of data, but this is not the case here.  Here, it is
>>> more of a status update, with expected actions on some states.
>>
>> True, could you suggest a better name? sorry for my bad English...
>
> In hindsight, I would also avoid putting COLO in the name.
>
> How about CHECKPOINT_SECONDARY_STATE ?

I explained COLO_CONTEXT in my other mail; do you still suggest
CHECKPOINT_SECONDARY_STATE? IMO, it's more like a sync command.

>
> You also want to note that this record should currently only be found in
> the libxl backchannel.
>
> ~Andrew
> .
>

-- 
Thanks,
Yang.


* Re: [PATCH v8 --for 4.6 COLO 02/25] docs/libxl: Introduce COLO_CONTEXT to support migration v2 colo streams
  2015-07-16 10:11         ` Yang Hongyang
@ 2015-07-16 10:20           ` Andrew Cooper
  0 siblings, 0 replies; 46+ messages in thread
From: Andrew Cooper @ 2015-07-16 10:20 UTC (permalink / raw)
  To: Yang Hongyang, xen-devel
  Cc: wei.liu2, ian.campbell, wency, guijianfeng, yunhong.jiang,
	eddie.dong, rshriram, ian.jackson

On 16/07/15 11:11, Yang Hongyang wrote:
>
>
> On 07/16/2015 05:45 PM, Andrew Cooper wrote:
>> On 16/07/15 07:32, Yang Hongyang wrote:
>>>
>>>
>>> On 07/16/2015 12:52 AM, Andrew Cooper wrote:
>>>> On 15/07/15 10:18, Yang Hongyang wrote:
>>>>> From: Wen Congyang <wency@cn.fujitsu.com>
>>>>>
>>>>> It is the negotiation record for COLO.
>>>>> Primary->Secondary:
>>>>> control_id      0x00000000: Secondary VM is out of sync, start a new
>>>>> checkpoint
>>>>> Secondary->Primary:
>>>>>                   0x00000001: Secondary VM is suspended
>>>>>                   0x00000002: Secondary VM is ready
>>>>>                   0x00000003: Secondary VM is resumed
>>>>>
>>>>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>>>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>>>>> ---
>>>>>    docs/specs/libxl-migration-stream.pandoc | 22
>>>>> +++++++++++++++++++++-
>>>>>    tools/libxl/libxl_sr_stream_format.h     | 11 +++++++++++
>>>>>    tools/python/xen/migration/libxl.py      |  9 +++++++++
>>>>>    3 files changed, 41 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/docs/specs/libxl-migration-stream.pandoc
>>>>> b/docs/specs/libxl-migration-stream.pandoc
>>>>> index c24a434..5986273 100644
>>>>> --- a/docs/specs/libxl-migration-stream.pandoc
>>>>> +++ b/docs/specs/libxl-migration-stream.pandoc
>>>>> @@ -121,7 +121,9 @@ type         0x00000000: END
>>>>>
>>>>>                 0x00000004: CHECKPOINT_END
>>>>>
>>>>> -             0x00000005 - 0x7FFFFFFF: Reserved for future
>>>>> _mandatory_
>>>>> +             0x00000005: COLO_CONTEXT
>>>>> +
>>>>> +             0x00000006 - 0x7FFFFFFF: Reserved for future
>>>>> _mandatory_
>>>>>                 records.
>>>>>
>>>>>                 0x80000000 - 0xFFFFFFFF: Reserved for future
>>>>> _optional_
>>>>> @@ -215,3 +217,21 @@ A checkpoint end record marks the end of a
>>>>> checkpoint in the image.
>>>>>        +-------------------------------------------------+
>>>>>
>>>>>    The end record contains no fields; its body_length is 0.
>>>>> +
>>>>> +COLO\_CONTEXT
>>>>> +--------------
>>>>> +
>>>>> +A COLO context record contains the control information for COLO.
>>>>> +
>>>>> +     0     1     2     3     4     5     6     7 octet
>>>>> +    +------------------------+------------------------+
>>>>> +    | control_id             | padding                |
>>>>> +    +------------------------+------------------------+
>>>>> +
>>>>> +--------------------------------------------------------------------
>>>>> +Field            Description
>>>>> +------------     ---------------------------------------------------
>>>>> +control_id       0x00000000: Secondary VM is out of sync, start a
>>>>> new checkpoint
>>>>> +                 0x00000001: Secondary VM is suspended
>>>>> +                 0x00000002: Secondary VM is ready
>>>>> +                 0x00000003: Secondary VM is resumed
>>>>
>>>> This style of table in pandoc need to be terminated with a line of
>>>> -------, just like the head of the table.
>>>
>>> Ok
>>>
>>>>
>>>> Also, I wonder at the name "COLO_CONTEXT".  CONTEXT implies an
>>>> associated blob of data, but this is not the case here.  Here, it is
>>>> more of a status update, with expected actions on some states.
>>>
>>> True, could you suggest a better name? sorry for my bad English...
>>
>> In hindsight, I would also avoid putting COLO in the name.
>>
>> How about CHECKPOINT_SECONDARY_STATE ?
>
> I explained COLO_CONTEXT in my other mail; do you still suggest
> CHECKPOINT_SECONDARY_STATE? IMO, it's more like a sync command.

Right, but what it contains is information concerning the current state
of the secondary VM.
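
For reference, a minimal sketch of the record body under discussion,
written in Python since the patch also touches
tools/python/xen/migration/libxl.py. The record type, the control_id
values and the 8-octet layout (control_id followed by padding) come from
the proposed spec text quoted above. The little-endian byte order and the
names REC_TYPE_COLO_CONTEXT, CONTROL_IDS, pack_body and unpack_body are
assumptions made for illustration; they are not taken from the patch.

```python
import struct

# Record type proposed in the quoted spec change.
REC_TYPE_COLO_CONTEXT = 0x00000005

# control_id values from the proposed table.
CONTROL_IDS = {
    0x00000000: "Secondary VM is out of sync, start a new checkpoint",
    0x00000001: "Secondary VM is suspended",
    0x00000002: "Secondary VM is ready",
    0x00000003: "Secondary VM is resumed",
}

# Assumption: little-endian encoding. Two 32-bit fields: control_id, padding.
BODY_FORMAT = "<II"


def pack_body(control_id):
    """Build the 8-octet record body for a known control_id."""
    if control_id not in CONTROL_IDS:
        raise ValueError("unknown control_id 0x%08x" % control_id)
    return struct.pack(BODY_FORMAT, control_id, 0)


def unpack_body(body):
    """Check the body length and return the control_id it carries."""
    if len(body) != struct.calcsize(BODY_FORMAT):
        raise ValueError("unexpected body length %d" % len(body))
    control_id, _padding = struct.unpack(BODY_FORMAT, body)
    if control_id not in CONTROL_IDS:
        raise ValueError("unknown control_id 0x%08x" % control_id)
    return control_id
```

Every value in the table describes the secondary VM, and a reader built
this way rejects any control_id it does not know about, so adding new
values later would need both ends updated together.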

The real question is how you see this record being extended with further
control_id values in the future.

(naming stuff is hard :( )

~Andrew


end of thread, other threads:[~2015-07-16 10:20 UTC | newest]

Thread overview: 46+ messages
2015-07-15  9:18 [PATCH v8 --for 4.6 COLO 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 01/25] docs: add colo readme Yang Hongyang
2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 02/25] docs/libxl: Introduce COLO_CONTEXT to support migration v2 colo streams Yang Hongyang
2015-07-15 16:52   ` Andrew Cooper
2015-07-16  6:32     ` Yang Hongyang
2015-07-16  9:45       ` Andrew Cooper
2015-07-16  9:47         ` Andrew Cooper
2015-07-16 10:11         ` Yang Hongyang
2015-07-16 10:20           ` Andrew Cooper
2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 03/25] libxc/migration: Specification update for DIRTY_BITMAP records Yang Hongyang
2015-07-15 17:13   ` Andrew Cooper
2015-07-16  7:18     ` Yang Hongyang
2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 04/25] libxc/migration: export read_record for common use Yang Hongyang
2015-07-15 17:14   ` Andrew Cooper
2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 05/25] tools/libxl: add back channel support to write stream Yang Hongyang
2015-07-15 17:25   ` Andrew Cooper
2015-07-16  7:21     ` Yang Hongyang
2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 06/25] tools/libxl: write colo_context records into the stream Yang Hongyang
2015-07-15 17:35   ` Andrew Cooper
2015-07-16  7:24     ` Yang Hongyang
2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 07/25] tools/libxl: add back channel support to read stream Yang Hongyang
2015-07-15 17:38   ` Andrew Cooper
2015-07-16  7:25     ` Yang Hongyang
2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 08/25] tools/libxl: handle colo_context records in a libxl migration v2 " Yang Hongyang
2015-07-15 17:44   ` Andrew Cooper
2015-07-16  7:52     ` Yang Hongyang
2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 09/25] tools/libx{l, c}: introduce should_checkpoint callback Yang Hongyang
2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 10/25] tools/libx{l, c}: add postcopy/suspend callback to restore side Yang Hongyang
2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 11/25] secondary vm suspend/resume/checkpoint code Yang Hongyang
2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 12/25] primary " Yang Hongyang
2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 13/25] libxc/restore: support COLO restore Yang Hongyang
2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 14/25] libxc/restore: send dirty bitmap to primary when checkpoint under colo Yang Hongyang
2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 15/25] send store mfn and console mfn to xl before resuming secondary vm Yang Hongyang
2015-07-15 18:15   ` Andrew Cooper
2015-07-16  7:56     ` Yang Hongyang
2015-07-16  9:49       ` Andrew Cooper
2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 16/25] libxc/save: support COLO save Yang Hongyang
2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 17/25] implement the cmdline for COLO Yang Hongyang
2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 18/25] Support colo mode for qemu disk Yang Hongyang
2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 19/25] COLO: use qemu block replication Yang Hongyang
2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 20/25] COLO proxy: implement setup/teardown of COLO proxy module Yang Hongyang
2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 21/25] COLO proxy: preresume, postresume and checkpoint Yang Hongyang
2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 22/25] COLO nic: implement COLO nic subkind Yang Hongyang
2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 23/25] setup and control colo proxy on primary side Yang Hongyang
2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 24/25] setup and control colo proxy on secondary side Yang Hongyang
2015-07-15  9:18 ` [PATCH v8 --for 4.6 COLO 25/25] cmdline switches and config vars to control colo-proxy Yang Hongyang
