From: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
To: xen devel <xen-devel@lists.xen.org>,
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
Andrew Cooper <andrew.cooper3@citrix.com>,
Ian Campbell <ian.campbell@citrix.com>,
Ian Jackson <ian.jackson@eu.citrix.com>,
Wei Liu <wei.liu2@citrix.com>
Cc: Lars Kurth <lars.kurth@citrix.com>,
Changlong Xie <xiecl.fnst@cn.fujitsu.com>,
Wen Congyang <wency@cn.fujitsu.com>,
Li Zhijian <lizhijian@cn.fujitsu.com>,
Gui Jianfeng <guijianfeng@cn.fujitsu.com>,
Jiang Yunhong <yunhong.jiang@intel.com>,
Dong Eddie <eddie.dong@intel.com>,
Anthony Perard <anthony.perard@citrix.com>,
Shriram Rajagopalan <rshriram@cs.ubc.ca>,
Yang Hongyang <hongyang.yang@easystack.cn>
Subject: [PATCH v13 17/26] implement the cmdline for COLO
Date: Fri, 25 Mar 2016 14:44:24 +0800
Message-ID: <1458888273-7469-18-git-send-email-xiecl.fnst@cn.fujitsu.com>
In-Reply-To: <1458888273-7469-1-git-send-email-xiecl.fnst@cn.fujitsu.com>
From: Wen Congyang <wency@cn.fujitsu.com>
Add a new option, -c, to the 'xl remus' command. Pass -c to use
COLO HA instead of Remus HA.
Update the man page to document the new 'xl remus' option.
Also add a new option, --colo, to the internal command 'xl migrate-receive'.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
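Note (not part of the commit): the defaulting and conflict rules that
this patch applies to the 'xl remus' options can be sketched outside
libxl as below. This is a minimal illustration under stated assumptions,
not the code in the patch: the tristate type is a stand-in for
libxl_defbool, and struct ha_opts and resolve_ha_opts() are hypothetical
names used only here.

```c
#include <stdio.h>

/* Hypothetical stand-in for libxl_defbool: unset, false, or true. */
typedef enum { TRI_DEFAULT = 0, TRI_FALSE, TRI_TRUE } tristate;

/* Hypothetical subset of the options handled by 'xl remus'. */
struct ha_opts {
    int interval;         /* -i, in ms; 0 means "not given" */
    tristate blackhole;   /* -b */
    tristate compression; /* unset unless the user disabled it */
    tristate colo;        /* -c */
};

/*
 * Fill in defaults and reject conflicting combinations.
 * Returns 0 on success, -1 if the combination is not allowed.
 */
static int resolve_ha_opts(struct ha_opts *o)
{
    if (o->blackhole == TRI_DEFAULT)
        o->blackhole = TRI_FALSE;
    if (o->colo == TRI_DEFAULT)
        o->colo = TRI_FALSE;

    if (o->colo == TRI_FALSE) {
        /* Remus: 200 ms interval and compression on, unless overridden. */
        if (!o->interval)
            o->interval = 200;
        if (o->compression == TRI_DEFAULT)
            o->compression = TRI_TRUE;
        return 0;
    }

    /* COLO: -i and -b are not accepted. */
    if (o->interval || o->blackhole == TRI_TRUE) {
        fprintf(stderr, "Option -c conflicts with -i or -b\n");
        return -1;
    }

    /* COLO: compression defaults to off and must stay off. */
    if (o->compression == TRI_DEFAULT)
        o->compression = TRI_FALSE;
    if (o->compression == TRI_TRUE) {
        fprintf(stderr, "COLO cannot use checkpoint compression\n");
        return -1;
    }
    return 0;
}
```

With a zero-initialised struct ha_opts the sketch resolves to Remus with
a 200 ms interval and compression enabled; setting only colo resolves to
COLO with compression disabled; setting colo together with a non-zero
interval is rejected.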
docs/man/xl.pod.1 | 13 ++++++++--
tools/libxl/libxl.c | 22 ++++++++++++++--
tools/libxl/libxl_create.c | 1 -
tools/libxl/xl_cmdimpl.c | 65 +++++++++++++++++++++++++++++++++++-----------
tools/libxl/xl_cmdtable.c | 4 ++-
5 files changed, 84 insertions(+), 21 deletions(-)
diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index dc6213e..a992a45 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -447,12 +447,16 @@ Print huge (!) amount of debug during the migration process.
=item B<remus> [I<OPTIONS>] I<domain-id> I<host>
-Enable Remus HA for domain. By default B<xl> relies on ssh as a transport
-mechanism between the two hosts.
+Enable Remus HA or COLO HA for domain. By default B<xl> relies on ssh as a
+transport mechanism between the two hosts.
N.B: Remus support in xl is still in experimental (proof-of-concept) phase.
Disk replication support is limited to DRBD disks.
+ COLO support in xl is still in experimental (proof-of-concept) phase.
+ There is no support for network or disk, so the guest will corrupt its
+ disk and confuse its network peers at the moment.
+
B<OPTIONS>
=over 4
@@ -498,6 +502,11 @@ Disable network output buffering. Requires enabling unsafe mode.
Disable disk replication. Requires enabling unsafe mode.
+=item B<-c>
+
+Enable COLO HA. This conflicts with B<-i> and B<-b>, and memory
+checkpoint compression must be disabled.
+
=back
=item B<pause> I<domain-id>
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 272c6a5..349a3c6 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -848,12 +848,27 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
goto out;
}
+ /* The caller must set this defbool */
+ if (libxl_defbool_is_default(info->colo)) {
+ LOG(ERROR, "colo mode must be enabled/disabled");
+ rc = ERROR_FAIL;
+ goto out;
+ }
+
libxl_defbool_setdefault(&info->allow_unsafe, false);
libxl_defbool_setdefault(&info->blackhole, false);
- libxl_defbool_setdefault(&info->compression, true);
+ libxl_defbool_setdefault(&info->compression,
+ !libxl_defbool_val(info->colo));
libxl_defbool_setdefault(&info->netbuf, true);
libxl_defbool_setdefault(&info->diskbuf, true);
+ if (libxl_defbool_val(info->colo) &&
+ libxl_defbool_val(info->compression)) {
+ LOG(ERROR, "cannot use memory checkpoint compression in COLO mode");
+ rc = ERROR_FAIL;
+ goto out;
+ }
+
if (!libxl_defbool_val(info->allow_unsafe) &&
(libxl_defbool_val(info->blackhole) ||
!libxl_defbool_val(info->netbuf) ||
@@ -875,7 +890,10 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
dss->live = 1;
dss->debug = 0;
dss->remus = info;
- dss->checkpointed_stream = LIBXL_CHECKPOINTED_STREAM_REMUS;
+ if (libxl_defbool_val(info->colo))
+ dss->checkpointed_stream = LIBXL_CHECKPOINTED_STREAM_COLO;
+ else
+ dss->checkpointed_stream = LIBXL_CHECKPOINTED_STREAM_REMUS;
assert(info);
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index d6c794e..be604e5 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1893,7 +1893,6 @@ int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
const libxl_asyncop_how *ao_how,
const libxl_asyncprogress_how *aop_console_how)
{
- assert(send_back_fd == -1);
return do_domain_create(ctx, d_config, domid, restore_fd, send_back_fd,
params, ao_how, aop_console_how);
}
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 2e64f44..25bd81a 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -4740,6 +4740,8 @@ static void migrate_receive(int debug, int daemonize, int monitor,
char rc_buf;
char *migration_domname;
struct domain_create dom_info;
+ const char *ha = checkpointed == LIBXL_CHECKPOINTED_STREAM_COLO ?
+ "COLO" : "Remus";
signal(SIGPIPE, SIG_IGN);
/* if we get SIGPIPE we'd rather just have it as an error */
@@ -4757,7 +4759,7 @@ static void migrate_receive(int debug, int daemonize, int monitor,
dom_info.monitor = monitor;
dom_info.paused = 1;
dom_info.migrate_fd = recv_fd;
- dom_info.send_back_fd = -1;
+ dom_info.send_back_fd = send_fd;
dom_info.migration_domname_r = &migration_domname;
dom_info.checkpointed_stream = checkpointed;
@@ -4772,11 +4774,12 @@ static void migrate_receive(int debug, int daemonize, int monitor,
switch (checkpointed) {
case LIBXL_CHECKPOINTED_STREAM_REMUS:
+ case LIBXL_CHECKPOINTED_STREAM_COLO:
/* If we are here, it means that the sender (primary) has crashed.
* TODO: Split-Brain Check.
*/
- fprintf(stderr, "migration target: Remus Failover for domain %u\n",
- domid);
+ fprintf(stderr, "migration target: %s Failover for domain %u\n",
+ ha, domid);
/*
* If domain renaming fails, lets just continue (as we need the domain
@@ -4792,16 +4795,20 @@ static void migrate_receive(int debug, int daemonize, int monitor,
rc = libxl_domain_rename(ctx, domid, migration_domname,
common_domname);
if (rc)
- fprintf(stderr, "migration target (Remus): "
+ fprintf(stderr, "migration target (%s): "
"Failed to rename domain from %s to %s:%d\n",
- migration_domname, common_domname, rc);
+ ha, migration_domname, common_domname, rc);
}
+ if (checkpointed == LIBXL_CHECKPOINTED_STREAM_COLO)
+ /* The guest is running after failover in COLO mode */
+ exit(rc ? -ERROR_FAIL: 0);
+
rc = libxl_domain_unpause(ctx, domid);
if (rc)
- fprintf(stderr, "migration target (Remus): "
+ fprintf(stderr, "migration target (%s): "
"Failed to unpause domain %s (id: %u):%d\n",
- common_domname, domid, rc);
+ ha, common_domname, domid, rc);
exit(rc ? -ERROR_FAIL: 0);
default:
@@ -4948,8 +4955,12 @@ int main_migrate_receive(int argc, char **argv)
int debug = 0, daemonize = 1, monitor = 1;
libxl_checkpointed_stream checkpointed = LIBXL_CHECKPOINTED_STREAM_NONE;
int opt;
+ static struct option opts[] = {
+ {"colo", 0, 0, 0x100},
+ COMMON_LONG_OPTS
+ };
- SWITCH_FOREACH_OPT(opt, "Fedr", NULL, "migrate-receive", 0) {
+ SWITCH_FOREACH_OPT(opt, "Fedr", opts, "migrate-receive", 0) {
case 'F':
daemonize = 0;
break;
@@ -4963,6 +4974,9 @@ int main_migrate_receive(int argc, char **argv)
case 'r':
checkpointed = LIBXL_CHECKPOINTED_STREAM_REMUS;
break;
+ case 0x100:
+ checkpointed = LIBXL_CHECKPOINTED_STREAM_COLO;
+ break;
}
if (argc-optind != 0) {
@@ -8338,11 +8352,8 @@ int main_remus(int argc, char **argv)
int config_len;
memset(&r_info, 0, sizeof(libxl_domain_remus_info));
- /* Defaults */
- r_info.interval = 200;
- libxl_defbool_setdefault(&r_info.blackhole, false);
- SWITCH_FOREACH_OPT(opt, "Fbundi:s:N:e", NULL, "remus", 2) {
+ SWITCH_FOREACH_OPT(opt, "Fbundi:s:N:ec", NULL, "remus", 2) {
case 'i':
r_info.interval = atoi(optarg);
break;
@@ -8370,11 +8381,32 @@ int main_remus(int argc, char **argv)
case 'e':
daemonize = 0;
break;
+ case 'c':
+ libxl_defbool_set(&r_info.colo, true);
}
domid = find_domain(argv[optind]);
host = argv[optind + 1];
+ /* Defaults */
+ libxl_defbool_setdefault(&r_info.blackhole, false);
+ libxl_defbool_setdefault(&r_info.colo, false);
+ if (!libxl_defbool_val(r_info.colo) && !r_info.interval)
+ r_info.interval = 200;
+
+ if (libxl_defbool_val(r_info.colo)) {
+ if (r_info.interval || libxl_defbool_val(r_info.blackhole)) {
+ fprintf(stderr, "Option -c conflicts with -i or -b\n");
+ exit(-1);
+ }
+
+ if (libxl_defbool_is_default(r_info.compression)) {
+ fprintf(stderr, "COLO can't be used with memory compression. "
+ "Disabling memory checkpoint compression now...\n");
+ libxl_defbool_set(&r_info.compression, false);
+ }
+ }
+
if (!r_info.netbufscript)
r_info.netbufscript = default_remus_netbufscript;
@@ -8389,8 +8421,9 @@ int main_remus(int argc, char **argv)
if (!ssh_command[0]) {
rune = host;
} else {
- xasprintf(&rune, "exec %s %s xl migrate-receive -r %s",
+ xasprintf(&rune, "exec %s %s xl migrate-receive %s %s",
ssh_command, host,
+ libxl_defbool_val(r_info.colo) ? "--colo" : "-r",
daemonize ? "" : " -e");
}
@@ -8418,7 +8451,8 @@ int main_remus(int argc, char **argv)
* domain to force failover
*/
if (libxl_domain_info(ctx, 0, domid)) {
- fprintf(stderr, "Remus: Primary domain has been destroyed.\n");
+ fprintf(stderr, "%s: Primary domain has been destroyed.\n",
+ libxl_defbool_val(r_info.colo) ? "COLO" : "Remus");
close(send_fd);
return 0;
}
@@ -8430,7 +8464,8 @@ int main_remus(int argc, char **argv)
if (rc == ERROR_GUEST_TIMEDOUT)
fprintf(stderr, "Failed to suspend domain at primary.\n");
else {
- fprintf(stderr, "Remus: Backup failed? resuming domain at primary.\n");
+ fprintf(stderr, "%s: Backup failed? resuming domain at primary.\n",
+ libxl_defbool_val(r_info.colo) ? "COLO" : "Remus");
libxl_domain_resume(ctx, domid, 1, 0);
}
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index b14b881..5911ea8 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -499,7 +499,9 @@ struct cmd_spec cmd_table[] = {
"-b Replicate memory checkpoints to /dev/null (blackhole).\n"
" Works only in unsafe mode.\n"
"-n Disable network output buffering. Works only in unsafe mode.\n"
- "-d Disable disk replication. Works only in unsafe mode."
+ "-d Disable disk replication. Works only in unsafe mode.\n"
+ "-c Enable COLO HA. Conflicts with -i and -b, and memory\n"
+ " checkpoint compression must be disabled."
},
#endif
{ "devd",
--
1.9.3
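
Note (not part of the patch): the long-only option that this patch adds
to 'xl migrate-receive' follows the usual getopt_long() pattern of
giving the option a value outside the range of a char (0x100 here), so
it cannot collide with any short option. A minimal standalone sketch of
that pattern, assuming plain getopt_long() rather than xl's
SWITCH_FOREACH_OPT macro; enum stream_kind, OPT_COLO and
parse_stream_kind() are hypothetical names:

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical stand-in for libxl_checkpointed_stream. */
enum stream_kind { STREAM_NONE, STREAM_REMUS, STREAM_COLO, STREAM_BAD };

/* Value above any char, so it cannot clash with a short option. */
#define OPT_COLO 0x100

static enum stream_kind parse_stream_kind(int argc, char **argv)
{
    static const struct option opts[] = {
        {"colo", no_argument, NULL, OPT_COLO},
        {NULL, 0, NULL, 0}            /* terminator required by getopt_long */
    };
    enum stream_kind kind = STREAM_NONE;
    int opt;

    optind = 1;                       /* restart parsing on every call */
    opterr = 0;                       /* keep getopt quiet; we report errors */

    while ((opt = getopt_long(argc, argv, "Fedr", opts, NULL)) != -1) {
        switch (opt) {
        case 'r':
            kind = STREAM_REMUS;
            break;
        case OPT_COLO:
            kind = STREAM_COLO;
            break;
        case 'F':
        case 'e':
        case 'd':
            break;                    /* accepted, not relevant to this sketch */
        default:
            return STREAM_BAD;        /* unknown option, e.g. a short -c */
        }
    }
    return kind;
}
```

In this sketch "--colo" selects the COLO stream and "-r" selects Remus,
while a short "-c" is rejected because only the short options listed in
"Fedr" are recognised.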