From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: zhanghailiang <zhang.zhanghailiang@huawei.com>
Cc: lizhijian@cn.fujitsu.com, quintela@redhat.com,
	yunhong.jiang@intel.com, eddie.dong@intel.com,
	peter.huangpeng@huawei.com, qemu-devel@nongnu.org,
	arei.gonglei@huawei.com, stefanha@redhat.com,
	amit.shah@redhat.com, hongyang.yang@easystack.cn
Subject: Re: [Qemu-devel] [PATCH COLO-Frame v11 25/39] COLO: implement default failover treatment
Date: Thu, 10 Dec 2015 19:01:14 +0000
Message-ID: <20151210190113.GL2570@work-vm>
In-Reply-To: <1448357149-17572-26-git-send-email-zhang.zhanghailiang@huawei.com>

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> If we detect some error in COLO, we will wait for some time,
> hoping users also detect it. If users don't issue a failover command,
> we will go into the default failover procedure, in which the PVM takes
> over the work and the SVM exits by default.

I'm not sure this is needed, especially on the SVM. I don't see any harm
in the SVM waiting forever to be told what to do; it could be told to
fail over or quit. I don't see any benefit to it automatically exiting.
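To make that alternative concrete, here is a minimal sketch, assuming a pthread-based design, of a secondary that blocks after an error with no timeout until an operator posts either "fail over" or "quit". The SvmDecision and SvmControl types and both functions are hypothetical illustrations, not existing QEMU code; in QEMU the decision would presumably come from the same command path that makes failover_request_is_active() return true in the patch.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical operator decisions; none of these names exist in QEMU. */
typedef enum {
    SVM_DECISION_PENDING,  /* nothing decided yet: keep waiting */
    SVM_DECISION_FAILOVER, /* take over as the active VM */
    SVM_DECISION_QUIT      /* shut the secondary down */
} SvmDecision;

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t changed;
    SvmDecision decision;
} SvmControl;

static SvmControl svm_control = {
    .lock = PTHREAD_MUTEX_INITIALIZER,
    .changed = PTHREAD_COND_INITIALIZER,
    .decision = SVM_DECISION_PENDING,
};

/* Command path: record the operator's decision and wake the waiter. */
static void svm_post_decision(SvmControl *c, SvmDecision d)
{
    assert(d != SVM_DECISION_PENDING);
    pthread_mutex_lock(&c->lock);
    if (c->decision == SVM_DECISION_PENDING) {
        c->decision = d; /* first decision wins */
    }
    pthread_cond_broadcast(&c->changed);
    pthread_mutex_unlock(&c->lock);
}

/* Error path of the incoming thread: no timeout and no exit(). */
static SvmDecision svm_wait_for_decision(SvmControl *c)
{
    SvmDecision d;

    pthread_mutex_lock(&c->lock);
    while (c->decision == SVM_DECISION_PENDING) {
        /* The loop re-checks the condition after spurious wakeups. */
        pthread_cond_wait(&c->changed, &c->lock);
    }
    d = c->decision;
    pthread_mutex_unlock(&c->lock);
    return d;
}
```

Letting the first posted decision win is a choice made for this sketch: a late "quit" cannot undo a failover that is already under way.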

In the primary, I can understand it if you don't have some automated error
detection system (but I think that's rare); but you would really want to
make that failover delay configurable so that you could turn it off in a
system that does have failure detection, because automatically restarting
the primary after it had caused a failover to the secondary would be very
bad.
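A minimal sketch of what a configurable delay could look like, assuming a negative value is used as the "off" setting. FAILOVER_DELAY_DISABLED, PvmVerdict and pvm_failover_verdict() are hypothetical names, and how the value would be plumbed through 'migrate-set-parameters' is left out. The comparison mirrors the patch's loop, which keeps waiting while the elapsed time is <= DEFAULT_FAILOVER_DELAY.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel: a negative delay disables automatic failover. */
#define FAILOVER_DELAY_DISABLED (-1)

typedef enum {
    PVM_KEEP_WAITING,  /* no decision yet; poll again later */
    PVM_USER_FAILOVER, /* operator or external detector asked for it */
    PVM_AUTO_FAILOVER  /* configured delay expired with no decision */
} PvmVerdict;

/*
 * Decide what the primary should do elapsed_ms after a COLO error,
 * given a configurable delay_ms.  With FAILOVER_DELAY_DISABLED the
 * primary never takes over on its own, which is what a deployment
 * with its own failure detection would want.
 */
static PvmVerdict pvm_failover_verdict(int64_t delay_ms, int64_t elapsed_ms,
                                       bool user_requested)
{
    assert(elapsed_ms >= 0);
    if (user_requested) {
        return PVM_USER_FAILOVER;
    }
    if (delay_ms < 0) {
        return PVM_KEEP_WAITING; /* disabled: wait for an explicit command */
    }
    return elapsed_ms > delay_ms ? PVM_AUTO_FAILOVER : PVM_KEEP_WAITING;
}
```

Keeping the policy a pure function of (delay, elapsed time, request flag) would let the polling loop stay as it is while the policy itself becomes easy to test.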

Dave

> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> ---
>  migration/colo.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 46 insertions(+)
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index f31e957..1e6d3dd 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -19,6 +19,14 @@
>  #include "qemu/sockets.h"
>  #include "migration/failover.h"
>  
> +/*
> + * The delay before QEMU begins the default failover procedure.
> + * Unit: ms
> + * FIXME: This value should be settable via the
> + * 'migrate-set-parameters' command.
> + */
> +#define DEFAULT_FAILOVER_DELAY 2000
> +
>  /* colo buffer */
>  #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
>  
> @@ -264,6 +272,7 @@ static void colo_process_checkpoint(MigrationState *s)
>  {
>      QEMUSizedBuffer *buffer = NULL;
>      int64_t current_time, checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> +    int64_t error_time;
>      int ret = 0;
>      uint64_t value;
>  
> @@ -322,8 +331,25 @@ static void colo_process_checkpoint(MigrationState *s)
>      }
>  
>  out:
> +    current_time = error_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>      if (ret < 0) {
>          error_report("%s: %s", __func__, strerror(-ret));
> +        /* Give users time to get involved in this verdict */
> +        while (current_time - error_time <= DEFAULT_FAILOVER_DELAY) {
> +            if (failover_request_is_active()) {
> +                error_report("Primary VM will take over work");
> +                break;
> +            }
> +            usleep(100 * 1000);
> +            current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> +        }
> +
> +        qemu_mutex_lock_iothread();
> +        if (!failover_request_is_active()) {
> +            error_report("Primary VM will take over work in default");
> +            failover_request_active(NULL);
> +        }
> +        qemu_mutex_unlock_iothread();
>      }
>  
>      qsb_free(buffer);
> @@ -384,6 +410,7 @@ void *colo_process_incoming_thread(void *opaque)
>      QEMUFile *fb = NULL;
>      QEMUSizedBuffer *buffer = NULL; /* Cache incoming device state */
>      uint64_t  total_size;
> +    int64_t error_time, current_time;
>      int ret = 0;
>      uint64_t value;
>  
> @@ -499,9 +526,28 @@ void *colo_process_incoming_thread(void *opaque)
>      }
>  
>  out:
> +    current_time = error_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>      if (ret < 0) {
>          error_report("colo incoming thread will exit, detect error: %s",
>                       strerror(-ret));
> +        /* Give users time to get involved in this verdict */
> +        while (current_time - error_time <= DEFAULT_FAILOVER_DELAY) {
> +            if (failover_request_is_active()) {
> +                error_report("Secondary VM will take over work");
> +                break;
> +            }
> +            usleep(100 * 1000);
> +            current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> +        }
> +        /* Check the flag again */
> +        if (!failover_request_is_active()) {
> +            /*
> +             * We assume that the Primary VM is still alive according to
> +             * the heartbeat, so just kill the Secondary VM.
> +             */
> +            error_report("SVM is going to exit in default!");
> +            exit(1);
> +        }
>      }
>  
>      if (fb) {
> -- 
> 1.8.3.1
> 
> 
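For reference, a self-contained sketch of the wait-then-default pattern the patch adds to both colo_process_checkpoint() and colo_process_incoming_thread(), written against plain POSIX calls rather than QEMU's helpers. The callback interface, the function names, and the use of CLOCK_MONOTONIC (instead of the host clock read by qemu_clock_get_ms(QEMU_CLOCK_HOST)) are assumptions of this sketch, not part of the patch.

```c
#define _POSIX_C_SOURCE 200809L /* for clock_gettime() and nanosleep() */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Milliseconds from a monotonic clock, unaffected by wall-clock changes. */
static int64_t monotonic_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

static void sleep_ms(int64_t ms)
{
    struct timespec ts = { .tv_sec = ms / 1000,
                           .tv_nsec = (ms % 1000) * 1000000L };

    nanosleep(&ts, NULL); /* an early wakeup only costs one extra poll */
}

/* Example predicate: a flag another thread sets when failover is requested. */
static bool flag_is_set(void *opaque)
{
    return atomic_load((atomic_bool *)opaque);
}

/*
 * Poll requested() every poll_ms until it returns true or more than
 * delay_ms has elapsed, then check once more (the patch's "check flag
 * again").  Returns true if failover was requested, false if the caller
 * should fall back to its default action.
 */
static bool wait_for_failover_request(bool (*requested)(void *opaque),
                                      void *opaque,
                                      int64_t delay_ms, int64_t poll_ms)
{
    int64_t start = monotonic_ms();

    assert(delay_ms >= 0 && poll_ms > 0);
    while (monotonic_ms() - start <= delay_ms) {
        if (requested(opaque)) {
            return true;
        }
        sleep_ms(poll_ms);
    }
    return requested(opaque);
}
```

A false return corresponds to the point where the patch falls through to its default action (failover_request_active(NULL) on the primary, exit(1) on the secondary).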
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

