All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/3] COLO: fix some bugs
@ 2017-01-17 12:57 zhanghailiang
  2017-01-17 12:57 ` [Qemu-devel] [PATCH 1/3] COLO: fix setting checkpoint-delay not working properly zhanghailiang
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: zhanghailiang @ 2017-01-17 12:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: amit.shah, quintela, dgilbert, eddie.dong, lizhijian,
	zhangchen.fnst, xuquan8, zhanghailiang

Hi,

This series fix three bugs of COLO. 
patch 1 fix one usage case which users want to change checkpoint-delay
with an extream big value set before, the new value may not take effect
until finishing a long time of sleep.

Patch 2 and 3 are old patches that split from previous version
long time ago, and has been reviewed by Dave.

I'd like to pick these three patches from the later COLO series,
which will include some optimization and integrating with block
replication and COLO net proxy.

Please review, thanks.


zhanghailiang (3):
  COLO: fix setting checkpoint-delay not working properly
  COLO: Shutdown related socket fd while do failover
  COLO: Don't process failover request while loading VM's state

 include/migration/colo.h      |   2 +
 include/migration/migration.h |   8 ++++
 migration/colo.c              | 102 +++++++++++++++++++++++++++++++++++++-----
 migration/migration.c         |   3 ++
 qapi-schema.json              |   4 +-
 5 files changed, 108 insertions(+), 11 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH 1/3] COLO: fix setting checkpoint-delay not working properly
  2017-01-17 12:57 [Qemu-devel] [PATCH 0/3] COLO: fix some bugs zhanghailiang
@ 2017-01-17 12:57 ` zhanghailiang
  2017-02-08 10:38   ` Dr. David Alan Gilbert
  2017-01-17 12:57 ` [Qemu-devel] [PATCH 2/3] COLO: Shutdown related socket fd while do failover zhanghailiang
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: zhanghailiang @ 2017-01-17 12:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: amit.shah, quintela, dgilbert, eddie.dong, lizhijian,
	zhangchen.fnst, xuquan8, zhanghailiang

If we set checkpoint-delay through command 'migrate-set-parameters',
It will not take effect until we finish last sleep chekpoint-delay,
That's will be offensive espeically when we want to change its value
from an extreme big one to a proper value.

Fix it by using timer to realize checkpoint-delay.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 include/migration/colo.h      |  2 ++
 include/migration/migration.h |  5 +++++
 migration/colo.c              | 33 +++++++++++++++++++++++----------
 migration/migration.c         |  3 +++
 4 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/include/migration/colo.h b/include/migration/colo.h
index e32eef4..2bbff9e 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -35,4 +35,6 @@ COLOMode get_colo_mode(void);
 
 /* failover */
 void colo_do_failover(MigrationState *s);
+
+void colo_checkpoint_notify(void *opaque);
 #endif
diff --git a/include/migration/migration.h b/include/migration/migration.h
index c309d23..487ac1e 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -183,6 +183,11 @@ struct MigrationState
     /* The RAMBlock used in the last src_page_request */
     RAMBlock *last_req_rb;
 
+    /* The semaphore is used to notify COLO thread to do checkpoint */
+    QemuSemaphore colo_checkpoint_sem;
+    int64_t colo_checkpoint_time;
+    QEMUTimer *colo_delay_timer;
+
     /* The last error that occurred */
     Error *error;
 };
diff --git a/migration/colo.c b/migration/colo.c
index 93c85c5..08b2e46 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -302,7 +302,7 @@ static void colo_process_checkpoint(MigrationState *s)
 {
     QIOChannelBuffer *bioc;
     QEMUFile *fb = NULL;
-    int64_t current_time, checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+    int64_t current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
     Error *local_err = NULL;
     int ret;
 
@@ -332,26 +332,21 @@ static void colo_process_checkpoint(MigrationState *s)
     qemu_mutex_unlock_iothread();
     trace_colo_vm_state_change("stop", "run");
 
+    timer_mod(s->colo_delay_timer,
+            current_time + s->parameters.x_checkpoint_delay);
+
     while (s->state == MIGRATION_STATUS_COLO) {
         if (failover_get_state() != FAILOVER_STATUS_NONE) {
             error_report("failover request");
             goto out;
         }
 
-        current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
-        if (current_time - checkpoint_time <
-            s->parameters.x_checkpoint_delay) {
-            int64_t delay_ms;
+        qemu_sem_wait(&s->colo_checkpoint_sem);
 
-            delay_ms = s->parameters.x_checkpoint_delay -
-                       (current_time - checkpoint_time);
-            g_usleep(delay_ms * 1000);
-        }
         ret = colo_do_checkpoint_transaction(s, bioc, fb);
         if (ret < 0) {
             goto out;
         }
-        checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
     }
 
 out:
@@ -364,14 +359,32 @@ out:
         qemu_fclose(fb);
     }
 
+    timer_del(s->colo_delay_timer);
+
     if (s->rp_state.from_dst_file) {
         qemu_fclose(s->rp_state.from_dst_file);
     }
 }
 
+void colo_checkpoint_notify(void *opaque)
+{
+    MigrationState *s = opaque;
+    int64_t next_notify_time;
+
+    qemu_sem_post(&s->colo_checkpoint_sem);
+    s->colo_checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+    next_notify_time = s->colo_checkpoint_time +
+                    s->parameters.x_checkpoint_delay;
+    timer_mod(s->colo_delay_timer, next_notify_time);
+}
+
 void migrate_start_colo_process(MigrationState *s)
 {
     qemu_mutex_unlock_iothread();
+    qemu_sem_init(&s->colo_checkpoint_sem, 0);
+    s->colo_delay_timer =  timer_new_ms(QEMU_CLOCK_HOST,
+                                colo_checkpoint_notify, s);
+
     migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
                       MIGRATION_STATUS_COLO);
     colo_process_checkpoint(s);
diff --git a/migration/migration.c b/migration/migration.c
index f498ab8..a4861da 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -895,6 +895,9 @@ void qmp_migrate_set_parameters(MigrationParameters *params, Error **errp)
 
     if (params->has_x_checkpoint_delay) {
         s->parameters.x_checkpoint_delay = params->x_checkpoint_delay;
+        if (migration_in_colo_state()) {
+            colo_checkpoint_notify(s);
+        }
     }
 }
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH 2/3] COLO: Shutdown related socket fd while do failover
  2017-01-17 12:57 [Qemu-devel] [PATCH 0/3] COLO: fix some bugs zhanghailiang
  2017-01-17 12:57 ` [Qemu-devel] [PATCH 1/3] COLO: fix setting checkpoint-delay not working properly zhanghailiang
@ 2017-01-17 12:57 ` zhanghailiang
  2017-01-18 11:01   ` Dr. David Alan Gilbert
  2017-01-17 12:57 ` [Qemu-devel] [PATCH 3/3] COLO: Don't process failover request while loading VM's state zhanghailiang
  2017-02-10 15:44 ` [Qemu-devel] [PATCH 0/3] COLO: fix some bugs Dr. David Alan Gilbert
  3 siblings, 1 reply; 15+ messages in thread
From: zhanghailiang @ 2017-01-17 12:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: amit.shah, quintela, dgilbert, eddie.dong, lizhijian,
	zhangchen.fnst, xuquan8, zhanghailiang

If the net connection between primary host and secondary host breaks
while COLO/COLO incoming threads are doing read() or write().
It will block until connection is timeout, and the failover process
will be blocked because of it.

So it is necessary to shutdown all the socket fds used by COLO
to avoid this situation. Besides, we should close the corresponding
file descriptors after failvoer BH shutdown them,
Or there will be an error.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  3 +++
 migration/colo.c              | 43 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 487ac1e..7cac877 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -113,6 +113,7 @@ struct MigrationIncomingState {
     QemuThread colo_incoming_thread;
     /* The coroutine we should enter (back) after failover */
     Coroutine *migration_incoming_co;
+    QemuSemaphore colo_incoming_sem;
 
     /* See savevm.c */
     LoadStateEntry_Head loadvm_handlers;
@@ -182,6 +183,8 @@ struct MigrationState
     QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest) src_page_requests;
     /* The RAMBlock used in the last src_page_request */
     RAMBlock *last_req_rb;
+    /* The semaphore is used to notify COLO thread that failover is finished */
+    QemuSemaphore colo_exit_sem;
 
     /* The semaphore is used to notify COLO thread to do checkpoint */
     QemuSemaphore colo_checkpoint_sem;
diff --git a/migration/colo.c b/migration/colo.c
index 08b2e46..3222812 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -59,6 +59,18 @@ static void secondary_vm_do_failover(void)
         /* recover runstate to normal migration finish state */
         autostart = true;
     }
+    /*
+     * Make sure COLO incoming thread not block in recv or send,
+     * If mis->from_src_file and mis->to_src_file use the same fd,
+     * The second shutdown() will return -1, we ignore this value,
+     * It is harmless.
+     */
+    if (mis->from_src_file) {
+        qemu_file_shutdown(mis->from_src_file);
+    }
+    if (mis->to_src_file) {
+        qemu_file_shutdown(mis->to_src_file);
+    }
 
     old_state = failover_set_state(FAILOVER_STATUS_ACTIVE,
                                    FAILOVER_STATUS_COMPLETED);
@@ -67,6 +79,8 @@ static void secondary_vm_do_failover(void)
                      "secondary VM", FailoverStatus_lookup[old_state]);
         return;
     }
+    /* Notify COLO incoming thread that failover work is finished */
+    qemu_sem_post(&mis->colo_incoming_sem);
     /* For Secondary VM, jump to incoming co */
     if (mis->migration_incoming_co) {
         qemu_coroutine_enter(mis->migration_incoming_co);
@@ -81,6 +95,18 @@ static void primary_vm_do_failover(void)
     migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
                       MIGRATION_STATUS_COMPLETED);
 
+    /*
+     * Wake up COLO thread which may blocked in recv() or send(),
+     * The s->rp_state.from_dst_file and s->to_dst_file may use the
+     * same fd, but we still shutdown the fd for twice, it is harmless.
+     */
+    if (s->to_dst_file) {
+        qemu_file_shutdown(s->to_dst_file);
+    }
+    if (s->rp_state.from_dst_file) {
+        qemu_file_shutdown(s->rp_state.from_dst_file);
+    }
+
     old_state = failover_set_state(FAILOVER_STATUS_ACTIVE,
                                    FAILOVER_STATUS_COMPLETED);
     if (old_state != FAILOVER_STATUS_ACTIVE) {
@@ -88,6 +114,8 @@ static void primary_vm_do_failover(void)
                      FailoverStatus_lookup[old_state]);
         return;
     }
+    /* Notify COLO thread that failover work is finished */
+    qemu_sem_post(&s->colo_exit_sem);
 }
 
 void colo_do_failover(MigrationState *s)
@@ -361,6 +389,14 @@ out:
 
     timer_del(s->colo_delay_timer);
 
+    /* Hope this not to be too long to wait here */
+    qemu_sem_wait(&s->colo_exit_sem);
+    qemu_sem_destroy(&s->colo_exit_sem);
+    /*
+     * Must be called after failover BH is completed,
+     * Or the failover BH may shutdown the wrong fd that
+     * re-used by other threads after we release here.
+     */
     if (s->rp_state.from_dst_file) {
         qemu_fclose(s->rp_state.from_dst_file);
     }
@@ -385,6 +421,7 @@ void migrate_start_colo_process(MigrationState *s)
     s->colo_delay_timer =  timer_new_ms(QEMU_CLOCK_HOST,
                                 colo_checkpoint_notify, s);
 
+    qemu_sem_init(&s->colo_exit_sem, 0);
     migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
                       MIGRATION_STATUS_COLO);
     colo_process_checkpoint(s);
@@ -423,6 +460,8 @@ void *colo_process_incoming_thread(void *opaque)
     uint64_t value;
     Error *local_err = NULL;
 
+    qemu_sem_init(&mis->colo_incoming_sem, 0);
+
     migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
                       MIGRATION_STATUS_COLO);
 
@@ -533,6 +572,10 @@ out:
         qemu_fclose(fb);
     }
 
+    /* Hope this not to be too long to loop here */
+    qemu_sem_wait(&mis->colo_incoming_sem);
+    qemu_sem_destroy(&mis->colo_incoming_sem);
+    /* Must be called after failover BH is completed */
     if (mis->to_src_file) {
         qemu_fclose(mis->to_src_file);
     }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH 3/3] COLO: Don't process failover request while loading VM's state
  2017-01-17 12:57 [Qemu-devel] [PATCH 0/3] COLO: fix some bugs zhanghailiang
  2017-01-17 12:57 ` [Qemu-devel] [PATCH 1/3] COLO: fix setting checkpoint-delay not working properly zhanghailiang
  2017-01-17 12:57 ` [Qemu-devel] [PATCH 2/3] COLO: Shutdown related socket fd while do failover zhanghailiang
@ 2017-01-17 12:57 ` zhanghailiang
  2017-01-17 18:24   ` Eric Blake
  2017-02-10 15:44 ` [Qemu-devel] [PATCH 0/3] COLO: fix some bugs Dr. David Alan Gilbert
  3 siblings, 1 reply; 15+ messages in thread
From: zhanghailiang @ 2017-01-17 12:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: amit.shah, quintela, dgilbert, eddie.dong, lizhijian,
	zhangchen.fnst, xuquan8, zhanghailiang, Eric Blake

We should not do failover work while the main thread is loading
VM's state. Otherwise the consistent of VM's memory and
device state will be broken.

We will restart the loading process after jump over the stage,
The new failover status 'RELAUNCH' will help to record if we
need to restart the process.

Cc: Eric Blake <eblake@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/colo.c | 26 ++++++++++++++++++++++++++
 qapi-schema.json |  4 +++-
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/migration/colo.c b/migration/colo.c
index 3222812..712308e 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -20,6 +20,8 @@
 #include "qapi/error.h"
 #include "migration/failover.h"
 
+static bool vmstate_loading;
+
 #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
 
 bool colo_supported(void)
@@ -51,6 +53,19 @@ static void secondary_vm_do_failover(void)
     int old_state;
     MigrationIncomingState *mis = migration_incoming_get_current();
 
+    /* Can not do failover during the process of VM's loading VMstate, Or
+     * it will break the secondary VM.
+     */
+    if (vmstate_loading) {
+        old_state = failover_set_state(FAILOVER_STATUS_ACTIVE,
+                        FAILOVER_STATUS_RELAUNCH);
+        if (old_state != FAILOVER_STATUS_ACTIVE) {
+            error_report("Unknown error while do failover for secondary VM,"
+                         "old_state: %s", FailoverStatus_lookup[old_state]);
+        }
+        return;
+    }
+
     migrate_set_state(&mis->state, MIGRATION_STATUS_COLO,
                       MIGRATION_STATUS_COMPLETED);
 
@@ -548,13 +563,23 @@ void *colo_process_incoming_thread(void *opaque)
 
         qemu_mutex_lock_iothread();
         qemu_system_reset(VMRESET_SILENT);
+        vmstate_loading = true;
         if (qemu_loadvm_state(fb) < 0) {
             error_report("COLO: loadvm failed");
             qemu_mutex_unlock_iothread();
             goto out;
         }
+
+        vmstate_loading = false;
         qemu_mutex_unlock_iothread();
 
+        if (failover_get_state() == FAILOVER_STATUS_RELAUNCH) {
+            failover_set_state(FAILOVER_STATUS_RELAUNCH,
+                            FAILOVER_STATUS_NONE);
+            failover_request_active(NULL);
+            goto out;
+        }
+
         colo_send_message(mis->to_src_file, COLO_MESSAGE_VMSTATE_LOADED,
                      &local_err);
         if (local_err) {
@@ -563,6 +588,7 @@ void *colo_process_incoming_thread(void *opaque)
     }
 
 out:
+    vmstate_loading = false;
     /* Throw the unreported error message after exited from loop */
     if (local_err) {
         error_report_err(local_err);
diff --git a/qapi-schema.json b/qapi-schema.json
index ce20f16..97210c7 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -856,10 +856,12 @@
 #
 # @completed: finish the process of failover
 #
+# @relaunch: restart the failover process, from 'none' -> 'completed'
+#
 # Since: 2.8
 ##
 { 'enum': 'FailoverStatus',
-  'data': [ 'none', 'require', 'active', 'completed'] }
+  'data': [ 'none', 'require', 'active', 'completed', 'relaunch' ] }
 
 ##
 # @x-colo-lost-heartbeat:
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] COLO: Don't process failover request while loading VM's state
  2017-01-17 12:57 ` [Qemu-devel] [PATCH 3/3] COLO: Don't process failover request while loading VM's state zhanghailiang
@ 2017-01-17 18:24   ` Eric Blake
  2017-01-18  8:19     ` Hailiang Zhang
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Blake @ 2017-01-17 18:24 UTC (permalink / raw)
  To: zhanghailiang, qemu-devel
  Cc: amit.shah, quintela, dgilbert, eddie.dong, lizhijian,
	zhangchen.fnst, xuquan8

[-- Attachment #1: Type: text/plain, Size: 1326 bytes --]

On 01/17/2017 06:57 AM, zhanghailiang wrote:
> We should not do failover work while the main thread is loading
> VM's state. Otherwise the consistent of VM's memory and
> device state will be broken.
> 
> We will restart the loading process after jump over the stage,
> The new failover status 'RELAUNCH' will help to record if we
> need to restart the process.
> 
> Cc: Eric Blake <eblake@redhat.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  migration/colo.c | 26 ++++++++++++++++++++++++++
>  qapi-schema.json |  4 +++-
>  2 files changed, 29 insertions(+), 1 deletion(-)
> 

> +++ b/qapi-schema.json
> @@ -856,10 +856,12 @@
>  #
>  # @completed: finish the process of failover
>  #
> +# @relaunch: restart the failover process, from 'none' -> 'completed'

You'll need to add a '(since 2.9)' tag

> +#
>  # Since: 2.8
>  ##
>  { 'enum': 'FailoverStatus',
> -  'data': [ 'none', 'require', 'active', 'completed'] }
> +  'data': [ 'none', 'require', 'active', 'completed', 'relaunch' ] }
>  
>  ##
>  # @x-colo-lost-heartbeat:
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] COLO: Don't process failover request while loading VM's state
  2017-01-17 18:24   ` Eric Blake
@ 2017-01-18  8:19     ` Hailiang Zhang
  0 siblings, 0 replies; 15+ messages in thread
From: Hailiang Zhang @ 2017-01-18  8:19 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: xuquan8, amit.shah, quintela, dgilbert, eddie.dong, lizhijian,
	zhangchen.fnst

On 2017/1/18 2:24, Eric Blake wrote:
> On 01/17/2017 06:57 AM, zhanghailiang wrote:
>> We should not do failover work while the main thread is loading
>> VM's state. Otherwise the consistent of VM's memory and
>> device state will be broken.
>>
>> We will restart the loading process after jump over the stage,
>> The new failover status 'RELAUNCH' will help to record if we
>> need to restart the process.
>>
>> Cc: Eric Blake <eblake@redhat.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> ---
>>   migration/colo.c | 26 ++++++++++++++++++++++++++
>>   qapi-schema.json |  4 +++-
>>   2 files changed, 29 insertions(+), 1 deletion(-)
>>
>
>> +++ b/qapi-schema.json
>> @@ -856,10 +856,12 @@
>>   #
>>   # @completed: finish the process of failover
>>   #
>> +# @relaunch: restart the failover process, from 'none' -> 'completed'
>
> You'll need to add a '(since 2.9)' tag
>

OK, I'll add it in next version, thanks.

>> +#
>>   # Since: 2.8
>>   ##
>>   { 'enum': 'FailoverStatus',
>> -  'data': [ 'none', 'require', 'active', 'completed'] }
>> +  'data': [ 'none', 'require', 'active', 'completed', 'relaunch' ] }
>>
>>   ##
>>   # @x-colo-lost-heartbeat:
>>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] COLO: Shutdown related socket fd while do failover
  2017-01-17 12:57 ` [Qemu-devel] [PATCH 2/3] COLO: Shutdown related socket fd while do failover zhanghailiang
@ 2017-01-18 11:01   ` Dr. David Alan Gilbert
  2017-02-08 11:14     ` Hailiang Zhang
  0 siblings, 1 reply; 15+ messages in thread
From: Dr. David Alan Gilbert @ 2017-01-18 11:01 UTC (permalink / raw)
  To: zhanghailiang
  Cc: qemu-devel, amit.shah, quintela, eddie.dong, lizhijian,
	zhangchen.fnst, xuquan8

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> If the net connection between primary host and secondary host breaks
> while COLO/COLO incoming threads are doing read() or write().
> It will block until connection is timeout, and the failover process
> will be blocked because of it.
> 
> So it is necessary to shutdown all the socket fds used by COLO
> to avoid this situation. Besides, we should close the corresponding
> file descriptors after failvoer BH shutdown them,
> Or there will be an error.

Hi,
  There are two parts to this patch:
   a) Add some semaphores to sequence failover
   b) Use shutdown()

  At first I wondered if perhaps they should be split; but I see
the reason for the semaphores is mostly to stop the race between
the fd's getting closed and the shutdown() calls; so I think it's
OK.

Do you have any problems with these semaphores during powering off the
guest?

Dave




> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  include/migration/migration.h |  3 +++
>  migration/colo.c              | 43 +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 46 insertions(+)
> 
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index 487ac1e..7cac877 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -113,6 +113,7 @@ struct MigrationIncomingState {
>      QemuThread colo_incoming_thread;
>      /* The coroutine we should enter (back) after failover */
>      Coroutine *migration_incoming_co;
> +    QemuSemaphore colo_incoming_sem;
>  
>      /* See savevm.c */
>      LoadStateEntry_Head loadvm_handlers;
> @@ -182,6 +183,8 @@ struct MigrationState
>      QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest) src_page_requests;
>      /* The RAMBlock used in the last src_page_request */
>      RAMBlock *last_req_rb;
> +    /* The semaphore is used to notify COLO thread that failover is finished */
> +    QemuSemaphore colo_exit_sem;
>  
>      /* The semaphore is used to notify COLO thread to do checkpoint */
>      QemuSemaphore colo_checkpoint_sem;
> diff --git a/migration/colo.c b/migration/colo.c
> index 08b2e46..3222812 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -59,6 +59,18 @@ static void secondary_vm_do_failover(void)
>          /* recover runstate to normal migration finish state */
>          autostart = true;
>      }
> +    /*
> +     * Make sure COLO incoming thread not block in recv or send,
> +     * If mis->from_src_file and mis->to_src_file use the same fd,
> +     * The second shutdown() will return -1, we ignore this value,
> +     * It is harmless.
> +     */
> +    if (mis->from_src_file) {
> +        qemu_file_shutdown(mis->from_src_file);
> +    }
> +    if (mis->to_src_file) {
> +        qemu_file_shutdown(mis->to_src_file);
> +    }
>  
>      old_state = failover_set_state(FAILOVER_STATUS_ACTIVE,
>                                     FAILOVER_STATUS_COMPLETED);
> @@ -67,6 +79,8 @@ static void secondary_vm_do_failover(void)
>                       "secondary VM", FailoverStatus_lookup[old_state]);
>          return;
>      }
> +    /* Notify COLO incoming thread that failover work is finished */
> +    qemu_sem_post(&mis->colo_incoming_sem);
>      /* For Secondary VM, jump to incoming co */
>      if (mis->migration_incoming_co) {
>          qemu_coroutine_enter(mis->migration_incoming_co);
> @@ -81,6 +95,18 @@ static void primary_vm_do_failover(void)
>      migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
>                        MIGRATION_STATUS_COMPLETED);
>  
> +    /*
> +     * Wake up COLO thread which may blocked in recv() or send(),
> +     * The s->rp_state.from_dst_file and s->to_dst_file may use the
> +     * same fd, but we still shutdown the fd for twice, it is harmless.
> +     */
> +    if (s->to_dst_file) {
> +        qemu_file_shutdown(s->to_dst_file);
> +    }
> +    if (s->rp_state.from_dst_file) {
> +        qemu_file_shutdown(s->rp_state.from_dst_file);
> +    }
> +
>      old_state = failover_set_state(FAILOVER_STATUS_ACTIVE,
>                                     FAILOVER_STATUS_COMPLETED);
>      if (old_state != FAILOVER_STATUS_ACTIVE) {
> @@ -88,6 +114,8 @@ static void primary_vm_do_failover(void)
>                       FailoverStatus_lookup[old_state]);
>          return;
>      }
> +    /* Notify COLO thread that failover work is finished */
> +    qemu_sem_post(&s->colo_exit_sem);
>  }
>  
>  void colo_do_failover(MigrationState *s)
> @@ -361,6 +389,14 @@ out:
>  
>      timer_del(s->colo_delay_timer);
>  
> +    /* Hope this not to be too long to wait here */
> +    qemu_sem_wait(&s->colo_exit_sem);
> +    qemu_sem_destroy(&s->colo_exit_sem);
> +    /*
> +     * Must be called after failover BH is completed,
> +     * Or the failover BH may shutdown the wrong fd that
> +     * re-used by other threads after we release here.
> +     */
>      if (s->rp_state.from_dst_file) {
>          qemu_fclose(s->rp_state.from_dst_file);
>      }
> @@ -385,6 +421,7 @@ void migrate_start_colo_process(MigrationState *s)
>      s->colo_delay_timer =  timer_new_ms(QEMU_CLOCK_HOST,
>                                  colo_checkpoint_notify, s);
>  
> +    qemu_sem_init(&s->colo_exit_sem, 0);
>      migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
>                        MIGRATION_STATUS_COLO);
>      colo_process_checkpoint(s);
> @@ -423,6 +460,8 @@ void *colo_process_incoming_thread(void *opaque)
>      uint64_t value;
>      Error *local_err = NULL;
>  
> +    qemu_sem_init(&mis->colo_incoming_sem, 0);
> +
>      migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
>                        MIGRATION_STATUS_COLO);
>  
> @@ -533,6 +572,10 @@ out:
>          qemu_fclose(fb);
>      }
>  
> +    /* Hope this not to be too long to loop here */
> +    qemu_sem_wait(&mis->colo_incoming_sem);
> +    qemu_sem_destroy(&mis->colo_incoming_sem);
> +    /* Must be called after failover BH is completed */
>      if (mis->to_src_file) {
>          qemu_fclose(mis->to_src_file);
>      }
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] COLO: fix setting checkpoint-delay not working properly
  2017-01-17 12:57 ` [Qemu-devel] [PATCH 1/3] COLO: fix setting checkpoint-delay not working properly zhanghailiang
@ 2017-02-08 10:38   ` Dr. David Alan Gilbert
  2017-02-08 11:18     ` Hailiang Zhang
  0 siblings, 1 reply; 15+ messages in thread
From: Dr. David Alan Gilbert @ 2017-02-08 10:38 UTC (permalink / raw)
  To: zhanghailiang
  Cc: qemu-devel, quintela, eddie.dong, lizhijian, zhangchen.fnst, xuquan8

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> If we set checkpoint-delay through command 'migrate-set-parameters',
> It will not take effect until we finish last sleep chekpoint-delay,
> That's will be offensive espeically when we want to change its value
> from an extreme big one to a proper value.
> 
> Fix it by using timer to realize checkpoint-delay.

Yes, I think this works; you only kick the timer in COLO state,
and you create the timer before going into COLO state and clean it up
after we leave, so it feels safe.

Are you also going to kick this semaphore when doing a failover to
cause this thread to exit quickly?


> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Dave
> ---
>  include/migration/colo.h      |  2 ++
>  include/migration/migration.h |  5 +++++
>  migration/colo.c              | 33 +++++++++++++++++++++++----------
>  migration/migration.c         |  3 +++
>  4 files changed, 33 insertions(+), 10 deletions(-)
> 
> diff --git a/include/migration/colo.h b/include/migration/colo.h
> index e32eef4..2bbff9e 100644
> --- a/include/migration/colo.h
> +++ b/include/migration/colo.h
> @@ -35,4 +35,6 @@ COLOMode get_colo_mode(void);
>  
>  /* failover */
>  void colo_do_failover(MigrationState *s);
> +
> +void colo_checkpoint_notify(void *opaque);
>  #endif
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index c309d23..487ac1e 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -183,6 +183,11 @@ struct MigrationState
>      /* The RAMBlock used in the last src_page_request */
>      RAMBlock *last_req_rb;
>  
> +    /* The semaphore is used to notify COLO thread to do checkpoint */
> +    QemuSemaphore colo_checkpoint_sem;
> +    int64_t colo_checkpoint_time;
> +    QEMUTimer *colo_delay_timer;
> +
>      /* The last error that occurred */
>      Error *error;
>  };
> diff --git a/migration/colo.c b/migration/colo.c
> index 93c85c5..08b2e46 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -302,7 +302,7 @@ static void colo_process_checkpoint(MigrationState *s)
>  {
>      QIOChannelBuffer *bioc;
>      QEMUFile *fb = NULL;
> -    int64_t current_time, checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> +    int64_t current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>      Error *local_err = NULL;
>      int ret;
>  
> @@ -332,26 +332,21 @@ static void colo_process_checkpoint(MigrationState *s)
>      qemu_mutex_unlock_iothread();
>      trace_colo_vm_state_change("stop", "run");
>  
> +    timer_mod(s->colo_delay_timer,
> +            current_time + s->parameters.x_checkpoint_delay);
> +
>      while (s->state == MIGRATION_STATUS_COLO) {
>          if (failover_get_state() != FAILOVER_STATUS_NONE) {
>              error_report("failover request");
>              goto out;
>          }
>  
> -        current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> -        if (current_time - checkpoint_time <
> -            s->parameters.x_checkpoint_delay) {
> -            int64_t delay_ms;
> +        qemu_sem_wait(&s->colo_checkpoint_sem);
>  
> -            delay_ms = s->parameters.x_checkpoint_delay -
> -                       (current_time - checkpoint_time);
> -            g_usleep(delay_ms * 1000);
> -        }
>          ret = colo_do_checkpoint_transaction(s, bioc, fb);
>          if (ret < 0) {
>              goto out;
>          }
> -        checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>      }
>  
>  out:
> @@ -364,14 +359,32 @@ out:
>          qemu_fclose(fb);
>      }
>  
> +    timer_del(s->colo_delay_timer);
> +
>      if (s->rp_state.from_dst_file) {
>          qemu_fclose(s->rp_state.from_dst_file);
>      }
>  }
>  
> +void colo_checkpoint_notify(void *opaque)
> +{
> +    MigrationState *s = opaque;
> +    int64_t next_notify_time;
> +
> +    qemu_sem_post(&s->colo_checkpoint_sem);
> +    s->colo_checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> +    next_notify_time = s->colo_checkpoint_time +
> +                    s->parameters.x_checkpoint_delay;
> +    timer_mod(s->colo_delay_timer, next_notify_time);
> +}
> +
>  void migrate_start_colo_process(MigrationState *s)
>  {
>      qemu_mutex_unlock_iothread();
> +    qemu_sem_init(&s->colo_checkpoint_sem, 0);
> +    s->colo_delay_timer =  timer_new_ms(QEMU_CLOCK_HOST,
> +                                colo_checkpoint_notify, s);
> +
>      migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
>                        MIGRATION_STATUS_COLO);
>      colo_process_checkpoint(s);
> diff --git a/migration/migration.c b/migration/migration.c
> index f498ab8..a4861da 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -895,6 +895,9 @@ void qmp_migrate_set_parameters(MigrationParameters *params, Error **errp)
>  
>      if (params->has_x_checkpoint_delay) {
>          s->parameters.x_checkpoint_delay = params->x_checkpoint_delay;
> +        if (migration_in_colo_state()) {
> +            colo_checkpoint_notify(s);
> +        }
>      }
>  }
>  
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] COLO: Shutdown related socket fd while do failover
  2017-01-18 11:01   ` Dr. David Alan Gilbert
@ 2017-02-08 11:14     ` Hailiang Zhang
  2017-02-08 19:53       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 15+ messages in thread
From: Hailiang Zhang @ 2017-02-08 11:14 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: xuquan8, qemu-devel, amit.shah, quintela, eddie.dong, lizhijian,
	zhangchen.fnst

On 2017/1/18 19:01, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> If the net connection between primary host and secondary host breaks
>> while COLO/COLO incoming threads are doing read() or write().
>> It will block until connection is timeout, and the failover process
>> will be blocked because of it.
>>
>> So it is necessary to shutdown all the socket fds used by COLO
>> to avoid this situation. Besides, we should close the corresponding
>> file descriptors after failvoer BH shutdown them,
>> Or there will be an error.
>
> Hi,
>    There are two parts to this patch:
>     a) Add some semaphores to sequence failover
>     b) Use shutdown()
>
>    At first I wondered if perhaps they should be split; but I see
> the reason for the semaphores is mostly to stop the race between
> the fd's getting closed and the shutdown() calls; so I think it's
> OK.
>

Hi,

Yes, you are right, maybe i should add some comments about that.
Will do in next version.

> Do you have any problems with these semaphores during powering off the
> guest?
>

No, we didn't encounter any problems or trigger any bugs in our test
with this semaphores. In what places do you doubt it may has problems ? :)

Thanks,
Hailiang

> Dave
>
>
>
>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> ---
>>   include/migration/migration.h |  3 +++
>>   migration/colo.c              | 43 +++++++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 46 insertions(+)
>>
>> diff --git a/include/migration/migration.h b/include/migration/migration.h
>> index 487ac1e..7cac877 100644
>> --- a/include/migration/migration.h
>> +++ b/include/migration/migration.h
>> @@ -113,6 +113,7 @@ struct MigrationIncomingState {
>>       QemuThread colo_incoming_thread;
>>       /* The coroutine we should enter (back) after failover */
>>       Coroutine *migration_incoming_co;
>> +    QemuSemaphore colo_incoming_sem;
>>
>>       /* See savevm.c */
>>       LoadStateEntry_Head loadvm_handlers;
>> @@ -182,6 +183,8 @@ struct MigrationState
>>       QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest) src_page_requests;
>>       /* The RAMBlock used in the last src_page_request */
>>       RAMBlock *last_req_rb;
>> +    /* The semaphore is used to notify COLO thread that failover is finished */
>> +    QemuSemaphore colo_exit_sem;
>>
>>       /* The semaphore is used to notify COLO thread to do checkpoint */
>>       QemuSemaphore colo_checkpoint_sem;
>> diff --git a/migration/colo.c b/migration/colo.c
>> index 08b2e46..3222812 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -59,6 +59,18 @@ static void secondary_vm_do_failover(void)
>>           /* recover runstate to normal migration finish state */
>>           autostart = true;
>>       }
>> +    /*
>> +     * Make sure COLO incoming thread not block in recv or send,
>> +     * If mis->from_src_file and mis->to_src_file use the same fd,
>> +     * The second shutdown() will return -1, we ignore this value,
>> +     * It is harmless.
>> +     */
>> +    if (mis->from_src_file) {
>> +        qemu_file_shutdown(mis->from_src_file);
>> +    }
>> +    if (mis->to_src_file) {
>> +        qemu_file_shutdown(mis->to_src_file);
>> +    }
>>
>>       old_state = failover_set_state(FAILOVER_STATUS_ACTIVE,
>>                                      FAILOVER_STATUS_COMPLETED);
>> @@ -67,6 +79,8 @@ static void secondary_vm_do_failover(void)
>>                        "secondary VM", FailoverStatus_lookup[old_state]);
>>           return;
>>       }
>> +    /* Notify COLO incoming thread that failover work is finished */
>> +    qemu_sem_post(&mis->colo_incoming_sem);
>>       /* For Secondary VM, jump to incoming co */
>>       if (mis->migration_incoming_co) {
>>           qemu_coroutine_enter(mis->migration_incoming_co);
>> @@ -81,6 +95,18 @@ static void primary_vm_do_failover(void)
>>       migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
>>                         MIGRATION_STATUS_COMPLETED);
>>
>> +    /*
>> +     * Wake up COLO thread which may blocked in recv() or send(),
>> +     * The s->rp_state.from_dst_file and s->to_dst_file may use the
>> +     * same fd, but we still shutdown the fd for twice, it is harmless.
>> +     */
>> +    if (s->to_dst_file) {
>> +        qemu_file_shutdown(s->to_dst_file);
>> +    }
>> +    if (s->rp_state.from_dst_file) {
>> +        qemu_file_shutdown(s->rp_state.from_dst_file);
>> +    }
>> +
>>       old_state = failover_set_state(FAILOVER_STATUS_ACTIVE,
>>                                      FAILOVER_STATUS_COMPLETED);
>>       if (old_state != FAILOVER_STATUS_ACTIVE) {
>> @@ -88,6 +114,8 @@ static void primary_vm_do_failover(void)
>>                        FailoverStatus_lookup[old_state]);
>>           return;
>>       }
>> +    /* Notify COLO thread that failover work is finished */
>> +    qemu_sem_post(&s->colo_exit_sem);
>>   }
>>
>>   void colo_do_failover(MigrationState *s)
>> @@ -361,6 +389,14 @@ out:
>>
>>       timer_del(s->colo_delay_timer);
>>
>> +    /* Hope this not to be too long to wait here */
>> +    qemu_sem_wait(&s->colo_exit_sem);
>> +    qemu_sem_destroy(&s->colo_exit_sem);
>> +    /*
>> +     * Must be called after failover BH is completed,
>> +     * Or the failover BH may shutdown the wrong fd that
>> +     * re-used by other threads after we release here.
>> +     */
>>       if (s->rp_state.from_dst_file) {
>>           qemu_fclose(s->rp_state.from_dst_file);
>>       }
>> @@ -385,6 +421,7 @@ void migrate_start_colo_process(MigrationState *s)
>>       s->colo_delay_timer =  timer_new_ms(QEMU_CLOCK_HOST,
>>                                   colo_checkpoint_notify, s);
>>
>> +    qemu_sem_init(&s->colo_exit_sem, 0);
>>       migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
>>                         MIGRATION_STATUS_COLO);
>>       colo_process_checkpoint(s);
>> @@ -423,6 +460,8 @@ void *colo_process_incoming_thread(void *opaque)
>>       uint64_t value;
>>       Error *local_err = NULL;
>>
>> +    qemu_sem_init(&mis->colo_incoming_sem, 0);
>> +
>>       migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
>>                         MIGRATION_STATUS_COLO);
>>
>> @@ -533,6 +572,10 @@ out:
>>           qemu_fclose(fb);
>>       }
>>
>> +    /* Hope this not to be too long to loop here */
>> +    qemu_sem_wait(&mis->colo_incoming_sem);
>> +    qemu_sem_destroy(&mis->colo_incoming_sem);
>> +    /* Must be called after failover BH is completed */
>>       if (mis->to_src_file) {
>>           qemu_fclose(mis->to_src_file);
>>       }
>> --
>> 1.8.3.1
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] COLO: fix setting checkpoint-delay not working properly
  2017-02-08 10:38   ` Dr. David Alan Gilbert
@ 2017-02-08 11:18     ` Hailiang Zhang
  0 siblings, 0 replies; 15+ messages in thread
From: Hailiang Zhang @ 2017-02-08 11:18 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: xuquan8, qemu-devel, quintela, eddie.dong, lizhijian, zhangchen.fnst

On 2017/2/8 18:38, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> If we set checkpoint-delay through command 'migrate-set-parameters',
>> It will not take effect until we finish last sleep chekpoint-delay,
>> That's will be offensive espeically when we want to change its value
>> from an extreme big one to a proper value.
>>
>> Fix it by using timer to realize checkpoint-delay.
>
> Yes, I think this works; you only kick the timer in COLO state,
> and you create the timer before going into COLO state and clean it up
> after we leave, so it feels safe.
>
> Are you also going to kick this semaphore when doing a failover to
> cause this thread to exit quickly?
>

Ha, good suggestion! I think it will work.
Accepted, thanks very much.

>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>
> Dave
>> ---
>>   include/migration/colo.h      |  2 ++
>>   include/migration/migration.h |  5 +++++
>>   migration/colo.c              | 33 +++++++++++++++++++++++----------
>>   migration/migration.c         |  3 +++
>>   4 files changed, 33 insertions(+), 10 deletions(-)
>>
>> diff --git a/include/migration/colo.h b/include/migration/colo.h
>> index e32eef4..2bbff9e 100644
>> --- a/include/migration/colo.h
>> +++ b/include/migration/colo.h
>> @@ -35,4 +35,6 @@ COLOMode get_colo_mode(void);
>>
>>   /* failover */
>>   void colo_do_failover(MigrationState *s);
>> +
>> +void colo_checkpoint_notify(void *opaque);
>>   #endif
>> diff --git a/include/migration/migration.h b/include/migration/migration.h
>> index c309d23..487ac1e 100644
>> --- a/include/migration/migration.h
>> +++ b/include/migration/migration.h
>> @@ -183,6 +183,11 @@ struct MigrationState
>>       /* The RAMBlock used in the last src_page_request */
>>       RAMBlock *last_req_rb;
>>
>> +    /* The semaphore is used to notify COLO thread to do checkpoint */
>> +    QemuSemaphore colo_checkpoint_sem;
>> +    int64_t colo_checkpoint_time;
>> +    QEMUTimer *colo_delay_timer;
>> +
>>       /* The last error that occurred */
>>       Error *error;
>>   };
>> diff --git a/migration/colo.c b/migration/colo.c
>> index 93c85c5..08b2e46 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -302,7 +302,7 @@ static void colo_process_checkpoint(MigrationState *s)
>>   {
>>       QIOChannelBuffer *bioc;
>>       QEMUFile *fb = NULL;
>> -    int64_t current_time, checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>> +    int64_t current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>>       Error *local_err = NULL;
>>       int ret;
>>
>> @@ -332,26 +332,21 @@ static void colo_process_checkpoint(MigrationState *s)
>>       qemu_mutex_unlock_iothread();
>>       trace_colo_vm_state_change("stop", "run");
>>
>> +    timer_mod(s->colo_delay_timer,
>> +            current_time + s->parameters.x_checkpoint_delay);
>> +
>>       while (s->state == MIGRATION_STATUS_COLO) {
>>           if (failover_get_state() != FAILOVER_STATUS_NONE) {
>>               error_report("failover request");
>>               goto out;
>>           }
>>
>> -        current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>> -        if (current_time - checkpoint_time <
>> -            s->parameters.x_checkpoint_delay) {
>> -            int64_t delay_ms;
>> +        qemu_sem_wait(&s->colo_checkpoint_sem);
>>
>> -            delay_ms = s->parameters.x_checkpoint_delay -
>> -                       (current_time - checkpoint_time);
>> -            g_usleep(delay_ms * 1000);
>> -        }
>>           ret = colo_do_checkpoint_transaction(s, bioc, fb);
>>           if (ret < 0) {
>>               goto out;
>>           }
>> -        checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>>       }
>>
>>   out:
>> @@ -364,14 +359,32 @@ out:
>>           qemu_fclose(fb);
>>       }
>>
>> +    timer_del(s->colo_delay_timer);
>> +
>>       if (s->rp_state.from_dst_file) {
>>           qemu_fclose(s->rp_state.from_dst_file);
>>       }
>>   }
>>
>> +void colo_checkpoint_notify(void *opaque)
>> +{
>> +    MigrationState *s = opaque;
>> +    int64_t next_notify_time;
>> +
>> +    qemu_sem_post(&s->colo_checkpoint_sem);
>> +    s->colo_checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>> +    next_notify_time = s->colo_checkpoint_time +
>> +                    s->parameters.x_checkpoint_delay;
>> +    timer_mod(s->colo_delay_timer, next_notify_time);
>> +}
>> +
>>   void migrate_start_colo_process(MigrationState *s)
>>   {
>>       qemu_mutex_unlock_iothread();
>> +    qemu_sem_init(&s->colo_checkpoint_sem, 0);
>> +    s->colo_delay_timer =  timer_new_ms(QEMU_CLOCK_HOST,
>> +                                colo_checkpoint_notify, s);
>> +
>>       migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
>>                         MIGRATION_STATUS_COLO);
>>       colo_process_checkpoint(s);
>> diff --git a/migration/migration.c b/migration/migration.c
>> index f498ab8..a4861da 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -895,6 +895,9 @@ void qmp_migrate_set_parameters(MigrationParameters *params, Error **errp)
>>
>>       if (params->has_x_checkpoint_delay) {
>>           s->parameters.x_checkpoint_delay = params->x_checkpoint_delay;
>> +        if (migration_in_colo_state()) {
>> +            colo_checkpoint_notify(s);
>> +        }
>>       }
>>   }
>>
>> --
>> 1.8.3.1
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] COLO: Shutdown related socket fd while do failover
  2017-02-08 11:14     ` Hailiang Zhang
@ 2017-02-08 19:53       ` Dr. David Alan Gilbert
  2017-02-13  4:13         ` Hailiang Zhang
  0 siblings, 1 reply; 15+ messages in thread
From: Dr. David Alan Gilbert @ 2017-02-08 19:53 UTC (permalink / raw)
  To: Hailiang Zhang
  Cc: xuquan8, zhangchen.fnst, lizhijian, quintela, eddie.dong,
	qemu-devel, amit.shah

* Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
> On 2017/1/18 19:01, Dr. David Alan Gilbert wrote:
> > * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> > > If the net connection between primary host and secondary host breaks
> > > while COLO/COLO incoming threads are doing read() or write().
> > > It will block until connection is timeout, and the failover process
> > > will be blocked because of it.
> > > 
> > > So it is necessary to shutdown all the socket fds used by COLO
> > > to avoid this situation. Besides, we should close the corresponding
> > > file descriptors after failvoer BH shutdown them,
> > > Or there will be an error.
> > 
> > Hi,
> >    There are two parts to this patch:
> >     a) Add some semaphores to sequence failover
> >     b) Use shutdown()
> > 
> >    At first I wondered if perhaps they should be split; but I see
> > the reason for the semaphores is mostly to stop the race between
> > the fd's getting closed and the shutdown() calls; so I think it's
> > OK.
> > 
> 
> Hi,
> 
> Yes, you are right, maybe i should add some comments about that.
> Will do in next version.
> 
> > Do you have any problems with these semaphores during powering off the
> > guest?
> > 
> 
> No, we didn't encounter any problems or trigger any bugs in our test
> with this semaphores. In what places do you doubt it may has problems ? :)

I just wondered about other exit cases other than failover; e.g. what
if the guest shutdown or something like that, would it get stuck
waiting for the colo_incoming_sem.

Dave

> Thanks,
> Hailiang
> 
> > Dave
> > 
> > 
> > 
> > 
> > > Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> > > Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> > > Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > ---
> > >   include/migration/migration.h |  3 +++
> > >   migration/colo.c              | 43 +++++++++++++++++++++++++++++++++++++++++++
> > >   2 files changed, 46 insertions(+)
> > > 
> > > diff --git a/include/migration/migration.h b/include/migration/migration.h
> > > index 487ac1e..7cac877 100644
> > > --- a/include/migration/migration.h
> > > +++ b/include/migration/migration.h
> > > @@ -113,6 +113,7 @@ struct MigrationIncomingState {
> > >       QemuThread colo_incoming_thread;
> > >       /* The coroutine we should enter (back) after failover */
> > >       Coroutine *migration_incoming_co;
> > > +    QemuSemaphore colo_incoming_sem;
> > > 
> > >       /* See savevm.c */
> > >       LoadStateEntry_Head loadvm_handlers;
> > > @@ -182,6 +183,8 @@ struct MigrationState
> > >       QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest) src_page_requests;
> > >       /* The RAMBlock used in the last src_page_request */
> > >       RAMBlock *last_req_rb;
> > > +    /* The semaphore is used to notify COLO thread that failover is finished */
> > > +    QemuSemaphore colo_exit_sem;
> > > 
> > >       /* The semaphore is used to notify COLO thread to do checkpoint */
> > >       QemuSemaphore colo_checkpoint_sem;
> > > diff --git a/migration/colo.c b/migration/colo.c
> > > index 08b2e46..3222812 100644
> > > --- a/migration/colo.c
> > > +++ b/migration/colo.c
> > > @@ -59,6 +59,18 @@ static void secondary_vm_do_failover(void)
> > >           /* recover runstate to normal migration finish state */
> > >           autostart = true;
> > >       }
> > > +    /*
> > > +     * Make sure COLO incoming thread not block in recv or send,
> > > +     * If mis->from_src_file and mis->to_src_file use the same fd,
> > > +     * The second shutdown() will return -1, we ignore this value,
> > > +     * It is harmless.
> > > +     */
> > > +    if (mis->from_src_file) {
> > > +        qemu_file_shutdown(mis->from_src_file);
> > > +    }
> > > +    if (mis->to_src_file) {
> > > +        qemu_file_shutdown(mis->to_src_file);
> > > +    }
> > > 
> > >       old_state = failover_set_state(FAILOVER_STATUS_ACTIVE,
> > >                                      FAILOVER_STATUS_COMPLETED);
> > > @@ -67,6 +79,8 @@ static void secondary_vm_do_failover(void)
> > >                        "secondary VM", FailoverStatus_lookup[old_state]);
> > >           return;
> > >       }
> > > +    /* Notify COLO incoming thread that failover work is finished */
> > > +    qemu_sem_post(&mis->colo_incoming_sem);
> > >       /* For Secondary VM, jump to incoming co */
> > >       if (mis->migration_incoming_co) {
> > >           qemu_coroutine_enter(mis->migration_incoming_co);
> > > @@ -81,6 +95,18 @@ static void primary_vm_do_failover(void)
> > >       migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
> > >                         MIGRATION_STATUS_COMPLETED);
> > > 
> > > +    /*
> > > +     * Wake up COLO thread which may blocked in recv() or send(),
> > > +     * The s->rp_state.from_dst_file and s->to_dst_file may use the
> > > +     * same fd, but we still shutdown the fd for twice, it is harmless.
> > > +     */
> > > +    if (s->to_dst_file) {
> > > +        qemu_file_shutdown(s->to_dst_file);
> > > +    }
> > > +    if (s->rp_state.from_dst_file) {
> > > +        qemu_file_shutdown(s->rp_state.from_dst_file);
> > > +    }
> > > +
> > >       old_state = failover_set_state(FAILOVER_STATUS_ACTIVE,
> > >                                      FAILOVER_STATUS_COMPLETED);
> > >       if (old_state != FAILOVER_STATUS_ACTIVE) {
> > > @@ -88,6 +114,8 @@ static void primary_vm_do_failover(void)
> > >                        FailoverStatus_lookup[old_state]);
> > >           return;
> > >       }
> > > +    /* Notify COLO thread that failover work is finished */
> > > +    qemu_sem_post(&s->colo_exit_sem);
> > >   }
> > > 
> > >   void colo_do_failover(MigrationState *s)
> > > @@ -361,6 +389,14 @@ out:
> > > 
> > >       timer_del(s->colo_delay_timer);
> > > 
> > > +    /* Hope this not to be too long to wait here */
> > > +    qemu_sem_wait(&s->colo_exit_sem);
> > > +    qemu_sem_destroy(&s->colo_exit_sem);
> > > +    /*
> > > +     * Must be called after failover BH is completed,
> > > +     * Or the failover BH may shutdown the wrong fd that
> > > +     * re-used by other threads after we release here.
> > > +     */
> > >       if (s->rp_state.from_dst_file) {
> > >           qemu_fclose(s->rp_state.from_dst_file);
> > >       }
> > > @@ -385,6 +421,7 @@ void migrate_start_colo_process(MigrationState *s)
> > >       s->colo_delay_timer =  timer_new_ms(QEMU_CLOCK_HOST,
> > >                                   colo_checkpoint_notify, s);
> > > 
> > > +    qemu_sem_init(&s->colo_exit_sem, 0);
> > >       migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
> > >                         MIGRATION_STATUS_COLO);
> > >       colo_process_checkpoint(s);
> > > @@ -423,6 +460,8 @@ void *colo_process_incoming_thread(void *opaque)
> > >       uint64_t value;
> > >       Error *local_err = NULL;
> > > 
> > > +    qemu_sem_init(&mis->colo_incoming_sem, 0);
> > > +
> > >       migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
> > >                         MIGRATION_STATUS_COLO);
> > > 
> > > @@ -533,6 +572,10 @@ out:
> > >           qemu_fclose(fb);
> > >       }
> > > 
> > > +    /* Hope this not to be too long to loop here */
> > > +    qemu_sem_wait(&mis->colo_incoming_sem);
> > > +    qemu_sem_destroy(&mis->colo_incoming_sem);
> > > +    /* Must be called after failover BH is completed */
> > >       if (mis->to_src_file) {
> > >           qemu_fclose(mis->to_src_file);
> > >       }
> > > --
> > > 1.8.3.1
> > > 
> > > 
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > 
> > .
> > 
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] COLO: fix some bugs
  2017-01-17 12:57 [Qemu-devel] [PATCH 0/3] COLO: fix some bugs zhanghailiang
                   ` (2 preceding siblings ...)
  2017-01-17 12:57 ` [Qemu-devel] [PATCH 3/3] COLO: Don't process failover request while loading VM's state zhanghailiang
@ 2017-02-10 15:44 ` Dr. David Alan Gilbert
  2017-02-13  8:46   ` Hailiang Zhang
  3 siblings, 1 reply; 15+ messages in thread
From: Dr. David Alan Gilbert @ 2017-02-10 15:44 UTC (permalink / raw)
  To: zhanghailiang
  Cc: qemu-devel, quintela, eddie.dong, lizhijian, zhangchen.fnst, xuquan8

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> Hi,
> 
> This series fix three bugs of COLO. 
> patch 1 fix one usage case which users want to change checkpoint-delay
> with an extream big value set before, the new value may not take effect
> until finishing a long time of sleep.
> 
> Patch 2 and 3 are old patches that split from previous version
> long time ago, and has been reviewed by Dave.
> 
> I'd like to pick these three patches from the later COLO series,
> which will include some optimization and integrating with block
> replication and COLO net proxy.
> 
> Please review, thanks.

Queued.
I've added the '(Since 2.9)' that Eric asked for.

Dave

> 
> 
> zhanghailiang (3):
>   COLO: fix setting checkpoint-delay not working properly
>   COLO: Shutdown related socket fd while do failover
>   COLO: Don't process failover request while loading VM's state
> 
>  include/migration/colo.h      |   2 +
>  include/migration/migration.h |   8 ++++
>  migration/colo.c              | 102 +++++++++++++++++++++++++++++++++++++-----
>  migration/migration.c         |   3 ++
>  qapi-schema.json              |   4 +-
>  5 files changed, 108 insertions(+), 11 deletions(-)
> 
> -- 
> 1.8.3.1
> 
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] COLO: Shutdown related socket fd while do failover
  2017-02-08 19:53       ` Dr. David Alan Gilbert
@ 2017-02-13  4:13         ` Hailiang Zhang
  0 siblings, 0 replies; 15+ messages in thread
From: Hailiang Zhang @ 2017-02-13  4:13 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: xuquan8, zhangchen.fnst, lizhijian, quintela, eddie.dong,
	qemu-devel, amit.shah

On 2017/2/9 3:53, Dr. David Alan Gilbert wrote:
> * Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
>> On 2017/1/18 19:01, Dr. David Alan Gilbert wrote:
>>> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>>>> If the net connection between primary host and secondary host breaks
>>>> while COLO/COLO incoming threads are doing read() or write().
>>>> It will block until connection is timeout, and the failover process
>>>> will be blocked because of it.
>>>>
>>>> So it is necessary to shutdown all the socket fds used by COLO
>>>> to avoid this situation. Besides, we should close the corresponding
>>>> file descriptors after failvoer BH shutdown them,
>>>> Or there will be an error.
>>>
>>> Hi,
>>>     There are two parts to this patch:
>>>      a) Add some semaphores to sequence failover
>>>      b) Use shutdown()
>>>
>>>     At first I wondered if perhaps they should be split; but I see
>>> the reason for the semaphores is mostly to stop the race between
>>> the fd's getting closed and the shutdown() calls; so I think it's
>>> OK.
>>>
>>
>> Hi,
>>
>> Yes, you are right, maybe i should add some comments about that.
>> Will do in next version.
>>
>>> Do you have any problems with these semaphores during powering off the
>>> guest?
>>>
>>
>> No, we didn't encounter any problems or trigger any bugs in our test
>> with this semaphores. In what places do you doubt it may has problems ? :)
>
> I just wondered about other exit cases other than failover; e.g. what
> if the guest shutdown or something like that, would it get stuck
> waiting for the colo_incoming_sem.
>

Sorry for the late reply, too busy with our project these days ...

Your worry makes sense, we have processed the shutdown case specially,
Let qemu does a checkpoint process (It seems that, we should kick colo
thread which might be waiting for colo_checkpoint_sem.) before exit colo therad.
And for the secondary sides, if it receives shutdown request, it will exit
colo incoming thread after some cleanup works.


Thanks.
Hailiang


> Dave
>
>> Thanks,
>> Hailiang
>>
>>> Dave
>>>
>>>
>>>
>>>
>>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>>>> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>>> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>>> ---
>>>>    include/migration/migration.h |  3 +++
>>>>    migration/colo.c              | 43 +++++++++++++++++++++++++++++++++++++++++++
>>>>    2 files changed, 46 insertions(+)
>>>>
>>>> diff --git a/include/migration/migration.h b/include/migration/migration.h
>>>> index 487ac1e..7cac877 100644
>>>> --- a/include/migration/migration.h
>>>> +++ b/include/migration/migration.h
>>>> @@ -113,6 +113,7 @@ struct MigrationIncomingState {
>>>>        QemuThread colo_incoming_thread;
>>>>        /* The coroutine we should enter (back) after failover */
>>>>        Coroutine *migration_incoming_co;
>>>> +    QemuSemaphore colo_incoming_sem;
>>>>
>>>>        /* See savevm.c */
>>>>        LoadStateEntry_Head loadvm_handlers;
>>>> @@ -182,6 +183,8 @@ struct MigrationState
>>>>        QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest) src_page_requests;
>>>>        /* The RAMBlock used in the last src_page_request */
>>>>        RAMBlock *last_req_rb;
>>>> +    /* The semaphore is used to notify COLO thread that failover is finished */
>>>> +    QemuSemaphore colo_exit_sem;
>>>>
>>>>        /* The semaphore is used to notify COLO thread to do checkpoint */
>>>>        QemuSemaphore colo_checkpoint_sem;
>>>> diff --git a/migration/colo.c b/migration/colo.c
>>>> index 08b2e46..3222812 100644
>>>> --- a/migration/colo.c
>>>> +++ b/migration/colo.c
>>>> @@ -59,6 +59,18 @@ static void secondary_vm_do_failover(void)
>>>>            /* recover runstate to normal migration finish state */
>>>>            autostart = true;
>>>>        }
>>>> +    /*
>>>> +     * Make sure COLO incoming thread not block in recv or send,
>>>> +     * If mis->from_src_file and mis->to_src_file use the same fd,
>>>> +     * The second shutdown() will return -1, we ignore this value,
>>>> +     * It is harmless.
>>>> +     */
>>>> +    if (mis->from_src_file) {
>>>> +        qemu_file_shutdown(mis->from_src_file);
>>>> +    }
>>>> +    if (mis->to_src_file) {
>>>> +        qemu_file_shutdown(mis->to_src_file);
>>>> +    }
>>>>
>>>>        old_state = failover_set_state(FAILOVER_STATUS_ACTIVE,
>>>>                                       FAILOVER_STATUS_COMPLETED);
>>>> @@ -67,6 +79,8 @@ static void secondary_vm_do_failover(void)
>>>>                         "secondary VM", FailoverStatus_lookup[old_state]);
>>>>            return;
>>>>        }
>>>> +    /* Notify COLO incoming thread that failover work is finished */
>>>> +    qemu_sem_post(&mis->colo_incoming_sem);
>>>>        /* For Secondary VM, jump to incoming co */
>>>>        if (mis->migration_incoming_co) {
>>>>            qemu_coroutine_enter(mis->migration_incoming_co);
>>>> @@ -81,6 +95,18 @@ static void primary_vm_do_failover(void)
>>>>        migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
>>>>                          MIGRATION_STATUS_COMPLETED);
>>>>
>>>> +    /*
>>>> +     * Wake up COLO thread which may blocked in recv() or send(),
>>>> +     * The s->rp_state.from_dst_file and s->to_dst_file may use the
>>>> +     * same fd, but we still shutdown the fd for twice, it is harmless.
>>>> +     */
>>>> +    if (s->to_dst_file) {
>>>> +        qemu_file_shutdown(s->to_dst_file);
>>>> +    }
>>>> +    if (s->rp_state.from_dst_file) {
>>>> +        qemu_file_shutdown(s->rp_state.from_dst_file);
>>>> +    }
>>>> +
>>>>        old_state = failover_set_state(FAILOVER_STATUS_ACTIVE,
>>>>                                       FAILOVER_STATUS_COMPLETED);
>>>>        if (old_state != FAILOVER_STATUS_ACTIVE) {
>>>> @@ -88,6 +114,8 @@ static void primary_vm_do_failover(void)
>>>>                         FailoverStatus_lookup[old_state]);
>>>>            return;
>>>>        }
>>>> +    /* Notify COLO thread that failover work is finished */
>>>> +    qemu_sem_post(&s->colo_exit_sem);
>>>>    }
>>>>
>>>>    void colo_do_failover(MigrationState *s)
>>>> @@ -361,6 +389,14 @@ out:
>>>>
>>>>        timer_del(s->colo_delay_timer);
>>>>
>>>> +    /* Hope this not to be too long to wait here */
>>>> +    qemu_sem_wait(&s->colo_exit_sem);
>>>> +    qemu_sem_destroy(&s->colo_exit_sem);
>>>> +    /*
>>>> +     * Must be called after failover BH is completed,
>>>> +     * Or the failover BH may shutdown the wrong fd that
>>>> +     * re-used by other threads after we release here.
>>>> +     */
>>>>        if (s->rp_state.from_dst_file) {
>>>>            qemu_fclose(s->rp_state.from_dst_file);
>>>>        }
>>>> @@ -385,6 +421,7 @@ void migrate_start_colo_process(MigrationState *s)
>>>>        s->colo_delay_timer =  timer_new_ms(QEMU_CLOCK_HOST,
>>>>                                    colo_checkpoint_notify, s);
>>>>
>>>> +    qemu_sem_init(&s->colo_exit_sem, 0);
>>>>        migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
>>>>                          MIGRATION_STATUS_COLO);
>>>>        colo_process_checkpoint(s);
>>>> @@ -423,6 +460,8 @@ void *colo_process_incoming_thread(void *opaque)
>>>>        uint64_t value;
>>>>        Error *local_err = NULL;
>>>>
>>>> +    qemu_sem_init(&mis->colo_incoming_sem, 0);
>>>> +
>>>>        migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
>>>>                          MIGRATION_STATUS_COLO);
>>>>
>>>> @@ -533,6 +572,10 @@ out:
>>>>            qemu_fclose(fb);
>>>>        }
>>>>
>>>> +    /* Hope this not to be too long to loop here */
>>>> +    qemu_sem_wait(&mis->colo_incoming_sem);
>>>> +    qemu_sem_destroy(&mis->colo_incoming_sem);
>>>> +    /* Must be called after failover BH is completed */
>>>>        if (mis->to_src_file) {
>>>>            qemu_fclose(mis->to_src_file);
>>>>        }
>>>> --
>>>> 1.8.3.1
>>>>
>>>>
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>
>>> .
>>>
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] COLO: fix some bugs
  2017-02-10 15:44 ` [Qemu-devel] [PATCH 0/3] COLO: fix some bugs Dr. David Alan Gilbert
@ 2017-02-13  8:46   ` Hailiang Zhang
  2017-02-13 10:17     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 15+ messages in thread
From: Hailiang Zhang @ 2017-02-13  8:46 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: xuquan8, qemu-devel, quintela, eddie.dong, lizhijian, zhangchen.fnst

On 2017/2/10 23:44, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> Hi,
>>
>> This series fix three bugs of COLO.
>> patch 1 fix one usage case which users want to change checkpoint-delay
>> with an extream big value set before, the new value may not take effect
>> until finishing a long time of sleep.
>>
>> Patch 2 and 3 are old patches that split from previous version
>> long time ago, and has been reviewed by Dave.
>>
>> I'd like to pick these three patches from the later COLO series,
>> which will include some optimization and integrating with block
>> replication and COLO net proxy.
>>
>> Please review, thanks.
>
> Queued.
> I've added the '(Since 2.9)' that Eric asked for.
>

Thank you very much.
By the way, as we discussed in patch 1, should i add the codes in patch 1
which kick the colo_checkpoint_sem to quick the colo thread exiting ?
In next version ? Or post a new patch ?

Hailiang


> Dave
>
>>
>>
>> zhanghailiang (3):
>>    COLO: fix setting checkpoint-delay not working properly
>>    COLO: Shutdown related socket fd while do failover
>>    COLO: Don't process failover request while loading VM's state
>>
>>   include/migration/colo.h      |   2 +
>>   include/migration/migration.h |   8 ++++
>>   migration/colo.c              | 102 +++++++++++++++++++++++++++++++++++++-----
>>   migration/migration.c         |   3 ++
>>   qapi-schema.json              |   4 +-
>>   5 files changed, 108 insertions(+), 11 deletions(-)
>>
>> --
>> 1.8.3.1
>>
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] COLO: fix some bugs
  2017-02-13  8:46   ` Hailiang Zhang
@ 2017-02-13 10:17     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 15+ messages in thread
From: Dr. David Alan Gilbert @ 2017-02-13 10:17 UTC (permalink / raw)
  To: Hailiang Zhang
  Cc: xuquan8, qemu-devel, quintela, eddie.dong, lizhijian, zhangchen.fnst

* Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
> On 2017/2/10 23:44, Dr. David Alan Gilbert wrote:
> > * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> > > Hi,
> > > 
> > > This series fix three bugs of COLO.
> > > patch 1 fix one usage case which users want to change checkpoint-delay
> > > with an extream big value set before, the new value may not take effect
> > > until finishing a long time of sleep.
> > > 
> > > Patch 2 and 3 are old patches that split from previous version
> > > long time ago, and has been reviewed by Dave.
> > > 
> > > I'd like to pick these three patches from the later COLO series,
> > > which will include some optimization and integrating with block
> > > replication and COLO net proxy.
> > > 
> > > Please review, thanks.
> > 
> > Queued.
> > I've added the '(Since 2.9)' that Eric asked for.
> > 
> 
> Thank you very much.
> By the way, as we discussed in patch 1, should i add the codes in patch 1
> which kick the colo_checkpoint_sem to quick the colo thread exiting ?
> In next version ? Or post a new patch ?

Post a new patch sometime.

Dave

> Hailiang
> 
> 
> > Dave
> > 
> > > 
> > > 
> > > zhanghailiang (3):
> > >    COLO: fix setting checkpoint-delay not working properly
> > >    COLO: Shutdown related socket fd while do failover
> > >    COLO: Don't process failover request while loading VM's state
> > > 
> > >   include/migration/colo.h      |   2 +
> > >   include/migration/migration.h |   8 ++++
> > >   migration/colo.c              | 102 +++++++++++++++++++++++++++++++++++++-----
> > >   migration/migration.c         |   3 ++
> > >   qapi-schema.json              |   4 +-
> > >   5 files changed, 108 insertions(+), 11 deletions(-)
> > > 
> > > --
> > > 1.8.3.1
> > > 
> > > 
> > > 
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > 
> > .
> > 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-02-13 10:18 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-17 12:57 [Qemu-devel] [PATCH 0/3] COLO: fix some bugs zhanghailiang
2017-01-17 12:57 ` [Qemu-devel] [PATCH 1/3] COLO: fix setting checkpoint-delay not working properly zhanghailiang
2017-02-08 10:38   ` Dr. David Alan Gilbert
2017-02-08 11:18     ` Hailiang Zhang
2017-01-17 12:57 ` [Qemu-devel] [PATCH 2/3] COLO: Shutdown related socket fd while do failover zhanghailiang
2017-01-18 11:01   ` Dr. David Alan Gilbert
2017-02-08 11:14     ` Hailiang Zhang
2017-02-08 19:53       ` Dr. David Alan Gilbert
2017-02-13  4:13         ` Hailiang Zhang
2017-01-17 12:57 ` [Qemu-devel] [PATCH 3/3] COLO: Don't process failover request while loading VM's state zhanghailiang
2017-01-17 18:24   ` Eric Blake
2017-01-18  8:19     ` Hailiang Zhang
2017-02-10 15:44 ` [Qemu-devel] [PATCH 0/3] COLO: fix some bugs Dr. David Alan Gilbert
2017-02-13  8:46   ` Hailiang Zhang
2017-02-13 10:17     ` Dr. David Alan Gilbert

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.