kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] migration: fix migrate_cancel problems of multifd
@ 2019-07-24  9:55 Juan Quintela
  2019-07-24  9:55 ` [PATCH 1/4] migration: fix migrate_cancel leads live_migration thread endless loop Juan Quintela
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Juan Quintela @ 2019-07-24  9:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Juan Quintela, Dr. David Alan Gilbert,
	Laurent Vivier, kvm, Thomas Huth, Richard Henderson

- Just simplify patch 2 from Ivan
- Add patch 3 to cover everything.

Please review.

My plan is send the three of them for the update

Ivan Ren (3):
  migration: fix migrate_cancel leads live_migration thread endless loop
  migration: fix migrate_cancel leads live_migration thread hung forever
  migration: fix migrate_cancel multifd migration leads destination hung
    forever

Juan Quintela (1):
  migration: Make explicit that we are quitting multifd

 migration/ram.c | 66 ++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 57 insertions(+), 9 deletions(-)

-- 
2.21.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/4] migration: fix migrate_cancel leads live_migration thread endless loop
  2019-07-24  9:55 [PATCH 0/4] migration: fix migrate_cancel problems of multifd Juan Quintela
@ 2019-07-24  9:55 ` Juan Quintela
  2019-07-24 10:57   ` Dr. David Alan Gilbert
  2019-07-24  9:55 ` [PATCH 2/4] migration: fix migrate_cancel leads live_migration thread hung forever Juan Quintela
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Juan Quintela @ 2019-07-24  9:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Juan Quintela, Dr. David Alan Gilbert,
	Laurent Vivier, kvm, Thomas Huth, Richard Henderson, Ivan Ren,
	Ivan Ren

From: Ivan Ren <renyime@gmail.com>

When we 'migrate_cancel' a multifd migration, live_migration thread may
go into endless loop in multifd_send_pages functions.

Reproduce steps:

(qemu) migrate_set_capability multifd on
(qemu) migrate -d url
(qemu) [wait a while]
(qemu) migrate_cancel

Then may get live_migration 100% cpu usage in following stack:

pthread_mutex_lock
qemu_mutex_lock_impl
multifd_send_pages
multifd_queue_page
ram_save_multifd_page
ram_save_target_page
ram_save_host_page
ram_find_and_save_block
ram_find_and_save_block
ram_save_iterate
qemu_savevm_state_iterate
migration_iteration_run
migration_thread
qemu_thread_start
start_thread
clone

Signed-off-by: Ivan Ren <ivanren@tencent.com>
Message-Id: <1561468699-9819-2-git-send-email-ivanren@tencent.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/ram.c | 36 +++++++++++++++++++++++++++++-------
 1 file changed, 29 insertions(+), 7 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 2b0774c2bf..52a2d498e4 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -920,7 +920,7 @@ struct {
  * false.
  */
 
-static void multifd_send_pages(void)
+static int multifd_send_pages(void)
 {
     int i;
     static int next_channel;
@@ -933,6 +933,11 @@ static void multifd_send_pages(void)
         p = &multifd_send_state->params[i];
 
         qemu_mutex_lock(&p->mutex);
+        if (p->quit) {
+            error_report("%s: channel %d has already quit!", __func__, i);
+            qemu_mutex_unlock(&p->mutex);
+            return -1;
+        }
         if (!p->pending_job) {
             p->pending_job++;
             next_channel = (i + 1) % migrate_multifd_channels();
@@ -951,9 +956,11 @@ static void multifd_send_pages(void)
     ram_counters.transferred += transferred;;
     qemu_mutex_unlock(&p->mutex);
     qemu_sem_post(&p->sem);
+
+    return 1;
 }
 
-static void multifd_queue_page(RAMBlock *block, ram_addr_t offset)
+static int multifd_queue_page(RAMBlock *block, ram_addr_t offset)
 {
     MultiFDPages_t *pages = multifd_send_state->pages;
 
@@ -968,15 +975,19 @@ static void multifd_queue_page(RAMBlock *block, ram_addr_t offset)
         pages->used++;
 
         if (pages->used < pages->allocated) {
-            return;
+            return 1;
         }
     }
 
-    multifd_send_pages();
+    if (multifd_send_pages() < 0) {
+        return -1;
+    }
 
     if (pages->block != block) {
-        multifd_queue_page(block, offset);
+        return  multifd_queue_page(block, offset);
     }
+
+    return 1;
 }
 
 static void multifd_send_terminate_threads(Error *err)
@@ -1049,7 +1060,10 @@ static void multifd_send_sync_main(void)
         return;
     }
     if (multifd_send_state->pages->used) {
-        multifd_send_pages();
+        if (multifd_send_pages() < 0) {
+            error_report("%s: multifd_send_pages fail", __func__);
+            return;
+        }
     }
     for (i = 0; i < migrate_multifd_channels(); i++) {
         MultiFDSendParams *p = &multifd_send_state->params[i];
@@ -1058,6 +1072,12 @@ static void multifd_send_sync_main(void)
 
         qemu_mutex_lock(&p->mutex);
 
+        if (p->quit) {
+            error_report("%s: channel %d has already quit", __func__, i);
+            qemu_mutex_unlock(&p->mutex);
+            return;
+        }
+
         p->packet_num = multifd_send_state->packet_num++;
         p->flags |= MULTIFD_FLAG_SYNC;
         p->pending_job++;
@@ -2033,7 +2053,9 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss, bool last_stage)
 static int ram_save_multifd_page(RAMState *rs, RAMBlock *block,
                                  ram_addr_t offset)
 {
-    multifd_queue_page(block, offset);
+    if (multifd_queue_page(block, offset) < 0) {
+        return -1;
+    }
     ram_counters.normal++;
 
     return 1;
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/4] migration: fix migrate_cancel leads live_migration thread hung forever
  2019-07-24  9:55 [PATCH 0/4] migration: fix migrate_cancel problems of multifd Juan Quintela
  2019-07-24  9:55 ` [PATCH 1/4] migration: fix migrate_cancel leads live_migration thread endless loop Juan Quintela
@ 2019-07-24  9:55 ` Juan Quintela
  2019-07-24 10:58   ` Dr. David Alan Gilbert
  2019-07-24  9:55 ` [PATCH 3/4] migration: Make explicit that we are quitting multifd Juan Quintela
  2019-07-24  9:55 ` [PATCH 4/4] migration: fix migrate_cancel multifd migration leads destination hung forever Juan Quintela
  3 siblings, 1 reply; 9+ messages in thread
From: Juan Quintela @ 2019-07-24  9:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Juan Quintela, Dr. David Alan Gilbert,
	Laurent Vivier, kvm, Thomas Huth, Richard Henderson, Ivan Ren,
	Ivan Ren

From: Ivan Ren <renyime@gmail.com>

When we 'migrate_cancel' a multifd migration, live_migration thread may
hung forever at some points, because of multifd_send_thread has already
exit for socket error:
1. multifd_send_pages may hung at qemu_sem_wait(&multifd_send_state->
   channels_ready)
2. multifd_send_sync_main my hung at qemu_sem_wait(&multifd_send_state->
   sem_sync)

Signed-off-by: Ivan Ren <ivanren@tencent.com>
Message-Id: <1561468699-9819-3-git-send-email-ivanren@tencent.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

---

Remove spurious not needed bits
---
 migration/ram.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 52a2d498e4..87bb7da8e2 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1097,7 +1097,8 @@ static void *multifd_send_thread(void *opaque)
 {
     MultiFDSendParams *p = opaque;
     Error *local_err = NULL;
-    int ret;
+    int ret = 0;
+    uint32_t flags = 0;
 
     trace_multifd_send_thread_start(p->id);
     rcu_register_thread();
@@ -1115,7 +1116,7 @@ static void *multifd_send_thread(void *opaque)
         if (p->pending_job) {
             uint32_t used = p->pages->used;
             uint64_t packet_num = p->packet_num;
-            uint32_t flags = p->flags;
+            flags = p->flags;
 
             p->next_packet_size = used * qemu_target_page_size();
             multifd_send_fill_packet(p);
@@ -1164,6 +1165,17 @@ out:
         multifd_send_terminate_threads(local_err);
     }
 
+    /*
+     * Error happen, I will exit, but I can't just leave, tell
+     * who pay attention to me.
+     */
+    if (ret != 0) {
+        if (flags & MULTIFD_FLAG_SYNC) {
+            qemu_sem_post(&multifd_send_state->sem_sync);
+        }
+        qemu_sem_post(&multifd_send_state->channels_ready);
+    }
+
     qemu_mutex_lock(&p->mutex);
     p->running = false;
     qemu_mutex_unlock(&p->mutex);
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 3/4] migration: Make explicit that we are quitting multifd
  2019-07-24  9:55 [PATCH 0/4] migration: fix migrate_cancel problems of multifd Juan Quintela
  2019-07-24  9:55 ` [PATCH 1/4] migration: fix migrate_cancel leads live_migration thread endless loop Juan Quintela
  2019-07-24  9:55 ` [PATCH 2/4] migration: fix migrate_cancel leads live_migration thread hung forever Juan Quintela
@ 2019-07-24  9:55 ` Juan Quintela
  2019-07-24 11:01   ` Dr. David Alan Gilbert
  2019-07-24  9:55 ` [PATCH 4/4] migration: fix migrate_cancel multifd migration leads destination hung forever Juan Quintela
  3 siblings, 1 reply; 9+ messages in thread
From: Juan Quintela @ 2019-07-24  9:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Juan Quintela, Dr. David Alan Gilbert,
	Laurent Vivier, kvm, Thomas Huth, Richard Henderson

We add a bool to indicate that.

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/ram.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 87bb7da8e2..eb6716710e 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -677,6 +677,8 @@ typedef struct {
     QemuMutex mutex;
     /* is this channel thread running */
     bool running;
+    /* should this thread finish */
+    bool quit;
     /* array of pages to receive */
     MultiFDPages_t *pages;
     /* packet allocated len */
@@ -1266,6 +1268,7 @@ static void multifd_recv_terminate_threads(Error *err)
         MultiFDRecvParams *p = &multifd_recv_state->params[i];
 
         qemu_mutex_lock(&p->mutex);
+        p->quit = true;
         /* We could arrive here for two reasons:
            - normal quit, i.e. everything went fine, just finished
            - error quit: We close the channels so the channel threads
@@ -1288,6 +1291,7 @@ int multifd_load_cleanup(Error **errp)
         MultiFDRecvParams *p = &multifd_recv_state->params[i];
 
         if (p->running) {
+            p->quit = true;
             qemu_thread_join(&p->thread);
         }
         object_unref(OBJECT(p->c));
@@ -1351,6 +1355,10 @@ static void *multifd_recv_thread(void *opaque)
         uint32_t used;
         uint32_t flags;
 
+        if (p->quit) {
+            break;
+        }
+
         ret = qio_channel_read_all_eof(p->c, (void *)p->packet,
                                        p->packet_len, &local_err);
         if (ret == 0) {   /* EOF */
@@ -1422,6 +1430,7 @@ int multifd_load_setup(void)
 
         qemu_mutex_init(&p->mutex);
         qemu_sem_init(&p->sem_sync, 0);
+        p->quit = false;
         p->id = i;
         p->pages = multifd_pages_init(page_count);
         p->packet_len = sizeof(MultiFDPacket_t)
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 4/4] migration: fix migrate_cancel multifd migration leads destination hung forever
  2019-07-24  9:55 [PATCH 0/4] migration: fix migrate_cancel problems of multifd Juan Quintela
                   ` (2 preceding siblings ...)
  2019-07-24  9:55 ` [PATCH 3/4] migration: Make explicit that we are quitting multifd Juan Quintela
@ 2019-07-24  9:55 ` Juan Quintela
  2019-07-24 11:03   ` Dr. David Alan Gilbert
  3 siblings, 1 reply; 9+ messages in thread
From: Juan Quintela @ 2019-07-24  9:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Juan Quintela, Dr. David Alan Gilbert,
	Laurent Vivier, kvm, Thomas Huth, Richard Henderson, Ivan Ren,
	Ivan Ren

From: Ivan Ren <renyime@gmail.com>

When migrate_cancel a multifd migration, if run sequence like this:

        [source]                              [destination]

multifd_send_sync_main[finish]
                                    multifd_recv_thread wait &p->sem_sync
shutdown to_dst_file
                                    detect error from_src_file
send  RAM_SAVE_FLAG_EOS[fail]       [no chance to run multifd_recv_sync_main]
                                    multifd_load_cleanup
                                    join multifd receive thread forever

will lead destination qemu hung at following stack:

pthread_join
qemu_thread_join
multifd_load_cleanup
process_incoming_migration_co
coroutine_trampoline

Signed-off-by: Ivan Ren <ivanren@tencent.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <1561468699-9819-4-git-send-email-ivanren@tencent.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/ram.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index eb6716710e..889148dd84 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1292,6 +1292,11 @@ int multifd_load_cleanup(Error **errp)
 
         if (p->running) {
             p->quit = true;
+            /*
+             * multifd_recv_thread may hung at MULTIFD_FLAG_SYNC handle code,
+             * however try to wakeup it without harm in cleanup phase.
+             */
+            qemu_sem_post(&p->sem_sync);
             qemu_thread_join(&p->thread);
         }
         object_unref(OBJECT(p->c));
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/4] migration: fix migrate_cancel leads live_migration thread endless loop
  2019-07-24  9:55 ` [PATCH 1/4] migration: fix migrate_cancel leads live_migration thread endless loop Juan Quintela
@ 2019-07-24 10:57   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 9+ messages in thread
From: Dr. David Alan Gilbert @ 2019-07-24 10:57 UTC (permalink / raw)
  To: Juan Quintela
  Cc: qemu-devel, Paolo Bonzini, Laurent Vivier, kvm, Thomas Huth,
	Richard Henderson, Ivan Ren, Ivan Ren

* Juan Quintela (quintela@redhat.com) wrote:
> From: Ivan Ren <renyime@gmail.com>
> 
> When we 'migrate_cancel' a multifd migration, live_migration thread may
> go into endless loop in multifd_send_pages functions.
> 
> Reproduce steps:
> 
> (qemu) migrate_set_capability multifd on
> (qemu) migrate -d url
> (qemu) [wait a while]
> (qemu) migrate_cancel
> 
> Then may get live_migration 100% cpu usage in following stack:
> 
> pthread_mutex_lock
> qemu_mutex_lock_impl
> multifd_send_pages
> multifd_queue_page
> ram_save_multifd_page
> ram_save_target_page
> ram_save_host_page
> ram_find_and_save_block
> ram_find_and_save_block
> ram_save_iterate
> qemu_savevm_state_iterate
> migration_iteration_run
> migration_thread
> qemu_thread_start
> start_thread
> clone
> 
> Signed-off-by: Ivan Ren <ivanren@tencent.com>
> Message-Id: <1561468699-9819-2-git-send-email-ivanren@tencent.com>
> Reviewed-by: Juan Quintela <quintela@redhat.com>
> Signed-off-by: Juan Quintela <quintela@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/ram.c | 36 +++++++++++++++++++++++++++++-------
>  1 file changed, 29 insertions(+), 7 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 2b0774c2bf..52a2d498e4 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -920,7 +920,7 @@ struct {
>   * false.
>   */
>  
> -static void multifd_send_pages(void)
> +static int multifd_send_pages(void)
>  {
>      int i;
>      static int next_channel;
> @@ -933,6 +933,11 @@ static void multifd_send_pages(void)
>          p = &multifd_send_state->params[i];
>  
>          qemu_mutex_lock(&p->mutex);
> +        if (p->quit) {
> +            error_report("%s: channel %d has already quit!", __func__, i);
> +            qemu_mutex_unlock(&p->mutex);
> +            return -1;
> +        }
>          if (!p->pending_job) {
>              p->pending_job++;
>              next_channel = (i + 1) % migrate_multifd_channels();
> @@ -951,9 +956,11 @@ static void multifd_send_pages(void)
>      ram_counters.transferred += transferred;;
>      qemu_mutex_unlock(&p->mutex);
>      qemu_sem_post(&p->sem);
> +
> +    return 1;
>  }
>  
> -static void multifd_queue_page(RAMBlock *block, ram_addr_t offset)
> +static int multifd_queue_page(RAMBlock *block, ram_addr_t offset)
>  {
>      MultiFDPages_t *pages = multifd_send_state->pages;
>  
> @@ -968,15 +975,19 @@ static void multifd_queue_page(RAMBlock *block, ram_addr_t offset)
>          pages->used++;
>  
>          if (pages->used < pages->allocated) {
> -            return;
> +            return 1;
>          }
>      }
>  
> -    multifd_send_pages();
> +    if (multifd_send_pages() < 0) {
> +        return -1;
> +    }
>  
>      if (pages->block != block) {
> -        multifd_queue_page(block, offset);
> +        return  multifd_queue_page(block, offset);
>      }
> +
> +    return 1;
>  }
>  
>  static void multifd_send_terminate_threads(Error *err)
> @@ -1049,7 +1060,10 @@ static void multifd_send_sync_main(void)
>          return;
>      }
>      if (multifd_send_state->pages->used) {
> -        multifd_send_pages();
> +        if (multifd_send_pages() < 0) {
> +            error_report("%s: multifd_send_pages fail", __func__);
> +            return;
> +        }
>      }
>      for (i = 0; i < migrate_multifd_channels(); i++) {
>          MultiFDSendParams *p = &multifd_send_state->params[i];
> @@ -1058,6 +1072,12 @@ static void multifd_send_sync_main(void)
>  
>          qemu_mutex_lock(&p->mutex);
>  
> +        if (p->quit) {
> +            error_report("%s: channel %d has already quit", __func__, i);
> +            qemu_mutex_unlock(&p->mutex);
> +            return;
> +        }
> +
>          p->packet_num = multifd_send_state->packet_num++;
>          p->flags |= MULTIFD_FLAG_SYNC;
>          p->pending_job++;
> @@ -2033,7 +2053,9 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss, bool last_stage)
>  static int ram_save_multifd_page(RAMState *rs, RAMBlock *block,
>                                   ram_addr_t offset)
>  {
> -    multifd_queue_page(block, offset);
> +    if (multifd_queue_page(block, offset) < 0) {
> +        return -1;
> +    }
>      ram_counters.normal++;
>  
>      return 1;
> -- 
> 2.21.0
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/4] migration: fix migrate_cancel leads live_migration thread hung forever
  2019-07-24  9:55 ` [PATCH 2/4] migration: fix migrate_cancel leads live_migration thread hung forever Juan Quintela
@ 2019-07-24 10:58   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 9+ messages in thread
From: Dr. David Alan Gilbert @ 2019-07-24 10:58 UTC (permalink / raw)
  To: Juan Quintela
  Cc: qemu-devel, Paolo Bonzini, Laurent Vivier, kvm, Thomas Huth,
	Richard Henderson, Ivan Ren, Ivan Ren

* Juan Quintela (quintela@redhat.com) wrote:
> From: Ivan Ren <renyime@gmail.com>
> 
> When we 'migrate_cancel' a multifd migration, live_migration thread may
> hung forever at some points, because of multifd_send_thread has already
> exit for socket error:
> 1. multifd_send_pages may hung at qemu_sem_wait(&multifd_send_state->
>    channels_ready)
> 2. multifd_send_sync_main my hung at qemu_sem_wait(&multifd_send_state->
>    sem_sync)
> 
> Signed-off-by: Ivan Ren <ivanren@tencent.com>
> Message-Id: <1561468699-9819-3-git-send-email-ivanren@tencent.com>
> Reviewed-by: Juan Quintela <quintela@redhat.com>
> Signed-off-by: Juan Quintela <quintela@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> 
> ---
> 
> Remove spurious not needed bits
> ---
>  migration/ram.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 52a2d498e4..87bb7da8e2 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1097,7 +1097,8 @@ static void *multifd_send_thread(void *opaque)
>  {
>      MultiFDSendParams *p = opaque;
>      Error *local_err = NULL;
> -    int ret;
> +    int ret = 0;
> +    uint32_t flags = 0;
>  
>      trace_multifd_send_thread_start(p->id);
>      rcu_register_thread();
> @@ -1115,7 +1116,7 @@ static void *multifd_send_thread(void *opaque)
>          if (p->pending_job) {
>              uint32_t used = p->pages->used;
>              uint64_t packet_num = p->packet_num;
> -            uint32_t flags = p->flags;
> +            flags = p->flags;
>  
>              p->next_packet_size = used * qemu_target_page_size();
>              multifd_send_fill_packet(p);
> @@ -1164,6 +1165,17 @@ out:
>          multifd_send_terminate_threads(local_err);
>      }
>  
> +    /*
> +     * Error happen, I will exit, but I can't just leave, tell
> +     * who pay attention to me.
> +     */
> +    if (ret != 0) {
> +        if (flags & MULTIFD_FLAG_SYNC) {
> +            qemu_sem_post(&multifd_send_state->sem_sync);
> +        }
> +        qemu_sem_post(&multifd_send_state->channels_ready);
> +    }
> +
>      qemu_mutex_lock(&p->mutex);
>      p->running = false;
>      qemu_mutex_unlock(&p->mutex);
> -- 
> 2.21.0
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 3/4] migration: Make explicit that we are quitting multifd
  2019-07-24  9:55 ` [PATCH 3/4] migration: Make explicit that we are quitting multifd Juan Quintela
@ 2019-07-24 11:01   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 9+ messages in thread
From: Dr. David Alan Gilbert @ 2019-07-24 11:01 UTC (permalink / raw)
  To: Juan Quintela
  Cc: qemu-devel, Paolo Bonzini, Laurent Vivier, kvm, Thomas Huth,
	Richard Henderson

* Juan Quintela (quintela@redhat.com) wrote:
> We add a bool to indicate that.
> 
> Signed-off-by: Juan Quintela <quintela@redhat.com>

OK, similar to send.


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/ram.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 87bb7da8e2..eb6716710e 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -677,6 +677,8 @@ typedef struct {
>      QemuMutex mutex;
>      /* is this channel thread running */
>      bool running;
> +    /* should this thread finish */
> +    bool quit;
>      /* array of pages to receive */
>      MultiFDPages_t *pages;
>      /* packet allocated len */
> @@ -1266,6 +1268,7 @@ static void multifd_recv_terminate_threads(Error *err)
>          MultiFDRecvParams *p = &multifd_recv_state->params[i];
>  
>          qemu_mutex_lock(&p->mutex);
> +        p->quit = true;
>          /* We could arrive here for two reasons:
>             - normal quit, i.e. everything went fine, just finished
>             - error quit: We close the channels so the channel threads
> @@ -1288,6 +1291,7 @@ int multifd_load_cleanup(Error **errp)
>          MultiFDRecvParams *p = &multifd_recv_state->params[i];
>  
>          if (p->running) {
> +            p->quit = true;
>              qemu_thread_join(&p->thread);
>          }
>          object_unref(OBJECT(p->c));
> @@ -1351,6 +1355,10 @@ static void *multifd_recv_thread(void *opaque)
>          uint32_t used;
>          uint32_t flags;
>  
> +        if (p->quit) {
> +            break;
> +        }
> +
>          ret = qio_channel_read_all_eof(p->c, (void *)p->packet,
>                                         p->packet_len, &local_err);
>          if (ret == 0) {   /* EOF */
> @@ -1422,6 +1430,7 @@ int multifd_load_setup(void)
>  
>          qemu_mutex_init(&p->mutex);
>          qemu_sem_init(&p->sem_sync, 0);
> +        p->quit = false;
>          p->id = i;
>          p->pages = multifd_pages_init(page_count);
>          p->packet_len = sizeof(MultiFDPacket_t)
> -- 
> 2.21.0
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 4/4] migration: fix migrate_cancel multifd migration leads destination hung forever
  2019-07-24  9:55 ` [PATCH 4/4] migration: fix migrate_cancel multifd migration leads destination hung forever Juan Quintela
@ 2019-07-24 11:03   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 9+ messages in thread
From: Dr. David Alan Gilbert @ 2019-07-24 11:03 UTC (permalink / raw)
  To: Juan Quintela
  Cc: qemu-devel, Paolo Bonzini, Laurent Vivier, kvm, Thomas Huth,
	Richard Henderson, Ivan Ren, Ivan Ren

* Juan Quintela (quintela@redhat.com) wrote:
> From: Ivan Ren <renyime@gmail.com>
> 
> When migrate_cancel a multifd migration, if run sequence like this:
> 
>         [source]                              [destination]
> 
> multifd_send_sync_main[finish]
>                                     multifd_recv_thread wait &p->sem_sync
> shutdown to_dst_file
>                                     detect error from_src_file
> send  RAM_SAVE_FLAG_EOS[fail]       [no chance to run multifd_recv_sync_main]
>                                     multifd_load_cleanup
>                                     join multifd receive thread forever
> 
> will lead destination qemu hung at following stack:
> 
> pthread_join
> qemu_thread_join
> multifd_load_cleanup
> process_incoming_migration_co
> coroutine_trampoline
> 
> Signed-off-by: Ivan Ren <ivanren@tencent.com>
> Reviewed-by: Juan Quintela <quintela@redhat.com>
> Message-Id: <1561468699-9819-4-git-send-email-ivanren@tencent.com>
> Signed-off-by: Juan Quintela <quintela@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/ram.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index eb6716710e..889148dd84 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1292,6 +1292,11 @@ int multifd_load_cleanup(Error **errp)
>  
>          if (p->running) {
>              p->quit = true;
> +            /*
> +             * multifd_recv_thread may hung at MULTIFD_FLAG_SYNC handle code,
> +             * however try to wakeup it without harm in cleanup phase.
> +             */
> +            qemu_sem_post(&p->sem_sync);
>              qemu_thread_join(&p->thread);
>          }
>          object_unref(OBJECT(p->c));
> -- 
> 2.21.0
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-07-24 11:03 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-24  9:55 [PATCH 0/4] migration: fix migrate_cancel problems of multifd Juan Quintela
2019-07-24  9:55 ` [PATCH 1/4] migration: fix migrate_cancel leads live_migration thread endless loop Juan Quintela
2019-07-24 10:57   ` Dr. David Alan Gilbert
2019-07-24  9:55 ` [PATCH 2/4] migration: fix migrate_cancel leads live_migration thread hung forever Juan Quintela
2019-07-24 10:58   ` Dr. David Alan Gilbert
2019-07-24  9:55 ` [PATCH 3/4] migration: Make explicit that we are quitting multifd Juan Quintela
2019-07-24 11:01   ` Dr. David Alan Gilbert
2019-07-24  9:55 ` [PATCH 4/4] migration: fix migrate_cancel multifd migration leads destination hung forever Juan Quintela
2019-07-24 11:03   ` Dr. David Alan Gilbert

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).