* [Qemu-devel] [PATCH 0/4] migration: fix migrate_cancel problems of multifd
@ 2019-07-24 9:55 Juan Quintela
2019-07-24 9:55 ` [Qemu-devel] [PATCH 1/4] migration: fix migrate_cancel leads live_migration thread endless loop Juan Quintela
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: Juan Quintela @ 2019-07-24 9:55 UTC (permalink / raw)
To: qemu-devel
Cc: Laurent Vivier, Thomas Huth, kvm, Juan Quintela,
Dr. David Alan Gilbert, Paolo Bonzini, Richard Henderson
- Just simplify patch 2 from Ivan
- Add patch 3 to cover everything.
Please review.
My plan is send the three of them for the update
Ivan Ren (3):
migration: fix migrate_cancel leads live_migration thread endless loop
migration: fix migrate_cancel leads live_migration thread hung forever
migration: fix migrate_cancel multifd migration leads destination hung
forever
Juan Quintela (1):
migration: Make explicit that we are quitting multifd
migration/ram.c | 66 ++++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 57 insertions(+), 9 deletions(-)
--
2.21.0
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Qemu-devel] [PATCH 1/4] migration: fix migrate_cancel leads live_migration thread endless loop
2019-07-24 9:55 [Qemu-devel] [PATCH 0/4] migration: fix migrate_cancel problems of multifd Juan Quintela
@ 2019-07-24 9:55 ` Juan Quintela
2019-07-24 10:57 ` Dr. David Alan Gilbert
2019-07-24 9:55 ` [Qemu-devel] [PATCH 2/4] migration: fix migrate_cancel leads live_migration thread hung forever Juan Quintela
` (2 subsequent siblings)
3 siblings, 1 reply; 9+ messages in thread
From: Juan Quintela @ 2019-07-24 9:55 UTC (permalink / raw)
To: qemu-devel
Cc: Laurent Vivier, Thomas Huth, kvm, Juan Quintela,
Dr. David Alan Gilbert, Ivan Ren, Paolo Bonzini, Ivan Ren,
Richard Henderson
From: Ivan Ren <renyime@gmail.com>
When we 'migrate_cancel' a multifd migration, live_migration thread may
go into endless loop in multifd_send_pages functions.
Reproduce steps:
(qemu) migrate_set_capability multifd on
(qemu) migrate -d url
(qemu) [wait a while]
(qemu) migrate_cancel
Then may get live_migration 100% cpu usage in following stack:
pthread_mutex_lock
qemu_mutex_lock_impl
multifd_send_pages
multifd_queue_page
ram_save_multifd_page
ram_save_target_page
ram_save_host_page
ram_find_and_save_block
ram_find_and_save_block
ram_save_iterate
qemu_savevm_state_iterate
migration_iteration_run
migration_thread
qemu_thread_start
start_thread
clone
Signed-off-by: Ivan Ren <ivanren@tencent.com>
Message-Id: <1561468699-9819-2-git-send-email-ivanren@tencent.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
migration/ram.c | 36 +++++++++++++++++++++++++++++-------
1 file changed, 29 insertions(+), 7 deletions(-)
diff --git a/migration/ram.c b/migration/ram.c
index 2b0774c2bf..52a2d498e4 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -920,7 +920,7 @@ struct {
* false.
*/
-static void multifd_send_pages(void)
+static int multifd_send_pages(void)
{
int i;
static int next_channel;
@@ -933,6 +933,11 @@ static void multifd_send_pages(void)
p = &multifd_send_state->params[i];
qemu_mutex_lock(&p->mutex);
+ if (p->quit) {
+ error_report("%s: channel %d has already quit!", __func__, i);
+ qemu_mutex_unlock(&p->mutex);
+ return -1;
+ }
if (!p->pending_job) {
p->pending_job++;
next_channel = (i + 1) % migrate_multifd_channels();
@@ -951,9 +956,11 @@ static void multifd_send_pages(void)
ram_counters.transferred += transferred;;
qemu_mutex_unlock(&p->mutex);
qemu_sem_post(&p->sem);
+
+ return 1;
}
-static void multifd_queue_page(RAMBlock *block, ram_addr_t offset)
+static int multifd_queue_page(RAMBlock *block, ram_addr_t offset)
{
MultiFDPages_t *pages = multifd_send_state->pages;
@@ -968,15 +975,19 @@ static void multifd_queue_page(RAMBlock *block, ram_addr_t offset)
pages->used++;
if (pages->used < pages->allocated) {
- return;
+ return 1;
}
}
- multifd_send_pages();
+ if (multifd_send_pages() < 0) {
+ return -1;
+ }
if (pages->block != block) {
- multifd_queue_page(block, offset);
+ return multifd_queue_page(block, offset);
}
+
+ return 1;
}
static void multifd_send_terminate_threads(Error *err)
@@ -1049,7 +1060,10 @@ static void multifd_send_sync_main(void)
return;
}
if (multifd_send_state->pages->used) {
- multifd_send_pages();
+ if (multifd_send_pages() < 0) {
+ error_report("%s: multifd_send_pages fail", __func__);
+ return;
+ }
}
for (i = 0; i < migrate_multifd_channels(); i++) {
MultiFDSendParams *p = &multifd_send_state->params[i];
@@ -1058,6 +1072,12 @@ static void multifd_send_sync_main(void)
qemu_mutex_lock(&p->mutex);
+ if (p->quit) {
+ error_report("%s: channel %d has already quit", __func__, i);
+ qemu_mutex_unlock(&p->mutex);
+ return;
+ }
+
p->packet_num = multifd_send_state->packet_num++;
p->flags |= MULTIFD_FLAG_SYNC;
p->pending_job++;
@@ -2033,7 +2053,9 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss, bool last_stage)
static int ram_save_multifd_page(RAMState *rs, RAMBlock *block,
ram_addr_t offset)
{
- multifd_queue_page(block, offset);
+ if (multifd_queue_page(block, offset) < 0) {
+ return -1;
+ }
ram_counters.normal++;
return 1;
--
2.21.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [Qemu-devel] [PATCH 2/4] migration: fix migrate_cancel leads live_migration thread hung forever
2019-07-24 9:55 [Qemu-devel] [PATCH 0/4] migration: fix migrate_cancel problems of multifd Juan Quintela
2019-07-24 9:55 ` [Qemu-devel] [PATCH 1/4] migration: fix migrate_cancel leads live_migration thread endless loop Juan Quintela
@ 2019-07-24 9:55 ` Juan Quintela
2019-07-24 10:58 ` Dr. David Alan Gilbert
2019-07-24 9:55 ` [Qemu-devel] [PATCH 3/4] migration: Make explicit that we are quitting multifd Juan Quintela
2019-07-24 9:55 ` [Qemu-devel] [PATCH 4/4] migration: fix migrate_cancel multifd migration leads destination hung forever Juan Quintela
3 siblings, 1 reply; 9+ messages in thread
From: Juan Quintela @ 2019-07-24 9:55 UTC (permalink / raw)
To: qemu-devel
Cc: Laurent Vivier, Thomas Huth, kvm, Juan Quintela,
Dr. David Alan Gilbert, Ivan Ren, Paolo Bonzini, Ivan Ren,
Richard Henderson
From: Ivan Ren <renyime@gmail.com>
When we 'migrate_cancel' a multifd migration, live_migration thread may
hung forever at some points, because of multifd_send_thread has already
exit for socket error:
1. multifd_send_pages may hung at qemu_sem_wait(&multifd_send_state->
channels_ready)
2. multifd_send_sync_main my hung at qemu_sem_wait(&multifd_send_state->
sem_sync)
Signed-off-by: Ivan Ren <ivanren@tencent.com>
Message-Id: <1561468699-9819-3-git-send-email-ivanren@tencent.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
Remove spurious not needed bits
---
migration/ram.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/migration/ram.c b/migration/ram.c
index 52a2d498e4..87bb7da8e2 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1097,7 +1097,8 @@ static void *multifd_send_thread(void *opaque)
{
MultiFDSendParams *p = opaque;
Error *local_err = NULL;
- int ret;
+ int ret = 0;
+ uint32_t flags = 0;
trace_multifd_send_thread_start(p->id);
rcu_register_thread();
@@ -1115,7 +1116,7 @@ static void *multifd_send_thread(void *opaque)
if (p->pending_job) {
uint32_t used = p->pages->used;
uint64_t packet_num = p->packet_num;
- uint32_t flags = p->flags;
+ flags = p->flags;
p->next_packet_size = used * qemu_target_page_size();
multifd_send_fill_packet(p);
@@ -1164,6 +1165,17 @@ out:
multifd_send_terminate_threads(local_err);
}
+ /*
+ * Error happen, I will exit, but I can't just leave, tell
+ * who pay attention to me.
+ */
+ if (ret != 0) {
+ if (flags & MULTIFD_FLAG_SYNC) {
+ qemu_sem_post(&multifd_send_state->sem_sync);
+ }
+ qemu_sem_post(&multifd_send_state->channels_ready);
+ }
+
qemu_mutex_lock(&p->mutex);
p->running = false;
qemu_mutex_unlock(&p->mutex);
--
2.21.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [Qemu-devel] [PATCH 3/4] migration: Make explicit that we are quitting multifd
2019-07-24 9:55 [Qemu-devel] [PATCH 0/4] migration: fix migrate_cancel problems of multifd Juan Quintela
2019-07-24 9:55 ` [Qemu-devel] [PATCH 1/4] migration: fix migrate_cancel leads live_migration thread endless loop Juan Quintela
2019-07-24 9:55 ` [Qemu-devel] [PATCH 2/4] migration: fix migrate_cancel leads live_migration thread hung forever Juan Quintela
@ 2019-07-24 9:55 ` Juan Quintela
2019-07-24 11:01 ` Dr. David Alan Gilbert
2019-07-24 9:55 ` [Qemu-devel] [PATCH 4/4] migration: fix migrate_cancel multifd migration leads destination hung forever Juan Quintela
3 siblings, 1 reply; 9+ messages in thread
From: Juan Quintela @ 2019-07-24 9:55 UTC (permalink / raw)
To: qemu-devel
Cc: Laurent Vivier, Thomas Huth, kvm, Juan Quintela,
Dr. David Alan Gilbert, Paolo Bonzini, Richard Henderson
We add a bool to indicate that.
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
migration/ram.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/migration/ram.c b/migration/ram.c
index 87bb7da8e2..eb6716710e 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -677,6 +677,8 @@ typedef struct {
QemuMutex mutex;
/* is this channel thread running */
bool running;
+ /* should this thread finish */
+ bool quit;
/* array of pages to receive */
MultiFDPages_t *pages;
/* packet allocated len */
@@ -1266,6 +1268,7 @@ static void multifd_recv_terminate_threads(Error *err)
MultiFDRecvParams *p = &multifd_recv_state->params[i];
qemu_mutex_lock(&p->mutex);
+ p->quit = true;
/* We could arrive here for two reasons:
- normal quit, i.e. everything went fine, just finished
- error quit: We close the channels so the channel threads
@@ -1288,6 +1291,7 @@ int multifd_load_cleanup(Error **errp)
MultiFDRecvParams *p = &multifd_recv_state->params[i];
if (p->running) {
+ p->quit = true;
qemu_thread_join(&p->thread);
}
object_unref(OBJECT(p->c));
@@ -1351,6 +1355,10 @@ static void *multifd_recv_thread(void *opaque)
uint32_t used;
uint32_t flags;
+ if (p->quit) {
+ break;
+ }
+
ret = qio_channel_read_all_eof(p->c, (void *)p->packet,
p->packet_len, &local_err);
if (ret == 0) { /* EOF */
@@ -1422,6 +1430,7 @@ int multifd_load_setup(void)
qemu_mutex_init(&p->mutex);
qemu_sem_init(&p->sem_sync, 0);
+ p->quit = false;
p->id = i;
p->pages = multifd_pages_init(page_count);
p->packet_len = sizeof(MultiFDPacket_t)
--
2.21.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [Qemu-devel] [PATCH 4/4] migration: fix migrate_cancel multifd migration leads destination hung forever
2019-07-24 9:55 [Qemu-devel] [PATCH 0/4] migration: fix migrate_cancel problems of multifd Juan Quintela
` (2 preceding siblings ...)
2019-07-24 9:55 ` [Qemu-devel] [PATCH 3/4] migration: Make explicit that we are quitting multifd Juan Quintela
@ 2019-07-24 9:55 ` Juan Quintela
2019-07-24 11:03 ` Dr. David Alan Gilbert
3 siblings, 1 reply; 9+ messages in thread
From: Juan Quintela @ 2019-07-24 9:55 UTC (permalink / raw)
To: qemu-devel
Cc: Laurent Vivier, Thomas Huth, kvm, Juan Quintela,
Dr. David Alan Gilbert, Ivan Ren, Paolo Bonzini, Ivan Ren,
Richard Henderson
From: Ivan Ren <renyime@gmail.com>
When migrate_cancel a multifd migration, if run sequence like this:
[source] [destination]
multifd_send_sync_main[finish]
multifd_recv_thread wait &p->sem_sync
shutdown to_dst_file
detect error from_src_file
send RAM_SAVE_FLAG_EOS[fail] [no chance to run multifd_recv_sync_main]
multifd_load_cleanup
join multifd receive thread forever
will lead destination qemu hung at following stack:
pthread_join
qemu_thread_join
multifd_load_cleanup
process_incoming_migration_co
coroutine_trampoline
Signed-off-by: Ivan Ren <ivanren@tencent.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <1561468699-9819-4-git-send-email-ivanren@tencent.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
migration/ram.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/migration/ram.c b/migration/ram.c
index eb6716710e..889148dd84 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1292,6 +1292,11 @@ int multifd_load_cleanup(Error **errp)
if (p->running) {
p->quit = true;
+ /*
+ * multifd_recv_thread may hung at MULTIFD_FLAG_SYNC handle code,
+ * however try to wakeup it without harm in cleanup phase.
+ */
+ qemu_sem_post(&p->sem_sync);
qemu_thread_join(&p->thread);
}
object_unref(OBJECT(p->c));
--
2.21.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH 1/4] migration: fix migrate_cancel leads live_migration thread endless loop
2019-07-24 9:55 ` [Qemu-devel] [PATCH 1/4] migration: fix migrate_cancel leads live_migration thread endless loop Juan Quintela
@ 2019-07-24 10:57 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 9+ messages in thread
From: Dr. David Alan Gilbert @ 2019-07-24 10:57 UTC (permalink / raw)
To: Juan Quintela
Cc: Laurent Vivier, Thomas Huth, kvm, qemu-devel, Ivan Ren,
Paolo Bonzini, Ivan Ren, Richard Henderson
* Juan Quintela (quintela@redhat.com) wrote:
> From: Ivan Ren <renyime@gmail.com>
>
> When we 'migrate_cancel' a multifd migration, live_migration thread may
> go into endless loop in multifd_send_pages functions.
>
> Reproduce steps:
>
> (qemu) migrate_set_capability multifd on
> (qemu) migrate -d url
> (qemu) [wait a while]
> (qemu) migrate_cancel
>
> Then may get live_migration 100% cpu usage in following stack:
>
> pthread_mutex_lock
> qemu_mutex_lock_impl
> multifd_send_pages
> multifd_queue_page
> ram_save_multifd_page
> ram_save_target_page
> ram_save_host_page
> ram_find_and_save_block
> ram_find_and_save_block
> ram_save_iterate
> qemu_savevm_state_iterate
> migration_iteration_run
> migration_thread
> qemu_thread_start
> start_thread
> clone
>
> Signed-off-by: Ivan Ren <ivanren@tencent.com>
> Message-Id: <1561468699-9819-2-git-send-email-ivanren@tencent.com>
> Reviewed-by: Juan Quintela <quintela@redhat.com>
> Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
> migration/ram.c | 36 +++++++++++++++++++++++++++++-------
> 1 file changed, 29 insertions(+), 7 deletions(-)
>
> diff --git a/migration/ram.c b/migration/ram.c
> index 2b0774c2bf..52a2d498e4 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -920,7 +920,7 @@ struct {
> * false.
> */
>
> -static void multifd_send_pages(void)
> +static int multifd_send_pages(void)
> {
> int i;
> static int next_channel;
> @@ -933,6 +933,11 @@ static void multifd_send_pages(void)
> p = &multifd_send_state->params[i];
>
> qemu_mutex_lock(&p->mutex);
> + if (p->quit) {
> + error_report("%s: channel %d has already quit!", __func__, i);
> + qemu_mutex_unlock(&p->mutex);
> + return -1;
> + }
> if (!p->pending_job) {
> p->pending_job++;
> next_channel = (i + 1) % migrate_multifd_channels();
> @@ -951,9 +956,11 @@ static void multifd_send_pages(void)
> ram_counters.transferred += transferred;;
> qemu_mutex_unlock(&p->mutex);
> qemu_sem_post(&p->sem);
> +
> + return 1;
> }
>
> -static void multifd_queue_page(RAMBlock *block, ram_addr_t offset)
> +static int multifd_queue_page(RAMBlock *block, ram_addr_t offset)
> {
> MultiFDPages_t *pages = multifd_send_state->pages;
>
> @@ -968,15 +975,19 @@ static void multifd_queue_page(RAMBlock *block, ram_addr_t offset)
> pages->used++;
>
> if (pages->used < pages->allocated) {
> - return;
> + return 1;
> }
> }
>
> - multifd_send_pages();
> + if (multifd_send_pages() < 0) {
> + return -1;
> + }
>
> if (pages->block != block) {
> - multifd_queue_page(block, offset);
> + return multifd_queue_page(block, offset);
> }
> +
> + return 1;
> }
>
> static void multifd_send_terminate_threads(Error *err)
> @@ -1049,7 +1060,10 @@ static void multifd_send_sync_main(void)
> return;
> }
> if (multifd_send_state->pages->used) {
> - multifd_send_pages();
> + if (multifd_send_pages() < 0) {
> + error_report("%s: multifd_send_pages fail", __func__);
> + return;
> + }
> }
> for (i = 0; i < migrate_multifd_channels(); i++) {
> MultiFDSendParams *p = &multifd_send_state->params[i];
> @@ -1058,6 +1072,12 @@ static void multifd_send_sync_main(void)
>
> qemu_mutex_lock(&p->mutex);
>
> + if (p->quit) {
> + error_report("%s: channel %d has already quit", __func__, i);
> + qemu_mutex_unlock(&p->mutex);
> + return;
> + }
> +
> p->packet_num = multifd_send_state->packet_num++;
> p->flags |= MULTIFD_FLAG_SYNC;
> p->pending_job++;
> @@ -2033,7 +2053,9 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss, bool last_stage)
> static int ram_save_multifd_page(RAMState *rs, RAMBlock *block,
> ram_addr_t offset)
> {
> - multifd_queue_page(block, offset);
> + if (multifd_queue_page(block, offset) < 0) {
> + return -1;
> + }
> ram_counters.normal++;
>
> return 1;
> --
> 2.21.0
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH 2/4] migration: fix migrate_cancel leads live_migration thread hung forever
2019-07-24 9:55 ` [Qemu-devel] [PATCH 2/4] migration: fix migrate_cancel leads live_migration thread hung forever Juan Quintela
@ 2019-07-24 10:58 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 9+ messages in thread
From: Dr. David Alan Gilbert @ 2019-07-24 10:58 UTC (permalink / raw)
To: Juan Quintela
Cc: Laurent Vivier, Thomas Huth, kvm, qemu-devel, Ivan Ren,
Paolo Bonzini, Ivan Ren, Richard Henderson
* Juan Quintela (quintela@redhat.com) wrote:
> From: Ivan Ren <renyime@gmail.com>
>
> When we 'migrate_cancel' a multifd migration, live_migration thread may
> hung forever at some points, because of multifd_send_thread has already
> exit for socket error:
> 1. multifd_send_pages may hung at qemu_sem_wait(&multifd_send_state->
> channels_ready)
> 2. multifd_send_sync_main my hung at qemu_sem_wait(&multifd_send_state->
> sem_sync)
>
> Signed-off-by: Ivan Ren <ivanren@tencent.com>
> Message-Id: <1561468699-9819-3-git-send-email-ivanren@tencent.com>
> Reviewed-by: Juan Quintela <quintela@redhat.com>
> Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>
> ---
>
> Remove spurious not needed bits
> ---
> migration/ram.c | 16 ++++++++++++++--
> 1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/migration/ram.c b/migration/ram.c
> index 52a2d498e4..87bb7da8e2 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1097,7 +1097,8 @@ static void *multifd_send_thread(void *opaque)
> {
> MultiFDSendParams *p = opaque;
> Error *local_err = NULL;
> - int ret;
> + int ret = 0;
> + uint32_t flags = 0;
>
> trace_multifd_send_thread_start(p->id);
> rcu_register_thread();
> @@ -1115,7 +1116,7 @@ static void *multifd_send_thread(void *opaque)
> if (p->pending_job) {
> uint32_t used = p->pages->used;
> uint64_t packet_num = p->packet_num;
> - uint32_t flags = p->flags;
> + flags = p->flags;
>
> p->next_packet_size = used * qemu_target_page_size();
> multifd_send_fill_packet(p);
> @@ -1164,6 +1165,17 @@ out:
> multifd_send_terminate_threads(local_err);
> }
>
> + /*
> + * Error happen, I will exit, but I can't just leave, tell
> + * who pay attention to me.
> + */
> + if (ret != 0) {
> + if (flags & MULTIFD_FLAG_SYNC) {
> + qemu_sem_post(&multifd_send_state->sem_sync);
> + }
> + qemu_sem_post(&multifd_send_state->channels_ready);
> + }
> +
> qemu_mutex_lock(&p->mutex);
> p->running = false;
> qemu_mutex_unlock(&p->mutex);
> --
> 2.21.0
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH 3/4] migration: Make explicit that we are quitting multifd
2019-07-24 9:55 ` [Qemu-devel] [PATCH 3/4] migration: Make explicit that we are quitting multifd Juan Quintela
@ 2019-07-24 11:01 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 9+ messages in thread
From: Dr. David Alan Gilbert @ 2019-07-24 11:01 UTC (permalink / raw)
To: Juan Quintela
Cc: Laurent Vivier, Thomas Huth, kvm, qemu-devel, Paolo Bonzini,
Richard Henderson
* Juan Quintela (quintela@redhat.com) wrote:
> We add a bool to indicate that.
>
> Signed-off-by: Juan Quintela <quintela@redhat.com>
OK, similar to send.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
> migration/ram.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/migration/ram.c b/migration/ram.c
> index 87bb7da8e2..eb6716710e 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -677,6 +677,8 @@ typedef struct {
> QemuMutex mutex;
> /* is this channel thread running */
> bool running;
> + /* should this thread finish */
> + bool quit;
> /* array of pages to receive */
> MultiFDPages_t *pages;
> /* packet allocated len */
> @@ -1266,6 +1268,7 @@ static void multifd_recv_terminate_threads(Error *err)
> MultiFDRecvParams *p = &multifd_recv_state->params[i];
>
> qemu_mutex_lock(&p->mutex);
> + p->quit = true;
> /* We could arrive here for two reasons:
> - normal quit, i.e. everything went fine, just finished
> - error quit: We close the channels so the channel threads
> @@ -1288,6 +1291,7 @@ int multifd_load_cleanup(Error **errp)
> MultiFDRecvParams *p = &multifd_recv_state->params[i];
>
> if (p->running) {
> + p->quit = true;
> qemu_thread_join(&p->thread);
> }
> object_unref(OBJECT(p->c));
> @@ -1351,6 +1355,10 @@ static void *multifd_recv_thread(void *opaque)
> uint32_t used;
> uint32_t flags;
>
> + if (p->quit) {
> + break;
> + }
> +
> ret = qio_channel_read_all_eof(p->c, (void *)p->packet,
> p->packet_len, &local_err);
> if (ret == 0) { /* EOF */
> @@ -1422,6 +1430,7 @@ int multifd_load_setup(void)
>
> qemu_mutex_init(&p->mutex);
> qemu_sem_init(&p->sem_sync, 0);
> + p->quit = false;
> p->id = i;
> p->pages = multifd_pages_init(page_count);
> p->packet_len = sizeof(MultiFDPacket_t)
> --
> 2.21.0
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH 4/4] migration: fix migrate_cancel multifd migration leads destination hung forever
2019-07-24 9:55 ` [Qemu-devel] [PATCH 4/4] migration: fix migrate_cancel multifd migration leads destination hung forever Juan Quintela
@ 2019-07-24 11:03 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 9+ messages in thread
From: Dr. David Alan Gilbert @ 2019-07-24 11:03 UTC (permalink / raw)
To: Juan Quintela
Cc: Laurent Vivier, Thomas Huth, kvm, qemu-devel, Ivan Ren,
Paolo Bonzini, Ivan Ren, Richard Henderson
* Juan Quintela (quintela@redhat.com) wrote:
> From: Ivan Ren <renyime@gmail.com>
>
> When migrate_cancel a multifd migration, if run sequence like this:
>
> [source] [destination]
>
> multifd_send_sync_main[finish]
> multifd_recv_thread wait &p->sem_sync
> shutdown to_dst_file
> detect error from_src_file
> send RAM_SAVE_FLAG_EOS[fail] [no chance to run multifd_recv_sync_main]
> multifd_load_cleanup
> join multifd receive thread forever
>
> will lead destination qemu hung at following stack:
>
> pthread_join
> qemu_thread_join
> multifd_load_cleanup
> process_incoming_migration_co
> coroutine_trampoline
>
> Signed-off-by: Ivan Ren <ivanren@tencent.com>
> Reviewed-by: Juan Quintela <quintela@redhat.com>
> Message-Id: <1561468699-9819-4-git-send-email-ivanren@tencent.com>
> Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
> migration/ram.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/migration/ram.c b/migration/ram.c
> index eb6716710e..889148dd84 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1292,6 +1292,11 @@ int multifd_load_cleanup(Error **errp)
>
> if (p->running) {
> p->quit = true;
> + /*
> + * multifd_recv_thread may hung at MULTIFD_FLAG_SYNC handle code,
> + * however try to wakeup it without harm in cleanup phase.
> + */
> + qemu_sem_post(&p->sem_sync);
> qemu_thread_join(&p->thread);
> }
> object_unref(OBJECT(p->c));
> --
> 2.21.0
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2019-07-24 11:04 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-24 9:55 [Qemu-devel] [PATCH 0/4] migration: fix migrate_cancel problems of multifd Juan Quintela
2019-07-24 9:55 ` [Qemu-devel] [PATCH 1/4] migration: fix migrate_cancel leads live_migration thread endless loop Juan Quintela
2019-07-24 10:57 ` Dr. David Alan Gilbert
2019-07-24 9:55 ` [Qemu-devel] [PATCH 2/4] migration: fix migrate_cancel leads live_migration thread hung forever Juan Quintela
2019-07-24 10:58 ` Dr. David Alan Gilbert
2019-07-24 9:55 ` [Qemu-devel] [PATCH 3/4] migration: Make explicit that we are quitting multifd Juan Quintela
2019-07-24 11:01 ` Dr. David Alan Gilbert
2019-07-24 9:55 ` [Qemu-devel] [PATCH 4/4] migration: fix migrate_cancel multifd migration leads destination hung forever Juan Quintela
2019-07-24 11:03 ` Dr. David Alan Gilbert
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).