* [PULL 0/4] Migration patches
@ 2019-07-25 10:57 ` Juan Quintela
0 siblings, 0 replies; 12+ messages in thread
From: Juan Quintela @ 2019-07-25 10:57 UTC (permalink / raw)
To: qemu-devel
Cc: kvm, Thomas Huth, Dr. David Alan Gilbert, Laurent Vivier,
Juan Quintela, Paolo Bonzini, Richard Henderson
The following changes since commit bf8b024372bf8abf5a9f40bfa65eeefad23ff988:
Update version for v4.1.0-rc2 release (2019-07-23 18:28:08 +0100)
are available in the Git repository at:
https://github.com/juanquintela/qemu.git tags/migration-pull-request
for you to fetch changes up to f193bc0c5342496ce07355c0c30394560a7f4738:
migration: fix migrate_cancel multifd migration leads destination hung forever (2019-07-24 14:47:21 +0200)
----------------------------------------------------------------
Migration pull request
This series fixes problems with migration-cancel while using multifd.
In some cases it can hang waiting in a semaphore.
Please apply.
----------------------------------------------------------------
Ivan Ren (3):
migration: fix migrate_cancel leads live_migration thread endless loop
migration: fix migrate_cancel leads live_migration thread hung forever
migration: fix migrate_cancel multifd migration leads destination hung
forever
Juan Quintela (1):
migration: Make explicit that we are quitting multifd
migration/ram.c | 66 ++++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 57 insertions(+), 9 deletions(-)
--
2.21.0
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Qemu-devel] [PULL 0/4] Migration patches
@ 2019-07-25 10:57 ` Juan Quintela
0 siblings, 0 replies; 12+ messages in thread
From: Juan Quintela @ 2019-07-25 10:57 UTC (permalink / raw)
To: qemu-devel
Cc: Laurent Vivier, Thomas Huth, kvm, Juan Quintela,
Dr. David Alan Gilbert, Paolo Bonzini, Richard Henderson
The following changes since commit bf8b024372bf8abf5a9f40bfa65eeefad23ff988:
Update version for v4.1.0-rc2 release (2019-07-23 18:28:08 +0100)
are available in the Git repository at:
https://github.com/juanquintela/qemu.git tags/migration-pull-request
for you to fetch changes up to f193bc0c5342496ce07355c0c30394560a7f4738:
migration: fix migrate_cancel multifd migration leads destination hung forever (2019-07-24 14:47:21 +0200)
----------------------------------------------------------------
Migration pull request
This series fixes problems with migration-cancel while using multifd.
In some cases it can hang waiting in a semaphore.
Please apply.
----------------------------------------------------------------
Ivan Ren (3):
migration: fix migrate_cancel leads live_migration thread endless loop
migration: fix migrate_cancel leads live_migration thread hung forever
migration: fix migrate_cancel multifd migration leads destination hung
forever
Juan Quintela (1):
migration: Make explicit that we are quitting multifd
migration/ram.c | 66 ++++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 57 insertions(+), 9 deletions(-)
--
2.21.0
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PULL 1/4] migration: fix migrate_cancel leads live_migration thread endless loop
2019-07-25 10:57 ` [Qemu-devel] " Juan Quintela
@ 2019-07-25 10:57 ` Juan Quintela
-1 siblings, 0 replies; 12+ messages in thread
From: Juan Quintela @ 2019-07-25 10:57 UTC (permalink / raw)
To: qemu-devel
Cc: kvm, Thomas Huth, Dr. David Alan Gilbert, Laurent Vivier,
Juan Quintela, Paolo Bonzini, Richard Henderson, Ivan Ren,
Ivan Ren
From: Ivan Ren <renyime@gmail.com>
When we 'migrate_cancel' a multifd migration, live_migration thread may
go into endless loop in multifd_send_pages functions.
Reproduce steps:
(qemu) migrate_set_capability multifd on
(qemu) migrate -d url
(qemu) [wait a while]
(qemu) migrate_cancel
Then may get live_migration 100% cpu usage in following stack:
pthread_mutex_lock
qemu_mutex_lock_impl
multifd_send_pages
multifd_queue_page
ram_save_multifd_page
ram_save_target_page
ram_save_host_page
ram_find_and_save_block
ram_find_and_save_block
ram_save_iterate
qemu_savevm_state_iterate
migration_iteration_run
migration_thread
qemu_thread_start
start_thread
clone
Signed-off-by: Ivan Ren <ivanren@tencent.com>
Message-Id: <1561468699-9819-2-git-send-email-ivanren@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
migration/ram.c | 36 +++++++++++++++++++++++++++++-------
1 file changed, 29 insertions(+), 7 deletions(-)
diff --git a/migration/ram.c b/migration/ram.c
index 2b0774c2bf..52a2d498e4 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -920,7 +920,7 @@ struct {
* false.
*/
-static void multifd_send_pages(void)
+static int multifd_send_pages(void)
{
int i;
static int next_channel;
@@ -933,6 +933,11 @@ static void multifd_send_pages(void)
p = &multifd_send_state->params[i];
qemu_mutex_lock(&p->mutex);
+ if (p->quit) {
+ error_report("%s: channel %d has already quit!", __func__, i);
+ qemu_mutex_unlock(&p->mutex);
+ return -1;
+ }
if (!p->pending_job) {
p->pending_job++;
next_channel = (i + 1) % migrate_multifd_channels();
@@ -951,9 +956,11 @@ static void multifd_send_pages(void)
ram_counters.transferred += transferred;;
qemu_mutex_unlock(&p->mutex);
qemu_sem_post(&p->sem);
+
+ return 1;
}
-static void multifd_queue_page(RAMBlock *block, ram_addr_t offset)
+static int multifd_queue_page(RAMBlock *block, ram_addr_t offset)
{
MultiFDPages_t *pages = multifd_send_state->pages;
@@ -968,15 +975,19 @@ static void multifd_queue_page(RAMBlock *block, ram_addr_t offset)
pages->used++;
if (pages->used < pages->allocated) {
- return;
+ return 1;
}
}
- multifd_send_pages();
+ if (multifd_send_pages() < 0) {
+ return -1;
+ }
if (pages->block != block) {
- multifd_queue_page(block, offset);
+ return multifd_queue_page(block, offset);
}
+
+ return 1;
}
static void multifd_send_terminate_threads(Error *err)
@@ -1049,7 +1060,10 @@ static void multifd_send_sync_main(void)
return;
}
if (multifd_send_state->pages->used) {
- multifd_send_pages();
+ if (multifd_send_pages() < 0) {
+ error_report("%s: multifd_send_pages fail", __func__);
+ return;
+ }
}
for (i = 0; i < migrate_multifd_channels(); i++) {
MultiFDSendParams *p = &multifd_send_state->params[i];
@@ -1058,6 +1072,12 @@ static void multifd_send_sync_main(void)
qemu_mutex_lock(&p->mutex);
+ if (p->quit) {
+ error_report("%s: channel %d has already quit", __func__, i);
+ qemu_mutex_unlock(&p->mutex);
+ return;
+ }
+
p->packet_num = multifd_send_state->packet_num++;
p->flags |= MULTIFD_FLAG_SYNC;
p->pending_job++;
@@ -2033,7 +2053,9 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss, bool last_stage)
static int ram_save_multifd_page(RAMState *rs, RAMBlock *block,
ram_addr_t offset)
{
- multifd_queue_page(block, offset);
+ if (multifd_queue_page(block, offset) < 0) {
+ return -1;
+ }
ram_counters.normal++;
return 1;
--
2.21.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [Qemu-devel] [PULL 1/4] migration: fix migrate_cancel leads live_migration thread endless loop
@ 2019-07-25 10:57 ` Juan Quintela
0 siblings, 0 replies; 12+ messages in thread
From: Juan Quintela @ 2019-07-25 10:57 UTC (permalink / raw)
To: qemu-devel
Cc: Laurent Vivier, Thomas Huth, kvm, Juan Quintela,
Dr. David Alan Gilbert, Ivan Ren, Paolo Bonzini, Ivan Ren,
Richard Henderson
From: Ivan Ren <renyime@gmail.com>
When we 'migrate_cancel' a multifd migration, live_migration thread may
go into endless loop in multifd_send_pages functions.
Reproduce steps:
(qemu) migrate_set_capability multifd on
(qemu) migrate -d url
(qemu) [wait a while]
(qemu) migrate_cancel
Then may get live_migration 100% cpu usage in following stack:
pthread_mutex_lock
qemu_mutex_lock_impl
multifd_send_pages
multifd_queue_page
ram_save_multifd_page
ram_save_target_page
ram_save_host_page
ram_find_and_save_block
ram_find_and_save_block
ram_save_iterate
qemu_savevm_state_iterate
migration_iteration_run
migration_thread
qemu_thread_start
start_thread
clone
Signed-off-by: Ivan Ren <ivanren@tencent.com>
Message-Id: <1561468699-9819-2-git-send-email-ivanren@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
migration/ram.c | 36 +++++++++++++++++++++++++++++-------
1 file changed, 29 insertions(+), 7 deletions(-)
diff --git a/migration/ram.c b/migration/ram.c
index 2b0774c2bf..52a2d498e4 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -920,7 +920,7 @@ struct {
* false.
*/
-static void multifd_send_pages(void)
+static int multifd_send_pages(void)
{
int i;
static int next_channel;
@@ -933,6 +933,11 @@ static void multifd_send_pages(void)
p = &multifd_send_state->params[i];
qemu_mutex_lock(&p->mutex);
+ if (p->quit) {
+ error_report("%s: channel %d has already quit!", __func__, i);
+ qemu_mutex_unlock(&p->mutex);
+ return -1;
+ }
if (!p->pending_job) {
p->pending_job++;
next_channel = (i + 1) % migrate_multifd_channels();
@@ -951,9 +956,11 @@ static void multifd_send_pages(void)
ram_counters.transferred += transferred;;
qemu_mutex_unlock(&p->mutex);
qemu_sem_post(&p->sem);
+
+ return 1;
}
-static void multifd_queue_page(RAMBlock *block, ram_addr_t offset)
+static int multifd_queue_page(RAMBlock *block, ram_addr_t offset)
{
MultiFDPages_t *pages = multifd_send_state->pages;
@@ -968,15 +975,19 @@ static void multifd_queue_page(RAMBlock *block, ram_addr_t offset)
pages->used++;
if (pages->used < pages->allocated) {
- return;
+ return 1;
}
}
- multifd_send_pages();
+ if (multifd_send_pages() < 0) {
+ return -1;
+ }
if (pages->block != block) {
- multifd_queue_page(block, offset);
+ return multifd_queue_page(block, offset);
}
+
+ return 1;
}
static void multifd_send_terminate_threads(Error *err)
@@ -1049,7 +1060,10 @@ static void multifd_send_sync_main(void)
return;
}
if (multifd_send_state->pages->used) {
- multifd_send_pages();
+ if (multifd_send_pages() < 0) {
+ error_report("%s: multifd_send_pages fail", __func__);
+ return;
+ }
}
for (i = 0; i < migrate_multifd_channels(); i++) {
MultiFDSendParams *p = &multifd_send_state->params[i];
@@ -1058,6 +1072,12 @@ static void multifd_send_sync_main(void)
qemu_mutex_lock(&p->mutex);
+ if (p->quit) {
+ error_report("%s: channel %d has already quit", __func__, i);
+ qemu_mutex_unlock(&p->mutex);
+ return;
+ }
+
p->packet_num = multifd_send_state->packet_num++;
p->flags |= MULTIFD_FLAG_SYNC;
p->pending_job++;
@@ -2033,7 +2053,9 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss, bool last_stage)
static int ram_save_multifd_page(RAMState *rs, RAMBlock *block,
ram_addr_t offset)
{
- multifd_queue_page(block, offset);
+ if (multifd_queue_page(block, offset) < 0) {
+ return -1;
+ }
ram_counters.normal++;
return 1;
--
2.21.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PULL 2/4] migration: fix migrate_cancel leads live_migration thread hung forever
2019-07-25 10:57 ` [Qemu-devel] " Juan Quintela
@ 2019-07-25 10:57 ` Juan Quintela
-1 siblings, 0 replies; 12+ messages in thread
From: Juan Quintela @ 2019-07-25 10:57 UTC (permalink / raw)
To: qemu-devel
Cc: kvm, Thomas Huth, Dr. David Alan Gilbert, Laurent Vivier,
Juan Quintela, Paolo Bonzini, Richard Henderson, Ivan Ren,
Ivan Ren
From: Ivan Ren <renyime@gmail.com>
When we 'migrate_cancel' a multifd migration, live_migration thread may
hung forever at some points, because of multifd_send_thread has already
exit for socket error:
1. multifd_send_pages may hung at qemu_sem_wait(&multifd_send_state->
channels_ready)
2. multifd_send_sync_main my hung at qemu_sem_wait(&multifd_send_state->
sem_sync)
Signed-off-by: Ivan Ren <ivanren@tencent.com>
Message-Id: <1561468699-9819-3-git-send-email-ivanren@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
Remove spurious not needed bits
---
migration/ram.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/migration/ram.c b/migration/ram.c
index 52a2d498e4..87bb7da8e2 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1097,7 +1097,8 @@ static void *multifd_send_thread(void *opaque)
{
MultiFDSendParams *p = opaque;
Error *local_err = NULL;
- int ret;
+ int ret = 0;
+ uint32_t flags = 0;
trace_multifd_send_thread_start(p->id);
rcu_register_thread();
@@ -1115,7 +1116,7 @@ static void *multifd_send_thread(void *opaque)
if (p->pending_job) {
uint32_t used = p->pages->used;
uint64_t packet_num = p->packet_num;
- uint32_t flags = p->flags;
+ flags = p->flags;
p->next_packet_size = used * qemu_target_page_size();
multifd_send_fill_packet(p);
@@ -1164,6 +1165,17 @@ out:
multifd_send_terminate_threads(local_err);
}
+ /*
+ * Error happen, I will exit, but I can't just leave, tell
+ * who pay attention to me.
+ */
+ if (ret != 0) {
+ if (flags & MULTIFD_FLAG_SYNC) {
+ qemu_sem_post(&multifd_send_state->sem_sync);
+ }
+ qemu_sem_post(&multifd_send_state->channels_ready);
+ }
+
qemu_mutex_lock(&p->mutex);
p->running = false;
qemu_mutex_unlock(&p->mutex);
--
2.21.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [Qemu-devel] [PULL 2/4] migration: fix migrate_cancel leads live_migration thread hung forever
@ 2019-07-25 10:57 ` Juan Quintela
0 siblings, 0 replies; 12+ messages in thread
From: Juan Quintela @ 2019-07-25 10:57 UTC (permalink / raw)
To: qemu-devel
Cc: Laurent Vivier, Thomas Huth, kvm, Juan Quintela,
Dr. David Alan Gilbert, Ivan Ren, Paolo Bonzini, Ivan Ren,
Richard Henderson
From: Ivan Ren <renyime@gmail.com>
When we 'migrate_cancel' a multifd migration, live_migration thread may
hung forever at some points, because of multifd_send_thread has already
exit for socket error:
1. multifd_send_pages may hung at qemu_sem_wait(&multifd_send_state->
channels_ready)
2. multifd_send_sync_main my hung at qemu_sem_wait(&multifd_send_state->
sem_sync)
Signed-off-by: Ivan Ren <ivanren@tencent.com>
Message-Id: <1561468699-9819-3-git-send-email-ivanren@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
Remove spurious not needed bits
---
migration/ram.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/migration/ram.c b/migration/ram.c
index 52a2d498e4..87bb7da8e2 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1097,7 +1097,8 @@ static void *multifd_send_thread(void *opaque)
{
MultiFDSendParams *p = opaque;
Error *local_err = NULL;
- int ret;
+ int ret = 0;
+ uint32_t flags = 0;
trace_multifd_send_thread_start(p->id);
rcu_register_thread();
@@ -1115,7 +1116,7 @@ static void *multifd_send_thread(void *opaque)
if (p->pending_job) {
uint32_t used = p->pages->used;
uint64_t packet_num = p->packet_num;
- uint32_t flags = p->flags;
+ flags = p->flags;
p->next_packet_size = used * qemu_target_page_size();
multifd_send_fill_packet(p);
@@ -1164,6 +1165,17 @@ out:
multifd_send_terminate_threads(local_err);
}
+ /*
+ * Error happen, I will exit, but I can't just leave, tell
+ * who pay attention to me.
+ */
+ if (ret != 0) {
+ if (flags & MULTIFD_FLAG_SYNC) {
+ qemu_sem_post(&multifd_send_state->sem_sync);
+ }
+ qemu_sem_post(&multifd_send_state->channels_ready);
+ }
+
qemu_mutex_lock(&p->mutex);
p->running = false;
qemu_mutex_unlock(&p->mutex);
--
2.21.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PULL 3/4] migration: Make explicit that we are quitting multifd
2019-07-25 10:57 ` [Qemu-devel] " Juan Quintela
@ 2019-07-25 10:57 ` Juan Quintela
-1 siblings, 0 replies; 12+ messages in thread
From: Juan Quintela @ 2019-07-25 10:57 UTC (permalink / raw)
To: qemu-devel
Cc: kvm, Thomas Huth, Dr. David Alan Gilbert, Laurent Vivier,
Juan Quintela, Paolo Bonzini, Richard Henderson
We add a bool to indicate that.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
migration/ram.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/migration/ram.c b/migration/ram.c
index 87bb7da8e2..eb6716710e 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -677,6 +677,8 @@ typedef struct {
QemuMutex mutex;
/* is this channel thread running */
bool running;
+ /* should this thread finish */
+ bool quit;
/* array of pages to receive */
MultiFDPages_t *pages;
/* packet allocated len */
@@ -1266,6 +1268,7 @@ static void multifd_recv_terminate_threads(Error *err)
MultiFDRecvParams *p = &multifd_recv_state->params[i];
qemu_mutex_lock(&p->mutex);
+ p->quit = true;
/* We could arrive here for two reasons:
- normal quit, i.e. everything went fine, just finished
- error quit: We close the channels so the channel threads
@@ -1288,6 +1291,7 @@ int multifd_load_cleanup(Error **errp)
MultiFDRecvParams *p = &multifd_recv_state->params[i];
if (p->running) {
+ p->quit = true;
qemu_thread_join(&p->thread);
}
object_unref(OBJECT(p->c));
@@ -1351,6 +1355,10 @@ static void *multifd_recv_thread(void *opaque)
uint32_t used;
uint32_t flags;
+ if (p->quit) {
+ break;
+ }
+
ret = qio_channel_read_all_eof(p->c, (void *)p->packet,
p->packet_len, &local_err);
if (ret == 0) { /* EOF */
@@ -1422,6 +1430,7 @@ int multifd_load_setup(void)
qemu_mutex_init(&p->mutex);
qemu_sem_init(&p->sem_sync, 0);
+ p->quit = false;
p->id = i;
p->pages = multifd_pages_init(page_count);
p->packet_len = sizeof(MultiFDPacket_t)
--
2.21.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [Qemu-devel] [PULL 3/4] migration: Make explicit that we are quitting multifd
@ 2019-07-25 10:57 ` Juan Quintela
0 siblings, 0 replies; 12+ messages in thread
From: Juan Quintela @ 2019-07-25 10:57 UTC (permalink / raw)
To: qemu-devel
Cc: Laurent Vivier, Thomas Huth, kvm, Juan Quintela,
Dr. David Alan Gilbert, Paolo Bonzini, Richard Henderson
We add a bool to indicate that.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
migration/ram.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/migration/ram.c b/migration/ram.c
index 87bb7da8e2..eb6716710e 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -677,6 +677,8 @@ typedef struct {
QemuMutex mutex;
/* is this channel thread running */
bool running;
+ /* should this thread finish */
+ bool quit;
/* array of pages to receive */
MultiFDPages_t *pages;
/* packet allocated len */
@@ -1266,6 +1268,7 @@ static void multifd_recv_terminate_threads(Error *err)
MultiFDRecvParams *p = &multifd_recv_state->params[i];
qemu_mutex_lock(&p->mutex);
+ p->quit = true;
/* We could arrive here for two reasons:
- normal quit, i.e. everything went fine, just finished
- error quit: We close the channels so the channel threads
@@ -1288,6 +1291,7 @@ int multifd_load_cleanup(Error **errp)
MultiFDRecvParams *p = &multifd_recv_state->params[i];
if (p->running) {
+ p->quit = true;
qemu_thread_join(&p->thread);
}
object_unref(OBJECT(p->c));
@@ -1351,6 +1355,10 @@ static void *multifd_recv_thread(void *opaque)
uint32_t used;
uint32_t flags;
+ if (p->quit) {
+ break;
+ }
+
ret = qio_channel_read_all_eof(p->c, (void *)p->packet,
p->packet_len, &local_err);
if (ret == 0) { /* EOF */
@@ -1422,6 +1430,7 @@ int multifd_load_setup(void)
qemu_mutex_init(&p->mutex);
qemu_sem_init(&p->sem_sync, 0);
+ p->quit = false;
p->id = i;
p->pages = multifd_pages_init(page_count);
p->packet_len = sizeof(MultiFDPacket_t)
--
2.21.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PULL 4/4] migration: fix migrate_cancel multifd migration leads destination hung forever
2019-07-25 10:57 ` [Qemu-devel] " Juan Quintela
@ 2019-07-25 10:57 ` Juan Quintela
-1 siblings, 0 replies; 12+ messages in thread
From: Juan Quintela @ 2019-07-25 10:57 UTC (permalink / raw)
To: qemu-devel
Cc: kvm, Thomas Huth, Dr. David Alan Gilbert, Laurent Vivier,
Juan Quintela, Paolo Bonzini, Richard Henderson, Ivan Ren,
Ivan Ren
From: Ivan Ren <renyime@gmail.com>
When migrate_cancel a multifd migration, if run sequence like this:
[source] [destination]
multifd_send_sync_main[finish]
multifd_recv_thread wait &p->sem_sync
shutdown to_dst_file
detect error from_src_file
send RAM_SAVE_FLAG_EOS[fail] [no chance to run multifd_recv_sync_main]
multifd_load_cleanup
join multifd receive thread forever
will lead destination qemu hung at following stack:
pthread_join
qemu_thread_join
multifd_load_cleanup
process_incoming_migration_co
coroutine_trampoline
Signed-off-by: Ivan Ren <ivanren@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <1561468699-9819-4-git-send-email-ivanren@tencent.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
migration/ram.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/migration/ram.c b/migration/ram.c
index eb6716710e..889148dd84 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1292,6 +1292,11 @@ int multifd_load_cleanup(Error **errp)
if (p->running) {
p->quit = true;
+ /*
+ * multifd_recv_thread may hung at MULTIFD_FLAG_SYNC handle code,
+ * however try to wakeup it without harm in cleanup phase.
+ */
+ qemu_sem_post(&p->sem_sync);
qemu_thread_join(&p->thread);
}
object_unref(OBJECT(p->c));
--
2.21.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [Qemu-devel] [PULL 4/4] migration: fix migrate_cancel multifd migration leads destination hung forever
@ 2019-07-25 10:57 ` Juan Quintela
0 siblings, 0 replies; 12+ messages in thread
From: Juan Quintela @ 2019-07-25 10:57 UTC (permalink / raw)
To: qemu-devel
Cc: Laurent Vivier, Thomas Huth, kvm, Juan Quintela,
Dr. David Alan Gilbert, Ivan Ren, Paolo Bonzini, Ivan Ren,
Richard Henderson
From: Ivan Ren <renyime@gmail.com>
When migrate_cancel a multifd migration, if run sequence like this:
[source] [destination]
multifd_send_sync_main[finish]
multifd_recv_thread wait &p->sem_sync
shutdown to_dst_file
detect error from_src_file
send RAM_SAVE_FLAG_EOS[fail] [no chance to run multifd_recv_sync_main]
multifd_load_cleanup
join multifd receive thread forever
will lead destination qemu hung at following stack:
pthread_join
qemu_thread_join
multifd_load_cleanup
process_incoming_migration_co
coroutine_trampoline
Signed-off-by: Ivan Ren <ivanren@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <1561468699-9819-4-git-send-email-ivanren@tencent.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
migration/ram.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/migration/ram.c b/migration/ram.c
index eb6716710e..889148dd84 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1292,6 +1292,11 @@ int multifd_load_cleanup(Error **errp)
if (p->running) {
p->quit = true;
+ /*
+ * multifd_recv_thread may hung at MULTIFD_FLAG_SYNC handle code,
+ * however try to wakeup it without harm in cleanup phase.
+ */
+ qemu_sem_post(&p->sem_sync);
qemu_thread_join(&p->thread);
}
object_unref(OBJECT(p->c));
--
2.21.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] [PULL 0/4] Migration patches
2019-07-25 10:57 ` [Qemu-devel] " Juan Quintela
@ 2019-07-25 13:45 ` Peter Maydell
-1 siblings, 0 replies; 12+ messages in thread
From: Peter Maydell @ 2019-07-25 13:45 UTC (permalink / raw)
To: Juan Quintela
Cc: QEMU Developers, Laurent Vivier, Thomas Huth, kvm-devel,
Dr. David Alan Gilbert, Paolo Bonzini, Richard Henderson
On Thu, 25 Jul 2019 at 11:57, Juan Quintela <quintela@redhat.com> wrote:
>
> The following changes since commit bf8b024372bf8abf5a9f40bfa65eeefad23ff988:
>
> Update version for v4.1.0-rc2 release (2019-07-23 18:28:08 +0100)
>
> are available in the Git repository at:
>
> https://github.com/juanquintela/qemu.git tags/migration-pull-request
>
> for you to fetch changes up to f193bc0c5342496ce07355c0c30394560a7f4738:
>
> migration: fix migrate_cancel multifd migration leads destination hung forever (2019-07-24 14:47:21 +0200)
>
> ----------------------------------------------------------------
> Migration pull request
>
> This series fixes problems with migration-cancel while using multifd.
> In some cases it can hang waiting in a semaphore.
>
> Please apply.
>
> ----------------------------------------------------------------
>
> Ivan Ren (3):
> migration: fix migrate_cancel leads live_migration thread endless loop
> migration: fix migrate_cancel leads live_migration thread hung forever
> migration: fix migrate_cancel multifd migration leads destination hung
> forever
>
> Juan Quintela (1):
> migration: Make explicit that we are quitting multifd
>
> migration/ram.c | 66 ++++++++++++++++++++++++++++++++++++++++++-------
> 1 file changed, 57 insertions(+), 9 deletions(-)
Applied, thanks.
Please update the changelog at https://wiki.qemu.org/ChangeLog/4.1
for any user-visible changes.
-- PMM
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] [PULL 0/4] Migration patches
@ 2019-07-25 13:45 ` Peter Maydell
0 siblings, 0 replies; 12+ messages in thread
From: Peter Maydell @ 2019-07-25 13:45 UTC (permalink / raw)
To: Juan Quintela
Cc: Laurent Vivier, Thomas Huth, kvm-devel, QEMU Developers,
Dr. David Alan Gilbert, Paolo Bonzini, Richard Henderson
On Thu, 25 Jul 2019 at 11:57, Juan Quintela <quintela@redhat.com> wrote:
>
> The following changes since commit bf8b024372bf8abf5a9f40bfa65eeefad23ff988:
>
> Update version for v4.1.0-rc2 release (2019-07-23 18:28:08 +0100)
>
> are available in the Git repository at:
>
> https://github.com/juanquintela/qemu.git tags/migration-pull-request
>
> for you to fetch changes up to f193bc0c5342496ce07355c0c30394560a7f4738:
>
> migration: fix migrate_cancel multifd migration leads destination hung forever (2019-07-24 14:47:21 +0200)
>
> ----------------------------------------------------------------
> Migration pull request
>
> This series fixes problems with migration-cancel while using multifd.
> In some cases it can hang waiting in a semaphore.
>
> Please apply.
>
> ----------------------------------------------------------------
>
> Ivan Ren (3):
> migration: fix migrate_cancel leads live_migration thread endless loop
> migration: fix migrate_cancel leads live_migration thread hung forever
> migration: fix migrate_cancel multifd migration leads destination hung
> forever
>
> Juan Quintela (1):
> migration: Make explicit that we are quitting multifd
>
> migration/ram.c | 66 ++++++++++++++++++++++++++++++++++++++++++-------
> 1 file changed, 57 insertions(+), 9 deletions(-)
Applied, thanks.
Please update the changelog at https://wiki.qemu.org/ChangeLog/4.1
for any user-visible changes.
-- PMM
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2019-07-25 13:45 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-25 10:57 [PULL 0/4] Migration patches Juan Quintela
2019-07-25 10:57 ` [Qemu-devel] " Juan Quintela
2019-07-25 10:57 ` [PULL 1/4] migration: fix migrate_cancel leads live_migration thread endless loop Juan Quintela
2019-07-25 10:57 ` [Qemu-devel] " Juan Quintela
2019-07-25 10:57 ` [PULL 2/4] migration: fix migrate_cancel leads live_migration thread hung forever Juan Quintela
2019-07-25 10:57 ` [Qemu-devel] " Juan Quintela
2019-07-25 10:57 ` [PULL 3/4] migration: Make explicit that we are quitting multifd Juan Quintela
2019-07-25 10:57 ` [Qemu-devel] " Juan Quintela
2019-07-25 10:57 ` [PULL 4/4] migration: fix migrate_cancel multifd migration leads destination hung forever Juan Quintela
2019-07-25 10:57 ` [Qemu-devel] " Juan Quintela
2019-07-25 13:45 ` [Qemu-devel] [PULL 0/4] Migration patches Peter Maydell
2019-07-25 13:45 ` Peter Maydell
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.