* [Qemu-devel] [PATCH v3 0/2] move qcow2_invalidate_cache() out of coroutine context
@ 2016-02-23 15:49 Denis V. Lunev
  2016-02-23 15:49 ` [Qemu-devel] [PATCH 1/2] migration (ordinary): move bdrv_invalidate_cache_all of " Denis V. Lunev
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Denis V. Lunev @ 2016-02-23 15:49 UTC (permalink / raw)
  Cc: Amit Shah, Denis V. Lunev, Juan Quintela, qemu-devel, Paolo Bonzini

It is possible to hit an assertion in qcow2_get_specific_info() because
s->qcow_version is undefined. This happens when the VM is starting from
a suspended state, i.e. it is processing an incoming migration, and
'info block' is called at the same time.

The problem is that qcow2_invalidate_cache() closes the image and
memset()s BDRVQcowState in the middle.

This operation should not be performed in coroutine context.
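
For illustration, a minimal sketch of the pattern both patches follow
(names other than the QEMUBH and block-layer calls are made up; this is
not the actual patch): the coroutine schedules a bottom half, and the
invalidation then runs from the main loop, outside of any coroutine.

  #include "qemu/osdep.h"
  #include "qemu/main-loop.h"
  #include "qemu/error-report.h"
  #include "qapi/error.h"
  #include "block/block.h"

  /* Runs from the main loop, outside coroutine context */
  static void invalidate_cache_bh(void *opaque)
  {
      Error *local_err = NULL;

      bdrv_invalidate_cache_all(&local_err);
      if (local_err) {
          error_report_err(local_err);
      }
  }

  /* Called from coroutine context: defer the work instead of doing it here */
  static void schedule_invalidate(void)
  {
      QEMUBH *bh = qemu_bh_new(invalidate_cache_bh, NULL);
      qemu_bh_schedule(bh);
  }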

Changes from v2:
- subject lines in patches

Changes from v1:
- fixed spelling. Eric, thank you for spell checking

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Juan Quintela <quintela@redhat.com>
CC: Amit Shah <amit.shah@redhat.com>


* [Qemu-devel] [PATCH 1/2] migration (ordinary): move bdrv_invalidate_cache_all out of coroutine context
  2016-02-23 15:49 [Qemu-devel] [PATCH v3 0/2] move qcow2_invalidate_cache() out of coroutine context Denis V. Lunev
@ 2016-02-23 15:49 ` Denis V. Lunev
  2016-02-23 15:49 ` [Qemu-devel] [PATCH 2/2] migration (postcopy): " Denis V. Lunev
  2016-02-24  8:22 ` [Qemu-devel] [PATCH v3 0/2] move qcow2_invalidate_cache() out " Amit Shah
  2 siblings, 0 replies; 5+ messages in thread
From: Denis V. Lunev @ 2016-02-23 15:49 UTC (permalink / raw)
  Cc: Amit Shah, Denis V. Lunev, Juan Quintela, qemu-devel, Paolo Bonzini

It is possible to hit an assertion in qcow2_get_specific_info() because
s->qcow_version is undefined. This happens when the VM is starting from
a suspended state, i.e. it is processing an incoming migration, and
'info block' is called at the same time.

The problem is that qcow2_invalidate_cache() closes the image and
memset()s BDRVQcowState in the middle.

To avoid this, the patch moves the bdrv_invalidate_cache_all() call out
of coroutine context for standard migration.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Juan Quintela <quintela@redhat.com>
CC: Amit Shah <amit.shah@redhat.com>
---
 migration/migration.c | 89 ++++++++++++++++++++++++++++-----------------------
 1 file changed, 49 insertions(+), 40 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index a64cfcd..1f8535e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -323,13 +323,59 @@ void qemu_start_incoming_migration(const char *uri, Error **errp)
     }
 }
 
+static void process_incoming_migration_bh(void *opaque)
+{
+    Error *local_err = NULL;
+    MigrationIncomingState *mis = opaque;
+
+    /* Make sure all file formats flush their mutable metadata */
+    bdrv_invalidate_cache_all(&local_err);
+    if (local_err) {
+        migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
+                          MIGRATION_STATUS_FAILED);
+        error_report_err(local_err);
+        migrate_decompress_threads_join();
+        exit(EXIT_FAILURE);
+    }
+
+    /*
+     * This must happen after all error conditions are dealt with and
+     * we're sure the VM is going to be running on this host.
+     */
+    qemu_announce_self();
+
+    /* If global state section was not received or we are in running
+       state, we need to obey autostart. Any other state is set with
+       runstate_set. */
+
+    if (!global_state_received() ||
+        global_state_get_runstate() == RUN_STATE_RUNNING) {
+        if (autostart) {
+            vm_start();
+        } else {
+            runstate_set(RUN_STATE_PAUSED);
+        }
+    } else {
+        runstate_set(global_state_get_runstate());
+    }
+    migrate_decompress_threads_join();
+    /*
+     * This must happen after any state changes since as soon as an external
+     * observer sees this event they might start to prod at the VM assuming
+     * it's ready to use.
+     */
+    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
+                      MIGRATION_STATUS_COMPLETED);
+    migration_incoming_state_destroy();
+}
+
 static void process_incoming_migration_co(void *opaque)
 {
     QEMUFile *f = opaque;
-    Error *local_err = NULL;
     MigrationIncomingState *mis;
     PostcopyState ps;
     int ret;
+    QEMUBH *bh;
 
     mis = migration_incoming_state_new(f);
     postcopy_state_set(POSTCOPY_INCOMING_NONE);
@@ -369,45 +415,8 @@ static void process_incoming_migration_co(void *opaque)
         exit(EXIT_FAILURE);
     }
 
-    /* Make sure all file formats flush their mutable metadata */
-    bdrv_invalidate_cache_all(&local_err);
-    if (local_err) {
-        migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
-                          MIGRATION_STATUS_FAILED);
-        error_report_err(local_err);
-        migrate_decompress_threads_join();
-        exit(EXIT_FAILURE);
-    }
-
-    /*
-     * This must happen after all error conditions are dealt with and
-     * we're sure the VM is going to be running on this host.
-     */
-    qemu_announce_self();
-
-    /* If global state section was not received or we are in running
-       state, we need to obey autostart. Any other state is set with
-       runstate_set. */
-
-    if (!global_state_received() ||
-        global_state_get_runstate() == RUN_STATE_RUNNING) {
-        if (autostart) {
-            vm_start();
-        } else {
-            runstate_set(RUN_STATE_PAUSED);
-        }
-    } else {
-        runstate_set(global_state_get_runstate());
-    }
-    migrate_decompress_threads_join();
-    /*
-     * This must happen after any state changes since as soon as an external
-     * observer sees this event they might start to prod at the VM assuming
-     * it's ready to use.
-     */
-    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
-                      MIGRATION_STATUS_COMPLETED);
-    migration_incoming_state_destroy();
+    bh = qemu_bh_new(process_incoming_migration_bh, mis);
+    qemu_bh_schedule(bh);
 }
 
 void process_incoming_migration(QEMUFile *f)
-- 
2.1.4


* [Qemu-devel] [PATCH 2/2] migration (postcopy): move bdrv_invalidate_cache_all out of coroutine context
  2016-02-23 15:49 [Qemu-devel] [PATCH v3 0/2] move qcow2_invalidate_cache() out of coroutine context Denis V. Lunev
  2016-02-23 15:49 ` [Qemu-devel] [PATCH 1/2] migration (ordinary): move bdrv_invalidate_cache_all of " Denis V. Lunev
@ 2016-02-23 15:49 ` Denis V. Lunev
  2016-02-24  8:22 ` [Qemu-devel] [PATCH v3 0/2] move qcow2_invalidate_cache() out " Amit Shah
  2 siblings, 0 replies; 5+ messages in thread
From: Denis V. Lunev @ 2016-02-23 15:49 UTC (permalink / raw)
  Cc: Amit Shah, Denis V. Lunev, Juan Quintela, qemu-devel, Paolo Bonzini

It is possible to hit an assertion in qcow2_get_specific_info() because
s->qcow_version is undefined. This happens when the VM is starting from
a suspended state, i.e. it is processing an incoming migration, and
'info block' is called at the same time.

The problem is that qcow2_invalidate_cache() closes the image and
memset()s BDRVQcowState in the middle.

To avoid this, the patch moves the bdrv_invalidate_cache_all() call out
of coroutine context for postcopy migration. The function is called with
the following stack:
  process_incoming_migration_co
  qemu_loadvm_state
  qemu_loadvm_state_main
  loadvm_process_command
  loadvm_postcopy_handle_run

Signed-off-by: Denis V. Lunev <den@openvz.org>
Tested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Juan Quintela <quintela@redhat.com>
CC: Amit Shah <amit.shah@redhat.com>
---
 migration/savevm.c | 27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 94f2894..8415fd9 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1496,18 +1496,10 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
     return 0;
 }
 
-/* After all discards we can start running and asking for pages */
-static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
+static void loadvm_postcopy_handle_run_bh(void *opaque)
 {
-    PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_RUNNING);
     Error *local_err = NULL;
 
-    trace_loadvm_postcopy_handle_run();
-    if (ps != POSTCOPY_INCOMING_LISTENING) {
-        error_report("CMD_POSTCOPY_RUN in wrong postcopy state (%d)", ps);
-        return -1;
-    }
-
     /* TODO we should move all of this lot into postcopy_ram.c or a shared code
      * in migration.c
      */
@@ -1519,7 +1511,6 @@ static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
     bdrv_invalidate_cache_all(&local_err);
     if (local_err) {
         error_report_err(local_err);
-        return -1;
     }
 
     trace_loadvm_postcopy_handle_run_cpu_sync();
@@ -1534,6 +1525,22 @@ static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
         /* leave it paused and let management decide when to start the CPU */
         runstate_set(RUN_STATE_PAUSED);
     }
+}
+
+/* After all discards we can start running and asking for pages */
+static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
+{
+    PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_RUNNING);
+    QEMUBH *bh;
+
+    trace_loadvm_postcopy_handle_run();
+    if (ps != POSTCOPY_INCOMING_LISTENING) {
+        error_report("CMD_POSTCOPY_RUN in wrong postcopy state (%d)", ps);
+        return -1;
+    }
+
+    bh = qemu_bh_new(loadvm_postcopy_handle_run_bh, NULL);
+    qemu_bh_schedule(bh);
 
     /* We need to finish reading the stream from the package
      * and also stop reading anything more from the stream that loaded the
-- 
2.1.4


* Re: [Qemu-devel] [PATCH v3 0/2] move qcow2_invalidate_cache() out of coroutine context
  2016-02-23 15:49 [Qemu-devel] [PATCH v3 0/2] move qcow2_invalidate_cache() out of coroutine context Denis V. Lunev
  2016-02-23 15:49 ` [Qemu-devel] [PATCH 1/2] migration (ordinary): move bdrv_invalidate_cache_all of " Denis V. Lunev
  2016-02-23 15:49 ` [Qemu-devel] [PATCH 2/2] migration (postcopy): " Denis V. Lunev
@ 2016-02-24  8:22 ` Amit Shah
  2016-02-24  8:32   ` Denis V. Lunev
  2 siblings, 1 reply; 5+ messages in thread
From: Amit Shah @ 2016-02-24  8:22 UTC (permalink / raw)
  To: Denis V. Lunev; +Cc: Paolo Bonzini, Fam Zheng, qemu-devel, Juan Quintela

On (Tue) 23 Feb 2016 [18:49:00], Denis V. Lunev wrote:
> It is possible to hit an assertion in qcow2_get_specific_info() because
> s->qcow_version is undefined. This happens when the VM is starting from
> a suspended state, i.e. it is processing an incoming migration, and
> 'info block' is called at the same time.
> 
> The problem is that qcow2_invalidate_cache() closes the image and
> memset()s BDRVQcowState in the middle.
> 
> This operation should not be performed in coroutine context.
> 
> Changes from v2:
> - subject lines in patches

Denis, did you see the comment by Fam to your patches?

		Amit


* Re: [Qemu-devel] [PATCH v3 0/2] move qcow2_invalidate_cache() out of coroutine context
  2016-02-24  8:22 ` [Qemu-devel] [PATCH v3 0/2] move qcow2_invalidate_cache() out " Amit Shah
@ 2016-02-24  8:32   ` Denis V. Lunev
  0 siblings, 0 replies; 5+ messages in thread
From: Denis V. Lunev @ 2016-02-24  8:32 UTC (permalink / raw)
  To: Amit Shah; +Cc: Paolo Bonzini, Fam Zheng, qemu-devel, Juan Quintela

On 02/24/2016 11:22 AM, Amit Shah wrote:
> On (Tue) 23 Feb 2016 [18:49:00], Denis V. Lunev wrote:
>> It is possible to hit an assertion in qcow2_get_specific_info() because
>> s->qcow_version is undefined. This happens when the VM is starting from
>> a suspended state, i.e. it is processing an incoming migration, and
>> 'info block' is called at the same time.
>>
>> The problem is that qcow2_invalidate_cache() closes the image and
>> memset()s BDRVQcowState in the middle.
>>
>> This operation should not be performed in coroutine context.
>>
>> Changes from v2:
>> - subject lines in patches
> Denis, did you see the comment by Fam to your patches?
>
> 		Amit
Oops, I have seen it but forgot about it :( In a perfect world this
should be fixed, though this code is not called frequently and the
amount of data lost is not that big.

OK, I'll rework this. Sorry :(
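
(For illustration only, assuming the concern is the QEMUBH allocated
with qemu_bh_new() never being deleted: one way to rework it is to keep
the pointer around and delete the bottom half from its own callback
once it has fired. The names below are assumptions, not the actual
rework.)

  static QEMUBH *run_bh;   /* illustrative storage for the pending BH */

  static void loadvm_postcopy_handle_run_bh(void *opaque)
  {
      /* Deleting a bottom half from inside its own callback is safe */
      qemu_bh_delete(run_bh);
      run_bh = NULL;

      /* ... the body moved here by patch 2 ... */
  }

  /* in loadvm_postcopy_handle_run(): */
      run_bh = qemu_bh_new(loadvm_postcopy_handle_run_bh, NULL);
      qemu_bh_schedule(run_bh);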


