All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/2] Postcopy fixes
@ 2017-02-02 15:59 Dr. David Alan Gilbert (git)
  2017-02-02 15:59 ` [Qemu-devel] [PATCH 1/2] Postcopy: Reset state to avoid cleanup assert Dr. David Alan Gilbert (git)
  2017-02-02 15:59 ` [Qemu-devel] [PATCH 2/2] postcopy: Recover block devices on early failure Dr. David Alan Gilbert (git)
  0 siblings, 2 replies; 5+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-02-02 15:59 UTC (permalink / raw)
  To: qemu-devel, quintela, amit.shah; +Cc: zhang.zhanghailiang

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Hi,
  These are a couple of postcopy fixes for failure paths;
The first is a pretty harmless assert, it happens on the destination
when it discovers that it's got an incoming postcopy but
it can't support it - it was going to exit anyway.

The second is more useful and fixes an assert on the source
in the small window where a postcopy that's starting up fails
and tries to recover the source.

Dave

Dr. David Alan Gilbert (2):
  Postcopy: Reset state to avoid cleanup assert
  postcopy: Recover block devices on early failure

 migration/migration.c | 25 +++++++++++++++++++++++++
 migration/savevm.c    |  1 +
 2 files changed, 26 insertions(+)

-- 
2.9.3

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Qemu-devel] [PATCH 1/2] Postcopy: Reset state to avoid cleanup assert
  2017-02-02 15:59 [Qemu-devel] [PATCH 0/2] Postcopy fixes Dr. David Alan Gilbert (git)
@ 2017-02-02 15:59 ` Dr. David Alan Gilbert (git)
  2017-02-02 16:45   ` Juan Quintela
  2017-02-02 15:59 ` [Qemu-devel] [PATCH 2/2] postcopy: Recover block devices on early failure Dr. David Alan Gilbert (git)
  1 sibling, 1 reply; 5+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-02-02 15:59 UTC (permalink / raw)
  To: qemu-devel, quintela, amit.shah; +Cc: zhang.zhanghailiang

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

On a destination host with no userfault support an incoming
postcopy would cause the state to enter ADVISE before
it realised there was no support, and because it was in ADVISE
state it would perform a cleanup at the end.  Since there
was no support the cleanup function should be unreachable,
but ends up being called and asserting.

Reset the state when we realise we have no support, thus the
cleanup doesn't happen.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/savevm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index e8e5ff5..de86db0 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1355,6 +1355,7 @@ static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis)
     }
 
     if (!postcopy_ram_supported_by_host()) {
+        postcopy_state_set(POSTCOPY_INCOMING_NONE);
         return -1;
     }
 
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [Qemu-devel] [PATCH 2/2] postcopy: Recover block devices on early failure
  2017-02-02 15:59 [Qemu-devel] [PATCH 0/2] Postcopy fixes Dr. David Alan Gilbert (git)
  2017-02-02 15:59 ` [Qemu-devel] [PATCH 1/2] Postcopy: Reset state to avoid cleanup assert Dr. David Alan Gilbert (git)
@ 2017-02-02 15:59 ` Dr. David Alan Gilbert (git)
  2017-02-02 16:45   ` Juan Quintela
  1 sibling, 1 reply; 5+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-02-02 15:59 UTC (permalink / raw)
  To: qemu-devel, quintela, amit.shah; +Cc: zhang.zhanghailiang

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

An early postcopy failure can be recovered from as long as we know
we haven't sent the command to run the destination.
We have to undo the bdrv_inactivate_all by calling
bdrv_invalidate_cache_all

Note that I'm not using ms->block_inactive because once we've
sent the postcopy package we dont want anything else to try
and recover the block storage on the source; the destination
might have started writing to it.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/migration.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 2766d2f..283677c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1605,6 +1605,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
     QIOChannelBuffer *bioc;
     QEMUFile *fb;
     int64_t time_at_stop = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+    bool restart_block = false;
     migrate_set_state(&ms->state, MIGRATION_STATUS_ACTIVE,
                       MIGRATION_STATUS_POSTCOPY_ACTIVE);
 
@@ -1624,6 +1625,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
     if (ret < 0) {
         goto fail;
     }
+    restart_block = true;
 
     /*
      * Cause any non-postcopiable, but iterative devices to
@@ -1680,6 +1682,18 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
 
     /* <><> end of stuff going into the package */
 
+    /* Last point of recovery; as soon as we send the package the destination
+     * can open devices and potentially start running.
+     * Lets just check again we've not got any errors.
+     */
+    ret = qemu_file_get_error(ms->to_dst_file);
+    if (ret) {
+        error_report("postcopy_start: Migration stream errored (pre package)");
+        goto fail_closefb;
+    }
+
+    restart_block = false;
+
     /* Now send that blob */
     if (qemu_savevm_send_packaged(ms->to_dst_file, bioc->data, bioc->usage)) {
         goto fail_closefb;
@@ -1717,6 +1731,17 @@ fail_closefb:
 fail:
     migrate_set_state(&ms->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
                           MIGRATION_STATUS_FAILED);
+    if (restart_block) {
+        /* A failure happened early enough that we know the destination hasn't
+         * accessed block devices, so we're safe to recover.
+         */
+        Error *local_err = NULL;
+
+        bdrv_invalidate_cache_all(&local_err);
+        if (local_err) {
+            error_report_err(local_err);
+        }
+    }
     qemu_mutex_unlock_iothread();
     return -1;
 }
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] Postcopy: Reset state to avoid cleanup assert
  2017-02-02 15:59 ` [Qemu-devel] [PATCH 1/2] Postcopy: Reset state to avoid cleanup assert Dr. David Alan Gilbert (git)
@ 2017-02-02 16:45   ` Juan Quintela
  0 siblings, 0 replies; 5+ messages in thread
From: Juan Quintela @ 2017-02-02 16:45 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, amit.shah, zhang.zhanghailiang

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> On a destination host with no userfault support an incoming
> postcopy would cause the state to enter ADVISE before
> it realised there was no support, and because it was in ADVISE
> state it would perform a cleanup at the end.  Since there
> was no support the cleanup function should be unreachable,
> but ends up being called and asserting.
>
> Reset the state when we realise we have no support, thus the
> cleanup doesn't happen.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2] postcopy: Recover block devices on early failure
  2017-02-02 15:59 ` [Qemu-devel] [PATCH 2/2] postcopy: Recover block devices on early failure Dr. David Alan Gilbert (git)
@ 2017-02-02 16:45   ` Juan Quintela
  0 siblings, 0 replies; 5+ messages in thread
From: Juan Quintela @ 2017-02-02 16:45 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, amit.shah, zhang.zhanghailiang

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> An early postcopy failure can be recovered from as long as we know
> we haven't sent the command to run the destination.
> We have to undo the bdrv_inactivate_all by calling
> bdrv_invalidate_cache_all
>
> Note that I'm not using ms->block_inactive because once we've
> sent the postcopy package we dont want anything else to try
> and recover the block storage on the source; the destination
> might have started writing to it.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-02-02 16:45 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-02 15:59 [Qemu-devel] [PATCH 0/2] Postcopy fixes Dr. David Alan Gilbert (git)
2017-02-02 15:59 ` [Qemu-devel] [PATCH 1/2] Postcopy: Reset state to avoid cleanup assert Dr. David Alan Gilbert (git)
2017-02-02 16:45   ` Juan Quintela
2017-02-02 15:59 ` [Qemu-devel] [PATCH 2/2] postcopy: Recover block devices on early failure Dr. David Alan Gilbert (git)
2017-02-02 16:45   ` Juan Quintela

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.