qemu-devel.nongnu.org archive mirror
* [PATCH 0/2] migration: Two fixes around yank and postcopy recovery
@ 2021-06-29 18:13 Peter Xu
  2021-06-29 18:13 ` [PATCH 1/2] migration: Move yank outside qemu_start_incoming_migration() Peter Xu
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Peter Xu @ 2021-06-29 18:13 UTC (permalink / raw)
  To: qemu-devel
  Cc: peterx, Lukas Straub, Leonardo Bras Soares Passos,
	Dr . David Alan Gilbert, Juan Quintela

The 1st patch should fix the unregistering of the yank instance; I think it
should also fix the issue that Leonardo was addressing in this patch:

https://lore.kernel.org/qemu-devel/20210629050522.147057-1-leobras@redhat.com/

The 2nd patch fixes postcopy recovery being unable to retry if, e.g., the 1st
attempt provided a wrong port address.

Note that the multifd zstd test may fail if migration-test is run with sudo on
master (which seems to be a known issue now), and it'll still fail after these
two patches are applied; all the other tests behave as usual.

(Leo: please let me know if this series doesn't fix the issue you were fixing)

Please review, thanks.

Peter Xu (2):
  migration: Move yank outside qemu_start_incoming_migration()
  migration: Allow reset of postcopy_recover_triggered when failed

 migration/migration.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

-- 
2.31.1





* [PATCH 1/2] migration: Move yank outside qemu_start_incoming_migration()
  2021-06-29 18:13 [PATCH 0/2] migration: Two fixes around yank and postcopy recovery Peter Xu
@ 2021-06-29 18:13 ` Peter Xu
  2021-06-30 15:33   ` Dr. David Alan Gilbert
  2021-06-29 18:13 ` [PATCH 2/2] migration: Allow reset of postcopy_recover_triggered when failed Peter Xu
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Peter Xu @ 2021-06-29 18:13 UTC (permalink / raw)
  To: qemu-devel
  Cc: peterx, Lukas Straub, Leonardo Bras Soares Passos,
	Dr . David Alan Gilbert, Juan Quintela

Starting from commit b5eea99ec2f5c, qmp_migrate_recover() calls unregister
before calling qemu_start_incoming_migration(). I believe the intent was to
balance the yank_register_instance() call that follows inside
qemu_start_incoming_migration(), but I think that's wrong.

Firstly, during recovery we should keep the yank instance registered, rather
than "quickly removing and adding it back".

Meanwhile, calling qmp_migrate_recover() twice with b5eea99ec2f5c will directly
crash the destination QEMU (right now a second call isn't possible, but it
will be right after the next patch): the 1st call of qmp_migrate_recover()
unregisters the instance permanently when the channel fails to establish, and
then the 2nd call of qmp_migrate_recover() crashes at
yank_unregister_instance().

This patch fixes it by moving the yank ops out of
qemu_start_incoming_migration() into qmp_migrate_incoming().  For
qmp_migrate_recover(), drop the unregister of the yank instance too, since we
keep it registered during the recovery phase.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 4228635d18..1bb03d1eca 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -456,10 +456,6 @@ static void qemu_start_incoming_migration(const char *uri, Error **errp)
 {
     const char *p = NULL;
 
-    if (!yank_register_instance(MIGRATION_YANK_INSTANCE, errp)) {
-        return;
-    }
-
     qapi_event_send_migration(MIGRATION_STATUS_SETUP);
     if (strstart(uri, "tcp:", &p) ||
         strstart(uri, "unix:", NULL) ||
@@ -474,7 +470,6 @@ static void qemu_start_incoming_migration(const char *uri, Error **errp)
     } else if (strstart(uri, "fd:", &p)) {
         fd_start_incoming_migration(p, errp);
     } else {
-        yank_unregister_instance(MIGRATION_YANK_INSTANCE);
         error_setg(errp, "unknown migration protocol: %s", uri);
     }
 }
@@ -2083,9 +2078,14 @@ void qmp_migrate_incoming(const char *uri, Error **errp)
         return;
     }
 
+    if (!yank_register_instance(MIGRATION_YANK_INSTANCE, errp)) {
+        return;
+    }
+
     qemu_start_incoming_migration(uri, &local_err);
 
     if (local_err) {
+        yank_unregister_instance(MIGRATION_YANK_INSTANCE);
         error_propagate(errp, local_err);
         return;
     }
@@ -2114,7 +2114,6 @@ void qmp_migrate_recover(const char *uri, Error **errp)
      * only re-setup the migration stream and poke existing migration
      * to continue using that newly established channel.
      */
-    yank_unregister_instance(MIGRATION_YANK_INSTANCE);
     qemu_start_incoming_migration(uri, errp);
 }
 
-- 
2.31.1
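
The ownership change described in the commit message above can be sketched
with a toy model. This is only an illustration under stated assumptions, not
QEMU code: every toy_* name and the single-slot registry are hypothetical, and
unregistering a missing instance is modelled as an assertion failure to mirror
the crash the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the yank registry: one instance slot. */
static bool yank_registered;

/* Returns false if the instance is already registered. */
static bool toy_yank_register(void)
{
    if (yank_registered) {
        return false;
    }
    yank_registered = true;
    return true;
}

/* Assumption: unregistering an instance that is not registered is fatal. */
static void toy_yank_unregister(void)
{
    assert(yank_registered);
    yank_registered = false;
}

/* Channel setup only; after the patch it no longer touches yank. */
static bool toy_start_incoming(const char *uri)
{
    return strncmp(uri, "tcp:", 4) == 0 || strncmp(uri, "fd:", 3) == 0;
}

/* First-time incoming migration owns the register/unregister pair. */
static bool toy_migrate_incoming(const char *uri)
{
    if (!toy_yank_register()) {
        return false;
    }
    if (!toy_start_incoming(uri)) {
        toy_yank_unregister();   /* undo our own registration on failure */
        return false;
    }
    return true;
}

/* Recovery leaves the instance registered, whether or not setup succeeds. */
static bool toy_migrate_recover(const char *uri)
{
    return toy_start_incoming(uri);
}
```

In this model a failed toy_migrate_recover() can be repeated any number of
times, because nothing on the recovery path removes the instance.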




* [PATCH 2/2] migration: Allow reset of postcopy_recover_triggered when failed
  2021-06-29 18:13 [PATCH 0/2] migration: Two fixes around yank and postcopy recovery Peter Xu
  2021-06-29 18:13 ` [PATCH 1/2] migration: Move yank outside qemu_start_incoming_migration() Peter Xu
@ 2021-06-29 18:13 ` Peter Xu
  2021-06-30 15:39   ` Dr. David Alan Gilbert
  2021-06-29 19:00 ` [PATCH 0/2] migration: Two fixes around yank and postcopy recovery Peter Xu
  2021-06-29 22:38 ` Leonardo Bras Soares Passos
  3 siblings, 1 reply; 8+ messages in thread
From: Peter Xu @ 2021-06-29 18:13 UTC (permalink / raw)
  To: qemu-devel
  Cc: peterx, Lukas Straub, Leonardo Bras Soares Passos,
	Dr . David Alan Gilbert, Juan Quintela

qemu_start_incoming_migration() can fail at any point; when that happens we
should reset postcopy_recover_triggered to false so that the user can still
retry with a saner incoming port.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 1bb03d1eca..fcca289ef7 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2097,6 +2097,13 @@ void qmp_migrate_recover(const char *uri, Error **errp)
 {
     MigrationIncomingState *mis = migration_incoming_get_current();
 
+    /*
+     * Don't even bother to use ERRP_GUARD() as it _must_ always be set by
+     * callers (no one should ignore a recover failure); if there is, it's a
+     * programming error.
+     */
+    assert(errp);
+
     if (mis->state != MIGRATION_STATUS_POSTCOPY_PAUSED) {
         error_setg(errp, "Migrate recover can only be run "
                    "when postcopy is paused.");
@@ -2115,6 +2122,12 @@ void qmp_migrate_recover(const char *uri, Error **errp)
      * to continue using that newly established channel.
      */
     qemu_start_incoming_migration(uri, errp);
+
+    /* Safe to dereference with the assert above */
+    if (*errp) {
+        /* Reset the flag so user could still retry */
+        qatomic_set(&mis->postcopy_recover_triggered, false);
+    }
 }
 
 void qmp_migrate_pause(Error **errp)
-- 
2.31.1
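
The retry behaviour this patch aims for can be sketched as follows. This is a
minimal model, not QEMU code: it assumes the flag is claimed with an atomic
compare-and-swap before the channel is set up (that step is not visible in the
diff above), it uses C11 atomics instead of QEMU's qatomic_* helpers, and all
toy_* names and error strings are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for mis->postcopy_recover_triggered. */
static atomic_bool recover_triggered;

/* Hypothetical channel setup: only "tcp:" URIs succeed in this model. */
static bool toy_setup_channel(const char *uri)
{
    return strncmp(uri, "tcp:", 4) == 0;
}

/* Returns NULL on success, or a static error string on failure. */
static const char *toy_recover(const char *uri)
{
    bool expected = false;

    /* Assumed guard: only one recovery may be in flight at a time. */
    if (!atomic_compare_exchange_strong(&recover_triggered, &expected, true)) {
        return "recovery already triggered";
    }

    if (!toy_setup_channel(uri)) {
        /* Reset the flag so that the user can retry with a better URI. */
        atomic_store(&recover_triggered, false);
        return "failed to set up channel";
    }

    return NULL;
}
```

Without the reset in the failure branch, the first bad URI would leave the
flag set and every later attempt would be rejected by the guard.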




* Re: [PATCH 0/2] migration: Two fixes around yank and postcopy recovery
  2021-06-29 18:13 [PATCH 0/2] migration: Two fixes around yank and postcopy recovery Peter Xu
  2021-06-29 18:13 ` [PATCH 1/2] migration: Move yank outside qemu_start_incoming_migration() Peter Xu
  2021-06-29 18:13 ` [PATCH 2/2] migration: Allow reset of postcopy_recover_triggered when failed Peter Xu
@ 2021-06-29 19:00 ` Peter Xu
  2021-06-29 22:38 ` Leonardo Bras Soares Passos
  3 siblings, 0 replies; 8+ messages in thread
From: Peter Xu @ 2021-06-29 19:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: Lukas Straub, Leonardo Bras Soares Passos,
	Dr . David Alan Gilbert, Juan Quintela

On Tue, Jun 29, 2021 at 02:13:54PM -0400, Peter Xu wrote:
> Note that the multifd zstd test may fail if run migration-test with sudo on
> master (which seems to be a known issue now), and it'll still fail after these
> two patches applied, however all running tests keep usual.

That paragraph was a mistake; please ignore it, as the zstd test actually
passes both with and without the patchset applied.

-- 
Peter Xu




* Re: [PATCH 0/2] migration: Two fixes around yank and postcopy recovery
  2021-06-29 18:13 [PATCH 0/2] migration: Two fixes around yank and postcopy recovery Peter Xu
                   ` (2 preceding siblings ...)
  2021-06-29 19:00 ` [PATCH 0/2] migration: Two fixes around yank and postcopy recovery Peter Xu
@ 2021-06-29 22:38 ` Leonardo Bras Soares Passos
  2021-06-29 23:52   ` Peter Xu
  3 siblings, 1 reply; 8+ messages in thread
From: Leonardo Bras Soares Passos @ 2021-06-29 22:38 UTC (permalink / raw)
  To: Peter Xu; +Cc: Lukas Straub, qemu-devel, Dr . David Alan Gilbert, Juan Quintela

On Tue, Jun 29, 2021 at 3:14 PM Peter Xu <peterx@redhat.com> wrote:
>
> The 1st patch should fix yank with unregister instance; I think it should also
> fix the issue that Leonardo used to fix in this patch:
>
> https://lore.kernel.org/qemu-devel/20210629050522.147057-1-leobras@redhat.com/
>
> The 2nd patch fixes postcopy recovery cannot retry if e.g. the 1st attempt
> provided a wrong port address.
>
> Note that the multifd zstd test may fail if run migration-test with sudo on
> master (which seems to be a known issue now), and it'll still fail after these
> two patches applied, however all running tests keep usual.
>
> (Leo: please let me know if this series didn't fix the issue you used to fix)

It does fix the issue, as far as I tested.

>
> Please review, thanks.
>
> Peter Xu (2):
>   migration: Move yank outside qemu_start_incoming_migration()
>   migration: Allow reset of postcopy_recover_triggered when failed
>
>  migration/migration.c | 24 ++++++++++++++++++------
>  1 file changed, 18 insertions(+), 6 deletions(-)
>
> --
> 2.31.1
>
>




* Re: [PATCH 0/2] migration: Two fixes around yank and postcopy recovery
  2021-06-29 22:38 ` Leonardo Bras Soares Passos
@ 2021-06-29 23:52   ` Peter Xu
  0 siblings, 0 replies; 8+ messages in thread
From: Peter Xu @ 2021-06-29 23:52 UTC (permalink / raw)
  To: Leonardo Bras Soares Passos
  Cc: Lukas Straub, qemu-devel, Dr . David Alan Gilbert, Juan Quintela

On Tue, Jun 29, 2021 at 07:38:32PM -0300, Leonardo Bras Soares Passos wrote:
> > (Leo: please let me know if this series didn't fix the issue you used to fix)
> 
> It does fix the issue, as far as I tested.

Thanks, Leo!

-- 
Peter Xu




* Re: [PATCH 1/2] migration: Move yank outside qemu_start_incoming_migration()
  2021-06-29 18:13 ` [PATCH 1/2] migration: Move yank outside qemu_start_incoming_migration() Peter Xu
@ 2021-06-30 15:33   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 8+ messages in thread
From: Dr. David Alan Gilbert @ 2021-06-30 15:33 UTC (permalink / raw)
  To: Peter Xu
  Cc: Lukas Straub, Leonardo Bras Soares Passos, qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> Starting from commit b5eea99ec2f5c, qmp_migrate_recover() calls unregister
> before calling qemu_start_incoming_migration(). I believe it wanted to mitigate
> the next call to yank_register_instance(), but I think that's wrong.
> 
> Firstly, if during recover, we should keep the yank instance there, not
> "quickly removing and adding it back".
> 
> Meanwhile, calling qmp_migrate_recover() twice with b5eea99ec2f5c will directly
> crash the dest qemu (right now it can't; but it'll start to work right after
> the next patch) because the 1st call of qmp_migrate_recover() will unregister
> permanently when the channel failed to establish, then the 2nd call of
> qmp_migrate_recover() crashes at yank_unregister_instance().
> 
> This patch fixes it by moving yank ops out of qemu_start_incoming_migration()
> into qmp_migrate_incoming.  For qmp_migrate_recover(), drop the unregister of
> yank instance too since we keep it there during the recovery phase.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/migration.c | 11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 4228635d18..1bb03d1eca 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -456,10 +456,6 @@ static void qemu_start_incoming_migration(const char *uri, Error **errp)
>  {
>      const char *p = NULL;
>  
> -    if (!yank_register_instance(MIGRATION_YANK_INSTANCE, errp)) {
> -        return;
> -    }
> -
>      qapi_event_send_migration(MIGRATION_STATUS_SETUP);
>      if (strstart(uri, "tcp:", &p) ||
>          strstart(uri, "unix:", NULL) ||
> @@ -474,7 +470,6 @@ static void qemu_start_incoming_migration(const char *uri, Error **errp)
>      } else if (strstart(uri, "fd:", &p)) {
>          fd_start_incoming_migration(p, errp);
>      } else {
> -        yank_unregister_instance(MIGRATION_YANK_INSTANCE);
>          error_setg(errp, "unknown migration protocol: %s", uri);
>      }
>  }
> @@ -2083,9 +2078,14 @@ void qmp_migrate_incoming(const char *uri, Error **errp)
>          return;
>      }
>  
> +    if (!yank_register_instance(MIGRATION_YANK_INSTANCE, errp)) {
> +        return;
> +    }
> +
>      qemu_start_incoming_migration(uri, &local_err);
>  
>      if (local_err) {
> +        yank_unregister_instance(MIGRATION_YANK_INSTANCE);
>          error_propagate(errp, local_err);
>          return;
>      }
> @@ -2114,7 +2114,6 @@ void qmp_migrate_recover(const char *uri, Error **errp)
>       * only re-setup the migration stream and poke existing migration
>       * to continue using that newly established channel.
>       */
> -    yank_unregister_instance(MIGRATION_YANK_INSTANCE);
>      qemu_start_incoming_migration(uri, errp);
>  }
>  
> -- 
> 2.31.1
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 2/2] migration: Allow reset of postcopy_recover_triggered when failed
  2021-06-29 18:13 ` [PATCH 2/2] migration: Allow reset of postcopy_recover_triggered when failed Peter Xu
@ 2021-06-30 15:39   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 8+ messages in thread
From: Dr. David Alan Gilbert @ 2021-06-30 15:39 UTC (permalink / raw)
  To: Peter Xu
  Cc: Lukas Straub, Leonardo Bras Soares Passos, qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> It's possible qemu_start_incoming_migration() failed at any point, when it
> happens we should reset postcopy_recover_triggered to false so that the user
> can still retry with a saner incoming port.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/migration.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 1bb03d1eca..fcca289ef7 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2097,6 +2097,13 @@ void qmp_migrate_recover(const char *uri, Error **errp)
>  {
>      MigrationIncomingState *mis = migration_incoming_get_current();
>  
> +    /*
> +     * Don't even bother to use ERRP_GUARD() as it _must_ always be set by
> +     * callers (no one should ignore a recover failure); if there is, it's a
> +     * programming error.
> +     */
> +    assert(errp);
> +
>      if (mis->state != MIGRATION_STATUS_POSTCOPY_PAUSED) {
>          error_setg(errp, "Migrate recover can only be run "
>                     "when postcopy is paused.");
> @@ -2115,6 +2122,12 @@ void qmp_migrate_recover(const char *uri, Error **errp)
>       * to continue using that newly established channel.
>       */
>      qemu_start_incoming_migration(uri, errp);
> +
> +    /* Safe to dereference with the assert above */
> +    if (*errp) {
> +        /* Reset the flag so user could still retry */
> +        qatomic_set(&mis->postcopy_recover_triggered, false);
> +    }
>  }
>  
>  void qmp_migrate_pause(Error **errp)
> -- 
> 2.31.1
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




Thread overview: 8+ messages
2021-06-29 18:13 [PATCH 0/2] migration: Two fixes around yank and postcopy recovery Peter Xu
2021-06-29 18:13 ` [PATCH 1/2] migration: Move yank outside qemu_start_incoming_migration() Peter Xu
2021-06-30 15:33   ` Dr. David Alan Gilbert
2021-06-29 18:13 ` [PATCH 2/2] migration: Allow reset of postcopy_recover_triggered when failed Peter Xu
2021-06-30 15:39   ` Dr. David Alan Gilbert
2021-06-29 19:00 ` [PATCH 0/2] migration: Two fixes around yank and postcopy recovery Peter Xu
2021-06-29 22:38 ` Leonardo Bras Soares Passos
2021-06-29 23:52   ` Peter Xu
