* [PATCH 0/2] migration: Two fixes around yank and postcopy recovery
From: Peter Xu @ 2021-06-29 18:13 UTC (permalink / raw)
To: qemu-devel
Cc: peterx, Lukas Straub, Leonardo Bras Soares Passos,
Dr . David Alan Gilbert, Juan Quintela
The 1st patch should fix how the yank instance is unregistered; I think it
should also fix the issue that Leonardo was addressing in this patch:
https://lore.kernel.org/qemu-devel/20210629050522.147057-1-leobras@redhat.com/
The 2nd patch fixes postcopy recovery being unable to retry if, e.g., the 1st
attempt provided a wrong port address.
Note that the multifd zstd test may fail if migration-test is run with sudo on
master (which seems to be a known issue now), and it still fails with these
two patches applied; all other tests behave as before.
(Leo: please let me know if this series doesn't fix the issue you were fixing.)
Please review, thanks.
Peter Xu (2):
migration: Move yank outside qemu_start_incoming_migration()
migration: Allow reset of postcopy_recover_triggered when failed
migration/migration.c | 24 ++++++++++++++++++------
1 file changed, 18 insertions(+), 6 deletions(-)
--
2.31.1
* [PATCH 1/2] migration: Move yank outside qemu_start_incoming_migration()
From: Peter Xu @ 2021-06-29 18:13 UTC (permalink / raw)
To: qemu-devel
Cc: peterx, Lukas Straub, Leonardo Bras Soares Passos,
Dr . David Alan Gilbert, Juan Quintela
Since commit b5eea99ec2f5c, qmp_migrate_recover() calls
yank_unregister_instance() before calling qemu_start_incoming_migration(). I
believe it was meant to balance the following call to
yank_register_instance(), but I think that's wrong.
Firstly, during recovery we should keep the yank instance registered, not
quickly remove it and add it back.
Secondly, calling qmp_migrate_recover() twice with b5eea99ec2f5c will crash
the destination QEMU (right now a second call is not possible, but it will
become possible with the next patch): the 1st call of qmp_migrate_recover()
unregisters the instance permanently when the channel fails to establish, so
the 2nd call of qmp_migrate_recover() crashes in yank_unregister_instance().
This patch fixes it by moving the yank operations out of
qemu_start_incoming_migration() into qmp_migrate_incoming(). For
qmp_migrate_recover(), drop the unregister of the yank instance too, since we
keep it registered during the recovery phase.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/migration.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 4228635d18..1bb03d1eca 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -456,10 +456,6 @@ static void qemu_start_incoming_migration(const char *uri, Error **errp)
{
const char *p = NULL;
- if (!yank_register_instance(MIGRATION_YANK_INSTANCE, errp)) {
- return;
- }
-
qapi_event_send_migration(MIGRATION_STATUS_SETUP);
if (strstart(uri, "tcp:", &p) ||
strstart(uri, "unix:", NULL) ||
@@ -474,7 +470,6 @@ static void qemu_start_incoming_migration(const char *uri, Error **errp)
} else if (strstart(uri, "fd:", &p)) {
fd_start_incoming_migration(p, errp);
} else {
- yank_unregister_instance(MIGRATION_YANK_INSTANCE);
error_setg(errp, "unknown migration protocol: %s", uri);
}
}
@@ -2083,9 +2078,14 @@ void qmp_migrate_incoming(const char *uri, Error **errp)
return;
}
+ if (!yank_register_instance(MIGRATION_YANK_INSTANCE, errp)) {
+ return;
+ }
+
qemu_start_incoming_migration(uri, &local_err);
if (local_err) {
+ yank_unregister_instance(MIGRATION_YANK_INSTANCE);
error_propagate(errp, local_err);
return;
}
@@ -2114,7 +2114,6 @@ void qmp_migrate_recover(const char *uri, Error **errp)
* only re-setup the migration stream and poke existing migration
* to continue using that newly established channel.
*/
- yank_unregister_instance(MIGRATION_YANK_INSTANCE);
qemu_start_incoming_migration(uri, errp);
}
--
2.31.1
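For illustration, the ownership change described in this patch can be modelled outside QEMU. The following is a minimal standalone sketch, not QEMU code: every toy_*, old_* and new_* name is hypothetical, the registry is reduced to a single boolean, and a failed start is modelled only by the "unknown migration protocol" branch visible in the diff. Where the commit message says the destination would crash, the sketch reports it through would_crash instead of aborting.

```c
#include <stdbool.h>

/* Toy registry: tracks whether the single migration yank instance exists. */
bool toy_registered;

/* Hypothetical stand-in for yank_register_instance();
 * false means "already registered". */
bool toy_register(void)
{
    if (toy_registered) {
        return false;
    }
    toy_registered = true;
    return true;
}

/* Hypothetical stand-in for yank_unregister_instance(); false marks the
 * point where, per the commit message, the real destination crashes. */
bool toy_unregister(void)
{
    if (!toy_registered) {
        return false;
    }
    toy_registered = false;
    return true;
}

/* Old layout: the start function registers, and unregisters again on an
 * unknown protocol. */
bool old_start_incoming(bool known_protocol)
{
    if (!toy_register()) {
        return false;
    }
    if (!known_protocol) {
        toy_unregister();
        return false;
    }
    return true;
}

/* Old recover path: unregister first, then start again. */
bool old_recover(bool known_protocol, bool *would_crash)
{
    if (!toy_unregister()) {
        *would_crash = true;
        return false;
    }
    return old_start_incoming(known_protocol);
}

/* New layout: the start function no longer touches the registry. */
bool new_start_incoming(bool known_protocol)
{
    return known_protocol;
}

/* New incoming path: the caller owns register and unregister. */
bool new_incoming(bool known_protocol)
{
    if (!toy_register()) {
        return false;
    }
    if (!new_start_incoming(known_protocol)) {
        toy_unregister();
        return false;
    }
    return true;
}

/* New recover path: the instance stays registered throughout. */
bool new_recover(bool known_protocol)
{
    return new_start_incoming(known_protocol);
}
```

In this model, a failed old_recover() leaves the instance unregistered, so a second old_recover() reaches the would-crash point; with the new layout a failed new_recover() leaves the registration untouched and a retry can proceed.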
* [PATCH 2/2] migration: Allow reset of postcopy_recover_triggered when failed
From: Peter Xu @ 2021-06-29 18:13 UTC (permalink / raw)
To: qemu-devel
Cc: peterx, Lukas Straub, Leonardo Bras Soares Passos,
Dr . David Alan Gilbert, Juan Quintela
qemu_start_incoming_migration() can fail at any point; when that happens we
should reset postcopy_recover_triggered to false so that the user can still
retry with a saner incoming port.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/migration.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/migration/migration.c b/migration/migration.c
index 1bb03d1eca..fcca289ef7 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2097,6 +2097,13 @@ void qmp_migrate_recover(const char *uri, Error **errp)
{
MigrationIncomingState *mis = migration_incoming_get_current();
+ /*
+ * Don't even bother to use ERRP_GUARD() as it _must_ always be set by
+ * callers (no one should ignore a recover failure); if there is, it's a
+ * programming error.
+ */
+ assert(errp);
+
if (mis->state != MIGRATION_STATUS_POSTCOPY_PAUSED) {
error_setg(errp, "Migrate recover can only be run "
"when postcopy is paused.");
@@ -2115,6 +2122,12 @@ void qmp_migrate_recover(const char *uri, Error **errp)
* to continue using that newly established channel.
*/
qemu_start_incoming_migration(uri, errp);
+
+ /* Safe to dereference with the assert above */
+ if (*errp) {
+ /* Reset the flag so user could still retry */
+ qatomic_set(&mis->postcopy_recover_triggered, false);
+ }
}
void qmp_migrate_pause(Error **errp)
--
2.31.1
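For illustration, the retry behaviour this patch is after can be modelled outside QEMU. The following is a minimal standalone sketch, not QEMU code: ToyIncoming, toy_start_incoming() and toy_recover() are hypothetical names, the error object is reduced to a string pointer, and a failed start is modelled as a non-positive port. The diff does not show how postcopy_recover_triggered gets set, so the sketch assumes a test-and-set guard at the top of the function. The reset_on_failure parameter exists only to compare the behaviour before and after the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy counterpart of the incoming state; only the flag is modelled. */
typedef struct {
    atomic_bool recover_triggered;
} ToyIncoming;

void toy_incoming_init(ToyIncoming *mis)
{
    atomic_init(&mis->recover_triggered, false);
}

/* Hypothetical stand-in for qemu_start_incoming_migration():
 * reports an error for a non-positive port. */
void toy_start_incoming(int port, const char **errp)
{
    if (port <= 0) {
        *errp = "cannot listen on the given port";
    }
}

/* Callers must pass a non-NULL errp that points to NULL. */
void toy_recover(ToyIncoming *mis, int port, bool reset_on_failure,
                 const char **errp)
{
    bool expected = false;

    /* As in the patch: errp is mandatory, so *errp can be read below. */
    assert(errp);

    /* Assumed guard: only one recovery attempt may be in flight. */
    if (!atomic_compare_exchange_strong(&mis->recover_triggered,
                                        &expected, true)) {
        *errp = "recovery is triggered already";
        return;
    }

    toy_start_incoming(port, errp);

    if (*errp && reset_on_failure) {
        /* Reset the flag so the user can still retry. */
        atomic_store(&mis->recover_triggered, false);
    }
}
```

Without the reset, one failed attempt leaves the flag set and every later call is rejected by the guard; with the reset, a second call with a usable port gets through.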
* Re: [PATCH 0/2] migration: Two fixes around yank and postcopy recovery
From: Peter Xu @ 2021-06-29 19:00 UTC (permalink / raw)
To: qemu-devel
Cc: Lukas Straub, Leonardo Bras Soares Passos,
Dr . David Alan Gilbert, Juan Quintela
On Tue, Jun 29, 2021 at 02:13:54PM -0400, Peter Xu wrote:
> Note that the multifd zstd test may fail if run migration-test with sudo on
> master (which seems to be a known issue now), and it'll still fail after these
> two patches applied, however all running tests keep usual.
That was an accident on my side; please ignore this paragraph, as the zstd test
actually passes both with and without the patchset applied.
--
Peter Xu
* Re: [PATCH 0/2] migration: Two fixes around yank and postcopy recovery
From: Leonardo Bras Soares Passos @ 2021-06-29 22:38 UTC (permalink / raw)
To: Peter Xu; +Cc: Lukas Straub, qemu-devel, Dr . David Alan Gilbert, Juan Quintela
On Tue, Jun 29, 2021 at 3:14 PM Peter Xu <peterx@redhat.com> wrote:
>
> The 1st patch should fix yank with unregister instance; I think it should also
> fix the issue that Leonardo used to fix in this patch:
>
> https://lore.kernel.org/qemu-devel/20210629050522.147057-1-leobras@redhat.com/
>
> The 2nd patch fixes postcopy recovery cannot retry if e.g. the 1st attempt
> provided a wrong port address.
>
> Note that the multifd zstd test may fail if run migration-test with sudo on
> master (which seems to be a known issue now), and it'll still fail after these
> two patches applied, however all running tests keep usual.
>
> (Leo: please let me know if this series didn't fix the issue you used to fix)
It does fix the issue, as far as I tested.
>
> Please review, thanks.
>
> Peter Xu (2):
> migration: Move yank outside qemu_start_incoming_migration()
> migration: Allow reset of postcopy_recover_triggered when failed
>
> migration/migration.c | 24 ++++++++++++++++++------
> 1 file changed, 18 insertions(+), 6 deletions(-)
>
> --
> 2.31.1
>
>
* Re: [PATCH 0/2] migration: Two fixes around yank and postcopy recovery
From: Peter Xu @ 2021-06-29 23:52 UTC (permalink / raw)
To: Leonardo Bras Soares Passos
Cc: Lukas Straub, qemu-devel, Dr . David Alan Gilbert, Juan Quintela
On Tue, Jun 29, 2021 at 07:38:32PM -0300, Leonardo Bras Soares Passos wrote:
> > (Leo: please let me know if this series didn't fix the issue you used to fix)
>
> It does fix the issue, as far as I tested.
Thanks, Leo!
--
Peter Xu
* Re: [PATCH 1/2] migration: Move yank outside qemu_start_incoming_migration()
From: Dr. David Alan Gilbert @ 2021-06-30 15:33 UTC (permalink / raw)
To: Peter Xu
Cc: Lukas Straub, Leonardo Bras Soares Passos, qemu-devel, Juan Quintela
* Peter Xu (peterx@redhat.com) wrote:
> Starting from commit b5eea99ec2f5c, qmp_migrate_recover() calls unregister
> before calling qemu_start_incoming_migration(). I believe it wanted to mitigate
> the next call to yank_register_instance(), but I think that's wrong.
>
> Firstly, if during recover, we should keep the yank instance there, not
> "quickly removing and adding it back".
>
> Meanwhile, calling qmp_migrate_recover() twice with b5eea99ec2f5c will directly
> crash the dest qemu (right now it can't; but it'll start to work right after
> the next patch) because the 1st call of qmp_migrate_recover() will unregister
> permanently when the channel failed to establish, then the 2nd call of
> qmp_migrate_recover() crashes at yank_unregister_instance().
>
> This patch fixes it by moving yank ops out of qemu_start_incoming_migration()
> into qmp_migrate_incoming. For qmp_migrate_recover(), drop the unregister of
> yank instance too since we keep it there during the recovery phase.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
> migration/migration.c | 11 +++++------
> 1 file changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 4228635d18..1bb03d1eca 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -456,10 +456,6 @@ static void qemu_start_incoming_migration(const char *uri, Error **errp)
> {
> const char *p = NULL;
>
> - if (!yank_register_instance(MIGRATION_YANK_INSTANCE, errp)) {
> - return;
> - }
> -
> qapi_event_send_migration(MIGRATION_STATUS_SETUP);
> if (strstart(uri, "tcp:", &p) ||
> strstart(uri, "unix:", NULL) ||
> @@ -474,7 +470,6 @@ static void qemu_start_incoming_migration(const char *uri, Error **errp)
> } else if (strstart(uri, "fd:", &p)) {
> fd_start_incoming_migration(p, errp);
> } else {
> - yank_unregister_instance(MIGRATION_YANK_INSTANCE);
> error_setg(errp, "unknown migration protocol: %s", uri);
> }
> }
> @@ -2083,9 +2078,14 @@ void qmp_migrate_incoming(const char *uri, Error **errp)
> return;
> }
>
> + if (!yank_register_instance(MIGRATION_YANK_INSTANCE, errp)) {
> + return;
> + }
> +
> qemu_start_incoming_migration(uri, &local_err);
>
> if (local_err) {
> + yank_unregister_instance(MIGRATION_YANK_INSTANCE);
> error_propagate(errp, local_err);
> return;
> }
> @@ -2114,7 +2114,6 @@ void qmp_migrate_recover(const char *uri, Error **errp)
> * only re-setup the migration stream and poke existing migration
> * to continue using that newly established channel.
> */
> - yank_unregister_instance(MIGRATION_YANK_INSTANCE);
> qemu_start_incoming_migration(uri, errp);
> }
>
> --
> 2.31.1
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: [PATCH 2/2] migration: Allow reset of postcopy_recover_triggered when failed
From: Dr. David Alan Gilbert @ 2021-06-30 15:39 UTC (permalink / raw)
To: Peter Xu
Cc: Lukas Straub, Leonardo Bras Soares Passos, qemu-devel, Juan Quintela
* Peter Xu (peterx@redhat.com) wrote:
> It's possible qemu_start_incoming_migration() failed at any point, when it
> happens we should reset postcopy_recover_triggered to false so that the user
> can still retry with a saner incoming port.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
> migration/migration.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 1bb03d1eca..fcca289ef7 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2097,6 +2097,13 @@ void qmp_migrate_recover(const char *uri, Error **errp)
> {
> MigrationIncomingState *mis = migration_incoming_get_current();
>
> + /*
> + * Don't even bother to use ERRP_GUARD() as it _must_ always be set by
> + * callers (no one should ignore a recover failure); if there is, it's a
> + * programming error.
> + */
> + assert(errp);
> +
> if (mis->state != MIGRATION_STATUS_POSTCOPY_PAUSED) {
> error_setg(errp, "Migrate recover can only be run "
> "when postcopy is paused.");
> @@ -2115,6 +2122,12 @@ void qmp_migrate_recover(const char *uri, Error **errp)
> * to continue using that newly established channel.
> */
> qemu_start_incoming_migration(uri, errp);
> +
> + /* Safe to dereference with the assert above */
> + if (*errp) {
> + /* Reset the flag so user could still retry */
> + qatomic_set(&mis->postcopy_recover_triggered, false);
> + }
> }
>
> void qmp_migrate_pause(Error **errp)
> --
> 2.31.1
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK