* [PATCH] migration/postcopy: Recognise the recovery states as 'in_postcopy'
@ 2019-09-23 17:49 Dr. David Alan Gilbert (git)
  2019-09-23 18:23 ` Alex Bennée
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-09-23 17:49 UTC (permalink / raw)
  To: qemu-devel, quintela, peterx; +Cc: thuth, alex.bennee

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Various parts of the migration code do different things when they're
in postcopy mode; prior to this patch this has been 'postcopy-active'.
This patch extends 'in_postcopy' to include 'postcopy-paused' and
'postcopy-recover'.

In particular, when you set the max-postcopy-bandwidth parameter, this
only affects the current migration fd if we're 'in_postcopy';
this leads to a race in the postcopy recovery test where it increases
the speed from 4k/sec to unlimited, but that increase can get ignored
if the change is made between the point at which the reconnection
happens and it transitions back to active.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/migration.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 01863a95f5..5f7e4d15e9 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1659,7 +1659,14 @@ bool migration_in_postcopy(void)
 {
     MigrationState *s = migrate_get_current();
 
-    return (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
+    switch (s->state) {
+    case MIGRATION_STATUS_POSTCOPY_ACTIVE:
+    case MIGRATION_STATUS_POSTCOPY_PAUSED:
+    case MIGRATION_STATUS_POSTCOPY_RECOVER:
+        return true;
+    default:
+        return false;
+    }
 }
 
 bool migration_in_postcopy_after_devices(MigrationState *s)
-- 
2.21.0




* Re: [PATCH] migration/postcopy: Recognise the recovery states as 'in_postcopy'
  2019-09-23 17:49 [PATCH] migration/postcopy: Recognise the recovery states as 'in_postcopy' Dr. David Alan Gilbert (git)
@ 2019-09-23 18:23 ` Alex Bennée
  2019-09-24  0:15 ` Peter Xu
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Alex Bennée @ 2019-09-23 18:23 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: thuth, qemu-devel, peterx, quintela


Dr. David Alan Gilbert (git) <dgilbert@redhat.com> writes:

> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Various parts of the migration code do different things when they're
> in postcopy mode; prior to this patch this has been 'postcopy-active'.
> This patch extends 'in_postcopy' to include 'postcopy-paused' and
> 'postcopy-recover'.
>
> In particular, when you set the max-postcopy-bandwidth parameter, this
> only affects the current migration fd if we're 'in_postcopy';
> this leads to a race in the postcopy recovery test where it increases
> the speed from 4k/sec to unlimited, but that increase can get ignored
> if the change is made between the point at which the reconnection
> happens and it transitions back to active.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

I'm stress testing it now.

> ---
>  migration/migration.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 01863a95f5..5f7e4d15e9 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1659,7 +1659,14 @@ bool migration_in_postcopy(void)
>  {
>      MigrationState *s = migrate_get_current();
>
> -    return (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
> +    switch (s->state) {
> +    case MIGRATION_STATUS_POSTCOPY_ACTIVE:
> +    case MIGRATION_STATUS_POSTCOPY_PAUSED:
> +    case MIGRATION_STATUS_POSTCOPY_RECOVER:
> +        return true;
> +    default:
> +        return false;
> +    }
>  }
>
>  bool migration_in_postcopy_after_devices(MigrationState *s)


--
Alex Bennée



* Re: [PATCH] migration/postcopy: Recognise the recovery states as 'in_postcopy'
  2019-09-23 17:49 [PATCH] migration/postcopy: Recognise the recovery states as 'in_postcopy' Dr. David Alan Gilbert (git)
  2019-09-23 18:23 ` Alex Bennée
@ 2019-09-24  0:15 ` Peter Xu
  2019-09-24  7:29 ` Juan Quintela
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Peter Xu @ 2019-09-24  0:15 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: thuth, alex.bennee, qemu-devel, quintela

On Mon, Sep 23, 2019 at 06:49:42PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Various parts of the migration code do different things when they're
> in postcopy mode; prior to this patch this has been 'postcopy-active'.
> This patch extends 'in_postcopy' to include 'postcopy-paused' and
> 'postcopy-recover'.
> 
> In particular, when you set the max-postcopy-bandwidth parameter, this
> only affects the current migration fd if we're 'in_postcopy';
> this leads to a race in the postcopy recovery test where it increases
> the speed from 4k/sec to unlimited, but that increase can get ignored
> if the change is made between the point at which the reconnection
> happens and it transitions back to active.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Yeah, this makes quite a lot of sense to me...

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



* Re: [PATCH] migration/postcopy: Recognise the recovery states as 'in_postcopy'
  2019-09-23 17:49 [PATCH] migration/postcopy: Recognise the recovery states as 'in_postcopy' Dr. David Alan Gilbert (git)
  2019-09-23 18:23 ` Alex Bennée
  2019-09-24  0:15 ` Peter Xu
@ 2019-09-24  7:29 ` Juan Quintela
  2019-09-24 15:39 ` Alex Bennée
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Juan Quintela @ 2019-09-24  7:29 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: thuth, alex.bennee, qemu-devel, peterx

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Various parts of the migration code do different things when they're
> in postcopy mode; prior to this patch this has been 'postcopy-active'.
> This patch extends 'in_postcopy' to include 'postcopy-paused' and
> 'postcopy-recover'.
>
> In particular, when you set the max-postcopy-bandwidth parameter, this
> only affects the current migration fd if we're 'in_postcopy';
> this leads to a race in the postcopy recovery test where it increases
> the speed from 4k/sec to unlimited, but that increase can get ignored
> if the change is made between the point at which the reconnection
> happens and it transitions back to active.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>



* Re: [PATCH] migration/postcopy: Recognise the recovery states as 'in_postcopy'
  2019-09-23 17:49 [PATCH] migration/postcopy: Recognise the recovery states as 'in_postcopy' Dr. David Alan Gilbert (git)
                   ` (2 preceding siblings ...)
  2019-09-24  7:29 ` Juan Quintela
@ 2019-09-24 15:39 ` Alex Bennée
  2019-09-25  9:21 ` Markus Armbruster
  2019-09-25 10:37 ` Dr. David Alan Gilbert
  5 siblings, 0 replies; 7+ messages in thread
From: Alex Bennée @ 2019-09-24 15:39 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: thuth, qemu-devel, peterx, quintela


Dr. David Alan Gilbert (git) <dgilbert@redhat.com> writes:

> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Various parts of the migration code do different things when they're
> in postcopy mode; prior to this patch this has been 'postcopy-active'.
> This patch extends 'in_postcopy' to include 'postcopy-paused' and
> 'postcopy-recover'.
>
> In particular, when you set the max-postcopy-bandwidth parameter, this
> only affects the current migration fd if we're 'in_postcopy';
> this leads to a race in the postcopy recovery test where it increases
> the speed from 4k/sec to unlimited, but that increase can get ignored
> if the change is made between the point at which the reconnection
> happens and it transitions back to active.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

In my xenial stress test I ran it 100 times and it never triggered the 180s
timeout I set in my retry.py script:

Tested-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  migration/migration.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 01863a95f5..5f7e4d15e9 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1659,7 +1659,14 @@ bool migration_in_postcopy(void)
>  {
>      MigrationState *s = migrate_get_current();
>
> -    return (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
> +    switch (s->state) {
> +    case MIGRATION_STATUS_POSTCOPY_ACTIVE:
> +    case MIGRATION_STATUS_POSTCOPY_PAUSED:
> +    case MIGRATION_STATUS_POSTCOPY_RECOVER:
> +        return true;
> +    default:
> +        return false;
> +    }
>  }
>
>  bool migration_in_postcopy_after_devices(MigrationState *s)


--
Alex Bennée



* Re: [PATCH] migration/postcopy: Recognise the recovery states as 'in_postcopy'
  2019-09-23 17:49 [PATCH] migration/postcopy: Recognise the recovery states as 'in_postcopy' Dr. David Alan Gilbert (git)
                   ` (3 preceding siblings ...)
  2019-09-24 15:39 ` Alex Bennée
@ 2019-09-25  9:21 ` Markus Armbruster
  2019-09-25 10:37 ` Dr. David Alan Gilbert
  5 siblings, 0 replies; 7+ messages in thread
From: Markus Armbruster @ 2019-09-25  9:21 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: thuth, alex.bennee, qemu-devel, peterx, quintela

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> writes:

> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Various parts of the migration code do different things when they're
> in postcopy mode; prior to this patch this has been 'postcopy-active'.
> This patch extends 'in_postcopy' to include 'postcopy-paused' and
> 'postcopy-recover'.
>
> In particular, when you set the max-postcopy-bandwidth parameter, this
> only affects the current migration fd if we're 'in_postcopy';
> this leads to a race in the postcopy recovery test where it increases
> the speed from 4k/sec to unlimited, but that increase can get ignored
> if the change is made between the point at which the reconnection
> happens and it transitions back to active.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

This seems to fix the intermittent hangs I observed and bisected to
commit 8504ddeca0 "migration: Fix postcopy bw for recovery".

Tested-by: Markus Armbruster <armbru@redhat.com>



* Re: [PATCH] migration/postcopy: Recognise the recovery states as 'in_postcopy'
  2019-09-23 17:49 [PATCH] migration/postcopy: Recognise the recovery states as 'in_postcopy' Dr. David Alan Gilbert (git)
                   ` (4 preceding siblings ...)
  2019-09-25  9:21 ` Markus Armbruster
@ 2019-09-25 10:37 ` Dr. David Alan Gilbert
  5 siblings, 0 replies; 7+ messages in thread
From: Dr. David Alan Gilbert @ 2019-09-25 10:37 UTC (permalink / raw)
  To: qemu-devel, quintela, peterx; +Cc: thuth, alex.bennee

* Dr. David Alan Gilbert (git) (dgilbert@redhat.com) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Various parts of the migration code do different things when they're
> in postcopy mode; prior to this patch this has been 'postcopy-active'.
> This patch extends 'in_postcopy' to include 'postcopy-paused' and
> 'postcopy-recover'.
> 
> In particular, when you set the max-postcopy-bandwidth parameter, this
> only affects the current migration fd if we're 'in_postcopy';
> this leads to a race in the postcopy recovery test where it increases
> the speed from 4k/sec to unlimited, but that increase can get ignored
> if the change is made between the point at which the reconnection
> happens and it transitions back to active.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Queued

> ---
>  migration/migration.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 01863a95f5..5f7e4d15e9 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1659,7 +1659,14 @@ bool migration_in_postcopy(void)
>  {
>      MigrationState *s = migrate_get_current();
>  
> -    return (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
> +    switch (s->state) {
> +    case MIGRATION_STATUS_POSTCOPY_ACTIVE:
> +    case MIGRATION_STATUS_POSTCOPY_PAUSED:
> +    case MIGRATION_STATUS_POSTCOPY_RECOVER:
> +        return true;
> +    default:
> +        return false;
> +    }
>  }
>  
>  bool migration_in_postcopy_after_devices(MigrationState *s)
> -- 
> 2.21.0
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

