qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] dump-guest-memory: Add blocker for migration
@ 2021-08-24 15:27 Peter Xu
  2021-08-24 15:27 ` [PATCH 1/2] migration: Add migrate_add_blocker_internal() Peter Xu
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Peter Xu @ 2021-08-24 15:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Andrew Jones, Juan Quintela, Dr . David Alan Gilbert, peterx,
	Leonardo Bras Soares Passos, Marc-André Lureau

Both dump-guest-memory and live migration have vm state cached internally.
Allowing them to happen together means the vm state can be messed up.  Simply
block live migration for dump-guest-memory.

One trivial thing to mention is we should still allow dump-guest-memory even if
-only-migratable is specified, because that flag should majorly be used to
guarantee not adding devices that will block migration by accident.  Dump guest
memory is not like that - it'll only block for the seconds when it's dumping.

Thanks,

Peter Xu (2):
  migration: Add migrate_add_blocker_internal()
  dump-guest-memory: Block live migration

 dump/dump.c                 | 20 +++++++++++++++-----
 include/migration/blocker.h | 16 ++++++++++++++++
 include/sysemu/dump.h       |  1 +
 migration/migration.c       | 21 +++++++++++++--------
 4 files changed, 45 insertions(+), 13 deletions(-)

-- 
2.31.1



^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] migration: Add migrate_add_blocker_internal()
  2021-08-24 15:27 [PATCH 0/2] dump-guest-memory: Add blocker for migration Peter Xu
@ 2021-08-24 15:27 ` Peter Xu
  2021-08-24 18:04   ` Marc-André Lureau
  2021-08-25  8:04   ` Juan Quintela
  2021-08-24 15:27 ` [PATCH 2/2] dump-guest-memory: Block live migration Peter Xu
  2021-08-25  7:54 ` [PATCH 0/2] dump-guest-memory: Add blocker for migration Markus Armbruster
  2 siblings, 2 replies; 13+ messages in thread
From: Peter Xu @ 2021-08-24 15:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Andrew Jones, Juan Quintela, Dr . David Alan Gilbert, peterx,
	Leonardo Bras Soares Passos, Marc-André Lureau

An internal version that removes -only-migratable implications.  It can be used
for temporary migration blockers like dump-guest-memory.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/migration/blocker.h | 16 ++++++++++++++++
 migration/migration.c       | 21 +++++++++++++--------
 2 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/include/migration/blocker.h b/include/migration/blocker.h
index acd27018e9..9cebe2ba06 100644
--- a/include/migration/blocker.h
+++ b/include/migration/blocker.h
@@ -25,6 +25,22 @@
  */
 int migrate_add_blocker(Error *reason, Error **errp);
 
+/**
+ * @migrate_add_blocker_internal - prevent migration from proceeding without
+ *                                 only-migrate implications
+ *
+ * @reason - an error to be returned whenever migration is attempted
+ *
+ * @errp - [out] The reason (if any) we cannot block migration right now.
+ *
+ * @returns - 0 on success, -EBUSY on failure, with errp set.
+ *
+ * Some of the migration blockers can be temporary (e.g., for a few seconds),
+ * so it shouldn't need to conflict with "-only-migratable".  For those cases,
+ * we can call this function rather than @migrate_add_blocker().
+ */
+int migrate_add_blocker_internal(Error *reason, Error **errp);
+
 /**
  * @migrate_del_blocker - remove a blocking error from migration
  *
diff --git a/migration/migration.c b/migration/migration.c
index 041b8451a6..41429680c2 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2053,15 +2053,8 @@ void migrate_init(MigrationState *s)
     s->threshold_size = 0;
 }
 
-int migrate_add_blocker(Error *reason, Error **errp)
+int migrate_add_blocker_internal(Error *reason, Error **errp)
 {
-    if (only_migratable) {
-        error_propagate_prepend(errp, error_copy(reason),
-                                "disallowing migration blocker "
-                                "(--only-migratable) for: ");
-        return -EACCES;
-    }
-
     if (migration_is_idle()) {
         migration_blockers = g_slist_prepend(migration_blockers, reason);
         return 0;
@@ -2073,6 +2066,18 @@ int migrate_add_blocker(Error *reason, Error **errp)
     return -EBUSY;
 }
 
+int migrate_add_blocker(Error *reason, Error **errp)
+{
+    if (only_migratable) {
+        error_propagate_prepend(errp, error_copy(reason),
+                                "disallowing migration blocker "
+                                "(--only-migratable) for: ");
+        return -EACCES;
+    }
+
+    return migrate_add_blocker_internal(reason, errp);
+}
+
 void migrate_del_blocker(Error *reason)
 {
     migration_blockers = g_slist_remove(migration_blockers, reason);
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/2] dump-guest-memory: Block live migration
  2021-08-24 15:27 [PATCH 0/2] dump-guest-memory: Add blocker for migration Peter Xu
  2021-08-24 15:27 ` [PATCH 1/2] migration: Add migrate_add_blocker_internal() Peter Xu
@ 2021-08-24 15:27 ` Peter Xu
  2021-08-24 18:04   ` Marc-André Lureau
  2021-08-25  7:36   ` Marc-André Lureau
  2021-08-25  7:54 ` [PATCH 0/2] dump-guest-memory: Add blocker for migration Markus Armbruster
  2 siblings, 2 replies; 13+ messages in thread
From: Peter Xu @ 2021-08-24 15:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Andrew Jones, Juan Quintela, Dr . David Alan Gilbert, peterx,
	Leonardo Bras Soares Passos, Marc-André Lureau

Both dump-guest-memory and live migration caches vm state at the beginning.
Either of them entering the other one will cause race on the vm state, and even
more severe on that (please refer to the crash report in the bug link).

Let's block live migration in dump-guest-memory, and that'll also block
dump-guest-memory if it detected that we're during a live migration.

Side note: migrate_del_blocker() can be called even if the blocker is not
inserted yet, so it's safe to unconditionally delete that blocker in
dump_cleanup (g_slist_remove allows no-entry-found case).

Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1996609
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 dump/dump.c           | 20 +++++++++++++++-----
 include/sysemu/dump.h |  1 +
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/dump/dump.c b/dump/dump.c
index ab625909f3..7996d7a6c5 100644
--- a/dump/dump.c
+++ b/dump/dump.c
@@ -29,6 +29,7 @@
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
 #include "hw/misc/vmcoreinfo.h"
+#include "migration/blocker.h"
 
 #ifdef TARGET_X86_64
 #include "win_dump.h"
@@ -101,6 +102,7 @@ static int dump_cleanup(DumpState *s)
             qemu_mutex_unlock_iothread();
         }
     }
+    migrate_del_blocker(s->dump_migration_blocker);
 
     return 0;
 }
@@ -1857,6 +1859,19 @@ static void dump_init(DumpState *s, int fd, bool has_format,
         }
     }
 
+    if (!s->dump_migration_blocker) {
+        error_setg(&s->dump_migration_blocker,
+                   "Live migration disabled: dump-guest-memory in progress");
+    }
+
+    /*
+     * Allows even for -only-migratable, but forbid migration during the
+     * process of dump guest memory.
+     */
+    if (migrate_add_blocker_internal(s->dump_migration_blocker, errp)) {
+        goto cleanup;
+    }
+
     return;
 
 cleanup:
@@ -1927,11 +1942,6 @@ void qmp_dump_guest_memory(bool paging, const char *file,
     Error *local_err = NULL;
     bool detach_p = false;
 
-    if (runstate_check(RUN_STATE_INMIGRATE)) {
-        error_setg(errp, "Dump not allowed during incoming migration.");
-        return;
-    }
-
     /* if there is a dump in background, we should wait until the dump
      * finished */
     if (dump_in_progress()) {
diff --git a/include/sysemu/dump.h b/include/sysemu/dump.h
index 250143cb5a..7b619c2a43 100644
--- a/include/sysemu/dump.h
+++ b/include/sysemu/dump.h
@@ -195,6 +195,7 @@ typedef struct DumpState {
                                   * finished. */
     uint8_t *guest_note;         /* ELF note content */
     size_t guest_note_size;
+    Error *dump_migration_blocker; /* Blocker for live migration */
 } DumpState;
 
 uint16_t cpu_to_dump16(DumpState *s, uint16_t val);
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] dump-guest-memory: Block live migration
  2021-08-24 15:27 ` [PATCH 2/2] dump-guest-memory: Block live migration Peter Xu
@ 2021-08-24 18:04   ` Marc-André Lureau
  2021-08-24 19:39     ` Peter Xu
  2021-08-25  7:36   ` Marc-André Lureau
  1 sibling, 1 reply; 13+ messages in thread
From: Marc-André Lureau @ 2021-08-24 18:04 UTC (permalink / raw)
  To: Peter Xu
  Cc: Andrew Jones, Juan Quintela, qemu-devel,
	Leonardo Bras Soares Passos, Dr . David Alan Gilbert

[-- Attachment #1: Type: text/plain, Size: 3131 bytes --]

Hi

On Tue, Aug 24, 2021 at 7:27 PM Peter Xu <peterx@redhat.com> wrote:

> Both dump-guest-memory and live migration caches vm state at the beginning.
> Either of them entering the other one will cause race on the vm state, and
> even
> more severe on that (please refer to the crash report in the bug link).
>
> Let's block live migration in dump-guest-memory, and that'll also block
> dump-guest-memory if it detected that we're during a live migration.
>

How does it detect that migration is in progress?

Otherwise, it looks reasonable to me.


> Side note: migrate_del_blocker() can be called even if the blocker is not
> inserted yet, so it's safe to unconditionally delete that blocker in
> dump_cleanup (g_slist_remove allows no-entry-found case).
>
> Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1996609
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  dump/dump.c           | 20 +++++++++++++++-----
>  include/sysemu/dump.h |  1 +
>  2 files changed, 16 insertions(+), 5 deletions(-)
>
> diff --git a/dump/dump.c b/dump/dump.c
> index ab625909f3..7996d7a6c5 100644
> --- a/dump/dump.c
> +++ b/dump/dump.c
> @@ -29,6 +29,7 @@
>  #include "qemu/error-report.h"
>  #include "qemu/main-loop.h"
>  #include "hw/misc/vmcoreinfo.h"
> +#include "migration/blocker.h"
>
>  #ifdef TARGET_X86_64
>  #include "win_dump.h"
> @@ -101,6 +102,7 @@ static int dump_cleanup(DumpState *s)
>              qemu_mutex_unlock_iothread();
>          }
>      }
> +    migrate_del_blocker(s->dump_migration_blocker);
>
>      return 0;
>  }
> @@ -1857,6 +1859,19 @@ static void dump_init(DumpState *s, int fd, bool
> has_format,
>          }
>      }
>
> +    if (!s->dump_migration_blocker) {
> +        error_setg(&s->dump_migration_blocker,
> +                   "Live migration disabled: dump-guest-memory in
> progress");
> +    }
> +
> +    /*
> +     * Allows even for -only-migratable, but forbid migration during the
> +     * process of dump guest memory.
> +     */
> +    if (migrate_add_blocker_internal(s->dump_migration_blocker, errp)) {
> +        goto cleanup;
> +    }
> +
>      return;
>
>  cleanup:
> @@ -1927,11 +1942,6 @@ void qmp_dump_guest_memory(bool paging, const char
> *file,
>      Error *local_err = NULL;
>      bool detach_p = false;
>
> -    if (runstate_check(RUN_STATE_INMIGRATE)) {
> -        error_setg(errp, "Dump not allowed during incoming migration.");
> -        return;
> -    }
> -
>      /* if there is a dump in background, we should wait until the dump
>       * finished */
>      if (dump_in_progress()) {
> diff --git a/include/sysemu/dump.h b/include/sysemu/dump.h
> index 250143cb5a..7b619c2a43 100644
> --- a/include/sysemu/dump.h
> +++ b/include/sysemu/dump.h
> @@ -195,6 +195,7 @@ typedef struct DumpState {
>                                    * finished. */
>      uint8_t *guest_note;         /* ELF note content */
>      size_t guest_note_size;
> +    Error *dump_migration_blocker; /* Blocker for live migration */
>  } DumpState;
>
>  uint16_t cpu_to_dump16(DumpState *s, uint16_t val);
> --
> 2.31.1
>
>

[-- Attachment #2: Type: text/html, Size: 4367 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] migration: Add migrate_add_blocker_internal()
  2021-08-24 15:27 ` [PATCH 1/2] migration: Add migrate_add_blocker_internal() Peter Xu
@ 2021-08-24 18:04   ` Marc-André Lureau
  2021-08-25  8:04   ` Juan Quintela
  1 sibling, 0 replies; 13+ messages in thread
From: Marc-André Lureau @ 2021-08-24 18:04 UTC (permalink / raw)
  To: Peter Xu
  Cc: Andrew Jones, Juan Quintela, qemu-devel,
	Leonardo Bras Soares Passos, Dr . David Alan Gilbert

[-- Attachment #1: Type: text/plain, Size: 3057 bytes --]

On Tue, Aug 24, 2021 at 7:27 PM Peter Xu <peterx@redhat.com> wrote:

> An internal version that removes -only-migratable implications.  It can be
> used
> for temporary migration blockers like dump-guest-memory.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
>

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>

---
>  include/migration/blocker.h | 16 ++++++++++++++++
>  migration/migration.c       | 21 +++++++++++++--------
>  2 files changed, 29 insertions(+), 8 deletions(-)
>
> diff --git a/include/migration/blocker.h b/include/migration/blocker.h
> index acd27018e9..9cebe2ba06 100644
> --- a/include/migration/blocker.h
> +++ b/include/migration/blocker.h
> @@ -25,6 +25,22 @@
>   */
>  int migrate_add_blocker(Error *reason, Error **errp);
>
> +/**
> + * @migrate_add_blocker_internal - prevent migration from proceeding
> without
> + *                                 only-migrate implications
> + *
> + * @reason - an error to be returned whenever migration is attempted
> + *
> + * @errp - [out] The reason (if any) we cannot block migration right now.
> + *
> + * @returns - 0 on success, -EBUSY on failure, with errp set.
> + *
> + * Some of the migration blockers can be temporary (e.g., for a few
> seconds),
> + * so it shouldn't need to conflict with "-only-migratable".  For those
> cases,
> + * we can call this function rather than @migrate_add_blocker().
> + */
> +int migrate_add_blocker_internal(Error *reason, Error **errp);
> +
>  /**
>   * @migrate_del_blocker - remove a blocking error from migration
>   *
> diff --git a/migration/migration.c b/migration/migration.c
> index 041b8451a6..41429680c2 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2053,15 +2053,8 @@ void migrate_init(MigrationState *s)
>      s->threshold_size = 0;
>  }
>
> -int migrate_add_blocker(Error *reason, Error **errp)
> +int migrate_add_blocker_internal(Error *reason, Error **errp)
>  {
> -    if (only_migratable) {
> -        error_propagate_prepend(errp, error_copy(reason),
> -                                "disallowing migration blocker "
> -                                "(--only-migratable) for: ");
> -        return -EACCES;
> -    }
> -
>      if (migration_is_idle()) {
>          migration_blockers = g_slist_prepend(migration_blockers, reason);
>          return 0;
> @@ -2073,6 +2066,18 @@ int migrate_add_blocker(Error *reason, Error **errp)
>      return -EBUSY;
>  }
>
> +int migrate_add_blocker(Error *reason, Error **errp)
> +{
> +    if (only_migratable) {
> +        error_propagate_prepend(errp, error_copy(reason),
> +                                "disallowing migration blocker "
> +                                "(--only-migratable) for: ");
> +        return -EACCES;
> +    }
> +
> +    return migrate_add_blocker_internal(reason, errp);
> +}
> +
>  void migrate_del_blocker(Error *reason)
>  {
>      migration_blockers = g_slist_remove(migration_blockers, reason);
> --
> 2.31.1
>
>

[-- Attachment #2: Type: text/html, Size: 4042 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] dump-guest-memory: Block live migration
  2021-08-24 18:04   ` Marc-André Lureau
@ 2021-08-24 19:39     ` Peter Xu
  0 siblings, 0 replies; 13+ messages in thread
From: Peter Xu @ 2021-08-24 19:39 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Andrew Jones, Juan Quintela, qemu-devel,
	Leonardo Bras Soares Passos, Dr . David Alan Gilbert

On Tue, Aug 24, 2021 at 10:04:19PM +0400, Marc-André Lureau wrote:
> Hi

Hello, Marc-Andre,

> 
> On Tue, Aug 24, 2021 at 7:27 PM Peter Xu <peterx@redhat.com> wrote:
> 
> > Both dump-guest-memory and live migration caches vm state at the beginning.
> > Either of them entering the other one will cause race on the vm state, and
> > even
> > more severe on that (please refer to the crash report in the bug link).
> >
> > Let's block live migration in dump-guest-memory, and that'll also block
> > dump-guest-memory if it detected that we're during a live migration.
> >
> 
> How does it detect that migration is in progress?

migrate_add_blocker() (and the new migrate_add_blocker_internal()) guaranteed
it; it will only succeed if there's no migration, and it should cover both
sides of migration (as I think migration_is_idle() should return true on both
src/dst when there's one):

    if (migration_is_idle()) {
        migration_blockers = g_slist_prepend(migration_blockers, reason);
        return 0;
    }

    error_propagate_prepend(errp, error_copy(reason),
                            "disallowing migration blocker "
                            "(migration in progress) for: ");
    return -EBUSY;

That's why I removed the old check on incoming migration:

-    if (runstate_check(RUN_STATE_INMIGRATE)) {
-        error_setg(errp, "Dump not allowed during incoming migration.");
-        return;
-    }

Because I think it'll cover that case too.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] dump-guest-memory: Block live migration
  2021-08-24 15:27 ` [PATCH 2/2] dump-guest-memory: Block live migration Peter Xu
  2021-08-24 18:04   ` Marc-André Lureau
@ 2021-08-25  7:36   ` Marc-André Lureau
  2021-08-25 20:48     ` Peter Xu
  1 sibling, 1 reply; 13+ messages in thread
From: Marc-André Lureau @ 2021-08-25  7:36 UTC (permalink / raw)
  To: Peter Xu
  Cc: Andrew Jones, Juan Quintela, qemu-devel,
	Leonardo Bras Soares Passos, Dr . David Alan Gilbert

[-- Attachment #1: Type: text/plain, Size: 3136 bytes --]

Hi

On Tue, Aug 24, 2021 at 7:27 PM Peter Xu <peterx@redhat.com> wrote:

> Both dump-guest-memory and live migration caches vm state at the beginning.
> Either of them entering the other one will cause race on the vm state, and
> even
> more severe on that (please refer to the crash report in the bug link).
>
> Let's block live migration in dump-guest-memory, and that'll also block
> dump-guest-memory if it detected that we're during a live migration.
>
> Side note: migrate_del_blocker() can be called even if the blocker is not
> inserted yet, so it's safe to unconditionally delete that blocker in
> dump_cleanup (g_slist_remove allows no-entry-found case).
>
> Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1996609
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  dump/dump.c           | 20 +++++++++++++++-----
>  include/sysemu/dump.h |  1 +
>  2 files changed, 16 insertions(+), 5 deletions(-)
>
> diff --git a/dump/dump.c b/dump/dump.c
> index ab625909f3..7996d7a6c5 100644
> --- a/dump/dump.c
> +++ b/dump/dump.c
> @@ -29,6 +29,7 @@
>  #include "qemu/error-report.h"
>  #include "qemu/main-loop.h"
>  #include "hw/misc/vmcoreinfo.h"
> +#include "migration/blocker.h"
>
>  #ifdef TARGET_X86_64
>  #include "win_dump.h"
> @@ -101,6 +102,7 @@ static int dump_cleanup(DumpState *s)
>              qemu_mutex_unlock_iothread();
>          }
>      }
> +    migrate_del_blocker(s->dump_migration_blocker);
>
>      return 0;
>  }
> @@ -1857,6 +1859,19 @@ static void dump_init(DumpState *s, int fd, bool
> has_format,
>          }
>      }
>
> +    if (!s->dump_migration_blocker) {
> +        error_setg(&s->dump_migration_blocker,
> +                   "Live migration disabled: dump-guest-memory in
> progress");
> +    }
> +
> +    /*
> +     * Allows even for -only-migratable, but forbid migration during the
> +     * process of dump guest memory.
> +     */
> +    if (migrate_add_blocker_internal(s->dump_migration_blocker, errp)) {
> +        goto cleanup;
> +    }
> +
>

Shouldn't this be placed earlier in the function, before
runstate_is_running() and vm_stop() ?

     return;
>
>  cleanup:
> @@ -1927,11 +1942,6 @@ void qmp_dump_guest_memory(bool paging, const char
> *file,
>      Error *local_err = NULL;
>      bool detach_p = false;
>
> -    if (runstate_check(RUN_STATE_INMIGRATE)) {
> -        error_setg(errp, "Dump not allowed during incoming migration.");
> -        return;
> -    }
> -
>      /* if there is a dump in background, we should wait until the dump
>       * finished */
>      if (dump_in_progress()) {
> diff --git a/include/sysemu/dump.h b/include/sysemu/dump.h
> index 250143cb5a..7b619c2a43 100644
> --- a/include/sysemu/dump.h
> +++ b/include/sysemu/dump.h
> @@ -195,6 +195,7 @@ typedef struct DumpState {
>                                    * finished. */
>      uint8_t *guest_note;         /* ELF note content */
>      size_t guest_note_size;
> +    Error *dump_migration_blocker; /* Blocker for live migration */
>  } DumpState;
>
>  uint16_t cpu_to_dump16(DumpState *s, uint16_t val);
> --
> 2.31.1
>
>

[-- Attachment #2: Type: text/html, Size: 4348 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/2] dump-guest-memory: Add blocker for migration
  2021-08-24 15:27 [PATCH 0/2] dump-guest-memory: Add blocker for migration Peter Xu
  2021-08-24 15:27 ` [PATCH 1/2] migration: Add migrate_add_blocker_internal() Peter Xu
  2021-08-24 15:27 ` [PATCH 2/2] dump-guest-memory: Block live migration Peter Xu
@ 2021-08-25  7:54 ` Markus Armbruster
  2021-08-25 21:32   ` Peter Xu
  2 siblings, 1 reply; 13+ messages in thread
From: Markus Armbruster @ 2021-08-25  7:54 UTC (permalink / raw)
  To: Peter Xu
  Cc: Andrew Jones, Juan Quintela, qemu-devel, Dr . David Alan Gilbert,
	Leonardo Bras Soares Passos, David Gibson,
	Marc-André Lureau

Peter Xu <peterx@redhat.com> writes:

> Both dump-guest-memory and live migration have vm state cached internally.
> Allowing them to happen together means the vm state can be messed up.  Simply
> block live migration for dump-guest-memory.
>
> One trivial thing to mention is we should still allow dump-guest-memory even if
> -only-migratable is specified, because that flag should majorly be used to
> guarantee not adding devices that will block migration by accident.  Dump guest
> memory is not like that - it'll only block for the seconds when it's dumping.

I recently ran into a similarly unusual use of migration blockers:

    Subject: -only-migrate and the two different uses of migration blockers
     (was: spapr_events: Sure we may ignore migrate_add_blocker() failure?)
    Date: Mon, 19 Jul 2021 13:00:20 +0200 (5 weeks, 1 day, 20 hours ago)
    Message-ID: <87sg0amuuz.fsf_-_@dusky.pond.sub.org>

    We appear to use migration blockers in two ways:

    (1) Prevent migration for an indefinite time, typically due to use of
    some feature that isn't compatible with migration.

    (2) Delay migration for a short time.

    Option -only-migrate is designed for (1).  It interferes with (2).

    Example for (1): device "x-pci-proxy-dev" doesn't support migration.  It
    adds a migration blocker on realize, and deletes it on unrealize.  With
    -only-migrate, device realize fails.  Works as designed.

    Example for (2): spapr_mce_req_event() makes an effort to prevent
    migration degrate the reporting of FWNMIs.  It adds a migration blocker
    when it receives one, and deletes it when it's done handling it.  This
    is a best effort; if migration is already in progress by the time FWNMI
    is received, we simply carry on, and that's okay.  However, option
    -only-migrate sabotages the best effort entirely.

    While this isn't exactly terrible, it may be a weakness in our thinking
    and our infrastructure.  I'm bringing it up so the people in charge are
    aware :)

https://lists.nongnu.org/archive/html/qemu-devel/2021-07/msg04723.html

Downthread there, Dave Gilbert opined

    It almost feels like they need a way to temporarily hold off
    'completion' of migratio - i.e. the phase where we stop the CPU and
    write the device data;  mind you you'd also probably want it to stop
    cold-migrates/snapshots?



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] migration: Add migrate_add_blocker_internal()
  2021-08-24 15:27 ` [PATCH 1/2] migration: Add migrate_add_blocker_internal() Peter Xu
  2021-08-24 18:04   ` Marc-André Lureau
@ 2021-08-25  8:04   ` Juan Quintela
  1 sibling, 0 replies; 13+ messages in thread
From: Juan Quintela @ 2021-08-25  8:04 UTC (permalink / raw)
  To: Peter Xu
  Cc: Marc-André Lureau, Andrew Jones,
	Leonardo Bras Soares Passos, qemu-devel, Dr . David Alan Gilbert

Peter Xu <peterx@redhat.com> wrote:
> An internal version that removes -only-migratable implications.  It can be used
> for temporary migration blockers like dump-guest-memory.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] dump-guest-memory: Block live migration
  2021-08-25  7:36   ` Marc-André Lureau
@ 2021-08-25 20:48     ` Peter Xu
  0 siblings, 0 replies; 13+ messages in thread
From: Peter Xu @ 2021-08-25 20:48 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Andrew Jones, Juan Quintela, qemu-devel,
	Leonardo Bras Soares Passos, Dr . David Alan Gilbert

On Wed, Aug 25, 2021 at 11:36:08AM +0400, Marc-André Lureau wrote:
> Shouldn't this be placed earlier in the function, before
> runstate_is_running() and vm_stop() ?

Good point...  Will respin, thanks!

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/2] dump-guest-memory: Add blocker for migration
  2021-08-25  7:54 ` [PATCH 0/2] dump-guest-memory: Add blocker for migration Markus Armbruster
@ 2021-08-25 21:32   ` Peter Xu
  2021-09-01 11:35     ` Markus Armbruster
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Xu @ 2021-08-25 21:32 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Andrew Jones, Juan Quintela, qemu-devel, Dr . David Alan Gilbert,
	Leonardo Bras Soares Passos, David Gibson,
	Marc-André Lureau

Markus,

On Wed, Aug 25, 2021 at 09:54:12AM +0200, Markus Armbruster wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > Both dump-guest-memory and live migration have vm state cached internally.
> > Allowing them to happen together means the vm state can be messed up.  Simply
> > block live migration for dump-guest-memory.
> >
> > One trivial thing to mention is we should still allow dump-guest-memory even if
> > -only-migratable is specified, because that flag should majorly be used to
> > guarantee not adding devices that will block migration by accident.  Dump guest
> > memory is not like that - it'll only block for the seconds when it's dumping.
> 
> I recently ran into a similarly unusual use of migration blockers:
> 
>     Subject: -only-migrate and the two different uses of migration blockers
>      (was: spapr_events: Sure we may ignore migrate_add_blocker() failure?)
>     Date: Mon, 19 Jul 2021 13:00:20 +0200 (5 weeks, 1 day, 20 hours ago)
>     Message-ID: <87sg0amuuz.fsf_-_@dusky.pond.sub.org>
> 
>     We appear to use migration blockers in two ways:
> 
>     (1) Prevent migration for an indefinite time, typically due to use of
>     some feature that isn't compatible with migration.
> 
>     (2) Delay migration for a short time.
> 
>     Option -only-migrate is designed for (1).  It interferes with (2).
> 
>     Example for (1): device "x-pci-proxy-dev" doesn't support migration.  It
>     adds a migration blocker on realize, and deletes it on unrealize.  With
>     -only-migrate, device realize fails.  Works as designed.
> 
>     Example for (2): spapr_mce_req_event() makes an effort to prevent
>     migration degrate the reporting of FWNMIs.  It adds a migration blocker
>     when it receives one, and deletes it when it's done handling it.  This
>     is a best effort; if migration is already in progress by the time FWNMI
>     is received, we simply carry on, and that's okay.  However, option
>     -only-migrate sabotages the best effort entirely.
> 
>     While this isn't exactly terrible, it may be a weakness in our thinking
>     and our infrastructure.  I'm bringing it up so the people in charge are
>     aware :)
> 
> https://lists.nongnu.org/archive/html/qemu-devel/2021-07/msg04723.html
> 
> Downthread there, Dave Gilbert opined
> 
>     It almost feels like they need a way to temporarily hold off
>     'completion' of migratio - i.e. the phase where we stop the CPU and
>     write the device data;  mind you you'd also probably want it to stop
>     cold-migrates/snapshots?

Yeah, maybe spapr_mce_req_event() can be another candidate of the internal
version of migration_add_blocker().

I can add a patch to replace it if anyone likes me to.

Both cold and live snapshot should have checked migration blockers, I think.
E.g., cold snapshot has:

bool save_snapshot(const char *name, bool overwrite, const char *vmstate,
                  bool has_devices, strList *devices, Error **errp)
{
    [...]
    if (migration_is_blocked(errp)) {
        return false;
    }
    [...]
}

While the live snapshot shares similar code in migrate_prepare().

So looks safe that nothing wrong should happen within add/del pair of blockers.

However I do see that it's possible we'll allow the add_blocker to succeed even
if during cold snapshot, because migration_is_idle() in migration_add_blocker()
only checks migration state, not RUN_STATE_SAVE_VM.  So I'm wondering whether
we'd like one more patch to cover that too, like:

---8<---
diff --git a/migration/migration.c b/migration/migration.c
index 41429680c2..9c602a4ac1 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2055,15 +2055,16 @@ void migrate_init(MigrationState *s)
 
 int migrate_add_blocker_internal(Error *reason, Error **errp)
 {
-    if (migration_is_idle()) {
-        migration_blockers = g_slist_prepend(migration_blockers, reason);
-        return 0;
+    /* Snapshots are similar to migrations, so check RUN_STATE_SAVE_VM too. */
+    if (runstate_check(RUN_STATE_SAVE_VM) || !migration_is_idle()) {
+        error_propagate_prepend(errp, error_copy(reason),
+                                "disallowing migration blocker "
+                                "(migration in progress) for: ");
+        return -EBUSY;
     }
 
-    error_propagate_prepend(errp, error_copy(reason),
-                            "disallowing migration blocker "
-                            "(migration in progress) for: ");
-    return -EBUSY;
+    migration_blockers = g_slist_prepend(migration_blockers, reason);
+    return 0;
 }
 
 int migrate_add_blocker(Error *reason, Error **errp)
---8<---

It'll by accident also cover guest dump which also sets RUN_STATE_SAVE_VM, but
I think that's ok.

Thanks,

-- 
Peter Xu



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/2] dump-guest-memory: Add blocker for migration
  2021-08-25 21:32   ` Peter Xu
@ 2021-09-01 11:35     ` Markus Armbruster
  2021-09-03 16:08       ` Peter Xu
  0 siblings, 1 reply; 13+ messages in thread
From: Markus Armbruster @ 2021-09-01 11:35 UTC (permalink / raw)
  To: Peter Xu
  Cc: Andrew Jones, Juan Quintela, qemu-devel, Dr . David Alan Gilbert,
	Leonardo Bras Soares Passos, David Gibson,
	Marc-André Lureau

Peter Xu <peterx@redhat.com> writes:

> Markus,
>
> On Wed, Aug 25, 2021 at 09:54:12AM +0200, Markus Armbruster wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > Both dump-guest-memory and live migration have vm state cached internally.
>> > Allowing them to happen together means the vm state can be messed up.  Simply
>> > block live migration for dump-guest-memory.
>> >
>> > One trivial thing to mention is we should still allow dump-guest-memory even if
>> > -only-migratable is specified, because that flag should majorly be used to
>> > guarantee not adding devices that will block migration by accident.  Dump guest
>> > memory is not like that - it'll only block for the seconds when it's dumping.
>> 
>> I recently ran into a similarly unusual use of migration blockers:
>> 
>>     Subject: -only-migrate and the two different uses of migration blockers
>>      (was: spapr_events: Sure we may ignore migrate_add_blocker() failure?)
>>     Date: Mon, 19 Jul 2021 13:00:20 +0200 (5 weeks, 1 day, 20 hours ago)
>>     Message-ID: <87sg0amuuz.fsf_-_@dusky.pond.sub.org>
>> 
>>     We appear to use migration blockers in two ways:
>> 
>>     (1) Prevent migration for an indefinite time, typically due to use of
>>     some feature that isn't compatible with migration.
>> 
>>     (2) Delay migration for a short time.
>> 
>>     Option -only-migrate is designed for (1).  It interferes with (2).
>> 
>>     Example for (1): device "x-pci-proxy-dev" doesn't support migration.  It
>>     adds a migration blocker on realize, and deletes it on unrealize.  With
>>     -only-migrate, device realize fails.  Works as designed.
>> 
>>     Example for (2): spapr_mce_req_event() makes an effort to prevent
>>     migration degrate the reporting of FWNMIs.  It adds a migration blocker
>>     when it receives one, and deletes it when it's done handling it.  This
>>     is a best effort; if migration is already in progress by the time FWNMI
>>     is received, we simply carry on, and that's okay.  However, option
>>     -only-migrate sabotages the best effort entirely.
>> 
>>     While this isn't exactly terrible, it may be a weakness in our thinking
>>     and our infrastructure.  I'm bringing it up so the people in charge are
>>     aware :)
>> 
>> https://lists.nongnu.org/archive/html/qemu-devel/2021-07/msg04723.html
>> 
>> Downthread there, Dave Gilbert opined
>> 
>>     It almost feels like they need a way to temporarily hold off
>>     'completion' of migratio - i.e. the phase where we stop the CPU and
>>     write the device data;  mind you you'd also probably want it to stop
>>     cold-migrates/snapshots?
>
> Yeah, maybe spapr_mce_req_event() can be another candidate of the internal
> version of migration_add_blocker().
>
> I can add a patch to replace it if anyone likes me to.
>
> Both cold and live snapshot should have checked migration blockers, I think.
> E.g., cold snapshot has:
>
> bool save_snapshot(const char *name, bool overwrite, const char *vmstate,
>                   bool has_devices, strList *devices, Error **errp)
> {
>     [...]
>     if (migration_is_blocked(errp)) {
>         return false;
>     }
>     [...]
> }
>
> While the live snapshot shares similar code in migrate_prepare().
>
> So looks safe that nothing wrong should happen within add/del pair of blockers.
>
> However I do see that it's possible we'll allow the add_blocker to succeed even
> if during cold snapshot, because migration_is_idle() in migration_add_blocker()
> only checks migration state, not RUN_STATE_SAVE_VM.  So I'm wondering whether
> we'd like one more patch to cover that too, like:
>
> ---8<---
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 41429680c2..9c602a4ac1 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2055,15 +2055,16 @@ void migrate_init(MigrationState *s)
>  
>  int migrate_add_blocker_internal(Error *reason, Error **errp)
>  {
> -    if (migration_is_idle()) {
> -        migration_blockers = g_slist_prepend(migration_blockers, reason);
> -        return 0;
> +    /* Snapshots are similar to migrations, so check RUN_STATE_SAVE_VM too. */
> +    if (runstate_check(RUN_STATE_SAVE_VM) || !migration_is_idle()) {
> +        error_propagate_prepend(errp, error_copy(reason),
> +                                "disallowing migration blocker "
> +                                "(migration in progress) for: ");
> +        return -EBUSY;
>      }
>  
> -    error_propagate_prepend(errp, error_copy(reason),
> -                            "disallowing migration blocker "
> -                            "(migration in progress) for: ");
> -    return -EBUSY;
> +    migration_blockers = g_slist_prepend(migration_blockers, reason);
> +    return 0;
>  }
>  
>  int migrate_add_blocker(Error *reason, Error **errp)
> ---8<---
>
> It'll by accident also cover guest dump which also sets RUN_STATE_SAVE_VM, but
> I think that's ok.
>
> Thanks,

I delayed answering this in the hope of finding the time to think.  No
luck.  This is a quick answer, hopefully better than nothing and not too
confused.

"Snapshot should honor migration blockers" feels like an independent
issue.  I can't comment on it, because I haven't done my thinking there.

I'm not sure you fully got Dave's suggestion to "temporarily hold off
'completion'".  Let me explain (Dave, please correct misunderstandings,
if any).

Migration blockers and migration are mutually exclusive: you can't
migrate while such blockers are in effect (trying to immediately fails),
and you can't add such blockers while migration is in progress.

The temporary blockers Dave suggested do not block migration, only its
*completion* phase.  This makes sense *only* for short-lived blockers.
If such temporary blockers are in effect when migration is ready to
enter its completion phase, that entry is delayed until the temporary
blockers are gone.  If you try to add a temporary blocker while
migration is in its completion phase, the add blocks until migration
exists its completion phase.



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/2] dump-guest-memory: Add blocker for migration
  2021-09-01 11:35     ` Markus Armbruster
@ 2021-09-03 16:08       ` Peter Xu
  0 siblings, 0 replies; 13+ messages in thread
From: Peter Xu @ 2021-09-03 16:08 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Andrew Jones, Juan Quintela, qemu-devel, Dr . David Alan Gilbert,
	Leonardo Bras Soares Passos, David Gibson,
	Marc-André Lureau

On Wed, Sep 01, 2021 at 01:35:18PM +0200, Markus Armbruster wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > Markus,
> >
> > On Wed, Aug 25, 2021 at 09:54:12AM +0200, Markus Armbruster wrote:
> >> Peter Xu <peterx@redhat.com> writes:
> >> 
> >> > Both dump-guest-memory and live migration have vm state cached internally.
> >> > Allowing them to happen together means the vm state can be messed up.  Simply
> >> > block live migration for dump-guest-memory.
> >> >
> >> > One trivial thing to mention is we should still allow dump-guest-memory even if
> >> > -only-migratable is specified, because that flag should majorly be used to
> >> > guarantee not adding devices that will block migration by accident.  Dump guest
> >> > memory is not like that - it'll only block for the seconds when it's dumping.
> >> 
> >> I recently ran into a similarly unusual use of migration blockers:
> >> 
> >>     Subject: -only-migrate and the two different uses of migration blockers
> >>      (was: spapr_events: Sure we may ignore migrate_add_blocker() failure?)
> >>     Date: Mon, 19 Jul 2021 13:00:20 +0200 (5 weeks, 1 day, 20 hours ago)
> >>     Message-ID: <87sg0amuuz.fsf_-_@dusky.pond.sub.org>
> >> 
> >>     We appear to use migration blockers in two ways:
> >> 
> >>     (1) Prevent migration for an indefinite time, typically due to use of
> >>     some feature that isn't compatible with migration.
> >> 
> >>     (2) Delay migration for a short time.
> >> 
> >>     Option -only-migrate is designed for (1).  It interferes with (2).
> >> 
> >>     Example for (1): device "x-pci-proxy-dev" doesn't support migration.  It
> >>     adds a migration blocker on realize, and deletes it on unrealize.  With
> >>     -only-migrate, device realize fails.  Works as designed.
> >> 
> >>     Example for (2): spapr_mce_req_event() makes an effort to prevent
> >>     migration degrate the reporting of FWNMIs.  It adds a migration blocker
> >>     when it receives one, and deletes it when it's done handling it.  This
> >>     is a best effort; if migration is already in progress by the time FWNMI
> >>     is received, we simply carry on, and that's okay.  However, option
> >>     -only-migrate sabotages the best effort entirely.
> >> 
> >>     While this isn't exactly terrible, it may be a weakness in our thinking
> >>     and our infrastructure.  I'm bringing it up so the people in charge are
> >>     aware :)
> >> 
> >> https://lists.nongnu.org/archive/html/qemu-devel/2021-07/msg04723.html
> >> 
> >> Downthread there, Dave Gilbert opined
> >> 
> >>     It almost feels like they need a way to temporarily hold off
> >>     'completion' of migratio - i.e. the phase where we stop the CPU and
> >>     write the device data;  mind you you'd also probably want it to stop
> >>     cold-migrates/snapshots?
> >
> > Yeah, maybe spapr_mce_req_event() can be another candidate of the internal
> > version of migration_add_blocker().
> >
> > I can add a patch to replace it if anyone likes me to.
> >
> > Both cold and live snapshot should have checked migration blockers, I think.
> > E.g., cold snapshot has:
> >
> > bool save_snapshot(const char *name, bool overwrite, const char *vmstate,
> >                   bool has_devices, strList *devices, Error **errp)
> > {
> >     [...]
> >     if (migration_is_blocked(errp)) {
> >         return false;
> >     }
> >     [...]
> > }
> >
> > While the live snapshot shares similar code in migrate_prepare().
> >
> > So looks safe that nothing wrong should happen within add/del pair of blockers.
> >
> > However I do see that it's possible we'll allow the add_blocker to succeed even
> > if during cold snapshot, because migration_is_idle() in migration_add_blocker()
> > only checks migration state, not RUN_STATE_SAVE_VM.  So I'm wondering whether
> > we'd like one more patch to cover that too, like:
> >
> > ---8<---
> >
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 41429680c2..9c602a4ac1 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -2055,15 +2055,16 @@ void migrate_init(MigrationState *s)
> >  
> >  int migrate_add_blocker_internal(Error *reason, Error **errp)
> >  {
> > -    if (migration_is_idle()) {
> > -        migration_blockers = g_slist_prepend(migration_blockers, reason);
> > -        return 0;
> > +    /* Snapshots are similar to migrations, so check RUN_STATE_SAVE_VM too. */
> > +    if (runstate_check(RUN_STATE_SAVE_VM) || !migration_is_idle()) {
> > +        error_propagate_prepend(errp, error_copy(reason),
> > +                                "disallowing migration blocker "
> > +                                "(migration in progress) for: ");
> > +        return -EBUSY;
> >      }
> >  
> > -    error_propagate_prepend(errp, error_copy(reason),
> > -                            "disallowing migration blocker "
> > -                            "(migration in progress) for: ");
> > -    return -EBUSY;
> > +    migration_blockers = g_slist_prepend(migration_blockers, reason);
> > +    return 0;
> >  }
> >  
> >  int migrate_add_blocker(Error *reason, Error **errp)
> > ---8<---
> >
> > It'll by accident also cover guest dump which also sets RUN_STATE_SAVE_VM, but
> > I think that's ok.
> >
> > Thanks,
> 
> I delayed answering this in the hope of finding the time to think.  No
> luck.  This is a quick answer, hopefully better than nothing and not too
> confused.
> 
> "Snapshot should honor migration blockers" feels like an independent
> issue.  I can't comment on it, because I haven't done my thinking there.

Yes it's independent, but I do think they're similar to live migration.  Let's
wait for Dave or Juan to chim in.

> 
> I'm not sure you fully got Dave's suggestion to "temporarily hold off
> 'completion'".  Let me explain (Dave, please correct misunderstandings,
> if any).
> 
> Migration blockers and migration are mutually exclusive: you can't
> migrate while such blockers are in effect (trying to immediately fails),
> and you can't add such blockers while migration is in progress.
> 
> The temporary blockers Dave suggested do not block migration, only its
> *completion* phase.  This makes sense *only* for short-lived blockers.
> If such temporary blockers are in effect when migration is ready to
> enter its completion phase, that entry is delayed until the temporary
> blockers are gone.  If you try to add a temporary blocker while
> migration is in its completion phase, the add blocks until migration
> exists its completion phase.

I see, that sounds like what a mutex would do (which we'll hold during
completion phase of migration), and it looks reasonable.

But I think it's not a blocker of not having a bigger protection - so far my
approach should be blocking the whole migration.  To be explicit:

  - If we dump-guest-memory during migration (not only completion phase,
    anytime after it starts), it fails dump-guest-mem.

  - If we try to live migrate during guest-dump-memory, it fails migration.

So basically that's a "bigger mutex".  We may want to shrink it further, but
IMHO it can also be done on top (and we need some more work to do to justify
everything will still be okay).

It shouldn't be in a rush, imo, and it should be low priority because we don't
really have those two collide each other, not to mention when dump-guest-memory
during completing of migration.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-09-03 16:21 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-24 15:27 [PATCH 0/2] dump-guest-memory: Add blocker for migration Peter Xu
2021-08-24 15:27 ` [PATCH 1/2] migration: Add migrate_add_blocker_internal() Peter Xu
2021-08-24 18:04   ` Marc-André Lureau
2021-08-25  8:04   ` Juan Quintela
2021-08-24 15:27 ` [PATCH 2/2] dump-guest-memory: Block live migration Peter Xu
2021-08-24 18:04   ` Marc-André Lureau
2021-08-24 19:39     ` Peter Xu
2021-08-25  7:36   ` Marc-André Lureau
2021-08-25 20:48     ` Peter Xu
2021-08-25  7:54 ` [PATCH 0/2] dump-guest-memory: Add blocker for migration Markus Armbruster
2021-08-25 21:32   ` Peter Xu
2021-09-01 11:35     ` Markus Armbruster
2021-09-03 16:08       ` Peter Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).