All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/7] migration: pause-before-device
@ 2017-10-11 19:13 Dr. David Alan Gilbert (git)
  2017-10-11 19:13 ` [Qemu-devel] [PATCH 1/7] migration: Add 'pause-before-device' capability Dr. David Alan Gilbert (git)
                   ` (9 more replies)
  0 siblings, 10 replies; 17+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-10-11 19:13 UTC (permalink / raw)
  To: qemu-devel, kwolf, jdenemar, wangjie88, quintela, peterx, mreitz
  Cc: berrange, eblake, fuweiwei2

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Hi,
  This set attempts to make a race condition between migration and
drive-mirror (and other block users) soluble by allowing the migration
to be paused after the source qemu releases the block devices but
before the serialisation of the device state.

The symptom of this failure, as reported by Wangjie, is a:
   _co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed

and the source qemu dieing; so the problem is pretty nasty.
This has only been seen on 2.9 onwards, but the theory is that
prior to 2.9 it might have been happening anyway and we were
perhaps getting unreported corruptions (lost writes); so this
really needs fixing.

This flow came from discussions between Kevin and me, and we can't
see a way of fixing it without exposing a new state to the management
layer.

The flow is now:

(qemu) migrate_set_capability pause-before-device on
(qemu) migrate -d ...
(qemu) info migrate
...
Migration status: pause-before-device
...
<< issue commands to clean up any block jobs>>

(qemu) migrate_continue pause-before-device
(qemu) info migrate
...
Migration status: completed

This set has been _very_ lightly tested just at the normal migration
code, without the addition of the drive mirror; so this is a first
cut.  I'd appreciate some feedback from libvirt whether the inteface
is OK and ideally a hack to test it in a full libvirt setup to see
if we hit any other issues.

The precopy flow is:
active->pause-before-device->completed

The postcopy flow is:
active->pause-before-device->postcopy-active->completed

Although the behaviour with postcopy only gets interesting when
we add something like Max's active-sync.

Please argue about the command and state naming.

Dave

Dr. David Alan Gilbert (7):
  migration: Add 'pause-before-device' capability
  migration: Add 'pause-before-device' and 'device' statuses
  migration: Wait for semaphore before completing migration
  migration: migrate-continue
  migrate: HMP migate_continue
  migration: allow cancel to unpause
  migration: pause-before-device for postcopy

 hmp-commands.hx       | 12 +++++++
 hmp.c                 | 13 +++++++
 hmp.h                 |  1 +
 migration/migration.c | 96 +++++++++++++++++++++++++++++++++++++++++++++++++--
 migration/migration.h |  4 +++
 qapi/migration.json   | 30 ++++++++++++++--
 6 files changed, 152 insertions(+), 4 deletions(-)

-- 
2.13.6

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH 1/7] migration: Add 'pause-before-device' capability
  2017-10-11 19:13 [Qemu-devel] [PATCH 0/7] migration: pause-before-device Dr. David Alan Gilbert (git)
@ 2017-10-11 19:13 ` Dr. David Alan Gilbert (git)
  2017-10-11 19:13 ` [Qemu-devel] [PATCH 2/7] migration: Add 'pause-before-device' and 'device' statuses Dr. David Alan Gilbert (git)
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-10-11 19:13 UTC (permalink / raw)
  To: qemu-devel, kwolf, jdenemar, wangjie88, quintela, peterx, mreitz
  Cc: berrange, eblake, fuweiwei2

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

When 'pause-before-device' is enabled, the outgoing migration
will pause after completing any outstanding IOs but before serialising
the device state.   At this point the management layer gets the
chance to clean up any device jobs or other device users before
the migration completes.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/migration.c | 9 +++++++++
 migration/migration.h | 1 +
 qapi/migration.json   | 5 ++++-
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 98429dc843..df8f8c9f14 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1521,6 +1521,15 @@ bool migrate_use_multifd(void)
     return s->enabled_capabilities[MIGRATION_CAPABILITY_X_MULTIFD];
 }
 
+bool migrate_pause_before_device(void)
+{
+    MigrationState *s;
+
+    s = migrate_get_current();
+
+    return s->enabled_capabilities[MIGRATION_CAPABILITY_PAUSE_BEFORE_DEVICE];
+}
+
 int migrate_multifd_channels(void)
 {
     MigrationState *s;
diff --git a/migration/migration.h b/migration/migration.h
index b83cceadc4..37feea5453 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -177,6 +177,7 @@ bool migrate_zero_blocks(void);
 
 bool migrate_auto_converge(void);
 bool migrate_use_multifd(void);
+bool migrate_pause_before_device(void);
 int migrate_multifd_channels(void);
 int migrate_multifd_page_count(void);
 
diff --git a/qapi/migration.json b/qapi/migration.json
index f8b365e3f5..420b56f194 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -341,6 +341,9 @@
 # @return-path: If enabled, migration will use the return path even
 #               for precopy. (since 2.10)
 #
+# @pause-before-device: Pause outgoing migration before serialising device
+#          state but after finalising any IO. (since 2.11)
+#
 # @x-multifd: Use more than one fd for migration (since 2.11)
 #
 # Since: 1.2
@@ -348,7 +351,7 @@
 { 'enum': 'MigrationCapability',
   'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
            'compress', 'events', 'postcopy-ram', 'x-colo', 'release-ram',
-           'block', 'return-path', 'x-multifd' ] }
+           'block', 'return-path', 'pause-before-device', 'x-multifd' ] }
 
 ##
 # @MigrationCapabilityStatus:
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH 2/7] migration: Add 'pause-before-device' and 'device' statuses
  2017-10-11 19:13 [Qemu-devel] [PATCH 0/7] migration: pause-before-device Dr. David Alan Gilbert (git)
  2017-10-11 19:13 ` [Qemu-devel] [PATCH 1/7] migration: Add 'pause-before-device' capability Dr. David Alan Gilbert (git)
@ 2017-10-11 19:13 ` Dr. David Alan Gilbert (git)
  2017-10-11 19:13 ` [Qemu-devel] [PATCH 3/7] migration: Wait for semaphore before completing migration Dr. David Alan Gilbert (git)
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-10-11 19:13 UTC (permalink / raw)
  To: qemu-devel, kwolf, jdenemar, wangjie88, quintela, peterx, mreitz
  Cc: berrange, eblake, fuweiwei2

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add two statuses for use when the 'pause-before-device'
capability is enabled.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/migration.c | 6 ++++++
 qapi/migration.json   | 8 +++++++-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index df8f8c9f14..e1a87c3d23 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -526,6 +526,8 @@ static bool migration_is_setup_or_active(int state)
     case MIGRATION_STATUS_ACTIVE:
     case MIGRATION_STATUS_POSTCOPY_ACTIVE:
     case MIGRATION_STATUS_SETUP:
+    case MIGRATION_STATUS_PAUSE_BEFORE_DEVICE:
+    case MIGRATION_STATUS_DEVICE:
         return true;
 
     default:
@@ -600,6 +602,8 @@ MigrationInfo *qmp_query_migrate(Error **errp)
     case MIGRATION_STATUS_ACTIVE:
     case MIGRATION_STATUS_CANCELLING:
     case MIGRATION_STATUS_POSTCOPY_ACTIVE:
+    case MIGRATION_STATUS_PAUSE_BEFORE_DEVICE:
+    case MIGRATION_STATUS_DEVICE:
          /* TODO add some postcopy stats */
         info->has_status = true;
         info->has_total_time = true;
@@ -1183,6 +1187,8 @@ bool migration_is_idle(void)
     case MIGRATION_STATUS_ACTIVE:
     case MIGRATION_STATUS_POSTCOPY_ACTIVE:
     case MIGRATION_STATUS_COLO:
+    case MIGRATION_STATUS_PAUSE_BEFORE_DEVICE:
+    case MIGRATION_STATUS_DEVICE:
         return false;
     case MIGRATION_STATUS__MAX:
         g_assert_not_reached();
diff --git a/qapi/migration.json b/qapi/migration.json
index 420b56f194..00f2f6cc09 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -96,12 +96,18 @@
 # @colo: VM is in the process of fault tolerance, VM can not get into this
 #        state unless colo capability is enabled for migration. (since 2.8)
 #
+# @pause-before-device: Paused before device serialisation. (since 2.11)
+#
+# @device: During device serialisation when pause-before-device is enabled
+#        (since 2.11)
+#
 # Since: 2.3
 #
 ##
 { 'enum': 'MigrationStatus',
   'data': [ 'none', 'setup', 'cancelling', 'cancelled',
-            'active', 'postcopy-active', 'completed', 'failed', 'colo' ] }
+            'active', 'postcopy-active', 'completed', 'failed', 'colo',
+            'pause-before-device', 'device' ] }
 
 ##
 # @MigrationInfo:
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH 3/7] migration: Wait for semaphore before completing migration
  2017-10-11 19:13 [Qemu-devel] [PATCH 0/7] migration: pause-before-device Dr. David Alan Gilbert (git)
  2017-10-11 19:13 ` [Qemu-devel] [PATCH 1/7] migration: Add 'pause-before-device' capability Dr. David Alan Gilbert (git)
  2017-10-11 19:13 ` [Qemu-devel] [PATCH 2/7] migration: Add 'pause-before-device' and 'device' statuses Dr. David Alan Gilbert (git)
@ 2017-10-11 19:13 ` Dr. David Alan Gilbert (git)
  2017-10-18  3:35   ` Peter Xu
  2017-10-11 19:13 ` [Qemu-devel] [PATCH 4/7] migration: migrate-continue Dr. David Alan Gilbert (git)
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 17+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-10-11 19:13 UTC (permalink / raw)
  To: qemu-devel, kwolf, jdenemar, wangjie88, quintela, peterx, mreitz
  Cc: berrange, eblake, fuweiwei2

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Wait for a semaphore before completing the migration,
if the previously added capability was enabled.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/migration.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 migration/migration.h |  3 +++
 2 files changed, 50 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index e1a87c3d23..b411a7bb63 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1967,6 +1967,46 @@ fail:
 }
 
 /**
+ * migration_maybe_pause: Pause if required to by migrate_pause_before_device
+ * called with the iothread locked
+ * Returns: 0 on success
+ */
+static int migration_maybe_pause(MigrationState *s, int *current_active_state)
+{
+    int ret;
+    if (!migrate_pause_before_device()) {
+        return 0;
+    }
+    ret = bdrv_inactivate_all();
+    if (ret) {
+        error_report("%s: bdrv_inactivate_all() failed (%d)",
+                     __func__, ret);
+        return ret;
+    }
+
+    s->block_inactive = true;
+
+    /* Since leaving this state is not atomic with posting the semaphore
+     * it's possible that someone could have issued multiple migrate_continue
+     * and the semaphore is incorrectly positive at this point;
+     * the docs say it's undefined to reinit a semaphore that's already
+     * init'd, so use timedwait to eat up any existing posts.
+     */
+    while (qemu_sem_timedwait(&s->pause_sem, 1) == 0);
+
+    qemu_mutex_unlock_iothread();
+    migrate_set_state(&s->state, *current_active_state,
+                      MIGRATION_STATUS_PAUSE_BEFORE_DEVICE);
+    qemu_sem_wait(&s->pause_sem);
+    migrate_set_state(&s->state, MIGRATION_STATUS_PAUSE_BEFORE_DEVICE,
+                      MIGRATION_STATUS_DEVICE);
+    *current_active_state = MIGRATION_STATUS_DEVICE;
+    qemu_mutex_lock_iothread();
+
+    return s->state == MIGRATION_STATUS_DEVICE ? 0 : -EINVAL;
+}
+
+/**
  * migration_completion: Used by migration_thread when there's not much left.
  *   The caller 'breaks' the loop when this returns.
  *
@@ -1992,6 +2032,11 @@ static void migration_completion(MigrationState *s, int current_active_state,
             bool inactivate = !migrate_colo_enabled();
             ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
             if (ret >= 0) {
+                ret = migration_maybe_pause(s, &current_active_state);
+                /* If this worked it will already have inactivated */
+                inactivate &= !migrate_pause_before_device();
+            }
+            if (ret >= 0) {
                 qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
                 ret = qemu_savevm_state_complete_precopy(s->to_dst_file, false,
                                                          inactivate);
@@ -2372,6 +2417,7 @@ static void migration_instance_finalize(Object *obj)
 
     g_free(params->tls_hostname);
     g_free(params->tls_creds);
+    qemu_sem_destroy(&ms->pause_sem);
 }
 
 static void migration_instance_init(Object *obj)
@@ -2382,6 +2428,7 @@ static void migration_instance_init(Object *obj)
     ms->state = MIGRATION_STATUS_NONE;
     ms->xbzrle_cache_size = DEFAULT_MIGRATE_CACHE_SIZE;
     ms->mbps = -1;
+    qemu_sem_init(&ms->pause_sem, 0);
 
     params->tls_hostname = g_strdup("");
     params->tls_creds = g_strdup("");
diff --git a/migration/migration.h b/migration/migration.h
index 37feea5453..447e8b3f79 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -121,6 +121,9 @@ struct MigrationState
     /* Flag set once the migration thread called bdrv_inactivate_all */
     bool block_inactive;
 
+    /* Migration is paused due to pause-before-device */
+    QemuSemaphore pause_sem;
+
     /* The semaphore is used to notify COLO thread that failover is finished */
     QemuSemaphore colo_exit_sem;
 
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH 4/7] migration: migrate-continue
  2017-10-11 19:13 [Qemu-devel] [PATCH 0/7] migration: pause-before-device Dr. David Alan Gilbert (git)
                   ` (2 preceding siblings ...)
  2017-10-11 19:13 ` [Qemu-devel] [PATCH 3/7] migration: Wait for semaphore before completing migration Dr. David Alan Gilbert (git)
@ 2017-10-11 19:13 ` Dr. David Alan Gilbert (git)
  2017-10-11 19:13 ` [Qemu-devel] [PATCH 5/7] migrate: HMP migate_continue Dr. David Alan Gilbert (git)
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-10-11 19:13 UTC (permalink / raw)
  To: qemu-devel, kwolf, jdenemar, wangjie88, quintela, peterx, mreitz
  Cc: berrange, eblake, fuweiwei2

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

A new qmp command allows the caller to continue from a given
paused state.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/migration.c | 11 +++++++++++
 qapi/migration.json   | 17 +++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index b411a7bb63..be03e8ff0b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1368,6 +1368,17 @@ void qmp_migrate_cancel(Error **errp)
     migrate_fd_cancel(migrate_get_current());
 }
 
+void qmp_migrate_continue(MigrationStatus state, Error **errp)
+{
+    MigrationState *s = migrate_get_current();
+    if (s->state != state) {
+        error_setg(errp,  "Migration not in expected state: %s",
+                   MigrationStatus_str(s->state));
+        return;
+    }
+    qemu_sem_post(&s->pause_sem);
+}
+
 void qmp_migrate_set_cache_size(int64_t value, Error **errp)
 {
     MigrationState *s = migrate_get_current();
diff --git a/qapi/migration.json b/qapi/migration.json
index 00f2f6cc09..a95eb19528 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -877,6 +877,23 @@
 { 'command': 'migrate_cancel' }
 
 ##
+# @migrate-continue:
+#
+# Continue migration when it's in a paused state.
+#
+# @state: The state the migration is currently expected to be in
+#
+# Returns: nothing on success
+# Since: 2.11
+# Example:
+#
+# -> { "execute": "migrate-continue" , "arguments":
+#      { "state": "pause-before-device" } }
+# <- { "return": {} }
+##
+{ 'command': 'migrate-continue', 'data': {'state': 'MigrationStatus'} }
+
+##
 # @migrate_set_downtime:
 #
 # Set maximum tolerated downtime for migration.
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH 5/7] migrate: HMP migate_continue
  2017-10-11 19:13 [Qemu-devel] [PATCH 0/7] migration: pause-before-device Dr. David Alan Gilbert (git)
                   ` (3 preceding siblings ...)
  2017-10-11 19:13 ` [Qemu-devel] [PATCH 4/7] migration: migrate-continue Dr. David Alan Gilbert (git)
@ 2017-10-11 19:13 ` Dr. David Alan Gilbert (git)
  2017-10-11 19:13 ` [Qemu-devel] [PATCH 6/7] migration: allow cancel to unpause Dr. David Alan Gilbert (git)
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-10-11 19:13 UTC (permalink / raw)
  To: qemu-devel, kwolf, jdenemar, wangjie88, quintela, peterx, mreitz
  Cc: berrange, eblake, fuweiwei2

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

HMP equivalent to the just added migrate-continue
Unpause a migrate paused at a given state.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 hmp-commands.hx | 12 ++++++++++++
 hmp.c           | 13 +++++++++++++
 hmp.h           |  1 +
 3 files changed, 26 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 1941e19932..4afd57cf5f 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -959,7 +959,19 @@ STEXI
 @item migrate_cancel
 @findex migrate_cancel
 Cancel the current VM migration.
+ETEXI
 
+    {
+        .name       = "migrate_continue",
+        .args_type  = "state:s",
+        .params     = "state",
+        .help       = "Continue migration from the given paused state",
+        .cmd        = hmp_migrate_continue,
+    },
+STEXI
+@item migrate_continue @var{state}
+@findex migrate_continue
+Continue migration from the paused state @var{state}
 ETEXI
 
     {
diff --git a/hmp.c b/hmp.c
index 739d330f4e..5fd22a6ea8 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1490,6 +1490,19 @@ void hmp_migrate_cancel(Monitor *mon, const QDict *qdict)
     qmp_migrate_cancel(NULL);
 }
 
+void hmp_migrate_continue(Monitor *mon, const QDict *qdict)
+{
+    Error *err = NULL;
+    const char *state = qdict_get_str(qdict, "state");
+    int val = qapi_enum_parse(&MigrationStatus_lookup, state, -1, &err);
+
+    if (val >= 0) {
+        qmp_migrate_continue(val, &err);
+    }
+
+    hmp_handle_error(mon, &err);
+}
+
 void hmp_migrate_incoming(Monitor *mon, const QDict *qdict)
 {
     Error *err = NULL;
diff --git a/hmp.h b/hmp.h
index 3605003e4c..a6f56b1f29 100644
--- a/hmp.h
+++ b/hmp.h
@@ -68,6 +68,7 @@ void hmp_savevm(Monitor *mon, const QDict *qdict);
 void hmp_delvm(Monitor *mon, const QDict *qdict);
 void hmp_info_snapshots(Monitor *mon, const QDict *qdict);
 void hmp_migrate_cancel(Monitor *mon, const QDict *qdict);
+void hmp_migrate_continue(Monitor *mon, const QDict *qdict);
 void hmp_migrate_incoming(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict);
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH 6/7] migration: allow cancel to unpause
  2017-10-11 19:13 [Qemu-devel] [PATCH 0/7] migration: pause-before-device Dr. David Alan Gilbert (git)
                   ` (4 preceding siblings ...)
  2017-10-11 19:13 ` [Qemu-devel] [PATCH 5/7] migrate: HMP migate_continue Dr. David Alan Gilbert (git)
@ 2017-10-11 19:13 ` Dr. David Alan Gilbert (git)
  2017-10-11 19:13 ` [Qemu-devel] [PATCH 7/7] migration: pause-before-device for postcopy Dr. David Alan Gilbert (git)
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-10-11 19:13 UTC (permalink / raw)
  To: qemu-devel, kwolf, jdenemar, wangjie88, quintela, peterx, mreitz
  Cc: berrange, eblake, fuweiwei2

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

If a migration_cancel is issued during the new paused state,
kick the pause_sem to get to unpause so it can cancel.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/migration.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index be03e8ff0b..fa42918270 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1108,6 +1108,10 @@ static void migrate_fd_cancel(MigrationState *s)
         if (!migration_is_setup_or_active(old_state)) {
             break;
         }
+        /* If the migration is paused, kick it out of the pause */
+        if (old_state == MIGRATION_STATUS_PAUSE_BEFORE_DEVICE) {
+            qemu_sem_post(&s->pause_sem);
+        }
         migrate_set_state(&s->state, old_state, MIGRATION_STATUS_CANCELLING);
     } while (s->state != MIGRATION_STATUS_CANCELLING);
 
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH 7/7] migration: pause-before-device for postcopy
  2017-10-11 19:13 [Qemu-devel] [PATCH 0/7] migration: pause-before-device Dr. David Alan Gilbert (git)
                   ` (5 preceding siblings ...)
  2017-10-11 19:13 ` [Qemu-devel] [PATCH 6/7] migration: allow cancel to unpause Dr. David Alan Gilbert (git)
@ 2017-10-11 19:13 ` Dr. David Alan Gilbert (git)
  2017-10-11 20:03 ` [Qemu-devel] [PATCH 0/7] migration: pause-before-device no-reply
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-10-11 19:13 UTC (permalink / raw)
  To: qemu-devel, kwolf, jdenemar, wangjie88, quintela, peterx, mreitz
  Cc: berrange, eblake, fuweiwei2

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add pause-before-device support for postcopy.
After starting postcopy it will transition
    active->pause_before_device->postcopy_active

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/migration.c | 29 ++++++++++++++++++++++-------
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index fa42918270..bdb6a30995 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -104,6 +104,9 @@ enum mig_rp_message_type {
 static MigrationState *current_migration;
 
 static bool migration_object_check(MigrationState *ms, Error **errp);
+static int migration_maybe_pause(MigrationState *s,
+                                 int *current_active_state,
+                                 int new_state);
 
 void migration_object_init(void)
 {
@@ -1829,8 +1832,11 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
     QEMUFile *fb;
     int64_t time_at_stop = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
     bool restart_block = false;
-    migrate_set_state(&ms->state, MIGRATION_STATUS_ACTIVE,
-                      MIGRATION_STATUS_POSTCOPY_ACTIVE);
+    int cur_state = MIGRATION_STATUS_ACTIVE;
+    if (!migrate_pause_before_device()) {
+        migrate_set_state(&ms->state, MIGRATION_STATUS_ACTIVE,
+                          MIGRATION_STATUS_POSTCOPY_ACTIVE);
+    }
 
     trace_postcopy_start();
     qemu_mutex_lock_iothread();
@@ -1850,6 +1856,12 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
     }
     restart_block = true;
 
+    ret = migration_maybe_pause(ms, &cur_state,
+                                MIGRATION_STATUS_POSTCOPY_ACTIVE);
+    if (ret < 0) {
+        goto fail;
+    }
+
     /*
      * Cause any non-postcopiable, but iterative devices to
      * send out their final data.
@@ -1986,7 +1998,9 @@ fail:
  * called with the iothread locked
  * Returns: 0 on success
  */
-static int migration_maybe_pause(MigrationState *s, int *current_active_state)
+static int migration_maybe_pause(MigrationState *s,
+                                 int *current_active_state,
+                                 int new_state)
 {
     int ret;
     if (!migrate_pause_before_device()) {
@@ -2014,11 +2028,11 @@ static int migration_maybe_pause(MigrationState *s, int *current_active_state)
                       MIGRATION_STATUS_PAUSE_BEFORE_DEVICE);
     qemu_sem_wait(&s->pause_sem);
     migrate_set_state(&s->state, MIGRATION_STATUS_PAUSE_BEFORE_DEVICE,
-                      MIGRATION_STATUS_DEVICE);
-    *current_active_state = MIGRATION_STATUS_DEVICE;
+                      new_state);
+    *current_active_state = new_state;
     qemu_mutex_lock_iothread();
 
-    return s->state == MIGRATION_STATUS_DEVICE ? 0 : -EINVAL;
+    return s->state == new_state ? 0 : -EINVAL;
 }
 
 /**
@@ -2047,7 +2061,8 @@ static void migration_completion(MigrationState *s, int current_active_state,
             bool inactivate = !migrate_colo_enabled();
             ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
             if (ret >= 0) {
-                ret = migration_maybe_pause(s, &current_active_state);
+                ret = migration_maybe_pause(s, &current_active_state,
+                                            MIGRATION_STATUS_DEVICE);
                 /* If this worked it will already have inactivated */
                 inactivate &= !migrate_pause_before_device();
             }
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 0/7] migration: pause-before-device
  2017-10-11 19:13 [Qemu-devel] [PATCH 0/7] migration: pause-before-device Dr. David Alan Gilbert (git)
                   ` (6 preceding siblings ...)
  2017-10-11 19:13 ` [Qemu-devel] [PATCH 7/7] migration: pause-before-device for postcopy Dr. David Alan Gilbert (git)
@ 2017-10-11 20:03 ` no-reply
  2017-10-12  8:21 ` Daniel P. Berrange
  2017-10-12 10:02 ` Daniel P. Berrange
  9 siblings, 0 replies; 17+ messages in thread
From: no-reply @ 2017-10-11 20:03 UTC (permalink / raw)
  To: dgilbert
  Cc: famz, qemu-devel, kwolf, jdenemar, wangjie88, quintela, peterx,
	mreitz, fuweiwei2

Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20171011191317.24157-1-dgilbert@redhat.com
Subject: [Qemu-devel] [PATCH 0/7] migration: pause-before-device

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 t [tag update]            patchew/1507742676-9908-1-git-send-email-peter.maydell@linaro.org -> patchew/1507742676-9908-1-git-send-email-peter.maydell@linaro.org
 * [new tag]               patchew/1507751249-3035-1-git-send-email-stefanb@linux.vnet.ibm.com -> patchew/1507751249-3035-1-git-send-email-stefanb@linux.vnet.ibm.com
Switched to a new branch 'test'
88ddcde82f migration: pause-before-device for postcopy
341433d19b migration: allow cancel to unpause
4a76fbc2b2 migrate: HMP migate_continue
24305477e0 migration: migrate-continue
fc48af642c migration: Wait for semaphore before completing migration
263e6dc5be migration: Add 'pause-before-device' and 'device' statuses
7827c25132 migration: Add 'pause-before-device' capability

=== OUTPUT BEGIN ===
Checking PATCH 1/7: migration: Add 'pause-before-device' capability...
Checking PATCH 2/7: migration: Add 'pause-before-device' and 'device' statuses...
Checking PATCH 3/7: migration: Wait for semaphore before completing migration...
ERROR: trailing statements should be on next line
#45: FILE: migration/migration.c:1995:
+    while (qemu_sem_timedwait(&s->pause_sem, 1) == 0);

ERROR: braces {} are necessary even for single statement blocks
#45: FILE: migration/migration.c:1995:
+    while (qemu_sem_timedwait(&s->pause_sem, 1) == 0);

total: 2 errors, 0 warnings, 80 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 4/7: migration: migrate-continue...
Checking PATCH 5/7: migrate: HMP migate_continue...
Checking PATCH 6/7: migration: allow cancel to unpause...
Checking PATCH 7/7: migration: pause-before-device for postcopy...
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@freelists.org

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 0/7] migration: pause-before-device
  2017-10-11 19:13 [Qemu-devel] [PATCH 0/7] migration: pause-before-device Dr. David Alan Gilbert (git)
                   ` (7 preceding siblings ...)
  2017-10-11 20:03 ` [Qemu-devel] [PATCH 0/7] migration: pause-before-device no-reply
@ 2017-10-12  8:21 ` Daniel P. Berrange
  2017-10-12  9:18   ` Kevin Wolf
  2017-10-12 10:02 ` Daniel P. Berrange
  9 siblings, 1 reply; 17+ messages in thread
From: Daniel P. Berrange @ 2017-10-12  8:21 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: qemu-devel, kwolf, jdenemar, wangjie88, quintela, peterx, mreitz,
	eblake, fuweiwei2

On Wed, Oct 11, 2017 at 08:13:10PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Hi,
>   This set attempts to make a race condition between migration and
> drive-mirror (and other block users) soluble by allowing the migration
> to be paused after the source qemu releases the block devices but
> before the serialisation of the device state.
> 
> The symptom of this failure, as reported by Wangjie, is a:
>    _co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed
> 
> and the source qemu dieing; so the problem is pretty nasty.
> This has only been seen on 2.9 onwards, but the theory is that
> prior to 2.9 it might have been happening anyway and we were
> perhaps getting unreported corruptions (lost writes); so this
> really needs fixing.
> 
> This flow came from discussions between Kevin and me, and we can't
> see a way of fixing it without exposing a new state to the management
> layer.
> 
> The flow is now:
> 
> (qemu) migrate_set_capability pause-before-device on
> (qemu) migrate -d ...
> (qemu) info migrate
> ...
> Migration status: pause-before-device
> ...
> << issue commands to clean up any block jobs>>
> 
> (qemu) migrate_continue pause-before-device
> (qemu) info migrate
> ...
> Migration status: completed

I'm curious why QEMU doesn't have enough info to clean up the block
jobs automatically ? What is the key thing that libvirt knows about
the block jobs, that QEMU is lacking ? If QEMU had the right info it
could do it automatically & avoid this extra lock-step synchronization
with libvirt.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 0/7] migration: pause-before-device
  2017-10-12  8:21 ` Daniel P. Berrange
@ 2017-10-12  9:18   ` Kevin Wolf
  2017-10-12  9:27     ` Daniel P. Berrange
  0 siblings, 1 reply; 17+ messages in thread
From: Kevin Wolf @ 2017-10-12  9:18 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Dr. David Alan Gilbert (git),
	qemu-devel, jdenemar, wangjie88, quintela, peterx, mreitz,
	eblake, fuweiwei2

Am 12.10.2017 um 10:21 hat Daniel P. Berrange geschrieben:
> On Wed, Oct 11, 2017 at 08:13:10PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Hi,
> >   This set attempts to make a race condition between migration and
> > drive-mirror (and other block users) soluble by allowing the migration
> > to be paused after the source qemu releases the block devices but
> > before the serialisation of the device state.
> > 
> > The symptom of this failure, as reported by Wangjie, is a:
> >    _co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed
> > 
> > and the source qemu dieing; so the problem is pretty nasty.
> > This has only been seen on 2.9 onwards, but the theory is that
> > prior to 2.9 it might have been happening anyway and we were
> > perhaps getting unreported corruptions (lost writes); so this
> > really needs fixing.
> > 
> > This flow came from discussions between Kevin and me, and we can't
> > see a way of fixing it without exposing a new state to the management
> > layer.
> > 
> > The flow is now:
> > 
> > (qemu) migrate_set_capability pause-before-device on
> > (qemu) migrate -d ...
> > (qemu) info migrate
> > ...
> > Migration status: pause-before-device
> > ...
> > << issue commands to clean up any block jobs>>
> > 
> > (qemu) migrate_continue pause-before-device
> > (qemu) info migrate
> > ...
> > Migration status: completed
> 
> I'm curious why QEMU doesn't have enough info to clean up the block
> jobs automatically ? What is the key thing that libvirt knows about
> the block jobs, that QEMU is lacking ? If QEMU had the right info it
> could do it automatically & avoid this extra lock-step synchronization
> with libvirt.

The key point is that the block job needs to be completed while the
source VM is stopped, but the source qemu is still in control of the
image files (e.g. still holds the file locks), so that it can do the
remaining writes.

Without the additional migration phase, the only state where both sides
are stopped is when the destination is in control of the image files
(migration has completed, but -S prevents it from automatically
resuming), so the source can't write to the image any more.

Kevin

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 0/7] migration: pause-before-device
  2017-10-12  9:18   ` Kevin Wolf
@ 2017-10-12  9:27     ` Daniel P. Berrange
  2017-10-12  9:52       ` Kevin Wolf
  0 siblings, 1 reply; 17+ messages in thread
From: Daniel P. Berrange @ 2017-10-12  9:27 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Dr. David Alan Gilbert (git),
	qemu-devel, jdenemar, wangjie88, quintela, peterx, mreitz,
	eblake, fuweiwei2

On Thu, Oct 12, 2017 at 11:18:31AM +0200, Kevin Wolf wrote:
> Am 12.10.2017 um 10:21 hat Daniel P. Berrange geschrieben:
> > On Wed, Oct 11, 2017 at 08:13:10PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > 
> > > Hi,
> > >   This set attempts to make a race condition between migration and
> > > drive-mirror (and other block users) soluble by allowing the migration
> > > to be paused after the source qemu releases the block devices but
> > > before the serialisation of the device state.
> > > 
> > > The symptom of this failure, as reported by Wangjie, is a:
> > >    _co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed
> > > 
> > > and the source qemu dieing; so the problem is pretty nasty.
> > > This has only been seen on 2.9 onwards, but the theory is that
> > > prior to 2.9 it might have been happening anyway and we were
> > > perhaps getting unreported corruptions (lost writes); so this
> > > really needs fixing.
> > > 
> > > This flow came from discussions between Kevin and me, and we can't
> > > see a way of fixing it without exposing a new state to the management
> > > layer.
> > > 
> > > The flow is now:
> > > 
> > > (qemu) migrate_set_capability pause-before-device on
> > > (qemu) migrate -d ...
> > > (qemu) info migrate
> > > ...
> > > Migration status: pause-before-device
> > > ...
> > > << issue commands to clean up any block jobs>>
> > > 
> > > (qemu) migrate_continue pause-before-device
> > > (qemu) info migrate
> > > ...
> > > Migration status: completed
> > 
> > I'm curious why QEMU doesn't have enough info to clean up the block
> > jobs automatically ? What is the key thing that libvirt knows about
> > the block jobs, that QEMU is lacking ? If QEMU had the right info it
> > could do it automatically & avoid this extra lock-step synchronization
> > with libvirt.
> 
> The key point is that the block job needs to be completed while the
> source VM is stopped, but the source qemu is still in control of the
> image files (e.g. still holds the file locks), so that it can do the
> remaining writes.
> 
> Without the additional migration phase, the only state where both sides
> are stopped is when the destination is in control of the image files
> (migration has completed, but -S prevents it from automatically
> resuming), so the source can't write to the image any more.

Hmm, I always thought that the target QEMU did not start using the
image files until you ran 'cont' on the target. eg once source QEMU
has migrate=completed, both QEMUs are in paused state and source QEMU
still owns the images, until we run 'cont'.

What you're saying seems to imply this is not the case, but if so what
is triggering the target QEMU to acquire the locks on images ? Is it
done implicitly when it finishes reading device state off the wire ?

If so, could we instead add a migrate feature flag to tell the target
QEMU not to automatically acquire image locks, until it receives an
explicit 'cont'. That would then not require this extra lock-step
migration state.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 0/7] migration: pause-before-device
  2017-10-12  9:27     ` Daniel P. Berrange
@ 2017-10-12  9:52       ` Kevin Wolf
  2017-10-12  9:55         ` Daniel P. Berrange
  0 siblings, 1 reply; 17+ messages in thread
From: Kevin Wolf @ 2017-10-12  9:52 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Dr. David Alan Gilbert (git),
	qemu-devel, jdenemar, wangjie88, quintela, peterx, mreitz,
	eblake, fuweiwei2

Am 12.10.2017 um 11:27 hat Daniel P. Berrange geschrieben:
> On Thu, Oct 12, 2017 at 11:18:31AM +0200, Kevin Wolf wrote:
> > Am 12.10.2017 um 10:21 hat Daniel P. Berrange geschrieben:
> > > On Wed, Oct 11, 2017 at 08:13:10PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > 
> > > > Hi,
> > > >   This set attempts to make a race condition between migration and
> > > > drive-mirror (and other block users) soluble by allowing the migration
> > > > to be paused after the source qemu releases the block devices but
> > > > before the serialisation of the device state.
> > > > 
> > > > The symptom of this failure, as reported by Wangjie, is a:
> > > >    _co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed
> > > > 
> > > > and the source qemu dieing; so the problem is pretty nasty.
> > > > This has only been seen on 2.9 onwards, but the theory is that
> > > > prior to 2.9 it might have been happening anyway and we were
> > > > perhaps getting unreported corruptions (lost writes); so this
> > > > really needs fixing.
> > > > 
> > > > This flow came from discussions between Kevin and me, and we can't
> > > > see a way of fixing it without exposing a new state to the management
> > > > layer.
> > > > 
> > > > The flow is now:
> > > > 
> > > > (qemu) migrate_set_capability pause-before-device on
> > > > (qemu) migrate -d ...
> > > > (qemu) info migrate
> > > > ...
> > > > Migration status: pause-before-device
> > > > ...
> > > > << issue commands to clean up any block jobs>>
> > > > 
> > > > (qemu) migrate_continue pause-before-device
> > > > (qemu) info migrate
> > > > ...
> > > > Migration status: completed
> > > 
> > > I'm curious why QEMU doesn't have enough info to clean up the block
> > > jobs automatically ? What is the key thing that libvirt knows about
> > > the block jobs, that QEMU is lacking ? If QEMU had the right info it
> > > could do it automatically & avoid this extra lock-step synchronization
> > > with libvirt.
> > 
> > The key point is that the block job needs to be completed while the
> > source VM is stopped, but the source qemu is still in control of the
> > image files (e.g. still holds the file locks), so that it can do the
> > remaining writes.
> > 
> > Without the additional migration phase, the only state where both sides
> > are stopped is when the destination is in control of the image files
> > (migration has completed, but -S prevents it from automatically
> > resuming), so the source can't write to the image any more.
> 
> Hmm, I always thought that the target QEMU did not start using the
> image files until you ran 'cont' on the target. eg once source QEMU
> has migrate=completed, both QEMUs are in paused state and source QEMU
> still owns the images, until we run 'cont'.
> 
> What you're saying seems to imply this is not the case, but if so what
> is triggering the target QEMU to acquire the locks on images ? Is it
> done implicitly when it finishes reading device state off the wire ?
> 
> If so, could we instead add a migrate feature flag to tell the target
> QEMU not to automatically acquire image locks, until it receives an
> explicit 'cont'. That would then not require this extra lock-step
> migration state.

The handover consists of two parts: The destination acquires the locks,
but first the source needs to release them. Without a new command, the
source can't know when it is supposed to do that. The destination
receives the 'cont' command, but source doesn't know about this. So you
have to have something that tells the source "management has made sure
to complete what needed to be completed, you can now give up control of
the images".

I also think that conceptually it is the cleanest to have a source
controlled pre-handover phase with paused VM, which is only symmetrical
to the existing post-handover phase that we have on the destination.
This gives us a clean model for the handover of any resources that
require some tearing down on the source before they can be used on the
destination, so it appears to be the most future-proof option.

Kevin

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 0/7] migration: pause-before-device
  2017-10-12  9:52       ` Kevin Wolf
@ 2017-10-12  9:55         ` Daniel P. Berrange
  0 siblings, 0 replies; 17+ messages in thread
From: Daniel P. Berrange @ 2017-10-12  9:55 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Dr. David Alan Gilbert (git),
	qemu-devel, jdenemar, wangjie88, quintela, peterx, mreitz,
	eblake, fuweiwei2

On Thu, Oct 12, 2017 at 11:52:40AM +0200, Kevin Wolf wrote:
> Am 12.10.2017 um 11:27 hat Daniel P. Berrange geschrieben:
> > On Thu, Oct 12, 2017 at 11:18:31AM +0200, Kevin Wolf wrote:
> > > Am 12.10.2017 um 10:21 hat Daniel P. Berrange geschrieben:
> > > > On Wed, Oct 11, 2017 at 08:13:10PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > > 
> > > > > Hi,
> > > > >   This set attempts to make a race condition between migration and
> > > > > drive-mirror (and other block users) soluble by allowing the migration
> > > > > to be paused after the source qemu releases the block devices but
> > > > > before the serialisation of the device state.
> > > > > 
> > > > > The symptom of this failure, as reported by Wangjie, is a:
> > > > >    _co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed
> > > > > 
> > > > > and the source qemu dieing; so the problem is pretty nasty.
> > > > > This has only been seen on 2.9 onwards, but the theory is that
> > > > > prior to 2.9 it might have been happening anyway and we were
> > > > > perhaps getting unreported corruptions (lost writes); so this
> > > > > really needs fixing.
> > > > > 
> > > > > This flow came from discussions between Kevin and me, and we can't
> > > > > see a way of fixing it without exposing a new state to the management
> > > > > layer.
> > > > > 
> > > > > The flow is now:
> > > > > 
> > > > > (qemu) migrate_set_capability pause-before-device on
> > > > > (qemu) migrate -d ...
> > > > > (qemu) info migrate
> > > > > ...
> > > > > Migration status: pause-before-device
> > > > > ...
> > > > > << issue commands to clean up any block jobs>>
> > > > > 
> > > > > (qemu) migrate_continue pause-before-device
> > > > > (qemu) info migrate
> > > > > ...
> > > > > Migration status: completed
> > > > 
> > > > I'm curious why QEMU doesn't have enough info to clean up the block
> > > > jobs automatically ? What is the key thing that libvirt knows about
> > > > the block jobs, that QEMU is lacking ? If QEMU had the right info it
> > > > could do it automatically & avoid this extra lock-step synchronization
> > > > with libvirt.
> > > 
> > > The key point is that the block job needs to be completed while the
> > > source VM is stopped, but the source qemu is still in control of the
> > > image files (e.g. still holds the file locks), so that it can do the
> > > remaining writes.
> > > 
> > > Without the additional migration phase, the only state where both sides
> > > are stopped is when the destination is in control of the image files
> > > (migration has completed, but -S prevents it from automatically
> > > resuming), so the source can't write to the image any more.
> > 
> > Hmm, I always thought that the target QEMU did not start using the
> > image files until you ran 'cont' on the target. eg once source QEMU
> > has migrate=completed, both QEMUs are in paused state and source QEMU
> > still owns the images, until we run 'cont'.
> > 
> > What you're saying seems to imply this is not the case, but if so what
> > is triggering the target QEMU to acquire the locks on images ? Is it
> > done implicitly when it finishes reading device state off the wire ?
> > 
> > If so, could we instead add a migrate feature flag to tell the target
> > QEMU not to automatically acquire image locks, until it receives an
> > explicit 'cont'. That would then not require this extra lock-step
> > migration state.
> 
> The handover consists of two parts: The destination acquires the locks,
> but first the source needs to release them. Without a new command, the
> source can't know when it is supposed to do that. The destination
> receives the 'cont' command, but source doesn't know about this. So you
> have to have something that tells the source "management has made sure
> to complete what needed to be completed, you can now give up control of
> the images".
> 
> I also think that conceptually it is the cleanest to have a source
> controlled pre-handover phase with paused VM, which is only symmetrical
> to the existing post-handover phase that we have on the destination.
> This gives us a clean model for the handover of any resources that
> require some tearing down on the source before they can be used on the
> destination, so it appears to be the most future-proof option.

Ok, I see what you mean now.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 0/7] migration: pause-before-device
  2017-10-11 19:13 [Qemu-devel] [PATCH 0/7] migration: pause-before-device Dr. David Alan Gilbert (git)
                   ` (8 preceding siblings ...)
  2017-10-12  8:21 ` Daniel P. Berrange
@ 2017-10-12 10:02 ` Daniel P. Berrange
  9 siblings, 0 replies; 17+ messages in thread
From: Daniel P. Berrange @ 2017-10-12 10:02 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: qemu-devel, kwolf, jdenemar, wangjie88, quintela, peterx, mreitz,
	eblake, fuweiwei2

On Wed, Oct 11, 2017 at 08:13:10PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Hi,
>   This set attempts to make a race condition between migration and
> drive-mirror (and other block users) soluble by allowing the migration
> to be paused after the source qemu releases the block devices but
> before the serialisation of the device state.
> 
> The symptom of this failure, as reported by Wangjie, is a:
>    _co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed
> 
> and the source qemu dieing; so the problem is pretty nasty.
> This has only been seen on 2.9 onwards, but the theory is that
> prior to 2.9 it might have been happening anyway and we were
> perhaps getting unreported corruptions (lost writes); so this
> really needs fixing.
> 
> This flow came from discussions between Kevin and me, and we can't
> see a way of fixing it without exposing a new state to the management
> layer.
> 
> The flow is now:
> 
> (qemu) migrate_set_capability pause-before-device on

How about 'switchover-cleanup'


> (qemu) migrate -d ...
> (qemu) info migrate
> ...
> Migration status: pause-before-device

and 'switchover'

> ...
> << issue commands to clean up any block jobs>>
> 
> (qemu) migrate_continue pause-before-device
> (qemu) info migrate
> ...
> Migration status: completed
> 
> This set has been _very_ lightly tested just at the normal migration
> code, without the addition of the drive mirror; so this is a first
> cut.  I'd appreciate some feedback from libvirt whether the inteface
> is OK and ideally a hack to test it in a full libvirt setup to see
> if we hit any other issues.
> 
> The precopy flow is:
> active->pause-before-device->completed
> 
> The postcopy flow is:
> active->pause-before-device->postcopy-active->completed
> 
> Although the behaviour with postcopy only gets interesting when
> we add something like Max's active-sync.
> 
> Please argue about the command and state naming.

Argued above :-)

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 3/7] migration: Wait for semaphore before completing migration
  2017-10-11 19:13 ` [Qemu-devel] [PATCH 3/7] migration: Wait for semaphore before completing migration Dr. David Alan Gilbert (git)
@ 2017-10-18  3:35   ` Peter Xu
  2017-10-18  8:59     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Xu @ 2017-10-18  3:35 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: qemu-devel, kwolf, jdenemar, wangjie88, quintela, mreitz,
	berrange, eblake, fuweiwei2

On Wed, Oct 11, 2017 at 08:13:13PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Wait for a semaphore before completing the migration,
> if the previously added capability was enabled.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  migration/migration.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
>  migration/migration.h |  3 +++
>  2 files changed, 50 insertions(+)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index e1a87c3d23..b411a7bb63 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1967,6 +1967,46 @@ fail:
>  }
>  
>  /**
> + * migration_maybe_pause: Pause if required to by migrate_pause_before_device
> + * called with the iothread locked
> + * Returns: 0 on success
> + */
> +static int migration_maybe_pause(MigrationState *s, int *current_active_state)
> +{
> +    int ret;
> +    if (!migrate_pause_before_device()) {
> +        return 0;
> +    }
> +    ret = bdrv_inactivate_all();

My understanding is that the crash was caused by mirrored block device
IO triggered after the inactivation, then... should we do this after
waiting for the semaphore (possibly at [1] below) to make sure the
block jobs are completed?  Or did I miss anything?

> +    if (ret) {
> +        error_report("%s: bdrv_inactivate_all() failed (%d)",
> +                     __func__, ret);
> +        return ret;
> +    }
> +
> +    s->block_inactive = true;
> +
> +    /* Since leaving this state is not atomic with posting the semaphore
> +     * it's possible that someone could have issued multiple migrate_continue
> +     * and the semaphore is incorrectly positive at this point;
> +     * the docs say it's undefined to reinit a semaphore that's already
> +     * init'd, so use timedwait to eat up any existing posts.
> +     */
> +    while (qemu_sem_timedwait(&s->pause_sem, 1) == 0);
> +
> +    qemu_mutex_unlock_iothread();
> +    migrate_set_state(&s->state, *current_active_state,
> +                      MIGRATION_STATUS_PAUSE_BEFORE_DEVICE);
> +    qemu_sem_wait(&s->pause_sem);

[1]

> +    migrate_set_state(&s->state, MIGRATION_STATUS_PAUSE_BEFORE_DEVICE,
> +                      MIGRATION_STATUS_DEVICE);
> +    *current_active_state = MIGRATION_STATUS_DEVICE;
> +    qemu_mutex_lock_iothread();
> +
> +    return s->state == MIGRATION_STATUS_DEVICE ? 0 : -EINVAL;
> +}
> +
> +/**
>   * migration_completion: Used by migration_thread when there's not much left.
>   *   The caller 'breaks' the loop when this returns.
>   *
> @@ -1992,6 +2032,11 @@ static void migration_completion(MigrationState *s, int current_active_state,
>              bool inactivate = !migrate_colo_enabled();
>              ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
>              if (ret >= 0) {
> +                ret = migration_maybe_pause(s, &current_active_state);
> +                /* If this worked it will already have inactivated */
> +                inactivate &= !migrate_pause_before_device();
> +            }
> +            if (ret >= 0) {
>                  qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
>                  ret = qemu_savevm_state_complete_precopy(s->to_dst_file, false,
>                                                           inactivate);
> @@ -2372,6 +2417,7 @@ static void migration_instance_finalize(Object *obj)
>  
>      g_free(params->tls_hostname);
>      g_free(params->tls_creds);
> +    qemu_sem_destroy(&ms->pause_sem);
>  }
>  
>  static void migration_instance_init(Object *obj)
> @@ -2382,6 +2428,7 @@ static void migration_instance_init(Object *obj)
>      ms->state = MIGRATION_STATUS_NONE;
>      ms->xbzrle_cache_size = DEFAULT_MIGRATE_CACHE_SIZE;
>      ms->mbps = -1;
> +    qemu_sem_init(&ms->pause_sem, 0);
>  
>      params->tls_hostname = g_strdup("");
>      params->tls_creds = g_strdup("");
> diff --git a/migration/migration.h b/migration/migration.h
> index 37feea5453..447e8b3f79 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -121,6 +121,9 @@ struct MigrationState
>      /* Flag set once the migration thread called bdrv_inactivate_all */
>      bool block_inactive;
>  
> +    /* Migration is paused due to pause-before-device */
> +    QemuSemaphore pause_sem;
> +
>      /* The semaphore is used to notify COLO thread that failover is finished */
>      QemuSemaphore colo_exit_sem;
>  
> -- 
> 2.13.6
> 

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 3/7] migration: Wait for semaphore before completing migration
  2017-10-18  3:35   ` Peter Xu
@ 2017-10-18  8:59     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 17+ messages in thread
From: Dr. David Alan Gilbert @ 2017-10-18  8:59 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, kwolf, jdenemar, wangjie88, quintela, mreitz,
	berrange, eblake, fuweiwei2

* Peter Xu (peterx@redhat.com) wrote:
> On Wed, Oct 11, 2017 at 08:13:13PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Wait for a semaphore before completing the migration,
> > if the previously added capability was enabled.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  migration/migration.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
> >  migration/migration.h |  3 +++
> >  2 files changed, 50 insertions(+)
> > 
> > diff --git a/migration/migration.c b/migration/migration.c
> > index e1a87c3d23..b411a7bb63 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -1967,6 +1967,46 @@ fail:
> >  }
> >  
> >  /**
> > + * migration_maybe_pause: Pause if required to by migrate_pause_before_device
> > + * called with the iothread locked
> > + * Returns: 0 on success
> > + */
> > +static int migration_maybe_pause(MigrationState *s, int *current_active_state)
> > +{
> > +    int ret;
> > +    if (!migrate_pause_before_device()) {
> > +        return 0;
> > +    }
> > +    ret = bdrv_inactivate_all();
> 
> My understanding is that the crash was caused by mirrored block device
> IO triggered after the inactivation, then... should we do this after
> waiting for the semaphore (possibly at [1] below) to make sure the
> block jobs are completed?  Or did I miss anything?

Ah you're right, just confirmed this with kwolf that I got it the
wrong way around.
I'll fix it.

Dave

> > +    if (ret) {
> > +        error_report("%s: bdrv_inactivate_all() failed (%d)",
> > +                     __func__, ret);
> > +        return ret;
> > +    }
> > +
> > +    s->block_inactive = true;
> > +
> > +    /* Since leaving this state is not atomic with posting the semaphore
> > +     * it's possible that someone could have issued multiple migrate_continue
> > +     * and the semaphore is incorrectly positive at this point;
> > +     * the docs say it's undefined to reinit a semaphore that's already
> > +     * init'd, so use timedwait to eat up any existing posts.
> > +     */
> > +    while (qemu_sem_timedwait(&s->pause_sem, 1) == 0);
> > +
> > +    qemu_mutex_unlock_iothread();
> > +    migrate_set_state(&s->state, *current_active_state,
> > +                      MIGRATION_STATUS_PAUSE_BEFORE_DEVICE);
> > +    qemu_sem_wait(&s->pause_sem);
> 
> [1]
> 
> > +    migrate_set_state(&s->state, MIGRATION_STATUS_PAUSE_BEFORE_DEVICE,
> > +                      MIGRATION_STATUS_DEVICE);
> > +    *current_active_state = MIGRATION_STATUS_DEVICE;
> > +    qemu_mutex_lock_iothread();
> > +
> > +    return s->state == MIGRATION_STATUS_DEVICE ? 0 : -EINVAL;
> > +}
> > +
> > +/**
> >   * migration_completion: Used by migration_thread when there's not much left.
> >   *   The caller 'breaks' the loop when this returns.
> >   *
> > @@ -1992,6 +2032,11 @@ static void migration_completion(MigrationState *s, int current_active_state,
> >              bool inactivate = !migrate_colo_enabled();
> >              ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> >              if (ret >= 0) {
> > +                ret = migration_maybe_pause(s, &current_active_state);
> > +                /* If this worked it will already have inactivated */
> > +                inactivate &= !migrate_pause_before_device();
> > +            }
> > +            if (ret >= 0) {
> >                  qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
> >                  ret = qemu_savevm_state_complete_precopy(s->to_dst_file, false,
> >                                                           inactivate);
> > @@ -2372,6 +2417,7 @@ static void migration_instance_finalize(Object *obj)
> >  
> >      g_free(params->tls_hostname);
> >      g_free(params->tls_creds);
> > +    qemu_sem_destroy(&ms->pause_sem);
> >  }
> >  
> >  static void migration_instance_init(Object *obj)
> > @@ -2382,6 +2428,7 @@ static void migration_instance_init(Object *obj)
> >      ms->state = MIGRATION_STATUS_NONE;
> >      ms->xbzrle_cache_size = DEFAULT_MIGRATE_CACHE_SIZE;
> >      ms->mbps = -1;
> > +    qemu_sem_init(&ms->pause_sem, 0);
> >  
> >      params->tls_hostname = g_strdup("");
> >      params->tls_creds = g_strdup("");
> > diff --git a/migration/migration.h b/migration/migration.h
> > index 37feea5453..447e8b3f79 100644
> > --- a/migration/migration.h
> > +++ b/migration/migration.h
> > @@ -121,6 +121,9 @@ struct MigrationState
> >      /* Flag set once the migration thread called bdrv_inactivate_all */
> >      bool block_inactive;
> >  
> > +    /* Migration is paused due to pause-before-device */
> > +    QemuSemaphore pause_sem;
> > +
> >      /* The semaphore is used to notify COLO thread that failover is finished */
> >      QemuSemaphore colo_exit_sem;
> >  
> > -- 
> > 2.13.6
> > 
> 
> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2017-10-18  8:59 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-11 19:13 [Qemu-devel] [PATCH 0/7] migration: pause-before-device Dr. David Alan Gilbert (git)
2017-10-11 19:13 ` [Qemu-devel] [PATCH 1/7] migration: Add 'pause-before-device' capability Dr. David Alan Gilbert (git)
2017-10-11 19:13 ` [Qemu-devel] [PATCH 2/7] migration: Add 'pause-before-device' and 'device' statuses Dr. David Alan Gilbert (git)
2017-10-11 19:13 ` [Qemu-devel] [PATCH 3/7] migration: Wait for semaphore before completing migration Dr. David Alan Gilbert (git)
2017-10-18  3:35   ` Peter Xu
2017-10-18  8:59     ` Dr. David Alan Gilbert
2017-10-11 19:13 ` [Qemu-devel] [PATCH 4/7] migration: migrate-continue Dr. David Alan Gilbert (git)
2017-10-11 19:13 ` [Qemu-devel] [PATCH 5/7] migrate: HMP migate_continue Dr. David Alan Gilbert (git)
2017-10-11 19:13 ` [Qemu-devel] [PATCH 6/7] migration: allow cancel to unpause Dr. David Alan Gilbert (git)
2017-10-11 19:13 ` [Qemu-devel] [PATCH 7/7] migration: pause-before-device for postcopy Dr. David Alan Gilbert (git)
2017-10-11 20:03 ` [Qemu-devel] [PATCH 0/7] migration: pause-before-device no-reply
2017-10-12  8:21 ` Daniel P. Berrange
2017-10-12  9:18   ` Kevin Wolf
2017-10-12  9:27     ` Daniel P. Berrange
2017-10-12  9:52       ` Kevin Wolf
2017-10-12  9:55         ` Daniel P. Berrange
2017-10-12 10:02 ` Daniel P. Berrange

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.