* [PATCH v3 0/5] Fix multifd + cancel + multifd
@ 2020-01-16 15:46 Juan Quintela
  2020-01-16 15:46 ` [PATCH v3 1/5] multifd: Make sure that we don't do any IO after an error Juan Quintela
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Juan Quintela @ 2020-01-16 15:46 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Paolo Bonzini, Thomas Huth,
	Dr. David Alan Gilbert, Juan Quintela

Hi

In the v3 series:
- Add back the ->shutdown fix
  Dave found the problem: we need to set an error if we do a shutdown.
- Make the iotests work again (we need to set ->active for savevm)
- So postcopy/recovery is fixed again.

Please review; if there are no outstanding issues, I plan to push it.

Thanks, Juan

In the v2 series:
- get the multifd test patch
- drop the ->shutdown fix
  it broke the postcopy recovery test.  Still trying to determine whether
  the problem is in the recovery test or the recovery code.
- upgrade the migrate_cancel test

Please review.

[v1]
This series:
- create a test that does:
  launch multifd on target
  migrate to target
  cancel on source
  create another target
  migrate again

- And fixes the cases that made it fail:
* Make sure that we never attempt IO after shutdown/error

Please, review.

Juan Quintela (5):
  multifd: Make sure that we don't do any IO after an error
  migration: Create MigrationState active field
  migration: Don't wait in semaphore for thread we know has finished
  qemu-file: Don't do IO after shutdown
  migration-test: Make sure that multifd and cancel works

 migration/migration.c        |   5 ++
 migration/migration.h        |   5 ++
 migration/qemu-file.c        |  22 ++++++-
 migration/ram.c              |  28 ++++++---
 migration/savevm.c           |   2 +
 tests/qtest/migration-test.c | 112 ++++++++++++++++++++++++++++++++++-
 6 files changed, 163 insertions(+), 11 deletions(-)

-- 
2.24.1



^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v3 1/5] multifd: Make sure that we don't do any IO after an error
  2020-01-16 15:46 [PATCH v3 0/5] Fix multifd + cancel + multifd Juan Quintela
@ 2020-01-16 15:46 ` Juan Quintela
  2020-01-16 18:20   ` Dr. David Alan Gilbert
  2020-01-16 15:46 ` [PATCH v3 2/5] migration: Create MigrationState active field Juan Quintela
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Juan Quintela @ 2020-01-16 15:46 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Paolo Bonzini, Thomas Huth,
	Dr. David Alan Gilbert, Juan Quintela

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/ram.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index ba6e0eea15..8f9f3bba5b 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3442,7 +3442,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 {
     RAMState **temp = opaque;
     RAMState *rs = *temp;
-    int ret;
+    int ret = 0;
     int i;
     int64_t t0;
     int done = 0;
@@ -3521,12 +3521,14 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
     ram_control_after_iterate(f, RAM_CONTROL_ROUND);
 
 out:
-    multifd_send_sync_main(rs);
-    qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
-    qemu_fflush(f);
-    ram_counters.transferred += 8;
+    if (ret >= 0) {
+        multifd_send_sync_main(rs);
+        qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
+        qemu_fflush(f);
+        ram_counters.transferred += 8;
 
-    ret = qemu_file_get_error(f);
+        ret = qemu_file_get_error(f);
+    }
     if (ret < 0) {
         return ret;
     }
@@ -3578,9 +3580,11 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
         ram_control_after_iterate(f, RAM_CONTROL_FINISH);
     }
 
-    multifd_send_sync_main(rs);
-    qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
-    qemu_fflush(f);
+    if (ret >= 0) {
+        multifd_send_sync_main(rs);
+        qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
+        qemu_fflush(f);
+    }
 
     return ret;
 }
-- 
2.24.1




* [PATCH v3 2/5] migration: Create MigrationState active field
  2020-01-16 15:46 [PATCH v3 0/5] Fix multifd + cancel + multifd Juan Quintela
  2020-01-16 15:46 ` [PATCH v3 1/5] multifd: Make sure that we don't do any IO after an error Juan Quintela
@ 2020-01-16 15:46 ` Juan Quintela
  2020-01-17 16:26   ` Dr. David Alan Gilbert
  2020-01-16 15:46 ` [PATCH v3 3/5] migration: Don't wait in semaphore for thread we know has finished Juan Quintela
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Juan Quintela @ 2020-01-16 15:46 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Paolo Bonzini, Thomas Huth,
	Dr. David Alan Gilbert, Juan Quintela

Right now, there is no easy way to detect whether we have already
cancelled/finished/failed a migration.  This field is set to true
when we start a migration, and set to false as soon as we stop
it.

It fixes a real bug: in ram_save_iterate() we call functions that
write to the channel even when we know that migration has stopped
for some reason.  This causes problems with multifd because we end up
synchronizing various semaphores that we don't want to take.
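
[Editor's note] The guard-flag pattern this patch introduces can be sketched
outside QEMU as follows. This is a hypothetical miniature, not QEMU code: the
`MigState`/`save_iterate_tail` names are invented for illustration; the real
patch uses MigrationState.active and the qemu_put_be64()/qemu_fflush() calls
in ram_save_iterate().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A migration-state struct gains an 'active' flag: set once setup
 * completes, cleared on cancel/completion/failure. */
typedef struct {
    bool active;
    int64_t transferred;
} MigState;

/* Stand-in for the EOS write at the tail of ram_save_iterate();
 * returns the number of bytes queued. */
static int save_iterate_tail(MigState *s)
{
    if (!s->active) {
        return 0;        /* channel already torn down: do no IO */
    }
    s->transferred += 8; /* would be qemu_put_be64() + qemu_fflush() */
    return 8;
}
```

The point is that a late iteration observes the cleared flag and skips the
channel entirely, instead of writing into a shut-down file.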

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/migration.c | 5 +++++
 migration/migration.h | 5 +++++
 migration/ram.c       | 2 +-
 migration/savevm.c    | 2 ++
 4 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 990bff00c0..60bc8710b6 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1583,6 +1583,8 @@ static void migrate_fd_cancel(MigrationState *s)
     QEMUFile *f = migrate_get_current()->to_dst_file;
     trace_migrate_fd_cancel();
 
+    s->active = false;
+
     if (s->rp_state.from_dst_file) {
         /* shutdown the rp socket, so causing the rp thread to shutdown */
         qemu_file_shutdown(s->rp_state.from_dst_file);
@@ -2834,6 +2836,7 @@ static void migration_completion(MigrationState *s)
     }
 
     if (!migrate_colo_enabled()) {
+        s->active = false;
         migrate_set_state(&s->state, current_active_state,
                           MIGRATION_STATUS_COMPLETED);
     }
@@ -2859,6 +2862,7 @@ fail_invalidate:
     }
 
 fail:
+    s->active = false;
     migrate_set_state(&s->state, current_active_state,
                       MIGRATION_STATUS_FAILED);
 }
@@ -3289,6 +3293,7 @@ static void *migration_thread(void *opaque)
     }
 
     qemu_savevm_state_setup(s->to_dst_file);
+    s->active = true;
 
     if (qemu_savevm_nr_failover_devices()) {
         migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
diff --git a/migration/migration.h b/migration/migration.h
index aa9ff6f27b..e0386efe95 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -169,6 +169,11 @@ struct MigrationState
 
     int state;
 
+    /* Is the migration channel still open?  When migration finishes,
+     * gets an error, or is cancelled, this becomes false.
+     */
+
+    bool active;
     /* State related to return path */
     struct {
         QEMUFile     *from_dst_file;
diff --git a/migration/ram.c b/migration/ram.c
index 8f9f3bba5b..44ca56e1ea 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3521,7 +3521,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
     ram_control_after_iterate(f, RAM_CONTROL_ROUND);
 
 out:
-    if (ret >= 0) {
+    if (ret >= 0 && migrate_get_current()->active) {
         multifd_send_sync_main(rs);
         qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
         qemu_fflush(f);
diff --git a/migration/savevm.c b/migration/savevm.c
index adfdca26ac..3efde5b3dd 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1550,6 +1550,7 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
     qemu_mutex_unlock_iothread();
     qemu_savevm_state_header(f);
     qemu_savevm_state_setup(f);
+    ms->active = true;
     qemu_mutex_lock_iothread();
 
     while (qemu_file_get_error(f) == 0) {
@@ -1574,6 +1575,7 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
         status = MIGRATION_STATUS_COMPLETED;
     }
     migrate_set_state(&ms->state, MIGRATION_STATUS_SETUP, status);
+    ms->active = false;
 
     /* f is outer parameter, it should not stay in global migration state after
      * this function finished */
-- 
2.24.1




* [PATCH v3 3/5] migration: Don't wait in semaphore for thread we know has finished
  2020-01-16 15:46 [PATCH v3 0/5] Fix multifd + cancel + multifd Juan Quintela
  2020-01-16 15:46 ` [PATCH v3 1/5] multifd: Make sure that we don't do any IO after an error Juan Quintela
  2020-01-16 15:46 ` [PATCH v3 2/5] migration: Create MigrationState active field Juan Quintela
@ 2020-01-16 15:46 ` Juan Quintela
  2020-01-17 16:45   ` Dr. David Alan Gilbert
  2020-01-16 15:46 ` [PATCH v3 4/5] qemu-file: Don't do IO after shutdown Juan Quintela
  2020-01-16 15:46 ` [PATCH v3 5/5] migration-test: Make sure that multifd and cancel works Juan Quintela
  4 siblings, 1 reply; 13+ messages in thread
From: Juan Quintela @ 2020-01-16 15:46 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Paolo Bonzini, Thomas Huth,
	Dr. David Alan Gilbert, Juan Quintela

If p->quit is true for a channel, we know that it has finished, for
whatever reason.  So don't wait for it; just continue.
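
[Editor's note] The check-then-skip pattern can be sketched with plain
pthreads (hypothetical `Channel`/`sync_one_channel` names, standing in for
MultiFDSendParams and the loop in multifd_send_sync_main()): inspect the
quit flag under the channel mutex and skip the semaphore wait when the
channel thread is already gone.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t mutex;
    sem_t sem_sync;
    bool quit;
} Channel;

/* Returns true if we actually waited on the semaphore. */
static bool sync_one_channel(Channel *p)
{
    pthread_mutex_lock(&p->mutex);
    if (p->quit) {
        pthread_mutex_unlock(&p->mutex);
        return false;        /* thread finished: don't block forever */
    }
    pthread_mutex_unlock(&p->mutex);
    sem_wait(&p->sem_sync);
    return true;
}
```

Note the window this leaves open: quit can become true after the unlock but
before sem_wait(), so the poster side still has to post sem_sync on exit
(this is the race Dave raises in his review downthread).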

Signed-off-by: Juan Quintela <quintela@redhat.com>

---

I could be convinced that the right thing to do in that case is to
just do a break instead of a continue.  Each option has its own
advantages/disadvantages.
---
 migration/ram.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 44ca56e1ea..bc918ef28d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1118,6 +1118,12 @@ static void multifd_send_sync_main(RAMState *rs)
         MultiFDSendParams *p = &multifd_send_state->params[i];
 
         trace_multifd_send_sync_main_wait(p->id);
+        qemu_mutex_lock(&p->mutex);
+        if (p->quit) {
+            qemu_mutex_unlock(&p->mutex);
+            continue;
+        }
+        qemu_mutex_unlock(&p->mutex);
         qemu_sem_wait(&p->sem_sync);
     }
     trace_multifd_send_sync_main(multifd_send_state->packet_num);
-- 
2.24.1




* [PATCH v3 4/5] qemu-file: Don't do IO after shutdown
  2020-01-16 15:46 [PATCH v3 0/5] Fix multifd + cancel + multifd Juan Quintela
                   ` (2 preceding siblings ...)
  2020-01-16 15:46 ` [PATCH v3 3/5] migration: Don't wait in semaphore for thread we know has finished Juan Quintela
@ 2020-01-16 15:46 ` Juan Quintela
  2020-01-16 15:46 ` [PATCH v3 5/5] migration-test: Make sure that multifd and cancel works Juan Quintela
  4 siblings, 0 replies; 13+ messages in thread
From: Juan Quintela @ 2020-01-16 15:46 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Paolo Bonzini, Thomas Huth,
	Dr. David Alan Gilbert, Juan Quintela

Make sure that we do neither reads nor writes after the QEMUFile has
been shut down.
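
[Editor's note] A reduced sketch of the change (hypothetical
`MiniFile`/`mini_flush`/`mini_shutdown` names, not the real QEMUFile API):
a `shutdown` flag turns every later flush/fill into a no-op, and the
shutdown itself latches an error so callers see the file as failed.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BUF_SIZE 64

typedef struct {
    char buf[BUF_SIZE];
    int buf_index;
    int last_error;
    bool shutdown;
} MiniFile;

/* Stand-in for qemu_fflush(); returns bytes "written". */
static int mini_flush(MiniFile *f)
{
    if (f->shutdown) {
        return 0;            /* never touch the channel again */
    }
    int n = f->buf_index;
    f->buf_index = 0;        /* pretend the bytes hit the wire */
    return n;
}

/* Stand-in for qemu_file_shutdown(). */
static void mini_shutdown(MiniFile *f)
{
    f->shutdown = true;
    if (!f->last_error) {
        f->last_error = -5;  /* -EIO: make the failure visible */
    }
}
```

Setting the error even when the underlying shut_down op succeeds is the
v3 addition: without it, code that only checks qemu_file_get_error() would
keep treating the file as healthy.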

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

---

Set an error in case there is none (Dave)
---
 migration/qemu-file.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 26fb25ddc1..bbb2b63927 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -53,6 +53,8 @@ struct QEMUFile {
 
     int last_error;
     Error *last_error_obj;
+    /* has the file been shut down? */
+    bool shutdown;
 };
 
 /*
@@ -61,10 +63,18 @@ struct QEMUFile {
  */
 int qemu_file_shutdown(QEMUFile *f)
 {
+    int ret;
+
+    f->shutdown = true;
     if (!f->ops->shut_down) {
         return -ENOSYS;
     }
-    return f->ops->shut_down(f->opaque, true, true, NULL);
+    ret = f->ops->shut_down(f->opaque, true, true, NULL);
+
+    if (!f->last_error) {
+        qemu_file_set_error(f, -EIO);
+    }
+    return ret;
 }
 
 /*
@@ -214,6 +224,9 @@ void qemu_fflush(QEMUFile *f)
         return;
     }
 
+    if (f->shutdown) {
+        return;
+    }
     if (f->iovcnt > 0) {
         expect = iov_size(f->iov, f->iovcnt);
         ret = f->ops->writev_buffer(f->opaque, f->iov, f->iovcnt, f->pos,
@@ -328,6 +341,10 @@ static ssize_t qemu_fill_buffer(QEMUFile *f)
     f->buf_index = 0;
     f->buf_size = pending;
 
+    if (f->shutdown) {
+        return 0;
+    }
+
     len = f->ops->get_buffer(f->opaque, f->buf + pending, f->pos,
                              IO_BUF_SIZE - pending, &local_error);
     if (len > 0) {
@@ -642,6 +659,9 @@ int64_t qemu_ftell(QEMUFile *f)
 
 int qemu_file_rate_limit(QEMUFile *f)
 {
+    if (f->shutdown) {
+        return 1;
+    }
     if (qemu_file_get_error(f)) {
         return 1;
     }
-- 
2.24.1




* [PATCH v3 5/5] migration-test: Make sure that multifd and cancel works
  2020-01-16 15:46 [PATCH v3 0/5] Fix multifd + cancel + multifd Juan Quintela
                   ` (3 preceding siblings ...)
  2020-01-16 15:46 ` [PATCH v3 4/5] qemu-file: Don't do IO after shutdown Juan Quintela
@ 2020-01-16 15:46 ` Juan Quintela
  2020-01-16 15:59   ` Thomas Huth
  4 siblings, 1 reply; 13+ messages in thread
From: Juan Quintela @ 2020-01-16 15:46 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Paolo Bonzini, Thomas Huth,
	Dr. David Alan Gilbert, Juan Quintela

Test that this sequence works:

- launch source
- launch target
- start migration
- cancel migration
- relaunch target
- do migration again

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

---

- Wait for the 1st thread to move to 'cancelled' before launching the
  second migration
- Add a 'to2' parameter to differentiate the 1st and 2nd targets.
---
 tests/qtest/migration-test.c | 112 ++++++++++++++++++++++++++++++++++-
 1 file changed, 111 insertions(+), 1 deletion(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 26e2e77289..4bf4901970 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -424,6 +424,14 @@ static void migrate_recover(QTestState *who, const char *uri)
     qobject_unref(rsp);
 }
 
+static void migrate_cancel(QTestState *who)
+{
+    QDict *rsp;
+
+    rsp = wait_command(who, "{ 'execute': 'migrate_cancel' }");
+    qobject_unref(rsp);
+}
+
 static void migrate_set_capability(QTestState *who, const char *capability,
                                    bool value)
 {
@@ -456,6 +464,8 @@ static void migrate_postcopy_start(QTestState *from, QTestState *to)
 typedef struct {
     bool hide_stderr;
     bool use_shmem;
+    /* only launch the target process */
+    bool only_target;
     char *opts_source;
     char *opts_target;
 } MigrateStart;
@@ -571,7 +581,9 @@ static int test_migrate_start(QTestState **from, QTestState **to,
                                  arch_source, shmem_opts, args->opts_source,
                                  ignore_stderr);
     g_free(arch_source);
-    *from = qtest_init(cmd_source);
+    if (!args->only_target) {
+        *from = qtest_init(cmd_source);
+    }
     g_free(cmd_source);
 
     cmd_target = g_strdup_printf("-accel kvm -accel tcg%s%s "
@@ -1294,6 +1306,103 @@ static void test_multifd_tcp(void)
     free(uri);
 }
 
+/*
+ * This test does:
+ *  source               target
+ *                       migrate_incoming
+ *     migrate
+ *     migrate_cancel
+ *                       launch another target
+ *     migrate
+ *
+ *  And see that it works
+ */
+
+static void test_multifd_tcp_cancel(void)
+{
+    MigrateStart *args = migrate_start_new();
+    QTestState *from, *to, *to2;
+    QDict *rsp;
+    char *uri;
+
+    args->hide_stderr = true;
+
+    if (test_migrate_start(&from, &to, "defer", args)) {
+        return;
+    }
+
+    /*
+     * We want to pick a speed slow enough that the test completes
+     * quickly, but that it doesn't complete precopy even on a slow
+     * machine, so also set the downtime.
+     */
+    /* 1 ms should make it not converge */
+    migrate_set_parameter_int(from, "downtime-limit", 1);
+    /* 300MB/s */
+    migrate_set_parameter_int(from, "max-bandwidth", 30000000);
+
+    migrate_set_parameter_int(from, "multifd-channels", 16);
+    migrate_set_parameter_int(to, "multifd-channels", 16);
+
+    migrate_set_capability(from, "multifd", "true");
+    migrate_set_capability(to, "multifd", "true");
+
+    /* Start incoming migration from the 1st socket */
+    rsp = wait_command(to, "{ 'execute': 'migrate-incoming',"
+                           "  'arguments': { 'uri': 'tcp:127.0.0.1:0' }}");
+    qobject_unref(rsp);
+
+    /* Wait for the first serial output from the source */
+    wait_for_serial("src_serial");
+
+    uri = migrate_get_socket_address(to, "socket-address");
+
+    migrate_qmp(from, uri, "{}");
+
+    wait_for_migration_pass(from);
+
+    migrate_cancel(from);
+
+    args = migrate_start_new();
+    args->only_target = true;
+
+    if (test_migrate_start(&from, &to2, "defer", args)) {
+        return;
+    }
+
+    migrate_set_parameter_int(to2, "multifd-channels", 16);
+
+    migrate_set_capability(to2, "multifd", "true");
+
+    /* Start incoming migration from the 1st socket */
+    rsp = wait_command(to2, "{ 'execute': 'migrate-incoming',"
+                            "  'arguments': { 'uri': 'tcp:127.0.0.1:0' }}");
+    qobject_unref(rsp);
+
+    uri = migrate_get_socket_address(to2, "socket-address");
+
+    wait_for_migration_status(from, "cancelled", NULL);
+
+    /* 300 ms: it should converge */
+    migrate_set_parameter_int(from, "downtime-limit", 300);
+    /* 1GB/s */
+    migrate_set_parameter_int(from, "max-bandwidth", 1000000000);
+
+    migrate_qmp(from, uri, "{}");
+
+    wait_for_migration_pass(from);
+
+    if (!got_stop) {
+        qtest_qmp_eventwait(from, "STOP");
+    }
+    qtest_qmp_eventwait(to2, "RESUME");
+
+    wait_for_serial("dest_serial");
+    wait_for_migration_complete(from);
+    test_migrate_end(from, to2, true);
+    free(uri);
+}
+
 int main(int argc, char **argv)
 {
     char template[] = "/tmp/migration-test-XXXXXX";
@@ -1359,6 +1468,7 @@ int main(int argc, char **argv)
 
     qtest_add_func("/migration/auto_converge", test_migrate_auto_converge);
     qtest_add_func("/migration/multifd/tcp", test_multifd_tcp);
+    qtest_add_func("/migration/multifd/tcp/cancel", test_multifd_tcp_cancel);
 
     ret = g_test_run();
 
-- 
2.24.1




* Re: [PATCH v3 5/5] migration-test: Make sure that multifd and cancel works
  2020-01-16 15:46 ` [PATCH v3 5/5] migration-test: Make sure that multifd and cancel works Juan Quintela
@ 2020-01-16 15:59   ` Thomas Huth
  0 siblings, 0 replies; 13+ messages in thread
From: Thomas Huth @ 2020-01-16 15:59 UTC (permalink / raw)
  To: Juan Quintela, qemu-devel
  Cc: Laurent Vivier, Paolo Bonzini, Dr. David Alan Gilbert

On 16/01/2020 16.46, Juan Quintela wrote:
> Test that this sequence works:
> 
> - launch source
> - launch target
> - start migration
> - cancel migration
> - relaunch target
> - do migration again
> 
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
[...]
> +static void test_multifd_tcp_cancel(void)
> +{
> +    MigrateStart *args = migrate_start_new();
> +    QTestState *from, *to, *to2;
> +    QDict *rsp;
> +    char *uri;
> +
> +    args->hide_stderr = true;
> +
> +    if (test_migrate_start(&from, &to, "defer", args)) {
> +        return;
> +    }
> +
> +    /*
> +     * We want to pick a speed slow enough that the test completes
> +     * quickly, but that it doesn't complete precopy even on a slow
> +     * machine, so also set the downtime.
> +     */
> +    /* 1 ms should make it not converge */
> +    migrate_set_parameter_int(from, "downtime-limit", 1);
> +    /* 300MB/s */
> +    migrate_set_parameter_int(from, "max-bandwidth", 30000000);
> +
> +    migrate_set_parameter_int(from, "multifd-channels", 16);
> +    migrate_set_parameter_int(to, "multifd-channels", 16);
> +
> +    migrate_set_capability(from, "multifd", "true");
> +    migrate_set_capability(to, "multifd", "true");
> +
> +    /* Start incoming migration from the 1st socket */
> +    rsp = wait_command(to, "{ 'execute': 'migrate-incoming',"
> +                           "  'arguments': { 'uri': 'tcp:127.0.0.1:0' }}");
> +    qobject_unref(rsp);
> +
> +    /* Wait for the first serial output from the source */
> +    wait_for_serial("src_serial");
> +
> +    uri = migrate_get_socket_address(to, "socket-address");
> +
> +    migrate_qmp(from, uri, "{}");
> +
> +    wait_for_migration_pass(from);
> +
> +    migrate_cancel(from);
> +
> +    args = migrate_start_new();
> +    args->only_target = true;
> +
> +    if (test_migrate_start(&from, &to2, "defer", args)) {
> +        return;
> +    }
> +
> +    migrate_set_parameter_int(to2, "multifd-channels", 16);
> +
> +    migrate_set_capability(to2, "multifd", "true");
> +
> +    /* Start incoming migration from the 1st socket */
> +    rsp = wait_command(to2, "{ 'execute': 'migrate-incoming',"
> +                            "  'arguments': { 'uri': 'tcp:127.0.0.1:0' }}");
> +    qobject_unref(rsp);
> +
> +    uri = migrate_get_socket_address(to2, "socket-address");
> +
> +    wait_for_migration_status(from, "cancelled", NULL);
> +
> +    /* 300 ms: it should converge */
> +    migrate_set_parameter_int(from, "downtime-limit", 300);
> +    /* 1GB/s */
> +    migrate_set_parameter_int(from, "max-bandwidth", 1000000000);
> +
> +    migrate_qmp(from, uri, "{}");
> +
> +    wait_for_migration_pass(from);
> +
> +    if (!got_stop) {
> +        qtest_qmp_eventwait(from, "STOP");
> +    }
> +    qtest_qmp_eventwait(to2, "RESUME");
> +
> +    wait_for_serial("dest_serial");
> +    wait_for_migration_complete(from);
> +    test_migrate_end(from, to2, true);
> +    free(uri);

That should be g_free(uri) instead, shouldn't it?

 Thomas

> +}
> +
>  int main(int argc, char **argv)
>  {
>      char template[] = "/tmp/migration-test-XXXXXX";
> @@ -1359,6 +1468,7 @@ int main(int argc, char **argv)
>  
>      qtest_add_func("/migration/auto_converge", test_migrate_auto_converge);
>      qtest_add_func("/migration/multifd/tcp", test_multifd_tcp);
> +    qtest_add_func("/migration/multifd/tcp/cancel", test_multifd_tcp_cancel);
>  
>      ret = g_test_run();
>  
> 




* Re: [PATCH v3 1/5] multifd: Make sure that we don't do any IO after an error
  2020-01-16 15:46 ` [PATCH v3 1/5] multifd: Make sure that we don't do any IO after an error Juan Quintela
@ 2020-01-16 18:20   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 13+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-16 18:20 UTC (permalink / raw)
  To: Juan Quintela; +Cc: Laurent Vivier, Paolo Bonzini, Thomas Huth, qemu-devel

* Juan Quintela (quintela@redhat.com) wrote:
> Signed-off-by: Juan Quintela <quintela@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/ram.c | 22 +++++++++++++---------
>  1 file changed, 13 insertions(+), 9 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index ba6e0eea15..8f9f3bba5b 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -3442,7 +3442,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>  {
>      RAMState **temp = opaque;
>      RAMState *rs = *temp;
> -    int ret;
> +    int ret = 0;
>      int i;
>      int64_t t0;
>      int done = 0;
> @@ -3521,12 +3521,14 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>      ram_control_after_iterate(f, RAM_CONTROL_ROUND);
>  
>  out:
> -    multifd_send_sync_main(rs);
> -    qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
> -    qemu_fflush(f);
> -    ram_counters.transferred += 8;
> +    if (ret >= 0) {
> +        multifd_send_sync_main(rs);
> +        qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
> +        qemu_fflush(f);
> +        ram_counters.transferred += 8;
>  
> -    ret = qemu_file_get_error(f);
> +        ret = qemu_file_get_error(f);
> +    }
>      if (ret < 0) {
>          return ret;
>      }
> @@ -3578,9 +3580,11 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
>          ram_control_after_iterate(f, RAM_CONTROL_FINISH);
>      }
>  
> -    multifd_send_sync_main(rs);
> -    qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
> -    qemu_fflush(f);
> +    if (ret >= 0) {
> +        multifd_send_sync_main(rs);
> +        qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
> +        qemu_fflush(f);
> +    }
>  
>      return ret;
>  }
> -- 
> 2.24.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH v3 2/5] migration: Create MigrationState active field
  2020-01-16 15:46 ` [PATCH v3 2/5] migration: Create MigrationState active field Juan Quintela
@ 2020-01-17 16:26   ` Dr. David Alan Gilbert
  2020-01-17 18:35     ` Juan Quintela
  2020-01-21 11:08     ` Juan Quintela
  0 siblings, 2 replies; 13+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-17 16:26 UTC (permalink / raw)
  To: Juan Quintela; +Cc: Laurent Vivier, Paolo Bonzini, Thomas Huth, qemu-devel

* Juan Quintela (quintela@redhat.com) wrote:
> Right now, there is no easy way to detect whether we have already
> cancelled/finished/failed a migration.  This field is set to true
> when we start a migration, and set to false as soon as we stop
> it.
> 
> It fixes a real bug: in ram_save_iterate() we call functions that
> write to the channel even when we know that migration has stopped
> for some reason.  This causes problems with multifd because we end up
> synchronizing various semaphores that we don't want to take.
> 
> Signed-off-by: Juan Quintela <quintela@redhat.com>

Why can't you use migration_is_active() in the ram.c case?
My preference would be just to stick with something derived
from the state rather than tacking another state bit on.

> ---
>  migration/migration.c | 5 +++++
>  migration/migration.h | 5 +++++
>  migration/ram.c       | 2 +-
>  migration/savevm.c    | 2 ++
>  4 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 990bff00c0..60bc8710b6 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1583,6 +1583,8 @@ static void migrate_fd_cancel(MigrationState *s)
>      QEMUFile *f = migrate_get_current()->to_dst_file;
>      trace_migrate_fd_cancel();
>  
> +    s->active = false;
> +
>      if (s->rp_state.from_dst_file) {
>          /* shutdown the rp socket, so causing the rp thread to shutdown */
>          qemu_file_shutdown(s->rp_state.from_dst_file);
> @@ -2834,6 +2836,7 @@ static void migration_completion(MigrationState *s)
>      }
>  
>      if (!migrate_colo_enabled()) {
> +        s->active = false;
>          migrate_set_state(&s->state, current_active_state,
>                            MIGRATION_STATUS_COMPLETED);

You've not always got these two the same way around - i.e. do you change
the state first or do you set the active state first?  I think it needs
to be consistent.

>      }
> @@ -2859,6 +2862,7 @@ fail_invalidate:
>      }
>  
>  fail:
> +    s->active = false;
>      migrate_set_state(&s->state, current_active_state,
>                        MIGRATION_STATUS_FAILED);
>  }
> @@ -3289,6 +3293,7 @@ static void *migration_thread(void *opaque)
>      }
>  
>      qemu_savevm_state_setup(s->to_dst_file);
> +    s->active = true;
>  
>      if (qemu_savevm_nr_failover_devices()) {
>          migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
> diff --git a/migration/migration.h b/migration/migration.h
> index aa9ff6f27b..e0386efe95 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -169,6 +169,11 @@ struct MigrationState
>  
>      int state;
>  
> +    /* Is the migration channel still open?  When migration finishes,
> +     * gets an error, or is cancelled, this becomes false.
> +     */
> +
> +    bool active;
>      /* State related to return path */
>      struct {
>          QEMUFile     *from_dst_file;
> diff --git a/migration/ram.c b/migration/ram.c
> index 8f9f3bba5b..44ca56e1ea 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -3521,7 +3521,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>      ram_control_after_iterate(f, RAM_CONTROL_ROUND);
>  
>  out:
> -    if (ret >= 0) {
> +    if (ret >= 0 && migrate_get_current()->active) {
>          multifd_send_sync_main(rs);
>          qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
>          qemu_fflush(f);
> diff --git a/migration/savevm.c b/migration/savevm.c
> index adfdca26ac..3efde5b3dd 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1550,6 +1550,7 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
>      qemu_mutex_unlock_iothread();
>      qemu_savevm_state_header(f);
>      qemu_savevm_state_setup(f);
> +    ms->active = true;
>      qemu_mutex_lock_iothread();
>  
>      while (qemu_file_get_error(f) == 0) {
> @@ -1574,6 +1575,7 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
>          status = MIGRATION_STATUS_COMPLETED;
>      }
>      migrate_set_state(&ms->state, MIGRATION_STATUS_SETUP, status);
> +    ms->active = false;
>  
>      /* f is outer parameter, it should not stay in global migration state after
>       * this function finished */
> -- 
> 2.24.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH v3 3/5] migration: Don't wait in semaphore for thread we know has finished
  2020-01-16 15:46 ` [PATCH v3 3/5] migration: Don't wait in semaphore for thread we know has finished Juan Quintela
@ 2020-01-17 16:45   ` Dr. David Alan Gilbert
  2020-01-21 11:10     ` Juan Quintela
  0 siblings, 1 reply; 13+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-17 16:45 UTC (permalink / raw)
  To: Juan Quintela; +Cc: Laurent Vivier, Paolo Bonzini, Thomas Huth, qemu-devel

* Juan Quintela (quintela@redhat.com) wrote:
> If p->quit is true for a channel, we know that it has finished, for
> whatever reason.  So don't wait for it; just continue.
> 
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> 
> ---
> 
> I could be convinced that the right thing to do in that case is to
> just do a break instead of a continue.  Each option has its own
> advantages/disadvantages.
> ---
>  migration/ram.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 44ca56e1ea..bc918ef28d 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1118,6 +1118,12 @@ static void multifd_send_sync_main(RAMState *rs)
>          MultiFDSendParams *p = &multifd_send_state->params[i];
>  
>          trace_multifd_send_sync_main_wait(p->id);
> +        qemu_mutex_lock(&p->mutex);
> +        if (p->quit) {
> +            qemu_mutex_unlock(&p->mutex);
> +            continue;
> +        }
> +        qemu_mutex_unlock(&p->mutex);

Why is this needed / how does it help?
You can't depend on p->quit happening before the sem_wait, so the
main thread still has to do a post on sem_sync before the join, even
with the addition of the check for p->quit.

Dave

>          qemu_sem_wait(&p->sem_sync);
>      }
>      trace_multifd_send_sync_main(multifd_send_state->packet_num);
> -- 
> 2.24.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH v3 2/5] migration: Create MigrationState active field
  2020-01-17 16:26   ` Dr. David Alan Gilbert
@ 2020-01-17 18:35     ` Juan Quintela
  2020-01-21 11:08     ` Juan Quintela
  1 sibling, 0 replies; 13+ messages in thread
From: Juan Quintela @ 2020-01-17 18:35 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Laurent Vivier, Paolo Bonzini, Thomas Huth, qemu-devel

"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> * Juan Quintela (quintela@redhat.com) wrote:
>> Right now, there is no easy way to detect if we have already
>> cancelled/finished/failed a migration.  This field is set to true
>> when we start a migration, and it is set to false as soon as we stop
>> it.
>> 
>> It fixes a real bug: in ram_save_iterate() we call functions that
>> write to the channel even if we know that migration has stopped for
>> any reason.  This causes problems with multifd because we need to
>> synchronize various semaphores that we don't want to take.
>> 
>> Signed-off-by: Juan Quintela <quintela@redhat.com>
>
> Why can't you use migration_is_active() in the ram.c case?
> My preference would be just to stick with something derived
> from the state rather than tacking another state bit on.

My plan was to go the other way around.
We need to use the state with atomics, so I wanted a single way of
deciding whether or not we are in the middle of a migration.  Right now
it is too confusing, in my opinion.

>> ---
>>  migration/migration.c | 5 +++++
>>  migration/migration.h | 5 +++++
>>  migration/ram.c       | 2 +-
>>  migration/savevm.c    | 2 ++
>>  4 files changed, 13 insertions(+), 1 deletion(-)
>> 
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 990bff00c0..60bc8710b6 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -1583,6 +1583,8 @@ static void migrate_fd_cancel(MigrationState *s)
>>      QEMUFile *f = migrate_get_current()->to_dst_file;
>>      trace_migrate_fd_cancel();
>>  
>> +    s->active = false;
>> +
>>      if (s->rp_state.from_dst_file) {
>>          /* shutdown the rp socket, so causing the rp thread to shutdown */
>>          qemu_file_shutdown(s->rp_state.from_dst_file);
>> @@ -2834,6 +2836,7 @@ static void migration_completion(MigrationState *s)
>>      }
>>  
>>      if (!migrate_colo_enabled()) {
>> +        s->active = false;
>>          migrate_set_state(&s->state, current_active_state,
>>                            MIGRATION_STATUS_COMPLETED);
>
> You've not always got these two the same way around - i.e. do you change
> the state first or do you set the active state first?  I think it needs
> to be consistent.

Ok.

Thanks, Juan.



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 2/5] migration: Create MigrationState active field
  2020-01-17 16:26   ` Dr. David Alan Gilbert
  2020-01-17 18:35     ` Juan Quintela
@ 2020-01-21 11:08     ` Juan Quintela
  1 sibling, 0 replies; 13+ messages in thread
From: Juan Quintela @ 2020-01-21 11:08 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Laurent Vivier, Paolo Bonzini, Thomas Huth, qemu-devel

"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> * Juan Quintela (quintela@redhat.com) wrote:
>> Right now, there is no easy way to detect if we have already
>> cancelled/finished/failed a migration.  This field is set to true
>> when we start a migration, and it is set to false as soon as we stop
>> it.
>> 
>> It fixes a real bug: in ram_save_iterate() we call functions that
>> write to the channel even if we know that migration has stopped for
>> any reason.  This causes problems with multifd because we need to
>> synchronize various semaphores that we don't want to take.
>> 
>> Signed-off-by: Juan Quintela <quintela@redhat.com>
>
> Why can't you use migration_is_active() in the ram.c case?
> My preference would be just to stick with something derived
> from the state rather than tacking another state bit on.

Trying to redo this as something more reasonable.
The problem I was trying to solve is being sure that we know what state
we are in.  The real migration states are:

- NOT_STARTED: We haven't even started
- SETUP: We have started with local stuff but haven't yet transmitted
  anything
- ACTIVE: Migration is going well, we are transmitting data
- FINISHED: We have finished migration (COMPLETED/FAILED/CANCELLED/CANCELLING)
- COLO: Yet a completely different can of worms

To make things even more interesting, we export ->state, so code can do
whatever it wants with that variable.

What we need in a lot of places is:
- migration_is_running() (i.e. the channel is still open).

And we go back and forth trying to be sure what is going on.

>> @@ -2834,6 +2836,7 @@ static void migration_completion(MigrationState *s)
>>      }
>>  
>>      if (!migrate_colo_enabled()) {
>> +        s->active = false;
>>          migrate_set_state(&s->state, current_active_state,
>>                            MIGRATION_STATUS_COMPLETED);
>
> You've not always got these two the same way around - i.e. do you change
> the state first or do you set the active state first?  I think it needs
> to be consistent.

As said, I will try to move that to inside migrate_set_state()

thanks, Juan.



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 3/5] migration: Don't wait in semaphore for thread we know has finished
  2020-01-17 16:45   ` Dr. David Alan Gilbert
@ 2020-01-21 11:10     ` Juan Quintela
  0 siblings, 0 replies; 13+ messages in thread
From: Juan Quintela @ 2020-01-21 11:10 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Laurent Vivier, Paolo Bonzini, Thomas Huth, qemu-devel

"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> * Juan Quintela (quintela@redhat.com) wrote:
>> If p->quit is true for any channel, we know that it has finished for
>> any reason.  So don't wait for it, just continue.
>> 
>> Signed-off-by: Juan Quintela <quintela@redhat.com>
>> 
>> ---
>> 
>> I could be convinced that the right thing to do in that case is to
>> just do a break instead of a continue.  Each option has its own
>> advantages/disadvantages.
>> ---
>>  migration/ram.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>> 
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 44ca56e1ea..bc918ef28d 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -1118,6 +1118,12 @@ static void multifd_send_sync_main(RAMState *rs)
>>          MultiFDSendParams *p = &multifd_send_state->params[i];
>>  
>>          trace_multifd_send_sync_main_wait(p->id);
>> +        qemu_mutex_lock(&p->mutex);
>> +        if (p->quit) {
>> +            qemu_mutex_unlock(&p->mutex);
>> +            continue;
>> +        }
>> +        qemu_mutex_unlock(&p->mutex);
>
> Why is this needed/helps?
> You can't depend on the p->quit happening before the 
> sem_wait, so the main thread still has to do a post on sem_sync before
> the join, even with the addition of the check for p->quit.

If we have asked the thread to quit, it is quite possible that it has
already exited, so it is not going to be able to do the ->post() for this
sem.

If ->quit == true, then we know that we are exiting.  On a _normal_ exit,
we know that everything is ok.  On cancel/error, we don't really know;
it depends on how lucky we are.

Later, Juan.



^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-01-21 11:14 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-16 15:46 [PATCH v3 0/5] Fix multifd + cancel + multifd Juan Quintela
2020-01-16 15:46 ` [PATCH v3 1/5] multifd: Make sure that we don't do any IO after an error Juan Quintela
2020-01-16 18:20   ` Dr. David Alan Gilbert
2020-01-16 15:46 ` [PATCH v3 2/5] migration: Create MigrationState active field Juan Quintela
2020-01-17 16:26   ` Dr. David Alan Gilbert
2020-01-17 18:35     ` Juan Quintela
2020-01-21 11:08     ` Juan Quintela
2020-01-16 15:46 ` [PATCH v3 3/5] migration: Don't wait in semaphore for thread we know has finished Juan Quintela
2020-01-17 16:45   ` Dr. David Alan Gilbert
2020-01-21 11:10     ` Juan Quintela
2020-01-16 15:46 ` [PATCH v3 4/5] qemu-file: Don't do IO after shutdown Juan Quintela
2020-01-16 15:46 ` [PATCH v3 5/5] migration-test: Make sure that multifd and cancel works Juan Quintela
2020-01-16 15:59   ` Thomas Huth
