* [Qemu-devel] [PATCH 0/3] add "core dump"-like capability
@ 2009-07-09 11:47 Paolo Bonzini
  2009-07-09 11:47 ` [Qemu-devel] [PATCH 1/3] move state and mon_resume to struct MigrationState Paolo Bonzini
                   ` (3 more replies)
  0 siblings, 4 replies; 34+ messages in thread
From: Paolo Bonzini @ 2009-07-09 11:47 UTC (permalink / raw)
  To: qemu-devel

Unlike, for example, Xen guests, KVM-virtualized guests have no way to
generate something like a core file.  While kdump could help in the
case of a crash, it does not cater for non-Linux operating systems
and does not help with dumping the state live, while the machine is
running.

The existing savevm stuff (besides not being supported by libvirt)
does not perform the dump live; it stops the virtual machine before
saving.  One way to do this could be to start migrating to "exec:dd
of=OUT-FILE-NAME" and somehow restart the machine as soon as the migration
ends.  This (except the restarting part) is similar to how libvirt
implements VM snapshotting.  However this has several disadvantages:

1) it is not possible when the monitor does not support synchronous
migration;

2) I'm not sure how easy it would be to "script" it from libvirt -- I
have not tried;

3) last but not least, filename completion would not work from the
QEMU monitor itself. :-)

For this reason I instead opted for a new monitor command, "dump"
(suggestions for a different name are welcome).  The command is
still based on the migration mechanism, the only differences are the
destination, which is a file rather than a URI, and the fact that the
VM is restarted after its state (actually the parts that cannot be saved
live) is saved.

This approach is somewhat obvious and packs a lot of functionality in
a relatively small patch set (e.g. "info migrate" and "migrate_cancel"
will work for dumps too).  Still, it is not without disadvantages:

1) it is impossible to dump and migrate at the same time, though
this is mostly due to limitations of the monitor interface;

2) it is somewhat unintuitive that migrate commands (migrate_cancel in
particular) affect dumps as well.
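
For concreteness, the intended use would look roughly like this
hypothetical monitor session (the file name is made up, and the status
lines follow the "info migrate" output added by the patches):

```
(qemu) dump /var/tmp/guest-state.img
(qemu) info migrate
Migration status: active
transferred ram: 131072 kbytes
remaining ram: 32768 kbytes
total ram: 524288 kbytes
```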

The patch set is structured as follows.

Patch 1 is a cleanup to move some of the migration logic from
FdMigrationState to MigrationState.  In particular the
active/completed/cancelled/error "state machine" becomes part
of MigrationState.

Patch 2 adds a new state to this state machine, COMPLETING, entered
when the VM has been stopped but its data is not yet completely written.
The new design simplifies the implementation of live dumping, but arguably
this patch also fixes rare bugs that I found by inspection (see the patch
itself).  I'm selling this as a point in favor of the patch.

Patch 3 finally introduces the new command.  The patch is by far
the simplest of the three.

 migration-exec.c |   56 ++++++++++++++++++++++-----
 migration-tcp.c  |    8 +---
 migration.c      |  111 ++++++++++++++++++++++++++++++++++++++---------------
 migration.h      |   26 +++++++++---
 qemu-monitor.hx  |    8 ++++
 5 files changed, 153 insertions(+), 56 deletions(-)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH 1/3] move state and mon_resume to struct MigrationState
  2009-07-09 11:47 [Qemu-devel] [PATCH 0/3] add "core dump"-like capability Paolo Bonzini
@ 2009-07-09 11:47 ` Paolo Bonzini
  2009-07-09 11:47 ` [Qemu-devel] [PATCH 2/3] move vm stop/start to migrate_set_state Paolo Bonzini
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 34+ messages in thread
From: Paolo Bonzini @ 2009-07-09 11:47 UTC (permalink / raw)
  To: qemu-devel

The handling of suspending/resuming the monitor, and the
small active/completed/cancelled "state machine" should be the same
across all "subclasses" of MigrationState.  If any differences in
behavior are required, more methods should be added that subclasses
can customize.  This is what I do here with migrate_fd_cleanup.

I also extracted the initialization of a FdMigrationState into
a new function, migrate_fd_init.

---
 migration-exec.c |    8 +-----
 migration-tcp.c  |    8 +-----
 migration.c      |   55 +++++++++++++++++++++++++++++++----------------------
 migration.h      |   17 +++++++++------
 4 files changed, 46 insertions(+), 42 deletions(-)

diff --git a/migration-exec.c b/migration-exec.c
index 0dd5aff..cfa1304 100644
--- a/migration-exec.c
+++ b/migration-exec.c
@@ -83,16 +83,12 @@ MigrationState *exec_start_outgoing_migration(const char *command,
     s->close = exec_close;
     s->get_error = file_errno;
     s->write = file_write;
-    s->mig_state.cancel = migrate_fd_cancel;
-    s->mig_state.get_status = migrate_fd_get_status;
-    s->mig_state.release = migrate_fd_release;
 
-    s->state = MIG_STATE_ACTIVE;
-    s->mon_resume = NULL;
     s->bandwidth_limit = bandwidth_limit;
 
+    migrate_fd_init(s);
     if (!detach)
-        migrate_fd_monitor_suspend(s);
+        migrate_monitor_suspend(&s->mig_state);
 
     migrate_fd_connect(s);
     return &s->mig_state;
diff --git a/migration-tcp.c b/migration-tcp.c
index 1f4358e..ef48145 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -92,12 +92,8 @@ MigrationState *tcp_start_outgoing_migration(const char *host_port,
     s->get_error = socket_errno;
     s->write = socket_write;
     s->close = tcp_close;
-    s->mig_state.cancel = migrate_fd_cancel;
-    s->mig_state.get_status = migrate_fd_get_status;
-    s->mig_state.release = migrate_fd_release;
+    migrate_fd_init(s);
 
-    s->state = MIG_STATE_ACTIVE;
-    s->mon_resume = NULL;
     s->bandwidth_limit = bandwidth_limit;
     s->fd = socket(PF_INET, SOCK_STREAM, 0);
     if (s->fd == -1) {
@@ -108,7 +104,7 @@ MigrationState *tcp_start_outgoing_migration(const char *host_port,
     socket_set_nonblock(s->fd);
 
     if (!detach)
-        migrate_fd_monitor_suspend(s);
+        migrate_monitor_suspend(&s->mig_state);
 
     do {
         ret = connect(s->fd, (struct sockaddr *)&addr, sizeof(addr));
diff --git a/migration.c b/migration.c
index 190b37e..5abce0c 100644
--- a/migration.c
+++ b/migration.c
@@ -143,7 +143,7 @@ void do_info_migrate(Monitor *mon)
 
     if (s) {
         monitor_printf(mon, "Migration status: ");
-        switch (s->get_status(s)) {
+        switch (s->state) {
         case MIG_STATE_ACTIVE:
             monitor_printf(mon, "active\n");
             monitor_printf(mon, "transferred ram: %" PRIu64 " kbytes\n", ram_bytes_transferred() >> 10);
@@ -163,9 +163,19 @@ void do_info_migrate(Monitor *mon)
     }
 }
 
+void migrate_fd_init(FdMigrationState *s)
+{
+    s->mig_state.cleanup = migrate_fd_cleanup;
+    s->mig_state.cancel = migrate_fd_cancel;
+    s->mig_state.release = migrate_fd_release;
+
+    s->mig_state.state = MIG_STATE_ACTIVE;
+    s->mig_state.mon_resume = NULL;
+}
+
 /* shared migration helpers */
 
-void migrate_fd_monitor_suspend(FdMigrationState *s)
+void migrate_monitor_suspend(MigrationState *s)
 {
     s->mon_resume = cur_mon;
     if (monitor_suspend(cur_mon) == 0)
@@ -178,12 +188,13 @@ void migrate_fd_monitor_suspend(FdMigrationState *s)
 void migrate_fd_error(FdMigrationState *s)
 {
     dprintf("setting error state\n");
-    s->state = MIG_STATE_ERROR;
-    migrate_fd_cleanup(s);
+    migrate_set_state(&s->mig_state, MIG_STATE_ERROR);
 }
 
-void migrate_fd_cleanup(FdMigrationState *s)
+void migrate_fd_cleanup(MigrationState *mig_state)
 {
+    FdMigrationState *s = migrate_to_fms(mig_state);
+
     qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
 
     if (s->file) {
@@ -194,10 +205,6 @@ void migrate_fd_cleanup(FdMigrationState *s)
     if (s->fd != -1)
         close(s->fd);
 
-    /* Don't resume monitor until we've flushed all of the buffers */
-    if (s->mon_resume)
-        monitor_resume(s->mon_resume);
-
     s->fd = -1;
 }
 
@@ -253,7 +260,7 @@ void migrate_fd_put_ready(void *opaque)
 {
     FdMigrationState *s = opaque;
 
-    if (s->state != MIG_STATE_ACTIVE) {
+    if (s->mig_state.state != MIG_STATE_ACTIVE) {
         dprintf("put_ready returning because of non-active state\n");
         return;
     }
@@ -271,29 +278,32 @@ void migrate_fd_put_ready(void *opaque)
         } else {
             state = MIG_STATE_COMPLETED;
         }
-        migrate_fd_cleanup(s);
-        s->state = state;
+        migrate_set_state(&s->mig_state, state);
     }
 }
 
-int migrate_fd_get_status(MigrationState *mig_state)
+void migrate_set_state(MigrationState *s, int state)
 {
-    FdMigrationState *s = migrate_to_fms(mig_state);
-    return s->state;
+    s->state = state;
+
+    if (state != MIG_STATE_ACTIVE) {
+	s->cleanup (s);
+        /* Don't resume monitor until we've flushed all of the buffers */
+	if (s->mon_resume)
+            monitor_resume(s->mon_resume);
+    }
 }
 
 void migrate_fd_cancel(MigrationState *mig_state)
 {
     FdMigrationState *s = migrate_to_fms(mig_state);
 
-    if (s->state != MIG_STATE_ACTIVE)
+    if (s->mig_state.state != MIG_STATE_ACTIVE)
         return;
 
     dprintf("cancelling migration\n");
 
-    s->state = MIG_STATE_CANCELLED;
-
-    migrate_fd_cleanup(s);
+    migrate_set_state(&s->mig_state, MIG_STATE_CANCELLED);
 }
 
 void migrate_fd_release(MigrationState *mig_state)
@@ -302,9 +312,8 @@ void migrate_fd_release(MigrationState *mig_state)
 
     dprintf("releasing state\n");
    
-    if (s->state == MIG_STATE_ACTIVE) {
-        s->state = MIG_STATE_CANCELLED;
-        migrate_fd_cleanup(s);
+    if (s->mig_state.state == MIG_STATE_ACTIVE) {
+        migrate_set_state(&s->mig_state, MIG_STATE_CANCELLED);
     }
     free(s);
 }
@@ -315,7 +324,7 @@ void migrate_fd_wait_for_unfreeze(void *opaque)
     int ret;
 
     dprintf("wait for unfreeze\n");
-    if (s->state != MIG_STATE_ACTIVE)
+    if (s->mig_state.state != MIG_STATE_ACTIVE)
         return;
 
     do {
diff --git a/migration.h b/migration.h
index 37c7f8e..81ac361 100644
--- a/migration.h
+++ b/migration.h
@@ -25,9 +25,12 @@ typedef struct MigrationState MigrationState;
 
 struct MigrationState
 {
+    Monitor *mon_resume;
+    int state;
+
     /* FIXME: add more accessors to print migration info */
+    void (*cleanup)(MigrationState *s);
     void (*cancel)(MigrationState *s);
-    int (*get_status)(MigrationState *s);
     void (*release)(MigrationState *s);
 };
 
@@ -39,8 +42,6 @@ struct FdMigrationState
     int64_t bandwidth_limit;
     QEMUFile *file;
     int fd;
-    Monitor *mon_resume;
-    int state;
     int (*get_error)(struct FdMigrationState*);
     int (*close)(struct FdMigrationState*);
     int (*write)(struct FdMigrationState*, const void *, size_t);
@@ -73,11 +74,13 @@ MigrationState *tcp_start_outgoing_migration(const char *host_port,
 					     int64_t bandwidth_limit,
 					     int detach);
 
-void migrate_fd_monitor_suspend(FdMigrationState *s);
+void migrate_monitor_suspend(MigrationState *s);
 
-void migrate_fd_error(FdMigrationState *s);
+void migrate_set_state(MigrationState *s, int state);
 
-void migrate_fd_cleanup(FdMigrationState *s);
+void migrate_fd_init(FdMigrationState *s);
+
+void migrate_fd_error(FdMigrationState *s);
 
 void migrate_fd_put_notify(void *opaque);
 
@@ -87,7 +90,7 @@ void migrate_fd_connect(FdMigrationState *s);
 
 void migrate_fd_put_ready(void *opaque);
 
-int migrate_fd_get_status(MigrationState *mig_state);
+void migrate_fd_cleanup(MigrationState *mig_state);
 
 void migrate_fd_cancel(MigrationState *mig_state);
 
-- 
1.5.5.6


* [Qemu-devel] [PATCH 2/3] move vm stop/start to migrate_set_state
  2009-07-09 11:47 [Qemu-devel] [PATCH 0/3] add "core dump"-like capability Paolo Bonzini
  2009-07-09 11:47 ` [Qemu-devel] [PATCH 1/3] move state and mon_resume to struct MigrationState Paolo Bonzini
@ 2009-07-09 11:47 ` Paolo Bonzini
  2009-07-09 13:45   ` Anthony Liguori
  2009-07-09 11:47 ` [Qemu-devel] [PATCH 3/3] add live dumping capability Paolo Bonzini
  2009-07-09 13:42 ` [Qemu-devel] [PATCH 0/3] add "core dump"-like capability Anthony Liguori
  3 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2009-07-09 11:47 UTC (permalink / raw)
  To: qemu-devel

With this patch, the state machine grows a new state, COMPLETING.
This new state corresponds to the VM being stopped while migration
is finalized.  Stopping and starting the machine is driven
exclusively by the state machine mechanisms.

Two rare bugs are fixed.  The first is a race: the VM used to remain
stopped if a migration was cancelled exactly in what corresponds
to the new COMPLETING state.  A bit worse, if an error occurred
exactly during the final stage (for example due to a full disk)
the VM was unconditionally restarted, even if it was paused before
the beginning of the migration.

---
 migration.c |   36 +++++++++++++++++++++++++-----------
 migration.h |    2 ++
 2 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/migration.c b/migration.c
index 5abce0c..96585dd 100644
--- a/migration.c
+++ b/migration.c
@@ -150,6 +150,11 @@ void do_info_migrate(Monitor *mon)
             monitor_printf(mon, "remaining ram: %" PRIu64 " kbytes\n", ram_bytes_remaining() >> 10);
             monitor_printf(mon, "total ram: %" PRIu64 " kbytes\n", ram_bytes_total() >> 10);
             break;
+        case MIG_STATE_COMPLETING:
+            monitor_printf(mon, "completing\n");
+            monitor_printf(mon, "transferred ram: %" PRIu64 " kbytes\n", ram_bytes_transferred() >> 10);
+            monitor_printf(mon, "total ram: %" PRIu64 " kbytes\n", ram_bytes_total() >> 10);
+            break;
         case MIG_STATE_COMPLETED:
             monitor_printf(mon, "completed\n");
             break;
@@ -267,18 +272,13 @@ void migrate_fd_put_ready(void *opaque)
 
     dprintf("iterate\n");
     if (qemu_savevm_state_iterate(s->file) == 1) {
-        int state;
         dprintf("done iterating\n");
-        vm_stop(0);
 
-        bdrv_flush_all();
-        if ((qemu_savevm_state_complete(s->file)) < 0) {
-            vm_start();
-            state = MIG_STATE_ERROR;
-        } else {
-            state = MIG_STATE_COMPLETED;
-        }
-        migrate_set_state(&s->mig_state, state);
+        migrate_set_state(&s->mig_state, MIG_STATE_COMPLETING);
+        if ((qemu_savevm_state_complete(s->file)) < 0)
+            migrate_set_state(&s->mig_state, MIG_STATE_ERROR);
+        else
+            migrate_set_state(&s->mig_state, MIG_STATE_COMPLETED);
     }
 }
 
@@ -286,11 +286,25 @@ void migrate_set_state(MigrationState *s, int state)
 {
     s->state = state;
 
-    if (state != MIG_STATE_ACTIVE) {
+    switch (state) {
+    case MIG_STATE_COMPLETING:
+        s->save_vm_running = vm_running;
+        vm_stop(0);
+        bdrv_flush_all();
+        break;
+
+    case MIG_STATE_ACTIVE:
+        break;
+
+    default:
 	s->cleanup (s);
         /* Don't resume monitor until we've flushed all of the buffers */
 	if (s->mon_resume)
             monitor_resume(s->mon_resume);
+
+        if (s->save_vm_running && state != MIG_STATE_COMPLETED)
+	    vm_start();
+        break;
     }
 }
 
diff --git a/migration.h b/migration.h
index 81ac361..c44bd75 100644
--- a/migration.h
+++ b/migration.h
@@ -20,6 +20,7 @@
 #define MIG_STATE_COMPLETED	0
 #define MIG_STATE_CANCELLED	1
 #define MIG_STATE_ACTIVE	2
+#define MIG_STATE_COMPLETING	3
 
 typedef struct MigrationState MigrationState;
 
@@ -27,6 +28,7 @@ struct MigrationState
 {
     Monitor *mon_resume;
     int state;
+    int save_vm_running;
 
     /* FIXME: add more accessors to print migration info */
     void (*cleanup)(MigrationState *s);
-- 
1.5.5.6


* [Qemu-devel] [PATCH 3/3] add live dumping capability
  2009-07-09 11:47 [Qemu-devel] [PATCH 0/3] add "core dump"-like capability Paolo Bonzini
  2009-07-09 11:47 ` [Qemu-devel] [PATCH 1/3] move state and mon_resume to struct MigrationState Paolo Bonzini
  2009-07-09 11:47 ` [Qemu-devel] [PATCH 2/3] move vm stop/start to migrate_set_state Paolo Bonzini
@ 2009-07-09 11:47 ` Paolo Bonzini
  2009-07-09 13:49   ` Anthony Liguori
  2009-07-09 13:42 ` [Qemu-devel] [PATCH 0/3] add "core dump"-like capability Anthony Liguori
  3 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2009-07-09 11:47 UTC (permalink / raw)
  To: qemu-devel

With the previous cleanups in place, it is easy to trigger
restart when the state machine goes from the COMPLETING to the
COMPLETED state.  Besides this, the patch is just simple
scaffolding for the monitor command and to migrate to a file
rather than a pipe (which is a bit simpler because we do not
need non-blocking I/O).

The patch reuses most of the code in migration-exec.c.  The
functions there were half named exec_* and the other half
named file_*.  I consistently adopted file_* for what can be used
for dumping, and exec_* when the function is different in the
two cases.
---
	Cancelling a dump will leave a half-created file.  I
	can fix this in a v2 or in a follow-up.  Right now,
	I would like to make sure that the idea is sound and
	agree on the name of the new monitor command (if
	anything).

 migration-exec.c |   48 +++++++++++++++++++++++++++++++++++++++++++-----
 migration.c      |   26 +++++++++++++++++++++++++-
 migration.h      |    7 +++++++
 qemu-monitor.hx  |    8 ++++++++
 4 files changed, 83 insertions(+), 6 deletions(-)

diff --git a/migration-exec.c b/migration-exec.c
index cfa1304..95642dc 100644
--- a/migration-exec.c
+++ b/migration-exec.c
@@ -36,14 +36,20 @@ static int file_errno(FdMigrationState *s)
     return errno;
 }
 
-static int file_write(FdMigrationState *s, const void * buf, size_t size)
+static int exec_write(FdMigrationState *s, const void * buf, size_t size)
 {
     return write(s->fd, buf, size);
 }
 
-static int exec_close(FdMigrationState *s)
+static int file_write(FdMigrationState *s, const void * buf, size_t size)
+{
+    qemu_put_buffer(s->opaque, buf, size);
+    return size;
+}
+
+static int file_close(FdMigrationState *s)
 {
-    dprintf("exec_close\n");
+    dprintf("file_close\n");
     if (s->opaque) {
         qemu_fclose(s->opaque);
         s->opaque = NULL;
@@ -80,9 +86,9 @@ MigrationState *exec_start_outgoing_migration(const char *command,
 
     s->opaque = qemu_popen(f, "w");
 
-    s->close = exec_close;
+    s->close = file_close;
     s->get_error = file_errno;
-    s->write = file_write;
+    s->write = exec_write;
 
     s->bandwidth_limit = bandwidth_limit;
 
@@ -138,3 +144,33 @@ int exec_start_incoming_migration(const char *command)
 
     return 0;
 }
+
+MigrationState *start_live_dump(const char *file,
+                                int64_t bandwidth_limit,
+                                int detach)
+{
+    FdMigrationState *s;
+
+    s = qemu_mallocz(sizeof(*s));
+
+    s->opaque = qemu_fopen(file, "wb");
+    if (!s->opaque)
+        goto err_after_alloc;
+
+    s->close = file_close;
+    s->get_error = file_errno;
+    s->write = file_write;
+    s->bandwidth_limit = bandwidth_limit;
+
+    migrate_fd_init(s);
+    s->mig_state.restart_after = 1;
+    if (!detach)
+        migrate_monitor_suspend(&s->mig_state);
+
+    migrate_fd_connect(s);
+    return &s->mig_state;
+
+err_after_alloc:
+    qemu_free(s);
+    return NULL;
+}
diff --git a/migration.c b/migration.c
index 96585dd..beed034 100644
--- a/migration.c
+++ b/migration.c
@@ -72,6 +72,29 @@ void do_migrate(Monitor *mon, int detach, const char *uri)
     }
 }
 
+void do_dump(Monitor *mon, int detach, const char *file)
+{
+    MigrationState *s;
+
+    if (current_migration) {
+        if (current_migration->state == MIG_STATE_ACTIVE) {
+            monitor_printf(mon, "migration in progress, cannot dump\n");
+            return;
+        }
+    }
+
+    s = start_live_dump(file, max_throttle, detach);
+
+    if (s == NULL)
+        monitor_printf(mon, "dump failed\n");
+    else {
+        if (current_migration)
+            current_migration->release(current_migration);
+
+        current_migration = s;
+    }
+}
+
 void do_migrate_cancel(Monitor *mon)
 {
     MigrationState *s = current_migration;
@@ -302,7 +325,8 @@ void migrate_set_state(MigrationState *s, int state)
 	if (s->mon_resume)
             monitor_resume(s->mon_resume);
 
-        if (s->save_vm_running && state != MIG_STATE_COMPLETED)
+        if (s->save_vm_running &&
+	    (state != MIG_STATE_COMPLETED || s->restart_after))
 	    vm_start();
         break;
     }
diff --git a/migration.h b/migration.h
index c44bd75..17dea30 100644
--- a/migration.h
+++ b/migration.h
@@ -28,6 +28,7 @@ struct MigrationState
 {
     Monitor *mon_resume;
     int state;
+    int restart_after;
     int save_vm_running;
 
     /* FIXME: add more accessors to print migration info */
@@ -52,6 +53,8 @@ struct FdMigrationState
 
 void qemu_start_incoming_migration(const char *uri);
 
+void do_dump(Monitor *mon, int detach, const char *file);
+
 void do_migrate(Monitor *mon, int detach, const char *uri);
 
 void do_migrate_cancel(Monitor *mon);
@@ -64,6 +67,10 @@ void do_migrate_set_downtime(Monitor *mon, const char *value);
 
 void do_info_migrate(Monitor *mon);
 
+MigrationState *start_live_dump(const char *file,
+                                int64_t bandwidth_limit,
+                                int detach);
+
 int exec_start_incoming_migration(const char *host_port);
 
 MigrationState *exec_start_outgoing_migration(const char *host_port,
diff --git a/qemu-monitor.hx b/qemu-monitor.hx
index dc10b75..86ec3a9 100644
--- a/qemu-monitor.hx
+++ b/qemu-monitor.hx
@@ -463,6 +463,14 @@ STEXI
 Inject an NMI on the given CPU (x86 only).
 ETEXI
 
+    { "dump", "-dF", do_dump,
+      "[-d] file", "dump the VM state to FILE (using -d to not wait for completion)" },
+STEXI
+@item dump [-d] @var{file}
+Dump the state of the running VM to @var{file} (using -d to not wait for
+completion).  Commands that act on migration can be used for dumps too.
+ETEXI
+
     { "migrate", "-ds", do_migrate,
       "[-d] uri", "migrate to URI (using -d to not wait for completion)" },
 STEXI
-- 
1.5.5.6


* Re: [Qemu-devel] [PATCH 0/3] add "core dump"-like capability
  2009-07-09 11:47 [Qemu-devel] [PATCH 0/3] add "core dump"-like capability Paolo Bonzini
                   ` (2 preceding siblings ...)
  2009-07-09 11:47 ` [Qemu-devel] [PATCH 3/3] add live dumping capability Paolo Bonzini
@ 2009-07-09 13:42 ` Anthony Liguori
  2009-07-09 13:46   ` Paolo Bonzini
  3 siblings, 1 reply; 34+ messages in thread
From: Anthony Liguori @ 2009-07-09 13:42 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

Paolo Bonzini wrote:
> Unlike, for example, Xen guests, KVM-virtualized guests have no way to
> generate something like a core file.  While kdump could help in the
> case of a crash, it does not cater for non-Linux operating systems
> and does not help with dumping the state live, while the machine is
> running.
>   

Actually, you can use gdb to dump a core from a running guest.  Just do:

(gdb) gcore filename


But really, live migration gives you all the state you need.  Why not 
just introduce a third-party program that you could live migrate to via 
exec?

Regards,

Anthony Liguori


* Re: [Qemu-devel] [PATCH 2/3] move vm stop/start to migrate_set_state
  2009-07-09 11:47 ` [Qemu-devel] [PATCH 2/3] move vm stop/start to migrate_set_state Paolo Bonzini
@ 2009-07-09 13:45   ` Anthony Liguori
  2009-07-09 13:48     ` Paolo Bonzini
  0 siblings, 1 reply; 34+ messages in thread
From: Anthony Liguori @ 2009-07-09 13:45 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

Paolo Bonzini wrote:
> With this patch, the state machine grows a new state, COMPLETING.
> This new state corresponds to the VM being stopped while migration
> is finalized.  Stopping and starting the machine is driven
> exclusively by the state machine mechanisms.
>
> Two rare bugs are fixed.  The first is a race; the VM used to remain
> stopped if a migration was canceled exactly in what corresponds
> to the new COMPLETING state.

You cannot cancel during the stopped state.  When you enter into the 
final stage, you no longer process monitor input.

>   A bit worse is that if an error occurred
> exactly during the final stage (for example due to disk full)
> the VM was unconditionally restarted, even if it was paused before
> the beginning of the migration.
>   

How does the disk become full during the final stage?  The guest isn't 
running.

Regards,

Anthony Liguori


* Re: [Qemu-devel] [PATCH 0/3] add "core dump"-like capability
  2009-07-09 13:42 ` [Qemu-devel] [PATCH 0/3] add "core dump"-like capability Anthony Liguori
@ 2009-07-09 13:46   ` Paolo Bonzini
  2009-07-09 13:51     ` Anthony Liguori
  2009-07-09 14:46     ` Gerd Hoffmann
  0 siblings, 2 replies; 34+ messages in thread
From: Paolo Bonzini @ 2009-07-09 13:46 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel


> But really, live migration gives you all the state you need. Why not
> just introduce a third-party program that you could live migrate to via
> exec?

dd can be such a program, but how would you restart the VM at the end of 
migration?

Paolo


* Re: [Qemu-devel] [PATCH 2/3] move vm stop/start to migrate_set_state
  2009-07-09 13:45   ` Anthony Liguori
@ 2009-07-09 13:48     ` Paolo Bonzini
  2009-07-09 13:53       ` Anthony Liguori
  0 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2009-07-09 13:48 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel

On 07/09/2009 03:45 PM, Anthony Liguori wrote:
> How does the disk become full during the final stage?  The guest isn't
> running.

The host disk can become full and cause a "migrate exec" to fail.  Or 
for network migration, you could have the connection drop 
exactly during the final stage.  In this case, the VM would be 
unconditionally restarted.

Paolo


* Re: [Qemu-devel] [PATCH 3/3] add live dumping capability
  2009-07-09 11:47 ` [Qemu-devel] [PATCH 3/3] add live dumping capability Paolo Bonzini
@ 2009-07-09 13:49   ` Anthony Liguori
  2009-07-09 14:06     ` Paolo Bonzini
  0 siblings, 1 reply; 34+ messages in thread
From: Anthony Liguori @ 2009-07-09 13:49 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

Paolo Bonzini wrote:
> With the previous cleanups in place, it is easy to trigger
> restart when the state machine goes from the COMPLETING to the
> COMPLETED state.  Besides this, the patch is just simple
> scaffolding for the monitor command and to migrate to a file
> rather than a pipe (which is a bit simpler because we do not
> need non-blocking I/O).
>   

Then this isn't live migration.

This is functionally equivalent to migrate "exec:dd of=filename", no?

I don't think there's value in introducing a new monitor command that just 
does the above.

If you were truly dumping an actual core file and things remained 
"live", that would be compelling, but it would be a lot easier to just 
implement that as an external process.  If you just used a table that 
mapped section names and versions to length, you only really need to 
understand the format of ram and cpu save sections.

Regards,

Anthony Liguori


* Re: [Qemu-devel] [PATCH 0/3] add "core dump"-like capability
  2009-07-09 13:46   ` Paolo Bonzini
@ 2009-07-09 13:51     ` Anthony Liguori
  2009-07-09 14:46     ` Gerd Hoffmann
  1 sibling, 0 replies; 34+ messages in thread
From: Anthony Liguori @ 2009-07-09 13:51 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

Paolo Bonzini wrote:
>
>> But really, live migration gives you all the state you need. Why not
>> just introduce a third-party program that you could live migrate to via
>> exec?
>
> dd can be such a program, but how would you restart the VM at the end 
> of migration?

Originally, if exec: returned a non-zero status, the VM was automatically 
restarted.  We should restore that functionality.

Regards,

Anthony Liguori

> Paolo


* Re: [Qemu-devel] [PATCH 2/3] move vm stop/start to migrate_set_state
  2009-07-09 13:48     ` Paolo Bonzini
@ 2009-07-09 13:53       ` Anthony Liguori
  2009-07-09 13:58         ` Paolo Bonzini
  2009-07-10 23:14         ` Jamie Lokier
  0 siblings, 2 replies; 34+ messages in thread
From: Anthony Liguori @ 2009-07-09 13:53 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

Paolo Bonzini wrote:
> On 07/09/2009 03:45 PM, Anthony Liguori wrote:
>> How does the disk become full during the final stage?  The guest isn't
>> running.
>
> The host disk can become full and cause a "migrate exec" to fail.  Or 
> for network migration, you could have the connection drop 
> exactly during the final stage.  In this case, the VM would be 
> unconditionally restarted.

Because migration failed.  Is that not the desired behavior?  It seems 
like it is to me.

If I try to do a live migration, it should either succeed and my guest 
experiences minimal downtime or it should fail and my guest should 
experience minimal downtime.

Regards,

Anthony Liguori

> Paolo


* Re: [Qemu-devel] [PATCH 2/3] move vm stop/start to migrate_set_state
  2009-07-09 13:53       ` Anthony Liguori
@ 2009-07-09 13:58         ` Paolo Bonzini
  2009-07-09 14:41           ` Anthony Liguori
  2009-07-10 23:14         ` Jamie Lokier
  1 sibling, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2009-07-09 13:58 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel

On 07/09/2009 03:53 PM, Anthony Liguori wrote:
> Paolo Bonzini wrote:
>> On 07/09/2009 03:45 PM, Anthony Liguori wrote:
>>> How does the disk become full during the final stage? The guest isn't
>>> running.
>>
>> The host disk can become full and cause a "migrate exec" to fail. Or
>> for network migration, you could have the connection drop
>> exactly during the final stage. In this case, the VM would be
>> unconditionally restarted.
>
> Because migration failed. Is that not the desired behavior? It seems
> like it is to me.

By "unconditionally" I meant "even if the VM used to be paused".

Paolo


* Re: [Qemu-devel] [PATCH 3/3] add live dumping capability
  2009-07-09 13:49   ` Anthony Liguori
@ 2009-07-09 14:06     ` Paolo Bonzini
  2009-07-09 14:43       ` Anthony Liguori
  0 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2009-07-09 14:06 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel

>> With the previous cleanups in place, it is easy to trigger
>> restart when the state machine goes from the COMPLETING to the
>> COMPLETED state. Besides this, the patch is just simple
>> scaffolding for the monitor command and to migrate to a file
>> rather than a pipe (which is a bit simpler because we do not
>> need non-blocking I/O).
>
> Then this isn't live migration.

Sorry, I cannot understand this remark.

> This is functionally equivalent to migrate "exec:dd of=filename", no?

Yes, except for restarting at the end.

> If you were truly dumping an actual core file and things remained
> "live", that would be compelling, but it would be a lot easier to just
> implement that as an external process. If you just used a table that
> mapped section names and versions to length, you only really need to
> understand the format of ram and cpu save sections.

I already have the code to read ram and cpu save sections.  I don't need 
the "core file" to be ELF; I just need the live state to be dumped with 
as little downtime as possible, and the live migration support provides that.

Paolo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] move vm stop/start to migrate_set_state
  2009-07-09 13:58         ` Paolo Bonzini
@ 2009-07-09 14:41           ` Anthony Liguori
  0 siblings, 0 replies; 34+ messages in thread
From: Anthony Liguori @ 2009-07-09 14:41 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

Paolo Bonzini wrote:
> On 07/09/2009 03:53 PM, Anthony Liguori wrote:
>> Paolo Bonzini wrote:
>>> On 07/09/2009 03:45 PM, Anthony Liguori wrote:
>>>> How does the disk become full during the final stage? The guest isn't
>>>> running.
>>>
>>> The host disk can become full and cause a "migrate exec" to fail. Or
> for network migration, you could have the connection drop
>>> exactly during the final stage. In this case, the VM would be
>>> unconditionally restarted.
>>
>> Because migration failed. Is that not the desired behavior? It seems
>> like it is to me.
>
> By "unconditionally" I meant "even if the VM used to be paused".

Oh, that's definitely a bug.  But that's an easy fix too:

diff --git a/migration.c b/migration.c
index 190b37e..e6c8b16 100644
--- a/migration.c
+++ b/migration.c
@@ -261,12 +261,16 @@ void migrate_fd_put_ready(void *opaque)
     dprintf("iterate\n");
     if (qemu_savevm_state_iterate(s->file) == 1) {
         int state;
+        int old_vm_running = vm_running;
+
         dprintf("done iterating\n");
         vm_stop(0);
 
         bdrv_flush_all();
         if ((qemu_savevm_state_complete(s->file)) < 0) {
-            vm_start();
+            if (old_vm_running) {
+                vm_start();
+            }
             state = MIG_STATE_ERROR;
         } else {
             state = MIG_STATE_COMPLETED;

Regards,

Anthony Liguori

> Paolo

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] add live dumping capability
  2009-07-09 14:06     ` Paolo Bonzini
@ 2009-07-09 14:43       ` Anthony Liguori
  2009-07-10  8:32         ` Paolo Bonzini
  0 siblings, 1 reply; 34+ messages in thread
From: Anthony Liguori @ 2009-07-09 14:43 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

Paolo Bonzini wrote:
>>> With the previous cleanups in place, it is easy to trigger
>>> restart when the state machine goes from the COMPLETING to the
>>> COMPLETED state. Besides this, the patch is just simple
>>> scaffolding for the monitor command and to migrate to a file
>>> rather than a pipe (which is a bit simpler because we do not
>>> need non-blocking I/O).
>>
>> Then this isn't live migration.
>
> Sorry, I cannot understand this remark.

You're using blocking I/O which will cause the guest to pause while disk 
I/O is happening.

If you want to see this in action, before running dump, type 
"migrate_set_speed 10G" in the monitor.  It only appears live now because 
the default rate limit is pretty close to the write speed of a typical disk.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] add "core dump"-like capability
  2009-07-09 13:46   ` Paolo Bonzini
  2009-07-09 13:51     ` Anthony Liguori
@ 2009-07-09 14:46     ` Gerd Hoffmann
  2009-07-09 16:20       ` Paolo Bonzini
  1 sibling, 1 reply; 34+ messages in thread
From: Gerd Hoffmann @ 2009-07-09 14:46 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On 07/09/09 15:46, Paolo Bonzini wrote:
>
>> But really, live migration gives you all the state you need. Why not
>> just introduce a third-party program that you could live migrate to via
>> exec?
>
> dd can be such a program, but how would you restart the VM at the end of
> migration?

'cont' monitor command?

cheers,
   Gerd

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] add "core dump"-like capability
  2009-07-09 14:46     ` Gerd Hoffmann
@ 2009-07-09 16:20       ` Paolo Bonzini
  0 siblings, 0 replies; 34+ messages in thread
From: Paolo Bonzini @ 2009-07-09 16:20 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: qemu-devel


>>> But really, live migration gives you all the state you need. Why not
>>> just introduce a third-party program that you could live migrate to via
>>> exec?
>>
>> dd can be such a program, but how would you restart the VM at the end of
>> migration?
>
> 'cont' monitor command?

That's okay for libvirt, but it wouldn't work for other users if the 
monitor does not support synchronous commands (assuming polling is not 
a solution).

Paolo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] add live dumping capability
  2009-07-09 14:43       ` Anthony Liguori
@ 2009-07-10  8:32         ` Paolo Bonzini
  2009-07-10 12:51           ` Anthony Liguori
  0 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2009-07-10  8:32 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel

On 07/09/2009 04:43 PM, Anthony Liguori wrote:
> Paolo Bonzini wrote:
>>>> With the previous cleanups in place, it is easy to trigger
>>>> restart when the state machine goes from the COMPLETING to the
>>>> COMPLETED state. Besides this, the patch is just simple
>>>> scaffolding for the monitor command and to migrate to a file
>>>> rather than a pipe (which is a bit simpler because we do not
>>>> need non-blocking I/O).
>>>
>>> Then this isn't live migration.
>>
>> Sorry, I cannot understand this remark.
>
> You're using blocking I/O which will cause the guest to pause while disk
> I/O is happening.

Unfortunately non-blocking I/O does not work with files (it works with 
named pipes of course, so it would have been a good idea to keep it):

$ cat f.c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int main()
{
   int f = open("f.txt", O_CREAT|O_WRONLY, 0666);
   fcntl (f, F_SETFL, fcntl (f, F_GETFL) | O_NONBLOCK);
   char *m = malloc (1 << 28);
   write (f, m, 1 << 28);
}
$ gcc f.c
$ strace ./a.out
open("f.txt", O_WRONLY|O_CREAT, 0666)   = 3
fcntl(3, F_GETFL)                       = O_WRONLY|O_LARGEFILE
fcntl(3, F_SETFL, O_WRONLY|O_NONBLOCK|O_LARGEFILE) = 0
mmap(NULL, 268439552, ...) = 0x7f73634db000
write(3, "\0"..., 268435456) = <after quite some time> 268435456
$

Anyway, 3/3 is withdrawn since as far as libvirt is concerned I can use 
migrate+cont.  Do you have any comments on 1/3 and 2/3 to include them 
as cleanups, or do you prefer to go with just your small patch to fix the 
unconditional restart after an error?

Paolo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] add live dumping capability
  2009-07-10  8:32         ` Paolo Bonzini
@ 2009-07-10 12:51           ` Anthony Liguori
  0 siblings, 0 replies; 34+ messages in thread
From: Anthony Liguori @ 2009-07-10 12:51 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

Paolo Bonzini wrote:
> Unfortunately non-blocking I/O does not work with files (it works with 
> named pipes of course, so it would have been a good idea to keep it):

That's why we have a thread pool to do asynchronous IO to files.
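The thread-pool point can be sketched like this (illustrative only, not
QEMU code — QEMU's actual implementation is a pooled, queued layer,
whereas this toy version spawns one worker thread per request; the names
`write_req`, `do_write`, and `write_async` are made up):

```c
#include <pthread.h>
#include <unistd.h>

/* Hand a file write to a worker thread so the calling (VCPU) thread
 * never blocks on disk I/O itself. */
struct write_req {
    int fd;
    const void *buf;
    size_t len;
};

static void *do_write(void *opaque)
{
    struct write_req *req = opaque;
    /* This write() may block, but only the worker thread waits. */
    ssize_t n = write(req->fd, req->buf, req->len);
    (void)n;
    return NULL;
}

/* Caller must keep *req alive until it joins the thread. */
static int write_async(pthread_t *tid, struct write_req *req)
{
    return pthread_create(tid, NULL, do_write, req);
}
```

A real pool would reuse a fixed set of workers and queue requests, but
the principle is the same: blocking file I/O moves off the main loop.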

> Anyway, 3/3 is withdrawn since as far as libvirt is concerned I can 
> use migrate+cont.  Do you have any comments on 1/3 and 2/3 to include 
> them as cleanups, or do you prefer to go with just your small patch to 
> fix the unconditional restart after an error?

I prefer the small fix.

Regards,

Anthony Liguori

> Paolo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] move vm stop/start to migrate_set_state
  2009-07-09 13:53       ` Anthony Liguori
  2009-07-09 13:58         ` Paolo Bonzini
@ 2009-07-10 23:14         ` Jamie Lokier
  2009-07-11  0:04           ` malc
  2009-07-11  0:58           ` Anthony Liguori
  1 sibling, 2 replies; 34+ messages in thread
From: Jamie Lokier @ 2009-07-10 23:14 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Paolo Bonzini, qemu-devel

Anthony Liguori wrote:
> Paolo Bonzini wrote:
> >On 07/09/2009 03:45 PM, Anthony Liguori wrote:
> >>How does the disk become full during the final stage?  The guest isn't
> >>running.
> >
> >The host disk can become full and cause a "migrate exec" to fail.  Or 
> >for network migration, you could have the connection drop 
> >exactly during the final stage.  In this case, the VM would be 
> >unconditionally restarted.
> 
> Because migration failed.  Is that not the desired behavior?  It seems 
> like it is to me.
> 
> If I try to do a live migration, it should either succeed and my guest 
> experiences minimal downtime or it should fail and my guest should 
> experience minimal downtime.

What happens if the destination host sends "migration completed", and
then the connection drops before that message is delivered reliably to
the sending host?

The destination host will run the VM,
and the sending host will restart and run the VM too.

Two copies of the same VM running together doesn't sound healthy.

This is a classic handshaking problem, and I'm not aware of any perfect
solution, only ways to ensure eventual recovery in which temporary
uncertainty errs on the side of caution.  In this case, caution would
mean neither VM running, plus a notification to the system manager of this
rare condition and the possibility of recovering when the two hosts are
able to resume communication.  I don't know how to do better than that.

-- Jamie

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] move vm stop/start to migrate_set_state
  2009-07-10 23:14         ` Jamie Lokier
@ 2009-07-11  0:04           ` malc
  2009-07-11  0:42             ` Jamie Lokier
  2009-07-11  0:55             ` Anthony Liguori
  2009-07-11  0:58           ` Anthony Liguori
  1 sibling, 2 replies; 34+ messages in thread
From: malc @ 2009-07-11  0:04 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Paolo Bonzini, qemu-devel

On Sat, 11 Jul 2009, Jamie Lokier wrote:

> Anthony Liguori wrote:
> > Paolo Bonzini wrote:
> > >On 07/09/2009 03:45 PM, Anthony Liguori wrote:
> > >>How does the disk become full during the final stage?  The guest isn't
> > >>running.
> > >
> > >The host disk can become full and cause a "migrate exec" to fail.  Or 
> > >for network migration, you could have the connection drop 
> > >exactly during the final stage.  In this case, the VM would be 
> > >unconditionally restarted.
> > 
> > Because migration failed.  Is that not the desired behavior?  It seems 
> > like it is to me.
> > 
> > If I try to do a live migration, it should either succeed and my guest 
> > experiences minimal downtime or it should fail and my guest should 
> > experience minimal downtime.
> 
> What happens if the destination host sends "migration completed", and
> then the connection drops before that message is delivered reliably to
> the sending host?
> 
> The destination host will run the VM,
> and the sending host will restart and run the VM too.
> 
> Two copies of the same VM running together doesn't sound healthy.
> 
> This is a classic handshaking problem and I'm not aware of any perfect
> solution, only ways to ensure eventual recovery, and temporary
> uncertainty errs on the side of caution.  In this case, caution would
> be neither VM running but a notification to the system manager of this
> rare condition, and the possibility to recover when the two hosts are
> able to resume communication.  I don't know how to do better than that.

Sounds like http://en.wikipedia.org/wiki/Two_Generals%27_Problem

-- 
mailto:av1474@comtv.ru

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] move vm stop/start to migrate_set_state
  2009-07-11  0:04           ` malc
@ 2009-07-11  0:42             ` Jamie Lokier
  2009-07-11  0:55             ` Anthony Liguori
  1 sibling, 0 replies; 34+ messages in thread
From: Jamie Lokier @ 2009-07-11  0:42 UTC (permalink / raw)
  To: malc; +Cc: Paolo Bonzini, qemu-devel

malc wrote:
> > What happens if the destination host sends "migration completed", and
> > then the connection drops before that message is delivered reliably to
> > the sending host?
> > 
> > The destination host will run the VM,
> > and the sending host will restart and run the VM too.
> > 
> > Two copies of the same VM running together doesn't sound healthy.
> > 
> > This is a classic handshaking problem and I'm not aware of any perfect
> > solution, only ways to ensure eventual recovery, and temporary
> > uncertainty errs on the side of caution.  In this case, caution would
> > be neither VM running but a notification to the system manager of this
> > rare condition, and the possibility to recover when the two hosts are
> > able to resume communication.  I don't know how to do better than that.
> 
> Sounds like http://en.wikipedia.org/wiki/Two_Generals%27_Problem

It's not the same.  Unlike the Two Generals, the handshake has
outcomes which allow progress with guaranteed safety.  Two outcomes
result in one or other machine running, and a third outcome is both
machines being stopped, and repeatedly attempting to communicate for
recovery.  Both machines stopped is undesirable (and may be a
catastrophe for some applications), but it is safe in some useful
sense - it's not a disastrous failure compared with both running.

Two Generals, on the other hand, doesn't have any safe solutions,
except for no progress at all.  There is no way for either General to
proceed without some risk of failure, so the only strategy is to
minimise that probability.

-- Jamie

P.S. To make the uncertainty concrete:

  1. A sends "migration complete, you start running" to B, and A stops.
  2. B sends "migration complete accepted" to A, and starts running.

If message 2 is lost, B will be running and A will be stopped, though A
is uncertain.  A defers to the system operator, or keeps trying to
communicate with B.

If message 1 is lost, B never starts and A remains stopped: neither VM
runs, which is safe but needs the operator (or resumed communication)
to recover.


> 
> -- 
> mailto:av1474@comtv.ru

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] move vm stop/start to migrate_set_state
  2009-07-11  0:04           ` malc
  2009-07-11  0:42             ` Jamie Lokier
@ 2009-07-11  0:55             ` Anthony Liguori
  1 sibling, 0 replies; 34+ messages in thread
From: Anthony Liguori @ 2009-07-11  0:55 UTC (permalink / raw)
  To: malc; +Cc: Paolo Bonzini, qemu-devel

malc wrote:
> On Sat, 11 Jul 2009, Jamie Lokier wrote:
>   
> Sounds like http://en.wikipedia.org/wiki/Two_Generals%27_Problem
>   

And this is why the source remains in the stopped state instead of 
exiting after a successful migration.

The only "safe" solution is to use a reliable third party which is 
usually going to be a management tool.  If there is an undetected 
failure, the management tool always has the option of resuming the 
source node.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] move vm stop/start to migrate_set_state
  2009-07-10 23:14         ` Jamie Lokier
  2009-07-11  0:04           ` malc
@ 2009-07-11  0:58           ` Anthony Liguori
  2009-07-11  1:42             ` Jamie Lokier
  1 sibling, 1 reply; 34+ messages in thread
From: Anthony Liguori @ 2009-07-11  0:58 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Paolo Bonzini, qemu-devel

Jamie Lokier wrote:
> Anthony Liguori wrote:
>   
>> Paolo Bonzini wrote:
>>     
>>> On 07/09/2009 03:45 PM, Anthony Liguori wrote:
>>>       
>>>> How does the disk become full during the final stage?  The guest isn't
>>>> running.
>>>>         
>>> The host disk can become full and cause a "migrate exec" to fail.  Or 
>>> for network migration, you could have the connection drop 
>>> exactly during the final stage.  In this case, the VM would be 
>>> unconditionally restarted.
>>>       
>> Because migration failed.  Is that not the desired behavior?  It seems 
>> like it is to me.
>>
>> If I try to do a live migration, it should either succeed and my guest 
>> experiences minimal downtime or it should fail and my guest should 
>> experience minimal downtime.
>>     
>
> What happens if the destination host sends "migration completed", and
> then the connection drops before that message is delivered reliably to
> the sending host?
>   

We don't check the return value of close so the last possible place 
failure can occur is the last write.  By definition, if the write 
failed, the migration session could not have been completed successfully.

Migration is unidirectional.  There is no "migration completed" message 
from the destination.  We're very conservative wrt restarting the source.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] move vm stop/start to migrate_set_state
  2009-07-11  0:58           ` Anthony Liguori
@ 2009-07-11  1:42             ` Jamie Lokier
  2009-07-12  3:31               ` Anthony Liguori
  0 siblings, 1 reply; 34+ messages in thread
From: Jamie Lokier @ 2009-07-11  1:42 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Paolo Bonzini, qemu-devel

Anthony Liguori wrote:
> Jamie Lokier wrote:
> >Anthony Liguori wrote:
> >  
> >>Paolo Bonzini wrote:
> >>    
> >>>On 07/09/2009 03:45 PM, Anthony Liguori wrote:
> >>>      
> >>>>How does the disk become full during the final stage?  The guest isn't
> >>>>running.
> >>>>        
> >>>The host disk can become full and cause a "migrate exec" to fail.  Or 
> >>>for network migration, you could have the connection drop 
> >>>exactly during the final stage.  In this case, the VM would be 
> >>>unconditionally restarted.
> >>>      
> >>Because migration failed.  Is that not the desired behavior?  It seems 
> >>like it is to me.
> >>
> >>If I try to do a live migration, it should either succeed and my guest 
> >>experiences minimal downtime or it should fail and my guest should 
> >>experience minimal downtime.
> >>    
> >
> >What happens if the destination host sends "migration completed", and
> >then the connection drops before that message is delivered reliably to
> >the sending host?
> >  
> 
> We don't check the return value of close

Linux doesn't return I/O or network errors from close() anyway, except
for a few network filesystems, and not even those in older kernels.  It
generally returns zero.

(If you were saving to disk and wanted to detect write I/O errors,
which by the way includes disk full when writing to a network
filesystem, you'll need to call fsync().  I'm not sure if this is relevant).
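A minimal sketch of that fsync() point (illustrative, not QEMU code;
`write_and_flush` is a made-up helper name): on a network filesystem,
errors such as disk-full from earlier write()s may only surface when the
data is flushed, so the flush must be checked before trusting the file.

```c
#include <errno.h>
#include <unistd.h>

/* Write a buffer fully, then fsync() so that deferred I/O errors
 * (e.g. ENOSPC on an NFS mount) are reported before we trust the file. */
static int write_and_flush(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, retry */
            return -errno;      /* immediate write error */
        }
        p += n;
        len -= (size_t)n;
    }
    if (fsync(fd) < 0)
        return -errno;          /* deferred write error surfaces here */
    return 0;
}
```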

> so the last possible place failure can occur is the last write.  By
> definition, if the write failed, the migration session could not
> have been completed successfully.  Migration is unidirectional.
> There is no "migration completed" message from the destination.
> We're very conservative wrt restarting the source.

Yes, I agree, as long as it's conservative and only restarts when the
last byte needed to start the destination has definitely not been
written, that's safe.  That's a good design.

If you get an error during the last write(), I wouldn't trust that to
mean the recipient will definitely not see the data you wrote.  (Enjoy
the double negative).  It's another variation of the handshake
uncertainty, this time reflected in what write() should report when
it's uncertain about a network transmission.  If it reports an error
when it's uncertain, then you can't trust that a write() error means
the data was not written, only that a problem was detected.

By saving the final "commit" byte for its own 1-byte write(), if you
get an error from any earlier write, then of course you know the
last byte has not been sent and it's safe to resume the source.
Reading SO_ERROR before the 1-byte write() would maximise this chance,
but it's probably so rare as to be pointless.
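That commit-byte idea could look roughly like this (a sketch assuming a
connected stream socket; `send_commit_byte` is an illustrative name,
not actual QEMU code):

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Surface any pending asynchronous socket error, then send the single
 * "commit" byte with its own write(), so a failure here cannot be
 * confused with a failure of the earlier bulk data. */
static int send_commit_byte(int fd)
{
    int soerr = 0;
    socklen_t len = sizeof(soerr);

    /* Check for an error recorded on the socket before committing. */
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &soerr, &len) < 0)
        return -errno;
    if (soerr != 0)
        return -soerr;          /* an earlier transmission already failed */

    char commit = 1;
    if (write(fd, &commit, 1) != 1)
        return -errno;
    return 0;
}
```

If any call before the final write() fails, the source knows the commit
byte was never sent and can safely resume.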

-- Jamie

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] move vm stop/start to migrate_set_state
  2009-07-11  1:42             ` Jamie Lokier
@ 2009-07-12  3:31               ` Anthony Liguori
  2009-07-12 14:22                 ` Avi Kivity
  0 siblings, 1 reply; 34+ messages in thread
From: Anthony Liguori @ 2009-07-12  3:31 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Paolo Bonzini, qemu-devel

Jamie Lokier wrote:
> If you get an error during the last write(), I wouldn't trust that to
> mean the recipient will definitely not see the data you wrote.  (Enjoy
> the double negative).  It's another variation of the handshake
> uncertainty, this time reflected in what write() should report when
> it's uncertain about a network transmission.  If it reports an error
> when it's uncertain, then you can't trust that a write() error means
> the data was not written, only that a problem was detected.
>   

I think you're stretching here.  If it really were the case that write() 
could actually result in data being sent out the wire and yet still 
return an error, it would make all error handling in Unix 
unmanageable.  I can't believe this is possible in Linux, and without an 
actual counter-example, I'm inclined to believe the same is true for 
every other OS out there.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] move vm stop/start to migrate_set_state
  2009-07-12  3:31               ` Anthony Liguori
@ 2009-07-12 14:22                 ` Avi Kivity
  2009-07-12 19:10                   ` Anthony Liguori
  0 siblings, 1 reply; 34+ messages in thread
From: Avi Kivity @ 2009-07-12 14:22 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Paolo Bonzini, qemu-devel

On 07/12/2009 06:31 AM, Anthony Liguori wrote:
> Jamie Lokier wrote:
>> If you get an error during the last write(), I wouldn't trust that to
>> mean the recipient will definitely not see the data you wrote.  (Enjoy
>> the double negative).  It's another variation of the handshake
>> uncertainty, this time reflected in what write() should report when
>> it's uncertain about a network transmission.  If it reports an error
>> when it's uncertain, then you can't trust that a write() error means
>> the data was not written, only that a problem was detected.
>
> I think you're stretching here.  If it really were the case that 
> write() could actually result in data being sent out the wire and yet 
> still return an error, it would make all error handling in Unix 
> unmanageable.  I can't believe this is possible in Linux and without an 
> actual counter-example, I'm inclined to believe the same is true for 
> every other OS out there.

It's actually a common scenario for block devices.  I don't know about 
networking, but for disks a write can complete and then report an 
error if the cable or power was disconnected before the acknowledgement 
could arrive.

It could conceivably happen with networking if the device reports an 
error when it isn't sure if the data was sent out or not (but it 
actually was), or if some path after the transmission required a memory 
allocation, which failed.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] move vm stop/start to migrate_set_state
  2009-07-12 14:22                 ` Avi Kivity
@ 2009-07-12 19:10                   ` Anthony Liguori
  2009-07-12 19:30                     ` Avi Kivity
  2009-07-13  5:31                     ` Gleb Natapov
  0 siblings, 2 replies; 34+ messages in thread
From: Anthony Liguori @ 2009-07-12 19:10 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Paolo Bonzini, qemu-devel

Avi Kivity wrote:
> On 07/12/2009 06:31 AM, Anthony Liguori wrote:
>> Jamie Lokier wrote:
>>> If you get an error during the last write(), I wouldn't trust that to
>>> mean the recipient will definitely not see the data you wrote.  (Enjoy
>>> the double negative).  It's another variation of the handshake
>>> uncertainty, this time reflected in what write() should report when
>>> it's uncertain about a network transmission.  If it reports an error
>>> when it's uncertain, then you can't trust that a write() error means
>>> the data was not written, only that a problem was detected.
>>
>> I think you're stretching here.  If it really were the case that 
>> write() could actually result in data being sent out the wire and yet 
>> still return an error, it would make all error handling in Unix 
>> unmanageable.  I can't believe this is possible in Linux and without 
>> an actual counter-example, I'm inclined to believe the same is true 
>> for every other OS out there.
>
> It's actually a common scenario for block devices.  I don't know about 
> networking, but for disks a write can be completed and then report an 
> error if the cable or power was disconnected before the acknowledge 
> could arrive.

Is it common that a disk cable is yanked out before the ack arrives?  
Are there gremlins in your servers :-)

> It could conceivably happen with networking if the device reports an 
> error when it isn't sure if the data was sent out or not (but it 
> actually was), or if some path after the transmission required a 
> memory allocation, which failed.

But does this actually happen or is this all theoretical?

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] move vm stop/start to migrate_set_state
  2009-07-12 19:10                   ` Anthony Liguori
@ 2009-07-12 19:30                     ` Avi Kivity
  2009-07-13  5:31                     ` Gleb Natapov
  1 sibling, 0 replies; 34+ messages in thread
From: Avi Kivity @ 2009-07-12 19:30 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Paolo Bonzini, qemu-devel

On 07/12/2009 10:10 PM, Anthony Liguori wrote:
>> It's actually a common scenario for block devices.  I don't know 
>> about networking, but for disks a write can be completed and then 
>> report an error if the cable or power was disconnected before the 
>> acknowledge could arrive.
>
>
> Is it common that a disk cable is yanked out before the ack arrives? 

It was common when I was doing filesystems.

> Are there gremlins in your servers :-)

Worse, QA.

>
>> It could conceivably happen with networking if the device reports an 
>> error when it isn't sure if the data was sent out or not (but it 
>> actually was), or if some path after the transmission required a 
>> memory allocation, which failed.
>
> But does this actually happen or is this all theoretical?

With block devices, my experience indicates that the probability of 
something happening is proportional to the damage it will cause.  With 
networking, it may be theoretical or practical, I suggest we don't rely 
on it either way.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] move vm stop/start to migrate_set_state
  2009-07-12 19:10                   ` Anthony Liguori
  2009-07-12 19:30                     ` Avi Kivity
@ 2009-07-13  5:31                     ` Gleb Natapov
  2009-07-13  8:05                       ` Gleb Natapov
  2009-07-13 14:52                       ` Anthony Liguori
  1 sibling, 2 replies; 34+ messages in thread
From: Gleb Natapov @ 2009-07-13  5:31 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Paolo Bonzini, Avi Kivity, qemu-devel

On Sun, Jul 12, 2009 at 02:10:43PM -0500, Anthony Liguori wrote:
> Avi Kivity wrote:
>> On 07/12/2009 06:31 AM, Anthony Liguori wrote:
>>> Jamie Lokier wrote:
>>>> If you get an error during the last write(), I wouldn't trust that to
>>>> mean the recipient will definitely not see the data you wrote.  (Enjoy
>>>> the double negative).  It's another variation of the handshake
>>>> uncertainty, this time reflected in what write() should report when
>>>> it's uncertain about a network transmission.  If it reports an error
>>>> when it's uncertain, then you can't trust that a write() error means
>>>> the data was not written, only that a problem was detected.
>>>
>>> I think you're stretching here.  If it really were the case that  
>>> write() could actually result in data being sent out the wire and yet 
>>> still return an error, it would make all error handling in Unix  
>>> unmanageable.  I can't believe this is possible in Linux and without  
>>> an actual counter-example, I'm inclined to believe the same is true  
>>> for every other OS out there.
>>
>> It's actually a common scenario for block devices.  I don't know about  
>> networking, but for disks a write can be completed and then report an  
>> error if the cable or power was disconnected before the acknowledge  
>> could arrive.
>
> Is it common that a disk cable is yanked out before the ack arrives?   
> Are there gremlins in your servers :-)
>
>> It could conceivably happen with networking if the device reports an  
>> error when it isn't sure if the data was sent out or not (but it  
>> actually was), or if some path after the transmission required a  
>> memory allocation, which failed.
>
> But does this actually happen or is this all theoretical?
>
With an unreliable socket it doesn't matter what write() returns: the data
may or may not reach the destination regardless.  With reliable sockets,
write() succeeds only after the data was acked by the receiver, but even
that doesn't mean the data will be read from the destination socket.

--
			Gleb.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] move vm stop/start to migrate_set_state
  2009-07-13  5:31                     ` Gleb Natapov
@ 2009-07-13  8:05                       ` Gleb Natapov
  2009-07-13 14:52                       ` Anthony Liguori
  1 sibling, 0 replies; 34+ messages in thread
From: Gleb Natapov @ 2009-07-13  8:05 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Paolo Bonzini, Avi Kivity, qemu-devel

On Mon, Jul 13, 2009 at 08:31:39AM +0300, Gleb Natapov wrote:
> On Sun, Jul 12, 2009 at 02:10:43PM -0500, Anthony Liguori wrote:
> > Avi Kivity wrote:
> >> On 07/12/2009 06:31 AM, Anthony Liguori wrote:
> >>> Jamie Lokier wrote:
> >>>> If you get an error during the last write(), I wouldn't trust that to
> >>>> mean the recipient will definitely not see the data you wrote.  (Enjoy
> >>>> the double negative).  It's another variation of the handshake
> >>>> uncertainty, this time reflected in what write() should report when
> >>>> it's uncertain about a network transmission.  If it reports an error
> >>>> when it's uncertain, then you can't trust that a write() error means
> >>>> the data was not written, only that a problem was detected.
> >>>
> >>> I think you're stretching here.  If it really were the case that  
> >>> write() could actually result in data being sent out the wire and yet 
> >>> still return an error, it would make all error handling in Unix  
> >>> unmanageable.  I can't believe this is possible in Linux and without  
> >>> an actual counter-example, I'm inclined to believe the same is true  
> >>> for every other OS out there.
> >>
> >> It's actually a common scenario for block devices.  I don't know about  
> >> networking, but for disks a write can be completed and then report an  
> >> error if the cable or power was disconnected before the acknowledge  
> >> could arrive.
> >
> > Is it common that a disk cable is yanked out before the ack arrives?   
> > Are there gremlins in your servers :-)
> >
> >> It could conceivably happen with networking if the device reports an  
> >> error when it isn't sure if the data was sent out or not (but it  
> >> actually was), or if some path after the transmission required a  
> >> memory allocation, which failed.
> >
> > But does this actually happen or is this all theoretical?
> >
> With unreliable socket it doesn't matter what write() returns data may
> or may not reach the destination regardless, with reliable sockets
> write() succeeds only after data was acked by the receiver, but it still
> doesn't mean that data will be read from destination socket.
> 
Actually for reliable sockets write() succeeds when the data is copied
into a kernel buffer, not when it is acked.  If you want to be sure the
data was sent you can check the length of the socket send queue, but this
still doesn't mean that the data was actually delivered to the receiving
application.  Only an application-level protocol can guarantee that.

--
			Gleb.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] move vm stop/start to migrate_set_state
  2009-07-13  5:31                     ` Gleb Natapov
  2009-07-13  8:05                       ` Gleb Natapov
@ 2009-07-13 14:52                       ` Anthony Liguori
  2009-07-14  8:48                         ` Dor Laor
  1 sibling, 1 reply; 34+ messages in thread
From: Anthony Liguori @ 2009-07-13 14:52 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Paolo Bonzini, Avi Kivity, qemu-devel

Gleb Natapov wrote:
> With an unreliable socket it doesn't matter what write() returns; the
> data may or may not reach the destination regardless.  With reliable
> sockets write() succeeds only after the data was acked by the receiver,
> but that still doesn't mean the data will be read from the destination
> socket.
>   

You are correct and we handle both of these cases appropriately.  In the 
event that we think we completed a migration successfully and we really 
didn't because of a lost network connection, the result is both the 
source and destination are stopped.  A third party can resume the source 
and continue along happily.

The case being debated is whether write() can ever actually complete and 
yet still return an error.  In this case, since we automatically resume 
the source on error, the result would be two copies of the VM running.

I haven't seen any evidence that this case could actually happen other 
than theoretical speculation.  I just looked at the migration code, and 
it's not a simple change to try to be conservative wrt this case because 
of the way we do buffering.

Regards,

Anthony Liguori

> --
> 			Gleb.
>   


* Re: [Qemu-devel] [PATCH 2/3] move vm stop/start to migrate_set_state
  2009-07-13 14:52                       ` Anthony Liguori
@ 2009-07-14  8:48                         ` Dor Laor
  2009-07-14 14:41                           ` Paolo Bonzini
  0 siblings, 1 reply; 34+ messages in thread
From: Dor Laor @ 2009-07-14  8:48 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Paolo Bonzini, Avi Kivity, Gleb Natapov, qemu-devel

On 07/13/2009 05:52 PM, Anthony Liguori wrote:
> Gleb Natapov wrote:
>> With an unreliable socket it doesn't matter what write() returns; the
>> data may or may not reach the destination regardless.  With reliable
>> sockets write() succeeds only after the data was acked by the receiver,
>> but that still doesn't mean the data will be read from the destination
>> socket.
>
> You are correct and we handle both of these cases appropriately. In the
> event that we think we completed a migration successfully and we really
> didn't because of a lost network connection, the result is both the
> source and destination are stopped. A third party can resume the source
> and continue along happily.
>
> The case being debated is whether write() can ever actually complete and
> yet still return an error. In this case, since we automatically resume
> the source on error, the result would be two copies of the VM running.
>
> I haven't seen any evidence that this case could actually happen other
> than theoretical speculation. I just looked at the migration code, and
> it's not a simple change to try to be conservative wrt this case
> because of the way we do buffering.
>

Reminder: please remember to commit the patch for saving the running 
state, so we won't resume a paused guest after a successful migration.


> Regards,
>
> Anthony Liguori
>
>> --
>> Gleb.
>
>
>


* Re: [Qemu-devel] [PATCH 2/3] move vm stop/start to migrate_set_state
  2009-07-14  8:48                         ` Dor Laor
@ 2009-07-14 14:41                           ` Paolo Bonzini
  0 siblings, 0 replies; 34+ messages in thread
From: Paolo Bonzini @ 2009-07-14 14:41 UTC (permalink / raw)
  To: dlaor; +Cc: qemu-devel, Gleb Natapov, Avi Kivity


> Reminder: please remember to commit the patch for saving the running
> state, so we won't resume a paused guest after a successful migration.

If you meant Red Hat bugzilla 510459, that had not been posted yet. :-)
That patch fixes qemu's support for libvirt resuming paused guests 
after migration; it relies on libvirt remembering whether the VM was 
running (which works fine).

If instead you referred to a patch to do this purely in qemu, sorry for 
the noise.

Paolo


end of thread, other threads:[~2009-07-14 14:41 UTC | newest]

Thread overview: 34+ messages
-- links below jump to the message on this page --
2009-07-09 11:47 [Qemu-devel] [PATCH 0/3] add "core dump"-like capability Paolo Bonzini
2009-07-09 11:47 ` [Qemu-devel] [PATCH 1/3] move state and mon_resume to struct MigrationState Paolo Bonzini
2009-07-09 11:47 ` [Qemu-devel] [PATCH 2/3] move vm stop/start to migrate_set_state Paolo Bonzini
2009-07-09 13:45   ` Anthony Liguori
2009-07-09 13:48     ` Paolo Bonzini
2009-07-09 13:53       ` Anthony Liguori
2009-07-09 13:58         ` Paolo Bonzini
2009-07-09 14:41           ` Anthony Liguori
2009-07-10 23:14         ` Jamie Lokier
2009-07-11  0:04           ` malc
2009-07-11  0:42             ` Jamie Lokier
2009-07-11  0:55             ` Anthony Liguori
2009-07-11  0:58           ` Anthony Liguori
2009-07-11  1:42             ` Jamie Lokier
2009-07-12  3:31               ` Anthony Liguori
2009-07-12 14:22                 ` Avi Kivity
2009-07-12 19:10                   ` Anthony Liguori
2009-07-12 19:30                     ` Avi Kivity
2009-07-13  5:31                     ` Gleb Natapov
2009-07-13  8:05                       ` Gleb Natapov
2009-07-13 14:52                       ` Anthony Liguori
2009-07-14  8:48                         ` Dor Laor
2009-07-14 14:41                           ` Paolo Bonzini
2009-07-09 11:47 ` [Qemu-devel] [PATCH 3/3] add live dumping capability Paolo Bonzini
2009-07-09 13:49   ` Anthony Liguori
2009-07-09 14:06     ` Paolo Bonzini
2009-07-09 14:43       ` Anthony Liguori
2009-07-10  8:32         ` Paolo Bonzini
2009-07-10 12:51           ` Anthony Liguori
2009-07-09 13:42 ` [Qemu-devel] [PATCH 0/3] add "core dump"-like capability Anthony Liguori
2009-07-09 13:46   ` Paolo Bonzini
2009-07-09 13:51     ` Anthony Liguori
2009-07-09 14:46     ` Gerd Hoffmann
2009-07-09 16:20       ` Paolo Bonzini
