* [Qemu-devel] [RFC 00/27] Migration thread (WIP)
@ 2012-07-24 18:36 Juan Quintela
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 01/27] buffered_file: g_realloc() can't fail Juan Quintela
                   ` (30 more replies)
  0 siblings, 31 replies; 45+ messages in thread
From: Juan Quintela @ 2012-07-24 18:36 UTC (permalink / raw)
  To: qemu-devel

Hi

This series is on top of the migration-next-v5 series just posted.

First of all, this is an RFC/work in progress.  A lot of people asked
for it, and I would like review of the design.

It does:
- get a new bitmap for migration, and that bitmap uses 1 bit per page
- it unfolds migration_buffered_file.  Only one user existed.
- it simplifies buffered_file a lot.

- About the migration thread, special attention was given to keeping
  the series reviewable (reviewers will tell me if I succeeded).

Basic design:
- we create a new thread instead of a timer function
- we move all the migration work to that thread (but run everything
  except the waits with the iothread lock held)
- we move all the writing outside the iothread lock, i.e.
  we walk the state with the iothread lock held and copy everything
  into one buffer, then we write that buffer to the sockets outside
  the iothread lock
- once here, we move to writing synchronously to the sockets
- this allows us to simplify quite a lot

And basically, that is it.  Notice that we still do the iterative page
walking with the iothread lock held.  Light testing shows that we get
similar speed and latencies to the version without the thread (notice
that almost no optimizations have been done here yet).
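The copy-under-lock / write-outside-lock split described above can be sketched in a few lines (a hypothetical sketch, not QEMU code: pthreads stands in for the iothread lock, and fill_buffer() for the guest-state walk):

```c
#include <pthread.h>
#include <string.h>

static pthread_mutex_t iothread_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for walking the guest state; must run under the lock. */
static size_t fill_buffer(char *dst, size_t cap)
{
    const char page[] = "dirty-page";
    size_t n = sizeof(page) <= cap ? sizeof(page) : cap;
    memcpy(dst, page, n);
    return n;
}

/* One iteration: copy state under the lock, write it outside. */
size_t migrate_one_round(char *buf, size_t cap)
{
    size_t n;

    pthread_mutex_lock(&iothread_lock);    /* guest state is stable here */
    n = fill_buffer(buf, cap);
    pthread_mutex_unlock(&iothread_lock);  /* drop lock before blocking I/O */

    /* write(fd, buf, n) would happen here, outside the lock */
    return n;
}
```

The point of the split is that the slow, potentially blocking socket write never holds up the vCPUs waiting on the iothread lock.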

Apart from the review:
- Are there any locking issues that I have missed? (I guess so.)
- Stopping all CPUs correctly: vm_stop should be called from the
  iothread, so I use the trick of scheduling a bottom half to get that
  working correctly, but this _implementation_ is ugly as hell.  Is
  there an easier way of doing it?
- Do I really have to export last_ram_offset()?  Is there no other way
  of knowing the amount of RAM?

Known issues:

- for some reason, when it has to start a 2nd round of bitmap
  handling, it decides to dirty all pages.  I still haven't found out
  why this happens.

If you can test it and tell me where it breaks, that would also help.

This work is based on Umesh Deshpande's thread work, and on work that
Paolo Bonzini did on top of it.  The migration thread itself was
redone from scratch because I was unable to debug why the old one was
failing, but it "owes" a lot to the previous design.

Thanks in advance, Juan.

The following changes since commit a21143486b9c6d7a50b7b62877c02b3c686943cb:

  Merge remote-tracking branch 'stefanha/net' into staging (2012-07-23 13:15:34 -0500)

are available in the git repository at:


  http://repo.or.cz/r/qemu/quintela.git migration-thread-v1

for you to fetch changes up to 27e539b03ba97bc37e107755bcb44511ec4c8100:

  buffered_file: unfold buffered_append in buffered_put_buffer (2012-07-24 16:46:13 +0200)


Juan Quintela (23):
  buffered_file: g_realloc() can't fail
  savevm: Factorize ram globals reset in its own function
  ram: introduce migration_bitmap_set_dirty()
  ram: Introduce migration_bitmap_test_and_reset_dirty()
  ram: Export last_ram_offset()
  ram: introduce migration_bitmap_sync()
  Separate migration bitmap
  buffered_file: rename opaque to migration_state
  buffered_file: opaque is MigrationState
  buffered_file: unfold migrate_fd_put_buffer
  buffered_file: unfold migrate_fd_put_ready
  buffered_file: unfold migrate_fd_put_buffer
  buffered_file: unfold migrate_fd_put_buffer
  buffered_file: We can access directly to bandwidth_limit
  buffered_file: Move from using a timer to use a thread
  migration: make qemu_fopen_ops_buffered() return void
  migration: stop all cpus correctly
  migration: make writes blocking
  migration: remove unfreeze logic
  migration: take finer locking
  buffered_file: Unfold the trick to restart generating migration data
  buffered_file: don't flush on put buffer
  buffered_file: unfold buffered_append in buffered_put_buffer

Paolo Bonzini (2):
  split MRU ram list
  BufferedFile: append, then flush

Umesh Deshpande (2):
  add a version number to ram_list
  protect the ramlist with a separate mutex

 arch_init.c      |  108 +++++++++++++++++++++++++-------
 buffered_file.c  |  179 +++++++++++++++++-------------------------------------
 buffered_file.h  |   12 +---
 cpu-all.h        |   17 +++++-
 exec-obsolete.h  |   10 ---
 exec.c           |   45 +++++++++++---
 migration-exec.c |    2 -
 migration-fd.c   |    6 --
 migration-tcp.c  |    2 +-
 migration-unix.c |    2 -
 migration.c      |  111 ++++++++++++++-------------------
 migration.h      |    6 ++
 qemu-file.h      |    5 --
 savevm.c         |    5 --
 14 files changed, 249 insertions(+), 261 deletions(-)

-- 
1.7.10.4

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Qemu-devel] [PATCH 01/27] buffered_file: g_realloc() can't fail
  2012-07-24 18:36 [Qemu-devel] [RFC 00/27] Migration thread (WIP) Juan Quintela
@ 2012-07-24 18:36 ` Juan Quintela
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 02/27] split MRU ram list Juan Quintela
                   ` (29 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Juan Quintela @ 2012-07-24 18:36 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 buffered_file.c |   10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/buffered_file.c b/buffered_file.c
index f170aa0..4148abb 100644
--- a/buffered_file.c
+++ b/buffered_file.c
@@ -50,20 +50,12 @@ static void buffered_append(QEMUFileBuffered *s,
                             const uint8_t *buf, size_t size)
 {
     if (size > (s->buffer_capacity - s->buffer_size)) {
-        void *tmp;
-
         DPRINTF("increasing buffer capacity from %zu by %zu\n",
                 s->buffer_capacity, size + 1024);

         s->buffer_capacity += size + 1024;

-        tmp = g_realloc(s->buffer, s->buffer_capacity);
-        if (tmp == NULL) {
-            fprintf(stderr, "qemu file buffer expansion failed\n");
-            exit(1);
-        }
-
-        s->buffer = tmp;
+        s->buffer = g_realloc(s->buffer, s->buffer_capacity);
     }

     memcpy(s->buffer + s->buffer_size, buf, size);
-- 
1.7.10.4
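For context on why the removed check was dead code: g_realloc() aborts the process on allocation failure instead of returning NULL. A stand-in with the same contract, sketched with plain realloc() (hypothetical helper names, not the QEMU code):

```c
#include <stdlib.h>

/* Same contract as g_realloc(): never returns NULL on failure,
 * it aborts instead, so callers need no error path. */
static void *xrealloc(void *p, size_t size)
{
    void *tmp = realloc(p, size);
    if (tmp == NULL && size != 0) {
        abort();                 /* mirrors g_realloc()'s behavior */
    }
    return tmp;
}

/* Same growth rule as buffered_append(): requested size plus slack. */
size_t grow_capacity(char **buf, size_t capacity, size_t need)
{
    capacity += need + 1024;
    *buf = xrealloc(*buf, capacity);   /* no NULL check required */
    return capacity;
}
```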


* [Qemu-devel] [PATCH 02/27] split MRU ram list
  2012-07-24 18:36 [Qemu-devel] [RFC 00/27] Migration thread (WIP) Juan Quintela
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 01/27] buffered_file: g_realloc() can't fail Juan Quintela
@ 2012-07-24 18:36 ` Juan Quintela
  2012-07-25 20:20   ` Michael Roth
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 03/27] savevm: Factorize ram globals reset in its own function Juan Quintela
                   ` (28 subsequent siblings)
  30 siblings, 1 reply; 45+ messages in thread
From: Juan Quintela @ 2012-07-24 18:36 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini

From: Paolo Bonzini <pbonzini@redhat.com>

Outside the execution threads the normal, non-MRU-ized order of
the RAM blocks should always be enough.  So manage two separate
lists, which will have separate locking rules.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 cpu-all.h |    4 +++-
 exec.c    |   16 +++++++++++-----
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/cpu-all.h b/cpu-all.h
index 82ba1d7..ca3bb24 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -476,8 +476,9 @@ typedef struct RAMBlock {
     ram_addr_t offset;
     ram_addr_t length;
     uint32_t flags;
-    char idstr[256];
+    QLIST_ENTRY(RAMBlock) next_mru;
     QLIST_ENTRY(RAMBlock) next;
+    char idstr[256];
 #if defined(__linux__) && !defined(TARGET_S390X)
     int fd;
 #endif
@@ -485,6 +486,7 @@ typedef struct RAMBlock {

 typedef struct RAMList {
     uint8_t *phys_dirty;
+    QLIST_HEAD(, RAMBlock) blocks_mru;
     QLIST_HEAD(, RAMBlock) blocks;
     uint64_t dirty_pages;
 } RAMList;
diff --git a/exec.c b/exec.c
index feb4795..afc472f 100644
--- a/exec.c
+++ b/exec.c
@@ -112,7 +112,10 @@ static uint8_t *code_gen_ptr;
 int phys_ram_fd;
 static int in_migration;

-RAMList ram_list = { .blocks = QLIST_HEAD_INITIALIZER(ram_list.blocks) };
+RAMList ram_list = {
+    .blocks = QLIST_HEAD_INITIALIZER(ram_list.blocks),
+    .blocks_mru = QLIST_HEAD_INITIALIZER(ram_list.blocks_mru)
+};

 static MemoryRegion *system_memory;
 static MemoryRegion *system_io;
@@ -2550,6 +2553,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
     new_block->length = size;

     QLIST_INSERT_HEAD(&ram_list.blocks, new_block, next);
+    QLIST_INSERT_HEAD(&ram_list.blocks_mru, new_block, next_mru);

     ram_list.phys_dirty = g_realloc(ram_list.phys_dirty,
                                        last_ram_offset() >> TARGET_PAGE_BITS);
@@ -2573,6 +2577,7 @@ void qemu_ram_free_from_ptr(ram_addr_t addr)
     QLIST_FOREACH(block, &ram_list.blocks, next) {
         if (addr == block->offset) {
             QLIST_REMOVE(block, next);
+            QLIST_REMOVE(block, next_mru);
             g_free(block);
             return;
         }
@@ -2586,6 +2591,7 @@ void qemu_ram_free(ram_addr_t addr)
     QLIST_FOREACH(block, &ram_list.blocks, next) {
         if (addr == block->offset) {
             QLIST_REMOVE(block, next);
+            QLIST_REMOVE(block, next_mru);
             if (block->flags & RAM_PREALLOC_MASK) {
                 ;
             } else if (mem_path) {
@@ -2690,12 +2696,12 @@ void *qemu_get_ram_ptr(ram_addr_t addr)
 {
     RAMBlock *block;

-    QLIST_FOREACH(block, &ram_list.blocks, next) {
+    QLIST_FOREACH(block, &ram_list.blocks_mru, next_mru) {
         if (addr - block->offset < block->length) {
             /* Move this entry to to start of the list.  */
             if (block != QLIST_FIRST(&ram_list.blocks)) {
-                QLIST_REMOVE(block, next);
-                QLIST_INSERT_HEAD(&ram_list.blocks, block, next);
+                QLIST_REMOVE(block, next_mru);
+                QLIST_INSERT_HEAD(&ram_list.blocks_mru, block, next_mru);
             }
             if (xen_enabled()) {
                 /* We need to check if the requested address is in the RAM
@@ -2790,7 +2796,7 @@ int qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
         return 0;
     }

-    QLIST_FOREACH(block, &ram_list.blocks, next) {
+    QLIST_FOREACH(block, &ram_list.blocks_mru, next_mru) {
         /* This case append when the block is not mapped. */
         if (block->host == NULL) {
             continue;
-- 
1.7.10.4


* [Qemu-devel] [PATCH 03/27] savevm: Factorize ram globals reset in its own function
  2012-07-24 18:36 [Qemu-devel] [RFC 00/27] Migration thread (WIP) Juan Quintela
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 01/27] buffered_file: g_realloc() can't fail Juan Quintela
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 02/27] split MRU ram list Juan Quintela
@ 2012-07-24 18:36 ` Juan Quintela
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 04/27] add a version number to ram_list Juan Quintela
                   ` (27 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Juan Quintela @ 2012-07-24 18:36 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 arch_init.c |   12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index f555c27..02d36ce 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -491,6 +491,14 @@ static void ram_migration_cancel(void *opaque)
     migration_end();
 }

+
+static void reset_ram_globals(void)
+{
+    last_block = NULL;
+    last_offset = 0;
+    sort_ram_list();
+}
+
 #define MAX_WAIT 50 /* ms, half buffered_file limit */

 static int ram_save_setup(QEMUFile *f, void *opaque)
@@ -499,9 +507,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     RAMBlock *block;

     bytes_transferred = 0;
-    last_block = NULL;
-    last_offset = 0;
-    sort_ram_list();
+    reset_ram_globals();

     if (migrate_use_xbzrle()) {
         XBZRLE.cache = cache_init(migrate_xbzrle_cache_size() /
-- 
1.7.10.4


* [Qemu-devel] [PATCH 04/27] add a version number to ram_list
  2012-07-24 18:36 [Qemu-devel] [RFC 00/27] Migration thread (WIP) Juan Quintela
                   ` (2 preceding siblings ...)
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 03/27] savevm: Factorize ram globals reset in its own function Juan Quintela
@ 2012-07-24 18:36 ` Juan Quintela
  2012-07-25 23:27   ` Michael Roth
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 05/27] protect the ramlist with a separate mutex Juan Quintela
                   ` (26 subsequent siblings)
  30 siblings, 1 reply; 45+ messages in thread
From: Juan Quintela @ 2012-07-24 18:36 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Umesh Deshpande

From: Umesh Deshpande <udeshpan@redhat.com>

This will be used to detect if last_block might have become invalid
across different calls to ram_save_live.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Umesh Deshpande <udeshpan@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 arch_init.c |    6 ++++++
 cpu-all.h   |    1 +
 exec.c      |    4 ++++
 3 files changed, 11 insertions(+)

diff --git a/arch_init.c b/arch_init.c
index 02d36ce..9e05aae 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -340,6 +340,7 @@ static int save_xbzrle_page(QEMUFile *f, uint8_t *current_data,

 static RAMBlock *last_block;
 static ram_addr_t last_offset;
+static uint32_t last_version;

 /*
  * ram_save_block: Writes a page of memory to the stream f
@@ -496,6 +497,7 @@ static void reset_ram_globals(void)
 {
     last_block = NULL;
     last_offset = 0;
+    last_version = ram_list.version;
     sort_ram_list();
 }

@@ -554,6 +556,10 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
     int i;
     uint64_t expected_time;

+    if (ram_list.version != last_version) {
+        reset_ram_globals();
+    }
+
     bytes_transferred_last = bytes_transferred;
     bwidth = qemu_get_clock_ns(rt_clock);

diff --git a/cpu-all.h b/cpu-all.h
index ca3bb24..429b2c6 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -486,6 +486,7 @@ typedef struct RAMBlock {

 typedef struct RAMList {
     uint8_t *phys_dirty;
+    uint32_t version;
     QLIST_HEAD(, RAMBlock) blocks_mru;
     QLIST_HEAD(, RAMBlock) blocks;
     uint64_t dirty_pages;
diff --git a/exec.c b/exec.c
index afc472f..ceffa4f 100644
--- a/exec.c
+++ b/exec.c
@@ -2555,6 +2555,8 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
     QLIST_INSERT_HEAD(&ram_list.blocks, new_block, next);
     QLIST_INSERT_HEAD(&ram_list.blocks_mru, new_block, next_mru);

+    ram_list.version++;
+
     ram_list.phys_dirty = g_realloc(ram_list.phys_dirty,
                                        last_ram_offset() >> TARGET_PAGE_BITS);
     cpu_physical_memory_set_dirty_range(new_block->offset, size, 0xff);
@@ -2578,6 +2580,7 @@ void qemu_ram_free_from_ptr(ram_addr_t addr)
         if (addr == block->offset) {
             QLIST_REMOVE(block, next);
             QLIST_REMOVE(block, next_mru);
+            ram_list.version++;
             g_free(block);
             return;
         }
@@ -2592,6 +2595,7 @@ void qemu_ram_free(ram_addr_t addr)
         if (addr == block->offset) {
             QLIST_REMOVE(block, next);
             QLIST_REMOVE(block, next_mru);
+            ram_list.version++;
             if (block->flags & RAM_PREALLOC_MASK) {
                 ;
             } else if (mem_path) {
-- 
1.7.10.4


* [Qemu-devel] [PATCH 05/27] protect the ramlist with a separate mutex
  2012-07-24 18:36 [Qemu-devel] [RFC 00/27] Migration thread (WIP) Juan Quintela
                   ` (3 preceding siblings ...)
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 04/27] add a version number to ram_list Juan Quintela
@ 2012-07-24 18:36 ` Juan Quintela
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 06/27] ram: introduce migration_bitmap_set_dirty() Juan Quintela
                   ` (25 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Juan Quintela @ 2012-07-24 18:36 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Umesh Deshpande

From: Umesh Deshpande <udeshpan@redhat.com>

Add the new mutex that protects shared state between ram_save_live
and the iothread.  If the iothread mutex has to be taken together
with the ramlist mutex, the iothread shall always be _outside_.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Umesh Deshpande <udeshpan@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 arch_init.c |   10 +++++++++-
 cpu-all.h   |    9 +++++++++
 exec.c      |   23 +++++++++++++++++++++--
 3 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 9e05aae..5a97710 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -492,7 +492,6 @@ static void ram_migration_cancel(void *opaque)
     migration_end();
 }

-
 static void reset_ram_globals(void)
 {
     last_block = NULL;
@@ -508,6 +507,8 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     ram_addr_t addr;
     RAMBlock *block;

+    qemu_mutex_lock_ramlist();
+
     bytes_transferred = 0;
     reset_ram_globals();

@@ -543,6 +544,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
         qemu_put_be64(f, block->length);
     }

+    qemu_mutex_unlock_ramlist();
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);

     return 0;
@@ -556,6 +558,8 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
     int i;
     uint64_t expected_time;

+    qemu_mutex_lock_ramlist();
+
     if (ram_list.version != last_version) {
         reset_ram_globals();
     }
@@ -603,6 +607,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
         bwidth = 0.000001;
     }

+    qemu_mutex_unlock_ramlist();
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);

     expected_time = ram_save_remaining() * TARGET_PAGE_SIZE / bwidth;
@@ -623,6 +628,8 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 {
     memory_global_sync_dirty_bitmap(get_system_memory());

+    qemu_mutex_lock_ramlist();
+
     /* try transferring iterative blocks of memory */

     /* flush all remaining blocks regardless of rate limiting */
@@ -638,6 +645,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
     }
     memory_global_dirty_log_stop();

+    qemu_mutex_unlock_ramlist();
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);

     return 0;
diff --git a/cpu-all.h b/cpu-all.h
index 429b2c6..0ff452f 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -22,6 +22,7 @@
 #include "qemu-common.h"
 #include "qemu-tls.h"
 #include "cpu-common.h"
+#include "qemu-thread.h"

 /* some important defines:
  *
@@ -476,7 +477,9 @@ typedef struct RAMBlock {
     ram_addr_t offset;
     ram_addr_t length;
     uint32_t flags;
+    /* Protected by the iothread lock.  */
     QLIST_ENTRY(RAMBlock) next_mru;
+    /* Protected by the ramlist lock.  */
     QLIST_ENTRY(RAMBlock) next;
     char idstr[256];
 #if defined(__linux__) && !defined(TARGET_S390X)
@@ -485,9 +488,12 @@ typedef struct RAMBlock {
 } RAMBlock;

 typedef struct RAMList {
+    QemuMutex mutex;
+    /* Protected by the iothread lock.  */
     uint8_t *phys_dirty;
     uint32_t version;
     QLIST_HEAD(, RAMBlock) blocks_mru;
+    /* Protected by the ramlist lock.  */
     QLIST_HEAD(, RAMBlock) blocks;
     uint64_t dirty_pages;
 } RAMList;
@@ -509,6 +515,9 @@ extern int mem_prealloc;
 void dump_exec_info(FILE *f, fprintf_function cpu_fprintf);
 #endif /* !CONFIG_USER_ONLY */

+void qemu_mutex_lock_ramlist(void);
+void qemu_mutex_unlock_ramlist(void);
+
 int cpu_memory_rw_debug(CPUArchState *env, target_ulong addr,
                         uint8_t *buf, int len, int is_write);

diff --git a/exec.c b/exec.c
index ceffa4f..53da253 100644
--- a/exec.c
+++ b/exec.c
@@ -637,6 +637,7 @@ bool tcg_enabled(void)

 void cpu_exec_init_all(void)
 {
+    qemu_mutex_init(&ram_list.mutex);
 #if !defined(CONFIG_USER_ONLY)
     memory_map_init();
     io_mem_init();
@@ -2367,6 +2368,16 @@ static long gethugepagesize(const char *path)
     return fs.f_bsize;
 }

+void qemu_mutex_lock_ramlist(void)
+{
+    qemu_mutex_lock(&ram_list.mutex);
+}
+
+void qemu_mutex_unlock_ramlist(void)
+{
+    qemu_mutex_unlock(&ram_list.mutex);
+}
+
 static void *file_ram_alloc(RAMBlock *block,
                             ram_addr_t memory,
                             const char *path)
@@ -2504,6 +2515,7 @@ void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev)
     }
     pstrcat(new_block->idstr, sizeof(new_block->idstr), name);

+    qemu_mutex_lock_ramlist();
     QLIST_FOREACH(block, &ram_list.blocks, next) {
         if (block != new_block && !strcmp(block->idstr, new_block->idstr)) {
             fprintf(stderr, "RAMBlock \"%s\" already registered, abort!\n",
@@ -2511,6 +2523,7 @@ void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev)
             abort();
         }
     }
+    qemu_mutex_unlock_ramlist();
 }

 ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
@@ -2521,6 +2534,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
     size = TARGET_PAGE_ALIGN(size);
     new_block = g_malloc0(sizeof(*new_block));

+    qemu_mutex_lock_ramlist();
     new_block->mr = mr;
     new_block->offset = find_ram_offset(size);
     if (host) {
@@ -2556,6 +2570,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
     QLIST_INSERT_HEAD(&ram_list.blocks_mru, new_block, next_mru);

     ram_list.version++;
+    qemu_mutex_unlock_ramlist();

     ram_list.phys_dirty = g_realloc(ram_list.phys_dirty,
                                        last_ram_offset() >> TARGET_PAGE_BITS);
@@ -2576,21 +2591,24 @@ void qemu_ram_free_from_ptr(ram_addr_t addr)
 {
     RAMBlock *block;

+    qemu_mutex_lock_ramlist();
     QLIST_FOREACH(block, &ram_list.blocks, next) {
         if (addr == block->offset) {
             QLIST_REMOVE(block, next);
             QLIST_REMOVE(block, next_mru);
             ram_list.version++;
             g_free(block);
-            return;
+            break;
         }
     }
+    qemu_mutex_unlock_ramlist();
 }

 void qemu_ram_free(ram_addr_t addr)
 {
     RAMBlock *block;

+    qemu_mutex_lock_ramlist();
     QLIST_FOREACH(block, &ram_list.blocks, next) {
         if (addr == block->offset) {
             QLIST_REMOVE(block, next);
@@ -2621,9 +2639,10 @@ void qemu_ram_free(ram_addr_t addr)
 #endif
             }
             g_free(block);
-            return;
+            break;
         }
     }
+    qemu_mutex_unlock_ramlist();

 }

-- 
1.7.10.4
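The stated lock-ordering rule, that when both locks are needed the iothread lock is always the outer one, can be sketched like this (hypothetical helper; pthreads stands in for QEMU's mutex wrappers):

```c
#include <pthread.h>

static pthread_mutex_t iothread_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t ramlist_mutex  = PTHREAD_MUTEX_INITIALIZER;

/* Example operation to run under both locks. */
static int sample_op(void)
{
    return 42;
}

/* The only safe nesting: iothread lock outside, ramlist lock inside.
 * Taking them in the opposite order in another thread would deadlock. */
int with_both_locks(int (*fn)(void))
{
    int ret;

    pthread_mutex_lock(&iothread_mutex);   /* outer lock first ... */
    pthread_mutex_lock(&ramlist_mutex);    /* ... ramlist nested inside */
    ret = fn();
    pthread_mutex_unlock(&ramlist_mutex);
    pthread_mutex_unlock(&iothread_mutex);
    return ret;
}
```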


* [Qemu-devel] [PATCH 06/27] ram: introduce migration_bitmap_set_dirty()
  2012-07-24 18:36 [Qemu-devel] [RFC 00/27] Migration thread (WIP) Juan Quintela
                   ` (4 preceding siblings ...)
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 05/27] protect the ramlist with a separate mutex Juan Quintela
@ 2012-07-24 18:36 ` Juan Quintela
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 07/27] ram: Introduce migration_bitmap_test_and_reset_dirty() Juan Quintela
                   ` (24 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Juan Quintela @ 2012-07-24 18:36 UTC (permalink / raw)
  To: qemu-devel

It just marks a region of memory as dirty.

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 arch_init.c |   20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 5a97710..b47bf05 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -342,6 +342,18 @@ static RAMBlock *last_block;
 static ram_addr_t last_offset;
 static uint32_t last_version;

+static inline void migration_bitmap_set_dirty(MemoryRegion *mr, int length)
+{
+    ram_addr_t addr;
+
+    for (addr = 0; addr < length; addr += TARGET_PAGE_SIZE) {
+        if (!memory_region_get_dirty(mr, addr, TARGET_PAGE_SIZE,
+                                     DIRTY_MEMORY_MIGRATION)) {
+            memory_region_set_dirty(mr, addr, TARGET_PAGE_SIZE);
+        }
+    }
+}
+
 /*
  * ram_save_block: Writes a page of memory to the stream f
  *
@@ -504,7 +516,6 @@ static void reset_ram_globals(void)

 static int ram_save_setup(QEMUFile *f, void *opaque)
 {
-    ram_addr_t addr;
     RAMBlock *block;

     qemu_mutex_lock_ramlist();
@@ -526,12 +537,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     }

     QLIST_FOREACH(block, &ram_list.blocks, next) {
-        for (addr = 0; addr < block->length; addr += TARGET_PAGE_SIZE) {
-            if (!memory_region_get_dirty(block->mr, addr, TARGET_PAGE_SIZE,
-                                         DIRTY_MEMORY_MIGRATION)) {
-                memory_region_set_dirty(block->mr, addr, TARGET_PAGE_SIZE);
-            }
-        }
+        migration_bitmap_set_dirty(block->mr, block->length);
     }

     memory_global_dirty_log_start();
-- 
1.7.10.4


* [Qemu-devel] [PATCH 07/27] ram: Introduce migration_bitmap_test_and_reset_dirty()
  2012-07-24 18:36 [Qemu-devel] [RFC 00/27] Migration thread (WIP) Juan Quintela
                   ` (5 preceding siblings ...)
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 06/27] ram: introduce migration_bitmap_set_dirty() Juan Quintela
@ 2012-07-24 18:36 ` Juan Quintela
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 08/27] ram: Export last_ram_offset() Juan Quintela
                   ` (23 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Juan Quintela @ 2012-07-24 18:36 UTC (permalink / raw)
  To: qemu-devel

It just tests whether the dirty bit is set, and clears it.

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 arch_init.c |   19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index b47bf05..c49c321 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -342,6 +342,19 @@ static RAMBlock *last_block;
 static ram_addr_t last_offset;
 static uint32_t last_version;

+static inline bool migration_bitmap_test_and_reset_dirty(MemoryRegion *mr,
+                                                         ram_addr_t offset)
+{
+    bool ret = memory_region_get_dirty(mr, offset, TARGET_PAGE_SIZE,
+                                       DIRTY_MEMORY_MIGRATION);
+
+    if (ret) {
+        memory_region_reset_dirty(mr, offset, TARGET_PAGE_SIZE,
+                                  DIRTY_MEMORY_MIGRATION);
+    }
+    return ret;
+}
+
 static inline void migration_bitmap_set_dirty(MemoryRegion *mr, int length)
 {
     ram_addr_t addr;
@@ -375,14 +388,10 @@ static int ram_save_block(QEMUFile *f, bool last_stage)

     do {
         mr = block->mr;
-        if (memory_region_get_dirty(mr, offset, TARGET_PAGE_SIZE,
-                                    DIRTY_MEMORY_MIGRATION)) {
+        if (migration_bitmap_test_and_reset_dirty(mr, offset)) {
             uint8_t *p;
             int cont = (block == last_block) ? RAM_SAVE_FLAG_CONTINUE : 0;

-            memory_region_reset_dirty(mr, offset, TARGET_PAGE_SIZE,
-                                      DIRTY_MEMORY_MIGRATION);
-
             p = memory_region_get_ram_ptr(mr) + offset;

             if (is_dup_page(p)) {
-- 
1.7.10.4
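A free-standing equivalent of the test-and-reset pairing, over a plain word bitmap rather than MemoryRegion dirty state (a sketch, not the QEMU implementation): the helper couples the test with the clear so callers cannot forget one half.

```c
#include <stdbool.h>

/* Test bit nr, clear it, and report whether it was set. */
static inline bool test_and_reset_bit(unsigned long *bitmap, unsigned nr)
{
    unsigned long mask = 1UL << (nr % (8 * sizeof(unsigned long)));
    unsigned long *word = &bitmap[nr / (8 * sizeof(unsigned long))];
    bool was_set = (*word & mask) != 0;

    *word &= ~mask;               /* clear unconditionally */
    return was_set;
}
```

A later patch in the series replaces the MemoryRegion calls with exactly this kind of bitmap operation on the separate migration bitmap.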


* [Qemu-devel] [PATCH 08/27] ram: Export last_ram_offset()
  2012-07-24 18:36 [Qemu-devel] [RFC 00/27] Migration thread (WIP) Juan Quintela
                   ` (6 preceding siblings ...)
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 07/27] ram: Introduce migration_bitmap_test_and_reset_dirty() Juan Quintela
@ 2012-07-24 18:36 ` Juan Quintela
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 09/27] ram: introduce migration_bitmap_sync() Juan Quintela
                   ` (22 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Juan Quintela @ 2012-07-24 18:36 UTC (permalink / raw)
  To: qemu-devel

It is the only way of knowing the RAM size.

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 cpu-all.h |    2 ++
 exec.c    |    2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/cpu-all.h b/cpu-all.h
index 0ff452f..45290b7 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -518,6 +518,8 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf);
 void qemu_mutex_lock_ramlist(void);
 void qemu_mutex_unlock_ramlist(void);

+ram_addr_t last_ram_offset(void);
+
 int cpu_memory_rw_debug(CPUArchState *env, target_ulong addr,
                         uint8_t *buf, int len, int is_write);

diff --git a/exec.c b/exec.c
index 53da253..1a98621 100644
--- a/exec.c
+++ b/exec.c
@@ -2481,7 +2481,7 @@ static ram_addr_t find_ram_offset(ram_addr_t size)
     return offset;
 }

-static ram_addr_t last_ram_offset(void)
+ram_addr_t last_ram_offset(void)
 {
     RAMBlock *block;
     ram_addr_t last = 0;
-- 
1.7.10.4


* [Qemu-devel] [PATCH 09/27] ram: introduce migration_bitmap_sync()
  2012-07-24 18:36 [Qemu-devel] [RFC 00/27] Migration thread (WIP) Juan Quintela
                   ` (7 preceding siblings ...)
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 08/27] ram: Export last_ram_offset() Juan Quintela
@ 2012-07-24 18:36 ` Juan Quintela
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 10/27] Separate migration bitmap Juan Quintela
                   ` (21 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Juan Quintela @ 2012-07-24 18:36 UTC (permalink / raw)
  To: qemu-devel

Helper that we use each time we need to synchronize the migration
bitmap with the other dirty bitmaps.

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 arch_init.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index c49c321..d21b2a3 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -367,6 +367,12 @@ static inline void migration_bitmap_set_dirty(MemoryRegion *mr, int length)
     }
 }

+static void migration_bitmap_sync(void)
+{
+    memory_global_sync_dirty_bitmap(get_system_memory());
+}
+
+
 /*
  * ram_save_block: Writes a page of memory to the stream f
  *
@@ -631,7 +637,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
             expected_time, migrate_max_downtime());

     if (expected_time <= migrate_max_downtime()) {
-        memory_global_sync_dirty_bitmap(get_system_memory());
+        migration_bitmap_sync();
         expected_time = ram_save_remaining() * TARGET_PAGE_SIZE / bwidth;

         return expected_time <= migrate_max_downtime();
@@ -641,7 +647,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)

 static int ram_save_complete(QEMUFile *f, void *opaque)
 {
-    memory_global_sync_dirty_bitmap(get_system_memory());
+    migration_bitmap_sync();

     qemu_mutex_lock_ramlist();

-- 
1.7.10.4


* [Qemu-devel] [PATCH 10/27] Separate migration bitmap
  2012-07-24 18:36 [Qemu-devel] [RFC 00/27] Migration thread (WIP) Juan Quintela
                   ` (8 preceding siblings ...)
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 09/27] ram: introduce migration_bitmap_sync() Juan Quintela
@ 2012-07-24 18:36 ` Juan Quintela
  2012-07-25  9:16   ` Avi Kivity
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 11/27] BufferedFile: append, then flush Juan Quintela
                   ` (20 subsequent siblings)
  30 siblings, 1 reply; 45+ messages in thread
From: Juan Quintela @ 2012-07-24 18:36 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Umesh Deshpande

This patch creates a migration bitmap, which is periodically kept in
sync with the qemu dirty bitmap. Keeping a separate copy of the dirty
bitmap for migration limits concurrent access to the qemu bitmap from
the iothread and the migration thread (access which otherwise requires
taking the big lock).

We use the qemu bitmap type.  We have to "undo" the dirty_pages
counting optimization on the general dirty bitmap and redo that
counting optimization on the migration-local bitmap.

Signed-off-by: Umesh Deshpande <udeshpan@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 arch_init.c     |   57 +++++++++++++++++++++++++++++++++++++++----------------
 cpu-all.h       |    1 -
 exec-obsolete.h |   10 ----------
 3 files changed, 41 insertions(+), 27 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index d21b2a3..72345c3 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -31,6 +31,8 @@
 #include "config.h"
 #include "monitor.h"
 #include "sysemu.h"
+#include "bitops.h"
+#include "bitmap.h"
 #include "arch_init.h"
 #include "audio/audio.h"
 #include "hw/pc.h"
@@ -341,35 +343,54 @@ static int save_xbzrle_page(QEMUFile *f, uint8_t *current_data,
 static RAMBlock *last_block;
 static ram_addr_t last_offset;
 static uint32_t last_version;
+static unsigned long *migration_bitmap;
+static uint64_t migration_dirty_pages;

 static inline bool migration_bitmap_test_and_reset_dirty(MemoryRegion *mr,
                                                          ram_addr_t offset)
 {
-    bool ret = memory_region_get_dirty(mr, offset, TARGET_PAGE_SIZE,
-                                       DIRTY_MEMORY_MIGRATION);
+    bool ret;
+    int nr = (mr->ram_addr + offset) >> TARGET_PAGE_BITS;
+
+    ret = test_and_clear_bit(nr, migration_bitmap);

     if (ret) {
-        memory_region_reset_dirty(mr, offset, TARGET_PAGE_SIZE,
-                                  DIRTY_MEMORY_MIGRATION);
+        migration_dirty_pages--;
     }
     return ret;
 }

-static inline void migration_bitmap_set_dirty(MemoryRegion *mr, int length)
+static inline bool migration_bitmap_set_dirty(MemoryRegion *mr,
+                                              ram_addr_t offset)
 {
-    ram_addr_t addr;
+    bool ret;
+    int nr = (mr->ram_addr + offset) >> TARGET_PAGE_BITS;

-    for (addr = 0; addr < length; addr += TARGET_PAGE_SIZE) {
-        if (!memory_region_get_dirty(mr, addr, TARGET_PAGE_SIZE,
-                                     DIRTY_MEMORY_MIGRATION)) {
-            memory_region_set_dirty(mr, addr, TARGET_PAGE_SIZE);
-        }
+    ret = test_and_set_bit(nr, migration_bitmap);
+
+    if (!ret) {
+        migration_dirty_pages++;
     }
+    return ret;
 }

 static void migration_bitmap_sync(void)
 {
+    RAMBlock *block;
+    ram_addr_t addr;
+
     memory_global_sync_dirty_bitmap(get_system_memory());
+
+    QLIST_FOREACH(block, &ram_list.blocks, next) {
+        for (addr = 0; addr < block->length; addr += TARGET_PAGE_SIZE) {
+            if (memory_region_get_dirty(block->mr, addr, TARGET_PAGE_SIZE,
+                                        DIRTY_MEMORY_MIGRATION)) {
+                migration_bitmap_set_dirty(block->mr, addr);
+            }
+        }
+        memory_region_reset_dirty(block->mr, 0, block->length,
+                                  DIRTY_MEMORY_MIGRATION);
+    }
 }


@@ -447,7 +468,7 @@ static uint64_t bytes_transferred;

 static ram_addr_t ram_save_remaining(void)
 {
-    return ram_list.dirty_pages;
+    return migration_dirty_pages;
 }

 uint64_t ram_bytes_remaining(void)
@@ -532,6 +553,11 @@ static void reset_ram_globals(void)
 static int ram_save_setup(QEMUFile *f, void *opaque)
 {
     RAMBlock *block;
+    int64_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
+
+    migration_bitmap = bitmap_new(ram_pages);
+    bitmap_set(migration_bitmap, 0, ram_pages);
+    migration_dirty_pages = ram_pages;

     qemu_mutex_lock_ramlist();

@@ -551,10 +577,6 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
         acct_clear();
     }

-    QLIST_FOREACH(block, &ram_list.blocks, next) {
-        migration_bitmap_set_dirty(block->mr, block->length);
-    }
-
     memory_global_dirty_log_start();

     qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
@@ -669,6 +691,9 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
     qemu_mutex_unlock_ramlist();
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);

+    g_free(migration_bitmap);
+    migration_bitmap = NULL;
+
     return 0;
 }

diff --git a/cpu-all.h b/cpu-all.h
index 45290b7..88ad1ba 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -495,7 +495,6 @@ typedef struct RAMList {
     QLIST_HEAD(, RAMBlock) blocks_mru;
     /* Protected by the ramlist lock.  */
     QLIST_HEAD(, RAMBlock) blocks;
-    uint64_t dirty_pages;
 } RAMList;
 extern RAMList ram_list;

diff --git a/exec-obsolete.h b/exec-obsolete.h
index c099256..f8ffce6 100644
--- a/exec-obsolete.h
+++ b/exec-obsolete.h
@@ -74,11 +74,6 @@ static inline int cpu_physical_memory_get_dirty(ram_addr_t start,
 static inline int cpu_physical_memory_set_dirty_flags(ram_addr_t addr,
                                                       int dirty_flags)
 {
-    if ((dirty_flags & MIGRATION_DIRTY_FLAG) &&
-        !cpu_physical_memory_get_dirty(addr, TARGET_PAGE_SIZE,
-                                       MIGRATION_DIRTY_FLAG)) {
-        ram_list.dirty_pages++;
-    }
     return ram_list.phys_dirty[addr >> TARGET_PAGE_BITS] |= dirty_flags;
 }

@@ -92,11 +87,6 @@ static inline int cpu_physical_memory_clear_dirty_flags(ram_addr_t addr,
 {
     int mask = ~dirty_flags;

-    if ((dirty_flags & MIGRATION_DIRTY_FLAG) &&
-        cpu_physical_memory_get_dirty(addr, TARGET_PAGE_SIZE,
-                                      MIGRATION_DIRTY_FLAG)) {
-        ram_list.dirty_pages--;
-    }
     return ram_list.phys_dirty[addr >> TARGET_PAGE_BITS] &= mask;
 }

-- 
1.7.10.4


* [Qemu-devel] [PATCH 11/27] BufferedFile: append, then flush
  2012-07-24 18:36 [Qemu-devel] [RFC 00/27] Migration thread (WIP) Juan Quintela
                   ` (9 preceding siblings ...)
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 10/27] Separate migration bitmap Juan Quintela
@ 2012-07-24 18:36 ` Juan Quintela
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 12/27] buffered_file: rename opaque to migration_state Juan Quintela
                   ` (19 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Juan Quintela @ 2012-07-24 18:36 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini

From: Paolo Bonzini <pbonzini@redhat.com>

Simplify the logic for pushing data from the buffer to the output
pipe/socket.  This also more closely matches how the migration
thread will operate.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 buffered_file.c |   50 +++++++++++---------------------------------------
 1 file changed, 11 insertions(+), 39 deletions(-)

diff --git a/buffered_file.c b/buffered_file.c
index 4148abb..7155800 100644
--- a/buffered_file.c
+++ b/buffered_file.c
@@ -75,7 +75,7 @@ static void buffered_flush(QEMUFileBuffered *s)

     DPRINTF("flushing %zu byte(s) of data\n", s->buffer_size);

-    while (offset < s->buffer_size) {
+    while (s->bytes_xfer < s->xfer_limit && offset < s->buffer_size) {
         ssize_t ret;

         ret = s->put_buffer(s->opaque, s->buffer + offset,
@@ -93,6 +93,7 @@ static void buffered_flush(QEMUFileBuffered *s)
         } else {
             DPRINTF("flushed %zd byte(s)\n", ret);
             offset += ret;
+            s->bytes_xfer += ret;
         }
     }

@@ -104,8 +105,7 @@ static void buffered_flush(QEMUFileBuffered *s)
 static int buffered_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, int size)
 {
     QEMUFileBuffered *s = opaque;
-    int offset = 0, error;
-    ssize_t ret;
+    int error;

     DPRINTF("putting %d bytes at %" PRId64 "\n", size, pos);

@@ -118,48 +118,22 @@ static int buffered_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, in
     DPRINTF("unfreezing output\n");
     s->freeze_output = 0;

-    buffered_flush(s);
-
-    while (!s->freeze_output && offset < size) {
-        if (s->bytes_xfer > s->xfer_limit) {
-            DPRINTF("transfer limit exceeded when putting\n");
-            break;
-        }
-
-        ret = s->put_buffer(s->opaque, buf + offset, size - offset);
-        if (ret == -EAGAIN) {
-            DPRINTF("backend not ready, freezing\n");
-            s->freeze_output = 1;
-            break;
-        }
-
-        if (ret <= 0) {
-            DPRINTF("error putting\n");
-            qemu_file_set_error(s->file, ret);
-            offset = -EINVAL;
-            break;
-        }
-
-        DPRINTF("put %zd byte(s)\n", ret);
-        offset += ret;
-        s->bytes_xfer += ret;
-    }
-
-    if (offset >= 0) {
+    if (size > 0) {
         DPRINTF("buffering %d bytes\n", size - offset);
-        buffered_append(s, buf + offset, size - offset);
-        offset = size;
+        buffered_append(s, buf, size);
     }

+    buffered_flush(s);
+
     if (pos == 0 && size == 0) {
         DPRINTF("file is ready\n");
-        if (s->bytes_xfer <= s->xfer_limit) {
+        if (!s->freeze_output && s->bytes_xfer < s->xfer_limit) {
             DPRINTF("notifying client\n");
             s->put_ready(s->opaque);
         }
     }

-    return offset;
+    return size;
 }

 static int buffered_close(void *opaque)
@@ -169,6 +143,7 @@ static int buffered_close(void *opaque)

     DPRINTF("closing\n");

+    s->xfer_limit = INT_MAX;
     while (!qemu_file_get_error(s->file) && s->buffer_size) {
         buffered_flush(s);
         if (s->freeze_output)
@@ -248,10 +223,7 @@ static void buffered_rate_tick(void *opaque)

     s->bytes_xfer = 0;

-    buffered_flush(s);
-
-    /* Add some checks around this */
-    s->put_ready(s->opaque);
+    buffered_put_buffer(s, NULL, 0, 0);
 }

 QEMUFile *qemu_fopen_ops_buffered(void *opaque,
-- 
1.7.10.4


* [Qemu-devel] [PATCH 12/27] buffered_file: rename opaque to migration_state
  2012-07-24 18:36 [Qemu-devel] [RFC 00/27] Migration thread (WIP) Juan Quintela
                   ` (10 preceding siblings ...)
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 11/27] BufferedFile: append, then flush Juan Quintela
@ 2012-07-24 18:36 ` Juan Quintela
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 13/27] buffered_file: opaque is MigrationState Juan Quintela
                   ` (18 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Juan Quintela @ 2012-07-24 18:36 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 buffered_file.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/buffered_file.c b/buffered_file.c
index 7155800..33b700b 100644
--- a/buffered_file.c
+++ b/buffered_file.c
@@ -27,7 +27,7 @@ typedef struct QEMUFileBuffered
     BufferedPutReadyFunc *put_ready;
     BufferedWaitForUnfreezeFunc *wait_for_unfreeze;
     BufferedCloseFunc *close;
-    void *opaque;
+    void *migration_state;
     QEMUFile *file;
     int freeze_output;
     size_t bytes_xfer;
@@ -78,7 +78,7 @@ static void buffered_flush(QEMUFileBuffered *s)
     while (s->bytes_xfer < s->xfer_limit && offset < s->buffer_size) {
         ssize_t ret;

-        ret = s->put_buffer(s->opaque, s->buffer + offset,
+        ret = s->put_buffer(s->migration_state, s->buffer + offset,
                             s->buffer_size - offset);
         if (ret == -EAGAIN) {
             DPRINTF("backend not ready, freezing\n");
@@ -129,7 +129,7 @@ static int buffered_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, in
         DPRINTF("file is ready\n");
         if (!s->freeze_output && s->bytes_xfer < s->xfer_limit) {
             DPRINTF("notifying client\n");
-            s->put_ready(s->opaque);
+            s->put_ready(s->migration_state);
         }
     }

@@ -147,10 +147,10 @@ static int buffered_close(void *opaque)
     while (!qemu_file_get_error(s->file) && s->buffer_size) {
         buffered_flush(s);
         if (s->freeze_output)
-            s->wait_for_unfreeze(s->opaque);
+            s->wait_for_unfreeze(s->migration_state);
     }

-    ret = s->close(s->opaque);
+    ret = s->close(s->migration_state);

     qemu_del_timer(s->timer);
     qemu_free_timer(s->timer);
@@ -237,7 +237,7 @@ QEMUFile *qemu_fopen_ops_buffered(void *opaque,

     s = g_malloc0(sizeof(*s));

-    s->opaque = opaque;
+    s->migration_state = opaque;
     s->xfer_limit = bytes_per_sec / 10;
     s->put_buffer = put_buffer;
     s->put_ready = put_ready;
-- 
1.7.10.4


* [Qemu-devel] [PATCH 13/27] buffered_file: opaque is MigrationState
  2012-07-24 18:36 [Qemu-devel] [RFC 00/27] Migration thread (WIP) Juan Quintela
                   ` (11 preceding siblings ...)
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 12/27] buffered_file: rename opaque to migration_state Juan Quintela
@ 2012-07-24 18:36 ` Juan Quintela
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 14/27] buffered_file: unfold migrate_fd_put_buffer Juan Quintela
                   ` (17 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Juan Quintela @ 2012-07-24 18:36 UTC (permalink / raw)
  To: qemu-devel

It always has that type; just change the declaration to say so.

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 buffered_file.c |    6 +++---
 buffered_file.h |    4 +++-
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/buffered_file.c b/buffered_file.c
index 33b700b..59d952d 100644
--- a/buffered_file.c
+++ b/buffered_file.c
@@ -27,7 +27,7 @@ typedef struct QEMUFileBuffered
     BufferedPutReadyFunc *put_ready;
     BufferedWaitForUnfreezeFunc *wait_for_unfreeze;
     BufferedCloseFunc *close;
-    void *migration_state;
+    MigrationState *migration_state;
     QEMUFile *file;
     int freeze_output;
     size_t bytes_xfer;
@@ -226,7 +226,7 @@ static void buffered_rate_tick(void *opaque)
     buffered_put_buffer(s, NULL, 0, 0);
 }

-QEMUFile *qemu_fopen_ops_buffered(void *opaque,
+QEMUFile *qemu_fopen_ops_buffered(MigrationState *migration_state,
                                   size_t bytes_per_sec,
                                   BufferedPutFunc *put_buffer,
                                   BufferedPutReadyFunc *put_ready,
@@ -237,7 +237,7 @@ QEMUFile *qemu_fopen_ops_buffered(void *opaque,

     s = g_malloc0(sizeof(*s));

-    s->migration_state = opaque;
+    s->migration_state = migration_state;
     s->xfer_limit = bytes_per_sec / 10;
     s->put_buffer = put_buffer;
     s->put_ready = put_ready;
diff --git a/buffered_file.h b/buffered_file.h
index 98d358b..39f7fa0 100644
--- a/buffered_file.h
+++ b/buffered_file.h
@@ -15,13 +15,15 @@
 #define QEMU_BUFFERED_FILE_H

 #include "hw/hw.h"
+#include "migration.h"

 typedef ssize_t (BufferedPutFunc)(void *opaque, const void *data, size_t size);
 typedef void (BufferedPutReadyFunc)(void *opaque);
 typedef void (BufferedWaitForUnfreezeFunc)(void *opaque);
 typedef int (BufferedCloseFunc)(void *opaque);

-QEMUFile *qemu_fopen_ops_buffered(void *opaque, size_t xfer_limit,
+QEMUFile *qemu_fopen_ops_buffered(MigrationState *migration_state,
+                                  size_t xfer_limit,
                                   BufferedPutFunc *put_buffer,
                                   BufferedPutReadyFunc *put_ready,
                                   BufferedWaitForUnfreezeFunc *wait_for_unfreeze,
-- 
1.7.10.4


* [Qemu-devel] [PATCH 14/27] buffered_file: unfold migrate_fd_put_buffer
  2012-07-24 18:36 [Qemu-devel] [RFC 00/27] Migration thread (WIP) Juan Quintela
                   ` (12 preceding siblings ...)
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 13/27] buffered_file: opaque is MigrationState Juan Quintela
@ 2012-07-24 18:36 ` Juan Quintela
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 15/27] buffered_file: unfold migrate_fd_put_ready Juan Quintela
                   ` (16 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Juan Quintela @ 2012-07-24 18:36 UTC (permalink / raw)
  To: qemu-devel

We only use it once; just remove the callback indirection.

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 buffered_file.c |    7 ++-----
 buffered_file.h |    2 --
 migration.c     |    6 ++----
 migration.h     |    3 +++
 4 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/buffered_file.c b/buffered_file.c
index 59d952d..702a726 100644
--- a/buffered_file.c
+++ b/buffered_file.c
@@ -23,7 +23,6 @@

 typedef struct QEMUFileBuffered
 {
-    BufferedPutFunc *put_buffer;
     BufferedPutReadyFunc *put_ready;
     BufferedWaitForUnfreezeFunc *wait_for_unfreeze;
     BufferedCloseFunc *close;
@@ -78,8 +77,8 @@ static void buffered_flush(QEMUFileBuffered *s)
     while (s->bytes_xfer < s->xfer_limit && offset < s->buffer_size) {
         ssize_t ret;

-        ret = s->put_buffer(s->migration_state, s->buffer + offset,
-                            s->buffer_size - offset);
+        ret = migrate_fd_put_buffer(s->migration_state, s->buffer + offset,
+                                    s->buffer_size - offset);
         if (ret == -EAGAIN) {
             DPRINTF("backend not ready, freezing\n");
             s->freeze_output = 1;
@@ -228,7 +227,6 @@ static void buffered_rate_tick(void *opaque)

 QEMUFile *qemu_fopen_ops_buffered(MigrationState *migration_state,
                                   size_t bytes_per_sec,
-                                  BufferedPutFunc *put_buffer,
                                   BufferedPutReadyFunc *put_ready,
                                   BufferedWaitForUnfreezeFunc *wait_for_unfreeze,
                                   BufferedCloseFunc *close)
@@ -239,7 +237,6 @@ QEMUFile *qemu_fopen_ops_buffered(MigrationState *migration_state,

     s->migration_state = migration_state;
     s->xfer_limit = bytes_per_sec / 10;
-    s->put_buffer = put_buffer;
     s->put_ready = put_ready;
     s->wait_for_unfreeze = wait_for_unfreeze;
     s->close = close;
diff --git a/buffered_file.h b/buffered_file.h
index 39f7fa0..ca7e62d 100644
--- a/buffered_file.h
+++ b/buffered_file.h
@@ -17,14 +17,12 @@
 #include "hw/hw.h"
 #include "migration.h"

-typedef ssize_t (BufferedPutFunc)(void *opaque, const void *data, size_t size);
 typedef void (BufferedPutReadyFunc)(void *opaque);
 typedef void (BufferedWaitForUnfreezeFunc)(void *opaque);
 typedef int (BufferedCloseFunc)(void *opaque);

 QEMUFile *qemu_fopen_ops_buffered(MigrationState *migration_state,
                                   size_t xfer_limit,
-                                  BufferedPutFunc *put_buffer,
                                   BufferedPutReadyFunc *put_ready,
                                   BufferedWaitForUnfreezeFunc *wait_for_unfreeze,
                                   BufferedCloseFunc *close);
diff --git a/migration.c b/migration.c
index 4be047a..fdc570b 100644
--- a/migration.c
+++ b/migration.c
@@ -303,10 +303,9 @@ static void migrate_fd_put_notify(void *opaque)
     }
 }

-static ssize_t migrate_fd_put_buffer(void *opaque, const void *data,
-                                     size_t size)
+ssize_t migrate_fd_put_buffer(MigrationState *s, const void *data,
+                              size_t size)
 {
-    MigrationState *s = opaque;
     ssize_t ret;

     if (s->state != MIG_STATE_ACTIVE) {
@@ -440,7 +439,6 @@ void migrate_fd_connect(MigrationState *s)
     s->state = MIG_STATE_ACTIVE;
     s->file = qemu_fopen_ops_buffered(s,
                                       s->bandwidth_limit,
-                                      migrate_fd_put_buffer,
                                       migrate_fd_put_ready,
                                       migrate_fd_wait_for_unfreeze,
                                       migrate_fd_close);
diff --git a/migration.h b/migration.h
index a9852fc..eeaa32c 100644
--- a/migration.h
+++ b/migration.h
@@ -75,6 +75,9 @@ void migrate_fd_error(MigrationState *s);

 void migrate_fd_connect(MigrationState *s);

+ssize_t migrate_fd_put_buffer(MigrationState *s, const void *data,
+                              size_t size);
+
 void add_migration_state_change_notifier(Notifier *notify);
 void remove_migration_state_change_notifier(Notifier *notify);
 bool migration_is_active(MigrationState *);
-- 
1.7.10.4


* [Qemu-devel] [PATCH 15/27] buffered_file: unfold migrate_fd_put_ready
  2012-07-24 18:36 [Qemu-devel] [RFC 00/27] Migration thread (WIP) Juan Quintela
                   ` (13 preceding siblings ...)
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 14/27] buffered_file: unfold migrate_fd_put_buffer Juan Quintela
@ 2012-07-24 18:36 ` Juan Quintela
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 16/27] buffered_file: unfold migrate_fd_put_buffer Juan Quintela
                   ` (15 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Juan Quintela @ 2012-07-24 18:36 UTC (permalink / raw)
  To: qemu-devel

We only use it once; just remove the callback indirection.

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 buffered_file.c |    5 +----
 buffered_file.h |    2 --
 migration.c     |    4 +---
 migration.h     |    1 +
 4 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/buffered_file.c b/buffered_file.c
index 702a726..4c6a797 100644
--- a/buffered_file.c
+++ b/buffered_file.c
@@ -23,7 +23,6 @@

 typedef struct QEMUFileBuffered
 {
-    BufferedPutReadyFunc *put_ready;
     BufferedWaitForUnfreezeFunc *wait_for_unfreeze;
     BufferedCloseFunc *close;
     MigrationState *migration_state;
@@ -128,7 +127,7 @@ static int buffered_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, in
         DPRINTF("file is ready\n");
         if (!s->freeze_output && s->bytes_xfer < s->xfer_limit) {
             DPRINTF("notifying client\n");
-            s->put_ready(s->migration_state);
+            migrate_fd_put_ready(s->migration_state);
         }
     }

@@ -227,7 +226,6 @@ static void buffered_rate_tick(void *opaque)

 QEMUFile *qemu_fopen_ops_buffered(MigrationState *migration_state,
                                   size_t bytes_per_sec,
-                                  BufferedPutReadyFunc *put_ready,
                                   BufferedWaitForUnfreezeFunc *wait_for_unfreeze,
                                   BufferedCloseFunc *close)
 {
@@ -237,7 +235,6 @@ QEMUFile *qemu_fopen_ops_buffered(MigrationState *migration_state,

     s->migration_state = migration_state;
     s->xfer_limit = bytes_per_sec / 10;
-    s->put_ready = put_ready;
     s->wait_for_unfreeze = wait_for_unfreeze;
     s->close = close;

diff --git a/buffered_file.h b/buffered_file.h
index ca7e62d..dd239b3 100644
--- a/buffered_file.h
+++ b/buffered_file.h
@@ -17,13 +17,11 @@
 #include "hw/hw.h"
 #include "migration.h"

-typedef void (BufferedPutReadyFunc)(void *opaque);
 typedef void (BufferedWaitForUnfreezeFunc)(void *opaque);
 typedef int (BufferedCloseFunc)(void *opaque);

 QEMUFile *qemu_fopen_ops_buffered(MigrationState *migration_state,
                                   size_t xfer_limit,
-                                  BufferedPutReadyFunc *put_ready,
                                   BufferedWaitForUnfreezeFunc *wait_for_unfreeze,
                                   BufferedCloseFunc *close);

diff --git a/migration.c b/migration.c
index fdc570b..a54cd77 100644
--- a/migration.c
+++ b/migration.c
@@ -326,9 +326,8 @@ ssize_t migrate_fd_put_buffer(MigrationState *s, const void *data,
     return ret;
 }

-static void migrate_fd_put_ready(void *opaque)
+void migrate_fd_put_ready(MigrationState *s)
 {
-    MigrationState *s = opaque;
     int ret;

     if (s->state != MIG_STATE_ACTIVE) {
@@ -439,7 +438,6 @@ void migrate_fd_connect(MigrationState *s)
     s->state = MIG_STATE_ACTIVE;
     s->file = qemu_fopen_ops_buffered(s,
                                       s->bandwidth_limit,
-                                      migrate_fd_put_ready,
                                       migrate_fd_wait_for_unfreeze,
                                       migrate_fd_close);

diff --git a/migration.h b/migration.h
index eeaa32c..852db51 100644
--- a/migration.h
+++ b/migration.h
@@ -77,6 +77,7 @@ void migrate_fd_connect(MigrationState *s);

 ssize_t migrate_fd_put_buffer(MigrationState *s, const void *data,
                               size_t size);
+void migrate_fd_put_ready(MigrationState *s);

 void add_migration_state_change_notifier(Notifier *notify);
 void remove_migration_state_change_notifier(Notifier *notify);
-- 
1.7.10.4


* [Qemu-devel] [PATCH 16/27] buffered_file: unfold migrate_fd_put_buffer
  2012-07-24 18:36 [Qemu-devel] [RFC 00/27] Migration thread (WIP) Juan Quintela
                   ` (14 preceding siblings ...)
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 15/27] buffered_file: unfold migrate_fd_put_ready Juan Quintela
@ 2012-07-24 18:36 ` Juan Quintela
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 17/27] " Juan Quintela
                   ` (14 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Juan Quintela @ 2012-07-24 18:36 UTC (permalink / raw)
  To: qemu-devel

We only use it once; just remove the callback indirection.

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 buffered_file.c |    5 +----
 buffered_file.h |    2 --
 migration.c     |    4 +---
 migration.h     |    1 +
 4 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/buffered_file.c b/buffered_file.c
index 4c6a797..d257496 100644
--- a/buffered_file.c
+++ b/buffered_file.c
@@ -23,7 +23,6 @@

 typedef struct QEMUFileBuffered
 {
-    BufferedWaitForUnfreezeFunc *wait_for_unfreeze;
     BufferedCloseFunc *close;
     MigrationState *migration_state;
     QEMUFile *file;
@@ -145,7 +144,7 @@ static int buffered_close(void *opaque)
     while (!qemu_file_get_error(s->file) && s->buffer_size) {
         buffered_flush(s);
         if (s->freeze_output)
-            s->wait_for_unfreeze(s->migration_state);
+            migrate_fd_wait_for_unfreeze(s->migration_state);
     }

     ret = s->close(s->migration_state);
@@ -226,7 +225,6 @@ static void buffered_rate_tick(void *opaque)

 QEMUFile *qemu_fopen_ops_buffered(MigrationState *migration_state,
                                   size_t bytes_per_sec,
-                                  BufferedWaitForUnfreezeFunc *wait_for_unfreeze,
                                   BufferedCloseFunc *close)
 {
     QEMUFileBuffered *s;
@@ -235,7 +233,6 @@ QEMUFile *qemu_fopen_ops_buffered(MigrationState *migration_state,

     s->migration_state = migration_state;
     s->xfer_limit = bytes_per_sec / 10;
-    s->wait_for_unfreeze = wait_for_unfreeze;
     s->close = close;

     s->file = qemu_fopen_ops(s, buffered_put_buffer, NULL,
diff --git a/buffered_file.h b/buffered_file.h
index dd239b3..926e5c6 100644
--- a/buffered_file.h
+++ b/buffered_file.h
@@ -17,12 +17,10 @@
 #include "hw/hw.h"
 #include "migration.h"

-typedef void (BufferedWaitForUnfreezeFunc)(void *opaque);
 typedef int (BufferedCloseFunc)(void *opaque);

 QEMUFile *qemu_fopen_ops_buffered(MigrationState *migration_state,
                                   size_t xfer_limit,
-                                  BufferedWaitForUnfreezeFunc *wait_for_unfreeze,
                                   BufferedCloseFunc *close);

 #endif
diff --git a/migration.c b/migration.c
index a54cd77..3561b16 100644
--- a/migration.c
+++ b/migration.c
@@ -374,9 +374,8 @@ static void migrate_fd_cancel(MigrationState *s)
     migrate_fd_cleanup(s);
 }

-static void migrate_fd_wait_for_unfreeze(void *opaque)
+void migrate_fd_wait_for_unfreeze(MigrationState *s)
 {
-    MigrationState *s = opaque;
     int ret;

     DPRINTF("wait for unfreeze\n");
@@ -438,7 +437,6 @@ void migrate_fd_connect(MigrationState *s)
     s->state = MIG_STATE_ACTIVE;
     s->file = qemu_fopen_ops_buffered(s,
                                       s->bandwidth_limit,
-                                      migrate_fd_wait_for_unfreeze,
                                       migrate_fd_close);

     DPRINTF("beginning savevm\n");
diff --git a/migration.h b/migration.h
index 852db51..62fc4f3 100644
--- a/migration.h
+++ b/migration.h
@@ -78,6 +78,7 @@ void migrate_fd_connect(MigrationState *s);
 ssize_t migrate_fd_put_buffer(MigrationState *s, const void *data,
                               size_t size);
 void migrate_fd_put_ready(MigrationState *s);
+void migrate_fd_wait_for_unfreeze(MigrationState *s);

 void add_migration_state_change_notifier(Notifier *notify);
 void remove_migration_state_change_notifier(Notifier *notify);
-- 
1.7.10.4


* [Qemu-devel] [PATCH 17/27] buffered_file: unfold migrate_fd_put_buffer
  2012-07-24 18:36 [Qemu-devel] [RFC 00/27] Migration thread (WIP) Juan Quintela
                   ` (15 preceding siblings ...)
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 16/27] buffered_file: unfold migrate_fd_put_buffer Juan Quintela
@ 2012-07-24 18:36 ` Juan Quintela
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 18/27] buffered_file: We can access directly to bandwidth_limit Juan Quintela
                   ` (13 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Juan Quintela @ 2012-07-24 18:36 UTC (permalink / raw)
  To: qemu-devel

We only use it once; just remove the callback indirection.

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 buffered_file.c |    7 ++-----
 buffered_file.h |    5 +----
 migration.c     |    8 ++------
 migration.h     |    1 +
 4 files changed, 6 insertions(+), 15 deletions(-)

diff --git a/buffered_file.c b/buffered_file.c
index d257496..4fca774 100644
--- a/buffered_file.c
+++ b/buffered_file.c
@@ -23,7 +23,6 @@

 typedef struct QEMUFileBuffered
 {
-    BufferedCloseFunc *close;
     MigrationState *migration_state;
     QEMUFile *file;
     int freeze_output;
@@ -147,7 +146,7 @@ static int buffered_close(void *opaque)
             migrate_fd_wait_for_unfreeze(s->migration_state);
     }

-    ret = s->close(s->migration_state);
+    ret = migrate_fd_close(s->migration_state);

     qemu_del_timer(s->timer);
     qemu_free_timer(s->timer);
@@ -224,8 +223,7 @@ static void buffered_rate_tick(void *opaque)
 }

 QEMUFile *qemu_fopen_ops_buffered(MigrationState *migration_state,
-                                  size_t bytes_per_sec,
-                                  BufferedCloseFunc *close)
+                                  size_t bytes_per_sec)
 {
     QEMUFileBuffered *s;

@@ -233,7 +231,6 @@ QEMUFile *qemu_fopen_ops_buffered(MigrationState *migration_state,

     s->migration_state = migration_state;
     s->xfer_limit = bytes_per_sec / 10;
-    s->close = close;

     s->file = qemu_fopen_ops(s, buffered_put_buffer, NULL,
                              buffered_close, buffered_rate_limit,
diff --git a/buffered_file.h b/buffered_file.h
index 926e5c6..8a38754 100644
--- a/buffered_file.h
+++ b/buffered_file.h
@@ -17,10 +17,7 @@
 #include "hw/hw.h"
 #include "migration.h"

-typedef int (BufferedCloseFunc)(void *opaque);
-
 QEMUFile *qemu_fopen_ops_buffered(MigrationState *migration_state,
-                                  size_t xfer_limit,
-                                  BufferedCloseFunc *close);
+                                  size_t xfer_limit);

 #endif
diff --git a/migration.c b/migration.c
index 3561b16..c639c99 100644
--- a/migration.c
+++ b/migration.c
@@ -396,10 +396,8 @@ void migrate_fd_wait_for_unfreeze(MigrationState *s)
     }
 }

-static int migrate_fd_close(void *opaque)
+int migrate_fd_close(MigrationState *s)
 {
-    MigrationState *s = opaque;
-
     qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
     return s->close(s);
 }
@@ -435,9 +433,7 @@ void migrate_fd_connect(MigrationState *s)
     int ret;

     s->state = MIG_STATE_ACTIVE;
-    s->file = qemu_fopen_ops_buffered(s,
-                                      s->bandwidth_limit,
-                                      migrate_fd_close);
+    s->file = qemu_fopen_ops_buffered(s, s->bandwidth_limit);

     DPRINTF("beginning savevm\n");
     ret = qemu_savevm_state_begin(s->file, &s->params);
diff --git a/migration.h b/migration.h
index 62fc4f3..f3a684a 100644
--- a/migration.h
+++ b/migration.h
@@ -79,6 +79,7 @@ ssize_t migrate_fd_put_buffer(MigrationState *s, const void *data,
                               size_t size);
 void migrate_fd_put_ready(MigrationState *s);
 void migrate_fd_wait_for_unfreeze(MigrationState *s);
+int migrate_fd_close(MigrationState *s);

 void add_migration_state_change_notifier(Notifier *notify);
 void remove_migration_state_change_notifier(Notifier *notify);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [Qemu-devel] [PATCH 18/27] buffered_file: We can access directly to bandwidth_limit
  2012-07-24 18:36 [Qemu-devel] [RFC 00/27] Migration thread (WIP) Juan Quintela
                   ` (16 preceding siblings ...)
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 17/27] " Juan Quintela
@ 2012-07-24 18:36 ` Juan Quintela
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 19/27] buffered_file: Move from using a timer to use a thread Juan Quintela
                   ` (12 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Juan Quintela @ 2012-07-24 18:36 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 buffered_file.c |    5 ++---
 buffered_file.h |    3 +--
 migration.c     |    2 +-
 3 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/buffered_file.c b/buffered_file.c
index 4fca774..43e68b6 100644
--- a/buffered_file.c
+++ b/buffered_file.c
@@ -222,15 +222,14 @@ static void buffered_rate_tick(void *opaque)
     buffered_put_buffer(s, NULL, 0, 0);
 }

-QEMUFile *qemu_fopen_ops_buffered(MigrationState *migration_state,
-                                  size_t bytes_per_sec)
+QEMUFile *qemu_fopen_ops_buffered(MigrationState *migration_state)
 {
     QEMUFileBuffered *s;

     s = g_malloc0(sizeof(*s));

     s->migration_state = migration_state;
-    s->xfer_limit = bytes_per_sec / 10;
+    s->xfer_limit = migration_state->bandwidth_limit / 10;

     s->file = qemu_fopen_ops(s, buffered_put_buffer, NULL,
                              buffered_close, buffered_rate_limit,
diff --git a/buffered_file.h b/buffered_file.h
index 8a38754..ef010fe 100644
--- a/buffered_file.h
+++ b/buffered_file.h
@@ -17,7 +17,6 @@
 #include "hw/hw.h"
 #include "migration.h"

-QEMUFile *qemu_fopen_ops_buffered(MigrationState *migration_state,
-                                  size_t xfer_limit);
+QEMUFile *qemu_fopen_ops_buffered(MigrationState *migration_state);

 #endif
diff --git a/migration.c b/migration.c
index c639c99..25ea98d 100644
--- a/migration.c
+++ b/migration.c
@@ -433,7 +433,7 @@ void migrate_fd_connect(MigrationState *s)
     int ret;

     s->state = MIG_STATE_ACTIVE;
-    s->file = qemu_fopen_ops_buffered(s, s->bandwidth_limit);
+    s->file = qemu_fopen_ops_buffered(s);

     DPRINTF("beginning savevm\n");
     ret = qemu_savevm_state_begin(s->file, &s->params);
-- 
1.7.10.4


* [Qemu-devel] [PATCH 19/27] buffered_file: Move from using a timer to use a thread
  2012-07-24 18:36 [Qemu-devel] [RFC 00/27] Migration thread (WIP) Juan Quintela
                   ` (17 preceding siblings ...)
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 18/27] buffered_file: We can access directly to bandwidth_limit Juan Quintela
@ 2012-07-24 18:36 ` Juan Quintela
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 20/27] migration: make qemu_fopen_ops_buffered() return void Juan Quintela
                   ` (11 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Juan Quintela @ 2012-07-24 18:36 UTC (permalink / raw)
  To: qemu-devel

We still protect everything except the waits with the iothread lock,
but we have moved from a timer to a thread.  One step at a time.

We also need a "complete" variable to detect when we have finished.

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 buffered_file.c |   58 +++++++++++++++++++++++++++++++++----------------------
 1 file changed, 35 insertions(+), 23 deletions(-)

diff --git a/buffered_file.c b/buffered_file.c
index 43e68b6..0021b30 100644
--- a/buffered_file.c
+++ b/buffered_file.c
@@ -18,6 +18,7 @@
 #include "qemu-timer.h"
 #include "qemu-char.h"
 #include "buffered_file.h"
+#include "qemu-thread.h"

 //#define DEBUG_BUFFERED_FILE

@@ -31,7 +32,8 @@ typedef struct QEMUFileBuffered
     uint8_t *buffer;
     size_t buffer_size;
     size_t buffer_capacity;
-    QEMUTimer *timer;
+    QemuThread thread;
+    bool complete;
 } QEMUFileBuffered;

 #ifdef DEBUG_BUFFERED_FILE
@@ -147,12 +149,7 @@ static int buffered_close(void *opaque)
     }

     ret = migrate_fd_close(s->migration_state);
-
-    qemu_del_timer(s->timer);
-    qemu_free_timer(s->timer);
-    g_free(s->buffer);
-    g_free(s);
-
+    s->complete = true;
     return ret;
 }

@@ -203,23 +200,38 @@ static int64_t buffered_get_rate_limit(void *opaque)
     return s->xfer_limit;
 }

-static void buffered_rate_tick(void *opaque)
+/* 100ms: xfer_limit is the amount we are allowed to write in each 100ms window */
+#define BUFFER_DELAY 100
+
+static void *buffered_file_thread(void *opaque)
 {
     QEMUFileBuffered *s = opaque;
+    int64_t expire_time = qemu_get_clock_ms(rt_clock) + BUFFER_DELAY;

-    if (qemu_file_get_error(s->file)) {
-        buffered_close(s);
-        return;
-    }
-
-    qemu_mod_timer(s->timer, qemu_get_clock_ms(rt_clock) + 100);
-
-    if (s->freeze_output)
-        return;
-
-    s->bytes_xfer = 0;
+    while(true) {
+        int64_t current_time = qemu_get_clock_ms(rt_clock);

-    buffered_put_buffer(s, NULL, 0, 0);
+        if (s->complete) {
+            break;
+        }
+        if (s->freeze_output) {
+            continue;
+        }
+        if (current_time >= expire_time) {
+            s->bytes_xfer = 0;
+            expire_time = current_time + BUFFER_DELAY;
+        }
+        if (s->bytes_xfer >= s->xfer_limit) {
+            /* usleep expects microseconds */
+            usleep((expire_time - current_time)*1000);
+        }
+        qemu_mutex_lock_iothread();
+        buffered_put_buffer(s, NULL, 0, 0);
+        qemu_mutex_unlock_iothread();
+    }
+    g_free(s->buffer);
+    g_free(s);
+    return NULL;
 }

 QEMUFile *qemu_fopen_ops_buffered(MigrationState *migration_state)
@@ -230,15 +242,15 @@ QEMUFile *qemu_fopen_ops_buffered(MigrationState *migration_state)

     s->migration_state = migration_state;
     s->xfer_limit = migration_state->bandwidth_limit / 10;
+    s->complete = false;

     s->file = qemu_fopen_ops(s, buffered_put_buffer, NULL,
                              buffered_close, buffered_rate_limit,
                              buffered_set_rate_limit,
 			     buffered_get_rate_limit);

-    s->timer = qemu_new_timer_ms(rt_clock, buffered_rate_tick, s);
-
-    qemu_mod_timer(s->timer, qemu_get_clock_ms(rt_clock) + 100);
+    qemu_thread_create(&s->thread, buffered_file_thread, s,
+                       QEMU_THREAD_DETACHED);

     return s->file;
 }
-- 
1.7.10.4


* [Qemu-devel] [PATCH 20/27] migration: make qemu_fopen_ops_buffered() return void
  2012-07-24 18:36 [Qemu-devel] [RFC 00/27] Migration thread (WIP) Juan Quintela
                   ` (18 preceding siblings ...)
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 19/27] buffered_file: Move from using a timer to use a thread Juan Quintela
@ 2012-07-24 18:36 ` Juan Quintela
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 21/27] migration: stop all cpus correctly Juan Quintela
                   ` (10 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Juan Quintela @ 2012-07-24 18:36 UTC (permalink / raw)
  To: qemu-devel

We want the file assignment to happen before the thread is created, so
that it can be done without any locking.

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 buffered_file.c |   13 ++++++-------
 buffered_file.h |    2 +-
 migration.c     |    2 +-
 migration.h     |    1 +
 4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/buffered_file.c b/buffered_file.c
index 0021b30..94830fa 100644
--- a/buffered_file.c
+++ b/buffered_file.c
@@ -33,7 +33,6 @@ typedef struct QEMUFileBuffered
     size_t buffer_size;
     size_t buffer_capacity;
     QemuThread thread;
-    bool complete;
 } QEMUFileBuffered;

 #ifdef DEBUG_BUFFERED_FILE
@@ -149,7 +148,7 @@ static int buffered_close(void *opaque)
     }

     ret = migrate_fd_close(s->migration_state);
-    s->complete = true;
+    s->migration_state->complete = true;
     return ret;
 }

@@ -211,7 +210,7 @@ static void *buffered_file_thread(void *opaque)
     while(true) {
         int64_t current_time = qemu_get_clock_ms(rt_clock);

-        if (s->complete) {
+        if (s->migration_state->complete) {
             break;
         }
         if (s->freeze_output) {
@@ -234,7 +233,7 @@ static void *buffered_file_thread(void *opaque)
     return NULL;
 }

-QEMUFile *qemu_fopen_ops_buffered(MigrationState *migration_state)
+void qemu_fopen_ops_buffered(MigrationState *migration_state)
 {
     QEMUFileBuffered *s;

@@ -242,15 +241,15 @@ QEMUFile *qemu_fopen_ops_buffered(MigrationState *migration_state)

     s->migration_state = migration_state;
     s->xfer_limit = migration_state->bandwidth_limit / 10;
-    s->complete = false;
+    s->migration_state->complete = false;

     s->file = qemu_fopen_ops(s, buffered_put_buffer, NULL,
                              buffered_close, buffered_rate_limit,
                              buffered_set_rate_limit,
 			     buffered_get_rate_limit);

+    migration_state->file = s->file;
+
     qemu_thread_create(&s->thread, buffered_file_thread, s,
                        QEMU_THREAD_DETACHED);
-
-    return s->file;
 }
diff --git a/buffered_file.h b/buffered_file.h
index ef010fe..8a246fd 100644
--- a/buffered_file.h
+++ b/buffered_file.h
@@ -17,6 +17,6 @@
 #include "hw/hw.h"
 #include "migration.h"

-QEMUFile *qemu_fopen_ops_buffered(MigrationState *migration_state);
+void qemu_fopen_ops_buffered(MigrationState *migration_state);

 #endif
diff --git a/migration.c b/migration.c
index 25ea98d..cd1c11f 100644
--- a/migration.c
+++ b/migration.c
@@ -433,7 +433,7 @@ void migrate_fd_connect(MigrationState *s)
     int ret;

     s->state = MIG_STATE_ACTIVE;
-    s->file = qemu_fopen_ops_buffered(s);
+    qemu_fopen_ops_buffered(s);

     DPRINTF("beginning savevm\n");
     ret = qemu_savevm_state_begin(s->file, &s->params);
diff --git a/migration.h b/migration.h
index f3a684a..a05795b 100644
--- a/migration.h
+++ b/migration.h
@@ -42,6 +42,7 @@ struct MigrationState
     int64_t total_time;
     bool enabled_capabilities[MIGRATION_CAPABILITY_MAX];
     int64_t xbzrle_cache_size;
+    bool complete;
 };

 void process_incoming_migration(QEMUFile *f);
-- 
1.7.10.4


* [Qemu-devel] [PATCH 21/27] migration: stop all cpus correctly
  2012-07-24 18:36 [Qemu-devel] [RFC 00/27] Migration thread (WIP) Juan Quintela
                   ` (19 preceding siblings ...)
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 20/27] migration: make qemu_fopen_ops_buffered() return void Juan Quintela
@ 2012-07-24 18:36 ` Juan Quintela
  2012-07-26 12:54   ` Eric Blake
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 22/27] migration: make writes blocking Juan Quintela
                   ` (9 subsequent siblings)
  30 siblings, 1 reply; 45+ messages in thread
From: Juan Quintela @ 2012-07-24 18:36 UTC (permalink / raw)
  To: qemu-devel

You can only stop all cpus from the iothread or a vcpu.  As we want
to do it from the migration_thread, we need to do this dance with the
bottom-half handlers.

This patch is a request for ideas.  I could move this function to cpus.c,
but wondered whether there is an easier way of doing this.

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration.c |   46 ++++++++++++++++++++++++++++++++++------------
 1 file changed, 34 insertions(+), 12 deletions(-)

diff --git a/migration.c b/migration.c
index cd1c11f..e3eec97 100644
--- a/migration.c
+++ b/migration.c
@@ -20,6 +20,7 @@
 #include "sysemu.h"
 #include "block.h"
 #include "qemu_socket.h"
+#include "qemu-thread.h"
 #include "block-migration.h"
 #include "qmp-commands.h"

@@ -326,14 +327,37 @@ ssize_t migrate_fd_put_buffer(MigrationState *s, const void *data,
     return ret;
 }

+static QemuCond migrate_vm_stop_cond;
+
+static void migrate_vm_stop(void *opaque)
+{
+    QEMUBH **bh = opaque;
+    vm_stop(RUN_STATE_FINISH_MIGRATE);
+    qemu_bh_delete(*bh);
+    qemu_cond_signal(&migrate_vm_stop_cond);
+}
+
+extern QemuMutex qemu_global_mutex;
+
 void migrate_fd_put_ready(MigrationState *s)
 {
     int ret;
+    static bool first_time = true;

     if (s->state != MIG_STATE_ACTIVE) {
         DPRINTF("put_ready returning because of non-active state\n");
         return;
     }
+    if (first_time) {
+        first_time = false;
+        DPRINTF("beginning savevm\n");
+        ret = qemu_savevm_state_begin(s->file, &s->params);
+        if (ret < 0) {
+            DPRINTF("failed, %d\n", ret);
+            migrate_fd_error(s);
+            return;
+        }
+    }

     DPRINTF("iterate\n");
     ret = qemu_savevm_state_iterate(s->file);
@@ -344,7 +368,16 @@ void migrate_fd_put_ready(MigrationState *s)

         DPRINTF("done iterating\n");
         qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
-        vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
+        if (old_vm_running) {
+            QEMUBH *bh;
+
+            qemu_cond_init(&migrate_vm_stop_cond);
+            bh = qemu_bh_new(migrate_vm_stop, &bh);
+            qemu_bh_schedule(bh);
+            qemu_cond_wait(&migrate_vm_stop_cond, &qemu_global_mutex);
+        } else {
+            vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
+        }

         if (qemu_savevm_state_complete(s->file) < 0) {
             migrate_fd_error(s);
@@ -430,19 +463,8 @@ bool migration_has_failed(MigrationState *s)

 void migrate_fd_connect(MigrationState *s)
 {
-    int ret;
-
     s->state = MIG_STATE_ACTIVE;
     qemu_fopen_ops_buffered(s);
-
-    DPRINTF("beginning savevm\n");
-    ret = qemu_savevm_state_begin(s->file, &s->params);
-    if (ret < 0) {
-        DPRINTF("failed, %d\n", ret);
-        migrate_fd_error(s);
-        return;
-    }
-    migrate_fd_put_ready(s);
 }

 static MigrationState *migrate_init(const MigrationParams *params)
-- 
1.7.10.4


* [Qemu-devel] [PATCH 22/27] migration: make writes blocking
  2012-07-24 18:36 [Qemu-devel] [RFC 00/27] Migration thread (WIP) Juan Quintela
                   ` (20 preceding siblings ...)
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 21/27] migration: stop all cpus correctly Juan Quintela
@ 2012-07-24 18:36 ` Juan Quintela
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 23/27] migration: remove unfreeze logic Juan Quintela
                   ` (8 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Juan Quintela @ 2012-07-24 18:36 UTC (permalink / raw)
  To: qemu-devel

Move all the writes to the migration thread, and make the writes
blocking.  Notice that we are still holding the iothread lock for
everything that we do.

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration-exec.c |    2 --
 migration-fd.c   |    6 ------
 migration-tcp.c  |    2 +-
 migration-unix.c |    2 --
 migration.c      |   18 ------------------
 qemu-file.h      |    5 -----
 savevm.c         |    5 -----
 7 files changed, 1 insertion(+), 39 deletions(-)

diff --git a/migration-exec.c b/migration-exec.c
index 6c97db9..908f22e 100644
--- a/migration-exec.c
+++ b/migration-exec.c
@@ -76,8 +76,6 @@ int exec_start_outgoing_migration(MigrationState *s, const char *command)
         goto err_after_open;
     }

-    socket_set_nonblock(s->fd);
-
     s->opaque = qemu_popen(f, "w");

     s->close = exec_close;
diff --git a/migration-fd.c b/migration-fd.c
index 50138ed..e5be972 100644
--- a/migration-fd.c
+++ b/migration-fd.c
@@ -81,11 +81,6 @@ int fd_start_outgoing_migration(MigrationState *s, const char *fdname)
         goto err_after_get_fd;
     }

-    if (fcntl(s->fd, F_SETFL, O_NONBLOCK) == -1) {
-        DPRINTF("Unable to set nonblocking mode on file descriptor\n");
-        goto err_after_open;
-    }
-
     s->get_error = fd_errno;
     s->write = fd_write;
     s->close = fd_close;
@@ -93,7 +88,6 @@ int fd_start_outgoing_migration(MigrationState *s, const char *fdname)
     migrate_fd_connect(s);
     return 0;

-err_after_open:
     close(s->fd);
 err_after_get_fd:
     return -1;
diff --git a/migration-tcp.c b/migration-tcp.c
index 440804d..d2bfb03 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -86,7 +86,7 @@ int tcp_start_outgoing_migration(MigrationState *s, const char *host_port,
     s->write = socket_write;
     s->close = tcp_close;

-    s->fd = inet_connect(host_port, false, errp);
+    s->fd = inet_connect(host_port, true, errp);

     if (!error_is_set(errp)) {
         migrate_fd_connect(s);
diff --git a/migration-unix.c b/migration-unix.c
index 169de88..bb41bac 100644
--- a/migration-unix.c
+++ b/migration-unix.c
@@ -96,8 +96,6 @@ int unix_start_outgoing_migration(MigrationState *s, const char *path)
         return -errno;
     }

-    socket_set_nonblock(s->fd);
-
     do {
         ret = connect(s->fd, (struct sockaddr *)&addr, sizeof(addr));
         if (ret == -1) {
diff --git a/migration.c b/migration.c
index e3eec97..fe2afd0 100644
--- a/migration.c
+++ b/migration.c
@@ -257,8 +257,6 @@ static int migrate_fd_cleanup(MigrationState *s)
 {
     int ret = 0;

-    qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
-
     if (s->file) {
         DPRINTF("closing file\n");
         ret = qemu_fclose(s->file);
@@ -293,17 +291,6 @@ static void migrate_fd_completed(MigrationState *s)
     notifier_list_notify(&migration_state_notifiers, s);
 }

-static void migrate_fd_put_notify(void *opaque)
-{
-    MigrationState *s = opaque;
-
-    qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
-    qemu_file_put_notify(s->file);
-    if (s->file && qemu_file_get_error(s->file)) {
-        migrate_fd_error(s);
-    }
-}
-
 ssize_t migrate_fd_put_buffer(MigrationState *s, const void *data,
                               size_t size)
 {
@@ -320,10 +307,6 @@ ssize_t migrate_fd_put_buffer(MigrationState *s, const void *data,
     if (ret == -1)
         ret = -(s->get_error(s));

-    if (ret == -EAGAIN) {
-        qemu_set_fd_handler2(s->fd, NULL, NULL, migrate_fd_put_notify, s);
-    }
-
     return ret;
 }

@@ -431,7 +414,6 @@ void migrate_fd_wait_for_unfreeze(MigrationState *s)

 int migrate_fd_close(MigrationState *s)
 {
-    qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
     return s->close(s);
 }

diff --git a/qemu-file.h b/qemu-file.h
index 31b83f6..af7c36f 100644
--- a/qemu-file.h
+++ b/qemu-file.h
@@ -106,11 +106,6 @@ int64_t qemu_file_get_rate_limit(QEMUFile *f);
 int qemu_file_get_error(QEMUFile *f);
 void qemu_file_set_error(QEMUFile *f, int error);

-/* Try to send any outstanding data.  This function is useful when output is
- * halted due to rate limiting or EAGAIN errors occur as it can be used to
- * resume output. */
-void qemu_file_put_notify(QEMUFile *f);
-
 static inline void qemu_put_be64s(QEMUFile *f, const uint64_t *pv)
 {
     qemu_put_be64(f, *pv);
diff --git a/savevm.c b/savevm.c
index c5fd13f..2c34faa 100644
--- a/savevm.c
+++ b/savevm.c
@@ -546,11 +546,6 @@ int qemu_fclose(QEMUFile *f)
     return ret;
 }

-void qemu_file_put_notify(QEMUFile *f)
-{
-    f->put_buffer(f->opaque, NULL, 0, 0);
-}
-
 void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size)
 {
     int l;
-- 
1.7.10.4


* [Qemu-devel] [PATCH 23/27] migration: remove unfreeze logic
  2012-07-24 18:36 [Qemu-devel] [RFC 00/27] Migration thread (WIP) Juan Quintela
                   ` (21 preceding siblings ...)
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 22/27] migration: make writes blocking Juan Quintela
@ 2012-07-24 18:36 ` Juan Quintela
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 24/27] migration: take finer locking Juan Quintela
                   ` (7 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Juan Quintela @ 2012-07-24 18:36 UTC (permalink / raw)
  To: qemu-devel

Now that we have a thread, and blocking writes, we don't need it.

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 buffered_file.c |   19 +------------------
 migration.c     |   22 ----------------------
 migration.h     |    1 -
 3 files changed, 1 insertion(+), 41 deletions(-)

diff --git a/buffered_file.c b/buffered_file.c
index 94830fa..b5e809c 100644
--- a/buffered_file.c
+++ b/buffered_file.c
@@ -26,7 +26,6 @@ typedef struct QEMUFileBuffered
 {
     MigrationState *migration_state;
     QEMUFile *file;
-    int freeze_output;
     size_t bytes_xfer;
     size_t xfer_limit;
     uint8_t *buffer;
@@ -77,12 +76,6 @@ static void buffered_flush(QEMUFileBuffered *s)

         ret = migrate_fd_put_buffer(s->migration_state, s->buffer + offset,
                                     s->buffer_size - offset);
-        if (ret == -EAGAIN) {
-            DPRINTF("backend not ready, freezing\n");
-            s->freeze_output = 1;
-            break;
-        }
-
         if (ret <= 0) {
             DPRINTF("error flushing data, %zd\n", ret);
             qemu_file_set_error(s->file, ret);
@@ -112,9 +105,6 @@ static int buffered_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, in
         return error;
     }

-    DPRINTF("unfreezing output\n");
-    s->freeze_output = 0;
-
     if (size > 0) {
         DPRINTF("buffering %d bytes\n", size - offset);
         buffered_append(s, buf, size);
@@ -124,7 +114,7 @@ static int buffered_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, in

     if (pos == 0 && size == 0) {
         DPRINTF("file is ready\n");
-        if (!s->freeze_output && s->bytes_xfer < s->xfer_limit) {
+        if (s->bytes_xfer < s->xfer_limit) {
             DPRINTF("notifying client\n");
             migrate_fd_put_ready(s->migration_state);
         }
@@ -143,8 +133,6 @@ static int buffered_close(void *opaque)
     s->xfer_limit = INT_MAX;
     while (!qemu_file_get_error(s->file) && s->buffer_size) {
         buffered_flush(s);
-        if (s->freeze_output)
-            migrate_fd_wait_for_unfreeze(s->migration_state);
     }

     ret = migrate_fd_close(s->migration_state);
@@ -167,8 +155,6 @@ static int buffered_rate_limit(void *opaque)
     if (ret) {
         return ret;
     }
-    if (s->freeze_output)
-        return 1;

     if (s->bytes_xfer > s->xfer_limit)
         return 1;
@@ -213,9 +199,6 @@ static void *buffered_file_thread(void *opaque)
         if (s->migration_state->complete) {
             break;
         }
-        if (s->freeze_output) {
-            continue;
-        }
         if (current_time >= expire_time) {
             s->bytes_xfer = 0;
             expire_time = current_time + BUFFER_DELAY;
diff --git a/migration.c b/migration.c
index fe2afd0..da0f192 100644
--- a/migration.c
+++ b/migration.c
@@ -390,28 +390,6 @@ static void migrate_fd_cancel(MigrationState *s)
     migrate_fd_cleanup(s);
 }

-void migrate_fd_wait_for_unfreeze(MigrationState *s)
-{
-    int ret;
-
-    DPRINTF("wait for unfreeze\n");
-    if (s->state != MIG_STATE_ACTIVE)
-        return;
-
-    do {
-        fd_set wfds;
-
-        FD_ZERO(&wfds);
-        FD_SET(s->fd, &wfds);
-
-        ret = select(s->fd + 1, NULL, &wfds, NULL, NULL);
-    } while (ret == -1 && (s->get_error(s)) == EINTR);
-
-    if (ret == -1) {
-        qemu_file_set_error(s->file, -s->get_error(s));
-    }
-}
-
 int migrate_fd_close(MigrationState *s)
 {
     return s->close(s);
diff --git a/migration.h b/migration.h
index a05795b..7c7a261 100644
--- a/migration.h
+++ b/migration.h
@@ -79,7 +79,6 @@ void migrate_fd_connect(MigrationState *s);
 ssize_t migrate_fd_put_buffer(MigrationState *s, const void *data,
                               size_t size);
 void migrate_fd_put_ready(MigrationState *s);
-void migrate_fd_wait_for_unfreeze(MigrationState *s);
 int migrate_fd_close(MigrationState *s);

 void add_migration_state_change_notifier(Notifier *notify);
-- 
1.7.10.4


* [Qemu-devel] [PATCH 24/27] migration: take finer locking
  2012-07-24 18:36 [Qemu-devel] [RFC 00/27] Migration thread (WIP) Juan Quintela
                   ` (22 preceding siblings ...)
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 23/27] migration: remove unfreeze logic Juan Quintela
@ 2012-07-24 18:36 ` Juan Quintela
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 25/27] buffered_file: Unfold the trick to restart generating migration data Juan Quintela
                   ` (6 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Juan Quintela @ 2012-07-24 18:36 UTC (permalink / raw)
  To: qemu-devel

Instead of holding the iothread lock for the whole migration thread
loop, just take it inside migrate_fd_put_ready(), which is what
interacts with the rest of the world.

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 buffered_file.c |    2 --
 migration.c     |    5 +++++
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/buffered_file.c b/buffered_file.c
index b5e809c..de07a05 100644
--- a/buffered_file.c
+++ b/buffered_file.c
@@ -207,9 +207,7 @@ static void *buffered_file_thread(void *opaque)
             /* usleep expects microseconds */
             usleep((expire_time - current_time)*1000);
         }
-        qemu_mutex_lock_iothread();
         buffered_put_buffer(s, NULL, 0, 0);
-        qemu_mutex_unlock_iothread();
     }
     g_free(s->buffer);
     g_free(s);
diff --git a/migration.c b/migration.c
index da0f192..2f45f66 100644
--- a/migration.c
+++ b/migration.c
@@ -327,8 +327,10 @@ void migrate_fd_put_ready(MigrationState *s)
     int ret;
     static bool first_time = true;

+    qemu_mutex_lock_iothread();
     if (s->state != MIG_STATE_ACTIVE) {
         DPRINTF("put_ready returning because of non-active state\n");
+        qemu_mutex_unlock_iothread();
         return;
     }
     if (first_time) {
@@ -338,6 +340,7 @@ void migrate_fd_put_ready(MigrationState *s)
         if (ret < 0) {
             DPRINTF("failed, %d\n", ret);
             migrate_fd_error(s);
+            qemu_mutex_unlock_iothread();
             return;
         }
     }
@@ -374,6 +377,8 @@ void migrate_fd_put_ready(MigrationState *s)
             }
         }
     }
+    qemu_mutex_unlock_iothread();
+
 }

 static void migrate_fd_cancel(MigrationState *s)
-- 
1.7.10.4


* [Qemu-devel] [PATCH 25/27] buffered_file: Unfold the trick to restart generating migration data
  2012-07-24 18:36 [Qemu-devel] [RFC 00/27] Migration thread (WIP) Juan Quintela
                   ` (23 preceding siblings ...)
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 24/27] migration: take finer locking Juan Quintela
@ 2012-07-24 18:36 ` Juan Quintela
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 26/27] buffered_file: don't flush on put buffer Juan Quintela
                   ` (5 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Juan Quintela @ 2012-07-24 18:36 UTC (permalink / raw)
  To: qemu-devel

This was needed before due to the way that the callbacks worked.

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 buffered_file.c |   17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/buffered_file.c b/buffered_file.c
index de07a05..ac2327b 100644
--- a/buffered_file.c
+++ b/buffered_file.c
@@ -112,14 +112,6 @@ static int buffered_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, in

     buffered_flush(s);

-    if (pos == 0 && size == 0) {
-        DPRINTF("file is ready\n");
-        if (s->bytes_xfer < s->xfer_limit) {
-            DPRINTF("notifying client\n");
-            migrate_fd_put_ready(s->migration_state);
-        }
-    }
-
     return size;
 }

@@ -207,8 +199,15 @@ static void *buffered_file_thread(void *opaque)
             /* usleep expects microseconds */
             usleep((expire_time - current_time)*1000);
         }
-        buffered_put_buffer(s, NULL, 0, 0);
+        buffered_flush(s);
+
+        DPRINTF("file is ready\n");
+        if (s->bytes_xfer < s->xfer_limit) {
+            DPRINTF("notifying client\n");
+            migrate_fd_put_ready(s->migration_state);
+        }
     }
+
     g_free(s->buffer);
     g_free(s);
     return NULL;
-- 
1.7.10.4


* [Qemu-devel] [PATCH 26/27] buffered_file: don't flush on put buffer
  2012-07-24 18:36 [Qemu-devel] [RFC 00/27] Migration thread (WIP) Juan Quintela
                   ` (24 preceding siblings ...)
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 25/27] buffered_file: Unfold the trick to restart generating migration data Juan Quintela
@ 2012-07-24 18:36 ` Juan Quintela
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 27/27] buffered_file: unfold buffered_append in buffered_put_buffer Juan Quintela
                   ` (4 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Juan Quintela @ 2012-07-24 18:36 UTC (permalink / raw)
  To: qemu-devel

We call buffered_put_buffer() with the iothread lock held, and
buffered_flush() does synchronous writes.  We only want to do the
synchronous writes outside the lock.
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 buffered_file.c |    2 --
 1 file changed, 2 deletions(-)

diff --git a/buffered_file.c b/buffered_file.c
index ac2327b..d3bed86 100644
--- a/buffered_file.c
+++ b/buffered_file.c
@@ -110,8 +110,6 @@ static int buffered_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, in
         buffered_append(s, buf, size);
     }

-    buffered_flush(s);
-
     return size;
 }

-- 
1.7.10.4


* [Qemu-devel] [PATCH 27/27] buffered_file: unfold buffered_append in buffered_put_buffer
  2012-07-24 18:36 [Qemu-devel] [RFC 00/27] Migration thread (WIP) Juan Quintela
                   ` (25 preceding siblings ...)
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 26/27] buffered_file: don't flush on put buffer Juan Quintela
@ 2012-07-24 18:36 ` Juan Quintela
  2012-07-25  9:55 ` [Qemu-devel] [RFC 00/27] Migration thread (WIP) Orit Wasserman
                   ` (3 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Juan Quintela @ 2012-07-24 18:36 UTC (permalink / raw)
  To: qemu-devel

It was its only user, and now buffered_put_buffer() just does the append.

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 buffered_file.c |   33 ++++++++++++++-------------------
 1 file changed, 14 insertions(+), 19 deletions(-)

diff --git a/buffered_file.c b/buffered_file.c
index d3bed86..f1a21fa 100644
--- a/buffered_file.c
+++ b/buffered_file.c
@@ -42,22 +42,6 @@ typedef struct QEMUFileBuffered
     do { } while (0)
 #endif

-static void buffered_append(QEMUFileBuffered *s,
-                            const uint8_t *buf, size_t size)
-{
-    if (size > (s->buffer_capacity - s->buffer_size)) {
-        DPRINTF("increasing buffer capacity from %zu by %zu\n",
-                s->buffer_capacity, size + 1024);
-
-        s->buffer_capacity += size + 1024;
-
-        s->buffer = g_realloc(s->buffer, s->buffer_capacity);
-    }
-
-    memcpy(s->buffer + s->buffer_size, buf, size);
-    s->buffer_size += size;
-}
-
 static void buffered_flush(QEMUFileBuffered *s)
 {
     size_t offset = 0;
@@ -105,11 +89,22 @@ static int buffered_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, in
         return error;
     }

-    if (size > 0) {
-        DPRINTF("buffering %d bytes\n", size - offset);
-        buffered_append(s, buf, size);
+    if (size <= 0) {
+        return size;
     }

+    if (size > (s->buffer_capacity - s->buffer_size)) {
+        DPRINTF("increasing buffer capacity from %zu by %zu\n",
+                s->buffer_capacity, size + 1024);
+
+        s->buffer_capacity += size + 1024;
+
+        s->buffer = g_realloc(s->buffer, s->buffer_capacity);
+    }
+
+    memcpy(s->buffer + s->buffer_size, buf, size);
+    s->buffer_size += size;
+
     return size;
 }

-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [PATCH 10/27] Separate migration bitmap
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 10/27] Separate migration bitmap Juan Quintela
@ 2012-07-25  9:16   ` Avi Kivity
  2012-07-26  9:22     ` Juan Quintela
  0 siblings, 1 reply; 45+ messages in thread
From: Avi Kivity @ 2012-07-25  9:16 UTC (permalink / raw)
  To: Juan Quintela; +Cc: Paolo Bonzini, Umesh Deshpande, qemu-devel

On 07/24/2012 09:36 PM, Juan Quintela wrote:
> This patch creates a migration bitmap, which is periodically kept in
> sync with the qemu bitmap. A separate copy of the dirty bitmap for the
> migration limits the amount of concurrent access to the qemu bitmap
> from iothread and migration thread (which requires taking the big
> lock).
> 
> We use the qemu bitmap type.  We have to "undo" the dirty_pages
> counting optimization on the general dirty bitmap and do the counting
> optimization with the migration local bitmap.

I had different plans for this (and a stale and possibly smelly patchset
moving in that direction):

- move the dirty bytemap from a single global instance to
per-memory-region instances (in the MemoryRegion structure)
- convert it from a bytemap to either a per-client bitmap (client =
VGA/CODE/MIGRATION) or a variable bit-length bitfieldmap
- allocate the bitmaps or strangely named thingie above on demand, so
ordinarily you have nothing allocated and the framebuffer has a bitmap,
when you start migration you allocate a bitmap for memory and a
twobitmap for the framebuffer
- protect the whole thing using rcu

The patchset is stalled, mostly because it's very difficult to
disentangle the tcg stuff.  I don't think we should introduce a
dependency here, just something to keep in mind.
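[Editorial note: as an illustration of the per-client, allocated-on-demand dirty bitmap scheme Avi describes above, here is a minimal sketch. All names (MemoryRegionSketch, region_set_dirty, ...) are hypothetical, not QEMU code; a real version would live in the MemoryRegion structure and need RCU protection as he notes.]

```c
#include <assert.h>
#include <stdlib.h>

enum DirtyClient { DIRTY_VGA, DIRTY_CODE, DIRTY_MIGRATION, DIRTY_NCLIENTS };

#define BITS_PER_LONG (8 * sizeof(unsigned long))

typedef struct MemoryRegionSketch {
    size_t nb_pages;
    /* one bitmap per client, NULL until that client is enabled */
    unsigned long *dirty[DIRTY_NCLIENTS];
} MemoryRegionSketch;

/* Allocate a client's bitmap on demand, e.g. when migration starts. */
static void region_enable_client(MemoryRegionSketch *mr, enum DirtyClient c)
{
    if (!mr->dirty[c]) {
        size_t words = (mr->nb_pages + BITS_PER_LONG - 1) / BITS_PER_LONG;
        mr->dirty[c] = calloc(words, sizeof(unsigned long));
    }
}

/* A guest write marks the page dirty for every client that is enabled. */
static void region_set_dirty(MemoryRegionSketch *mr, size_t page)
{
    int c;
    for (c = 0; c < DIRTY_NCLIENTS; c++) {
        if (mr->dirty[c]) {
            mr->dirty[c][page / BITS_PER_LONG] |= 1UL << (page % BITS_PER_LONG);
        }
    }
}

/* Each client consumes its own view without disturbing the others. */
static int region_test_and_clear(MemoryRegionSketch *mr, enum DirtyClient c,
                                 size_t page)
{
    unsigned long *word, mask;

    if (!mr->dirty[c]) {
        return 0;
    }
    word = &mr->dirty[c][page / BITS_PER_LONG];
    mask = 1UL << (page % BITS_PER_LONG);
    if (*word & mask) {
        *word &= ~mask;
        return 1;
    }
    return 0;
}
```

With nothing enabled, no bitmap memory is allocated at all, which is the point of the on-demand scheme.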


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [RFC 00/27] Migration thread (WIP)
  2012-07-24 18:36 [Qemu-devel] [RFC 00/27] Migration thread (WIP) Juan Quintela
                   ` (26 preceding siblings ...)
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 27/27] buffered_file: unfold buffered_append in buffered_put_buffer Juan Quintela
@ 2012-07-25  9:55 ` Orit Wasserman
  2012-07-26 10:57 ` Jan Kiszka
                   ` (2 subsequent siblings)
  30 siblings, 0 replies; 45+ messages in thread
From: Orit Wasserman @ 2012-07-25  9:55 UTC (permalink / raw)
  To: qemu-devel

Hi,
I started with some performance testing, and the results look very good. One of the workloads that wasn't converging with the upstream version (migration completed only when I stopped the workload) now seems to converge.

I used an FC16 guest with 1G RAM, default speed and downtime.

workload       upstream transferred   upstream total time   thread transferred   thread total time
boot           372028k                14402ms               463966k              2764ms
idle           496512k                14741ms               976334k              2847ms
iozone         913535k                27346ms               108872k              3021ms
stressapptest  (stopped the workload after 400s)            1827221k             4998ms

Regards,
Orit

On 07/24/2012 09:36 PM, Juan Quintela wrote:
> Hi
> 
> This series are on top of the migration-next-v5 series just posted.
> 
> First of all, this is an RFC/Work in progress.  Just a lot of people
> asked for it, and I would like review of the design.
> 
> It does:
> - get a new bitmap for migration, and that bitmap uses 1 bit by page
> - it unfolds migration_buffered_file.  Only one user existed.
> - it simplifies buffered_file a lot.
> 
> - About the migration thread, special attention was giving to try to
>   get the series review-able (reviewers would tell if I got it).
> 
> Basic design:
> - we create a new thread instead of a timer function
> - we move all the migration work to that thread (but run everything
>   except the waits with the iothread lock.
> - we move all the writting to outside the iothread lock.  i.e.
>   we walk the state with the iothread hold, and copy everything to one buffer.
>   then we write that buffer to the sockets outside the iothread lock.
> - once here, we move to writting synchronously to the sockets.
> - this allows us to simplify quite a lot.
> 
> And basically, that is it.  Notice that we still do the iterate page
> walking with the iothread held.  Light testing show that we got
> similar speed and latencies than without the thread (notice that
> almost no optimizations done here yet).
> 
> Appart of the review:
> - Are there any locking issues that I have missed (I guess so)
> - stop all cpus correctly.  vm_stop should be called from the iothread,
>   I use the trick of using a bottom half to get that working correctly.
>   but this _implementation_ is ugly as hell.  Is there an easy way
>   of doing it?
> - Do I really have to export last_ram_offset(), there is no other way
>   of knowing the ammount of RAM?
> 
> Known issues:
> 
> - for some reason, when it has to start a 2nd round of bitmap
>   handling, it decides to dirty all pages.  Haven't found still why
>   this happens.
> 
> If you can test it, and said me where it breaks, it would also help.
> 
> Work is based on Umesh thread work, and work that Paolo Bonzini had
> work on top of that.  All the mirgation thread was done from scratch
> becase I was unable to debug why it was failing, but it "owes" a lot
> to the previous design.
> 
> Thanks in advance, Juan.
> 
> The following changes since commit a21143486b9c6d7a50b7b62877c02b3c686943cb:
> 
>   Merge remote-tracking branch 'stefanha/net' into staging (2012-07-23 13:15:34 -0500)
> 
> are available in the git repository at:
> 
> 
>   http://repo.or.cz/r/qemu/quintela.git migration-thread-v1
> 
> for you to fetch changes up to 27e539b03ba97bc37e107755bcb44511ec4c8100:
> 
>   buffered_file: unfold buffered_append in buffered_put_buffer (2012-07-24 16:46:13 +0200)
> 
> 
> Juan Quintela (23):
>   buffered_file: g_realloc() can't fail
>   savevm: Factorize ram globals reset in its own function
>   ram: introduce migration_bitmap_set_dirty()
>   ram: Introduce migration_bitmap_test_and_reset_dirty()
>   ram: Export last_ram_offset()
>   ram: introduce migration_bitmap_sync()
>   Separate migration bitmap
>   buffered_file: rename opaque to migration_state
>   buffered_file: opaque is MigrationState
>   buffered_file: unfold migrate_fd_put_buffer
>   buffered_file: unfold migrate_fd_put_ready
>   buffered_file: unfold migrate_fd_put_buffer
>   buffered_file: unfold migrate_fd_put_buffer
>   buffered_file: We can access directly to bandwidth_limit
>   buffered_file: Move from using a timer to use a thread
>   migration: make qemu_fopen_ops_buffered() return void
>   migration: stop all cpus correctly
>   migration: make writes blocking
>   migration: remove unfreeze logic
>   migration: take finer locking
>   buffered_file: Unfold the trick to restart generating migration data
>   buffered_file: don't flush on put buffer
>   buffered_file: unfold buffered_append in buffered_put_buffer
> 
> Paolo Bonzini (2):
>   split MRU ram list
>   BufferedFile: append, then flush
> 
> Umesh Deshpande (2):
>   add a version number to ram_list
>   protect the ramlist with a separate mutex
> 
>  arch_init.c      |  108 +++++++++++++++++++++++++-------
>  buffered_file.c  |  179 +++++++++++++++++-------------------------------------
>  buffered_file.h  |   12 +---
>  cpu-all.h        |   17 +++++-
>  exec-obsolete.h  |   10 ---
>  exec.c           |   45 +++++++++++---
>  migration-exec.c |    2 -
>  migration-fd.c   |    6 --
>  migration-tcp.c  |    2 +-
>  migration-unix.c |    2 -
>  migration.c      |  111 ++++++++++++++-------------------
>  migration.h      |    6 ++
>  qemu-file.h      |    5 --
>  savevm.c         |    5 --
>  14 files changed, 249 insertions(+), 261 deletions(-)
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [PATCH 02/27] split MRU ram list
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 02/27] split MRU ram list Juan Quintela
@ 2012-07-25 20:20   ` Michael Roth
  2012-07-26 13:19     ` Avi Kivity
  0 siblings, 1 reply; 45+ messages in thread
From: Michael Roth @ 2012-07-25 20:20 UTC (permalink / raw)
  To: Juan Quintela; +Cc: Paolo Bonzini, ryanh, qemu-devel

On Tue, Jul 24, 2012 at 08:36:27PM +0200, Juan Quintela wrote:
> From: Paolo Bonzini <pbonzini@redhat.com>
> 
> Outside the execution threads the normal, non-MRU-ized order of
> the RAM blocks should always be enough.  So manage two separate
> lists, which will have separate locking rules.

One thing I'm noticing is that, prior to this series, we're traversing the
blocks in MRU order for migration. This seems counter-intuitive, as those are
the blocks most likely to get re-dirtied and re-sent, so it makes sense to hold
off on sending them till last to reduce the amount of time the running guest
has to invalidate the target's copy of them.

This isn't as bad as it could be, since we at least don't restart the
loop on every iteration, but it might still make sense to come up with a way
to keep RAMList.blocks roughly in sync with RAMList.blocks_mru, and then
traverse that in reverse order for ram_save_iterate. The fact that we're
switching from the MRU ordering in the current version might be
obscuring performance issues as well, which is probably worth keeping in
mind.
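[Editorial note: a small self-contained sketch of the idea above, i.e. keep one list in MRU order (move-to-front on access) and walk it from the tail so the least-recently-used blocks are sent first. Block, mru_touch, etc. are illustrative names, not QEMU's list macros.]

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct Block {
    const char *id;
    struct Block *prev, *next;
} Block;

typedef struct BlockList {
    Block *head, *tail;
} BlockList;

static void list_insert_head(BlockList *l, Block *b)
{
    b->prev = NULL;
    b->next = l->head;
    if (l->head) {
        l->head->prev = b;
    } else {
        l->tail = b;
    }
    l->head = b;
}

static void list_remove(BlockList *l, Block *b)
{
    if (b->prev) { b->prev->next = b->next; } else { l->head = b->next; }
    if (b->next) { b->next->prev = b->prev; } else { l->tail = b->prev; }
}

/* Called on every guest access: move the block to the front (MRU). */
static void mru_touch(BlockList *l, Block *b)
{
    list_remove(l, b);
    list_insert_head(l, b);
}

/* Migration walks from the tail: least-recently-used blocks go first, so
 * the hottest blocks are sent last and re-dirty less of what was sent. */
static size_t migration_order(BlockList *l, const char **out, size_t max)
{
    size_t n = 0;
    Block *b;
    for (b = l->tail; b && n < max; b = b->prev) {
        out[n++] = b->id;
    }
    return n;
}
```

This keeps a single list in sync with itself; the series instead keeps two lists with separate link fields, which avoids taking the MRU lock on the migration path.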

> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> ---
>  cpu-all.h |    4 +++-
>  exec.c    |   16 +++++++++++-----
>  2 files changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/cpu-all.h b/cpu-all.h
> index 82ba1d7..ca3bb24 100644
> --- a/cpu-all.h
> +++ b/cpu-all.h
> @@ -476,8 +476,9 @@ typedef struct RAMBlock {
>      ram_addr_t offset;
>      ram_addr_t length;
>      uint32_t flags;
> -    char idstr[256];
> +    QLIST_ENTRY(RAMBlock) next_mru;
>      QLIST_ENTRY(RAMBlock) next;
> +    char idstr[256];
>  #if defined(__linux__) && !defined(TARGET_S390X)
>      int fd;
>  #endif
> @@ -485,6 +486,7 @@ typedef struct RAMBlock {
> 
>  typedef struct RAMList {
>      uint8_t *phys_dirty;
> +    QLIST_HEAD(, RAMBlock) blocks_mru;
>      QLIST_HEAD(, RAMBlock) blocks;
>      uint64_t dirty_pages;
>  } RAMList;
> diff --git a/exec.c b/exec.c
> index feb4795..afc472f 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -112,7 +112,10 @@ static uint8_t *code_gen_ptr;
>  int phys_ram_fd;
>  static int in_migration;
> 
> -RAMList ram_list = { .blocks = QLIST_HEAD_INITIALIZER(ram_list.blocks) };
> +RAMList ram_list = {
> +    .blocks = QLIST_HEAD_INITIALIZER(ram_list.blocks),
> +    .blocks_mru = QLIST_HEAD_INITIALIZER(ram_list.blocks_mru)
> +};
> 
>  static MemoryRegion *system_memory;
>  static MemoryRegion *system_io;
> @@ -2550,6 +2553,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
>      new_block->length = size;
> 
>      QLIST_INSERT_HEAD(&ram_list.blocks, new_block, next);
> +    QLIST_INSERT_HEAD(&ram_list.blocks_mru, new_block, next_mru);
> 
>      ram_list.phys_dirty = g_realloc(ram_list.phys_dirty,
>                                         last_ram_offset() >> TARGET_PAGE_BITS);
> @@ -2573,6 +2577,7 @@ void qemu_ram_free_from_ptr(ram_addr_t addr)
>      QLIST_FOREACH(block, &ram_list.blocks, next) {
>          if (addr == block->offset) {
>              QLIST_REMOVE(block, next);
> +            QLIST_REMOVE(block, next_mru);
>              g_free(block);
>              return;
>          }
> @@ -2586,6 +2591,7 @@ void qemu_ram_free(ram_addr_t addr)
>      QLIST_FOREACH(block, &ram_list.blocks, next) {
>          if (addr == block->offset) {
>              QLIST_REMOVE(block, next);
> +            QLIST_REMOVE(block, next_mru);
>              if (block->flags & RAM_PREALLOC_MASK) {
>                  ;
>              } else if (mem_path) {
> @@ -2690,12 +2696,12 @@ void *qemu_get_ram_ptr(ram_addr_t addr)
>  {
>      RAMBlock *block;
> 
> -    QLIST_FOREACH(block, &ram_list.blocks, next) {
> +    QLIST_FOREACH(block, &ram_list.blocks_mru, next_mru) {
>          if (addr - block->offset < block->length) {
>              /* Move this entry to to start of the list.  */
>              if (block != QLIST_FIRST(&ram_list.blocks)) {
> -                QLIST_REMOVE(block, next);
> -                QLIST_INSERT_HEAD(&ram_list.blocks, block, next);
> +                QLIST_REMOVE(block, next_mru);
> +                QLIST_INSERT_HEAD(&ram_list.blocks_mru, block, next_mru);
>              }
>              if (xen_enabled()) {
>                  /* We need to check if the requested address is in the RAM
> @@ -2790,7 +2796,7 @@ int qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
>          return 0;
>      }
> 
> -    QLIST_FOREACH(block, &ram_list.blocks, next) {
> +    QLIST_FOREACH(block, &ram_list.blocks_mru, next_mru) {
>          /* This case append when the block is not mapped. */
>          if (block->host == NULL) {
>              continue;
> -- 
> 1.7.10.4
> 
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [PATCH 04/27] add a version number to ram_list
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 04/27] add a version number to ram_list Juan Quintela
@ 2012-07-25 23:27   ` Michael Roth
  2012-07-26  9:19     ` Juan Quintela
  0 siblings, 1 reply; 45+ messages in thread
From: Michael Roth @ 2012-07-25 23:27 UTC (permalink / raw)
  To: Juan Quintela; +Cc: Paolo Bonzini, Umesh Deshpande, qemu-devel

On Tue, Jul 24, 2012 at 08:36:29PM +0200, Juan Quintela wrote:
> From: Umesh Deshpande <udeshpan@redhat.com>
> 
> This will be used to detect if last_block might have become invalid
> across different calls to ram_save_live.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Umesh Deshpande <udeshpan@redhat.com>
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> ---
>  arch_init.c |    6 ++++++
>  cpu-all.h   |    1 +
>  exec.c      |    4 ++++
>  3 files changed, 11 insertions(+)
> 
> diff --git a/arch_init.c b/arch_init.c
> index 02d36ce..9e05aae 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -340,6 +340,7 @@ static int save_xbzrle_page(QEMUFile *f, uint8_t *current_data,
> 
>  static RAMBlock *last_block;
>  static ram_addr_t last_offset;
> +static uint32_t last_version;
> 
>  /*
>   * ram_save_block: Writes a page of memory to the stream f
> @@ -496,6 +497,7 @@ static void reset_ram_globals(void)
>  {
>      last_block = NULL;
>      last_offset = 0;
> +    last_version = ram_list.version;
>      sort_ram_list();
>  }
> 
> @@ -554,6 +556,10 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>      int i;
>      uint64_t expected_time;
> 
> +    if (ram_list.version != last_version) {
> +        reset_ram_globals();
> +    }
> +

Was it a bug in the old code to not do this? In the unthreaded version
it looks like last_block could still become invalid between iterations if it
was removed in the meantime, so I'm trying to understand why it's needed
now.

And why do we re-sort when this happens? The old implementation didn't
seem to place any guarantees on the block ordering beyond the initial
sorting by block->idstr. After that they were sent in MRU order.

>      bytes_transferred_last = bytes_transferred;
>      bwidth = qemu_get_clock_ns(rt_clock);
> 
> diff --git a/cpu-all.h b/cpu-all.h
> index ca3bb24..429b2c6 100644
> --- a/cpu-all.h
> +++ b/cpu-all.h
> @@ -486,6 +486,7 @@ typedef struct RAMBlock {
> 
>  typedef struct RAMList {
>      uint8_t *phys_dirty;
> +    uint32_t version;
>      QLIST_HEAD(, RAMBlock) blocks_mru;
>      QLIST_HEAD(, RAMBlock) blocks;
>      uint64_t dirty_pages;
> diff --git a/exec.c b/exec.c
> index afc472f..ceffa4f 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2555,6 +2555,8 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
>      QLIST_INSERT_HEAD(&ram_list.blocks, new_block, next);
>      QLIST_INSERT_HEAD(&ram_list.blocks_mru, new_block, next_mru);
> 
> +    ram_list.version++;
> +
>      ram_list.phys_dirty = g_realloc(ram_list.phys_dirty,
>                                         last_ram_offset() >> TARGET_PAGE_BITS);
>      cpu_physical_memory_set_dirty_range(new_block->offset, size, 0xff);
> @@ -2578,6 +2580,7 @@ void qemu_ram_free_from_ptr(ram_addr_t addr)
>          if (addr == block->offset) {
>              QLIST_REMOVE(block, next);
>              QLIST_REMOVE(block, next_mru);
> +            ram_list.version++;
>              g_free(block);
>              return;
>          }
> @@ -2592,6 +2595,7 @@ void qemu_ram_free(ram_addr_t addr)
>          if (addr == block->offset) {
>              QLIST_REMOVE(block, next);
>              QLIST_REMOVE(block, next_mru);
> +            ram_list.version++;
>              if (block->flags & RAM_PREALLOC_MASK) {
>                  ;
>              } else if (mem_path) {
> -- 
> 1.7.10.4
> 
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [PATCH 04/27] add a version number to ram_list
  2012-07-25 23:27   ` Michael Roth
@ 2012-07-26  9:19     ` Juan Quintela
  0 siblings, 0 replies; 45+ messages in thread
From: Juan Quintela @ 2012-07-26  9:19 UTC (permalink / raw)
  To: Michael Roth; +Cc: Paolo Bonzini, Umesh Deshpande, qemu-devel

Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> On Tue, Jul 24, 2012 at 08:36:29PM +0200, Juan Quintela wrote:
>> From: Umesh Deshpande <udeshpan@redhat.com>
>> 
>> This will be used to detect if last_block might have become invalid
>> across different calls to ram_save_live.

>
> Was it a bug in the old code to not do this? In the unthreaded version
> it looks like last_block could still become invalid between iterations if it
> was removed in the meantime, so I'm trying to understand why it's needed
> now.

Old code assumed that we _never_ do hotplug/unplug of anything during
migration (I am still not convinced that everything works if we plug
some devices, but now at least memory works).

> And why do we re-sort when this happens? The old implementation didn't
> seem to place any guarantees on the block ordering beyond the initial
> sorting by block->idstr. After that they were sent in MRU order.

It is a "subtle" attempt to try to get the memory in order when we
suspend to disk.  So we could compare two files produced while we were
stopped.  And no, I didn't like it when it was introduced.

Later, Juan.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [PATCH 10/27] Separate migration bitmap
  2012-07-25  9:16   ` Avi Kivity
@ 2012-07-26  9:22     ` Juan Quintela
  0 siblings, 0 replies; 45+ messages in thread
From: Juan Quintela @ 2012-07-26  9:22 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Paolo Bonzini, Umesh Deshpande, qemu-devel

Avi Kivity <avi@redhat.com> wrote:
> On 07/24/2012 09:36 PM, Juan Quintela wrote:
>> This patch creates a migration bitmap, which is periodically kept in
>> sync with the qemu bitmap. A separate copy of the dirty bitmap for the
>> migration limits the amount of concurrent access to the qemu bitmap
>> from iothread and migration thread (which requires taking the big
>> lock).
>> 
>> We use the qemu bitmap type.  We have to "undo" the dirty_pages
>> counting optimization on the general dirty bitmap and do the counting
>> optimization with the migration local bitmap.
>
> I had different plans for this (and a stale and possibly smelly patchset
> moving in that direction):
>
> - move the dirty bytemap from a single global instance to
> per-memory-region instances (in the MemoryRegion structure)
> - convert it from a bytemap to either a per-client bitmap (client =
> VGA/CODE/MIGRATION) or a variable bit-length bitfieldmap
> - allocate the bitmaps or strangely named thingie above on demand, so
> ordinarily you have nothing allocated and the framebuffer has a bitmap,
> when you start migration you allocate a bitmap for memory and a
> twobitmap for the framebuffer
> - protect the whole thing using rcu
>
> The patchset is stalled, mostly because it's very difficult to
> disentangle the tcg stuff.  I don't think we should introduce a
> dependency here, just something to keep in mind.

I tried something like that myself (before your memory regions work).
And got stuck on the same problem:
- for migration, I always had a ramblock handy
- for vga the same (well, I am not sure about the sparc ones, I think they
  were weird in that respect)
- for TCG, there is none around that I can find.  I guess it is there
  somewhere, but not in any obvious part.

So, to fix TCG, we need a TCG guru, and probably to change the access
patterns somehow :p

Later, Juan.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [RFC 00/27] Migration thread (WIP)
  2012-07-24 18:36 [Qemu-devel] [RFC 00/27] Migration thread (WIP) Juan Quintela
                   ` (27 preceding siblings ...)
  2012-07-25  9:55 ` [Qemu-devel] [RFC 00/27] Migration thread (WIP) Orit Wasserman
@ 2012-07-26 10:57 ` Jan Kiszka
  2012-07-26 11:16   ` Juan Quintela
       [not found] ` <500EF579.5040607@redhat.com>
  2012-07-26 21:36 ` [Qemu-devel] " Michael Roth
  30 siblings, 1 reply; 45+ messages in thread
From: Jan Kiszka @ 2012-07-26 10:57 UTC (permalink / raw)
  To: Juan Quintela; +Cc: qemu-devel

On 2012-07-24 20:36, Juan Quintela wrote:
> Hi
> 
> This series are on top of the migration-next-v5 series just posted.
> 
> First of all, this is an RFC/Work in progress.  Just a lot of people
> asked for it, and I would like review of the design.
> 
> It does:
> - get a new bitmap for migration, and that bitmap uses 1 bit by page
> - it unfolds migration_buffered_file.  Only one user existed.
> - it simplifies buffered_file a lot.
> 
> - About the migration thread, special attention was giving to try to
>   get the series review-able (reviewers would tell if I got it).
> 
> Basic design:
> - we create a new thread instead of a timer function
> - we move all the migration work to that thread (but run everything
>   except the waits with the iothread lock.
> - we move all the writting to outside the iothread lock.  i.e.
>   we walk the state with the iothread hold, and copy everything to one buffer.
>   then we write that buffer to the sockets outside the iothread lock.
> - once here, we move to writting synchronously to the sockets.
> - this allows us to simplify quite a lot.
> 
> And basically, that is it.  Notice that we still do the iterate page
> walking with the iothread held.  Light testing show that we got
> similar speed and latencies than without the thread (notice that
> almost no optimizations done here yet).
> 
> Appart of the review:
> - Are there any locking issues that I have missed (I guess so)
> - stop all cpus correctly.  vm_stop should be called from the iothread,
>   I use the trick of using a bottom half to get that working correctly.
>   but this _implementation_ is ugly as hell.  Is there an easy way
>   of doing it?

vm_stop is prepared to be called from vcpu context as well. I'm not sure
right now if we actually do, but the code is there.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [RFC 00/27] Migration thread (WIP)
  2012-07-26 10:57 ` Jan Kiszka
@ 2012-07-26 11:16   ` Juan Quintela
  2012-07-26 11:56     ` Jan Kiszka
  0 siblings, 1 reply; 45+ messages in thread
From: Juan Quintela @ 2012-07-26 11:16 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: qemu-devel

Jan Kiszka <jan.kiszka@siemens.com> wrote:
> On 2012-07-24 20:36, Juan Quintela wrote:
>> Hi

>> Appart of the review:
>> - Are there any locking issues that I have missed (I guess so)
>> - stop all cpus correctly.  vm_stop should be called from the iothread,
>>   I use the trick of using a bottom half to get that working correctly.
>>   but this _implementation_ is ugly as hell.  Is there an easy way
>>   of doing it?
>
> vm_stop is prepared to be called from vcpu context as well. I'm not sure
> right now if we actually do, but the code is there.

But this is a migration thread (i.e. neither the iothread nor a vcpu), and we
need to wait for vm_stop to finish.  My reading is that in vcpu context,
we just ask the iothread to stop all cpus.

void vm_stop(RunState state)
{
    if (!qemu_thread_is_self(&io_thread)) {
        qemu_system_vmstop_request(state);
        /*
         * FIXME: should not return to device code in case
         * vm_stop() has been requested.
         */
        cpu_stop_current();
        return;
    }
    do_vm_stop(state);
}

Or I am reading it wrong?

Later, Juan.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [RFC 00/27] Migration thread (WIP)
  2012-07-26 11:16   ` Juan Quintela
@ 2012-07-26 11:56     ` Jan Kiszka
  0 siblings, 0 replies; 45+ messages in thread
From: Jan Kiszka @ 2012-07-26 11:56 UTC (permalink / raw)
  To: quintela; +Cc: qemu-devel

On 2012-07-26 13:16, Juan Quintela wrote:
> Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> On 2012-07-24 20:36, Juan Quintela wrote:
>>> Hi
> 
>>> Appart of the review:
>>> - Are there any locking issues that I have missed (I guess so)
>>> - stop all cpus correctly.  vm_stop should be called from the iothread,
>>>   I use the trick of using a bottom half to get that working correctly.
>>>   but this _implementation_ is ugly as hell.  Is there an easy way
>>>   of doing it?
>>
>> vm_stop is prepared to be called from vcpu context as well. I'm not sure
>> right now if we actually do, but the code is there.
> 
> But this is a migation_thread (i.e. neither iothread of vcpu), and we
> need to wait for vm_stop to finish.  My reading is that in vcpu context,
> we just ask the iothread to stop all cpus.
> 
> void vm_stop(RunState state)
> {
>     if (!qemu_thread_is_self(&io_thread)) {
>         qemu_system_vmstop_request(state);
>         /*
>          * FIXME: should not return to device code in case
>          * vm_stop() has been requested.
>          */
>         cpu_stop_current();
>         return;
>     }
>     do_vm_stop(state);
> }
> 
> Or I am reading it wrong?

Ah, indeed. Did you try to make this service ready for such a use case
(sorry, didn't look at the code yet)? Something like issuing
qemu_system_vmstop_request and then resuming with the next step on a
vmstate change notification?
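[Editorial note: the "request the stop, then wait for the iothread to actually do it" pattern under discussion can be sketched in isolation with plain pthreads. The migration thread posts a request and blocks on a condition variable that the main-loop thread signals once the VM is really stopped. All names here are illustrative; they merely stand in for qemu_system_vmstop_request()/do_vm_stop() and are not QEMU APIs.]

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t stopped_cond = PTHREAD_COND_INITIALIZER;
static bool stop_requested;
static bool vm_stopped;

/* Migration thread side: ask for the stop, then wait until it happened. */
static void migration_thread_stop_vm(void)
{
    pthread_mutex_lock(&lock);
    stop_requested = true;          /* stands in for the vmstop request */
    while (!vm_stopped) {
        pthread_cond_wait(&stopped_cond, &lock);
    }
    pthread_mutex_unlock(&lock);
}

/* Main-loop side: notices the request (in QEMU this would be a bottom
 * half or the vmstop-request check), performs the actual stop, and
 * wakes the waiter. */
static void *iothread_main(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    while (!stop_requested) {
        /* a real main loop would poll here; simplified to a yield */
        pthread_mutex_unlock(&lock);
        sched_yield();
        pthread_mutex_lock(&lock);
    }
    vm_stopped = true;              /* stands in for do_vm_stop(state) */
    pthread_cond_broadcast(&stopped_cond);
    pthread_mutex_unlock(&lock);
    return NULL;
}
```

The key property is that the request and the wait are done under one lock, so the wakeup cannot be lost even if the iothread reacts immediately.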

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [PATCH 21/27] migration: stop all cpus correctly
  2012-07-24 18:36 ` [Qemu-devel] [PATCH 21/27] migration: stop all cpus correctly Juan Quintela
@ 2012-07-26 12:54   ` Eric Blake
  0 siblings, 0 replies; 45+ messages in thread
From: Eric Blake @ 2012-07-26 12:54 UTC (permalink / raw)
  To: Juan Quintela; +Cc: qemu-devel

On 07/24/2012 12:36 PM, Juan Quintela wrote:
> You can only stop all cpus from the iothread or an vcpu.  As we want
> to do it from the migration_thread, we need to do this dance with the
> botton handlers.

s/botton/bottom/ ?

> 
> This patch is a request for ideas.  I can move this function to cpus.c, but
> wondered if there is an easy way of doing this?

Sorry, I'm not able to help there.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [PATCH 02/27] split MRU ram list
  2012-07-25 20:20   ` Michael Roth
@ 2012-07-26 13:19     ` Avi Kivity
  0 siblings, 0 replies; 45+ messages in thread
From: Avi Kivity @ 2012-07-26 13:19 UTC (permalink / raw)
  To: Michael Roth; +Cc: Paolo Bonzini, ryanh, qemu-devel, Juan Quintela

On 07/25/2012 11:20 PM, Michael Roth wrote:
> On Tue, Jul 24, 2012 at 08:36:27PM +0200, Juan Quintela wrote:
>> From: Paolo Bonzini <pbonzini@redhat.com>
>> 
>> Outside the execution threads the normal, non-MRU-ized order of
>> the RAM blocks should always be enough.  So manage two separate
>> lists, which will have separate locking rules.
> 
> One thing I'm noticing is that, prior to this series, we're traversing the
> blocks in MRU order for migration. This seems counter-intuitive, as those are
> the blocks most likely to get re-dirtied and re-sent, so it make sense to hold
> off on sending those till last to reduce the amount of time the running guest
> has to invalidate the target's copy of it.
> 
> This isn't as bad as it could be, since we at least don't restart the
> loop on every iteration, but it might still make sense to come up with a way
> to keep RAMList.blocks roughly in sync with RAMList.blocks_mru, and then
> traverse that in reverse order for ram_save_iterate. The fact that we're
> switching from the MRU ordering in the current version might be
> obscuring performance issues as well, which is probably worth keeping in
> mind.
> 

Main memory is the only ram block which matters (the framebuffer a
remote second).  The others are ROMs.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] Fwd:  [RFC 00/27] Migration thread (WIP)
       [not found] ` <500EF579.5040607@redhat.com>
@ 2012-07-26 18:41   ` Chegu Vinod
  2012-07-26 21:26     ` Chegu Vinod
  0 siblings, 1 reply; 45+ messages in thread
From: Chegu Vinod @ 2012-07-26 18:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: Orit Wasserman, Juan Jose Quintela Carreira




> -------- Original Message --------
> Subject: 	[Qemu-devel] [RFC 00/27] Migration thread (WIP)
> Date: 	Tue, 24 Jul 2012 20:36:25 +0200
> From: 	Juan Quintela <quintela@redhat.com>
> To: 	qemu-devel@nongnu.org
>
>
>
> Hi
>
> This series are on top of the migration-next-v5 series just posted.
>
> First of all, this is an RFC/Work in progress.  Just a lot of people
> asked for it, and I would like review of the design.
Hello,

Thanks for sharing this early/WIP version for evaluation.

Still in the middle of code review, but wanted to share a couple of
quick observations.
I tried to use it to migrate a 128G/10VCPU guest (speed set to 10G and
downtime 2s).
Once with no workload (i.e. an idle guest) and a second time with
SpecJBB running in the guest.

The idle guest case seemed to migrate fine...


capabilities: xbzrle: off
Migration status: completed
transferred ram: 3811345 kbytes
remaining ram: 0 kbytes
total ram: 134226368 kbytes
total time: 199743 milliseconds


In the case of SpecJBB I ran into issues during stage 3... the source
host's qemu and the guest hung. I need to debug this more (if you
already have some hints, please let me know).


capabilities: xbzrle: off
Migration status: active
transferred ram: 127618578 kbytes
remaining ram: 2386832 kbytes
total ram: 134226368 kbytes
total time: 526139 milliseconds
(qemu) qemu_savevm_state_complete called
qemu_savevm_state_complete calling ram_save_complete

<---  hung somewhere after this (I need to get more info).


---

As with the non-migration-thread version, the SpecJBB workload completed
before the migration attempted to move to stage 3 (i.e. it didn't converge
while the workload was still active).

BTW, with this version of the bits (i.e. while running SpecJBB which is 
supposed to dirty quite a bit of memory) I noticed that there wasn't 
much change in the b/w usage of the dedicated 10Gb private network link 
(It was still < ~1.5-3.0Gb/sec).   Expected this to be a little better 
since we have a separate thread...  not sure what else is in play here ? 
(numa locality of where the migration thread runs or something other 
basic tuning in the implementation ?)

I have a high-level design question... (perhaps folks have already
thought about it and categorized it as a potential future optimization?)

Would it be possible to offload the iothread completely [from all
migration-related activity], have one thread (with the appropriate
protection) get involved with collecting the list of dirty pages, and
have one or more threads dedicated to pushing multiple streams of data
to saturate the allocated network bandwidth?  This may help in large +
busy guests. Comments?  There are perhaps other implications of doing
all of this (like burning more host cpu cycles), but perhaps this can be
configurable based on the user's needs... e.g. fewer but larger guests
on a host with no oversubscription.

Thanks
Vinod


> It does:
> - get a new bitmap for migration, and that bitmap uses 1 bit by page
> - it unfolds migration_buffered_file.  Only one user existed.
> - it simplifies buffered_file a lot.
>
> - About the migration thread, special attention was given to trying to
>    get the series review-able (reviewers will tell me if I got it).
>
> Basic design:
> - we create a new thread instead of a timer function
> - we move all the migration work to that thread (but run everything
>    except the waits with the iothread lock held)
> - we move all the writing to outside the iothread lock, i.e.
>    we walk the state with the iothread lock held, and copy everything to one buffer,
>    then we write that buffer to the sockets outside the iothread lock
> - once here, we move to writing synchronously to the sockets
> - this allows us to simplify quite a lot
>
> And basically, that is it.  Notice that we still do the iterative page
> walking with the iothread lock held.  Light testing shows that we get
> similar speed and latencies to those without the thread (notice that
> almost no optimizations have been done here yet).
>
> Apart from the review:
> - Are there any locking issues that I have missed? (I guess so.)
> - Stop all cpus correctly.  vm_stop should be called from the iothread;
>    I use the trick of using a bottom half to get that working correctly,
>    but this _implementation_ is ugly as hell.  Is there an easy way
>    of doing it?
> - Do I really have to export last_ram_offset()?  Is there no other way
>    of knowing the amount of RAM?
>
> Known issues:
>
> - for some reason, when it has to start a 2nd round of bitmap
>    handling, it decides to dirty all pages.  I still haven't found why
>    this happens.
>
> If you can test it, and tell me where it breaks, it would also help.
>
> Work is based on Umesh's thread work, and work that Paolo Bonzini had
> done on top of that.  All the migration thread code was done from scratch
> because I was unable to debug why it was failing, but it "owes" a lot
> to the previous design.
>
> Thanks in advance, Juan.
>
> The following changes since commit a21143486b9c6d7a50b7b62877c02b3c686943cb:
>
>    Merge remote-tracking branch 'stefanha/net' into staging (2012-07-23 13:15:34 -0500)
>
> are available in the git repository at:
>
>
>    http://repo.or.cz/r/qemu/quintela.git  migration-thread-v1
>
> for you to fetch changes up to 27e539b03ba97bc37e107755bcb44511ec4c8100:
>
>    buffered_file: unfold buffered_append in buffered_put_buffer (2012-07-24 16:46:13 +0200)
>
>
> Juan Quintela (23):
>    buffered_file: g_realloc() can't fail
>    savevm: Factorize ram globals reset in its own function
>    ram: introduce migration_bitmap_set_dirty()
>    ram: Introduce migration_bitmap_test_and_reset_dirty()
>    ram: Export last_ram_offset()
>    ram: introduce migration_bitmap_sync()
>    Separate migration bitmap
>    buffered_file: rename opaque to migration_state
>    buffered_file: opaque is MigrationState
>    buffered_file: unfold migrate_fd_put_buffer
>    buffered_file: unfold migrate_fd_put_ready
>    buffered_file: unfold migrate_fd_put_buffer
>    buffered_file: unfold migrate_fd_put_buffer
>    buffered_file: We can access directly to bandwidth_limit
>    buffered_file: Move from using a timer to use a thread
>    migration: make qemu_fopen_ops_buffered() return void
>    migration: stop all cpus correctly
>    migration: make writes blocking
>    migration: remove unfreeze logic
>    migration: take finer locking
>    buffered_file: Unfold the trick to restart generating migration data
>    buffered_file: don't flush on put buffer
>    buffered_file: unfold buffered_append in buffered_put_buffer
>
> Paolo Bonzini (2):
>    split MRU ram list
>    BufferedFile: append, then flush
>
> Umesh Deshpande (2):
>    add a version number to ram_list
>    protect the ramlist with a separate mutex
>
>   arch_init.c      |  108 +++++++++++++++++++++++++-------
>   buffered_file.c  |  179 +++++++++++++++++-------------------------------------
>   buffered_file.h  |   12 +---
>   cpu-all.h        |   17 +++++-
>   exec-obsolete.h  |   10 ---
>   exec.c           |   45 +++++++++++---
>   migration-exec.c |    2 -
>   migration-fd.c   |    6 --
>   migration-tcp.c  |    2 +-
>   migration-unix.c |    2 -
>   migration.c      |  111 ++++++++++++++-------------------
>   migration.h      |    6 ++
>   qemu-file.h      |    5 --
>   savevm.c         |    5 --
>   14 files changed, 249 insertions(+), 261 deletions(-)
>
> -- 
> 1.7.10.4
>



[-- Attachment #2: Type: text/html, Size: 9329 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] Fwd:  [RFC 00/27] Migration thread (WIP)
  2012-07-26 18:41   ` [Qemu-devel] Fwd: " Chegu Vinod
@ 2012-07-26 21:26     ` Chegu Vinod
  2012-07-27 11:05       ` Juan Quintela
  0 siblings, 1 reply; 45+ messages in thread
From: Chegu Vinod @ 2012-07-26 21:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: Orit Wasserman, Juan Jose Quintela Carreira

[-- Attachment #1: Type: text/plain, Size: 8066 bytes --]

On 7/26/2012 11:41 AM, Chegu Vinod wrote:
>
> [...]
>
> In the case of the SpecJBB I ran into issues during stage 3...the 
> source host's qemu and the guest hung. I need to debug this more... 
> (if  already have some hints pl. let me know.).
>
>
> capabilities: xbzrle: off
> Migration status: active
> transferred ram: 127618578 kbytes
> remaining ram: 2386832 kbytes
> total ram: 134226368 kbytes
> total time: 526139 milliseconds
> (qemu) qemu_savevm_state_complete called
> qemu_savevm_state_complete calling ram_save_complete
>
> <---  hung somewhere after this ('need to get more info).
>


Appears to be some race condition, as there are cases when it hangs and
cases when it succeeds.

(qemu) info migrate
capabilities: xbzrle: off
Migration status: completed
transferred ram: 129937687 kbytes
remaining ram: 0 kbytes
total ram: 134226368 kbytes
total time: 543228 milliseconds

Need to review/debug...

Vinod


>
> [...]

[-- Attachment #2: Type: text/html, Size: 10399 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [RFC 00/27] Migration thread (WIP)
  2012-07-24 18:36 [Qemu-devel] [RFC 00/27] Migration thread (WIP) Juan Quintela
                   ` (29 preceding siblings ...)
       [not found] ` <500EF579.5040607@redhat.com>
@ 2012-07-26 21:36 ` Michael Roth
  2012-08-02 12:01   ` Juan Quintela
  30 siblings, 1 reply; 45+ messages in thread
From: Michael Roth @ 2012-07-26 21:36 UTC (permalink / raw)
  To: Juan Quintela; +Cc: qemu-devel

On Tue, Jul 24, 2012 at 08:36:25PM +0200, Juan Quintela wrote:
> Hi
> 
> This series are on top of the migration-next-v5 series just posted.
> 
> First of all, this is an RFC/Work in progress.  Just a lot of people
> asked for it, and I would like review of the design.
> 
> It does:
> - get a new bitmap for migration, and that bitmap uses 1 bit by page
> - it unfolds migration_buffered_file.  Only one user existed.
> - it simplifies buffered_file a lot.
> 
> - About the migration thread, special attention was giving to try to
>   get the series review-able (reviewers would tell if I got it).
> 
> Basic design:
> - we create a new thread instead of a timer function
> - we move all the migration work to that thread (but run everything
>   except the waits with the iothread lock.
> - we move all the writting to outside the iothread lock.  i.e.
>   we walk the state with the iothread hold, and copy everything to one buffer.
>   then we write that buffer to the sockets outside the iothread lock.
> - once here, we move to writting synchronously to the sockets.
> - this allows us to simplify quite a lot.
> 
> And basically, that is it.  Notice that we still do the iterate page
> walking with the iothread held.  Light testing show that we got

Is the plan to eventually hold the iothread lock only to re-sync the
dirty bitmap, and then rely on qemu_mutex_lock_ramlist() to walk the
ramlist?

It seems like the non-MRU block list, "local" migration
bitmap copy, and ramlist mutex are all toward this end, but we still
have basically the same locking protocol as before. If not, can you elaborate
on what purpose they serve in this series?

> similar speed and latencies than without the thread (notice that
> almost no optimizations done here yet).
> 
> [...]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] Fwd: [RFC 00/27] Migration thread (WIP)
  2012-07-26 21:26     ` Chegu Vinod
@ 2012-07-27 11:05       ` Juan Quintela
       [not found]         ` <4168C988EBDF2141B4E0B6475B6A73D1165CDD@G6W2493.americas.hpqcorp.net>
  0 siblings, 1 reply; 45+ messages in thread
From: Juan Quintela @ 2012-07-27 11:05 UTC (permalink / raw)
  To: Chegu Vinod; +Cc: Orit Wasserman, qemu-devel

Chegu Vinod <chegu_vinod@hp.com> wrote:
> On 7/26/2012 11:41 AM, Chegu Vinod wrote:
>
> [...]
>
> Appears to be some race condition...as there are cases when it hangs
> and in some cases it succeeds.

Weird guess: try to use fewer vcpus, same ram.  The way that we stop
cpus is _hacky_, to put it mildly.  I will try to think about that part.

Thanks for the testing.  All my testing has been done with 8GB guests
and 2 vcpus.  I will try with more vcpus to see if it makes a difference.




>
> (qemu) info migrate
> capabilities: xbzrle: off 
> Migration status: completed
> transferred ram: 129937687 kbytes
> remaining ram: 0 kbytes
> total ram: 134226368 kbytes
> total time: 543228 milliseconds

Hmm, _that_ is more strange.  This means that it finished.  Could you
run qemu under gdb and send me the stack traces?

I don't know your gdb thread kung-fu, so here are the instructions just
in case:

gdb --args <exact qemu command line you used>
C-c to break when it hangs
(gdb) info threads
    shows all the running threads
(gdb) thread 1
    or whatever other number
(gdb) bt
    the backtrace of that thread

I am especially interested in the backtraces of the migration thread
and of the iothread.

Thanks, Juan.

>
> [...]
^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] FW: Fwd: [RFC 00/27] Migration thread (WIP)
       [not found]         ` <4168C988EBDF2141B4E0B6475B6A73D1165CDD@G6W2493.americas.hpqcorp.net>
@ 2012-07-27 14:21           ` Chegu Vinod
  0 siblings, 0 replies; 45+ messages in thread
From: Chegu Vinod @ 2012-07-27 14:21 UTC (permalink / raw)
  To: Juan Jose Quintela Carreira; +Cc: Orit Wasserman, qemu-devel

On 7/27/2012 7:11 AM, Vinod, Chegu wrote:
>
> -----Original Message-----
> From: Juan Quintela [mailto:quintela@redhat.com]
> Sent: Friday, July 27, 2012 4:06 AM
> To: Vinod, Chegu
> Cc: qemu-devel@nongnu.org; Orit Wasserman
> Subject: Re: Fwd: [RFC 00/27] Migration thread (WIP)
>
> Chegu Vinod <chegu_vinod@hp.com> wrote:
>> On 7/26/2012 11:41 AM, Chegu Vinod wrote:
>>
>>
>>      
>>          
>>          -------- Original Message --------
>>
>>                                                                     
>>           Subject:  [Qemu-devel] [RFC 00/27] Migration thread (WIP)
>>                                                                     
>>           Date:     Tue, 24 Jul 2012 20:36:25 +0200
>>                                                                     
>>           From:     Juan Quintela <quintela@redhat.com>
>>                                                                     
>>           To:       qemu-devel@nongnu.org
>>                                                                     
>>
>>          
>>          
>>          Hi
>>
>> This series are on top of the migration-next-v5 series just posted.
>>
>> First of all, this is an RFC/Work in progress.  Just a lot of people
>> asked for it, and I would like review of the design.
>>
>>      Hello,
>>      
>>      Thanks for sharing this early/WIP version for evaluation.
>>      
>>      Still in the middle of code review, but wanted to share a couple
>>      of quick observations.
>>      I tried to use it to migrate a 128G/10VCPU guest (speed set to
>>      10G and downtime 2s),
>>      once with no workload (i.e. idle guest) and a second time with
>>      SpecJBB running in the guest.
>>      
>>      The idle guest case seemed to migrate fine...
>>      
>>      
>>      capabilities: xbzrle: off
>>      Migration status: completed
>>      transferred ram: 3811345 kbytes
>>      remaining ram: 0 kbytes
>>      total ram: 134226368 kbytes
>>      total time: 199743 milliseconds
>>      
>>      
>>      In the case of SpecJBB I ran into issues during stage 3... the
>>      source host's qemu and the guest hung. I need to debug this
>>      more... (if you already have some hints, please let me know).
>>      
>>      
>>      capabilities: xbzrle: off
>>      Migration status: active
>>      transferred ram: 127618578 kbytes
>>      remaining ram: 2386832 kbytes
>>      total ram: 134226368 kbytes
>>      total time: 526139 milliseconds
>>      (qemu) qemu_savevm_state_complete called
>>      qemu_savevm_state_complete calling ram_save_complete
>>       
>>      <---  hung somewhere after this ('need to get more info).
>>      
>>      
>>
>>
>> Appears to be some race condition... in some cases it hangs and in
>> others it succeeds.
> A wild guess: try using fewer vcpus, same RAM.

Ok..will try that.
> The way that we stop cpus is _hacky_, to put it mildly.  Will try to think about that part.
Ok.
> Thanks for the testing.  All my testing has been done with 8GB guests and 2 vcpus.  Will try with more vcpus to see if it makes a difference.
>
>
>
>
>> (qemu) info migrate
>> capabilities: xbzrle: off
>> Migration status: completed
>> transferred ram: 129937687 kbytes
>> remaining ram: 0 kbytes
>> total ram: 134226368 kbytes
>> total time: 543228 milliseconds
> Hmm, _that_ is stranger.  This means that it finished.

There are cases where the migration is finishing just fine... even with 
larger guest configurations (256G/20VCPUs).


>   Could you run qemu under gdb and send me the stack traces?
>
> I don't know your gdb thread kung-fu, so here are the instructions just in case:
>
> gdb --args <exact qemu command line you used>
> C-c                 (break in when it hangs)
> (gdb) info threads  (you see all the threads running)
> (gdb) thread 1      (or whatever other number)
> (gdb) bt            (the backtrace of that thread)

The hang is intermittent...
I ran it 4-5 times (under gdb) just now and I didn't see the issue :-(


> I am especially interested in the backtraces of the migration thread and of the iothread.

Will keep re-trying with different configs and see if I get lucky in 
reproducing it (under gdb).

Vinod
>
> Thanks, Juan.
>
>> Need to review/debug...
>>
>> Vinod
>>
>>
>>
>>      ---
>>      
>>      As with the non-migration-thread version, the SpecJBB workload
>>      completed before the migration attempted to move to stage 3
>>      (i.e. it didn't converge while the workload was still active).
>>      
>>      BTW, with this version of the bits (i.e. while running SpecJBB,
>>      which is supposed to dirty quite a bit of memory) I noticed that
>>      there wasn't much change in the bandwidth usage of the dedicated
>>      10Gb private network link (it was still < ~1.5-3.0Gb/sec).
>>      Expected this to be a little better since we have a separate
>>      thread... not sure what else is in play here? (numa locality of
>>      where the migration thread runs, or some other basic tuning in
>>      the implementation?)
>>      
>>      I have a high-level design question... (perhaps folks have
>>      already thought about it and categorized it as a potential
>>      future optimization?)
>>      
>>      Would it be possible to offload the iothread completely [from
>>      all migration related activity] and have one thread (with the
>>      appropriate protection) get involved with collecting the list
>>      of dirty pages, and one or more threads dedicated to pushing
>>      multiple streams of data to saturate the allocated network
>>      bandwidth?  This may help with large + busy guests.  Comments?
>>      There are perhaps other implications of doing all of this (like
>>      burning more host cpu cycles), but perhaps this can be made
>>      configurable based on the user's needs... e.g. fewer but larger
>>      guests on a host with no oversubscription.
>>      
>>      Thanks
>>      Vinod
>>      
>>      
>>          
>>          
>>          It does:
>> - get a new bitmap for migration, and that bitmap uses 1 bit by page
>> - it unfolds migration_buffered_file.  Only one user existed.
>> - it simplifies buffered_file a lot.
>>
>> - About the migration thread, special attention was given to trying
>>    to get the series review-able (reviewers will tell if I got it).
>>
>> Basic design:
>> - we create a new thread instead of a timer function
>> - we move all the migration work to that thread (but run everything
>>    except the waits with the iothread lock held)
>> - we move all the writing to outside the iothread lock.  i.e.
>>    we walk the state with the iothread lock held, and copy everything
>>    to one buffer.  then we write that buffer to the sockets outside
>>    the iothread lock.
>> - once here, we move to writing synchronously to the sockets.
>> - this allows us to simplify quite a lot.
>>
>> And basically, that is it.  Notice that we still do the iterative
>> page walking with the iothread lock held.  Light testing shows that
>> we get similar speed and latencies to the version without the thread
>> (notice that almost no optimizations have been done here yet).
>>
>> Apart from the review:
>> - Are there any locking issues that I have missed? (I guess so.)
>> - Stop all cpus correctly.  vm_stop should be called from the
>>    iothread; I use the trick of using a bottom half to get that
>>    working correctly, but this _implementation_ is ugly as hell.  Is
>>    there an easy way of doing it?
>> - Do I really have to export last_ram_offset()?  Is there no other
>>    way of knowing the amount of RAM?
>>
>> Known issues:
>>
>> - for some reason, when it has to start a 2nd round of bitmap
>>    handling, it decides to dirty all pages.  Still haven't found why
>>    this happens.
>>
>> If you can test it and tell me where it breaks, that would also help.
>>
>> Work is based on Umesh's thread work, and on work that Paolo Bonzini
>> did on top of that.  The migration thread itself was done from
>> scratch because I was unable to debug why the old one was failing,
>> but it "owes" a lot to the previous design.
>>
>> Thanks in advance, Juan.
>>
>> The following changes since commit a21143486b9c6d7a50b7b62877c02b3c686943cb:
>>
>>    Merge remote-tracking branch 'stefanha/net' into staging (2012-07-23
>> 13:15:34 -0500)
>>
>> are available in the git repository at:
>>
>>
>>    http://repo.or.cz/r/qemu/quintela.git migration-thread-v1
>>
>> for you to fetch changes up to 27e539b03ba97bc37e107755bcb44511ec4c8100:
>>
>>    buffered_file: unfold buffered_append in buffered_put_buffer
>> (2012-07-24 16:46:13 +0200)
>>
>>
>> Juan Quintela (23):
>>    buffered_file: g_realloc() can't fail
>>    savevm: Factorize ram globals reset in its own function
>>    ram: introduce migration_bitmap_set_dirty()
>>    ram: Introduce migration_bitmap_test_and_reset_dirty()
>>    ram: Export last_ram_offset()
>>    ram: introduce migration_bitmap_sync()
>>    Separate migration bitmap
>>    buffered_file: rename opaque to migration_state
>>    buffered_file: opaque is MigrationState
>>    buffered_file: unfold migrate_fd_put_buffer
>>    buffered_file: unfold migrate_fd_put_ready
>>    buffered_file: unfold migrate_fd_put_buffer
>>    buffered_file: unfold migrate_fd_put_buffer
>>    buffered_file: We can access directly to bandwidth_limit
>>    buffered_file: Move from using a timer to use a thread
>>    migration: make qemu_fopen_ops_buffered() return void
>>    migration: stop all cpus correctly
>>    migration: make writes blocking
>>    migration: remove unfreeze logic
>>    migration: take finer locking
>>    buffered_file: Unfold the trick to restart generating migration data
>>    buffered_file: don't flush on put buffer
>>    buffered_file: unfold buffered_append in buffered_put_buffer
>>
>> Paolo Bonzini (2):
>>    split MRU ram list
>>    BufferedFile: append, then flush
>>
>> Umesh Deshpande (2):
>>    add a version number to ram_list
>>    protect the ramlist with a separate mutex
>>
>>   arch_init.c      |  108 +++++++++++++++++++++++++-------
>>   buffered_file.c  |  179 +++++++++++++++++-------------------------------------
>>   buffered_file.h  |   12 +---
>>   cpu-all.h        |   17 +++++-
>>   exec-obsolete.h  |   10 ---
>>   exec.c           |   45 +++++++++++---
>>   migration-exec.c |    2 -
>>   migration-fd.c   |    6 --
>>   migration-tcp.c  |    2 +-
>>   migration-unix.c |    2 -
>>   migration.c      |  111 ++++++++++++++-------------------
>>   migration.h      |    6 ++
>>   qemu-file.h      |    5 --
>>   savevm.c         |    5 --
>>   14 files changed, 249 insertions(+), 261 deletions(-)


* Re: [Qemu-devel] [RFC 00/27] Migration thread (WIP)
  2012-07-26 21:36 ` [Qemu-devel] " Michael Roth
@ 2012-08-02 12:01   ` Juan Quintela
  0 siblings, 0 replies; 45+ messages in thread
From: Juan Quintela @ 2012-08-02 12:01 UTC (permalink / raw)
  To: Michael Roth; +Cc: qemu-devel

Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> On Tue, Jul 24, 2012 at 08:36:25PM +0200, Juan Quintela wrote:
>> Hi
>> 
>> This series is on top of the migration-next-v5 series just posted.
>> 
>> First of all, this is an RFC/Work in progress.  Just a lot of people
>> asked for it, and I would like review of the design.
>> 
>> It does:
>> - get a new bitmap for migration, and that bitmap uses 1 bit by page
>> - it unfolds migration_buffered_file.  Only one user existed.
>> - it simplifies buffered_file a lot.
>> 
>> - About the migration thread, special attention was given to trying
>>   to get the series review-able (reviewers will tell if I got it).
>> 
>> Basic design:
>> - we create a new thread instead of a timer function
>> - we move all the migration work to that thread (but run everything
>>   except the waits with the iothread lock held)
>> - we move all the writing to outside the iothread lock.  i.e.
>>   we walk the state with the iothread lock held, and copy everything
>>   to one buffer.  then we write that buffer to the sockets outside
>>   the iothread lock.
>> - once here, we move to writing synchronously to the sockets.
>> - this allows us to simplify quite a lot.
>> 
>> And basically, that is it.  Notice that we still do the iterative
>> page walking with the iothread lock held.  Light testing shows that we got
>
> Is the plan to eventually hold the iothread lock only to re-sync the
> dirty bitmap, and then rely on qemu_mutex_lock_ramlist() to walk the
> ramlist?

Yeap.  I want to drop the walking of the RAM without the iothread
lock, but we aren't there yet.  This series basically moves things so
that:
- all migration operations are done in their own thread
- all copying from "guest" to "buffer" is done with the iothread lock
  (to call it something) held
- all writes from that buffer to the socket/fd are done synchronously
  and without the iothread lock
- we measure bandwidth/downtime more correctly (still not perfect)

> It seems like the non-MRU block list, "local" migration
> bitmap copy, and ramlist mutex are all toward this end, but we still
> have basically the same locking protocol as before. If not, can you elaborate
> on what purpose they serve in this series?

The problem with moving the "live part" outside is that we still have
some unresolved locking issues.

Later, Juan.


end of thread, other threads:[~2012-08-02 12:03 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-24 18:36 [Qemu-devel] [RFC 00/27] Migration thread (WIP) Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 01/27] buffered_file: g_realloc() can't fail Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 02/27] split MRU ram list Juan Quintela
2012-07-25 20:20   ` Michael Roth
2012-07-26 13:19     ` Avi Kivity
2012-07-24 18:36 ` [Qemu-devel] [PATCH 03/27] savevm: Factorize ram globals reset in its own function Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 04/27] add a version number to ram_list Juan Quintela
2012-07-25 23:27   ` Michael Roth
2012-07-26  9:19     ` Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 05/27] protect the ramlist with a separate mutex Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 06/27] ram: introduce migration_bitmap_set_dirty() Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 07/27] ram: Introduce migration_bitmap_test_and_reset_dirty() Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 08/27] ram: Export last_ram_offset() Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 09/27] ram: introduce migration_bitmap_sync() Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 10/27] Separate migration bitmap Juan Quintela
2012-07-25  9:16   ` Avi Kivity
2012-07-26  9:22     ` Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 11/27] BufferedFile: append, then flush Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 12/27] buffered_file: rename opaque to migration_state Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 13/27] buffered_file: opaque is MigrationState Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 14/27] buffered_file: unfold migrate_fd_put_buffer Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 15/27] buffered_file: unfold migrate_fd_put_ready Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 16/27] buffered_file: unfold migrate_fd_put_buffer Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 17/27] " Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 18/27] buffered_file: We can access directly to bandwidth_limit Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 19/27] buffered_file: Move from using a timer to use a thread Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 20/27] migration: make qemu_fopen_ops_buffered() return void Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 21/27] migration: stop all cpus correctly Juan Quintela
2012-07-26 12:54   ` Eric Blake
2012-07-24 18:36 ` [Qemu-devel] [PATCH 22/27] migration: make writes blocking Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 23/27] migration: remove unfreeze logic Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 24/27] migration: take finer locking Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 25/27] buffered_file: Unfold the trick to restart generating migration data Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 26/27] buffered_file: don't flush on put buffer Juan Quintela
2012-07-24 18:36 ` [Qemu-devel] [PATCH 27/27] buffered_file: unfold buffered_append in buffered_put_buffer Juan Quintela
2012-07-25  9:55 ` [Qemu-devel] [RFC 00/27] Migration thread (WIP) Orit Wasserman
2012-07-26 10:57 ` Jan Kiszka
2012-07-26 11:16   ` Juan Quintela
2012-07-26 11:56     ` Jan Kiszka
     [not found] ` <500EF579.5040607@redhat.com>
2012-07-26 18:41   ` [Qemu-devel] Fwd: " Chegu Vinod
2012-07-26 21:26     ` Chegu Vinod
2012-07-27 11:05       ` Juan Quintela
     [not found]         ` <4168C988EBDF2141B4E0B6475B6A73D1165CDD@G6W2493.americas.hpqcorp.net>
2012-07-27 14:21           ` [Qemu-devel] FW: " Chegu Vinod
2012-07-26 21:36 ` [Qemu-devel] " Michael Roth
2012-08-02 12:01   ` Juan Quintela
