* [PATCH 0/5] Separate thread for VM migration
@ 2011-08-27 18:09 ` Umesh Deshpande
  0 siblings, 0 replies; 36+ messages in thread
From: Umesh Deshpande @ 2011-08-27 18:09 UTC (permalink / raw)
  To: kvm, qemu-devel; +Cc: Umesh Deshpande

This patch series deals with VCPU and iothread starvation during the live
migration of a guest. Currently, the iothread is responsible for performing the
migration: it holds qemu_mutex for the duration of the migration, which prevents
VCPUs from entering qemu mode and delays their return to the guest. The
migration, executed as an iohandler, also delays the execution of other
iohandlers.

In this series, the migration has been moved to a separate thread to reduce
qemu_mutex contention and iohandler starvation.

Umesh Deshpande (5):
  vm_stop from non-io threads
  MRU ram block list
  migration thread mutex
  separate migration bitmap
  separate migration thread

 arch_init.c         |   38 +++++++++++++----
 buffered_file.c     |   76 ++++++++++++++++++---------------
 cpu-all.h           |   42 ++++++++++++++++++
 cpus.c              |    4 +-
 exec.c              |   97 ++++++++++++++++++++++++++++++++++++++++--
 migration.c         |  117 ++++++++++++++++++++++++++++++++-------------------
 migration.h         |    9 ++++
 qemu-common.h       |    2 +
 qemu-thread-posix.c |   10 ++++
 qemu-thread.h       |    1 +
 10 files changed, 302 insertions(+), 94 deletions(-)

-- 
1.7.4.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 1/5] Support for vm_stop from the migration thread
  2011-08-27 18:09 ` [Qemu-devel] " Umesh Deshpande
@ 2011-08-27 18:09   ` Umesh Deshpande
  -1 siblings, 0 replies; 36+ messages in thread
From: Umesh Deshpande @ 2011-08-27 18:09 UTC (permalink / raw)
  To: kvm, qemu-devel; +Cc: Umesh Deshpande

Currently, when any thread other than the iothread calls vm_stop, the stop is
scheduled to be executed later by the iothread. This patch allows vm_stop to be
executed directly from threads other than the iothread, which is especially
helpful when the migration is moved into a separate thread.

Signed-off-by: Umesh Deshpande <udeshpan@redhat.com>
---
 cpus.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/cpus.c b/cpus.c
index de70e02..f35f683 100644
--- a/cpus.c
+++ b/cpus.c
@@ -122,8 +122,8 @@ static void do_vm_stop(int reason)
 {
     if (vm_running) {
         cpu_disable_ticks();
-        vm_running = 0;
         pause_all_vcpus();
+        vm_running = 0;
         vm_state_notify(0, reason);
         qemu_aio_flush();
         bdrv_flush_all();
@@ -1027,7 +1027,7 @@ void cpu_stop_current(void)
 
 void vm_stop(int reason)
 {
-    if (!qemu_thread_is_self(&io_thread)) {
+    if (cpu_single_env) {
         qemu_system_vmstop_request(reason);
         /*
          * FIXME: should not return to device code in case
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 2/5] MRU ram block list
  2011-08-27 18:09 ` [Qemu-devel] " Umesh Deshpande
@ 2011-08-27 18:09   ` Umesh Deshpande
  -1 siblings, 0 replies; 36+ messages in thread
From: Umesh Deshpande @ 2011-08-27 18:09 UTC (permalink / raw)
  To: kvm, qemu-devel; +Cc: Umesh Deshpande, Paolo Bonzini

This patch creates a new list of RAM blocks kept in MRU order, so that separate
locking rules can be applied to the regular RAM block list and the MRU list.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 cpu-all.h |    2 ++
 exec.c    |   17 ++++++++++++-----
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/cpu-all.h b/cpu-all.h
index e839100..6b217a2 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -925,6 +925,7 @@ typedef struct RAMBlock {
     uint32_t flags;
     char idstr[256];
     QLIST_ENTRY(RAMBlock) next;
+    QLIST_ENTRY(RAMBlock) next_mru;
 #if defined(__linux__) && !defined(TARGET_S390X)
     int fd;
 #endif
@@ -933,6 +934,7 @@ typedef struct RAMBlock {
 typedef struct RAMList {
     uint8_t *phys_dirty;
     QLIST_HEAD(ram, RAMBlock) blocks;
+    QLIST_HEAD(, RAMBlock) blocks_mru;
 } RAMList;
 extern RAMList ram_list;
 
diff --git a/exec.c b/exec.c
index 0e2ce57..c5c247c 100644
--- a/exec.c
+++ b/exec.c
@@ -113,7 +113,11 @@ static uint8_t *code_gen_ptr;
 int phys_ram_fd;
 static int in_migration;
 
-RAMList ram_list = { .blocks = QLIST_HEAD_INITIALIZER(ram_list) };
+RAMList ram_list = {
+    .blocks = QLIST_HEAD_INITIALIZER(ram_list),
+    .blocks_mru = QLIST_HEAD_INITIALIZER(ram_list.blocks_mru)
+};
+
 #endif
 
 CPUState *first_cpu;
@@ -2973,6 +2977,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name,
     new_block->length = size;
 
     QLIST_INSERT_HEAD(&ram_list.blocks, new_block, next);
+    QLIST_INSERT_HEAD(&ram_list.blocks_mru, new_block, next_mru);
 
     ram_list.phys_dirty = qemu_realloc(ram_list.phys_dirty,
                                        last_ram_offset() >> TARGET_PAGE_BITS);
@@ -2997,6 +3002,7 @@ void qemu_ram_free_from_ptr(ram_addr_t addr)
     QLIST_FOREACH(block, &ram_list.blocks, next) {
         if (addr == block->offset) {
             QLIST_REMOVE(block, next);
+            QLIST_REMOVE(block, next_mru);
             qemu_free(block);
             return;
         }
@@ -3010,6 +3016,7 @@ void qemu_ram_free(ram_addr_t addr)
     QLIST_FOREACH(block, &ram_list.blocks, next) {
         if (addr == block->offset) {
             QLIST_REMOVE(block, next);
+            QLIST_REMOVE(block, next_mru);
             if (block->flags & RAM_PREALLOC_MASK) {
                 ;
             } else if (mem_path) {
@@ -3113,12 +3120,12 @@ void *qemu_get_ram_ptr(ram_addr_t addr)
 {
     RAMBlock *block;
 
-    QLIST_FOREACH(block, &ram_list.blocks, next) {
+    QLIST_FOREACH(block, &ram_list.blocks_mru, next_mru) {
         if (addr - block->offset < block->length) {
             /* Move this entry to to start of the list.  */
             if (block != QLIST_FIRST(&ram_list.blocks)) {
-                QLIST_REMOVE(block, next);
-                QLIST_INSERT_HEAD(&ram_list.blocks, block, next);
+                QLIST_REMOVE(block, next_mru);
+                QLIST_INSERT_HEAD(&ram_list.blocks_mru, block, next_mru);
             }
             if (xen_mapcache_enabled()) {
                 /* We need to check if the requested address is in the RAM
@@ -3211,7 +3218,7 @@ int qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
         return 0;
     }
 
-    QLIST_FOREACH(block, &ram_list.blocks, next) {
+    QLIST_FOREACH(block, &ram_list.blocks_mru, next_mru) {
         /* This case append when the block is not mapped. */
         if (block->host == NULL) {
             continue;
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 3/5] Migration thread mutex
  2011-08-27 18:09 ` [Qemu-devel] " Umesh Deshpande
@ 2011-08-27 18:09   ` Umesh Deshpande
  -1 siblings, 0 replies; 36+ messages in thread
From: Umesh Deshpande @ 2011-08-27 18:09 UTC (permalink / raw)
  To: kvm, qemu-devel; +Cc: Umesh Deshpande

This patch implements the migrate_ram mutex, which protects the migration
thread's traversal of the RAMBlock list during RAM transfer against concurrent
block addition/removal from the iothread.

Note: The combination of the iothread mutex and the migration thread mutex
works as a rw-lock. Both mutexes are acquired while modifying the ram_list
members or the RAM block list.

Signed-off-by: Umesh Deshpande <udeshpan@redhat.com>
---
 arch_init.c   |   21 +++++++++++++++++++++
 cpu-all.h     |    3 +++
 exec.c        |   23 +++++++++++++++++++++++
 qemu-common.h |    2 ++
 4 files changed, 49 insertions(+), 0 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 484b39d..9d02270 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -109,6 +109,7 @@ static int is_dup_page(uint8_t *page, uint8_t ch)
 
 static RAMBlock *last_block;
 static ram_addr_t last_offset;
+static uint64_t last_version;
 
 static int ram_save_block(QEMUFile *f)
 {
@@ -170,6 +171,7 @@ static int ram_save_block(QEMUFile *f)
 
     last_block = block;
     last_offset = offset;
+    last_version = ram_list.version;
 
     return bytes_sent;
 }
@@ -270,6 +272,7 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
         bytes_transferred = 0;
         last_block = NULL;
         last_offset = 0;
+        last_version = ram_list.version = 0;
         sort_ram_list();
 
         /* Make sure all dirty bits are set */
@@ -298,6 +301,17 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
     bytes_transferred_last = bytes_transferred;
     bwidth = qemu_get_clock_ns(rt_clock);
 
+    if (stage != 3) {
+        qemu_mutex_lock_migrate_ram();
+        qemu_mutex_unlock_iothread();
+    }
+
+    if (ram_list.version != last_version) {
+        /* RAM block added or removed */
+        last_block = NULL;
+        last_offset = 0;
+    }
+
     while (!qemu_file_rate_limit(f)) {
         int bytes_sent;
 
@@ -308,6 +322,13 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
         }
     }
 
+    if (stage != 3) {
+        qemu_mutex_unlock_migrate_ram();
+        qemu_mutex_lock_iothread();
+        /* Lock ordering : iothread mutex is always acquired outside migrate_ram
+         * mutex critical section to avoid deadlock */
+    }
+
     bwidth = qemu_get_clock_ns(rt_clock) - bwidth;
     bwidth = (bytes_transferred - bytes_transferred_last) / bwidth;
 
diff --git a/cpu-all.h b/cpu-all.h
index 6b217a2..b85483f 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -21,6 +21,7 @@
 
 #include "qemu-common.h"
 #include "cpu-common.h"
+#include "qemu-thread.h"
 
 /* some important defines:
  *
@@ -932,7 +933,9 @@ typedef struct RAMBlock {
 } RAMBlock;
 
 typedef struct RAMList {
+    QemuMutex mutex;    /* Protects RAM block list */
     uint8_t *phys_dirty;
+    uint32_t version;   /* To detect ram block addition/removal */
     QLIST_HEAD(ram, RAMBlock) blocks;
     QLIST_HEAD(, RAMBlock) blocks_mru;
 } RAMList;
diff --git a/exec.c b/exec.c
index c5c247c..7627483 100644
--- a/exec.c
+++ b/exec.c
@@ -582,6 +582,7 @@ void cpu_exec_init_all(unsigned long tb_size)
     code_gen_alloc(tb_size);
     code_gen_ptr = code_gen_buffer;
     page_init();
+    qemu_mutex_init(&ram_list.mutex);
 #if !defined(CONFIG_USER_ONLY)
     io_mem_init();
 #endif
@@ -2802,6 +2803,16 @@ static long gethugepagesize(const char *path)
     return fs.f_bsize;
 }
 
+void qemu_mutex_lock_migrate_ram(void)
+{
+    qemu_mutex_lock(&ram_list.mutex);
+}
+
+void qemu_mutex_unlock_migrate_ram(void)
+{
+    qemu_mutex_unlock(&ram_list.mutex);
+}
+
 static void *file_ram_alloc(RAMBlock *block,
                             ram_addr_t memory,
                             const char *path)
@@ -2976,14 +2987,20 @@ ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name,
     }
     new_block->length = size;
 
+    qemu_mutex_lock_migrate_ram();
+
     QLIST_INSERT_HEAD(&ram_list.blocks, new_block, next);
     QLIST_INSERT_HEAD(&ram_list.blocks_mru, new_block, next_mru);
 
+    ram_list.version++;
+
     ram_list.phys_dirty = qemu_realloc(ram_list.phys_dirty,
                                        last_ram_offset() >> TARGET_PAGE_BITS);
     memset(ram_list.phys_dirty + (new_block->offset >> TARGET_PAGE_BITS),
            0xff, size >> TARGET_PAGE_BITS);
 
+    qemu_mutex_unlock_migrate_ram();
+
     if (kvm_enabled())
         kvm_setup_guest_memory(new_block->host, size);
 
@@ -3001,8 +3018,11 @@ void qemu_ram_free_from_ptr(ram_addr_t addr)
 
     QLIST_FOREACH(block, &ram_list.blocks, next) {
         if (addr == block->offset) {
+            qemu_mutex_lock_migrate_ram();
             QLIST_REMOVE(block, next);
             QLIST_REMOVE(block, next_mru);
+            ram_list.version++;
+            qemu_mutex_unlock_migrate_ram();
             qemu_free(block);
             return;
         }
@@ -3015,8 +3035,11 @@ void qemu_ram_free(ram_addr_t addr)
 
     QLIST_FOREACH(block, &ram_list.blocks, next) {
         if (addr == block->offset) {
+            qemu_mutex_lock_migrate_ram();
             QLIST_REMOVE(block, next);
             QLIST_REMOVE(block, next_mru);
+            ram_list.version++;
+            qemu_mutex_unlock_migrate_ram();
             if (block->flags & RAM_PREALLOC_MASK) {
                 ;
             } else if (mem_path) {
diff --git a/qemu-common.h b/qemu-common.h
index abd7a75..7dabfe9 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -212,6 +212,8 @@ char *qemu_strndup(const char *str, size_t size);
 
 void qemu_mutex_lock_iothread(void);
 void qemu_mutex_unlock_iothread(void);
+void qemu_mutex_lock_migrate_ram(void);
+void qemu_mutex_unlock_migrate_ram(void);
 
 int qemu_open(const char *name, int flags, ...);
 ssize_t qemu_write_full(int fd, const void *buf, size_t count)
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 4/5] Separate migration dirty bitmap
  2011-08-27 18:09 ` [Qemu-devel] " Umesh Deshpande
@ 2011-08-27 18:09   ` Umesh Deshpande
  -1 siblings, 0 replies; 36+ messages in thread
From: Umesh Deshpande @ 2011-08-27 18:09 UTC (permalink / raw)
  To: kvm, qemu-devel; +Cc: Umesh Deshpande

This patch creates a migration bitmap, which is periodically kept in sync with
the qemu dirty bitmap. A separate copy of the dirty bitmap for migration avoids
concurrent access to the qemu bitmap from the iothread and the migration thread.

Signed-off-by: Umesh Deshpande <udeshpan@redhat.com>
---
 arch_init.c |   17 ++++++++---------
 cpu-all.h   |   37 +++++++++++++++++++++++++++++++++++++
 exec.c      |   57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 102 insertions(+), 9 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 9d02270..b5b852b 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -124,13 +124,13 @@ static int ram_save_block(QEMUFile *f)
     current_addr = block->offset + offset;
 
     do {
-        if (cpu_physical_memory_get_dirty(current_addr, MIGRATION_DIRTY_FLAG)) {
+        if (migration_bitmap_get_dirty(current_addr, MIGRATION_DIRTY_FLAG)) {
             uint8_t *p;
             int cont = (block == last_block) ? RAM_SAVE_FLAG_CONTINUE : 0;
 
-            cpu_physical_memory_reset_dirty(current_addr,
-                                            current_addr + TARGET_PAGE_SIZE,
-                                            MIGRATION_DIRTY_FLAG);
+            migration_bitmap_reset_dirty(current_addr,
+                                         current_addr + TARGET_PAGE_SIZE,
+                                         MIGRATION_DIRTY_FLAG);
 
             p = block->host + offset;
 
@@ -187,7 +187,7 @@ static ram_addr_t ram_save_remaining(void)
         ram_addr_t addr;
         for (addr = block->offset; addr < block->offset + block->length;
              addr += TARGET_PAGE_SIZE) {
-            if (cpu_physical_memory_get_dirty(addr, MIGRATION_DIRTY_FLAG)) {
+            if (migration_bitmap_get_dirty(addr, MIGRATION_DIRTY_FLAG)) {
                 count++;
             }
         }
@@ -267,6 +267,8 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
         return 0;
     }
 
+    sync_migration_bitmap(0, TARGET_PHYS_ADDR_MAX);
+
     if (stage == 1) {
         RAMBlock *block;
         bytes_transferred = 0;
@@ -279,10 +281,7 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
         QLIST_FOREACH(block, &ram_list.blocks, next) {
             for (addr = block->offset; addr < block->offset + block->length;
                  addr += TARGET_PAGE_SIZE) {
-                if (!cpu_physical_memory_get_dirty(addr,
-                                                   MIGRATION_DIRTY_FLAG)) {
-                    cpu_physical_memory_set_dirty(addr);
-                }
+                migration_bitmap_set_dirty(addr);
             }
         }
 
diff --git a/cpu-all.h b/cpu-all.h
index b85483f..8181f8b 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -935,6 +935,7 @@ typedef struct RAMBlock {
 typedef struct RAMList {
     QemuMutex mutex;    /* Protects RAM block list */
     uint8_t *phys_dirty;
+    uint8_t *migration_bitmap; /* Dedicated bitmap for migration thread */
     uint32_t version;   /* To detect ram block addition/removal */
     QLIST_HEAD(ram, RAMBlock) blocks;
     QLIST_HEAD(, RAMBlock) blocks_mru;
@@ -1009,8 +1010,44 @@ static inline void cpu_physical_memory_mask_dirty_range(ram_addr_t start,
     }
 }
 
+
+
 void cpu_physical_memory_reset_dirty(ram_addr_t start, ram_addr_t end,
                                      int dirty_flags);
+
+static inline int migration_bitmap_get_dirty(ram_addr_t addr,
+                                             int dirty_flags)
+{
+    return ram_list.migration_bitmap[addr >> TARGET_PAGE_BITS] & dirty_flags;
+}
+
+static inline void migration_bitmap_set_dirty(ram_addr_t addr)
+{
+    ram_list.migration_bitmap[addr >> TARGET_PAGE_BITS] = 0xff;
+}
+
+static inline void migration_bitmap_mask_dirty_range(ram_addr_t start,
+                                                     int length,
+                                                     int dirty_flags)
+{
+    int i, mask, len;
+    uint8_t *p;
+
+    len = length >> TARGET_PAGE_BITS;
+    mask = ~dirty_flags;
+    p = ram_list.migration_bitmap + (start >> TARGET_PAGE_BITS);
+    for (i = 0; i < len; i++) {
+        p[i] &= mask;
+    }
+}
+
+
+void migration_bitmap_reset_dirty(ram_addr_t start,
+                                  ram_addr_t end,
+                                  int dirty_flags);
+
+void sync_migration_bitmap(ram_addr_t start, ram_addr_t end);
+
 void cpu_tlb_update_dirty(CPUState *env);
 
 int cpu_physical_memory_set_dirty_tracking(int enable);
diff --git a/exec.c b/exec.c
index 7627483..8dfbdbc 100644
--- a/exec.c
+++ b/exec.c
@@ -2111,6 +2111,10 @@ void cpu_physical_memory_reset_dirty(ram_addr_t start, ram_addr_t end,
         abort();
     }
 
+    if (kvm_enabled()) {
+        return;
+    }
+
     for(env = first_cpu; env != NULL; env = env->next_cpu) {
         int mmu_idx;
         for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
@@ -2119,8 +2123,54 @@ void cpu_physical_memory_reset_dirty(ram_addr_t start, ram_addr_t end,
                                       start1, length);
         }
     }
+
+}
+
+void migration_bitmap_reset_dirty(ram_addr_t start, ram_addr_t end,
+                                  int dirty_flags)
+{
+    unsigned long length;
+
+    start &= TARGET_PAGE_MASK;
+    end = TARGET_PAGE_ALIGN(end);
+
+    length = end - start;
+    if (length == 0) {
+        return;
+    }
+
+    migration_bitmap_mask_dirty_range(start, length, dirty_flags);
+}
+
+/* Synchronizes migration bitmap with the qemu dirty bitmap.
+ * Called with the iothread mutex held. */
+
+void sync_migration_bitmap(ram_addr_t start, ram_addr_t end)
+{
+    unsigned long length, len, i;
+    ram_addr_t addr;
+    start &= TARGET_PAGE_MASK;
+    end = TARGET_PAGE_ALIGN(end);
+
+    length = end - start;
+    if (length == 0) {
+        return;
+    }
+
+    len = length >> TARGET_PAGE_BITS;
+    for (i = 0; i < len; i++) {
+        addr = i << TARGET_PAGE_BITS;
+        if (cpu_physical_memory_get_dirty(addr, MIGRATION_DIRTY_FLAG)) {
+            migration_bitmap_set_dirty(addr);
+            cpu_physical_memory_reset_dirty(addr, addr + TARGET_PAGE_SIZE,
+                                            MIGRATION_DIRTY_FLAG);
+        }
+    }
+
 }
 
+
+
 int cpu_physical_memory_set_dirty_tracking(int enable)
 {
     int ret = 0;
@@ -2999,6 +3049,13 @@ ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name,
     memset(ram_list.phys_dirty + (new_block->offset >> TARGET_PAGE_BITS),
            0xff, size >> TARGET_PAGE_BITS);
 
+    ram_list.migration_bitmap = qemu_realloc(ram_list.phys_dirty,
+                                             last_ram_offset() >>
+                                             TARGET_PAGE_BITS);
+
+    memset(ram_list.migration_bitmap + (new_block->offset >> TARGET_PAGE_BITS),
+           0xff, size >> TARGET_PAGE_BITS);
+
     qemu_mutex_unlock_migrate_ram();
 
     if (kvm_enabled())
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH 4/5] Separate migration dirty bitmap
@ 2011-08-27 18:09   ` Umesh Deshpande
  0 siblings, 0 replies; 36+ messages in thread
From: Umesh Deshpande @ 2011-08-27 18:09 UTC (permalink / raw)
  To: kvm, qemu-devel; +Cc: Umesh Deshpande

This patch creates a migration bitmap, which is periodically kept in sync with
the qemu dirty bitmap. A separate copy of the dirty bitmap for migration avoids
concurrent access to the qemu bitmap from the iothread and the migration thread.

Signed-off-by: Umesh Deshpande <udeshpan@redhat.com>
---
 arch_init.c |   17 ++++++++---------
 cpu-all.h   |   37 +++++++++++++++++++++++++++++++++++++
 exec.c      |   57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 102 insertions(+), 9 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 9d02270..b5b852b 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -124,13 +124,13 @@ static int ram_save_block(QEMUFile *f)
     current_addr = block->offset + offset;
 
     do {
-        if (cpu_physical_memory_get_dirty(current_addr, MIGRATION_DIRTY_FLAG)) {
+        if (migration_bitmap_get_dirty(current_addr, MIGRATION_DIRTY_FLAG)) {
             uint8_t *p;
             int cont = (block == last_block) ? RAM_SAVE_FLAG_CONTINUE : 0;
 
-            cpu_physical_memory_reset_dirty(current_addr,
-                                            current_addr + TARGET_PAGE_SIZE,
-                                            MIGRATION_DIRTY_FLAG);
+            migration_bitmap_reset_dirty(current_addr,
+                                         current_addr + TARGET_PAGE_SIZE,
+                                         MIGRATION_DIRTY_FLAG);
 
             p = block->host + offset;
 
@@ -187,7 +187,7 @@ static ram_addr_t ram_save_remaining(void)
         ram_addr_t addr;
         for (addr = block->offset; addr < block->offset + block->length;
              addr += TARGET_PAGE_SIZE) {
-            if (cpu_physical_memory_get_dirty(addr, MIGRATION_DIRTY_FLAG)) {
+            if (migration_bitmap_get_dirty(addr, MIGRATION_DIRTY_FLAG)) {
                 count++;
             }
         }
@@ -267,6 +267,8 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
         return 0;
     }
 
+    sync_migration_bitmap(0, TARGET_PHYS_ADDR_MAX);
+
     if (stage == 1) {
         RAMBlock *block;
         bytes_transferred = 0;
@@ -279,10 +281,7 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
         QLIST_FOREACH(block, &ram_list.blocks, next) {
             for (addr = block->offset; addr < block->offset + block->length;
                  addr += TARGET_PAGE_SIZE) {
-                if (!cpu_physical_memory_get_dirty(addr,
-                                                   MIGRATION_DIRTY_FLAG)) {
-                    cpu_physical_memory_set_dirty(addr);
-                }
+                migration_bitmap_set_dirty(addr);
             }
         }
 
diff --git a/cpu-all.h b/cpu-all.h
index b85483f..8181f8b 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -935,6 +935,7 @@ typedef struct RAMBlock {
 typedef struct RAMList {
     QemuMutex mutex;    /* Protects RAM block list */
     uint8_t *phys_dirty;
+    uint8_t *migration_bitmap; /* Dedicated bitmap for migration thread */
     uint32_t version;   /* To detect ram block addition/removal */
     QLIST_HEAD(ram, RAMBlock) blocks;
     QLIST_HEAD(, RAMBlock) blocks_mru;
@@ -1009,8 +1010,44 @@ static inline void cpu_physical_memory_mask_dirty_range(ram_addr_t start,
     }
 }
 
+
+
 void cpu_physical_memory_reset_dirty(ram_addr_t start, ram_addr_t end,
                                      int dirty_flags);
+
+static inline int migration_bitmap_get_dirty(ram_addr_t addr,
+                                             int dirty_flags)
+{
+    return ram_list.migration_bitmap[addr >> TARGET_PAGE_BITS] & dirty_flags;
+}
+
+static inline void migration_bitmap_set_dirty(ram_addr_t addr)
+{
+    ram_list.migration_bitmap[addr >> TARGET_PAGE_BITS] = 0xff;
+}
+
+static inline void migration_bitmap_mask_dirty_range(ram_addr_t start,
+                                                     int length,
+                                                     int dirty_flags)
+{
+    int i, mask, len;
+    uint8_t *p;
+
+    len = length >> TARGET_PAGE_BITS;
+    mask = ~dirty_flags;
+    p = ram_list.migration_bitmap + (start >> TARGET_PAGE_BITS);
+    for (i = 0; i < len; i++) {
+        p[i] &= mask;
+    }
+}
+
+
+void migration_bitmap_reset_dirty(ram_addr_t start,
+                                  ram_addr_t end,
+                                  int dirty_flags);
+
+void sync_migration_bitmap(ram_addr_t start, ram_addr_t end);
+
 void cpu_tlb_update_dirty(CPUState *env);
 
 int cpu_physical_memory_set_dirty_tracking(int enable);
diff --git a/exec.c b/exec.c
index 7627483..8dfbdbc 100644
--- a/exec.c
+++ b/exec.c
@@ -2111,6 +2111,10 @@ void cpu_physical_memory_reset_dirty(ram_addr_t start, ram_addr_t end,
         abort();
     }
 
+    if (kvm_enabled()) {
+        return;
+    }
+
     for(env = first_cpu; env != NULL; env = env->next_cpu) {
         int mmu_idx;
         for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
@@ -2119,8 +2123,54 @@ void cpu_physical_memory_reset_dirty(ram_addr_t start, ram_addr_t end,
                                       start1, length);
         }
     }
+
+}
+
+void migration_bitmap_reset_dirty(ram_addr_t start, ram_addr_t end,
+                                  int dirty_flags)
+{
+    unsigned long length;
+
+    start &= TARGET_PAGE_MASK;
+    end = TARGET_PAGE_ALIGN(end);
+
+    length = end - start;
+    if (length == 0) {
+        return;
+    }
+
+    migration_bitmap_mask_dirty_range(start, length, dirty_flags);
+}
+
+/* Synchronizes the migration bitmap with the qemu dirty bitmap.
+ * Must be called with the iothread mutex held. */
+
+void sync_migration_bitmap(ram_addr_t start, ram_addr_t end)
+{
+    unsigned long length, len, i;
+    ram_addr_t addr;
+    start &= TARGET_PAGE_MASK;
+    end = TARGET_PAGE_ALIGN(end);
+
+    length = end - start;
+    if (length == 0) {
+        return;
+    }
+
+    len = length >> TARGET_PAGE_BITS;
+    for (i = 0; i < len; i++) {
+        addr = i << TARGET_PAGE_BITS;
+        if (cpu_physical_memory_get_dirty(addr, MIGRATION_DIRTY_FLAG)) {
+            migration_bitmap_set_dirty(addr);
+            cpu_physical_memory_reset_dirty(addr, addr + TARGET_PAGE_SIZE,
+                                            MIGRATION_DIRTY_FLAG);
+        }
+    }
+
 }
 
+
+
 int cpu_physical_memory_set_dirty_tracking(int enable)
 {
     int ret = 0;
@@ -2999,6 +3049,13 @@ ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name,
     memset(ram_list.phys_dirty + (new_block->offset >> TARGET_PAGE_BITS),
            0xff, size >> TARGET_PAGE_BITS);
 
+    ram_list.migration_bitmap = qemu_realloc(ram_list.migration_bitmap,
+                                             last_ram_offset() >>
+                                             TARGET_PAGE_BITS);
+
+    memset(ram_list.migration_bitmap + (new_block->offset >> TARGET_PAGE_BITS),
+           0xff, size >> TARGET_PAGE_BITS);
+
     qemu_mutex_unlock_migrate_ram();
 
     if (kvm_enabled())
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 5/5] Separate migration thread
  2011-08-27 18:09 ` [Qemu-devel] " Umesh Deshpande
@ 2011-08-27 18:09   ` Umesh Deshpande
  -1 siblings, 0 replies; 36+ messages in thread
From: Umesh Deshpande @ 2011-08-27 18:09 UTC (permalink / raw)
  To: kvm, qemu-devel; +Cc: Umesh Deshpande

This patch creates a separate thread for guest migration on the source side.
All exits (on completion/error) from the migration thread are handled by a
bottom half, which is executed in the iothread context.

Signed-off-by: Umesh Deshpande <udeshpan@redhat.com>
---
 buffered_file.c     |   76 ++++++++++++++++++++----------------
 migration.c         |  105 ++++++++++++++++++++++++++++++--------------------
 migration.h         |    8 ++++
 qemu-thread-posix.c |   10 +++++
 qemu-thread.h       |    1 +
 5 files changed, 124 insertions(+), 76 deletions(-)

diff --git a/buffered_file.c b/buffered_file.c
index 41b42c3..c31852e 100644
--- a/buffered_file.c
+++ b/buffered_file.c
@@ -16,6 +16,8 @@
 #include "qemu-timer.h"
 #include "qemu-char.h"
 #include "buffered_file.h"
+#include "migration.h"
+#include "qemu-thread.h"
 
 //#define DEBUG_BUFFERED_FILE
 
@@ -28,13 +30,14 @@ typedef struct QEMUFileBuffered
     void *opaque;
     QEMUFile *file;
     int has_error;
+    int closed;
     int freeze_output;
     size_t bytes_xfer;
     size_t xfer_limit;
     uint8_t *buffer;
     size_t buffer_size;
     size_t buffer_capacity;
-    QEMUTimer *timer;
+    QemuThread thread;
 } QEMUFileBuffered;
 
 #ifdef DEBUG_BUFFERED_FILE
@@ -155,14 +158,6 @@ static int buffered_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, in
         offset = size;
     }
 
-    if (pos == 0 && size == 0) {
-        DPRINTF("file is ready\n");
-        if (s->bytes_xfer <= s->xfer_limit) {
-            DPRINTF("notifying client\n");
-            s->put_ready(s->opaque);
-        }
-    }
-
     return offset;
 }
 
@@ -173,22 +168,25 @@ static int buffered_close(void *opaque)
 
     DPRINTF("closing\n");
 
-    while (!s->has_error && s->buffer_size) {
-        buffered_flush(s);
-        if (s->freeze_output)
-            s->wait_for_unfreeze(s);
-    }
+    s->closed = 1;
 
-    ret = s->close(s->opaque);
+    qemu_mutex_unlock_migrate_ram();
+    qemu_mutex_unlock_iothread();
 
-    qemu_del_timer(s->timer);
-    qemu_free_timer(s->timer);
+    /* Wait for the migration thread to complete */
+    qemu_thread_join(&s->thread);
+
+    qemu_mutex_lock_iothread();
+    qemu_mutex_lock_migrate_ram();
+
+    ret = s->close(s->opaque);
     qemu_free(s->buffer);
     qemu_free(s);
 
     return ret;
 }
 
+
 static int buffered_rate_limit(void *opaque)
 {
     QEMUFileBuffered *s = opaque;
@@ -228,26 +226,37 @@ static int64_t buffered_get_rate_limit(void *opaque)
     return s->xfer_limit;
 }
 
-static void buffered_rate_tick(void *opaque)
+static void *migrate_vm(void *opaque)
 {
     QEMUFileBuffered *s = opaque;
+    int64_t current_time, expire_time = qemu_get_clock_ms(rt_clock) + 100;
+    struct timeval tv = { .tv_sec = 0, .tv_usec = 100000};
 
-    if (s->has_error) {
-        buffered_close(s);
-        return;
-    }
-
-    qemu_mod_timer(s->timer, qemu_get_clock_ms(rt_clock) + 100);
+    while (!s->has_error && (!s->closed || s->buffer_size)) {
+        if (s->freeze_output) {
+            s->wait_for_unfreeze(s);
+            s->freeze_output = 0;
+            continue;
+        }
 
-    if (s->freeze_output)
-        return;
+        current_time = qemu_get_clock_ms(rt_clock);
+        if (!s->closed && (expire_time > current_time)) {
+            tv.tv_usec = 1000 * (expire_time - current_time);
+            select(0, NULL, NULL, NULL, &tv);
+            continue;
+        }
 
-    s->bytes_xfer = 0;
+        s->bytes_xfer = 0;
 
-    buffered_flush(s);
+        expire_time = qemu_get_clock_ms(rt_clock) + 100;
+        if (!s->closed) {
+            s->put_ready(s->opaque);
+        } else {
+            buffered_flush(s);
+        }
+    }
 
-    /* Add some checks around this */
-    s->put_ready(s->opaque);
+    return NULL;
 }
 
 QEMUFile *qemu_fopen_ops_buffered(void *opaque,
@@ -267,15 +276,14 @@ QEMUFile *qemu_fopen_ops_buffered(void *opaque,
     s->put_ready = put_ready;
     s->wait_for_unfreeze = wait_for_unfreeze;
     s->close = close;
+    s->closed = 0;
 
     s->file = qemu_fopen_ops(s, buffered_put_buffer, NULL,
                              buffered_close, buffered_rate_limit,
                              buffered_set_rate_limit,
-			     buffered_get_rate_limit);
-
-    s->timer = qemu_new_timer_ms(rt_clock, buffered_rate_tick, s);
+                             buffered_get_rate_limit);
 
-    qemu_mod_timer(s->timer, qemu_get_clock_ms(rt_clock) + 100);
+    qemu_thread_create(&s->thread, migrate_vm, s);
 
     return s->file;
 }
diff --git a/migration.c b/migration.c
index af3a1f2..5df186d 100644
--- a/migration.c
+++ b/migration.c
@@ -149,10 +149,12 @@ int do_migrate_set_speed(Monitor *mon, const QDict *qdict, QObject **ret_data)
     }
     max_throttle = d;
 
+    qemu_mutex_lock_migrate_ram();
     s = migrate_to_fms(current_migration);
     if (s && s->file) {
         qemu_file_set_rate_limit(s->file, max_throttle);
     }
+    qemu_mutex_unlock_migrate_ram();
 
     return 0;
 }
@@ -284,13 +286,13 @@ int migrate_fd_cleanup(FdMigrationState *s)
 {
     int ret = 0;
 
-    qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
-
     if (s->file) {
         DPRINTF("closing file\n");
+        qemu_mutex_lock_migrate_ram();
         if (qemu_fclose(s->file) != 0) {
             ret = -1;
         }
+        qemu_mutex_unlock_migrate_ram();
         s->file = NULL;
     }
 
@@ -311,7 +313,6 @@ void migrate_fd_put_notify(void *opaque)
 {
     FdMigrationState *s = opaque;
 
-    qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
     qemu_file_put_notify(s->file);
 }
 
@@ -327,76 +328,87 @@ ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size)
     if (ret == -1)
         ret = -(s->get_error(s));
 
-    if (ret == -EAGAIN) {
-        qemu_set_fd_handler2(s->fd, NULL, NULL, migrate_fd_put_notify, s);
-    } else if (ret < 0) {
-        if (s->mon) {
-            monitor_resume(s->mon);
+    return ret;
+}
+
+static void migrate_fd_terminate(void *opaque)
+{
+    FdMigrationState *s = opaque;
+
+    if (s->code == COMPLETE) {
+        if (migrate_fd_cleanup(s) < 0) {
+            if (s->old_vm_running) {
+                vm_start();
+            }
+            s->state = MIG_STATE_ERROR;
+        } else {
+            s->state = MIG_STATE_COMPLETED;
         }
-        s->state = MIG_STATE_ERROR;
         notifier_list_notify(&migration_state_notifiers);
+    } else if (s->code == ERROR) {
+        migrate_fd_error(s);
     }
-
-    return ret;
 }
 
 void migrate_fd_connect(FdMigrationState *s)
 {
-    int ret;
-
+    s->code = START;
+    s->bh = qemu_bh_new(migrate_fd_terminate, s);
     s->file = qemu_fopen_ops_buffered(s,
                                       s->bandwidth_limit,
                                       migrate_fd_put_buffer,
                                       migrate_fd_put_ready,
                                       migrate_fd_wait_for_unfreeze,
                                       migrate_fd_close);
-
-    DPRINTF("beginning savevm\n");
-    ret = qemu_savevm_state_begin(s->mon, s->file, s->mig_state.blk,
-                                  s->mig_state.shared);
-    if (ret < 0) {
-        DPRINTF("failed, %d\n", ret);
-        migrate_fd_error(s);
-        return;
-    }
-    
-    migrate_fd_put_ready(s);
 }
 
 void migrate_fd_put_ready(void *opaque)
 {
     FdMigrationState *s = opaque;
+    int ret;
 
-    if (s->state != MIG_STATE_ACTIVE) {
+    qemu_mutex_lock_iothread();
+    if (s->code != ACTIVE && s->code != START) {
         DPRINTF("put_ready returning because of non-active state\n");
+        qemu_mutex_unlock_iothread();
         return;
     }
 
+    migrate_fd_put_notify(s);
+
+    if (!s->code) {
+        DPRINTF("beginning savevm\n");
+        ret = qemu_savevm_state_begin(s->mon, s->file, s->mig_state.blk,
+                s->mig_state.shared);
+        if (ret < 0) {
+            DPRINTF("failed, %d\n", ret);
+            s->code = ERROR;
+            qemu_bh_schedule(s->bh);
+            qemu_mutex_unlock_iothread();
+            return;
+        }
+        s->code = ACTIVE;
+    }
+
     DPRINTF("iterate\n");
     if (qemu_savevm_state_iterate(s->mon, s->file) == 1) {
-        int state;
-        int old_vm_running = vm_running;
+        s->old_vm_running = vm_running;
 
         DPRINTF("done iterating\n");
         vm_stop(VMSTOP_MIGRATE);
 
         if ((qemu_savevm_state_complete(s->mon, s->file)) < 0) {
-            if (old_vm_running) {
+            if (s->old_vm_running) {
                 vm_start();
             }
-            state = MIG_STATE_ERROR;
+            s->code = ERROR;
         } else {
-            state = MIG_STATE_COMPLETED;
+            s->code = COMPLETE;
         }
-        if (migrate_fd_cleanup(s) < 0) {
-            if (old_vm_running) {
-                vm_start();
-            }
-            state = MIG_STATE_ERROR;
-        }
-        s->state = state;
-        notifier_list_notify(&migration_state_notifiers);
+
+        qemu_bh_schedule(s->bh);
     }
+    qemu_mutex_unlock_iothread();
 }
 
 int migrate_fd_get_status(MigrationState *mig_state)
@@ -426,22 +438,32 @@ void migrate_fd_release(MigrationState *mig_state)
     FdMigrationState *s = migrate_to_fms(mig_state);
 
     DPRINTF("releasing state\n");
-   
+
     if (s->state == MIG_STATE_ACTIVE) {
         s->state = MIG_STATE_CANCELLED;
         notifier_list_notify(&migration_state_notifiers);
         migrate_fd_cleanup(s);
     }
+
+    if (s->bh) {
+        qemu_bh_delete(s->bh);
+    }
+
     qemu_free(s);
 }
 
 void migrate_fd_wait_for_unfreeze(void *opaque)
 {
     FdMigrationState *s = opaque;
-    int ret;
+    int ret, state;
 
     DPRINTF("wait for unfreeze\n");
-    if (s->state != MIG_STATE_ACTIVE)
+
+    qemu_mutex_lock_iothread();
+    state = s->state;
+    qemu_mutex_unlock_iothread();
+
+    if (state != MIG_STATE_ACTIVE)
         return;
 
     do {
@@ -458,7 +480,6 @@ int migrate_fd_close(void *opaque)
 {
     FdMigrationState *s = opaque;
 
-    qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
     return s->close(s);
 }
 
diff --git a/migration.h b/migration.h
index 050c56c..960559a 100644
--- a/migration.h
+++ b/migration.h
@@ -23,6 +23,11 @@
 #define MIG_STATE_CANCELLED	1
 #define MIG_STATE_ACTIVE	2
 
+#define START       0
+#define ACTIVE      1
+#define COMPLETE    2
+#define ERROR       3
+
 typedef struct MigrationState MigrationState;
 
 struct MigrationState
@@ -45,10 +50,13 @@ struct FdMigrationState
     int fd;
     Monitor *mon;
     int state;
+    int code;
+    int old_vm_running;
     int (*get_error)(struct FdMigrationState*);
     int (*close)(struct FdMigrationState*);
     int (*write)(struct FdMigrationState*, const void *, size_t);
     void *opaque;
+    QEMUBH *bh;
 };
 
 void process_incoming_migration(QEMUFile *f);
diff --git a/qemu-thread-posix.c b/qemu-thread-posix.c
index 2bd02ef..6a275be 100644
--- a/qemu-thread-posix.c
+++ b/qemu-thread-posix.c
@@ -115,6 +115,16 @@ void qemu_cond_wait(QemuCond *cond, QemuMutex *mutex)
         error_exit(err, __func__);
 }
 
+void qemu_thread_join(QemuThread *thread)
+{
+    int err;
+
+    err = pthread_join(thread->thread, NULL);
+    if (err) {
+        error_exit(err, __func__);
+    }
+}
+
 void qemu_thread_create(QemuThread *thread,
                        void *(*start_routine)(void*),
                        void *arg)
diff --git a/qemu-thread.h b/qemu-thread.h
index 0a73d50..d5b99d5 100644
--- a/qemu-thread.h
+++ b/qemu-thread.h
@@ -30,6 +30,7 @@ void qemu_cond_destroy(QemuCond *cond);
 void qemu_cond_signal(QemuCond *cond);
 void qemu_cond_broadcast(QemuCond *cond);
 void qemu_cond_wait(QemuCond *cond, QemuMutex *mutex);
+void qemu_thread_join(QemuThread *thread);
 
 void qemu_thread_create(QemuThread *thread,
                        void *(*start_routine)(void*),
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread


* Re: [PATCH 3/5] Migration thread mutex
  2011-08-27 18:09   ` [Qemu-devel] " Umesh Deshpande
@ 2011-08-29  9:04     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 36+ messages in thread
From: Stefan Hajnoczi @ 2011-08-29  9:04 UTC (permalink / raw)
  To: Umesh Deshpande; +Cc: kvm, qemu-devel

On Sat, Aug 27, 2011 at 7:09 PM, Umesh Deshpande <udeshpan@redhat.com> wrote:
> This patch implements migrate_ram mutex, which protects the RAMBlock list
> traversal in the migration thread during the transfer of a ram from their
> addition/removal from the iothread.
>
> Note: Combination of iothread mutex and migration thread mutex works as a
> rw-lock. Both mutexes are acquired while modifying the ram_list members or RAM
> block list.
>
> Signed-off-by: Umesh Deshpande <udeshpan@redhat.com>
> ---
>  arch_init.c   |   21 +++++++++++++++++++++
>  cpu-all.h     |    3 +++
>  exec.c        |   23 +++++++++++++++++++++++
>  qemu-common.h |    2 ++
>  4 files changed, 49 insertions(+), 0 deletions(-)
>
> diff --git a/arch_init.c b/arch_init.c
> index 484b39d..9d02270 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -109,6 +109,7 @@ static int is_dup_page(uint8_t *page, uint8_t ch)
>
>  static RAMBlock *last_block;
>  static ram_addr_t last_offset;
> +static uint64_t last_version;
[...]
>  typedef struct RAMList {
> +    QemuMutex mutex;    /* Protects RAM block list */
>     uint8_t *phys_dirty;
> +    uint32_t version;   /* To detect ram block addition/removal */

Is there a reason why RAMList.version is uint32_t but last_version is uint64_t?

Stefan


* Re: [PATCH 5/5] Separate migration thread
  2011-08-27 18:09   ` [Qemu-devel] " Umesh Deshpande
@ 2011-08-29  9:09     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 36+ messages in thread
From: Stefan Hajnoczi @ 2011-08-29  9:09 UTC (permalink / raw)
  To: Umesh Deshpande; +Cc: kvm, qemu-devel

On Sat, Aug 27, 2011 at 7:09 PM, Umesh Deshpande <udeshpan@redhat.com> wrote:
> This patch creates a separate thread for the guest migration on the source side.
> All exits (on completion/error) from the migration thread are handled by a
> bottom handler, which is called from the iothread.
>
> Signed-off-by: Umesh Deshpande <udeshpan@redhat.com>
> ---
>  buffered_file.c     |   76 ++++++++++++++++++++----------------
>  migration.c         |  105 ++++++++++++++++++++++++++++++--------------------
>  migration.h         |    8 ++++
>  qemu-thread-posix.c |   10 +++++
>  qemu-thread.h       |    1 +

Will this patch break Windows builds by adding a function to
qemu-thread-posix.c which is not implemented in qemu-thread-win32.c?

Stefan


* Re: [PATCH 0/5] Separate thread for VM migration
  2011-08-27 18:09 ` [Qemu-devel] " Umesh Deshpande
@ 2011-08-29 10:20   ` Paolo Bonzini
  -1 siblings, 0 replies; 36+ messages in thread
From: Paolo Bonzini @ 2011-08-29 10:20 UTC (permalink / raw)
  To: Umesh Deshpande; +Cc: kvm, qemu-devel

On 08/27/2011 08:09 PM, Umesh Deshpande wrote:
> Following patch series deals with VCPU and iothread starvation during the
> migration of a guest. Currently the iothread is responsible for performing the
> guest migration. It holds qemu_mutex during the migration and doesn't allow VCPU
> to enter the qemu mode and delays its return to the guest. The guest migration,
> executed as an iohandler also delays the execution of other iohandlers.
> In the following patch series,
>
> The migration has been moved to a separate thread to
> reduce the qemu_mutex contention and iohandler starvation.
>
> Umesh Deshpande (5):
>    vm_stop from non-io threads
>    MRU ram block list
>    migration thread mutex
>    separate migration bitmap
>    separate migration thread
>
>   arch_init.c         |   38 +++++++++++++----
>   buffered_file.c     |   76 ++++++++++++++++++---------------
>   cpu-all.h           |   42 ++++++++++++++++++
>   cpus.c              |    4 +-
>   exec.c              |   97 ++++++++++++++++++++++++++++++++++++++++--
>   migration.c         |  117 ++++++++++++++++++++++++++++++++-------------------
>   migration.h         |    9 ++++
>   qemu-common.h       |    2 +
>   qemu-thread-posix.c |   10 ++++
>   qemu-thread.h       |    1 +
>   10 files changed, 302 insertions(+), 94 deletions(-)
>

I think this patchset is quite good.  These are the problems I found:

1) the locking rules in patch 3 are a bit too clever, and the cleverness 
will become obsolete once RCU is in place.  The advantage of the clever 
stuff is that rwlock code looks more like RCU code; the disadvantage is 
that it is harder to read and easier to bikeshed about.

2) it breaks Windows build, but that's easy to fix.

3) there are a _lot_ of cleanups possible on top of patch 5 (I would not 
be too surprised if the final version has an almost-neutral diffstat), 
and whether to prefer good or perfect is another very popular topic.

4) I'm not sure block migration has been tested in all scenarios (I'm 
curious about coroutine-heavy ones).  This may mean that the migration 
thread is blocked on Marcelo's live streaming work, which would 
provide the ingredients to remove block migration altogether.  A round 
of Autotest testing is the minimum required to include this, and I'm not 
sure if this was done either.
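The two-mutex scheme from point (1) reduces to the following sketch (illustrative only, not the patch's code; assumes one reader, the migration thread, and one writer, the iothread):

```c
#include <assert.h>
#include <pthread.h>

/* Reduction of the rwlock-like convention: the reader takes only
 * ram_mutex; the writer takes iothread_mutex and then ram_mutex.
 * The writer therefore excludes the reader, while ordinary iothread
 * work still serializes on iothread_mutex alone. */
static pthread_mutex_t iothread_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t ram_mutex = PTHREAD_MUTEX_INITIALIZER;
static int block_list_state;

static int reader_snapshot(void)
{
    int v;

    pthread_mutex_lock(&ram_mutex);      /* reader: ram_mutex only */
    v = block_list_state;
    pthread_mutex_unlock(&ram_mutex);
    return v;
}

static void writer_update(int v)
{
    pthread_mutex_lock(&iothread_mutex); /* writer: both, in this order */
    pthread_mutex_lock(&ram_mutex);
    block_list_state = v;
    pthread_mutex_unlock(&ram_mutex);
    pthread_mutex_unlock(&iothread_mutex);
}
```

A fixed lock order (iothread_mutex before ram_mutex) is what keeps the scheme deadlock-free; it is exactly the kind of convention RCU would later make unnecessary.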


That said, I find the code to be quite good overall, and I wouldn't 
oppose inclusion with only (2) fixed---may even take care of it 
myself---and more testing results apart from the impressive performance 
numbers.

About performance, I'm curious how you measured it.  Was the buffer 
cache empty?  That is, how many compressible pages were found?  I toyed 
with vectorizing is_dup_page, but I need to get some numbers before posting.
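The scalar form of the check in question looks roughly like this (a simplified sketch assuming 4 KiB pages, not the exact QEMU code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096

/* A page whose bytes all equal `ch` is sent as a single byte on the
 * migration wire, so the fraction of such pages dominates the numbers
 * under discussion. This is the byte-at-a-time loop a vectorized
 * version would replace. */
static int is_dup_page_sketch(const uint8_t *page, uint8_t ch)
{
    for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++) {
        if (page[i] != ch) {
            return 0;
        }
    }
    return 1;
}
```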

Paolo


* Re: [PATCH 3/5] Migration thread mutex
  2011-08-29  9:04     ` [Qemu-devel] " Stefan Hajnoczi
@ 2011-08-29 13:49       ` Umesh Deshpande
  -1 siblings, 0 replies; 36+ messages in thread
From: Umesh Deshpande @ 2011-08-29 13:49 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: kvm, qemu-devel

On 08/29/2011 05:04 AM, Stefan Hajnoczi wrote:
> On Sat, Aug 27, 2011 at 7:09 PM, Umesh Deshpande <udeshpan@redhat.com> wrote:
>> This patch implements migrate_ram mutex, which protects the RAMBlock list
>> traversal in the migration thread during the transfer of a ram from their
>> addition/removal from the iothread.
>>
>> Note: Combination of iothread mutex and migration thread mutex works as a
>> rw-lock. Both mutexes are acquired while modifying the ram_list members or RAM
>> block list.
>>
>> Signed-off-by: Umesh Deshpande <udeshpan@redhat.com>
>> ---
>>   arch_init.c   |   21 +++++++++++++++++++++
>>   cpu-all.h     |    3 +++
>>   exec.c        |   23 +++++++++++++++++++++++
>>   qemu-common.h |    2 ++
>>   4 files changed, 49 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch_init.c b/arch_init.c
>> index 484b39d..9d02270 100644
>> --- a/arch_init.c
>> +++ b/arch_init.c
>> @@ -109,6 +109,7 @@ static int is_dup_page(uint8_t *page, uint8_t ch)
>>
>>   static RAMBlock *last_block;
>>   static ram_addr_t last_offset;
>> +static uint64_t last_version;
> [...]
>>   typedef struct RAMList {
>> +    QemuMutex mutex;    /* Protects RAM block list */
>>      uint8_t *phys_dirty;
>> +    uint32_t version;   /* To detect ram block addition/removal */
> Is there a reason why RAMList.version is uint32_t but last_version is uint64_t?
No, my bad, they both should be consistent.
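For the record, in the patch as posted the 64-bit variable is only ever assigned from the 32-bit counter, so nothing is actually lost; the hazard mixed widths invite is the opposite direction, sketched here with hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* If the assignment ever flowed the other way (64-bit cache stored back
 * into the 32-bit counter), the high 32 bits would be dropped silently.
 * Using one width for both fields removes the question entirely. */
static uint32_t narrow_to_counter(uint64_t cached_version)
{
    return (uint32_t)cached_version;   /* truncates: high bits discarded */
}
```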



* Re: [PATCH 5/5] Separate migration thread
  2011-08-29  9:09     ` [Qemu-devel] " Stefan Hajnoczi
@ 2011-08-29 13:49       ` Umesh Deshpande
  -1 siblings, 0 replies; 36+ messages in thread
From: Umesh Deshpande @ 2011-08-29 13:49 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: kvm, qemu-devel

On 08/29/2011 05:09 AM, Stefan Hajnoczi wrote:
> On Sat, Aug 27, 2011 at 7:09 PM, Umesh Deshpande <udeshpan@redhat.com> wrote:
>> This patch creates a separate thread for the guest migration on the source side.
>> All exits (on completion/error) from the migration thread are handled by a
>> bottom handler, which is called from the iothread.
>>
>> Signed-off-by: Umesh Deshpande <udeshpan@redhat.com>
>> ---
>>   buffered_file.c     |   76 ++++++++++++++++++++----------------
>>   migration.c         |  105 ++++++++++++++++++++++++++++++--------------------
>>   migration.h         |    8 ++++
>>   qemu-thread-posix.c |   10 +++++
>>   qemu-thread.h       |    1 +
> Will this patch break Windows builds by adding a function to
> qemu-thread-posix.c which is not implemented in qemu-thread-win32.c?
Yes, an equivalent function needs to be added in qemu-thread-win32.c
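The semantics such a counterpart must provide are those of pthread_join: block until the thread's start routine has returned, with its writes then visible to the caller. On win32 that would typically mean WaitForSingleObject() on the thread HANDLE (an assumption; the sketch below sticks to portable POSIX):

```c
#include <assert.h>
#include <pthread.h>

/* Minimal demonstration of the join semantics qemu_thread_join() wraps:
 * after the join returns, the joined thread has finished and its writes
 * are visible to the caller. */
static void *set_flag(void *arg)
{
    *(int *)arg = 1;
    return NULL;
}

static int run_and_join(void)
{
    pthread_t t;
    int flag = 0;

    if (pthread_create(&t, NULL, set_flag, &flag) != 0) {
        return -1;
    }
    pthread_join(t, NULL);   /* blocks until set_flag() has returned */
    return flag;             /* 1: the thread's write is visible */
}
```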


* Re: [PATCH 1/5] Support for vm_stop from the migration thread
  2011-08-27 18:09   ` [Qemu-devel] " Umesh Deshpande
@ 2011-08-29 16:56     ` Marcelo Tosatti
  -1 siblings, 0 replies; 36+ messages in thread
From: Marcelo Tosatti @ 2011-08-29 16:56 UTC (permalink / raw)
  To: Umesh Deshpande; +Cc: kvm, qemu-devel

On Sat, Aug 27, 2011 at 02:09:44PM -0400, Umesh Deshpande wrote:
> Currently, when any thread other than iothread calls vm_stop, it is scheduled to
> be executed later by the iothread. This patch allows the execution of vm_stop
> from threads other than iothread. This is especially helpful when the migration is
> moved into a separate thread.
> 
> Signed-off-by: Umesh Deshpande <udeshpan@redhat.com>
> ---
>  cpus.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/cpus.c b/cpus.c
> index de70e02..f35f683 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -122,8 +122,8 @@ static void do_vm_stop(int reason)
>  {
>      if (vm_running) {
>          cpu_disable_ticks();
> -        vm_running = 0;
>          pause_all_vcpus();
> +        vm_running = 0;
>          vm_state_notify(0, reason);
>          qemu_aio_flush();
>          bdrv_flush_all();

Why this change?



* Re: [PATCH 3/5] Migration thread mutex
  2011-08-27 18:09   ` [Qemu-devel] " Umesh Deshpande
@ 2011-08-29 18:40     ` Marcelo Tosatti
  -1 siblings, 0 replies; 36+ messages in thread
From: Marcelo Tosatti @ 2011-08-29 18:40 UTC (permalink / raw)
  To: Umesh Deshpande; +Cc: qemu-devel, kvm

On Sat, Aug 27, 2011 at 02:09:46PM -0400, Umesh Deshpande wrote:
> This patch implements migrate_ram mutex, which protects the RAMBlock list
> traversal in the migration thread during the transfer of a ram from their
> addition/removal from the iothread.
> 
> Note: Combination of iothread mutex and migration thread mutex works as a
> rw-lock. Both mutexes are acquired while modifying the ram_list members or RAM
> block list.
> 
> Signed-off-by: Umesh Deshpande <udeshpan@redhat.com>
> ---
>  arch_init.c   |   21 +++++++++++++++++++++
>  cpu-all.h     |    3 +++
>  exec.c        |   23 +++++++++++++++++++++++
>  qemu-common.h |    2 ++
>  4 files changed, 49 insertions(+), 0 deletions(-)
> 
> diff --git a/arch_init.c b/arch_init.c
> index 484b39d..9d02270 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -109,6 +109,7 @@ static int is_dup_page(uint8_t *page, uint8_t ch)
>  
>  static RAMBlock *last_block;
>  static ram_addr_t last_offset;
> +static uint64_t last_version;
>  
>  static int ram_save_block(QEMUFile *f)
>  {
> @@ -170,6 +171,7 @@ static int ram_save_block(QEMUFile *f)
>  
>      last_block = block;
>      last_offset = offset;
> +    last_version = ram_list.version;
>  
>      return bytes_sent;
>  }
> @@ -270,6 +272,7 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
>          bytes_transferred = 0;
>          last_block = NULL;
>          last_offset = 0;
> +        last_version = ram_list.version = 0;
>          sort_ram_list();
>  
>          /* Make sure all dirty bits are set */
> @@ -298,6 +301,17 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
>      bytes_transferred_last = bytes_transferred;
>      bwidth = qemu_get_clock_ns(rt_clock);
>  
> +    if (stage != 3) {
> +        qemu_mutex_lock_migrate_ram();
> +        qemu_mutex_unlock_iothread();
> +    }

Dropping the iothread lock from within a timer handler is not safe.
This change to ram_save_live should be moved to the patch where 
migration thread is introduced.


* Re: [PATCH 5/5] Separate migration thread
  2011-08-27 18:09   ` [Qemu-devel] " Umesh Deshpande
@ 2011-08-29 18:49     ` Marcelo Tosatti
  -1 siblings, 0 replies; 36+ messages in thread
From: Marcelo Tosatti @ 2011-08-29 18:49 UTC (permalink / raw)
  To: Umesh Deshpande, Anthony Liguori; +Cc: qemu-devel, kvm

On Sat, Aug 27, 2011 at 02:09:48PM -0400, Umesh Deshpande wrote:
> This patch creates a separate thread for the guest migration on the source side.
> All exits (on completion/error) from the migration thread are handled by a
> bottom handler, which is called from the iothread.
> 
> Signed-off-by: Umesh Deshpande <udeshpan@redhat.com>
> ---
>  buffered_file.c     |   76 ++++++++++++++++++++----------------
>  migration.c         |  105 ++++++++++++++++++++++++++++++--------------------
>  migration.h         |    8 ++++
>  qemu-thread-posix.c |   10 +++++
>  qemu-thread.h       |    1 +
>  5 files changed, 124 insertions(+), 76 deletions(-)
> 
> diff --git a/buffered_file.c b/buffered_file.c
> index 41b42c3..c31852e 100644
> --- a/buffered_file.c
> +++ b/buffered_file.c
> @@ -16,6 +16,8 @@
>  #include "qemu-timer.h"
>  #include "qemu-char.h"
>  #include "buffered_file.h"
> +#include "migration.h"
> +#include "qemu-thread.h"
>  
>  //#define DEBUG_BUFFERED_FILE
>  
> @@ -28,13 +30,14 @@ typedef struct QEMUFileBuffered
>      void *opaque;
>      QEMUFile *file;
>      int has_error;
> +    int closed;
>      int freeze_output;
>      size_t bytes_xfer;
>      size_t xfer_limit;
>      uint8_t *buffer;
>      size_t buffer_size;
>      size_t buffer_capacity;
> -    QEMUTimer *timer;
> +    QemuThread thread;
>  } QEMUFileBuffered;
>  
>  #ifdef DEBUG_BUFFERED_FILE
> @@ -155,14 +158,6 @@ static int buffered_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, in
>          offset = size;
>      }
>  
> -    if (pos == 0 && size == 0) {
> -        DPRINTF("file is ready\n");
> -        if (s->bytes_xfer <= s->xfer_limit) {
> -            DPRINTF("notifying client\n");
> -            s->put_ready(s->opaque);
> -        }
> -    }
> -
>      return offset;
>  }
>  
> @@ -173,22 +168,25 @@ static int buffered_close(void *opaque)
>  
>      DPRINTF("closing\n");
>  
> -    while (!s->has_error && s->buffer_size) {
> -        buffered_flush(s);
> -        if (s->freeze_output)
> -            s->wait_for_unfreeze(s);
> -    }
> +    s->closed = 1;
>  
> -    ret = s->close(s->opaque);
> +    qemu_mutex_unlock_migrate_ram();
> +    qemu_mutex_unlock_iothread();

This is using the ram mutex to protect migration thread specific data.
A new lock should be introduced for that purpose.

> -    qemu_del_timer(s->timer);
> -    qemu_free_timer(s->timer);
> +    qemu_thread_join(&s->thread);
> +    /* Waits for the completion of the migration thread */
> +
> +    qemu_mutex_lock_iothread();
> +    qemu_mutex_lock_migrate_ram();
> +
> +    ret = s->close(s->opaque);
>      qemu_free(s->buffer);
>      qemu_free(s);
>  
>      return ret;
>  }
>  
> +
>  static int buffered_rate_limit(void *opaque)
>  {
>      QEMUFileBuffered *s = opaque;
> @@ -228,26 +226,37 @@ static int64_t buffered_get_rate_limit(void *opaque)
>      return s->xfer_limit;
>  }
>  
> -static void buffered_rate_tick(void *opaque)
> +static void *migrate_vm(void *opaque)
>  {

buffered_file.c was generic code that has now become migration specific
(although migration was the only user). So it should either stop
pretending to be generic code, by renaming it to migration_thread.c along 
with un-exporting interfaces, or it should remain generic and therefore
all migration specific knowledge moved somewhere else.

Anthony?

> +    int64_t current_time, expire_time = qemu_get_clock_ms(rt_clock) + 100;
> +    struct timeval tv = { .tv_sec = 0, .tv_usec = 100000};

qemu_get_clock_ms should happen under iothread lock.
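One way to sidestep that locking concern is to read a monotonic clock that needs no QEMU state at all; a sketch of the loop's timing skeleton under that assumption (sub-second waits only):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Timing skeleton of the quoted migrate_vm() loop: read a monotonic
 * clock directly instead of qemu_get_clock_ms(), so no iothread lock is
 * needed, and sleep until the next deadline. Assumes the remaining wait
 * is under one second (true for a 100 ms tick). */
static int64_t now_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

static int64_t sleep_until(int64_t deadline_ms)
{
    int64_t current = now_ms();

    while (current < deadline_ms) {
        struct timespec req = {
            .tv_sec = 0,
            .tv_nsec = (deadline_ms - current) * 1000000L,
        };
        nanosleep(&req, NULL);   /* may wake early on a signal; re-check */
        current = now_ms();
    }
    return current;
}
```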

> -    if (s->freeze_output)
> -        return;
> +        current_time = qemu_get_clock_ms(rt_clock);
> +        if (!s->closed && (expire_time > current_time)) {
> +            tv.tv_usec = 1000 * (expire_time - current_time);
> +            select(0, NULL, NULL, NULL, &tv);
> +            continue;
> +        }
>  
> -    s->bytes_xfer = 0;
> +        s->bytes_xfer = 0;
>  
> -    buffered_flush(s);
> +        expire_time = qemu_get_clock_ms(rt_clock) + 100;
> +        if (!s->closed) {
> +            s->put_ready(s->opaque);
> +        } else {
> +            buffered_flush(s);
> +        }
> +    }
>  
> -    /* Add some checks around this */
> -    s->put_ready(s->opaque);
> +    return NULL;
>  }
>  
>  QEMUFile *qemu_fopen_ops_buffered(void *opaque,
> @@ -267,15 +276,14 @@ QEMUFile *qemu_fopen_ops_buffered(void *opaque,
>      s->put_ready = put_ready;
>      s->wait_for_unfreeze = wait_for_unfreeze;
>      s->close = close;
> +    s->closed = 0;
>  
>      s->file = qemu_fopen_ops(s, buffered_put_buffer, NULL,
>                               buffered_close, buffered_rate_limit,
>                               buffered_set_rate_limit,
> -			     buffered_get_rate_limit);
> -
> -    s->timer = qemu_new_timer_ms(rt_clock, buffered_rate_tick, s);
> +                             buffered_get_rate_limit);
>  
> -    qemu_mod_timer(s->timer, qemu_get_clock_ms(rt_clock) + 100);
> +    qemu_thread_create(&s->thread, migrate_vm, s);
>  
>      return s->file;
>  }
> diff --git a/migration.c b/migration.c
> index af3a1f2..5df186d 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -149,10 +149,12 @@ int do_migrate_set_speed(Monitor *mon, const QDict *qdict, QObject **ret_data)
>      }
>      max_throttle = d;
>  
> +    qemu_mutex_lock_migrate_ram();
>      s = migrate_to_fms(current_migration);
>      if (s && s->file) {
>          qemu_file_set_rate_limit(s->file, max_throttle);
>      }
> +    qemu_mutex_unlock_migrate_ram();

This lock protects the RAMlist, and only the RAMlist, but here it's
being used to protect migration thread data. As noted above, a new lock 
should be introduced.

>      int ret = 0;
>  
> -    qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
> -
>      if (s->file) {
>          DPRINTF("closing file\n");
> +        qemu_mutex_lock_migrate_ram();
>          if (qemu_fclose(s->file) != 0) {
>              ret = -1;
>          }
> +        qemu_mutex_unlock_migrate_ram();
>          s->file = NULL;
>      }

Again.


> -            s->wait_for_unfreeze(s);
> -    }
> +    s->closed = 1;
>  
> -    ret = s->close(s->opaque);
> +    qemu_mutex_unlock_migrate_ram();
> +    qemu_mutex_unlock_iothread();

This is using the ram mutex to protect migration thread specific data.
A new lock should be introduced for that purpose.

> -    qemu_del_timer(s->timer);
> -    qemu_free_timer(s->timer);
> +    qemu_thread_join(&s->thread);
> +    /* Waits for the completion of the migration thread */
> +
> +    qemu_mutex_lock_iothread();
> +    qemu_mutex_lock_migrate_ram();
> +
> +    ret = s->close(s->opaque);
>      qemu_free(s->buffer);
>      qemu_free(s);
>  
>      return ret;
>  }
>  
> +
>  static int buffered_rate_limit(void *opaque)
>  {
>      QEMUFileBuffered *s = opaque;
> @@ -228,26 +226,37 @@ static int64_t buffered_get_rate_limit(void *opaque)
>      return s->xfer_limit;
>  }
>  
> -static void buffered_rate_tick(void *opaque)
> +static void *migrate_vm(void *opaque)
>  {

buffered_file.c was generic code that has now become migration specific
(although migration was the only user). So it should either stop
pretending to be generic code, by rename to migration_thread.c along
with un-exporting interfaces, or it should remain generic and therefore
all migration specific knowledge moved somewhere else.

Anthony?

> +    int64_t current_time, expire_time = qemu_get_clock_ms(rt_clock) + 100;
> +    struct timeval tv = { .tv_sec = 0, .tv_usec = 100000};

qemu_get_clock_ms should happen under iothread lock.

> -    if (s->freeze_output)
> -        return;
> +        current_time = qemu_get_clock_ms(rt_clock);
> +        if (!s->closed && (expire_time > current_time)) {
> +            tv.tv_usec = 1000 * (expire_time - current_time);
> +            select(0, NULL, NULL, NULL, &tv);
> +            continue;
> +        }
>  
> -    s->bytes_xfer = 0;
> +        s->bytes_xfer = 0;
>  
> -    buffered_flush(s);
> +        expire_time = qemu_get_clock_ms(rt_clock) + 100;
> +        if (!s->closed) {
> +            s->put_ready(s->opaque);
> +        } else {
> +            buffered_flush(s);
> +        }
> +    }
>  
> -    /* Add some checks around this */
> -    s->put_ready(s->opaque);
> +    return NULL;
>  }
>  
>  QEMUFile *qemu_fopen_ops_buffered(void *opaque,
> @@ -267,15 +276,14 @@ QEMUFile *qemu_fopen_ops_buffered(void *opaque,
>      s->put_ready = put_ready;
>      s->wait_for_unfreeze = wait_for_unfreeze;
>      s->close = close;
> +    s->closed = 0;
>  
>      s->file = qemu_fopen_ops(s, buffered_put_buffer, NULL,
>                               buffered_close, buffered_rate_limit,
>                               buffered_set_rate_limit,
> -			     buffered_get_rate_limit);
> -
> -    s->timer = qemu_new_timer_ms(rt_clock, buffered_rate_tick, s);
> +                             buffered_get_rate_limit);
>  
> -    qemu_mod_timer(s->timer, qemu_get_clock_ms(rt_clock) + 100);
> +    qemu_thread_create(&s->thread, migrate_vm, s);
>  
>      return s->file;
>  }
> diff --git a/migration.c b/migration.c
> index af3a1f2..5df186d 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -149,10 +149,12 @@ int do_migrate_set_speed(Monitor *mon, const QDict *qdict, QObject **ret_data)
>      }
>      max_throttle = d;
>  
> +    qemu_mutex_lock_migrate_ram();
>      s = migrate_to_fms(current_migration);
>      if (s && s->file) {
>          qemu_file_set_rate_limit(s->file, max_throttle);
>      }
> +    qemu_mutex_unlock_migrate_ram();

This lock protects the RAMlist, and only the RAMlist, but here its      
being used to protect migration thread data. As noted above, a new lock 
should be introduced.

>      int ret = 0;
>  
> -    qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
> -
>      if (s->file) {
>          DPRINTF("closing file\n");
> +        qemu_mutex_lock_migrate_ram();
>          if (qemu_fclose(s->file) != 0) {
>              ret = -1;
>          }
> +        qemu_mutex_unlock_migrate_ram();
>          s->file = NULL;
>      }

Again.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/5] Support for vm_stop from the migration thread
  2011-08-29 16:56     ` [Qemu-devel] " Marcelo Tosatti
@ 2011-08-30  8:40       ` Paolo Bonzini
  -1 siblings, 0 replies; 36+ messages in thread
From: Paolo Bonzini @ 2011-08-30  8:40 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Umesh Deshpande, kvm, qemu-devel

On 08/29/2011 06:56 PM, Marcelo Tosatti wrote:
>> >  diff --git a/cpus.c b/cpus.c
>> >  index de70e02..f35f683 100644
>> >  --- a/cpus.c
>> >  +++ b/cpus.c
>> >  @@ -122,8 +122,8 @@ static void do_vm_stop(int reason)
>> >    {
>> >        if (vm_running) {
>> >            cpu_disable_ticks();
>> >  -        vm_running = 0;
>> >            pause_all_vcpus();
>> >  +        vm_running = 0;
>> >            vm_state_notify(0, reason);
>> >            qemu_aio_flush();
>> >            bdrv_flush_all();
> Why this change?

Without it, you could have two threads calling into do_vm_stop, and the 
second would not wait for all CPUs to be paused.  It's still not perfect, 
as you'd get double notifications; you would need a condition variable or 
QemuEvent (from the RCU series) for that.
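
A simplified standalone sketch of the ordering argument (the globals and
the trivial pause_all_vcpus here are illustrative stand-ins, not QEMU's
real implementation): if vm_running were cleared *before*
pause_all_vcpus(), a second concurrent caller could observe 0 and return
while VCPUs are still running; clearing it only afterwards means any
caller that sees vm_running == 0 knows the pause has completed.

```c
#include <assert.h>
#include <pthread.h>

/* Simplified stand-ins for QEMU state; not the real implementation. */
static int vm_running = 1;
static int vcpus_paused;
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

static void pause_all_vcpus(void)
{
    /* In QEMU this kicks every VCPU and waits for each to stop;
     * here we only record that the pause has happened. */
    vcpus_paused = 1;
}

static void do_vm_stop(void)
{
    pthread_mutex_lock(&big_lock);
    if (vm_running) {
        pause_all_vcpus();
        vm_running = 0;      /* cleared only once all VCPUs are paused */
    } else {
        /* With the reordered code, a caller reaching here can rely on
         * the VCPUs already being paused; with the original ordering
         * (vm_running cleared first) it could not. */
        assert(vcpus_paused);
    }
    pthread_mutex_unlock(&big_lock);
}
```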

Paolo

* Re: [PATCH 5/5] Separate migration thread
  2011-08-29 18:49     ` [Qemu-devel] " Marcelo Tosatti
@ 2011-08-30  8:48       ` Paolo Bonzini
  -1 siblings, 0 replies; 36+ messages in thread
From: Paolo Bonzini @ 2011-08-30  8:48 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Umesh Deshpande, Anthony Liguori, qemu-devel, kvm

On 08/29/2011 08:49 PM, Marcelo Tosatti wrote:
>> >  -static void buffered_rate_tick(void *opaque)
>> >  +static void *migrate_vm(void *opaque)
>> >    {
>
> buffered_file.c was generic code that has now become migration specific
> (although migration was the only user). So it should either stop
> pretending to be generic code, by rename to migration_thread.c along
> with un-exporting interfaces, or it should remain generic and therefore
> all migration specific knowledge moved somewhere else.

Actually, the thread function is ill-named.  buffered_file.c is still 
generic code (or if it is not, it's a bug), except it should be called 
threaded_file.c.

Moving it to migration.c is also an option of course.  I asked Umesh to 
keep the abstraction for now, because it helped pinpointing places where 
abstractions were leaking in (such as the qemu_mutex_unlock_migrate_ram 
call that you found).

>> +    int64_t current_time, expire_time = qemu_get_clock_ms(rt_clock) + 100;
>> +    struct timeval tv = { .tv_sec = 0, .tv_usec = 100000};
>
> qemu_get_clock_ms should happen under iothread lock.

For rt_clock it is safe.  Should be documented, though.

>> +    qemu_mutex_lock_migrate_ram();
>>      s = migrate_to_fms(current_migration);
>>      if (s && s->file) {
>>          qemu_file_set_rate_limit(s->file, max_throttle);
>>      }
>> +    qemu_mutex_unlock_migrate_ram();
>
> This lock protects the RAMlist, and only the RAMlist, but here its
> being used to protect migration thread data. As noted above, a new lock
> should be introduced.

Even better, freeing the buffered_file should be only done in the 
iothread (if this is not the case) so that the lock can be pushed down 
to buffered_set_rate_limit...

> +        qemu_mutex_lock_migrate_ram();
>          if (qemu_fclose(s->file) != 0) {
>              ret = -1;
>          }
> +        qemu_mutex_unlock_migrate_ram();

... and buffered_close (if a lock turns out to be needed at all).
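
Pushing the lock down as Paolo suggests might look roughly like this
(a sketch with a hypothetical per-file mutex and a trimmed-down struct,
not the actual QEMUFileBuffered layout):

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Sketch: the lock lives inside the two accessors instead of around
 * every caller.  Names and fields are illustrative. */
typedef struct BufFile {
    pthread_mutex_t lock;
    int64_t xfer_limit;
    int closed;
} BufFile;

static void buffered_set_rate_limit(BufFile *s, int64_t limit)
{
    pthread_mutex_lock(&s->lock);   /* lock taken here, not by callers */
    if (!s->closed) {
        s->xfer_limit = limit;
    }
    pthread_mutex_unlock(&s->lock);
}

static int buffered_close(BufFile *s)
{
    pthread_mutex_lock(&s->lock);
    s->closed = 1;                  /* migration thread sees this and exits */
    pthread_mutex_unlock(&s->lock);
    /* qemu_thread_join() and the actual close/free would follow here,
     * running only in the iothread, so no further locking is needed. */
    return 0;
}
```

Since only the iothread frees the file, set_rate_limit cannot race with
the teardown itself; the lock merely serializes it against the closed
flag.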

Paolo

* Re: [PATCH 5/5] Separate migration thread
  2011-08-30  8:48       ` [Qemu-devel] " Paolo Bonzini
@ 2011-08-30 12:31         ` Marcelo Tosatti
  -1 siblings, 0 replies; 36+ messages in thread
From: Marcelo Tosatti @ 2011-08-30 12:31 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Umesh Deshpande, Anthony Liguori, qemu-devel, kvm

On Tue, Aug 30, 2011 at 10:48:11AM +0200, Paolo Bonzini wrote:
> On 08/29/2011 08:49 PM, Marcelo Tosatti wrote:
> >>>  -static void buffered_rate_tick(void *opaque)
> >>>  +static void *migrate_vm(void *opaque)
> >>>    {
> >
> >buffered_file.c was generic code that has now become migration specific
> >(although migration was the only user). So it should either stop
> >pretending to be generic code, by rename to migration_thread.c along
> >with un-exporting interfaces, or it should remain generic and therefore
> >all migration specific knowledge moved somewhere else.
> 
> Actually, the thread function is ill-named.  buffered_file.c is
> still generic code (or if it is not, it's a bug), except it should
> be called threaded_file.c.
> 
> Moving it to migration.c is also an option of course.  I asked Umesh
> to keep the abstraction for now, because it helped pinpointing
> places where abstractions were leaking in (such as the
> qemu_mutex_unlock_migrate_ram call that you found).

Fair enough, it's indeed generic except for the misuse of the RAM lock.

> >>+    int64_t current_time, expire_time = qemu_get_clock_ms(rt_clock) + 100;
> >>+    struct timeval tv = { .tv_sec = 0, .tv_usec = 100000};
> >
> >qemu_get_clock_ms should happen under iothread lock.
> 
> For rt_clock it is safe.  Should be documented, though.
>
> >>+    qemu_mutex_lock_migrate_ram();
> >>     s = migrate_to_fms(current_migration);
> >>     if (s && s->file) {
> >>         qemu_file_set_rate_limit(s->file, max_throttle);
> >>     }
> >>+    qemu_mutex_unlock_migrate_ram();
> >
> >This lock protects the RAMlist, and only the RAMlist, but here its
> >being used to protect migration thread data. As noted above, a new lock
> >should be introduced.
> 
> Even better, freeing the buffered_file should be only done in the
> iothread (if this is not the case) so that the lock can be pushed
> down to buffered_set_rate_limit...

Sounds good.

> >+        qemu_mutex_lock_migrate_ram();
> >         if (qemu_fclose(s->file) != 0) {
> >             ret = -1;
> >         }
> >+        qemu_mutex_unlock_migrate_ram();
> 
> ... and buffered_close (if a lock turns out to be needed at all).
>
> Paolo

* Re: [PATCH 0/5] Separate thread for VM migration
  2011-08-29 10:20   ` [Qemu-devel] " Paolo Bonzini
@ 2011-08-31  3:53     ` Umesh Deshpande
  -1 siblings, 0 replies; 36+ messages in thread
From: Umesh Deshpande @ 2011-08-31  3:53 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, kvm

On Mon, Aug 29, 2011 at 6:20 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 08/27/2011 08:09 PM, Umesh Deshpande wrote:
>
>> Following patch series deals with VCPU and iothread starvation during the
>> migration of a guest. Currently the iothread is responsible for performing
>> the
>> guest migration. It holds qemu_mutex during the migration and doesn't
>> allow VCPU
>> to enter the qemu mode and delays its return to the guest. The guest
>> migration,
>> executed as an iohandler also delays the execution of other iohandlers.
>> In the following patch series,
>>
>> The migration has been moved to a separate thread to
>> reduce the qemu_mutex contention and iohandler starvation.
>>
>> Umesh Deshpande (5):
>>   vm_stop from non-io threads
>>   MRU ram block list
>>   migration thread mutex
>>   separate migration bitmap
>>   separate migration thread
>>
>>  arch_init.c         |   38 +++++++++++++----
>>  buffered_file.c     |   76 ++++++++++++++++++---------------
>>  cpu-all.h           |   42 ++++++++++++++++++
>>  cpus.c              |    4 +-
>>  exec.c              |   97 ++++++++++++++++++++++++++++++++++++++++--
>>  migration.c         |  117 ++++++++++++++++++++++++++++++++-------------------
>>  migration.h         |    9 ++++
>>  qemu-common.h       |    2 +
>>  qemu-thread-posix.c |   10 ++++
>>  qemu-thread.h       |    1 +
>>  10 files changed, 302 insertions(+), 94 deletions(-)
>>
>>
> I think this patchset is quite good.  These are the problems I found:
>
> 1) the locking rules in patch 3 are a bit too clever, and the cleverness
> will become obsolete once RCU is in place.  The advantage of the clever
> stuff is that rwlock code looks more like RCU code; the disadvantage is that
> it is harder to read and easier to bikeshed about.
>
> 2) it breaks Windows build, but that's easy to fix.
>
> 3) there are a _lot_ of cleanups possible on top of patch 5 (I would not be
> too surprised if the final version has an almost-neutral diffstat), and
> whether to prefer good or perfect is another very popular topic.
>
> 4) I'm not sure block migration has been tested in all scenarios (I'm
> curious about coroutine-heavy ones).  This may mean that the migration
> thread is blocked onto Marcelo's live streaming work, which would provide
> the ingredients to remove block migration altogether.  A round of Autotest
> testing is the minimum required to include this, and I'm not sure if this
> was done either.
>
>
> That said, I find the code to be quite good overall, and I wouldn't oppose
> inclusion with only (2) fixed---may even take care of it myself---and more
> testing results apart from the impressive performance numbers.
>
> About performance, I'm curious how you measured it.  Was the buffer cache
> empty?  That is, how many compressible pages were found?  I toyed with
> vectorizing is_dup_page, but I need to get some numbers before posting.


Above tests were run with an idle VM. I didn't measure the number of
compressible pages.

- Umesh

end of thread, other threads:[~2011-08-31  3:53 UTC | newest]

Thread overview: 36+ messages
2011-08-27 18:09 [PATCH 0/5] Separate thread for VM migration Umesh Deshpande
2011-08-27 18:09 ` [Qemu-devel] " Umesh Deshpande
2011-08-27 18:09 ` [PATCH 1/5] Support for vm_stop from the migration thread Umesh Deshpande
2011-08-27 18:09   ` [Qemu-devel] " Umesh Deshpande
2011-08-29 16:56   ` Marcelo Tosatti
2011-08-29 16:56     ` [Qemu-devel] " Marcelo Tosatti
2011-08-30  8:40     ` Paolo Bonzini
2011-08-30  8:40       ` [Qemu-devel] " Paolo Bonzini
2011-08-27 18:09 ` [PATCH 2/5] MRU ram block list Umesh Deshpande
2011-08-27 18:09   ` [Qemu-devel] " Umesh Deshpande
2011-08-27 18:09 ` [PATCH 3/5] Migration thread mutex Umesh Deshpande
2011-08-27 18:09   ` [Qemu-devel] " Umesh Deshpande
2011-08-29  9:04   ` Stefan Hajnoczi
2011-08-29  9:04     ` [Qemu-devel] " Stefan Hajnoczi
2011-08-29 13:49     ` Umesh Deshpande
2011-08-29 13:49       ` [Qemu-devel] " Umesh Deshpande
2011-08-29 18:40   ` Marcelo Tosatti
2011-08-29 18:40     ` [Qemu-devel] " Marcelo Tosatti
2011-08-27 18:09 ` [PATCH 4/5] Separate migration dirty bitmap Umesh Deshpande
2011-08-27 18:09   ` [Qemu-devel] " Umesh Deshpande
2011-08-27 18:09 ` [PATCH 5/5] Separate migration thread Umesh Deshpande
2011-08-27 18:09   ` [Qemu-devel] " Umesh Deshpande
2011-08-29  9:09   ` Stefan Hajnoczi
2011-08-29  9:09     ` [Qemu-devel] " Stefan Hajnoczi
2011-08-29 13:49     ` Umesh Deshpande
2011-08-29 13:49       ` [Qemu-devel] " Umesh Deshpande
2011-08-29 18:49   ` Marcelo Tosatti
2011-08-29 18:49     ` [Qemu-devel] " Marcelo Tosatti
2011-08-30  8:48     ` Paolo Bonzini
2011-08-30  8:48       ` [Qemu-devel] " Paolo Bonzini
2011-08-30 12:31       ` Marcelo Tosatti
2011-08-30 12:31         ` [Qemu-devel] " Marcelo Tosatti
2011-08-29 10:20 ` [PATCH 0/5] Separate thread for VM migration Paolo Bonzini
2011-08-29 10:20   ` [Qemu-devel] " Paolo Bonzini
2011-08-31  3:53   ` Umesh Deshpande
2011-08-31  3:53     ` [Qemu-devel] " Umesh Deshpande
