kvm.vger.kernel.org archive mirror
* [PATCH v2 00/41] postcopy live migration
@ 2012-06-04  9:57 Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 01/41] arch_init: export sort_ram_list() and ram_save_block() Isaku Yamahata
                   ` (42 more replies)
  0 siblings, 43 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: owasserm, quintela, avi, pbonzini, aliguori, stefanha, dlaor,
	mdroth, yoshikawa.takuya, benoit.hudzia, aarcange, t.hirofuchi,
	satoshi.itoh

After a long time, here is v2. This is the QEMU part.
The Linux kernel part is sent separately.

Changes v1 -> v2:
- split up patches for review
- buffered file refactored
- many bug fixes
  Especially, PV drivers can now work with postcopy
- optimization/heuristic

Patches
1 - 30: refactoring existing code and preparation
31 - 37: implement postcopy itself (essential part)
38 - 41: some optimization/heuristic for postcopy

Intro
=====
This patch series implements postcopy live migration.[1]
As discussed at KVM Forum 2011, a dedicated character device is used for
distributed shared memory between the migration source and destination.
Now we can discuss/benchmark/compare with precopy. I believe there is
much room for improvement.

[1] http://wiki.qemu.org/Features/PostCopyLiveMigration


Usage
=====
You need to load the umem character device on the host before starting
migration.
Postcopy can be used with both the tcg and kvm accelerators. The implementation
depends only on the Linux umem character device, but the driver-dependent code
is split out into a separate file.
I tested only the host page size == guest page size case, but the implementation
allows the host page size != guest page size case.

The following options are added with this patch series.
- incoming part
  command line options
  -postcopy [-postcopy-flags <flags>]
  where flags is for changing behavior for benchmark/debugging
  Currently the following flags are available
  0: default
  1: enable touching page request

  example:
  qemu -postcopy -incoming tcp:0:4444 -monitor stdio -machine accel=kvm

- outgoing part
  options for migrate command
  migrate [-p [-n] [-m]] URI [<prefault forward> [<prefault backward>]]
  -p: indicate postcopy migration
  -n: disable background transfer of pages (for benchmark/debugging)
  -m: move background transfer of postcopy mode
  <prefault forward>: The number of forward pages which are sent with an
                      on-demand page
  <prefault backward>: The number of backward pages which are sent with an
                       on-demand page

  example:
  migrate -p -n tcp:<dest ip address>:4444 
  migrate -p -n -m tcp:<dest ip address>:4444 32 0
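
To make the two prefault parameters concrete, here is a minimal sketch of how
a faulted page plus its forward/backward counts could turn into a range of
pages to send. This is not code from the series: struct page_range and
prefault_range() are hypothetical names, and the sketch assumes the extra
pages form one contiguous range around the faulted page, clamped to the RAM
block. The real patches may order or bound the pages differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type, not from the patch series: the inclusive
 * range of page indexes to send for one on-demand request. */
struct page_range {
    uint64_t first;   /* first page index to send */
    uint64_t last;    /* last page index to send (inclusive) */
};

/* Compute the pages to send when page 'faulted' of a block with
 * 'nr_pages' pages is requested, given the two prefault counts. */
static struct page_range prefault_range(uint64_t faulted, uint64_t nr_pages,
                                        uint64_t forward, uint64_t backward)
{
    struct page_range r;

    assert(nr_pages > 0 && faulted < nr_pages);

    /* Clamp at the start of the block instead of underflowing. */
    r.first = (backward > faulted) ? 0 : faulted - backward;

    /* Clamp at the end of the block instead of running past it. */
    r.last = (forward > nr_pages - 1 - faulted) ? nr_pages - 1
                                                : faulted + forward;
    return r;
}
```

With "32 0" as in the second example above, a fault on page 100 of a
1024-page block would cover pages 100 through 132 under these assumptions.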


TODO
====
- benchmark/evaluation. Especially how async page fault affects the result.
- improve/optimization
  At the moment at least what I'm aware of is
  - making incoming socket non-blocking with thread
    As page compression is coming, it is impractical to do non-blocking
    reads and check whether the necessary data has been read.
  - touching pages in incoming qemu process by fd handler seems suboptimal.
    creating dedicated thread?
  - outgoing handler seems suboptimal causing latency.
- consider FUSE/CUSE possibility
- don't fork umemd, but create thread?

basic postcopy work flow
========================
        qemu on the destination
              |
              V
        open(/dev/umem)
              |
              V
        UMEM_INIT
              |
              V
        Here we have two file descriptors to
        umem device and shmem file
              |
              |                                  umemd
              |                                  daemon on the destination
              |
              V    create pipe to communicate
        fork()---------------------------------------,
              |                                      |
              V                                      |
        close(socket)                                V
        close(shmem)                              mmap(shmem file)
              |                                      |
              V                                      V
        mmap(umem device) for guest RAM           close(shmem file)
              |                                      |
        close(umem device)                           |
              |                                      |
              V                                      |
        wait for ready from daemon <----pipe-----send ready message
              |                                      |
              |                                 Here the daemon takes over
        send ok------------pipe---------------> ownership of the socket
              |                                 to the source
              V                                      |
        entering post copy stage                     |
        start guest execution                        |
              |                                      |
              V                                      V
        access guest RAM                          read() to get faulted pages
              |                                      |
              V                                      V
        page fault ------------------------------>page offset is returned
        block                                        |
                                                     V
                                                  pull page from the source
                                                  write the page contents
                                                  to the shmem.
                                                     |
                                                     V
        unblock     <-----------------------------write() to tell served pages
        the fault handler returns the page
        page fault is resolved
              |
              |                                   pages can be sent
              |                                   in the background
              |                                      |
              |                                      V
              |                                   write()
              |                                      |
              V                                      V
        The specified pages<-----pipe------------request to touch pages
        are made present by                          |
        touching guest RAM.                          |
              |                                      |
              V                                      V
             reply-------------pipe-------------> release the cached page
              |                                   madvise(MADV_REMOVE)
              |                                      |
              V                                      V

                 all the pages are pulled from the source

              |                                      |
              V                                      V
        the vma becomes anonymous<----------------UMEM_MAKE_VMA_ANONYMOUS
       (note: I'm not sure if this can be implemented or not)
              |                                      |
              V                                      V
        migration completes                        exit()
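
The fork/ready/ok part of the flow above can be sketched in plain POSIX C as
follows. This is only an illustration, not the code in migration-postcopy.c:
handshake_demo() is a hypothetical name, the sketch uses two pipes (one per
direction) as an assumption, and the umem device, shmem file and socket are
left out entirely.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo, not code from this series.
 * Returns 0 when both sides completed the ready/ok exchange. */
static int handshake_demo(void)
{
    int to_qemu[2];     /* daemon -> qemu pipe */
    int to_daemon[2];   /* qemu -> daemon pipe */
    char msg[8];
    int status;
    int rc = -1;
    pid_t pid;

    if (pipe(to_qemu) < 0) {
        return -1;
    }
    if (pipe(to_daemon) < 0) {
        close(to_qemu[0]);
        close(to_qemu[1]);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(to_qemu[0]);
        close(to_qemu[1]);
        close(to_daemon[0]);
        close(to_daemon[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: stands in for umemd. Keep only its own pipe ends. */
        close(to_qemu[0]);
        close(to_daemon[1]);
        if (write(to_qemu[1], "ready", 6) != 6) {
            _exit(1);
        }
        if (read(to_daemon[0], msg, sizeof(msg)) != 3 ||
            strcmp(msg, "ok") != 0) {
            _exit(2);
        }
        /* A real daemon would now own the socket and serve faults. */
        _exit(0);
    }

    /* Parent: stands in for the destination qemu. */
    close(to_qemu[1]);
    close(to_daemon[0]);
    if (read(to_qemu[0], msg, sizeof(msg)) == 6 &&
        strcmp(msg, "ready") == 0 &&
        write(to_daemon[1], "ok", 3) == 3) {
        rc = 0;
    }
    close(to_qemu[0]);
    close(to_daemon[1]);

    /* Reap the child; a nonzero exit means its side failed. */
    if (waitpid(pid, &status, 0) != pid ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        rc = -1;
    }
    return rc;
}
```

Closing the unused pipe ends on each side matters here: if the parent fails
before sending "ok", the child's read() sees EOF instead of blocking forever.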




Isaku Yamahata (41):
  arch_init: export sort_ram_list() and ram_save_block()
  arch_init: export RAM_SAVE_xxx flags for postcopy
  arch_init/ram_save: introduce constant for ram save version = 4
  arch_init: refactor host_from_stream_offset()
  arch_init/ram_save_live: factor out RAM_SAVE_FLAG_MEM_SIZE case
  arch_init: refactor ram_save_block()
  arch_init/ram_save_live: factor out ram_save_limit
  arch_init/ram_load: refactor ram_load
  arch_init: introduce helper function to find ram block with id string
  arch_init: simplify a bit by ram_find_block()
  arch_init: factor out counting transferred bytes
  arch_init: factor out setting last_block, last_offset
  exec.c: factor out qemu_get_ram_ptr()
  exec.c: export last_ram_offset()
  savevm: export qemu_peek_buffer, qemu_peek_byte, qemu_file_skip
  savevm: qemu_pending_size() to return pending buffered size
  savevm, buffered_file: introduce method to drain buffer of buffered
    file
  QEMUFile: add qemu_file_fd() for later use
  savevm/QEMUFile: drop qemu_stdio_fd
  savevm/QEMUFileSocket: drop duplicated member fd
  savevm: rename QEMUFileSocket to QEMUFileFD, socket_close to fd_close
  savevm/QEMUFile: introduce qemu_fopen_fd
  migration.c: remove redundant line in migrate_init()
  migration: export migrate_fd_completed() and migrate_fd_cleanup()
  migration: factor out parameters into MigrationParams
  buffered_file: factor out buffer management logic
  buffered_file: Introduce QEMUFileNonblock for nonblock write
  buffered_file: add qemu_file to read/write to buffer in memory
  umem.h: import Linux umem.h
  update-linux-headers.sh: teach umem.h to update-linux-headers.sh
  configure: add CONFIG_POSTCOPY option
  savevm: add new section that is used by postcopy
  postcopy: introduce -postcopy and -postcopy-flags option
  postcopy outgoing: add -p and -n option to migrate command
  postcopy: introduce helper functions for postcopy
  postcopy: implement incoming part of postcopy live migration
  postcopy: implement outgoing part of postcopy live migration
  postcopy/outgoing: add forward, backward option to specify the size
    of prefault
  postcopy/outgoing: implement prefault
  migrate: add -m (movebg) option to migrate command
  migration/postcopy: add movebg mode

 Makefile.target                 |    5 +
 arch_init.c                     |  298 ++++---
 arch_init.h                     |   20 +
 block-migration.c               |    8 +-
 buffered_file.c                 |  322 ++++++--
 buffered_file.h                 |   32 +
 configure                       |   12 +
 cpu-all.h                       |    9 +
 exec-obsolete.h                 |    1 +
 exec.c                          |   87 ++-
 hmp-commands.hx                 |   18 +-
 hmp.c                           |   10 +-
 linux-headers/linux/umem.h      |   42 +
 migration-exec.c                |   12 +-
 migration-fd.c                  |   25 +-
 migration-postcopy-stub.c       |   77 ++
 migration-postcopy.c            | 1771 +++++++++++++++++++++++++++++++++++++++
 migration-tcp.c                 |   25 +-
 migration-unix.c                |   26 +-
 migration.c                     |   97 ++-
 migration.h                     |   47 +-
 qapi-schema.json                |    4 +-
 qemu-common.h                   |    2 +
 qemu-file.h                     |    8 +-
 qemu-options.hx                 |   25 +
 qmp-commands.hx                 |    4 +-
 savevm.c                        |  177 ++++-
 scripts/update-linux-headers.sh |    2 +-
 sysemu.h                        |    4 +-
 umem.c                          |  364 ++++++++
 umem.h                          |  101 +++
 vl.c                            |   16 +-
 vmstate.h                       |    2 +-
 33 files changed, 3373 insertions(+), 280 deletions(-)
 create mode 100644 linux-headers/linux/umem.h
 create mode 100644 migration-postcopy-stub.c
 create mode 100644 migration-postcopy.c
 create mode 100644 umem.c
 create mode 100644 umem.h



* [PATCH v2 01/41] arch_init: export sort_ram_list() and ram_save_block()
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 02/41] arch_init: export RAM_SAVE_xxx flags for postcopy Isaku Yamahata
                   ` (41 subsequent siblings)
  42 siblings, 0 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: owasserm, quintela, avi, pbonzini, aliguori, stefanha, dlaor,
	mdroth, yoshikawa.takuya, benoit.hudzia, aarcange, t.hirofuchi,
	satoshi.itoh

This will be used by postcopy.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 arch_init.c |    4 ++--
 migration.h |    2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index a9e8b74..38e0173 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -164,7 +164,7 @@ static int is_dup_page(uint8_t *page)
 static RAMBlock *last_block;
 static ram_addr_t last_offset;
 
-static int ram_save_block(QEMUFile *f)
+int ram_save_block(QEMUFile *f)
 {
     RAMBlock *block = last_block;
     ram_addr_t offset = last_offset;
@@ -273,7 +273,7 @@ static int block_compar(const void *a, const void *b)
     return strcmp((*ablock)->idstr, (*bblock)->idstr);
 }
 
-static void sort_ram_list(void)
+void sort_ram_list(void)
 {
     RAMBlock *block, *nblock, **blocks;
     int n;
diff --git a/migration.h b/migration.h
index 2e9ca2e..8b9509c 100644
--- a/migration.h
+++ b/migration.h
@@ -76,6 +76,8 @@ uint64_t ram_bytes_remaining(void);
 uint64_t ram_bytes_transferred(void);
 uint64_t ram_bytes_total(void);
 
+void sort_ram_list(void);
+int ram_save_block(QEMUFile *f);
 int ram_save_live(QEMUFile *f, int stage, void *opaque);
 int ram_load(QEMUFile *f, void *opaque, int version_id);
 
-- 
1.7.1.1



* [PATCH v2 02/41] arch_init: export RAM_SAVE_xxx flags for postcopy
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 01/41] arch_init: export sort_ram_list() and ram_save_block() Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 03/41] arch_init/ram_save: introduce constant for ram save version = 4 Isaku Yamahata
                   ` (40 subsequent siblings)
  42 siblings, 0 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: owasserm, quintela, avi, pbonzini, aliguori, stefanha, dlaor,
	mdroth, yoshikawa.takuya, benoit.hudzia, aarcange, t.hirofuchi,
	satoshi.itoh

These constants will also be used by postcopy.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 arch_init.c |    7 -------
 arch_init.h |    7 +++++++
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 38e0173..bd4e61e 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -88,13 +88,6 @@ const uint32_t arch_type = QEMU_ARCH;
 /***********************************************************/
 /* ram save/restore */
 
-#define RAM_SAVE_FLAG_FULL     0x01 /* Obsolete, not used anymore */
-#define RAM_SAVE_FLAG_COMPRESS 0x02
-#define RAM_SAVE_FLAG_MEM_SIZE 0x04
-#define RAM_SAVE_FLAG_PAGE     0x08
-#define RAM_SAVE_FLAG_EOS      0x10
-#define RAM_SAVE_FLAG_CONTINUE 0x20
-
 #ifdef __ALTIVEC__
 #include <altivec.h>
 #define VECTYPE        vector unsigned char
diff --git a/arch_init.h b/arch_init.h
index c7cb94a..7cc3fa7 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -30,4 +30,11 @@ int tcg_available(void);
 int kvm_available(void);
 int xen_available(void);
 
+#define RAM_SAVE_FLAG_FULL     0x01 /* Obsolete, not used anymore */
+#define RAM_SAVE_FLAG_COMPRESS 0x02
+#define RAM_SAVE_FLAG_MEM_SIZE 0x04
+#define RAM_SAVE_FLAG_PAGE     0x08
+#define RAM_SAVE_FLAG_EOS      0x10
+#define RAM_SAVE_FLAG_CONTINUE 0x20
+
 #endif
-- 
1.7.1.1
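
For readers new to the RAM stream format, here is a small sketch of how the
RAM_SAVE_FLAG_* bits share one 64-bit header word with a page offset: the
offset is page aligned, so its low bits are free to carry the flags. The flag
values are the ones moved by this patch; pack_header(), header_offset(),
header_flags() and the 4 KiB page size are assumptions made for the sketch
(the real code uses TARGET_PAGE_SIZE/TARGET_PAGE_MASK, which depend on the
target).

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as exported in arch_init.h by this patch. */
#define RAM_SAVE_FLAG_COMPRESS 0x02
#define RAM_SAVE_FLAG_MEM_SIZE 0x04
#define RAM_SAVE_FLAG_PAGE     0x08
#define RAM_SAVE_FLAG_EOS      0x10
#define RAM_SAVE_FLAG_CONTINUE 0x20

/* Assumption for this sketch only: 4 KiB target pages. */
#define SKETCH_PAGE_SIZE 4096ULL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Combine a page-aligned offset and flag bits into one header word. */
static uint64_t pack_header(uint64_t offset, unsigned flags)
{
    assert((offset & ~SKETCH_PAGE_MASK) == 0); /* offset is page aligned */
    assert((flags & SKETCH_PAGE_MASK) == 0);   /* flags fit below a page */
    return offset | flags;
}

/* Recover the page offset from a header word. */
static uint64_t header_offset(uint64_t header)
{
    return header & SKETCH_PAGE_MASK;
}

/* Recover the flag bits from a header word. */
static unsigned header_flags(uint64_t header)
{
    return (unsigned)(header & ~SKETCH_PAGE_MASK);
}
```

Because both the save and the load side need the same flag values, keeping
them in a shared header is what lets a separate postcopy file speak the same
stream format.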



* [PATCH v2 03/41] arch_init/ram_save: introduce constant for ram save version = 4
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 01/41] arch_init: export sort_ram_list() and ram_save_block() Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 02/41] arch_init: export RAM_SAVE_xxx flags for postcopy Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 04/41] arch_init: refactor host_from_stream_offset() Isaku Yamahata
                   ` (39 subsequent siblings)
  42 siblings, 0 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini

Introduce RAM_SAVE_VERSION_ID to represent version_id for ram save format.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 arch_init.c |    2 +-
 arch_init.h |    2 ++
 vl.c        |    4 ++--
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index bd4e61e..2a53f58 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -410,7 +410,7 @@ int ram_load(QEMUFile *f, void *opaque, int version_id)
     int flags;
     int error;
 
-    if (version_id < 4 || version_id > 4) {
+    if (version_id < 4 || version_id > RAM_SAVE_VERSION_ID) {
         return -EINVAL;
     }
 
diff --git a/arch_init.h b/arch_init.h
index 7cc3fa7..456637d 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -37,4 +37,6 @@ int xen_available(void);
 #define RAM_SAVE_FLAG_EOS      0x10
 #define RAM_SAVE_FLAG_CONTINUE 0x20
 
+#define RAM_SAVE_VERSION_ID     4 /* currently version 4 */
+
 #endif
diff --git a/vl.c b/vl.c
index 23ab3a3..62dc343 100644
--- a/vl.c
+++ b/vl.c
@@ -3436,8 +3436,8 @@ int main(int argc, char **argv, char **envp)
     default_drive(default_sdcard, snapshot, machine->use_scsi,
                   IF_SD, 0, SD_OPTS);
 
-    register_savevm_live(NULL, "ram", 0, 4, NULL, ram_save_live, NULL,
-                         ram_load, NULL);
+    register_savevm_live(NULL, "ram", 0, RAM_SAVE_VERSION_ID, NULL,
+                         ram_save_live, NULL, ram_load, NULL);
 
     if (nb_numa_nodes > 0) {
         int i;
-- 
1.7.1.1


* [PATCH v2 04/41] arch_init: refactor host_from_stream_offset()
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (2 preceding siblings ...)
  2012-06-04  9:57 ` [PATCH v2 03/41] arch_init/ram_save: introduce constant for ram save version = 4 Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 05/41] arch_init/ram_save_live: factor out RAM_SAVE_FLAG_MEM_SIZE case Isaku Yamahata
                   ` (38 subsequent siblings)
  42 siblings, 0 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 arch_init.c |   25 ++++++++++++++++++-------
 arch_init.h |    7 +++++++
 2 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 2a53f58..36ece1d 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -374,21 +374,22 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
     return (stage == 2) && (expected_time <= migrate_max_downtime());
 }
 
-static inline void *host_from_stream_offset(QEMUFile *f,
-                                            ram_addr_t offset,
-                                            int flags)
+void *ram_load_host_from_stream_offset(QEMUFile *f,
+                                       ram_addr_t offset,
+                                       int flags,
+                                       RAMBlock **last_blockp)
 {
-    static RAMBlock *block = NULL;
+    RAMBlock *block;
     char id[256];
     uint8_t len;
 
     if (flags & RAM_SAVE_FLAG_CONTINUE) {
-        if (!block) {
+        if (!(*last_blockp)) {
             fprintf(stderr, "Ack, bad migration stream!\n");
             return NULL;
         }
 
-        return memory_region_get_ram_ptr(block->mr) + offset;
+        return memory_region_get_ram_ptr((*last_blockp)->mr) + offset;
     }
 
     len = qemu_get_byte(f);
@@ -396,14 +397,24 @@ static inline void *host_from_stream_offset(QEMUFile *f,
     id[len] = 0;
 
     QLIST_FOREACH(block, &ram_list.blocks, next) {
-        if (!strncmp(id, block->idstr, sizeof(id)))
+        if (!strncmp(id, block->idstr, sizeof(id))) {
+            *last_blockp = block;
             return memory_region_get_ram_ptr(block->mr) + offset;
+        }
     }
 
     fprintf(stderr, "Can't find block %s!\n", id);
     return NULL;
 }
 
+static inline void *host_from_stream_offset(QEMUFile *f,
+                                            ram_addr_t offset,
+                                            int flags)
+{
+    static RAMBlock *block = NULL;
+    return ram_load_host_from_stream_offset(f, offset, flags, &block);
+}
+
 int ram_load(QEMUFile *f, void *opaque, int version_id)
 {
     ram_addr_t addr;
diff --git a/arch_init.h b/arch_init.h
index 456637d..d84eac7 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -39,4 +39,11 @@ int xen_available(void);
 
 #define RAM_SAVE_VERSION_ID     4 /* currently version 4 */
 
+#if defined(NEED_CPU_H) && !defined(CONFIG_USER_ONLY)
+void *ram_load_host_from_stream_offset(QEMUFile *f,
+                                       ram_addr_t offset,
+                                       int flags,
+                                       RAMBlock **last_blockp);
+#endif
+
 #endif
-- 
1.7.1.1
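
The point of this refactoring, as far as the diff shows, is that the
"last block" cache moves from a function-local static into a cursor owned by
the caller, so a second user can decode its own stream without sharing state.
A stripped-down sketch of that pattern follows; struct sketch_block and
sketch_lookup() are hypothetical stand-ins for RAMBlock and
ram_load_host_from_stream_offset(), and a NULL id models
RAM_SAVE_FLAG_CONTINUE.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for RAMBlock. */
struct sketch_block {
    const char *idstr;     /* block name as sent in the stream */
    unsigned char *host;   /* start of the block in host memory */
};

/* Resolve (id, offset) to a host pointer.
 * id == NULL means "same block as last time" (the CONTINUE case).
 * *lastp is the caller-owned cursor that replaces the old static. */
static unsigned char *sketch_lookup(struct sketch_block *blocks, size_t n,
                                    const char *id, size_t offset,
                                    struct sketch_block **lastp)
{
    size_t i;

    assert(lastp != NULL);

    if (id == NULL) {
        if (*lastp == NULL) {
            return NULL;   /* bad stream: CONTINUE before any block */
        }
        return (*lastp)->host + offset;
    }

    for (i = 0; i < n; i++) {
        if (strcmp(id, blocks[i].idstr) == 0) {
            *lastp = &blocks[i];   /* remember for later CONTINUEs */
            return blocks[i].host + offset;
        }
    }
    return NULL;           /* unknown block id */
}
```

Two callers that each keep their own cursor can interleave lookups without
disturbing each other, which a single static variable cannot offer.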


* [PATCH v2 05/41] arch_init/ram_save_live: factor out RAM_SAVE_FLAG_MEM_SIZE case
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (3 preceding siblings ...)
  2012-06-04  9:57 ` [PATCH v2 04/41] arch_init: refactor host_from_stream_offset() Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 06/41] arch_init: refactor ram_save_block() Isaku Yamahata
                   ` (37 subsequent siblings)
  42 siblings, 0 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: owasserm, quintela, avi, pbonzini, aliguori, stefanha, dlaor,
	mdroth, yoshikawa.takuya, benoit.hudzia, aarcange, t.hirofuchi,
	satoshi.itoh

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 arch_init.c |   21 ++++++++++++++-------
 migration.h |    1 +
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 36ece1d..28e5abb 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -287,6 +287,19 @@ void sort_ram_list(void)
     g_free(blocks);
 }
 
+void ram_save_live_mem_size(QEMUFile *f)
+{
+    RAMBlock *block;
+
+    qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
+
+    QLIST_FOREACH(block, &ram_list.blocks, next) {
+        qemu_put_byte(f, strlen(block->idstr));
+        qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr));
+        qemu_put_be64(f, block->length);
+    }
+}
+
 int ram_save_live(QEMUFile *f, int stage, void *opaque)
 {
     ram_addr_t addr;
@@ -321,13 +334,7 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
 
         memory_global_dirty_log_start();
 
-        qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
-
-        QLIST_FOREACH(block, &ram_list.blocks, next) {
-            qemu_put_byte(f, strlen(block->idstr));
-            qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr));
-            qemu_put_be64(f, block->length);
-        }
+        ram_save_live_mem_size(f);
     }
 
     bytes_transferred_last = bytes_transferred;
diff --git a/migration.h b/migration.h
index 8b9509c..e2e9b43 100644
--- a/migration.h
+++ b/migration.h
@@ -78,6 +78,7 @@ uint64_t ram_bytes_total(void);
 
 void sort_ram_list(void);
 int ram_save_block(QEMUFile *f);
+void ram_save_live_mem_size(QEMUFile *f);
 int ram_save_live(QEMUFile *f, int stage, void *opaque);
 int ram_load(QEMUFile *f, void *opaque, int version_id);
 
-- 
1.7.1.1
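
The section written by ram_save_live_mem_size() has a simple layout: one
big-endian 64-bit word holding the total RAM size with RAM_SAVE_FLAG_MEM_SIZE
or'ed in, then for each block a one-byte id length, the id bytes, and a
big-endian 64-bit block length. The sketch below writes the same layout into
a flat buffer. It is an illustration only: sketch_block, put_be64() and
encode_mem_size() are hypothetical names, and the total is assumed to be the
sum of the block lengths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RAM_SAVE_FLAG_MEM_SIZE 0x04

/* Hypothetical stand-in for the fields of RAMBlock used here. */
struct sketch_block {
    const char *idstr;
    uint64_t length;
};

/* Store v at buf in big-endian order; returns the bytes written. */
static size_t put_be64(uint8_t *buf, uint64_t v)
{
    int i;

    for (i = 0; i < 8; i++) {
        buf[i] = (uint8_t)(v >> (56 - 8 * i));
    }
    return 8;
}

/* Returns the number of bytes written, or 0 if cap is too small. */
static size_t encode_mem_size(uint8_t *buf, size_t cap,
                              const struct sketch_block *blocks, size_t n)
{
    uint64_t total = 0;
    size_t pos = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        total += blocks[i].length;
    }

    if (cap < 8) {
        return 0;
    }
    pos += put_be64(buf + pos, total | RAM_SAVE_FLAG_MEM_SIZE);

    for (i = 0; i < n; i++) {
        size_t len = strlen(blocks[i].idstr);

        assert(len < 256);            /* the length prefix is one byte */
        if (cap - pos < 1 + len + 8) {
            return 0;
        }
        buf[pos++] = (uint8_t)len;
        memcpy(buf + pos, blocks[i].idstr, len);
        pos += len;
        pos += put_be64(buf + pos, blocks[i].length);
    }
    return pos;
}
```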



* [PATCH v2 06/41] arch_init: refactor ram_save_block()
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (4 preceding siblings ...)
  2012-06-04  9:57 ` [PATCH v2 05/41] arch_init/ram_save_live: factor out RAM_SAVE_FLAG_MEM_SIZE case Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 07/41] arch_init/ram_save_live: factor out ram_save_limit Isaku Yamahata
                   ` (36 subsequent siblings)
  42 siblings, 0 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: owasserm, quintela, avi, pbonzini, aliguori, stefanha, dlaor,
	mdroth, yoshikawa.takuya, benoit.hudzia, aarcange, t.hirofuchi,
	satoshi.itoh

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>

---
Changes v1 -> v2:
- don't refer to last_block, which can be NULL.
  And avoid a possible infinite loop.
---
 arch_init.c |   82 +++++++++++++++++++++++++++++++++-------------------------
 arch_init.h |    1 +
 2 files changed, 48 insertions(+), 35 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 28e5abb..900cc8e 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -154,6 +154,44 @@ static int is_dup_page(uint8_t *page)
     return 1;
 }
 
+static RAMBlock *last_block_sent = NULL;
+
+int ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
+{
+    MemoryRegion *mr = block->mr;
+    uint8_t *p;
+    int cont;
+
+    if (!memory_region_get_dirty(mr, offset, TARGET_PAGE_SIZE,
+                                 DIRTY_MEMORY_MIGRATION)) {
+        return 0;
+    }
+    memory_region_reset_dirty(mr, offset, TARGET_PAGE_SIZE,
+                              DIRTY_MEMORY_MIGRATION);
+
+    cont = (block == last_block_sent) ? RAM_SAVE_FLAG_CONTINUE : 0;
+    p = memory_region_get_ram_ptr(mr) + offset;
+    last_block_sent = block;
+
+    if (is_dup_page(p)) {
+        qemu_put_be64(f, offset | cont | RAM_SAVE_FLAG_COMPRESS);
+        if (!cont) {
+            qemu_put_byte(f, strlen(block->idstr));
+            qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr));
+        }
+        qemu_put_byte(f, *p);
+        return 1;
+    }
+
+    qemu_put_be64(f, offset | cont | RAM_SAVE_FLAG_PAGE);
+    if (!cont) {
+        qemu_put_byte(f, strlen(block->idstr));
+        qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr));
+    }
+    qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
+    return TARGET_PAGE_SIZE;
+}
+
 static RAMBlock *last_block;
 static ram_addr_t last_offset;
 
@@ -162,45 +200,14 @@ int ram_save_block(QEMUFile *f)
     RAMBlock *block = last_block;
     ram_addr_t offset = last_offset;
     int bytes_sent = 0;
-    MemoryRegion *mr;
 
-    if (!block)
+    if (!block) {
         block = QLIST_FIRST(&ram_list.blocks);
+        last_block = block;
+    }
 
     do {
-        mr = block->mr;
-        if (memory_region_get_dirty(mr, offset, TARGET_PAGE_SIZE,
-                                    DIRTY_MEMORY_MIGRATION)) {
-            uint8_t *p;
-            int cont = (block == last_block) ? RAM_SAVE_FLAG_CONTINUE : 0;
-
-            memory_region_reset_dirty(mr, offset, TARGET_PAGE_SIZE,
-                                      DIRTY_MEMORY_MIGRATION);
-
-            p = memory_region_get_ram_ptr(mr) + offset;
-
-            if (is_dup_page(p)) {
-                qemu_put_be64(f, offset | cont | RAM_SAVE_FLAG_COMPRESS);
-                if (!cont) {
-                    qemu_put_byte(f, strlen(block->idstr));
-                    qemu_put_buffer(f, (uint8_t *)block->idstr,
-                                    strlen(block->idstr));
-                }
-                qemu_put_byte(f, *p);
-                bytes_sent = 1;
-            } else {
-                qemu_put_be64(f, offset | cont | RAM_SAVE_FLAG_PAGE);
-                if (!cont) {
-                    qemu_put_byte(f, strlen(block->idstr));
-                    qemu_put_buffer(f, (uint8_t *)block->idstr,
-                                    strlen(block->idstr));
-                }
-                qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
-                bytes_sent = TARGET_PAGE_SIZE;
-            }
-
-            break;
-        }
+        bytes_sent = ram_save_page(f, block, offset);
 
         offset += TARGET_PAGE_SIZE;
         if (offset >= block->length) {
@@ -209,6 +216,10 @@ int ram_save_block(QEMUFile *f)
             if (!block)
                 block = QLIST_FIRST(&ram_list.blocks);
         }
+
+        if (bytes_sent > 0) {
+            break;
+        }
     } while (block != last_block || offset != last_offset);
 
     last_block = block;
@@ -318,6 +329,7 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
     if (stage == 1) {
         RAMBlock *block;
         bytes_transferred = 0;
+        last_block_sent = NULL;
         last_block = NULL;
         last_offset = 0;
         sort_ram_list();
diff --git a/arch_init.h b/arch_init.h
index d84eac7..0a39082 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -40,6 +40,7 @@ int xen_available(void);
 #define RAM_SAVE_VERSION_ID     4 /* currently version 4 */
 
 #if defined(NEED_CPU_H) && !defined(CONFIG_USER_ONLY)
+int ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset);
 void *ram_load_host_from_stream_offset(QEMUFile *f,
                                        ram_addr_t offset,
                                        int flags,
-- 
1.7.1.1
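
The loop shape after this refactoring (send at most one page, advance the
cursor first, then stop either on a sent page or after one full circle) can
be shown on a single array of dirty flags. The sketch below is not the QEMU
code: scan_state and scan_next_dirty() are hypothetical, there is only one
block, and "sending" a page just means returning its index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for last_block/last_offset plus the dirty log. */
struct scan_state {
    bool *dirty;       /* one flag per page */
    size_t nr_pages;
    size_t cursor;     /* where the next scan starts */
};

/* Returns the index of the page "sent", or -1 if no page was dirty. */
static long scan_next_dirty(struct scan_state *s)
{
    size_t start = s->cursor;
    size_t pos = start;
    long sent = -1;

    assert(s->nr_pages > 0 && start < s->nr_pages);

    do {
        size_t cur = pos;
        bool was_dirty = s->dirty[cur];

        s->dirty[cur] = false;          /* test-and-clear the dirty flag */

        /* Advance before deciding to stop, so the next call resumes
         * after the page that was just handled. */
        pos = (pos + 1) % s->nr_pages;

        if (was_dirty) {
            sent = (long)cur;
            break;
        }
    } while (pos != start);             /* stop after one full circle */

    s->cursor = pos;
    return sent;
}
```

Advancing the cursor before the break is what keeps repeated calls from
revisiting the same page, and the wrap-around check bounds the scan when
nothing is dirty.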



* [PATCH v2 07/41] arch_init/ram_save_live: factor out ram_save_limit
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (5 preceding siblings ...)
  2012-06-04  9:57 ` [PATCH v2 06/41] arch_init: refactor ram_save_block() Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 08/41] arch_init/ram_load: refactor ram_load Isaku Yamahata
                   ` (35 subsequent siblings)
  42 siblings, 0 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: owasserm, quintela, avi, pbonzini, aliguori, stefanha, dlaor,
	mdroth, yoshikawa.takuya, benoit.hudzia, aarcange, t.hirofuchi,
	satoshi.itoh

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 arch_init.c |   28 ++++++++++++++++------------
 migration.h |    1 +
 2 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 900cc8e..c861e30 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -311,9 +311,23 @@ void ram_save_live_mem_size(QEMUFile *f)
     }
 }
 
+void ram_save_memory_set_dirty(void)
+{
+    RAMBlock *block;
+
+    QLIST_FOREACH(block, &ram_list.blocks, next) {
+        ram_addr_t addr;
+        for (addr = 0; addr < block->length; addr += TARGET_PAGE_SIZE) {
+            if (!memory_region_get_dirty(block->mr, addr, TARGET_PAGE_SIZE,
+                                         DIRTY_MEMORY_MIGRATION)) {
+                memory_region_set_dirty(block->mr, addr, TARGET_PAGE_SIZE);
+            }
+        }
+    }
+}
+
 int ram_save_live(QEMUFile *f, int stage, void *opaque)
 {
-    ram_addr_t addr;
     uint64_t bytes_transferred_last;
     double bwidth = 0;
     uint64_t expected_time = 0;
@@ -327,7 +341,6 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
     memory_global_sync_dirty_bitmap(get_system_memory());
 
     if (stage == 1) {
-        RAMBlock *block;
         bytes_transferred = 0;
         last_block_sent = NULL;
         last_block = NULL;
@@ -335,17 +348,8 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
         sort_ram_list();
 
         /* Make sure all dirty bits are set */
-        QLIST_FOREACH(block, &ram_list.blocks, next) {
-            for (addr = 0; addr < block->length; addr += TARGET_PAGE_SIZE) {
-                if (!memory_region_get_dirty(block->mr, addr, TARGET_PAGE_SIZE,
-                                             DIRTY_MEMORY_MIGRATION)) {
-                    memory_region_set_dirty(block->mr, addr, TARGET_PAGE_SIZE);
-                }
-            }
-        }
-
+        ram_save_memory_set_dirty();
         memory_global_dirty_log_start();
-
         ram_save_live_mem_size(f);
     }
 
diff --git a/migration.h b/migration.h
index e2e9b43..6cf4512 100644
--- a/migration.h
+++ b/migration.h
@@ -78,6 +78,7 @@ uint64_t ram_bytes_total(void);
 
 void sort_ram_list(void);
 int ram_save_block(QEMUFile *f);
+void ram_save_memory_set_dirty(void);
 void ram_save_live_mem_size(QEMUFile *f);
 int ram_save_live(QEMUFile *f, int stage, void *opaque);
 int ram_load(QEMUFile *f, void *opaque, int version_id);
-- 
1.7.1.1



* [PATCH v2 08/41] arch_init/ram_load: refactor ram_load
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (6 preceding siblings ...)
  2012-06-04  9:57 ` [PATCH v2 07/41] arch_init/ram_save_live: factor out ram_save_limit Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 09/41] arch_init: introduce helper function to find ram block with id string Isaku Yamahata
                   ` (34 subsequent siblings)
  42 siblings, 0 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: owasserm, quintela, avi, pbonzini, aliguori, stefanha, dlaor,
	mdroth, yoshikawa.takuya, benoit.hudzia, aarcange, t.hirofuchi,
	satoshi.itoh

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 arch_init.c |   67 +++++++++++++++++++++++++++++++++-------------------------
 arch_init.h |    1 +
 2 files changed, 39 insertions(+), 29 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index c861e30..bb0cd52 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -438,6 +438,41 @@ static inline void *host_from_stream_offset(QEMUFile *f,
     return ram_load_host_from_stream_offset(f, offset, flags, &block);
 }
 
+int ram_load_mem_size(QEMUFile *f, ram_addr_t total_ram_bytes)
+{
+    /* Synchronize RAM block list */
+    char id[256];
+    ram_addr_t length;
+
+    while (total_ram_bytes) {
+        RAMBlock *block;
+        uint8_t len;
+
+        len = qemu_get_byte(f);
+        qemu_get_buffer(f, (uint8_t *)id, len);
+        id[len] = 0;
+        length = qemu_get_be64(f);
+
+        QLIST_FOREACH(block, &ram_list.blocks, next) {
+            if (!strncmp(id, block->idstr, sizeof(id))) {
+                if (block->length != length)
+                    return -EINVAL;
+                break;
+            }
+        }
+
+        if (!block) {
+            fprintf(stderr, "Unknown ramblock \"%s\", cannot "
+                    "accept migration\n", id);
+            return -EINVAL;
+        }
+
+        total_ram_bytes -= length;
+    }
+
+    return 0;
+}
+
 int ram_load(QEMUFile *f, void *opaque, int version_id)
 {
     ram_addr_t addr;
@@ -456,35 +491,9 @@ int ram_load(QEMUFile *f, void *opaque, int version_id)
 
         if (flags & RAM_SAVE_FLAG_MEM_SIZE) {
             if (version_id == 4) {
-                /* Synchronize RAM block list */
-                char id[256];
-                ram_addr_t length;
-                ram_addr_t total_ram_bytes = addr;
-
-                while (total_ram_bytes) {
-                    RAMBlock *block;
-                    uint8_t len;
-
-                    len = qemu_get_byte(f);
-                    qemu_get_buffer(f, (uint8_t *)id, len);
-                    id[len] = 0;
-                    length = qemu_get_be64(f);
-
-                    QLIST_FOREACH(block, &ram_list.blocks, next) {
-                        if (!strncmp(id, block->idstr, sizeof(id))) {
-                            if (block->length != length)
-                                return -EINVAL;
-                            break;
-                        }
-                    }
-
-                    if (!block) {
-                        fprintf(stderr, "Unknown ramblock \"%s\", cannot "
-                                "accept migration\n", id);
-                        return -EINVAL;
-                    }
-
-                    total_ram_bytes -= length;
+                error = ram_load_mem_size(f, addr);
+                if (error) {
+                    return error;
                 }
             }
         }
diff --git a/arch_init.h b/arch_init.h
index 0a39082..507f110 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -45,6 +45,7 @@ void *ram_load_host_from_stream_offset(QEMUFile *f,
                                        ram_addr_t offset,
                                        int flags,
                                        RAMBlock **last_blockp);
+int ram_load_mem_size(QEMUFile *f, ram_addr_t total_ram_bytes);
 #endif
 
 #endif
-- 
1.7.1.1
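
For readers following the stream format that ram_load_mem_size() consumes: judging from the qemu_get_byte(), qemu_get_buffer() and qemu_get_be64() calls above, each RAM block record is a one-byte id length, the id bytes, and a 64-bit big-endian block length. The following is only a minimal stand-alone sketch of parsing one such record. DemoReader and demo_read_block_record() are hypothetical names, and the sketch reads from a memory buffer instead of a QEMUFile.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory reader standing in for QEMUFile. */
typedef struct DemoReader {
    const uint8_t *buf;
    size_t size;
    size_t pos;
} DemoReader;

/* Parse one record: <u8 len><len bytes of id><u64 big-endian length>.
 * id must have room for 256 bytes (255 id bytes plus the NUL).
 * Returns 0 on success, -1 if the input is truncated. */
static int demo_read_block_record(DemoReader *r, char id[256],
                                  uint64_t *length)
{
    if (r->size - r->pos < 1) {
        return -1;
    }
    uint8_t len = r->buf[r->pos++];
    if (r->size - r->pos < (size_t)len + 8) {
        return -1;
    }
    memcpy(id, r->buf + r->pos, len);
    id[len] = '\0';          /* len <= 255, so this stays inside id[256] */
    r->pos += len;

    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | r->buf[r->pos++];   /* most significant byte first */
    }
    *length = v;
    return 0;
}
```

Unlike the patch, the sketch checks for truncated input explicitly, because it has no error state comparable to QEMUFile's to fall back on.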



* [PATCH v2 09/41] arch_init: introduce helper function to find ram block with id string
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (7 preceding siblings ...)
  2012-06-04  9:57 ` [PATCH v2 08/41] arch_init/ram_load: refactor ram_load Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 10/41] arch_init: simplify a bit by ram_find_block() Isaku Yamahata
                   ` (33 subsequent siblings)
  42 siblings, 0 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: owasserm, quintela, avi, pbonzini, aliguori, stefanha, dlaor,
	mdroth, yoshikawa.takuya, benoit.hudzia, aarcange, t.hirofuchi,
	satoshi.itoh

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 arch_init.c |   13 +++++++++++++
 arch_init.h |    1 +
 2 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index bb0cd52..9981abe 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -397,6 +397,19 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
     return (stage == 2) && (expected_time <= migrate_max_downtime());
 }
 
+RAMBlock *ram_find_block(const char *id, uint8_t len)
+{
+    RAMBlock *block;
+
+    QLIST_FOREACH(block, &ram_list.blocks, next) {
+        if (!strncmp(id, block->idstr, len) && block->idstr[len] == '\0') {
+            return block;
+        }
+    }
+
+    return NULL;
+}
+
 void *ram_load_host_from_stream_offset(QEMUFile *f,
                                        ram_addr_t offset,
                                        int flags,
diff --git a/arch_init.h b/arch_init.h
index 507f110..7f5c77a 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -41,6 +41,7 @@ int xen_available(void);
 
 #if defined(NEED_CPU_H) && !defined(CONFIG_USER_ONLY)
 int ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset);
+RAMBlock *ram_find_block(const char *id, uint8_t len);
 void *ram_load_host_from_stream_offset(QEMUFile *f,
                                        ram_addr_t offset,
                                        int flags,
-- 
1.7.1.1
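
A note on length-limited lookups such as ram_find_block(): strncmp(id, idstr, len) on its own compares only the first len bytes, so an exact match also requires the stored id to end at len. The sketch below contrasts the two behaviours. It is not QEMU code: DemoBlock, demo_find_prefix() and demo_find_exact() are hypothetical, and a plain array stands in for ram_list.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical block table entry; only the id string matters here. */
typedef struct DemoBlock {
    char idstr[256];
} DemoBlock;

/* Prefix match: compares only the first len bytes. */
static DemoBlock *demo_find_prefix(DemoBlock *blocks, size_t n,
                                   const char *id, size_t len)
{
    for (size_t i = 0; i < n; i++) {
        if (!strncmp(id, blocks[i].idstr, len)) {
            return &blocks[i];
        }
    }
    return NULL;
}

/* Exact match: additionally requires the stored id to end at len. */
static DemoBlock *demo_find_exact(DemoBlock *blocks, size_t n,
                                  const char *id, size_t len)
{
    for (size_t i = 0; i < n; i++) {
        if (len < sizeof(blocks[i].idstr) &&
            !strncmp(id, blocks[i].idstr, len) &&
            blocks[i].idstr[len] == '\0') {
            return &blocks[i];
        }
    }
    return NULL;
}
```

With two blocks named "pc.ram.extra" and "pc.ram" (made-up ids), looking up "pc.ram" with len 6 returns the first block under the prefix rule and the second under the exact rule.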



* [PATCH v2 10/41] arch_init: simplify a bit by ram_find_block()
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (8 preceding siblings ...)
  2012-06-04  9:57 ` [PATCH v2 09/41] arch_init: introduce helper function to find ram block with id string Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 11/41] arch_init: factor out counting transferred bytes Isaku Yamahata
                   ` (32 subsequent siblings)
  42 siblings, 0 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: owasserm, quintela, avi, pbonzini, aliguori, stefanha, dlaor,
	mdroth, yoshikawa.takuya, benoit.hudzia, aarcange, t.hirofuchi,
	satoshi.itoh

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 arch_init.c |   21 ++++++++-------------
 exec.c      |   12 ++++++------
 2 files changed, 14 insertions(+), 19 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 9981abe..73bf250 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -432,11 +432,10 @@ void *ram_load_host_from_stream_offset(QEMUFile *f,
     qemu_get_buffer(f, (uint8_t *)id, len);
     id[len] = 0;
 
-    QLIST_FOREACH(block, &ram_list.blocks, next) {
-        if (!strncmp(id, block->idstr, sizeof(id))) {
-            *last_blockp = block;
-            return memory_region_get_ram_ptr(block->mr) + offset;
-        }
+    block = ram_find_block(id, len);
+    if (block) {
+        *last_blockp = block;
+        return memory_region_get_ram_ptr(block->mr) + offset;
     }
 
     fprintf(stderr, "Can't find block %s!\n", id);
@@ -466,19 +465,15 @@ int ram_load_mem_size(QEMUFile *f, ram_addr_t total_ram_bytes)
         id[len] = 0;
         length = qemu_get_be64(f);
 
-        QLIST_FOREACH(block, &ram_list.blocks, next) {
-            if (!strncmp(id, block->idstr, sizeof(id))) {
-                if (block->length != length)
-                    return -EINVAL;
-                break;
-            }
-        }
-
+        block = ram_find_block(id, len);
         if (!block) {
             fprintf(stderr, "Unknown ramblock \"%s\", cannot "
                     "accept migration\n", id);
             return -EINVAL;
         }
+        if (block->length != length) {
+            return -EINVAL;
+        }
 
         total_ram_bytes -= length;
     }
diff --git a/exec.c b/exec.c
index a0494c7..078a408 100644
--- a/exec.c
+++ b/exec.c
@@ -33,6 +33,7 @@
 #include "kvm.h"
 #include "hw/xen.h"
 #include "qemu-timer.h"
+#include "arch_init.h"
 #include "memory.h"
 #include "exec-memory.h"
 #if defined(CONFIG_USER_ONLY)
@@ -2609,12 +2610,11 @@ void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev)
     }
     pstrcat(new_block->idstr, sizeof(new_block->idstr), name);
 
-    QLIST_FOREACH(block, &ram_list.blocks, next) {
-        if (block != new_block && !strcmp(block->idstr, new_block->idstr)) {
-            fprintf(stderr, "RAMBlock \"%s\" already registered, abort!\n",
-                    new_block->idstr);
-            abort();
-        }
+    block = ram_find_block(new_block->idstr, strlen(new_block->idstr));
+    if (block != new_block) {
+        fprintf(stderr, "RAMBlock \"%s\" already registered, abort!\n",
+                new_block->idstr);
+        abort();
     }
 }
 
-- 
1.7.1.1



* [PATCH v2 11/41] arch_init: factor out counting transferred bytes
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (9 preceding siblings ...)
  2012-06-04  9:57 ` [PATCH v2 10/41] arch_init: simplify a bit by ram_find_block() Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 12/41] arch_init: factor out setting last_block, last_offset Isaku Yamahata
                   ` (31 subsequent siblings)
  42 siblings, 0 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 arch_init.c |   24 ++++++++++++------------
 1 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 73bf250..2617478 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -155,8 +155,9 @@ static int is_dup_page(uint8_t *page)
 }
 
 static RAMBlock *last_block_sent = NULL;
+static uint64_t bytes_transferred;
 
-int ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
+static int ram_save_page_int(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
 {
     MemoryRegion *mr = block->mr;
     uint8_t *p;
@@ -192,6 +193,13 @@ int ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
     return TARGET_PAGE_SIZE;
 }
 
+int ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
+{
+    int bytes_sent = ram_save_page_int(f, block, offset);
+    bytes_transferred += bytes_sent;
+    return bytes_sent;
+}
+
 static RAMBlock *last_block;
 static ram_addr_t last_offset;
 
@@ -228,8 +236,6 @@ int ram_save_block(QEMUFile *f)
     return bytes_sent;
 }
 
-static uint64_t bytes_transferred;
-
 static ram_addr_t ram_save_remaining(void)
 {
     RAMBlock *block;
@@ -357,11 +363,7 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
     bwidth = qemu_get_clock_ns(rt_clock);
 
     while ((ret = qemu_file_rate_limit(f)) == 0) {
-        int bytes_sent;
-
-        bytes_sent = ram_save_block(f);
-        bytes_transferred += bytes_sent;
-        if (bytes_sent == 0) { /* no more blocks */
+        if (ram_save_block(f) == 0) { /* no more blocks */
             break;
         }
     }
@@ -381,11 +383,9 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
 
     /* try transferring iterative blocks of memory */
     if (stage == 3) {
-        int bytes_sent;
-
         /* flush all remaining blocks regardless of rate limiting */
-        while ((bytes_sent = ram_save_block(f)) != 0) {
-            bytes_transferred += bytes_sent;
+        while (ram_save_block(f) != 0) {
+            /* nothing */
         }
         memory_global_dirty_log_stop();
     }
-- 
1.7.1.1
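
The shape of this change, reduced to a stand-alone sketch: the byte counter is updated in exactly one place, the public per-page entry point, so callers that reach it by another route are counted too. Everything here is hypothetical (demo_send_page_raw(), demo_send_page(), and the one-byte cost for a duplicate page); it only illustrates the wrapper pattern, not the real page encoding.

```c
#include <stdint.h>

static uint64_t demo_bytes_transferred;   /* hypothetical global counter */

/* Hypothetical low-level send: a normal page costs page_size bytes,
 * a "duplicate" page costs a single byte. */
static int demo_send_page_raw(int is_dup, int page_size)
{
    return is_dup ? 1 : page_size;
}

/* Public entry point: the only place where the counter is updated. */
static int demo_send_page(int is_dup, int page_size)
{
    int sent = demo_send_page_raw(is_dup, page_size);
    demo_bytes_transferred += (uint64_t)sent;
    return sent;
}
```

Callers then only need the return value to detect "nothing left to send", as the simplified loops in ram_save_live() do.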


* [PATCH v2 12/41] arch_init: factor out setting last_block, last_offset
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (10 preceding siblings ...)
  2012-06-04  9:57 ` [PATCH v2 11/41] arch_init: factor out counting transferred bytes Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 13/41] exec.c: factor out qemu_get_ram_ptr() Isaku Yamahata
                   ` (30 subsequent siblings)
  42 siblings, 0 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: owasserm, quintela, avi, pbonzini, aliguori, stefanha, dlaor,
	mdroth, yoshikawa.takuya, benoit.hudzia, aarcange, t.hirofuchi,
	satoshi.itoh

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 arch_init.c |   13 ++++++++-----
 arch_init.h |    1 +
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 2617478..22d9691 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -203,6 +203,12 @@ int ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
 static RAMBlock *last_block;
 static ram_addr_t last_offset;
 
+void ram_save_set_last_block(RAMBlock *block, ram_addr_t offset)
+{
+    last_block = block;
+    last_offset = offset;
+}
+
 int ram_save_block(QEMUFile *f)
 {
     RAMBlock *block = last_block;
@@ -230,9 +236,7 @@ int ram_save_block(QEMUFile *f)
         }
     } while (block != last_block || offset != last_offset);
 
-    last_block = block;
-    last_offset = offset;
-
+    ram_save_set_last_block(block, offset);
     return bytes_sent;
 }
 
@@ -349,8 +353,7 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
     if (stage == 1) {
         bytes_transferred = 0;
         last_block_sent = NULL;
-        last_block = NULL;
-        last_offset = 0;
+        ram_save_set_last_block(NULL, 0);
         sort_ram_list();
 
         /* Make sure all dirty bits are set */
diff --git a/arch_init.h b/arch_init.h
index 7f5c77a..15548cd 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -40,6 +40,7 @@ int xen_available(void);
 #define RAM_SAVE_VERSION_ID     4 /* currently version 4 */
 
 #if defined(NEED_CPU_H) && !defined(CONFIG_USER_ONLY)
+void ram_save_set_last_block(RAMBlock *block, ram_addr_t offset);
 int ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset);
 RAMBlock *ram_find_block(const char *id, uint8_t len);
 void *ram_load_host_from_stream_offset(QEMUFile *f,
-- 
1.7.1.1



* [PATCH v2 13/41] exec.c: factor out qemu_get_ram_ptr()
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (11 preceding siblings ...)
  2012-06-04  9:57 ` [PATCH v2 12/41] arch_init: factor out setting last_block, last_offset Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 14/41] exec.c: export last_ram_offset() Isaku Yamahata
                   ` (29 subsequent siblings)
  42 siblings, 0 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: owasserm, quintela, avi, pbonzini, aliguori, stefanha, dlaor,
	mdroth, yoshikawa.takuya, benoit.hudzia, aarcange, t.hirofuchi,
	satoshi.itoh

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 cpu-all.h |    2 ++
 exec.c    |   51 +++++++++++++++++++++++++++++----------------------
 2 files changed, 31 insertions(+), 22 deletions(-)

diff --git a/cpu-all.h b/cpu-all.h
index 028528f..ff7f827 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -508,6 +508,8 @@ extern RAMList ram_list;
 extern const char *mem_path;
 extern int mem_prealloc;
 
+RAMBlock *qemu_get_ram_block(ram_addr_t addr);
+
 /* Flags stored in the low bits of the TLB virtual address.  These are
    defined so that fast path ram access is all zeros.  */
 /* Zero if TLB entry is valid.  */
diff --git a/exec.c b/exec.c
index 078a408..7f44893 100644
--- a/exec.c
+++ b/exec.c
@@ -2799,15 +2799,7 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
 }
 #endif /* !_WIN32 */
 
-/* Return a host pointer to ram allocated with qemu_ram_alloc.
-   With the exception of the softmmu code in this file, this should
-   only be used for local memory (e.g. video ram) that the device owns,
-   and knows it isn't going to access beyond the end of the block.
-
-   It should not be used for general purpose DMA.
-   Use cpu_physical_memory_map/cpu_physical_memory_rw instead.
- */
-void *qemu_get_ram_ptr(ram_addr_t addr)
+RAMBlock *qemu_get_ram_block(ram_addr_t addr)
 {
     RAMBlock *block;
 
@@ -2818,19 +2810,7 @@ void *qemu_get_ram_ptr(ram_addr_t addr)
                 QLIST_REMOVE(block, next);
                 QLIST_INSERT_HEAD(&ram_list.blocks, block, next);
             }
-            if (xen_enabled()) {
-                /* We need to check if the requested address is in the RAM
-                 * because we don't want to map the entire memory in QEMU.
-                 * In that case just map until the end of the page.
-                 */
-                if (block->offset == 0) {
-                    return xen_map_cache(addr, 0, 0);
-                } else if (block->host == NULL) {
-                    block->host =
-                        xen_map_cache(block->offset, block->length, 1);
-                }
-            }
-            return block->host + (addr - block->offset);
+            return block;
         }
     }
 
@@ -2841,6 +2821,33 @@ void *qemu_get_ram_ptr(ram_addr_t addr)
 }
 
 /* Return a host pointer to ram allocated with qemu_ram_alloc.
+   With the exception of the softmmu code in this file, this should
+   only be used for local memory (e.g. video ram) that the device owns,
+   and knows it isn't going to access beyond the end of the block.
+
+   It should not be used for general purpose DMA.
+   Use cpu_physical_memory_map/cpu_physical_memory_rw instead.
+ */
+void *qemu_get_ram_ptr(ram_addr_t addr)
+{
+    RAMBlock *block = qemu_get_ram_block(addr);
+
+    if (xen_enabled()) {
+        /* We need to check if the requested address is in the RAM
+         * because we don't want to map the entire memory in QEMU.
+         * In that case just map until the end of the page.
+         */
+        if (block->offset == 0) {
+            return xen_map_cache(addr, 0, 0);
+        } else if (block->host == NULL) {
+            block->host =
+                xen_map_cache(block->offset, block->length, 1);
+        }
+    }
+    return block->host + (addr - block->offset);
+}
+
+/* Return a host pointer to ram allocated with qemu_ram_alloc.
  * Same as qemu_get_ram_ptr but avoid reordering ramblocks.
  */
 void *qemu_safe_ram_ptr(ram_addr_t addr)
-- 
1.7.1.1
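
The split made by this patch can be summarised as two steps: first find the block that contains a RAM address, then turn the address into a host pointer inside that block. A minimal sketch under simplifying assumptions follows: DemoBlock, demo_get_block() and demo_get_ptr() are hypothetical, blocks live in a plain array, and the list reordering and Xen mapping done by the real code are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for RAMBlock: a range of RAM addresses
 * backed by host memory. */
typedef struct DemoBlock {
    uint64_t offset;    /* first address covered by the block */
    uint64_t length;    /* number of bytes covered */
    uint8_t *host;      /* host memory backing the block */
} DemoBlock;

/* Step 1: find the block containing addr, or NULL if there is none.
 * The unsigned subtraction wraps for addr < offset, so a single
 * comparison covers both bounds. */
static DemoBlock *demo_get_block(DemoBlock *blocks, size_t n, uint64_t addr)
{
    for (size_t i = 0; i < n; i++) {
        if (addr - blocks[i].offset < blocks[i].length) {
            return &blocks[i];
        }
    }
    return NULL;
}

/* Step 2: translate addr into a host pointer inside that block. */
static uint8_t *demo_get_ptr(DemoBlock *blocks, size_t n, uint64_t addr)
{
    DemoBlock *b = demo_get_block(blocks, n, addr);
    return b ? b->host + (addr - b->offset) : NULL;
}
```

Returning NULL for an unknown address is a choice made for the sketch only. Exposing step 1 separately lets a caller work with the block itself rather than only with a pointer into it.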



* [PATCH v2 14/41] exec.c: export last_ram_offset()
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (12 preceding siblings ...)
  2012-06-04  9:57 ` [PATCH v2 13/41] exec.c: factor out qemu_get_ram_ptr() Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 15/41] savevm: export qemu_peek_buffer, qemu_peek_byte, qemu_file_skip Isaku Yamahata
                   ` (28 subsequent siblings)
  42 siblings, 0 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: owasserm, quintela, avi, pbonzini, aliguori, stefanha, dlaor,
	mdroth, yoshikawa.takuya, benoit.hudzia, aarcange, t.hirofuchi,
	satoshi.itoh

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 exec-obsolete.h |    1 +
 exec.c          |    4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/exec-obsolete.h b/exec-obsolete.h
index 792c831..fb21dd7 100644
--- a/exec-obsolete.h
+++ b/exec-obsolete.h
@@ -25,6 +25,7 @@
 
 #ifndef CONFIG_USER_ONLY
 
+ram_addr_t qemu_last_ram_offset(void);
 ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
                                    MemoryRegion *mr);
 ram_addr_t qemu_ram_alloc(ram_addr_t size, MemoryRegion *mr);
diff --git a/exec.c b/exec.c
index 7f44893..78eeee5 100644
--- a/exec.c
+++ b/exec.c
@@ -2576,7 +2576,7 @@ static ram_addr_t find_ram_offset(ram_addr_t size)
     return offset;
 }
 
-static ram_addr_t last_ram_offset(void)
+ram_addr_t qemu_last_ram_offset(void)
 {
     RAMBlock *block;
     ram_addr_t last = 0;
@@ -2672,7 +2672,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
     QLIST_INSERT_HEAD(&ram_list.blocks, new_block, next);
 
     ram_list.phys_dirty = g_realloc(ram_list.phys_dirty,
-                                       last_ram_offset() >> TARGET_PAGE_BITS);
+                                    qemu_last_ram_offset() >> TARGET_PAGE_BITS);
     memset(ram_list.phys_dirty + (new_block->offset >> TARGET_PAGE_BITS),
            0xff, size >> TARGET_PAGE_BITS);
 
-- 
1.7.1.1



* [PATCH v2 15/41] savevm: export qemu_peek_buffer, qemu_peek_byte, qemu_file_skip
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (13 preceding siblings ...)
  2012-06-04  9:57 ` [PATCH v2 14/41] exec.c: export last_ram_offset() Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 16/41] savevm: qemu_pending_size() to return pending buffered size Isaku Yamahata
                   ` (27 subsequent siblings)
  42 siblings, 0 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: owasserm, quintela, avi, pbonzini, aliguori, stefanha, dlaor,
	mdroth, yoshikawa.takuya, benoit.hudzia, aarcange, t.hirofuchi,
	satoshi.itoh

These functions will be used by postcopy.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 qemu-file.h |    3 +++
 savevm.c    |    6 +++---
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/qemu-file.h b/qemu-file.h
index 31b83f6..a285bef 100644
--- a/qemu-file.h
+++ b/qemu-file.h
@@ -88,6 +88,9 @@ void qemu_put_be32(QEMUFile *f, unsigned int v);
 void qemu_put_be64(QEMUFile *f, uint64_t v);
 int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size);
 int qemu_get_byte(QEMUFile *f);
+int qemu_peek_byte(QEMUFile *f, int offset);
+int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset);
+void qemu_file_skip(QEMUFile *f, int size);
 
 static inline unsigned int qemu_get_ubyte(QEMUFile *f)
 {
diff --git a/savevm.c b/savevm.c
index 2d18bab..8ad843f 100644
--- a/savevm.c
+++ b/savevm.c
@@ -588,14 +588,14 @@ void qemu_put_byte(QEMUFile *f, int v)
         qemu_fflush(f);
 }
 
-static void qemu_file_skip(QEMUFile *f, int size)
+void qemu_file_skip(QEMUFile *f, int size)
 {
     if (f->buf_index + size <= f->buf_size) {
         f->buf_index += size;
     }
 }
 
-static int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset)
+int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset)
 {
     int pending;
     int index;
@@ -643,7 +643,7 @@ int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size)
     return done;
 }
 
-static int qemu_peek_byte(QEMUFile *f, int offset)
+int qemu_peek_byte(QEMUFile *f, int offset)
 {
     int index = f->buf_index + offset;
 
-- 
1.7.1.1



* [PATCH v2 16/41] savevm: qemu_pending_size() to return pending buffered size
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (14 preceding siblings ...)
  2012-06-04  9:57 ` [PATCH v2 15/41] savevm: export qemu_peek_buffer, qemu_peek_byte, qemu_file_skip Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 17/41] savevm, buffered_file: introduce method to drain buffer of buffered file Isaku Yamahata
                   ` (26 subsequent siblings)
  42 siblings, 0 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini

This will be used later by postcopy migration.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 qemu-file.h |    1 +
 savevm.c    |    5 +++++
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/qemu-file.h b/qemu-file.h
index a285bef..880ef4b 100644
--- a/qemu-file.h
+++ b/qemu-file.h
@@ -91,6 +91,7 @@ int qemu_get_byte(QEMUFile *f);
 int qemu_peek_byte(QEMUFile *f, int offset);
 int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset);
 void qemu_file_skip(QEMUFile *f, int size);
+int qemu_pending_size(const QEMUFile *f);
 
 static inline unsigned int qemu_get_ubyte(QEMUFile *f)
 {
diff --git a/savevm.c b/savevm.c
index 8ad843f..2992f97 100644
--- a/savevm.c
+++ b/savevm.c
@@ -595,6 +595,11 @@ void qemu_file_skip(QEMUFile *f, int size)
     }
 }
 
+int qemu_pending_size(const QEMUFile *f)
+{
+    return f->buf_size - f->buf_index;
+}
+
 int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset)
 {
     int pending;
-- 
1.7.1.1
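
How the exported helpers relate to each other, as a stand-alone sketch: peeking looks at buffered bytes without consuming them, skipping consumes bytes only when they are all buffered, and the pending size is buf_size minus buf_index. DemoFile and the demo_* functions are hypothetical. In particular, this sketch returns -1 when the requested byte is not buffered and never refills the buffer, which is an assumption of the example and not a statement about qemu_peek_byte().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical read buffer with the two fields the helpers rely on. */
typedef struct DemoFile {
    const uint8_t *buf;
    int buf_index;   /* next unread byte */
    int buf_size;    /* number of valid bytes in buf */
} DemoFile;

/* Bytes that are buffered but not yet consumed. */
static int demo_pending_size(const DemoFile *f)
{
    return f->buf_size - f->buf_index;
}

/* Look at the byte at the given offset without consuming it;
 * returns -1 if that byte is not buffered. */
static int demo_peek_byte(const DemoFile *f, int offset)
{
    int index = f->buf_index + offset;
    if (offset < 0 || index >= f->buf_size) {
        return -1;
    }
    return f->buf[index];
}

/* Consume size bytes, but only if they are all buffered. */
static void demo_skip(DemoFile *f, int size)
{
    if (size >= 0 && f->buf_index + size <= f->buf_size) {
        f->buf_index += size;
    }
}
```

A reader can therefore peek at a header, decide how to handle it, and skip it afterwards, using the pending size to tell whether enough data has arrived.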


* [PATCH v2 17/41] savevm, buffered_file: introduce method to drain buffer of buffered file
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (15 preceding siblings ...)
  2012-06-04  9:57 ` [PATCH v2 16/41] savevm: qemu_pending_size() to return pending buffered size Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 18/41] QEMUFile: add qemu_file_fd() for later use Isaku Yamahata
                   ` (25 subsequent siblings)
  42 siblings, 0 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: owasserm, quintela, avi, pbonzini, aliguori, stefanha, dlaor,
	mdroth, yoshikawa.takuya, benoit.hudzia, aarcange, t.hirofuchi,
	satoshi.itoh

Introduce a new method to drain the buffer of QEMUFileBuffered.
During postcopy migration, the buffer can grow without bound.
To keep the buffer reasonably small, add a method that waits
for the buffer to drain.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 buffered_file.c |   20 +++++++++++++++-----
 buffered_file.h |    1 +
 qemu-file.h     |    1 +
 savevm.c        |    7 +++++++
 4 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/buffered_file.c b/buffered_file.c
index f170aa0..a38caec 100644
--- a/buffered_file.c
+++ b/buffered_file.c
@@ -170,6 +170,15 @@ static int buffered_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, in
     return offset;
 }
 
+static void buffered_drain(QEMUFileBuffered *s)
+{
+    while (!qemu_file_get_error(s->file) && s->buffer_size) {
+        buffered_flush(s);
+        if (s->freeze_output)
+            s->wait_for_unfreeze(s->opaque);
+    }
+}
+
 static int buffered_close(void *opaque)
 {
     QEMUFileBuffered *s = opaque;
@@ -177,11 +186,7 @@ static int buffered_close(void *opaque)
 
     DPRINTF("closing\n");
 
-    while (!qemu_file_get_error(s->file) && s->buffer_size) {
-        buffered_flush(s);
-        if (s->freeze_output)
-            s->wait_for_unfreeze(s->opaque);
-    }
+    buffered_drain(s);
 
     ret = s->close(s->opaque);
 
@@ -291,3 +296,8 @@ QEMUFile *qemu_fopen_ops_buffered(void *opaque,
 
     return s->file;
 }
+
+void qemu_buffered_file_drain_buffer(void *buffered_file)
+{
+    buffered_drain(buffered_file);
+}
diff --git a/buffered_file.h b/buffered_file.h
index 98d358b..cd8e1e8 100644
--- a/buffered_file.h
+++ b/buffered_file.h
@@ -26,5 +26,6 @@ QEMUFile *qemu_fopen_ops_buffered(void *opaque, size_t xfer_limit,
                                   BufferedPutReadyFunc *put_ready,
                                   BufferedWaitForUnfreezeFunc *wait_for_unfreeze,
                                   BufferedCloseFunc *close);
+void qemu_buffered_file_drain_buffer(void *buffered_file);
 
 #endif
diff --git a/qemu-file.h b/qemu-file.h
index 880ef4b..331ac8b 100644
--- a/qemu-file.h
+++ b/qemu-file.h
@@ -72,6 +72,7 @@ QEMUFile *qemu_popen(FILE *popen_file, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
 int qemu_stdio_fd(QEMUFile *f);
 void qemu_fflush(QEMUFile *f);
+void qemu_buffered_file_drain(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
 void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size);
 void qemu_put_byte(QEMUFile *f, int v);
diff --git a/savevm.c b/savevm.c
index 2992f97..fb47529 100644
--- a/savevm.c
+++ b/savevm.c
@@ -85,6 +85,7 @@
 #include "cpus.h"
 #include "memory.h"
 #include "qmp-commands.h"
+#include "buffered_file.h"
 
 #define SELF_ANNOUNCE_ROUNDS 5
 
@@ -477,6 +478,12 @@ void qemu_fflush(QEMUFile *f)
     }
 }
 
+void qemu_buffered_file_drain(QEMUFile *f)
+{
+    qemu_fflush(f);
+    qemu_buffered_file_drain_buffer(f->opaque);
+}
+
 static void qemu_fill_buffer(QEMUFile *f)
 {
     int len;
-- 
1.7.1.1
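
The drain loop factored out above follows a common pattern: flush, and if the output is frozen, wait until it can accept data again, repeating until the buffer is empty or an error occurs. The sketch below models that loop in isolation. All names (DemoBuffered, DemoSink, demo_flush(), demo_drain() and the callbacks) are hypothetical, and the sink is a toy that accepts at most 4 bytes per call and reports "would block" on every second call.

```c
#include <stddef.h>

/* Hypothetical buffered writer. The sink callback reports how many bytes
 * it took (0 = would block, negative = error). */
typedef struct DemoBuffered {
    size_t buffer_size;     /* bytes still queued */
    int freeze_output;      /* set when the sink could not take more */
    int error;              /* sticky error flag */
    long (*sink)(void *opaque, size_t nbytes);
    void (*wait_for_unfreeze)(void *opaque);
    void *opaque;
} DemoBuffered;

/* Toy sink state, for demonstration only. */
typedef struct DemoSink {
    int calls;
    int waits;
    size_t taken;
} DemoSink;

static long demo_sink_write(void *opaque, size_t nbytes)
{
    DemoSink *k = opaque;
    if (k->calls++ % 2 == 1) {
        return 0;                       /* pretend the output is busy */
    }
    size_t n = nbytes < 4 ? nbytes : 4;
    k->taken += n;
    return (long)n;
}

static void demo_sink_wait(void *opaque)
{
    DemoSink *k = opaque;
    k->waits++;                         /* a real wait would block here */
}

/* One flush attempt: hand the queued bytes to the sink once. */
static void demo_flush(DemoBuffered *s)
{
    long n = s->sink(s->opaque, s->buffer_size);
    if (n < 0) {
        s->error = 1;
    } else if (n == 0) {
        s->freeze_output = 1;
    } else {
        s->buffer_size -= (size_t)n;
        s->freeze_output = 0;
    }
}

/* Drain: keep flushing until the queue is empty or an error occurs,
 * waiting whenever the sink reports that it is blocked. */
static void demo_drain(DemoBuffered *s)
{
    while (!s->error && s->buffer_size) {
        demo_flush(s);
        if (s->freeze_output) {
            s->wait_for_unfreeze(s->opaque);
        }
    }
}
```

Note that such a loop blocks the caller until the buffer is empty, which is the intended effect when the goal is to bound the buffer size.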



* [PATCH v2 18/41] QEMUFile: add qemu_file_fd() for later use
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (16 preceding siblings ...)
  2012-06-04  9:57 ` [PATCH v2 17/41] savevm, buffered_file: introduce method to drain buffer of buffered file Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 19/41] savevm/QEMUFile: drop qemu_stdio_fd Isaku Yamahata
                   ` (24 subsequent siblings)
  42 siblings, 0 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: owasserm, quintela, avi, pbonzini, aliguori, stefanha, dlaor,
	mdroth, yoshikawa.takuya, benoit.hudzia, aarcange, t.hirofuchi,
	satoshi.itoh

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 qemu-file.h |    1 +
 savevm.c    |   12 ++++++++++++
 2 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/qemu-file.h b/qemu-file.h
index 331ac8b..98a8023 100644
--- a/qemu-file.h
+++ b/qemu-file.h
@@ -71,6 +71,7 @@ QEMUFile *qemu_fopen_socket(int fd);
 QEMUFile *qemu_popen(FILE *popen_file, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
 int qemu_stdio_fd(QEMUFile *f);
+int qemu_file_fd(QEMUFile *f);
 void qemu_fflush(QEMUFile *f);
 void qemu_buffered_file_drain(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
diff --git a/savevm.c b/savevm.c
index fb47529..cba1a69 100644
--- a/savevm.c
+++ b/savevm.c
@@ -178,6 +178,7 @@ struct QEMUFile {
     uint8_t buf[IO_BUF_SIZE];
 
     int last_error;
+    int fd;     /* -1 means fd isn't associated */
 };
 
 typedef struct QEMUFileStdio
@@ -276,6 +277,7 @@ QEMUFile *qemu_popen(FILE *stdio_file, const char *mode)
         s->file = qemu_fopen_ops(s, stdio_put_buffer, NULL, stdio_pclose, 
 				 NULL, NULL, NULL);
     }
+    s->file->fd = fileno(stdio_file);
     return s->file;
 }
 
@@ -291,6 +293,7 @@ QEMUFile *qemu_popen_cmd(const char *command, const char *mode)
     return qemu_popen(popen_file, mode);
 }
 
+/* TODO: replace this with qemu_file_fd() */
 int qemu_stdio_fd(QEMUFile *f)
 {
     QEMUFileStdio *p;
@@ -325,6 +328,7 @@ QEMUFile *qemu_fdopen(int fd, const char *mode)
         s->file = qemu_fopen_ops(s, stdio_put_buffer, NULL, stdio_fclose, 
 				 NULL, NULL, NULL);
     }
+    s->file->fd = fd;
     return s->file;
 
 fail:
@@ -339,6 +343,7 @@ QEMUFile *qemu_fopen_socket(int fd)
     s->fd = fd;
     s->file = qemu_fopen_ops(s, NULL, socket_get_buffer, socket_close, 
 			     NULL, NULL, NULL);
+    s->file->fd = fd;
     return s->file;
 }
 
@@ -381,6 +386,7 @@ QEMUFile *qemu_fopen(const char *filename, const char *mode)
         s->file = qemu_fopen_ops(s, NULL, file_get_buffer, stdio_fclose, 
 			       NULL, NULL, NULL);
     }
+    s->file->fd = fileno(s->stdio_file);
     return s->file;
 fail:
     g_free(s);
@@ -431,10 +437,16 @@ QEMUFile *qemu_fopen_ops(void *opaque, QEMUFilePutBufferFunc *put_buffer,
     f->set_rate_limit = set_rate_limit;
     f->get_rate_limit = get_rate_limit;
     f->is_write = 0;
+    f->fd = -1;
 
     return f;
 }
 
+int qemu_file_fd(QEMUFile *f)
+{
+    return f->fd;
+}
+
 int qemu_file_get_error(QEMUFile *f)
 {
     return f->last_error;
-- 
1.7.1.1



* [PATCH v2 19/41] savevm/QEMUFile: drop qemu_stdio_fd
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (17 preceding siblings ...)
  2012-06-04  9:57 ` [PATCH v2 18/41] QEMUFile: add qemu_file_fd() for later use Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 20/41] savevm/QEMUFileSocket: drop duplicated member fd Isaku Yamahata
                   ` (23 subsequent siblings)
  42 siblings, 0 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: owasserm, quintela, avi, pbonzini, aliguori, stefanha, dlaor,
	mdroth, yoshikawa.takuya, benoit.hudzia, aarcange, t.hirofuchi,
	satoshi.itoh

Now qemu_file_fd() replaces qemu_stdio_fd().

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 migration-exec.c |    4 ++--
 migration-fd.c   |    2 +-
 qemu-file.h      |    1 -
 savevm.c         |   12 ------------
 4 files changed, 3 insertions(+), 16 deletions(-)

diff --git a/migration-exec.c b/migration-exec.c
index 6c97db9..95e9779 100644
--- a/migration-exec.c
+++ b/migration-exec.c
@@ -98,7 +98,7 @@ static void exec_accept_incoming_migration(void *opaque)
     QEMUFile *f = opaque;
 
     process_incoming_migration(f);
-    qemu_set_fd_handler2(qemu_stdio_fd(f), NULL, NULL, NULL, NULL);
+    qemu_set_fd_handler2(qemu_file_fd(f), NULL, NULL, NULL, NULL);
     qemu_fclose(f);
 }
 
@@ -113,7 +113,7 @@ int exec_start_incoming_migration(const char *command)
         return -errno;
     }
 
-    qemu_set_fd_handler2(qemu_stdio_fd(f), NULL,
+    qemu_set_fd_handler2(qemu_file_fd(f), NULL,
 			 exec_accept_incoming_migration, NULL, f);
 
     return 0;
diff --git a/migration-fd.c b/migration-fd.c
index 50138ed..d9c13fe 100644
--- a/migration-fd.c
+++ b/migration-fd.c
@@ -104,7 +104,7 @@ static void fd_accept_incoming_migration(void *opaque)
     QEMUFile *f = opaque;
 
     process_incoming_migration(f);
-    qemu_set_fd_handler2(qemu_stdio_fd(f), NULL, NULL, NULL, NULL);
+    qemu_set_fd_handler2(qemu_file_fd(f), NULL, NULL, NULL, NULL);
     qemu_fclose(f);
 }
 
diff --git a/qemu-file.h b/qemu-file.h
index 98a8023..1a12e7d 100644
--- a/qemu-file.h
+++ b/qemu-file.h
@@ -70,7 +70,6 @@ QEMUFile *qemu_fdopen(int fd, const char *mode);
 QEMUFile *qemu_fopen_socket(int fd);
 QEMUFile *qemu_popen(FILE *popen_file, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
-int qemu_stdio_fd(QEMUFile *f);
 int qemu_file_fd(QEMUFile *f);
 void qemu_fflush(QEMUFile *f);
 void qemu_buffered_file_drain(QEMUFile *f);
diff --git a/savevm.c b/savevm.c
index cba1a69..ec9f5d0 100644
--- a/savevm.c
+++ b/savevm.c
@@ -293,18 +293,6 @@ QEMUFile *qemu_popen_cmd(const char *command, const char *mode)
     return qemu_popen(popen_file, mode);
 }
 
-/* TODO: replace this with qemu_file_fd() */
-int qemu_stdio_fd(QEMUFile *f)
-{
-    QEMUFileStdio *p;
-    int fd;
-
-    p = (QEMUFileStdio *)f->opaque;
-    fd = fileno(p->stdio_file);
-
-    return fd;
-}
-
 QEMUFile *qemu_fdopen(int fd, const char *mode)
 {
     QEMUFileStdio *s;
-- 
1.7.1.1



* [PATCH v2 20/41] savevm/QEMUFileSocket: drop duplicated member fd
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (18 preceding siblings ...)
  2012-06-04  9:57 ` [PATCH v2 19/41] savevm/QEMUFile: drop qemu_stdio_fd Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 21/41] savevm: rename QEMUFileSocket to QEMUFileFD, socket_close to fd_close Isaku Yamahata
                   ` (22 subsequent siblings)
  42 siblings, 0 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: benoit.hudzia, aarcange, aliguori, quintela, stefanha,
	t.hirofuchi, dlaor, satoshi.itoh, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini

The fd is already stored in QEMUFile, so drop the duplicated member
QEMUFileSocket::fd.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 savevm.c |    4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/savevm.c b/savevm.c
index ec9f5d0..4b560b3 100644
--- a/savevm.c
+++ b/savevm.c
@@ -189,7 +189,6 @@ typedef struct QEMUFileStdio
 
 typedef struct QEMUFileSocket
 {
-    int fd;
     QEMUFile *file;
 } QEMUFileSocket;
 
@@ -199,7 +198,7 @@ static int socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
     ssize_t len;
 
     do {
-        len = qemu_recv(s->fd, buf, size, 0);
+        len = qemu_recv(s->file->fd, buf, size, 0);
     } while (len == -1 && socket_error() == EINTR);
 
     if (len == -1)
@@ -328,7 +327,6 @@ QEMUFile *qemu_fopen_socket(int fd)
 {
     QEMUFileSocket *s = g_malloc0(sizeof(QEMUFileSocket));
 
-    s->fd = fd;
     s->file = qemu_fopen_ops(s, NULL, socket_get_buffer, socket_close, 
 			     NULL, NULL, NULL);
     s->file->fd = fd;
-- 
1.7.1.1


* [PATCH v2 21/41] savevm: rename QEMUFileSocket to QEMUFileFD, socket_close to fd_close
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (19 preceding siblings ...)
  2012-06-04  9:57 ` [PATCH v2 20/41] savevm/QEMUFileSocket: drop duplicated member fd Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 22/41] savevm/QEMUFile: introduce qemu_fopen_fd Isaku Yamahata
                   ` (21 subsequent siblings)
  42 siblings, 0 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: owasserm, quintela, avi, pbonzini, aliguori, stefanha, dlaor,
	mdroth, yoshikawa.takuya, benoit.hudzia, aarcange, t.hirofuchi,
	satoshi.itoh

The structure will later be shared with a non-socket fd backend, so give
it and its close handler generic names.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 savevm.c |   14 +++++++-------
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/savevm.c b/savevm.c
index 4b560b3..2fb0c3e 100644
--- a/savevm.c
+++ b/savevm.c
@@ -187,14 +187,14 @@ typedef struct QEMUFileStdio
     QEMUFile *file;
 } QEMUFileStdio;
 
-typedef struct QEMUFileSocket
+typedef struct QEMUFileFD
 {
     QEMUFile *file;
-} QEMUFileSocket;
+} QEMUFileFD;
 
 static int socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
 {
-    QEMUFileSocket *s = opaque;
+    QEMUFileFD *s = opaque;
     ssize_t len;
 
     do {
@@ -207,9 +207,9 @@ static int socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
     return len;
 }
 
-static int socket_close(void *opaque)
+static int fd_close(void *opaque)
 {
-    QEMUFileSocket *s = opaque;
+    QEMUFileFD *s = opaque;
     g_free(s);
     return 0;
 }
@@ -325,9 +325,9 @@ fail:
 
 QEMUFile *qemu_fopen_socket(int fd)
 {
-    QEMUFileSocket *s = g_malloc0(sizeof(QEMUFileSocket));
+    QEMUFileFD *s = g_malloc0(sizeof(QEMUFileFD));
 
-    s->file = qemu_fopen_ops(s, NULL, socket_get_buffer, socket_close, 
+    s->file = qemu_fopen_ops(s, NULL, socket_get_buffer, fd_close,
 			     NULL, NULL, NULL);
     s->file->fd = fd;
     return s->file;
-- 
1.7.1.1



* [PATCH v2 22/41] savevm/QEMUFile: introduce qemu_fopen_fd
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (20 preceding siblings ...)
  2012-06-04  9:57 ` [PATCH v2 21/41] savevm: rename QEMUFileSocket to QEMUFileFD, socket_close to fd_close Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 23/41] migration.c: remove redundant line in migrate_init() Isaku Yamahata
                   ` (20 subsequent siblings)
  42 siblings, 0 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: owasserm, quintela, avi, pbonzini, aliguori, stefanha, dlaor,
	mdroth, yoshikawa.takuya, benoit.hudzia, aarcange, t.hirofuchi,
	satoshi.itoh

Introduce a nonblocking fd read backend for QEMUFile.
This will be used by postcopy live migration.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
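For reviewers, the read loop added below can be tried outside of QEMU with
the following minimal sketch. The names demo_set_nonblock() and
demo_fd_read() are hypothetical stand-ins, not part of this patch, and plain
fcntl() is used in place of QEMU's fcntl_setfl() helper. The sketch assumes a
POSIX host.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: put fd into nonblocking mode.
 * Returns 0 on success, -errno on failure. */
static int demo_set_nonblock(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1) {
        return -errno;
    }
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) {
        return -errno;
    }
    return 0;
}

/* Hypothetical stand-in for fd_get_buffer(): read up to size bytes.
 * Returns the number of bytes read (possibly short), 0 if the peer
 * closed before any byte arrived, or -errno if the very first read
 * failed (for example -EAGAIN when nothing is available yet). */
static ssize_t demo_fd_read(int fd, uint8_t *buf, size_t size)
{
    ssize_t len = 0;

    assert(buf != NULL);
    while (size > 0) {
        ssize_t ret = read(fd, buf, size);
        if (ret == -1) {
            if (errno == EINTR) {
                continue;           /* interrupted: just retry */
            }
            if (len == 0) {
                len = -errno;       /* report the error only if nothing was read */
            }
            break;
        }
        if (ret == 0) {
            break;                  /* end of file: writer closed its end */
        }
        len += ret;
        buf += ret;
        size -= (size_t)ret;
    }
    return len;
}
```

The point of the design is that a short read is not an error: once some
bytes have been collected, a later EAGAIN only ends the loop, and the caller
sees the partial length.
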
 qemu-file.h |    1 +
 savevm.c    |   40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+), 0 deletions(-)

diff --git a/qemu-file.h b/qemu-file.h
index 1a12e7d..af5b123 100644
--- a/qemu-file.h
+++ b/qemu-file.h
@@ -68,6 +68,7 @@ QEMUFile *qemu_fopen_ops(void *opaque, QEMUFilePutBufferFunc *put_buffer,
 QEMUFile *qemu_fopen(const char *filename, const char *mode);
 QEMUFile *qemu_fdopen(int fd, const char *mode);
 QEMUFile *qemu_fopen_socket(int fd);
+QEMUFile *qemu_fopen_fd(int fd);
 QEMUFile *qemu_popen(FILE *popen_file, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
 int qemu_file_fd(QEMUFile *f);
diff --git a/savevm.c b/savevm.c
index 2fb0c3e..5640614 100644
--- a/savevm.c
+++ b/savevm.c
@@ -207,6 +207,35 @@ static int socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
     return len;
 }
 
+static int fd_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
+{
+    QEMUFileFD *s = opaque;
+    ssize_t len = 0;
+
+    while (size > 0) {
+        ssize_t ret = read(s->file->fd, buf, size);
+        if (ret == -1) {
+            if (errno == EINTR) {
+                continue;
+            }
+            if (len == 0) {
+                len = -errno;
+            }
+            break;
+        }
+
+        if (ret == 0) {
+            /* the write end of the pipe is closed */
+            break;
+        }
+        len += ret;
+        buf += ret;
+        size -= ret;
+    }
+
+    return len;
+}
+
 static int fd_close(void *opaque)
 {
     QEMUFileFD *s = opaque;
@@ -333,6 +362,17 @@ QEMUFile *qemu_fopen_socket(int fd)
     return s->file;
 }
 
+QEMUFile *qemu_fopen_fd(int fd)
+{
+    QEMUFileFD *s = g_malloc0(sizeof(*s));
+
+    fcntl_setfl(fd, O_NONBLOCK);
+    s->file = qemu_fopen_ops(s, NULL, fd_get_buffer, fd_close,
+                             NULL, NULL, NULL);
+    s->file->fd = fd;
+    return s->file;
+}
+
 static int file_put_buffer(void *opaque, const uint8_t *buf,
                             int64_t pos, int size)
 {
-- 
1.7.1.1



* [PATCH v2 23/41] migration.c: remove redundant line in migrate_init()
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (21 preceding siblings ...)
  2012-06-04  9:57 ` [PATCH v2 22/41] savevm/QEMUFile: introduce qemu_fopen_fd Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 24/41] migration: export migrate_fd_completed() and migrate_fd_cleanup() Isaku Yamahata
                   ` (19 subsequent siblings)
  42 siblings, 0 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: owasserm, quintela, avi, pbonzini, aliguori, stefanha, dlaor,
	mdroth, yoshikawa.takuya, benoit.hudzia, aarcange, t.hirofuchi,
	satoshi.itoh

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 migration.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/migration.c b/migration.c
index 3f485d3..753addb 100644
--- a/migration.c
+++ b/migration.c
@@ -367,7 +367,6 @@ static MigrationState *migrate_init(int blk, int inc)
     int64_t bandwidth_limit = s->bandwidth_limit;
 
     memset(s, 0, sizeof(*s));
-    s->bandwidth_limit = bandwidth_limit;
     s->blk = blk;
     s->shared = inc;
 
-- 
1.7.1.1



* [PATCH v2 24/41] migration: export migrate_fd_completed() and migrate_fd_cleanup()
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (22 preceding siblings ...)
  2012-06-04  9:57 ` [PATCH v2 23/41] migration.c: remove redundant line in migrate_init() Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 25/41] migration: factor out parameters into MigrationParams Isaku Yamahata
                   ` (18 subsequent siblings)
  42 siblings, 0 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: owasserm, quintela, avi, pbonzini, aliguori, stefanha, dlaor,
	mdroth, yoshikawa.takuya, benoit.hudzia, aarcange, t.hirofuchi,
	satoshi.itoh

This will be used by postcopy migration.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 migration.c |    4 ++--
 migration.h |    2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/migration.c b/migration.c
index 753addb..48a8f68 100644
--- a/migration.c
+++ b/migration.c
@@ -159,7 +159,7 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 
 /* shared migration helpers */
 
-static int migrate_fd_cleanup(MigrationState *s)
+int migrate_fd_cleanup(MigrationState *s)
 {
     int ret = 0;
 
@@ -187,7 +187,7 @@ void migrate_fd_error(MigrationState *s)
     migrate_fd_cleanup(s);
 }
 
-static void migrate_fd_completed(MigrationState *s)
+void migrate_fd_completed(MigrationState *s)
 {
     DPRINTF("setting completed state\n");
     if (migrate_fd_cleanup(s) < 0) {
diff --git a/migration.h b/migration.h
index 6cf4512..d0dd536 100644
--- a/migration.h
+++ b/migration.h
@@ -62,7 +62,9 @@ int fd_start_incoming_migration(const char *path);
 
 int fd_start_outgoing_migration(MigrationState *s, const char *fdname);
 
+int migrate_fd_cleanup(MigrationState *s);
 void migrate_fd_error(MigrationState *s);
+void migrate_fd_completed(MigrationState *s);
 
 void migrate_fd_connect(MigrationState *s);
 
-- 
1.7.1.1



* [PATCH v2 25/41] migration: factor out parameters into MigrationParams
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (23 preceding siblings ...)
  2012-06-04  9:57 ` [PATCH v2 24/41] migration: export migrate_fd_completed() and migrate_fd_cleanup() Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 26/41] buffered_file: factor out buffer management logic Isaku Yamahata
                   ` (17 subsequent siblings)
  42 siblings, 0 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: owasserm, quintela, avi, pbonzini, aliguori, stefanha, dlaor,
	mdroth, yoshikawa.takuya, benoit.hudzia, aarcange, t.hirofuchi,
	satoshi.itoh

Introduce MigrationParams to hold the parameters of a migration.

Cc: Orit Wasserman <owasserm@redhat.com>
Cc: Juan Quintela <quintela@redhat.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
Changes v1 -> v2:
- catch up with the qapi change
---
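As an illustration of the shape of this change, here is a minimal standalone
sketch of how optional QMP arguments fold into one parameter struct that is
then handed to a set_params-style handler. All Demo*/demo_* names are
hypothetical and exist only for this example; the real types are in the diff
below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of MigrationParams. */
typedef struct DemoMigrationParams {
    int blk;
    int shared;
} DemoMigrationParams;

/* Hypothetical mirror of the block migration state that consumes them. */
typedef struct DemoBlockState {
    int blk_enable;
    int shared_base;
} DemoBlockState;

/* Start from defaults and override only the fields the caller supplied,
 * in the same way qmp_migrate() treats has_blk and has_inc. */
static DemoMigrationParams demo_params_from_optional(bool has_blk, bool blk,
                                                     bool has_inc, bool inc)
{
    DemoMigrationParams params = { .blk = 0, .shared = 0 };

    if (has_blk) {
        params.blk = blk;
    }
    if (has_inc) {
        params.shared = inc;
    }
    return params;
}

/* Handler taking the struct instead of a growing list of ints. */
static void demo_block_set_params(DemoBlockState *st,
                                  const DemoMigrationParams *params)
{
    assert(st != NULL && params != NULL);
    st->blk_enable = params->blk;
    st->shared_base = params->shared;
    /* shared base implies that block migration is enabled */
    st->blk_enable |= params->shared;
}
```

With this layout, adding a further parameter later only touches the struct
and the handlers that care about it, not every function signature in between.
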
 block-migration.c |    8 ++++----
 migration.c       |   21 +++++++++++++++------
 migration.h       |    8 ++++++--
 qemu-common.h     |    1 +
 savevm.c          |   10 +++++++---
 sysemu.h          |    2 +-
 vmstate.h         |    2 +-
 7 files changed, 35 insertions(+), 17 deletions(-)

diff --git a/block-migration.c b/block-migration.c
index fd2ffff..b95b4e1 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -700,13 +700,13 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
     return 0;
 }
 
-static void block_set_params(int blk_enable, int shared_base, void *opaque)
+static void block_set_params(const MigrationParams *params, void *opaque)
 {
-    block_mig_state.blk_enable = blk_enable;
-    block_mig_state.shared_base = shared_base;
+    block_mig_state.blk_enable = params->blk;
+    block_mig_state.shared_base = params->shared;
 
     /* shared base means that blk_enable = 1 */
-    block_mig_state.blk_enable |= shared_base;
+    block_mig_state.blk_enable |= params->shared;
 }
 
 void blk_mig_init(void)
diff --git a/migration.c b/migration.c
index 48a8f68..3b97aec 100644
--- a/migration.c
+++ b/migration.c
@@ -352,7 +352,7 @@ void migrate_fd_connect(MigrationState *s)
                                       migrate_fd_close);
 
     DPRINTF("beginning savevm\n");
-    ret = qemu_savevm_state_begin(s->file, s->blk, s->shared);
+    ret = qemu_savevm_state_begin(s->file, &s->params);
     if (ret < 0) {
         DPRINTF("failed, %d\n", ret);
         migrate_fd_error(s);
@@ -361,15 +361,13 @@ void migrate_fd_connect(MigrationState *s)
     migrate_fd_put_ready(s);
 }
 
-static MigrationState *migrate_init(int blk, int inc)
+static MigrationState *migrate_init(const MigrationParams *params)
 {
     MigrationState *s = migrate_get_current();
     int64_t bandwidth_limit = s->bandwidth_limit;
 
     memset(s, 0, sizeof(*s));
-    s->blk = blk;
-    s->shared = inc;
-
+    s->params = *params;
     s->bandwidth_limit = bandwidth_limit;
     s->state = MIG_STATE_SETUP;
 
@@ -393,9 +391,20 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
                  Error **errp)
 {
     MigrationState *s = migrate_get_current();
+    MigrationParams params = {
+        .blk = false,
+        .shared = false,
+    };
     const char *p;
     int ret;
 
+    if (has_blk) {
+        params.blk = blk;
+    }
+    if (has_inc) {
+        params.shared = inc;
+    }
+
     if (s->state == MIG_STATE_ACTIVE) {
         error_set(errp, QERR_MIGRATION_ACTIVE);
         return;
@@ -410,7 +419,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
         return;
     }
 
-    s = migrate_init(blk, inc);
+    s = migrate_init(&params);
 
     if (strstart(uri, "tcp:", &p)) {
         ret = tcp_start_outgoing_migration(s, p, errp);
diff --git a/migration.h b/migration.h
index d0dd536..59e6e68 100644
--- a/migration.h
+++ b/migration.h
@@ -19,6 +19,11 @@
 #include "notify.h"
 #include "error.h"
 
+struct MigrationParams {
+    int blk;
+    int shared;
+};
+
 typedef struct MigrationState MigrationState;
 
 struct MigrationState
@@ -31,8 +36,7 @@ struct MigrationState
     int (*close)(MigrationState *s);
     int (*write)(MigrationState *s, const void *buff, size_t size);
     void *opaque;
-    int blk;
-    int shared;
+    MigrationParams params;
 };
 
 void process_incoming_migration(QEMUFile *f);
diff --git a/qemu-common.h b/qemu-common.h
index 91e0562..057c810 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -263,6 +263,7 @@ typedef struct EventNotifier EventNotifier;
 typedef struct VirtIODevice VirtIODevice;
 typedef struct QEMUSGList QEMUSGList;
 typedef struct SHPCDevice SHPCDevice;
+typedef struct MigrationParams MigrationParams;
 
 typedef uint64_t pcibus_t;
 
diff --git a/savevm.c b/savevm.c
index 5640614..318ec61 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1611,7 +1611,7 @@ bool qemu_savevm_state_blocked(Error **errp)
     return false;
 }
 
-int qemu_savevm_state_begin(QEMUFile *f, int blk_enable, int shared)
+int qemu_savevm_state_begin(QEMUFile *f, const MigrationParams *params)
 {
     SaveStateEntry *se;
     int ret;
@@ -1620,7 +1620,7 @@ int qemu_savevm_state_begin(QEMUFile *f, int blk_enable, int shared)
         if(se->set_params == NULL) {
             continue;
 	}
-	se->set_params(blk_enable, shared, se->opaque);
+	se->set_params(params, se->opaque);
     }
     
     qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
@@ -1758,13 +1758,17 @@ void qemu_savevm_state_cancel(QEMUFile *f)
 static int qemu_savevm_state(QEMUFile *f)
 {
     int ret;
+    MigrationParams params = {
+        .blk = 0,
+        .shared = 0,
+    };
 
     if (qemu_savevm_state_blocked(NULL)) {
         ret = -EINVAL;
         goto out;
     }
 
-    ret = qemu_savevm_state_begin(f, 0, 0);
+    ret = qemu_savevm_state_begin(f, &params);
     if (ret < 0)
         goto out;
 
diff --git a/sysemu.h b/sysemu.h
index bc2c788..3857cf0 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -77,7 +77,7 @@ void do_info_snapshots(Monitor *mon);
 void qemu_announce_self(void);
 
 bool qemu_savevm_state_blocked(Error **errp);
-int qemu_savevm_state_begin(QEMUFile *f, int blk_enable, int shared);
+int qemu_savevm_state_begin(QEMUFile *f, const MigrationParams *params);
 int qemu_savevm_state_iterate(QEMUFile *f);
 int qemu_savevm_state_complete(QEMUFile *f);
 void qemu_savevm_state_cancel(QEMUFile *f);
diff --git a/vmstate.h b/vmstate.h
index 82d97ae..5af45e0 100644
--- a/vmstate.h
+++ b/vmstate.h
@@ -26,7 +26,7 @@
 #ifndef QEMU_VMSTATE_H
 #define QEMU_VMSTATE_H 1
 
-typedef void SaveSetParamsHandler(int blk_enable, int shared, void * opaque);
+typedef void SaveSetParamsHandler(const MigrationParams *params, void * opaque);
 typedef void SaveStateHandler(QEMUFile *f, void *opaque);
 typedef int SaveLiveStateHandler(QEMUFile *f, int stage, void *opaque);
 typedef int LoadStateHandler(QEMUFile *f, void *opaque, int version_id);
-- 
1.7.1.1



* [PATCH v2 26/41] buffered_file: factor out buffer management logic
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (24 preceding siblings ...)
  2012-06-04  9:57 ` [PATCH v2 25/41] migration: factor out parameters into MigrationParams Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 27/41] buffered_file: Introduce QEMUFileNonblock for nonblock write Isaku Yamahata
                   ` (16 subsequent siblings)
  42 siblings, 0 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: owasserm, quintela, avi, pbonzini, aliguori, stefanha, dlaor,
	mdroth, yoshikawa.takuya, benoit.hudzia, aarcange, t.hirofuchi,
	satoshi.itoh

Factor the buffer management logic out of QEMUFileBuffered into a separate
QEMUBuffer so that it can be reused by other QEMUFile backends.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
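The append/consume behaviour of the factored-out buffer can be sketched in
isolation as follows. DemoBuffer and the demo_* functions are hypothetical
names for this example only. The sketch uses size_t arithmetic and plain
realloc() rather than the patch's int and g_realloc(), and it aborts on
allocation failure to stay short.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_BUF_SIZE_INC (32 * 1024)   /* growth step, as in the patch */

/* Hypothetical mirror of QEMUBuffer without the freeze flag. */
typedef struct DemoBuffer {
    uint8_t *data;
    size_t size;        /* bytes currently queued */
    size_t capacity;    /* bytes allocated */
} DemoBuffer;

/* Append n bytes, growing the allocation in whole DEMO_BUF_SIZE_INC steps. */
static void demo_buffer_append(DemoBuffer *b, const uint8_t *src, size_t n)
{
    size_t room = b->capacity - b->size;

    if (n == 0) {
        return;
    }
    if (n > room) {
        size_t need = n - room;
        size_t steps = (need + DEMO_BUF_SIZE_INC - 1) / DEMO_BUF_SIZE_INC;
        size_t new_cap = b->capacity + steps * DEMO_BUF_SIZE_INC;
        uint8_t *tmp = realloc(b->data, new_cap);
        if (tmp == NULL) {
            abort();    /* sketch only: no recovery path */
        }
        b->data = tmp;
        b->capacity = new_cap;
    }
    memcpy(b->data + b->size, src, n);
    b->size += n;
}

/* Drop the first offset bytes after they have been written out. */
static void demo_buffer_consume(DemoBuffer *b, size_t offset)
{
    if (offset > 0) {
        assert(b->size >= offset);
        memmove(b->data, b->data + offset, b->size - offset);
        b->size -= offset;
    }
}
```

Rounding the growth up to a fixed step keeps the number of reallocations low
when many small writes are queued while the output is frozen.
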
 buffered_file.c |  141 +++++++++++++++++++++++++++++++++---------------------
 buffered_file.h |    8 +++
 2 files changed, 94 insertions(+), 55 deletions(-)

diff --git a/buffered_file.c b/buffered_file.c
index a38caec..22dd4c9 100644
--- a/buffered_file.c
+++ b/buffered_file.c
@@ -20,24 +20,6 @@
 #include "buffered_file.h"
 
 //#define DEBUG_BUFFERED_FILE
-
-typedef struct QEMUFileBuffered
-{
-    BufferedPutFunc *put_buffer;
-    BufferedPutReadyFunc *put_ready;
-    BufferedWaitForUnfreezeFunc *wait_for_unfreeze;
-    BufferedCloseFunc *close;
-    void *opaque;
-    QEMUFile *file;
-    int freeze_output;
-    size_t bytes_xfer;
-    size_t xfer_limit;
-    uint8_t *buffer;
-    size_t buffer_size;
-    size_t buffer_capacity;
-    QEMUTimer *timer;
-} QEMUFileBuffered;
-
 #ifdef DEBUG_BUFFERED_FILE
 #define DPRINTF(fmt, ...) \
     do { printf("buffered-file: " fmt, ## __VA_ARGS__); } while (0)
@@ -46,57 +28,71 @@ typedef struct QEMUFileBuffered
     do { } while (0)
 #endif
 
-static void buffered_append(QEMUFileBuffered *s,
-                            const uint8_t *buf, size_t size)
-{
-    if (size > (s->buffer_capacity - s->buffer_size)) {
-        void *tmp;
-
-        DPRINTF("increasing buffer capacity from %zu by %zu\n",
-                s->buffer_capacity, size + 1024);
 
-        s->buffer_capacity += size + 1024;
+/***************************************************************************
+ * buffer management
+ */
 
-        tmp = g_realloc(s->buffer, s->buffer_capacity);
-        if (tmp == NULL) {
-            fprintf(stderr, "qemu file buffer expansion failed\n");
-            exit(1);
-        }
+static void buffer_destroy(QEMUBuffer *s)
+{
+    g_free(s->buffer);
+}
 
-        s->buffer = tmp;
+static void buffer_consume(QEMUBuffer *s, size_t offset)
+{
+    if (offset > 0) {
+        assert(s->buffer_size >= offset);
+        memmove(s->buffer, s->buffer + offset, s->buffer_size - offset);
+        s->buffer_size -= offset;
     }
+}
 
+static void buffer_append(QEMUBuffer *s, const uint8_t *buf, size_t size)
+{
+#define BUF_SIZE_INC    (32 * 1024)     /* = IO_BUF_SIZE */
+    int inc = size - (s->buffer_capacity - s->buffer_size);
+    if (inc > 0) {
+        s->buffer_capacity += DIV_ROUND_UP(inc, BUF_SIZE_INC) * BUF_SIZE_INC;
+        s->buffer = g_realloc(s->buffer, s->buffer_capacity);
+    }
     memcpy(s->buffer + s->buffer_size, buf, size);
     s->buffer_size += size;
 }
 
-static void buffered_flush(QEMUFileBuffered *s)
+typedef ssize_t (BufferPutBuf)(void *opaque, const void *data, size_t size);
+
+static void buffer_flush(QEMUBuffer *buf, QEMUFile *file,
+                         void *opaque, BufferPutBuf *put_buf)
 {
     size_t offset = 0;
     int error;
 
-    error = qemu_file_get_error(s->file);
+    error = qemu_file_get_error(file);
     if (error != 0) {
         DPRINTF("flush when error, bailing: %s\n", strerror(-error));
         return;
     }
 
-    DPRINTF("flushing %zu byte(s) of data\n", s->buffer_size);
+    DPRINTF("flushing %zu byte(s) of data\n", buf->buffer_size);
 
-    while (offset < s->buffer_size) {
+    while (offset < buf->buffer_size) {
         ssize_t ret;
 
-        ret = s->put_buffer(s->opaque, s->buffer + offset,
-                            s->buffer_size - offset);
-        if (ret == -EAGAIN) {
+        ret = put_buf(opaque, buf->buffer + offset, buf->buffer_size - offset);
+        if (ret == -EINTR) {
+            continue;
+        } else if (ret == -EAGAIN) {
             DPRINTF("backend not ready, freezing\n");
-            s->freeze_output = 1;
+            buf->freeze_output = true;
             break;
         }
 
-        if (ret <= 0) {
+        if (ret < 0) {
             DPRINTF("error flushing data, %zd\n", ret);
-            qemu_file_set_error(s->file, ret);
+            qemu_file_set_error(file, ret);
+            break;
+        } else if (ret == 0) {
+            DPRINTF("ret == 0\n");
             break;
         } else {
             DPRINTF("flushed %zd byte(s)\n", ret);
@@ -104,9 +100,44 @@ static void buffered_flush(QEMUFileBuffered *s)
         }
     }
 
-    DPRINTF("flushed %zu of %zu byte(s)\n", offset, s->buffer_size);
-    memmove(s->buffer, s->buffer + offset, s->buffer_size - offset);
-    s->buffer_size -= offset;
+    DPRINTF("flushed %zu of %zu byte(s)\n", offset, buf->buffer_size);
+    buffer_consume(buf, offset);
+}
+
+
+/***************************************************************************
+ * Buffered File
+ */
+
+typedef struct QEMUFileBuffered
+{
+    BufferedPutFunc *put_buffer;
+    BufferedPutReadyFunc *put_ready;
+    BufferedWaitForUnfreezeFunc *wait_for_unfreeze;
+    BufferedCloseFunc *close;
+    void *opaque;
+    QEMUFile *file;
+    size_t bytes_xfer;
+    size_t xfer_limit;
+    QEMUTimer *timer;
+    QEMUBuffer buf;
+} QEMUFileBuffered;
+
+static ssize_t buffered_flush_putbuf(void *opaque,
+                                     const void *data, size_t size)
+{
+    QEMUFileBuffered *s = opaque;
+    ssize_t ret = s->put_buffer(s->opaque, data, size);
+    if (ret == 0) {
+        DPRINTF("error flushing data, %zd\n", ret);
+        qemu_file_set_error(s->file, ret);
+    }
+    return ret;
+}
+
+static void buffered_flush(QEMUFileBuffered *s)
+{
+    buffer_flush(&s->buf, s->file, s, buffered_flush_putbuf);
 }
 
 static int buffered_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, int size)
@@ -124,11 +155,11 @@ static int buffered_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, in
     }
 
     DPRINTF("unfreezing output\n");
-    s->freeze_output = 0;
+    s->buf.freeze_output = false;
 
     buffered_flush(s);
 
-    while (!s->freeze_output && offset < size) {
+    while (!s->buf.freeze_output && offset < size) {
         if (s->bytes_xfer > s->xfer_limit) {
             DPRINTF("transfer limit exceeded when putting\n");
             break;
@@ -137,7 +168,7 @@ static int buffered_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, in
         ret = s->put_buffer(s->opaque, buf + offset, size - offset);
         if (ret == -EAGAIN) {
             DPRINTF("backend not ready, freezing\n");
-            s->freeze_output = 1;
+            s->buf.freeze_output = true;
             break;
         }
 
@@ -155,7 +186,7 @@ static int buffered_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, in
 
     if (offset >= 0) {
         DPRINTF("buffering %d bytes\n", size - offset);
-        buffered_append(s, buf + offset, size - offset);
+        buffer_append(&s->buf, buf + offset, size - offset);
         offset = size;
     }
 
@@ -172,9 +203,9 @@ static int buffered_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, in
 
 static void buffered_drain(QEMUFileBuffered *s)
 {
-    while (!qemu_file_get_error(s->file) && s->buffer_size) {
+    while (!qemu_file_get_error(s->file) && s->buf.buffer_size) {
         buffered_flush(s);
-        if (s->freeze_output)
+        if (s->buf.freeze_output)
             s->wait_for_unfreeze(s->opaque);
     }
 }
@@ -192,7 +223,7 @@ static int buffered_close(void *opaque)
 
     qemu_del_timer(s->timer);
     qemu_free_timer(s->timer);
-    g_free(s->buffer);
+    buffer_destroy(&s->buf);
     g_free(s);
 
     return ret;
@@ -213,7 +244,7 @@ static int buffered_rate_limit(void *opaque)
     if (ret) {
         return ret;
     }
-    if (s->freeze_output)
+    if (s->buf.freeze_output)
         return 1;
 
     if (s->bytes_xfer > s->xfer_limit)
@@ -256,7 +287,7 @@ static void buffered_rate_tick(void *opaque)
 
     qemu_mod_timer(s->timer, qemu_get_clock_ms(rt_clock) + 100);
 
-    if (s->freeze_output)
+    if (s->buf.freeze_output)
         return;
 
     s->bytes_xfer = 0;
diff --git a/buffered_file.h b/buffered_file.h
index cd8e1e8..d3ef546 100644
--- a/buffered_file.h
+++ b/buffered_file.h
@@ -16,6 +16,14 @@
 
 #include "hw/hw.h"
 
+struct QEMUBuffer {
+    uint8_t *buffer;
+    size_t buffer_size;
+    size_t buffer_capacity;
+    bool freeze_output;
+};
+typedef struct QEMUBuffer QEMUBuffer;
+
 typedef ssize_t (BufferedPutFunc)(void *opaque, const void *data, size_t size);
 typedef void (BufferedPutReadyFunc)(void *opaque);
 typedef void (BufferedWaitForUnfreezeFunc)(void *opaque);
-- 
1.7.1.1



* [PATCH v2 27/41] buffered_file: Introduce QEMUFileNonblock for nonblock write
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (25 preceding siblings ...)
  2012-06-04  9:57 ` [PATCH v2 26/41] buffered_file: factor out buffer management logic Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 28/41] buffered_file: add qemu_file to read/write to buffer in memory Isaku Yamahata
                   ` (15 subsequent siblings)
  42 siblings, 0 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: owasserm, quintela, avi, pbonzini, aliguori, stefanha, dlaor,
	mdroth, yoshikawa.takuya, benoit.hudzia, aarcange, t.hirofuchi,
	satoshi.itoh

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
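A note for readers skimming the diff below: nonblock_put_buffer() writes
straight to the fd until write() reports EAGAIN, and only then queues the
unwritten tail. The following is a minimal standalone sketch of that pattern,
not the code in this patch. The names struct pending, set_nonblock() and
nb_write() are hypothetical, and the tail buffer is a plain malloc() block
rather than a QEMUBuffer.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical tail buffer; stands in for QEMUBuffer in this sketch. */
struct pending {
    unsigned char *data;
    size_t len;
};

/* Put fd into nonblocking mode, keeping its other status flags. */
static int set_nonblock(int fd)
{
    int fl = fcntl(fd, F_GETFL);
    if (fl == -1) {
        return -1;
    }
    return fcntl(fd, F_SETFL, fl | O_NONBLOCK);
}

/*
 * Try to write size bytes to a nonblocking fd. Bytes the kernel refuses
 * with EAGAIN are copied into p. Returns the number of bytes written
 * directly, or -errno on a hard error.
 */
static ssize_t nb_write(int fd, struct pending *p,
                        const unsigned char *buf, size_t size)
{
    size_t done = 0;

    assert(p->len == 0);    /* the caller must flush the old tail first */
    while (done < size) {
        ssize_t ret = write(fd, buf + done, size - done);
        if (ret == -1) {
            if (errno == EINTR) {
                continue;   /* interrupted: retry the same write */
            }
            if (errno == EAGAIN || errno == EWOULDBLOCK) {
                break;      /* the fd is full: queue the rest */
            }
            return -errno;
        }
        done += (size_t)ret;
    }
    if (done < size) {
        p->data = malloc(size - done);
        if (p->data == NULL) {
            return -ENOMEM; /* sketch only: bytes already sent stay sent */
        }
        memcpy(p->data, buf + done, size - done);
        p->len = size - done;
    }
    return (ssize_t)done;
}
```

A caller would retry the queued tail once select() reports the fd writable,
which is roughly the role nonblock_wait_for_flush() plays in the patch.
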
 buffered_file.c |  115 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 buffered_file.h |   13 ++++++
 2 files changed, 128 insertions(+), 0 deletions(-)

diff --git a/buffered_file.c b/buffered_file.c
index 22dd4c9..5198923 100644
--- a/buffered_file.c
+++ b/buffered_file.c
@@ -106,6 +106,121 @@ static void buffer_flush(QEMUBuffer *buf, QEMUFile *file,
 
 
 /***************************************************************************
+ * Nonblocking write only file
+ */
+static ssize_t nonblock_flush_buffer_putbuf(void *opaque,
+                                            const void *data, size_t size)
+{
+    QEMUFileNonblock *s = opaque;
+    ssize_t ret = write(s->fd, data, size);
+    if (ret == -1) {
+        return -errno;
+    }
+    return ret;
+}
+
+static void nonblock_flush_buffer(QEMUFileNonblock *s)
+{
+    buffer_flush(&s->buf, s->file, s, &nonblock_flush_buffer_putbuf);
+
+    if (s->buf.buffer_size > 0) {
+        s->buf.freeze_output = true;
+    }
+}
+
+static int nonblock_put_buffer(void *opaque,
+                               const uint8_t *buf, int64_t pos, int size)
+{
+    QEMUFileNonblock *s = opaque;
+    int error;
+    ssize_t len = 0;
+
+    error = qemu_file_get_error(s->file);
+    if (error) {
+        return error;
+    }
+
+    nonblock_flush_buffer(s);
+    error = qemu_file_get_error(s->file);
+    if (error) {
+        return error;
+    }
+
+    while (!s->buf.freeze_output && size > 0) {
+        ssize_t ret;
+        assert(s->buf.buffer_size == 0);
+
+        ret = write(s->fd, buf, size);
+        if (ret == -1) {
+            if (errno == EINTR) {
+                continue;
+            } else if (errno == EAGAIN) {
+                s->buf.freeze_output = true;
+            } else {
+                qemu_file_set_error(s->file, errno);
+            }
+            break;
+        }
+
+        len += ret;
+        buf += ret;
+        size -= ret;
+    }
+
+    if (size > 0) {
+        buffer_append(&s->buf, buf, size);
+        len += size;
+    }
+    return len;
+}
+
+int nonblock_pending_size(QEMUFileNonblock *s)
+{
+    return qemu_pending_size(s->file) + s->buf.buffer_size;
+}
+
+void nonblock_fflush(QEMUFileNonblock *s)
+{
+    s->buf.freeze_output = false;
+    nonblock_flush_buffer(s);
+    if (!s->buf.freeze_output) {
+        qemu_fflush(s->file);
+    }
+}
+
+void nonblock_wait_for_flush(QEMUFileNonblock *s)
+{
+    while (nonblock_pending_size(s) > 0) {
+        fd_set fds;
+        FD_ZERO(&fds);
+        FD_SET(s->fd, &fds);
+        select(s->fd + 1, NULL, &fds, NULL, NULL);
+
+        nonblock_fflush(s);
+    }
+}
+
+static int nonblock_close(void *opaque)
+{
+    QEMUFileNonblock *s = opaque;
+    nonblock_wait_for_flush(s);
+    buffer_destroy(&s->buf);
+    g_free(s);
+    return 0;
+}
+
+QEMUFileNonblock *qemu_fopen_nonblock(int fd)
+{
+    QEMUFileNonblock *s = g_malloc0(sizeof(*s));
+
+    s->fd = fd;
+    fcntl_setfl(fd, O_NONBLOCK);
+    s->file = qemu_fopen_ops(s, nonblock_put_buffer, NULL, nonblock_close,
+                             NULL, NULL, NULL);
+    return s;
+}
+
+/***************************************************************************
  * Buffered File
  */
 
diff --git a/buffered_file.h b/buffered_file.h
index d3ef546..2712e01 100644
--- a/buffered_file.h
+++ b/buffered_file.h
@@ -24,6 +24,19 @@ struct QEMUBuffer {
 };
 typedef struct QEMUBuffer QEMUBuffer;
 
+struct QEMUFileNonblock {
+    int fd;
+    QEMUFile *file;
+
+    QEMUBuffer buf;
+};
+typedef struct QEMUFileNonblock QEMUFileNonblock;
+
+QEMUFileNonblock *qemu_fopen_nonblock(int fd);
+int nonblock_pending_size(QEMUFileNonblock *s);
+void nonblock_fflush(QEMUFileNonblock *s);
+void nonblock_wait_for_flush(QEMUFileNonblock *s);
+
 typedef ssize_t (BufferedPutFunc)(void *opaque, const void *data, size_t size);
 typedef void (BufferedPutReadyFunc)(void *opaque);
 typedef void (BufferedWaitForUnfreezeFunc)(void *opaque);
-- 
1.7.1.1



* [PATCH v2 28/41] buffered_file: add qemu_file to read/write to buffer in memory
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (26 preceding siblings ...)
  2012-06-04  9:57 ` [PATCH v2 27/41] buffered_file: Introduce QEMUFileNonblock for nonblock write Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 29/41] umem.h: import Linux umem.h Isaku Yamahata
                   ` (14 subsequent siblings)
  42 siblings, 0 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: owasserm, quintela, avi, pbonzini, aliguori, stefanha, dlaor,
	mdroth, yoshikawa.takuya, benoit.hudzia, aarcange, t.hirofuchi,
	satoshi.itoh

This is used by postcopy live migration.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
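A note on the read side: buf_get_buffer() below reuses buffer_size as the read
index and buffer_capacity as the total length. The following is a minimal
standalone sketch of the same cursor logic with explicit field names. The
names struct mem_reader and mem_read() are hypothetical and are not part of
this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reader state; the patch reuses QEMUBuffer for this. */
struct mem_reader {
    const unsigned char *data;
    size_t pos;         /* index of the next byte to hand out */
    size_t capacity;    /* total number of bytes in data */
};

/* Copy up to size bytes into dst; returns the count, 0 at end of data. */
static size_t mem_read(struct mem_reader *r, unsigned char *dst, size_t size)
{
    size_t left, len;

    assert(r->pos <= r->capacity);
    left = r->capacity - r->pos;
    len = size < left ? size : left;    /* never read past the end */
    memcpy(dst, r->data + r->pos, len);
    r->pos += len;
    return len;
}
```

A short read signals that the buffer is nearly exhausted, and a return value
of 0 signals end of data, matching what a get_buffer callback is expected to
report.
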
 buffered_file.c |   50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 buffered_file.h |   10 ++++++++++
 2 files changed, 60 insertions(+), 0 deletions(-)

diff --git a/buffered_file.c b/buffered_file.c
index 5198923..4f0c98e 100644
--- a/buffered_file.c
+++ b/buffered_file.c
@@ -106,6 +106,56 @@ static void buffer_flush(QEMUBuffer *buf, QEMUFile *file,
 
 
 /***************************************************************************
+ * read/write to buffer on memory
+ */
+
+static int buf_close(void *opaque)
+{
+    QEMUFileBuf *s = opaque;
+    buffer_destroy(&s->buf);
+    g_free(s);
+    return 0;
+}
+
+static int buf_put_buffer(void *opaque,
+                          const uint8_t *buf, int64_t pos, int size)
+{
+    QEMUFileBuf *s = opaque;
+    buffer_append(&s->buf, buf, size);
+    return size;
+}
+
+QEMUFileBuf *qemu_fopen_buf_write(void)
+{
+    QEMUFileBuf *s = g_malloc0(sizeof(*s));
+
+    s->file = qemu_fopen_ops(s,  buf_put_buffer, NULL, buf_close,
+                             NULL, NULL, NULL);
+    return s;
+}
+
+static int buf_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
+{
+    QEMUFileBuf *s = opaque;
+    ssize_t len = MIN(size, s->buf.buffer_capacity - s->buf.buffer_size);
+    memcpy(buf, s->buf.buffer + s->buf.buffer_size, len);
+    s->buf.buffer_size += len;
+    return len;
+}
+
+/* This takes ownership of buf. */
+QEMUFile *qemu_fopen_buf_read(uint8_t *buf, size_t size)
+{
+    QEMUFileBuf *s = g_malloc0(sizeof(*s));
+    s->buf.buffer = buf;
+    s->buf.buffer_size = 0; /* this is used as index to read */
+    s->buf.buffer_capacity = size;
+    s->file = qemu_fopen_ops(s, NULL, buf_get_buffer, buf_close,
+                             NULL, NULL, NULL);
+    return s->file;
+}
+
+/***************************************************************************
  * Nonblocking write only file
  */
 static ssize_t nonblock_flush_buffer_putbuf(void *opaque,
diff --git a/buffered_file.h b/buffered_file.h
index 2712e01..9e28bef 100644
--- a/buffered_file.h
+++ b/buffered_file.h
@@ -24,6 +24,16 @@ struct QEMUBuffer {
 };
 typedef struct QEMUBuffer QEMUBuffer;
 
+struct QEMUFileBuf {
+    QEMUFile *file;
+    QEMUBuffer buf;
+};
+typedef struct QEMUFileBuf QEMUFileBuf;
+
+QEMUFileBuf *qemu_fopen_buf_write(void);
+/* This takes ownership of buf. */
+QEMUFile *qemu_fopen_buf_read(uint8_t *buf, size_t size);
+
 struct QEMUFileNonblock {
     int fd;
     QEMUFile *file;
-- 
1.7.1.1



* [PATCH v2 29/41] umem.h: import Linux umem.h
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (27 preceding siblings ...)
  2012-06-04  9:57 ` [PATCH v2 28/41] buffered_file: add qemu_file to read/write to buffer in memory Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 30/41] update-linux-headers.sh: teach umem.h to update-linux-headers.sh Isaku Yamahata
                   ` (13 subsequent siblings)
  42 siblings, 0 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: owasserm, quintela, avi, pbonzini, aliguori, stefanha, dlaor,
	mdroth, yoshikawa.takuya, benoit.hudzia, aarcange, t.hirofuchi,
	satoshi.itoh

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 linux-headers/linux/umem.h |   42 ++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 42 insertions(+), 0 deletions(-)
 create mode 100644 linux-headers/linux/umem.h

diff --git a/linux-headers/linux/umem.h b/linux-headers/linux/umem.h
new file mode 100644
index 0000000..0cf7399
--- /dev/null
+++ b/linux-headers/linux/umem.h
@@ -0,0 +1,42 @@
+/*
+ * User process backed memory.
+ * This is mainly for KVM post copy.
+ *
+ * Copyright (c) 2011,
+ * National Institute of Advanced Industrial Science and Technology
+ *
+ * https://sites.google.com/site/grivonhome/quick-kvm-migration
+ * Author: Isaku Yamahata <yamahata at valinux co jp>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __LINUX_UMEM_H
+#define __LINUX_UMEM_H
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+struct umem_init {
+	__u64 size;		/* in bytes */
+	__s32 shmem_fd;
+	__s32 padding;
+};
+
+#define UMEMIO	0x1E
+
+/* ioctl for umem fd */
+#define UMEM_INIT		_IOWR(UMEMIO, 0x0, struct umem_init)
+#define UMEM_MAKE_VMA_ANONYMOUS	_IO  (UMEMIO, 0x1)
+
+#endif /* __LINUX_UMEM_H */
-- 
1.7.1.1



* [PATCH v2 30/41] update-linux-headers.sh: teach umem.h to update-linux-headers.sh
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (28 preceding siblings ...)
  2012-06-04  9:57 ` [PATCH v2 29/41] umem.h: import Linux umem.h Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 31/41] configure: add CONFIG_POSTCOPY option Isaku Yamahata
                   ` (12 subsequent siblings)
  42 siblings, 0 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: owasserm, quintela, avi, pbonzini, aliguori, stefanha, dlaor,
	mdroth, yoshikawa.takuya, benoit.hudzia, aarcange, t.hirofuchi,
	satoshi.itoh

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 scripts/update-linux-headers.sh |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index 9d2a4bc..2afdd54 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -43,7 +43,7 @@ done
 
 rm -rf "$output/linux-headers/linux"
 mkdir -p "$output/linux-headers/linux"
-for header in kvm.h kvm_para.h vhost.h virtio_config.h virtio_ring.h; do
+for header in kvm.h kvm_para.h vhost.h virtio_config.h virtio_ring.h umem.h; do
     cp "$tmpdir/include/linux/$header" "$output/linux-headers/linux"
 done
 if [ -L "$linux/source" ]; then
-- 
1.7.1.1



* [PATCH v2 31/41] configure: add CONFIG_POSTCOPY option
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (29 preceding siblings ...)
  2012-06-04  9:57 ` [PATCH v2 30/41] update-linux-headers.sh: teach umem.h to update-linux-headers.sh Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 32/41] savevm: add new section that is used by postcopy Isaku Yamahata
                   ` (11 subsequent siblings)
  42 siblings, 0 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: owasserm, quintela, avi, pbonzini, aliguori, stefanha, dlaor,
	mdroth, yoshikawa.takuya, benoit.hudzia, aarcange, t.hirofuchi,
	satoshi.itoh

Add --enable-postcopy and --disable-postcopy options to configure; postcopy is
enabled by default. There is no dynamic test yet.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 configure |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/configure b/configure
index 1f338f8..21de4cb 100755
--- a/configure
+++ b/configure
@@ -194,6 +194,7 @@ zlib="yes"
 guest_agent="yes"
 libiscsi=""
 coroutine=""
+postcopy="yes"
 
 # parse CC options first
 for opt do
@@ -824,6 +825,10 @@ for opt do
   ;;
   --disable-guest-agent) guest_agent="no"
   ;;
+  --enable-postcopy) postcopy="yes"
+  ;;
+  --disable-postcopy) postcopy="no"
+  ;;
   *) echo "ERROR: unknown option $opt"; show_help="yes"
   ;;
   esac
@@ -1110,6 +1115,8 @@ echo "  --disable-guest-agent    disable building of the QEMU Guest Agent"
 echo "  --enable-guest-agent     enable building of the QEMU Guest Agent"
 echo "  --with-coroutine=BACKEND coroutine backend. Supported options:"
 echo "                           gthread, ucontext, sigaltstack, windows"
+echo "  --disable-postcopy       disable postcopy mode for live migration"
+echo "  --enable-postcopy        enable postcopy mode for live migration"
 echo ""
 echo "NOTE: The object files are built at the place where configure is launched"
 exit 1
@@ -3029,6 +3036,7 @@ echo "OpenGL support    $opengl"
 echo "libiscsi support  $libiscsi"
 echo "build guest agent $guest_agent"
 echo "coroutine backend $coroutine_backend"
+echo "postcopy support  $postcopy"
 
 if test "$sdl_too_old" = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -3329,6 +3337,10 @@ if test "$libiscsi" = "yes" ; then
   echo "CONFIG_LIBISCSI=y" >> $config_host_mak
 fi
 
+if test "$postcopy" = "yes" ; then
+  echo "CONFIG_POSTCOPY=y" >> $config_host_mak
+fi
+
 # XXX: suppress that
 if [ "$bsd" = "yes" ] ; then
   echo "CONFIG_BSD=y" >> $config_host_mak
-- 
1.7.1.1



* [PATCH v2 32/41] savevm: add new section that is used by postcopy
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (30 preceding siblings ...)
  2012-06-04  9:57 ` [PATCH v2 31/41] configure: add CONFIG_POSTCOPY option Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 33/41] postcopy: introduce -postcopy and -postcopy-flags option Isaku Yamahata
                   ` (10 subsequent siblings)
  42 siblings, 0 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: owasserm, quintela, avi, pbonzini, aliguori, stefanha, dlaor,
	mdroth, yoshikawa.takuya, benoit.hudzia, aarcange, t.hirofuchi,
	satoshi.itoh

Postcopy uses this section to tell the incoming side the total length of the
QEMU_VM_SECTION_FULL and QEMU_VM_SUBSECTION data sent by the outgoing side.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 savevm.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/savevm.c b/savevm.c
index 318ec61..3adabad 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1597,6 +1597,7 @@ static void vmstate_save(QEMUFile *f, SaveStateEntry *se)
 #define QEMU_VM_SECTION_END          0x03
 #define QEMU_VM_SECTION_FULL         0x04
 #define QEMU_VM_SUBSECTION           0x05
+#define QEMU_VM_POSTCOPY             0x10
 
 bool qemu_savevm_state_blocked(Error **errp)
 {
-- 
1.7.1.1



* [PATCH v2 33/41] postcopy: introduce -postcopy and -postcopy-flags option
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (31 preceding siblings ...)
  2012-06-04  9:57 ` [PATCH v2 32/41] savevm: add new section that is used by postcopy Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-08 10:52   ` Juan Quintela
  2012-06-04  9:57 ` [PATCH v2 34/41] postcopy outgoing: add -p and -n option to migrate command Isaku Yamahata
                   ` (9 subsequent siblings)
  42 siblings, 1 reply; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: owasserm, quintela, avi, pbonzini, aliguori, stefanha, dlaor,
	mdroth, yoshikawa.takuya, benoit.hudzia, aarcange, t.hirofuchi,
	satoshi.itoh

This patch prepares for postcopy live migration.
It introduces the -postcopy option and its internal flag, incoming_postcopy.
It also introduces -postcopy-flags for changing the behavior of incoming
postcopy, mainly for benchmarking and debugging.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
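A note on option parsing: the vl.c hunk below reads -postcopy-flags with
strtoul(optarg, NULL, 0) and does not check the result. If stricter parsing is
wanted, something like the following sketch could reject empty strings,
trailing characters and overflow. The helper name parse_flags() is
hypothetical and is not part of this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Returns 0 and stores the value in *out, or -1 if s is empty, has
 * trailing characters, or overflows unsigned long.
 */
static int parse_flags(const char *s, unsigned long *out)
{
    char *end;
    unsigned long v;

    assert(s != NULL && out != NULL);
    errno = 0;
    v = strtoul(s, &end, 0);    /* base 0: accepts 10, 0x10 and 010 */
    if (errno != 0 || end == s || *end != '\0') {
        return -1;
    }
    *out = v;
    return 0;
}
```

Base 0 is kept so that flags can still be given in hex, as the current
strtoul() call allows.
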
 migration.h     |    3 +++
 qemu-options.hx |   22 ++++++++++++++++++++++
 vl.c            |    8 ++++++++
 3 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/migration.h b/migration.h
index 59e6e68..4bbcf06 100644
--- a/migration.h
+++ b/migration.h
@@ -103,4 +103,7 @@ void migrate_add_blocker(Error *reason);
  */
 void migrate_del_blocker(Error *reason);
 
+extern bool incoming_postcopy;
+extern unsigned long incoming_postcopy_flags;
+
 #endif
diff --git a/qemu-options.hx b/qemu-options.hx
index 8b66264..a9af31e 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2616,6 +2616,28 @@ STEXI
 Prepare for incoming migration, listen on @var{port}.
 ETEXI
 
+DEF("postcopy", 0, QEMU_OPTION_postcopy,
+    "-postcopy	postcopy incoming migration when -incoming is specified\n",
+    QEMU_ARCH_ALL)
+STEXI
+@item -postcopy
+@findex -postcopy
+start incoming migration in postcopy mode.
+ETEXI
+
+DEF("postcopy-flags", HAS_ARG, QEMU_OPTION_postcopy_flags,
+    "-postcopy-flags unsigned-int(flags)\n"
+    "	                flags for postcopy incoming migration\n"
+    "                   when -incoming and -postcopy are specified.\n"
+    "                   This is for benchmark/debug purpose (default: 0)\n",
+    QEMU_ARCH_ALL)
+STEXI
+@item -postcopy-flags int
+@findex -postcopy-flags
+Specify flags for incoming postcopy migration when -incoming and -postcopy are
+specified. This is for benchmark/debug purpose. (default: 0)
+ETEXI
+
 DEF("nodefaults", 0, QEMU_OPTION_nodefaults, \
     "-nodefaults     don't create default devices\n", QEMU_ARCH_ALL)
 STEXI
diff --git a/vl.c b/vl.c
index 62dc343..1674abb 100644
--- a/vl.c
+++ b/vl.c
@@ -189,6 +189,8 @@ int mem_prealloc = 0; /* force preallocation of physical target memory */
 int nb_nics;
 NICInfo nd_table[MAX_NICS];
 int autostart;
+bool incoming_postcopy = false; /* When -incoming is specified, postcopy mode */
+unsigned long incoming_postcopy_flags = 0; /* flags for postcopy incoming mode */
 static int rtc_utc = 1;
 static int rtc_date_offset = -1; /* -1 means no change */
 QEMUClock *rtc_clock;
@@ -3115,6 +3117,12 @@ int main(int argc, char **argv, char **envp)
                 incoming = optarg;
                 runstate_set(RUN_STATE_INMIGRATE);
                 break;
+            case QEMU_OPTION_postcopy:
+                incoming_postcopy = true;
+                break;
+            case QEMU_OPTION_postcopy_flags:
+                incoming_postcopy_flags = strtoul(optarg, NULL, 0);
+                break;
             case QEMU_OPTION_nodefaults:
                 default_serial = 0;
                 default_parallel = 0;
-- 
1.7.1.1



* [PATCH v2 34/41] postcopy outgoing: add -p and -n option to migrate command
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (32 preceding siblings ...)
  2012-06-04  9:57 ` [PATCH v2 33/41] postcopy: introduce -postcopy and -postcopy-flags option Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 35/41] postcopy: introduce helper functions for postcopy Isaku Yamahata
                   ` (8 subsequent siblings)
  42 siblings, 0 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: owasserm, quintela, avi, pbonzini, aliguori, stefanha, dlaor,
	mdroth, yoshikawa.takuya, benoit.hudzia, aarcange, t.hirofuchi,
	satoshi.itoh

Add a -p option to the migrate command for postcopy mode, and introduce a
postcopy parameter for migration to indicate that postcopy mode is enabled.
Add a -n option for postcopy migration, which disables background transfer.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
Changes v1 -> v2:
- catch up for qapi change
---
 hmp-commands.hx  |   12 ++++++++----
 hmp.c            |    6 +++++-
 migration.c      |    9 +++++++++
 migration.h      |    2 ++
 qapi-schema.json |    3 ++-
 qmp-commands.hx  |    4 +++-
 savevm.c         |    2 ++
 7 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 18cb415..3c647f7 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -798,23 +798,27 @@ ETEXI
 
     {
         .name       = "migrate",
-        .args_type  = "detach:-d,blk:-b,inc:-i,uri:s",
-        .params     = "[-d] [-b] [-i] uri",
+        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s",
+        .params     = "[-d] [-b] [-i] [-p [-n]] uri",
         .help       = "migrate to URI (using -d to not wait for completion)"
 		      "\n\t\t\t -b for migration without shared storage with"
 		      " full copy of disk\n\t\t\t -i for migration without "
 		      "shared storage with incremental copy of disk "
-		      "(base image shared between src and destination)",
+		      "(base image shared between src and destination)"
+		      "\n\t\t\t-p for migration with postcopy mode enabled"
+		      "\n\t\t\t-n for no background transfer of postcopy mode",
         .mhandler.cmd = hmp_migrate,
     },
 
 
 STEXI
-@item migrate [-d] [-b] [-i] @var{uri}
+@item migrate [-d] [-b] [-i] [-p [-n]] @var{uri}
 @findex migrate
 Migrate to @var{uri} (using -d to not wait for completion).
 	-b for migration with full copy of disk
 	-i for migration with incremental copy of disk (base image is shared)
+	-p for migration with postcopy mode enabled
+	-n for migration with postcopy mode enabled without background transfer
 ETEXI
 
     {
diff --git a/hmp.c b/hmp.c
index bb0952e..d546a52 100644
--- a/hmp.c
+++ b/hmp.c
@@ -911,10 +911,14 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
     int detach = qdict_get_try_bool(qdict, "detach", 0);
     int blk = qdict_get_try_bool(qdict, "blk", 0);
     int inc = qdict_get_try_bool(qdict, "inc", 0);
+    int postcopy = qdict_get_try_bool(qdict, "postcopy", 0);
+    int nobg = qdict_get_try_bool(qdict, "nobg", 0);
     const char *uri = qdict_get_str(qdict, "uri");
     Error *err = NULL;
 
-    qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false, &err);
+    qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false,
+                !!postcopy, postcopy, !!nobg, nobg,
+                &err);
     if (err) {
         monitor_printf(mon, "migrate: %s\n", error_get_pretty(err));
         error_free(err);
diff --git a/migration.c b/migration.c
index 3b97aec..7ad62ef 100644
--- a/migration.c
+++ b/migration.c
@@ -388,12 +388,15 @@ void migrate_del_blocker(Error *reason)
 
 void qmp_migrate(const char *uri, bool has_blk, bool blk,
                  bool has_inc, bool inc, bool has_detach, bool detach,
+                 bool has_postcopy, bool postcopy, bool has_nobg, bool nobg,
                  Error **errp)
 {
     MigrationState *s = migrate_get_current();
     MigrationParams params = {
         .blk = false,
         .shared = false,
+        .postcopy = false,
+        .nobg = false,
     };
     const char *p;
     int ret;
@@ -404,6 +407,12 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     if (has_inc) {
         params.shared = inc;
     }
+    if (has_postcopy) {
+        params.postcopy = postcopy;
+    }
+    if (has_nobg) {
+        params.nobg = nobg;
+    }
 
     if (s->state == MIG_STATE_ACTIVE) {
         error_set(errp, QERR_MIGRATION_ACTIVE);
diff --git a/migration.h b/migration.h
index 4bbcf06..091b446 100644
--- a/migration.h
+++ b/migration.h
@@ -22,6 +22,8 @@
 struct MigrationParams {
     int blk;
     int shared;
+    int postcopy;
+    int nobg;
 };
 
 typedef struct MigrationState MigrationState;
diff --git a/qapi-schema.json b/qapi-schema.json
index 2ca7195..5861fb9 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1717,7 +1717,8 @@
 # Since: 0.14.0
 ##
 { 'command': 'migrate',
-  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' } }
+  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' ,
+           '*postcopy': 'bool', '*nobg': 'bool'} }
 
 # @xen-save-devices-state:
 #
diff --git a/qmp-commands.hx b/qmp-commands.hx
index db980fa..7b5e5b7 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -469,7 +469,7 @@ EQMP
 
     {
         .name       = "migrate",
-        .args_type  = "detach:-d,blk:-b,inc:-i,uri:s",
+        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s",
         .mhandler.cmd_new = qmp_marshal_input_migrate,
     },
 
@@ -483,6 +483,8 @@ Arguments:
 
 - "blk": block migration, full disk copy (json-bool, optional)
 - "inc": incremental disk copy (json-bool, optional)
+- "postcopy": postcopy migration (json-bool, optional)
+- "nobg": postcopy without background transfer (json-bool, optional)
 - "uri": Destination URI (json-string)
 
 Example:
diff --git a/savevm.c b/savevm.c
index 3adabad..bd4b5bf 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1762,6 +1762,8 @@ static int qemu_savevm_state(QEMUFile *f)
     MigrationParams params = {
         .blk = 0,
         .shared = 0,
+        .postcopy = 0,
+        .nobg = 0,
     };
 
     if (qemu_savevm_state_blocked(NULL)) {
-- 
1.7.1.1



* [PATCH v2 35/41] postcopy: introduce helper functions for postcopy
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (33 preceding siblings ...)
  2012-06-04  9:57 ` [PATCH v2 34/41] postcopy outgoing: add -p and -n option to migrate command Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-14 21:34   ` Juan Quintela
  2012-06-04  9:57 ` [PATCH v2 36/41] postcopy: implement incoming part of postcopy live migration Isaku Yamahata
                   ` (7 subsequent siblings)
  42 siblings, 1 reply; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: owasserm, quintela, avi, pbonzini, aliguori, stefanha, dlaor,
	mdroth, yoshikawa.takuya, benoit.hudzia, aarcange, t.hirofuchi,
	satoshi.itoh

This patch introduces helper functions for postcopy to access the
umem char device and to communicate between the incoming qemu and umemd.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
changes v1 -> v2:
- code simplification
- make fault trigger more robust
- introduce struct umem_pages
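
A note on the page arithmetic in umem.c: umem_new() derives page_shift from
getpagesize() with ffs(), and umem_remove_shmem() turns a byte range into page
indexes for the faulted bitmap by shifting. The following is a minimal
standalone sketch of that arithmetic. The helper names page_shift_of() and
page_range() are hypothetical, and the sketch assumes page-aligned offsets and
sizes, as the shifts in the patch do.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>    /* ffs() */

/* log2 of a power-of-two page size, e.g. 4096 -> 12 */
static int page_shift_of(int page_size)
{
    assert(page_size > 0 && (page_size & (page_size - 1)) == 0);
    return ffs(page_size) - 1;
}

/*
 * Convert the byte range [offset, offset + size) into the half-open
 * page index range [*first, *end). Both inputs must be page aligned.
 */
static void page_range(size_t offset, size_t size, int shift,
                       size_t *first, size_t *end)
{
    size_t mask = ((size_t)1 << shift) - 1;

    assert((offset & mask) == 0 && (size & mask) == 0);
    *first = offset >> shift;
    *end = (offset + size) >> shift;
}
```

Each index in [first, end) corresponds to one bit that
umem_remove_shmem() tests and sets in umem->faulted.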
---
 umem.c |  364 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 umem.h |  101 ++++++++++++++++++
 2 files changed, 465 insertions(+), 0 deletions(-)
 create mode 100644 umem.c
 create mode 100644 umem.h

diff --git a/umem.c b/umem.c
new file mode 100644
index 0000000..64eaab5
--- /dev/null
+++ b/umem.c
@@ -0,0 +1,364 @@
+/*
+ * umem.c: user process backed memory module for postcopy live migration
+ *
+ * Copyright (c) 2011
+ * National Institute of Advanced Industrial Science and Technology
+ *
+ * https://sites.google.com/site/grivonhome/quick-kvm-migration
+ * Author: Isaku Yamahata <yamahata at valinux co jp>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+
+#include <linux/umem.h>
+
+#include "bitops.h"
+#include "sysemu.h"
+#include "hw/hw.h"
+#include "umem.h"
+
+//#define DEBUG_UMEM
+#ifdef DEBUG_UMEM
+#include <sys/syscall.h>
+#define DPRINTF(format, ...)                                            \
+    do {                                                                \
+        printf("%d:%ld %s:%d "format, getpid(), syscall(SYS_gettid),    \
+               __func__, __LINE__, ## __VA_ARGS__);                     \
+    } while (0)
+#else
+#define DPRINTF(format, ...)    do { } while (0)
+#endif
+
+#define DEV_UMEM        "/dev/umem"
+
+UMem *umem_new(void *hostp, size_t size)
+{
+    struct umem_init uinit = {
+        .size = size,
+    };
+    UMem *umem;
+
+    assert((size % getpagesize()) == 0);
+    umem = g_new(UMem, 1);
+    umem->fd = open(DEV_UMEM, O_RDWR);
+    if (umem->fd < 0) {
+        perror("can't open "DEV_UMEM);
+        abort();
+    }
+
+    if (ioctl(umem->fd, UMEM_INIT, &uinit) < 0) {
+        perror("UMEM_INIT");
+        abort();
+    }
+    if (ftruncate(uinit.shmem_fd, uinit.size) < 0) {
+        perror("truncate(\"shmem_fd\")");
+        abort();
+    }
+
+    umem->nbits = 0;
+    umem->nsets = 0;
+    umem->faulted = NULL;
+    umem->page_shift = ffs(getpagesize()) - 1;
+    umem->shmem_fd = uinit.shmem_fd;
+    umem->size = uinit.size;
+    umem->umem = mmap(hostp, size, PROT_EXEC | PROT_READ | PROT_WRITE,
+                      MAP_PRIVATE | MAP_FIXED, umem->fd, 0);
+    if (umem->umem == MAP_FAILED) {
+        perror("mmap(UMem) failed");
+        abort();
+    }
+    return umem;
+}
+
+void umem_destroy(UMem *umem)
+{
+    if (umem->fd != -1) {
+        close(umem->fd);
+    }
+    if (umem->shmem_fd != -1) {
+        close(umem->shmem_fd);
+    }
+    g_free(umem->faulted);
+    g_free(umem);
+}
+
+void umem_get_page_request(UMem *umem, struct umem_pages *page_request)
+{
+    ssize_t ret = read(umem->fd, page_request->pgoffs,
+                       page_request->nr * sizeof(page_request->pgoffs[0]));
+    if (ret < 0) {
+        perror("daemon: umem read");
+        abort();
+    }
+    page_request->nr = ret / sizeof(page_request->pgoffs[0]);
+}
+
+void umem_mark_page_cached(UMem *umem, struct umem_pages *page_cached)
+{
+    const void *buf = page_cached->pgoffs;
+    ssize_t left = page_cached->nr * sizeof(page_cached->pgoffs[0]);
+
+    while (left > 0) {
+        ssize_t ret = write(umem->fd, buf, left);
+        if (ret == -1) {
+            if (errno == EINTR)
+                continue;
+
+            perror("daemon: umem write");
+            abort();
+        }
+
+        left -= ret;
+        buf += ret;
+    }
+}
+
+void umem_unmap(UMem *umem)
+{
+    munmap(umem->umem, umem->size);
+    umem->umem = NULL;
+}
+
+void umem_close(UMem *umem)
+{
+    close(umem->fd);
+    umem->fd = -1;
+}
+
+void *umem_map_shmem(UMem *umem)
+{
+    umem->nbits = umem->size >> umem->page_shift;
+    umem->nsets = 0;
+    umem->faulted = g_new0(unsigned long, BITS_TO_LONGS(umem->nbits));
+
+    umem->shmem = mmap(NULL, umem->size, PROT_READ | PROT_WRITE, MAP_SHARED,
+                       umem->shmem_fd, 0);
+    if (umem->shmem == MAP_FAILED) {
+        perror("daemon: mmap(\"shmem\")");
+        abort();
+    }
+    return umem->shmem;
+}
+
+void umem_unmap_shmem(UMem *umem)
+{
+    munmap(umem->shmem, umem->size);
+    umem->shmem = NULL;
+}
+
+void umem_remove_shmem(UMem *umem, size_t offset, size_t size)
+{
+    int s = offset >> umem->page_shift;
+    int e = (offset + size) >> umem->page_shift;
+    int i;
+
+    for (i = s; i < e; i++) {
+        if (!test_and_set_bit(i, umem->faulted)) {
+            umem->nsets++;
+#if defined(CONFIG_MADVISE) && defined(MADV_REMOVE)
+            madvise(umem->shmem + offset, size, MADV_REMOVE);
+#endif
+        }
+    }
+}
+
+void umem_close_shmem(UMem *umem)
+{
+    close(umem->shmem_fd);
+    umem->shmem_fd = -1;
+}
+
+/***************************************************************************/
+/* qemu <-> umem daemon communication */
+
+size_t umem_pages_size(uint64_t nr)
+{
+    return sizeof(struct umem_pages) + nr * sizeof(uint64_t);
+}
+
+static void umem_write_cmd(int fd, uint8_t cmd)
+{
+    DPRINTF("write cmd %c\n", cmd);
+
+    for (;;) {
+        ssize_t ret = write(fd, &cmd, 1);
+        if (ret == -1) {
+            if (errno == EINTR) {
+                continue;
+            } else if (errno == EPIPE) {
+                perror("pipe");
+                DPRINTF("write cmd %c %zd %d: pipe is closed\n",
+                        cmd, ret, errno);
+                break;
+            }
+
+            perror("pipe");
+            DPRINTF("write cmd %c %zd %d\n", cmd, ret, errno);
+            abort();
+        }
+
+        break;
+    }
+}
+
+static void umem_read_cmd(int fd, uint8_t expect)
+{
+    uint8_t cmd;
+    for (;;) {
+        ssize_t ret = read(fd, &cmd, 1);
+        if (ret == -1) {
+            if (errno == EINTR) {
+                continue;
+            }
+            perror("pipe");
+            DPRINTF("read error cmd %c %zd %d\n", cmd, ret, errno);
+            abort();
+        }
+
+        if (ret == 0) {
+            DPRINTF("read cmd %c %zd: pipe is closed\n", cmd, ret);
+            abort();
+        }
+
+        break;
+    }
+
+    DPRINTF("read cmd %c\n", cmd);
+    if (cmd != expect) {
+        DPRINTF("cmd %c expect %c\n", cmd, expect);
+        abort();
+    }
+}
+
+struct umem_pages *umem_recv_pages(QEMUFile *f, int *offset)
+{
+    int ret;
+    uint64_t nr;
+    size_t size;
+    struct umem_pages *pages;
+
+    ret = qemu_peek_buffer(f, (uint8_t*)&nr, sizeof(nr), *offset);
+    *offset += sizeof(nr);
+    DPRINTF("ret %d nr %" PRIu64 "\n", ret, nr);
+    if (ret != sizeof(nr) || nr == 0) {
+        return NULL;
+    }
+
+    size = umem_pages_size(nr);
+    pages = g_malloc(size);
+    pages->nr = nr;
+    size -= sizeof(pages->nr);
+
+    ret = qemu_peek_buffer(f, (uint8_t*)pages->pgoffs, size, *offset);
+    *offset += size;
+    if (ret != size) {
+        g_free(pages);
+        return NULL;
+    }
+    return pages;
+}
+
+static void umem_send_pages(QEMUFile *f, const struct umem_pages *pages)
+{
+    size_t len = umem_pages_size(pages->nr);
+    qemu_put_buffer(f, (const uint8_t*)pages, len);
+}
+
+/* umem daemon -> qemu */
+void umem_daemon_ready(int to_qemu_fd)
+{
+    umem_write_cmd(to_qemu_fd, UMEM_DAEMON_READY);
+}
+
+void umem_daemon_quit(QEMUFile *to_qemu)
+{
+    qemu_put_byte(to_qemu, UMEM_DAEMON_QUIT);
+}
+
+void umem_daemon_send_pages_present(QEMUFile *to_qemu,
+                                    struct umem_pages *pages)
+{
+    qemu_put_byte(to_qemu, UMEM_DAEMON_TRIGGER_PAGE_FAULT);
+    umem_send_pages(to_qemu, pages);
+}
+
+void umem_daemon_wait_for_qemu(int from_qemu_fd)
+{
+    umem_read_cmd(from_qemu_fd, UMEM_QEMU_READY);
+}
+
+/* qemu -> umem daemon */
+void umem_qemu_wait_for_daemon(int from_umemd_fd)
+{
+    umem_read_cmd(from_umemd_fd, UMEM_DAEMON_READY);
+}
+
+void umem_qemu_ready(int to_umemd_fd)
+{
+    umem_write_cmd(to_umemd_fd, UMEM_QEMU_READY);
+}
+
+void umem_qemu_quit(QEMUFile *to_umemd)
+{
+    qemu_put_byte(to_umemd, UMEM_QEMU_QUIT);
+}
+
+/* qemu side handler */
+struct umem_pages *umem_qemu_trigger_page_fault(QEMUFile *from_umemd,
+                                                int *offset)
+{
+    uint64_t i;
+    int page_shift = ffs(getpagesize()) - 1;
+    struct umem_pages *pages = umem_recv_pages(from_umemd, offset);
+    if (pages == NULL) {
+        return NULL;
+    }
+
+    for (i = 0; i < pages->nr; i++) {
+        ram_addr_t addr = pages->pgoffs[i] << page_shift;
+
+        /* make pages present by forcibly triggering page fault. */
+        volatile uint8_t *ram = qemu_get_ram_ptr(addr);
+        uint8_t dummy_read = ram[0];
+        (void)dummy_read;   /* suppress unused variable warning */
+    }
+
+    /*
+     * Very Linux implementation specific.
+     * Make sure that no other thread faults on the above virtual
+     * addresses. (More exactly, no other thread calls the fault handler
+     * with those offsets.)
+     * The fault handler is called with mmap_sem read-locked, and
+     * madvise() does down/up_write(mmap_sem).
+     */
+    qemu_madvise(NULL, 0, MADV_NORMAL);
+
+    return pages;
+}
+
+void umem_qemu_send_pages_present(QEMUFile *to_umemd,
+                                  const struct umem_pages *pages)
+{
+    qemu_put_byte(to_umemd, UMEM_QEMU_PAGE_FAULTED);
+    umem_send_pages(to_umemd, pages);
+}
+
+void umem_qemu_send_pages_unmapped(QEMUFile *to_umemd,
+                                   const struct umem_pages *pages)
+{
+    qemu_put_byte(to_umemd, UMEM_QEMU_PAGE_UNMAPPED);
+    umem_send_pages(to_umemd, pages);
+}
diff --git a/umem.h b/umem.h
new file mode 100644
index 0000000..058cac6
--- /dev/null
+++ b/umem.h
@@ -0,0 +1,101 @@
+/*
+ * umem.h: user process backed memory module for postcopy live migration
+ *
+ * Copyright (c) 2011
+ * National Institute of Advanced Industrial Science and Technology
+ *
+ * https://sites.google.com/site/grivonhome/quick-kvm-migration
+ * Author: Isaku Yamahata <yamahata at valinux co jp>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef QEMU_UMEM_H
+#define QEMU_UMEM_H
+
+#include <linux/umem.h>
+
+#include "qemu-common.h"
+
+typedef struct UMemDev UMemDev;
+
+struct UMem {
+    void *umem;
+    int fd;
+    void *shmem;
+    int shmem_fd;
+    uint64_t size;
+
+    /* indexed by host page size */
+    int page_shift;
+    int nbits;
+    int nsets;
+    unsigned long *faulted;
+};
+
+struct umem_pages {
+    uint64_t nr;
+    uint64_t pgoffs[0];
+};
+
+UMem *umem_new(void *hostp, size_t size);
+void umem_destroy(UMem *umem);
+
+/* umem device operations */
+void umem_get_page_request(UMem *umem, struct umem_pages *page_request);
+void umem_mark_page_cached(UMem *umem, struct umem_pages *page_cached);
+void umem_unmap(UMem *umem);
+void umem_close(UMem *umem);
+
+/* umem shmem operations */
+void *umem_map_shmem(UMem *umem);
+void umem_unmap_shmem(UMem *umem);
+void umem_remove_shmem(UMem *umem, size_t offset, size_t size);
+void umem_close_shmem(UMem *umem);
+
+/* qemu on destination <-> umem daemon communication */
+
+/* daemon -> qemu */
+#define UMEM_DAEMON_READY               'R'
+#define UMEM_DAEMON_QUIT                'Q'
+#define UMEM_DAEMON_TRIGGER_PAGE_FAULT  'T'
+#define UMEM_DAEMON_ERROR               'E'
+
+/* qemu -> daemon */
+#define UMEM_QEMU_READY                 'r'
+#define UMEM_QEMU_QUIT                  'q'
+#define UMEM_QEMU_PAGE_FAULTED          't'
+#define UMEM_QEMU_PAGE_UNMAPPED         'u'
+
+struct umem_pages *umem_recv_pages(QEMUFile *f, int *offset);
+size_t umem_pages_size(uint64_t nr);
+
+/* for umem daemon */
+void umem_daemon_ready(int to_qemu_fd);
+void umem_daemon_wait_for_qemu(int from_qemu_fd);
+void umem_daemon_quit(QEMUFile *to_qemu);
+void umem_daemon_send_pages_present(QEMUFile *to_qemu,
+                                    struct umem_pages *pages);
+
+/* for qemu */
+void umem_qemu_wait_for_daemon(int from_umemd_fd);
+void umem_qemu_ready(int to_umemd_fd);
+void umem_qemu_quit(QEMUFile *to_umemd);
+struct umem_pages *umem_qemu_trigger_page_fault(QEMUFile *from_umemd,
+                                                int *offset);
+void umem_qemu_send_pages_present(QEMUFile *to_umemd,
+                                  const struct umem_pages *pages);
+void umem_qemu_send_pages_unmapped(QEMUFile *to_umemd,
+                                   const struct umem_pages *pages);
+
+#endif /* QEMU_UMEM_H */
-- 
1.7.1.1
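
A note on the message layout used between qemu and the umem daemon in the
patch above: struct umem_pages is a count followed by that many 64-bit page
offsets, and umem_mark_page_cached() loops until the whole buffer is written.
The following is only a minimal sketch of that pattern over a plain pipe, not
the umem device protocol itself. The names pages_msg, write_full, read_full
and pages_msg_roundtrip are hypothetical and do not exist in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Same layout as struct umem_pages, written with a C99 flexible array
 * member instead of the GNU zero-length array. */
struct pages_msg {
    uint64_t nr;
    uint64_t pgoffs[];
};

/* Bytes needed for nr page offsets (mirrors umem_pages_size()). */
static size_t pages_msg_size(uint64_t nr)
{
    return sizeof(struct pages_msg) + nr * sizeof(uint64_t);
}

/* Hypothetical helper: write exactly len bytes, retrying on EINTR and
 * on short writes. Returns 0 on success, -1 on error. */
static int write_full(int fd, const void *buf, size_t len)
{
    const uint8_t *p = buf;

    while (len > 0) {
        ssize_t ret = write(fd, p, len);
        if (ret == -1) {
            if (errno == EINTR) {
                continue;
            }
            return -1;
        }
        p += ret;
        len -= (size_t)ret;
    }
    return 0;
}

/* Hypothetical helper: read exactly len bytes. Returns -1 on error or
 * if the peer closes before the full message arrived. */
static int read_full(int fd, void *buf, size_t len)
{
    uint8_t *p = buf;

    while (len > 0) {
        ssize_t ret = read(fd, p, len);
        if (ret == -1) {
            if (errno == EINTR) {
                continue;
            }
            return -1;
        }
        if (ret == 0) {
            return -1;
        }
        p += ret;
        len -= (size_t)ret;
    }
    return 0;
}

/* Send three page offsets through a pipe and read them back.
 * Returns 0 if the round trip preserved the message. */
static int pages_msg_roundtrip(void)
{
    int fds[2];
    int ret = -1;
    size_t size = pages_msg_size(3);
    struct pages_msg *out = malloc(size);
    struct pages_msg *in = malloc(size);

    if (out == NULL || in == NULL || pipe(fds) == -1) {
        free(out);
        free(in);
        return -1;
    }
    out->nr = 3;
    out->pgoffs[0] = 0x10;
    out->pgoffs[1] = 0x11;
    out->pgoffs[2] = 0x2a;

    if (write_full(fds[1], out, size) == 0 &&
        read_full(fds[0], in, size) == 0 &&
        memcmp(in, out, size) == 0) {
        ret = 0;
    }
    close(fds[0]);
    close(fds[1]);
    free(out);
    free(in);
    return ret;
}
```

Returning an error instead of calling abort() is a choice made for the sketch
only; the patch aborts on failure.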



* [PATCH v2 36/41] postcopy: implement incoming part of postcopy live migration
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (34 preceding siblings ...)
  2012-06-04  9:57 ` [PATCH v2 35/41] postcopy: introduce helper functions for postcopy Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-14 21:56   ` Juan Quintela
  2012-06-14 21:58   ` Juan Quintela
  2012-06-04  9:57 ` [PATCH v2 37/41] postcopy: implement outgoing " Isaku Yamahata
                   ` (6 subsequent siblings)
  42 siblings, 2 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: owasserm, quintela, avi, pbonzini, aliguori, stefanha, dlaor,
	mdroth, yoshikawa.takuya, benoit.hudzia, aarcange, t.hirofuchi,
	satoshi.itoh

This patch implements the incoming part of postcopy live migration.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
Changes v3 -> v4:
- fork umemd early to address qemu devices touching guest ram via
  post/pre_load
- code clean up on initialization
- Makefile.target
  migration-postcopy.c is target dependent because it uses TARGET_PAGE_xxx,
  so it can't be shared between target architectures.
- use qemu_fopen_fd
- introduce incoming_flags_use_umem_make_present flag
- use MADV_DONTNEED
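
To illustrate why the page-size handling is target dependent: the umem
device works in host pages, while the migration stream works in target
pages, so offsets have to be shifted between the two. A rough sketch,
assuming power-of-two page sizes. The helper names page_shift and
host_to_target_pgoff are hypothetical, and target_page_size is a
parameter standing in for TARGET_PAGE_SIZE.

```c
#include <assert.h>
#include <stdint.h>
#include <strings.h>    /* ffs() */

/* log2 of a power-of-two page size, computed the way the patch does
 * with ffs(getpagesize()) - 1. */
static int page_shift(int page_size)
{
    return ffs(page_size) - 1;
}

/* Convert a host page offset into the offset of the target page that
 * contains the start of that host page. */
static uint64_t host_to_target_pgoff(uint64_t host_pgoff,
                                     int host_page_size,
                                     int target_page_size)
{
    int hs = page_shift(host_page_size);
    int ts = page_shift(target_page_size);

    if (hs >= ts) {
        /* one host page covers one or more target pages */
        return host_pgoff << (hs - ts);
    }
    /* several host pages make up one target page */
    return host_pgoff >> (ts - hs);
}
```

With 64KiB host pages and 4KiB target pages, host page 3 starts at target
page 48. With equal page sizes the offset is unchanged.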

Changes v2 -> v3:
- make incoming socket nonblocking
- several clean ups
- Dropped QEMUFilePipe
- Moved QEMUFileNonblock to buffered_file
- Split out into umem/incoming/outgoing
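
For reference, the two fcntl() checks the incoming side relies on (the
descriptor must be open read-write because the umem daemon writes page
requests back on it, and it is switched to nonblocking mode) can be tried
in isolation. This is a sketch only, and the helper names fd_is_rdwr,
fd_set_nonblock and nonblock_socket_demo are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* 1 if fd is open read-write, 0 if not, -1 if fcntl() fails. */
static int fd_is_rdwr(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1) {
        return -1;
    }
    return (flags & O_ACCMODE) == O_RDWR;
}

/* Add O_NONBLOCK while keeping the other file status flags. */
static int fd_set_nonblock(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1) {
        return -1;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Returns 0 when one end of a socketpair passes the read-write check
 * and a read on the empty nonblocking socket fails with
 * EAGAIN/EWOULDBLOCK instead of blocking. */
static int nonblock_socket_demo(void)
{
    int sv[2];
    char c;
    int ok;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) {
        return -1;
    }
    ok = fd_is_rdwr(sv[0]) == 1 && fd_set_nonblock(sv[0]) == 0;
    if (ok) {
        ssize_t ret = read(sv[0], &c, 1);
        ok = ret == -1 && (errno == EAGAIN || errno == EWOULDBLOCK);
    }
    close(sv[0]);
    close(sv[1]);
    return ok ? 0 : -1;
}
```

Unlike the check in fd_start_incoming_migration(), fd_is_rdwr() reports an
fcntl() failure separately from a descriptor that is not read-write.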

Changes v1 -> v2:
- make mig_read nonblocking when it is a socket
- updates for umem device changes
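
The page request path in migration-postcopy.c splits a long list of page
offsets into messages that fit the QEMUFile buffer: the first message
carries the command (ON_DEMAND or BACKGROUND) and later ones use the
matching _CONT command. Below is a simplified sketch of that splitting
only, assuming a 4-byte count field as written by qemu_put_be32(). The
names SKETCH_MAX_PAGE_NR, REQ_ON_DEMAND, REQ_ON_DEMAND_CONT, struct chunk
and split_request are hypothetical, and nothing is sent on the wire.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32KiB buffer minus cmd (1), id length (1), id (256), count (4). */
#define SKETCH_MAX_PAGE_NR \
    ((32 * 1024 - 1 - 1 - 256 - 4) / sizeof(uint64_t))

enum { REQ_ON_DEMAND = 1, REQ_ON_DEMAND_CONT = 2 };

struct chunk {
    int cmd;            /* command to use for this message */
    uint32_t first;     /* index of the first page offset in it */
    uint32_t nr;        /* number of page offsets in it */
};

/* Split nr page offsets into messages of at most max_nr offsets each.
 * The first message is always produced, even for nr == 0, as in
 * postcopy_incoming_send_req(). Returns the number of chunks stored
 * in out, which is at most max_chunks. */
static size_t split_request(uint32_t nr, uint32_t max_nr,
                            struct chunk *out, size_t max_chunks)
{
    size_t n = 0;
    uint32_t first = 0;
    int cmd = REQ_ON_DEMAND;

    do {
        uint32_t len = nr < max_nr ? nr : max_nr;

        if (n == max_chunks) {
            break;
        }
        out[n].cmd = cmd;
        out[n].first = first;
        out[n].nr = len;
        n++;

        first += len;
        nr -= len;
        cmd = REQ_ON_DEMAND_CONT;   /* follow-up messages omit the id */
    } while (nr > 0);

    return n;
}
```

For 10000 offsets and a limit of 4063 this yields three messages of 4063,
4063 and 1874 offsets.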
---
 Makefile.target                                    |    5 +
 cpu-all.h                                          |    7 +
 exec.c                                             |   20 +-
 migration-exec.c                                   |    4 +
 migration-fd.c                                     |    6 +
 .../linux/umem.h => migration-postcopy-stub.c      |   47 +-
 migration-postcopy.c                               | 1267 ++++++++++++++++++++
 migration.c                                        |    4 +
 migration.h                                        |   13 +
 qemu-common.h                                      |    1 +
 qemu-options.hx                                    |    5 +-
 savevm.c                                           |   43 +
 vl.c                                               |    8 +-
 13 files changed, 1409 insertions(+), 21 deletions(-)
 copy linux-headers/linux/umem.h => migration-postcopy-stub.c (55%)
 create mode 100644 migration-postcopy.c

diff --git a/Makefile.target b/Makefile.target
index 1582904..618bd3e 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -4,6 +4,7 @@ GENERATED_HEADERS = config-target.h
 CONFIG_NO_PCI = $(if $(subst n,,$(CONFIG_PCI)),n,y)
 CONFIG_NO_KVM = $(if $(subst n,,$(CONFIG_KVM)),n,y)
 CONFIG_NO_XEN = $(if $(subst n,,$(CONFIG_XEN)),n,y)
+CONFIG_NO_POSTCOPY = $(if $(subst n,,$(CONFIG_POSTCOPY)),n,y)
 
 include ../config-host.mak
 include config-devices.mak
@@ -196,6 +197,10 @@ LIBS+=-lz
 
 obj-i386-$(CONFIG_KVM) += hyperv.o
 
+obj-$(CONFIG_POSTCOPY) += migration-postcopy.o
+obj-$(CONFIG_NO_POSTCOPY) += migration-postcopy-stub.o
+common-obj-$(CONFIG_POSTCOPY) += umem.o
+
 QEMU_CFLAGS += $(VNC_TLS_CFLAGS)
 QEMU_CFLAGS += $(VNC_SASL_CFLAGS)
 QEMU_CFLAGS += $(VNC_JPEG_CFLAGS)
diff --git a/cpu-all.h b/cpu-all.h
index ff7f827..e0956bc 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -486,6 +486,9 @@ extern ram_addr_t ram_size;
 /* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */
 #define RAM_PREALLOC_MASK   (1 << 0)
 
+/* RAM is allocated via umem for postcopy incoming mode */
+#define RAM_POSTCOPY_UMEM_MASK  (1 << 1)
+
 typedef struct RAMBlock {
     struct MemoryRegion *mr;
     uint8_t *host;
@@ -497,6 +500,10 @@ typedef struct RAMBlock {
 #if defined(__linux__) && !defined(TARGET_S390X)
     int fd;
 #endif
+
+#ifdef CONFIG_POSTCOPY
+    UMem *umem;    /* for incoming postcopy mode */
+#endif
 } RAMBlock;
 
 typedef struct RAMList {
diff --git a/exec.c b/exec.c
index 78eeee5..e5ff2ed 100644
--- a/exec.c
+++ b/exec.c
@@ -36,6 +36,7 @@
 #include "arch_init.h"
 #include "memory.h"
 #include "exec-memory.h"
+#include "migration.h"
 #if defined(CONFIG_USER_ONLY)
 #include <qemu.h>
 #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
@@ -2632,6 +2633,13 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
         new_block->host = host;
         new_block->flags |= RAM_PREALLOC_MASK;
     } else {
+#ifdef CONFIG_POSTCOPY
+        if (incoming_postcopy) {
+            ram_addr_t page_size = getpagesize();
+            size = (size + page_size - 1) & ~(page_size - 1);
+            mem_path = NULL;
+        }
+#endif
         if (mem_path) {
 #if defined (__linux__) && !defined(TARGET_S390X)
             new_block->host = file_ram_alloc(new_block, size, mem_path);
@@ -2709,7 +2717,13 @@ void qemu_ram_free(ram_addr_t addr)
             QLIST_REMOVE(block, next);
             if (block->flags & RAM_PREALLOC_MASK) {
                 ;
-            } else if (mem_path) {
+            }
+#ifdef CONFIG_POSTCOPY
+            else if (block->flags & RAM_POSTCOPY_UMEM_MASK) {
+                postcopy_incoming_ram_free(block->umem);
+            }
+#endif
+            else if (mem_path) {
 #if defined (__linux__) && !defined(TARGET_S390X)
                 if (block->fd) {
                     munmap(block->host, block->length);
@@ -2755,6 +2769,10 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
             } else {
                 flags = MAP_FIXED;
                 munmap(vaddr, length);
+                if (block->flags & RAM_POSTCOPY_UMEM_MASK) {
+                    postcopy_incoming_qemu_pages_unmapped(addr, length);
+                    block->flags &= ~RAM_POSTCOPY_UMEM_MASK;
+                }
                 if (mem_path) {
 #if defined(__linux__) && !defined(TARGET_S390X)
                     if (block->fd) {
diff --git a/migration-exec.c b/migration-exec.c
index 95e9779..7f08b3b 100644
--- a/migration-exec.c
+++ b/migration-exec.c
@@ -106,6 +106,10 @@ int exec_start_incoming_migration(const char *command)
 {
     QEMUFile *f;
 
+    if (incoming_postcopy) {
+        return -ENOSYS;
+    }
+
     DPRINTF("Attempting to start an incoming migration\n");
     f = qemu_popen_cmd(command, "r");
     if(f == NULL) {
diff --git a/migration-fd.c b/migration-fd.c
index d9c13fe..42b8162 100644
--- a/migration-fd.c
+++ b/migration-fd.c
@@ -116,6 +116,12 @@ int fd_start_incoming_migration(const char *infd)
     DPRINTF("Attempting to start an incoming migration via fd\n");
 
     fd = strtol(infd, NULL, 0);
+    if (incoming_postcopy) {
+        int flags = fcntl(fd, F_GETFL);
+        if ((flags & O_ACCMODE) != O_RDWR) {
+            return -EINVAL;
+        }
+    }
     f = qemu_fdopen(fd, "rb");
     if(f == NULL) {
         DPRINTF("Unable to apply qemu wrapper to file descriptor\n");
diff --git a/linux-headers/linux/umem.h b/migration-postcopy-stub.c
similarity index 55%
copy from linux-headers/linux/umem.h
copy to migration-postcopy-stub.c
index 0cf7399..f9ebcbe 100644
--- a/linux-headers/linux/umem.h
+++ b/migration-postcopy-stub.c
@@ -1,8 +1,8 @@
 /*
- * User process backed memory.
- * This is mainly for KVM post copy.
+ * migration-postcopy-stub.c: postcopy live migration
+ *                            stub functions for non-supported hosts
  *
- * Copyright (c) 2011,
+ * Copyright (c) 2011
  * National Institute of Advanced Industrial Science and Technology
  *
  * https://sites.google.com/site/grivonhome/quick-kvm-migration
@@ -21,22 +21,35 @@
  * with this program; if not, see <http://www.gnu.org/licenses/>.
  */
 
-#ifndef __LINUX_UMEM_H
-#define __LINUX_UMEM_H
+#include "sysemu.h"
+#include "migration.h"
 
-#include <linux/types.h>
-#include <linux/ioctl.h>
+int postcopy_incoming_init(const char *incoming, bool incoming_postcopy)
+{
+    return -ENOSYS;
+}
 
-struct umem_init {
-	__u64 size;		/* in bytes */
-	__s32 shmem_fd;
-	__s32 padding;
-};
+void postcopy_incoming_prepare(void)
+{
+}
 
-#define UMEMIO	0x1E
+int postcopy_incoming_ram_load(QEMUFile *f, void *opaque, int version_id)
+{
+    return -ENOSYS;
+}
 
-/* ioctl for umem fd */
-#define UMEM_INIT		_IOWR(UMEMIO, 0x0, struct umem_init)
-#define UMEM_MAKE_VMA_ANONYMOUS	_IO  (UMEMIO, 0x1)
+void postcopy_incoming_fork_umemd(QEMUFile *mig_read)
+{
+}
 
-#endif /* __LINUX_UMEM_H */
+void postcopy_incoming_qemu_ready(void)
+{
+}
+
+void postcopy_incoming_qemu_cleanup(void)
+{
+}
+
+void postcopy_incoming_qemu_pages_unmapped(ram_addr_t addr, ram_addr_t size)
+{
+}
diff --git a/migration-postcopy.c b/migration-postcopy.c
new file mode 100644
index 0000000..5913e05
--- /dev/null
+++ b/migration-postcopy.c
@@ -0,0 +1,1267 @@
+/*
+ * migration-postcopy.c: postcopy live migration
+ *
+ * Copyright (c) 2011
+ * National Institute of Advanced Industrial Science and Technology
+ *
+ * https://sites.google.com/site/grivonhome/quick-kvm-migration
+ * Author: Isaku Yamahata <yamahata at valinux co jp>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "config-host.h"
+
+#if defined(CONFIG_MADVISE) || defined(CONFIG_POSIX_MADVISE)
+#include <sys/mman.h>
+#endif
+
+#include "bitmap.h"
+#include "sysemu.h"
+#include "hw/hw.h"
+#include "arch_init.h"
+#include "migration.h"
+#include "buffered_file.h"
+#include "qemu_socket.h"
+#include "umem.h"
+
+#include "memory.h"
+#define WANT_EXEC_OBSOLETE
+#include "exec-obsolete.h"
+
+//#define DEBUG_POSTCOPY
+#ifdef DEBUG_POSTCOPY
+#include <sys/syscall.h>
+#define DPRINTF(fmt, ...)                                               \
+    do {                                                                \
+        printf("%d:%ld %s:%d: " fmt, getpid(), syscall(SYS_gettid),     \
+               __func__, __LINE__, ## __VA_ARGS__);                     \
+    } while (0)
+#else
+#define DPRINTF(fmt, ...)       do { } while (0)
+#endif
+
+#define ALIGN_UP(size, align)   (((size) + (align) - 1) & ~((align) - 1))
+
+static void fd_close(int *fd)
+{
+    if (*fd >= 0) {
+        close(*fd);
+        *fd = -1;
+    }
+}
+
+/***************************************************************************
+ * umem daemon on destination <-> qemu on source protocol
+ */
+
+#define QEMU_UMEM_REQ_INIT              0x00
+#define QEMU_UMEM_REQ_ON_DEMAND         0x01
+#define QEMU_UMEM_REQ_ON_DEMAND_CONT    0x02
+#define QEMU_UMEM_REQ_BACKGROUND        0x03
+#define QEMU_UMEM_REQ_BACKGROUND_CONT   0x04
+#define QEMU_UMEM_REQ_REMOVE            0x05
+#define QEMU_UMEM_REQ_EOC               0x06
+
+struct qemu_umem_req {
+    int8_t cmd;
+    uint8_t len;
+    char *idstr;        /* ON_DEMAND, BACKGROUND, REMOVE */
+    uint32_t nr;        /* ON_DEMAND, ON_DEMAND_CONT,
+                           BACKGROUND, BACKGROUND_CONT, REMOVE */
+
+    /* in target page size as qemu migration protocol */
+    uint64_t *pgoffs;   /* ON_DEMAND, ON_DEMAND_CONT,
+                           BACKGROUND, BACKGROUND_CONT, REMOVE */
+};
+
+static void postcopy_incoming_send_req_idstr(QEMUFile *f, const char* idstr)
+{
+    qemu_put_byte(f, strlen(idstr));
+    qemu_put_buffer(f, (uint8_t *)idstr, strlen(idstr));
+}
+
+static void postcopy_incoming_send_req_pgoffs(QEMUFile *f, uint32_t nr,
+                                              const uint64_t *pgoffs)
+{
+    uint32_t i;
+
+    qemu_put_be32(f, nr);
+    for (i = 0; i < nr; i++) {
+        qemu_put_be64(f, pgoffs[i]);
+    }
+}
+
+static void postcopy_incoming_send_req_one(QEMUFile *f,
+                                           const struct qemu_umem_req *req)
+{
+    DPRINTF("cmd %d\n", req->cmd);
+    qemu_put_byte(f, req->cmd);
+    switch (req->cmd) {
+    case QEMU_UMEM_REQ_INIT:
+    case QEMU_UMEM_REQ_EOC:
+        /* nothing */
+        break;
+    case QEMU_UMEM_REQ_ON_DEMAND:
+    case QEMU_UMEM_REQ_BACKGROUND:
+    case QEMU_UMEM_REQ_REMOVE:
+        postcopy_incoming_send_req_idstr(f, req->idstr);
+        postcopy_incoming_send_req_pgoffs(f, req->nr, req->pgoffs);
+        break;
+    case QEMU_UMEM_REQ_ON_DEMAND_CONT:
+    case QEMU_UMEM_REQ_BACKGROUND_CONT:
+        postcopy_incoming_send_req_pgoffs(f, req->nr, req->pgoffs);
+        break;
+    default:
+        abort();
+        break;
+    }
+}
+
+/* QEMUFile can buffer up to IO_BUF_SIZE = 32 * 1024.
+ * So one message size must be <= IO_BUF_SIZE
+ * cmd: 1
+ * id len: 1
+ * id: 256
+ * nr: 4 (sent with qemu_put_be32())
+ */
+#define MAX_PAGE_NR     ((32 * 1024 - 1 - 1 - 256 - 4) / sizeof(uint64_t))
+static void postcopy_incoming_send_req(QEMUFile *f,
+                                       const struct qemu_umem_req *req)
+{
+    uint32_t nr = req->nr;
+    struct qemu_umem_req tmp = *req;
+
+    switch (req->cmd) {
+    case QEMU_UMEM_REQ_INIT:
+    case QEMU_UMEM_REQ_EOC:
+        postcopy_incoming_send_req_one(f, &tmp);
+        break;
+    case QEMU_UMEM_REQ_ON_DEMAND:
+    case QEMU_UMEM_REQ_BACKGROUND:
+        tmp.nr = MIN(nr, MAX_PAGE_NR);
+        postcopy_incoming_send_req_one(f, &tmp);
+
+        nr -= tmp.nr;
+        tmp.pgoffs += tmp.nr;
+        if (tmp.cmd == QEMU_UMEM_REQ_ON_DEMAND) {
+            tmp.cmd = QEMU_UMEM_REQ_ON_DEMAND_CONT;
+        } else {
+            tmp.cmd = QEMU_UMEM_REQ_BACKGROUND_CONT;
+        }
+        /* fall through */
+    case QEMU_UMEM_REQ_REMOVE:
+    case QEMU_UMEM_REQ_ON_DEMAND_CONT:
+    case QEMU_UMEM_REQ_BACKGROUND_CONT:
+        while (nr > 0) {
+            tmp.nr = MIN(nr, MAX_PAGE_NR);
+            postcopy_incoming_send_req_one(f, &tmp);
+
+            nr -= tmp.nr;
+            tmp.pgoffs += tmp.nr;
+        }
+        break;
+    default:
+        abort();
+        break;
+    }
+}
+
+/***************************************************************************
+ * incoming part
+ */
+
+/* flags for incoming mode to modify the behavior.
+   This is for benchmark/debug purpose */
+#define INCOMING_FLAGS_FAULT_REQUEST            0x01
+
+
+static void postcopy_incoming_umemd(void);
+
+#define PIS_STATE_QUIT_RECEIVED         0x01
+#define PIS_STATE_QUIT_QUEUED           0x02
+#define PIS_STATE_QUIT_SENT             0x04
+
+#define PIS_STATE_QUIT_MASK             (PIS_STATE_QUIT_RECEIVED | \
+                                         PIS_STATE_QUIT_QUEUED | \
+                                         PIS_STATE_QUIT_SENT)
+
+struct PostcopyIncomingState {
+    /* dest qemu state */
+    uint32_t    state;
+
+    int host_page_size;
+    int host_page_shift;
+
+    /* qemu side */
+    int to_umemd_fd;
+    QEMUFileNonblock *to_umemd;
+#define MAX_FAULTED_PAGES       256
+    struct umem_pages *faulted_pages;
+
+    int from_umemd_fd;
+    QEMUFile *from_umemd;
+    int version_id;     /* save/load format version id */
+};
+typedef struct PostcopyIncomingState PostcopyIncomingState;
+
+
+#define UMEM_STATE_EOS_RECEIVED         0x01    /* umem daemon <-> src qemu */
+#define UMEM_STATE_EOC_SENT             0x02    /* umem daemon <-> src qemu */
+#define UMEM_STATE_QUIT_RECEIVED        0x04    /* umem daemon <-> dst qemu */
+#define UMEM_STATE_QUIT_QUEUED          0x08    /* umem daemon <-> dst qemu */
+#define UMEM_STATE_QUIT_SENT            0x10    /* umem daemon <-> dst qemu */
+
+#define UMEM_STATE_QUIT_MASK            (UMEM_STATE_QUIT_QUEUED | \
+                                         UMEM_STATE_QUIT_SENT | \
+                                         UMEM_STATE_QUIT_RECEIVED)
+#define UMEM_STATE_END_MASK             (UMEM_STATE_EOS_RECEIVED | \
+                                         UMEM_STATE_EOC_SENT | \
+                                         UMEM_STATE_QUIT_MASK)
+
+struct PostcopyIncomingUMemDaemon {
+    /* umem daemon side */
+    uint32_t state;
+
+    int host_page_size;
+    int host_page_shift;
+    int nr_host_pages_per_target_page;
+    int host_to_target_page_shift;
+    int nr_target_pages_per_host_page;
+    int target_to_host_page_shift;
+    int version_id;     /* save/load format version id */
+
+    int to_qemu_fd;
+    QEMUFileNonblock *to_qemu;
+    int from_qemu_fd;
+    QEMUFile *from_qemu;
+
+    int mig_read_fd;
+    QEMUFile *mig_read;         /* qemu on source -> umem daemon */
+
+    int mig_write_fd;
+    QEMUFileNonblock *mig_write;        /* umem daemon -> qemu on source */
+
+    /* = KVM_MAX_VCPUS * (ASYNC_PF_PER_VCPUS + 1) */
+#define MAX_REQUESTS    (512 * (64 + 1))
+
+    struct umem_pages *page_request;
+    struct umem_pages *page_cached;
+
+#define MAX_PRESENT_REQUESTS    MAX_FAULTED_PAGES
+    struct umem_pages *present_request;
+
+    uint64_t *target_pgoffs;
+
+    /* bitmap indexed by target page offset */
+    unsigned long *phys_requested;
+
+    /* bitmap indexed by target page offset */
+    unsigned long *phys_received;
+
+    RAMBlock *last_block_read;  /* qemu on source -> umem daemon */
+    RAMBlock *last_block_write; /* umem daemon -> qemu on source */
+};
+typedef struct PostcopyIncomingUMemDaemon PostcopyIncomingUMemDaemon;
+
+static PostcopyIncomingState state = {
+    .state = 0,
+    .to_umemd_fd = -1,
+    .to_umemd = NULL,
+    .from_umemd_fd = -1,
+    .from_umemd = NULL,
+};
+
+static PostcopyIncomingUMemDaemon umemd = {
+    .state = 0,
+    .to_qemu_fd = -1,
+    .to_qemu = NULL,
+    .from_qemu_fd = -1,
+    .from_qemu = NULL,
+    .mig_read_fd = -1,
+    .mig_read = NULL,
+    .mig_write_fd = -1,
+    .mig_write = NULL,
+};
+
+void postcopy_incoming_ram_free(UMem *umem)
+{
+    umem_unmap(umem);
+    umem_close(umem);
+    umem_destroy(umem);
+}
+
+void postcopy_incoming_prepare(void)
+{
+    RAMBlock *block;
+
+    if (!incoming_postcopy) {
+        return;
+    }
+
+    state.state = 0;
+    state.host_page_size = getpagesize();
+    state.host_page_shift = ffs(state.host_page_size) - 1;
+    state.version_id = RAM_SAVE_VERSION_ID; /* = save version of
+                                               ram_save_live() */
+
+    QLIST_FOREACH(block, &ram_list.blocks, next) {
+        block->umem = umem_new(block->host, block->length);
+        block->flags |= RAM_POSTCOPY_UMEM_MASK;
+    }
+}
+
+static int postcopy_incoming_ram_load_get64(QEMUFile *f,
+                                            ram_addr_t *addr, int *flags)
+{
+    *addr = qemu_get_be64(f);
+    *flags = *addr & ~TARGET_PAGE_MASK;
+    *addr &= TARGET_PAGE_MASK;
+    return qemu_file_get_error(f);
+}
+
+int postcopy_incoming_ram_load(QEMUFile *f, void *opaque, int version_id)
+{
+    ram_addr_t addr;
+    int flags;
+    int error;
+
+    DPRINTF("incoming ram load\n");
+    /*
+     * RAM_SAVE_FLAGS_EOS or
+     * RAM_SAVE_FLAGS_MEM_SIZE + mem size + RAM_SAVE_FLAGS_EOS
+     * see postcopy_outgoing_ram_save_live()
+     */
+
+    if (version_id != RAM_SAVE_VERSION_ID) {
+        DPRINTF("RAM_SAVE_VERSION_ID %d != %d\n",
+                version_id, RAM_SAVE_VERSION_ID);
+        return -EINVAL;
+    }
+    error = postcopy_incoming_ram_load_get64(f, &addr, &flags);
+    DPRINTF("addr 0x%lx flags 0x%x\n", addr, flags);
+    if (error) {
+        DPRINTF("error %d\n", error);
+        return error;
+    }
+    if (flags == RAM_SAVE_FLAG_EOS && addr == 0) {
+        DPRINTF("EOS\n");
+        return 0;
+    }
+
+    if (flags != RAM_SAVE_FLAG_MEM_SIZE) {
+        DPRINTF("-EINVAL flags 0x%x\n", flags);
+        return -EINVAL;
+    }
+    error = ram_load_mem_size(f, addr);
+    if (error) {
+        DPRINTF("addr 0x%lx error %d\n", addr, error);
+        return error;
+    }
+
+    error = postcopy_incoming_ram_load_get64(f, &addr, &flags);
+    if (error) {
+        DPRINTF("addr 0x%lx flags 0x%x error %d\n", addr, flags, error);
+        return error;
+    }
+    if (flags == RAM_SAVE_FLAG_EOS && addr == 0) {
+        DPRINTF("done\n");
+        return 0;
+    }
+    DPRINTF("-EINVAL\n");
+    return -EINVAL;
+}
+
+static void postcopy_incoming_pipe_and_fork_umemd(int mig_read_fd,
+                                                  QEMUFile *mig_read)
+{
+    int fds[2];
+    RAMBlock *block;
+
+    DPRINTF("fork\n");
+
+    /* socketpair(AF_UNIX)? */
+
+    if (qemu_pipe(fds) == -1) {
+        perror("qemu_pipe");
+        abort();
+    }
+    state.from_umemd_fd = fds[0];
+    umemd.to_qemu_fd = fds[1];
+
+    if (qemu_pipe(fds) == -1) {
+        perror("qemu_pipe");
+        abort();
+    }
+    umemd.from_qemu_fd = fds[0];
+    state.to_umemd_fd = fds[1];
+
+    pid_t child = fork();
+    if (child < 0) {
+        perror("fork");
+        abort();
+    }
+
+    if (child == 0) {
+        int mig_write_fd;
+
+        fd_close(&state.to_umemd_fd);
+        fd_close(&state.from_umemd_fd);
+        umemd.host_page_size = state.host_page_size;
+        umemd.host_page_shift = state.host_page_shift;
+
+        umemd.nr_host_pages_per_target_page =
+            TARGET_PAGE_SIZE / umemd.host_page_size;
+        umemd.nr_target_pages_per_host_page =
+            umemd.host_page_size / TARGET_PAGE_SIZE;
+
+        umemd.target_to_host_page_shift =
+            ffs(umemd.nr_host_pages_per_target_page) - 1;
+        umemd.host_to_target_page_shift =
+            ffs(umemd.nr_target_pages_per_host_page) - 1;
+
+        umemd.state = 0;
+        umemd.version_id = state.version_id;
+        umemd.mig_read_fd = mig_read_fd;
+        umemd.mig_read = mig_read;
+
+        mig_write_fd = dup(mig_read_fd);
+        if (mig_write_fd < 0) {
+            perror("could not dup for writable socket");
+            abort();
+        }
+        umemd.mig_write_fd = mig_write_fd;
+        umemd.mig_write = qemu_fopen_nonblock(mig_write_fd);
+
+        postcopy_incoming_umemd(); /* noreturn */
+    }
+
+    DPRINTF("qemu pid: %d daemon pid: %d\n", getpid(), child);
+    fd_close(&umemd.to_qemu_fd);
+    fd_close(&umemd.from_qemu_fd);
+    state.faulted_pages = g_malloc(umem_pages_size(MAX_FAULTED_PAGES));
+    state.faulted_pages->nr = 0;
+
+    /* close all UMem.shmem_fd */
+    QLIST_FOREACH(block, &ram_list.blocks, next) {
+        umem_close_shmem(block->umem);
+    }
+    umem_qemu_wait_for_daemon(state.from_umemd_fd);
+}
+
+void postcopy_incoming_fork_umemd(QEMUFile *mig_read)
+{
+    int fd = qemu_file_fd(mig_read);
+    assert((fcntl(fd, F_GETFL) & O_ACCMODE) == O_RDWR);
+
+    socket_set_nonblock(fd);
+    postcopy_incoming_pipe_and_fork_umemd(fd, mig_read);
+    /* The socket is now disowned; tell the umem daemon it is safe to use. */
+    postcopy_incoming_qemu_ready();
+}
+
+static void postcopy_incoming_qemu_recv_quit(void)
+{
+    RAMBlock *block;
+    if (state.state & PIS_STATE_QUIT_RECEIVED) {
+        return;
+    }
+
+    QLIST_FOREACH(block, &ram_list.blocks, next) {
+        if (block->umem != NULL) {
+            umem_destroy(block->umem);
+            block->umem = NULL;
+            block->flags &= ~RAM_POSTCOPY_UMEM_MASK;
+        }
+    }
+
+    DPRINTF("|= PIS_STATE_QUIT_RECEIVED\n");
+    state.state |= PIS_STATE_QUIT_RECEIVED;
+    qemu_set_fd_handler(state.from_umemd_fd, NULL, NULL, NULL);
+    qemu_fclose(state.from_umemd);
+    state.from_umemd = NULL;
+    fd_close(&state.from_umemd_fd);
+}
+
+static void postcopy_incoming_qemu_fflush_to_umemd_handler(void *opaque)
+{
+    assert(state.to_umemd != NULL);
+
+    nonblock_fflush(state.to_umemd);
+    if (nonblock_pending_size(state.to_umemd) > 0) {
+        return;
+    }
+
+    qemu_set_fd_handler(state.to_umemd->fd, NULL, NULL, NULL);
+    if (state.state & PIS_STATE_QUIT_QUEUED) {
+        DPRINTF("|= PIS_STATE_QUIT_SENT\n");
+        state.state |= PIS_STATE_QUIT_SENT;
+        qemu_fclose(state.to_umemd->file);
+        state.to_umemd = NULL;
+        fd_close(&state.to_umemd_fd);
+        g_free(state.faulted_pages);
+        state.faulted_pages = NULL;
+    }
+}
+
+static void postcopy_incoming_qemu_fflush_to_umemd(void)
+{
+    qemu_set_fd_handler(state.to_umemd->fd, NULL,
+                        postcopy_incoming_qemu_fflush_to_umemd_handler, NULL);
+    postcopy_incoming_qemu_fflush_to_umemd_handler(NULL);
+}
+
+static void postcopy_incoming_qemu_queue_quit(void)
+{
+    if (state.state & PIS_STATE_QUIT_QUEUED) {
+        return;
+    }
+
+    DPRINTF("|= PIS_STATE_QUIT_QUEUED\n");
+    umem_qemu_quit(state.to_umemd->file);
+    state.state |= PIS_STATE_QUIT_QUEUED;
+}
+
+static void postcopy_incoming_qemu_send_pages_present(void)
+{
+    if (state.faulted_pages->nr > 0) {
+        umem_qemu_send_pages_present(state.to_umemd->file,
+                                     state.faulted_pages);
+        state.faulted_pages->nr = 0;
+    }
+}
+
+static void postcopy_incoming_qemu_faulted_pages(
+    const struct umem_pages *pages)
+{
+    assert(pages->nr <= MAX_FAULTED_PAGES);
+    assert(state.faulted_pages != NULL);
+
+    if (state.faulted_pages->nr + pages->nr > MAX_FAULTED_PAGES) {
+        postcopy_incoming_qemu_send_pages_present();
+    }
+    memcpy(&state.faulted_pages->pgoffs[state.faulted_pages->nr],
+           &pages->pgoffs[0], sizeof(pages->pgoffs[0]) * pages->nr);
+    state.faulted_pages->nr += pages->nr;
+}
+
+static void postcopy_incoming_qemu_cleanup_umem(void);
+
+static int postcopy_incoming_qemu_handle_req_one(void)
+{
+    int offset = 0;
+    int ret;
+    uint8_t cmd;
+
+    ret = qemu_peek_buffer(state.from_umemd, &cmd, sizeof(cmd), offset);
+    offset += sizeof(cmd);
+    if (ret != sizeof(cmd)) {
+        return -EAGAIN;
+    }
+    DPRINTF("cmd %c\n", cmd);
+
+    switch (cmd) {
+    case UMEM_DAEMON_QUIT:
+        postcopy_incoming_qemu_recv_quit();
+        postcopy_incoming_qemu_queue_quit();
+        postcopy_incoming_qemu_cleanup_umem();
+        break;
+    case UMEM_DAEMON_TRIGGER_PAGE_FAULT: {
+        struct umem_pages *pages =
+            umem_qemu_trigger_page_fault(state.from_umemd, &offset);
+        if (pages == NULL) {
+            return -EAGAIN;
+        }
+        if (state.to_umemd_fd >= 0 && !(state.state & PIS_STATE_QUIT_QUEUED)) {
+            postcopy_incoming_qemu_faulted_pages(pages);
+        }
+        g_free(pages);
+        break;
+    }
+    case UMEM_DAEMON_ERROR:
+        /* umem daemon hit an error and asked us to stop vm execution */
+        vm_stop(RUN_STATE_IO_ERROR); /* or RUN_STATE_INTERNAL_ERROR */
+        break;
+    default:
+        abort();
+        break;
+    }
+
+    if (state.from_umemd != NULL) {
+        qemu_file_skip(state.from_umemd, offset);
+    }
+    return 0;
+}
+
+static void postcopy_incoming_qemu_handle_req(void *opaque)
+{
+    do {
+        int ret = postcopy_incoming_qemu_handle_req_one();
+        if (ret == -EAGAIN) {
+            break;
+        }
+    } while (state.from_umemd != NULL &&
+             qemu_pending_size(state.from_umemd) > 0);
+
+    if (state.to_umemd != NULL) {
+        if (state.faulted_pages->nr > 0) {
+            postcopy_incoming_qemu_send_pages_present();
+        }
+        postcopy_incoming_qemu_fflush_to_umemd();
+    }
+}
+
+void postcopy_incoming_qemu_ready(void)
+{
+    umem_qemu_ready(state.to_umemd_fd);
+
+    state.from_umemd = qemu_fopen_fd(state.from_umemd_fd);
+    state.to_umemd = qemu_fopen_nonblock(state.to_umemd_fd);
+    qemu_set_fd_handler(state.from_umemd_fd,
+                        postcopy_incoming_qemu_handle_req, NULL, NULL);
+}
+
+static void postcopy_incoming_qemu_cleanup_umem(void)
+{
+    /* when qemu quits before postcopy completes, tell the umem daemon
+       to tear down the umem device and exit. */
+    if (state.to_umemd_fd >= 0) {
+        postcopy_incoming_qemu_queue_quit();
+        postcopy_incoming_qemu_fflush_to_umemd();
+    }
+}
+
+void postcopy_incoming_qemu_cleanup(void)
+{
+    postcopy_incoming_qemu_cleanup_umem();
+    if (state.to_umemd != NULL) {
+        nonblock_wait_for_flush(state.to_umemd);
+    }
+}
+
+void postcopy_incoming_qemu_pages_unmapped(ram_addr_t addr, ram_addr_t size)
+{
+    uint64_t nr = DIV_ROUND_UP(size, state.host_page_size);
+    size_t len = umem_pages_size(nr);
+    ram_addr_t end = addr + size;
+    struct umem_pages *pages;
+    int i;
+
+    if (state.to_umemd_fd < 0 || state.state & PIS_STATE_QUIT_QUEUED) {
+        return;
+    }
+    pages = g_malloc(len);
+    pages->nr = nr;
+    for (i = 0; addr < end; addr += state.host_page_size, i++) {
+        pages->pgoffs[i] = addr >> state.host_page_shift;
+    }
+    assert(state.to_umemd != NULL);
+    umem_qemu_send_pages_unmapped(state.to_umemd->file, pages);
+    g_free(pages);
+    postcopy_incoming_qemu_fflush_to_umemd();
+}
+
+/**************************************************************************
+ * incoming umem daemon
+ */
+
+static void postcopy_incoming_umem_recv_quit(void)
+{
+    if (umemd.state & UMEM_STATE_QUIT_RECEIVED) {
+        return;
+    }
+    DPRINTF("|= UMEM_STATE_QUIT_RECEIVED\n");
+    umemd.state |= UMEM_STATE_QUIT_RECEIVED;
+    qemu_fclose(umemd.from_qemu);
+    umemd.from_qemu = NULL;
+    fd_close(&umemd.from_qemu_fd);
+}
+
+static void postcopy_incoming_umem_queue_quit(void)
+{
+    if (umemd.state & UMEM_STATE_QUIT_QUEUED) {
+        return;
+    }
+    DPRINTF("|= UMEM_STATE_QUIT_QUEUED\n");
+    umem_daemon_quit(umemd.to_qemu->file);
+    umemd.state |= UMEM_STATE_QUIT_QUEUED;
+}
+
+static void postcopy_incoming_umem_send_eoc_req(void)
+{
+    struct qemu_umem_req req;
+
+    if (umemd.state & UMEM_STATE_EOC_SENT) {
+        return;
+    }
+
+    DPRINTF("|= UMEM_STATE_EOC_SENT\n");
+    req.cmd = QEMU_UMEM_REQ_EOC;
+    postcopy_incoming_send_req(umemd.mig_write->file, &req);
+    umemd.state |= UMEM_STATE_EOC_SENT;
+    qemu_fclose(umemd.mig_write->file);
+    umemd.mig_write = NULL;
+    fd_close(&umemd.mig_write_fd);
+}
+
+static void postcopy_incoming_umem_send_page_req(RAMBlock *block)
+{
+    struct qemu_umem_req req;
+    int bit;
+    uint64_t target_pgoff;
+    int i;
+
+    umemd.page_request->nr = MAX_REQUESTS;
+    umem_get_page_request(block->umem, umemd.page_request);
+    DPRINTF("id %s nr %"PRIu64" offs 0x%"PRIx64" 0x%"PRIx64"\n",
+            block->idstr, (uint64_t)umemd.page_request->nr,
+            (uint64_t)umemd.page_request->pgoffs[0],
+            (uint64_t)umemd.page_request->pgoffs[1]);
+
+    if (umemd.last_block_write != block) {
+        req.cmd = QEMU_UMEM_REQ_ON_DEMAND;
+        req.idstr = block->idstr;
+    } else {
+        req.cmd = QEMU_UMEM_REQ_ON_DEMAND_CONT;
+    }
+
+    req.nr = 0;
+    req.pgoffs = umemd.target_pgoffs;
+    if (TARGET_PAGE_SIZE >= umemd.host_page_size) {
+        for (i = 0; i < umemd.page_request->nr; i++) {
+            target_pgoff = umemd.page_request->pgoffs[i] >>
+                umemd.target_to_host_page_shift;
+            bit = (block->offset >> TARGET_PAGE_BITS) + target_pgoff;
+
+            if (!test_and_set_bit(bit, umemd.phys_requested)) {
+                req.pgoffs[req.nr] = target_pgoff;
+                req.nr++;
+            }
+        }
+    } else {
+        for (i = 0; i < umemd.page_request->nr; i++) {
+            int j;
+            target_pgoff = umemd.page_request->pgoffs[i] <<
+                umemd.host_to_target_page_shift;
+            bit = (block->offset >> TARGET_PAGE_BITS) + target_pgoff;
+
+            for (j = 0; j < umemd.nr_target_pages_per_host_page; j++) {
+                if (!test_and_set_bit(bit + j, umemd.phys_requested)) {
+                    req.pgoffs[req.nr] = target_pgoff + j;
+                    req.nr++;
+                }
+            }
+        }
+    }
+
+    DPRINTF("id %s nr %d offs 0x%"PRIx64" 0x%"PRIx64"\n",
+            block->idstr, req.nr, req.pgoffs[0], req.pgoffs[1]);
+    if (req.nr > 0 && umemd.mig_write != NULL) {
+        postcopy_incoming_send_req(umemd.mig_write->file, &req);
+        umemd.last_block_write = block;
+    }
+}
+
+static void postcopy_incoming_umem_send_pages_present(void)
+{
+    if (umemd.present_request->nr > 0) {
+        umem_daemon_send_pages_present(umemd.to_qemu->file,
+                                       umemd.present_request);
+        umemd.present_request->nr = 0;
+    }
+}
+
+static void postcopy_incoming_umem_pages_present_one(
+    uint32_t nr, const uint64_t *pgoffs, uint64_t ramblock_pgoffset)
+{
+    uint32_t i;
+    assert(nr <= MAX_PRESENT_REQUESTS);
+
+    if (umemd.present_request->nr + nr > MAX_PRESENT_REQUESTS) {
+        postcopy_incoming_umem_send_pages_present();
+    }
+
+    for (i = 0; i < nr; i++) {
+        umemd.present_request->pgoffs[umemd.present_request->nr + i] =
+            pgoffs[i] + ramblock_pgoffset;
+    }
+    umemd.present_request->nr += nr;
+}
+
+static void postcopy_incoming_umem_pages_present(
+    const struct umem_pages *page_cached, uint64_t ramblock_pgoffset)
+{
+    uint32_t left = page_cached->nr;
+    uint32_t offset = 0;
+
+    while (left > 0) {
+        uint32_t nr = MIN(left, MAX_PRESENT_REQUESTS);
+        postcopy_incoming_umem_pages_present_one(
+            nr, &page_cached->pgoffs[offset], ramblock_pgoffset);
+
+        left -= nr;
+        offset += nr;
+    }
+}
+
+static int postcopy_incoming_umem_ram_load(void)
+{
+    ram_addr_t offset;
+    int flags;
+
+    int ret;
+    size_t skip = 0;
+    uint64_t be64;
+    RAMBlock *block;
+
+    void *shmem;
+    int error;
+    int i;
+    int bit;
+
+    if (umemd.version_id != RAM_SAVE_VERSION_ID) {
+        return -EINVAL;
+    }
+
+    ret = qemu_peek_buffer(umemd.mig_read, (uint8_t*)&be64, sizeof(be64),
+                           skip);
+    skip += ret;
+    if (ret != sizeof(be64)) {
+        return -EAGAIN;
+    }
+    offset = be64_to_cpu(be64);
+
+    flags = offset & ~TARGET_PAGE_MASK;
+    offset &= TARGET_PAGE_MASK;
+
+    assert(!(flags & RAM_SAVE_FLAG_MEM_SIZE));
+
+    if (flags & RAM_SAVE_FLAG_EOS) {
+        DPRINTF("RAM_SAVE_FLAG_EOS\n");
+        postcopy_incoming_umem_send_eoc_req();
+
+        qemu_fclose(umemd.mig_read);
+        umemd.mig_read = NULL;
+        fd_close(&umemd.mig_read_fd);
+        umemd.state |= UMEM_STATE_EOS_RECEIVED;
+
+        postcopy_incoming_umem_queue_quit();
+        DPRINTF("|= UMEM_STATE_EOS_RECEIVED\n");
+        return 0;
+    }
+
+    block = NULL;
+    if (flags & RAM_SAVE_FLAG_CONTINUE) {
+        block = umemd.last_block_read;
+    } else {
+        uint8_t len;
+        char id[256];
+
+        ret = qemu_peek_buffer(umemd.mig_read, &len, sizeof(len), skip);
+        skip += ret;
+        if (ret != sizeof(len)) {
+            return -EAGAIN;
+        }
+        ret = qemu_peek_buffer(umemd.mig_read, (uint8_t*)id, len, skip);
+        skip += ret;
+        if (ret != len) {
+            return -EAGAIN;
+        }
+        block = ram_find_block(id, len);
+    }
+
+    if (block == NULL) {
+        return -EINVAL;
+    }
+    umemd.last_block_read = block;
+    shmem = block->host + offset;
+
+    if (flags & RAM_SAVE_FLAG_COMPRESS) {
+        uint8_t ch;
+        ret = qemu_peek_buffer(umemd.mig_read, &ch, sizeof(ch), skip);
+        skip += ret;
+        if (ret != sizeof(ch)) {
+            return -EAGAIN;
+        }
+        memset(shmem, ch, TARGET_PAGE_SIZE);
+    } else if (flags & RAM_SAVE_FLAG_PAGE) {
+        ret = qemu_peek_buffer(umemd.mig_read, shmem, TARGET_PAGE_SIZE, skip);
+        skip += ret;
+        if (ret != TARGET_PAGE_SIZE) {
+            return -EAGAIN;
+        }
+    }
+    qemu_file_skip(umemd.mig_read, skip);
+
+    error = qemu_file_get_error(umemd.mig_read);
+    if (error) {
+        DPRINTF("error %d\n", error);
+        return error;
+    }
+
+    qemu_madvise(shmem, TARGET_PAGE_SIZE, QEMU_MADV_DONTNEED);
+
+    umemd.page_cached->nr = 0;
+    bit = (umemd.last_block_read->offset + offset) >> TARGET_PAGE_BITS;
+    if (!test_and_set_bit(bit, umemd.phys_received)) {
+        if (TARGET_PAGE_SIZE >= umemd.host_page_size) {
+            uint64_t pgoff = offset >> umemd.host_page_shift;
+            for (i = 0; i < umemd.nr_host_pages_per_target_page; i++) {
+                umemd.page_cached->pgoffs[umemd.page_cached->nr] = pgoff + i;
+                umemd.page_cached->nr++;
+            }
+        } else {
+            bool mark_cache = true;
+            for (i = 0; i < umemd.nr_target_pages_per_host_page; i++) {
+                if (!test_bit(bit + i, umemd.phys_received)) {
+                    mark_cache = false;
+                    break;
+                }
+            }
+            if (mark_cache) {
+                umemd.page_cached->pgoffs[0] = offset >> umemd.host_page_shift;
+                umemd.page_cached->nr = 1;
+            }
+        }
+    }
+
+    if (umemd.page_cached->nr > 0) {
+        umem_mark_page_cached(umemd.last_block_read->umem, umemd.page_cached);
+
+        if (!(umemd.state & UMEM_STATE_QUIT_QUEUED) && umemd.to_qemu_fd >= 0 &&
+            (incoming_postcopy_flags & INCOMING_FLAGS_FAULT_REQUEST)) {
+            uint64_t ramblock_pgoffset;
+
+            ramblock_pgoffset =
+                umemd.last_block_read->offset >> umemd.host_page_shift;
+            postcopy_incoming_umem_pages_present(umemd.page_cached,
+                                                 ramblock_pgoffset);
+        }
+    }
+
+    return 0;
+}
+
+static bool postcopy_incoming_umem_check_umem_done(void)
+{
+    bool all_done = true;
+    RAMBlock *block;
+
+    QLIST_FOREACH(block, &ram_list.blocks, next) {
+        UMem *umem = block->umem;
+        if (umem != NULL && umem->nsets == umem->nbits) {
+            umem_unmap_shmem(umem);
+            umem_destroy(umem);
+            block->umem = NULL;
+        }
+        if (block->umem != NULL) {
+            all_done = false;
+        }
+    }
+    return all_done;
+}
+
+static bool postcopy_incoming_umem_page_faulted(const struct umem_pages *pages)
+{
+    int i;
+
+    for (i = 0; i < pages->nr; i++) {
+        ram_addr_t addr = pages->pgoffs[i] << umemd.host_page_shift;
+        RAMBlock *block = qemu_get_ram_block(addr);
+        addr -= block->offset;
+        umem_remove_shmem(block->umem, addr, umemd.host_page_size);
+    }
+    return postcopy_incoming_umem_check_umem_done();
+}
+
+static bool
+postcopy_incoming_umem_page_unmapped(const struct umem_pages *pages)
+{
+    RAMBlock *block;
+    ram_addr_t addr;
+    int i;
+
+    struct qemu_umem_req req = {
+        .cmd = QEMU_UMEM_REQ_REMOVE,
+        .nr = 0,
+        .pgoffs = (uint64_t*)pages->pgoffs,
+    };
+
+    addr = pages->pgoffs[0] << umemd.host_page_shift;
+    block = qemu_get_ram_block(addr);
+
+    for (i = 0; i < pages->nr; i++) {
+        int pgoff;
+
+        addr = pages->pgoffs[i] << umemd.host_page_shift;
+        pgoff = addr >> TARGET_PAGE_BITS;
+        if (!test_bit(pgoff, umemd.phys_received) &&
+            !test_bit(pgoff, umemd.phys_requested)) {
+            req.pgoffs[req.nr] = pgoff;
+            req.nr++;
+        }
+        set_bit(pgoff, umemd.phys_received);
+        set_bit(pgoff, umemd.phys_requested);
+
+        umem_remove_shmem(block->umem,
+                          addr - block->offset, umemd.host_page_size);
+    }
+    if (req.nr > 0 && umemd.mig_write != NULL) {
+        req.idstr = block->idstr;
+        postcopy_incoming_send_req(umemd.mig_write->file, &req);
+    }
+
+    return postcopy_incoming_umem_check_umem_done();
+}
+
+static void postcopy_incoming_umem_done(void)
+{
+    postcopy_incoming_umem_send_eoc_req();
+    postcopy_incoming_umem_queue_quit();
+}
+
+static int postcopy_incoming_umem_handle_qemu(void)
+{
+    int ret;
+    int offset = 0;
+    uint8_t cmd;
+
+    ret = qemu_peek_buffer(umemd.from_qemu, &cmd, sizeof(cmd), offset);
+    offset += sizeof(cmd);
+    if (ret != sizeof(cmd)) {
+        return -EAGAIN;
+    }
+    DPRINTF("cmd %c\n", cmd);
+    switch (cmd) {
+    case UMEM_QEMU_QUIT:
+        postcopy_incoming_umem_recv_quit();
+        postcopy_incoming_umem_done();
+        break;
+    case UMEM_QEMU_PAGE_FAULTED: {
+        struct umem_pages *pages = umem_recv_pages(umemd.from_qemu,
+                                                   &offset);
+        if (pages == NULL) {
+            return -EAGAIN;
+        }
+        if (postcopy_incoming_umem_page_faulted(pages)) {
+            postcopy_incoming_umem_done();
+        }
+        g_free(pages);
+        break;
+    }
+    case UMEM_QEMU_PAGE_UNMAPPED: {
+        struct umem_pages *pages = umem_recv_pages(umemd.from_qemu,
+                                                   &offset);
+        if (pages == NULL) {
+            return -EAGAIN;
+        }
+        if (postcopy_incoming_umem_page_unmapped(pages)) {
+            postcopy_incoming_umem_done();
+        }
+        g_free(pages);
+        break;
+    }
+    default:
+        abort();
+        break;
+    }
+    if (umemd.from_qemu != NULL) {
+        qemu_file_skip(umemd.from_qemu, offset);
+    }
+    return 0;
+}
+
+static void set_fd(int fd, fd_set *fds, int *nfds)
+{
+    FD_SET(fd, fds);
+    if (fd > *nfds) {
+        *nfds = fd;
+    }
+}
+
+static int postcopy_incoming_umemd_main_loop(void)
+{
+    fd_set writefds;
+    fd_set readfds;
+    int nfds;
+    RAMBlock *block;
+    int ret;
+
+    int pending_size;
+    bool get_page_request;
+
+    nfds = -1;
+    FD_ZERO(&writefds);
+    FD_ZERO(&readfds);
+
+    if (umemd.mig_write != NULL) {
+        pending_size = nonblock_pending_size(umemd.mig_write);
+        if (pending_size > 0) {
+            set_fd(umemd.mig_write_fd, &writefds, &nfds);
+        }
+    } else {
+        pending_size = 0;
+    }
+
+#define PENDING_SIZE_MAX (MAX_REQUESTS * sizeof(uint64_t) * 2)
+    /* If too many page requests to the migration source are still pending,
+       stop fetching new page fault requests until they are flushed. */
+    get_page_request = (pending_size <= PENDING_SIZE_MAX);
+
+    if (get_page_request) {
+        QLIST_FOREACH(block, &ram_list.blocks, next) {
+            if (block->umem != NULL) {
+                set_fd(block->umem->fd, &readfds, &nfds);
+            }
+        }
+    }
+
+    if (umemd.mig_read_fd >= 0) {
+        set_fd(umemd.mig_read_fd, &readfds, &nfds);
+    }
+
+    if (umemd.to_qemu != NULL &&
+        nonblock_pending_size(umemd.to_qemu) > 0) {
+        set_fd(umemd.to_qemu_fd, &writefds, &nfds);
+    }
+    if (umemd.from_qemu_fd >= 0) {
+        set_fd(umemd.from_qemu_fd, &readfds, &nfds);
+    }
+
+    ret = select(nfds + 1, &readfds, &writefds, NULL, NULL);
+    if (ret == -1) {
+        if (errno == EINTR) {
+            return 0;
+        }
+        return ret;
+    }
+
+    if (umemd.mig_write_fd >= 0 && FD_ISSET(umemd.mig_write_fd, &writefds)) {
+        nonblock_fflush(umemd.mig_write);
+    }
+    if (umemd.to_qemu_fd >= 0 && FD_ISSET(umemd.to_qemu_fd, &writefds)) {
+        nonblock_fflush(umemd.to_qemu);
+    }
+    if (get_page_request) {
+        QLIST_FOREACH(block, &ram_list.blocks, next) {
+            if (block->umem != NULL && FD_ISSET(block->umem->fd, &readfds)) {
+                postcopy_incoming_umem_send_page_req(block);
+            }
+        }
+    }
+    if (umemd.mig_read_fd >= 0 && FD_ISSET(umemd.mig_read_fd, &readfds)) {
+        do {
+            ret = postcopy_incoming_umem_ram_load();
+            if (ret == -EAGAIN) {
+                break;
+            }
+            if (ret < 0) {
+                return ret;
+            }
+        } while (umemd.mig_read != NULL &&
+                 qemu_pending_size(umemd.mig_read) > 0);
+    }
+    if (umemd.from_qemu_fd >= 0 && FD_ISSET(umemd.from_qemu_fd, &readfds)) {
+        do {
+            ret = postcopy_incoming_umem_handle_qemu();
+            if (ret == -EAGAIN) {
+                break;
+            }
+        } while (umemd.from_qemu != NULL &&
+                 qemu_pending_size(umemd.from_qemu) > 0);
+    }
+
+    if (umemd.mig_write != NULL) {
+        nonblock_fflush(umemd.mig_write);
+    }
+    if (umemd.to_qemu != NULL) {
+        if (!(umemd.state & UMEM_STATE_QUIT_QUEUED)) {
+            postcopy_incoming_umem_send_pages_present();
+        }
+        nonblock_fflush(umemd.to_qemu);
+        if ((umemd.state & UMEM_STATE_QUIT_QUEUED) &&
+            nonblock_pending_size(umemd.to_qemu) == 0) {
+            DPRINTF("|= UMEM_STATE_QUIT_SENT\n");
+            qemu_fclose(umemd.to_qemu->file);
+            umemd.to_qemu = NULL;
+            fd_close(&umemd.to_qemu_fd);
+            umemd.state |= UMEM_STATE_QUIT_SENT;
+        }
+    }
+
+    return (umemd.state & UMEM_STATE_END_MASK) == UMEM_STATE_END_MASK;
+}
+
+static void postcopy_incoming_umemd(void)
+{
+    ram_addr_t last_ram_offset;
+    int nbits;
+    RAMBlock *block;
+    int ret;
+
+    qemu_daemon(1, 1);
+    signal(SIGPIPE, SIG_IGN);
+    DPRINTF("daemon pid: %d\n", getpid());
+
+    umemd.page_request = g_malloc(umem_pages_size(MAX_REQUESTS));
+
+    umemd.page_cached = g_malloc(
+        umem_pages_size(MAX_REQUESTS *
+                        (TARGET_PAGE_SIZE >= umemd.host_page_size ?
+                         umemd.nr_host_pages_per_target_page : 1)));
+
+    umemd.target_pgoffs =
+        g_new(uint64_t, MAX_REQUESTS *
+              MAX(umemd.nr_host_pages_per_target_page,
+                  umemd.nr_target_pages_per_host_page));
+    umemd.present_request = g_malloc(umem_pages_size(MAX_PRESENT_REQUESTS));
+    umemd.present_request->nr = 0;
+
+    last_ram_offset = qemu_last_ram_offset();
+    nbits = last_ram_offset >> TARGET_PAGE_BITS;
+    umemd.phys_requested = g_new0(unsigned long, BITS_TO_LONGS(nbits));
+    umemd.phys_received = g_new0(unsigned long, BITS_TO_LONGS(nbits));
+    umemd.last_block_read = NULL;
+    umemd.last_block_write = NULL;
+
+    QLIST_FOREACH(block, &ram_list.blocks, next) {
+        UMem *umem = block->umem;
+        umem->umem = NULL;      /* umem mapping area has VM_DONTCOPY flag,
+                                   so those mappings were lost on fork */
+        block->host = umem_map_shmem(umem);
+        umem_close_shmem(umem);
+    }
+    umem_daemon_ready(umemd.to_qemu_fd);
+    umemd.to_qemu = qemu_fopen_nonblock(umemd.to_qemu_fd);
+
+    /* wait for qemu to disown migration_fd */
+    umem_daemon_wait_for_qemu(umemd.from_qemu_fd);
+    umemd.from_qemu = qemu_fopen_fd(umemd.from_qemu_fd);
+
+    DPRINTF("entering umemd main loop\n");
+    for (;;) {
+        ret = postcopy_incoming_umemd_main_loop();
+        if (ret != 0) {
+            break;
+        }
+    }
+    DPRINTF("exiting umemd main loop\n");
+
+    /* This daemon forked from qemu and the parent qemu is still running.
+     * Cleanups of linked libraries like SDL should not be triggered,
+     * otherwise the parent qemu may use resources that were already freed.
+     */
+    fflush(stdout);
+    fflush(stderr);
+    _exit(ret < 0 ? EXIT_FAILURE : EXIT_SUCCESS);
+}
diff --git a/migration.c b/migration.c
index 7ad62ef..462620f 100644
--- a/migration.c
+++ b/migration.c
@@ -65,6 +65,10 @@ int qemu_start_incoming_migration(const char *uri, Error **errp)
     const char *p;
     int ret;
 
+    if (incoming_postcopy) {
+        postcopy_incoming_prepare();
+    }
+
     if (strstart(uri, "tcp:", &p))
         ret = tcp_start_incoming_migration(p, errp);
 #if !defined(WIN32)
diff --git a/migration.h b/migration.h
index 091b446..e6f8006 100644
--- a/migration.h
+++ b/migration.h
@@ -84,6 +84,7 @@ uint64_t ram_bytes_remaining(void);
 uint64_t ram_bytes_transferred(void);
 uint64_t ram_bytes_total(void);
 
+void ram_save_set_params(const MigrationParams *params, void *opaque);
 void sort_ram_list(void);
 int ram_save_block(QEMUFile *f);
 void ram_save_memory_set_dirty(void);
@@ -105,7 +106,19 @@ void migrate_add_blocker(Error *reason);
  */
 void migrate_del_blocker(Error *reason);
 
+/* For incoming postcopy */
 extern bool incoming_postcopy;
 extern unsigned long incoming_postcopy_flags;
 
+void postcopy_incoming_ram_free(UMem *umem);
+void postcopy_incoming_prepare(void);
+
+int postcopy_incoming_ram_load(QEMUFile *f, void *opaque, int version_id);
+void postcopy_incoming_fork_umemd(QEMUFile *mig_read);
+void postcopy_incoming_qemu_ready(void);
+void postcopy_incoming_qemu_cleanup(void);
+#if defined(NEED_CPU_H) && !defined(CONFIG_USER_ONLY)
+void postcopy_incoming_qemu_pages_unmapped(ram_addr_t addr, ram_addr_t size);
+#endif
+
 #endif
diff --git a/qemu-common.h b/qemu-common.h
index 057c810..598ad4c 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -17,6 +17,7 @@ typedef struct DeviceState DeviceState;
 
 struct Monitor;
 typedef struct Monitor Monitor;
+typedef struct UMem UMem;
 
 /* we put basic includes here to avoid repeating them in device drivers */
 #include <stdlib.h>
diff --git a/qemu-options.hx b/qemu-options.hx
index a9af31e..b71d1f9 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2629,7 +2629,10 @@ DEF("postcopy-flags", HAS_ARG, QEMU_OPTION_postcopy_flags,
     "-postcopy-flags unsigned-int(flags)\n"
     "	                flags for postcopy incoming migration\n"
     "                   when -incoming and -postcopy are specified.\n"
-    "                   This is for benchmark/debug purpose (default: 0)\n",
+    "                   This is for benchmark/debug purpose (default: 0)\n"
+    "                   Currently supported flags are\n"
+    "                   1: enable fault request from umemd to qemu\n"
+    "                      (default: disabled)\n",
     QEMU_ARCH_ALL)
 STEXI
 @item -postcopy-flags int
diff --git a/savevm.c b/savevm.c
index bd4b5bf..74b15e7 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1938,6 +1938,7 @@ int qemu_loadvm_state(QEMUFile *f)
     uint8_t section_type;
     unsigned int v;
     int ret;
+    QEMUFile *orig_f = NULL;
 
     if (qemu_savevm_state_blocked(NULL)) {
         return -EINVAL;
@@ -1964,6 +1965,16 @@ int qemu_loadvm_state(QEMUFile *f)
         switch (section_type) {
         case QEMU_VM_SECTION_START:
         case QEMU_VM_SECTION_FULL:
+            if (section_type == QEMU_VM_SECTION_START) {
+                assert(orig_f == NULL);
+            } else {
+                if (incoming_postcopy && orig_f == NULL) {
+                    fprintf(stderr, "qemu: warning no postcopy section\n");
+                    ret = -EINVAL;
+                    goto out;
+                }
+            }
+
             /* Read section start */
             section_id = qemu_get_be32(f);
             len = qemu_get_byte(f);
@@ -2005,6 +2016,7 @@ int qemu_loadvm_state(QEMUFile *f)
             break;
         case QEMU_VM_SECTION_PART:
         case QEMU_VM_SECTION_END:
+            assert(orig_f == NULL);
             section_id = qemu_get_be32(f);
 
             QLIST_FOREACH(le, &loadvm_handlers, entry) {
@@ -2025,6 +2037,31 @@ int qemu_loadvm_state(QEMUFile *f)
                 goto out;
             }
             break;
+        case QEMU_VM_POSTCOPY:
+            if (incoming_postcopy) {
+                /* VMStateDescription:pre/post_load and
+                 * cpu_synchronize_all_post_init() may fault on guest RAM.
+                 * (MSR_KVM_WALL_CLOCK, MSR_KVM_SYSTEM_TIME)
+                 * postcopy daemon needs to be forked before the fault.
+                 */
+                uint32_t size = qemu_get_be32(f);
+                uint8_t *buf = g_malloc(size);
+                int read_size = qemu_get_buffer(f, buf, size);
+                if (size != read_size) {
+                    fprintf(stderr,
+                            "qemu: short read of postcopy section: %u != %d\n",
+                            size, read_size);
+                    g_free(buf);
+                    ret = -EINVAL;
+                    goto out;
+                }
+                postcopy_incoming_fork_umemd(f);
+
+                orig_f = f;
+                f = qemu_fopen_buf_read(buf, size);
+                break;
+            }
+            /* fallthrough */
         default:
             fprintf(stderr, "Unknown savevm section type %d\n", section_type);
             ret = -EINVAL;
@@ -2032,11 +2069,17 @@ int qemu_loadvm_state(QEMUFile *f)
         }
     }
 
+    fprintf(stderr, "%s:%d QEMU_VM_EOF\n", __func__, __LINE__);
     cpu_synchronize_all_post_init();
 
     ret = 0;
 
 out:
+    if (orig_f != NULL) {
+        assert(incoming_postcopy);
+        qemu_fclose(f);
+        f = orig_f;
+    }
     QLIST_FOREACH_SAFE(le, &loadvm_handlers, entry, new_le) {
         QLIST_REMOVE(le, entry);
         g_free(le);
diff --git a/vl.c b/vl.c
index 1674abb..86e2287 100644
--- a/vl.c
+++ b/vl.c
@@ -3444,8 +3444,10 @@ int main(int argc, char **argv, char **envp)
     default_drive(default_sdcard, snapshot, machine->use_scsi,
                   IF_SD, 0, SD_OPTS);
 
-    register_savevm_live(NULL, "ram", 0, RAM_SAVE_VERSION_ID, NULL,
-                         ram_save_live, NULL, ram_load, NULL);
+    register_savevm_live(NULL, "ram", 0, RAM_SAVE_VERSION_ID,
+                         ram_save_set_params, ram_save_live, NULL,
+                         incoming_postcopy ?
+                         postcopy_incoming_ram_load : ram_load, NULL);
 
     if (nb_numa_nodes > 0) {
         int i;
@@ -3664,6 +3666,8 @@ int main(int argc, char **argv, char **envp)
     bdrv_close_all();
     pause_all_vcpus();
     net_cleanup();
+    postcopy_incoming_qemu_cleanup();
+
     res_free();
 
     return 0;
-- 
1.7.1.1
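
Editorial sketch (not part of the patch): the incoming daemon converts
page offsets between host pages and target pages using shifts derived
with ffs(), as in the umemd setup and in
postcopy_incoming_umem_send_page_req(). The standalone example below
illustrates that arithmetic under the assumption that both page sizes
are powers of two. The helper names ratio_shift() and
host_to_target_pgoff() are hypothetical and do not exist in QEMU.

```c
#include <assert.h>
#include <stdint.h>
#include <strings.h>   /* ffs() */

/* Hypothetical helper: log2 of big/small, where the ratio is a power of two. */
static int ratio_shift(unsigned long big, unsigned long small)
{
    unsigned long ratio = big / small;

    /* A zero ratio would make ffs() return 0 and yield a shift of -1. */
    assert(ratio != 0 && (ratio & (ratio - 1)) == 0);
    return ffs((int)ratio) - 1;
}

/*
 * Hypothetical helper: map a host page offset to the offset of the
 * (first) target page that covers it.
 */
static uint64_t host_to_target_pgoff(uint64_t host_pgoff,
                                     unsigned long host_page_size,
                                     unsigned long target_page_size)
{
    if (target_page_size >= host_page_size) {
        /* Several host pages make up one target page: divide. */
        return host_pgoff >> ratio_shift(target_page_size, host_page_size);
    }
    /* One host page holds several target pages: multiply. */
    return host_pgoff << ratio_shift(host_page_size, target_page_size);
}
```

With 4 KiB host pages and 16 KiB target pages, host page 8 falls in
target page 2. With the sizes swapped, host page 3 starts at target
page 12 and covers target pages 12 through 15. Always deriving the
shift from the larger-over-smaller ratio keeps the ratio nonzero, so
the shift is never negative.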

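Editorial sketch (not part of the patch): the request handlers above
parse messages by peeking at the buffered stream, returning -EAGAIN if
a message is incomplete, and consuming the bytes only once the whole
message is available (qemu_peek_buffer() followed by
qemu_file_skip()). The example below shows that pattern on a plain
byte buffer. struct rbuf, rbuf_peek(), rbuf_skip(), parse_one() and
the message layout (1-byte command, 1-byte count, payload) are all
hypothetical stand-ins, not the QEMUFile API or the umem wire format.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a buffered stream. */
struct rbuf {
    const uint8_t *data;
    size_t len;   /* bytes currently buffered */
    size_t pos;   /* bytes already consumed */
};

/* Copy up to size bytes at pos + offset without consuming them. */
static size_t rbuf_peek(const struct rbuf *b, void *dst,
                        size_t size, size_t offset)
{
    size_t avail;

    assert(b->pos <= b->len);
    avail = b->len - b->pos;
    if (offset >= avail) {
        return 0;
    }
    if (size > avail - offset) {
        size = avail - offset;
    }
    memcpy(dst, b->data + b->pos + offset, size);
    return size;
}

static void rbuf_skip(struct rbuf *b, size_t n)
{
    b->pos += n;
}

/*
 * Parse one message: command byte, count byte n, then n payload bytes.
 * payload must have room for 255 bytes. Returns 0 and consumes the
 * message when it is complete; returns -EAGAIN and consumes nothing
 * when only part of it has arrived.
 */
static int parse_one(struct rbuf *b, uint8_t *cmd, uint8_t *n,
                     uint8_t *payload)
{
    size_t off = 0;

    if (rbuf_peek(b, cmd, 1, off) != 1) {
        return -EAGAIN;
    }
    off += 1;
    if (rbuf_peek(b, n, 1, off) != 1) {
        return -EAGAIN;
    }
    off += 1;
    if (rbuf_peek(b, payload, *n, off) != *n) {
        return -EAGAIN;
    }
    off += *n;

    rbuf_skip(b, off);   /* commit only after the whole message was seen */
    return 0;
}
```

Because nothing is consumed on -EAGAIN, the caller can simply retry the
same parse when the file descriptor becomes readable again, which is
how the do/while loops around the handlers in the patch are used.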


* [PATCH v2 37/41] postcopy: implement outgoing part of postcopy live migration
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (35 preceding siblings ...)
  2012-06-04  9:57 ` [PATCH v2 36/41] postcopy: implement incoming part of postcopy live migration Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-14 22:12   ` Juan Quintela
  2012-06-04  9:57 ` [PATCH v2 38/41] postcopy/outgoing: add forward, backward option to specify the size of prefault Isaku Yamahata
                   ` (5 subsequent siblings)
  42 siblings, 1 reply; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: owasserm, quintela, avi, pbonzini, aliguori, stefanha, dlaor,
	mdroth, yoshikawa.takuya, benoit.hudzia, aarcange, t.hirofuchi,
	satoshi.itoh

This patch implements the outgoing part of postcopy live migration.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
Changes v1 -> v2:
- fix parameter to qemu_fdopen()
- handle QEMU_UMEM_REQ_EOC properly:
  in PO_STATE_ALL_PAGES_SENT, a QEMU_UMEM_REQ_EOC request was ignored;
  it is now handled.
- flush on-demand pages unconditionally
- improve postcopy_outgoing_ram_save_live() and postcopy_outgoing_begin()
- use qemu_fopen_fd
- use the memory API instead of the obsolete API
- fix segv in postcopy_outgoing_check_all_ram_sent()
- catch up with the qapi change
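
For reviewers, here is a rough standalone sketch of the request framing that postcopy_outgoing_recv_req() parses: one command byte, then (for commands that name a RAM block) a length byte plus the id string, then a big-endian 32-bit count followed by that many big-endian 64-bit page offsets. It is not the code in the patch. The command values, struct req, parse_req() and free_req() are made-up names for illustration, and it reads from a plain byte buffer instead of a QEMUFile. Like the patch, it returns -EAGAIN and consumes nothing when the request is still incomplete.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical command codes for illustration only; the real
 * QEMU_UMEM_REQ_* values are defined elsewhere in the series. */
enum {
    REQ_INIT = 0,
    REQ_ON_DEMAND = 1,       /* carries idstr + page offsets */
    REQ_ON_DEMAND_CONT = 2,  /* page offsets only, reuses the last block */
    REQ_EOC = 3,
};

struct req {
    uint8_t cmd;
    char *idstr;        /* NUL-terminated RAM block name, or NULL */
    uint32_t nr;        /* number of entries in pgoffs */
    uint64_t *pgoffs;   /* page offsets in host byte order, or NULL */
};

/* Load an n-byte big-endian unsigned integer (n <= 8). */
static uint64_t load_be(const uint8_t *p, size_t n)
{
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/*
 * Parse one request from buf[0..len).
 * Returns the number of bytes consumed, -EAGAIN if the request is not
 * complete yet (nothing is allocated), -EINVAL for an unknown command,
 * or -ENOMEM.
 */
long parse_req(const uint8_t *buf, size_t len, struct req *req)
{
    size_t off = 0, idlen = 0;
    const uint8_t *id = NULL;
    uint32_t nr;

    memset(req, 0, sizeof(*req));
    if (len < 1) {
        return -EAGAIN;
    }
    req->cmd = buf[off++];

    switch (req->cmd) {
    case REQ_INIT:
    case REQ_EOC:
        return (long)off;               /* no payload */
    case REQ_ON_DEMAND:
        if (len - off < 1) {
            return -EAGAIN;
        }
        idlen = buf[off++];
        if (len - off < idlen) {
            return -EAGAIN;
        }
        id = buf + off;
        off += idlen;
        break;
    case REQ_ON_DEMAND_CONT:
        break;
    default:
        return -EINVAL;
    }

    if (len - off < 4) {
        return -EAGAIN;
    }
    nr = (uint32_t)load_be(buf + off, 4);
    off += 4;
    if ((len - off) / 8 < nr) {
        return -EAGAIN;
    }

    /* The whole request is present; only now allocate. */
    req->pgoffs = calloc(nr ? nr : 1, sizeof(uint64_t));
    if (req->pgoffs == NULL) {
        return -ENOMEM;
    }
    if (id != NULL) {
        req->idstr = malloc(idlen + 1);
        if (req->idstr == NULL) {
            free(req->pgoffs);
            req->pgoffs = NULL;
            return -ENOMEM;
        }
        memcpy(req->idstr, id, idlen);
        req->idstr[idlen] = '\0';
    }
    for (uint32_t i = 0; i < nr; i++, off += 8) {
        req->pgoffs[i] = load_be(buf + off, 8);
    }
    req->nr = nr;
    return (long)off;
}

void free_req(struct req *req)
{
    free(req->idstr);
    free(req->pgoffs);
}
```

Validating the complete length before allocating means an incomplete request leaves nothing to free on the -EAGAIN path.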
---
 arch_init.c               |   19 ++-
 migration-exec.c          |    4 +
 migration-fd.c            |   17 ++
 migration-postcopy-stub.c |   22 +++
 migration-postcopy.c      |  450 +++++++++++++++++++++++++++++++++++++++++++++
 migration-tcp.c           |   25 ++-
 migration-unix.c          |   26 ++-
 migration.c               |   32 +++-
 migration.h               |   12 ++
 savevm.c                  |   22 ++-
 sysemu.h                  |    2 +-
 11 files changed, 614 insertions(+), 17 deletions(-)
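
As a reading aid for the POState handling added to migration-postcopy.c, the transitions can be sketched as one pure function. This is only my summary of the diff, not code from the patch: the event enum and po_next() are hypothetical names, and combinations that the patch guards with assert() or never reaches are reported as -1.

```c
#include <assert.h>

enum po_state {
    PO_STATE_ERROR_RECEIVE,
    PO_STATE_ACTIVE,
    PO_STATE_EOC_RECEIVED,
    PO_STATE_ALL_PAGES_SENT,
    PO_STATE_COMPLETED,
};

/* Hypothetical event names; the patch has no such enum. */
enum po_event {
    PO_EV_EOC,          /* QEMU_UMEM_REQ_EOC arrived from the destination */
    PO_EV_ALL_SENT,     /* no pages are left to send */
    PO_EV_RECV_ERROR,   /* receiving or handling a request failed */
    PO_EV_BACKGROUND,   /* background save callback invoked */
};

/*
 * Returns the next state, or -1 for a combination that the patch
 * asserts against or does not handle.
 */
int po_next(enum po_state s, enum po_event ev)
{
    switch (s) {
    case PO_STATE_ACTIVE:
        switch (ev) {
        case PO_EV_EOC:        return PO_STATE_EOC_RECEIVED;
        case PO_EV_ALL_SENT:   return PO_STATE_ALL_PAGES_SENT;
        case PO_EV_RECV_ERROR: return PO_STATE_ERROR_RECEIVE;
        case PO_EV_BACKGROUND: return PO_STATE_ACTIVE;
        }
        break;
    case PO_STATE_ALL_PAGES_SENT:
        /* Both an EOC request and a receive error finish the migration. */
        if (ev == PO_EV_EOC || ev == PO_EV_RECV_ERROR) {
            return PO_STATE_COMPLETED;
        }
        break;
    case PO_STATE_EOC_RECEIVED:
        /* The background callback sends the final EOS and completes. */
        if (ev == PO_EV_BACKGROUND) {
            return PO_STATE_COMPLETED;
        }
        break;
    case PO_STATE_ERROR_RECEIVE:
        /* The background callback frees the state and reports failure. */
        if (ev == PO_EV_BACKGROUND) {
            return PO_STATE_ERROR_RECEIVE;
        }
        break;
    case PO_STATE_COMPLETED:
        break;
    }
    return -1;
}
```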

diff --git a/arch_init.c b/arch_init.c
index 22d9691..3599e5c 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -154,6 +154,13 @@ static int is_dup_page(uint8_t *page)
     return 1;
 }
 
+static bool outgoing_postcopy = false;
+
+void ram_save_set_params(const MigrationParams *params, void *opaque)
+{
+    outgoing_postcopy = params->postcopy;
+}
+
 static RAMBlock *last_block_sent = NULL;
 static uint64_t bytes_transferred;
 
@@ -343,6 +350,15 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
     uint64_t expected_time = 0;
     int ret;
 
+    if (stage == 1) {
+        bytes_transferred = 0;
+        last_block_sent = NULL;
+        ram_save_set_last_block(NULL, 0);
+    }
+    if (outgoing_postcopy) {
+        return postcopy_outgoing_ram_save_live(f, stage, opaque);
+    }
+
     if (stage < 0) {
         memory_global_dirty_log_stop();
         return 0;
@@ -351,9 +367,6 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
     memory_global_sync_dirty_bitmap(get_system_memory());
 
     if (stage == 1) {
-        bytes_transferred = 0;
-        last_block_sent = NULL;
-        ram_save_set_last_block(NULL, 0);
         sort_ram_list();
 
         /* Make sure all dirty bits are set */
diff --git a/migration-exec.c b/migration-exec.c
index 7f08b3b..a90da5c 100644
--- a/migration-exec.c
+++ b/migration-exec.c
@@ -64,6 +64,10 @@ int exec_start_outgoing_migration(MigrationState *s, const char *command)
 {
     FILE *f;
 
+    if (s->params.postcopy) {
+        return -ENOSYS;
+    }
+
     f = popen(command, "w");
     if (f == NULL) {
         DPRINTF("Unable to popen exec target\n");
diff --git a/migration-fd.c b/migration-fd.c
index 42b8162..83b5f18 100644
--- a/migration-fd.c
+++ b/migration-fd.c
@@ -90,6 +90,23 @@ int fd_start_outgoing_migration(MigrationState *s, const char *fdname)
     s->write = fd_write;
     s->close = fd_close;
 
+    if (s->params.postcopy) {
+        int flags = fcntl(s->fd, F_GETFL);
+        if ((flags & O_ACCMODE) != O_RDWR) {
+            goto err_after_open;
+        }
+
+        s->fd_read = dup(s->fd);
+        if (s->fd_read == -1) {
+            goto err_after_open;
+        }
+        s->file_read = qemu_fopen_fd(s->fd_read);
+        if (s->file_read == NULL) {
+            close(s->fd_read);
+            goto err_after_open;
+        }
+    }
+
     migrate_fd_connect(s);
     return 0;
 
diff --git a/migration-postcopy-stub.c b/migration-postcopy-stub.c
index f9ebcbe..9c64827 100644
--- a/migration-postcopy-stub.c
+++ b/migration-postcopy-stub.c
@@ -24,6 +24,28 @@
 #include "sysemu.h"
 #include "migration.h"
 
+int postcopy_outgoing_create_read_socket(MigrationState *s)
+{
+    return -ENOSYS;
+}
+
+int postcopy_outgoing_ram_save_live(QEMUFile *f,
+                                    int stage, void *opaque)
+{
+    return -ENOSYS;
+}
+
+void *postcopy_outgoing_begin(MigrationState *ms)
+{
+    return NULL;
+}
+
+int postcopy_outgoing_ram_save_background(QEMUFile *f,
+                                          void *postcopy)
+{
+    return -ENOSYS;
+}
+
 int postcopy_incoming_init(const char *incoming, bool incoming_postcopy)
 {
     return -ENOSYS;
diff --git a/migration-postcopy.c b/migration-postcopy.c
index 5913e05..eb37094 100644
--- a/migration-postcopy.c
+++ b/migration-postcopy.c
@@ -177,6 +177,456 @@ static void postcopy_incoming_send_req(QEMUFile *f,
     }
 }
 
+static int postcopy_outgoing_recv_req_idstr(QEMUFile *f,
+                                            struct qemu_umem_req *req,
+                                            size_t *offset)
+{
+    int ret;
+
+    req->len = qemu_peek_byte(f, *offset);
+    *offset += 1;
+    if (req->len == 0) {
+        return -EAGAIN;
+    }
+    req->idstr = g_malloc((int)req->len + 1);
+    ret = qemu_peek_buffer(f, (uint8_t*)req->idstr, req->len, *offset);
+    *offset += ret;
+    if (ret != req->len) {
+        g_free(req->idstr);
+        req->idstr = NULL;
+        return -EAGAIN;
+    }
+    req->idstr[req->len] = 0;
+    return 0;
+}
+
+static int postcopy_outgoing_recv_req_pgoffs(QEMUFile *f,
+                                             struct qemu_umem_req *req,
+                                             size_t *offset)
+{
+    int ret;
+    uint32_t be32;
+    uint32_t i;
+
+    ret = qemu_peek_buffer(f, (uint8_t*)&be32, sizeof(be32), *offset);
+    *offset += sizeof(be32);
+    if (ret != sizeof(be32)) {
+        return -EAGAIN;
+    }
+
+    req->nr = be32_to_cpu(be32);
+    req->pgoffs = g_new(uint64_t, req->nr);
+    for (i = 0; i < req->nr; i++) {
+        uint64_t be64;
+        ret = qemu_peek_buffer(f, (uint8_t*)&be64, sizeof(be64), *offset);
+        *offset += sizeof(be64);
+        if (ret != sizeof(be64)) {
+            g_free(req->pgoffs);
+            req->pgoffs = NULL;
+            return -EAGAIN;
+        }
+        req->pgoffs[i] = be64_to_cpu(be64);
+    }
+    return 0;
+}
+
+static int postcopy_outgoing_recv_req(QEMUFile *f, struct qemu_umem_req *req)
+{
+    int size;
+    int ret;
+    size_t offset = 0;
+
+    size = qemu_peek_buffer(f, (uint8_t*)&req->cmd, 1, offset);
+    if (size <= 0) {
+        return -EAGAIN;
+    }
+    offset += 1;
+
+    switch (req->cmd) {
+    case QEMU_UMEM_REQ_INIT:
+    case QEMU_UMEM_REQ_EOC:
+        /* nothing */
+        break;
+    case QEMU_UMEM_REQ_ON_DEMAND:
+    case QEMU_UMEM_REQ_BACKGROUND:
+    case QEMU_UMEM_REQ_REMOVE:
+        ret = postcopy_outgoing_recv_req_idstr(f, req, &offset);
+        if (ret < 0) {
+            return ret;
+        }
+        ret = postcopy_outgoing_recv_req_pgoffs(f, req, &offset);
+        if (ret < 0) {
+            return ret;
+        }
+        break;
+    case QEMU_UMEM_REQ_ON_DEMAND_CONT:
+    case QEMU_UMEM_REQ_BACKGROUND_CONT:
+        ret = postcopy_outgoing_recv_req_pgoffs(f, req, &offset);
+        if (ret < 0) {
+            return ret;
+        }
+        break;
+    default:
+        abort();
+        break;
+    }
+    qemu_file_skip(f, offset);
+    DPRINTF("cmd %d\n", req->cmd);
+    return 0;
+}
+
+static void postcopy_outgoing_free_req(struct qemu_umem_req *req)
+{
+    g_free(req->idstr);
+    g_free(req->pgoffs);
+}
+
+/***************************************************************************
+ * outgoing part
+ */
+
+#define QEMU_SAVE_LIVE_STAGE_START      0x01    /* = QEMU_VM_SECTION_START */
+#define QEMU_SAVE_LIVE_STAGE_PART       0x02    /* = QEMU_VM_SECTION_PART */
+#define QEMU_SAVE_LIVE_STAGE_END        0x03    /* = QEMU_VM_SECTION_END */
+
+enum POState {
+    PO_STATE_ERROR_RECEIVE,
+    PO_STATE_ACTIVE,
+    PO_STATE_EOC_RECEIVED,
+    PO_STATE_ALL_PAGES_SENT,
+    PO_STATE_COMPLETED,
+};
+typedef enum POState POState;
+
+struct PostcopyOutgoingState {
+    POState state;
+    QEMUFile *mig_read;
+    int fd_read;
+    RAMBlock *last_block_read;
+
+    QEMUFile *mig_buffered_write;
+    MigrationState *ms;
+
+    /* For nobg mode. Check if all pages are sent */
+    RAMBlock *block;
+    ram_addr_t offset;
+};
+typedef struct PostcopyOutgoingState PostcopyOutgoingState;
+
+int postcopy_outgoing_create_read_socket(MigrationState *s)
+{
+    if (!s->params.postcopy) {
+        return 0;
+    }
+
+    s->fd_read = dup(s->fd);
+    if (s->fd_read == -1) {
+        int ret = -errno;
+        perror("dup");
+        return ret;
+    }
+    s->file_read = qemu_fopen_socket(s->fd_read);
+    if (s->file_read == NULL) {
+        return -EINVAL;
+    }
+    return 0;
+}
+
+int postcopy_outgoing_ram_save_live(QEMUFile *f, int stage, void *opaque)
+{
+    int ret = 0;
+    DPRINTF("stage %d\n", stage);
+    switch (stage) {
+    case QEMU_SAVE_LIVE_STAGE_START:
+        sort_ram_list();
+        ram_save_live_mem_size(f);
+        break;
+    case QEMU_SAVE_LIVE_STAGE_PART:
+        ret = 1;
+        break;
+    case QEMU_SAVE_LIVE_STAGE_END:
+        break;
+    default:
+        abort();
+    }
+    qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
+    return ret;
+}
+
+/*
+ * return value
+ *   0: continue postcopy mode
+ * > 0: completed postcopy mode.
+ * < 0: error
+ */
+static int postcopy_outgoing_handle_req(PostcopyOutgoingState *s,
+                                        const struct qemu_umem_req *req,
+                                        bool *written)
+{
+    int i;
+    RAMBlock *block;
+
+    DPRINTF("cmd %d state %d\n", req->cmd, s->state);
+    switch(req->cmd) {
+    case QEMU_UMEM_REQ_INIT:
+        /* nothing */
+        break;
+    case QEMU_UMEM_REQ_EOC:
+        /* tell to finish migration. */
+        if (s->state == PO_STATE_ALL_PAGES_SENT) {
+            s->state = PO_STATE_COMPLETED;
+            DPRINTF("-> PO_STATE_COMPLETED\n");
+        } else {
+            s->state = PO_STATE_EOC_RECEIVED;
+            DPRINTF("-> PO_STATE_EOC_RECEIVED\n");
+        }
+        return 1;
+    case QEMU_UMEM_REQ_ON_DEMAND:
+    case QEMU_UMEM_REQ_BACKGROUND:
+        DPRINTF("idstr: %s\n", req->idstr);
+        block = ram_find_block(req->idstr, strlen(req->idstr));
+        if (block == NULL) {
+            return -EINVAL;
+        }
+        s->last_block_read = block;
+        /* fall through */
+    case QEMU_UMEM_REQ_ON_DEMAND_CONT:
+    case QEMU_UMEM_REQ_BACKGROUND_CONT:
+        DPRINTF("nr %d\n", req->nr);
+        if (s->mig_buffered_write == NULL) {
+            assert(s->state == PO_STATE_ALL_PAGES_SENT);
+            break;
+        }
+        for (i = 0; i < req->nr; i++) {
+            DPRINTF("offs[%d] 0x%"PRIx64"\n", i, req->pgoffs[i]);
+            int ret = ram_save_page(s->mig_buffered_write, s->last_block_read,
+                                    req->pgoffs[i] << TARGET_PAGE_BITS);
+            if (ret > 0) {
+                *written = true;
+            }
+        }
+        break;
+    case QEMU_UMEM_REQ_REMOVE:
+        block = ram_find_block(req->idstr, strlen(req->idstr));
+        if (block == NULL) {
+            return -EINVAL;
+        }
+        for (i = 0; i < req->nr; i++) {
+            ram_addr_t offset = req->pgoffs[i] << TARGET_PAGE_BITS;
+            memory_region_reset_dirty(block->mr, offset, TARGET_PAGE_SIZE,
+                                      MIGRATION_DIRTY_FLAG);
+        }
+        break;
+    default:
+        return -EINVAL;
+    }
+    return 0;
+}
+
+static void postcopy_outgoing_close_mig_read(PostcopyOutgoingState *s)
+{
+    if (s->mig_read != NULL) {
+        qemu_set_fd_handler(s->fd_read, NULL, NULL, NULL);
+        qemu_fclose(s->mig_read);
+        s->mig_read = NULL;
+        fd_close(&s->fd_read);
+
+        s->ms->file_read = NULL;
+        s->ms->fd_read = -1;
+    }
+}
+
+static void postcopy_outgoing_completed(PostcopyOutgoingState *s)
+{
+    postcopy_outgoing_close_mig_read(s);
+    s->ms->postcopy = NULL;
+    g_free(s);
+}
+
+static void postcopy_outgoing_recv_handler(void *opaque)
+{
+    PostcopyOutgoingState *s = opaque;
+    bool written = false;
+    int ret = 0;
+
+    assert(s->state == PO_STATE_ACTIVE ||
+           s->state == PO_STATE_ALL_PAGES_SENT);
+
+    do {
+        struct qemu_umem_req req = {.idstr = NULL,
+                                    .pgoffs = NULL};
+
+        ret = postcopy_outgoing_recv_req(s->mig_read, &req);
+        if (ret < 0) {
+            if (ret == -EAGAIN) {
+                ret = 0;
+            }
+            break;
+        }
+
+        /* Even when s->state == PO_STATE_ALL_PAGES_SENT,
+           some request can be received like QEMU_UMEM_REQ_EOC */
+        ret = postcopy_outgoing_handle_req(s, &req, &written);
+        postcopy_outgoing_free_req(&req);
+    } while (ret == 0);
+
+    /*
+     * Flush the buffered file.
+     * Although mig_buffered_write is a rate-limited buffered file, the pages
+     * written here were requested on demand by the destination, so push
+     * them out immediately, ignoring the rate limit.
+     */
+    if (written) {
+        qemu_buffered_file_drain(s->mig_buffered_write);
+    }
+
+    if (ret < 0) {
+        switch (s->state) {
+        case PO_STATE_ACTIVE:
+            s->state = PO_STATE_ERROR_RECEIVE;
+            DPRINTF("-> PO_STATE_ERROR_RECEIVE\n");
+            break;
+        case PO_STATE_ALL_PAGES_SENT:
+            s->state = PO_STATE_COMPLETED;
+            DPRINTF("-> PO_STATE_COMPLETED\n");
+            break;
+        default:
+            abort();
+        }
+    }
+    if (s->state == PO_STATE_ERROR_RECEIVE || s->state == PO_STATE_COMPLETED) {
+        postcopy_outgoing_close_mig_read(s);
+    }
+    if (s->state == PO_STATE_COMPLETED) {
+        DPRINTF("PO_STATE_COMPLETED\n");
+        MigrationState *ms = s->ms;
+        postcopy_outgoing_completed(s);
+        migrate_fd_completed(ms);
+    }
+}
+
+void *postcopy_outgoing_begin(MigrationState *ms)
+{
+    PostcopyOutgoingState *s = g_new(PostcopyOutgoingState, 1);
+    DPRINTF("outgoing begin\n");
+    qemu_fflush(ms->file);
+
+    s->ms = ms;
+    s->state = PO_STATE_ACTIVE;
+    s->fd_read = ms->fd_read;
+    s->mig_read = ms->file_read;
+    s->mig_buffered_write = ms->file;
+    s->block = NULL;
+    s->offset = 0;
+
+    /* Make sure all dirty bits are set */
+    cpu_physical_memory_set_dirty_tracking(0);
+    ram_save_memory_set_dirty();
+
+    qemu_set_fd_handler(s->fd_read,
+                        &postcopy_outgoing_recv_handler, NULL, s);
+    return s;
+}
+
+static void postcopy_outgoing_ram_all_sent(QEMUFile *f,
+                                           PostcopyOutgoingState *s)
+{
+    assert(s->state == PO_STATE_ACTIVE);
+
+    s->state = PO_STATE_ALL_PAGES_SENT;
+    /* tell incoming side that all pages are sent */
+    qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
+    qemu_buffered_file_drain(f);
+    DPRINTF("sent RAM_SAVE_FLAG_EOS\n");
+    migrate_fd_cleanup(s->ms);
+
+    /* Later migrate_fd_completed() will be called, which calls
+     * migrate_fd_cleanup() again, so a dummy file is created
+     * to keep the qemu monitor working.
+     */
+    s->ms->file = qemu_fopen_ops(NULL, NULL, NULL, NULL, NULL,
+                                 NULL, NULL);
+    s->mig_buffered_write = NULL;
+}
+
+static int postcopy_outgoing_check_all_ram_sent(PostcopyOutgoingState *s,
+                                                RAMBlock *block,
+                                                ram_addr_t offset)
+{
+    if (block == NULL) {
+        block = QLIST_FIRST(&ram_list.blocks);
+        offset = 0;
+    }
+
+    for (; block != NULL; block = QLIST_NEXT(block, next), offset = 0) {
+        for (; offset < block->length; offset += TARGET_PAGE_SIZE) {
+            if (memory_region_get_dirty(block->mr, offset, TARGET_PAGE_SIZE,
+                                        DIRTY_MEMORY_MIGRATION)) {
+                s->block = block;
+                s->offset = offset;
+                return 0;
+            }
+        }
+    }
+
+    return 1;
+}
+
+int postcopy_outgoing_ram_save_background(QEMUFile *f, void *postcopy)
+{
+    PostcopyOutgoingState *s = postcopy;
+
+    assert(s->state == PO_STATE_ACTIVE ||
+           s->state == PO_STATE_EOC_RECEIVED ||
+           s->state == PO_STATE_ERROR_RECEIVE);
+
+    switch (s->state) {
+    case PO_STATE_ACTIVE:
+        /* nothing. processed below */
+        break;
+    case PO_STATE_EOC_RECEIVED:
+        qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
+        s->state = PO_STATE_COMPLETED;
+        postcopy_outgoing_completed(s);
+        DPRINTF("PO_STATE_COMPLETED\n");
+        return 1;
+    case PO_STATE_ERROR_RECEIVE:
+        postcopy_outgoing_completed(s);
+        DPRINTF("PO_STATE_ERROR_RECEIVE\n");
+        return -1;
+    default:
+        abort();
+    }
+
+    if (s->ms->params.nobg) {
+        /* See if all pages are sent. */
+        if (postcopy_outgoing_check_all_ram_sent(s,
+                                                 s->block, s->offset) == 0) {
+            return 0;
+        }
+        /* ram_list can be reordered. (it doesn't seem so during migration,
+           though) So the whole list needs to be checked again */
+        if (postcopy_outgoing_check_all_ram_sent(s, NULL, 0) == 0) {
+            return 0;
+        }
+
+        postcopy_outgoing_ram_all_sent(f, s);
+        return 0;
+    }
+
+    DPRINTF("outgoing background state: %d\n", s->state);
+
+    while (qemu_file_rate_limit(f) == 0) {
+        if (ram_save_block(f) == 0) { /* no more blocks */
+            assert(s->state == PO_STATE_ACTIVE);
+            postcopy_outgoing_ram_all_sent(f, s);
+            return 0;
+        }
+    }
+
+    return 0;
+}
+
 /***************************************************************************
  * incoming part
  */
diff --git a/migration-tcp.c b/migration-tcp.c
index 440804d..98be560 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -65,23 +65,32 @@ static void tcp_wait_for_connect(void *opaque)
     } while (ret == -1 && (socket_error()) == EINTR);
 
     if (ret < 0) {
-        migrate_fd_error(s);
-        return;
+        goto error_out;
     }
 
     qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
 
-    if (val == 0)
+    if (val == 0) {
+        ret = postcopy_outgoing_create_read_socket(s);
+        if (ret < 0) {
+            goto error_out;
+        }
         migrate_fd_connect(s);
-    else {
+    } else {
         DPRINTF("error connecting %d\n", val);
-        migrate_fd_error(s);
+        goto error_out;
     }
+    return;
+
+error_out:
+    migrate_fd_error(s);
 }
 
 int tcp_start_outgoing_migration(MigrationState *s, const char *host_port,
                                  Error **errp)
 {
+    int ret;
+
     s->get_error = socket_errno;
     s->write = socket_write;
     s->close = tcp_close;
@@ -105,6 +114,12 @@ int tcp_start_outgoing_migration(MigrationState *s, const char *host_port,
         return -1;
     }
 
+    ret = postcopy_outgoing_create_read_socket(s);
+    if (ret < 0) {
+        migrate_fd_error(s);
+        return ret;
+    }
+
     return 0;
 }
 
diff --git a/migration-unix.c b/migration-unix.c
index 169de88..f3ebaff 100644
--- a/migration-unix.c
+++ b/migration-unix.c
@@ -71,12 +71,20 @@ static void unix_wait_for_connect(void *opaque)
 
     qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
 
-    if (val == 0)
+    if (val == 0) {
+        ret = postcopy_outgoing_create_read_socket(s);
+        if (ret < 0) {
+            goto error_out;
+        }
         migrate_fd_connect(s);
-    else {
+    } else {
         DPRINTF("error connecting %d\n", val);
-        migrate_fd_error(s);
+        goto error_out;
     }
+    return;
+
+error_out:
+    migrate_fd_error(s);
 }
 
 int unix_start_outgoing_migration(MigrationState *s, const char *path)
@@ -111,11 +119,19 @@ int unix_start_outgoing_migration(MigrationState *s, const char *path)
 
     if (ret < 0) {
         DPRINTF("connect failed\n");
-        migrate_fd_error(s);
-        return ret;
+        goto error_out;
+    }
+
+    ret = postcopy_outgoing_create_read_socket(s);
+    if (ret < 0) {
+        goto error_out;
     }
     migrate_fd_connect(s);
     return 0;
+
+error_out:
+    migrate_fd_error(s);
+    return ret;
 }
 
 static void unix_accept_incoming_migration(void *opaque)
diff --git a/migration.c b/migration.c
index 462620f..e8be0d1 100644
--- a/migration.c
+++ b/migration.c
@@ -41,6 +41,11 @@ enum {
     MIG_STATE_COMPLETED,
 };
 
+enum {
+    MIG_SUBSTATE_PRECOPY,
+    MIG_SUBSTATE_POSTCOPY,
+};
+
 #define MAX_THROTTLE  (32 << 20)      /* Migration speed throttling */
 
 static NotifierList migration_state_notifiers =
@@ -248,6 +253,17 @@ static void migrate_fd_put_ready(void *opaque)
         return;
     }
 
+    if (s->substate == MIG_SUBSTATE_POSTCOPY) {
+        /* PRINTF("postcopy background\n"); */
+        ret = postcopy_outgoing_ram_save_background(s->file, s->postcopy);
+        if (ret > 0) {
+            migrate_fd_completed(s);
+        } else if (ret < 0) {
+            migrate_fd_error(s);
+        }
+        return;
+    }
+
     DPRINTF("iterate\n");
     ret = qemu_savevm_state_iterate(s->file);
     if (ret < 0) {
@@ -259,7 +275,20 @@ static void migrate_fd_put_ready(void *opaque)
         qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
         vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
 
-        if (qemu_savevm_state_complete(s->file) < 0) {
+        if (s->params.postcopy) {
+            if (qemu_savevm_state_complete(s->file, s->params.postcopy) < 0) {
+                migrate_fd_error(s);
+                if (old_vm_running) {
+                    vm_start();
+                }
+                return;
+            }
+            s->substate = MIG_SUBSTATE_POSTCOPY;
+            s->postcopy = postcopy_outgoing_begin(s);
+            return;
+        }
+
+        if (qemu_savevm_state_complete(s->file, s->params.postcopy) < 0) {
             migrate_fd_error(s);
         } else {
             migrate_fd_completed(s);
@@ -348,6 +377,7 @@ void migrate_fd_connect(MigrationState *s)
     int ret;
 
     s->state = MIG_STATE_ACTIVE;
+    s->substate = MIG_SUBSTATE_PRECOPY;
     s->file = qemu_fopen_ops_buffered(s,
                                       s->bandwidth_limit,
                                       migrate_fd_put_buffer,
diff --git a/migration.h b/migration.h
index e6f8006..90f3bdf 100644
--- a/migration.h
+++ b/migration.h
@@ -39,6 +39,12 @@ struct MigrationState
     int (*write)(MigrationState *s, const void *buff, size_t size);
     void *opaque;
     MigrationParams params;
+
+    /* for postcopy */
+    int substate;              /* precopy or postcopy */
+    int fd_read;
+    QEMUFile *file_read;        /* connection from the destination */
+    void *postcopy;
 };
 
 void process_incoming_migration(QEMUFile *f);
@@ -106,6 +112,12 @@ void migrate_add_blocker(Error *reason);
  */
 void migrate_del_blocker(Error *reason);
 
+/* For outgoing postcopy */
+int postcopy_outgoing_create_read_socket(MigrationState *s);
+int postcopy_outgoing_ram_save_live(QEMUFile *f, int stage, void *opaque);
+void *postcopy_outgoing_begin(MigrationState *s);
+int postcopy_outgoing_ram_save_background(QEMUFile *f, void *postcopy);
+
 /* For incoming postcopy */
 extern bool incoming_postcopy;
 extern unsigned long incoming_postcopy_flags;
diff --git a/savevm.c b/savevm.c
index 74b15e7..48b636d 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1698,8 +1698,10 @@ int qemu_savevm_state_iterate(QEMUFile *f)
     return ret;
 }
 
-int qemu_savevm_state_complete(QEMUFile *f)
+int qemu_savevm_state_complete(QEMUFile *f, bool postcopy)
 {
+    QEMUFile *orig_f = NULL;
+    QEMUFileBuf *buf_file = NULL;
     SaveStateEntry *se;
     int ret;
 
@@ -1719,6 +1721,12 @@ int qemu_savevm_state_complete(QEMUFile *f)
         }
     }
 
+    if (postcopy) {
+        orig_f = f;
+        buf_file = qemu_fopen_buf_write();
+        f = buf_file->file;
+    }
+
     QTAILQ_FOREACH(se, &savevm_handlers, entry) {
         int len;
 
@@ -1742,6 +1750,16 @@ int qemu_savevm_state_complete(QEMUFile *f)
 
     qemu_put_byte(f, QEMU_VM_EOF);
 
+    if (postcopy) {
+        qemu_fflush(f);
+        qemu_put_byte(orig_f, QEMU_VM_POSTCOPY);
+        qemu_put_be32(orig_f, buf_file->buf.buffer_size);
+        qemu_put_buffer(orig_f,
+                        buf_file->buf.buffer, buf_file->buf.buffer_size);
+        qemu_fclose(f);
+        f = orig_f;
+    }
+
     return qemu_file_get_error(f);
 }
 
@@ -1781,7 +1799,7 @@ static int qemu_savevm_state(QEMUFile *f)
             goto out;
     } while (ret == 0);
 
-    ret = qemu_savevm_state_complete(f);
+    ret = qemu_savevm_state_complete(f, params.postcopy);
 
 out:
     if (ret == 0) {
diff --git a/sysemu.h b/sysemu.h
index 3857cf0..6ee4cd8 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -79,7 +79,7 @@ void qemu_announce_self(void);
 bool qemu_savevm_state_blocked(Error **errp);
 int qemu_savevm_state_begin(QEMUFile *f, const MigrationParams *params);
 int qemu_savevm_state_iterate(QEMUFile *f);
-int qemu_savevm_state_complete(QEMUFile *f);
+int qemu_savevm_state_complete(QEMUFile *f, bool postcopy);
 void qemu_savevm_state_cancel(QEMUFile *f);
 int qemu_loadvm_state(QEMUFile *f);
 
-- 
1.7.1.1



* [PATCH v2 38/41] postcopy/outgoing: add forward, backward option to specify the size of prefault
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (36 preceding siblings ...)
  2012-06-04  9:57 ` [PATCH v2 37/41] postcopy: implement outgoing " Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 39/41] postcopy/outgoing: implement prefault Isaku Yamahata
                   ` (4 subsequent siblings)
  42 siblings, 0 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: owasserm, quintela, avi, pbonzini, aliguori, stefanha, dlaor,
	mdroth, yoshikawa.takuya, benoit.hudzia, aarcange, t.hirofuchi,
	satoshi.itoh

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 hmp-commands.hx  |   15 ++++++++++-----
 hmp.c            |    3 +++
 migration.c      |   20 ++++++++++++++++++++
 migration.h      |    2 ++
 qapi-schema.json |    3 ++-
 5 files changed, 37 insertions(+), 6 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 3c647f7..38e5c95 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -798,26 +798,31 @@ ETEXI
 
     {
         .name       = "migrate",
-        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s",
-        .params     = "[-d] [-b] [-i] [-p [-n]] uri",
+        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s,"
+	              "forward:i?,backward:i?",
+        .params     = "[-d] [-b] [-i] [-p [-n]] uri [forward] [backward]",
         .help       = "migrate to URI (using -d to not wait for completion)"
 		      "\n\t\t\t -b for migration without shared storage with"
 		      " full copy of disk\n\t\t\t -i for migration without "
 		      "shared storage with incremental copy of disk "
 		      "(base image shared between src and destination)"
 		      "\n\t\t\t-p for migration with postcopy mode enabled"
-		      "\n\t\t\t-n for no background transfer of postcopy mode",
+		      "\n\t\t\t-n for no background transfer of postcopy mode"
+		      "\n\t\t\tforward: the number of pages to "
+		      "forward-prefault when postcopy (default 0)"
+		      "\n\t\t\tbackward: the number of pages to "
+		      "backward-prefault when postcopy (default 0)",
         .mhandler.cmd = hmp_migrate,
     },
 
 
 STEXI
-@item migrate [-d] [-b] [-i] [-p [-n]] @var{uri}
+@item migrate [-d] [-b] [-i] [-p [-n]] @var{uri} [@var{forward}] [@var{backward}]
 @findex migrate
 Migrate to @var{uri} (using -d to not wait for completion).
 	-b for migration with full copy of disk
 	-i for migration with incremental copy of disk (base image is shared)
-	-p for migration with postcopy mode enabled
+	-p for migration with postcopy mode enabled (forward/backward is prefault size when postcopy)
 	-n for migration with postcopy mode enabled without background transfer
 ETEXI
 
diff --git a/hmp.c b/hmp.c
index d546a52..79a9c86 100644
--- a/hmp.c
+++ b/hmp.c
@@ -913,11 +913,14 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
     int inc = qdict_get_try_bool(qdict, "inc", 0);
     int postcopy = qdict_get_try_bool(qdict, "postcopy", 0);
     int nobg = qdict_get_try_bool(qdict, "nobg", 0);
+    int forward = qdict_get_try_int(qdict, "forward", 0);
+    int backward = qdict_get_try_int(qdict, "backward", 0);
     const char *uri = qdict_get_str(qdict, "uri");
     Error *err = NULL;
 
     qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false,
                 !!postcopy, postcopy, !!nobg, nobg,
+                !!forward, forward, !!backward, backward,
                 &err);
     if (err) {
         monitor_printf(mon, "migrate: %s\n", error_get_pretty(err));
diff --git a/migration.c b/migration.c
index e8be0d1..e026085 100644
--- a/migration.c
+++ b/migration.c
@@ -423,6 +423,8 @@ void migrate_del_blocker(Error *reason)
 void qmp_migrate(const char *uri, bool has_blk, bool blk,
                  bool has_inc, bool inc, bool has_detach, bool detach,
                  bool has_postcopy, bool postcopy, bool has_nobg, bool nobg,
+                 bool has_forward, int64_t forward,
+                 bool has_backward, int64_t backward,
                  Error **errp)
 {
     MigrationState *s = migrate_get_current();
@@ -431,6 +433,8 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
         .shared = false,
         .postcopy = false,
         .nobg = false,
+        .prefault_forward = 0,
+        .prefault_backward = 0,
     };
     const char *p;
     int ret;
@@ -447,6 +451,22 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     if (has_nobg) {
         params.nobg = nobg;
     }
+    if (has_forward) {
+        if (forward < 0) {
+            error_set(errp, QERR_INVALID_PARAMETER_VALUE,
+                      "forward", "forward >= 0");
+            return;
+        }
+        params.prefault_forward = forward;
+    }
+    if (has_backward) {
+        if (backward < 0) {
+            error_set(errp, QERR_INVALID_PARAMETER_VALUE,
+                      "backward", "backward >= 0");
+            return;
+        }
+        params.prefault_backward = backward;
+    }
 
     if (s->state == MIG_STATE_ACTIVE) {
         error_set(errp, QERR_MIGRATION_ACTIVE);
diff --git a/migration.h b/migration.h
index 90f3bdf..9a9b9c6 100644
--- a/migration.h
+++ b/migration.h
@@ -24,6 +24,8 @@ struct MigrationParams {
     int shared;
     int postcopy;
     int nobg;
+    int64_t prefault_forward;
+    int64_t prefault_backward;
 };
 
 typedef struct MigrationState MigrationState;
diff --git a/qapi-schema.json b/qapi-schema.json
index 5861fb9..83c2170 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1718,7 +1718,8 @@
 ##
 { 'command': 'migrate',
   'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' ,
-           '*postcopy': 'bool', '*nobg': 'bool'} }
+           '*postcopy': 'bool', '*nobg': 'bool',
+           '*forward': 'int', '*backward': 'int'} }
 
 # @xen-save-devices-state:
 #
-- 
1.7.1.1



* [PATCH v2 39/41] postcopy/outgoing: implement prefault
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (37 preceding siblings ...)
  2012-06-04  9:57 ` [PATCH v2 38/41] postcopy/outgoing: add forward, backward option to specify the size of prefault Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 40/41] migrate: add -m (movebg) option to migrate command Isaku Yamahata
                   ` (3 subsequent siblings)
  42 siblings, 0 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: owasserm, quintela, avi, pbonzini, aliguori, stefanha, dlaor,
	mdroth, yoshikawa.takuya, benoit.hudzia, aarcange, t.hirofuchi,
	satoshi.itoh

When a page is requested, the surrounding pages are also sent.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
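To illustrate the description above, here is a minimal standalone sketch of how a neighbouring page offset could be derived from a requested page. It mirrors the checks in postcopy_outgoing_ram_save_page() but is not that function: prefault_offset() and PAGE_BITS are made-up names, 4 KiB pages are assumed in place of TARGET_PAGE_BITS, and the two overflow guards are additions of mine that the patch does not have.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. The patch uses TARGET_PAGE_BITS instead. */
#define PAGE_BITS 12

/*
 * Compute the byte offset of the page `delta` pages after (forward) or
 * before (backward) page `pgoffset` inside a RAM block of `block_length`
 * bytes. Returns false if that page falls outside the block, in which
 * case nothing should be sent.
 */
bool prefault_offset(uint64_t pgoffset, uint64_t delta, bool forward,
                     uint64_t block_length, uint64_t *out)
{
    if (forward) {
        if (pgoffset > UINT64_MAX - delta) {
            return false;       /* overflow guard, not in the patch */
        }
        pgoffset += delta;
    } else {
        if (pgoffset < delta) {
            return false;       /* would go below the start of the block */
        }
        pgoffset -= delta;
    }

    if (pgoffset > (UINT64_MAX >> PAGE_BITS)) {
        return false;           /* shift overflow guard, not in the patch */
    }
    uint64_t offset = pgoffset << PAGE_BITS;
    if (offset >= block_length) {
        return false;           /* past the end of the block */
    }
    *out = offset;
    return true;
}
```

With delta == 0 the helper simply validates the requested page itself, which matches how the patch reuses one function for both the requested page and its neighbours.
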
 migration-postcopy.c |   56 +++++++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 51 insertions(+), 5 deletions(-)

diff --git a/migration-postcopy.c b/migration-postcopy.c
index eb37094..6165657 100644
--- a/migration-postcopy.c
+++ b/migration-postcopy.c
@@ -353,6 +353,36 @@ int postcopy_outgoing_ram_save_live(QEMUFile *f, int stage, void *opaque)
     return ret;
 }
 
+static void postcopy_outgoing_ram_save_page(PostcopyOutgoingState *s,
+                                            uint64_t pgoffset, bool *written,
+                                            bool forward,
+                                            int prefault_pgoffset)
+{
+    ram_addr_t offset;
+    int ret;
+
+    if (forward) {
+        pgoffset += prefault_pgoffset;
+    } else {
+        if (pgoffset < prefault_pgoffset) {
+            return;
+        }
+        pgoffset -= prefault_pgoffset;
+    }
+
+    offset = pgoffset << TARGET_PAGE_BITS;
+    if (offset >= s->last_block_read->length) {
+        assert(forward);
+        assert(prefault_pgoffset > 0);
+        return;
+    }
+
+    ret = ram_save_page(s->mig_buffered_write, s->last_block_read, offset);
+    if (ret > 0) {
+        *written = true;
+    }
+}
+
 /*
  * return value
  *   0: continue postcopy mode
@@ -364,6 +394,7 @@ static int postcopy_outgoing_handle_req(PostcopyOutgoingState *s,
                                         bool *written)
 {
     int i;
+    uint64_t j;
     RAMBlock *block;
 
     DPRINTF("cmd %d state %d\n", req->cmd, s->state);
@@ -398,11 +429,26 @@ static int postcopy_outgoing_handle_req(PostcopyOutgoingState *s,
             break;
         }
         for (i = 0; i < req->nr; i++) {
-            DPRINTF("offs[%d] 0x%"PRIx64"\n", i, req->pgoffs[i]);
-            int ret = ram_save_page(s->mig_buffered_write, s->last_block_read,
-                                    req->pgoffs[i] << TARGET_PAGE_BITS);
-            if (ret > 0) {
-                *written = true;
+            DPRINTF("pgoffs[%d] 0x%"PRIx64"\n", i, req->pgoffs[i]);
+            postcopy_outgoing_ram_save_page(s, req->pgoffs[i], written,
+                                            true, 0);
+        }
+        /* forward prefault */
+        for (j = 1; j <= s->ms->params.prefault_forward; j++) {
+            for (i = 0; i < req->nr; i++) {
+                DPRINTF("pgoffs[%d] + 0x%"PRIx64" 0x%"PRIx64"\n",
+                        i, j, req->pgoffs[i] + j);
+                postcopy_outgoing_ram_save_page(s, req->pgoffs[i], written,
+                                                true, j);
+            }
+        }
+        /* backward prefault */
+        for (j = 1; j <= s->ms->params.prefault_backward; j++) {
+            for (i = 0; i < req->nr; i++) {
+                DPRINTF("pgoffs[%d] - 0x%"PRIx64" 0x%"PRIx64"\n",
+                        i, j, req->pgoffs[i] - j);
+                postcopy_outgoing_ram_save_page(s, req->pgoffs[i], written,
+                                                false, j);
             }
         }
         break;
-- 
1.7.1.1
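
The bounds handling in the prefault patch above can be illustrated in
isolation. The following is only a minimal sketch, assuming 4 KiB pages and a
single requested page; the names sketch_prefault_target, sketch_prefault_pages
and SKETCH_PAGE_BITS are hypothetical and are not part of the patch or of
QEMU. It lists which page offsets would be sent for one request instead of
calling ram_save_page().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for TARGET_PAGE_BITS; 4 KiB pages are assumed. */
#define SKETCH_PAGE_BITS 12

/*
 * Compute the page that lies `distance` pages after (forward) or before
 * (backward) the requested page.  Returns false when that neighbour falls
 * outside the block, mirroring the early returns in the patch.
 */
static bool sketch_prefault_target(uint64_t pgoffset, bool forward,
                                   uint64_t distance, uint64_t block_length,
                                   uint64_t *target)
{
    if (forward) {
        pgoffset += distance;
    } else {
        if (pgoffset < distance) {
            return false;   /* would go below the start of the block */
        }
        pgoffset -= distance;
    }
    if ((pgoffset << SKETCH_PAGE_BITS) >= block_length) {
        return false;       /* past the end of the block */
    }
    *target = pgoffset;
    return true;
}

/*
 * Collect the pages sent for one request: the page itself, then the forward
 * neighbours, then the backward neighbours.  Returns the number of entries
 * written to `out` (never more than `cap`).
 */
static size_t sketch_prefault_pages(uint64_t pgoffset, uint64_t nr_forward,
                                    uint64_t nr_backward,
                                    uint64_t block_length,
                                    uint64_t *out, size_t cap)
{
    size_t n = 0;
    uint64_t target;
    uint64_t j;

    assert(out != NULL);
    if (n < cap &&
        sketch_prefault_target(pgoffset, true, 0, block_length, &target)) {
        out[n++] = target;
    }
    for (j = 1; j <= nr_forward; j++) {
        if (n < cap &&
            sketch_prefault_target(pgoffset, true, j, block_length, &target)) {
            out[n++] = target;
        }
    }
    for (j = 1; j <= nr_backward; j++) {
        if (n < cap &&
            sketch_prefault_target(pgoffset, false, j, block_length, &target)) {
            out[n++] = target;
        }
    }
    return n;
}
```

For an 8-page block, a request for page 6 with 3 forward and 2 backward pages
would yield pages 6, 7, 5 and 4 in this sketch; pages 8 and 9 are skipped
because they lie past the end of the block.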


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v2 40/41] migrate: add -m (movebg) option to migrate command
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (38 preceding siblings ...)
  2012-06-04  9:57 ` [PATCH v2 39/41] postcopy/outgoing: implement prefault Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-04  9:57 ` [PATCH v2 41/41] migration/postcopy: add movebg mode Isaku Yamahata
                   ` (2 subsequent siblings)
  42 siblings, 0 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: owasserm, quintela, avi, pbonzini, aliguori, stefanha, dlaor,
	mdroth, yoshikawa.takuya, benoit.hudzia, aarcange, t.hirofuchi,
	satoshi.itoh

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 hmp-commands.hx  |    5 +++--
 hmp.c            |    3 ++-
 migration.c      |    8 +++++++-
 migration.h      |    1 +
 qapi-schema.json |    2 +-
 qmp-commands.hx  |    2 +-
 savevm.c         |    1 +
 7 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 38e5c95..1912cb8 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -798,15 +798,16 @@ ETEXI
 
     {
         .name       = "migrate",
-        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s,"
+        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,movebg:-m,nobg:-n,uri:s,"
 	              "forward:i?,backward:i?",
-        .params     = "[-d] [-b] [-i] [-p [-n] uri [forward] [backword]",
+        .params     = "[-d] [-b] [-i] [-p [-n] [-m]] uri [forward] [backward]",
         .help       = "migrate to URI (using -d to not wait for completion)"
 		      "\n\t\t\t -b for migration without shared storage with"
 		      " full copy of disk\n\t\t\t -i for migration without "
 		      "shared storage with incremental copy of disk "
 		      "(base image shared between src and destination)"
 		      "\n\t\t\t-p for migration with postcopy mode enabled"
+		      "\n\t\t\t-m for move background transfer of postcopy mode"
 		      "\n\t\t\t-n for no background transfer of postcopy mode"
 		      "\n\t\t\tforward: the number of pages to "
 		      "forward-prefault when postcopy (default 0)"
diff --git a/hmp.c b/hmp.c
index 79a9c86..dd3f307 100644
--- a/hmp.c
+++ b/hmp.c
@@ -912,6 +912,7 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
     int blk = qdict_get_try_bool(qdict, "blk", 0);
     int inc = qdict_get_try_bool(qdict, "inc", 0);
     int postcopy = qdict_get_try_bool(qdict, "postcopy", 0);
+    int movebg = qdict_get_try_bool(qdict, "movebg", 0);
     int nobg = qdict_get_try_bool(qdict, "nobg", 0);
     int forward = qdict_get_try_int(qdict, "forward", 0);
     int backward = qdict_get_try_int(qdict, "backward", 0);
@@ -919,7 +920,7 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
     Error *err = NULL;
 
     qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false,
-                !!postcopy, postcopy, !!nobg, nobg,
+                !!postcopy, postcopy, !!movebg, movebg, !!nobg, nobg,
                 !!forward, forward, !!backward, backward,
                 &err);
     if (err) {
diff --git a/migration.c b/migration.c
index e026085..c5e6820 100644
--- a/migration.c
+++ b/migration.c
@@ -422,7 +422,9 @@ void migrate_del_blocker(Error *reason)
 
 void qmp_migrate(const char *uri, bool has_blk, bool blk,
                  bool has_inc, bool inc, bool has_detach, bool detach,
-                 bool has_postcopy, bool postcopy, bool has_nobg, bool nobg,
+                 bool has_postcopy, bool postcopy,
+                 bool has_movebg, bool movebg,
+                 bool has_nobg, bool nobg,
                  bool has_forward, int64_t forward,
                  bool has_backward, int64_t backward,
                  Error **errp)
@@ -432,6 +434,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
         .blk = false,
         .shared = false,
         .postcopy = false,
+        .movebg = false,
         .nobg = false,
         .prefault_forward = 0,
         .prefault_backward = 0,
@@ -448,6 +451,9 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     if (has_postcopy) {
         params.postcopy = postcopy;
     }
+    if (has_movebg) {
+        params.movebg = movebg;
+    }
     if (has_nobg) {
         params.nobg = nobg;
     }
diff --git a/migration.h b/migration.h
index 9a9b9c6..1e98b20 100644
--- a/migration.h
+++ b/migration.h
@@ -23,6 +23,7 @@ struct MigrationParams {
     int blk;
     int shared;
     int postcopy;
+    int movebg;
     int nobg;
     int64_t prefault_forward;
     int64_t prefault_backward;
diff --git a/qapi-schema.json b/qapi-schema.json
index 83c2170..ef2f48e 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1718,7 +1718,7 @@
 ##
 { 'command': 'migrate',
   'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' ,
-           '*postcopy': 'bool', '*nobg': 'bool',
+           '*postcopy': 'bool', '*movebg': 'bool', '*nobg': 'bool',
            '*forward': 'int', '*backward': 'int'} }
 
 # @xen-save-devices-state:
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 7b5e5b7..5c9ecc8 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -469,7 +469,7 @@ EQMP
 
     {
         .name       = "migrate",
-        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s",
+        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,movebg:-m,nobg:-n,uri:s",
         .mhandler.cmd_new = qmp_marshal_input_migrate,
     },
 
diff --git a/savevm.c b/savevm.c
index 48b636d..19bb8f1 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1781,6 +1781,7 @@ static int qemu_savevm_state(QEMUFile *f)
         .blk = 0,
         .shared = 0,
         .postcopy = 0,
+        .movebg = 0,
         .nobg = 0,
     };
 
-- 
1.7.1.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v2 41/41] migration/postcopy: add movebg mode
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (39 preceding siblings ...)
  2012-06-04  9:57 ` [PATCH v2 40/41] migrate: add -m (movebg) option to migrate command Isaku Yamahata
@ 2012-06-04  9:57 ` Isaku Yamahata
  2012-06-04 12:37 ` [PATCH v2 00/41] postcopy live migration Anthony Liguori
  2012-06-14 22:18 ` Juan Quintela
  42 siblings, 0 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04  9:57 UTC (permalink / raw)
  To: qemu-devel, kvm
  Cc: owasserm, quintela, avi, pbonzini, aliguori, stefanha, dlaor,
	mdroth, yoshikawa.takuya, benoit.hudzia, aarcange, t.hirofuchi,
	satoshi.itoh

When movebg mode is enabled, the position from which background pages are
sent is moved to the page following the on-demand page.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 migration-postcopy.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/migration-postcopy.c b/migration-postcopy.c
index 6165657..3df88d7 100644
--- a/migration-postcopy.c
+++ b/migration-postcopy.c
@@ -442,6 +442,14 @@ static int postcopy_outgoing_handle_req(PostcopyOutgoingState *s,
                                                 true, j);
             }
         }
+        if (s->ms->params.movebg) {
+            ram_addr_t last_offset =
+                (req->pgoffs[req->nr - 1] + s->ms->params.prefault_forward) <<
+                TARGET_PAGE_BITS;
+            last_offset = MIN(last_offset,
+                              s->last_block_read->length - TARGET_PAGE_SIZE);
+            ram_save_set_last_block(s->last_block_read, last_offset);
+        }
         /* backward prefault */
         for (j = 1; j <= s->ms->params.prefault_backward; j++) {
             for (i = 0; i < req->nr; i++) {
-- 
1.7.1.1
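
The offset computation in the movebg hunk above can be read on its own as
follows. This is a hedged sketch rather than the patch's code: it assumes
4 KiB pages, the names sketch_movebg_offset, SKETCH_PAGE_BITS and
SKETCH_PAGE_SIZE are made up for illustration, and the result is returned
instead of being passed to ram_save_set_last_block().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for TARGET_PAGE_BITS / TARGET_PAGE_SIZE. */
#define SKETCH_PAGE_BITS 12
#define SKETCH_PAGE_SIZE ((uint64_t)1 << SKETCH_PAGE_BITS)

/*
 * Byte offset at which the background transfer is repositioned: the last
 * requested page plus the forward prefault distance, clamped to the last
 * page of the block.
 */
static uint64_t sketch_movebg_offset(uint64_t last_pgoffset,
                                     uint64_t prefault_forward,
                                     uint64_t block_length)
{
    uint64_t last_offset;
    uint64_t last_page;

    assert(block_length >= SKETCH_PAGE_SIZE);
    last_offset = (last_pgoffset + prefault_forward) << SKETCH_PAGE_BITS;
    last_page = block_length - SKETCH_PAGE_SIZE;
    return last_offset < last_page ? last_offset : last_page;
}
```

With a 16-page block, a request ending at page 14 and a forward prefault of
32 pages would be clamped to the offset of page 15 in this sketch.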


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 00/41] postcopy live migration
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (40 preceding siblings ...)
  2012-06-04  9:57 ` [PATCH v2 41/41] migration/postcopy: add movebg mode Isaku Yamahata
@ 2012-06-04 12:37 ` Anthony Liguori
  2012-06-04 13:38   ` Isaku Yamahata
                     ` (2 more replies)
  2012-06-14 22:18 ` Juan Quintela
  42 siblings, 3 replies; 58+ messages in thread
From: Anthony Liguori @ 2012-06-04 12:37 UTC (permalink / raw)
  To: Isaku Yamahata
  Cc: qemu-devel, kvm, owasserm, quintela, avi, pbonzini, stefanha,
	dlaor, mdroth, yoshikawa.takuya, benoit.hudzia, aarcange,
	t.hirofuchi, satoshi.itoh

On 06/04/2012 05:57 PM, Isaku Yamahata wrote:
> After the long time, we have v2. This is qemu part.
> The linux kernel part is sent separatedly.
>
> Changes v1 ->  v2:
> - split up patches for review
> - buffered file refactored
> - many bug fixes
>    Espcially PV drivers can work with postcopy
> - optimization/heuristic
>
> Patches
> 1 - 30: refactoring exsiting code and preparation
> 31 - 37: implement postcopy itself (essential part)
> 38 - 41: some optimization/heuristic for postcopy
>
> Intro
> =====
> This patch series implements postcopy live migration.[1]
> As discussed at KVM forum 2011, dedicated character device is used for
> distributed shared memory between migration source and destination.
> Now we can discuss/benchmark/compare with precopy. I believe there are
> much rooms for improvement.
>
> [1] http://wiki.qemu.org/Features/PostCopyLiveMigration
>
>
> Usage
> =====
> You need load umem character device on the host before starting migration.
> Postcopy can be used for tcg and kvm accelarator. The implementation depend
> on only linux umem character device. But the driver dependent code is split
> into a file.
> I tested only host page size == guest page size case, but the implementation
> allows host page size != guest page size case.
>
> The following options are added with this patch series.
> - incoming part
>    command line options
>    -postcopy [-postcopy-flags<flags>]
>    where flags is for changing behavior for benchmark/debugging
>    Currently the following flags are available
>    0: default
>    1: enable touching page request
>
>    example:
>    qemu -postcopy -incoming tcp:0:4444 -monitor stdio -machine accel=kvm
>
> - outging part
>    options for migrate command
>    migrate [-p [-n] [-m]] URI [<prefault forward>  [<prefault backword>]]
>    -p: indicate postcopy migration
>    -n: disable background transferring pages: This is for benchmark/debugging
>    -m: move background transfer of postcopy mode
>    <prefault forward>: The number of forward pages which is sent with on-demand
>    <prefault backward>: The number of backward pages which is sent with
>                         on-demand
>
>    example:
>    migrate -p -n tcp:<dest ip address>:4444
>    migrate -p -n -m tcp:<dest ip address>:4444 32 0
>
>
> TODO
> ====
> - benchmark/evaluation. Especially how async page fault affects the result.

I don't mean to beat on a dead horse, but I really don't understand the point of 
postcopy migration other than the fact that it's possible.  It's a lot of code 
and a new ABI in an area where we already have too much difficulty maintaining 
our ABI.

Without a compelling real world case with supporting benchmarks for why we need 
postcopy and cannot improve precopy, I'm against merging this.

Regards,

Anthony Liguori


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 00/41] postcopy live migration
  2012-06-04 12:37 ` [PATCH v2 00/41] postcopy live migration Anthony Liguori
@ 2012-06-04 13:38   ` Isaku Yamahata
  2012-06-05 11:23     ` Dor Laor
  2012-06-07  7:46   ` Orit Wasserman
  2012-06-08 10:16   ` Juan Quintela
  2 siblings, 1 reply; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-04 13:38 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: qemu-devel, kvm, owasserm, quintela, avi, pbonzini, stefanha,
	dlaor, mdroth, yoshikawa.takuya, benoit.hudzia, aarcange,
	t.hirofuchi, satoshi.itoh

On Mon, Jun 04, 2012 at 08:37:04PM +0800, Anthony Liguori wrote:
> On 06/04/2012 05:57 PM, Isaku Yamahata wrote:
>> After the long time, we have v2. This is qemu part.
>> The linux kernel part is sent separatedly.
>>
>> Changes v1 ->  v2:
>> - split up patches for review
>> - buffered file refactored
>> - many bug fixes
>>    Espcially PV drivers can work with postcopy
>> - optimization/heuristic
>>
>> Patches
>> 1 - 30: refactoring exsiting code and preparation
>> 31 - 37: implement postcopy itself (essential part)
>> 38 - 41: some optimization/heuristic for postcopy
>>
>> Intro
>> =====
>> This patch series implements postcopy live migration.[1]
>> As discussed at KVM forum 2011, dedicated character device is used for
>> distributed shared memory between migration source and destination.
>> Now we can discuss/benchmark/compare with precopy. I believe there are
>> much rooms for improvement.
>>
>> [1] http://wiki.qemu.org/Features/PostCopyLiveMigration
>>
>>
>> Usage
>> =====
>> You need load umem character device on the host before starting migration.
>> Postcopy can be used for tcg and kvm accelarator. The implementation depend
>> on only linux umem character device. But the driver dependent code is split
>> into a file.
>> I tested only host page size == guest page size case, but the implementation
>> allows host page size != guest page size case.
>>
>> The following options are added with this patch series.
>> - incoming part
>>    command line options
>>    -postcopy [-postcopy-flags<flags>]
>>    where flags is for changing behavior for benchmark/debugging
>>    Currently the following flags are available
>>    0: default
>>    1: enable touching page request
>>
>>    example:
>>    qemu -postcopy -incoming tcp:0:4444 -monitor stdio -machine accel=kvm
>>
>> - outging part
>>    options for migrate command
>>    migrate [-p [-n] [-m]] URI [<prefault forward>  [<prefault backword>]]
>>    -p: indicate postcopy migration
>>    -n: disable background transferring pages: This is for benchmark/debugging
>>    -m: move background transfer of postcopy mode
>>    <prefault forward>: The number of forward pages which is sent with on-demand
>>    <prefault backward>: The number of backward pages which is sent with
>>                         on-demand
>>
>>    example:
>>    migrate -p -n tcp:<dest ip address>:4444
>>    migrate -p -n -m tcp:<dest ip address>:4444 32 0
>>
>>
>> TODO
>> ====
>> - benchmark/evaluation. Especially how async page fault affects the result.
>
> I don't mean to beat on a dead horse, but I really don't understand the 
> point of postcopy migration other than the fact that it's possible.  It's 
> a lot of code and a new ABI in an area where we already have too much 
> difficulty maintaining our ABI.
>
> Without a compelling real world case with supporting benchmarks for why 
> we need postcopy and cannot improve precopy, I'm against merging this.

Some new results are available at 
https://events.linuxfoundation.org/images/stories/pdf/lcjp2012_yamahata_postcopy.pdf

precopy assumes that the network bandwidth is wide enough and that
the number of dirty pages converges. But this doesn't always hold true.

- planned migration
  predictability of total migration time is important

- dynamic consolidation
  In cloud use cases, the resources of a physical machine are usually
  overcommitted.
  When a physical machine becomes overloaded, some VMs are moved to another
  physical host to balance the load.
  precopy can't move VMs promptly. Compression makes things worse.

- inter data center migration
  With L2 over L3 technology, it has become common to create a virtual
  data center which actually spans multiple physical data centers.
  It is useful to migrate VMs across physical data centers for disaster
  recovery.
  The network bandwidth between DCs is narrower than in the LAN case. So the
  precopy assumption wouldn't hold.

- In case the network bandwidth is limited by QoS,
  the precopy assumption doesn't hold.
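
The convergence assumption mentioned above can be made concrete with a toy
model. This is only a sketch under simplifying assumptions (a constant dirty
rate, a constant bandwidth, and no page being counted twice); the function
name sketch_precopy_rounds and all of its parameters are hypothetical and do
not correspond to anything in QEMU.

```c
#include <assert.h>

/*
 * Toy model of iterative precopy.  Each round sends everything that is
 * currently dirty; while that data is on the wire, the guest dirties
 * more memory.  Returns the number of rounds needed until the remaining
 * dirty data drops to `stop_threshold_bytes`, or -1 if that does not
 * happen within `max_rounds`.
 */
static int sketch_precopy_rounds(double ram_bytes,
                                 double dirty_bytes_per_sec,
                                 double bandwidth_bytes_per_sec,
                                 double stop_threshold_bytes,
                                 int max_rounds)
{
    double remaining = ram_bytes;
    double ratio;
    int round;

    assert(bandwidth_bytes_per_sec > 0.0);
    /* Bytes dirtied per byte sent; below 1.0 the rounds shrink. */
    ratio = dirty_bytes_per_sec / bandwidth_bytes_per_sec;

    for (round = 0; round < max_rounds; round++) {
        if (remaining <= stop_threshold_bytes) {
            return round;
        }
        remaining *= ratio;
        if (remaining > ram_bytes) {
            remaining = ram_bytes;  /* cannot dirty more than all of RAM */
        }
    }
    return -1;
}
```

In this model the dirty set shrinks geometrically only while the dirty rate
stays below the bandwidth; once the ratio reaches 1.0 or more, no number of
rounds helps, which is the situation the cases above describe.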


thanks,
-- 
yamahata

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 00/41] postcopy live migration
  2012-06-04 13:38   ` Isaku Yamahata
@ 2012-06-05 11:23     ` Dor Laor
  0 siblings, 0 replies; 58+ messages in thread
From: Dor Laor @ 2012-06-05 11:23 UTC (permalink / raw)
  To: Isaku Yamahata
  Cc: benoit.hudzia, aarcange, Anthony Liguori, kvm, quintela,
	stefanha, t.hirofuchi, satoshi.itoh, qemu-devel, mdroth,
	yoshikawa.takuya, Orit Wasserman, avi, pbonzini

On 06/04/2012 04:38 PM, Isaku Yamahata wrote:
> On Mon, Jun 04, 2012 at 08:37:04PM +0800, Anthony Liguori wrote:
>> On 06/04/2012 05:57 PM, Isaku Yamahata wrote:
>>> After the long time, we have v2. This is qemu part.
>>> The linux kernel part is sent separatedly.
>>>
>>> Changes v1 ->   v2:
>>> - split up patches for review
>>> - buffered file refactored
>>> - many bug fixes
>>>     Espcially PV drivers can work with postcopy
>>> - optimization/heuristic
>>>
>>> Patches
>>> 1 - 30: refactoring exsiting code and preparation
>>> 31 - 37: implement postcopy itself (essential part)
>>> 38 - 41: some optimization/heuristic for postcopy
>>>
>>> Intro
>>> =====
>>> This patch series implements postcopy live migration.[1]
>>> As discussed at KVM forum 2011, dedicated character device is used for
>>> distributed shared memory between migration source and destination.
>>> Now we can discuss/benchmark/compare with precopy. I believe there are
>>> much rooms for improvement.
>>>
>>> [1] http://wiki.qemu.org/Features/PostCopyLiveMigration
>>>
>>>
>>> Usage
>>> =====
>>> You need load umem character device on the host before starting migration.
>>> Postcopy can be used for tcg and kvm accelarator. The implementation depend
>>> on only linux umem character device. But the driver dependent code is split
>>> into a file.
>>> I tested only host page size == guest page size case, but the implementation
>>> allows host page size != guest page size case.
>>>
>>> The following options are added with this patch series.
>>> - incoming part
>>>     command line options
>>>     -postcopy [-postcopy-flags<flags>]
>>>     where flags is for changing behavior for benchmark/debugging
>>>     Currently the following flags are available
>>>     0: default
>>>     1: enable touching page request
>>>
>>>     example:
>>>     qemu -postcopy -incoming tcp:0:4444 -monitor stdio -machine accel=kvm
>>>
>>> - outging part
>>>     options for migrate command
>>>     migrate [-p [-n] [-m]] URI [<prefault forward>   [<prefault backword>]]
>>>     -p: indicate postcopy migration
>>>     -n: disable background transferring pages: This is for benchmark/debugging
>>>     -m: move background transfer of postcopy mode
>>>     <prefault forward>: The number of forward pages which is sent with on-demand
>>>     <prefault backward>: The number of backward pages which is sent with
>>>                          on-demand
>>>
>>>     example:
>>>     migrate -p -n tcp:<dest ip address>:4444
>>>     migrate -p -n -m tcp:<dest ip address>:4444 32 0
>>>
>>>
>>> TODO
>>> ====
>>> - benchmark/evaluation. Especially how async page fault affects the result.
>>
>> I don't mean to beat on a dead horse, but I really don't understand the
>> point of postcopy migration other than the fact that it's possible.  It's
>> a lot of code and a new ABI in an area where we already have too much
>> difficulty maintaining our ABI.
>>
>> Without a compelling real world case with supporting benchmarks for why
>> we need postcopy and cannot improve precopy, I'm against merging this.
>
> Some new results are available at
> https://events.linuxfoundation.org/images/stories/pdf/lcjp2012_yamahata_postcopy.pdf


It does show a dramatic improvement over precopy. As stated in the docs,
async page faults may help with a wide variety of loads and turn postcopy
into a viable solution compared with today's code.

In addition, this sort of 'demand paging' approach on the destination can
help us with other uses. For example, we can use this implementation
to live-snapshot VMs with RAM (postcopy live migration into a file that
leaves the source active) and to live-resume VMs from a file without reading
the entire RAM from disk.

I didn't go over the API for the live migration part, but IIUC the only
change needed for the live migration 'protocol' is w.r.t. guest pages, and
we need to make it regardless once we merge the page ordering optimization.

Cheers,
Dor

> precopy assumes that the network bandwidth are wide enough and
> the number of dirty pages converges. But it doesn't always hold true.
>
> - planned migration
>    predictability of total migration time is important
>
> - dynamic consolidation
>    In cloud use cases, the resources of physical machine are usually
>    over committed.
>    When physical machine becomes over loaded, some VMs are moved to another
>    physical host to balance the load.
>    precopy can't move VMs promptly. compression makes things worse.
>
> - inter data center migration
>    With L2 over L3 technology, it has becoming common to create a virtual
>    data center which actually spans over multi physical data centers.
>    It is useful to migrate VMs over physical data centers as disaster recovery.
>    The network bandwidth between DCs is narrower than LAN case. So precopy
>    assumption wouldn't hold.
>
> - In case that network bandwidth might be limited by QoS,
>    precopy assumption doesn't hold.
>
>
> thanks,

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 00/41] postcopy live migration
  2012-06-04 12:37 ` [PATCH v2 00/41] postcopy live migration Anthony Liguori
  2012-06-04 13:38   ` Isaku Yamahata
@ 2012-06-07  7:46   ` Orit Wasserman
  2012-06-08 10:16   ` Juan Quintela
  2 siblings, 0 replies; 58+ messages in thread
From: Orit Wasserman @ 2012-06-07  7:46 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: benoit.hudzia, aarcange, kvm, quintela, stefanha, t.hirofuchi,
	dlaor, satoshi.itoh, qemu-devel, mdroth, yoshikawa.takuya,
	Isaku Yamahata, avi, pbonzini

On 06/04/2012 03:37 PM, Anthony Liguori wrote:
> On 06/04/2012 05:57 PM, Isaku Yamahata wrote:
>> After the long time, we have v2. This is qemu part.
>> The linux kernel part is sent separatedly.
>>
>> Changes v1 ->  v2:
>> - split up patches for review
>> - buffered file refactored
>> - many bug fixes
>>    Espcially PV drivers can work with postcopy
>> - optimization/heuristic
>>
>> Patches
>> 1 - 30: refactoring exsiting code and preparation
>> 31 - 37: implement postcopy itself (essential part)
>> 38 - 41: some optimization/heuristic for postcopy
>>
>> Intro
>> =====
>> This patch series implements postcopy live migration.[1]
>> As discussed at KVM forum 2011, dedicated character device is used for
>> distributed shared memory between migration source and destination.
>> Now we can discuss/benchmark/compare with precopy. I believe there are
>> much rooms for improvement.
>>
>> [1] http://wiki.qemu.org/Features/PostCopyLiveMigration
>>
>>
>> Usage
>> =====
>> You need load umem character device on the host before starting migration.
>> Postcopy can be used for tcg and kvm accelarator. The implementation depend
>> on only linux umem character device. But the driver dependent code is split
>> into a file.
>> I tested only host page size == guest page size case, but the implementation
>> allows host page size != guest page size case.
>>
>> The following options are added with this patch series.
>> - incoming part
>>    command line options
>>    -postcopy [-postcopy-flags<flags>]
>>    where flags is for changing behavior for benchmark/debugging
>>    Currently the following flags are available
>>    0: default
>>    1: enable touching page request
>>
>>    example:
>>    qemu -postcopy -incoming tcp:0:4444 -monitor stdio -machine accel=kvm
>>
>> - outging part
>>    options for migrate command
>>    migrate [-p [-n] [-m]] URI [<prefault forward>  [<prefault backword>]]
>>    -p: indicate postcopy migration
>>    -n: disable background transferring pages: This is for benchmark/debugging
>>    -m: move background transfer of postcopy mode
>>    <prefault forward>: The number of forward pages which is sent with on-demand
>>    <prefault backward>: The number of backward pages which is sent with
>>                         on-demand
>>
>>    example:
>>    migrate -p -n tcp:<dest ip address>:4444
>>    migrate -p -n -m tcp:<dest ip address>:4444 32 0
>>
>>
>> TODO
>> ====
>> - benchmark/evaluation. Especially how async page fault affects the result.
> 
> I don't mean to beat on a dead horse, but I really don't understand the point of postcopy migration other than the fact that it's possible.  It's a lot of code and a new ABI in an area where we already have too much difficulty maintaining our ABI.
> 
> Without a compelling real world case with supporting benchmarks for why we need postcopy and cannot improve precopy, I'm against merging this.
Hi Anthony,

The example is quite simple: let's look at a 300G guest that is dirtying 10 percent of its memory every second (for example SAP ...).
Even if we have a 30G/s network we will need 1 second of downtime for this guest; many workloads time out with this kind of downtime.
Guests are getting bigger and bigger, so for those big guests the only way to do live migration is to use post copy.
I agree we are losing reliability with post copy, but we can try to limit the risk:
- do a full copy of the guest RAM (precopy) and then switch to post copy only for the updates
- the user will use a private LAN, maybe with redundancy, which is much safer
- maybe back up the memory to storage so that in case of network failure we can recover

In the end it is up to the user, who can decide what he is willing to risk.
The default of course should always be precopy live migration; maybe we should even have a different command for post copy.
In the end I can see some users who will have no choice but to use post copy live migration or stop the guests in order to move them to
another host.
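
The downtime figure in the 300G example above can be reproduced with a
one-line estimate. This is a rough sketch that assumes about one second's
worth of dirtied memory is left to send while the guest is stopped, and that
"G" means gigabytes for both the guest size and the link speed; the function
name sketch_downtime_sec is hypothetical.

```c
#include <assert.h>

/*
 * Estimated stop-and-copy downtime in seconds: the memory dirtied in one
 * second divided by the link throughput.
 */
static double sketch_downtime_sec(double guest_gb,
                                  double dirty_fraction_per_sec,
                                  double link_gb_per_sec)
{
    double dirty_gb;

    assert(link_gb_per_sec > 0.0);
    dirty_gb = guest_gb * dirty_fraction_per_sec;
    return dirty_gb / link_gb_per_sec;
}
```

For a 300G guest dirtying 10 percent per second, a 30G/s link gives roughly
one second with this estimate, and a slower link scales the downtime up
proportionally.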

Regards,
Orit
> 
> Regards,
> 
> Anthony Liguori
> 
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 00/41] postcopy live migration
  2012-06-04 12:37 ` [PATCH v2 00/41] postcopy live migration Anthony Liguori
  2012-06-04 13:38   ` Isaku Yamahata
  2012-06-07  7:46   ` Orit Wasserman
@ 2012-06-08 10:16   ` Juan Quintela
  2012-06-08 10:23     ` Avi Kivity
  2 siblings, 1 reply; 58+ messages in thread
From: Juan Quintela @ 2012-06-08 10:16 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Isaku Yamahata, qemu-devel, kvm, owasserm, avi, pbonzini,
	stefanha, dlaor, mdroth, yoshikawa.takuya, benoit.hudzia,
	aarcange, t.hirofuchi, satoshi.itoh

Anthony Liguori <aliguori@us.ibm.com> wrote:
>> TODO
>> ====
>> - benchmark/evaluation. Especially how async page fault affects the result.
>
> I don't mean to beat on a dead horse, but I really don't understand
> the point of postcopy migration other than the fact that it's
> possible.  It's a lot of code and a new ABI in an area where we
> already have too much difficulty maintaining our ABI.
>
> Without a compelling real world case with supporting benchmarks for
> why we need postcopy and cannot improve precopy, I'm against merging
> this.

I easily understand the need/want for post-copy migration.  The other
thing is that this didn't come with benchmarks and that post-copy is
difficult.

The basic problem with precopy is that the amount of memory used by
guests is not going to go down any time soon.  The same goes for the
number of cores.  At some point (it doesn't matter if it is 16GB, 128GB
or 256GB RAM in the guest, the same for vcpus), precopy just doesn't have
a chance.  And post-copy does.

That said, we need to measure the time of an async page
fault over the network.  If it is too high, post copy just doesn't work.

And no, I haven't seen any measurement that tells us that this is going
to be fast enough, but there is always hope.

Later, Juan.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 00/41] postcopy live migration
  2012-06-08 10:16   ` Juan Quintela
@ 2012-06-08 10:23     ` Avi Kivity
  2012-06-14 21:07       ` Juan Quintela
  0 siblings, 1 reply; 58+ messages in thread
From: Avi Kivity @ 2012-06-08 10:23 UTC (permalink / raw)
  To: quintela
  Cc: benoit.hudzia, aarcange, Anthony Liguori, kvm, owasserm,
	stefanha, t.hirofuchi, dlaor, satoshi.itoh, qemu-devel, mdroth,
	yoshikawa.takuya, Isaku Yamahata, pbonzini

On 06/08/2012 01:16 PM, Juan Quintela wrote:
> Anthony Liguori <aliguori@us.ibm.com> wrote:
> >> TODO
> >> ====
> >> - benchmark/evaluation. Especially how async page fault affects the result.
> >
> > I don't mean to beat on a dead horse, but I really don't understand
> > the point of postcopy migration other than the fact that it's
> > possible.  It's a lot of code and a new ABI in an area where we
> > already have too much difficulty maintaining our ABI.
> >
> > Without a compelling real world case with supporting benchmarks for
> > why we need postcopy and cannot improve precopy, I'm against merging
> > this.
>
> I understand easily the need/want for post-copy migration.  Other thing
> is that this didn't came with benchmarks and that post-copy is
> difficult.
>
> The basic problem with precopy is that the amount of memory used by
> guest is not going to go down any time soon.  The same with number of
> cores.  At some point (it didn't matter if it is 16GB, 128GB or 256GB
> RAM in the guest, the same for vcpus), precopy just don't have a chance.
> And post-copy does.
>
> Once told that, we need to measure what is the time of an async page
> fault over the network.  If it is too high, post copy just don't work.
>
> And no, I haven't seen any measurement that told us that this is going
> to be fast enough, but there is always hope.

At 10Gb/sec, the time to transfer one page is 4 microseconds.  At
40Gb/sec this drops to a microsecond, plus the latency.  This is on par
with the time to handle a write protection fault that precopy uses.  But
this can *only* be achieved with RDMA, otherwise the overhead of
messaging and copying will dominate.
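
As a rough cross-check of the figures above, here is a minimal sketch in C
that computes the raw wire time for one page.  It assumes a 4096-byte page
and counts only serialization time at the nominal line rate;
page_wire_time_us is a name made up for this illustration, and latency,
framing and software overhead are all ignored.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only: time in microseconds to
 * serialize page_bytes onto a link running at link_gbps gigabits/sec.
 * Ignores propagation latency, framing and software overhead.
 */
double page_wire_time_us(size_t page_bytes, double link_gbps)
{
    assert(link_gbps > 0.0);
    /* bits divided by bits-per-microsecond; 1 Gb/sec == 1000 bits/us */
    return (double)page_bytes * 8.0 / (link_gbps * 1000.0);
}
```

With a 4096-byte page this gives about 3.3 microseconds at 10Gb/sec and
about 0.8 microseconds at 40Gb/sec, in line with the rounded numbers above.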

Note this does not mean we should postpone merging until RDMA support is
ready.  However we need to make sure the kernel interface is RDMA friendly.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

* Re: [PATCH v2 33/41] postcopy: introduce -postcopy and -postcopy-flags option
  2012-06-04  9:57 ` [PATCH v2 33/41] postcopy: introduce -postcopy and -postcopy-flags option Isaku Yamahata
@ 2012-06-08 10:52   ` Juan Quintela
  2012-06-08 16:07     ` [Qemu-devel] " Isaku Yamahata
  0 siblings, 1 reply; 58+ messages in thread
From: Juan Quintela @ 2012-06-08 10:52 UTC (permalink / raw)
  To: Isaku Yamahata
  Cc: qemu-devel, kvm, owasserm, avi, pbonzini, aliguori, stefanha,
	dlaor, mdroth, yoshikawa.takuya, benoit.hudzia, aarcange,
	t.hirofuchi, satoshi.itoh

Isaku Yamahata <yamahata@valinux.co.jp> wrote:
> This patch prepares for postcopy livemigration.
> It introduces -postcopy option and its internal flag, migration_postcopy.
> It introduces -postcopy-flags for chaging the behavior of incoming postcopy
> mainly for benchmark/debug.

Why do we need a -postcopy flag?  -incoming should be enough to detect
that we are doing postcopy.

    QLIST_HEAD(, LoadStateEntry) loadvm_handlers =
        QLIST_HEAD_INITIALIZER(loadvm_handlers);
    LoadStateEntry *le, *new_le;
    uint8_t section_type;
    unsigned int v;
    int ret;

    if (qemu_savevm_state_blocked(NULL)) {
        return -EINVAL;
    }

    v = qemu_get_be32(f);
    if (v != QEMU_VM_FILE_MAGIC)
        return -EINVAL;

    v = qemu_get_be32(f);
    if (v == QEMU_VM_FILE_VERSION_COMPAT) {
        fprintf(stderr, "SaveVM v2 format is obsolete and don't work anymore\n");
        return -ENOTSUP;
    }
    if (v != QEMU_VM_FILE_VERSION)
        return -ENOTSUP;

Shouldn't we be able to change some version field here and make the
recognition of postcopy automatic?  Having to hack around a new
command line option for each page is not going to be nice.  And about
the postcopy flags, if they are for the incoming side, please consider
just sending those flags on the stream as a first field.
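
One way to picture the suggestion, as a minimal sketch only: the
destination reads the magic and version words and, for a hypothetical
postcopy-capable version, a flags word that follows them.  The values of
SKETCH_MAGIC, SKETCH_VERSION_PRECOPY and SKETCH_VERSION_POSTCOPY, and the
position of the flags word, are assumptions made up for this
illustration; they are not the actual QEMU stream format.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* All constants below are hypothetical, chosen only for the sketch. */
#define SKETCH_MAGIC            0x5145564dU
#define SKETCH_VERSION_PRECOPY  3U
#define SKETCH_VERSION_POSTCOPY 4U

struct sketch_header {
    int postcopy;        /* nonzero if the stream announced postcopy */
    uint32_t flags;      /* incoming-side flags carried by the stream */
};

/* Read a big-endian 32-bit word from buf. */
uint32_t sketch_get_be32(const uint8_t *buf)
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
}

/*
 * Parse magic, version and (for the postcopy version only) a flags word.
 * Returns 0 on success, -EINVAL on bad magic or short input, and
 * -ENOTSUP on an unknown version.
 */
int sketch_parse_header(const uint8_t *buf, size_t len,
                        struct sketch_header *out)
{
    uint32_t version;

    assert(buf != NULL && out != NULL);
    if (len < 8 || sketch_get_be32(buf) != SKETCH_MAGIC) {
        return -EINVAL;
    }
    version = sketch_get_be32(buf + 4);
    out->postcopy = 0;
    out->flags = 0;
    if (version == SKETCH_VERSION_PRECOPY) {
        return 0;                      /* plain precopy stream */
    }
    if (version != SKETCH_VERSION_POSTCOPY) {
        return -ENOTSUP;
    }
    if (len < 12) {
        return -EINVAL;                /* flags word is missing */
    }
    out->postcopy = 1;
    out->flags = sketch_get_be32(buf + 8);
    return 0;
}
```

With something along these lines the incoming side would need neither
-postcopy nor -postcopy-flags, since both pieces of information arrive
with the stream.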

Thanks, Juan.

* Re: [Qemu-devel] [PATCH v2 33/41] postcopy: introduce -postcopy and -postcopy-flags option
  2012-06-08 10:52   ` Juan Quintela
@ 2012-06-08 16:07     ` Isaku Yamahata
  0 siblings, 0 replies; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-08 16:07 UTC (permalink / raw)
  To: Juan Quintela
  Cc: benoit.hudzia, aarcange, aliguori, kvm, satoshi.itoh, stefanha,
	t.hirofuchi, dlaor, qemu-devel, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini

On Fri, Jun 08, 2012 at 12:52:54PM +0200, Juan Quintela wrote:
> Isaku Yamahata <yamahata@valinux.co.jp> wrote:
> > This patch prepares for postcopy livemigration.
> > It introduces -postcopy option and its internal flag, migration_postcopy.
> > It introduces -postcopy-flags for chaging the behavior of incoming postcopy
> > mainly for benchmark/debug.
> 
> Why do we need postcopy flag?  -incoming should be enough to detect that
> we are doing postcopy.
> 
>     QLIST_HEAD(, LoadStateEntry) loadvm_handlers =
>         QLIST_HEAD_INITIALIZER(loadvm_handlers);
>     LoadStateEntry *le, *new_le;
>     uint8_t section_type;
>     unsigned int v;
>     int ret;
> 
>     if (qemu_savevm_state_blocked(NULL)) {
>         return -EINVAL;
>     }
> 
>     v = qemu_get_be32(f);
>     if (v != QEMU_VM_FILE_MAGIC)
>         return -EINVAL;
> 
>     v = qemu_get_be32(f);
>     if (v == QEMU_VM_FILE_VERSION_COMPAT) {
>         fprintf(stderr, "SaveVM v2 format is obsolete and don't work anymore\n");
>         return -ENOTSUP;
>     }
>     if (v != QEMU_VM_FILE_VERSION)
>         return -ENOTSUP;
> 
> Shouldn't we be able to change some version field here and make the
> "recognition of postcopy automatic"?  Having to hack around a new
> command line option for each page is not going to be nice.  And about
> postcopy flags, if they are for "incoming side", please consider just
> sent that flags on the stream as a first field?

Yes, you are right.
If bumping the version is allowed, -postcopy can be dropped in favor of
auto detection.  -postcopy-flags can be dropped too, because it is used
only for benchmarking, to change the incoming side's behavior
independently of the outgoing side.
-- 
yamahata

* Re: [PATCH v2 00/41] postcopy live migration
  2012-06-08 10:23     ` Avi Kivity
@ 2012-06-14 21:07       ` Juan Quintela
  0 siblings, 0 replies; 58+ messages in thread
From: Juan Quintela @ 2012-06-14 21:07 UTC (permalink / raw)
  To: Avi Kivity
  Cc: benoit.hudzia, aarcange, Anthony Liguori, kvm, owasserm,
	stefanha, t.hirofuchi, dlaor, satoshi.itoh, qemu-devel, mdroth,
	yoshikawa.takuya, Isaku Yamahata, pbonzini

Avi Kivity <avi@redhat.com> wrote:
> On 06/08/2012 01:16 PM, Juan Quintela wrote:
>> Anthony Liguori <aliguori@us.ibm.com> wrote:
>>
>> Once told that, we need to measure what is the time of an async page
>> fault over the network.  If it is too high, post copy just don't work.
>>
>> And no, I haven't seen any measurement that told us that this is going
>> to be fast enough, but there is always hope.
>
> At 10Gb/sec, the time to transfer one page is 4 microseconds.  At
> 40Gb/sec this drops to a microsecond, plus the latency.  This is on par
> with the time to handle a write protection fault that precopy uses.  But
> this can *only* be achieved with RDMA, otherwise the overhead of
> messaging and copying will dominate.
>
> Note this does not mean we should postpone merging until RDMA support is
> ready.  However we need to make sure the kernel interface is RDMA friendly.

Fully agree here.  I always thought that postcopy would work with RDMA or
something like it; anything else would just add too much latency.

Later, Juan.

* Re: [PATCH v2 35/41] postcopy: introduce helper functions for postcopy
  2012-06-04  9:57 ` [PATCH v2 35/41] postcopy: introduce helper functions for postcopy Isaku Yamahata
@ 2012-06-14 21:34   ` Juan Quintela
  2012-06-16  9:48     ` Isaku Yamahata
  0 siblings, 1 reply; 58+ messages in thread
From: Juan Quintela @ 2012-06-14 21:34 UTC (permalink / raw)
  To: Isaku Yamahata
  Cc: benoit.hudzia, aarcange, aliguori, kvm, satoshi.itoh, stefanha,
	t.hirofuchi, dlaor, qemu-devel, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini

Isaku Yamahata <yamahata@valinux.co.jp> wrote:
> +//#define DEBUG_UMEM
> +#ifdef DEBUG_UMEM
> +#include <sys/syscall.h>
> +#define DPRINTF(format, ...)                                            \
> +    do {                                                                \
> +        printf("%d:%ld %s:%d "format, getpid(), syscall(SYS_gettid),    \
> +               __func__, __LINE__, ## __VA_ARGS__);                     \
> +    } while (0)

Shouldn't this be in a header file that is Linux specific?  And (at least
on my systems) gettid is already defined in glibc.


> +#else
> +#define DPRINTF(format, ...)    do { } while (0)
> +#endif


> +
> +#define DEV_UMEM        "/dev/umem"
> +
> +UMem *umem_new(void *hostp, size_t size)
> +{
> +    struct umem_init uinit = {
> +        .size = size,
> +    };
> +    UMem *umem;
> +
> +    assert((size % getpagesize()) == 0);
> +    umem = g_new(UMem, 1);
> +    umem->fd = open(DEV_UMEM, O_RDWR);
> +    if (umem->fd < 0) {
> +        perror("can't open "DEV_UMEM);
> +        abort();

Can we return an error instead of calling abort()?  The same goes for the
rest of the aborts in this file.
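
A minimal sketch of what returning an error could look like for the
open() path.  sketch_dev_open is a hypothetical name, and the convention
of returning a negative errno value is an assumption, not something the
patch defines.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>

/*
 * Hypothetical replacement for the open()+abort() pattern: store the
 * descriptor in *fd_out and return 0, or return -errno on failure so
 * the caller can fail the migration cleanly. *fd_out is left untouched
 * on failure.
 */
int sketch_dev_open(const char *path, int *fd_out)
{
    int fd;

    assert(path != NULL && fd_out != NULL);
    fd = open(path, O_RDWR);
    if (fd < 0) {
        return -errno;
    }
    *fd_out = fd;
    return 0;
}
```

The caller would then propagate the error up to the migration code
instead of killing the destination QEMU.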


> +size_t umem_pages_size(uint64_t nr)
> +{
> +    return sizeof(struct umem_pages) + nr * sizeof(uint64_t);

Can we make sure that the pgoffs field is aligned?  I know that it is
aligned as things stand, but it is better to be sure.
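
One possible way to enforce this at build time, sketched under the
assumption that a C11 compiler is available.  The struct mirrors the one
in the patch except that it uses a C99 flexible array member ([]) instead
of the GNU zero-length array ([0]).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct umem_pages {
    uint64_t nr;
    uint64_t pgoffs[];   /* C99 flexible array member */
};

/* Fail the build if pgoffs ever stops being naturally aligned. */
static_assert(offsetof(struct umem_pages, pgoffs) % _Alignof(uint64_t) == 0,
              "umem_pages.pgoffs must be naturally aligned");

size_t umem_pages_size(uint64_t nr)
{
    return sizeof(struct umem_pages) + nr * sizeof(uint64_t);
}
```

If someone later adds a narrower field in front of pgoffs and packs the
struct, the build breaks instead of the wire format silently changing.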

> +}
> +
> +static void umem_write_cmd(int fd, uint8_t cmd)
> +{
> +    DPRINTF("write cmd %c\n", cmd);
> +
> +    for (;;) {
> +        ssize_t ret = write(fd, &cmd, 1);
> +        if (ret == -1) {
> +            if (errno == EINTR) {
> +                continue;
> +            } else if (errno == EPIPE) {
> +                perror("pipe");
> +                DPRINTF("write cmd %c %zd %d: pipe is closed\n",
> +                        cmd, ret, errno);
> +                break;
> +            }


Grr, we don't have a function that does a "safe_write".  The most
similar thing in QEMU looks to be send_all().
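
For reference, a minimal sketch of such a helper.  sketch_write_all is a
hypothetical name, not an existing QEMU function, and the choice of
returning -errno is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write all len bytes, retrying on EINTR and on
 * short writes. Returns 0 on success or -errno on failure.
 */
int sketch_write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t ret = write(fd, p, len);
        if (ret < 0) {
            if (errno == EINTR) {
                continue;          /* interrupted, try again */
            }
            return -errno;         /* EPIPE and friends go to the caller */
        }
        if (ret == 0) {
            return -EIO;           /* no progress; avoid spinning forever */
        }
        p += ret;
        len -= (size_t)ret;
    }
    return 0;
}
```

umem_write_cmd() could then be a one-line wrapper that passes the error
up instead of deciding between break and abort() itself.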

> +
> +            perror("pipe");

Can we use a different perror() message from the one in the previous
error path?

> +            DPRINTF("write cmd %c %zd %d\n", cmd, ret, errno);
> +            abort();
> +        }
> +
> +        break;
> +    }
> +}
> +
> +static void umem_read_cmd(int fd, uint8_t expect)
> +{
> +    uint8_t cmd;
> +    for (;;) {
> +        ssize_t ret = read(fd, &cmd, 1);
> +        if (ret == -1) {
> +            if (errno == EINTR) {
> +                continue;
> +            }
> +            perror("pipe");
> +            DPRINTF("read error cmd %c %zd %d\n", cmd, ret, errno);
> +            abort();
> +        }
> +
> +        if (ret == 0) {
> +            DPRINTF("read cmd %c %zd: pipe is closed\n", cmd, ret);
> +            abort();
> +        }
> +
> +        break;
> +    }
> +
> +    DPRINTF("read cmd %c\n", cmd);
> +    if (cmd != expect) {
> +        DPRINTF("cmd %c expect %d\n", cmd, expect);
> +        abort();

Ouch.  If we receive garbage, we just exit?

I really think that we should implement error handling.
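
A minimal sketch of what that could look like for the read side.
sketch_read_cmd is a hypothetical name, and the particular error codes
are assumptions picked for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical variant of umem_read_cmd(): read one command byte and
 * compare it with expect. Returns 0 on a match, -EPIPE if the peer
 * closed the pipe, -EPROTO on an unexpected command, or -errno on a
 * read error.
 */
int sketch_read_cmd(int fd, uint8_t expect)
{
    uint8_t cmd;
    ssize_t ret;

    do {
        ret = read(fd, &cmd, 1);
    } while (ret == -1 && errno == EINTR);

    if (ret == -1) {
        return -errno;
    }
    if (ret == 0) {
        return -EPIPE;             /* peer closed its end */
    }
    return cmd == expect ? 0 : -EPROTO;
}
```

The caller can then stop the VM or fail the migration, rather than the
whole process dying on a single garbage byte.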

> +    }
> +}
> +
> +struct umem_pages *umem_recv_pages(QEMUFile *f, int *offset)
> +{
> +    int ret;
> +    uint64_t nr;
> +    size_t size;
> +    struct umem_pages *pages;
> +
> +    ret = qemu_peek_buffer(f, (uint8_t*)&nr, sizeof(nr), *offset);
> +    *offset += sizeof(nr);
> +    DPRINTF("ret %d nr %ld\n", ret, nr);
> +    if (ret != sizeof(nr) || nr == 0) {
> +        return NULL;
> +    }
> +
> +    size = umem_pages_size(nr);
> +    pages = g_malloc(size);

Just thinking about this.  Couldn't we just decide on a "big enough"
buffer, and never send anything bigger than that?  That would remove the
need to malloc()/free() a buffer for each reception.
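
Something along these lines, as a minimal sketch.  SKETCH_MAX_PAGES, the
struct and the function are hypothetical, and the message is taken from a
plain byte buffer in host byte order rather than from a QEMUFile, just to
keep the example self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical upper bound on page offsets per message. */
#define SKETCH_MAX_PAGES 64

struct sketch_pages_buf {
    uint64_t nr;
    uint64_t pgoffs[SKETCH_MAX_PAGES];
};

/*
 * Copy a received message (nr followed by nr offsets) into a
 * caller-owned, reusable buffer. Returns 0 on success, -EAGAIN if the
 * message is not complete yet, -EINVAL for an empty message, or
 * -EMSGSIZE if the sender exceeded the agreed bound.
 */
int sketch_recv_pages(const uint8_t *msg, size_t len,
                      struct sketch_pages_buf *buf)
{
    uint64_t nr;

    assert(msg != NULL && buf != NULL);
    if (len < sizeof(nr)) {
        return -EAGAIN;
    }
    memcpy(&nr, msg, sizeof(nr));
    if (nr == 0) {
        return -EINVAL;
    }
    if (nr > SKETCH_MAX_PAGES) {
        return -EMSGSIZE;
    }
    if (len < sizeof(nr) + nr * sizeof(uint64_t)) {
        return -EAGAIN;
    }
    buf->nr = nr;
    memcpy(buf->pgoffs, msg + sizeof(nr), nr * sizeof(uint64_t));
    return 0;
}
```

The receiver keeps one such buffer for its whole lifetime, so there is no
allocation on the fault path and an oversized nr from the wire is
rejected instead of being passed to an allocator.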



> +/* qemu side handler */
> +struct umem_pages *umem_qemu_trigger_page_fault(QEMUFile *from_umemd,
> +                                                int *offset)
> +{
> +    uint64_t i;
> +    int page_shift = ffs(getpagesize()) - 1;
> +    struct umem_pages *pages = umem_recv_pages(from_umemd, offset);
> +    if (pages == NULL) {
> +        return NULL;
> +    }
> +
> +    for (i = 0; i < pages->nr; i++) {
> +        ram_addr_t addr = pages->pgoffs[i] << page_shift;
> +
> +        /* make pages present by forcibly triggering page fault. */
> +        volatile uint8_t *ram = qemu_get_ram_ptr(addr);
> +        uint8_t dummy_read = ram[0];
> +        (void)dummy_read;   /* suppress unused variable warning */
> +    }
> +
> +    /*
> +     * Very Linux implementation specific.
> +     * Make it sure that other thread doesn't fault on the above virtual
> +     * address. (More exactly other thread doesn't call fault handler with
> +     * the offset.)
> +     * the fault handler is called with mmap_sem read locked.
> +     * madvise() does down/up_write(mmap_sem)
> +     */
> +    qemu_madvise(NULL, 0, MADV_NORMAL);

If it is Linux specific, it should be inside a CONFIG_LINUX ifdef, or in
a function hidden in some header.

Talking about locking, what prevents another thread from entering this
function before this one calls madvise?  Or am I missing something obvious?

> +
> +struct umem_pages {
> +    uint64_t nr;
> +    uint64_t pgoffs[0];
> +};
> +

QEMU really likes typedefs for structs.

Later, Juan.

* Re: [PATCH v2 36/41] postcopy: implement incoming part of postcopy live migration
  2012-06-04  9:57 ` [PATCH v2 36/41] postcopy: implement incoming part of postcopy live migration Isaku Yamahata
@ 2012-06-14 21:56   ` Juan Quintela
  2012-06-14 21:58   ` Juan Quintela
  1 sibling, 0 replies; 58+ messages in thread
From: Juan Quintela @ 2012-06-14 21:56 UTC (permalink / raw)
  To: Isaku Yamahata
  Cc: benoit.hudzia, aarcange, aliguori, kvm, satoshi.itoh, stefanha,
	t.hirofuchi, dlaor, qemu-devel, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini

Isaku Yamahata <yamahata@valinux.co.jp> wrote:
> This patch implements postcopy live migration for incoming part
>
>  vl.c                                               |    8 +-
>  13 files changed, 1409 insertions(+), 21 deletions(-)
>  copy linux-headers/linux/umem.h => migration-postcopy-stub.c (55%)

Ouch, git got really confused.


> +void postcopy_incoming_prepare(void)
> +{
> +    RAMBlock *block;
> +
> +    if (!incoming_postcopy) {
> +        return;
> +    }

The caller already tests the opposite of this condition before calling
the function, so isn't the check redundant in one of the two places?

> +
> +    state.state = 0;
> +    state.host_page_size = getpagesize();
> +    state.host_page_shift = ffs(state.host_page_size) - 1;
> +    state.version_id = RAM_SAVE_VERSION_ID; /* = save version of
> +                                               ram_save_live() */
> +
> +    QLIST_FOREACH(block, &ram_list.blocks, next) {
> +        block->umem = umem_new(block->host, block->length);
> +        block->flags |= RAM_POSTCOPY_UMEM_MASK;
> +    }
> +}
> +
> +static int postcopy_incoming_ram_load_get64(QEMUFile *f,
> +                                            ram_addr_t *addr, int *flags)
> +{
> +    *addr = qemu_get_be64(f);
> +    *flags = *addr & ~TARGET_PAGE_MASK;
> +    *addr &= TARGET_PAGE_MASK;
> +    return qemu_file_get_error(f);
> +}
> +
> +int postcopy_incoming_ram_load(QEMUFile *f, void *opaque, int version_id)
> +{
> +    ram_addr_t addr;
> +    int flags;
> +    int error;
> +
> +    DPRINTF("incoming ram load\n");
> +    /*
> +     * RAM_SAVE_FLAGS_EOS or
> +     * RAM_SAVE_FLAGS_MEM_SIZE + mem size + RAM_SAVE_FLAGS_EOS
> +     * see postcopy_outgoing_ram_save_live()
> +     */
> +
> +    if (version_id != RAM_SAVE_VERSION_ID) {
> +        DPRINTF("RAM_SAVE_VERSION_ID %d != %d\n",
> +                version_id, RAM_SAVE_VERSION_ID);
> +        return -EINVAL;
> +    }
> +    error = postcopy_incoming_ram_load_get64(f, &addr, &flags);
> +    DPRINTF("addr 0x%lx flags 0x%x\n", addr, flags);
> +    if (error) {
> +        DPRINTF("error %d\n", error);
> +        return error;
> +    }
> +    if (flags == RAM_SAVE_FLAG_EOS && addr == 0) {
> +        DPRINTF("EOS\n");
> +        return 0;
> +    }
> +
> +    if (flags != RAM_SAVE_FLAG_MEM_SIZE) {
> +        DPRINTF("-EINVAL flags 0x%x\n", flags);
> +        return -EINVAL;
> +    }
> +    error = ram_load_mem_size(f, addr);
> +    if (error) {
> +        DPRINTF("addr 0x%lx error %d\n", addr, error);
> +        return error;
> +    }
> +
> +    error = postcopy_incoming_ram_load_get64(f, &addr, &flags);
> +    if (error) {
> +        DPRINTF("addr 0x%lx flags 0x%x error %d\n", addr, flags, error);
> +        return error;
> +    }
> +    if (flags == RAM_SAVE_FLAG_EOS && addr == 0) {
> +        DPRINTF("done\n");
> +        return 0;
> +    }
> +    DPRINTF("-EINVAL\n");
> +    return -EINVAL;
> +}
> +
> +static void postcopy_incoming_pipe_and_fork_umemd(int mig_read_fd,
> +                                                  QEMUFile *mig_read)
> +{
> +    int fds[2];
> +    RAMBlock *block;
> +
> +    DPRINTF("fork\n");
> +
> +    /* socketpair(AF_UNIX)? */
> +
> +    if (qemu_pipe(fds) == -1) {
> +        perror("qemu_pipe");
> +        abort();
> +    }
> +    state.from_umemd_fd = fds[0];
> +    umemd.to_qemu_fd = fds[1];
> +
> +    if (qemu_pipe(fds) == -1) {
> +        perror("qemu_pipe");
> +        abort();
> +    }
> +    umemd.from_qemu_fd = fds[0];
> +    state.to_umemd_fd = fds[1];
> +
> +    pid_t child = fork();
> +    if (child < 0) {
> +        perror("fork");
> +        abort();
> +    }
> +
> +    if (child == 0) {
> +        int mig_write_fd;
> +
> +        fd_close(&state.to_umemd_fd);
> +        fd_close(&state.from_umemd_fd);
> +        umemd.host_page_size = state.host_page_size;
> +        umemd.host_page_shift = state.host_page_shift;
> +
> +        umemd.nr_host_pages_per_target_page =
> +            TARGET_PAGE_SIZE / umemd.host_page_size;
> +        umemd.nr_target_pages_per_host_page =
> +            umemd.host_page_size / TARGET_PAGE_SIZE;
> +
> +        umemd.target_to_host_page_shift =
> +            ffs(umemd.nr_host_pages_per_target_page) - 1;
> +        umemd.host_to_target_page_shift =
> +            ffs(umemd.nr_target_pages_per_host_page) - 1;
> +
> +        umemd.state = 0;
> +        umemd.version_id = state.version_id;
> +        umemd.mig_read_fd = mig_read_fd;
> +        umemd.mig_read = mig_read;
> +
> +        mig_write_fd = dup(mig_read_fd);
> +        if (mig_write_fd < 0) {
> +            perror("could not dup for writable socket \n");
> +            abort();
> +        }
> +        umemd.mig_write_fd = mig_write_fd;
> +        umemd.mig_write = qemu_fopen_nonblock(mig_write_fd);
> +
> +        postcopy_incoming_umemd(); /* noreturn */
> +    }
> +
> +    DPRINTF("qemu pid: %d daemon pid: %d\n", getpid(), child);
> +    fd_close(&umemd.to_qemu_fd);
> +    fd_close(&umemd.from_qemu_fd);
> +    state.faulted_pages = g_malloc(umem_pages_size(MAX_FAULTED_PAGES));
> +    state.faulted_pages->nr = 0;
> +
> +    /* close all UMem.shmem_fd */
> +    QLIST_FOREACH(block, &ram_list.blocks, next) {
> +        umem_close_shmem(block->umem);
> +    }
> +    umem_qemu_wait_for_daemon(state.from_umemd_fd);
> +}
> +
> +void postcopy_incoming_fork_umemd(QEMUFile *mig_read)
> +{
> +    int fd = qemu_file_fd(mig_read);
> +    assert((fcntl(fd, F_GETFL) & O_ACCMODE) == O_RDWR);
> +
> +    socket_set_nonblock(fd);
> +    postcopy_incoming_pipe_and_fork_umemd(fd, mig_read);
> +    /* now socket is disowned. So tell umem server that it's safe to use it */
> +    postcopy_incoming_qemu_ready();
> +}
> +
> +static void postcopy_incoming_qemu_recv_quit(void)
> +{
> +    RAMBlock *block;
> +    if (state.state & PIS_STATE_QUIT_RECEIVED) {
> +        return;
> +    }
> +
> +    QLIST_FOREACH(block, &ram_list.blocks, next) {
> +        if (block->umem != NULL) {
> +            umem_destroy(block->umem);
> +            block->umem = NULL;
> +            block->flags &= ~RAM_POSTCOPY_UMEM_MASK;
> +        }
> +    }
> +
> +    DPRINTF("|= PIS_STATE_QUIT_RECEIVED\n");
> +    state.state |= PIS_STATE_QUIT_RECEIVED;
> +    qemu_set_fd_handler(state.from_umemd_fd, NULL, NULL, NULL);
> +    qemu_fclose(state.from_umemd);
> +    state.from_umemd = NULL;
> +    fd_close(&state.from_umemd_fd);
> +}
> +
> +static void postcopy_incoming_qemu_fflush_to_umemd_handler(void *opaque)
> +{
> +    assert(state.to_umemd != NULL);
> +
> +    nonblock_fflush(state.to_umemd);
> +    if (nonblock_pending_size(state.to_umemd) > 0) {
> +        return;
> +    }
> +
> +    qemu_set_fd_handler(state.to_umemd->fd, NULL, NULL, NULL);
> +    if (state.state & PIS_STATE_QUIT_QUEUED) {
> +        DPRINTF("|= PIS_STATE_QUIT_SENT\n");
> +        state.state |= PIS_STATE_QUIT_SENT;
> +        qemu_fclose(state.to_umemd->file);
> +        state.to_umemd = NULL;
> +        fd_close(&state.to_umemd_fd);
> +        g_free(state.faulted_pages);
> +        state.faulted_pages = NULL;
> +    }
> +}
> +
> +static void postcopy_incoming_qemu_fflush_to_umemd(void)
> +{
> +    qemu_set_fd_handler(state.to_umemd->fd, NULL,
> +                        postcopy_incoming_qemu_fflush_to_umemd_handler, NULL);
> +    postcopy_incoming_qemu_fflush_to_umemd_handler(NULL);
> +}
> +
> +static void postcopy_incoming_qemu_queue_quit(void)
> +{
> +    if (state.state & PIS_STATE_QUIT_QUEUED) {
> +        return;
> +    }
> +
> +    DPRINTF("|= PIS_STATE_QUIT_QUEUED\n");
> +    umem_qemu_quit(state.to_umemd->file);
> +    state.state |= PIS_STATE_QUIT_QUEUED;
> +}
> +
> +static void postcopy_incoming_qemu_send_pages_present(void)
> +{
> +    if (state.faulted_pages->nr > 0) {
> +        umem_qemu_send_pages_present(state.to_umemd->file,
> +                                     state.faulted_pages);
> +        state.faulted_pages->nr = 0;
> +    }
> +}
> +
> +static void postcopy_incoming_qemu_faulted_pages(
> +    const struct umem_pages *pages)
> +{
> +    assert(pages->nr <= MAX_FAULTED_PAGES);
> +    assert(state.faulted_pages != NULL);
> +
> +    if (state.faulted_pages->nr + pages->nr > MAX_FAULTED_PAGES) {
> +        postcopy_incoming_qemu_send_pages_present();
> +    }
> +    memcpy(&state.faulted_pages->pgoffs[state.faulted_pages->nr],
> +           &pages->pgoffs[0], sizeof(pages->pgoffs[0]) * pages->nr);
> +    state.faulted_pages->nr += pages->nr;
> +}
> +
> +static void postcopy_incoming_qemu_cleanup_umem(void);
> +
> +static int postcopy_incoming_qemu_handle_req_one(void)
> +{
> +    int offset = 0;
> +    int ret;
> +    uint8_t cmd;
> +
> +    ret = qemu_peek_buffer(state.from_umemd, &cmd, sizeof(cmd), offset);
> +    offset += sizeof(cmd);
> +    if (ret != sizeof(cmd)) {
> +        return -EAGAIN;
> +    }
> +    DPRINTF("cmd %c\n", cmd);
> +
> +    switch (cmd) {
> +    case UMEM_DAEMON_QUIT:
> +        postcopy_incoming_qemu_recv_quit();
> +        postcopy_incoming_qemu_queue_quit();
> +        postcopy_incoming_qemu_cleanup_umem();
> +        break;
> +    case UMEM_DAEMON_TRIGGER_PAGE_FAULT: {
> +        struct umem_pages *pages =
> +            umem_qemu_trigger_page_fault(state.from_umemd, &offset);
> +        if (pages == NULL) {
> +            return -EAGAIN;
> +        }
> +        if (state.to_umemd_fd >= 0 && !(state.state & PIS_STATE_QUIT_QUEUED)) {
> +            postcopy_incoming_qemu_faulted_pages(pages);
> +            g_free(pages);
> +        }
> +        break;
> +    }
> +    case UMEM_DAEMON_ERROR:
> +        /* umem daemon hit troubles, so it warned us to stop vm execution */
> +        vm_stop(RUN_STATE_IO_ERROR); /* or RUN_STATE_INTERNAL_ERROR */
> +        break;
> +    default:
> +        abort();
> +        break;
> +    }
> +
> +    if (state.from_umemd != NULL) {
> +        qemu_file_skip(state.from_umemd, offset);
> +    }
> +    return 0;
> +}
> +
> +static void postcopy_incoming_qemu_handle_req(void *opaque)
> +{
> +    do {
> +        int ret = postcopy_incoming_qemu_handle_req_one();
> +        if (ret == -EAGAIN) {
> +            break;
> +        }
> +    } while (state.from_umemd != NULL &&
> +             qemu_pending_size(state.from_umemd) > 0);
> +
> +    if (state.to_umemd != NULL) {
> +        if (state.faulted_pages->nr > 0) {
> +            postcopy_incoming_qemu_send_pages_present();
> +        }
> +        postcopy_incoming_qemu_fflush_to_umemd();
> +    }
> +}
> +
> +void postcopy_incoming_qemu_ready(void)
> +{
> +    umem_qemu_ready(state.to_umemd_fd);
> +
> +    state.from_umemd = qemu_fopen_fd(state.from_umemd_fd);
> +    state.to_umemd = qemu_fopen_nonblock(state.to_umemd_fd);
> +    qemu_set_fd_handler(state.from_umemd_fd,
> +                        postcopy_incoming_qemu_handle_req, NULL, NULL);
> +}
> +
> +static void postcopy_incoming_qemu_cleanup_umem(void)
> +{
> +    /* when qemu will quit before completing postcopy, tell umem daemon
> +       to tear down umem device and exit. */
> +    if (state.to_umemd_fd >= 0) {
> +        postcopy_incoming_qemu_queue_quit();
> +        postcopy_incoming_qemu_fflush_to_umemd();
> +    }
> +}
> +
> +void postcopy_incoming_qemu_cleanup(void)
> +{
> +    postcopy_incoming_qemu_cleanup_umem();
> +    if (state.to_umemd != NULL) {
> +        nonblock_wait_for_flush(state.to_umemd);
> +    }
> +}
> +
> +void postcopy_incoming_qemu_pages_unmapped(ram_addr_t addr, ram_addr_t size)
> +{
> +    uint64_t nr = DIV_ROUND_UP(size, state.host_page_size);
> +    size_t len = umem_pages_size(nr);
> +    ram_addr_t end = addr + size;
> +    struct umem_pages *pages;
> +    int i;
> +
> +    if (state.to_umemd_fd < 0 || state.state & PIS_STATE_QUIT_QUEUED) {
> +        return;
> +    }
> +    pages = g_malloc(len);
> +    pages->nr = nr;
> +    for (i = 0; addr < end; addr += state.host_page_size, i++) {
> +        pages->pgoffs[i] = addr >> state.host_page_shift;
> +    }
> +    umem_qemu_send_pages_unmapped(state.to_umemd->file, pages);
> +    g_free(pages);
> +    assert(state.to_umemd != NULL);
> +    postcopy_incoming_qemu_fflush_to_umemd();
> +}
> +
> +/**************************************************************************
> + * incoming umem daemon
> + */
> +
> +static void postcopy_incoming_umem_recv_quit(void)
> +{
> +    if (umemd.state & UMEM_STATE_QUIT_RECEIVED) {
> +        return;
> +    }
> +    DPRINTF("|= UMEM_STATE_QUIT_RECEIVED\n");
> +    umemd.state |= UMEM_STATE_QUIT_RECEIVED;
> +    qemu_fclose(umemd.from_qemu);
> +    umemd.from_qemu = NULL;
> +    fd_close(&umemd.from_qemu_fd);
> +}
> +
> +static void postcopy_incoming_umem_queue_quit(void)
> +{
> +    if (umemd.state & UMEM_STATE_QUIT_QUEUED) {
> +        return;
> +    }
> +    DPRINTF("|= UMEM_STATE_QUIT_QUEUED\n");
> +    umem_daemon_quit(umemd.to_qemu->file);
> +    umemd.state |= UMEM_STATE_QUIT_QUEUED;
> +}
> +
> +static void postcopy_incoming_umem_send_eoc_req(void)
> +{
> +    struct qemu_umem_req req;
> +
> +    if (umemd.state & UMEM_STATE_EOC_SENT) {
> +        return;
> +    }
> +
> +    DPRINTF("|= UMEM_STATE_EOC_SENT\n");
> +    req.cmd = QEMU_UMEM_REQ_EOC;
> +    postcopy_incoming_send_req(umemd.mig_write->file, &req);
> +    umemd.state |= UMEM_STATE_EOC_SENT;
> +    qemu_fclose(umemd.mig_write->file);
> +    umemd.mig_write = NULL;
> +    fd_close(&umemd.mig_write_fd);
> +}
> +
> +static void postcopy_incoming_umem_send_page_req(RAMBlock *block)
> +{
> +    struct qemu_umem_req req;
> +    int bit;
> +    uint64_t target_pgoff;
> +    int i;
> +
> +    umemd.page_request->nr = MAX_REQUESTS;
> +    umem_get_page_request(block->umem, umemd.page_request);
> +    DPRINTF("id %s nr %"PRId64" offs 0x%"PRIx64" 0x%"PRIx64"\n",
> +            block->idstr, (uint64_t)umemd.page_request->nr,
> +            (uint64_t)umemd.page_request->pgoffs[0],
> +            (uint64_t)umemd.page_request->pgoffs[1]);
> +
> +    if (umemd.last_block_write != block) {
> +        req.cmd = QEMU_UMEM_REQ_ON_DEMAND;
> +        req.idstr = block->idstr;
> +    } else {
> +        req.cmd = QEMU_UMEM_REQ_ON_DEMAND_CONT;
> +    }
> +
> +    req.nr = 0;
> +    req.pgoffs = umemd.target_pgoffs;
> +    if (TARGET_PAGE_SIZE >= umemd.host_page_size) {
> +        for (i = 0; i < umemd.page_request->nr; i++) {
> +            target_pgoff = umemd.page_request->pgoffs[i] >>
> +                umemd.host_to_target_page_shift;
> +            bit = (block->offset >> TARGET_PAGE_BITS) + target_pgoff;
> +
> +            if (!test_and_set_bit(bit, umemd.phys_requested)) {
> +                req.pgoffs[req.nr] = target_pgoff;
> +                req.nr++;
> +            }
> +        }
> +    } else {
> +        for (i = 0; i < umemd.page_request->nr; i++) {
> +            int j;
> +            target_pgoff = umemd.page_request->pgoffs[i] <<
> +                umemd.host_to_target_page_shift;
> +            bit = (block->offset >> TARGET_PAGE_BITS) + target_pgoff;
> +
> +            for (j = 0; j < umemd.nr_target_pages_per_host_page; j++) {
> +                if (!test_and_set_bit(bit + j, umemd.phys_requested)) {
> +                    req.pgoffs[req.nr] = target_pgoff + j;
> +                    req.nr++;
> +                }
> +            }
> +        }
> +    }
> +
> +    DPRINTF("id %s nr %d offs 0x%"PRIx64" 0x%"PRIx64"\n",
> +            block->idstr, req.nr, req.pgoffs[0], req.pgoffs[1]);
> +    if (req.nr > 0 && umemd.mig_write != NULL) {
> +        postcopy_incoming_send_req(umemd.mig_write->file, &req);
> +        umemd.last_block_write = block;
> +    }
> +}
> +
> +static void postcopy_incoming_umem_send_pages_present(void)
> +{
> +    if (umemd.present_request->nr > 0) {
> +        umem_daemon_send_pages_present(umemd.to_qemu->file,
> +                                       umemd.present_request);
> +        umemd.present_request->nr = 0;
> +    }
> +}
> +
> +static void postcopy_incoming_umem_pages_present_one(
> +    uint32_t nr, const uint64_t *pgoffs, uint64_t ramblock_pgoffset)
> +{
> +    uint32_t i;
> +    assert(nr <= MAX_PRESENT_REQUESTS);
> +
> +    if (umemd.present_request->nr + nr > MAX_PRESENT_REQUESTS) {
> +        postcopy_incoming_umem_send_pages_present();
> +    }
> +
> +    for (i = 0; i < nr; i++) {
> +        umemd.present_request->pgoffs[umemd.present_request->nr + i] =
> +            pgoffs[i] + ramblock_pgoffset;
> +    }
> +    umemd.present_request->nr += nr;
> +}
> +
> +static void postcopy_incoming_umem_pages_present(
> +    const struct umem_pages *page_cached, uint64_t ramblock_pgoffset)
> +{
> +    uint32_t left = page_cached->nr;
> +    uint32_t offset = 0;
> +
> +    while (left > 0) {
> +        uint32_t nr = MIN(left, MAX_PRESENT_REQUESTS);
> +        postcopy_incoming_umem_pages_present_one(
> +            nr, &page_cached->pgoffs[offset], ramblock_pgoffset);
> +
> +        left -= nr;
> +        offset += nr;
> +    }
> +}
> +
> +static int postcopy_incoming_umem_ram_load(void)
> +{
> +    ram_addr_t offset;
> +    int flags;
> +
> +    int ret;
> +    size_t skip = 0;
> +    uint64_t be64;
> +    RAMBlock *block;
> +
> +    void *shmem;
> +    int error;
> +    int i;
> +    int bit;
> +
> +    if (umemd.version_id != RAM_SAVE_VERSION_ID) {
> +        return -EINVAL;
> +    }
> +
> +    ret = qemu_peek_buffer(umemd.mig_read, (uint8_t*)&be64, sizeof(be64),
> +                           skip);
> +    skip += ret;
> +    if (ret != sizeof(be64)) {
> +        return -EAGAIN;
> +    }
> +    offset = be64_to_cpu(be64);
> +
> +    flags = offset & ~TARGET_PAGE_MASK;
> +    offset &= TARGET_PAGE_MASK;
> +
> +    assert(!(flags & RAM_SAVE_FLAG_MEM_SIZE));
> +
> +    if (flags & RAM_SAVE_FLAG_EOS) {
> +        DPRINTF("RAM_SAVE_FLAG_EOS\n");
> +        postcopy_incoming_umem_send_eoc_req();
> +
> +        qemu_fclose(umemd.mig_read);
> +        umemd.mig_read = NULL;
> +        fd_close(&umemd.mig_read_fd);
> +        umemd.state |= UMEM_STATE_EOS_RECEIVED;
> +
> +        postcopy_incoming_umem_queue_quit();
> +        DPRINTF("|= UMEM_STATE_EOS_RECEIVED\n");
> +        return 0;
> +    }
> +
> +    block = NULL;
> +    if (flags & RAM_SAVE_FLAG_CONTINUE) {
> +        block = umemd.last_block_read;
> +    } else {
> +        uint8_t len;
> +        char id[256];
> +
> +        ret = qemu_peek_buffer(umemd.mig_read, &len, sizeof(len), skip);
> +        skip += ret;
> +        if (ret != sizeof(len)) {
> +            return -EAGAIN;
> +        }
> +        ret = qemu_peek_buffer(umemd.mig_read, (uint8_t*)id, len, skip);
> +        skip += ret;
> +        if (ret != len) {
> +            return -EAGAIN;
> +        }
> +        block = ram_find_block(id, len);
> +    }
> +
> +    if (block == NULL) {
> +        return -EINVAL;
> +    }
> +    umemd.last_block_read = block;
> +    shmem = block->host + offset;
> +
> +    if (flags & RAM_SAVE_FLAG_COMPRESS) {
> +        uint8_t ch;
> +        ret = qemu_peek_buffer(umemd.mig_read, &ch, sizeof(ch), skip);
> +        skip += ret;
> +        if (ret != sizeof(ch)) {
> +            return -EAGAIN;
> +        }
> +        memset(shmem, ch, TARGET_PAGE_SIZE);
> +    } else if (flags & RAM_SAVE_FLAG_PAGE) {
> +        ret = qemu_peek_buffer(umemd.mig_read, shmem, TARGET_PAGE_SIZE, skip);
> +        skip += ret;
> +        if (ret != TARGET_PAGE_SIZE){
> +            return -EAGAIN;
> +        }
> +    }
> +    qemu_file_skip(umemd.mig_read, skip);
> +
> +    error = qemu_file_get_error(umemd.mig_read);
> +    if (error) {
> +        DPRINTF("error %d\n", error);
> +        return error;
> +    }
> +
> +    qemu_madvise(shmem, TARGET_PAGE_SIZE, QEMU_MADV_DONTNEED);
> +
> +    umemd.page_cached->nr = 0;
> +    bit = (umemd.last_block_read->offset + offset) >> TARGET_PAGE_BITS;
> +    if (!test_and_set_bit(bit, umemd.phys_received)) {
> +        if (TARGET_PAGE_SIZE >= umemd.host_page_size) {
> +            uint64_t pgoff = offset >> umemd.host_page_shift;
> +            for (i = 0; i < umemd.nr_host_pages_per_target_page; i++) {
> +                umemd.page_cached->pgoffs[umemd.page_cached->nr] = pgoff + i;
> +                umemd.page_cached->nr++;
> +            }
> +        } else {
> +            bool mark_cache = true;
> +            for (i = 0; i < umemd.nr_target_pages_per_host_page; i++) {
> +                if (!test_bit(bit + i, umemd.phys_received)) {
> +                    mark_cache = false;
> +                    break;
> +                }
> +            }
> +            if (mark_cache) {
> +                umemd.page_cached->pgoffs[0] = offset >> umemd.host_page_shift;
> +                umemd.page_cached->nr = 1;
> +            }
> +        }
> +    }
> +
> +    if (umemd.page_cached->nr > 0) {
> +        umem_mark_page_cached(umemd.last_block_read->umem, umemd.page_cached);
> +
> +        if (!(umemd.state & UMEM_STATE_QUIT_QUEUED) && umemd.to_qemu_fd >=0 &&
> +            (incoming_postcopy_flags & INCOMING_FLAGS_FAULT_REQUEST)) {
> +            uint64_t ramblock_pgoffset;
> +
> +            ramblock_pgoffset =
> +                umemd.last_block_read->offset >> umemd.host_page_shift;
> +            postcopy_incoming_umem_pages_present(umemd.page_cached,
> +                                                 ramblock_pgoffset);
> +        }
> +    }
> +
> +    return 0;
> +}
> +
> +static bool postcopy_incoming_umem_check_umem_done(void)
> +{
> +    bool all_done = true;
> +    RAMBlock *block;
> +
> +    QLIST_FOREACH(block, &ram_list.blocks, next) {
> +        UMem *umem = block->umem;
> +        if (umem != NULL && umem->nsets == umem->nbits) {
> +            umem_unmap_shmem(umem);
> +            umem_destroy(umem);
> +            block->umem = NULL;
> +        }
> +        if (block->umem != NULL) {
> +            all_done = false;
> +        }
> +    }
> +    return all_done;
> +}
> +
> +static bool postcopy_incoming_umem_page_faulted(const struct umem_pages *pages)
> +{
> +    int i;
> +
> +    for (i = 0; i < pages->nr; i++) {
> +        ram_addr_t addr = pages->pgoffs[i] << umemd.host_page_shift;
> +        RAMBlock *block = qemu_get_ram_block(addr);
> +        addr -= block->offset;
> +        umem_remove_shmem(block->umem, addr, umemd.host_page_size);
> +    }
> +    return postcopy_incoming_umem_check_umem_done();
> +}
> +
> +static bool
> +postcopy_incoming_umem_page_unmapped(const struct umem_pages *pages)
> +{
> +    RAMBlock *block;
> +    ram_addr_t addr;
> +    int i;
> +
> +    struct qemu_umem_req req = {
> +        .cmd = QEMU_UMEM_REQ_REMOVE,
> +        .nr = 0,
> +        .pgoffs = (uint64_t*)pages->pgoffs,
> +    };
> +
> +    addr = pages->pgoffs[0] << umemd.host_page_shift;
> +    block = qemu_get_ram_block(addr);
> +
> +    for (i = 0; i < pages->nr; i++)  {
> +        int pgoff;
> +
> +        addr = pages->pgoffs[i] << umemd.host_page_shift;
> +        pgoff = addr >> TARGET_PAGE_BITS;
> +        if (!test_bit(pgoff, umemd.phys_received) &&
> +            !test_bit(pgoff, umemd.phys_requested)) {
> +            req.pgoffs[req.nr] = pgoff;
> +            req.nr++;
> +        }
> +        set_bit(pgoff, umemd.phys_received);
> +        set_bit(pgoff, umemd.phys_requested);
> +
> +        umem_remove_shmem(block->umem,
> +                          addr - block->offset, umemd.host_page_size);
> +    }
> +    if (req.nr > 0 && umemd.mig_write != NULL) {
> +        req.idstr = block->idstr;
> +        postcopy_incoming_send_req(umemd.mig_write->file, &req);
> +    }
> +
> +    return postcopy_incoming_umem_check_umem_done();
> +}
> +
> +static void postcopy_incoming_umem_done(void)
> +{
> +    postcopy_incoming_umem_send_eoc_req();
> +    postcopy_incoming_umem_queue_quit();
> +}
> +
> +static int postcopy_incoming_umem_handle_qemu(void)
> +{
> +    int ret;
> +    int offset = 0;
> +    uint8_t cmd;
> +
> +    ret = qemu_peek_buffer(umemd.from_qemu, &cmd, sizeof(cmd), offset);
> +    offset += sizeof(cmd);
> +    if (ret != sizeof(cmd)) {
> +        return -EAGAIN;
> +    }
> +    DPRINTF("cmd %c\n", cmd);
> +    switch (cmd) {
> +    case UMEM_QEMU_QUIT:
> +        postcopy_incoming_umem_recv_quit();
> +        postcopy_incoming_umem_done();
> +        break;
> +    case UMEM_QEMU_PAGE_FAULTED: {
> +        struct umem_pages *pages = umem_recv_pages(umemd.from_qemu,
> +                                                   &offset);
> +        if (pages == NULL) {
> +            return -EAGAIN;
> +        }
> +        if (postcopy_incoming_umem_page_faulted(pages)){
> +            postcopy_incoming_umem_done();
> +        }
> +        g_free(pages);
> +        break;
> +    }
> +    case UMEM_QEMU_PAGE_UNMAPPED: {
> +        struct umem_pages *pages = umem_recv_pages(umemd.from_qemu,
> +                                                   &offset);
> +        if (pages == NULL) {
> +            return -EAGAIN;
> +        }
> +        if (postcopy_incoming_umem_page_unmapped(pages)){
> +            postcopy_incoming_umem_done();
> +        }
> +        g_free(pages);
> +        break;
> +    }
> +    default:
> +        abort();
> +        break;
> +    }
> +    if (umemd.from_qemu != NULL) {
> +        qemu_file_skip(umemd.from_qemu, offset);
> +    }
> +    return 0;
> +}
> +
> +static void set_fd(int fd, fd_set *fds, int *nfds)
> +{
> +    FD_SET(fd, fds);
> +    if (fd > *nfds) {
> +        *nfds = fd;
> +    }
> +}
> +
> +static int postcopy_incoming_umemd_main_loop(void)
> +{
> +    fd_set writefds;
> +    fd_set readfds;
> +    int nfds;
> +    RAMBlock *block;
> +    int ret;
> +
> +    int pending_size;
> +    bool get_page_request;
> +
> +    nfds = -1;
> +    FD_ZERO(&writefds);
> +    FD_ZERO(&readfds);
> +
> +    if (umemd.mig_write != NULL) {
> +        pending_size = nonblock_pending_size(umemd.mig_write);
> +        if (pending_size > 0) {
> +            set_fd(umemd.mig_write_fd, &writefds, &nfds);
> +        }
> +    } else {
> +        pending_size = 0;
> +    }
> +
> +#define PENDING_SIZE_MAX (MAX_REQUESTS * sizeof(uint64_t) * 2)
> +    /* If page request to the migration source is accumulated,
> +       suspend getting page fault request. */
> +    get_page_request = (pending_size <= PENDING_SIZE_MAX);
> +
> +    if (get_page_request) {
> +        QLIST_FOREACH(block, &ram_list.blocks, next) {
> +            if (block->umem != NULL) {
> +                set_fd(block->umem->fd, &readfds, &nfds);
> +            }
> +        }
> +    }
> +
> +    if (umemd.mig_read_fd >= 0) {
> +        set_fd(umemd.mig_read_fd, &readfds, &nfds);
> +    }
> +
> +    if (umemd.to_qemu != NULL &&
> +        nonblock_pending_size(umemd.to_qemu) > 0) {
> +        set_fd(umemd.to_qemu_fd, &writefds, &nfds);
> +    }
> +    if (umemd.from_qemu_fd >= 0) {
> +        set_fd(umemd.from_qemu_fd, &readfds, &nfds);
> +    }
> +
> +    ret = select(nfds + 1, &readfds, &writefds, NULL, NULL);
> +    if (ret == -1) {
> +        if (errno == EINTR) {
> +            return 0;
> +        }
> +        return ret;
> +    }
> +
> +    if (umemd.mig_write_fd >= 0 && FD_ISSET(umemd.mig_write_fd, &writefds)) {
> +        nonblock_fflush(umemd.mig_write);
> +    }
> +    if (umemd.to_qemu_fd >= 0 && FD_ISSET(umemd.to_qemu_fd, &writefds)) {
> +        nonblock_fflush(umemd.to_qemu);
> +    }
> +    if (get_page_request) {
> +        QLIST_FOREACH(block, &ram_list.blocks, next) {
> +            if (block->umem != NULL && FD_ISSET(block->umem->fd, &readfds)) {
> +                postcopy_incoming_umem_send_page_req(block);
> +            }
> +        }
> +    }
> +    if (umemd.mig_read_fd >= 0 && FD_ISSET(umemd.mig_read_fd, &readfds)) {
> +        do {
> +            ret = postcopy_incoming_umem_ram_load();
> +            if (ret == -EAGAIN) {
> +                break;
> +            }
> +            if (ret < 0) {
> +                return ret;
> +            }
> +        } while (umemd.mig_read != NULL &&
> +                 qemu_pending_size(umemd.mig_read) > 0);
> +    }
> +    if (umemd.from_qemu_fd >= 0 && FD_ISSET(umemd.from_qemu_fd, &readfds)) {
> +        do {
> +            ret = postcopy_incoming_umem_handle_qemu();
> +            if (ret == -EAGAIN) {
> +                break;
> +            }
> +        } while (umemd.from_qemu != NULL &&
> +                 qemu_pending_size(umemd.from_qemu) > 0);
> +    }
> +
> +    if (umemd.mig_write != NULL) {
> +        nonblock_fflush(umemd.mig_write);
> +    }
> +    if (umemd.to_qemu != NULL) {
> +        if (!(umemd.state & UMEM_STATE_QUIT_QUEUED)) {
> +            postcopy_incoming_umem_send_pages_present();
> +        }
> +        nonblock_fflush(umemd.to_qemu);
> +        if ((umemd.state & UMEM_STATE_QUIT_QUEUED) &&
> +            nonblock_pending_size(umemd.to_qemu) == 0) {
> +            DPRINTF("|= UMEM_STATE_QUIT_SENT\n");
> +            qemu_fclose(umemd.to_qemu->file);
> +            umemd.to_qemu = NULL;
> +            fd_close(&umemd.to_qemu_fd);
> +            umemd.state |= UMEM_STATE_QUIT_SENT;
> +        }
> +    }
> +
> +    return (umemd.state & UMEM_STATE_END_MASK) == UMEM_STATE_END_MASK;
> +}
> +
> +static void postcopy_incoming_umemd(void)
> +{
> +    ram_addr_t last_ram_offset;
> +    int nbits;
> +    RAMBlock *block;
> +    int ret;
> +
> +    qemu_daemon(1, 1);
> +    signal(SIGPIPE, SIG_IGN);
> +    DPRINTF("daemon pid: %d\n", getpid());
> +
> +    umemd.page_request = g_malloc(umem_pages_size(MAX_REQUESTS));
> +
> +    umemd.page_cached = g_malloc(
> +        umem_pages_size(MAX_REQUESTS *
> +                        (TARGET_PAGE_SIZE >= umemd.host_page_size ?
> +                         1: umemd.nr_host_pages_per_target_page)));
> +
> +    umemd.target_pgoffs =
> +        g_new(uint64_t, MAX_REQUESTS *
> +              MAX(umemd.nr_host_pages_per_target_page,
> +                  umemd.nr_target_pages_per_host_page));
> +    umemd.present_request = g_malloc(umem_pages_size(MAX_PRESENT_REQUESTS));
> +    umemd.present_request->nr = 0;
> +
> +    last_ram_offset = qemu_last_ram_offset();
> +    nbits = last_ram_offset >> TARGET_PAGE_BITS;
> +    umemd.phys_requested = g_new0(unsigned long, BITS_TO_LONGS(nbits));
> +    umemd.phys_received = g_new0(unsigned long, BITS_TO_LONGS(nbits));
> +    umemd.last_block_read = NULL;
> +    umemd.last_block_write = NULL;
> +
> +    QLIST_FOREACH(block, &ram_list.blocks, next) {
> +        UMem *umem = block->umem;
> +        umem->umem = NULL;      /* umem mapping area has VM_DONT_COPY flag,
> +                                   so we lost those mappings by fork */
> +        block->host = umem_map_shmem(umem);
> +        umem_close_shmem(umem);
> +    }
> +    umem_daemon_ready(umemd.to_qemu_fd);
> +    umemd.to_qemu = qemu_fopen_nonblock(umemd.to_qemu_fd);
> +
> +    /* wait for qemu to disown migration_fd */
> +    umem_daemon_wait_for_qemu(umemd.from_qemu_fd);
> +    umemd.from_qemu = qemu_fopen_fd(umemd.from_qemu_fd);
> +
> +    DPRINTF("entering umemd main loop\n");
> +    for (;;) {
> +        ret = postcopy_incoming_umemd_main_loop();
> +        if (ret != 0) {
> +            break;
> +        }
> +    }
> +    DPRINTF("exiting umemd main loop\n");
> +
> +    /* This daemon forked from qemu and the parent qemu is still running.
> +     * Cleanups of linked libraries like SDL should not be triggered,
> +     * otherwise the parent qemu may use resources which was already freed.
> +     */
> +    fflush(stdout);
> +    fflush(stderr);
> +    _exit(ret < 0? EXIT_FAILURE: 0);
> +}


> index 057c810..598ad4c 100644
> --- a/qemu-common.h
> +++ b/qemu-common.h
> @@ -17,6 +17,7 @@ typedef struct DeviceState DeviceState;
>  
>  struct Monitor;
>  typedef struct Monitor Monitor;
> +typedef struct UMem UMem;

Does this belong in the umem patch?

> diff --git a/savevm.c b/savevm.c
> index bd4b5bf..74b15e7 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -1938,6 +1938,7 @@ int qemu_loadvm_state(QEMUFile *f)
>      uint8_t section_type;
>      unsigned int v;
>      int ret;
> +    QEMUFile *orig_f = NULL;

This is hack-y at least.  But I don't have a good suggestion right now :-(

> +        case QEMU_VM_POSTCOPY:
> +            if (incoming_postcopy) {
> +                /* VMStateDescription:pre/post_load and
> +                 * cpu_sychronize_all_post_init() may fault on guest RAM.
> +                 * (MSR_KVM_WALL_CLOCK, MSR_KVM_SYSTEM_TIME)
> +                 * postcopy daemon needs to be forked before the fault.
> +                 */
> +                uint32_t size = qemu_get_be32(f);
> +                uint8_t *buf = g_malloc(size);

This comes from the network.  Checking that the value is "reasonable"
looks like a good idea (yes, I know that migration is especially bad
about checking values that come from the network).
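
A minimal sketch of such a check, not taken from the patch.  The cap
POSTCOPY_SECTION_MAX_SIZE and the helper postcopy_alloc_section_buf()
are hypothetical names, and rejecting a zero length is an assumption;
plain malloc() stands in for g_malloc() so the fragment builds on its own.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical upper bound for the section size read from the stream.
 * The real limit would have to be chosen by the patch author. */
#define POSTCOPY_SECTION_MAX_SIZE (16u * 1024 * 1024)

/*
 * Allocate a buffer for a length field received from the migration
 * stream.  Returns 0 and sets *buf on success, -EINVAL when the
 * advertised size is zero or above the cap, -ENOMEM when the
 * allocation itself fails.
 */
static int postcopy_alloc_section_buf(uint32_t size, uint8_t **buf)
{
    assert(buf != NULL);

    *buf = NULL;
    if (size == 0 || size > POSTCOPY_SECTION_MAX_SIZE) {
        /* Untrusted value: refuse before touching the allocator. */
        return -EINVAL;
    }
    *buf = malloc(size);
    return *buf != NULL ? 0 : -ENOMEM;
}
```

In qemu_loadvm_state() the caller would then fail the load with the
returned error instead of allocating whatever size the peer advertised.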


> @@ -2032,11 +2069,17 @@ int qemu_loadvm_state(QEMUFile *f)
>          }
>      }
>  
> +    fprintf(stderr, "%s:%d QEMU_VM_EOF\n", __func__, __LINE__);

DPRINTF or removal?

Later, Juan.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 36/41] postcopy: implement incoming part of postcopy live migration
  2012-06-04  9:57 ` [PATCH v2 36/41] postcopy: implement incoming part of postcopy live migration Isaku Yamahata
  2012-06-14 21:56   ` Juan Quintela
@ 2012-06-14 21:58   ` Juan Quintela
  1 sibling, 0 replies; 58+ messages in thread
From: Juan Quintela @ 2012-06-14 21:58 UTC (permalink / raw)
  To: Isaku Yamahata
  Cc: benoit.hudzia, aarcange, aliguori, kvm, satoshi.itoh, stefanha,
	t.hirofuchi, dlaor, qemu-devel, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini

Isaku Yamahata <yamahata@valinux.co.jp> wrote:
> This patch implements postcopy live migration for incoming part
>
> Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>


> +void ram_save_set_params(const MigrationParams *params, void *opaque);

> -    register_savevm_live(NULL, "ram", 0, RAM_SAVE_VERSION_ID, NULL,
> -                         ram_save_live, NULL, ram_load, NULL);
> +    register_savevm_live(NULL, "ram", 0, RAM_SAVE_VERSION_ID,
> +                         ram_save_set_params, ram_save_live, NULL,
> +                         incoming_postcopy ?
> +                         postcopy_incoming_ram_load : ram_load, NULL);


ram_save_set_params() is used in this patch but only defined in the next one.

Later, Juan.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 37/41] postcopy: implement outgoing part of postcopy live migration
  2012-06-04  9:57 ` [PATCH v2 37/41] postcopy: implement outgoing " Isaku Yamahata
@ 2012-06-14 22:12   ` Juan Quintela
  0 siblings, 0 replies; 58+ messages in thread
From: Juan Quintela @ 2012-06-14 22:12 UTC (permalink / raw)
  To: Isaku Yamahata
  Cc: qemu-devel, kvm, owasserm, avi, pbonzini, aliguori, stefanha,
	dlaor, mdroth, yoshikawa.takuya, benoit.hudzia, aarcange,
	t.hirofuchi, satoshi.itoh

Isaku Yamahata <yamahata@valinux.co.jp> wrote:
> This patch implements postcopy live migration for outgoing part
>
> Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
> ---
> Changes v1 -> v2:
> - fix parameter to qemu_fdopen()
> - handle QEMU_UMEM_REQ_EOC properly
>   when PO_STATE_ALL_PAGES_SENT, QEMU_UMEM_REQ_EOC request was ignored.
>   handle properly it.
> - flush on-demand page unconditionally
> - improve postcopy_outgoing_ram_save_live and postcopy_outgoing_begin()
> - use qemu_fopen_fd
> - use memory api instead of obsolete api
> - segv in postcopy_outgoing_check_all_ram_sent()
> - catch up qapi change
> ---
>  arch_init.c               |   19 ++-
>  migration-exec.c          |    4 +
>  migration-fd.c            |   17 ++
>  migration-postcopy-stub.c |   22 +++
>  migration-postcopy.c      |  450 +++++++++++++++++++++++++++++++++++++++++++++
>  migration-tcp.c           |   25 ++-
>  migration-unix.c          |   26 ++-
>  migration.c               |   32 +++-
>  migration.h               |   12 ++
>  savevm.c                  |   22 ++-
>  sysemu.h                  |    2 +-
>  11 files changed, 614 insertions(+), 17 deletions(-)
>
> diff --git a/arch_init.c b/arch_init.c
> index 22d9691..3599e5c 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -154,6 +154,13 @@ static int is_dup_page(uint8_t *page)
>      return 1;
>  }
>  
> +static bool outgoing_postcopy = false;
> +
> +void ram_save_set_params(const MigrationParams *params, void *opaque)
> +{
> +    outgoing_postcopy = params->postcopy;
> +}
> +
>  static RAMBlock *last_block_sent = NULL;
>  static uint64_t bytes_transferred;
>  
> @@ -343,6 +350,15 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
>      uint64_t expected_time = 0;
>      int ret;
>  
> +    if (stage == 1) {
> +        bytes_transferred = 0;
> +        last_block_sent = NULL;
> +        ram_save_set_last_block(NULL, 0);
> +    }
> +    if (outgoing_postcopy) {
> +        return postcopy_outgoing_ram_save_live(f, stage, opaque);
> +    }
> +
>      if (stage < 0) {
>          memory_global_dirty_log_stop();
>          return 0;
> @@ -351,9 +367,6 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
>      memory_global_sync_dirty_bitmap(get_system_memory());
>  
>      if (stage == 1) {
> -        bytes_transferred = 0;
> -        last_block_sent = NULL;
> -        ram_save_set_last_block(NULL, 0);
>          sort_ram_list();
>  
>          /* Make sure all dirty bits are set */
> diff --git a/migration-exec.c b/migration-exec.c
> index 7f08b3b..a90da5c 100644
> --- a/migration-exec.c
> +++ b/migration-exec.c
> @@ -64,6 +64,10 @@ int exec_start_outgoing_migration(MigrationState *s, const char *command)
>  {
>      FILE *f;
>  
> +    if (s->params.postcopy) {
> +        return -ENOSYS;
> +    }
> +
>      f = popen(command, "w");
>      if (f == NULL) {
>          DPRINTF("Unable to popen exec target\n");
> diff --git a/migration-fd.c b/migration-fd.c
> index 42b8162..83b5f18 100644
> --- a/migration-fd.c
> +++ b/migration-fd.c
> @@ -90,6 +90,23 @@ int fd_start_outgoing_migration(MigrationState *s, const char *fdname)
>      s->write = fd_write;
>      s->close = fd_close;
>  
> +    if (s->params.postcopy) {
> +        int flags = fcntl(s->fd, F_GETFL);
> +        if ((flags & O_ACCMODE) != O_RDWR) {
> +            goto err_after_open;
> +        }
> +
> +        s->fd_read = dup(s->fd);
> +        if (s->fd_read == -1) {
> +            goto err_after_open;
> +        }
> +        s->file_read = qemu_fopen_fd(s->fd_read);
> +        if (s->file_read == NULL) {
> +            close(s->fd_read);
> +            goto err_after_open;
> +        }
> +    }
> +
>      migrate_fd_connect(s);
>      return 0;
>  
> diff --git a/migration-postcopy-stub.c b/migration-postcopy-stub.c
> index f9ebcbe..9c64827 100644
> --- a/migration-postcopy-stub.c
> +++ b/migration-postcopy-stub.c
> @@ -24,6 +24,28 @@
>  #include "sysemu.h"
>  #include "migration.h"
>  
> +int postcopy_outgoing_create_read_socket(MigrationState *s)
> +{
> +    return -ENOSYS;
> +}
> +
> +int postcopy_outgoing_ram_save_live(Monitor *mon,
> +                                    QEMUFile *f, int stage, void *opaque)
> +{
> +    return -ENOSYS;
> +}
> +
> +void *postcopy_outgoing_begin(MigrationState *ms)
> +{
> +    return NULL;
> +}
> +
> +int postcopy_outgoing_ram_save_background(Monitor *mon, QEMUFile *f,
> +                                          void *postcopy)
> +{
> +    return -ENOSYS;
> +}
> +
>  int postcopy_incoming_init(const char *incoming, bool incoming_postcopy)
>  {
>      return -ENOSYS;
> diff --git a/migration-postcopy.c b/migration-postcopy.c
> index 5913e05..eb37094 100644
> --- a/migration-postcopy.c
> +++ b/migration-postcopy.c
> @@ -177,6 +177,456 @@ static void postcopy_incoming_send_req(QEMUFile *f,
>      }
>  }
>  
> +static int postcopy_outgoing_recv_req_idstr(QEMUFile *f,
> +                                            struct qemu_umem_req *req,
> +                                            size_t *offset)
> +{
> +    int ret;
> +
> +    req->len = qemu_peek_byte(f, *offset);
> +    *offset += 1;
> +    if (req->len == 0) {
> +        return -EAGAIN;
> +    }
> +    req->idstr = g_malloc((int)req->len + 1);
> +    ret = qemu_peek_buffer(f, (uint8_t*)req->idstr, req->len, *offset);
> +    *offset += ret;
> +    if (ret != req->len) {
> +        g_free(req->idstr);
> +        req->idstr = NULL;
> +        return -EAGAIN;
> +    }
> +    req->idstr[req->len] = 0;
> +    return 0;
> +}
> +
> +static int postcopy_outgoing_recv_req_pgoffs(QEMUFile *f,
> +                                             struct qemu_umem_req *req,
> +                                             size_t *offset)
> +{
> +    int ret;
> +    uint32_t be32;
> +    uint32_t i;
> +
> +    ret = qemu_peek_buffer(f, (uint8_t*)&be32, sizeof(be32), *offset);
> +    *offset += sizeof(be32);
> +    if (ret != sizeof(be32)) {
> +        return -EAGAIN;
> +    }
> +
> +    req->nr = be32_to_cpu(be32);
> +    req->pgoffs = g_new(uint64_t, req->nr);
> +    for (i = 0; i < req->nr; i++) {
> +        uint64_t be64;
> +        ret = qemu_peek_buffer(f, (uint8_t*)&be64, sizeof(be64), *offset);
> +        *offset += sizeof(be64);
> +        if (ret != sizeof(be64)) {
> +            g_free(req->pgoffs);
> +            req->pgoffs = NULL;
> +            return -EAGAIN;
> +        }
> +        req->pgoffs[i] = be64_to_cpu(be64);
> +    }
> +    return 0;
> +}
> +
> +static int postcopy_outgoing_recv_req(QEMUFile *f, struct qemu_umem_req *req)
> +{
> +    int size;
> +    int ret;
> +    size_t offset = 0;
> +
> +    size = qemu_peek_buffer(f, (uint8_t*)&req->cmd, 1, offset);
> +    if (size <= 0) {
> +        return -EAGAIN;
> +    }
> +    offset += 1;
> +
> +    switch (req->cmd) {
> +    case QEMU_UMEM_REQ_INIT:
> +    case QEMU_UMEM_REQ_EOC:
> +        /* nothing */
> +        break;
> +    case QEMU_UMEM_REQ_ON_DEMAND:
> +    case QEMU_UMEM_REQ_BACKGROUND:
> +    case QEMU_UMEM_REQ_REMOVE:
> +        ret = postcopy_outgoing_recv_req_idstr(f, req, &offset);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +        ret = postcopy_outgoing_recv_req_pgoffs(f, req, &offset);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +        break;
> +    case QEMU_UMEM_REQ_ON_DEMAND_CONT:
> +    case QEMU_UMEM_REQ_BACKGROUND_CONT:
> +        ret = postcopy_outgoing_recv_req_pgoffs(f, req, &offset);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +        break;
> +    default:
> +        abort();
> +        break;
> +    }
> +    qemu_file_skip(f, offset);
> +    DPRINTF("cmd %d\n", req->cmd);
> +    return 0;
> +}
> +
> +static void postcopy_outgoing_free_req(struct qemu_umem_req *req)
> +{
> +    g_free(req->idstr);
> +    g_free(req->pgoffs);
> +}
> +
> +/***************************************************************************
> + * outgoing part
> + */
> +
> +#define QEMU_SAVE_LIVE_STAGE_START      0x01    /* = QEMU_VM_SECTION_START */
> +#define QEMU_SAVE_LIVE_STAGE_PART       0x02    /* = QEMU_VM_SECTION_PART */
> +#define QEMU_SAVE_LIVE_STAGE_END        0x03    /* = QEMU_VM_SECTION_END */
> +
> +enum POState {
> +    PO_STATE_ERROR_RECEIVE,
> +    PO_STATE_ACTIVE,
> +    PO_STATE_EOC_RECEIVED,
> +    PO_STATE_ALL_PAGES_SENT,
> +    PO_STATE_COMPLETED,
> +};
> +typedef enum POState POState;
> +
> +struct PostcopyOutgoingState {
> +    POState state;
> +    QEMUFile *mig_read;
> +    int fd_read;
> +    RAMBlock *last_block_read;
> +
> +    QEMUFile *mig_buffered_write;
> +    MigrationState *ms;
> +
> +    /* For nobg mode. Check if all pages are sent */
> +    RAMBlock *block;
> +    ram_addr_t offset;
> +};
> +typedef struct PostcopyOutgoingState PostcopyOutgoingState;
> +
> +int postcopy_outgoing_create_read_socket(MigrationState *s)
> +{
> +    if (!s->params.postcopy) {
> +        return 0;
> +    }
> +
> +    s->fd_read = dup(s->fd);
> +    if (s->fd_read == -1) {
> +        int ret = -errno;
> +        perror("dup");
> +        return ret;
> +    }
> +    s->file_read = qemu_fopen_socket(s->fd_read);
> +    if (s->file_read == NULL) {
> +        return -EINVAL;
> +    }
> +    return 0;
> +}
> +
> +int postcopy_outgoing_ram_save_live(QEMUFile *f, int stage, void *opaque)
> +{
> +    int ret = 0;
> +    DPRINTF("stage %d\n", stage);
> +    switch (stage) {
> +    case QEMU_SAVE_LIVE_STAGE_START:
> +        sort_ram_list();
> +        ram_save_live_mem_size(f);
> +        break;
> +    case QEMU_SAVE_LIVE_STAGE_PART:
> +        ret = 1;
> +        break;
> +    case QEMU_SAVE_LIVE_STAGE_END:
> +        break;
> +    default:
> +        abort();
> +    }
> +    qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
> +    return ret;
> +}
> +
> +/*
> + * return value
> + *   0: continue postcopy mode
> + * > 0: completed postcopy mode.
> + * < 0: error
> + */
> +static int postcopy_outgoing_handle_req(PostcopyOutgoingState *s,
> +                                        const struct qemu_umem_req *req,
> +                                        bool *written)
> +{
> +    int i;
> +    RAMBlock *block;
> +
> +    DPRINTF("cmd %d state %d\n", req->cmd, s->state);
> +    switch(req->cmd) {
> +    case QEMU_UMEM_REQ_INIT:
> +        /* nothing */
> +        break;
> +    case QEMU_UMEM_REQ_EOC:
> +        /* tell to finish migration. */
> +        if (s->state == PO_STATE_ALL_PAGES_SENT) {
> +            s->state = PO_STATE_COMPLETED;
> +            DPRINTF("-> PO_STATE_COMPLETED\n");
> +        } else {
> +            s->state = PO_STATE_EOC_RECEIVED;
> +            DPRINTF("-> PO_STATE_EOC_RECEIVED\n");
> +        }
> +        return 1;
> +    case QEMU_UMEM_REQ_ON_DEMAND:
> +    case QEMU_UMEM_REQ_BACKGROUND:
> +        DPRINTF("idstr: %s\n", req->idstr);
> +        block = ram_find_block(req->idstr, strlen(req->idstr));
> +        if (block == NULL) {
> +            return -EINVAL;
> +        }
> +        s->last_block_read = block;
> +        /* fall through */
> +    case QEMU_UMEM_REQ_ON_DEMAND_CONT:
> +    case QEMU_UMEM_REQ_BACKGROUND_CONT:
> +        DPRINTF("nr %d\n", req->nr);
> +        if (s->mig_buffered_write == NULL) {
> +            assert(s->state == PO_STATE_ALL_PAGES_SENT);
> +            break;
> +        }
> +        for (i = 0; i < req->nr; i++) {
> +            DPRINTF("offs[%d] 0x%"PRIx64"\n", i, req->pgoffs[i]);
> +            int ret = ram_save_page(s->mig_buffered_write, s->last_block_read,
> +                                    req->pgoffs[i] << TARGET_PAGE_BITS);
> +            if (ret > 0) {
> +                *written = true;
> +            }
> +        }
> +        break;
> +    case QEMU_UMEM_REQ_REMOVE:
> +        block = ram_find_block(req->idstr, strlen(req->idstr));
> +        if (block == NULL) {
> +            return -EINVAL;
> +        }
> +        for (i = 0; i < req->nr; i++) {
> +            ram_addr_t offset = req->pgoffs[i] << TARGET_PAGE_BITS;
> +            memory_region_reset_dirty(block->mr, offset, TARGET_PAGE_SIZE,
> +                                      MIGRATION_DIRTY_FLAG);
> +        }
> +        break;
> +    default:
> +        return -EINVAL;
> +    }
> +    return 0;
> +}
> +
> +static void postcopy_outgoing_close_mig_read(PostcopyOutgoingState *s)
> +{
> +    if (s->mig_read != NULL) {
> +        qemu_set_fd_handler(s->fd_read, NULL, NULL, NULL);
> +        qemu_fclose(s->mig_read);
> +        s->mig_read = NULL;
> +        fd_close(&s->fd_read);
> +
> +        s->ms->file_read = NULL;
> +        s->ms->fd_read = -1;
> +    }
> +}
> +
> +static void postcopy_outgoing_completed(PostcopyOutgoingState *s)
> +{
> +    postcopy_outgoing_close_mig_read(s);
> +    s->ms->postcopy = NULL;
> +    g_free(s);
> +}
> +
> +static void postcopy_outgoing_recv_handler(void *opaque)
> +{
> +    PostcopyOutgoingState *s = opaque;
> +    bool written = false;
> +    int ret = 0;
> +
> +    assert(s->state == PO_STATE_ACTIVE ||
> +           s->state == PO_STATE_ALL_PAGES_SENT);
> +
> +    do {
> +        struct qemu_umem_req req = {.idstr = NULL,
> +                                    .pgoffs = NULL};
> +
> +        ret = postcopy_outgoing_recv_req(s->mig_read, &req);
> +        if (ret < 0) {
> +            if (ret == -EAGAIN) {
> +                ret = 0;
> +            }
> +            break;
> +        }
> +
> +        /* Even when s->state == PO_STATE_ALL_PAGES_SENT,
> +           some request can be received like QEMU_UMEM_REQ_EOC */
> +        ret = postcopy_outgoing_handle_req(s, &req, &written);
> +        postcopy_outgoing_free_req(&req);
> +    } while (ret == 0);
> +
> +    /*
> +     * flush buffered_file.
> +     * Although mig_write is rate-limited buffered file, those written pages
> +     * are requested on demand by the destination. So forcibly push
> +     * those pages ignoring rate limiting
> +     */
> +    if (written) {
> +        qemu_buffered_file_drain(s->mig_buffered_write);
> +    }
> +
> +    if (ret < 0) {
> +        switch (s->state) {
> +        case PO_STATE_ACTIVE:
> +            s->state = PO_STATE_ERROR_RECEIVE;
> +            DPRINTF("-> PO_STATE_ERROR_RECEIVE\n");
> +            break;
> +        case PO_STATE_ALL_PAGES_SENT:
> +            s->state = PO_STATE_COMPLETED;
> +            DPRINTF("-> PO_STATE_ALL_PAGES_SENT\n");
> +            break;
> +        default:
> +            abort();
> +        }
> +    }
> +    if (s->state == PO_STATE_ERROR_RECEIVE || s->state == PO_STATE_COMPLETED) {
> +        postcopy_outgoing_close_mig_read(s);
> +    }
> +    if (s->state == PO_STATE_COMPLETED) {
> +        DPRINTF("PO_STATE_COMPLETED\n");
> +        MigrationState *ms = s->ms;
> +        postcopy_outgoing_completed(s);
> +        migrate_fd_completed(ms);
> +    }
> +}
> +
> +void *postcopy_outgoing_begin(MigrationState *ms)
> +{
> +    PostcopyOutgoingState *s = g_new(PostcopyOutgoingState, 1);
> +    DPRINTF("outgoing begin\n");
> +    qemu_fflush(ms->file);
> +
> +    s->ms = ms;
> +    s->state = PO_STATE_ACTIVE;
> +    s->fd_read = ms->fd_read;
> +    s->mig_read = ms->file_read;
> +    s->mig_buffered_write = ms->file;
> +    s->block = NULL;
> +    s->offset = 0;
> +
> +    /* Make sure all dirty bits are set */
> +    cpu_physical_memory_set_dirty_tracking(0);
> +    ram_save_memory_set_dirty();
> +
> +    qemu_set_fd_handler(s->fd_read,
> +                        &postcopy_outgoing_recv_handler, NULL, s);
> +    return s;
> +}

Why does it return void *?  It could just return PostcopyOutgoingState *
and be done with it.
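
A minimal sketch of the typed alternative, under the assumption that the
header only needs a forward declaration of the state struct.  The
reduced MigrationState and the two-field PostcopyOutgoingState below are
stand-ins for illustration, and calloc() replaces g_new().

```c
#include <assert.h>
#include <stdlib.h>

/* Forward declaration only: this is all a shared header would have to
 * expose, so callers keep an opaque but typed pointer. */
typedef struct PostcopyOutgoingState PostcopyOutgoingState;

/* Stand-in for MigrationState with just the field this sketch needs. */
typedef struct MigrationState {
    PostcopyOutgoingState *postcopy;    /* typed, instead of void * */
} MigrationState;

/* The full definition can stay private to the postcopy source file. */
struct PostcopyOutgoingState {
    MigrationState *ms;
    int state;
};

/* Same role as in the patch, but the return type names the object. */
PostcopyOutgoingState *postcopy_outgoing_begin(MigrationState *ms)
{
    PostcopyOutgoingState *s;

    assert(ms != NULL);
    s = calloc(1, sizeof(*s));
    if (s == NULL) {
        return NULL;
    }
    s->ms = ms;
    return s;
}
```

With this shape the compiler rejects passing an unrelated pointer where
the postcopy state is expected, which void * would silently accept.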



> @@ -111,11 +119,19 @@ int unix_start_outgoing_migration(MigrationState *s, const char *path)
>  
>      if (ret < 0) {
>          DPRINTF("connect failed\n");
> -        migrate_fd_error(s);
> -        return ret;
> +        goto error_out;
> +    }
> +
> +    ret = postcopy_outgoing_create_read_socket(s);
> +    if (ret < 0) {
> +        goto error_out;
>      }
>      migrate_fd_connect(s);
>      return 0;
> +
> +error_out:
> +    migrate_fd_error(s);
> +    return ret;
>  }
>  

I wonder whether we can refactor this into a single place?  We are
repeating it for each migration type.

> diff --git a/sysemu.h b/sysemu.h
> index 3857cf0..6ee4cd8 100644
> --- a/sysemu.h
> +++ b/sysemu.h
> @@ -79,7 +79,7 @@ void qemu_announce_self(void);
>  bool qemu_savevm_state_blocked(Error **errp);
>  int qemu_savevm_state_begin(QEMUFile *f, const MigrationParams *params);
>  int qemu_savevm_state_iterate(QEMUFile *f);
> -int qemu_savevm_state_complete(QEMUFile *f);
> +int qemu_savevm_state_complete(QEMUFile *f, bool postcopy);
>  void qemu_savevm_state_cancel(QEMUFile *f);
>  int qemu_loadvm_state(QEMUFile *f);

A question: could we just pass the params structure?  Otherwise, if we
need another argument for another extension, we would have to add yet
another parameter.

I still need to take a more detailed look to see if there is an easy
way to integrate postcopy; there is still quite a bit of repeated code.

Later, Juan.

* Re: [PATCH v2 00/41] postcopy live migration
  2012-06-04  9:57 [PATCH v2 00/41] postcopy live migration Isaku Yamahata
                   ` (41 preceding siblings ...)
  2012-06-04 12:37 ` [PATCH v2 00/41] postcopy live migration Anthony Liguori
@ 2012-06-14 22:18 ` Juan Quintela
  42 siblings, 0 replies; 58+ messages in thread
From: Juan Quintela @ 2012-06-14 22:18 UTC (permalink / raw)
  To: Isaku Yamahata
  Cc: benoit.hudzia, aarcange, aliguori, kvm, satoshi.itoh, stefanha,
	t.hirofuchi, dlaor, qemu-devel, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini

Isaku Yamahata <yamahata@valinux.co.jp> wrote:
> After a long time, we have v2. This is the qemu part.
> The linux kernel part is sent separately.
>
> Changes v1 -> v2:
> - split up patches for review
> - buffered file refactored
> - many bug fixes
>   Especially, PV drivers can work with postcopy
> - optimization/heuristic
>
> Patches
> 1 - 30: refactoring existing code and preparation
> 31 - 37: implement postcopy itself (essential part)
> 38 - 41: some optimization/heuristic for postcopy
>

After reviewing the changes, I think we can merge patches 1-30.
For the rest of them we still need another round of review/coding (at
least we need to implement the error handling).

IMHO, it makes no sense to add CONFIG_POSTCOPY; we can just compile the
code in.  Furthermore, the calls in the common code are not ifdefed
anyway.  But that is just my opinion.

Later, Juan.

* Re: [PATCH v2 35/41] postcopy: introduce helper functions for postcopy
  2012-06-14 21:34   ` Juan Quintela
@ 2012-06-16  9:48     ` Isaku Yamahata
  2012-06-16 13:19       ` Juan Quintela
  0 siblings, 1 reply; 58+ messages in thread
From: Isaku Yamahata @ 2012-06-16  9:48 UTC (permalink / raw)
  To: Juan Quintela
  Cc: qemu-devel, kvm, owasserm, avi, pbonzini, aliguori, stefanha,
	dlaor, mdroth, yoshikawa.takuya, benoit.hudzia, aarcange,
	t.hirofuchi, satoshi.itoh

On Thu, Jun 14, 2012 at 11:34:09PM +0200, Juan Quintela wrote:
> Isaku Yamahata <yamahata@valinux.co.jp> wrote:
> > +//#define DEBUG_UMEM
> > +#ifdef DEBUG_UMEM
> > +#include <sys/syscall.h>
> > +#define DPRINTF(format, ...)                                            \
> > +    do {                                                                \
> > +        printf("%d:%ld %s:%d "format, getpid(), syscall(SYS_gettid),    \
> > +               __func__, __LINE__, ## __VA_ARGS__);                     \
> > +    } while (0)
> 
> This should be in a header file that is linux specific?  And (at least
> on my systems) gettid is already defined on glibc.

I'll remove getpid/gettid. They were just for debugging in the early phase
and are not necessary any more.


> > +#else
> > +#define DPRINTF(format, ...)    do { } while (0)
> > +#endif
> 
> 
> > +
> > +#define DEV_UMEM        "/dev/umem"
> > +
> > +UMem *umem_new(void *hostp, size_t size)
> > +{
> > +    struct umem_init uinit = {
> > +        .size = size,
> > +    };
> > +    UMem *umem;
> > +
> > +    assert((size % getpagesize()) == 0);
> > +    umem = g_new(UMem, 1);
> > +    umem->fd = open(DEV_UMEM, O_RDWR);
> > +    if (umem->fd < 0) {
> > +        perror("can't open "DEV_UMEM);
> > +        abort();
> 
> Can we return an error instead of aborting?  The same goes for the rest
> of the aborts in this file.

Ok.


> > +size_t umem_pages_size(uint64_t nr)
> > +{
> > +    return sizeof(struct umem_pages) + nr * sizeof(uint64_t);
> 
> Can we make sure that the pgoffs field is aligned?  I know that as it is
> now it is aligned, but better to be sure?

It is already done by the gcc zero-length array extension.


> > +}
> > +
> > +static void umem_write_cmd(int fd, uint8_t cmd)
> > +{
> > +    DPRINTF("write cmd %c\n", cmd);
> > +
> > +    for (;;) {
> > +        ssize_t ret = write(fd, &cmd, 1);
> > +        if (ret == -1) {
> > +            if (errno == EINTR) {
> > +                continue;
> > +            } else if (errno == EPIPE) {
> > +                perror("pipe");
> > +                DPRINTF("write cmd %c %zd %d: pipe is closed\n",
> > +                        cmd, ret, errno);
> > +                break;
> > +            }
> 
> 
> Grr, we don't have a function that does a "safe_write".  The most
> similar thing in qemu looks to be send_all().

So we should introduce something like qemu_safe_write/read?


> > +
> > +            perror("pipe");
> 
> Can we use a different perror() message than for the previous error?
> 
> > +            DPRINTF("write cmd %c %zd %d\n", cmd, ret, errno);
> > +            abort();
> > +        }
> > +
> > +        break;
> > +    }
> > +}
> > +
> > +static void umem_read_cmd(int fd, uint8_t expect)
> > +{
> > +    uint8_t cmd;
> > +    for (;;) {
> > +        ssize_t ret = read(fd, &cmd, 1);
> > +        if (ret == -1) {
> > +            if (errno == EINTR) {
> > +                continue;
> > +            }
> > +            perror("pipe");
> > +            DPRINTF("read error cmd %c %zd %d\n", cmd, ret, errno);
> > +            abort();
> > +        }
> > +
> > +        if (ret == 0) {
> > +            DPRINTF("read cmd %c %zd: pipe is closed\n", cmd, ret);
> > +            abort();
> > +        }
> > +
> > +        break;
> > +    }
> > +
> > +    DPRINTF("read cmd %c\n", cmd);
> > +    if (cmd != expect) {
> > +        DPRINTF("cmd %c expect %d\n", cmd, expect);
> > +        abort();
> 
> Ouch.  If we receive garbage, we just exit?
> 
> I really think that we should implement error handling.
> 
> > +    }
> > +}
> > +
> > +struct umem_pages *umem_recv_pages(QEMUFile *f, int *offset)
> > +{
> > +    int ret;
> > +    uint64_t nr;
> > +    size_t size;
> > +    struct umem_pages *pages;
> > +
> > +    ret = qemu_peek_buffer(f, (uint8_t*)&nr, sizeof(nr), *offset);
> > +    *offset += sizeof(nr);
> > +    DPRINTF("ret %d nr %ld\n", ret, nr);
> > +    if (ret != sizeof(nr) || nr == 0) {
> > +        return NULL;
> > +    }
> > +
> > +    size = umem_pages_size(nr);
> > +    pages = g_malloc(size);
> 
> Just thinking about this.  Couldn't we just decide on a "big enough"
> buffer, and never send anything bigger than that?  That would remove the
> need to have to malloc()/free() a buffer for each reception?

Will try to address it.


> > +/* qemu side handler */
> > +struct umem_pages *umem_qemu_trigger_page_fault(QEMUFile *from_umemd,
> > +                                                int *offset)
> > +{
> > +    uint64_t i;
> > +    int page_shift = ffs(getpagesize()) - 1;
> > +    struct umem_pages *pages = umem_recv_pages(from_umemd, offset);
> > +    if (pages == NULL) {
> > +        return NULL;
> > +    }
> > +
> > +    for (i = 0; i < pages->nr; i++) {
> > +        ram_addr_t addr = pages->pgoffs[i] << page_shift;
> > +
> > +        /* make pages present by forcibly triggering page fault. */
> > +        volatile uint8_t *ram = qemu_get_ram_ptr(addr);
> > +        uint8_t dummy_read = ram[0];
> > +        (void)dummy_read;   /* suppress unused variable warning */
> > +    }
> > +
> > +    /*
> > +     * Very Linux implementation specific.
> > +     * Make it sure that other thread doesn't fault on the above virtual
> > +     * address. (More exactly other thread doesn't call fault handler with
> > +     * the offset.)
> > +     * the fault handler is called with mmap_sem read locked.
> > +     * madvise() does down/up_write(mmap_sem)
> > +     */
> > +    qemu_madvise(NULL, 0, MADV_NORMAL);
> 
> If it is linux specific, it should be inside a CONFIG_LINUX ifdef, or in
> a function hidden in some header.

Good idea.


> Talking about locking, what ensures that no other thread enters this
> function before this one calls madvise?  Or am I missing something obvious?

It is assumed that only the main thread calls this function via an iohandler.


> > +
> > +struct umem_pages {
> > +    uint64_t nr;
> > +    uint64_t pgoffs[0];
> > +};
> > +
> 
> QEMU really likes typedefs for structs.
> 
> Later, Juan.
> 

-- 
yamahata

* Re: [PATCH v2 35/41] postcopy: introduce helper functions for postcopy
  2012-06-16  9:48     ` Isaku Yamahata
@ 2012-06-16 13:19       ` Juan Quintela
  0 siblings, 0 replies; 58+ messages in thread
From: Juan Quintela @ 2012-06-16 13:19 UTC (permalink / raw)
  To: Isaku Yamahata
  Cc: benoit.hudzia, aarcange, aliguori, kvm, satoshi.itoh, stefanha,
	t.hirofuchi, dlaor, qemu-devel, mdroth, yoshikawa.takuya,
	owasserm, avi, pbonzini

Isaku Yamahata <yamahata@valinux.co.jp> wrote:
> On Thu, Jun 14, 2012 at 11:34:09PM +0200, Juan Quintela wrote:

>> > +size_t umem_pages_size(uint64_t nr)
>> > +{
>> > +    return sizeof(struct umem_pages) + nr * sizeof(uint64_t);
>> 
>> Can we make sure that the pgoffs field is aligned?  I know that as it is
>> now it is aligned, but better to be sure?
>
> It is already done by the gcc zero-length array extension.

Ah, I didn't know that property of the zero-length array extension.  Thanks.
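
For reference, a minimal sketch of how that alignment could be checked at
compile time.  It uses the C99 flexible array member form (pgoffs[]) where
the patch uses the GNU zero-length form (pgoffs[0]), and the static
assertion is an illustration, not part of the posted patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as in the patch, written with a C99 flexible array member. */
struct umem_pages {
    uint64_t nr;
    uint64_t pgoffs[];
};

/*
 * The compiler places pgoffs at an offset suitable for uint64_t;
 * this assertion (an addition for illustration) makes that explicit.
 */
_Static_assert(offsetof(struct umem_pages, pgoffs) % _Alignof(uint64_t) == 0,
               "pgoffs must be aligned for uint64_t");

/* Bytes needed for a header followed by nr page offsets. */
size_t umem_pages_size(uint64_t nr)
{
    return sizeof(struct umem_pages) + nr * sizeof(uint64_t);
}
```

With this in place a layout change that broke the alignment would fail the
build instead of going unnoticed.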

>> 
>> Grr, we don't have a function that does a "safe_write".  The most
>> similar thing in qemu looks to be send_all().
>
> So we should introduce something like qemu_safe_write/read?

I guess so.  If you look around, you will see that we have a lot of
cases with this pattern.  But that is not a problem of this patch; it
was already there.
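
A rough sketch of what such helpers might look like, only to make the idea
concrete.  The names safe_write_all()/safe_read_all() and the choice of
returning -errno (with -EPIPE on an early end of file) are assumptions for
illustration, not existing QEMU functions:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helper: write exactly len bytes to fd, retrying on EINTR
 * and on short writes.  Returns 0 on success, -errno on failure.
 */
int safe_write_all(int fd, const void *buf, size_t len)
{
    const uint8_t *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t ret = write(fd, p, len);
        if (ret < 0) {
            if (errno == EINTR) {
                continue;           /* interrupted, try again */
            }
            return -errno;          /* let the caller decide what to do */
        }
        p += ret;
        len -= (size_t)ret;
    }
    return 0;
}

/*
 * Hypothetical counterpart: read exactly len bytes from fd.
 * Returns 0 on success, -errno on failure, -EPIPE if the peer closed
 * the descriptor before len bytes arrived.
 */
int safe_read_all(int fd, void *buf, size_t len)
{
    uint8_t *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t ret = read(fd, p, len);
        if (ret < 0) {
            if (errno == EINTR) {
                continue;
            }
            return -errno;
        }
        if (ret == 0) {
            return -EPIPE;          /* end of file before all data */
        }
        p += ret;
        len -= (size_t)ret;
    }
    return 0;
}
```

umem_write_cmd()/umem_read_cmd() could then call these with a length of 1
and propagate the return value instead of calling abort().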

>
>> Talking about locking, what ensures that no other thread enters this
>> function before this one calls madvise?  Or am I missing something obvious?
>
> It is assumed that only the main thread calls this function via an iohandler.

Ok.  Can we add a comment then?

Later, Juan.
