All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v6 00/47] Postcopy implementation
@ 2015-04-14 17:03 Dr. David Alan Gilbert (git)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 01/47] Start documenting how postcopy works Dr. David Alan Gilbert (git)
                   ` (47 more replies)
  0 siblings, 48 replies; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

  This is the 6th cut of my version of postcopy; it is designed for use with
the Linux kernel additions posted by Andrea Arcangeli here:

git clone --reference linux -b userfault18 
git://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git

(Note this is a different API from the last version)

This qemu series can be found at:

https://github.com/orbitfp7/qemu.git
on the wp3-postcopy-v6 tag.

It addresses some but not yet all of the previous review comments;
however there are a couple of large simplifications, so it seems
worth posting to meet the new kernel API and to stop people
reviewing deadcode.

Note: That the userfaultfd.h header is no longer included in this
tree:
      - if you're building with the appropriate kernel headers it should find it
      - if you're building on a host that doesn't have the kernel headers
        installed in the right place then:
           configure with:   --extra-cflags="-D__NR_userfaultfd=323"
           cp include/uapi/linux/userfaultfd.h into somewhere in the include
           path, e.g.  /usr/local/include/linux

v6
  Removed the PMI bitmaps
      - Andrea updated the kernel API so that userspace doesn't
        need to do wakeups, and thus QEMU doesn't need to keep
        track of which pages it's received; there is a price - which
        is we end up sending more dupes to the source, but it simplifies
        stuff a lot and makes the normal paths a lot quicker.
        (10s of line change in kernel, 10%-ish simplification in this code!)
  Changed discard message format to a simpler start/end address scheme
        and rework discard and chunking code to work in long's to match bitmap
  'qemu_get_buffer_less_copy' for postcopy pages
      - avoids a userspace copy since the kernel now does it
      - the new qemufile interface might also be useful for other places that
        don't need a copy (maybe xbzrle?)
  Changed the blockingness of the incoming fd
      it was incorrectly blocking during the precopy phase after a postcopy was
      enabled, causing the HMP to be unavailable.  It's now blocking only once
      the postcopy thread starts up, since it's not a coroutine it can't deal
      with the yields in qemu_file.
  An error on the return-path now marks the migration as failed

  Fixups from Dave Gibson's comments
    Removed can_postcopy, renamed save_complete to save_complete_precopy
        added save_complete_postcopy
    Simplified loadvm loop exits
    discard message format changes above
    and many more smaller changes.

  small fixups for RCU


This work has been partially funded by the EU Orbit project:
  see http://www.orbitproject.eu/about/

TODO:
  The major work is to rework the page send/receive loops so that supporting
  larger host pages doesn't make it quite as messy.

Dr. David Alan Gilbert (47):
  Start documenting how postcopy works.
  Split header writing out of qemu_savevm_state_begin
  qemu_ram_foreach_block: pass up error value, and down the ramblock
    name
  Add qemu_get_counted_string to read a string prefixed by a count byte
  Create MigrationIncomingState
  Provide runtime Target page information
  Move copy out of qemu_peek_buffer
  Add qemu_get_buffer_less_copy to avoid copies some of the time
  Add wrapper for setting blocking status on a QEMUFile
  Rename save_live_complete to save_live_complete_precopy
  Return path: Open a return path on QEMUFile for sockets
  Return path: socket_writev_buffer: Block even on non-blocking fd's
  Migration commands
  Return path: Control commands
  Return path: Send responses from destination to source
  Return path: Source handling of return path
  ram_debug_dump_bitmap: Dump a migration bitmap as text
  Move loadvm_handlers into MigrationIncomingState
  Rework loadvm path for subloops
  Add migration-capability boolean for postcopy-ram.
  Add wrappers and handlers for sending/receiving the postcopy-ram
    migration messages.
  MIG_CMD_PACKAGED: Send a packaged chunk of migration stream
  migrate_init: Call from savevm
  Modify save_live_pending for postcopy
  postcopy: OS support test
  migrate_start_postcopy: Command to trigger transition to postcopy
  MIGRATION_STATUS_POSTCOPY_ACTIVE: Add new migration state
  Add qemu_savevm_state_complete_postcopy
  Postcopy: Maintain sentmap and calculate discard
  postcopy: Incoming initialisation
  postcopy: ram_enable_notify to switch on userfault
  Postcopy: Postcopy startup in migration thread
  Postcopy end in migration_thread
  Page request:  Add MIG_RP_MSG_REQ_PAGES reverse command
  Page request: Process incoming page request
  Page request: Consume pages off the post-copy queue
  postcopy_ram.c: place_page and helpers
  Postcopy: Use helpers to map pages during migration
  qemu_ram_block_from_host
  Don't sync dirty bitmaps in postcopy
  Host page!=target page: Cleanup bitmaps
  Postcopy; Handle userfault requests
  Start up a postcopy/listener thread ready for incoming page data
  postcopy: Wire up loadvm_postcopy_handle_ commands
  End of migration for postcopy
  Disable mlock around incoming postcopy
  Inhibit ballooning during postcopy

 arch_init.c                      | 868 ++++++++++++++++++++++++++++++++++++---
 balloon.c                        |  11 +
 docs/migration.txt               | 167 ++++++++
 exec.c                           |  74 +++-
 hmp-commands.hx                  |  15 +
 hmp.c                            |   7 +
 hmp.h                            |   1 +
 hw/ppc/spapr.c                   |   2 +-
 hw/virtio/virtio-balloon.c       |   4 +-
 include/exec/cpu-all.h           |   2 -
 include/exec/cpu-common.h        |   7 +-
 include/migration/migration.h    | 126 +++++-
 include/migration/postcopy-ram.h |  88 ++++
 include/migration/qemu-file.h    |  15 +-
 include/migration/vmstate.h      |  10 +-
 include/qemu/typedefs.h          |   5 +
 include/sysemu/balloon.h         |   2 +
 include/sysemu/sysemu.h          |  45 +-
 migration/Makefile.objs          |   2 +-
 migration/block.c                |   9 +-
 migration/migration.c            | 743 +++++++++++++++++++++++++++++++--
 migration/postcopy-ram.c         | 715 ++++++++++++++++++++++++++++++++
 migration/qemu-file-unix.c       | 106 ++++-
 migration/qemu-file.c            | 100 ++++-
 migration/rdma.c                 |   4 +-
 migration/vmstate.c              |   5 +-
 qapi-schema.json                 |  19 +-
 qmp-commands.hx                  |  19 +
 savevm.c                         | 809 ++++++++++++++++++++++++++++++++----
 trace-events                     |  77 +++-
 30 files changed, 3832 insertions(+), 225 deletions(-)
 create mode 100644 include/migration/postcopy-ram.h
 create mode 100644 migration/postcopy-ram.c

-- 
2.1.0

^ permalink raw reply	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 01/47] Start documenting how postcopy works.
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
@ 2015-04-14 17:03 ` Dr. David Alan Gilbert (git)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 02/47] Split header writing out of qemu_savevm_state_begin Dr. David Alan Gilbert (git)
                   ` (46 subsequent siblings)
  47 siblings, 0 replies; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 docs/migration.txt | 167 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 167 insertions(+)

diff --git a/docs/migration.txt b/docs/migration.txt
index 0492a45..f975c75 100644
--- a/docs/migration.txt
+++ b/docs/migration.txt
@@ -294,3 +294,170 @@ save/send this state when we are in the middle of a pio operation
 (that is what ide_drive_pio_state_needed() checks).  If DRQ_STAT is
 not enabled, the values on that fields are garbage and don't need to
 be sent.
+
+= Return path =
+
+In most migration scenarios there is only a single data path that runs
+from the source VM to the destination, typically along a single fd (although
+possibly with another fd or similar for some fast way of throwing pages across).
+
+However, some uses need two way communication; in particular the Postcopy destination
+needs to be able to request pages on demand from the source.
+
+For these scenarios there is a 'return path' from the destination to the source;
+qemu_file_get_return_path(QEMUFile* fwdpath) gives the QEMUFile* for the return
+path.
+
+  Source side
+     Forward path - written by migration thread
+     Return path  - opened by main thread, read by return-path thread
+
+  Destination side
+     Forward path - read by main thread
+     Return path  - opened by main thread, written by main thread AND postcopy
+                    thread (protected by rp_mutex)
+
+= Postcopy =
+'Postcopy' migration is a way to deal with migrations that refuse to converge;
+its plus side is that there is an upper bound on the amount of migration traffic
+and time it takes, the down side is that during the postcopy phase, a failure of
+*either* side or the network connection causes the guest to be lost.
+
+In postcopy the destination CPUs are started before all the memory has been
+transferred, and accesses to pages that are yet to be transferred cause
+a fault that's translated by QEMU into a request to the source QEMU.
+
+Postcopy can be combined with precopy (i.e. normal migration) so that if precopy
+doesn't finish in a given time the switch is made to postcopy.
+
+=== Enabling postcopy ===
+
+To enable postcopy (prior to the start of migration):
+
+migrate_set_capability x-postcopy-ram on
+
+The migration will still start in precopy mode, however issuing:
+
+migrate_start_postcopy
+
+will now cause the transition from precopy to postcopy.
+It can be issued immediately after migration is started or any
+time later on.  Issuing it after the end of a migration is harmless.
+
+=== Postcopy device transfer ===
+
+Loading of device data may cause the device emulation to access guest RAM
+that may trigger faults that have to be resolved by the source, as such
+the migration stream has to be able to respond with page data *during* the
+device load, and hence the device data has to be read from the stream completely
+before the device load begins to free the stream up.  This is achieved by
+'packaging' the device data into a blob that's read in one go.
+
+Source behaviour
+
+Until postcopy is entered the migration stream is identical to normal
+precopy, except for the addition of a 'postcopy advise' command at
+the beginning, to tell the destination that postcopy might happen.
+When postcopy starts the source sends the page discard data and then
+forms the 'package' containing:
+
+   Command: 'postcopy listen'
+   The device state
+      A series of sections, identical to the precopy streams device state stream
+      containing everything except postcopiable devices (i.e. RAM)
+   Command: 'postcopy run'
+
+The 'package' is sent as the data part of a Command: 'CMD_PACKAGED', and the
+contents are formatted in the same way as the main migration stream.
+
+Destination behaviour
+
+Initially the destination looks the same as precopy, with a single thread
+reading the migration stream; the 'postcopy advise' and 'discard' commands
+are processed to change the way RAM is managed, but don't affect the stream
+processing.
+
+------------------------------------------------------------------------------
+                        1      2   3     4 5                      6   7
+main -----DISCARD-CMD_PACKAGED ( LISTEN  DEVICE     DEVICE DEVICE RUN )
+thread                             |       |
+                                   |     (page request)
+                                   |        \___
+                                   v            \
+listen thread:                     --- page -- page -- page -- page -- page --
+
+                                   a   b        c
+------------------------------------------------------------------------------
+
+On receipt of CMD_PACKAGED (1)
+   All the data associated with the package - the ( ... ) section in the
+diagram - is read into memory (into a QEMUSizedBuffer), and the main thread
+recurses into qemu_loadvm_state_main to process the contents of the package (2)
+which contains commands (3,6) and devices (4...)
+
+On receipt of 'postcopy listen' - 3 -(i.e. the 1st command in the package)
+a new thread (a) is started that takes over servicing the migration stream,
+while the main thread carries on loading the package.   It loads normal
+background page data (b) but if during a device load a fault happens (5) the
+returned page (c) is loaded by the listen thread allowing the main threads
+device load to carry on.
+
+The last thing in the CMD_PACKAGED is a 'RUN' command (6) letting the destination
+CPUs start running.
+At the end of the CMD_PACKAGED (7) the main thread returns to normal running behaviour
+and is no longer used by migration, while the listen thread carries
+on servicing page data until the end of migration.
+
+=== Postcopy states ===
+
+Postcopy moves through a series of states (see postcopy_state) from
+ADVISE->LISTEN->RUNNING->END
+
+  Advise: Set at the start of migration if postcopy is enabled, even
+          if it hasn't had the start command; here the destination
+          checks that its OS has the support needed for postcopy, and performs
+          setup to ensure the RAM mappings are suitable for later postcopy.
+          (Triggered by reception of POSTCOPY_ADVISE command)
+
+  Listen: The first command in the package, POSTCOPY_LISTEN, switches
+          the destination state to Listen, and starts a new thread
+          (the 'listen thread') which takes over the job of receiving
+          pages off the migration stream, while the main thread carries
+          on processing the blob.  With this thread able to process page
+          reception, the destination now 'sensitises' the RAM to detect
+          any access to missing pages (on Linux using the 'userfault'
+          system).
+
+  Running: POSTCOPY_RUN causes the destination to synchronise all
+          state and start the CPUs and IO devices running.  The main
+          thread now finishes processing the migration package and
+          now carries on as it would for normal precopy migration
+          (although it can't do the cleanup it would do as it
+          finishes a normal migration).
+
+  End: The listen thread can now quit, and perform the cleanup of migration
+          state, the migration is now complete.
+
+=== Source side page maps ===
+
+The source side keeps two bitmaps during postcopy; 'the migration bitmap'
+and 'sent map'.  The 'migration bitmap' is basically the same as in
+the precopy case, and holds a bit to indicate that page is 'dirty' -
+i.e. needs sending.  During the precopy phase this is updated as the CPU
+dirties pages, however during postcopy the CPUs are stopped and nothing
+should dirty anything any more.
+
+The 'sent map' is used for the transition to postcopy. It is a bitmap that
+has a bit set whenever a page is sent to the destination, however during
+the transition to postcopy mode it is masked against the migration bitmap
+(sentmap &= migrationbitmap) to generate a bitmap recording pages that
+have been previously been sent but are now dirty again.  This masked
+sentmap is sent to the destination which discards those now dirty pages
+before starting the CPUs.
+
+Note that the contents of the sentmap are sacrificed during the calculation
+of the discard set and thus aren't valid once in postcopy.  The dirtymap
+is still valid and is used to ensure that no page is sent more than once.  Any
+request for a page that has already been sent is ignored.  Duplicate requests
+such as this can happen as a page is sent at about the same time the
+destination accesses it.
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 02/47] Split header writing out of qemu_savevm_state_begin
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 01/47] Start documenting how postcopy works Dr. David Alan Gilbert (git)
@ 2015-04-14 17:03 ` Dr. David Alan Gilbert (git)
  2015-05-11 11:16   ` Amit Shah
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 03/47] qemu_ram_foreach_block: pass up error value, and down the ramblock name Dr. David Alan Gilbert (git)
                   ` (45 subsequent siblings)
  47 siblings, 1 reply; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Split qemu_savevm_state_begin to:
  qemu_savevm_state_header   That writes the initial file header.
  qemu_savevm_state_begin    That sets up devices and does the first
                             device pass.

Used later in postcopy.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/sysemu/sysemu.h |  1 +
 migration/migration.c   |  1 +
 savevm.c                | 11 ++++++++---
 trace-events            |  1 +
 4 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 8a52934..7a1ea91 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -84,6 +84,7 @@ void qemu_announce_self(void);
 bool qemu_savevm_state_blocked(Error **errp);
 void qemu_savevm_state_begin(QEMUFile *f,
                              const MigrationParams *params);
+void qemu_savevm_state_header(QEMUFile *f);
 int qemu_savevm_state_iterate(QEMUFile *f);
 void qemu_savevm_state_complete(QEMUFile *f);
 void qemu_savevm_state_cancel(void);
diff --git a/migration/migration.c b/migration/migration.c
index bc42490..ce6c2e3 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -617,6 +617,7 @@ static void *migration_thread(void *opaque)
     int64_t start_time = initial_time;
     bool old_vm_running = false;
 
+    qemu_savevm_state_header(s->file);
     qemu_savevm_state_begin(s->file, &s->params);
 
     s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
diff --git a/savevm.c b/savevm.c
index 3b0e222..c08abcc 100644
--- a/savevm.c
+++ b/savevm.c
@@ -616,6 +616,13 @@ bool qemu_savevm_state_blocked(Error **errp)
     return false;
 }
 
+void qemu_savevm_state_header(QEMUFile *f)
+{
+    trace_savevm_state_header();
+    qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
+    qemu_put_be32(f, QEMU_VM_FILE_VERSION);
+}
+
 void qemu_savevm_state_begin(QEMUFile *f,
                              const MigrationParams *params)
 {
@@ -630,9 +637,6 @@ void qemu_savevm_state_begin(QEMUFile *f,
         se->ops->set_params(params, se->opaque);
     }
 
-    qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
-    qemu_put_be32(f, QEMU_VM_FILE_VERSION);
-
     QTAILQ_FOREACH(se, &savevm_handlers, entry) {
         int len;
 
@@ -842,6 +846,7 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
     }
 
     qemu_mutex_unlock_iothread();
+    qemu_savevm_state_header(f);
     qemu_savevm_state_begin(f, &params);
     qemu_mutex_lock_iothread();
 
diff --git a/trace-events b/trace-events
index 30eba92..b4641b6 100644
--- a/trace-events
+++ b/trace-events
@@ -1174,6 +1174,7 @@ qemu_loadvm_state_section_startfull(uint32_t section_id, const char *idstr, uint
 savevm_section_start(const char *id, unsigned int section_id) "%s, section_id %u"
 savevm_section_end(const char *id, unsigned int section_id, int ret) "%s, section_id %u -> %d"
 savevm_state_begin(void) ""
+savevm_state_header(void) ""
 savevm_state_iterate(void) ""
 savevm_state_complete(void) ""
 savevm_state_cancel(void) ""
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 03/47] qemu_ram_foreach_block: pass up error value, and down the ramblock name
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 01/47] Start documenting how postcopy works Dr. David Alan Gilbert (git)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 02/47] Split header writing out of qemu_savevm_state_begin Dr. David Alan Gilbert (git)
@ 2015-04-14 17:03 ` Dr. David Alan Gilbert (git)
  2015-05-15 10:38   ` Amit Shah
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 04/47] Add qemu_get_counted_string to read a string prefixed by a count byte Dr. David Alan Gilbert (git)
                   ` (44 subsequent siblings)
  47 siblings, 1 reply; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

check the return value of the function it calls and error if it's non-0
Fixup qemu_rdma_init_one_block that is the only current caller,
  and rdma_add_block the only function it calls using it.

Pass the name of the ramblock to the function; helps in debugging.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 exec.c                    | 10 ++++++++--
 include/exec/cpu-common.h |  4 ++--
 migration/rdma.c          |  4 ++--
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/exec.c b/exec.c
index 874ecfc..7693794 100644
--- a/exec.c
+++ b/exec.c
@@ -3067,14 +3067,20 @@ bool cpu_physical_memory_is_io(hwaddr phys_addr)
              memory_region_is_romd(mr));
 }
 
-void qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque)
+int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque)
 {
     RAMBlock *block;
+    int ret = 0;
 
     rcu_read_lock();
     QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
-        func(block->host, block->offset, block->used_length, opaque);
+        ret = func(block->idstr, block->host, block->offset,
+                   block->used_length, opaque);
+        if (ret) {
+            break;
+        }
     }
     rcu_read_unlock();
+    return ret;
 }
 #endif
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index fcc3162..2abecac 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -125,10 +125,10 @@ void cpu_flush_icache_range(hwaddr start, int len);
 extern struct MemoryRegion io_mem_rom;
 extern struct MemoryRegion io_mem_notdirty;
 
-typedef void (RAMBlockIterFunc)(void *host_addr,
+typedef int (RAMBlockIterFunc)(const char *block_name, void *host_addr,
     ram_addr_t offset, ram_addr_t length, void *opaque);
 
-void qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque);
+int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque);
 
 #endif
 
diff --git a/migration/rdma.c b/migration/rdma.c
index 77e3444..c13ec6b 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -570,10 +570,10 @@ static int rdma_add_block(RDMAContext *rdma, void *host_addr,
  * in advanced before the migration starts. This tells us where the RAM blocks
  * are so that we can register them individually.
  */
-static void qemu_rdma_init_one_block(void *host_addr,
+static int qemu_rdma_init_one_block(const char *block_name, void *host_addr,
     ram_addr_t block_offset, ram_addr_t length, void *opaque)
 {
-    rdma_add_block(opaque, host_addr, block_offset, length);
+    return rdma_add_block(opaque, host_addr, block_offset, length);
 }
 
 /*
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 04/47] Add qemu_get_counted_string to read a string prefixed by a count byte
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (2 preceding siblings ...)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 03/47] qemu_ram_foreach_block: pass up error value, and down the ramblock name Dr. David Alan Gilbert (git)
@ 2015-04-14 17:03 ` Dr. David Alan Gilbert (git)
  2015-05-15 13:50   ` Amit Shah
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 05/47] Create MigrationIncomingState Dr. David Alan Gilbert (git)
                   ` (43 subsequent siblings)
  47 siblings, 1 reply; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

and use it in loadvm_state and ram_load.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c                   |  5 +----
 include/migration/qemu-file.h |  3 +++
 migration/qemu-file.c         | 16 ++++++++++++++++
 savevm.c                      | 11 ++++++-----
 4 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 4c8fcee..06722bb 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -1145,13 +1145,10 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
             total_ram_bytes = addr;
             while (!ret && total_ram_bytes) {
                 RAMBlock *block;
-                uint8_t len;
                 char id[256];
                 ram_addr_t length;
 
-                len = qemu_get_byte(f);
-                qemu_get_buffer(f, (uint8_t *)id, len);
-                id[len] = 0;
+                qemu_get_counted_string(f, id);
                 length = qemu_get_be64(f);
 
                 QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index 745a850..236a2e4 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -309,4 +309,7 @@ static inline void qemu_get_sbe64s(QEMUFile *f, int64_t *pv)
 {
     qemu_get_be64s(f, (uint64_t *)pv);
 }
+
+int qemu_get_counted_string(QEMUFile *f, char buf[256]);
+
 #endif
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 1a4f986..6c18e55 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -546,3 +546,19 @@ uint64_t qemu_get_be64(QEMUFile *f)
     v |= qemu_get_be32(f);
     return v;
 }
+
+/*
+ * Get a string whose length is determined by a single preceding byte
+ * A preallocated 256 byte buffer must be passed in.
+ * Returns: 0 on success and a 0 terminated string in the buffer
+ */
+int qemu_get_counted_string(QEMUFile *f, char buf[256])
+{
+    unsigned int len = qemu_get_byte(f);
+    int res = qemu_get_buffer(f, (uint8_t *)buf, len);
+
+    buf[len] = 0;
+
+    return res != len;
+}
+
diff --git a/savevm.c b/savevm.c
index c08abcc..9795e2e 100644
--- a/savevm.c
+++ b/savevm.c
@@ -969,8 +969,7 @@ int qemu_loadvm_state(QEMUFile *f)
     while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
         uint32_t instance_id, version_id, section_id;
         SaveStateEntry *se;
-        char idstr[257];
-        int len;
+        char idstr[256];
 
         trace_qemu_loadvm_state_section(section_type);
         switch (section_type) {
@@ -978,9 +977,11 @@ int qemu_loadvm_state(QEMUFile *f)
         case QEMU_VM_SECTION_FULL:
             /* Read section start */
             section_id = qemu_get_be32(f);
-            len = qemu_get_byte(f);
-            qemu_get_buffer(f, (uint8_t *)idstr, len);
-            idstr[len] = 0;
+            if (qemu_get_counted_string(f, idstr)) {
+                error_report("Unable to read ID string for section %u",
+                            section_id);
+                return -EINVAL;
+            }
             instance_id = qemu_get_be32(f);
             version_id = qemu_get_be32(f);
 
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 05/47] Create MigrationIncomingState
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (3 preceding siblings ...)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 04/47] Add qemu_get_counted_string to read a string prefixed by a count byte Dr. David Alan Gilbert (git)
@ 2015-04-14 17:03 ` Dr. David Alan Gilbert (git)
  2015-05-18  6:58   ` Amit Shah
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 06/47] Provide runtime Target page information Dr. David Alan Gilbert (git)
                   ` (42 subsequent siblings)
  47 siblings, 1 reply; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

There are currently lots of pieces of incoming migration state scattered
around, and postcopy is adding more, and it seems better to try and keep
it together.

allocate MIS in process_incoming_migration_co

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  9 +++++++++
 include/qemu/typedefs.h       |  1 +
 migration/migration.c         | 28 ++++++++++++++++++++++++++++
 savevm.c                      |  2 ++
 4 files changed, 40 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index bf09968..7a6f521 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -42,6 +42,15 @@ struct MigrationParams {
 
 typedef struct MigrationState MigrationState;
 
+/* State for the incoming migration */
+struct MigrationIncomingState {
+    QEMUFile *file;
+};
+
+MigrationIncomingState *migration_incoming_get_current(void);
+MigrationIncomingState *migration_incoming_state_new(QEMUFile *f);
+void migration_incoming_state_destroy(void);
+
 struct MigrationState
 {
     int64_t bandwidth_limit;
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index cde3314..74dfad3 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -38,6 +38,7 @@ typedef struct MemoryListener MemoryListener;
 typedef struct MemoryMappingList MemoryMappingList;
 typedef struct MemoryRegion MemoryRegion;
 typedef struct MemoryRegionSection MemoryRegionSection;
+typedef struct MigrationIncomingState MigrationIncomingState;
 typedef struct MigrationParams MigrationParams;
 typedef struct Monitor Monitor;
 typedef struct MouseTransformInfo MouseTransformInfo;
diff --git a/migration/migration.c b/migration/migration.c
index ce6c2e3..ce488cf 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -45,6 +45,7 @@ static bool deferred_incoming;
    migrations at once.  For now we don't need to add
    dynamic creation of migration */
 
+/* For outgoing */
 MigrationState *migrate_get_current(void)
 {
     static MigrationState current_migration = {
@@ -57,6 +58,28 @@ MigrationState *migrate_get_current(void)
     return &current_migration;
 }
 
+/* For incoming */
+static MigrationIncomingState *mis_current;
+
+MigrationIncomingState *migration_incoming_get_current(void)
+{
+    return mis_current;
+}
+
+MigrationIncomingState *migration_incoming_state_new(QEMUFile* f)
+{
+    mis_current = g_malloc0(sizeof(MigrationIncomingState));
+    mis_current->file = f;
+
+    return mis_current;
+}
+
+void migration_incoming_state_destroy(void)
+{
+    g_free(mis_current);
+    mis_current = NULL;
+}
+
 /*
  * Called on -incoming with a defer: uri.
  * The migration can be started later after any parameters have been
@@ -101,9 +124,14 @@ static void process_incoming_migration_co(void *opaque)
     Error *local_err = NULL;
     int ret;
 
+    migration_incoming_state_new(f);
+
     ret = qemu_loadvm_state(f);
+
     qemu_fclose(f);
     free_xbzrle_decoded_buf();
+    migration_incoming_state_destroy();
+
     if (ret < 0) {
         error_report("load of migration failed: %s", strerror(-ret));
         exit(EXIT_FAILURE);
diff --git a/savevm.c b/savevm.c
index 9795e2e..81f6a29 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1320,9 +1320,11 @@ int load_vmstate(const char *name)
     }
 
     qemu_system_reset(VMRESET_SILENT);
+    migration_incoming_state_new(f);
     ret = qemu_loadvm_state(f);
 
     qemu_fclose(f);
+    migration_incoming_state_destroy();
     if (ret < 0) {
         error_report("Error %d while loading VM state", ret);
         return ret;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 06/47] Provide runtime Target page information
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (4 preceding siblings ...)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 05/47] Create MigrationIncomingState Dr. David Alan Gilbert (git)
@ 2015-04-14 17:03 ` Dr. David Alan Gilbert (git)
  2015-05-18  7:06   ` Amit Shah
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 07/47] Move copy out of qemu_peek_buffer Dr. David Alan Gilbert (git)
                   ` (41 subsequent siblings)
  47 siblings, 1 reply; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The migration code generally is built target-independent, however
there are a few places where knowing the target page size would
avoid artificially moving stuff into arch_init.

Provide 'qemu_target_page_bits()' that returns TARGET_PAGE_BITS
to other bits of code so that they can stay target-independent.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 exec.c                  | 10 ++++++++++
 include/sysemu/sysemu.h |  1 +
 2 files changed, 11 insertions(+)

diff --git a/exec.c b/exec.c
index 7693794..c3027cf 100644
--- a/exec.c
+++ b/exec.c
@@ -3038,6 +3038,16 @@ int cpu_memory_rw_debug(CPUState *cpu, target_ulong addr,
     }
     return 0;
 }
+
+/*
+ * Allows code that needs to deal with migration bitmaps etc to still be built
+ * target independent.
+ */
+size_t qemu_target_page_bits(void)
+{
+    return TARGET_PAGE_BITS;
+}
+
 #endif
 
 /*
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 7a1ea91..bd67f86 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -68,6 +68,7 @@ int qemu_reset_requested_get(void);
 void qemu_system_killed(int signal, pid_t pid);
 void qemu_devices_reset(void);
 void qemu_system_reset(bool report);
+size_t qemu_target_page_bits(void);
 
 void qemu_add_exit_notifier(Notifier *notify);
 void qemu_remove_exit_notifier(Notifier *notify);
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 07/47] Move copy out of qemu_peek_buffer
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (5 preceding siblings ...)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 06/47] Provide runtime Target page information Dr. David Alan Gilbert (git)
@ 2015-04-14 17:03 ` Dr. David Alan Gilbert (git)
  2015-05-21  6:47   ` Amit Shah
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 08/47] Add qemu_get_buffer_less_copy to avoid copies some of the time Dr. David Alan Gilbert (git)
                   ` (40 subsequent siblings)
  47 siblings, 1 reply; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

qemu_peek_buffer currently copies the data it reads into a buffer,
however the next patch wants access to the buffer without the copy,
hence rework to remove the copy to the layer above.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/qemu-file.h |  2 +-
 migration/qemu-file.c         | 12 +++++++-----
 migration/vmstate.c           |  5 +++--
 3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index 236a2e4..3fe545e 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -157,7 +157,7 @@ static inline void qemu_put_ubyte(QEMUFile *f, unsigned int v)
 void qemu_put_be16(QEMUFile *f, unsigned int v);
 void qemu_put_be32(QEMUFile *f, unsigned int v);
 void qemu_put_be64(QEMUFile *f, uint64_t v);
-int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset);
+int qemu_peek_buffer(QEMUFile *f, uint8_t **buf, int size, size_t offset);
 int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size);
 /*
  * Note that you can only peek continuous bytes from where the current pointer
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 6c18e55..8dc5767 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -348,14 +348,14 @@ void qemu_file_skip(QEMUFile *f, int size)
 }
 
 /*
- * Read 'size' bytes from file (at 'offset') into buf without moving the
- * pointer.
+ * Read 'size' bytes from file (at 'offset') without moving the
+ * pointer and set 'buf' to point to that data.
  *
  * It will return size bytes unless there was an error, in which case it will
  * return as many as it managed to read (assuming blocking fd's which
  * all current QEMUFile are)
  */
-int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset)
+int qemu_peek_buffer(QEMUFile *f, uint8_t **buf, int size, size_t offset)
 {
     int pending;
     int index;
@@ -391,7 +391,7 @@ int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset)
         size = pending;
     }
 
-    memcpy(buf, f->buf + index, size);
+    *buf = f->buf + index;
     return size;
 }
 
@@ -410,11 +410,13 @@ int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size)
 
     while (pending > 0) {
         int res;
+        uint8_t *src;
 
-        res = qemu_peek_buffer(f, buf, MIN(pending, IO_BUF_SIZE), 0);
+        res = qemu_peek_buffer(f, &src, MIN(pending, IO_BUF_SIZE), 0);
         if (res == 0) {
             return done;
         }
+        memcpy(buf, src, res);
         qemu_file_skip(f, res);
         buf += res;
         pending -= res;
diff --git a/migration/vmstate.c b/migration/vmstate.c
index e5388f0..a64ebcc 100644
--- a/migration/vmstate.c
+++ b/migration/vmstate.c
@@ -358,7 +358,7 @@ static int vmstate_subsection_load(QEMUFile *f, const VMStateDescription *vmsd,
     trace_vmstate_subsection_load(vmsd->name);
 
     while (qemu_peek_byte(f, 0) == QEMU_VM_SUBSECTION) {
-        char idstr[256];
+        char idstr[256], *idstr_ret;
         int ret;
         uint8_t version_id, len, size;
         const VMStateDescription *sub_vmsd;
@@ -369,11 +369,12 @@ static int vmstate_subsection_load(QEMUFile *f, const VMStateDescription *vmsd,
             trace_vmstate_subsection_load_bad(vmsd->name, "(short)");
             return 0;
         }
-        size = qemu_peek_buffer(f, (uint8_t *)idstr, len, 2);
+        size = qemu_peek_buffer(f, (uint8_t **)&idstr_ret, len, 2);
         if (size != len) {
             trace_vmstate_subsection_load_bad(vmsd->name, "(peek fail)");
             return 0;
         }
+        memcpy(idstr, idstr_ret, size);
         idstr[size] = 0;
 
         if (strncmp(vmsd->name, idstr, strlen(vmsd->name)) != 0) {
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 08/47] Add qemu_get_buffer_less_copy to avoid copies some of the time
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (6 preceding siblings ...)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 07/47] Move copy out of qemu_peek_buffer Dr. David Alan Gilbert (git)
@ 2015-04-14 17:03 ` Dr. David Alan Gilbert (git)
  2015-05-21  7:09   ` Amit Shah
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 09/47] Add wrapper for setting blocking status on a QEMUFile Dr. David Alan Gilbert (git)
                   ` (39 subsequent siblings)
  47 siblings, 1 reply; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

qemu_get_buffer always copies the data it reads to a users buffer,
however in many cases the file buffer inside qemu_file could be given
back to the caller, avoiding the copy.  This isn't always possible
depending on the size and alignment of the data.

Thus 'qemu_get_buffer_less_copy' either copies the data to a supplied
buffer or updates a pointer to the internal buffer if convenient.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/qemu-file.h |  2 ++
 migration/qemu-file.c         | 45 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index 3fe545e..4cac58f 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -159,6 +159,8 @@ void qemu_put_be32(QEMUFile *f, unsigned int v);
 void qemu_put_be64(QEMUFile *f, uint64_t v);
 int qemu_peek_buffer(QEMUFile *f, uint8_t **buf, int size, size_t offset);
 int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size);
+int qemu_get_buffer_less_copy(QEMUFile *f, uint8_t **buf, int size);
+
 /*
  * Note that you can only peek continuous bytes from where the current pointer
  * is; you aren't guaranteed to be able to peak to +n bytes unless you've
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 8dc5767..ec3a598 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -426,6 +426,51 @@ int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size)
 }
 
 /*
+ * Read 'size' bytes of data from the file.
+ * 'size' can be larger than the internal buffer.
+ *
+ * The data:
+ *   may be held on an internal buffer (in which case *buf is updated
+ *     to point to it) that is valid until the next qemu_file operation.
+ * OR
+ *   will be copied to the *buf that was passed in.
+ *
+ * The code tries to avoid the copy if possible.
+ *
+ * It will return size bytes unless there was an error, in which case it will
+ * return as many as it managed to read (assuming blocking fd's which
+ * all current QEMUFile are)
+ */
+int qemu_get_buffer_less_copy(QEMUFile *f, uint8_t **buf, int size)
+{
+    int pending = size;
+    int done = 0;
+    bool first = true;
+
+    while (pending > 0) {
+        int res;
+        uint8_t *src;
+
+        res = qemu_peek_buffer(f, &src, MIN(pending, IO_BUF_SIZE), 0);
+        if (res == 0) {
+            return done;
+        }
+        qemu_file_skip(f, res);
+        done += res;
+        pending -= res;
+        if (first && res == size) {
+            *buf = src;
+            return done;
+        } else {
+            first = false;
+            memcpy(buf, src, res);
+            buf += res;
+        }
+    }
+    return done;
+}
+
+/*
  * Peeks a single byte from the buffer; this isn't guaranteed to work if
  * offset leaves a gap after the previous read/peeked data.
  */
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 09/47] Add wrapper for setting blocking status on a QEMUFile
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (7 preceding siblings ...)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 08/47] Add qemu_get_buffer_less_copy to avoid copies some of the time Dr. David Alan Gilbert (git)
@ 2015-04-14 17:03 ` Dr. David Alan Gilbert (git)
  2015-05-18  7:35   ` Amit Shah
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 10/47] Rename save_live_complete to save_live_complete_precopy Dr. David Alan Gilbert (git)
                   ` (38 subsequent siblings)
  47 siblings, 1 reply; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add a wrapper to change the blocking status on a QEMUFile
rather than having to use qemu_set_block(qemu_get_fd(f));
it seems best to avoid exposing the fd since not all QEMUFile's
really have one.  With this wrapper we could move the implementation
down to be different on different transports.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/qemu-file.h |  1 +
 migration/qemu-file.c         | 15 +++++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index 4cac58f..c14555d 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -190,6 +190,7 @@ int qemu_file_get_error(QEMUFile *f);
 void qemu_file_set_error(QEMUFile *f, int ret);
 int qemu_file_shutdown(QEMUFile *f);
 void qemu_fflush(QEMUFile *f);
+void qemu_file_change_blocking(QEMUFile *f, bool block);
 
 static inline void qemu_put_be64s(QEMUFile *f, const uint64_t *pv)
 {
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index ec3a598..d84830f 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -609,3 +609,18 @@ int qemu_get_counted_string(QEMUFile *f, char buf[256])
     return res != len;
 }
 
+/*
+ * Change the blocking state of the QEMUFile.
+ * Note: On some transports the OS only keeps a single blocking state for
+ *       both directions, and thus changing the blocking on the main
+ *       QEMUFile can also affect the return path.
+ */
+void qemu_file_change_blocking(QEMUFile *f, bool block)
+{
+    if (block) {
+        qemu_set_block(qemu_get_fd(f));
+    } else {
+        qemu_set_nonblock(qemu_get_fd(f));
+    }
+}
+
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 10/47] Rename save_live_complete to save_live_complete_precopy
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (8 preceding siblings ...)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 09/47] Add wrapper for setting blocking status on a QEMUFile Dr. David Alan Gilbert (git)
@ 2015-04-14 17:03 ` Dr. David Alan Gilbert (git)
  2015-05-18  7:35   ` Amit Shah
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 11/47] Return path: Open a return path on QEMUFile for sockets Dr. David Alan Gilbert (git)
                   ` (37 subsequent siblings)
  47 siblings, 1 reply; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

In postcopy we're going to need to perform the complete phase
for postcopiable devices at a different point, start out by
renaming all of the 'complete's to make the difference obvious.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c                 |  2 +-
 hw/ppc/spapr.c              |  2 +-
 include/migration/vmstate.h |  2 +-
 include/sysemu/sysemu.h     |  2 +-
 migration/block.c           |  2 +-
 migration/migration.c       |  2 +-
 savevm.c                    | 10 +++++-----
 trace-events                |  2 +-
 8 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 06722bb..3a21f0e 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -1233,7 +1233,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 static SaveVMHandlers savevm_ram_handlers = {
     .save_live_setup = ram_save_setup,
     .save_live_iterate = ram_save_iterate,
-    .save_live_complete = ram_save_complete,
+    .save_live_complete_precopy = ram_save_complete,
     .save_live_pending = ram_save_pending,
     .load_state = ram_load,
     .cancel = ram_migration_cancel,
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 61ddc79..20a1187 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1373,7 +1373,7 @@ static int htab_load(QEMUFile *f, void *opaque, int version_id)
 static SaveVMHandlers savevm_htab_handlers = {
     .save_live_setup = htab_save_setup,
     .save_live_iterate = htab_save_iterate,
-    .save_live_complete = htab_save_complete,
+    .save_live_complete_precopy = htab_save_complete,
     .load_state = htab_load,
 };
 
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index bc7616a..55cd174 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -40,7 +40,7 @@ typedef struct SaveVMHandlers {
     SaveStateHandler *save_state;
 
     void (*cancel)(void *opaque);
-    int (*save_live_complete)(QEMUFile *f, void *opaque);
+    int (*save_live_complete_precopy)(QEMUFile *f, void *opaque);
 
     /* This runs both outside and inside the iothread lock.  */
     bool (*is_active)(void *opaque);
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index bd67f86..8402e6e 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -87,7 +87,7 @@ void qemu_savevm_state_begin(QEMUFile *f,
                              const MigrationParams *params);
 void qemu_savevm_state_header(QEMUFile *f);
 int qemu_savevm_state_iterate(QEMUFile *f);
-void qemu_savevm_state_complete(QEMUFile *f);
+void qemu_savevm_state_complete_precopy(QEMUFile *f);
 void qemu_savevm_state_cancel(void);
 uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size);
 int qemu_loadvm_state(QEMUFile *f);
diff --git a/migration/block.c b/migration/block.c
index 085c0fa..00f4998 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -878,7 +878,7 @@ static SaveVMHandlers savevm_block_handlers = {
     .set_params = block_set_params,
     .save_live_setup = block_save_setup,
     .save_live_iterate = block_save_iterate,
-    .save_live_complete = block_save_complete,
+    .save_live_complete_precopy = block_save_complete,
     .save_live_pending = block_save_pending,
     .load_state = block_load,
     .cancel = block_migration_cancel,
diff --git a/migration/migration.c b/migration/migration.c
index ce488cf..872d1e1 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -671,7 +671,7 @@ static void *migration_thread(void *opaque)
                 ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
                 if (ret >= 0) {
                     qemu_file_set_rate_limit(s->file, INT64_MAX);
-                    qemu_savevm_state_complete(s->file);
+                    qemu_savevm_state_complete_precopy(s->file);
                 }
                 qemu_mutex_unlock_iothread();
 
diff --git a/savevm.c b/savevm.c
index 81f6a29..eba9174 100644
--- a/savevm.c
+++ b/savevm.c
@@ -720,19 +720,19 @@ static bool should_send_vmdesc(void)
     return !machine->suppress_vmdesc;
 }
 
-void qemu_savevm_state_complete(QEMUFile *f)
+void qemu_savevm_state_complete_precopy(QEMUFile *f)
 {
     QJSON *vmdesc;
     int vmdesc_len;
     SaveStateEntry *se;
     int ret;
 
-    trace_savevm_state_complete();
+    trace_savevm_state_complete_precopy();
 
     cpu_synchronize_all_states();
 
     QTAILQ_FOREACH(se, &savevm_handlers, entry) {
-        if (!se->ops || !se->ops->save_live_complete) {
+        if (!se->ops || !se->ops->save_live_complete_precopy) {
             continue;
         }
         if (se->ops && se->ops->is_active) {
@@ -745,7 +745,7 @@ void qemu_savevm_state_complete(QEMUFile *f)
         qemu_put_byte(f, QEMU_VM_SECTION_END);
         qemu_put_be32(f, se->section_id);
 
-        ret = se->ops->save_live_complete(f, se->opaque);
+        ret = se->ops->save_live_complete_precopy(f, se->opaque);
         trace_savevm_section_end(se->idstr, se->section_id, ret);
         if (ret < 0) {
             qemu_file_set_error(f, ret);
@@ -858,7 +858,7 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
 
     ret = qemu_file_get_error(f);
     if (ret == 0) {
-        qemu_savevm_state_complete(f);
+        qemu_savevm_state_complete_precopy(f);
         ret = qemu_file_get_error(f);
     }
     if (ret != 0) {
diff --git a/trace-events b/trace-events
index b4641b6..39957fe 100644
--- a/trace-events
+++ b/trace-events
@@ -1176,7 +1176,7 @@ savevm_section_end(const char *id, unsigned int section_id, int ret) "%s, sectio
 savevm_state_begin(void) ""
 savevm_state_header(void) ""
 savevm_state_iterate(void) ""
-savevm_state_complete(void) ""
+savevm_state_complete_precopy(void) ""
 savevm_state_cancel(void) ""
 vmstate_save(const char *idstr, const char *vmsd_name) "%s, %s"
 vmstate_load(const char *idstr, const char *vmsd_name) "%s, %s"
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 11/47] Return path: Open a return path on QEMUFile for sockets
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (9 preceding siblings ...)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 10/47] Rename save_live_complete to save_live_complete_precopy Dr. David Alan Gilbert (git)
@ 2015-04-14 17:03 ` Dr. David Alan Gilbert (git)
  2015-06-10  9:00   ` Amit Shah
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 12/47] Return path: socket_writev_buffer: Block even on non-blocking fd's Dr. David Alan Gilbert (git)
                   ` (36 subsequent siblings)
  47 siblings, 1 reply; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Postcopy needs a method to send messages from the destination back to
the source, this is the 'return path'.

Wire it up for 'socket' QEMUFile's using a dup'd fd.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/qemu-file.h |  7 +++++
 migration/qemu-file-unix.c    | 65 +++++++++++++++++++++++++++++++++++++------
 migration/qemu-file.c         | 12 ++++++++
 3 files changed, 75 insertions(+), 9 deletions(-)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index c14555d..1ff8fbc 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -85,6 +85,11 @@ typedef size_t (QEMURamSaveFunc)(QEMUFile *f, void *opaque,
                                uint64_t *bytes_sent);
 
 /*
+ * Return a QEMUFile for comms in the opposite direction
+ */
+typedef QEMUFile *(QEMURetPathFunc)(void *opaque);
+
+/*
  * Stop any read or write (depending on flags) on the underlying
  * transport on the QEMUFile.
  * Existing blocking reads/writes must be woken
@@ -102,6 +107,7 @@ typedef struct QEMUFileOps {
     QEMURamHookFunc *after_ram_iterate;
     QEMURamHookFunc *hook_ram_load;
     QEMURamSaveFunc *save_page;
+    QEMURetPathFunc *get_return_path;
     QEMUFileShutdownFunc *shut_down;
 } QEMUFileOps;
 
@@ -189,6 +195,7 @@ int64_t qemu_file_get_rate_limit(QEMUFile *f);
 int qemu_file_get_error(QEMUFile *f);
 void qemu_file_set_error(QEMUFile *f, int ret);
 int qemu_file_shutdown(QEMUFile *f);
+QEMUFile *qemu_file_get_return_path(QEMUFile *f);
 void qemu_fflush(QEMUFile *f);
 void qemu_file_change_blocking(QEMUFile *f, bool block);
 
diff --git a/migration/qemu-file-unix.c b/migration/qemu-file-unix.c
index bfbc086..1e7de7b 100644
--- a/migration/qemu-file-unix.c
+++ b/migration/qemu-file-unix.c
@@ -96,6 +96,52 @@ static int socket_shutdown(void *opaque, bool rd, bool wr)
     }
 }
 
+/*
+ * Give a QEMUFile* off the same socket but data in the opposite
+ * direction.
+ */
+static QEMUFile *socket_dup_return_path(void *opaque)
+{
+    QEMUFileSocket *qfs = opaque;
+    int revfd;
+    bool this_is_read;
+    QEMUFile *result;
+
+    if (qemu_file_get_error(qfs->file)) {
+        /* If the forward file is in error, don't try and open a return */
+        return NULL;
+    }
+
+    /* I don't think there's a better way to tell which direction 'this' is */
+    this_is_read = qfs->file->ops->get_buffer != NULL;
+
+    revfd = dup(qfs->fd);
+    if (revfd == -1) {
+        error_report("Error duplicating fd for return path: %s",
+                      strerror(errno));
+        return NULL;
+    }
+
+    result = qemu_fopen_socket(revfd, this_is_read ? "wb" : "rb");
+
+    if (!result) {
+        close(revfd);
+    }
+
+    if (this_is_read) {
+        /* The qemu_fopen_socket "wb" will mark the socket blocking,
+         * which would be OK for the return path, but the semantics
+         * of non-blocking is that it follows the underlying connection
+         * not the fd number, and thus setting the return path non-blocking
+         * ends up setting the forward path blocking, which we don't want
+         */
+        qemu_set_nonblock(revfd);
+    }
+
+
+    return result;
+}
+
 static ssize_t unix_writev_buffer(void *opaque, struct iovec *iov, int iovcnt,
                                   int64_t pos)
 {
@@ -204,18 +250,19 @@ QEMUFile *qemu_fdopen(int fd, const char *mode)
 }
 
 static const QEMUFileOps socket_read_ops = {
-    .get_fd     = socket_get_fd,
-    .get_buffer = socket_get_buffer,
-    .close      = socket_close,
-    .shut_down  = socket_shutdown
-
+    .get_fd          = socket_get_fd,
+    .get_buffer      = socket_get_buffer,
+    .close           = socket_close,
+    .shut_down       = socket_shutdown,
+    .get_return_path = socket_dup_return_path
 };
 
 static const QEMUFileOps socket_write_ops = {
-    .get_fd        = socket_get_fd,
-    .writev_buffer = socket_writev_buffer,
-    .close         = socket_close,
-    .shut_down     = socket_shutdown
+    .get_fd          = socket_get_fd,
+    .writev_buffer   = socket_writev_buffer,
+    .close           = socket_close,
+    .shut_down       = socket_shutdown,
+    .get_return_path = socket_dup_return_path
 };
 
 QEMUFile *qemu_fopen_socket(int fd, const char *mode)
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index d84830f..8b2ae8d 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -42,6 +42,18 @@ int qemu_file_shutdown(QEMUFile *f)
     return f->ops->shut_down(f->opaque, true, true);
 }
 
+/*
+ * Result: QEMUFile* for a 'return path' for comms in the opposite direction
+ *         NULL if not available
+ */
+QEMUFile *qemu_file_get_return_path(QEMUFile *f)
+{
+    if (!f->ops->get_return_path) {
+        return NULL;
+    }
+    return f->ops->get_return_path(f->opaque);
+}
+
 bool qemu_file_mode_is_not_valid(const char *mode)
 {
     if (mode == NULL ||
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 12/47] Return path: socket_writev_buffer: Block even on non-blocking fd's
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (10 preceding siblings ...)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 11/47] Return path: Open a return path on QEMUFile for sockets Dr. David Alan Gilbert (git)
@ 2015-04-14 17:03 ` Dr. David Alan Gilbert (git)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 13/47] Migration commands Dr. David Alan Gilbert (git)
                   ` (35 subsequent siblings)
  47 siblings, 0 replies; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The destination sets the fd to non-blocking on incoming migrations;
this also affects the return path from the destination, and thus we
need to make sure we can safely write to the return path.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/qemu-file-unix.c | 41 ++++++++++++++++++++++++++++++++++++-----
 1 file changed, 36 insertions(+), 5 deletions(-)

diff --git a/migration/qemu-file-unix.c b/migration/qemu-file-unix.c
index 1e7de7b..6b024e5 100644
--- a/migration/qemu-file-unix.c
+++ b/migration/qemu-file-unix.c
@@ -39,12 +39,43 @@ static ssize_t socket_writev_buffer(void *opaque, struct iovec *iov, int iovcnt,
     QEMUFileSocket *s = opaque;
     ssize_t len;
     ssize_t size = iov_size(iov, iovcnt);
+    ssize_t offset = 0;
+    int     err;
 
-    len = iov_send(s->fd, iov, iovcnt, 0, size);
-    if (len < size) {
-        len = -socket_error();
-    }
-    return len;
+    while (size > 0) {
+        len = iov_send(s->fd, iov, iovcnt, offset, size);
+
+        if (len > 0) {
+            size -= len;
+            offset += len;
+        }
+
+        if (size > 0) {
+            err = socket_error();
+
+            if (err != EAGAIN && err != EWOULDBLOCK) {
+                error_report("socket_writev_buffer: Got err=%d for (%zd/%zd)",
+                             err, size, len);
+                /*
+                 * If I've already sent some but only just got the error, I
+                 * could return the amount validly sent so far and wait for the
+                 * next call to report the error, but I'd rather flag the error
+                 * immediately.
+                 */
+                return -err;
+            }
+
+            /* Emulate blocking */
+            GPollFD pfd;
+
+            pfd.fd = s->fd;
+            pfd.events = G_IO_OUT | G_IO_ERR;
+            pfd.revents = 0;
+            g_poll(&pfd, 1 /* 1 fd */, -1 /* no timeout */);
+        }
+     }
+
+    return offset;
 }
 
 static int socket_get_fd(void *opaque)
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 13/47] Migration commands
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (11 preceding siblings ...)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 12/47] Return path: socket_writev_buffer: Block even on non-blocking fd's Dr. David Alan Gilbert (git)
@ 2015-04-14 17:03 ` Dr. David Alan Gilbert (git)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 14/47] Return path: Control commands Dr. David Alan Gilbert (git)
                   ` (34 subsequent siblings)
  47 siblings, 0 replies; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Create QEMU_VM_COMMAND section type for sending commands from
source to destination.  These commands are not intended to convey
guest state but to control the migration process.

For use in postcopy.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  1 +
 include/sysemu/sysemu.h       |  7 +++++++
 savevm.c                      | 47 +++++++++++++++++++++++++++++++++++++++++++
 trace-events                  |  1 +
 4 files changed, 56 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 7a6f521..f221c99 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -34,6 +34,7 @@
 #define QEMU_VM_SECTION_FULL         0x04
 #define QEMU_VM_SUBSECTION           0x05
 #define QEMU_VM_VMDESCRIPTION        0x06
+#define QEMU_VM_COMMAND              0x07
 
 struct MigrationParams {
     bool blk;
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 8402e6e..e82b205 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -82,6 +82,11 @@ void hmp_info_snapshots(Monitor *mon, const QDict *qdict);
 
 void qemu_announce_self(void);
 
+/* Subcommands for QEMU_VM_COMMAND */
+enum qemu_vm_cmd {
+    MIG_CMD_INVALID = 0,   /* Must be 0 */
+};
+
 bool qemu_savevm_state_blocked(Error **errp);
 void qemu_savevm_state_begin(QEMUFile *f,
                              const MigrationParams *params);
@@ -90,6 +95,8 @@ int qemu_savevm_state_iterate(QEMUFile *f);
 void qemu_savevm_state_complete_precopy(QEMUFile *f);
 void qemu_savevm_state_cancel(void);
 uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size);
+void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
+                              uint16_t len, uint8_t *data);
 int qemu_loadvm_state(QEMUFile *f);
 
 typedef enum DisplayType
diff --git a/savevm.c b/savevm.c
index eba9174..2dc5fbb 100644
--- a/savevm.c
+++ b/savevm.c
@@ -602,6 +602,24 @@ static void vmstate_save(QEMUFile *f, SaveStateEntry *se, QJSON *vmdesc)
     vmstate_save_state(f, se->vmsd, se->opaque, vmdesc);
 }
 
+
+/* Send a 'QEMU_VM_COMMAND' type element with the command
+ * and associated data.
+ */
+void qemu_savevm_command_send(QEMUFile *f,
+                              enum qemu_vm_cmd command,
+                              uint16_t len,
+                              uint8_t *data)
+{
+    qemu_put_byte(f, QEMU_VM_COMMAND);
+    qemu_put_be16(f, (uint16_t)command);
+    qemu_put_be16(f, len);
+    if (len) {
+        qemu_put_buffer(f, data, len);
+    }
+    qemu_fflush(f);
+}
+
 bool qemu_savevm_state_blocked(Error **errp)
 {
     SaveStateEntry *se;
@@ -927,6 +945,29 @@ static SaveStateEntry *find_se(const char *idstr, int instance_id)
     return NULL;
 }
 
+/*
+ * Process an incoming 'QEMU_VM_COMMAND'
+ * negative return on error (will issue error message)
+ */
+static int loadvm_process_command(QEMUFile *f)
+{
+    uint16_t com;
+    uint16_t len;
+
+    com = qemu_get_be16(f);
+    len = qemu_get_be16(f);
+
+    trace_loadvm_process_command(com, len);
+    switch (com) {
+
+    default:
+        error_report("VM_COMMAND 0x%x unknown (len 0x%x)", com, len);
+        return -1;
+    }
+
+    return 0;
+}
+
 typedef struct LoadStateEntry {
     QLIST_ENTRY(LoadStateEntry) entry;
     SaveStateEntry *se;
@@ -1042,6 +1083,12 @@ int qemu_loadvm_state(QEMUFile *f)
                 goto out;
             }
             break;
+        case QEMU_VM_COMMAND:
+            ret = loadvm_process_command(f);
+            if (ret < 0) {
+                goto out;
+            }
+            break;
         default:
             error_report("Unknown savevm section type %d", section_type);
             ret = -EINVAL;
diff --git a/trace-events b/trace-events
index 39957fe..4d093dc 100644
--- a/trace-events
+++ b/trace-events
@@ -1171,6 +1171,7 @@ vmware_setmode(uint32_t w, uint32_t h, uint32_t bpp) "%dx%d @ %d bpp"
 qemu_loadvm_state_section(unsigned int section_type) "%d"
 qemu_loadvm_state_section_partend(uint32_t section_id) "%u"
 qemu_loadvm_state_section_startfull(uint32_t section_id, const char *idstr, uint32_t instance_id, uint32_t version_id) "%u(%s) %u %u"
+loadvm_process_command(uint16_t com, uint16_t len) "com=0x%x len=%d"
 savevm_section_start(const char *id, unsigned int section_id) "%s, section_id %u"
 savevm_section_end(const char *id, unsigned int section_id, int ret) "%s, section_id %u -> %d"
 savevm_state_begin(void) ""
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 14/47] Return path: Control commands
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (12 preceding siblings ...)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 13/47] Migration commands Dr. David Alan Gilbert (git)
@ 2015-04-14 17:03 ` Dr. David Alan Gilbert (git)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 15/47] Return path: Send responses from destination to source Dr. David Alan Gilbert (git)
                   ` (33 subsequent siblings)
  47 siblings, 0 replies; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add two src->dest commands:
   * OPEN_RETURN_PATH - To request that the destination open the return path
   * PING - Request an acknowledge from the destination

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 include/migration/migration.h |  2 ++
 include/sysemu/sysemu.h       |  6 ++++-
 savevm.c                      | 59 +++++++++++++++++++++++++++++++++++++++++++
 trace-events                  |  2 ++
 4 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index f221c99..e2e251d 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -46,6 +46,8 @@ typedef struct MigrationState MigrationState;
 /* State for the incoming migration */
 struct MigrationIncomingState {
     QEMUFile *file;
+
+    QEMUFile *return_path;
 };
 
 MigrationIncomingState *migration_incoming_get_current(void);
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index e82b205..49ba134 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -84,7 +84,9 @@ void qemu_announce_self(void);
 
 /* Subcommands for QEMU_VM_COMMAND */
 enum qemu_vm_cmd {
-    MIG_CMD_INVALID = 0,   /* Must be 0 */
+    MIG_CMD_INVALID = 0,       /* Must be 0 */
+    MIG_CMD_OPEN_RETURN_PATH,  /* Tell the dest to open the Return path */
+    MIG_CMD_PING,              /* Request a PONG on the RP */
 };
 
 bool qemu_savevm_state_blocked(Error **errp);
@@ -97,6 +99,8 @@ void qemu_savevm_state_cancel(void);
 uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size);
 void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
                               uint16_t len, uint8_t *data);
+void qemu_savevm_send_ping(QEMUFile *f, uint32_t value);
+void qemu_savevm_send_open_return_path(QEMUFile *f);
 int qemu_loadvm_state(QEMUFile *f);
 
 typedef enum DisplayType
diff --git a/savevm.c b/savevm.c
index 2dc5fbb..4dc8f06 100644
--- a/savevm.c
+++ b/savevm.c
@@ -620,6 +620,20 @@ void qemu_savevm_command_send(QEMUFile *f,
     qemu_fflush(f);
 }
 
+void qemu_savevm_send_ping(QEMUFile *f, uint32_t value)
+{
+    uint32_t buf;
+
+    trace_savevm_send_ping(value);
+    buf = cpu_to_be32(value);
+    qemu_savevm_command_send(f, MIG_CMD_PING, 4, (uint8_t *)&buf);
+}
+
+void qemu_savevm_send_open_return_path(QEMUFile *f)
+{
+    qemu_savevm_command_send(f, MIG_CMD_OPEN_RETURN_PATH, 0, NULL);
+}
+
 bool qemu_savevm_state_blocked(Error **errp)
 {
     SaveStateEntry *se;
@@ -945,20 +959,65 @@ static SaveStateEntry *find_se(const char *idstr, int instance_id)
     return NULL;
 }
 
+static int loadvm_process_command_simple_lencheck(const char *name,
+                                                  unsigned int actual,
+                                                  unsigned int expected)
+{
+    if (actual != expected) {
+        error_report("%s received with bad length - expecting %d, got %d",
+                     name, expected, actual);
+        return -1;
+    }
+
+    return 0;
+}
+
 /*
  * Process an incoming 'QEMU_VM_COMMAND'
  * negative return on error (will issue error message)
  */
 static int loadvm_process_command(QEMUFile *f)
 {
+    MigrationIncomingState *mis = migration_incoming_get_current();
     uint16_t com;
     uint16_t len;
+    uint32_t tmp32;
 
     com = qemu_get_be16(f);
     len = qemu_get_be16(f);
 
     trace_loadvm_process_command(com, len);
     switch (com) {
+    case MIG_CMD_OPEN_RETURN_PATH:
+        if (loadvm_process_command_simple_lencheck("CMD_OPEN_RETURN_PATH",
+                                                   len, 0)) {
+            return -1;
+        }
+        if (mis->return_path) {
+            error_report("CMD_OPEN_RETURN_PATH called when RP already open");
+            /* Not really a problem, so don't give up */
+            return 0;
+        }
+        mis->return_path = qemu_file_get_return_path(f);
+        if (!mis->return_path) {
+            error_report("CMD_OPEN_RETURN_PATH failed");
+            return -1;
+        }
+        break;
+
+    case MIG_CMD_PING:
+        if (loadvm_process_command_simple_lencheck("CMD_PING", len, 4)) {
+            return -1;
+        }
+        tmp32 = qemu_get_be32(f);
+        trace_loadvm_process_command_ping(tmp32);
+        if (!mis->return_path) {
+            error_report("CMD_PING (0x%x) received with no return path",
+                         tmp32);
+            return -1;
+        }
+        /* migrate_send_rp_pong(mis, tmp32); TODO: gets added later */
+        break;
 
     default:
         error_report("VM_COMMAND 0x%x unknown (len 0x%x)", com, len);
diff --git a/trace-events b/trace-events
index 4d093dc..0f74836 100644
--- a/trace-events
+++ b/trace-events
@@ -1172,8 +1172,10 @@ qemu_loadvm_state_section(unsigned int section_type) "%d"
 qemu_loadvm_state_section_partend(uint32_t section_id) "%u"
 qemu_loadvm_state_section_startfull(uint32_t section_id, const char *idstr, uint32_t instance_id, uint32_t version_id) "%u(%s) %u %u"
 loadvm_process_command(uint16_t com, uint16_t len) "com=0x%x len=%d"
+loadvm_process_command_ping(uint32_t val) "%x"
 savevm_section_start(const char *id, unsigned int section_id) "%s, section_id %u"
 savevm_section_end(const char *id, unsigned int section_id, int ret) "%s, section_id %u -> %d"
+savevm_send_ping(uint32_t val) "%x"
 savevm_state_begin(void) ""
 savevm_state_header(void) ""
 savevm_state_iterate(void) ""
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 15/47] Return path: Send responses from destination to source
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (13 preceding siblings ...)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 14/47] Return path: Control commands Dr. David Alan Gilbert (git)
@ 2015-04-14 17:03 ` Dr. David Alan Gilbert (git)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 16/47] Return path: Source handling of return path Dr. David Alan Gilbert (git)
                   ` (32 subsequent siblings)
  47 siblings, 0 replies; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add migrate_send_rp_message to send a message from destination to source along the return path.
  (It uses a mutex to let it be called from multiple threads)
Add migrate_send_rp_shut to send a 'shut' message to indicate
  the destination is finished with the RP.
Add migrate_send_rp_ack to send a 'PONG' message in response to a PING
  Use it in the MSG_RP_PING handler

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h | 17 ++++++++++++++++
 migration/migration.c         | 45 +++++++++++++++++++++++++++++++++++++++++++
 savevm.c                      |  2 +-
 trace-events                  |  1 +
 4 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index e2e251d..6300ec1 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -41,6 +41,13 @@ struct MigrationParams {
     bool shared;
 };
 
+/* Messages sent on the return path from destination to source */
+enum mig_rp_message_type {
+    MIG_RP_MSG_INVALID = 0,  /* Must be 0 */
+    MIG_RP_MSG_SHUT,         /* sibling will not send any more RP messages */
+    MIG_RP_MSG_PONG,         /* Response to a PING; data (seq: be32 ) */
+};
+
 typedef struct MigrationState MigrationState;
 
 /* State for the incoming migration */
@@ -48,6 +55,7 @@ struct MigrationIncomingState {
     QEMUFile *file;
 
     QEMUFile *return_path;
+    QemuMutex      rp_mutex;    /* We send replies from multiple threads */
 };
 
 MigrationIncomingState *migration_incoming_get_current(void);
@@ -164,6 +172,15 @@ int64_t migrate_xbzrle_cache_size(void);
 
 int64_t xbzrle_cache_resize(int64_t new_size);
 
+/* Sending on the return path - generic and then for each message type */
+void migrate_send_rp_message(MigrationIncomingState *mis,
+                             enum mig_rp_message_type message_type,
+                             uint16_t len, void *data);
+void migrate_send_rp_shut(MigrationIncomingState *mis,
+                          uint32_t value);
+void migrate_send_rp_pong(MigrationIncomingState *mis,
+                          uint32_t value);
+
 void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_load_hook(QEMUFile *f, uint64_t flags);
diff --git a/migration/migration.c b/migration/migration.c
index 872d1e1..db9471d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -70,6 +70,7 @@ MigrationIncomingState *migration_incoming_state_new(QEMUFile* f)
 {
     mis_current = g_malloc0(sizeof(MigrationIncomingState));
     mis_current->file = f;
+    qemu_mutex_init(&mis_current->rp_mutex);
 
     return mis_current;
 }
@@ -162,6 +163,50 @@ void process_incoming_migration(QEMUFile *f)
     qemu_coroutine_enter(co, f);
 }
 
+/*
+ * Send a message on the return channel back to the source
+ * of the migration.
+ */
+void migrate_send_rp_message(MigrationIncomingState *mis,
+                             enum mig_rp_message_type message_type,
+                             uint16_t len, void *data)
+{
+    trace_migrate_send_rp_message((int)message_type, len);
+    qemu_mutex_lock(&mis->rp_mutex);
+    qemu_put_be16(mis->return_path, (unsigned int)message_type);
+    qemu_put_be16(mis->return_path, len);
+    qemu_put_buffer(mis->return_path, data, len);
+    qemu_fflush(mis->return_path);
+    qemu_mutex_unlock(&mis->rp_mutex);
+}
+
+/*
+ * Send a 'SHUT' message on the return channel with the given value
+ * to indicate that we've finished with the RP.  None-0 value indicates
+ * error.
+ */
+void migrate_send_rp_shut(MigrationIncomingState *mis,
+                          uint32_t value)
+{
+    uint32_t buf;
+
+    buf = cpu_to_be32(value);
+    migrate_send_rp_message(mis, MIG_RP_MSG_SHUT, sizeof(buf), &buf);
+}
+
+/*
+ * Send a 'PONG' message on the return channel with the given value
+ * (normally in response to a 'PING')
+ */
+void migrate_send_rp_pong(MigrationIncomingState *mis,
+                          uint32_t value)
+{
+    uint32_t buf;
+
+    buf = cpu_to_be32(value);
+    migrate_send_rp_message(mis, MIG_RP_MSG_PONG, sizeof(buf), &buf);
+}
+
 /* amount of nanoseconds we are willing to wait for migration to be down.
  * the choice of nanoseconds is because it is the maximum resolution that
  * get_clock() can achieve. It is an internal measure. All user-visible
diff --git a/savevm.c b/savevm.c
index 4dc8f06..f6b8b90 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1016,7 +1016,7 @@ static int loadvm_process_command(QEMUFile *f)
                          tmp32);
             return -1;
         }
-        /* migrate_send_rp_pong(mis, tmp32); TODO: gets added later */
+        migrate_send_rp_pong(mis, tmp32);
         break;
 
     default:
diff --git a/trace-events b/trace-events
index 0f74836..9f0a071 100644
--- a/trace-events
+++ b/trace-events
@@ -1383,6 +1383,7 @@ migrate_fd_cleanup(void) ""
 migrate_fd_error(void) ""
 migrate_fd_cancel(void) ""
 migrate_pending(uint64_t size, uint64_t max) "pending size %" PRIu64 " max %" PRIu64
+migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
 migrate_transferred(uint64_t tranferred, uint64_t time_spent, double bandwidth, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %g max_size %" PRId64
 
 # migration/rdma.c
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 16/47] Return path: Source handling of return path
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (14 preceding siblings ...)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 15/47] Return path: Send responses from destination to source Dr. David Alan Gilbert (git)
@ 2015-04-14 17:03 ` Dr. David Alan Gilbert (git)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 17/47] ram_debug_dump_bitmap: Dump a migration bitmap as text Dr. David Alan Gilbert (git)
                   ` (31 subsequent siblings)
  47 siblings, 0 replies; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Open a return path, and handle messages that are received upon it.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |   8 ++
 migration/migration.c         | 177 +++++++++++++++++++++++++++++++++++++++++-
 trace-events                  |  12 +++
 3 files changed, 196 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 6300ec1..0719d82 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -73,6 +73,14 @@ struct MigrationState
 
     int state;
     MigrationParams params;
+
+    /* State related to return path */
+    struct {
+        QEMUFile     *file;
+        QemuThread    rp_thread;
+        bool          error;
+    } rp_state;
+
     double mbps;
     int64_t total_time;
     int64_t downtime;
diff --git a/migration/migration.c b/migration/migration.c
index db9471d..88355e2 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -243,6 +243,23 @@ MigrationCapabilityStatusList *qmp_query_migrate_capabilities(Error **errp)
     return head;
 }
 
+/*
+ * Return true if we're already in the middle of a migration
+ * (i.e. any of the active or setup states)
+ */
+static bool migration_already_active(MigrationState *ms)
+{
+    switch (ms->state) {
+    case MIGRATION_STATUS_ACTIVE:
+    case MIGRATION_STATUS_SETUP:
+        return true;
+
+    default:
+        return false;
+
+    }
+}
+
 static void get_xbzrle_cache_stats(MigrationInfo *info)
 {
     if (migrate_use_xbzrle()) {
@@ -365,6 +382,21 @@ static void migrate_set_state(MigrationState *s, int old_state, int new_state)
     }
 }
 
+static void migrate_fd_cleanup_src_rp(MigrationState *ms)
+{
+    QEMUFile *rp = ms->rp_state.file;
+
+    /*
+     * When stuff goes wrong (e.g. failing destination) on the rp, it can get
+     * cleaned up from a few threads; make sure not to do it twice in parallel
+     */
+    rp = atomic_cmpxchg(&ms->rp_state.file, rp, NULL);
+    if (rp) {
+        trace_migrate_fd_cleanup_src_rp();
+        qemu_fclose(rp);
+    }
+}
+
 static void migrate_fd_cleanup(void *opaque)
 {
     MigrationState *s = opaque;
@@ -372,6 +404,8 @@ static void migrate_fd_cleanup(void *opaque)
     qemu_bh_delete(s->cleanup_bh);
     s->cleanup_bh = NULL;
 
+    migrate_fd_cleanup_src_rp(s);
+
     if (s->file) {
         trace_migrate_fd_cleanup();
         qemu_mutex_unlock_iothread();
@@ -410,6 +444,11 @@ static void migrate_fd_cancel(MigrationState *s)
     QEMUFile *f = migrate_get_current()->file;
     trace_migrate_fd_cancel();
 
+    if (s->rp_state.file) {
+        /* shutdown the rp socket, so causing the rp thread to shutdown */
+        qemu_file_shutdown(s->rp_state.file);
+    }
+
     do {
         old_state = s->state;
         if (old_state != MIGRATION_STATUS_SETUP &&
@@ -678,8 +717,144 @@ int64_t migrate_xbzrle_cache_size(void)
     return s->xbzrle_cache_size;
 }
 
-/* migration thread support */
+/*
+ * Something bad happened to the RP stream, mark an error
+ * The caller shall print something to indicate why
+ */
+static void source_return_path_bad(MigrationState *s)
+{
+    s->rp_state.error = true;
+    migrate_fd_cleanup_src_rp(s);
+}
+
+/*
+ * Handles messages sent on the return path towards the source VM
+ *
+ */
+static void *source_return_path_thread(void *opaque)
+{
+    MigrationState *ms = opaque;
+    QEMUFile *rp = ms->rp_state.file;
+    uint16_t expected_len, header_len, header_type;
+    const int max_len = 512;
+    uint8_t buf[max_len];
+    uint32_t tmp32;
+    int res;
+
+    trace_source_return_path_thread_entry();
+    while (rp && !qemu_file_get_error(rp) &&
+        migration_already_active(ms)) {
+        trace_source_return_path_thread_loop_top();
+        header_type = qemu_get_be16(rp);
+        header_len = qemu_get_be16(rp);
+
+        switch (header_type) {
+        case MIG_RP_MSG_SHUT:
+        case MIG_RP_MSG_PONG:
+            expected_len = 4;
+            break;
+
+        default:
+            error_report("RP: Received invalid message 0x%04x length 0x%04x",
+                    header_type, header_len);
+            source_return_path_bad(ms);
+            goto out;
+        }
+
+        if (header_len > expected_len) {
+            error_report("RP: Received message 0x%04x with"
+                    "incorrect length %d expecting %d",
+                    header_type, header_len,
+                    expected_len);
+            source_return_path_bad(ms);
+            goto out;
+        }
+
+        /* We know we've got a valid header by this point */
+        res = qemu_get_buffer(rp, buf, header_len);
+        if (res != header_len) {
+            trace_source_return_path_thread_failed_read_cmd_data();
+            source_return_path_bad(ms);
+            goto out;
+        }
+
+        /* OK, we have the message and the data */
+        switch (header_type) {
+        case MIG_RP_MSG_SHUT:
+            tmp32 = be32_to_cpup((uint32_t *)buf);
+            trace_source_return_path_thread_shut(tmp32);
+            if (tmp32) {
+                error_report("RP: Sibling indicated error %d", tmp32);
+                source_return_path_bad(ms);
+            }
+            /*
+             * We'll let the main thread deal with closing the RP
+             * we could do a shutdown(2) on it, but we're the only user
+             * anyway, so there's nothing gained.
+             */
+            goto out;
+
+        case MIG_RP_MSG_PONG:
+            tmp32 = be32_to_cpup((uint32_t *)buf);
+            trace_source_return_path_thread_pong(tmp32);
+            break;
+
+        default:
+            break;
+        }
+    }
+    if (rp && qemu_file_get_error(rp)) {
+        trace_source_return_path_thread_bad_end();
+        source_return_path_bad(ms);
+    }
+
+    trace_source_return_path_thread_end();
+out:
+    return NULL;
+}
+
+__attribute__ (( unused )) /* Until later in patch series */
+static int open_return_path_on_source(MigrationState *ms)
+{
+
+    ms->rp_state.file = qemu_file_get_return_path(ms->file);
+    if (!ms->rp_state.file) {
+        return -1;
+    }
+
+    trace_open_return_path_on_source();
+    qemu_thread_create(&ms->rp_state.rp_thread, "return path",
+                       source_return_path_thread, ms, QEMU_THREAD_JOINABLE);
 
+    trace_open_return_path_on_source_continue();
+
+    return 0;
+}
+
+__attribute__ (( unused )) /* Until later in patch series */
+/* Returns 0 if the RP was ok, otherwise there was an error on the RP */
+static int await_return_path_close_on_source(MigrationState *ms)
+{
+    /*
+     * If this is a normal exit then the destination will send a SHUT and the
+     * rp_thread will exit, however if there's an error we need to cause
+     * it to exit, which we can do by a shutdown.
+     * (canceling must also shutdown to stop us getting stuck here if
+     * the destination died at just the wrong place)
+     */
+    if (qemu_file_get_error(ms->file) && ms->rp_state.file) {
+        qemu_file_shutdown(ms->rp_state.file);
+    }
+    trace_await_return_path_close_on_source_joining();
+    qemu_thread_join(&ms->rp_state.rp_thread);
+    trace_await_return_path_close_on_source_close();
+    return ms->rp_state.error;
+}
+
+/*
+ * Master migration thread on the source VM.
+ * It drives the migration and pumps the data down the outgoing channel.
+ */
 static void *migration_thread(void *opaque)
 {
     MigrationState *s = opaque;
diff --git a/trace-events b/trace-events
index 9f0a071..eb40e61 100644
--- a/trace-events
+++ b/trace-events
@@ -1378,12 +1378,24 @@ flic_no_device_api(int err) "flic: no Device Contral API support %d"
 flic_reset_failed(int err) "flic: reset failed %d"
 
 # migration.c
+await_return_path_close_on_source_close(void) ""
+await_return_path_close_on_source_joining(void) ""
 migrate_set_state(int new_state) "new state %d"
 migrate_fd_cleanup(void) ""
+migrate_fd_cleanup_src_rp(void) ""
 migrate_fd_error(void) ""
 migrate_fd_cancel(void) ""
 migrate_pending(uint64_t size, uint64_t max) "pending size %" PRIu64 " max %" PRIu64
 migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
+open_return_path_on_source(void) ""
+open_return_path_on_source_continue(void) ""
+source_return_path_thread_bad_end(void) ""
+source_return_path_thread_end(void) ""
+source_return_path_thread_entry(void) ""
+source_return_path_thread_failed_read_cmd_data(void) ""
+source_return_path_thread_loop_top(void) ""
+source_return_path_thread_pong(uint32_t val) "%x"
+source_return_path_thread_shut(uint32_t val) "%x"
 migrate_transferred(uint64_t tranferred, uint64_t time_spent, double bandwidth, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %g max_size %" PRId64
 
 # migration/rdma.c
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 17/47] ram_debug_dump_bitmap: Dump a migration bitmap as text
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (15 preceding siblings ...)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 16/47] Return path: Source handling of return path Dr. David Alan Gilbert (git)
@ 2015-04-14 17:03 ` Dr. David Alan Gilbert (git)
  2015-05-21  9:21   ` Amit Shah
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 18/47] Move loadvm_handlers into MigrationIncomingState Dr. David Alan Gilbert (git)
                   ` (30 subsequent siblings)
  47 siblings, 1 reply; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Misses out lines that are all the expected value so the output
can be quite compact depending on the circumstance.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c                   | 40 +++++++++++++++++++++++++++++++++++++++-
 include/migration/migration.h |  1 +
 2 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/arch_init.c b/arch_init.c
index 3a21f0e..2b0cd18 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -833,13 +833,51 @@ static void reset_ram_globals(void)
 
 #define MAX_WAIT 50 /* ms, half buffered_file limit */
 
-
 /* Each of ram_save_setup, ram_save_iterate and ram_save_complete has
  * long-running RCU critical section.  When rcu-reclaims in the code
  * start to become numerous it will be necessary to reduce the
  * granularity of these critical sections.
  */
 
+/*
+ * 'expected' is the value you expect the bitmap mostly to be full
+ * of and it won't bother printing lines that are all this value
+ * if 'todump' is null the migration bitmap is dumped.
+ */
+void ram_debug_dump_bitmap(unsigned long *todump, bool expected)
+{
+    int64_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
+
+    int64_t cur;
+    int64_t linelen = 128;
+    char linebuf[129];
+
+    if (!todump) {
+        todump = migration_bitmap;
+    }
+
+    for (cur = 0; cur < ram_pages; cur += linelen) {
+        int64_t curb;
+        bool found = false;
+        /*
+         * Last line; catch the case where the line length
+         * is longer than remaining ram
+         */
+        if (cur+linelen > ram_pages) {
+            linelen = ram_pages - cur;
+        }
+        for (curb = 0; curb < linelen; curb++) {
+            bool thisbit = test_bit(cur+curb, todump);
+            linebuf[curb] = thisbit ? '1' : '.';
+            found = found || (thisbit != expected);
+        }
+        if (found) {
+            linebuf[curb] = '\0';
+            fprintf(stderr,  "0x%08" PRIx64 " : %s\n", cur, linebuf);
+        }
+    }
+}
+
 static int ram_save_setup(QEMUFile *f, void *opaque)
 {
     RAMBlock *block;
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 0719d82..fb7551d 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -152,6 +152,7 @@ uint64_t xbzrle_mig_pages_cache_miss(void);
 double xbzrle_mig_cache_miss_rate(void);
 
 void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
+void ram_debug_dump_bitmap(unsigned long *todump, bool expected);
 
 /**
  * @migrate_add_blocker - prevent migration from proceeding
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 18/47] Move loadvm_handlers into MigrationIncomingState
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (16 preceding siblings ...)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 17/47] ram_debug_dump_bitmap: Dump a migration bitmap as text Dr. David Alan Gilbert (git)
@ 2015-04-14 17:03 ` Dr. David Alan Gilbert (git)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 19/47] Rework loadvm path for subloops Dr. David Alan Gilbert (git)
                   ` (29 subsequent siblings)
  47 siblings, 0 replies; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

In postcopy we need the loadvm_handlers to be used in a couple
of different instances of the loadvm loop/routine, and thus
it can't be local any more.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 include/migration/migration.h |  5 +++++
 include/migration/vmstate.h   |  2 ++
 include/qemu/typedefs.h       |  1 +
 migration/migration.c         |  2 ++
 savevm.c                      | 28 ++++++++++++++++------------
 5 files changed, 26 insertions(+), 12 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index fb7551d..92a6068 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -50,10 +50,15 @@ enum mig_rp_message_type {
 
 typedef struct MigrationState MigrationState;
 
+typedef QLIST_HEAD(, LoadStateEntry) LoadStateEntry_Head;
+
 /* State for the incoming migration */
 struct MigrationIncomingState {
     QEMUFile *file;
 
+    /* See savevm.c */
+    LoadStateEntry_Head loadvm_handlers;
+
     QEMUFile *return_path;
     QemuMutex      rp_mutex;    /* We send replies from multiple threads */
 };
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 55cd174..b86b3d9 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -812,6 +812,8 @@ extern const VMStateInfo vmstate_info_bitmap;
 
 #define SELF_ANNOUNCE_ROUNDS 5
 
+void loadvm_free_handlers(MigrationIncomingState *mis);
+
 int vmstate_load_state(QEMUFile *f, const VMStateDescription *vmsd,
                        void *opaque, int version_id);
 void vmstate_save_state(QEMUFile *f, const VMStateDescription *vmsd,
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 74dfad3..6fdcbcd 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -31,6 +31,7 @@ typedef struct I2CBus I2CBus;
 typedef struct I2SCodec I2SCodec;
 typedef struct ISABus ISABus;
 typedef struct ISADevice ISADevice;
+typedef struct LoadStateEntry LoadStateEntry;
 typedef struct MACAddr MACAddr;
 typedef struct MachineClass MachineClass;
 typedef struct MachineState MachineState;
diff --git a/migration/migration.c b/migration/migration.c
index 88355e2..bcad9a4 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -70,6 +70,7 @@ MigrationIncomingState *migration_incoming_state_new(QEMUFile* f)
 {
     mis_current = g_malloc0(sizeof(MigrationIncomingState));
     mis_current->file = f;
+    QLIST_INIT(&mis_current->loadvm_handlers);
     qemu_mutex_init(&mis_current->rp_mutex);
 
     return mis_current;
@@ -77,6 +78,7 @@ MigrationIncomingState *migration_incoming_state_new(QEMUFile* f)
 
 void migration_incoming_state_destroy(void)
 {
+    loadvm_free_handlers(mis_current);
     g_free(mis_current);
     mis_current = NULL;
 }
diff --git a/savevm.c b/savevm.c
index f6b8b90..ef174d7 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1027,18 +1027,26 @@ static int loadvm_process_command(QEMUFile *f)
     return 0;
 }
 
-typedef struct LoadStateEntry {
+struct LoadStateEntry {
     QLIST_ENTRY(LoadStateEntry) entry;
     SaveStateEntry *se;
     int section_id;
     int version_id;
-} LoadStateEntry;
+};
 
-int qemu_loadvm_state(QEMUFile *f)
+void loadvm_free_handlers(MigrationIncomingState *mis)
 {
-    QLIST_HEAD(, LoadStateEntry) loadvm_handlers =
-        QLIST_HEAD_INITIALIZER(loadvm_handlers);
     LoadStateEntry *le, *new_le;
+
+    QLIST_FOREACH_SAFE(le, &mis->loadvm_handlers, entry, new_le) {
+        QLIST_REMOVE(le, entry);
+        g_free(le);
+    }
+}
+
+int qemu_loadvm_state(QEMUFile *f)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
     Error *local_err = NULL;
     uint8_t section_type;
     unsigned int v;
@@ -1069,6 +1077,7 @@ int qemu_loadvm_state(QEMUFile *f)
     while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
         uint32_t instance_id, version_id, section_id;
         SaveStateEntry *se;
+        LoadStateEntry *le;
         char idstr[256];
 
         trace_qemu_loadvm_state_section(section_type);
@@ -1110,7 +1119,7 @@ int qemu_loadvm_state(QEMUFile *f)
             le->se = se;
             le->section_id = section_id;
             le->version_id = version_id;
-            QLIST_INSERT_HEAD(&loadvm_handlers, le, entry);
+            QLIST_INSERT_HEAD(&mis->loadvm_handlers, le, entry);
 
             ret = vmstate_load(f, le->se, le->version_id);
             if (ret < 0) {
@@ -1124,7 +1133,7 @@ int qemu_loadvm_state(QEMUFile *f)
             section_id = qemu_get_be32(f);
 
             trace_qemu_loadvm_state_section_partend(section_id);
-            QLIST_FOREACH(le, &loadvm_handlers, entry) {
+            QLIST_FOREACH(le, &mis->loadvm_handlers, entry) {
                 if (le->section_id == section_id) {
                     break;
                 }
@@ -1178,11 +1187,6 @@ int qemu_loadvm_state(QEMUFile *f)
     ret = 0;
 
 out:
-    QLIST_FOREACH_SAFE(le, &loadvm_handlers, entry, new_le) {
-        QLIST_REMOVE(le, entry);
-        g_free(le);
-    }
-
     if (ret == 0) {
         /* We may not have a VMDESC section, so ignore relative errors */
         ret = file_error_after_eof;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 19/47] Rework loadvm path for subloops
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (17 preceding siblings ...)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 18/47] Move loadvm_handlers into MigrationIncomingState Dr. David Alan Gilbert (git)
@ 2015-04-14 17:03 ` Dr. David Alan Gilbert (git)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 20/47] Add migration-capability boolean for postcopy-ram Dr. David Alan Gilbert (git)
                   ` (28 subsequent siblings)
  47 siblings, 0 replies; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Postcopy needs to have two migration streams loading concurrently;
one from memory (with the device state) and the other from the fd
with the memory transactions.

Split the core of qemu_loadvm_state out so we can use it for both.

Allow the inner loadvm loop to quit and cause the parent loops to
exit as well.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |   6 ++
 migration/migration.c         |   2 +
 savevm.c                      | 125 +++++++++++++++++++++++-------------------
 trace-events                  |   4 ++
 4 files changed, 81 insertions(+), 56 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 92a6068..ae85958 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -59,6 +59,12 @@ struct MigrationIncomingState {
     /* See savevm.c */
     LoadStateEntry_Head loadvm_handlers;
 
+    /*
+     * Free at the start of the main state load, set as the main thread finishes
+     * loading state.
+     */
+    QemuEvent      main_thread_load_event;
+
     QEMUFile *return_path;
     QemuMutex      rp_mutex;    /* We send replies from multiple threads */
 };
diff --git a/migration/migration.c b/migration/migration.c
index bcad9a4..01ed1d0 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -72,12 +72,14 @@ MigrationIncomingState *migration_incoming_state_new(QEMUFile* f)
     mis_current->file = f;
     QLIST_INIT(&mis_current->loadvm_handlers);
     qemu_mutex_init(&mis_current->rp_mutex);
+    qemu_event_init(&mis_current->main_thread_load_event, false);
 
     return mis_current;
 }
 
 void migration_incoming_state_destroy(void)
 {
+    qemu_event_destroy(&mis_current->main_thread_load_event);
     loadvm_free_handlers(mis_current);
     g_free(mis_current);
     mis_current = NULL;
diff --git a/savevm.c b/savevm.c
index ef174d7..e7d42dc 100644
--- a/savevm.c
+++ b/savevm.c
@@ -959,6 +959,13 @@ static SaveStateEntry *find_se(const char *idstr, int instance_id)
     return NULL;
 }
 
+enum LoadVMExitCodes {
+    /* Allow a command to quit all layers of nested loadvm loops */
+    LOADVM_QUIT     =  1,
+};
+
+static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
+
 static int loadvm_process_command_simple_lencheck(const char *name,
                                                   unsigned int actual,
                                                   unsigned int expected)
@@ -974,7 +981,9 @@ static int loadvm_process_command_simple_lencheck(const char *name,
 
 /*
  * Process an incoming 'QEMU_VM_COMMAND'
- * negative return on error (will issue error message)
+ * 0           just a normal return
+ * LOADVM_QUIT All good, but exit the loop
+ * <0          Error
  */
 static int loadvm_process_command(QEMUFile *f)
 {
@@ -1044,36 +1053,12 @@ void loadvm_free_handlers(MigrationIncomingState *mis)
     }
 }
 
-int qemu_loadvm_state(QEMUFile *f)
+static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
 {
-    MigrationIncomingState *mis = migration_incoming_get_current();
-    Error *local_err = NULL;
     uint8_t section_type;
-    unsigned int v;
     int ret;
-    int file_error_after_eof = -1;
-
-    if (qemu_savevm_state_blocked(&local_err)) {
-        error_report_err(local_err);
-        return -EINVAL;
-    }
-
-    v = qemu_get_be32(f);
-    if (v != QEMU_VM_FILE_MAGIC) {
-        error_report("Not a migration stream");
-        return -EINVAL;
-    }
-
-    v = qemu_get_be32(f);
-    if (v == QEMU_VM_FILE_VERSION_COMPAT) {
-        error_report("SaveVM v2 format is obsolete and don't work anymore");
-        return -ENOTSUP;
-    }
-    if (v != QEMU_VM_FILE_VERSION) {
-        error_report("Unsupported migration stream version");
-        return -ENOTSUP;
-    }
 
+    trace_qemu_loadvm_state_main();
     while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
         uint32_t instance_id, version_id, section_id;
         SaveStateEntry *se;
@@ -1101,16 +1086,14 @@ int qemu_loadvm_state(QEMUFile *f)
             if (se == NULL) {
                 error_report("Unknown savevm section or instance '%s' %d",
                              idstr, instance_id);
-                ret = -EINVAL;
-                goto out;
+                return -EINVAL;
             }
 
             /* Validate version */
             if (version_id > se->version_id) {
                 error_report("savevm: unsupported version %d for '%s' v%d",
                              version_id, idstr, se->version_id);
-                ret = -EINVAL;
-                goto out;
+                return -EINVAL;
             }
 
             /* Add entry */
@@ -1125,7 +1108,7 @@ int qemu_loadvm_state(QEMUFile *f)
             if (ret < 0) {
                 error_report("error while loading state for instance 0x%x of"
                              " device '%s'", instance_id, idstr);
-                goto out;
+                return ret;
             }
             break;
         case QEMU_VM_SECTION_PART:
@@ -1140,54 +1123,84 @@ int qemu_loadvm_state(QEMUFile *f)
             }
             if (le == NULL) {
                 error_report("Unknown savevm section %d", section_id);
-                ret = -EINVAL;
-                goto out;
+                return -EINVAL;
             }
 
             ret = vmstate_load(f, le->se, le->version_id);
             if (ret < 0) {
                 error_report("error while loading state section id %d(%s)",
                              section_id, le->se->idstr);
-                goto out;
+                return ret;
             }
             break;
         case QEMU_VM_COMMAND:
             ret = loadvm_process_command(f);
-            if (ret < 0) {
-                goto out;
+            trace_qemu_loadvm_state_section_command(ret);
+            if ((ret < 0) || (ret & LOADVM_QUIT)) {
+                return ret;
             }
             break;
         default:
             error_report("Unknown savevm section type %d", section_type);
-            ret = -EINVAL;
-            goto out;
+            return -EINVAL;
         }
     }
 
-    file_error_after_eof = qemu_file_get_error(f);
+    return 0;
+}
+
+int qemu_loadvm_state(QEMUFile *f)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    Error *local_err = NULL;
+    unsigned int v;
+    int ret;
 
-    /*
-     * Try to read in the VMDESC section as well, so that dumping tools that
-     * intercept our migration stream have the chance to see it.
-     */
-    if (qemu_get_byte(f) == QEMU_VM_VMDESCRIPTION) {
-        uint32_t size = qemu_get_be32(f);
-        uint8_t *buf = g_malloc(0x1000);
+    if (qemu_savevm_state_blocked(&local_err)) {
+        error_report_err(local_err);
+        return -EINVAL;
+    }
 
-        while (size > 0) {
-            uint32_t read_chunk = MIN(size, 0x1000);
-            qemu_get_buffer(f, buf, read_chunk);
-            size -= read_chunk;
-        }
-        g_free(buf);
+    v = qemu_get_be32(f);
+    if (v != QEMU_VM_FILE_MAGIC) {
+        error_report("Not a migration stream");
+        return -EINVAL;
     }
 
-    cpu_synchronize_all_post_init();
+    v = qemu_get_be32(f);
+    if (v == QEMU_VM_FILE_VERSION_COMPAT) {
+        error_report("SaveVM v2 format is obsolete and don't work anymore");
+        return -ENOTSUP;
+    }
+    if (v != QEMU_VM_FILE_VERSION) {
+        error_report("Unsupported migration stream version");
+        return -ENOTSUP;
+    }
 
-    ret = 0;
+    ret = qemu_loadvm_state_main(f, mis);
+    qemu_event_set(&mis->main_thread_load_event);
 
-out:
+    trace_qemu_loadvm_state_post_main(ret);
     if (ret == 0) {
+        int file_error_after_eof = qemu_file_get_error(f);
+
+        /*
+         * Try to read in the VMDESC section as well, so that dumping tools that
+         * intercept our migration stream have the chance to see it.
+         */
+        if (qemu_get_byte(f) == QEMU_VM_VMDESCRIPTION) {
+            uint32_t size = qemu_get_be32(f);
+            uint8_t *buf = g_malloc(0x1000);
+
+            while (size > 0) {
+                uint32_t read_chunk = MIN(size, 0x1000);
+                qemu_get_buffer(f, buf, read_chunk);
+                size -= read_chunk;
+            }
+            g_free(buf);
+        }
+
+        cpu_synchronize_all_post_init();
         /* We may not have a VMDESC section, so ignore relative errors */
         ret = file_error_after_eof;
     }
diff --git a/trace-events b/trace-events
index eb40e61..e343c1a 100644
--- a/trace-events
+++ b/trace-events
@@ -1169,7 +1169,11 @@ vmware_setmode(uint32_t w, uint32_t h, uint32_t bpp) "%dx%d @ %d bpp"
 
 # savevm.c
 qemu_loadvm_state_section(unsigned int section_type) "%d"
+qemu_loadvm_state_section_command(int ret) "%d"
 qemu_loadvm_state_section_partend(uint32_t section_id) "%u"
+qemu_loadvm_state_main(void) ""
+qemu_loadvm_state_main_quit_parent(void) ""
+qemu_loadvm_state_post_main(int ret) "%d"
 qemu_loadvm_state_section_startfull(uint32_t section_id, const char *idstr, uint32_t instance_id, uint32_t version_id) "%u(%s) %u %u"
 loadvm_process_command(uint16_t com, uint16_t len) "com=0x%x len=%d"
 loadvm_process_command_ping(uint32_t val) "%x"
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 20/47] Add migration-capability boolean for postcopy-ram.
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (18 preceding siblings ...)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 19/47] Rework loadvm path for subloops Dr. David Alan Gilbert (git)
@ 2015-04-14 17:03 ` Dr. David Alan Gilbert (git)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 21/47] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages Dr. David Alan Gilbert (git)
                   ` (27 subsequent siblings)
  47 siblings, 0 replies; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The 'postcopy ram' capability allows postcopy migration of RAM;
note that the migration starts off in precopy mode until
postcopy mode is triggered (see the migrate_start_postcopy
patch later in the series).

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 include/migration/migration.h | 1 +
 migration/migration.c         | 9 +++++++++
 qapi-schema.json              | 7 ++++++-
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index ae85958..5858788 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -179,6 +179,7 @@ void migrate_add_blocker(Error *reason);
  */
 void migrate_del_blocker(Error *reason);
 
+bool migrate_postcopy_ram(void);
 bool migrate_zero_blocks(void);
 
 bool migrate_auto_converge(void);
diff --git a/migration/migration.c b/migration/migration.c
index 01ed1d0..f641fc7 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -685,6 +685,15 @@ void qmp_migrate_set_downtime(double value, Error **errp)
     max_downtime = (uint64_t)value;
 }
 
+bool migrate_postcopy_ram(void)
+{
+    MigrationState *s;
+
+    s = migrate_get_current();
+
+    return s->enabled_capabilities[MIGRATION_CAPABILITY_X_POSTCOPY_RAM];
+}
+
 bool migrate_auto_converge(void)
 {
     MigrationState *s;
diff --git a/qapi-schema.json b/qapi-schema.json
index ac9594d..dcd3e62 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -518,10 +518,15 @@
 # @auto-converge: If enabled, QEMU will automatically throttle down the guest
 #          to speed up convergence of RAM migration. (since 1.6)
 #
+# @x-postcopy-ram: Start executing on the migration target before all of RAM has
+#          been migrated, pulling the remaining pages along as needed. NOTE: If
+#          the migration fails during postcopy the VM will fail.  (since 2.4)
+#
 # Since: 1.2
 ##
 { 'enum': 'MigrationCapability',
-  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks'] }
+  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
+           'x-postcopy-ram'] }
 
 ##
 # @MigrationCapabilityStatus
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 21/47] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages.
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (19 preceding siblings ...)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 20/47] Add migration-capability boolean for postcopy-ram Dr. David Alan Gilbert (git)
@ 2015-04-14 17:03 ` Dr. David Alan Gilbert (git)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 22/47] MIG_CMD_PACKAGED: Send a packaged chunk of migration stream Dr. David Alan Gilbert (git)
                   ` (26 subsequent siblings)
  47 siblings, 0 replies; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The state of the postcopy process is managed via a series of messages;
   * Add wrappers and handlers for sending/receiving these messages
   * Add state variable that track the current state of postcopy

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  15 +++
 include/sysemu/sysemu.h       |  20 ++++
 migration/migration.c         |  13 +++
 savevm.c                      | 247 ++++++++++++++++++++++++++++++++++++++++++
 trace-events                  |  10 ++
 5 files changed, 305 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 5858788..e3389dc 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -52,6 +52,14 @@ typedef struct MigrationState MigrationState;
 
 typedef QLIST_HEAD(, LoadStateEntry) LoadStateEntry_Head;
 
+typedef enum {
+    POSTCOPY_INCOMING_NONE = 0,  /* Initial state - no postcopy */
+    POSTCOPY_INCOMING_ADVISE,
+    POSTCOPY_INCOMING_LISTENING,
+    POSTCOPY_INCOMING_RUNNING,
+    POSTCOPY_INCOMING_END
+} PostcopyState;
+
 /* State for the incoming migration */
 struct MigrationIncomingState {
     QEMUFile *file;
@@ -59,6 +67,8 @@ struct MigrationIncomingState {
     /* See savevm.c */
     LoadStateEntry_Head loadvm_handlers;
 
+    PostcopyState postcopy_state;
+
     /*
      * Free at the start of the main state load, set as the main thread finishes
      * loading state.
@@ -220,4 +230,9 @@ size_t ram_control_save_page(QEMUFile *f, ram_addr_t block_offset,
                              ram_addr_t offset, size_t size,
                              uint64_t *bytes_sent);
 
+PostcopyState postcopy_state_get(MigrationIncomingState *mis);
+
+/* Set the state and return the old state */
+PostcopyState postcopy_state_set(MigrationIncomingState *mis,
+                                 PostcopyState new_state);
 #endif
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 49ba134..6dd2382 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -87,6 +87,17 @@ enum qemu_vm_cmd {
     MIG_CMD_INVALID = 0,       /* Must be 0 */
     MIG_CMD_OPEN_RETURN_PATH,  /* Tell the dest to open the Return path */
     MIG_CMD_PING,              /* Request a PONG on the RP */
+
+    MIG_CMD_POSTCOPY_ADVISE = 20,  /* Prior to any page transfers, just
+                                      warn we might want to do PC */
+    MIG_CMD_POSTCOPY_LISTEN,       /* Start listening for incoming
+                                      pages as it's running. */
+    MIG_CMD_POSTCOPY_RUN,          /* Start execution */
+
+    MIG_CMD_POSTCOPY_RAM_DISCARD,  /* A list of pages to discard that
+                                      were previously sent during
+                                      precopy but are dirty. */
+
 };
 
 bool qemu_savevm_state_blocked(Error **errp);
@@ -101,6 +112,15 @@ void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
                               uint16_t len, uint8_t *data);
 void qemu_savevm_send_ping(QEMUFile *f, uint32_t value);
 void qemu_savevm_send_open_return_path(QEMUFile *f);
+void qemu_savevm_send_postcopy_advise(QEMUFile *f);
+void qemu_savevm_send_postcopy_listen(QEMUFile *f);
+void qemu_savevm_send_postcopy_run(QEMUFile *f);
+
+void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
+                                           uint16_t len,
+                                           uint64_t *start_list,
+                                           uint64_t *end_list);
+
 int qemu_loadvm_state(QEMUFile *f);
 
 typedef enum DisplayType
diff --git a/migration/migration.c b/migration/migration.c
index f641fc7..b72a4c7 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -995,3 +995,16 @@ void migrate_fd_connect(MigrationState *s)
     qemu_thread_create(&s->thread, "migration", migration_thread, s,
                        QEMU_THREAD_JOINABLE);
 }
+
+PostcopyState  postcopy_state_get(MigrationIncomingState *mis)
+{
+    return atomic_fetch_add(&mis->postcopy_state, 0);
+}
+
+/* Set the state and return the old state */
+PostcopyState postcopy_state_set(MigrationIncomingState *mis,
+                                 PostcopyState new_state)
+{
+    return atomic_xchg(&mis->postcopy_state, new_state);
+}
+
diff --git a/savevm.c b/savevm.c
index e7d42dc..8d2fe1f 100644
--- a/savevm.c
+++ b/savevm.c
@@ -39,6 +39,7 @@
 #include "exec/memory.h"
 #include "qmp-commands.h"
 #include "trace.h"
+#include "qemu/bitops.h"
 #include "qemu/iov.h"
 #include "block/snapshot.h"
 #include "block/qapi.h"
@@ -634,6 +635,77 @@ void qemu_savevm_send_open_return_path(QEMUFile *f)
     qemu_savevm_command_send(f, MIG_CMD_OPEN_RETURN_PATH, 0, NULL);
 }
 
+/* Send prior to any postcopy transfer */
+void qemu_savevm_send_postcopy_advise(QEMUFile *f)
+{
+    uint64_t tmp[2];
+    tmp[0] = cpu_to_be64(getpagesize());
+    tmp[1] = cpu_to_be64(1ul << qemu_target_page_bits());
+
+    trace_qemu_savevm_send_postcopy_advise();
+    qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_ADVISE, 16, (uint8_t *)tmp);
+}
+
+/* Sent prior to starting the destination running in postcopy, discard pages
+ * that have already been sent but redirtied on the source.
+ * CMD_POSTCOPY_RAM_DISCARD consist of:
+ *      byte   version (0)
+ *      byte   Length of name field (not including 0)
+ *  n x byte   RAM block name
+ *      byte   0 terminator (just for safety)
+ *  n x        Byte ranges within the named RAMBlock
+ *      be64   Start of the range
+ *      be64   end of the range + 1
+ *
+ *  name:  RAMBlock name that these entries are part of
+ *  len: Number of page entries
+ *  start_list: 'len' addresses
+ *  end_list: 'len' addresses
+ *
+ */
+void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
+                                           uint16_t len,
+                                           uint64_t *start_list,
+                                           uint64_t *end_list)
+{
+    uint8_t *buf;
+    uint16_t tmplen;
+    uint16_t t;
+    size_t name_len = strlen(name);
+
+    trace_qemu_savevm_send_postcopy_ram_discard(name, len);
+    buf = g_malloc0(len*16 + name_len + 3);
+    buf[0] = 0; /* Version */
+    assert(name_len < 256);
+    buf[1] = name_len;
+    memcpy(buf+2, name, name_len);
+    tmplen = 2+name_len;
+    buf[tmplen++] = '\0';
+
+    for (t = 0; t < len; t++) {
+        cpu_to_be64w((uint64_t *)(buf + tmplen), start_list[t]);
+        tmplen += 8;
+        cpu_to_be64w((uint64_t *)(buf + tmplen), end_list[t]);
+        tmplen += 8;
+    }
+    qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_RAM_DISCARD, tmplen, buf);
+    g_free(buf);
+}
+
+/* Get the destination into a state where it can receive postcopy data. */
+void qemu_savevm_send_postcopy_listen(QEMUFile *f)
+{
+    trace_savevm_send_postcopy_listen();
+    qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_LISTEN, 0, NULL);
+}
+
+/* Kick the destination into running */
+void qemu_savevm_send_postcopy_run(QEMUFile *f)
+{
+    trace_savevm_send_postcopy_run();
+    qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_RUN, 0, NULL);
+}
+
 bool qemu_savevm_state_blocked(Error **errp)
 {
     SaveStateEntry *se;
@@ -966,6 +1038,154 @@ enum LoadVMExitCodes {
 
 static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
 
+/* ------ incoming postcopy messages ------ */
+/* 'advise' arrives before any transfers just to tell us that a postcopy
+ * *might* happen - it might be skipped if precopy transferred everything
+ * quickly.
+ */
+static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis,
+                                         uint64_t remote_hps,
+                                         uint64_t remote_tps)
+{
+    PostcopyState ps = postcopy_state_get(mis);
+    trace_loadvm_postcopy_handle_advise();
+    if (ps != POSTCOPY_INCOMING_NONE) {
+        error_report("CMD_POSTCOPY_ADVISE in wrong postcopy state (%d)", ps);
+        return -1;
+    }
+
+    if (remote_hps != getpagesize())  {
+        /*
+         * Some combinations of mismatch are probably possible but it gets
+         * a bit more complicated.  In particular we need to place whole
+         * host pages on the dest at once, and we need to ensure that we
+         * handle dirtying to make sure we never end up sending part of
+         * a hostpage on it's own.
+         */
+        error_report("Postcopy needs matching host page sizes (s=%d d=%d)",
+                     (int)remote_hps, getpagesize());
+        return -1;
+    }
+
+    if (remote_tps != (1ul << qemu_target_page_bits())) {
+        /*
+         * Again, some differences could be dealt with, but for now keep it
+         * simple.
+         */
+        error_report("Postcopy needs matching target page sizes (s=%d d=%d)",
+                     (int)remote_tps, 1 << qemu_target_page_bits());
+        return -1;
+    }
+
+    postcopy_state_set(mis, POSTCOPY_INCOMING_ADVISE);
+
+    return 0;
+}
+
+/* After postcopy we will be told to throw some pages away since they're
+ * dirty and will have to be demand fetched.  Must happen before CPU is
+ * started.
+ * There can be 0..many of these messages, each encoding multiple pages.
+ */
+static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
+                                              uint16_t len)
+{
+    int tmp;
+    char ramid[256];
+    PostcopyState ps = postcopy_state_get(mis);
+
+    trace_loadvm_postcopy_ram_handle_discard();
+
+    if (ps != POSTCOPY_INCOMING_ADVISE) {
+        error_report("CMD_POSTCOPY_RAM_DISCARD in wrong postcopy state (%d)",
+                     ps);
+        return -1;
+    }
+    /* We're expecting a
+     *    Version (0)
+     *    a RAM ID string (length byte, name, 0 term)
+     *    then at least 1 16 byte chunk
+    */
+    if (len < 20) {
+        error_report("CMD_POSTCOPY_RAM_DISCARD invalid length (%d)", len);
+        return -1;
+    }
+
+    tmp = qemu_get_byte(mis->file);
+    if (tmp != 0) {
+        error_report("CMD_POSTCOPY_RAM_DISCARD invalid version (%d)", tmp);
+        return -1;
+    }
+
+    if (qemu_get_counted_string(mis->file, ramid)) {
+        error_report("CMD_POSTCOPY_RAM_DISCARD Failed to read RAMBlock ID");
+        return -1;
+    }
+    tmp = qemu_get_byte(mis->file);
+    if (tmp != 0) {
+        error_report("CMD_POSTCOPY_RAM_DISCARD missing nil (%d)", tmp);
+        return -1;
+    }
+
+    len -= 3+strlen(ramid);
+    if (len % 16) {
+        error_report("CMD_POSTCOPY_RAM_DISCARD invalid length (%d)", len);
+        return -1;
+    }
+    trace_loadvm_postcopy_ram_handle_discard_header(ramid, len);
+    while (len) {
+        /* TODO - ram_discard_range gets added in a later patch
+        uint64_t start_addr, end_addr;
+        start_addr = qemu_get_be64(mis->file);
+        end_addr = qemu_get_be64(mis->file);
+
+        len -= 16;
+        int ret = ram_discard_range(mis, ramid, start_addr, end_addr - 1);
+        if (ret) {
+            return ret;
+        }
+        */
+    }
+    trace_loadvm_postcopy_ram_handle_discard_end();
+
+    return 0;
+}
+
+/* After this message we must be able to immediately receive postcopy data */
+static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
+{
+    PostcopyState ps = postcopy_state_set(mis, POSTCOPY_INCOMING_LISTENING);
+    trace_loadvm_postcopy_handle_listen();
+    if (ps != POSTCOPY_INCOMING_ADVISE) {
+        error_report("CMD_POSTCOPY_LISTEN in wrong postcopy state (%d)", ps);
+        return -1;
+    }
+
+    /* TODO start up the postcopy listening thread */
+    return 0;
+}
+
+/* After all discards we can start running and asking for pages */
+static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
+{
+    PostcopyState ps = postcopy_state_set(mis, POSTCOPY_INCOMING_RUNNING);
+    trace_loadvm_postcopy_handle_run();
+    if (ps != POSTCOPY_INCOMING_LISTENING) {
+        error_report("CMD_POSTCOPY_RUN in wrong postcopy state (%d)", ps);
+        return -1;
+    }
+
+    if (autostart) {
+        /* Hold onto your hats, starting the CPU */
+        vm_start();
+    } else {
+        /* leave it paused and let management decide when to start the CPU */
+        runstate_set(RUN_STATE_PAUSED);
+    }
+
+    return 0;
+}
+
 static int loadvm_process_command_simple_lencheck(const char *name,
                                                   unsigned int actual,
                                                   unsigned int expected)
@@ -991,6 +1211,7 @@ static int loadvm_process_command(QEMUFile *f)
     uint16_t com;
     uint16_t len;
     uint32_t tmp32;
+    uint64_t tmp64a, tmp64b;
 
     com = qemu_get_be16(f);
     len = qemu_get_be16(f);
@@ -1028,6 +1249,32 @@ static int loadvm_process_command(QEMUFile *f)
         migrate_send_rp_pong(mis, tmp32);
         break;
 
+    case MIG_CMD_POSTCOPY_ADVISE:
+        if (loadvm_process_command_simple_lencheck("CMD_POSTCOPY_ADVISE",
+                                                   len, 16)) {
+            return -1;
+        }
+        tmp64a = qemu_get_be64(f); /* hps */
+        tmp64b = qemu_get_be64(f); /* tps */
+        return loadvm_postcopy_handle_advise(mis, tmp64a, tmp64b);
+
+    case MIG_CMD_POSTCOPY_LISTEN:
+        if (loadvm_process_command_simple_lencheck("CMD_POSTCOPY_LISTEN",
+                                                   len, 0)) {
+            return -1;
+        }
+        return loadvm_postcopy_handle_listen(mis);
+
+    case MIG_CMD_POSTCOPY_RUN:
+        if (loadvm_process_command_simple_lencheck("CMD_POSTCOPY_RUN",
+                                                   len, 0)) {
+            return -1;
+        }
+        return loadvm_postcopy_handle_run(mis);
+
+    case MIG_CMD_POSTCOPY_RAM_DISCARD:
+        return loadvm_postcopy_ram_handle_discard(mis, len);
+
     default:
         error_report("VM_COMMAND 0x%x unknown (len 0x%x)", com, len);
         return -1;
diff --git a/trace-events b/trace-events
index e343c1a..26625be 100644
--- a/trace-events
+++ b/trace-events
@@ -1175,11 +1175,21 @@ qemu_loadvm_state_main(void) ""
 qemu_loadvm_state_main_quit_parent(void) ""
 qemu_loadvm_state_post_main(int ret) "%d"
 qemu_loadvm_state_section_startfull(uint32_t section_id, const char *idstr, uint32_t instance_id, uint32_t version_id) "%u(%s) %u %u"
+loadvm_postcopy_handle_advise(void) ""
+loadvm_postcopy_handle_listen(void) ""
+loadvm_postcopy_handle_run(void) ""
+loadvm_postcopy_ram_handle_discard(void) ""
+loadvm_postcopy_ram_handle_discard_end(void) ""
+loadvm_postcopy_ram_handle_discard_header(const char *ramid, uint16_t len) "%s: %ud"
 loadvm_process_command(uint16_t com, uint16_t len) "com=0x%x len=%d"
 loadvm_process_command_ping(uint32_t val) "%x"
+qemu_savevm_send_postcopy_advise(void) ""
+qemu_savevm_send_postcopy_ram_discard(const char *id, uint16_t len) "%s: %ud"
 savevm_section_start(const char *id, unsigned int section_id) "%s, section_id %u"
 savevm_section_end(const char *id, unsigned int section_id, int ret) "%s, section_id %u -> %d"
 savevm_send_ping(uint32_t val) "%x"
+savevm_send_postcopy_listen(void) ""
+savevm_send_postcopy_run(void) ""
 savevm_state_begin(void) ""
 savevm_state_header(void) ""
 savevm_state_iterate(void) ""
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 22/47] MIG_CMD_PACKAGED: Send a packaged chunk of migration stream
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (20 preceding siblings ...)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 21/47] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages Dr. David Alan Gilbert (git)
@ 2015-04-14 17:03 ` Dr. David Alan Gilbert (git)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 23/47] migrate_init: Call from savevm Dr. David Alan Gilbert (git)
                   ` (25 subsequent siblings)
  47 siblings, 0 replies; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

MIG_CMD_PACKAGED is a migration command that wraps a chunk of migration
stream inside a package whose length can be determined purely by reading
its header.  The destination guarantees that the whole MIG_CMD_PACKAGED
is read off the stream prior to parsing the contents.

This is used by postcopy to load device state (from the package)
while leaving the main stream free to receive memory pages.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/sysemu/sysemu.h |  4 +++
 savevm.c                | 94 +++++++++++++++++++++++++++++++++++++++++++++++++
 trace-events            |  4 +++
 3 files changed, 102 insertions(+)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 6dd2382..0e3bf1e 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -87,6 +87,7 @@ enum qemu_vm_cmd {
     MIG_CMD_INVALID = 0,       /* Must be 0 */
     MIG_CMD_OPEN_RETURN_PATH,  /* Tell the dest to open the Return path */
     MIG_CMD_PING,              /* Request a PONG on the RP */
+    MIG_CMD_PACKAGED,          /* Send a wrapped stream within this stream */
 
     MIG_CMD_POSTCOPY_ADVISE = 20,  /* Prior to any page transfers, just
                                       warn we might want to do PC */
@@ -100,6 +101,8 @@ enum qemu_vm_cmd {
 
 };
 
+#define MAX_VM_CMD_PACKAGED_SIZE (1ul << 24)
+
 bool qemu_savevm_state_blocked(Error **errp);
 void qemu_savevm_state_begin(QEMUFile *f,
                              const MigrationParams *params);
@@ -112,6 +115,7 @@ void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
                               uint16_t len, uint8_t *data);
 void qemu_savevm_send_ping(QEMUFile *f, uint32_t value);
 void qemu_savevm_send_open_return_path(QEMUFile *f);
+int qemu_savevm_send_packaged(QEMUFile *f, const QEMUSizedBuffer *qsb);
 void qemu_savevm_send_postcopy_advise(QEMUFile *f);
 void qemu_savevm_send_postcopy_listen(QEMUFile *f);
 void qemu_savevm_send_postcopy_run(QEMUFile *f);
diff --git a/savevm.c b/savevm.c
index 8d2fe1f..1e940af 100644
--- a/savevm.c
+++ b/savevm.c
@@ -635,6 +635,50 @@ void qemu_savevm_send_open_return_path(QEMUFile *f)
     qemu_savevm_command_send(f, MIG_CMD_OPEN_RETURN_PATH, 0, NULL);
 }
 
+/* We have a buffer of data to send; we don't want that all to be loaded
+ * by the command itself, so the command contains just the length of the
+ * extra buffer that we then send straight after it.
+ * TODO: Must be a better way to organise that
+ *
+ * Returns:
+ *    0 on success
+ *    -ve on error
+ */
+int qemu_savevm_send_packaged(QEMUFile *f, const QEMUSizedBuffer *qsb)
+{
+    size_t cur_iov;
+    size_t len = qsb_get_length(qsb);
+    uint32_t tmp;
+
+    if (len > MAX_VM_CMD_PACKAGED_SIZE) {
+        error_report("%s: Unreasonably large packaged state: %zu",
+                     __func__, len);
+        return -1;
+    }
+
+    tmp = cpu_to_be32(len);
+
+    trace_qemu_savevm_send_packaged();
+    qemu_savevm_command_send(f, MIG_CMD_PACKAGED, 4, (uint8_t *)&tmp);
+
+    /* all the data follows (concatinating the iov's) */
+    for (cur_iov = 0; cur_iov < qsb->n_iov; cur_iov++) {
+        /* The iov entries are partially filled */
+        size_t towrite = (qsb->iov[cur_iov].iov_len > len) ?
+                              len :
+                              qsb->iov[cur_iov].iov_len;
+        len -= towrite;
+
+        if (!towrite) {
+            break;
+        }
+
+        qemu_put_buffer(f, qsb->iov[cur_iov].iov_base, towrite);
+    }
+
+    return 0;
+}
+
 /* Send prior to any postcopy transfer */
 void qemu_savevm_send_postcopy_advise(QEMUFile *f)
 {
@@ -1199,6 +1243,48 @@ static int loadvm_process_command_simple_lencheck(const char *name,
     return 0;
 }
 
+/* Immediately following this command is a blob of data containing an embedded
+ * chunk of migration stream; read it and load it.
+ */
+static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis,
+                                      uint32_t length)
+{
+    int ret;
+    uint8_t *buffer;
+    QEMUSizedBuffer *qsb;
+
+    trace_loadvm_handle_cmd_packaged(length);
+
+    if (length > MAX_VM_CMD_PACKAGED_SIZE) {
+        error_report("Unreasonably large packaged state: %u", length);
+        return -1;
+    }
+    buffer = g_malloc0(length);
+    ret = qemu_get_buffer(mis->file, buffer, (int)length);
+    if (ret != length) {
+        g_free(buffer);
+        error_report("CMD_PACKAGED: Buffer receive fail ret=%d length=%d\n",
+                ret, length);
+        return (ret < 0) ? ret : -EAGAIN;
+    }
+    trace_loadvm_handle_cmd_packaged_received(ret);
+
+    /* Setup a dummy QEMUFile that actually reads from the buffer */
+    qsb = qsb_create(buffer, length);
+    g_free(buffer); /* Because qsb_create copies */
+    if (!qsb) {
+        error_report("Unable to create qsb");
+    }
+    QEMUFile *packf = qemu_bufopen("r", qsb);
+
+    ret = qemu_loadvm_state_main(packf, mis);
+    trace_loadvm_handle_cmd_packaged_main(ret);
+    qemu_fclose(packf);
+    qsb_free(qsb);
+
+    return ret;
+}
+
 /*
  * Process an incoming 'QEMU_VM_COMMAND'
  * 0           just a normal return
@@ -1249,6 +1335,14 @@ static int loadvm_process_command(QEMUFile *f)
         migrate_send_rp_pong(mis, tmp32);
         break;
 
+    case MIG_CMD_PACKAGED:
+        if (loadvm_process_command_simple_lencheck("CMD_POSTCOPY_PACKAGED",
+            len, 4)) {
+            return -1;
+         }
+        tmp32 = qemu_get_be32(f);
+        return loadvm_handle_cmd_packaged(mis, tmp32);
+
     case MIG_CMD_POSTCOPY_ADVISE:
         if (loadvm_process_command_simple_lencheck("CMD_POSTCOPY_ADVISE",
                                                    len, 16)) {
diff --git a/trace-events b/trace-events
index 26625be..4e53ad8 100644
--- a/trace-events
+++ b/trace-events
@@ -1175,6 +1175,10 @@ qemu_loadvm_state_main(void) ""
 qemu_loadvm_state_main_quit_parent(void) ""
 qemu_loadvm_state_post_main(int ret) "%d"
 qemu_loadvm_state_section_startfull(uint32_t section_id, const char *idstr, uint32_t instance_id, uint32_t version_id) "%u(%s) %u %u"
+qemu_savevm_send_packaged(void) ""
+loadvm_handle_cmd_packaged(unsigned int length) "%u"
+loadvm_handle_cmd_packaged_main(int ret) "%d"
+loadvm_handle_cmd_packaged_received(int ret) "%d"
 loadvm_postcopy_handle_advise(void) ""
 loadvm_postcopy_handle_listen(void) ""
 loadvm_postcopy_handle_run(void) ""
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 23/47] migrate_init: Call from savevm
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (21 preceding siblings ...)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 22/47] MIG_CMD_PACKAGED: Send a packaged chunk of migration stream Dr. David Alan Gilbert (git)
@ 2015-04-14 17:03 ` Dr. David Alan Gilbert (git)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 24/47] Modify save_live_pending for postcopy Dr. David Alan Gilbert (git)
                   ` (24 subsequent siblings)
  47 siblings, 0 replies; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Suspend to file is very much like a migrate, and it makes life
easier if we have the Migration state available, so initialise it
in the savevm.c code for suspending.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 include/migration/migration.h | 3 +--
 include/qemu/typedefs.h       | 1 +
 migration/migration.c         | 2 +-
 savevm.c                      | 2 ++
 4 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index e3389dc..1b9a535 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -48,8 +48,6 @@ enum mig_rp_message_type {
     MIG_RP_MSG_PONG,         /* Response to a PING; data (seq: be32 ) */
 };
 
-typedef struct MigrationState MigrationState;
-
 typedef QLIST_HEAD(, LoadStateEntry) LoadStateEntry_Head;
 
 typedef enum {
@@ -148,6 +146,7 @@ int migrate_fd_close(MigrationState *s);
 
 void add_migration_state_change_notifier(Notifier *notify);
 void remove_migration_state_change_notifier(Notifier *notify);
+MigrationState *migrate_init(const MigrationParams *params);
 bool migration_in_setup(MigrationState *);
 bool migration_has_finished(MigrationState *);
 bool migration_has_failed(MigrationState *);
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 6fdcbcd..611db46 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -41,6 +41,7 @@ typedef struct MemoryRegion MemoryRegion;
 typedef struct MemoryRegionSection MemoryRegionSection;
 typedef struct MigrationIncomingState MigrationIncomingState;
 typedef struct MigrationParams MigrationParams;
+typedef struct MigrationState MigrationState;
 typedef struct Monitor Monitor;
 typedef struct MouseTransformInfo MouseTransformInfo;
 typedef struct MSIMessage MSIMessage;
diff --git a/migration/migration.c b/migration/migration.c
index b72a4c7..45284b2 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -500,7 +500,7 @@ bool migration_has_failed(MigrationState *s)
             s->state == MIGRATION_STATUS_FAILED);
 }
 
-static MigrationState *migrate_init(const MigrationParams *params)
+MigrationState *migrate_init(const MigrationParams *params)
 {
     MigrationState *s = migrate_get_current();
     int64_t bandwidth_limit = s->bandwidth_limit;
diff --git a/savevm.c b/savevm.c
index 1e940af..c281d1b 100644
--- a/savevm.c
+++ b/savevm.c
@@ -988,6 +988,8 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
         .blk = 0,
         .shared = 0
     };
+    MigrationState *ms = migrate_init(&params);
+    ms->file = f;
 
     if (qemu_savevm_state_blocked(errp)) {
         return -EINVAL;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 24/47] Modify save_live_pending for postcopy
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (22 preceding siblings ...)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 23/47] migrate_init: Call from savevm Dr. David Alan Gilbert (git)
@ 2015-04-14 17:03 ` Dr. David Alan Gilbert (git)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 25/47] postcopy: OS support test Dr. David Alan Gilbert (git)
                   ` (23 subsequent siblings)
  47 siblings, 0 replies; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Modify save_live_pending to return separate postcopiable and
non-postcopiable counts.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c                 |  8 ++++++--
 include/migration/vmstate.h |  5 +++--
 include/sysemu/sysemu.h     |  4 +++-
 migration/block.c           |  7 +++++--
 migration/migration.c       |  9 +++++++--
 savevm.c                    | 21 +++++++++++++++++----
 trace-events                |  2 +-
 7 files changed, 42 insertions(+), 14 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 2b0cd18..977e98b 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -1053,7 +1053,9 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static uint64_t ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size)
+static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
+                             uint64_t *non_postcopiable_pending,
+                             uint64_t *postcopiable_pending)
 {
     uint64_t remaining_size;
 
@@ -1067,7 +1069,9 @@ static uint64_t ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size)
         qemu_mutex_unlock_iothread();
         remaining_size = ram_save_remaining() * TARGET_PAGE_SIZE;
     }
-    return remaining_size;
+
+    *non_postcopiable_pending = 0;
+    *postcopiable_pending = remaining_size;
 }
 
 static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index b86b3d9..50efb09 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -54,8 +54,9 @@ typedef struct SaveVMHandlers {
 
     /* This runs outside the iothread lock!  */
     int (*save_live_setup)(QEMUFile *f, void *opaque);
-    uint64_t (*save_live_pending)(QEMUFile *f, void *opaque, uint64_t max_size);
-
+    void (*save_live_pending)(QEMUFile *f, void *opaque, uint64_t max_size,
+                              uint64_t *non_postcopiable_pending,
+                              uint64_t *postcopiable_pending);
     LoadStateHandler *load_state;
 } SaveVMHandlers;
 
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 0e3bf1e..e45ef62 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -110,7 +110,9 @@ void qemu_savevm_state_header(QEMUFile *f);
 int qemu_savevm_state_iterate(QEMUFile *f);
 void qemu_savevm_state_complete_precopy(QEMUFile *f);
 void qemu_savevm_state_cancel(void);
-uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size);
+void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size,
+                               uint64_t *res_non_postcopiable,
+                               uint64_t *res_postcopiable);
 void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
                               uint16_t len, uint8_t *data);
 void qemu_savevm_send_ping(QEMUFile *f, uint32_t value);
diff --git a/migration/block.c b/migration/block.c
index 00f4998..802dbfa 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -755,7 +755,9 @@ static int block_save_complete(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static uint64_t block_save_pending(QEMUFile *f, void *opaque, uint64_t max_size)
+static void block_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
+                               uint64_t *non_postcopiable_pending,
+                               uint64_t *postcopiable_pending)
 {
     /* Estimate pending number of bytes to send */
     uint64_t pending;
@@ -774,7 +776,8 @@ static uint64_t block_save_pending(QEMUFile *f, void *opaque, uint64_t max_size)
     qemu_mutex_unlock_iothread();
 
     DPRINTF("Enter save live pending  %" PRIu64 "\n", pending);
-    return pending;
+    *non_postcopiable_pending = pending;
+    *postcopiable_pending = 0;
 }
 
 static int block_load(QEMUFile *f, void *opaque, int version_id)
diff --git a/migration/migration.c b/migration/migration.c
index 45284b2..ae737d1 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -889,8 +889,13 @@ static void *migration_thread(void *opaque)
         uint64_t pending_size;
 
         if (!qemu_file_rate_limit(s->file)) {
-            pending_size = qemu_savevm_state_pending(s->file, max_size);
-            trace_migrate_pending(pending_size, max_size);
+            uint64_t pend_post, pend_nonpost;
+
+            qemu_savevm_state_pending(s->file, max_size, &pend_nonpost,
+                                      &pend_post);
+            pending_size = pend_nonpost + pend_post;
+            trace_migrate_pending(pending_size, max_size,
+                                  pend_post, pend_nonpost);
             if (pending_size && pending_size >= max_size) {
                 qemu_savevm_state_iterate(s->file);
             } else {
diff --git a/savevm.c b/savevm.c
index c281d1b..79bbded 100644
--- a/savevm.c
+++ b/savevm.c
@@ -950,10 +950,20 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f)
     qemu_fflush(f);
 }
 
-uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size)
+/* Give an estimate of the amount left to be transferred,
+ * the result is split into the amount for units that can and
+ * for units that can't do postcopy.
+ */
+void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size,
+                               uint64_t *res_non_postcopiable,
+                               uint64_t *res_postcopiable)
 {
     SaveStateEntry *se;
-    uint64_t ret = 0;
+    uint64_t tmp_non_postcopiable, tmp_postcopiable;
+
+    *res_non_postcopiable = 0;
+    *res_postcopiable = 0;
+
 
     QTAILQ_FOREACH(se, &savevm_handlers, entry) {
         if (!se->ops || !se->ops->save_live_pending) {
@@ -964,9 +974,12 @@ uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size)
                 continue;
             }
         }
-        ret += se->ops->save_live_pending(f, se->opaque, max_size);
+        se->ops->save_live_pending(f, se->opaque, max_size,
+                                   &tmp_non_postcopiable, &tmp_postcopiable);
+
+        *res_postcopiable += tmp_postcopiable;
+        *res_non_postcopiable += tmp_non_postcopiable;
     }
-    return ret;
 }
 
 void qemu_savevm_state_cancel(void)
diff --git a/trace-events b/trace-events
index 4e53ad8..4f94cf4 100644
--- a/trace-events
+++ b/trace-events
@@ -1403,7 +1403,7 @@ migrate_fd_cleanup(void) ""
 migrate_fd_cleanup_src_rp(void) ""
 migrate_fd_error(void) ""
 migrate_fd_cancel(void) ""
-migrate_pending(uint64_t size, uint64_t max) "pending size %" PRIu64 " max %" PRIu64
+migrate_pending(uint64_t size, uint64_t max, uint64_t post, uint64_t nonpost) "pending size %" PRIu64 " max %" PRIu64 " (post=%" PRIu64 " nonpost=%" PRIu64 ")"
 migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
 open_return_path_on_source(void) ""
 open_return_path_on_source_continue(void) ""
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 25/47] postcopy: OS support test
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (23 preceding siblings ...)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 24/47] Modify save_live_pending for postcopy Dr. David Alan Gilbert (git)
@ 2015-04-14 17:03 ` Dr. David Alan Gilbert (git)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 26/47] migrate_start_postcopy: Command to trigger transition to postcopy Dr. David Alan Gilbert (git)
                   ` (22 subsequent siblings)
  47 siblings, 0 replies; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Provide a check to see if the OS we're running on has all the bits
needed for postcopy.

Creates postcopy-ram.c which will get most of the other helpers we need.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/postcopy-ram.h |  19 +++++
 migration/Makefile.objs          |   2 +-
 migration/postcopy-ram.c         | 157 +++++++++++++++++++++++++++++++++++++++
 savevm.c                         |   5 ++
 4 files changed, 182 insertions(+), 1 deletion(-)
 create mode 100644 include/migration/postcopy-ram.h
 create mode 100644 migration/postcopy-ram.c

diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
new file mode 100644
index 0000000..d81934f
--- /dev/null
+++ b/include/migration/postcopy-ram.h
@@ -0,0 +1,19 @@
+/*
+ * Postcopy migration for RAM
+ *
+ * Copyright 2013 Red Hat, Inc. and/or its affiliates
+ *
+ * Authors:
+ *  Dave Gilbert  <dgilbert@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+#ifndef QEMU_POSTCOPY_RAM_H
+#define QEMU_POSTCOPY_RAM_H
+
+/* Return true if the host supports everything we need to do postcopy-ram */
+bool postcopy_ram_supported_by_host(void);
+
+#endif
diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index d929e96..0cac6d7 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -1,7 +1,7 @@
 common-obj-y += migration.o tcp.o
 common-obj-y += vmstate.o
 common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o qemu-file-stdio.o
-common-obj-y += xbzrle.o
+common-obj-y += xbzrle.o postcopy-ram.o
 
 common-obj-$(CONFIG_RDMA) += rdma.o
 common-obj-$(CONFIG_POSIX) += exec.o unix.o fd.o
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
new file mode 100644
index 0000000..7704bc1
--- /dev/null
+++ b/migration/postcopy-ram.c
@@ -0,0 +1,157 @@
+/*
+ * Postcopy migration for RAM
+ *
+ * Copyright 2013-2015 Red Hat, Inc. and/or its affiliates
+ *
+ * Authors:
+ *  Dave Gilbert  <dgilbert@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+/*
+ * Postcopy is a migration technique where the execution flips from the
+ * source to the destination before all the data has been copied.
+ */
+
+#include <glib.h>
+#include <stdio.h>
+#include <unistd.h>
+
+#include "qemu-common.h"
+#include "migration/migration.h"
+#include "migration/postcopy-ram.h"
+#include "sysemu/sysemu.h"
+#include "qemu/error-report.h"
+#include "trace.h"
+
+/* Postcopy needs to detect accesses to pages that haven't yet been copied
+ * across, and efficiently map new pages in, the techniques for doing this
+ * are target OS specific.
+ */
+#if defined(__linux__)
+
+#include <sys/mman.h>
+#include <sys/ioctl.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <asm/types.h> /* for __u64 */
+#endif
+
+#if defined(__linux__) && defined(__NR_userfaultfd)
+#include <linux/userfaultfd.h>
+
+static bool ufd_version_check(int ufd)
+{
+    struct uffdio_api api_struct;
+    uint64_t feature_mask;
+
+    api_struct.api = UFFD_API;
+    if (ioctl(ufd, UFFDIO_API, &api_struct)) {
+        error_report("postcopy_ram_supported_by_host: UFFDIO_API failed: %s",
+                     strerror(errno));
+        return false;
+    }
+
+    feature_mask = (__u64)1 << _UFFDIO_REGISTER |
+                   (__u64)1 << _UFFDIO_UNREGISTER;
+    if ((api_struct.ioctls & feature_mask) != feature_mask) {
+        error_report("Missing userfault features: %" PRIx64,
+                     (uint64_t)(~api_struct.ioctls & feature_mask));
+        return false;
+    }
+
+    return true;
+}
+
+bool postcopy_ram_supported_by_host(void)
+{
+    long pagesize = getpagesize();
+    int ufd = -1;
+    bool ret = false; /* Error unless we change it */
+    void *testarea = NULL;
+    struct uffdio_register reg_struct;
+    struct uffdio_range range_struct;
+    uint64_t feature_mask;
+
+    if ((1ul << qemu_target_page_bits()) > pagesize) {
+        error_report("Target page size bigger than host page size");
+        goto out;
+    }
+
+    ufd = syscall(__NR_userfaultfd, O_CLOEXEC);
+    if (ufd == -1) {
+        error_report("%s: userfaultfd not available: %s", __func__,
+                     strerror(errno));
+        goto out;
+    }
+
+    /* Version and features check */
+    if (!ufd_version_check(ufd)) {
+        goto out;
+    }
+
+    /*
+     *  We need to check that the ops we need are supported on anon memory
+     *  To do that we need to register a chunk and see the flags that
+     *  are returned.
+     */
+    testarea = mmap(NULL, pagesize, PROT_READ | PROT_WRITE, MAP_PRIVATE |
+                                    MAP_ANONYMOUS, -1, 0);
+    if (testarea == MAP_FAILED) {
+        error_report("%s: Failed to map test area: %s", __func__,
+                     strerror(errno));
+        goto out;
+    }
+    g_assert(((size_t)testarea & (pagesize-1)) == 0);
+
+    reg_struct.range.start = (uintptr_t)testarea;
+    reg_struct.range.len = pagesize;
+    reg_struct.mode = UFFDIO_REGISTER_MODE_MISSING;
+
+    if (ioctl(ufd, UFFDIO_REGISTER, &reg_struct)) {
+        error_report("%s userfault register: %s", __func__, strerror(errno));
+        goto out;
+    }
+
+    range_struct.start = (uintptr_t)testarea;
+    range_struct.len = pagesize;
+    if (ioctl(ufd, UFFDIO_UNREGISTER, &range_struct)) {
+        error_report("%s userfault unregister: %s", __func__, strerror(errno));
+        goto out;
+    }
+
+    feature_mask = (__u64)1 << _UFFDIO_WAKE |
+                   (__u64)1 << _UFFDIO_COPY |
+                   (__u64)1 << _UFFDIO_ZEROPAGE;
+    if ((reg_struct.ioctls & feature_mask) != feature_mask) {
+        error_report("Missing userfault map features: %" PRIx64,
+                     (uint64_t)(~reg_struct.ioctls & feature_mask));
+        goto out;
+    }
+
+    /* Success! */
+    ret = true;
+out:
+    if (testarea) {
+        munmap(testarea, pagesize);
+    }
+    if (ufd != -1) {
+        close(ufd);
+    }
+    return ret;
+}
+
+#else
+/* No target OS support, stubs just fail */
+
+bool postcopy_ram_supported_by_host(void)
+{
+    error_report("%s: No OS support", __func__);
+    return false;
+}
+
+#endif
+
diff --git a/savevm.c b/savevm.c
index 79bbded..23cc99e 100644
--- a/savevm.c
+++ b/savevm.c
@@ -33,6 +33,7 @@
 #include "qemu/timer.h"
 #include "audio/audio.h"
 #include "migration/migration.h"
+#include "migration/postcopy-ram.h"
 #include "qemu/sockets.h"
 #include "qemu/queue.h"
 #include "sysemu/cpus.h"
@@ -1113,6 +1114,10 @@ static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis,
         return -1;
     }
 
+    if (!postcopy_ram_supported_by_host()) {
+        return -1;
+    }
+
     if (remote_hps != getpagesize())  {
         /*
          * Some combinations of mismatch are probably possible but it gets
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 26/47] migrate_start_postcopy: Command to trigger transition to postcopy
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (24 preceding siblings ...)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 25/47] postcopy: OS support test Dr. David Alan Gilbert (git)
@ 2015-04-14 17:03 ` Dr. David Alan Gilbert (git)
  2015-04-14 17:38   ` Eric Blake
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 27/47] MIGRATION_STATUS_POSTCOPY_ACTIVE: Add new migration state Dr. David Alan Gilbert (git)
                   ` (21 subsequent siblings)
  47 siblings, 1 reply; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Once postcopy is enabled (with migrate_set_capability), the migration
will still start on precopy mode.  To cause a transition into postcopy
the:

  migrate_start_postcopy

command must be issued.  Postcopy will start sometime after this
(when it's next checked in the migration loop).

Issuing the command before migration has started will error,
and issuing after it has finished is ignored.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 hmp-commands.hx               | 15 +++++++++++++++
 hmp.c                         |  7 +++++++
 hmp.h                         |  1 +
 include/migration/migration.h |  3 +++
 migration/migration.c         | 22 ++++++++++++++++++++++
 qapi-schema.json              |  8 ++++++++
 qmp-commands.hx               | 19 +++++++++++++++++++
 7 files changed, 75 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 3089533..ff620ce 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -993,6 +993,21 @@ Enable/Disable the usage of a capability @var{capability} for migration.
 ETEXI
 
     {
+        .name       = "migrate_start_postcopy",
+        .args_type  = "",
+        .params     = "",
+        .help       = "Switch migration to postcopy mode",
+        .mhandler.cmd = hmp_migrate_start_postcopy,
+    },
+
+STEXI
+@item migrate_start_postcopy
+@findex migrate_start_postcopy
+Switch in-progress migration to postcopy mode. Ignored after the end of
+migration (or once already in postcopy).
+ETEXI
+
+    {
         .name       = "client_migrate_info",
         .args_type  = "protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?",
         .params     = "protocol hostname port tls-port cert-subject",
diff --git a/hmp.c b/hmp.c
index f31ae27..60e4411 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1184,6 +1184,13 @@ void hmp_migrate_set_capability(Monitor *mon, const QDict *qdict)
     }
 }
 
+void hmp_migrate_start_postcopy(Monitor *mon, const QDict *qdict)
+{
+    Error *err = NULL;
+    qmp_migrate_start_postcopy(&err);
+    hmp_handle_error(mon, &err);
+}
+
 void hmp_set_password(Monitor *mon, const QDict *qdict)
 {
     const char *protocol  = qdict_get_str(qdict, "protocol");
diff --git a/hmp.h b/hmp.h
index 2b9308b..c79a7b5 100644
--- a/hmp.h
+++ b/hmp.h
@@ -65,6 +65,7 @@ void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_capability(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_cache_size(Monitor *mon, const QDict *qdict);
+void hmp_migrate_start_postcopy(Monitor *mon, const QDict *qdict);
 void hmp_set_password(Monitor *mon, const QDict *qdict);
 void hmp_expire_password(Monitor *mon, const QDict *qdict);
 void hmp_eject(Monitor *mon, const QDict *qdict);
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 1b9a535..4db9393 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -110,6 +110,9 @@ struct MigrationState
     int64_t xbzrle_cache_size;
     int64_t setup_time;
     int64_t dirty_sync_count;
+
+    /* Flag set once the migration has been asked to enter postcopy */
+    bool start_postcopy;
 };
 
 void process_incoming_migration(QEMUFile *f);
diff --git a/migration/migration.c b/migration/migration.c
index ae737d1..17da8ab 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -377,6 +377,28 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
     }
 }
 
+void qmp_migrate_start_postcopy(Error **errp)
+{
+    MigrationState *s = migrate_get_current();
+
+    if (!migrate_postcopy_ram()) {
+        error_setg(errp, "Enable postcopy with migration_set_capability before"
+                         " the start of migration");
+        return;
+    }
+
+    if (s->state == MIGRATION_STATUS_NONE) {
+        error_setg(errp, "Postcopy must be started after migration has been"
+                         " started");
+        return;
+    }
+    /*
+     * we don't error if migration has finished since that would be racy
+     * with issuing this command.
+     */
+    atomic_set(&s->start_postcopy, true);
+}
+
 /* shared migration helpers */
 
 static void migrate_set_state(MigrationState *s, int old_state, int new_state)
diff --git a/qapi-schema.json b/qapi-schema.json
index dcd3e62..faf572f 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -566,6 +566,14 @@
 { 'command': 'query-migrate-capabilities', 'returns':   ['MigrationCapabilityStatus']}
 
 ##
+# @migrate-start-postcopy
+#
+# Switch migration to postcopy mode
+#
+# Since: 2.3
+{ 'command': 'migrate-start-postcopy' }
+
+##
 # @MouseInfo:
 #
 # Information about a mouse device.
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 3a42ad0..d564d7b 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -713,6 +713,25 @@ Example:
 
 EQMP
     {
+        .name       = "migrate-start-postcopy",
+        .args_type  = "",
+        .mhandler.cmd_new = qmp_marshal_input_migrate_start_postcopy,
+    },
+
+SQMP
+migrate-start-postcopy
+----------------------
+
+Switch an in-progress migration to postcopy mode. Ignored after the end of
+migration (or once already in postcopy).
+
+Example:
+-> { "execute": "migrate-start-postcopy" }
+<- { "return": {} }
+
+EQMP
+
+    {
         .name       = "query-migrate-cache-size",
         .args_type  = "",
         .mhandler.cmd_new = qmp_marshal_input_query_migrate_cache_size,
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 27/47] MIGRATION_STATUS_POSTCOPY_ACTIVE: Add new migration state
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (25 preceding siblings ...)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 26/47] migrate_start_postcopy: Command to trigger transition to postcopy Dr. David Alan Gilbert (git)
@ 2015-04-14 17:03 ` Dr. David Alan Gilbert (git)
  2015-04-14 17:40   ` Eric Blake
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 28/47] Add qemu_savevm_state_complete_postcopy Dr. David Alan Gilbert (git)
                   ` (20 subsequent siblings)
  47 siblings, 1 reply; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

'MIGRATION_STATUS_POSTCOPY_ACTIVE' is entered after migrate_start_postcopy

'migration_postcopy_phase' is provided for other sections to know if
they're in postcopy.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 include/migration/migration.h |  2 ++
 migration/migration.c         | 56 ++++++++++++++++++++++++++++++++++++-------
 qapi-schema.json              |  4 +++-
 trace-events                  |  1 +
 4 files changed, 54 insertions(+), 9 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 4db9393..b9d028c 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -153,6 +153,8 @@ MigrationState *migrate_init(const MigrationParams *params);
 bool migration_in_setup(MigrationState *);
 bool migration_has_finished(MigrationState *);
 bool migration_has_failed(MigrationState *);
+/* True if outgoing migration has entered postcopy phase */
+bool migration_postcopy_phase(MigrationState *);
 MigrationState *migrate_get_current(void);
 
 uint64_t ram_bytes_remaining(void);
diff --git a/migration/migration.c b/migration/migration.c
index 17da8ab..d69e102 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -255,6 +255,7 @@ static bool migration_already_active(MigrationState *ms)
 {
     switch (ms->state) {
     case MIGRATION_STATUS_ACTIVE:
+    case MIGRATION_STATUS_POSTCOPY_ACTIVE:
     case MIGRATION_STATUS_SETUP:
         return true;
 
@@ -325,6 +326,39 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 
         get_xbzrle_cache_stats(info);
         break;
+    case MIGRATION_STATUS_POSTCOPY_ACTIVE:
+        /* Mostly the same as active; TODO add some postcopy stats */
+        info->has_status = true;
+        info->has_total_time = true;
+        info->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME)
+            - s->total_time;
+        info->has_expected_downtime = true;
+        info->expected_downtime = s->expected_downtime;
+        info->has_setup_time = true;
+        info->setup_time = s->setup_time;
+
+        info->has_ram = true;
+        info->ram = g_malloc0(sizeof(*info->ram));
+        info->ram->transferred = ram_bytes_transferred();
+        info->ram->remaining = ram_bytes_remaining();
+        info->ram->total = ram_bytes_total();
+        info->ram->duplicate = dup_mig_pages_transferred();
+        info->ram->skipped = skipped_mig_pages_transferred();
+        info->ram->normal = norm_mig_pages_transferred();
+        info->ram->normal_bytes = norm_mig_bytes_transferred();
+        info->ram->dirty_pages_rate = s->dirty_pages_rate;
+        info->ram->mbps = s->mbps;
+
+        if (blk_mig_active()) {
+            info->has_disk = true;
+            info->disk = g_malloc0(sizeof(*info->disk));
+            info->disk->transferred = blk_mig_bytes_transferred();
+            info->disk->remaining = blk_mig_bytes_remaining();
+            info->disk->total = blk_mig_bytes_total();
+        }
+
+        get_xbzrle_cache_stats(info);
+        break;
     case MIGRATION_STATUS_COMPLETED:
         get_xbzrle_cache_stats(info);
 
@@ -366,8 +400,7 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
     MigrationState *s = migrate_get_current();
     MigrationCapabilityStatusList *cap;
 
-    if (s->state == MIGRATION_STATUS_ACTIVE ||
-        s->state == MIGRATION_STATUS_SETUP) {
+    if (migration_already_active(s)) {
         error_set(errp, QERR_MIGRATION_ACTIVE);
         return;
     }
@@ -442,7 +475,8 @@ static void migrate_fd_cleanup(void *opaque)
         s->file = NULL;
     }
 
-    assert(s->state != MIGRATION_STATUS_ACTIVE);
+    assert((s->state != MIGRATION_STATUS_ACTIVE) &&
+           (s->state != MIGRATION_STATUS_POSTCOPY_ACTIVE));
 
     if (s->state != MIGRATION_STATUS_COMPLETED) {
         qemu_savevm_state_cancel();
@@ -477,8 +511,7 @@ static void migrate_fd_cancel(MigrationState *s)
 
     do {
         old_state = s->state;
-        if (old_state != MIGRATION_STATUS_SETUP &&
-            old_state != MIGRATION_STATUS_ACTIVE) {
+        if (!migration_already_active(s)) {
             break;
         }
         migrate_set_state(s, old_state, MIGRATION_STATUS_CANCELLING);
@@ -522,6 +555,11 @@ bool migration_has_failed(MigrationState *s)
             s->state == MIGRATION_STATUS_FAILED);
 }
 
+bool migration_postcopy_phase(MigrationState *s)
+{
+    return (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
+}
+
 MigrationState *migrate_init(const MigrationParams *params)
 {
     MigrationState *s = migrate_get_current();
@@ -593,8 +631,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     params.blk = has_blk && blk;
     params.shared = has_inc && inc;
 
-    if (s->state == MIGRATION_STATUS_ACTIVE ||
-        s->state == MIGRATION_STATUS_SETUP ||
+    if (migration_already_active(s) ||
         s->state == MIGRATION_STATUS_CANCELLING) {
         error_set(errp, QERR_MIGRATION_ACTIVE);
         return;
@@ -906,7 +943,10 @@ static void *migration_thread(void *opaque)
     s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
     migrate_set_state(s, MIGRATION_STATUS_SETUP, MIGRATION_STATUS_ACTIVE);
 
-    while (s->state == MIGRATION_STATUS_ACTIVE) {
+    trace_migration_thread_setup_complete();
+
+    while (s->state == MIGRATION_STATUS_ACTIVE ||
+           s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
         int64_t current_time;
         uint64_t pending_size;
 
diff --git a/qapi-schema.json b/qapi-schema.json
index faf572f..a7afc97 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -424,6 +424,8 @@
 #
 # @active: in the process of doing migration.
 #
+# @postcopy-active: as active, but now in postcopy mode.
+#
 # @completed: migration is finished.
 #
 # @failed: some error occurred during migration process.
@@ -433,7 +435,7 @@
 ##
 { 'enum': 'MigrationStatus',
   'data': [ 'none', 'setup', 'cancelling', 'cancelled',
-            'active', 'completed', 'failed' ] }
+            'active', 'postcopy-active', 'completed', 'failed' ] }
 
 ##
 # @MigrationInfo
diff --git a/trace-events b/trace-events
index 4f94cf4..8983bde 100644
--- a/trace-events
+++ b/trace-events
@@ -1405,6 +1405,7 @@ migrate_fd_error(void) ""
 migrate_fd_cancel(void) ""
 migrate_pending(uint64_t size, uint64_t max, uint64_t post, uint64_t nonpost) "pending size %" PRIu64 " max %" PRIu64 " (post=%" PRIu64 " nonpost=%" PRIu64 ")"
 migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
+migration_thread_setup_complete(void) ""
 open_return_path_on_source(void) ""
 open_return_path_on_source_continue(void) ""
 source_return_path_thread_bad_end(void) ""
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 28/47] Add qemu_savevm_state_complete_postcopy
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (26 preceding siblings ...)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 27/47] MIGRATION_STATUS_POSTCOPY_ACTIVE: Add new migration state Dr. David Alan Gilbert (git)
@ 2015-04-14 17:03 ` Dr. David Alan Gilbert (git)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 29/47] Postcopy: Maintain sentmap and calculate discard Dr. David Alan Gilbert (git)
                   ` (19 subsequent siblings)
  47 siblings, 0 replies; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add qemu_savevm_state_complete_postcopy to complement
qemu_savevm_state_complete_precopy together with a new
save_live_complete_postcopy method on devices.

The save_live_complete_precopy method is called on
all devices during a precopy migration, and all non-postcopy
devices during a postcopy migration at the transition.

The save_live_complete_postcopy method is called at
the end of postcopy for all postcopiable devices.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c                 |  1 +
 include/migration/vmstate.h |  1 +
 include/sysemu/sysemu.h     |  1 +
 savevm.c                    | 50 ++++++++++++++++++++++++++++++++++++++++++---
 4 files changed, 50 insertions(+), 3 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 977e98b..0a49ace 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -1275,6 +1275,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 static SaveVMHandlers savevm_ram_handlers = {
     .save_live_setup = ram_save_setup,
     .save_live_iterate = ram_save_iterate,
+    .save_live_complete_postcopy = ram_save_complete,
     .save_live_complete_precopy = ram_save_complete,
     .save_live_pending = ram_save_pending,
     .load_state = ram_load,
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 50efb09..06bed0a 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -40,6 +40,7 @@ typedef struct SaveVMHandlers {
     SaveStateHandler *save_state;
 
     void (*cancel)(void *opaque);
+    int (*save_live_complete_postcopy)(QEMUFile *f, void *opaque);
     int (*save_live_complete_precopy)(QEMUFile *f, void *opaque);
 
     /* This runs both outside and inside the iothread lock.  */
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index e45ef62..248f0d6 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -108,6 +108,7 @@ void qemu_savevm_state_begin(QEMUFile *f,
                              const MigrationParams *params);
 void qemu_savevm_state_header(QEMUFile *f);
 int qemu_savevm_state_iterate(QEMUFile *f);
+void qemu_savevm_state_complete_postcopy(QEMUFile *f);
 void qemu_savevm_state_complete_precopy(QEMUFile *f);
 void qemu_savevm_state_cancel(void);
 void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size,
diff --git a/savevm.c b/savevm.c
index 23cc99e..c2d6241 100644
--- a/savevm.c
+++ b/savevm.c
@@ -866,7 +866,46 @@ int qemu_savevm_state_iterate(QEMUFile *f)
 static bool should_send_vmdesc(void)
 {
     MachineState *machine = MACHINE(qdev_get_machine());
-    return !machine->suppress_vmdesc;
+    bool in_postcopy = migration_postcopy_phase(migrate_get_current());
+    return !machine->suppress_vmdesc && !in_postcopy;
+}
+
+/*
+ * Calls the save_live_complete_postcopy methods
+ * causing the last few pages to be sent immediately and doing any associated
+ * cleanup.
+ * Note postcopy also calls qemu_savevm_state_complete_precopy to complete
+ * all the other devices, but that happens at the point we switch to postcopy.
+ */
+void qemu_savevm_state_complete_postcopy(QEMUFile *f)
+{
+    SaveStateEntry *se;
+    int ret;
+
+    QTAILQ_FOREACH(se, &savevm_handlers, entry) {
+        if (!se->ops || !se->ops->save_live_complete_postcopy) {
+            continue;
+        }
+        if (se->ops && se->ops->is_active) {
+            if (!se->ops->is_active(se->opaque)) {
+                continue;
+            }
+        }
+        trace_savevm_section_start(se->idstr, se->section_id);
+        /* Section type */
+        qemu_put_byte(f, QEMU_VM_SECTION_END);
+        qemu_put_be32(f, se->section_id);
+
+        ret = se->ops->save_live_complete_postcopy(f, se->opaque);
+        trace_savevm_section_end(se->idstr, se->section_id, ret);
+        if (ret < 0) {
+            qemu_file_set_error(f, ret);
+            return;
+        }
+    }
+
+    qemu_put_byte(f, QEMU_VM_EOF);
+    qemu_fflush(f);
 }
 
 void qemu_savevm_state_complete_precopy(QEMUFile *f)
@@ -875,13 +914,15 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f)
     int vmdesc_len;
     SaveStateEntry *se;
     int ret;
+    bool in_postcopy = migration_postcopy_phase(migrate_get_current());
 
     trace_savevm_state_complete_precopy();
 
     cpu_synchronize_all_states();
 
     QTAILQ_FOREACH(se, &savevm_handlers, entry) {
-        if (!se->ops || !se->ops->save_live_complete_precopy) {
+        if (!se->ops || !se->ops->save_live_complete_precopy ||
+            (in_postcopy && se->ops->save_live_complete_postcopy)) {
             continue;
         }
         if (se->ops && se->ops->is_active) {
@@ -935,7 +976,10 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f)
         trace_savevm_section_end(se->idstr, se->section_id, 0);
     }
 
-    qemu_put_byte(f, QEMU_VM_EOF);
+    if (!in_postcopy) {
+        /* Postcopy stream will still be going */
+        qemu_put_byte(f, QEMU_VM_EOF);
+    }
 
     json_end_array(vmdesc);
     qjson_finish(vmdesc);
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 29/47] Postcopy: Maintain sentmap and calculate discard
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (27 preceding siblings ...)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 28/47] Add qemu_savevm_state_complete_postcopy Dr. David Alan Gilbert (git)
@ 2015-04-14 17:03 ` Dr. David Alan Gilbert (git)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 30/47] postcopy: Incoming initialisation Dr. David Alan Gilbert (git)
                   ` (18 subsequent siblings)
  47 siblings, 0 replies; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Where postcopy is preceeded by a period of precopy, the destination will
have received pages that may have been dirtied on the source after the
page was sent.  The destination must throw these pages away before
starting it's CPUs.

Maintain a 'sentmap' of pages that have already been sent.
Calculate list of sent & dirty pages
Provide helpers on the destination side to discard these.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c                      | 202 ++++++++++++++++++++++++++++++++++++++-
 include/migration/migration.h    |  12 +++
 include/migration/postcopy-ram.h |  35 +++++++
 include/qemu/typedefs.h          |   1 +
 migration/migration.c            |   1 +
 migration/postcopy-ram.c         | 108 +++++++++++++++++++++
 savevm.c                         |   2 -
 trace-events                     |   5 +
 8 files changed, 362 insertions(+), 4 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 0a49ace..efc2938 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -40,6 +40,7 @@
 #include "hw/audio/audio.h"
 #include "sysemu/kvm.h"
 #include "migration/migration.h"
+#include "migration/postcopy-ram.h"
 #include "hw/i386/smbios.h"
 #include "exec/address-spaces.h"
 #include "hw/audio/pcspk.h"
@@ -443,9 +444,17 @@ static int save_xbzrle_page(QEMUFile *f, uint8_t **current_data,
     return 1;
 }
 
+/* mr: The region to search for dirty pages in
+ * start: Start address (typically so we can continue from previous page)
+ * ram_addr_abs: Pointer into which to store the address of the dirty page
+ *               within the global ram_addr space
+ *
+ * Returns: byte offset within memory region of the start of a dirty page
+ */
 static inline
 ram_addr_t migration_bitmap_find_and_reset_dirty(MemoryRegion *mr,
-                                                 ram_addr_t start)
+                                                 ram_addr_t start,
+                                                 ram_addr_t *ram_addr_abs)
 {
     unsigned long base = mr->ram_addr >> TARGET_PAGE_BITS;
     unsigned long nr = base + (start >> TARGET_PAGE_BITS);
@@ -464,6 +473,7 @@ ram_addr_t migration_bitmap_find_and_reset_dirty(MemoryRegion *mr,
         clear_bit(next, migration_bitmap);
         migration_dirty_pages--;
     }
+    *ram_addr_abs = next << TARGET_PAGE_BITS;
     return (next - base) << TARGET_PAGE_BITS;
 }
 
@@ -603,6 +613,19 @@ static void migration_bitmap_sync(void)
     }
 }
 
+static RAMBlock *ram_find_block(const char *id)
+{
+    RAMBlock *block;
+
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+        if (!strcmp(id, block->idstr)) {
+            return block;
+        }
+    }
+
+    return NULL;
+}
+
 /**
  * ram_save_page: Send the given page to the stream
  *
@@ -713,13 +736,16 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
     bool complete_round = false;
     int pages = 0;
     MemoryRegion *mr;
+    ram_addr_t dirty_ram_abs; /* Address of the start of the dirty page in
+                                 ram_addr_t space */
 
     if (!block)
         block = QLIST_FIRST_RCU(&ram_list.blocks);
 
     while (true) {
         mr = block->mr;
-        offset = migration_bitmap_find_and_reset_dirty(mr, offset);
+        offset = migration_bitmap_find_and_reset_dirty(mr, offset,
+                                                       &dirty_ram_abs);
         if (complete_round && block == last_seen_block &&
             offset >= last_offset) {
             break;
@@ -738,6 +764,11 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
 
             /* if page is unmodified, continue to the next */
             if (pages > 0) {
+                MigrationState *ms = migrate_get_current();
+                if (ms->sentmap) {
+                    set_bit(dirty_ram_abs >> TARGET_PAGE_BITS, ms->sentmap);
+                }
+
                 last_sent_block = block;
                 break;
             }
@@ -799,12 +830,19 @@ void free_xbzrle_decoded_buf(void)
 
 static void migration_end(void)
 {
+    MigrationState *s = migrate_get_current();
+
     if (migration_bitmap) {
         memory_global_dirty_log_stop();
         g_free(migration_bitmap);
         migration_bitmap = NULL;
     }
 
+    if (s->sentmap) {
+        g_free(s->sentmap);
+        s->sentmap = NULL;
+    }
+
     XBZRLE_cache_lock();
     if (XBZRLE.cache) {
         cache_fini(XBZRLE.cache);
@@ -878,6 +916,160 @@ void ram_debug_dump_bitmap(unsigned long *todump, bool expected)
     }
 }
 
+/* **** functions for postcopy ***** */
+
+/*
+ * Callback from postcopy_each_ram_send_discard for each RAMBlock
+ * start,end: Indexes into the bitmap for the first and last bit
+ *            representing the named block
+ */
+static int postcopy_send_discard_bm_ram(MigrationState *ms,
+                                        PostcopyDiscardState *pds,
+                                        unsigned long start, unsigned long end)
+{
+    unsigned long current;
+
+    for (current = start; current <= end; ) {
+        unsigned long set = find_next_bit(ms->sentmap, end + 1, current);
+
+        if (set <= end) {
+            unsigned long zero = find_next_zero_bit(ms->sentmap,
+                                                    end + 1, set + 1);
+
+            if (zero > end) {
+                zero = end + 1;
+            }
+            postcopy_discard_send_range(ms, pds, set, zero - 1);
+            current = zero + 1;
+        } else {
+            current = set;
+        }
+    }
+
+    return 0;
+}
+
+/*
+ * Utility for the outgoing postcopy code.
+ *   Calls postcopy_send_discard_bm_ram for each RAMBlock
+ *   passing it bitmap indexes and name.
+ * Returns: 0 on success
+ * (qemu_ram_foreach_block ends up passing unscaled lengths
+ *  which would mean postcopy code would have to deal with target page)
+ */
+static int postcopy_each_ram_send_discard(MigrationState *ms)
+{
+    struct RAMBlock *block;
+    int ret;
+
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+        unsigned long first = block->offset >> TARGET_PAGE_BITS;
+        unsigned long last = (block->offset + (block->max_length-1))
+                                >> TARGET_PAGE_BITS;
+        PostcopyDiscardState *pds = postcopy_discard_send_init(ms,
+                                                               first,
+                                                               block->idstr);
+
+        /*
+         * Postcopy sends chunks of bitmap over the wire, but it
+         * just needs indexes at this point, avoids it having
+         * target page specific code.
+         */
+        ret = postcopy_send_discard_bm_ram(ms, pds, first, last);
+        postcopy_discard_send_finish(ms, pds);
+        if (ret) {
+            return ret;
+        }
+    }
+
+    return 0;
+}
+
+/*
+ * Transmit the set of pages to be discarded after precopy to the target
+ * these are pages that have been sent previously but have been dirtied
+ * Hopefully this is pretty sparse
+ */
+int ram_postcopy_send_discard_bitmap(MigrationState *ms)
+{
+    int ret;
+
+    rcu_read_lock();
+    /* This should be our last sync, the src is now paused */
+    migration_bitmap_sync();
+
+    /*
+     * Update the sentmap to be  sentmap&=dirty
+     */
+    bitmap_and(ms->sentmap, ms->sentmap, migration_bitmap,
+               last_ram_offset() >> TARGET_PAGE_BITS);
+
+
+    trace_ram_postcopy_send_discard_bitmap();
+#ifdef DEBUG_POSTCOPY
+    ram_debug_dump_bitmap(ms->sentmap, false);
+#endif
+
+    ret = postcopy_each_ram_send_discard(ms);
+    rcu_read_unlock();
+
+    return ret;
+}
+
+/*
+ * At the start of the postcopy phase of migration, any now-dirty
+ * precopied pages are discarded.
+ *
+ * start..end is an inclusive byte address range within the RAMBlock
+ *
+ * Returns 0 on success.
+ */
+int ram_discard_range(MigrationIncomingState *mis,
+                      const char *block_name,
+                      uint64_t start, uint64_t end)
+{
+    int ret = -1;
+
+    assert(end >= start);
+
+    rcu_read_lock();
+    RAMBlock *rb = ram_find_block(block_name);
+
+    if (!rb) {
+        error_report("ram_discard_range: Failed to find block '%s'",
+                     block_name);
+        goto err;
+    }
+
+    uint8_t *host_startaddr = rb->host + start;
+    uint8_t *host_endaddr;
+
+    if ((uintptr_t)host_startaddr & (qemu_host_page_size - 1)) {
+        error_report("ram_discard_range: Unaligned start address: %p",
+                     host_startaddr);
+        goto err;
+    }
+
+    if (end <= rb->used_length) {
+        host_endaddr   = rb->host + end;
+        if (((uintptr_t)host_endaddr + 1) & (qemu_host_page_size - 1)) {
+            error_report("ram_discard_range: Unaligned end address: %p",
+                         host_endaddr);
+            goto err;
+        }
+        ret = postcopy_ram_discard_range(mis, host_startaddr, host_endaddr);
+    } else {
+        error_report("ram_discard_range: Overrun block '%s' (%" PRIu64
+                     "/%" PRIu64 "/%zu)",
+                     block_name, start, end, rb->used_length);
+    }
+
+err:
+    rcu_read_unlock();
+
+    return ret;
+}
+
 static int ram_save_setup(QEMUFile *f, void *opaque)
 {
     RAMBlock *block;
@@ -929,6 +1121,12 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     migration_bitmap = bitmap_new(ram_bitmap_pages);
     bitmap_set(migration_bitmap, 0, ram_bitmap_pages);
 
+    if (migrate_postcopy_ram()) {
+        MigrationState *s = migrate_get_current();
+        s->sentmap = bitmap_new(ram_bitmap_pages);
+        bitmap_clear(s->sentmap, 0, ram_bitmap_pages);
+    }
+
     /*
      * Count the total number of pages used by ram blocks not including any
      * gaps due to alignment or unplugs.
diff --git a/include/migration/migration.h b/include/migration/migration.h
index b9d028c..15707fc 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -113,6 +113,13 @@ struct MigrationState
 
     /* Flag set once the migration has been asked to enter postcopy */
     bool start_postcopy;
+
+    /* bitmap of pages that have been sent at least once
+     * only maintained and used in postcopy at the moment
+     * where it's used to send the dirtymap at the start
+     * of the postcopy phase
+     */
+    unsigned long *sentmap;
 };
 
 void process_incoming_migration(QEMUFile *f);
@@ -178,6 +185,11 @@ double xbzrle_mig_cache_miss_rate(void);
 
 void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
 void ram_debug_dump_bitmap(unsigned long *todump, bool expected);
+/* For outgoing discard bitmap */
+int ram_postcopy_send_discard_bitmap(MigrationState *ms);
+/* For incoming postcopy discard */
+int ram_discard_range(MigrationIncomingState *mis, const char *block_name,
+                      uint64_t start, uint64_t end);
 
 /**
  * @migrate_add_blocker - prevent migration from proceeding
diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index d81934f..1d38f76 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -16,4 +16,39 @@
 /* Return true if the host supports everything we need to do postcopy-ram */
 bool postcopy_ram_supported_by_host(void);
 
+/*
+ * Discard the contents of memory start..end inclusive.
+ * We can assume that if we've been called postcopy_ram_hosttest returned true
+ */
+int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
+                               uint8_t *end);
+
+
+/*
+ * Called at the start of each RAMBlock by the bitmap code
+ * 'offset' is the bitmap offset of the named RAMBlock in the migration
+ * bitmap.
+ * Returns a new PDS
+ */
+PostcopyDiscardState *postcopy_discard_send_init(MigrationState *ms,
+                                                 unsigned long offset,
+                                                 const char *name);
+
+/*
+ * Called by the bitmap code for each chunk to discard
+ * May send a discard message, may just leave it queued to
+ * be sent later
+ * 'start' and 'end' describe an inclusive range of pages in the
+ * migration bitmap in the RAM block passed to postcopy_discard_send_init
+ */
+void postcopy_discard_send_range(MigrationState *ms, PostcopyDiscardState *pds,
+                                unsigned long start, unsigned long end);
+
+/*
+ * Called at the end of each RAMBlock by the bitmap code
+ * Sends any outstanding discard messages, frees the PDS
+ */
+void postcopy_discard_send_finish(MigrationState *ms,
+                                  PostcopyDiscardState *pds);
+
 #endif
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 611db46..5f130fe 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -61,6 +61,7 @@ typedef struct PCIExpressHost PCIExpressHost;
 typedef struct PCIHostState PCIHostState;
 typedef struct PCMCIACardState PCMCIACardState;
 typedef struct PixelFormat PixelFormat;
+typedef struct PostcopyDiscardState PostcopyDiscardState;
 typedef struct PropertyInfo PropertyInfo;
 typedef struct Property Property;
 typedef struct QEMUBH QEMUBH;
diff --git a/migration/migration.c b/migration/migration.c
index d69e102..63205c3 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -22,6 +22,7 @@
 #include "block/block.h"
 #include "qemu/sockets.h"
 #include "migration/block.h"
+#include "migration/postcopy-ram.h"
 #include "qemu/thread.h"
 #include "qmp-commands.h"
 #include "trace.h"
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 7704bc1..a10f3ca 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -27,6 +27,22 @@
 #include "qemu/error-report.h"
 #include "trace.h"
 
+#define MAX_DISCARDS_PER_COMMAND 12
+
+struct PostcopyDiscardState {
+    const char *name;
+    uint64_t offset; /* Bitmap entry for the 1st bit of this RAMBlock */
+    uint16_t cur_entry;
+    /*
+     * Start and end address of a discard range; end_list points to the byte
+     * after the end of the range.
+     */
+    uint64_t start_list[MAX_DISCARDS_PER_COMMAND];
+    uint64_t   end_list[MAX_DISCARDS_PER_COMMAND];
+    unsigned int nsentwords;
+    unsigned int nsentcmds;
+};
+
 /* Postcopy needs to detect accesses to pages that haven't yet been copied
  * across, and efficiently map new pages in, the techniques for doing this
  * are target OS specific.
@@ -144,6 +160,22 @@ out:
     return ret;
 }
 
+/*
+ * Discard the contents of memory start..end inclusive.
+ * We can assume that if we've been called postcopy_ram_hosttest returned true
+ */
+int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
+                               uint8_t *end)
+{
+    trace_postcopy_ram_discard_range(start, end);
+    if (madvise(start, (end-start)+1, MADV_DONTNEED)) {
+        error_report("%s MADV_DONTNEED: %s", __func__, strerror(errno));
+        return -1;
+    }
+
+    return 0;
+}
+
 #else
 /* No target OS support, stubs just fail */
 
@@ -153,5 +185,81 @@ bool postcopy_ram_supported_by_host(void)
     return false;
 }
 
+int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
+                               uint8_t *end)
+{
+    assert(0);
+}
 #endif
 
+/* ------------------------------------------------------------------------- */
+
+/*
+ * Called at the start of each RAMBlock by the bitmap code
+ * 'offset' is the bitmap offset of the named RAMBlock in the migration
+ * bitmap.
+ * Returns a new PDS
+ */
+PostcopyDiscardState *postcopy_discard_send_init(MigrationState *ms,
+                                                 unsigned long offset,
+                                                 const char *name)
+{
+    PostcopyDiscardState *res = g_try_malloc(sizeof(PostcopyDiscardState));
+
+    if (res) {
+        res->name = name;
+        res->cur_entry = 0;
+        res->nsentwords = 0;
+        res->nsentcmds = 0;
+        res->offset = offset;
+    }
+
+    return res;
+}
+
+/*
+ * Called by the bitmap code for each chunk to discard
+ * May send a discard message, may just leave it queued to
+ * be sent later
+ * 'start' and 'end' describe an inclusive range of pages in the
+ * migration bitmap in the RAM block passed to postcopy_discard_send_init
+ */
+void postcopy_discard_send_range(MigrationState *ms, PostcopyDiscardState *pds,
+                                unsigned long start, unsigned long end)
+{
+    size_t tp_bits = qemu_target_page_bits();
+    /* Convert to byte offsets within the RAM block */
+    pds->start_list[pds->cur_entry] = (start - pds->offset) << tp_bits;
+    pds->end_list[pds->cur_entry] = (1 + end - pds->offset) << tp_bits;
+    pds->cur_entry++;
+    pds->nsentwords++;
+
+    if (pds->cur_entry == MAX_DISCARDS_PER_COMMAND) {
+        /* Full set, ship it! */
+        qemu_savevm_send_postcopy_ram_discard(ms->file, pds->name,
+                                              pds->cur_entry,
+                                              pds->start_list, pds->end_list);
+        pds->nsentcmds++;
+        pds->cur_entry = 0;
+    }
+}
+
+/*
+ * Called at the end of each RAMBlock by the bitmap code
+ * Sends any outstanding discard messages, frees the PDS
+ */
+void postcopy_discard_send_finish(MigrationState *ms, PostcopyDiscardState *pds)
+{
+    /* Anything unsent? */
+    if (pds->cur_entry) {
+        qemu_savevm_send_postcopy_ram_discard(ms->file, pds->name,
+                                              pds->cur_entry,
+                                              pds->start_list, pds->end_list);
+        pds->nsentcmds++;
+    }
+
+    trace_postcopy_discard_send_finish(pds->name, pds->nsentwords,
+                                       pds->nsentcmds);
+
+    g_free(pds);
+}
diff --git a/savevm.c b/savevm.c
index c2d6241..b7f0846 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1242,7 +1242,6 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
     }
     trace_loadvm_postcopy_ram_handle_discard_header(ramid, len);
     while (len) {
-        /* TODO - ram_discard_range gets added in a later patch
         uint64_t start_addr, end_addr;
         start_addr = qemu_get_be64(mis->file);
         end_addr = qemu_get_be64(mis->file);
@@ -1252,7 +1251,6 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
         if (ret) {
             return ret;
         }
-        */
     }
     trace_loadvm_postcopy_ram_handle_discard_end();
 
diff --git a/trace-events b/trace-events
index 8983bde..6ab309d 100644
--- a/trace-events
+++ b/trace-events
@@ -1219,6 +1219,7 @@ qemu_file_fclose(void) ""
 migration_bitmap_sync_start(void) ""
 migration_bitmap_sync_end(uint64_t dirty_pages) "dirty_pages %" PRIu64""
 migration_throttle(void) ""
+ram_postcopy_send_discard_bitmap(void) ""
 
 # hw/display/qxl.c
 disable qxl_interface_set_mm_time(int qid, uint32_t mm_time) "%d %d"
@@ -1479,6 +1480,10 @@ rdma_start_incoming_migration_after_rdma_listen(void) ""
 rdma_start_outgoing_migration_after_rdma_connect(void) ""
 rdma_start_outgoing_migration_after_rdma_source_init(void) ""
 
+# migration/postcopy-ram.c
+postcopy_discard_send_finish(const char *ramblock, int nwords, int ncmds) "%s mask words sent=%d in %d commands"
+postcopy_ram_discard_range(void *start, void *end) "%p,%p"
+
 # kvm-all.c
 kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
 kvm_vm_ioctl(int type, void *arg) "type 0x%x, arg %p"
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 30/47] postcopy: Incoming initialisation
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (28 preceding siblings ...)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 29/47] Postcopy: Maintain sentmap and calculate discard Dr. David Alan Gilbert (git)
@ 2015-04-14 17:03 ` Dr. David Alan Gilbert (git)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 31/47] postcopy: ram_enable_notify to switch on userfault Dr. David Alan Gilbert (git)
                   ` (17 subsequent siblings)
  47 siblings, 0 replies; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 arch_init.c                      |  11 ++++
 include/migration/migration.h    |   3 +
 include/migration/postcopy-ram.h |  12 ++++
 migration/postcopy-ram.c         | 116 +++++++++++++++++++++++++++++++++++++++
 savevm.c                         |   4 ++
 trace-events                     |   2 +
 6 files changed, 148 insertions(+)

diff --git a/arch_init.c b/arch_init.c
index efc2938..2c937d1 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -1353,6 +1353,17 @@ void ram_handle_compressed(void *host, uint8_t ch, uint64_t size)
     }
 }
 
+/*
+ * Allocate data structures etc needed by incoming migration with postcopy-ram
+ * postcopy-ram's similarly names postcopy_ram_incoming_init does the work
+ */
+int ram_postcopy_incoming_init(MigrationIncomingState *mis)
+{
+    size_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
+
+    return postcopy_ram_incoming_init(mis, ram_pages);
+}
+
 static int ram_load(QEMUFile *f, void *opaque, int version_id)
 {
     int flags = 0, ret = 0;
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 15707fc..8c8afc4 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -73,6 +73,8 @@ struct MigrationIncomingState {
      */
     QemuEvent      main_thread_load_event;
 
+    /* For the kernel to send us notifications */
+    int            userfault_fd;
     QEMUFile *return_path;
     QemuMutex      rp_mutex;    /* We send replies from multiple threads */
 };
@@ -190,6 +192,7 @@ int ram_postcopy_send_discard_bitmap(MigrationState *ms);
 /* For incoming postcopy discard */
 int ram_discard_range(MigrationIncomingState *mis, const char *block_name,
                       uint64_t start, uint64_t end);
+int ram_postcopy_incoming_init(MigrationIncomingState *mis);
 
 /**
  * @migrate_add_blocker - prevent migration from proceeding
diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index 1d38f76..b46af08 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -17,6 +17,18 @@
 bool postcopy_ram_supported_by_host(void);
 
 /*
+ * Initialise postcopy-ram, setting the RAM to a state where we can go into
+ * postcopy later; must be called prior to any precopy.
+ * called from arch_init's similarly named ram_postcopy_incoming_init
+ */
+int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages);
+
+/*
+ * At the end of a migration where postcopy_ram_incoming_init was called.
+ */
+int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis);
+
+/*
  * Discard the contents of memory start..end inclusive.
  * We can assume that if we've been called postcopy_ram_hosttest returned true
  */
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index a10f3ca..16b78c2 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -176,6 +176,111 @@ int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
     return 0;
 }
 
+/*
+ * Setup an area of RAM so that it *can* be used for postcopy later; this
+ * must be done right at the start prior to pre-copy.
+ * opaque should be the MIS.
+ */
+static int init_area(const char *block_name, void *host_addr,
+                     ram_addr_t offset, ram_addr_t length, void *opaque)
+{
+    MigrationIncomingState *mis = opaque;
+
+    trace_postcopy_init_area(block_name, host_addr, offset, length);
+
+    /*
+     * We need the whole of RAM to be truly empty for postcopy, so things
+     * like ROMs and any data tables built during init must be zero'd
+     * - we're going to get the copy from the source anyway.
+     * (Precopy will just overwrite this data, so doesn't need the discard)
+     */
+    if (postcopy_ram_discard_range(mis, host_addr, (host_addr + length - 1))) {
+        return -1;
+    }
+
+    /*
+     * We also need the area to be normal 4k pages, not huge pages
+     * (otherwise we can't be sure we can atomically place the
+     * 4k page in later).  THP might come along and map a 2MB page
+     * and when it's partially accessed in precopy it might not break
+     * it down, but leave a 2MB zero'd page.
+     */
+#ifdef MADV_NOHUGEPAGE
+    if (madvise(host_addr, length, MADV_NOHUGEPAGE)) {
+        error_report("%s: NOHUGEPAGE: %s", __func__, strerror(errno));
+        return -1;
+    }
+#endif
+
+    return 0;
+}
+
+/*
+ * At the end of migration, undo the effects of init_area
+ * opaque should be the MIS.
+ */
+static int cleanup_area(const char *block_name, void *host_addr,
+                        ram_addr_t offset, ram_addr_t length, void *opaque)
+{
+    MigrationIncomingState *mis = opaque;
+    struct uffdio_range range_struct;
+    trace_postcopy_cleanup_area(block_name, host_addr, offset, length);
+
+    /*
+     * We turned off hugepage for the precopy stage with postcopy enabled
+     * we can turn it back on now.
+     */
+#ifdef MADV_HUGEPAGE
+    if (madvise(host_addr, length, MADV_HUGEPAGE)) {
+        error_report("%s HUGEPAGE: %s", __func__, strerror(errno));
+        return -1;
+    }
+#endif
+
+    /*
+     * We can also turn off userfault now since we should have all the
+     * pages.   It can be useful to leave it on to debug postcopy
+     * if you're not sure it's always getting every page.
+     */
+    range_struct.start = (uintptr_t)host_addr;
+    range_struct.len = length;
+
+    if (ioctl(mis->userfault_fd, UFFDIO_UNREGISTER, &range_struct)) {
+        error_report("%s: userfault unregister %s", __func__, strerror(errno));
+
+        return -1;
+    }
+
+    return 0;
+}
+
+/*
+ * Initialise postcopy-ram, setting the RAM to a state where we can go into
+ * postcopy later; must be called prior to any precopy.
+ * called from arch_init's similarly named ram_postcopy_incoming_init
+ */
+int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages)
+{
+    if (qemu_ram_foreach_block(init_area, mis)) {
+        return -1;
+    }
+
+    return 0;
+}
+
+/*
+ * At the end of a migration where postcopy_ram_incoming_init was called.
+ */
+int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
+{
+    /* TODO: Join the fault thread once we're sure it will exit */
+    if (qemu_ram_foreach_block(cleanup_area, mis)) {
+        return -1;
+    }
+
+    return 0;
+}
+
 #else
 /* No target OS support, stubs just fail */
 
@@ -185,6 +290,17 @@ bool postcopy_ram_supported_by_host(void)
     return false;
 }
 
+int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages)
+{
+    error_report("postcopy_ram_incoming_init: No OS support");
+    return -1;
+}
+
+int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
+{
+    assert(0);
+}
+
 int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
                                uint8_t *end)
 {
diff --git a/savevm.c b/savevm.c
index b7f0846..c383ce0 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1185,6 +1185,10 @@ static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis,
         return -1;
     }
 
+    if (ram_postcopy_incoming_init(mis)) {
+        return -1;
+    }
+
     postcopy_state_set(mis, POSTCOPY_INCOMING_ADVISE);
 
     return 0;
diff --git a/trace-events b/trace-events
index 6ab309d..b2099ee 100644
--- a/trace-events
+++ b/trace-events
@@ -1482,7 +1482,9 @@ rdma_start_outgoing_migration_after_rdma_source_init(void) ""
 
 # migration/postcopy-ram.c
 postcopy_discard_send_finish(const char *ramblock, int nwords, int ncmds) "%s mask words sent=%d in %d commands"
+postcopy_cleanup_area(const char *ramblock, void *host_addr, size_t offset, size_t length) "%s: %p offset=%zx length=%zx"
 postcopy_ram_discard_range(void *start, void *end) "%p,%p"
+postcopy_init_area(const char *ramblock, void *host_addr, size_t offset, size_t length) "%s: %p offset=%zx length=%zx"
 
 # kvm-all.c
 kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 31/47] postcopy: ram_enable_notify to switch on userfault
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (29 preceding siblings ...)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 30/47] postcopy: Incoming initialisation Dr. David Alan Gilbert (git)
@ 2015-04-14 17:03 ` Dr. David Alan Gilbert (git)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 32/47] Postcopy: Postcopy startup in migration thread Dr. David Alan Gilbert (git)
                   ` (16 subsequent siblings)
  47 siblings, 0 replies; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Mark the area of RAM as 'userfault'
Start up a fault-thread to handle any userfaults we might receive
from it (to be filled in later)

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 include/migration/migration.h    |  3 ++
 include/migration/postcopy-ram.h |  6 ++++
 migration/postcopy-ram.c         | 69 +++++++++++++++++++++++++++++++++++++++-
 savevm.c                         |  9 ++++++
 4 files changed, 86 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 8c8afc4..36451de 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -73,6 +73,9 @@ struct MigrationIncomingState {
      */
     QemuEvent      main_thread_load_event;
 
+    QemuThread     fault_thread;
+    QemuSemaphore  fault_thread_sem;
+
     /* For the kernel to send us notifications */
     int            userfault_fd;
     QEMUFile *return_path;
diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index b46af08..88793b3 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -17,6 +17,12 @@
 bool postcopy_ram_supported_by_host(void);
 
 /*
+ * Make all of RAM sensitive to accesses to areas that haven't yet been written
+ * and wire up anything necessary to deal with it.
+ */
+int postcopy_ram_enable_notify(MigrationIncomingState *mis);
+
+/*
  * Initialise postcopy-ram, setting the RAM to a state where we can go into
  * postcopy later; must be called prior to any precopy.
  * called from arch_init's similarly named ram_postcopy_incoming_init
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 16b78c2..1be3bc9 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -281,9 +281,71 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
     return 0;
 }
 
+/*
+ * Mark the given area of RAM as requiring notification to unwritten areas
+ * Used as a  callback on qemu_ram_foreach_block.
+ *   host_addr: Base of area to mark
+ *   offset: Offset in the whole ram arena
+ *   length: Length of the section
+ *   opaque: MigrationIncomingState pointer
+ * Returns 0 on success
+ */
+static int ram_block_enable_notify(const char *block_name, void *host_addr,
+                                   ram_addr_t offset, ram_addr_t length,
+                                   void *opaque)
+{
+    MigrationIncomingState *mis = opaque;
+    struct uffdio_register reg_struct;
+
+    reg_struct.range.start = (uintptr_t)host_addr;
+    reg_struct.range.len = length;
+    reg_struct.mode = UFFDIO_REGISTER_MODE_MISSING;
+
+    /* Now tell our userfault_fd that it's responsible for this area */
+    if (ioctl(mis->userfault_fd, UFFDIO_REGISTER, &reg_struct)) {
+        error_report("%s userfault register: %s", __func__, strerror(errno));
+        return -1;
+    }
+
+    return 0;
+}
+
+/*
+ * Handle faults detected by the USERFAULT markings
+ */
+static void *postcopy_ram_fault_thread(void *opaque)
+{
+    MigrationIncomingState *mis = (MigrationIncomingState *)opaque;
+
+    fprintf(stderr, "postcopy_ram_fault_thread\n");
+    /* TODO: In later patch */
+    qemu_sem_post(&mis->fault_thread_sem);
+    while (1) {
+        /* TODO: In later patch */
+    }
+
+    return NULL;
+}
+
+int postcopy_ram_enable_notify(MigrationIncomingState *mis)
+{
+    /* Create the fault handler thread and wait for it to be ready */
+    qemu_sem_init(&mis->fault_thread_sem, 0);
+    qemu_thread_create(&mis->fault_thread, "postcopy/fault",
+                       postcopy_ram_fault_thread, mis, QEMU_THREAD_JOINABLE);
+    qemu_sem_wait(&mis->fault_thread_sem);
+    qemu_sem_destroy(&mis->fault_thread_sem);
+
+    /* Mark so that we get notified of accesses to unwritten areas */
+    if (qemu_ram_foreach_block(ram_block_enable_notify, mis)) {
+        return -1;
+    }
+
+    return 0;
+}
+
 #else
 /* No target OS support, stubs just fail */
-
 bool postcopy_ram_supported_by_host(void)
 {
     error_report("%s: No OS support", __func__);
@@ -306,6 +368,11 @@ int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
 {
     assert(0);
 }
+
+int postcopy_ram_enable_notify(MigrationIncomingState *mis)
+{
+    assert(0);
+}
 #endif
 
 /* ------------------------------------------------------------------------- */
diff --git a/savevm.c b/savevm.c
index c383ce0..f606ce8 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1271,6 +1271,15 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
         return -1;
     }
 
+    /*
+     * Sensitise RAM - can now generate requests for blocks that don't exist
+     * However, at this point the CPU shouldn't be running, and the IO
+     * shouldn't be doing anything yet so don't actually expect requests
+     */
+    if (postcopy_ram_enable_notify(mis)) {
+        return -1;
+    }
+
     /* TODO start up the postcopy listening thread */
     return 0;
 }
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 32/47] Postcopy: Postcopy startup in migration thread
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (30 preceding siblings ...)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 31/47] postcopy: ram_enable_notify to switch on userfault Dr. David Alan Gilbert (git)
@ 2015-04-14 17:03 ` Dr. David Alan Gilbert (git)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 33/47] Postcopy end in migration_thread Dr. David Alan Gilbert (git)
                   ` (15 subsequent siblings)
  47 siblings, 0 replies; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Rework the migration thread to setup and start postcopy.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |   3 +
 migration/migration.c         | 163 ++++++++++++++++++++++++++++++++++++++++--
 trace-events                  |   4 ++
 3 files changed, 165 insertions(+), 5 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 36451de..c02266e 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -119,6 +119,9 @@ struct MigrationState
     /* Flag set once the migration has been asked to enter postcopy */
     bool start_postcopy;
 
+    /* Flag set once the migration thread is running (and needs joining) */
+    bool started_migration_thread;
+
     /* bitmap of pages that have been sent at least once
      * only maintained and used in postcopy at the moment
      * where it's used to send the dirtymap at the start
diff --git a/migration/migration.c b/migration/migration.c
index 63205c3..611aca8 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -469,7 +469,10 @@ static void migrate_fd_cleanup(void *opaque)
     if (s->file) {
         trace_migrate_fd_cleanup();
         qemu_mutex_unlock_iothread();
-        qemu_thread_join(&s->thread);
+        if (s->started_migration_thread) {
+            qemu_thread_join(&s->thread);
+            s->started_migration_thread = false;
+        }
         qemu_mutex_lock_iothread();
 
         qemu_fclose(s->file);
@@ -886,7 +889,6 @@ out:
     return NULL;
 }
 
-__attribute__ (( unused )) /* Until later in patch series */
 static int open_return_path_on_source(MigrationState *ms)
 {
 
@@ -925,23 +927,141 @@ static int await_return_path_close_on_source(MigrationState *ms)
 }
 
 /*
+ * Switch from normal iteration to postcopy
+ * Returns non-0 on error
+ */
+static int postcopy_start(MigrationState *ms, bool *old_vm_running)
+{
+    int ret;
+    const QEMUSizedBuffer *qsb;
+    int64_t time_at_stop = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+    migrate_set_state(ms, MIGRATION_STATUS_ACTIVE,
+                      MIGRATION_STATUS_POSTCOPY_ACTIVE);
+
+    trace_postcopy_start();
+    qemu_mutex_lock_iothread();
+    trace_postcopy_start_set_run();
+
+    qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
+    *old_vm_running = runstate_is_running();
+
+    ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
+
+    if (ret < 0) {
+        goto fail;
+    }
+
+    /*
+     * in Finish migrate and with the io-lock held everything should
+     * be quiet, but we've potentially still got dirty pages and we
+     * need to tell the destination to throw any pages it's already received
+     * that are dirty
+     */
+    if (ram_postcopy_send_discard_bitmap(ms)) {
+        error_report("postcopy send discard bitmap failed");
+        goto fail;
+    }
+
+    /*
+     * send rest of state - note things that are doing postcopy
+     * will notice we're in POSTCOPY_ACTIVE and not actually
+     * wrap their state up here
+     */
+    qemu_file_set_rate_limit(ms->file, INT64_MAX);
+    /* Ping just for debugging, helps line traces up */
+    qemu_savevm_send_ping(ms->file, 2);
+
+    /*
+     * We need to leave the fd free for page transfers during the
+     * loading of the device state, so wrap all the remaining
+     * commands and state into a package that gets sent in one go
+     */
+    QEMUFile *fb = qemu_bufopen("w", NULL);
+    if (!fb) {
+        error_report("Failed to create buffered file");
+        goto fail;
+    }
+
+    qemu_savevm_state_complete_precopy(fb);
+    qemu_savevm_send_ping(fb, 3);
+
+    qemu_savevm_send_postcopy_run(fb);
+
+    /* <><> end of stuff going into the package */
+    qsb = qemu_buf_get(fb);
+
+    /* Now send that blob */
+    if (qemu_savevm_send_packaged(ms->file, qsb)) {
+        goto fail_closefb;
+    }
+    qemu_fclose(fb);
+    ms->downtime =  qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - time_at_stop;
+
+    qemu_mutex_unlock_iothread();
+
+    /*
+     * Although this ping is just for debug, it could potentially be
+     * used for getting a better measurement of downtime at the source.
+     */
+    qemu_savevm_send_ping(ms->file, 4);
+
+    ret = qemu_file_get_error(ms->file);
+    if (ret) {
+        error_report("postcopy_start: Migration stream errored");
+        migrate_set_state(ms, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+                              MIGRATION_STATUS_FAILED);
+    }
+
+    return ret;
+
+fail_closefb:
+    qemu_fclose(fb);
+fail:
+    migrate_set_state(ms, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+                          MIGRATION_STATUS_FAILED);
+    qemu_mutex_unlock_iothread();
+    return -1;
+}
+
+/*
  * Master migration thread on the source VM.
  * It drives the migration and pumps the data down the outgoing channel.
  */
 static void *migration_thread(void *opaque)
 {
     MigrationState *s = opaque;
+    /* Used by the bandwidth calcs, updated later */
     int64_t initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
     int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
     int64_t initial_bytes = 0;
     int64_t max_size = 0;
     int64_t start_time = initial_time;
     bool old_vm_running = false;
+    bool entered_postcopy = false;
+    /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */
+    enum MigrationStatus current_active_type = MIGRATION_STATUS_ACTIVE;
 
     qemu_savevm_state_header(s->file);
+
+    if (migrate_postcopy_ram()) {
+        /* Now tell the dest that it should open its end so it can reply */
+        qemu_savevm_send_open_return_path(s->file);
+
+        /* And do a ping that will make stuff easier to debug */
+        qemu_savevm_send_ping(s->file, 1);
+
+        /*
+         * Tell the destination that we *might* want to do postcopy later;
+         * if the other end can't do postcopy it should fail now, nice and
+         * early.
+         */
+        qemu_savevm_send_postcopy_advise(s->file);
+    }
+
     qemu_savevm_state_begin(s->file, &s->params);
 
     s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
+    current_active_type = MIGRATION_STATUS_ACTIVE;
     migrate_set_state(s, MIGRATION_STATUS_SETUP, MIGRATION_STATUS_ACTIVE);
 
     trace_migration_thread_setup_complete();
@@ -960,6 +1080,22 @@ static void *migration_thread(void *opaque)
             trace_migrate_pending(pending_size, max_size,
                                   pend_post, pend_nonpost);
             if (pending_size && pending_size >= max_size) {
+                /* Still a significant amount to transfer */
+
+                current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+                if (migrate_postcopy_ram() &&
+                    s->state != MIGRATION_STATUS_POSTCOPY_ACTIVE &&
+                    pend_nonpost <= max_size &&
+                    atomic_read(&s->start_postcopy)) {
+
+                    if (!postcopy_start(s, &old_vm_running)) {
+                        current_active_type = MIGRATION_STATUS_POSTCOPY_ACTIVE;
+                        entered_postcopy = true;
+                    }
+
+                    continue;
+                }
+                /* Just another iteration step */
                 qemu_savevm_state_iterate(s->file);
             } else {
                 int ret;
@@ -991,8 +1127,8 @@ static void *migration_thread(void *opaque)
         }
 
         if (qemu_file_get_error(s->file)) {
-            migrate_set_state(s, MIGRATION_STATUS_ACTIVE,
-                              MIGRATION_STATUS_FAILED);
+            migrate_set_state(s, current_active_type, MIGRATION_STATUS_FAILED);
+            trace_migration_thread_file_err();
             break;
         }
         current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
@@ -1023,12 +1159,15 @@ static void *migration_thread(void *opaque)
         }
     }
 
+    trace_migration_thread_after_loop();
     qemu_mutex_lock_iothread();
     if (s->state == MIGRATION_STATUS_COMPLETED) {
         int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
         uint64_t transferred_bytes = qemu_ftell(s->file);
         s->total_time = end_time - s->total_time;
-        s->downtime = end_time - start_time;
+        if (!entered_postcopy) {
+            s->downtime = end_time - start_time;
+        }
         if (s->total_time) {
             s->mbps = (((double) transferred_bytes * 8.0) /
                        ((double) s->total_time)) / 1000;
@@ -1060,8 +1199,22 @@ void migrate_fd_connect(MigrationState *s)
     /* Notify before starting migration thread */
     notifier_list_notify(&migration_state_notifiers, s);
 
+    /* Open the return path; currently for postcopy but other things might
+     * also want it.
+     */
+    if (migrate_postcopy_ram()) {
+        if (open_return_path_on_source(s)) {
+            error_report("Unable to open return-path for postcopy");
+            migrate_set_state(s, MIGRATION_STATUS_SETUP,
+                              MIGRATION_STATUS_FAILED);
+            migrate_fd_cleanup(s);
+            return;
+        }
+    }
+
     qemu_thread_create(&s->thread, "migration", migration_thread, s,
                        QEMU_THREAD_JOINABLE);
+    s->started_migration_thread = true;
 }
 
 PostcopyState  postcopy_state_get(MigrationIncomingState *mis)
diff --git a/trace-events b/trace-events
index b2099ee..efee724 100644
--- a/trace-events
+++ b/trace-events
@@ -1406,9 +1406,13 @@ migrate_fd_error(void) ""
 migrate_fd_cancel(void) ""
 migrate_pending(uint64_t size, uint64_t max, uint64_t post, uint64_t nonpost) "pending size %" PRIu64 " max %" PRIu64 " (post=%" PRIu64 " nonpost=%" PRIu64 ")"
 migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
+migration_thread_after_loop(void) ""
+migration_thread_file_err(void) ""
 migration_thread_setup_complete(void) ""
 open_return_path_on_source(void) ""
 open_return_path_on_source_continue(void) ""
+postcopy_start(void) ""
+postcopy_start_set_run(void) ""
 source_return_path_thread_bad_end(void) ""
 source_return_path_thread_end(void) ""
 source_return_path_thread_entry(void) ""
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 33/47] Postcopy end in migration_thread
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (31 preceding siblings ...)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 32/47] Postcopy: Postcopy startup in migration thread Dr. David Alan Gilbert (git)
@ 2015-04-14 17:03 ` Dr. David Alan Gilbert (git)
  2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 34/47] Page request: Add MIG_RP_MSG_REQ_PAGES reverse command Dr. David Alan Gilbert (git)
                   ` (14 subsequent siblings)
  47 siblings, 0 replies; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The end of migration in postcopy is a bit different since some of
the things normally done at the end of migration have already been
done on the transition to postcopy.

The end of migration code is getting a bit complciated now, so
move out into its own function.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/migration.c | 91 +++++++++++++++++++++++++++++++++++++--------------
 trace-events          |  6 ++++
 2 files changed, 72 insertions(+), 25 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 611aca8..cf26d0d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -906,7 +906,6 @@ static int open_return_path_on_source(MigrationState *ms)
     return 0;
 }
 
-__attribute__ (( unused )) /* Until later in patch series */
 /* Returns 0 if the RP was ok, otherwise there was an error on the RP */
 static int await_return_path_close_on_source(MigrationState *ms)
 {
@@ -1024,6 +1023,68 @@ fail:
 }
 
 /*
+ * Used by migration_thread when there's not much left pending.
+ * The caller 'breaks' the loop when this returns.
+ */
+static void migration_thread_end_of_iteration(MigrationState *s,
+                                              int current_active_state,
+                                              bool *old_vm_running,
+                                              int64_t *start_time)
+{
+    int ret;
+    if (s->state == MIGRATION_STATUS_ACTIVE) {
+        qemu_mutex_lock_iothread();
+        *start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+        qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
+        *old_vm_running = runstate_is_running();
+
+        ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
+        if (ret >= 0) {
+            qemu_file_set_rate_limit(s->file, INT64_MAX);
+            qemu_savevm_state_complete_precopy(s->file);
+        }
+        qemu_mutex_unlock_iothread();
+
+        if (ret < 0) {
+            goto fail;
+        }
+    } else if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        trace_migration_thread_end_of_iteration_postcopy_end();
+
+        qemu_savevm_state_complete_postcopy(s->file);
+        trace_migration_thread_end_of_iteration_postcopy_end_after_complete();
+    }
+
+    /*
+     * If rp was opened we must clean up the thread before
+     * cleaning everything else up (since if there are no failures
+     * it will wait for the destination to send it's status in
+     * a SHUT command).
+     * Postcopy opens rp if enabled (even if it's not avtivated)
+     */
+    if (migrate_postcopy_ram()) {
+        int rp_error;
+        trace_migration_thread_end_of_iteration_postcopy_end_before_rp();
+        rp_error = await_return_path_close_on_source(s);
+        trace_migration_thread_end_of_iteration_postcopy_end_after_rp(rp_error);
+        if (rp_error) {
+            goto fail;
+        }
+    }
+
+    if (qemu_file_get_error(s->file)) {
+        trace_migration_thread_end_of_iteration_file_err();
+        goto fail;
+    }
+
+    migrate_set_state(s, current_active_state, MIGRATION_STATUS_COMPLETED);
+    return;
+
+fail:
+    migrate_set_state(s, current_active_state, MIGRATION_STATUS_FAILED);
+}
+
+/*
  * Master migration thread on the source VM.
  * It drives the migration and pumps the data down the outgoing channel.
  */
@@ -1098,31 +1159,11 @@ static void *migration_thread(void *opaque)
                 /* Just another iteration step */
                 qemu_savevm_state_iterate(s->file);
             } else {
-                int ret;
-
-                qemu_mutex_lock_iothread();
-                start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
-                qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
-                old_vm_running = runstate_is_running();
-
-                ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
-                if (ret >= 0) {
-                    qemu_file_set_rate_limit(s->file, INT64_MAX);
-                    qemu_savevm_state_complete_precopy(s->file);
-                }
-                qemu_mutex_unlock_iothread();
+                trace_migration_thread_low_pending(pending_size);
 
-                if (ret < 0) {
-                    migrate_set_state(s, MIGRATION_STATUS_ACTIVE,
-                                      MIGRATION_STATUS_FAILED);
-                    break;
-                }
-
-                if (!qemu_file_get_error(s->file)) {
-                    migrate_set_state(s, MIGRATION_STATUS_ACTIVE,
-                                      MIGRATION_STATUS_COMPLETED);
-                    break;
-                }
+                migration_thread_end_of_iteration(s, current_active_type,
+                    &old_vm_running, &start_time);
+                break;
             }
         }
 
diff --git a/trace-events b/trace-events
index efee724..a83eec2 100644
--- a/trace-events
+++ b/trace-events
@@ -1409,6 +1409,12 @@ migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
 migration_thread_after_loop(void) ""
 migration_thread_file_err(void) ""
 migration_thread_setup_complete(void) ""
+migration_thread_low_pending(uint64_t pending) "%" PRIu64
+migration_thread_end_of_iteration_file_err(void) ""
+migration_thread_end_of_iteration_postcopy_end(void) ""
+migration_thread_end_of_iteration_postcopy_end_after_complete(void) ""
+migration_thread_end_of_iteration_postcopy_end_before_rp(void) ""
+migration_thread_end_of_iteration_postcopy_end_after_rp(int rp_error) "%d"
 open_return_path_on_source(void) ""
 open_return_path_on_source_continue(void) ""
 postcopy_start(void) ""
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 34/47] Page request: Add MIG_RP_MSG_REQ_PAGES reverse command
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (32 preceding siblings ...)
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 33/47] Postcopy end in migration_thread Dr. David Alan Gilbert (git)
@ 2015-04-14 17:04 ` Dr. David Alan Gilbert (git)
  2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 35/47] Page request: Process incoming page request Dr. David Alan Gilbert (git)
                   ` (13 subsequent siblings)
  47 siblings, 0 replies; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add MIG_RP_MSG_REQ_PAGES command on Return path for the postcopy
destination to request a page from the source.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  4 +++
 migration/migration.c         | 70 +++++++++++++++++++++++++++++++++++++++++++
 trace-events                  |  1 +
 3 files changed, 75 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index c02266e..37bd54a 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -46,6 +46,8 @@ enum mig_rp_message_type {
     MIG_RP_MSG_INVALID = 0,  /* Must be 0 */
     MIG_RP_MSG_SHUT,         /* sibling will not send any more RP messages */
     MIG_RP_MSG_PONG,         /* Response to a PING; data (seq: be32 ) */
+
+    MIG_RP_MSG_REQ_PAGES,    /* data (start: be64, len: be64) */
 };
 
 typedef QLIST_HEAD(, LoadStateEntry) LoadStateEntry_Head;
@@ -236,6 +238,8 @@ void migrate_send_rp_shut(MigrationIncomingState *mis,
                           uint32_t value);
 void migrate_send_rp_pong(MigrationIncomingState *mis,
                           uint32_t value);
+void migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* rbname,
+                              ram_addr_t start, ram_addr_t len);
 
 void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
diff --git a/migration/migration.c b/migration/migration.c
index cf26d0d..41f377c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -99,6 +99,36 @@ static void deferred_incoming_migration(Error **errp)
     deferred_incoming = true;
 }
 
+/* Request a range of pages from the source VM at the given
+ * start address.
+ *   rbname: Name of the RAMBlock to request the page in, if NULL it's the same
+ *           as the last request (a name must have been given previously)
+ *   Start: Address offset within the RB
+ *   Len: Length in bytes required - must be a multiple of pagesize
+ */
+void migrate_send_rp_req_pages(MigrationIncomingState *mis, const char *rbname,
+                               ram_addr_t start, ram_addr_t len)
+{
+    uint8_t bufc[16+1+255]; /* start (8 byte), len (8 byte), rbname upto 256 */
+    uint64_t *buf64 = (uint64_t *)bufc;
+    size_t msglen = 16; /* start + len */
+
+    assert(!(len & 1));
+    if (rbname) {
+        int rbname_len = strlen(rbname);
+        assert(rbname_len < 256);
+
+        len |= 1; /* Flag to say we've got a name */
+        bufc[msglen++] = rbname_len;
+        memcpy(bufc + msglen, rbname, rbname_len);
+        msglen += rbname_len;
+    }
+
+    buf64[0] = cpu_to_be64((uint64_t)start);
+    buf64[1] = cpu_to_be64((uint64_t)len);
+    migrate_send_rp_message(mis, MIG_RP_MSG_REQ_PAGES, msglen, bufc);
+}
+
 void qemu_start_incoming_migration(const char *uri, Error **errp)
 {
     const char *p;
@@ -804,6 +834,17 @@ static void source_return_path_bad(MigrationState *s)
 }
 
 /*
+ * Process a request for pages received on the return path,
+ * We're allowed to send more than requested (e.g. to round to our page size)
+ * and we don't need to send pages that have already been sent.
+ */
+static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
+                                       ram_addr_t start, ram_addr_t len)
+{
+    trace_migrate_handle_rp_req_pages(rbname, start, len);
+}
+
+/*
  * Handles messages sent on the return path towards the source VM
  *
  */
@@ -815,6 +856,8 @@ static void *source_return_path_thread(void *opaque)
     const int max_len = 512;
     uint8_t buf[max_len];
     uint32_t tmp32;
+    ram_addr_t start, len;
+    char *tmpstr;
     int res;
 
     trace_source_return_path_thread_entry();
@@ -830,6 +873,11 @@ static void *source_return_path_thread(void *opaque)
             expected_len = 4;
             break;
 
+        case MIG_RP_MSG_REQ_PAGES:
+            /* 16 byte start/len _possibly_ plus an id str */
+            expected_len = 16 + 256;
+            break;
+
         default:
             error_report("RP: Received invalid message 0x%04x length 0x%04x",
                     header_type, header_len);
@@ -875,6 +923,28 @@ static void *source_return_path_thread(void *opaque)
             trace_source_return_path_thread_pong(tmp32);
             break;
 
+        case MIG_RP_MSG_REQ_PAGES:
+            start = be64_to_cpup((uint64_t *)buf);
+            len = be64_to_cpup(((uint64_t *)buf)+1);
+            tmpstr = NULL;
+            if (len & 1) {
+                len -= 1; /* Remove the flag */
+                /* Now we expect an idstr */
+                tmp32 = buf[16]; /* Length of the following idstr */
+                tmpstr = (char *)&buf[17];
+                buf[17+tmp32] = '\0';
+                expected_len = 16+1+tmp32;
+            } else {
+                expected_len = 16;
+            }
+            if (header_len != expected_len) {
+                error_report("RP: Req_Page with length %d expecting %d",
+                        header_len, expected_len);
+                source_return_path_bad(ms);
+            }
+            migrate_handle_rp_req_pages(ms, tmpstr, start, len);
+            break;
+
         default:
             break;
         }
diff --git a/trace-events b/trace-events
index a83eec2..8379c1f 100644
--- a/trace-events
+++ b/trace-events
@@ -1406,6 +1406,7 @@ migrate_fd_error(void) ""
 migrate_fd_cancel(void) ""
 migrate_pending(uint64_t size, uint64_t max, uint64_t post, uint64_t nonpost) "pending size %" PRIu64 " max %" PRIu64 " (post=%" PRIu64 " nonpost=%" PRIu64 ")"
 migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
+migrate_handle_rp_req_pages(const char *rbname, size_t start, size_t len) "in %s at %zx len %zx"
 migration_thread_after_loop(void) ""
 migration_thread_file_err(void) ""
 migration_thread_setup_complete(void) ""
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 35/47] Page request: Process incoming page request
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (33 preceding siblings ...)
  2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 34/47] Page request: Add MIG_RP_MSG_REQ_PAGES reverse command Dr. David Alan Gilbert (git)
@ 2015-04-14 17:04 ` Dr. David Alan Gilbert (git)
  2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 36/47] Page request: Consume pages off the post-copy queue Dr. David Alan Gilbert (git)
                   ` (12 subsequent siblings)
  47 siblings, 0 replies; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

On receiving MIG_RPCOMM_REQ_PAGES look up the address and
queue the page.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c                   | 64 ++++++++++++++++++++++++++++++++++++++++++-
 include/exec/cpu-all.h        |  2 --
 include/migration/migration.h | 21 ++++++++++++++
 include/qemu/typedefs.h       |  1 +
 migration/migration.c         | 31 +++++++++++++++++++++
 trace-events                  |  1 +
 6 files changed, 117 insertions(+), 3 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 2c937d1..48403f3 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -715,7 +715,69 @@ static int ram_save_page(QEMUFile *f, RAMBlock* block, ram_addr_t offset,
     return pages;
 }
 
-/**
+/*
+ * Queue the pages for transmission, e.g. a request from postcopy destination
+ *   ms: MigrationStatus in which the queue is held
+ *   rbname: The RAMBlock the request is for - may be NULL (to mean reuse last)
+ *   start: Offset from the start of the RAMBlock
+ *   len: Length (in bytes) to send
+ *   Return: 0 on success
+ */
+int ram_save_queue_pages(MigrationState *ms, const char *rbname,
+                         ram_addr_t start, ram_addr_t len)
+{
+    RAMBlock *ramblock;
+
+    rcu_read_lock();
+    if (!rbname) {
+        /* Reuse last RAMBlock */
+        ramblock = ms->last_req_rb;
+
+        if (!ramblock) {
+            /*
+             * Shouldn't happen, we can't reuse the last RAMBlock if
+             * it's the 1st request.
+             */
+            error_report("ram_save_queue_pages no previous block");
+            goto err;
+        }
+    } else {
+        ramblock = ram_find_block(rbname);
+
+        if (!ramblock) {
+            /* We shouldn't be asked for a non-existent RAMBlock */
+            error_report("ram_save_queue_pages no block '%s'", rbname);
+            goto err;
+        }
+    }
+    trace_ram_save_queue_pages(ramblock->idstr, start, len);
+    if (start+len > ramblock->used_length) {
+        error_report("%s request overrun start=%zx len=%zx blocklen=%zx",
+                     __func__, start, len, ramblock->used_length);
+        goto err;
+    }
+
+    struct MigrationSrcPageRequest *new_entry =
+        g_malloc0(sizeof(struct MigrationSrcPageRequest));
+    new_entry->rb = ramblock;
+    new_entry->offset = start;
+    new_entry->len = len;
+    ms->last_req_rb = ramblock;
+
+    qemu_mutex_lock(&ms->src_page_req_mutex);
+    memory_region_ref(ramblock->mr);
+    QSIMPLEQ_INSERT_TAIL(&ms->src_page_requests, new_entry, next_req);
+    qemu_mutex_unlock(&ms->src_page_req_mutex);
+    rcu_read_unlock();
+
+    return 0;
+
+err:
+    rcu_read_unlock();
+    return -1;
+}
+
+/*
  * ram_find_and_save_block: Finds a dirty page and sends it to f
  *
  * Called within an RCU critical section.
diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index ac06c67..1f336e6 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -266,8 +266,6 @@ CPUArchState *cpu_copy(CPUArchState *env);
 
 /* memory API */
 
-typedef struct RAMBlock RAMBlock;
-
 struct RAMBlock {
     struct rcu_head rcu;
     struct MemoryRegion *mr;
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 37bd54a..75c3299 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -88,6 +88,18 @@ MigrationIncomingState *migration_incoming_get_current(void);
 MigrationIncomingState *migration_incoming_state_new(QEMUFile *f);
 void migration_incoming_state_destroy(void);
 
+/*
+ * An outstanding page request, on the source, having been received
+ * and queued
+ */
+struct MigrationSrcPageRequest {
+    RAMBlock *rb;
+    hwaddr    offset;
+    hwaddr    len;
+
+    QSIMPLEQ_ENTRY(MigrationSrcPageRequest) next_req;
+};
+
 struct MigrationState
 {
     int64_t bandwidth_limit;
@@ -130,6 +142,12 @@ struct MigrationState
      * of the postcopy phase
      */
     unsigned long *sentmap;
+
+    /* Queue of outstanding page requests from the destination */
+    QemuMutex src_page_req_mutex;
+    QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest) src_page_requests;
+    /* The RAMBlock used in the last src_page_request */
+    RAMBlock *last_req_rb;
 };
 
 void process_incoming_migration(QEMUFile *f);
@@ -259,6 +277,9 @@ size_t ram_control_save_page(QEMUFile *f, ram_addr_t block_offset,
                              ram_addr_t offset, size_t size,
                              uint64_t *bytes_sent);
 
+int ram_save_queue_pages(MigrationState *ms, const char *rbname,
+                         ram_addr_t start, ram_addr_t len);
+
 PostcopyState postcopy_state_get(MigrationIncomingState *mis);
 
 /* Set the state and return the old state */
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 5f130fe..61b5b46 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -72,6 +72,7 @@ typedef struct QEMUSGList QEMUSGList;
 typedef struct QEMUSizedBuffer QEMUSizedBuffer;
 typedef struct QEMUTimerListGroup QEMUTimerListGroup;
 typedef struct QEMUTimer QEMUTimer;
+typedef struct RAMBlock RAMBlock;
 typedef struct Range Range;
 typedef struct SerialState SerialState;
 typedef struct SHPCDevice SHPCDevice;
diff --git a/migration/migration.c b/migration/migration.c
index 41f377c..2509798 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -26,6 +26,8 @@
 #include "qemu/thread.h"
 #include "qmp-commands.h"
 #include "trace.h"
+#include "exec/memory.h"
+#include "exec/address-spaces.h"
 
 #define MAX_THROTTLE  (32 << 20)      /* Migration speed throttling */
 
@@ -496,6 +498,15 @@ static void migrate_fd_cleanup(void *opaque)
 
     migrate_fd_cleanup_src_rp(s);
 
+    /* This queue generally should be empty - but in the case of a failed
+     * migration might have some droppings in.
+     */
+    struct MigrationSrcPageRequest *mspr, *next_mspr;
+    QSIMPLEQ_FOREACH_SAFE(mspr, &s->src_page_requests, next_req, next_mspr) {
+        QSIMPLEQ_REMOVE_HEAD(&s->src_page_requests, next_req);
+        g_free(mspr);
+    }
+
     if (s->file) {
         trace_migrate_fd_cleanup();
         qemu_mutex_unlock_iothread();
@@ -614,6 +625,9 @@ MigrationState *migrate_init(const MigrationParams *params)
     s->state = MIGRATION_STATUS_SETUP;
     trace_migrate_set_state(MIGRATION_STATUS_SETUP);
 
+    qemu_mutex_init(&s->src_page_req_mutex);
+    QSIMPLEQ_INIT(&s->src_page_requests);
+
     s->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
     return s;
 }
@@ -842,6 +856,23 @@ static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
                                        ram_addr_t start, ram_addr_t len)
 {
     trace_migrate_handle_rp_req_pages(rbname, start, len);
+
+    /* Round everything up to our host page size */
+    long our_host_ps = getpagesize();
+    if (start & (our_host_ps-1)) {
+        long roundings = start & (our_host_ps-1);
+        start -= roundings;
+        len += roundings;
+    }
+    if (len & (our_host_ps-1)) {
+        long roundings = len & (our_host_ps-1);
+        len -= roundings;
+        len += our_host_ps;
+    }
+
+    if (ram_save_queue_pages(ms, rbname, start, len)) {
+        source_return_path_bad(ms);
+    }
 }
 
 /*
diff --git a/trace-events b/trace-events
index 8379c1f..3792d2e 100644
--- a/trace-events
+++ b/trace-events
@@ -1220,6 +1220,7 @@ migration_bitmap_sync_start(void) ""
 migration_bitmap_sync_end(uint64_t dirty_pages) "dirty_pages %" PRIu64""
 migration_throttle(void) ""
 ram_postcopy_send_discard_bitmap(void) ""
+ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: %zx len: %zx"
 
 # hw/display/qxl.c
 disable qxl_interface_set_mm_time(int qid, uint32_t mm_time) "%d %d"
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 36/47] Page request: Consume pages off the post-copy queue
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (34 preceding siblings ...)
  2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 35/47] Page request: Process incoming page request Dr. David Alan Gilbert (git)
@ 2015-04-14 17:04 ` Dr. David Alan Gilbert (git)
  2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 37/47] postcopy_ram.c: place_page and helpers Dr. David Alan Gilbert (git)
                   ` (11 subsequent siblings)
  47 siblings, 0 replies; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

When transmitting RAM pages, consume pages that have been queued by
MIG_RPCOMM_REQPAGE commands and send them ahead of normal page scanning.

Note:
  a) After a queued page the linear walk carries on from after the
unqueued page; there is a reasonable chance that the destination
was about to ask for other closeby pages anyway.

  b) We have to be careful of any assumptions that the page walking
code makes, in particular it does some short cuts on its first linear
walk that break as soon as we do a queued page.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c  | 156 +++++++++++++++++++++++++++++++++++++++++++++++++----------
 trace-events |   2 +
 2 files changed, 132 insertions(+), 26 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 48403f3..c96c4c1 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -312,6 +312,7 @@ static RAMBlock *last_seen_block;
 /* This is the last block from where we have sent data */
 static RAMBlock *last_sent_block;
 static ram_addr_t last_offset;
+static bool last_was_from_queue;
 static unsigned long *migration_bitmap;
 static uint64_t migration_dirty_pages;
 static uint32_t last_version;
@@ -490,6 +491,19 @@ static inline bool migration_bitmap_set_dirty(ram_addr_t addr)
     return ret;
 }
 
+static inline bool migration_bitmap_clear_dirty(ram_addr_t addr)
+{
+    bool ret;
+    int nr = addr >> TARGET_PAGE_BITS;
+
+    ret = test_and_clear_bit(nr, migration_bitmap);
+
+    if (ret) {
+        migration_dirty_pages--;
+    }
+    return ret;
+}
+
 static void migration_bitmap_sync_range(ram_addr_t start, ram_addr_t length)
 {
     ram_addr_t addr;
@@ -716,6 +730,40 @@ static int ram_save_page(QEMUFile *f, RAMBlock* block, ram_addr_t offset,
 }
 
 /*
+ * Unqueue a page from the queue fed by postcopy page requests
+ *
+ * Returns:      The RAMBlock* to transmit from (or NULL if the queue is empty)
+ *      ms:      MigrationState in
+ *  offset:      the byte offset within the RAMBlock for the start of the page
+ * ram_addr_abs: global offset in the dirty/sent bitmaps
+ */
+static RAMBlock *ram_save_unqueue_page(MigrationState *ms, ram_addr_t *offset,
+                                       ram_addr_t *ram_addr_abs)
+{
+    RAMBlock *result = NULL;
+    qemu_mutex_lock(&ms->src_page_req_mutex);
+    if (!QSIMPLEQ_EMPTY(&ms->src_page_requests)) {
+        struct MigrationSrcPageRequest *entry =
+                                    QSIMPLEQ_FIRST(&ms->src_page_requests);
+        result = entry->rb;
+        *offset = entry->offset;
+        *ram_addr_abs = (entry->offset + entry->rb->offset) & TARGET_PAGE_MASK;
+
+        if (entry->len > TARGET_PAGE_SIZE) {
+            entry->len -= TARGET_PAGE_SIZE;
+            entry->offset += TARGET_PAGE_SIZE;
+        } else {
+            memory_region_unref(result->mr);
+            QSIMPLEQ_REMOVE_HEAD(&ms->src_page_requests, next_req);
+            g_free(entry);
+        }
+    }
+    qemu_mutex_unlock(&ms->src_page_req_mutex);
+
+    return result;
+}
+
+/*
  * Queue the pages for transmission, e.g. a request from postcopy destination
  *   ms: MigrationStatus in which the queue is held
  *   rbname: The RAMBlock the request is for - may be NULL (to mean reuse last)
@@ -793,47 +841,102 @@ err:
 static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
                                    uint64_t *bytes_transferred)
 {
+    MigrationState *ms = migrate_get_current();
     RAMBlock *block = last_seen_block;
+    RAMBlock *tmpblock;
     ram_addr_t offset = last_offset;
+    ram_addr_t tmpoffset;
     bool complete_round = false;
     int pages = 0;
-    MemoryRegion *mr;
     ram_addr_t dirty_ram_abs; /* Address of the start of the dirty page in
                                  ram_addr_t space */
+    unsigned long hps = sysconf(_SC_PAGESIZE);
 
-    if (!block)
+    if (!block) {
         block = QLIST_FIRST_RCU(&ram_list.blocks);
+        last_was_from_queue = false;
+    }
 
-    while (true) {
-        mr = block->mr;
-        offset = migration_bitmap_find_and_reset_dirty(mr, offset,
-                                                       &dirty_ram_abs);
-        if (complete_round && block == last_seen_block &&
-            offset >= last_offset) {
-            break;
+    while (true) { /* Until we send a block or run out of stuff to send */
+        tmpblock = NULL;
+
+        /*
+         * Don't break host-page chunks up with queue items
+         * so only unqueue if,
+         *   a) The last item came from the queue anyway
+         *   b) The last sent item was the last target-page in a host page
+         */
+        if (last_was_from_queue || !last_sent_block ||
+            ((last_offset & (hps - 1)) == (hps - TARGET_PAGE_SIZE))) {
+            tmpblock = ram_save_unqueue_page(ms, &tmpoffset, &dirty_ram_abs);
         }
-        if (offset >= block->used_length) {
-            offset = 0;
-            block = QLIST_NEXT_RCU(block, next);
-            if (!block) {
-                block = QLIST_FIRST_RCU(&ram_list.blocks);
-                complete_round = true;
-                ram_bulk_stage = false;
+
+        if (tmpblock) {
+            /* We've got a block from the postcopy queue */
+            trace_ram_find_and_save_block_postcopy(tmpblock->idstr,
+                                                   (uint64_t)tmpoffset,
+                                                   (uint64_t)dirty_ram_abs);
+            /*
+             * We're sending this page, and since it's postcopy nothing else
+             * will dirty it, and we must make sure it doesn't get sent again.
+             */
+            if (!migration_bitmap_clear_dirty(dirty_ram_abs)) {
+                trace_ram_find_and_save_block_postcopy_not_dirty(
+                    tmpblock->idstr, (uint64_t)tmpoffset,
+                    (uint64_t)dirty_ram_abs,
+                    test_bit(dirty_ram_abs >> TARGET_PAGE_BITS, ms->sentmap));
+
+                continue;
             }
+            /*
+             * As soon as we start servicing pages out of order, then we have
+             * to kill the bulk stage, since the bulk stage assumes
+             * in (migration_bitmap_find_and_reset_dirty) that every page is
+             * dirty, that's no longer true.
+             */
+            ram_bulk_stage = false;
+            /*
+             * We mustn't change block/offset unless it's to a valid one
+             * otherwise we can go down some of the exit cases in the normal
+             * path.
+             */
+            block = tmpblock;
+            offset = tmpoffset;
+            last_was_from_queue = true;
         } else {
-            pages = ram_save_page(f, block, offset, last_stage,
-                                  bytes_transferred);
-
-            /* if page is unmodified, continue to the next */
-            if (pages > 0) {
-                MigrationState *ms = migrate_get_current();
-                if (ms->sentmap) {
-                    set_bit(dirty_ram_abs >> TARGET_PAGE_BITS, ms->sentmap);
+            MemoryRegion *mr;
+            /* priority queue empty, so just search for something dirty */
+            mr = block->mr;
+            offset = migration_bitmap_find_and_reset_dirty(mr, offset,
+                                                           &dirty_ram_abs);
+            if (complete_round && block == last_seen_block &&
+                offset >= last_offset) {
+                break;
+            }
+            if (offset >= block->used_length) {
+                offset = 0;
+                block = QLIST_NEXT_RCU(block, next);
+                if (!block) {
+                    block = QLIST_FIRST_RCU(&ram_list.blocks);
+                    complete_round = true;
+                    ram_bulk_stage = false;
                 }
+                continue; /* pick an offset in the new block */
+            }
+            last_was_from_queue = false;
+        }
 
-                last_sent_block = block;
-                break;
+        /* We have a page to send, so send it */
+        pages = ram_save_page(f, block, offset, last_stage,
+                              bytes_transferred);
+
+        /* if page is unmodified, continue to the next */
+        if (pages > 0) {
+            if (ms->sentmap) {
+                set_bit(dirty_ram_abs >> TARGET_PAGE_BITS, ms->sentmap);
             }
+
+            break;
         }
     }
 
@@ -929,6 +1032,7 @@ static void reset_ram_globals(void)
     last_offset = 0;
     last_version = ram_list.version;
     ram_bulk_stage = true;
+    last_was_from_queue = false;
 }
 
 #define MAX_WAIT 50 /* ms, half buffered_file limit */
diff --git a/trace-events b/trace-events
index 3792d2e..c4b92ce 100644
--- a/trace-events
+++ b/trace-events
@@ -1219,6 +1219,8 @@ qemu_file_fclose(void) ""
 migration_bitmap_sync_start(void) ""
 migration_bitmap_sync_end(uint64_t dirty_pages) "dirty_pages %" PRIu64""
 migration_throttle(void) ""
+ram_find_and_save_block_postcopy(const char *block_name, uint64_t tmp_offset, uint64_t ram_addr) "%s/%" PRIx64 " ram_addr=%" PRIx64
+ram_find_and_save_block_postcopy_not_dirty(const char *block_name, uint64_t tmp_offset, uint64_t ram_addr, int sent) "%s/%" PRIx64 " ram_addr=%" PRIx64 " (sent=%d)"
 ram_postcopy_send_discard_bitmap(void) ""
 ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: %zx len: %zx"
 
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 37/47] postcopy_ram.c: place_page and helpers
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (35 preceding siblings ...)
  2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 36/47] Page request: Consume pages off the post-copy queue Dr. David Alan Gilbert (git)
@ 2015-04-14 17:04 ` Dr. David Alan Gilbert (git)
  2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 38/47] Postcopy: Use helpers to map pages during migration Dr. David Alan Gilbert (git)
                   ` (10 subsequent siblings)
  47 siblings, 0 replies; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

postcopy_place_page (etc) provide a way for postcopy to place a page
into guests memory atomically (using the copy ioctl on the ufd).

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h    |  1 +
 include/migration/postcopy-ram.h | 16 ++++++++
 migration/postcopy-ram.c         | 87 ++++++++++++++++++++++++++++++++++++++++
 trace-events                     |  1 +
 4 files changed, 105 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 75c3299..db06fd2 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -82,6 +82,7 @@ struct MigrationIncomingState {
     int            userfault_fd;
     QEMUFile *return_path;
     QemuMutex      rp_mutex;    /* We send replies from multiple threads */
+    void          *postcopy_tmp_page;
 };
 
 MigrationIncomingState *migration_incoming_get_current(void);
diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index 88793b3..5d8ec61 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -69,4 +69,20 @@ void postcopy_discard_send_range(MigrationState *ms, PostcopyDiscardState *pds,
 void postcopy_discard_send_finish(MigrationState *ms,
                                   PostcopyDiscardState *pds);
 
+/*
+ * Place a page (from) at (host) efficiently
+ *    There are restrictions on how 'from' must be mapped, in general best
+ *    to use other postcopy_ routines to allocate.
+ * returns 0 on success
+ */
+int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
+                        bool all_zero);
+
+/*
+ * Allocate a page of memory that can be mapped at a later point in time
+ * using postcopy_place_page
+ * Returns: Pointer to allocated page
+ */
+void *postcopy_get_tmp_page(MigrationIncomingState *mis);
+
 #endif
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 1be3bc9..33aadbc 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -278,6 +278,10 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
         return -1;
     }
 
+    if (mis->postcopy_tmp_page) {
+        munmap(mis->postcopy_tmp_page, getpagesize());
+        mis->postcopy_tmp_page = NULL;
+    }
     return 0;
 }
 
@@ -344,6 +348,77 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
     return 0;
 }
 
+/*
+ * Place a host page (from) at (host) atomically
+ * all_zero: Hint that the page being placed is 0 throughout
+ * returns 0 on success
+ */
+int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
+                        bool all_zero)
+{
+    if (!all_zero) {
+        struct uffdio_copy copy_struct;
+
+        copy_struct.dst = (uint64_t)(uintptr_t)host;
+        copy_struct.src = (uint64_t)(uintptr_t)from;
+        copy_struct.len = getpagesize();
+        copy_struct.mode = 0;
+
+        /* copy also acks to the kernel waking the stalled thread up
+         * TODO: We can inhibit that ack and only do it if it was requested
+         * which would be slightly cheaper, but we'd have to be careful
+         * of the order of updating our page state.
+         */
+        if (ioctl(mis->userfault_fd, UFFDIO_COPY, &copy_struct)) {
+            int e = errno;
+            error_report("%s: %s copy host: %p from: %p",
+                         __func__, strerror(e), host, from);
+
+            return -e;
+        }
+    } else {
+        struct uffdio_zeropage zero_struct;
+
+        zero_struct.range.start = (uint64_t)(uintptr_t)host;
+        zero_struct.range.len = getpagesize();
+        zero_struct.mode = 0;
+
+        if (ioctl(mis->userfault_fd, UFFDIO_ZEROPAGE, &zero_struct)) {
+            int e = errno;
+            error_report("%s: %s zero host: %p from: %p",
+                         __func__, strerror(e), host, from);
+
+            return -e;
+        }
+    }
+
+    trace_postcopy_place_page(host, all_zero);
+    return 0;
+}
+
+/*
+ * Returns a target page of memory that can be mapped at a later point in time
+ * using postcopy_place_page
+ * The same address is used repeatedly, postcopy_place_page just takes the
+ * backing page away.
+ * Returns: Pointer to allocated page
+ *
+ */
+void *postcopy_get_tmp_page(MigrationIncomingState *mis)
+{
+    if (!mis->postcopy_tmp_page) {
+        mis->postcopy_tmp_page = mmap(NULL, getpagesize(),
+                             PROT_READ | PROT_WRITE, MAP_PRIVATE |
+                             MAP_ANONYMOUS, -1, 0);
+        if (!mis->postcopy_tmp_page) {
+            error_report("%s: %s", __func__, strerror(errno));
+            return NULL;
+        }
+    }
+
+    return mis->postcopy_tmp_page;
+}
+
 #else
 /* No target OS support, stubs just fail */
 bool postcopy_ram_supported_by_host(void)
@@ -373,6 +448,18 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
 {
     assert(0);
 }
+
+int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
+                        bool all_zero)
+{
+    assert(0);
+}
+
+void *postcopy_get_tmp_page(MigrationIncomingState *mis)
+{
+    assert(0);
+}
+
 #endif
 
 /* ------------------------------------------------------------------------- */
diff --git a/trace-events b/trace-events
index c4b92ce..b65740c 100644
--- a/trace-events
+++ b/trace-events
@@ -1499,6 +1499,7 @@ postcopy_discard_send_finish(const char *ramblock, int nwords, int ncmds) "%s ma
 postcopy_cleanup_area(const char *ramblock, void *host_addr, size_t offset, size_t length) "%s: %p offset=%zx length=%zx"
 postcopy_ram_discard_range(void *start, void *end) "%p,%p"
 postcopy_init_area(const char *ramblock, void *host_addr, size_t offset, size_t length) "%s: %p offset=%zx length=%zx"
+postcopy_place_page(void *host_addr, bool all_zero) "host=%p all_zero=%d"
 
 # kvm-all.c
 kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 38/47] Postcopy: Use helpers to map pages during migration
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (36 preceding siblings ...)
  2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 37/47] postcopy_ram.c: place_page and helpers Dr. David Alan Gilbert (git)
@ 2015-04-14 17:04 ` Dr. David Alan Gilbert (git)
  2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 39/47] qemu_ram_block_from_host Dr. David Alan Gilbert (git)
                   ` (9 subsequent siblings)
  47 siblings, 0 replies; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

In postcopy, the destination guest is running at the same time
as it's receiving pages; as we receive new pages we must put
them into the guests address space atomically to avoid a running
CPU accessing a partially written page.

Use the helpers in postcopy-ram.c to map these pages.

qemu_get_buffer_less_copy is used to avoid a copy out of qemu_file
in the case that postcopy is going to do a copy anyway.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c | 117 +++++++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 97 insertions(+), 20 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index c96c4c1..0d3e865 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -1476,7 +1476,17 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
 /* Must be called from within a rcu critical section.
  * Returns a pointer from within the RCU-protected ram_list.
  */
+/*
+ * Read a RAMBlock ID from the stream f, find the host address of the
+ * start of that block and add on 'offset'
+ *
+ * f: Stream to read from
+ * mis: MigrationIncomingState
+ * offset: Offset within the block
+ * flags: Page flags (mostly to see if it's a continuation of previous block)
+ */
 static inline void *host_from_stream_offset(QEMUFile *f,
+                                            MigrationIncomingState *mis,
                                             ram_addr_t offset,
                                             int flags)
 {
@@ -1489,7 +1499,6 @@ static inline void *host_from_stream_offset(QEMUFile *f,
             error_report("Ack, bad migration stream!");
             return NULL;
         }
-
         return memory_region_get_ram_ptr(block->mr) + offset;
     }
 
@@ -1534,6 +1543,16 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 {
     int flags = 0, ret = 0;
     static uint64_t seq_iter;
+    /*
+     * System is running in postcopy mode, page inserts to host memory must be
+     * atomic
+     */
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    bool postcopy_running = postcopy_state_get(mis) >=
+                            POSTCOPY_INCOMING_LISTENING;
+    void *postcopy_host_page = NULL;
+    bool postcopy_place_needed = false;
+    bool matching_page_sizes = qemu_host_page_size == TARGET_PAGE_SIZE;
 
     seq_iter++;
 
@@ -1549,13 +1568,57 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
     rcu_read_lock();
     while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
         ram_addr_t addr, total_ram_bytes;
-        void *host;
+        void *host = 0;
+        void *page_buffer = 0;
+        void *postcopy_place_source = 0;
         uint8_t ch;
+        bool all_zero = false;
 
         addr = qemu_get_be64(f);
         flags = addr & ~TARGET_PAGE_MASK;
         addr &= TARGET_PAGE_MASK;
 
+        if (flags & (RAM_SAVE_FLAG_COMPRESS | RAM_SAVE_FLAG_PAGE |
+                     RAM_SAVE_FLAG_XBZRLE)) {
+            host = host_from_stream_offset(f, mis, addr, flags);
+            if (!host) {
+                error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
+                ret = -EINVAL;
+                break;
+            }
+            if (!postcopy_running) {
+                page_buffer = host;
+            } else {
+                /*
+                 * Postcopy requires that we place whole host pages atomically.
+                 * To make it atomic, the data is read into a temporary page
+                 * that's moved into place later.
+                 * The migration protocol uses,  possibly smaller, target-pages
+                 * however the source ensures it always sends all the components
+                 * of a host page in order.
+                 */
+                if (!postcopy_host_page) {
+                    postcopy_host_page = postcopy_get_tmp_page(mis);
+                }
+                page_buffer = postcopy_host_page +
+                              ((uintptr_t)host & ~qemu_host_page_mask);
+                /* If all TP are zero then we can optimise the place */
+                if (!((uintptr_t)host & ~qemu_host_page_mask)) {
+                    all_zero = true;
+                }
+
+                /*
+                 * If it's the last part of a host page then we place the host
+                 * page
+                 */
+                postcopy_place_needed = (((uintptr_t)host + TARGET_PAGE_SIZE) &
+                                         ~qemu_host_page_mask) == 0;
+                postcopy_place_source = postcopy_host_page;
+            }
+        } else {
+            postcopy_place_needed = false;
+        }
+
         switch (flags & ~RAM_SAVE_FLAG_CONTINUE) {
         case RAM_SAVE_FLAG_MEM_SIZE:
             /* Synchronize RAM block list */
@@ -1592,30 +1655,36 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
             }
             break;
         case RAM_SAVE_FLAG_COMPRESS:
-            host = host_from_stream_offset(f, addr, flags);
-            if (!host) {
-                error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
-                ret = -EINVAL;
-                break;
-            }
             ch = qemu_get_byte(f);
-            ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
+            if (!postcopy_running) {
+                ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
+            } else {
+                memset(page_buffer, ch, TARGET_PAGE_SIZE);
+                if (ch) {
+                    all_zero = false;
+                }
+            }
             break;
+
         case RAM_SAVE_FLAG_PAGE:
-            host = host_from_stream_offset(f, addr, flags);
-            if (!host) {
-                error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
-                ret = -EINVAL;
-                break;
+            all_zero = false;
+            if (!postcopy_place_needed || !matching_page_sizes) {
+                qemu_get_buffer(f, page_buffer, TARGET_PAGE_SIZE);
+            } else {
+                /* Avoids the qemu_file copy during postcopy, which is
+                 * going to do a copy later; can only do it when we
+                 * do this read in one go (matching page sizes)
+                 */
+                qemu_get_buffer_less_copy(f, (uint8_t **)&postcopy_place_source,
+                                          TARGET_PAGE_SIZE);
             }
-            qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
             break;
+
         case RAM_SAVE_FLAG_XBZRLE:
-            host = host_from_stream_offset(f, addr, flags);
-            if (!host) {
-                error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
-                ret = -EINVAL;
-                break;
+            all_zero = false;
+            if (postcopy_running) {
+                error_report("XBZRLE RAM block in postcopy mode @%zx\n", addr);
+                return -EINVAL;
             }
             if (load_xbzrle(f, addr, host) < 0) {
                 error_report("Failed to decompress XBZRLE page at "
@@ -1636,6 +1705,14 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
                 ret = -EINVAL;
             }
         }
+
+        if (postcopy_place_needed) {
+            /* This gets called at the last target page in the host page */
+            ret = postcopy_place_page(mis, host + TARGET_PAGE_SIZE -
+                                           qemu_host_page_size,
+                                      postcopy_place_source,
+                                      all_zero);
+        }
         if (!ret) {
             ret = qemu_file_get_error(f);
         }
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 39/47] qemu_ram_block_from_host
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (37 preceding siblings ...)
  2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 38/47] Postcopy: Use helpers to map pages during migration Dr. David Alan Gilbert (git)
@ 2015-04-14 17:04 ` Dr. David Alan Gilbert (git)
  2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 40/47] Don't sync dirty bitmaps in postcopy Dr. David Alan Gilbert (git)
                   ` (8 subsequent siblings)
  47 siblings, 0 replies; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Postcopy sends RAMBlock names and offsets over the wire (since it can't
rely on the order of ramaddr being the same), and it starts out with
HVA fault addresses from the kernel.

qemu_ram_block_from_host translates a HVA into a RAMBlock, an offset
in the RAMBlock and the global ram_addr_t value.

Rewrite qemu_ram_addr_from_host to use qemu_ram_block_from_host.

Provide qemu_ram_get_idstr since its the actual name text sent on the
wire.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 exec.c                    | 54 +++++++++++++++++++++++++++++++++++++++--------
 include/exec/cpu-common.h |  3 +++
 2 files changed, 48 insertions(+), 9 deletions(-)

diff --git a/exec.c b/exec.c
index c3027cf..86f2b87 100644
--- a/exec.c
+++ b/exec.c
@@ -1280,6 +1280,11 @@ static RAMBlock *find_ram_block(ram_addr_t addr)
     return NULL;
 }
 
+const char *qemu_ram_get_idstr(RAMBlock *rb)
+{
+    return rb->idstr;
+}
+
 /* Called with iothread lock held.  */
 void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev)
 {
@@ -1768,8 +1773,16 @@ static void *qemu_ram_ptr_length(ram_addr_t addr, hwaddr *size)
     }
 }
 
-/* Some of the softmmu routines need to translate from a host pointer
- * (typically a TLB entry) back to a ram offset.
+/*
+ * Translates a host ptr back to a RAMBlock, a ram_addr and an offset
+ * in that RAMBlock.
+ *
+ * ptr: Host pointer to look up
+ * round_offset: If true round the result offset down to a page boundary
+ * *ram_addr: set to result ram_addr
+ * *offset: set to result offset within the RAMBlock
+ *
+ * Returns: RAMBlock (or NULL if not found)
  *
  * By the time this function returns, the returned pointer is not protected
  * by RCU anymore.  If the caller is not within an RCU critical section and
@@ -1777,18 +1790,22 @@ static void *qemu_ram_ptr_length(ram_addr_t addr, hwaddr *size)
  * pointer, such as a reference to the region that includes the incoming
  * ram_addr_t.
  */
-MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
+RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
+                                   ram_addr_t *ram_addr,
+                                   ram_addr_t *offset)
 {
     RAMBlock *block;
     uint8_t *host = ptr;
-    MemoryRegion *mr;
 
     if (xen_enabled()) {
         rcu_read_lock();
         *ram_addr = xen_ram_addr_from_mapcache(ptr);
-        mr = qemu_get_ram_block(*ram_addr)->mr;
+        block = qemu_get_ram_block(*ram_addr);
+        if (block) {
+            *offset = (host - block->host);
+        }
         rcu_read_unlock();
-        return mr;
+        return block;
     }
 
     rcu_read_lock();
@@ -1811,10 +1828,29 @@ MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
     return NULL;
 
 found:
-    *ram_addr = block->offset + (host - block->host);
-    mr = block->mr;
+    *offset = (host - block->host);
+    if (round_offset) {
+        *offset &= TARGET_PAGE_MASK;
+    }
+    *ram_addr = block->offset + *offset;
     rcu_read_unlock();
-    return mr;
+    return block;
+}
+
+/* Some of the softmmu routines need to translate from a host pointer
+   (typically a TLB entry) back to a ram offset.  */
+MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
+{
+    RAMBlock *block;
+    ram_addr_t offset; /* Not used */
+
+    block = qemu_ram_block_from_host(ptr, false, ram_addr, &offset);
+
+    if (!block) {
+        return NULL;
+    }
+
+    return block->mr;
 }
 
 static void notdirty_mem_write(void *opaque, hwaddr ram_addr,
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 2abecac..13f8d3a 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -62,8 +62,11 @@ typedef uint32_t CPUReadMemoryFunc(void *opaque, hwaddr addr);
 void qemu_ram_remap(ram_addr_t addr, ram_addr_t length);
 /* This should not be used by devices.  */
 MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr);
+RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
+                                   ram_addr_t *ram_addr, ram_addr_t *offset);
 void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev);
 void qemu_ram_unset_idstr(ram_addr_t addr);
+const char *qemu_ram_get_idstr(RAMBlock *rb);
 
 void cpu_physical_memory_rw(hwaddr addr, uint8_t *buf,
                             int len, int is_write);
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 40/47] Don't sync dirty bitmaps in postcopy
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (38 preceding siblings ...)
  2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 39/47] qemu_ram_block_from_host Dr. David Alan Gilbert (git)
@ 2015-04-14 17:04 ` Dr. David Alan Gilbert (git)
  2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 41/47] Host page!=target page: Cleanup bitmaps Dr. David Alan Gilbert (git)
                   ` (7 subsequent siblings)
  47 siblings, 0 replies; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Once we're in postcopy the source processors are stopped and memory
shouldn't change any more, so there's no need to look at the dirty
map.

There are two notes to this:
  1) If we do resync and a page had changed then the page would get
     sent again, which the destination wouldn't allow (since it might
     have also modified the page)
  2) Before disabling this I'd seen very rare cases where a page had been
     marked dirtied although the memory contents are apparently identical

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 arch_init.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 0d3e865..dc672bf 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -1391,7 +1391,9 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 {
     rcu_read_lock();
 
-    migration_bitmap_sync();
+    if (!migration_postcopy_phase(migrate_get_current())) {
+        migration_bitmap_sync();
+    }
 
     ram_control_before_iterate(f, RAM_CONTROL_FINISH);
 
@@ -1425,7 +1427,8 @@ static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
 
     remaining_size = ram_save_remaining() * TARGET_PAGE_SIZE;
 
-    if (remaining_size < max_size) {
+    if (!migration_postcopy_phase(migrate_get_current()) &&
+        remaining_size < max_size) {
         qemu_mutex_lock_iothread();
         rcu_read_lock();
         migration_bitmap_sync();
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 41/47] Host page!=target page: Cleanup bitmaps
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (39 preceding siblings ...)
  2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 40/47] Don't sync dirty bitmaps in postcopy Dr. David Alan Gilbert (git)
@ 2015-04-14 17:04 ` Dr. David Alan Gilbert (git)
  2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 42/47] Postcopy; Handle userfault requests Dr. David Alan Gilbert (git)
                   ` (6 subsequent siblings)
  47 siblings, 0 replies; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Prior to the start of postcopy, ensure that everything that will
be transferred later is a whole host-page in size.

This is accomplished by discarding partially transferred host pages
and marking any that are partially dirty as fully dirty.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c | 271 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 269 insertions(+), 2 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index dc672bf..18253af 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -850,7 +850,6 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
     int pages = 0;
     ram_addr_t dirty_ram_abs; /* Address of the start of the dirty page in
                                  ram_addr_t space */
-    unsigned long hps = sysconf(_SC_PAGESIZE);
 
     if (!block) {
         block = QLIST_FIRST_RCU(&ram_list.blocks);
@@ -867,7 +866,8 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
          *   b) The last sent item was the last target-page in a host page
          */
         if (last_was_from_queue || !last_sent_block ||
-            ((last_offset & (hps - 1)) == (hps - TARGET_PAGE_SIZE))) {
+            ((last_offset & ~qemu_host_page_mask) ==
+             (qemu_host_page_size - TARGET_PAGE_SIZE))) {
             tmpblock = ram_save_unqueue_page(ms, &tmpoffset, &dirty_ram_abs);
         }
 
@@ -1152,6 +1152,265 @@ static int postcopy_each_ram_send_discard(MigrationState *ms)
 }
 
 /*
+ * Helper for postcopy_chunk_hostpages where HPS/TPS >= bits-in-long
+ *
+ * !! Untested !!
+ */
+static int hostpage_big_chunk_helper(const char *block_name, void *host_addr,
+                                     ram_addr_t offset, ram_addr_t length,
+                                     void *opaque)
+{
+    MigrationState *ms = opaque;
+    unsigned long long_bits = sizeof(long) * 8;
+    unsigned int host_len = (qemu_host_page_size / TARGET_PAGE_SIZE) /
+                            long_bits;
+    unsigned long first_long, last_long, cur_long, current_hp;
+    unsigned long first = offset >> TARGET_PAGE_BITS;
+    unsigned long last = (offset + (length - 1)) >> TARGET_PAGE_BITS;
+
+    PostcopyDiscardState *pds = postcopy_discard_send_init(ms,
+                                                           first,
+                                                           block_name);
+    first_long = first / long_bits;
+    last_long = last / long_bits;
+
+    /*
+     * I'm assuming RAMBlocks must start at the start of host pages,
+     * but I guess they might not use the whole of the host page
+     */
+
+    /* Work along one host page at a time */
+    for (current_hp = first_long; current_hp <= last_long;
+         current_hp += host_len) {
+        bool discard = 0;
+        bool redirty = 0;
+        bool has_some_dirty = false;
+        bool has_some_undirty = false;
+        bool has_some_sent = false;
+        bool has_some_unsent = false;
+
+        /*
+         * Check each long of mask for this hp, and see if anything
+         * needs updating.
+         */
+        for (cur_long = current_hp; cur_long < (current_hp + host_len);
+             cur_long++) {
+            /* a chunk of sent pages */
+            unsigned long sdata = ms->sentmap[cur_long];
+            /* a chunk of dirty pages */
+            unsigned long ddata = migration_bitmap[cur_long];
+
+            if (sdata) {
+                has_some_sent = true;
+            }
+            if (sdata != ~0ul) {
+                has_some_unsent = true;
+            }
+            if (ddata) {
+                has_some_dirty = true;
+            }
+            if (ddata != ~0ul) {
+                has_some_undirty = true;
+            }
+
+        }
+
+        if (has_some_sent && has_some_unsent) {
+            /* Partially sent host page */
+            discard = true;
+            redirty = true;
+        }
+
+        if (has_some_dirty && has_some_undirty) {
+            /* Partially dirty host page */
+            redirty = true;
+        }
+
+        if (!discard && !redirty) {
+            /* All consistent - next host page */
+            continue;
+        }
+
+
+        /* Now walk the chunks again, sending discards etc */
+        for (cur_long = current_hp; cur_long < (current_hp + host_len);
+             cur_long++) {
+            unsigned long cur_bits = cur_long * long_bits;
+
+            /* a chunk of sent pages */
+            unsigned long sdata = ms->sentmap[cur_long];
+            /* a chunk of dirty pages */
+            unsigned long ddata = migration_bitmap[cur_long];
+
+            if (discard && sdata) {
+                /* Tell the destination to discard these pages */
+                postcopy_discard_send_range(ms, pds, cur_bits,
+                                            cur_bits + long_bits - 1);
+                /* And clear them in the sent data structure */
+                ms->sentmap[cur_long] = 0;
+            }
+
+            if (redirty) {
+                migration_bitmap[cur_long] = ~0ul;
+                /* Inc the count of dirty pages */
+                migration_dirty_pages += ctpopl(~ddata);
+            }
+        }
+    }
+
+    postcopy_discard_send_finish(ms, pds);
+
+    return 0;
+}
+
+/*
+ * When working on long chunks of a bitmap where the only valid section
+ * is between start..end (inclusive), generate a mask with only those
+ * valid bits set for the current long word within that bitmask.
+ */
+static unsigned long make_long_mask(unsigned long start, unsigned long end,
+                                    unsigned long cur_long)
+{
+    unsigned long long_bits = sizeof(long) * 8;
+    unsigned long long_bits_mask = long_bits - 1;
+    unsigned long first_long, last_long;
+    unsigned long mask = ~(unsigned long)0;
+    first_long = start / long_bits ;
+    last_long = end / long_bits;
+
+    if ((cur_long == first_long) && (start & long_bits_mask)) {
+        /* e.g. (start & 31) = 3
+         *         1 << .    -> 2^3
+         *         . - 1     -> 2^3 - 1 i.e. mask 2..0
+         *         ~.        -> mask 31..3
+         */
+        mask &= ~((((unsigned long)1) << (start & long_bits_mask)) - 1);
+    }
+
+    if ((cur_long == last_long) && ((end & long_bits_mask) != long_bits_mask)) {
+        /* e.g. (end & 31) = 3
+         *            .   +1 -> 4
+         *         1 << .    -> 2^4
+         *         . -1      -> 2^4 - 1
+         *                   = mask set 3..0
+         */
+        mask &= (((unsigned long)1) << ((end & long_bits_mask) + 1)) - 1;
+    }
+
+    return mask;
+}
+
+/*
+ * Utility for the outgoing postcopy code.
+ *
+ * Discard any partially sent host-page size chunks, mark any partially
+ * dirty host-page size chunks as all dirty.
+ *
+ * Returns: 0 on success
+ */
+static int postcopy_chunk_hostpages(MigrationState *ms)
+{
+    struct RAMBlock *block;
+    unsigned int host_bits = qemu_host_page_size / TARGET_PAGE_SIZE;
+    unsigned long long_bits = sizeof(long) * 8;
+    unsigned long host_mask;
+
+    assert(is_power_of_2(host_bits));
+
+    if (qemu_host_page_size == TARGET_PAGE_SIZE) {
+        /* Easy case - TPS==HPS - nothing to be done */
+        return 0;
+    }
+
+    /* Easiest way to make sure we don't resume in the middle of a host-page */
+    last_seen_block = NULL;
+    last_sent_block = NULL;
+
+    /*
+     * The currently worst known ratio is ARM that has 1kB target pages, and
+     * can have 64kB host pages, which is thus inconveniently larger than a long
+     * on ARM (32bits), and a long is the underlying element of the migration
+     * bitmaps.
+     */
+    if (host_bits >= long_bits) {
+        /* Deal with the odd case separately */
+        return qemu_ram_foreach_block(hostpage_big_chunk_helper, ms);
+    } else {
+        host_mask =  (1ul << host_bits) - 1;
+    }
+
+    rcu_read_lock();
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+        unsigned long first_long, last_long, cur_long;
+        unsigned long first = block->offset >> TARGET_PAGE_BITS;
+        unsigned long last = (block->offset + (block->used_length - 1))
+                                >> TARGET_PAGE_BITS;
+        PostcopyDiscardState *pds = postcopy_discard_send_init(ms,
+                                                               first,
+                                                               block->idstr);
+
+        first_long = first / long_bits;
+        last_long = last / long_bits;
+        for (cur_long = first_long; cur_long <= last_long; cur_long++) {
+            unsigned long current_hp;
+            /* Deal with start/end not on alignment */
+            unsigned long mask = make_long_mask(first, last, cur_long);
+
+            /* a chunk of sent pages */
+            unsigned long sdata = ms->sentmap[cur_long];
+            /* a chunk of dirty pages */
+            unsigned long ddata = migration_bitmap[cur_long];
+            unsigned long discard = 0;
+            unsigned long redirty = 0;
+            sdata &= mask;
+            ddata &= mask;
+
+            for (current_hp = 0; current_hp < long_bits;
+                 current_hp += host_bits) {
+                unsigned long host_sent = (sdata >> current_hp) & host_mask;
+                unsigned long host_dirty = (ddata >> current_hp) & host_mask;
+
+                if (host_sent && (host_sent != host_mask)) {
+                    /* Partially sent host page */
+                    redirty |= host_mask << current_hp;
+                    discard |= host_mask << current_hp;
+
+                    /* Tell the destination to discard this page */
+                    postcopy_discard_send_range(ms, pds,
+                             cur_long * long_bits + current_hp /* start */,
+                             cur_long * long_bits + current_hp +
+                                 host_bits - 1 /* end */);
+                } else if (host_dirty && (host_dirty != host_mask)) {
+                    /* Partially dirty host page */
+                    redirty |= host_mask << current_hp;
+                }
+            }
+            if (discard) {
+                /* clear the page in the sentmap */
+                ms->sentmap[cur_long] &= ~discard;
+            }
+            if (redirty) {
+                /*
+                 * Reread original dirty bits and OR in ones we clear; we
+                 * must reread since we might be at the start or end of
+                 * a RAMBlock that the original 'mask' discarded some
+                 * bits from
+                */
+                ddata = migration_bitmap[cur_long];
+                migration_bitmap[cur_long] = ddata | redirty;
+                /* Inc the count of dirty pages */
+                migration_dirty_pages += ctpopl(redirty - (ddata & redirty));
+            }
+        }
+
+        postcopy_discard_send_finish(ms, pds);
+    }
+
+    rcu_read_unlock();
+    return 0;
+}
+
+/*
  * Transmit the set of pages to be discarded after precopy to the target
  * these are pages that have been sent previously but have been dirtied
  * Hopefully this is pretty sparse
@@ -1161,9 +1420,17 @@ int ram_postcopy_send_discard_bitmap(MigrationState *ms)
     int ret;
 
     rcu_read_lock();
+
     /* This should be our last sync, the src is now paused */
     migration_bitmap_sync();
 
+    /* Deal with TPS != HPS */
+    ret = postcopy_chunk_hostpages(ms);
+    if (ret) {
+        rcu_read_unlock();
+        return ret;
+    }
+
     /*
      * Update the sentmap to be  sentmap&=dirty
      */
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 42/47] Postcopy; Handle userfault requests
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (40 preceding siblings ...)
  2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 41/47] Host page!=target page: Cleanup bitmaps Dr. David Alan Gilbert (git)
@ 2015-04-14 17:04 ` Dr. David Alan Gilbert (git)
  2015-05-25  9:18   ` zhanghailiang
  2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 43/47] Start up a postcopy/listener thread ready for incoming page data Dr. David Alan Gilbert (git)
                   ` (5 subsequent siblings)
  47 siblings, 1 reply; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

userfaultfd is a Linux syscall that gives an fd that receives a stream
of notifications of accesses to pages registered with it and allows
the program to acknowledge those stalls and tell the accessing
thread to carry on.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |   4 +
 migration/postcopy-ram.c      | 165 +++++++++++++++++++++++++++++++++++++++---
 trace-events                  |   9 +++
 3 files changed, 169 insertions(+), 9 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index db06fd2..4d6f33a 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -75,11 +75,15 @@ struct MigrationIncomingState {
      */
     QemuEvent      main_thread_load_event;
 
+    bool           have_fault_thread;
     QemuThread     fault_thread;
     QemuSemaphore  fault_thread_sem;
 
     /* For the kernel to send us notifications */
     int            userfault_fd;
+    /* To tell the fault_thread to quit */
+    int            userfault_quit_fd;
+
     QEMUFile *return_path;
     QemuMutex      rp_mutex;    /* We send replies from multiple threads */
     void          *postcopy_tmp_page;
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 33aadbc..b2dc3b7 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -49,6 +49,8 @@ struct PostcopyDiscardState {
  */
 #if defined(__linux__)
 
+#include <poll.h>
+#include <sys/eventfd.h>
 #include <sys/mman.h>
 #include <sys/ioctl.h>
 #include <sys/syscall.h>
@@ -273,15 +275,41 @@ int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages)
  */
 int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
 {
-    /* TODO: Join the fault thread once we're sure it will exit */
-    if (qemu_ram_foreach_block(cleanup_area, mis)) {
-        return -1;
+    trace_postcopy_ram_incoming_cleanup_entry();
+
+    if (mis->have_fault_thread) {
+        uint64_t tmp64;
+
+        if (qemu_ram_foreach_block(cleanup_area, mis)) {
+            return -1;
+        }
+        /*
+         * Tell the fault_thread to exit, it's an eventfd that should
+         * currently be at 0, we're going to inc it to 1
+         */
+        tmp64 = 1;
+        if (write(mis->userfault_quit_fd, &tmp64, 8) == 8) {
+            trace_postcopy_ram_incoming_cleanup_join();
+            qemu_thread_join(&mis->fault_thread);
+        } else {
+            /* Not much we can do here, but may as well report it */
+            error_report("%s: incing userfault_quit_fd: %s", __func__,
+                         strerror(errno));
+        }
+        trace_postcopy_ram_incoming_cleanup_closeuf();
+        close(mis->userfault_fd);
+        close(mis->userfault_quit_fd);
+        mis->have_fault_thread = false;
     }
 
+    postcopy_state_set(mis, POSTCOPY_INCOMING_END);
+    migrate_send_rp_shut(mis, qemu_file_get_error(mis->file) != 0);
+
     if (mis->postcopy_tmp_page) {
         munmap(mis->postcopy_tmp_page, getpagesize());
         mis->postcopy_tmp_page = NULL;
     }
+    trace_postcopy_ram_incoming_cleanup_exit();
     return 0;
 }
 
@@ -320,31 +348,150 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
 static void *postcopy_ram_fault_thread(void *opaque)
 {
     MigrationIncomingState *mis = (MigrationIncomingState *)opaque;
-
-    fprintf(stderr, "postcopy_ram_fault_thread\n");
-    /* TODO: In later patch */
+    uint64_t hostaddr; /* The kernel always gives us 64 bit, not a pointer */
+    int ret;
+    size_t hostpagesize = getpagesize();
+    RAMBlock *rb = NULL;
+    RAMBlock *last_rb = NULL; /* last RAMBlock we sent part of */
+    uint8_t *local_tmp_page;
+
+    trace_postcopy_ram_fault_thread_entry();
     qemu_sem_post(&mis->fault_thread_sem);
-    while (1) {
-        /* TODO: In later patch */
+
+    local_tmp_page = mmap(NULL, getpagesize(),
+                          PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS,
+                          -1, 0);
+    if (!local_tmp_page) {
+        error_report("%s mapping local tmp page: %s", __func__,
+                     strerror(errno));
+        return NULL;
     }
+    if (madvise(local_tmp_page, getpagesize(), MADV_DONTFORK)) {
+        munmap(local_tmp_page, getpagesize());
+        error_report("%s postcopy local page DONTFORK: %s", __func__,
+                     strerror(errno));
+        return NULL;
+    }
+
+    while (true) {
+        ram_addr_t rb_offset;
+        ram_addr_t in_raspace;
+        struct pollfd pfd[2];
+
+        /*
+         * We're mainly waiting for the kernel to give us a faulting HVA,
+         * however we can be told to quit via userfault_quit_fd which is
+         * an eventfd
+         */
+        pfd[0].fd = mis->userfault_fd;
+        pfd[0].events = POLLIN;
+        pfd[0].revents = 0;
+        pfd[1].fd = mis->userfault_quit_fd;
+        pfd[1].events = POLLIN; /* Waiting for eventfd to go positive */
+        pfd[1].revents = 0;
+
+        if (poll(pfd, 2, -1 /* Wait forever */) == -1) {
+            error_report("%s: userfault poll: %s", __func__, strerror(errno));
+            break;
+        }
+
+        if (pfd[1].revents) {
+            trace_postcopy_ram_fault_thread_quit();
+            break;
+        }
+
+        ret = read(mis->userfault_fd, &hostaddr, sizeof(hostaddr));
+        if (ret != sizeof(hostaddr)) {
+            if (errno == EAGAIN) {
+                /*
+                 * if a wake up happens on the other thread just after
+                 * the poll, there is nothing to read.
+                 */
+                continue;
+            }
+            if (ret < 0) {
+                error_report("%s: Failed to read full userfault hostaddr: %s",
+                             __func__, strerror(errno));
+                break;
+            } else {
+                error_report("%s: Read %d bytes from userfaultfd expected %zd",
+                             __func__, ret, sizeof(hostaddr));
+                break; /* Lost alignment, don't know what we'd read next */
+            }
+        }
 
+        rb = qemu_ram_block_from_host((void *)(uintptr_t)hostaddr, true,
+                                      &in_raspace, &rb_offset);
+        if (!rb) {
+            error_report("postcopy_ram_fault_thread: Fault outside guest: %"
+                         PRIx64, hostaddr);
+            break;
+        }
+
+        trace_postcopy_ram_fault_thread_request(hostaddr,
+                                                qemu_ram_get_idstr(rb),
+                                                rb_offset);
+
+        /*
+         * Send the request to the source - we want to request one
+         * of our host page sizes (which is >= TPS)
+         */
+        if (rb != last_rb) {
+            last_rb = rb;
+            migrate_send_rp_req_pages(mis, qemu_ram_get_idstr(rb),
+                                     rb_offset, hostpagesize);
+        } else {
+            /* Save some space */
+            migrate_send_rp_req_pages(mis, NULL,
+                                     rb_offset, hostpagesize);
+        }
+    }
+    munmap(local_tmp_page, getpagesize());
+    trace_postcopy_ram_fault_thread_exit();
     return NULL;
 }
 
 int postcopy_ram_enable_notify(MigrationIncomingState *mis)
 {
-    /* Create the fault handler thread and wait for it to be ready */
+    /* Open the fd for the kernel to give us userfaults */
+    mis->userfault_fd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
+    if (mis->userfault_fd == -1) {
+        error_report("%s: Failed to open userfault fd: %s", __func__,
+                     strerror(errno));
+        return -1;
+    }
+
+    /*
+     * Although the host check already tested the API, we need to
+     * do the check again as an ABI handshake on the new fd.
+     */
+    if (!ufd_version_check(mis->userfault_fd)) {
+        return -1;
+    }
+
+    /* Now an eventfd we use to tell the fault-thread to quit */
+    mis->userfault_quit_fd = eventfd(0, EFD_CLOEXEC);
+    if (mis->userfault_quit_fd == -1) {
+        error_report("%s: Opening userfault_quit_fd: %s", __func__,
+                     strerror(errno));
+        close(mis->userfault_fd);
+        return -1;
+    }
+
     qemu_sem_init(&mis->fault_thread_sem, 0);
     qemu_thread_create(&mis->fault_thread, "postcopy/fault",
                        postcopy_ram_fault_thread, mis, QEMU_THREAD_JOINABLE);
     qemu_sem_wait(&mis->fault_thread_sem);
     qemu_sem_destroy(&mis->fault_thread_sem);
+    mis->have_fault_thread = true;
 
     /* Mark so that we get notified of accesses to unwritten areas */
     if (qemu_ram_foreach_block(ram_block_enable_notify, mis)) {
         return -1;
     }
 
+    trace_postcopy_ram_enable_notify();
+
     return 0;
 }
 
diff --git a/trace-events b/trace-events
index b65740c..72a65fa 100644
--- a/trace-events
+++ b/trace-events
@@ -1500,6 +1500,15 @@ postcopy_cleanup_area(const char *ramblock, void *host_addr, size_t offset, size
 postcopy_ram_discard_range(void *start, void *end) "%p,%p"
 postcopy_init_area(const char *ramblock, void *host_addr, size_t offset, size_t length) "%s: %p offset=%zx length=%zx"
 postcopy_place_page(void *host_addr, bool all_zero) "host=%p all_zero=%d"
+postcopy_ram_enable_notify(void) ""
+postcopy_ram_fault_thread_entry(void) ""
+postcopy_ram_fault_thread_exit(void) ""
+postcopy_ram_fault_thread_quit(void) ""
+postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset) "Request for HVA=%" PRIx64 " rb=%s offset=%zx"
+postcopy_ram_incoming_cleanup_closeuf(void) ""
+postcopy_ram_incoming_cleanup_entry(void) ""
+postcopy_ram_incoming_cleanup_exit(void) ""
+postcopy_ram_incoming_cleanup_join(void) ""
 
 # kvm-all.c
 kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 43/47] Start up a postcopy/listener thread ready for incoming page data
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (41 preceding siblings ...)
  2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 42/47] Postcopy; Handle userfault requests Dr. David Alan Gilbert (git)
@ 2015-04-14 17:04 ` Dr. David Alan Gilbert (git)
  2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 44/47] postcopy: Wire up loadvm_postcopy_handle_ commands Dr. David Alan Gilbert (git)
                   ` (4 subsequent siblings)
  47 siblings, 0 replies; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The loading of a device state (during postcopy) may access guest
memory that's still on the source machine and thus might need
a page fill; split off a separate thread that handles the incoming
page data so that the original incoming migration code can finish
off the device data.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  4 +++
 migration/migration.c         |  6 ++++
 savevm.c                      | 79 ++++++++++++++++++++++++++++++++++++++++++-
 trace-events                  |  2 ++
 4 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 4d6f33a..cce4c50 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -79,6 +79,10 @@ struct MigrationIncomingState {
     QemuThread     fault_thread;
     QemuSemaphore  fault_thread_sem;
 
+    bool           have_listen_thread;
+    QemuThread     listen_thread;
+    QemuSemaphore  listen_thread_sem;
+
     /* For the kernel to send us notifications */
     int            userfault_fd;
     /* To tell the fault_thread to quit */
diff --git a/migration/migration.c b/migration/migration.c
index 2509798..6537d23 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1082,6 +1082,12 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
         goto fail;
     }
 
+    /*
+     * Make sure the receiver can get incoming pages before we send the rest
+     * of the state
+     */
+    qemu_savevm_send_postcopy_listen(fb);
+
     qemu_savevm_state_complete_precopy(fb);
     qemu_savevm_send_ping(fb, 3);
 
diff --git a/savevm.c b/savevm.c
index f606ce8..ce8c3b5 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1261,6 +1261,65 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
     return 0;
 }
 
+/*
+ * Triggered by a postcopy_listen command; this thread takes over reading
+ * the input stream, leaving the main thread free to carry on loading the rest
+ * of the device state (from RAM).
+ * (TODO:This could do with being in a postcopy file - but there again it's
+ * just another input loop, not that postcopy specific)
+ */
+static void *postcopy_ram_listen_thread(void *opaque)
+{
+    QEMUFile *f = opaque;
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    int load_res;
+
+    qemu_sem_post(&mis->listen_thread_sem);
+    trace_postcopy_ram_listen_thread_start();
+
+    /*
+     * Because we're a thread and not a coroutine we can't yield
+     * in qemu_file, and thus we must be blocking now.
+     */
+    qemu_file_change_blocking(f, true);
+    load_res = qemu_loadvm_state_main(f, mis);
+    /* And non-blocking again so we don't block in any cleanup */
+    qemu_file_change_blocking(f, false);
+
+    trace_postcopy_ram_listen_thread_exit();
+    if (load_res < 0) {
+        error_report("%s: loadvm failed: %d", __func__, load_res);
+        qemu_file_set_error(f, load_res);
+    } else {
+        /*
+         * This looks good, but it's possible that the device loading in the
+         * main thread hasn't finished yet, and so we might not be in 'RUN'
+         * state yet; wait for the end of the main thread.
+         */
+        qemu_event_wait(&mis->main_thread_load_event);
+    }
+    postcopy_ram_incoming_cleanup(mis);
+    /*
+     * If everything has worked fine, then the main thread has waited
+     * for us to start, and we're the last use of the mis.
+     * (If something broke then qemu will have to exit anyway since it's
+     * got a bad migration state).
+     */
+    migration_incoming_state_destroy();
+
+    if (load_res < 0) {
+        /*
+         * If something went wrong then we have a bad state so exit;
+         * depending how far we got it might be possible at this point
+         * to leave the guest running and fire MCEs for pages that never
+         * arrived as a desperate recovery step.
+         */
+        exit(EXIT_FAILURE);
+    }
+
+    return NULL;
+}
+
 /* After this message we must be able to immediately receive postcopy data */
 static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
 {
@@ -1280,7 +1339,20 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
         return -1;
     }
 
-    /* TODO start up the postcopy listening thread */
+    if (mis->have_listen_thread) {
+        error_report("CMD_POSTCOPY_RAM_LISTEN already has a listen thread");
+        return -1;
+    }
+
+    mis->have_listen_thread = true;
+    /* Start up the listening thread and wait for it to signal ready */
+    qemu_sem_init(&mis->listen_thread_sem, 0);
+    qemu_thread_create(&mis->listen_thread, "postcopy/listen",
+                       postcopy_ram_listen_thread, mis->file,
+                       QEMU_THREAD_JOINABLE);
+    qemu_sem_wait(&mis->listen_thread_sem);
+    qemu_sem_destroy(&mis->listen_thread_sem);
+
     return 0;
 }
 
@@ -1597,6 +1669,11 @@ int qemu_loadvm_state(QEMUFile *f)
     qemu_event_set(&mis->main_thread_load_event);
 
     trace_qemu_loadvm_state_post_main(ret);
+    if (mis->have_listen_thread) {
+        /* Listen thread still going, can't clean up yet */
+        return ret;
+    }
+
     if (ret == 0) {
         int file_error_after_eof = qemu_file_get_error(f);
 
diff --git a/trace-events b/trace-events
index 72a65fa..2f50cc4 100644
--- a/trace-events
+++ b/trace-events
@@ -1187,6 +1187,8 @@ loadvm_postcopy_ram_handle_discard_end(void) ""
 loadvm_postcopy_ram_handle_discard_header(const char *ramid, uint16_t len) "%s: %ud"
 loadvm_process_command(uint16_t com, uint16_t len) "com=0x%x len=%d"
 loadvm_process_command_ping(uint32_t val) "%x"
+postcopy_ram_listen_thread_exit(void) ""
+postcopy_ram_listen_thread_start(void) ""
 qemu_savevm_send_postcopy_advise(void) ""
 qemu_savevm_send_postcopy_ram_discard(const char *id, uint16_t len) "%s: %ud"
 savevm_section_start(const char *id, unsigned int section_id) "%s, section_id %u"
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 44/47] postcopy: Wire up loadvm_postcopy_handle_ commands
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (42 preceding siblings ...)
  2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 43/47] Start up a postcopy/listener thread ready for incoming page data Dr. David Alan Gilbert (git)
@ 2015-04-14 17:04 ` Dr. David Alan Gilbert (git)
  2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 45/47] End of migration for postcopy Dr. David Alan Gilbert (git)
                   ` (3 subsequent siblings)
  47 siblings, 0 replies; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Wire up more of the handlers for the commands on the destination side,
in particular loadvm_postcopy_handle_run now has enough to start the
guest running.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 savevm.c     | 29 ++++++++++++++++++++++++++++-
 trace-events |  2 ++
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/savevm.c b/savevm.c
index ce8c3b5..a1fabb5 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1360,12 +1360,34 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
 static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
 {
     PostcopyState ps = postcopy_state_set(mis, POSTCOPY_INCOMING_RUNNING);
+    Error *local_err = NULL;
+
     trace_loadvm_postcopy_handle_run();
     if (ps != POSTCOPY_INCOMING_LISTENING) {
         error_report("CMD_POSTCOPY_RUN in wrong postcopy state (%d)", ps);
         return -1;
     }
 
+    /* TODO we should move all of this lot into postcopy_ram.c or a shared code
+     * in migration.c
+     */
+    cpu_synchronize_all_post_init();
+
+    qemu_announce_self();
+
+    /* Make sure all file formats flush their mutable metadata */
+    bdrv_invalidate_cache_all(&local_err);
+    if (local_err) {
+        qerror_report_err(local_err);
+        error_free(local_err);
+        return -1;
+    }
+
+    trace_loadvm_postcopy_handle_run_cpu_sync();
+    cpu_synchronize_all_post_init();
+
+    trace_loadvm_postcopy_handle_run_vmstart();
+
     if (autostart) {
         /* Hold onto your hats, starting the CPU */
         vm_start();
@@ -1374,7 +1396,12 @@ static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
         runstate_set(RUN_STATE_PAUSED);
     }
 
-    return 0;
+    /* We need to finish reading the stream from the package
+     * and also stop reading anything more from the stream that loaded the
+     * package (since it's now being read by the listener thread).
+     * LOADVM_QUIT will quit all the layers of nested loadvm loops.
+     */
+    return LOADVM_QUIT;
 }
 
 static int loadvm_process_command_simple_lencheck(const char *name,
diff --git a/trace-events b/trace-events
index 2f50cc4..1ab9079 100644
--- a/trace-events
+++ b/trace-events
@@ -1182,6 +1182,8 @@ loadvm_handle_cmd_packaged_received(int ret) "%d"
 loadvm_postcopy_handle_advise(void) ""
 loadvm_postcopy_handle_listen(void) ""
 loadvm_postcopy_handle_run(void) ""
+loadvm_postcopy_handle_run_cpu_sync(void) ""
+loadvm_postcopy_handle_run_vmstart(void) ""
 loadvm_postcopy_ram_handle_discard(void) ""
 loadvm_postcopy_ram_handle_discard_end(void) ""
 loadvm_postcopy_ram_handle_discard_header(const char *ramid, uint16_t len) "%s: %ud"
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 45/47] End of migration for postcopy
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (43 preceding siblings ...)
  2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 44/47] postcopy: Wire up loadvm_postcopy_handle_ commands Dr. David Alan Gilbert (git)
@ 2015-04-14 17:04 ` Dr. David Alan Gilbert (git)
  2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 46/47] Disable mlock around incoming postcopy Dr. David Alan Gilbert (git)
                   ` (2 subsequent siblings)
  47 siblings, 0 replies; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Tweak the end of migration cleanup; we don't want to close stuff down
at the end of the main stream, since the postcopy is still sending pages
on the other thread.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/migration.c | 25 ++++++++++++++++++++++++-
 trace-events          |  2 ++
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 6537d23..180c8b0 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -160,12 +160,35 @@ static void process_incoming_migration_co(void *opaque)
 {
     QEMUFile *f = opaque;
     Error *local_err = NULL;
+    MigrationIncomingState *mis;
+    PostcopyState ps;
     int ret;
 
-    migration_incoming_state_new(f);
+    mis = migration_incoming_state_new(f);
 
     ret = qemu_loadvm_state(f);
 
+    ps = postcopy_state_get(mis);
+    trace_process_incoming_migration_co_end(ret, ps);
+    if (ps != POSTCOPY_INCOMING_NONE) {
+        if (ps == POSTCOPY_INCOMING_ADVISE) {
+            /*
+             * Where a migration had postcopy enabled (and thus went to advise)
+             * but managed to complete within the precopy period, we can use
+             * the normal exit.
+             */
+            postcopy_ram_incoming_cleanup(mis);
+        } else if (ret >= 0) {
+            /*
+             * Postcopy was started, cleanup should happen at the end of the
+             * postcopy thread.
+             */
+            trace_process_incoming_migration_co_postcopy_end_main();
+            return;
+        }
+        /* Else if something went wrong then just fall out of the normal exit */
+    }
+
     qemu_fclose(f);
     free_xbzrle_decoded_buf();
     migration_incoming_state_destroy();
diff --git a/trace-events b/trace-events
index 1ab9079..1378992 100644
--- a/trace-events
+++ b/trace-events
@@ -1435,6 +1435,8 @@ source_return_path_thread_loop_top(void) ""
 source_return_path_thread_pong(uint32_t val) "%x"
 source_return_path_thread_shut(uint32_t val) "%x"
 migrate_transferred(uint64_t tranferred, uint64_t time_spent, double bandwidth, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %g max_size %" PRId64
+process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
+process_incoming_migration_co_postcopy_end_main(void) ""
 
 # migration/rdma.c
 qemu_dma_accept_incoming_migration(void) ""
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 46/47] Disable mlock around incoming postcopy
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (44 preceding siblings ...)
  2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 45/47] End of migration for postcopy Dr. David Alan Gilbert (git)
@ 2015-04-14 17:04 ` Dr. David Alan Gilbert (git)
  2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 47/47] Inhibit ballooning during postcopy Dr. David Alan Gilbert (git)
  2015-04-27  8:04 ` [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Li, Liang Z
  47 siblings, 0 replies; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Userfault doesn't work with mlock; mlock is designed to nail down pages
so they don't move, userfault is designed to tell you when they're not
there.

munlock the pages we userfault protect before postcopy.
mlock everything again at the end if mlock is enabled.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 include/sysemu/sysemu.h  |  1 +
 migration/postcopy-ram.c | 24 ++++++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 248f0d6..d6fca99 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -171,6 +171,7 @@ extern int boot_menu;
 extern bool boot_strict;
 extern uint8_t *boot_splash_filedata;
 extern size_t boot_splash_filedata_size;
+extern bool enable_mlock;
 extern uint8_t qemu_extra_params_fw[2];
 extern QEMUClockType rtc_clock;
 extern const char *mem_path;
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index b2dc3b7..ddf0841 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -84,6 +84,11 @@ static bool ufd_version_check(int ufd)
     return true;
 }
 
+/*
+ * Note: This has the side effect of munlock'ing all of RAM, that's
+ * normally fine since if the postcopy succeeds it gets turned back on at the
+ * end.
+ */
 bool postcopy_ram_supported_by_host(void)
 {
     long pagesize = getpagesize();
@@ -112,6 +117,15 @@ bool postcopy_ram_supported_by_host(void)
     }
 
     /*
+     * userfault and mlock don't go together; we'll put it back later if
+     * it was enabled.
+     */
+    if (munlockall()) {
+        error_report("%s: munlockall: %s", __func__,  strerror(errno));
+        return -1;
+    }
+
+    /*
      *  We need to check that the ops we need are supported on anon memory
      *  To do that we need to register a chunk and see the flags that
      *  are returned.
@@ -302,6 +316,16 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
         mis->have_fault_thread = false;
     }
 
+    if (enable_mlock) {
+        if (os_mlock() < 0) {
+            error_report("mlock: %s", strerror(errno));
+            /*
+             * It doesn't feel right to fail at this point, we have a valid
+             * VM state.
+             */
+        }
+    }
+
     postcopy_state_set(mis, POSTCOPY_INCOMING_END);
     migrate_send_rp_shut(mis, qemu_file_get_error(mis->file) != 0);
 
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [PATCH v6 47/47] Inhibit ballooning during postcopy
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (45 preceding siblings ...)
  2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 46/47] Disable mlock around incoming postcopy Dr. David Alan Gilbert (git)
@ 2015-04-14 17:04 ` Dr. David Alan Gilbert (git)
  2015-04-27  8:04 ` [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Li, Liang Z
  47 siblings, 0 replies; 74+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-14 17:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, david, yayanghy

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Postcopy detects accesses to pages that haven't been transferred yet
using userfaultfd, and it causes exceptions on pages that are 'not
present'.
Ballooning also causes pages to be marked as 'not present' when the
guest inflates the balloon.
Potentially a balloon could be inflated to discard pages that are
currently inflight during postcopy and that may be arriving at about
the same time.

To avoid this confusion, disable ballooning during postcopy.

When disabled we drop balloon requests from the guest.  Since ballooning
is generally initiated by the host, the management system should avoid
initiating any balloon instructions to the guest during migration,
although it's not possible to know how long it would take a guest to
process a request made prior to the start of migration.

Queueing the requests until after migration would be nice, but is
non-trivial, since the set of inflate/deflate requests have to
be compared with the state of the page to know what the final
outcome is allowed to be.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 balloon.c                  | 11 +++++++++++
 hw/virtio/virtio-balloon.c |  4 +++-
 include/sysemu/balloon.h   |  2 ++
 migration/postcopy-ram.c   |  9 +++++++++
 4 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/balloon.c b/balloon.c
index 70c00f5..0274df8 100644
--- a/balloon.c
+++ b/balloon.c
@@ -35,6 +35,17 @@
 static QEMUBalloonEvent *balloon_event_fn;
 static QEMUBalloonStatus *balloon_stat_fn;
 static void *balloon_opaque;
+static bool balloon_inhibited;
+
+bool qemu_balloon_is_inhibited(void)
+{
+    return balloon_inhibited;
+}
+
+void qemu_balloon_inhibit(bool state)
+{
+    balloon_inhibited = state;
+}
 
 static bool have_balloon(Error **errp)
 {
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 95b0643..8bb93db 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -37,9 +37,11 @@
 static void balloon_page(void *addr, int deflate)
 {
 #if defined(__linux__)
-    if (!kvm_enabled() || kvm_has_sync_mmu())
+    if (!qemu_balloon_is_inhibited() && (!kvm_enabled() ||
+                                         kvm_has_sync_mmu())) {
         qemu_madvise(addr, TARGET_PAGE_SIZE,
                 deflate ? QEMU_MADV_WILLNEED : QEMU_MADV_DONTNEED);
+    }
 #endif
 }
 
diff --git a/include/sysemu/balloon.h b/include/sysemu/balloon.h
index 0345e01..6851d99 100644
--- a/include/sysemu/balloon.h
+++ b/include/sysemu/balloon.h
@@ -23,5 +23,7 @@ typedef void (QEMUBalloonStatus)(void *opaque, BalloonInfo *info);
 int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
 			     QEMUBalloonStatus *stat_func, void *opaque);
 void qemu_remove_balloon_handler(void *opaque);
+bool qemu_balloon_is_inhibited(void);
+void qemu_balloon_inhibit(bool state);
 
 #endif
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index ddf0841..50ce6eb 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -24,6 +24,7 @@
 #include "migration/migration.h"
 #include "migration/postcopy-ram.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/balloon.h"
 #include "qemu/error-report.h"
 #include "trace.h"
 
@@ -316,6 +317,8 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
         mis->have_fault_thread = false;
     }
 
+    qemu_balloon_inhibit(false);
+
     if (enable_mlock) {
         if (os_mlock() < 0) {
             error_report("mlock: %s", strerror(errno));
@@ -514,6 +517,12 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
         return -1;
     }
 
+    /*
+     * Ballooning can mark pages as absent while we're postcopying
+     * that would cause false userfaults.
+     */
+    qemu_balloon_inhibit(true);
+
     trace_postcopy_ram_enable_notify();
 
     return 0;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [PATCH v6 26/47] migrate_start_postcopy: Command to trigger transition to postcopy
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 26/47] migrate_start_postcopy: Command to trigger transition to postcopy Dr. David Alan Gilbert (git)
@ 2015-04-14 17:38   ` Eric Blake
  2015-04-14 17:40     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 74+ messages in thread
From: Eric Blake @ 2015-04-14 17:38 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, yayanghy, david

[-- Attachment #1: Type: text/plain, Size: 1154 bytes --]

On 04/14/2015 11:03 AM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Once postcopy is enabled (with migrate_set_capability), the migration
> will still start on precopy mode.  To cause a transition into postcopy
> the:
> 
>   migrate_start_postcopy
> 
> command must be issued.  Postcopy will start sometime after this
> (when it's next checked in the migration loop).
> 
> Issuing the command before migration has started will error,
> and issuing after it has finished is ignored.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>
> ---

> +++ b/qapi-schema.json
> @@ -566,6 +566,14 @@
>  { 'command': 'query-migrate-capabilities', 'returns':   ['MigrationCapabilityStatus']}
>  
>  ##
> +# @migrate-start-postcopy
> +#
> +# Switch migration to postcopy mode
> +#
> +# Since: 2.3

2.4

> +{ 'command': 'migrate-start-postcopy' }
> +

As that's easily fixable by the maintainer, my R-b stands.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [PATCH v6 26/47] migrate_start_postcopy: Command to trigger transition to postcopy
  2015-04-14 17:38   ` Eric Blake
@ 2015-04-14 17:40     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 74+ messages in thread
From: Dr. David Alan Gilbert @ 2015-04-14 17:40 UTC (permalink / raw)
  To: Eric Blake
  Cc: aarcange, yamahata, quintela, qemu-devel, amit.shah, pbonzini, david

* Eric Blake (eblake@redhat.com) wrote:
> On 04/14/2015 11:03 AM, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Once postcopy is enabled (with migrate_set_capability), the migration
> > will still start on precopy mode.  To cause a transition into postcopy
> > the:
> > 
> >   migrate_start_postcopy
> > 
> > command must be issued.  Postcopy will start sometime after this
> > (when it's next checked in the migration loop).
> > 
> > Issuing the command before migration has started will error,
> > and issuing after it has finished is ignored.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Reviewed-by: Eric Blake <eblake@redhat.com>
> > ---
> 
> > +++ b/qapi-schema.json
> > @@ -566,6 +566,14 @@
> >  { 'command': 'query-migrate-capabilities', 'returns':   ['MigrationCapabilityStatus']}
> >  
> >  ##
> > +# @migrate-start-postcopy
> > +#
> > +# Switch migration to postcopy mode
> > +#
> > +# Since: 2.3
> 
> 2.4

Thanks, fixed.

Dave

> > +{ 'command': 'migrate-start-postcopy' }
> > +
> 
> As that's easily fixable by the maintainer, my R-b stands.
> 
> -- 
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [PATCH v6 27/47] MIGRATION_STATUS_POSTCOPY_ACTIVE: Add new migration state
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 27/47] MIGRATION_STATUS_POSTCOPY_ACTIVE: Add new migration state Dr. David Alan Gilbert (git)
@ 2015-04-14 17:40   ` Eric Blake
  2015-04-14 18:00     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 74+ messages in thread
From: Eric Blake @ 2015-04-14 17:40 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, yayanghy, david

[-- Attachment #1: Type: text/plain, Size: 1139 bytes --]

On 04/14/2015 11:03 AM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> 'MIGRATION_STATUS_POSTCOPY_ACTIVE' is entered after migrate_start_postcopy
> 
> 'migration_postcopy_phase' is provided for other sections to know if
> they're in postcopy.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  include/migration/migration.h |  2 ++
>  migration/migration.c         | 56 ++++++++++++++++++++++++++++++++++++-------
>  qapi-schema.json              |  4 +++-
>  trace-events                  |  1 +
>  4 files changed, 54 insertions(+), 9 deletions(-)
> 

> +++ b/qapi-schema.json
> @@ -424,6 +424,8 @@
>  #
>  # @active: in the process of doing migration.
>  #
> +# @postcopy-active: as active, but now in postcopy mode.
> +#

s/as/like/
Needs a (since 2.4) designation.

Minor enough that I'm okay if you fix them and add:
Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [PATCH v6 27/47] MIGRATION_STATUS_POSTCOPY_ACTIVE: Add new migration state
  2015-04-14 17:40   ` Eric Blake
@ 2015-04-14 18:00     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 74+ messages in thread
From: Dr. David Alan Gilbert @ 2015-04-14 18:00 UTC (permalink / raw)
  To: Eric Blake
  Cc: aarcange, yamahata, quintela, qemu-devel, amit.shah, pbonzini, david

* Eric Blake (eblake@redhat.com) wrote:
> On 04/14/2015 11:03 AM, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > 'MIGRATION_STATUS_POSTCOPY_ACTIVE' is entered after migrate_start_postcopy
> > 
> > 'migration_postcopy_phase' is provided for other sections to know if
> > they're in postcopy.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  include/migration/migration.h |  2 ++
> >  migration/migration.c         | 56 ++++++++++++++++++++++++++++++++++++-------
> >  qapi-schema.json              |  4 +++-
> >  trace-events                  |  1 +
> >  4 files changed, 54 insertions(+), 9 deletions(-)
> > 
> 
> > +++ b/qapi-schema.json
> > @@ -424,6 +424,8 @@
> >  #
> >  # @active: in the process of doing migration.
> >  #
> > +# @postcopy-active: as active, but now in postcopy mode.
> > +#
> 
> s/as/like/
> Needs a (since 2.4) designation.
> 
> Minor enough that I'm okay if you fix them and add:
> Reviewed-by: Eric Blake <eblake@redhat.com>

Done.  Thanks.

Dave
> 
> -- 
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [PATCH v6 00/47] Postcopy implementation
  2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (46 preceding siblings ...)
  2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 47/47] Inhibit ballooning during postcopy Dr. David Alan Gilbert (git)
@ 2015-04-27  8:04 ` Li, Liang Z
  2015-04-29 17:23   ` Dr. David Alan Gilbert
  47 siblings, 1 reply; 74+ messages in thread
From: Li, Liang Z @ 2015-04-27  8:04 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, yayanghy, david

Hi David,

I have tired your v6 postcopy patches and found it doesn't work. When I tried to start the 
postcopy in live migration, some errors were printed. I just did the following things:

On destination side, started the qemu like this:

/root/vt-sync/post_copy_v6_qemu.git/x86_64-softmmu/qemu-system-x86_64
-enable-kvm -smp 2 -m 1024 -net none /mnt/jinshi_ia32e_rhel6u5.qcow2
-monitor stdio -incoming tcp:0:4444

On source side, started the qemu like this:

/root/vt-sync/post_copy_v6_qemu.git/x86_64-softmmu/qemu-system-x86_64
-enable-kvm -smp 2 -m 1024 -net none /mnt/jinshi_ia32e_rhel6u5.qcow2
-monitor stdio

and then
(qemu) migrate_set_capability x-postcopy-ram on

When I started the post copy with
(qemu) migrate -d tcp:localhost:4444

I got the error message on the source side:

(qemu) qemu-system-x86_64: socket_writev_buffer: Got err=104 for (131552/-1)
 qemu-system-x86_64: RP: Received invalid message 0x0000 length    0x0000

and the following error on the destination side:

(qemu) qemu-system-x86_64: postcopy_ram_supported_by_host: No OS support
qemu-system-x86_64: load of migration failed: Operation not permitted


the dmesg printed:
[  233.456545] kvm: zapping shadow pages for mmio generation wraparound
[  239.785916] kvm [11926]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0xabcd


The v5 patches have no such errors. Do you have any suggestion?

Liang


> -----Original Message-----
> From: qemu-devel-bounces+liang.z.li=intel.com@nongnu.org [mailto:qemu-
> devel-bounces+liang.z.li=intel.com@nongnu.org] On Behalf Of Dr. David Alan
> Gilbert (git)
> Sent: Wednesday, April 15, 2015 1:03 AM
> To: qemu-devel@nongnu.org
> Cc: aarcange@redhat.com; yamahata@private.email.ne.jp;
> quintela@redhat.com; amit.shah@redhat.com; pbonzini@redhat.com;
> david@gibson.dropbear.id.au; yayanghy@cn.fujitsu.com
> Subject: [Qemu-devel] [PATCH v6 00/47] Postcopy implementation
> 
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
>   This is the 6th cut of my version of postcopy; it is designed for use with the
> Linux kernel additions posted by Andrea Arcangeli here:
> 
> git clone --reference linux -b userfault18
> git://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git
> 
> (Note this is a different API from the last version)
> 
> This qemu series can be found at:
> 
> https://github.com/orbitfp7/qemu.git
> on the wp3-postcopy-v6 tag.
> 
> It addresses some but not yet all of the previous review comments; however
> there are a couple of large simplifications, so it seems worth posting to meet
> the new kernel API and to stop people reviewing deadcode.
> 
> Note: That the userfaultfd.h header is no longer included in this
> tree:
>       - if you're building with the appropriate kernel headers it should find it
>       - if you're building on a host that doesn't have the kernel headers
>         installed in the right place then:
>            configure with:   --extra-cflags="-D__NR_userfaultfd=323"
>            cp include/uapi/linux/userfaultfd.h into somewhere in the include
>            path, e.g.  /usr/local/include/linux
> 
> v6
>   Removed the PMI bitmaps
>       - Andrea updated the kernel API so that userspace doesn't
>         need to do wakeups, and thus QEMU doesn't need to keep
>         track of which pages it's received; there is a price - which
>         is we end up sending more dupes to the source, but it simplifies
>         stuff a lot and makes the normal paths a lot quicker.
>         (10s of line change in kernel, 10%-ish simplification in this code!)
>   Changed discard message format to a simpler start/end address scheme
>         and rework discard and chunking code to work in long's to match bitmap
>   'qemu_get_buffer_less_copy' for postcopy pages
>       - avoids a userspace copy since the kernel now does it
>       - the new qemufile interface might also be useful for other places that
>         don't need a copy (maybe xbzrle?)
>   Changed the blockingness of the incoming fd
>       it was incorrectly blocking during the precopy phase after a postcopy was
>       enabled, causing the HMP to be unavailable.  It's now blocking only once
>       the postcopy thread starts up, since it's not a coroutine it can't deal
>       with the yields in qemu_file.
>   An error on the return-path now marks the migration as failed
> 
>   Fixups from Dave Gibson's comments
>     Removed can_postcopy, renamed save_complete to
> save_complete_precopy
>         added save_complete_postcopy
>     Simplified loadvm loop exits
>     discard message format changes above
>     and many more smaller changes.
> 
>   small fixups for RCU
> 
> 
> This work has been partially funded by the EU Orbit project:
>   see http://www.orbitproject.eu/about/
> 
> TODO:
>   The major work is to rework the page send/receive loops so that supporting
>   larger host pages doesn't make it quite as messy.
> 
> Dr. David Alan Gilbert (47):
>   Start documenting how postcopy works.
>   Split header writing out of qemu_savevm_state_begin
>   qemu_ram_foreach_block: pass up error value, and down the ramblock
>     name
>   Add qemu_get_counted_string to read a string prefixed by a count byte
>   Create MigrationIncomingState
>   Provide runtime Target page information
>   Move copy out of qemu_peek_buffer
>   Add qemu_get_buffer_less_copy to avoid copies some of the time
>   Add wrapper for setting blocking status on a QEMUFile
>   Rename save_live_complete to save_live_complete_precopy
>   Return path: Open a return path on QEMUFile for sockets
>   Return path: socket_writev_buffer: Block even on non-blocking fd's
>   Migration commands
>   Return path: Control commands
>   Return path: Send responses from destination to source
>   Return path: Source handling of return path
>   ram_debug_dump_bitmap: Dump a migration bitmap as text
>   Move loadvm_handlers into MigrationIncomingState
>   Rework loadvm path for subloops
>   Add migration-capability boolean for postcopy-ram.
>   Add wrappers and handlers for sending/receiving the postcopy-ram
>     migration messages.
>   MIG_CMD_PACKAGED: Send a packaged chunk of migration stream
>   migrate_init: Call from savevm
>   Modify save_live_pending for postcopy
>   postcopy: OS support test
>   migrate_start_postcopy: Command to trigger transition to postcopy
>   MIGRATION_STATUS_POSTCOPY_ACTIVE: Add new migration state
>   Add qemu_savevm_state_complete_postcopy
>   Postcopy: Maintain sentmap and calculate discard
>   postcopy: Incoming initialisation
>   postcopy: ram_enable_notify to switch on userfault
>   Postcopy: Postcopy startup in migration thread
>   Postcopy end in migration_thread
>   Page request:  Add MIG_RP_MSG_REQ_PAGES reverse command
>   Page request: Process incoming page request
>   Page request: Consume pages off the post-copy queue
>   postcopy_ram.c: place_page and helpers
>   Postcopy: Use helpers to map pages during migration
>   qemu_ram_block_from_host
>   Don't sync dirty bitmaps in postcopy
>   Host page!=target page: Cleanup bitmaps
>   Postcopy; Handle userfault requests
>   Start up a postcopy/listener thread ready for incoming page data
>   postcopy: Wire up loadvm_postcopy_handle_ commands
>   End of migration for postcopy
>   Disable mlock around incoming postcopy
>   Inhibit ballooning during postcopy
> 
>  arch_init.c                      | 868 ++++++++++++++++++++++++++++++++++++---
>  balloon.c                        |  11 +
>  docs/migration.txt               | 167 ++++++++
>  exec.c                           |  74 +++-
>  hmp-commands.hx                  |  15 +
>  hmp.c                            |   7 +
>  hmp.h                            |   1 +
>  hw/ppc/spapr.c                   |   2 +-
>  hw/virtio/virtio-balloon.c       |   4 +-
>  include/exec/cpu-all.h           |   2 -
>  include/exec/cpu-common.h        |   7 +-
>  include/migration/migration.h    | 126 +++++-
>  include/migration/postcopy-ram.h |  88 ++++
>  include/migration/qemu-file.h    |  15 +-
>  include/migration/vmstate.h      |  10 +-
>  include/qemu/typedefs.h          |   5 +
>  include/sysemu/balloon.h         |   2 +
>  include/sysemu/sysemu.h          |  45 +-
>  migration/Makefile.objs          |   2 +-
>  migration/block.c                |   9 +-
>  migration/migration.c            | 743 +++++++++++++++++++++++++++++++--
>  migration/postcopy-ram.c         | 715
> ++++++++++++++++++++++++++++++++
>  migration/qemu-file-unix.c       | 106 ++++-
>  migration/qemu-file.c            | 100 ++++-
>  migration/rdma.c                 |   4 +-
>  migration/vmstate.c              |   5 +-
>  qapi-schema.json                 |  19 +-
>  qmp-commands.hx                  |  19 +
>  savevm.c                         | 809 ++++++++++++++++++++++++++++++++----
>  trace-events                     |  77 +++-
>  30 files changed, 3832 insertions(+), 225 deletions(-)  create mode 100644
> include/migration/postcopy-ram.h  create mode 100644 migration/postcopy-
> ram.c
> 
> --
> 2.1.0
> 

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [PATCH v6 00/47] Postcopy implementation
  2015-04-27  8:04 ` [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Li, Liang Z
@ 2015-04-29 17:23   ` Dr. David Alan Gilbert
  2015-04-30  1:09     ` Li, Liang Z
  0 siblings, 1 reply; 74+ messages in thread
From: Dr. David Alan Gilbert @ 2015-04-29 17:23 UTC (permalink / raw)
  To: Li, Liang Z
  Cc: aarcange, yamahata, quintela, qemu-devel, amit.shah, pbonzini,
	yayanghy, david

* Li, Liang Z (liang.z.li@intel.com) wrote:
> Hi David,
> 
> I have tired your v6 postcopy patches and found it doesn't work. When I tried to start the 
> postcopy in live migration, some errors were printed. I just did the following things:
> 
> On destination side, started the qemu like this:
> 
> /root/vt-sync/post_copy_v6_qemu.git/x86_64-softmmu/qemu-system-x86_64
> -enable-kvm -smp 2 -m 1024 -net none /mnt/jinshi_ia32e_rhel6u5.qcow2
> -monitor stdio -incoming tcp:0:4444
> 
> On source side, started the qemu like this:
> 
> /root/vt-sync/post_copy_v6_qemu.git/x86_64-softmmu/qemu-system-x86_64
> -enable-kvm -smp 2 -m 1024 -net none /mnt/jinshi_ia32e_rhel6u5.qcow2
> -monitor stdio
> 
> and then
> (qemu) migrate_set_capability x-postcopy-ram on
> 
> When I started the post copy with
> (qemu) migrate -d tcp:localhost:4444
> 
> I got the error message on the source side:
> 
> (qemu) qemu-system-x86_64: socket_writev_buffer: Got err=104 for (131552/-1)
>  qemu-system-x86_64: RP: Received invalid message 0x0000 length    0x0000
> 
> and the following error on the destination side:
> 
> (qemu) qemu-system-x86_64: postcopy_ram_supported_by_host: No OS support
> qemu-system-x86_64: load of migration failed: Operation not permitted

OK, the important error here is:
           postcopy_ram_supported_by_host: No OS support

that's saying that the destination OS either:
   1) The kernel isn't the correct kernel with Andrea's userfault code compiled in
      (check that userfaultfd is configured into the kernel as well)
   2) That when you built the QEMU it didn't find the syscall definition for the
      userfaultfd in the header as it compiled it.

I think from that error it is (2) - so make sure that when you built the qemu
that you're using the headers from that kernel, or use the extra-cflags hack
that I mentioned in the cover letter.

Note that you need to use the kernel tree which I point to in the first message.
(The older kernel from v5 wont work).

Dave
P.S. I'm on holiday this week, so not checking work email much.

> 
> 
> the dmesg printed:
> [  233.456545] kvm: zapping shadow pages for mmio generation wraparound
> [  239.785916] kvm [11926]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0xabcd
> 
> 
> The v5 patches have no such errors. Do you have any suggestion?
> 
> Liang
> 
> 
> > -----Original Message-----
> > From: qemu-devel-bounces+liang.z.li=intel.com@nongnu.org [mailto:qemu-
> > devel-bounces+liang.z.li=intel.com@nongnu.org] On Behalf Of Dr. David Alan
> > Gilbert (git)
> > Sent: Wednesday, April 15, 2015 1:03 AM
> > To: qemu-devel@nongnu.org
> > Cc: aarcange@redhat.com; yamahata@private.email.ne.jp;
> > quintela@redhat.com; amit.shah@redhat.com; pbonzini@redhat.com;
> > david@gibson.dropbear.id.au; yayanghy@cn.fujitsu.com
> > Subject: [Qemu-devel] [PATCH v6 00/47] Postcopy implementation
> > 
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> >   This is the 6th cut of my version of postcopy; it is designed for use with the
> > Linux kernel additions posted by Andrea Arcangeli here:
> > 
> > git clone --reference linux -b userfault18
> > git://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git
> > 
> > (Note this is a different API from the last version)
> > 
> > This qemu series can be found at:
> > 
> > https://github.com/orbitfp7/qemu.git
> > on the wp3-postcopy-v6 tag.
> > 
> > It addresses some but not yet all of the previous review comments; however
> > there are a couple of large simplifications, so it seems worth posting to meet
> > the new kernel API and to stop people reviewing deadcode.
> > 
> > Note: That the userfaultfd.h header is no longer included in this
> > tree:
> >       - if you're building with the appropriate kernel headers it should find it
> >       - if you're building on a host that doesn't have the kernel headers
> >         installed in the right place then:
> >            configure with:   --extra-cflags="-D__NR_userfaultfd=323"
> >            cp include/uapi/linux/userfaultfd.h into somewhere in the include
> >            path, e.g.  /usr/local/include/linux
> > 
> > v6
> >   Removed the PMI bitmaps
> >       - Andrea updated the kernel API so that userspace doesn't
> >         need to do wakeups, and thus QEMU doesn't need to keep
> >         track of which pages it's received; there is a price - which
> >         is we end up sending more dupes to the source, but it simplifies
> >         stuff a lot and makes the normal paths a lot quicker.
> >         (10s of line change in kernel, 10%-ish simplification in this code!)
> >   Changed discard message format to a simpler start/end address scheme
> >         and rework discard and chunking code to work in long's to match bitmap
> >   'qemu_get_buffer_less_copy' for postcopy pages
> >       - avoids a userspace copy since the kernel now does it
> >       - the new qemufile interface might also be useful for other places that
> >         don't need a copy (maybe xbzrle?)
> >   Changed the blockingness of the incoming fd
> >       it was incorrectly blocking during the precopy phase after a postcopy was
> >       enabled, causing the HMP to be unavailable.  It's now blocking only once
> >       the postcopy thread starts up, since it's not a coroutine it can't deal
> >       with the yields in qemu_file.
> >   An error on the return-path now marks the migration as failed
> > 
> >   Fixups from Dave Gibson's comments
> >     Removed can_postcopy, renamed save_complete to
> > save_complete_precopy
> >         added save_complete_postcopy
> >     Simplified loadvm loop exits
> >     discard message format changes above
> >     and many more smaller changes.
> > 
> >   small fixups for RCU
> > 
> > 
> > This work has been partially funded by the EU Orbit project:
> >   see http://www.orbitproject.eu/about/
> > 
> > TODO:
> >   The major work is to rework the page send/receive loops so that supporting
> >   larger host pages doesn't make it quite as messy.
> > 
> > Dr. David Alan Gilbert (47):
> >   Start documenting how postcopy works.
> >   Split header writing out of qemu_savevm_state_begin
> >   qemu_ram_foreach_block: pass up error value, and down the ramblock
> >     name
> >   Add qemu_get_counted_string to read a string prefixed by a count byte
> >   Create MigrationIncomingState
> >   Provide runtime Target page information
> >   Move copy out of qemu_peek_buffer
> >   Add qemu_get_buffer_less_copy to avoid copies some of the time
> >   Add wrapper for setting blocking status on a QEMUFile
> >   Rename save_live_complete to save_live_complete_precopy
> >   Return path: Open a return path on QEMUFile for sockets
> >   Return path: socket_writev_buffer: Block even on non-blocking fd's
> >   Migration commands
> >   Return path: Control commands
> >   Return path: Send responses from destination to source
> >   Return path: Source handling of return path
> >   ram_debug_dump_bitmap: Dump a migration bitmap as text
> >   Move loadvm_handlers into MigrationIncomingState
> >   Rework loadvm path for subloops
> >   Add migration-capability boolean for postcopy-ram.
> >   Add wrappers and handlers for sending/receiving the postcopy-ram
> >     migration messages.
> >   MIG_CMD_PACKAGED: Send a packaged chunk of migration stream
> >   migrate_init: Call from savevm
> >   Modify save_live_pending for postcopy
> >   postcopy: OS support test
> >   migrate_start_postcopy: Command to trigger transition to postcopy
> >   MIGRATION_STATUS_POSTCOPY_ACTIVE: Add new migration state
> >   Add qemu_savevm_state_complete_postcopy
> >   Postcopy: Maintain sentmap and calculate discard
> >   postcopy: Incoming initialisation
> >   postcopy: ram_enable_notify to switch on userfault
> >   Postcopy: Postcopy startup in migration thread
> >   Postcopy end in migration_thread
> >   Page request:  Add MIG_RP_MSG_REQ_PAGES reverse command
> >   Page request: Process incoming page request
> >   Page request: Consume pages off the post-copy queue
> >   postcopy_ram.c: place_page and helpers
> >   Postcopy: Use helpers to map pages during migration
> >   qemu_ram_block_from_host
> >   Don't sync dirty bitmaps in postcopy
> >   Host page!=target page: Cleanup bitmaps
> >   Postcopy; Handle userfault requests
> >   Start up a postcopy/listener thread ready for incoming page data
> >   postcopy: Wire up loadvm_postcopy_handle_ commands
> >   End of migration for postcopy
> >   Disable mlock around incoming postcopy
> >   Inhibit ballooning during postcopy
> > 
> >  arch_init.c                      | 868 ++++++++++++++++++++++++++++++++++++---
> >  balloon.c                        |  11 +
> >  docs/migration.txt               | 167 ++++++++
> >  exec.c                           |  74 +++-
> >  hmp-commands.hx                  |  15 +
> >  hmp.c                            |   7 +
> >  hmp.h                            |   1 +
> >  hw/ppc/spapr.c                   |   2 +-
> >  hw/virtio/virtio-balloon.c       |   4 +-
> >  include/exec/cpu-all.h           |   2 -
> >  include/exec/cpu-common.h        |   7 +-
> >  include/migration/migration.h    | 126 +++++-
> >  include/migration/postcopy-ram.h |  88 ++++
> >  include/migration/qemu-file.h    |  15 +-
> >  include/migration/vmstate.h      |  10 +-
> >  include/qemu/typedefs.h          |   5 +
> >  include/sysemu/balloon.h         |   2 +
> >  include/sysemu/sysemu.h          |  45 +-
> >  migration/Makefile.objs          |   2 +-
> >  migration/block.c                |   9 +-
> >  migration/migration.c            | 743 +++++++++++++++++++++++++++++++--
> >  migration/postcopy-ram.c         | 715
> > ++++++++++++++++++++++++++++++++
> >  migration/qemu-file-unix.c       | 106 ++++-
> >  migration/qemu-file.c            | 100 ++++-
> >  migration/rdma.c                 |   4 +-
> >  migration/vmstate.c              |   5 +-
> >  qapi-schema.json                 |  19 +-
> >  qmp-commands.hx                  |  19 +
> >  savevm.c                         | 809 ++++++++++++++++++++++++++++++++----
> >  trace-events                     |  77 +++-
> >  30 files changed, 3832 insertions(+), 225 deletions(-)  create mode 100644
> > include/migration/postcopy-ram.h  create mode 100644 migration/postcopy-
> > ram.c
> > 
> > --
> > 2.1.0
> > 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [PATCH v6 00/47] Postcopy implementation
  2015-04-29 17:23   ` Dr. David Alan Gilbert
@ 2015-04-30  1:09     ` Li, Liang Z
       [not found]       ` <20150505150112.GM2126@work-vm>
  0 siblings, 1 reply; 74+ messages in thread
From: Li, Liang Z @ 2015-04-30  1:09 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, quintela, qemu-devel, amit.shah, pbonzini,
	yayanghy, david

> * Li, Liang Z (liang.z.li@intel.com) wrote:
> > Hi David,
> >
> > I have tired your v6 postcopy patches and found it doesn't work. When
> > I tried to start the postcopy in live migration, some errors were printed. I
> just did the following things:
> >
> > On destination side, started the qemu like this:
> >
> > /root/vt-sync/post_copy_v6_qemu.git/x86_64-softmmu/qemu-system-
> x86_64
> > -enable-kvm -smp 2 -m 1024 -net none /mnt/jinshi_ia32e_rhel6u5.qcow2
> > -monitor stdio -incoming tcp:0:4444
> >
> > On source side, started the qemu like this:
> >
> > /root/vt-sync/post_copy_v6_qemu.git/x86_64-softmmu/qemu-system-
> x86_64
> > -enable-kvm -smp 2 -m 1024 -net none /mnt/jinshi_ia32e_rhel6u5.qcow2
> > -monitor stdio
> >
> > and then
> > (qemu) migrate_set_capability x-postcopy-ram on
> >
> > When I started the post copy with
> > (qemu) migrate -d tcp:localhost:4444
> >
> > I got the error message on the source side:
> >
> > (qemu) qemu-system-x86_64: socket_writev_buffer: Got err=104 for
> (131552/-1)
> >  qemu-system-x86_64: RP: Received invalid message 0x0000 length
> 0x0000
> >
> > and the following error on the destination side:
> >
> > (qemu) qemu-system-x86_64: postcopy_ram_supported_by_host: No OS
> > support
> > qemu-system-x86_64: load of migration failed: Operation not permitted
> 
> OK, the important error here is:
>            postcopy_ram_supported_by_host: No OS support
> 
> that's saying that the destination OS either:
>    1) The kernel isn't the correct kernel with Andrea's userfault code compiled
> in
>       (check that userfaultfd is configured into the kernel as well)
>    2) That when you built the QEMU it didn't find the syscall definition for the
>       userfaultfd in the header as it compiled it.
> 
> I think from that error it is (2) - so make sure that when you built the qemu
> that you're using the headers from that kernel, or use the extra-cflags hack
> that I mentioned in the cover letter.
> 
> Note that you need to use the kernel tree which I point to in the first
> message.
> (The older kernel from v5 wont work).
> 

Thanks Dave, I will retry according to your suggestion.

> Dave
> P.S. I'm on holiday this week, so not checking work email much.
> 
> >
> >
> > the dmesg printed:
> > [  233.456545] kvm: zapping shadow pages for mmio generation
> > wraparound [  239.785916] kvm [11926]: vcpu0 disabled perfctr wrmsr:
> > 0xc1 data 0xabcd
> >
> >
> > The v5 patches have no such errors. Do you have any suggestion?
> >
> > Liang
> >
> >
> > > -----Original Message-----
> > > From: qemu-devel-bounces+liang.z.li=intel.com@nongnu.org
> > > [mailto:qemu-
> > > devel-bounces+liang.z.li=intel.com@nongnu.org] On Behalf Of Dr.
> > > devel-bounces+David Alan
> > > Gilbert (git)
> > > Sent: Wednesday, April 15, 2015 1:03 AM
> > > To: qemu-devel@nongnu.org
> > > Cc: aarcange@redhat.com; yamahata@private.email.ne.jp;
> > > quintela@redhat.com; amit.shah@redhat.com; pbonzini@redhat.com;
> > > david@gibson.dropbear.id.au; yayanghy@cn.fujitsu.com
> > > Subject: [Qemu-devel] [PATCH v6 00/47] Postcopy implementation
> > >
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > >
> > >   This is the 6th cut of my version of postcopy; it is designed for
> > > use with the Linux kernel additions posted by Andrea Arcangeli here:
> > >
> > > git clone --reference linux -b userfault18
> > > git://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git
> > >
> > > (Note this is a different API from the last version)
> > >
> > > This qemu series can be found at:
> > >
> > > https://github.com/orbitfp7/qemu.git
> > > on the wp3-postcopy-v6 tag.
> > >
> > > It addresses some but not yet all of the previous review comments;
> > > however there are a couple of large simplifications, so it seems
> > > worth posting to meet the new kernel API and to stop people reviewing
> deadcode.
> > >
> > > Note: That the userfaultfd.h header is no longer included in this
> > > tree:
> > >       - if you're building with the appropriate kernel headers it should find it
> > >       - if you're building on a host that doesn't have the kernel headers
> > >         installed in the right place then:
> > >            configure with:   --extra-cflags="-D__NR_userfaultfd=323"
> > >            cp include/uapi/linux/userfaultfd.h into somewhere in the include
> > >            path, e.g.  /usr/local/include/linux
> > >
> > > v6
> > >   Removed the PMI bitmaps
> > >       - Andrea updated the kernel API so that userspace doesn't
> > >         need to do wakeups, and thus QEMU doesn't need to keep
> > >         track of which pages it's received; there is a price - which
> > >         is we end up sending more dupes to the source, but it simplifies
> > >         stuff a lot and makes the normal paths a lot quicker.
> > >         (10s of line change in kernel, 10%-ish simplification in this code!)
> > >   Changed discard message format to a simpler start/end address scheme
> > >         and rework discard and chunking code to work in long's to match
> bitmap
> > >   'qemu_get_buffer_less_copy' for postcopy pages
> > >       - avoids a userspace copy since the kernel now does it
> > >       - the new qemufile interface might also be useful for other places that
> > >         don't need a copy (maybe xbzrle?)
> > >   Changed the blockingness of the incoming fd
> > >       it was incorrectly blocking during the precopy phase after a postcopy
> was
> > >       enabled, causing the HMP to be unavailable.  It's now blocking only
> once
> > >       the postcopy thread starts up, since it's not a coroutine it can't deal
> > >       with the yields in qemu_file.
> > >   An error on the return-path now marks the migration as failed
> > >
> > >   Fixups from Dave Gibson's comments
> > >     Removed can_postcopy, renamed save_complete to
> > > save_complete_precopy
> > >         added save_complete_postcopy
> > >     Simplified loadvm loop exits
> > >     discard message format changes above
> > >     and many more smaller changes.
> > >
> > >   small fixups for RCU
> > >
> > >
> > > This work has been partially funded by the EU Orbit project:
> > >   see http://www.orbitproject.eu/about/
> > >
> > > TODO:
> > >   The major work is to rework the page send/receive loops so that
> supporting
> > >   larger host pages doesn't make it quite as messy.
> > >
> > > Dr. David Alan Gilbert (47):
> > >   Start documenting how postcopy works.
> > >   Split header writing out of qemu_savevm_state_begin
> > >   qemu_ram_foreach_block: pass up error value, and down the ramblock
> > >     name
> > >   Add qemu_get_counted_string to read a string prefixed by a count byte
> > >   Create MigrationIncomingState
> > >   Provide runtime Target page information
> > >   Move copy out of qemu_peek_buffer
> > >   Add qemu_get_buffer_less_copy to avoid copies some of the time
> > >   Add wrapper for setting blocking status on a QEMUFile
> > >   Rename save_live_complete to save_live_complete_precopy
> > >   Return path: Open a return path on QEMUFile for sockets
> > >   Return path: socket_writev_buffer: Block even on non-blocking fd's
> > >   Migration commands
> > >   Return path: Control commands
> > >   Return path: Send responses from destination to source
> > >   Return path: Source handling of return path
> > >   ram_debug_dump_bitmap: Dump a migration bitmap as text
> > >   Move loadvm_handlers into MigrationIncomingState
> > >   Rework loadvm path for subloops
> > >   Add migration-capability boolean for postcopy-ram.
> > >   Add wrappers and handlers for sending/receiving the postcopy-ram
> > >     migration messages.
> > >   MIG_CMD_PACKAGED: Send a packaged chunk of migration stream
> > >   migrate_init: Call from savevm
> > >   Modify save_live_pending for postcopy
> > >   postcopy: OS support test
> > >   migrate_start_postcopy: Command to trigger transition to postcopy
> > >   MIGRATION_STATUS_POSTCOPY_ACTIVE: Add new migration state
> > >   Add qemu_savevm_state_complete_postcopy
> > >   Postcopy: Maintain sentmap and calculate discard
> > >   postcopy: Incoming initialisation
> > >   postcopy: ram_enable_notify to switch on userfault
> > >   Postcopy: Postcopy startup in migration thread
> > >   Postcopy end in migration_thread
> > >   Page request:  Add MIG_RP_MSG_REQ_PAGES reverse command
> > >   Page request: Process incoming page request
> > >   Page request: Consume pages off the post-copy queue
> > >   postcopy_ram.c: place_page and helpers
> > >   Postcopy: Use helpers to map pages during migration
> > >   qemu_ram_block_from_host
> > >   Don't sync dirty bitmaps in postcopy
> > >   Host page!=target page: Cleanup bitmaps
> > >   Postcopy; Handle userfault requests
> > >   Start up a postcopy/listener thread ready for incoming page data
> > >   postcopy: Wire up loadvm_postcopy_handle_ commands
> > >   End of migration for postcopy
> > >   Disable mlock around incoming postcopy
> > >   Inhibit ballooning during postcopy
> > >
> > >  arch_init.c                      | 868
> ++++++++++++++++++++++++++++++++++++---
> > >  balloon.c                        |  11 +
> > >  docs/migration.txt               | 167 ++++++++
> > >  exec.c                           |  74 +++-
> > >  hmp-commands.hx                  |  15 +
> > >  hmp.c                            |   7 +
> > >  hmp.h                            |   1 +
> > >  hw/ppc/spapr.c                   |   2 +-
> > >  hw/virtio/virtio-balloon.c       |   4 +-
> > >  include/exec/cpu-all.h           |   2 -
> > >  include/exec/cpu-common.h        |   7 +-
> > >  include/migration/migration.h    | 126 +++++-
> > >  include/migration/postcopy-ram.h |  88 ++++
> > >  include/migration/qemu-file.h    |  15 +-
> > >  include/migration/vmstate.h      |  10 +-
> > >  include/qemu/typedefs.h          |   5 +
> > >  include/sysemu/balloon.h         |   2 +
> > >  include/sysemu/sysemu.h          |  45 +-
> > >  migration/Makefile.objs          |   2 +-
> > >  migration/block.c                |   9 +-
> > >  migration/migration.c            | 743
> +++++++++++++++++++++++++++++++--
> > >  migration/postcopy-ram.c         | 715
> > > ++++++++++++++++++++++++++++++++
> > >  migration/qemu-file-unix.c       | 106 ++++-
> > >  migration/qemu-file.c            | 100 ++++-
> > >  migration/rdma.c                 |   4 +-
> > >  migration/vmstate.c              |   5 +-
> > >  qapi-schema.json                 |  19 +-
> > >  qmp-commands.hx                  |  19 +
> > >  savevm.c                         | 809 ++++++++++++++++++++++++++++++++----
> > >  trace-events                     |  77 +++-
> > >  30 files changed, 3832 insertions(+), 225 deletions(-)  create mode
> > > 100644 include/migration/postcopy-ram.h  create mode 100644
> > > migration/postcopy- ram.c
> > >
> > > --
> > > 2.1.0
> > >
> >
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [PATCH v6 00/47] Postcopy implementation
       [not found]           ` <20150506083056.GB2204@work-vm>
@ 2015-05-07  1:21             ` Li, Liang Z
  2015-05-07  8:01               ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 74+ messages in thread
From: Li, Liang Z @ 2015-05-07  1:21 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, qemu-devel
  Cc: aarcange, yamahata, quintela, amit.shah, pbonzini, yayanghy, david

> > > > Thanks Dave, I will retry according to your suggestion.
> > >
> > > Did that work for you?
> > >
> >
> > Yes, it works.
> 
> Great.
> 
> > Bye the way, I found that the source guest will resume after about 15
> > minuets if there are some network errors happened during post copy. Is it
> the expected behavior?
> > And have you any plan about handing such errors?
> 
> Interesting; it shouldn't do that.  I think it's best for the source to stay in
> paused following an error.  Were you driving it directly or via libvirt?
> 

Drive it directly.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [PATCH v6 00/47] Postcopy implementation
  2015-05-07  1:21             ` Li, Liang Z
@ 2015-05-07  8:01               ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 74+ messages in thread
From: Dr. David Alan Gilbert @ 2015-05-07  8:01 UTC (permalink / raw)
  To: Li, Liang Z
  Cc: aarcange, yamahata, quintela, qemu-devel, amit.shah, pbonzini,
	yayanghy, david

* Li, Liang Z (liang.z.li@intel.com) wrote:
> > > > > Thanks Dave, I will retry according to your suggestion.
> > > >
> > > > Did that work for you?
> > > >
> > >
> > > Yes, it works.
> > 
> > Great.
> > 
> > > Bye the way, I found that the source guest will resume after about 15
> > > minuets if there are some network errors happened during post copy. Is it
> > the expected behavior?
> > > And have you any plan about handing such errors?
> > 
> > Interesting; it shouldn't do that.  I think it's best for the source to stay in
> > paused following an error.  Were you driving it directly or via libvirt?
> > 
> 
> Drive it directly.

OK, thanks, I'll have a look at it.

Dave

> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [PATCH v6 02/47] Split header writing out of qemu_savevm_state_begin
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 02/47] Split header writing out of qemu_savevm_state_begin Dr. David Alan Gilbert (git)
@ 2015-05-11 11:16   ` Amit Shah
  0 siblings, 0 replies; 74+ messages in thread
From: Amit Shah @ 2015-05-11 11:16 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, qemu-devel, pbonzini, david, yayanghy

On (Tue) 14 Apr 2015 [18:03:28], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Split qemu_savevm_state_begin to:
>   qemu_savevm_state_header   That writes the initial file header.
>   qemu_savevm_state_begin    That sets up devices and does the first
>                              device pass.
> 
> Used later in postcopy.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Amit Shah <amit.shah@redhat.com>

Function names sound silly now - savevm_state_header() and
savevm_state_begin().

		Amit

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [PATCH v6 03/47] qemu_ram_foreach_block: pass up error value, and down the ramblock name
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 03/47] qemu_ram_foreach_block: pass up error value, and down the ramblock name Dr. David Alan Gilbert (git)
@ 2015-05-15 10:38   ` Amit Shah
  0 siblings, 0 replies; 74+ messages in thread
From: Amit Shah @ 2015-05-15 10:38 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, qemu-devel, pbonzini, david, yayanghy

On (Tue) 14 Apr 2015 [18:03:29], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> check the return value of the function it calls and error if it's non-0
> Fixup qemu_rdma_init_one_block that is the only current caller,
>   and rdma_add_block the only function it calls using it.
> 
> Pass the name of the ramblock to the function; helps in debugging.

Reviewed-by: Amit Shah <amit.shah@redhat.com>

		Amit

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [PATCH v6 04/47] Add qemu_get_counted_string to read a string prefixed by a count byte
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 04/47] Add qemu_get_counted_string to read a string prefixed by a count byte Dr. David Alan Gilbert (git)
@ 2015-05-15 13:50   ` Amit Shah
  2015-05-15 14:06     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 74+ messages in thread
From: Amit Shah @ 2015-05-15 13:50 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, qemu-devel, pbonzini, david, yayanghy

On (Tue) 14 Apr 2015 [18:03:30], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> and use it in loadvm_state and ram_load.

This patch is doing several things at once:

- reducing size of a buffer from 257 to 256 (it's safe, but not
  mentioned in the commit log)

- adding an error return to one calling site (again not mentioned
  here)

> @@ -1145,13 +1145,10 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>              total_ram_bytes = addr;
>              while (!ret && total_ram_bytes) {
>                  RAMBlock *block;
> -                uint8_t len;
>                  char id[256];
>                  ram_addr_t length;
>  
> -                len = qemu_get_byte(f);
> -                qemu_get_buffer(f, (uint8_t *)id, len);
> -                id[len] = 0;
> +                qemu_get_counted_string(f, id);
>                  length = qemu_get_be64(f);
>  
>                  QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {

- ... while not doing that for the other calling site.  In fact we
  really should check return value there too, isn't it?  buf[len] is
  set to 0, not buf[0] in case of an error, and ram_load could happily
  start using string functions on the bogus data in id[].

Can you please split the patches up, or write a verbose commit
message?

Also, I think you should just post these preparatory patches in a
separate series so they can be applied as they're good on their own.

Postcopy patches themselves can come as another series, and that also
makes reviewing easier.

Also:

> +
> +/*
> + * Get a string whose length is determined by a single preceding byte
> + * A preallocated 256 byte buffer must be passed in.
> + * Returns: 0 on success and a 0 terminated string in the buffer
> + */
> +int qemu_get_counted_string(QEMUFile *f, char buf[256])
> +{
> +    unsigned int len = qemu_get_byte(f);
> +    int res = qemu_get_buffer(f, (uint8_t *)buf, len);
> +
> +    buf[len] = 0;
> +
> +    return res != len;

since you're returning bool, how about making this bool?  Though I'd
like it if this was

return res == len ? res : 0;

BTW I'd like it if everything (return value, res, len) were all
unsigned.  The operations are safe, but it sucks we use signed values
for counting things all over the place.

Thanks,

		Amit

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [PATCH v6 04/47] Add qemu_get_counted_string to read a string prefixed by a count byte
  2015-05-15 13:50   ` Amit Shah
@ 2015-05-15 14:06     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 74+ messages in thread
From: Dr. David Alan Gilbert @ 2015-05-15 14:06 UTC (permalink / raw)
  To: Amit Shah
  Cc: aarcange, yamahata, quintela, Dr. David Alan Gilbert (git),
	qemu-devel, pbonzini, david, yayanghy

* Amit Shah (amit.shah@redhat.com) wrote:
> On (Tue) 14 Apr 2015 [18:03:30], Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > and use it in loadvm_state and ram_load.
> 
> This patch is doing several things at once:
> 
> - reducing size of a buffer from 257 to 256 (it's safe, but not
>   mentioned in the commit log)
> 
> - adding an error return to one calling site (again not mentioned
>   here)
> 
> > @@ -1145,13 +1145,10 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> >              total_ram_bytes = addr;
> >              while (!ret && total_ram_bytes) {
> >                  RAMBlock *block;
> > -                uint8_t len;
> >                  char id[256];
> >                  ram_addr_t length;
> >  
> > -                len = qemu_get_byte(f);
> > -                qemu_get_buffer(f, (uint8_t *)id, len);
> > -                id[len] = 0;
> > +                qemu_get_counted_string(f, id);
> >                  length = qemu_get_be64(f);
> >  
> >                  QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
> 
> - ... while not doing that for the other calling site.  In fact we
>   really should check return value there too, isn't it?  buf[len] is
>   set to 0, not buf[0] in case of an error, and ram_load could happily
>   start using string functions on the bogus data in id[].
> 
> Can you please split the patches up, or write a verbose commit
> message?
> 
> Also, I think you should just post these preparatory patches in a
> separate series so they can be applied as they're good on their own.

Yep; OK, I can split them out on an individual basis.

> Postcopy patches themselves can come as another series, and that also
> makes reviewing easier.
> 
> Also:
> 
> > +
> > +/*
> > + * Get a string whose length is determined by a single preceding byte
> > + * A preallocated 256 byte buffer must be passed in.
> > + * Returns: 0 on success and a 0 terminated string in the buffer
> > + */
> > +int qemu_get_counted_string(QEMUFile *f, char buf[256])
> > +{
> > +    unsigned int len = qemu_get_byte(f);
> > +    int res = qemu_get_buffer(f, (uint8_t *)buf, len);
> > +
> > +    buf[len] = 0;
> > +
> > +    return res != len;
> 
> since you're returning bool, how about making this bool?  Though I'd
> like it if this was
> 
> return res == len ? res : 0;
> 
> BTW I'd like it if everything (return value, res, len) were all
> unsigned.  The operations are safe, but it sucks we use signed values
> for counting things all over the place.

Yes, I can do that (I intend some day to fix qemu_get_buffer and co to use size_t).

Thanks,

Dave

> Thanks,
> 
> 		Amit
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [PATCH v6 05/47] Create MigrationIncomingState
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 05/47] Create MigrationIncomingState Dr. David Alan Gilbert (git)
@ 2015-05-18  6:58   ` Amit Shah
  0 siblings, 0 replies; 74+ messages in thread
From: Amit Shah @ 2015-05-18  6:58 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, qemu-devel, pbonzini, david, yayanghy

On (Tue) 14 Apr 2015 [18:03:31], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> There are currently lots of pieces of incoming migration state scattered
> around,

next few patches don't move state into this new struct -- not sure if
you're doing that here, but would be nice to have them ordered
together (also in the separate series).

> and postcopy is adding more, and it seems better to try and keep
> it together.
> 
> allocate MIS in process_incoming_migration_co
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Amit Shah <amit.shah@redhat.com>

		Amit

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [PATCH v6 06/47] Provide runtime Target page information
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 06/47] Provide runtime Target page information Dr. David Alan Gilbert (git)
@ 2015-05-18  7:06   ` Amit Shah
  0 siblings, 0 replies; 74+ messages in thread
From: Amit Shah @ 2015-05-18  7:06 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, qemu-devel, pbonzini, david

On (Tue) 14 Apr 2015 [18:03:32], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> The migration code generally is built target-independent, however
> there are a few places where knowing the target page size would
> avoid artificially moving stuff into arch_init.
> 
> Provide 'qemu_target_page_bits()' that returns TARGET_PAGE_BITS
> to other bits of code so that they can stay target-independent.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Amit Shah <amit.shah@redhat.com>


		Amit

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [PATCH v6 09/47] Add wrapper for setting blocking status on a QEMUFile
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 09/47] Add wrapper for setting blocking status on a QEMUFile Dr. David Alan Gilbert (git)
@ 2015-05-18  7:35   ` Amit Shah
  0 siblings, 0 replies; 74+ messages in thread
From: Amit Shah @ 2015-05-18  7:35 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, qemu-devel, pbonzini, david

On (Tue) 14 Apr 2015 [18:03:35], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Add a wrapper to change the blocking status on a QEMUFile
> rather than having to use qemu_set_block(qemu_get_fd(f));
> it seems best to avoid exposing the fd since not all QEMUFile's
> really have one.  With this wrapper we could move the implementation
> down to be different on different transports.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Amit Shah <amit.shah@redhat.com>


		Amit

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [PATCH v6 10/47] Rename save_live_complete to save_live_complete_precopy
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 10/47] Rename save_live_complete to save_live_complete_precopy Dr. David Alan Gilbert (git)
@ 2015-05-18  7:35   ` Amit Shah
  0 siblings, 0 replies; 74+ messages in thread
From: Amit Shah @ 2015-05-18  7:35 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, qemu-devel, pbonzini, david, yayanghy

On (Tue) 14 Apr 2015 [18:03:36], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> In postcopy we're going to need to perform the complete phase
> for postcopiable devices at a different point, start out by
> renaming all of the 'complete's to make the difference obvious.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Amit Shah <amit.shah@redhat.com>

		Amit

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [PATCH v6 07/47] Move copy out of qemu_peek_buffer
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 07/47] Move copy out of qemu_peek_buffer Dr. David Alan Gilbert (git)
@ 2015-05-21  6:47   ` Amit Shah
  0 siblings, 0 replies; 74+ messages in thread
From: Amit Shah @ 2015-05-21  6:47 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, qemu-devel, pbonzini, david

On (Tue) 14 Apr 2015 [18:03:33], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> qemu_peek_buffer currently copies the data it reads into a buffer,
> however the next patch wants access to the buffer without the copy,
> hence rework to remove the copy to the layer above.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Amit Shah <amit.shah@redhat.com>

It's silly to do a copy in a peek function, so this is a good thing.

		Amit

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [PATCH v6 08/47] Add qemu_get_buffer_less_copy to avoid copies some of the time
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 08/47] Add qemu_get_buffer_less_copy to avoid copies some of the time Dr. David Alan Gilbert (git)
@ 2015-05-21  7:09   ` Amit Shah
  2015-05-21  8:45     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 74+ messages in thread
From: Amit Shah @ 2015-05-21  7:09 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, qemu-devel, pbonzini, david, yayanghy

On (Tue) 14 Apr 2015 [18:03:34], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> qemu_get_buffer always copies the data it reads to a users buffer,
> however in many cases the file buffer inside qemu_file could be given
> back to the caller, avoiding the copy.  This isn't always possible
> depending on the size and alignment of the data.
> 
> Thus 'qemu_get_buffer_less_copy' either copies the data to a supplied
> buffer or updates a pointer to the internal buffer if convenient.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  include/migration/qemu-file.h |  2 ++
>  migration/qemu-file.c         | 45 +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 47 insertions(+)
> 
> diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
> index 3fe545e..4cac58f 100644
> --- a/include/migration/qemu-file.h
> +++ b/include/migration/qemu-file.h
> @@ -159,6 +159,8 @@ void qemu_put_be32(QEMUFile *f, unsigned int v);
>  void qemu_put_be64(QEMUFile *f, uint64_t v);
>  int qemu_peek_buffer(QEMUFile *f, uint8_t **buf, int size, size_t offset);
>  int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size);
> +int qemu_get_buffer_less_copy(QEMUFile *f, uint8_t **buf, int size);
> +
>  /*
>   * Note that you can only peek continuous bytes from where the current pointer
>   * is; you aren't guaranteed to be able to peak to +n bytes unless you've
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index 8dc5767..ec3a598 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -426,6 +426,51 @@ int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size)
>  }
>  
>  /*
> + * Read 'size' bytes of data from the file.
> + * 'size' can be larger than the internal buffer.
> + *
> + * The data:
> + *   may be held on an internal buffer (in which case *buf is updated
> + *     to point to it) that is valid until the next qemu_file operation.
> + * OR
> + *   will be copied to the *buf that was passed in.
> + *
> + * The code tries to avoid the copy if possible.

So is it expected that callers will store the originally-allocated
start location, so that g_free can be called on the correct location
in either case?  If that's a requirement, this text needs to be
updated.

If not (alternative idea below), text needs to be updated as well.

> + * It will return size bytes unless there was an error, in which case it will
> + * return as many as it managed to read (assuming blocking fd's which
> + * all current QEMUFile are)
> + */
> +int qemu_get_buffer_less_copy(QEMUFile *f, uint8_t **buf, int size)
> +{
> +    int pending = size;
> +    int done = 0;
> +    bool first = true;
> +
> +    while (pending > 0) {
> +        int res;
> +        uint8_t *src;
> +
> +        res = qemu_peek_buffer(f, &src, MIN(pending, IO_BUF_SIZE), 0);
> +        if (res == 0) {
> +            return done;
> +        }
> +        qemu_file_skip(f, res);
> +        done += res;
> +        pending -= res;
> +        if (first && res == size) {
> +            *buf = src;

So we've got to assume that buf was allocated by the calling function,
and since we're modifying the pointer (alternative idea to one above),
should we unallocate it here?  Can lead to a bad g_free later, or a
memleak.

> +            return done;

How about just 'break' instead of return?

> +        } else {
> +            first = false;
> +            memcpy(buf, src, res);
> +            buf += res;

In either case (break or return), the 'else' can be dropped..

		Amit

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [PATCH v6 08/47] Add qemu_get_buffer_less_copy to avoid copies some of the time
  2015-05-21  7:09   ` Amit Shah
@ 2015-05-21  8:45     ` Dr. David Alan Gilbert
  2015-05-21  8:58       ` Amit Shah
  0 siblings, 1 reply; 74+ messages in thread
From: Dr. David Alan Gilbert @ 2015-05-21  8:45 UTC (permalink / raw)
  To: Amit Shah; +Cc: aarcange, yamahata, quintela, qemu-devel, pbonzini, david

* Amit Shah (amit.shah@redhat.com) wrote:
> On (Tue) 14 Apr 2015 [18:03:34], Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > qemu_get_buffer always copies the data it reads to a users buffer,
> > however in many cases the file buffer inside qemu_file could be given
> > back to the caller, avoiding the copy.  This isn't always possible
> > depending on the size and alignment of the data.
> > 
> > Thus 'qemu_get_buffer_less_copy' either copies the data to a supplied
> > buffer or updates a pointer to the internal buffer if convenient.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  include/migration/qemu-file.h |  2 ++
> >  migration/qemu-file.c         | 45 +++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 47 insertions(+)
> > 
> > diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
> > index 3fe545e..4cac58f 100644
> > --- a/include/migration/qemu-file.h
> > +++ b/include/migration/qemu-file.h
> > @@ -159,6 +159,8 @@ void qemu_put_be32(QEMUFile *f, unsigned int v);
> >  void qemu_put_be64(QEMUFile *f, uint64_t v);
> >  int qemu_peek_buffer(QEMUFile *f, uint8_t **buf, int size, size_t offset);
> >  int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size);
> > +int qemu_get_buffer_less_copy(QEMUFile *f, uint8_t **buf, int size);
> > +
> >  /*
> >   * Note that you can only peek continuous bytes from where the current pointer
> >   * is; you aren't guaranteed to be able to peak to +n bytes unless you've
> > diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> > index 8dc5767..ec3a598 100644
> > --- a/migration/qemu-file.c
> > +++ b/migration/qemu-file.c
> > @@ -426,6 +426,51 @@ int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size)
> >  }
> >  
> >  /*
> > + * Read 'size' bytes of data from the file.
> > + * 'size' can be larger than the internal buffer.
> > + *
> > + * The data:
> > + *   may be held on an internal buffer (in which case *buf is updated
> > + *     to point to it) that is valid until the next qemu_file operation.
> > + * OR
> > + *   will be copied to the *buf that was passed in.
> > + *
> > + * The code tries to avoid the copy if possible.
> 
> So is it expected that callers will store the originally-allocated
> start location, so that g_free can be called on the correct location
> in either case?  If that's a requirement, this text needs to be
> updated.
> 
> If not (alternative idea below), text needs to be updated as well.

see reply below.

> 
> > + * It will return size bytes unless there was an error, in which case it will
> > + * return as many as it managed to read (assuming blocking fd's which
> > + * all current QEMUFile are)
> > + */
> > +int qemu_get_buffer_less_copy(QEMUFile *f, uint8_t **buf, int size)
> > +{
> > +    int pending = size;
> > +    int done = 0;
> > +    bool first = true;
> > +
> > +    while (pending > 0) {
> > +        int res;
> > +        uint8_t *src;
> > +
> > +        res = qemu_peek_buffer(f, &src, MIN(pending, IO_BUF_SIZE), 0);
> > +        if (res == 0) {
> > +            return done;
> > +        }
> > +        qemu_file_skip(f, res);
> > +        done += res;
> > +        pending -= res;
> > +        if (first && res == size) {
> > +            *buf = src;
> 
> So we've got to assume that buf was allocated by the calling function,
> and since we're modifying the pointer (alternative idea to one above),
> should we unallocate it here?  Can lead to a bad g_free later, or a
> memleak.

My use tends to involve a buffer allocated once:

        uint8_t *mybuffer = g_malloc(...)

        while (aloop) {
            uint8_t *ourdata = mybuffer;

            if (qemu_get_buffer_less_copy(f, &ourdata, size)...) {
                do something with *ourdata
            }
  
        }
        g_free(mybuffer);

The pointer that's passed into qemu_get_buffer_less_copy is only a copy
of the allocation pointer, and thus you're not losing anything when it
changes it.

I've added the following text, does this make it clearer?

 * Note: Since **buf may get changed, the caller should take care to
 *       keep a pointer to the original buffer if it needs to deallocate it.

> > +            return done;
> 
> How about just 'break' instead of return?

Changed.

> 
> > +        } else {
> > +            first = false;
> > +            memcpy(buf, src, res);
> > +            buf += res;
> 
> In either case (break or return), the 'else' can be dropped..

Changed.

Thanks,

> 
> 		Amit
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [PATCH v6 08/47] Add qemu_get_buffer_less_copy to avoid copies some of the time
  2015-05-21  8:45     ` Dr. David Alan Gilbert
@ 2015-05-21  8:58       ` Amit Shah
  0 siblings, 0 replies; 74+ messages in thread
From: Amit Shah @ 2015-05-21  8:58 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, quintela, qemu-devel, pbonzini, david

On (Thu) 21 May 2015 [09:45:19], Dr. David Alan Gilbert wrote:
> * Amit Shah (amit.shah@redhat.com) wrote:

> > So we've got to assume that buf was allocated by the calling function,
> > and since we're modifying the pointer (alternative idea to one above),
> > should we unallocate it here?  Can lead to a bad g_free later, or a
> > memleak.
> 
> My use tends to involve a buffer allocated once:
> 
>         uint8_t *mybuffer = g_malloc(...)
> 
>         while (aloop) {
>             uint8_t *ourdata = mybuffer;
> 
>             if (qemu_get_buffer_less_copy(f, &ourdata, size)...) {
>                 do something with *ourdata
>             }
>   
>         }
>         g_free(mybuffer);
> 
> The pointer that's passed into qemu_get_buffer_less_copy is only a copy
> of the allocation pointer, and thus you're not losing anything when it
> changes it.
> 
> I've added the following text, does this make it clearer?
> 
>  * Note: Since **buf may get changed, the caller should take care to
>  *       keep a pointer to the original buffer if it needs to deallocate it.

Yes, thanks.

		Amit

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [PATCH v6 17/47] ram_debug_dump_bitmap: Dump a migration bitmap as text
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 17/47] ram_debug_dump_bitmap: Dump a migration bitmap as text Dr. David Alan Gilbert (git)
@ 2015-05-21  9:21   ` Amit Shah
  2015-05-21 10:10     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 74+ messages in thread
From: Amit Shah @ 2015-05-21  9:21 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, qemu-devel, pbonzini, david

On (Tue) 14 Apr 2015 [18:03:43], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Misses out lines that are all the expected value so the output
> can be quite compact depending on the circumstance.

s/Misses out lines/Prints out lines to stderr/

s/the expected/an expected/

?

Also, some sentence explaining why this is being added - like helps
with debugging when <something> goes wrong, and how it saves time.  I
really think this is a good function to have, and if we can somehow
use this output to even create a visualisation of how far ahead we are
in the migration process, users can see fancy output which tells them
how fast things are converging (in precopy), and how fast the guest is
updating the memory vs throttling applied, etc.

> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  arch_init.c                   | 40 +++++++++++++++++++++++++++++++++++++++-
>  include/migration/migration.h |  1 +
>  2 files changed, 40 insertions(+), 1 deletion(-)
> 
> diff --git a/arch_init.c b/arch_init.c
> index 3a21f0e..2b0cd18 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -833,13 +833,51 @@ static void reset_ram_globals(void)
>  
>  #define MAX_WAIT 50 /* ms, half buffered_file limit */
>  
> -
>  /* Each of ram_save_setup, ram_save_iterate and ram_save_complete has
>   * long-running RCU critical section.  When rcu-reclaims in the code
>   * start to become numerous it will be necessary to reduce the
>   * granularity of these critical sections.
>   */

This new function should go above this comment.

> +/*
> + * 'expected' is the value you expect the bitmap mostly to be full
> + * of and it won't bother printing lines that are all this value
> + * if 'todump' is null the migration bitmap is dumped.

Missing punctuation?  The last line there is a new sentence, isn't it?

> + */
> +void ram_debug_dump_bitmap(unsigned long *todump, bool expected)
> +{
> +    int64_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
> +
> +    int64_t cur;
> +    int64_t linelen = 128;
> +    char linebuf[129];
> +
> +    if (!todump) {
> +        todump = migration_bitmap;
> +    }
> +
> +    for (cur = 0; cur < ram_pages; cur += linelen) {
> +        int64_t curb;
> +        bool found = false;
> +        /*
> +         * Last line; catch the case where the line length
> +         * is longer than remaining ram
> +         */
> +        if (cur+linelen > ram_pages) {

spacing is off: whitespace around '+'.  (I didn't run checkpatch,
though.)  Similar below too.

> +            linelen = ram_pages - cur;
> +        }
> +        for (curb = 0; curb < linelen; curb++) {
> +            bool thisbit = test_bit(cur+curb, todump);

whitespace around +

		Amit

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [PATCH v6 17/47] ram_debug_dump_bitmap: Dump a migration bitmap as text
  2015-05-21  9:21   ` Amit Shah
@ 2015-05-21 10:10     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 74+ messages in thread
From: Dr. David Alan Gilbert @ 2015-05-21 10:10 UTC (permalink / raw)
  To: Amit Shah; +Cc: aarcange, yamahata, quintela, qemu-devel, pbonzini, david

* Amit Shah (amit.shah@redhat.com) wrote:
> On (Tue) 14 Apr 2015 [18:03:43], Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Misses out lines that are all the expected value so the output
> > can be quite compact depending on the circumstance.
> 
> s/Misses out lines/Prints out lines to stderr/
> 
> s/the expected/an expected/
> 
> ?
> 
> Also, some sentence explaining why this is being added - like helps
> with debugging when <something> goes wrong, and how it saves time.  I
> really think this is a good function to have, and if we can somehow
> use this output to even create a visualisation of how far ahead we are
> in the migration process, users can see fancy output which tells them
> how fast things are converging (in precopy), and how fast the guest is
> updating the memory vs throttling applied, etc.


Changed to:
----
ram_debug_dump_bitmap: Dump a migration bitmap as text

Useful for debugging the migration bitmap and other bitmaps
of the same format (including the sentmap in postcopy).

The bitmap is printed to stderr.
Lines that are all the expected value are excluded so the output
can be quite compact for many bitmaps.
----

I'm not sure it's that useful to endusers; during the migration the bitmap
can be quite a mix and thus the 'exclude' doesn't help much so the output
can be quite big.  Sanidhya Kashyap's series that did repeated logging
is more useful as a tool to find out why things aren't converging.

> 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  arch_init.c                   | 40 +++++++++++++++++++++++++++++++++++++++-
> >  include/migration/migration.h |  1 +
> >  2 files changed, 40 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch_init.c b/arch_init.c
> > index 3a21f0e..2b0cd18 100644
> > --- a/arch_init.c
> > +++ b/arch_init.c
> > @@ -833,13 +833,51 @@ static void reset_ram_globals(void)
> >  
> >  #define MAX_WAIT 50 /* ms, half buffered_file limit */
> >  
> > -
> >  /* Each of ram_save_setup, ram_save_iterate and ram_save_complete has
> >   * long-running RCU critical section.  When rcu-reclaims in the code
> >   * start to become numerous it will be necessary to reduce the
> >   * granularity of these critical sections.
> >   */
> 
> This new function should go above this comment.

Done.

> > +/*
> > + * 'expected' is the value you expect the bitmap mostly to be full
> > + * of and it won't bother printing lines that are all this value
> > + * if 'todump' is null the migration bitmap is dumped.
> 
> Missing punctuation?  The last line there is a new sentence, isn't it?

Replaced by:
/*
 * 'expected' is the value you expect the bitmap mostly to be full
 * of; it won't bother printing lines that are all this value.
 * If 'todump' is null the migration bitmap is dumped.
 */

> > + */
> > +void ram_debug_dump_bitmap(unsigned long *todump, bool expected)
> > +{
> > +    int64_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
> > +
> > +    int64_t cur;
> > +    int64_t linelen = 128;
> > +    char linebuf[129];
> > +
> > +    if (!todump) {
> > +        todump = migration_bitmap;
> > +    }
> > +
> > +    for (cur = 0; cur < ram_pages; cur += linelen) {
> > +        int64_t curb;
> > +        bool found = false;
> > +        /*
> > +         * Last line; catch the case where the line length
> > +         * is longer than remaining ram
> > +         */
> > +        if (cur+linelen > ram_pages) {
> 
> spacing is off: whitespace around '+'.  (I didn't run checkpatch,
> though.)  Similar below too.

Done, oddly checkpatch didn't moan - I don't see why.

> > +            linelen = ram_pages - cur;
> > +        }
> > +        for (curb = 0; curb < linelen; curb++) {
> > +            bool thisbit = test_bit(cur+curb, todump);
> 
> whitespace around +

Done; again checkpatch doesn't moan.

Thanks, 

Dave

> 
> 		Amit
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [PATCH v6 42/47] Postcopy; Handle userfault requests
  2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 42/47] Postcopy; Handle userfault requests Dr. David Alan Gilbert (git)
@ 2015-05-25  9:18   ` zhanghailiang
  2015-05-26  9:50     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 74+ messages in thread
From: zhanghailiang @ 2015-05-25  9:18 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, quintela, peter.huangpeng, amit.shah,
	pbonzini, yayanghy, david

On 2015/4/15 1:04, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> userfaultfd is a Linux syscall that gives an fd that receives a stream
> of notifications of accesses to pages registered with it and allows
> the program to acknowledge those stalls and tell the accessing
> thread to carry on.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>   include/migration/migration.h |   4 +
>   migration/postcopy-ram.c      | 165 +++++++++++++++++++++++++++++++++++++++---
>   trace-events                  |   9 +++
>   3 files changed, 169 insertions(+), 9 deletions(-)
>
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index db06fd2..4d6f33a 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -75,11 +75,15 @@ struct MigrationIncomingState {
>        */
>       QemuEvent      main_thread_load_event;
>
> +    bool           have_fault_thread;
>       QemuThread     fault_thread;
>       QemuSemaphore  fault_thread_sem;
>
>       /* For the kernel to send us notifications */
>       int            userfault_fd;
> +    /* To tell the fault_thread to quit */
> +    int            userfault_quit_fd;
> +
>       QEMUFile *return_path;
>       QemuMutex      rp_mutex;    /* We send replies from multiple threads */
>       void          *postcopy_tmp_page;
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 33aadbc..b2dc3b7 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -49,6 +49,8 @@ struct PostcopyDiscardState {
>    */
>   #if defined(__linux__)
>
> +#include <poll.h>
> +#include <sys/eventfd.h>
>   #include <sys/mman.h>
>   #include <sys/ioctl.h>
>   #include <sys/syscall.h>
> @@ -273,15 +275,41 @@ int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages)
>    */
>   int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
>   {
> -    /* TODO: Join the fault thread once we're sure it will exit */
> -    if (qemu_ram_foreach_block(cleanup_area, mis)) {
> -        return -1;
> +    trace_postcopy_ram_incoming_cleanup_entry();
> +
> +    if (mis->have_fault_thread) {
> +        uint64_t tmp64;
> +
> +        if (qemu_ram_foreach_block(cleanup_area, mis)) {
> +            return -1;
> +        }
> +        /*
> +         * Tell the fault_thread to exit, it's an eventfd that should
> +         * currently be at 0, we're going to inc it to 1
> +         */
> +        tmp64 = 1;
> +        if (write(mis->userfault_quit_fd, &tmp64, 8) == 8) {
> +            trace_postcopy_ram_incoming_cleanup_join();
> +            qemu_thread_join(&mis->fault_thread);
> +        } else {
> +            /* Not much we can do here, but may as well report it */
> +            error_report("%s: incing userfault_quit_fd: %s", __func__,
> +                         strerror(errno));
> +        }
> +        trace_postcopy_ram_incoming_cleanup_closeuf();
> +        close(mis->userfault_fd);
> +        close(mis->userfault_quit_fd);
> +        mis->have_fault_thread = false;
>       }
>
> +    postcopy_state_set(mis, POSTCOPY_INCOMING_END);
> +    migrate_send_rp_shut(mis, qemu_file_get_error(mis->file) != 0);
> +
>       if (mis->postcopy_tmp_page) {
>           munmap(mis->postcopy_tmp_page, getpagesize());
>           mis->postcopy_tmp_page = NULL;
>       }
> +    trace_postcopy_ram_incoming_cleanup_exit();
>       return 0;
>   }
>
> @@ -320,31 +348,150 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
>   static void *postcopy_ram_fault_thread(void *opaque)
>   {
>       MigrationIncomingState *mis = (MigrationIncomingState *)opaque;
> -
> -    fprintf(stderr, "postcopy_ram_fault_thread\n");
> -    /* TODO: In later patch */
> +    uint64_t hostaddr; /* The kernel always gives us 64 bit, not a pointer */
> +    int ret;
> +    size_t hostpagesize = getpagesize();
> +    RAMBlock *rb = NULL;
> +    RAMBlock *last_rb = NULL; /* last RAMBlock we sent part of */
> +    uint8_t *local_tmp_page;
> +
> +    trace_postcopy_ram_fault_thread_entry();
>       qemu_sem_post(&mis->fault_thread_sem);
> -    while (1) {
> -        /* TODO: In later patch */
> +
> +    local_tmp_page = mmap(NULL, getpagesize(),
> +                          PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS,
> +                          -1, 0);
> +    if (!local_tmp_page) {
> +        error_report("%s mapping local tmp page: %s", __func__,
> +                     strerror(errno));
> +        return NULL;
>       }
> +    if (madvise(local_tmp_page, getpagesize(), MADV_DONTFORK)) {

What's this 'local_tmp_page' used for ? I don't find where it is used in this function.
Besides, there is a helper function qemu_madvise() in qemu, maybe you should use it :)

Thanks.

> +        munmap(local_tmp_page, getpagesize());
> +        error_report("%s postcopy local page DONTFORK: %s", __func__,
> +                     strerror(errno));
> +        return NULL;
> +    }
> +
> +    while (true) {
> +        ram_addr_t rb_offset;
> +        ram_addr_t in_raspace;
> +        struct pollfd pfd[2];
> +
> +        /*
> +         * We're mainly waiting for the kernel to give us a faulting HVA,
> +         * however we can be told to quit via userfault_quit_fd which is
> +         * an eventfd
> +         */
> +        pfd[0].fd = mis->userfault_fd;
> +        pfd[0].events = POLLIN;
> +        pfd[0].revents = 0;
> +        pfd[1].fd = mis->userfault_quit_fd;
> +        pfd[1].events = POLLIN; /* Waiting for eventfd to go positive */
> +        pfd[1].revents = 0;
> +
> +        if (poll(pfd, 2, -1 /* Wait forever */) == -1) {
> +            error_report("%s: userfault poll: %s", __func__, strerror(errno));
> +            break;
> +        }
> +
> +        if (pfd[1].revents) {
> +            trace_postcopy_ram_fault_thread_quit();
> +            break;
> +        }
> +
> +        ret = read(mis->userfault_fd, &hostaddr, sizeof(hostaddr));
> +        if (ret != sizeof(hostaddr)) {
> +            if (errno == EAGAIN) {
> +                /*
> +                 * if a wake up happens on the other thread just after
> +                 * the poll, there is nothing to read.
> +                 */
> +                continue;
> +            }
> +            if (ret < 0) {
> +                error_report("%s: Failed to read full userfault hostaddr: %s",
> +                             __func__, strerror(errno));
> +                break;
> +            } else {
> +                error_report("%s: Read %d bytes from userfaultfd expected %zd",
> +                             __func__, ret, sizeof(hostaddr));
> +                break; /* Lost alignment, don't know what we'd read next */
> +            }
> +        }
>
> +        rb = qemu_ram_block_from_host((void *)(uintptr_t)hostaddr, true,
> +                                      &in_raspace, &rb_offset);
> +        if (!rb) {
> +            error_report("postcopy_ram_fault_thread: Fault outside guest: %"
> +                         PRIx64, hostaddr);
> +            break;
> +        }
> +
> +        trace_postcopy_ram_fault_thread_request(hostaddr,
> +                                                qemu_ram_get_idstr(rb),
> +                                                rb_offset);
> +
> +        /*
> +         * Send the request to the source - we want to request one
> +         * of our host page sizes (which is >= TPS)
> +         */
> +        if (rb != last_rb) {
> +            last_rb = rb;
> +            migrate_send_rp_req_pages(mis, qemu_ram_get_idstr(rb),
> +                                     rb_offset, hostpagesize);
> +        } else {
> +            /* Save some space */
> +            migrate_send_rp_req_pages(mis, NULL,
> +                                     rb_offset, hostpagesize);
> +        }
> +    }
> +    munmap(local_tmp_page, getpagesize());
> +    trace_postcopy_ram_fault_thread_exit();
>       return NULL;
>   }
>
>   int postcopy_ram_enable_notify(MigrationIncomingState *mis)
>   {
> -    /* Create the fault handler thread and wait for it to be ready */
> +    /* Open the fd for the kernel to give us userfaults */
> +    mis->userfault_fd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
> +    if (mis->userfault_fd == -1) {
> +        error_report("%s: Failed to open userfault fd: %s", __func__,
> +                     strerror(errno));
> +        return -1;
> +    }
> +
> +    /*
> +     * Although the host check already tested the API, we need to
> +     * do the check again as an ABI handshake on the new fd.
> +     */
> +    if (!ufd_version_check(mis->userfault_fd)) {
> +        return -1;
> +    }
> +
> +    /* Now an eventfd we use to tell the fault-thread to quit */
> +    mis->userfault_quit_fd = eventfd(0, EFD_CLOEXEC);
> +    if (mis->userfault_quit_fd == -1) {
> +        error_report("%s: Opening userfault_quit_fd: %s", __func__,
> +                     strerror(errno));
> +        close(mis->userfault_fd);
> +        return -1;
> +    }
> +
>       qemu_sem_init(&mis->fault_thread_sem, 0);
>       qemu_thread_create(&mis->fault_thread, "postcopy/fault",
>                          postcopy_ram_fault_thread, mis, QEMU_THREAD_JOINABLE);
>       qemu_sem_wait(&mis->fault_thread_sem);
>       qemu_sem_destroy(&mis->fault_thread_sem);
> +    mis->have_fault_thread = true;
>
>       /* Mark so that we get notified of accesses to unwritten areas */
>       if (qemu_ram_foreach_block(ram_block_enable_notify, mis)) {
>           return -1;
>       }
>
> +    trace_postcopy_ram_enable_notify();
> +
>       return 0;
>   }
>
> diff --git a/trace-events b/trace-events
> index b65740c..72a65fa 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -1500,6 +1500,15 @@ postcopy_cleanup_area(const char *ramblock, void *host_addr, size_t offset, size
>   postcopy_ram_discard_range(void *start, void *end) "%p,%p"
>   postcopy_init_area(const char *ramblock, void *host_addr, size_t offset, size_t length) "%s: %p offset=%zx length=%zx"
>   postcopy_place_page(void *host_addr, bool all_zero) "host=%p all_zero=%d"
> +postcopy_ram_enable_notify(void) ""
> +postcopy_ram_fault_thread_entry(void) ""
> +postcopy_ram_fault_thread_exit(void) ""
> +postcopy_ram_fault_thread_quit(void) ""
> +postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset) "Request for HVA=%" PRIx64 " rb=%s offset=%zx"
> +postcopy_ram_incoming_cleanup_closeuf(void) ""
> +postcopy_ram_incoming_cleanup_entry(void) ""
> +postcopy_ram_incoming_cleanup_exit(void) ""
> +postcopy_ram_incoming_cleanup_join(void) ""
>
>   # kvm-all.c
>   kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [PATCH v6 42/47] Postcopy; Handle userfault requests
  2015-05-25  9:18   ` zhanghailiang
@ 2015-05-26  9:50     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 74+ messages in thread
From: Dr. David Alan Gilbert @ 2015-05-26  9:50 UTC (permalink / raw)
  To: zhanghailiang
  Cc: aarcange, yamahata, quintela, qemu-devel, peter.huangpeng,
	amit.shah, pbonzini, david

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> On 2015/4/15 1:04, Dr. David Alan Gilbert (git) wrote:
> >From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> >userfaultfd is a Linux syscall that gives an fd that receives a stream
> >of notifications of accesses to pages registered with it and allows
> >the program to acknowledge those stalls and tell the accessing
> >thread to carry on.
> >
> >Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >---
> >  include/migration/migration.h |   4 +
> >  migration/postcopy-ram.c      | 165 +++++++++++++++++++++++++++++++++++++++---
> >  trace-events                  |   9 +++
> >  3 files changed, 169 insertions(+), 9 deletions(-)
> >
> >diff --git a/include/migration/migration.h b/include/migration/migration.h
> >index db06fd2..4d6f33a 100644
> >--- a/include/migration/migration.h
> >+++ b/include/migration/migration.h
> >@@ -75,11 +75,15 @@ struct MigrationIncomingState {
> >       */
> >      QemuEvent      main_thread_load_event;
> >
> >+    bool           have_fault_thread;
> >      QemuThread     fault_thread;
> >      QemuSemaphore  fault_thread_sem;
> >
> >      /* For the kernel to send us notifications */
> >      int            userfault_fd;
> >+    /* To tell the fault_thread to quit */
> >+    int            userfault_quit_fd;
> >+
> >      QEMUFile *return_path;
> >      QemuMutex      rp_mutex;    /* We send replies from multiple threads */
> >      void          *postcopy_tmp_page;
> >diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> >index 33aadbc..b2dc3b7 100644
> >--- a/migration/postcopy-ram.c
> >+++ b/migration/postcopy-ram.c
> >@@ -49,6 +49,8 @@ struct PostcopyDiscardState {
> >   */
> >  #if defined(__linux__)
> >
> >+#include <poll.h>
> >+#include <sys/eventfd.h>
> >  #include <sys/mman.h>
> >  #include <sys/ioctl.h>
> >  #include <sys/syscall.h>
> >@@ -273,15 +275,41 @@ int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages)
> >   */
> >  int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
> >  {
> >-    /* TODO: Join the fault thread once we're sure it will exit */
> >-    if (qemu_ram_foreach_block(cleanup_area, mis)) {
> >-        return -1;
> >+    trace_postcopy_ram_incoming_cleanup_entry();
> >+
> >+    if (mis->have_fault_thread) {
> >+        uint64_t tmp64;
> >+
> >+        if (qemu_ram_foreach_block(cleanup_area, mis)) {
> >+            return -1;
> >+        }
> >+        /*
> >+         * Tell the fault_thread to exit, it's an eventfd that should
> >+         * currently be at 0, we're going to inc it to 1
> >+         */
> >+        tmp64 = 1;
> >+        if (write(mis->userfault_quit_fd, &tmp64, 8) == 8) {
> >+            trace_postcopy_ram_incoming_cleanup_join();
> >+            qemu_thread_join(&mis->fault_thread);
> >+        } else {
> >+            /* Not much we can do here, but may as well report it */
> >+            error_report("%s: incing userfault_quit_fd: %s", __func__,
> >+                         strerror(errno));
> >+        }
> >+        trace_postcopy_ram_incoming_cleanup_closeuf();
> >+        close(mis->userfault_fd);
> >+        close(mis->userfault_quit_fd);
> >+        mis->have_fault_thread = false;
> >      }
> >
> >+    postcopy_state_set(mis, POSTCOPY_INCOMING_END);
> >+    migrate_send_rp_shut(mis, qemu_file_get_error(mis->file) != 0);
> >+
> >      if (mis->postcopy_tmp_page) {
> >          munmap(mis->postcopy_tmp_page, getpagesize());
> >          mis->postcopy_tmp_page = NULL;
> >      }
> >+    trace_postcopy_ram_incoming_cleanup_exit();
> >      return 0;
> >  }
> >
> >@@ -320,31 +348,150 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
> >  static void *postcopy_ram_fault_thread(void *opaque)
> >  {
> >      MigrationIncomingState *mis = (MigrationIncomingState *)opaque;
> >-
> >-    fprintf(stderr, "postcopy_ram_fault_thread\n");
> >-    /* TODO: In later patch */
> >+    uint64_t hostaddr; /* The kernel always gives us 64 bit, not a pointer */
> >+    int ret;
> >+    size_t hostpagesize = getpagesize();
> >+    RAMBlock *rb = NULL;
> >+    RAMBlock *last_rb = NULL; /* last RAMBlock we sent part of */
> >+    uint8_t *local_tmp_page;
> >+
> >+    trace_postcopy_ram_fault_thread_entry();
> >      qemu_sem_post(&mis->fault_thread_sem);
> >-    while (1) {
> >-        /* TODO: In later patch */
> >+
> >+    local_tmp_page = mmap(NULL, getpagesize(),
> >+                          PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS,
> >+                          -1, 0);
> >+    if (!local_tmp_page) {
> >+        error_report("%s mapping local tmp page: %s", __func__,
> >+                     strerror(errno));
> >+        return NULL;
> >      }
> >+    if (madvise(local_tmp_page, getpagesize(), MADV_DONTFORK)) {
> 
> What's this 'local_tmp_page' used for ? I don't find where it is used in this function.

Yes, that was old code; it's already gone.

> Besides, there is a helper function qemu_madvise() in qemu, maybe you should use it :)

Thanks;  I could do for this one but I've got to be careful;
qemu_madvise is mainly for portability, and this code is Linux specific
anyway, and some of the QEMU_MADV_ flags aren't valid for all of the OSs
anyway.

Dave

> 
> Thanks.
> 
> >+        munmap(local_tmp_page, getpagesize());
> >+        error_report("%s postcopy local page DONTFORK: %s", __func__,
> >+                     strerror(errno));
> >+        return NULL;
> >+    }
> >+
> >+    while (true) {
> >+        ram_addr_t rb_offset;
> >+        ram_addr_t in_raspace;
> >+        struct pollfd pfd[2];
> >+
> >+        /*
> >+         * We're mainly waiting for the kernel to give us a faulting HVA,
> >+         * however we can be told to quit via userfault_quit_fd which is
> >+         * an eventfd
> >+         */
> >+        pfd[0].fd = mis->userfault_fd;
> >+        pfd[0].events = POLLIN;
> >+        pfd[0].revents = 0;
> >+        pfd[1].fd = mis->userfault_quit_fd;
> >+        pfd[1].events = POLLIN; /* Waiting for eventfd to go positive */
> >+        pfd[1].revents = 0;
> >+
> >+        if (poll(pfd, 2, -1 /* Wait forever */) == -1) {
> >+            error_report("%s: userfault poll: %s", __func__, strerror(errno));
> >+            break;
> >+        }
> >+
> >+        if (pfd[1].revents) {
> >+            trace_postcopy_ram_fault_thread_quit();
> >+            break;
> >+        }
> >+
> >+        ret = read(mis->userfault_fd, &hostaddr, sizeof(hostaddr));
> >+        if (ret != sizeof(hostaddr)) {
> >+            if (errno == EAGAIN) {
> >+                /*
> >+                 * if a wake up happens on the other thread just after
> >+                 * the poll, there is nothing to read.
> >+                 */
> >+                continue;
> >+            }
> >+            if (ret < 0) {
> >+                error_report("%s: Failed to read full userfault hostaddr: %s",
> >+                             __func__, strerror(errno));
> >+                break;
> >+            } else {
> >+                error_report("%s: Read %d bytes from userfaultfd expected %zd",
> >+                             __func__, ret, sizeof(hostaddr));
> >+                break; /* Lost alignment, don't know what we'd read next */
> >+            }
> >+        }
> >
> >+        rb = qemu_ram_block_from_host((void *)(uintptr_t)hostaddr, true,
> >+                                      &in_raspace, &rb_offset);
> >+        if (!rb) {
> >+            error_report("postcopy_ram_fault_thread: Fault outside guest: %"
> >+                         PRIx64, hostaddr);
> >+            break;
> >+        }
> >+
> >+        trace_postcopy_ram_fault_thread_request(hostaddr,
> >+                                                qemu_ram_get_idstr(rb),
> >+                                                rb_offset);
> >+
> >+        /*
> >+         * Send the request to the source - we want to request one
> >+         * of our host page sizes (which is >= TPS)
> >+         */
> >+        if (rb != last_rb) {
> >+            last_rb = rb;
> >+            migrate_send_rp_req_pages(mis, qemu_ram_get_idstr(rb),
> >+                                     rb_offset, hostpagesize);
> >+        } else {
> >+            /* Save some space */
> >+            migrate_send_rp_req_pages(mis, NULL,
> >+                                     rb_offset, hostpagesize);
> >+        }
> >+    }
> >+    munmap(local_tmp_page, getpagesize());
> >+    trace_postcopy_ram_fault_thread_exit();
> >      return NULL;
> >  }
> >
> >  int postcopy_ram_enable_notify(MigrationIncomingState *mis)
> >  {
> >-    /* Create the fault handler thread and wait for it to be ready */
> >+    /* Open the fd for the kernel to give us userfaults */
> >+    mis->userfault_fd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
> >+    if (mis->userfault_fd == -1) {
> >+        error_report("%s: Failed to open userfault fd: %s", __func__,
> >+                     strerror(errno));
> >+        return -1;
> >+    }
> >+
> >+    /*
> >+     * Although the host check already tested the API, we need to
> >+     * do the check again as an ABI handshake on the new fd.
> >+     */
> >+    if (!ufd_version_check(mis->userfault_fd)) {
> >+        return -1;
> >+    }
> >+
> >+    /* Now an eventfd we use to tell the fault-thread to quit */
> >+    mis->userfault_quit_fd = eventfd(0, EFD_CLOEXEC);
> >+    if (mis->userfault_quit_fd == -1) {
> >+        error_report("%s: Opening userfault_quit_fd: %s", __func__,
> >+                     strerror(errno));
> >+        close(mis->userfault_fd);
> >+        return -1;
> >+    }
> >+
> >      qemu_sem_init(&mis->fault_thread_sem, 0);
> >      qemu_thread_create(&mis->fault_thread, "postcopy/fault",
> >                         postcopy_ram_fault_thread, mis, QEMU_THREAD_JOINABLE);
> >      qemu_sem_wait(&mis->fault_thread_sem);
> >      qemu_sem_destroy(&mis->fault_thread_sem);
> >+    mis->have_fault_thread = true;
> >
> >      /* Mark so that we get notified of accesses to unwritten areas */
> >      if (qemu_ram_foreach_block(ram_block_enable_notify, mis)) {
> >          return -1;
> >      }
> >
> >+    trace_postcopy_ram_enable_notify();
> >+
> >      return 0;
> >  }
> >
> >diff --git a/trace-events b/trace-events
> >index b65740c..72a65fa 100644
> >--- a/trace-events
> >+++ b/trace-events
> >@@ -1500,6 +1500,15 @@ postcopy_cleanup_area(const char *ramblock, void *host_addr, size_t offset, size
> >  postcopy_ram_discard_range(void *start, void *end) "%p,%p"
> >  postcopy_init_area(const char *ramblock, void *host_addr, size_t offset, size_t length) "%s: %p offset=%zx length=%zx"
> >  postcopy_place_page(void *host_addr, bool all_zero) "host=%p all_zero=%d"
> >+postcopy_ram_enable_notify(void) ""
> >+postcopy_ram_fault_thread_entry(void) ""
> >+postcopy_ram_fault_thread_exit(void) ""
> >+postcopy_ram_fault_thread_quit(void) ""
> >+postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset) "Request for HVA=%" PRIx64 " rb=%s offset=%zx"
> >+postcopy_ram_incoming_cleanup_closeuf(void) ""
> >+postcopy_ram_incoming_cleanup_entry(void) ""
> >+postcopy_ram_incoming_cleanup_exit(void) ""
> >+postcopy_ram_incoming_cleanup_join(void) ""
> >
> >  # kvm-all.c
> >  kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
> >
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [PATCH v6 11/47] Return path: Open a return path on QEMUFile for sockets
  2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 11/47] Return path: Open a return path on QEMUFile for sockets Dr. David Alan Gilbert (git)
@ 2015-06-10  9:00   ` Amit Shah
  0 siblings, 0 replies; 74+ messages in thread
From: Amit Shah @ 2015-06-10  9:00 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, qemu-devel, pbonzini, david

On (Tue) 14 Apr 2015 [18:03:37], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Postcopy needs a method to send messages from the destination back to
> the source, this is the 'return path'.
> 
> Wire it up for 'socket' QEMUFile's using a dup'd fd.

I'm not too sure we should create a new fd; just use the same one
since sockets are supposed to be used that way.

I suspect, though, that using an entirely new connection could
simplify a few things (like the next patch).

In a private chat, Dave mentioned he'd give this some thought (and
give it a go) -- hope we don't end up needing to do things the dup
way.

		Amit

^ permalink raw reply	[flat|nested] 74+ messages in thread

end of thread, other threads:[~2015-06-10  9:00 UTC | newest]

Thread overview: 74+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-14 17:03 [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Dr. David Alan Gilbert (git)
2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 01/47] Start documenting how postcopy works Dr. David Alan Gilbert (git)
2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 02/47] Split header writing out of qemu_savevm_state_begin Dr. David Alan Gilbert (git)
2015-05-11 11:16   ` Amit Shah
2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 03/47] qemu_ram_foreach_block: pass up error value, and down the ramblock name Dr. David Alan Gilbert (git)
2015-05-15 10:38   ` Amit Shah
2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 04/47] Add qemu_get_counted_string to read a string prefixed by a count byte Dr. David Alan Gilbert (git)
2015-05-15 13:50   ` Amit Shah
2015-05-15 14:06     ` Dr. David Alan Gilbert
2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 05/47] Create MigrationIncomingState Dr. David Alan Gilbert (git)
2015-05-18  6:58   ` Amit Shah
2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 06/47] Provide runtime Target page information Dr. David Alan Gilbert (git)
2015-05-18  7:06   ` Amit Shah
2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 07/47] Move copy out of qemu_peek_buffer Dr. David Alan Gilbert (git)
2015-05-21  6:47   ` Amit Shah
2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 08/47] Add qemu_get_buffer_less_copy to avoid copies some of the time Dr. David Alan Gilbert (git)
2015-05-21  7:09   ` Amit Shah
2015-05-21  8:45     ` Dr. David Alan Gilbert
2015-05-21  8:58       ` Amit Shah
2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 09/47] Add wrapper for setting blocking status on a QEMUFile Dr. David Alan Gilbert (git)
2015-05-18  7:35   ` Amit Shah
2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 10/47] Rename save_live_complete to save_live_complete_precopy Dr. David Alan Gilbert (git)
2015-05-18  7:35   ` Amit Shah
2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 11/47] Return path: Open a return path on QEMUFile for sockets Dr. David Alan Gilbert (git)
2015-06-10  9:00   ` Amit Shah
2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 12/47] Return path: socket_writev_buffer: Block even on non-blocking fd's Dr. David Alan Gilbert (git)
2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 13/47] Migration commands Dr. David Alan Gilbert (git)
2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 14/47] Return path: Control commands Dr. David Alan Gilbert (git)
2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 15/47] Return path: Send responses from destination to source Dr. David Alan Gilbert (git)
2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 16/47] Return path: Source handling of return path Dr. David Alan Gilbert (git)
2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 17/47] ram_debug_dump_bitmap: Dump a migration bitmap as text Dr. David Alan Gilbert (git)
2015-05-21  9:21   ` Amit Shah
2015-05-21 10:10     ` Dr. David Alan Gilbert
2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 18/47] Move loadvm_handlers into MigrationIncomingState Dr. David Alan Gilbert (git)
2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 19/47] Rework loadvm path for subloops Dr. David Alan Gilbert (git)
2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 20/47] Add migration-capability boolean for postcopy-ram Dr. David Alan Gilbert (git)
2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 21/47] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages Dr. David Alan Gilbert (git)
2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 22/47] MIG_CMD_PACKAGED: Send a packaged chunk of migration stream Dr. David Alan Gilbert (git)
2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 23/47] migrate_init: Call from savevm Dr. David Alan Gilbert (git)
2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 24/47] Modify save_live_pending for postcopy Dr. David Alan Gilbert (git)
2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 25/47] postcopy: OS support test Dr. David Alan Gilbert (git)
2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 26/47] migrate_start_postcopy: Command to trigger transition to postcopy Dr. David Alan Gilbert (git)
2015-04-14 17:38   ` Eric Blake
2015-04-14 17:40     ` Dr. David Alan Gilbert
2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 27/47] MIGRATION_STATUS_POSTCOPY_ACTIVE: Add new migration state Dr. David Alan Gilbert (git)
2015-04-14 17:40   ` Eric Blake
2015-04-14 18:00     ` Dr. David Alan Gilbert
2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 28/47] Add qemu_savevm_state_complete_postcopy Dr. David Alan Gilbert (git)
2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 29/47] Postcopy: Maintain sentmap and calculate discard Dr. David Alan Gilbert (git)
2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 30/47] postcopy: Incoming initialisation Dr. David Alan Gilbert (git)
2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 31/47] postcopy: ram_enable_notify to switch on userfault Dr. David Alan Gilbert (git)
2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 32/47] Postcopy: Postcopy startup in migration thread Dr. David Alan Gilbert (git)
2015-04-14 17:03 ` [Qemu-devel] [PATCH v6 33/47] Postcopy end in migration_thread Dr. David Alan Gilbert (git)
2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 34/47] Page request: Add MIG_RP_MSG_REQ_PAGES reverse command Dr. David Alan Gilbert (git)
2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 35/47] Page request: Process incoming page request Dr. David Alan Gilbert (git)
2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 36/47] Page request: Consume pages off the post-copy queue Dr. David Alan Gilbert (git)
2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 37/47] postcopy_ram.c: place_page and helpers Dr. David Alan Gilbert (git)
2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 38/47] Postcopy: Use helpers to map pages during migration Dr. David Alan Gilbert (git)
2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 39/47] qemu_ram_block_from_host Dr. David Alan Gilbert (git)
2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 40/47] Don't sync dirty bitmaps in postcopy Dr. David Alan Gilbert (git)
2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 41/47] Host page!=target page: Cleanup bitmaps Dr. David Alan Gilbert (git)
2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 42/47] Postcopy; Handle userfault requests Dr. David Alan Gilbert (git)
2015-05-25  9:18   ` zhanghailiang
2015-05-26  9:50     ` Dr. David Alan Gilbert
2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 43/47] Start up a postcopy/listener thread ready for incoming page data Dr. David Alan Gilbert (git)
2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 44/47] postcopy: Wire up loadvm_postcopy_handle_ commands Dr. David Alan Gilbert (git)
2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 45/47] End of migration for postcopy Dr. David Alan Gilbert (git)
2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 46/47] Disable mlock around incoming postcopy Dr. David Alan Gilbert (git)
2015-04-14 17:04 ` [Qemu-devel] [PATCH v6 47/47] Inhibit ballooning during postcopy Dr. David Alan Gilbert (git)
2015-04-27  8:04 ` [Qemu-devel] [PATCH v6 00/47] Postcopy implementation Li, Liang Z
2015-04-29 17:23   ` Dr. David Alan Gilbert
2015-04-30  1:09     ` Li, Liang Z
     [not found]       ` <20150505150112.GM2126@work-vm>
     [not found]         ` <F2CBF3009FA73547804AE4C663CAB28E50F0E1@shsmsx102.ccr.corp.intel.com>
     [not found]           ` <20150506083056.GB2204@work-vm>
2015-05-07  1:21             ` Li, Liang Z
2015-05-07  8:01               ` Dr. David Alan Gilbert

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.