From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:33105)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1X37W9-0008O8-T8 for qemu-devel@nongnu.org;
	Fri, 04 Jul 2014 13:43:32 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1X37W3-0000jq-M1 for qemu-devel@nongnu.org;
	Fri, 04 Jul 2014 13:43:25 -0400
Received: from mx1.redhat.com ([209.132.183.28]:20546)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1X37W3-0000jj-Ar for qemu-devel@nongnu.org;
	Fri, 04 Jul 2014 13:43:19 -0400
From: "Dr. David Alan Gilbert (git)"
Date: Fri, 4 Jul 2014 18:41:57 +0100
Message-Id: <1404495717-4239-47-git-send-email-dgilbert@redhat.com>
In-Reply-To: <1404495717-4239-1-git-send-email-dgilbert@redhat.com>
References: <1404495717-4239-1-git-send-email-dgilbert@redhat.com>
Subject: [Qemu-devel] [PATCH 46/46] Start documenting how postcopy works.
To: qemu-devel@nongnu.org
Cc: aarcange@redhat.com, yamahata@private.email.ne.jp,
	lilei@linux.vnet.ibm.com, quintela@redhat.com

From: "Dr. David Alan Gilbert"

Signed-off-by: Dr. David Alan Gilbert
---
 docs/migration.txt | 148 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 148 insertions(+)

diff --git a/docs/migration.txt b/docs/migration.txt
index 0492a45..dbd5e5f 100644
--- a/docs/migration.txt
+++ b/docs/migration.txt
@@ -294,3 +294,151 @@ save/send this state when we are in the middle of a pio operation
 (that is what ide_drive_pio_state_needed() checks). If DRQ_STAT is
 not enabled, the values on that fields are garbage and don't need to
 be sent.
+
+= Return path =
+
+In most migration scenarios there is only a single data path that runs
+from the source VM to the destination, typically along a single fd (although
+possibly with another fd or similar for some fast way of throwing pages across).
+
+However, some uses need two-way communication; in particular the Postcopy
+destination needs to be able to request pages on demand from the source.
+
+For these scenarios there is a 'return path' from the destination to the source;
+qemu_file_get_return_path(QEMUFile* fwdpath) gives the QEMUFile* for the return
+path.
+
+  Source side
+     Forward path - written by migration thread
+     Return path  - opened by main thread, read by fd_handler on main thread
+
+  Destination side
+     Forward path - read by main thread
+     Return path  - opened by main thread, written by main thread AND postcopy
+                    thread (protected by rp_mutex)
+
+Opening the return path generally sets the fd to be non-blocking so that a
+failed destination can't block the source; and since the non-blockingness seems
+to follow both directions it does alter the semantics of the forward path.
+
+= Postcopy =
+'Postcopy' migration is a way to deal with migrations that refuse to converge;
+its advantage is that there is an upper bound on the amount of migration traffic
+and the time it takes; the downside is that during the postcopy phase, a failure
+of *either* side or the network connection causes the guest to be lost.
+
+In postcopy the destination CPUs are started before all the memory has been
+transferred, and accesses to pages that are yet to be transferred cause
+a fault that's translated by QEMU into a request to the source QEMU.
+
+Postcopy can be combined with precopy (i.e. normal migration) so that if precopy
+doesn't finish in a given time the switch is automatically made to postcopy.
+
+=== Enabling postcopy ===
+
+To enable pure postcopy:
+
+migrate_set_capability x-postcopy-ram on
+
+To add a period of precopy:
+
+migrate_set_parameter x-postcopy-start-time 500
+
+(time in ms)
+
+=== Postcopy states ===
+Postcopy moves through a series of states (see postcopy_ram_state)
+from ADVISE->LISTEN->RUNNING->END
+
+  Advise: Set at the start of migration if postcopy is enabled, even
+          if it hasn't passed the start-time threshold; here the destination
+          checks its OS has the support needed for postcopy, and performs
+          setup to ensure the RAM mappings are suitable for later postcopy.
+          (Triggered by reception of POSTCOPY_RAM_ADVISE command)
+
+Precopy now carries on as normal, until the point that the source
+hits the start-time threshold and transitions to postcopy. The source
+stops its CPUs and transmits a 'discard bitmap' indicating pages that
+have been previously sent but are now dirty again and hence are out of
+date on the destination.
+
+The migration stream now contains a 'package' containing its own chunk
+of migration stream, followed by a return to a normal stream containing
+page data. The package (sent as CMD_PACKAGED) contains the commands to
+cycle the states on the destination, followed by all of the device
+state excluding RAM. This lets the destination request pages from the
+source in parallel with loading device state; this is required since
+some devices (virtio) access guest memory during device initialisation.
+
+  Listen: The first command in the package, POSTCOPY_RAM_LISTEN, switches
+          the destination state to Listen, and starts a new thread
+          (the 'listen thread') which takes over the job of receiving
+          pages off the migration stream, while the main thread carries
+          on processing the blob. With this thread able to process page
+          reception, the destination now 'sensitises' the RAM to detect
+          any access to missing pages (on Linux using the 'userfault'
+          system).
+
+The package now contains all the remaining state data and the command
+to transition to the next state.
+
+  Running: POSTCOPY_RAM_RUN causes the destination to synchronise all
+          state and start the CPUs and IO devices running. The main
+          thread now finishes processing the migration package and
+          carries on as it would for normal precopy migration
+          (although it can't do the cleanup it would do as it
+          finishes a normal migration).
+
+Page data is sent from the source to the destination as part of a
+linear scan (like normal migration) and is received by the 'listen thread'.
+When the destination tries to use a page it hasn't got, it requests
+it from the source (down the return path) and the source sends this
+page in the same stream. When the source has transmitted all pages
+it sends a POSTCOPY_RAM_END command to transition to
+
+  End: The listen thread can now quit and perform the cleanup of migration
+       state; the migration is now complete.
+
+=== Source side page maps ===
+The source side keeps two bitmaps during postcopy: the 'migration bitmap'
+and the 'sent map'. The 'migration bitmap' is basically the same as in
+the precopy case, and holds a bit to indicate that a page is 'dirty' -
+i.e. needs sending. During the precopy phase this is updated as the CPU
+dirties pages; however, during postcopy the source CPUs are stopped and
+nothing should dirty anything any more.
+
+The 'sent map' is used for the transition to postcopy. It is a bitmap that
+has a bit set whenever a page is sent to the destination; however, during
+the transition to postcopy mode it is masked against the migration bitmap
+(sentmap &= migrationbitmap) to generate a bitmap recording pages that
+have previously been sent but are now dirty again. This masked
+sentmap is sent to the destination, which discards those now dirty pages
+before starting the CPUs.
+
+Note that once in postcopy mode, the sent map is still updated; however, its
+contents are not consistent as a local view of what has been sent, since it
+only holds the masked result.
+
+=== Destination side page maps ===
+(Needs to be changed so we can update both easily - at the moment updates are done
+ with a lock)
+The destination keeps a 'requested map' and a 'received map'.
+Both maps are initially 0; as pages are received the bits are set in 'received map'.
+Incoming requests from the kernel cause the bit to be set in the 'requested map'.
+When a page is received that is marked as 'requested' the kernel is notified.
+If the kernel requests a page that has already been 'received' the kernel is notified
+without re-requesting.
+
+This leads to three valid page states:
+page states:
+    missing (!rc,!rq)  - page not yet received or requested
+    received (rc,!rq)  - page received
+    requested (!rc,rq) - page requested but not yet received
+
+state transitions:
+      received -> missing   (only during setup/discard)
+
+      missing -> received   (normal incoming page)
+      requested -> received (incoming page previously requested)
+      missing -> requested  (userfault request)
+
+-- 
+1.9.3
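
Note on the destination side page maps described in the patch: the three page
states and four transitions can be sketched in C roughly as below. This is
only an illustration under stated assumptions, not the code in this series.
All names (page_maps, on_kernel_request, on_page_received, on_discard) are
hypothetical, the two maps are reduced to a pair of flags for a single page,
and the lock mentioned in the patch is left out. What happens when the kernel
asks again for a page that is already 'requested' is not stated in the text;
the sketch assumes no second request is sent.

```c
#include <stdbool.h>

/* Hypothetical per-page view of the 'received map' (rc) and the
 * 'requested map' (rq); real code would keep two bitmaps under a lock. */
struct page_maps {
    bool received;   /* rc */
    bool requested;  /* rq */
};

enum page_state { PAGE_MISSING, PAGE_RECEIVED, PAGE_REQUESTED };

/* Derive the state from the two bits; (rc,rq) is not a valid state, and
 * the handlers below never produce it. */
static enum page_state page_state(const struct page_maps *p)
{
    if (p->received) {
        return PAGE_RECEIVED;
    }
    return p->requested ? PAGE_REQUESTED : PAGE_MISSING;
}

/* Userfault request from the kernel.
 * Returns true if a request must be sent to the source.
 * *notify_kernel is set when the page is already present, so the kernel
 * can be told immediately without re-requesting. */
static bool on_kernel_request(struct page_maps *p, bool *notify_kernel)
{
    if (p->received) {
        *notify_kernel = true;
        return false;
    }
    *notify_kernel = false;
    if (p->requested) {
        return false;            /* assumption: do not ask the source twice */
    }
    p->requested = true;         /* missing -> requested */
    return true;
}

/* A page arrived on the migration stream.
 * Returns true if the kernel was waiting for it and must be notified. */
static bool on_page_received(struct page_maps *p)
{
    bool was_requested = p->requested;

    p->received = true;          /* missing/requested -> received */
    p->requested = false;
    return was_requested;
}

/* Discard during setup or on receipt of the discard bitmap. */
static void on_discard(struct page_maps *p)
{
    p->received = false;         /* received -> missing */
}
```

With these helpers a page that faults before it arrives goes missing ->
requested -> received and triggers exactly one kernel notification, while a
page that arrives from the linear scan first goes straight from missing to
received and a later fault is answered without contacting the source.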