From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:34284)
    by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1YQfCE-0007SK-KF for qemu-devel@nongnu.org;
    Wed, 25 Feb 2015 11:52:28 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
    (envelope-from ) id 1YQfC9-0000xQ-IB for qemu-devel@nongnu.org;
    Wed, 25 Feb 2015 11:52:26 -0500
Received: from mx1.redhat.com ([209.132.183.28]:56844)
    by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1YQfC9-0000wc-Ar for qemu-devel@nongnu.org;
    Wed, 25 Feb 2015 11:52:21 -0500
From: "Dr. David Alan Gilbert (git)"
Date: Wed, 25 Feb 2015 16:51:24 +0000
Message-Id: <1424883128-9841-2-git-send-email-dgilbert@redhat.com>
In-Reply-To: <1424883128-9841-1-git-send-email-dgilbert@redhat.com>
References: <1424883128-9841-1-git-send-email-dgilbert@redhat.com>
Subject: [Qemu-devel] [PATCH v5 01/45] Start documenting how postcopy works.
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: qemu-devel@nongnu.org
Cc: aarcange@redhat.com, yamahata@private.email.ne.jp, quintela@redhat.com,
    amit.shah@redhat.com, pbonzini@redhat.com, yanghy@cn.fujitsu.com,
    david@gibson.dropbear.id.au

From: "Dr. David Alan Gilbert"

Signed-off-by: Dr. David Alan Gilbert
---
 docs/migration.txt | 189 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 189 insertions(+)

diff --git a/docs/migration.txt b/docs/migration.txt
index 0492a45..c6c3798 100644
--- a/docs/migration.txt
+++ b/docs/migration.txt
@@ -294,3 +294,192 @@ save/send this state when we are in the middle of a pio operation
 (that is what ide_drive_pio_state_needed() checks).  If DRQ_STAT is
 not enabled, the values on that fields are garbage and don't need to
 be sent.
+
+= Return path =
+
+In most migration scenarios there is only a single data path that runs
+from the source VM to the destination, typically along a single fd (although
+possibly with another fd or similar for some fast way of throwing pages across).
+
+However, some uses need two-way communication; in particular the Postcopy
+destination needs to be able to request pages on demand from the source.
+
+For these scenarios there is a 'return path' from the destination to the source;
+qemu_file_get_return_path(QEMUFile* fwdpath) gives the QEMUFile* for the return
+path.
+
+  Source side
+     Forward path - written by migration thread
+     Return path  - opened by main thread, read by return-path thread
+
+  Destination side
+     Forward path - read by main thread
+     Return path  - opened by main thread, written by main thread AND postcopy
+                    thread (protected by rp_mutex)
+
+= Postcopy =
+'Postcopy' migration is a way to deal with migrations that refuse to converge;
+its plus side is that there is an upper bound on the amount of migration traffic
+and time it takes; the down side is that during the postcopy phase, a failure of
+*either* side or the network connection causes the guest to be lost.
+
+In postcopy the destination CPUs are started before all the memory has been
+transferred, and accesses to pages that are yet to be transferred cause
+a fault that's translated by QEMU into a request to the source QEMU.
+
+Postcopy can be combined with precopy (i.e. normal migration) so that if precopy
+doesn't finish in a given time the switch is made to postcopy.
+
+=== Enabling postcopy ===
+
+To enable postcopy (prior to the start of migration):
+
+migrate_set_capability x-postcopy-ram on
+
+The migration will still start in precopy mode; however, issuing:
+
+migrate_start_postcopy
+
+will now cause the transition from precopy to postcopy.
+It can be issued immediately after migration is started or any
+time later on.  Issuing it after the end of a migration is harmless.
+
+=== Postcopy device transfer ===
+
+Loading of device data may cause the device emulation to access guest RAM
+that may trigger faults that have to be resolved by the source; as such,
+the migration stream has to be able to respond with page data *during* the
+device load. Hence the device data has to be read from the stream completely
+before the device load begins, to free the stream up.  This is achieved by
+'packaging' the device data into a blob that's read in one go.
+
+Source behaviour
+
+Until postcopy is entered the migration stream is identical to normal
+precopy, except for the addition of a 'postcopy advise' command at
+the beginning, to tell the destination that postcopy might happen.
+When postcopy starts the source sends the page discard data and then
+forms the 'package' containing:
+
+   Command: 'postcopy ram listen'
+   The device state
+      A series of sections, identical to the precopy stream's device state stream
+      containing everything except postcopiable devices (i.e. RAM)
+   Command: 'postcopy ram run'
+
+The 'package' is sent as the data part of a Command: 'CMD_PACKAGED', and the
+contents are formatted in the same way as the main migration stream.
+
+Destination behaviour
+
+Initially the destination looks the same as precopy, with a single thread
+reading the migration stream; the 'postcopy advise' and 'discard' commands
+are processed to change the way RAM is managed, but don't affect the stream
+processing.
+
+------------------------------------------------------------------------------
+                        1      2   3     4 5                        6  7
+main -----DISCARD-CMD_PACKAGED ( LISTEN  DEVICE     DEVICE  DEVICE RUN )
+thread                             |       |
+                                   |     (page request)
+                                   |        \___
+                                   v            \
+listen thread:                     --- page -- page -- page -- page -- page --
+
+                                   a   b        c
+------------------------------------------------------------------------------
+
+On receipt of CMD_PACKAGED (1)
+   All the data associated with the package - the ( ...
+) section in the diagram - is read into memory (into a QEMUSizedBuffer),
+and the main thread recurses into qemu_loadvm_state_main to process the
+contents of the package (2), which contains commands (3,6) and devices (4...).
+
+On receipt of 'postcopy ram listen' - 3 - (i.e. the 1st command in the package)
+a new thread (a) is started that takes over servicing the migration stream,
+while the main thread carries on loading the package.  The new thread loads
+normal background page data (b), but if a fault happens during a device load (5)
+the returned page (c) is also loaded by the listen thread, allowing the main
+thread's device load to carry on.
+
+The last thing in the CMD_PACKAGED is a 'RUN' command (6) letting the destination
+CPUs start running.
+At the end of the CMD_PACKAGED (7) the main thread returns to normal running behaviour
+and is no longer used by migration, while the listen thread carries
+on servicing page data until the end of migration.
+
+=== Postcopy states ===
+
+Postcopy moves through a series of states (see postcopy_state) from
+ADVISE->LISTEN->RUNNING->END
+
+  Advise:  Set at the start of migration if postcopy is enabled, even
+           if it hasn't had the start command; here the destination
+           checks that its OS has the support needed for postcopy, and performs
+           setup to ensure the RAM mappings are suitable for later postcopy.
+           (Triggered by reception of POSTCOPY_ADVISE command)
+
+  Listen:  The first command in the package, POSTCOPY_LISTEN, switches
+           the destination state to Listen, and starts a new thread
+           (the 'listen thread') which takes over the job of receiving
+           pages off the migration stream, while the main thread carries
+           on processing the blob.  With this thread able to process page
+           reception, the destination now 'sensitises' the RAM to detect
+           any access to missing pages (on Linux using the 'userfault'
+           system).
+
+  Running: POSTCOPY_RUN causes the destination to synchronise all
+           state and start the CPUs and IO devices running.
+           The main thread now finishes processing the migration package
+           and carries on as it would for normal precopy migration
+           (although it can't do the cleanup it would do as it
+           finishes a normal migration).
+
+  End:     The listen thread can now perform the cleanup of migration
+           state and quit; the migration is now complete.
+
+=== Source side page maps ===
+
+The source side keeps two bitmaps during postcopy: the 'migration bitmap'
+and the 'sent map'.  The 'migration bitmap' is basically the same as in
+the precopy case, and holds a bit per page to indicate that the page is
+'dirty', i.e. needs sending.  During the precopy phase this is updated as the
+CPU dirties pages; however, during postcopy the CPUs are stopped and nothing
+should dirty anything any more.
+
+The 'sent map' is used for the transition to postcopy.  It is a bitmap that
+has a bit set whenever a page is sent to the destination; however, during
+the transition to postcopy mode it is masked against the migration bitmap
+(sentmap &= migrationbitmap) to generate a bitmap recording pages that
+have previously been sent but are now dirty again.  This masked
+sentmap is sent to the destination which discards those now dirty pages
+before starting the CPUs.
+
+Note that once in postcopy mode, the sent map is still updated; however,
+its contents are not necessarily consistent with the pages already sent
+due to the masking with the migration bitmap.
+
+=== Destination side page maps ===
+
+(Needs to be changed so we can update both easily - at the moment updates are done
+ with a lock)
+The destination keeps a state for each page which is 'missing', 'received'
+or 'requested'; these three states are encoded in a 2 bit state array.
+Incoming requests from the kernel cause the state to transition from 'missing'
+to 'requested'.  Received pages cause a transition from either 'missing' or
+'requested' to 'received'; the kernel is notified on reception to wake up
+any threads that were waiting for the page.
+If the kernel requests a page that has already been 'received' the kernel is
+notified without re-requesting.
+
+This leads to three valid page states and four valid state transitions:
+page states:
+    missing   - page not yet received or requested
+    received  - page received
+    requested - page requested but not yet received
+
+state transitions:
+    received  -> missing   (only during setup/discard)
+    missing   -> received  (normal incoming page)
+    requested -> received  (incoming page previously requested)
+    missing   -> requested (userfault request)
+-- 
+2.1.0
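
As an illustration of the destination-side page states and transitions listed above, here is a minimal sketch in C of a 2-bit-per-page state array. It is not taken from the patch series: every identifier (page_state, state_map, page_fault, page_arrived, page_discard, NPAGES) is hypothetical, the locking mentioned in the text is left out, and the handling of a repeated fault on an already 'requested' page is an assumption, since the text does not specify it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The three page states from the text; names and values are hypothetical. */
enum page_state {
    PAGE_MISSING   = 0,   /* not yet received or requested */
    PAGE_RECEIVED  = 1,   /* page data has arrived */
    PAGE_REQUESTED = 2    /* requested from the source, not yet received */
};

#define NPAGES         16u   /* illustrative size only */
#define PAGES_PER_BYTE 4u    /* 2 bits per page */

/* Zero-initialised, so every page starts out as PAGE_MISSING. */
static uint8_t state_map[(NPAGES + PAGES_PER_BYTE - 1) / PAGES_PER_BYTE];

static enum page_state page_get(size_t page)
{
    assert(page < NPAGES);
    unsigned shift = (unsigned)(page % PAGES_PER_BYTE) * 2u;
    return (enum page_state)((state_map[page / PAGES_PER_BYTE] >> shift) & 0x3u);
}

static void page_set(size_t page, enum page_state s)
{
    assert(page < NPAGES);
    unsigned shift = (unsigned)(page % PAGES_PER_BYTE) * 2u;
    uint8_t *byte = &state_map[page / PAGES_PER_BYTE];
    *byte = (uint8_t)((*byte & ~(0x3u << shift)) | ((unsigned)s << shift));
}

/*
 * Userfault request from the kernel (missing -> requested).
 * Returns true if a page request should be sent to the source.
 * A 'received' page needs no request; the caller just notifies the kernel.
 * Assumption: a page that is already 'requested' is not requested again.
 */
static bool page_fault(size_t page)
{
    if (page_get(page) == PAGE_MISSING) {
        page_set(page, PAGE_REQUESTED);
        return true;
    }
    return false;
}

/* Incoming page data (missing -> received, or requested -> received). */
static void page_arrived(size_t page)
{
    page_set(page, PAGE_RECEIVED);
}

/* Discard during setup (received -> missing). */
static void page_discard(size_t page)
{
    page_set(page, PAGE_MISSING);
}
```

In this sketch a caller would invoke page_fault() when the kernel reports an access to a page, send a request over the return path only when it returns true, and call page_arrived() followed by a kernel wake-up whenever page data is read from the migration stream.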