On Thu, Mar 26, 2015 at 11:44:36AM +0000, Dr. David Alan Gilbert wrote: > * David Gibson (david@gibson.dropbear.id.au) wrote: > > On Wed, Mar 25, 2015 at 04:40:11PM +0000, Dr. David Alan Gilbert wrote: > > > * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote: > > > > * David Gibson (david@gibson.dropbear.id.au) wrote: > > > > > On Tue, Mar 24, 2015 at 08:04:14PM +0000, Dr. David Alan Gilbert wrote: > > > > > > * David Gibson (david@gibson.dropbear.id.au) wrote: > > > > > > > On Fri, Mar 20, 2015 at 12:37:59PM +0000, Dr. David Alan Gilbert wrote: > > > > > > > > * David Gibson (david@gibson.dropbear.id.au) wrote: > > > > > > > > > On Fri, Mar 13, 2015 at 10:19:54AM +0000, Dr. David Alan Gilbert wrote: > > > > > > > > > > * David Gibson (david@gibson.dropbear.id.au) wrote: > > > > > > > > > > > On Wed, Feb 25, 2015 at 04:51:43PM +0000, Dr. David Alan Gilbert (git) wrote: > > > > > > > > > > > > From: "Dr. David Alan Gilbert" > > > > > > > > > > > > > > > > > > > > > > > > Modify save_live_pending to return separate postcopiable and > > > > > > > > > > > > non-postcopiable counts. > > > > > > > > > > > > > > > > > > > > > > > > Add 'can_postcopy' to allow a device to state if it can postcopy > > > > > > > > > > > > > > > > > > > > > > What's the purpose of the can_postcopy callback? There are no callers > > > > > > > > > > > in this patch - is it still necessary with the change to > > > > > > > > > > > save_live_pending? > > > > > > > > > > > > > > > > > > > > The patch 'qemu_savevm_state_complete: Postcopy changes' uses > > > > > > > > > > it in qemu_savevm_state_postcopy_complete and qemu_savevm_state_complete > > > > > > > > > > to decide which devices must be completed at that point. > > > > > > > > > > > > > > > > > > Couldn't they check for non-zero postcopiable state from > > > > > > > > > save_live_pending instead? > > > > > > > > > > > > > > > > That would be a bit weird. > > > > > > > > > > > > > > > > At the moment for each device we call the: > > > > > > > > save_live_setup method (from qemu_savevm_state_begin) > > > > > > > > > > > > > > > > 0...multiple times we call: > > > > > > > > save_live_pending > > > > > > > > save_live_iterate > > > > > > > > > > > > > > > > and then we always call > > > > > > > > save_live_complete > > > > > > > > > > > > > > > > > > > > > > > > To my mind we have to call save_live_complete for any device > > > > > > > > that we've called save_live_setup on (maybe it allocated something > > > > > > > > in _setup that it clears up in _complete). > > > > > > > > > > > > > > > > save_live_pending could perfectly well return 0 remaining at the end of > > > > > > > > the migrate for our device, and thus if we used that then we wouldn't > > > > > > > > call save_live_complete. > > > > > > > > > > > > > > Um.. I don't follow. I was suggesting that at the precopy->postcopy > > > > > > > transition point you call save_live_complete for everything that > > > > > > > reports 0 post-copiable state. > > > > > > > > > > > > > > > > > > > > > Then again, a different approach would be to split the > > > > > > > save_live_complete hook into (possibly NULL) "complete precopy" and > > > > > > > "complete postcopy" hooks. The core would ensure that every chunk of > > > > > > > state has both completion hooks called (unless NULL). That might also > > > > > > > address my concerns about the no longer entirely accurate > > > > > > > save_live_complete function name. > > > > > > > > > > > > OK, that one I prefer. Are you OK with: > > > > > > qemu_savevm_state_complete_precopy > > > > > > calls -> save_live_complete_precopy > > > > > > > > > > > > qemu_savevm_state_complete_postcopy > > > > > > calls -> save_live_complete_postcopy > > > > > > > > > > > > ? > > > > > > > > > > Sounds ok to me. Fwiw, I was thinking that both the complete_precopy > > > > > and complete_postcopy hooks should always be called. For a > > > > > non-postcopy migration, the postcopy hooks would just be called > > > > > immediately after the precopy hooks. > > > > > > > > OK, I've made the change as described in my last mail; but I haven't called > > > > the complete_postcopy hook in the precopy case. If it was as simple as making > > > > all devices use one or the other then it would work, however there are > > > > existing (precopy) assumptions about ordering of device state on the wire that > > > > I want to be careful not to alter; for example RAM must come first is the one > > > > I know. > > > > > > Actually, I spoke too soon; testing this found a bad breakage. > > > > > > the functions in savevm.c add the per-section headers, and then call the _complete > > > methods on the devices. Those _complete methods can't elect to do nothing, because > > > a header has already been planted. > > > > Hrm.. couldn't you move the test for presence of the hook earlier so > > you don't sent the header if the hook is NULL? > > There's two tests that you have to make: > a) in qemu_savevm_state_complete_precopy do you call save_live_complete_precopy > b) in qemu_savevm_state_complete_postcopy do you call save_live_complete_postcopy > > The obvious case is if either hook is NULL you don't call it. > (a) is the harder cases, if we're doing postcopy then we don't want to call the > save_live_complete_precopy method on a device which isn't expecting to complete until > postcopy. Uh.. no, I'm expecting the complete_precopy hook to be called every time if it's non-NULL. If it's a postcopy item that doesn't expect to finish at the end of precopy, it should put its code in the complete_postcopy hook instead. That should be fine for the non-postcopy case too, because the postcopy_complete hook will be called momentarily. > The code in qemu_savevm_state_complete_precopy checks for the presence of the *postcopy* > hook, and doesn't emit the header or call the precopy commit if the postcopy hook > is present and we're in postcopy. > > > > I've ended up with something between the two; we still have a complete_precopy and > > > complete_postcopy method on the devices; if the complete_postcopy method exists and > > > we're in postcopy mode, the complete_precopy method isn't called at all. > > > A device could decide to do something different in complete_postcopy from complete_precopy > > > but it must do something to complete the section. > > > Effectively the presence of the complete_postcopy is now doing what > > > can_postcopy() used to do. > > > > Hmm.. but it means there's no per-device hook for the precopy to > > postcopy transition point. I'm not sure if that might matter. > > This is true, but if we needed a generic hook for that (which might be useful) > it probably shouldn't be 'complete'. Well, I'm thinking of these as "state (complete precopy)" rather than "(state complete) precopy". But yeah, a different name might be less ambiguous. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson