Date: Thu, 26 Mar 2015 12:35:08 +1100
From: David Gibson
To: "Dr. David Alan Gilbert"
Cc: aarcange@redhat.com, yamahata@private.email.ne.jp, quintela@redhat.com,
    qemu-devel@nongnu.org, amit.shah@redhat.com, pbonzini@redhat.com,
    yanghy@cn.fujitsu.com
Subject: Re: [Qemu-devel] [PATCH v5 20/45] Modify savevm handlers for postcopy
Message-ID: <20150326013508.GF28039@voom.redhat.com>
In-Reply-To: <20150325164010.GI2313@work-vm>

On Wed, Mar 25, 2015 at 04:40:11PM +0000, Dr. David Alan Gilbert wrote:
> * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> > * David Gibson (david@gibson.dropbear.id.au) wrote:
> > > On Tue, Mar 24, 2015 at 08:04:14PM +0000, Dr. David Alan Gilbert wrote:
> > > > * David Gibson (david@gibson.dropbear.id.au) wrote:
> > > > > On Fri, Mar 20, 2015 at 12:37:59PM +0000, Dr. David Alan Gilbert wrote:
> > > > > > * David Gibson (david@gibson.dropbear.id.au) wrote:
> > > > > > > On Fri, Mar 13, 2015 at 10:19:54AM +0000, Dr. David Alan Gilbert wrote:
> > > > > > > > * David Gibson (david@gibson.dropbear.id.au) wrote:
> > > > > > > > > On Wed, Feb 25, 2015 at 04:51:43PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > > > > > > > > From: "Dr. David Alan Gilbert"
> > > > > > > > > >
> > > > > > > > > > Modify save_live_pending to return separate postcopiable and
> > > > > > > > > > non-postcopiable counts.
> > > > > > > > > >
> > > > > > > > > > Add 'can_postcopy' to allow a device to state if it can postcopy
> > > > > > > > >
> > > > > > > > > What's the purpose of the can_postcopy callback? There are no callers
> > > > > > > > > in this patch - is it still necessary with the change to
> > > > > > > > > save_live_pending?
> > > > > > > >
> > > > > > > > The patch 'qemu_savevm_state_complete: Postcopy changes' uses
> > > > > > > > it in qemu_savevm_state_postcopy_complete and qemu_savevm_state_complete
> > > > > > > > to decide which devices must be completed at that point.
> > > > > > >
> > > > > > > Couldn't they check for non-zero postcopiable state from
> > > > > > > save_live_pending instead?
> > > > > >
> > > > > > That would be a bit weird.
> > > > > >
> > > > > > At the moment for each device we call the:
> > > > > > save_live_setup method (from qemu_savevm_state_begin)
> > > > > >
> > > > > > 0...multiple times we call:
> > > > > > save_live_pending
> > > > > > save_live_iterate
> > > > > >
> > > > > > and then we always call
> > > > > > save_live_complete
> > > > > >
> > > > > >
> > > > > > To my mind we have to call save_live_complete for any device
> > > > > > that we've called save_live_setup on (maybe it allocated something
> > > > > > in _setup that it clears up in _complete).
> > > > > >
> > > > > > save_live_pending could perfectly well return 0 remaining at the end of
> > > > > > the migrate for our device, and thus if we used that then we wouldn't
> > > > > > call save_live_complete.
> > > > >
> > > > > Um.. I don't follow. I was suggesting that at the precopy->postcopy
> > > > > transition point you call save_live_complete for everything that
> > > > > reports 0 post-copiable state.
> > > > >
> > > > >
> > > > > Then again, a different approach would be to split the
> > > > > save_live_complete hook into (possibly NULL) "complete precopy" and
> > > > > "complete postcopy" hooks. The core would ensure that every chunk of
> > > > > state has both completion hooks called (unless NULL). That might also
> > > > > address my concerns about the no longer entirely accurate
> > > > > save_live_complete function name.
> > > >
> > > > OK, that one I prefer. Are you OK with:
> > > > qemu_savevm_state_complete_precopy
> > > > calls -> save_live_complete_precopy
> > > >
> > > > qemu_savevm_state_complete_postcopy
> > > > calls -> save_live_complete_postcopy
> > > >
> > > > ?
> > >
> > > Sounds ok to me. Fwiw, I was thinking that both the complete_precopy
> > > and complete_postcopy hooks should always be called. For a
> > > non-postcopy migration, the postcopy hooks would just be called
> > > immediately after the precopy hooks.
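
To make the split-hook proposal above a bit more concrete, here is a rough C sketch of how a core loop could drive it: setup once per device, iterate while anything is pending, then every non-NULL complete_precopy hook followed by every non-NULL complete_postcopy hook, which in a non-postcopy migration simply run back to back. Everything in it (the Dev struct, the std_* hooks, pending(), run_migration() and the call log) is hypothetical and for illustration only; it is not QEMU's SaveVMHandlers or savevm.c, "pending" is a plain counter rather than the real split postcopiable/non-postcopiable counts, and section headers and wire ordering are not modelled at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Records hook calls in order so the sequence can be inspected. */
static char call_log[256];

static void log_call(const char *dev, const char *hook)
{
    size_t used = strlen(call_log);
    snprintf(call_log + used, sizeof(call_log) - used, "%s.%s ", dev, hook);
}

/* Hypothetical per-device handler table (NOT QEMU's SaveVMHandlers).
 * Every hook is optional; the core skips NULL ones entirely. */
typedef struct Dev Dev;
struct Dev {
    const char *name;
    int remaining;                     /* units of state still to send */
    void (*setup)(Dev *d);
    void (*iterate)(Dev *d);           /* sends one unit per call */
    void (*complete_precopy)(Dev *d);
    void (*complete_postcopy)(Dev *d);
};

static void std_setup(Dev *d)         { log_call(d->name, "setup"); }
static void std_iterate(Dev *d)       { d->remaining--; log_call(d->name, "iterate"); }
static void std_complete_pre(Dev *d)  { log_call(d->name, "complete_precopy"); }
static void std_complete_post(Dev *d) { log_call(d->name, "complete_postcopy"); }

/* Hypothetical stand-in for save_live_pending: a device without an
 * iterate hook never reports anything pending. */
static int pending(const Dev *d)
{
    return d->iterate ? d->remaining : 0;
}

/* Setup once per device, iterate while anything is pending, then run
 * all non-NULL complete_precopy hooks followed by all non-NULL
 * complete_postcopy hooks. In a non-postcopy migration the second
 * pass simply runs straight after the first. */
static void run_migration(Dev *devs, size_t n)
{
    size_t i;
    int left;

    for (i = 0; i < n; i++)
        if (devs[i].setup)
            devs[i].setup(&devs[i]);

    do {                               /* each device iterates zero or more times */
        left = 0;
        for (i = 0; i < n; i++) {
            if (pending(&devs[i]) > 0) {
                devs[i].iterate(&devs[i]);
                left += pending(&devs[i]);
            }
        }
    } while (left > 0);

    for (i = 0; i < n; i++)
        if (devs[i].complete_precopy)
            devs[i].complete_precopy(&devs[i]);

    for (i = 0; i < n; i++)
        if (devs[i].complete_postcopy)
            devs[i].complete_postcopy(&devs[i]);
}
```

For example, with a hypothetical "ram" device that has every hook and 2 units remaining, plus a hypothetical "timer" device that only has complete_precopy, run_migration() logs ram.setup, two ram.iterate calls, ram.complete_precopy, timer.complete_precopy and finally ram.complete_postcopy. Because NULL hooks are skipped before anything else happens for that device, a device with no postcopy work produces nothing in the postcopy pass.
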
> >
> > OK, I've made the change as described in my last mail; but I haven't called
> > the complete_postcopy hook in the precopy case. If it was as simple as making
> > all devices use one or the other then it would work, however there are
> > existing (precopy) assumptions about ordering of device state on the wire that
> > I want to be careful not to alter; for example RAM must come first is the one
> > I know.
>
> Actually, I spoke too soon; testing this found a bad breakage.
>
> the functions in savevm.c add the per-section headers, and then call the _complete
> methods on the devices. Those _complete methods can't elect to do nothing, because
> a header has already been planted.

Hrm.. couldn't you move the test for presence of the hook earlier so
you don't send the header if the hook is NULL?

> I've ended up with something between the two; we still have a complete_precopy and
> complete_postcopy method on the devices; if the complete_postcopy method exists and
> we're in postcopy mode, the complete_precopy method isn't called at all.
> A device could decide to do something different in complete_postcopy from complete_precopy
> but it must do something to complete the section.
> Effectively the presence of the complete_postcopy is now doing what
> can_postcopy() used to do.

Hmm.. but it means there's no per-device hook for the precopy to
postcopy transition point. I'm not sure if that might matter.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson