From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:52162) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bu6cZ-00089y-7R for qemu-devel@nongnu.org; Tue, 11 Oct 2016 19:38:09 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bu6cW-000596-5o for qemu-devel@nongnu.org; Tue, 11 Oct 2016 19:38:07 -0400 Date: Wed, 12 Oct 2016 10:37:26 +1100 From: David Gibson Message-ID: <20161011233726.GC17844@umbus.fritz.box> References: <1475519097-27611-1-git-send-email-duanj@linux.vnet.ibm.com> <1475519097-27611-2-git-send-email-duanj@linux.vnet.ibm.com> <20161005101254.GD2036@work-vm> <9dc23ca2-9ca6-9b2e-9c22-257d11e234d1@linux.vnet.ibm.com> <20161007025442.GK18490@umbus.fritz.box> <20161007080749.GA2040@work-vm> <20161010053120.GG22498@umbus.fritz.box> <20161011161735.9563.26396@loki> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="PuGuTyElPB9bOcsM" Content-Disposition: inline In-Reply-To: <20161011161735.9563.26396@loki> Subject: Re: [Qemu-devel] [QEMU PATCH v5 1/6] migration: alternative way to set instance_id in SaveStateEntry List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Michael Roth Cc: "Dr. David Alan Gilbert" , Jianjun Duan , qemu-devel@nongnu.org, qemu-ppc@nongnu.org, dmitry@daynix.com, peter.maydell@linaro.org, kraxel@redhat.com, mst@redhat.com, pbonzini@redhat.com, veroniabahaa@gmail.com, quintela@redhat.com, amit.shah@redhat.com, mreitz@redhat.com, kwolf@redhat.com, rth@twiddle.net, aurelien@aurel32.net, leon.alrae@imgtec.com, blauwirbel@gmail.com, mark.cave-ayland@ilande.co.uk --PuGuTyElPB9bOcsM Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Oct 11, 2016 at 11:17:35AM -0500, Michael Roth wrote: > Quoting David Gibson (2016-10-10 00:31:20) > > On Fri, Oct 07, 2016 at 09:07:49AM +0100, Dr. David Alan Gilbert wrote: > > > * David Gibson (david@gibson.dropbear.id.au) wrote: > > > > On Wed, Oct 05, 2016 at 09:44:57AM -0700, Jianjun Duan wrote: > > > > > Please see comments below: > > > > >=20 > > > > > On 10/05/2016 03:12 AM, Dr. David Alan Gilbert wrote: > > > > > > * Jianjun Duan (duanj@linux.vnet.ibm.com) wrote: > > > > > >> In QOM(QEMU Object Model) migrated objects are identified with= instance_id > > > > > >> which is calculated automatically using their path in the QOM = composition > > > > > >> tree. For some objects, this path could change from source to = target in > > > > > >> migration. To migrate such objects, we need to make sure the i= nstance_id does > > > > > >> not change from source to target. We add a hook in DeviceClass= to do customized > > > > > >> instance_id calculation in such cases. > > > > > >=20 > > > > > > Can you explain a bit about why the path changes from source to= destination; > > > > > > the path here should be a feature of the guest state not the ho= st, and so I > > > > > > don't understand why it changes. > > > > > Please see the discussion with David in the previous versions: > > > > > http://lists.nongnu.org/archive/html/qemu-ppc/2016-06/msg00062.ht= ml > > > >=20 > > > > Um.. your description above really isn't an accurate summary of that > > > > discussion. > > > >=20 > > > > The point is not that the qom path will vary from source to > > > > destination for some arbitrary reason, but rather that we anticipate > > > > future changes in the QOM structure. Specifically we're considering > > > > eliminating the DRC objects, and folding their (limited) state into= an > > > > array in the parent object (either the machine or a PCI host bridge= ). > > > >=20 > > > > That would change the qom paths, and hence the auto-generated insta= nce > > > > ids, which would break migration between qemu versions before and > > > > after the restructure. > > > >=20 > > > > I'm not sure that changing the instance ids is enough though, anywa= y, > > > > since we're talking about eliminating the object entirely, the > > > > class/type information in the migration stream also wouldn't match. > > > >=20 > > > > Dave, if you have ideas on how to deal with that, I'd love to hear > > > > them > > >=20 > > > Eliminating the object entirely would be tricky to deal with; > > > allowing the structure to change would work if instead of a custom in= stance_id > > > you had a custom idstr. > >=20 > > Sorry, two misunderstandings here. > >=20 > > * When I said "structure" I meant in the high-level sense of what > > objects exist and are responsible for what things, not in the > > 'struct WhateverState' sense. > >=20 > > * In fact right now eliminating the objects would be ok, since they > > have no migration state (which causes the problems this series is > > trying to address). But applying this series which adds migration > > state would make it difficult to eliminate the objects in future. > > That's an awkward constraint given that we've already got some > > hints that these objects are not a good idea. > >=20 > > > However, the question then becomes why is the structure changing so m= uch; > > > ideally things in the migration stream should represent some tangible= object > > > that corresponds to something real, but (again ideally) the contents > > > of the stream should reflect the state of those objects not the curre= nt > > > implementation - so if you want to change the implementation the stre= am > > > doesn't change. Is it your implementation, or the understanding of w= hat > > > the objects actually represent that's in flux? > >=20 > > So, the objects in question are DRCs - "Dynamic Re-configuration > > Connector"; silly IBM talk for "a port into which something can be > > hotplugged", bascially. These aren't "real" devices, but rather a > > firmware/hypervisor abstraction which are used to describe a hotplug > > point. Each DRC allows either one CPU core, one PCI device, or one > > LMB (256MiB chunk of RAM) to be hotplugged (or removed). The PCI DRCs > > are "owned" by the PCI host bridge to which the device would be > > connected, the CPU and memory DRCs are owned by the machine. > >=20 > > The state variables which Jianjun Duan is adding to migration are > > values defined in the PAPR (hypervisor interface) spec, and so are > > tangible enough to be sensible to migrate. At the moment, each LMB is > > represented by a discrete QOM object, but I've been thinking for a > > while that this may be a mistake. In particular it's a problem for > > the LMB DRCs - because each LMB is only 256MiB of memory, we end up > > with thousands, maybe tens of thousands of DRC objects on a guest with > > with large maxmem (even if initial memory is small). The QOM > > infrastructure doesn't deal terribly well that many object child > > properties. > >=20 > > The construction of those DRCs used to be an N^3 algorithm, which as > > you'd imagine caused horrible startup times with large maxmem (minutes > > to hours above ~512GiB). We fixed it to be only N^2, so now the > > startup delays are merely bad (at least once you get to ~2TiB+ > > maxmem). > >=20 > > We could fix that by changing QOM to use a hash or tree for object > > children, instead of a plain list, but I'm not sure if it makes sense > > to do that with only this use case. >=20 > Is that still as much an issue as of this patch? >=20 > commit b604a854e843505007c59d68112c654556102a20 > Author: Pavel Fedin > Date: Tue Oct 13 13:37:45 2015 +0100 >=20 > qom: Replace object property list with GHashTable > =20 > ARM GICv3 systems with large number of CPUs create lots of IRQ pins. = Since > every pin is represented as a property, number of these properties be= comes > very large. Every property add first makes sure there's no duplicates. > Traversing the list becomes very slow, therefore QEMU initialization = takes > significant time (several seconds for e. g. 16 CPUs). > =20 > This patch replaces list with GHashTable, making lookup very fast. Th= e only > drawback is that object_child_foreach() and object_child_foreach_recu= rsive() > cannot add or remove properties during traversal, since GHashTableIte= r does > not have modify-safe version. However, the code seems not to modify o= bjects > via these functions. Huh.. good point. I'd completely forgotten that change had already happened. > > So, for some time, I've been considering rewriting the DRC stuff to be > > implemented via QOM interface on the "owner" object, rather than > > separate objects for every single connector. >=20 > At a high-level I think I'd prefer improving QOM over working around it's > performance issues. It seems like those limits are bound to be pushed > eventually given the wide range of hardware it's meant to support > in theory. Yeah, given we already have that hash, it probably makes sense to keep the DRCs as "real" objects. > But identifying a DRC state in the migration stream with a stable value b= ased > on DRC index makes sense either way. If I understand this patch it seems > like we have that: if a custom instance_id getter is available, we avoid > using the QOM path for idstr and use the default of vmsd->name, then use > the instance_id from the getter. If we dropped DRCs as objects, we'd need > a mechanism other than DeviceClass->dev_get_instance_id() to provide this > custom instance_id, but otherwise so long as the state was the same, and > the VMSD was the same, it seems like it should be doable in theory. So, as noted in other comments, I'm not sure we actually need a new mechanism for overriding the instance id. But regardless of the mechanism once we have a stable id based on the DRC index, yes it should be ok to migrate the state variables of each DRC this way. >=20 > >=20 > > So the difficulty is that if we add the state migration to the DRC > > objects as they stand, we prevent ourselves from doing a refactor > > that's probably a good idea. But no-one's yet had time to actually > > thrash out such a refactor. :/ > >=20 >=20 --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --PuGuTyElPB9bOcsM Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBCAAGBQJX/XezAAoJEGw4ysog2bOSCRQP/06CZm8dCtLfjUvlKz2rorRU orUNwP+KjXq1oVcMgqajqebxTt92qIwBGnJ/m5Udwt8rIbKxcaWlyZCTFFfwHkpQ X0tkWjLVlfQfBzMUQweUye6PhSeF4vdMdqlI7zz9WPJroPhCUbqq+BBI/QaavsVt ESCQXcVxFr2JTIxMFJk5i+nBb88l0vo/6/17dSPeVWIzI5lBYnqzEnCWpk6Ww9lf q/a/xTIB1nGhQh/tJbbY+mD7pHHtOzFhwjdSfTpBbxybrhCPG17K5EcLMotOUF3w LnLG1iNx49+42fDAgTDzsIgY595cFIGuNvwxl3wW8CVslIiawUsc/Y+reQPVjWh8 Xsp8aQrf0Te6sfykLfV20aOSdYvbymrh0cNiLmz0jCsc8suCZIQdAZbTXHfS7Z9x n3KYeU8HWgQ5a1YQ96PblAZ7sNOgbWdrrvJiiCFo7WgTHkPyWjgU6LxXDyedaQzG ZIHiqKqcMdUz3d1c/xH0Qrf/dtv9EFha+iNr31PY1jwyXVe+o6XsCagjbyELBwKg YXPVsXFsfu0HeDVJ+H0Q37a9L9zETLz82/N8pox4CxckVaPWDEAhq5I7GsXNrBZH UhsgOZS4bg7/3i7SAI/bqigztpUx7vGReWuNQiDMLjXQIXxbp2p7HjGsaxN1oxXg VDi5+whkdNyVwqYMFbcJ =TsCd -----END PGP SIGNATURE----- --PuGuTyElPB9bOcsM--