Date: Thu, 23 Apr 2015 17:37:15 +1000
From: David Gibson
To: Christian Borntraeger
Cc: Peter Maydell, Eduardo Habkost, Bharata B Rao, qemu-devel@nongnu.org,
    Alexander Graf, "Jason J. Herne", Paolo Bonzini, Cornelia Huck,
    Igor Mammedov, Andreas Färber
Subject: Re: [Qemu-devel] cpu modelling and hotplug (was: [PATCH RFC 0/4]
    target-i386: PC socket/core/thread modeling, part 1)

On Thu, Apr 23, 2015 at 05:32:33PM +1000, David Gibson wrote:
> On Tue, Apr 07, 2015 at 02:43:43PM +0200, Christian Borntraeger wrote:
> > We had a call and I was asked to write a summary of our conclusions.
> >
> > The more I wrote, the more uncertain I became that we really reached
> > a conclusion, and the more certain that we want to define the
> > QMP/HMP/CLI interfaces first (or quite early in the process).
> >
> > As discussed, I will provide an initial document as a discussion
> > starter.
> >
> > So here is my current understanding, with each piece of information
> > on one line, so that everybody can correct me or make additions:
> >
> > current wrap-up of architecture support
> > -------------------
> > x86
> > - Topology possible
> > - can be hierarchical
> > - interfaces to query topology
> > - SMT: fanout in host, guest uses host threads to back guest vCPUs
> > - supports cpu hotplug via cpu_add
> >
> > power
> > - Topology possible
> > - interfaces to query topology?
>
> For power, topology information is communicated via the
> "ibm,associativity" (and related) properties in the device tree.  This
> can encode hierarchical topologies, but it is *not* bound to the
> socket/core/thread hierarchy.  On the guest side on Power there's no
> real notion of "socket", just cores with specified proximities to
> various memory nodes.
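>
> As a rough illustration of the pattern (a sketch, not the exact qemu
> code - fdt, offset, node_id and cpu_index are illustrative names
> here), the dt-building code encodes a cpu's position roughly like:
>
>     /* "ibm,associativity": first cell says how many cells follow,
>      * then domain ids from most to least significant */
>     uint32_t associativity[] = {
>         cpu_to_be32(4),          /* number of cells that follow */
>         cpu_to_be32(0),          /* outer domains, unused here */
>         cpu_to_be32(0),
>         cpu_to_be32(node_id),    /* NUMA node this cpu is close to */
>         cpu_to_be32(cpu_index),  /* unique per-cpu identifier */
>     };
>     fdt_setprop(fdt, offset, "ibm,associativity",
>                 associativity, sizeof(associativity));
>
> The guest works out relative proximity by comparing how many leading
> cells two such properties share - no socket or core anywhere in that
> encoding.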
>
> > - SMT: Power8: no threads in host and the full core passed in, due
> >   to HW design; may change in the future
> >
> > s/390
> > - Topology possible
> > - can be hierarchical
> > - interfaces to query topology
> > - always virtualized via PR/SM LPAR
> > - host topology from LPAR can be heterogeneous (e.g. 3 cpus in 1st
> >   socket, 4 in 2nd)
> > - SMT: fanout in host, guest uses host threads to back guest vCPUs
> >
> >
> > Current downsides of CPU definitions/hotplug
> > -----------------------------------------------
> > - -smp sockets=,cores=,threads= builds only a homogeneous topology
> > - cpu_add does not tell where to add
> > - artificial icc bus construct on x86 for several reasons (link,
> >   sysbus not hotpluggable, ...)
>
> Artificial though it may be, I think having a "cpus" pseudo-bus is not
> such a bad idea.
>
> > discussions
> > -------------------
> > - we want to be able to (most important question, IMHO)
> >   - hotplug CPUs on power/x86/s390 and maybe others
> >   - define topology information
> >   - bind the guest topology to the host topology in some way
> >     - to host nodes
> >     - maybe also for gang scheduling of threads (might face
> >       reluctance from the Linux scheduler folks)
> >     - not really deeply outlined in this call
> > - QOM links must be allocated at boot time, but can be set later on
> >   - nothing that we want to expose to users
> >   - Machine provides QOM links that the device_add hotplug mechanism
> >     can use to add new CPUs into preallocated slots.  "CPUs" can be
> >     groups of cores and/or threads.
> > - hotplug and initial config should use the same semantics
> > - cpu and memory topology might be somewhat independent
> >   --> - define nodes
> >       - map CPUs to nodes
> >       - map memory to nodes
> >
> > - hotplug per
> >   - socket
> >   - core
> >   - thread
> >   ?
> > Now comes the part where I am not sure if we came to a conclusion
> > or not:
> > - hotplug/definition per core (but not per thread) seems to handle
> >   all cases
> >   - a core might have multiple threads (and thus multiple cpustates)
> >   - as a device statement (or object?)
> > - mapping of cpus to nodes or defining the topology not really
> >   outlined in this call
> >
> > To be defined:
> > - QEMU command line for initial setup
> > - QEMU hmp/qmp interfaces for dynamic setup
>
> So, I can't say I've entirely got my head around this, but here are my
> thoughts so far.
>
> I think the basic problem here is that the fixed socket -> core ->
> thread hierarchy is something from x86 land that's become integrated
> into qemu's generic code, where it doesn't entirely make sense.
>
> Ignoring NUMA topology (I'll come back to that in a moment), qemu
> should really only care about two things:
>
> a) the unit of execution scheduling (a vCPU or "thread")
> b) the unit of plug/unplug
>
> Now, returning to NUMA topology.  What the guest, and therefore qemu,
> really needs to know is the relative proximity of each thread to each
> block of memory.  That usually forms some sort of node hierarchy, but
> it doesn't necessarily correspond to a socket->core->thread hierarchy
> you can see in physical units.
>
> On Power, an arbitrary NUMA node hierarchy can be described in the
> device tree without reference to "cores" or "sockets", so really qemu
> has no business even talking about such units.
>
> IIUC, on x86 the NUMA topology is bound up with the
> socket->core->thread hierarchy, so it needs to have a notion of those
> layers, but ideally that would be specific to the pc machine type.
>
> So, here's what I'd propose:
>
> 1) I think we really need some better terminology to refer to the unit
> of plug/unplug.  Until someone comes up with something better, I'm
> going to use "CPU Module" (CM), to distinguish it from the NUMA
> baggage of "socket" and also to refer more clearly to the thing that
> goes into the socket, rather than the socket itself.
>
> 2) A Virtual CPU Module (vCM) need not correspond to a real physical
> object.  For machine types where we want to faithfully represent a
> specific physical machine, it would.  For generic or pure virtual
> machines, the vCMs would be as small as possible.  So for current
> Power, they'd be one virtual core; for future Power (maybe) or s390, a
> single virtual thread.  For x86 I'm not sure what they'd be.
>
> 3) I'm thinking we'd have a "cpus" virtual bus represented in QOM,
> which would contain the vCMs (also QOM objects).  Their existence
> would be generic, though we'd almost certainly use arch- and/or
> machine-specific subtypes.
>
> 4) There would be a (generic) way of finding the vCPUs (threads) in a
> vCM and the vCM for a specific vCPU.
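>
> To make (3) and (4) a bit more concrete, here's a very rough,
> uncompiled sketch of what the QOM side might look like.  The type
> name, struct layout and the machine's slot array are all made up for
> illustration; only the object_property_add_link() call is the real
> current API:
>
>     #include "qom/object.h"
>     #include "qom/cpu.h"
>     #include "qemu/module.h"
>     #include "qapi/error.h"
>     #include "hw/boards.h"
>
>     /* a vCM: the machine-independent unit of cpu plug/unplug */
>     typedef struct VCMState {
>         DeviceState parent_obj;
>         uint32_t nr_threads;   /* how many vCPUs this module holds */
>         CPUState **threads;    /* covers (4): vCM -> vCPU lookup */
>     } VCMState;
>
>     static const TypeInfo vcm_info = {
>         .name          = "cpu-module",    /* hypothetical type name */
>         .parent        = TYPE_DEVICE,
>         .instance_size = sizeof(VCMState),
>     };
>
>     static void vcm_register_types(void)
>     {
>         type_register_static(&vcm_info);
>     }
>
>     type_init(vcm_register_types)
>
>     /* At machine init, advertise the pluggable slots as QOM link
>      * properties: allocated up front, but only set when a vCM is
>      * actually plugged - the "preallocated slots" idea from the
>      * summary above. */
>     static void machine_init_vcm_slots(MachineState *machine,
>                                        Object **slots, int max_modules)
>     {
>         int i;
>
>         for (i = 0; i < max_modules; i++) {
>             char name[32];
>
>             snprintf(name, sizeof(name), "cpu-module[%d]", i);
>             object_property_add_link(OBJECT(machine), name,
>                                      "cpu-module", &slots[i],
>                                      object_property_allow_set_link,
>                                      OBJ_PROP_LINK_UNREF_ON_RELEASE,
>                                      &error_abort);
>         }
>     }
>
> device_add of a vCM would then just fill in one of the preallocated
> links, and the same links give management tools a generic way to
> enumerate the slots, whether at initial config or at hotplug time.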
>
> 5) A vCM *might* have internal subdivisions into "cores" or "nodes" or
> "chips" or "MCMs" or whatever, but that would be up to the
> machine-type-specific code, and not represented in the QOM hierarchy.
>
> 6) Obviously we'd need some backwards compat goo to sort existing
> command line options referring to cores and sockets into the new
> representation.  This will need machine-type-specific hooks - so for
> x86 it would need to set up the right vCM subdivisions and make sure
> the right NUMA topology info goes into ACPI.  For -machine pseries I'm
> thinking that "-smp sockets=2,cores=1,threads=4" and "-smp
> sockets=1,cores=2,threads=4" should result in exactly the same thing
> internally.
>
> Thoughts?

I should add - the terminology's a bit different, but I think in terms
of code this should be very similar to the "socket" approach previously
proposed.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson