From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dario Faggioli
Subject: Re: [RFC 0/5] xen/arm: support big.little SoC
Date: Thu, 22 Sep 2016 10:50:23 +0200
Message-ID: <1474534223.4393.320.camel@citrix.com>
References: <4c52141f-a6a4-a0b1-dced-f799b592481e@arm.com>
 <61196660-df7c-7324-2fb6-cfb11f44ea1e@arm.com>
 <39623498-bb30-4ff7-f075-219487a5afbb@arm.com>
 <6bd7d587-f9ba-c3bf-db96-46a2958d9e5b@arm.com>
 <1ae3ca04-2fdd-531f-7cb1-0b3ab80feccb@arm.com>
 <20160922064928.GB19448@linux-u7w5.ap.freescale.net>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============8425115555532270665=="
In-Reply-To: <20160922064928.GB19448@linux-u7w5.ap.freescale.net>
Errors-To: xen-devel-bounces@lists.xen.org
Sender: "Xen-devel"
To: Peng Fan, Julien Grall
Cc: Juergen Gross, Peng Fan, Stefano Stabellini, Steve Capper, George Dunlap, Andrew Cooper, Punit Agrawal, George Dunlap, "xen-devel@lists.xen.org", Jan Beulich
List-Id: xen-devel@lists.xenproject.org

--===============8425115555532270665==
Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="=-ap5dmCnZIqTdB32r6iu/"

--=-ap5dmCnZIqTdB32r6iu/
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Thu, 2016-09-22 at 14:49 +0800, Peng Fan wrote:
> On Wed, Sep 21, 2016 at 08:11:43PM +0100, Julien Grall wrote:
> >
> > Hi Stefano,
> >
> > On 21/09/2016 19:13, Stefano Stabellini wrote:
> > >
> > > On Wed, 21 Sep 2016, Julien Grall wrote:
> > > >
> > > > (CC a couple of ARM folks)
> > > >
> > > > On 21/09/16 11:22, George Dunlap wrote:
> > > > >
> > > > > On 21/09/16 11:09, Julien Grall wrote:
> > > > > >
> > > > > > On 20/09/16 21:17, Stefano Stabellini wrote:
> > > > > > >
> > > > > > > On Tue, 20 Sep 2016, Julien Grall wrote:
> > > > > > > >
> > > > > > > > Hi Stefano,
> > > > > > > >
> > > > > > > > On 20/09/2016 20:09, Stefano Stabellini wrote:
> > > > > > > > >
> > > > > > > > > On Tue, 20 Sep 2016, Julien Grall wrote:
> > > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > On 20/09/2016 12:27, George Dunlap wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Sep 20, 2016 at 11:03 AM, Peng Fan wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario Faggioli wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, 20 Sep 2016, Dario Faggioli wrote:
> > > > > > > > > > > > I'd like to add a computing capability in xen/arm, like this:
> > > > > > > > > > > >
> > > > > > > > > > > > struct compute_capability
> > > > > > > > > > > > {
> > > > > > > > > > > >   char *core_name;
> > > > > > > > > > > >   uint32_t rank;
> > > > > > > > > > > >   uint32_t cpu_partnum;
> > > > > > > > > > > > };
> > > > > > > > > > > >
> > > > > > > > > > > > struct compute_capability cc=
> > > > > > > > > > > > {
> > > > > > > > > > > >  {"A72", 4, 0xd08},
> > > > > > > > > > > >  {"A57", 3, 0xxxx},
> > > > > > > > > > > >  {"A53", 2, 0xd03},
> > > > > > > > > > > >  {"A35", 1, ...},
> > > > > > > > > > > > }
> > > > > > > > > > > >
> > > > > > > > > > > > Then when identify cpu, we decide which cpu is big and which cpu is
> > > > > > > > > > > > little according to the computing rank.
> > > > > > > > > > > >
> > > > > > > > > > > > Any comments?
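
For reference, the rank table quoted above could be written as compilable
C roughly as follows. This is only a sketch, not Xen code: cc_table and
cc_rank() are hypothetical names, and only the two part numbers actually
given in the proposal (0xd08 and 0xd03) are filled in.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry; names and layout are illustrative only. */
struct compute_capability {
    const char *core_name;
    uint32_t rank;        /* higher means more capable */
    uint32_t cpu_partnum; /* MIDR part number, bits [15:4] */
};

/* Only the part numbers stated in the proposal are listed. */
static const struct compute_capability cc_table[] = {
    { "A72", 4, 0xd08 },
    { "A53", 2, 0xd03 },
};

/* Return the rank for a part number, or 0 if the core is not listed. */
static uint32_t cc_rank(uint32_t partnum)
{
    for (size_t i = 0; i < sizeof(cc_table) / sizeof(cc_table[0]); i++)
        if (cc_table[i].cpu_partnum == partnum)
            return cc_table[i].rank;
    return 0;
}
```

A part number missing from the table maps to rank 0 here, so any such
core would need separate handling.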
> > > > > > > > > > >
> > > > > > > > > > > I think we definitely need to have Xen have some kind of idea of the
> > > > > > > > > > > order between processors, so that the user doesn't need to figure out
> > > > > > > > > > > which class / pool is big and which pool is LITTLE. Whether this sort
> > > > > > > > > > > of enumeration is the best way to do that I'll let Julien and Stefano
> > > > > > > > > > > give their opinion.
> > > > > > > > > >
> > > > > > > > > > I don't think a hardcoded list of processors in Xen is the right
> > > > > > > > > > solution. There are many existing processors and combinations for
> > > > > > > > > > big.LITTLE so it will nearly be impossible to keep updated.
> > > > > > > > > >
> > > > > > > > > > I would expect the firmware table (device tree, ACPI) to provide
> > > > > > > > > > relevant data for each processor and differentiate big from LITTLE
> > > > > > > > > > core. Note that I haven't looked at it for now. A good place to start
> > > > > > > > > > is looking at how Linux does.
> > > > > > > > >
> > > > > > > > > That's right, see Documentation/devicetree/bindings/arm/cpus.txt. It is
> > > > > > > > > trivial to identify the two different CPU classes and which cores
> > > > > > > > > belong to which class.t, as
> > > > > > > >
> > > > > > > > The class of the CPU can be found from the MIDR, there is no need to
> > > > > > > > use the device tree/acpi for that.
> > > > > > > > Note that I don't think there is an easy way in ACPI (i.e. not in AML)
> > > > > > > > to find out the class.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > It is harder to figure out which one is supposed to be big and which
> > > > > > > > > one LITTLE. Regardless, we could default to using the first cluster
> > > > > > > > > (usually big), which is also the cluster of the boot cpu, and utilize
> > > > > > > > > the second cluster only when the user demands it.
> > > > > > > >
> > > > > > > > Why do you think the boot CPU will usually be a big one? In the case of
> > > > > > > > the Juno platform it is configurable, and the boot CPU is a little core
> > > > > > > > on r2 by default.
> > > > > > > >
> > > > > > > > In any case, what we care about is to differentiate between two sets of
> > > > > > > > CPUs. I don't think Xen should care about migrating a guest vCPU
> > > > > > > > between big and LITTLE cpus. So I am not sure why we would want to know
> > > > > > > > that.
> > > > > > >
> > > > > > > No, it is not about migrating (at least yet). It is about giving useful
> > > > > > > information to the user. It would be nice if the user had to choose
> > > > > > > between "big" and "LITTLE" rather than "class 0x1" and "class 0x100",
> > > > > > > or even "A7" or "A15".
> > > > > >
> > > > > > I don't think it is wise to assume that we may have only 2 kinds of
> > > > > > CPUs on the platform. We may have more in the future, if so how would
> > > > > > you name them?
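
The MIDR remark quoted above can be made concrete with a small sketch.
The field positions (implementer in bits [31:24], part number in bits
[15:4]) are architectural; the helper names, and the rule that two CPUs
share a class when both fields match, are assumptions made here for
illustration only.

```c
#include <stdint.h>

/* MIDR fields: implementer in bits [31:24], part number in bits [15:4]. */
static inline uint32_t midr_implementer(uint32_t midr)
{
    return (midr >> 24) & 0xff;
}

static inline uint32_t midr_partnum(uint32_t midr)
{
    return (midr >> 4) & 0xfff;
}

/*
 * Assumed rule: two CPUs are in the same class when implementer and
 * part number match; variant and revision are ignored.
 */
static inline int midr_same_class(uint32_t a, uint32_t b)
{
    return midr_implementer(a) == midr_implementer(b) &&
           midr_partnum(a) == midr_partnum(b);
}
```

This groups CPUs into classes without any firmware table, but it says
nothing about which class is the more powerful one.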
> > > > >
> > > > > I would suggest that internally Xen recognize an arbitrary number of
> > > > > processor "classes", and order them according to more powerful -> less
> > > > > powerful. Then if at some point someone makes a platform with three
> > > > > processors, you can say "class 0", "class 1" or "class 2". "big" would
> > > > > be an alias for "class 0" and "little" would be an alias for "class 1".
> > > >
> > > > As mentioned earlier, there are no upstreamed device tree bindings yet
> > > > to know the "power" of a CPU (see [1])
> > > >
> > > > >
> > > > > And in my suggestion, we allow a richer set of labels, so that the user
> > > > > could also be more specific -- e.g., asking for "A15" specifically, for
> > > > > example, and failing to build if there are no A15 cores present, while
> > > > > allowing users to simply write "big" or "little" if they want
> > > > > simplicity / things which work across different platforms.
> > > >
> > > > Well, before trying to do something clever like that (i.e. naming "big"
> > > > and "little"), we need to have upstreamed bindings available to
> > > > acknowledge the difference. AFAICT, it is not yet upstreamed for Device
> > > > Tree (see [1]) and I don't know any static ACPI tables providing
> > > > similar information.
> > >
> > > I like George's idea that "big" and "little" could be just convenience
> > > aliases. Of course they are predicated on the necessary device tree
> > > bindings being upstream. We don't need [1] to be upstream in Linux,
> > > just the binding:
> > >
> > > http://marc.info/?l=linux-arm-kernel&m=147308556729426&w=2
> > >
> > > which has already been acked by the relevant maintainers.
> >
> > This is device tree only. What about ACPI?
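
George's labelling scheme quoted above ("class N" ordered from most to
least powerful, with "big" and "little" as aliases for class 0 and
class 1) could be resolved along these lines. resolve_class() is a
hypothetical helper, not an existing xl or libxl function, and core
names such as "A15" are deliberately left out of this sketch.

```c
#include <stdio.h>
#include <string.h>

/*
 * Map a label to a class index, where class 0 is the most powerful.
 * Returns -1 for an unknown label or a class the platform lacks.
 */
static int resolve_class(const char *label, int nr_classes)
{
    int idx = -1;
    char trailing;

    if (strcmp(label, "big") == 0)
        idx = 0;
    else if (strcmp(label, "little") == 0)
        idx = 1;
    else if (sscanf(label, "class %d%c", &idx, &trailing) != 1)
        return -1; /* not "class N", or junk after the number */

    return (idx >= 0 && idx < nr_classes) ? idx : -1;
}
```

With this shape, a platform with a single class rejects "little", and a
platform with three classes accepts "class 2".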
> >
> > >
> > > >
> > > > I had a few discussions and more thought about big.LITTLE support in
> > > > Xen. The main goal of big.LITTLE is power efficiency by moving tasks
> > > > around and being able to idle one cluster. All the solutions suggested
> > > > (including mine) so far, can be replicated by hand (except the VPIDR)
> > > > so they are mostly an automatic way. This will also remove the real
> > > > benefits of big.LITTLE because Xen will not be able to migrate vCPUs
> > > > across clusters for power efficiency.
> > >
> > > The goal of the architects of big.LITTLE might have been power
> > > efficiency, but of course we are free to use any features that the
> > > hardware provides in the best way for Xen and the Xen community.
> >
> > This is very dependent on how the big.LITTLE has been implemented by the
> > hardware. Some platforms can not run both big and LITTLE cores at the
> > same time. You need a proper switch in the firmware/hypervisor.
> >
> > >
> > > >
> > > > If we care about power efficiency, we would have to handle seamlessly
> > > > big.LITTLE in Xen (i.e. a guest would only see one kind of CPU). This
> > > > raises quite a few problems, nothing insurmountable, similar to
> > > > migration across two platforms with different micro-architecture (e.g.
> > > > processors): errata, features supported... The guest would have to
> > > > know the union of all the errata (this is done so far via the MIDR, so
> > > > we would need a PV way to do it), and only the intersection of
> > > > features would be exposed to the guest. This also means the scheduler
> > > > would have to be modified to handle power efficiency (not strictly
> > > > necessary at the beginning).
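
The "union of errata, intersection of features" rule in the paragraph
above can be sketched with plain bitmasks. struct cpu_caps and
guest_view() are hypothetical; real feature and errata tracking would
be considerably more involved.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU-type description, one bit per feature/erratum. */
struct cpu_caps {
    uint64_t features; /* bit set: feature supported */
    uint64_t errata;   /* bit set: erratum affects this CPU type */
};

/* What a guest could be shown if its vCPUs may run on any listed type. */
static struct cpu_caps guest_view(const struct cpu_caps *types, size_t n)
{
    struct cpu_caps view = { .features = n ? ~UINT64_C(0) : 0, .errata = 0 };

    for (size_t i = 0; i < n; i++) {
        view.features &= types[i].features; /* intersection */
        view.errata   |= types[i].errata;   /* union */
    }
    return view;
}
```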
> > > >
> > > > I agree that such a solution would require some work to implement,
> > > > although Xen will have a better control of the energy consumption of
> > > > the platform.
> > > >
> > > > So the question here, is what do we want to achieve with big.LITTLE?
> > >
> > > I don't think that handling seamlessly big.LITTLE in Xen is the best way
> > > to do it in the scenarios where Xen on ARM is being used today. I
> > > understand the principles behind it, but I don't think that it will lead
> > > to good results in a virtualized environment, where there is more
> > > activity and more vcpus than pcpus.
> >
> > Can you detail why you don't think it will give good results?
> >
> > >
> > > What we discussed in this thread so far is actionable, and gives us
> > > big.LITTLE support in a short time frame. It is a good fit for Xen on
> > > ARM use cases and still leads to lower power consumption with a wise
> > > allocation of big and LITTLE vcpus and pcpus to guests.
> >
> > How would this lead to lower power consumption? If there is nothing
> > running on the processor we would have a wfi loop which will never put
> > the physical CPU in deep sleep. The main advantage of big.LITTLE is to
> > be able to switch off a cluster/cpu when it is not used.
> >
> > Without any knowledge in Xen (such as CPU freq), I am afraid the power
> > consumption will still be the same.
> >
> > >
> > > I would start from this approach, then if somebody comes along with a
> > > plan to implement a big.LITTLE switcher in Xen, I welcome her to do it
> > > and I would be happy to accept the code in Xen. We'll just make it
> > > optional.
> >
> > I think we are discussing here a simple design for big.LITTLE. I never
> > asked Peng to do all the work.
> > I am worried that if we start to expose the big.LITTLE to the userspace
> > it will be hard in the future to step back from it.
>
> Hello Julien,
>
> I prefer the simple doable method, and as Stefano said, actionable.
>
Yep, this is a very good starting point, IMO.

>  - introduce a hypercall interface to let xl get the different
>    classes cpu info.
>
+1

>  - Use vcpuclass in xl cfg file to let user create different vcpus
>
Yep.

>  - extract cpu computing cap from dts to differentiate cpus. As you
>    said, bindings not upstreamed. But this is not the hard point, we
>    could change the info, whether dmips or cap in future.
>
Yes (or I should say, "whatever", as I know nothing about all this! :-P)

>  - use cpu hard affinity to block vcpu scheduling between little and
>    big pcpu.
>
"to block vcpu scheduling within the specified classes, for each vcpu"

But, again, yes.

>  - block user setting vcpu hard affinity between big and LITTLE.
>
"between the specified classes"

Indeed.

>  - only hard affinity seems enough, no need soft affinity.
>
Correct. Just don't care at all and don't touch soft affinity for now.

> Anyway, for vcpu scheduling between big and LITTLE, if this is the
> right direction and you have an idea on how to implement, I could
> follow you on enabling this feature with you leading the work. I do not
> have much idea on this.
>
This can come later, either as an enhancement of this affinity based
solution, or being implemented on top of it.

In any case, let's start with the affinity based solution for now. It's
a good level of support already, and a nice first step toward future
improvements.

Oh, btw, please don't throw away this cpupool patch series either!

A feature like `xl cpupool-biglittle-split' can still be interesting,
completely orthogonally and independently from the affinity based work,
and this series looks like it can be used to implement that.
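
To make the hard affinity items above a bit more concrete, here is a
minimal sketch of the check they imply. It assumes one bit per pCPU in a
64-bit mask; pcpu_mask_t and set_hard_affinity() are hypothetical names
and do not correspond to Xen's real cpumask or affinity interfaces.

```c
#include <stdint.h>

/* One bit per pCPU; a hypothetical stand-in for a real cpumask type. */
typedef uint64_t pcpu_mask_t;

/*
 * Accept a hard affinity request only if it is non-empty and stays
 * within the pCPUs of the vCPU's class. Returns 0 on success and
 * stores the new mask, or -1 (leaving *effective untouched) otherwise.
 */
static int set_hard_affinity(pcpu_mask_t class_mask, pcpu_mask_t requested,
                             pcpu_mask_t *effective)
{
    if (requested == 0 || (requested & ~class_mask) != 0)
        return -1;
    *effective = requested;
    return 0;
}
```

Soft affinity is left alone on purpose, as discussed above.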
:-)

Thanks and Regards,
Dario
--
<> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

--=-ap5dmCnZIqTdB32r6iu/--

--===============8425115555532270665==--