From mboxrd@z Thu Jan  1 00:00:00 1970
From: Dario Faggioli <dario.faggioli@citrix.com>
Subject: Re: [RFC 0/5] xen/arm: support big.little SoC
Date: Tue, 20 Sep 2016 02:11:04 +0200
Message-ID: <1474330264.4393.129.camel@citrix.com>
References: <1474250936-27962-1-git-send-email-peng.fan@nxp.com>
 <10152e13-bccb-0794-44e4-556845875e33@arm.com>
 <20160919083619.GA16854@linux-7smt.suse>
 <5ddefbc1-3bd4-c990-b615-0039761535d8@arm.com>
 <CAFLBxZaa7cM_poOzHo+aQbcOP+KLJUUN3asEMyasZpaAKTU5dg@mail.gmail.com>
 <f0375dc0-3ce3-7188-7cff-0685cab9aca1@arm.com>
 <170e2787-a410-37c5-a675-6fc7cf31ad6f@citrix.com>
 <20160919133259.GC7407@linux-u7w5.ap.freescale.net>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============2997899403128260413=="
Return-path: <xen-devel-bounces@lists.xen.org>
In-Reply-To: <20160919133259.GC7407@linux-u7w5.ap.freescale.net>
List-Unsubscribe: <https://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
 <mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <https://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
 <mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Errors-To: xen-devel-bounces@lists.xen.org
Sender: "Xen-devel" <xen-devel-bounces@lists.xen.org>
To: Peng Fan <van.freenix@gmail.com>, George Dunlap <george.dunlap@citrix.com>
Cc: Jürgen Groß <jgross@suse.com>, Peng Fan <peng.fan@nxp.com>, Stefano Stabellini <sstabellini@kernel.org>, George Dunlap <George.Dunlap@eu.citrix.com>, Andrew Cooper <andrew.cooper3@citrix.com>, "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>, Julien Grall <julien.grall@arm.com>, Jan Beulich <jbeulich@suse.com>
List-Id: xen-devel@lists.xenproject.org

--===============2997899403128260413==
Content-Type: multipart/signed; micalg=pgp-sha256;
	protocol="application/pgp-signature"; boundary="=-9WF1DYIjVWYTBSm1tfg1"

--=-9WF1DYIjVWYTBSm1tfg1
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Mon, 2016-09-19 at 21:33 +0800, Peng Fan wrote:
> On Mon, Sep 19, 2016 at 11:33:58AM +0100, George Dunlap wrote:
> >
> > No, I think it would be a lot simpler to just teach the scheduler
> > about
> > different classes of cpus.  credit1 would probably need to be
> > modified
> > so that its credit algorithm would be per-class rather than pool-
> > wide;
> > but credit2 shouldn't need much modification at all, other than to
> > make
> > sure that a given runqueue doesn't include more than one class; and
> > to
> > do load-balancing only with runqueues of the same class.
>
> I try to follow.
>  - scheduler needs to be aware of different classes of cpus. ARM
> big.Little cpus.
>
Yes, I think this is essential.

>  - scheduler schedules vcpus on different physical cpus in one
> cpupool.
>
Yep, that's what the scheduler does. And personally, I'd start
implementing big.LITTLE support for a situation where both big and
LITTLE cpus coexist in the same pool.

>  - different cpu classes needs to be in different runqueue.
>
Yes. So, basically, imagine using vcpu pinning to support big.LITTLE.
I've spoken briefly about this in my reply to Juergen. You can probably
even get something like this up-&-running by writing very little or no
code (you'll need --for now-- dom0_max_vcpus, dom0_vcpus_pin, and then,
in domain config files, "cpus='...'").

Then the real goal would be to achieve the same behavior automatically,
by acting on the runqueues' arrangement and on the load balancing logic
in the scheduler(s).
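
To make the runqueue idea a bit more concrete, here is a rough sketch in
Python (not Xen code, and not how credit2 actually builds its runqueues).
The topology dictionary, the per_queue parameter and both function names
are made up for illustration. It only shows the two properties discussed
above: no runqueue mixes classes, and balancing only considers runqueues
of the same class.

```python
from collections import defaultdict


def build_runqueues(pcpu_class, per_queue=2):
    """Group pcpus into runqueues so that no runqueue mixes classes.

    pcpu_class is a hypothetical {pcpu_id: "big" | "LITTLE"} map.
    """
    by_class = defaultdict(list)
    for pcpu in sorted(pcpu_class):
        by_class[pcpu_class[pcpu]].append(pcpu)

    runqueues = []
    for cls, pcpus in by_class.items():
        # Split each class into chunks of at most per_queue pcpus.
        for i in range(0, len(pcpus), per_queue):
            runqueues.append({"class": cls, "pcpus": pcpus[i:i + per_queue]})
    return runqueues


def balance_candidates(runqueues, src):
    """Runqueues that work may be moved to from src: same class only."""
    return [rq for rq in runqueues
            if rq is not src and rq["class"] == src["class"]]


# Assumed topology: pcpus 0-3 are LITTLE, pcpus 4-5 are big.
topology = {0: "LITTLE", 1: "LITTLE", 2: "LITTLE", 3: "LITTLE",
            4: "big", 5: "big"}
rqs = build_runqueues(topology)
```

With this assumed topology there are two LITTLE runqueues and one big
one, so the big runqueue has no balancing partner at all.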

Anyway, sorry for my ignorance on big.LITTLE, but there's something I'm
missing: _when_ is it that it is (or needs to be) decided whether a
vcpu will run on a big or LITTLE core?

Thinking of a bare metal system, I think that cpu X is, for instance,
big, and will always be like that; similarly, cpu Y is LITTLE.

This makes me think that, for a virtual machine, it is ok to
choose/specify at _domain_creation_ time which vcpus are big and which
vcpus are LITTLE. Is this correct?

If yes, this also means that --whatever way we find to make this happen,
cpupools, scheduler, etc.-- the vcpus that we decided are big must only
be scheduled on actual big pcpus, and the vcpus that we decided are
LITTLE must only be scheduled on actual LITTLE pcpus. Correct again?
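
Put as a tiny Python sketch (purely illustrative; the PCPU_CLASS map and
the helper names are assumptions of mine, not anything that exists in
Xen), the constraint I have in mind is just this:

```python
# Hypothetical topology: which class each physical cpu belongs to.
PCPU_CLASS = {0: "LITTLE", 1: "LITTLE", 2: "big", 3: "big"}


def allowed_pcpus(vcpu_class, pcpu_class=PCPU_CLASS):
    """Return the set of pcpus a vcpu of the given class may run on."""
    return {p for p, c in pcpu_class.items() if c == vcpu_class}


def can_run_on(vcpu_class, pcpu, pcpu_class=PCPU_CLASS):
    """True only if the pcpu is of the same class as the vcpu."""
    return pcpu_class[pcpu] == vcpu_class
```

In other words, the class of a vcpu acts like a hard affinity mask that
is fixed when the domain is created.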

> Then for implementation.
>  - When create a guest, specific physical cpus that the guest will be
> run on.
>
I'd actually do that the other way round. I'd ask the user to specify
how many --and, if that's important, which-- vcpus are big and how
many/which are LITTLE.

Knowing that, we also know whether the domain is a big only, LITTLE
only or big.LITTLE one. And we also know which set of pcpus each set
of vcpus should be restricted to.

So, basically (but it's just an example) something like this, in the xl
config file of a guest:

1) big.LITTLE guest, with 2 big and 2 LITTLE vcpus. User doesn't care
   which is which, so a default could be 0,1 big and 2,3 LITTLE:

vcpus = 4
vcpus.big = 2

2) big.LITTLE guest, with 8 vcpus, of which 0,2,4 and 6 are big:

vcpus = 8
vcpus.big = [0, 2, 4, 6]

Which would be the same as

vcpus = 8
vcpus.little = [1, 3, 5, 7]

3) guest with 4 vcpus, all big:

vcpus = 4
vcpus.big = "all"

Which would be the same as:

vcpus = 4
vcpus.little = "none"

And also the same as just:

vcpus = 4


Or something like this.
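
For what it's worth, here is how I would expect the three examples above
to resolve, as a Python sketch. Note that vcpus.big and vcpus.little are
only the syntax proposed in this mail (xl does not know about them), and
that resolve_vcpu_classes() is a made-up helper. The "lowest ids are big"
rule for a plain count is just the default suggested in example 1.

```python
def resolve_vcpu_classes(vcpus, big=None, little=None):
    """Return a list of "big"/"LITTLE", one entry per vcpu.

    big and little may each be None (not given), an int (how many),
    a list of vcpu ids, or the strings "all" / "none".
    """
    ids = list(range(vcpus))

    def as_set(spec, from_end):
        if spec is None:
            return None
        if spec == "all":
            return set(ids)
        if spec == "none":
            return set()
        if isinstance(spec, int):
            if not 0 <= spec <= vcpus:
                raise ValueError("count out of range: %r" % spec)
            # Count only: big takes the lowest ids, LITTLE the highest.
            return set(ids[vcpus - spec:]) if from_end else set(ids[:spec])
        chosen = set(spec)
        if not chosen <= set(ids):
            raise ValueError("unknown vcpu id in %r" % (spec,))
        return chosen

    big_set = as_set(big, from_end=False)
    little_set = as_set(little, from_end=True)

    if big_set is None and little_set is None:
        big_set = set(ids)                  # nothing given: all big
    elif big_set is None:
        big_set = set(ids) - little_set     # only vcpus.little given
    elif little_set is not None and big_set != set(ids) - little_set:
        raise ValueError("vcpus.big and vcpus.little disagree")

    return ["big" if v in big_set else "LITTLE" for v in ids]
```

So vcpus=8 with vcpus.big=[0, 2, 4, 6] and vcpus=8 with
vcpus.little=[1, 3, 5, 7] give the same layout, and a config with
neither option comes out as an all-big guest.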

>  - If the physical cpus are different cpus, indicate the guest would
> like to be a big.little guest.
>    And have big vcpus and little vcpus.
>
Not liking this as _the_ way of specifying the guest topology, wrt
big.LITTLE-ness (see alternative proposal right above. :-))

However, right now we support pinning/affinity already. We certainly
need to decide what to do if, e.g., no vcpus.big or vcpus.little are
present, but the vcpus have hard or soft affinity to some specific
pcpus.

So, right now, this, in the xl config file:

cpus = [2, 8, 12, 13, 15, 17]

means that we want to pin 1-to-1 vcpu 0 to pcpu 2, vcpu 1 to pcpu 8,
vcpu 2 to pcpu 12, vcpu 3 to pcpu 13, vcpu 4 to pcpu 15 and vcpu 5 to
pcpu 17. Now, if cores 2, 8 and 12 are big, and no vcpus.big or
vcpus.little is specified, I'd put forward the assumption that the user
wants vcpus 0, 1 and 2 to be big, and vcpus 3, 4, and 5 to be LITTLE.

If, instead, there are vcpus.big or vcpus.little specified, and there's
disagreement, I'd either error out or decide which one overrides the
other (and print a WARNING about that happening).
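
A rough Python sketch of that rule, under a couple of assumptions that
are mine and not settled anywhere: any pcpu not listed as big is treated
as LITTLE, and in the non-strict case the explicitly declared classes win
over the ones inferred from pinning. classes_from_pinning() is a
hypothetical helper, not an existing libxl function.

```python
import warnings


def classes_from_pinning(cpus, big_pcpus, declared=None, strict=True):
    """Infer per-vcpu classes from a 1-to-1 "cpus = [...]" pinning list.

    cpus[i] is the pcpu that vcpu i is pinned to. declared, if given,
    is the per-vcpu class list coming from vcpus.big / vcpus.little.
    """
    # Assumption: every pcpu that is not in big_pcpus is LITTLE.
    inferred = ["big" if p in big_pcpus else "LITTLE" for p in cpus]

    if declared is None or declared == inferred:
        return inferred
    if strict:
        raise ValueError("pinning and vcpus.big/vcpus.little disagree")
    # Which side wins is a policy choice; here the explicit one does.
    warnings.warn("pinning disagrees with vcpus.big/vcpus.little; "
                  "using the declared classes")
    return declared
```

With the example above, classes_from_pinning([2, 8, 12, 13, 15, 17],
{2, 8, 12}) marks vcpus 0-2 as big and vcpus 3-5 as LITTLE.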

Still right now, this:

cpus = "2-12"

means that all the vcpus of the domain have hard affinity (i.e., are
pinned) to pcpus 2-12. In this case (assuming pcpus 2-12 are all big)
I'd conclude that the user wants all the vcpus to be big.

I'm less sure what to do if _only_ soft-affinity is specified (via
"cpus_soft="), or if hard-affinity contains both big and LITTLE pcpus,
like, e.g.:

cpus = "2-15"
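
To show where the ambiguity comes from, here is a small Python sketch
that classifies a hard-affinity string against an assumed topology
(pcpus 0-12 big, pcpus 13-17 LITTLE, which is only my guess for the sake
of the example). The parser handles plain "a-b" ranges and comma lists
only, not the full syntax xl accepts, and both function names are
invented.

```python
def parse_cpu_range(spec):
    """Parse a simple "a-b" or "a,b,c-d" string into a set of pcpu ids."""
    pcpus = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            pcpus.update(range(int(lo), int(hi) + 1))
        else:
            pcpus.add(int(part))
    return pcpus


def classify_affinity(spec, big_pcpus, little_pcpus):
    """Return "big", "LITTLE" or "mixed" for a hard-affinity string."""
    pcpus = parse_cpu_range(spec)
    if pcpus <= big_pcpus:
        return "big"
    if pcpus <= little_pcpus:
        return "LITTLE"
    return "mixed"


# Assumed topology for the examples in this mail.
BIG = set(range(0, 13))
LITTLE = set(range(13, 18))
```

Here "2-12" comes out as big, while "2-15" comes out as mixed, and the
mixed case is exactly the one that needs a policy decision.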

>  - If no physical cpus specificed, then the guest may runs on big
> cpus or on little cpus. But not both.
>
Yes. If nothing (or something contradictory) is specified, we "just"
have to decide what's the sanest default.

>    How to decide runs on big or little physical cpus?
>
I'd default to big.

>  - For Dom0, I am still not sure,default big.little or else?
>
Again, if nothing is specified, I'd probably default to:
 - give dom0 as many vcpus as there are big cores
 - restrict them to big cores

But, of course, I think we should add boot time parameters like these
ones:

 dom0_vcpus_big = 4
 dom0_vcpus_little = 2

which would mean the user wants dom0 to have 4 big and 2 LITTLE
vcpus... and then we act accordingly, as described above, and in other
emails.
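
As a sketch of the dom0 default, in Python and with the caveat that
dom0_vcpus_big and dom0_vcpus_little are only the parameters proposed
here (they are not existing Xen boot options) and that dom0_vcpu_layout()
is a made-up name:

```python
def dom0_vcpu_layout(n_big_pcpus, dom0_vcpus_big=None, dom0_vcpus_little=None):
    """Return how many big and LITTLE vcpus dom0 should get.

    With neither parameter given, dom0 gets one vcpu per big core
    and no LITTLE vcpus, as suggested above.
    """
    if dom0_vcpus_big is None and dom0_vcpus_little is None:
        return {"big": n_big_pcpus, "LITTLE": 0}
    # A parameter that was left out is taken to mean "none of that class".
    return {"big": dom0_vcpus_big or 0, "LITTLE": dom0_vcpus_little or 0}
```

Treating a missing parameter as zero when the other one is given is my
own guess; it could just as well be an error.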

> If use scheduler to handle the different classes cpu, we do not need
> to use cpupool
> to block vcpus be scheduled onto different physical cpus. And using
> scheudler to handle this
> gives an opportunity to support big.little guest.
>
Exactly, this is one strong point in favour of this solution, IMO!

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
--=-9WF1DYIjVWYTBSm1tfg1
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: This is a digitally signed message part
Content-Transfer-Encoding: 7bit

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAABCAAGBQJX4H6ZAAoJEBZCeImluHPuZVYQAKDb8tggqKBZhwZejTK9L3uF
2q/tUknnqR+pYfXccgDkOwdzbNeAWgZNn0ONeWfxbwgdRokFfhyc2HvZ35qZCghz
hxCasMPVU7dRj7LUXp68vE6YrDsfiCPuL1tSEB4Fod4kGfeCj446ZFB3fAIRe8zj
0Yoja2zSADsYh+ZmbDWRk6anFX2kz88nmaK7c+ob/Jmu3Qw1TrCb2lX9mMy6peTj
6BcreQT7fQLD9kEaAp1yIigHfFwnub0WNj+kw4zt56l6cXx+JDQtED+Fu/px4g69
2VD4RjttS8XWyYlOoDifTqTh+Yvh60afkmmMo6H5ETn/LD7EQsQembdOE90yWkGl
LIHq0KrQVajIrhxrKyrP06Clu35FJfmlQfDsqIcVTW/GkV+2VTvca37m3DWln/bW
ZjB14BnabucIud9IoprTOoDNcLCoJgDiUGufnEcPkb6SL+es/xCtxrtkW4PR1iJd
cJlcGBvENXYZ5Nhec1Uafedw100p0+Lcqz5HkRw97DkLBfWR7QeTBVlrRkPT9CcJ
sKPaD/ySBUKWAtfKYyDiMS0eftCOYCpUz66Q67+nIbEvsCK459hmiq96YsJVub+w
8BtIbbmy/YY3OWDzKTi4LKZ9i0R9u765VARX8QcTtZE4UrLGUr8T4S/q4+QMbmDg
BN7TsAdzUtuZA5Ox3tH6
=Gkok
-----END PGP SIGNATURE-----

--=-9WF1DYIjVWYTBSm1tfg1--


--===============2997899403128260413==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: inline

X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KWGVuLWRldmVs
IG1haWxpbmcgbGlzdApYZW4tZGV2ZWxAbGlzdHMueGVuLm9yZwpodHRwczovL2xpc3RzLnhlbi5v
cmcveGVuLWRldmVsCg==

--===============2997899403128260413==--