On Mon, 2016-09-19 at 12:23 +0200, Juergen Gross wrote:
> On 19/09/16 12:06, Julien Grall wrote:
> > On 19/09/2016 11:45, George Dunlap wrote:
> > > But expanding the schedulers to know about different classes of
> > > cpus, and having vcpus specified as running only on specific
> > > types of pcpus, seems like a more flexible approach.
> >
> > So, if I understand correctly, you would not recommend to extend
> > the number of CPU pool per domain, correct?
>
> Before deciding in which direction to go (multiple cpupools,
> sub-pools, kind of implicit cpu pinning)
>
You mention "implicit pinning" here, and I'd like to stress this,
because basically no one else in the conversation seems to have
considered it.

It may not be the best long-term solution, but doing something based
on pinning is, IMO, a very convenient first step (and may well become
one of the 'modes' available to the user for taking advantage of
big.LITTLE).

So, if cpus 0-3 are big and cpus 4-5 are LITTLE, we can:
 - for domain X, which wants to run only on big cores, pin all its
   vcpus to pcpus 0-3;
 - for domain Y, which wants to run only on LITTLE cores, pin all its
   vcpus to pcpus 4-5;
 - for domain Z, which wants its vcpus 0-1 to run on big cores and
   its vcpus 2-3 to run on LITTLE cores, pin vcpus 0-1 to pcpus 0-3,
   and pin vcpus 2-3 to pcpus 4-5.

Setting things up like this, even automatically, either in the
hypervisor or in the toolstack, is basically already possible (with
all the good and bad aspects of pinning, of course).

Then, sure (as I said when replying to George), we may want things to
be more flexible, and we also probably want to be on the safe side
--if ever some component manages to undo our automatic pinning-- wrt
the scheduler not picking up work for the wrong architecture... But
still, I'm a bit surprised this did not come up.

Julien, Peng, is that because you think this is not doable for some
reason I'm missing?
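Just to make the idea concrete, this is roughly what the setup above
could look like in an xl domain config, using xl's hard-affinity
syntax (the pcpu numbering is of course just this example's
assumption):

```
# Domain X: big cores only -- all vcpus pinned to pcpus 0-3
vcpus = 4
cpus = "0-3"

# Domain Z: vcpus 0-1 on big cores, vcpus 2-3 on LITTLE cores,
# using the per-vcpu list form of "cpus"
vcpus = 4
cpus = ["0-3", "0-3", "4-5", "4-5"]
```

And the same can be done (or re-done, if something undid it) at
runtime with, e.g., `xl vcpu-pin Z 2 4-5`.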
> I think we should think about the implications regarding today's
> interfaces:
>
I totally agree. (At least) these three things should be very clear
before starting to implement anything:
 - what behavior we want to achieve, from the point of view of both
   the hypervisor and the guests;
 - what the interface will be;
 - how this new interface will map onto, and interact with, existing
   interfaces.

> - Do we want to be able to use different schedulers for big/little
>   (this would mean some cpupool related solution)? I'd prefer to
>   have only one scheduler type for each domain. :-)
>
Well, this actually is, IMO, from a behavioral perspective, a nice
point in favour of supporting a split-cpupool solution. In fact, I
can envision scenarios and reasons for having different schedulers
between big cpus and LITTLE cpus (or the same scheduler with
different parameters).

But then, yes, if we want a domain to have both big and LITTLE cpus,
we'd need to allow a domain to live in more than one cpupool at a
time, which means a domain would have multiple schedulers. I don't
think this is impossible... almost all the scheduling happens at the
vcpu level already. The biggest challenge is probably the interface.

_HOWEVER_, I think this is something that can well come later, like
in phase 2 or 3, as an enhancement/possibility, rather than being the
foundation of big.LITTLE support in Xen.

> - What about scheduling parameters like weight and cap? How would
>   those apply (answer probably influencing pinning solution).
>   Remember that especially the downsides of pinning led to the
>   introduction of cpupools.
>
Very important point indeed. FWIW, there's already a scheduler that
supports per-vcpu parameters (so some glue code, or code from which
to take inspiration, is there already). And scheduling happens at the
vcpu level anyway. I.e., it would not be too hard to make it possible
to pass down to Xen, say, per-vcpu weights.
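To illustrate the kind of expansion the toolstack would do, here is a
minimal sketch of mapping per-class parameters onto per-vcpu ones.
All the names and weight values are this example's assumptions, not
an existing xl/libxl API:

```python
# Sketch: expand per-class scheduling parameters into per-vcpu ones,
# the way xl (or libxl) could, before handing them to the hypervisor.
# Class names, parameter names and values are all hypothetical.

def expand_params(vcpu_classes, class_params):
    """vcpu_classes: list mapping vcpu id -> "big" or "LITTLE".
    class_params: dict mapping class name -> scheduling parameters.
    Returns one parameter dict per vcpu, in vcpu-id order."""
    return [dict(class_params[cls]) for cls in vcpu_classes]

# Domain Z from the example above: vcpus 0-1 big, vcpus 2-3 LITTLE
per_vcpu = expand_params(
    ["big", "big", "LITTLE", "LITTLE"],
    {"big": {"weight": 512}, "LITTLE": {"weight": 128}},
)
print(per_vcpu)
# [{'weight': 512}, {'weight': 512}, {'weight': 128}, {'weight': 128}]
```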
Then, at, e.g., the xl level, you specify a set of parameters for big
cpus and another set for LITTLE cpus, and either xl itself or libxl
will do the mapping and prepare the per-vcpu values.

Again, this is just to say that the "cpupool way" does not look
impossible, and may be interesting. However, although I'd like to
think more (and see more thoughts) about designs and possibilities, I
continue to think it should be neither the only nor the first mode
that we implement.

> - Is big.LITTLE to be expected to be combined with NUMA?
>
> - Do we need to support live migration for domains containing both
>   types of cpus?
>
Interesting points too.

Regards,
Dario
--
<> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)