On Tue, 2016-09-20 at 18:03 +0800, Peng Fan wrote:
> Hi Dario,
> On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario Faggioli wrote:
> > On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote:
> > > On Tue, 20 Sep 2016, Dario Faggioli wrote:
> > > > And this would work even if/when there is only one cpupool, or in
> > > > general for domains that are in a pool that has both big and LITTLE
> > > > pcpus. Furthermore, big.LITTLE support and cpupools will be
> > > > orthogonal, just like pinning and cpupools are orthogonal right now.
> > > > I.e., once we have what I described above, nothing prevents us from
> > > > implementing per-vcpu cpupool membership, and either creating the
> > > > two (or more!) big and LITTLE pools, or mixing things even more, for
> > > > more complex and specific use cases. :-)
> > >
> > > I think that everybody agrees that this is the best long term
> > > solution.
> >
> > Well, no, that wasn't obvious to me. If that's the case, it's already
> > something! :-)
> >
> > > > Actually, with the cpupool solution, if you want a guest (or dom0)
> > > > to actually have both big and LITTLE vcpus, you necessarily have to
> > > > implement per-vcpu (rather than per-domain, as it is now) cpupool
> > > > membership. I said myself it's not impossible, but it's certainly
> > > > some work... with the scheduler solution you basically get that for
> > > > free!
> > > >
> > > > So, basically, if we use cpupools for the basics of big.LITTLE
> > > > support, there's no way out of it (apart from implementing
> > > > scheduling support afterwards, but that looks backwards to me,
> > > > especially when thinking about it with the code in mind).
> > >
> > > The question is: what is the best short-term solution we can ask Peng
> > > to implement that allows Xen to run on big.LITTLE systems today?
> > > Possibly getting us closer to the long term solution, or at least not
> > > farther from it?
> >
> > So, I still have to look closely at the patches in this series. But,
> > with Credit2 in mind, if one:
> >
> >  - takes advantage of the knowledge of what arch a pcpu belongs to,
> >    inside the code that arranges the pcpus in runqueues, which means
> >    we'll end up with big runqueues and LITTLE runqueues. I re-wrote
> >    that code, I can provide pointers and help, if necessary;
> >  - tweaks the one or two instances of for_each_runqueue() [*] that
> >    there are in the code into a for_each_runqueue_of_same_class(),
> >    i.e.:
>
> Do you have a plan to add this support for big.LITTLE?
>
> I admit that this is the first time I have looked into the scheduler
> part. If I understand wrongly, please correct me.
>
No, I was not really planning to work on this directly myself... I was
only providing opinions and advice. That may of course change, e.g., if
we decide that it is of absolutely capital importance for Xen to gain
big.LITTLE support in a matter of days. :-)  That is rather unlikely at
this stage anyway, though, whoever ends up working on it, given where we
stand in the Xen 4.8 release process.

In any case, I'm happy to help, with any kind of advice --as I'm already
trying to do-- but also in a more concrete way, on actual code... but I
strongly think that it's better if you lead the effort, e.g., by trying
to do what we agree upon, and asking immediately, as soon as you get
stuck. :-)
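To give you something concrete to start from, here is a very rough,
standalone sketch of the "runqueues per class" /
for_each_runqueue_of_same_class() idea quoted above. To be clear, this is
not actual Xen/Credit2 code: all the names and data structures here
(struct runqueue, cpu_class, the load field) are invented purely for
illustration, and the real Credit2 runqueue code looks quite different.

/*
 * Standalone illustration of the "runqueues per CPU class" idea:
 * every runqueue is tagged as big or LITTLE, and load balancing
 * only ever walks runqueues of the same class as the one we start
 * from. All names are made up for this sketch; they are not the
 * actual Xen/Credit2 ones.
 */
#include <stdio.h>

enum cpu_class { CPU_CLASS_LITTLE, CPU_CLASS_BIG };

struct runqueue {
    int id;
    enum cpu_class cls;  /* which cluster this runqueue's pcpus belong to */
    int load;            /* some load metric, e.g. nr. of runnable vcpus */
};

#define NR_RUNQUEUES 4

static struct runqueue runqueues[NR_RUNQUEUES] = {
    { 0, CPU_CLASS_LITTLE, 3 },
    { 1, CPU_CLASS_LITTLE, 1 },
    { 2, CPU_CLASS_BIG,    5 },
    { 3, CPU_CLASS_BIG,    2 },
};

/* Iterate over all runqueues... */
#define for_each_runqueue(rq) \
    for ((rq) = runqueues; (rq) < runqueues + NR_RUNQUEUES; (rq)++)

/* ...or only over those whose pcpus are of the same class as 'ref'. */
#define for_each_runqueue_of_same_class(rq, ref) \
    for_each_runqueue(rq)                        \
        if ((rq)->cls != (ref)->cls)             \
            continue;                            \
        else

/* Find the least loaded runqueue a vcpu of 'ref' may be migrated to. */
static struct runqueue *least_loaded_same_class(struct runqueue *ref)
{
    struct runqueue *rq, *best = ref;

    for_each_runqueue_of_same_class(rq, ref)
        if (rq->load < best->load)
            best = rq;

    return best;
}

int main(void)
{
    struct runqueue *busy_big = &runqueues[2];
    struct runqueue *target = least_loaded_same_class(busy_big);

    printf("migrate from runqueue %d to runqueue %d (same class only)\n",
           busy_big->id, target->id);
    return 0;
}

The point being: once the runqueues are built so that no runqueue ever
mixes big and LITTLE pcpus, load balancing (and hence vcpu migration)
naturally happens only between runqueues of the same class, without any
cpupool or explicit pinning being involved.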
> There is a runqueue for each physical cpu, and there are several
> vcpus in the runqueue. The scheduler will pick a vcpu in the runqueue
> to run on the physical cpu.
>
If you start by "just" using pinning, as I envisioned for early support,
and as George is also suggesting as a first step, there is going to be
nothing to do within Xen and on the scheduler's runqueues at all. And it
won't be wasted effort either, because all the code for parsing and
implementing the interface in xl and libxl will be reusable for when we
ditch the implicit pinning and integrate the mechanism within the
scheduler's logic.

> A vcpu is bound to a physical cpu in alloc_vcpu(), but the vcpu can be
> scheduled or migrated to a different physical cpu.
>
> Setting cpu soft affinity and hard affinity restricts which cpus the
> vcpus can be scheduled on. Then is there a need to introduce more
> runqueues?
>
No, it's all more dynamic and --allow me-- more elegant than what you
describe... But I do understand that this is the first time you have
looked at the scheduling code, so it's ok not to have this clear yet. :-)

> This seems more complicated than cpupool (:
>
Nah, it's not... It may be a comparable amount of effort, but for a
better end result! :-)

Regards,
Dario

--
<> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
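P.S. Coming back to the pinning-based early support: just to illustrate
what it could look like from the user's point of view, below is a guest
config fragment that uses only the existing xl hard/soft affinity options
(cpus= and cpus_soft=), i.e., no new syntax at all. The cpu numbering is
a made-up assumption, say a board where pcpus 0-3 are the LITTLE cores
and pcpus 4-7 are the big ones:

# Hypothetical layout: pcpus 0-3 = LITTLE cluster, pcpus 4-7 = big cluster.
vcpus = 4

# Hard affinity: the guest's vcpus can only ever run on the big cores.
cpus = "4-7"

# Soft affinity: prefer pcpus 4-5; the other big cores remain allowed.
cpus_soft = "4-5"

The same effect can be obtained at runtime with something like
'xl vcpu-pin <domain> all 4-7'. The point of the first step would "only"
be to derive such affinities automatically from a big/LITTLE aware guest
config, instead of asking the user to know the cluster layout and write
the masks by hand.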