On Fri, 2018-10-19 at 17:33 +0200, Frederic Weisbecker wrote:
> On Fri, Oct 19, 2018 at 11:16:49AM -0400, Rik van Riel wrote:
> > On Fri, 2018-10-19 at 13:40 +0200, Jan H. Schönherr wrote:
> > > 
> > > Now, it would be possible to "invent" relocatable cpusets to address
> > > that issue ("I want affinity restricted to a core, I don't care
> > > which"), but then, the current way how cpuset affinity is enforced
> > > doesn't scale for making use of it from within the balancer. (The
> > > upcoming load balancing portion of the coscheduler currently uses a
> > > file similar to cpu.scheduled to restrict affinity to a
> > > load-balancer-controlled subset of the system.)
> > 
> > Oh boy, so the coscheduler is going to get its
> > own load balancer?
> > 
> > At that point, why bother integrating the
> > coscheduler into CFS, instead of making it its
> > own scheduling class?
> > 
> > CFS is already complicated enough that it borders
> > on unmaintainable. I would really prefer to have
> > the coscheduler code separate from CFS, unless
> > there is a really compelling reason to do otherwise.
> 
> I guess he wants to reuse as much as possible from the CFS features and
> code present or to come (nice, fairness, load balancing, power aware,
> NUMA aware, etc...).

I wonder if things like nice levels, fairness,
and balancing could be broken out into code
that could be reused from both CFS and a new
co-scheduler scheduling class.

A bunch of the cgroup code is already broken
out, but maybe some more could be broken out
and shared, too?

> OTOH you're right, the thing has specific enough requirements to
> consider a new sched policy. 

Some bits of functionality come to mind:
- track groups of tasks that should be co-scheduled
  (e.g. all the VCPUs of a virtual machine)
- track the subsets of those groups that are runnable
  (e.g. the currently runnable VCPUs of a virtual machine)
- figure out time slots and CPU assignments to efficiently
  use CPU time for the co-scheduled tasks
  (while leaving some configurable(?) amount of CPU time
  available for other tasks)
- configure some lower-level code on each affected CPU
  to "run task A in slot X", etc.
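
To make the list above a bit more concrete, here is a rough
userspace sketch in C of the first three items. Every name in it
(cosched_group, cosched_plan_slot, the COSCHED_* constants) is made
up for illustration; none of it is existing kernel code or taken
from the coscheduler patches. It also assumes an all-or-nothing
policy per slot, which is only one of several possible choices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limits and markers, chosen only for this sketch. */
#define COSCHED_MAX_TASKS 8
#define COSCHED_MAX_CPUS  8
#define COSCHED_IDLE      (-1)	/* CPU left free for other tasks */

/* One member of a co-scheduled group, e.g. one VCPU thread. */
struct cosched_task {
	int  id;	/* hypothetical task identifier */
	bool runnable;	/* is this task currently runnable? */
};

/* Tasks that should run at the same time, e.g. all VCPUs of one VM. */
struct cosched_group {
	struct cosched_task tasks[COSCHED_MAX_TASKS];
	size_t nr_tasks;	/* must be <= COSCHED_MAX_TASKS */
};

/* Plan for one time slot: which task id runs on each CPU. */
struct cosched_slot {
	int task_on_cpu[COSCHED_MAX_CPUS];
	size_t nr_cpus;
};

/* Count the currently runnable subset of the group. */
static size_t cosched_nr_runnable(const struct cosched_group *g)
{
	size_t n = 0;

	for (size_t i = 0; i < g->nr_tasks; i++)
		if (g->tasks[i].runnable)
			n++;
	return n;
}

/*
 * Fill one slot: every runnable task gets its own CPU, in group order.
 * If the runnable subset does not fit on nr_cpus, nothing is placed
 * (all-or-nothing, an assumption of this sketch). CPUs that are not
 * used stay COSCHED_IDLE. Returns the number of CPUs used.
 */
static size_t cosched_plan_slot(const struct cosched_group *g,
				struct cosched_slot *slot, size_t nr_cpus)
{
	size_t cpu = 0;

	assert(nr_cpus <= COSCHED_MAX_CPUS);

	slot->nr_cpus = nr_cpus;
	for (size_t i = 0; i < nr_cpus; i++)
		slot->task_on_cpu[i] = COSCHED_IDLE;

	if (cosched_nr_runnable(g) > nr_cpus)
		return 0;

	for (size_t i = 0; i < g->nr_tasks; i++)
		if (g->tasks[i].runnable)
			slot->task_on_cpu[cpu++] = g->tasks[i].id;
	return cpu;
}
```

The last item, telling each CPU to "run task A in slot X", would
then amount to handing the finished cosched_slot to per-CPU code.
The sketch leaves out locking, slot lengths, and the CPU time
reserved for other tasks entirely.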

This really does not seem like something that could be
shoehorned into CFS without making it unmaintainable.

Furthermore, it also seems like the kind of thing that you
could never really get into a highly efficient state as long
as it is weighed down by the rest of CFS.

-- 
All Rights Reversed.