On Fri, 2018-10-19 at 17:33 +0200, Frederic Weisbecker wrote:
> On Fri, Oct 19, 2018 at 11:16:49AM -0400, Rik van Riel wrote:
> > On Fri, 2018-10-19 at 13:40 +0200, Jan H. Schönherr wrote:
> > >
> > > Now, it would be possible to "invent" relocatable cpusets to
> > > address that issue ("I want affinity restricted to a core, I
> > > don't care which"), but then, the current way how cpuset
> > > affinity is enforced doesn't scale for making use of it from
> > > within the balancer. (The upcoming load balancing portion of
> > > the coscheduler currently uses a file similar to cpu.scheduled
> > > to restrict affinity to a load-balancer-controlled subset of
> > > the system.)
> >
> > Oh boy, so the coscheduler is going to get its
> > own load balancer?
> >
> > At that point, why bother integrating the
> > coscheduler into CFS, instead of making it its
> > own scheduling class?
> >
> > CFS is already complicated enough that it borders
> > on unmaintainable. I would really prefer to have
> > the coscheduler code separate from CFS, unless
> > there is a really compelling reason to do otherwise.
>
> I guess he wants to reuse as much as possible from the CFS features
> and code present or to come (nice, fairness, load balancing, power
> aware, NUMA aware, etc...).

I wonder if things like nice levels, fairness,
and balancing could be broken out into code
that could be reused from both CFS and a new
co-scheduler scheduling class.

A bunch of the cgroup code is already broken
out, but maybe some more could be broken out
and shared, too?

> OTOH you're right, the thing has specific enough requirements to
> consider a new sched policy.

Some bits of functionality come to mind:
- track groups of tasks that should be co-scheduled
  (eg all the VCPUs of a virtual machine)
- track the subsets of those groups that are runnable
  (eg. the currently runnable VCPUs of a virtual machine)
- figure out time slots and CPU assignments to efficiently
  use CPU time for the co-scheduled tasks
  (while leaving some configurable(?) amount of CPU time
  available for other tasks)
- configuring some lower-level code on each affected CPU
  to "run task A in slot X", etc

This really does not seem like something that could be
shoehorned into CFS without making it unmaintainable.

Furthermore, it also seems like the thing that you could
never really get into a highly efficient state as long
as it is weighed down by the rest of CFS.

--
All Rights Reversed.