On Tue, 2018-09-18 at 15:22 +0200, Jan H. Schönherr wrote:
> On 09/17/2018 11:48 AM, Peter Zijlstra wrote:
> > On Sat, Sep 15, 2018 at 10:48:20AM +0200, Jan H. Schönherr wrote:
> > >
> > > CFS bandwidth control would also need to change significantly as
> > > we would now have to dequeue/enqueue nested cgroups below a
> > > throttled/unthrottled hierarchy. Unless *those* task groups don't
> > > participate in this flattening.
> >
> > Right, so the whole bandwidth thing becomes a pain; the simplest
> > solution is to detect the throttle at task-pick time, dequeue and
> > try again. But that is indeed quite horrible.
> >
> > I'm not quite sure how this will play out.
> >
> > Anyway, if we pull off this flattening feat, then you can no longer
> > use the hierarchy for this co-scheduling stuff.
>
> Yeah. I might be a bit biased towards keeping or at least not fully
> throwing away the nesting of CFS runqueues. ;)

I do not have a strong bias either way. However, I
would like the overhead of the cpu controller to be
so low that we can actually use it :)

Task priorities in a flat runqueue are relatively
straightforward, with vruntime scaling just like
done for nice levels, but I have to admit that
throttled groups provide a challenge.

Dequeueing throttled tasks is pretty straightforward,
but requeueing them afterwards when they are no
longer throttled could present a real challenge
in some situations.

> However, the only efficient way that I can currently think of, is a
> hybrid model between the "full nesting" that is currently there, and
> the "no nesting" you were describing above.
>
> It would flatten all task groups that do not actively contribute some
> function, which would be all task groups purely for accounting
> purposes and those for *unthrottled* CFS hierarchies (and those for
> coscheduling that contain exactly one SE in a runqueue). The nesting
> would still be kept for *throttled* hierarchies
> (and the coscheduling stuff).
> (And if you wouldn't have mentioned a way to get rid of nesting
> completely, I would have kept a single level of nesting for
> accounting purposes as well.)
>
> This would allow us to lazily dequeue SEs that have run out of
> bandwidth when we encounter them, and already enqueue them in the
> nested task group (whose SE is not enqueued at the moment). That way,
> it's still a O(1) operation to re-enable all tasks, once runtime is
> available again. And O(1) to throttle a repeat offender.

I suspect most systems will have a number of runnable
tasks no larger than the number of CPUs most of the
time.

That makes "re-enable all the tasks" often equivalent
to "re-enable one task".

Can we handle the re-enabling (or waking up!) of one
task almost as fast as we can without the cpu
controller?

-- 
All Rights Reversed.
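
For reference, the "vruntime scaling just like done for nice levels"
mentioned above could look roughly like the sketch below. This is
illustration only, not kernel code: vruntime_delta() and flat_weight()
are hypothetical names, and deriving a task's flat-runqueue weight from
its share of the group's load is an assumption about how a flattened
hierarchy might assign weights. The only part taken from how CFS works
today is that runtime is scaled by the nice-0 weight (1024) divided by
the entity's weight.

```c
#include <assert.h>
#include <stdint.h>

/* Weight of a nice-0 task (1024 in the kernel's nice-to-weight table). */
#define NICE_0_WEIGHT 1024u

/*
 * Virtual runtime charged for delta_exec_ns of real runtime.
 * Heavier entities accumulate vruntime more slowly, so they are
 * picked more often. Hypothetical helper, for illustration.
 */
static uint64_t vruntime_delta(uint64_t delta_exec_ns, uint32_t weight)
{
	assert(weight > 0);
	return delta_exec_ns * NICE_0_WEIGHT / weight;
}

/*
 * ASSUMPTION: in a flat runqueue, a task's effective weight is its
 * fraction of the group's runnable load, applied to the group's own
 * weight. group_load is the sum of the weights of all runnable
 * entities in the group, including this task.
 */
static uint32_t flat_weight(uint32_t task_weight, uint32_t group_weight,
			    uint32_t group_load)
{
	assert(group_load > 0 && group_load >= task_weight);
	uint64_t w = (uint64_t)task_weight * group_weight / group_load;

	/* Never let the weight drop to zero, or the division above traps. */
	return w ? (uint32_t)w : 1;
}
```

With two nice-0 tasks in a nice-0-weight group, each task would get an
effective weight of 512 under this assumption, so its vruntime would
advance twice as fast as that of a lone nice-0 task at the top level.
None of this addresses the throttling problem, it only shows why the
priority side is the easy part.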