On Tue, 2018-09-18 at 15:22 +0200, Jan H. Schönherr wrote:
> On 09/17/2018 11:48 AM, Peter Zijlstra wrote:
> > On Sat, Sep 15, 2018 at 10:48:20AM +0200, Jan H. Schönherr wrote:
> > 
> > > 
> > > CFS bandwidth control would also need to change significantly as
> > > we would now
> > > have to dequeue/enqueue nested cgroups below a
> > > throttled/unthrottled hierarchy.
> > > Unless *those* task groups don't participate in this flattening.
> > 
> > Right, so the whole bandwidth thing becomes a pain; the simplest
> > solution is to detect the throttle at task-pick time, dequeue and
> > try
> > again. But that is indeed quite horrible.
> > 
> > I'm not quite sure how this will play out.
> > 
> > Anyway, if we pull off this flattening feat, then you can no longer
> > use
> > the hierarchy for this co-scheduling stuff.
> 
> Yeah. I might be a bit biased towards keeping or at least not fully
> throwing away
> the nesting of CFS runqueues. ;)

I do not have a strong bias either way. However, I 
would like the overhead of the cpu controller to be
so low that we can actually use it :)

Task priorities in a flat runqueue are relatively
straightforward, with vruntime scaling just like
done for nice levels, but I have to admit that
throttled groups provide a challenge.

Dequeueing throttled tasks is pretty straightforward,
but requeueing them afterwards when they are no
longer throttled could present a real challenge
in some situations.

> However, the only efficient way that I can currently think of, is a
> hybrid model
> between the "full nesting" that is currently there, and the "no
> nesting" you were
> describing above.
> 
> It would flatten all task groups that do not actively contribute some
> function,
> which would be all task groups purely for accounting purposes and
> those for
> *unthrottled* CFS hierarchies (and those for coscheduling that
> contain exactly
> one SE in a runqueue). The nesting would still be kept for
> *throttled* hierarchies
> (and the coscheduling stuff). (And if you wouldn't have mentioned a
> way to get
> rid of nesting completely, I would have kept a single level of
> nesting for
> accounting purposes as well.)
> 
> This would allow us to lazily dequeue SEs that have run out of
> bandwidth when
> we encounter them, and already enqueue them in the nested task group
> (whose SE
> is not enqueued at the moment). That way, it's still a O(1) operation
> to re-enable
> all tasks, once runtime is available again. And O(1) to throttle a
> repeat offender.

I suspect most systems will have a number of runnable
tasks no larger than the number of CPUs most of the
time.

That makes "re-enable all the tasks" often equivalent
to "re-enable one task".

Can we handle the re-enabling (or waking up!) of one
task almost as fast as we can without the cpu controller?

-- 
All Rights Reversed.