On Wed, 2019-08-28 at 15:50 +0200, Vincent Guittot wrote:
> Hi Rik,
>
> On Thu, 22 Aug 2019 at 04:18, Rik van Riel wrote:
> > The runnable_load magic is used to quickly propagate information
> > about runnable tasks up the hierarchy of runqueues. The
> > runnable_load_avg is mostly used for the load balancing code,
> > which only examines the value at the root cfs_rq.
> >
> > Redefine the root cfs_rq runnable_load_avg to be the sum of
> > task_h_loads of the runnable tasks. This works because the
> > hierarchical runnable_load of a task is already equal to the
> > task_se_h_load today. This provides enough information to the
> > load balancer.
> >
> > The runnable_load_avg of the cgroup cfs_rqs does not appear to be
> > used for anything, so don't bother calculating those.
> >
> > This removes one of the things that the code currently traverses
> > the cgroup hierarchy for, and getting rid of it brings us one step
> > closer to a flat runqueue for the CPU controller.
>
> I like your proposal but just wanted to clarify one thing with this
> patch.
> Although you removed the computation of runnable_load_avg of the
> cgroup cfs_rq, we are still traversing the hierarchy to update the
> root cfs_rq runnable_load_avg because we are traversing the hierarchy
> for computing the task_h_loads

The task_h_load hierarchy traversal in update_cfs_rq_h_load is
rate limited to once a jiffy, though. Rate limiting the hierarchy
traversal significantly reduces overhead.

> That being said, if we manage to remove the need on using
> runnable_load_avg we will completely skip this traversal. I have a
> proposal to remove it from load balance and wake up path but i
> haven't look at numa stats which also use it

-- 
All Rights Reversed.