linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Josef Bacik <josef@toxicpanda.com>
To: Rik van Riel <riel@surriel.com>
Cc: Josef Bacik <josef@toxicpanda.com>,
	linux-kernel@vger.kernel.org, kernel-team@fb.com, pjt@google.com,
	dietmar.eggemann@arm.com, peterz@infradead.org, mingo@redhat.com,
	morten.rasmussen@arm.com, tglx@linutronix.de,
	mgorman@techsingularity.net, vincent.guittot@linaro.org
Subject: Re: [PATCH 03/10] sched,fair: redefine runnable_load_avg as the sum of task_h_load
Date: Mon, 1 Jul 2019 15:22:35 -0400	[thread overview]
Message-ID: <20190701192234.5oxkuysimi437utz@macbook-pro-91.dhcp.thefacebook.com> (raw)
In-Reply-To: <757e0af14b714b596417b31c45098fc314ed7c8a.camel@surriel.com>

On Mon, Jul 01, 2019 at 12:47:35PM -0400, Rik van Riel wrote:
> On Mon, 2019-07-01 at 12:29 -0400, Josef Bacik wrote:
> > On Fri, Jun 28, 2019 at 04:49:06PM -0400, Rik van Riel wrote:
> > > The runnable_load magic is used to quickly propagate information
> > > about
> > > runnable tasks up the hierarchy of runqueues. The runnable_load_avg
> > > is
> > > mostly used for the load balancing code, which only examines the
> > > value at
> > > the root cfs_rq.
> > > 
> > > Redefine the root cfs_rq runnable_load_avg to be the sum of
> > > task_h_loads
> > > of the runnable tasks. This works because the hierarchical
> > > runnable_load of
> > > a task is already equal to the task_se_h_load today. This provides
> > > enough
> > > information to the load balancer.
> > > 
> > > The runnable_load_avg of the cgroup cfs_rqs does not appear to be
> > > used for anything, so don't bother calculating those.
> > > 
> > > This removes one of the things that the code currently traverses
> > > the
> > > cgroup hierarchy for, and getting rid of it brings us one step
> > > closer
> > > to a flat runqueue for the CPU controller.
> > > 
> > 
> > My memory on this stuff is very hazy, but IIRC we had the
> > runnable_sum and the
> > runnable_avg separated out because you could have the avg lag behind
> > the sum.
> > So like you enqueue a bunch of new entities who's avg may have
> > decayed a bunch
> > and so their overall load is not felt on the CPU until they start
> > running, and
> > now you've overloaded that CPU.  The sum was there to make sure new
> > things
> > coming onto the CPU added actual load to the queue instead of looking
> > like there
> > was no load.
> > 
> > Is this going to be a problem now with this new code?
> 
> That is a good question!
> 
> On the one hand, you may well be right.
> 
> On the other hand, while I see the old code calculating
> runnable_sum, I don't really see it _using_ it to drive
> scheduling decisions.
> 
> It would be easy to define the CPU cfs_rq->runnable_load_sum
> as being the sum of task_se_h_weight() of each runnable task
> on the CPU (for example), but what would we use it for?
> 
> What am I missing?

It's suuuuuper sublte, but you're right in that we don't really need the
runnable_avg per-se, but what you do is you kill calc_group_runnable, which used
to do this

load_avg = max(cfs_rq->avg.load_avg,
               scale_load_down(cfs_rq->load.weight));

runnable = max(cfs_rq->avg.runnable_load_avg,
               scale_load_down(cfs_rq->runnable_weight));

so we'd account for this weirdness of adding a previously idle process to a new
CPU and overloading the CPU because we'd add a bunch of these 0 weight looking
tasks that suddenly all wake up and are on the same CPU.  So we used the
runnable_weight to account for what was actually happening, and the max of
load_avg and the weight to figure out what the potential load would be.

What you've done here is change the weighting stuff to be completely based on
load avg, which is problematic for the reasons above.  Did you fix this later on
in your patches?  If so then just tell me to keep reading and I'll do that ;).
Thanks,

Josef

  reply	other threads:[~2019-07-01 19:22 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-28 20:49 [PATCH RFC v2 0/10] sched,fair: flatten CPU controller runqueues Rik van Riel
2019-06-28 20:49 ` [PATCH 01/10] sched: introduce task_se_h_load helper Rik van Riel
2019-07-01 19:43   ` Josef Bacik
2019-06-28 20:49 ` [PATCH 02/10] sched: change /proc/sched_debug fields Rik van Riel
2019-07-01 19:52   ` Josef Bacik
2019-06-28 20:49 ` [PATCH 03/10] sched,fair: redefine runnable_load_avg as the sum of task_h_load Rik van Riel
2019-07-01 16:29   ` Josef Bacik
2019-07-01 16:47     ` Rik van Riel
2019-07-01 19:22       ` Josef Bacik [this message]
2019-07-01 19:32         ` Rik van Riel
2019-06-28 20:49 ` [PATCH 04/10] sched,fair: move runnable_load_avg to cfs_rq Rik van Riel
2019-06-28 20:49 ` [PATCH 05/10] sched,fair: remove cfs rqs from leaf_cfs_rq_list bottom up Rik van Riel
2019-07-04  9:33   ` Vincent Guittot
2019-07-05  1:02     ` Rik van Riel
2019-06-28 20:49 ` [PATCH 06/10] sched,cfs: use explicit cfs_rq of parent se helper Rik van Riel
2019-06-28 20:49 ` [PATCH 07/10] sched,cfs: fix zero length timeslice calculation Rik van Riel
2019-07-05 14:51   ` Vincent Guittot
2019-07-05 15:15     ` Rik van Riel
2019-06-28 20:49 ` [PATCH 08/10] sched,fair: refactor enqueue/dequeue_entity Rik van Riel
2019-06-28 20:49 ` [PATCH 09/10] sched,fair: add helper functions for flattened runqueue Rik van Riel
2019-07-01 20:20   ` Josef Bacik
2019-07-01 20:53     ` Rik van Riel
2019-06-28 20:49 ` [PATCH 10/10] sched,fair: flatten hierarchical runqueues Rik van Riel
2019-07-01 20:34   ` Josef Bacik
2019-07-01 20:58     ` Rik van Riel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190701192234.5oxkuysimi437utz@macbook-pro-91.dhcp.thefacebook.com \
    --to=josef@toxicpanda.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=kernel-team@fb.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@redhat.com \
    --cc=morten.rasmussen@arm.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=riel@surriel.com \
    --cc=tglx@linutronix.de \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).