linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: mingo@kernel.org, linux-kernel@vger.kernel.org, tj@kernel.org,
	josef@toxicpanda.com, torvalds@linux-foundation.org,
	vincent.guittot@linaro.org, efault@gmx.de, pjt@google.com,
	clm@fb.com, dietmar.eggemann@arm.com, bsegall@google.com,
	yuyang.du@intel.com
Subject: Re: [PATCH -v2 12/18] sched/fair: Rewrite PELT migration propagation
Date: Mon, 9 Oct 2017 11:45:17 +0200	[thread overview]
Message-ID: <20171009094517.wfovc5b2dpbazta4@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <20171009080856.GB24129@e105550-lin.cambridge.arm.com>

On Mon, Oct 09, 2017 at 09:08:57AM +0100, Morten Rasmussen wrote:
> > --- a/kernel/sched/debug.c
> > +++ b/kernel/sched/debug.c
> > @@ -565,6 +565,8 @@ void print_cfs_rq(struct seq_file *m, in
> >  			cfs_rq->removed.load_avg);
> >  	SEQ_printf(m, "  .%-30s: %ld\n", "removed.util_avg",
> >  			cfs_rq->removed.util_avg);
> > +	SEQ_printf(m, "  .%-30s: %ld\n", "removed.runnable_sum",
> > +			cfs_rq->removed.runnable_sum);
> >  #ifdef CONFIG_FAIR_GROUP_SCHED
> >  	SEQ_printf(m, "  .%-30s: %lu\n", "tg_load_avg_contrib",
> >  			cfs_rq->tg_load_avg_contrib);
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -3330,11 +3330,77 @@ void set_task_rq_fair(struct sched_entit
> >  	se->avg.last_update_time = n_last_update_time;
> >  }
> >  
> > -/* Take into account change of utilization of a child task group */
> > +
> > +/*
> > + * When on migration a sched_entity joins/leaves the PELT hierarchy, we need to
> > + * propagate its contribution. The key to this propagation is the invariant
> > + * that for each group:
> > + *
> > + *   ge->avg == grq->avg						(1)
> > + *
> > + * _IFF_ we look at the pure running and runnable sums. Because they
> > + * represent the very same entity, just at different points in the hierarchy.
> > + *
> > + *
> > + * Per the above update_tg_cfs_util() is trivial (and still 'wrong') and
> > + * simply copies the running sum over.
> > + *
> > + * However, update_tg_cfs_runnable() is more complex. So we have:
> > + *
> > + *   ge->avg.load_avg = ge->load.weight * ge->avg.runnable_avg		(2)
> > + *
> > + * And since, like util, the runnable part should be directly transferable,
> > + * the following would _appear_ to be the straight forward approach:
> > + *
> > + *   grq->avg.load_avg = grq->load.weight * grq->avg.running_avg	(3)
> 
> Should it be grq->avg.runnable_avg instead of running_avg?

Yes very much so. Typing hard. Otherwise (3) would not follow from (2)
either.

> cfs_rq->avg.load_avg has been defined previous (in patch 2 I think) to
> be:
> 
> 	load_avg = \Sum se->avg.load_avg
> 		 = \Sum se->load.weight * se->avg.runnable_avg
> 
> That sum will increase when ge is runnable regardless of whether it is
> running or not. So, I think it has to be runnable_avg to make sense?

Ack.

> > + *
> > + * And per (1) we have:
> > + *
> > + *   ge->avg.running_avg == grq->avg.running_avg
> 
> You just said further up that (1) only applies to running and runnable
> sums? These are averages, so I think this is invalid use of (1). But
> maybe that is part of your point about (4) being wrong?
> 
> I'm still trying to get my head around the remaining bits, but it sort
> of depends if I understood the above bits correctly :)

So while true, the thing we're looking for is indeed runnable_avg.

> > + *
> > + * Which gives:
> > + *
> > + *                      ge->load.weight * grq->avg.load_avg
> > + *   ge->avg.load_avg = -----------------------------------		(4)
> > + *                               grq->load.weight
> > + *
> > + * Except that is wrong!
> > + *
> > + * Because while for entities historical weight is not important and we
> > + * really only care about our future and therefore can consider a pure
> > + * runnable sum, runqueues can NOT do this.
> > + *
> > + * We specifically want runqueues to have a load_avg that includes
> > + * historical weights. Those represent the blocked load, the load we expect
> > + * to (shortly) return to us. This only works by keeping the weights as
> > + * integral part of the sum. We therefore cannot decompose as per (3).
> > + *
> > + * OK, so what then?

And as the text above suggests, we cannot decompose because it contains
the blocked weight, which is not included in grq->load.weight and thus
things come apart.

> > + * Another way to look at things is:
> > + *
> > + *   grq->avg.load_avg = \Sum se->avg.load_avg
> > + *
> > + * Therefore, per (2):
> > + *
> > + *   grq->avg.load_avg = \Sum se->load.weight * se->avg.runnable_avg
> > + *
> > + * And the very thing we're propagating is a change in that sum (someone
> > + * joined/left). So we can easily know the runnable change, which would be, per
> > + * (2) the already tracked se->load_avg divided by the corresponding
> > + * se->weight.
> > + *
> > + * Basically (4) but in differential form:
> > + *
> > + *   d(runnable_avg) += se->avg.load_avg / se->load.weight
> > + *								   (5)
> > + *   ge->avg.load_avg += ge->load.weight * d(runnable_avg)

And this all has runnable again, and so should make sense.

Combined with an earlier bit, noted by Dietmar, I now have the below
delta.


---
 kernel/sched/fair.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b9e520b6923e..ba879c42bddd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3333,7 +3333,7 @@ __update_load_avg_cfs_rq(u64 now, int cpu, struct cfs_rq *cfs_rq)
  * differential update where we store the last value we propagated. This in
  * turn allows skipping updates if the differential is 'small'.
  *
- * Updating tg's load_avg is necessary before update_cfs_share().
+ * Updating tg's load_avg is necessary before update_cfs_group().
  */
 static inline void update_tg_load_avg(struct cfs_rq *cfs_rq, int force)
 {
@@ -3422,11 +3422,11 @@ void set_task_rq_fair(struct sched_entity *se,
  * And since, like util, the runnable part should be directly transferable,
  * the following would _appear_ to be the straight forward approach:
  *
- *   grq->avg.load_avg = grq->load.weight * grq->avg.running_avg	(3)
+ *   grq->avg.load_avg = grq->load.weight * grq->avg.runnable_avg	(3)
  *
  * And per (1) we have:
  *
- *   ge->avg.running_avg == grq->avg.running_avg
+ *   ge->avg.runnable_avg == grq->avg.runnable_avg
  *
  * Which gives:
  *
@@ -3601,7 +3601,7 @@ static inline void add_tg_cfs_propagate(struct cfs_rq *cfs_rq, long runnable_sum
  * avg. The immediate corollary is that all (fair) tasks must be attached, see
  * post_init_entity_util_avg().
  *
- * cfs_rq->avg is used for task_h_load() and update_cfs_share() for example.
+ * cfs_rq->avg is used for task_h_load() and update_cfs_group() for example.
  *
  * Returns true if the load decayed or we removed load.
  *

  reply	other threads:[~2017-10-09  9:45 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-01 13:20 [PATCH -v2 00/18] sched/fair: A bit of a cgroup/PELT overhaul Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 01/18] sched/fair: Clean up calc_cfs_shares() Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 02/18] sched/fair: Add comment to calc_cfs_shares() Peter Zijlstra
2017-09-28 10:03   ` Morten Rasmussen
2017-09-29 11:35     ` Peter Zijlstra
2017-09-29 13:03       ` Morten Rasmussen
2017-09-01 13:21 ` [PATCH -v2 03/18] sched/fair: Cure calc_cfs_shares() vs reweight_entity() Peter Zijlstra
2017-09-29  9:04   ` Morten Rasmussen
2017-09-29 11:38     ` Peter Zijlstra
2017-09-29 13:00       ` Morten Rasmussen
2017-09-01 13:21 ` [PATCH -v2 04/18] sched/fair: Remove se->load.weight from se->avg.load_sum Peter Zijlstra
2017-09-29 15:26   ` Morten Rasmussen
2017-09-29 16:39     ` Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 05/18] sched/fair: Change update_load_avg() arguments Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 06/18] sched/fair: Move enqueue migrate handling Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 07/18] sched/fair: Rename {en,de}queue_entity_load_avg() Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 08/18] sched/fair: Introduce {en,de}queue_load_avg() Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 09/18] sched/fair: More accurate reweight_entity() Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 10/18] sched/fair: Use reweight_entity() for set_user_nice() Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 11/18] sched/fair: Rewrite cfs_rq->removed_*avg Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 12/18] sched/fair: Rewrite PELT migration propagation Peter Zijlstra
2017-10-09  8:08   ` Morten Rasmussen
2017-10-09  9:45     ` Peter Zijlstra [this message]
2017-10-18 12:45       ` Morten Rasmussen
2017-10-30 13:35         ` Peter Zijlstra
2017-10-09 15:03   ` Vincent Guittot
2017-10-09 15:29     ` Vincent Guittot
2017-10-10  7:29       ` Peter Zijlstra
2017-10-10  7:44         ` Vincent Guittot
2017-10-13 15:22           ` Vincent Guittot
2017-10-13 20:41             ` Peter Zijlstra
2017-10-15 12:01               ` Vincent Guittot
2017-10-16 13:55               ` Vincent Guittot
2017-10-19 15:04                 ` Vincent Guittot
2017-10-30 17:20                   ` Peter Zijlstra
2017-10-31 11:14                     ` Vincent Guittot
2017-10-31 15:01                       ` Peter Zijlstra
2017-10-31 16:38                         ` Vincent Guittot
2017-11-16 14:09                         ` [PATCH v3] sched: Update runnable propagation rule Vincent Guittot
2017-11-16 14:21                           ` [PATCH v4] " Vincent Guittot
2017-12-06 11:40                             ` Peter Zijlstra
2017-12-06 17:10                               ` Ingo Molnar
2017-12-06 20:29                             ` [tip:sched/core] sched/fair: Update and fix the " tip-bot for Vincent Guittot
2017-09-01 13:21 ` [PATCH -v2 13/18] sched/fair: Propagate an effective runnable_load_avg Peter Zijlstra
2017-10-02 17:46   ` Dietmar Eggemann
2017-10-03  8:50     ` Peter Zijlstra
2017-10-03  9:29       ` Dietmar Eggemann
2017-10-03 12:26   ` Dietmar Eggemann
2017-09-01 13:21 ` [PATCH -v2 14/18] sched/fair: Synchonous PELT detach on load-balance migrate Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 15/18] sched/fair: Align PELT windows between cfs_rq and its se Peter Zijlstra
2017-10-04 19:27   ` Dietmar Eggemann
2017-10-06 13:02     ` Peter Zijlstra
2017-10-09 12:15       ` Dietmar Eggemann
2017-10-09 12:19         ` Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 16/18] sched/fair: More accurate async detach Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 17/18] sched/fair: Calculate runnable_weight slightly differently Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 18/18] sched/fair: Update calc_group_*() comments Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171009094517.wfovc5b2dpbazta4@hirez.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=bsegall@google.com \
    --cc=clm@fb.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=efault@gmx.de \
    --cc=josef@toxicpanda.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=morten.rasmussen@arm.com \
    --cc=pjt@google.com \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=vincent.guittot@linaro.org \
    --cc=yuyang.du@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).