All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vincent Guittot <vincent.guittot@linaro.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Tejun Heo <tj@kernel.org>, Josef Bacik <josef@toxicpanda.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Mike Galbraith <efault@gmx.de>, Paul Turner <pjt@google.com>,
	Chris Mason <clm@fb.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Ben Segall <bsegall@google.com>, Yuyang Du <yuyang.du@intel.com>
Subject: Re: [PATCH -v2 12/18] sched/fair: Rewrite PELT migration propagation
Date: Fri, 13 Oct 2017 17:22:54 +0200	[thread overview]
Message-ID: <20171013152254.GA7393@linaro.org> (raw)
In-Reply-To: <CAKfTPtBh7rLNvLLkhaa1qc6UFomUOgKVY6Rv-ZLVm7i4CENy7A@mail.gmail.com>

Hi Peter,

Le Tuesday 10 Oct 2017 à 09:44:53 (+0200), Vincent Guittot a écrit :
> On 10 October 2017 at 09:29, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Mon, Oct 09, 2017 at 05:29:04PM +0200, Vincent Guittot wrote:
> >> On 9 October 2017 at 17:03, Vincent Guittot <vincent.guittot@linaro.org> wrote:
> >> > On 1 September 2017 at 15:21, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> >> >> +/*
> >> >> + * When on migration a sched_entity joins/leaves the PELT hierarchy, we need to
> >> >> + * propagate its contribution. The key to this propagation is the invariant
> >> >> + * that for each group:
> >> >> + *
> >> >> + *   ge->avg == grq->avg                                               (1)
> >> >> + *
> >> >> + * _IFF_ we look at the pure running and runnable sums. Because they
> >> >> + * represent the very same entity, just at different points in the hierarchy.
> >> >
> >> > I agree for the running part because only one entity can be running
> >> > but i'm not sure for the pure runnable sum because we can have
> >> > several runnable task in a cfs_rq but only one runnable group entity
> >> > to reflect them or I misunderstand (1)
> >
> > The idea is that they (ge and grq) are the _same_ entity, just at
> > different levels in the hierarchy. If the grq is runnable, it is through
> > the ge.
> >
> > As a whole, they don't care how many runnable tasks there are.
> 
> Ok so I agree with this point that both grp and ge runnable follow the
> same trend
> 
> my point is that even if same changes apply on both grp and ge, we
> can't directly apply changes of grp's runnable on ge's runnable
> 
> >
> >> > As an example, we have 2 always running task TA and TB so their
> >> > load_sum is LOAD_AVG_MAX for each task The grq->avg.load_sum = \Sum
> >> > se->avg.load_sum = 2*LOAD_AVG_MAX But the ge->avg.load_sum will be
> >> > only LOAD_AVG_MAX
> >> >
> >> > So If we apply directly the d(TB->avg.load_sum) on the group hierachy
> >> > and on ge->avg.load_sum in particular, the latter decreases to 0
> >> > whereas it should decrease only by half
> >> >
> >> > I have been able to see this wrong behavior with a rt-app json file
> >> >
> >> > so I think that we should instead remove only
> >> >
> >> > delta = se->avg.load_sum / grq->avg.load_sum * ge->avg.load_sum
> >>
> >> delta = se->avg.load_sum / (grq->avg.load_sum+se->avg.load_sum) *
> >> ge->avg.load_sum
> >>
> >> as the se has already been detached
> >>
> >> > We don't have grq->avg.load_sum but we can have a rough estimate with
> >> > grq->avg.load_avg/grq->weight
> >
> > Hurm, I think I see what you're saying, let me ponder this more.
> 
> The formula above works was an example for detaching but it doesn't
> apply for all use case. we need a more generic way to propagate
> runnable changes

I have studied a bit more how to improve the propagation formula and the
changes below is doing the job for the UCs that I have tested.

Unlike running, we can't directly propagate the runnable through hierarchy
when we migrate a task. Instead we must ensure that we will not
over/underestimate the impact of the migration thanks to several rules:
-ge->avg.runnable_sum can't be higher than LOAD_AVG_MAX
-ge->avg.runnable_sum can't be lower than ge->avg.running_sum (once scaled to
the same range)
-we can't directly propagate a negative delta of runnable_sum because part of
this runnable time can be "shared" with others sched_entities and stays on the
gcfs_rq. Instead, we can't estimate the new runnable_sum of the gcfs_rq with
the formula: gcfs_rq's runnable sum = gcfs_rq's load_sum / gcfs_rq's weight.
-ge->avg.runnable_sum can't increase when we detach a task.

---
 kernel/sched/fair.c | 56 ++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 45 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 350dbec0..a063b048 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3489,33 +3489,67 @@ update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq
 static inline void
 update_tg_cfs_runnable(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq *gcfs_rq)
 {
-	long runnable_sum = gcfs_rq->prop_runnable_sum;
-	long runnable_load_avg, load_avg;
-	s64 runnable_load_sum, load_sum;
+	long running_sum, runnable_sum = gcfs_rq->prop_runnable_sum;
+	long runnable_load_avg, delta_avg, load_avg;
+	s64 runnable_load_sum, delta_sum, load_sum = 0;
 
 	if (!runnable_sum)
 		return;
 
 	gcfs_rq->prop_runnable_sum = 0;
 
+	/*
+	 * Get a rough estimate of gcfs_rq's runnable
+	 * This is a low guess as it assumes that tasks are equally
+	 * runnable which is not true but we can't really do better
+	 */
+	if (scale_load_down(gcfs_rq->load.weight))
+		load_sum = div_s64(gcfs_rq->avg.load_sum,
+				scale_load_down(gcfs_rq->load.weight));
+
+	/*
+	 * Propating a delta of runnable is not just adding it to ge's
+	 * runnable_sum:
+	 * - Adding a delta runnable can't make ge's runnable_sum higher than
+	 *   LOAD_AVG_MAX
+	 * - We can't directly remove a delta of runnable from
+	 *   ge's runnable_sum but we can only guest estimate what runnable
+	 *   will become thanks to few simple rules:
+	 *   - gcfs_rq's runnable is a good estimate
+	 *   - ge's runnable_sum can't increase when we remove runnable
+	 *   - runnable_sum can't be lower than running_sum
+	 */
+	if (runnable_sum >= 0) {
+		runnable_sum += se->avg.load_sum;
+		runnable_sum = min(runnable_sum, LOAD_AVG_MAX);
+	} else
+		runnable_sum = min(se->avg.load_sum, load_sum);
+
+	running_sum = se->avg.util_sum >> SCHED_CAPACITY_SHIFT;
+	runnable_sum = max(runnable_sum, running_sum);
+
 	load_sum = (s64)se_weight(se) * runnable_sum;
 	load_avg = div_s64(load_sum, LOAD_AVG_MAX);
 
-	add_positive(&se->avg.load_sum, runnable_sum);
-	add_positive(&se->avg.load_avg, load_avg);
+	delta_sum = load_sum - (s64)se_weight(se) * se->avg.load_sum;
+	delta_avg = load_avg - se->avg.load_avg;
 
-	add_positive(&cfs_rq->avg.load_avg, load_avg);
-	add_positive(&cfs_rq->avg.load_sum, load_sum);
+	se->avg.load_sum = runnable_sum;
+	se->avg.load_avg = load_avg;
+	add_positive(&cfs_rq->avg.load_avg, delta_avg);
+	add_positive(&cfs_rq->avg.load_sum, delta_sum);
 
 	runnable_load_sum = (s64)se_runnable(se) * runnable_sum;
 	runnable_load_avg = div_s64(runnable_load_sum, LOAD_AVG_MAX);
+	delta_sum = runnable_load_sum - se_weight(se) * se->avg.runnable_load_sum;
+	delta_avg = runnable_load_avg - se->avg.runnable_load_avg;
 
-	add_positive(&se->avg.runnable_load_sum, runnable_sum);
-	add_positive(&se->avg.runnable_load_avg, runnable_load_avg);
+	se->avg.runnable_load_sum = runnable_sum;
+	se->avg.runnable_load_avg = runnable_load_avg;
 
 	if (se->on_rq) {
-		add_positive(&cfs_rq->avg.runnable_load_avg, runnable_load_avg);
-		add_positive(&cfs_rq->avg.runnable_load_sum, runnable_load_sum);
+		add_positive(&cfs_rq->avg.runnable_load_avg, delta_avg);
+		add_positive(&cfs_rq->avg.runnable_load_sum, delta_sum);
 	}
 }
 
-- 
2.7.4

  reply	other threads:[~2017-10-13 15:23 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-01 13:20 [PATCH -v2 00/18] sched/fair: A bit of a cgroup/PELT overhaul Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 01/18] sched/fair: Clean up calc_cfs_shares() Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 02/18] sched/fair: Add comment to calc_cfs_shares() Peter Zijlstra
2017-09-28 10:03   ` Morten Rasmussen
2017-09-29 11:35     ` Peter Zijlstra
2017-09-29 13:03       ` Morten Rasmussen
2017-09-01 13:21 ` [PATCH -v2 03/18] sched/fair: Cure calc_cfs_shares() vs reweight_entity() Peter Zijlstra
2017-09-29  9:04   ` Morten Rasmussen
2017-09-29 11:38     ` Peter Zijlstra
2017-09-29 13:00       ` Morten Rasmussen
2017-09-01 13:21 ` [PATCH -v2 04/18] sched/fair: Remove se->load.weight from se->avg.load_sum Peter Zijlstra
2017-09-29 15:26   ` Morten Rasmussen
2017-09-29 16:39     ` Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 05/18] sched/fair: Change update_load_avg() arguments Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 06/18] sched/fair: Move enqueue migrate handling Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 07/18] sched/fair: Rename {en,de}queue_entity_load_avg() Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 08/18] sched/fair: Introduce {en,de}queue_load_avg() Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 09/18] sched/fair: More accurate reweight_entity() Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 10/18] sched/fair: Use reweight_entity() for set_user_nice() Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 11/18] sched/fair: Rewrite cfs_rq->removed_*avg Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 12/18] sched/fair: Rewrite PELT migration propagation Peter Zijlstra
2017-10-09  8:08   ` Morten Rasmussen
2017-10-09  9:45     ` Peter Zijlstra
2017-10-18 12:45       ` Morten Rasmussen
2017-10-30 13:35         ` Peter Zijlstra
2017-10-09 15:03   ` Vincent Guittot
2017-10-09 15:29     ` Vincent Guittot
2017-10-10  7:29       ` Peter Zijlstra
2017-10-10  7:44         ` Vincent Guittot
2017-10-13 15:22           ` Vincent Guittot [this message]
2017-10-13 20:41             ` Peter Zijlstra
2017-10-15 12:01               ` Vincent Guittot
2017-10-16 13:55               ` Vincent Guittot
2017-10-19 15:04                 ` Vincent Guittot
2017-10-30 17:20                   ` Peter Zijlstra
2017-10-31 11:14                     ` Vincent Guittot
2017-10-31 15:01                       ` Peter Zijlstra
2017-10-31 16:38                         ` Vincent Guittot
2017-11-16 14:09                         ` [PATCH v3] sched: Update runnable propagation rule Vincent Guittot
2017-11-16 14:21                           ` [PATCH v4] " Vincent Guittot
2017-12-06 11:40                             ` Peter Zijlstra
2017-12-06 17:10                               ` Ingo Molnar
2017-12-06 20:29                             ` [tip:sched/core] sched/fair: Update and fix the " tip-bot for Vincent Guittot
2017-09-01 13:21 ` [PATCH -v2 13/18] sched/fair: Propagate an effective runnable_load_avg Peter Zijlstra
2017-10-02 17:46   ` Dietmar Eggemann
2017-10-03  8:50     ` Peter Zijlstra
2017-10-03  9:29       ` Dietmar Eggemann
2017-10-03 12:26   ` Dietmar Eggemann
2017-09-01 13:21 ` [PATCH -v2 14/18] sched/fair: Synchonous PELT detach on load-balance migrate Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 15/18] sched/fair: Align PELT windows between cfs_rq and its se Peter Zijlstra
2017-10-04 19:27   ` Dietmar Eggemann
2017-10-06 13:02     ` Peter Zijlstra
2017-10-09 12:15       ` Dietmar Eggemann
2017-10-09 12:19         ` Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 16/18] sched/fair: More accurate async detach Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 17/18] sched/fair: Calculate runnable_weight slightly differently Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 18/18] sched/fair: Update calc_group_*() comments Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171013152254.GA7393@linaro.org \
    --to=vincent.guittot@linaro.org \
    --cc=bsegall@google.com \
    --cc=clm@fb.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=efault@gmx.de \
    --cc=josef@toxicpanda.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=morten.rasmussen@arm.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=yuyang.du@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.