From: Peter Zijlstra <peterz@infradead.org>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Tejun Heo <tj@kernel.org>, Ingo Molnar <mingo@redhat.com>,
linux-kernel <linux-kernel@vger.kernel.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Mike Galbraith <efault@gmx.de>, Paul Turner <pjt@google.com>,
Chris Mason <clm@fb.com>,
kernel-team@fb.com
Subject: Re: [PATCH 1/2] sched/fair: Fix how load gets propagated from cfs_rq to its sched_entity
Date: Mon, 1 May 2017 16:17:33 +0200 [thread overview]
Message-ID: <20170501141733.shphf35psasefraj@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <20170426165123.GA17921@linaro.org>
Hi,
sorry for being late to the party, trying to catch-up..
On Wed, Apr 26, 2017 at 06:51:23PM +0200, Vincent Guittot wrote:
> Ok, I see your point and agree that there is an issue when propagating
> load_avg of a task group which has tasks with lower weight than the share
> but your proposal has got issue because it uses runnable_load_avg instead
> of load_avg and this makes propagation of loadavg_avg incorrect, something
> like below which keeps using load_avg solve the problem
>
> + if (gcfs_rq->load.weight) {
> + long shares = scale_load_down(calc_cfs_shares(gcfs_rq, gcfs_rq->tg));
> +
> + load = min(gcfs_rq->avg.load_avg *
> + shares / scale_load_down(gcfs_rq->load.weight), shares);
>
So this here does:
( tg->load_avg = \Sum cfs_rq->load_avg )
load = cfs_rq->load.weight
tg_weight = tg->load_avg - cfs_rq->contrib + load
tg->shares * load
shares = -----------------
tg_weight
cfs_rq->load_avg
avg_shares = shares * ----------------
load
tg->shares * cfs_rq->load_avg
= -----------------------------
tg_weight
( se->load.weight = shares )
se->load_avg = min(shares, avg_shares);
So where shares (and se->load.weight) are an upper bound (due to using
cfs_rq->load.weight, see calc_cfs_shares), avg_shares is supposedly a
more accurate representation based on our PELT averages.
This looks OK; and I agree with Vincent that we should use
cfs_rq->avg.load_avg, not cfs_rq->runnable_load_avg, since tg->load_avg
is a sum of the former, not the latter.
Also, arguably calculating the above avg_shares directly (using the
second equation) might be more precise; but I doubt it makes much of a
difference, however since we do min(), we should at least clamp against
MIN_SHARES again.
Furthermore, it appears to me we want a different tg_weight value for
the avg_shares, something like:
tg_weight = tg->load_avg - cfs_rq->contrib + cfs_rq->avg.load_avg
To better match with the numerator's units, otherwise it will have a
tendency to push avg_shares down further than it needs to be.
(All assuming it actually works of course.. compile tested only)
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2634,7 +2634,9 @@ account_entity_dequeue(struct cfs_rq *cf
# ifdef CONFIG_SMP
static long calc_cfs_shares(struct cfs_rq *cfs_rq, struct task_group *tg)
{
- long tg_weight, load, shares;
+ long tg_weight, tg_shares, load, shares;
+
+ tg_shares = READ_ONCE(tg->shares);
/*
* This really should be: cfs_rq->avg.load_avg, but instead we use
@@ -2665,12 +2667,7 @@ static long calc_cfs_shares(struct cfs_r
* case no task is runnable on a CPU MIN_SHARES=2 should be returned
* instead of 0.
*/
- if (shares < MIN_SHARES)
- shares = MIN_SHARES;
- if (shares > tg->shares)
- shares = tg->shares;
-
- return shares;
+ return clamp_t(long, shares, MIN_SHARES, tg_shares);
}
# else /* CONFIG_SMP */
static inline long calc_cfs_shares(struct cfs_rq *cfs_rq, struct task_group *tg)
@@ -3075,37 +3072,69 @@ static inline void
update_tg_cfs_load(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
struct cfs_rq *gcfs_rq = group_cfs_rq(se);
- long delta, load = gcfs_rq->avg.load_avg;
+ long load = 0, delta;
/*
- * If the load of group cfs_rq is null, the load of the
- * sched_entity will also be null so we can skip the formula
+ * If there is load in our group cfs_rq (@gcfs_rq), then update its
+ * corresponding group se (@se) load value to reflect any change in
+ * weights.
+ *
+ * Also see effective_load(), where we do something similar.
+ *
+ * ( tg->load_avg = \Sum cfs_rq->load_avg )
+ *
+ * tg_weight = tg->load_avg - cfs_rq->contrib
+ *
+ *
+ * tg->shares * cfs_rq->weight
+ * shares = ---------------------------
+ * tg_weight + cfs_rq->weight
+ *
+ *
+ * tg->shares * cfs_rq->load_avg
+ * avg_shares = -----------------------------
+ * tg_weight + cfs_rq->load_avg
+ *
+ *
+ * ( se->load.weight = shares )
+ *
+ * se->load_avg = min(shares, avg_shares);
+ *
+ * Where @shares is an upper bound to @avg_shares, see the comments
+ * in calc_cfs_shares().
*/
- if (load) {
- long tg_load;
-
- /* Get tg's load and ensure tg_load > 0 */
- tg_load = atomic_long_read(&gcfs_rq->tg->load_avg) + 1;
-
- /* Ensure tg_load >= load and updated with current load*/
- tg_load -= gcfs_rq->tg_load_avg_contrib;
- tg_load += load;
+ if (gcfs_rq->load.weight) {
+ long tg_weight1, tg_weight2, tg_shares, shares, avg_shares;
+ struct task_group *tg = gcfs_rq->tg;
/*
- * We need to compute a correction term in the case that the
- * task group is consuming more CPU than a task of equal
- * weight. A task with a weight equals to tg->shares will have
- * a load less or equal to scale_load_down(tg->shares).
- * Similarly, the sched_entities that represent the task group
- * at parent level, can't have a load higher than
- * scale_load_down(tg->shares). And the Sum of sched_entities'
- * load must be <= scale_load_down(tg->shares).
+ * This open-codes calc_cfs_shares(), in order to ensure
+ * we use consistent values for @shares and @avg_shares,
+ * as well as make sure we clip the result properly.
*/
- if (tg_load > scale_load_down(gcfs_rq->tg->shares)) {
- /* scale gcfs_rq's load into tg's shares*/
- load *= scale_load_down(gcfs_rq->tg->shares);
- load /= tg_load;
- }
+
+ tg_shares = READ_ONCE(tg->shares);
+
+ load = scale_load_down(gcfs_rq->load.weight);
+
+ tg_weight1 = atomic_long_read(&tg->load_avg);
+ tg_weight2 = (tg_weight1 -= cfs_rq->tg_load_avg_contrib);
+ tg_weight1 += load;
+
+ shares = tg_shares * load;
+ if (tg_weight1)
+ shares /= tg_weight1;
+
+
+ load = READ_ONCE(gcfs_rq->avg.load_avg);
+
+ tg_weight2 += load;
+
+ avg_shares = tg_shares * load;
+ if (tg_weight2)
+ avg_shares /= tg_weight2;
+
+ load = clamp_t(long, min(shares, avg_shares), MIN_SHARES, tg_shares);
}
delta = load - se->avg.load_avg;
next prev parent reply other threads:[~2017-05-01 14:17 UTC|newest]
Thread overview: 69+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-04-24 20:13 [RFC PATCHSET] sched/fair: fix load balancer behavior when cgroup is in use Tejun Heo
2017-04-24 20:14 ` [PATCH 1/2] sched/fair: Fix how load gets propagated from cfs_rq to its sched_entity Tejun Heo
2017-04-24 21:33 ` [PATCH v2 " Tejun Heo
2017-05-03 18:00 ` Peter Zijlstra
2017-05-03 21:45 ` Tejun Heo
2017-05-04 5:51 ` Peter Zijlstra
2017-05-04 6:21 ` Peter Zijlstra
2017-05-04 9:49 ` Dietmar Eggemann
2017-05-04 10:57 ` Peter Zijlstra
2017-05-04 17:39 ` Tejun Heo
2017-05-05 10:36 ` Dietmar Eggemann
2017-05-04 10:26 ` Vincent Guittot
2017-04-25 8:35 ` [PATCH " Vincent Guittot
2017-04-25 18:12 ` Tejun Heo
2017-04-26 16:51 ` Vincent Guittot
2017-04-26 22:40 ` Tejun Heo
2017-04-27 7:00 ` Vincent Guittot
2017-05-01 14:17 ` Peter Zijlstra [this message]
2017-05-01 14:52 ` Peter Zijlstra
2017-05-01 21:56 ` Tejun Heo
2017-05-02 8:19 ` Peter Zijlstra
2017-05-02 8:30 ` Peter Zijlstra
2017-05-02 20:00 ` Tejun Heo
2017-05-03 9:10 ` Peter Zijlstra
2017-04-26 16:14 ` Vincent Guittot
2017-04-26 22:27 ` Tejun Heo
2017-04-27 8:59 ` Vincent Guittot
2017-04-28 17:46 ` Tejun Heo
2017-05-02 7:20 ` Vincent Guittot
2017-04-24 20:14 ` [PATCH 2/2] sched/fair: Always propagate runnable_load_avg Tejun Heo
2017-04-25 8:46 ` Vincent Guittot
2017-04-25 9:05 ` Vincent Guittot
2017-04-25 12:59 ` Vincent Guittot
2017-04-25 18:49 ` Tejun Heo
2017-04-25 20:49 ` Tejun Heo
2017-04-25 21:15 ` Chris Mason
2017-04-25 21:08 ` Tejun Heo
2017-04-26 10:21 ` Vincent Guittot
2017-04-27 0:30 ` Tejun Heo
2017-04-27 8:28 ` Vincent Guittot
2017-04-28 16:14 ` Tejun Heo
2017-05-02 6:56 ` Vincent Guittot
2017-05-02 20:56 ` Tejun Heo
2017-05-03 7:25 ` Vincent Guittot
2017-05-03 7:54 ` Vincent Guittot
2017-04-26 18:12 ` Vincent Guittot
2017-04-26 22:52 ` Tejun Heo
2017-04-27 8:29 ` Vincent Guittot
2017-04-28 20:33 ` Tejun Heo
2017-04-28 20:38 ` Tejun Heo
2017-05-01 15:56 ` Peter Zijlstra
2017-05-02 22:01 ` Tejun Heo
2017-05-02 7:18 ` Vincent Guittot
2017-05-02 13:26 ` Vincent Guittot
2017-05-02 22:37 ` Tejun Heo
2017-05-02 21:50 ` Tejun Heo
2017-05-03 7:34 ` Vincent Guittot
2017-05-03 9:37 ` Peter Zijlstra
2017-05-03 10:37 ` Vincent Guittot
2017-05-03 13:09 ` Peter Zijlstra
2017-05-03 21:49 ` Tejun Heo
2017-05-04 8:19 ` Vincent Guittot
2017-05-04 17:43 ` Tejun Heo
2017-05-04 19:02 ` Vincent Guittot
2017-05-04 19:04 ` Tejun Heo
2017-04-24 21:35 ` [PATCH 3/2] sched/fair: Skip __update_load_avg() on cfs_rq sched_entities Tejun Heo
2017-04-24 21:48 ` Peter Zijlstra
2017-04-24 22:54 ` Tejun Heo
2017-04-25 21:09 ` Tejun Heo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170501141733.shphf35psasefraj@hirez.programming.kicks-ass.net \
--to=peterz@infradead.org \
--cc=clm@fb.com \
--cc=efault@gmx.de \
--cc=kernel-team@fb.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=pjt@google.com \
--cc=tj@kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=vincent.guittot@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).