From: Vincent Guittot <vincent.guittot@linaro.org>
To: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Mike Galbraith <efault@gmx.de>, Paul Turner <pjt@google.com>,
	Chris Mason <clm@fb.com>,
	kernel-team@fb.com
Subject: Re: [PATCH 1/2] sched/fair: Fix how load gets propagated from cfs_rq to its sched_entity
Date: Wed, 26 Apr 2017 18:14:17 +0200
Message-ID: <CAKfTPtD_PneNuAC43vh-+s6_zunL3fz3u+S55WuiNAW5hpzQUg@mail.gmail.com>
In-Reply-To: <20170424201415.GB14169@wtj.duckdns.org>

On 24 April 2017 at 22:14, Tejun Heo <tj@kernel.org> wrote:
> 09a43ace1f98 ("sched/fair: Propagate load during synchronous
> attach/detach") added immediate load propagation from cfs_rq to its
> sched_entity then to the parent cfs_rq; however, what gets propagated
> doesn't seem to make sense.
>
> It repeats the tg_weight calculation done in calc_cfs_shares() but
> only uses it to compensate for shares being out of date.  After that,
> it sets the sched_entity's load_avg to the load_avg of the
> corresponding cfs_rq.
>
> This doesn't make sense as the cfs_rq's load_avg is some fraction of
> its total weight, which the sched_entity's weight has nothing to do
> with.  For example, if the cfs_rq has a single constant-load-1 task,
> the cfs_rq's load_avg would be around 1.  If that cfs_rq's
> sched_entity is the only active one on the parent cfs_rq and has the
> maximum weight, the sched_entity's load should be around the maximum
> weight, but update_tg_cfs_load() ends up overriding it to 1.
>
> At the parent's level, the absolute value of load_avg inside a child
> cfs_rq doesn't mean anything.  Only the ratio against its weight is
> meaningful.
>
> This patch changes update_tg_cfs_load() to normalize the
> runnable_load_avg of the cfs_rq and then scale it to the matching
> sched_entity's freshly calculated shares for propagation.  Use of
> runnable_load_avg instead of load_avg is intentional and keeps the
> parent's runnable_load_avg true to the sum of scaled loads of all
> tasks queued under it, which is critical for the correct operation
> of the load balancer.  The next patch will depend on it.
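
To make the arithmetic described above concrete, here is a minimal
user-space sketch of the normalize-and-scale step (purely illustrative:
the single shared unit and the example numbers are my assumptions, and
the scale_load_down() resolution question is taken up further down in
this reply):

#include <stdio.h>

static long min_l(long a, long b) { return a < b ? a : b; }

/*
 * Toy model of the proposed propagation: normalize the child cfs_rq's
 * runnable load by its total weight, then scale it to the shares its
 * sched_entity was just given in the parent, capped at those shares.
 */
static long propagate(long runnable_load_avg, long queue_weight, long shares)
{
	return min_l(runnable_load_avg * shares / queue_weight, shares);
}

int main(void)
{
	/*
	 * Scenario from the changelog: one always-running task of weight
	 * 1024 queued on the child cfs_rq, while the group's sched_entity
	 * was given a much larger weight (say 88761) in the parent.
	 */
	printf("propagated load: %ld\n", propagate(1024, 1024, 88761));
	/* Prints 88761: the propagated load follows the entity's shares,
	 * not the child queue's raw load_avg. */
	return 0;
}
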
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Mike Galbraith <efault@gmx.de>
> Cc: Paul Turner <pjt@google.com>
> ---
>  kernel/sched/fair.c |   46 +++++++++++++++++++---------------------------
>  1 file changed, 19 insertions(+), 27 deletions(-)
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3078,37 +3078,29 @@ static inline void
>  update_tg_cfs_load(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  {
>         struct cfs_rq *gcfs_rq = group_cfs_rq(se);
> -       long delta, load = gcfs_rq->avg.load_avg;
> +       long load = 0, delta;
>
>         /*
> -        * If the load of group cfs_rq is null, the load of the
> -        * sched_entity will also be null so we can skip the formula
> +        * A cfs_rq's load avg contribution to the parent should be scaled
> +        * to the sched_entity's weight.  Use freshly calculated shares
> +        * instead of @se->load.weight as the latter may not reflect
> +        * changes from the current scheduling operation.
> +        *
> +        * Note that the propagation source is runnable_load_avg instead of
> +        * load_avg.  This keeps every cfs_rq's runnable_load_avg true to
> +        * the sum of the scaled loads of all tasks queued under it, which
> +        * is important for the correct operation of the load balancer.
> +        *
> +        * This can make the sched_entity's load_avg jumpier but that
> +        * correctly reflects what would happen without cgroups if each
> +        * task's load is scaled across nesting - the load is being
> +        * averaged at the task and each cfs_rq.
>          */
> -       if (load) {
> -               long tg_load;
> +       if (gcfs_rq->load.weight) {
> +               long shares = calc_cfs_shares(gcfs_rq, gcfs_rq->tg);
>
> -               /* Get tg's load and ensure tg_load > 0 */
> -               tg_load = atomic_long_read(&gcfs_rq->tg->load_avg) + 1;
> -
> -               /* Ensure tg_load >= load and updated with current load*/
> -               tg_load -= gcfs_rq->tg_load_avg_contrib;
> -               tg_load += load;
> -
> -               /*
> -                * We need to compute a correction term in the case that the
> -                * task group is consuming more CPU than a task of equal
> -                * weight. A task with a weight equals to tg->shares will have
> -                * a load less or equal to scale_load_down(tg->shares).
> -                * Similarly, the sched_entities that represent the task group
> -                * at parent level, can't have a load higher than
> -                * scale_load_down(tg->shares). And the Sum of sched_entities'
> -                * load must be <= scale_load_down(tg->shares).
> -                */
> -               if (tg_load > scale_load_down(gcfs_rq->tg->shares)) {
> -                       /* scale gcfs_rq's load into tg's shares*/
> -                       load *= scale_load_down(gcfs_rq->tg->shares);
> -                       load /= tg_load;
> -               }
> +               load = min(gcfs_rq->runnable_load_avg *
> +                          shares / gcfs_rq->load.weight, shares);

There is a unit problem above: runnable_load_avg and shares are not
in the same range, but runnable_load_avg and
scale_load_down(gcfs_rq->load.weight) are, so you should use
gcfs_rq->runnable_load_avg * scale_load_down(shares) /
scale_load_down(gcfs_rq->load.weight).
The two scale_load_down() factors then cancel each other out, but the
min() should be taken against scale_load_down(shares), not plain
shares.
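
As a rough user-space illustration of that unit mismatch (the constants
mirror a CONFIG_64BIT build and the transient scenario is an
assumption, not kernel code):

#include <stdio.h>

#define SCHED_FIXEDPOINT_SHIFT	10
#define scale_load(w)		((long long)(w) << SCHED_FIXEDPOINT_SHIFT)
#define scale_load_down(w)	((w) >> SCHED_FIXEDPOINT_SHIFT)

static long long min_ll(long long a, long long b) { return a < b ? a : b; }

int main(void)
{
	/*
	 * load.weight and shares carry the extra fixed-point resolution,
	 * while runnable_load_avg is kept in scale_load_down() units.
	 * Assume a transient state where the child's runnable_load_avg
	 * (2048) exceeds scale_load_down() of its current load.weight.
	 */
	long long runnable_load_avg = 2048;
	long long weight = scale_load(1024);	/* gcfs_rq->load.weight */
	long long shares = scale_load(1024);	/* calc_cfs_shares() result */

	/* As posted: the ratio is fine (the scale factors cancel), but the
	 * min() bound is ~1024x too large, so it effectively never clamps. */
	long long posted = min_ll(runnable_load_avg * shares / weight, shares);

	/* With the bound in matching units, the clamp works as intended. */
	long long fixed = min_ll(runnable_load_avg * shares / weight,
				 scale_load_down(shares));

	printf("posted=%lld fixed=%lld\n", posted, fixed);	/* 2048 vs 1024 */
	return 0;
}
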


>         }
>
>         delta = load - se->avg.load_avg;


Thread overview: 69+ messages
2017-04-24 20:13 [RFC PATCHSET] sched/fair: fix load balancer behavior when cgroup is in use Tejun Heo
2017-04-24 20:14 ` [PATCH 1/2] sched/fair: Fix how load gets propagated from cfs_rq to its sched_entity Tejun Heo
2017-04-24 21:33   ` [PATCH v2 " Tejun Heo
2017-05-03 18:00     ` Peter Zijlstra
2017-05-03 21:45       ` Tejun Heo
2017-05-04  5:51         ` Peter Zijlstra
2017-05-04  6:21           ` Peter Zijlstra
2017-05-04  9:49             ` Dietmar Eggemann
2017-05-04 10:57               ` Peter Zijlstra
2017-05-04 17:39               ` Tejun Heo
2017-05-05 10:36                 ` Dietmar Eggemann
2017-05-04 10:26       ` Vincent Guittot
2017-04-25  8:35   ` [PATCH " Vincent Guittot
2017-04-25 18:12     ` Tejun Heo
2017-04-26 16:51       ` Vincent Guittot
2017-04-26 22:40         ` Tejun Heo
2017-04-27  7:00           ` Vincent Guittot
2017-05-01 14:17         ` Peter Zijlstra
2017-05-01 14:52           ` Peter Zijlstra
2017-05-01 21:56           ` Tejun Heo
2017-05-02  8:19             ` Peter Zijlstra
2017-05-02  8:30               ` Peter Zijlstra
2017-05-02 20:00                 ` Tejun Heo
2017-05-03  9:10                   ` Peter Zijlstra
2017-04-26 16:14   ` Vincent Guittot [this message]
2017-04-26 22:27     ` Tejun Heo
2017-04-27  8:59       ` Vincent Guittot
2017-04-28 17:46         ` Tejun Heo
2017-05-02  7:20           ` Vincent Guittot
2017-04-24 20:14 ` [PATCH 2/2] sched/fair: Always propagate runnable_load_avg Tejun Heo
2017-04-25  8:46   ` Vincent Guittot
2017-04-25  9:05     ` Vincent Guittot
2017-04-25 12:59       ` Vincent Guittot
2017-04-25 18:49         ` Tejun Heo
2017-04-25 20:49           ` Tejun Heo
2017-04-25 21:15             ` Chris Mason
2017-04-25 21:08           ` Tejun Heo
2017-04-26 10:21             ` Vincent Guittot
2017-04-27  0:30               ` Tejun Heo
2017-04-27  8:28                 ` Vincent Guittot
2017-04-28 16:14                   ` Tejun Heo
2017-05-02  6:56                     ` Vincent Guittot
2017-05-02 20:56                       ` Tejun Heo
2017-05-03  7:25                         ` Vincent Guittot
2017-05-03  7:54                           ` Vincent Guittot
2017-04-26 18:12   ` Vincent Guittot
2017-04-26 22:52     ` Tejun Heo
2017-04-27  8:29       ` Vincent Guittot
2017-04-28 20:33         ` Tejun Heo
2017-04-28 20:38           ` Tejun Heo
2017-05-01 15:56           ` Peter Zijlstra
2017-05-02 22:01             ` Tejun Heo
2017-05-02  7:18           ` Vincent Guittot
2017-05-02 13:26             ` Vincent Guittot
2017-05-02 22:37               ` Tejun Heo
2017-05-02 21:50             ` Tejun Heo
2017-05-03  7:34               ` Vincent Guittot
2017-05-03  9:37                 ` Peter Zijlstra
2017-05-03 10:37                   ` Vincent Guittot
2017-05-03 13:09                     ` Peter Zijlstra
2017-05-03 21:49                       ` Tejun Heo
2017-05-04  8:19                         ` Vincent Guittot
2017-05-04 17:43                           ` Tejun Heo
2017-05-04 19:02                             ` Vincent Guittot
2017-05-04 19:04                               ` Tejun Heo
2017-04-24 21:35 ` [PATCH 3/2] sched/fair: Skip __update_load_avg() on cfs_rq sched_entities Tejun Heo
2017-04-24 21:48   ` Peter Zijlstra
2017-04-24 22:54     ` Tejun Heo
2017-04-25 21:09   ` Tejun Heo
