All of lore.kernel.org
 help / color / mirror / Atom feed
From: Morten Rasmussen <morten.rasmussen@arm.com>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Joseph Salisbury <joseph.salisbury@canonical.com>,
	Ingo Molnar <mingo@kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	LKML <linux-kernel@vger.kernel.org>,
	Mike Galbraith <efault@gmx.de>,
	omer.akram@canonical.com
Subject: Re: [v4.8-rc1 Regression] sched/fair: Apply more PELT fixes
Date: Thu, 20 Oct 2016 08:56:49 +0100	[thread overview]
Message-ID: <20161020075649.GH7509@e105550-lin.cambridge.arm.com> (raw)
In-Reply-To: <CAKfTPtCK=riUXyapEhTxB9ZF+HrPzO+=9pOQYsYsNgyRa+rc6w@mail.gmail.com>

On Wed, Oct 19, 2016 at 07:41:36PM +0200, Vincent Guittot wrote:
> On 19 October 2016 at 15:30, Morten Rasmussen <morten.rasmussen@arm.com> wrote:
> > On Tue, Oct 18, 2016 at 01:56:51PM +0200, Vincent Guittot wrote:
> >> Le Tuesday 18 Oct 2016 à 12:34:12 (+0200), Peter Zijlstra a écrit :
> >> > On Tue, Oct 18, 2016 at 11:45:48AM +0200, Vincent Guittot wrote:
> >> > > On 18 October 2016 at 11:07, Peter Zijlstra <peterz@infradead.org> wrote:
> >> > > > So aside from funny BIOSes, this should also show up when creating
> >> > > > cgroups when you have offlined a few CPUs, which is far more common I'd
> >> > > > think.
> >> > >
> >> > > The problem is also that the load of the tg->se[cpu] that represents
> >> > > the tg->cfs_rq[cpu] is initialized to 1024 in:
> >> > > alloc_fair_sched_group
> >> > >      for_each_possible_cpu(i) {
> >> > >          init_entity_runnable_average(se);
> >> > >             sa->load_avg = scale_load_down(se->load.weight);
> >> > >
> >> > > Initializing  sa->load_avg to 1024 for a newly created task makes
> >> > > sense as we don't know yet what will be its real load but i'm not sure
> >> > > that we have to do the same for se that represents a task group. This
> >> > > load should be initialized to 0 and it will increase when task will be
> >> > > moved/attached into task group
> >> >
> >> > Yes, I think that makes sense, not sure how horrible that is with the
> >>
> >> That should not be that bad because this initial value is only useful for
> >> the few dozens of ms that follow the creation of the task group
> >
> > IMHO, it doesn't make much sense to initialize empty containers, which
> > group sched_entities really are, to 1024. It is meant to represent what
> > is in it, and a creation it is empty, so in my opinion initializing it
> > to zero make sense.
> >
> >> > current state of things, but after your propagate patch, that
> >> > reinstates the interactivity hack that should work for sure.
> >
> > It actually works on mainline/tip as well.
> >
> > As I see it, the fundamental problem is keeping group entities up to
> > date. Because the load_weight and hence se->avg.load_avg each per-cpu
> > group sched_entity depends on the group cfs_rq->tg_load_avg_contrib for
> > all cpus (tg->load_avg), including those that might be empty and
> > therefore not enqueued, we must ensure that they are updated some other
> > way. Most naturally as part of update_blocked_averages().
> >
> > To guarantee that, it basically boils down to making sure:
> > Any cfs_rq with a non-zero tg_load_avg_contrib must be on the
> > leaf_cfs_rq_list.
> >
> > We can do that in different ways: 1) Add all cfs_rqs to the
> > leaf_cfs_rq_list at task group creation, or 2) initialize group
> > sched_entity contributions to zero and make sure that they are added to
> > leaf_cfs_rq_list as soon as a sched_entity (task or group) is enqueued
> > on it.
> >
> > Vincent patch below gives us the second option.
> >
> >>  kernel/sched/fair.c | 9 ++++++++-
> >>  1 file changed, 8 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> index 8b03fb5..89776ac 100644
> >> --- a/kernel/sched/fair.c
> >> +++ b/kernel/sched/fair.c
> >> @@ -690,7 +690,14 @@ void init_entity_runnable_average(struct sched_entity *se)
> >>        * will definitely be update (after enqueue).
> >>        */
> >>       sa->period_contrib = 1023;
> >> -     sa->load_avg = scale_load_down(se->load.weight);
> >> +     /*
> >> +      * Tasks are intialized with full load to be seen as heavy task until
> >> +      * they get a chance to stabilize to their real load level.
> >> +      * group entity are intialized with null load to reflect the fact that
> >> +      * nothing has been attached yet to the task group.
> >> +      */
> >> +     if (entity_is_task(se))
> >> +             sa->load_avg = scale_load_down(se->load.weight);
> >>       sa->load_sum = sa->load_avg * LOAD_AVG_MAX;
> >>       /*
> >>        * At this point, util_avg won't be used in select_task_rq_fair anyway
> >
> > I would suggest adding a comment somewhere stating that we need to keep
> > group cfs_rqs up to date:
> >
> > -----
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index abb3763dff69..2b820d489be0 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6641,6 +6641,11 @@ static void update_blocked_averages(int cpu)
> >                 if (throttled_hierarchy(cfs_rq))
> >                         continue;
> >
> > +               /*
> > +                * Note that _any_ leaf cfs_rq with a non-zero tg_load_avg_contrib
> > +                * _must_ be on the leaf_cfs_rq_list to ensure that group shares
> > +                * are updated correctly.
> > +                */
> 
> As discussed on IRC, the point is that even if the leaf cfs_rq is
> added to the leaf_cfs_rq_list, it doesn't ensure that it will be
> updated correctly for unplugged CPUs

Agreed. We have to ensure that tg_load_avg_contrib is zeroed for leaf
cfs_rqs belonging to unplugged cpus. And if modify the above to say
leaf_cfs_rq_list of an online cpu, then we should be covered I think.

  reply	other threads:[~2016-10-20  7:56 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-07 19:38 [v4.8-rc1 Regression] sched/fair: Apply more PELT fixes Joseph Salisbury
2016-10-07 19:57 ` Linus Torvalds
2016-10-07 20:22   ` Joseph Salisbury
2016-10-07 20:37     ` Linus Torvalds
2016-10-08  8:00 ` Peter Zijlstra
2016-10-08  8:39   ` Ingo Molnar
2016-10-08 11:37     ` Vincent Guittot
2016-10-08 11:49       ` Mike Galbraith
2016-10-12 12:20         ` Vincent Guittot
2016-10-12 15:35           ` Joseph Salisbury
2016-10-12 16:21           ` Joseph Salisbury
2016-10-13 10:58             ` Vincent Guittot
2016-10-13 15:52               ` Joseph Salisbury
2016-10-13 16:48                 ` Vincent Guittot
2016-10-13 18:49                   ` Dietmar Eggemann
2016-10-13 21:34                     ` Vincent Guittot
2016-10-14  8:24                       ` Vincent Guittot
2016-10-14 13:10                         ` Dietmar Eggemann
2016-10-14 15:18                           ` Vincent Guittot
2016-10-14 16:04                             ` Joseph Salisbury
2016-10-17  9:09                               ` Vincent Guittot
2016-10-17 11:49                                 ` Dietmar Eggemann
2016-10-17 13:19                                   ` Peter Zijlstra
2016-10-17 13:54                                     ` Vincent Guittot
2016-10-17 22:52                                       ` Dietmar Eggemann
2016-10-18  8:43                                         ` Vincent Guittot
2016-10-18  9:07                                         ` Peter Zijlstra
2016-10-18  9:45                                           ` Vincent Guittot
2016-10-18 10:34                                             ` Peter Zijlstra
2016-10-18 11:56                                               ` Vincent Guittot
2016-10-18 21:58                                                 ` Joonwoo Park
2016-10-19  6:42                                                   ` Vincent Guittot
2016-10-19  9:46                                                 ` Dietmar Eggemann
2016-10-19 11:25                                                   ` Vincent Guittot
2016-10-19 15:33                                                     ` Dietmar Eggemann
2016-10-19 17:33                                                       ` Joonwoo Park
2016-10-19 17:50                                                       ` Vincent Guittot
2016-10-19 11:33                                                 ` Peter Zijlstra
2016-10-19 11:50                                                   ` Vincent Guittot
2016-10-19 13:30                                                 ` Morten Rasmussen
2016-10-19 17:41                                                   ` Vincent Guittot
2016-10-20  7:56                                                     ` Morten Rasmussen [this message]
2016-10-19 14:49                                                 ` Joseph Salisbury
2016-10-19 14:53                                                   ` Vincent Guittot
2016-10-18 11:15                                           ` Dietmar Eggemann
2016-10-18 12:07                                             ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161020075649.GH7509@e105550-lin.cambridge.arm.com \
    --to=morten.rasmussen@arm.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=efault@gmx.de \
    --cc=joseph.salisbury@canonical.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=omer.akram@canonical.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.