linux-kernel.vger.kernel.org archive mirror
From: Morten Rasmussen <morten.rasmussen@arm.com>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Joseph Salisbury <joseph.salisbury@canonical.com>,
	Ingo Molnar <mingo@kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	LKML <linux-kernel@vger.kernel.org>,
	Mike Galbraith <efault@gmx.de>,
	omer.akram@canonical.com
Subject: Re: [v4.8-rc1 Regression] sched/fair: Apply more PELT fixes
Date: Thu, 20 Oct 2016 08:56:49 +0100	[thread overview]
Message-ID: <20161020075649.GH7509@e105550-lin.cambridge.arm.com> (raw)
In-Reply-To: <CAKfTPtCK=riUXyapEhTxB9ZF+HrPzO+=9pOQYsYsNgyRa+rc6w@mail.gmail.com>

On Wed, Oct 19, 2016 at 07:41:36PM +0200, Vincent Guittot wrote:
> On 19 October 2016 at 15:30, Morten Rasmussen <morten.rasmussen@arm.com> wrote:
> > On Tue, Oct 18, 2016 at 01:56:51PM +0200, Vincent Guittot wrote:
> >> Le Tuesday 18 Oct 2016 à 12:34:12 (+0200), Peter Zijlstra a écrit :
> >> > On Tue, Oct 18, 2016 at 11:45:48AM +0200, Vincent Guittot wrote:
> >> > > On 18 October 2016 at 11:07, Peter Zijlstra <peterz@infradead.org> wrote:
> >> > > > So aside from funny BIOSes, this should also show up when creating
> >> > > > cgroups when you have offlined a few CPUs, which is far more common I'd
> >> > > > think.
> >> > >
> >> > > The problem is also that the load of the tg->se[cpu] that represents
> >> > > the tg->cfs_rq[cpu] is initialized to 1024 in:
> >> > > alloc_fair_sched_group
> >> > >      for_each_possible_cpu(i) {
> >> > >          init_entity_runnable_average(se);
> >> > >             sa->load_avg = scale_load_down(se->load.weight);
> >> > >
> >> > > Initializing sa->load_avg to 1024 for a newly created task makes
> >> > > sense, as we don't yet know what its real load will be, but I'm not
> >> > > sure that we have to do the same for a se that represents a task
> >> > > group. This load should be initialized to 0; it will increase as
> >> > > tasks are moved/attached into the task group
> >> >
> >> > Yes, I think that makes sense, not sure how horrible that is with the
> >>
> >> That should not be too bad, because this initial value only matters for
> >> the few dozen ms that follow the creation of the task group
> >
> > IMHO, it doesn't make much sense to initialize empty containers, which
> > group sched_entities really are, to 1024. A group entity is meant to
> > represent what is in the group, and at creation it is empty, so in my
> > opinion initializing it to zero makes sense.
> >
> >> > current state of things, but after your propagate patch, that
> >> > reinstates the interactivity hack that should work for sure.
> >
> > It actually works on mainline/tip as well.
> >
> > As I see it, the fundamental problem is keeping group entities up to
> > date. Because the load_weight, and hence se->avg.load_avg, of each
> > per-cpu group sched_entity depends on the group
> > cfs_rq->tg_load_avg_contrib of all cpus (tg->load_avg), including those
> > that might be empty and therefore not enqueued, we must ensure that
> > they are updated some other way, most naturally as part of
> > update_blocked_averages().
> >
> > To guarantee that, it basically boils down to making sure:
> > Any cfs_rq with a non-zero tg_load_avg_contrib must be on the
> > leaf_cfs_rq_list.
> >
> > We can do that in different ways: 1) Add all cfs_rqs to the
> > leaf_cfs_rq_list at task group creation, or 2) initialize group
> > sched_entity contributions to zero and make sure that they are added to
> > leaf_cfs_rq_list as soon as a sched_entity (task or group) is enqueued
> > on it.
> >
> > Vincent's patch below gives us the second option.
> >
> >>  kernel/sched/fair.c | 9 ++++++++-
> >>  1 file changed, 8 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> index 8b03fb5..89776ac 100644
> >> --- a/kernel/sched/fair.c
> >> +++ b/kernel/sched/fair.c
> >> @@ -690,7 +690,14 @@ void init_entity_runnable_average(struct sched_entity *se)
> >>        * will definitely be update (after enqueue).
> >>        */
> >>       sa->period_contrib = 1023;
> >> -     sa->load_avg = scale_load_down(se->load.weight);
> >> +     /*
> >> +      * Tasks are initialized with full load to be seen as heavy tasks
> >> +      * until they get a chance to stabilize to their real load level.
> >> +      * Group entities are initialized with null load to reflect the fact
> >> +      * that nothing has been attached to the task group yet.
> >> +      */
> >> +     if (entity_is_task(se))
> >> +             sa->load_avg = scale_load_down(se->load.weight);
> >>       sa->load_sum = sa->load_avg * LOAD_AVG_MAX;
> >>       /*
> >>        * At this point, util_avg won't be used in select_task_rq_fair anyway
> >
> > I would suggest adding a comment somewhere stating that we need to keep
> > group cfs_rqs up to date:
> >
> > -----
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index abb3763dff69..2b820d489be0 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6641,6 +6641,11 @@ static void update_blocked_averages(int cpu)
> >                 if (throttled_hierarchy(cfs_rq))
> >                         continue;
> >
> > +               /*
> > +                * Note that _any_ leaf cfs_rq with a non-zero tg_load_avg_contrib
> > +                * _must_ be on the leaf_cfs_rq_list to ensure that group shares
> > +                * are updated correctly.
> > +                */
> 
> As discussed on IRC, the point is that even if the leaf cfs_rq is
> added to the leaf_cfs_rq_list, that doesn't ensure that it will be
> updated correctly for unplugged CPUs.

Agreed. We have to ensure that tg_load_avg_contrib is zeroed for leaf
cfs_rqs belonging to unplugged cpus. And if we modify the above to say
the leaf_cfs_rq_list of an online cpu, then we should be covered, I think.


Thread overview: 46+ messages
2016-10-07 19:38 [v4.8-rc1 Regression] sched/fair: Apply more PELT fixes Joseph Salisbury
2016-10-07 19:57 ` Linus Torvalds
2016-10-07 20:22   ` Joseph Salisbury
2016-10-07 20:37     ` Linus Torvalds
2016-10-08  8:00 ` Peter Zijlstra
2016-10-08  8:39   ` Ingo Molnar
2016-10-08 11:37     ` Vincent Guittot
2016-10-08 11:49       ` Mike Galbraith
2016-10-12 12:20         ` Vincent Guittot
2016-10-12 15:35           ` Joseph Salisbury
2016-10-12 16:21           ` Joseph Salisbury
2016-10-13 10:58             ` Vincent Guittot
2016-10-13 15:52               ` Joseph Salisbury
2016-10-13 16:48                 ` Vincent Guittot
2016-10-13 18:49                   ` Dietmar Eggemann
2016-10-13 21:34                     ` Vincent Guittot
2016-10-14  8:24                       ` Vincent Guittot
2016-10-14 13:10                         ` Dietmar Eggemann
2016-10-14 15:18                           ` Vincent Guittot
2016-10-14 16:04                             ` Joseph Salisbury
2016-10-17  9:09                               ` Vincent Guittot
2016-10-17 11:49                                 ` Dietmar Eggemann
2016-10-17 13:19                                   ` Peter Zijlstra
2016-10-17 13:54                                     ` Vincent Guittot
2016-10-17 22:52                                       ` Dietmar Eggemann
2016-10-18  8:43                                         ` Vincent Guittot
2016-10-18  9:07                                         ` Peter Zijlstra
2016-10-18  9:45                                           ` Vincent Guittot
2016-10-18 10:34                                             ` Peter Zijlstra
2016-10-18 11:56                                               ` Vincent Guittot
2016-10-18 21:58                                                 ` Joonwoo Park
2016-10-19  6:42                                                   ` Vincent Guittot
2016-10-19  9:46                                                 ` Dietmar Eggemann
2016-10-19 11:25                                                   ` Vincent Guittot
2016-10-19 15:33                                                     ` Dietmar Eggemann
2016-10-19 17:33                                                       ` Joonwoo Park
2016-10-19 17:50                                                       ` Vincent Guittot
2016-10-19 11:33                                                 ` Peter Zijlstra
2016-10-19 11:50                                                   ` Vincent Guittot
2016-10-19 13:30                                                 ` Morten Rasmussen
2016-10-19 17:41                                                   ` Vincent Guittot
2016-10-20  7:56                                                     ` Morten Rasmussen [this message]
2016-10-19 14:49                                                 ` Joseph Salisbury
2016-10-19 14:53                                                   ` Vincent Guittot
2016-10-18 11:15                                           ` Dietmar Eggemann
2016-10-18 12:07                                             ` Peter Zijlstra
