From: Vincent Guittot <vincent.guittot@linaro.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>,
Joseph Salisbury <joseph.salisbury@canonical.com>,
Ingo Molnar <mingo@kernel.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Thomas Gleixner <tglx@linutronix.de>,
LKML <linux-kernel@vger.kernel.org>,
Mike Galbraith <efault@gmx.de>,
omer.akram@canonical.com
Subject: Re: [v4.8-rc1 Regression] sched/fair: Apply more PELT fixes
Date: Mon, 17 Oct 2016 15:54:42 +0200
Message-ID: <CAKfTPtAxw1b-vy285HKtPUBFYuJdv2CFZH_gP3CMtZHs1wLPXg@mail.gmail.com>
In-Reply-To: <20161017131952.GR3117@twins.programming.kicks-ass.net>

On 17 October 2016 at 15:19, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, Oct 17, 2016 at 12:49:55PM +0100, Dietmar Eggemann wrote:
>
>> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> > index 8b03fb5..8926685 100644
>> > --- a/kernel/sched/fair.c
>> > +++ b/kernel/sched/fair.c
>> > @@ -2902,7 +2902,8 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>> > */
>> > static inline void update_tg_load_avg(struct cfs_rq *cfs_rq, int force)
>> > {
>> > - long delta = cfs_rq->avg.load_avg - cfs_rq->tg_load_avg_contrib;
>> > + unsigned long load_avg = READ_ONCE(cfs_rq->avg.load_avg);
>> > + long delta = load_avg - cfs_rq->tg_load_avg_contrib;
>> >
>> > /*
>> > * No need to update load_avg for root_task_group as it is not used.
>> > @@ -2912,7 +2913,7 @@ static inline void update_tg_load_avg(struct cfs_rq *cfs_rq, int force)
>> >
>> > if (force || abs(delta) > cfs_rq->tg_load_avg_contrib / 64) {
>> > atomic_long_add(delta, &cfs_rq->tg->load_avg);
>> > - cfs_rq->tg_load_avg_contrib = cfs_rq->avg.load_avg;
>> > + cfs_rq->tg_load_avg_contrib = load_avg;
>> > }
>> > }
>>
>> Tested it on an Ubuntu 16.10 Server (on top of the default 4.8.0-22-generic
>> kernel) on a Lenovo T430 and it didn't help.
>
> Right, I don't think that race exists, we update cfs_rq->avg.load_avg
> with rq->lock held and have it held here, so it cannot change under us.
Yes, I agree. The patch was only meant to make sure that an unexpected
race condition doesn't happen.
>
> This might do with a few lockdep_assert_held() instances to clarify this
> though.
>
>> What seems to cure it is to get rid of this snippet (part of the commit
>> mentioned earlier in this thread: 3d30544f0212):
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 039de34f1521..16c692049fbf 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -726,7 +726,6 @@ void post_init_entity_util_avg(struct sched_entity *se)
>> struct sched_avg *sa = &se->avg;
>> long cap = (long)(SCHED_CAPACITY_SCALE - cfs_rq->avg.util_avg) / 2;
>> u64 now = cfs_rq_clock_task(cfs_rq);
>> - int tg_update;
>>
>> if (cap > 0) {
>> if (cfs_rq->avg.util_avg != 0) {
>> @@ -759,10 +758,8 @@ void post_init_entity_util_avg(struct sched_entity *se)
>> }
>> }
>>
>> - tg_update = update_cfs_rq_load_avg(now, cfs_rq, false);
>> + update_cfs_rq_load_avg(now, cfs_rq, false);
>> attach_entity_load_avg(cfs_rq, se);
>> - if (tg_update)
>> - update_tg_load_avg(cfs_rq, false);
>> }
>>
>> #else /* !CONFIG_SMP */
>>
>> BTW, I guess we can reach .tg_load_avg up to ~300000-400000 on such a system
>> initially because systemd will create all ~100 services (and therefore the
>> corresponding 2. level tg's) at once. In my previous example, there was 500ms
>> between the creation of 2 tg's so there was a lot of decaying going on in between.
>
> Cute... on current kernels that translates to simply removing the call
> to update_tg_load_avg(), lets see if we can figure out what goes
> sideways first though, because it _should_ decay back out. And if that
Yes. Reaching ~300000-400000 is not an issue in itself. The problem is
that, in the buggy case, load_avg has decayed but the decay has not been
reflected in tg->load_avg.
> can fail here, I'm not seeing why that wouldn't fail elsewhere either.
>
> I'll see if I can reproduce this with a script creating heaps of cgroups
> in a hurry, I have a total lack of system-disease on all my machines.
>
>
> /me goes prod..
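
To illustrate the point above about load_avg decaying without tg->load_avg
following, here is a minimal user-space sketch in C. It is not kernel code:
struct tg_model, struct cfs_rq_model, model_update_tg_load_avg() and
model_stale_load() are hypothetical stand-ins, atomic_long_add() is replaced
by a plain addition, and no locking is modelled. Only the delta/threshold
logic mirrors the update_tg_load_avg() shown in the quoted diff.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the task group: only the summed load. */
struct tg_model {
	long load_avg;
};

/* Hypothetical stand-in for a per-CPU cfs_rq of that group. */
struct cfs_rq_model {
	unsigned long load_avg;            /* models cfs_rq->avg.load_avg */
	unsigned long tg_load_avg_contrib; /* what was last added to the group */
	struct tg_model *tg;
};

/*
 * Same delta/threshold logic as the quoted update_tg_load_avg(), with a
 * plain addition instead of atomic_long_add(). The group sum only follows
 * load_avg when this function is actually called.
 */
static void model_update_tg_load_avg(struct cfs_rq_model *cfs_rq, int force)
{
	long delta;

	assert(cfs_rq->tg != NULL);
	delta = (long)cfs_rq->load_avg - (long)cfs_rq->tg_load_avg_contrib;

	if (force ||
	    (unsigned long)labs(delta) > cfs_rq->tg_load_avg_contrib / 64) {
		cfs_rq->tg->load_avg += delta;
		cfs_rq->tg_load_avg_contrib = cfs_rq->load_avg;
	}
}

/*
 * Load still accounted in the group sum on behalf of this cfs_rq although
 * the cfs_rq itself no longer has it. A positive value means load_avg has
 * decayed without the group sum being updated.
 */
static long model_stale_load(const struct cfs_rq_model *cfs_rq)
{
	return (long)cfs_rq->tg_load_avg_contrib - (long)cfs_rq->load_avg;
}
```

In this model, if load_avg goes from 1024 to 0 and model_update_tg_load_avg()
is not called afterwards, model_stale_load() keeps reporting 1024: the group
sum still carries load that the cfs_rq has already lost. The next call
removes it again.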