From: Vincent Guittot <vincent.guittot@linaro.org>
To: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
	rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
	bristot@redhat.com, linux-kernel@vger.kernel.org,
	rickyiu@google.com, odin@uged.al, sachinp@linux.vnet.ibm.com,
	naresh.kamboju@linaro.org
Subject: Re: [PATCH v2 1/3] sched/pelt: Don't sync hardly util_sum with util_avg
Date: Fri, 7 Jan 2022 16:21:31 +0100
Message-ID: <CAKfTPtAREuJtj8AuZPwfe_=W7v8J-UOXDWeyBL0-VcKGaTSr5Q@mail.gmail.com>
In-Reply-To: <f1549032-50f6-e9fc-a7ae-24373352576b@arm.com>

On Fri, 7 Jan 2022 at 12:43, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>
> On 05/01/2022 14:57, Vincent Guittot wrote:
> > On Wed, 5 Jan 2022 at 14:15, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
> >>
> >> On 04/01/2022 14:42, Vincent Guittot wrote:
> >>> On Tue, 4 Jan 2022 at 12:47, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
> >>>>
> >>>> On 22/12/2021 10:38, Vincent Guittot wrote:
> >>
> >> [...]
> >>
> >>>> I still wonder whether the regression only comes from the changes in
> >>>> update_cfs_rq_load_avg() introduced by 1c35b07e6d39,
> >>>> and whether it could be fixed by this part of the patch-set alone (A):
> >>>
> >>> I have been able to trigger the warning even with (A), though it took
> >>> much more time.
> >>> And I have been able to catch wrong situations (with additional
> >>> traces) in all 3 places A, B and C.
> >>
> >> OK. By wrong situation you mean '_sum < _avg * MIN_DIVIDER'?
> >
> > Not only that.
> > Also util_sum == 0 but util_avg != 0 in different places, although these
>
> Ah OK, I saw this one as part of '_sum < _avg * MIN_DIVIDER'.
>
> > situations didn't trigger the sched_warn because some other sync happened
> > before the periodic call of __update_blocked_fair(),
> > or because nr_running == 1 and the task's util_avg/sum > the cfs_rq's
> > util_avg/sum just before removing the task.
>
> I see.
>
> >>>> @@ -3677,15 +3706,22 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
> >>>>
> >>>>     r = removed_load;
> >>>>     sub_positive(&sa->load_avg, r);
> >>>> -   sa->load_sum = sa->load_avg * divider;
> >>>> +   sub_positive(&sa->load_sum, r * divider);
> >>>> +   sa->load_sum = max_t(u32, sa->load_sum, sa->load_avg * MIN_DIVIDER);
> >>>>
> >>>>     r = removed_util;
> >>>>     sub_positive(&sa->util_avg, r);
> >>>> -   sa->util_sum = sa->util_avg * divider;
> >>>> +   sub_positive(&sa->util_sum, r * divider);
> >>>> +   sa->util_sum = max_t(u32, sa->util_sum, sa->util_avg * MIN_DIVIDER);
> >>>>
> >>>>     r = removed_runnable;
> >>>>     sub_positive(&sa->runnable_avg, r);
> >>>> -   sa->runnable_sum = sa->runnable_avg * divider;
> >>>> +   sub_positive(&sa->runnable_sum, r * divider);
> >>>> +   sa->runnable_sum = max_t(u32, sa->runnable_sum,
> >>>> +                                 sa->runnable_avg * MIN_DIVIDER);
> >>>>
> >>>> i.e. w/o changing update_tg_cfs_X() (and
> >>>> detach_entity_load_avg()/dequeue_load_avg()).
> >>>>
> >>>> update_load_avg()
> >>>>   update_cfs_rq_load_avg()    <---
> >>>>   propagate_entity_load_avg()
> >>>>     update_tg_cfs_X()         <---
> >>>>
> >>>>
> >>>> I didn't see the SCHED_WARN_ON() [cfs_rq_is_decayed()] when looping on
> >>>> hackbench in several different sched group levels on
> >>>> [Hikey620 (Arm64, 8 CPUs, SMP, 4 taskgroups: A/B C/D E/F G/H), >12h uptime].
> >>>
> >>> IIRC, it was with hikey960 with cgroup v1.
> >>> As a side note, I have never triggered the problem with dragonboard845 and cgroup v2.
> >>
> >> OK, just started a test on hikey960 cgroupv1. Let's see if I can catch it.
>
> Still no sign of the issue (hikey960, cgroupv1, 4 taskgroups: A/B C/D
> E/F G/H, >45h uptime).
>
> >>>>> @@ -3780,7 +3799,11 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
> >>>>>
> >>>>>       dequeue_load_avg(cfs_rq, se);
> >>>>>       sub_positive(&cfs_rq->avg.util_avg, se->avg.util_avg);
> >>>>> -     cfs_rq->avg.util_sum = cfs_rq->avg.util_avg * divider;
> >>>>> +     sub_positive(&cfs_rq->avg.util_sum, se->avg.util_sum);
> >>>>> +     /* See update_tg_cfs_util() */
> >>>>> +     cfs_rq->avg.util_sum = max_t(u32, cfs_rq->avg.util_sum,
> >>>>> +                                       cfs_rq->avg.util_avg * MIN_DIVIDER);
> >>>>> +
> >>>>
> >>>> Maybe add a:
> >>>>
> >>>> Fixes: fcf6631f3736 ("sched/pelt: Ensure that *_sum is always synced with *_avg")
> >>>
> >>> I spent time thinking about adding a Fixes tag. There is no crash/warning
> >>> so far, so should we propagate it back to LTS for better performance?
> >>
> >> Not sure I understand. What do you mean by 'should we propagate it back
> >> to LTS'?
> >
> > Sorry, I had all stable kernels in mind, not only LTS.
> >
> > Some of the changes done in PELT signal propagation that replace
> > subtracting util_sum with util_avg * divider are related to other
> > problems with sched group hierarchy and throttling/unthrottling. I'm
> > not 100% confident that backporting this to stable kernels with a
> > Fixes tag wouldn't also require backporting more patches in other
> > areas in order not to resurrect old problems. So I wonder whether
> > it's worth backporting this to stable at all.
>
> OK, I see. So only 1c35b07e6d39 (i.e. the util _sum/_avg change in
> update_cfs_rq_load_avg() (1)) caused the CPU frequency regression. That
> was the reason why I initially suggested splitting the patch-set
> differently. But you said that you also saw the issue when (1) is fixed.

OK, I think we were not talking about the same setup. I wrongly read
you as saying that
sa->util_sum = max_t(u32, sa->util_sum, sa->util_avg * MIN_DIVIDER);
was only needed in update_cfs_rq_load_avg() but not in the other places.

But what you actually said is that we only need the change below to fix
the perf regression raised by Rick:
     r = removed_util;
     sub_positive(&sa->util_avg, r);
 -   sa->util_sum = sa->util_avg * divider;
 +   sub_positive(&sa->util_sum, r * divider);
 +   sa->util_sum = max_t(u32, sa->util_sum, sa->util_avg * MIN_DIVIDER);
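
To make the intent of the hunk above explicit: sub_positive() subtracts
while clamping at 0, and the max_t() line restores the minimal sync
between _sum and _avg. A simplified sketch (the real sub_positive() in
kernel/sched/fair.c additionally uses READ_ONCE()/WRITE_ONCE()):

    /*
     * Simplified sketch: subtract _val from *_ptr, but clamp the
     * result at 0 so an underflow of the unsigned value cannot
     * wrap around to a huge number.
     */
    #define sub_positive(_ptr, _val) do {                   \
            typeof(*(_ptr)) res = *(_ptr) - (_val);         \
            if (res > *(_ptr))      /* underflow */         \
                    res = 0;                                \
            *(_ptr) = res;                                  \
    } while (0)

The max_t() line then guarantees util_sum >= util_avg * MIN_DIVIDER,
i.e. util_sum can no longer reach 0 while util_avg is still non-zero
(remember _avg = _sum / divider, with divider >= MIN_DIVIDER).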

The WARN that I mentioned in my previous email was about not adding
the max_t in all 3 places. I reran some tests today and triggered the
WARN after a detach without the max_t line.
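
For context, the WARN in question is the consistency check in
cfs_rq_is_decayed(), which fires when some *_avg is still non-zero
although all *_sum have already reached 0; roughly:

    static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
    {
            if (cfs_rq->load.weight)
                    return false;

            if (cfs_rq->avg.load_sum)
                    return false;

            if (cfs_rq->avg.util_sum)
                    return false;

            if (cfs_rq->avg.runnable_sum)
                    return false;

            /*
             * _avg must be null when _sum are null because
             * _avg = _sum / divider. Make sure that rounding and/or
             * propagation of PELT values never break this.
             */
            SCHED_WARN_ON(cfs_rq->avg.load_avg ||
                          cfs_rq->avg.util_avg ||
                          cfs_rq->avg.runnable_avg);

            return true;
    }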

I can probably isolate the code above in a dedicated patch for the
regression raised by Rick, and we could consider adding a Fixes tag; I
will run more tests with only this part during the weekend.
That being said, we need to stay consistent in all 3 places where we
move or propagate some *_avg. In particular, using "sa->util_sum =
sa->util_avg * divider" has the drawback of filtering out the
contribution not yet accounted for in util_avg, and the impact is much
larger than expected. So although fixing only update_cfs_rq_load_avg()
seems enough for Rick's case, other cases could be impacted by the 2
other places, and we need to fix them now that we have a better view
of the root cause.
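
As a standalone illustration of that filtering effect (made-up numbers,
plain userspace C, not kernel code; the real divider is
LOAD_AVG_MAX - 1024 + avg->period_contrib):

    #include <stdio.h>

    int main(void)
    {
            /* Example divider: LOAD_AVG_MAX (47742) - 1024 = 46718. */
            unsigned long divider = 46718;

            /* 40000 of util_sum is not yet large enough to show up in util_avg. */
            unsigned long util_sum = 10 * divider + 40000;
            unsigned long util_avg = util_sum / divider;   /* = 10 (truncated) */

            /* Re-deriving the sum from the avg silently drops that 40000: */
            unsigned long resynced = util_avg * divider;   /* = 10 * divider */

            printf("contribution filtered out: %lu\n", util_sum - resynced);
            return 0;
    }

Each such resync can discard up to divider - 1 worth of accumulated
contribution, which is why the effect is much larger than a simple
rounding error when it happens repeatedly.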

Thread overview: 16+ messages
2021-12-22  9:37 [PATCH v2 0/3] sched/pelt: Don't sync hardly *_sum with *_avg Vincent Guittot
2021-12-22  9:38 ` [PATCH v2 1/3] sched/pelt: Don't sync hardly util_sum with util_avg Vincent Guittot
2022-01-04 11:47   ` Dietmar Eggemann
2022-01-04 13:42     ` Vincent Guittot
2022-01-05 13:15       ` Dietmar Eggemann
2022-01-05 13:57         ` Vincent Guittot
2022-01-07 11:43           ` Dietmar Eggemann
2022-01-07 15:21             ` Vincent Guittot [this message]
2022-01-11  7:54               ` Vincent Guittot
2022-01-11 12:37                 ` Dietmar Eggemann
2022-01-04 13:48     ` Vincent Guittot
2021-12-22  9:38 ` [PATCH v2 2/3] sched/pelt: Don't sync hardly runnable_sum with runnable_avg Vincent Guittot
2022-01-04 11:47   ` Dietmar Eggemann
2021-12-22  9:38 ` [PATCH v2 3/3] sched/pelt: Don't sync hardly load_sum with load_avg Vincent Guittot
2022-01-04 11:47   ` Dietmar Eggemann
2022-01-04 11:46 ` [PATCH v2 0/3] sched/pelt: Don't sync hardly *_sum with *_avg Dietmar Eggemann
