From: Dietmar Eggemann <dietmar.eggemann@arm.com>
To: Vincent Guittot <vincent.guittot@linaro.org>,
	mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
	rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
	bristot@redhat.com, linux-kernel@vger.kernel.org,
	rickyiu@google.com, odin@uged.al
Cc: sachinp@linux.vnet.ibm.com, naresh.kamboju@linaro.org
Subject: Re: [PATCH v2 1/3] sched/pelt: Don't sync hardly util_sum with uti_avg
Date: Tue, 4 Jan 2022 12:47:10 +0100	[thread overview]
Message-ID: <9e526482-905c-e759-8aa6-1ff84bb5b2a3@arm.com> (raw)
In-Reply-To: <20211222093802.22357-2-vincent.guittot@linaro.org>

On 22/12/2021 10:38, Vincent Guittot wrote:

s/util_sum with uti_avg/util_sum with util_avg

[...]

> +#define MIN_DIVIDER (LOAD_AVG_MAX - 1024)

Shouldn't this be in pelt.h?
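
For reference (quoting from memory, so please double check),
get_pelt_divider() in kernel/sched/pelt.h is currently:

  static inline u32 get_pelt_divider(struct sched_avg *avg)
  {
  	return LOAD_AVG_MAX - 1024 + avg->period_contrib;
  }

so MIN_DIVIDER is exactly the period_contrib == 0 case, which IMHO is
another argument for defining it next to get_pelt_divider() in pelt.h.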

[...]

> @@ -3466,13 +3466,30 @@ update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq
>  	 */
>  	divider = get_pelt_divider(&cfs_rq->avg);
>  
> +
>  	/* Set new sched_entity's utilization */
>  	se->avg.util_avg = gcfs_rq->avg.util_avg;
> -	se->avg.util_sum = se->avg.util_avg * divider;
> +	new_sum = se->avg.util_avg * divider;
> +	delta_sum = (long)new_sum - (long)se->avg.util_sum;
> +	se->avg.util_sum = new_sum;
>  
>  	/* Update parent cfs_rq utilization */
> -	add_positive(&cfs_rq->avg.util_avg, delta);
> -	cfs_rq->avg.util_sum = cfs_rq->avg.util_avg * divider;
> +	add_positive(&cfs_rq->avg.util_avg, delta_avg);
> +	add_positive(&cfs_rq->avg.util_sum, delta_sum);
> +
> +	/*
> +	 * Because of rounding, se->util_sum might ends up being +1 more than
> +	 * cfs->util_sum (XXX fix the rounding). Although this is not
> +	 * a problem by itself, detaching a lot of tasks with the rounding
> +	 * problem between 2 updates of util_avg (~1ms) can make cfs->util_sum
> +	 * becoming null whereas cfs_util_avg is not.
> +	 * Check that util_sum is still above its lower bound for the new
> +	 * util_avg. Given that period_contrib might have moved since the last
> +	 * sync, we are only sure that util_sum must be above or equal to
> +	 *    util_avg * minimum possible divider
> +	 */
> +	cfs_rq->avg.util_sum = max_t(u32, cfs_rq->avg.util_sum,
> +					  cfs_rq->avg.util_avg * MIN_DIVIDER);
>  }
>
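
To make the new delta propagation concrete (made-up numbers, assuming
divider = 47000, i.e. period_contrib = 282): with gcfs_rq->avg.util_avg
= 100 and the se previously synced at util_avg = 60, util_sum =
60 * 46718 = 2803080, we get new_sum = 100 * 47000 = 4700000 and
delta_sum = 1896920. The cfs_rq then only sees the deltas, delta_avg =
40 (computed earlier in the function, not shown in this hunk) and
delta_sum = 1896920, instead of having its util_sum overwritten with
util_avg * divider as before.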

I still wonder whether the regression comes only from the changes in
update_cfs_rq_load_avg() introduced by 1c35b07e6d39, and whether it
could be fixed by this part of the patch-set alone (A):

@@ -3677,15 +3706,22 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)

    r = removed_load;
    sub_positive(&sa->load_avg, r);
-   sa->load_sum = sa->load_avg * divider;
+   sub_positive(&sa->load_sum, r * divider);
+   sa->load_sum = max_t(u32, sa->load_sum, sa->load_avg * MIN_DIVIDER);

    r = removed_util;
    sub_positive(&sa->util_avg, r);
-   sa->util_sum = sa->util_avg * divider;
+   sub_positive(&sa->util_sum, r * divider);
+   sa->util_sum = max_t(u32, sa->util_sum, sa->util_avg * MIN_DIVIDER);

    r = removed_runnable;
    sub_positive(&sa->runnable_avg, r);
-   sa->runnable_sum = sa->runnable_avg * divider;
+   sub_positive(&sa->runnable_sum, r * divider);
+   sa->runnable_sum = max_t(u32, sa->runnable_sum,
+                                 sa->runnable_avg * MIN_DIVIDER);

i.e. w/o changing update_tg_cfs_X() (and
detach_entity_load_avg()/dequeue_load_avg()).

update_load_avg()
  update_cfs_rq_load_avg()    <---
  propagate_entity_load_avg()
    update_tg_cfs_X()         <---


I didn't see the SCHED_WARN_ON() in cfs_rq_is_decayed() when looping
hackbench across several different sched group levels on Hikey620
(Arm64, 8 CPUs, SMP, 4 taskgroups: A/B C/D E/F G/H, >12h uptime).

Rick is probably in a position to test whether this would be sufficient
to cure the CPU frequency regression.

I can see that you want to use the same _avg/_sum sync in
detach_entity_load_avg()/dequeue_load_avg() as in
update_cfs_rq_load_avg(). (B)

And finally in update_tg_cfs_X() as well plus down-propagating _sum
independently from _avg. (C)

So rather than splitting the patchset by X (util, runnable, load), the
whole change might IMHO be easier to handle when split into (A), (B)
and (C) (obviously only in case (A) cures the regression).

>  static inline void
> @@ -3681,7 +3698,9 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
>  
>  		r = removed_util;
>  		sub_positive(&sa->util_avg, r);
> -		sa->util_sum = sa->util_avg * divider;
> +		sub_positive(&sa->util_sum, r * divider);
> +		/* See update_tg_cfs_util() */
> +		sa->util_sum = max_t(u32, sa->util_sum, sa->util_avg * MIN_DIVIDER);
>  
>  		r = removed_runnable;
>  		sub_positive(&sa->runnable_avg, r);
> @@ -3780,7 +3799,11 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
>  
>  	dequeue_load_avg(cfs_rq, se);
>  	sub_positive(&cfs_rq->avg.util_avg, se->avg.util_avg);
> -	cfs_rq->avg.util_sum = cfs_rq->avg.util_avg * divider;
> +	sub_positive(&cfs_rq->avg.util_sum, se->avg.util_sum);
> +	/* See update_tg_cfs_util() */
> +	cfs_rq->avg.util_sum = max_t(u32, cfs_rq->avg.util_sum,
> +					  cfs_rq->avg.util_avg * MIN_DIVIDER);
> +

Maybe add a:

Fixes: fcf6631f3736 ("sched/pelt: Ensure that *_sum is always synced with *_avg")

[...]

This max_t() should make sure that `_sum is always >= _avg *
MIN_DIVIDER`, which is sometimes not the case (see the short derivation
further below). Currently this is done in

(1) update_cfs_rq_load_avg()
(2) detach_entity_load_avg() and dequeue_load_avg()
(3) update_tg_cfs_X()

but not in attach_entity_load_avg(), enqueue_load_avg(). What's the
reason for this?
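
As a short derivation of the bound (my understanding, please correct me
if I'm wrong): util_avg is maintained as util_sum / divider with
divider = MIN_DIVIDER + period_contrib and 0 <= period_contrib < 1024,
so at the last sync util_sum = util_avg * (MIN_DIVIDER + period_contrib)
>= util_avg * MIN_DIVIDER. E.g. for util_avg = 100, util_sum should stay
within [100 * 46718, 100 * 47741] = [4671800, 4774100]; the max_t()
restores the lower end when the sub_positive() undershoots it.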

Thread overview: 16+ messages
2021-12-22  9:37 [PATCH v2 0/3] sched/pelt: Don't sync hardly *_sum with *_avg Vincent Guittot
2021-12-22  9:38 ` [PATCH v2 1/3] sched/pelt: Don't sync hardly util_sum with uti_avg Vincent Guittot
2022-01-04 11:47   ` Dietmar Eggemann [this message]
2022-01-04 13:42     ` Vincent Guittot
2022-01-05 13:15       ` Dietmar Eggemann
2022-01-05 13:57         ` Vincent Guittot
2022-01-07 11:43           ` Dietmar Eggemann
2022-01-07 15:21             ` Vincent Guittot
2022-01-11  7:54               ` Vincent Guittot
2022-01-11 12:37                 ` Dietmar Eggemann
2022-01-04 13:48     ` Vincent Guittot
2021-12-22  9:38 ` [PATCH v2 2/3] sched/pelt: Don't sync hardly runnable_sum with runnable_avg Vincent Guittot
2022-01-04 11:47   ` Dietmar Eggemann
2021-12-22  9:38 ` [PATCH v2 3/3] sched/pelt: Don't sync hardly load_sum with load_avg Vincent Guittot
2022-01-04 11:47   ` Dietmar Eggemann
2022-01-04 11:46 ` [PATCH v2 0/3] sched/pelt: Don't sync hardly *_sum with *_avg Dietmar Eggemann
