linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Vincent Guittot <vincent.guittot@linaro.org>
To: Vincent Donnefort <Vincent.Donnefort@arm.com>
Cc: peterz@infradead.org, mingo@redhat.com,
	linux-kernel@vger.kernel.org, dietmar.eggemann@arm.com,
	Valentin.Schneider@arm.com, Morten.Rasmussen@arm.com,
	Chris.Redpath@arm.com, qperret@google.com, Lukasz.Luba@arm.com
Subject: Re: [PATCH v2 2/7] sched/fair: Decay task PELT values during migration
Date: Wed, 19 Jan 2022 14:22:28 +0100	[thread overview]
Message-ID: <CAKfTPtB=CJNFDrpXY9o8g5XfjBfnVTUgb2rWke1SyWMUxz0M+g@mail.gmail.com> (raw)
In-Reply-To: <Yef8kTnlP5h4I7/1@FVFF7649Q05P>

On Wed, 19 Jan 2022 at 12:59, Vincent Donnefort
<vincent.donnefort@arm.com> wrote:
>
> [...]
>
> > > >
> > > > This has several shortfalls:
> > > > - have a look at cfs_rq_clock_pelt() and rq_clock_pelt(). What you
> > > > name clock_pelt in your commit message and is used to update PELT and
> > > > saved in se->avg.last_update_time is : rq->clock_pelt -
> > > > rq->lost_idle_time - cfs_rq->throttled_clock_task_time
> > >
> > > That's why, the PELT "lag" is added onto se->avg.last_update_time. (see the last
> > > paragraph of the commit message) The estimator is just a time delta, that is
> > > added on top of the entity's last_update_time. I don't see any problem with the
> > > lost_idle_time here.
> >
> > lost_idle_time is updated before entering idle and after your
> > clock_pelt_lag has been updated. This means that the delta that you
> > are computing can be wrong
> >
> > I haven't look in details but similar problem probably happens for
> > throttled_clock_task_time
> >
> > >
> > > > - you are doing this whatever the state of the cpu : idle or not. But
> > > > the clock cycles are not accounted for in the same way in both cases.
> > >
> > > If the CPU is idle and clock_pelt == clock_task, the component A of the
> > > estimator would be 0 and we only would account for how outdated is the rq's
> > > clock, i.e. component B.
> >
> > And if cpu is not idle, you can't apply the diff between clk_pelt and clock_task
> >
> > >
> > > > - (B) doesn't seem to be accurate as you skip irq and steal time
> > > > accounting and you don't apply any scale invariance if the cpu is not
> > > > idle
> > >
> > > The missing irq and paravirt time is the reason why it is called "estimator".
> > > But maybe there's a chance of improving this part with a lockless version of
> > > rq->prev_irq_time and rq->prev_steal_time_rq?
> > >
> > > > - IIUC your explanation in the commit message above, the (A) period
> > > > seems to be a problem only when idle but you apply it unconditionally.
> > >
> > > If the CPU is idle (and clock_pelt == clock_task), only the B part would be
> > > worth something:
> > >
> > >   A + B = [clock_task - clock_pelt] + [sched_clock_cpu() - clock]
> > >                       A                            B
> > >
> > > > If cpu is idle you can assume that clock_pelt should be equal to
> > > > clock_task but you can't if cpu is not idle otherwise your sync will
> > > > be inaccurate and defeat the primary goal of this patch. If your
> > > > problem with clock_pelt is that the pending idle time is not accounted
> > > > for when entering idle but only at the next update (update blocked
> > > > load or wakeup of a thread). This patch below should fix this and
> > > > remove your A.
> > >
> > > That would help slightly the current situation, but this part is already
> > > covered by the estimator.
> >
> > But the estimator, as you name it, is wrong beaus ethe A part can't be
> > applied unconditionally
>
> Hum, it is used only in the !active migration. So we know the task was sleeping
> before that migration. As a consequence, the time we need to account is "sleeping"
> time from the task point of view, which is clock_pelt == clock_task (for
> __update_load_avg_blocked_se()). Otherwise, we would only decay with the
> "wallclock" idle time instead of the "scaled" one wouldn't we?

clock_pelt == clock_task only when cpu is idle and after updating
lost_idle_time but you have no idea of the state of the cpu when
migrating the task

>
>
>      +-------------+--------------
>      |   Task A    |    Task B    .....
>               ^    ^             ^
>               |    |          migrate A
>               |    |             |
>               |    |             |
>               |    |             |
>               |    |<----------->|
>               |  Wallclock Task A idle time
>               |<---------------->|
>             "Scaled" Task A idle time
>
>
> [...]

  reply	other threads:[~2022-01-19 13:22 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-12 16:12 [PATCH v2 0/7] feec() energy margin removal Vincent Donnefort
2022-01-12 16:12 ` [PATCH v2 1/7] sched/fair: Provide u64 read for 32-bits arch helper Vincent Donnefort
2022-01-17 16:11   ` Tao Zhou
2022-01-17 19:42     ` Vincent Donnefort
2022-01-17 23:44       ` Tao Zhou
2022-01-12 16:12 ` [PATCH v2 2/7] sched/fair: Decay task PELT values during migration Vincent Donnefort
2022-01-17 17:31   ` Vincent Guittot
2022-01-18 10:56     ` Vincent Donnefort
2022-01-19  9:54       ` Vincent Guittot
2022-01-19 11:59         ` Vincent Donnefort
2022-01-19 13:22           ` Vincent Guittot [this message]
2022-01-20 21:12             ` Vincent Donnefort
2022-01-21 15:27               ` Vincent Guittot
2022-01-12 16:12 ` [PATCH v2 3/7] sched, drivers: Remove max param from effective_cpu_util()/sched_cpu_util() Vincent Donnefort
2022-01-12 16:12 ` [PATCH v2 4/7] sched/fair: Rename select_idle_mask to select_rq_mask Vincent Donnefort
2022-01-12 16:12 ` [PATCH v2 5/7] sched/fair: Use the same cpumask per-PD throughout find_energy_efficient_cpu() Vincent Donnefort
2022-01-12 16:12 ` [PATCH v2 6/7] sched/fair: Remove task_util from effective utilization in feec() Vincent Donnefort
2022-01-17 13:17   ` Dietmar Eggemann
2022-01-18  9:46     ` Vincent Donnefort
2022-01-12 16:12 ` [PATCH v2 7/7] sched/fair: Remove the energy margin " Vincent Donnefort

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKfTPtB=CJNFDrpXY9o8g5XfjBfnVTUgb2rWke1SyWMUxz0M+g@mail.gmail.com' \
    --to=vincent.guittot@linaro.org \
    --cc=Chris.Redpath@arm.com \
    --cc=Lukasz.Luba@arm.com \
    --cc=Morten.Rasmussen@arm.com \
    --cc=Valentin.Schneider@arm.com \
    --cc=Vincent.Donnefort@arm.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=qperret@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).