All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vincent Guittot <vincent.guittot@linaro.org>
To: Vincent Donnefort <Vincent.Donnefort@arm.com>
Cc: peterz@infradead.org, mingo@redhat.com,
	linux-kernel@vger.kernel.org, dietmar.eggemann@arm.com,
	Valentin.Schneider@arm.com, Morten.Rasmussen@arm.com,
	Chris.Redpath@arm.com, qperret@google.com, Lukasz.Luba@arm.com
Subject: Re: [PATCH v2 2/7] sched/fair: Decay task PELT values during migration
Date: Wed, 19 Jan 2022 14:22:28 +0100	[thread overview]
Message-ID: <CAKfTPtB=CJNFDrpXY9o8g5XfjBfnVTUgb2rWke1SyWMUxz0M+g@mail.gmail.com> (raw)
In-Reply-To: <Yef8kTnlP5h4I7/1@FVFF7649Q05P>

On Wed, 19 Jan 2022 at 12:59, Vincent Donnefort
<vincent.donnefort@arm.com> wrote:
>
> [...]
>
> > > >
> > > > This has several shortfalls:
> > > > - have a look at cfs_rq_clock_pelt() and rq_clock_pelt(). What you
> > > > name clock_pelt in your commit message and is used to update PELT and
> > > > saved in se->avg.last_update_time is : rq->clock_pelt -
> > > > rq->lost_idle_time - cfs_rq->throttled_clock_task_time
> > >
> > > That's why, the PELT "lag" is added onto se->avg.last_update_time. (see the last
> > > paragraph of the commit message) The estimator is just a time delta, that is
> > > added on top of the entity's last_update_time. I don't see any problem with the
> > > lost_idle_time here.
> >
> > lost_idle_time is updated before entering idle and after your
> > clock_pelt_lag has been updated. This means that the delta that you
> > are computing can be wrong
> >
> > I haven't look in details but similar problem probably happens for
> > throttled_clock_task_time
> >
> > >
> > > > - you are doing this whatever the state of the cpu : idle or not. But
> > > > the clock cycles are not accounted for in the same way in both cases.
> > >
> > > If the CPU is idle and clock_pelt == clock_task, the component A of the
> > > estimator would be 0 and we only would account for how outdated is the rq's
> > > clock, i.e. component B.
> >
> > And if cpu is not idle, you can't apply the diff between clk_pelt and clock_task
> >
> > >
> > > > - (B) doesn't seem to be accurate as you skip irq and steal time
> > > > accounting and you don't apply any scale invariance if the cpu is not
> > > > idle
> > >
> > > The missing irq and paravirt time is the reason why it is called "estimator".
> > > But maybe there's a chance of improving this part with a lockless version of
> > > rq->prev_irq_time and rq->prev_steal_time_rq?
> > >
> > > > - IIUC your explanation in the commit message above, the (A) period
> > > > seems to be a problem only when idle but you apply it unconditionally.
> > >
> > > If the CPU is idle (and clock_pelt == clock_task), only the B part would be
> > > worth something:
> > >
> > >   A + B = [clock_task - clock_pelt] + [sched_clock_cpu() - clock]
> > >                       A                            B
> > >
> > > > If cpu is idle you can assume that clock_pelt should be equal to
> > > > clock_task but you can't if cpu is not idle otherwise your sync will
> > > > be inaccurate and defeat the primary goal of this patch. If your
> > > > problem with clock_pelt is that the pending idle time is not accounted
> > > > for when entering idle but only at the next update (update blocked
> > > > load or wakeup of a thread). This patch below should fix this and
> > > > remove your A.
> > >
> > > That would help slightly the current situation, but this part is already
> > > covered by the estimator.
> >
> > But the estimator, as you name it, is wrong beaus ethe A part can't be
> > applied unconditionally
>
> Hum, it is used only in the !active migration. So we know the task was sleeping
> before that migration. As a consequence, the time we need to account is "sleeping"
> time from the task point of view, which is clock_pelt == clock_task (for
> __update_load_avg_blocked_se()). Otherwise, we would only decay with the
> "wallclock" idle time instead of the "scaled" one wouldn't we?

clock_pelt == clock_task only when cpu is idle and after updating
lost_idle_time but you have no idea of the state of the cpu when
migrating the task

>
>
>      +-------------+--------------
>      |   Task A    |    Task B    .....
>               ^    ^             ^
>               |    |          migrate A
>               |    |             |
>               |    |             |
>               |    |             |
>               |    |<----------->|
>               |  Wallclock Task A idle time
>               |<---------------->|
>             "Scaled" Task A idle time
>
>
> [...]

  reply	other threads:[~2022-01-19 13:22 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-12 16:12 [PATCH v2 0/7] feec() energy margin removal Vincent Donnefort
2022-01-12 16:12 ` [PATCH v2 1/7] sched/fair: Provide u64 read for 32-bits arch helper Vincent Donnefort
2022-01-17 16:11   ` Tao Zhou
2022-01-17 19:42     ` Vincent Donnefort
2022-01-17 23:44       ` Tao Zhou
2022-01-12 16:12 ` [PATCH v2 2/7] sched/fair: Decay task PELT values during migration Vincent Donnefort
2022-01-17 17:31   ` Vincent Guittot
2022-01-18 10:56     ` Vincent Donnefort
2022-01-19  9:54       ` Vincent Guittot
2022-01-19 11:59         ` Vincent Donnefort
2022-01-19 13:22           ` Vincent Guittot [this message]
2022-01-20 21:12             ` Vincent Donnefort
2022-01-21 15:27               ` Vincent Guittot
2022-01-12 16:12 ` [PATCH v2 3/7] sched, drivers: Remove max param from effective_cpu_util()/sched_cpu_util() Vincent Donnefort
2022-01-12 16:12 ` [PATCH v2 4/7] sched/fair: Rename select_idle_mask to select_rq_mask Vincent Donnefort
2022-01-12 16:12 ` [PATCH v2 5/7] sched/fair: Use the same cpumask per-PD throughout find_energy_efficient_cpu() Vincent Donnefort
2022-01-12 16:12 ` [PATCH v2 6/7] sched/fair: Remove task_util from effective utilization in feec() Vincent Donnefort
2022-01-12 19:44   ` kernel test robot
2022-01-12 19:44     ` kernel test robot
2022-01-17 13:17   ` Dietmar Eggemann
2022-01-18  9:46     ` Vincent Donnefort
2022-01-12 16:12 ` [PATCH v2 7/7] sched/fair: Remove the energy margin " Vincent Donnefort

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKfTPtB=CJNFDrpXY9o8g5XfjBfnVTUgb2rWke1SyWMUxz0M+g@mail.gmail.com' \
    --to=vincent.guittot@linaro.org \
    --cc=Chris.Redpath@arm.com \
    --cc=Lukasz.Luba@arm.com \
    --cc=Morten.Rasmussen@arm.com \
    --cc=Valentin.Schneider@arm.com \
    --cc=Vincent.Donnefort@arm.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=qperret@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.