linux-kernel.vger.kernel.org archive mirror
From: David Dai <davidai@google.com>
To: Saravana Kannan <saravanak@google.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Valentin Schneider <vschneid@redhat.com>,
	kernel-team@android.com, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 1/6] sched/fair: Add util_guest for tasks
Date: Wed, 5 Apr 2023 16:36:04 -0700
Message-ID: <CABN1KC+BwYM1uYexL+RDcWzhSo-0n0yZHB_thpRdv4FiQNJr-g@mail.gmail.com>
In-Reply-To: <CAGETcx_3h9_+y91EfhDMk-gPdRLA3mhdiX2AksN6xHZha7U_mw@mail.gmail.com>

On Wed, Apr 5, 2023 at 2:43 PM Saravana Kannan <saravanak@google.com> wrote:
>
> On Wed, Apr 5, 2023 at 3:50 AM Dietmar Eggemann
> <dietmar.eggemann@arm.com> wrote:
> >
> > On 04/04/2023 03:11, David Dai wrote:
> > > On Mon, Apr 3, 2023 at 4:40 AM Dietmar Eggemann
> > > <dietmar.eggemann@arm.com> wrote:
> > >>
> > >> Hi David,
> > > Hi Dietmar, thanks for your comments.
> > >>
> > >> On 31/03/2023 00:43, David Dai wrote:
> >
> > [...]
> >
> > >>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > >>> index 6986ea31c984..998649554344 100644
> > >>> --- a/kernel/sched/fair.c
> > >>> +++ b/kernel/sched/fair.c
> > >>> @@ -4276,14 +4276,16 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf);
> > >>>
> > >>>  static inline unsigned long task_util(struct task_struct *p)
> > >>>  {
> > >>> -     return READ_ONCE(p->se.avg.util_avg);
> > >>> +     return max(READ_ONCE(p->se.avg.util_avg),
> > >>> +                     READ_ONCE(p->se.avg.util_guest));
> > >>>  }
> > >>>
> > >>>  static inline unsigned long _task_util_est(struct task_struct *p)
> > >>>  {
> > >>>       struct util_est ue = READ_ONCE(p->se.avg.util_est);
> > >>>
> > >>> -     return max(ue.ewma, (ue.enqueued & ~UTIL_AVG_UNCHANGED));
> > >>> +     return max_t(unsigned long, READ_ONCE(p->se.avg.util_guest),
> > >>> +                     max(ue.ewma, (ue.enqueued & ~UTIL_AVG_UNCHANGED)));
> > >>>  }
> > >>
> > >> I can't see why the existing p->uclamp_req[UCLAMP_MIN].value can't be
> > >> used here instead of p->se.avg.util_guest.
> > > Using p->uclamp_req[UCLAMP_MIN].value would result in folding in
> > > uclamp values into task_util and task_util_est for all tasks that have
> > > uclamp values set. The intent of these patches isn’t to modify
> > > existing uclamp behaviour. Users would also override util values from
> > > the guest when they set uclamp values.
> > >>
> > >> I do understand the issue of inheriting uclamp values at fork but don't
> > >> get the not being `additive` thing. We are at task level here.
> >
> > > Uclamp values are max aggregated with other tasks at the runqueue
> > > level when deciding CPU frequency. For example, a vCPU runqueue may
> > > have a util of 512 that results in setting 512 to uclamp_min on the
> > > vCPU task. This is insufficient to drive a frequency response if it
> > > shares the runqueue with another host task running with util of 512, as
> > > it would result in a clamped util value of 512 at the runqueue (e.g., if
> > > a guest thread had just migrated onto this vCPU).
> >
> > OK, see your point now. You want an accurate per-task boost for this
> > vCPU task on the host run-queue.
> > And a scenario in which a vCPU can ask for 100% in these moments is not
> > sufficient I guess? In this case uclamp_min could work.
>
> Right. vCPU can have whatever utilization and there can be random host
> threads completely unrelated to the VM. And we need to aggregate both
> of their util when deciding CPU freq.
>
> >
> > >> The fact that you have to max util_avg and util_est directly in
> > >> task_util() and _task_util_est() tells me that there are places where
> > >> this helps and uclamp_task_util() is not called there.
> > > Can you clarify on this point a bit more?
> >
> > Sorry, I meant s/util_est/util_guest/.
> >
> > The effect of the change in _task_util_est() you see via:
> >
> > enqueue_task_fair()
> >   util_est_enqueue()
> >     cfs_rq->avg.util_est.enqueued += _task_util_est(p)
> >
> > so that `sugov_get_util() -> cpu_util_cfs() ->
> > cfs_rq->avg.util_est.enqueued` can see the effect of util_guest?

That sequence looks correct to me.
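A minimal user-space model of the enqueue-side accounting confirmed above may help. It is a sketch, not kernel code: the `*_model` structs and helpers are hypothetical stand-ins, `UTIL_AVG_UNCHANGED` is assumed to be the top-bit flag used by recent mainline, and PELT decay, task groups and the UTIL_EST sched_feat check are left out. It shows util_guest acting as a floor in _task_util_est(), so the vCPU task's guest hint lands in cfs_rq->avg.util_est.enqueued and is then visible to cpu_util_cfs().

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
#define SCHED_CAPACITY_SCALE 1024UL
/* Assumption: mirrors the MSB flag used by mainline util_est. */
#define UTIL_AVG_UNCHANGED 0x80000000U

struct util_est_model {
	unsigned int enqueued;
	unsigned int ewma;
};

struct task_model {
	unsigned long util_avg;		/* host-side PELT util */
	unsigned long util_guest;	/* hint proposed by the RFC */
	struct util_est_model util_est;
};

struct cfs_rq_model {
	unsigned long util_avg;		/* PELT util of the root cfs_rq */
	unsigned int util_est_enqueued;	/* sum of enqueued task estimates */
};

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Patched _task_util_est(): util_guest is a floor on the estimate. */
static unsigned long task_util_est_model(const struct task_model *p)
{
	unsigned long est = max_ul(p->util_est.ewma,
				   p->util_est.enqueued & ~UTIL_AVG_UNCHANGED);

	return max_ul(p->util_guest, est);
}

/* util_est_enqueue(): add the task's estimate to the runqueue sum. */
static void util_est_enqueue_model(struct cfs_rq_model *cfs_rq,
				   const struct task_model *p)
{
	cfs_rq->util_est_enqueued += task_util_est_model(p);
}

/* cpu_util_cfs(): max of PELT util and util_est, capped at capacity. */
static unsigned long cpu_util_cfs_model(const struct cfs_rq_model *cfs_rq,
					unsigned long capacity)
{
	unsigned long util = max_ul(cfs_rq->util_avg,
				    cfs_rq->util_est_enqueued);

	return min_ul(util, capacity);
}
```

For example, a vCPU task with util_avg 50, an ewma of 60 and util_guest 600 adds 600 rather than 60 to the runqueue's util_est.enqueued, so a CPU whose util_avg is 100 reports 600 to schedutil.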

> >
> > Not sure about the change in task_util() yet.

task_util() provides some signaling in addition to task_util_est() via:

find_energy_efficient_cpu()
  cpu_util_next()
    lsub_positive(&util, task_util(p));
    ...
    util += task_util(p);
    // Can provide a better signal than util_est.

dequeue_task_fair()
  util_est_update()
    ue.enqueued = task_util(p);
    // Updates ue.ewma
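Both call paths above can be modelled in a few lines of user-space C. This is a simplified sketch, not the kernel implementation: the `task_model` struct and helpers are hypothetical, `UTIL_EST_WEIGHT_SHIFT` is assumed to be 2 as in mainline (new samples weigh 1/4), and the capacity cap in cpu_util_next() and the margin checks in util_est_update() are omitted. Once task_util() is max'd with util_guest, both the placement estimate and the dequeue-time EWMA sample see the guest hint rather than the host-side PELT value alone.

```c
#include <assert.h>

/* Assumption: matches mainline, new samples weigh 1/4 in the EWMA. */
#define UTIL_EST_WEIGHT_SHIFT 2

/* Hypothetical simplified task: PELT util, guest hint, util_est ewma. */
struct task_model {
	unsigned long util_avg;
	unsigned long util_guest;
	unsigned long ewma;
};

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* Patched task_util(): the guest hint is a floor on PELT util. */
static unsigned long task_util_model(const struct task_model *p)
{
	return max_ul(p->util_avg, p->util_guest);
}

/*
 * cpu_util_next()-style estimate of a CPU's util if @p is added to it
 * (add != 0) or removed from it (add == 0). Removal saturates at zero,
 * like lsub_positive().
 */
static unsigned long cpu_util_with_task(unsigned long cpu_util,
					const struct task_model *p, int add)
{
	unsigned long tu = task_util_model(p);

	if (add)
		return cpu_util + tu;
	return cpu_util > tu ? cpu_util - tu : 0;
}

/*
 * Dequeue-time util_est_update(), simplified: the sample is task_util(p).
 * Increases reset the EWMA; decreases are smoothed as
 * ewma = (3 * ewma + sample) / 4.
 */
static void util_est_update_model(struct task_model *p)
{
	unsigned long enqueued = task_util_model(p);

	if (p->ewma < enqueued) {
		p->ewma = enqueued;
		return;
	}
	p->ewma = ((p->ewma << UTIL_EST_WEIGHT_SHIFT) - p->ewma + enqueued)
		  >> UTIL_EST_WEIGHT_SHIFT;
}
```

With util_avg 100 and util_guest 700, moving the task onto a CPU at util 300 gives an estimate of 1000, and the first dequeue resets the EWMA to 700; if the hint then drops to 300, the EWMA only decays to 600, the same smoothing mainline applies to any task.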

Thanks,
David

> >
> > >> When you say in the cover letter that you tried uclamp_min, how exactly
> > >> did you use it? Did you run the existing mainline or did you use
> > >> uclamp_min as a replacement for util_guest in this patch here?
> >
> > > I called sched_setattr_nocheck() with .sched_flags =
> > > SCHED_FLAG_UTIL_CLAMP when updating uclamp_min, and uclamp_max is left
> > > at 1024. Uclamp_min was not aggregated with task_util and
> > > task_util_est during my testing. The only caveat there is that I added
> > > a change to only reset uclamp on fork when testing (I realize there is
> > > specifically a SCHED_FLAG_RESET_ON_FORK, but I didn’t want to reset
> > > other sched attributes).
> >
> > OK, understood. It's essentially a util_est v2 for vCPU tasks on host.
>
> Yup. We initially looked into just overwriting util_est, but didn't
> think that'll land well with the community :) as it was a bit messier
> because we needed to make sure the current util_est update paths don't
> run for vCPU tasks on host (because those values would be wrong).
>
> > >>>  static inline unsigned long task_util_est(struct task_struct *p)
> > >>> @@ -6242,6 +6244,15 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > >>>        */
> > >>>       util_est_enqueue(&rq->cfs, p);
> > >>>
> > >>> +     /*
> > >>> +      * The normal code path for host thread enqueue doesn't take into
> > >>> +      * account guest task migrations when updating cpufreq util.
> > >>> +      * So, always update the cpufreq when a vCPU thread has a
> > >>> +      * non-zero util_guest value.
> > >>> +      */
> > >>> +     if (READ_ONCE(p->se.avg.util_guest))
> > >>> +             cpufreq_update_util(rq, 0);
> > >>
> > >>
> > >> This is because enqueue_entity() -> update_load_avg() ->
> > >> attach_entity_load_avg() -> cfs_rq_util_change() requires root run-queue
> > >> (&rq->cfs == cfs_rq) to call cpufreq_update_util()?
> > > enqueue_entity() would not reach attach_entity_load_avg() from
> > > update_load_avg() due to the check for !se->avg.last_update_time;
> > > se->avg.last_update_time is non-zero because the vCPU task did not
> > > migrate before this enqueue.
> > > This enqueue path is reached when util_guest is updated for the vCPU
> > > task through the sched_setattr_nocheck call where we want to ensure a
> > > frequency update occurs.
> >
> > OK, vCPU tasks are pinned so always !WF_MIGRATED wakeup I guess?
>
> Even if say little-vCPU threads are allowed to migrate within little
> CPUs, this will still be an issue. While a vCPU thread is continuously
> running on a single CPU, a guest thread can migrate into that vCPU and
> cause a huge increase in util_guest. But that won't trigger a cpufreq
> update on the host side because the host doesn't see a task migration.
> That's what David is trying to address.
>
> -Saravana
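The point about uclamp not being additive (a vCPU task whose guest load is 512 sharing a host runqueue with an unrelated 512-util host task) can be illustrated with a toy comparison. The sketch assumes a single runqueue, ignores PELT dynamics, leaves uclamp_max at 1024, and uses hypothetical names throughout; it only contrasts max-aggregating a per-task uclamp_min at the runqueue with summing max(util_avg, util_guest) per task.

```c
#include <assert.h>
#include <stddef.h>

#define SCHED_CAPACITY_SCALE 1024UL

/* Hypothetical per-task view on one host runqueue. */
struct rq_task {
	unsigned long util_avg;		/* host-side PELT util */
	unsigned long uclamp_min;	/* 0 if unset */
	unsigned long util_guest;	/* 0 for ordinary host tasks */
};

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * uclamp-style: sum the PELT util, then raise it to the largest
 * uclamp_min among runnable tasks (max aggregation, not addition).
 */
static unsigned long rq_util_uclamp(const struct rq_task *t, size_t n)
{
	unsigned long sum = 0, floor = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		sum += t[i].util_avg;
		floor = max_ul(floor, t[i].uclamp_min);
	}
	return min_ul(max_ul(sum, floor), SCHED_CAPACITY_SCALE);
}

/* util_guest-style: each task contributes max(util_avg, util_guest). */
static unsigned long rq_util_guest(const struct rq_task *t, size_t n)
{
	unsigned long sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += max_ul(t[i].util_avg, t[i].util_guest);
	return min_ul(sum, SCHED_CAPACITY_SCALE);
}
```

With the vCPU's host-side util_avg near 0 (a guest thread has just migrated onto it), the uclamp version reports 512 while the util_guest version reports 1024, which is the frequency gap described above.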


Thread overview: 41+ messages
2023-03-30 22:43 [RFC PATCH 0/6] Improve VM DVFS and task placement behavior David Dai
2023-03-30 22:43 ` [RFC PATCH 1/6] sched/fair: Add util_guest for tasks David Dai
2023-04-03 11:40   ` Dietmar Eggemann
2023-04-04  1:11     ` David Dai
2023-04-05  8:29       ` Quentin Perret
2023-04-05 10:50       ` Dietmar Eggemann
2023-04-05 21:42         ` Saravana Kannan
2023-04-05 23:36           ` David Dai [this message]
2023-04-05  8:14   ` Peter Zijlstra
2023-04-05 22:54     ` David Dai
2023-04-06  7:33       ` Peter Zijlstra
2023-03-30 22:43 ` [RFC PATCH 2/6] kvm: arm64: Add support for get_cur_cpufreq service David Dai
2023-04-05  8:04   ` Quentin Perret
2023-03-30 22:43 ` [RFC PATCH 3/6] kvm: arm64: Add support for util_hint service David Dai
2023-03-30 22:43 ` [RFC PATCH 4/6] kvm: arm64: Add support for get_freqtbl service David Dai
2023-03-30 22:43 ` [RFC PATCH 5/6] dt-bindings: cpufreq: add bindings for virtual kvm cpufreq David Dai
2023-03-30 22:43 ` [RFC PATCH 6/6] cpufreq: add kvm-cpufreq driver David Dai
2023-04-05  8:22   ` Peter Zijlstra
2023-04-05 22:42     ` David Dai
2023-03-30 23:20 ` [RFC PATCH 0/6] Improve VM DVFS and task placement behavior Oliver Upton
2023-03-30 23:36   ` Saravana Kannan
2023-03-30 23:40     ` Oliver Upton
2023-03-31  0:34       ` Saravana Kannan
2023-03-31  0:49 ` Matthew Wilcox
2023-04-03 10:18   ` Mel Gorman
2023-04-04 19:43 ` Oliver Upton
2023-04-04 20:49   ` Marc Zyngier
2023-04-05  7:48     ` Quentin Perret
2023-04-05  8:33       ` Vincent Guittot
2023-04-05 21:07       ` Saravana Kannan
2023-04-06 12:52         ` Quentin Perret
2023-04-06 21:39           ` David Dai
2023-04-05 21:00     ` Saravana Kannan
2023-04-06  8:42       ` Marc Zyngier
2023-04-05  8:05 ` Peter Zijlstra
2023-04-05 21:08   ` Saravana Kannan
2023-04-06  7:36     ` Peter Zijlstra
2023-04-06  7:38     ` Peter Zijlstra
2023-04-27  7:46 ` Pavan Kondeti
2023-04-27  9:52   ` Gupta, Pankaj
2023-04-27 11:26     ` Pavan Kondeti
