From: David Dai <davidai@google.com>
To: Saravana Kannan <saravanak@google.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Daniel Bristot de Oliveira <bristot@redhat.com>,
Valentin Schneider <vschneid@redhat.com>,
kernel-team@android.com, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 1/6] sched/fair: Add util_guest for tasks
Date: Wed, 5 Apr 2023 16:36:04 -0700
Message-ID: <CABN1KC+BwYM1uYexL+RDcWzhSo-0n0yZHB_thpRdv4FiQNJr-g@mail.gmail.com>
In-Reply-To: <CAGETcx_3h9_+y91EfhDMk-gPdRLA3mhdiX2AksN6xHZha7U_mw@mail.gmail.com>
On Wed, Apr 5, 2023 at 2:43 PM Saravana Kannan <saravanak@google.com> wrote:
>
> On Wed, Apr 5, 2023 at 3:50 AM Dietmar Eggemann
> <dietmar.eggemann@arm.com> wrote:
> >
> > On 04/04/2023 03:11, David Dai wrote:
> > > On Mon, Apr 3, 2023 at 4:40 AM Dietmar Eggemann
> > > <dietmar.eggemann@arm.com> wrote:
> > >>
> > >> Hi David,
> > > Hi Dietmar, thanks for your comments.
> > >>
> > >> On 31/03/2023 00:43, David Dai wrote:
> >
> > [...]
> >
> > >>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > >>> index 6986ea31c984..998649554344 100644
> > >>> --- a/kernel/sched/fair.c
> > >>> +++ b/kernel/sched/fair.c
> > >>> @@ -4276,14 +4276,16 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf);
> > >>>
> > >>>  static inline unsigned long task_util(struct task_struct *p)
> > >>>  {
> > >>> -	return READ_ONCE(p->se.avg.util_avg);
> > >>> +	return max(READ_ONCE(p->se.avg.util_avg),
> > >>> +		   READ_ONCE(p->se.avg.util_guest));
> > >>>  }
> > >>>
> > >>>  static inline unsigned long _task_util_est(struct task_struct *p)
> > >>>  {
> > >>>  	struct util_est ue = READ_ONCE(p->se.avg.util_est);
> > >>>
> > >>> -	return max(ue.ewma, (ue.enqueued & ~UTIL_AVG_UNCHANGED));
> > >>> +	return max_t(unsigned long, READ_ONCE(p->se.avg.util_guest),
> > >>> +		     max(ue.ewma, (ue.enqueued & ~UTIL_AVG_UNCHANGED)));
> > >>>  }
> > >>
> > >> I can't see why the existing p->uclamp_req[UCLAMP_MIN].value can't be
> > >> used here instead of p->se.avg.util_guest.
> > > Using p->uclamp_req[UCLAMP_MIN].value would result in folding in
> > > uclamp values into task_util and task_util_est for all tasks that have
> > > uclamp values set. The intent of these patches isn’t to modify
> > > existing uclamp behaviour. Users would also override util values from
> > > the guest when they set uclamp values.
> > >>
> > >> I do understand the issue of inheriting uclamp values at fork but don't
> > >> get the not being `additive` thing. We are at task level here.
> >
> > > Uclamp values are max aggregated with other tasks at the runqueue
> > > level when deciding CPU frequency. For example, a vCPU runqueue may
> > > have a util of 512 that results in setting uclamp_min to 512 on the
> > > vCPU task. This is insufficient to drive a frequency response if it
> > > shares the runqueue with another host task running with a util of 512,
> > > as it would result in a clamped util value of 512 at the runqueue (e.g.
> > > if a guest thread had just migrated onto this vCPU).
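The arithmetic behind this can be sketched with a small userspace model (not kernel code). It assumes a simplified runqueue whose utilization is the sum of its tasks' util_avg, clamped from below by the largest uclamp_min among them (uclamp's max aggregation, with uclamp_max left at 1024), and compares that with folding util_guest into each task before summing. The struct and function names are hypothetical, and the vCPU's host-side util_avg of 64 is only an illustrative value for a thread whose guest workload has just migrated in.

```c
#include <assert.h>
#include <stddef.h>

#define SCHED_CAPACITY_SCALE 1024UL

/* Hypothetical per-task view, for illustration only. */
struct model_task {
	unsigned long util_avg;   /* host-side PELT utilization */
	unsigned long util_guest; /* utilization requested by the guest */
	unsigned long uclamp_min; /* requested minimum utilization clamp */
};

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* uclamp: sum util_avg, then clamp from below by the max uclamp_min. */
static unsigned long rq_util_uclamp(const struct model_task *t, size_t n)
{
	unsigned long sum = 0, rq_uclamp_min = 0;

	for (size_t i = 0; i < n; i++) {
		sum += t[i].util_avg;
		rq_uclamp_min = max_ul(rq_uclamp_min, t[i].uclamp_min);
	}
	return min_ul(max_ul(sum, rq_uclamp_min), SCHED_CAPACITY_SCALE);
}

/* util_guest: each task contributes max(util_avg, util_guest) to the sum. */
static unsigned long rq_util_guest(const struct model_task *t, size_t n)
{
	unsigned long sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += max_ul(t[i].util_avg, t[i].util_guest);
	return min_ul(sum, SCHED_CAPACITY_SCALE);
}
```

With a vCPU task { util_avg 64, util_guest 512, uclamp_min 512 } sharing the runqueue with a host task { util_avg 512 }, rq_util_uclamp() yields 576 (the 512 floor is already exceeded by the plain sum), while rq_util_guest() yields 1024, which is the additive behaviour being asked for.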
> >
> > OK, see your point now. You want an accurate per-task boost for this
> > vCPU task on the host run-queue.
> > And a scenario in which a vCPU can ask for 100% in these moments is not
> > sufficient I guess? In this case uclamp_min could work.
>
> Right. vCPU can have whatever utilization and there can be random host
> threads completely unrelated to the VM. And we need to aggregate both
> of their util when deciding CPU freq.
>
> >
> > >> The fact that you have to max util_avg and util_est directly in
> > >> task_util() and _task_util_est() tells me that there are places where
> > >> this helps and uclamp_task_util() is not called there.
> > > Can you clarify this point a bit more?
> >
> > Sorry, I meant s/util_est/util_guest/.
> >
> > The effect of the change in _task_util_est() you see via:
> >
> >   enqueue_task_fair()
> >     util_est_enqueue()
> >       cfs_rq->avg.util_est.enqueued += _task_util_est(p)
> >
> > so that `sugov_get_util() -> cpu_util_cfs() ->
> > cfs_rq->avg.util_est.enqueued` can see the effect of util_guest?
That sequence looks correct to me.
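A simplified userspace model of that sequence, using hypothetical stand-in structs rather than the kernel's: util_est_enqueue() adds the patched _task_util_est(p) to the root cfs_rq's util_est.enqueued, and cpu_util_cfs() then reports max(util_avg, util_est.enqueued) capped at the CPU's original capacity. It assumes the UTIL_EST scheduler feature is enabled and that UTIL_AVG_UNCHANGED is the top-bit flag used in recent kernels; READ_ONCE()/WRITE_ONCE() are omitted.

```c
#include <assert.h>

/* Assumed to mirror the flag bit kept in util_est.enqueued. */
#define UTIL_AVG_UNCHANGED 0x80000000U

struct model_util_est {
	unsigned int enqueued;
	unsigned int ewma;
};

/* Hypothetical stand-in for the relevant parts of p->se.avg. */
struct model_task_avg {
	unsigned long util_guest; /* field added by the patch */
	struct model_util_est util_est;
};

/* Hypothetical stand-in for the root cfs_rq's averages. */
struct model_cfs_rq {
	unsigned long util_avg;
	unsigned int util_est_enqueued;
};

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Patched _task_util_est(): util_guest takes part in the max. */
static unsigned long model_task_util_est(const struct model_task_avg *p)
{
	unsigned long est = max_ul(p->util_est.ewma,
				   p->util_est.enqueued & ~UTIL_AVG_UNCHANGED);

	return max_ul(p->util_guest, est);
}

/* util_est_enqueue(): accumulate the task's estimate on the cfs_rq. */
static void model_util_est_enqueue(struct model_cfs_rq *cfs_rq,
				   const struct model_task_avg *p)
{
	cfs_rq->util_est_enqueued += model_task_util_est(p);
}

/* cpu_util_cfs() with UTIL_EST enabled, capped at the CPU's capacity. */
static unsigned long model_cpu_util_cfs(const struct model_cfs_rq *cfs_rq,
					unsigned long capacity_orig)
{
	unsigned long util = max_ul(cfs_rq->util_avg, cfs_rq->util_est_enqueued);

	return min_ul(util, capacity_orig);
}
```

For a vCPU task whose own estimate is only 80 but whose util_guest is 600, the root cfs_rq's util_est.enqueued grows by 600, so schedutil sees at least 600 as soon as the task is enqueued.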
> >
> > Not sure about the change in task_util() yet.
task_util() provides some signaling in addition to task_util_est() via:

find_energy_efficient_cpu()
  cpu_util_next()
    lsub_positive(&util, task_util(p));
    ...
    util += task_util(p);
    // Can provide a better signal than util_est.

dequeue_task_fair()
  util_est_update()
    ue.enqueued = task_util(p);
    // Updates ue.ewma
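The cpu_util_next() use can be modelled as follows. This is a simplified sketch of cpu_util_next() as it looked in the v6.2-era fair.c this RFC appears to be based on (an assumption), with hypothetical stand-in structs, the UTIL_EST feature assumed enabled, and the task's util_est field standing in for the patched _task_util_est(p). It shows that task_util(p), now max(util_avg, util_guest), is the amount moved between CPUs when EAS evaluates a candidate placement.

```c
#include <assert.h>

/* Hypothetical task view used only for this model. */
struct model_task {
	int cpu;                  /* task_cpu(p), the CPU it last ran on */
	unsigned long util_avg;
	unsigned long util_guest; /* field added by the patch */
	unsigned long util_est;   /* stands in for _task_util_est(p) */
};

/* Hypothetical per-CPU view of the root cfs_rq. */
struct model_cpu {
	int id;
	unsigned long util_avg;   /* cfs_rq->avg.util_avg */
	unsigned long util_est;   /* cfs_rq->avg.util_est.enqueued */
	unsigned long capacity;   /* capacity_orig_of(cpu) */
};

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Patched task_util(). */
static unsigned long model_task_util(const struct model_task *p)
{
	return max_ul(p->util_avg, p->util_guest);
}

/* lsub_positive(): subtract, saturating at zero. */
static void model_lsub_positive(unsigned long *v, unsigned long d)
{
	*v -= min_ul(*v, d);
}

/* Predicted utilization of @cpu if @p is placed on @dst_cpu. */
static unsigned long model_cpu_util_next(const struct model_cpu *cpu,
					 const struct model_task *p, int dst_cpu)
{
	unsigned long util = cpu->util_avg;
	unsigned long util_est = cpu->util_est;

	if (p->cpu == cpu->id && dst_cpu != cpu->id)
		model_lsub_positive(&util, model_task_util(p));
	else if (p->cpu != cpu->id && dst_cpu == cpu->id)
		util += model_task_util(p);

	/* At wakeup p is not enqueued anywhere yet, so add its estimate. */
	if (dst_cpu == cpu->id)
		util_est += p->util_est;

	return min_ul(max_ul(util, util_est), cpu->capacity);
}
```

For example, with util_avg = 100 and util_guest = 400, moving the task off a CPU whose util_avg is 700 leaves 300 there and adds 400 on the destination; without util_guest only 100 would move.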
Thanks,
David
> >
> > >> When you say in the cover letter that you tried uclamp_min, how exactly
> > >> did you use it? Did you run the existing mainline or did you use
> > >> uclamp_min as a replacement for util_guest in this patch here?
> >
> > > I called sched_setattr_nocheck() with .sched_flags =
> > > SCHED_FLAG_UTIL_CLAMP when updating uclamp_min, and uclamp_max is left
> > > at 1024. Uclamp_min was not aggregated with task_util and
> > > task_util_est during my testing. The only caveat is that I added
> > > a change to only reset uclamp on fork when testing (I realize there is
> > > specifically a SCHED_FLAG_RESET_ON_FORK, but I didn’t want to reset
> > > other sched attributes).
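For anyone wanting to repeat the uclamp_min comparison without in-kernel changes, a roughly equivalent request can be issued from userspace through sched_setattr(2). This is a sketch, not the code used in the testing described above (which called sched_setattr_nocheck() in the kernel and also changed uclamp's fork behaviour). It assumes a kernel built with CONFIG_UCLAMP_TASK, declares a local copy of the uapi struct sched_attr (VER1 layout, 56 bytes) and the flag values because older libc headers do not provide them, and sets SCHED_FLAG_KEEP_POLICY | SCHED_FLAG_KEEP_PARAMS so only the clamps change. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Local copy of the uapi struct sched_attr (SCHED_ATTR_SIZE_VER1 layout). */
struct uclamp_sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;
	uint64_t sched_deadline;
	uint64_t sched_period;
	uint32_t sched_util_min;
	uint32_t sched_util_max;
};

/* Values of SCHED_FLAG_* from include/uapi/linux/sched.h. */
#define FLAG_KEEP_POLICY    0x08ULL
#define FLAG_KEEP_PARAMS    0x10ULL
#define FLAG_UTIL_CLAMP_MIN 0x20ULL
#define FLAG_UTIL_CLAMP_MAX 0x40ULL

/* Build a request that changes only the clamps: min as given, max at 1024. */
static struct uclamp_sched_attr make_uclamp_min_attr(uint32_t util_min)
{
	struct uclamp_sched_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.sched_flags = FLAG_KEEP_POLICY | FLAG_KEEP_PARAMS |
			   FLAG_UTIL_CLAMP_MIN | FLAG_UTIL_CLAMP_MAX;
	attr.sched_util_min = util_min;
	attr.sched_util_max = 1024;
	return attr;
}

/* Returns 0 on success, or -1 with errno set on failure. */
static int set_uclamp_min(pid_t tid, uint32_t util_min)
{
	struct uclamp_sched_attr attr = make_uclamp_min_attr(util_min);

	return (int)syscall(SYS_sched_setattr, tid, &attr, 0U);
}
```

A host-side VMM thread would call something like set_uclamp_min(vcpu_tid, 512) (vcpu_tid being hypothetical) whenever the guest's reported utilization changes.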
> >
> > OK, understood. It's essentially a util_est v2 for vCPU tasks on host.
>
> Yup. We initially looked into just overwriting util_est, but didn't
> think that'll land well with the community :) as it was a bit messier
> because we needed to make sure the current util_est update paths don't
> run for vCPU tasks on host (because those values would be wrong).
>
> > >>> static inline unsigned long task_util_est(struct task_struct *p)
> > >>> @@ -6242,6 +6244,15 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > >>>  	 */
> > >>>  	util_est_enqueue(&rq->cfs, p);
> > >>>
> > >>> +	/*
> > >>> +	 * The normal code path for host thread enqueue doesn't take into
> > >>> +	 * account guest task migrations when updating cpufreq util.
> > >>> +	 * So, always update the cpufreq when a vCPU thread has a
> > >>> +	 * non-zero util_guest value.
> > >>> +	 */
> > >>> +	if (READ_ONCE(p->se.avg.util_guest))
> > >>> +		cpufreq_update_util(rq, 0);
> > >>
> > >>
> > >> This is because enqueue_entity() -> update_load_avg() ->
> > >> attach_entity_load_avg() -> cfs_rq_util_change() requires root run-queue
> > >> (&rq->cfs == cfs_rq) to call cpufreq_update_util()?
> > > enqueue_entity() -> update_load_avg() would not call into
> > > attach_entity_load_avg() due to the check for
> > > !se->avg.last_update_time. se->avg.last_update_time is
> > > non-zero because the vCPU task did not migrate before this enqueue.
> > > This enqueue path is reached when util_guest is updated for the vCPU
> > > task through the sched_setattr_nocheck call where we want to ensure a
> > > frequency update occurs.
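A compact way to see the gap: on this enqueue, the existing code only reaches cpufreq_update_util() via cfs_rq_util_change(), which update_load_avg() calls either when attaching an entity whose last_update_time is zero or when the PELT sums have decayed, and which only kicks cpufreq for the root cfs_rq. The sketch below models that decision plus the patch's extra hook. It is a simplification of the v6.2-era code (an assumption about the base tree), and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of what update_load_avg() looks at on enqueue. */
struct model_enqueue {
	uint64_t last_update_time; /* se->avg.last_update_time */
	bool decayed;              /* PELT sums crossed a period boundary */
	bool root_cfs_rq;          /* &rq->cfs == cfs_rq */
	unsigned long util_guest;  /* field added by the patch */
};

/* Existing path: does update_load_avg() end in cpufreq_update_util()? */
static bool model_pelt_kicks_cpufreq(const struct model_enqueue *e)
{
	/* Attach branch (last_update_time == 0, DO_ATTACH set) or decayed branch. */
	bool util_change = e->last_update_time == 0 || e->decayed;

	/* cfs_rq_util_change() only calls cpufreq_update_util() for the root cfs_rq. */
	return util_change && e->root_cfs_rq;
}

/* Patched enqueue_task_fair(): a non-zero util_guest always kicks cpufreq. */
static bool model_enqueue_kicks_cpufreq(const struct model_enqueue *e)
{
	return model_pelt_kicks_cpufreq(e) || e->util_guest != 0;
}
```

In the sched_setattr_nocheck() case described above, the vCPU task has a non-zero last_update_time, so without a PELT decay only the new util_guest check triggers a frequency update.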
> >
> > OK, vCPU tasks are pinned so always !WF_MIGRATED wakeup I guess?
>
> Even if, say, little-vCPU threads are allowed to migrate within little
> CPUs, this will still be an issue. While a vCPU thread is continuously
> running on a single CPU, a guest thread can migrate into that vCPU and
> cause a huge increase in util_guest. But that won't trigger a cpufreq
> update on the host side because the host doesn't see a task migration.
> That's what David is trying to address.
>
> -Saravana