From: Quentin Perret <quentin.perret@arm.com>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Saravana Kannan <skannan@codeaurora.org>,
Peter Zijlstra <peterz@infradead.org>,
"Rafael J. Wysocki" <rjw@rjwysocki.net>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Linux PM <linux-pm@vger.kernel.org>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Ingo Molnar <mingo@redhat.com>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Morten Rasmussen <morten.rasmussen@arm.com>,
Chris Redpath <chris.redpath@arm.com>,
Patrick Bellasi <patrick.bellasi@arm.com>,
Valentin Schneider <valentin.schneider@arm.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Thara Gopinath <thara.gopinath@linaro.org>,
Viresh Kumar <viresh.kumar@linaro.org>,
Todd Kjos <tkjos@google.com>,
Joel Fernandes <joel@joelfernandes.org>,
Steve Muckle <smuckle@google.com>,
adharmap@quicinc.com, skannan@quicinc.com,
Pavan Kondeti <pkondeti@codeaurora.org>,
Juri Lelli <juri.lelli@redhat.com>,
Eduardo Valentin <edubezval@gmail.com>,
Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>,
currojerez@riseup.net, Javi Merino <javi.merino@kernel.org>,
linux-pm-owner@vger.kernel.org
Subject: Re: [PATCH v5 10/14] sched/cpufreq: Refactor the utilization aggregation method
Date: Wed, 1 Aug 2018 09:23:55 +0100
Message-ID: <20180801082353.egym4tsbr7ppql27@queper01-lin>
In-Reply-To: <CAJZ5v0j=EYnANGAj9bd44eeux1eCfeMtdn9npe5pSAzE8EVKaA@mail.gmail.com>
On Wednesday 01 Aug 2018 at 09:32:49 (+0200), Rafael J. Wysocki wrote:
> On Tue, Jul 31, 2018 at 9:31 PM, <skannan@codeaurora.org> wrote:
> > On 2018-07-31 00:59, Quentin Perret wrote:
> >>
> >> On Monday 30 Jul 2018 at 12:35:27 (-0700), skannan@codeaurora.org wrote:
> >> [...]
> >>>
> >>> If it's going to be a different aggregation from what's done for
> >>> frequency
> >>> guidance, I don't see the point of having this inside schedutil. Why not
> >>> keep it inside the scheduler files?
> >>
> >>
> >> This code basically results from a discussion we had with Peter on v4.
> >> Keeping everything centralized can make sense from a maintenance
> >> perspective, I think. That makes it easy to see the impact of any change
> >> to utilization signals for both EAS and schedutil.
> >
> >
> > In that case, I'd argue it makes more sense to keep the code centralized in
> > the scheduler. The scheduler can let schedutil know about the utilization
> > after it aggregates them. There's no need for a cpufreq governor to know
> > that there are scheduling classes or how many there are. And the scheduler
> > can then choose to aggregate one way for task packing and another way for
> > frequency guidance.
>
> Also the aggregate utilization may be used by cpuidle governors in
> principle to decide how deep they can go with idle state selection.

The only issue I see with this right now is that some of the things done
in this function are policy decisions which really belong to the governor,
I think. The RT-go-to-max-freq thing in particular. And I really don't
think EAS should have to cope with that, at least for now.

But if this specific bit is factored out of the aggregation function, I
suppose we could move it somewhere else. Maybe pelt.c?

How ugly is something like the below (totally untested) code? It would
slightly change how we deal with DL utilization in EAS, but I don't think
this is an issue.

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index af86050edcf5..51c9ac9f30e8 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -178,121 +178,17 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
return cpufreq_driver_resolve_freq(policy, freq);
}
-/*
- * This function computes an effective utilization for the given CPU, to be
- * used for frequency selection given the linear relation: f = u * f_max.
- *
- * The scheduler tracks the following metrics:
- *
- * cpu_util_{cfs,rt,dl,irq}()
- * cpu_bw_dl()
- *
- * Where the cfs,rt and dl util numbers are tracked with the same metric and
- * synchronized windows and are thus directly comparable.
- *
- * The cfs,rt,dl utilization are the running times measured with rq->clock_task
- * which excludes things like IRQ and steal-time. These latter are then accrued
- * in the irq utilization.
- *
- * The DL bandwidth number otoh is not a measured metric but a value computed
- * based on the task model parameters and gives the minimal utilization
- * required to meet deadlines.
- */
-unsigned long schedutil_freq_util(int cpu, unsigned long util_cfs,
- enum schedutil_type type)
-{
- struct rq *rq = cpu_rq(cpu);
- unsigned long util, irq, max;
-
- max = arch_scale_cpu_capacity(NULL, cpu);
-
- if (type == frequency_util && rt_rq_is_runnable(&rq->rt))
- return max;
-
- /*
- * Early check to see if IRQ/steal time saturates the CPU, can be
- * because of inaccuracies in how we track these -- see
- * update_irq_load_avg().
- */
- irq = cpu_util_irq(rq);
- if (unlikely(irq >= max))
- return max;
-
- /*
- * Because the time spend on RT/DL tasks is visible as 'lost' time to
- * CFS tasks and we use the same metric to track the effective
- * utilization (PELT windows are synchronized) we can directly add them
- * to obtain the CPU's actual utilization.
- */
- util = util_cfs;
- util += cpu_util_rt(rq);
-
- if (type == frequency_util) {
- /*
- * For frequency selection we do not make cpu_util_dl() a
- * permanent part of this sum because we want to use
- * cpu_bw_dl() later on, but we need to check if the
- * CFS+RT+DL sum is saturated (ie. no idle time) such
- * that we select f_max when there is no idle time.
- *
- * NOTE: numerical errors or stop class might cause us
- * to not quite hit saturation when we should --
- * something for later.
- */
-
- if ((util + cpu_util_dl(rq)) >= max)
- return max;
- } else {
- /*
- * OTOH, for energy computation we need the estimated
- * running time, so include util_dl and ignore dl_bw.
- */
- util += cpu_util_dl(rq);
- if (util >= max)
- return max;
- }
-
- /*
- * There is still idle time; further improve the number by using the
- * irq metric. Because IRQ/steal time is hidden from the task clock we
- * need to scale the task numbers:
- *
- *              1 - irq
- *   U' = irq + ------- * U
- *                max
- */
- util *= (max - irq);
- util /= max;
- util += irq;
-
- if (type == frequency_util) {
- /*
- * Bandwidth required by DEADLINE must always be granted
- * while, for FAIR and RT, we use blocked utilization of
- * IDLE CPUs as a mechanism to gracefully reduce the
- * frequency when no tasks show up for longer periods of
- * time.
- *
- * Ideally we would like to set bw_dl as min/guaranteed
- * freq and util + bw_dl as requested freq. However,
- * cpufreq is not yet ready for such an interface. So,
- * we only do the latter for now.
- */
- util += cpu_bw_dl(rq);
- }
-
- return min(max, util);
-}
-
static unsigned long sugov_get_util(struct sugov_cpu *sg_cpu)
{
struct rq *rq = cpu_rq(sg_cpu->cpu);
- unsigned long util = cpu_util_cfs(rq);
sg_cpu->max = arch_scale_cpu_capacity(NULL, sg_cpu->cpu);
sg_cpu->bw_dl = cpu_bw_dl(rq);
- return schedutil_freq_util(sg_cpu->cpu, util, frequency_util);
+ if (rt_rq_is_runnable(&rq->rt))
+ return sg_cpu->max;
+
+ return cpu_util_total(sg_cpu->cpu, cpu_util_cfs(rq));
}
/**
diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
index 35475c0c5419..5f99bd564dfc 100644
--- a/kernel/sched/pelt.c
+++ b/kernel/sched/pelt.c
@@ -397,3 +397,77 @@ int update_irq_load_avg(struct rq *rq, u64 running)
return ret;
}
#endif
+
+/*
+ * This function computes an effective utilization for the given CPU, to be
+ * used for frequency selection given the linear relation: f = u * f_max.
+ *
+ * The scheduler tracks the following metrics:
+ *
+ * cpu_util_{cfs,rt,dl,irq}()
+ * cpu_bw_dl()
+ *
+ * Where the cfs,rt and dl util numbers are tracked with the same metric and
+ * synchronized windows and are thus directly comparable.
+ *
+ * The cfs,rt,dl utilization are the running times measured with rq->clock_task
+ * which excludes things like IRQ and steal-time. These latter are then accrued
+ * in the irq utilization.
+ *
+ * The DL bandwidth number otoh is not a measured metric but a value computed
+ * based on the task model parameters and gives the minimal utilization
+ * required to meet deadlines.
+ */
+unsigned long cpu_util_total(int cpu, unsigned long util_cfs)
+{
+ struct rq *rq = cpu_rq(cpu);
+ unsigned long util, irq, max;
+
+ max = arch_scale_cpu_capacity(NULL, cpu);
+
+ /*
+ * Early check to see if IRQ/steal time saturates the CPU, can be
+ * because of inaccuracies in how we track these -- see
+ * update_irq_load_avg().
+ */
+ irq = cpu_util_irq(rq);
+ if (unlikely(irq >= max))
+ return max;
+
+ /*
+ * Because the time spent on RT/DL tasks is visible as 'lost' time to
+ * CFS tasks and we use the same metric to track the effective
+ * utilization (PELT windows are synchronized) we can directly add them
+ * to obtain the CPU's actual utilization.
+ */
+ util = util_cfs;
+ util += cpu_util_rt(rq);
+
+ if ((util + cpu_util_dl(rq)) >= max)
+ return max;
+
+ /*
+ * There is still idle time; further improve the number by using the
+ * irq metric. Because IRQ/steal time is hidden from the task clock we
+ * need to scale the task numbers:
+ *
+ *              max - irq
+ *   U' = irq + --------- * U
+ *                 max
+ */
+ util *= (max - irq);
+ util /= max;
+ util += irq;
+
+ /*
+ * Bandwidth required by DEADLINE must always be granted while, for
+ * FAIR and RT, we use blocked utilization of IDLE CPUs as a mechanism
+ * to gracefully reduce the frequency when no tasks show up for longer
+ * periods of time.
+ *
+ * Ideally we would like to set bw_dl as min/guaranteed freq and util +
+ * bw_dl as requested freq. However, cpufreq is not yet ready for such
+ * an interface. So, we only do the latter for now.
+ */
+ return min(max, util + cpu_bw_dl(rq));
+}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 51e7f113ee23..7ad037bb653e 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2185,14 +2185,9 @@ static inline void cpufreq_update_util(struct rq *rq, unsigned int flags) {}
# define arch_scale_freq_invariant() false
#endif
-enum schedutil_type {
- frequency_util,
- energy_util,
-};
-#ifdef CONFIG_CPU_FREQ_GOV_SCHEDUTIL
-unsigned long schedutil_freq_util(int cpu, unsigned long util_cfs,
- enum schedutil_type type);
+#ifdef CONFIG_SMP
+unsigned long cpu_util_total(int cpu, unsigned long util_cfs);
static inline unsigned long cpu_bw_dl(struct rq *rq)
{
@@ -2233,12 +2228,6 @@ static inline unsigned long cpu_util_irq(struct rq *rq)
}
#endif
-#else /* CONFIG_CPU_FREQ_GOV_SCHEDUTIL */
-static inline unsigned long schedutil_freq_util(int cpu, unsigned long util,
- enum schedutil_type type)
-{
- return util;
-}
#endif
#ifdef CONFIG_SMP
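
To make the proposed split easier to reason about, here is a minimal userspace sketch of the arithmetic in the diff above. It is not kernel code: struct cpu_signals and the model_*() names are hypothetical stand-ins for the values that cpu_util_{cfs,rt,dl,irq}(), cpu_bw_dl(), arch_scale_cpu_capacity() and rt_rq_is_runnable() would return, and it only assumes that the aggregation behaves as written in the untested patch.

```c
/*
 * Hypothetical userspace model of the proposal: the aggregation helper is
 * policy-free, and the RT-go-to-max decision stays in the governor.
 */
struct cpu_signals {
	unsigned long max;      /* CPU capacity, e.g. 1024 */
	unsigned long util_cfs; /* CFS utilization */
	unsigned long util_rt;  /* RT utilization */
	unsigned long util_dl;  /* measured DL running time */
	unsigned long bw_dl;    /* DL bandwidth from task parameters */
	unsigned long irq;      /* IRQ/steal-time utilization */
	int rt_runnable;        /* non-zero if an RT task is runnable */
};

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Follows the steps of cpu_util_total() from the diff. */
static unsigned long model_util_total(const struct cpu_signals *s)
{
	unsigned long util;

	/* IRQ/steal time alone saturates the CPU. */
	if (s->irq >= s->max)
		return s->max;

	/* CFS and RT use the same metric, so they can be summed. */
	util = s->util_cfs + s->util_rt;

	/* No idle time left once measured DL time is included. */
	if (util + s->util_dl >= s->max)
		return s->max;

	/* Scale the task numbers: U' = irq + (max - irq) / max * U */
	util = util * (s->max - s->irq) / s->max;
	util += s->irq;

	/* DL bandwidth must always be granted on top. */
	return min_ul(s->max, util + s->bw_dl);
}

/* Governor-side policy, as in the reworked sugov_get_util(). */
static unsigned long model_sugov_get_util(const struct cpu_signals *s)
{
	if (s->rt_runnable)
		return s->max;

	return model_util_total(s);
}
```

For example, with max = 1024, util_cfs = 300, util_rt = 100, util_dl = 50, bw_dl = 64 and irq = 128, the CFS+RT sum of 400 is scaled to 400 * 896 / 1024 = 350, the irq term brings it to 478, and the DL bandwidth brings the result to 542. With an RT task runnable, only the governor wrapper returns 1024; the aggregation helper still reports 542, which is the value EAS would see under this proposal.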