LKML Archive on lore.kernel.org
 help / color / Atom feed
From: Quentin Perret <quentin.perret@arm.com>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Saravana Kannan <skannan@codeaurora.org>,
	Peter Zijlstra <peterz@infradead.org>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux PM <linux-pm@vger.kernel.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Ingo Molnar <mingo@redhat.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Chris Redpath <chris.redpath@arm.com>,
	Patrick Bellasi <patrick.bellasi@arm.com>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Thara Gopinath <thara.gopinath@linaro.org>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Todd Kjos <tkjos@google.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	Steve Muckle <smuckle@google.com>,
	adharmap@quicinc.com, skannan@quicinc.com,
	Pavan Kondeti <pkondeti@codeaurora.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Eduardo Valentin <edubezval@gmail.com>,
	Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>,
	currojerez@riseup.net, Javi Merino <javi.merino@kernel.org>,
	linux-pm-owner@vger.kernel.org
Subject: Re: [PATCH v5 10/14] sched/cpufreq: Refactor the utilization aggregation method
Date: Wed, 1 Aug 2018 09:23:55 +0100
Message-ID: <20180801082353.egym4tsbr7ppql27@queper01-lin> (raw)
In-Reply-To: <CAJZ5v0j=EYnANGAj9bd44eeux1eCfeMtdn9npe5pSAzE8EVKaA@mail.gmail.com>

On Wednesday 01 Aug 2018 at 09:32:49 (+0200), Rafael J. Wysocki wrote:
> On Tue, Jul 31, 2018 at 9:31 PM,  <skannan@codeaurora.org> wrote:
> > On 2018-07-31 00:59, Quentin Perret wrote:
> >>
> >> On Monday 30 Jul 2018 at 12:35:27 (-0700), skannan@codeaurora.org wrote:
> >> [...]
> >>>
> >>> If it's going to be a different aggregation from what's done for
> >>> frequency
> >>> guidance, I don't see the point of having this inside schedutil. Why not
> >>> keep it inside the scheduler files?
> >>
> >>
> >> This code basically results from a discussion we had with Peter on v4.
> >> Keeping everything centralized can make sense from a maintenance
> >> perspective, I think. That makes it easy to see the impact of any change
> >> to utilization signals for both EAS and schedutil.
> >
> >
> > In that case, I'd argue it makes more sense to keep the code centralized in
> > the scheduler. The scheduler can let schedutil know about the utilization
> > after it aggregates them. There's no need for a cpufreq governor to know
> > that there are scheduling classes or how many there are. And the scheduler
> > can then choose to aggregate one way for task packing and another way for
> > frequency guidance.
> 
> Also the aggregate utilization may be used by cpuidle governors in
> principle to decide how deep they can go with idle state selection.

The only issue I see with this right now is that some of the things done
in this function are policy decisions which really belong to the governor,
I think. The RT-go-to-max-freq thing in particular. And I really don't
think EAS should cope with that, at least for now.

But if this specific bit is factored out of the aggregation function, I
suppose we could move it somewhere else. Maybe pelt.c ?

How ugly is something like the below (totally untested) code ? It would
change slightly how we deal with DL utilization in EAS but I don't think
this is an issue.

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index af86050edcf5..51c9ac9f30e8 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -178,121 +178,17 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
        return cpufreq_driver_resolve_freq(policy, freq);
 }

-/*
- * This function computes an effective utilization for the given CPU, to be
- * used for frequency selection given the linear relation: f = u * f_max.
- *
- * The scheduler tracks the following metrics:
- *
- *   cpu_util_{cfs,rt,dl,irq}()
- *   cpu_bw_dl()
- *
- * Where the cfs,rt and dl util numbers are tracked with the same metric and
- * synchronized windows and are thus directly comparable.
- *
- * The cfs,rt,dl utilization are the running times measured with rq->clock_task
- * which excludes things like IRQ and steal-time. These latter are then accrued
- * in the irq utilization.
- *
- * The DL bandwidth number otoh is not a measured metric but a value computed
- * based on the task model parameters and gives the minimal utilization
- * required to meet deadlines.
- */
-unsigned long schedutil_freq_util(int cpu, unsigned long util_cfs,
-                                 enum schedutil_type type)
-{
-       struct rq *rq = cpu_rq(cpu);
-       unsigned long util, irq, max;
-
-       max = arch_scale_cpu_capacity(NULL, cpu);
-
-       if (type == frequency_util && rt_rq_is_runnable(&rq->rt))
-               return max;
-
-       /*
-        * Early check to see if IRQ/steal time saturates the CPU, can be
-        * because of inaccuracies in how we track these -- see
-        * update_irq_load_avg().
-        */
-       irq = cpu_util_irq(rq);
-       if (unlikely(irq >= max))
-               return max;
-
-       /*
-        * Because the time spend on RT/DL tasks is visible as 'lost' time to
-        * CFS tasks and we use the same metric to track the effective
-        * utilization (PELT windows are synchronized) we can directly add them
-        * to obtain the CPU's actual utilization.
-        */
-       util = util_cfs;
-       util += cpu_util_rt(rq);
-
-       if (type == frequency_util) {
-               /*
-                * For frequency selection we do not make cpu_util_dl() a
-                * permanent part of this sum because we want to use
-                * cpu_bw_dl() later on, but we need to check if the
-                * CFS+RT+DL sum is saturated (ie. no idle time) such
-                * that we select f_max when there is no idle time.
-                *
-                * NOTE: numerical errors or stop class might cause us
-                * to not quite hit saturation when we should --
-                * something for later.
-                */
-
-               if ((util + cpu_util_dl(rq)) >= max)
-                       return max;
-       } else {
-               /*
-                * OTOH, for energy computation we need the estimated
-                * running time, so include util_dl and ignore dl_bw.
-                */
-               util += cpu_util_dl(rq);
-               if (util >= max)
-                       return max;
-       }
-
-       /*
-        * There is still idle time; further improve the number by using the
-        * irq metric. Because IRQ/steal time is hidden from the task clock we
-        * need to scale the task numbers:
-        *
-        *              1 - irq
-        *   U' = irq + ------- * U
-        *                max
-        */
-       util *= (max - irq);
-       util /= max;
-       util += irq;
-
-       if (type == frequency_util) {
-               /*
-                * Bandwidth required by DEADLINE must always be granted
-                * while, for FAIR and RT, we use blocked utilization of
-                * IDLE CPUs as a mechanism to gracefully reduce the
-                * frequency when no tasks show up for longer periods of
-                * time.
-                *
-                * Ideally we would like to set bw_dl as min/guaranteed
-                * freq and util + bw_dl as requested freq. However,
-                * cpufreq is not yet ready for such an interface. So,
-                * we only do the latter for now.
-                */
-               util += cpu_bw_dl(rq);
-       }
-
-       return min(max, util);
-}
-
 static unsigned long sugov_get_util(struct sugov_cpu *sg_cpu)
 {
        struct rq *rq = cpu_rq(sg_cpu->cpu);
-       unsigned long util = cpu_util_cfs(rq);

        sg_cpu->max = arch_scale_cpu_capacity(NULL, sg_cpu->cpu);
        sg_cpu->bw_dl = cpu_bw_dl(rq);

-       return schedutil_freq_util(sg_cpu->cpu, util, frequency_util);
+       if (rt_rq_is_runnable(&rq->rt))
+               return sg_cpu->max;
+
+       return cpu_util_total(sg_cpu->cpu, cpu_util_cfs(rq));
 }

 /**
diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
index 35475c0c5419..5f99bd564dfc 100644
--- a/kernel/sched/pelt.c
+++ b/kernel/sched/pelt.c
@@ -397,3 +397,77 @@ int update_irq_load_avg(struct rq *rq, u64 running)
        return ret;
 }
 #endif
+
+/*
+ * This function computes an effective utilization for the given CPU, to be
+ * used for frequency selection given the linear relation: f = u * f_max.
+ *
+ * The scheduler tracks the following metrics:
+ *
+ *   cpu_util_{cfs,rt,dl,irq}()
+ *   cpu_bw_dl()
+ *
+ * Where the cfs,rt and dl util numbers are tracked with the same metric and
+ * synchronized windows and are thus directly comparable.
+ *
+ * The cfs,rt,dl utilization are the running times measured with rq->clock_task
+ * which excludes things like IRQ and steal-time. These latter are then accrued
+ * in the irq utilization.
+ *
+ * The DL bandwidth number otoh is not a measured metric but a value computed
+ * based on the task model parameters and gives the minimal utilization
+ * required to meet deadlines.
+ */
+unsigned long cpu_util_total(int cpu, unsigned long util_cfs)
+{
+       struct rq *rq = cpu_rq(cpu);
+       unsigned long util, irq, max;
+
+       max = arch_scale_cpu_capacity(NULL, cpu);
+
+       /*
+        * Early check to see if IRQ/steal time saturates the CPU, can be
+        * because of inaccuracies in how we track these -- see
+        * update_irq_load_avg().
+        */
+       irq = cpu_util_irq(rq);
+       if (unlikely(irq >= max))
+               return max;
+
+       /*
+        * Because the time spend on RT/DL tasks is visible as 'lost' time to
+        * CFS tasks and we use the same metric to track the effective
+        * utilization (PELT windows are synchronized) we can directly add them
+        * to obtain the CPU's actual utilization.
+        */
+       util = util_cfs;
+       util += cpu_util_rt(rq);
+
+       if ((util + cpu_util_dl(rq)) >= max)
+               return max;
+
+       /*
+        * There is still idle time; further improve the number by using the
+        * irq metric. Because IRQ/steal time is hidden from the task clock we
+        * need to scale the task numbers:
+        *
+        *              1 - irq
+        *   U' = irq + ------- * U
+        *                max
+        */
+       util *= (max - irq);
+       util /= max;
+       util += irq;
+
+       /*
+        * Bandwidth required by DEADLINE must always be granted while, for
+        * FAIR and RT, we use blocked utilization of IDLE CPUs as a mechanism
+        * to gracefully reduce the frequency when no tasks show up for longer
+        * periods of time.
+        *
+        * Ideally we would like to set bw_dl as min/guaranteed freq and util +
+        * bw_dl as requested freq. However, cpufreq is not yet ready for such
+        * an interface. So, we only do the latter for now.
+        */
+       return min(max, util + cpu_bw_dl(rq));
+}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 51e7f113ee23..7ad037bb653e 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2185,14 +2185,9 @@ static inline void cpufreq_update_util(struct rq *rq, unsigned int flags) {}
 # define arch_scale_freq_invariant()   false
 #endif

-enum schedutil_type {
-       frequency_util,
-       energy_util,
-};

-#ifdef CONFIG_CPU_FREQ_GOV_SCHEDUTIL
-unsigned long schedutil_freq_util(int cpu, unsigned long util_cfs,
-                                 enum schedutil_type type);
+#ifdef CONFIG_SMP
+unsigned long cpu_util_total(int cpu, unsigned long cfs_util);

 static inline unsigned long cpu_bw_dl(struct rq *rq)
 {
@@ -2233,12 +2228,6 @@ static inline unsigned long cpu_util_irq(struct rq *rq)
 }

 #endif
-#else /* CONFIG_CPU_FREQ_GOV_SCHEDUTIL */
-static inline unsigned long schedutil_freq_util(int cpu, unsigned long util,
-                                 enum schedutil_type type)
-{
-       return util;
-}
 #endif

 #ifdef CONFIG_SMP

  reply index

Thread overview: 72+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-24 12:25 [PATCH v5 00/14] Energy Aware Scheduling Quentin Perret
2018-07-24 12:25 ` [PATCH v5 01/14] sched: Relocate arch_scale_cpu_capacity Quentin Perret
2018-07-24 12:25 ` [PATCH v5 02/14] sched/cpufreq: Factor out utilization to frequency mapping Quentin Perret
2018-07-24 12:25 ` [PATCH v5 03/14] PM: Introduce an Energy Model management framework Quentin Perret
2018-08-09 21:52   ` Rafael J. Wysocki
2018-08-10  8:15     ` Quentin Perret
2018-08-10  8:41       ` Rafael J. Wysocki
2018-08-10  9:12         ` Quentin Perret
2018-08-10 11:13           ` Rafael J. Wysocki
2018-08-10 12:30             ` Quentin Perret
2018-08-12  9:49               ` Rafael J. Wysocki
2018-07-24 12:25 ` [PATCH v5 04/14] PM / EM: Expose the Energy Model in sysfs Quentin Perret
2018-07-24 12:25 ` [PATCH v5 05/14] sched/topology: Reference the Energy Model of CPUs when available Quentin Perret
2018-07-24 12:25 ` [PATCH v5 06/14] sched/topology: Lowest energy aware balancing sched_domain level pointer Quentin Perret
2018-07-26 16:00   ` Valentin Schneider
2018-07-26 17:01     ` Quentin Perret
2018-07-24 12:25 ` [PATCH v5 07/14] sched/topology: Introduce sched_energy_present static key Quentin Perret
2018-07-24 12:25 ` [PATCH v5 08/14] sched/fair: Clean-up update_sg_lb_stats parameters Quentin Perret
2018-07-24 12:25 ` [PATCH v5 09/14] sched: Add over-utilization/tipping point indicator Quentin Perret
2018-08-02 12:26   ` Peter Zijlstra
2018-08-02 13:03     ` Quentin Perret
2018-08-02 13:08       ` Peter Zijlstra
2018-08-02 13:18         ` Quentin Perret
2018-08-02 13:48           ` Vincent Guittot
2018-08-02 14:14             ` Quentin Perret
2018-08-02 15:14               ` Vincent Guittot
2018-08-02 15:30                 ` Quentin Perret
2018-08-02 15:55                   ` Vincent Guittot
2018-08-02 16:00                     ` Quentin Perret
2018-08-02 16:07                       ` Vincent Guittot
2018-08-02 16:10                         ` Quentin Perret
2018-08-02 16:38                           ` Vincent Guittot
2018-08-02 16:59                             ` Quentin Perret
2018-08-03  7:48                               ` Vincent Guittot
2018-08-03  8:18                                 ` Quentin Perret
2018-08-03 13:49                                   ` Vincent Guittot
2018-08-03 14:21                                     ` Vincent Guittot
2018-08-03 15:55                                     ` Quentin Perret
2018-08-06  8:40                                       ` Vincent Guittot
2018-08-06  9:43                                         ` Quentin Perret
2018-08-06 10:45                                           ` Vincent Guittot
2018-08-06 11:02                                             ` Quentin Perret
2018-08-06 10:08                                         ` Dietmar Eggemann
2018-08-06 10:33                                           ` Vincent Guittot
2018-08-06 12:29                                             ` Dietmar Eggemann
2018-08-06 12:37                                               ` Vincent Guittot
2018-08-06 13:20                                                 ` Dietmar Eggemann
2018-08-09  9:30   ` Vincent Guittot
2018-08-09  9:38     ` Quentin Perret
2018-07-24 12:25 ` [PATCH v5 10/14] sched/cpufreq: Refactor the utilization aggregation method Quentin Perret
2018-07-30 19:35   ` skannan
2018-07-31  7:59     ` Quentin Perret
2018-07-31 19:31       ` skannan
2018-08-01  7:32         ` Rafael J. Wysocki
2018-08-01  8:23           ` Quentin Perret [this message]
2018-08-01  8:35             ` Rafael J. Wysocki
2018-08-01  9:23               ` Quentin Perret
2018-08-01  9:40                 ` Rafael J. Wysocki
2018-08-02 13:04                 ` Peter Zijlstra
2018-08-02 15:39                   ` Quentin Perret
2018-08-03 13:04                     ` Quentin Perret
2018-08-02 12:33     ` Peter Zijlstra
2018-08-02 12:45       ` Peter Zijlstra
2018-08-02 15:21         ` Quentin Perret
2018-08-02 17:36           ` Peter Zijlstra
2018-08-03 12:42             ` Quentin Perret
2018-07-24 12:25 ` [PATCH v5 11/14] sched/fair: Introduce an energy estimation helper function Quentin Perret
2018-07-24 12:25 ` [PATCH v5 12/14] sched/fair: Select an energy-efficient CPU on task wake-up Quentin Perret
2018-08-02 13:54   ` Peter Zijlstra
2018-08-02 16:21     ` Quentin Perret
2018-07-24 12:25 ` [PATCH v5 13/14] OPTIONAL: arch_topology: Start Energy Aware Scheduling Quentin Perret
2018-07-24 12:25 ` [PATCH v5 14/14] OPTIONAL: cpufreq: dt: Register an Energy Model Quentin Perret

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180801082353.egym4tsbr7ppql27@queper01-lin \
    --to=quentin.perret@arm.com \
    --cc=adharmap@quicinc.com \
    --cc=chris.redpath@arm.com \
    --cc=currojerez@riseup.net \
    --cc=dietmar.eggemann@arm.com \
    --cc=edubezval@gmail.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=javi.merino@kernel.org \
    --cc=joel@joelfernandes.org \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm-owner@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=morten.rasmussen@arm.com \
    --cc=patrick.bellasi@arm.com \
    --cc=peterz@infradead.org \
    --cc=pkondeti@codeaurora.org \
    --cc=rafael@kernel.org \
    --cc=rjw@rjwysocki.net \
    --cc=skannan@codeaurora.org \
    --cc=skannan@quicinc.com \
    --cc=smuckle@google.com \
    --cc=srinivas.pandruvada@linux.intel.com \
    --cc=thara.gopinath@linaro.org \
    --cc=tkjos@google.com \
    --cc=valentin.schneider@arm.com \
    --cc=vincent.guittot@linaro.org \
    --cc=viresh.kumar@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org linux-kernel@archiver.kernel.org
	public-inbox-index lkml


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/ public-inbox