From: "Rafael J. Wysocki" <rafael@kernel.org>
To: Quentin Perret <quentin.perret@arm.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
	Saravana Kannan <skannan@codeaurora.org>,
	Peter Zijlstra <peterz@infradead.org>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux PM <linux-pm@vger.kernel.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Ingo Molnar <mingo@redhat.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Chris Redpath <chris.redpath@arm.com>,
	Patrick Bellasi <patrick.bellasi@arm.com>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Thara Gopinath <thara.gopinath@linaro.org>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Todd Kjos <tkjos@google.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	Steve Muckle <smuckle@google.com>,
	adharmap@quicinc.com, skannan@quicinc.com,
	Pavan Kondeti <pkondeti@codeaurora.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Eduardo Valentin <edubezval@gmail.com>,
	Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>,
	currojerez@riseup.net, Javi Merino <javi.merino@kernel.org>,
	linux-pm-owner@vger.kernel.org
Subject: Re: [PATCH v5 10/14] sched/cpufreq: Refactor the utilization aggregation method
Date: Wed, 1 Aug 2018 10:35:32 +0200
Message-ID: <CAJZ5v0joYFXkXxoV8odPtrCNW=jAw3RVv3rTzMoabeYWGDnREw@mail.gmail.com>
In-Reply-To: <20180801082353.egym4tsbr7ppql27@queper01-lin>

On Wed, Aug 1, 2018 at 10:23 AM, Quentin Perret <quentin.perret@arm.com> wrote:
> On Wednesday 01 Aug 2018 at 09:32:49 (+0200), Rafael J. Wysocki wrote:
>> On Tue, Jul 31, 2018 at 9:31 PM,  <skannan@codeaurora.org> wrote:
>> > On 2018-07-31 00:59, Quentin Perret wrote:
>> >>
>> >> On Monday 30 Jul 2018 at 12:35:27 (-0700), skannan@codeaurora.org wrote:
>> >> [...]
>> >>>
>> >>> If it's going to be a different aggregation from what's done for
>> >>> frequency
>> >>> guidance, I don't see the point of having this inside schedutil. Why not
>> >>> keep it inside the scheduler files?
>> >>
>> >>
>> >> This code basically results from a discussion we had with Peter on v4.
>> >> Keeping everything centralized can make sense from a maintenance
>> >> perspective, I think. That makes it easy to see the impact of any change
>> >> to utilization signals for both EAS and schedutil.
>> >
>> >
>> > In that case, I'd argue it makes more sense to keep the code centralized in
>> > the scheduler. The scheduler can let schedutil know about the utilization
>> > after it aggregates them. There's no need for a cpufreq governor to know
>> > that there are scheduling classes or how many there are. And the scheduler
>> > can then choose to aggregate one way for task packing and another way for
>> > frequency guidance.
>>
>> Also the aggregate utilization may be used by cpuidle governors in
>> principle to decide how deep they can go with idle state selection.
>
> The only issue I see with this right now is that some of the things done
> in this function are policy decisions which really belong to the governor,
> I think.

Well, the scheduler makes policy decisions too, in quite a few places. :-)

The really important consideration here is whether or not there may be
multiple governors making different policy decisions in that respect.
If not, then where exactly the single policy decision is made doesn't
particularly matter IMO.

> The RT-go-to-max-freq thing in particular. And I really don't
> think EAS should cope with that, at least for now.
>
> But if this specific bit is factored out of the aggregation function, I
> suppose we could move it somewhere else. Maybe pelt.c ?
>
> How ugly is something like the below (totally untested) code ? It would
> change slightly how we deal with DL utilization in EAS but I don't think
> this is an issue.
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index af86050edcf5..51c9ac9f30e8 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -178,121 +178,17 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
>         return cpufreq_driver_resolve_freq(policy, freq);
>  }
>
> -/*
> - * This function computes an effective utilization for the given CPU, to be
> - * used for frequency selection given the linear relation: f = u * f_max.
> - *
> - * The scheduler tracks the following metrics:
> - *
> - *   cpu_util_{cfs,rt,dl,irq}()
> - *   cpu_bw_dl()
> - *
> - * Where the cfs,rt and dl util numbers are tracked with the same metric and
> - * synchronized windows and are thus directly comparable.
> - *
> - * The cfs,rt,dl utilization are the running times measured with rq->clock_task
> - * which excludes things like IRQ and steal-time. These latter are then accrued
> - * in the irq utilization.
> - *
> - * The DL bandwidth number otoh is not a measured metric but a value computed
> - * based on the task model parameters and gives the minimal utilization
> - * required to meet deadlines.
> - */
> -unsigned long schedutil_freq_util(int cpu, unsigned long util_cfs,
> -                                 enum schedutil_type type)
> -{
> -       struct rq *rq = cpu_rq(cpu);
> -       unsigned long util, irq, max;
> -
> -       max = arch_scale_cpu_capacity(NULL, cpu);
> -
> -       if (type == frequency_util && rt_rq_is_runnable(&rq->rt))
> -               return max;
> -
> -       /*
> -        * Early check to see if IRQ/steal time saturates the CPU, can be
> -        * because of inaccuracies in how we track these -- see
> -        * update_irq_load_avg().
> -        */
> -       irq = cpu_util_irq(rq);
> -       if (unlikely(irq >= max))
> -               return max;
> -
> -       /*
> -        * Because the time spend on RT/DL tasks is visible as 'lost' time to
> -        * CFS tasks and we use the same metric to track the effective
> -        * utilization (PELT windows are synchronized) we can directly add them
> -        * to obtain the CPU's actual utilization.
> -        */
> -       util = util_cfs;
> -       util += cpu_util_rt(rq);
> -
> -       if (type == frequency_util) {
> -               /*
> -                * For frequency selection we do not make cpu_util_dl() a
> -                * permanent part of this sum because we want to use
> -                * cpu_bw_dl() later on, but we need to check if the
> -                * CFS+RT+DL sum is saturated (ie. no idle time) such
> -                * that we select f_max when there is no idle time.
> -                *
> -                * NOTE: numerical errors or stop class might cause us
> -                * to not quite hit saturation when we should --
> -                * something for later.
> -                */
> -
> -               if ((util + cpu_util_dl(rq)) >= max)
> -                       return max;
> -       } else {
> -               /*
> -                * OTOH, for energy computation we need the estimated
> -                * running time, so include util_dl and ignore dl_bw.
> -                */
> -               util += cpu_util_dl(rq);
> -               if (util >= max)
> -                       return max;
> -       }
> -
> -       /*
> -        * There is still idle time; further improve the number by using the
> -        * irq metric. Because IRQ/steal time is hidden from the task clock we
> -        * need to scale the task numbers:
> -        *
> -        *              1 - irq
> -        *   U' = irq + ------- * U
> -        *                max
> -        */
> -       util *= (max - irq);
> -       util /= max;
> -       util += irq;
> -
> -       if (type == frequency_util) {
> -               /*
> -                * Bandwidth required by DEADLINE must always be granted
> -                * while, for FAIR and RT, we use blocked utilization of
> -                * IDLE CPUs as a mechanism to gracefully reduce the
> -                * frequency when no tasks show up for longer periods of
> -                * time.
> -                *
> -                * Ideally we would like to set bw_dl as min/guaranteed
> -                * freq and util + bw_dl as requested freq. However,
> -                * cpufreq is not yet ready for such an interface. So,
> -                * we only do the latter for now.
> -                */
> -               util += cpu_bw_dl(rq);
> -       }
> -
> -       return min(max, util);
> -}
> -
>  static unsigned long sugov_get_util(struct sugov_cpu *sg_cpu)
>  {
>         struct rq *rq = cpu_rq(sg_cpu->cpu);
> -       unsigned long util = cpu_util_cfs(rq);
>
>         sg_cpu->max = arch_scale_cpu_capacity(NULL, sg_cpu->cpu);
>         sg_cpu->bw_dl = cpu_bw_dl(rq);
>
> -       return schedutil_freq_util(sg_cpu->cpu, util, frequency_util);
> +       if (rt_rq_is_runnable(&rq->rt))
> +               return sg_cpu->max;
> +
> +       return cpu_util_total(sg_cpu->cpu, cpu_util_cfs(rq));
>  }
>
>  /**
> diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
> index 35475c0c5419..5f99bd564dfc 100644
> --- a/kernel/sched/pelt.c
> +++ b/kernel/sched/pelt.c
> @@ -397,3 +397,77 @@ int update_irq_load_avg(struct rq *rq, u64 running)
>         return ret;
>  }
>  #endif
> +
> +/*
> + * This function computes an effective utilization for the given CPU, to be
> + * used for frequency selection given the linear relation: f = u * f_max.
> + *
> + * The scheduler tracks the following metrics:
> + *
> + *   cpu_util_{cfs,rt,dl,irq}()
> + *   cpu_bw_dl()
> + *
> + * Where the cfs,rt and dl util numbers are tracked with the same metric and
> + * synchronized windows and are thus directly comparable.
> + *
> + * The cfs,rt,dl utilization are the running times measured with rq->clock_task
> + * which excludes things like IRQ and steal-time. These latter are then accrued
> + * in the irq utilization.
> + *
> + * The DL bandwidth number otoh is not a measured metric but a value computed
> + * based on the task model parameters and gives the minimal utilization
> + * required to meet deadlines.
> + */
> +unsigned long cpu_util_total(int cpu, unsigned long util_cfs)
> +{
> +       struct rq *rq = cpu_rq(cpu);
> +       unsigned long util, irq, max;
> +
> +       max = arch_scale_cpu_capacity(NULL, cpu);
> +
> +       /*
> +        * Early check to see if IRQ/steal time saturates the CPU, can be
> +        * because of inaccuracies in how we track these -- see
> +        * update_irq_load_avg().
> +        */
> +       irq = cpu_util_irq(rq);
> +       if (unlikely(irq >= max))
> +               return max;
> +
> +       /*
> +        * Because the time spent on RT/DL tasks is visible as 'lost' time to
> +        * CFS tasks and we use the same metric to track the effective
> +        * utilization (PELT windows are synchronized) we can directly add them
> +        * to obtain the CPU's actual utilization.
> +        */
> +       util = util_cfs;
> +       util += cpu_util_rt(rq);
> +
> +       if ((util + cpu_util_dl(rq)) >= max)
> +               return max;
> +
> +       /*
> +        * There is still idle time; further improve the number by using the
> +        * irq metric. Because IRQ/steal time is hidden from the task clock we
> +        * need to scale the task numbers:
> +        *
> +        *              max - irq
> +        *   U' = irq + --------- * U
> +        *                 max
> +        */
> +       util *= (max - irq);
> +       util /= max;
> +       util += irq;
> +
> +       /*
> +        * Bandwidth required by DEADLINE must always be granted while, for
> +        * FAIR and RT, we use blocked utilization of IDLE CPUs as a mechanism
> +        * to gracefully reduce the frequency when no tasks show up for longer
> +        * periods of time.
> +        *
> +        * Ideally we would like to set bw_dl as min/guaranteed freq and util +
> +        * bw_dl as requested freq. However, cpufreq is not yet ready for such
> +        * an interface. So, we only do the latter for now.
> +        */
> +       return min(max, util + cpu_bw_dl(rq));
> +}
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 51e7f113ee23..7ad037bb653e 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -2185,14 +2185,9 @@ static inline void cpufreq_update_util(struct rq *rq, unsigned int flags) {}
>  # define arch_scale_freq_invariant()   false
>  #endif
>
> -enum schedutil_type {
> -       frequency_util,
> -       energy_util,
> -};
>
> -#ifdef CONFIG_CPU_FREQ_GOV_SCHEDUTIL
> -unsigned long schedutil_freq_util(int cpu, unsigned long util_cfs,
> -                                 enum schedutil_type type);
> +#ifdef CONFIG_SMP
> +unsigned long cpu_util_total(int cpu, unsigned long util_cfs);
>
>  static inline unsigned long cpu_bw_dl(struct rq *rq)
>  {
> @@ -2233,12 +2228,6 @@ static inline unsigned long cpu_util_irq(struct rq *rq)
>  }
>
>  #endif
> -#else /* CONFIG_CPU_FREQ_GOV_SCHEDUTIL */
> -static inline unsigned long schedutil_freq_util(int cpu, unsigned long util,
> -                                 enum schedutil_type type)
> -{
> -       return util;
> -}
>  #endif
>
>  #ifdef CONFIG_SMP

This doesn't look objectionable to me.
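
For anyone following the arithmetic, the aggregation in the proposed
cpu_util_total() and the RT check left behind in sugov_get_util() can be
sketched in plain userspace C roughly as below. This is only a sketch under
stated assumptions, not the kernel code: struct util_inputs,
util_total_sketch() and freq_util_sketch() are hypothetical names, and the
struct fields stand in for the values that cpu_util_cfs(), cpu_util_rt(),
cpu_util_dl(), cpu_util_irq(), cpu_bw_dl() and arch_scale_cpu_capacity()
would return in the patch above.

```c
#include <assert.h>

/*
 * Hypothetical snapshot of the per-CPU signals (illustration only).
 * All values are on the same scale as max (the CPU capacity).
 */
struct util_inputs {
	unsigned long cfs;	/* stands in for cpu_util_cfs() */
	unsigned long rt;	/* stands in for cpu_util_rt() */
	unsigned long dl;	/* stands in for cpu_util_dl() */
	unsigned long irq;	/* stands in for cpu_util_irq() */
	unsigned long bw_dl;	/* stands in for cpu_bw_dl() */
	unsigned long max;	/* stands in for arch_scale_cpu_capacity() */
};

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Mirrors the steps of the proposed cpu_util_total(), policy-free. */
unsigned long util_total_sketch(const struct util_inputs *in)
{
	unsigned long util;

	assert(in->max > 0);

	/* IRQ/steal time alone saturates the CPU. */
	if (in->irq >= in->max)
		return in->max;

	/* CFS and RT are directly comparable, so they are summed. */
	util = in->cfs + in->rt;

	/* No idle time left once measured DL utilization is included. */
	if (util + in->dl >= in->max)
		return in->max;

	/* Scale the task numbers: U' = irq + (max - irq) / max * U */
	util = util * (in->max - in->irq) / in->max;
	util += in->irq;

	/* DEADLINE bandwidth is always granted on top, clamped to max. */
	return min_ul(in->max, util + in->bw_dl);
}

/* Governor-side policy: a runnable RT task requests the maximum. */
unsigned long freq_util_sketch(const struct util_inputs *in, int rt_runnable)
{
	if (rt_runnable)
		return in->max;

	return util_total_sketch(in);
}
```

With cfs=200, rt=100, dl=50, irq=128, bw_dl=64 and max=1024, the task sum
of 300 scales to 300 * 896 / 1024 = 262, and adding irq and bw_dl gives
454. The split into two functions reflects the point discussed above: the
RT-go-to-max decision stays with the governor, while the aggregation
itself carries no frequency policy.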
