From: Quentin Perret <quentin.perret@arm.com>
To: Leo Yan <leo.yan@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>,
	linux-kernel@vger.kernel.org,
	Peter Zijlstra <peterz@infradead.org>,
	Thara Gopinath <thara.gopinath@linaro.org>,
	linux-pm@vger.kernel.org,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Chris Redpath <chris.redpath@arm.com>,
	Patrick Bellasi <patrick.bellasi@arm.com>,
	Valentin Schneider <valentin.schneider@arm.com>,
	"Rafael J . Wysocki" <rjw@rjwysocki.net>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Todd Kjos <tkjos@google.com>, Joel Fernandes <joelaf@google.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Steve Muckle <smuckle@google.com>,
	Eduardo Valentin <edubezval@gmail.com>
Subject: Re: [RFC PATCH v2 4/6] sched/fair: Introduce an energy estimation helper function
Date: Wed, 18 Apr 2018 09:13:39 +0100
Message-ID: <20180418081339.GB3943@e108498-lin.cambridge.arm.com>
In-Reply-To: <20180417152213.GC18509@leoy-ThinkPad-X240s>

On Tuesday 17 Apr 2018 at 23:22:13 (+0800), Leo Yan wrote:
> On Fri, Apr 06, 2018 at 04:36:05PM +0100, Dietmar Eggemann wrote:
> > From: Quentin Perret <quentin.perret@arm.com>
> > 
> > In preparation for the definition of an energy-aware wakeup path, a
> > helper function is provided to estimate the consequence on system energy
> > when a specific task wakes-up on a specific CPU. compute_energy()
> > estimates the OPPs to be reached by all frequency domains and estimates
> > the consumption of each online CPU according to its energy model and its
> > percentage of busy time.
> > 
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Signed-off-by: Quentin Perret <quentin.perret@arm.com>
> > Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > ---
> >  include/linux/sched/energy.h | 20 +++++++++++++
> >  kernel/sched/fair.c          | 68 ++++++++++++++++++++++++++++++++++++++++++++
> >  kernel/sched/sched.h         |  2 +-
> >  3 files changed, 89 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/sched/energy.h b/include/linux/sched/energy.h
> > index 941071eec013..b4110b145228 100644
> > --- a/include/linux/sched/energy.h
> > +++ b/include/linux/sched/energy.h
> > @@ -27,6 +27,24 @@ static inline bool sched_energy_enabled(void)
> >  	return static_branch_unlikely(&sched_energy_present);
> >  }
> >  
> > +static inline
> > +struct capacity_state *find_cap_state(int cpu, unsigned long util)
> > +{
> > +	struct sched_energy_model *em = *per_cpu_ptr(energy_model, cpu);
> > +	struct capacity_state *cs = NULL;
> > +	int i;
> > +
> > +	util += util >> 2;
> > +
> > +	for (i = 0; i < em->nr_cap_states; i++) {
> > +		cs = &em->cap_states[i];
> > +		if (cs->cap >= util)
> > +			break;
> > +	}
> > +
> > +	return cs;
> 
> 'cs' may be returned as NULL here.

Only if em->nr_cap_states == 0, and that shouldn't be possible when
sched_energy_present is true, so this code should be safe :-)

> 
> > +}
> > +
> >  static inline struct cpumask *freq_domain_span(struct freq_domain *fd)
> >  {
> >  	return &fd->span;
> > @@ -42,6 +60,8 @@ struct freq_domain;
> >  static inline bool sched_energy_enabled(void) { return false; }
> >  static inline struct cpumask
> >  *freq_domain_span(struct freq_domain *fd) { return NULL; }
> > +static inline struct capacity_state
> > +*find_cap_state(int cpu, unsigned long util) { return NULL; }
> >  static inline void init_sched_energy(void) { }
> >  #define for_each_freq_domain(fdom) for (; fdom; fdom = NULL)
> >  #endif
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 6960e5ef3c14..8cb9fb04fff2 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6633,6 +6633,74 @@ static int wake_cap(struct task_struct *p, int cpu, int prev_cpu)
> >  }
> >  
> >  /*
> > + * Returns the util of "cpu" if "p" wakes up on "dst_cpu".
> > + */
> > +static unsigned long cpu_util_next(int cpu, struct task_struct *p, int dst_cpu)
> > +{
> > +	unsigned long util, util_est;
> > +	struct cfs_rq *cfs_rq;
> > +
> > +	/* Task is where it should be, or has no impact on cpu */
> > +	if ((task_cpu(p) == dst_cpu) || (cpu != task_cpu(p) && cpu != dst_cpu))
> > +		return cpu_util(cpu);
> > +
> > +	cfs_rq = &cpu_rq(cpu)->cfs;
> > +	util = READ_ONCE(cfs_rq->avg.util_avg);
> > +
> > +	if (dst_cpu == cpu)
> > +		util += task_util(p);
> > +	else
> > +		util = max_t(long, util - task_util(p), 0);
> 
> I tried to understand the logic here; the code below is clearer to
> me:
> 
>         int prev_cpu = task_cpu(p);
> 
>         cfs_rq = &cpu_rq(cpu)->cfs;
>         util = READ_ONCE(cfs_rq->avg.util_avg);
> 
>         /* Bail out if src and dst CPUs are the same one */
>         if (prev_cpu == cpu && dst_cpu == cpu)
>                 return util;
> 
>         /* Remove task utilization for src CPU */
>         if (cpu == prev_cpu)
>                 util = max_t(long, util - task_util(p), 0);
> 
>         /* Add task utilization for dst CPU */
>         if (dst_cpu == cpu)
>                 util += task_util(p);
> 
> BTW, the CPU utilization is a decayed value while task_util() is not
> decayed, so 'util - task_util(p)' gives a smaller value than the prev
> CPU's pure utilization, right?

task_util() is the raw PELT signal, without UTIL_EST, so I think it's
fine to do `util - task_util()`.
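
If it helps, from memory the two helpers look roughly like this in
fair.c (paraphrased, not verbatim, so the details may differ slightly):

	/* Raw PELT utilization of the task */
	static inline unsigned long task_util(struct task_struct *p)
	{
		return READ_ONCE(p->se.avg.util_avg);
	}

	/* util_est contribution of the task, used only under UTIL_EST */
	static inline unsigned long _task_util_est(struct task_struct *p)
	{
		struct util_est ue = READ_ONCE(p->se.avg.util_est);

		return max(ue.ewma, ue.enqueued);
	}

Both cfs_rq->avg.util_avg and task_util(p) come from the same PELT
series, so subtracting one from the other is consistent.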

> 
> Another question: can we reuse the function cpu_util_wake() and just
> compensate the task util for the dst cpu?

Well, it's not that simple. cpu_util_wake() will give you the max of
the util_avg and the util_est value, so which task_util() should you
add to it? The util_avg or the util_est value?

Here we are trying to predict what the cpu_util signal will be in the
future, so the only always-correct implementation of this function has
to predict the CPU's util_avg and util_est signals in parallel and take
the max of the two.
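
Roughly, the shape has to be something like the sketch below. It is
just a condensed rewrite of cpu_util_next() above, assuming the "task
doesn't move" and "cpu not involved" cases have already been filtered
out, and ignoring the UTIL_EST feature check and the capacity clamp:

	util_avg = READ_ONCE(cfs_rq->avg.util_avg);
	util_est = READ_ONCE(cfs_rq->avg.util_est.enqueued);

	if (dst_cpu == cpu) {		/* p migrates to this CPU */
		util_avg += task_util(p);
		util_est += _task_util_est(p);
	} else {			/* p leaves this CPU */
		util_avg = max_t(long, util_avg - task_util(p), 0);
		util_est = max_t(long, util_est - _task_util_est(p), 0);
	}

	/* Take the max only once both signals have been predicted */
	return max(util_avg, util_est);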

> 
> > +	if (sched_feat(UTIL_EST)) {
> > +		util_est = READ_ONCE(cfs_rq->avg.util_est.enqueued);
> > +		if (dst_cpu == cpu)
> > +			util_est += _task_util_est(p);
> > +		else
> > +			util_est = max_t(long, util_est - _task_util_est(p), 0);
> > +		util = max(util, util_est);
> > +	}
> > +
> > +	return min_t(unsigned long, util, capacity_orig_of(cpu));
> > +}
> > +
> > +/*
> > + * Estimates the system level energy assuming that p wakes-up on dst_cpu.
> > + *
> > + * compute_energy() is safe to call only if an energy model is available for
> > + * the platform, which is when sched_energy_enabled() is true.
> > + */
> > +static unsigned long compute_energy(struct task_struct *p, int dst_cpu)
> > +{
> > +	unsigned long util, max_util, sum_util;
> > +	struct capacity_state *cs;
> > +	unsigned long energy = 0;
> > +	struct freq_domain *fd;
> > +	int cpu;
> > +
> > +	for_each_freq_domain(fd) {
> > +		max_util = sum_util = 0;
> > +		for_each_cpu_and(cpu, freq_domain_span(fd), cpu_online_mask) {
> > +			util = cpu_util_next(cpu, p, dst_cpu);
> > +			util += cpu_util_dl(cpu_rq(cpu));
> > +			max_util = max(util, max_util);
> > +			sum_util += util;
> > +		}
> > +
> > +		/*
> > +		 * Here we assume that the capacity states of CPUs belonging to
> > +		 * the same frequency domains are shared. Hence, we look at the
> > +		 * capacity state of the first CPU and re-use it for all.
> > +		 */
> > +		cpu = cpumask_first(freq_domain_span(fd));
> > +		cs = find_cap_state(cpu, max_util);
> > +		energy += cs->power * sum_util / cs->cap;
> > +	}
> 
> This means all online CPUs will be iterated over for the calculation,
> so the complexity is O(n)...
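
(Just to illustrate the per-domain estimate itself: with, say,
cs->power == 600, cs->cap == 1024 and sum_util == 512 for one domain,
that domain contributes 600 * 512 / 1024 = 300 units of energy. The
absolute unit doesn't really matter here, since the result is only
used to compare candidate CPUs against each other.)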
> 
> > +	return energy;
> > +}
> > +
> > +/*
> >   * select_task_rq_fair: Select target runqueue for the waking task in domains
> >   * that have the 'sd_flag' flag set. In practice, this is SD_BALANCE_WAKE,
> >   * SD_BALANCE_FORK, or SD_BALANCE_EXEC.
> > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> > index 5d552c0d7109..6eb38f41d5d9 100644
> > --- a/kernel/sched/sched.h
> > +++ b/kernel/sched/sched.h
> > @@ -2156,7 +2156,7 @@ static inline void cpufreq_update_util(struct rq *rq, unsigned int flags) {}
> >  # define arch_scale_freq_invariant()	false
> >  #endif
> >  
> > -#ifdef CONFIG_CPU_FREQ_GOV_SCHEDUTIL
> > +#ifdef CONFIG_SMP
> >  static inline unsigned long cpu_util_dl(struct rq *rq)
> >  {
> >  	return (rq->dl.running_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT;
> > -- 
> > 2.11.0
> > 

