From: Juri Lelli <juri.lelli@redhat.com>
To: Quentin Perret <quentin.perret@arm.com>
Cc: peterz@infradead.org, rjw@rjwysocki.net,
	gregkh@linuxfoundation.org, linux-kernel@vger.kernel.org,
	linux-pm@vger.kernel.org, mingo@redhat.com,
	dietmar.eggemann@arm.com, morten.rasmussen@arm.com,
	chris.redpath@arm.com, patrick.bellasi@arm.com,
	valentin.schneider@arm.com, vincent.guittot@linaro.org,
	thara.gopinath@linaro.org, viresh.kumar@linaro.org,
	tkjos@google.com, joelaf@google.com, smuckle@google.com,
	adharmap@quicinc.com, skannan@quicinc.com,
	pkondeti@codeaurora.org, edubezval@gmail.com,
	srinivas.pandruvada@linux.intel.com, currojerez@riseup.net,
	javi.merino@kernel.org
Subject: Re: [RFC PATCH v3 05/10] sched/topology: Reference the Energy Model of CPUs when available
Date: Thu, 7 Jun 2018 16:44:22 +0200
Message-ID: <20180607144422.GA17216@localhost.localdomain>
In-Reply-To: <20180521142505.6522-6-quentin.perret@arm.com>

Hi,

On 21/05/18 15:25, Quentin Perret wrote:
> In order to use EAS, the task scheduler has to know about the Energy
> Model (EM) of the platform. This commit extends the scheduler topology
> code to take references on the frequency domains objects of the EM
> framework for all online CPUs. Hence, the availability of the EM for
> those CPUs is guaranteed to the scheduler at runtime without further
> checks in latency sensitive code paths (i.e. task wake-up).
> 
> A (RCU-protected) private list of online frequency domains is maintained
> by the scheduler to enable fast iterations. Furthermore, the availability
> of an EM is notified to the rest of the scheduler with a static key,
> which ensures a low impact on non-EAS systems.
> 
> Energy Aware Scheduling can be started if and only if:
>    1. all online CPUs are covered by the EM;
>    2. the EM complexity is low enough to keep scheduling overheads low;
>    3. the platform has an asymmetric CPU capacity topology (detected by
>       looking for the SD_ASYM_CPUCAPACITY flag in the sched_domain
>       hierarchy).

Not sure about this. What about systems with multiple frequency domains
but the same max capacity? I understand that most of the energy savings
come from selecting the right (big/LITTLE) cluster, but the EM should
still be useful to drive OPP selection (that was one of the use cases we
discussed lately, IIRC) and also to decide between packing and
spreading, no?

> The sched_energy_enabled() function which returns the status of the
> static key is stubbed to false when CONFIG_ENERGY_MODEL=n, hence making
> sure that all the code behind it can be compiled out by constant
> propagation.

Actually, do we need a config option at all? Shouldn't the static key
(and RCU machinery) guard against unwanted overheads when EM is not
present/used?

I was thinking it should be pretty similar to the schedutil setup, no?
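
To make the trade-off concrete, here is a minimal user-space sketch of the
stub pattern from the quoted hunk. It is an illustration under assumptions,
not kernel code: CONFIG_ENERGY_MODEL is left undefined to model a =n build,
a plain bool stands in for the static key, and pick_wakeup_path() is a
hypothetical caller that does not exist in the patch.

```c
#include <stdbool.h>

/*
 * User-space sketch only. CONFIG_ENERGY_MODEL is deliberately left
 * undefined to model a CONFIG_ENERGY_MODEL=n build. The plain bool is a
 * stand-in for the kernel's static key, not the jump-label machinery.
 */
#ifdef CONFIG_ENERGY_MODEL
static bool sched_energy_present;	/* stand-in for the static key */

static inline bool sched_energy_enabled(void)
{
	return sched_energy_present;
}
#else
static inline bool sched_energy_enabled(void)
{
	return false;	/* compile-time constant */
}
#endif

/* Hypothetical wake-up helper: returns 1 if the EAS path is taken. */
static int pick_wakeup_path(void)
{
	if (sched_energy_enabled())
		return 1;	/* EAS path, dead code when the stub is used */

	return 0;		/* regular path */
}
```

With the stub, the compiler sees a constant false and can drop the EAS
branch entirely. With only a static key, the branch stays in the binary
and is patched to a no-op at runtime.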

> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Quentin Perret <quentin.perret@arm.com>
> ---
>  kernel/sched/sched.h    |  27 ++++++++++
>  kernel/sched/topology.c | 113 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 140 insertions(+)
> 
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index ce562d3b7526..7c517076a74a 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -63,6 +63,7 @@
>  #include <linux/syscalls.h>
>  #include <linux/task_work.h>
>  #include <linux/tsacct_kern.h>
> +#include <linux/energy_model.h>
>  
>  #include <asm/tlb.h>
>  
> @@ -2162,3 +2163,29 @@ static inline unsigned long cpu_util_cfs(struct rq *rq)
>  	return util;
>  }
>  #endif
> +
> +struct sched_energy_fd {
> +	struct em_freq_domain *fd;
> +	struct list_head next;
> +	struct rcu_head rcu;
> +};
> +
> +#ifdef CONFIG_ENERGY_MODEL
> +extern struct static_key_false sched_energy_present;
> +static inline bool sched_energy_enabled(void)
> +{
> +	return static_branch_unlikely(&sched_energy_present);
> +}
> +
> +extern struct list_head sched_energy_fd_list;
> +#define for_each_freq_domain(sfd) \
> +		list_for_each_entry_rcu(sfd, &sched_energy_fd_list, next)
> +#define freq_domain_span(sfd) (&((sfd)->fd->cpus))
> +#else
> +static inline bool sched_energy_enabled(void)
> +{
> +	return false;
> +}
> +#define for_each_freq_domain(sfd) for (sfd = NULL; sfd;)
> +#define freq_domain_span(sfd) NULL
> +#endif
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 64cc564f5255..3e22c798f18d 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -1500,6 +1500,116 @@ void sched_domains_numa_masks_clear(unsigned int cpu)
>  
>  #endif /* CONFIG_NUMA */
>  
> +#ifdef CONFIG_ENERGY_MODEL
> +
> +/*
> + * The complexity of the Energy Model is defined as the product of the number
> + * of frequency domains with the sum of the number of CPUs and the total
> + * number of OPPs in all frequency domains. It is generally not a good idea
> + * to use such a model on very complex platform because of the associated
> + * scheduling overheads. The arbitrary constraint below prevents that. It
> + * makes EAS usable up to 16 CPUs with per-CPU DVFS and less than 8 OPPs each,
> + * for example.
> + */
> +#define EM_MAX_COMPLEXITY 2048

Do we really need this hardcoded constant?

I guess if someone spent time deriving an EM for a big system with a lot
of OPPs, she/he already knows what she/he is doing? :)
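
For reference, the arithmetic behind the constant can be checked with a
small user-space sketch. It follows the definition in the quoted comment
(number of frequency domains times the sum of total OPPs and CPUs) and the
strict comparison used later in build_sched_energy(). The helper names
em_complexity() and em_too_complex() are made up for this illustration.

```c
#include <stdbool.h>

#define EM_MAX_COMPLEXITY 2048

/* Complexity as defined in the comment: nr_fd * (nr_opp + nr_cpus). */
static long em_complexity(int nr_fd, int nr_opp, int nr_cpus)
{
	return (long)nr_fd * (nr_opp + nr_cpus);
}

/* Mirrors the patch's check: only a strictly greater value bails out. */
static bool em_too_complex(int nr_fd, int nr_opp, int nr_cpus)
{
	return em_complexity(nr_fd, nr_opp, nr_cpus) > EM_MAX_COMPLEXITY;
}
```

With 16 CPUs, per-CPU DVFS and 7 OPPs each, nr_fd = 16 and nr_opp = 112,
so the complexity is 16 * (112 + 16) = 2048, which is still accepted. With
8 OPPs each it becomes 16 * (128 + 16) = 2304 and EAS is refused.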

> +
> +DEFINE_STATIC_KEY_FALSE(sched_energy_present);
> +LIST_HEAD(sched_energy_fd_list);
> +
> +static struct sched_energy_fd *find_sched_energy_fd(int cpu)
> +{
> +	struct sched_energy_fd *sfd;
> +
> +	for_each_freq_domain(sfd) {
> +		if (cpumask_test_cpu(cpu, freq_domain_span(sfd)))
> +			return sfd;
> +	}
> +
> +	return NULL;
> +}
> +
> +static void free_sched_energy_fd(struct rcu_head *rp)
> +{
> +	struct sched_energy_fd *sfd;
> +
> +	sfd = container_of(rp, struct sched_energy_fd, rcu);
> +	kfree(sfd);
> +}
> +
> +static void build_sched_energy(void)
> +{
> +	struct sched_energy_fd *sfd, *tmp;
> +	struct em_freq_domain *fd;
> +	struct sched_domain *sd;
> +	int cpu, nr_fd = 0, nr_opp = 0;
> +
> +	rcu_read_lock();
> +
> +	/* Disable EAS entirely whenever the system isn't asymmetric. */
> +	cpu = cpumask_first(cpu_online_mask);
> +	sd = lowest_flag_domain(cpu, SD_ASYM_CPUCAPACITY);
> +	if (!sd) {
> +		pr_debug("%s: no SD_ASYM_CPUCAPACITY\n", __func__);
> +		goto disable;
> +	}
> +
> +	/* Make sure to have an energy model for all CPUs. */
> +	for_each_online_cpu(cpu) {
> +		/* Skip CPUs with a known energy model. */
> +		sfd = find_sched_energy_fd(cpu);
> +		if (sfd)
> +			continue;
> +
> +		/* Add the energy model of others. */
> +		fd = em_cpu_get(cpu);
> +		if (!fd)
> +			goto disable;
> +		sfd = kzalloc(sizeof(*sfd), GFP_NOWAIT);
> +		if (!sfd)
> +			goto disable;
> +		sfd->fd = fd;
> +		list_add_rcu(&sfd->next, &sched_energy_fd_list);
> +	}
> +
> +	list_for_each_entry_safe(sfd, tmp, &sched_energy_fd_list, next) {
> +		if (cpumask_intersects(freq_domain_span(sfd),
> +							cpu_online_mask)) {
> +			nr_opp += em_fd_nr_cap_states(sfd->fd);
> +			nr_fd++;
> +			continue;
> +		}
> +
> +		/* Remove the unused frequency domains */
> +		list_del_rcu(&sfd->next);
> +		call_rcu(&sfd->rcu, free_sched_energy_fd);

Unused because of what? Hotplug?

Not sure, but I guess you have considered the idea of tearing all this
down when sched domains are destroyed and then rebuilding it again? Why
did you choose this approach instead? Or maybe I just missed where you
do that. :/
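
To spell out what the loop above does, here is a user-space sketch of the
pruning step, under loud assumptions: a 64-bit mask replaces the cpumask, a
singly linked list replaces list_head, and a plain free() replaces
call_rcu(), so none of the RCU guarantees are modelled. struct fd_node,
fd_add() and fd_prune() are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sched_energy_fd. */
struct fd_node {
	uint64_t span;		/* CPUs covered by this frequency domain */
	struct fd_node *next;
};

/* Push a domain at the head; the list is unchanged on alloc failure. */
static struct fd_node *fd_add(struct fd_node *head, uint64_t span)
{
	struct fd_node *n = malloc(sizeof(*n));

	if (!n)
		return head;
	n->span = span;
	n->next = head;
	return n;
}

/* Drop every domain with no online CPU; returns the number kept. */
static int fd_prune(struct fd_node **head, uint64_t online)
{
	struct fd_node **pp = head;
	int kept = 0;

	while (*pp) {
		struct fd_node *n = *pp;

		if (n->span & online) {
			kept++;
			pp = &n->next;
			continue;
		}
		*pp = n->next;	/* unlink; free() replaces call_rcu() */
		free(n);
	}
	return kept;
}
```

In this model a domain only becomes "unused" once all of its CPUs have
left the online mask, which is what a hotplug of a whole cluster would do.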

> +	}
> +
> +	/* Bail out if the Energy Model complexity is too high. */
> +	if (nr_fd * (nr_opp + num_online_cpus()) > EM_MAX_COMPLEXITY) {
> +		pr_warn("%s: EM complexity too high, stopping EAS", __func__);
> +		goto disable;
> +	}
> +
> +	rcu_read_unlock();
> +	static_branch_enable_cpuslocked(&sched_energy_present);
> +	pr_debug("%s: EAS started\n", __func__);

I'd vote for a pr_info here instead, maybe printing info about the EM as
well. It looks pretty useful to me to have that in dmesg. Maybe guarded
by sched_debug?

Best,

- Juri
