* [PATCH] PM / EM: Inefficient OPPs detection
@ 2021-04-08 17:10 Vincent Donnefort
  2021-04-08 17:10 ` Vincent Donnefort
  0 siblings, 1 reply; 20+ messages in thread
From: Vincent Donnefort @ 2021-04-08 17:10 UTC (permalink / raw)
To: peterz, rjw, viresh.kumar, vincent.guittot, qperret
Cc: linux-kernel, ionela.voinescu, lukasz.luba, dietmar.eggemann,
	Vincent Donnefort

We (the Power team in Arm) are working with an experimental kernel for
Google's Pixel4 to evaluate and improve the current mainline performance and
energy consumption on a real-life device running Android.

The SD855 SoC found in this phone has several OPPs that are inefficient.
I.e., despite a lower frequency, they have a greater cost (that cost being
fmax * OPP power / OPP freq). This issue is twofold. First, of course,
running a specific workload at an inefficient OPP is counterproductive,
since it wastes energy. But inefficient OPPs also make a performance domain
less appealing for task placement than it really is.

We evaluated the change presented here by running 30 iterations of Android
PCMark "Work 2.0 Performance". While we did not see any statistically
significant performance impact, this change allowed us to drastically
improve the idle time residency:

                        |   Running   |  WFI [1]  |   Idle    |
------------------------+-------------+-----------+-----------+
Little cluster (4 CPUs) |   -0.35%    |  +0.35%   |  +0.79%   |
------------------------+-------------+-----------+-----------+
Medium cluster (3 CPUs) |   -6.3%     |   -18%    |   +12%    |
------------------------+-------------+-----------+-----------+
Big cluster (1 CPU)     |   -6.4%     |  -6.5%    |  +2.8%    |
------------------------+-------------+-----------+-----------+

On the SD855, the inefficient OPPs are found on the little cluster. By
removing them from the Energy Model, we make the most efficient CPUs more
appealing for task placement, helping to reduce the running time for the
medium and big CPUs.
Increasing idle time is crucial for this platform due to the substantial
energy cost differences among the clusters. Also, despite not appearing in
the statistics (the idle driver used here doesn't report it), we can
speculate that we also improve the cluster idle time.

[1] WFI: Wait for interrupt.

Vincent Donnefort (1):
  PM / EM: Inefficient OPPs detection

 include/linux/energy_model.h     | 131 ++++++++++++++++++++++++++++++++++++---
 kernel/power/energy_model.c      | 126 +++++++++++++++++++++++++++++++------
 kernel/sched/cpufreq_schedutil.c |   4 ++
 3 files changed, 234 insertions(+), 27 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 20+ messages in thread
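[Editor's note] The inefficiency criterion the cover letter describes (cost = fmax * OPP power / OPP freq, with an OPP inefficient whenever some higher-frequency OPP has a lower or equal cost) can be sketched in plain userspace C. The struct layout and the sample values below are illustrative only, not the kernel's em_perf_state:

```c
#include <assert.h>

/* Illustrative OPP descriptor (not the kernel's em_perf_state). */
struct opp {
    unsigned long freq;   /* kHz */
    unsigned long power;  /* mW */
    unsigned long cost;   /* fmax * power / freq */
    int inefficient;
};

/*
 * Compute each OPP's cost, then walk the ascending-frequency table from
 * the top down: an OPP whose cost is not strictly lower than the best
 * cost seen at any higher frequency is flagged inefficient.
 */
static void mark_inefficient(struct opp *table, int n)
{
    unsigned long fmax = table[n - 1].freq;
    unsigned long best_cost;
    int i;

    for (i = 0; i < n; i++)
        table[i].cost = fmax * table[i].power / table[i].freq;

    best_cost = table[n - 1].cost;
    for (i = n - 2; i >= 0; i--) {
        if (table[i].cost >= best_cost)
            table[i].inefficient = 1;
        else
            best_cost = table[i].cost;
    }
}
```

With four hypothetical OPPs {500 kHz/100 mW, 1000/150, 1500/400, 2000/450}, the 500 kHz and 1500 kHz OPPs end up flagged: each costs at least as much as a faster OPP.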
* [PATCH] PM / EM: Inefficient OPPs detection
  2021-04-08 17:10 [PATCH] PM / EM: Inefficient OPPs detection Vincent Donnefort
@ 2021-04-08 17:10 ` Vincent Donnefort
  2021-04-15 13:12   ` Quentin Perret
                     ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Vincent Donnefort @ 2021-04-08 17:10 UTC (permalink / raw)
To: peterz, rjw, viresh.kumar, vincent.guittot, qperret
Cc: linux-kernel, ionela.voinescu, lukasz.luba, dietmar.eggemann,
	Vincent Donnefort

Some SoCs, such as the sd855, have OPPs within the same performance domain
whose cost is higher than that of others with a higher frequency. Even
though those OPPs are interesting from a cooling perspective, it makes no
sense to use them when the device can run at full capacity. Those OPPs
handicap the performance domain when choosing the most energy-efficient
CPU, and they waste energy. They are inefficient.

Hence, add support for such OPPs to the Energy Model, which creates a
performance state for each OPP. The Energy Model can now be read using the
regular table, which contains all the performance states available, or
using an efficient table, where the inefficient performance states (and by
extension, inefficient OPPs) have been removed.

Currently, the efficient table is used in two paths: schedutil and
find_energy_efficient_cpu(). We have to modify both paths in the same patch
so they stay synchronized. The thermal framework still relies on the
original table, and hence, DevFreq devices won't create the efficient
table.

As it is used in the hot path, the efficient table is a lookup table,
generated dynamically when the perf domain is created. The complexity of
searching for a performance state is hence changed from O(n) to O(1). This
also speeds up em_cpu_energy() even if no inefficient OPPs have been found.
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com> diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h index 757fc60..90b9cb0 100644 --- a/include/linux/energy_model.h +++ b/include/linux/energy_model.h @@ -17,19 +17,60 @@ * device). It can be a total power: static and dynamic. * @cost: The cost coefficient associated with this level, used during * energy calculation. Equal to: power * max_frequency / frequency + * @flags: see "em_perf_state flags" description below. */ struct em_perf_state { unsigned long frequency; unsigned long power; unsigned long cost; + unsigned long flags; +}; + +/* + * em_perf_state flags: + * + * EM_PERF_STATE_INEFFICIENT: The performance state is inefficient. There is + * in this em_perf_domain, another performance state with a higher frequency + * but a lower or equal power cost. Such inefficient states are ignored when + * using em_pd_get_efficient_*() functions. + */ +#define EM_PERF_STATE_INEFFICIENT BIT(0) + +/** + * em_efficient_table - Efficient em_perf_state lookup table. + * @table: Lookup table for the efficient em_perf_state + * @min_state: Minimum efficient state for the perf_domain + * @max_state: Maximum state for the perf_domain + * @min_freq: Minimum efficient frequency for the perf_domain + * @max_freq: Maximum frequency for the perf_domain + * @shift: Shift value used to resolve the lookup table + * + * Resolving a frequency to an efficient em_perf_state is as follows: + * + * 1. Check frequency against min_freq and max_freq + * 2. idx = (frequency - min_freq) >> shift; + * 3. idx = table[idx].frequency < frequency ? idx + 1 : idx; + * 4. table[idx] + * + * 3. Intends to resolve undershoot, when an OPP is in the middle of the + * lookup table bin. 
+ */ +struct em_efficient_table { + struct em_perf_state **table; + struct em_perf_state *min_state; + struct em_perf_state *max_state; + unsigned long min_freq; + unsigned long max_freq; + int shift; }; /** * em_perf_domain - Performance domain * @table: List of performance states, in ascending order + * @efficient_table: List of efficient performance states, in a lookup + * table. This is filled only for CPU devices. * @nr_perf_states: Number of performance states - * @milliwatts: Flag indicating the power values are in milli-Watts - * or some other scale. + * @flags: See "em_perf_domain flags" * @cpus: Cpumask covering the CPUs of the domain. It's here * for performance reasons to avoid potential cache * misses during energy calculations in the scheduler @@ -43,11 +84,24 @@ struct em_perf_state { */ struct em_perf_domain { struct em_perf_state *table; + struct em_efficient_table efficient_table; int nr_perf_states; - int milliwatts; + int flags; unsigned long cpus[]; }; +/* + * em_perf_domain flags: + * + * EM_PERF_DOMAIN_MILLIWATTS: The power values are in milli-Watts or some + * other scale. + * + * EM_PERF_DOMAIN_INEFFICIENCIES: This perf domain contains inefficient perf + * states. + */ +#define EM_PERF_DOMAIN_MILLIWATTS BIT(0) +#define EM_PERF_DOMAIN_INEFFICIENCIES BIT(1) + #define em_span_cpus(em) (to_cpumask((em)->cpus)) #ifdef CONFIG_ENERGY_MODEL @@ -86,6 +140,63 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states, void em_dev_unregister_perf_domain(struct device *dev); /** + * em_pd_get_efficient_state() - Get an efficient performance state from the EM + * @pd : Performance domain for which we want an efficient frequency + * @freq : Frequency to map with the EM + * + * This function must be used only for CPU devices. It is called from the + * scheduler code quite frequently and as a consequence doesn't implement any + * check. + * + * Return: An efficient performance state, high enough to meet @freq + * requirement. 
+ */ +static inline +struct em_perf_state *em_pd_get_efficient_state(struct em_perf_domain *pd, + unsigned long freq) +{ + struct em_efficient_table *efficients = &pd->efficient_table; + int idx; + + if (freq <= efficients->min_freq) + return efficients->min_state; + + if (freq >= efficients->max_freq) + return efficients->max_state; + + idx = (freq - efficients->min_freq) >> efficients->shift; + + /* Undershoot due to the bin size. Use the higher perf_state */ + if (efficients->table[idx]->frequency < freq) + idx++; + + return efficients->table[idx]; +} + +/** + * em_pd_get_efficient_freq() - Get the efficient frequency from the EM + * @pd : Performance domain for which we want an efficient frequency + * @freq : Frequency to map with the EM + * + * This function will return @freq if no inefficiencies have been found for + * that @pd. This is to avoid a useless lookup table resolution. + * + * Return: An efficient frequency, high enough to meet @freq requirement. + */ +static inline unsigned long em_pd_get_efficient_freq(struct em_perf_domain *pd, + unsigned long freq) +{ + struct em_perf_state *ps; + + if (!pd || !(pd->flags & EM_PERF_DOMAIN_INEFFICIENCIES)) + return freq; + + ps = em_pd_get_efficient_state(pd, freq); + + return ps->frequency; +} + +/** * em_cpu_energy() - Estimates the energy consumed by the CPUs of a performance domain * @pd : performance domain for which energy has to be estimated @@ -104,7 +215,7 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd, { unsigned long freq, scale_cpu; struct em_perf_state *ps; - int i, cpu; + int cpu; if (!sum_util) return 0; @@ -123,11 +234,7 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd, * Find the lowest performance state of the Energy Model above the * requested frequency. 
*/ - for (i = 0; i < pd->nr_perf_states; i++) { - ps = &pd->table[i]; - if (ps->frequency >= freq) - break; - } + ps = em_pd_get_efficient_state(pd, freq); /* * The capacity of a CPU in the domain at the performance state (ps) @@ -217,6 +324,12 @@ static inline int em_pd_nr_perf_states(struct em_perf_domain *pd) { return 0; } + +static inline unsigned long +em_pd_get_efficient_freq(struct em_perf_domain *pd, unsigned long freq) +{ + return freq; +} #endif #endif diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c index 1358fa4..fcc64eb 100644 --- a/kernel/power/energy_model.c +++ b/kernel/power/energy_model.c @@ -2,7 +2,7 @@ /* * Energy Model of devices * - * Copyright (c) 2018-2020, Arm ltd. + * Copyright (c) 2018-2021, Arm ltd. * Written by: Quentin Perret, Arm ltd. * Improvements provided by: Lukasz Luba, Arm ltd. */ @@ -42,6 +42,7 @@ static void em_debug_create_ps(struct em_perf_state *ps, struct dentry *pd) debugfs_create_ulong("frequency", 0444, d, &ps->frequency); debugfs_create_ulong("power", 0444, d, &ps->power); debugfs_create_ulong("cost", 0444, d, &ps->cost); + debugfs_create_ulong("inefficient", 0444, d, &ps->flags); } static int em_debug_cpus_show(struct seq_file *s, void *unused) @@ -55,7 +56,8 @@ DEFINE_SHOW_ATTRIBUTE(em_debug_cpus); static int em_debug_units_show(struct seq_file *s, void *unused) { struct em_perf_domain *pd = s->private; - char *units = pd->milliwatts ? "milliWatts" : "bogoWatts"; + char *units = (pd->flags & EM_PERF_DOMAIN_MILLIWATTS) ? 
+ "milliWatts" : "bogoWatts"; seq_printf(s, "%s\n", units); @@ -107,7 +109,6 @@ static void em_debug_remove_pd(struct device *dev) {} static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd, int nr_states, struct em_data_callback *cb) { - unsigned long opp_eff, prev_opp_eff = ULONG_MAX; unsigned long power, freq, prev_freq = 0; struct em_perf_state *table; int i, ret; @@ -153,18 +154,6 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd, table[i].power = power; table[i].frequency = prev_freq = freq; - - /* - * The hertz/watts efficiency ratio should decrease as the - * frequency grows on sane platforms. But this isn't always - * true in practice so warn the user if a higher OPP is more - * power efficient than a lower one. - */ - opp_eff = freq / power; - if (opp_eff >= prev_opp_eff) - dev_dbg(dev, "EM: hertz/watts ratio non-monotonically decreasing: em_perf_state %d >= em_perf_state%d\n", - i, i - 1); - prev_opp_eff = opp_eff; } /* Compute the cost of each performance state. */ @@ -184,12 +173,88 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd, return -EINVAL; } +static inline int em_create_efficient_table(struct em_perf_domain *pd) +{ + unsigned long min_freq, max_freq, min_delta = ULONG_MAX; + struct em_perf_state *prev = NULL, *ps, *min_perf_state = NULL; + int i, j, nr_entries, shift = 0; + + max_freq = pd->table[pd->nr_perf_states - 1].frequency; + + for (i = 0 ; i < pd->nr_perf_states; i++) { + ps = &pd->table[i]; + if (ps->flags & EM_PERF_STATE_INEFFICIENT) + continue; + + if (!min_perf_state) + min_perf_state = ps; + + if (prev) { + unsigned long delta = ps->frequency - prev->frequency; + + if (delta < min_delta) + min_delta = delta; + } + + prev = ps; + } + min_freq = min_perf_state->frequency; + + /* + * Use, as a bin size, a power of two. This allows lookup table + * resolution with a quick shift, instead of a division. 
Also, use a + minimum of 1024kHz to avoid creating too many entries in the table for + the very unlikely case where two efficient OPPs have a small + frequency delta. + */ + min_delta = rounddown_pow_of_two(max(min_delta, 1024UL)); + shift = ilog2(min_delta); + /* +1 for the division remainder below */ + nr_entries = ((max_freq - min_freq) >> shift) + 1; + + pd->efficient_table.table = kcalloc(nr_entries, + sizeof(*pd->efficient_table.table), + GFP_KERNEL); + if (!pd->efficient_table.table) + return -ENOMEM; + + pd->efficient_table.min_freq = min_freq; + pd->efficient_table.min_state = min_perf_state; + pd->efficient_table.max_freq = max_freq; + pd->efficient_table.max_state = &pd->table[pd->nr_perf_states - 1]; + pd->efficient_table.shift = shift; + + for (i = 0; i < nr_entries; i++) { + unsigned long lower_bin_bound = min_freq + ((1 << shift) * i); + + for (j = 0; j < pd->nr_perf_states; j++) { + ps = &pd->table[j]; + + /* + * The first OPP that covers the lower bound of the bin + * is the right one. It ensures we'll never overshoot. + * Undershoot must be handled during the lookup table + * resolution. 
+ */ + if (ps->flags & EM_PERF_STATE_INEFFICIENT || + ps->frequency < lower_bin_bound) + continue; + + pd->efficient_table.table[i] = ps; + break; + } + } + + return 0; +} + static int em_create_pd(struct device *dev, int nr_states, struct em_data_callback *cb, cpumask_t *cpus) { struct em_perf_domain *pd; struct device *cpu_dev; - int cpu, ret; + unsigned int prev_cost; + int cpu, ret, i; if (_is_cpu_device(dev)) { pd = kzalloc(sizeof(*pd) + cpumask_size(), GFP_KERNEL); @@ -209,11 +274,35 @@ static int em_create_pd(struct device *dev, int nr_states, return ret; } - if (_is_cpu_device(dev)) + if (_is_cpu_device(dev)) { + /* Identify inefficient perf states */ + i = pd->nr_perf_states - 1; + prev_cost = pd->table[i].cost; + for (--i; i >= 0; i--) { + if (pd->table[i].cost >= prev_cost) { + pd->table[i].flags = EM_PERF_STATE_INEFFICIENT; + pd->flags |= EM_PERF_DOMAIN_INEFFICIENCIES; + dev_dbg(dev, + "EM: pd%d OPP:%lu is inefficient\n", + cpumask_first(to_cpumask(pd->cpus)), + pd->table[i].frequency); + continue; + } + prev_cost = pd->table[i].cost; + } + + ret = em_create_efficient_table(pd); + if (ret) { + kfree(pd->table); + kfree(pd); + return ret; + } + for_each_cpu(cpu, cpus) { cpu_dev = get_cpu_device(cpu); cpu_dev->em_pd = pd; } + } dev->em_pd = pd; @@ -333,7 +422,8 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states, if (ret) goto unlock; - dev->em_pd->milliwatts = milliwatts; + if (milliwatts) + dev->em_pd->flags |= EM_PERF_DOMAIN_MILLIWATTS; em_debug_create_pd(dev); dev_info(dev, "EM: created perf domain\n"); diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c index 7cc2e11..3eefd4c 100644 --- a/kernel/sched/cpufreq_schedutil.c +++ b/kernel/sched/cpufreq_schedutil.c @@ -10,6 +10,7 @@ #include "sched.h" +#include <linux/energy_model.h> #include <linux/sched/cpufreq.h> #include <trace/events/power.h> @@ -164,6 +165,9 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy, freq = 
map_util_freq(util, freq, max); + /* Avoid inefficient performance states */ + freq = em_pd_get_efficient_freq(em_cpu_get(policy->cpu), freq); + if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update) return sg_policy->next_freq; -- 2.7.4 ^ permalink raw reply related [flat|nested] 20+ messages in thread
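[Editor's note] To make the lookup-table mechanics of the patch concrete, here is a standalone userspace sketch of the same idea: a power-of-two bin size no larger than the smallest inter-state frequency delta, an index computed with a shift instead of a division, and an idx++ to correct undershoot. Names are illustrative, and unlike the patch, the 1024 kHz floor on the bin size is omitted:

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Illustrative stand-ins for the kernel structures. */
struct state { unsigned long freq; };  /* efficient states, ascending */

struct lut {
    struct state **table;
    struct state *min_state, *max_state;
    unsigned long min_freq, max_freq;
    int shift;
};

/*
 * Pick a power-of-two bin size no larger than the smallest frequency
 * delta, so that each bin holds at most one state boundary. This is
 * shift = ilog2(rounddown_pow_of_two(min_delta)) in kernel terms.
 */
static int lut_build(struct lut *l, struct state *s, int n)
{
    unsigned long min_delta = ULONG_MAX;
    int i, j, nr;

    l->min_freq = s[0].freq;
    l->max_freq = s[n - 1].freq;
    l->min_state = &s[0];
    l->max_state = &s[n - 1];

    for (i = 1; i < n; i++)
        if (s[i].freq - s[i - 1].freq < min_delta)
            min_delta = s[i].freq - s[i - 1].freq;

    l->shift = 0;
    while ((2UL << l->shift) <= min_delta)
        l->shift++;

    nr = (int)((l->max_freq - l->min_freq) >> l->shift) + 1;
    l->table = calloc(nr, sizeof(*l->table));
    if (!l->table)
        return -1;

    /* Each bin stores the first state covering its lower bound. */
    for (i = 0; i < nr; i++) {
        unsigned long lower = l->min_freq + ((1UL << l->shift) * i);

        for (j = 0; j < n; j++)
            if (s[j].freq >= lower) {
                l->table[i] = &s[j];
                break;
            }
    }
    return 0;
}

/* O(1) resolution: lowest efficient state whose frequency meets @freq. */
static struct state *lut_resolve(struct lut *l, unsigned long freq)
{
    int idx;

    if (freq <= l->min_freq)
        return l->min_state;
    if (freq >= l->max_freq)
        return l->max_state;

    idx = (int)((freq - l->min_freq) >> l->shift);
    /* Undershoot: the bin's state may sit below @freq within the bin. */
    if (l->table[idx]->freq < freq)
        idx++;
    return l->table[idx];
}
```

Because the bin size never exceeds the smallest delta between efficient states, the idx++ correction can never skip over a state, so the result is always the lowest state meeting the request.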
* Re: [PATCH] PM / EM: Inefficient OPPs detection 2021-04-08 17:10 ` Vincent Donnefort @ 2021-04-15 13:12 ` Quentin Perret 2021-04-15 14:12 ` Vincent Donnefort 2021-04-22 15:36 ` Vincent Donnefort 2021-04-15 13:16 ` Quentin Perret 2021-04-22 17:26 ` Lukasz Luba 2 siblings, 2 replies; 20+ messages in thread From: Quentin Perret @ 2021-04-15 13:12 UTC (permalink / raw) To: Vincent Donnefort Cc: peterz, rjw, viresh.kumar, vincent.guittot, linux-kernel, ionela.voinescu, lukasz.luba, dietmar.eggemann Hi Vincent, On Thursday 08 Apr 2021 at 18:10:29 (+0100), Vincent Donnefort wrote: > Some SoCs, such as the sd855 have OPPs within the same performance domain, > whose cost is higher than others with a higher frequency. Even though > those OPPs are interesting from a cooling perspective, it makes no sense > to use them when the device can run at full capacity. Those OPPs handicap > the performance domain, when choosing the most energy-efficient CPU and > are wasting energy. They are inefficient. > > Hence, add support for such OPPs to the Energy Model, which creates for > each OPP a performance state. The Energy Model can now be read using the > regular table, which contains all performance states available, or using > an efficient table, where inefficient performance states (and by > extension, inefficient OPPs) have been removed. > > Currently, the efficient table is used in two paths. Schedutil, and > find_energy_efficient_cpu(). We have to modify both paths in the same > patch so they stay synchronized. The thermal framework still relies on > the original table and hence, DevFreq devices won't create the efficient > table. > > As used in the hot-path, the efficient table is a lookup table, generated > dynamically when the perf domain is created. The complexity of searching > a performance state is hence changed from O(n) to O(1). This also > speeds-up em_cpu_energy() even if no inefficient OPPs have been found. Interesting. 
Do you have measurements showing the benefits on wake-up duration? I remember doing so by hacking the wake-up path to force tasks into feec()/compute_energy() even when overutilized, and then running hackbench. Maybe something like that would work for you? Just want to make sure we actually need all that complexity -- while it's good to reduce the asymptotic complexity, we're looking at a rather small problem (max 30 OPPs or so I expect?), so other effects may be dominating. Simply skipping inefficient OPPs could be implemented in a much simpler way I think. Thanks, Quentin ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] PM / EM: Inefficient OPPs detection 2021-04-15 13:12 ` Quentin Perret @ 2021-04-15 14:12 ` Vincent Donnefort 2021-04-15 15:04 ` Quentin Perret 2021-04-22 15:36 ` Vincent Donnefort 1 sibling, 1 reply; 20+ messages in thread From: Vincent Donnefort @ 2021-04-15 14:12 UTC (permalink / raw) To: Quentin Perret Cc: peterz, rjw, viresh.kumar, vincent.guittot, linux-kernel, ionela.voinescu, lukasz.luba, dietmar.eggemann On Thu, Apr 15, 2021 at 01:12:05PM +0000, Quentin Perret wrote: > Hi Vincent, > > On Thursday 08 Apr 2021 at 18:10:29 (+0100), Vincent Donnefort wrote: > > Some SoCs, such as the sd855 have OPPs within the same performance domain, > > whose cost is higher than others with a higher frequency. Even though > > those OPPs are interesting from a cooling perspective, it makes no sense > > to use them when the device can run at full capacity. Those OPPs handicap > > the performance domain, when choosing the most energy-efficient CPU and > > are wasting energy. They are inefficient. > > > > Hence, add support for such OPPs to the Energy Model, which creates for > > each OPP a performance state. The Energy Model can now be read using the > > regular table, which contains all performance states available, or using > > an efficient table, where inefficient performance states (and by > > extension, inefficient OPPs) have been removed. > > > > Currently, the efficient table is used in two paths. Schedutil, and > > find_energy_efficient_cpu(). We have to modify both paths in the same > > patch so they stay synchronized. The thermal framework still relies on > > the original table and hence, DevFreq devices won't create the efficient > > table. > > > > As used in the hot-path, the efficient table is a lookup table, generated > > dynamically when the perf domain is created. The complexity of searching > > a performance state is hence changed from O(n) to O(1). This also > > speeds-up em_cpu_energy() even if no inefficient OPPs have been found. 
> > Interesting. Do you have measurements showing the benefits on wake-up > duration? I remember doing so by hacking the wake-up path to force tasks > into feec()/compute_energy() even when overutilized, and then running > hackbench. Maybe something like that would work for you? I'll give it a try and see if I get improved numbers. > > Just want to make sure we actually need all that complexity -- while > it's good to reduce the asymptotic complexity, we're looking at a rather > small problem (max 30 OPPs or so I expect?), so other effects may be > dominating. Simply skipping inefficient OPPs could be implemented in a > much simpler way I think. I could indeed just skip the perf state if marked as inefficient. But the idea was to avoid bringing another for loop into this hot path. Also, not covered by this patch, but we could probably get rid of the EM complexity limit, as the table resolution is way faster with this change. > > Thanks, > Quentin ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] PM / EM: Inefficient OPPs detection 2021-04-15 14:12 ` Vincent Donnefort @ 2021-04-15 15:04 ` Quentin Perret 2021-04-15 15:27 ` Vincent Donnefort 0 siblings, 1 reply; 20+ messages in thread From: Quentin Perret @ 2021-04-15 15:04 UTC (permalink / raw) To: Vincent Donnefort Cc: peterz, rjw, viresh.kumar, vincent.guittot, linux-kernel, ionela.voinescu, lukasz.luba, dietmar.eggemann On Thursday 15 Apr 2021 at 15:12:08 (+0100), Vincent Donnefort wrote: > On Thu, Apr 15, 2021 at 01:12:05PM +0000, Quentin Perret wrote: > > Hi Vincent, > > > > On Thursday 08 Apr 2021 at 18:10:29 (+0100), Vincent Donnefort wrote: > > > Some SoCs, such as the sd855 have OPPs within the same performance domain, > > > whose cost is higher than others with a higher frequency. Even though > > > those OPPs are interesting from a cooling perspective, it makes no sense > > > to use them when the device can run at full capacity. Those OPPs handicap > > > the performance domain, when choosing the most energy-efficient CPU and > > > are wasting energy. They are inefficient. > > > > > > Hence, add support for such OPPs to the Energy Model, which creates for > > > each OPP a performance state. The Energy Model can now be read using the > > > regular table, which contains all performance states available, or using > > > an efficient table, where inefficient performance states (and by > > > extension, inefficient OPPs) have been removed. > > > > > > Currently, the efficient table is used in two paths. Schedutil, and > > > find_energy_efficient_cpu(). We have to modify both paths in the same > > > patch so they stay synchronized. The thermal framework still relies on > > > the original table and hence, DevFreq devices won't create the efficient > > > table. > > > > > > As used in the hot-path, the efficient table is a lookup table, generated > > > dynamically when the perf domain is created. The complexity of searching > > > a performance state is hence changed from O(n) to O(1). 
This also > > > speeds-up em_cpu_energy() even if no inefficient OPPs have been found. > > > > Interesting. Do you have measurements showing the benefits on wake-up > > duration? I remember doing so by hacking the wake-up path to force tasks > > into feec()/compute_energy() even when overutilized, and then running > > hackbench. Maybe something like that would work for you? > > I'll give a try and see if I get improved numbers. > > > > > Just want to make sure we actually need all that complexity -- while > > it's good to reduce the asymptotic complexity, we're looking at a rather > > small problem (max 30 OPPs or so I expect?), so other effects may be > > dominating. Simply skipping inefficient OPPs could be implemented in a > > much simpler way I think. > > I could indeed just skip the perf state if marked as ineffective. But the idea > was to avoid bringing another for loop in this hot-path. Right, though it would just extend a little bit the existing loop, so the overhead is unlikely to be noticeable. > Also, not covered by this patch but probably we could get rid of the EM > complexity limit as the table resolution is way faster with this change. Probably yeah. I was considering removing it since eb92692b2544 ("sched/fair: Speed-up energy-aware wake-ups") but ended up keeping it as it's entirely untested on large systems. But maybe we can reconsider. Thanks, Quentin ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] PM / EM: Inefficient OPPs detection 2021-04-15 15:04 ` Quentin Perret @ 2021-04-15 15:27 ` Vincent Donnefort 0 siblings, 0 replies; 20+ messages in thread From: Vincent Donnefort @ 2021-04-15 15:27 UTC (permalink / raw) To: Quentin Perret Cc: peterz, rjw, viresh.kumar, vincent.guittot, linux-kernel, ionela.voinescu, lukasz.luba, dietmar.eggemann On Thu, Apr 15, 2021 at 03:04:34PM +0000, Quentin Perret wrote: > On Thursday 15 Apr 2021 at 15:12:08 (+0100), Vincent Donnefort wrote: > > On Thu, Apr 15, 2021 at 01:12:05PM +0000, Quentin Perret wrote: > > > Hi Vincent, > > > > > > On Thursday 08 Apr 2021 at 18:10:29 (+0100), Vincent Donnefort wrote: > > > > Some SoCs, such as the sd855 have OPPs within the same performance domain, > > > > whose cost is higher than others with a higher frequency. Even though > > > > those OPPs are interesting from a cooling perspective, it makes no sense > > > > to use them when the device can run at full capacity. Those OPPs handicap > > > > the performance domain, when choosing the most energy-efficient CPU and > > > > are wasting energy. They are inefficient. > > > > > > > > Hence, add support for such OPPs to the Energy Model, which creates for > > > > each OPP a performance state. The Energy Model can now be read using the > > > > regular table, which contains all performance states available, or using > > > > an efficient table, where inefficient performance states (and by > > > > extension, inefficient OPPs) have been removed. > > > > > > > > Currently, the efficient table is used in two paths. Schedutil, and > > > > find_energy_efficient_cpu(). We have to modify both paths in the same > > > > patch so they stay synchronized. The thermal framework still relies on > > > > the original table and hence, DevFreq devices won't create the efficient > > > > table. > > > > > > > > As used in the hot-path, the efficient table is a lookup table, generated > > > > dynamically when the perf domain is created. 
The complexity of searching > > > > a performance state is hence changed from O(n) to O(1). This also > > > > speeds-up em_cpu_energy() even if no inefficient OPPs have been found. > > > > > > Interesting. Do you have measurements showing the benefits on wake-up > > > duration? I remember doing so by hacking the wake-up path to force tasks > > > into feec()/compute_energy() even when overutilized, and then running > > > hackbench. Maybe something like that would work for you? > > > > I'll give a try and see if I get improved numbers. > > > > > > > > Just want to make sure we actually need all that complexity -- while > > > it's good to reduce the asymptotic complexity, we're looking at a rather > > > small problem (max 30 OPPs or so I expect?), so other effects may be > > > dominating. Simply skipping inefficient OPPs could be implemented in a > > > much simpler way I think. > > > > I could indeed just skip the perf state if marked as ineffective. But the idea > > was to avoid bringing another for loop in this hot-path. > > Right, though it would just extend a little bit the existing loop, so > the overhead is unlikely to be noticeable. In the case where we let the cpufreq_table do the resolution, it's a whole new loop that we would bring. In the case where we rely only on the EM resolution and bypass the cpufreq_table, though, it would be even. But with the look-up table, we're winning everywhere :-) Anyway I'll see if I can measure any improvement here. -- Vincent > > > Also, not covered by this patch but probably we could get rid of the EM > > complexity limit as the table resolution is way faster with this change. > > Probably yeah. I was considering removing it since eb92692b2544 > ("sched/fair: Speed-up energy-aware wake-ups") but ended up keeping it > as it's entirely untested on large systems. But maybe we can reconsider. > > Thanks, > Quentin ^ permalink raw reply [flat|nested] 20+ messages in thread
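[Editor's note] For comparison, the simpler O(n) alternative discussed in this exchange — extending the existing ascending scan to skip flagged states instead of building a lookup table — might look like this userspace sketch (types and names are hypothetical):

```c
#include <assert.h>

/* Hypothetical performance-state descriptor, ascending frequency. */
struct ps { unsigned long freq; int inefficient; };

/*
 * O(n) alternative: keep the existing linear scan, skip states flagged
 * inefficient, and fall back to the highest state if none matches.
 */
static struct ps *find_state_linear(struct ps *table, int n,
                                    unsigned long freq)
{
    struct ps *ps = &table[n - 1];
    int i;

    for (i = 0; i < n; i++) {
        if (table[i].inefficient)
            continue;
        if (table[i].freq >= freq) {
            ps = &table[i];
            break;
        }
    }
    return ps;
}
```

The trade-off debated above is exactly this: the scan stays trivial and adds only one branch per iteration, whereas the lookup table buys O(1) resolution at the cost of building and storing an extra array per perf domain.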
* Re: [PATCH] PM / EM: Inefficient OPPs detection
  2021-04-15 13:12 ` Quentin Perret
  2021-04-15 14:12   ` Vincent Donnefort
@ 2021-04-22 15:36   ` Vincent Donnefort
  2021-04-23 16:14     ` Quentin Perret
  1 sibling, 1 reply; 20+ messages in thread
From: Vincent Donnefort @ 2021-04-22 15:36 UTC (permalink / raw)
To: Quentin Perret
Cc: peterz, rjw, viresh.kumar, vincent.guittot, linux-kernel,
	ionela.voinescu, lukasz.luba, dietmar.eggemann

> > As used in the hot-path, the efficient table is a lookup table, generated
> > dynamically when the perf domain is created. The complexity of searching
> > a performance state is hence changed from O(n) to O(1). This also
> > speeds-up em_cpu_energy() even if no inefficient OPPs have been found.
> 
> Interesting. Do you have measurements showing the benefits on wake-up
> duration? I remember doing so by hacking the wake-up path to force tasks
> into feec()/compute_energy() even when overutilized, and then running
> hackbench. Maybe something like that would work for you?
> 
> Just want to make sure we actually need all that complexity -- while
> it's good to reduce the asymptotic complexity, we're looking at a rather
> small problem (max 30 OPPs or so I expect?), so other effects may be
> dominating. Simply skipping inefficient OPPs could be implemented in a
> much simpler way I think.
> 
> Thanks,
> Quentin

On the Pixel4, I used rt-app to generate a task whose duty cycle gets
higher in each phase. Then, for each rt-app task placement, I measured how
long find_energy_efficient_cpu() took to run. I repeated the operation
several times to increase the count.
Here's what I've got: ┌────────┬─────────────┬───────┬────────────────┬───────────────┬───────────────┐ │ Phase │ duty-cycle │ CPU │ w/o LUT │ w/ LUT │ │ │ │ │ ├────────┬───────┼───────┬───────┤ Diff │ │ │ │ │ Mean │ count │ Mean │ count │ │ ├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤ │ 0 │ 12.5% │ Little│ 10791 │ 3124 │ 10657 │ 3741 │ -1.2% -134ns │ ├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤ │ 1 │ 25% │ Mid │ 2924 │ 3097 │ 2894 │ 3740 │ -1% -30ns │ ├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤ │ 2 │ 37.5% │ Mid │ 2207 │ 3104 │ 2162 │ 3740 │ -2% -45ns │ ├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤ │ 3 │ 50% │ Mid │ 1897 │ 3119 │ 1864 │ 3717 │ -1.7% -33ns │ ├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤ │ │ │ Mid │ 1700 │ 396 │ 1609 │ 1232 │ -5.4% -91ns │ │ 4 │ 62.5% ├───────┼────────┼───────┼───────┼───────┼───────────────┤ │ │ │ Big │ 1187 │ 2729 │ 1129 │ 2518 │ -4.9% -58ns │ ├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤ │ 5 │ 75% │ Big │ 984 │ 3124 │ 900 │ 3693 │ -8.5% -84ns │ └────────┴─────────────┴───────┴────────┴───────┴───────┴───────┴───────────────┘ Notice: * The CPU column describes which CPU ran the find_energy_efficient() function. * I modified my patch so that no inefficient OPPs are reported. This is to have a fairer comparison between the original table walk and the lookup table. * I removed from the table results that didn't have enough count to be statistically significant. -- Vincent. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] PM / EM: Inefficient OPPs detection 2021-04-22 15:36 ` Vincent Donnefort @ 2021-04-23 16:14 ` Quentin Perret 2021-04-28 14:46 ` Vincent Donnefort 0 siblings, 1 reply; 20+ messages in thread From: Quentin Perret @ 2021-04-23 16:14 UTC (permalink / raw) To: Vincent Donnefort Cc: peterz, rjw, viresh.kumar, vincent.guittot, linux-kernel, ionela.voinescu, lukasz.luba, dietmar.eggemann On Thursday 22 Apr 2021 at 16:36:44 (+0100), Vincent Donnefort wrote: > > > As used in the hot-path, the efficient table is a lookup table, generated > > > dynamically when the perf domain is created. The complexity of searching > > > a performance state is hence changed from O(n) to O(1). This also > > > speeds-up em_cpu_energy() even if no inefficient OPPs have been found. > > > > Interesting. Do you have measurements showing the benefits on wake-up > > duration? I remember doing so by hacking the wake-up path to force tasks > > into feec()/compute_energy() even when overutilized, and then running > > hackbench. Maybe something like that would work for you? > > > > Just want to make sure we actually need all that complexity -- while > > it's good to reduce the asymptotic complexity, we're looking at a rather > > small problem (max 30 OPPs or so I expect?), so other effects may be > > dominating. Simply skipping inefficient OPPs could be implemented in a > > much simpler way I think. > > > > Thanks, > > Quentin > > On the Pixel4, I used rt-app to generate a task whom duty cycle is getting > higher for each phase. Then for each rt-app task placement, I measured how long > find_energy_efficient_cpu() took to run. I repeated the operation several > times to increase the count. 
Here's what I've got: > > ┌────────┬─────────────┬───────┬────────────────┬───────────────┬───────────────┐ > │ Phase │ duty-cycle │ CPU │ w/o LUT │ w/ LUT │ │ > │ │ │ ├────────┬───────┼───────┬───────┤ Diff │ > │ │ │ │ Mean │ count │ Mean │ count │ │ > ├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤ > │ 0 │ 12.5% │ Little│ 10791 │ 3124 │ 10657 │ 3741 │ -1.2% -134ns │ > ├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤ > │ 1 │ 25% │ Mid │ 2924 │ 3097 │ 2894 │ 3740 │ -1% -30ns │ > ├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤ > │ 2 │ 37.5% │ Mid │ 2207 │ 3104 │ 2162 │ 3740 │ -2% -45ns │ > ├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤ > │ 3 │ 50% │ Mid │ 1897 │ 3119 │ 1864 │ 3717 │ -1.7% -33ns │ > ├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤ > │ │ │ Mid │ 1700 │ 396 │ 1609 │ 1232 │ -5.4% -91ns │ > │ 4 │ 62.5% ├───────┼────────┼───────┼───────┼───────┼───────────────┤ > │ │ │ Big │ 1187 │ 2729 │ 1129 │ 2518 │ -4.9% -58ns │ > ├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤ > │ 5 │ 75% │ Big │ 984 │ 3124 │ 900 │ 3693 │ -8.5% -84ns │ > └────────┴─────────────┴───────┴────────┴───────┴───────┴───────┴───────────────┘ Thanks for that. Do you have the stddev handy? > Notice: > > * The CPU column describes which CPU ran the find_energy_efficient() > function. > > * I modified my patch so that no inefficient OPPs are reported. This is to > have a fairer comparison between the original table walk and the lookup > table. You mean to avoid the impact of the frequency selection itself? Maybe pinning the frequencies in the cpufreq policy could do? > > * I removed from the table results that didn't have enough count to be > statistically significant. 
Anyways, this looks like a small but consistent gain throughout, so it's a win for the LUT :) Thanks, Quentin ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] PM / EM: Inefficient OPPs detection 2021-04-23 16:14 ` Quentin Perret @ 2021-04-28 14:46 ` Vincent Donnefort 2021-05-20 11:12 ` Quentin Perret 0 siblings, 1 reply; 20+ messages in thread From: Vincent Donnefort @ 2021-04-28 14:46 UTC (permalink / raw) To: Quentin Perret Cc: peterz, rjw, viresh.kumar, vincent.guittot, linux-kernel, ionela.voinescu, lukasz.luba, dietmar.eggemann > > > > On the Pixel4, I used rt-app to generate a task whom duty cycle is getting > > higher for each phase. Then for each rt-app task placement, I measured how long > > find_energy_efficient_cpu() took to run. I repeated the operation several > > times to increase the count. Here's what I've got: > > > > ┌────────┬─────────────┬───────┬────────────────┬───────────────┬───────────────┐ > > │ Phase │ duty-cycle │ CPU │ w/o LUT │ w/ LUT │ │ > > │ │ │ ├────────┬───────┼───────┬───────┤ Diff │ > > │ │ │ │ Mean │ count │ Mean │ count │ │ > > ├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤ > > │ 0 │ 12.5% │ Little│ 10791 │ 3124 │ 10657 │ 3741 │ -1.2% -134ns │ > > ├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤ > > │ 1 │ 25% │ Mid │ 2924 │ 3097 │ 2894 │ 3740 │ -1% -30ns │ > > ├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤ > > │ 2 │ 37.5% │ Mid │ 2207 │ 3104 │ 2162 │ 3740 │ -2% -45ns │ > > ├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤ > > │ 3 │ 50% │ Mid │ 1897 │ 3119 │ 1864 │ 3717 │ -1.7% -33ns │ > > ├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤ > > │ │ │ Mid │ 1700 │ 396 │ 1609 │ 1232 │ -5.4% -91ns │ > > │ 4 │ 62.5% ├───────┼────────┼───────┼───────┼───────┼───────────────┤ > > │ │ │ Big │ 1187 │ 2729 │ 1129 │ 2518 │ -4.9% -58ns │ > > ├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤ > > │ 5 │ 75% │ Big │ 984 │ 3124 │ 900 │ 3693 │ -8.5% -84ns │ > > 
└────────┴─────────────┴───────┴────────┴───────┴───────┴───────┴───────────────┘
>
> Thanks for that. Do you have the stddev handy?

I do, it shows that the distribution is quite wide. I also have a 95%
confidence interval, as follows:

                     w/o LUT               w/ LUT

                   Mean      std        Mean      std

Phase0:          10791+/-79  2262     10657+/-71  2240  [1]
Phase1:           2924+/-19   529      2894+/-16   513  [1]
Phase2:           2207+/-19   535      2162+/-17   515
Phase3:           1897+/-18   504      1864+/-17   515  [1]
Phase4: Mid CPU   1700+/-46   463      1609+/-26   458
Phase4: Big CPU   1187+/-15   407      1129+/-15   385
Phase5:            987+/-14   395       900+/-12   365

[1] I included these results originally as the p-value for the test I used
showed we can confidently reject the null hypothesis that the two samples
come from the same distribution... However, the confidence intervals for the
means overlap. It is therefore hard to draw a conclusion for those phases.

Interestingly, it shows the distribution is slightly narrower with the LUT,
presumably because the LUT relies less on the caches than the original table
walk does.

>
> > Notice:
> >
> > * The CPU column describes which CPU ran the find_energy_efficient()
> >   function.
> >
> > * I modified my patch so that no inefficient OPPs are reported. This is to
> >   have a fairer comparison between the original table walk and the lookup
> >   table.
>
> You mean to avoid the impact of the frequency selection itself? Maybe
> pinning the frequencies in the cpufreq policy could do?

Yes, it could have worked too, and maybe it would even have been better, as it
would have removed the running-frequency variations.

>
> >
> > * I removed from the table results that didn't have enough count to be
> >   statistically significant.
>
> Anyways, this looks like a small but consistent gain throughout, so it's a
> win for the LUT :)
>
> Thanks,
> Quentin

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [PATCH] PM / EM: Inefficient OPPs detection
  2021-04-28 14:46       ` Vincent Donnefort
@ 2021-05-20 11:12         ` Quentin Perret
  0 siblings, 0 replies; 20+ messages in thread
From: Quentin Perret @ 2021-05-20 11:12 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: peterz, rjw, viresh.kumar, vincent.guittot, linux-kernel,
      ionela.voinescu, lukasz.luba, dietmar.eggemann

Hi Vincent,

Apologies for the late reply.

On Wednesday 28 Apr 2021 at 15:46:10 (+0100), Vincent Donnefort wrote:
> I do, it shows that the distribution is quite wide. I also have a 95% confidence
> interval, as follow:
>
>                      w/o LUT               W/ LUT
>
>                    Mean      std        Mean      std
>
> Phase0:          10791+/-79  2262     10657+/-71  2240  [1]
> Phase1:           2924+/-19   529      2894+/-16   513  [1]
> Phase2:           2207+/-19   535      2162+/-17   515
> Phase3:           1897+/-18   504      1864+/-17   515  [1]
> Phase4: Mid CPU   1700+/-46   463      1609+/-26   458
> Phase4: Big CPU   1187+/-15   407      1129+/-15   385
> Phase5:            987+/-14   395       900+/-12   365
>
>
> [1] I included these results originally as the p-value for the test I used
> showed we can reject confidently the null hypothesis that the 2 samples are
> coming from the same distribution... However, the confidence intervals for
> the mean overlaps. It is then complicated to conclude for those phases.

Aha, thanks for sharing. So yes, it's not all that obvious that we need the
extra complexity of the LUT there :/.

In this case I'd be inclined to suggest a simpler version first, and we can
always optimize later. How much would you hate something like the below
(totally untested)? We still end up with the double lookup in sugov, but as
you've pointed out in another reply that can't easily be worked around, and
as per the above the linear search isn't that bad compared to the LUT. Maybe
we could try a few micro-optimizations on top (e.g. binary searching the EM
table), but IIRC that wasn't making much of a difference last time I tried.

Thoughts?

Thanks,
Quentin

---
diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index 757fc60658fa..8320f5e87992 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -22,6 +22,7 @@ struct em_perf_state {
 	unsigned long frequency;
 	unsigned long power;
 	unsigned long cost;
+	unsigned int inefficient;
 };
 
 /**
@@ -85,6 +86,25 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
 				bool milliwatts);
 void em_dev_unregister_perf_domain(struct device *dev);
 
+static inline struct em_perf_state *__em_find_ps(struct em_perf_domain *pd, unsigned long freq)
+{
+	struct em_perf_state *ps;
+	int i;
+
+	for (i = 0; i < pd->nr_perf_states; i++) {
+		ps = &pd->table[i];
+		if (ps->frequency >= freq && !ps->inefficient)
+			break;
+	}
+
+	return ps;
+}
+
+static inline unsigned long em_cpu_find_freq(struct em_perf_domain *pd, unsigned long freq)
+{
+	return pd ? __em_find_ps(pd, freq)->frequency : freq;
+}
+
 /**
  * em_cpu_energy() - Estimates the energy consumed by the CPUs of a performance domain
@@ -104,7 +124,7 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 {
 	unsigned long freq, scale_cpu;
 	struct em_perf_state *ps;
-	int i, cpu;
+	int cpu;
 
 	if (!sum_util)
 		return 0;
@@ -118,16 +138,7 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 	scale_cpu = arch_scale_cpu_capacity(cpu);
 	ps = &pd->table[pd->nr_perf_states - 1];
 	freq = map_util_freq(max_util, ps->frequency, scale_cpu);
-
-	/*
-	 * Find the lowest performance state of the Energy Model above the
-	 * requested frequency.
-	 */
-	for (i = 0; i < pd->nr_perf_states; i++) {
-		ps = &pd->table[i];
-		if (ps->frequency >= freq)
-			break;
-	}
+	ps = __em_find_ps(pd, freq);
 
 	/*
 	 * The capacity of a CPU in the domain at the performance state (ps)
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 0f4530b3a8cd..990048e9ec79 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -161,7 +161,8 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
 		 * power efficient than a lower one.
 		 */
 		opp_eff = freq / power;
-		if (opp_eff >= prev_opp_eff)
+		table[i].inefficient = (opp_eff >= prev_opp_eff);
+		if (table[i].inefficient)
 			dev_dbg(dev, "EM: hertz/watts ratio non-monotonically decreasing: em_perf_state %d >= em_perf_state%d\n",
 					i, i - 1);
 		prev_opp_eff = opp_eff;
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 4f09afd2f321..9186d52d8660 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -10,6 +10,7 @@
 
 #include "sched.h"
 
+#include <linux/energy_model.h>
 #include <linux/sched/cpufreq.h>
 #include <trace/events/power.h>
 
@@ -42,6 +43,8 @@ struct sugov_policy {
 
 	bool			limits_changed;
 	bool			need_freq_update;
+
+	struct em_perf_domain	*pd;
 };
 
 struct sugov_cpu {
@@ -152,6 +155,7 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
 				policy->cpuinfo.max_freq : policy->cur;
 
 	freq = map_util_freq(util, freq, max);
+	freq = em_cpu_find_freq(sg_policy->pd, freq);
 
 	if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
 		return sg_policy->next_freq;
@@ -756,6 +760,7 @@ static int sugov_start(struct cpufreq_policy *policy)
 
 	sg_policy->cached_raw_freq		= 0;
 	sg_policy->need_freq_update		= cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
+	sg_policy->pd				= em_cpu_get(policy->cpu);
 
 	for_each_cpu(cpu, policy->cpus) {
 		struct sugov_cpu *sg_cpu = &per_cpu(sugov_cpu, cpu);

^ permalink raw reply related	[flat|nested] 20+ messages in thread
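[Editor's note: to make the behaviour of Quentin's sketch concrete, here is a
stand-alone user-space rendition of the __em_find_ps() walk from the patch
above, with made-up OPP values (not the SD855's real ones). The walk returns
the lowest efficient state at or above the request, so a request that lands
on an OPP flagged inefficient is bumped to the next efficient one:]

```c
#include <assert.h>
#include <stddef.h>

/* Stripped-down mirrors of the kernel structs used by the patch. */
struct em_perf_state {
        unsigned long frequency;
        unsigned int inefficient;
};

struct em_perf_domain {
        struct em_perf_state *table;    /* ascending frequency order */
        int nr_perf_states;
};

/*
 * Same linear walk as __em_find_ps(): lowest state at or above the
 * requested frequency, skipping states flagged inefficient.
 */
static struct em_perf_state *find_ps(struct em_perf_domain *pd,
                                     unsigned long freq)
{
        struct em_perf_state *ps = NULL;
        int i;

        for (i = 0; i < pd->nr_perf_states; i++) {
                ps = &pd->table[i];
                if (ps->frequency >= freq && !ps->inefficient)
                        break;
        }
        return ps;
}
```

If every state at or above the request is inefficient, the walk falls through
to the last (highest) state, which is never flagged since nothing above it can
be cheaper.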
* Re: [PATCH] PM / EM: Inefficient OPPs detection 2021-04-08 17:10 ` Vincent Donnefort 2021-04-15 13:12 ` Quentin Perret @ 2021-04-15 13:16 ` Quentin Perret 2021-04-15 14:34 ` Vincent Donnefort 2021-04-22 17:26 ` Lukasz Luba 2 siblings, 1 reply; 20+ messages in thread From: Quentin Perret @ 2021-04-15 13:16 UTC (permalink / raw) To: Vincent Donnefort Cc: peterz, rjw, viresh.kumar, vincent.guittot, linux-kernel, ionela.voinescu, lukasz.luba, dietmar.eggemann On Thursday 08 Apr 2021 at 18:10:29 (+0100), Vincent Donnefort wrote: > --- a/kernel/sched/cpufreq_schedutil.c > +++ b/kernel/sched/cpufreq_schedutil.c > @@ -10,6 +10,7 @@ > > #include "sched.h" > > +#include <linux/energy_model.h> > #include <linux/sched/cpufreq.h> > #include <trace/events/power.h> > > @@ -164,6 +165,9 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy, > > freq = map_util_freq(util, freq, max); > > + /* Avoid inefficient performance states */ > + freq = em_pd_get_efficient_freq(em_cpu_get(policy->cpu), freq); I remember this was discussed when Douglas sent his patches some time ago, but I still find it sad we index the EM table here but still re-index the cpufreq frequency table later :/ Yes in your case this lookup is very inexpensive, but still. EAS relies on the EM's table matching cpufreq's accurately, so this second lookup still feels rather unnecessary ... > if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update) > return sg_policy->next_freq; > > -- > 2.7.4 > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] PM / EM: Inefficient OPPs detection 2021-04-15 13:16 ` Quentin Perret @ 2021-04-15 14:34 ` Vincent Donnefort 2021-04-15 14:59 ` Quentin Perret 0 siblings, 1 reply; 20+ messages in thread From: Vincent Donnefort @ 2021-04-15 14:34 UTC (permalink / raw) To: Quentin Perret Cc: peterz, rjw, viresh.kumar, vincent.guittot, linux-kernel, ionela.voinescu, lukasz.luba, dietmar.eggemann On Thu, Apr 15, 2021 at 01:16:35PM +0000, Quentin Perret wrote: > On Thursday 08 Apr 2021 at 18:10:29 (+0100), Vincent Donnefort wrote: > > --- a/kernel/sched/cpufreq_schedutil.c > > +++ b/kernel/sched/cpufreq_schedutil.c > > @@ -10,6 +10,7 @@ > > > > #include "sched.h" > > > > +#include <linux/energy_model.h> > > #include <linux/sched/cpufreq.h> > > #include <trace/events/power.h> > > > > @@ -164,6 +165,9 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy, > > > > freq = map_util_freq(util, freq, max); > > > > + /* Avoid inefficient performance states */ > > + freq = em_pd_get_efficient_freq(em_cpu_get(policy->cpu), freq); > > I remember this was discussed when Douglas sent his patches some time > ago, but I still find it sad we index the EM table here but still > re-index the cpufreq frequency table later :/ > > Yes in your case this lookup is very inexpensive, but still. EAS relies > on the EM's table matching cpufreq's accurately, so this second lookup > still feels rather unnecessary ... To get only a single lookup, we would need to bring the inefficiency knowledge directly to the cpufreq framework. But it has its own limitations: The cpufreq driver can have its own resolve_freq() callback, which means that not all the drivers would benefit from that feature. The cpufreq_table can be ordered and accessed in several ways which brings many combinations that would need to be supported, ending-up with something much more intrusive. (We can though decide to limit the feature to the low to high access that schedutil needs). 
As the EM needs schedutil to exist anyway, it seemed to be the right place for
this code. It allows any cpufreq driver to benefit from the feature, simplifies
a potential extension for use by devfreq devices and, as a bonus, speeds up
energy computation, allowing a more complex Energy Model.

>
> > 	if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
> > 		return sg_policy->next_freq;
> >
> > --
> > 2.7.4
> >

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [PATCH] PM / EM: Inefficient OPPs detection 2021-04-15 14:34 ` Vincent Donnefort @ 2021-04-15 14:59 ` Quentin Perret 2021-04-15 15:05 ` Quentin Perret 2021-04-15 15:14 ` Vincent Donnefort 0 siblings, 2 replies; 20+ messages in thread From: Quentin Perret @ 2021-04-15 14:59 UTC (permalink / raw) To: Vincent Donnefort Cc: peterz, rjw, viresh.kumar, vincent.guittot, linux-kernel, ionela.voinescu, lukasz.luba, dietmar.eggemann On Thursday 15 Apr 2021 at 15:34:53 (+0100), Vincent Donnefort wrote: > On Thu, Apr 15, 2021 at 01:16:35PM +0000, Quentin Perret wrote: > > On Thursday 08 Apr 2021 at 18:10:29 (+0100), Vincent Donnefort wrote: > > > --- a/kernel/sched/cpufreq_schedutil.c > > > +++ b/kernel/sched/cpufreq_schedutil.c > > > @@ -10,6 +10,7 @@ > > > > > > #include "sched.h" > > > > > > +#include <linux/energy_model.h> > > > #include <linux/sched/cpufreq.h> > > > #include <trace/events/power.h> > > > > > > @@ -164,6 +165,9 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy, > > > > > > freq = map_util_freq(util, freq, max); > > > > > > + /* Avoid inefficient performance states */ > > > + freq = em_pd_get_efficient_freq(em_cpu_get(policy->cpu), freq); > > > > I remember this was discussed when Douglas sent his patches some time > > ago, but I still find it sad we index the EM table here but still > > re-index the cpufreq frequency table later :/ > > > > Yes in your case this lookup is very inexpensive, but still. EAS relies > > on the EM's table matching cpufreq's accurately, so this second lookup > > still feels rather unnecessary ... > > To get only a single lookup, we would need to bring the inefficiency knowledge > directly to the cpufreq framework. But it has its own limitations: > > The cpufreq driver can have its own resolve_freq() callback, which means that > not all the drivers would benefit from that feature. 
> > The cpufreq_table can be ordered and accessed in several ways which brings > many combinations that would need to be supported, ending-up with something > much more intrusive. (We can though decide to limit the feature to the low to > high access that schedutil needs). > > As the EM needs schedutil to exist anyway, it seemed to be the right place for > this code. It allows any cpufreq driver to benefit from the feature, simplify a > potential extension for a usage by devfreq devices and as a bonus it speeds-up > energy computing, allowing a more complex Energy Model. I was thinking of something a bit simpler. cpufreq_driver_resolve_freq appears to be used only from schedutil (why is it even then?), so we could just pull it into cpufreq_schedutil.c and just plain skip the call to cpufreq_frequency_table_target if the target freq has been indexed in the EM table -- it should already be matching a real OPP. Thoughts? Quentin ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] PM / EM: Inefficient OPPs detection 2021-04-15 14:59 ` Quentin Perret @ 2021-04-15 15:05 ` Quentin Perret 2021-04-15 15:14 ` Vincent Donnefort 1 sibling, 0 replies; 20+ messages in thread From: Quentin Perret @ 2021-04-15 15:05 UTC (permalink / raw) To: Vincent Donnefort Cc: peterz, rjw, viresh.kumar, vincent.guittot, linux-kernel, ionela.voinescu, lukasz.luba, dietmar.eggemann On Thursday 15 Apr 2021 at 14:59:54 (+0000), Quentin Perret wrote: > On Thursday 15 Apr 2021 at 15:34:53 (+0100), Vincent Donnefort wrote: > > On Thu, Apr 15, 2021 at 01:16:35PM +0000, Quentin Perret wrote: > > > On Thursday 08 Apr 2021 at 18:10:29 (+0100), Vincent Donnefort wrote: > > > > --- a/kernel/sched/cpufreq_schedutil.c > > > > +++ b/kernel/sched/cpufreq_schedutil.c > > > > @@ -10,6 +10,7 @@ > > > > > > > > #include "sched.h" > > > > > > > > +#include <linux/energy_model.h> > > > > #include <linux/sched/cpufreq.h> > > > > #include <trace/events/power.h> > > > > > > > > @@ -164,6 +165,9 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy, > > > > > > > > freq = map_util_freq(util, freq, max); > > > > > > > > + /* Avoid inefficient performance states */ > > > > + freq = em_pd_get_efficient_freq(em_cpu_get(policy->cpu), freq); > > > > > > I remember this was discussed when Douglas sent his patches some time > > > ago, but I still find it sad we index the EM table here but still > > > re-index the cpufreq frequency table later :/ > > > > > > Yes in your case this lookup is very inexpensive, but still. EAS relies > > > on the EM's table matching cpufreq's accurately, so this second lookup > > > still feels rather unnecessary ... > > > > To get only a single lookup, we would need to bring the inefficiency knowledge > > directly to the cpufreq framework. But it has its own limitations: > > > > The cpufreq driver can have its own resolve_freq() callback, which means that > > not all the drivers would benefit from that feature. 
> > > > The cpufreq_table can be ordered and accessed in several ways which brings > > many combinations that would need to be supported, ending-up with something > > much more intrusive. (We can though decide to limit the feature to the low to > > high access that schedutil needs). > > > > As the EM needs schedutil to exist anyway, it seemed to be the right place for > > this code. It allows any cpufreq driver to benefit from the feature, simplify a > > potential extension for a usage by devfreq devices and as a bonus it speeds-up > > energy computing, allowing a more complex Energy Model. > > I was thinking of something a bit simpler. cpufreq_driver_resolve_freq > appears to be used only from schedutil (why is it even then?), so we why is it even *exported* then ... > could just pull it into cpufreq_schedutil.c and just plain skip the call > to cpufreq_frequency_table_target if the target freq has been indexed in > the EM table -- it should already be matching a real OPP. > > Thoughts? > Quentin ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] PM / EM: Inefficient OPPs detection 2021-04-15 14:59 ` Quentin Perret 2021-04-15 15:05 ` Quentin Perret @ 2021-04-15 15:14 ` Vincent Donnefort 2021-04-15 15:20 ` Quentin Perret 1 sibling, 1 reply; 20+ messages in thread From: Vincent Donnefort @ 2021-04-15 15:14 UTC (permalink / raw) To: Quentin Perret Cc: peterz, rjw, viresh.kumar, vincent.guittot, linux-kernel, ionela.voinescu, lukasz.luba, dietmar.eggemann On Thu, Apr 15, 2021 at 02:59:54PM +0000, Quentin Perret wrote: > On Thursday 15 Apr 2021 at 15:34:53 (+0100), Vincent Donnefort wrote: > > On Thu, Apr 15, 2021 at 01:16:35PM +0000, Quentin Perret wrote: > > > On Thursday 08 Apr 2021 at 18:10:29 (+0100), Vincent Donnefort wrote: > > > > --- a/kernel/sched/cpufreq_schedutil.c > > > > +++ b/kernel/sched/cpufreq_schedutil.c > > > > @@ -10,6 +10,7 @@ > > > > > > > > #include "sched.h" > > > > > > > > +#include <linux/energy_model.h> > > > > #include <linux/sched/cpufreq.h> > > > > #include <trace/events/power.h> > > > > > > > > @@ -164,6 +165,9 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy, > > > > > > > > freq = map_util_freq(util, freq, max); > > > > > > > > + /* Avoid inefficient performance states */ > > > > + freq = em_pd_get_efficient_freq(em_cpu_get(policy->cpu), freq); > > > > > > I remember this was discussed when Douglas sent his patches some time > > > ago, but I still find it sad we index the EM table here but still > > > re-index the cpufreq frequency table later :/ > > > > > > Yes in your case this lookup is very inexpensive, but still. EAS relies > > > on the EM's table matching cpufreq's accurately, so this second lookup > > > still feels rather unnecessary ... > > > > To get only a single lookup, we would need to bring the inefficiency knowledge > > directly to the cpufreq framework. But it has its own limitations: > > > > The cpufreq driver can have its own resolve_freq() callback, which means that > > not all the drivers would benefit from that feature. 
> > > > The cpufreq_table can be ordered and accessed in several ways which brings > > many combinations that would need to be supported, ending-up with something > > much more intrusive. (We can though decide to limit the feature to the low to > > high access that schedutil needs). > > > > As the EM needs schedutil to exist anyway, it seemed to be the right place for > > this code. It allows any cpufreq driver to benefit from the feature, simplify a > > potential extension for a usage by devfreq devices and as a bonus it speeds-up > > energy computing, allowing a more complex Energy Model. > > I was thinking of something a bit simpler. cpufreq_driver_resolve_freq > appears to be used only from schedutil (why is it even then?), so we > could just pull it into cpufreq_schedutil.c and just plain skip the call > to cpufreq_frequency_table_target if the target freq has been indexed in > the EM table -- it should already be matching a real OPP. > > Thoughts? > Quentin Can try that for a V2. That means em_pd_get_efficient_freq() would have to know about policy clamping (but I don't think that's an issue) and probably we still have to do the frequency resolution if the driver declared the resolve_freq callback? -- Vincent ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] PM / EM: Inefficient OPPs detection 2021-04-15 15:14 ` Vincent Donnefort @ 2021-04-15 15:20 ` Quentin Perret 2021-04-15 15:32 ` Lukasz Luba 0 siblings, 1 reply; 20+ messages in thread From: Quentin Perret @ 2021-04-15 15:20 UTC (permalink / raw) To: Vincent Donnefort Cc: peterz, rjw, viresh.kumar, vincent.guittot, linux-kernel, ionela.voinescu, lukasz.luba, dietmar.eggemann On Thursday 15 Apr 2021 at 16:14:46 (+0100), Vincent Donnefort wrote: > On Thu, Apr 15, 2021 at 02:59:54PM +0000, Quentin Perret wrote: > > On Thursday 15 Apr 2021 at 15:34:53 (+0100), Vincent Donnefort wrote: > > > On Thu, Apr 15, 2021 at 01:16:35PM +0000, Quentin Perret wrote: > > > > On Thursday 08 Apr 2021 at 18:10:29 (+0100), Vincent Donnefort wrote: > > > > > --- a/kernel/sched/cpufreq_schedutil.c > > > > > +++ b/kernel/sched/cpufreq_schedutil.c > > > > > @@ -10,6 +10,7 @@ > > > > > > > > > > #include "sched.h" > > > > > > > > > > +#include <linux/energy_model.h> > > > > > #include <linux/sched/cpufreq.h> > > > > > #include <trace/events/power.h> > > > > > > > > > > @@ -164,6 +165,9 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy, > > > > > > > > > > freq = map_util_freq(util, freq, max); > > > > > > > > > > + /* Avoid inefficient performance states */ > > > > > + freq = em_pd_get_efficient_freq(em_cpu_get(policy->cpu), freq); > > > > > > > > I remember this was discussed when Douglas sent his patches some time > > > > ago, but I still find it sad we index the EM table here but still > > > > re-index the cpufreq frequency table later :/ > > > > > > > > Yes in your case this lookup is very inexpensive, but still. EAS relies > > > > on the EM's table matching cpufreq's accurately, so this second lookup > > > > still feels rather unnecessary ... > > > > > > To get only a single lookup, we would need to bring the inefficiency knowledge > > > directly to the cpufreq framework. 
But it has its own limitations: > > > > > > The cpufreq driver can have its own resolve_freq() callback, which means that > > > not all the drivers would benefit from that feature. > > > > > > The cpufreq_table can be ordered and accessed in several ways which brings > > > many combinations that would need to be supported, ending-up with something > > > much more intrusive. (We can though decide to limit the feature to the low to > > > high access that schedutil needs). > > > > > > As the EM needs schedutil to exist anyway, it seemed to be the right place for > > > this code. It allows any cpufreq driver to benefit from the feature, simplify a > > > potential extension for a usage by devfreq devices and as a bonus it speeds-up > > > energy computing, allowing a more complex Energy Model. > > > > I was thinking of something a bit simpler. cpufreq_driver_resolve_freq > > appears to be used only from schedutil (why is it even then?), so we > > could just pull it into cpufreq_schedutil.c and just plain skip the call > > to cpufreq_frequency_table_target if the target freq has been indexed in > > the EM table -- it should already be matching a real OPP. > > > > Thoughts? > > Quentin > > Can try that for a V2. That means em_pd_get_efficient_freq() would have to > know about policy clamping (but I don't think that's an issue) Indeed, and I think we can even see this as an improvement as EAS will now see policy clamps as well in compute_energy(). > and probably > we still have to do the frequency resolution if the driver declared the > resolve_freq callback? Yep, looks like this is unavoidable. Thanks, Quentin ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] PM / EM: Inefficient OPPs detection 2021-04-15 15:20 ` Quentin Perret @ 2021-04-15 15:32 ` Lukasz Luba 2021-04-15 15:43 ` Quentin Perret 0 siblings, 1 reply; 20+ messages in thread From: Lukasz Luba @ 2021-04-15 15:32 UTC (permalink / raw) To: Quentin Perret Cc: Vincent Donnefort, peterz, rjw, viresh.kumar, vincent.guittot, linux-kernel, ionela.voinescu, dietmar.eggemann Hi Quentin, On 4/15/21 4:20 PM, Quentin Perret wrote: > On Thursday 15 Apr 2021 at 16:14:46 (+0100), Vincent Donnefort wrote: >> On Thu, Apr 15, 2021 at 02:59:54PM +0000, Quentin Perret wrote: >>> On Thursday 15 Apr 2021 at 15:34:53 (+0100), Vincent Donnefort wrote: >>>> On Thu, Apr 15, 2021 at 01:16:35PM +0000, Quentin Perret wrote: >>>>> On Thursday 08 Apr 2021 at 18:10:29 (+0100), Vincent Donnefort wrote: >>>>>> --- a/kernel/sched/cpufreq_schedutil.c >>>>>> +++ b/kernel/sched/cpufreq_schedutil.c >>>>>> @@ -10,6 +10,7 @@ >>>>>> >>>>>> #include "sched.h" >>>>>> >>>>>> +#include <linux/energy_model.h> >>>>>> #include <linux/sched/cpufreq.h> >>>>>> #include <trace/events/power.h> >>>>>> >>>>>> @@ -164,6 +165,9 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy, >>>>>> >>>>>> freq = map_util_freq(util, freq, max); >>>>>> >>>>>> + /* Avoid inefficient performance states */ >>>>>> + freq = em_pd_get_efficient_freq(em_cpu_get(policy->cpu), freq); >>>>> >>>>> I remember this was discussed when Douglas sent his patches some time >>>>> ago, but I still find it sad we index the EM table here but still >>>>> re-index the cpufreq frequency table later :/ >>>>> >>>>> Yes in your case this lookup is very inexpensive, but still. EAS relies >>>>> on the EM's table matching cpufreq's accurately, so this second lookup >>>>> still feels rather unnecessary ... >>>> >>>> To get only a single lookup, we would need to bring the inefficiency knowledge >>>> directly to the cpufreq framework. 
But it has its own limitations:
>>>>
>>>> The cpufreq driver can have its own resolve_freq() callback, which means that
>>>> not all the drivers would benefit from that feature.
>>>>
>>>> The cpufreq_table can be ordered and accessed in several ways which brings
>>>> many combinations that would need to be supported, ending-up with something
>>>> much more intrusive. (We can though decide to limit the feature to the low to
>>>> high access that schedutil needs).
>>>>
>>>> As the EM needs schedutil to exist anyway, it seemed to be the right place for
>>>> this code. It allows any cpufreq driver to benefit from the feature, simplify a
>>>> potential extension for a usage by devfreq devices and as a bonus it speeds-up
>>>> energy computing, allowing a more complex Energy Model.
>>>
>>> I was thinking of something a bit simpler. cpufreq_driver_resolve_freq
>>> appears to be used only from schedutil (why is it even then?), so we
>>> could just pull it into cpufreq_schedutil.c and just plain skip the call
>>> to cpufreq_frequency_table_target if the target freq has been indexed in
>>> the EM table -- it should already be matching a real OPP.
>>>
>>> Thoughts?
>>> Quentin
>>
>> Can try that for a V2. That means em_pd_get_efficient_freq() would have to
>> know about policy clamping (but I don't think that's an issue)
>
> Indeed, and I think we can even see this as an improvement as EAS will
> now see policy clamps as well in compute_energy().

Are you sure that the 'policy' can be accessed from compute_energy()?
It can be from the schedutil freq switch path, but I'm not sure about our
feec()..

For me this cpufreq_driver_resolve_freq sounds a bit out of this patch
subject.

Regards,
Lukasz

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [PATCH] PM / EM: Inefficient OPPs detection
  2021-04-15 15:32 ` Lukasz Luba
@ 2021-04-15 15:43 ` Quentin Perret
  2021-04-28 13:28 ` Vincent Donnefort
  0 siblings, 1 reply; 20+ messages in thread
From: Quentin Perret @ 2021-04-15 15:43 UTC (permalink / raw)
To: Lukasz Luba
Cc: Vincent Donnefort, peterz, rjw, viresh.kumar, vincent.guittot,
    linux-kernel, ionela.voinescu, dietmar.eggemann

On Thursday 15 Apr 2021 at 16:32:31 (+0100), Lukasz Luba wrote:
> Are you sure that the 'policy' can be accessed from compute_energy()?
> It can be from the schedutil freq-switch path, but I'm not sure about
> our feec() ...

Right, I was just looking at cpufreq_cpu_get() and we'll have a locking
issue in the wake-up path :/ So maybe making feec() aware of policy caps
is for later ...

> To me this cpufreq_driver_resolve_freq() discussion sounds a bit out of
> scope for this patch.

Not sure I agree -- if we're going to index the EM table from schedutil,
it should be integrated nicely if possible.

Thanks
* Re: [PATCH] PM / EM: Inefficient OPPs detection
  2021-04-15 15:43 ` Quentin Perret
@ 2021-04-28 13:28 ` Vincent Donnefort
  0 siblings, 0 replies; 20+ messages in thread
From: Vincent Donnefort @ 2021-04-28 13:28 UTC (permalink / raw)
To: Quentin Perret
Cc: Lukasz Luba, peterz, rjw, viresh.kumar, vincent.guittot,
    linux-kernel, ionela.voinescu, dietmar.eggemann

On Thu, Apr 15, 2021 at 03:43:06PM +0000, Quentin Perret wrote:
> On Thursday 15 Apr 2021 at 16:32:31 (+0100), Lukasz Luba wrote:
> > Are you sure that the 'policy' can be accessed from compute_energy()?
> > It can be from the schedutil freq-switch path, but I'm not sure about
> > our feec() ...
>
> Right, I was just looking at cpufreq_cpu_get() and we'll have a locking
> issue in the wake-up path :/ So maybe making feec() aware of policy caps
> is for later ...
>
> > To me this cpufreq_driver_resolve_freq() discussion sounds a bit out of
> > scope for this patch.
>
> Not sure I agree -- if we're going to index the EM table from schedutil,
> it should be integrated nicely if possible.
>
> Thanks

I'm having a look at this topic right now, and I don't think we can skip
cpufreq_driver_resolve_freq() in the end, for two reasons:

  1. It is possible to register OPPs (and by extension perf_states) for a
     frequency whose cpufreq table entry is marked CPUFREQ_ENTRY_INVALID.
     It would probably be an issue that has to be fixed in the driver,
     but it is currently allowed.

  2. More importantly, while resolving the frequency, we also cache the
     index in cached_resolved_idx. Some drivers, such as qcom-cpufreq-hw,
     rely on this value for their fast-switch support.

-- 
Vincent
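Vincent's two reasons can be modelled in a few lines of user-space C. The names below deliberately mirror cpufreq (an invalid-entry marker, `cached_resolved_idx`), but this is a sketch of the behaviour, not the kernel implementation:

```c
#include <assert.h>

/*
 * Toy model of frequency resolution: invalid rows must be skipped
 * (reason 1), and the resolved index is cached for fast-switch drivers
 * (reason 2) -- so the resolve step cannot simply be dropped.
 * Assumes the table holds at least one valid, ascending entry.
 */
#define ENTRY_INVALID (~0UL)

struct freq_entry { unsigned long frequency; };

static int cached_resolved_idx;

/* Return the lowest valid frequency at or above 'target'. */
static unsigned long resolve_freq(const struct freq_entry *tbl, int n,
				  unsigned long target)
{
	int i, best = -1;

	for (i = 0; i < n; i++) {
		if (tbl[i].frequency == ENTRY_INVALID)
			continue;           /* reason 1: invalid entries */
		best = i;
		if (tbl[i].frequency >= target)
			break;
	}
	cached_resolved_idx = best;         /* reason 2: fast switch */
	return tbl[best].frequency;
}
```

Even if the EM table handed schedutil a real OPP frequency, the cached index side effect here is what a fast-switch driver would consume, which is why the second lookup survives.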
* Re: [PATCH] PM / EM: Inefficient OPPs detection
  2021-04-08 17:10 ` Vincent Donnefort
  2021-04-15 13:12 ` Quentin Perret
  2021-04-15 13:16 ` Quentin Perret
@ 2021-04-22 17:26 ` Lukasz Luba
  2 siblings, 0 replies; 20+ messages in thread
From: Lukasz Luba @ 2021-04-22 17:26 UTC (permalink / raw)
To: Vincent Donnefort
Cc: peterz, rjw, viresh.kumar, vincent.guittot, qperret, linux-kernel,
    ionela.voinescu, dietmar.eggemann

Hi Vincent,

On 4/8/21 6:10 PM, Vincent Donnefort wrote:
> Some SoCs, such as the SD855, have OPPs within the same performance
> domain whose cost is higher than that of others with a higher frequency.
> Even though those OPPs are interesting from a cooling perspective, it
> makes no sense to use them when the device can run at full capacity.
> Those OPPs handicap the performance domain when choosing the most
> energy-efficient CPU, and they waste energy. They are inefficient.
>
> Hence, add support for such OPPs to the Energy Model, which creates a
> performance state for each OPP. The Energy Model can now be read using
> the regular table, which contains all available performance states, or
> using an efficient table, where inefficient performance states (and by
> extension, inefficient OPPs) have been removed.
>
> Currently, the efficient table is used in two paths: schedutil and
> find_energy_efficient_cpu(). We have to modify both paths in the same
> patch so they stay synchronized. The thermal framework still relies on
> the original table, and hence devfreq devices won't create the
> efficient table.
>
> As it is used in the hot path, the efficient table is a lookup table,
> generated dynamically when the perf domain is created. The complexity
> of searching a performance state is hence reduced from O(n) to O(1).
> This also speeds up em_cpu_energy() even if no inefficient OPPs have
> been found.
>
> Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
>

I know we have discussed it internally, but I thought it would be good
to send this to LKML as well.

The concept of inefficient OPPs makes perfect sense. It doesn't prevent
the CPU from running at an inefficient OPP when thermal throttling
forces it to. The lookup table that is now used also makes sense: with
an example configuration of 1 prime + 2 bigs + 4 littles in separate
frequency domains, a single feec() may call em_cpu_energy() 7 times,
each call doing an O(nr_opps) search, whereas we can instead have 7
O(1) lookups.

In one of your other emails the results show improvements (especially
for the big core) and no regression.

Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>

Regards,
Lukasz
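The O(1) lookup Lukasz describes can be illustrated with a slot-based table in user-space C. The slot scheme, sizes and names below are assumptions made for this sketch, not the patch's actual layout: the frequency range is bucketed into fixed-width slots, each storing the index of the lowest efficient state able to serve requests falling in it, so resolution costs one division instead of a linear scan.

```c
#include <assert.h>

/* Sketch only -- not the kernel's Energy Model code. */
enum { NR_SLOTS = 8 };

struct em_lut {
	unsigned long fmin, step;
	int slot[NR_SLOTS];          /* slot -> index into freqs[] */
	const unsigned long *freqs;  /* efficient frequencies, ascending */
	int nr_freqs;
};

/* Assumes fmax > fmin and at least one frequency in the table. */
static void lut_init(struct em_lut *l, const unsigned long *freqs, int n,
		     unsigned long fmin, unsigned long fmax)
{
	int s, i;

	l->fmin = fmin;
	l->step = (fmax - fmin) / (NR_SLOTS - 1);
	l->freqs = freqs;
	l->nr_freqs = n;

	for (s = 0; s < NR_SLOTS; s++) {
		unsigned long f = fmin + s * l->step;

		/* lowest efficient state covering this slot's start */
		for (i = 0; i < n - 1 && freqs[i] < f; i++)
			;
		l->slot[s] = i;
	}
}

static unsigned long lut_resolve(const struct em_lut *l, unsigned long freq)
{
	unsigned long s;
	int i;

	if (freq <= l->fmin)
		return l->freqs[l->slot[0]];

	s = (freq - l->fmin + l->step - 1) / l->step;   /* ceil */
	if (s >= NR_SLOTS)
		s = NR_SLOTS - 1;
	i = l->slot[s];
	/* bounded fix-up for the clamped top slot */
	while (i < l->nr_freqs - 1 && l->freqs[i] < freq)
		i++;
	return l->freqs[i];
}
```

In the 1+2+4 topology above, feec() would then pay seven constant-time resolutions per wake-up instead of seven O(nr_opps) scans, which is where the speed-up for a more complex Energy Model comes from.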
end of thread, other threads: [~2021-05-20 12:19 UTC | newest]

Thread overview: 20+ messages in thread
2021-04-08 17:10 [PATCH] PM / EM: Inefficient OPPs detection Vincent Donnefort
2021-04-08 17:10 ` Vincent Donnefort
2021-04-15 13:12 ` Quentin Perret
2021-04-15 14:12 ` Vincent Donnefort
2021-04-15 15:04 ` Quentin Perret
2021-04-15 15:27 ` Vincent Donnefort
2021-04-22 15:36 ` Vincent Donnefort
2021-04-23 16:14 ` Quentin Perret
2021-04-28 14:46 ` Vincent Donnefort
2021-05-20 11:12 ` Quentin Perret
2021-04-15 13:16 ` Quentin Perret
2021-04-15 14:34 ` Vincent Donnefort
2021-04-15 14:59 ` Quentin Perret
2021-04-15 15:05 ` Quentin Perret
2021-04-15 15:14 ` Vincent Donnefort
2021-04-15 15:20 ` Quentin Perret
2021-04-15 15:32 ` Lukasz Luba
2021-04-15 15:43 ` Quentin Perret
2021-04-28 13:28 ` Vincent Donnefort
2021-04-22 17:26 ` Lukasz Luba