linux-samsung-soc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] Update Energy Model with perfromance limits
@ 2024-04-03 16:23 Lukasz Luba
  2024-04-03 16:23 ` [PATCH 1/2] PM: EM: Add min/max available performance state limits Lukasz Luba
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Lukasz Luba @ 2024-04-03 16:23 UTC (permalink / raw)
  To: linux-kernel, linux-pm
  Cc: lukasz.luba, dietmar.eggemann, linux-arm-kernel, sudeep.holla,
	cristian.marussi, linux-samsung-soc, rafael, viresh.kumar,
	quic_sibis

Hi all,

This patch set allows to specify in the EM the range of performance levels that
the device is allowed to operate. It will impact EAS decision, especially for
SoCs where CPUs share the voltage & frequency domain with other CPUs or devices
e.g.
- Mid CPUs + Big CPU
- Little CPU + L3 cache in DSU

The minimum allowed frequency will be taken into account while doing EAS task
placement simulation. When the min frequency is higher for the whole domain
and not driven by the CPUs in that PD utilization, than the energy for
computation in that PD will be higher. This patch helps to reflect that higher
cost.

More explanation can be found in my presentation on OSPM2023 [1].
I have shown experiments with Big CPU running high frequency and increasing
the L3 cache frequency (to reduce the latency), but that impacted Little
CPU which are in the same DVFS domain with L3 cache. It had bad impact for
total energy consumed by small tasks placed on Little CPU. The EAS was not
aware about the min frequency&voltage of the Little CPUs and energy estimation
was wrong.

Depends on:
patch 2/2:
- SCMI cpufreq performance limits notification support (w/ other
   dependency) [2]
patch 1/2:
- EM recent patches for chip binning update - to avoid conflict [3]

Therefore, the patch 1/2 could go first and patch 2/2 can wait longer.

Regards,
Lukasz Luba

[1] https://www.youtube.com/watch?v=2C-5uikSbtM&list=PL0fKordpLTjKsBOUcZqnzlHShri4YBL1H
[2] https://lore.kernel.org/lkml/20240328074131.2839871-1-quic_sibis@quicinc.com/
[3] https://lore.kernel.org/lkml/20240403154907.1420245-1-lukasz.luba@arm.com/

Lukasz Luba (2):
  PM: EM: Add min/max available performance state limits
  cpufreq: scmi: Update Energy Model with allowed performance limits

 drivers/cpufreq/scmi-cpufreq.c | 19 +++++++++++---
 include/linux/energy_model.h   | 22 +++++++++++++---
 kernel/power/energy_model.c    | 48 ++++++++++++++++++++++++++++++++++
 3 files changed, 82 insertions(+), 7 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/2] PM: EM: Add min/max available performance state limits
  2024-04-03 16:23 [PATCH 0/2] Update Energy Model with perfromance limits Lukasz Luba
@ 2024-04-03 16:23 ` Lukasz Luba
  2024-04-09 14:47   ` Hongyan Xia
  2024-04-22  7:46   ` Dietmar Eggemann
  2024-04-03 16:23 ` [PATCH 2/2] cpufreq: scmi: Update Energy Model with allowed performance limits Lukasz Luba
  2024-04-05  9:56 ` [PATCH 0/2] Update Energy Model with perfromance limits Jonathan Cameron
  2 siblings, 2 replies; 11+ messages in thread
From: Lukasz Luba @ 2024-04-03 16:23 UTC (permalink / raw)
  To: linux-kernel, linux-pm
  Cc: lukasz.luba, dietmar.eggemann, linux-arm-kernel, sudeep.holla,
	cristian.marussi, linux-samsung-soc, rafael, viresh.kumar,
	quic_sibis

On some devices there are HW dependencies for shared frequency and voltage
between devices: CPUs and L3 cache. When the L3 cache frequency is
increased, the affected CPUs might run at higher voltage and frequency.
That higher voltage causes higher CPU power and thus more energy is used
for running the tasks.

Add performance state limits which are applied for the device. This allows
the Energy Model (EM) to better reflect the CPU's currently available
performance states. This information is used by Energy Aware Scheduler
(EAS) during task placement to avoid situation when a simulated energy
cost has error due to using wrong Power Domain (PD) frequency.

The function performs only bare minimum checks (and requires EM as
an argument) to reduce the overhead.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 include/linux/energy_model.h | 22 ++++++++++++++---
 kernel/power/energy_model.c  | 48 ++++++++++++++++++++++++++++++++++++
 2 files changed, 66 insertions(+), 4 deletions(-)

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index d30d67c2f07cf..feadd0fd6b356 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -55,6 +55,8 @@ struct em_perf_table {
  * struct em_perf_domain - Performance domain
  * @em_table:		Pointer to the runtime modifiable em_perf_table
  * @nr_perf_states:	Number of performance states
+ * @min_ps:		Minimum available performance state index
+ * @max_ps:		Maximum available performance state index
  * @flags:		See "em_perf_domain flags"
  * @cpus:		Cpumask covering the CPUs of the domain. It's here
  *			for performance reasons to avoid potential cache
@@ -70,6 +72,8 @@ struct em_perf_table {
 struct em_perf_domain {
 	struct em_perf_table __rcu *em_table;
 	int nr_perf_states;
+	int min_ps;
+	int max_ps;
 	unsigned long flags;
 	unsigned long cpus[];
 };
@@ -173,6 +177,8 @@ void em_table_free(struct em_perf_table __rcu *table);
 int em_dev_compute_costs(struct device *dev, struct em_perf_state *table,
 			 int nr_states);
 int em_dev_update_chip_binning(struct device *dev);
+int em_update_performance_limits(struct em_perf_domain *pd,
+		unsigned long freq_min_khz, unsigned long freq_max_khz);
 
 /**
  * em_pd_get_efficient_state() - Get an efficient performance state from the EM
@@ -189,12 +195,13 @@ int em_dev_update_chip_binning(struct device *dev);
  */
 static inline int
 em_pd_get_efficient_state(struct em_perf_state *table, int nr_perf_states,
-			  unsigned long max_util, unsigned long pd_flags)
+			  unsigned long max_util, unsigned long pd_flags,
+			  int min_ps, int max_ps)
 {
 	struct em_perf_state *ps;
 	int i;
 
-	for (i = 0; i < nr_perf_states; i++) {
+	for (i = min_ps; i <= max_ps; i++) {
 		ps = &table[i];
 		if (ps->performance >= max_util) {
 			if (pd_flags & EM_PERF_DOMAIN_SKIP_INEFFICIENCIES &&
@@ -204,7 +211,7 @@ em_pd_get_efficient_state(struct em_perf_state *table, int nr_perf_states,
 		}
 	}
 
-	return nr_perf_states - 1;
+	return max_ps;
 }
 
 /**
@@ -255,7 +262,8 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 	 */
 	em_table = rcu_dereference(pd->em_table);
 	i = em_pd_get_efficient_state(em_table->state, pd->nr_perf_states,
-				      max_util, pd->flags);
+				      max_util, pd->flags, pd->min_ps,
+				      pd->max_ps);
 	ps = &em_table->state[i];
 
 	/*
@@ -392,6 +400,12 @@ static inline int em_dev_update_chip_binning(struct device *dev)
 {
 	return -EINVAL;
 }
+static inline
+int em_update_performance_limits(struct em_perf_domain *pd,
+		unsigned long freq_min_khz, unsigned long freq_max_khz)
+{
+	return -EINVAL;
+}
 #endif
 
 #endif
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 927cc55ba0b3d..1a8b394251cb2 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -628,6 +628,8 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
 		goto unlock;
 
 	dev->em_pd->flags |= flags;
+	dev->em_pd->min_ps = 0;
+	dev->em_pd->max_ps = nr_states - 1;
 
 	em_cpufreq_update_efficiencies(dev, dev->em_pd->em_table->state);
 
@@ -856,3 +858,49 @@ int em_dev_update_chip_binning(struct device *dev)
 	return em_recalc_and_update(dev, pd, em_table);
 }
 EXPORT_SYMBOL_GPL(em_dev_update_chip_binning);
+
+
+/**
+ * em_update_performance_limits() - Update Energy Model with performance
+ *				limits information.
+ * @pd			: Performance Domain with EM that has to be updated.
+ * @freq_min_khz	: New minimum allowed frequency for this device.
+ * @freq_max_khz	: New maximum allowed frequency for this device.
+ *
+ * This function allows to update the EM with information about available
+ * performance levels. It takes the minimum and maximum frequency in kHz
+ * and does internal translation to performance levels.
+ * Returns 0 on success or -EINVAL when failed.
+ */
+int em_update_performance_limits(struct em_perf_domain *pd,
+		unsigned long freq_min_khz, unsigned long freq_max_khz)
+{
+	struct em_perf_state *table;
+	int min_ps = -1;
+	int max_ps = -1;
+	int i;
+
+	if (!pd)
+		return -EINVAL;
+
+	rcu_read_lock();
+	table = em_perf_state_from_pd(pd);
+
+	for (i = 0; i < pd->nr_perf_states; i++) {
+		if (freq_min_khz == table[i].frequency)
+			min_ps = i;
+		if (freq_max_khz == table[i].frequency)
+			max_ps = i;
+	}
+	rcu_read_unlock();
+
+	/* Only update when both are found and sane */
+	if (min_ps < 0 || max_ps < 0 || max_ps < min_ps)
+		return -EINVAL;
+
+	pd->min_ps = min_ps;
+	pd->max_ps = max_ps;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(em_update_performance_limits);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/2] cpufreq: scmi: Update Energy Model with allowed performance limits
  2024-04-03 16:23 [PATCH 0/2] Update Energy Model with perfromance limits Lukasz Luba
  2024-04-03 16:23 ` [PATCH 1/2] PM: EM: Add min/max available performance state limits Lukasz Luba
@ 2024-04-03 16:23 ` Lukasz Luba
  2024-04-22 13:11   ` Dietmar Eggemann
  2024-05-01  9:26   ` Cristian Marussi
  2024-04-05  9:56 ` [PATCH 0/2] Update Energy Model with perfromance limits Jonathan Cameron
  2 siblings, 2 replies; 11+ messages in thread
From: Lukasz Luba @ 2024-04-03 16:23 UTC (permalink / raw)
  To: linux-kernel, linux-pm
  Cc: lukasz.luba, dietmar.eggemann, linux-arm-kernel, sudeep.holla,
	cristian.marussi, linux-samsung-soc, rafael, viresh.kumar,
	quic_sibis

The Energy Model (EM) supports performance limits updates. Use the SCMI
notifications to get information from FW about allowed frequency scope for
the CPUs.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 drivers/cpufreq/scmi-cpufreq.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/cpufreq/scmi-cpufreq.c b/drivers/cpufreq/scmi-cpufreq.c
index d946b7a082584..90c8448578cb1 100644
--- a/drivers/cpufreq/scmi-cpufreq.c
+++ b/drivers/cpufreq/scmi-cpufreq.c
@@ -185,12 +185,25 @@ static int scmi_limit_notify_cb(struct notifier_block *nb, unsigned long event,
 {
 	struct scmi_data *priv = container_of(nb, struct scmi_data, limit_notify_nb);
 	struct scmi_perf_limits_report *limit_notify = data;
+	unsigned int limit_freq_max_khz, limit_freq_min_khz;
 	struct cpufreq_policy *policy = priv->policy;
-	unsigned int limit_freq_khz;
+	struct em_perf_domain *pd;
+	int ret;
+
+	limit_freq_max_khz = limit_notify->range_max_freq / HZ_PER_KHZ;
+	limit_freq_min_khz = limit_notify->range_min_freq / HZ_PER_KHZ;
 
-	limit_freq_khz = limit_notify->range_max_freq / HZ_PER_KHZ;
+	pd = em_cpu_get(policy->cpu);
+	if (pd) {
+		ret = em_update_performance_limits(pd, limit_freq_min_khz,
+						   limit_freq_max_khz);
+		if (ret)
+			dev_warn(priv->cpu_dev,
+				 "EM perf limits update failed\n");
+	}
 
-	policy->max = clamp(limit_freq_khz, policy->cpuinfo.min_freq, policy->cpuinfo.max_freq);
+	policy->max = clamp(limit_freq_max_khz, policy->cpuinfo.min_freq,
+			    policy->cpuinfo.max_freq);
 
 	cpufreq_update_pressure(policy);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] Update Energy Model with perfromance limits
  2024-04-03 16:23 [PATCH 0/2] Update Energy Model with perfromance limits Lukasz Luba
  2024-04-03 16:23 ` [PATCH 1/2] PM: EM: Add min/max available performance state limits Lukasz Luba
  2024-04-03 16:23 ` [PATCH 2/2] cpufreq: scmi: Update Energy Model with allowed performance limits Lukasz Luba
@ 2024-04-05  9:56 ` Jonathan Cameron
  2024-04-05 10:11   ` Lukasz Luba
  2 siblings, 1 reply; 11+ messages in thread
From: Jonathan Cameron @ 2024-04-05  9:56 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: linux-kernel, linux-pm, dietmar.eggemann, linux-arm-kernel,
	sudeep.holla, cristian.marussi, linux-samsung-soc, rafael,
	viresh.kumar, quic_sibis

On Wed,  3 Apr 2024 17:23:13 +0100
Lukasz Luba <lukasz.luba@arm.com> wrote:

Typo in patch title.  performance

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] Update Energy Model with perfromance limits
  2024-04-05  9:56 ` [PATCH 0/2] Update Energy Model with perfromance limits Jonathan Cameron
@ 2024-04-05 10:11   ` Lukasz Luba
  0 siblings, 0 replies; 11+ messages in thread
From: Lukasz Luba @ 2024-04-05 10:11 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-kernel, linux-pm, dietmar.eggemann, linux-arm-kernel,
	sudeep.holla, cristian.marussi, linux-samsung-soc, rafael,
	viresh.kumar, quic_sibis



On 4/5/24 10:56, Jonathan Cameron wrote:
> On Wed,  3 Apr 2024 17:23:13 +0100
> Lukasz Luba <lukasz.luba@arm.com> wrote:
> 
> Typo in patch title.  performance

Thank you, good catch!

Regards,
Lukasz

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] PM: EM: Add min/max available performance state limits
  2024-04-03 16:23 ` [PATCH 1/2] PM: EM: Add min/max available performance state limits Lukasz Luba
@ 2024-04-09 14:47   ` Hongyan Xia
  2024-04-22  7:24     ` Lukasz Luba
  2024-04-22  7:46   ` Dietmar Eggemann
  1 sibling, 1 reply; 11+ messages in thread
From: Hongyan Xia @ 2024-04-09 14:47 UTC (permalink / raw)
  To: Lukasz Luba, linux-kernel, linux-pm
  Cc: dietmar.eggemann, linux-arm-kernel, sudeep.holla,
	cristian.marussi, linux-samsung-soc, rafael, viresh.kumar,
	quic_sibis

On 03/04/2024 17:23, Lukasz Luba wrote:
> [...]
>
> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
> index 927cc55ba0b3d..1a8b394251cb2 100644
> --- a/kernel/power/energy_model.c
> +++ b/kernel/power/energy_model.c
> @@ -628,6 +628,8 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
>   		goto unlock;
>   
>   	dev->em_pd->flags |= flags;
> +	dev->em_pd->min_ps = 0;
> +	dev->em_pd->max_ps = nr_states - 1;
>   
>   	em_cpufreq_update_efficiencies(dev, dev->em_pd->em_table->state);
>   
> @@ -856,3 +858,49 @@ int em_dev_update_chip_binning(struct device *dev)
>   	return em_recalc_and_update(dev, pd, em_table);
>   }
>   EXPORT_SYMBOL_GPL(em_dev_update_chip_binning);
> +
> +
> +/**
> + * em_update_performance_limits() - Update Energy Model with performance
> + *				limits information.
> + * @pd			: Performance Domain with EM that has to be updated.
> + * @freq_min_khz	: New minimum allowed frequency for this device.
> + * @freq_max_khz	: New maximum allowed frequency for this device.
> + *
> + * This function allows to update the EM with information about available
> + * performance levels. It takes the minimum and maximum frequency in kHz
> + * and does internal translation to performance levels.
> + * Returns 0 on success or -EINVAL when failed.
> + */
> +int em_update_performance_limits(struct em_perf_domain *pd,
> +		unsigned long freq_min_khz, unsigned long freq_max_khz)
> +{
> +	struct em_perf_state *table;
> +	int min_ps = -1;
> +	int max_ps = -1;
> +	int i;
> +
> +	if (!pd)
> +		return -EINVAL;
> +
> +	rcu_read_lock();
> +	table = em_perf_state_from_pd(pd);
> +
> +	for (i = 0; i < pd->nr_perf_states; i++) {
> +		if (freq_min_khz == table[i].frequency)
> +			min_ps = i;
> +		if (freq_max_khz == table[i].frequency)
> +			max_ps = i;
> +	}
> +	rcu_read_unlock();
> +
> +	/* Only update when both are found and sane */
> +	if (min_ps < 0 || max_ps < 0 || max_ps < min_ps)
> +		return -EINVAL;
> +
> +	pd->min_ps = min_ps;
> +	pd->max_ps = max_ps;

Are we sure we are protected against multiple simultaneous updates? Or 
is this a concern for the caller?

The rest of the patch LGTM.

> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(em_update_performance_limits);

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] PM: EM: Add min/max available performance state limits
  2024-04-09 14:47   ` Hongyan Xia
@ 2024-04-22  7:24     ` Lukasz Luba
  0 siblings, 0 replies; 11+ messages in thread
From: Lukasz Luba @ 2024-04-22  7:24 UTC (permalink / raw)
  To: Hongyan Xia
  Cc: dietmar.eggemann, linux-pm, linux-kernel, linux-arm-kernel,
	sudeep.holla, cristian.marussi, linux-samsung-soc, rafael,
	viresh.kumar, quic_sibis



On 4/9/24 15:47, Hongyan Xia wrote:
> On 03/04/2024 17:23, Lukasz Luba wrote:
>> [...]
>>
>> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
>> index 927cc55ba0b3d..1a8b394251cb2 100644
>> --- a/kernel/power/energy_model.c
>> +++ b/kernel/power/energy_model.c
>> @@ -628,6 +628,8 @@ int em_dev_register_perf_domain(struct device 
>> *dev, unsigned int nr_states,
>>           goto unlock;
>>       dev->em_pd->flags |= flags;
>> +    dev->em_pd->min_ps = 0;
>> +    dev->em_pd->max_ps = nr_states - 1;
>>       em_cpufreq_update_efficiencies(dev, dev->em_pd->em_table->state);
>> @@ -856,3 +858,49 @@ int em_dev_update_chip_binning(struct device *dev)
>>       return em_recalc_and_update(dev, pd, em_table);
>>   }
>>   EXPORT_SYMBOL_GPL(em_dev_update_chip_binning);
>> +
>> +
>> +/**
>> + * em_update_performance_limits() - Update Energy Model with performance
>> + *                limits information.
>> + * @pd            : Performance Domain with EM that has to be updated.
>> + * @freq_min_khz    : New minimum allowed frequency for this device.
>> + * @freq_max_khz    : New maximum allowed frequency for this device.
>> + *
>> + * This function allows to update the EM with information about 
>> available
>> + * performance levels. It takes the minimum and maximum frequency in kHz
>> + * and does internal translation to performance levels.
>> + * Returns 0 on success or -EINVAL when failed.
>> + */
>> +int em_update_performance_limits(struct em_perf_domain *pd,
>> +        unsigned long freq_min_khz, unsigned long freq_max_khz)
>> +{
>> +    struct em_perf_state *table;
>> +    int min_ps = -1;
>> +    int max_ps = -1;
>> +    int i;
>> +
>> +    if (!pd)
>> +        return -EINVAL;
>> +
>> +    rcu_read_lock();
>> +    table = em_perf_state_from_pd(pd);
>> +
>> +    for (i = 0; i < pd->nr_perf_states; i++) {
>> +        if (freq_min_khz == table[i].frequency)
>> +            min_ps = i;
>> +        if (freq_max_khz == table[i].frequency)
>> +            max_ps = i;
>> +    }
>> +    rcu_read_unlock();
>> +
>> +    /* Only update when both are found and sane */
>> +    if (min_ps < 0 || max_ps < 0 || max_ps < min_ps)
>> +        return -EINVAL;
>> +
>> +    pd->min_ps = min_ps;
>> +    pd->max_ps = max_ps;
> 
> Are we sure we are protected against multiple simultaneous updates? Or 
> is this a concern for the caller?
> 
> The rest of the patch LGTM.
> 

I've tried to make it running fast for only one caller. Although,
if someone would like to use it from many places then locking should be
handled under in function (and I will use the existing mutex for it).

I'll change it. Thanks for the review.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] PM: EM: Add min/max available performance state limits
  2024-04-03 16:23 ` [PATCH 1/2] PM: EM: Add min/max available performance state limits Lukasz Luba
  2024-04-09 14:47   ` Hongyan Xia
@ 2024-04-22  7:46   ` Dietmar Eggemann
  1 sibling, 0 replies; 11+ messages in thread
From: Dietmar Eggemann @ 2024-04-22  7:46 UTC (permalink / raw)
  To: Lukasz Luba, linux-kernel, linux-pm
  Cc: linux-arm-kernel, sudeep.holla, cristian.marussi,
	linux-samsung-soc, rafael, viresh.kumar, quic_sibis

On 03/04/2024 18:23, Lukasz Luba wrote:
> On some devices there are HW dependencies for shared frequency and voltage
> between devices: CPUs and L3 cache. When the L3 cache frequency is
> increased, the affected CPUs might run at higher voltage and frequency.

IMHO, this is an example where the min Performance State (PS) makes
sense. But what's a use case for the max PS?

> That higher voltage causes higher CPU power and thus more energy is used
> for running the tasks.
> 
> Add performance state limits which are applied for the device. This allows

Regarding device, I thought that this is only applicable for device type
CPU?

> the Energy Model (EM) to better reflect the CPU's currently available
> performance states. This information is used by Energy Aware Scheduler
> (EAS) during task placement to avoid situation when a simulated energy
> cost has error due to using wrong Power Domain (PD) frequency.

Maybe better?

This is important on SoCs with HW dependencies mentioned above so that
the  Energy Aware Scheduler (EAS) does not use performance states
outside the valid min-max range for energy calculation.

> The function performs only bare minimum checks (and requires EM as

s/The function/em_update_performance_limits()

s/EM/PD ... I guess we always pass a PD pointer to all the EM functions.
I guess we can say that an EM consists of at least 2 PDs.
I guess it's valid to say that we limit per PD, e.g. per little CPUs?

[...]

>  /**
>   * em_pd_get_efficient_state() - Get an efficient performance state from the EM
> @@ -189,12 +195,13 @@ int em_dev_update_chip_binning(struct device *dev);
>   */

Missing  min_ps, max_ps description in function header of
em_pd_get_efficient_state().

[...]

> +/**
> + * em_update_performance_limits() - Update Energy Model with performance
> + *				limits information.
> + * @pd			: Performance Domain with EM that has to be updated.
> + * @freq_min_khz	: New minimum allowed frequency for this device.
> + * @freq_max_khz	: New maximum allowed frequency for this device.
> + *
> + * This function allows to update the EM with information about available
> + * performance levels. It takes the minimum and maximum frequency in kHz
> + * and does internal translation to performance levels.
> + * Returns 0 on success or -EINVAL when failed.
> + */
> +int em_update_performance_limits(struct em_perf_domain *pd,
> +		unsigned long freq_min_khz, unsigned long freq_max_khz)
> +{
> +	struct em_perf_state *table;
> +	int min_ps = -1;
> +	int max_ps = -1;
> +	int i;
> +
> +	if (!pd)
> +		return -EINVAL;
> +
> +	rcu_read_lock();
> +	table = em_perf_state_from_pd(pd);
> +
> +	for (i = 0; i < pd->nr_perf_states; i++) {
> +		if (freq_min_khz == table[i].frequency)

So the caller has to know the exact frequency value of the performance
states (PSs)? It's not 'f(PS_n-1) + 1 <= x <= f(PS_n)'.

[...]


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] cpufreq: scmi: Update Energy Model with allowed performance limits
  2024-04-03 16:23 ` [PATCH 2/2] cpufreq: scmi: Update Energy Model with allowed performance limits Lukasz Luba
@ 2024-04-22 13:11   ` Dietmar Eggemann
  2024-04-22 13:55     ` Lukasz Luba
  2024-05-01  9:26   ` Cristian Marussi
  1 sibling, 1 reply; 11+ messages in thread
From: Dietmar Eggemann @ 2024-04-22 13:11 UTC (permalink / raw)
  To: Lukasz Luba, linux-kernel, linux-pm
  Cc: linux-arm-kernel, sudeep.holla, cristian.marussi,
	linux-samsung-soc, rafael, viresh.kumar, quic_sibis

On 03/04/2024 18:23, Lukasz Luba wrote:
> The Energy Model (EM) supports performance limits updates. Use the SCMI
> notifications to get information from FW about allowed frequency scope for
> the CPUs.

I'm slightly confused here. IMHO this doesn't seem to be related to the
"HW dependency between 'little CPUs & L3 $ in DSU' or similar" usecase.

I assumed that this usecase is rather handled via an additional
out-of-tree driver, potentially the same which updates the EM because of
temperature change (em_dev_compute_costs(), em_dev_update_perf_domain())
or chip binning (em_dev_update_chip_binning()).

What about other CPUFreq drivers registering an EM via
em_dev_register_perf_domain() or 'cpufreq_register_em_with_opp() ->
dev_pm_opp_of_register_em()'? Or is this 'limit notification' an SCMI FW
only thing?

[...]


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] cpufreq: scmi: Update Energy Model with allowed performance limits
  2024-04-22 13:11   ` Dietmar Eggemann
@ 2024-04-22 13:55     ` Lukasz Luba
  0 siblings, 0 replies; 11+ messages in thread
From: Lukasz Luba @ 2024-04-22 13:55 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: linux-arm-kernel, sudeep.holla, linux-pm, linux-kernel,
	cristian.marussi, linux-samsung-soc, rafael, viresh.kumar,
	quic_sibis



On 4/22/24 14:11, Dietmar Eggemann wrote:
> On 03/04/2024 18:23, Lukasz Luba wrote:
>> The Energy Model (EM) supports performance limits updates. Use the SCMI
>> notifications to get information from FW about allowed frequency scope for
>> the CPUs.
> 
> I'm slightly confused here. IMHO this doesn't seem to be related to the
> "HW dependency between 'little CPUs & L3 $ in DSU' or similar" usecase.
> 
> I assumed that this usecase is rather handled via an additional
> out-of-tree driver, potentially the same which updates the EM because of
> temperature change (em_dev_compute_costs(), em_dev_update_perf_domain())
> or chip binning (em_dev_update_chip_binning()).

This patch allows to handle relatively simple and straight forward use
case for updating the perf limits in the EM. The one that you mention,
which would probably always live out-of-tree, is more complex and
focused on leakage estimation in different conditions.

I see those two drivers separate. We have the DSU+Littles dependency
always, because it's HW dependency, while the leakage issue can happen
in some scenarios like gaming and needs more dedicated driver to handle
it (or rely on FW, but that's another story, orthogonal as well) and
more information to do it properly.

> 
> What about other CPUFreq drivers registering an EM via
> em_dev_register_perf_domain() or 'cpufreq_register_em_with_opp() ->
> dev_pm_opp_of_register_em()'? Or is this 'limit notification' an SCMI FW
> only thing?

Other platforms which use different drivers for CPUfreq will have to
develop their own code. Although, when we merge this upstream, they
could follow this pattern as a reference design.

In our SCMI cpufreq and our SCP firmware we have this situation:
1. Sending CPUfreq request from Big CPU to SCP e.g. via fast-channel &
    it's done from sched-util w/o sugov kthread &
    it has to be super fast, we don't check any other dependency CPU
    for Littles or something like that.
2. The SCP firmware receives the frequency request for Big CPU &
    it checks internal dependencies for other components e.g.
    L3 cache min speed (DSU+Littles domian frequency)
3. The SCP changes the Big CPU frequency & changes the DSU+Littles
    frequency as the depended device
4. The SCP sends updated performance limits for the depended DSU+Littles
    domian to the SCMI cpufreq kernel driver, for proper Littles domain
    cpufreq device
5. SCMI cpufreq kernel driver gets the SCP notification about updated
    perf limits & translates the perf limits min value as the lowest
    currently available frequency for the Littles
6. The SCMI cpufreq driver updates the EM perf limits for Littles
    as the currently minimum available frequency.
    This allows to properly simulate the energy impact when the EAS
    tries to put a task on that domain, even when that PD's util signals
    might show lower frequency potentially being used.

That's why I see this as part of the CPUfreq driver feature.

The leakage driver might be better suited for the thermal framework,
since there is more information available there.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] cpufreq: scmi: Update Energy Model with allowed performance limits
  2024-04-03 16:23 ` [PATCH 2/2] cpufreq: scmi: Update Energy Model with allowed performance limits Lukasz Luba
  2024-04-22 13:11   ` Dietmar Eggemann
@ 2024-05-01  9:26   ` Cristian Marussi
  1 sibling, 0 replies; 11+ messages in thread
From: Cristian Marussi @ 2024-05-01  9:26 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: linux-kernel, linux-pm, dietmar.eggemann, linux-arm-kernel,
	sudeep.holla, linux-samsung-soc, rafael, viresh.kumar,
	quic_sibis

On Wed, Apr 03, 2024 at 05:23:15PM +0100, Lukasz Luba wrote:
> The Energy Model (EM) supports performance limits updates. Use the SCMI
> notifications to get information from FW about allowed frequency scope for
> the CPUs.
> 
> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
> ---
>  drivers/cpufreq/scmi-cpufreq.c | 19 ++++++++++++++++---
>  1 file changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/cpufreq/scmi-cpufreq.c b/drivers/cpufreq/scmi-cpufreq.c
> index d946b7a082584..90c8448578cb1 100644
> --- a/drivers/cpufreq/scmi-cpufreq.c
> +++ b/drivers/cpufreq/scmi-cpufreq.c
> @@ -185,12 +185,25 @@ static int scmi_limit_notify_cb(struct notifier_block *nb, unsigned long event,
>  {
>  	struct scmi_data *priv = container_of(nb, struct scmi_data, limit_notify_nb);
>  	struct scmi_perf_limits_report *limit_notify = data;
> +	unsigned int limit_freq_max_khz, limit_freq_min_khz;
>  	struct cpufreq_policy *policy = priv->policy;
> -	unsigned int limit_freq_khz;
> +	struct em_perf_domain *pd;
> +	int ret;
> +
> +	limit_freq_max_khz = limit_notify->range_max_freq / HZ_PER_KHZ;
> +	limit_freq_min_khz = limit_notify->range_min_freq / HZ_PER_KHZ;

Note that these values could be zeroed if the notification is good but
the range_min/range_max values could NOT be mapped to a frequency
equivalent (due to some FW errors).

I would probably have to add a warn about this error in the core SCMI
notification path (or drop the notif as a whole); if not here you could
end-up just setting max/min to 0 if the fw has messed up the
notification range_min/range_max.

Or is it just that, especially max_feq = 0 is NOT plausible value and you
will need anyway to check it here ?

>  
> -	limit_freq_khz = limit_notify->range_max_freq / HZ_PER_KHZ;
> +	pd = em_cpu_get(policy->cpu);
> +	if (pd) {
> +		ret = em_update_performance_limits(pd, limit_freq_min_khz,
> +						   limit_freq_max_khz);
> +		if (ret)
> +			dev_warn(priv->cpu_dev,
> +				 "EM perf limits update failed\n");
> +	}
>  
> -	policy->max = clamp(limit_freq_khz, policy->cpuinfo.min_freq, policy->cpuinfo.max_freq);
> +	policy->max = clamp(limit_freq_max_khz, policy->cpuinfo.min_freq,
> +			    policy->cpuinfo.max_freq);

FWIW, regarding the SCMI bits.

LGTM.
Reviewed-by: Cristian Marussi <cristian.marussi@arm.com>

Thanks,
Cristian

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-05-01  9:26 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-03 16:23 [PATCH 0/2] Update Energy Model with perfromance limits Lukasz Luba
2024-04-03 16:23 ` [PATCH 1/2] PM: EM: Add min/max available performance state limits Lukasz Luba
2024-04-09 14:47   ` Hongyan Xia
2024-04-22  7:24     ` Lukasz Luba
2024-04-22  7:46   ` Dietmar Eggemann
2024-04-03 16:23 ` [PATCH 2/2] cpufreq: scmi: Update Energy Model with allowed performance limits Lukasz Luba
2024-04-22 13:11   ` Dietmar Eggemann
2024-04-22 13:55     ` Lukasz Luba
2024-05-01  9:26   ` Cristian Marussi
2024-04-05  9:56 ` [PATCH 0/2] Update Energy Model with perfromance limits Jonathan Cameron
2024-04-05 10:11   ` Lukasz Luba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).