devicetree.vger.kernel.org archive mirror
* [PATCH v3 0/3] CPUFreq: Add support for cpu performance dependencies
@ 2020-11-02 12:01 Nicola Mazzucato
  2020-11-02 12:01 ` [PATCH v3 1/3] dt-bindings/opp: Update documentation for opp-shared Nicola Mazzucato
                   ` (3 more replies)
  0 siblings, 4 replies; 29+ messages in thread
From: Nicola Mazzucato @ 2020-11-02 12:01 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-pm, devicetree,
	sudeep.holla, rjw, vireshk, robh+dt, sboyd, nm, daniel.lezcano
  Cc: morten.rasmussen, chris.redpath, nicola.mazzucato

Hi All,

In this V3 posting I have replaced the new dt-binding with minor changes/
improvements for opp (since we are now using opp tables instead).
The RFC still stands on how to make this info available to sw consumers.

In the RFC, I am proposing a simple addition of a performance dependencies
cpumask in CPUFreq core and an example of how drivers and consumers would
make use of it.
I also propose an alternative approach, which does not require changes in
CPUFreq core, but - possibly - in all the consumers.

This is to support systems where exposed cpu performance controls are more
fine-grained than the platform's ability to scale cpus independently.

[v3]
  * Remove proposal for new 'cpu-performance-dependencies' as we instead
    can reuse the opp table.
  * Update documentation for devicetree/bindings/opp
  * Minor changes within opp to support empty opp table
  * Rework the RFC by adding a second proposal

[v2]
  * Fix errors when running make dt_binding_check
  * Improve commit message description for the dt-binding
  * Add RFC for implementation in cpufreq-core and one of its
    drivers.

Nicola Mazzucato (3):
  dt-bindings/opp: Update documentation for opp-shared
  opp/of: Allow empty opp-table with opp-shared
  [RFC] CPUFreq: Add support for cpu-perf-dependencies

 Documentation/devicetree/bindings/opp/opp.txt | 53 +++++++++++++++++++
 drivers/opp/of.c                              | 13 ++++-
 2 files changed, 64 insertions(+), 2 deletions(-)

-- 
2.27.0


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v3 1/3] dt-bindings/opp: Update documentation for opp-shared
  2020-11-02 12:01 [PATCH v3 0/3] CPUFreq: Add support for cpu performance dependencies Nicola Mazzucato
@ 2020-11-02 12:01 ` Nicola Mazzucato
  2020-11-02 12:01 ` [PATCH v3 2/3] opp/of: Allow empty opp-table with opp-shared Nicola Mazzucato
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 29+ messages in thread
From: Nicola Mazzucato @ 2020-11-02 12:01 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-pm, devicetree,
	sudeep.holla, rjw, vireshk, robh+dt, sboyd, nm, daniel.lezcano
  Cc: morten.rasmussen, chris.redpath, nicola.mazzucato

Currently the optional property opp-shared is used within an opp table
to tell that a set of devices share their clock/voltage lines (and the
opp points).
It is therefore possible to use an empty opp table to convey only that
information, useful in situations where the opp points are provided via
other means (hardware, firmware, etc.).

Update the documentation to remark this additional case and provide an
example.

Signed-off-by: Nicola Mazzucato <nicola.mazzucato@arm.com>
---
 Documentation/devicetree/bindings/opp/opp.txt | 53 +++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/Documentation/devicetree/bindings/opp/opp.txt b/Documentation/devicetree/bindings/opp/opp.txt
index 9847dfeeffcb..a5f1f993eab9 100644
--- a/Documentation/devicetree/bindings/opp/opp.txt
+++ b/Documentation/devicetree/bindings/opp/opp.txt
@@ -72,6 +72,9 @@ Optional properties:
   switch their DVFS state together, i.e. they share clock/voltage/current lines.
   Missing property means devices have independent clock/voltage/current lines,
   but they share OPP tables.
+  This optional property can be used without any OPP nodes when its only purpose
+  is to describe a dependency of clock/voltage/current lines among a set of
+  devices.
 
 - status: Marks the OPP table enabled/disabled.
 
@@ -568,3 +571,53 @@ Example 6: opp-microvolt-<name>, opp-microamp-<name>:
 		};
 	};
 };
+
+Example 7: Single cluster quad-core Arm Cortex-A53, OPP points from firmware,
+distinct clock controls but two sets of clock/voltage/current lines.
+
+/ {
+	cpus {
+		#address-cells = <2>;
+		#size-cells = <0>;
+
+		cpu@0 {
+			compatible = "arm,cortex-a53";
+			reg = <0x0 0x100>;
+			next-level-cache = <&A53_L2>;
+			clocks = <&dvfs_controller 0>;
+			operating-points-v2 = <&cpu_opp0_table>;
+		};
+		cpu@1 {
+			compatible = "arm,cortex-a53";
+			reg = <0x0 0x101>;
+			next-level-cache = <&A53_L2>;
+			clocks = <&dvfs_controller 1>;
+			operating-points-v2 = <&cpu_opp0_table>;
+		};
+		cpu@2 {
+			compatible = "arm,cortex-a53";
+			reg = <0x0 0x102>;
+			next-level-cache = <&A53_L2>;
+			clocks = <&dvfs_controller 2>;
+			operating-points-v2 = <&cpu_opp1_table>;
+		};
+		cpu@3 {
+			compatible = "arm,cortex-a53";
+			reg = <0x0 0x103>;
+			next-level-cache = <&A53_L2>;
+			clocks = <&dvfs_controller 3>;
+			operating-points-v2 = <&cpu_opp1_table>;
+		};
+
+	};
+
+	cpu_opp0_table: opp0_table {
+		compatible = "operating-points-v2";
+		opp-shared;
+	};
+
+	cpu_opp1_table: opp1_table {
+		compatible = "operating-points-v2";
+		opp-shared;
+	};
+};
-- 
2.27.0



* [PATCH v3 2/3] opp/of: Allow empty opp-table with opp-shared
  2020-11-02 12:01 [PATCH v3 0/3] CPUFreq: Add support for cpu performance dependencies Nicola Mazzucato
  2020-11-02 12:01 ` [PATCH v3 1/3] dt-bindings/opp: Update documentation for opp-shared Nicola Mazzucato
@ 2020-11-02 12:01 ` Nicola Mazzucato
  2020-11-03  5:01   ` Viresh Kumar
  2020-11-02 12:01 ` [PATCH v3 3/3] [RFC] CPUFreq: Add support for cpu-perf-dependencies Nicola Mazzucato
  2020-11-03 10:18 ` [PATCH v3 0/3] CPUFreq: Add support for cpu performance dependencies Viresh Kumar
  3 siblings, 1 reply; 29+ messages in thread
From: Nicola Mazzucato @ 2020-11-02 12:01 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-pm, devicetree,
	sudeep.holla, rjw, vireshk, robh+dt, sboyd, nm, daniel.lezcano
  Cc: morten.rasmussen, chris.redpath, nicola.mazzucato

The opp binding now allows an empty opp table with opp-shared to merely
describe a hw connection among devices (f/v lines).

When initialising an opp table, allow such a case by:
- treating some errors as warnings
- not marking empty tables as shared
- not failing on an empty table

Signed-off-by: Nicola Mazzucato <nicola.mazzucato@arm.com>
---
 drivers/opp/of.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/opp/of.c b/drivers/opp/of.c
index 874b58756220..b0230490bb31 100644
--- a/drivers/opp/of.c
+++ b/drivers/opp/of.c
@@ -157,6 +157,11 @@ static void _opp_table_free_required_tables(struct opp_table *opp_table)
 /*
  * Populate all devices and opp tables which are part of "required-opps" list.
  * Checking only the first OPP node should be enough.
+ *
+ * Corner case: empty opp table and opp-shared found. In this case we
+ * unconditionally set the opp table access to exclusive, as the opp-shared property
+ * is used purely to describe hw connections. Such information will be retrieved
+ * via dev_pm_opp_of_get_sharing_cpus().
  */
 static void _opp_table_alloc_required_tables(struct opp_table *opp_table,
 					     struct device *dev,
@@ -169,7 +174,9 @@ static void _opp_table_alloc_required_tables(struct opp_table *opp_table,
 	/* Traversing the first OPP node is all we need */
 	np = of_get_next_available_child(opp_np, NULL);
 	if (!np) {
-		dev_err(dev, "Empty OPP table\n");
+		dev_warn(dev, "Empty OPP table\n");
+
+		opp_table->shared_opp = OPP_TABLE_ACCESS_EXCLUSIVE;
 		return;
 	}
 
@@ -377,7 +384,9 @@ int dev_pm_opp_of_find_icc_paths(struct device *dev,
 	struct icc_path **paths;
 
 	ret = _bandwidth_supported(dev, opp_table);
-	if (ret <= 0)
+	if (ret == -EINVAL)
+		return 0; /* Empty OPP table is a valid corner-case, let's not fail */
+	else if (ret <= 0)
 		return ret;
 
 	ret = 0;
-- 
2.27.0



* [PATCH v3 3/3] [RFC] CPUFreq: Add support for cpu-perf-dependencies
  2020-11-02 12:01 [PATCH v3 0/3] CPUFreq: Add support for cpu performance dependencies Nicola Mazzucato
  2020-11-02 12:01 ` [PATCH v3 1/3] dt-bindings/opp: Update documentation for opp-shared Nicola Mazzucato
  2020-11-02 12:01 ` [PATCH v3 2/3] opp/of: Allow empty opp-table with opp-shared Nicola Mazzucato
@ 2020-11-02 12:01 ` Nicola Mazzucato
  2020-11-06  9:20   ` Viresh Kumar
                     ` (2 more replies)
  2020-11-03 10:18 ` [PATCH v3 0/3] CPUFreq: Add support for cpu performance dependencies Viresh Kumar
  3 siblings, 3 replies; 29+ messages in thread
From: Nicola Mazzucato @ 2020-11-02 12:01 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-pm, devicetree,
	sudeep.holla, rjw, vireshk, robh+dt, sboyd, nm, daniel.lezcano
  Cc: morten.rasmussen, chris.redpath, nicola.mazzucato

Hi All,

This is a continuation of the previous v2, where we focused mostly on the
dt binding.

I am seeking some feedback/comments on the following two approaches.

Intro:
We have seen that in a system where performance control and hardware
description do not match (i.e. per-cpu), we still need the information of
how the v/f lines are shared among the cpus. We call this information
"performance dependencies".
We got this info through the opp-shared (the previous 2 patches aim for
that).

Problem:
How do we share such info (retrieved from a cpufreq driver) to other
consumers that rely on it? I have two proposals.

First proposal:
Having a new cpumask 'dependent_cpus' to represent the
CPU performance dependencies seems a viable and scalable way.

The main reason for a new cpumask is that although 'related_cpus'
could be (or could have been) used for such a purpose, its meaning has
changed over time [1].

Provided that the cpufreq driver populates dependent_cpus and sets
shared_type accordingly, the s/w components that rely on such a description
(we focus on the energy model and cpufreq_cooling for now) will always be
given the correct information when picking the new cpumask.

Proposed changes (at a high level), three in total:

1) cpufreq: Add a new dependent_cpus cpumask in cpufreq_policy

   * New cpumask addition
   <snippet>
struct cpufreq_policy {
        cpumask_var_t           related_cpus; /* Online + Offline CPUs */
        cpumask_var_t           real_cpus; /* Related and present */

+       /*
+        * CPUs with hardware clk/perf dependencies
+        *
+        * For sw components that rely on h/w info of clk dependencies when hw
+        * coordinates. This cpumask should always reflect the hw dependencies.
+        */
+       cpumask_var_t           dependent_cpus;                 /* all clk-dependent cpus */
+
        unsigned int            shared_type; /* ACPI: ANY or ALL affected CPUs
   </snippet>

   * Fallback mechanism for dependent_cpus. With this, s/w components can
     always pick dependent_cpus regardless of the coordination type.
   <snippet>
static int cpufreq_online(unsigned int cpu)

                /* related_cpus should at least include policy->cpus. */
                cpumask_copy(policy->related_cpus, policy->cpus);
+
+               /* dependent_cpus should differ only when hw coordination is in place */
+               if (policy->shared_type != CPUFREQ_SHARED_TYPE_HW)
+                       cpumask_copy(policy->dependent_cpus, policy->cpus);
        }
   </snippet>

   * Add sysfs attribute for dependent_cpus

2) drivers/thermal/cpufreq_cooling: Replace related_cpus with dependent_cpus

3) drivers/cpufreq/scmi-cpufreq: Get perf-dependencies from dev_pm_opp_of_get_sharing_cpus()

   * Call dev_pm_opp_of_get_sharing_cpus()

   * Populate policy->dependent_cpus if possible

   * Set policy->shared_type accordingly

   * Provide to EM the correct performance dependencies information
   <snippet>
-       em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, policy->cpus);
+       /*
+        * EM needs accurate information about perf boundaries, thus provide the
+        * correct cpumask.
+        */
+       if (policy->shared_type == CPUFREQ_SHARED_TYPE_HW)
+               em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb,
+                                           policy->dependent_cpus);
+       else
+               em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb,
+                                           policy->cpus);
   </snippet>

Second proposal:

Another option could be for each driver to store the performance
dependencies internally and to provide the correct cpumask directly to any
consumer.
Whilst this works fine for the energy model (see above), it is currently not
the case for cpufreq_cooling, so we would need to add a new interface for
it; that way the driver can call the registration function directly.
This seems possible since the CPUFreq core can skip registering
cpufreq_cooling (via the CPUFREQ_IS_COOLING_DEV flag).

In comparison with the first proposal, this one is less scalable, since each
driver will have to call the registration functions itself, while in some
cases the CPUFreq core could do it.

Any comments/preferences?

Many Thanks
Nicola

[1] https://lkml.org/lkml/2020/9/24/399
-- 
2.27.0



* Re: [PATCH v3 2/3] opp/of: Allow empty opp-table with opp-shared
  2020-11-02 12:01 ` [PATCH v3 2/3] opp/of: Allow empty opp-table with opp-shared Nicola Mazzucato
@ 2020-11-03  5:01   ` Viresh Kumar
  2020-11-04 17:54     ` Nicola Mazzucato
  0 siblings, 1 reply; 29+ messages in thread
From: Viresh Kumar @ 2020-11-03  5:01 UTC (permalink / raw)
  To: Nicola Mazzucato
  Cc: linux-kernel, linux-arm-kernel, linux-pm, devicetree,
	sudeep.holla, rjw, vireshk, robh+dt, sboyd, nm, daniel.lezcano,
	morten.rasmussen, chris.redpath

On 02-11-20, 12:01, Nicola Mazzucato wrote:
> The opp binding now allows an empty opp table with opp-shared to merely
> describe a hw connection among devices (f/v lines).
> 
> When initialising an opp table, allow such a case by:
> - treating some errors as warnings
> - not marking empty tables as shared
> - not failing on an empty table
> 
> Signed-off-by: Nicola Mazzucato <nicola.mazzucato@arm.com>
> ---
>  drivers/opp/of.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/opp/of.c b/drivers/opp/of.c
> index 874b58756220..b0230490bb31 100644
> --- a/drivers/opp/of.c
> +++ b/drivers/opp/of.c
> @@ -157,6 +157,11 @@ static void _opp_table_free_required_tables(struct opp_table *opp_table)
>  /*
>   * Populate all devices and opp tables which are part of "required-opps" list.
>   * Checking only the first OPP node should be enough.
> + *
> + * Corner case: empty opp table and opp-shared found. In this case we
> + * unconditionally set the opp table access to exclusive, as the opp-shared property
> + * is used purely to describe hw connections. Such information will be retrieved
> + * via dev_pm_opp_of_get_sharing_cpus().
>   */
>  static void _opp_table_alloc_required_tables(struct opp_table *opp_table,
>  					     struct device *dev,
> @@ -169,7 +174,9 @@ static void _opp_table_alloc_required_tables(struct opp_table *opp_table,
>  	/* Traversing the first OPP node is all we need */
>  	np = of_get_next_available_child(opp_np, NULL);
>  	if (!np) {
> -		dev_err(dev, "Empty OPP table\n");
> +		dev_warn(dev, "Empty OPP table\n");
> +
> +		opp_table->shared_opp = OPP_TABLE_ACCESS_EXCLUSIVE;

I am not sure I understand the reasoning behind this.

>  		return;
>  	}
>  
> @@ -377,7 +384,9 @@ int dev_pm_opp_of_find_icc_paths(struct device *dev,
>  	struct icc_path **paths;
>  
>  	ret = _bandwidth_supported(dev, opp_table);
> -	if (ret <= 0)
> +	if (ret == -EINVAL)
> +		return 0; /* Empty OPP table is a valid corner-case, let's not fail */
> +	else if (ret <= 0)
>  		return ret;
>  
>  	ret = 0;
> -- 
> 2.27.0

-- 
viresh


* Re: [PATCH v3 0/3] CPUFreq: Add support for cpu performance dependencies
  2020-11-02 12:01 [PATCH v3 0/3] CPUFreq: Add support for cpu performance dependencies Nicola Mazzucato
                   ` (2 preceding siblings ...)
  2020-11-02 12:01 ` [PATCH v3 3/3] [RFC] CPUFreq: Add support for cpu-perf-dependencies Nicola Mazzucato
@ 2020-11-03 10:18 ` Viresh Kumar
  2020-11-04 18:04   ` Nicola Mazzucato
  3 siblings, 1 reply; 29+ messages in thread
From: Viresh Kumar @ 2020-11-03 10:18 UTC (permalink / raw)
  To: Nicola Mazzucato, vincent.guittot
  Cc: linux-kernel, linux-arm-kernel, linux-pm, devicetree,
	sudeep.holla, rjw, vireshk, robh+dt, sboyd, nm, daniel.lezcano,
	morten.rasmussen, chris.redpath

On 02-11-20, 12:01, Nicola Mazzucato wrote:
> Hi All,
> 
> In this V3 posting I have replaced the new dt-binding with minor changes/
> improvements for opp (since we are now using opp tables instead).
> The RFC still stands on how to make this info available to sw consumers.
> 
> In the RFC, I am proposing a simple addition of a performance dependencies
> cpumask in CPUFreq core and an example of how drivers and consumers would
> make use of it.
> I also propose an alternative approach, which does not require changes in
> CPUFreq core, but - possibly - in all the consumers.
> 
> This is to support systems where exposed cpu performance controls are more
> fine-grained than the platform's ability to scale cpus independently.

I was talking to Vincent about what you are doing here and we got a
bit confused and so here are few questions that we had:

- Based on my previous understanding, you don't want software
  coordination of frequencies (which is done by cpufreq today), but
  want the hardware to do that and so you want per-cpu cpufreq
  policies.

- What's the real benefit of hardware coordination ? Want to make sure
  I fully understand that.

- Because of hardware co-ordination of otherwise co-ordinated CPUs, a
  few things break. Thermal and EAS are some of the examples, and so
  you are trying to fix them here by providing them the missing
  information again.

- One other thing that breaks with this is freq-invariance in the
  scheduler, as the scheduler won't see the real frequencies the
  various CPUs are running at. Most of the hardware we have today
  doesn't have counters, like AMUs, not sure if all future ones based
  on SCMI will have that too, so how are they going to be fixed?

  And if we even have to fix this (freq invariance), what's hardware
  coordination giving us that makes all this worth it ?

Sorry about the long list :)

-- 
viresh


* Re: [PATCH v3 2/3] opp/of: Allow empty opp-table with opp-shared
  2020-11-03  5:01   ` Viresh Kumar
@ 2020-11-04 17:54     ` Nicola Mazzucato
  2020-11-05  4:41       ` Viresh Kumar
  0 siblings, 1 reply; 29+ messages in thread
From: Nicola Mazzucato @ 2020-11-04 17:54 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: linux-kernel, linux-arm-kernel, linux-pm, devicetree,
	sudeep.holla, rjw, vireshk, robh+dt, sboyd, nm, daniel.lezcano,
	morten.rasmussen, chris.redpath

Hi Viresh, thanks for looking into this.

On 11/3/20 5:01 AM, Viresh Kumar wrote:
> On 02-11-20, 12:01, Nicola Mazzucato wrote:
>> The opp binding now allows an empty opp table with opp-shared to merely
>> describe a hw connection among devices (f/v lines).
>>
>> When initialising an opp table, allow such a case by:
>> - treating some errors as warnings
>> - not marking empty tables as shared
>> - not failing on an empty table
>>
>> Signed-off-by: Nicola Mazzucato <nicola.mazzucato@arm.com>
>> ---
>>  drivers/opp/of.c | 13 +++++++++++--
>>  1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/opp/of.c b/drivers/opp/of.c
>> index 874b58756220..b0230490bb31 100644
>> --- a/drivers/opp/of.c
>> +++ b/drivers/opp/of.c
>> @@ -157,6 +157,11 @@ static void _opp_table_free_required_tables(struct opp_table *opp_table)
>>  /*
>>   * Populate all devices and opp tables which are part of "required-opps" list.
>>   * Checking only the first OPP node should be enough.
>> + *
>> + * Corner case: empty opp table and opp-shared found. In this case we
>> + * unconditionally set the opp table access to exclusive, as the opp-shared property
>> + * is used purely to describe hw connections. Such information will be retrieved
>> + * via dev_pm_opp_of_get_sharing_cpus().
>>   */
>>  static void _opp_table_alloc_required_tables(struct opp_table *opp_table,
>>  					     struct device *dev,
>> @@ -169,7 +174,9 @@ static void _opp_table_alloc_required_tables(struct opp_table *opp_table,
>>  	/* Traversing the first OPP node is all we need */
>>  	np = of_get_next_available_child(opp_np, NULL);
>>  	if (!np) {
>> -		dev_err(dev, "Empty OPP table\n");
>> +		dev_warn(dev, "Empty OPP table\n");
>> +
>> +		opp_table->shared_opp = OPP_TABLE_ACCESS_EXCLUSIVE;
> 
> I am not sure I understand the reasoning behind this.

Initially I thought to place a comment right there but I ended up with an
explanation of this case at the top of this function (the corner-case). It
probably also needs more details..
Basically, in this case - empty opp table & opp-shared - we limit the scope of
opp-shared to *only* tell us about the hw description, without marking the opp
points as shared, since they are not present in DT. It would be the equivalent
of describing that devices share clock/voltage lines, while we can't tell
anything about opp points because they are not there (in DT).
OTOH, if we don't set shared_opp to OPP_TABLE_ACCESS_EXCLUSIVE for that specific
case, we won't be able to add opps for the remaining cpus, as the opp core
will find the opps duplicated. This is a corner case, really.

Please let me know if it's not clear.

Many thanks
Nicola

> 
>>  		return;
>>  	}
>>  
>> @@ -377,7 +384,9 @@ int dev_pm_opp_of_find_icc_paths(struct device *dev,
>>  	struct icc_path **paths;
>>  
>>  	ret = _bandwidth_supported(dev, opp_table);
>> -	if (ret <= 0)
>> +	if (ret == -EINVAL)
>> +		return 0; /* Empty OPP table is a valid corner-case, let's not fail */
>> +	else if (ret <= 0)
>>  		return ret;
>>  
>>  	ret = 0;
>> -- 
>> 2.27.0
> 


* Re: [PATCH v3 0/3] CPUFreq: Add support for cpu performance dependencies
  2020-11-03 10:18 ` [PATCH v3 0/3] CPUFreq: Add support for cpu performance dependencies Viresh Kumar
@ 2020-11-04 18:04   ` Nicola Mazzucato
  2020-11-05 14:25     ` Vincent Guittot
  0 siblings, 1 reply; 29+ messages in thread
From: Nicola Mazzucato @ 2020-11-04 18:04 UTC (permalink / raw)
  To: Viresh Kumar, vincent.guittot
  Cc: linux-kernel, linux-arm-kernel, linux-pm, devicetree,
	sudeep.holla, rjw, vireshk, robh+dt, sboyd, nm, daniel.lezcano,
	morten.rasmussen, chris.redpath, Ionela Voinescu

Hi Viresh, thanks for looking into this.

On 11/3/20 10:18 AM, Viresh Kumar wrote:
> On 02-11-20, 12:01, Nicola Mazzucato wrote:
>> Hi All,
>>
>> In this V3 posting I have replaced the new dt-binding with minor changes/
>> improvements for opp (since we are now using opp tables instead).
>> The RFC still stands on how to make this info available to sw consumers.
>>
>> In the RFC, I am proposing a simple addition of a performance dependencies
>> cpumask in CPUFreq core and an example of how drivers and consumers would
>> make use of it.
>> I also propose an alternative approach, which does not require changes in
>> CPUFreq core, but - possibly - in all the consumers.
>>
>> This is to support systems where exposed cpu performance controls are more
>> fine-grained than the platform's ability to scale cpus independently.
> 
> I was talking to Vincent about what you are doing here and we got a
> bit confused and so here are few questions that we had:
> 
> - Based on my previous understanding, you don't want software
>   coordination of frequencies (which is done by cpufreq today), but
>   want the hardware to do that and so you want per-cpu cpufreq
>   policies.

Correct. And this has been done for quite some time in some platforms.

> 
> - What's the real benefit of hardware coordination ? Want to make sure
>   I fully understand that.

The hardware coordination that comes from having per-cpu cpufreq policies
is not new, and works just fine in most systems.

The benefit of having per-cpu controls is that the firmware will take care of
the performance of the entire system. It is purely a delegation to firmware for
the performance optimizations.

> 
> - Because of hardware co-ordination of otherwise co-ordinated CPUs,
>   few things break. Thermal and EAS are some of the examples and so
>   you are trying to fix them here by proving them the missing
>   information again.

Correct. And for this I have proposed two ways.

> 
> - One other thing that breaks with this is freq-invariance in the
>   scheduler, as the scheduler won't see the real frequencies the
>   various CPUs are running at. Most of the hardware we have today
>   doesn't have counters, like AMUs, not sure if all future ones based
>   on SCMI will have that too, so how are they gong to be fixed ?
> 

Correct. freq-invariance without counters is trying to do its best based on the
information it has available. It definitely relies on the knowledge of the v/f
domains to work at its best so I think in the case of per-cpu it will follow the
same approach as others being affected (EAS, thermal).

>   And if we even have to fix this (freq invariance), what's hardware
>   coordination giving us that makes all this worth it ?

I suppose this is more a generic question for all the platforms running with
h/w coordination, but in our case it is that the f/w will take care of the
performance optimizations for us :)

> 
> Sorry about the long list :)

No problem at all. Thank you for your time on this and I hope I have made bits
clearer.

Nicola

> 


* Re: [PATCH v3 2/3] opp/of: Allow empty opp-table with opp-shared
  2020-11-04 17:54     ` Nicola Mazzucato
@ 2020-11-05  4:41       ` Viresh Kumar
  2020-11-16 11:45         ` Nicola Mazzucato
  0 siblings, 1 reply; 29+ messages in thread
From: Viresh Kumar @ 2020-11-05  4:41 UTC (permalink / raw)
  To: Nicola Mazzucato
  Cc: linux-kernel, linux-arm-kernel, linux-pm, devicetree,
	sudeep.holla, rjw, vireshk, robh+dt, sboyd, nm, daniel.lezcano,
	morten.rasmussen, chris.redpath

On 04-11-20, 17:54, Nicola Mazzucato wrote:
> Initially I thought to place a comment right there but I ended up with an
> explanation of this case at the top of this function (the corner-case). It
> probably also needs more details..

I read it earlier as well but yeah, that wasn't good enough for me to
understand what you are doing.

> Basically, on this case - empty opp table & opp-shared - we limit the scope of
> opp-shared to *only* tell us about hw description, and not marking the opp
> points as shared, since they are not present in DT.

It doesn't matter where the OPP table entries are coming from. The OPP
table should be marked shared if it is found to be shared.

> It would be the equivalent
> of describing that devices share clock/voltage lines, but we can't tell anything
> about opp points cause they are not there (in DT).

It's okay; even then we should set the right flags here. It is really
confusing to blindly set it as exclusive, even though it may be
shared.

> OTOH If we don't set shared_opp to OPP_TABLE_ACCESS_EXCLUSIVE for that specific
> case, we won't be able to add opps for the remaining cpus as the opp core
> will find the opps as duplicated. This is a corner case, really.

Hmm, I am not sure where you fail and how, but this should be set to
OPP_TABLE_ACCESS_EXCLUSIVE only if the OPP table isn't shared; otherwise it
should be OPP_TABLE_ACCESS_SHARED.

-- 
viresh


* Re: [PATCH v3 0/3] CPUFreq: Add support for cpu performance dependencies
  2020-11-04 18:04   ` Nicola Mazzucato
@ 2020-11-05 14:25     ` Vincent Guittot
  2020-11-05 15:46       ` Ionela Voinescu
  0 siblings, 1 reply; 29+ messages in thread
From: Vincent Guittot @ 2020-11-05 14:25 UTC (permalink / raw)
  To: Nicola Mazzucato
  Cc: Viresh Kumar, linux-kernel, LAK, open list:THERMAL,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	Sudeep Holla, Rafael J. Wysocki, Viresh Kumar, Rob Herring,
	Stephen Boyd, Nishanth Menon, Daniel Lezcano, Morten Rasmussen,
	Chris Redpath, Ionela Voinescu

On Wed, 4 Nov 2020 at 19:02, Nicola Mazzucato <nicola.mazzucato@arm.com> wrote:
>
> Hi Viresh, thanks for looking into this.
>
> On 11/3/20 10:18 AM, Viresh Kumar wrote:
> > On 02-11-20, 12:01, Nicola Mazzucato wrote:
> >> Hi All,
> >>
> >> In this V3 posting I have replaced the new dt-binding with minor changes/
> >> improvements for opp (since we are now using opp tables instead).
> >> The RFC still stands on how to make this info available to sw consumers.
> >>
> >> In the RFC, I am proposing a simple addition of a performance dependencies
> >> cpumask in CPUFreq core and an example of how drivers and consumers would
> >> make use of it.
> >> I also propose an alternative approach, which does not require changes in
> >> CPUFreq core, but - possibly - in all the consumers.
> >>
> >> This is to support systems where exposed cpu performance controls are more
> >> fine-grained than the platform's ability to scale cpus independently.
> >
> > I was talking to Vincent about what you are doing here and we got a
> > bit confused and so here are few questions that we had:
> >
> > - Based on my previous understanding, you don't want software
> >   coordination of frequencies (which is done by cpufreq today), but
> >   want the hardware to do that and so you want per-cpu cpufreq
> >   policies.
>
> Correct. And this has been done for quite some time in some platforms.
>
> >
> > - What's the real benefit of hardware coordination ? Want to make sure
> >   I fully understand that.
>
> The hardware coordination that is coming out by having per-cpu cpufreq policies
> is not new, and works just fine in most of the systems.
>
> The benefit of having per-cpu controls is that the firmware will take care of
> the performance of the entire system. It is purely a delegation to firmware for
> the performance optimizations.
>
> >
> > - Because of hardware co-ordination of otherwise co-ordinated CPUs,
> >   few things break. Thermal and EAS are some of the examples and so
> >   you are trying to fix them here by providing them the missing
> >   information again.
>
> Correct. And for this I have proposed two ways.
>
> >
> > - One other thing that breaks with this is freq-invariance in the
> >   scheduler, as the scheduler won't see the real frequencies the
> >   various CPUs are running at. Most of the hardware we have today
> >   doesn't have counters, like AMUs, not sure if all future ones based
> >   on SCMI will have that too, so how are they going to be fixed?
> >
>
> Correct. freq-invariance without counters is trying to do its best based on the
> information it has available. It definitely relies on the knowledge of the v/f
> domains to work at its best so I think in the case of per-cpu it will follow the
> same approach as others being affected (EAS, thermal).

As frequency invariance has the same problem as EAS and Thermal, it would
be good to see the solution as part of this proposal, like for EAS and
Thermal.

>
> >   And if we even have to fix this (freq invariance), what's hardware
> >   coordination giving us that makes all this worth it ?
>
> I suppose this is more a generic question for all the platforms running with h/w
> coordination, but for our case is that the f/w will take care of the performance
> optimizations for us :)
>
> >
> > Sorry about the long list :)
>
> No problem at all. Thank you for your time on this and I hope I have made bits
> clearer.
>
> Nicola
>
> >

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 0/3] CPUFreq: Add support for cpu performance dependencies
  2020-11-05 14:25     ` Vincent Guittot
@ 2020-11-05 15:46       ` Ionela Voinescu
  0 siblings, 0 replies; 29+ messages in thread
From: Ionela Voinescu @ 2020-11-05 15:46 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Nicola Mazzucato, Viresh Kumar, linux-kernel, LAK,
	open list:THERMAL,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	Sudeep Holla, Rafael J. Wysocki, Viresh Kumar, Rob Herring,
	Stephen Boyd, Nishanth Menon, Daniel Lezcano, Morten Rasmussen,
	Chris Redpath

Hi guys,

On Thursday 05 Nov 2020 at 15:25:53 (+0100), Vincent Guittot wrote:
[..]
> > > - Because of hardware co-ordination of otherwise co-ordinated CPUs,
> > >   a few things break. Thermal and EAS are some of the examples and so
> > >   you are trying to fix them here by providing them the missing
> > >   information again.
> >
> > Correct. And for this I have proposed two ways.
> >
> > >
> > > - One other thing that breaks with this is freq-invariance in the
> > >   scheduler, as the scheduler won't see the real frequencies the
> > >   various CPUs are running at. Most of the hardware we have today
> > >   doesn't have counters, like AMUs, not sure if all future ones based
> > >   on SCMI will have that too, so how are they going to be fixed ?
> > >
> >
> > Correct. freq-invariance without counters is trying to do its best based on the
> > information it has available. It definitely relies on the knowledge of the v/f
> > domains to work at its best so I think in the case of per-cpu it will follow the
> > same approach as others being affected (EAS, thermal).
> 
> As frequency invariance has same problem as EAS and Thermal it would
> be good to see the solution as part of this proposal like EAS and
> Thermal
> 

I think I was waiting for a consensus on patch 3/3, although I believe the
discussion at [1] tended towards option 2: "each driver to store
internally the performance dependencies and let the driver directly
provide the correct cpumask for any consumer."
The alternative was option 1: "add a new dependent_cpus cpumask in
cpufreq_policy", as Nicola mentioned in the commit message for 3/3.

If the choice is clear, I'm happy to take the FIE fixes in a separate
set.

Thanks,
Ionela.

[1] https://lore.kernel.org/linux-arm-kernel/20200924095347.32148-3-nicola.mazzucato@arm.com/

> >
> > >   And if we even have to fix this (freq invariance), what's hardware
> > >   coordination giving us that makes all this worth it ?
> >
> > I suppose this is more a generic question for all the platforms running with h/w
> > coordination, but for our case is that the f/w will take care of the performance
> > optimizations for us :)
> >
> > >
> > > Sorry about the long list :)
> >
> > No problem at all. Thank you for your time on this and I hope I have made bits
> > clearer.
> >
> > Nicola
> >
> > >


* Re: [PATCH v3 3/3] [RFC] CPUFreq: Add support for cpu-perf-dependencies
  2020-11-02 12:01 ` [PATCH v3 3/3] [RFC] CPUFreq: Add support for cpu-perf-dependencies Nicola Mazzucato
@ 2020-11-06  9:20   ` Viresh Kumar
  2020-11-06 10:37     ` Lukasz Luba
  2020-11-06  9:24   ` Viresh Kumar
  2020-11-19  6:43   ` Viresh Kumar
  2 siblings, 1 reply; 29+ messages in thread
From: Viresh Kumar @ 2020-11-06  9:20 UTC (permalink / raw)
  To: Nicola Mazzucato
  Cc: linux-kernel, linux-arm-kernel, linux-pm, devicetree,
	sudeep.holla, rjw, vireshk, robh+dt, sboyd, nm, daniel.lezcano,
	morten.rasmussen, chris.redpath

On 02-11-20, 12:01, Nicola Mazzucato wrote:
> This is a continuation of the previous v2, where we focused mostly on the
> dt binding.
> 
> I am seeking some feedback/comments on the following two approaches.
> 
> Intro:
> We have seen that in a system where performance control and hardware
> description do not match (i.e. per-cpu), we still need the information of
> how the v/f lines are shared among the cpus. We call this information
> "performance dependencies".
> We got this info through the opp-shared (the previous 2 patches aim for
> that).
> 
> Problem:
> How do we share such info (retrieved from a cpufreq driver) to other
> consumers that rely on it? I have two proposals.

I haven't really stopped thinking about what and how we should solve
this, but I have a few concerns first.

> 2) drivers/thermal/cpufreq_cooling: Replace related_cpus with dependent_cpus

I am not sure I completely understand how this is going to be
modified or how it will work.

The only use of related_cpus in the cooling driver is in the helper
cdev->get_requested_power(), where we need to find the total power
being consumed by devices controlled by the cooling device. Right ?

Now the cooling devices today are very closely related to the cpufreq
policy, the registration function itself takes a cpufreq policy as an
argument.

Consider that you have an octa-core platform and all the CPUs are
dependent on each other. With your suggested changes and hw control,
we will have different cpufreq policies for each CPU. And so we will
have a cooling device, cdev, for each CPU as well. When the IPA
governor calls cdev->get_requested_power(), why should we ever bother
to traverse the list of dependent_cpus and not related_cpus only ?

Otherwise the same CPU will have its load contributed to the power of
8 cooling devices.

-- 
viresh


* Re: [PATCH v3 3/3] [RFC] CPUFreq: Add support for cpu-perf-dependencies
  2020-11-02 12:01 ` [PATCH v3 3/3] [RFC] CPUFreq: Add support for cpu-perf-dependencies Nicola Mazzucato
  2020-11-06  9:20   ` Viresh Kumar
@ 2020-11-06  9:24   ` Viresh Kumar
  2020-11-19  6:43   ` Viresh Kumar
  2 siblings, 0 replies; 29+ messages in thread
From: Viresh Kumar @ 2020-11-06  9:24 UTC (permalink / raw)
  To: Nicola Mazzucato
  Cc: linux-kernel, linux-arm-kernel, linux-pm, devicetree,
	sudeep.holla, rjw, vireshk, robh+dt, sboyd, nm, daniel.lezcano,
	morten.rasmussen, chris.redpath

On 02-11-20, 12:01, Nicola Mazzucato wrote:
> Hi All,
> 
> This is a continuation of the previous v2, where we focused mostly on the
> dt binding.
> 
> I am seeking some feedback/comments on the following two approaches.
> 
> Intro:
> We have seen that in a system where performance control and hardware
> description do not match (i.e. per-cpu), we still need the information of
> how the v/f lines are shared among the cpus. We call this information
> "performance dependencies".
> We got this info through the opp-shared (the previous 2 patches aim for
> that).
> 
> Problem:
> How do we share such info (retrieved from a cpufreq driver) to other
> consumers that rely on it? I have two proposals.

FWIW, we already have a case in the kernel where something similar is
done.

commit f4fd3797848a ("acpi-cpufreq: Add new sysfs attribute freqdomain_cpus")

and the information is shared via:

/sys/devices/system/cpu/cpu#/cpufreq/freqdomain_cpus

-- 
viresh


* Re: [PATCH v3 3/3] [RFC] CPUFreq: Add support for cpu-perf-dependencies
  2020-11-06  9:20   ` Viresh Kumar
@ 2020-11-06 10:37     ` Lukasz Luba
  2020-11-06 10:55       ` Viresh Kumar
  0 siblings, 1 reply; 29+ messages in thread
From: Lukasz Luba @ 2020-11-06 10:37 UTC (permalink / raw)
  To: Viresh Kumar, Nicola Mazzucato
  Cc: linux-kernel, linux-arm-kernel, linux-pm, devicetree,
	sudeep.holla, rjw, vireshk, robh+dt, sboyd, nm, daniel.lezcano,
	morten.rasmussen, chris.redpath

Hi Viresh,

On 11/6/20 9:20 AM, Viresh Kumar wrote:
> On 02-11-20, 12:01, Nicola Mazzucato wrote:
>> This is a continuation of the previous v2, where we focused mostly on the
>> dt binding.
>>
>> I am seeking some feedback/comments on the following two approaches.
>>
>> Intro:
>> We have seen that in a system where performance control and hardware
>> description do not match (i.e. per-cpu), we still need the information of
>> how the v/f lines are shared among the cpus. We call this information
>> "performance dependencies".
>> We got this info through the opp-shared (the previous 2 patches aim for
>> that).
>>
>> Problem:
>> How do we share such info (retrieved from a cpufreq driver) to other
>> consumers that rely on it? I have two proposals.
> 
> I haven't really stopped thinking about what and how we should solve
> this, but I have a few concerns first.
> 
>> 2) drivers/thermal/cpufreq_cooling: Replace related_cpus with dependent_cpus
> 
> I am not sure if I understand completely on how this is going to be
> modified/work.
> 
> The only use of related_cpus in the cooling driver is in the helper
> cdev->get_requested_power(), where we need to find the total power
> being consumed by devices controlled by the cooling device. Right ?
> 
> Now the cooling devices today are very closely related to the cpufreq
> policy, the registration function itself takes a cpufreq policy as an
> argument.
> 
> Consider that you have an octa-core platform and all the CPUs are
> dependent on each other. With your suggested changes and hw control,
> we will have different cpufreq policies for each CPU. And so we will
> have a cooling device, cdev, for each CPU as well. When the IPA
> governor calls cdev->get_requested_power(), why should we ever bother
> to traverse the list of dependent_cpus and not related_cpus only ?
> 
> Otherwise the same CPU will have its load contributed to the power of
> 8 cooling devices.
> 

Good question.

How about a different interface for those cpufreq drivers?
That new registration API would allow specifying the cpumask.
Or it could rely on the EM cpumask: em_span_cpus(em)

Currently we have two ways to register a cooling device:
1. when the cpufreq driver sets the flag CPUFREQ_IS_COOLING_DEV, the core
will register the cooling device
2. the cpufreq driver can explicitly call the registration function:
cpufreq_cooling_register() with 'policy' as argument

That would need a substantial change to the cpufreq cooling code, from
policy-oriented to a custom driver cpumask (like the EM registration).

Regards,
Lukasz


* Re: [PATCH v3 3/3] [RFC] CPUFreq: Add support for cpu-perf-dependencies
  2020-11-06 10:37     ` Lukasz Luba
@ 2020-11-06 10:55       ` Viresh Kumar
  2020-11-06 11:14         ` Lukasz Luba
  0 siblings, 1 reply; 29+ messages in thread
From: Viresh Kumar @ 2020-11-06 10:55 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: Nicola Mazzucato, linux-kernel, linux-arm-kernel, linux-pm,
	devicetree, sudeep.holla, rjw, vireshk, robh+dt, sboyd, nm,
	daniel.lezcano, morten.rasmussen, chris.redpath

On 06-11-20, 10:37, Lukasz Luba wrote:
> Good question.
> 
> How about a different interface for those cpufreq drivers?
> That new registration API would allow to specify the cpumask.
> Or rely on EM cpumask: em_span_cpus(em)
> 
> Currently we have two ways to register cooling device:
> 1. when the cpufreq driver set a flag CPUFREQ_IS_COOLING_DEV, the core
> will register cooling device
> 2. cpufreq driver can explicitly call the registration function:
> cpufreq_cooling_register() with 'policy' as argument
> 
> That would need substantial change to the cpufreq cooling code, from
> policy oriented to custom driver's cpumask (like EM registration).

I am even wondering if we should really make that change. Why do we
need the combined load of the CPUs to be sent back to the IPA governor
? Why shouldn't they all do that (they == cdev) ?

This is a bit confusing to me, sorry about that. The cpufreq governors
look at all the CPUs' utilization and set the frequency based on
the highest utilization (and not the total util).

While in this case we present the total load of the CPUs to the IPA
(based on the current frequency of the CPUs), in response to which it
tells us the frequency at which all the CPUs of the policy can run
(I am not even sure if it is the right thing to do as the CPUs have
different loads). And how do we fit this dependent_cpus thing into
this?

Sorry, I am not sure what's the right way of doing things here.

-- 
viresh


* Re: [PATCH v3 3/3] [RFC] CPUFreq: Add support for cpu-perf-dependencies
  2020-11-06 10:55       ` Viresh Kumar
@ 2020-11-06 11:14         ` Lukasz Luba
  2020-11-09  6:57           ` Viresh Kumar
  0 siblings, 1 reply; 29+ messages in thread
From: Lukasz Luba @ 2020-11-06 11:14 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Nicola Mazzucato, linux-kernel, linux-arm-kernel, linux-pm,
	devicetree, sudeep.holla, rjw, vireshk, robh+dt, sboyd, nm,
	daniel.lezcano, morten.rasmussen, chris.redpath



On 11/6/20 10:55 AM, Viresh Kumar wrote:
> On 06-11-20, 10:37, Lukasz Luba wrote:
>> Good question.
>>
>> How about a different interface for those cpufreq drivers?
>> That new registration API would allow to specify the cpumask.
>> Or rely on EM cpumask: em_span_cpus(em)
>>
>> Currently we have two ways to register cooling device:
>> 1. when the cpufreq driver set a flag CPUFREQ_IS_COOLING_DEV, the core
>> will register cooling device
>> 2. cpufreq driver can explicitly call the registration function:
>> cpufreq_cooling_register() with 'policy' as argument
>>
>> That would need substantial change to the cpufreq cooling code, from
>> policy oriented to custom driver's cpumask (like EM registration).
> 
> I am even wondering if we should really make that change. Why do we
> need the combined load of the CPUs to be sent back to the IPA governor
> ? Why shouldn't they all do that (they == cdev) ?
> 
> This is a bit confusing to me, sorry about that. The cpufreq governors
> take a look at all the CPUs utilization and set the frequency based on
> the highest utilization (and not the total util).
> 
> While in this case we present the total load of the CPUs to the IPA
> (based on the current frequency of the CPUs), in response to which it
> tells us the frequency at which all the CPUs of the policy can run at
> (I am not even sure if it is the right thing to do as the CPUs have
> different loads). And how do we fit this dependent_cpus thing into
> this.
> 
> Sorry, I am not sure what's the right way of doing thing here.
> 

I also had similar doubts, because if we make frequency requests
independently for each CPU, why not have N cooling devs, which
would independently set a QoS max freq for each of them...

What convinced me:
EAS and FIE would know the 'real' frequency of the cluster, IPA
can use it as well and have only one cooling device per cluster.

We would like to keep the old style 'one cooling device per cpuset'.
I don't have a strong opinion, and if it turns out that there are
errors in the frequency estimation for a cluster, then maybe it does
make more sense to have a cdev per CPU...





* Re: [PATCH v3 3/3] [RFC] CPUFreq: Add support for cpu-perf-dependencies
  2020-11-06 11:14         ` Lukasz Luba
@ 2020-11-09  6:57           ` Viresh Kumar
  2020-11-16 11:33             ` Lukasz Luba
  0 siblings, 1 reply; 29+ messages in thread
From: Viresh Kumar @ 2020-11-09  6:57 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: Nicola Mazzucato, linux-kernel, linux-arm-kernel, linux-pm,
	devicetree, sudeep.holla, rjw, vireshk, robh+dt, sboyd, nm,
	daniel.lezcano, morten.rasmussen, chris.redpath

On 06-11-20, 11:14, Lukasz Luba wrote:
> I also had similar doubts, because if we make frequency requests
> independently for each CPU, why not having N cooling devs, which
> will set independently QoS max freq for them...
> 
> What convinced me:
> EAS and FIE would know the 'real' frequency of the cluster, IPA
> can use it also and have only one cooling device per cluster.
> 
> We would like to keep this old style 'one cooling device per cpuset'.
> I don't have strong opinion and if it would appear that there are
> some errors in freq estimation for cluster, then maybe it does make
> more sense to have cdev per CPU...

Let me rephrase my question. What is it that doesn't work _correctly_
with cdev per cpufreq policy in your case? What doesn't work well if
the thermal stuff keeps looking at only the related_cpus thing and not
the cpu-perf-dependencies thing?

-- 
viresh


* Re: [PATCH v3 3/3] [RFC] CPUFreq: Add support for cpu-perf-dependencies
  2020-11-09  6:57           ` Viresh Kumar
@ 2020-11-16 11:33             ` Lukasz Luba
  2020-11-17 10:11               ` Viresh Kumar
  0 siblings, 1 reply; 29+ messages in thread
From: Lukasz Luba @ 2020-11-16 11:33 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Nicola Mazzucato, linux-kernel, linux-arm-kernel, linux-pm,
	devicetree, sudeep.holla, rjw, vireshk, robh+dt, sboyd, nm,
	daniel.lezcano, morten.rasmussen, chris.redpath



On 11/9/20 6:57 AM, Viresh Kumar wrote:
> On 06-11-20, 11:14, Lukasz Luba wrote:
>> I also had similar doubts, because if we make frequency requests
>> independently for each CPU, why not having N cooling devs, which
>> will set independently QoS max freq for them...
>>
>> What convinced me:
>> EAS and FIE would know the 'real' frequency of the cluster, IPA
>> can use it also and have only one cooling device per cluster.
>>
>> We would like to keep this old style 'one cooling device per cpuset'.
>> I don't have strong opinion and if it would appear that there are
>> some errors in freq estimation for cluster, then maybe it does make
>> more sense to have cdev per CPU...
> 
> Let me rephrase my question. What is it that doesn't work _correctly_
> with cdev per cpufreq policy in your case? What doesn't work well if
> the thermal stuff keeps looking at only the related_cpus thing and not
> the cpu-perf-dependencies thing?
> 

We don't have a platform with this per-cpu freq request yet.
Thus it's hard to answer your question. The EAS would work in the 'old
style' - cluster mode. I don't know how IPA would work on such a HW
and SW configuration. To figure this out I need a real platform.


* Re: [PATCH v3 2/3] opp/of: Allow empty opp-table with opp-shared
  2020-11-05  4:41       ` Viresh Kumar
@ 2020-11-16 11:45         ` Nicola Mazzucato
  0 siblings, 0 replies; 29+ messages in thread
From: Nicola Mazzucato @ 2020-11-16 11:45 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: linux-kernel, linux-arm-kernel, linux-pm, devicetree,
	sudeep.holla, rjw, vireshk, robh+dt, sboyd, nm, daniel.lezcano,
	morten.rasmussen, chris.redpath

Hi Viresh,

sorry for being late in replying.


On 11/5/20 4:41 AM, Viresh Kumar wrote:
> On 04-11-20, 17:54, Nicola Mazzucato wrote:
>> Initially I thought to place a comment right there but I ended up with an
>> explanation of this case at the top of this function (the corner-case). It
>> probably also needs more details..
> 
> I read it earlier as well but yeah, that wasn't good enough for me to
> understand what you are doing.
> 
>> Basically, on this case - empty opp table & opp-shared - we limit the scope of
>> opp-shared to *only* tell us about hw description, and not marking the opp
>> points as shared, since they are not present in DT.
> 
> It doesn't matter where the OPP table entries are coming from. The OPP
> table should be marked shared if it is found to be shared.
> 
>> It would be the equivalent
>> of describing that devices share clock/voltage lines, but we can't tell anything
>> about opp points cause they are not there (in DT).
> 
> Its okay, even then we should set the right flags here. It is really
> confusing that we blindly set it as exclusive, even though it may be
> shared.
> 
>> OTOH If we don't set shared_opp to OPP_TABLE_ACCESS_EXCLUSIVE for that specific
>> case, we won't be able to add opps for the remaining cpus as the opp core
>> will find the opps as duplicated. This is a corner case, really.
> 
> Hmm, I am not sure where you fail and how but this should be set to
> OPP_TABLE_ACCESS_EXCLUSIVE only if the OPP table isn't shared. else it
> should be OPP_TABLE_ACCESS_SHARED.
> 
Thanks for providing more details around the meaning of opp-shared, much
appreciated. I had some time to play a bit more, and yes, there is no need to
set shared_opp to OPP_TABLE_ACCESS_EXCLUSIVE. A minimal change in the driver
sequence would suffice.

I will remove that in the V4.

Many thanks,
Nicola


* Re: [PATCH v3 3/3] [RFC] CPUFreq: Add support for cpu-perf-dependencies
  2020-11-16 11:33             ` Lukasz Luba
@ 2020-11-17 10:11               ` Viresh Kumar
  2020-11-17 10:47                 ` Nicola Mazzucato
  2020-11-17 10:47                 ` Lukasz Luba
  0 siblings, 2 replies; 29+ messages in thread
From: Viresh Kumar @ 2020-11-17 10:11 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: Nicola Mazzucato, linux-kernel, linux-arm-kernel, linux-pm,
	devicetree, sudeep.holla, rjw, vireshk, robh+dt, sboyd, nm,
	daniel.lezcano, morten.rasmussen, chris.redpath

On 16-11-20, 11:33, Lukasz Luba wrote:
> On 11/9/20 6:57 AM, Viresh Kumar wrote:
> > On 06-11-20, 11:14, Lukasz Luba wrote:
> > > I also had similar doubts, because if we make frequency requests
> > > independently for each CPU, why not having N cooling devs, which
> > > will set independently QoS max freq for them...
> > > 
> > > What convinced me:
> > > EAS and FIE would know the 'real' frequency of the cluster, IPA
> > > can use it also and have only one cooling device per cluster.
> > > 
> > > We would like to keep this old style 'one cooling device per cpuset'.
> > > I don't have strong opinion and if it would appear that there are
> > > some errors in freq estimation for cluster, then maybe it does make
> > > more sense to have cdev per CPU...
> > 
> > Let me rephrase my question. What is it that doesn't work _correctly_
> > with cdev per cpufreq policy in your case? What doesn't work well if
> > the thermal stuff keeps looking at only the related_cpus thing and not
> > the cpu-perf-dependencies thing?
> > 
> 
> We don't have a platform which would be this per-cpu freq request, yet.
> Thus it's hard to answer your question. The EAS would work in 'old
> style' - cluster mode. I don't know how IPA would work on such HW
> and SW configuration. To figure this out I need a real platform.

Hmm, so who are going to be the users of this new stuff (dependent
CPUs) ? I don't think cpufreq-cooling should be updated, unless there
is a compelling reason to.

The other one in energy model ? Why does it need this information ?

Who else ?

-- 
viresh


* Re: [PATCH v3 3/3] [RFC] CPUFreq: Add support for cpu-perf-dependencies
  2020-11-17 10:11               ` Viresh Kumar
@ 2020-11-17 10:47                 ` Nicola Mazzucato
  2020-11-17 10:53                   ` Viresh Kumar
  2020-11-17 10:47                 ` Lukasz Luba
  1 sibling, 1 reply; 29+ messages in thread
From: Nicola Mazzucato @ 2020-11-17 10:47 UTC (permalink / raw)
  To: Viresh Kumar, Lukasz Luba
  Cc: linux-kernel, linux-arm-kernel, linux-pm, devicetree,
	sudeep.holla, rjw, vireshk, robh+dt, sboyd, nm, daniel.lezcano,
	morten.rasmussen, chris.redpath

Hi Viresh,


On 11/17/20 10:11 AM, Viresh Kumar wrote:
> On 16-11-20, 11:33, Lukasz Luba wrote:
>> On 11/9/20 6:57 AM, Viresh Kumar wrote:
>>> On 06-11-20, 11:14, Lukasz Luba wrote:
>>>> I also had similar doubts, because if we make frequency requests
>>>> independently for each CPU, why not having N cooling devs, which
>>>> will set independently QoS max freq for them...
>>>>
>>>> What convinced me:
>>>> EAS and FIE would know the 'real' frequency of the cluster, IPA
>>>> can use it also and have only one cooling device per cluster.
>>>>
>>>> We would like to keep this old style 'one cooling device per cpuset'.
>>>> I don't have strong opinion and if it would appear that there are
>>>> some errors in freq estimation for cluster, then maybe it does make
>>>> more sense to have cdev per CPU...
>>>
>>> Let me rephrase my question. What is it that doesn't work _correctly_
>>> with cdev per cpufreq policy in your case? What doesn't work well if
>>> the thermal stuff keeps looking at only the related_cpus thing and not
>>> the cpu-perf-dependencies thing?
>>>
>>
>> We don't have a platform which would be this per-cpu freq request, yet.
>> Thus it's hard to answer your question. The EAS would work in 'old
>> style' - cluster mode. I don't know how IPA would work on such HW
>> and SW configuration. To figure this out I need a real platform.
> 
> Hmm, so who are going to be the users of this new stuff (dependent
> CPUs) ?

In general, any platform that has hardware coordination in place and where
some components need to use that information.

> I don't think cpufreq-cooling should be updated, unless there
> is a compelling reason to.
> 
> The other one in energy model ? Why does it need this information ?

The reasons have probably got lost in the emails, but in a nutshell the EM needs
accurate information on performance boundaries to achieve correct task placement.

> 
> Who else ?
> 

Freq-invariance has been mentioned. I suppose the fix will depend on which
strategy we prefer to solve this.

As a reminder, two solutions:
1) dependent_cpus cpumask in cpufreq and involved entities pick this info
or
2) dependent_cpus cpumask in driver but some entities' interfaces may need to change

Hope it helps,
Nicola


* Re: [PATCH v3 3/3] [RFC] CPUFreq: Add support for cpu-perf-dependencies
  2020-11-17 10:11               ` Viresh Kumar
  2020-11-17 10:47                 ` Nicola Mazzucato
@ 2020-11-17 10:47                 ` Lukasz Luba
  2020-11-17 10:55                   ` Viresh Kumar
  1 sibling, 1 reply; 29+ messages in thread
From: Lukasz Luba @ 2020-11-17 10:47 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Nicola Mazzucato, linux-kernel, linux-arm-kernel, linux-pm,
	devicetree, sudeep.holla, rjw, vireshk, robh+dt, sboyd, nm,
	daniel.lezcano, morten.rasmussen, chris.redpath



On 11/17/20 10:11 AM, Viresh Kumar wrote:
> On 16-11-20, 11:33, Lukasz Luba wrote:
>> On 11/9/20 6:57 AM, Viresh Kumar wrote:
>>> On 06-11-20, 11:14, Lukasz Luba wrote:
>>>> I also had similar doubts, because if we make frequency requests
>>>> independently for each CPU, why not having N cooling devs, which
>>>> will set independently QoS max freq for them...
>>>>
>>>> What convinced me:
>>>> EAS and FIE would know the 'real' frequency of the cluster, IPA
>>>> can use it also and have only one cooling device per cluster.
>>>>
>>>> We would like to keep this old style 'one cooling device per cpuset'.
>>>> I don't have strong opinion and if it would appear that there are
>>>> some errors in freq estimation for cluster, then maybe it does make
>>>> more sense to have cdev per CPU...
>>>
>>> Let me rephrase my question. What is it that doesn't work _correctly_
>>> with cdev per cpufreq policy in your case? What doesn't work well if
>>> the thermal stuff keeps looking at only the related_cpus thing and not
>>> the cpu-perf-dependencies thing?
>>>
>>
>> We don't have a platform which would be this per-cpu freq request, yet.
>> Thus it's hard to answer your question. The EAS would work in 'old
>> style' - cluster mode. I don't know how IPA would work on such HW
>> and SW configuration. To figure this out I need a real platform.
> 
> Hmm, so who are going to be the users of this new stuff (dependent
> CPUs) ? I don't think cpufreq-cooling should be updated, unless there
> is a compelling reason to.

Fair enough. What if we come back with experiment results in the future?
Currently we are trying to conduct experiments with Nicola on a modified
Juno firmware and kernel.

> 
> The other one in energy model ? Why does it need this information ?

The energy model needs this information because it's used during the
sched domain rebuild. The sched domain is then used in the EAS main
functions: feec() [1] and compute_energy() [2].
What EAS normally does is 'try different options for placing a task
on different CPUs and observe the potential energy costs'.
For example, we need the 'cluster' frequency, because when you put a task
on a CPU its freq might need to be set a bit higher. This would affect all
CPUs in the cluster, not only one, and we capture that energy cost. Then
we try to put the task on another CPU in that cluster, and if it turns out
there is no need to raise the freq for the whole cluster, that placement wins.


> 
> Who else ?
> 

FIE..


* Re: [PATCH v3 3/3] [RFC] CPUFreq: Add support for cpu-perf-dependencies
  2020-11-17 10:47                 ` Nicola Mazzucato
@ 2020-11-17 10:53                   ` Viresh Kumar
  2020-11-17 13:06                     ` Rafael J. Wysocki
  0 siblings, 1 reply; 29+ messages in thread
From: Viresh Kumar @ 2020-11-17 10:53 UTC (permalink / raw)
  To: Nicola Mazzucato
  Cc: Lukasz Luba, linux-kernel, linux-arm-kernel, linux-pm,
	devicetree, sudeep.holla, rjw, vireshk, robh+dt, sboyd, nm,
	daniel.lezcano, morten.rasmussen, chris.redpath

On 17-11-20, 10:47, Nicola Mazzucato wrote:
> Freq-invariance has been mentioned. I suppose the fix will depend on which
> strategy we prefer to solve this.

I am not sure how FIE will use this information, as I thought the
problem is about knowing the current frequency on a CPU instead of
which CPUs are related. Anyway, EM is good enough to get this stuff
going.

> As a reminder, two solutions:
> 1) dependent_cpus cpumask in cpufreq and involved entities pick this info

Yeah, this one. And it will be called freqdomain_cpus. Add support for
freqdomain_cpus in core, update acpi-cpufreq to reuse it instead of
adding its own and you can have it in other drivers then.

-- 
viresh


* Re: [PATCH v3 3/3] [RFC] CPUFreq: Add support for cpu-perf-dependencies
  2020-11-17 10:47                 ` Lukasz Luba
@ 2020-11-17 10:55                   ` Viresh Kumar
  0 siblings, 0 replies; 29+ messages in thread
From: Viresh Kumar @ 2020-11-17 10:55 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: Nicola Mazzucato, linux-kernel, linux-arm-kernel, linux-pm,
	devicetree, sudeep.holla, rjw, vireshk, robh+dt, sboyd, nm,
	daniel.lezcano, morten.rasmussen, chris.redpath

On 17-11-20, 10:47, Lukasz Luba wrote:
> Fair enough. What if we come back with experiments results in future?
> Currently we are trying to conduct experiments with Nicola on modified Juno
> firmware and kernel)

Sure. If we think it improves things, why not. I just wanted to make
sure we understand why we would make this change now.

-- 
viresh


* Re: [PATCH v3 3/3] [RFC] CPUFreq: Add support for cpu-perf-dependencies
  2020-11-17 10:53                   ` Viresh Kumar
@ 2020-11-17 13:06                     ` Rafael J. Wysocki
  2020-11-18  4:42                       ` Viresh Kumar
  0 siblings, 1 reply; 29+ messages in thread
From: Rafael J. Wysocki @ 2020-11-17 13:06 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Nicola Mazzucato, Lukasz Luba, Linux Kernel Mailing List,
	Linux ARM, Linux PM, devicetree, Sudeep Holla, Rafael J. Wysocki,
	Viresh Kumar, Rob Herring, Stephen Boyd, Nishanth Menon,
	Daniel Lezcano, Morten Rasmussen, Chris Redpath

On Tue, Nov 17, 2020 at 11:53 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 17-11-20, 10:47, Nicola Mazzucato wrote:
> > Freq-invariance has been mentioned. I suppose the fix will depend on which
> > strategy we prefer to solve this.
>
> I am not sure how FIE will use this information, as I thought the
> problem is about knowing the current frequency on a CPU instead of
> which CPUs are related. Anyway, EM is good enough to get this stuff
> going.
>
> > As a reminder, two solutions:
> > 1) dependent_cpus cpumask in cpufreq and involved entities pick this info
>
> Yeah, this one. And it will be called freqdomain_cpus. Add support for
> freqdomain_cpus in core, update acpi-cpufreq to reuse it instead of
> adding its own and you can have it in other drivers then.

Is this really a cpufreq thing, though, or is it arch stuff?  I think
the latter, because it is not necessary for anything in cpufreq.

Yes, acpi-cpufreq happens to know this information, because it uses
processor_perflib, but the latter may as well be used by the arch
enumeration of CPUs and the freqdomain_cpus mask may be populated from
there.

As far as cpufreq is concerned, if the interface to the hardware is
per-CPU, there is one CPU per policy and cpufreq has no business
knowing anything about the underlying hardware coordination.


* Re: [PATCH v3 3/3] [RFC] CPUFreq: Add support for cpu-perf-dependencies
  2020-11-17 13:06                     ` Rafael J. Wysocki
@ 2020-11-18  4:42                       ` Viresh Kumar
  2020-11-18 12:00                         ` Rafael J. Wysocki
  0 siblings, 1 reply; 29+ messages in thread
From: Viresh Kumar @ 2020-11-18  4:42 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Nicola Mazzucato, Lukasz Luba, Linux Kernel Mailing List,
	Linux ARM, Linux PM, devicetree, Sudeep Holla, Rafael J. Wysocki,
	Viresh Kumar, Rob Herring, Stephen Boyd, Nishanth Menon,
	Daniel Lezcano, Morten Rasmussen, Chris Redpath

On 17-11-20, 14:06, Rafael J. Wysocki wrote:
> Is this really a cpufreq thing, though, or is it arch stuff?  I think
> the latter, because it is not necessary for anything in cpufreq.
> 
> Yes, acpi-cpufreq happens to know this information, because it uses
> processor_perflib, but the latter may as well be used by the arch
> enumeration of CPUs and the freqdomain_cpus mask may be populated from
> there.
> 
> As far as cpufreq is concerned, if the interface to the hardware is
> per-CPU, there is one CPU per policy and cpufreq has no business
> knowing anything about the underlying hardware coordination.

It won't be used by cpufreq itself, for now at least, and yes, I
understand your concern. I opted for this because we already have a
cpufreq implementation of the same thing, and it is usually better to
reuse this kind of infrastructure instead of reinventing it.

-- 
viresh


* Re: [PATCH v3 3/3] [RFC] CPUFreq: Add support for cpu-perf-dependencies
  2020-11-18  4:42                       ` Viresh Kumar
@ 2020-11-18 12:00                         ` Rafael J. Wysocki
  2020-11-19  6:40                           ` Viresh Kumar
  0 siblings, 1 reply; 29+ messages in thread
From: Rafael J. Wysocki @ 2020-11-18 12:00 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Nicola Mazzucato, Lukasz Luba,
	Linux Kernel Mailing List, Linux ARM, Linux PM, devicetree,
	Sudeep Holla, Rafael J. Wysocki, Viresh Kumar, Rob Herring,
	Stephen Boyd, Nishanth Menon, Daniel Lezcano, Morten Rasmussen,
	Chris Redpath

On Wed, Nov 18, 2020 at 5:42 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 17-11-20, 14:06, Rafael J. Wysocki wrote:
> > Is this really a cpufreq thing, though, or is it arch stuff?  I think
> > the latter, because it is not necessary for anything in cpufreq.
> >
> > Yes, acpi-cpufreq happens to know this information, because it uses
> > processor_perflib, but the latter may as well be used by the arch
> > enumeration of CPUs and the freqdomain_cpus mask may be populated from
> > there.
> >
> > As far as cpufreq is concerned, if the interface to the hardware is
> > per-CPU, there is one CPU per policy and cpufreq has no business
> > knowing anything about the underlying hardware coordination.
>
> It won't be used by cpufreq itself, for now at least, and yes, I
> understand your concern. I opted for this because we already have a
> cpufreq implementation of the same thing, and it is usually better to
> reuse this kind of infrastructure instead of reinventing it.

Do you mean related_cpus and real_cpus?

That's the granularity of the interface to the hardware I'm talking about.

Strictly speaking, it means "these CPUs share a HW interface for perf
control" and it need not mean "these CPUs are in the same
clock/voltage domain".  Specifically, it need not mean "these CPUs are
the only CPUs in the given clock/voltage domain".  That's what it
means when the control is exercised by manipulating OPPs directly, but
not in general.

In the ACPI case, for example, what the firmware tells you need not
reflect the HW topology in principle.  It only tells you whether or
not it wants you to coordinate a given group of CPUs and in what way,
but this may not be the whole picture from the HW perspective.  If you
need the latter, you need more information in general (at least you
need to assume that what the firmware tells you actually does reflect
the HW topology on the given SoC).

So yes, in the particular case of OPP-based perf control, cpufreq
happens to have the same information that is needed by the other
subsystems, but otherwise it may not and what I'm saying is that it
generally is a mistake to expect cpufreq to have that information or
to be able to obtain it without the help of the arch/platform code.
Hence, it would be a mistake to design an interface based on that
expectation.

Or looking at it from a different angle, today a cpufreq driver is
only required to specify the granularity of the HW interface for perf
control via related_cpus.  It is not required to obtain extra
information beyond that.  If a new mask to be populated by it is
added, the driver may need to do more work which is not necessary from
the perf control perspective.  That doesn't look particularly clean to
me.

Moreover, adding such a mask to cpufreq_policy would make its users
depend on cpufreq somewhat artificially, which may not even be
useful.

IMO, the information needed by all of the subsystems in question
should be obtained and made available at the arch/platform level and
everyone who needs it should be able to access it from there,
including the cpufreq driver for the given platform if that's what it
needs to do.

BTW, cpuidle may need the information in question too, so why should
it be provided via cpufreq rather than via cpuidle?


* Re: [PATCH v3 3/3] [RFC] CPUFreq: Add support for cpu-perf-dependencies
  2020-11-18 12:00                         ` Rafael J. Wysocki
@ 2020-11-19  6:40                           ` Viresh Kumar
  0 siblings, 0 replies; 29+ messages in thread
From: Viresh Kumar @ 2020-11-19  6:40 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Nicola Mazzucato, Lukasz Luba, Linux Kernel Mailing List,
	Linux ARM, Linux PM, devicetree, Sudeep Holla, Rafael J. Wysocki,
	Viresh Kumar, Rob Herring, Stephen Boyd, Nishanth Menon,
	Daniel Lezcano, Morten Rasmussen, Chris Redpath

On 18-11-20, 13:00, Rafael J. Wysocki wrote:
> On Wed, Nov 18, 2020 at 5:42 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >
> > On 17-11-20, 14:06, Rafael J. Wysocki wrote:
> > > Is this really a cpufreq thing, though, or is it arch stuff?  I think
> > > the latter, because it is not necessary for anything in cpufreq.
> > >
> > > Yes, acpi-cpufreq happens to know this information, because it uses
> > > processor_perflib, but the latter may as well be used by the arch
> > > enumeration of CPUs and the freqdomain_cpus mask may be populated from
> > > there.
> > >
> > > As far as cpufreq is concerned, if the interface to the hardware is
> > > per-CPU, there is one CPU per policy and cpufreq has no business
> > > knowing anything about the underlying hardware coordination.
> >
> > It won't be used by cpufreq itself, for now at least, and yes, I
> > understand your concern. I opted for this because we already have a
> > cpufreq implementation of the same thing, and it is usually better to
> > reuse this kind of infrastructure instead of reinventing it.
> 
> Do you mean related_cpus and real_cpus?

Sorry about the confusion, I meant freqdomain_cpus only.

> That's the granularity of the interface to the hardware I'm talking about.
> 
> Strictly speaking, it means "these CPUs share a HW interface for perf
> control" and it need not mean "these CPUs are in the same
> clock/voltage domain".  Specifically, it need not mean "these CPUs are
> the only CPUs in the given clock/voltage domain".  That's what it
> means when the control is exercised by manipulating OPPs directly, but
> not in general.
> 
> In the ACPI case, for example, what the firmware tells you need not
> reflect the HW topology in principle.  It only tells you whether or
> not it wants you to coordinate a given group of CPUs and in what way,
> but this may not be the whole picture from the HW perspective.  If you
> need the latter, you need more information in general (at least you
> need to assume that what the firmware tells you actually does reflect
> the HW topology on the given SoC).
> 
> So yes, in the particular case of OPP-based perf control, cpufreq
> happens to have the same information that is needed by the other
> subsystems, but otherwise it may not and what I'm saying is that it
> generally is a mistake to expect cpufreq to have that information or
> to be able to obtain it without the help of the arch/platform code.
> Hence, it would be a mistake to design an interface based on that
> expectation.
> 
> Or looking at it from a different angle, today a cpufreq driver is
> only required to specify the granularity of the HW interface for perf
> control via related_cpus.  It is not required to obtain extra
> information beyond that.  If a new mask to be populated by it is
> added, the driver may need to do more work which is not necessary from
> the perf control perspective.  That doesn't look particularly clean to
> me.
> 
> Moreover, adding such a mask to cpufreq_policy would make its users
> depend on cpufreq somewhat artificially, which may not even be
> useful.
> 
> IMO, the information needed by all of the subsystems in question
> should be obtained and made available at the arch/platform level and
> everyone who needs it should be able to access it from there,
> including the cpufreq driver for the given platform if that's what it
> needs to do.
> 
> BTW, cpuidle may need the information in question too, so why should
> it be provided via cpufreq rather than via cpuidle?

Right.

-- 
viresh


* Re: [PATCH v3 3/3] [RFC] CPUFreq: Add support for cpu-perf-dependencies
  2020-11-02 12:01 ` [PATCH v3 3/3] [RFC] CPUFreq: Add support for cpu-perf-dependencies Nicola Mazzucato
  2020-11-06  9:20   ` Viresh Kumar
  2020-11-06  9:24   ` Viresh Kumar
@ 2020-11-19  6:43   ` Viresh Kumar
  2 siblings, 0 replies; 29+ messages in thread
From: Viresh Kumar @ 2020-11-19  6:43 UTC (permalink / raw)
  To: Nicola Mazzucato
  Cc: linux-kernel, linux-arm-kernel, linux-pm, devicetree,
	sudeep.holla, rjw, vireshk, robh+dt, sboyd, nm, daniel.lezcano,
	morten.rasmussen, chris.redpath

On 02-11-20, 12:01, Nicola Mazzucato wrote:
> Second proposal:
> 
> Another option could be for each driver to store internally the performance
> dependencies and let the driver directly provide the correct cpumask for
> any consumer.

From the discussion that happened in this thread, it looks like this is
what we are going to do.

> Whilst this works fine for energy model (see above), it is not the case

Good.

> (currently) for cpufreq_cooling, thus we would need to add a new interface for

And this is not required for now as we discussed.

-- 
viresh


end of thread, other threads:[~2020-11-19  6:43 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-02 12:01 [PATCH v3 0/3] CPUFreq: Add support for cpu performance dependencies Nicola Mazzucato
2020-11-02 12:01 ` [PATCH v3 1/3] dt-bindings/opp: Update documentation for opp-shared Nicola Mazzucato
2020-11-02 12:01 ` [PATCH v3 2/3] opp/of: Allow empty opp-table with opp-shared Nicola Mazzucato
2020-11-03  5:01   ` Viresh Kumar
2020-11-04 17:54     ` Nicola Mazzucato
2020-11-05  4:41       ` Viresh Kumar
2020-11-16 11:45         ` Nicola Mazzucato
2020-11-02 12:01 ` [PATCH v3 3/3] [RFC] CPUFreq: Add support for cpu-perf-dependencies Nicola Mazzucato
2020-11-06  9:20   ` Viresh Kumar
2020-11-06 10:37     ` Lukasz Luba
2020-11-06 10:55       ` Viresh Kumar
2020-11-06 11:14         ` Lukasz Luba
2020-11-09  6:57           ` Viresh Kumar
2020-11-16 11:33             ` Lukasz Luba
2020-11-17 10:11               ` Viresh Kumar
2020-11-17 10:47                 ` Nicola Mazzucato
2020-11-17 10:53                   ` Viresh Kumar
2020-11-17 13:06                     ` Rafael J. Wysocki
2020-11-18  4:42                       ` Viresh Kumar
2020-11-18 12:00                         ` Rafael J. Wysocki
2020-11-19  6:40                           ` Viresh Kumar
2020-11-17 10:47                 ` Lukasz Luba
2020-11-17 10:55                   ` Viresh Kumar
2020-11-06  9:24   ` Viresh Kumar
2020-11-19  6:43   ` Viresh Kumar
2020-11-03 10:18 ` [PATCH v3 0/3] CPUFreq: Add support for cpu performance dependencies Viresh Kumar
2020-11-04 18:04   ` Nicola Mazzucato
2020-11-05 14:25     ` Vincent Guittot
2020-11-05 15:46       ` Ionela Voinescu
