linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v6 0/4] CPUFreq: Add support for opp-sharing cpus
@ 2021-01-11 15:45 Nicola Mazzucato
  2021-01-11 15:45 ` [PATCH v6 1/4] scmi-cpufreq: Remove deferred probe Nicola Mazzucato
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Nicola Mazzucato @ 2021-01-11 15:45 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-pm, sudeep.holla, rjw,
	vireshk, cristian.marussi
  Cc: morten.rasmussen, chris.redpath, nicola.mazzucato

Hi All,

In this V6 posting I have addressed comments on reworking the scmi cpufreq
driver in smaller changes.

This is to support systems where exposed cpu performance controls are more
fine-grained that the platform's ability to scale cpus independently.

Many thanks,
Nicola

[v6]
  * Remove deferred probe, not occurring
  * Move common stuff for CPUs from _init stage to _probe

This V6 is rebased on next-20210111

[v5]
  * Rework documentation of opp-shared within OPP node
  * Register EM only for the first CPU within cpumask in driver
  * Add check for nr_opp in driver before registering EM
  * Add comments on both dev_pm_opp_get_opp_count in driver
  * Remove redundant ret=0 in driver

This v5 is rebased on top of:
next-20201208 + Lukasz Luba's patches [1]

[v4]
  * Remove unconditional set of opp_table->shared_opp to exclusive
  * Add implementation for scmi-cpufreq
  * Change subject

These patches are on top of:
next-20201201 + Lukasz Luba's patches (waiting for Rafael) [1]

[v3]
  * Remove proposal for new 'cpu-performance-dependencies' as we instead
    can reuse the opp table.
  * Update documentation for devicetree/bindings/opp
  * Minor changes within opp to support empty opp table
  * Rework the RFC by adding a second proposal

[v2]
  * Fix errors when running make dt_binding_check
  * Improve commit message description for the dt-binding
  * Add RFC for implementation in cpufreq-core and one of its
    drivers.

Nicola Mazzucato (3):
  scmi-cpufreq: Remove deferred probe
  scmi-cpufreq: Move CPU initialisation to probe
  scmi-cpufreq: Get opp_shared_cpus from opp-v2 for EM

Sudeep Holla (1):
  cpufreq: blacklist Arm Vexpress platforms in cpufreq-dt-platdev

 drivers/cpufreq/cpufreq-dt-platdev.c |   2 +
 drivers/cpufreq/scmi-cpufreq.c       | 196 +++++++++++++++++++++------
 2 files changed, 155 insertions(+), 43 deletions(-)

-- 
2.27.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v6 1/4] scmi-cpufreq: Remove deferred probe
  2021-01-11 15:45 [PATCH v6 0/4] CPUFreq: Add support for opp-sharing cpus Nicola Mazzucato
@ 2021-01-11 15:45 ` Nicola Mazzucato
  2021-01-11 15:45 ` [PATCH v6 2/4] scmi-cpufreq: Move CPU initialisation to probe Nicola Mazzucato
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Nicola Mazzucato @ 2021-01-11 15:45 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-pm, sudeep.holla, rjw,
	vireshk, cristian.marussi
  Cc: morten.rasmussen, chris.redpath, nicola.mazzucato

The current implementation still carries a case for a deferred probe, but
in practise this should not happen anymore.

Since the energy model expects to pass the number of OPPs, let us just
move the call dev_pm_opp_get_opp_count closer to EM registration instead.

Signed-off-by: Nicola Mazzucato <nicola.mazzucato@arm.com>
---
 drivers/cpufreq/scmi-cpufreq.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/cpufreq/scmi-cpufreq.c b/drivers/cpufreq/scmi-cpufreq.c
index 491a0a24fb1e..15b213ed78fa 100644
--- a/drivers/cpufreq/scmi-cpufreq.c
+++ b/drivers/cpufreq/scmi-cpufreq.c
@@ -153,13 +153,6 @@ static int scmi_cpufreq_init(struct cpufreq_policy *policy)
 		return ret;
 	}
 
-	nr_opp = dev_pm_opp_get_opp_count(cpu_dev);
-	if (nr_opp <= 0) {
-		dev_dbg(cpu_dev, "OPP table is not ready, deferring probe\n");
-		ret = -EPROBE_DEFER;
-		goto out_free_opp;
-	}
-
 	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
 	if (!priv) {
 		ret = -ENOMEM;
@@ -190,6 +183,15 @@ static int scmi_cpufreq_init(struct cpufreq_policy *policy)
 	policy->fast_switch_possible =
 		handle->perf_ops->fast_switch_possible(handle, cpu_dev);
 
+	nr_opp = dev_pm_opp_get_opp_count(cpu_dev);
+	if (nr_opp <= 0) {
+		dev_err(cpu_dev, "%s: No OPPs for this device: %d\n",
+			__func__, ret);
+
+		ret = -ENODEV;
+		goto out_free_priv;
+	}
+
 	power_scale_mw = handle->perf_ops->power_scale_mw_get(handle);
 	em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, policy->cpus,
 				    power_scale_mw);
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v6 2/4] scmi-cpufreq: Move CPU initialisation to probe
  2021-01-11 15:45 [PATCH v6 0/4] CPUFreq: Add support for opp-sharing cpus Nicola Mazzucato
  2021-01-11 15:45 ` [PATCH v6 1/4] scmi-cpufreq: Remove deferred probe Nicola Mazzucato
@ 2021-01-11 15:45 ` Nicola Mazzucato
  2021-01-12 11:17   ` Viresh Kumar
  2021-01-14 16:54   ` Cristian Marussi
  2021-01-11 15:45 ` [PATCH v6 3/4] scmi-cpufreq: Get opp_shared_cpus from opp-v2 for EM Nicola Mazzucato
  2021-01-11 15:45 ` [PATCH v6 4/4] cpufreq: blacklist Arm Vexpress platforms in cpufreq-dt-platdev Nicola Mazzucato
  3 siblings, 2 replies; 13+ messages in thread
From: Nicola Mazzucato @ 2021-01-11 15:45 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-pm, sudeep.holla, rjw,
	vireshk, cristian.marussi
  Cc: morten.rasmussen, chris.redpath, nicola.mazzucato

Some of the cpu related initialisation can be done at probe stage.
This patch moves those initialisations from the ->init callback to the
probe stage.

This is done in preparation for adding support to retrieve additional
information from DT (CPUs sharing v/f lines).

Signed-off-by: Nicola Mazzucato <nicola.mazzucato@arm.com>
---
 drivers/cpufreq/scmi-cpufreq.c | 180 ++++++++++++++++++++++++---------
 1 file changed, 135 insertions(+), 45 deletions(-)

diff --git a/drivers/cpufreq/scmi-cpufreq.c b/drivers/cpufreq/scmi-cpufreq.c
index 15b213ed78fa..4aa97cdc5997 100644
--- a/drivers/cpufreq/scmi-cpufreq.c
+++ b/drivers/cpufreq/scmi-cpufreq.c
@@ -25,6 +25,14 @@ struct scmi_data {
 	struct device *cpu_dev;
 };
 
+/* Per-CPU storage for runtime management */
+struct scmi_cpudata {
+	cpumask_var_t scmi_shared_cpus;
+	struct cpufreq_frequency_table *freq_table;
+};
+
+static struct scmi_cpudata *cpudata_table;
+
 static const struct scmi_handle *handle;
 
 static unsigned int scmi_cpufreq_get_rate(unsigned int cpu)
@@ -120,13 +128,10 @@ scmi_get_cpu_power(unsigned long *power, unsigned long *KHz,
 
 static int scmi_cpufreq_init(struct cpufreq_policy *policy)
 {
-	int ret, nr_opp;
+	int ret;
 	unsigned int latency;
 	struct device *cpu_dev;
 	struct scmi_data *priv;
-	struct cpufreq_frequency_table *freq_table;
-	struct em_data_callback em_cb = EM_DATA_CB(scmi_get_cpu_power);
-	bool power_scale_mw;
 
 	cpu_dev = get_cpu_device(policy->cpu);
 	if (!cpu_dev) {
@@ -134,42 +139,19 @@ static int scmi_cpufreq_init(struct cpufreq_policy *policy)
 		return -ENODEV;
 	}
 
-	ret = handle->perf_ops->device_opps_add(handle, cpu_dev);
-	if (ret) {
-		dev_warn(cpu_dev, "failed to add opps to the device\n");
-		return ret;
-	}
-
-	ret = scmi_get_sharing_cpus(cpu_dev, policy->cpus);
-	if (ret) {
-		dev_warn(cpu_dev, "failed to get sharing cpumask\n");
-		return ret;
-	}
-
-	ret = dev_pm_opp_set_sharing_cpus(cpu_dev, policy->cpus);
-	if (ret) {
-		dev_err(cpu_dev, "%s: failed to mark OPPs as shared: %d\n",
-			__func__, ret);
-		return ret;
-	}
-
 	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
 	if (!priv) {
 		ret = -ENOMEM;
 		goto out_free_opp;
 	}
 
-	ret = dev_pm_opp_init_cpufreq_table(cpu_dev, &freq_table);
-	if (ret) {
-		dev_err(cpu_dev, "failed to init cpufreq table: %d\n", ret);
-		goto out_free_priv;
-	}
+	cpumask_copy(policy->cpus, cpudata_table[policy->cpu].scmi_shared_cpus);
 
 	priv->cpu_dev = cpu_dev;
 	priv->domain_id = handle->perf_ops->device_domain_id(cpu_dev);
 
 	policy->driver_data = priv;
-	policy->freq_table = freq_table;
+	policy->freq_table = cpudata_table[policy->cpu].freq_table;
 
 	/* SCMI allows DVFS request for any domain from any CPU */
 	policy->dvfs_possible_from_any_cpu = true;
@@ -183,23 +165,8 @@ static int scmi_cpufreq_init(struct cpufreq_policy *policy)
 	policy->fast_switch_possible =
 		handle->perf_ops->fast_switch_possible(handle, cpu_dev);
 
-	nr_opp = dev_pm_opp_get_opp_count(cpu_dev);
-	if (nr_opp <= 0) {
-		dev_err(cpu_dev, "%s: No OPPs for this device: %d\n",
-			__func__, ret);
-
-		ret = -ENODEV;
-		goto out_free_priv;
-	}
-
-	power_scale_mw = handle->perf_ops->power_scale_mw_get(handle);
-	em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, policy->cpus,
-				    power_scale_mw);
-
 	return 0;
 
-out_free_priv:
-	kfree(priv);
 out_free_opp:
 	dev_pm_opp_remove_all_dynamic(cpu_dev);
 
@@ -210,7 +177,6 @@ static int scmi_cpufreq_exit(struct cpufreq_policy *policy)
 {
 	struct scmi_data *priv = policy->driver_data;
 
-	dev_pm_opp_free_cpufreq_table(priv->cpu_dev, &policy->freq_table);
 	dev_pm_opp_remove_all_dynamic(priv->cpu_dev);
 	kfree(priv);
 
@@ -231,10 +197,102 @@ static struct cpufreq_driver scmi_cpufreq_driver = {
 	.exit	= scmi_cpufreq_exit,
 };
 
+static int scmi_init_cpudata(void)
+{
+	int cpu;
+	unsigned int ncpus = num_possible_cpus();
+
+	cpudata_table = kzalloc(sizeof(*cpudata_table) * ncpus, GFP_KERNEL);
+	if (!cpudata_table)
+		return -ENOMEM;
+
+	for_each_possible_cpu(cpu) {
+		if (!zalloc_cpumask_var(&cpudata_table[cpu].scmi_shared_cpus,
+					GFP_KERNEL))
+			goto out;
+	}
+
+	return 0;
+
+out:
+	kfree(cpudata_table);
+	return -ENOMEM;
+}
+
+static int scmi_init_device(const struct scmi_handle *handle, int cpu)
+{
+	struct device *cpu_dev;
+	int ret, nr_opp;
+	struct em_data_callback em_cb = EM_DATA_CB(scmi_get_cpu_power);
+	bool power_scale_mw;
+	cpumask_var_t scmi_cpus;
+
+	if (!zalloc_cpumask_var(&scmi_cpus, GFP_KERNEL))
+		return -ENOMEM;
+
+	cpumask_set_cpu(cpu, scmi_cpus);
+
+	cpu_dev = get_cpu_device(cpu);
+
+	ret = scmi_get_sharing_cpus(cpu_dev, scmi_cpus);
+	if (ret) {
+		dev_warn(cpu_dev, "failed to get sharing cpumask\n");
+		goto free_cpumask;
+	}
+
+	/*
+	 * We get here for each CPU. Add OPPs only on those CPUs for which we
+	 * haven't already done so, or set their OPPs as shared.
+	 */
+	nr_opp = dev_pm_opp_get_opp_count(cpu_dev);
+	if (nr_opp <= 0) {
+		ret = handle->perf_ops->device_opps_add(handle, cpu_dev);
+		if (ret) {
+			dev_warn(cpu_dev, "failed to add opps to the device\n");
+			goto free_cpumask;
+		}
+
+		ret = dev_pm_opp_set_sharing_cpus(cpu_dev, scmi_cpus);
+		if (ret) {
+			dev_err(cpu_dev, "%s: failed to mark OPPs as shared: %d\n",
+				__func__, ret);
+			goto free_cpumask;
+		}
+
+		nr_opp = dev_pm_opp_get_opp_count(cpu_dev);
+		if (nr_opp <= 0) {
+			dev_err(cpu_dev, "%s: No OPPs for this device: %d\n",
+				__func__, ret);
+
+			ret = -ENODEV;
+			goto free_cpumask;
+		}
+
+		power_scale_mw = handle->perf_ops->power_scale_mw_get(handle);
+		em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, scmi_cpus,
+					    power_scale_mw);
+	}
+
+	ret = dev_pm_opp_init_cpufreq_table(cpu_dev,
+					    &cpudata_table[cpu].freq_table);
+	if (ret) {
+		dev_err(cpu_dev, "failed to init cpufreq table: %d\n", ret);
+		goto free_cpumask;
+	}
+
+	cpumask_copy(cpudata_table[cpu].scmi_shared_cpus, scmi_cpus);
+
+free_cpumask:
+	free_cpumask_var(scmi_cpus);
+	return ret;
+}
+
 static int scmi_cpufreq_probe(struct scmi_device *sdev)
 {
 	int ret;
 	struct device *dev = &sdev->dev;
+	int cpu;
+	struct device *cpu_dev;
 
 	handle = sdev->handle;
 
@@ -247,6 +305,24 @@ static int scmi_cpufreq_probe(struct scmi_device *sdev)
 		devm_of_clk_add_hw_provider(dev, of_clk_hw_simple_get, NULL);
 #endif
 
+	ret = scmi_init_cpudata();
+	if (ret) {
+		pr_err("%s: init cpu data failed\n", __func__);
+		return ret;
+	}
+
+	for_each_possible_cpu(cpu) {
+		cpu_dev = get_cpu_device(cpu);
+
+		ret = scmi_init_device(handle, cpu);
+		if (ret) {
+			dev_err(cpu_dev, "%s: init device failed\n",
+				__func__);
+
+			return ret;
+		}
+	}
+
 	ret = cpufreq_register_driver(&scmi_cpufreq_driver);
 	if (ret) {
 		dev_err(dev, "%s: registering cpufreq failed, err: %d\n",
@@ -258,6 +334,20 @@ static int scmi_cpufreq_probe(struct scmi_device *sdev)
 
 static void scmi_cpufreq_remove(struct scmi_device *sdev)
 {
+	int cpu;
+	struct device *cpu_dev;
+
+	for_each_possible_cpu(cpu) {
+		cpu_dev = get_cpu_device(cpu);
+
+		dev_pm_opp_free_cpufreq_table(cpu_dev,
+					      &cpudata_table[cpu].freq_table);
+
+		free_cpumask_var(cpudata_table[cpu].scmi_shared_cpus);
+	}
+
+	kfree(cpudata_table);
+
 	cpufreq_unregister_driver(&scmi_cpufreq_driver);
 }
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v6 3/4] scmi-cpufreq: Get opp_shared_cpus from opp-v2 for EM
  2021-01-11 15:45 [PATCH v6 0/4] CPUFreq: Add support for opp-sharing cpus Nicola Mazzucato
  2021-01-11 15:45 ` [PATCH v6 1/4] scmi-cpufreq: Remove deferred probe Nicola Mazzucato
  2021-01-11 15:45 ` [PATCH v6 2/4] scmi-cpufreq: Move CPU initialisation to probe Nicola Mazzucato
@ 2021-01-11 15:45 ` Nicola Mazzucato
  2021-01-12 11:20   ` Viresh Kumar
  2021-01-11 15:45 ` [PATCH v6 4/4] cpufreq: blacklist Arm Vexpress platforms in cpufreq-dt-platdev Nicola Mazzucato
  3 siblings, 1 reply; 13+ messages in thread
From: Nicola Mazzucato @ 2021-01-11 15:45 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-pm, sudeep.holla, rjw,
	vireshk, cristian.marussi
  Cc: morten.rasmussen, chris.redpath, nicola.mazzucato

By design, SCMI performance domains define the granularity of
performance controls, they do not describe any underlying hardware
dependencies (although they may match in many cases).

It is therefore possible to have some platforms where hardware may have
the ability to control CPU performance at different granularity and choose
to describe fine-grained performance control through SCMI.

In such situations, the energy model would be provided with inaccurate
information based on controls, while it still needs to know the
performance boundaries.

To restore correct functionality, retrieve information of CPUs under the
same performance domain from operating-points-v2 in DT, and pass it on to
EM.

Signed-off-by: Nicola Mazzucato <nicola.mazzucato@arm.com>
---
 drivers/cpufreq/scmi-cpufreq.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/scmi-cpufreq.c b/drivers/cpufreq/scmi-cpufreq.c
index 4aa97cdc5997..ff6ba6fab58b 100644
--- a/drivers/cpufreq/scmi-cpufreq.c
+++ b/drivers/cpufreq/scmi-cpufreq.c
@@ -226,9 +226,12 @@ static int scmi_init_device(const struct scmi_handle *handle, int cpu)
 	struct em_data_callback em_cb = EM_DATA_CB(scmi_get_cpu_power);
 	bool power_scale_mw;
 	cpumask_var_t scmi_cpus;
+	cpumask_var_t opp_shared_cpus;
 
 	if (!zalloc_cpumask_var(&scmi_cpus, GFP_KERNEL))
 		return -ENOMEM;
+	if (!zalloc_cpumask_var(&opp_shared_cpus, GFP_KERNEL))
+		return -ENOMEM;
 
 	cpumask_set_cpu(cpu, scmi_cpus);
 
@@ -240,6 +243,20 @@ static int scmi_init_device(const struct scmi_handle *handle, int cpu)
 		goto free_cpumask;
 	}
 
+	/*
+	 * The OPP 'sharing cpus' info may come from dt through an empty opp
+	 * table and opp-shared. If found, it takes precedence over the SCMI
+	 * domain IDs info.
+	 */
+	ret = dev_pm_opp_of_get_sharing_cpus(cpu_dev, opp_shared_cpus);
+	if (ret || !cpumask_weight(opp_shared_cpus)) {
+		/*
+		 * Either opp-table is not set or no opp-shared was found,
+		 * use the information from SCMI domain IDs.
+		 */
+		cpumask_copy(opp_shared_cpus, scmi_cpus);
+	}
+
 	/*
 	 * We get here for each CPU. Add OPPs only on those CPUs for which we
 	 * haven't already done so, or set their OPPs as shared.
@@ -252,7 +269,7 @@ static int scmi_init_device(const struct scmi_handle *handle, int cpu)
 			goto free_cpumask;
 		}
 
-		ret = dev_pm_opp_set_sharing_cpus(cpu_dev, scmi_cpus);
+		ret = dev_pm_opp_set_sharing_cpus(cpu_dev, opp_shared_cpus);
 		if (ret) {
 			dev_err(cpu_dev, "%s: failed to mark OPPs as shared: %d\n",
 				__func__, ret);
@@ -269,7 +286,7 @@ static int scmi_init_device(const struct scmi_handle *handle, int cpu)
 		}
 
 		power_scale_mw = handle->perf_ops->power_scale_mw_get(handle);
-		em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, scmi_cpus,
+		em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, opp_shared_cpus,
 					    power_scale_mw);
 	}
 
@@ -284,6 +301,7 @@ static int scmi_init_device(const struct scmi_handle *handle, int cpu)
 
 free_cpumask:
 	free_cpumask_var(scmi_cpus);
+	free_cpumask_var(opp_shared_cpus);
 	return ret;
 }
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v6 4/4] cpufreq: blacklist Arm Vexpress platforms in cpufreq-dt-platdev
  2021-01-11 15:45 [PATCH v6 0/4] CPUFreq: Add support for opp-sharing cpus Nicola Mazzucato
                   ` (2 preceding siblings ...)
  2021-01-11 15:45 ` [PATCH v6 3/4] scmi-cpufreq: Get opp_shared_cpus from opp-v2 for EM Nicola Mazzucato
@ 2021-01-11 15:45 ` Nicola Mazzucato
  3 siblings, 0 replies; 13+ messages in thread
From: Nicola Mazzucato @ 2021-01-11 15:45 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-pm, sudeep.holla, rjw,
	vireshk, cristian.marussi
  Cc: morten.rasmussen, chris.redpath, nicola.mazzucato

From: Sudeep Holla <sudeep.holla@arm.com>

Add "arm,vexpress" to cpufreq-dt-platdev blacklist since the actual
scaling is handled by the firmware cpufreq drivers(scpi, scmi and
vexpress-spc).

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
---
 drivers/cpufreq/cpufreq-dt-platdev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/cpufreq/cpufreq-dt-platdev.c b/drivers/cpufreq/cpufreq-dt-platdev.c
index bd2db0188cbb..91e6a0c10dbf 100644
--- a/drivers/cpufreq/cpufreq-dt-platdev.c
+++ b/drivers/cpufreq/cpufreq-dt-platdev.c
@@ -103,6 +103,8 @@ static const struct of_device_id whitelist[] __initconst = {
 static const struct of_device_id blacklist[] __initconst = {
 	{ .compatible = "allwinner,sun50i-h6", },
 
+	{ .compatible = "arm,vexpress", },
+
 	{ .compatible = "calxeda,highbank", },
 	{ .compatible = "calxeda,ecx-2000", },
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 2/4] scmi-cpufreq: Move CPU initialisation to probe
  2021-01-11 15:45 ` [PATCH v6 2/4] scmi-cpufreq: Move CPU initialisation to probe Nicola Mazzucato
@ 2021-01-12 11:17   ` Viresh Kumar
  2021-01-13 11:55     ` Nicola Mazzucato
  2021-01-14 16:54   ` Cristian Marussi
  1 sibling, 1 reply; 13+ messages in thread
From: Viresh Kumar @ 2021-01-12 11:17 UTC (permalink / raw)
  To: Nicola Mazzucato
  Cc: linux-kernel, linux-arm-kernel, linux-pm, sudeep.holla, rjw,
	vireshk, cristian.marussi, morten.rasmussen, chris.redpath

On 11-01-21, 15:45, Nicola Mazzucato wrote:
> diff --git a/drivers/cpufreq/scmi-cpufreq.c b/drivers/cpufreq/scmi-cpufreq.c
> +static int scmi_init_cpudata(void)
> +{
> +	int cpu;
> +	unsigned int ncpus = num_possible_cpus();
> +
> +	cpudata_table = kzalloc(sizeof(*cpudata_table) * ncpus, GFP_KERNEL);
> +	if (!cpudata_table)
> +		return -ENOMEM;

This could have been done with a per-cpu variable instead.

> +	for_each_possible_cpu(cpu) {
> +		if (!zalloc_cpumask_var(&cpudata_table[cpu].scmi_shared_cpus,
> +					GFP_KERNEL))
> +			goto out;
> +	}

You are making a copy of the struct for each CPU and so for a 16 CPUs
sharing their clock lines, you will have 16 copies of the exact same
stuff.

An optimal approach would be to have a linked-list of this structure
and that will only have 1 node per cpufreq policy.

> +	return 0;
> +
> +out:
> +	kfree(cpudata_table);
> +	return -ENOMEM;
> +}
> +
> +static int scmi_init_device(const struct scmi_handle *handle, int cpu)
> +{
> +	struct device *cpu_dev;
> +	int ret, nr_opp;
> +	struct em_data_callback em_cb = EM_DATA_CB(scmi_get_cpu_power);
> +	bool power_scale_mw;
> +	cpumask_var_t scmi_cpus;
> +
> +	if (!zalloc_cpumask_var(&scmi_cpus, GFP_KERNEL))
> +		return -ENOMEM;
> +
> +	cpumask_set_cpu(cpu, scmi_cpus);
> +
> +	cpu_dev = get_cpu_device(cpu);
> +
> +	ret = scmi_get_sharing_cpus(cpu_dev, scmi_cpus);

Where do you expect the sharing information to come from in this case
? DT ?

> +	if (ret) {
> +		dev_warn(cpu_dev, "failed to get sharing cpumask\n");
> +		goto free_cpumask;
> +	}
> +
> +	/*
> +	 * We get here for each CPU. Add OPPs only on those CPUs for which we
> +	 * haven't already done so, or set their OPPs as shared.
> +	 */
> +	nr_opp = dev_pm_opp_get_opp_count(cpu_dev);
> +	if (nr_opp <= 0) {
> +		ret = handle->perf_ops->device_opps_add(handle, cpu_dev);
> +		if (ret) {
> +			dev_warn(cpu_dev, "failed to add opps to the device\n");
> +			goto free_cpumask;
> +		}
> +
> +		ret = dev_pm_opp_set_sharing_cpus(cpu_dev, scmi_cpus);
> +		if (ret) {
> +			dev_err(cpu_dev, "%s: failed to mark OPPs as shared: %d\n",
> +				__func__, ret);
> +			goto free_cpumask;
> +		}
> +
> +		nr_opp = dev_pm_opp_get_opp_count(cpu_dev);

Shouldn't you do this just after adding the OPPs ?

> +		if (nr_opp <= 0) {
> +			dev_err(cpu_dev, "%s: No OPPs for this device: %d\n",
> +				__func__, ret);
> +
> +			ret = -ENODEV;
> +			goto free_cpumask;
> +		}
> +
> +		power_scale_mw = handle->perf_ops->power_scale_mw_get(handle);
> +		em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, scmi_cpus,
> +					    power_scale_mw);
> +	}
> +
> +	ret = dev_pm_opp_init_cpufreq_table(cpu_dev,
> +					    &cpudata_table[cpu].freq_table);
> +	if (ret) {
> +		dev_err(cpu_dev, "failed to init cpufreq table: %d\n", ret);
> +		goto free_cpumask;
> +	}
> +
> +	cpumask_copy(cpudata_table[cpu].scmi_shared_cpus, scmi_cpus);
> +
> +free_cpumask:
> +	free_cpumask_var(scmi_cpus);
> +	return ret;
> +}
> +
>  static int scmi_cpufreq_probe(struct scmi_device *sdev)
>  {
>  	int ret;
>  	struct device *dev = &sdev->dev;
> +	int cpu;
> +	struct device *cpu_dev;

Please keep the list of local variable in decreasing order of their
length, many people including me prefer it that way.

>  
>  	handle = sdev->handle;
>  
> @@ -247,6 +305,24 @@ static int scmi_cpufreq_probe(struct scmi_device *sdev)
>  		devm_of_clk_add_hw_provider(dev, of_clk_hw_simple_get, NULL);
>  #endif
>  
> +	ret = scmi_init_cpudata();
> +	if (ret) {
> +		pr_err("%s: init cpu data failed\n", __func__);
> +		return ret;
> +	}
> +
> +	for_each_possible_cpu(cpu) {
> +		cpu_dev = get_cpu_device(cpu);
> +
> +		ret = scmi_init_device(handle, cpu);
> +		if (ret) {
> +			dev_err(cpu_dev, "%s: init device failed\n",
> +				__func__);
> +
> +			return ret;

You missed undoing scmi_init_cpudata().

> +		}
> +	}
> +
>  	ret = cpufreq_register_driver(&scmi_cpufreq_driver);
>  	if (ret) {
>  		dev_err(dev, "%s: registering cpufreq failed, err: %d\n",
> @@ -258,6 +334,20 @@ static int scmi_cpufreq_probe(struct scmi_device *sdev)
>  
>  static void scmi_cpufreq_remove(struct scmi_device *sdev)
>  {
> +	int cpu;
> +	struct device *cpu_dev;
> +
> +	for_each_possible_cpu(cpu) {
> +		cpu_dev = get_cpu_device(cpu);
> +
> +		dev_pm_opp_free_cpufreq_table(cpu_dev,
> +					      &cpudata_table[cpu].freq_table);
> +
> +		free_cpumask_var(cpudata_table[cpu].scmi_shared_cpus);
> +	}
> +
> +	kfree(cpudata_table);
> +
>  	cpufreq_unregister_driver(&scmi_cpufreq_driver);
>  }
>  
> -- 
> 2.27.0

-- 
viresh

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 3/4] scmi-cpufreq: Get opp_shared_cpus from opp-v2 for EM
  2021-01-11 15:45 ` [PATCH v6 3/4] scmi-cpufreq: Get opp_shared_cpus from opp-v2 for EM Nicola Mazzucato
@ 2021-01-12 11:20   ` Viresh Kumar
  2021-01-13 12:00     ` Nicola Mazzucato
  0 siblings, 1 reply; 13+ messages in thread
From: Viresh Kumar @ 2021-01-12 11:20 UTC (permalink / raw)
  To: Nicola Mazzucato
  Cc: linux-kernel, linux-arm-kernel, linux-pm, sudeep.holla, rjw,
	vireshk, cristian.marussi, morten.rasmussen, chris.redpath

On 11-01-21, 15:45, Nicola Mazzucato wrote:
> By design, SCMI performance domains define the granularity of
> performance controls, they do not describe any underlying hardware
> dependencies (although they may match in many cases).
> 
> It is therefore possible to have some platforms where hardware may have
> the ability to control CPU performance at different granularity and choose
> to describe fine-grained performance control through SCMI.
> 
> In such situations, the energy model would be provided with inaccurate
> information based on controls, while it still needs to know the
> performance boundaries.
> 
> To restore correct functionality, retrieve information of CPUs under the
> same performance domain from operating-points-v2 in DT, and pass it on to
> EM.
> 
> Signed-off-by: Nicola Mazzucato <nicola.mazzucato@arm.com>
> ---
>  drivers/cpufreq/scmi-cpufreq.c | 22 ++++++++++++++++++++--
>  1 file changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cpufreq/scmi-cpufreq.c b/drivers/cpufreq/scmi-cpufreq.c
> index 4aa97cdc5997..ff6ba6fab58b 100644
> --- a/drivers/cpufreq/scmi-cpufreq.c
> +++ b/drivers/cpufreq/scmi-cpufreq.c
> @@ -226,9 +226,12 @@ static int scmi_init_device(const struct scmi_handle *handle, int cpu)
>  	struct em_data_callback em_cb = EM_DATA_CB(scmi_get_cpu_power);
>  	bool power_scale_mw;
>  	cpumask_var_t scmi_cpus;
> +	cpumask_var_t opp_shared_cpus;
>  
>  	if (!zalloc_cpumask_var(&scmi_cpus, GFP_KERNEL))
>  		return -ENOMEM;
> +	if (!zalloc_cpumask_var(&opp_shared_cpus, GFP_KERNEL))
> +		return -ENOMEM;
>  
>  	cpumask_set_cpu(cpu, scmi_cpus);
>  
> @@ -240,6 +243,20 @@ static int scmi_init_device(const struct scmi_handle *handle, int cpu)
>  		goto free_cpumask;
>  	}
>  
> +	/*
> +	 * The OPP 'sharing cpus' info may come from dt through an empty opp
> +	 * table and opp-shared. If found, it takes precedence over the SCMI
> +	 * domain IDs info.
> +	 */
> +	ret = dev_pm_opp_of_get_sharing_cpus(cpu_dev, opp_shared_cpus);

If this succeeds, you shouldn't even try to call the other
get_sharing_cpus variant.

> +	if (ret || !cpumask_weight(opp_shared_cpus)) {
> +		/*
> +		 * Either opp-table is not set or no opp-shared was found,
> +		 * use the information from SCMI domain IDs.
> +		 */
> +		cpumask_copy(opp_shared_cpus, scmi_cpus);
> +	}
> +
>  	/*
>  	 * We get here for each CPU. Add OPPs only on those CPUs for which we
>  	 * haven't already done so, or set their OPPs as shared.
> @@ -252,7 +269,7 @@ static int scmi_init_device(const struct scmi_handle *handle, int cpu)
>  			goto free_cpumask;
>  		}
>  
> -		ret = dev_pm_opp_set_sharing_cpus(cpu_dev, scmi_cpus);
> +		ret = dev_pm_opp_set_sharing_cpus(cpu_dev, opp_shared_cpus);
>  		if (ret) {
>  			dev_err(cpu_dev, "%s: failed to mark OPPs as shared: %d\n",
>  				__func__, ret);
> @@ -269,7 +286,7 @@ static int scmi_init_device(const struct scmi_handle *handle, int cpu)
>  		}
>  
>  		power_scale_mw = handle->perf_ops->power_scale_mw_get(handle);
> -		em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, scmi_cpus,
> +		em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, opp_shared_cpus,
>  					    power_scale_mw);
>  	}
>  
> @@ -284,6 +301,7 @@ static int scmi_init_device(const struct scmi_handle *handle, int cpu)
>  
>  free_cpumask:
>  	free_cpumask_var(scmi_cpus);
> +	free_cpumask_var(opp_shared_cpus);
>  	return ret;
>  }
>  
> -- 
> 2.27.0

-- 
viresh

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 2/4] scmi-cpufreq: Move CPU initialisation to probe
  2021-01-12 11:17   ` Viresh Kumar
@ 2021-01-13 11:55     ` Nicola Mazzucato
  2021-01-14  5:07       ` Viresh Kumar
  0 siblings, 1 reply; 13+ messages in thread
From: Nicola Mazzucato @ 2021-01-13 11:55 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: linux-kernel, linux-arm-kernel, linux-pm, sudeep.holla, rjw,
	vireshk, cristian.marussi, morten.rasmussen, chris.redpath

Hi Viresh, thanks for looking into this.
Please see below.

On 1/12/21 11:17 AM, Viresh Kumar wrote:
> On 11-01-21, 15:45, Nicola Mazzucato wrote:
>> diff --git a/drivers/cpufreq/scmi-cpufreq.c b/drivers/cpufreq/scmi-cpufreq.c
>> +static int scmi_init_cpudata(void)
>> +{
>> +	int cpu;
>> +	unsigned int ncpus = num_possible_cpus();
>> +
>> +	cpudata_table = kzalloc(sizeof(*cpudata_table) * ncpus, GFP_KERNEL);
>> +	if (!cpudata_table)
>> +		return -ENOMEM;
> 
> This could have been done with a per-cpu variable instead.

sure, I can do a DEFINE_PER_CPU() for it if it makes it better.

> 
>> +	for_each_possible_cpu(cpu) {
>> +		if (!zalloc_cpumask_var(&cpudata_table[cpu].scmi_shared_cpus,
>> +					GFP_KERNEL))
>> +			goto out;
>> +	}
> 
> You are making a copy of the struct for each CPU and so for a 16 CPUs
> sharing their clock lines, you will have 16 copies of the exact same
> stuff.
> 
> An optimal approach would be to have a linked-list of this structure
> and that will only have 1 node per cpufreq policy.

It is allocating space for the cpumask for each of the cpu. No data is copied yet.
I understand the optimisation, but I don't see a linkage to cpufreq policy to be
a good idea. This cpudata is for internal storage of scmi and opp-shared info
and it is not tied to cpufreq policy. We have moved all the cpu bits to probe
and at this stage we have no knowledge of cpufreq polices.
Or what am I missing?

> 
>> +	return 0;
>> +
>> +out:
>> +	kfree(cpudata_table);
>> +	return -ENOMEM;
>> +}
>> +
>> +static int scmi_init_device(const struct scmi_handle *handle, int cpu)
>> +{
>> +	struct device *cpu_dev;
>> +	int ret, nr_opp;
>> +	struct em_data_callback em_cb = EM_DATA_CB(scmi_get_cpu_power);
>> +	bool power_scale_mw;
>> +	cpumask_var_t scmi_cpus;
>> +
>> +	if (!zalloc_cpumask_var(&scmi_cpus, GFP_KERNEL))
>> +		return -ENOMEM;
>> +
>> +	cpumask_set_cpu(cpu, scmi_cpus);
>> +
>> +	cpu_dev = get_cpu_device(cpu);
>> +
>> +	ret = scmi_get_sharing_cpus(cpu_dev, scmi_cpus);
> 
> Where do you expect the sharing information to come from in this case
> ? DT ?

Coming from SCMI perf. The source of info has not changed.

> 
>> +	if (ret) {
>> +		dev_warn(cpu_dev, "failed to get sharing cpumask\n");
>> +		goto free_cpumask;
>> +	}
>> +
>> +	/*
>> +	 * We get here for each CPU. Add OPPs only on those CPUs for which we
>> +	 * haven't already done so, or set their OPPs as shared.
>> +	 */
>> +	nr_opp = dev_pm_opp_get_opp_count(cpu_dev);
>> +	if (nr_opp <= 0) {
>> +		ret = handle->perf_ops->device_opps_add(handle, cpu_dev);
>> +		if (ret) {
>> +			dev_warn(cpu_dev, "failed to add opps to the device\n");
>> +			goto free_cpumask;
>> +		}
>> +
>> +		ret = dev_pm_opp_set_sharing_cpus(cpu_dev, scmi_cpus);
>> +		if (ret) {
>> +			dev_err(cpu_dev, "%s: failed to mark OPPs as shared: %d\n",
>> +				__func__, ret);
>> +			goto free_cpumask;
>> +		}
>> +
>> +		nr_opp = dev_pm_opp_get_opp_count(cpu_dev);
> 
> Shouldn't you do this just after adding the OPPs ?

This was suggested earlier. It was moved closer to em_registration to where the
nr_opp is used. One way or the other as I don't have a strong preference for its
place.

> 
>> +		if (nr_opp <= 0) {
>> +			dev_err(cpu_dev, "%s: No OPPs for this device: %d\n",
>> +				__func__, ret);
>> +
>> +			ret = -ENODEV;
>> +			goto free_cpumask;
>> +		}
>> +
>> +		power_scale_mw = handle->perf_ops->power_scale_mw_get(handle);
>> +		em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, scmi_cpus,
>> +					    power_scale_mw);
>> +	}
>> +
>> +	ret = dev_pm_opp_init_cpufreq_table(cpu_dev,
>> +					    &cpudata_table[cpu].freq_table);
>> +	if (ret) {
>> +		dev_err(cpu_dev, "failed to init cpufreq table: %d\n", ret);
>> +		goto free_cpumask;
>> +	}
>> +
>> +	cpumask_copy(cpudata_table[cpu].scmi_shared_cpus, scmi_cpus);
>> +
>> +free_cpumask:
>> +	free_cpumask_var(scmi_cpus);
>> +	return ret;
>> +}
>> +
>>  static int scmi_cpufreq_probe(struct scmi_device *sdev)
>>  {
>>  	int ret;
>>  	struct device *dev = &sdev->dev;
>> +	int cpu;
>> +	struct device *cpu_dev;
> 
> Please keep the list of local variable in decreasing order of their
> length, many people including me prefer it that way.

Apologies, it will get fixed.

> 
>>  
>>  	handle = sdev->handle;
>>  
>> @@ -247,6 +305,24 @@ static int scmi_cpufreq_probe(struct scmi_device *sdev)
>>  		devm_of_clk_add_hw_provider(dev, of_clk_hw_simple_get, NULL);
>>  #endif
>>  
>> +	ret = scmi_init_cpudata();
>> +	if (ret) {
>> +		pr_err("%s: init cpu data failed\n", __func__);
>> +		return ret;
>> +	}
>> +
>> +	for_each_possible_cpu(cpu) {
>> +		cpu_dev = get_cpu_device(cpu);
>> +
>> +		ret = scmi_init_device(handle, cpu);
>> +		if (ret) {
>> +			dev_err(cpu_dev, "%s: init device failed\n",
>> +				__func__);
>> +
>> +			return ret;
> 
> You missed undoing scmi_init_cpudata().

Thanks for spotting. I will fix it.

> 
>> +		}
>> +	}
>> +
>>  	ret = cpufreq_register_driver(&scmi_cpufreq_driver);
>>  	if (ret) {
>>  		dev_err(dev, "%s: registering cpufreq failed, err: %d\n",
>> @@ -258,6 +334,20 @@ static int scmi_cpufreq_probe(struct scmi_device *sdev)
>>  
>>  static void scmi_cpufreq_remove(struct scmi_device *sdev)
>>  {
>> +	int cpu;
>> +	struct device *cpu_dev;
>> +
>> +	for_each_possible_cpu(cpu) {
>> +		cpu_dev = get_cpu_device(cpu);
>> +
>> +		dev_pm_opp_free_cpufreq_table(cpu_dev,
>> +					      &cpudata_table[cpu].freq_table);
>> +
>> +		free_cpumask_var(cpudata_table[cpu].scmi_shared_cpus);
>> +	}
>> +
>> +	kfree(cpudata_table);
>> +
>>  	cpufreq_unregister_driver(&scmi_cpufreq_driver);
>>  }
>>  
>> -- 
>> 2.27.0
> 

Many thanks,
Nicola

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 3/4] scmi-cpufreq: Get opp_shared_cpus from opp-v2 for EM
  2021-01-12 11:20   ` Viresh Kumar
@ 2021-01-13 12:00     ` Nicola Mazzucato
  0 siblings, 0 replies; 13+ messages in thread
From: Nicola Mazzucato @ 2021-01-13 12:00 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: linux-kernel, linux-arm-kernel, linux-pm, sudeep.holla, rjw,
	vireshk, cristian.marussi, morten.rasmussen, chris.redpath

Hi Viresh, thanks for looking into this.
Please see below.

On 1/12/21 11:20 AM, Viresh Kumar wrote:
> On 11-01-21, 15:45, Nicola Mazzucato wrote:
>> By design, SCMI performance domains define the granularity of
>> performance controls, they do not describe any underlying hardware
>> dependencies (although they may match in many cases).
>>
>> It is therefore possible to have some platforms where hardware may have
>> the ability to control CPU performance at different granularity and choose
>> to describe fine-grained performance control through SCMI.
>>
>> In such situations, the energy model would be provided with inaccurate
>> information based on controls, while it still needs to know the
>> performance boundaries.
>>
>> To restore correct functionality, retrieve information of CPUs under the
>> same performance domain from operating-points-v2 in DT, and pass it on to
>> EM.
>>
>> Signed-off-by: Nicola Mazzucato <nicola.mazzucato@arm.com>
>> ---
>>  drivers/cpufreq/scmi-cpufreq.c | 22 ++++++++++++++++++++--
>>  1 file changed, 20 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/cpufreq/scmi-cpufreq.c b/drivers/cpufreq/scmi-cpufreq.c
>> index 4aa97cdc5997..ff6ba6fab58b 100644
>> --- a/drivers/cpufreq/scmi-cpufreq.c
>> +++ b/drivers/cpufreq/scmi-cpufreq.c
>> @@ -226,9 +226,12 @@ static int scmi_init_device(const struct scmi_handle *handle, int cpu)
>>  	struct em_data_callback em_cb = EM_DATA_CB(scmi_get_cpu_power);
>>  	bool power_scale_mw;
>>  	cpumask_var_t scmi_cpus;
>> +	cpumask_var_t opp_shared_cpus;
>>  
>>  	if (!zalloc_cpumask_var(&scmi_cpus, GFP_KERNEL))
>>  		return -ENOMEM;
>> +	if (!zalloc_cpumask_var(&opp_shared_cpus, GFP_KERNEL))
>> +		return -ENOMEM;
>>  
>>  	cpumask_set_cpu(cpu, scmi_cpus);
>>  
>> @@ -240,6 +243,20 @@ static int scmi_init_device(const struct scmi_handle *handle, int cpu)
>>  		goto free_cpumask;
>>  	}
>>  
>> +	/*
>> +	 * The OPP 'sharing cpus' info may come from dt through an empty opp
>> +	 * table and opp-shared. If found, it takes precedence over the SCMI
>> +	 * domain IDs info.
>> +	 */
>> +	ret = dev_pm_opp_of_get_sharing_cpus(cpu_dev, opp_shared_cpus);
> 
> If this succeeds, you shouldn't even try to call the other
> get_sharing_cpus variant.

IIUC you mean the above scmi_get_sharing_cpus() ?
It is actually required to do so, cause we need the info of SCMI domains,
regardless of the clock-sharing lines. When we have opp-sharing cpus we still
need to control the SCMI domains as usual.

> 
>> +	if (ret || !cpumask_weight(opp_shared_cpus)) {
>> +		/*
>> +		 * Either opp-table is not set or no opp-shared was found,
>> +		 * use the information from SCMI domain IDs.
>> +		 */
>> +		cpumask_copy(opp_shared_cpus, scmi_cpus);
>> +	}
>> +
>>  	/*
>>  	 * We get here for each CPU. Add OPPs only on those CPUs for which we
>>  	 * haven't already done so, or set their OPPs as shared.
>> @@ -252,7 +269,7 @@ static int scmi_init_device(const struct scmi_handle *handle, int cpu)
>>  			goto free_cpumask;
>>  		}
>>  
>> -		ret = dev_pm_opp_set_sharing_cpus(cpu_dev, scmi_cpus);
>> +		ret = dev_pm_opp_set_sharing_cpus(cpu_dev, opp_shared_cpus);
>>  		if (ret) {
>>  			dev_err(cpu_dev, "%s: failed to mark OPPs as shared: %d\n",
>>  				__func__, ret);
>> @@ -269,7 +286,7 @@ static int scmi_init_device(const struct scmi_handle *handle, int cpu)
>>  		}
>>  
>>  		power_scale_mw = handle->perf_ops->power_scale_mw_get(handle);
>> -		em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, scmi_cpus,
>> +		em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, opp_shared_cpus,
>>  					    power_scale_mw);
>>  	}
>>  
>> @@ -284,6 +301,7 @@ static int scmi_init_device(const struct scmi_handle *handle, int cpu)
>>  
>>  free_cpumask:
>>  	free_cpumask_var(scmi_cpus);
>> +	free_cpumask_var(opp_shared_cpus);
>>  	return ret;
>>  }
>>  
>> -- 
>> 2.27.0
> 

Many thanks,
Nicola

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 2/4] scmi-cpufreq: Move CPU initialisation to probe
  2021-01-13 11:55     ` Nicola Mazzucato
@ 2021-01-14  5:07       ` Viresh Kumar
  2021-01-14 13:37         ` Nicola Mazzucato
  0 siblings, 1 reply; 13+ messages in thread
From: Viresh Kumar @ 2021-01-14  5:07 UTC (permalink / raw)
  To: Nicola Mazzucato
  Cc: linux-kernel, linux-arm-kernel, linux-pm, sudeep.holla, rjw,
	vireshk, cristian.marussi, morten.rasmussen, chris.redpath

On 13-01-21, 11:55, Nicola Mazzucato wrote:
> On 1/12/21 11:17 AM, Viresh Kumar wrote:
> > This could have been done with a per-cpu variable instead.
> 
> sure, I can do a DEFINE_PER_CPU() for it if it makes it better.

If we don't go with the linked list approach, then yes.
 
> >> +	for_each_possible_cpu(cpu) {
> >> +		if (!zalloc_cpumask_var(&cpudata_table[cpu].scmi_shared_cpus,
> >> +					GFP_KERNEL))
> >> +			goto out;
> >> +	}
> > 
> > You are making a copy of the struct for each CPU and so for a 16 CPUs
> > sharing their clock lines, you will have 16 copies of the exact same
> > stuff.
> > 
> > An optimal approach would be to have a linked-list of this structure
> > and that will only have 1 node per cpufreq policy.
> 
> It is allocating space for the cpumask for each of the cpu. No data is copied yet.

Yes, I was talking about the whole design here.

> I understand the optimisation, but I don't see a linkage to cpufreq policy to be
> a good idea. This cpudata is for internal storage of scmi and opp-shared info
> and it is not tied to cpufreq policy.

Well, it is going to be the same information for all CPUs of a policy, isn't it
?

> We have moved all the cpu bits to probe
> and at this stage we have no knowledge of cpufreq polices.

Yes, but you are reading that information from scmi or DT (empty opp tables) and
so you know what the cpumasks are going to be set to. The linked list is the
right solution in my opinion, it is much more optimal.

> >> +static int scmi_init_device(const struct scmi_handle *handle, int cpu)
> >> +{
> >> +	struct device *cpu_dev;
> >> +	int ret, nr_opp;
> >> +	struct em_data_callback em_cb = EM_DATA_CB(scmi_get_cpu_power);
> >> +	bool power_scale_mw;
> >> +	cpumask_var_t scmi_cpus;
> >> +
> >> +	if (!zalloc_cpumask_var(&scmi_cpus, GFP_KERNEL))
> >> +		return -ENOMEM;
> >> +
> >> +	cpumask_set_cpu(cpu, scmi_cpus);
> >> +
> >> +	cpu_dev = get_cpu_device(cpu);
> >> +
> >> +	ret = scmi_get_sharing_cpus(cpu_dev, scmi_cpus);
> > 
> > Where do you expect the sharing information to come from in this case
> > ? DT ?
> 
> Coming from SCMI perf. The source of info has not changed.
> 
> > 
> >> +	if (ret) {
> >> +		dev_warn(cpu_dev, "failed to get sharing cpumask\n");
> >> +		goto free_cpumask;
> >> +	}
> >> +
> >> +	/*
> >> +	 * We get here for each CPU. Add OPPs only on those CPUs for which we
> >> +	 * haven't already done so, or set their OPPs as shared.
> >> +	 */
> >> +	nr_opp = dev_pm_opp_get_opp_count(cpu_dev);
> >> +	if (nr_opp <= 0) {
> >> +		ret = handle->perf_ops->device_opps_add(handle, cpu_dev);
> >> +		if (ret) {
> >> +			dev_warn(cpu_dev, "failed to add opps to the device\n");
> >> +			goto free_cpumask;
> >> +		}
> >> +
> >> +		ret = dev_pm_opp_set_sharing_cpus(cpu_dev, scmi_cpus);
> >> +		if (ret) {
> >> +			dev_err(cpu_dev, "%s: failed to mark OPPs as shared: %d\n",
> >> +				__func__, ret);
> >> +			goto free_cpumask;
> >> +		}
> >> +
> >> +		nr_opp = dev_pm_opp_get_opp_count(cpu_dev);
> > 
> > Shouldn't you do this just after adding the OPPs ?
> 
> This was suggested earlier. It was moved closer to em_registration to where the
> nr_opp is used. One way or the other as I don't have a strong preference for its
> place.

It is better to move it above as this will shorten the error path.

-- 
viresh

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 2/4] scmi-cpufreq: Move CPU initialisation to probe
  2021-01-14  5:07       ` Viresh Kumar
@ 2021-01-14 13:37         ` Nicola Mazzucato
  0 siblings, 0 replies; 13+ messages in thread
From: Nicola Mazzucato @ 2021-01-14 13:37 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: linux-kernel, linux-arm-kernel, linux-pm, sudeep.holla, rjw,
	vireshk, cristian.marussi, morten.rasmussen, chris.redpath

Hi Viresh,

many thanks for your suggestions. I will prepare a new version based on those.

Many thanks,
Nicola


On 1/14/21 5:07 AM, Viresh Kumar wrote:
> On 13-01-21, 11:55, Nicola Mazzucato wrote:
>> On 1/12/21 11:17 AM, Viresh Kumar wrote:
>>> This could have been done with a per-cpu variable instead.
>>
>> sure, I can do a DEFINE_PER_CPU() for it if it makes it better.
> 
> If we don't go with the linked list approach, then yes.
>  
>>>> +	for_each_possible_cpu(cpu) {
>>>> +		if (!zalloc_cpumask_var(&cpudata_table[cpu].scmi_shared_cpus,
>>>> +					GFP_KERNEL))
>>>> +			goto out;
>>>> +	}
>>>
>>> You are making a copy of the struct for each CPU and so for a 16 CPUs
>>> sharing their clock lines, you will have 16 copies of the exact same
>>> stuff.
>>>
>>> An optimal approach would be to have a linked-list of this structure
>>> and that will only have 1 node per cpufreq policy.
>>
>> It is allocating space for the cpumask for each of the cpu. No data is copied yet.
> 
> Yes, I was talking about the whole design here.
> 
>> I understand the optimisation, but I don't see a linkage to cpufreq policy to be
>> a good idea. This cpudata is for internal storage of scmi and opp-shared info
>> and it is not tied to cpufreq policy.
> 
> Well, it is going to be the same information for all CPUs of a policy, isn't it
> ?
> 
>> We have moved all the cpu bits to probe
>> and at this stage we have no knowledge of cpufreq polices.
> 
> Yes, but you are reading that information from scmi or DT (empty opp tables) and
> so you know what the cpumasks are going to be set to. The linked list is the
> right solution in my opinion, it is much more optimal.
> 
>>>> +static int scmi_init_device(const struct scmi_handle *handle, int cpu)
>>>> +{
>>>> +	struct device *cpu_dev;
>>>> +	int ret, nr_opp;
>>>> +	struct em_data_callback em_cb = EM_DATA_CB(scmi_get_cpu_power);
>>>> +	bool power_scale_mw;
>>>> +	cpumask_var_t scmi_cpus;
>>>> +
>>>> +	if (!zalloc_cpumask_var(&scmi_cpus, GFP_KERNEL))
>>>> +		return -ENOMEM;
>>>> +
>>>> +	cpumask_set_cpu(cpu, scmi_cpus);
>>>> +
>>>> +	cpu_dev = get_cpu_device(cpu);
>>>> +
>>>> +	ret = scmi_get_sharing_cpus(cpu_dev, scmi_cpus);
>>>
>>> Where do you expect the sharing information to come from in this case
>>> ? DT ?
>>
>> Coming from SCMI perf. The source of info has not changed.
>>
>>>
>>>> +	if (ret) {
>>>> +		dev_warn(cpu_dev, "failed to get sharing cpumask\n");
>>>> +		goto free_cpumask;
>>>> +	}
>>>> +
>>>> +	/*
>>>> +	 * We get here for each CPU. Add OPPs only on those CPUs for which we
>>>> +	 * haven't already done so, or set their OPPs as shared.
>>>> +	 */
>>>> +	nr_opp = dev_pm_opp_get_opp_count(cpu_dev);
>>>> +	if (nr_opp <= 0) {
>>>> +		ret = handle->perf_ops->device_opps_add(handle, cpu_dev);
>>>> +		if (ret) {
>>>> +			dev_warn(cpu_dev, "failed to add opps to the device\n");
>>>> +			goto free_cpumask;
>>>> +		}
>>>> +
>>>> +		ret = dev_pm_opp_set_sharing_cpus(cpu_dev, scmi_cpus);
>>>> +		if (ret) {
>>>> +			dev_err(cpu_dev, "%s: failed to mark OPPs as shared: %d\n",
>>>> +				__func__, ret);
>>>> +			goto free_cpumask;
>>>> +		}
>>>> +
>>>> +		nr_opp = dev_pm_opp_get_opp_count(cpu_dev);
>>>
>>> Shouldn't you do this just after adding the OPPs ?
>>
>> This was suggested earlier. It was moved closer to em_registration to where the
>> nr_opp is used. One way or the other as I don't have a strong preference for its
>> place.
> 
> It is better to move it above as this will shorten the error path.
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 2/4] scmi-cpufreq: Move CPU initialisation to probe
  2021-01-11 15:45 ` [PATCH v6 2/4] scmi-cpufreq: Move CPU initialisation to probe Nicola Mazzucato
  2021-01-12 11:17   ` Viresh Kumar
@ 2021-01-14 16:54   ` Cristian Marussi
  2021-01-30  9:43     ` Nicola Mazzucato
  1 sibling, 1 reply; 13+ messages in thread
From: Cristian Marussi @ 2021-01-14 16:54 UTC (permalink / raw)
  To: Nicola Mazzucato
  Cc: linux-kernel, linux-arm-kernel, linux-pm, sudeep.holla, rjw,
	vireshk, morten.rasmussen, chris.redpath

Hi Nicola,

a few remarks down below.

On Mon, Jan 11, 2021 at 03:45:22PM +0000, Nicola Mazzucato wrote:
> Some of the cpu related initialisation can be done at probe stage.
> This patch moves those initialisations from the ->init callback to the
> probe stage.
> 
> This is done in preparation for adding support to retrieve additional
> information from DT (CPUs sharing v/f lines).
> 
> Signed-off-by: Nicola Mazzucato <nicola.mazzucato@arm.com>
> ---
>  drivers/cpufreq/scmi-cpufreq.c | 180 ++++++++++++++++++++++++---------
>  1 file changed, 135 insertions(+), 45 deletions(-)
> 
> diff --git a/drivers/cpufreq/scmi-cpufreq.c b/drivers/cpufreq/scmi-cpufreq.c
> index 15b213ed78fa..4aa97cdc5997 100644
> --- a/drivers/cpufreq/scmi-cpufreq.c
> +++ b/drivers/cpufreq/scmi-cpufreq.c
> @@ -25,6 +25,14 @@ struct scmi_data {
>  	struct device *cpu_dev;
>  };
>  
> +/* Per-CPU storage for runtime management */
> +struct scmi_cpudata {
> +	cpumask_var_t scmi_shared_cpus;
> +	struct cpufreq_frequency_table *freq_table;
> +};
> +
> +static struct scmi_cpudata *cpudata_table;
> +
>  static const struct scmi_handle *handle;
>  
>  static unsigned int scmi_cpufreq_get_rate(unsigned int cpu)
> @@ -120,13 +128,10 @@ scmi_get_cpu_power(unsigned long *power, unsigned long *KHz,
>  
>  static int scmi_cpufreq_init(struct cpufreq_policy *policy)
>  {
> -	int ret, nr_opp;
> +	int ret;
>  	unsigned int latency;
>  	struct device *cpu_dev;
>  	struct scmi_data *priv;
> -	struct cpufreq_frequency_table *freq_table;
> -	struct em_data_callback em_cb = EM_DATA_CB(scmi_get_cpu_power);
> -	bool power_scale_mw;
>  
>  	cpu_dev = get_cpu_device(policy->cpu);
>  	if (!cpu_dev) {
> @@ -134,42 +139,19 @@ static int scmi_cpufreq_init(struct cpufreq_policy *policy)
>  		return -ENODEV;
>  	}
>  
> -	ret = handle->perf_ops->device_opps_add(handle, cpu_dev);
> -	if (ret) {
> -		dev_warn(cpu_dev, "failed to add opps to the device\n");
> -		return ret;
> -	}
> -
> -	ret = scmi_get_sharing_cpus(cpu_dev, policy->cpus);
> -	if (ret) {
> -		dev_warn(cpu_dev, "failed to get sharing cpumask\n");
> -		return ret;
> -	}
> -
> -	ret = dev_pm_opp_set_sharing_cpus(cpu_dev, policy->cpus);
> -	if (ret) {
> -		dev_err(cpu_dev, "%s: failed to mark OPPs as shared: %d\n",
> -			__func__, ret);
> -		return ret;
> -	}
> -
>  	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
>  	if (!priv) {
>  		ret = -ENOMEM;
>  		goto out_free_opp;
>  	}
>  
> -	ret = dev_pm_opp_init_cpufreq_table(cpu_dev, &freq_table);
> -	if (ret) {
> -		dev_err(cpu_dev, "failed to init cpufreq table: %d\n", ret);
> -		goto out_free_priv;
> -	}
> +	cpumask_copy(policy->cpus, cpudata_table[policy->cpu].scmi_shared_cpus);
>  
>  	priv->cpu_dev = cpu_dev;
>  	priv->domain_id = handle->perf_ops->device_domain_id(cpu_dev);
>  
>  	policy->driver_data = priv;
> -	policy->freq_table = freq_table;
> +	policy->freq_table = cpudata_table[policy->cpu].freq_table;
>  
>  	/* SCMI allows DVFS request for any domain from any CPU */
>  	policy->dvfs_possible_from_any_cpu = true;
> @@ -183,23 +165,8 @@ static int scmi_cpufreq_init(struct cpufreq_policy *policy)
>  	policy->fast_switch_possible =
>  		handle->perf_ops->fast_switch_possible(handle, cpu_dev);
>  
> -	nr_opp = dev_pm_opp_get_opp_count(cpu_dev);
> -	if (nr_opp <= 0) {
> -		dev_err(cpu_dev, "%s: No OPPs for this device: %d\n",
> -			__func__, ret);
> -
> -		ret = -ENODEV;
> -		goto out_free_priv;
> -	}
> -
> -	power_scale_mw = handle->perf_ops->power_scale_mw_get(handle);
> -	em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, policy->cpus,
> -				    power_scale_mw);
> -
>  	return 0;
>  
> -out_free_priv:
> -	kfree(priv);
>  out_free_opp:
>  	dev_pm_opp_remove_all_dynamic(cpu_dev);
>  

My understanding (but I could be wrong given my limited familiarity with
CPUFREQ ... so bear with me) is that dev_pm_opp_remove_all_dynamic() is
meant to clean dynamic OPPs added by dev_pm_opp_add() which in turn in
this driver is folded inside the handle->perf_ops->device_opps_add() call,
so is not that this call should be added also on the error path inside
the new scmi_init_device() ? (this was already faulty this way in the
original code to be honest...if faulty at all :D)

I added a few such invocations down below as rough untested example of
what I mean.

> @@ -210,7 +177,6 @@ static int scmi_cpufreq_exit(struct cpufreq_policy *policy)
>  {
>  	struct scmi_data *priv = policy->driver_data;
>  
> -	dev_pm_opp_free_cpufreq_table(priv->cpu_dev, &policy->freq_table);
>  	dev_pm_opp_remove_all_dynamic(priv->cpu_dev);
>  	kfree(priv);
>  
> @@ -231,10 +197,102 @@ static struct cpufreq_driver scmi_cpufreq_driver = {
>  	.exit	= scmi_cpufreq_exit,
>  };
>  
> +static int scmi_init_cpudata(void)
> +{
> +	int cpu;
> +	unsigned int ncpus = num_possible_cpus();
> +
> +	cpudata_table = kzalloc(sizeof(*cpudata_table) * ncpus, GFP_KERNEL);
Shouldn/t this be a kcalloc() given it's an array allocation, checkpatch
complains too.

> +	if (!cpudata_table)
> +		return -ENOMEM;
> +
> +	for_each_possible_cpu(cpu) {
> +		if (!zalloc_cpumask_var(&cpudata_table[cpu].scmi_shared_cpus,
> +					GFP_KERNEL))
> +			goto out;
> +	}
> +
> +	return 0;
> +
> +out:
> +	kfree(cpudata_table);
> +	return -ENOMEM;
> +}
> +
> +static int scmi_init_device(const struct scmi_handle *handle, int cpu)
> +{
> +	struct device *cpu_dev;
> +	int ret, nr_opp;
> +	struct em_data_callback em_cb = EM_DATA_CB(scmi_get_cpu_power);
> +	bool power_scale_mw;
> +	cpumask_var_t scmi_cpus;
> +
> +	if (!zalloc_cpumask_var(&scmi_cpus, GFP_KERNEL))
> +		return -ENOMEM;
> +
> +	cpumask_set_cpu(cpu, scmi_cpus);
> +
> +	cpu_dev = get_cpu_device(cpu);
> +
> +	ret = scmi_get_sharing_cpus(cpu_dev, scmi_cpus);
> +	if (ret) {
> +		dev_warn(cpu_dev, "failed to get sharing cpumask\n");
> +		goto free_cpumask;
> +	}
> +
> +	/*
> +	 * We get here for each CPU. Add OPPs only on those CPUs for which we
> +	 * haven't already done so, or set their OPPs as shared.
> +	 */
> +	nr_opp = dev_pm_opp_get_opp_count(cpu_dev);
> +	if (nr_opp <= 0) {
> +		ret = handle->perf_ops->device_opps_add(handle, cpu_dev);
> +		if (ret) {
> +			dev_warn(cpu_dev, "failed to add opps to the device\n");
> +			goto free_cpumask;
> +		}
> +
> +		ret = dev_pm_opp_set_sharing_cpus(cpu_dev, scmi_cpus);
> +		if (ret) {
> +			dev_err(cpu_dev, "%s: failed to mark OPPs as shared: %d\n",
> +				__func__, ret);
			goto free_dynamic_opps;
> +		}
> +
> +		nr_opp = dev_pm_opp_get_opp_count(cpu_dev);
> +		if (nr_opp <= 0) {
> +			dev_err(cpu_dev, "%s: No OPPs for this device: %d\n",
> +				__func__, ret);
> +
> +			ret = -ENODEV;
			goto free_dynamic_opps;
> +		}
> +
> +		power_scale_mw = handle->perf_ops->power_scale_mw_get(handle);
> +		em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, scmi_cpus,
> +					    power_scale_mw);
> +	}
> +
> +	ret = dev_pm_opp_init_cpufreq_table(cpu_dev,
> +					    &cpudata_table[cpu].freq_table);
> +	if (ret) {
> +		dev_err(cpu_dev, "failed to init cpufreq table: %d\n", ret);
		goto free_dynamic_opps;
> +	}
> +
> +	cpumask_copy(cpudata_table[cpu].scmi_shared_cpus, scmi_cpus);
> +
   free_dynamic_opps:
	   dev_pm_opp_remove_all_dynamic(cpu_dev);
> +free_cpumask:
> +	free_cpumask_var(scmi_cpus);
> +	return ret;
> +}
> +
>  static int scmi_cpufreq_probe(struct scmi_device *sdev)
>  {
>  	int ret;
>  	struct device *dev = &sdev->dev;
> +	int cpu;
> +	struct device *cpu_dev;
>  
>  	handle = sdev->handle;
>  
> @@ -247,6 +305,24 @@ static int scmi_cpufreq_probe(struct scmi_device *sdev)
>  		devm_of_clk_add_hw_provider(dev, of_clk_hw_simple_get, NULL);
>  #endif
>  
> +	ret = scmi_init_cpudata();
> +	if (ret) {
> +		pr_err("%s: init cpu data failed\n", __func__);
> +		return ret;
> +	}
> +
> +	for_each_possible_cpu(cpu) {
> +		cpu_dev = get_cpu_device(cpu);
> +
> +		ret = scmi_init_device(handle, cpu);
> +		if (ret) {
> +			dev_err(cpu_dev, "%s: init device failed\n",
> +				__func__);
> +
			goto clean;
> +		}
> +	}
> +
>  	ret = cpufreq_register_driver(&scmi_cpufreq_driver);
>  	if (ret) {
>  		dev_err(dev, "%s: registering cpufreq failed, err: %d\n",

	/* clean any dynamic OPPs already set */
clean:
	for_each_possible_cpu(cpu) {
		cpu_dev = get_cpu_device(cpu);

		dev_pm_opp_remove_all_dynamic(cpu_dev);
	}

	return ret;
}


Thanks

Cristian

> @@ -258,6 +334,20 @@ static int scmi_cpufreq_probe(struct scmi_device *sdev)
>  
>  static void scmi_cpufreq_remove(struct scmi_device *sdev)
>  {
> +	int cpu;
> +	struct device *cpu_dev;
> +
> +	for_each_possible_cpu(cpu) {
> +		cpu_dev = get_cpu_device(cpu);
> +
> +		dev_pm_opp_free_cpufreq_table(cpu_dev,
> +					      &cpudata_table[cpu].freq_table);
> +
> +		free_cpumask_var(cpudata_table[cpu].scmi_shared_cpus);
> +	}
> +
> +	kfree(cpudata_table);
> +
>  	cpufreq_unregister_driver(&scmi_cpufreq_driver);
>  }
>  
> -- 
> 2.27.0
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 2/4] scmi-cpufreq: Move CPU initialisation to probe
  2021-01-14 16:54   ` Cristian Marussi
@ 2021-01-30  9:43     ` Nicola Mazzucato
  0 siblings, 0 replies; 13+ messages in thread
From: Nicola Mazzucato @ 2021-01-30  9:43 UTC (permalink / raw)
  To: Cristian Marussi
  Cc: linux-kernel, linux-arm-kernel, linux-pm, sudeep.holla, rjw,
	vireshk, morten.rasmussen, chris.redpath

Hi Cristian,

sorry for my late reply.
Thanks for looking into this.

I am preparing a v7 with suggestions proposed by Viresh which, hopefully, should
remove some unclear parts and resolve your comments.
I had left behind some dealloc, so thanks for spotting!

Many thanks,
Nicola

On 1/14/21 4:54 PM, Cristian Marussi wrote:
> Hi Nicola,
> 
> a few remarks down below.
> 
> On Mon, Jan 11, 2021 at 03:45:22PM +0000, Nicola Mazzucato wrote:
>> Some of the cpu related initialisation can be done at probe stage.
>> This patch moves those initialisations from the ->init callback to the
>> probe stage.
>>
>> This is done in preparation for adding support to retrieve additional
>> information from DT (CPUs sharing v/f lines).
>>
>> Signed-off-by: Nicola Mazzucato <nicola.mazzucato@arm.com>
>> ---
>>  drivers/cpufreq/scmi-cpufreq.c | 180 ++++++++++++++++++++++++---------
>>  1 file changed, 135 insertions(+), 45 deletions(-)
>>
>> diff --git a/drivers/cpufreq/scmi-cpufreq.c b/drivers/cpufreq/scmi-cpufreq.c
>> index 15b213ed78fa..4aa97cdc5997 100644
>> --- a/drivers/cpufreq/scmi-cpufreq.c
>> +++ b/drivers/cpufreq/scmi-cpufreq.c
>> @@ -25,6 +25,14 @@ struct scmi_data {
>>  	struct device *cpu_dev;
>>  };
>>  
>> +/* Per-CPU storage for runtime management */
>> +struct scmi_cpudata {
>> +	cpumask_var_t scmi_shared_cpus;
>> +	struct cpufreq_frequency_table *freq_table;
>> +};
>> +
>> +static struct scmi_cpudata *cpudata_table;
>> +
>>  static const struct scmi_handle *handle;
>>  
>>  static unsigned int scmi_cpufreq_get_rate(unsigned int cpu)
>> @@ -120,13 +128,10 @@ scmi_get_cpu_power(unsigned long *power, unsigned long *KHz,
>>  
>>  static int scmi_cpufreq_init(struct cpufreq_policy *policy)
>>  {
>> -	int ret, nr_opp;
>> +	int ret;
>>  	unsigned int latency;
>>  	struct device *cpu_dev;
>>  	struct scmi_data *priv;
>> -	struct cpufreq_frequency_table *freq_table;
>> -	struct em_data_callback em_cb = EM_DATA_CB(scmi_get_cpu_power);
>> -	bool power_scale_mw;
>>  
>>  	cpu_dev = get_cpu_device(policy->cpu);
>>  	if (!cpu_dev) {
>> @@ -134,42 +139,19 @@ static int scmi_cpufreq_init(struct cpufreq_policy *policy)
>>  		return -ENODEV;
>>  	}
>>  
>> -	ret = handle->perf_ops->device_opps_add(handle, cpu_dev);
>> -	if (ret) {
>> -		dev_warn(cpu_dev, "failed to add opps to the device\n");
>> -		return ret;
>> -	}
>> -
>> -	ret = scmi_get_sharing_cpus(cpu_dev, policy->cpus);
>> -	if (ret) {
>> -		dev_warn(cpu_dev, "failed to get sharing cpumask\n");
>> -		return ret;
>> -	}
>> -
>> -	ret = dev_pm_opp_set_sharing_cpus(cpu_dev, policy->cpus);
>> -	if (ret) {
>> -		dev_err(cpu_dev, "%s: failed to mark OPPs as shared: %d\n",
>> -			__func__, ret);
>> -		return ret;
>> -	}
>> -
>>  	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
>>  	if (!priv) {
>>  		ret = -ENOMEM;
>>  		goto out_free_opp;
>>  	}
>>  
>> -	ret = dev_pm_opp_init_cpufreq_table(cpu_dev, &freq_table);
>> -	if (ret) {
>> -		dev_err(cpu_dev, "failed to init cpufreq table: %d\n", ret);
>> -		goto out_free_priv;
>> -	}
>> +	cpumask_copy(policy->cpus, cpudata_table[policy->cpu].scmi_shared_cpus);
>>  
>>  	priv->cpu_dev = cpu_dev;
>>  	priv->domain_id = handle->perf_ops->device_domain_id(cpu_dev);
>>  
>>  	policy->driver_data = priv;
>> -	policy->freq_table = freq_table;
>> +	policy->freq_table = cpudata_table[policy->cpu].freq_table;
>>  
>>  	/* SCMI allows DVFS request for any domain from any CPU */
>>  	policy->dvfs_possible_from_any_cpu = true;
>> @@ -183,23 +165,8 @@ static int scmi_cpufreq_init(struct cpufreq_policy *policy)
>>  	policy->fast_switch_possible =
>>  		handle->perf_ops->fast_switch_possible(handle, cpu_dev);
>>  
>> -	nr_opp = dev_pm_opp_get_opp_count(cpu_dev);
>> -	if (nr_opp <= 0) {
>> -		dev_err(cpu_dev, "%s: No OPPs for this device: %d\n",
>> -			__func__, ret);
>> -
>> -		ret = -ENODEV;
>> -		goto out_free_priv;
>> -	}
>> -
>> -	power_scale_mw = handle->perf_ops->power_scale_mw_get(handle);
>> -	em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, policy->cpus,
>> -				    power_scale_mw);
>> -
>>  	return 0;
>>  
>> -out_free_priv:
>> -	kfree(priv);
>>  out_free_opp:
>>  	dev_pm_opp_remove_all_dynamic(cpu_dev);
>>  
> 
> My understanding (but I could be wrong given my limited familiarity with
> CPUFREQ ... so bear with me) is that dev_pm_opp_remove_all_dynamic() is
> meant to clean dynamic OPPs added by dev_pm_opp_add() which in turn in
> this driver is folded inside the handle->perf_ops->device_opps_add() call,
> so is not that this call should be added also on the error path inside
> the new scmi_init_device() ? (this was already faulty this way in the
> original code to be honest...if faulty at all :D)
> 
> I added a few such invocations down below as rough untested example of
> what I mean.
> 
>> @@ -210,7 +177,6 @@ static int scmi_cpufreq_exit(struct cpufreq_policy *policy)
>>  {
>>  	struct scmi_data *priv = policy->driver_data;
>>  
>> -	dev_pm_opp_free_cpufreq_table(priv->cpu_dev, &policy->freq_table);
>>  	dev_pm_opp_remove_all_dynamic(priv->cpu_dev);
>>  	kfree(priv);
>>  
>> @@ -231,10 +197,102 @@ static struct cpufreq_driver scmi_cpufreq_driver = {
>>  	.exit	= scmi_cpufreq_exit,
>>  };
>>  
>> +static int scmi_init_cpudata(void)
>> +{
>> +	int cpu;
>> +	unsigned int ncpus = num_possible_cpus();
>> +
>> +	cpudata_table = kzalloc(sizeof(*cpudata_table) * ncpus, GFP_KERNEL);
> Shouldn/t this be a kcalloc() given it's an array allocation, checkpatch
> complains too.
> 
>> +	if (!cpudata_table)
>> +		return -ENOMEM;
>> +
>> +	for_each_possible_cpu(cpu) {
>> +		if (!zalloc_cpumask_var(&cpudata_table[cpu].scmi_shared_cpus,
>> +					GFP_KERNEL))
>> +			goto out;
>> +	}
>> +
>> +	return 0;
>> +
>> +out:
>> +	kfree(cpudata_table);
>> +	return -ENOMEM;
>> +}
>> +
>> +static int scmi_init_device(const struct scmi_handle *handle, int cpu)
>> +{
>> +	struct device *cpu_dev;
>> +	int ret, nr_opp;
>> +	struct em_data_callback em_cb = EM_DATA_CB(scmi_get_cpu_power);
>> +	bool power_scale_mw;
>> +	cpumask_var_t scmi_cpus;
>> +
>> +	if (!zalloc_cpumask_var(&scmi_cpus, GFP_KERNEL))
>> +		return -ENOMEM;
>> +
>> +	cpumask_set_cpu(cpu, scmi_cpus);
>> +
>> +	cpu_dev = get_cpu_device(cpu);
>> +
>> +	ret = scmi_get_sharing_cpus(cpu_dev, scmi_cpus);
>> +	if (ret) {
>> +		dev_warn(cpu_dev, "failed to get sharing cpumask\n");
>> +		goto free_cpumask;
>> +	}
>> +
>> +	/*
>> +	 * We get here for each CPU. Add OPPs only on those CPUs for which we
>> +	 * haven't already done so, or set their OPPs as shared.
>> +	 */
>> +	nr_opp = dev_pm_opp_get_opp_count(cpu_dev);
>> +	if (nr_opp <= 0) {
>> +		ret = handle->perf_ops->device_opps_add(handle, cpu_dev);
>> +		if (ret) {
>> +			dev_warn(cpu_dev, "failed to add opps to the device\n");
>> +			goto free_cpumask;
>> +		}
>> +
>> +		ret = dev_pm_opp_set_sharing_cpus(cpu_dev, scmi_cpus);
>> +		if (ret) {
>> +			dev_err(cpu_dev, "%s: failed to mark OPPs as shared: %d\n",
>> +				__func__, ret);
> 			goto free_dynamic_opps;
>> +		}
>> +
>> +		nr_opp = dev_pm_opp_get_opp_count(cpu_dev);
>> +		if (nr_opp <= 0) {
>> +			dev_err(cpu_dev, "%s: No OPPs for this device: %d\n",
>> +				__func__, ret);
>> +
>> +			ret = -ENODEV;
> 			goto free_dynamic_opps;
>> +		}
>> +
>> +		power_scale_mw = handle->perf_ops->power_scale_mw_get(handle);
>> +		em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, scmi_cpus,
>> +					    power_scale_mw);
>> +	}
>> +
>> +	ret = dev_pm_opp_init_cpufreq_table(cpu_dev,
>> +					    &cpudata_table[cpu].freq_table);
>> +	if (ret) {
>> +		dev_err(cpu_dev, "failed to init cpufreq table: %d\n", ret);
> 		goto free_dynamic_opps;
>> +	}
>> +
>> +	cpumask_copy(cpudata_table[cpu].scmi_shared_cpus, scmi_cpus);
>> +
>    free_dynamic_opps:
> 	   dev_pm_opp_remove_all_dynamic(cpu_dev);
>> +free_cpumask:
>> +	free_cpumask_var(scmi_cpus);
>> +	return ret;
>> +}
>> +
>>  static int scmi_cpufreq_probe(struct scmi_device *sdev)
>>  {
>>  	int ret;
>>  	struct device *dev = &sdev->dev;
>> +	int cpu;
>> +	struct device *cpu_dev;
>>  
>>  	handle = sdev->handle;
>>  
>> @@ -247,6 +305,24 @@ static int scmi_cpufreq_probe(struct scmi_device *sdev)
>>  		devm_of_clk_add_hw_provider(dev, of_clk_hw_simple_get, NULL);
>>  #endif
>>  
>> +	ret = scmi_init_cpudata();
>> +	if (ret) {
>> +		pr_err("%s: init cpu data failed\n", __func__);
>> +		return ret;
>> +	}
>> +
>> +	for_each_possible_cpu(cpu) {
>> +		cpu_dev = get_cpu_device(cpu);
>> +
>> +		ret = scmi_init_device(handle, cpu);
>> +		if (ret) {
>> +			dev_err(cpu_dev, "%s: init device failed\n",
>> +				__func__);
>> +
> 			goto clean;
>> +		}
>> +	}
>> +
>>  	ret = cpufreq_register_driver(&scmi_cpufreq_driver);
>>  	if (ret) {
>>  		dev_err(dev, "%s: registering cpufreq failed, err: %d\n",
> 
> 	/* clean any dynamic OPPs already set */
> clean:
> 	for_each_possible_cpu(cpu) {
> 		cpu_dev = get_cpu_device(cpu);
> 
> 		dev_pm_opp_remove_all_dynamic(cpu_dev);
> 	}
> 
> 	return ret;
> }
> 
> 
> Thanks
> 
> Cristian
> 
>> @@ -258,6 +334,20 @@ static int scmi_cpufreq_probe(struct scmi_device *sdev)
>>  
>>  static void scmi_cpufreq_remove(struct scmi_device *sdev)
>>  {
>> +	int cpu;
>> +	struct device *cpu_dev;
>> +
>> +	for_each_possible_cpu(cpu) {
>> +		cpu_dev = get_cpu_device(cpu);
>> +
>> +		dev_pm_opp_free_cpufreq_table(cpu_dev,
>> +					      &cpudata_table[cpu].freq_table);
>> +
>> +		free_cpumask_var(cpudata_table[cpu].scmi_shared_cpus);
>> +	}
>> +
>> +	kfree(cpudata_table);
>> +
>>  	cpufreq_unregister_driver(&scmi_cpufreq_driver);
>>  }
>>  
>> -- 
>> 2.27.0
>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-01-30  9:41 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-11 15:45 [PATCH v6 0/4] CPUFreq: Add support for opp-sharing cpus Nicola Mazzucato
2021-01-11 15:45 ` [PATCH v6 1/4] scmi-cpufreq: Remove deferred probe Nicola Mazzucato
2021-01-11 15:45 ` [PATCH v6 2/4] scmi-cpufreq: Move CPU initialisation to probe Nicola Mazzucato
2021-01-12 11:17   ` Viresh Kumar
2021-01-13 11:55     ` Nicola Mazzucato
2021-01-14  5:07       ` Viresh Kumar
2021-01-14 13:37         ` Nicola Mazzucato
2021-01-14 16:54   ` Cristian Marussi
2021-01-30  9:43     ` Nicola Mazzucato
2021-01-11 15:45 ` [PATCH v6 3/4] scmi-cpufreq: Get opp_shared_cpus from opp-v2 for EM Nicola Mazzucato
2021-01-12 11:20   ` Viresh Kumar
2021-01-13 12:00     ` Nicola Mazzucato
2021-01-11 15:45 ` [PATCH v6 4/4] cpufreq: blacklist Arm Vexpress platforms in cpufreq-dt-platdev Nicola Mazzucato

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).