LKML Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH RFC 0/9] Add CPU based scaling support to Passive governor
@ 2019-03-28 15:28 Sibi Sankar
  2019-03-28 15:28 ` [PATCH RFC 1/9] OPP: Add and export helpers to get avg/peak bw Sibi Sankar
                   ` (9 more replies)
  0 siblings, 10 replies; 21+ messages in thread
From: Sibi Sankar @ 2019-03-28 15:28 UTC (permalink / raw)
  To: robh+dt, andy.gross, myungjoo.ham, kyungmin.park, rjw,
	viresh.kumar, nm, sboyd, georgi.djakov
  Cc: bjorn.andersson, david.brown, mark.rutland, linux-kernel,
	linux-arm-msm-owner, devicetree, rnayak, cw00.choi, linux-pm,
	evgreen, daidavid1, dianders, Sibi Sankar

This RFC series aims to add cpu based scaling support to the passive
governor and scale DDR with a generic interconnect bandwidth based
devfreq driver on SDM845 SoC. This series achieves similar functionality
to Georgi's Patch series (https://patchwork.kernel.org/cover/10850817/)
and can be used with MSM8916/MSM8996/QCS404 SoCs.

[patches 1,6 - Add and export export helpers to get avg/peak bandwidth and
 update voltage of an disabled opp respectively]

[patch 3 - Adds cpu based scaling support to passive governor]
To achieve this, it listens to CPU frequency transition notifiers
to keep itself up to date on the current CPU frequency.
To decide the frequency of the device, the governor depends one of
the following:
* Constructs a CPU frequency to device frequency mapping table from
  required-opps property of the devfreq device's opp_table
* Scales the device frequency in proportion to the CPU frequency by
  performing interpolation.

[patch 7 - Parses and updates opps from the frequency/voltage read from
 the look up tables]

The patch series depends on opp-bw-MBs bindings introduced in:
https://patchwork.kernel.org/cover/10850817/

Saravana Kannan (2):
  PM / devfreq: Add cpu based scaling support to passive_governor
  PM / devfreq: Add devfreq driver for interconnect bandwidth voting

Sibi Sankar (7):
  OPP: Add and export helpers to get avg/peak bw
  OPP: Export a number of helpers to prevent code duplication
  dt-bindings: devfreq: Add bindings for devfreq dev-icbw driver
  OPP: Add and export helper to update voltage
  cpufreq: qcom: Add support to update cpu node's OPP tables
  arm64: dts: qcom: sdm845: Add cpu OPP tables
  arm64: dts: qcom: sdm845: Add nodes for icbw driver and opp tables

 .../devicetree/bindings/devfreq/icbw.txt      | 146 +++++++++
 arch/arm64/boot/dts/qcom/sdm845.dtsi          | 262 +++++++++++++++++
 drivers/cpufreq/qcom-cpufreq-hw.c             |  29 +-
 drivers/devfreq/Kconfig                       |  19 ++
 drivers/devfreq/Makefile                      |   1 +
 drivers/devfreq/devfreq_icbw.c                | 132 +++++++++
 drivers/devfreq/governor_passive.c            | 276 +++++++++++++++++-
 drivers/opp/core.c                            | 100 +++++++
 drivers/opp/of.c                              |  13 +-
 include/linux/devfreq.h                       |  43 ++-
 include/linux/pm_opp.h                        |  35 +++
 11 files changed, 1044 insertions(+), 12 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/devfreq/icbw.txt
 create mode 100644 drivers/devfreq/devfreq_icbw.c

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH RFC 1/9] OPP: Add and export helpers to get avg/peak bw
  2019-03-28 15:28 [PATCH RFC 0/9] Add CPU based scaling support to Passive governor Sibi Sankar
@ 2019-03-28 15:28 ` Sibi Sankar
  2019-03-28 15:28 ` [PATCH RFC 2/9] OPP: Export a number of helpers to prevent code duplication Sibi Sankar
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Sibi Sankar @ 2019-03-28 15:28 UTC (permalink / raw)
  To: robh+dt, andy.gross, myungjoo.ham, kyungmin.park, rjw,
	viresh.kumar, nm, sboyd, georgi.djakov
  Cc: bjorn.andersson, david.brown, mark.rutland, linux-kernel,
	linux-arm-msm-owner, devicetree, rnayak, cw00.choi, linux-pm,
	evgreen, daidavid1, dianders, Sibi Sankar

Add and export helpers 'dev_pm_opp_get_avg_bw()' and
'dev_pm_opp_get_peak_bw()' that can be used to get the
average and peak bandwidth values read from device tree
when present.

Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
---
 drivers/opp/core.c     | 38 ++++++++++++++++++++++++++++++++++++++
 include/linux/pm_opp.h | 12 ++++++++++++
 2 files changed, 50 insertions(+)

diff --git a/drivers/opp/core.c b/drivers/opp/core.c
index 4fcc2b9259c5..addaf7aae9ae 100644
--- a/drivers/opp/core.c
+++ b/drivers/opp/core.c
@@ -149,6 +149,44 @@ unsigned int dev_pm_opp_get_level(struct dev_pm_opp *opp)
 }
 EXPORT_SYMBOL_GPL(dev_pm_opp_get_level);
 
+/**
+ * dev_pm_opp_get_avg_bw() - Gets the average bandwidth corresponding to an
+ * available opp
+ * @opp:	opp for which average bandwidth has to be returned for
+ *
+ * Return: average bandwidth read from device tree corresponding to the
+ * opp, else return 0.
+ */
+unsigned int dev_pm_opp_get_avg_bw(struct dev_pm_opp *opp)
+{
+	if (IS_ERR_OR_NULL(opp) || !opp->available) {
+		pr_err("%s: Invalid parameters\n", __func__);
+		return 0;
+	}
+
+	return opp->bandwidth->avg;
+}
+EXPORT_SYMBOL_GPL(dev_pm_opp_get_avg_bw);
+
+/**
+ * dev_pm_opp_get_peak_bw() - Gets the peak bandwidth corresponding to an
+ * available opp
+ * @opp:	opp for which peak bandwidth has to be returned for
+ *
+ * Return: peak bandwidth read from device tree corresponding to the
+ * opp, else return 0.
+ */
+unsigned int dev_pm_opp_get_peak_bw(struct dev_pm_opp *opp)
+{
+	if (IS_ERR_OR_NULL(opp) || !opp->available) {
+		pr_err("%s: Invalid parameters\n", __func__);
+		return 0;
+	}
+
+	return opp->bandwidth->peak;
+}
+EXPORT_SYMBOL_GPL(dev_pm_opp_get_peak_bw);
+
 /**
  * dev_pm_opp_is_turbo() - Returns if opp is turbo OPP or not
  * @opp: opp for which turbo mode is being verified
diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h
index 80a49d1fa9a8..82ff8e2e1ff7 100644
--- a/include/linux/pm_opp.h
+++ b/include/linux/pm_opp.h
@@ -99,6 +99,8 @@ unsigned long dev_pm_opp_get_voltage(struct dev_pm_opp *opp);
 unsigned long dev_pm_opp_get_freq(struct dev_pm_opp *opp);
 
 unsigned int dev_pm_opp_get_level(struct dev_pm_opp *opp);
+unsigned int dev_pm_opp_get_avg_bw(struct dev_pm_opp *opp);
+unsigned int dev_pm_opp_get_peak_bw(struct dev_pm_opp *opp);
 
 bool dev_pm_opp_is_turbo(struct dev_pm_opp *opp);
 
@@ -179,6 +181,16 @@ static inline unsigned int dev_pm_opp_get_level(struct dev_pm_opp *opp)
 	return 0;
 }
 
+static inline unsigned int dev_pm_opp_get_avg_bw(struct dev_pm_opp *opp)
+{
+	return 0;
+}
+
+static inline unsigned int dev_pm_opp_get_peak_bw(struct dev_pm_opp *opp)
+{
+	return 0;
+}
+
 static inline bool dev_pm_opp_is_turbo(struct dev_pm_opp *opp)
 {
 	return false;
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH RFC 2/9] OPP: Export a number of helpers to prevent code duplication
  2019-03-28 15:28 [PATCH RFC 0/9] Add CPU based scaling support to Passive governor Sibi Sankar
  2019-03-28 15:28 ` [PATCH RFC 1/9] OPP: Add and export helpers to get avg/peak bw Sibi Sankar
@ 2019-03-28 15:28 ` Sibi Sankar
  2019-07-08  3:28   ` Hsin-Yi Wang
  2019-03-28 15:28 ` [PATCH RFC 3/9] PM / devfreq: Add cpu based scaling support to passive_governor Sibi Sankar
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 21+ messages in thread
From: Sibi Sankar @ 2019-03-28 15:28 UTC (permalink / raw)
  To: robh+dt, andy.gross, myungjoo.ham, kyungmin.park, rjw,
	viresh.kumar, nm, sboyd, georgi.djakov
  Cc: bjorn.andersson, david.brown, mark.rutland, linux-kernel,
	linux-arm-msm-owner, devicetree, rnayak, cw00.choi, linux-pm,
	evgreen, daidavid1, dianders, Sibi Sankar

Export 'dev_pm_opp_find_opp_of_np' and 'of_parse_required_nodes'
as it will be used by passive governor to parse and auto-populate
mapping specified using the required-opps property.

Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
---
 drivers/opp/of.c       | 13 +++++++++++--
 include/linux/pm_opp.h | 13 +++++++++++++
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/opp/of.c b/drivers/opp/of.c
index 539f3d013a59..d9d8875eca05 100644
--- a/drivers/opp/of.c
+++ b/drivers/opp/of.c
@@ -98,8 +98,8 @@ static struct dev_pm_opp *_find_opp_of_np(struct opp_table *opp_table,
 	return NULL;
 }
 
-static struct device_node *of_parse_required_opp(struct device_node *np,
-						 int index)
+struct device_node *of_parse_required_opp(struct device_node *np,
+					  int index)
 {
 	struct device_node *required_np;
 
@@ -111,6 +111,15 @@ static struct device_node *of_parse_required_opp(struct device_node *np,
 
 	return required_np;
 }
+EXPORT_SYMBOL_GPL(of_parse_required_opp);
+
+/* The caller must call dev_pm_opp_put() after the OPP is used */
+struct dev_pm_opp *dev_pm_opp_find_opp_of_np(struct opp_table *opp_table,
+					     struct device_node *opp_np)
+{
+	return _find_opp_of_np(opp_table, opp_np);
+}
+EXPORT_SYMBOL_GPL(dev_pm_opp_find_opp_of_np);
 
 /* The caller must call dev_pm_opp_put_opp_table() after the table is used */
 static struct opp_table *_find_table_of_opp_np(struct device_node *opp_np)
diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h
index 82ff8e2e1ff7..d7cb0e65c4f0 100644
--- a/include/linux/pm_opp.h
+++ b/include/linux/pm_opp.h
@@ -359,8 +359,11 @@ void dev_pm_opp_of_cpumask_remove_table(const struct cpumask *cpumask);
 int dev_pm_opp_of_get_sharing_cpus(struct device *cpu_dev, struct cpumask *cpumask);
 struct device_node *dev_pm_opp_of_get_opp_desc_node(struct device *dev);
 struct device_node *dev_pm_opp_get_of_node(struct dev_pm_opp *opp);
+struct device_node *of_parse_required_opp(struct device_node *np, int index);
 int of_get_required_opp_performance_state(struct device_node *np, int index);
 void dev_pm_opp_of_register_em(struct cpumask *cpus);
+struct dev_pm_opp *dev_pm_opp_find_opp_of_np(struct opp_table *opp_table,
+					     struct device_node *opp_np);
 #else
 static inline int dev_pm_opp_of_add_table(struct device *dev)
 {
@@ -390,6 +393,11 @@ static inline int dev_pm_opp_of_get_sharing_cpus(struct device *cpu_dev, struct
 	return -ENOTSUPP;
 }
 
+static inline struct dev_pm_opp *dev_pm_opp_find_opp_of_np(struct opp_table *opp_table, struct device_node *opp_np)
+{
+	return ERR_PTR(-ENOTSUPP);
+}
+
 static inline struct device_node *dev_pm_opp_of_get_opp_desc_node(struct device *dev)
 {
 	return NULL;
@@ -408,6 +416,11 @@ static inline int of_get_required_opp_performance_state(struct device_node *np,
 {
 	return -ENOTSUPP;
 }
+
+static inline struct device_node *of_parse_required_opp(struct device_node *np, int index)
+{
+	return NULL;
+}
 #endif
 
 #endif		/* __LINUX_OPP_H__ */
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH RFC 3/9] PM / devfreq: Add cpu based scaling support to passive_governor
  2019-03-28 15:28 [PATCH RFC 0/9] Add CPU based scaling support to Passive governor Sibi Sankar
  2019-03-28 15:28 ` [PATCH RFC 1/9] OPP: Add and export helpers to get avg/peak bw Sibi Sankar
  2019-03-28 15:28 ` [PATCH RFC 2/9] OPP: Export a number of helpers to prevent code duplication Sibi Sankar
@ 2019-03-28 15:28 ` Sibi Sankar
  2019-04-12  7:39   ` Chanwoo Choi
  2019-03-28 15:28 ` [PATCH RFC 4/9] dt-bindings: devfreq: Add bindings for devfreq dev-icbw driver Sibi Sankar
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 21+ messages in thread
From: Sibi Sankar @ 2019-03-28 15:28 UTC (permalink / raw)
  To: robh+dt, andy.gross, myungjoo.ham, kyungmin.park, rjw,
	viresh.kumar, nm, sboyd, georgi.djakov
  Cc: bjorn.andersson, david.brown, mark.rutland, linux-kernel,
	linux-arm-msm-owner, devicetree, rnayak, cw00.choi, linux-pm,
	evgreen, daidavid1, dianders, Saravana Kannan, Sibi Sankar

From: Saravana Kannan <skannan@codeaurora.org>

Many CPU architectures have caches that can scale independent of the
CPUs. Frequency scaling of the caches is necessary to make sure the cache
is not a performance bottleneck that leads to poor performance and
power. The same idea applies for RAM/DDR.

To achieve this, this patch add support for cpu based scaling to the
passive governor. This is accomplished by taking the current frequency
of each CPU frequency domain and then adjusts the frequency of the cache
(or any devfreq device) based on the frequency of the CPUs. It listens
to CPU frequency transition notifiers to keep itself up to date on the
current CPU frequency.

To decide the frequency of the device, the governor does one of the
following:
* Constructs a CPU frequency to device frequency mapping table from
  required-opps property of the devfreq device's opp_table

* Scales the device frequency in proportion to the CPU frequency. So, if
  the CPUs are running at their max frequency, the device runs at its
  max frequency. If the CPUs are running at their min frequency, the
  device runs at its min frequency. It is interpolated for frequencies
  in between.

Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
[Sibi: Integrated cpu-freqmap governor into passive_governor]
Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
---
 drivers/devfreq/Kconfig            |   4 +
 drivers/devfreq/governor_passive.c | 276 ++++++++++++++++++++++++++++-
 include/linux/devfreq.h            |  43 ++++-
 3 files changed, 315 insertions(+), 8 deletions(-)

diff --git a/drivers/devfreq/Kconfig b/drivers/devfreq/Kconfig
index 6a172d338f6d..9a45f464a56b 100644
--- a/drivers/devfreq/Kconfig
+++ b/drivers/devfreq/Kconfig
@@ -72,6 +72,10 @@ config DEVFREQ_GOV_PASSIVE
 	  device. This governor does not change the frequency by itself
 	  through sysfs entries. The passive governor recommends that
 	  devfreq device uses the OPP table to get the frequency/voltage.
+	  Alternatively the governor can also be chosen to scale based on
+	  the online CPUs current frequency. A CPU frequency to device
+	  frequency mapping table(s) is auto-populated by the governor
+	  for this purpose.
 
 comment "DEVFREQ Drivers"
 
diff --git a/drivers/devfreq/governor_passive.c b/drivers/devfreq/governor_passive.c
index 3bc29acbd54e..2506682b233b 100644
--- a/drivers/devfreq/governor_passive.c
+++ b/drivers/devfreq/governor_passive.c
@@ -11,10 +11,63 @@
  */
 
 #include <linux/module.h>
+#include <linux/cpu.h>
+#include <linux/cpufreq.h>
+#include <linux/cpumask.h>
 #include <linux/device.h>
 #include <linux/devfreq.h>
+#include <linux/of.h>
+#include <linux/slab.h>
 #include "governor.h"
 
+static unsigned int xlate_cpufreq_to_devfreq(struct devfreq_passive_data *data,
+					     unsigned int cpu)
+{
+	unsigned int cpu_min, cpu_max;
+	struct devfreq *devfreq = (struct devfreq *)data->this;
+	unsigned int dev_min, dev_max, cpu_percent, cpu_freq = 0, freq = 0;
+	unsigned long *freq_table = devfreq->profile->freq_table;
+	struct device *dev = devfreq->dev.parent;
+	struct devfreq_map *map;
+	int opp_cnt, i;
+
+	if (!data->state[cpu] || data->state[cpu]->first_cpu != cpu) {
+		freq = 0;
+		goto out;
+	}
+
+	/* Use Interpolation if map is not available */
+	cpu_freq = data->state[cpu]->freq;
+	if (!data->map) {
+		cpu_min = data->state[cpu]->min_freq;
+		cpu_max = data->state[cpu]->max_freq;
+		if (freq_table) {
+			dev_min = freq_table[0];
+			dev_max = freq_table[devfreq->profile->max_state - 1];
+		} else {
+			if (devfreq->max_freq <= devfreq->min_freq)
+				return 0;
+			dev_min = devfreq->min_freq;
+			dev_max = devfreq->max_freq;
+		}
+
+		cpu_percent = ((cpu_freq - cpu_min) * 100) / cpu_max - cpu_min;
+		freq = dev_min + mult_frac(dev_max - dev_min, cpu_percent, 100);
+		goto out;
+	}
+
+	map = data->map[cpu];
+	opp_cnt = dev_pm_opp_get_opp_count(dev);
+	for (i = 0; i < opp_cnt; i++) {
+		freq = max(freq, map[i].dev_hz);
+		if (map[i].cpu_khz >= cpu_freq)
+			break;
+	}
+out:
+	dev_dbg(dev, "CPU%u: %d -> dev: %u\n", cpu, cpu_freq, freq);
+	return freq;
+}
+
 static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
 					unsigned long *freq)
 {
@@ -23,6 +76,7 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
 	struct devfreq *parent_devfreq = (struct devfreq *)p_data->parent;
 	unsigned long child_freq = ULONG_MAX;
 	struct dev_pm_opp *opp;
+	unsigned int cpu, tgt_freq = 0;
 	int i, count, ret = 0;
 
 	/*
@@ -35,6 +89,14 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
 		goto out;
 	}
 
+	if (p_data->cpufreq_type) {
+		for_each_possible_cpu(cpu)
+			tgt_freq = max(tgt_freq,
+				       xlate_cpufreq_to_devfreq(p_data, cpu));
+		*freq = tgt_freq;
+		goto out;
+	}
+
 	/*
 	 * If the parent and passive devfreq device uses the OPP table,
 	 * get the next frequency by using the OPP table.
@@ -149,6 +211,200 @@ static int devfreq_passive_notifier_call(struct notifier_block *nb,
 	return NOTIFY_DONE;
 }
 
+static int cpufreq_passive_notifier_call(struct notifier_block *nb,
+					 unsigned long event, void *ptr)
+{
+	struct devfreq_passive_data *data =
+			container_of(nb, struct devfreq_passive_data, nb);
+	struct devfreq *devfreq = (struct devfreq *)data->this;
+	struct cpufreq_freqs *freq = ptr;
+	struct devfreq_cpu_state *state;
+	int ret = 0;
+
+	if (event != CPUFREQ_POSTCHANGE)
+		goto out;
+
+	state = data->state[freq->cpu];
+	if (!state)
+		goto out;
+
+	if (state->freq != freq->new) {
+		state->freq = freq->new;
+		mutex_lock(&devfreq->lock);
+		ret = update_devfreq(devfreq);
+		mutex_unlock(&devfreq->lock);
+		if (ret)
+			dev_err(&devfreq->dev, "Frequency update failed.\n");
+	}
+out:
+	return ret;
+}
+
+static int cpufreq_passive_register(struct devfreq_passive_data **p_data)
+{
+	unsigned int cpu;
+	struct devfreq_map **cpu_map;
+	struct devfreq_map *map, *per_cpu_map;
+	struct devfreq_passive_data *data = *p_data;
+	struct devfreq *devfreq = (struct devfreq *)data->this;
+	int i, count = 0, opp_cnt = 0, ret = 0, iter_val = 0;
+	struct device_node *np, *opp_table_np, *cpu_np;
+	struct opp_table *opp_table, *cpu_opp_tbl;
+	struct device *dev = devfreq->dev.parent;
+	struct devfreq_cpu_state *state;
+	struct dev_pm_opp *opp, *cpu_opp;
+	struct cpufreq_policy *policy;
+	struct device *cpu_dev;
+	u64 cpu_khz, dev_hz;
+
+	get_online_cpus();
+	data->nb.notifier_call = cpufreq_passive_notifier_call;
+	ret = cpufreq_register_notifier(&data->nb,
+					CPUFREQ_TRANSITION_NOTIFIER);
+	if (ret)
+		return ret;
+
+	/* Populate devfreq_cpu_state */
+	for_each_online_cpu(cpu) {
+		if (data->state[cpu])
+			continue;
+
+		policy = cpufreq_cpu_get(cpu);
+		if (policy) {
+			state = kzalloc(sizeof(*state), GFP_KERNEL);
+			if (!state)
+				return -ENOMEM;
+
+			state->first_cpu = cpumask_first(policy->related_cpus);
+			state->freq = policy->cur;
+			state->min_freq = policy->cpuinfo.min_freq;
+			state->max_freq = policy->cpuinfo.max_freq;
+			data->state[cpu] = state;
+			cpufreq_cpu_put(policy);
+		} else {
+			return -EPROBE_DEFER;
+		}
+	}
+
+	opp_table_np = dev_pm_opp_of_get_opp_desc_node(dev);
+	if (!opp_table_np)
+		goto out;
+
+	opp_cnt = dev_pm_opp_get_opp_count(dev);
+	if (opp_cnt <= 0)
+		goto put_opp_table;
+
+	/* Allocate memory for devfreq_map*/
+	cpu_map = kcalloc(num_possible_cpus(), sizeof(*cpu_map), GFP_KERNEL);
+	if (!cpu_map)
+		return -ENOMEM;
+
+	for_each_possible_cpu(cpu) {
+		per_cpu_map = kcalloc(opp_cnt, sizeof(*per_cpu_map),
+				      GFP_KERNEL);
+		if (!per_cpu_map)
+			return -ENOMEM;
+		cpu_map[cpu] = per_cpu_map;
+	}
+	data->map = cpu_map;
+
+	/* Populate devfreq_map */
+	opp_table = dev_pm_opp_get_opp_table(dev);
+	if (!opp_table)
+		return -ENOMEM;
+
+	for_each_available_child_of_node(opp_table_np, np) {
+		opp = dev_pm_opp_find_opp_of_np(opp_table, np);
+		if (IS_ERR(opp))
+			continue;
+
+		dev_hz = dev_pm_opp_get_freq(opp);
+		dev_pm_opp_put(opp);
+
+		count = of_count_phandle_with_args(np, "required-opps", NULL);
+		for (i = 0; i < count; i++) {
+			for_each_possible_cpu(cpu) {
+				cpu_dev = get_cpu_device(cpu);
+				if (!cpu_dev) {
+					dev_err(dev, "CPU get device failed.\n");
+					continue;
+				}
+
+				cpu_np = of_parse_required_opp(np, i);
+				if (!cpu_np) {
+					dev_err(dev, "Parsing required opp failed.\n");
+					continue;
+				}
+
+				/* Get cpu opp-table */
+				cpu_opp_tbl = dev_pm_opp_get_opp_table(cpu_dev);
+				if (!cpu_opp_tbl) {
+					dev_err(dev, "CPU opp table get failed.\n");
+					goto put_cpu_node;
+				}
+
+				/* Match the cpu opp node from required-opp with
+				 * the cpu-opp table */
+				cpu_opp = dev_pm_opp_find_opp_of_np(cpu_opp_tbl,
+								    cpu_np);
+				if (!cpu_opp) {
+					dev_dbg(dev, "CPU opp get failed.\n");
+					goto put_cpu_opp_table;
+				}
+
+				cpu_khz = dev_pm_opp_get_freq(cpu_opp);
+				if (cpu_opp && cpu_khz) {
+					/* Update freq-map if not already set */
+					map = cpu_map[cpu];
+					map[iter_val].cpu_khz = cpu_khz / 1000;
+					map[iter_val].dev_hz = dev_hz;
+				}
+				dev_pm_opp_put(cpu_opp);
+put_cpu_opp_table:
+				dev_pm_opp_put_opp_table(cpu_opp_tbl);
+put_cpu_node:
+				of_node_put(cpu_np);
+			}
+		}
+		iter_val++;
+	}
+	dev_pm_opp_put_opp_table(opp_table);
+put_opp_table:
+	of_node_put(opp_table_np);
+out:
+	put_online_cpus();
+
+	/* Update devfreq */
+	mutex_lock(&devfreq->lock);
+	ret = update_devfreq(devfreq);
+	mutex_unlock(&devfreq->lock);
+	if (ret)
+		dev_err(dev, "Frequency update failed.\n");
+
+	return ret;
+}
+
+static int cpufreq_passive_unregister(struct devfreq_passive_data **p_data)
+{
+	int cpu;
+	struct devfreq_passive_data *data = *p_data;
+
+	cpufreq_unregister_notifier(&data->nb,
+				    CPUFREQ_TRANSITION_NOTIFIER);
+
+	for_each_possible_cpu(cpu) {
+		kfree(data->state[cpu]);
+		kfree(data->map[cpu]);
+		data->state[cpu] = NULL;
+		data->map[cpu] = NULL;
+	}
+
+	kfree(data->map);
+	data->map = NULL;
+
+	return 0;
+}
+
 static int devfreq_passive_event_handler(struct devfreq *devfreq,
 				unsigned int event, void *data)
 {
@@ -159,7 +415,7 @@ static int devfreq_passive_event_handler(struct devfreq *devfreq,
 	struct notifier_block *nb = &p_data->nb;
 	int ret = 0;
 
-	if (!parent)
+	if (!parent && !p_data->cpufreq_type)
 		return -EPROBE_DEFER;
 
 	switch (event) {
@@ -167,13 +423,21 @@ static int devfreq_passive_event_handler(struct devfreq *devfreq,
 		if (!p_data->this)
 			p_data->this = devfreq;
 
-		nb->notifier_call = devfreq_passive_notifier_call;
-		ret = devm_devfreq_register_notifier(dev, parent, nb,
-					DEVFREQ_TRANSITION_NOTIFIER);
+		if (p_data->cpufreq_type) {
+			ret = cpufreq_passive_register(&p_data);
+		} else {
+			nb->notifier_call = devfreq_passive_notifier_call;
+			ret = devm_devfreq_register_notifier(dev, parent, nb,
+						DEVFREQ_TRANSITION_NOTIFIER);
+		}
 		break;
 	case DEVFREQ_GOV_STOP:
-		devm_devfreq_unregister_notifier(dev, parent, nb,
-					DEVFREQ_TRANSITION_NOTIFIER);
+		if (p_data->cpufreq_type) {
+			cpufreq_passive_unregister(&p_data);
+		} else {
+			devm_devfreq_unregister_notifier(dev, parent, nb,
+						DEVFREQ_TRANSITION_NOTIFIER);
+		}
 		break;
 	default:
 		break;
diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
index fbffa74bfc1b..e8235fbe49e6 100644
--- a/include/linux/devfreq.h
+++ b/include/linux/devfreq.h
@@ -265,6 +265,38 @@ struct devfreq_simple_ondemand_data {
 #endif
 
 #if IS_ENABLED(CONFIG_DEVFREQ_GOV_PASSIVE)
+/**
+ * struct devfreq_cpu_state - holds the per-cpu state
+ * @freq:	holds the current frequency of the cpu.
+ * @min_freq:	holds the min frequency of the cpu.
+ * @max_freq:	holds the max frequency of the cpu.
+ * @first_cpu:	holds the cpumask of the first cpu of a policy.
+ *
+ * This structure stores the required cpu_state of a cpu.
+ * This is auto-populated by the governor.
+ */
+struct devfreq_cpu_state {
+	unsigned int freq;
+	unsigned int min_freq;
+	unsigned int max_freq;
+	unsigned int first_cpu;
+};
+
+/**
+ * struct devfreq_map - holds mapping from cpu frequency
+ * to devfreq frequency
+ * @cpu_khz:	holds the cpu frequency in Khz
+ * @dev_hz:	holds the devfreq device frequency in Hz
+ *
+ * This structure stores the lookup table between cpu
+ * and the devfreq device. This is auto-populated by the
+ * governor.
+ */
+struct devfreq_map {
+	unsigned int cpu_khz;
+	unsigned int dev_hz;
+};
+
 /**
  * struct devfreq_passive_data - void *data fed to struct devfreq
  *	and devfreq_add_device
@@ -278,11 +310,13 @@ struct devfreq_simple_ondemand_data {
  *			the next frequency, should use this callback.
  * @this:	the devfreq instance of own device.
  * @nb:		the notifier block for DEVFREQ_TRANSITION_NOTIFIER list
+ * @state:	holds the state min/max/current frequency of all online cpu's
+ * @map:	holds the maps between cpu frequency and device frequency
  *
  * The devfreq_passive_data have to set the devfreq instance of parent
  * device with governors except for the passive governor. But, don't need to
- * initialize the 'this' and 'nb' field because the devfreq core will handle
- * them.
+ * initialize the 'this', 'nb', 'state' and 'map' field because the devfreq
+ * core will handle them.
  */
 struct devfreq_passive_data {
 	/* Should set the devfreq instance of parent device */
@@ -291,9 +325,14 @@ struct devfreq_passive_data {
 	/* Optional callback to decide the next frequency of passvice device */
 	int (*get_target_freq)(struct devfreq *this, unsigned long *freq);
 
+	/* Should be set if the devfreq device wants to be scaled with cpu*/
+	u8 cpufreq_type;
+
 	/* For passive governor's internal use. Don't need to set them */
 	struct devfreq *this;
 	struct notifier_block nb;
+	struct devfreq_cpu_state *state[NR_CPUS];
+	struct devfreq_map **map;
 };
 #endif
 
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH RFC 4/9] dt-bindings: devfreq: Add bindings for devfreq dev-icbw driver
  2019-03-28 15:28 [PATCH RFC 0/9] Add CPU based scaling support to Passive governor Sibi Sankar
                   ` (2 preceding siblings ...)
  2019-03-28 15:28 ` [PATCH RFC 3/9] PM / devfreq: Add cpu based scaling support to passive_governor Sibi Sankar
@ 2019-03-28 15:28 ` Sibi Sankar
  2019-03-28 15:28 ` [PATCH RFC 5/9] PM / devfreq: Add devfreq driver for interconnect bandwidth voting Sibi Sankar
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Sibi Sankar @ 2019-03-28 15:28 UTC (permalink / raw)
  To: robh+dt, andy.gross, myungjoo.ham, kyungmin.park, rjw,
	viresh.kumar, nm, sboyd, georgi.djakov
  Cc: bjorn.andersson, david.brown, mark.rutland, linux-kernel,
	linux-arm-msm-owner, devicetree, rnayak, cw00.choi, linux-pm,
	evgreen, daidavid1, dianders, Sibi Sankar

Add dt-bindings support for a generic interconnect bandwidth voting
devfreq driver.

Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
---
 .../devicetree/bindings/devfreq/icbw.txt      | 146 ++++++++++++++++++
 1 file changed, 146 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/devfreq/icbw.txt

diff --git a/Documentation/devicetree/bindings/devfreq/icbw.txt b/Documentation/devicetree/bindings/devfreq/icbw.txt
new file mode 100644
index 000000000000..389aa77a2363
--- /dev/null
+++ b/Documentation/devicetree/bindings/devfreq/icbw.txt
@@ -0,0 +1,146 @@
+Interconnect bandwidth device
+
+icbw is a device that represents an interconnect path that connects two
+devices. This device is typically used to vote for BW requirements between
+two devices. Eg: CPU to DDR, GPU to DDR, etc. This device is expected to
+use passive govenor by default.
+
+Required properties:
+- compatible:		Must be "devfreq-icbw"
+- interconnects:	Pairs of phandles and interconnect provider specifier
+			to denote the edge source and destination ports of
+			the interconnect path. See also:
+		Documentation/devicetree/bindings/interconnect/interconnect.txt
+- operating-points-v2:	A phandle to an OPP v2 table that holds frequency,
+			bandwidth values (in MB/s) and required-opp's populated
+			with phandles pointing to the required per cpu opp. The
+			bandwidth (in MB/s) values depend on multiple properties
+			of the interconnect path like frequency, interconnect
+			width, etc.
+
+Example:
+
+cpus {
+	...
+
+	CPU0: cpu@0 {
+		...
+		operating-points-v2 = <&cpu0_opp_table>;
+		...
+	};
+
+	CPU1: cpu@100 {
+		...
+		operating-points-v2 = <&cpu0_opp_table>;
+		...
+	};
+
+	CPU2: cpu@200 {
+		...
+		operating-points-v2 = <&cpu0_opp_table>;
+		...
+	};
+
+	CPU3: cpu@300 {
+		...
+		operating-points-v2 = <&cpu0_opp_table>;
+		...
+	};
+
+	CPU4: cpu@400 {
+		...
+		operating-points-v2 = <&cpu4_opp_table>;
+		...
+	};
+
+	CPU5: cpu@500 {
+		...
+		operating-points-v2 = <&cpu4_opp_table>;
+		...
+	};
+
+	CPU6: cpu@600 {
+		...
+		operating-points-v2 = <&cpu4_opp_table>;
+		...
+	};
+
+	CPU7: cpu@700 {
+		...
+		operating-points-v2 = <&cpu4_opp_table>;
+		...
+	};
+};
+
+cpu0_opp_table: cpu0_opp_table {
+	compatible = "operating-points-v2";
+	opp-shared;
+
+	cpu0_opp1: opp-300000000 {
+		opp-hz = /bits/ 64 <300000000>;
+	};
+
+	...
+
+	cpu0_opp16: opp-1612800000 {
+		opp-hz = /bits/ 64 <1612800000>;
+	};
+
+	...
+};
+
+cpu4_opp_table: cpu4_opp_table {
+	compatible = "operating-points-v2";
+	opp-shared;
+
+	...
+
+	cpu4_opp4: opp-1056000000 {
+		opp-hz = /bits/ 64 <1056000000>;
+	};
+
+	cpu4_opp5: opp-1209600000 {
+		opp-hz = /bits/ 64 <1209600000>;
+	};
+
+	...
+};
+
+bw_opp_table: bw-opp-table {
+	compatible = "operating-points-v2";
+
+	opp-200  {
+		opp-hz = /bits/ 64 < 200000000 >; /* 200 MHz */
+		required-opps = <&cpu0_opp1>;
+		/* 0 MB/s average and 762 MB/s peak bandwidth */
+		opp-bw-MBs = <0 762>;
+	};
+
+	opp-300 {
+		opp-hz = /bits/ 64 < 300000000 >; /* 300 MHz */
+		/* 0 MB/s average and 1144 MB/s peak bandwidth */
+		opp-bw-MBs = <0 1144>;
+	};
+
+	...
+
+	opp-768 {
+		opp-hz = /bits/ 64 < 768000000 >; /* 768 MHz */
+		/* 0 MB/s average and 2929 MB/s peak bandwidth */
+		opp-bw-MBs = <0 2929>;
+		required-opps = <&cpu4_opp4>;
+	};
+
+	opp-1017 {
+		opp-hz = /bits/ 64 < 1017000000 >; /* 1017 MHz */
+		/* 0 MB/s average and 3879 MB/s peak bandwidth */
+		opp-bw-MBs = <0 3879>;
+		required-opps = <&cpu0_opp16>, <&cpu4_opp5>;
+	};
+};
+
+cpubw {
+	compatible = "devfreq-icbw";
+	interconnects = <&snoc MASTER_APSS_1 &bimc SLAVE_EBI_CH0>;
+	operating-points-v2 = <&bw_opp_table>;
+};
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH RFC 5/9] PM / devfreq: Add devfreq driver for interconnect bandwidth voting
  2019-03-28 15:28 [PATCH RFC 0/9] Add CPU based scaling support to Passive governor Sibi Sankar
                   ` (3 preceding siblings ...)
  2019-03-28 15:28 ` [PATCH RFC 4/9] dt-bindings: devfreq: Add bindings for devfreq dev-icbw driver Sibi Sankar
@ 2019-03-28 15:28 ` Sibi Sankar
  2019-03-28 15:28 ` [PATCH RFC 6/9] OPP: Add and export helper to update voltage Sibi Sankar
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Sibi Sankar @ 2019-03-28 15:28 UTC (permalink / raw)
  To: robh+dt, andy.gross, myungjoo.ham, kyungmin.park, rjw,
	viresh.kumar, nm, sboyd, georgi.djakov
  Cc: bjorn.andersson, david.brown, mark.rutland, linux-kernel,
	linux-arm-msm-owner, devicetree, rnayak, cw00.choi, linux-pm,
	evgreen, daidavid1, dianders, Saravana Kannan, Sibi Sankar

From: Saravana Kannan <skannan@codeaurora.org>

This driver registers itself as a devfreq device that allows devfreq
governors to make bandwidth votes for an interconnect path. This allows
applying various policies for different interconnect paths using devfreq
governors.

Example uses:
* Use the devfreq performance governor to set the CPU to DDR
  interconnect path for maximum performance.
* Use the devfreq powersave governor to set the GPU to DDR
  interconnect path for minimum performance.
* Use the Passive governor to scale the DDR frequency based
  on the needs of the CPUs' current frequency

Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
[Sibi: cleanup and added passive governor support]
Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
---
 drivers/devfreq/Kconfig        |  15 ++++
 drivers/devfreq/Makefile       |   1 +
 drivers/devfreq/devfreq_icbw.c | 132 +++++++++++++++++++++++++++++++++
 3 files changed, 148 insertions(+)
 create mode 100644 drivers/devfreq/devfreq_icbw.c

diff --git a/drivers/devfreq/Kconfig b/drivers/devfreq/Kconfig
index 9a45f464a56b..0df5b65f157a 100644
--- a/drivers/devfreq/Kconfig
+++ b/drivers/devfreq/Kconfig
@@ -117,6 +117,21 @@ config ARM_RK3399_DMC_DEVFREQ
           It sets the frequency for the memory controller and reads the usage counts
           from hardware.
 
+config DEVFREQ_ICBW
+	tristate "Driver for making bandwidth votes on interconnect paths"
+	select DEVFREQ_GOV_PASSIVE
+	select DEVFREQ_GOV_PERFORMANCE
+	select DEVFREQ_GOV_POWERSAVE
+	select DEVFREQ_GOV_USERSPACE
+	depends on INTERCONNECT
+	default n
+	help
+	  Different devfreq governors use this devfreq device to make
+	  bandwidth votes for interconnect paths between different devices
+	  (Eg: CPU to DDR, GPU to DDR, etc). This driver provides a generic
+	  interface so that the devfreq governors can be shared across SoCs
+	  and architectures.
+
 source "drivers/devfreq/event/Kconfig"
 
 endif # PM_DEVFREQ
diff --git a/drivers/devfreq/Makefile b/drivers/devfreq/Makefile
index 32b8d4d3f12c..66591b0e5259 100644
--- a/drivers/devfreq/Makefile
+++ b/drivers/devfreq/Makefile
@@ -11,6 +11,7 @@ obj-$(CONFIG_DEVFREQ_GOV_PASSIVE)	+= governor_passive.o
 obj-$(CONFIG_ARM_EXYNOS_BUS_DEVFREQ)	+= exynos-bus.o
 obj-$(CONFIG_ARM_RK3399_DMC_DEVFREQ)	+= rk3399_dmc.o
 obj-$(CONFIG_ARM_TEGRA_DEVFREQ)		+= tegra-devfreq.o
+obj-$(CONFIG_DEVFREQ_ICBW)		+= devfreq_icbw.o
 
 # DEVFREQ Event Drivers
 obj-$(CONFIG_PM_DEVFREQ_EVENT)		+= event/
diff --git a/drivers/devfreq/devfreq_icbw.c b/drivers/devfreq/devfreq_icbw.c
new file mode 100644
index 000000000000..c552cf4218c8
--- /dev/null
+++ b/drivers/devfreq/devfreq_icbw.c
@@ -0,0 +1,132 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2018, The Linux Foundation. All rights reserved.
+ */
+
+#include <linux/devfreq.h>
+#include <linux/interconnect.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+
+struct icbw_data {
+	struct devfreq *df;
+	struct icc_path *path;
+	u32 cur_ab;
+	u32 cur_pb;
+};
+
+static int icbw_target(struct device *dev, unsigned long *freq, u32 flags)
+{
+	struct dev_pm_opp *opp;
+	struct icbw_data *data = dev_get_drvdata(dev);
+	u32 new_pb, new_ab;
+	int ret;
+
+	opp = devfreq_recommended_opp(dev, freq, flags);
+	if (IS_ERR(opp))
+		return PTR_ERR(opp);
+
+	/* Get avg and peak bandwidth */
+	new_ab = dev_pm_opp_get_avg_bw(opp);
+	new_pb = dev_pm_opp_get_peak_bw(opp);
+	dev_pm_opp_put(opp);
+
+	if (data->cur_pb == new_pb && data->cur_ab == new_ab)
+		return 0;
+
+	dev_dbg(dev, "BW icc: AB: %u PB: %u\n", new_ab, new_pb);
+
+	ret = icc_set_bw(data->path, new_ab, new_pb);
+	if (ret) {
+		dev_err(dev, "bandwidth request failed (%d)\n", ret);
+	} else {
+		data->cur_pb = new_pb;
+		data->cur_ab = new_ab;
+	}
+
+	return ret;
+}
+
+static int devfreq_icbw_probe(struct platform_device *pdev)
+{
+	struct device *dev = &pdev->dev;
+	struct devfreq_dev_profile *profile;
+	struct devfreq_passive_data *passive_data;
+	struct icbw_data *data;
+	int ret;
+
+	data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
+	if (!data)
+		return -ENOMEM;
+	dev_set_drvdata(dev, data);
+
+	passive_data = devm_kzalloc(dev, sizeof(*passive_data), GFP_KERNEL);
+	if (!passive_data)
+		return -ENOMEM;
+
+	passive_data->cpufreq_type = true;
+
+	profile = devm_kzalloc(dev, sizeof(*profile), GFP_KERNEL);
+	if (!profile)
+		return -ENOMEM;
+
+	profile->target = icbw_target;
+
+	data->path = of_icc_get(dev, NULL);
+	if (IS_ERR(data->path)) {
+		dev_err(dev, "Unable to register interconnect path\n");
+		return PTR_ERR(data->path);
+	}
+
+	ret = dev_pm_opp_of_add_table(dev);
+	if (ret) {
+		dev_err(dev, "Couldn't find OPP table\n");
+		goto err_icc;
+	}
+
+	data->df = devfreq_add_device(dev, profile,
+				      DEVFREQ_GOV_PASSIVE, passive_data);
+	if (IS_ERR(data->df)) {
+		ret = PTR_ERR(data->df);
+		goto err_opp_table;
+	}
+
+	return 0;
+
+err_opp_table:
+	dev_pm_opp_of_remove_table(dev);
+err_icc:
+	icc_put(data->path);
+	return ret;
+}
+
+static int devfreq_icbw_remove(struct platform_device *pdev)
+{
+	struct device *dev = &pdev->dev;
+	struct icbw_data *data = dev_get_drvdata(dev);
+
+	devfreq_remove_device(data->df);
+	icc_put(data->path);
+	return 0;
+}
+
+static const struct of_device_id icbw_match_table[] = {
+	{ .compatible = "devfreq-icbw" },
+	{}
+};
+MODULE_DEVICE_TABLE(of, icbw_match_table);
+
+static struct platform_driver icbw_driver = {
+	.probe = devfreq_icbw_probe,
+	.remove = devfreq_icbw_remove,
+	.driver = {
+		.name = "devfreq-icbw",
+		.of_match_table = icbw_match_table,
+	},
+};
+module_platform_driver(icbw_driver);
+
+MODULE_DESCRIPTION("Interconnect bandwidth voting driver");
+MODULE_LICENSE("GPL v2");
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH RFC 6/9] OPP: Add and export helper to update voltage
  2019-03-28 15:28 [PATCH RFC 0/9] Add CPU based scaling support to Passive governor Sibi Sankar
                   ` (4 preceding siblings ...)
  2019-03-28 15:28 ` [PATCH RFC 5/9] PM / devfreq: Add devfreq driver for interconnect bandwidth voting Sibi Sankar
@ 2019-03-28 15:28 ` Sibi Sankar
  2019-04-10 10:24   ` Viresh Kumar
  2019-03-28 15:28 ` [PATCH RFC 7/9] cpufreq: qcom: Add support to update cpu node's OPP tables Sibi Sankar
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 21+ messages in thread
From: Sibi Sankar @ 2019-03-28 15:28 UTC (permalink / raw)
  To: robh+dt, andy.gross, myungjoo.ham, kyungmin.park, rjw,
	viresh.kumar, nm, sboyd, georgi.djakov
  Cc: bjorn.andersson, david.brown, mark.rutland, linux-kernel,
	linux-arm-msm-owner, devicetree, rnayak, cw00.choi, linux-pm,
	evgreen, daidavid1, dianders, Sibi Sankar

Add and export 'dev_pm_opp_update_voltage' to find and update voltage
of an opp for a given frequency. This will be useful to update the opps
with voltages read back from firmware.

Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
---
 drivers/opp/core.c     | 62 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/pm_opp.h | 10 +++++++
 2 files changed, 72 insertions(+)

diff --git a/drivers/opp/core.c b/drivers/opp/core.c
index addaf7aae9ae..c066cd120a33 100644
--- a/drivers/opp/core.c
+++ b/drivers/opp/core.c
@@ -2090,6 +2090,68 @@ int dev_pm_opp_disable(struct device *dev, unsigned long freq)
 }
 EXPORT_SYMBOL_GPL(dev_pm_opp_disable);
 
+static int _opp_update_voltage(struct device *dev, unsigned long freq,
+			       unsigned long u_volt)
+{
+	struct opp_table *opp_table;
+	struct dev_pm_opp *tmp_opp, *opp = ERR_PTR(-ENODEV);
+	int r = 0;
+
+	/* Find the opp_table */
+	opp_table = _find_opp_table(dev);
+	if (IS_ERR(opp_table)) {
+		r = PTR_ERR(opp_table);
+		dev_warn(dev, "%s: Device OPP not found (%d)\n", __func__, r);
+		return r;
+	}
+
+	mutex_lock(&opp_table->lock);
+
+	/* Do we have the frequency? */
+	list_for_each_entry(tmp_opp, &opp_table->opp_list, node) {
+		if (tmp_opp->rate == freq) {
+			opp = tmp_opp;
+			break;
+		}
+	}
+
+	if (IS_ERR(opp)) {
+		r = PTR_ERR(opp);
+		goto unlock;
+	}
+
+	/* update only if the opp is disabled */
+	if (opp->available)
+		goto unlock;
+
+	opp->supplies[0].u_volt = u_volt;
+
+unlock:
+	mutex_unlock(&opp_table->lock);
+	dev_pm_opp_put_opp_table(opp_table);
+	return r;
+}
+
+/**
+ * dev_pm_opp_update_voltage() - find and update voltage of an opp
+ *				 for a given frequency
+ * @dev:	device for which we do this operation
+ * @freq:	OPP frequency to update voltage
+ * @u_volt:	voltage requested for this opp
+ *
+ * This is useful only for devices with single power supply.
+ *
+ * Return: -EINVAL for bad pointers, -ENOMEM if no memory available for the
+ * copy operation, returns 0 if no modification was done OR modification was
+ * successful.
+ */
+int dev_pm_opp_update_voltage(struct device *dev, unsigned long freq,
+			      unsigned long u_volt)
+{
+	return _opp_update_voltage(dev, freq, u_volt);
+}
+EXPORT_SYMBOL_GPL(dev_pm_opp_update_voltage);
+
 /**
  * dev_pm_opp_register_notifier() - Register OPP notifier for the device
  * @dev:	Device for which notifier needs to be registered
diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h
index d7cb0e65c4f0..58490f6839ce 100644
--- a/include/linux/pm_opp.h
+++ b/include/linux/pm_opp.h
@@ -130,6 +130,9 @@ int dev_pm_opp_enable(struct device *dev, unsigned long freq);
 
 int dev_pm_opp_disable(struct device *dev, unsigned long freq);
 
+int dev_pm_opp_update_voltage(struct device *dev, unsigned long freq,
+			      unsigned long u_volt);
+
 int dev_pm_opp_register_notifier(struct device *dev, struct notifier_block *nb);
 int dev_pm_opp_unregister_notifier(struct device *dev, struct notifier_block *nb);
 
@@ -265,6 +268,13 @@ static inline int dev_pm_opp_disable(struct device *dev, unsigned long freq)
 	return 0;
 }
 
+static inline int dev_pm_opp_update_voltage(struct device *dev,
+					    unsigned long freq,
+					    unsigned long u_volt)
+{
+	return 0;
+}
+
 static inline int dev_pm_opp_register_notifier(struct device *dev, struct notifier_block *nb)
 {
 	return -ENOTSUPP;
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH RFC 7/9] cpufreq: qcom: Add support to update cpu node's OPP tables
  2019-03-28 15:28 [PATCH RFC 0/9] Add CPU based scaling support to Passive governor Sibi Sankar
                   ` (5 preceding siblings ...)
  2019-03-28 15:28 ` [PATCH RFC 6/9] OPP: Add and export helper to update voltage Sibi Sankar
@ 2019-03-28 15:28 ` Sibi Sankar
  2019-04-10 10:33   ` Viresh Kumar
  2019-03-28 15:28 ` [PATCH RFC 8/9] arm64: dts: qcom: sdm845: Add cpu " Sibi Sankar
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 21+ messages in thread
From: Sibi Sankar @ 2019-03-28 15:28 UTC (permalink / raw)
  To: robh+dt, andy.gross, myungjoo.ham, kyungmin.park, rjw,
	viresh.kumar, nm, sboyd, georgi.djakov
  Cc: bjorn.andersson, david.brown, mark.rutland, linux-kernel,
	linux-arm-msm-owner, devicetree, rnayak, cw00.choi, linux-pm,
	evgreen, daidavid1, dianders, Sibi Sankar

Add support to parse and update OPP tables attached to the cpu nodes.

Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
---
 drivers/cpufreq/qcom-cpufreq-hw.c | 29 +++++++++++++++++++++++++++--
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
index 4b0b50403901..5c268dd2346c 100644
--- a/drivers/cpufreq/qcom-cpufreq-hw.c
+++ b/drivers/cpufreq/qcom-cpufreq-hw.c
@@ -73,6 +73,25 @@ static unsigned int qcom_cpufreq_hw_fast_switch(struct cpufreq_policy *policy,
 	return policy->freq_table[index].frequency;
 }
 
+static int qcom_find_update_opp(struct device *cpu_dev, unsigned long freq,
+				unsigned long volt)
+{
+	int ret;
+	struct dev_pm_opp *opp;
+
+	opp = dev_pm_opp_find_freq_exact(cpu_dev, freq, true);
+	if (IS_ERR(opp)) {
+		ret = dev_pm_opp_add(cpu_dev, freq, volt);
+	} else {
+		dev_pm_opp_disable(cpu_dev, freq);
+		ret = dev_pm_opp_update_voltage(cpu_dev, freq, volt);
+		dev_pm_opp_enable(cpu_dev, freq);
+		dev_pm_opp_put(opp);
+	}
+
+	return ret;
+}
+
 static int qcom_cpufreq_hw_read_lut(struct device *cpu_dev,
 				    struct cpufreq_policy *policy,
 				    void __iomem *base)
@@ -81,11 +100,16 @@ static int qcom_cpufreq_hw_read_lut(struct device *cpu_dev,
 	u32 volt;
 	unsigned int max_cores = cpumask_weight(policy->cpus);
 	struct cpufreq_frequency_table	*table;
+	int ret;
 
 	table = kcalloc(LUT_MAX_ENTRIES + 1, sizeof(*table), GFP_KERNEL);
 	if (!table)
 		return -ENOMEM;
 
+	ret = dev_pm_opp_of_add_table(cpu_dev);
+	if (ret)
+		dev_dbg(cpu_dev, "Couldn't add OPP table\n");
+
 	for (i = 0; i < LUT_MAX_ENTRIES; i++) {
 		data = readl_relaxed(base + REG_FREQ_LUT +
 				      i * LUT_ROW_SIZE);
@@ -104,7 +128,7 @@ static int qcom_cpufreq_hw_read_lut(struct device *cpu_dev,
 
 		if (freq != prev_freq && core_count == max_cores) {
 			table[i].frequency = freq;
-			dev_pm_opp_add(cpu_dev, freq * 1000, volt);
+			qcom_find_update_opp(cpu_dev, freq * 1000, volt);
 			dev_dbg(cpu_dev, "index=%d freq=%d, core_count %d\n", i,
 				freq, core_count);
 		} else {
@@ -125,7 +149,8 @@ static int qcom_cpufreq_hw_read_lut(struct device *cpu_dev,
 			if (prev_cc != max_cores) {
 				prev->frequency = prev_freq;
 				prev->flags = CPUFREQ_BOOST_FREQ;
-				dev_pm_opp_add(cpu_dev,	prev_freq * 1000, volt);
+				qcom_find_update_opp(cpu_dev, prev_freq * 1000,
+						     volt);
 			}
 
 			break;
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH RFC 8/9] arm64: dts: qcom: sdm845: Add cpu OPP tables
  2019-03-28 15:28 [PATCH RFC 0/9] Add CPU based scaling support to Passive governor Sibi Sankar
                   ` (6 preceding siblings ...)
  2019-03-28 15:28 ` [PATCH RFC 7/9] cpufreq: qcom: Add support to update cpu node's OPP tables Sibi Sankar
@ 2019-03-28 15:28 ` " Sibi Sankar
  2019-03-28 15:28 ` [PATCH RFC 9/9] arm64: dts: qcom: sdm845: Add nodes for icbw driver and opp tables Sibi Sankar
  2019-04-11  7:02 ` [PATCH RFC 0/9] Add CPU based scaling support to Passive governor Sibi Sankar
  9 siblings, 0 replies; 21+ messages in thread
From: Sibi Sankar @ 2019-03-28 15:28 UTC (permalink / raw)
  To: robh+dt, andy.gross, myungjoo.ham, kyungmin.park, rjw,
	viresh.kumar, nm, sboyd, georgi.djakov
  Cc: bjorn.andersson, david.brown, mark.rutland, linux-kernel,
	linux-arm-msm-owner, devicetree, rnayak, cw00.choi, linux-pm,
	evgreen, daidavid1, dianders, Sibi Sankar

Add a OPP tables for the cpu nodes.

Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
---
 arch/arm64/boot/dts/qcom/sdm845.dtsi | 182 +++++++++++++++++++++++++++
 1 file changed, 182 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
index cd8ac481381b..072563f6b6cb 100644
--- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
@@ -193,6 +193,7 @@
 			qcom,freq-domain = <&cpufreq_hw 0>;
 			#cooling-cells = <2>;
 			next-level-cache = <&L2_0>;
+			operating-points-v2 = <&cpu0_opp_table>;
 			L2_0: l2-cache {
 				compatible = "cache";
 				next-level-cache = <&L3_0>;
@@ -210,6 +211,7 @@
 			qcom,freq-domain = <&cpufreq_hw 0>;
 			#cooling-cells = <2>;
 			next-level-cache = <&L2_100>;
+			operating-points-v2 = <&cpu0_opp_table>;
 			L2_100: l2-cache {
 				compatible = "cache";
 				next-level-cache = <&L3_0>;
@@ -224,6 +226,7 @@
 			qcom,freq-domain = <&cpufreq_hw 0>;
 			#cooling-cells = <2>;
 			next-level-cache = <&L2_200>;
+			operating-points-v2 = <&cpu0_opp_table>;
 			L2_200: l2-cache {
 				compatible = "cache";
 				next-level-cache = <&L3_0>;
@@ -238,6 +241,7 @@
 			qcom,freq-domain = <&cpufreq_hw 0>;
 			#cooling-cells = <2>;
 			next-level-cache = <&L2_300>;
+			operating-points-v2 = <&cpu0_opp_table>;
 			L2_300: l2-cache {
 				compatible = "cache";
 				next-level-cache = <&L3_0>;
@@ -252,6 +256,7 @@
 			qcom,freq-domain = <&cpufreq_hw 1>;
 			#cooling-cells = <2>;
 			next-level-cache = <&L2_400>;
+			operating-points-v2 = <&cpu4_opp_table>;
 			L2_400: l2-cache {
 				compatible = "cache";
 				next-level-cache = <&L3_0>;
@@ -266,6 +271,7 @@
 			qcom,freq-domain = <&cpufreq_hw 1>;
 			#cooling-cells = <2>;
 			next-level-cache = <&L2_500>;
+			operating-points-v2 = <&cpu4_opp_table>;
 			L2_500: l2-cache {
 				compatible = "cache";
 				next-level-cache = <&L3_0>;
@@ -280,6 +286,7 @@
 			qcom,freq-domain = <&cpufreq_hw 1>;
 			#cooling-cells = <2>;
 			next-level-cache = <&L2_600>;
+			operating-points-v2 = <&cpu4_opp_table>;
 			L2_600: l2-cache {
 				compatible = "cache";
 				next-level-cache = <&L3_0>;
@@ -294,6 +301,7 @@
 			qcom,freq-domain = <&cpufreq_hw 1>;
 			#cooling-cells = <2>;
 			next-level-cache = <&L2_700>;
+			operating-points-v2 = <&cpu4_opp_table>;
 			L2_700: l2-cache {
 				compatible = "cache";
 				next-level-cache = <&L3_0>;
@@ -301,6 +309,180 @@
 		};
 	};
 
+	cpu0_opp_table: cpu0_opp_table {
+		compatible = "operating-points-v2";
+		opp-shared;
+
+		cpu0_opp1: opp-300000000 {
+			opp-hz = /bits/ 64 <300000000>;
+		};
+
+		cpu0_opp2: opp-403200000 {
+			opp-hz = /bits/ 64 <403200000>;
+		};
+
+		cpu0_opp3: opp-480000000 {
+			opp-hz = /bits/ 64 <480000000>;
+		};
+
+		cpu0_opp4: opp-576000000 {
+			opp-hz = /bits/ 64 <576000000>;
+		};
+
+		cpu0_opp5: opp-652800000 {
+			opp-hz = /bits/ 64 <652800000>;
+		};
+
+		cpu0_opp6: opp-748800000 {
+			opp-hz = /bits/ 64 <748800000>;
+		};
+
+		cpu0_opp7: opp-825600000 {
+			opp-hz = /bits/ 64 <825600000>;
+		};
+
+		cpu0_opp8: opp-902400000 {
+			opp-hz = /bits/ 64 <902400000>;
+		};
+
+		cpu0_opp9: opp-979200000 {
+			opp-hz = /bits/ 64 <979200000>;
+		};
+
+		cpu0_opp10: opp-1056000000 {
+			opp-hz = /bits/ 64 <1056000000>;
+		};
+
+		cpu0_opp11: opp-1132800000 {
+			opp-hz = /bits/ 64 <1132800000>;
+		};
+
+		cpu0_opp12: opp-1228800000 {
+			opp-hz = /bits/ 64 <1228800000>;
+		};
+
+		cpu0_opp13: opp-1324800000 {
+			opp-hz = /bits/ 64 <1324800000>;
+		};
+
+		cpu0_opp14: opp-1420800000 {
+			opp-hz = /bits/ 64 <1420800000>;
+		};
+
+		cpu0_opp15: opp-1516800000 {
+			opp-hz = /bits/ 64 <1516800000>;
+		};
+
+		cpu0_opp16: opp-1612800000 {
+			opp-hz = /bits/ 64 <1612800000>;
+		};
+
+		cpu0_opp17: opp-1689600000 {
+			opp-hz = /bits/ 64 <1689600000>;
+		};
+
+		cpu0_opp18: opp-1766400000 {
+			opp-hz = /bits/ 64 <1766400000>;
+		};
+	};
+
+	cpu4_opp_table: cpu4_opp_table {
+		compatible = "operating-points-v2";
+		opp-shared;
+
+		cpu4_opp1: opp-825600000 {
+			opp-hz = /bits/ 64 <825600000>;
+		};
+
+		cpu4_opp2: opp-902400000 {
+			opp-hz = /bits/ 64 <902400000>;
+		};
+
+		cpu4_opp3: opp-979200000 {
+			opp-hz = /bits/ 64 <979200000>;
+		};
+
+		cpu4_opp4: opp-1056000000 {
+			opp-hz = /bits/ 64 <1056000000>;
+		};
+
+		cpu4_opp5: opp-1209600000 {
+			opp-hz = /bits/ 64 <1209600000>;
+		};
+
+		cpu4_opp6: opp-1286400000 {
+			opp-hz = /bits/ 64 <1286400000>;
+		};
+
+		cpu4_opp7: opp-1363200000 {
+			opp-hz = /bits/ 64 <1363200000>;
+		};
+
+		cpu4_opp8: opp-1459200000 {
+			opp-hz = /bits/ 64 <1459200000>;
+		};
+
+		cpu4_opp9: opp-1536000000 {
+			opp-hz = /bits/ 64 <1536000000>;
+		};
+
+		cpu4_opp10: opp-1612800000 {
+			opp-hz = /bits/ 64 <1612800000>;
+		};
+
+		cpu4_opp11: opp-1689600000 {
+			opp-hz = /bits/ 64 <1689600000>;
+		};
+
+		cpu4_opp12: opp-1766400000 {
+			opp-hz = /bits/ 64 <1766400000>;
+		};
+
+		cpu4_opp13: opp-1843200000 {
+			opp-hz = /bits/ 64 <1843200000>;
+		};
+
+		cpu4_opp14: opp-1920000000 {
+			opp-hz = /bits/ 64 <1920000000>;
+		};
+
+		cpu4_opp15: opp-1996800000 {
+			opp-hz = /bits/ 64 <1996800000>;
+		};
+
+		cpu4_opp16: opp-2092800000 {
+			opp-hz = /bits/ 64 <2092800000>;
+		};
+
+		cpu4_opp17: opp-2169600000 {
+			opp-hz = /bits/ 64 <2169600000>;
+		};
+
+		cpu4_opp18: opp-2246400000 {
+			opp-hz = /bits/ 64 <2246400000>;
+		};
+
+		cpu4_opp19: opp-2323200000 {
+			opp-hz = /bits/ 64 <2323200000>;
+		};
+
+		cpu4_opp20: opp-2400000000 {
+			opp-hz = /bits/ 64 <2400000000>;
+		};
+
+		cpu4_opp21: opp-2476800000 {
+			opp-hz = /bits/ 64 <2476800000>;
+		};
+
+		cpu4_opp22: opp-2553600000 {
+			opp-hz = /bits/ 64 <2553600000>;
+		};
+
+		cpu4_opp23: opp-2649600000 {
+			opp-hz = /bits/ 64 <2649600000>;
+		};
+	};
+
 	pmu {
 		compatible = "arm,armv8-pmuv3";
 		interrupts = <GIC_PPI 5 IRQ_TYPE_LEVEL_HIGH>;
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH RFC 9/9] arm64: dts: qcom: sdm845: Add nodes for icbw driver and opp tables
  2019-03-28 15:28 [PATCH RFC 0/9] Add CPU based scaling support to Passive governor Sibi Sankar
                   ` (7 preceding siblings ...)
  2019-03-28 15:28 ` [PATCH RFC 8/9] arm64: dts: qcom: sdm845: Add cpu " Sibi Sankar
@ 2019-03-28 15:28 ` Sibi Sankar
  2019-04-11  7:02 ` [PATCH RFC 0/9] Add CPU based scaling support to Passive governor Sibi Sankar
  9 siblings, 0 replies; 21+ messages in thread
From: Sibi Sankar @ 2019-03-28 15:28 UTC (permalink / raw)
  To: robh+dt, andy.gross, myungjoo.ham, kyungmin.park, rjw,
	viresh.kumar, nm, sboyd, georgi.djakov
  Cc: bjorn.andersson, david.brown, mark.rutland, linux-kernel,
	linux-arm-msm-owner, devicetree, rnayak, cw00.choi, linux-pm,
	evgreen, daidavid1, dianders, Sibi Sankar

Add nodes to enable DDR devfreq driver on SDM845 SoC.

Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
---
 arch/arm64/boot/dts/qcom/sdm845.dtsi | 80 ++++++++++++++++++++++++++++
 1 file changed, 80 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
index 072563f6b6cb..21a0068855e8 100644
--- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
@@ -11,6 +11,7 @@
 #include <dt-bindings/clock/qcom,lpass-sdm845.h>
 #include <dt-bindings/clock/qcom,rpmh.h>
 #include <dt-bindings/clock/qcom,videocc-sdm845.h>
+#include <dt-bindings/interconnect/qcom,sdm845.h>
 #include <dt-bindings/interrupt-controller/arm-gic.h>
 #include <dt-bindings/phy/phy-qcom-qusb2.h>
 #include <dt-bindings/power/qcom-aoss-qmp.h>
@@ -488,6 +489,85 @@
 		interrupts = <GIC_PPI 5 IRQ_TYPE_LEVEL_HIGH>;
 	};
 
+	cpubw: cpu-icbw {
+		compatible = "devfreq-icbw";
+		interconnects = <&rsc_hlos MASTER_APPSS_PROC &rsc_hlos SLAVE_EBI1>;
+		operating-points-v2 = <&bw_opp_table>;
+	};
+
+	bw_opp_table: bw-opp-table {
+		compatible = "operating-points-v2";
+
+		opp-200  {
+			opp-hz = /bits/ 64 < 200000000 >; /* 200 MHz */
+			required-opps = <&cpu0_opp1>;
+			/* 0 MB/s average and 762 MB/s peak bandwidth */
+			opp-bw-MBs = <0 762>;
+		};
+
+		opp-300 {
+			opp-hz = /bits/ 64 < 300000000 >; /* 300 MHz */
+			/* 0 MB/s average and 1144 MB/s peak bandwidth */
+			opp-bw-MBs = <0 1144>;
+		};
+
+		opp-451 {
+			opp-hz = /bits/ 64 < 451000000 >; /* 451 MHz */
+			/* 0 MB/s average and 1720 MB/s peak bandwidth */
+			opp-bw-MBs = <0 1720>;
+			required-opps = <&cpu0_opp6>;
+		};
+
+		opp-547 {
+			opp-hz = /bits/ 64 < 547000000 >; /* 547 MHz */
+			/* 0 MB/s average and 2086 MB/s peak bandwidth */
+			opp-bw-MBs = <0 2086>;
+			required-opps = <&cpu0_opp11>;
+		};
+
+		opp-681 {
+			opp-hz = /bits/ 64 < 681000000 >; /* 681 MHz */
+			/* 0 MB/s average and 2597 MB/s peak bandwidth */
+			opp-bw-MBs = <0 2597>;
+			required-opps = <&cpu0_opp15>;
+		};
+
+		opp-768 {
+			opp-hz = /bits/ 64 < 768000000 >; /* 768 MHz */
+			/* 0 MB/s average and 2929 MB/s peak bandwidth */
+			opp-bw-MBs = <0 2929>;
+			required-opps = <&cpu4_opp4>;
+		};
+
+		opp-1017 {
+			opp-hz = /bits/ 64 < 1017000000 >; /* 1017 MHz */
+			/* 0 MB/s average and 3879 MB/s peak bandwidth */
+			opp-bw-MBs = <0 3879>;
+			required-opps = <&cpu0_opp16>, <&cpu4_opp5>;
+		};
+
+		opp-1296 {
+			opp-hz = /bits/ 64 < 1296000000 >; /* 1296 MHz */
+			/* 0 MB/s average and 4943 MB/s peak bandwidth */
+			opp-bw-MBs = <0 4943>;
+			required-opps = <&cpu4_opp10>;
+		};
+
+		opp-1555 {
+			opp-hz = /bits/ 64 < 1555000000 >; /* 1555 MHz */
+			/* 0 MB/s average and 5931 MB/s peak bandwidth */
+			opp-bw-MBs = <0 5931>;
+			required-opps = <&cpu4_opp12>;
+		};
+
+		opp-1804 {
+			opp-hz = /bits/ 64 < 1804000000 >; /* 1804 MHz */
+			/* 0 MB/s average and 6881 MB/s peak bandwidth */
+			opp-bw-MBs = <0 6881>;
+			required-opps = <&cpu4_opp15>;
+		};
+	};
+
 	timer {
 		compatible = "arm,armv8-timer";
 		interrupts = <GIC_PPI 1 IRQ_TYPE_LEVEL_LOW>,
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC 6/9] OPP: Add and export helper to update voltage
  2019-03-28 15:28 ` [PATCH RFC 6/9] OPP: Add and export helper to update voltage Sibi Sankar
@ 2019-04-10 10:24   ` Viresh Kumar
  2019-04-10 11:08     ` Sibi Sankar
  0 siblings, 1 reply; 21+ messages in thread
From: Viresh Kumar @ 2019-04-10 10:24 UTC (permalink / raw)
  To: Sibi Sankar
  Cc: robh+dt, andy.gross, myungjoo.ham, kyungmin.park, rjw, nm, sboyd,
	georgi.djakov, bjorn.andersson, david.brown, mark.rutland,
	linux-kernel, linux-arm-msm-owner, devicetree, rnayak, cw00.choi,
	linux-pm, evgreen, daidavid1, dianders

On 28-03-19, 20:58, Sibi Sankar wrote:
> Add and export 'dev_pm_opp_update_voltage' to find and update voltage
> of an opp for a given frequency. This will be useful to update the opps
> with voltages read back from firmware.
> 
> Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
> ---
>  drivers/opp/core.c     | 62 ++++++++++++++++++++++++++++++++++++++++++
>  include/linux/pm_opp.h | 10 +++++++
>  2 files changed, 72 insertions(+)
> 
> diff --git a/drivers/opp/core.c b/drivers/opp/core.c
> index addaf7aae9ae..c066cd120a33 100644
> --- a/drivers/opp/core.c
> +++ b/drivers/opp/core.c
> @@ -2090,6 +2090,68 @@ int dev_pm_opp_disable(struct device *dev, unsigned long freq)
>  }
>  EXPORT_SYMBOL_GPL(dev_pm_opp_disable);
>  
> +static int _opp_update_voltage(struct device *dev, unsigned long freq,
> +			       unsigned long u_volt)
> +{
> +	struct opp_table *opp_table;
> +	struct dev_pm_opp *tmp_opp, *opp = ERR_PTR(-ENODEV);
> +	int r = 0;

s/r/ret/

> +
> +	/* Find the opp_table */
> +	opp_table = _find_opp_table(dev);
> +	if (IS_ERR(opp_table)) {
> +		r = PTR_ERR(opp_table);
> +		dev_warn(dev, "%s: Device OPP not found (%d)\n", __func__, r);

Why dev_warn instead of dev_err which is normally used in this file ? And you
weren't searching for OPP but opp_table. Put out the correct error message
please.

> +		return r;
> +	}
> +
> +	mutex_lock(&opp_table->lock);
> +
> +	/* Do we have the frequency? */
> +	list_for_each_entry(tmp_opp, &opp_table->opp_list, node) {
> +		if (tmp_opp->rate == freq) {
> +			opp = tmp_opp;
> +			break;
> +		}
> +	}
> +
> +	if (IS_ERR(opp)) {
> +		r = PTR_ERR(opp);
> +		goto unlock;
> +	}
> +
> +	/* update only if the opp is disabled */
> +	if (opp->available)
> +		goto unlock;
> +

What about reusing dev_pm_opp_find_freq_exact() instead of the above code in
this routine ?

> +	opp->supplies[0].u_volt = u_volt;
> +
> +unlock:
> +	mutex_unlock(&opp_table->lock);
> +	dev_pm_opp_put_opp_table(opp_table);
> +	return r;
> +}
> +
> +/**
> + * dev_pm_opp_update_voltage() - find and update voltage of an opp
> + *				 for a given frequency
> + * @dev:	device for which we do this operation
> + * @freq:	OPP frequency to update voltage
> + * @u_volt:	voltage requested for this opp
> + *
> + * This is useful only for devices with single power supply.
> + *
> + * Return: -EINVAL for bad pointers, -ENOMEM if no memory available for the
> + * copy operation, returns 0 if no modification was done OR modification was
> + * successful.
> + */
> +int dev_pm_opp_update_voltage(struct device *dev, unsigned long freq,
> +			      unsigned long u_volt)
> +{
> +	return _opp_update_voltage(dev, freq, u_volt);

Remove the unnecessary wrapper and open code this routine here. While at it, you
should updated min/max/target and not just target voltage.

> +}
> +EXPORT_SYMBOL_GPL(dev_pm_opp_update_voltage);
> +
>  /**
>   * dev_pm_opp_register_notifier() - Register OPP notifier for the device
>   * @dev:	Device for which notifier needs to be registered
> diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h
> index d7cb0e65c4f0..58490f6839ce 100644
> --- a/include/linux/pm_opp.h
> +++ b/include/linux/pm_opp.h
> @@ -130,6 +130,9 @@ int dev_pm_opp_enable(struct device *dev, unsigned long freq);
>  
>  int dev_pm_opp_disable(struct device *dev, unsigned long freq);
>  
> +int dev_pm_opp_update_voltage(struct device *dev, unsigned long freq,
> +			      unsigned long u_volt);
> +
>  int dev_pm_opp_register_notifier(struct device *dev, struct notifier_block *nb);
>  int dev_pm_opp_unregister_notifier(struct device *dev, struct notifier_block *nb);
>  
> @@ -265,6 +268,13 @@ static inline int dev_pm_opp_disable(struct device *dev, unsigned long freq)
>  	return 0;
>  }
>  
> +static inline int dev_pm_opp_update_voltage(struct device *dev,
> +					    unsigned long freq,
> +					    unsigned long u_volt)
> +{
> +	return 0;
> +}
> +
>  static inline int dev_pm_opp_register_notifier(struct device *dev, struct notifier_block *nb)
>  {
>  	return -ENOTSUPP;
> -- 
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project

-- 
viresh

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC 7/9] cpufreq: qcom: Add support to update cpu node's OPP tables
  2019-03-28 15:28 ` [PATCH RFC 7/9] cpufreq: qcom: Add support to update cpu node's OPP tables Sibi Sankar
@ 2019-04-10 10:33   ` Viresh Kumar
  2019-04-10 11:16     ` Sibi Sankar
  0 siblings, 1 reply; 21+ messages in thread
From: Viresh Kumar @ 2019-04-10 10:33 UTC (permalink / raw)
  To: Sibi Sankar
  Cc: robh+dt, andy.gross, myungjoo.ham, kyungmin.park, rjw, nm, sboyd,
	georgi.djakov, bjorn.andersson, david.brown, mark.rutland,
	linux-kernel, linux-arm-msm-owner, devicetree, rnayak, cw00.choi,
	linux-pm, evgreen, daidavid1, dianders

On 28-03-19, 20:58, Sibi Sankar wrote:
> Add support to parse and update OPP tables attached to the cpu nodes.
> 
> Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
> ---
>  drivers/cpufreq/qcom-cpufreq-hw.c | 29 +++++++++++++++++++++++++++--
>  1 file changed, 27 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
> index 4b0b50403901..5c268dd2346c 100644
> --- a/drivers/cpufreq/qcom-cpufreq-hw.c
> +++ b/drivers/cpufreq/qcom-cpufreq-hw.c
> @@ -73,6 +73,25 @@ static unsigned int qcom_cpufreq_hw_fast_switch(struct cpufreq_policy *policy,
>  	return policy->freq_table[index].frequency;
>  }
>  
> +static int qcom_find_update_opp(struct device *cpu_dev, unsigned long freq,
> +				unsigned long volt)
> +{
> +	int ret;
> +	struct dev_pm_opp *opp;
> +
> +	opp = dev_pm_opp_find_freq_exact(cpu_dev, freq, true);
> +	if (IS_ERR(opp)) {
> +		ret = dev_pm_opp_add(cpu_dev, freq, volt);

With my comment on the other patch, you can just call
dev_pm_opp_update_voltage() and if that fails then call dev_pm_opp_add().

> +	} else {
> +		dev_pm_opp_disable(cpu_dev, freq);
> +		ret = dev_pm_opp_update_voltage(cpu_dev, freq, volt);
> +		dev_pm_opp_enable(cpu_dev, freq);

Perhaps no one else should be using the CPU OPP table at this point of time and
so we can get away with disable and enable stuff ? Just add a comment on why
that works.

> +		dev_pm_opp_put(opp);
> +	}
> +
> +	return ret;
> +}
> +
>  static int qcom_cpufreq_hw_read_lut(struct device *cpu_dev,
>  				    struct cpufreq_policy *policy,
>  				    void __iomem *base)
> @@ -81,11 +100,16 @@ static int qcom_cpufreq_hw_read_lut(struct device *cpu_dev,
>  	u32 volt;
>  	unsigned int max_cores = cpumask_weight(policy->cpus);
>  	struct cpufreq_frequency_table	*table;
> +	int ret;
>  
>  	table = kcalloc(LUT_MAX_ENTRIES + 1, sizeof(*table), GFP_KERNEL);
>  	if (!table)
>  		return -ENOMEM;
>  
> +	ret = dev_pm_opp_of_add_table(cpu_dev);
> +	if (ret)
> +		dev_dbg(cpu_dev, "Couldn't add OPP table\n");
> +
>  	for (i = 0; i < LUT_MAX_ENTRIES; i++) {
>  		data = readl_relaxed(base + REG_FREQ_LUT +
>  				      i * LUT_ROW_SIZE);
> @@ -104,7 +128,7 @@ static int qcom_cpufreq_hw_read_lut(struct device *cpu_dev,
>  
>  		if (freq != prev_freq && core_count == max_cores) {
>  			table[i].frequency = freq;
> -			dev_pm_opp_add(cpu_dev, freq * 1000, volt);
> +			qcom_find_update_opp(cpu_dev, freq * 1000, volt);
>  			dev_dbg(cpu_dev, "index=%d freq=%d, core_count %d\n", i,
>  				freq, core_count);
>  		} else {
> @@ -125,7 +149,8 @@ static int qcom_cpufreq_hw_read_lut(struct device *cpu_dev,
>  			if (prev_cc != max_cores) {
>  				prev->frequency = prev_freq;
>  				prev->flags = CPUFREQ_BOOST_FREQ;
> -				dev_pm_opp_add(cpu_dev,	prev_freq * 1000, volt);
> +				qcom_find_update_opp(cpu_dev, prev_freq * 1000,
> +						     volt);
>  			}
>  
>  			break;
> -- 
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project

-- 
viresh

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC 6/9] OPP: Add and export helper to update voltage
  2019-04-10 10:24   ` Viresh Kumar
@ 2019-04-10 11:08     ` Sibi Sankar
  0 siblings, 0 replies; 21+ messages in thread
From: Sibi Sankar @ 2019-04-10 11:08 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: robh+dt, andy.gross, myungjoo.ham, kyungmin.park, rjw, nm, sboyd,
	georgi.djakov, bjorn.andersson, david.brown, mark.rutland,
	linux-kernel, linux-arm-msm-owner, devicetree, rnayak, cw00.choi,
	linux-pm, evgreen, daidavid1, dianders

Hey Viresh,
Thanks for the review!

On 2019-04-10 15:54, Viresh Kumar wrote:
> On 28-03-19, 20:58, Sibi Sankar wrote:
>> Add and export 'dev_pm_opp_update_voltage' to find and update voltage
>> of an opp for a given frequency. This will be useful to update the 
>> opps
>> with voltages read back from firmware.
>> 
>> Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
>> ---
>>  drivers/opp/core.c     | 62 
>> ++++++++++++++++++++++++++++++++++++++++++
>>  include/linux/pm_opp.h | 10 +++++++
>>  2 files changed, 72 insertions(+)
>> 
>> diff --git a/drivers/opp/core.c b/drivers/opp/core.c
>> index addaf7aae9ae..c066cd120a33 100644
>> --- a/drivers/opp/core.c
>> +++ b/drivers/opp/core.c
>> @@ -2090,6 +2090,68 @@ int dev_pm_opp_disable(struct device *dev, 
>> unsigned long freq)
>>  }
>>  EXPORT_SYMBOL_GPL(dev_pm_opp_disable);
>> 
>> +static int _opp_update_voltage(struct device *dev, unsigned long 
>> freq,
>> +			       unsigned long u_volt)
>> +{
>> +	struct opp_table *opp_table;
>> +	struct dev_pm_opp *tmp_opp, *opp = ERR_PTR(-ENODEV);
>> +	int r = 0;
> 
> s/r/ret/
> 
>> +
>> +	/* Find the opp_table */
>> +	opp_table = _find_opp_table(dev);
>> +	if (IS_ERR(opp_table)) {
>> +		r = PTR_ERR(opp_table);
>> +		dev_warn(dev, "%s: Device OPP not found (%d)\n", __func__, r);
> 
> Why dev_warn instead of dev_err which is normally used in this file ? 
> And you
> weren't searching for OPP but opp_table. Put out the correct error 
> message
> please.

yeah I'll use dev_err (I'll fix the same in
_opp_set_availability too)

> 
>> +		return r;
>> +	}
>> +
>> +	mutex_lock(&opp_table->lock);
>> +
>> +	/* Do we have the frequency? */
>> +	list_for_each_entry(tmp_opp, &opp_table->opp_list, node) {
>> +		if (tmp_opp->rate == freq) {
>> +			opp = tmp_opp;
>> +			break;
>> +		}
>> +	}
>> +
>> +	if (IS_ERR(opp)) {
>> +		r = PTR_ERR(opp);
>> +		goto unlock;
>> +	}
>> +
>> +	/* update only if the opp is disabled */
>> +	if (opp->available)
>> +		goto unlock;
>> +
> 
> What about reusing dev_pm_opp_find_freq_exact() instead of the above 
> code in
> this routine ?

yes I'll use that

> 
>> +	opp->supplies[0].u_volt = u_volt;
>> +
>> +unlock:
>> +	mutex_unlock(&opp_table->lock);
>> +	dev_pm_opp_put_opp_table(opp_table);
>> +	return r;
>> +}
>> +
>> +/**
>> + * dev_pm_opp_update_voltage() - find and update voltage of an opp
>> + *				 for a given frequency
>> + * @dev:	device for which we do this operation
>> + * @freq:	OPP frequency to update voltage
>> + * @u_volt:	voltage requested for this opp
>> + *
>> + * This is useful only for devices with single power supply.
>> + *
>> + * Return: -EINVAL for bad pointers, -ENOMEM if no memory available 
>> for the
>> + * copy operation, returns 0 if no modification was done OR 
>> modification was
>> + * successful.
>> + */
>> +int dev_pm_opp_update_voltage(struct device *dev, unsigned long freq,
>> +			      unsigned long u_volt)
>> +{
>> +	return _opp_update_voltage(dev, freq, u_volt);
> 
> Remove the unnecessary wrapper and open code this routine here. While 
> at it, you
> should updated min/max/target and not just target voltage.

will fix this in the next-respin

> 
>> +}
>> +EXPORT_SYMBOL_GPL(dev_pm_opp_update_voltage);
>> +
>>  /**
>>   * dev_pm_opp_register_notifier() - Register OPP notifier for the 
>> device
>>   * @dev:	Device for which notifier needs to be registered
>> diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h
>> index d7cb0e65c4f0..58490f6839ce 100644
>> --- a/include/linux/pm_opp.h
>> +++ b/include/linux/pm_opp.h
>> @@ -130,6 +130,9 @@ int dev_pm_opp_enable(struct device *dev, unsigned 
>> long freq);
>> 
>>  int dev_pm_opp_disable(struct device *dev, unsigned long freq);
>> 
>> +int dev_pm_opp_update_voltage(struct device *dev, unsigned long freq,
>> +			      unsigned long u_volt);
>> +
>>  int dev_pm_opp_register_notifier(struct device *dev, struct 
>> notifier_block *nb);
>>  int dev_pm_opp_unregister_notifier(struct device *dev, struct 
>> notifier_block *nb);
>> 
>> @@ -265,6 +268,13 @@ static inline int dev_pm_opp_disable(struct 
>> device *dev, unsigned long freq)
>>  	return 0;
>>  }
>> 
>> +static inline int dev_pm_opp_update_voltage(struct device *dev,
>> +					    unsigned long freq,
>> +					    unsigned long u_volt)
>> +{
>> +	return 0;
>> +}
>> +
>>  static inline int dev_pm_opp_register_notifier(struct device *dev, 
>> struct notifier_block *nb)
>>  {
>>  	return -ENOTSUPP;
>> --
>> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
>> Forum,
>> a Linux Foundation Collaborative Project

-- 
-- Sibi Sankar --
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC 7/9] cpufreq: qcom: Add support to update cpu node's OPP tables
  2019-04-10 10:33   ` Viresh Kumar
@ 2019-04-10 11:16     ` Sibi Sankar
  2019-04-10 11:25       ` Viresh Kumar
  0 siblings, 1 reply; 21+ messages in thread
From: Sibi Sankar @ 2019-04-10 11:16 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: robh+dt, andy.gross, myungjoo.ham, kyungmin.park, rjw, nm, sboyd,
	georgi.djakov, bjorn.andersson, david.brown, mark.rutland,
	linux-kernel, linux-arm-msm-owner, devicetree, rnayak, cw00.choi,
	linux-pm, evgreen, daidavid1, dianders

On 2019-04-10 16:03, Viresh Kumar wrote:
> On 28-03-19, 20:58, Sibi Sankar wrote:
>> Add support to parse and update OPP tables attached to the cpu nodes.
>> 
>> Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
>> ---
>>  drivers/cpufreq/qcom-cpufreq-hw.c | 29 +++++++++++++++++++++++++++--
>>  1 file changed, 27 insertions(+), 2 deletions(-)
>> 
>> diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c 
>> b/drivers/cpufreq/qcom-cpufreq-hw.c
>> index 4b0b50403901..5c268dd2346c 100644
>> --- a/drivers/cpufreq/qcom-cpufreq-hw.c
>> +++ b/drivers/cpufreq/qcom-cpufreq-hw.c
>> @@ -73,6 +73,25 @@ static unsigned int 
>> qcom_cpufreq_hw_fast_switch(struct cpufreq_policy *policy,
>>  	return policy->freq_table[index].frequency;
>>  }
>> 
>> +static int qcom_find_update_opp(struct device *cpu_dev, unsigned long 
>> freq,
>> +				unsigned long volt)
>> +{
>> +	int ret;
>> +	struct dev_pm_opp *opp;
>> +
>> +	opp = dev_pm_opp_find_freq_exact(cpu_dev, freq, true);
>> +	if (IS_ERR(opp)) {
>> +		ret = dev_pm_opp_add(cpu_dev, freq, volt);
> 
> With my comment on the other patch, you can just call
> dev_pm_opp_update_voltage() and if that fails then call 
> dev_pm_opp_add().

yeah that should simplify things.

Also through the above approach I cannot
really disable opps that the OSM does not
support. I can only try enabling opp's that
OSM supports. But that would require all
opp's nodes to start with "disabled" but
that is not allowed I guess.

> 
>> +	} else {
>> +		dev_pm_opp_disable(cpu_dev, freq);
>> +		ret = dev_pm_opp_update_voltage(cpu_dev, freq, volt);
>> +		dev_pm_opp_enable(cpu_dev, freq);
> 
> Perhaps no one else should be using the CPU OPP table at this point of 
> time and
> so we can get away with disable and enable stuff ? Just add a comment 
> on why
> that works.

we can get away without enable/disable here

> 
>> +		dev_pm_opp_put(opp);
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>>  static int qcom_cpufreq_hw_read_lut(struct device *cpu_dev,
>>  				    struct cpufreq_policy *policy,
>>  				    void __iomem *base)
>> @@ -81,11 +100,16 @@ static int qcom_cpufreq_hw_read_lut(struct device 
>> *cpu_dev,
>>  	u32 volt;
>>  	unsigned int max_cores = cpumask_weight(policy->cpus);
>>  	struct cpufreq_frequency_table	*table;
>> +	int ret;
>> 
>>  	table = kcalloc(LUT_MAX_ENTRIES + 1, sizeof(*table), GFP_KERNEL);
>>  	if (!table)
>>  		return -ENOMEM;
>> 
>> +	ret = dev_pm_opp_of_add_table(cpu_dev);
>> +	if (ret)
>> +		dev_dbg(cpu_dev, "Couldn't add OPP table\n");
>> +
>>  	for (i = 0; i < LUT_MAX_ENTRIES; i++) {
>>  		data = readl_relaxed(base + REG_FREQ_LUT +
>>  				      i * LUT_ROW_SIZE);
>> @@ -104,7 +128,7 @@ static int qcom_cpufreq_hw_read_lut(struct device 
>> *cpu_dev,
>> 
>>  		if (freq != prev_freq && core_count == max_cores) {
>>  			table[i].frequency = freq;
>> -			dev_pm_opp_add(cpu_dev, freq * 1000, volt);
>> +			qcom_find_update_opp(cpu_dev, freq * 1000, volt);
>>  			dev_dbg(cpu_dev, "index=%d freq=%d, core_count %d\n", i,
>>  				freq, core_count);
>>  		} else {
>> @@ -125,7 +149,8 @@ static int qcom_cpufreq_hw_read_lut(struct device 
>> *cpu_dev,
>>  			if (prev_cc != max_cores) {
>>  				prev->frequency = prev_freq;
>>  				prev->flags = CPUFREQ_BOOST_FREQ;
>> -				dev_pm_opp_add(cpu_dev,	prev_freq * 1000, volt);
>> +				qcom_find_update_opp(cpu_dev, prev_freq * 1000,
>> +						     volt);
>>  			}
>> 
>>  			break;
>> --
>> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
>> Forum,
>> a Linux Foundation Collaborative Project

-- 
-- Sibi Sankar --
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC 7/9] cpufreq: qcom: Add support to update cpu node's OPP tables
  2019-04-10 11:16     ` Sibi Sankar
@ 2019-04-10 11:25       ` Viresh Kumar
  0 siblings, 0 replies; 21+ messages in thread
From: Viresh Kumar @ 2019-04-10 11:25 UTC (permalink / raw)
  To: Sibi Sankar
  Cc: robh+dt, andy.gross, myungjoo.ham, kyungmin.park, rjw, nm, sboyd,
	georgi.djakov, bjorn.andersson, david.brown, mark.rutland,
	linux-kernel, linux-arm-msm-owner, devicetree, rnayak, cw00.choi,
	linux-pm, evgreen, daidavid1, dianders

On 10-04-19, 16:46, Sibi Sankar wrote:
> On 2019-04-10 16:03, Viresh Kumar wrote:
> > On 28-03-19, 20:58, Sibi Sankar wrote:
> > > Add support to parse and update OPP tables attached to the cpu nodes.
> > > 
> > > Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
> > > ---
> > >  drivers/cpufreq/qcom-cpufreq-hw.c | 29 +++++++++++++++++++++++++++--
> > >  1 file changed, 27 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c
> > > b/drivers/cpufreq/qcom-cpufreq-hw.c
> > > index 4b0b50403901..5c268dd2346c 100644
> > > --- a/drivers/cpufreq/qcom-cpufreq-hw.c
> > > +++ b/drivers/cpufreq/qcom-cpufreq-hw.c
> > > @@ -73,6 +73,25 @@ static unsigned int
> > > qcom_cpufreq_hw_fast_switch(struct cpufreq_policy *policy,
> > >  	return policy->freq_table[index].frequency;
> > >  }
> > > 
> > > +static int qcom_find_update_opp(struct device *cpu_dev, unsigned
> > > long freq,
> > > +				unsigned long volt)
> > > +{
> > > +	int ret;
> > > +	struct dev_pm_opp *opp;
> > > +
> > > +	opp = dev_pm_opp_find_freq_exact(cpu_dev, freq, true);
> > > +	if (IS_ERR(opp)) {
> > > +		ret = dev_pm_opp_add(cpu_dev, freq, volt);
> > 
> > With my comment on the other patch, you can just call
> > dev_pm_opp_update_voltage() and if that fails then call
> > dev_pm_opp_add().
> 
> yeah that should simplify things.
> 
> Also through the above approach I cannot
> really disable opps that the OSM does not
> support. I can only try enabling opp's that
> OSM supports. But that would require all
> opp's nodes to start with "disabled" but
> that is not allowed I guess.

Yeah maybe add all OPPs from DT, disable all of them and then only modify and
enable the ones you need.

-- 
viresh

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC 0/9] Add CPU based scaling support to Passive governor
  2019-03-28 15:28 [PATCH RFC 0/9] Add CPU based scaling support to Passive governor Sibi Sankar
                   ` (8 preceding siblings ...)
  2019-03-28 15:28 ` [PATCH RFC 9/9] arm64: dts: qcom: sdm845: Add nodes for icbw driver and opp tables Sibi Sankar
@ 2019-04-11  7:02 ` Sibi Sankar
  9 siblings, 0 replies; 21+ messages in thread
From: Sibi Sankar @ 2019-04-11  7:02 UTC (permalink / raw)
  To: robh+dt, andy.gross, myungjoo.ham, kyungmin.park, rjw,
	viresh.kumar, nm, sboyd, georgi.djakov
  Cc: bjorn.andersson, david.brown, mark.rutland, linux-kernel,
	linux-arm-msm, devicetree, rnayak, cw00.choi, linux-pm, evgreen,
	daidavid1, dianders

On 2019-03-28 20:58, Sibi Sankar wrote:
> This RFC series aims to add cpu based scaling support to the passive
> governor and scale DDR with a generic interconnect bandwidth based
> devfreq driver on SDM845 SoC. This series achieves similar 
> functionality
> to Georgi's Patch series (https://patchwork.kernel.org/cover/10850817/)
> and can be used with MSM8916/MSM8996/QCS404 SoCs.
> 
> [patches 1,6 - Add and export export helpers to get avg/peak bandwidth 
> and
>  update voltage of an disabled opp respectively]
> 
> [patch 3 - Adds cpu based scaling support to passive governor]
> To achieve this, it listens to CPU frequency transition notifiers
> to keep itself up to date on the current CPU frequency.
> To decide the frequency of the device, the governor depends one of
> the following:
> * Constructs a CPU frequency to device frequency mapping table from
>   required-opps property of the devfreq device's opp_table
> * Scales the device frequency in proportion to the CPU frequency by
>   performing interpolation.

Had a discussion with Viresh and Georgi,
Viresh pointed out cpu based scaling can
be done in a better way by placing the
required-opps in the cpu opp table and
with some changes in the core so dropping
the idea of cpufreq integration into passive
governor for now.

> 
> [patch 7 - Parses and updates opps from the frequency/voltage read from
>  the look up tables]
> 
> The patch series depends on opp-bw-MBs bindings introduced in:
> https://patchwork.kernel.org/cover/10850817/
> 
> Saravana Kannan (2):
>   PM / devfreq: Add cpu based scaling support to passive_governor
>   PM / devfreq: Add devfreq driver for interconnect bandwidth voting
> 
> Sibi Sankar (7):
>   OPP: Add and export helpers to get avg/peak bw
>   OPP: Export a number of helpers to prevent code duplication
>   dt-bindings: devfreq: Add bindings for devfreq dev-icbw driver
>   OPP: Add and export helper to update voltage
>   cpufreq: qcom: Add support to update cpu node's OPP tables
>   arm64: dts: qcom: sdm845: Add cpu OPP tables
>   arm64: dts: qcom: sdm845: Add nodes for icbw driver and opp tables
> 
>  .../devicetree/bindings/devfreq/icbw.txt      | 146 +++++++++
>  arch/arm64/boot/dts/qcom/sdm845.dtsi          | 262 +++++++++++++++++
>  drivers/cpufreq/qcom-cpufreq-hw.c             |  29 +-
>  drivers/devfreq/Kconfig                       |  19 ++
>  drivers/devfreq/Makefile                      |   1 +
>  drivers/devfreq/devfreq_icbw.c                | 132 +++++++++
>  drivers/devfreq/governor_passive.c            | 276 +++++++++++++++++-
>  drivers/opp/core.c                            | 100 +++++++
>  drivers/opp/of.c                              |  13 +-
>  include/linux/devfreq.h                       |  43 ++-
>  include/linux/pm_opp.h                        |  35 +++
>  11 files changed, 1044 insertions(+), 12 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/devfreq/icbw.txt
>  create mode 100644 drivers/devfreq/devfreq_icbw.c

-- 
-- Sibi Sankar --
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC 3/9] PM / devfreq: Add cpu based scaling support to passive_governor
  2019-03-28 15:28 ` [PATCH RFC 3/9] PM / devfreq: Add cpu based scaling support to passive_governor Sibi Sankar
@ 2019-04-12  7:39   ` Chanwoo Choi
  2019-05-27  8:23     ` Sibi Sankar
  0 siblings, 1 reply; 21+ messages in thread
From: Chanwoo Choi @ 2019-04-12  7:39 UTC (permalink / raw)
  To: Sibi Sankar, robh+dt, andy.gross, myungjoo.ham, kyungmin.park,
	rjw, viresh.kumar, nm, sboyd, georgi.djakov
  Cc: bjorn.andersson, david.brown, mark.rutland, linux-kernel,
	linux-arm-msm-owner, devicetree, rnayak, linux-pm, evgreen,
	daidavid1, dianders, Saravana Kannan

Hi,

I agree this approach absolutely.
Just I add some comments. Please check it.

On 19. 3. 29. 오전 12:28, Sibi Sankar wrote:
> From: Saravana Kannan <skannan@codeaurora.org>
> 
> Many CPU architectures have caches that can scale independent of the
> CPUs. Frequency scaling of the caches is necessary to make sure the cache
> is not a performance bottleneck that leads to poor performance and
> power. The same idea applies for RAM/DDR.
> 
> To achieve this, this patch add support for cpu based scaling to the
> passive governor. This is accomplished by taking the current frequency
> of each CPU frequency domain and then adjusts the frequency of the cache
> (or any devfreq device) based on the frequency of the CPUs. It listens
> to CPU frequency transition notifiers to keep itself up to date on the
> current CPU frequency.
> 
> To decide the frequency of the device, the governor does one of the
> following:
> * Constructs a CPU frequency to device frequency mapping table from
>   required-opps property of the devfreq device's opp_table
> 
> * Scales the device frequency in proportion to the CPU frequency. So, if
>   the CPUs are running at their max frequency, the device runs at its
>   max frequency. If the CPUs are running at their min frequency, the
>   device runs at its min frequency. It is interpolated for frequencies
>   in between.
> 
> Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
> [Sibi: Integrated cpu-freqmap governor into passive_governor]
> Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
> ---
>  drivers/devfreq/Kconfig            |   4 +
>  drivers/devfreq/governor_passive.c | 276 ++++++++++++++++++++++++++++-
>  include/linux/devfreq.h            |  43 ++++-
>  3 files changed, 315 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/devfreq/Kconfig b/drivers/devfreq/Kconfig
> index 6a172d338f6d..9a45f464a56b 100644
> --- a/drivers/devfreq/Kconfig
> +++ b/drivers/devfreq/Kconfig
> @@ -72,6 +72,10 @@ config DEVFREQ_GOV_PASSIVE
>  	  device. This governor does not change the frequency by itself
>  	  through sysfs entries. The passive governor recommends that
>  	  devfreq device uses the OPP table to get the frequency/voltage.
> +	  Alternatively the governor can also be chosen to scale based on
> +	  the online CPUs current frequency. A CPU frequency to device
> +	  frequency mapping table(s) is auto-populated by the governor
> +	  for this purpose.
>  
>  comment "DEVFREQ Drivers"
>  
> diff --git a/drivers/devfreq/governor_passive.c b/drivers/devfreq/governor_passive.c
> index 3bc29acbd54e..2506682b233b 100644
> --- a/drivers/devfreq/governor_passive.c
> +++ b/drivers/devfreq/governor_passive.c
> @@ -11,10 +11,63 @@
>   */
>  
>  #include <linux/module.h>
> +#include <linux/cpu.h>
> +#include <linux/cpufreq.h>
> +#include <linux/cpumask.h>
>  #include <linux/device.h>
>  #include <linux/devfreq.h>
> +#include <linux/of.h>
> +#include <linux/slab.h>
>  #include "governor.h"
>  
> +static unsigned int xlate_cpufreq_to_devfreq(struct devfreq_passive_data *data,
> +					     unsigned int cpu)
> +{
> +	unsigned int cpu_min, cpu_max;
> +	struct devfreq *devfreq = (struct devfreq *)data->this;
> +	unsigned int dev_min, dev_max, cpu_percent, cpu_freq = 0, freq = 0;
> +	unsigned long *freq_table = devfreq->profile->freq_table;
> +	struct device *dev = devfreq->dev.parent;
> +	struct devfreq_map *map;
> +	int opp_cnt, i;
> +
> +	if (!data->state[cpu] || data->state[cpu]->first_cpu != cpu) {
> +		freq = 0;
> +		goto out;

goto out -> return 0;

> +	}
> +
> +	/* Use Interpolation if map is not available */
> +	cpu_freq = data->state[cpu]->freq;
> +	if (!data->map) {
> +		cpu_min = data->state[cpu]->min_freq;
> +		cpu_max = data->state[cpu]->max_freq;
> +		if (freq_table) {
> +			dev_min = freq_table[0];
> +			dev_max = freq_table[devfreq->profile->max_state - 1];

Actually, it is not always true. The devfreq recommend the ascending order for
'freq_table'. But, it is not mandatory. Also, some devfreq device uses the
decending order for 'freq_table'. So, a patch[1] was considering the order
when getting the minimum/maximum frequency from freq_table. 

If you want to get the minimum/maximum frequency, you have to consider the order
of 'freq_table' as the patch[1].

[1] df5cf4a36178 ("PM / devfreq: Fix handling of min/max_freq == 0")

             /* Get minimum frequency according to sorting order */
+               if (freq_table[0] < freq_table[df->profile->max_state - 1])
+                       value = freq_table[0];
+               else
+                       value = freq_table[df->profile->max_state - 1];


> +		} else {
> +			if (devfreq->max_freq <= devfreq->min_freq)
> +				return 0;
> +			dev_min = devfreq->min_freq;
> +			dev_max = devfreq->max_freq;
> +		}
> +
> +		cpu_percent = ((cpu_freq - cpu_min) * 100) / cpu_max - cpu_min;
> +		freq = dev_min + mult_frac(dev_max - dev_min, cpu_percent, 100);
> +		goto out;

You don't need to jump 'out'. Instead, you better to use the 'else' statement
for if data->map is not NULL. I think that almost case when using this patch
will be available of data->map. In order to skip the likely 'false' statement,
I recommend the following sequence.

	if (data->map) {
		map = data->map[cpu];
		...
	} else {
		/* Use Interpolation if map is not available */
	}


> +	}
> +
> +	map = data->map[cpu];
> +	opp_cnt = dev_pm_opp_get_opp_count(dev);
> +	for (i = 0; i < opp_cnt; i++) {
> +		freq = max(freq, map[i].dev_hz);
> +		if (map[i].cpu_khz >= cpu_freq)
> +			break;
> +	}
> +out:
> +	dev_dbg(dev, "CPU%u: %d -> dev: %u\n", cpu, cpu_freq, freq);

IMO, I think it is not necessary. If you want to print log, you better to print
it on device driver instead of governor.

> +	return freq;
> +}
> +
>  static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>  					unsigned long *freq)
>  {
> @@ -23,6 +76,7 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>  	struct devfreq *parent_devfreq = (struct devfreq *)p_data->parent;
>  	unsigned long child_freq = ULONG_MAX;
>  	struct dev_pm_opp *opp;
> +	unsigned int cpu, tgt_freq = 0;

tgt means 'target'? If right, just use target_freq intead of 'tgt_freq'
for the readability.

>  	int i, count, ret = 0;
>  
>  	/*
> @@ -35,6 +89,14 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>  		goto out;
>  	}
>  
> +	if (p_data->cpufreq_type) {
> +		for_each_possible_cpu(cpu)
> +			tgt_freq = max(tgt_freq,
> +				       xlate_cpufreq_to_devfreq(p_data, cpu));
> +		*freq = tgt_freq;

You better to change from 'tgt_freq' to 'target_freq' for the readability.

> +		goto out;
> +	}

I think that 'goto out' using is not proper for supporting two case.
Instead, you better to split out as following according to the type
of parent device (devfreq device or cpufreq device).

	switch (p_data->parent_type) {
	case DEVFREQ_PARENT_DEV:
		ret = get_target_freq_with_devfreq() 
		break;
	case CPUFREQ_PARENT_DEV:
		ret = get_target_freq_with_cpufreq()
		break;
	default:
		dev_err(...)
		ret = -EINVAL;
		goto out;
	}
		
	if (ret < 0) {
		/* exception handling for 'ret' value */
	}

> +
>  	/*
>  	 * If the parent and passive devfreq device uses the OPP table,
>  	 * get the next frequency by using the OPP table.
> @@ -149,6 +211,200 @@ static int devfreq_passive_notifier_call(struct notifier_block *nb,
>  	return NOTIFY_DONE;
>  }
>  
> +static int cpufreq_passive_notifier_call(struct notifier_block *nb,
> +					 unsigned long event, void *ptr)
> +{
> +	struct devfreq_passive_data *data =
> +			container_of(nb, struct devfreq_passive_data, nb);
> +	struct devfreq *devfreq = (struct devfreq *)data->this;
> +	struct cpufreq_freqs *freq = ptr;
> +	struct devfreq_cpu_state *state;

nitpick. how about using 'cpu_state' instead of 'state'?
in order to get the meaning from just variable name.

> +	int ret = 0;
> +
> +	if (event != CPUFREQ_POSTCHANGE)
> +		goto out;

just 'return' is simple instead of 'goto out' because this case  
don't need to treat the any restoring code. And also, you have
to check whether freq is NULL or not as following:

	if (event != CPUFREQ_POSTCHANGE || !freq || data->state[freq->cpu])
		return ret;
	state = data->state[freq->cpu];

> +
> +	state = data->state[freq->cpu];
> +	if (!state)
> +		goto out;
> +
> +	if (state->freq != freq->new) {
> +		state->freq = freq->new;

You have to update the frequency after update_devfreq() is completed without error.

> +		mutex_lock(&devfreq->lock);
> +		ret = update_devfreq(devfreq);
> +		mutex_unlock(&devfreq->lock);
> +		if (ret)
> +			dev_err(&devfreq->dev, "Frequency update failed.\n");

Almost devfreq error used the following format: "Couldn't ..." .
If there is no any specific reason to change the format for error log,
	"Couldnt update the frequency.\n" 

> +	}> +out:
> +	return ret;

Also, we can reduce the unneeded indentation as following:

	if (state->freq == freq->new)
		return ret;
	
	mutex_lock(&devfreq->lock);
	ret = update_devfreq(devfreq);
	mutex_unlock(&devfreq->lock);
	if (ret) {
		dev_err(&devfreq->dev, "Couldnt update the frequency.\n");
		return ret;
	}
	state->freq = freq->new;

	return 0;

> +}
> +
> +static int cpufreq_passive_register(struct devfreq_passive_data **p_data)
> +{
> +	unsigned int cpu;
> +	struct devfreq_map **cpu_map;
> +	struct devfreq_map *map, *per_cpu_map;
> +	struct devfreq_passive_data *data = *p_data;
> +	struct devfreq *devfreq = (struct devfreq *)data->this;
> +	int i, count = 0, opp_cnt = 0, ret = 0, iter_val = 0;
> +	struct device_node *np, *opp_table_np, *cpu_np;
> +	struct opp_table *opp_table, *cpu_opp_tbl;
> +	struct device *dev = devfreq->dev.parent;
> +	struct devfreq_cpu_state *state;
> +	struct dev_pm_opp *opp, *cpu_opp;
> +	struct cpufreq_policy *policy;
> +	struct device *cpu_dev;
> +	u64 cpu_khz, dev_hz;
> +
> +	get_online_cpus();
> +	data->nb.notifier_call = cpufreq_passive_notifier_call;
> +	ret = cpufreq_register_notifier(&data->nb,
> +					CPUFREQ_TRANSITION_NOTIFIER);
> +	if (ret)
> +		return ret;
> +
> +	/* Populate devfreq_cpu_state */
> +	for_each_online_cpu(cpu) {
> +		if (data->state[cpu])
> +			continue;
> +
> +		policy = cpufreq_cpu_get(cpu);
> +		if (policy) {
> +			state = kzalloc(sizeof(*state), GFP_KERNEL);
> +			if (!state)
> +				return -ENOMEM;
> +
> +			state->first_cpu = cpumask_first(policy->related_cpus);
> +			state->freq = policy->cur;
> +			state->min_freq = policy->cpuinfo.min_freq;
> +			state->max_freq = policy->cpuinfo.max_freq;
> +			data->state[cpu] = state;
> +			cpufreq_cpu_put(policy);
> +		} else {
> +			return -EPROBE_DEFER;
> +		}
> +	}
> +
> +	opp_table_np = dev_pm_opp_of_get_opp_desc_node(dev);
> +	if (!opp_table_np)
> +		goto out;
> +
> +	opp_cnt = dev_pm_opp_get_opp_count(dev);
> +	if (opp_cnt <= 0)
> +		goto put_opp_table;
> +
> +	/* Allocate memory for devfreq_map*/
> +	cpu_map = kcalloc(num_possible_cpus(), sizeof(*cpu_map), GFP_KERNEL);
> +	if (!cpu_map)
> +		return -ENOMEM;
> +
> +	for_each_possible_cpu(cpu) {
> +		per_cpu_map = kcalloc(opp_cnt, sizeof(*per_cpu_map),
> +				      GFP_KERNEL);
> +		if (!per_cpu_map)
> +			return -ENOMEM;
> +		cpu_map[cpu] = per_cpu_map;
> +	}
> +	data->map = cpu_map;
> +
> +	/* Populate devfreq_map */
> +	opp_table = dev_pm_opp_get_opp_table(dev);
> +	if (!opp_table)
> +		return -ENOMEM;
> +
> +	for_each_available_child_of_node(opp_table_np, np) {
> +		opp = dev_pm_opp_find_opp_of_np(opp_table, np);
> +		if (IS_ERR(opp))
> +			continue;
> +
> +		dev_hz = dev_pm_opp_get_freq(opp);
> +		dev_pm_opp_put(opp);
> +
> +		count = of_count_phandle_with_args(np, "required-opps", NULL);
> +		for (i = 0; i < count; i++) {
> +			for_each_possible_cpu(cpu) {
> +				cpu_dev = get_cpu_device(cpu);
> +				if (!cpu_dev) {
> +					dev_err(dev, "CPU get device failed.\n");
> +					continue;
> +				}
> +
> +				cpu_np = of_parse_required_opp(np, i);
> +				if (!cpu_np) {
> +					dev_err(dev, "Parsing required opp failed.\n");
> +					continue;
> +				}
> +
> +				/* Get cpu opp-table */
> +				cpu_opp_tbl = dev_pm_opp_get_opp_table(cpu_dev);
> +				if (!cpu_opp_tbl) {
> +					dev_err(dev, "CPU opp table get failed.\n");
> +					goto put_cpu_node;
> +				}
> +
> +				/* Match the cpu opp node from required-opp with
> +				 * the cpu-opp table */
> +				cpu_opp = dev_pm_opp_find_opp_of_np(cpu_opp_tbl,
> +								    cpu_np);
> +				if (!cpu_opp) {
> +					dev_dbg(dev, "CPU opp get failed.\n");
> +					goto put_cpu_opp_table;
> +				}
> +
> +				cpu_khz = dev_pm_opp_get_freq(cpu_opp);
> +				if (cpu_opp && cpu_khz) {
> +					/* Update freq-map if not already set */
> +					map = cpu_map[cpu];
> +					map[iter_val].cpu_khz = cpu_khz / 1000;
> +					map[iter_val].dev_hz = dev_hz;
> +				}
> +				dev_pm_opp_put(cpu_opp);
> +put_cpu_opp_table:
> +				dev_pm_opp_put_opp_table(cpu_opp_tbl);
> +put_cpu_node:
> +				of_node_put(cpu_np);
> +			}
> +		}
> +		iter_val++;
> +	}
> +	dev_pm_opp_put_opp_table(opp_table);
> +put_opp_table:
> +	of_node_put(opp_table_np);
> +out:
> +	put_online_cpus();
> +
> +	/* Update devfreq */
> +	mutex_lock(&devfreq->lock);
> +	ret = update_devfreq(devfreq);
> +	mutex_unlock(&devfreq->lock);
> +	if (ret)
> +		dev_err(dev, "Frequency update failed.\n");
> +
> +	return ret;
> +}
> +
> +static int cpufreq_passive_unregister(struct devfreq_passive_data **p_data)
> +{
> +	int cpu;
> +	struct devfreq_passive_data *data = *p_data;
> +
> +	cpufreq_unregister_notifier(&data->nb,
> +				    CPUFREQ_TRANSITION_NOTIFIER);
> +
> +	for_each_possible_cpu(cpu) {
> +		kfree(data->state[cpu]);
> +		kfree(data->map[cpu]);
> +		data->state[cpu] = NULL;
> +		data->map[cpu] = NULL;
> +	}
> +
> +	kfree(data->map);
> +	data->map = NULL;
> +
> +	return 0;
> +}
> +
>  static int devfreq_passive_event_handler(struct devfreq *devfreq,
>  				unsigned int event, void *data)
>  {
> @@ -159,7 +415,7 @@ static int devfreq_passive_event_handler(struct devfreq *devfreq,
>  	struct notifier_block *nb = &p_data->nb;
>  	int ret = 0;
>  
> -	if (!parent)
> +	if (!parent && !p_data->cpufreq_type)
>  		return -EPROBE_DEFER;

It makes the fault for the existing devfreq devices with passive governor.
Please remove '!p_data->cpufreq_type' condition.

>  
>  	switch (event) {
> @@ -167,13 +423,21 @@ static int devfreq_passive_event_handler(struct devfreq *devfreq,
>  		if (!p_data->this)
>  			p_data->this = devfreq;
>  
> -		nb->notifier_call = devfreq_passive_notifier_call;
> -		ret = devm_devfreq_register_notifier(dev, parent, nb,
> -					DEVFREQ_TRANSITION_NOTIFIER);
> +		if (p_data->cpufreq_type) {
> +			ret = cpufreq_passive_register(&p_data);
> +		} else {
> +			nb->notifier_call = devfreq_passive_notifier_call;
> +			ret = devm_devfreq_register_notifier(dev, parent, nb,
> +						DEVFREQ_TRANSITION_NOTIFIER);
> +		}

I suggested the my opinion aboyt 'cpufreq_type' variable below.
You can separte it for more clready with using parent device type.

		if (p_data->parent_type == DEVFREQ_PARENT_DEV)
			...
		else if (p_data->parent_type == CPUFREQ_PARENT_DEV)
			...
		else 
			// error handling

>  		break;
>  	case DEVFREQ_GOV_STOP:
> -		devm_devfreq_unregister_notifier(dev, parent, nb,
> -					DEVFREQ_TRANSITION_NOTIFIER);
> +		if (p_data->cpufreq_type) {
> +			cpufreq_passive_unregister(&p_data);
> +		} else {
> +			devm_devfreq_unregister_notifier(dev, parent, nb,
> +						DEVFREQ_TRANSITION_NOTIFIER);
> +		}

ditto.

>  		break;
>  	default:
>  		break;
> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
> index fbffa74bfc1b..e8235fbe49e6 100644
> --- a/include/linux/devfreq.h
> +++ b/include/linux/devfreq.h
> @@ -265,6 +265,38 @@ struct devfreq_simple_ondemand_data {
>  #endif
>  
>  #if IS_ENABLED(CONFIG_DEVFREQ_GOV_PASSIVE)
> +/**
> + * struct devfreq_cpu_state - holds the per-cpu state
> + * @freq:	holds the current frequency of the cpu.
> + * @min_freq:	holds the min frequency of the cpu.
> + * @max_freq:	holds the max frequency of the cpu.
> + * @first_cpu:	holds the cpumask of the first cpu of a policy.
> + *
> + * This structure stores the required cpu_state of a cpu.
> + * This is auto-populated by the governor.
> + */
> +struct devfreq_cpu_state {
> +	unsigned int freq;
> +	unsigned int min_freq;
> +	unsigned int max_freq;
> +	unsigned int first_cpu;
> +};
> +
> +/**
> + * struct devfreq_map - holds mapping from cpu frequency
> + * to devfreq frequency
> + * @cpu_khz:	holds the cpu frequency in Khz
> + * @dev_hz:	holds the devfreq device frequency in Hz
> + *
> + * This structure stores the lookup table between cpu
> + * and the devfreq device. This is auto-populated by the
> + * governor.
> + */
> +struct devfreq_map {

How about changing the structure name as following or others?
- devfreq_freq_map or devfreq_cpufreq_map or others.

Because this structure name guessing the meaning of mapping
between devfreq frequency and cpufreq frequency.

> +	unsigned int cpu_khz;
> +	unsigned int dev_hz;
> +};
> +
>  /**
>   * struct devfreq_passive_data - void *data fed to struct devfreq
>   *	and devfreq_add_device
> @@ -278,11 +310,13 @@ struct devfreq_simple_ondemand_data {
>   *			the next frequency, should use this callback.
>   * @this:	the devfreq instance of own device.
>   * @nb:		the notifier block for DEVFREQ_TRANSITION_NOTIFIER list
> + * @state:	holds the state min/max/current frequency of all online cpu's

As I commented above, how about using 'cpu_state' instead of 'state'?
in order to get the meaning from just variable name.

nitpick. Also,  I think that you can skip 'holds' from the description
of 'state' variable.


> + * @map:	holds the maps between cpu frequency and device frequency

How about using 'cpufreq_map' instead of 'map' for the readability?
IMHO, Because this variable is only used when parent device is cpu. 
I think that if you add to specify the meaningful prefix about cpu to variable name,
it is easy to catch the meaning of variable.
- map -> cpufreq_map.

nitpick. Also,  I think that you can skip 'holds' from the description
of 'map' variable.

>   *
>   * The devfreq_passive_data have to set the devfreq instance of parent
>   * device with governors except for the passive governor. But, don't need to
> - * initialize the 'this' and 'nb' field because the devfreq core will handle
> - * them.
> + * initialize the 'this', 'nb', 'state' and 'map' field because the devfreq

If you agree my opinion above, 
- state -> cpu_state.
- map -> cpufreq_map

> + * core will handle them.
>   */
>  struct devfreq_passive_data {
>  	/* Should set the devfreq instance of parent device */
> @@ -291,9 +325,14 @@ struct devfreq_passive_data {
>  	/* Optional callback to decide the next frequency of passvice device */
>  	int (*get_target_freq)(struct devfreq *this, unsigned long *freq);
>  
> +	/* Should be set if the devfreq device wants to be scaled with cpu*/
> +	u8 cpufreq_type;

The devfreq devices with passive governor have always parent
either devfreq device or cpufreq device. So, you better to specify
the parent type as following: I think that it is more clear to check
the type of parent device.

	enum devfreq_parent_dev_type {
		DEVFREQ_PARENT_DEV,
		CPUFREQ_PARENT_DEV,
	};

	enum devfreq_parent_dev_type parent_type;

> +
>  	/* For passive governor's internal use. Don't need to set them */
>  	struct devfreq *this;
>  	struct notifier_block nb;
> +	struct devfreq_cpu_state *state[NR_CPUS];
> +	struct devfreq_map **map;

ditto.

>  };
>  #endif
>  
> 


-- 
Best Regards,
Chanwoo Choi
Samsung Electronics

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC 3/9] PM / devfreq: Add cpu based scaling support to passive_governor
  2019-04-12  7:39   ` Chanwoo Choi
@ 2019-05-27  8:23     ` Sibi Sankar
  2019-05-28  1:27       ` Chanwoo Choi
  0 siblings, 1 reply; 21+ messages in thread
From: Sibi Sankar @ 2019-05-27  8:23 UTC (permalink / raw)
  To: Chanwoo Choi
  Cc: robh+dt, andy.gross, myungjoo.ham, kyungmin.park, rjw,
	viresh.kumar, nm, sboyd, georgi.djakov, bjorn.andersson,
	david.brown, mark.rutland, linux-kernel, linux-arm-msm-owner,
	devicetree, rnayak, linux-pm, evgreen, daidavid1, dianders,
	Saravana Kannan, linux-kernel-owner

Hey Chanwoo,

Thanks a lot for reviewing the patch. Like I
had indicated earlier we decided to go with
a simpler approach instead on qualcomm SoCs.
I am happy to re-spin this patch with your
comments addressed if we do find other users
for this feature.

On 2019-04-12 13:09, Chanwoo Choi wrote:
> Hi,
> 
> I agree this approach absolutely.
> Just I add some comments. Please check it.
> 
> On 19. 3. 29. 오전 12:28, Sibi Sankar wrote:
>> From: Saravana Kannan <skannan@codeaurora.org>
>> 
>> Many CPU architectures have caches that can scale independent of the
>> CPUs. Frequency scaling of the caches is necessary to make sure the 
>> cache
>> is not a performance bottleneck that leads to poor performance and
>> power. The same idea applies for RAM/DDR.
>> 
>> To achieve this, this patch add support for cpu based scaling to the
>> passive governor. This is accomplished by taking the current frequency
>> of each CPU frequency domain and then adjusts the frequency of the 
>> cache
>> (or any devfreq device) based on the frequency of the CPUs. It listens
>> to CPU frequency transition notifiers to keep itself up to date on the
>> current CPU frequency.
>> 
>> To decide the frequency of the device, the governor does one of the
>> following:
>> * Constructs a CPU frequency to device frequency mapping table from
>>   required-opps property of the devfreq device's opp_table
>> 
>> * Scales the device frequency in proportion to the CPU frequency. So, 
>> if
>>   the CPUs are running at their max frequency, the device runs at its
>>   max frequency. If the CPUs are running at their min frequency, the
>>   device runs at its min frequency. It is interpolated for frequencies
>>   in between.
>> 
>> Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
>> [Sibi: Integrated cpu-freqmap governor into passive_governor]
>> Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
>> ---
>>  drivers/devfreq/Kconfig            |   4 +
>>  drivers/devfreq/governor_passive.c | 276 
>> ++++++++++++++++++++++++++++-
>>  include/linux/devfreq.h            |  43 ++++-
>>  3 files changed, 315 insertions(+), 8 deletions(-)
>> 
>> diff --git a/drivers/devfreq/Kconfig b/drivers/devfreq/Kconfig
>> index 6a172d338f6d..9a45f464a56b 100644
>> --- a/drivers/devfreq/Kconfig
>> +++ b/drivers/devfreq/Kconfig
>> @@ -72,6 +72,10 @@ config DEVFREQ_GOV_PASSIVE
>>  	  device. This governor does not change the frequency by itself
>>  	  through sysfs entries. The passive governor recommends that
>>  	  devfreq device uses the OPP table to get the frequency/voltage.
>> +	  Alternatively the governor can also be chosen to scale based on
>> +	  the online CPUs current frequency. A CPU frequency to device
>> +	  frequency mapping table(s) is auto-populated by the governor
>> +	  for this purpose.
>> 
>>  comment "DEVFREQ Drivers"
>> 
>> diff --git a/drivers/devfreq/governor_passive.c 
>> b/drivers/devfreq/governor_passive.c
>> index 3bc29acbd54e..2506682b233b 100644
>> --- a/drivers/devfreq/governor_passive.c
>> +++ b/drivers/devfreq/governor_passive.c
>> @@ -11,10 +11,63 @@
>>   */
>> 
>>  #include <linux/module.h>
>> +#include <linux/cpu.h>
>> +#include <linux/cpufreq.h>
>> +#include <linux/cpumask.h>
>>  #include <linux/device.h>
>>  #include <linux/devfreq.h>
>> +#include <linux/of.h>
>> +#include <linux/slab.h>
>>  #include "governor.h"
>> 
>> +static unsigned int xlate_cpufreq_to_devfreq(struct 
>> devfreq_passive_data *data,
>> +					     unsigned int cpu)
>> +{
>> +	unsigned int cpu_min, cpu_max;
>> +	struct devfreq *devfreq = (struct devfreq *)data->this;
>> +	unsigned int dev_min, dev_max, cpu_percent, cpu_freq = 0, freq = 0;
>> +	unsigned long *freq_table = devfreq->profile->freq_table;
>> +	struct device *dev = devfreq->dev.parent;
>> +	struct devfreq_map *map;
>> +	int opp_cnt, i;
>> +
>> +	if (!data->state[cpu] || data->state[cpu]->first_cpu != cpu) {
>> +		freq = 0;
>> +		goto out;
> 
> goto out -> return 0;
> 
>> +	}
>> +
>> +	/* Use Interpolation if map is not available */
>> +	cpu_freq = data->state[cpu]->freq;
>> +	if (!data->map) {
>> +		cpu_min = data->state[cpu]->min_freq;
>> +		cpu_max = data->state[cpu]->max_freq;
>> +		if (freq_table) {
>> +			dev_min = freq_table[0];
>> +			dev_max = freq_table[devfreq->profile->max_state - 1];
> 
> Actually, it is not always true. The devfreq recommend the ascending 
> order for
> 'freq_table'. But, it is not mandatory. Also, some devfreq device uses 
> the
> decending order for 'freq_table'. So, a patch[1] was considering the 
> order
> when getting the minimum/maximum frequency from freq_table.
> 
> If you want to get the minimum/maximum frequency, you have to consider 
> the order
> of 'freq_table' as the patch[1].
> 
> [1] df5cf4a36178 ("PM / devfreq: Fix handling of min/max_freq == 0")
> 
>              /* Get minimum frequency according to sorting order */
> +               if (freq_table[0] < freq_table[df->profile->max_state - 
> 1])
> +                       value = freq_table[0];
> +               else
> +                       value = freq_table[df->profile->max_state - 1];
> 
> 
>> +		} else {
>> +			if (devfreq->max_freq <= devfreq->min_freq)
>> +				return 0;
>> +			dev_min = devfreq->min_freq;
>> +			dev_max = devfreq->max_freq;
>> +		}
>> +
>> +		cpu_percent = ((cpu_freq - cpu_min) * 100) / cpu_max - cpu_min;
>> +		freq = dev_min + mult_frac(dev_max - dev_min, cpu_percent, 100);
>> +		goto out;
> 
> You don't need to jump 'out'. Instead, you better to use the 'else' 
> statement
> for if data->map is not NULL. I think that almost case when using this 
> patch
> will be available of data->map. In order to skip the likely 'false' 
> statement,
> I recommend the following sequence.
> 
> 	if (data->map) {
> 		map = data->map[cpu];
> 		...
> 	} else {
> 		/* Use Interpolation if map is not available */
> 	}
> 
> 
>> +	}
>> +
>> +	map = data->map[cpu];
>> +	opp_cnt = dev_pm_opp_get_opp_count(dev);
>> +	for (i = 0; i < opp_cnt; i++) {
>> +		freq = max(freq, map[i].dev_hz);
>> +		if (map[i].cpu_khz >= cpu_freq)
>> +			break;
>> +	}
>> +out:
>> +	dev_dbg(dev, "CPU%u: %d -> dev: %u\n", cpu, cpu_freq, freq);
> 
> IMO, I think it is not necessary. If you want to print log, you better 
> to print
> it on device driver instead of governor.
> 
>> +	return freq;
>> +}
>> +
>>  static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>>  					unsigned long *freq)
>>  {
>> @@ -23,6 +76,7 @@ static int devfreq_passive_get_target_freq(struct 
>> devfreq *devfreq,
>>  	struct devfreq *parent_devfreq = (struct devfreq *)p_data->parent;
>>  	unsigned long child_freq = ULONG_MAX;
>>  	struct dev_pm_opp *opp;
>> +	unsigned int cpu, tgt_freq = 0;
> 
> tgt means 'target'? If right, just use target_freq intead of 'tgt_freq'
> for the readability.
> 
>>  	int i, count, ret = 0;
>> 
>>  	/*
>> @@ -35,6 +89,14 @@ static int devfreq_passive_get_target_freq(struct 
>> devfreq *devfreq,
>>  		goto out;
>>  	}
>> 
>> +	if (p_data->cpufreq_type) {
>> +		for_each_possible_cpu(cpu)
>> +			tgt_freq = max(tgt_freq,
>> +				       xlate_cpufreq_to_devfreq(p_data, cpu));
>> +		*freq = tgt_freq;
> 
> You better to change from 'tgt_freq' to 'target_freq' for the 
> readability.
> 
>> +		goto out;
>> +	}
> 
> I think that 'goto out' using is not proper for supporting two case.
> Instead, you better to split out as following according to the type
> of parent device (devfreq device or cpufreq device).
> 
> 	switch (p_data->parent_type) {
> 	case DEVFREQ_PARENT_DEV:
> 		ret = get_target_freq_with_devfreq()
> 		break;
> 	case CPUFREQ_PARENT_DEV:
> 		ret = get_target_freq_with_cpufreq()
> 		break;
> 	default:
> 		dev_err(...)
> 		ret = -EINVAL;
> 		goto out;
> 	}
> 
> 	if (ret < 0) {
> 		/* exception handling for 'ret' value */
> 	}
> 
>> +
>>  	/*
>>  	 * If the parent and passive devfreq device uses the OPP table,
>>  	 * get the next frequency by using the OPP table.
>> @@ -149,6 +211,200 @@ static int devfreq_passive_notifier_call(struct 
>> notifier_block *nb,
>>  	return NOTIFY_DONE;
>>  }
>> 
>> +static int cpufreq_passive_notifier_call(struct notifier_block *nb,
>> +					 unsigned long event, void *ptr)
>> +{
>> +	struct devfreq_passive_data *data =
>> +			container_of(nb, struct devfreq_passive_data, nb);
>> +	struct devfreq *devfreq = (struct devfreq *)data->this;
>> +	struct cpufreq_freqs *freq = ptr;
>> +	struct devfreq_cpu_state *state;
> 
> nitpick. how about using 'cpu_state' instead of 'state'?
> in order to get the meaning from just variable name.
> 
>> +	int ret = 0;
>> +
>> +	if (event != CPUFREQ_POSTCHANGE)
>> +		goto out;
> 
> just 'return' is simple instead of 'goto out' because this case
> don't need to treat the any restoring code. And also, you have
> to check whether freq is NULL or not as following:
> 
> 	if (event != CPUFREQ_POSTCHANGE || !freq || data->state[freq->cpu])
> 		return ret;
> 	state = data->state[freq->cpu];
> 
>> +
>> +	state = data->state[freq->cpu];
>> +	if (!state)
>> +		goto out;
>> +
>> +	if (state->freq != freq->new) {
>> +		state->freq = freq->new;
> 
> You have to update the frequency after update_devfreq() is completed
> without error.
> 
>> +		mutex_lock(&devfreq->lock);
>> +		ret = update_devfreq(devfreq);
>> +		mutex_unlock(&devfreq->lock);
>> +		if (ret)
>> +			dev_err(&devfreq->dev, "Frequency update failed.\n");
> 
> Almost devfreq error used the following format: "Couldn't ..." .
> If there is no any specific reason to change the format for error log,
> 	"Couldnt update the frequency.\n"
> 
>> +	}> +out:
>> +	return ret;
> 
> Also, we can reduce the unneeded indentation as following:
> 
> 	if (state->freq == freq->new)
> 		return ret;
> 
> 	mutex_lock(&devfreq->lock);
> 	ret = update_devfreq(devfreq);
> 	mutex_unlock(&devfreq->lock);
> 	if (ret) {
> 		dev_err(&devfreq->dev, "Couldnt update the frequency.\n");
> 		return ret;
> 	}
> 	state->freq = freq->new;
> 
> 	return 0;
> 
>> +}
>> +
>> +static int cpufreq_passive_register(struct devfreq_passive_data 
>> **p_data)
>> +{
>> +	unsigned int cpu;
>> +	struct devfreq_map **cpu_map;
>> +	struct devfreq_map *map, *per_cpu_map;
>> +	struct devfreq_passive_data *data = *p_data;
>> +	struct devfreq *devfreq = (struct devfreq *)data->this;
>> +	int i, count = 0, opp_cnt = 0, ret = 0, iter_val = 0;
>> +	struct device_node *np, *opp_table_np, *cpu_np;
>> +	struct opp_table *opp_table, *cpu_opp_tbl;
>> +	struct device *dev = devfreq->dev.parent;
>> +	struct devfreq_cpu_state *state;
>> +	struct dev_pm_opp *opp, *cpu_opp;
>> +	struct cpufreq_policy *policy;
>> +	struct device *cpu_dev;
>> +	u64 cpu_khz, dev_hz;
>> +
>> +	get_online_cpus();
>> +	data->nb.notifier_call = cpufreq_passive_notifier_call;
>> +	ret = cpufreq_register_notifier(&data->nb,
>> +					CPUFREQ_TRANSITION_NOTIFIER);
>> +	if (ret)
>> +		return ret;
>> +
>> +	/* Populate devfreq_cpu_state */
>> +	for_each_online_cpu(cpu) {
>> +		if (data->state[cpu])
>> +			continue;
>> +
>> +		policy = cpufreq_cpu_get(cpu);
>> +		if (policy) {
>> +			state = kzalloc(sizeof(*state), GFP_KERNEL);
>> +			if (!state)
>> +				return -ENOMEM;
>> +
>> +			state->first_cpu = cpumask_first(policy->related_cpus);
>> +			state->freq = policy->cur;
>> +			state->min_freq = policy->cpuinfo.min_freq;
>> +			state->max_freq = policy->cpuinfo.max_freq;
>> +			data->state[cpu] = state;
>> +			cpufreq_cpu_put(policy);
>> +		} else {
>> +			return -EPROBE_DEFER;
>> +		}
>> +	}
>> +
>> +	opp_table_np = dev_pm_opp_of_get_opp_desc_node(dev);
>> +	if (!opp_table_np)
>> +		goto out;
>> +
>> +	opp_cnt = dev_pm_opp_get_opp_count(dev);
>> +	if (opp_cnt <= 0)
>> +		goto put_opp_table;
>> +
>> +	/* Allocate memory for devfreq_map*/
>> +	cpu_map = kcalloc(num_possible_cpus(), sizeof(*cpu_map), 
>> GFP_KERNEL);
>> +	if (!cpu_map)
>> +		return -ENOMEM;
>> +
>> +	for_each_possible_cpu(cpu) {
>> +		per_cpu_map = kcalloc(opp_cnt, sizeof(*per_cpu_map),
>> +				      GFP_KERNEL);
>> +		if (!per_cpu_map)
>> +			return -ENOMEM;
>> +		cpu_map[cpu] = per_cpu_map;
>> +	}
>> +	data->map = cpu_map;
>> +
>> +	/* Populate devfreq_map */
>> +	opp_table = dev_pm_opp_get_opp_table(dev);
>> +	if (!opp_table)
>> +		return -ENOMEM;
>> +
>> +	for_each_available_child_of_node(opp_table_np, np) {
>> +		opp = dev_pm_opp_find_opp_of_np(opp_table, np);
>> +		if (IS_ERR(opp))
>> +			continue;
>> +
>> +		dev_hz = dev_pm_opp_get_freq(opp);
>> +		dev_pm_opp_put(opp);
>> +
>> +		count = of_count_phandle_with_args(np, "required-opps", NULL);
>> +		for (i = 0; i < count; i++) {
>> +			for_each_possible_cpu(cpu) {
>> +				cpu_dev = get_cpu_device(cpu);
>> +				if (!cpu_dev) {
>> +					dev_err(dev, "CPU get device failed.\n");
>> +					continue;
>> +				}
>> +
>> +				cpu_np = of_parse_required_opp(np, i);
>> +				if (!cpu_np) {
>> +					dev_err(dev, "Parsing required opp failed.\n");
>> +					continue;
>> +				}
>> +
>> +				/* Get cpu opp-table */
>> +				cpu_opp_tbl = dev_pm_opp_get_opp_table(cpu_dev);
>> +				if (!cpu_opp_tbl) {
>> +					dev_err(dev, "CPU opp table get failed.\n");
>> +					goto put_cpu_node;
>> +				}
>> +
>> +				/* Match the cpu opp node from required-opp with
>> +				 * the cpu-opp table */
>> +				cpu_opp = dev_pm_opp_find_opp_of_np(cpu_opp_tbl,
>> +								    cpu_np);
>> +				if (!cpu_opp) {
>> +					dev_dbg(dev, "CPU opp get failed.\n");
>> +					goto put_cpu_opp_table;
>> +				}
>> +
>> +				cpu_khz = dev_pm_opp_get_freq(cpu_opp);
>> +				if (cpu_opp && cpu_khz) {
>> +					/* Update freq-map if not already set */
>> +					map = cpu_map[cpu];
>> +					map[iter_val].cpu_khz = cpu_khz / 1000;
>> +					map[iter_val].dev_hz = dev_hz;
>> +				}
>> +				dev_pm_opp_put(cpu_opp);
>> +put_cpu_opp_table:
>> +				dev_pm_opp_put_opp_table(cpu_opp_tbl);
>> +put_cpu_node:
>> +				of_node_put(cpu_np);
>> +			}
>> +		}
>> +		iter_val++;
>> +	}
>> +	dev_pm_opp_put_opp_table(opp_table);
>> +put_opp_table:
>> +	of_node_put(opp_table_np);
>> +out:
>> +	put_online_cpus();
>> +
>> +	/* Update devfreq */
>> +	mutex_lock(&devfreq->lock);
>> +	ret = update_devfreq(devfreq);
>> +	mutex_unlock(&devfreq->lock);
>> +	if (ret)
>> +		dev_err(dev, "Frequency update failed.\n");
>> +
>> +	return ret;
>> +}
>> +
>> +static int cpufreq_passive_unregister(struct devfreq_passive_data 
>> **p_data)
>> +{
>> +	int cpu;
>> +	struct devfreq_passive_data *data = *p_data;
>> +
>> +	cpufreq_unregister_notifier(&data->nb,
>> +				    CPUFREQ_TRANSITION_NOTIFIER);
>> +
>> +	for_each_possible_cpu(cpu) {
>> +		kfree(data->state[cpu]);
>> +		kfree(data->map[cpu]);
>> +		data->state[cpu] = NULL;
>> +		data->map[cpu] = NULL;
>> +	}
>> +
>> +	kfree(data->map);
>> +	data->map = NULL;
>> +
>> +	return 0;
>> +}
>> +
>>  static int devfreq_passive_event_handler(struct devfreq *devfreq,
>>  				unsigned int event, void *data)
>>  {
>> @@ -159,7 +415,7 @@ static int devfreq_passive_event_handler(struct 
>> devfreq *devfreq,
>>  	struct notifier_block *nb = &p_data->nb;
>>  	int ret = 0;
>> 
>> -	if (!parent)
>> +	if (!parent && !p_data->cpufreq_type)
>>  		return -EPROBE_DEFER;
> 
> It makes the fault for the existing devfreq devices with passive 
> governor.
> Please remove '!p_data->cpufreq_type' condition.
> 
>> 
>>  	switch (event) {
>> @@ -167,13 +423,21 @@ static int devfreq_passive_event_handler(struct 
>> devfreq *devfreq,
>>  		if (!p_data->this)
>>  			p_data->this = devfreq;
>> 
>> -		nb->notifier_call = devfreq_passive_notifier_call;
>> -		ret = devm_devfreq_register_notifier(dev, parent, nb,
>> -					DEVFREQ_TRANSITION_NOTIFIER);
>> +		if (p_data->cpufreq_type) {
>> +			ret = cpufreq_passive_register(&p_data);
>> +		} else {
>> +			nb->notifier_call = devfreq_passive_notifier_call;
>> +			ret = devm_devfreq_register_notifier(dev, parent, nb,
>> +						DEVFREQ_TRANSITION_NOTIFIER);
>> +		}
> 
> I suggested the my opinion aboyt 'cpufreq_type' variable below.
> You can separte it for more clready with using parent device type.
> 
> 		if (p_data->parent_type == DEVFREQ_PARENT_DEV)
> 			...
> 		else if (p_data->parent_type == CPUFREQ_PARENT_DEV)
> 			...
> 		else
> 			// error handling
> 
>>  		break;
>>  	case DEVFREQ_GOV_STOP:
>> -		devm_devfreq_unregister_notifier(dev, parent, nb,
>> -					DEVFREQ_TRANSITION_NOTIFIER);
>> +		if (p_data->cpufreq_type) {
>> +			cpufreq_passive_unregister(&p_data);
>> +		} else {
>> +			devm_devfreq_unregister_notifier(dev, parent, nb,
>> +						DEVFREQ_TRANSITION_NOTIFIER);
>> +		}
> 
> ditto.
> 
>>  		break;
>>  	default:
>>  		break;
>> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
>> index fbffa74bfc1b..e8235fbe49e6 100644
>> --- a/include/linux/devfreq.h
>> +++ b/include/linux/devfreq.h
>> @@ -265,6 +265,38 @@ struct devfreq_simple_ondemand_data {
>>  #endif
>> 
>>  #if IS_ENABLED(CONFIG_DEVFREQ_GOV_PASSIVE)
>> +/**
>> + * struct devfreq_cpu_state - holds the per-cpu state
>> + * @freq:	holds the current frequency of the cpu.
>> + * @min_freq:	holds the min frequency of the cpu.
>> + * @max_freq:	holds the max frequency of the cpu.
>> + * @first_cpu:	holds the cpumask of the first cpu of a policy.
>> + *
>> + * This structure stores the required cpu_state of a cpu.
>> + * This is auto-populated by the governor.
>> + */
>> +struct devfreq_cpu_state {
>> +	unsigned int freq;
>> +	unsigned int min_freq;
>> +	unsigned int max_freq;
>> +	unsigned int first_cpu;
>> +};
>> +
>> +/**
>> + * struct devfreq_map - holds mapping from cpu frequency
>> + * to devfreq frequency
>> + * @cpu_khz:	holds the cpu frequency in Khz
>> + * @dev_hz:	holds the devfreq device frequency in Hz
>> + *
>> + * This structure stores the lookup table between cpu
>> + * and the devfreq device. This is auto-populated by the
>> + * governor.
>> + */
>> +struct devfreq_map {
> 
> How about changing the structure name as following or others?
> - devfreq_freq_map or devfreq_cpufreq_map or others.
> 
> Because this structure name guessing the meaning of mapping
> between devfreq frequency and cpufreq frequency.
> 
>> +	unsigned int cpu_khz;
>> +	unsigned int dev_hz;
>> +};
>> +
>>  /**
>>   * struct devfreq_passive_data - void *data fed to struct devfreq
>>   *	and devfreq_add_device
>> @@ -278,11 +310,13 @@ struct devfreq_simple_ondemand_data {
>>   *			the next frequency, should use this callback.
>>   * @this:	the devfreq instance of own device.
>>   * @nb:		the notifier block for DEVFREQ_TRANSITION_NOTIFIER list
>> + * @state:	holds the state min/max/current frequency of all online 
>> cpu's
> 
> As I commented above, how about using 'cpu_state' instead of 'state'?
> in order to get the meaning from just variable name.
> 
> nitpick. Also,  I think that you can skip 'holds' from the description
> of 'state' variable.
> 
> 
>> + * @map:	holds the maps between cpu frequency and device frequency
> 
> How about using 'cpufreq_map' instead of 'map' for the readability?
> IMHO, Because this variable is only used when parent device is cpu.
> I think that if you add to specify the meaningful prefix about cpu to
> variable name,
> it is easy to catch the meaning of variable.
> - map -> cpufreq_map.
> 
> nitpick. Also,  I think that you can skip 'holds' from the description
> of 'map' variable.
> 
>>   *
>>   * The devfreq_passive_data have to set the devfreq instance of 
>> parent
>>   * device with governors except for the passive governor. But, don't 
>> need to
>> - * initialize the 'this' and 'nb' field because the devfreq core will 
>> handle
>> - * them.
>> + * initialize the 'this', 'nb', 'state' and 'map' field because the 
>> devfreq
> 
> If you agree my opinion above,
> - state -> cpu_state.
> - map -> cpufreq_map
> 
>> + * core will handle them.
>>   */
>>  struct devfreq_passive_data {
>>  	/* Should set the devfreq instance of parent device */
>> @@ -291,9 +325,14 @@ struct devfreq_passive_data {
>>  	/* Optional callback to decide the next frequency of passvice device 
>> */
>>  	int (*get_target_freq)(struct devfreq *this, unsigned long *freq);
>> 
>> +	/* Should be set if the devfreq device wants to be scaled with cpu*/
>> +	u8 cpufreq_type;
> 
> The devfreq devices with passive governor have always parent
> either devfreq device or cpufreq device. So, you better to specify
> the parent type as following: I think that it is more clear to check
> the type of parent device.
> 
> 	enum devfreq_parent_dev_type {
> 		DEVFREQ_PARENT_DEV,
> 		CPUFREQ_PARENT_DEV,
> 	};
> 
> 	enum devfreq_parent_dev_type parent_type;
> 
>> +
>>  	/* For passive governor's internal use. Don't need to set them */
>>  	struct devfreq *this;
>>  	struct notifier_block nb;
>> +	struct devfreq_cpu_state *state[NR_CPUS];
>> +	struct devfreq_map **map;
> 
> ditto.
> 
>>  };
>>  #endif
>> 
>> 

-- 
-- Sibi Sankar --
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC 3/9] PM / devfreq: Add cpu based scaling support to passive_governor
  2019-05-27  8:23     ` Sibi Sankar
@ 2019-05-28  1:27       ` Chanwoo Choi
  0 siblings, 0 replies; 21+ messages in thread
From: Chanwoo Choi @ 2019-05-28  1:27 UTC (permalink / raw)
  To: Sibi Sankar
  Cc: robh+dt, andy.gross, myungjoo.ham, kyungmin.park, rjw,
	viresh.kumar, nm, sboyd, georgi.djakov, bjorn.andersson,
	david.brown, mark.rutland, linux-kernel, linux-arm-msm-owner,
	devicetree, rnayak, linux-pm, evgreen, daidavid1, dianders,
	Saravana Kannan, linux-kernel-owner

Hi Sibi,

On 19. 5. 27. 오후 5:23, Sibi Sankar wrote:
> Hey Chanwoo,
> 
> Thanks a lot for reviewing the patch. Like I
> had indicated earlier we decided to go with
> a simpler approach instead on qualcomm SoCs.
> I am happy to re-spin this patch with your
> comments addressed if we do find other users
> for this feature.

Actually, I think that this approach is necessary.
On many real released product like smartphone
have the dependency between cpufreq and devfreq
for memory bus bandwidth. For example, when playing
the video or get the 60fps without loss.

If possible, this patch should be ongoing on either
now or future. Thanks.

> 
> On 2019-04-12 13:09, Chanwoo Choi wrote:
>> Hi,
>>
>> I agree this approach absolutely.
>> Just I add some comments. Please check it.
>>
>> On 19. 3. 29. 오전 12:28, Sibi Sankar wrote:
>>> From: Saravana Kannan <skannan@codeaurora.org>
>>>
>>> Many CPU architectures have caches that can scale independent of the
>>> CPUs. Frequency scaling of the caches is necessary to make sure the cache
>>> is not a performance bottleneck that leads to poor performance and
>>> power. The same idea applies for RAM/DDR.
>>>
>>> To achieve this, this patch add support for cpu based scaling to the
>>> passive governor. This is accomplished by taking the current frequency
>>> of each CPU frequency domain and then adjusts the frequency of the cache
>>> (or any devfreq device) based on the frequency of the CPUs. It listens
>>> to CPU frequency transition notifiers to keep itself up to date on the
>>> current CPU frequency.
>>>
>>> To decide the frequency of the device, the governor does one of the
>>> following:
>>> * Constructs a CPU frequency to device frequency mapping table from
>>>   required-opps property of the devfreq device's opp_table
>>>
>>> * Scales the device frequency in proportion to the CPU frequency. So, if
>>>   the CPUs are running at their max frequency, the device runs at its
>>>   max frequency. If the CPUs are running at their min frequency, the
>>>   device runs at its min frequency. It is interpolated for frequencies
>>>   in between.
>>>
>>> Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
>>> [Sibi: Integrated cpu-freqmap governor into passive_governor]
>>> Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
>>> ---
>>>  drivers/devfreq/Kconfig            |   4 +
>>>  drivers/devfreq/governor_passive.c | 276 ++++++++++++++++++++++++++++-
>>>  include/linux/devfreq.h            |  43 ++++-
>>>  3 files changed, 315 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/devfreq/Kconfig b/drivers/devfreq/Kconfig
>>> index 6a172d338f6d..9a45f464a56b 100644
>>> --- a/drivers/devfreq/Kconfig
>>> +++ b/drivers/devfreq/Kconfig
>>> @@ -72,6 +72,10 @@ config DEVFREQ_GOV_PASSIVE
>>>        device. This governor does not change the frequency by itself
>>>        through sysfs entries. The passive governor recommends that
>>>        devfreq device uses the OPP table to get the frequency/voltage.
>>> +      Alternatively the governor can also be chosen to scale based on
>>> +      the online CPUs current frequency. A CPU frequency to device
>>> +      frequency mapping table(s) is auto-populated by the governor
>>> +      for this purpose.
>>>
>>>  comment "DEVFREQ Drivers"
>>>
>>> diff --git a/drivers/devfreq/governor_passive.c b/drivers/devfreq/governor_passive.c
>>> index 3bc29acbd54e..2506682b233b 100644
>>> --- a/drivers/devfreq/governor_passive.c
>>> +++ b/drivers/devfreq/governor_passive.c
>>> @@ -11,10 +11,63 @@
>>>   */
>>>
>>>  #include <linux/module.h>
>>> +#include <linux/cpu.h>
>>> +#include <linux/cpufreq.h>
>>> +#include <linux/cpumask.h>
>>>  #include <linux/device.h>
>>>  #include <linux/devfreq.h>
>>> +#include <linux/of.h>
>>> +#include <linux/slab.h>
>>>  #include "governor.h"
>>>
>>> +static unsigned int xlate_cpufreq_to_devfreq(struct devfreq_passive_data *data,
>>> +                         unsigned int cpu)
>>> +{
>>> +    unsigned int cpu_min, cpu_max;
>>> +    struct devfreq *devfreq = (struct devfreq *)data->this;
>>> +    unsigned int dev_min, dev_max, cpu_percent, cpu_freq = 0, freq = 0;
>>> +    unsigned long *freq_table = devfreq->profile->freq_table;
>>> +    struct device *dev = devfreq->dev.parent;
>>> +    struct devfreq_map *map;
>>> +    int opp_cnt, i;
>>> +
>>> +    if (!data->state[cpu] || data->state[cpu]->first_cpu != cpu) {
>>> +        freq = 0;
>>> +        goto out;
>>
>> goto out -> return 0;
>>
>>> +    }
>>> +
>>> +    /* Use Interpolation if map is not available */
>>> +    cpu_freq = data->state[cpu]->freq;
>>> +    if (!data->map) {
>>> +        cpu_min = data->state[cpu]->min_freq;
>>> +        cpu_max = data->state[cpu]->max_freq;
>>> +        if (freq_table) {
>>> +            dev_min = freq_table[0];
>>> +            dev_max = freq_table[devfreq->profile->max_state - 1];
>>
>> Actually, it is not always true. The devfreq recommend the ascending order for
>> 'freq_table'. But, it is not mandatory. Also, some devfreq device uses the
>> decending order for 'freq_table'. So, a patch[1] was considering the order
>> when getting the minimum/maximum frequency from freq_table.
>>
>> If you want to get the minimum/maximum frequency, you have to consider the order
>> of 'freq_table' as the patch[1].
>>
>> [1] df5cf4a36178 ("PM / devfreq: Fix handling of min/max_freq == 0")
>>
>>              /* Get minimum frequency according to sorting order */
>> +               if (freq_table[0] < freq_table[df->profile->max_state - 1])
>> +                       value = freq_table[0];
>> +               else
>> +                       value = freq_table[df->profile->max_state - 1];
>>
>>
>>> +        } else {
>>> +            if (devfreq->max_freq <= devfreq->min_freq)
>>> +                return 0;
>>> +            dev_min = devfreq->min_freq;
>>> +            dev_max = devfreq->max_freq;
>>> +        }
>>> +
>>> +        cpu_percent = ((cpu_freq - cpu_min) * 100) / cpu_max - cpu_min;
>>> +        freq = dev_min + mult_frac(dev_max - dev_min, cpu_percent, 100);
>>> +        goto out;
>>
>> You don't need to jump 'out'. Instead, you better to use the 'else' statement
>> for if data->map is not NULL. I think that almost case when using this patch
>> will be available of data->map. In order to skip the likely 'false' statement,
>> I recommend the following sequence.
>>
>>     if (data->map) {
>>         map = data->map[cpu];
>>         ...
>>     } else {
>>         /* Use Interpolation if map is not available */
>>     }
>>
>>
>>> +    }
>>> +
>>> +    map = data->map[cpu];
>>> +    opp_cnt = dev_pm_opp_get_opp_count(dev);
>>> +    for (i = 0; i < opp_cnt; i++) {
>>> +        freq = max(freq, map[i].dev_hz);
>>> +        if (map[i].cpu_khz >= cpu_freq)
>>> +            break;
>>> +    }
>>> +out:
>>> +    dev_dbg(dev, "CPU%u: %d -> dev: %u\n", cpu, cpu_freq, freq);
>>
>> IMO, I think it is not necessary. If you want to print log, you better to print
>> it on device driver instead of governor.
>>
>>> +    return freq;
>>> +}
>>> +
>>>  static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>>>                      unsigned long *freq)
>>>  {
>>> @@ -23,6 +76,7 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>>>      struct devfreq *parent_devfreq = (struct devfreq *)p_data->parent;
>>>      unsigned long child_freq = ULONG_MAX;
>>>      struct dev_pm_opp *opp;
>>> +    unsigned int cpu, tgt_freq = 0;
>>
>> tgt means 'target'? If right, just use target_freq intead of 'tgt_freq'
>> for the readability.
>>
>>>      int i, count, ret = 0;
>>>
>>>      /*
>>> @@ -35,6 +89,14 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>>>          goto out;
>>>      }
>>>
>>> +    if (p_data->cpufreq_type) {
>>> +        for_each_possible_cpu(cpu)
>>> +            tgt_freq = max(tgt_freq,
>>> +                       xlate_cpufreq_to_devfreq(p_data, cpu));
>>> +        *freq = tgt_freq;
>>
>> You better to change from 'tgt_freq' to 'target_freq' for the readability.
>>
>>> +        goto out;
>>> +    }
>>
>> I think that 'goto out' using is not proper for supporting two case.
>> Instead, you better to split out as following according to the type
>> of parent device (devfreq device or cpufreq device).
>>
>>     switch (p_data->parent_type) {
>>     case DEVFREQ_PARENT_DEV:
>>         ret = get_target_freq_with_devfreq()
>>         break;
>>     case CPUFREQ_PARENT_DEV:
>>         ret = get_target_freq_with_cpufreq()
>>         break;
>>     default:
>>         dev_err(...)
>>         ret = -EINVAL;
>>         goto out;
>>     }
>>
>>     if (ret < 0) {
>>         /* exception handling for 'ret' value */
>>     }
>>
>>> +
>>>      /*
>>>       * If the parent and passive devfreq device uses the OPP table,
>>>       * get the next frequency by using the OPP table.
>>> @@ -149,6 +211,200 @@ static int devfreq_passive_notifier_call(struct notifier_block *nb,
>>>      return NOTIFY_DONE;
>>>  }
>>>
>>> +static int cpufreq_passive_notifier_call(struct notifier_block *nb,
>>> +                     unsigned long event, void *ptr)
>>> +{
>>> +    struct devfreq_passive_data *data =
>>> +            container_of(nb, struct devfreq_passive_data, nb);
>>> +    struct devfreq *devfreq = (struct devfreq *)data->this;
>>> +    struct cpufreq_freqs *freq = ptr;
>>> +    struct devfreq_cpu_state *state;
>>
>> nitpick. how about using 'cpu_state' instead of 'state'?
>> in order to get the meaning from just variable name.
>>
>>> +    int ret = 0;
>>> +
>>> +    if (event != CPUFREQ_POSTCHANGE)
>>> +        goto out;
>>
>> just 'return' is simple instead of 'goto out' because this case
>> don't need to treat the any restoring code. And also, you have
>> to check whether freq is NULL or not as following:
>>
>>     if (event != CPUFREQ_POSTCHANGE || !freq || data->state[freq->cpu])
>>         return ret;
>>     state = data->state[freq->cpu];
>>
>>> +
>>> +    state = data->state[freq->cpu];
>>> +    if (!state)
>>> +        goto out;
>>> +
>>> +    if (state->freq != freq->new) {
>>> +        state->freq = freq->new;
>>
>> You have to update the frequency after update_devfreq() is completed
>> without error.
>>
>>> +        mutex_lock(&devfreq->lock);
>>> +        ret = update_devfreq(devfreq);
>>> +        mutex_unlock(&devfreq->lock);
>>> +        if (ret)
>>> +            dev_err(&devfreq->dev, "Frequency update failed.\n");
>>
>> Almost devfreq error used the following format: "Couldn't ..." .
>> If there is no any specific reason to change the format for error log,
>>     "Couldnt update the frequency.\n"
>>
>>> +    }> +out:
>>> +    return ret;
>>
>> Also, we can reduce the unneeded indentation as following:
>>
>>     if (state->freq == freq->new)
>>         return ret;
>>
>>     mutex_lock(&devfreq->lock);
>>     ret = update_devfreq(devfreq);
>>     mutex_unlock(&devfreq->lock);
>>     if (ret) {
>>         dev_err(&devfreq->dev, "Couldnt update the frequency.\n");
>>         return ret;
>>     }
>>     state->freq = freq->new;
>>
>>     return 0;
>>
>>> +}
>>> +
>>> +static int cpufreq_passive_register(struct devfreq_passive_data **p_data)
>>> +{
>>> +    unsigned int cpu;
>>> +    struct devfreq_map **cpu_map;
>>> +    struct devfreq_map *map, *per_cpu_map;
>>> +    struct devfreq_passive_data *data = *p_data;
>>> +    struct devfreq *devfreq = (struct devfreq *)data->this;
>>> +    int i, count = 0, opp_cnt = 0, ret = 0, iter_val = 0;
>>> +    struct device_node *np, *opp_table_np, *cpu_np;
>>> +    struct opp_table *opp_table, *cpu_opp_tbl;
>>> +    struct device *dev = devfreq->dev.parent;
>>> +    struct devfreq_cpu_state *state;
>>> +    struct dev_pm_opp *opp, *cpu_opp;
>>> +    struct cpufreq_policy *policy;
>>> +    struct device *cpu_dev;
>>> +    u64 cpu_khz, dev_hz;
>>> +
>>> +    get_online_cpus();
>>> +    data->nb.notifier_call = cpufreq_passive_notifier_call;
>>> +    ret = cpufreq_register_notifier(&data->nb,
>>> +                    CPUFREQ_TRANSITION_NOTIFIER);
>>> +    if (ret)
>>> +        return ret;
>>> +
>>> +    /* Populate devfreq_cpu_state */
>>> +    for_each_online_cpu(cpu) {
>>> +        if (data->state[cpu])
>>> +            continue;
>>> +
>>> +        policy = cpufreq_cpu_get(cpu);
>>> +        if (policy) {
>>> +            state = kzalloc(sizeof(*state), GFP_KERNEL);
>>> +            if (!state)
>>> +                return -ENOMEM;
>>> +
>>> +            state->first_cpu = cpumask_first(policy->related_cpus);
>>> +            state->freq = policy->cur;
>>> +            state->min_freq = policy->cpuinfo.min_freq;
>>> +            state->max_freq = policy->cpuinfo.max_freq;
>>> +            data->state[cpu] = state;
>>> +            cpufreq_cpu_put(policy);
>>> +        } else {
>>> +            return -EPROBE_DEFER;
>>> +        }
>>> +    }
>>> +
>>> +    opp_table_np = dev_pm_opp_of_get_opp_desc_node(dev);
>>> +    if (!opp_table_np)
>>> +        goto out;
>>> +
>>> +    opp_cnt = dev_pm_opp_get_opp_count(dev);
>>> +    if (opp_cnt <= 0)
>>> +        goto put_opp_table;
>>> +
>>> +    /* Allocate memory for devfreq_map*/
>>> +    cpu_map = kcalloc(num_possible_cpus(), sizeof(*cpu_map), GFP_KERNEL);
>>> +    if (!cpu_map)
>>> +        return -ENOMEM;
>>> +
>>> +    for_each_possible_cpu(cpu) {
>>> +        per_cpu_map = kcalloc(opp_cnt, sizeof(*per_cpu_map),
>>> +                      GFP_KERNEL);
>>> +        if (!per_cpu_map)
>>> +            return -ENOMEM;
>>> +        cpu_map[cpu] = per_cpu_map;
>>> +    }
>>> +    data->map = cpu_map;
>>> +
>>> +    /* Populate devfreq_map */
>>> +    opp_table = dev_pm_opp_get_opp_table(dev);
>>> +    if (!opp_table)
>>> +        return -ENOMEM;
>>> +
>>> +    for_each_available_child_of_node(opp_table_np, np) {
>>> +        opp = dev_pm_opp_find_opp_of_np(opp_table, np);
>>> +        if (IS_ERR(opp))
>>> +            continue;
>>> +
>>> +        dev_hz = dev_pm_opp_get_freq(opp);
>>> +        dev_pm_opp_put(opp);
>>> +
>>> +        count = of_count_phandle_with_args(np, "required-opps", NULL);
>>> +        for (i = 0; i < count; i++) {
>>> +            for_each_possible_cpu(cpu) {
>>> +                cpu_dev = get_cpu_device(cpu);
>>> +                if (!cpu_dev) {
>>> +                    dev_err(dev, "CPU get device failed.\n");
>>> +                    continue;
>>> +                }
>>> +
>>> +                cpu_np = of_parse_required_opp(np, i);
>>> +                if (!cpu_np) {
>>> +                    dev_err(dev, "Parsing required opp failed.\n");
>>> +                    continue;
>>> +                }
>>> +
>>> +                /* Get cpu opp-table */
>>> +                cpu_opp_tbl = dev_pm_opp_get_opp_table(cpu_dev);
>>> +                if (!cpu_opp_tbl) {
>>> +                    dev_err(dev, "CPU opp table get failed.\n");
>>> +                    goto put_cpu_node;
>>> +                }
>>> +
>>> +                /* Match the cpu opp node from required-opp with
>>> +                 * the cpu-opp table */
>>> +                cpu_opp = dev_pm_opp_find_opp_of_np(cpu_opp_tbl,
>>> +                                    cpu_np);
>>> +                if (!cpu_opp) {
>>> +                    dev_dbg(dev, "CPU opp get failed.\n");
>>> +                    goto put_cpu_opp_table;
>>> +                }
>>> +
>>> +                cpu_khz = dev_pm_opp_get_freq(cpu_opp);
>>> +                if (cpu_opp && cpu_khz) {
>>> +                    /* Update freq-map if not already set */
>>> +                    map = cpu_map[cpu];
>>> +                    map[iter_val].cpu_khz = cpu_khz / 1000;
>>> +                    map[iter_val].dev_hz = dev_hz;
>>> +                }
>>> +                dev_pm_opp_put(cpu_opp);
>>> +put_cpu_opp_table:
>>> +                dev_pm_opp_put_opp_table(cpu_opp_tbl);
>>> +put_cpu_node:
>>> +                of_node_put(cpu_np);
>>> +            }
>>> +        }
>>> +        iter_val++;
>>> +    }
>>> +    dev_pm_opp_put_opp_table(opp_table);
>>> +put_opp_table:
>>> +    of_node_put(opp_table_np);
>>> +out:
>>> +    put_online_cpus();
>>> +
>>> +    /* Update devfreq */
>>> +    mutex_lock(&devfreq->lock);
>>> +    ret = update_devfreq(devfreq);
>>> +    mutex_unlock(&devfreq->lock);
>>> +    if (ret)
>>> +        dev_err(dev, "Frequency update failed.\n");
>>> +
>>> +    return ret;
>>> +}
>>> +
>>> +static int cpufreq_passive_unregister(struct devfreq_passive_data **p_data)
>>> +{
>>> +    int cpu;
>>> +    struct devfreq_passive_data *data = *p_data;
>>> +
>>> +    cpufreq_unregister_notifier(&data->nb,
>>> +                    CPUFREQ_TRANSITION_NOTIFIER);
>>> +
>>> +    for_each_possible_cpu(cpu) {
>>> +        kfree(data->state[cpu]);
>>> +        kfree(data->map[cpu]);
>>> +        data->state[cpu] = NULL;
>>> +        data->map[cpu] = NULL;
>>> +    }
>>> +
>>> +    kfree(data->map);
>>> +    data->map = NULL;
>>> +
>>> +    return 0;
>>> +}
>>> +
>>>  static int devfreq_passive_event_handler(struct devfreq *devfreq,
>>>                  unsigned int event, void *data)
>>>  {
>>> @@ -159,7 +415,7 @@ static int devfreq_passive_event_handler(struct devfreq *devfreq,
>>>      struct notifier_block *nb = &p_data->nb;
>>>      int ret = 0;
>>>
>>> -    if (!parent)
>>> +    if (!parent && !p_data->cpufreq_type)
>>>          return -EPROBE_DEFER;
>>
>> It makes the fault for the existing devfreq devices with passive governor.
>> Please remove '!p_data->cpufreq_type' condition.
>>
>>>
>>>      switch (event) {
>>> @@ -167,13 +423,21 @@ static int devfreq_passive_event_handler(struct devfreq *devfreq,
>>>          if (!p_data->this)
>>>              p_data->this = devfreq;
>>>
>>> -        nb->notifier_call = devfreq_passive_notifier_call;
>>> -        ret = devm_devfreq_register_notifier(dev, parent, nb,
>>> -                    DEVFREQ_TRANSITION_NOTIFIER);
>>> +        if (p_data->cpufreq_type) {
>>> +            ret = cpufreq_passive_register(&p_data);
>>> +        } else {
>>> +            nb->notifier_call = devfreq_passive_notifier_call;
>>> +            ret = devm_devfreq_register_notifier(dev, parent, nb,
>>> +                        DEVFREQ_TRANSITION_NOTIFIER);
>>> +        }
>>
>> I suggested the my opinion aboyt 'cpufreq_type' variable below.
>> You can separte it for more clready with using parent device type.
>>
>>         if (p_data->parent_type == DEVFREQ_PARENT_DEV)
>>             ...
>>         else if (p_data->parent_type == CPUFREQ_PARENT_DEV)
>>             ...
>>         else
>>             // error handling
>>
>>>          break;
>>>      case DEVFREQ_GOV_STOP:
>>> -        devm_devfreq_unregister_notifier(dev, parent, nb,
>>> -                    DEVFREQ_TRANSITION_NOTIFIER);
>>> +        if (p_data->cpufreq_type) {
>>> +            cpufreq_passive_unregister(&p_data);
>>> +        } else {
>>> +            devm_devfreq_unregister_notifier(dev, parent, nb,
>>> +                        DEVFREQ_TRANSITION_NOTIFIER);
>>> +        }
>>
>> ditto.
>>
>>>          break;
>>>      default:
>>>          break;
>>> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
>>> index fbffa74bfc1b..e8235fbe49e6 100644
>>> --- a/include/linux/devfreq.h
>>> +++ b/include/linux/devfreq.h
>>> @@ -265,6 +265,38 @@ struct devfreq_simple_ondemand_data {
>>>  #endif
>>>
>>>  #if IS_ENABLED(CONFIG_DEVFREQ_GOV_PASSIVE)
>>> +/**
>>> + * struct devfreq_cpu_state - holds the per-cpu state
>>> + * @freq:    holds the current frequency of the cpu.
>>> + * @min_freq:    holds the min frequency of the cpu.
>>> + * @max_freq:    holds the max frequency of the cpu.
>>> + * @first_cpu:    holds the cpumask of the first cpu of a policy.
>>> + *
>>> + * This structure stores the required cpu_state of a cpu.
>>> + * This is auto-populated by the governor.
>>> + */
>>> +struct devfreq_cpu_state {
>>> +    unsigned int freq;
>>> +    unsigned int min_freq;
>>> +    unsigned int max_freq;
>>> +    unsigned int first_cpu;
>>> +};
>>> +
>>> +/**
>>> + * struct devfreq_map - holds mapping from cpu frequency
>>> + * to devfreq frequency
>>> + * @cpu_khz:    holds the cpu frequency in Khz
>>> + * @dev_hz:    holds the devfreq device frequency in Hz
>>> + *
>>> + * This structure stores the lookup table between cpu
>>> + * and the devfreq device. This is auto-populated by the
>>> + * governor.
>>> + */
>>> +struct devfreq_map {
>>
>> How about changing the structure name as following or others?
>> - devfreq_freq_map or devfreq_cpufreq_map or others.
>>
>> Because this structure name guessing the meaning of mapping
>> between devfreq frequency and cpufreq frequency.
>>
>>> +    unsigned int cpu_khz;
>>> +    unsigned int dev_hz;
>>> +};
>>> +
>>>  /**
>>>   * struct devfreq_passive_data - void *data fed to struct devfreq
>>>   *    and devfreq_add_device
>>> @@ -278,11 +310,13 @@ struct devfreq_simple_ondemand_data {
>>>   *            the next frequency, should use this callback.
>>>   * @this:    the devfreq instance of own device.
>>>   * @nb:        the notifier block for DEVFREQ_TRANSITION_NOTIFIER list
>>> + * @state:    holds the state min/max/current frequency of all online cpu's
>>
>> As I commented above, how about using 'cpu_state' instead of 'state'?
>> in order to get the meaning from just variable name.
>>
>> nitpick. Also,  I think that you can skip 'holds' from the description
>> of 'state' variable.
>>
>>
>>> + * @map:    holds the maps between cpu frequency and device frequency
>>
>> How about using 'cpufreq_map' instead of 'map' for the readability?
>> IMHO, Because this variable is only used when parent device is cpu.
>> I think that if you add to specify the meaningful prefix about cpu to
>> variable name,
>> it is easy to catch the meaning of variable.
>> - map -> cpufreq_map.
>>
>> nitpick. Also,  I think that you can skip 'holds' from the description
>> of 'map' variable.
>>
>>>   *
>>>   * The devfreq_passive_data have to set the devfreq instance of parent
>>>   * device with governors except for the passive governor. But, don't need to
>>> - * initialize the 'this' and 'nb' field because the devfreq core will handle
>>> - * them.
>>> + * initialize the 'this', 'nb', 'state' and 'map' field because the devfreq
>>
>> If you agree my opinion above,
>> - state -> cpu_state.
>> - map -> cpufreq_map
>>
>>> + * core will handle them.
>>>   */
>>>  struct devfreq_passive_data {
>>>      /* Should set the devfreq instance of parent device */
>>> @@ -291,9 +325,14 @@ struct devfreq_passive_data {
>>>      /* Optional callback to decide the next frequency of passvice device */
>>>      int (*get_target_freq)(struct devfreq *this, unsigned long *freq);
>>>
>>> +    /* Should be set if the devfreq device wants to be scaled with cpu*/
>>> +    u8 cpufreq_type;
>>
>> The devfreq devices with passive governor have always parent
>> either devfreq device or cpufreq device. So, you better to specify
>> the parent type as following: I think that it is more clear to check
>> the type of parent device.
>>
>>     enum devfreq_parent_dev_type {
>>         DEVFREQ_PARENT_DEV,
>>         CPUFREQ_PARENT_DEV,
>>     };
>>
>>     enum devfreq_parent_dev_type parent_type;
>>
>>> +
>>>      /* For passive governor's internal use. Don't need to set them */
>>>      struct devfreq *this;
>>>      struct notifier_block nb;
>>> +    struct devfreq_cpu_state *state[NR_CPUS];
>>> +    struct devfreq_map **map;
>>
>> ditto.
>>
>>>  };
>>>  #endif
>>>
>>>
> 


-- 
Best Regards,
Chanwoo Choi
Samsung Electronics

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC 2/9] OPP: Export a number of helpers to prevent code duplication
  2019-03-28 15:28 ` [PATCH RFC 2/9] OPP: Export a number of helpers to prevent code duplication Sibi Sankar
@ 2019-07-08  3:28   ` Hsin-Yi Wang
  2019-07-10  8:01     ` Sibi Sankar
  0 siblings, 1 reply; 21+ messages in thread
From: Hsin-Yi Wang @ 2019-07-08  3:28 UTC (permalink / raw)
  To: Sibi Sankar
  Cc: Rob Herring, andy.gross, MyungJoo Ham, Kyungmin Park,
	Rafael J. Wysocki, Viresh Kumar, Nishanth Menon, Stephen Boyd,
	georgi.djakov, bjorn.andersson, david.brown, Mark Rutland, lkml,
	linux-arm-msm-owner, devicetree, rnayak, Chanwoo Choi, linux-pm,
	evgreen, daidavid1, dianders

On Thu, Mar 28, 2019 at 3:28 PM Sibi Sankar <sibis@codeaurora.org> wrote:

> +
> +/* The caller must call dev_pm_opp_put() after the OPP is used */
> +struct dev_pm_opp *dev_pm_opp_find_opp_of_np(struct opp_table *opp_table,
> +                                            struct device_node *opp_np)
> +{
> +       return _find_opp_of_np(opp_table, opp_np);
> +}
Hi Sibi,

Though this is not the latest version, we've seen following issue:

We would get lockdep warnings on this:
[   79.068957] Call trace:
[   79.071396]  _find_opp_of_np+0xa0/0xa8
[   79.075136]  dev_pm_opp_find_opp_of_np+0x24/0x30
[   79.079744]  devfreq_passive_event_handler+0x304/0x51c
[   79.084872]  devfreq_add_device+0x368/0x434
[   79.089046]  devm_devfreq_add_device+0x68/0xb0
[   79.093480]  mtk_cci_devfreq_probe+0x108/0x158
[   79.097915]  platform_drv_probe+0x80/0xb0
[   79.101915]  really_probe+0x1b4/0x28c
[   79.105568]  driver_probe_device+0x64/0xfc
[   79.109655]  __driver_attach+0x94/0xcc
[   79.113395]  bus_for_each_dev+0x84/0xcc
[   79.117221]  driver_attach+0x2c/0x38
[   79.120788]  bus_add_driver+0x120/0x1f4
[   79.124614]  driver_register+0x64/0xf8
[   79.128355]  __platform_driver_register+0x4c/0x58
[   79.133049]  mtk_cci_devfreq_init+0x1c/0x24
[   79.137224]  do_one_initcall+0x1c0/0x3e0
[   79.141138]  do_initcall_level+0x1f4/0x224
[   79.145225]  do_basic_setup+0x34/0x4c
[   79.148878]  kernel_init_freeable+0x10c/0x194
[   79.153225]  kernel_init+0x14/0x100
[   79.156705]  ret_from_fork+0x10/0x18
[   79.160270] irq event stamp: 238006
[   79.163750] hardirqs last  enabled at (238005):
[<ffffffa71fdea0a4>] _raw_spin_unlock_irqrestore+0x40/0x84
[   79.173391] hardirqs last disabled at (238006):
[<ffffffa71f480e78>] do_debug_exception+0x70/0x198
[   79.182337] softirqs last  enabled at (237998):
[<ffffffa71f48165c>] __do_softirq+0x45c/0x4a4
[   79.190850] softirqs last disabled at (237987):
[<ffffffa71f4bc0d4>] irq_exit+0xd8/0xf8
[   79.198842] ---[ end trace 0e66a55077a0abab ]---

In _find_opp_of_np()[1], there's
lockdep_assert_held(&opp_table_lock);

[1] https://elixir.bootlin.com/linux/latest/source/drivers/opp/of.c#L75

But in governor passive.c#cpufreq_passive_register(), it call
dev_pm_opp_find_opp_of_np() directly, so it wouldn't access
opp_table_lock lock.

Another similar place is in dev_pm_opp_of_add_table(), most devfreq
would call this to get opp table.
dev_pm_opp_of_add_table
 -->   _opp_add_static_v2
    -->    _of_opp_alloc_required_opps  // would goes here if opp
table contains "required-opps" property.
        -->    _find_opp_of_np
cpufreq-map governor needs devfreq to have "required-opps" property.
So it would also trigger above lockdep warning.


The question is: Is lockdep_assert_held(&opp_table_lock); needed in
above use cases? Since they don't need to modify device and opp lists.

Thanks

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC 2/9] OPP: Export a number of helpers to prevent code duplication
  2019-07-08  3:28   ` Hsin-Yi Wang
@ 2019-07-10  8:01     ` Sibi Sankar
  0 siblings, 0 replies; 21+ messages in thread
From: Sibi Sankar @ 2019-07-10  8:01 UTC (permalink / raw)
  To: Hsin-Yi Wang
  Cc: Rob Herring, andy.gross, MyungJoo Ham, Kyungmin Park,
	Rafael J. Wysocki, Viresh Kumar, Nishanth Menon, Stephen Boyd,
	georgi.djakov, bjorn.andersson, david.brown, Mark Rutland, lkml,
	linux-arm-msm-owner, devicetree, rnayak, Chanwoo Choi, linux-pm,
	evgreen, daidavid1, dianders

Hi Hsin-Yi,

I'll get this addressed in the next re-spin which I plan to post by
end of this week.

On 7/8/19 8:58 AM, Hsin-Yi Wang wrote:
> On Thu, Mar 28, 2019 at 3:28 PM Sibi Sankar <sibis@codeaurora.org> wrote:
> 
>> +
>> +/* The caller must call dev_pm_opp_put() after the OPP is used */
>> +struct dev_pm_opp *dev_pm_opp_find_opp_of_np(struct opp_table *opp_table,
>> +                                            struct device_node *opp_np)
>> +{
>> +       return _find_opp_of_np(opp_table, opp_np);
>> +}
> Hi Sibi,
> 
> Though this is not the latest version, we've seen following issue:
> 
> We would get lockdep warnings on this:
> [   79.068957] Call trace:
> [   79.071396]  _find_opp_of_np+0xa0/0xa8
> [   79.075136]  dev_pm_opp_find_opp_of_np+0x24/0x30
> [   79.079744]  devfreq_passive_event_handler+0x304/0x51c
> [   79.084872]  devfreq_add_device+0x368/0x434
> [   79.089046]  devm_devfreq_add_device+0x68/0xb0
> [   79.093480]  mtk_cci_devfreq_probe+0x108/0x158
> [   79.097915]  platform_drv_probe+0x80/0xb0
> [   79.101915]  really_probe+0x1b4/0x28c
> [   79.105568]  driver_probe_device+0x64/0xfc
> [   79.109655]  __driver_attach+0x94/0xcc
> [   79.113395]  bus_for_each_dev+0x84/0xcc
> [   79.117221]  driver_attach+0x2c/0x38
> [   79.120788]  bus_add_driver+0x120/0x1f4
> [   79.124614]  driver_register+0x64/0xf8
> [   79.128355]  __platform_driver_register+0x4c/0x58
> [   79.133049]  mtk_cci_devfreq_init+0x1c/0x24
> [   79.137224]  do_one_initcall+0x1c0/0x3e0
> [   79.141138]  do_initcall_level+0x1f4/0x224
> [   79.145225]  do_basic_setup+0x34/0x4c
> [   79.148878]  kernel_init_freeable+0x10c/0x194
> [   79.153225]  kernel_init+0x14/0x100
> [   79.156705]  ret_from_fork+0x10/0x18
> [   79.160270] irq event stamp: 238006
> [   79.163750] hardirqs last  enabled at (238005):
> [<ffffffa71fdea0a4>] _raw_spin_unlock_irqrestore+0x40/0x84
> [   79.173391] hardirqs last disabled at (238006):
> [<ffffffa71f480e78>] do_debug_exception+0x70/0x198
> [   79.182337] softirqs last  enabled at (237998):
> [<ffffffa71f48165c>] __do_softirq+0x45c/0x4a4
> [   79.190850] softirqs last disabled at (237987):
> [<ffffffa71f4bc0d4>] irq_exit+0xd8/0xf8
> [   79.198842] ---[ end trace 0e66a55077a0abab ]---
> 
> In _find_opp_of_np()[1], there's
> lockdep_assert_held(&opp_table_lock);
> 
> [1] https://elixir.bootlin.com/linux/latest/source/drivers/opp/of.c#L75
> 
> But in governor passive.c#cpufreq_passive_register(), it call
> dev_pm_opp_find_opp_of_np() directly, so it wouldn't access
> opp_table_lock lock.
> 
> Another similar place is in dev_pm_opp_of_add_table(), most devfreq
> would call this to get opp table.
> dev_pm_opp_of_add_table
>   -->   _opp_add_static_v2
>      -->    _of_opp_alloc_required_opps  // would goes here if opp
> table contains "required-opps" property.
>          -->    _find_opp_of_np
> cpufreq-map governor needs devfreq to have "required-opps" property.
> So it would also trigger above lockdep warning.
> 
> 
> The question is: Is lockdep_assert_held(&opp_table_lock); needed in
> above use cases? Since they don't need to modify device and opp lists.
> 
> Thanks
> 
> 
> 

-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc, is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, back to index

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-28 15:28 [PATCH RFC 0/9] Add CPU based scaling support to Passive governor Sibi Sankar
2019-03-28 15:28 ` [PATCH RFC 1/9] OPP: Add and export helpers to get avg/peak bw Sibi Sankar
2019-03-28 15:28 ` [PATCH RFC 2/9] OPP: Export a number of helpers to prevent code duplication Sibi Sankar
2019-07-08  3:28   ` Hsin-Yi Wang
2019-07-10  8:01     ` Sibi Sankar
2019-03-28 15:28 ` [PATCH RFC 3/9] PM / devfreq: Add cpu based scaling support to passive_governor Sibi Sankar
2019-04-12  7:39   ` Chanwoo Choi
2019-05-27  8:23     ` Sibi Sankar
2019-05-28  1:27       ` Chanwoo Choi
2019-03-28 15:28 ` [PATCH RFC 4/9] dt-bindings: devfreq: Add bindings for devfreq dev-icbw driver Sibi Sankar
2019-03-28 15:28 ` [PATCH RFC 5/9] PM / devfreq: Add devfreq driver for interconnect bandwidth voting Sibi Sankar
2019-03-28 15:28 ` [PATCH RFC 6/9] OPP: Add and export helper to update voltage Sibi Sankar
2019-04-10 10:24   ` Viresh Kumar
2019-04-10 11:08     ` Sibi Sankar
2019-03-28 15:28 ` [PATCH RFC 7/9] cpufreq: qcom: Add support to update cpu node's OPP tables Sibi Sankar
2019-04-10 10:33   ` Viresh Kumar
2019-04-10 11:16     ` Sibi Sankar
2019-04-10 11:25       ` Viresh Kumar
2019-03-28 15:28 ` [PATCH RFC 8/9] arm64: dts: qcom: sdm845: Add cpu " Sibi Sankar
2019-03-28 15:28 ` [PATCH RFC 9/9] arm64: dts: qcom: sdm845: Add nodes for icbw driver and opp tables Sibi Sankar
2019-04-11  7:02 ` [PATCH RFC 0/9] Add CPU based scaling support to Passive governor Sibi Sankar

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org linux-kernel@archiver.kernel.org
	public-inbox-index lkml


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/ public-inbox