devicetree.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/6] Introduce Bandwidth OPPs for interconnect paths
@ 2019-07-03  1:10 Saravana Kannan
  2019-07-03  1:10 ` [PATCH v3 1/6] dt-bindings: opp: Introduce opp-peak-KBps and opp-avg-KBps bindings Saravana Kannan
                   ` (7 more replies)
  0 siblings, 8 replies; 48+ messages in thread
From: Saravana Kannan @ 2019-07-03  1:10 UTC (permalink / raw)
  To: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki
  Cc: Saravana Kannan, vincent.guittot, seansw, daidavid1,
	Rajendra Nayak, sibis, bjorn.andersson, evgreen, kernel-team,
	linux-pm, devicetree, linux-kernel

Interconnects and interconnect paths quantify their performance levels in
terms of bandwidth and not in terms of frequency. So similar to how we have
frequency based OPP tables in DT and in the OPP framework, we need
bandwidth OPP table support in the OPP framework and in DT. Since there can
be more than one interconnect path used by a device, we also need a way to
assign a bandwidth OPP table to an interconnect path.

This patch series:
- Adds opp-peak-KBps and opp-avg-KBps properties to OPP DT bindings
- Adds interconnect-opp-table property to interconnect DT bindings
- Adds OPP helper functions for bandwidth OPP tables
- Adds icc_get_opp_table() to get the OPP table for an interconnect path

So with the DT bindings added in this patch series, the DT for a GPU
that does bandwidth voting from GPU to Cache and GPU to DDR would look
something like this:

gpu_cache_opp_table: gpu_cache_opp_table {
	compatible = "operating-points-v2";

	gpu_cache_3000: opp-3000 {
		opp-peak-KBps = <3000>;
		opp-avg-KBps = <1000>;
	};
	gpu_cache_6000: opp-6000 {
		opp-peak-KBps = <6000>;
		opp-avg-KBps = <2000>;
	};
	gpu_cache_9000: opp-9000 {
		opp-peak-KBps = <9000>;
		opp-avg-KBps = <9000>;
	};
};

gpu_ddr_opp_table: gpu_ddr_opp_table {
	compatible = "operating-points-v2";

	gpu_ddr_1525: opp-1525 {
		opp-peak-KBps = <1525>;
		opp-avg-KBps = <452>;
	};
	gpu_ddr_3051: opp-3051 {
		opp-peak-KBps = <3051>;
		opp-avg-KBps = <915>;
	};
	gpu_ddr_7500: opp-7500 {
		opp-peak-KBps = <7500>;
		opp-avg-KBps = <3000>;
	};
};

gpu_opp_table: gpu_opp_table {
	compatible = "operating-points-v2";
	opp-shared;

	opp-200000000 {
		opp-hz = /bits/ 64 <200000000>;
	};
	opp-400000000 {
		opp-hz = /bits/ 64 <400000000>;
	};
};

gpu@7864000 {
	...
	operating-points-v2 = <&gpu_opp_table>, <&gpu_cache_opp_table>, <&gpu_ddr_opp_table>;
	interconnects = <&mmnoc MASTER_GPU_1 &bimc SLAVE_SYSTEM_CACHE>,
			<&mmnoc MASTER_GPU_1 &bimc SLAVE_DDR>;
	interconnect-names = "gpu-cache", "gpu-mem";
	interconnect-opp-table = <&gpu_cache_opp_table>, <&gpu_ddr_opp_table>
};

Cheers,
Saravana

Saravana Kannan (6):
  dt-bindings: opp: Introduce opp-peak-KBps and opp-avg-KBps bindings
  OPP: Add support for bandwidth OPP tables
  OPP: Add helper function for bandwidth OPP tables
  OPP: Add API to find an OPP table from its DT node
  dt-bindings: interconnect: Add interconnect-opp-table property
  interconnect: Add OPP table support for interconnects

 .../bindings/interconnect/interconnect.txt    |  8 ++
 Documentation/devicetree/bindings/opp/opp.txt | 15 +++-
 drivers/interconnect/core.c                   | 27 ++++++-
 drivers/opp/core.c                            | 51 +++++++++++++
 drivers/opp/of.c                              | 76 ++++++++++++++++---
 drivers/opp/opp.h                             |  4 +-
 include/linux/interconnect.h                  |  7 ++
 include/linux/pm_opp.h                        | 26 +++++++
 8 files changed, 199 insertions(+), 15 deletions(-)

-- 
2.22.0.410.gd8fdbe21b5-goog

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 1/6] dt-bindings: opp: Introduce opp-peak-KBps and opp-avg-KBps bindings
  2019-07-03  1:10 [PATCH v3 0/6] Introduce Bandwidth OPPs for interconnect paths Saravana Kannan
@ 2019-07-03  1:10 ` Saravana Kannan
  2019-07-16 17:25   ` Sibi Sankar
                     ` (2 more replies)
  2019-07-03  1:10 ` [PATCH v3 2/6] OPP: Add support for bandwidth OPP tables Saravana Kannan
                   ` (6 subsequent siblings)
  7 siblings, 3 replies; 48+ messages in thread
From: Saravana Kannan @ 2019-07-03  1:10 UTC (permalink / raw)
  To: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki
  Cc: Saravana Kannan, vincent.guittot, seansw, daidavid1,
	Rajendra Nayak, sibis, bjorn.andersson, evgreen, kernel-team,
	linux-pm, devicetree, linux-kernel

Interconnects often quantify their performance points in terms of
bandwidth. So, add opp-peak-KBps (required) and opp-avg-KBps (optional) to
allow specifying Bandwidth OPP tables in DT.

opp-peak-KBps is a required property that replace opp-hz for Bandwidth OPP
tables.

opp-avg-KBps is an optional property that can be used in Bandwidth OPP
tables.

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 Documentation/devicetree/bindings/opp/opp.txt | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/opp/opp.txt b/Documentation/devicetree/bindings/opp/opp.txt
index 76b6c79604a5..c869e87caa2a 100644
--- a/Documentation/devicetree/bindings/opp/opp.txt
+++ b/Documentation/devicetree/bindings/opp/opp.txt
@@ -83,9 +83,14 @@ properties.
 
 Required properties:
 - opp-hz: Frequency in Hz, expressed as a 64-bit big-endian integer. This is a
-  required property for all device nodes but devices like power domains. The
-  power domain nodes must have another (implementation dependent) property which
-  uniquely identifies the OPP nodes.
+  required property for all device nodes but for devices like power domains or
+  bandwidth opp tables. The power domain nodes must have another (implementation
+  dependent) property which uniquely identifies the OPP nodes. The interconnect
+  opps are required to have the opp-peak-bw property.
+
+- opp-peak-KBps: Peak bandwidth in kilobytes per second, expressed as a 32-bit
+  big-endian integer. This is a required property for all devices that don't
+  have opp-hz. For example, bandwidth OPP tables for interconnect paths.
 
 Optional properties:
 - opp-microvolt: voltage in micro Volts.
@@ -132,6 +137,10 @@ Optional properties:
 - opp-level: A value representing the performance level of the device,
   expressed as a 32-bit integer.
 
+- opp-avg-KBps: Average bandwidth in kilobytes per second, expressed as a
+  32-bit big-endian integer. This property is only meaningful in OPP tables
+  where opp-peak-KBps is present.
+
 - clock-latency-ns: Specifies the maximum possible transition latency (in
   nanoseconds) for switching to this OPP from any other OPP.
 
-- 
2.22.0.410.gd8fdbe21b5-goog

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v3 2/6] OPP: Add support for bandwidth OPP tables
  2019-07-03  1:10 [PATCH v3 0/6] Introduce Bandwidth OPPs for interconnect paths Saravana Kannan
  2019-07-03  1:10 ` [PATCH v3 1/6] dt-bindings: opp: Introduce opp-peak-KBps and opp-avg-KBps bindings Saravana Kannan
@ 2019-07-03  1:10 ` Saravana Kannan
  2019-07-16 17:33   ` Sibi Sankar
  2019-07-30 10:57   ` Amit Kucheria
  2019-07-03  1:10 ` [PATCH v3 3/6] OPP: Add helper function " Saravana Kannan
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 48+ messages in thread
From: Saravana Kannan @ 2019-07-03  1:10 UTC (permalink / raw)
  To: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki
  Cc: Saravana Kannan, vincent.guittot, seansw, daidavid1,
	Rajendra Nayak, sibis, bjorn.andersson, evgreen, kernel-team,
	linux-pm, devicetree, linux-kernel

Not all devices quantify their performance points in terms of frequency.
Devices like interconnects quantify their performance points in terms of
bandwidth. We need a way to represent these bandwidth levels in OPP. So,
add support for parsing bandwidth OPPs from DT.

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/opp/of.c  | 34 ++++++++++++++++++++++++++++++++--
 drivers/opp/opp.h |  4 +++-
 2 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/drivers/opp/of.c b/drivers/opp/of.c
index c10c782d15aa..54fa70ed2adc 100644
--- a/drivers/opp/of.c
+++ b/drivers/opp/of.c
@@ -552,6 +552,35 @@ void dev_pm_opp_of_remove_table(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(dev_pm_opp_of_remove_table);
 
+static int _read_opp_key(struct dev_pm_opp *new_opp, struct device_node *np)
+{
+	int ret;
+	u64 rate;
+	u32 bw;
+
+	ret = of_property_read_u64(np, "opp-hz", &rate);
+	if (!ret) {
+		/*
+		 * Rate is defined as an unsigned long in clk API, and so
+		 * casting explicitly to its type. Must be fixed once rate is 64
+		 * bit guaranteed in clk API.
+		 */
+		new_opp->rate = (unsigned long)rate;
+		return 0;
+	}
+
+	ret = of_property_read_u32(np, "opp-peak-KBps", &bw);
+	if (ret)
+		return ret;
+	new_opp->rate = (unsigned long) &bw;
+
+	ret = of_property_read_u32(np, "opp-avg-KBps", &bw);
+	if (!ret)
+		new_opp->avg_bw = (unsigned long) &bw;
+
+	return 0;
+}
+
 /**
  * _opp_add_static_v2() - Allocate static OPPs (As per 'v2' DT bindings)
  * @opp_table:	OPP table
@@ -589,11 +618,12 @@ static struct dev_pm_opp *_opp_add_static_v2(struct opp_table *opp_table,
 	if (!new_opp)
 		return ERR_PTR(-ENOMEM);
 
-	ret = of_property_read_u64(np, "opp-hz", &rate);
+	ret = _read_opp_key(new_opp, np);
 	if (ret < 0) {
 		/* "opp-hz" is optional for devices like power domains. */
 		if (!opp_table->is_genpd) {
-			dev_err(dev, "%s: opp-hz not found\n", __func__);
+			dev_err(dev, "%s: opp-hz or opp-peak-bw not found\n",
+				__func__);
 			goto free_opp;
 		}
 
diff --git a/drivers/opp/opp.h b/drivers/opp/opp.h
index 569b3525aa67..ead2cdafe957 100644
--- a/drivers/opp/opp.h
+++ b/drivers/opp/opp.h
@@ -59,7 +59,8 @@ extern struct list_head opp_tables;
  * @turbo:	true if turbo (boost) OPP
  * @suspend:	true if suspend OPP
  * @pstate: Device's power domain's performance state.
- * @rate:	Frequency in hertz
+ * @rate:	Frequency in hertz OR Peak bandwidth in kilobytes per second
+ * @avg_bw:	Average bandwidth in kilobytes per second
  * @level:	Performance level
  * @supplies:	Power supplies voltage/current values
  * @clock_latency_ns: Latency (in nanoseconds) of switching to this OPP's
@@ -81,6 +82,7 @@ struct dev_pm_opp {
 	bool suspend;
 	unsigned int pstate;
 	unsigned long rate;
+	unsigned long avg_bw;
 	unsigned int level;
 
 	struct dev_pm_opp_supply *supplies;
-- 
2.22.0.410.gd8fdbe21b5-goog

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v3 3/6] OPP: Add helper function for bandwidth OPP tables
  2019-07-03  1:10 [PATCH v3 0/6] Introduce Bandwidth OPPs for interconnect paths Saravana Kannan
  2019-07-03  1:10 ` [PATCH v3 1/6] dt-bindings: opp: Introduce opp-peak-KBps and opp-avg-KBps bindings Saravana Kannan
  2019-07-03  1:10 ` [PATCH v3 2/6] OPP: Add support for bandwidth OPP tables Saravana Kannan
@ 2019-07-03  1:10 ` Saravana Kannan
  2019-07-03  1:10 ` [PATCH v3 4/6] OPP: Add API to find an OPP table from its DT node Saravana Kannan
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 48+ messages in thread
From: Saravana Kannan @ 2019-07-03  1:10 UTC (permalink / raw)
  To: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki
  Cc: Saravana Kannan, vincent.guittot, seansw, daidavid1,
	Rajendra Nayak, sibis, bjorn.andersson, evgreen, kernel-team,
	linux-pm, devicetree, linux-kernel

The frequency OPP tables have helper functions to search for entries in the
table based on frequency and get the frequency values for a given (or
suspend) OPP entry.

Add similar helper functions for bandwidth OPP tables to search for entries
in the table based on peak bandwidth and to get the peak and average
bandwidth for a given (or suspend) OPP entry.

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/opp/core.c     | 51 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/pm_opp.h | 19 ++++++++++++++++
 2 files changed, 70 insertions(+)

diff --git a/drivers/opp/core.c b/drivers/opp/core.c
index 0e7703fe733f..0168862579f1 100644
--- a/drivers/opp/core.c
+++ b/drivers/opp/core.c
@@ -130,6 +130,29 @@ unsigned long dev_pm_opp_get_freq(struct dev_pm_opp *opp)
 }
 EXPORT_SYMBOL_GPL(dev_pm_opp_get_freq);
 
+/**
+ * dev_pm_opp_get_bw() - Gets the bandwidth corresponding to an available opp
+ * @opp:	opp for which frequency has to be returned for
+ * @avg_bw:	Pointer where the corresponding average bandwidth is stored.
+ *		Can be NULL.
+ *
+ * Return: Peak bandwidth in KBps corresponding to the opp, else
+ * return 0
+ */
+unsigned long dev_pm_opp_get_bw(struct dev_pm_opp *opp, unsigned long *avg_bw)
+{
+	if (IS_ERR_OR_NULL(opp) || !opp->available) {
+		pr_err("%s: Invalid parameters\n", __func__);
+		return 0;
+	}
+
+	if (avg_bw)
+		*avg_bw = opp->avg_bw;
+
+	return opp->rate;
+}
+EXPORT_SYMBOL_GPL(dev_pm_opp_get_bw);
+
 /**
  * dev_pm_opp_get_level() - Gets the level corresponding to an available opp
  * @opp:	opp for which level value has to be returned for
@@ -302,6 +325,34 @@ unsigned long dev_pm_opp_get_suspend_opp_freq(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(dev_pm_opp_get_suspend_opp_freq);
 
+/**
+ * dev_pm_opp_get_suspend_opp_bw() - Get peak bandwidth of suspend opp in KBps
+ * @dev:	device for which we do this operation
+ * @avg_bw:	Pointer where the corresponding average bandwidth is stored.
+ *		Can be NULL.
+ *
+ * Return: This function returns the peak bandwidth of the OPP marked as
+ * suspend_opp if one is available, else returns 0;
+ */
+unsigned long dev_pm_opp_get_suspend_opp_bw(struct device *dev,
+					    unsigned long *avg_bw)
+{
+	struct opp_table *opp_table;
+	unsigned long peak_bw = 0;
+
+	opp_table = _find_opp_table(dev);
+	if (IS_ERR(opp_table))
+		return 0;
+
+	if (opp_table->suspend_opp && opp_table->suspend_opp->available)
+		peak_bw = dev_pm_opp_get_bw(opp_table->suspend_opp, avg_bw);
+
+	dev_pm_opp_put_opp_table(opp_table);
+
+	return peak_bw;
+}
+EXPORT_SYMBOL_GPL(dev_pm_opp_get_suspend_opp_bw);
+
 int _get_opp_count(struct opp_table *opp_table)
 {
 	struct dev_pm_opp *opp;
diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h
index b150fe97ce5a..d4d79ac0b5b2 100644
--- a/include/linux/pm_opp.h
+++ b/include/linux/pm_opp.h
@@ -85,6 +85,7 @@ void dev_pm_opp_put_opp_table(struct opp_table *opp_table);
 unsigned long dev_pm_opp_get_voltage(struct dev_pm_opp *opp);
 
 unsigned long dev_pm_opp_get_freq(struct dev_pm_opp *opp);
+unsigned long dev_pm_opp_get_bw(struct dev_pm_opp *opp, unsigned long *avg_bw);
 
 unsigned int dev_pm_opp_get_level(struct dev_pm_opp *opp);
 
@@ -95,6 +96,8 @@ unsigned long dev_pm_opp_get_max_clock_latency(struct device *dev);
 unsigned long dev_pm_opp_get_max_volt_latency(struct device *dev);
 unsigned long dev_pm_opp_get_max_transition_latency(struct device *dev);
 unsigned long dev_pm_opp_get_suspend_opp_freq(struct device *dev);
+unsigned long dev_pm_opp_get_suspend_opp_bw(struct device *dev,
+					    unsigned long *avg_bw);
 
 struct dev_pm_opp *dev_pm_opp_find_freq_exact(struct device *dev,
 					      unsigned long freq,
@@ -161,6 +164,11 @@ static inline unsigned long dev_pm_opp_get_freq(struct dev_pm_opp *opp)
 {
 	return 0;
 }
+static inline unsigned long dev_pm_opp_get_bw(struct dev_pm_opp *opp,
+					      unsigned long *avg_bw)
+{
+	return 0;
+}
 
 static inline unsigned int dev_pm_opp_get_level(struct dev_pm_opp *opp)
 {
@@ -197,6 +205,12 @@ static inline unsigned long dev_pm_opp_get_suspend_opp_freq(struct device *dev)
 	return 0;
 }
 
+static inline unsigned long dev_pm_opp_get_suspend_opp_bw(struct device *dev,
+							  unsigned long *avg_bw)
+{
+	return 0;
+}
+
 static inline struct dev_pm_opp *dev_pm_opp_find_freq_exact(struct device *dev,
 					unsigned long freq, bool available)
 {
@@ -332,6 +346,11 @@ static inline void dev_pm_opp_cpumask_remove_table(const struct cpumask *cpumask
 
 #endif		/* CONFIG_PM_OPP */
 
+#define dev_pm_opp_find_peak_bw_exact	dev_pm_opp_find_freq_exact
+#define dev_pm_opp_find_peak_bw_floor	dev_pm_opp_find_freq_floor
+#define dev_pm_opp_find_peak_bw_ceil_by_volt dev_pm_opp_find_freq_ceil_by_volt
+#define dev_pm_opp_find_peak_bw_ceil	dev_pm_opp_find_freq_ceil
+
 #if defined(CONFIG_PM_OPP) && defined(CONFIG_OF)
 int dev_pm_opp_of_add_table(struct device *dev);
 int dev_pm_opp_of_add_table_indexed(struct device *dev, int index);
-- 
2.22.0.410.gd8fdbe21b5-goog

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v3 4/6] OPP: Add API to find an OPP table from its DT node
  2019-07-03  1:10 [PATCH v3 0/6] Introduce Bandwidth OPPs for interconnect paths Saravana Kannan
                   ` (2 preceding siblings ...)
  2019-07-03  1:10 ` [PATCH v3 3/6] OPP: Add helper function " Saravana Kannan
@ 2019-07-03  1:10 ` Saravana Kannan
  2019-07-03  1:10 ` [PATCH v3 5/6] dt-bindings: interconnect: Add interconnect-opp-table property Saravana Kannan
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 48+ messages in thread
From: Saravana Kannan @ 2019-07-03  1:10 UTC (permalink / raw)
  To: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki
  Cc: Saravana Kannan, vincent.guittot, seansw, daidavid1,
	Rajendra Nayak, sibis, bjorn.andersson, evgreen, kernel-team,
	linux-pm, devicetree, linux-kernel

This allows finding a device's OPP table (when it has multiple) from a
phandle to the OPP table in DT.

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/opp/of.c       | 42 ++++++++++++++++++++++++++++++++++--------
 include/linux/pm_opp.h |  7 +++++++
 2 files changed, 41 insertions(+), 8 deletions(-)

diff --git a/drivers/opp/of.c b/drivers/opp/of.c
index 54fa70ed2adc..34c51905f56d 100644
--- a/drivers/opp/of.c
+++ b/drivers/opp/of.c
@@ -42,14 +42,9 @@ struct device_node *dev_pm_opp_of_get_opp_desc_node(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(dev_pm_opp_of_get_opp_desc_node);
 
-struct opp_table *_managed_opp(struct device *dev, int index)
+struct opp_table *_find_opp_table_from_node(struct device_node *np)
 {
 	struct opp_table *opp_table, *managed_table = NULL;
-	struct device_node *np;
-
-	np = _opp_of_get_opp_desc_node(dev->of_node, index);
-	if (!np)
-		return NULL;
 
 	list_for_each_entry(opp_table, &opp_tables, node) {
 		if (opp_table->np == np) {
@@ -69,11 +64,42 @@ struct opp_table *_managed_opp(struct device *dev, int index)
 		}
 	}
 
-	of_node_put(np);
-
 	return managed_table;
 }
 
+/**
+ * dev_pm_opp_of_find_table_from_node() - Find OPP table from its DT node
+ * @np: DT node used for finding the OPP table
+ *
+ * Return: OPP table corresponding to the DT node, else NULL on failure.
+ *
+ * The caller needs to put the node with of_node_put() after using it.
+ */
+struct opp_table *dev_pm_opp_of_find_table_from_node(struct device_node *np)
+{
+	struct opp_table *opp_table;
+
+	mutex_lock(&opp_table_lock);
+	opp_table = _find_opp_table_from_node(np);
+	mutex_unlock(&opp_table_lock);
+	return opp_table;
+}
+EXPORT_SYMBOL_GPL(dev_pm_opp_of_find_table_from_node);
+
+struct opp_table *_managed_opp(struct device *dev, int index)
+{
+	struct device_node *np;
+	struct opp_table *opp_table;
+
+	np = _opp_of_get_opp_desc_node(dev->of_node, index);
+	if (!np)
+		return NULL;
+
+	opp_table = _find_opp_table_from_node(np);
+	of_node_put(np);
+	return opp_table;
+}
+
 /* The caller must call dev_pm_opp_put() after the OPP is used */
 static struct dev_pm_opp *_find_opp_of_np(struct opp_table *opp_table,
 					  struct device_node *opp_np)
diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h
index d4d79ac0b5b2..d80c59bcd50b 100644
--- a/include/linux/pm_opp.h
+++ b/include/linux/pm_opp.h
@@ -359,6 +359,7 @@ int dev_pm_opp_of_cpumask_add_table(const struct cpumask *cpumask);
 void dev_pm_opp_of_cpumask_remove_table(const struct cpumask *cpumask);
 int dev_pm_opp_of_get_sharing_cpus(struct device *cpu_dev, struct cpumask *cpumask);
 struct device_node *dev_pm_opp_of_get_opp_desc_node(struct device *dev);
+struct opp_table *dev_pm_opp_of_find_table_from_node(struct device_node *np);
 struct device_node *dev_pm_opp_get_of_node(struct dev_pm_opp *opp);
 int of_get_required_opp_performance_state(struct device_node *np, int index);
 void dev_pm_opp_of_register_em(struct cpumask *cpus);
@@ -396,6 +397,12 @@ static inline struct device_node *dev_pm_opp_of_get_opp_desc_node(struct device
 	return NULL;
 }
 
+static inline struct opp_table *dev_pm_opp_of_find_table_from_node(
+							struct device_node *np)
+{
+	return NULL;
+}
+
 static inline struct device_node *dev_pm_opp_get_of_node(struct dev_pm_opp *opp)
 {
 	return NULL;
-- 
2.22.0.410.gd8fdbe21b5-goog

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v3 5/6] dt-bindings: interconnect: Add interconnect-opp-table property
  2019-07-03  1:10 [PATCH v3 0/6] Introduce Bandwidth OPPs for interconnect paths Saravana Kannan
                   ` (3 preceding siblings ...)
  2019-07-03  1:10 ` [PATCH v3 4/6] OPP: Add API to find an OPP table from its DT node Saravana Kannan
@ 2019-07-03  1:10 ` Saravana Kannan
  2019-07-22 23:39   ` Rob Herring
  2019-07-03  1:10 ` [PATCH v3 6/6] interconnect: Add OPP table support for interconnects Saravana Kannan
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 48+ messages in thread
From: Saravana Kannan @ 2019-07-03  1:10 UTC (permalink / raw)
  To: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki
  Cc: Saravana Kannan, vincent.guittot, seansw, daidavid1,
	Rajendra Nayak, sibis, bjorn.andersson, evgreen, kernel-team,
	linux-pm, devicetree, linux-kernel

Add support for listing bandwidth OPP tables for each interconnect path
listed using the interconnects property.

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 .../devicetree/bindings/interconnect/interconnect.txt     | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/Documentation/devicetree/bindings/interconnect/interconnect.txt b/Documentation/devicetree/bindings/interconnect/interconnect.txt
index 6f5d23a605b7..fc5b75b76a2c 100644
--- a/Documentation/devicetree/bindings/interconnect/interconnect.txt
+++ b/Documentation/devicetree/bindings/interconnect/interconnect.txt
@@ -55,10 +55,18 @@ interconnect-names : List of interconnect path name strings sorted in the same
 			 * dma-mem: Path from the device to the main memory of
 			            the system
 
+interconnect-opp-table: List of phandles to OPP tables (bandwidth OPP tables)
+			that specify the OPPs for the interconnect paths listed
+			in the interconnects property. This property can only
+			point to OPP tables that belong to the device and are
+			listed in the device's operating-points-v2 property.
+
 Example:
 
 	sdhci@7864000 {
+		operating-points-v2 = <&sdhc_opp_table>, <&sdhc_mem_opp_table>;
 		...
 		interconnects = <&pnoc MASTER_SDCC_1 &bimc SLAVE_EBI_CH0>;
 		interconnect-names = "sdhc-mem";
+		interconnect-opp-table = <&sdhc_mem_opp_table>;
 	};
-- 
2.22.0.410.gd8fdbe21b5-goog

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v3 6/6] interconnect: Add OPP table support for interconnects
  2019-07-03  1:10 [PATCH v3 0/6] Introduce Bandwidth OPPs for interconnect paths Saravana Kannan
                   ` (4 preceding siblings ...)
  2019-07-03  1:10 ` [PATCH v3 5/6] dt-bindings: interconnect: Add interconnect-opp-table property Saravana Kannan
@ 2019-07-03  1:10 ` Saravana Kannan
  2019-07-03  6:45   ` Vincent Guittot
  2019-07-26 16:25   ` Georgi Djakov
  2019-07-03  6:36 ` [PATCH v3 0/6] Introduce Bandwidth OPPs for interconnect paths Viresh Kumar
  2019-07-17 10:32 ` Viresh Kumar
  7 siblings, 2 replies; 48+ messages in thread
From: Saravana Kannan @ 2019-07-03  1:10 UTC (permalink / raw)
  To: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki
  Cc: Saravana Kannan, vincent.guittot, seansw, daidavid1,
	Rajendra Nayak, sibis, bjorn.andersson, evgreen, kernel-team,
	linux-pm, devicetree, linux-kernel

Interconnect paths can have different performance points. Now that OPP
framework supports bandwidth OPP tables, add OPP table support for
interconnects.

Devices can use the interconnect-opp-table DT property to specify OPP
tables for interconnect paths. And the driver can obtain the OPP table for
an interconnect path by calling icc_get_opp_table().

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/interconnect/core.c  | 27 ++++++++++++++++++++++++++-
 include/linux/interconnect.h |  7 +++++++
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
index 871eb4bc4efc..881bac80bc1e 100644
--- a/drivers/interconnect/core.c
+++ b/drivers/interconnect/core.c
@@ -47,6 +47,7 @@ struct icc_req {
  */
 struct icc_path {
 	size_t num_nodes;
+	struct opp_table *opp_table;
 	struct icc_req reqs[];
 };
 
@@ -313,7 +314,7 @@ struct icc_path *of_icc_get(struct device *dev, const char *name)
 {
 	struct icc_path *path = ERR_PTR(-EPROBE_DEFER);
 	struct icc_node *src_node, *dst_node;
-	struct device_node *np = NULL;
+	struct device_node *np = NULL, *opp_node;
 	struct of_phandle_args src_args, dst_args;
 	int idx = 0;
 	int ret;
@@ -381,10 +382,34 @@ struct icc_path *of_icc_get(struct device *dev, const char *name)
 		dev_err(dev, "%s: invalid path=%ld\n", __func__, PTR_ERR(path));
 	mutex_unlock(&icc_lock);
 
+	opp_node = of_parse_phandle(np, "interconnect-opp-table", idx);
+	if (opp_node) {
+		path->opp_table = dev_pm_opp_of_find_table_from_node(opp_node);
+		of_node_put(opp_node);
+	}
+
+
 	return path;
 }
 EXPORT_SYMBOL_GPL(of_icc_get);
 
+/**
+ * icc_get_opp_table() - Get the OPP table that corresponds to a path
+ * @path: reference to the path returned by icc_get()
+ *
+ * This function will return the OPP table that corresponds to a path handle.
+ * If the interconnect API is disabled, NULL is returned and the consumer
+ * drivers will still build. Drivers are free to handle this specifically, but
+ * they don't have to.
+ *
+ * Return: opp_table pointer on success. NULL is returned when the API is
+ * disabled or the OPP table is missing.
+ */
+struct opp_table *icc_get_opp_table(struct icc_path *path)
+{
+	return path->opp_table;
+}
+
 /**
  * icc_set_bw() - set bandwidth constraints on an interconnect path
  * @path: reference to the path returned by icc_get()
diff --git a/include/linux/interconnect.h b/include/linux/interconnect.h
index dc25864755ba..0c0bc55f0e89 100644
--- a/include/linux/interconnect.h
+++ b/include/linux/interconnect.h
@@ -9,6 +9,7 @@
 
 #include <linux/mutex.h>
 #include <linux/types.h>
+#include <linux/pm_opp.h>
 
 /* macros for converting to icc units */
 #define Bps_to_icc(x)	((x) / 1000)
@@ -28,6 +29,7 @@ struct device;
 struct icc_path *icc_get(struct device *dev, const int src_id,
 			 const int dst_id);
 struct icc_path *of_icc_get(struct device *dev, const char *name);
+struct opp_table *icc_get_opp_table(struct icc_path *path);
 void icc_put(struct icc_path *path);
 int icc_set_bw(struct icc_path *path, u32 avg_bw, u32 peak_bw);
 
@@ -49,6 +51,11 @@ static inline void icc_put(struct icc_path *path)
 {
 }
 
+static inline struct opp_table *icc_get_opp_table(struct icc_path *path)
+{
+	return NULL;
+}
+
 static inline int icc_set_bw(struct icc_path *path, u32 avg_bw, u32 peak_bw)
 {
 	return 0;
-- 
2.22.0.410.gd8fdbe21b5-goog

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 0/6] Introduce Bandwidth OPPs for interconnect paths
  2019-07-03  1:10 [PATCH v3 0/6] Introduce Bandwidth OPPs for interconnect paths Saravana Kannan
                   ` (5 preceding siblings ...)
  2019-07-03  1:10 ` [PATCH v3 6/6] interconnect: Add OPP table support for interconnects Saravana Kannan
@ 2019-07-03  6:36 ` Viresh Kumar
  2019-07-03 20:35   ` Saravana Kannan
  2019-07-17 10:32 ` Viresh Kumar
  7 siblings, 1 reply; 48+ messages in thread
From: Viresh Kumar @ 2019-07-03  6:36 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, vincent.guittot,
	seansw, daidavid1, Rajendra Nayak, sibis, bjorn.andersson,
	evgreen, kernel-team, linux-pm, devicetree, linux-kernel

On 02-07-19, 18:10, Saravana Kannan wrote:
> Interconnects and interconnect paths quantify their performance levels in
> terms of bandwidth and not in terms of frequency. So similar to how we have
> frequency based OPP tables in DT and in the OPP framework, we need
> bandwidth OPP table support in the OPP framework and in DT. Since there can
> be more than one interconnect path used by a device, we also need a way to
> assign a bandwidth OPP table to an interconnect path.
> 
> This patch series:
> - Adds opp-peak-KBps and opp-avg-KBps properties to OPP DT bindings
> - Adds interconnect-opp-table property to interconnect DT bindings
> - Adds OPP helper functions for bandwidth OPP tables
> - Adds icc_get_opp_table() to get the OPP table for an interconnect path
> 
> So with the DT bindings added in this patch series, the DT for a GPU
> that does bandwidth voting from GPU to Cache and GPU to DDR would look
> something like this:

And what changed since V2 ?

-- 
viresh

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 6/6] interconnect: Add OPP table support for interconnects
  2019-07-03  1:10 ` [PATCH v3 6/6] interconnect: Add OPP table support for interconnects Saravana Kannan
@ 2019-07-03  6:45   ` Vincent Guittot
  2019-07-03 21:33     ` Saravana Kannan
  2019-07-26 16:25   ` Georgi Djakov
  1 sibling, 1 reply; 48+ messages in thread
From: Vincent Guittot @ 2019-07-03  6:45 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Sweeney, Sean,
	daidavid1, Rajendra Nayak, sibis, Bjorn Andersson, Evan Green,
	kernel-team, open list:THERMAL,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	linux-kernel

On Wed, 3 Jul 2019 at 03:10, Saravana Kannan <saravanak@google.com> wrote:
>
> Interconnect paths can have different performance points. Now that OPP
> framework supports bandwidth OPP tables, add OPP table support for
> interconnects.
>
> Devices can use the interconnect-opp-table DT property to specify OPP
> tables for interconnect paths. And the driver can obtain the OPP table for
> an interconnect path by calling icc_get_opp_table().

The opp table of a path must come from the aggregation of OPP tables
of the interconnect providers. So such kind of OPP table should be at
provider level but not at path level.

>
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  drivers/interconnect/core.c  | 27 ++++++++++++++++++++++++++-
>  include/linux/interconnect.h |  7 +++++++
>  2 files changed, 33 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
> index 871eb4bc4efc..881bac80bc1e 100644
> --- a/drivers/interconnect/core.c
> +++ b/drivers/interconnect/core.c
> @@ -47,6 +47,7 @@ struct icc_req {
>   */
>  struct icc_path {
>         size_t num_nodes;
> +       struct opp_table *opp_table;
>         struct icc_req reqs[];
>  };
>
> @@ -313,7 +314,7 @@ struct icc_path *of_icc_get(struct device *dev, const char *name)
>  {
>         struct icc_path *path = ERR_PTR(-EPROBE_DEFER);
>         struct icc_node *src_node, *dst_node;
> -       struct device_node *np = NULL;
> +       struct device_node *np = NULL, *opp_node;
>         struct of_phandle_args src_args, dst_args;
>         int idx = 0;
>         int ret;
> @@ -381,10 +382,34 @@ struct icc_path *of_icc_get(struct device *dev, const char *name)
>                 dev_err(dev, "%s: invalid path=%ld\n", __func__, PTR_ERR(path));
>         mutex_unlock(&icc_lock);
>
> +       opp_node = of_parse_phandle(np, "interconnect-opp-table", idx);
> +       if (opp_node) {
> +               path->opp_table = dev_pm_opp_of_find_table_from_node(opp_node);
> +               of_node_put(opp_node);
> +       }
> +
> +
>         return path;
>  }
>  EXPORT_SYMBOL_GPL(of_icc_get);
>
> +/**
> + * icc_get_opp_table() - Get the OPP table that corresponds to a path
> + * @path: reference to the path returned by icc_get()
> + *
> + * This function will return the OPP table that corresponds to a path handle.
> + * If the interconnect API is disabled, NULL is returned and the consumer
> + * drivers will still build. Drivers are free to handle this specifically, but
> + * they don't have to.
> + *
> + * Return: opp_table pointer on success. NULL is returned when the API is
> + * disabled or the OPP table is missing.
> + */
> +struct opp_table *icc_get_opp_table(struct icc_path *path)
> +{
> +       return path->opp_table;
> +}
> +
>  /**
>   * icc_set_bw() - set bandwidth constraints on an interconnect path
>   * @path: reference to the path returned by icc_get()
> diff --git a/include/linux/interconnect.h b/include/linux/interconnect.h
> index dc25864755ba..0c0bc55f0e89 100644
> --- a/include/linux/interconnect.h
> +++ b/include/linux/interconnect.h
> @@ -9,6 +9,7 @@
>
>  #include <linux/mutex.h>
>  #include <linux/types.h>
> +#include <linux/pm_opp.h>
>
>  /* macros for converting to icc units */
>  #define Bps_to_icc(x)  ((x) / 1000)
> @@ -28,6 +29,7 @@ struct device;
>  struct icc_path *icc_get(struct device *dev, const int src_id,
>                          const int dst_id);
>  struct icc_path *of_icc_get(struct device *dev, const char *name);
> +struct opp_table *icc_get_opp_table(struct icc_path *path);
>  void icc_put(struct icc_path *path);
>  int icc_set_bw(struct icc_path *path, u32 avg_bw, u32 peak_bw);
>
> @@ -49,6 +51,11 @@ static inline void icc_put(struct icc_path *path)
>  {
>  }
>
> +static inline struct opp_table *icc_get_opp_table(struct icc_path *path)
> +{
> +       return NULL;
> +}
> +
>  static inline int icc_set_bw(struct icc_path *path, u32 avg_bw, u32 peak_bw)
>  {
>         return 0;
> --
> 2.22.0.410.gd8fdbe21b5-goog
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 0/6] Introduce Bandwidth OPPs for interconnect paths
  2019-07-03  6:36 ` [PATCH v3 0/6] Introduce Bandwidth OPPs for interconnect paths Viresh Kumar
@ 2019-07-03 20:35   ` Saravana Kannan
  0 siblings, 0 replies; 48+ messages in thread
From: Saravana Kannan @ 2019-07-03 20:35 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Vincent Guittot,
	seansw, daidavid1, Rajendra Nayak, sibis, Bjorn Andersson,
	evgreen, Android Kernel Team, Linux PM, devicetree, LKML

On Tue, Jul 2, 2019 at 11:36 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 02-07-19, 18:10, Saravana Kannan wrote:
> > Interconnects and interconnect paths quantify their performance levels in
> > terms of bandwidth and not in terms of frequency. So similar to how we have
> > frequency based OPP tables in DT and in the OPP framework, we need
> > bandwidth OPP table support in the OPP framework and in DT. Since there can
> > be more than one interconnect path used by a device, we also need a way to
> > assign a bandwidth OPP table to an interconnect path.
> >
> > This patch series:
> > - Adds opp-peak-KBps and opp-avg-KBps properties to OPP DT bindings
> > - Adds interconnect-opp-table property to interconnect DT bindings
> > - Adds OPP helper functions for bandwidth OPP tables
> > - Adds icc_get_opp_table() to get the OPP table for an interconnect path
> >
> > So with the DT bindings added in this patch series, the DT for a GPU
> > that does bandwidth voting from GPU to Cache and GPU to DDR would look
> > something like this:
>
> And what changed since V2 ?

Sorry, forgot to put that in. I just dropped a lot of patches that
weren't relevant to the idea of BW OPPs and tying them to
interconnects.

-Saravana

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 6/6] interconnect: Add OPP table support for interconnects
  2019-07-03  6:45   ` Vincent Guittot
@ 2019-07-03 21:33     ` Saravana Kannan
  2019-07-04  7:12       ` Vincent Guittot
  0 siblings, 1 reply; 48+ messages in thread
From: Saravana Kannan @ 2019-07-03 21:33 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Sweeney, Sean,
	daidavid1, Rajendra Nayak, sibis, Bjorn Andersson, Evan Green,
	Android Kernel Team, open list:THERMAL,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	linux-kernel

On Tue, Jul 2, 2019 at 11:45 PM Vincent Guittot
<vincent.guittot@linaro.org> wrote:
>
> On Wed, 3 Jul 2019 at 03:10, Saravana Kannan <saravanak@google.com> wrote:
> >
> > Interconnect paths can have different performance points. Now that OPP
> > framework supports bandwidth OPP tables, add OPP table support for
> > interconnects.
> >
> > Devices can use the interconnect-opp-table DT property to specify OPP
> > tables for interconnect paths. And the driver can obtain the OPP table for
> > an interconnect path by calling icc_get_opp_table().
>
> The opp table of a path must come from the aggregation of OPP tables
> of the interconnect providers.

The aggregation of OPP tables of the providers is certainly the
superset of what a path can achieve, but to say that OPPs for
interconnect path should match that superset is an oversimplification
of the reality in hardware.

There are lots of reasons an interconnect path might not want to use
all the available bandwidth options across all the interconnects in
the route.

1. That particular path might not have been validated or verified
   during the HW design process for some of the frequencies/bandwidth
   combinations of the providers.

2. Similarly during parts screening in the factory, some of the
   combinations might not have been screened and can't be guaranteed
   to work.

3. Only a certain set of bandwidth levels might make sense to use from
   a power/performance balance given the device using it. For example:
   - The big CPU might not want to use some of the lower bandwidths
     but the little CPU might want to.
   - The big CPU might not want to use some intermediate bandwidth
     points if they don't save a lot of power compared to a higher
     bandwidth levels, but the little CPU might want to.
   - The little CPU might never want to use the higher set of
     bandwidth levels since they won't be power efficient for the use
     cases that might run on it.

4. It might not make sense from a system level power perspective.
Let's take an example of a path S (source) -> A -> B -> C -> D
(destination).
   - A supports only 2, 5, 7 and 10 GB/s. B supports 1, 2 ... 10 GB/s.
     C supports 5 and 10 GB/s
   - If you combine and list the superset of bandwidth levels
     supported in that path, that'd be 1, 2, 3, ... 10 GB/s.
   - Which set of bandwidth levels make sense will depend on the
     hardware characteristics of the interconnects.
   - If B is the biggest power sink, then you might want to use all 10
     levels.
   - If A is the biggest power sink, then you might want to use all 2,
     5 and 10 GB/s of the levels.
   - If C is the biggest power sink then you might only want to use 5
     and 10 GB/s
   - The more hops and paths you get the more convoluted this gets.

5. The design of the interconnects themselves might have an impact on
which bandwidth levels are used.
   - For example, the FIFO depth between two specific interconnects
     might affect the valid bandwidth levels for a specific path.
   - Say S1 -> A -> B -> D1, S2 -> C -> B -> D1 and S2 -> C -> D2 are
     three paths.
   - If C <-> B FIFO depth is small, then there might be a requirement
     that C and B be closely performance matched to avoid system level
     congestion due to back pressure.
   - So S2 -> D1 path can't use all the bandwidth levels supported by
     C-B combination.
   - But S2 -> D2 can use all the bandwidth levels supported by C.
   - And S1 -> D1 can use all the levels supported by A-B combination.

These are just some of the reasons I could recollect in a few minutes.
These are all real world cases I had to deal with in the past several
years of dealing with scaling interconnects. I'm sure vendors and SoCs
I'm not familiar with have other good reasons I'm not aware of.

Trying to figure this all out by aggregating OPP tables of
interconnect providers just isn't feasible nor is it efficient. The
OPP tables for an interconnect path is describing the valid BW levels
supported by that path and verified in hardware and makes a lot of
sense to capture it clearly in DT.

> So such kind of OPP table should be at
> provider level but not at path level.

They can also use it if they want to, but they'll probably want to use
a frequency OPP table.


-Saravana

>
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> >  drivers/interconnect/core.c  | 27 ++++++++++++++++++++++++++-
> >  include/linux/interconnect.h |  7 +++++++
> >  2 files changed, 33 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
> > index 871eb4bc4efc..881bac80bc1e 100644
> > --- a/drivers/interconnect/core.c
> > +++ b/drivers/interconnect/core.c
> > @@ -47,6 +47,7 @@ struct icc_req {
> >   */
> >  struct icc_path {
> >         size_t num_nodes;
> > +       struct opp_table *opp_table;
> >         struct icc_req reqs[];
> >  };
> >
> > @@ -313,7 +314,7 @@ struct icc_path *of_icc_get(struct device *dev, const char *name)
> >  {
> >         struct icc_path *path = ERR_PTR(-EPROBE_DEFER);
> >         struct icc_node *src_node, *dst_node;
> > -       struct device_node *np = NULL;
> > +       struct device_node *np = NULL, *opp_node;
> >         struct of_phandle_args src_args, dst_args;
> >         int idx = 0;
> >         int ret;
> > @@ -381,10 +382,34 @@ struct icc_path *of_icc_get(struct device *dev, const char *name)
> >                 dev_err(dev, "%s: invalid path=%ld\n", __func__, PTR_ERR(path));
> >         mutex_unlock(&icc_lock);
> >
> > +       opp_node = of_parse_phandle(np, "interconnect-opp-table", idx);
> > +       if (opp_node) {
> > +               path->opp_table = dev_pm_opp_of_find_table_from_node(opp_node);
> > +               of_node_put(opp_node);
> > +       }
> > +
> > +
> >         return path;
> >  }
> >  EXPORT_SYMBOL_GPL(of_icc_get);
> >
> > +/**
> > + * icc_get_opp_table() - Get the OPP table that corresponds to a path
> > + * @path: reference to the path returned by icc_get()
> > + *
> > + * This function will return the OPP table that corresponds to a path handle.
> > + * If the interconnect API is disabled, NULL is returned and the consumer
> > + * drivers will still build. Drivers are free to handle this specifically, but
> > + * they don't have to.
> > + *
> > + * Return: opp_table pointer on success. NULL is returned when the API is
> > + * disabled or the OPP table is missing.
> > + */
> > +struct opp_table *icc_get_opp_table(struct icc_path *path)
> > +{
> > +       return path->opp_table;
> > +}
> > +
> >  /**
> >   * icc_set_bw() - set bandwidth constraints on an interconnect path
> >   * @path: reference to the path returned by icc_get()
> > diff --git a/include/linux/interconnect.h b/include/linux/interconnect.h
> > index dc25864755ba..0c0bc55f0e89 100644
> > --- a/include/linux/interconnect.h
> > +++ b/include/linux/interconnect.h
> > @@ -9,6 +9,7 @@
> >
> >  #include <linux/mutex.h>
> >  #include <linux/types.h>
> > +#include <linux/pm_opp.h>
> >
> >  /* macros for converting to icc units */
> >  #define Bps_to_icc(x)  ((x) / 1000)
> > @@ -28,6 +29,7 @@ struct device;
> >  struct icc_path *icc_get(struct device *dev, const int src_id,
> >                          const int dst_id);
> >  struct icc_path *of_icc_get(struct device *dev, const char *name);
> > +struct opp_table *icc_get_opp_table(struct icc_path *path);
> >  void icc_put(struct icc_path *path);
> >  int icc_set_bw(struct icc_path *path, u32 avg_bw, u32 peak_bw);
> >
> > @@ -49,6 +51,11 @@ static inline void icc_put(struct icc_path *path)
> >  {
> >  }
> >
> > +static inline struct opp_table *icc_get_opp_table(struct icc_path *path)
> > +{
> > +       return NULL;
> > +}
> > +
> >  static inline int icc_set_bw(struct icc_path *path, u32 avg_bw, u32 peak_bw)
> >  {
> >         return 0;
> > --
> > 2.22.0.410.gd8fdbe21b5-goog
> >

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 6/6] interconnect: Add OPP table support for interconnects
  2019-07-03 21:33     ` Saravana Kannan
@ 2019-07-04  7:12       ` Vincent Guittot
  2019-07-07 21:48         ` Saravana Kannan
  0 siblings, 1 reply; 48+ messages in thread
From: Vincent Guittot @ 2019-07-04  7:12 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Sweeney, Sean,
	daidavid1, Rajendra Nayak, sibis, Bjorn Andersson, Evan Green,
	Android Kernel Team, open list:THERMAL,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	linux-kernel

On Wed, 3 Jul 2019 at 23:33, Saravana Kannan <saravanak@google.com> wrote:
>
> On Tue, Jul 2, 2019 at 11:45 PM Vincent Guittot
> <vincent.guittot@linaro.org> wrote:
> >
> > On Wed, 3 Jul 2019 at 03:10, Saravana Kannan <saravanak@google.com> wrote:
> > >
> > > Interconnect paths can have different performance points. Now that OPP
> > > framework supports bandwidth OPP tables, add OPP table support for
> > > interconnects.
> > >
> > > Devices can use the interconnect-opp-table DT property to specify OPP
> > > tables for interconnect paths. And the driver can obtain the OPP table for
> > > an interconnect path by calling icc_get_opp_table().
> >
> > The opp table of a path must come from the aggregation of OPP tables
> > of the interconnect providers.
>
> The aggregation of OPP tables of the providers is certainly the
> superset of what a path can achieve, but to say that OPPs for
> interconnect path should match that superset is an oversimplification
> of the reality in hardware.
>
> There are lots of reasons an interconnect path might not want to use
> all the available bandwidth options across all the interconnects in
> the route.
>
> 1. That particular path might not have been validated or verified
>    during the HW design process for some of the frequencies/bandwidth
>    combinations of the providers.

All these constraint are provider's constraints and not consumer's one

The consumer asks for a bandwidth according to its needs and then the
providers select the optimal bandwidth of each interconnect after
aggregating all the request and according to what OPP have been
validated

>
> 2. Similarly during parts screening in the factory, some of the
>    combinations might not have been screened and can't be guaranteed
>    to work.

As above, it's the provider's job to select the final bandwidth
according to its constraint

>
> 3. Only a certain set of bandwidth levels might make sense to use from
>    a power/performance balance given the device using it. For example:
>    - The big CPU might not want to use some of the lower bandwidths
>      but the little CPU might want to.
>    - The big CPU might not want to use some intermediate bandwidth
>      points if they don't save a lot of power compared to a higher
>      bandwidth levels, but the little CPU might want to.
>    - The little CPU might never want to use the higher set of
>      bandwidth levels since they won't be power efficient for the use
>      cases that might run on it.

These example are quite vague about the reasons why little might never
want to use higher bandwidth.
But then, if little doesn't ask high bandwidth it will not use them.

>
> 4. It might not make sense from a system level power perspective.
> Let's take an example of a path S (source) -> A -> B -> C -> D
> (destination).
>    - A supports only 2, 5, 7 and 10 GB/s. B supports 1, 2 ... 10 GB/s.
>      C supports 5 and 10 GB/s
>    - If you combine and list the superset of bandwidth levels
>      supported in that path, that'd be 1, 2, 3, ... 10 GB/s.
>    - Which set of bandwidth levels make sense will depend on the
>      hardware characteristics of the interconnects.
>    - If B is the biggest power sink, then you might want to use all 10
>      levels.
>    - If A is the biggest power sink, then you might want to use all 2,
>      5 and 10 GB/s of the levels.
>    - If C is the biggest power sink then you might only want to use 5
>      and 10 GB/s
>    - The more hops and paths you get the more convoluted this gets.
>
> 5. The design of the interconnects themselves might have an impact on
> which bandwidth levels are used.
>    - For example, the FIFO depth between two specific interconnects
>      might affect the valid bandwidth levels for a specific path.
>    - Say S1 -> A -> B -> D1, S2 -> C -> B -> D1 and S2 -> C -> D2 are
>      three paths.
>    - If C <-> B FIFO depth is small, then there might be a requirement
>      that C and B be closely performance matched to avoid system level
>      congestion due to back pressure.
>    - So S2 -> D1 path can't use all the bandwidth levels supported by
>      C-B combination.
>    - But S2 -> D2 can use all the bandwidth levels supported by C.
>    - And S1 -> D1 can use all the levels supported by A-B combination.
>

All the examples above makes sense but have to be handle by the
provider not the consumer. The consumer asks for a bandwidth according
to its constraints. Then the provider which is the driver that manages
the interconnect IP, should manage all this hardware and platform
specific stuff related to the interconnect IP in order to set the
optimal bandwidth that fit both consumer constraint and platform
specific configuration.

> These are just some of the reasons I could recollect in a few minutes.
> These are all real world cases I had to deal with in the past several
> years of dealing with scaling interconnects. I'm sure vendors and SoCs
> I'm not familiar with have other good reasons I'm not aware of.
>
> Trying to figure this all out by aggregating OPP tables of
> interconnect providers just isn't feasible nor is it efficient. The
> OPP tables for an interconnect path is describing the valid BW levels
> supported by that path and verified in hardware and makes a lot of
> sense to capture it clearly in DT.
>
> > So such kind of OPP table should be at
> > provider level but not at path level.
>
> They can also use it if they want to, but they'll probably want to use
> a frequency OPP table.
>
>
> -Saravana
>
> >
> > >
> > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > ---
> > >  drivers/interconnect/core.c  | 27 ++++++++++++++++++++++++++-
> > >  include/linux/interconnect.h |  7 +++++++
> > >  2 files changed, 33 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
> > > index 871eb4bc4efc..881bac80bc1e 100644
> > > --- a/drivers/interconnect/core.c
> > > +++ b/drivers/interconnect/core.c
> > > @@ -47,6 +47,7 @@ struct icc_req {
> > >   */
> > >  struct icc_path {
> > >         size_t num_nodes;
> > > +       struct opp_table *opp_table;
> > >         struct icc_req reqs[];
> > >  };
> > >
> > > @@ -313,7 +314,7 @@ struct icc_path *of_icc_get(struct device *dev, const char *name)
> > >  {
> > >         struct icc_path *path = ERR_PTR(-EPROBE_DEFER);
> > >         struct icc_node *src_node, *dst_node;
> > > -       struct device_node *np = NULL;
> > > +       struct device_node *np = NULL, *opp_node;
> > >         struct of_phandle_args src_args, dst_args;
> > >         int idx = 0;
> > >         int ret;
> > > @@ -381,10 +382,34 @@ struct icc_path *of_icc_get(struct device *dev, const char *name)
> > >                 dev_err(dev, "%s: invalid path=%ld\n", __func__, PTR_ERR(path));
> > >         mutex_unlock(&icc_lock);
> > >
> > > +       opp_node = of_parse_phandle(np, "interconnect-opp-table", idx);
> > > +       if (opp_node) {
> > > +               path->opp_table = dev_pm_opp_of_find_table_from_node(opp_node);
> > > +               of_node_put(opp_node);
> > > +       }
> > > +
> > > +
> > >         return path;
> > >  }
> > >  EXPORT_SYMBOL_GPL(of_icc_get);
> > >
> > > +/**
> > > + * icc_get_opp_table() - Get the OPP table that corresponds to a path
> > > + * @path: reference to the path returned by icc_get()
> > > + *
> > > + * This function will return the OPP table that corresponds to a path handle.
> > > + * If the interconnect API is disabled, NULL is returned and the consumer
> > > + * drivers will still build. Drivers are free to handle this specifically, but
> > > + * they don't have to.
> > > + *
> > > + * Return: opp_table pointer on success. NULL is returned when the API is
> > > + * disabled or the OPP table is missing.
> > > + */
> > > +struct opp_table *icc_get_opp_table(struct icc_path *path)
> > > +{
> > > +       return path->opp_table;
> > > +}
> > > +
> > >  /**
> > >   * icc_set_bw() - set bandwidth constraints on an interconnect path
> > >   * @path: reference to the path returned by icc_get()
> > > diff --git a/include/linux/interconnect.h b/include/linux/interconnect.h
> > > index dc25864755ba..0c0bc55f0e89 100644
> > > --- a/include/linux/interconnect.h
> > > +++ b/include/linux/interconnect.h
> > > @@ -9,6 +9,7 @@
> > >
> > >  #include <linux/mutex.h>
> > >  #include <linux/types.h>
> > > +#include <linux/pm_opp.h>
> > >
> > >  /* macros for converting to icc units */
> > >  #define Bps_to_icc(x)  ((x) / 1000)
> > > @@ -28,6 +29,7 @@ struct device;
> > >  struct icc_path *icc_get(struct device *dev, const int src_id,
> > >                          const int dst_id);
> > >  struct icc_path *of_icc_get(struct device *dev, const char *name);
> > > +struct opp_table *icc_get_opp_table(struct icc_path *path);
> > >  void icc_put(struct icc_path *path);
> > >  int icc_set_bw(struct icc_path *path, u32 avg_bw, u32 peak_bw);
> > >
> > > @@ -49,6 +51,11 @@ static inline void icc_put(struct icc_path *path)
> > >  {
> > >  }
> > >
> > > +static inline struct opp_table *icc_get_opp_table(struct icc_path *path)
> > > +{
> > > +       return NULL;
> > > +}
> > > +
> > >  static inline int icc_set_bw(struct icc_path *path, u32 avg_bw, u32 peak_bw)
> > >  {
> > >         return 0;
> > > --
> > > 2.22.0.410.gd8fdbe21b5-goog
> > >

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 6/6] interconnect: Add OPP table support for interconnects
  2019-07-04  7:12       ` Vincent Guittot
@ 2019-07-07 21:48         ` Saravana Kannan
  2019-07-09  7:25           ` Vincent Guittot
  0 siblings, 1 reply; 48+ messages in thread
From: Saravana Kannan @ 2019-07-07 21:48 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Sweeney, Sean,
	daidavid1, Rajendra Nayak, sibis, Bjorn Andersson, Evan Green,
	Android Kernel Team, open list:THERMAL,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	linux-kernel

On Thu, Jul 4, 2019 at 12:12 AM Vincent Guittot
<vincent.guittot@linaro.org> wrote:
>
> On Wed, 3 Jul 2019 at 23:33, Saravana Kannan <saravanak@google.com> wrote:
> >
> > On Tue, Jul 2, 2019 at 11:45 PM Vincent Guittot
> > <vincent.guittot@linaro.org> wrote:
> > >
> > > On Wed, 3 Jul 2019 at 03:10, Saravana Kannan <saravanak@google.com> wrote:
> > > >
> > > > Interconnect paths can have different performance points. Now that OPP
> > > > framework supports bandwidth OPP tables, add OPP table support for
> > > > interconnects.
> > > >
> > > > Devices can use the interconnect-opp-table DT property to specify OPP
> > > > tables for interconnect paths. And the driver can obtain the OPP table for
> > > > an interconnect path by calling icc_get_opp_table().
> > >
> > > The opp table of a path must come from the aggregation of OPP tables
> > > of the interconnect providers.
> >
> > The aggregation of OPP tables of the providers is certainly the
> > superset of what a path can achieve, but to say that OPPs for
> > interconnect path should match that superset is an oversimplification
> > of the reality in hardware.
> >
> > There are lots of reasons an interconnect path might not want to use
> > all the available bandwidth options across all the interconnects in
> > the route.
> >
> > 1. That particular path might not have been validated or verified
> >    during the HW design process for some of the frequencies/bandwidth
> >    combinations of the providers.
>
> All these constraint are provider's constraints and not consumer's one
>
> The consumer asks for a bandwidth according to its needs and then the
> providers select the optimal bandwidth of each interconnect after
> aggregating all the request and according to what OPP have been
> validated

Not really. The screening can be a consumer specific issue. The
consumer IP itself might have some issue with using too low of a
bandwidth or bandwidth that's not within some range. It should not be
the provider's job to take into account all the IP that might be
connected to the interconnects. If the interconnect HW itself didn't
change, the provider driver shouldn't need to change. By your
definition, a provider driver will have to account for all the
possible bus masters that might be connected to it across all SoCs.
That's not good design nor is it scalable.

> >
> > 2. Similarly during parts screening in the factory, some of the
> >    combinations might not have been screened and can't be guaranteed
> >    to work.
>
> As above, it's the provider's job to select the final bandwidth
> according to its constraint

Same reply as above.

> >
> > 3. Only a certain set of bandwidth levels might make sense to use from
> >    a power/performance balance given the device using it. For example:
> >    - The big CPU might not want to use some of the lower bandwidths
> >      but the little CPU might want to.
> >    - The big CPU might not want to use some intermediate bandwidth
> >      points if they don't save a lot of power compared to a higher
> >      bandwidth levels, but the little CPU might want to.
> >    - The little CPU might never want to use the higher set of
> >      bandwidth levels since they won't be power efficient for the use
> >      cases that might run on it.
>
> These example are quite vague about the reasons why little might never
> want to use higher bandwidth.

How is it vague? I just said because of power/performance balance.

> But then, if little doesn't ask high bandwidth it will not use them.

If you are running a heuristics based algorithm to pick bandwidth,
this is how it'll know NOT to use some of the bandwidth levels.

> >
> > 4. It might not make sense from a system level power perspective.
> > Let's take an example of a path S (source) -> A -> B -> C -> D
> > (destination).
> >    - A supports only 2, 5, 7 and 10 GB/s. B supports 1, 2 ... 10 GB/s.
> >      C supports 5 and 10 GB/s
> >    - If you combine and list the superset of bandwidth levels
> >      supported in that path, that'd be 1, 2, 3, ... 10 GB/s.
> >    - Which set of bandwidth levels make sense will depend on the
> >      hardware characteristics of the interconnects.
> >    - If B is the biggest power sink, then you might want to use all 10
> >      levels.
> >    - If A is the biggest power sink, then you might want to use all 2,
> >      5 and 10 GB/s of the levels.
> >    - If C is the biggest power sink then you might only want to use 5
> >      and 10 GB/s
> >    - The more hops and paths you get the more convoluted this gets.
> >
> > 5. The design of the interconnects themselves might have an impact on
> > which bandwidth levels are used.
> >    - For example, the FIFO depth between two specific interconnects
> >      might affect the valid bandwidth levels for a specific path.
> >    - Say S1 -> A -> B -> D1, S2 -> C -> B -> D1 and S2 -> C -> D2 are
> >      three paths.
> >    - If C <-> B FIFO depth is small, then there might be a requirement
> >      that C and B be closely performance matched to avoid system level
> >      congestion due to back pressure.
> >    - So S2 -> D1 path can't use all the bandwidth levels supported by
> >      C-B combination.
> >    - But S2 -> D2 can use all the bandwidth levels supported by C.
> >    - And S1 -> D1 can use all the levels supported by A-B combination.
> >
>
> All the examples above makes sense but have to be handle by the
> provider not the consumer. The consumer asks for a bandwidth according
> to its constraints. Then the provider which is the driver that manages
> the interconnect IP, should manage all this hardware and platform
> specific stuff related to the interconnect IP in order to set the
> optimal bandwidth that fit both consumer constraint and platform
> specific configuration.

Sure, but the provider itself can have interconnect properties to
indicate which other interconnects it's tied to. And the provider will
still need the interconnect-opp-table to denote which bandwidth levels
are sensible to use with each of its connections.

So in some instances the interconnect-opp-table covers the needs of
purely consumers and in some instances purely providers. But in either
case, it's still needed to describe the hardware properly.

-Saravana

> > These are just some of the reasons I could recollect in a few minutes.
> > These are all real world cases I had to deal with in the past several
> > years of dealing with scaling interconnects. I'm sure vendors and SoCs
> > I'm not familiar with have other good reasons I'm not aware of.
> >
> > Trying to figure this all out by aggregating OPP tables of
> > interconnect providers just isn't feasible nor is it efficient. The
> > OPP tables for an interconnect path is describing the valid BW levels
> > supported by that path and verified in hardware and makes a lot of
> > sense to capture it clearly in DT.
> >
> > > So such kind of OPP table should be at
> > > provider level but not at path level.
> >
> > They can also use it if they want to, but they'll probably want to use
> > a frequency OPP table.
> >
> >
> > -Saravana
> >
> > >
> > > >
> > > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > > ---
> > > >  drivers/interconnect/core.c  | 27 ++++++++++++++++++++++++++-
> > > >  include/linux/interconnect.h |  7 +++++++
> > > >  2 files changed, 33 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
> > > > index 871eb4bc4efc..881bac80bc1e 100644
> > > > --- a/drivers/interconnect/core.c
> > > > +++ b/drivers/interconnect/core.c
> > > > @@ -47,6 +47,7 @@ struct icc_req {
> > > >   */
> > > >  struct icc_path {
> > > >         size_t num_nodes;
> > > > +       struct opp_table *opp_table;
> > > >         struct icc_req reqs[];
> > > >  };
> > > >
> > > > @@ -313,7 +314,7 @@ struct icc_path *of_icc_get(struct device *dev, const char *name)
> > > >  {
> > > >         struct icc_path *path = ERR_PTR(-EPROBE_DEFER);
> > > >         struct icc_node *src_node, *dst_node;
> > > > -       struct device_node *np = NULL;
> > > > +       struct device_node *np = NULL, *opp_node;
> > > >         struct of_phandle_args src_args, dst_args;
> > > >         int idx = 0;
> > > >         int ret;
> > > > @@ -381,10 +382,34 @@ struct icc_path *of_icc_get(struct device *dev, const char *name)
> > > >                 dev_err(dev, "%s: invalid path=%ld\n", __func__, PTR_ERR(path));
> > > >         mutex_unlock(&icc_lock);
> > > >
> > > > +       opp_node = of_parse_phandle(np, "interconnect-opp-table", idx);
> > > > +       if (opp_node) {
> > > > +               path->opp_table = dev_pm_opp_of_find_table_from_node(opp_node);
> > > > +               of_node_put(opp_node);
> > > > +       }
> > > > +
> > > > +
> > > >         return path;
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(of_icc_get);
> > > >
> > > > +/**
> > > > + * icc_get_opp_table() - Get the OPP table that corresponds to a path
> > > > + * @path: reference to the path returned by icc_get()
> > > > + *
> > > > + * This function will return the OPP table that corresponds to a path handle.
> > > > + * If the interconnect API is disabled, NULL is returned and the consumer
> > > > + * drivers will still build. Drivers are free to handle this specifically, but
> > > > + * they don't have to.
> > > > + *
> > > > + * Return: opp_table pointer on success. NULL is returned when the API is
> > > > + * disabled or the OPP table is missing.
> > > > + */
> > > > +struct opp_table *icc_get_opp_table(struct icc_path *path)
> > > > +{
> > > > +       return path->opp_table;
> > > > +}
> > > > +
> > > >  /**
> > > >   * icc_set_bw() - set bandwidth constraints on an interconnect path
> > > >   * @path: reference to the path returned by icc_get()
> > > > diff --git a/include/linux/interconnect.h b/include/linux/interconnect.h
> > > > index dc25864755ba..0c0bc55f0e89 100644
> > > > --- a/include/linux/interconnect.h
> > > > +++ b/include/linux/interconnect.h
> > > > @@ -9,6 +9,7 @@
> > > >
> > > >  #include <linux/mutex.h>
> > > >  #include <linux/types.h>
> > > > +#include <linux/pm_opp.h>
> > > >
> > > >  /* macros for converting to icc units */
> > > >  #define Bps_to_icc(x)  ((x) / 1000)
> > > > @@ -28,6 +29,7 @@ struct device;
> > > >  struct icc_path *icc_get(struct device *dev, const int src_id,
> > > >                          const int dst_id);
> > > >  struct icc_path *of_icc_get(struct device *dev, const char *name);
> > > > +struct opp_table *icc_get_opp_table(struct icc_path *path);
> > > >  void icc_put(struct icc_path *path);
> > > >  int icc_set_bw(struct icc_path *path, u32 avg_bw, u32 peak_bw);
> > > >
> > > > @@ -49,6 +51,11 @@ static inline void icc_put(struct icc_path *path)
> > > >  {
> > > >  }
> > > >
> > > > +static inline struct opp_table *icc_get_opp_table(struct icc_path *path)
> > > > +{
> > > > +       return NULL;
> > > > +}
> > > > +
> > > >  static inline int icc_set_bw(struct icc_path *path, u32 avg_bw, u32 peak_bw)
> > > >  {
> > > >         return 0;
> > > > --
> > > > 2.22.0.410.gd8fdbe21b5-goog
> > > >
>
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 6/6] interconnect: Add OPP table support for interconnects
  2019-07-07 21:48         ` Saravana Kannan
@ 2019-07-09  7:25           ` Vincent Guittot
  2019-07-09 19:02             ` Saravana Kannan
  0 siblings, 1 reply; 48+ messages in thread
From: Vincent Guittot @ 2019-07-09  7:25 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Sweeney, Sean,
	daidavid1, Rajendra Nayak, sibis, Bjorn Andersson, Evan Green,
	Android Kernel Team, open list:THERMAL,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	linux-kernel

On Sun, 7 Jul 2019 at 23:48, Saravana Kannan <saravanak@google.com> wrote:
>
> On Thu, Jul 4, 2019 at 12:12 AM Vincent Guittot
> <vincent.guittot@linaro.org> wrote:
> >
> > On Wed, 3 Jul 2019 at 23:33, Saravana Kannan <saravanak@google.com> wrote:
> > >
> > > On Tue, Jul 2, 2019 at 11:45 PM Vincent Guittot
> > > <vincent.guittot@linaro.org> wrote:
> > > >
> > > > On Wed, 3 Jul 2019 at 03:10, Saravana Kannan <saravanak@google.com> wrote:
> > > > >
> > > > > Interconnect paths can have different performance points. Now that OPP
> > > > > framework supports bandwidth OPP tables, add OPP table support for
> > > > > interconnects.
> > > > >
> > > > > Devices can use the interconnect-opp-table DT property to specify OPP
> > > > > tables for interconnect paths. And the driver can obtain the OPP table for
> > > > > an interconnect path by calling icc_get_opp_table().
> > > >
> > > > The opp table of a path must come from the aggregation of OPP tables
> > > > of the interconnect providers.
> > >
> > > The aggregation of OPP tables of the providers is certainly the
> > > superset of what a path can achieve, but to say that OPPs for
> > > interconnect path should match that superset is an oversimplification
> > > of the reality in hardware.
> > >
> > > There are lots of reasons an interconnect path might not want to use
> > > all the available bandwidth options across all the interconnects in
> > > the route.
> > >
> > > 1. That particular path might not have been validated or verified
> > >    during the HW design process for some of the frequencies/bandwidth
> > >    combinations of the providers.
> >
> > All these constraint are provider's constraints and not consumer's one
> >
> > The consumer asks for a bandwidth according to its needs and then the
> > providers select the optimal bandwidth of each interconnect after
> > aggregating all the request and according to what OPP have been
> > validated
>
> Not really. The screening can be a consumer specific issue. The
> consumer IP itself might have some issue with using too low of a
> bandwidth or bandwidth that's not within some range. It should not be

How can an IP ask for not enough bandwidth ?
It asks the needed bandwidth based on its requirements

> the provider's job to take into account all the IP that might be
> connected to the interconnects. If the interconnect HW itself didn't

That's not what I'm saying. The provider knows which bandwidth the
interconnect can provide as it is the ones which configures it. So if
the interconnect has a finite number of bandwidth point based probably
on the possible clock frequency and others config of the interconnect,
it selects the best final config after aggregating the request of the
consumer.

> change, the provider driver shouldn't need to change. By your
> definition, a provider driver will have to account for all the
> possible bus masters that might be connected to it across all SoCs.

you didn't catch my point

> That's not good design nor is it scalable.
>
> > >
> > > 2. Similarly during parts screening in the factory, some of the
> > >    combinations might not have been screened and can't be guaranteed
> > >    to work.
> >
> > As above, it's the provider's job to select the final bandwidth
> > according to its constraint
>
> Same reply as above.
>
> > >
> > > 3. Only a certain set of bandwidth levels might make sense to use from
> > >    a power/performance balance given the device using it. For example:
> > >    - The big CPU might not want to use some of the lower bandwidths
> > >      but the little CPU might want to.
> > >    - The big CPU might not want to use some intermediate bandwidth
> > >      points if they don't save a lot of power compared to a higher
> > >      bandwidth levels, but the little CPU might want to.
> > >    - The little CPU might never want to use the higher set of
> > >      bandwidth levels since they won't be power efficient for the use
> > >      cases that might run on it.
> >
> > These example are quite vague about the reasons why little might never
> > want to use higher bandwidth.
>
> How is it vague? I just said because of power/performance balance.
>
> > But then, if little doesn't ask high bandwidth it will not use them.
>
> If you are running a heuristics based algorithm to pick bandwidth,
> this is how it'll know NOT to use some of the bandwidth levels.

so you want to set a bandwidth according to the cpu frequency which is
what has been proposed in other thread

>
> > >
> > > 4. It might not make sense from a system level power perspective.
> > > Let's take an example of a path S (source) -> A -> B -> C -> D
> > > (destination).
> > >    - A supports only 2, 5, 7 and 10 GB/s. B supports 1, 2 ... 10 GB/s.
> > >      C supports 5 and 10 GB/s
> > >    - If you combine and list the superset of bandwidth levels
> > >      supported in that path, that'd be 1, 2, 3, ... 10 GB/s.
> > >    - Which set of bandwidth levels make sense will depend on the
> > >      hardware characteristics of the interconnects.
> > >    - If B is the biggest power sink, then you might want to use all 10
> > >      levels.
> > >    - If A is the biggest power sink, then you might want to use all 2,
> > >      5 and 10 GB/s of the levels.
> > >    - If C is the biggest power sink then you might only want to use 5
> > >      and 10 GB/s
> > >    - The more hops and paths you get the more convoluted this gets.
> > >
> > > 5. The design of the interconnects themselves might have an impact on
> > > which bandwidth levels are used.
> > >    - For example, the FIFO depth between two specific interconnects
> > >      might affect the valid bandwidth levels for a specific path.
> > >    - Say S1 -> A -> B -> D1, S2 -> C -> B -> D1 and S2 -> C -> D2 are
> > >      three paths.
> > >    - If C <-> B FIFO depth is small, then there might be a requirement
> > >      that C and B be closely performance matched to avoid system level
> > >      congestion due to back pressure.
> > >    - So S2 -> D1 path can't use all the bandwidth levels supported by
> > >      C-B combination.
> > >    - But S2 -> D2 can use all the bandwidth levels supported by C.
> > >    - And S1 -> D1 can use all the levels supported by A-B combination.
> > >
> >
> > All the examples above makes sense but have to be handle by the
> > provider not the consumer. The consumer asks for a bandwidth according
> > to its constraints. Then the provider which is the driver that manages
> > the interconnect IP, should manage all this hardware and platform
> > specific stuff related to the interconnect IP in order to set the
> > optimal bandwidth that fit both consumer constraint and platform
> > specific configuration.
>
> Sure, but the provider itself can have interconnect properties to
> indicate which other interconnects it's tied to. And the provider will
> still need the interconnect-opp-table to denote which bandwidth levels
> are sensible to use with each of its connections.
>
> So in some instances the interconnect-opp-table covers the needs of
> purely consumers and in some instances purely providers. But in either
> case, it's still needed to describe the hardware properly.
>
> -Saravana
>
> > > These are just some of the reasons I could recollect in a few minutes.
> > > These are all real world cases I had to deal with in the past several
> > > years of dealing with scaling interconnects. I'm sure vendors and SoCs
> > > I'm not familiar with have other good reasons I'm not aware of.
> > >
> > > Trying to figure this all out by aggregating OPP tables of
> > > interconnect providers just isn't feasible nor is it efficient. The
> > > OPP tables for an interconnect path is describing the valid BW levels
> > > supported by that path and verified in hardware and makes a lot of
> > > sense to capture it clearly in DT.
> > >
> > > > So such kind of OPP table should be at
> > > > provider level but not at path level.
> > >
> > > They can also use it if they want to, but they'll probably want to use
> > > a frequency OPP table.
> > >
> > >
> > > -Saravana
> > >
> > > >
> > > > >
> > > > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > > > ---
> > > > >  drivers/interconnect/core.c  | 27 ++++++++++++++++++++++++++-
> > > > >  include/linux/interconnect.h |  7 +++++++
> > > > >  2 files changed, 33 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
> > > > > index 871eb4bc4efc..881bac80bc1e 100644
> > > > > --- a/drivers/interconnect/core.c
> > > > > +++ b/drivers/interconnect/core.c
> > > > > @@ -47,6 +47,7 @@ struct icc_req {
> > > > >   */
> > > > >  struct icc_path {
> > > > >         size_t num_nodes;
> > > > > +       struct opp_table *opp_table;
> > > > >         struct icc_req reqs[];
> > > > >  };
> > > > >
> > > > > @@ -313,7 +314,7 @@ struct icc_path *of_icc_get(struct device *dev, const char *name)
> > > > >  {
> > > > >         struct icc_path *path = ERR_PTR(-EPROBE_DEFER);
> > > > >         struct icc_node *src_node, *dst_node;
> > > > > -       struct device_node *np = NULL;
> > > > > +       struct device_node *np = NULL, *opp_node;
> > > > >         struct of_phandle_args src_args, dst_args;
> > > > >         int idx = 0;
> > > > >         int ret;
> > > > > @@ -381,10 +382,34 @@ struct icc_path *of_icc_get(struct device *dev, const char *name)
> > > > >                 dev_err(dev, "%s: invalid path=%ld\n", __func__, PTR_ERR(path));
> > > > >         mutex_unlock(&icc_lock);
> > > > >
> > > > > +       opp_node = of_parse_phandle(np, "interconnect-opp-table", idx);
> > > > > +       if (opp_node) {
> > > > > +               path->opp_table = dev_pm_opp_of_find_table_from_node(opp_node);
> > > > > +               of_node_put(opp_node);
> > > > > +       }
> > > > > +
> > > > > +
> > > > >         return path;
> > > > >  }
> > > > >  EXPORT_SYMBOL_GPL(of_icc_get);
> > > > >
> > > > > +/**
> > > > > + * icc_get_opp_table() - Get the OPP table that corresponds to a path
> > > > > + * @path: reference to the path returned by icc_get()
> > > > > + *
> > > > > + * This function will return the OPP table that corresponds to a path handle.
> > > > > + * If the interconnect API is disabled, NULL is returned and the consumer
> > > > > + * drivers will still build. Drivers are free to handle this specifically, but
> > > > > + * they don't have to.
> > > > > + *
> > > > > + * Return: opp_table pointer on success. NULL is returned when the API is
> > > > > + * disabled or the OPP table is missing.
> > > > > + */
> > > > > +struct opp_table *icc_get_opp_table(struct icc_path *path)
> > > > > +{
> > > > > +       return path->opp_table;
> > > > > +}
> > > > > +
> > > > >  /**
> > > > >   * icc_set_bw() - set bandwidth constraints on an interconnect path
> > > > >   * @path: reference to the path returned by icc_get()
> > > > > diff --git a/include/linux/interconnect.h b/include/linux/interconnect.h
> > > > > index dc25864755ba..0c0bc55f0e89 100644
> > > > > --- a/include/linux/interconnect.h
> > > > > +++ b/include/linux/interconnect.h
> > > > > @@ -9,6 +9,7 @@
> > > > >
> > > > >  #include <linux/mutex.h>
> > > > >  #include <linux/types.h>
> > > > > +#include <linux/pm_opp.h>
> > > > >
> > > > >  /* macros for converting to icc units */
> > > > >  #define Bps_to_icc(x)  ((x) / 1000)
> > > > > @@ -28,6 +29,7 @@ struct device;
> > > > >  struct icc_path *icc_get(struct device *dev, const int src_id,
> > > > >                          const int dst_id);
> > > > >  struct icc_path *of_icc_get(struct device *dev, const char *name);
> > > > > +struct opp_table *icc_get_opp_table(struct icc_path *path);
> > > > >  void icc_put(struct icc_path *path);
> > > > >  int icc_set_bw(struct icc_path *path, u32 avg_bw, u32 peak_bw);
> > > > >
> > > > > @@ -49,6 +51,11 @@ static inline void icc_put(struct icc_path *path)
> > > > >  {
> > > > >  }
> > > > >
> > > > > +static inline struct opp_table *icc_get_opp_table(struct icc_path *path)
> > > > > +{
> > > > > +       return NULL;
> > > > > +}
> > > > > +
> > > > >  static inline int icc_set_bw(struct icc_path *path, u32 avg_bw, u32 peak_bw)
> > > > >  {
> > > > >         return 0;
> > > > > --
> > > > > 2.22.0.410.gd8fdbe21b5-goog
> > > > >
> >
> > --
> > To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
> >

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 6/6] interconnect: Add OPP table support for interconnects
  2019-07-09  7:25           ` Vincent Guittot
@ 2019-07-09 19:02             ` Saravana Kannan
  2019-07-15  8:16               ` Vincent Guittot
  0 siblings, 1 reply; 48+ messages in thread
From: Saravana Kannan @ 2019-07-09 19:02 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Sweeney, Sean,
	daidavid1, Rajendra Nayak, sibis, Bjorn Andersson, Evan Green,
	Android Kernel Team, open list:THERMAL,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	linux-kernel

On Tue, Jul 9, 2019 at 12:25 AM Vincent Guittot
<vincent.guittot@linaro.org> wrote:
>
> On Sun, 7 Jul 2019 at 23:48, Saravana Kannan <saravanak@google.com> wrote:
> >
> > On Thu, Jul 4, 2019 at 12:12 AM Vincent Guittot
> > <vincent.guittot@linaro.org> wrote:
> > >
> > > On Wed, 3 Jul 2019 at 23:33, Saravana Kannan <saravanak@google.com> wrote:
> > > >
> > > > On Tue, Jul 2, 2019 at 11:45 PM Vincent Guittot
> > > > <vincent.guittot@linaro.org> wrote:
> > > > >
> > > > > On Wed, 3 Jul 2019 at 03:10, Saravana Kannan <saravanak@google.com> wrote:
> > > > > >
> > > > > > Interconnect paths can have different performance points. Now that OPP
> > > > > > framework supports bandwidth OPP tables, add OPP table support for
> > > > > > interconnects.
> > > > > >
> > > > > > Devices can use the interconnect-opp-table DT property to specify OPP
> > > > > > tables for interconnect paths. And the driver can obtain the OPP table for
> > > > > > an interconnect path by calling icc_get_opp_table().
> > > > >
> > > > > The opp table of a path must come from the aggregation of OPP tables
> > > > > of the interconnect providers.
> > > >
> > > > The aggregation of OPP tables of the providers is certainly the
> > > > superset of what a path can achieve, but to say that OPPs for
> > > > interconnect path should match that superset is an oversimplification
> > > > of the reality in hardware.
> > > >
> > > > There are lots of reasons an interconnect path might not want to use
> > > > all the available bandwidth options across all the interconnects in
> > > > the route.
> > > >
> > > > 1. That particular path might not have been validated or verified
> > > >    during the HW design process for some of the frequencies/bandwidth
> > > >    combinations of the providers.
> > >
> > > All these constraint are provider's constraints and not consumer's one
> > >
> > > The consumer asks for a bandwidth according to its needs and then the
> > > providers select the optimal bandwidth of each interconnect after
> > > aggregating all the request and according to what OPP have been
> > > validated
> >
> > Not really. The screening can be a consumer specific issue. The
> > consumer IP itself might have some issue with using too low of a
> > bandwidth or bandwidth that's not within some range. It should not be
>
> How can an IP ask for not enough bandwidth ?
> It asks the needed bandwidth based on its requirements

The "enough bandwidth" is not always obvious. It's only for very
simple cases that you can calculate the required bandwidth. Even for
cases that you think might be "obvious/easy" aren't always easy.

For example, you'd think a display IP would have a fixed bandwidth
requirement for a fixed resolution screen. But that's far from the
truth. It can also change as the number of layers change per frame.
For video decoder/encoder, it depends on how well the frames compress
with a specific compression scheme.
So the "required" bandwidth is often a heuristic based on the IP
frequency or traffic measurement.

But that's not even the point I was making in this specific "bullet".

A hardware IP might be screen/verified with only certain bandwidth
levels. Or it might have hardware bugs that prevent it from using
lower bandwidths even though it's technically sufficient. We need a
way to capture that per path. This is not even a fictional case. This
has been true multiple times over widely used IPs.

> > the provider's job to take into account all the IP that might be
> > connected to the interconnects. If the interconnect HW itself didn't
>
> That's not what I'm saying. The provider knows which bandwidth the
> interconnect can provide as it is the ones which configures it. So if
> the interconnect has a finite number of bandwidth point based probably
> on the possible clock frequency and others config of the interconnect,
> it selects the best final config after aggregating the request of the
> consumer.

I completely agree with this. What you are stating above is how it
should work and that's the whole point of the interconnect framework.

But this is orthogonal to the point I'm making.

> > change, the provider driver shouldn't need to change. By your
> > definition, a provider driver will have to account for all the
> > possible bus masters that might be connected to it across all SoCs.
>
> you didn't catch my point

Same. I think we are talking over each other. Let me try again.

You are trying to describe how and interconnect provider and framework
should work. There's no disagreement there.

My point is that consumers might not want to or can not always use all
the available bandwidth levels offered by the providers. There can be
many reasons for that (which is what I listed in my earlier emails)
and we need a good and generic way to capture that so that everyone
isn't trying to invent their own property.

> > That's not good design nor is it scalable.
> >
> > > >
> > > > 2. Similarly during parts screening in the factory, some of the
> > > >    combinations might not have been screened and can't be guaranteed
> > > >    to work.
> > >
> > > As above, it's the provider's job to select the final bandwidth
> > > according to its constraint
> >
> > Same reply as above.
> >
> > > >
> > > > 3. Only a certain set of bandwidth levels might make sense to use from
> > > >    a power/performance balance given the device using it. For example:
> > > >    - The big CPU might not want to use some of the lower bandwidths
> > > >      but the little CPU might want to.
> > > >    - The big CPU might not want to use some intermediate bandwidth
> > > >      points if they don't save a lot of power compared to a higher
> > > >      bandwidth levels, but the little CPU might want to.
> > > >    - The little CPU might never want to use the higher set of
> > > >      bandwidth levels since they won't be power efficient for the use
> > > >      cases that might run on it.
> > >
> > > These example are quite vague about the reasons why little might never
> > > want to use higher bandwidth.
> >
> > How is it vague? I just said because of power/performance balance.
> >
> > > But then, if little doesn't ask high bandwidth it will not use them.
> >
> > If you are running a heuristics based algorithm to pick bandwidth,
> > this is how it'll know NOT to use some of the bandwidth levels.
>
> so you want to set a bandwidth according to the cpu frequency which is
> what has been proposed in other thread

Nope, that's just one heuristic. Often times it's based on hardware
monitors measuring interconnect activity. If you go look at the SDM845
in a Pixel 3, almost nothing is directly tied to the CPU frequency.

Even if you are scaling bandwidth based on other hardware
measurements, you might want to avoid some bandwidth level provided by
the interconnect providers because it's suboptimal.

For example, when making bandwidth votes to accommodate the big CPUs,
you might never want to use some of the lower bandwidth levels because
they are not power efficient for any CPU frequency or any bandwidth
level. Because at those levels the memory/interconnect is so slow that
it has a non-trivial utilization increase (because the CPU is
stalling) of the big CPUs.

Again, this is completely different from what the providers/icc
framework does. Which is, once the request is made, they aggregate and
set the actual interconnect frequencies correctly.

> >
> > > >
> > > > 4. It might not make sense from a system level power perspective.
> > > > Let's take an example of a path S (source) -> A -> B -> C -> D
> > > > (destination).
> > > >    - A supports only 2, 5, 7 and 10 GB/s. B supports 1, 2 ... 10 GB/s.
> > > >      C supports 5 and 10 GB/s
> > > >    - If you combine and list the superset of bandwidth levels
> > > >      supported in that path, that'd be 1, 2, 3, ... 10 GB/s.
> > > >    - Which set of bandwidth levels make sense will depend on the
> > > >      hardware characteristics of the interconnects.
> > > >    - If B is the biggest power sink, then you might want to use all 10
> > > >      levels.
> > > >    - If A is the biggest power sink, then you might want to use all 2,
> > > >      5 and 10 GB/s of the levels.
> > > >    - If C is the biggest power sink then you might only want to use 5
> > > >      and 10 GB/s
> > > >    - The more hops and paths you get the more convoluted this gets.
> > > >
> > > > 5. The design of the interconnects themselves might have an impact on
> > > > which bandwidth levels are used.
> > > >    - For example, the FIFO depth between two specific interconnects
> > > >      might affect the valid bandwidth levels for a specific path.
> > > >    - Say S1 -> A -> B -> D1, S2 -> C -> B -> D1 and S2 -> C -> D2 are
> > > >      three paths.
> > > >    - If C <-> B FIFO depth is small, then there might be a requirement
> > > >      that C and B be closely performance matched to avoid system level
> > > >      congestion due to back pressure.
> > > >    - So S2 -> D1 path can't use all the bandwidth levels supported by
> > > >      C-B combination.
> > > >    - But S2 -> D2 can use all the bandwidth levels supported by C.
> > > >    - And S1 -> D1 can use all the levels supported by A-B combination.
> > > >
> > >
> > > All the examples above makes sense but have to be handle by the
> > > provider not the consumer. The consumer asks for a bandwidth according
> > > to its constraints. Then the provider which is the driver that manages
> > > the interconnect IP, should manage all this hardware and platform
> > > specific stuff related to the interconnect IP in order to set the
> > > optimal bandwidth that fit both consumer constraint and platform
> > > specific configuration.
> >
> > Sure, but the provider itself can have interconnect properties to
> > indicate which other interconnects it's tied to. And the provider will
> > still need the interconnect-opp-table to denote which bandwidth levels
> > are sensible to use with each of its connections.

You seem to have missed this comment.

Thanks,
Saravana

> > So in some instances the interconnect-opp-table covers the needs of
> > purely consumers and in some instances purely providers. But in either
> > case, it's still needed to describe the hardware properly.
> >
> > -Saravana
> >
> > > > These are just some of the reasons I could recollect in a few minutes.
> > > > These are all real world cases I had to deal with in the past several
> > > > years of dealing with scaling interconnects. I'm sure vendors and SoCs
> > > > I'm not familiar with have other good reasons I'm not aware of.
> > > >
> > > > Trying to figure this all out by aggregating OPP tables of
> > > > interconnect providers just isn't feasible nor is it efficient. The
> > > > OPP tables for an interconnect path is describing the valid BW levels
> > > > supported by that path and verified in hardware and makes a lot of
> > > > sense to capture it clearly in DT.
> > > >
> > > > > So such kind of OPP table should be at
> > > > > provider level but not at path level.
> > > >
> > > > They can also use it if they want to, but they'll probably want to use
> > > > a frequency OPP table.
> > > >
> > > >
> > > > -Saravana
> > > >
> > > > >
> > > > > >
> > > > > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > > > > ---
> > > > > >  drivers/interconnect/core.c  | 27 ++++++++++++++++++++++++++-
> > > > > >  include/linux/interconnect.h |  7 +++++++
> > > > > >  2 files changed, 33 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
> > > > > > index 871eb4bc4efc..881bac80bc1e 100644
> > > > > > --- a/drivers/interconnect/core.c
> > > > > > +++ b/drivers/interconnect/core.c
> > > > > > @@ -47,6 +47,7 @@ struct icc_req {
> > > > > >   */
> > > > > >  struct icc_path {
> > > > > >         size_t num_nodes;
> > > > > > +       struct opp_table *opp_table;
> > > > > >         struct icc_req reqs[];
> > > > > >  };
> > > > > >
> > > > > > @@ -313,7 +314,7 @@ struct icc_path *of_icc_get(struct device *dev, const char *name)
> > > > > >  {
> > > > > >         struct icc_path *path = ERR_PTR(-EPROBE_DEFER);
> > > > > >         struct icc_node *src_node, *dst_node;
> > > > > > -       struct device_node *np = NULL;
> > > > > > +       struct device_node *np = NULL, *opp_node;
> > > > > >         struct of_phandle_args src_args, dst_args;
> > > > > >         int idx = 0;
> > > > > >         int ret;
> > > > > > @@ -381,10 +382,34 @@ struct icc_path *of_icc_get(struct device *dev, const char *name)
> > > > > >                 dev_err(dev, "%s: invalid path=%ld\n", __func__, PTR_ERR(path));
> > > > > >         mutex_unlock(&icc_lock);
> > > > > >
> > > > > > +       opp_node = of_parse_phandle(np, "interconnect-opp-table", idx);
> > > > > > +       if (opp_node) {
> > > > > > +               path->opp_table = dev_pm_opp_of_find_table_from_node(opp_node);
> > > > > > +               of_node_put(opp_node);
> > > > > > +       }
> > > > > > +
> > > > > > +
> > > > > >         return path;
> > > > > >  }
> > > > > >  EXPORT_SYMBOL_GPL(of_icc_get);
> > > > > >
> > > > > > +/**
> > > > > > + * icc_get_opp_table() - Get the OPP table that corresponds to a path
> > > > > > + * @path: reference to the path returned by icc_get()
> > > > > > + *
> > > > > > + * This function will return the OPP table that corresponds to a path handle.
> > > > > > + * If the interconnect API is disabled, NULL is returned and the consumer
> > > > > > + * drivers will still build. Drivers are free to handle this specifically, but
> > > > > > + * they don't have to.
> > > > > > + *
> > > > > > + * Return: opp_table pointer on success. NULL is returned when the API is
> > > > > > + * disabled or the OPP table is missing.
> > > > > > + */
> > > > > > +struct opp_table *icc_get_opp_table(struct icc_path *path)
> > > > > > +{
> > > > > > +       return path->opp_table;
> > > > > > +}
> > > > > > +
> > > > > >  /**
> > > > > >   * icc_set_bw() - set bandwidth constraints on an interconnect path
> > > > > >   * @path: reference to the path returned by icc_get()
> > > > > > diff --git a/include/linux/interconnect.h b/include/linux/interconnect.h
> > > > > > index dc25864755ba..0c0bc55f0e89 100644
> > > > > > --- a/include/linux/interconnect.h
> > > > > > +++ b/include/linux/interconnect.h
> > > > > > @@ -9,6 +9,7 @@
> > > > > >
> > > > > >  #include <linux/mutex.h>
> > > > > >  #include <linux/types.h>
> > > > > > +#include <linux/pm_opp.h>
> > > > > >
> > > > > >  /* macros for converting to icc units */
> > > > > >  #define Bps_to_icc(x)  ((x) / 1000)
> > > > > > @@ -28,6 +29,7 @@ struct device;
> > > > > >  struct icc_path *icc_get(struct device *dev, const int src_id,
> > > > > >                          const int dst_id);
> > > > > >  struct icc_path *of_icc_get(struct device *dev, const char *name);
> > > > > > +struct opp_table *icc_get_opp_table(struct icc_path *path);
> > > > > >  void icc_put(struct icc_path *path);
> > > > > >  int icc_set_bw(struct icc_path *path, u32 avg_bw, u32 peak_bw);
> > > > > >
> > > > > > @@ -49,6 +51,11 @@ static inline void icc_put(struct icc_path *path)
> > > > > >  {
> > > > > >  }
> > > > > >
> > > > > > +static inline struct opp_table *icc_get_opp_table(struct icc_path *path)
> > > > > > +{
> > > > > > +       return NULL;
> > > > > > +}
> > > > > > +
> > > > > >  static inline int icc_set_bw(struct icc_path *path, u32 avg_bw, u32 peak_bw)
> > > > > >  {
> > > > > >         return 0;
> > > > > > --
> > > > > > 2.22.0.410.gd8fdbe21b5-goog
> > > > > >
> > >
> > > --
> > > To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
> > >

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 6/6] interconnect: Add OPP table support for interconnects
  2019-07-09 19:02             ` Saravana Kannan
@ 2019-07-15  8:16               ` Vincent Guittot
  2019-07-16  0:55                 ` Saravana Kannan
  0 siblings, 1 reply; 48+ messages in thread
From: Vincent Guittot @ 2019-07-15  8:16 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Sweeney, Sean,
	daidavid1, Rajendra Nayak, sibis, Bjorn Andersson, Evan Green,
	Android Kernel Team, open list:THERMAL,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	linux-kernel

On Tue, 9 Jul 2019 at 21:03, Saravana Kannan <saravanak@google.com> wrote:
>
> On Tue, Jul 9, 2019 at 12:25 AM Vincent Guittot
> <vincent.guittot@linaro.org> wrote:
> >
> > On Sun, 7 Jul 2019 at 23:48, Saravana Kannan <saravanak@google.com> wrote:
> > >
> > > On Thu, Jul 4, 2019 at 12:12 AM Vincent Guittot
> > > <vincent.guittot@linaro.org> wrote:
> > > >
> > > > On Wed, 3 Jul 2019 at 23:33, Saravana Kannan <saravanak@google.com> wrote:
> > > > >
> > > > > On Tue, Jul 2, 2019 at 11:45 PM Vincent Guittot
> > > > > <vincent.guittot@linaro.org> wrote:
> > > > > >
> > > > > > On Wed, 3 Jul 2019 at 03:10, Saravana Kannan <saravanak@google.com> wrote:
> > > > > > >
> > > > > > > Interconnect paths can have different performance points. Now that OPP
> > > > > > > framework supports bandwidth OPP tables, add OPP table support for
> > > > > > > interconnects.
> > > > > > >
> > > > > > > Devices can use the interconnect-opp-table DT property to specify OPP
> > > > > > > tables for interconnect paths. And the driver can obtain the OPP table for
> > > > > > > an interconnect path by calling icc_get_opp_table().
> > > > > >
> > > > > > The opp table of a path must come from the aggregation of OPP tables
> > > > > > of the interconnect providers.
> > > > >
> > > > > The aggregation of OPP tables of the providers is certainly the
> > > > > superset of what a path can achieve, but to say that OPPs for
> > > > > interconnect path should match that superset is an oversimplification
> > > > > of the reality in hardware.
> > > > >
> > > > > There are lots of reasons an interconnect path might not want to use
> > > > > all the available bandwidth options across all the interconnects in
> > > > > the route.
> > > > >
> > > > > 1. That particular path might not have been validated or verified
> > > > >    during the HW design process for some of the frequencies/bandwidth
> > > > >    combinations of the providers.
> > > >
> > > > All these constraint are provider's constraints and not consumer's one
> > > >
> > > > The consumer asks for a bandwidth according to its needs and then the
> > > > providers select the optimal bandwidth of each interconnect after
> > > > aggregating all the request and according to what OPP have been
> > > > validated
> > >
> > > Not really. The screening can be a consumer specific issue. The
> > > consumer IP itself might have some issue with using too low of a
> > > bandwidth or bandwidth that's not within some range. It should not be
> >
> > How can an IP ask for not enough bandwidth ?
> > It asks the needed bandwidth based on its requirements
>
> The "enough bandwidth" is not always obvious. It's only for very
> simple cases that you can calculate the required bandwidth. Even for
> cases that you think might be "obvious/easy" aren't always easy.
>
> For example, you'd think a display IP would have a fixed bandwidth
> requirement for a fixed resolution screen. But that's far from the
> truth. It can also change as the number of layers change per frame.
> For video decoder/encoder, it depends on how well the frames compress
> with a specific compression scheme.
> So the "required" bandwidth is often a heuristic based on the IP
> frequency or traffic measurement.
>
> But that's not even the point I was making in this specific "bullet".
>
> A hardware IP might be screen/verified with only certain bandwidth
> levels. Or it might have hardware bugs that prevent it from using
> lower bandwidths even though it's technically sufficient. We need a
> way to capture that per path. This is not even a fictional case. This
> has been true multiple times over widely used IPs.

here you are mixing HW constraint on the soc and OPP screening with
bandwidth request from consumer
ICC framework is about getting bandwidth request not trying to fix
some HW/voltage dependency of the SoC

>
> > > the provider's job to take into account all the IP that might be
> > > connected to the interconnects. If the interconnect HW itself didn't
> >
> > That's not what I'm saying. The provider knows which bandwidth the
> > interconnect can provide as it is the ones which configures it. So if
> > the interconnect has a finite number of bandwidth point based probably
> > on the possible clock frequency and others config of the interconnect,
> > it selects the best final config after aggregating the request of the
> > consumer.
>
> I completely agree with this. What you are stating above is how it
> should work and that's the whole point of the interconnect framework.
>
> But this is orthogonal to the point I'm making.

It's not orthogonal because you want to add a OPP table pointer in the
ICC path structure to fix your platform HW constraint whereas it's not
the purpose of the framework IMO

>
> > > change, the provider driver shouldn't need to change. By your
> > > definition, a provider driver will have to account for all the
> > > possible bus masters that might be connected to it across all SoCs.
> >
> > you didn't catch my point
>
> Same. I think we are talking over each other. Let me try again.
>
> You are trying to describe how and interconnect provider and framework
> should work. There's no disagreement there.
>
> My point is that consumers might not want to or can not always use all
> the available bandwidth levels offered by the providers. There can be
> many reasons for that (which is what I listed in my earlier emails)
> and we need a good and generic way to capture that so that everyone
> isn't trying to invent their own property.

And my point is that you want to describe some platform or even UCs
specific constraint in the ICC framework which is not the place to do.

If the consumers might not want to use all available bandwidth because
this is not power efficient as an example, this should be describe
somewhere else to express  that there is a shared power domain
between some devices and we shoudl ensure that all devices in this
power domain should use the  Optimal Operating Point (optimal freq for
a voltage)
ICC framework describes the bandwidth request that are expressed by
the consumers for the current running state of their IP but it doesn't
reflect the fact that on platform A, the consumer should use bandwidth
X because it will select a voltage level of a shared power domain that
is optimized for the other devices B, C ... . It's up to the provider
to know HW details of the bus that it drives and to make such
decision;  the consumer should always request the same


>
> > > That's not good design nor is it scalable.
> > >
> > > > >
> > > > > 2. Similarly during parts screening in the factory, some of the
> > > > >    combinations might not have been screened and can't be guaranteed
> > > > >    to work.
> > > >
> > > > As above, it's the provider's job to select the final bandwidth
> > > > according to its constraint
> > >
> > > Same reply as above.
> > >
> > > > >
> > > > > 3. Only a certain set of bandwidth levels might make sense to use from
> > > > >    a power/performance balance given the device using it. For example:
> > > > >    - The big CPU might not want to use some of the lower bandwidths
> > > > >      but the little CPU might want to.
> > > > >    - The big CPU might not want to use some intermediate bandwidth
> > > > >      points if they don't save a lot of power compared to a higher
> > > > >      bandwidth levels, but the little CPU might want to.
> > > > >    - The little CPU might never want to use the higher set of
> > > > >      bandwidth levels since they won't be power efficient for the use
> > > > >      cases that might run on it.
> > > >
> > > > These example are quite vague about the reasons why little might never
> > > > want to use higher bandwidth.
> > >
> > > How is it vague? I just said because of power/performance balance.
> > >
> > > > But then, if little doesn't ask high bandwidth it will not use them.
> > >
> > > If you are running a heuristics based algorithm to pick bandwidth,
> > > this is how it'll know NOT to use some of the bandwidth levels.
> >
> > so you want to set a bandwidth according to the cpu frequency which is
> > what has been proposed in other thread
>
> Nope, that's just one heuristic. Often times it's based on hardware
> monitors measuring interconnect activity. If you go look at the SDM845
> in a Pixel 3, almost nothing is directly tied to the CPU frequency.
>
> Even if you are scaling bandwidth based on other hardware
> measurements, you might want to avoid some bandwidth level provided by
> the interconnect providers because it's suboptimal.
>
> For example, when making bandwidth votes to accommodate the big CPUs,
> you might never want to use some of the lower bandwidth levels because
> they are not power efficient for any CPU frequency or any bandwidth
> level. Because at those levels the memory/interconnect is so slow that
> it has a non-trivial utilization increase (because the CPU is
> stalling) of the big CPUs.
>
> Again, this is completely different from what the providers/icc
> framework does. Which is, once the request is made, they aggregate and
> set the actual interconnect frequencies correctly.
>
> > >
> > > > >
> > > > > 4. It might not make sense from a system level power perspective.
> > > > > Let's take an example of a path S (source) -> A -> B -> C -> D
> > > > > (destination).
> > > > >    - A supports only 2, 5, 7 and 10 GB/s. B supports 1, 2 ... 10 GB/s.
> > > > >      C supports 5 and 10 GB/s
> > > > >    - If you combine and list the superset of bandwidth levels
> > > > >      supported in that path, that'd be 1, 2, 3, ... 10 GB/s.
> > > > >    - Which set of bandwidth levels make sense will depend on the
> > > > >      hardware characteristics of the interconnects.
> > > > >    - If B is the biggest power sink, then you might want to use all 10
> > > > >      levels.
> > > > >    - If A is the biggest power sink, then you might want to use all 2,
> > > > >      5 and 10 GB/s of the levels.
> > > > >    - If C is the biggest power sink then you might only want to use 5
> > > > >      and 10 GB/s
> > > > >    - The more hops and paths you get the more convoluted this gets.
> > > > >
> > > > > 5. The design of the interconnects themselves might have an impact on
> > > > > which bandwidth levels are used.
> > > > >    - For example, the FIFO depth between two specific interconnects
> > > > >      might affect the valid bandwidth levels for a specific path.
> > > > >    - Say S1 -> A -> B -> D1, S2 -> C -> B -> D1 and S2 -> C -> D2 are
> > > > >      three paths.
> > > > >    - If C <-> B FIFO depth is small, then there might be a requirement
> > > > >      that C and B be closely performance matched to avoid system level
> > > > >      congestion due to back pressure.
> > > > >    - So S2 -> D1 path can't use all the bandwidth levels supported by
> > > > >      C-B combination.
> > > > >    - But S2 -> D2 can use all the bandwidth levels supported by C.
> > > > >    - And S1 -> D1 can use all the levels supported by A-B combination.
> > > > >
> > > >
> > > > All the examples above makes sense but have to be handle by the
> > > > provider not the consumer. The consumer asks for a bandwidth according
> > > > to its constraints. Then the provider which is the driver that manages
> > > > the interconnect IP, should manage all this hardware and platform
> > > > specific stuff related to the interconnect IP in order to set the
> > > > optimal bandwidth that fit both consumer constraint and platform
> > > > specific configuration.
> > >
> > > Sure, but the provider itself can have interconnect properties to
> > > indicate which other interconnects it's tied to. And the provider will
> > > still need the interconnect-opp-table to denote which bandwidth levels
> > > are sensible to use with each of its connections.
>
> You seem to have missed this comment.
>
> Thanks,
> Saravana
>
> > > So in some instances the interconnect-opp-table covers the needs of
> > > purely consumers and in some instances purely providers. But in either
> > > case, it's still needed to describe the hardware properly.
> > >
> > > -Saravana
> > >
> > > > > These are just some of the reasons I could recollect in a few minutes.
> > > > > These are all real world cases I had to deal with in the past several
> > > > > years of dealing with scaling interconnects. I'm sure vendors and SoCs
> > > > > I'm not familiar with have other good reasons I'm not aware of.
> > > > >
> > > > > Trying to figure this all out by aggregating OPP tables of
> > > > > interconnect providers just isn't feasible nor is it efficient. The
> > > > > OPP tables for an interconnect path is describing the valid BW levels
> > > > > supported by that path and verified in hardware and makes a lot of
> > > > > sense to capture it clearly in DT.
> > > > >
> > > > > > So such kind of OPP table should be at
> > > > > > provider level but not at path level.
> > > > >
> > > > > They can also use it if they want to, but they'll probably want to use
> > > > > a frequency OPP table.
> > > > >
> > > > >
> > > > > -Saravana
> > > > >
> > > > > >
> > > > > > >
> > > > > > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > > > > > ---
> > > > > > >  drivers/interconnect/core.c  | 27 ++++++++++++++++++++++++++-
> > > > > > >  include/linux/interconnect.h |  7 +++++++
> > > > > > >  2 files changed, 33 insertions(+), 1 deletion(-)
> > > > > > >
> > > > > > > diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
> > > > > > > index 871eb4bc4efc..881bac80bc1e 100644
> > > > > > > --- a/drivers/interconnect/core.c
> > > > > > > +++ b/drivers/interconnect/core.c
> > > > > > > @@ -47,6 +47,7 @@ struct icc_req {
> > > > > > >   */
> > > > > > >  struct icc_path {
> > > > > > >         size_t num_nodes;
> > > > > > > +       struct opp_table *opp_table;
> > > > > > >         struct icc_req reqs[];
> > > > > > >  };
> > > > > > >
> > > > > > > @@ -313,7 +314,7 @@ struct icc_path *of_icc_get(struct device *dev, const char *name)
> > > > > > >  {
> > > > > > >         struct icc_path *path = ERR_PTR(-EPROBE_DEFER);
> > > > > > >         struct icc_node *src_node, *dst_node;
> > > > > > > -       struct device_node *np = NULL;
> > > > > > > +       struct device_node *np = NULL, *opp_node;
> > > > > > >         struct of_phandle_args src_args, dst_args;
> > > > > > >         int idx = 0;
> > > > > > >         int ret;
> > > > > > > @@ -381,10 +382,34 @@ struct icc_path *of_icc_get(struct device *dev, const char *name)
> > > > > > >                 dev_err(dev, "%s: invalid path=%ld\n", __func__, PTR_ERR(path));
> > > > > > >         mutex_unlock(&icc_lock);
> > > > > > >
> > > > > > > +       opp_node = of_parse_phandle(np, "interconnect-opp-table", idx);
> > > > > > > +       if (opp_node) {
> > > > > > > +               path->opp_table = dev_pm_opp_of_find_table_from_node(opp_node);
> > > > > > > +               of_node_put(opp_node);
> > > > > > > +       }
> > > > > > > +
> > > > > > > +
> > > > > > >         return path;
> > > > > > >  }
> > > > > > >  EXPORT_SYMBOL_GPL(of_icc_get);
> > > > > > >
> > > > > > > +/**
> > > > > > > + * icc_get_opp_table() - Get the OPP table that corresponds to a path
> > > > > > > + * @path: reference to the path returned by icc_get()
> > > > > > > + *
> > > > > > > + * This function will return the OPP table that corresponds to a path handle.
> > > > > > > + * If the interconnect API is disabled, NULL is returned and the consumer
> > > > > > > + * drivers will still build. Drivers are free to handle this specifically, but
> > > > > > > + * they don't have to.
> > > > > > > + *
> > > > > > > + * Return: opp_table pointer on success. NULL is returned when the API is
> > > > > > > + * disabled or the OPP table is missing.
> > > > > > > + */
> > > > > > > +struct opp_table *icc_get_opp_table(struct icc_path *path)
> > > > > > > +{
> > > > > > > +       return path->opp_table;
> > > > > > > +}
> > > > > > > +
> > > > > > >  /**
> > > > > > >   * icc_set_bw() - set bandwidth constraints on an interconnect path
> > > > > > >   * @path: reference to the path returned by icc_get()
> > > > > > > diff --git a/include/linux/interconnect.h b/include/linux/interconnect.h
> > > > > > > index dc25864755ba..0c0bc55f0e89 100644
> > > > > > > --- a/include/linux/interconnect.h
> > > > > > > +++ b/include/linux/interconnect.h
> > > > > > > @@ -9,6 +9,7 @@
> > > > > > >
> > > > > > >  #include <linux/mutex.h>
> > > > > > >  #include <linux/types.h>
> > > > > > > +#include <linux/pm_opp.h>
> > > > > > >
> > > > > > >  /* macros for converting to icc units */
> > > > > > >  #define Bps_to_icc(x)  ((x) / 1000)
> > > > > > > @@ -28,6 +29,7 @@ struct device;
> > > > > > >  struct icc_path *icc_get(struct device *dev, const int src_id,
> > > > > > >                          const int dst_id);
> > > > > > >  struct icc_path *of_icc_get(struct device *dev, const char *name);
> > > > > > > +struct opp_table *icc_get_opp_table(struct icc_path *path);
> > > > > > >  void icc_put(struct icc_path *path);
> > > > > > >  int icc_set_bw(struct icc_path *path, u32 avg_bw, u32 peak_bw);
> > > > > > >
> > > > > > > @@ -49,6 +51,11 @@ static inline void icc_put(struct icc_path *path)
> > > > > > >  {
> > > > > > >  }
> > > > > > >
> > > > > > > +static inline struct opp_table *icc_get_opp_table(struct icc_path *path)
> > > > > > > +{
> > > > > > > +       return NULL;
> > > > > > > +}
> > > > > > > +
> > > > > > >  static inline int icc_set_bw(struct icc_path *path, u32 avg_bw, u32 peak_bw)
> > > > > > >  {
> > > > > > >         return 0;
> > > > > > > --
> > > > > > > 2.22.0.410.gd8fdbe21b5-goog
> > > > > > >
> > > >
> > > > --
> > > > To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
> > > >

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 6/6] interconnect: Add OPP table support for interconnects
  2019-07-15  8:16               ` Vincent Guittot
@ 2019-07-16  0:55                 ` Saravana Kannan
  2019-07-24  7:16                   ` Vincent Guittot
  0 siblings, 1 reply; 48+ messages in thread
From: Saravana Kannan @ 2019-07-16  0:55 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Sweeney, Sean,
	daidavid1, Rajendra Nayak, sibis, Bjorn Andersson, Evan Green,
	Android Kernel Team, open list:THERMAL,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	linux-kernel

On Mon, Jul 15, 2019 at 1:16 AM Vincent Guittot
<vincent.guittot@linaro.org> wrote:
>
> On Tue, 9 Jul 2019 at 21:03, Saravana Kannan <saravanak@google.com> wrote:
> >
> > On Tue, Jul 9, 2019 at 12:25 AM Vincent Guittot
> > <vincent.guittot@linaro.org> wrote:
> > >
> > > On Sun, 7 Jul 2019 at 23:48, Saravana Kannan <saravanak@google.com> wrote:
> > > >
> > > > On Thu, Jul 4, 2019 at 12:12 AM Vincent Guittot
> > > > <vincent.guittot@linaro.org> wrote:
> > > > >
> > > > > On Wed, 3 Jul 2019 at 23:33, Saravana Kannan <saravanak@google.com> wrote:
> > > > > >
> > > > > > On Tue, Jul 2, 2019 at 11:45 PM Vincent Guittot
> > > > > > <vincent.guittot@linaro.org> wrote:
> > > > > > >
> > > > > > > On Wed, 3 Jul 2019 at 03:10, Saravana Kannan <saravanak@google.com> wrote:
> > > > > > > >
> > > > > > > > Interconnect paths can have different performance points. Now that OPP
> > > > > > > > framework supports bandwidth OPP tables, add OPP table support for
> > > > > > > > interconnects.
> > > > > > > >
> > > > > > > > Devices can use the interconnect-opp-table DT property to specify OPP
> > > > > > > > tables for interconnect paths. And the driver can obtain the OPP table for
> > > > > > > > an interconnect path by calling icc_get_opp_table().
> > > > > > >
> > > > > > > The opp table of a path must come from the aggregation of OPP tables
> > > > > > > of the interconnect providers.
> > > > > >
> > > > > > The aggregation of OPP tables of the providers is certainly the
> > > > > > superset of what a path can achieve, but to say that OPPs for
> > > > > > interconnect path should match that superset is an oversimplification
> > > > > > of the reality in hardware.
> > > > > >
> > > > > > There are lots of reasons an interconnect path might not want to use
> > > > > > all the available bandwidth options across all the interconnects in
> > > > > > the route.
> > > > > >
> > > > > > 1. That particular path might not have been validated or verified
> > > > > >    during the HW design process for some of the frequencies/bandwidth
> > > > > >    combinations of the providers.
> > > > >
> > > > > All these constraint are provider's constraints and not consumer's one
> > > > >
> > > > > The consumer asks for a bandwidth according to its needs and then the
> > > > > providers select the optimal bandwidth of each interconnect after
> > > > > aggregating all the request and according to what OPP have been
> > > > > validated
> > > >
> > > > Not really. The screening can be a consumer specific issue. The
> > > > consumer IP itself might have some issue with using too low of a
> > > > bandwidth or bandwidth that's not within some range. It should not be
> > >
> > > How can an IP ask for not enough bandwidth ?
> > > It asks the needed bandwidth based on its requirements
> >
> > The "enough bandwidth" is not always obvious. It's only for very
> > simple cases that you can calculate the required bandwidth. Even for
> > cases that you think might be "obvious/easy" aren't always easy.
> >
> > For example, you'd think a display IP would have a fixed bandwidth
> > requirement for a fixed resolution screen. But that's far from the
> > truth. It can also change as the number of layers change per frame.
> > For video decoder/encoder, it depends on how well the frames compress
> > with a specific compression scheme.
> > So the "required" bandwidth is often a heuristic based on the IP
> > frequency or traffic measurement.
> >
> > But that's not even the point I was making in this specific "bullet".
> >
> > A hardware IP might be screen/verified with only certain bandwidth
> > levels. Or it might have hardware bugs that prevent it from using
> > lower bandwidths even though it's technically sufficient. We need a
> > way to capture that per path. This is not even a fictional case. This
> > has been true multiple times over widely used IPs.
>
> here you are mixing HW constraint on the soc and OPP screening with
> bandwidth request from consumer
> ICC framework is about getting bandwidth request not trying to fix
> some HW/voltage dependency of the SoC
>
> >
> > > > the provider's job to take into account all the IP that might be
> > > > connected to the interconnects. If the interconnect HW itself didn't
> > >
> > > That's not what I'm saying. The provider knows which bandwidth the
> > > interconnect can provide as it is the ones which configures it. So if
> > > the interconnect has a finite number of bandwidth point based probably
> > > on the possible clock frequency and others config of the interconnect,
> > > it selects the best final config after aggregating the request of the
> > > consumer.
> >
> > I completely agree with this. What you are stating above is how it
> > should work and that's the whole point of the interconnect framework.
> >
> > But this is orthogonal to the point I'm making.
>
> It's not orthogonal because you want to add a OPP table pointer in the
> ICC path structure to fix your platform HW constraint whereas it's not
> the purpose of the framework IMO
>
> >
> > > > change, the provider driver shouldn't need to change. By your
> > > > definition, a provider driver will have to account for all the
> > > > possible bus masters that might be connected to it across all SoCs.
> > >
> > > you didn't catch my point
> >
> > Same. I think we are talking over each other. Let me try again.
> >
> > You are trying to describe how and interconnect provider and framework
> > should work. There's no disagreement there.
> >
> > My point is that consumers might not want to or can not always use all
> > the available bandwidth levels offered by the providers. There can be
> > many reasons for that (which is what I listed in my earlier emails)
> > and we need a good and generic way to capture that so that everyone
> > isn't trying to invent their own property.
>
> And my point is that you want to describe some platform or even UCs
> specific constraint in the ICC framework which is not the place to do.
>
> If the consumers might not want to use all available bandwidth because
> this is not power efficient as an example, this should be describe
> somewhere else to express  that there is a shared power domain
> between some devices and we shoudl ensure that all devices in this
> power domain should use the  Optimal Operating Point (optimal freq for
> a voltage)

My patch series has nothing to do with shared power domains. I think
the examples have made it amply clear.

> ICC framework describes the bandwidth request that are expressed by
> the consumers for the current running state of their IP but it doesn't
> reflect the fact that on platform A, the consumer should use bandwidth
> X because it will select a voltage level of a shared power domain that
> is optimized for the other devices B, C ... . It's up to the provider
> to know HW details of the bus that it drives and to make such
> decision;  the consumer should always request the same

The change to ICC framework is practically just this. I don't have any
future changes planned for the ICC framework. This is the entirety of
it.

+       opp_node = of_parse_phandle(np, "interconnect-opp-table", idx);
+       if (opp_node) {
+               path->opp_table = dev_pm_opp_of_find_table_from_node(opp_node);
+               of_node_put(opp_node);
+       }

It's quite a stretch and bit hyperbolic to say this one change is
getting ICC framework to do all the things you claim above.

It's literally a simple helper function so that the consumer doesn't
have to make assumptions about indices and it's a bit more explicit
about which OPP table of the device (a device can have multiple OPP
tables) corresponds to which ICC path.

Going by your extreme argument, one can also claim that it's not the
ICC framework's job to make it easy for consumers to figure out the
source/destination endpoints or give them names and delete the
interconnect and interconnect-names properties. That's clearly just as
absurd a claim.


-Saravana

> > > > That's not good design nor is it scalable.
> > > >
> > > > > >
> > > > > > 2. Similarly during parts screening in the factory, some of the
> > > > > >    combinations might not have been screened and can't be guaranteed
> > > > > >    to work.
> > > > >
> > > > > As above, it's the provider's job to select the final bandwidth
> > > > > according to its constraint
> > > >
> > > > Same reply as above.
> > > >
> > > > > >
> > > > > > 3. Only a certain set of bandwidth levels might make sense to use from
> > > > > >    a power/performance balance given the device using it. For example:
> > > > > >    - The big CPU might not want to use some of the lower bandwidths
> > > > > >      but the little CPU might want to.
> > > > > >    - The big CPU might not want to use some intermediate bandwidth
> > > > > >      points if they don't save a lot of power compared to a higher
> > > > > >      bandwidth levels, but the little CPU might want to.
> > > > > >    - The little CPU might never want to use the higher set of
> > > > > >      bandwidth levels since they won't be power efficient for the use
> > > > > >      cases that might run on it.
> > > > >
> > > > > These example are quite vague about the reasons why little might never
> > > > > want to use higher bandwidth.
> > > >
> > > > How is it vague? I just said because of power/performance balance.
> > > >
> > > > > But then, if little doesn't ask high bandwidth it will not use them.
> > > >
> > > > If you are running a heuristics based algorithm to pick bandwidth,
> > > > this is how it'll know NOT to use some of the bandwidth levels.
> > >
> > > so you want to set a bandwidth according to the cpu frequency which is
> > > what has been proposed in other thread
> >
> > Nope, that's just one heuristic. Often times it's based on hardware
> > monitors measuring interconnect activity. If you go look at the SDM845
> > in a Pixel 3, almost nothing is directly tied to the CPU frequency.
> >
> > Even if you are scaling bandwidth based on other hardware
> > measurements, you might want to avoid some bandwidth level provided by
> > the interconnect providers because it's suboptimal.
> >
> > For example, when making bandwidth votes to accommodate the big CPUs,
> > you might never want to use some of the lower bandwidth levels because
> > they are not power efficient for any CPU frequency or any bandwidth
> > level. Because at those levels the memory/interconnect is so slow that
> > it has a non-trivial utilization increase (because the CPU is
> > stalling) of the big CPUs.
> >
> > Again, this is completely different from what the providers/icc
> > framework does. Which is, once the request is made, they aggregate and
> > set the actual interconnect frequencies correctly.
> >
> > > >
> > > > > >
> > > > > > 4. It might not make sense from a system level power perspective.
> > > > > > Let's take an example of a path S (source) -> A -> B -> C -> D
> > > > > > (destination).
> > > > > >    - A supports only 2, 5, 7 and 10 GB/s. B supports 1, 2 ... 10 GB/s.
> > > > > >      C supports 5 and 10 GB/s
> > > > > >    - If you combine and list the superset of bandwidth levels
> > > > > >      supported in that path, that'd be 1, 2, 3, ... 10 GB/s.
> > > > > >    - Which set of bandwidth levels make sense will depend on the
> > > > > >      hardware characteristics of the interconnects.
> > > > > >    - If B is the biggest power sink, then you might want to use all 10
> > > > > >      levels.
> > > > > >    - If A is the biggest power sink, then you might want to use all 2,
> > > > > >      5 and 10 GB/s of the levels.
> > > > > >    - If C is the biggest power sink then you might only want to use 5
> > > > > >      and 10 GB/s
> > > > > >    - The more hops and paths you get the more convoluted this gets.
> > > > > >
> > > > > > 5. The design of the interconnects themselves might have an impact on
> > > > > > which bandwidth levels are used.
> > > > > >    - For example, the FIFO depth between two specific interconnects
> > > > > >      might affect the valid bandwidth levels for a specific path.
> > > > > >    - Say S1 -> A -> B -> D1, S2 -> C -> B -> D1 and S2 -> C -> D2 are
> > > > > >      three paths.
> > > > > >    - If C <-> B FIFO depth is small, then there might be a requirement
> > > > > >      that C and B be closely performance matched to avoid system level
> > > > > >      congestion due to back pressure.
> > > > > >    - So S2 -> D1 path can't use all the bandwidth levels supported by
> > > > > >      C-B combination.
> > > > > >    - But S2 -> D2 can use all the bandwidth levels supported by C.
> > > > > >    - And S1 -> D1 can use all the levels supported by A-B combination.
> > > > > >
> > > > >
> > > > > All the examples above makes sense but have to be handle by the
> > > > > provider not the consumer. The consumer asks for a bandwidth according
> > > > > to its constraints. Then the provider which is the driver that manages
> > > > > the interconnect IP, should manage all this hardware and platform
> > > > > specific stuff related to the interconnect IP in order to set the
> > > > > optimal bandwidth that fit both consumer constraint and platform
> > > > > specific configuration.
> > > >
> > > > Sure, but the provider itself can have interconnect properties to
> > > > indicate which other interconnects it's tied to. And the provider will
> > > > still need the interconnect-opp-table to denote which bandwidth levels
> > > > are sensible to use with each of its connections.
> >
> > You seem to have missed this comment.
> >
> > Thanks,
> > Saravana
> >
> > > > So in some instances the interconnect-opp-table covers the needs of
> > > > purely consumers and in some instances purely providers. But in either
> > > > case, it's still needed to describe the hardware properly.
> > > >
> > > > -Saravana
> > > >
> > > > > > These are just some of the reasons I could recollect in a few minutes.
> > > > > > These are all real world cases I had to deal with in the past several
> > > > > > years of dealing with scaling interconnects. I'm sure vendors and SoCs
> > > > > > I'm not familiar with have other good reasons I'm not aware of.
> > > > > >
> > > > > > Trying to figure this all out by aggregating OPP tables of
> > > > > > interconnect providers just isn't feasible nor is it efficient. The
> > > > > > OPP tables for an interconnect path is describing the valid BW levels
> > > > > > supported by that path and verified in hardware and makes a lot of
> > > > > > sense to capture it clearly in DT.
> > > > > >
> > > > > > > So such kind of OPP table should be at
> > > > > > > provider level but not at path level.
> > > > > >
> > > > > > They can also use it if they want to, but they'll probably want to use
> > > > > > a frequency OPP table.
> > > > > >
> > > > > >
> > > > > > -Saravana
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > > > > > > ---
> > > > > > > >  drivers/interconnect/core.c  | 27 ++++++++++++++++++++++++++-
> > > > > > > >  include/linux/interconnect.h |  7 +++++++
> > > > > > > >  2 files changed, 33 insertions(+), 1 deletion(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
> > > > > > > > index 871eb4bc4efc..881bac80bc1e 100644
> > > > > > > > --- a/drivers/interconnect/core.c
> > > > > > > > +++ b/drivers/interconnect/core.c
> > > > > > > > @@ -47,6 +47,7 @@ struct icc_req {
> > > > > > > >   */
> > > > > > > >  struct icc_path {
> > > > > > > >         size_t num_nodes;
> > > > > > > > +       struct opp_table *opp_table;
> > > > > > > >         struct icc_req reqs[];
> > > > > > > >  };
> > > > > > > >
> > > > > > > > @@ -313,7 +314,7 @@ struct icc_path *of_icc_get(struct device *dev, const char *name)
> > > > > > > >  {
> > > > > > > >         struct icc_path *path = ERR_PTR(-EPROBE_DEFER);
> > > > > > > >         struct icc_node *src_node, *dst_node;
> > > > > > > > -       struct device_node *np = NULL;
> > > > > > > > +       struct device_node *np = NULL, *opp_node;
> > > > > > > >         struct of_phandle_args src_args, dst_args;
> > > > > > > >         int idx = 0;
> > > > > > > >         int ret;
> > > > > > > > @@ -381,10 +382,34 @@ struct icc_path *of_icc_get(struct device *dev, const char *name)
> > > > > > > >                 dev_err(dev, "%s: invalid path=%ld\n", __func__, PTR_ERR(path));
> > > > > > > >         mutex_unlock(&icc_lock);
> > > > > > > >
> > > > > > > > +       opp_node = of_parse_phandle(np, "interconnect-opp-table", idx);
> > > > > > > > +       if (opp_node) {
> > > > > > > > +               path->opp_table = dev_pm_opp_of_find_table_from_node(opp_node);
> > > > > > > > +               of_node_put(opp_node);
> > > > > > > > +       }
> > > > > > > > +
> > > > > > > > +
> > > > > > > >         return path;
> > > > > > > >  }
> > > > > > > >  EXPORT_SYMBOL_GPL(of_icc_get);
> > > > > > > >
> > > > > > > > +/**
> > > > > > > > + * icc_get_opp_table() - Get the OPP table that corresponds to a path
> > > > > > > > + * @path: reference to the path returned by icc_get()
> > > > > > > > + *
> > > > > > > > + * This function will return the OPP table that corresponds to a path handle.
> > > > > > > > + * If the interconnect API is disabled, NULL is returned and the consumer
> > > > > > > > + * drivers will still build. Drivers are free to handle this specifically, but
> > > > > > > > + * they don't have to.
> > > > > > > > + *
> > > > > > > > + * Return: opp_table pointer on success. NULL is returned when the API is
> > > > > > > > + * disabled or the OPP table is missing.
> > > > > > > > + */
> > > > > > > > +struct opp_table *icc_get_opp_table(struct icc_path *path)
> > > > > > > > +{
> > > > > > > > +       return path->opp_table;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > >  /**
> > > > > > > >   * icc_set_bw() - set bandwidth constraints on an interconnect path
> > > > > > > >   * @path: reference to the path returned by icc_get()
> > > > > > > > diff --git a/include/linux/interconnect.h b/include/linux/interconnect.h
> > > > > > > > index dc25864755ba..0c0bc55f0e89 100644
> > > > > > > > --- a/include/linux/interconnect.h
> > > > > > > > +++ b/include/linux/interconnect.h
> > > > > > > > @@ -9,6 +9,7 @@
> > > > > > > >
> > > > > > > >  #include <linux/mutex.h>
> > > > > > > >  #include <linux/types.h>
> > > > > > > > +#include <linux/pm_opp.h>
> > > > > > > >
> > > > > > > >  /* macros for converting to icc units */
> > > > > > > >  #define Bps_to_icc(x)  ((x) / 1000)
> > > > > > > > @@ -28,6 +29,7 @@ struct device;
> > > > > > > >  struct icc_path *icc_get(struct device *dev, const int src_id,
> > > > > > > >                          const int dst_id);
> > > > > > > >  struct icc_path *of_icc_get(struct device *dev, const char *name);
> > > > > > > > +struct opp_table *icc_get_opp_table(struct icc_path *path);
> > > > > > > >  void icc_put(struct icc_path *path);
> > > > > > > >  int icc_set_bw(struct icc_path *path, u32 avg_bw, u32 peak_bw);
> > > > > > > >
> > > > > > > > @@ -49,6 +51,11 @@ static inline void icc_put(struct icc_path *path)
> > > > > > > >  {
> > > > > > > >  }
> > > > > > > >
> > > > > > > > +static inline struct opp_table *icc_get_opp_table(struct icc_path *path)
> > > > > > > > +{
> > > > > > > > +       return NULL;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > >  static inline int icc_set_bw(struct icc_path *path, u32 avg_bw, u32 peak_bw)
> > > > > > > >  {
> > > > > > > >         return 0;
> > > > > > > > --
> > > > > > > > 2.22.0.410.gd8fdbe21b5-goog
> > > > > > > >
> > > > >
> > > > > --
> > > > > To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
> > > > >

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/6] dt-bindings: opp: Introduce opp-peak-KBps and opp-avg-KBps bindings
  2019-07-03  1:10 ` [PATCH v3 1/6] dt-bindings: opp: Introduce opp-peak-KBps and opp-avg-KBps bindings Saravana Kannan
@ 2019-07-16 17:25   ` Sibi Sankar
  2019-07-16 18:58     ` Saravana Kannan
  2019-07-17  7:54   ` Viresh Kumar
  2019-07-26 16:24   ` Georgi Djakov
  2 siblings, 1 reply; 48+ messages in thread
From: Sibi Sankar @ 2019-07-16 17:25 UTC (permalink / raw)
  To: Saravana Kannan, Georgi Djakov, Rob Herring, Mark Rutland,
	Viresh Kumar, Nishanth Menon, Stephen Boyd, Rafael J. Wysocki
  Cc: vincent.guittot, seansw, daidavid1, Rajendra Nayak,
	bjorn.andersson, evgreen, kernel-team, linux-pm, devicetree,
	linux-kernel

Hey Saravana,

https://patchwork.kernel.org/patch/10850815/
There was already a discussion ^^ on how bandwidth bindings were to be
named.

On 7/3/19 6:40 AM, Saravana Kannan wrote:
> Interconnects often quantify their performance points in terms of
> bandwidth. So, add opp-peak-KBps (required) and opp-avg-KBps (optional) to
> allow specifying Bandwidth OPP tables in DT.
> 
> opp-peak-KBps is a required property that replace opp-hz for Bandwidth OPP
> tables.
> 
> opp-avg-KBps is an optional property that can be used in Bandwidth OPP
> tables.
> 
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>   Documentation/devicetree/bindings/opp/opp.txt | 15 ++++++++++++---
>   1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/opp/opp.txt b/Documentation/devicetree/bindings/opp/opp.txt
> index 76b6c79604a5..c869e87caa2a 100644
> --- a/Documentation/devicetree/bindings/opp/opp.txt
> +++ b/Documentation/devicetree/bindings/opp/opp.txt
> @@ -83,9 +83,14 @@ properties.
>   
>   Required properties:
>   - opp-hz: Frequency in Hz, expressed as a 64-bit big-endian integer. This is a
> -  required property for all device nodes but devices like power domains. The
> -  power domain nodes must have another (implementation dependent) property which
> -  uniquely identifies the OPP nodes.
> +  required property for all device nodes but for devices like power domains or
> +  bandwidth opp tables. The power domain nodes must have another (implementation
> +  dependent) property which uniquely identifies the OPP nodes. The interconnect
> +  opps are required to have the opp-peak-bw property.
> +
> +- opp-peak-KBps: Peak bandwidth in kilobytes per second, expressed as a 32-bit
> +  big-endian integer. This is a required property for all devices that don't
> +  have opp-hz. For example, bandwidth OPP tables for interconnect paths.
>   
>   Optional properties:
>   - opp-microvolt: voltage in micro Volts.
> @@ -132,6 +137,10 @@ Optional properties:
>   - opp-level: A value representing the performance level of the device,
>     expressed as a 32-bit integer.
>   
> +- opp-avg-KBps: Average bandwidth in kilobytes per second, expressed as a
> +  32-bit big-endian integer. This property is only meaningful in OPP tables
> +  where opp-peak-KBps is present.
> +
>   - clock-latency-ns: Specifies the maximum possible transition latency (in
>     nanoseconds) for switching to this OPP from any other OPP.
>   
> 

-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc, is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 2/6] OPP: Add support for bandwidth OPP tables
  2019-07-03  1:10 ` [PATCH v3 2/6] OPP: Add support for bandwidth OPP tables Saravana Kannan
@ 2019-07-16 17:33   ` Sibi Sankar
  2019-07-16 19:10     ` Saravana Kannan
  2019-07-30 10:57   ` Amit Kucheria
  1 sibling, 1 reply; 48+ messages in thread
From: Sibi Sankar @ 2019-07-16 17:33 UTC (permalink / raw)
  To: Saravana Kannan, Georgi Djakov, Rob Herring, Mark Rutland,
	Viresh Kumar, Nishanth Menon, Stephen Boyd, Rafael J. Wysocki
  Cc: vincent.guittot, seansw, daidavid1, Rajendra Nayak,
	bjorn.andersson, evgreen, kernel-team, linux-pm, devicetree,
	linux-kernel, adharmap

Hey Saravana,

On 7/3/19 6:40 AM, Saravana Kannan wrote:
> Not all devices quantify their performance points in terms of frequency.
> Devices like interconnects quantify their performance points in terms of
> bandwidth. We need a way to represent these bandwidth levels in OPP. So,
> add support for parsing bandwidth OPPs from DT.
> 
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>   drivers/opp/of.c  | 34 ++++++++++++++++++++++++++++++++--
>   drivers/opp/opp.h |  4 +++-
>   2 files changed, 35 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/opp/of.c b/drivers/opp/of.c
> index c10c782d15aa..54fa70ed2adc 100644
> --- a/drivers/opp/of.c
> +++ b/drivers/opp/of.c
> @@ -552,6 +552,35 @@ void dev_pm_opp_of_remove_table(struct device *dev)
>   }
>   EXPORT_SYMBOL_GPL(dev_pm_opp_of_remove_table);
>   
> +static int _read_opp_key(struct dev_pm_opp *new_opp, struct device_node *np)
> +{
> +	int ret;
> +	u64 rate;
> +	u32 bw;
> +
> +	ret = of_property_read_u64(np, "opp-hz", &rate);
> +	if (!ret) {
> +		/*
> +		 * Rate is defined as an unsigned long in clk API, and so
> +		 * casting explicitly to its type. Must be fixed once rate is 64
> +		 * bit guaranteed in clk API.
> +		 */
> +		new_opp->rate = (unsigned long)rate
now that the rate gets set here, please remove the rate assignment in
_opp_add_static_v2

> +		return 0;
> +	}
> +
> +	ret = of_property_read_u32(np, "opp-peak-KBps", &bw);
> +	if (ret)
> +		return ret;
> +	new_opp->rate = (unsigned long) &bw;

should be bw instead

> +
> +	ret = of_property_read_u32(np, "opp-avg-KBps", &bw);
> +	if (!ret)
> +		new_opp->avg_bw = (unsigned long) &bw;

ditto

> +
> +	return 0;
> +}
> +
>   /**
>    * _opp_add_static_v2() - Allocate static OPPs (As per 'v2' DT bindings)
>    * @opp_table:	OPP table
> @@ -589,11 +618,12 @@ static struct dev_pm_opp *_opp_add_static_v2(struct opp_table *opp_table,
>   	if (!new_opp)
>   		return ERR_PTR(-ENOMEM);
>   
> -	ret = of_property_read_u64(np, "opp-hz", &rate);
> +	ret = _read_opp_key(new_opp, np);
>   	if (ret < 0) {
>   		/* "opp-hz" is optional for devices like power domains. */
>   		if (!opp_table->is_genpd) {
> -			dev_err(dev, "%s: opp-hz not found\n", __func__);
> +			dev_err(dev, "%s: opp-hz or opp-peak-bw not found\n",
> +				__func__);

please remove the else part where rate value will be reset.

>   			goto free_opp;
>   		}
>   
> diff --git a/drivers/opp/opp.h b/drivers/opp/opp.h
> index 569b3525aa67..ead2cdafe957 100644
> --- a/drivers/opp/opp.h
> +++ b/drivers/opp/opp.h
> @@ -59,7 +59,8 @@ extern struct list_head opp_tables;
>    * @turbo:	true if turbo (boost) OPP
>    * @suspend:	true if suspend OPP
>    * @pstate: Device's power domain's performance state.
> - * @rate:	Frequency in hertz
> + * @rate:	Frequency in hertz OR Peak bandwidth in kilobytes per second
> + * @avg_bw:	Average bandwidth in kilobytes per second
>    * @level:	Performance level
>    * @supplies:	Power supplies voltage/current values
>    * @clock_latency_ns: Latency (in nanoseconds) of switching to this OPP's
> @@ -81,6 +82,7 @@ struct dev_pm_opp {
>   	bool suspend;
>   	unsigned int pstate;
>   	unsigned long rate;
> +	unsigned long avg_bw;
>   	unsigned int level;
>   
>   	struct dev_pm_opp_supply *supplies;
> 

-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc, is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/6] dt-bindings: opp: Introduce opp-peak-KBps and opp-avg-KBps bindings
  2019-07-16 17:25   ` Sibi Sankar
@ 2019-07-16 18:58     ` Saravana Kannan
  2019-07-22 23:35       ` Rob Herring
  0 siblings, 1 reply; 48+ messages in thread
From: Saravana Kannan @ 2019-07-16 18:58 UTC (permalink / raw)
  To: Sibi Sankar
  Cc: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Vincent Guittot,
	Sweeney, Sean, daidavid1, Rajendra Nayak, Bjorn Andersson,
	Evan Green, Android Kernel Team, Linux PM,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML

On Tue, Jul 16, 2019 at 10:25 AM Sibi Sankar <sibis@codeaurora.org> wrote:
>
> Hey Saravana,
>
> https://patchwork.kernel.org/patch/10850815/
> There was already a discussion ^^ on how bandwidth bindings were to be
> named.

Yes, I'm aware of that series. That series is trying to define a BW
mapping for an existing frequency OPP table. This patch is NOT about
adding a mapping to an existing table. This patch is about adding the
notion of BW OPP tables where BW is the "key" instead of "frequency".

So let's not mixed up these two series.

-Saravana

> On 7/3/19 6:40 AM, Saravana Kannan wrote:
> > Interconnects often quantify their performance points in terms of
> > bandwidth. So, add opp-peak-KBps (required) and opp-avg-KBps (optional) to
> > allow specifying Bandwidth OPP tables in DT.
> >
> > opp-peak-KBps is a required property that replace opp-hz for Bandwidth OPP
> > tables.
> >
> > opp-avg-KBps is an optional property that can be used in Bandwidth OPP
> > tables.
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> >   Documentation/devicetree/bindings/opp/opp.txt | 15 ++++++++++++---
> >   1 file changed, 12 insertions(+), 3 deletions(-)
> >
> > diff --git a/Documentation/devicetree/bindings/opp/opp.txt b/Documentation/devicetree/bindings/opp/opp.txt
> > index 76b6c79604a5..c869e87caa2a 100644
> > --- a/Documentation/devicetree/bindings/opp/opp.txt
> > +++ b/Documentation/devicetree/bindings/opp/opp.txt
> > @@ -83,9 +83,14 @@ properties.
> >
> >   Required properties:
> >   - opp-hz: Frequency in Hz, expressed as a 64-bit big-endian integer. This is a
> > -  required property for all device nodes but devices like power domains. The
> > -  power domain nodes must have another (implementation dependent) property which
> > -  uniquely identifies the OPP nodes.
> > +  required property for all device nodes but for devices like power domains or
> > +  bandwidth opp tables. The power domain nodes must have another (implementation
> > +  dependent) property which uniquely identifies the OPP nodes. The interconnect
> > +  opps are required to have the opp-peak-bw property.
> > +
> > +- opp-peak-KBps: Peak bandwidth in kilobytes per second, expressed as a 32-bit
> > +  big-endian integer. This is a required property for all devices that don't
> > +  have opp-hz. For example, bandwidth OPP tables for interconnect paths.
> >
> >   Optional properties:
> >   - opp-microvolt: voltage in micro Volts.
> > @@ -132,6 +137,10 @@ Optional properties:
> >   - opp-level: A value representing the performance level of the device,
> >     expressed as a 32-bit integer.
> >
> > +- opp-avg-KBps: Average bandwidth in kilobytes per second, expressed as a
> > +  32-bit big-endian integer. This property is only meaningful in OPP tables
> > +  where opp-peak-KBps is present.
> > +
> >   - clock-latency-ns: Specifies the maximum possible transition latency (in
> >     nanoseconds) for switching to this OPP from any other OPP.
> >
> >
>
> --
> Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc, is a member of Code Aurora Forum,
> a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 2/6] OPP: Add support for bandwidth OPP tables
  2019-07-16 17:33   ` Sibi Sankar
@ 2019-07-16 19:10     ` Saravana Kannan
  0 siblings, 0 replies; 48+ messages in thread
From: Saravana Kannan @ 2019-07-16 19:10 UTC (permalink / raw)
  To: Sibi Sankar
  Cc: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Vincent Guittot,
	Sweeney, Sean, daidavid1, Rajendra Nayak, Bjorn Andersson,
	Evan Green, Android Kernel Team, Linux PM,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML,
	adharmap

On Tue, Jul 16, 2019 at 10:33 AM Sibi Sankar <sibis@codeaurora.org> wrote:
>
> Hey Saravana,
>
> On 7/3/19 6:40 AM, Saravana Kannan wrote:
> > Not all devices quantify their performance points in terms of frequency.
> > Devices like interconnects quantify their performance points in terms of
> > bandwidth. We need a way to represent these bandwidth levels in OPP. So,
> > add support for parsing bandwidth OPPs from DT.
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> >   drivers/opp/of.c  | 34 ++++++++++++++++++++++++++++++++--
> >   drivers/opp/opp.h |  4 +++-
> >   2 files changed, 35 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/opp/of.c b/drivers/opp/of.c
> > index c10c782d15aa..54fa70ed2adc 100644
> > --- a/drivers/opp/of.c
> > +++ b/drivers/opp/of.c
> > @@ -552,6 +552,35 @@ void dev_pm_opp_of_remove_table(struct device *dev)
> >   }
> >   EXPORT_SYMBOL_GPL(dev_pm_opp_of_remove_table);
> >
> > +static int _read_opp_key(struct dev_pm_opp *new_opp, struct device_node *np)
> > +{
> > +     int ret;
> > +     u64 rate;
> > +     u32 bw;
> > +
> > +     ret = of_property_read_u64(np, "opp-hz", &rate);
> > +     if (!ret) {
> > +             /*
> > +              * Rate is defined as an unsigned long in clk API, and so
> > +              * casting explicitly to its type. Must be fixed once rate is 64
> > +              * bit guaranteed in clk API.
> > +              */
> > +             new_opp->rate = (unsigned long)rate
> now that the rate gets set here, please remove the rate assignment in
> _opp_add_static_v2
>
> > +             return 0;
> > +     }
> > +
> > +     ret = of_property_read_u32(np, "opp-peak-KBps", &bw);
> > +     if (ret)
> > +             return ret;
> > +     new_opp->rate = (unsigned long) &bw;
>
> should be bw instead

Good catch. Thanks!

>
> > +
> > +     ret = of_property_read_u32(np, "opp-avg-KBps", &bw);
> > +     if (!ret)
> > +             new_opp->avg_bw = (unsigned long) &bw;
>
> ditto
>
> > +
> > +     return 0;
> > +}
> > +
> >   /**
> >    * _opp_add_static_v2() - Allocate static OPPs (As per 'v2' DT bindings)
> >    * @opp_table:      OPP table
> > @@ -589,11 +618,12 @@ static struct dev_pm_opp *_opp_add_static_v2(struct opp_table *opp_table,
> >       if (!new_opp)
> >               return ERR_PTR(-ENOMEM);
> >
> > -     ret = of_property_read_u64(np, "opp-hz", &rate);
> > +     ret = _read_opp_key(new_opp, np);
> >       if (ret < 0) {
> >               /* "opp-hz" is optional for devices like power domains. */
> >               if (!opp_table->is_genpd) {
> > -                     dev_err(dev, "%s: opp-hz not found\n", __func__);
> > +                     dev_err(dev, "%s: opp-hz or opp-peak-bw not found\n",
> > +                             __func__);
>
> please remove the else part where rate value will be reset.

Ah! I flipped the meaning of the "if" check in my head. Thanks!

-Saravana

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/6] dt-bindings: opp: Introduce opp-peak-KBps and opp-avg-KBps bindings
  2019-07-03  1:10 ` [PATCH v3 1/6] dt-bindings: opp: Introduce opp-peak-KBps and opp-avg-KBps bindings Saravana Kannan
  2019-07-16 17:25   ` Sibi Sankar
@ 2019-07-17  7:54   ` Viresh Kumar
  2019-07-17 20:29     ` Saravana Kannan
  2019-07-26 16:24   ` Georgi Djakov
  2 siblings, 1 reply; 48+ messages in thread
From: Viresh Kumar @ 2019-07-17  7:54 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, vincent.guittot,
	seansw, daidavid1, Rajendra Nayak, sibis, bjorn.andersson,
	evgreen, kernel-team, linux-pm, devicetree, linux-kernel

On 02-07-19, 18:10, Saravana Kannan wrote:
> Interconnects often quantify their performance points in terms of
> bandwidth. So, add opp-peak-KBps (required) and opp-avg-KBps (optional) to
> allow specifying Bandwidth OPP tables in DT.
> 
> opp-peak-KBps is a required property that replace opp-hz for Bandwidth OPP
> tables.
> 
> opp-avg-KBps is an optional property that can be used in Bandwidth OPP
> tables.
> 
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  Documentation/devicetree/bindings/opp/opp.txt | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/opp/opp.txt b/Documentation/devicetree/bindings/opp/opp.txt
> index 76b6c79604a5..c869e87caa2a 100644
> --- a/Documentation/devicetree/bindings/opp/opp.txt
> +++ b/Documentation/devicetree/bindings/opp/opp.txt
> @@ -83,9 +83,14 @@ properties.
>  
>  Required properties:
>  - opp-hz: Frequency in Hz, expressed as a 64-bit big-endian integer. This is a
> -  required property for all device nodes but devices like power domains. The
> -  power domain nodes must have another (implementation dependent) property which
> -  uniquely identifies the OPP nodes.
> +  required property for all device nodes but for devices like power domains or
> +  bandwidth opp tables. The power domain nodes must have another (implementation
> +  dependent) property which uniquely identifies the OPP nodes. The interconnect
> +  opps are required to have the opp-peak-bw property.

                                   ??

> +
> +- opp-peak-KBps: Peak bandwidth in kilobytes per second, expressed as a 32-bit
> +  big-endian integer. This is a required property for all devices that don't
> +  have opp-hz. For example, bandwidth OPP tables for interconnect paths.
>  
>  Optional properties:
>  - opp-microvolt: voltage in micro Volts.
> @@ -132,6 +137,10 @@ Optional properties:
>  - opp-level: A value representing the performance level of the device,
>    expressed as a 32-bit integer.
>  
> +- opp-avg-KBps: Average bandwidth in kilobytes per second, expressed as a
> +  32-bit big-endian integer. This property is only meaningful in OPP tables
> +  where opp-peak-KBps is present.
> +
>  - clock-latency-ns: Specifies the maximum possible transition latency (in
>    nanoseconds) for switching to this OPP from any other OPP.
>  
> -- 
> 2.22.0.410.gd8fdbe21b5-goog

-- 
viresh

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 0/6] Introduce Bandwidth OPPs for interconnect paths
  2019-07-03  1:10 [PATCH v3 0/6] Introduce Bandwidth OPPs for interconnect paths Saravana Kannan
                   ` (6 preceding siblings ...)
  2019-07-03  6:36 ` [PATCH v3 0/6] Introduce Bandwidth OPPs for interconnect paths Viresh Kumar
@ 2019-07-17 10:32 ` Viresh Kumar
  2019-07-17 20:34   ` Saravana Kannan
  7 siblings, 1 reply; 48+ messages in thread
From: Viresh Kumar @ 2019-07-17 10:32 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, vincent.guittot,
	seansw, daidavid1, Rajendra Nayak, sibis, bjorn.andersson,
	evgreen, kernel-team, linux-pm, devicetree, linux-kernel

On 02-07-19, 18:10, Saravana Kannan wrote:
> Interconnects and interconnect paths quantify their performance levels in
> terms of bandwidth and not in terms of frequency. So similar to how we have
> frequency based OPP tables in DT and in the OPP framework, we need
> bandwidth OPP table support in the OPP framework and in DT. Since there can
> be more than one interconnect path used by a device, we also need a way to
> assign a bandwidth OPP table to an interconnect path.
> 
> This patch series:
> - Adds opp-peak-KBps and opp-avg-KBps properties to OPP DT bindings
> - Adds interconnect-opp-table property to interconnect DT bindings
> - Adds OPP helper functions for bandwidth OPP tables
> - Adds icc_get_opp_table() to get the OPP table for an interconnect path
> 
> So with the DT bindings added in this patch series, the DT for a GPU
> that does bandwidth voting from GPU to Cache and GPU to DDR would look
> something like this:
> 
> gpu_cache_opp_table: gpu_cache_opp_table {
> 	compatible = "operating-points-v2";
> 
> 	gpu_cache_3000: opp-3000 {
> 		opp-peak-KBps = <3000>;
> 		opp-avg-KBps = <1000>;
> 	};
> 	gpu_cache_6000: opp-6000 {
> 		opp-peak-KBps = <6000>;
> 		opp-avg-KBps = <2000>;
> 	};
> 	gpu_cache_9000: opp-9000 {
> 		opp-peak-KBps = <9000>;
> 		opp-avg-KBps = <9000>;
> 	};
> };
> 
> gpu_ddr_opp_table: gpu_ddr_opp_table {
> 	compatible = "operating-points-v2";
> 
> 	gpu_ddr_1525: opp-1525 {
> 		opp-peak-KBps = <1525>;
> 		opp-avg-KBps = <452>;
> 	};
> 	gpu_ddr_3051: opp-3051 {
> 		opp-peak-KBps = <3051>;
> 		opp-avg-KBps = <915>;
> 	};
> 	gpu_ddr_7500: opp-7500 {
> 		opp-peak-KBps = <7500>;
> 		opp-avg-KBps = <3000>;
> 	};
> };

Who is going to use the above tables and how ? These are the maximum
BW available over these paths, right ?

> gpu_opp_table: gpu_opp_table {
> 	compatible = "operating-points-v2";
> 	opp-shared;
> 
> 	opp-200000000 {
> 		opp-hz = /bits/ 64 <200000000>;
> 	};
> 	opp-400000000 {
> 		opp-hz = /bits/ 64 <400000000>;
> 	};
> };

Shouldn't this link back to the above tables via required-opp, etc ?
How will we know how much BW is required by the GPU device for all the
paths ?

> gpu@7864000 {
> 	...
> 	operating-points-v2 = <&gpu_opp_table>, <&gpu_cache_opp_table>, <&gpu_ddr_opp_table>;
> 	interconnects = <&mmnoc MASTER_GPU_1 &bimc SLAVE_SYSTEM_CACHE>,
> 			<&mmnoc MASTER_GPU_1 &bimc SLAVE_DDR>;
> 	interconnect-names = "gpu-cache", "gpu-mem";
> 	interconnect-opp-table = <&gpu_cache_opp_table>, <&gpu_ddr_opp_table>
> };

-- 
viresh

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/6] dt-bindings: opp: Introduce opp-peak-KBps and opp-avg-KBps bindings
  2019-07-17  7:54   ` Viresh Kumar
@ 2019-07-17 20:29     ` Saravana Kannan
  2019-07-18  4:35       ` Viresh Kumar
  0 siblings, 1 reply; 48+ messages in thread
From: Saravana Kannan @ 2019-07-17 20:29 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Vincent Guittot,
	Sweeney, Sean, daidavid1, Rajendra Nayak, Sibi Sankar,
	Bjorn Andersson, Evan Green, Android Kernel Team, Linux PM,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML

On Wed, Jul 17, 2019 at 12:54 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 02-07-19, 18:10, Saravana Kannan wrote:
> > Interconnects often quantify their performance points in terms of
> > bandwidth. So, add opp-peak-KBps (required) and opp-avg-KBps (optional) to
> > allow specifying Bandwidth OPP tables in DT.
> >
> > opp-peak-KBps is a required property that replace opp-hz for Bandwidth OPP
> > tables.
> >
> > opp-avg-KBps is an optional property that can be used in Bandwidth OPP
> > tables.
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> >  Documentation/devicetree/bindings/opp/opp.txt | 15 ++++++++++++---
> >  1 file changed, 12 insertions(+), 3 deletions(-)
> >
> > diff --git a/Documentation/devicetree/bindings/opp/opp.txt b/Documentation/devicetree/bindings/opp/opp.txt
> > index 76b6c79604a5..c869e87caa2a 100644
> > --- a/Documentation/devicetree/bindings/opp/opp.txt
> > +++ b/Documentation/devicetree/bindings/opp/opp.txt
> > @@ -83,9 +83,14 @@ properties.
> >
> >  Required properties:
> >  - opp-hz: Frequency in Hz, expressed as a 64-bit big-endian integer. This is a
> > -  required property for all device nodes but devices like power domains. The
> > -  power domain nodes must have another (implementation dependent) property which
> > -  uniquely identifies the OPP nodes.
> > +  required property for all device nodes but for devices like power domains or
> > +  bandwidth opp tables. The power domain nodes must have another (implementation
> > +  dependent) property which uniquely identifies the OPP nodes. The interconnect
> > +  opps are required to have the opp-peak-bw property.
>
>                                    ??

Sorry, what's the question? Was this an accidental email?

-Saravana

>
> > +
> > +- opp-peak-KBps: Peak bandwidth in kilobytes per second, expressed as a 32-bit
> > +  big-endian integer. This is a required property for all devices that don't
> > +  have opp-hz. For example, bandwidth OPP tables for interconnect paths.
> >
> >  Optional properties:
> >  - opp-microvolt: voltage in micro Volts.
> > @@ -132,6 +137,10 @@ Optional properties:
> >  - opp-level: A value representing the performance level of the device,
> >    expressed as a 32-bit integer.
> >
> > +- opp-avg-KBps: Average bandwidth in kilobytes per second, expressed as a
> > +  32-bit big-endian integer. This property is only meaningful in OPP tables
> > +  where opp-peak-KBps is present.
> > +
> >  - clock-latency-ns: Specifies the maximum possible transition latency (in
> >    nanoseconds) for switching to this OPP from any other OPP.
> >
> > --
> > 2.22.0.410.gd8fdbe21b5-goog
>
> --
> viresh
>
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 0/6] Introduce Bandwidth OPPs for interconnect paths
  2019-07-17 10:32 ` Viresh Kumar
@ 2019-07-17 20:34   ` Saravana Kannan
  2019-07-18  5:37     ` Viresh Kumar
  0 siblings, 1 reply; 48+ messages in thread
From: Saravana Kannan @ 2019-07-17 20:34 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Vincent Guittot,
	Sweeney, Sean, daidavid1, Rajendra Nayak, Sibi Sankar,
	Bjorn Andersson, Evan Green, Android Kernel Team, Linux PM,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML

On Wed, Jul 17, 2019 at 3:32 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 02-07-19, 18:10, Saravana Kannan wrote:
> > Interconnects and interconnect paths quantify their performance levels in
> > terms of bandwidth and not in terms of frequency. So similar to how we have
> > frequency based OPP tables in DT and in the OPP framework, we need
> > bandwidth OPP table support in the OPP framework and in DT. Since there can
> > be more than one interconnect path used by a device, we also need a way to
> > assign a bandwidth OPP table to an interconnect path.
> >
> > This patch series:
> > - Adds opp-peak-KBps and opp-avg-KBps properties to OPP DT bindings
> > - Adds interconnect-opp-table property to interconnect DT bindings
> > - Adds OPP helper functions for bandwidth OPP tables
> > - Adds icc_get_opp_table() to get the OPP table for an interconnect path
> >
> > So with the DT bindings added in this patch series, the DT for a GPU
> > that does bandwidth voting from GPU to Cache and GPU to DDR would look
> > something like this:
> >
> > gpu_cache_opp_table: gpu_cache_opp_table {
> >       compatible = "operating-points-v2";
> >
> >       gpu_cache_3000: opp-3000 {
> >               opp-peak-KBps = <3000>;
> >               opp-avg-KBps = <1000>;
> >       };
> >       gpu_cache_6000: opp-6000 {
> >               opp-peak-KBps = <6000>;
> >               opp-avg-KBps = <2000>;
> >       };
> >       gpu_cache_9000: opp-9000 {
> >               opp-peak-KBps = <9000>;
> >               opp-avg-KBps = <9000>;
> >       };
> > };
> >
> > gpu_ddr_opp_table: gpu_ddr_opp_table {
> >       compatible = "operating-points-v2";
> >
> >       gpu_ddr_1525: opp-1525 {
> >               opp-peak-KBps = <1525>;
> >               opp-avg-KBps = <452>;
> >       };
> >       gpu_ddr_3051: opp-3051 {
> >               opp-peak-KBps = <3051>;
> >               opp-avg-KBps = <915>;
> >       };
> >       gpu_ddr_7500: opp-7500 {
> >               opp-peak-KBps = <7500>;
> >               opp-avg-KBps = <3000>;
> >       };
> > };
>
> Who is going to use the above tables and how ?

In this example the GPU driver would use these. It'll go through these
and then decide what peak and average bw to pick based on whatever
criteria.

> These are the maximum
> BW available over these paths, right ?

I wouldn't call them "maximum" because there can't be multiple
maximums :) But yes, these are the meaningful bandwidth from the GPU's
perspective to use over these paths.

>
> > gpu_opp_table: gpu_opp_table {
> >       compatible = "operating-points-v2";
> >       opp-shared;
> >
> >       opp-200000000 {
> >               opp-hz = /bits/ 64 <200000000>;
> >       };
> >       opp-400000000 {
> >               opp-hz = /bits/ 64 <400000000>;
> >       };
> > };
>
> Shouldn't this link back to the above tables via required-opp, etc ?
> How will we know how much BW is required by the GPU device for all the
> paths ?

If that's what the GPU driver wants to do, then yes. But the GPU
driver could also choose to scale the bandwidth for these paths based
on multiple other signals. Eg: bus port busy percentage, measure
bandwidth, etc.

Or they might even decide to pick the BW OPP that matches the index of
their GPU OPP. It's up to them how they actually do the heuristic.

-Saravana

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/6] dt-bindings: opp: Introduce opp-peak-KBps and opp-avg-KBps bindings
  2019-07-17 20:29     ` Saravana Kannan
@ 2019-07-18  4:35       ` Viresh Kumar
  2019-07-18 17:26         ` Saravana Kannan
  0 siblings, 1 reply; 48+ messages in thread
From: Viresh Kumar @ 2019-07-18  4:35 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Vincent Guittot,
	Sweeney, Sean, daidavid1, Rajendra Nayak, Sibi Sankar,
	Bjorn Andersson, Evan Green, Android Kernel Team, Linux PM,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML

On 17-07-19, 13:29, Saravana Kannan wrote:
> On Wed, Jul 17, 2019 at 12:54 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >
> > On 02-07-19, 18:10, Saravana Kannan wrote:
> > > Interconnects often quantify their performance points in terms of
> > > bandwidth. So, add opp-peak-KBps (required) and opp-avg-KBps (optional) to
> > > allow specifying Bandwidth OPP tables in DT.
> > >
> > > opp-peak-KBps is a required property that replace opp-hz for Bandwidth OPP
> > > tables.
> > >
> > > opp-avg-KBps is an optional property that can be used in Bandwidth OPP
> > > tables.
> > >
> > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > ---
> > >  Documentation/devicetree/bindings/opp/opp.txt | 15 ++++++++++++---
> > >  1 file changed, 12 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/Documentation/devicetree/bindings/opp/opp.txt b/Documentation/devicetree/bindings/opp/opp.txt
> > > index 76b6c79604a5..c869e87caa2a 100644
> > > --- a/Documentation/devicetree/bindings/opp/opp.txt
> > > +++ b/Documentation/devicetree/bindings/opp/opp.txt
> > > @@ -83,9 +83,14 @@ properties.
> > >
> > >  Required properties:
> > >  - opp-hz: Frequency in Hz, expressed as a 64-bit big-endian integer. This is a
> > > -  required property for all device nodes but devices like power domains. The
> > > -  power domain nodes must have another (implementation dependent) property which
> > > -  uniquely identifies the OPP nodes.
> > > +  required property for all device nodes but for devices like power domains or
> > > +  bandwidth opp tables. The power domain nodes must have another (implementation
> > > +  dependent) property which uniquely identifies the OPP nodes. The interconnect
> > > +  opps are required to have the opp-peak-bw property.
> >
> >                                    ??
> 
> Sorry, what's the question? Was this an accidental email?

Too much smartness is too bad sometimes, sorry about that :)

I placed the ?? right below "opp-peak-bw", there is no property like
that. You failed to update it :)

-- 
viresh

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 0/6] Introduce Bandwidth OPPs for interconnect paths
  2019-07-17 20:34   ` Saravana Kannan
@ 2019-07-18  5:37     ` Viresh Kumar
  2019-07-19  4:12       ` Saravana Kannan
  0 siblings, 1 reply; 48+ messages in thread
From: Viresh Kumar @ 2019-07-18  5:37 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Vincent Guittot,
	Sweeney, Sean, daidavid1, Rajendra Nayak, Sibi Sankar,
	Bjorn Andersson, Evan Green, Android Kernel Team, Linux PM,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML

I know you have explained lots of things earlier as well, but they are
available over multiple threads and I don't know where to reply now :)

Lets have proper discussion (once again) here and be done with it.
Sorry for the trouble of explaining things again.

On 17-07-19, 13:34, Saravana Kannan wrote:
> On Wed, Jul 17, 2019 at 3:32 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > On 02-07-19, 18:10, Saravana Kannan wrote:
> > > gpu_cache_opp_table: gpu_cache_opp_table {
> > >       compatible = "operating-points-v2";
> > >
> > >       gpu_cache_3000: opp-3000 {
> > >               opp-peak-KBps = <3000>;
> > >               opp-avg-KBps = <1000>;
> > >       };
> > >       gpu_cache_6000: opp-6000 {
> > >               opp-peak-KBps = <6000>;
> > >               opp-avg-KBps = <2000>;
> > >       };
> > >       gpu_cache_9000: opp-9000 {
> > >               opp-peak-KBps = <9000>;
> > >               opp-avg-KBps = <9000>;
> > >       };
> > > };
> > >
> > > gpu_ddr_opp_table: gpu_ddr_opp_table {
> > >       compatible = "operating-points-v2";
> > >
> > >       gpu_ddr_1525: opp-1525 {
> > >               opp-peak-KBps = <1525>;
> > >               opp-avg-KBps = <452>;
> > >       };
> > >       gpu_ddr_3051: opp-3051 {
> > >               opp-peak-KBps = <3051>;
> > >               opp-avg-KBps = <915>;
> > >       };
> > >       gpu_ddr_7500: opp-7500 {
> > >               opp-peak-KBps = <7500>;
> > >               opp-avg-KBps = <3000>;
> > >       };
> > > };
> >
> > Who is going to use the above tables and how ?
> 
> In this example the GPU driver would use these. It'll go through these
> and then decide what peak and average bw to pick based on whatever
> criteria.

Are you saying that the GPU driver will decide which bandwidth to
choose while running at a particular frequency (say 2 GHz) ? And that
it can choose 1525 or 3051 or 7500 from the ddr path ?

Will it be possible to publicly share how we derive to these decisions
?

The thing is I don't like these separate OPP tables which will not be
used by anyone else, but just GPU (or a single device). I would like
to put this data in the GPU OPP table only. What about putting a
range in the GPU OPP table for the Bandwidth if it can change so much
for the same frequency.

> > These are the maximum
> > BW available over these paths, right ?
> 
> I wouldn't call them "maximum" because there can't be multiple
> maximums :) But yes, these are the meaningful bandwidth from the GPU's
> perspective to use over these paths.
> 
> >
> > > gpu_opp_table: gpu_opp_table {
> > >       compatible = "operating-points-v2";
> > >       opp-shared;
> > >
> > >       opp-200000000 {
> > >               opp-hz = /bits/ 64 <200000000>;
> > >       };
> > >       opp-400000000 {
> > >               opp-hz = /bits/ 64 <400000000>;
> > >       };
> > > };
> >
> > Shouldn't this link back to the above tables via required-opp, etc ?
> > How will we know how much BW is required by the GPU device for all the
> > paths ?
> 
> If that's what the GPU driver wants to do, then yes. But the GPU
> driver could also choose to scale the bandwidth for these paths based
> on multiple other signals. Eg: bus port busy percentage, measure
> bandwidth, etc.

Lets say that the GPU is running at 2 GHz right now and based on above
inputs it wants to increase the bandwidth to 7500 for ddr path, now
does it make sense to run at 4 GHz instead of 2 so we utilize the
bandwidth to the best of our ability and waste less power ?

If something like that is acceptable, then what about keeping the
bandwidth fixed for frequencies and rather scale the frequency of the
GPU on the inputs your provided (like bus port busy percentage, etc).

The current proposal makes me wonder on why should we try to reuse OPP
tables for providing these bandwidth values as the OPP tables for
interconnect paths isn't really a lot of data, only bandwidth all the
time and there is no linking from the device's OPP table as well.

-- 
viresh

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/6] dt-bindings: opp: Introduce opp-peak-KBps and opp-avg-KBps bindings
  2019-07-18  4:35       ` Viresh Kumar
@ 2019-07-18 17:26         ` Saravana Kannan
  0 siblings, 0 replies; 48+ messages in thread
From: Saravana Kannan @ 2019-07-18 17:26 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Vincent Guittot,
	Sweeney, Sean, daidavid1, Rajendra Nayak, Sibi Sankar,
	Bjorn Andersson, Evan Green, Android Kernel Team, Linux PM,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML

On Wed, Jul 17, 2019 at 9:36 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 17-07-19, 13:29, Saravana Kannan wrote:
> > On Wed, Jul 17, 2019 at 12:54 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > >
> > > On 02-07-19, 18:10, Saravana Kannan wrote:
> > > > Interconnects often quantify their performance points in terms of
> > > > bandwidth. So, add opp-peak-KBps (required) and opp-avg-KBps (optional) to
> > > > allow specifying Bandwidth OPP tables in DT.
> > > >
> > > > opp-peak-KBps is a required property that replace opp-hz for Bandwidth OPP
> > > > tables.
> > > >
> > > > opp-avg-KBps is an optional property that can be used in Bandwidth OPP
> > > > tables.
> > > >
> > > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > > ---
> > > >  Documentation/devicetree/bindings/opp/opp.txt | 15 ++++++++++++---
> > > >  1 file changed, 12 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/Documentation/devicetree/bindings/opp/opp.txt b/Documentation/devicetree/bindings/opp/opp.txt
> > > > index 76b6c79604a5..c869e87caa2a 100644
> > > > --- a/Documentation/devicetree/bindings/opp/opp.txt
> > > > +++ b/Documentation/devicetree/bindings/opp/opp.txt
> > > > @@ -83,9 +83,14 @@ properties.
> > > >
> > > >  Required properties:
> > > >  - opp-hz: Frequency in Hz, expressed as a 64-bit big-endian integer. This is a
> > > > -  required property for all device nodes but devices like power domains. The
> > > > -  power domain nodes must have another (implementation dependent) property which
> > > > -  uniquely identifies the OPP nodes.
> > > > +  required property for all device nodes but for devices like power domains or
> > > > +  bandwidth opp tables. The power domain nodes must have another (implementation
> > > > +  dependent) property which uniquely identifies the OPP nodes. The interconnect
> > > > +  opps are required to have the opp-peak-bw property.
> > >
> > >                                    ??
> >
> > Sorry, what's the question? Was this an accidental email?
>
> Too much smartness is too bad sometimes, sorry about that :)
>
> I placed the ?? right below "opp-peak-bw", there is no property like
> that. You failed to update it :)

Ah, "typo". I'll fix it.

-Saravana

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 0/6] Introduce Bandwidth OPPs for interconnect paths
  2019-07-18  5:37     ` Viresh Kumar
@ 2019-07-19  4:12       ` Saravana Kannan
  2019-07-29  9:24         ` Viresh Kumar
  0 siblings, 1 reply; 48+ messages in thread
From: Saravana Kannan @ 2019-07-19  4:12 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Vincent Guittot,
	Sweeney, Sean, daidavid1, Rajendra Nayak, Sibi Sankar,
	Bjorn Andersson, Evan Green, Android Kernel Team, Linux PM,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML

On Wed, Jul 17, 2019 at 10:37 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> I know you have explained lots of things earlier as well, but they are
> available over multiple threads and I don't know where to reply now :)
>
> Lets have proper discussion (once again) here and be done with it.
> Sorry for the trouble of explaining things again.
>
> On 17-07-19, 13:34, Saravana Kannan wrote:
> > On Wed, Jul 17, 2019 at 3:32 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > > On 02-07-19, 18:10, Saravana Kannan wrote:
> > > > gpu_cache_opp_table: gpu_cache_opp_table {
> > > >       compatible = "operating-points-v2";
> > > >
> > > >       gpu_cache_3000: opp-3000 {
> > > >               opp-peak-KBps = <3000>;
> > > >               opp-avg-KBps = <1000>;
> > > >       };
> > > >       gpu_cache_6000: opp-6000 {
> > > >               opp-peak-KBps = <6000>;
> > > >               opp-avg-KBps = <2000>;
> > > >       };
> > > >       gpu_cache_9000: opp-9000 {
> > > >               opp-peak-KBps = <9000>;
> > > >               opp-avg-KBps = <9000>;
> > > >       };
> > > > };
> > > >
> > > > gpu_ddr_opp_table: gpu_ddr_opp_table {
> > > >       compatible = "operating-points-v2";
> > > >
> > > >       gpu_ddr_1525: opp-1525 {
> > > >               opp-peak-KBps = <1525>;
> > > >               opp-avg-KBps = <452>;
> > > >       };
> > > >       gpu_ddr_3051: opp-3051 {
> > > >               opp-peak-KBps = <3051>;
> > > >               opp-avg-KBps = <915>;
> > > >       };
> > > >       gpu_ddr_7500: opp-7500 {
> > > >               opp-peak-KBps = <7500>;
> > > >               opp-avg-KBps = <3000>;
> > > >       };
> > > > };
> > >
> > > Who is going to use the above tables and how ?
> >
> > In this example the GPU driver would use these. It'll go through these
> > and then decide what peak and average bw to pick based on whatever
> > criteria.
>
> Are you saying that the GPU driver will decide which bandwidth to
> choose while running at a particular frequency (say 2 GHz) ? And that
> it can choose 1525 or 3051 or 7500 from the ddr path ?
>
> Will it be possible to publicly share how we derive to these decisions
> ?

GPU is just an example. So I can't really speak for how a random GPU
driver might decide the bandwidth to pick.

But one obvious way is to start at the lowest bandwidth and check the
bus port busy%. If it's > 80% busy, it'll pick the next bandwidth,
etc. So, something like what cpufreq ondemand or conservative governor
used to do.

> The thing is I don't like these separate OPP tables which will not be
> used by anyone else, but just GPU (or a single device).

The BW OPP table isn't always a secondary OPP table. It can be a
primary OPP table too. For example, if you have a bandwidth monitoring
device/HW IP that can measure for a path and make requests for that
path, it'll have a BW OPP table and it'll pick from one of those BW
OPP levels based on the hardware measurements. It will have it's own
device driver. This is basically no different from a device being the
only user of a freq OPP table.

> I would like
> to put this data in the GPU OPP table only. What about putting a
> range in the GPU OPP table for the Bandwidth if it can change so much
> for the same frequency.

I don't think the range is going to work. If a GPU is doing purely
computational work, it's not unreasonable for it to vote for the
lowest bandwidth for any GPU frequency.

>
> > > These are the maximum
> > > BW available over these paths, right ?
> >
> > I wouldn't call them "maximum" because there can't be multiple
> > maximums :) But yes, these are the meaningful bandwidth from the GPU's
> > perspective to use over these paths.
> >
> > >
> > > > gpu_opp_table: gpu_opp_table {
> > > >       compatible = "operating-points-v2";
> > > >       opp-shared;
> > > >
> > > >       opp-200000000 {
> > > >               opp-hz = /bits/ 64 <200000000>;
> > > >       };
> > > >       opp-400000000 {
> > > >               opp-hz = /bits/ 64 <400000000>;
> > > >       };
> > > > };
> > >
> > > Shouldn't this link back to the above tables via required-opp, etc ?
> > > How will we know how much BW is required by the GPU device for all the
> > > paths ?
> >
> > If that's what the GPU driver wants to do, then yes. But the GPU
> > driver could also choose to scale the bandwidth for these paths based
> > on multiple other signals. Eg: bus port busy percentage, measure
> > bandwidth, etc.
>
> Lets say that the GPU is running at 2 GHz right now and based on above
> inputs it wants to increase the bandwidth to 7500 for ddr path, now
> does it make sense to run at 4 GHz instead of 2 so we utilize the
> bandwidth to the best of our ability and waste less power ?

This is kinda hard to explain, but I'll try.

Firstly, the GPU power increase might be so high that you might not
want to do this anyway.

Also, what you are proposing *might* improve the perf/mW (efficiency)
but it doesn't decrease the actual power consumption. So, this doesn't
really work towards saving power for mobile devices.

Also, if the GPU is generating a lot of traffic to DDR and you
increase the GPU frequency, it's only going to generate even more
traffic. So you'll end up in a positive feedback loop that maxes out
the frequency and bandwidth. Definitely not something you want for a
mobile device.

> If something like that is acceptable, then what about keeping the
> bandwidth fixed for frequencies and rather scale the frequency of the
> GPU on the inputs your provided (like bus port busy percentage, etc).

I don't think it's acceptable.

> The current proposal makes me wonder on why should we try to reuse OPP
> tables for providing these bandwidth values as the OPP tables for
> interconnect paths isn't really a lot of data, only bandwidth all the
> time and there is no linking from the device's OPP table as well.

I think everyone is getting too tied up on mapping device frequency to
bandwidth requests. That's useful for a limited set of cases. But it
doesn't work for a lot of use cases.

Couple of benefits of using BW OPPs instead of with listing bandwidth
values as part of frequency OPP tables:
- Works better when the interconnect path has more useful levels that
the device frequency levels. I think this might even be true on the
SDM845 for GPU and DDR. The link from freq OPP to BW OPP could list
the minimum bandwidth level to use for a particular device freq and
then let the hardware monitoring heuristic take it higher from there.
- Works even if no freq to bandwidth mapping heuristic is used but the
device needs to skip certain bandwidth levels based on the platform's
power/perf reasons.
- More scalable as more properties are added to BW OPP levels. Traffic
priority is one natural extension of the BW OPP "rows". Explicit
latency is another possibility.
- Currently devices that have use case specific bandwidth levels
(that's not computed at runtime) have no way of capturing their use
case level bandwidth needs in DT. Everyone is inventing their own
scheme. Having BW OPP table would allow them capture all the use case
specific bandwidth levels in DT and then pick one using the
index/phandle/etc. We could even allow naming OPP rows and pick it
that way. Not saying this is a main reason for BW OPP tables or we
should do this, but this is a possibility to consider.

Long story short, BW OPP tables make a lot of sense for anyone who has
actually done bandwidth scaling on a commercial platform.

If people are getting too tied up about the interconnect-opp-table we
can just drop that. I just added that to avoid having any implicit
ordering of tables in the operation-points-v2 property vs
interconnects property and call it out more explicitly. But it's not a
hill worth dying on.

-Saravana

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/6] dt-bindings: opp: Introduce opp-peak-KBps and opp-avg-KBps bindings
  2019-07-16 18:58     ` Saravana Kannan
@ 2019-07-22 23:35       ` Rob Herring
  2019-07-22 23:40         ` Saravana Kannan
  0 siblings, 1 reply; 48+ messages in thread
From: Rob Herring @ 2019-07-22 23:35 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Sibi Sankar, Georgi Djakov, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Vincent Guittot,
	Sweeney, Sean, daidavid1, Rajendra Nayak, Bjorn Andersson,
	Evan Green, Android Kernel Team, Linux PM,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML

On Tue, Jul 16, 2019 at 11:58:08AM -0700, Saravana Kannan wrote:
> On Tue, Jul 16, 2019 at 10:25 AM Sibi Sankar <sibis@codeaurora.org> wrote:
> >
> > Hey Saravana,
> >
> > https://patchwork.kernel.org/patch/10850815/
> > There was already a discussion ^^ on how bandwidth bindings were to be
> > named.
> 
> Yes, I'm aware of that series. That series is trying to define a BW
> mapping for an existing frequency OPP table. This patch is NOT about
> adding a mapping to an existing table. This patch is about adding the
> notion of BW OPP tables where BW is the "key" instead of "frequency".
> 
> So let's not mixed up these two series.

Maybe different reasons, but in the end we'd end up with 2 bandwidth 
properties. We need to sort out how they'd overlap/coexist.

The same comment in that series about defining a standard unit suffix 
also applies to this one.

Rob

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 5/6] dt-bindings: interconnect: Add interconnect-opp-table property
  2019-07-03  1:10 ` [PATCH v3 5/6] dt-bindings: interconnect: Add interconnect-opp-table property Saravana Kannan
@ 2019-07-22 23:39   ` Rob Herring
  2019-07-22 23:43     ` Saravana Kannan
  2019-07-23  2:21     ` Viresh Kumar
  0 siblings, 2 replies; 48+ messages in thread
From: Rob Herring @ 2019-07-22 23:39 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Georgi Djakov, Mark Rutland, Viresh Kumar, Nishanth Menon,
	Stephen Boyd, Rafael J. Wysocki, vincent.guittot, seansw,
	daidavid1, Rajendra Nayak, sibis, bjorn.andersson, evgreen,
	kernel-team, linux-pm, devicetree, linux-kernel

On Tue, Jul 02, 2019 at 06:10:19PM -0700, Saravana Kannan wrote:
> Add support for listing bandwidth OPP tables for each interconnect path
> listed using the interconnects property.
> 
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  .../devicetree/bindings/interconnect/interconnect.txt     | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/interconnect/interconnect.txt b/Documentation/devicetree/bindings/interconnect/interconnect.txt
> index 6f5d23a605b7..fc5b75b76a2c 100644
> --- a/Documentation/devicetree/bindings/interconnect/interconnect.txt
> +++ b/Documentation/devicetree/bindings/interconnect/interconnect.txt
> @@ -55,10 +55,18 @@ interconnect-names : List of interconnect path name strings sorted in the same
>  			 * dma-mem: Path from the device to the main memory of
>  			            the system
>  
> +interconnect-opp-table: List of phandles to OPP tables (bandwidth OPP tables)
> +			that specify the OPPs for the interconnect paths listed
> +			in the interconnects property. This property can only
> +			point to OPP tables that belong to the device and are
> +			listed in the device's operating-points-v2 property.
> +

IMO, there's no need for this property. Which OPP is which should be 
defined already as part of the device's binding. That's enough for the 
driver to know which OPP applies to the interconnect.

>  Example:
>  
>  	sdhci@7864000 {
> +		operating-points-v2 = <&sdhc_opp_table>, <&sdhc_mem_opp_table>;
>  		...
>  		interconnects = <&pnoc MASTER_SDCC_1 &bimc SLAVE_EBI_CH0>;
>  		interconnect-names = "sdhc-mem";
> +		interconnect-opp-table = <&sdhc_mem_opp_table>;
>  	};
> -- 
> 2.22.0.410.gd8fdbe21b5-goog
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/6] dt-bindings: opp: Introduce opp-peak-KBps and opp-avg-KBps bindings
  2019-07-22 23:35       ` Rob Herring
@ 2019-07-22 23:40         ` Saravana Kannan
  2019-07-23 14:26           ` Rob Herring
  0 siblings, 1 reply; 48+ messages in thread
From: Saravana Kannan @ 2019-07-22 23:40 UTC (permalink / raw)
  To: Rob Herring
  Cc: Sibi Sankar, Georgi Djakov, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Vincent Guittot,
	Sweeney, Sean, daidavid1, Rajendra Nayak, Bjorn Andersson,
	Evan Green, Android Kernel Team, Linux PM,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML

On Mon, Jul 22, 2019 at 4:35 PM Rob Herring <robh@kernel.org> wrote:
>
> On Tue, Jul 16, 2019 at 11:58:08AM -0700, Saravana Kannan wrote:
> > On Tue, Jul 16, 2019 at 10:25 AM Sibi Sankar <sibis@codeaurora.org> wrote:
> > >
> > > Hey Saravana,
> > >
> > > https://patchwork.kernel.org/patch/10850815/
> > > There was already a discussion ^^ on how bandwidth bindings were to be
> > > named.
> >
> > Yes, I'm aware of that series. That series is trying to define a BW
> > mapping for an existing frequency OPP table. This patch is NOT about
> > adding a mapping to an existing table. This patch is about adding the
> > notion of BW OPP tables where BW is the "key" instead of "frequency".
> >
> > So let's not mixed up these two series.
>
> Maybe different reasons, but in the end we'd end up with 2 bandwidth
> properties. We need to sort out how they'd overlap/coexist.

Oh, I totally agree! My point is that the other mapping isn't the
right approach because it doesn't handle a whole swath of use cases.
The one I'm proposing can act as a super set of the other (as in, can
handle that use case too).

> The same comment in that series about defining a standard unit suffix
> also applies to this one.

I thought I read that whole series and I don't remember reading about
the unit suffix. But I'll take a closer look. I've chosen to keep the
DT units at least as "high of a resolution" as what the APIs accept
today. The APIs take KB/s. So I make sure DT can capture KB/s
differences. If we all agree that KB/s is "too accurate" then I think
we should change everything to MB/s.

-Saravana

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 5/6] dt-bindings: interconnect: Add interconnect-opp-table property
  2019-07-22 23:39   ` Rob Herring
@ 2019-07-22 23:43     ` Saravana Kannan
  2019-07-23  2:21     ` Viresh Kumar
  1 sibling, 0 replies; 48+ messages in thread
From: Saravana Kannan @ 2019-07-22 23:43 UTC (permalink / raw)
  To: Rob Herring
  Cc: Georgi Djakov, Mark Rutland, Viresh Kumar, Nishanth Menon,
	Stephen Boyd, Rafael J. Wysocki, Vincent Guittot, Sweeney, Sean,
	daidavid1, Rajendra Nayak, Sibi Sankar, Bjorn Andersson,
	Evan Green, Android Kernel Team, Linux PM,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML

On Mon, Jul 22, 2019 at 4:39 PM Rob Herring <robh@kernel.org> wrote:
>
> On Tue, Jul 02, 2019 at 06:10:19PM -0700, Saravana Kannan wrote:
> > Add support for listing bandwidth OPP tables for each interconnect path
> > listed using the interconnects property.
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> >  .../devicetree/bindings/interconnect/interconnect.txt     | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/Documentation/devicetree/bindings/interconnect/interconnect.txt b/Documentation/devicetree/bindings/interconnect/interconnect.txt
> > index 6f5d23a605b7..fc5b75b76a2c 100644
> > --- a/Documentation/devicetree/bindings/interconnect/interconnect.txt
> > +++ b/Documentation/devicetree/bindings/interconnect/interconnect.txt
> > @@ -55,10 +55,18 @@ interconnect-names : List of interconnect path name strings sorted in the same
> >                        * dma-mem: Path from the device to the main memory of
> >                                   the system
> >
> > +interconnect-opp-table: List of phandles to OPP tables (bandwidth OPP tables)
> > +                     that specify the OPPs for the interconnect paths listed
> > +                     in the interconnects property. This property can only
> > +                     point to OPP tables that belong to the device and are
> > +                     listed in the device's operating-points-v2 property.
> > +
>
> IMO, there's no need for this property. Which OPP is which should be
> defined already as part of the device's binding. That's enough for the
> driver to know which OPP applies to the interconnect.

Sure, I don't mind dropping this.

-Saravana

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 5/6] dt-bindings: interconnect: Add interconnect-opp-table property
  2019-07-22 23:39   ` Rob Herring
  2019-07-22 23:43     ` Saravana Kannan
@ 2019-07-23  2:21     ` Viresh Kumar
  1 sibling, 0 replies; 48+ messages in thread
From: Viresh Kumar @ 2019-07-23  2:21 UTC (permalink / raw)
  To: Rob Herring
  Cc: Saravana Kannan, Georgi Djakov, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, vincent.guittot,
	seansw, daidavid1, Rajendra Nayak, sibis, bjorn.andersson,
	evgreen, kernel-team, linux-pm, devicetree, linux-kernel

On 22-07-19, 17:39, Rob Herring wrote:
> On Tue, Jul 02, 2019 at 06:10:19PM -0700, Saravana Kannan wrote:
> > Add support for listing bandwidth OPP tables for each interconnect path
> > listed using the interconnects property.
> > 
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> >  .../devicetree/bindings/interconnect/interconnect.txt     | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/Documentation/devicetree/bindings/interconnect/interconnect.txt b/Documentation/devicetree/bindings/interconnect/interconnect.txt
> > index 6f5d23a605b7..fc5b75b76a2c 100644
> > --- a/Documentation/devicetree/bindings/interconnect/interconnect.txt
> > +++ b/Documentation/devicetree/bindings/interconnect/interconnect.txt
> > @@ -55,10 +55,18 @@ interconnect-names : List of interconnect path name strings sorted in the same
> >  			 * dma-mem: Path from the device to the main memory of
> >  			            the system
> >  
> > +interconnect-opp-table: List of phandles to OPP tables (bandwidth OPP tables)
> > +			that specify the OPPs for the interconnect paths listed
> > +			in the interconnects property. This property can only
> > +			point to OPP tables that belong to the device and are
> > +			listed in the device's operating-points-v2 property.
> > +
> 
> IMO, there's no need for this property. Which OPP is which should be 
> defined already as part of the device's binding. That's enough for the 
> driver to know which OPP applies to the interconnect.

And if there is confusion we can actually use the compatible property
to have another string which highlights that it is an interconnect OPP
?

-- 
viresh

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/6] dt-bindings: opp: Introduce opp-peak-KBps and opp-avg-KBps bindings
  2019-07-22 23:40         ` Saravana Kannan
@ 2019-07-23 14:26           ` Rob Herring
  2019-07-24  0:18             ` Saravana Kannan
  0 siblings, 1 reply; 48+ messages in thread
From: Rob Herring @ 2019-07-23 14:26 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Sibi Sankar, Georgi Djakov, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Vincent Guittot,
	Sweeney, Sean, daidavid1, Rajendra Nayak, Bjorn Andersson,
	Evan Green, Android Kernel Team, Linux PM,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML

On Mon, Jul 22, 2019 at 5:41 PM Saravana Kannan <saravanak@google.com> wrote:
>
> On Mon, Jul 22, 2019 at 4:35 PM Rob Herring <robh@kernel.org> wrote:
> >
> > On Tue, Jul 16, 2019 at 11:58:08AM -0700, Saravana Kannan wrote:
> > > On Tue, Jul 16, 2019 at 10:25 AM Sibi Sankar <sibis@codeaurora.org> wrote:
> > > >
> > > > Hey Saravana,
> > > >
> > > > https://patchwork.kernel.org/patch/10850815/
> > > > There was already a discussion ^^ on how bandwidth bindings were to be
> > > > named.
> > >
> > > Yes, I'm aware of that series. That series is trying to define a BW
> > > mapping for an existing frequency OPP table. This patch is NOT about
> > > adding a mapping to an existing table. This patch is about adding the
> > > notion of BW OPP tables where BW is the "key" instead of "frequency".
> > >
> > > So let's not mixed up these two series.
> >
> > Maybe different reasons, but in the end we'd end up with 2 bandwidth
> > properties. We need to sort out how they'd overlap/coexist.
>
> Oh, I totally agree! My point is that the other mapping isn't the
> right approach because it doesn't handle a whole swath of use cases.
> The one I'm proposing can act as a super set of the other (as in, can
> handle that use case too).
>
> > The same comment in that series about defining a standard unit suffix
> > also applies to this one.
>
> I thought I read that whole series and I don't remember reading about
> the unit suffix. But I'll take a closer look. I've chosen to keep the
> DT units at least as "high of a resolution" as what the APIs accept
> today. The APIs take KB/s. So I make sure DT can capture KB/s
> differences. If we all agree that KB/s is "too accurate" then I think
> we should change everything to MB/s.

Either one is fine with me, but trying to align to what the OS picked
doesn't work. What does BSD use for example? More important is
aligning across DT properties so we don't have folks picking whatever
random unit they like. We generally try to go with the smallest units
that will have enough (32-bit) range for everyone, so that's probably
KB/s here.

Rob

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/6] dt-bindings: opp: Introduce opp-peak-KBps and opp-avg-KBps bindings
  2019-07-23 14:26           ` Rob Herring
@ 2019-07-24  0:18             ` Saravana Kannan
  0 siblings, 0 replies; 48+ messages in thread
From: Saravana Kannan @ 2019-07-24  0:18 UTC (permalink / raw)
  To: Rob Herring
  Cc: Sibi Sankar, Georgi Djakov, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Vincent Guittot,
	Sweeney, Sean, daidavid1, Rajendra Nayak, Bjorn Andersson,
	Evan Green, Android Kernel Team, Linux PM,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML

On Tue, Jul 23, 2019 at 7:27 AM Rob Herring <robh@kernel.org> wrote:
>
> On Mon, Jul 22, 2019 at 5:41 PM Saravana Kannan <saravanak@google.com> wrote:
> >
> > On Mon, Jul 22, 2019 at 4:35 PM Rob Herring <robh@kernel.org> wrote:
> > >
> > > On Tue, Jul 16, 2019 at 11:58:08AM -0700, Saravana Kannan wrote:
> > > > On Tue, Jul 16, 2019 at 10:25 AM Sibi Sankar <sibis@codeaurora.org> wrote:
> > > > >
> > > > > Hey Saravana,
> > > > >
> > > > > https://patchwork.kernel.org/patch/10850815/
> > > > > There was already a discussion ^^ on how bandwidth bindings were to be
> > > > > named.
> > > >
> > > > Yes, I'm aware of that series. That series is trying to define a BW
> > > > mapping for an existing frequency OPP table. This patch is NOT about
> > > > adding a mapping to an existing table. This patch is about adding the
> > > > notion of BW OPP tables where BW is the "key" instead of "frequency".
> > > >
> > > > So let's not mixed up these two series.
> > >
> > > Maybe different reasons, but in the end we'd end up with 2 bandwidth
> > > properties. We need to sort out how they'd overlap/coexist.
> >
> > Oh, I totally agree! My point is that the other mapping isn't the
> > right approach because it doesn't handle a whole swath of use cases.
> > The one I'm proposing can act as a super set of the other (as in, can
> > handle that use case too).
> >
> > > The same comment in that series about defining a standard unit suffix
> > > also applies to this one.
> >
> > I thought I read that whole series and I don't remember reading about
> > the unit suffix. But I'll take a closer look. I've chosen to keep the
> > DT units at least as "high of a resolution" as what the APIs accept
> > today. The APIs take KB/s. So I make sure DT can capture KB/s
> > differences. If we all agree that KB/s is "too accurate" then I think
> > we should change everything to MB/s.
>
> Either one is fine with me, but trying to align to what the OS picked
> doesn't work. What does BSD use for example? More important is
> aligning across DT properties so we don't have folks picking whatever
> random unit they like. We generally try to go with the smallest units
> that will have enough (32-bit) range for everyone, so that's probably
> KB/s here.

Yeah, that makes sense.

-Saravana

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 6/6] interconnect: Add OPP table support for interconnects
  2019-07-16  0:55                 ` Saravana Kannan
@ 2019-07-24  7:16                   ` Vincent Guittot
  0 siblings, 0 replies; 48+ messages in thread
From: Vincent Guittot @ 2019-07-24  7:16 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Sweeney, Sean,
	daidavid1, Rajendra Nayak, sibis, Bjorn Andersson, Evan Green,
	Android Kernel Team, open list:THERMAL,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	linux-kernel

On Tue, 16 Jul 2019 at 02:56, Saravana Kannan <saravanak@google.com> wrote:
>
> On Mon, Jul 15, 2019 at 1:16 AM Vincent Guittot
> <vincent.guittot@linaro.org> wrote:
> >
> > On Tue, 9 Jul 2019 at 21:03, Saravana Kannan <saravanak@google.com> wrote:
> > >
> > > On Tue, Jul 9, 2019 at 12:25 AM Vincent Guittot
> > > <vincent.guittot@linaro.org> wrote:
> > > >
> > > > On Sun, 7 Jul 2019 at 23:48, Saravana Kannan <saravanak@google.com> wrote:
> > > > >
> > > > > On Thu, Jul 4, 2019 at 12:12 AM Vincent Guittot
> > > > > <vincent.guittot@linaro.org> wrote:
> > > > > >
> > > > > > On Wed, 3 Jul 2019 at 23:33, Saravana Kannan <saravanak@google.com> wrote:
> > > > > > >
> > > > > > > On Tue, Jul 2, 2019 at 11:45 PM Vincent Guittot
> > > > > > > <vincent.guittot@linaro.org> wrote:
> > > > > > > >
> > > > > > > > On Wed, 3 Jul 2019 at 03:10, Saravana Kannan <saravanak@google.com> wrote:
> > > > > > > > >
> > > > > > > > > Interconnect paths can have different performance points. Now that OPP
> > > > > > > > > framework supports bandwidth OPP tables, add OPP table support for
> > > > > > > > > interconnects.
> > > > > > > > >
> > > > > > > > > Devices can use the interconnect-opp-table DT property to specify OPP
> > > > > > > > > tables for interconnect paths. And the driver can obtain the OPP table for
> > > > > > > > > an interconnect path by calling icc_get_opp_table().
> > > > > > > >
> > > > > > > > The opp table of a path must come from the aggregation of OPP tables
> > > > > > > > of the interconnect providers.
> > > > > > >
> > > > > > > The aggregation of OPP tables of the providers is certainly the
> > > > > > > superset of what a path can achieve, but to say that OPPs for
> > > > > > > interconnect path should match that superset is an oversimplification
> > > > > > > of the reality in hardware.
> > > > > > >
> > > > > > > There are lots of reasons an interconnect path might not want to use
> > > > > > > all the available bandwidth options across all the interconnects in
> > > > > > > the route.
> > > > > > >
> > > > > > > 1. That particular path might not have been validated or verified
> > > > > > >    during the HW design process for some of the frequencies/bandwidth
> > > > > > >    combinations of the providers.
> > > > > >
> > > > > > All these constraint are provider's constraints and not consumer's one
> > > > > >
> > > > > > The consumer asks for a bandwidth according to its needs and then the
> > > > > > providers select the optimal bandwidth of each interconnect after
> > > > > > aggregating all the request and according to what OPP have been
> > > > > > validated
> > > > >
> > > > > Not really. The screening can be a consumer specific issue. The
> > > > > consumer IP itself might have some issue with using too low of a
> > > > > bandwidth or bandwidth that's not within some range. It should not be
> > > >
> > > > How can an IP ask for not enough bandwidth ?
> > > > It asks the needed bandwidth based on its requirements
> > >
> > > The "enough bandwidth" is not always obvious. It's only for very
> > > simple cases that you can calculate the required bandwidth. Even for
> > > cases that you think might be "obvious/easy" aren't always easy.
> > >
> > > For example, you'd think a display IP would have a fixed bandwidth
> > > requirement for a fixed resolution screen. But that's far from the
> > > truth. It can also change as the number of layers change per frame.
> > > For video decoder/encoder, it depends on how well the frames compress
> > > with a specific compression scheme.
> > > So the "required" bandwidth is often a heuristic based on the IP
> > > frequency or traffic measurement.
> > >
> > > But that's not even the point I was making in this specific "bullet".
> > >
> > > A hardware IP might be screen/verified with only certain bandwidth
> > > levels. Or it might have hardware bugs that prevent it from using
> > > lower bandwidths even though it's technically sufficient. We need a
> > > way to capture that per path. This is not even a fictional case. This
> > > has been true multiple times over widely used IPs.
> >
> > here you are mixing HW constraint on the soc and OPP screening with
> > bandwidth request from consumer
> > ICC framework is about getting bandwidth request not trying to fix
> > some HW/voltage dependency of the SoC
> >
> > >
> > > > > the provider's job to take into account all the IP that might be
> > > > > connected to the interconnects. If the interconnect HW itself didn't
> > > >
> > > > That's not what I'm saying. The provider knows which bandwidth the
> > > > interconnect can provide as it is the ones which configures it. So if
> > > > the interconnect has a finite number of bandwidth point based probably
> > > > on the possible clock frequency and others config of the interconnect,
> > > > it selects the best final config after aggregating the request of the
> > > > consumer.
> > >
> > > I completely agree with this. What you are stating above is how it
> > > should work and that's the whole point of the interconnect framework.
> > >
> > > But this is orthogonal to the point I'm making.
> >
> > It's not orthogonal because you want to add a OPP table pointer in the
> > ICC path structure to fix your platform HW constraint whereas it's not
> > the purpose of the framework IMO
> >
> > >
> > > > > change, the provider driver shouldn't need to change. By your
> > > > > definition, a provider driver will have to account for all the
> > > > > possible bus masters that might be connected to it across all SoCs.
> > > >
> > > > you didn't catch my point
> > >
> > > Same. I think we are talking over each other. Let me try again.
> > >
> > > You are trying to describe how and interconnect provider and framework
> > > should work. There's no disagreement there.
> > >
> > > My point is that consumers might not want to or can not always use all
> > > the available bandwidth levels offered by the providers. There can be
> > > many reasons for that (which is what I listed in my earlier emails)
> > > and we need a good and generic way to capture that so that everyone
> > > isn't trying to invent their own property.
> >
> > And my point is that you want to describe some platform or even UCs
> > specific constraint in the ICC framework which is not the place to do.
> >
> > If the consumers might not want to use all available bandwidth because
> > this is not power efficient as an example, this should be describe
> > somewhere else to express  that there is a shared power domain
> > between some devices and we shoudl ensure that all devices in this
> > power domain should use the  Optimal Operating Point (optimal freq for
> > a voltage)
>
> My patch series has nothing to do with shared power domains. I think
> the examples have made it amply clear.

It's far from being clear why a consumer doesn't want to use some
bandwidth level TBH

Do you have a real example ?

>
> > ICC framework describes the bandwidth request that are expressed by
> > the consumers for the current running state of their IP but it doesn't
> > reflect the fact that on platform A, the consumer should use bandwidth
> > X because it will select a voltage level of a shared power domain that
> > is optimized for the other devices B, C ... . It's up to the provider
> > to know HW details of the bus that it drives and to make such
> > decision;  the consumer should always request the same
>
> The change to ICC framework is practically just this. I don't have any
> future changes planned for the ICC framework. This is the entirety of
> it.
>
> +       opp_node = of_parse_phandle(np, "interconnect-opp-table", idx);
> +       if (opp_node) {
> +               path->opp_table = dev_pm_opp_of_find_table_from_node(opp_node);
> +               of_node_put(opp_node);
> +       }
>
> It's quite a stretch and bit hyperbolic to say this one change is
> getting ICC framework to do all the things you claim above.
>

So I clearly don't see the benefit of adding this opp_table field in
icc_path struct because I'm still convinced that the consumer doesn't
have to get a bandwidth table like that.
If the consumer already get a bandwidth value and it should in order
to call icc_set or even in order to select one element in your table,
then it should directly set it with icc_set and let the provider
aggregate and choose the real one

> It's literally a simple helper function so that the consumer doesn't
> have to make assumptions about indices and it's a bit more explicit
> about which OPP table of the device (a device can have multiple OPP
> tables) corresponds to which ICC path.
>
> Going by your extreme argument, one can also claim that it's not the
> ICC framework's job to make it easy for consumers to figure out the
> source/destination endpoints or give them names and delete the
> interconnect and interconnect-names properties. That's clearly just as
> absurd a claim.
>
>
> -Saravana
>
> > > > > That's not good design nor is it scalable.
> > > > >
> > > > > > >
> > > > > > > 2. Similarly during parts screening in the factory, some of the
> > > > > > >    combinations might not have been screened and can't be guaranteed
> > > > > > >    to work.
> > > > > >
> > > > > > As above, it's the provider's job to select the final bandwidth
> > > > > > according to its constraint
> > > > >
> > > > > Same reply as above.
> > > > >
> > > > > > >
> > > > > > > 3. Only a certain set of bandwidth levels might make sense to use from
> > > > > > >    a power/performance balance given the device using it. For example:
> > > > > > >    - The big CPU might not want to use some of the lower bandwidths
> > > > > > >      but the little CPU might want to.
> > > > > > >    - The big CPU might not want to use some intermediate bandwidth
> > > > > > >      points if they don't save a lot of power compared to a higher
> > > > > > >      bandwidth levels, but the little CPU might want to.
> > > > > > >    - The little CPU might never want to use the higher set of
> > > > > > >      bandwidth levels since they won't be power efficient for the use
> > > > > > >      cases that might run on it.
> > > > > >
> > > > > > These example are quite vague about the reasons why little might never
> > > > > > want to use higher bandwidth.
> > > > >
> > > > > How is it vague? I just said because of power/performance balance.
> > > > >
> > > > > > But then, if little doesn't ask high bandwidth it will not use them.
> > > > >
> > > > > If you are running a heuristics based algorithm to pick bandwidth,
> > > > > this is how it'll know NOT to use some of the bandwidth levels.
> > > >
> > > > so you want to set a bandwidth according to the cpu frequency which is
> > > > what has been proposed in other thread
> > >
> > > Nope, that's just one heuristic. Often times it's based on hardware
> > > monitors measuring interconnect activity. If you go look at the SDM845
> > > in a Pixel 3, almost nothing is directly tied to the CPU frequency.
> > >
> > > Even if you are scaling bandwidth based on other hardware
> > > measurements, you might want to avoid some bandwidth level provided by
> > > the interconnect providers because it's suboptimal.
> > >
> > > For example, when making bandwidth votes to accommodate the big CPUs,
> > > you might never want to use some of the lower bandwidth levels because
> > > they are not power efficient for any CPU frequency or any bandwidth
> > > level. Because at those levels the memory/interconnect is so slow that
> > > it has a non-trivial utilization increase (because the CPU is
> > > stalling) of the big CPUs.
> > >
> > > Again, this is completely different from what the providers/icc
> > > framework does. Which is, once the request is made, they aggregate and
> > > set the actual interconnect frequencies correctly.
> > >
> > > > >
> > > > > > >
> > > > > > > 4. It might not make sense from a system level power perspective.
> > > > > > > Let's take an example of a path S (source) -> A -> B -> C -> D
> > > > > > > (destination).
> > > > > > >    - A supports only 2, 5, 7 and 10 GB/s. B supports 1, 2 ... 10 GB/s.
> > > > > > >      C supports 5 and 10 GB/s
> > > > > > >    - If you combine and list the superset of bandwidth levels
> > > > > > >      supported in that path, that'd be 1, 2, 3, ... 10 GB/s.
> > > > > > >    - Which set of bandwidth levels make sense will depend on the
> > > > > > >      hardware characteristics of the interconnects.
> > > > > > >    - If B is the biggest power sink, then you might want to use all 10
> > > > > > >      levels.
> > > > > > >    - If A is the biggest power sink, then you might want to use all 2,
> > > > > > >      5 and 10 GB/s of the levels.
> > > > > > >    - If C is the biggest power sink then you might only want to use 5
> > > > > > >      and 10 GB/s
> > > > > > >    - The more hops and paths you get the more convoluted this gets.
> > > > > > >
> > > > > > > 5. The design of the interconnects themselves might have an impact on
> > > > > > > which bandwidth levels are used.
> > > > > > >    - For example, the FIFO depth between two specific interconnects
> > > > > > >      might affect the valid bandwidth levels for a specific path.
> > > > > > >    - Say S1 -> A -> B -> D1, S2 -> C -> B -> D1 and S2 -> C -> D2 are
> > > > > > >      three paths.
> > > > > > >    - If C <-> B FIFO depth is small, then there might be a requirement
> > > > > > >      that C and B be closely performance matched to avoid system level
> > > > > > >      congestion due to back pressure.
> > > > > > >    - So S2 -> D1 path can't use all the bandwidth levels supported by
> > > > > > >      C-B combination.
> > > > > > >    - But S2 -> D2 can use all the bandwidth levels supported by C.
> > > > > > >    - And S1 -> D1 can use all the levels supported by A-B combination.
> > > > > > >
> > > > > >
> > > > > > All the examples above makes sense but have to be handle by the
> > > > > > provider not the consumer. The consumer asks for a bandwidth according
> > > > > > to its constraints. Then the provider which is the driver that manages
> > > > > > the interconnect IP, should manage all this hardware and platform
> > > > > > specific stuff related to the interconnect IP in order to set the
> > > > > > optimal bandwidth that fit both consumer constraint and platform
> > > > > > specific configuration.
> > > > >
> > > > > Sure, but the provider itself can have interconnect properties to
> > > > > indicate which other interconnects it's tied to. And the provider will
> > > > > still need the interconnect-opp-table to denote which bandwidth levels
> > > > > are sensible to use with each of its connections.
> > >
> > > You seem to have missed this comment.
> > >
> > > Thanks,
> > > Saravana
> > >
> > > > > So in some instances the interconnect-opp-table covers the needs of
> > > > > purely consumers and in some instances purely providers. But in either
> > > > > case, it's still needed to describe the hardware properly.
> > > > >
> > > > > -Saravana
> > > > >
> > > > > > > These are just some of the reasons I could recollect in a few minutes.
> > > > > > > These are all real world cases I had to deal with in the past several
> > > > > > > years of dealing with scaling interconnects. I'm sure vendors and SoCs
> > > > > > > I'm not familiar with have other good reasons I'm not aware of.
> > > > > > >
> > > > > > > Trying to figure this all out by aggregating OPP tables of
> > > > > > > interconnect providers just isn't feasible nor is it efficient. The
> > > > > > > OPP tables for an interconnect path is describing the valid BW levels
> > > > > > > supported by that path and verified in hardware and makes a lot of
> > > > > > > sense to capture it clearly in DT.
> > > > > > >
> > > > > > > > So such kind of OPP table should be at
> > > > > > > > provider level but not at path level.
> > > > > > >
> > > > > > > They can also use it if they want to, but they'll probably want to use
> > > > > > > a frequency OPP table.
> > > > > > >
> > > > > > >
> > > > > > > -Saravana
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > > > > > > > ---
> > > > > > > > >  drivers/interconnect/core.c  | 27 ++++++++++++++++++++++++++-
> > > > > > > > >  include/linux/interconnect.h |  7 +++++++
> > > > > > > > >  2 files changed, 33 insertions(+), 1 deletion(-)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
> > > > > > > > > index 871eb4bc4efc..881bac80bc1e 100644
> > > > > > > > > --- a/drivers/interconnect/core.c
> > > > > > > > > +++ b/drivers/interconnect/core.c
> > > > > > > > > @@ -47,6 +47,7 @@ struct icc_req {
> > > > > > > > >   */
> > > > > > > > >  struct icc_path {
> > > > > > > > >         size_t num_nodes;
> > > > > > > > > +       struct opp_table *opp_table;
> > > > > > > > >         struct icc_req reqs[];
> > > > > > > > >  };
> > > > > > > > >
> > > > > > > > > @@ -313,7 +314,7 @@ struct icc_path *of_icc_get(struct device *dev, const char *name)
> > > > > > > > >  {
> > > > > > > > >         struct icc_path *path = ERR_PTR(-EPROBE_DEFER);
> > > > > > > > >         struct icc_node *src_node, *dst_node;
> > > > > > > > > -       struct device_node *np = NULL;
> > > > > > > > > +       struct device_node *np = NULL, *opp_node;
> > > > > > > > >         struct of_phandle_args src_args, dst_args;
> > > > > > > > >         int idx = 0;
> > > > > > > > >         int ret;
> > > > > > > > > @@ -381,10 +382,34 @@ struct icc_path *of_icc_get(struct device *dev, const char *name)
> > > > > > > > >                 dev_err(dev, "%s: invalid path=%ld\n", __func__, PTR_ERR(path));
> > > > > > > > >         mutex_unlock(&icc_lock);
> > > > > > > > >
> > > > > > > > > +       opp_node = of_parse_phandle(np, "interconnect-opp-table", idx);
> > > > > > > > > +       if (opp_node) {
> > > > > > > > > +               path->opp_table = dev_pm_opp_of_find_table_from_node(opp_node);
> > > > > > > > > +               of_node_put(opp_node);
> > > > > > > > > +       }
> > > > > > > > > +
> > > > > > > > > +
> > > > > > > > >         return path;
> > > > > > > > >  }
> > > > > > > > >  EXPORT_SYMBOL_GPL(of_icc_get);
> > > > > > > > >
> > > > > > > > > +/**
> > > > > > > > > + * icc_get_opp_table() - Get the OPP table that corresponds to a path
> > > > > > > > > + * @path: reference to the path returned by icc_get()
> > > > > > > > > + *
> > > > > > > > > + * This function will return the OPP table that corresponds to a path handle.
> > > > > > > > > + * If the interconnect API is disabled, NULL is returned and the consumer
> > > > > > > > > + * drivers will still build. Drivers are free to handle this specifically, but
> > > > > > > > > + * they don't have to.
> > > > > > > > > + *
> > > > > > > > > + * Return: opp_table pointer on success. NULL is returned when the API is
> > > > > > > > > + * disabled or the OPP table is missing.
> > > > > > > > > + */
> > > > > > > > > +struct opp_table *icc_get_opp_table(struct icc_path *path)
> > > > > > > > > +{
> > > > > > > > > +       return path->opp_table;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > >  /**
> > > > > > > > >   * icc_set_bw() - set bandwidth constraints on an interconnect path
> > > > > > > > >   * @path: reference to the path returned by icc_get()
> > > > > > > > > diff --git a/include/linux/interconnect.h b/include/linux/interconnect.h
> > > > > > > > > index dc25864755ba..0c0bc55f0e89 100644
> > > > > > > > > --- a/include/linux/interconnect.h
> > > > > > > > > +++ b/include/linux/interconnect.h
> > > > > > > > > @@ -9,6 +9,7 @@
> > > > > > > > >
> > > > > > > > >  #include <linux/mutex.h>
> > > > > > > > >  #include <linux/types.h>
> > > > > > > > > +#include <linux/pm_opp.h>
> > > > > > > > >
> > > > > > > > >  /* macros for converting to icc units */
> > > > > > > > >  #define Bps_to_icc(x)  ((x) / 1000)
> > > > > > > > > @@ -28,6 +29,7 @@ struct device;
> > > > > > > > >  struct icc_path *icc_get(struct device *dev, const int src_id,
> > > > > > > > >                          const int dst_id);
> > > > > > > > >  struct icc_path *of_icc_get(struct device *dev, const char *name);
> > > > > > > > > +struct opp_table *icc_get_opp_table(struct icc_path *path);
> > > > > > > > >  void icc_put(struct icc_path *path);
> > > > > > > > >  int icc_set_bw(struct icc_path *path, u32 avg_bw, u32 peak_bw);
> > > > > > > > >
> > > > > > > > > @@ -49,6 +51,11 @@ static inline void icc_put(struct icc_path *path)
> > > > > > > > >  {
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > +static inline struct opp_table *icc_get_opp_table(struct icc_path *path)
> > > > > > > > > +{
> > > > > > > > > +       return NULL;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > >  static inline int icc_set_bw(struct icc_path *path, u32 avg_bw, u32 peak_bw)
> > > > > > > > >  {
> > > > > > > > >         return 0;
> > > > > > > > > --
> > > > > > > > > 2.22.0.410.gd8fdbe21b5-goog
> > > > > > > > >
> > > > > >
> > > > > > --
> > > > > > To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
> > > > > >

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/6] dt-bindings: opp: Introduce opp-peak-KBps and opp-avg-KBps bindings
  2019-07-03  1:10 ` [PATCH v3 1/6] dt-bindings: opp: Introduce opp-peak-KBps and opp-avg-KBps bindings Saravana Kannan
  2019-07-16 17:25   ` Sibi Sankar
  2019-07-17  7:54   ` Viresh Kumar
@ 2019-07-26 16:24   ` Georgi Djakov
  2019-07-26 19:08     ` Saravana Kannan
  2 siblings, 1 reply; 48+ messages in thread
From: Georgi Djakov @ 2019-07-26 16:24 UTC (permalink / raw)
  To: Saravana Kannan, Rob Herring, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki
  Cc: vincent.guittot, seansw, daidavid1, Rajendra Nayak, sibis,
	bjorn.andersson, evgreen, kernel-team, linux-pm, devicetree,
	linux-kernel

Hi Saravana,

On 7/3/19 04:10, Saravana Kannan wrote:
> Interconnects often quantify their performance points in terms of
> bandwidth. So, add opp-peak-KBps (required) and opp-avg-KBps (optional) to
> allow specifying Bandwidth OPP tables in DT.
> 
> opp-peak-KBps is a required property that replace opp-hz for Bandwidth OPP
> tables.
> 
> opp-avg-KBps is an optional property that can be used in Bandwidth OPP
> tables.
> 
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  Documentation/devicetree/bindings/opp/opp.txt | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/opp/opp.txt b/Documentation/devicetree/bindings/opp/opp.txt
> index 76b6c79604a5..c869e87caa2a 100644
> --- a/Documentation/devicetree/bindings/opp/opp.txt
> +++ b/Documentation/devicetree/bindings/opp/opp.txt
> @@ -83,9 +83,14 @@ properties.
>  
>  Required properties:
>  - opp-hz: Frequency in Hz, expressed as a 64-bit big-endian integer. This is a
> -  required property for all device nodes but devices like power domains. The
> -  power domain nodes must have another (implementation dependent) property which
> -  uniquely identifies the OPP nodes.
> +  required property for all device nodes but for devices like power domains or
> +  bandwidth opp tables. The power domain nodes must have another (implementation
> +  dependent) property which uniquely identifies the OPP nodes. The interconnect
> +  opps are required to have the opp-peak-bw property.
> +
> +- opp-peak-KBps: Peak bandwidth in kilobytes per second, expressed as a 32-bit

As Rob already mentioned, KBps should be documented. See [1].

Thanks,
Georgi

[1] https://lore.kernel.org/lkml/20190423132823.7915-2-georgi.djakov@linaro.org/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 6/6] interconnect: Add OPP table support for interconnects
  2019-07-03  1:10 ` [PATCH v3 6/6] interconnect: Add OPP table support for interconnects Saravana Kannan
  2019-07-03  6:45   ` Vincent Guittot
@ 2019-07-26 16:25   ` Georgi Djakov
  2019-07-26 19:08     ` Saravana Kannan
  1 sibling, 1 reply; 48+ messages in thread
From: Georgi Djakov @ 2019-07-26 16:25 UTC (permalink / raw)
  To: Saravana Kannan, Rob Herring, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki
  Cc: vincent.guittot, seansw, daidavid1, Rajendra Nayak, sibis,
	bjorn.andersson, evgreen, kernel-team, linux-pm, devicetree,
	linux-kernel

Hi Saravana,

On 7/3/19 04:10, Saravana Kannan wrote:
> Interconnect paths can have different performance points. Now that OPP
> framework supports bandwidth OPP tables, add OPP table support for
> interconnects.
> 
> Devices can use the interconnect-opp-table DT property to specify OPP
> tables for interconnect paths. And the driver can obtain the OPP table for
> an interconnect path by calling icc_get_opp_table().
> 
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  drivers/interconnect/core.c  | 27 ++++++++++++++++++++++++++-
>  include/linux/interconnect.h |  7 +++++++
>  2 files changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
> index 871eb4bc4efc..881bac80bc1e 100644
> --- a/drivers/interconnect/core.c
> +++ b/drivers/interconnect/core.c
> @@ -47,6 +47,7 @@ struct icc_req {
>   */
>  struct icc_path {
>  	size_t num_nodes;
> +	struct opp_table *opp_table;

I am a bit worried that these tables might be abused and size of the DT will
grow with many OPP tables of all existing paths.

>  	struct icc_req reqs[];
>  };
>  
> @@ -313,7 +314,7 @@ struct icc_path *of_icc_get(struct device *dev, const char *name)
>  {
>  	struct icc_path *path = ERR_PTR(-EPROBE_DEFER);
>  	struct icc_node *src_node, *dst_node;
> -	struct device_node *np = NULL;
> +	struct device_node *np = NULL, *opp_node;
>  	struct of_phandle_args src_args, dst_args;
>  	int idx = 0;
>  	int ret;
> @@ -381,10 +382,34 @@ struct icc_path *of_icc_get(struct device *dev, const char *name)
>  		dev_err(dev, "%s: invalid path=%ld\n", __func__, PTR_ERR(path));
>  	mutex_unlock(&icc_lock);
>  
> +	opp_node = of_parse_phandle(np, "interconnect-opp-table", idx);

Can't we figure out if the device OPP table contains bandwidth even without this
property?

Thanks,
Georgi

> +	if (opp_node) {
> +		path->opp_table = dev_pm_opp_of_find_table_from_node(opp_node);
> +		of_node_put(opp_node);
> +	}
> +
> +
>  	return path;
>  }
>  EXPORT_SYMBOL_GPL(of_icc_get);
>  
> +/**
> + * icc_get_opp_table() - Get the OPP table that corresponds to a path
> + * @path: reference to the path returned by icc_get()
> + *
> + * This function will return the OPP table that corresponds to a path handle.
> + * If the interconnect API is disabled, NULL is returned and the consumer
> + * drivers will still build. Drivers are free to handle this specifically, but
> + * they don't have to.
> + *
> + * Return: opp_table pointer on success. NULL is returned when the API is
> + * disabled or the OPP table is missing.
> + */
> +struct opp_table *icc_get_opp_table(struct icc_path *path)
> +{
> +	return path->opp_table;
> +}
> +
>  /**
>   * icc_set_bw() - set bandwidth constraints on an interconnect path
>   * @path: reference to the path returned by icc_get()
> diff --git a/include/linux/interconnect.h b/include/linux/interconnect.h
> index dc25864755ba..0c0bc55f0e89 100644
> --- a/include/linux/interconnect.h
> +++ b/include/linux/interconnect.h
> @@ -9,6 +9,7 @@
>  
>  #include <linux/mutex.h>
>  #include <linux/types.h>
> +#include <linux/pm_opp.h>
>  
>  /* macros for converting to icc units */
>  #define Bps_to_icc(x)	((x) / 1000)
> @@ -28,6 +29,7 @@ struct device;
>  struct icc_path *icc_get(struct device *dev, const int src_id,
>  			 const int dst_id);
>  struct icc_path *of_icc_get(struct device *dev, const char *name);
> +struct opp_table *icc_get_opp_table(struct icc_path *path);
>  void icc_put(struct icc_path *path);
>  int icc_set_bw(struct icc_path *path, u32 avg_bw, u32 peak_bw);
>  
> @@ -49,6 +51,11 @@ static inline void icc_put(struct icc_path *path)
>  {
>  }
>  
> +static inline struct opp_table *icc_get_opp_table(struct icc_path *path)
> +{
> +	return NULL;
> +}
> +
>  static inline int icc_set_bw(struct icc_path *path, u32 avg_bw, u32 peak_bw)
>  {
>  	return 0;
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 6/6] interconnect: Add OPP table support for interconnects
  2019-07-26 16:25   ` Georgi Djakov
@ 2019-07-26 19:08     ` Saravana Kannan
  0 siblings, 0 replies; 48+ messages in thread
From: Saravana Kannan @ 2019-07-26 19:08 UTC (permalink / raw)
  To: Georgi Djakov
  Cc: Rob Herring, Mark Rutland, Viresh Kumar, Nishanth Menon,
	Stephen Boyd, Rafael J. Wysocki, Vincent Guittot, Sweeney, Sean,
	David Dai, Rajendra Nayak, Sibi Sankar, Bjorn Andersson,
	Evan Green, Android Kernel Team, Linux PM,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML

On Fri, Jul 26, 2019 at 9:25 AM Georgi Djakov <georgi.djakov@linaro.org> wrote:
>
> Hi Saravana,
>
> On 7/3/19 04:10, Saravana Kannan wrote:
> > Interconnect paths can have different performance points. Now that OPP
> > framework supports bandwidth OPP tables, add OPP table support for
> > interconnects.
> >
> > Devices can use the interconnect-opp-table DT property to specify OPP
> > tables for interconnect paths. And the driver can obtain the OPP table for
> > an interconnect path by calling icc_get_opp_table().
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> >  drivers/interconnect/core.c  | 27 ++++++++++++++++++++++++++-
> >  include/linux/interconnect.h |  7 +++++++
> >  2 files changed, 33 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
> > index 871eb4bc4efc..881bac80bc1e 100644
> > --- a/drivers/interconnect/core.c
> > +++ b/drivers/interconnect/core.c
> > @@ -47,6 +47,7 @@ struct icc_req {
> >   */
> >  struct icc_path {
> >       size_t num_nodes;
> > +     struct opp_table *opp_table;
>
> I am a bit worried that these tables might be abused and size of the DT will
> grow with many OPP tables of all existing paths.

A ton of stuff can be abused in downstream code. We can't do anything
about that.

We just need to keep an eye on OPP table abuse in upstream (whether it
frequency or bw OPP).

> >       struct icc_req reqs[];
> >  };
> >
> > @@ -313,7 +314,7 @@ struct icc_path *of_icc_get(struct device *dev, const char *name)
> >  {
> >       struct icc_path *path = ERR_PTR(-EPROBE_DEFER);
> >       struct icc_node *src_node, *dst_node;
> > -     struct device_node *np = NULL;
> > +     struct device_node *np = NULL, *opp_node;
> >       struct of_phandle_args src_args, dst_args;
> >       int idx = 0;
> >       int ret;
> > @@ -381,10 +382,34 @@ struct icc_path *of_icc_get(struct device *dev, const char *name)
> >               dev_err(dev, "%s: invalid path=%ld\n", __func__, PTR_ERR(path));
> >       mutex_unlock(&icc_lock);
> >
> > +     opp_node = of_parse_phandle(np, "interconnect-opp-table", idx);
>
> Can't we figure out if the device OPP table contains bandwidth even without this
> property?
>

Rob pointed out that the property isn't necessary because the device
binding should document which OPP table is used for what. That takes
care of my main concern of how do we know which OPP table is for what
path. So I'm dropping this patch.

-Saravana

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/6] dt-bindings: opp: Introduce opp-peak-KBps and opp-avg-KBps bindings
  2019-07-26 16:24   ` Georgi Djakov
@ 2019-07-26 19:08     ` Saravana Kannan
  0 siblings, 0 replies; 48+ messages in thread
From: Saravana Kannan @ 2019-07-26 19:08 UTC (permalink / raw)
  To: Georgi Djakov
  Cc: Rob Herring, Mark Rutland, Viresh Kumar, Nishanth Menon,
	Stephen Boyd, Rafael J. Wysocki, Vincent Guittot, Sweeney, Sean,
	David Dai, Rajendra Nayak, Sibi Sankar, Bjorn Andersson,
	Evan Green, Android Kernel Team, Linux PM,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML

On Fri, Jul 26, 2019 at 9:24 AM Georgi Djakov <georgi.djakov@linaro.org> wrote:
>
> Hi Saravana,
>
> On 7/3/19 04:10, Saravana Kannan wrote:
> > Interconnects often quantify their performance points in terms of
> > bandwidth. So, add opp-peak-KBps (required) and opp-avg-KBps (optional) to
> > allow specifying Bandwidth OPP tables in DT.
> >
> > opp-peak-KBps is a required property that replace opp-hz for Bandwidth OPP
> > tables.
> >
> > opp-avg-KBps is an optional property that can be used in Bandwidth OPP
> > tables.
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> >  Documentation/devicetree/bindings/opp/opp.txt | 15 ++++++++++++---
> >  1 file changed, 12 insertions(+), 3 deletions(-)
> >
> > diff --git a/Documentation/devicetree/bindings/opp/opp.txt b/Documentation/devicetree/bindings/opp/opp.txt
> > index 76b6c79604a5..c869e87caa2a 100644
> > --- a/Documentation/devicetree/bindings/opp/opp.txt
> > +++ b/Documentation/devicetree/bindings/opp/opp.txt
> > @@ -83,9 +83,14 @@ properties.
> >
> >  Required properties:
> >  - opp-hz: Frequency in Hz, expressed as a 64-bit big-endian integer. This is a
> > -  required property for all device nodes but devices like power domains. The
> > -  power domain nodes must have another (implementation dependent) property which
> > -  uniquely identifies the OPP nodes.
> > +  required property for all device nodes but for devices like power domains or
> > +  bandwidth opp tables. The power domain nodes must have another (implementation
> > +  dependent) property which uniquely identifies the OPP nodes. The interconnect
> > +  opps are required to have the opp-peak-bw property.
> > +
> > +- opp-peak-KBps: Peak bandwidth in kilobytes per second, expressed as a 32-bit
>
> As Rob already mentioned, KBps should be documented. See [1].
>

Will do. Thanks for the pointer.

-Saravana

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 0/6] Introduce Bandwidth OPPs for interconnect paths
  2019-07-19  4:12       ` Saravana Kannan
@ 2019-07-29  9:24         ` Viresh Kumar
  2019-07-29 20:12           ` Saravana Kannan
  0 siblings, 1 reply; 48+ messages in thread
From: Viresh Kumar @ 2019-07-29  9:24 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Vincent Guittot,
	Sweeney, Sean, daidavid1, Rajendra Nayak, Sibi Sankar,
	Bjorn Andersson, Evan Green, Android Kernel Team, Linux PM,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML

On 18-07-19, 21:12, Saravana Kannan wrote:
> On Wed, Jul 17, 2019 at 10:37 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > I would like
> > to put this data in the GPU OPP table only. What about putting a
> > range in the GPU OPP table for the Bandwidth if it can change so much
> > for the same frequency.
> 
> I don't think the range is going to work.

Any specific reason for that ?

> If a GPU is doing purely
> computational work, it's not unreasonable for it to vote for the
> lowest bandwidth for any GPU frequency.

I think that is fine, but if the GPU is able to find how much
bandwidth it needs why can't it just pass that value without needing
to have another OPP table for the path ?

The interconnect can then take all the requests and have its own OPP
table to get help on deciding what stable bandwidth to use.

-- 
viresh

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 0/6] Introduce Bandwidth OPPs for interconnect paths
  2019-07-29  9:24         ` Viresh Kumar
@ 2019-07-29 20:12           ` Saravana Kannan
  2019-07-30  3:01             ` Viresh Kumar
  0 siblings, 1 reply; 48+ messages in thread
From: Saravana Kannan @ 2019-07-29 20:12 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Vincent Guittot,
	Sweeney, Sean, David Dai, Rajendra Nayak, Sibi Sankar,
	Bjorn Andersson, Evan Green, Android Kernel Team, Linux PM,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML

On Mon, Jul 29, 2019 at 2:24 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 18-07-19, 21:12, Saravana Kannan wrote:
> > On Wed, Jul 17, 2019 at 10:37 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > > I would like
> > > to put this data in the GPU OPP table only. What about putting a
> > > range in the GPU OPP table for the Bandwidth if it can change so much
> > > for the same frequency.
> >
> > I don't think the range is going to work.
>
> Any specific reason for that ?

The next sentence was literally explaining this :) Fine to debate
that, but ignoring that and asking this question is kinda funny.

> > If a GPU is doing purely
> > computational work, it's not unreasonable for it to vote for the
> > lowest bandwidth for any GPU frequency.
>
> I think that is fine, but if the GPU is able to find how much
> bandwidth it needs why can't it just pass that value without needing
> to have another OPP table for the path ?

You were asking this question in the context of "can the GPU OPP just
list all the range of bandwidth it might use per GPU frequency". My point
is that the range would be useless because it would the entire
available bandwidth range (because purely compute work might not need
any bandwidth).

Whereas, what the GPU's algorithm actually needs might be the list of
"useful" bandwidth levels to use.

Also, as we add more ICC request properties, this range idea will not scale.

-Saravana

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 0/6] Introduce Bandwidth OPPs for interconnect paths
  2019-07-29 20:12           ` Saravana Kannan
@ 2019-07-30  3:01             ` Viresh Kumar
  2019-08-03  1:20               ` Saravana Kannan
  2019-08-03  7:36               ` Saravana Kannan
  0 siblings, 2 replies; 48+ messages in thread
From: Viresh Kumar @ 2019-07-30  3:01 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Vincent Guittot,
	Sweeney, Sean, David Dai, Rajendra Nayak, Sibi Sankar,
	Bjorn Andersson, Evan Green, Android Kernel Team, Linux PM,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML

On 29-07-19, 13:12, Saravana Kannan wrote:
> On Mon, Jul 29, 2019 at 2:24 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >
> > On 18-07-19, 21:12, Saravana Kannan wrote:
> > > On Wed, Jul 17, 2019 at 10:37 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > > > I would like
> > > > to put this data in the GPU OPP table only. What about putting a
> > > > range in the GPU OPP table for the Bandwidth if it can change so much
> > > > for the same frequency.
> > >
> > > I don't think the range is going to work.
> >
> > Any specific reason for that ?
> 
> The next sentence was literally explaining this :) Fine to debate
> that, but ignoring that and asking this question is kinda funny.

Okay, but ...
 
> > > If a GPU is doing purely
> > > computational work, it's not unreasonable for it to vote for the
> > > lowest bandwidth for any GPU frequency.

... it wasn't clear to me even after reading this sentence again now
:)

I understand that you may have to vote for the lowest bandwidth but
that doesn't explain why a range can't work (sorry if it was just me
who doesn't understood it :)).

> > I think that is fine, but if the GPU is able to find how much
> > bandwidth it needs why can't it just pass that value without needing
> > to have another OPP table for the path ?
> 
> You were asking this question in the context of "can the GPU OPP just
> list all the range of bandwidth it might use per GPU frequency". My point
> is that the range would be useless because it would the entire
> available bandwidth range (because purely compute work might not need
> any bandwidth).

If it is useless to have entire range here, then why bother providing
one ? Why can't the GPU request what it needs in exact terms, based on
its calculations ? And then based on these requests, let the
interconnect find what's the best/stable values it really wants to
program the path for (and for that the interconnect can use its own
OPP table, which would be fine).

> Whereas, what the GPU's algorithm actually needs might be the list of
> "useful" bandwidth levels to use.

Hmm, I am not sure GPU's algorithm needs this table AFAIU based on all
the conversations we had until now. It is very capable of finding how
much bandwidth it needs, you just want the GPU driver to finally align
that with a stable bandwidth for the platform later on. And what I am
asking is that it is not required for the end driver to look for
stable values, it just requests what it wants and let the interconnect
core/driver decide the stable values.

Very much like the clock framework, most of the user drivers just ask
for a clk value to be programmed and it is the clock driver which
keeps a table of the stable values and then aligns the requested value
to one of those.

> Also, as we add more ICC request properties, this range idea will not scale.

I am not sure about what kind of properties there can be and where
should they land. My feedback is completely based on what you have
presented in this patchset.

-- 
viresh

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 2/6] OPP: Add support for bandwidth OPP tables
  2019-07-03  1:10 ` [PATCH v3 2/6] OPP: Add support for bandwidth OPP tables Saravana Kannan
  2019-07-16 17:33   ` Sibi Sankar
@ 2019-07-30 10:57   ` Amit Kucheria
  2019-07-30 23:20     ` Saravana Kannan
  1 sibling, 1 reply; 48+ messages in thread
From: Amit Kucheria @ 2019-07-30 10:57 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Vincent Guittot,
	seansw, daidavid1, Rajendra Nayak, Sibi Sankar, Bjorn Andersson,
	evgreen, kernel-team, Linux PM list,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML

On Wed, Jul 3, 2019 at 6:40 AM Saravana Kannan <saravanak@google.com> wrote:
>
> Not all devices quantify their performance points in terms of frequency.
> Devices like interconnects quantify their performance points in terms of
> bandwidth. We need a way to represent these bandwidth levels in OPP. So,
> add support for parsing bandwidth OPPs from DT.
>
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  drivers/opp/of.c  | 34 ++++++++++++++++++++++++++++++++--
>  drivers/opp/opp.h |  4 +++-
>  2 files changed, 35 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/opp/of.c b/drivers/opp/of.c
> index c10c782d15aa..54fa70ed2adc 100644
> --- a/drivers/opp/of.c
> +++ b/drivers/opp/of.c
> @@ -552,6 +552,35 @@ void dev_pm_opp_of_remove_table(struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(dev_pm_opp_of_remove_table);
>
> +static int _read_opp_key(struct dev_pm_opp *new_opp, struct device_node *np)
> +{
> +       int ret;
> +       u64 rate;
> +       u32 bw;
> +
> +       ret = of_property_read_u64(np, "opp-hz", &rate);
> +       if (!ret) {
> +               /*
> +                * Rate is defined as an unsigned long in clk API, and so
> +                * casting explicitly to its type. Must be fixed once rate is 64
> +                * bit guaranteed in clk API.
> +                */
> +               new_opp->rate = (unsigned long)rate;
> +               return 0;
> +       }
> +
> +       ret = of_property_read_u32(np, "opp-peak-KBps", &bw);
> +       if (ret)
> +               return ret;
> +       new_opp->rate = (unsigned long) &bw;
> +
> +       ret = of_property_read_u32(np, "opp-avg-KBps", &bw);
> +       if (!ret)
> +               new_opp->avg_bw = (unsigned long) &bw;
> +
> +       return 0;
> +}
> +
>  /**
>   * _opp_add_static_v2() - Allocate static OPPs (As per 'v2' DT bindings)
>   * @opp_table: OPP table
> @@ -589,11 +618,12 @@ static struct dev_pm_opp *_opp_add_static_v2(struct opp_table *opp_table,
>         if (!new_opp)
>                 return ERR_PTR(-ENOMEM);
>
> -       ret = of_property_read_u64(np, "opp-hz", &rate);
> +       ret = _read_opp_key(new_opp, np);
>         if (ret < 0) {
>                 /* "opp-hz" is optional for devices like power domains. */
>                 if (!opp_table->is_genpd) {
> -                       dev_err(dev, "%s: opp-hz not found\n", __func__);
> +                       dev_err(dev, "%s: opp-hz or opp-peak-bw not found\n",
> +                               __func__);
>                         goto free_opp;
>                 }
>
> diff --git a/drivers/opp/opp.h b/drivers/opp/opp.h
> index 569b3525aa67..ead2cdafe957 100644
> --- a/drivers/opp/opp.h
> +++ b/drivers/opp/opp.h
> @@ -59,7 +59,8 @@ extern struct list_head opp_tables;
>   * @turbo:     true if turbo (boost) OPP
>   * @suspend:   true if suspend OPP
>   * @pstate: Device's power domain's performance state.
> - * @rate:      Frequency in hertz
> + * @rate:      Frequency in hertz OR Peak bandwidth in kilobytes per second

rate is most often used for clk rates. Let us not overload this just
to save one struct member. IMO, you should introduce a peak_bw member
and then have an error check if the DT provides both rate and peak_bw
during parsing.

> + * @avg_bw:    Average bandwidth in kilobytes per second
>   * @level:     Performance level
>   * @supplies:  Power supplies voltage/current values
>   * @clock_latency_ns: Latency (in nanoseconds) of switching to this OPP's
> @@ -81,6 +82,7 @@ struct dev_pm_opp {
>         bool suspend;
>         unsigned int pstate;
>         unsigned long rate;
> +       unsigned long avg_bw;
>         unsigned int level;
>
>         struct dev_pm_opp_supply *supplies;
> --
> 2.22.0.410.gd8fdbe21b5-goog
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 2/6] OPP: Add support for bandwidth OPP tables
  2019-07-30 10:57   ` Amit Kucheria
@ 2019-07-30 23:20     ` Saravana Kannan
  0 siblings, 0 replies; 48+ messages in thread
From: Saravana Kannan @ 2019-07-30 23:20 UTC (permalink / raw)
  To: Amit Kucheria
  Cc: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Vincent Guittot,
	Sweeney, Sean, David Dai, Rajendra Nayak, Sibi Sankar,
	Bjorn Andersson, Evan Green, Android Kernel Team, Linux PM list,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML

On Tue, Jul 30, 2019 at 3:57 AM Amit Kucheria <amit.kucheria@linaro.org> wrote:
>
> On Wed, Jul 3, 2019 at 6:40 AM Saravana Kannan <saravanak@google.com> wrote:
> >
> > Not all devices quantify their performance points in terms of frequency.
> > Devices like interconnects quantify their performance points in terms of
> > bandwidth. We need a way to represent these bandwidth levels in OPP. So,
> > add support for parsing bandwidth OPPs from DT.
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> >  drivers/opp/of.c  | 34 ++++++++++++++++++++++++++++++++--
> >  drivers/opp/opp.h |  4 +++-
> >  2 files changed, 35 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/opp/of.c b/drivers/opp/of.c
> > index c10c782d15aa..54fa70ed2adc 100644
> > --- a/drivers/opp/of.c
> > +++ b/drivers/opp/of.c
> > @@ -552,6 +552,35 @@ void dev_pm_opp_of_remove_table(struct device *dev)
> >  }
> >  EXPORT_SYMBOL_GPL(dev_pm_opp_of_remove_table);
> >
> > +static int _read_opp_key(struct dev_pm_opp *new_opp, struct device_node *np)
> > +{
> > +       int ret;
> > +       u64 rate;
> > +       u32 bw;
> > +
> > +       ret = of_property_read_u64(np, "opp-hz", &rate);
> > +       if (!ret) {
> > +               /*
> > +                * Rate is defined as an unsigned long in clk API, and so
> > +                * casting explicitly to its type. Must be fixed once rate is 64
> > +                * bit guaranteed in clk API.
> > +                */
> > +               new_opp->rate = (unsigned long)rate;
> > +               return 0;
> > +       }
> > +
> > +       ret = of_property_read_u32(np, "opp-peak-KBps", &bw);
> > +       if (ret)
> > +               return ret;
> > +       new_opp->rate = (unsigned long) &bw;
> > +
> > +       ret = of_property_read_u32(np, "opp-avg-KBps", &bw);
> > +       if (!ret)
> > +               new_opp->avg_bw = (unsigned long) &bw;
> > +
> > +       return 0;
> > +}
> > +
> >  /**
> >   * _opp_add_static_v2() - Allocate static OPPs (As per 'v2' DT bindings)
> >   * @opp_table: OPP table
> > @@ -589,11 +618,12 @@ static struct dev_pm_opp *_opp_add_static_v2(struct opp_table *opp_table,
> >         if (!new_opp)
> >                 return ERR_PTR(-ENOMEM);
> >
> > -       ret = of_property_read_u64(np, "opp-hz", &rate);
> > +       ret = _read_opp_key(new_opp, np);
> >         if (ret < 0) {
> >                 /* "opp-hz" is optional for devices like power domains. */
> >                 if (!opp_table->is_genpd) {
> > -                       dev_err(dev, "%s: opp-hz not found\n", __func__);
> > +                       dev_err(dev, "%s: opp-hz or opp-peak-bw not found\n",
> > +                               __func__);
> >                         goto free_opp;
> >                 }
> >
> > diff --git a/drivers/opp/opp.h b/drivers/opp/opp.h
> > index 569b3525aa67..ead2cdafe957 100644
> > --- a/drivers/opp/opp.h
> > +++ b/drivers/opp/opp.h
> > @@ -59,7 +59,8 @@ extern struct list_head opp_tables;
> >   * @turbo:     true if turbo (boost) OPP
> >   * @suspend:   true if suspend OPP
> >   * @pstate: Device's power domain's performance state.
> > - * @rate:      Frequency in hertz
> > + * @rate:      Frequency in hertz OR Peak bandwidth in kilobytes per second
>
> rate is most often used for clk rates. Let us not overload this just
> to save one struct member. IMO, you should introduce a peak_bw member
> and then have an error check if the DT provides both rate and peak_bw
> during parsing.

This is not about saving space. It avoids having to rewrite a lot of
helper functions. If you want, I can rename this to "key" but I'm not
going to create a different field.

-Saravana

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 0/6] Introduce Bandwidth OPPs for interconnect paths
  2019-07-30  3:01             ` Viresh Kumar
@ 2019-08-03  1:20               ` Saravana Kannan
  2019-08-03  7:36               ` Saravana Kannan
  1 sibling, 0 replies; 48+ messages in thread
From: Saravana Kannan @ 2019-08-03  1:20 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Vincent Guittot,
	Sweeney, Sean, David Dai, Rajendra Nayak, Sibi Sankar,
	Bjorn Andersson, Evan Green, Android Kernel Team, Linux PM,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML

[-- Attachment #1: Type: text/plain, Size: 5700 bytes --]

On Mon, Jul 29, 2019 at 8:02 PM Viresh Kumar <viresh.kumar@linaro.org>
wrote:

> On 29-07-19, 13:12, Saravana Kannan wrote:
> > On Mon, Jul 29, 2019 at 2:24 AM Viresh Kumar <viresh.kumar@linaro.org>
> wrote:
> > >
> > > On 18-07-19, 21:12, Saravana Kannan wrote:
> > > > On Wed, Jul 17, 2019 at 10:37 PM Viresh Kumar <
> viresh.kumar@linaro.org> wrote:
> > > > > I would like
> > > > > to put this data in the GPU OPP table only. What about putting a
> > > > > range in the GPU OPP table for the Bandwidth if it can change so
> much
> > > > > for the same frequency.
> > > >
> > > > I don't think the range is going to work.
> > >
> > > Any specific reason for that ?
> >
> > The next sentence was literally explaining this :) Fine to debate
> > that, but ignoring that and asking this question is kinda funny.
>
> Okay, but ...
>
> > > > If a GPU is doing purely
> > > > computational work, it's not unreasonable for it to vote for the
> > > > lowest bandwidth for any GPU frequency.
>
> ... it wasn't clear to me even after reading this sentence again now
> :)
>
> I understand that you may have to vote for the lowest bandwidth but
> that doesn't explain why a range can't work (sorry if it was just me
> who doesn't understood it :)).
>

Well, doesn't work as in, it doesn't give any additional info. I can just
vote for 0 or UINT_MAX if I want to stay at the lowest or high bandwidth.
Have the actual values of the lowest or highest point doesn't help for
cases where you need to skip intermediate bandwidth levels when going from
low to high (as the need increases).


>
> > > I think that is fine, but if the GPU is able to find how much
> > > bandwidth it needs why can't it just pass that value without needing
> > > to have another OPP table for the path ?
> >
> > You were asking this question in the context of "can the GPU OPP just
> > list all the range of bandwidth it might use per GPU frequency". My point
> > is that the range would be useless because it would the entire
> > available bandwidth range (because purely compute work might not need
> > any bandwidth).
>
> If it is useless to have entire range here, then why bother providing
> one ? Why can't the GPU request what it needs in exact terms, based on
> its calculations ? And then based on these requests, let the
> interconnect find what's the best/stable values it really wants to
> program the path for (and for that the interconnect can use its own
> OPP table, which would be fine).
>

Let's say there actual path can support 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10
GB/s.

Let's say 2, 3, and 4 need the same voltage level as 5 for this path. So,
for GPU's needs using 2, 3 and 4 GB/s might not be good because the power
savings from the frequency difference is not worth the performance and
power (if you run the interconnect slow, the GPU would run faster to
achieve the same performance) impact compared to running the interconnect
at 5 GB/s. Similarly it might skip 6 GB/s. So even if the GPU can somehow
calculate the exact bandwidth required (or say measure it), it'll need to
know to skip 2, 3 and 4 because they aren't power/perf efficient levels to
use.

But all these bandwidth levels might be useable for a smaller HW IP whose
power cost isn't high. So power savings running the interconnect at 3 GB/s
might be worth it -- because even if the small HW IP ran faster to achieve
the performance, the power increase in the HW IP won't be higher than the
power savings from running the interconnect slower.



> > Whereas, what the GPU's algorithm actually needs might be the list of
> > "useful" bandwidth levels to use.
>
> Hmm, I am not sure GPU's algorithm needs this table AFAIU based on all
> the conversations we had until now. It is very capable of finding how
> much bandwidth it needs,


Not really. If you have a monitor that can actually measure the bandwidth,
yes. Most often that's not the case. If you just have a monitor that can
give your the bus port busy% then it'll have to use this table to pick the
useful ones. As in, in the example above, if the bus is still too busy at 1
GB/s it would directly ask for 5 GB/s instead of going through 2, 3 and 4.


> you just want the GPU driver to finally align
> that with a stable bandwidth for the platform later on. And what I am
> asking is that it is not required for the end driver to look for
> stable values, it just requests what it wants and let the interconnect
> core/driver decide the stable values.
>

I think you've misunderstood my prior statements then.

The interconnect driver would then still have to aggregate the requests and
pick the final frequency for the interconnect. That's where it comes in --
executing/implementing the requests of all the clients.

Very much like the clock framework, most of the user drivers just ask
> for a clk value to be programmed and it is the clock driver which
> keeps a table of the stable values and then aligns the requested value
> to one of those.
>

This of this similar to the clock API and the OPP tables for CPUs. The
clock API could run the CPU at multiple different frequencies. But the CPU
driver uses the CPU OPP table to pick a certain set of frequencies that are
"useful". If your CPU clock is shared with another block, say L3 and
there's an L3 driver that's requesting a different frequency range (using
clk_set_rate_range(x, UINT_MAX)), that's where the clock driver aggregates
their request and set's the final clock frequency.

Similarly, the GPU driver wants to pick useful interconnect path bandwidth
levels using BW OPP tables. And the interconnect framework aggregates the
requests across multiple drivers requesting different bandwidths.

Does that make sense?

-Saravana

[-- Attachment #2: Type: text/html, Size: 7509 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 0/6] Introduce Bandwidth OPPs for interconnect paths
  2019-07-30  3:01             ` Viresh Kumar
  2019-08-03  1:20               ` Saravana Kannan
@ 2019-08-03  7:36               ` Saravana Kannan
  1 sibling, 0 replies; 48+ messages in thread
From: Saravana Kannan @ 2019-08-03  7:36 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Vincent Guittot,
	Sweeney, Sean, David Dai, Rajendra Nayak, Sibi Sankar,
	Bjorn Andersson, Evan Green, Android Kernel Team, Linux PM,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML

Resending due to HTML.

On Mon, Jul 29, 2019 at 8:02 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 29-07-19, 13:12, Saravana Kannan wrote:
> > On Mon, Jul 29, 2019 at 2:24 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > >
> > > On 18-07-19, 21:12, Saravana Kannan wrote:
> > > > On Wed, Jul 17, 2019 at 10:37 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > > > > I would like
> > > > > to put this data in the GPU OPP table only. What about putting a
> > > > > range in the GPU OPP table for the Bandwidth if it can change so much
> > > > > for the same frequency.
> > > >
> > > > I don't think the range is going to work.
> > >
> > > Any specific reason for that ?
> >
> > The next sentence was literally explaining this :) Fine to debate
> > that, but ignoring that and asking this question is kinda funny.
>
> Okay, but ...
>
> > > > If a GPU is doing purely
> > > > computational work, it's not unreasonable for it to vote for the
> > > > lowest bandwidth for any GPU frequency.
>
> ... it wasn't clear to me even after reading this sentence again now
> :)
>
> I understand that you may have to vote for the lowest bandwidth but
> that doesn't explain why a range can't work (sorry if it was just me
> who doesn't understood it :)).

Well, doesn't work as in, it doesn't give any additional info. I can
just vote for 0 or UINT_MAX if I want to stay at the lowest or high
bandwidth. Having the actual values of the lowest or highest point
doesn't help for cases where you need to skip intermediate bandwidth
levels when going from low to high (as the need increases).

>
> > > I think that is fine, but if the GPU is able to find how much
> > > bandwidth it needs why can't it just pass that value without needing
> > > to have another OPP table for the path ?
> >
> > You were asking this question in the context of "can the GPU OPP just
> > list all the range of bandwidth it might use per GPU frequency". My point
> > is that the range would be useless because it would the entire
> > available bandwidth range (because purely compute work might not need
> > any bandwidth).
>
> If it is useless to have entire range here, then why bother providing
> one ? Why can't the GPU request what it needs in exact terms, based on
> its calculations ? And then based on these requests, let the
> interconnect find what's the best/stable values it really wants to
> program the path for (and for that the interconnect can use its own
> OPP table, which would be fine).

Let's say there actual path can support 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10 GB/s.

Let's say 2, 3, and 4 need the same voltage level as 5 for this path.
So, for GPU's needs using 2, 3 and 4 GB/s might not be good because
the power savings from the frequency difference is not worth the
performance and power (if you run the interconnect slow, the GPU would
run faster to achieve the same performance) impact compared to running
the interconnect at 5 GB/s. Similarly it might skip 6 GB/s. So even if
the GPU can somehow calculate the exact bandwidth required (or say
measure it), it'll need to know to skip 2, 3 and 4 because they aren't
power/perf efficient levels to use.

But all these bandwidth levels might be useable for a smaller HW IP
whose power cost isn't high. So power savings running the interconnect
at 3 GB/s might be worth it -- because even if the small HW IP ran
faster to achieve the performance, the power increase in the HW IP
won't be higher than the power savings from running the interconnect
slower.

> > Whereas, what the GPU's algorithm actually needs might be the list of
> > "useful" bandwidth levels to use.
>
> Hmm, I am not sure GPU's algorithm needs this table AFAIU based on all
> the conversations we had until now. It is very capable of finding how
> much bandwidth it needs,

Not really. If you have a monitor that can actually measure the
bandwidth, yes. Most often that's not the case. If you just have a
monitor that can give you the bus port busy% then it'll have to use
this table to pick the useful ones. As in, in the example above, if
the bus is still too busy at 1 GB/s it would directly ask for 5 GB/s
instead of going through 2, 3 and 4.

> you just want the GPU driver to finally align
> that with a stable bandwidth for the platform later on. And what I am
> asking is that it is not required for the end driver to look for
> stable values, it just requests what it wants and let the interconnect
> core/driver decide the stable values.

I think you've misunderstood my prior statements then.

The interconnect driver would then still have to aggregate the
requests and pick the final frequency for the interconnect. That's
where it comes in -- executing/implementing the requests of all the
clients.

> Very much like the clock framework, most of the user drivers just ask
> for a clk value to be programmed and it is the clock driver which
> keeps a table of the stable values and then aligns the requested value
> to one of those.

This of this similar to the clock API and the OPP tables for CPUs. The
clock API could run the CPU at multiple different frequencies. But the
CPU driver uses the CPU OPP table to pick a certain set of frequencies
that are "useful". If your CPU clock is shared with another block, say
L3 and there's an L3 driver that's requesting a different frequency
range (using clk_set_rate_range(x, UINT_MAX)), that's where the clock
driver aggregates their request and set's the final clock frequency.

Similarly, the GPU driver wants to pick useful interconnect path
bandwidth levels using BW OPP tables. And the interconnect framework
aggregates the requests across multiple drivers requesting different
bandwidths.

Does that make sense?

-Saravana

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2019-08-03  7:36 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-03  1:10 [PATCH v3 0/6] Introduce Bandwidth OPPs for interconnect paths Saravana Kannan
2019-07-03  1:10 ` [PATCH v3 1/6] dt-bindings: opp: Introduce opp-peak-KBps and opp-avg-KBps bindings Saravana Kannan
2019-07-16 17:25   ` Sibi Sankar
2019-07-16 18:58     ` Saravana Kannan
2019-07-22 23:35       ` Rob Herring
2019-07-22 23:40         ` Saravana Kannan
2019-07-23 14:26           ` Rob Herring
2019-07-24  0:18             ` Saravana Kannan
2019-07-17  7:54   ` Viresh Kumar
2019-07-17 20:29     ` Saravana Kannan
2019-07-18  4:35       ` Viresh Kumar
2019-07-18 17:26         ` Saravana Kannan
2019-07-26 16:24   ` Georgi Djakov
2019-07-26 19:08     ` Saravana Kannan
2019-07-03  1:10 ` [PATCH v3 2/6] OPP: Add support for bandwidth OPP tables Saravana Kannan
2019-07-16 17:33   ` Sibi Sankar
2019-07-16 19:10     ` Saravana Kannan
2019-07-30 10:57   ` Amit Kucheria
2019-07-30 23:20     ` Saravana Kannan
2019-07-03  1:10 ` [PATCH v3 3/6] OPP: Add helper function " Saravana Kannan
2019-07-03  1:10 ` [PATCH v3 4/6] OPP: Add API to find an OPP table from its DT node Saravana Kannan
2019-07-03  1:10 ` [PATCH v3 5/6] dt-bindings: interconnect: Add interconnect-opp-table property Saravana Kannan
2019-07-22 23:39   ` Rob Herring
2019-07-22 23:43     ` Saravana Kannan
2019-07-23  2:21     ` Viresh Kumar
2019-07-03  1:10 ` [PATCH v3 6/6] interconnect: Add OPP table support for interconnects Saravana Kannan
2019-07-03  6:45   ` Vincent Guittot
2019-07-03 21:33     ` Saravana Kannan
2019-07-04  7:12       ` Vincent Guittot
2019-07-07 21:48         ` Saravana Kannan
2019-07-09  7:25           ` Vincent Guittot
2019-07-09 19:02             ` Saravana Kannan
2019-07-15  8:16               ` Vincent Guittot
2019-07-16  0:55                 ` Saravana Kannan
2019-07-24  7:16                   ` Vincent Guittot
2019-07-26 16:25   ` Georgi Djakov
2019-07-26 19:08     ` Saravana Kannan
2019-07-03  6:36 ` [PATCH v3 0/6] Introduce Bandwidth OPPs for interconnect paths Viresh Kumar
2019-07-03 20:35   ` Saravana Kannan
2019-07-17 10:32 ` Viresh Kumar
2019-07-17 20:34   ` Saravana Kannan
2019-07-18  5:37     ` Viresh Kumar
2019-07-19  4:12       ` Saravana Kannan
2019-07-29  9:24         ` Viresh Kumar
2019-07-29 20:12           ` Saravana Kannan
2019-07-30  3:01             ` Viresh Kumar
2019-08-03  1:20               ` Saravana Kannan
2019-08-03  7:36               ` Saravana Kannan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).