linux-pm.vger.kernel.org archive mirror
* [PATCH v2 0/2] interconnect: Add path tagging support
@ 2019-06-18  9:17 Georgi Djakov
  2019-06-18  9:17 ` [PATCH v2 1/2] interconnect: Add support for path tags Georgi Djakov
  2019-06-18  9:17 ` [PATCH v2 2/2] interconnect: qcom: Add tagging and wake/sleep support for sdm845 Georgi Djakov
  0 siblings, 2 replies; 14+ messages in thread
From: Georgi Djakov @ 2019-06-18  9:17 UTC (permalink / raw)
  To: linux-pm
  Cc: daidavid1, vincent.guittot, bjorn.andersson, amit.kucheria,
	evgreen, dianders, seansw, linux-kernel, linux-arm-kernel,
	linux-arm-msm, georgi.djakov

SoCs with multiple coexisting CPUs and DSPs may have interconnect buses
that are shared between them. In such cases, each CPU/DSP may have
different bandwidth needs depending on whether it is active or sleeping,
so we have to keep separate bandwidth configurations for the CPU's
active and sleep states. In such systems there is usually a way to
communicate and synchronize this information with some firmware, or to
pass it to another processor that is responsible for monitoring and
switching the interconnect configurations based on the state of each
CPU/DSP.

This problem can be solved by introducing the concept of path tagging,
which allows consumers to optionally attach a tag to each path they use.
The tag is used to differentiate between the aggregated bandwidth values
for each state. The tag itself is generic, and how it is handled is up
to the platform-specific interconnect provider drivers.
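
As a rough illustration of the intended consumer-side usage (hypothetical
driver code, not part of this series; the "ddr" path name and the struct
device *dev are made up, and QCOM_ICC_TAG_ACTIVE_ONLY is a provider
internal define from patch 2, shown here only to illustrate the flow):

	struct icc_path *path;
	int ret;

	path = of_icc_get(dev, "ddr");
	if (IS_ERR(path))
		return PTR_ERR(path);

	/* Tag all requests on this path; the tag meaning is provider specific. */
	icc_set_tag(path, QCOM_ICC_TAG_ACTIVE_ONLY);

	/* Bandwidth is then aggregated per tag/bucket by the provider. */
	ret = icc_set_bw(path, 1000 /* avg kBps */, 2000 /* peak kBps */);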

v2:
- Store tag with the request. (Evan)
- Reorganize the code to save bandwidth values into buckets and use the
  tag as a bitfield. (Evan)
- Clear the aggregated values after icc_set().

v1: https://lore.kernel.org/lkml/20190208172152.1807-1-georgi.djakov@linaro.org/


David Dai (1):
  interconnect: qcom: Add tagging and wake/sleep support for sdm845

Georgi Djakov (1):
  interconnect: Add support for path tags

 drivers/interconnect/core.c           |  24 ++++-
 drivers/interconnect/qcom/sdm845.c    | 131 +++++++++++++++++++-------
 include/linux/interconnect-provider.h |   4 +-
 include/linux/interconnect.h          |   5 +
 4 files changed, 129 insertions(+), 35 deletions(-)


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 1/2] interconnect: Add support for path tags
  2019-06-18  9:17 [PATCH v2 0/2] interconnect: Add path tagging support Georgi Djakov
@ 2019-06-18  9:17 ` Georgi Djakov
  2019-07-11 17:06   ` Evan Green
  2019-06-18  9:17 ` [PATCH v2 2/2] interconnect: qcom: Add tagging and wake/sleep support for sdm845 Georgi Djakov
  1 sibling, 1 reply; 14+ messages in thread
From: Georgi Djakov @ 2019-06-18  9:17 UTC (permalink / raw)
  To: linux-pm
  Cc: daidavid1, vincent.guittot, bjorn.andersson, amit.kucheria,
	evgreen, dianders, seansw, linux-kernel, linux-arm-kernel,
	linux-arm-msm, georgi.djakov

Consumers may have use cases with different bandwidth requirements
depending on the system or driver state. The consumer driver can append
a specific tag to a path and pass this information to the interconnect
platform driver, which can then aggregate the requests based on that
state.

Introduce an icc_set_tag() function that allows consumers to append an
optional tag to each path. The aggregation of these tagged paths is
platform-specific.

Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
---
 drivers/interconnect/core.c           | 24 +++++++++++++++++++++++-
 drivers/interconnect/qcom/sdm845.c    |  2 +-
 include/linux/interconnect-provider.h |  4 ++--
 include/linux/interconnect.h          |  5 +++++
 4 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
index 871eb4bc4efc..251354bb7fdc 100644
--- a/drivers/interconnect/core.c
+++ b/drivers/interconnect/core.c
@@ -29,6 +29,7 @@ static struct dentry *icc_debugfs_dir;
  * @req_node: entry in list of requests for the particular @node
  * @node: the interconnect node to which this constraint applies
  * @dev: reference to the device that sets the constraints
+ * @tag: path tag (optional)
  * @avg_bw: an integer describing the average bandwidth in kBps
  * @peak_bw: an integer describing the peak bandwidth in kBps
  */
@@ -36,6 +37,7 @@ struct icc_req {
 	struct hlist_node req_node;
 	struct icc_node *node;
 	struct device *dev;
+	u32 tag;
 	u32 avg_bw;
 	u32 peak_bw;
 };
@@ -204,7 +206,7 @@ static int aggregate_requests(struct icc_node *node)
 	node->peak_bw = 0;
 
 	hlist_for_each_entry(r, &node->req_list, req_node)
-		p->aggregate(node, r->avg_bw, r->peak_bw,
+		p->aggregate(node, r->tag, r->avg_bw, r->peak_bw,
 			     &node->avg_bw, &node->peak_bw);
 
 	return 0;
@@ -385,6 +387,26 @@ struct icc_path *of_icc_get(struct device *dev, const char *name)
 }
 EXPORT_SYMBOL_GPL(of_icc_get);
 
+/**
+ * icc_set_tag() - set an optional tag on a path
+ * @path: the path we want to tag
+ * @tag: the tag value
+ *
+ * This function allows consumers to append a tag to the requests associated
+ * with a path, so that a different aggregation could be done based on this tag.
+ */
+void icc_set_tag(struct icc_path *path, u32 tag)
+{
+	int i;
+
+	if (!path)
+		return;
+
+	for (i = 0; i < path->num_nodes; i++)
+		path->reqs[i].tag = tag;
+}
+EXPORT_SYMBOL_GPL(icc_set_tag);
+
 /**
  * icc_set_bw() - set bandwidth constraints on an interconnect path
  * @path: reference to the path returned by icc_get()
diff --git a/drivers/interconnect/qcom/sdm845.c b/drivers/interconnect/qcom/sdm845.c
index 4915b78da673..fb526004c82e 100644
--- a/drivers/interconnect/qcom/sdm845.c
+++ b/drivers/interconnect/qcom/sdm845.c
@@ -626,7 +626,7 @@ static void bcm_aggregate(struct qcom_icc_bcm *bcm)
 	bcm->dirty = false;
 }
 
-static int qcom_icc_aggregate(struct icc_node *node, u32 avg_bw,
+static int qcom_icc_aggregate(struct icc_node *node, u32 tag, u32 avg_bw,
 			      u32 peak_bw, u32 *agg_avg, u32 *agg_peak)
 {
 	size_t i;
diff --git a/include/linux/interconnect-provider.h b/include/linux/interconnect-provider.h
index 63caccadc2db..4ee19fd41568 100644
--- a/include/linux/interconnect-provider.h
+++ b/include/linux/interconnect-provider.h
@@ -45,8 +45,8 @@ struct icc_provider {
 	struct list_head	provider_list;
 	struct list_head	nodes;
 	int (*set)(struct icc_node *src, struct icc_node *dst);
-	int (*aggregate)(struct icc_node *node, u32 avg_bw, u32 peak_bw,
-			 u32 *agg_avg, u32 *agg_peak);
+	int (*aggregate)(struct icc_node *node, u32 tag, u32 avg_bw,
+			 u32 peak_bw, u32 *agg_avg, u32 *agg_peak);
 	struct icc_node* (*xlate)(struct of_phandle_args *spec, void *data);
 	struct device		*dev;
 	int			users;
diff --git a/include/linux/interconnect.h b/include/linux/interconnect.h
index dc25864755ba..d70a914cba11 100644
--- a/include/linux/interconnect.h
+++ b/include/linux/interconnect.h
@@ -30,6 +30,7 @@ struct icc_path *icc_get(struct device *dev, const int src_id,
 struct icc_path *of_icc_get(struct device *dev, const char *name);
 void icc_put(struct icc_path *path);
 int icc_set_bw(struct icc_path *path, u32 avg_bw, u32 peak_bw);
+void icc_set_tag(struct icc_path *path, u32 tag);
 
 #else
 
@@ -54,6 +55,10 @@ static inline int icc_set_bw(struct icc_path *path, u32 avg_bw, u32 peak_bw)
 	return 0;
 }
 
+static inline void icc_set_tag(struct icc_path *path, u32 tag)
+{
+}
+
 #endif /* CONFIG_INTERCONNECT */
 
 #endif /* __LINUX_INTERCONNECT_H */

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 2/2] interconnect: qcom: Add tagging and wake/sleep support for sdm845
  2019-06-18  9:17 [PATCH v2 0/2] interconnect: Add path tagging support Georgi Djakov
  2019-06-18  9:17 ` [PATCH v2 1/2] interconnect: Add support for path tags Georgi Djakov
@ 2019-06-18  9:17 ` Georgi Djakov
  2019-07-11 17:06   ` Evan Green
  1 sibling, 1 reply; 14+ messages in thread
From: Georgi Djakov @ 2019-06-18  9:17 UTC (permalink / raw)
  To: linux-pm
  Cc: daidavid1, vincent.guittot, bjorn.andersson, amit.kucheria,
	evgreen, dianders, seansw, linux-kernel, linux-arm-kernel,
	linux-arm-msm, georgi.djakov

From: David Dai <daidavid1@codeaurora.org>

Add support for wake and sleep commands by using a tag to indicate
whether or not the aggregate and set requests fall into an
execution-state-specific bucket.

Signed-off-by: David Dai <daidavid1@codeaurora.org>
Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
---
 drivers/interconnect/qcom/sdm845.c | 129 ++++++++++++++++++++++-------
 1 file changed, 98 insertions(+), 31 deletions(-)

diff --git a/drivers/interconnect/qcom/sdm845.c b/drivers/interconnect/qcom/sdm845.c
index fb526004c82e..c100aab39415 100644
--- a/drivers/interconnect/qcom/sdm845.c
+++ b/drivers/interconnect/qcom/sdm845.c
@@ -66,6 +66,17 @@ struct bcm_db {
 #define SDM845_MAX_BCM_PER_NODE	2
 #define SDM845_MAX_VCD		10
 
+#define QCOM_ICC_BUCKET_AMC		0
+#define QCOM_ICC_BUCKET_WAKE		1
+#define QCOM_ICC_BUCKET_SLEEP		2
+#define QCOM_ICC_NUM_BUCKETS		3
+#define QCOM_ICC_TAG_AMC		BIT(QCOM_ICC_BUCKET_AMC)
+#define QCOM_ICC_TAG_WAKE		BIT(QCOM_ICC_BUCKET_WAKE)
+#define QCOM_ICC_TAG_SLEEP		BIT(QCOM_ICC_BUCKET_SLEEP)
+#define QCOM_ICC_TAG_ACTIVE_ONLY	(QCOM_ICC_TAG_AMC | QCOM_ICC_TAG_WAKE)
+#define QCOM_ICC_TAG_ALWAYS		(QCOM_ICC_TAG_AMC | QCOM_ICC_TAG_WAKE |\
+					 QCOM_ICC_TAG_SLEEP)
+
 /**
  * struct qcom_icc_node - Qualcomm specific interconnect nodes
  * @name: the node name used in debugfs
@@ -75,7 +86,9 @@ struct bcm_db {
  * @channels: num of channels at this node
  * @buswidth: width of the interconnect between a node and the bus
  * @sum_avg: current sum aggregate value of all avg bw requests
+ * @sum_avg_cached: previous sum aggregate value of all avg bw requests
  * @max_peak: current max aggregate value of all peak bw requests
+ * @max_peak_cached: previous max aggregate value of all peak bw requests
  * @bcms: list of bcms associated with this logical node
  * @num_bcms: num of @bcms
  */
@@ -86,8 +99,10 @@ struct qcom_icc_node {
 	u16 num_links;
 	u16 channels;
 	u16 buswidth;
-	u64 sum_avg;
-	u64 max_peak;
+	u64 sum_avg[QCOM_ICC_NUM_BUCKETS];
+	u64 sum_avg_cached[QCOM_ICC_NUM_BUCKETS];
+	u64 max_peak[QCOM_ICC_NUM_BUCKETS];
+	u64 max_peak_cached[QCOM_ICC_NUM_BUCKETS];
 	struct qcom_icc_bcm *bcms[SDM845_MAX_BCM_PER_NODE];
 	size_t num_bcms;
 };
@@ -112,8 +127,8 @@ struct qcom_icc_bcm {
 	const char *name;
 	u32 type;
 	u32 addr;
-	u64 vote_x;
-	u64 vote_y;
+	u64 vote_x[QCOM_ICC_NUM_BUCKETS];
+	u64 vote_y[QCOM_ICC_NUM_BUCKETS];
 	bool dirty;
 	bool keepalive;
 	struct bcm_db aux_data;
@@ -555,7 +570,7 @@ inline void tcs_cmd_gen(struct tcs_cmd *cmd, u64 vote_x, u64 vote_y,
 		cmd->wait = true;
 }
 
-static void tcs_list_gen(struct list_head *bcm_list,
+static void tcs_list_gen(struct list_head *bcm_list, int bucket,
 			 struct tcs_cmd tcs_list[SDM845_MAX_VCD],
 			 int n[SDM845_MAX_VCD])
 {
@@ -573,8 +588,8 @@ static void tcs_list_gen(struct list_head *bcm_list,
 			commit = true;
 			cur_vcd_size = 0;
 		}
-		tcs_cmd_gen(&tcs_list[idx], bcm->vote_x, bcm->vote_y,
-			    bcm->addr, commit);
+		tcs_cmd_gen(&tcs_list[idx], bcm->vote_x[bucket],
+			    bcm->vote_y[bucket], bcm->addr, commit);
 		idx++;
 		n[batch]++;
 		/*
@@ -595,32 +610,39 @@ static void tcs_list_gen(struct list_head *bcm_list,
 
 static void bcm_aggregate(struct qcom_icc_bcm *bcm)
 {
-	size_t i;
-	u64 agg_avg = 0;
-	u64 agg_peak = 0;
+	size_t i, bucket;
+	u64 agg_avg[QCOM_ICC_NUM_BUCKETS] = {0};
+	u64 agg_peak[QCOM_ICC_NUM_BUCKETS] = {0};
 	u64 temp;
 
-	for (i = 0; i < bcm->num_nodes; i++) {
-		temp = bcm->nodes[i]->sum_avg * bcm->aux_data.width;
-		do_div(temp, bcm->nodes[i]->buswidth * bcm->nodes[i]->channels);
-		agg_avg = max(agg_avg, temp);
+	for (bucket = 0; bucket < QCOM_ICC_NUM_BUCKETS; bucket++) {
+		for (i = 0; i < bcm->num_nodes; i++) {
+			temp = bcm->nodes[i]->sum_avg_cached[bucket] * bcm->aux_data.width;
+			do_div(temp, bcm->nodes[i]->buswidth * bcm->nodes[i]->channels);
+			agg_avg[bucket] = max(agg_avg[bucket], temp);
 
-		temp = bcm->nodes[i]->max_peak * bcm->aux_data.width;
-		do_div(temp, bcm->nodes[i]->buswidth);
-		agg_peak = max(agg_peak, temp);
-	}
+			temp = bcm->nodes[i]->max_peak_cached[bucket] * bcm->aux_data.width;
+			do_div(temp, bcm->nodes[i]->buswidth);
+			agg_peak[bucket] = max(agg_peak[bucket], temp);
 
-	temp = agg_avg * 1000ULL;
-	do_div(temp, bcm->aux_data.unit);
-	bcm->vote_x = temp;
+			bcm->nodes[i]->sum_avg[bucket] = 0;
+			bcm->nodes[i]->max_peak[bucket] = 0;
+		}
 
-	temp = agg_peak * 1000ULL;
-	do_div(temp, bcm->aux_data.unit);
-	bcm->vote_y = temp;
+		temp = agg_avg[bucket] * 1000ULL;
+		do_div(temp, bcm->aux_data.unit);
+		bcm->vote_x[bucket] = temp;
 
-	if (bcm->keepalive && bcm->vote_x == 0 && bcm->vote_y == 0) {
-		bcm->vote_x = 1;
-		bcm->vote_y = 1;
+		temp = agg_peak[bucket] * 1000ULL;
+		do_div(temp, bcm->aux_data.unit);
+		bcm->vote_y[bucket] = temp;
+	}
+
+	if (bcm->keepalive && bcm->vote_x[0] == 0 && bcm->vote_y[0] == 0) {
+		bcm->vote_x[QCOM_ICC_BUCKET_AMC] = 1;
+		bcm->vote_x[QCOM_ICC_BUCKET_WAKE] = 1;
+		bcm->vote_y[QCOM_ICC_BUCKET_AMC] = 1;
+		bcm->vote_y[QCOM_ICC_BUCKET_WAKE] = 1;
 	}
 
 	bcm->dirty = false;
@@ -631,15 +653,25 @@ static int qcom_icc_aggregate(struct icc_node *node, u32 tag, u32 avg_bw,
 {
 	size_t i;
 	struct qcom_icc_node *qn;
+	unsigned long tag_word = (unsigned long)tag;
 
 	qn = node->data;
 
+	if (!tag)
+		tag_word = QCOM_ICC_TAG_ALWAYS;
+
+	for (i = 0; i < QCOM_ICC_NUM_BUCKETS; i++) {
+		if (test_bit(i, &tag_word)) {
+			qn->sum_avg[i] += avg_bw;
+			qn->max_peak[i] = max_t(u32, qn->max_peak[i], peak_bw);
+			qn->sum_avg_cached[i] = qn->sum_avg[i];
+			qn->max_peak_cached[i] = qn->max_peak[i];
+		}
+	}
+
 	*agg_avg += avg_bw;
 	*agg_peak = max_t(u32, *agg_peak, peak_bw);
 
-	qn->sum_avg = *agg_avg;
-	qn->max_peak = *agg_peak;
-
 	for (i = 0; i < qn->num_bcms; i++)
 		qn->bcms[i]->dirty = true;
 
@@ -675,7 +707,7 @@ static int qcom_icc_set(struct icc_node *src, struct icc_node *dst)
 	 * Construct the command list based on a pre ordered list of BCMs
 	 * based on VCD.
 	 */
-	tcs_list_gen(&commit_list, cmds, commit_idx);
+	tcs_list_gen(&commit_list, QCOM_ICC_BUCKET_AMC, cmds, commit_idx);
 
 	if (!commit_idx[0])
 		return ret;
@@ -693,6 +725,41 @@ static int qcom_icc_set(struct icc_node *src, struct icc_node *dst)
 		return ret;
 	}
 
+	INIT_LIST_HEAD(&commit_list);
+
+	for (i = 0; i < qp->num_bcms; i++) {
+		/*
+		 * Only generate WAKE and SLEEP commands if a resource's
+		 * requirements change as the execution environment transitions
+		 * between different power states.
+		 */
+		if (qp->bcms[i]->vote_x[QCOM_ICC_BUCKET_WAKE] !=
+		    qp->bcms[i]->vote_x[QCOM_ICC_BUCKET_SLEEP] ||
+		    qp->bcms[i]->vote_y[QCOM_ICC_BUCKET_WAKE] !=
+		    qp->bcms[i]->vote_y[QCOM_ICC_BUCKET_SLEEP]) {
+			list_add_tail(&qp->bcms[i]->list, &commit_list);
+		}
+	}
+
+	if (list_empty(&commit_list))
+		return ret;
+
+	tcs_list_gen(&commit_list, QCOM_ICC_BUCKET_WAKE, cmds, commit_idx);
+
+	ret = rpmh_write_batch(qp->dev, RPMH_WAKE_ONLY_STATE, cmds, commit_idx);
+	if (ret) {
+		pr_err("Error sending WAKE RPMH requests (%d)\n", ret);
+		return ret;
+	}
+
+	tcs_list_gen(&commit_list, QCOM_ICC_BUCKET_SLEEP, cmds, commit_idx);
+
+	ret = rpmh_write_batch(qp->dev, RPMH_SLEEP_STATE, cmds, commit_idx);
+	if (ret) {
+		pr_err("Error sending SLEEP RPMH requests (%d)\n", ret);
+		return ret;
+	}
+
 	return ret;
 }
 

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] interconnect: Add support for path tags
  2019-06-18  9:17 ` [PATCH v2 1/2] interconnect: Add support for path tags Georgi Djakov
@ 2019-07-11 17:06   ` Evan Green
  0 siblings, 0 replies; 14+ messages in thread
From: Evan Green @ 2019-07-11 17:06 UTC (permalink / raw)
  To: Georgi Djakov
  Cc: linux-pm, David Dai, Vincent Guittot, Bjorn Andersson,
	amit.kucheria, Doug Anderson, Sean Sweeney, LKML,
	linux-arm Mailing List, linux-arm-msm

Hi Georgi and David,

On Tue, Jun 18, 2019 at 2:17 AM Georgi Djakov <georgi.djakov@linaro.org> wrote:
>
> Consumers may have use cases with different bandwidth requirements based
> on the system or driver state. The consumer driver can append a specific
> tag to the path and pass this information to the interconnect platform
> driver to do the aggregation based on this state.
>
> Introduce icc_set_tag() function that will allow the consumers to append
> an optional tag to each path. The aggregation of these tagged paths is
> platform specific.
>
> Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
> ---
>  drivers/interconnect/core.c           | 24 +++++++++++++++++++++++-
>  drivers/interconnect/qcom/sdm845.c    |  2 +-
>  include/linux/interconnect-provider.h |  4 ++--
>  include/linux/interconnect.h          |  5 +++++
>  4 files changed, 31 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
> index 871eb4bc4efc..251354bb7fdc 100644
> --- a/drivers/interconnect/core.c
> +++ b/drivers/interconnect/core.c
> @@ -29,6 +29,7 @@ static struct dentry *icc_debugfs_dir;
>   * @req_node: entry in list of requests for the particular @node
>   * @node: the interconnect node to which this constraint applies
>   * @dev: reference to the device that sets the constraints
> + * @tag: path tag (optional)
>   * @avg_bw: an integer describing the average bandwidth in kBps
>   * @peak_bw: an integer describing the peak bandwidth in kBps
>   */
> @@ -36,6 +37,7 @@ struct icc_req {
>         struct hlist_node req_node;
>         struct icc_node *node;
>         struct device *dev;
> +       u32 tag;
>         u32 avg_bw;
>         u32 peak_bw;
>  };
> @@ -204,7 +206,7 @@ static int aggregate_requests(struct icc_node *node)
>         node->peak_bw = 0;
>
>         hlist_for_each_entry(r, &node->req_list, req_node)
> -               p->aggregate(node, r->avg_bw, r->peak_bw,
> +               p->aggregate(node, r->tag, r->avg_bw, r->peak_bw,
>                              &node->avg_bw, &node->peak_bw);
>
>         return 0;
> @@ -385,6 +387,26 @@ struct icc_path *of_icc_get(struct device *dev, const char *name)
>  }
>  EXPORT_SYMBOL_GPL(of_icc_get);
>
> +/**
> + * icc_set_tag() - set an optional tag on a path
> + * @path: the path we want to tag
> + * @tag: the tag value
> + *
> + * This function allows consumers to append a tag to the requests associated
> + * with a path, so that a different aggregation could be done based on this tag.
> + */
> +void icc_set_tag(struct icc_path *path, u32 tag)
> +{
> +       int i;
> +
> +       if (!path)
> +               return;
> +
> +       for (i = 0; i < path->num_nodes; i++)
> +               path->reqs[i].tag = tag;

It's a little unfortunate to have this tag sprayed across all the
request nodes in the path, even though it's really a single value. If
we thought there were likely to be more attributes common to a path
that a provider might want access to, we could add a pointer to the
path in icc_req, then stick the tag in the path. But if the tag ends
up being the only thing worth looking at, then I guess what you have
now is slightly better.
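
For illustration only, a minimal sketch of that alternative (the exact
layout is hypothetical, not something this patch proposes):

	struct icc_req {
		struct hlist_node req_node;
		struct icc_node *node;
		struct device *dev;
		struct icc_path *path;	/* back-pointer to the owning path */
		u32 avg_bw;
		u32 peak_bw;
	};

	struct icc_path {
		u32 tag;		/* stored once for the whole path */
		size_t num_nodes;
		struct icc_req reqs[];
	};

	void icc_set_tag(struct icc_path *path, u32 tag)
	{
		if (path)
			path->tag = tag;
	}

aggregate_requests() would then read r->path->tag instead of a per-request
tag.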

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] interconnect: qcom: Add tagging and wake/sleep support for sdm845
  2019-06-18  9:17 ` [PATCH v2 2/2] interconnect: qcom: Add tagging and wake/sleep support for sdm845 Georgi Djakov
@ 2019-07-11 17:06   ` Evan Green
  2019-07-15 23:34     ` David Dai
  0 siblings, 1 reply; 14+ messages in thread
From: Evan Green @ 2019-07-11 17:06 UTC (permalink / raw)
  To: Georgi Djakov
  Cc: linux-pm, David Dai, Vincent Guittot, Bjorn Andersson,
	amit.kucheria, Doug Anderson, Sean Sweeney, LKML,
	linux-arm Mailing List, linux-arm-msm

Hi Georgi and David,

On Tue, Jun 18, 2019 at 2:17 AM Georgi Djakov <georgi.djakov@linaro.org> wrote:
>
> From: David Dai <daidavid1@codeaurora.org>
>
> Add support for wake and sleep commands by using a tag to indicate
> whether or not the aggregate and set requests fall into execution
> state specific bucket.
>
> Signed-off-by: David Dai <daidavid1@codeaurora.org>
> Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
> ---
>  drivers/interconnect/qcom/sdm845.c | 129 ++++++++++++++++++++++-------
>  1 file changed, 98 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/interconnect/qcom/sdm845.c b/drivers/interconnect/qcom/sdm845.c
> index fb526004c82e..c100aab39415 100644
> --- a/drivers/interconnect/qcom/sdm845.c
> +++ b/drivers/interconnect/qcom/sdm845.c
> @@ -66,6 +66,17 @@ struct bcm_db {
>  #define SDM845_MAX_BCM_PER_NODE        2
>  #define SDM845_MAX_VCD         10
>
> +#define QCOM_ICC_BUCKET_AMC            0

What is AMC again? Is it the "right now" bucket? Maybe a comment on
the meaning of this bucket would be helpful.

> +#define QCOM_ICC_BUCKET_WAKE           1
> +#define QCOM_ICC_BUCKET_SLEEP          2
> +#define QCOM_ICC_NUM_BUCKETS           3
> +#define QCOM_ICC_TAG_AMC               BIT(QCOM_ICC_BUCKET_AMC)
> +#define QCOM_ICC_TAG_WAKE              BIT(QCOM_ICC_BUCKET_WAKE)
> +#define QCOM_ICC_TAG_SLEEP             BIT(QCOM_ICC_BUCKET_SLEEP)
> +#define QCOM_ICC_TAG_ACTIVE_ONLY       (QCOM_ICC_TAG_AMC | QCOM_ICC_TAG_WAKE)
> +#define QCOM_ICC_TAG_ALWAYS            (QCOM_ICC_TAG_AMC | QCOM_ICC_TAG_WAKE |\
> +                                        QCOM_ICC_TAG_SLEEP)
> +
>  /**
>   * struct qcom_icc_node - Qualcomm specific interconnect nodes
>   * @name: the node name used in debugfs
> @@ -75,7 +86,9 @@ struct bcm_db {
>   * @channels: num of channels at this node
>   * @buswidth: width of the interconnect between a node and the bus
>   * @sum_avg: current sum aggregate value of all avg bw requests
> + * @sum_avg_cached: previous sum aggregate value of all avg bw requests
>   * @max_peak: current max aggregate value of all peak bw requests
> + * @max_peak_cached: previous max aggregate value of all peak bw requests
>   * @bcms: list of bcms associated with this logical node
>   * @num_bcms: num of @bcms
>   */
> @@ -86,8 +99,10 @@ struct qcom_icc_node {
>         u16 num_links;
>         u16 channels;
>         u16 buswidth;
> -       u64 sum_avg;
> -       u64 max_peak;
> +       u64 sum_avg[QCOM_ICC_NUM_BUCKETS];
> +       u64 sum_avg_cached[QCOM_ICC_NUM_BUCKETS];
> +       u64 max_peak[QCOM_ICC_NUM_BUCKETS];
> +       u64 max_peak_cached[QCOM_ICC_NUM_BUCKETS];
>         struct qcom_icc_bcm *bcms[SDM845_MAX_BCM_PER_NODE];
>         size_t num_bcms;
>  };
> @@ -112,8 +127,8 @@ struct qcom_icc_bcm {
>         const char *name;
>         u32 type;
>         u32 addr;
> -       u64 vote_x;
> -       u64 vote_y;
> +       u64 vote_x[QCOM_ICC_NUM_BUCKETS];
> +       u64 vote_y[QCOM_ICC_NUM_BUCKETS];
>         bool dirty;
>         bool keepalive;
>         struct bcm_db aux_data;
> @@ -555,7 +570,7 @@ inline void tcs_cmd_gen(struct tcs_cmd *cmd, u64 vote_x, u64 vote_y,
>                 cmd->wait = true;
>  }
>
> -static void tcs_list_gen(struct list_head *bcm_list,
> +static void tcs_list_gen(struct list_head *bcm_list, int bucket,
>                          struct tcs_cmd tcs_list[SDM845_MAX_VCD],
>                          int n[SDM845_MAX_VCD])
>  {
> @@ -573,8 +588,8 @@ static void tcs_list_gen(struct list_head *bcm_list,
>                         commit = true;
>                         cur_vcd_size = 0;
>                 }
> -               tcs_cmd_gen(&tcs_list[idx], bcm->vote_x, bcm->vote_y,
> -                           bcm->addr, commit);
> +               tcs_cmd_gen(&tcs_list[idx], bcm->vote_x[bucket],
> +                           bcm->vote_y[bucket], bcm->addr, commit);
>                 idx++;
>                 n[batch]++;
>                 /*
> @@ -595,32 +610,39 @@ static void tcs_list_gen(struct list_head *bcm_list,
>
>  static void bcm_aggregate(struct qcom_icc_bcm *bcm)
>  {
> -       size_t i;
> -       u64 agg_avg = 0;
> -       u64 agg_peak = 0;
> +       size_t i, bucket;
> +       u64 agg_avg[QCOM_ICC_NUM_BUCKETS] = {0};
> +       u64 agg_peak[QCOM_ICC_NUM_BUCKETS] = {0};
>         u64 temp;
>
> -       for (i = 0; i < bcm->num_nodes; i++) {
> -               temp = bcm->nodes[i]->sum_avg * bcm->aux_data.width;
> -               do_div(temp, bcm->nodes[i]->buswidth * bcm->nodes[i]->channels);
> -               agg_avg = max(agg_avg, temp);
> +       for (bucket = 0; bucket < QCOM_ICC_NUM_BUCKETS; bucket++) {
> +               for (i = 0; i < bcm->num_nodes; i++) {
> +                       temp = bcm->nodes[i]->sum_avg_cached[bucket] * bcm->aux_data.width;
> +                       do_div(temp, bcm->nodes[i]->buswidth * bcm->nodes[i]->channels);
> +                       agg_avg[bucket] = max(agg_avg[bucket], temp);
>
> -               temp = bcm->nodes[i]->max_peak * bcm->aux_data.width;
> -               do_div(temp, bcm->nodes[i]->buswidth);

Why is it that this one doesn't have the multiply by
bcm->nodes[i]->channels again? I can't recall if there was a reason.
If it's correct maybe it deserves a comment.

> -               agg_peak = max(agg_peak, temp);
> -       }
> +                       temp = bcm->nodes[i]->max_peak_cached[bucket] * bcm->aux_data.width;
> +                       do_div(temp, bcm->nodes[i]->buswidth);
> +                       agg_peak[bucket] = max(agg_peak[bucket], temp);
>
> -       temp = agg_avg * 1000ULL;
> -       do_div(temp, bcm->aux_data.unit);
> -       bcm->vote_x = temp;
> +                       bcm->nodes[i]->sum_avg[bucket] = 0;
> +                       bcm->nodes[i]->max_peak[bucket] = 0;

I don't understand the sum_avg vs sum_avg_cached. Here's what I understand:
1. qcom_icc_aggregate() does the math from the incoming values on
sum_avg, and then clobbers sum_avg_cached with those values.
2. bcm_aggregate() uses sum_avg_cached in its calculations, then clears sum_avg.

But I don't get why that's needed. Why not just have sum_avg? Wouldn't
it work the same? Ok, it wouldn't if you ended up calling
bcm_aggregate() multiple times on the same bcm. But you have a dirty
flag that prevents this from happening. So I think it's safe to remove
the cached arrays, and just clear out the sum_avg when you aggregate.
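
For illustration, a rough sketch of that simplification, reusing the names
from the patch (untested, just to show the shape):

	temp = bcm->nodes[i]->sum_avg[bucket] * bcm->aux_data.width; /* was sum_avg_cached */
	do_div(temp, bcm->nodes[i]->buswidth * bcm->nodes[i]->channels);
	agg_avg[bucket] = max(agg_avg[bucket], temp);
	bcm->nodes[i]->sum_avg[bucket] = 0; /* cleared right here */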

> +               }
>
> -       temp = agg_peak * 1000ULL;
> -       do_div(temp, bcm->aux_data.unit);
> -       bcm->vote_y = temp;
> +               temp = agg_avg[bucket] * 1000ULL;
> +               do_div(temp, bcm->aux_data.unit);
> +               bcm->vote_x[bucket] = temp;
>
> -       if (bcm->keepalive && bcm->vote_x == 0 && bcm->vote_y == 0) {
> -               bcm->vote_x = 1;
> -               bcm->vote_y = 1;
> +               temp = agg_peak[bucket] * 1000ULL;
> +               do_div(temp, bcm->aux_data.unit);
> +               bcm->vote_y[bucket] = temp;
> +       }
> +
> +       if (bcm->keepalive && bcm->vote_x[0] == 0 && bcm->vote_y[0] == 0) {
> +               bcm->vote_x[QCOM_ICC_BUCKET_AMC] = 1;
> +               bcm->vote_x[QCOM_ICC_BUCKET_WAKE] = 1;
> +               bcm->vote_y[QCOM_ICC_BUCKET_AMC] = 1;
> +               bcm->vote_y[QCOM_ICC_BUCKET_WAKE] = 1;
>         }
>
>         bcm->dirty = false;
> @@ -631,15 +653,25 @@ static int qcom_icc_aggregate(struct icc_node *node, u32 tag, u32 avg_bw,
>  {
>         size_t i;
>         struct qcom_icc_node *qn;
> +       unsigned long tag_word = (unsigned long)tag;
>
>         qn = node->data;
>
> +       if (!tag)
> +               tag_word = QCOM_ICC_TAG_ALWAYS;
> +
> +       for (i = 0; i < QCOM_ICC_NUM_BUCKETS; i++) {
> +               if (test_bit(i, &tag_word)) {

I guess all this extra business with tag_word and casting is so that
you can use test_bit, which is presumably a tiny bit faster? Does this
actually make a measurable difference? Maybe in the name of simplicity
we just do if (tag & BIT(i)), and then optimize if we find that
conditional to be a hotspot?
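
For illustration, a minimal sketch of that simpler form (a suggestion only,
reusing the names already in the patch; the cached copies are omitted for
brevity):

	if (!tag)
		tag = QCOM_ICC_TAG_ALWAYS;

	for (i = 0; i < QCOM_ICC_NUM_BUCKETS; i++) {
		if (tag & BIT(i)) {
			qn->sum_avg[i] += avg_bw;
			qn->max_peak[i] = max_t(u32, qn->max_peak[i], peak_bw);
		}
	}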

> +                       qn->sum_avg[i] += avg_bw;
> +                       qn->max_peak[i] = max_t(u32, qn->max_peak[i], peak_bw);
> +                       qn->sum_avg_cached[i] = qn->sum_avg[i];
> +                       qn->max_peak_cached[i] = qn->max_peak[i];
> +               }
> +       }
> +
>         *agg_avg += avg_bw;
>         *agg_peak = max_t(u32, *agg_peak, peak_bw);
>
> -       qn->sum_avg = *agg_avg;
> -       qn->max_peak = *agg_peak;
> -
>         for (i = 0; i < qn->num_bcms; i++)
>                 qn->bcms[i]->dirty = true;
>
> @@ -675,7 +707,7 @@ static int qcom_icc_set(struct icc_node *src, struct icc_node *dst)
>          * Construct the command list based on a pre ordered list of BCMs
>          * based on VCD.
>          */
> -       tcs_list_gen(&commit_list, cmds, commit_idx);
> +       tcs_list_gen(&commit_list, QCOM_ICC_BUCKET_AMC, cmds, commit_idx);
>
>         if (!commit_idx[0])
>                 return ret;
> @@ -693,6 +725,41 @@ static int qcom_icc_set(struct icc_node *src, struct icc_node *dst)
>                 return ret;
>         }
>
> +       INIT_LIST_HEAD(&commit_list);
> +
> +       for (i = 0; i < qp->num_bcms; i++) {
> +               /*
> +                * Only generate WAKE and SLEEP commands if a resource's
> +                * requirements change as the execution environment transitions
> +                * between different power states.
> +                */
> +               if (qp->bcms[i]->vote_x[QCOM_ICC_BUCKET_WAKE] !=
> +                   qp->bcms[i]->vote_x[QCOM_ICC_BUCKET_SLEEP] ||
> +                   qp->bcms[i]->vote_y[QCOM_ICC_BUCKET_WAKE] !=
> +                   qp->bcms[i]->vote_y[QCOM_ICC_BUCKET_SLEEP]) {
> +                       list_add_tail(&qp->bcms[i]->list, &commit_list);
> +               }
> +       }
> +
> +       if (list_empty(&commit_list))
> +               return ret;
> +
> +       tcs_list_gen(&commit_list, QCOM_ICC_BUCKET_WAKE, cmds, commit_idx);
> +
> +       ret = rpmh_write_batch(qp->dev, RPMH_WAKE_ONLY_STATE, cmds, commit_idx);
> +       if (ret) {
> +               pr_err("Error sending WAKE RPMH requests (%d)\n", ret);
> +               return ret;
> +       }
> +
> +       tcs_list_gen(&commit_list, QCOM_ICC_BUCKET_SLEEP, cmds, commit_idx);
> +
> +       ret = rpmh_write_batch(qp->dev, RPMH_SLEEP_STATE, cmds, commit_idx);
> +       if (ret) {
> +               pr_err("Error sending SLEEP RPMH requests (%d)\n", ret);
> +               return ret;
> +       }
> +
>         return ret;
>  }
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] interconnect: qcom: Add tagging and wake/sleep support for sdm845
  2019-07-11 17:06   ` Evan Green
@ 2019-07-15 23:34     ` David Dai
  2019-07-16 20:15       ` Evan Green
  0 siblings, 1 reply; 14+ messages in thread
From: David Dai @ 2019-07-15 23:34 UTC (permalink / raw)
  To: Evan Green, Georgi Djakov
  Cc: linux-pm, Vincent Guittot, Bjorn Andersson, amit.kucheria,
	Doug Anderson, Sean Sweeney, LKML, linux-arm Mailing List,
	linux-arm-msm

Hi Evan,

Thanks for the continued help in reviewing these patches!

On 7/11/2019 10:06 AM, Evan Green wrote:
> Hi Georgi and David,
>
> On Tue, Jun 18, 2019 at 2:17 AM Georgi Djakov <georgi.djakov@linaro.org> wrote:
>> From: David Dai <daidavid1@codeaurora.org>
>>
>> Add support for wake and sleep commands by using a tag to indicate
>> whether or not the aggregate and set requests fall into execution
>> state specific bucket.
>>
>> Signed-off-by: David Dai <daidavid1@codeaurora.org>
>> Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
>> ---
>>   drivers/interconnect/qcom/sdm845.c | 129 ++++++++++++++++++++++-------
>>   1 file changed, 98 insertions(+), 31 deletions(-)
>>
>> diff --git a/drivers/interconnect/qcom/sdm845.c b/drivers/interconnect/qcom/sdm845.c
>> index fb526004c82e..c100aab39415 100644
>> --- a/drivers/interconnect/qcom/sdm845.c
>> +++ b/drivers/interconnect/qcom/sdm845.c
>> @@ -66,6 +66,17 @@ struct bcm_db {
>>   #define SDM845_MAX_BCM_PER_NODE        2
>>   #define SDM845_MAX_VCD         10
>>
>> +#define QCOM_ICC_BUCKET_AMC            0
> What is AMC again? Is it the "right now" bucket? Maybe a comment on
> the meaning of this bucket would be helpful.
That's correct. Will add a comment for this.
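
For instance, something along these lines (hypothetical wording, based on
the discussion above):

	/*
	 * The AMC bucket is applied immediately (the "right now" vote),
	 * while the WAKE and SLEEP buckets are only applied by RPMh when
	 * the execution environment transitions into the corresponding
	 * power state.
	 */
	#define QCOM_ICC_BUCKET_AMC		0
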
>
>> +#define QCOM_ICC_BUCKET_WAKE           1
>> +#define QCOM_ICC_BUCKET_SLEEP          2
>> +#define QCOM_ICC_NUM_BUCKETS           3
>> +#define QCOM_ICC_TAG_AMC               BIT(QCOM_ICC_BUCKET_AMC)
>> +#define QCOM_ICC_TAG_WAKE              BIT(QCOM_ICC_BUCKET_WAKE)
>> +#define QCOM_ICC_TAG_SLEEP             BIT(QCOM_ICC_BUCKET_SLEEP)
>> +#define QCOM_ICC_TAG_ACTIVE_ONLY       (QCOM_ICC_TAG_AMC | QCOM_ICC_TAG_WAKE)
>> +#define QCOM_ICC_TAG_ALWAYS            (QCOM_ICC_TAG_AMC | QCOM_ICC_TAG_WAKE |\
>> +                                        QCOM_ICC_TAG_SLEEP)
>> +
>>   /**
>>    * struct qcom_icc_node - Qualcomm specific interconnect nodes
>>    * @name: the node name used in debugfs
>> @@ -75,7 +86,9 @@ struct bcm_db {
>>    * @channels: num of channels at this node
>>    * @buswidth: width of the interconnect between a node and the bus
>>    * @sum_avg: current sum aggregate value of all avg bw requests
>> + * @sum_avg_cached: previous sum aggregate value of all avg bw requests
>>    * @max_peak: current max aggregate value of all peak bw requests
>> + * @max_peak_cached: previous max aggregate value of all peak bw requests
>>    * @bcms: list of bcms associated with this logical node
>>    * @num_bcms: num of @bcms
>>    */
>> @@ -86,8 +99,10 @@ struct qcom_icc_node {
>>          u16 num_links;
>>          u16 channels;
>>          u16 buswidth;
>> -       u64 sum_avg;
>> -       u64 max_peak;
>> +       u64 sum_avg[QCOM_ICC_NUM_BUCKETS];
>> +       u64 sum_avg_cached[QCOM_ICC_NUM_BUCKETS];
>> +       u64 max_peak[QCOM_ICC_NUM_BUCKETS];
>> +       u64 max_peak_cached[QCOM_ICC_NUM_BUCKETS];
>>          struct qcom_icc_bcm *bcms[SDM845_MAX_BCM_PER_NODE];
>>          size_t num_bcms;
>>   };
>> @@ -112,8 +127,8 @@ struct qcom_icc_bcm {
>>          const char *name;
>>          u32 type;
>>          u32 addr;
>> -       u64 vote_x;
>> -       u64 vote_y;
>> +       u64 vote_x[QCOM_ICC_NUM_BUCKETS];
>> +       u64 vote_y[QCOM_ICC_NUM_BUCKETS];
>>          bool dirty;
>>          bool keepalive;
>>          struct bcm_db aux_data;
>> @@ -555,7 +570,7 @@ inline void tcs_cmd_gen(struct tcs_cmd *cmd, u64 vote_x, u64 vote_y,
>>                  cmd->wait = true;
>>   }
>>
>> -static void tcs_list_gen(struct list_head *bcm_list,
>> +static void tcs_list_gen(struct list_head *bcm_list, int bucket,
>>                           struct tcs_cmd tcs_list[SDM845_MAX_VCD],
>>                           int n[SDM845_MAX_VCD])
>>   {
>> @@ -573,8 +588,8 @@ static void tcs_list_gen(struct list_head *bcm_list,
>>                          commit = true;
>>                          cur_vcd_size = 0;
>>                  }
>> -               tcs_cmd_gen(&tcs_list[idx], bcm->vote_x, bcm->vote_y,
>> -                           bcm->addr, commit);
>> +               tcs_cmd_gen(&tcs_list[idx], bcm->vote_x[bucket],
>> +                           bcm->vote_y[bucket], bcm->addr, commit);
>>                  idx++;
>>                  n[batch]++;
>>                  /*
>> @@ -595,32 +610,39 @@ static void tcs_list_gen(struct list_head *bcm_list,
>>
>>   static void bcm_aggregate(struct qcom_icc_bcm *bcm)
>>   {
>> -       size_t i;
>> -       u64 agg_avg = 0;
>> -       u64 agg_peak = 0;
>> +       size_t i, bucket;
>> +       u64 agg_avg[QCOM_ICC_NUM_BUCKETS] = {0};
>> +       u64 agg_peak[QCOM_ICC_NUM_BUCKETS] = {0};
>>          u64 temp;
>>
>> -       for (i = 0; i < bcm->num_nodes; i++) {
>> -               temp = bcm->nodes[i]->sum_avg * bcm->aux_data.width;
>> -               do_div(temp, bcm->nodes[i]->buswidth * bcm->nodes[i]->channels);
>> -               agg_avg = max(agg_avg, temp);
>> +       for (bucket = 0; bucket < QCOM_ICC_NUM_BUCKETS; bucket++) {
>> +               for (i = 0; i < bcm->num_nodes; i++) {
>> +                       temp = bcm->nodes[i]->sum_avg_cached[bucket] * bcm->aux_data.width;
>> +                       do_div(temp, bcm->nodes[i]->buswidth * bcm->nodes[i]->channels);
>> +                       agg_avg[bucket] = max(agg_avg[bucket], temp);
>>
>> -               temp = bcm->nodes[i]->max_peak * bcm->aux_data.width;
>> -               do_div(temp, bcm->nodes[i]->buswidth);
> Why is it that this one doesn't have the multiply by
> bcm->nodes[i]->channels again? I can't recall if there was a reason.
> If it's correct maybe it deserves a comment.

I think the rationale behind this is generally for consumers to target a 
certain minimum threshold to satisfy some structural latency 
requirements as opposed to strictly throughput, and it may be easier for 
consumers to reuse certain values to support hitting some minimum NoC 
frequencies without having to be concerned with the number of channels 
that may change from platform to platform.

>
>> -               agg_peak = max(agg_peak, temp);
>> -       }
>> +                       temp = bcm->nodes[i]->max_peak_cached[bucket] * bcm->aux_data.width;
>> +                       do_div(temp, bcm->nodes[i]->buswidth);
>> +                       agg_peak[bucket] = max(agg_peak[bucket], temp);
>>
>> -       temp = agg_avg * 1000ULL;
>> -       do_div(temp, bcm->aux_data.unit);
>> -       bcm->vote_x = temp;
>> +                       bcm->nodes[i]->sum_avg[bucket] = 0;
>> +                       bcm->nodes[i]->max_peak[bucket] = 0;
> I don't understand the sum_avg vs sum_avg_cached. Here's what I understand:
> 1. qcom_icc_aggregate() does the math from the incoming values on
> sum_avg, and then clobbers sum_avg_cached with those values.
> 2. bcm_aggregate() uses sum_avg_cached in its calculations, then clears sum_avg.
>
> But I don't get why that's needed. Why not just have sum_avg? Wouldn't
> it work the same? Ok, it wouldn't if you ended up calling
> bcm_aggregate() multiple times on the same bcm. But you have a dirty
> flag that prevents this from happening. So I think it's safe to remove
> the cached arrays, and just clear out the sum_avg when you aggregate.
You are correct in that the dirty flag would prevent another repeat of
the bcm_aggregate() call in the same icc_set request. But consider a
following icc_set request on a different node that shares the same BCM:
the next bcm_aggregate() would result in an incorrect aggregate sum_avg
for the BCM, since the sum_avg from the previous node (from the previous
icc_set) was cleared out. We need a way to retain the current state of
all nodes to accurately aggregate the bw values for the BCM.
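
To make that concrete, a hypothetical sequence with two nodes A and B
sharing one BCM (numbers made up, per-bucket indices omitted):

  1. icc_set_bw() on a path through A: qcom_icc_aggregate() sets A's
     sum_avg = sum_avg_cached = 100; bcm_aggregate() votes using the
     cached values and clears A's sum_avg.
  2. A later icc_set_bw() on a path through B re-aggregates only B's
     requests, so A's sum_avg stays 0. Because A's sum_avg_cached still
     holds 100, bcm_aggregate() keeps A's contribution in the BCM vote
     instead of silently dropping it.
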
>> +               }
>>
>> -       temp = agg_peak * 1000ULL;
>> -       do_div(temp, bcm->aux_data.unit);
>> -       bcm->vote_y = temp;
>> +               temp = agg_avg[bucket] * 1000ULL;
>> +               do_div(temp, bcm->aux_data.unit);
>> +               bcm->vote_x[bucket] = temp;
>>
>> -       if (bcm->keepalive && bcm->vote_x == 0 && bcm->vote_y == 0) {
>> -               bcm->vote_x = 1;
>> -               bcm->vote_y = 1;
>> +               temp = agg_peak[bucket] * 1000ULL;
>> +               do_div(temp, bcm->aux_data.unit);
>> +               bcm->vote_y[bucket] = temp;
>> +       }
>> +
>> +       if (bcm->keepalive && bcm->vote_x[0] == 0 && bcm->vote_y[0] == 0) {
>> +               bcm->vote_x[QCOM_ICC_BUCKET_AMC] = 1;
>> +               bcm->vote_x[QCOM_ICC_BUCKET_WAKE] = 1;
>> +               bcm->vote_y[QCOM_ICC_BUCKET_AMC] = 1;
>> +               bcm->vote_y[QCOM_ICC_BUCKET_WAKE] = 1;
>>          }
>>
>>          bcm->dirty = false;
>> @@ -631,15 +653,25 @@ static int qcom_icc_aggregate(struct icc_node *node, u32 tag, u32 avg_bw,
>>   {
>>          size_t i;
>>          struct qcom_icc_node *qn;
>> +       unsigned long tag_word = (unsigned long)tag;
>>
>>          qn = node->data;
>>
>> +       if (!tag)
>> +               tag_word = QCOM_ICC_TAG_ALWAYS;
>> +
>> +       for (i = 0; i < QCOM_ICC_NUM_BUCKETS; i++) {
>> +               if (test_bit(i, &tag_word)) {
> I guess all this extra business with tag_word and casting is so that
> you can use test_bit, which is presumably a tiny bit faster? Does this
> actually make a measurable difference? Maybe in the name of simplicity
> we just do if (tag & BIT(i)), and then optimize if we find that
> conditional to be a hotspot?
Using (tag & BIT(i)) as opposed to test_bit seems reasonable to me.
>> +                       qn->sum_avg[i] += avg_bw;
>> +                       qn->max_peak[i] = max_t(u32, qn->max_peak[i], peak_bw);
>> +                       qn->sum_avg_cached[i] = qn->sum_avg[i];
>> +                       qn->max_peak_cached[i] = qn->max_peak[i];
>> +               }
>> +       }
>> +
>>          *agg_avg += avg_bw;
>>          *agg_peak = max_t(u32, *agg_peak, peak_bw);
>>
>> -       qn->sum_avg = *agg_avg;
>> -       qn->max_peak = *agg_peak;
>> -
>>          for (i = 0; i < qn->num_bcms; i++)
>>                  qn->bcms[i]->dirty = true;
>>
>> @@ -675,7 +707,7 @@ static int qcom_icc_set(struct icc_node *src, struct icc_node *dst)
>>           * Construct the command list based on a pre ordered list of BCMs
>>           * based on VCD.
>>           */
>> -       tcs_list_gen(&commit_list, cmds, commit_idx);
>> +       tcs_list_gen(&commit_list, QCOM_ICC_BUCKET_AMC, cmds, commit_idx);
>>
>>          if (!commit_idx[0])
>>                  return ret;
>> @@ -693,6 +725,41 @@ static int qcom_icc_set(struct icc_node *src, struct icc_node *dst)
>>                  return ret;
>>          }
>>
>> +       INIT_LIST_HEAD(&commit_list);
>> +
>> +       for (i = 0; i < qp->num_bcms; i++) {
>> +               /*
>> +                * Only generate WAKE and SLEEP commands if a resource's
>> +                * requirements change as the execution environment transitions
>> +                * between different power states.
>> +                */
>> +               if (qp->bcms[i]->vote_x[QCOM_ICC_BUCKET_WAKE] !=
>> +                   qp->bcms[i]->vote_x[QCOM_ICC_BUCKET_SLEEP] ||
>> +                   qp->bcms[i]->vote_y[QCOM_ICC_BUCKET_WAKE] !=
>> +                   qp->bcms[i]->vote_y[QCOM_ICC_BUCKET_SLEEP]) {
>> +                       list_add_tail(&qp->bcms[i]->list, &commit_list);
>> +               }
>> +       }
>> +
>> +       if (list_empty(&commit_list))
>> +               return ret;
>> +
>> +       tcs_list_gen(&commit_list, QCOM_ICC_BUCKET_WAKE, cmds, commit_idx);
>> +
>> +       ret = rpmh_write_batch(qp->dev, RPMH_WAKE_ONLY_STATE, cmds, commit_idx);
>> +       if (ret) {
>> +               pr_err("Error sending WAKE RPMH requests (%d)\n", ret);
>> +               return ret;
>> +       }
>> +
>> +       tcs_list_gen(&commit_list, QCOM_ICC_BUCKET_SLEEP, cmds, commit_idx);
>> +
>> +       ret = rpmh_write_batch(qp->dev, RPMH_SLEEP_STATE, cmds, commit_idx);
>> +       if (ret) {
>> +               pr_err("Error sending SLEEP RPMH requests (%d)\n", ret);
>> +               return ret;
>> +       }
>> +
>>          return ret;
>>   }
>>
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] interconnect: qcom: Add tagging and wake/sleep support for sdm845
  2019-07-15 23:34     ` David Dai
@ 2019-07-16 20:15       ` Evan Green
  2019-07-18 17:59         ` David Dai
  0 siblings, 1 reply; 14+ messages in thread
From: Evan Green @ 2019-07-16 20:15 UTC (permalink / raw)
  To: David Dai
  Cc: Georgi Djakov, linux-pm, Vincent Guittot, Bjorn Andersson,
	amit.kucheria, Doug Anderson, Sean Sweeney, LKML,
	linux-arm Mailing List, linux-arm-msm

On Mon, Jul 15, 2019 at 4:34 PM David Dai <daidavid1@codeaurora.org> wrote:
>
> Hi Evan,
>
> Thanks for the continued help in reviewing these patches!

No problem. I want to do more, but haven't found time to do the
prerequisite research before jumping into some of the other
discussions yet.

>
> On 7/11/2019 10:06 AM, Evan Green wrote:
> > Hi Georgi and David,
> >
> > On Tue, Jun 18, 2019 at 2:17 AM Georgi Djakov <georgi.djakov@linaro.org> wrote:
> >> From: David Dai <daidavid1@codeaurora.org>
> >>
> >> Add support for wake and sleep commands by using a tag to indicate
> >> whether or not the aggregate and set requests fall into execution
> >> state specific bucket.
> >>
> >> Signed-off-by: David Dai <daidavid1@codeaurora.org>
> >> Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
> >> ---
> >>   drivers/interconnect/qcom/sdm845.c | 129 ++++++++++++++++++++++-------
> >>   1 file changed, 98 insertions(+), 31 deletions(-)
> >>
> >> diff --git a/drivers/interconnect/qcom/sdm845.c b/drivers/interconnect/qcom/sdm845.c
> >> index fb526004c82e..c100aab39415 100644
> >> --- a/drivers/interconnect/qcom/sdm845.c
> >> +++ b/drivers/interconnect/qcom/sdm845.c
> >> @@ -66,6 +66,17 @@ struct bcm_db {
> >>   #define SDM845_MAX_BCM_PER_NODE        2
> >>   #define SDM845_MAX_VCD         10
> >>
> >> +#define QCOM_ICC_BUCKET_AMC            0
> > What is AMC again? Is it the "right now" bucket? Maybe a comment on
> > the meaning of this bucket would be helpful.
> That's correct. Will add a comment for this.
> >
> >> +#define QCOM_ICC_BUCKET_WAKE           1
> >> +#define QCOM_ICC_BUCKET_SLEEP          2
> >> +#define QCOM_ICC_NUM_BUCKETS           3
> >> +#define QCOM_ICC_TAG_AMC               BIT(QCOM_ICC_BUCKET_AMC)
> >> +#define QCOM_ICC_TAG_WAKE              BIT(QCOM_ICC_BUCKET_WAKE)
> >> +#define QCOM_ICC_TAG_SLEEP             BIT(QCOM_ICC_BUCKET_SLEEP)
> >> +#define QCOM_ICC_TAG_ACTIVE_ONLY       (QCOM_ICC_TAG_AMC | QCOM_ICC_TAG_WAKE)
> >> +#define QCOM_ICC_TAG_ALWAYS            (QCOM_ICC_TAG_AMC | QCOM_ICC_TAG_WAKE |\
> >> +                                        QCOM_ICC_TAG_SLEEP)
> >> +
> >>   /**
> >>    * struct qcom_icc_node - Qualcomm specific interconnect nodes
> >>    * @name: the node name used in debugfs
> >> @@ -75,7 +86,9 @@ struct bcm_db {
> >>    * @channels: num of channels at this node
> >>    * @buswidth: width of the interconnect between a node and the bus
> >>    * @sum_avg: current sum aggregate value of all avg bw requests
> >> + * @sum_avg_cached: previous sum aggregate value of all avg bw requests
> >>    * @max_peak: current max aggregate value of all peak bw requests
> >> + * @max_peak_cached: previous max aggregate value of all peak bw requests
> >>    * @bcms: list of bcms associated with this logical node
> >>    * @num_bcms: num of @bcms
> >>    */
> >> @@ -86,8 +99,10 @@ struct qcom_icc_node {
> >>          u16 num_links;
> >>          u16 channels;
> >>          u16 buswidth;
> >> -       u64 sum_avg;
> >> -       u64 max_peak;
> >> +       u64 sum_avg[QCOM_ICC_NUM_BUCKETS];
> >> +       u64 sum_avg_cached[QCOM_ICC_NUM_BUCKETS];
> >> +       u64 max_peak[QCOM_ICC_NUM_BUCKETS];
> >> +       u64 max_peak_cached[QCOM_ICC_NUM_BUCKETS];
> >>          struct qcom_icc_bcm *bcms[SDM845_MAX_BCM_PER_NODE];
> >>          size_t num_bcms;
> >>   };
> >> @@ -112,8 +127,8 @@ struct qcom_icc_bcm {
> >>          const char *name;
> >>          u32 type;
> >>          u32 addr;
> >> -       u64 vote_x;
> >> -       u64 vote_y;
> >> +       u64 vote_x[QCOM_ICC_NUM_BUCKETS];
> >> +       u64 vote_y[QCOM_ICC_NUM_BUCKETS];
> >>          bool dirty;
> >>          bool keepalive;
> >>          struct bcm_db aux_data;
> >> @@ -555,7 +570,7 @@ inline void tcs_cmd_gen(struct tcs_cmd *cmd, u64 vote_x, u64 vote_y,
> >>                  cmd->wait = true;
> >>   }
> >>
> >> -static void tcs_list_gen(struct list_head *bcm_list,
> >> +static void tcs_list_gen(struct list_head *bcm_list, int bucket,
> >>                           struct tcs_cmd tcs_list[SDM845_MAX_VCD],
> >>                           int n[SDM845_MAX_VCD])
> >>   {
> >> @@ -573,8 +588,8 @@ static void tcs_list_gen(struct list_head *bcm_list,
> >>                          commit = true;
> >>                          cur_vcd_size = 0;
> >>                  }
> >> -               tcs_cmd_gen(&tcs_list[idx], bcm->vote_x, bcm->vote_y,
> >> -                           bcm->addr, commit);
> >> +               tcs_cmd_gen(&tcs_list[idx], bcm->vote_x[bucket],
> >> +                           bcm->vote_y[bucket], bcm->addr, commit);
> >>                  idx++;
> >>                  n[batch]++;
> >>                  /*
> >> @@ -595,32 +610,39 @@ static void tcs_list_gen(struct list_head *bcm_list,
> >>
> >>   static void bcm_aggregate(struct qcom_icc_bcm *bcm)
> >>   {
> >> -       size_t i;
> >> -       u64 agg_avg = 0;
> >> -       u64 agg_peak = 0;
> >> +       size_t i, bucket;
> >> +       u64 agg_avg[QCOM_ICC_NUM_BUCKETS] = {0};
> >> +       u64 agg_peak[QCOM_ICC_NUM_BUCKETS] = {0};
> >>          u64 temp;
> >>
> >> -       for (i = 0; i < bcm->num_nodes; i++) {
> >> -               temp = bcm->nodes[i]->sum_avg * bcm->aux_data.width;
> >> -               do_div(temp, bcm->nodes[i]->buswidth * bcm->nodes[i]->channels);
> >> -               agg_avg = max(agg_avg, temp);
> >> +       for (bucket = 0; bucket < QCOM_ICC_NUM_BUCKETS; bucket++) {
> >> +               for (i = 0; i < bcm->num_nodes; i++) {
> >> +                       temp = bcm->nodes[i]->sum_avg_cached[bucket] * bcm->aux_data.width;
> >> +                       do_div(temp, bcm->nodes[i]->buswidth * bcm->nodes[i]->channels);
> >> +                       agg_avg[bucket] = max(agg_avg[bucket], temp);
> >>
> >> -               temp = bcm->nodes[i]->max_peak * bcm->aux_data.width;
> >> -               do_div(temp, bcm->nodes[i]->buswidth);
> > Why is it that this one doesn't have the multiply by
> > bcm->nodes[i]->channels again? I can't recall if there was a reason.
> > If it's correct maybe it deserves a comment.
>
> I think the rationale behind this is generally for consumers to target a
> certain minimum threshold to satisfy some structural latency
> requirements as opposed to strictly throughput, and it may be easier for
> consumers to reuse certain values to support hitting some minimum NoC
> frequencies without having to be concerned with the number of channels
> that may change from platform to platform.

I was mostly pointing out that sum_avg seems to have the multiply, but
max_peak does not. I would have expected those two things to be of the
same units, and get the same treatment. Maybe the hardware is taking
in different final units for that field, one that is per-channel and
one that isn't?

>
> >
> >> -               agg_peak = max(agg_peak, temp);
> >> -       }
> >> +                       temp = bcm->nodes[i]->max_peak_cached[bucket] * bcm->aux_data.width;
> >> +                       do_div(temp, bcm->nodes[i]->buswidth);
> >> +                       agg_peak[bucket] = max(agg_peak[bucket], temp);
> >>
> >> -       temp = agg_avg * 1000ULL;
> >> -       do_div(temp, bcm->aux_data.unit);
> >> -       bcm->vote_x = temp;
> >> +                       bcm->nodes[i]->sum_avg[bucket] = 0;
> >> +                       bcm->nodes[i]->max_peak[bucket] = 0;
> > I don't understand the sum_avg vs sum_avg_cached. Here's what I understand:
> > 1. qcom_icc_aggregate() does the math from the incoming values on
> > sum_avg, and then clobbers sum_avg_cached with those values.
> > 2. bcm_aggregate() uses sum_avg_cached in its calculations, then clears sum_avg.
> >
> > But I don't get why that's needed. Why not just have sum_avg? Wouldn't
> > it work the same? Ok, it wouldn't if you ended up calling
> > bcm_aggregate() multiple times on the same bcm. But you have a dirty
> > flag that prevents this from happening. So I think it's safe to remove
> > the cached arrays, and just clear out the sum_avg when you aggregate.
> You are correct in that the dirty flag would prevent another repeat of
> the bcm_aggregate() call in the same icc_set request. But consider a
> following icc_set request on a different node that shares the same BCM:
> the next bcm_aggregate() would result in an incorrect aggregate sum_avg
> for the BCM, since the sum_avg from the previous node (from the previous
> icc_set) was cleared out. We need a way to retain the current state of
> all nodes to accurately aggregate the bw values for the BCM.

I don't get it. qcom_icc_aggregate() clobbers sum_avg_cached. So
they're only ever a) equal, like after qcom_icc_aggregate(), or b)
sum_avg is zeroed, and sum_avg_cached is its old value. A new
icc_set_bw() would call aggregate_requests(), which would clobber
sum_avg_cached to sum_avg for every BCM involved. Then the core would
call apply_constraints(), then qcom_icc_set(), which would use
sum_avg_cached, and clear out sum_avg, being sure with the dirty flag
that bcm_aggregate() is only called once per BCM. This all happens
under the mutex held in the core. A new request would start the whole
thing over, since sum_avg is cleared. It seems to me that flow would
work the same with one array as it does with two. Maybe you can walk
me through a scenario?
-Evan


> >> +               }
> >>
> >> -       temp = agg_peak * 1000ULL;
> >> -       do_div(temp, bcm->aux_data.unit);
> >> -       bcm->vote_y = temp;
> >> +               temp = agg_avg[bucket] * 1000ULL;
> >> +               do_div(temp, bcm->aux_data.unit);
> >> +               bcm->vote_x[bucket] = temp;
> >>
> >> -       if (bcm->keepalive && bcm->vote_x == 0 && bcm->vote_y == 0) {
> >> -               bcm->vote_x = 1;
> >> -               bcm->vote_y = 1;
> >> +               temp = agg_peak[bucket] * 1000ULL;
> >> +               do_div(temp, bcm->aux_data.unit);
> >> +               bcm->vote_y[bucket] = temp;
> >> +       }
> >> +
> >> +       if (bcm->keepalive && bcm->vote_x[0] == 0 && bcm->vote_y[0] == 0) {
> >> +               bcm->vote_x[QCOM_ICC_BUCKET_AMC] = 1;
> >> +               bcm->vote_x[QCOM_ICC_BUCKET_WAKE] = 1;
> >> +               bcm->vote_y[QCOM_ICC_BUCKET_AMC] = 1;
> >> +               bcm->vote_y[QCOM_ICC_BUCKET_WAKE] = 1;
> >>          }
> >>
> >>          bcm->dirty = false;
> >> @@ -631,15 +653,25 @@ static int qcom_icc_aggregate(struct icc_node *node, u32 tag, u32 avg_bw,
> >>   {
> >>          size_t i;
> >>          struct qcom_icc_node *qn;
> >> +       unsigned long tag_word = (unsigned long)tag;
> >>
> >>          qn = node->data;
> >>
> >> +       if (!tag)
> >> +               tag_word = QCOM_ICC_TAG_ALWAYS;
> >> +
> >> +       for (i = 0; i < QCOM_ICC_NUM_BUCKETS; i++) {
> >> +               if (test_bit(i, &tag_word)) {
> > I guess all this extra business with tag_word and casting is so that
> > you can use test_bit, which is presumably a tiny bit faster? Does this
> > actually make a measurable difference? Maybe in the name of simplicity
> > we just do if (tag & BIT(i)), and then optimize if we find that
> > conditional to be a hotspot?
> Using (tag & BIT(i)) as opposed to test_bit seems reasonable to me.
> >> +                       qn->sum_avg[i] += avg_bw;
> >> +                       qn->max_peak[i] = max_t(u32, qn->max_peak[i], peak_bw);
> >> +                       qn->sum_avg_cached[i] = qn->sum_avg[i];
> >> +                       qn->max_peak_cached[i] = qn->max_peak[i];
> >> +               }
> >> +       }
> >> +
> >>          *agg_avg += avg_bw;
> >>          *agg_peak = max_t(u32, *agg_peak, peak_bw);
> >>
> >> -       qn->sum_avg = *agg_avg;
> >> -       qn->max_peak = *agg_peak;
> >> -
> >>          for (i = 0; i < qn->num_bcms; i++)
> >>                  qn->bcms[i]->dirty = true;
> >>
> >> @@ -675,7 +707,7 @@ static int qcom_icc_set(struct icc_node *src, struct icc_node *dst)
> >>           * Construct the command list based on a pre ordered list of BCMs
> >>           * based on VCD.
> >>           */
> >> -       tcs_list_gen(&commit_list, cmds, commit_idx);
> >> +       tcs_list_gen(&commit_list, QCOM_ICC_BUCKET_AMC, cmds, commit_idx);
> >>
> >>          if (!commit_idx[0])
> >>                  return ret;
> >> @@ -693,6 +725,41 @@ static int qcom_icc_set(struct icc_node *src, struct icc_node *dst)
> >>                  return ret;
> >>          }
> >>
> >> +       INIT_LIST_HEAD(&commit_list);
> >> +
> >> +       for (i = 0; i < qp->num_bcms; i++) {
> >> +               /*
> >> +                * Only generate WAKE and SLEEP commands if a resource's
> >> +                * requirements change as the execution environment transitions
> >> +                * between different power states.
> >> +                */
> >> +               if (qp->bcms[i]->vote_x[QCOM_ICC_BUCKET_WAKE] !=
> >> +                   qp->bcms[i]->vote_x[QCOM_ICC_BUCKET_SLEEP] ||
> >> +                   qp->bcms[i]->vote_y[QCOM_ICC_BUCKET_WAKE] !=
> >> +                   qp->bcms[i]->vote_y[QCOM_ICC_BUCKET_SLEEP]) {
> >> +                       list_add_tail(&qp->bcms[i]->list, &commit_list);
> >> +               }
> >> +       }
> >> +
> >> +       if (list_empty(&commit_list))
> >> +               return ret;
> >> +
> >> +       tcs_list_gen(&commit_list, QCOM_ICC_BUCKET_WAKE, cmds, commit_idx);
> >> +
> >> +       ret = rpmh_write_batch(qp->dev, RPMH_WAKE_ONLY_STATE, cmds, commit_idx);
> >> +       if (ret) {
> >> +               pr_err("Error sending WAKE RPMH requests (%d)\n", ret);
> >> +               return ret;
> >> +       }
> >> +
> >> +       tcs_list_gen(&commit_list, QCOM_ICC_BUCKET_SLEEP, cmds, commit_idx);
> >> +
> >> +       ret = rpmh_write_batch(qp->dev, RPMH_SLEEP_STATE, cmds, commit_idx);
> >> +       if (ret) {
> >> +               pr_err("Error sending SLEEP RPMH requests (%d)\n", ret);
> >> +               return ret;
> >> +       }
> >> +
> >>          return ret;
> >>   }
> >>
> --
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] interconnect: qcom: Add tagging and wake/sleep support for sdm845
  2019-07-16 20:15       ` Evan Green
@ 2019-07-18 17:59         ` David Dai
  2019-07-30 22:54           ` Evan Green
  0 siblings, 1 reply; 14+ messages in thread
From: David Dai @ 2019-07-18 17:59 UTC (permalink / raw)
  To: Evan Green
  Cc: Georgi Djakov, linux-pm, Vincent Guittot, Bjorn Andersson,
	amit.kucheria, Doug Anderson, Sean Sweeney, LKML,
	linux-arm Mailing List, linux-arm-msm

On 7/16/2019 1:15 PM, Evan Green wrote:
> On Mon, Jul 15, 2019 at 4:34 PM David Dai <daidavid1@codeaurora.org> wrote:
>> Hi Evan,
>>
>> Thanks for the continued help in reviewing these patches!
> No problem. I want to do more, but haven't found time to do the
> prerequisite research before jumping into some of the other
> discussions yet.
>
>> On 7/11/2019 10:06 AM, Evan Green wrote:
>>> Hi Georgi and David,
>>>
>>> On Tue, Jun 18, 2019 at 2:17 AM Georgi Djakov <georgi.djakov@linaro.org> wrote:
>>>> From: David Dai <daidavid1@codeaurora.org>
>>>>
>>>> Add support for wake and sleep commands by using a tag to indicate
>>>> whether or not the aggregate and set requests fall into execution
>>>> state specific bucket.
>>>>
>>>> Signed-off-by: David Dai <daidavid1@codeaurora.org>
>>>> Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
>>>> ---
>>>>    drivers/interconnect/qcom/sdm845.c | 129 ++++++++++++++++++++++-------
>>>>    1 file changed, 98 insertions(+), 31 deletions(-)
>>>>
>>>> diff --git a/drivers/interconnect/qcom/sdm845.c b/drivers/interconnect/qcom/sdm845.c
>>>> index fb526004c82e..c100aab39415 100644
>>>> --- a/drivers/interconnect/qcom/sdm845.c
>>>> +++ b/drivers/interconnect/qcom/sdm845.c
>>>> @@ -66,6 +66,17 @@ struct bcm_db {
>>>>    #define SDM845_MAX_BCM_PER_NODE        2
>>>>    #define SDM845_MAX_VCD         10
>>>>
>>>> +#define QCOM_ICC_BUCKET_AMC            0
>>> What is AMC again? Is it the "right now" bucket? Maybe a comment on
>>> the meaning of this bucket would be helpful.
>> That's correct. Will add a comment for this.
>>>> +#define QCOM_ICC_BUCKET_WAKE           1
>>>> +#define QCOM_ICC_BUCKET_SLEEP          2
>>>> +#define QCOM_ICC_NUM_BUCKETS           3
>>>> +#define QCOM_ICC_TAG_AMC               BIT(QCOM_ICC_BUCKET_AMC)
>>>> +#define QCOM_ICC_TAG_WAKE              BIT(QCOM_ICC_BUCKET_WAKE)
>>>> +#define QCOM_ICC_TAG_SLEEP             BIT(QCOM_ICC_BUCKET_SLEEP)
>>>> +#define QCOM_ICC_TAG_ACTIVE_ONLY       (QCOM_ICC_TAG_AMC | QCOM_ICC_TAG_WAKE)
>>>> +#define QCOM_ICC_TAG_ALWAYS            (QCOM_ICC_TAG_AMC | QCOM_ICC_TAG_WAKE |\
>>>> +                                        QCOM_ICC_TAG_SLEEP)
>>>> +
>>>>    /**
>>>>     * struct qcom_icc_node - Qualcomm specific interconnect nodes
>>>>     * @name: the node name used in debugfs
>>>> @@ -75,7 +86,9 @@ struct bcm_db {
>>>>     * @channels: num of channels at this node
>>>>     * @buswidth: width of the interconnect between a node and the bus
>>>>     * @sum_avg: current sum aggregate value of all avg bw requests
>>>> + * @sum_avg_cached: previous sum aggregate value of all avg bw requests
>>>>     * @max_peak: current max aggregate value of all peak bw requests
>>>> + * @max_peak_cached: previous max aggregate value of all peak bw requests
>>>>     * @bcms: list of bcms associated with this logical node
>>>>     * @num_bcms: num of @bcms
>>>>     */
>>>> @@ -86,8 +99,10 @@ struct qcom_icc_node {
>>>>           u16 num_links;
>>>>           u16 channels;
>>>>           u16 buswidth;
>>>> -       u64 sum_avg;
>>>> -       u64 max_peak;
>>>> +       u64 sum_avg[QCOM_ICC_NUM_BUCKETS];
>>>> +       u64 sum_avg_cached[QCOM_ICC_NUM_BUCKETS];
>>>> +       u64 max_peak[QCOM_ICC_NUM_BUCKETS];
>>>> +       u64 max_peak_cached[QCOM_ICC_NUM_BUCKETS];
>>>>           struct qcom_icc_bcm *bcms[SDM845_MAX_BCM_PER_NODE];
>>>>           size_t num_bcms;
>>>>    };
>>>> @@ -112,8 +127,8 @@ struct qcom_icc_bcm {
>>>>           const char *name;
>>>>           u32 type;
>>>>           u32 addr;
>>>> -       u64 vote_x;
>>>> -       u64 vote_y;
>>>> +       u64 vote_x[QCOM_ICC_NUM_BUCKETS];
>>>> +       u64 vote_y[QCOM_ICC_NUM_BUCKETS];
>>>>           bool dirty;
>>>>           bool keepalive;
>>>>           struct bcm_db aux_data;
>>>> @@ -555,7 +570,7 @@ inline void tcs_cmd_gen(struct tcs_cmd *cmd, u64 vote_x, u64 vote_y,
>>>>                   cmd->wait = true;
>>>>    }
>>>>
>>>> -static void tcs_list_gen(struct list_head *bcm_list,
>>>> +static void tcs_list_gen(struct list_head *bcm_list, int bucket,
>>>>                            struct tcs_cmd tcs_list[SDM845_MAX_VCD],
>>>>                            int n[SDM845_MAX_VCD])
>>>>    {
>>>> @@ -573,8 +588,8 @@ static void tcs_list_gen(struct list_head *bcm_list,
>>>>                           commit = true;
>>>>                           cur_vcd_size = 0;
>>>>                   }
>>>> -               tcs_cmd_gen(&tcs_list[idx], bcm->vote_x, bcm->vote_y,
>>>> -                           bcm->addr, commit);
>>>> +               tcs_cmd_gen(&tcs_list[idx], bcm->vote_x[bucket],
>>>> +                           bcm->vote_y[bucket], bcm->addr, commit);
>>>>                   idx++;
>>>>                   n[batch]++;
>>>>                   /*
>>>> @@ -595,32 +610,39 @@ static void tcs_list_gen(struct list_head *bcm_list,
>>>>
>>>>    static void bcm_aggregate(struct qcom_icc_bcm *bcm)
>>>>    {
>>>> -       size_t i;
>>>> -       u64 agg_avg = 0;
>>>> -       u64 agg_peak = 0;
>>>> +       size_t i, bucket;
>>>> +       u64 agg_avg[QCOM_ICC_NUM_BUCKETS] = {0};
>>>> +       u64 agg_peak[QCOM_ICC_NUM_BUCKETS] = {0};
>>>>           u64 temp;
>>>>
>>>> -       for (i = 0; i < bcm->num_nodes; i++) {
>>>> -               temp = bcm->nodes[i]->sum_avg * bcm->aux_data.width;
>>>> -               do_div(temp, bcm->nodes[i]->buswidth * bcm->nodes[i]->channels);
>>>> -               agg_avg = max(agg_avg, temp);
>>>> +       for (bucket = 0; bucket < QCOM_ICC_NUM_BUCKETS; bucket++) {
>>>> +               for (i = 0; i < bcm->num_nodes; i++) {
>>>> +                       temp = bcm->nodes[i]->sum_avg_cached[bucket] * bcm->aux_data.width;
>>>> +                       do_div(temp, bcm->nodes[i]->buswidth * bcm->nodes[i]->channels);
>>>> +                       agg_avg[bucket] = max(agg_avg[bucket], temp);
>>>>
>>>> -               temp = bcm->nodes[i]->max_peak * bcm->aux_data.width;
>>>> -               do_div(temp, bcm->nodes[i]->buswidth);
>>> Why is it that this one doesn't have the multiply by
>>> bcm->nodes[i]->channels again? I can't recall if there was a reason.
>>> If it's correct maybe it deserves a comment.
>> I think the rationale behind this is generally for consumers to target a
>> certain minimum threshold to satisfy some structural latency
>> requirements as opposed to strictly throughput, and it may be easier for
>> consumers to reuse certain values to support hitting some minimum NoC
>> frequencies without having to be concerned with the number of channels
>> that may change from platform to platform.
> I was mostly pointing out that sum_avg seems to have the multiply, but
> max_peak does not. I would have expected those two things to be of the
> same units, and get the same treatment. Maybe the hardware is taking
> in different final units for that field, one that is per-channel and
> one that isn't?

The hardware isn't treating the values differently. I couldn't find any
justification other than the intuition mentioned above, which is about
making voting easier from the consumer perspective. The consumer would
know that this peak_bw value results in some floor performance from the
system that satisfies its latency requirements. The same approach would
work if we accounted for the number of channels as well, but channels
may vary from platform to platform, or even on the same platform when
multiple channel configurations (DDR) are in play, so consumers would
have to keep track of that and constantly adjust their votes to hit a
given frequency/latency target. This intuition doesn't apply to avg_bw,
since there we're concerned with throughput.
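
To spell that out with the aggregation math from bcm_aggregate() above
(per bucket, before the final scaling by 1000 / aux_data.unit):

  agg_avg  = max(agg_avg,  sum_avg_cached  * aux_data.width / (buswidth * channels))
  agg_peak = max(agg_peak, max_peak_cached * aux_data.width / buswidth)

With purely hypothetical numbers (aux_data.width = 16, buswidth = 4,
channels = 2), a node whose sum_avg and max_peak are both 100MBps
contributes 200 toward agg_avg but 400 toward agg_peak, independent of
the channel count.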

>>>> -               agg_peak = max(agg_peak, temp);
>>>> -       }
>>>> +                       temp = bcm->nodes[i]->max_peak_cached[bucket] * bcm->aux_data.width;
>>>> +                       do_div(temp, bcm->nodes[i]->buswidth);
>>>> +                       agg_peak[bucket] = max(agg_peak[bucket], temp);
>>>>
>>>> -       temp = agg_avg * 1000ULL;
>>>> -       do_div(temp, bcm->aux_data.unit);
>>>> -       bcm->vote_x = temp;
>>>> +                       bcm->nodes[i]->sum_avg[bucket] = 0;
>>>> +                       bcm->nodes[i]->max_peak[bucket] = 0;
>>> I don't understand the sum_avg vs sum_avg_cached. Here's what I understand:
>>> 1. qcom_icc_aggregate() does the math from the incoming values on
>>> sum_avg, and then clobbers sum_avg_cached with those values.
>>> 2. bcm_aggregate() uses sum_avg_cached in its calculations, then clears sum_avg.
>>>
>>> But I don't get why that's needed. Why not just have sum_avg? Wouldn't
>>> it work the same? Ok, it wouldn't if you ended up calling
>>> bcm_aggregate() multiple times on the same bcm. But you have a dirty
>>> flag that prevents this from happening. So I think it's safe to remove
>>> the cached arrays, and just clear out the sum_avg when you aggregate.
>> You are correct in that the dirty flag would prevent another repeat of
>> the bcm_aggregate() call in the same icc_set request. But consider a
>> following icc_set request on a different node that shares the same BCM,
>> the next bcm_aggregate() would result in an incorrect aggregate sum_avg
>> for the BCM since the avg_sum from the previous node(from the previous
>> icc_set) was cleared out. We need a way to retain the current state of
>> all nodes to accurately aggregate the bw values for the BCM.
> I don't get it. qcom_icc_aggregate() clobbers sum_avg_cached. So
> they're only ever a) equal, like after qcom_icc_aggregate(), or b)
> sum_avg is zeroed, and sum_avg_cached is its old value. A new
> icc_set_bw() would call aggregate_requests(), which would clobber
> sum_avg_cached to sum_avg for every BCM involved. Then the core would
> call apply_constraints(), then qcom_icc_set(), which would use
> sum_avg_cached, and clear out sum_avg, being sure with the dirty flag
> that bcm_aggregate() is only called once per BCM. This all happens
> under the mutex held in the core. A new request would start the whole
> thing over, since sum_avg is cleared. It seems to me that flow would
> work the same with one array as it does with two. Maybe you can walk
> me through a scenario?
> -Evan

Let's walk through the scenario you've just described, assuming there is
only one sum_avg value per node, with two icc_set_bw() requests on two
different nodes (say 2MB for node1 and 1MB for node2) that fall under
the same BCM (say BCM A). The first qcom_icc_aggregate() aggregates to a
2MB sum_avg at node1, followed by apply_constraints(), qcom_icc_set()
and bcm_aggregate(), which causes BCM A to aggregate to
max(node1->sum_avg, node2->sum_avg) and reach a vote_x of 2MB (for
simplicity, let's ignore the unit conversion). We then clear out
node1->sum_avg before the next icc_set_bw() starts. In the following
icc_set_bw(), qcom_icc_aggregate() aggregates to 1MB at node2, followed
by apply_constraints(), qcom_icc_set() and bcm_aggregate(), but this
time BCM A incorrectly aggregates to 1MB by looking at
max(node1->sum_avg, node2->sum_avg), because node1->sum_avg was cleared
out, when in reality BCM A should still have a vote_x value of 2MB at
this point. A subsequent bcm_aggregate() does not re-aggregate all of
the requests for each of its nodes; it assumes that the aggregated
results at the nodes are already correct.
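
Here is the same sequence as a tiny standalone sketch (hypothetical
values, a single bucket, and the width/buswidth scaling ignored; this is
not the driver code, just an illustration of why a single array is not
enough):

  /* Illustration only: with a single array, clearing node1's total
   * after the first commit makes BCM A forget it on the next one. */
  #include <stdio.h>

  static unsigned long long max_u64(unsigned long long a,
                                    unsigned long long b)
  {
          return a > b ? a : b;
  }

  int main(void)
  {
          unsigned long long node1_sum_avg = 0, node2_sum_avg = 0;
          unsigned long long bcm_a_vote_x;

          /* icc_set_bw() #1: 2MB on node1; qcom_icc_aggregate() sums it */
          node1_sum_avg += 2;
          /* bcm_aggregate(): max over the BCM's nodes -> 2, correct */
          bcm_a_vote_x = max_u64(node1_sum_avg, node2_sum_avg);
          /* with a single array the node total is now thrown away */
          node1_sum_avg = 0;

          /* icc_set_bw() #2: 1MB on node2 */
          node2_sum_avg += 1;
          /* bcm_aggregate(): max -> 1, but BCM A should still vote 2 */
          bcm_a_vote_x = max_u64(node1_sum_avg, node2_sum_avg);

          printf("BCM A vote_x = %llu (expected 2)\n", bcm_a_vote_x);
          return 0;
  }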

>
>>>> +               }
>>>>
>>>> -       temp = agg_peak * 1000ULL;
>>>> -       do_div(temp, bcm->aux_data.unit);
>>>> -       bcm->vote_y = temp;
>>>> +               temp = agg_avg[bucket] * 1000ULL;
>>>> +               do_div(temp, bcm->aux_data.unit);
>>>> +               bcm->vote_x[bucket] = temp;
>>>>
>>>> -       if (bcm->keepalive && bcm->vote_x == 0 && bcm->vote_y == 0) {
>>>> -               bcm->vote_x = 1;
>>>> -               bcm->vote_y = 1;
>>>> +               temp = agg_peak[bucket] * 1000ULL;
>>>> +               do_div(temp, bcm->aux_data.unit);
>>>> +               bcm->vote_y[bucket] = temp;
>>>> +       }
>>>> +
>>>> +       if (bcm->keepalive && bcm->vote_x[0] == 0 && bcm->vote_y[0] == 0) {
>>>> +               bcm->vote_x[QCOM_ICC_BUCKET_AMC] = 1;
>>>> +               bcm->vote_x[QCOM_ICC_BUCKET_WAKE] = 1;
>>>> +               bcm->vote_y[QCOM_ICC_BUCKET_AMC] = 1;
>>>> +               bcm->vote_y[QCOM_ICC_BUCKET_WAKE] = 1;
>>>>           }
>>>>
>>>>           bcm->dirty = false;
>>>> @@ -631,15 +653,25 @@ static int qcom_icc_aggregate(struct icc_node *node, u32 tag, u32 avg_bw,
>>>>    {
>>>>           size_t i;
>>>>           struct qcom_icc_node *qn;
>>>> +       unsigned long tag_word = (unsigned long)tag;
>>>>
>>>>           qn = node->data;
>>>>
>>>> +       if (!tag)
>>>> +               tag_word = QCOM_ICC_TAG_ALWAYS;
>>>> +
>>>> +       for (i = 0; i < QCOM_ICC_NUM_BUCKETS; i++) {
>>>> +               if (test_bit(i, &tag_word)) {
>>> I guess all this extra business with tag_word and casting is so that
>>> you can use test_bit, which is presumably a tiny bit faster? Does this
>>> actually make a measurable difference? Maybe in the name of simplicity
>>> we just do if (tag & BIT(i)), and then optimize if we find that
>>> conditional to be a hotspot?
>> Using (tag & BIT(i)) as opposed to test_bit seems reasonable to me.
>>>> +                       qn->sum_avg[i] += avg_bw;
>>>> +                       qn->max_peak[i] = max_t(u32, qn->max_peak[i], peak_bw);
>>>> +                       qn->sum_avg_cached[i] = qn->sum_avg[i];
>>>> +                       qn->max_peak_cached[i] = qn->max_peak[i];
>>>> +               }
>>>> +       }
>>>> +
>>>>           *agg_avg += avg_bw;
>>>>           *agg_peak = max_t(u32, *agg_peak, peak_bw);
>>>>
>>>> -       qn->sum_avg = *agg_avg;
>>>> -       qn->max_peak = *agg_peak;
>>>> -
>>>>           for (i = 0; i < qn->num_bcms; i++)
>>>>                   qn->bcms[i]->dirty = true;
>>>>
>>>> @@ -675,7 +707,7 @@ static int qcom_icc_set(struct icc_node *src, struct icc_node *dst)
>>>>            * Construct the command list based on a pre ordered list of BCMs
>>>>            * based on VCD.
>>>>            */
>>>> -       tcs_list_gen(&commit_list, cmds, commit_idx);
>>>> +       tcs_list_gen(&commit_list, QCOM_ICC_BUCKET_AMC, cmds, commit_idx);
>>>>
>>>>           if (!commit_idx[0])
>>>>                   return ret;
>>>> @@ -693,6 +725,41 @@ static int qcom_icc_set(struct icc_node *src, struct icc_node *dst)
>>>>                   return ret;
>>>>           }
>>>>
>>>> +       INIT_LIST_HEAD(&commit_list);
>>>> +
>>>> +       for (i = 0; i < qp->num_bcms; i++) {
>>>> +               /*
>>>> +                * Only generate WAKE and SLEEP commands if a resource's
>>>> +                * requirements change as the execution environment transitions
>>>> +                * between different power states.
>>>> +                */
>>>> +               if (qp->bcms[i]->vote_x[QCOM_ICC_BUCKET_WAKE] !=
>>>> +                   qp->bcms[i]->vote_x[QCOM_ICC_BUCKET_SLEEP] ||
>>>> +                   qp->bcms[i]->vote_y[QCOM_ICC_BUCKET_WAKE] !=
>>>> +                   qp->bcms[i]->vote_y[QCOM_ICC_BUCKET_SLEEP]) {
>>>> +                       list_add_tail(&qp->bcms[i]->list, &commit_list);
>>>> +               }
>>>> +       }
>>>> +
>>>> +       if (list_empty(&commit_list))
>>>> +               return ret;
>>>> +
>>>> +       tcs_list_gen(&commit_list, QCOM_ICC_BUCKET_WAKE, cmds, commit_idx);
>>>> +
>>>> +       ret = rpmh_write_batch(qp->dev, RPMH_WAKE_ONLY_STATE, cmds, commit_idx);
>>>> +       if (ret) {
>>>> +               pr_err("Error sending WAKE RPMH requests (%d)\n", ret);
>>>> +               return ret;
>>>> +       }
>>>> +
>>>> +       tcs_list_gen(&commit_list, QCOM_ICC_BUCKET_SLEEP, cmds, commit_idx);
>>>> +
>>>> +       ret = rpmh_write_batch(qp->dev, RPMH_SLEEP_STATE, cmds, commit_idx);
>>>> +       if (ret) {
>>>> +               pr_err("Error sending SLEEP RPMH requests (%d)\n", ret);
>>>> +               return ret;
>>>> +       }
>>>> +
>>>>           return ret;
>>>>    }
>>>>
>> --
>> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
>> a Linux Foundation Collaborative Project
>>
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] interconnect: qcom: Add tagging and wake/sleep support for sdm845
  2019-07-18 17:59         ` David Dai
@ 2019-07-30 22:54           ` Evan Green
  2019-07-31  0:37             ` David Dai
  0 siblings, 1 reply; 14+ messages in thread
From: Evan Green @ 2019-07-30 22:54 UTC (permalink / raw)
  To: David Dai
  Cc: Georgi Djakov, linux-pm, Vincent Guittot, Bjorn Andersson,
	amit.kucheria, Doug Anderson, Sean Sweeney, LKML,
	linux-arm Mailing List, linux-arm-msm

On Thu, Jul 18, 2019 at 10:59 AM David Dai <daidavid1@codeaurora.org> wrote:
>
> On 7/16/2019 1:15 PM, Evan Green wrote:
> > On Mon, Jul 15, 2019 at 4:34 PM David Dai <daidavid1@codeaurora.org> wrote:
> >> Hi Evan,
> >>
> >> Thanks for the continued help in reviewing these patches!
> > No problem. I want to do more, but haven't found time to do the
> > prerequisite research before jumping into some of the other
> > discussions yet.
> >
> >> On 7/11/2019 10:06 AM, Evan Green wrote:
> >>> Hi Georgi and David,
> >>>
> >>> On Tue, Jun 18, 2019 at 2:17 AM Georgi Djakov <georgi.djakov@linaro.org> wrote:
> >>>> From: David Dai <daidavid1@codeaurora.org>
> >>>>
> >>>> Add support for wake and sleep commands by using a tag to indicate
> >>>> whether or not the aggregate and set requests fall into execution
> >>>> state specific bucket.
> >>>>
> >>>> Signed-off-by: David Dai <daidavid1@codeaurora.org>
> >>>> Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
> >>>> ---
> >>>>    drivers/interconnect/qcom/sdm845.c | 129 ++++++++++++++++++++++-------
> >>>>    1 file changed, 98 insertions(+), 31 deletions(-)
> >>>>
> >>>> diff --git a/drivers/interconnect/qcom/sdm845.c b/drivers/interconnect/qcom/sdm845.c
> >>>> index fb526004c82e..c100aab39415 100644
> >>>> --- a/drivers/interconnect/qcom/sdm845.c
> >>>> +++ b/drivers/interconnect/qcom/sdm845.c
> >>>> @@ -66,6 +66,17 @@ struct bcm_db {
> >>>>    #define SDM845_MAX_BCM_PER_NODE        2
> >>>>    #define SDM845_MAX_VCD         10
> >>>>
> >>>> +#define QCOM_ICC_BUCKET_AMC            0
> >>> What is AMC again? Is it the "right now" bucket? Maybe a comment on
> >>> the meaning of this bucket would be helpful.
> >> That's correct. Will add a comment for this.
> >>>> +#define QCOM_ICC_BUCKET_WAKE           1
> >>>> +#define QCOM_ICC_BUCKET_SLEEP          2
> >>>> +#define QCOM_ICC_NUM_BUCKETS           3
> >>>> +#define QCOM_ICC_TAG_AMC               BIT(QCOM_ICC_BUCKET_AMC)
> >>>> +#define QCOM_ICC_TAG_WAKE              BIT(QCOM_ICC_BUCKET_WAKE)
> >>>> +#define QCOM_ICC_TAG_SLEEP             BIT(QCOM_ICC_BUCKET_SLEEP)
> >>>> +#define QCOM_ICC_TAG_ACTIVE_ONLY       (QCOM_ICC_TAG_AMC | QCOM_ICC_TAG_WAKE)
> >>>> +#define QCOM_ICC_TAG_ALWAYS            (QCOM_ICC_TAG_AMC | QCOM_ICC_TAG_WAKE |\
> >>>> +                                        QCOM_ICC_TAG_SLEEP)
> >>>> +
> >>>>    /**
> >>>>     * struct qcom_icc_node - Qualcomm specific interconnect nodes
> >>>>     * @name: the node name used in debugfs
> >>>> @@ -75,7 +86,9 @@ struct bcm_db {
> >>>>     * @channels: num of channels at this node
> >>>>     * @buswidth: width of the interconnect between a node and the bus
> >>>>     * @sum_avg: current sum aggregate value of all avg bw requests
> >>>> + * @sum_avg_cached: previous sum aggregate value of all avg bw requests
> >>>>     * @max_peak: current max aggregate value of all peak bw requests
> >>>> + * @max_peak_cached: previous max aggregate value of all peak bw requests
> >>>>     * @bcms: list of bcms associated with this logical node
> >>>>     * @num_bcms: num of @bcms
> >>>>     */
> >>>> @@ -86,8 +99,10 @@ struct qcom_icc_node {
> >>>>           u16 num_links;
> >>>>           u16 channels;
> >>>>           u16 buswidth;
> >>>> -       u64 sum_avg;
> >>>> -       u64 max_peak;
> >>>> +       u64 sum_avg[QCOM_ICC_NUM_BUCKETS];
> >>>> +       u64 sum_avg_cached[QCOM_ICC_NUM_BUCKETS];
> >>>> +       u64 max_peak[QCOM_ICC_NUM_BUCKETS];
> >>>> +       u64 max_peak_cached[QCOM_ICC_NUM_BUCKETS];
> >>>>           struct qcom_icc_bcm *bcms[SDM845_MAX_BCM_PER_NODE];
> >>>>           size_t num_bcms;
> >>>>    };
> >>>> @@ -112,8 +127,8 @@ struct qcom_icc_bcm {
> >>>>           const char *name;
> >>>>           u32 type;
> >>>>           u32 addr;
> >>>> -       u64 vote_x;
> >>>> -       u64 vote_y;
> >>>> +       u64 vote_x[QCOM_ICC_NUM_BUCKETS];
> >>>> +       u64 vote_y[QCOM_ICC_NUM_BUCKETS];
> >>>>           bool dirty;
> >>>>           bool keepalive;
> >>>>           struct bcm_db aux_data;
> >>>> @@ -555,7 +570,7 @@ inline void tcs_cmd_gen(struct tcs_cmd *cmd, u64 vote_x, u64 vote_y,
> >>>>                   cmd->wait = true;
> >>>>    }
> >>>>
> >>>> -static void tcs_list_gen(struct list_head *bcm_list,
> >>>> +static void tcs_list_gen(struct list_head *bcm_list, int bucket,
> >>>>                            struct tcs_cmd tcs_list[SDM845_MAX_VCD],
> >>>>                            int n[SDM845_MAX_VCD])
> >>>>    {
> >>>> @@ -573,8 +588,8 @@ static void tcs_list_gen(struct list_head *bcm_list,
> >>>>                           commit = true;
> >>>>                           cur_vcd_size = 0;
> >>>>                   }
> >>>> -               tcs_cmd_gen(&tcs_list[idx], bcm->vote_x, bcm->vote_y,
> >>>> -                           bcm->addr, commit);
> >>>> +               tcs_cmd_gen(&tcs_list[idx], bcm->vote_x[bucket],
> >>>> +                           bcm->vote_y[bucket], bcm->addr, commit);
> >>>>                   idx++;
> >>>>                   n[batch]++;
> >>>>                   /*
> >>>> @@ -595,32 +610,39 @@ static void tcs_list_gen(struct list_head *bcm_list,
> >>>>
> >>>>    static void bcm_aggregate(struct qcom_icc_bcm *bcm)
> >>>>    {
> >>>> -       size_t i;
> >>>> -       u64 agg_avg = 0;
> >>>> -       u64 agg_peak = 0;
> >>>> +       size_t i, bucket;
> >>>> +       u64 agg_avg[QCOM_ICC_NUM_BUCKETS] = {0};
> >>>> +       u64 agg_peak[QCOM_ICC_NUM_BUCKETS] = {0};
> >>>>           u64 temp;
> >>>>
> >>>> -       for (i = 0; i < bcm->num_nodes; i++) {
> >>>> -               temp = bcm->nodes[i]->sum_avg * bcm->aux_data.width;
> >>>> -               do_div(temp, bcm->nodes[i]->buswidth * bcm->nodes[i]->channels);
> >>>> -               agg_avg = max(agg_avg, temp);
> >>>> +       for (bucket = 0; bucket < QCOM_ICC_NUM_BUCKETS; bucket++) {
> >>>> +               for (i = 0; i < bcm->num_nodes; i++) {
> >>>> +                       temp = bcm->nodes[i]->sum_avg_cached[bucket] * bcm->aux_data.width;
> >>>> +                       do_div(temp, bcm->nodes[i]->buswidth * bcm->nodes[i]->channels);
> >>>> +                       agg_avg[bucket] = max(agg_avg[bucket], temp);
> >>>>
> >>>> -               temp = bcm->nodes[i]->max_peak * bcm->aux_data.width;
> >>>> -               do_div(temp, bcm->nodes[i]->buswidth);
> >>> Why is it that this one doesn't have the multiply by
> >>> bcm->nodes[i]->channels again? I can't recall if there was a reason.
> >>> If it's correct maybe it deserves a comment.
> >> I think the rationale behind this is generally for consumers to target a
> >> certain minimum threshold to satisfy some structural latency
> >> requirements as opposed to strictly throughput, and it may be easier for
> >> consumers to reuse certain values to support hitting some minimum NoC
> >> frequencies without having to be concerned with the number of channels
> >> that may change from platform to platform.
> > I was mostly pointing out that sum_avg seems to have the multiply, but
> > max_peak does not. I would have expected those two things to be of the
> > same units, and get the same treatment. Maybe the hardware is taking
> > in different final units for that field, one that is per-channel and
> > one that isn't?
>
> The hardware isn't treating the values differently. I couldn't find any
> justification other than the intuition mentioned above for the ease of
> voting from the consumer perspective. The consumer would know that this
> peak_bw value results in some floor performance from the system to
> satisfy its latency requirements. The same approach would work if we
> accounted for the number of channels as well, but given that channels
> may vary from platform to platform or even on the same platform that
> shares multiple channel configurations(DDR), it can be difficult for
> consumers to keep track of and have to adjust their votes constantly(to
> try to hit some frequency/latency requirement, this intuition doesn't
> apply for avg_bw since we're concerned with throughput in that case).
>
> >>>> -               agg_peak = max(agg_peak, temp);
> >>>> -       }
> >>>> +                       temp = bcm->nodes[i]->max_peak_cached[bucket] * bcm->aux_data.width;
> >>>> +                       do_div(temp, bcm->nodes[i]->buswidth);
> >>>> +                       agg_peak[bucket] = max(agg_peak[bucket], temp);
> >>>>
> >>>> -       temp = agg_avg * 1000ULL;
> >>>> -       do_div(temp, bcm->aux_data.unit);
> >>>> -       bcm->vote_x = temp;
> >>>> +                       bcm->nodes[i]->sum_avg[bucket] = 0;
> >>>> +                       bcm->nodes[i]->max_peak[bucket] = 0;
> >>> I don't understand the sum_avg vs sum_avg_cached. Here's what I understand:
> >>> 1. qcom_icc_aggregate() does the math from the incoming values on
> >>> sum_avg, and then clobbers sum_avg_cached with those values.
> >>> 2. bcm_aggregate() uses sum_avg_cached in its calculations, then clears sum_avg.
> >>>
> >>> But I don't get why that's needed. Why not just have sum_avg? Wouldn't
> >>> it work the same? Ok, it wouldn't if you ended up calling
> >>> bcm_aggregate() multiple times on the same bcm. But you have a dirty
> >>> flag that prevents this from happening. So I think it's safe to remove
> >>> the cached arrays, and just clear out the sum_avg when you aggregate.
> >> You are correct in that the dirty flag would prevent another repeat of
> >> the bcm_aggregate() call in the same icc_set request. But consider a
> >> following icc_set request on a different node that shares the same BCM,
> >> the next bcm_aggregate() would result in an incorrect aggregate sum_avg
> >> for the BCM since the avg_sum from the previous node(from the previous
> >> icc_set) was cleared out. We need a way to retain the current state of
> >> all nodes to accurately aggregate the bw values for the BCM.
> > I don't get it. qcom_icc_aggregate() clobbers sum_avg_cached. So
> > they're only ever a) equal, like after qcom_icc_aggregate(), or b)
> > sum_avg is zeroed, and sum_avg_cached is its old value. A new
> > icc_set_bw() would call aggregate_requests(), which would clobber
> > sum_avg_cached to sum_avg for every BCM involved. Then the core would
> > call apply_constraints(), then qcom_icc_set(), which would use
> > sum_avg_cached, and clear out sum_avg, being sure with the dirty flag
> > that bcm_aggregate() is only called once per BCM. This all happens
> > under the mutex held in the core. A new request would start the whole
> > thing over, since sum_avg is cleared. It seems to me that flow would
> > work the same with one array as it does with two. Maybe you can walk
> > me through a scenario?
> > -Evan
>
> Let's walk through the scenario you've just described with the
> assumption that there's only one avg_sum value per node with two
> icc_set_bw() requests on two different nodes(say 2MB for node 1 and 1MB
> for node 2) under the same BCM(say BCM A). The first
> qcom_icc_aggregate() aggregates to a 2MB avg_sum at the node1 followed
> by apply_constraints(), qcom_icc_set(), bcm_aggregate() which causes BCM
> A to aggregate to max(node1->avg_sum, node2->avg_sum) and reach a vote_x
> of 2MB(for simplicity let's ignore unit). We then clear out
> node1->avg_sum before we start the next icc_set_bw(). In the following
> icc_set_bw(), the qcom_icc_aggregate() aggregates to 1MB in node2
> followed by apply_constraints(), qcom_icc_set(), bcm_aggregate(), but
> now incorrectly aggregates BCM A to 1MB by looking at
> max(node1->avg_sum, node2->avg_sum) because node1->avg_sum was cleared
> out when in reality BCM A should have a vote_x value of 2MB at this
> point. The subsequent bcm_aggregate do not re-aggregate all of the
> requests for each of its nodes, but assumes that the aggregated results
> at the nodes are correct.

Ah, I finally get it. Thanks for the detailed explanation. It's pretty
confusing that there are essentially two connected graphs laid on top
of each other, one graph consisting of nodes the framework deals with,
and another graph that groups those nodes together into BCMs. I was
failing to understand that bcm_aggregate loops over nodes that have
nothing to do with the current request, and so it needs to remember
the old totals from former requests. You've got the two arrays
basically to differentiate between "add together all requests for this
node", and "max all nodes into a BCM", since you need to reset sum_avg
at the start of the first call to qcom_icc_aggregate().

I had suggested a callback in the core earlier to tell the providers
"I'm about to start aggregating on these nodes", which would have
allowed you to clear sum_avg in that callback and reduce down to one
array. IMO that's a lot easier to understand than these double arrays,
but maybe it's just me that gets confused.
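
Something along these lines is what I had in mind; the name and hook
point below are purely hypothetical, not an existing interconnect
framework callback, just a sketch of the idea:

  /* Hypothetical: a per-node hook the core would invoke right before it
   * re-aggregates the requests on that node, letting the provider zero
   * its accumulators and keep a single pair of arrays. */
  static void qcom_icc_pre_aggregate(struct icc_node *node)
  {
          struct qcom_icc_node *qn = node->data;
          size_t i;

          for (i = 0; i < QCOM_ICC_NUM_BUCKETS; i++) {
                  qn->sum_avg[i] = 0;
                  qn->max_peak[i] = 0;
          }
  }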

Why do we bother with the individual nodes at all? Why don't we just
build a graph out of the BCMs themselves and pass that to the
framework? I guess you can't do that because of .channels and
.buswidth: you wouldn't know what to multiply/divide by to translate
to a vote value. Hm... it would be great to make this simpler, but I'm
out of suggestions for now.
-Evan

>
> >
> >>>> +               }
> >>>>
> >>>> -       temp = agg_peak * 1000ULL;
> >>>> -       do_div(temp, bcm->aux_data.unit);
> >>>> -       bcm->vote_y = temp;
> >>>> +               temp = agg_avg[bucket] * 1000ULL;
> >>>> +               do_div(temp, bcm->aux_data.unit);
> >>>> +               bcm->vote_x[bucket] = temp;
> >>>>
> >>>> -       if (bcm->keepalive && bcm->vote_x == 0 && bcm->vote_y == 0) {
> >>>> -               bcm->vote_x = 1;
> >>>> -               bcm->vote_y = 1;
> >>>> +               temp = agg_peak[bucket] * 1000ULL;
> >>>> +               do_div(temp, bcm->aux_data.unit);
> >>>> +               bcm->vote_y[bucket] = temp;
> >>>> +       }
> >>>> +
> >>>> +       if (bcm->keepalive && bcm->vote_x[0] == 0 && bcm->vote_y[0] == 0) {
> >>>> +               bcm->vote_x[QCOM_ICC_BUCKET_AMC] = 1;
> >>>> +               bcm->vote_x[QCOM_ICC_BUCKET_WAKE] = 1;
> >>>> +               bcm->vote_y[QCOM_ICC_BUCKET_AMC] = 1;
> >>>> +               bcm->vote_y[QCOM_ICC_BUCKET_WAKE] = 1;
> >>>>           }
> >>>>
> >>>>           bcm->dirty = false;
> >>>> @@ -631,15 +653,25 @@ static int qcom_icc_aggregate(struct icc_node *node, u32 tag, u32 avg_bw,
> >>>>    {
> >>>>           size_t i;
> >>>>           struct qcom_icc_node *qn;
> >>>> +       unsigned long tag_word = (unsigned long)tag;
> >>>>
> >>>>           qn = node->data;
> >>>>
> >>>> +       if (!tag)
> >>>> +               tag_word = QCOM_ICC_TAG_ALWAYS;
> >>>> +
> >>>> +       for (i = 0; i < QCOM_ICC_NUM_BUCKETS; i++) {
> >>>> +               if (test_bit(i, &tag_word)) {
> >>> I guess all this extra business with tag_word and casting is so that
> >>> you can use test_bit, which is presumably a tiny bit faster? Does this
> >>> actually make a measurable difference? Maybe in the name of simplicity
> >>> we just do if (tag & BIT(i)), and then optimize if we find that
> >>> conditional to be a hotspot?
> >> Using (tag & BIT(i)) as opposed to test_bit seems reasonable to me.
> >>>> +                       qn->sum_avg[i] += avg_bw;
> >>>> +                       qn->max_peak[i] = max_t(u32, qn->max_peak[i], peak_bw);
> >>>> +                       qn->sum_avg_cached[i] = qn->sum_avg[i];
> >>>> +                       qn->max_peak_cached[i] = qn->max_peak[i];
> >>>> +               }
> >>>> +       }
> >>>> +
> >>>>           *agg_avg += avg_bw;
> >>>>           *agg_peak = max_t(u32, *agg_peak, peak_bw);
> >>>>
> >>>> -       qn->sum_avg = *agg_avg;
> >>>> -       qn->max_peak = *agg_peak;
> >>>> -
> >>>>           for (i = 0; i < qn->num_bcms; i++)
> >>>>                   qn->bcms[i]->dirty = true;
> >>>>
> >>>> @@ -675,7 +707,7 @@ static int qcom_icc_set(struct icc_node *src, struct icc_node *dst)
> >>>>            * Construct the command list based on a pre ordered list of BCMs
> >>>>            * based on VCD.
> >>>>            */
> >>>> -       tcs_list_gen(&commit_list, cmds, commit_idx);
> >>>> +       tcs_list_gen(&commit_list, QCOM_ICC_BUCKET_AMC, cmds, commit_idx);
> >>>>
> >>>>           if (!commit_idx[0])
> >>>>                   return ret;
> >>>> @@ -693,6 +725,41 @@ static int qcom_icc_set(struct icc_node *src, struct icc_node *dst)
> >>>>                   return ret;
> >>>>           }
> >>>>
> >>>> +       INIT_LIST_HEAD(&commit_list);
> >>>> +
> >>>> +       for (i = 0; i < qp->num_bcms; i++) {
> >>>> +               /*
> >>>> +                * Only generate WAKE and SLEEP commands if a resource's
> >>>> +                * requirements change as the execution environment transitions
> >>>> +                * between different power states.
> >>>> +                */
> >>>> +               if (qp->bcms[i]->vote_x[QCOM_ICC_BUCKET_WAKE] !=
> >>>> +                   qp->bcms[i]->vote_x[QCOM_ICC_BUCKET_SLEEP] ||
> >>>> +                   qp->bcms[i]->vote_y[QCOM_ICC_BUCKET_WAKE] !=
> >>>> +                   qp->bcms[i]->vote_y[QCOM_ICC_BUCKET_SLEEP]) {
> >>>> +                       list_add_tail(&qp->bcms[i]->list, &commit_list);
> >>>> +               }
> >>>> +       }
> >>>> +
> >>>> +       if (list_empty(&commit_list))
> >>>> +               return ret;
> >>>> +
> >>>> +       tcs_list_gen(&commit_list, QCOM_ICC_BUCKET_WAKE, cmds, commit_idx);
> >>>> +
> >>>> +       ret = rpmh_write_batch(qp->dev, RPMH_WAKE_ONLY_STATE, cmds, commit_idx);
> >>>> +       if (ret) {
> >>>> +               pr_err("Error sending WAKE RPMH requests (%d)\n", ret);
> >>>> +               return ret;
> >>>> +       }
> >>>> +
> >>>> +       tcs_list_gen(&commit_list, QCOM_ICC_BUCKET_SLEEP, cmds, commit_idx);
> >>>> +
> >>>> +       ret = rpmh_write_batch(qp->dev, RPMH_SLEEP_STATE, cmds, commit_idx);
> >>>> +       if (ret) {
> >>>> +               pr_err("Error sending SLEEP RPMH requests (%d)\n", ret);
> >>>> +               return ret;
> >>>> +       }
> >>>> +
> >>>>           return ret;
> >>>>    }
> >>>>
> >> --
> >> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> >> a Linux Foundation Collaborative Project
> >>
> --
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] interconnect: qcom: Add tagging and wake/sleep support for sdm845
  2019-07-30 22:54           ` Evan Green
@ 2019-07-31  0:37             ` David Dai
  2019-07-31 19:06               ` Evan Green
  0 siblings, 1 reply; 14+ messages in thread
From: David Dai @ 2019-07-31  0:37 UTC (permalink / raw)
  To: Evan Green
  Cc: Georgi Djakov, linux-pm, Vincent Guittot, Bjorn Andersson,
	amit.kucheria, Doug Anderson, Sean Sweeney, LKML,
	linux-arm Mailing List, linux-arm-msm, grahamr


On 7/30/2019 3:54 PM, Evan Green wrote:
> On Thu, Jul 18, 2019 at 10:59 AM David Dai <daidavid1@codeaurora.org> wrote:
>> On 7/16/2019 1:15 PM, Evan Green wrote:
>>> On Mon, Jul 15, 2019 at 4:34 PM David Dai <daidavid1@codeaurora.org> wrote:
>>>> Hi Evan,
>>>>
>>>> Thanks for the continued help in reviewing these patches!
>>> No problem. I want to do more, but haven't found time to do the
>>> prerequisite research before jumping into some of the other
>>> discussions yet.
>>>
>>>> On 7/11/2019 10:06 AM, Evan Green wrote:
>>>>> Hi Georgi and David,
>>>>>
>>>>> On Tue, Jun 18, 2019 at 2:17 AM Georgi Djakov <georgi.djakov@linaro.org> wrote:
>>>>>> From: David Dai <daidavid1@codeaurora.org>
>>>>>>
>>>>>> Add support for wake and sleep commands by using a tag to indicate
>>>>>> whether or not the aggregate and set requests fall into execution
>>>>>> state specific bucket.
>>>>>>
>>>>>> Signed-off-by: David Dai <daidavid1@codeaurora.org>
>>>>>> Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
>>>>>> ---
>>>>>>     drivers/interconnect/qcom/sdm845.c | 129 ++++++++++++++++++++++-------
>>>>>>     1 file changed, 98 insertions(+), 31 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/interconnect/qcom/sdm845.c b/drivers/interconnect/qcom/sdm845.c
>>>>>> index fb526004c82e..c100aab39415 100644
>>>>>> --- a/drivers/interconnect/qcom/sdm845.c
>>>>>> +++ b/drivers/interconnect/qcom/sdm845.c
>>>>>> @@ -66,6 +66,17 @@ struct bcm_db {
>>>>>>     #define SDM845_MAX_BCM_PER_NODE        2
>>>>>>     #define SDM845_MAX_VCD         10
>>>>>>
>>>>>> +#define QCOM_ICC_BUCKET_AMC            0
>>>>> What is AMC again? Is it the "right now" bucket? Maybe a comment on
>>>>> the meaning of this bucket would be helpful.
>>>> That's correct. Will add a comment for this.
>>>>>> +#define QCOM_ICC_BUCKET_WAKE           1
>>>>>> +#define QCOM_ICC_BUCKET_SLEEP          2
>>>>>> +#define QCOM_ICC_NUM_BUCKETS           3
>>>>>> +#define QCOM_ICC_TAG_AMC               BIT(QCOM_ICC_BUCKET_AMC)
>>>>>> +#define QCOM_ICC_TAG_WAKE              BIT(QCOM_ICC_BUCKET_WAKE)
>>>>>> +#define QCOM_ICC_TAG_SLEEP             BIT(QCOM_ICC_BUCKET_SLEEP)
>>>>>> +#define QCOM_ICC_TAG_ACTIVE_ONLY       (QCOM_ICC_TAG_AMC | QCOM_ICC_TAG_WAKE)
>>>>>> +#define QCOM_ICC_TAG_ALWAYS            (QCOM_ICC_TAG_AMC | QCOM_ICC_TAG_WAKE |\
>>>>>> +                                        QCOM_ICC_TAG_SLEEP)
>>>>>> +
>>>>>>     /**
>>>>>>      * struct qcom_icc_node - Qualcomm specific interconnect nodes
>>>>>>      * @name: the node name used in debugfs
>>>>>> @@ -75,7 +86,9 @@ struct bcm_db {
>>>>>>      * @channels: num of channels at this node
>>>>>>      * @buswidth: width of the interconnect between a node and the bus
>>>>>>      * @sum_avg: current sum aggregate value of all avg bw requests
>>>>>> + * @sum_avg_cached: previous sum aggregate value of all avg bw requests
>>>>>>      * @max_peak: current max aggregate value of all peak bw requests
>>>>>> + * @max_peak_cached: previous max aggregate value of all peak bw requests
>>>>>>      * @bcms: list of bcms associated with this logical node
>>>>>>      * @num_bcms: num of @bcms
>>>>>>      */
>>>>>> @@ -86,8 +99,10 @@ struct qcom_icc_node {
>>>>>>            u16 num_links;
>>>>>>            u16 channels;
>>>>>>            u16 buswidth;
>>>>>> -       u64 sum_avg;
>>>>>> -       u64 max_peak;
>>>>>> +       u64 sum_avg[QCOM_ICC_NUM_BUCKETS];
>>>>>> +       u64 sum_avg_cached[QCOM_ICC_NUM_BUCKETS];
>>>>>> +       u64 max_peak[QCOM_ICC_NUM_BUCKETS];
>>>>>> +       u64 max_peak_cached[QCOM_ICC_NUM_BUCKETS];
>>>>>>            struct qcom_icc_bcm *bcms[SDM845_MAX_BCM_PER_NODE];
>>>>>>            size_t num_bcms;
>>>>>>     };
>>>>>> @@ -112,8 +127,8 @@ struct qcom_icc_bcm {
>>>>>>            const char *name;
>>>>>>            u32 type;
>>>>>>            u32 addr;
>>>>>> -       u64 vote_x;
>>>>>> -       u64 vote_y;
>>>>>> +       u64 vote_x[QCOM_ICC_NUM_BUCKETS];
>>>>>> +       u64 vote_y[QCOM_ICC_NUM_BUCKETS];
>>>>>>            bool dirty;
>>>>>>            bool keepalive;
>>>>>>            struct bcm_db aux_data;
>>>>>> @@ -555,7 +570,7 @@ inline void tcs_cmd_gen(struct tcs_cmd *cmd, u64 vote_x, u64 vote_y,
>>>>>>                    cmd->wait = true;
>>>>>>     }
>>>>>>
>>>>>> -static void tcs_list_gen(struct list_head *bcm_list,
>>>>>> +static void tcs_list_gen(struct list_head *bcm_list, int bucket,
>>>>>>                             struct tcs_cmd tcs_list[SDM845_MAX_VCD],
>>>>>>                             int n[SDM845_MAX_VCD])
>>>>>>     {
>>>>>> @@ -573,8 +588,8 @@ static void tcs_list_gen(struct list_head *bcm_list,
>>>>>>                            commit = true;
>>>>>>                            cur_vcd_size = 0;
>>>>>>                    }
>>>>>> -               tcs_cmd_gen(&tcs_list[idx], bcm->vote_x, bcm->vote_y,
>>>>>> -                           bcm->addr, commit);
>>>>>> +               tcs_cmd_gen(&tcs_list[idx], bcm->vote_x[bucket],
>>>>>> +                           bcm->vote_y[bucket], bcm->addr, commit);
>>>>>>                    idx++;
>>>>>>                    n[batch]++;
>>>>>>                    /*
>>>>>> @@ -595,32 +610,39 @@ static void tcs_list_gen(struct list_head *bcm_list,
>>>>>>
>>>>>>     static void bcm_aggregate(struct qcom_icc_bcm *bcm)
>>>>>>     {
>>>>>> -       size_t i;
>>>>>> -       u64 agg_avg = 0;
>>>>>> -       u64 agg_peak = 0;
>>>>>> +       size_t i, bucket;
>>>>>> +       u64 agg_avg[QCOM_ICC_NUM_BUCKETS] = {0};
>>>>>> +       u64 agg_peak[QCOM_ICC_NUM_BUCKETS] = {0};
>>>>>>            u64 temp;
>>>>>>
>>>>>> -       for (i = 0; i < bcm->num_nodes; i++) {
>>>>>> -               temp = bcm->nodes[i]->sum_avg * bcm->aux_data.width;
>>>>>> -               do_div(temp, bcm->nodes[i]->buswidth * bcm->nodes[i]->channels);
>>>>>> -               agg_avg = max(agg_avg, temp);
>>>>>> +       for (bucket = 0; bucket < QCOM_ICC_NUM_BUCKETS; bucket++) {
>>>>>> +               for (i = 0; i < bcm->num_nodes; i++) {
>>>>>> +                       temp = bcm->nodes[i]->sum_avg_cached[bucket] * bcm->aux_data.width;
>>>>>> +                       do_div(temp, bcm->nodes[i]->buswidth * bcm->nodes[i]->channels);
>>>>>> +                       agg_avg[bucket] = max(agg_avg[bucket], temp);
>>>>>>
>>>>>> -               temp = bcm->nodes[i]->max_peak * bcm->aux_data.width;
>>>>>> -               do_div(temp, bcm->nodes[i]->buswidth);
>>>>> Why is it that this one doesn't have the multiply by
>>>>> bcm->nodes[i]->channels again? I can't recall if there was a reason.
>>>>> If it's correct maybe it deserves a comment.
>>>> I think the rationale behind this is generally for consumers to target a
>>>> certain minimum threshold to satisfy some structural latency
>>>> requirements as opposed to strictly throughput, and it may be easier for
>>>> consumers to reuse certain values to support hitting some minimum NoC
>>>> frequencies without having to be concerned with the number of channels
>>>> that may change from platform to platform.
>>> I was mostly pointing out that sum_avg seems to have the multiply, but
>>> max_peak does not. I would have expected those two things to be of the
>>> same units, and get the same treatment. Maybe the hardware is taking
>>> in different final units for that field, one that is per-channel and
>>> one that isn't?
>> The hardware isn't treating the values differently. I couldn't find any
>> justification other than the intuition mentioned above for the ease of
>> voting from the consumer perspective. The consumer would know that this
>> peak_bw value results in some floor performance from the system to
>> satisfy its latency requirements. The same approach would work if we
>> accounted for the number of channels as well, but given that channels
>> may vary from platform to platform or even on the same platform that
>> shares multiple channel configurations(DDR), it can be difficult for
>> consumers to keep track of and have to adjust their votes constantly(to
>> try to hit some frequency/latency requirement, this intuition doesn't
>> apply for avg_bw since we're concerned with throughput in that case).
>>
>>>>>> -               agg_peak = max(agg_peak, temp);
>>>>>> -       }
>>>>>> +                       temp = bcm->nodes[i]->max_peak_cached[bucket] * bcm->aux_data.width;
>>>>>> +                       do_div(temp, bcm->nodes[i]->buswidth);
>>>>>> +                       agg_peak[bucket] = max(agg_peak[bucket], temp);
>>>>>>
>>>>>> -       temp = agg_avg * 1000ULL;
>>>>>> -       do_div(temp, bcm->aux_data.unit);
>>>>>> -       bcm->vote_x = temp;
>>>>>> +                       bcm->nodes[i]->sum_avg[bucket] = 0;
>>>>>> +                       bcm->nodes[i]->max_peak[bucket] = 0;
>>>>> I don't understand the sum_avg vs sum_avg_cached. Here's what I understand:
>>>>> 1. qcom_icc_aggregate() does the math from the incoming values on
>>>>> sum_avg, and then clobbers sum_avg_cached with those values.
>>>>> 2. bcm_aggregate() uses sum_avg_cached in its calculations, then clears sum_avg.
>>>>>
>>>>> But I don't get why that's needed. Why not just have sum_avg? Wouldn't
>>>>> it work the same? Ok, it wouldn't if you ended up calling
>>>>> bcm_aggregate() multiple times on the same bcm. But you have a dirty
>>>>> flag that prevents this from happening. So I think it's safe to remove
>>>>> the cached arrays, and just clear out the sum_avg when you aggregate.
>>>> You are correct in that the dirty flag would prevent another repeat of
>>>> the bcm_aggregate() call in the same icc_set request. But consider a
>>>> following icc_set request on a different node that shares the same BCM,
>>>> the next bcm_aggregate() would result in an incorrect aggregate sum_avg
>>>> for the BCM since the avg_sum from the previous node(from the previous
>>>> icc_set) was cleared out. We need a way to retain the current state of
>>>> all nodes to accurately aggregate the bw values for the BCM.
>>> I don't get it. qcom_icc_aggregate() clobbers sum_avg_cached. So
>>> they're only ever a) equal, like after qcom_icc_aggregate(), or b)
>>> sum_avg is zeroed, and sum_avg_cached is its old value. A new
>>> icc_set_bw() would call aggregate_requests(), which would clobber
>>> sum_avg_cached to sum_avg for every BCM involved. Then the core would
>>> call apply_constraints(), then qcom_icc_set(), which would use
>>> sum_avg_cached, and clear out sum_avg, being sure with the dirty flag
>>> that bcm_aggregate() is only called once per BCM. This all happens
>>> under the mutex held in the core. A new request would start the whole
>>> thing over, since sum_avg is cleared. It seems to me that flow would
>>> work the same with one array as it does with two. Maybe you can walk
>>> me through a scenario?
>>> -Evan
>> Let's walk through the scenario you've just described with the
>> assumption that there's only one avg_sum value per node with two
>> icc_set_bw() requests on two different nodes(say 2MB for node 1 and 1MB
>> for node 2) under the same BCM(say BCM A). The first
>> qcom_icc_aggregate() aggregates to a 2MB avg_sum at the node1 followed
>> by apply_constraints(), qcom_icc_set(), bcm_aggregate() which causes BCM
>> A to aggregate to max(node1->avg_sum, node2->avg_sum) and reach a vote_x
>> of 2MB(for simplicity let's ignore unit). We then clear out
>> node1->avg_sum before we start the next icc_set_bw(). In the following
>> icc_set_bw(), the qcom_icc_aggregate() aggregates to 1MB in node2
>> followed by apply_constraints(), qcom_icc_set(), bcm_aggregate(), but
>> now incorrectly aggregates BCM A to 1MB by looking at
>> max(node1->avg_sum, node2->avg_sum) because node1->avg_sum was cleared
>> out when in reality BCM A should have a vote_x value of 2MB at this
>> point. The subsequent bcm_aggregate do not re-aggregate all of the
>> requests for each of its nodes, but assumes that the aggregated results
>> at the nodes are correct.
> Ah, I finally get it. Thanks for the detailed explanation. It's pretty
> confusing that there are essentially two connected graphs laid on top
> of each other, one graph consisting of nodes the framework deals with,
> and another graph that groups those nodes together into BCMs. I was
> failing to understand that bcm_aggregate loops over nodes that have
> nothing to do with the current request, and so it needs to remember
> the old totals from former requests. You've got the two arrays
> basically to differentiate between "add together all requests for this
> node", and "max all nodes into a BCM", since you need to reset sum_avg
> at the start of the first call to qcom_icc_aggregate().
Well, it's not really two graphs, since the BCMs aren't actually
connected to each other; they are only associated with groups of
physical nodes that share a clock domain (there are some nuances here,
but let's set those aside for simplicity). Their only job is to
aggregate the votes up to some threshold value and select a performance
point; they don't contain any information about the connectivity of the
nodes.
> I had suggested a callback in the core earlier to tell the providers
> "I'm about to start aggregating on these nodes", which would have
> allowed you to clear sum_avg in that callback and reduce down to one
> array. IMO that's a lot easier to understand than these double arrays,
> but maybe it's just me that gets confused.
I do admit that looking at this is somewhat confusing. I'm not totally
against the idea of adding another callback in the framework; maybe we
can re-evaluate this when there are other providers using the
interconnect framework. I'd prefer the justification for additional ops
in the core to be hardware that genuinely requires a pre- or
post-aggregation stage, rather than just easier bookkeeping. Though I do
like the idea of reducing complexity overall. Any thoughts on this,
Georgi?
>
> Why do we bother with the individual nodes at all, why don't we just
> build a graph out of the BCMs themselves and pass that to the
> framework? I guess you can't do that because of .channels and
> .bus_width, you wouldn't know what to multiply/divide by to translate
> to a vote value? Hm... it would be great to make this simpler, but I'm
> out of suggestions for now.

I appreciate the thought, but not only do the nodes provide the
width/channels, they also provide all of the connectivity data and an
accurate representation of the NoC topology. There's no way to aggregate
the nodes and the paths properly if we lose the granularity that the
current graph provides. Imagine again the example of two nodes on
mutually exclusive (parallel) paths under the same BCM, using avg_bw:
1MBps on node1 and 1MBps on node2 should result in an aggregate of 1MBps
for the BCM, since the two nodes don't physically share the same port,
whereas if we clobbered the nodes together and represented them as a
single BCM node, that would suggest they share the same physical port
and aggregate to 2MBps, when in reality they don't need to because they
are parallel.
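
To put numbers on that (hypothetical values, ignoring the
width/buswidth scaling):

  per-node graph:    node1->sum_avg = 1MBps, node2->sum_avg = 1MBps
                     BCM A vote = max(1, 1) = 1MBps
  collapsed to BCM:  single pseudo-node sum_avg = 1 + 1 = 2MBps
                     BCM A vote = 2MBps, overvoting for traffic that
                     never actually shares a port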

> -Evan
>
>>>>>> +               }
>>>>>>
>>>>>> -       temp = agg_peak * 1000ULL;
>>>>>> -       do_div(temp, bcm->aux_data.unit);
>>>>>> -       bcm->vote_y = temp;
>>>>>> +               temp = agg_avg[bucket] * 1000ULL;
>>>>>> +               do_div(temp, bcm->aux_data.unit);
>>>>>> +               bcm->vote_x[bucket] = temp;
>>>>>>
>>>>>> -       if (bcm->keepalive && bcm->vote_x == 0 && bcm->vote_y == 0) {
>>>>>> -               bcm->vote_x = 1;
>>>>>> -               bcm->vote_y = 1;
>>>>>> +               temp = agg_peak[bucket] * 1000ULL;
>>>>>> +               do_div(temp, bcm->aux_data.unit);
>>>>>> +               bcm->vote_y[bucket] = temp;
>>>>>> +       }
>>>>>> +
>>>>>> +       if (bcm->keepalive && bcm->vote_x[0] == 0 && bcm->vote_y[0] == 0) {
>>>>>> +               bcm->vote_x[QCOM_ICC_BUCKET_AMC] = 1;
>>>>>> +               bcm->vote_x[QCOM_ICC_BUCKET_WAKE] = 1;
>>>>>> +               bcm->vote_y[QCOM_ICC_BUCKET_AMC] = 1;
>>>>>> +               bcm->vote_y[QCOM_ICC_BUCKET_WAKE] = 1;
>>>>>>            }
>>>>>>
>>>>>>            bcm->dirty = false;
>>>>>> @@ -631,15 +653,25 @@ static int qcom_icc_aggregate(struct icc_node *node, u32 tag, u32 avg_bw,
>>>>>>     {
>>>>>>            size_t i;
>>>>>>            struct qcom_icc_node *qn;
>>>>>> +       unsigned long tag_word = (unsigned long)tag;
>>>>>>
>>>>>>            qn = node->data;
>>>>>>
>>>>>> +       if (!tag)
>>>>>> +               tag_word = QCOM_ICC_TAG_ALWAYS;
>>>>>> +
>>>>>> +       for (i = 0; i < QCOM_ICC_NUM_BUCKETS; i++) {
>>>>>> +               if (test_bit(i, &tag_word)) {
>>>>> I guess all this extra business with tag_word and casting is so that
>>>>> you can use test_bit, which is presumably a tiny bit faster? Does this
>>>>> actually make a measurable difference? Maybe in the name of simplicity
>>>>> we just do if (tag & BIT(i)), and then optimize if we find that
>>>>> conditional to be a hotspot?
>>>> Using (tag & BIT(i)) as opposed to test_bit seems reasonable to me.
>>>>>> +                       qn->sum_avg[i] += avg_bw;
>>>>>> +                       qn->max_peak[i] = max_t(u32, qn->max_peak[i], peak_bw);
>>>>>> +                       qn->sum_avg_cached[i] = qn->sum_avg[i];
>>>>>> +                       qn->max_peak_cached[i] = qn->max_peak[i];
>>>>>> +               }
>>>>>> +       }
>>>>>> +
>>>>>>            *agg_avg += avg_bw;
>>>>>>            *agg_peak = max_t(u32, *agg_peak, peak_bw);
>>>>>>
>>>>>> -       qn->sum_avg = *agg_avg;
>>>>>> -       qn->max_peak = *agg_peak;
>>>>>> -
>>>>>>            for (i = 0; i < qn->num_bcms; i++)
>>>>>>                    qn->bcms[i]->dirty = true;
>>>>>>
>>>>>> @@ -675,7 +707,7 @@ static int qcom_icc_set(struct icc_node *src, struct icc_node *dst)
>>>>>>             * Construct the command list based on a pre ordered list of BCMs
>>>>>>             * based on VCD.
>>>>>>             */
>>>>>> -       tcs_list_gen(&commit_list, cmds, commit_idx);
>>>>>> +       tcs_list_gen(&commit_list, QCOM_ICC_BUCKET_AMC, cmds, commit_idx);
>>>>>>
>>>>>>            if (!commit_idx[0])
>>>>>>                    return ret;
>>>>>> @@ -693,6 +725,41 @@ static int qcom_icc_set(struct icc_node *src, struct icc_node *dst)
>>>>>>                    return ret;
>>>>>>            }
>>>>>>
>>>>>> +       INIT_LIST_HEAD(&commit_list);
>>>>>> +
>>>>>> +       for (i = 0; i < qp->num_bcms; i++) {
>>>>>> +               /*
>>>>>> +                * Only generate WAKE and SLEEP commands if a resource's
>>>>>> +                * requirements change as the execution environment transitions
>>>>>> +                * between different power states.
>>>>>> +                */
>>>>>> +               if (qp->bcms[i]->vote_x[QCOM_ICC_BUCKET_WAKE] !=
>>>>>> +                   qp->bcms[i]->vote_x[QCOM_ICC_BUCKET_SLEEP] ||
>>>>>> +                   qp->bcms[i]->vote_y[QCOM_ICC_BUCKET_WAKE] !=
>>>>>> +                   qp->bcms[i]->vote_y[QCOM_ICC_BUCKET_SLEEP]) {
>>>>>> +                       list_add_tail(&qp->bcms[i]->list, &commit_list);
>>>>>> +               }
>>>>>> +       }
>>>>>> +
>>>>>> +       if (list_empty(&commit_list))
>>>>>> +               return ret;
>>>>>> +
>>>>>> +       tcs_list_gen(&commit_list, QCOM_ICC_BUCKET_WAKE, cmds, commit_idx);
>>>>>> +
>>>>>> +       ret = rpmh_write_batch(qp->dev, RPMH_WAKE_ONLY_STATE, cmds, commit_idx);
>>>>>> +       if (ret) {
>>>>>> +               pr_err("Error sending WAKE RPMH requests (%d)\n", ret);
>>>>>> +               return ret;
>>>>>> +       }
>>>>>> +
>>>>>> +       tcs_list_gen(&commit_list, QCOM_ICC_BUCKET_SLEEP, cmds, commit_idx);
>>>>>> +
>>>>>> +       ret = rpmh_write_batch(qp->dev, RPMH_SLEEP_STATE, cmds, commit_idx);
>>>>>> +       if (ret) {
>>>>>> +               pr_err("Error sending SLEEP RPMH requests (%d)\n", ret);
>>>>>> +               return ret;
>>>>>> +       }
>>>>>> +
>>>>>>            return ret;
>>>>>>     }
>>>>>>
>>>> --
>>>> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
>>>> a Linux Foundation Collaborative Project
>>>>
>> --
>> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
>> a Linux Foundation Collaborative Project
>>
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] interconnect: qcom: Add tagging and wake/sleep support for sdm845
  2019-07-31  0:37             ` David Dai
@ 2019-07-31 19:06               ` Evan Green
  2019-08-02 16:22                 ` Georgi Djakov
  0 siblings, 1 reply; 14+ messages in thread
From: Evan Green @ 2019-07-31 19:06 UTC (permalink / raw)
  To: David Dai
  Cc: Georgi Djakov, linux-pm, Vincent Guittot, Bjorn Andersson,
	amit.kucheria, Doug Anderson, Sean Sweeney, LKML,
	linux-arm Mailing List, linux-arm-msm, grahamr

On Tue, Jul 30, 2019 at 5:37 PM David Dai <daidavid1@codeaurora.org> wrote:
>
>
> On 7/30/2019 3:54 PM, Evan Green wrote:
> > On Thu, Jul 18, 2019 at 10:59 AM David Dai <daidavid1@codeaurora.org> wrote:
> >> On 7/16/2019 1:15 PM, Evan Green wrote:
> >>> On Mon, Jul 15, 2019 at 4:34 PM David Dai <daidavid1@codeaurora.org> wrote:
> >>>> Hi Evan,
> >>>>
> >>>> Thanks for the continued help in reviewing these patches!
> >>> No problem. I want to do more, but haven't found time to do the
> >>> prerequisite research before jumping into some of the other
> >>> discussions yet.
> >>>
> >>>> On 7/11/2019 10:06 AM, Evan Green wrote:
> >>>>> Hi Georgi and David,
> >>>>>
> >>>>> On Tue, Jun 18, 2019 at 2:17 AM Georgi Djakov <georgi.djakov@linaro.org> wrote:
> >>>>>> From: David Dai <daidavid1@codeaurora.org>
> >>>>>>
> >>>>>> Add support for wake and sleep commands by using a tag to indicate
> >>>>>> whether or not the aggregate and set requests fall into execution
> >>>>>> state specific bucket.
> >>>>>>
> >>>>>> Signed-off-by: David Dai <daidavid1@codeaurora.org>
> >>>>>> Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
> >>>>>> ---
> >>>>>>     drivers/interconnect/qcom/sdm845.c | 129 ++++++++++++++++++++++-------
> >>>>>>     1 file changed, 98 insertions(+), 31 deletions(-)
> >>>>>>
> >>>>>> diff --git a/drivers/interconnect/qcom/sdm845.c b/drivers/interconnect/qcom/sdm845.c
> >>>>>> index fb526004c82e..c100aab39415 100644
> >>>>>> --- a/drivers/interconnect/qcom/sdm845.c
> >>>>>> +++ b/drivers/interconnect/qcom/sdm845.c
> >>>>>> @@ -66,6 +66,17 @@ struct bcm_db {
> >>>>>>     #define SDM845_MAX_BCM_PER_NODE        2
> >>>>>>     #define SDM845_MAX_VCD         10
> >>>>>>
> >>>>>> +#define QCOM_ICC_BUCKET_AMC            0
> >>>>> What is AMC again? Is it the "right now" bucket? Maybe a comment on
> >>>>> the meaning of this bucket would be helpful.
> >>>> That's correct. Will add a comment for this.
> >>>>>> +#define QCOM_ICC_BUCKET_WAKE           1
> >>>>>> +#define QCOM_ICC_BUCKET_SLEEP          2
> >>>>>> +#define QCOM_ICC_NUM_BUCKETS           3
> >>>>>> +#define QCOM_ICC_TAG_AMC               BIT(QCOM_ICC_BUCKET_AMC)
> >>>>>> +#define QCOM_ICC_TAG_WAKE              BIT(QCOM_ICC_BUCKET_WAKE)
> >>>>>> +#define QCOM_ICC_TAG_SLEEP             BIT(QCOM_ICC_BUCKET_SLEEP)
> >>>>>> +#define QCOM_ICC_TAG_ACTIVE_ONLY       (QCOM_ICC_TAG_AMC | QCOM_ICC_TAG_WAKE)
> >>>>>> +#define QCOM_ICC_TAG_ALWAYS            (QCOM_ICC_TAG_AMC | QCOM_ICC_TAG_WAKE |\
> >>>>>> +                                        QCOM_ICC_TAG_SLEEP)
> >>>>>> +
> >>>>>>     /**
> >>>>>>      * struct qcom_icc_node - Qualcomm specific interconnect nodes
> >>>>>>      * @name: the node name used in debugfs
> >>>>>> @@ -75,7 +86,9 @@ struct bcm_db {
> >>>>>>      * @channels: num of channels at this node
> >>>>>>      * @buswidth: width of the interconnect between a node and the bus
> >>>>>>      * @sum_avg: current sum aggregate value of all avg bw requests
> >>>>>> + * @sum_avg_cached: previous sum aggregate value of all avg bw requests
> >>>>>>      * @max_peak: current max aggregate value of all peak bw requests
> >>>>>> + * @max_peak_cached: previous max aggregate value of all peak bw requests
> >>>>>>      * @bcms: list of bcms associated with this logical node
> >>>>>>      * @num_bcms: num of @bcms
> >>>>>>      */
> >>>>>> @@ -86,8 +99,10 @@ struct qcom_icc_node {
> >>>>>>            u16 num_links;
> >>>>>>            u16 channels;
> >>>>>>            u16 buswidth;
> >>>>>> -       u64 sum_avg;
> >>>>>> -       u64 max_peak;
> >>>>>> +       u64 sum_avg[QCOM_ICC_NUM_BUCKETS];
> >>>>>> +       u64 sum_avg_cached[QCOM_ICC_NUM_BUCKETS];
> >>>>>> +       u64 max_peak[QCOM_ICC_NUM_BUCKETS];
> >>>>>> +       u64 max_peak_cached[QCOM_ICC_NUM_BUCKETS];
> >>>>>>            struct qcom_icc_bcm *bcms[SDM845_MAX_BCM_PER_NODE];
> >>>>>>            size_t num_bcms;
> >>>>>>     };
> >>>>>> @@ -112,8 +127,8 @@ struct qcom_icc_bcm {
> >>>>>>            const char *name;
> >>>>>>            u32 type;
> >>>>>>            u32 addr;
> >>>>>> -       u64 vote_x;
> >>>>>> -       u64 vote_y;
> >>>>>> +       u64 vote_x[QCOM_ICC_NUM_BUCKETS];
> >>>>>> +       u64 vote_y[QCOM_ICC_NUM_BUCKETS];
> >>>>>>            bool dirty;
> >>>>>>            bool keepalive;
> >>>>>>            struct bcm_db aux_data;
> >>>>>> @@ -555,7 +570,7 @@ inline void tcs_cmd_gen(struct tcs_cmd *cmd, u64 vote_x, u64 vote_y,
> >>>>>>                    cmd->wait = true;
> >>>>>>     }
> >>>>>>
> >>>>>> -static void tcs_list_gen(struct list_head *bcm_list,
> >>>>>> +static void tcs_list_gen(struct list_head *bcm_list, int bucket,
> >>>>>>                             struct tcs_cmd tcs_list[SDM845_MAX_VCD],
> >>>>>>                             int n[SDM845_MAX_VCD])
> >>>>>>     {
> >>>>>> @@ -573,8 +588,8 @@ static void tcs_list_gen(struct list_head *bcm_list,
> >>>>>>                            commit = true;
> >>>>>>                            cur_vcd_size = 0;
> >>>>>>                    }
> >>>>>> -               tcs_cmd_gen(&tcs_list[idx], bcm->vote_x, bcm->vote_y,
> >>>>>> -                           bcm->addr, commit);
> >>>>>> +               tcs_cmd_gen(&tcs_list[idx], bcm->vote_x[bucket],
> >>>>>> +                           bcm->vote_y[bucket], bcm->addr, commit);
> >>>>>>                    idx++;
> >>>>>>                    n[batch]++;
> >>>>>>                    /*
> >>>>>> @@ -595,32 +610,39 @@ static void tcs_list_gen(struct list_head *bcm_list,
> >>>>>>
> >>>>>>     static void bcm_aggregate(struct qcom_icc_bcm *bcm)
> >>>>>>     {
> >>>>>> -       size_t i;
> >>>>>> -       u64 agg_avg = 0;
> >>>>>> -       u64 agg_peak = 0;
> >>>>>> +       size_t i, bucket;
> >>>>>> +       u64 agg_avg[QCOM_ICC_NUM_BUCKETS] = {0};
> >>>>>> +       u64 agg_peak[QCOM_ICC_NUM_BUCKETS] = {0};
> >>>>>>            u64 temp;
> >>>>>>
> >>>>>> -       for (i = 0; i < bcm->num_nodes; i++) {
> >>>>>> -               temp = bcm->nodes[i]->sum_avg * bcm->aux_data.width;
> >>>>>> -               do_div(temp, bcm->nodes[i]->buswidth * bcm->nodes[i]->channels);
> >>>>>> -               agg_avg = max(agg_avg, temp);
> >>>>>> +       for (bucket = 0; bucket < QCOM_ICC_NUM_BUCKETS; bucket++) {
> >>>>>> +               for (i = 0; i < bcm->num_nodes; i++) {
> >>>>>> +                       temp = bcm->nodes[i]->sum_avg_cached[bucket] * bcm->aux_data.width;
> >>>>>> +                       do_div(temp, bcm->nodes[i]->buswidth * bcm->nodes[i]->channels);
> >>>>>> +                       agg_avg[bucket] = max(agg_avg[bucket], temp);
> >>>>>>
> >>>>>> -               temp = bcm->nodes[i]->max_peak * bcm->aux_data.width;
> >>>>>> -               do_div(temp, bcm->nodes[i]->buswidth);
> >>>>> Why is it that this one doesn't have the multiply by
> >>>>> bcm->nodes[i]->channels again? I can't recall if there was a reason.
> >>>>> If it's correct maybe it deserves a comment.
> >>>> I think the rationale behind this is generally for consumers to target a
> >>>> certain minimum threshold to satisfy some structural latency
> >>>> requirements as opposed to strictly throughput, and it may be easier for
> >>>> consumers to reuse certain values to support hitting some minimum NoC
> >>>> frequencies without having to be concerned with the number of channels
> >>>> that may change from platform to platform.
> >>> I was mostly pointing out that sum_avg seems to have the multiply, but
> >>> max_peak does not. I would have expected those two things to be of the
> >>> same units, and get the same treatment. Maybe the hardware is taking
> >>> in different final units for that field, one that is per-channel and
> >>> one that isn't?
> >> The hardware isn't treating the values differently. I couldn't find any
> >> justification other than the intuition mentioned above for the ease of
> >> voting from the consumer perspective. The consumer would know that this
> >> peak_bw value results in some floor performance from the system to
> >> satisfy its latency requirements. The same approach would work if we
> >> accounted for the number of channels as well, but given that channels
> >> may vary from platform to platform or even on the same platform that
> >> shares multiple channel configurations(DDR), it can be difficult for
> >> consumers to keep track of and have to adjust their votes constantly(to
> >> try to hit some frequency/latency requirement, this intuition doesn't
> >> apply for avg_bw since we're concerned with throughput in that case).
> >>
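To make that asymmetry concrete with made-up numbers (width, buswidth and
channels below are purely hypothetical; only the formulas mirror the
bcm_aggregate() math in the hunk above):

    e.g. aux_data.width = 4, buswidth = 16, channels = 2, request = 800:

        avg vote contribution  = 800 * 4 / (16 * 2) = 100   (divided across channels)
        peak vote contribution = 800 * 4 / 16       = 200   (not divided by channels)

So the same numeric peak_bw request keeps selecting the same frequency floor
even if a platform doubles its channel count, which matches the consumer-facing
behaviour described above.
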
> >>>>>> -               agg_peak = max(agg_peak, temp);
> >>>>>> -       }
> >>>>>> +                       temp = bcm->nodes[i]->max_peak_cached[bucket] * bcm->aux_data.width;
> >>>>>> +                       do_div(temp, bcm->nodes[i]->buswidth);
> >>>>>> +                       agg_peak[bucket] = max(agg_peak[bucket], temp);
> >>>>>>
> >>>>>> -       temp = agg_avg * 1000ULL;
> >>>>>> -       do_div(temp, bcm->aux_data.unit);
> >>>>>> -       bcm->vote_x = temp;
> >>>>>> +                       bcm->nodes[i]->sum_avg[bucket] = 0;
> >>>>>> +                       bcm->nodes[i]->max_peak[bucket] = 0;
> >>>>> I don't understand the sum_avg vs sum_avg_cached. Here's what I understand:
> >>>>> 1. qcom_icc_aggregate() does the math from the incoming values on
> >>>>> sum_avg, and then clobbers sum_avg_cached with those values.
> >>>>> 2. bcm_aggregate() uses sum_avg_cached in its calculations, then clears sum_avg.
> >>>>>
> >>>>> But I don't get why that's needed. Why not just have sum_avg? Wouldn't
> >>>>> it work the same? Ok, it wouldn't if you ended up calling
> >>>>> bcm_aggregate() multiple times on the same bcm. But you have a dirty
> >>>>> flag that prevents this from happening. So I think it's safe to remove
> >>>>> the cached arrays, and just clear out the sum_avg when you aggregate.
> >>>> You are correct in that the dirty flag would prevent another repeat of
> >>>> the bcm_aggregate() call in the same icc_set request. But consider a
> >>>> following icc_set request on a different node that shares the same BCM,
> >>>> the next bcm_aggregate() would result in an incorrect aggregate sum_avg
> >>>> for the BCM since the avg_sum from the previous node(from the previous
> >>>> icc_set) was cleared out. We need a way to retain the current state of
> >>>> all nodes to accurately aggregate the bw values for the BCM.
> >>> I don't get it. qcom_icc_aggregate() clobbers sum_avg_cached. So
> >>> they're only ever a) equal, like after qcom_icc_aggregate(), or b)
> >>> sum_avg is zeroed, and sum_avg_cached is its old value. A new
> >>> icc_set_bw() would call aggregate_requests(), which would clobber
> >>> sum_avg_cached to sum_avg for every BCM involved. Then the core would
> >>> call apply_constraints(), then qcom_icc_set(), which would use
> >>> sum_avg_cached, and clear out sum_avg, being sure with the dirty flag
> >>> that bcm_aggregate() is only called once per BCM. This all happens
> >>> under the mutex held in the core. A new request would start the whole
> >>> thing over, since sum_avg is cleared. It seems to me that flow would
> >>> work the same with one array as it does with two. Maybe you can walk
> >>> me through a scenario?
> >>> -Evan
> >> Let's walk through the scenario you've just described with the
> >> assumption that there's only one avg_sum value per node with two
> >> icc_set_bw() requests on two different nodes(say 2MB for node 1 and 1MB
> >> for node 2) under the same BCM(say BCM A). The first
> >> qcom_icc_aggregate() aggregates to a 2MB avg_sum at the node1 followed
> >> by apply_constraints(), qcom_icc_set(), bcm_aggregate() which causes BCM
> >> A to aggregate to max(node1->avg_sum, node2->avg_sum) and reach a vote_x
> >> of 2MB(for simplicity let's ignore unit). We then clear out
> >> node1->avg_sum before we start the next icc_set_bw(). In the following
> >> icc_set_bw(), the qcom_icc_aggregate() aggregates to 1MB in node2
> >> followed by apply_constraints(), qcom_icc_set(), bcm_aggregate(), but
> >> now incorrectly aggregates BCM A to 1MB by looking at
> >> max(node1->avg_sum, node2->avg_sum) because node1->avg_sum was cleared
> >> out when in reality BCM A should have a vote_x value of 2MB at this
> >> point. The subsequent bcm_aggregate do not re-aggregate all of the
> >> requests for each of its nodes, but assumes that the aggregated results
> >> at the nodes are correct.
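
Writing that sequence out (the node names and the 2MB/1MB numbers are from the
example above, the rest follows the v2 code):

    icc_set_bw(node1, 2MB):
        qcom_icc_aggregate:  node1->sum_avg = 2MB, node1->sum_avg_cached = 2MB
        bcm_aggregate(A):    vote = max(node1 cached 2MB, node2 cached 0) = 2MB,
                             then sum_avg of A's nodes is zeroed, cached kept
    icc_set_bw(node2, 1MB):
        qcom_icc_aggregate:  node2->sum_avg = 1MB, node2->sum_avg_cached = 1MB
        bcm_aggregate(A):    vote = max(node1 cached 2MB, node2 cached 1MB) = 2MB

With a single array, the second bcm_aggregate(A) would only see
max(0, 1MB) = 1MB and drop node1's still-active 2MB request.
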
> > Ah, I finally get it. Thanks for the detailed explanation. It's pretty
> > confusing that there are essentially two connected graphs laid on top
> > of each other, one graph consisting of nodes the framework deals with,
> > and another graph that groups those nodes together into BCMs. I was
> > failing to understand that bcm_aggregate loops over nodes that have
> > nothing to do with the current request, and so it needs to remember
> > the old totals from former requests. You've got the two arrays
> > basically to differentiate between "add together all requests for this
> > node", and "max all nodes into a BCM", since you need to reset sum_avg
> > at the start of the first call to qcom_icc_aggregate().
> Well it's not really two graphs since the BCMs aren't really connected
> to each other, they only have association with some groups of physical
> nodes that share a clock domain(There's some nuances here, but let's
> assume for the sake of simplicity). Their only job is to aggregate to
> some threshold value and select a performance point and they don't
> contain any information about the connectivity of the nodes.

Right ok, I see.

> > I had suggested a callback in the core earlier to tell the providers
> > "I'm about to start aggregating on these nodes", which would have
> > allowed you to clear sum_avg in that callback and reduce down to one
> > array. IMO that's a lot easier to understand than these double arrays,
> > but maybe it's just me that gets confused.
> I do admit looking at this is somewhat confusing. I'm not totally
> against the idea of adding another callback in the framework, maybe we
> can re-evaluate this when there are other providers using the
> interconnect framework. I'd prefer to have the justification of needing
> additional ops in the core if somehow there's some hardware out there
> that dictates that we need some pre or post aggregation stage as opposed
> to easier book keeping? Though I do like the idea of reducing complexity
> overall, any thoughts on this Georgi?

Sure. I suppose any other SoC that does this same grouping thing in
the hardware will end up duplicating this same complexity. We'll see
if anybody has anything like this. It also might end up being useful
even if it's just for QC SoCs if we find ourselves copy/pasting a lot
of this logic in sdm845.c for sdm-next.c. Generally we should aim to
keep the providers as dumb as we can, but I'm fine waiting until
there's something to refactor down.

> >
> > Why do we bother with the individual nodes at all, why don't we just
> > build a graph out of the BCMs themselves and pass that to the
> > framework? I guess you can't do that because of .channels and
> > .bus_width, you wouldn't know what to multiply/divide by to translate
> > to a vote value? Hm... it would be great to make this simpler, but I'm
> > out of suggestions for now.
>
> I appreciate the thought, but not only do the nodes provide the
> width/channel, they provide all the connectivity data and an accurate
> representation of the NoC topology. There's no way to aggregate the
> nodes and the paths properly if we lose out on the granularity that the
> current graph provides(Imagine the example of two nodes on some mutually
> exclusive path under the same BCM again using avg_bw, 1MBps on node1 and
> 1MBps node2 should result in an aggregate BCM node of 1MBps since they
> physically don't share the same port where as if we clobbered the nodes
> together and represent them under a single BCM, it would suggest that
> they share the same physical port and aggregate 2MBps when in reality
> they don't need to be since they are parallel).
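
To restate that example with the aggregation rules above (numbers purely
illustrative):

    node1 (port A): avg_bw = 1 MBps      node2 (port B): avg_bw = 1 MBps
    per-node graph + bcm_aggregate():    vote = max(1, 1) = 1 MBps
    one merged BCM-level node instead:   1 + 1 = 2 MBps, an over-vote for
                                         traffic that never shares a port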

Oh right, that makes sense. I'm on board.
-Evan

>
> > -Evan
> >
> >>>>>> +               }
> >>>>>>
> >>>>>> -       temp = agg_peak * 1000ULL;
> >>>>>> -       do_div(temp, bcm->aux_data.unit);
> >>>>>> -       bcm->vote_y = temp;
> >>>>>> +               temp = agg_avg[bucket] * 1000ULL;
> >>>>>> +               do_div(temp, bcm->aux_data.unit);
> >>>>>> +               bcm->vote_x[bucket] = temp;
> >>>>>>
> >>>>>> -       if (bcm->keepalive && bcm->vote_x == 0 && bcm->vote_y == 0) {
> >>>>>> -               bcm->vote_x = 1;
> >>>>>> -               bcm->vote_y = 1;
> >>>>>> +               temp = agg_peak[bucket] * 1000ULL;
> >>>>>> +               do_div(temp, bcm->aux_data.unit);
> >>>>>> +               bcm->vote_y[bucket] = temp;
> >>>>>> +       }
> >>>>>> +
> >>>>>> +       if (bcm->keepalive && bcm->vote_x[0] == 0 && bcm->vote_y[0] == 0) {
> >>>>>> +               bcm->vote_x[QCOM_ICC_BUCKET_AMC] = 1;
> >>>>>> +               bcm->vote_x[QCOM_ICC_BUCKET_WAKE] = 1;
> >>>>>> +               bcm->vote_y[QCOM_ICC_BUCKET_AMC] = 1;
> >>>>>> +               bcm->vote_y[QCOM_ICC_BUCKET_WAKE] = 1;
> >>>>>>            }
> >>>>>>
> >>>>>>            bcm->dirty = false;
> >>>>>> @@ -631,15 +653,25 @@ static int qcom_icc_aggregate(struct icc_node *node, u32 tag, u32 avg_bw,
> >>>>>>     {
> >>>>>>            size_t i;
> >>>>>>            struct qcom_icc_node *qn;
> >>>>>> +       unsigned long tag_word = (unsigned long)tag;
> >>>>>>
> >>>>>>            qn = node->data;
> >>>>>>
> >>>>>> +       if (!tag)
> >>>>>> +               tag_word = QCOM_ICC_TAG_ALWAYS;
> >>>>>> +
> >>>>>> +       for (i = 0; i < QCOM_ICC_NUM_BUCKETS; i++) {
> >>>>>> +               if (test_bit(i, &tag_word)) {
> >>>>> I guess all this extra business with tag_word and casting is so that
> >>>>> you can use test_bit, which is presumably a tiny bit faster? Does this
> >>>>> actually make a measurable difference? Maybe in the name of simplicity
> >>>>> we just do if (tag & BIT(i)), and then optimize if we find that
> >>>>> conditional to be a hotspot?
> >>>> Using (tag & BIT(i)) as opposed to test_bit seems reasonable to me.
> >>>>>> +                       qn->sum_avg[i] += avg_bw;
> >>>>>> +                       qn->max_peak[i] = max_t(u32, qn->max_peak[i], peak_bw);
> >>>>>> +                       qn->sum_avg_cached[i] = qn->sum_avg[i];
> >>>>>> +                       qn->max_peak_cached[i] = qn->max_peak[i];
> >>>>>> +               }
> >>>>>> +       }
> >>>>>> +
> >>>>>>            *agg_avg += avg_bw;
> >>>>>>            *agg_peak = max_t(u32, *agg_peak, peak_bw);
> >>>>>>
> >>>>>> -       qn->sum_avg = *agg_avg;
> >>>>>> -       qn->max_peak = *agg_peak;
> >>>>>> -
> >>>>>>            for (i = 0; i < qn->num_bcms; i++)
> >>>>>>                    qn->bcms[i]->dirty = true;
> >>>>>>
> >>>>>> @@ -675,7 +707,7 @@ static int qcom_icc_set(struct icc_node *src, struct icc_node *dst)
> >>>>>>             * Construct the command list based on a pre ordered list of BCMs
> >>>>>>             * based on VCD.
> >>>>>>             */
> >>>>>> -       tcs_list_gen(&commit_list, cmds, commit_idx);
> >>>>>> +       tcs_list_gen(&commit_list, QCOM_ICC_BUCKET_AMC, cmds, commit_idx);
> >>>>>>
> >>>>>>            if (!commit_idx[0])
> >>>>>>                    return ret;
> >>>>>> @@ -693,6 +725,41 @@ static int qcom_icc_set(struct icc_node *src, struct icc_node *dst)
> >>>>>>                    return ret;
> >>>>>>            }
> >>>>>>
> >>>>>> +       INIT_LIST_HEAD(&commit_list);
> >>>>>> +
> >>>>>> +       for (i = 0; i < qp->num_bcms; i++) {
> >>>>>> +               /*
> >>>>>> +                * Only generate WAKE and SLEEP commands if a resource's
> >>>>>> +                * requirements change as the execution environment transitions
> >>>>>> +                * between different power states.
> >>>>>> +                */
> >>>>>> +               if (qp->bcms[i]->vote_x[QCOM_ICC_BUCKET_WAKE] !=
> >>>>>> +                   qp->bcms[i]->vote_x[QCOM_ICC_BUCKET_SLEEP] ||
> >>>>>> +                   qp->bcms[i]->vote_y[QCOM_ICC_BUCKET_WAKE] !=
> >>>>>> +                   qp->bcms[i]->vote_y[QCOM_ICC_BUCKET_SLEEP]) {
> >>>>>> +                       list_add_tail(&qp->bcms[i]->list, &commit_list);
> >>>>>> +               }
> >>>>>> +       }
> >>>>>> +
> >>>>>> +       if (list_empty(&commit_list))
> >>>>>> +               return ret;
> >>>>>> +
> >>>>>> +       tcs_list_gen(&commit_list, QCOM_ICC_BUCKET_WAKE, cmds, commit_idx);
> >>>>>> +
> >>>>>> +       ret = rpmh_write_batch(qp->dev, RPMH_WAKE_ONLY_STATE, cmds, commit_idx);
> >>>>>> +       if (ret) {
> >>>>>> +               pr_err("Error sending WAKE RPMH requests (%d)\n", ret);
> >>>>>> +               return ret;
> >>>>>> +       }
> >>>>>> +
> >>>>>> +       tcs_list_gen(&commit_list, QCOM_ICC_BUCKET_SLEEP, cmds, commit_idx);
> >>>>>> +
> >>>>>> +       ret = rpmh_write_batch(qp->dev, RPMH_SLEEP_STATE, cmds, commit_idx);
> >>>>>> +       if (ret) {
> >>>>>> +               pr_err("Error sending SLEEP RPMH requests (%d)\n", ret);
> >>>>>> +               return ret;
> >>>>>> +       }
> >>>>>> +
> >>>>>>            return ret;
> >>>>>>     }
> >>>>>>
> >>>> --
> >>>> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> >>>> a Linux Foundation Collaborative Project
> >>>>
> >> --
> >> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> >> a Linux Foundation Collaborative Project
> >>
> --
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] interconnect: qcom: Add tagging and wake/sleep support for sdm845
  2019-07-31 19:06               ` Evan Green
@ 2019-08-02 16:22                 ` Georgi Djakov
  2019-08-05 15:33                   ` [PATCH] interconnect: Add pre_aggregate() callback Georgi Djakov
  0 siblings, 1 reply; 14+ messages in thread
From: Georgi Djakov @ 2019-08-02 16:22 UTC (permalink / raw)
  To: Evan Green, David Dai
  Cc: linux-pm, Vincent Guittot, Bjorn Andersson, amit.kucheria,
	Doug Anderson, Sean Sweeney, LKML, linux-arm Mailing List,
	linux-arm-msm, grahamr

On 7/31/19 22:06, Evan Green wrote:
> On Tue, Jul 30, 2019 at 5:37 PM David Dai <daidavid1@codeaurora.org> wrote:
>>
>>
>> On 7/30/2019 3:54 PM, Evan Green wrote:
>>> On Thu, Jul 18, 2019 at 10:59 AM David Dai <daidavid1@codeaurora.org> wrote:
>>>> On 7/16/2019 1:15 PM, Evan Green wrote:
>>>>> On Mon, Jul 15, 2019 at 4:34 PM David Dai <daidavid1@codeaurora.org> wrote:
>>>>>> Hi Evan,
>>>>>>
>>>>>> Thanks for the continued help in reviewing these patches!
>>>>> No problem. I want to do more, but haven't found time to do the
>>>>> prerequisite research before jumping into some of the other
>>>>> discussions yet.
>>>>>
>>>>>> On 7/11/2019 10:06 AM, Evan Green wrote:
>>>>>>> Hi Georgi and David,
>>>>>>>
>>>>>>> On Tue, Jun 18, 2019 at 2:17 AM Georgi Djakov <georgi.djakov@linaro.org> wrote:
>>>>>>>> From: David Dai <daidavid1@codeaurora.org>
>>>>>>>>
>>>>>>>> Add support for wake and sleep commands by using a tag to indicate
>>>>>>>> whether or not the aggregate and set requests fall into execution
>>>>>>>> state specific bucket.
>>>>>>>>
>>>>>>>> Signed-off-by: David Dai <daidavid1@codeaurora.org>
>>>>>>>> Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
>>>>>>>> ---
>>>>>>>>     drivers/interconnect/qcom/sdm845.c | 129 ++++++++++++++++++++++-------
>>>>>>>>     1 file changed, 98 insertions(+), 31 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/interconnect/qcom/sdm845.c b/drivers/interconnect/qcom/sdm845.c
>>>>>>>> index fb526004c82e..c100aab39415 100644
>>>>>>>> --- a/drivers/interconnect/qcom/sdm845.c
>>>>>>>> +++ b/drivers/interconnect/qcom/sdm845.c
>>>>>>>> @@ -66,6 +66,17 @@ struct bcm_db {
>>>>>>>>     #define SDM845_MAX_BCM_PER_NODE        2
>>>>>>>>     #define SDM845_MAX_VCD         10
>>>>>>>>
>>>>>>>> +#define QCOM_ICC_BUCKET_AMC            0
>>>>>>> What is AMC again? Is it the "right now" bucket? Maybe a comment on
>>>>>>> the meaning of this bucket would be helpful.
>>>>>> That's correct. Will add a comment for this.
>>>>>>>> +#define QCOM_ICC_BUCKET_WAKE           1
>>>>>>>> +#define QCOM_ICC_BUCKET_SLEEP          2
>>>>>>>> +#define QCOM_ICC_NUM_BUCKETS           3
>>>>>>>> +#define QCOM_ICC_TAG_AMC               BIT(QCOM_ICC_BUCKET_AMC)
>>>>>>>> +#define QCOM_ICC_TAG_WAKE              BIT(QCOM_ICC_BUCKET_WAKE)
>>>>>>>> +#define QCOM_ICC_TAG_SLEEP             BIT(QCOM_ICC_BUCKET_SLEEP)
>>>>>>>> +#define QCOM_ICC_TAG_ACTIVE_ONLY       (QCOM_ICC_TAG_AMC | QCOM_ICC_TAG_WAKE)
>>>>>>>> +#define QCOM_ICC_TAG_ALWAYS            (QCOM_ICC_TAG_AMC | QCOM_ICC_TAG_WAKE |\
>>>>>>>> +                                        QCOM_ICC_TAG_SLEEP)
>>>>>>>> +
>>>>>>>>     /**
>>>>>>>>      * struct qcom_icc_node - Qualcomm specific interconnect nodes
>>>>>>>>      * @name: the node name used in debugfs
>>>>>>>> @@ -75,7 +86,9 @@ struct bcm_db {
>>>>>>>>      * @channels: num of channels at this node
>>>>>>>>      * @buswidth: width of the interconnect between a node and the bus
>>>>>>>>      * @sum_avg: current sum aggregate value of all avg bw requests
>>>>>>>> + * @sum_avg_cached: previous sum aggregate value of all avg bw requests
>>>>>>>>      * @max_peak: current max aggregate value of all peak bw requests
>>>>>>>> + * @max_peak_cached: previous max aggregate value of all peak bw requests
>>>>>>>>      * @bcms: list of bcms associated with this logical node
>>>>>>>>      * @num_bcms: num of @bcms
>>>>>>>>      */
>>>>>>>> @@ -86,8 +99,10 @@ struct qcom_icc_node {
>>>>>>>>            u16 num_links;
>>>>>>>>            u16 channels;
>>>>>>>>            u16 buswidth;
>>>>>>>> -       u64 sum_avg;
>>>>>>>> -       u64 max_peak;
>>>>>>>> +       u64 sum_avg[QCOM_ICC_NUM_BUCKETS];
>>>>>>>> +       u64 sum_avg_cached[QCOM_ICC_NUM_BUCKETS];
>>>>>>>> +       u64 max_peak[QCOM_ICC_NUM_BUCKETS];
>>>>>>>> +       u64 max_peak_cached[QCOM_ICC_NUM_BUCKETS];
>>>>>>>>            struct qcom_icc_bcm *bcms[SDM845_MAX_BCM_PER_NODE];
>>>>>>>>            size_t num_bcms;
>>>>>>>>     };
>>>>>>>> @@ -112,8 +127,8 @@ struct qcom_icc_bcm {
>>>>>>>>            const char *name;
>>>>>>>>            u32 type;
>>>>>>>>            u32 addr;
>>>>>>>> -       u64 vote_x;
>>>>>>>> -       u64 vote_y;
>>>>>>>> +       u64 vote_x[QCOM_ICC_NUM_BUCKETS];
>>>>>>>> +       u64 vote_y[QCOM_ICC_NUM_BUCKETS];
>>>>>>>>            bool dirty;
>>>>>>>>            bool keepalive;
>>>>>>>>            struct bcm_db aux_data;
>>>>>>>> @@ -555,7 +570,7 @@ inline void tcs_cmd_gen(struct tcs_cmd *cmd, u64 vote_x, u64 vote_y,
>>>>>>>>                    cmd->wait = true;
>>>>>>>>     }
>>>>>>>>
>>>>>>>> -static void tcs_list_gen(struct list_head *bcm_list,
>>>>>>>> +static void tcs_list_gen(struct list_head *bcm_list, int bucket,
>>>>>>>>                             struct tcs_cmd tcs_list[SDM845_MAX_VCD],
>>>>>>>>                             int n[SDM845_MAX_VCD])
>>>>>>>>     {
>>>>>>>> @@ -573,8 +588,8 @@ static void tcs_list_gen(struct list_head *bcm_list,
>>>>>>>>                            commit = true;
>>>>>>>>                            cur_vcd_size = 0;
>>>>>>>>                    }
>>>>>>>> -               tcs_cmd_gen(&tcs_list[idx], bcm->vote_x, bcm->vote_y,
>>>>>>>> -                           bcm->addr, commit);
>>>>>>>> +               tcs_cmd_gen(&tcs_list[idx], bcm->vote_x[bucket],
>>>>>>>> +                           bcm->vote_y[bucket], bcm->addr, commit);
>>>>>>>>                    idx++;
>>>>>>>>                    n[batch]++;
>>>>>>>>                    /*
>>>>>>>> @@ -595,32 +610,39 @@ static void tcs_list_gen(struct list_head *bcm_list,
>>>>>>>>
>>>>>>>>     static void bcm_aggregate(struct qcom_icc_bcm *bcm)
>>>>>>>>     {
>>>>>>>> -       size_t i;
>>>>>>>> -       u64 agg_avg = 0;
>>>>>>>> -       u64 agg_peak = 0;
>>>>>>>> +       size_t i, bucket;
>>>>>>>> +       u64 agg_avg[QCOM_ICC_NUM_BUCKETS] = {0};
>>>>>>>> +       u64 agg_peak[QCOM_ICC_NUM_BUCKETS] = {0};
>>>>>>>>            u64 temp;
>>>>>>>>
>>>>>>>> -       for (i = 0; i < bcm->num_nodes; i++) {
>>>>>>>> -               temp = bcm->nodes[i]->sum_avg * bcm->aux_data.width;
>>>>>>>> -               do_div(temp, bcm->nodes[i]->buswidth * bcm->nodes[i]->channels);
>>>>>>>> -               agg_avg = max(agg_avg, temp);
>>>>>>>> +       for (bucket = 0; bucket < QCOM_ICC_NUM_BUCKETS; bucket++) {
>>>>>>>> +               for (i = 0; i < bcm->num_nodes; i++) {
>>>>>>>> +                       temp = bcm->nodes[i]->sum_avg_cached[bucket] * bcm->aux_data.width;
>>>>>>>> +                       do_div(temp, bcm->nodes[i]->buswidth * bcm->nodes[i]->channels);
>>>>>>>> +                       agg_avg[bucket] = max(agg_avg[bucket], temp);
>>>>>>>>
>>>>>>>> -               temp = bcm->nodes[i]->max_peak * bcm->aux_data.width;
>>>>>>>> -               do_div(temp, bcm->nodes[i]->buswidth);
>>>>>>> Why is it that this one doesn't have the multiply by
>>>>>>> bcm->nodes[i]->channels again? I can't recall if there was a reason.
>>>>>>> If it's correct maybe it deserves a comment.
>>>>>> I think the rationale behind this is generally for consumers to target a
>>>>>> certain minimum threshold to satisfy some structural latency
>>>>>> requirements as opposed to strictly throughput, and it may be easier for
>>>>>> consumers to reuse certain values to support hitting some minimum NoC
>>>>>> frequencies without having to be concerned with the number of channels
>>>>>> that may change from platform to platform.
>>>>> I was mostly pointing out that sum_avg seems to have the multiply, but
>>>>> max_peak does not. I would have expected those two things to be of the
>>>>> same units, and get the same treatment. Maybe the hardware is taking
>>>>> in different final units for that field, one that is per-channel and
>>>>> one that isn't?
>>>> The hardware isn't treating the values differently. I couldn't find any
>>>> justification other than the intuition mentioned above for the ease of
>>>> voting from the consumer perspective. The consumer would know that this
>>>> peak_bw value results in some floor performance from the system to
>>>> satisfy its latency requirements. The same approach would work if we
>>>> accounted for the number of channels as well, but given that channels
>>>> may vary from platform to platform or even on the same platform that
>>>> shares multiple channel configurations(DDR), it can be difficult for
>>>> consumers to keep track of and have to adjust their votes constantly(to
>>>> try to hit some frequency/latency requirement, this intuition doesn't
>>>> apply for avg_bw since we're concerned with throughput in that case).
>>>>
>>>>>>>> -               agg_peak = max(agg_peak, temp);
>>>>>>>> -       }
>>>>>>>> +                       temp = bcm->nodes[i]->max_peak_cached[bucket] * bcm->aux_data.width;
>>>>>>>> +                       do_div(temp, bcm->nodes[i]->buswidth);
>>>>>>>> +                       agg_peak[bucket] = max(agg_peak[bucket], temp);
>>>>>>>>
>>>>>>>> -       temp = agg_avg * 1000ULL;
>>>>>>>> -       do_div(temp, bcm->aux_data.unit);
>>>>>>>> -       bcm->vote_x = temp;
>>>>>>>> +                       bcm->nodes[i]->sum_avg[bucket] = 0;
>>>>>>>> +                       bcm->nodes[i]->max_peak[bucket] = 0;
>>>>>>> I don't understand the sum_avg vs sum_avg_cached. Here's what I understand:
>>>>>>> 1. qcom_icc_aggregate() does the math from the incoming values on
>>>>>>> sum_avg, and then clobbers sum_avg_cached with those values.
>>>>>>> 2. bcm_aggregate() uses sum_avg_cached in its calculations, then clears sum_avg.
>>>>>>>
>>>>>>> But I don't get why that's needed. Why not just have sum_avg? Wouldn't
>>>>>>> it work the same? Ok, it wouldn't if you ended up calling
>>>>>>> bcm_aggregate() multiple times on the same bcm. But you have a dirty
>>>>>>> flag that prevents this from happening. So I think it's safe to remove
>>>>>>> the cached arrays, and just clear out the sum_avg when you aggregate.
>>>>>> You are correct in that the dirty flag would prevent another repeat of
>>>>>> the bcm_aggregate() call in the same icc_set request. But consider a
>>>>>> following icc_set request on a different node that shares the same BCM,
>>>>>> the next bcm_aggregate() would result in an incorrect aggregate sum_avg
>>>>>> for the BCM since the avg_sum from the previous node(from the previous
>>>>>> icc_set) was cleared out. We need a way to retain the current state of
>>>>>> all nodes to accurately aggregate the bw values for the BCM.
>>>>> I don't get it. qcom_icc_aggregate() clobbers sum_avg_cached. So
>>>>> they're only ever a) equal, like after qcom_icc_aggregate(), or b)
>>>>> sum_avg is zeroed, and sum_avg_cached is its old value. A new
>>>>> icc_set_bw() would call aggregate_requests(), which would clobber
>>>>> sum_avg_cached to sum_avg for every BCM involved. Then the core would
>>>>> call apply_constraints(), then qcom_icc_set(), which would use
>>>>> sum_avg_cached, and clear out sum_avg, being sure with the dirty flag
>>>>> that bcm_aggregate() is only called once per BCM. This all happens
>>>>> under the mutex held in the core. A new request would start the whole
>>>>> thing over, since sum_avg is cleared. It seems to me that flow would
>>>>> work the same with one array as it does with two. Maybe you can walk
>>>>> me through a scenario?
>>>>> -Evan
>>>> Let's walk through the scenario you've just described with the
>>>> assumption that there's only one avg_sum value per node with two
>>>> icc_set_bw() requests on two different nodes(say 2MB for node 1 and 1MB
>>>> for node 2) under the same BCM(say BCM A). The first
>>>> qcom_icc_aggregate() aggregates to a 2MB avg_sum at the node1 followed
>>>> by apply_constraints(), qcom_icc_set(), bcm_aggregate() which causes BCM
>>>> A to aggregate to max(node1->avg_sum, node2->avg_sum) and reach a vote_x
>>>> of 2MB(for simplicity let's ignore unit). We then clear out
>>>> node1->avg_sum before we start the next icc_set_bw(). In the following
>>>> icc_set_bw(), the qcom_icc_aggregate() aggregates to 1MB in node2
>>>> followed by apply_constraints(), qcom_icc_set(), bcm_aggregate(), but
>>>> now incorrectly aggregates BCM A to 1MB by looking at
>>>> max(node1->avg_sum, node2->avg_sum) because node1->avg_sum was cleared
>>>> out when in reality BCM A should have a vote_x value of 2MB at this
>>>> point. The subsequent bcm_aggregate do not re-aggregate all of the
>>>> requests for each of its nodes, but assumes that the aggregated results
>>>> at the nodes are correct.
>>> Ah, I finally get it. Thanks for the detailed explanation. It's pretty
>>> confusing that there are essentially two connected graphs laid on top
>>> of each other, one graph consisting of nodes the framework deals with,
>>> and another graph that groups those nodes together into BCMs. I was
>>> failing to understand that bcm_aggregate loops over nodes that have
>>> nothing to do with the current request, and so it needs to remember
>>> the old totals from former requests. You've got the two arrays
>>> basically to differentiate between "add together all requests for this
>>> node", and "max all nodes into a BCM", since you need to reset sum_avg
>>> at the start of the first call to qcom_icc_aggregate().
>> Well it's not really two graphs since the BCMs aren't really connected
>> to each other, they only have association with some groups of physical
>> nodes that share a clock domain(There's some nuances here, but let's
>> assume for the sake of simplicity). Their only job is to aggregate to
>> some threshold value and select a performance point and they don't
>> contain any information about the connectivity of the nodes.
> 
> Right ok, I see.
> 
>>> I had suggested a callback in the core earlier to tell the providers
>>> "I'm about to start aggregating on these nodes", which would have
>>> allowed you to clear sum_avg in that callback and reduce down to one
>>> array. IMO that's a lot easier to understand than these double arrays,
>>> but maybe it's just me that gets confused.
>> I do admit looking at this is somewhat confusing. I'm not totally
>> against the idea of adding another callback in the framework, maybe we
>> can re-evaluate this when there are other providers using the
>> interconnect framework. I'd prefer to have the justification of needing
>> additional ops in the core if somehow there's some hardware out there
>> that dictates that we need some pre or post aggregation stage as opposed
>> to easier book keeping? Though I do like the idea of reducing complexity
>> overall, any thoughts on this Georgi?
> 
> Sure. I suppose any other SoC that does this same grouping thing in
> the hardware will end up duplicating this same complexity. We'll see
> if anybody has anything like this. It also might end up being useful
> even if it's just for QC SoCs if we find ourselves copy/pasting a lot
> of this logic in sdm845.c for sdm-next.c. Generally we should aim to
> keep the providers as dumb as we can, but I'm fine waiting until
> there's something to refactor down.

If this same logic will be re-used in the upcoming SoCs and adding a single
callback would simplify the providers significantly, then let's do it and try to
keep the complexity to a minimum from the beginning. Will give it a try.
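
Roughly what I have in mind is just an optional hook at the top of
aggregate_requests(), something like (sketch only):

	if (p->pre_aggregate)
		p->pre_aggregate(node);

	hlist_for_each_entry(r, &node->req_list, req_node)
		p->aggregate(node, r->tag, r->avg_bw, r->peak_bw,
			     &node->avg_bw, &node->peak_bw);

so that a provider can reset its per-node accumulators right before the core
re-walks the requests for that node.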

Thanks,
Georgi

>>>
>>> Why do we bother with the individual nodes at all, why don't we just
>>> build a graph out of the BCMs themselves and pass that to the
>>> framework? I guess you can't do that because of .channels and
>>> .bus_width, you wouldn't know what to multiply/divide by to translate
>>> to a vote value? Hm... it would be great to make this simpler, but I'm
>>> out of suggestions for now.
>>
>> I appreciate the thought, but not only do the nodes provide the
>> width/channel, they provide all the connectivity data and an accurate
>> representation of the NoC topology. There's no way to aggregate the
>> nodes and the paths properly if we lose out on the granularity that the
>> current graph provides(Imagine the example of two nodes on some mutually
>> exclusive path under the same BCM again using avg_bw, 1MBps on node1 and
>> 1MBps node2 should result in an aggregate BCM node of 1MBps since they
>> physically don't share the same port where as if we clobbered the nodes
>> together and represent them under a single BCM, it would suggest that
>> they share the same physical port and aggregate 2MBps when in reality
>> they don't need to be since they are parallel).
> 
> Oh right, that makes sense. I'm on board.
> -Evan
> 
>>
>>> -Evan
>>>
>>>>>>>> +               }
>>>>>>>>
>>>>>>>> -       temp = agg_peak * 1000ULL;
>>>>>>>> -       do_div(temp, bcm->aux_data.unit);
>>>>>>>> -       bcm->vote_y = temp;
>>>>>>>> +               temp = agg_avg[bucket] * 1000ULL;
>>>>>>>> +               do_div(temp, bcm->aux_data.unit);
>>>>>>>> +               bcm->vote_x[bucket] = temp;
>>>>>>>>
>>>>>>>> -       if (bcm->keepalive && bcm->vote_x == 0 && bcm->vote_y == 0) {
>>>>>>>> -               bcm->vote_x = 1;
>>>>>>>> -               bcm->vote_y = 1;
>>>>>>>> +               temp = agg_peak[bucket] * 1000ULL;
>>>>>>>> +               do_div(temp, bcm->aux_data.unit);
>>>>>>>> +               bcm->vote_y[bucket] = temp;
>>>>>>>> +       }
>>>>>>>> +
>>>>>>>> +       if (bcm->keepalive && bcm->vote_x[0] == 0 && bcm->vote_y[0] == 0) {
>>>>>>>> +               bcm->vote_x[QCOM_ICC_BUCKET_AMC] = 1;
>>>>>>>> +               bcm->vote_x[QCOM_ICC_BUCKET_WAKE] = 1;
>>>>>>>> +               bcm->vote_y[QCOM_ICC_BUCKET_AMC] = 1;
>>>>>>>> +               bcm->vote_y[QCOM_ICC_BUCKET_WAKE] = 1;
>>>>>>>>            }
>>>>>>>>
>>>>>>>>            bcm->dirty = false;
>>>>>>>> @@ -631,15 +653,25 @@ static int qcom_icc_aggregate(struct icc_node *node, u32 tag, u32 avg_bw,
>>>>>>>>     {
>>>>>>>>            size_t i;
>>>>>>>>            struct qcom_icc_node *qn;
>>>>>>>> +       unsigned long tag_word = (unsigned long)tag;
>>>>>>>>
>>>>>>>>            qn = node->data;
>>>>>>>>
>>>>>>>> +       if (!tag)
>>>>>>>> +               tag_word = QCOM_ICC_TAG_ALWAYS;
>>>>>>>> +
>>>>>>>> +       for (i = 0; i < QCOM_ICC_NUM_BUCKETS; i++) {
>>>>>>>> +               if (test_bit(i, &tag_word)) {
>>>>>>> I guess all this extra business with tag_word and casting is so that
>>>>>>> you can use test_bit, which is presumably a tiny bit faster? Does this
>>>>>>> actually make a measurable difference? Maybe in the name of simplicity
>>>>>>> we just do if (tag & BIT(i)), and then optimize if we find that
>>>>>>> conditional to be a hotspot?
>>>>>> Using (tag & BIT(i)) as opposed to test_bit seems reasonable to me.
>>>>>>>> +                       qn->sum_avg[i] += avg_bw;
>>>>>>>> +                       qn->max_peak[i] = max_t(u32, qn->max_peak[i], peak_bw);
>>>>>>>> +                       qn->sum_avg_cached[i] = qn->sum_avg[i];
>>>>>>>> +                       qn->max_peak_cached[i] = qn->max_peak[i];
>>>>>>>> +               }
>>>>>>>> +       }
>>>>>>>> +
>>>>>>>>            *agg_avg += avg_bw;
>>>>>>>>            *agg_peak = max_t(u32, *agg_peak, peak_bw);
>>>>>>>>
>>>>>>>> -       qn->sum_avg = *agg_avg;
>>>>>>>> -       qn->max_peak = *agg_peak;
>>>>>>>> -
>>>>>>>>            for (i = 0; i < qn->num_bcms; i++)
>>>>>>>>                    qn->bcms[i]->dirty = true;
>>>>>>>>
>>>>>>>> @@ -675,7 +707,7 @@ static int qcom_icc_set(struct icc_node *src, struct icc_node *dst)
>>>>>>>>             * Construct the command list based on a pre ordered list of BCMs
>>>>>>>>             * based on VCD.
>>>>>>>>             */
>>>>>>>> -       tcs_list_gen(&commit_list, cmds, commit_idx);
>>>>>>>> +       tcs_list_gen(&commit_list, QCOM_ICC_BUCKET_AMC, cmds, commit_idx);
>>>>>>>>
>>>>>>>>            if (!commit_idx[0])
>>>>>>>>                    return ret;
>>>>>>>> @@ -693,6 +725,41 @@ static int qcom_icc_set(struct icc_node *src, struct icc_node *dst)
>>>>>>>>                    return ret;
>>>>>>>>            }
>>>>>>>>
>>>>>>>> +       INIT_LIST_HEAD(&commit_list);
>>>>>>>> +
>>>>>>>> +       for (i = 0; i < qp->num_bcms; i++) {
>>>>>>>> +               /*
>>>>>>>> +                * Only generate WAKE and SLEEP commands if a resource's
>>>>>>>> +                * requirements change as the execution environment transitions
>>>>>>>> +                * between different power states.
>>>>>>>> +                */
>>>>>>>> +               if (qp->bcms[i]->vote_x[QCOM_ICC_BUCKET_WAKE] !=
>>>>>>>> +                   qp->bcms[i]->vote_x[QCOM_ICC_BUCKET_SLEEP] ||
>>>>>>>> +                   qp->bcms[i]->vote_y[QCOM_ICC_BUCKET_WAKE] !=
>>>>>>>> +                   qp->bcms[i]->vote_y[QCOM_ICC_BUCKET_SLEEP]) {
>>>>>>>> +                       list_add_tail(&qp->bcms[i]->list, &commit_list);
>>>>>>>> +               }
>>>>>>>> +       }
>>>>>>>> +
>>>>>>>> +       if (list_empty(&commit_list))
>>>>>>>> +               return ret;
>>>>>>>> +
>>>>>>>> +       tcs_list_gen(&commit_list, QCOM_ICC_BUCKET_WAKE, cmds, commit_idx);
>>>>>>>> +
>>>>>>>> +       ret = rpmh_write_batch(qp->dev, RPMH_WAKE_ONLY_STATE, cmds, commit_idx);
>>>>>>>> +       if (ret) {
>>>>>>>> +               pr_err("Error sending WAKE RPMH requests (%d)\n", ret);
>>>>>>>> +               return ret;
>>>>>>>> +       }
>>>>>>>> +
>>>>>>>> +       tcs_list_gen(&commit_list, QCOM_ICC_BUCKET_SLEEP, cmds, commit_idx);
>>>>>>>> +
>>>>>>>> +       ret = rpmh_write_batch(qp->dev, RPMH_SLEEP_STATE, cmds, commit_idx);
>>>>>>>> +       if (ret) {
>>>>>>>> +               pr_err("Error sending SLEEP RPMH requests (%d)\n", ret);
>>>>>>>> +               return ret;
>>>>>>>> +       }
>>>>>>>> +
>>>>>>>>            return ret;
>>>>>>>>     }
>>>>>>>>
>>>>>> --
>>>>>> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
>>>>>> a Linux Foundation Collaborative Project
>>>>>>
>>>> --
>>>> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
>>>> a Linux Foundation Collaborative Project
>>>>
>> --
>> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
>> a Linux Foundation Collaborative Project
>>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] interconnect: Add pre_aggregate() callback
  2019-08-02 16:22                 ` Georgi Djakov
@ 2019-08-05 15:33                   ` Georgi Djakov
  2019-08-05 16:04                     ` Evan Green
  0 siblings, 1 reply; 14+ messages in thread
From: Georgi Djakov @ 2019-08-05 15:33 UTC (permalink / raw)
  To: linux-pm, evgreen, daidavid1
  Cc: vincent.guittot, bjorn.andersson, amit.kucheria, dianders,
	seansw, linux-kernel, linux-arm-kernel, linux-arm-msm,
	Georgi Djakov

Introduce an optional callback in interconnect provider drivers. It can be
used for implementing actions that need to be executed before the actual
aggregation of the bandwidth requests has started.

The benefit of this for now is that it will significantly simplify the code
in provider drivers.

Suggested-by: Evan Green <evgreen@chromium.org>
Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
---
 drivers/interconnect/core.c           | 3 +++
 include/linux/interconnect-provider.h | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
index 251354bb7fdc..7b971228df38 100644
--- a/drivers/interconnect/core.c
+++ b/drivers/interconnect/core.c
@@ -205,6 +205,9 @@ static int aggregate_requests(struct icc_node *node)
 	node->avg_bw = 0;
 	node->peak_bw = 0;
 
+	if (p->pre_aggregate)
+		p->pre_aggregate(node);
+
 	hlist_for_each_entry(r, &node->req_list, req_node)
 		p->aggregate(node, r->tag, r->avg_bw, r->peak_bw,
 			     &node->avg_bw, &node->peak_bw);
diff --git a/include/linux/interconnect-provider.h b/include/linux/interconnect-provider.h
index 4ee19fd41568..fd42bd19302d 100644
--- a/include/linux/interconnect-provider.h
+++ b/include/linux/interconnect-provider.h
@@ -36,6 +36,8 @@ struct icc_node *of_icc_xlate_onecell(struct of_phandle_args *spec,
  * @nodes: internal list of the interconnect provider nodes
  * @set: pointer to device specific set operation function
  * @aggregate: pointer to device specific aggregate operation function
+ * @pre_aggregate: pointer to device specific function that is called
+ *		   before the aggregation begins (optional)
  * @xlate: provider-specific callback for mapping nodes from phandle arguments
  * @dev: the device this interconnect provider belongs to
  * @users: count of active users
@@ -47,6 +49,7 @@ struct icc_provider {
 	int (*set)(struct icc_node *src, struct icc_node *dst);
 	int (*aggregate)(struct icc_node *node, u32 tag, u32 avg_bw,
 			 u32 peak_bw, u32 *agg_avg, u32 *agg_peak);
+	int (*pre_aggregate)(struct icc_node *node);
 	struct icc_node* (*xlate)(struct of_phandle_args *spec, void *data);
 	struct device		*dev;
 	int			users;

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] interconnect: Add pre_aggregate() callback
  2019-08-05 15:33                   ` [PATCH] interconnect: Add pre_aggregate() callback Georgi Djakov
@ 2019-08-05 16:04                     ` Evan Green
  0 siblings, 0 replies; 14+ messages in thread
From: Evan Green @ 2019-08-05 16:04 UTC (permalink / raw)
  To: Georgi Djakov
  Cc: linux-pm, David Dai, Vincent Guittot, Bjorn Andersson,
	amit.kucheria, Doug Anderson, Sean Sweeney, LKML,
	linux-arm Mailing List, linux-arm-msm

On Mon, Aug 5, 2019 at 8:33 AM Georgi Djakov <georgi.djakov@linaro.org> wrote:
>
> Introduce an optional callback in interconnect provider drivers. It can be
> used for implementing actions that need to be executed before the actual
> aggregation of the bandwidth requests has started.
>
> The benefit of this for now is that it will significantly simplify the code
> in provider drivers.
>
> Suggested-by: Evan Green <evgreen@chromium.org>
> Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>

Thanks Georgi, I like it! We should confirm that it actually does
allow David to remove the sum_avg_cached and max_peak_cached shadow
arrays.
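
For the record, something along these lines is what I'd expect on the sdm845
side (untested sketch, reusing the field names from the v2 driver patch):

	static int qcom_icc_pre_aggregate(struct icc_node *node)
	{
		size_t i;
		struct qcom_icc_node *qn = node->data;

		/* Reset only this node's totals; other nodes keep theirs. */
		for (i = 0; i < QCOM_ICC_NUM_BUCKETS; i++) {
			qn->sum_avg[i] = 0;
			qn->max_peak[i] = 0;
		}

		return 0;
	}

With that, qcom_icc_aggregate() can accumulate straight into sum_avg/max_peak,
bcm_aggregate() can read those arrays directly and stop zeroing them, and the
*_cached copies go away.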

> ---
>  drivers/interconnect/core.c           | 3 +++
>  include/linux/interconnect-provider.h | 3 +++
>  2 files changed, 6 insertions(+)
>
> diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
> index 251354bb7fdc..7b971228df38 100644
> --- a/drivers/interconnect/core.c
> +++ b/drivers/interconnect/core.c
> @@ -205,6 +205,9 @@ static int aggregate_requests(struct icc_node *node)
>         node->avg_bw = 0;
>         node->peak_bw = 0;
>
> +       if (p->pre_aggregate)
> +               p->pre_aggregate(node);
> +
>         hlist_for_each_entry(r, &node->req_list, req_node)
>                 p->aggregate(node, r->tag, r->avg_bw, r->peak_bw,
>                              &node->avg_bw, &node->peak_bw);
> diff --git a/include/linux/interconnect-provider.h b/include/linux/interconnect-provider.h
> index 4ee19fd41568..fd42bd19302d 100644
> --- a/include/linux/interconnect-provider.h
> +++ b/include/linux/interconnect-provider.h
> @@ -36,6 +36,8 @@ struct icc_node *of_icc_xlate_onecell(struct of_phandle_args *spec,
>   * @nodes: internal list of the interconnect provider nodes
>   * @set: pointer to device specific set operation function
>   * @aggregate: pointer to device specific aggregate operation function
> + * @pre_aggregate: pointer to device specific function that is called
> + *                before the aggregation begins (optional)
>   * @xlate: provider-specific callback for mapping nodes from phandle arguments
>   * @dev: the device this interconnect provider belongs to
>   * @users: count of active users
> @@ -47,6 +49,7 @@ struct icc_provider {
>         int (*set)(struct icc_node *src, struct icc_node *dst);
>         int (*aggregate)(struct icc_node *node, u32 tag, u32 avg_bw,
>                          u32 peak_bw, u32 *agg_avg, u32 *agg_peak);
> +       int (*pre_aggregate)(struct icc_node *node);
>         struct icc_node* (*xlate)(struct of_phandle_args *spec, void *data);
>         struct device           *dev;
>         int                     users;

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2019-08-05 16:05 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-18  9:17 [PATCH v2 0/2] interconnect: Add path tagging support Georgi Djakov
2019-06-18  9:17 ` [PATCH v2 1/2] interconnect: Add support for path tags Georgi Djakov
2019-07-11 17:06   ` Evan Green
2019-06-18  9:17 ` [PATCH v2 2/2] interconnect: qcom: Add tagging and wake/sleep support for sdm845 Georgi Djakov
2019-07-11 17:06   ` Evan Green
2019-07-15 23:34     ` David Dai
2019-07-16 20:15       ` Evan Green
2019-07-18 17:59         ` David Dai
2019-07-30 22:54           ` Evan Green
2019-07-31  0:37             ` David Dai
2019-07-31 19:06               ` Evan Green
2019-08-02 16:22                 ` Georgi Djakov
2019-08-05 15:33                   ` [PATCH] interconnect: Add pre_aggregate() callback Georgi Djakov
2019-08-05 16:04                     ` Evan Green
