linux-kernel.vger.kernel.org archive mirror
* [PATCH v9 0/3] Invoke rpmh_flush for non OSI targets
@ 2020-02-28 11:38 Maulik Shah
  2020-02-28 11:38 ` [PATCH v9 1/3] arm64: dts: qcom: sc7180: Add cpuidle low power states Maulik Shah
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Maulik Shah @ 2020-02-28 11:38 UTC (permalink / raw)
  To: swboyd, mka, evgreen, bjorn.andersson
  Cc: linux-kernel, linux-arm-msm, agross, dianders, rnayak, ilina,
	lsrao, Maulik Shah

Changes in v9:
- Keep invoking rpmh_flush() from within cache_lock
- Remove comments related to only last cpu invoking rpmh_flush()

Changes in v8:
- Address Stephen's comments on changes 2 and 3
- Add Reviewed-by from Stephen on change 1

Changes in v7:
- Address Srinivas's comments to update commit text
- Add Reviewed-by from Srinivas

Changes in v6:
- Drop 1 & 2 changes from v5 as they already landed in maintainer tree
- Drop 3 & 4 changes from v5 as no user at present for power domain in rsc
- Rename subject appropriately since power domain changes are dropped
- Rebase other changes on top of next-20200221

Changes in v5:
- Add Rob's Acked-by on dt-bindings change
- Drop firmware psci change
- Update cpuidle stats in dtsi to follow PC mode
- Include change to update dirty flag when data is updated from [4]
- Add change to invoke rpmh_flush when caches are dirty

Changes in v4:
- Add change to allow hierarchical topology in PC mode
- Drop hierarchical domain idle states converter from v3
- Merge sc7180 dtsi change to add low power modes

Changes in v3:
- Address Rob's comment on dt property value
- Address Stephen's comments on rpmh-rsc driver change
- Include sc7180 cpuidle low power mode changes from [1]
- Include hierarchical domain idle states converter change from [2]

Changes in v2:
- Add Stephen's Reviewed-By to the first three patches
- Addressed Stephen's comments on fourth patch
- Include changes to connect rpmh domain to cpuidle and genpds

The Resource State Coordinator (RSC) is responsible for powering off or lowering
the requirements from the CPU subsystem for associated hardware such as buses,
clocks, and regulators when all CPUs and the cluster are powered down.

The RSC power domain uses the last-man activities provided by the genpd
framework, based on Ulf Hansson's patch series [3], when the cluster of CPUs
enters the deepest idle state. As part of domain poweroff, the RSC can lower
resource state requirements by flushing the cached sleep and wake state votes
for various resources.

[1] https://patchwork.kernel.org/patch/11218965
[2] https://patchwork.kernel.org/patch/10941671
[3] https://patchwork.kernel.org/project/linux-arm-msm/list/?series=222355
[4] https://patchwork.kernel.org/project/linux-arm-msm/list/?series=236503

Maulik Shah (3):
  arm64: dts: qcom: sc7180: Add cpuidle low power states
  soc: qcom: rpmh: Update dirty flag only when data changes
  soc: qcom: rpmh: Invoke rpmh_flush for dirty caches

 arch/arm64/boot/dts/qcom/sc7180.dtsi | 78 ++++++++++++++++++++++++++++++++++++
 drivers/soc/qcom/rpmh.c              | 27 ++++++++++---
 2 files changed, 100 insertions(+), 5 deletions(-)

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


* [PATCH v9 1/3] arm64: dts: qcom: sc7180: Add cpuidle low power states
  2020-02-28 11:38 [PATCH v9 0/3] Invoke rpmh_flush for non OSI targets Maulik Shah
@ 2020-02-28 11:38 ` Maulik Shah
  2020-02-28 11:38 ` [PATCH v9 2/3] soc: qcom: rpmh: Update dirty flag only when data changes Maulik Shah
  2020-02-28 11:38 ` [PATCH v9 3/3] soc: qcom: rpmh: Invoke rpmh_flush() for dirty caches Maulik Shah
  2 siblings, 0 replies; 14+ messages in thread
From: Maulik Shah @ 2020-02-28 11:38 UTC (permalink / raw)
  To: swboyd, mka, evgreen, bjorn.andersson
  Cc: linux-kernel, linux-arm-msm, agross, dianders, rnayak, ilina,
	lsrao, Maulik Shah, devicetree

Add device bindings for cpuidle states for cpu devices.

Cc: devicetree@vger.kernel.org
Signed-off-by: Maulik Shah <mkshah@codeaurora.org>
Reviewed-by: Srinivas Rao L <lsrao@codeaurora.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
---
 arch/arm64/boot/dts/qcom/sc7180.dtsi | 78 ++++++++++++++++++++++++++++++++++++
 1 file changed, 78 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/sc7180.dtsi b/arch/arm64/boot/dts/qcom/sc7180.dtsi
index cc5a94f..2941a7e 100644
--- a/arch/arm64/boot/dts/qcom/sc7180.dtsi
+++ b/arch/arm64/boot/dts/qcom/sc7180.dtsi
@@ -86,6 +86,9 @@
 			compatible = "arm,armv8";
 			reg = <0x0 0x0>;
 			enable-method = "psci";
+			cpu-idle-states = <&LITTLE_CPU_SLEEP_0
+					   &LITTLE_CPU_SLEEP_1
+					   &CLUSTER_SLEEP_0>;
 			next-level-cache = <&L2_0>;
 			#cooling-cells = <2>;
 			qcom,freq-domain = <&cpufreq_hw 0>;
@@ -103,6 +106,9 @@
 			compatible = "arm,armv8";
 			reg = <0x0 0x100>;
 			enable-method = "psci";
+			cpu-idle-states = <&LITTLE_CPU_SLEEP_0
+					   &LITTLE_CPU_SLEEP_1
+					   &CLUSTER_SLEEP_0>;
 			next-level-cache = <&L2_100>;
 			#cooling-cells = <2>;
 			qcom,freq-domain = <&cpufreq_hw 0>;
@@ -117,6 +123,9 @@
 			compatible = "arm,armv8";
 			reg = <0x0 0x200>;
 			enable-method = "psci";
+			cpu-idle-states = <&LITTLE_CPU_SLEEP_0
+					   &LITTLE_CPU_SLEEP_1
+					   &CLUSTER_SLEEP_0>;
 			next-level-cache = <&L2_200>;
 			#cooling-cells = <2>;
 			qcom,freq-domain = <&cpufreq_hw 0>;
@@ -131,6 +140,9 @@
 			compatible = "arm,armv8";
 			reg = <0x0 0x300>;
 			enable-method = "psci";
+			cpu-idle-states = <&LITTLE_CPU_SLEEP_0
+					   &LITTLE_CPU_SLEEP_1
+					   &CLUSTER_SLEEP_0>;
 			next-level-cache = <&L2_300>;
 			#cooling-cells = <2>;
 			qcom,freq-domain = <&cpufreq_hw 0>;
@@ -145,6 +157,9 @@
 			compatible = "arm,armv8";
 			reg = <0x0 0x400>;
 			enable-method = "psci";
+			cpu-idle-states = <&LITTLE_CPU_SLEEP_0
+					   &LITTLE_CPU_SLEEP_1
+					   &CLUSTER_SLEEP_0>;
 			next-level-cache = <&L2_400>;
 			#cooling-cells = <2>;
 			qcom,freq-domain = <&cpufreq_hw 0>;
@@ -159,6 +174,9 @@
 			compatible = "arm,armv8";
 			reg = <0x0 0x500>;
 			enable-method = "psci";
+			cpu-idle-states = <&LITTLE_CPU_SLEEP_0
+					   &LITTLE_CPU_SLEEP_1
+					   &CLUSTER_SLEEP_0>;
 			next-level-cache = <&L2_500>;
 			#cooling-cells = <2>;
 			qcom,freq-domain = <&cpufreq_hw 0>;
@@ -173,6 +191,9 @@
 			compatible = "arm,armv8";
 			reg = <0x0 0x600>;
 			enable-method = "psci";
+			cpu-idle-states = <&BIG_CPU_SLEEP_0
+					   &BIG_CPU_SLEEP_1
+					   &CLUSTER_SLEEP_0>;
 			next-level-cache = <&L2_600>;
 			#cooling-cells = <2>;
 			qcom,freq-domain = <&cpufreq_hw 1>;
@@ -187,6 +208,9 @@
 			compatible = "arm,armv8";
 			reg = <0x0 0x700>;
 			enable-method = "psci";
+			cpu-idle-states = <&BIG_CPU_SLEEP_0
+					   &BIG_CPU_SLEEP_1
+					   &CLUSTER_SLEEP_0>;
 			next-level-cache = <&L2_700>;
 			#cooling-cells = <2>;
 			qcom,freq-domain = <&cpufreq_hw 1>;
@@ -195,6 +219,60 @@
 				next-level-cache = <&L3_0>;
 			};
 		};
+
+		idle-states {
+			entry-method = "psci";
+
+			LITTLE_CPU_SLEEP_0: cpu-sleep-0-0 {
+				compatible = "arm,idle-state";
+				idle-state-name = "little-power-down";
+				arm,psci-suspend-param = <0x40000003>;
+				entry-latency-us = <549>;
+				exit-latency-us = <901>;
+				min-residency-us = <1774>;
+				local-timer-stop;
+			};
+
+			LITTLE_CPU_SLEEP_1: cpu-sleep-0-1 {
+				compatible = "arm,idle-state";
+				idle-state-name = "little-rail-power-down";
+				arm,psci-suspend-param = <0x40000004>;
+				entry-latency-us = <702>;
+				exit-latency-us = <915>;
+				min-residency-us = <4001>;
+				local-timer-stop;
+			};
+
+			BIG_CPU_SLEEP_0: cpu-sleep-1-0 {
+				compatible = "arm,idle-state";
+				idle-state-name = "big-power-down";
+				arm,psci-suspend-param = <0x40000003>;
+				entry-latency-us = <523>;
+				exit-latency-us = <1244>;
+				min-residency-us = <2207>;
+				local-timer-stop;
+			};
+
+			BIG_CPU_SLEEP_1: cpu-sleep-1-1 {
+				compatible = "arm,idle-state";
+				idle-state-name = "big-rail-power-down";
+				arm,psci-suspend-param = <0x40000004>;
+				entry-latency-us = <526>;
+				exit-latency-us = <1854>;
+				min-residency-us = <5555>;
+				local-timer-stop;
+			};
+
+			CLUSTER_SLEEP_0: cluster-sleep-0 {
+				compatible = "arm,idle-state";
+				idle-state-name = "cluster-power-down";
+				arm,psci-suspend-param = <0x40003444>;
+				entry-latency-us = <3263>;
+				exit-latency-us = <6562>;
+				min-residency-us = <9926>;
+				local-timer-stop;
+			};
+		};
 	};
 
 	memory@80000000 {


* [PATCH v9 2/3] soc: qcom: rpmh: Update dirty flag only when data changes
  2020-02-28 11:38 [PATCH v9 0/3] Invoke rpmh_flush for non OSI targets Maulik Shah
  2020-02-28 11:38 ` [PATCH v9 1/3] arm64: dts: qcom: sc7180: Add cpuidle low power states Maulik Shah
@ 2020-02-28 11:38 ` Maulik Shah
  2020-02-28 21:50   ` Evan Green
  2020-02-28 11:38 ` [PATCH v9 3/3] soc: qcom: rpmh: Invoke rpmh_flush() for dirty caches Maulik Shah
  2 siblings, 1 reply; 14+ messages in thread
From: Maulik Shah @ 2020-02-28 11:38 UTC (permalink / raw)
  To: swboyd, mka, evgreen, bjorn.andersson
  Cc: linux-kernel, linux-arm-msm, agross, dianders, rnayak, ilina,
	lsrao, Maulik Shah

Currently the rpmh ctrlr dirty flag is set in all cases, regardless of whether
the data has really changed. Add changes to update the dirty flag only when
data is changed to newer values.

Also move the dirty flag updates to happen from within cache_lock, and remove
an unnecessary INIT_LIST_HEAD() call and a default case from the switch.

Fixes: 600513dfeef3 ("drivers: qcom: rpmh: cache sleep/wake state requests")
Signed-off-by: Maulik Shah <mkshah@codeaurora.org>
Reviewed-by: Srinivas Rao L <lsrao@codeaurora.org>
---
 drivers/soc/qcom/rpmh.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/soc/qcom/rpmh.c b/drivers/soc/qcom/rpmh.c
index eb0ded0..f28afe4 100644
--- a/drivers/soc/qcom/rpmh.c
+++ b/drivers/soc/qcom/rpmh.c
@@ -133,26 +133,30 @@ static struct cache_req *cache_rpm_request(struct rpmh_ctrlr *ctrlr,
 
 	req->addr = cmd->addr;
 	req->sleep_val = req->wake_val = UINT_MAX;
-	INIT_LIST_HEAD(&req->list);
 	list_add_tail(&req->list, &ctrlr->cache);
 
 existing:
 	switch (state) {
 	case RPMH_ACTIVE_ONLY_STATE:
-		if (req->sleep_val != UINT_MAX)
+		if (req->sleep_val != UINT_MAX) {
 			req->wake_val = cmd->data;
+			ctrlr->dirty = true;
+		}
 		break;
 	case RPMH_WAKE_ONLY_STATE:
-		req->wake_val = cmd->data;
+		if (req->wake_val != cmd->data) {
+			req->wake_val = cmd->data;
+			ctrlr->dirty = true;
+		}
 		break;
 	case RPMH_SLEEP_STATE:
-		req->sleep_val = cmd->data;
-		break;
-	default:
+		if (req->sleep_val != cmd->data) {
+			req->sleep_val = cmd->data;
+			ctrlr->dirty = true;
+		}
 		break;
 	}
 
-	ctrlr->dirty = true;
 unlock:
 	spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
 
@@ -287,6 +291,7 @@ static void cache_batch(struct rpmh_ctrlr *ctrlr, struct batch_cache_req *req)
 
 	spin_lock_irqsave(&ctrlr->cache_lock, flags);
 	list_add_tail(&req->list, &ctrlr->batch_cache);
+	ctrlr->dirty = true;
 	spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
 }
 
@@ -323,6 +328,7 @@ static void invalidate_batch(struct rpmh_ctrlr *ctrlr)
 	list_for_each_entry_safe(req, tmp, &ctrlr->batch_cache, list)
 		kfree(req);
 	INIT_LIST_HEAD(&ctrlr->batch_cache);
+	ctrlr->dirty = true;
 	spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
 }
 
@@ -507,7 +513,6 @@ int rpmh_invalidate(const struct device *dev)
 	int ret;
 
 	invalidate_batch(ctrlr);
-	ctrlr->dirty = true;
 
 	do {
 		ret = rpmh_rsc_invalidate(ctrlr_to_drv(ctrlr));


* [PATCH v9 3/3] soc: qcom: rpmh: Invoke rpmh_flush() for dirty caches
  2020-02-28 11:38 [PATCH v9 0/3] Invoke rpmh_flush for non OSI targets Maulik Shah
  2020-02-28 11:38 ` [PATCH v9 1/3] arm64: dts: qcom: sc7180: Add cpuidle low power states Maulik Shah
  2020-02-28 11:38 ` [PATCH v9 2/3] soc: qcom: rpmh: Update dirty flag only when data changes Maulik Shah
@ 2020-02-28 11:38 ` Maulik Shah
  2020-02-28 21:48   ` Evan Green
  2020-02-28 23:45   ` Doug Anderson
  2 siblings, 2 replies; 14+ messages in thread
From: Maulik Shah @ 2020-02-28 11:38 UTC (permalink / raw)
  To: swboyd, mka, evgreen, bjorn.andersson
  Cc: linux-kernel, linux-arm-msm, agross, dianders, rnayak, ilina,
	lsrao, Maulik Shah

Add changes to invoke rpmh_flush() from within cache_lock when the data
in cache is dirty.

This is done only if OSI is not supported in PSCI. If OSI is supported,
rpmh_flush() can be invoked when the last CPU is entering power collapse,
the deepest low power mode.

Also remove the COMPILE_TEST dependency from Kconfig option QCOM_RPMH so the
driver is only compiled for arm64, which supports the psci_has_osi_support()
API.

Signed-off-by: Maulik Shah <mkshah@codeaurora.org>
Reviewed-by: Srinivas Rao L <lsrao@codeaurora.org>
---
 drivers/soc/qcom/Kconfig |  2 +-
 drivers/soc/qcom/rpmh.c  | 33 ++++++++++++++++++++++-----------
 2 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig
index d0a73e7..2e581bc 100644
--- a/drivers/soc/qcom/Kconfig
+++ b/drivers/soc/qcom/Kconfig
@@ -105,7 +105,7 @@ config QCOM_RMTFS_MEM
 
 config QCOM_RPMH
 	bool "Qualcomm RPM-Hardened (RPMH) Communication"
-	depends on ARCH_QCOM && ARM64 || COMPILE_TEST
+	depends on ARCH_QCOM && ARM64
 	help
 	  Support for communication with the hardened-RPM blocks in
 	  Qualcomm Technologies Inc (QTI) SoCs. RPMH communication uses an
diff --git a/drivers/soc/qcom/rpmh.c b/drivers/soc/qcom/rpmh.c
index f28afe4..6a5a60c 100644
--- a/drivers/soc/qcom/rpmh.c
+++ b/drivers/soc/qcom/rpmh.c
@@ -12,6 +12,7 @@
 #include <linux/module.h>
 #include <linux/of.h>
 #include <linux/platform_device.h>
+#include <linux/psci.h>
 #include <linux/slab.h>
 #include <linux/spinlock.h>
 #include <linux/types.h>
@@ -158,6 +159,13 @@ static struct cache_req *cache_rpm_request(struct rpmh_ctrlr *ctrlr,
 	}
 
 unlock:
+	if (ctrlr->dirty && !psci_has_osi_support()) {
+		if (rpmh_flush(ctrlr)) {
+			spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
+			return ERR_PTR(-EINVAL);
+		}
+	}
+
 	spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
 
 	return req;
@@ -285,26 +293,35 @@ int rpmh_write(const struct device *dev, enum rpmh_state state,
 }
 EXPORT_SYMBOL(rpmh_write);
 
-static void cache_batch(struct rpmh_ctrlr *ctrlr, struct batch_cache_req *req)
+static int cache_batch(struct rpmh_ctrlr *ctrlr, struct batch_cache_req *req)
 {
 	unsigned long flags;
 
 	spin_lock_irqsave(&ctrlr->cache_lock, flags);
+
 	list_add_tail(&req->list, &ctrlr->batch_cache);
 	ctrlr->dirty = true;
+
+	if (!psci_has_osi_support()) {
+		if (rpmh_flush(ctrlr)) {
+			spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
+			return -EINVAL;
+		}
+	}
+
 	spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
+
+	return 0;
 }
 
 static int flush_batch(struct rpmh_ctrlr *ctrlr)
 {
 	struct batch_cache_req *req;
 	const struct rpmh_request *rpm_msg;
-	unsigned long flags;
 	int ret = 0;
 	int i;
 
 	/* Send Sleep/Wake requests to the controller, expect no response */
-	spin_lock_irqsave(&ctrlr->cache_lock, flags);
 	list_for_each_entry(req, &ctrlr->batch_cache, list) {
 		for (i = 0; i < req->count; i++) {
 			rpm_msg = req->rpm_msgs + i;
@@ -314,7 +331,6 @@ static int flush_batch(struct rpmh_ctrlr *ctrlr)
 				break;
 		}
 	}
-	spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
 
 	return ret;
 }
@@ -386,10 +402,8 @@ int rpmh_write_batch(const struct device *dev, enum rpmh_state state,
 		cmd += n[i];
 	}
 
-	if (state != RPMH_ACTIVE_ONLY_STATE) {
-		cache_batch(ctrlr, req);
-		return 0;
-	}
+	if (state != RPMH_ACTIVE_ONLY_STATE)
+		return cache_batch(ctrlr, req);
 
 	for (i = 0; i < count; i++) {
 		struct completion *compl = &compls[i];
@@ -455,9 +469,6 @@ static int send_single(struct rpmh_ctrlr *ctrlr, enum rpmh_state state,
  * Return: -EBUSY if the controller is busy, probably waiting on a response
  * to a RPMH request sent earlier.
  *
- * This function is always called from the sleep code from the last CPU
- * that is powering down the entire system. Since no other RPMH API would be
- * executing at this time, it is safe to run lockless.
  */
 int rpmh_flush(struct rpmh_ctrlr *ctrlr)
 {


* Re: [PATCH v9 3/3] soc: qcom: rpmh: Invoke rpmh_flush() for dirty caches
  2020-02-28 11:38 ` [PATCH v9 3/3] soc: qcom: rpmh: Invoke rpmh_flush() for dirty caches Maulik Shah
@ 2020-02-28 21:48   ` Evan Green
  2020-03-03  5:48     ` Maulik Shah
  2020-02-28 23:45   ` Doug Anderson
  1 sibling, 1 reply; 14+ messages in thread
From: Evan Green @ 2020-02-28 21:48 UTC (permalink / raw)
  To: Maulik Shah
  Cc: Stephen Boyd, Matthias Kaehlcke, Bjorn Andersson, LKML,
	linux-arm-msm, Andy Gross, Doug Anderson, Rajendra Nayak,
	Lina Iyer, lsrao

Hi Maulik,
Thanks for spinning this so promptly.

On Fri, Feb 28, 2020 at 3:38 AM Maulik Shah <mkshah@codeaurora.org> wrote:
>
> Add changes to invoke rpmh_flush() from within cache_lock when the data
> in cache is dirty.
>
> This is done only if OSI is not supported in PSCI. If OSI is supported,
> rpmh_flush() can be invoked when the last CPU is entering power collapse,
> the deepest low power mode.
>
> Also remove the COMPILE_TEST dependency from Kconfig option QCOM_RPMH so the
> driver is only compiled for arm64, which supports the psci_has_osi_support()
> API.
>
> Signed-off-by: Maulik Shah <mkshah@codeaurora.org>
> Reviewed-by: Srinivas Rao L <lsrao@codeaurora.org>
> ---
>  drivers/soc/qcom/Kconfig |  2 +-
>  drivers/soc/qcom/rpmh.c  | 33 ++++++++++++++++++++++-----------
>  2 files changed, 23 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig
> index d0a73e7..2e581bc 100644
> --- a/drivers/soc/qcom/Kconfig
> +++ b/drivers/soc/qcom/Kconfig
> @@ -105,7 +105,7 @@ config QCOM_RMTFS_MEM
>
>  config QCOM_RPMH
>         bool "Qualcomm RPM-Hardened (RPMH) Communication"
> -       depends on ARCH_QCOM && ARM64 || COMPILE_TEST
> +       depends on ARCH_QCOM && ARM64
>         help
>           Support for communication with the hardened-RPM blocks in
>           Qualcomm Technologies Inc (QTI) SoCs. RPMH communication uses an
> diff --git a/drivers/soc/qcom/rpmh.c b/drivers/soc/qcom/rpmh.c
> index f28afe4..6a5a60c 100644
> --- a/drivers/soc/qcom/rpmh.c
> +++ b/drivers/soc/qcom/rpmh.c
> @@ -12,6 +12,7 @@
>  #include <linux/module.h>
>  #include <linux/of.h>
>  #include <linux/platform_device.h>
> +#include <linux/psci.h>
>  #include <linux/slab.h>
>  #include <linux/spinlock.h>
>  #include <linux/types.h>
> @@ -158,6 +159,13 @@ static struct cache_req *cache_rpm_request(struct rpmh_ctrlr *ctrlr,
>         }
>
>  unlock:
> +       if (ctrlr->dirty && !psci_has_osi_support()) {
> +               if (rpmh_flush(ctrlr)) {
> +                       spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
> +                       return ERR_PTR(-EINVAL);
> +               }
> +       }
> +
>         spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
>
>         return req;
> @@ -285,26 +293,35 @@ int rpmh_write(const struct device *dev, enum rpmh_state state,
>  }
>  EXPORT_SYMBOL(rpmh_write);
>
> -static void cache_batch(struct rpmh_ctrlr *ctrlr, struct batch_cache_req *req)
> +static int cache_batch(struct rpmh_ctrlr *ctrlr, struct batch_cache_req *req)
>  {
>         unsigned long flags;
>
>         spin_lock_irqsave(&ctrlr->cache_lock, flags);
> +
>         list_add_tail(&req->list, &ctrlr->batch_cache);
>         ctrlr->dirty = true;
> +
> +       if (!psci_has_osi_support()) {
> +               if (rpmh_flush(ctrlr)) {
> +                       spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
> +                       return -EINVAL;
> +               }
> +       }
> +
>         spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
> +
> +       return 0;
>  }
>
>  static int flush_batch(struct rpmh_ctrlr *ctrlr)
>  {
>         struct batch_cache_req *req;
>         const struct rpmh_request *rpm_msg;
> -       unsigned long flags;
>         int ret = 0;
>         int i;
>
>         /* Send Sleep/Wake requests to the controller, expect no response */
> -       spin_lock_irqsave(&ctrlr->cache_lock, flags);
>         list_for_each_entry(req, &ctrlr->batch_cache, list) {
>                 for (i = 0; i < req->count; i++) {
>                         rpm_msg = req->rpm_msgs + i;
> @@ -314,7 +331,6 @@ static int flush_batch(struct rpmh_ctrlr *ctrlr)
>                                 break;
>                 }
>         }
> -       spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
>
>         return ret;
>  }
> @@ -386,10 +402,8 @@ int rpmh_write_batch(const struct device *dev, enum rpmh_state state,
>                 cmd += n[i];
>         }
>
> -       if (state != RPMH_ACTIVE_ONLY_STATE) {
> -               cache_batch(ctrlr, req);
> -               return 0;
> -       }
> +       if (state != RPMH_ACTIVE_ONLY_STATE)
> +               return cache_batch(ctrlr, req);
>
>         for (i = 0; i < count; i++) {
>                 struct completion *compl = &compls[i];
> @@ -455,9 +469,6 @@ static int send_single(struct rpmh_ctrlr *ctrlr, enum rpmh_state state,
>   * Return: -EBUSY if the controller is busy, probably waiting on a response
>   * to a RPMH request sent earlier.
>   *
> - * This function is always called from the sleep code from the last CPU
> - * that is powering down the entire system. Since no other RPMH API would be
> - * executing at this time, it is safe to run lockless.

Oh nice, I didn't even see that comment. We should probably replace
that with a comment indicating that we assume ctrlr->cache_lock is
already held.

Please also remove this comment in rpmh_flush():
        /*
         * Nobody else should be calling this function other than system PM,
         * hence we can run without locks.
         */
        list_for_each_entry(p, &ctrlr->cache, list) {

-Evan

>   */
>  int rpmh_flush(struct rpmh_ctrlr *ctrlr)
>  {
> --
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
> of Code Aurora Forum, hosted by The Linux Foundation


* Re: [PATCH v9 2/3] soc: qcom: rpmh: Update dirty flag only when data changes
  2020-02-28 11:38 ` [PATCH v9 2/3] soc: qcom: rpmh: Update dirty flag only when data changes Maulik Shah
@ 2020-02-28 21:50   ` Evan Green
  2020-03-02 11:38     ` Maulik Shah
  0 siblings, 1 reply; 14+ messages in thread
From: Evan Green @ 2020-02-28 21:50 UTC (permalink / raw)
  To: Maulik Shah
  Cc: Stephen Boyd, Matthias Kaehlcke, Bjorn Andersson, LKML,
	linux-arm-msm, Andy Gross, Doug Anderson, Rajendra Nayak,
	Lina Iyer, lsrao

On Fri, Feb 28, 2020 at 3:38 AM Maulik Shah <mkshah@codeaurora.org> wrote:
>
> Currently the rpmh ctrlr dirty flag is set in all cases, regardless of whether
> the data has really changed. Add changes to update the dirty flag only when
> data is changed to newer values.
>
> Also move the dirty flag updates to happen from within cache_lock, and remove
> an unnecessary INIT_LIST_HEAD() call and a default case from the switch.
>
> Fixes: 600513dfeef3 ("drivers: qcom: rpmh: cache sleep/wake state requests")
> Signed-off-by: Maulik Shah <mkshah@codeaurora.org>
> Reviewed-by: Srinivas Rao L <lsrao@codeaurora.org>
> ---
>  drivers/soc/qcom/rpmh.c | 21 +++++++++++++--------
>  1 file changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/soc/qcom/rpmh.c b/drivers/soc/qcom/rpmh.c
> index eb0ded0..f28afe4 100644
> --- a/drivers/soc/qcom/rpmh.c
> +++ b/drivers/soc/qcom/rpmh.c
> @@ -133,26 +133,30 @@ static struct cache_req *cache_rpm_request(struct rpmh_ctrlr *ctrlr,
>
>         req->addr = cmd->addr;
>         req->sleep_val = req->wake_val = UINT_MAX;
> -       INIT_LIST_HEAD(&req->list);
>         list_add_tail(&req->list, &ctrlr->cache);
>
>  existing:
>         switch (state) {
>         case RPMH_ACTIVE_ONLY_STATE:
> -               if (req->sleep_val != UINT_MAX)
> +               if (req->sleep_val != UINT_MAX) {
>                         req->wake_val = cmd->data;
> +                       ctrlr->dirty = true;
> +               }
>                 break;
>         case RPMH_WAKE_ONLY_STATE:
> -               req->wake_val = cmd->data;
> +               if (req->wake_val != cmd->data) {
> +                       req->wake_val = cmd->data;
> +                       ctrlr->dirty = true;
> +               }
>                 break;
>         case RPMH_SLEEP_STATE:
> -               req->sleep_val = cmd->data;
> -               break;
> -       default:
> +               if (req->sleep_val != cmd->data) {
> +                       req->sleep_val = cmd->data;
> +                       ctrlr->dirty = true;
> +               }
>                 break;
>         }
>
> -       ctrlr->dirty = true;
>  unlock:
>         spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
>
> @@ -287,6 +291,7 @@ static void cache_batch(struct rpmh_ctrlr *ctrlr, struct batch_cache_req *req)
>
>         spin_lock_irqsave(&ctrlr->cache_lock, flags);
>         list_add_tail(&req->list, &ctrlr->batch_cache);
> +       ctrlr->dirty = true;

Is this fixing a case where we were not previously marking the
controller dirty but should have? I notice there's a fixes tag, but it
would be helpful to add something to the commit text indicating that
you're fixing a missing case where the controller should have been
marked dirty. With that fixed, you can add my tag:

Reviewed-by: Evan Green <evgreen@chromium.org>


* Re: [PATCH v9 3/3] soc: qcom: rpmh: Invoke rpmh_flush() for dirty caches
  2020-02-28 11:38 ` [PATCH v9 3/3] soc: qcom: rpmh: Invoke rpmh_flush() for dirty caches Maulik Shah
  2020-02-28 21:48   ` Evan Green
@ 2020-02-28 23:45   ` Doug Anderson
  2020-03-03  5:47     ` Maulik Shah
  1 sibling, 1 reply; 14+ messages in thread
From: Doug Anderson @ 2020-02-28 23:45 UTC (permalink / raw)
  To: Maulik Shah
  Cc: Stephen Boyd, Matthias Kaehlcke, Evan Green, Bjorn Andersson,
	LKML, linux-arm-msm, Andy Gross, Rajendra Nayak, Lina Iyer,
	lsrao

Hi,

On Fri, Feb 28, 2020 at 3:38 AM Maulik Shah <mkshah@codeaurora.org> wrote:
>
> Add changes to invoke rpmh_flush() from within cache_lock when the data
> in cache is dirty.
>
> This is done only if OSI is not supported in PSCI. If OSI is supported,
> rpmh_flush() can be invoked when the last CPU is entering power collapse,
> the deepest low power mode.
>
> Also remove the COMPILE_TEST dependency from Kconfig option QCOM_RPMH so the
> driver is only compiled for arm64, which supports the psci_has_osi_support()
> API.
>
> Signed-off-by: Maulik Shah <mkshah@codeaurora.org>
> Reviewed-by: Srinivas Rao L <lsrao@codeaurora.org>
> ---
>  drivers/soc/qcom/Kconfig |  2 +-
>  drivers/soc/qcom/rpmh.c  | 33 ++++++++++++++++++++++-----------
>  2 files changed, 23 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig
> index d0a73e7..2e581bc 100644
> --- a/drivers/soc/qcom/Kconfig
> +++ b/drivers/soc/qcom/Kconfig
> @@ -105,7 +105,7 @@ config QCOM_RMTFS_MEM
>
>  config QCOM_RPMH
>         bool "Qualcomm RPM-Hardened (RPMH) Communication"
> -       depends on ARCH_QCOM && ARM64 || COMPILE_TEST
> +       depends on ARCH_QCOM && ARM64
>         help
>           Support for communication with the hardened-RPM blocks in
>           Qualcomm Technologies Inc (QTI) SoCs. RPMH communication uses an
> diff --git a/drivers/soc/qcom/rpmh.c b/drivers/soc/qcom/rpmh.c
> index f28afe4..6a5a60c 100644
> --- a/drivers/soc/qcom/rpmh.c
> +++ b/drivers/soc/qcom/rpmh.c
> @@ -12,6 +12,7 @@
>  #include <linux/module.h>
>  #include <linux/of.h>
>  #include <linux/platform_device.h>
> +#include <linux/psci.h>
>  #include <linux/slab.h>
>  #include <linux/spinlock.h>
>  #include <linux/types.h>
> @@ -158,6 +159,13 @@ static struct cache_req *cache_rpm_request(struct rpmh_ctrlr *ctrlr,
>         }
>
>  unlock:
> +       if (ctrlr->dirty && !psci_has_osi_support()) {
> +               if (rpmh_flush(ctrlr)) {
> +                       spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
> +                       return ERR_PTR(-EINVAL);
> +               }
> +       }

It's been a long time since I looked in depth at RPMH, but upon a
first glance this seems like it's gonna be terrible for performance.
You're going to send every entry again and again, aren't you?  In
other words in pseudo-code:

1. rpmh_write(addr=0x10, data=0x99);
==> writes on the bus (0x10, 0x99)

2. rpmh_write(addr=0x11, data=0xaa);
==> writes on the bus (0x10, 0x99)
==> writes on the bus (0x11, 0xaa)

3. rpmh_write(addr=0x10, data=0xbb);
==> writes on the bus (0x10, 0xbb)
==> writes on the bus (0x11, 0xaa)

4. rpmh_write(addr=0x12, data=0xcc);
==> writes on the bus (0x10, 0xbb)
==> writes on the bus (0x11, 0xaa)
==> writes on the bus (0x12, 0xcc)

That seems bad.  Why can't you just send the new request itself and
forget adding it to the cache?  In other words don't even call
cache_rpm_request() in the non-OSI case and then in __rpmh_write()
just send right away...
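
To make the cost concrete, the sequence above can be modeled with a small C
sketch. This is not driver code: struct vote, struct vote_cache and both
write_* helpers are hypothetical names, and a "bus write" is only a counter.
It assumes that a flush resends every cached entry, as the pseudo-code
describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ENTRIES 16

/* Hypothetical stand-in for one cached sleep/wake vote. */
struct vote {
	uint32_t addr;
	uint32_t data;
};

struct vote_cache {
	struct vote entries[MAX_ENTRIES];
	size_t count;
	unsigned long bus_writes;	/* simulated bus transfers so far */
};

/* Insert or update a vote; returns true only if the cache content changed. */
bool cache_update(struct vote_cache *c, uint32_t addr, uint32_t data)
{
	size_t i;

	for (i = 0; i < c->count; i++) {
		if (c->entries[i].addr == addr) {
			if (c->entries[i].data == data)
				return false;
			c->entries[i].data = data;
			return true;
		}
	}

	assert(c->count < MAX_ENTRIES);
	c->entries[c->count].addr = addr;
	c->entries[c->count].data = data;
	c->count++;
	return true;
}

/* Strategy 1: resend the whole cache whenever a write dirties it. */
void write_flush_all(struct vote_cache *c, uint32_t addr, uint32_t data)
{
	if (cache_update(c, addr, data))
		c->bus_writes += c->count;
}

/* Strategy 2: send only the vote that changed. */
void write_send_one(struct vote_cache *c, uint32_t addr, uint32_t data)
{
	if (cache_update(c, addr, data))
		c->bus_writes += 1;
}
```

For the four writes listed above, the flush-everything strategy counts
1 + 2 + 2 + 3 = 8 transfers, while sending only the changed vote counts 4.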

I tried to test this and my printouts didn't show anything actually
happening in rpmh_flush().  Maybe I just don't have the right patches
to exercise this properly...


> +
>         spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
>
>         return req;
> @@ -285,26 +293,35 @@ int rpmh_write(const struct device *dev, enum rpmh_state state,
>  }
>  EXPORT_SYMBOL(rpmh_write);
>
> -static void cache_batch(struct rpmh_ctrlr *ctrlr, struct batch_cache_req *req)
> +static int cache_batch(struct rpmh_ctrlr *ctrlr, struct batch_cache_req *req)
>  {
>         unsigned long flags;
>
>         spin_lock_irqsave(&ctrlr->cache_lock, flags);
> +
>         list_add_tail(&req->list, &ctrlr->batch_cache);
>         ctrlr->dirty = true;
> +
> +       if (!psci_has_osi_support()) {
> +               if (rpmh_flush(ctrlr)) {
> +                       spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
> +                       return -EINVAL;
> +               }
> +       }
> +
>         spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
> +
> +       return 0;
>  }
>
>  static int flush_batch(struct rpmh_ctrlr *ctrlr)
>  {
>         struct batch_cache_req *req;
>         const struct rpmh_request *rpm_msg;
> -       unsigned long flags;
>         int ret = 0;
>         int i;
>
>         /* Send Sleep/Wake requests to the controller, expect no response */
> -       spin_lock_irqsave(&ctrlr->cache_lock, flags);
>         list_for_each_entry(req, &ctrlr->batch_cache, list) {
>                 for (i = 0; i < req->count; i++) {
>                         rpm_msg = req->rpm_msgs + i;
> @@ -314,7 +331,6 @@ static int flush_batch(struct rpmh_ctrlr *ctrlr)
>                                 break;
>                 }
>         }
> -       spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
>
>         return ret;
>  }
> @@ -386,10 +402,8 @@ int rpmh_write_batch(const struct device *dev, enum rpmh_state state,
>                 cmd += n[i];
>         }
>
> -       if (state != RPMH_ACTIVE_ONLY_STATE) {
> -               cache_batch(ctrlr, req);
> -               return 0;
> -       }
> +       if (state != RPMH_ACTIVE_ONLY_STATE)
> +               return cache_batch(ctrlr, req);

I'm curious: why not just do:

if (state != RPMH_ACTIVE_ONLY_STATE && psci_has_osi_support()) {
  cache_batch(ctrlr, req);
  return 0;
}

...AKA don't even cache it up if we're not in OSI mode.  IIUC this
would be a huge deal because with your code you're doing the whole
RPMH transfer under "spin_lock_irqsave", right?  And presumably RPMH
transfers are somewhat slow, otherwise why did anyone come up with
this whole caching / last-man-down scheme to start with?

OK, it turned out to be at least slightly more complex because it
appears that we're supposed to use rpmh_rsc_write_ctrl_data() for
sleep/wake stuff and that they never do completions, but it really
wasn't too hard.  I prototyped it at <http://crrev.com/c/2080916>.
Feel free to hijack that change if it looks like a starting point and
if it looks like I'm not too confused.


>         for (i = 0; i < count; i++) {
>                 struct completion *compl = &compls[i];
> @@ -455,9 +469,6 @@ static int send_single(struct rpmh_ctrlr *ctrlr, enum rpmh_state state,
>   * Return: -EBUSY if the controller is busy, probably waiting on a response
>   * to a RPMH request sent earlier.
>   *
> - * This function is always called from the sleep code from the last CPU
> - * that is powering down the entire system. Since no other RPMH API would be
> - * executing at this time, it is safe to run lockless.

Interesting that you removed this comment but not the copy of the
comment inside this function.  Was that on purpose?


* Re: [PATCH v9 2/3] soc: qcom: rpmh: Update dirty flag only when data changes
  2020-02-28 21:50   ` Evan Green
@ 2020-03-02 11:38     ` Maulik Shah
  0 siblings, 0 replies; 14+ messages in thread
From: Maulik Shah @ 2020-03-02 11:38 UTC (permalink / raw)
  To: Evan Green
  Cc: Stephen Boyd, Matthias Kaehlcke, Bjorn Andersson, LKML,
	linux-arm-msm, Andy Gross, Doug Anderson, Rajendra Nayak,
	Lina Iyer, lsrao


On 2/29/2020 3:20 AM, Evan Green wrote:
> On Fri, Feb 28, 2020 at 3:38 AM Maulik Shah <mkshah@codeaurora.org> wrote:
>> Currently rpmh ctrlr dirty flag is set for all cases regardless of data
>> is really changed or not. Add changes to update dirty flag when data is
>> changed to newer values.
>>
>> Also move dirty flag updates to happen from within cache_lock and remove
>> unnecessary INIT_LIST_HEAD() call and a default case from switch.
>>
>> Fixes: 600513dfeef3 ("drivers: qcom: rpmh: cache sleep/wake state requests")
>> Signed-off-by: Maulik Shah <mkshah@codeaurora.org>
>> Reviewed-by: Srinivas Rao L <lsrao@codeaurora.org>
>> ---
>>   drivers/soc/qcom/rpmh.c | 21 +++++++++++++--------
>>   1 file changed, 13 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/soc/qcom/rpmh.c b/drivers/soc/qcom/rpmh.c
>> index eb0ded0..f28afe4 100644
>> --- a/drivers/soc/qcom/rpmh.c
>> +++ b/drivers/soc/qcom/rpmh.c
>> @@ -133,26 +133,30 @@ static struct cache_req *cache_rpm_request(struct rpmh_ctrlr *ctrlr,
>>
>>          req->addr = cmd->addr;
>>          req->sleep_val = req->wake_val = UINT_MAX;
>> -       INIT_LIST_HEAD(&req->list);
>>          list_add_tail(&req->list, &ctrlr->cache);
>>
>>   existing:
>>          switch (state) {
>>          case RPMH_ACTIVE_ONLY_STATE:
>> -               if (req->sleep_val != UINT_MAX)
>> +               if (req->sleep_val != UINT_MAX) {
>>                          req->wake_val = cmd->data;
>> +                       ctrlr->dirty = true;
>> +               }
>>                  break;
>>          case RPMH_WAKE_ONLY_STATE:
>> -               req->wake_val = cmd->data;
>> +               if (req->wake_val != cmd->data) {
>> +                       req->wake_val = cmd->data;
>> +                       ctrlr->dirty = true;
>> +               }
>>                  break;
>>          case RPMH_SLEEP_STATE:
>> -               req->sleep_val = cmd->data;
>> -               break;
>> -       default:
>> +               if (req->sleep_val != cmd->data) {
>> +                       req->sleep_val = cmd->data;
>> +                       ctrlr->dirty = true;
>> +               }
>>                  break;
>>          }
>>
>> -       ctrlr->dirty = true;
>>   unlock:
>>          spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
>>
>> @@ -287,6 +291,7 @@ static void cache_batch(struct rpmh_ctrlr *ctrlr, struct batch_cache_req *req)
>>
>>          spin_lock_irqsave(&ctrlr->cache_lock, flags);
>>          list_add_tail(&req->list, &ctrlr->batch_cache);
>> +       ctrlr->dirty = true;
> Is this fixing a case where we were not previously marking the
> controller dirty but should have?
Well, it's not a missing case as such. Let me explain.

Whenever rpmh_invalidate() is called (let's call it event a), the
controller dirty flag is set. After invalidating, the aggregator driver
always sends new sleep data (event b) and wake data (event c). The last
CPU going down flushes the cached data and the dirty flag is cleared
(event d).

Hence setting the dirty flag only once during event a, and not updating
it during events b and c, was not an issue or a missing case, at least
until now.

However, with the change to invoke rpmh_flush() whenever data is dirty,
the flag must be set for each of these events. Otherwise the
rpmh_flush() call made during event b would mark the data as non-dirty
when it finishes, and the rpmh_flush() call made during event c would
then not flush anything because the caches are not dirty, which is not
what we expect.
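
The sequence of events a, b and c can be modelled in a few lines of
standalone C. This is only a sketch under stated assumptions:
struct model_ctrlr, model_flush() and model_wake_after_sequence() are
hypothetical names invented for illustration and are not part of
drivers/soc/qcom/rpmh.c, and the values 0xaa and 0xbb are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the controller state. */
struct model_ctrlr {
	bool dirty;                       /* cache differs from the TCS contents */
	unsigned int sleep_val, wake_val; /* cached values */
	unsigned int tcs_sleep, tcs_wake; /* what the "hardware" currently holds */
};

/* Model of a flush: copy the cache to the TCSes only if it is dirty. */
static void model_flush(struct model_ctrlr *c)
{
	if (!c->dirty)
		return;
	c->tcs_sleep = c->sleep_val;
	c->tcs_wake = c->wake_val;
	c->dirty = false;
}

/*
 * Replay events a, b and c with a flush after every cache update (the
 * non-OSI behaviour). Returns the wake value that ended up in the TCS.
 */
static unsigned int model_wake_after_sequence(bool mark_dirty_on_write)
{
	struct model_ctrlr c = { 0 };

	/* event a: invalidate marks the controller dirty */
	c.dirty = true;

	/* event b: new sleep data, then an immediate flush */
	c.sleep_val = 0xaa;
	if (mark_dirty_on_write)
		c.dirty = true;
	model_flush(&c);

	/* event c: new wake data, then an immediate flush */
	c.wake_val = 0xbb;
	if (mark_dirty_on_write)
		c.dirty = true;
	model_flush(&c);

	return c.tcs_wake;
}
```

In this model, calling model_wake_after_sequence(false) leaves the wake
TCS stale because the second flush is skipped, while passing true lets
the wake data reach the TCS.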


> I notice there's a fixes tag, but it
> would be helpful to add something to the commit text indicating that
> you're fixing a missing case where the controller should have been
> marked dirty.
Sure, I can add details on why we need to update the dirty flag every
time the caches are touched.
> With that fixed, you can add my tag:
>
> Reviewed-by: Evan Green <evgreen@chromium.org>
Thanks

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation


* Re: [PATCH v9 3/3] soc: qcom: rpmh: Invoke rpmh_flush() for dirty caches
  2020-02-28 23:45   ` Doug Anderson
@ 2020-03-03  5:47     ` Maulik Shah
  2020-03-04  0:40       ` Doug Anderson
  0 siblings, 1 reply; 14+ messages in thread
From: Maulik Shah @ 2020-03-03  5:47 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Stephen Boyd, Matthias Kaehlcke, Evan Green, Bjorn Andersson,
	LKML, linux-arm-msm, Andy Gross, Rajendra Nayak, Lina Iyer,
	lsrao


On 2/29/2020 5:15 AM, Doug Anderson wrote:
> Hi,
>
> On Fri, Feb 28, 2020 at 3:38 AM Maulik Shah <mkshah@codeaurora.org> wrote:
>> Add changes to invoke rpmh flush() from within cache_lock when the data
>> in cache is dirty.
>>
>> This is done only if OSI is not supported in PSCI. If OSI is supported
>> rpmh_flush can get invoked when the last cpu going to power collapse
>> deepest low power mode.
>>
>> Also remove "depends on COMPILE_TEST" for Kconfig option QCOM_RPMH so the
>> driver is only compiled for arm64 which supports psci_has_osi_support()
>> API.
>>
>> Signed-off-by: Maulik Shah <mkshah@codeaurora.org>
>> Reviewed-by: Srinivas Rao L <lsrao@codeaurora.org>
>> ---
>>   drivers/soc/qcom/Kconfig |  2 +-
>>   drivers/soc/qcom/rpmh.c  | 33 ++++++++++++++++++++++-----------
>>   2 files changed, 23 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig
>> index d0a73e7..2e581bc 100644
>> --- a/drivers/soc/qcom/Kconfig
>> +++ b/drivers/soc/qcom/Kconfig
>> @@ -105,7 +105,7 @@ config QCOM_RMTFS_MEM
>>
>>   config QCOM_RPMH
>>          bool "Qualcomm RPM-Hardened (RPMH) Communication"
>> -       depends on ARCH_QCOM && ARM64 || COMPILE_TEST
>> +       depends on ARCH_QCOM && ARM64
>>          help
>>            Support for communication with the hardened-RPM blocks in
>>            Qualcomm Technologies Inc (QTI) SoCs. RPMH communication uses an
>> diff --git a/drivers/soc/qcom/rpmh.c b/drivers/soc/qcom/rpmh.c
>> index f28afe4..6a5a60c 100644
>> --- a/drivers/soc/qcom/rpmh.c
>> +++ b/drivers/soc/qcom/rpmh.c
>> @@ -12,6 +12,7 @@
>>   #include <linux/module.h>
>>   #include <linux/of.h>
>>   #include <linux/platform_device.h>
>> +#include <linux/psci.h>
>>   #include <linux/slab.h>
>>   #include <linux/spinlock.h>
>>   #include <linux/types.h>
>> @@ -158,6 +159,13 @@ static struct cache_req *cache_rpm_request(struct rpmh_ctrlr *ctrlr,
>>          }
>>
>>   unlock:
>> +       if (ctrlr->dirty && !psci_has_osi_support()) {
>> +               if (rpmh_flush(ctrlr)) {
>> +                       spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
>> +                       return ERR_PTR(-EINVAL);
>> +               }
>> +       }
> It's been a long time since I looked in depth at RPMH, but upon a
> first glance this seems like it's gonna be terrible for performance.
> You're going to send every entry again and again, aren't you?  In
> other words in pseudo-code:
>
> 1. rpmh_write(addr=0x10, data=0x99);
> ==> writes on the bus (0x10, 0x99)
>
> 2. rpmh_write(addr=0x11, data=0xaa);
> ==> writes on the bus (0x10, 0x99)
> ==> writes on the bus (0x11, 0xaa)
>
> 3. rpmh_write(addr=0x10, data=0xbb);
> ==> writes on the bus (0x10, 0xbb)
> ==> writes on the bus (0x11, 0xaa)
>
> 4. rpmh_write(addr=0x12, data=0xcc);
> ==> writes on the bus (0x10, 0xbb)
> ==> writes on the bus (0x11, 0xaa)
> ==> writes on the bus (0x12, 0xcc)
>
> That seems bad.

Hi Doug,

No, this is NOT how data is sent to RPMh/AOSS.
rpmh_flush() fills up the DRV-2 (HLOS) TCSes and makes them ready. The HW
takes care of sending the data in the SLEEP TCSes for each of the
EL/DRV(s) when the last CPU is going to the deepest low power mode, and
the data in the WAKE TCSes when the first CPU is waking up.

> Why can't you just send the new request itself and
> forget adding it to the cache?  In other words don't even call
> cache_rpm_request() in the non-OSI case and then in __rpmh_write()
> just send right away...

This won’t work out. Let me explain why.

We have 3 SLEEP and 3 WAKE TCSes from the config below:
                         qcom,tcs-config = <ACTIVE_TCS  2>,
                                           <SLEEP_TCS   3>,
                                           <WAKE_TCS    3>,
Each TCS holds 16 commands, so a total of 48 commands (16*3) can be
filled in for each of the SLEEP and WAKE TCS sets.

Now let’s take an example in pseudo-code of what could happen if we don’t
cache and immediately fill up the TCS commands. The triggering part
doesn’t happen at this point; as explained above, this only fills up the
TCSes and makes them ready.

Time-t0 (client_x invokes rpmh_write_batch() for the SLEEP set, with a
batch of 3 commands)

rpmh_write_batch(
addr=0x10, data=0x99,  -> fills up CMD0 in SLEEP TCS_0
addr=0x11, data=0xaa,  -> fills up CMD1 in SLEEP TCS_0
addr=0x10, data=0xbb); -> fills up CMD2 in SLEEP TCS_0

Time-t1 (client_y invokes rpmh_write() with a single command)

rpmh_write(
addr=0x12, data=0xcc,  -> fills up CMD3 in SLEEP TCS_0
);

Time-t2 (client_x invokes rpmh_invalidate(), which invalidates all
previous *batch requests* only)

At this point the TCS should hold only CMD3, while CMD0, 1 and 2 need to
be freed up, since we expect a new batch request now.

Since the driver didn’t cache anything in the first place, it doesn’t
know the details of the previous batch request, such as how many commands
it had and which TCS commands they were written to (basically all the
data required to free up only CMD0, 1 and 2 without disturbing CMD3).

What’s more, the new batch request after invalidation could have, let’s
say, 5 commands instead of the 3 commands in the previous batch. So it
will not fit in CMD0, 1 and 2, and we might want to allocate CMD4, 5, 6,
7 and 8 instead.

This will leave a hole in the TCS commands (each TCS has 16 commands in
total) unless we re-arrange everything. Also, we may want to fill in
batch requests first and then single requests; without caching anything,
the driver doesn’t know which one is a batch and which one is a single
request.
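
The t0/t1/t2 example can be played through with a small standalone model
of a cache that is separate from the TCS. This is a sketch only: every
model_* name and the MODEL_* limits are hypothetical and do not appear in
the driver, a single 16-command TCS is assumed, and "flush" here simply
rewrites the command slots from the cache, batches first and single
requests after them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_TCS_CMDS 16 /* commands per TCS, as stated in the thread */
#define MODEL_MAX_REQS 16 /* arbitrary cache capacity for this sketch */

struct model_cmd { unsigned int addr, data; };

/* Hypothetical cache: batch and single requests are tracked separately. */
struct model_cache {
	struct model_cmd batch[MODEL_MAX_REQS];
	size_t nbatch;
	struct model_cmd single[MODEL_MAX_REQS];
	size_t nsingle;
};

/* Hypothetical TCS: a flat array of command slots. */
struct model_tcs {
	struct model_cmd cmd[MODEL_TCS_CMDS];
	size_t used;
};

/* Append a batch of n commands to the cache. */
static int model_cache_batch(struct model_cache *c,
			     const struct model_cmd *cmds, size_t n)
{
	if (c->nbatch + n > MODEL_MAX_REQS)
		return -1;
	memcpy(&c->batch[c->nbatch], cmds, n * sizeof(*cmds));
	c->nbatch += n;
	return 0;
}

/* Cache a single request, overwriting an existing entry for the address. */
static int model_cache_single(struct model_cache *c, struct model_cmd cmd)
{
	size_t i;

	for (i = 0; i < c->nsingle; i++) {
		if (c->single[i].addr == cmd.addr) {
			c->single[i].data = cmd.data;
			return 0;
		}
	}
	if (c->nsingle == MODEL_MAX_REQS)
		return -1;
	c->single[c->nsingle++] = cmd;
	return 0;
}

/* Drop all batch requests; single requests are left untouched. */
static void model_invalidate_batches(struct model_cache *c)
{
	c->nbatch = 0;
}

/* Rewrite the TCS from the cache: batches first, then singles, no holes. */
static int model_flush(const struct model_cache *c, struct model_tcs *t)
{
	size_t i;

	if (c->nbatch + c->nsingle > MODEL_TCS_CMDS)
		return -1;
	t->used = 0;
	for (i = 0; i < c->nbatch; i++)
		t->cmd[t->used++] = c->batch[i];
	for (i = 0; i < c->nsingle; i++)
		t->cmd[t->used++] = c->single[i];
	return 0;
}
```

Because the cache remembers which entries came from a batch, invalidating
and then caching a 5-command batch followed by model_flush() yields six
contiguous slots with the single request (0x12, 0xcc) still present,
which is the information a cache-less driver would not have.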

There are other cases, like the one below, which are also impacted if the
driver doesn’t cache anything.

For example, when we don’t have a dedicated ACTIVE TCS (the config below
has an ACTIVE TCS count of 0):
     qcom,tcs-config = <ACTIVE_TCS  0>,
                           <SLEEP_TCS   3>,
                           <WAKE_TCS    3>,

Now, to send active data, the driver may re-purpose a few of the SLEEP or
WAKE TCSes as ACTIVE TCSes, and once the work is done they are returned
to the SLEEP/WAKE TCS pool accordingly. If the driver doesn’t cache, all
the SLEEP and WAKE data is lost when one of the TCSes is re-purposed as
an ACTIVE TCS.

I hope the above explanation makes clear why caching is important and
gives a clear view of caching vs. not caching.

Thanks,
Maulik

> I tried to test this and my printouts didn't show anything actually
> happening in rpmh_flush().  Maybe I just don't have the write patches
> to exercise this properly...

It may be due to the missing interconnect patch series:
https://patchwork.kernel.org/project/linux-arm-msm/list/?series=247175

>> +
>>          spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
>>
>>          return req;
>> @@ -285,26 +293,35 @@ int rpmh_write(const struct device *dev, enum rpmh_state state,
>>   }
>>   EXPORT_SYMBOL(rpmh_write);
>>
>> -static void cache_batch(struct rpmh_ctrlr *ctrlr, struct batch_cache_req *req)
>> +static int cache_batch(struct rpmh_ctrlr *ctrlr, struct batch_cache_req *req)
>>   {
>>          unsigned long flags;
>>
>>          spin_lock_irqsave(&ctrlr->cache_lock, flags);
>> +
>>          list_add_tail(&req->list, &ctrlr->batch_cache);
>>          ctrlr->dirty = true;
>> +
>> +       if (!psci_has_osi_support()) {
>> +               if (rpmh_flush(ctrlr)) {
>> +                       spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
>> +                       return -EINVAL;
>> +               }
>> +       }
>> +
>>          spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
>> +
>> +       return 0;
>>   }
>>
>>   static int flush_batch(struct rpmh_ctrlr *ctrlr)
>>   {
>>          struct batch_cache_req *req;
>>          const struct rpmh_request *rpm_msg;
>> -       unsigned long flags;
>>          int ret = 0;
>>          int i;
>>
>>          /* Send Sleep/Wake requests to the controller, expect no response */
>> -       spin_lock_irqsave(&ctrlr->cache_lock, flags);
>>          list_for_each_entry(req, &ctrlr->batch_cache, list) {
>>                  for (i = 0; i < req->count; i++) {
>>                          rpm_msg = req->rpm_msgs + i;
>> @@ -314,7 +331,6 @@ static int flush_batch(struct rpmh_ctrlr *ctrlr)
>>                                  break;
>>                  }
>>          }
>> -       spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
>>
>>          return ret;
>>   }
>> @@ -386,10 +402,8 @@ int rpmh_write_batch(const struct device *dev, enum rpmh_state state,
>>                  cmd += n[i];
>>          }
>>
>> -       if (state != RPMH_ACTIVE_ONLY_STATE) {
>> -               cache_batch(ctrlr, req);
>> -               return 0;
>> -       }
>> +       if (state != RPMH_ACTIVE_ONLY_STATE)
>> +               return cache_batch(ctrlr, req);
> I'm curious: why not just do:
>
> if (state != RPMH_ACTIVE_ONLY_STATE && psci_has_osi_support()) {
>    cache_batch(ctrlr, req);
>    return 0;
> }
>
> ...AKA don't even cache it up if we're not in OSI mode.  IIUC this
> would be a huge deal because with your code you're doing the whole
> RPMH transfer under "spin_lock_irqsave", right?  And presumably RPMH
> transfers are somewhat slow, otherwise why did anyone come up with
> this whole caching / last-man-down scheme to start with?
>
> OK, it turned out to be at least slightly more complex because it
> appears that we're supposed to use rpmh_rsc_write_ctrl_data() for
> sleep/wake stuff and that they never do completions, but it really
> wasn't too hard.  I prototyped it at <http://crrev.com/c/2080916>.
> Feel free to hijack that change if it looks like a starting point and
> if it looks like I'm not too confused.
I looked at this change and had thought of this approach earlier, but it
won’t work out for the reasons given in the example above.
I have thought of a few optimizations in rpmh_flush() to reduce its time,
if we *really* see any performance impact.

Below is the high-level idea.
When rpmh_write_batch() is invoked for the SLEEP set, rpmh_flush()
currently updates both the SLEEP and WAKE TCS contents. We could change
it to update only the SLEEP TCSes in that case, and to update only the
WAKE TCSes when rpmh_write_batch() is invoked for the WAKE set.
This may reduce the time by roughly 50%.
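
As a rough illustration of that idea, the sketch below counts how many
TCS command writes a flush would perform when it is told which set
changed. It is an assumption-laden model, not driver code: enum
model_set, struct model_entry and model_flush_sets() are hypothetical
names, and one TCS command write per cached value is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bitmask selecting which TCS set a flush should rewrite. */
enum model_set {
	MODEL_SLEEP = 1 << 0,
	MODEL_WAKE  = 1 << 1,
};

/* Hypothetical cache entry holding both values for one address. */
struct model_entry { unsigned int addr, sleep_val, wake_val; };

/*
 * Rewrite only the selected set(s) from the cache and return the number
 * of TCS command writes that were performed.
 */
static size_t model_flush_sets(const struct model_entry *e, size_t n,
			       unsigned int sets,
			       unsigned int *sleep_tcs, unsigned int *wake_tcs)
{
	size_t i, writes = 0;

	for (i = 0; i < n; i++) {
		if (sets & MODEL_SLEEP) {
			sleep_tcs[i] = e[i].sleep_val;
			writes++;
		}
		if (sets & MODEL_WAKE) {
			wake_tcs[i] = e[i].wake_val;
			writes++;
		}
	}
	return writes;
}
```

With three cached entries, flushing both sets costs six writes in this
model while flushing only MODEL_SLEEP costs three, which is where the
"roughly 50%" estimate would come from if the two sets are of equal size.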
>
>>          for (i = 0; i < count; i++) {
>>                  struct completion *compl = &compls[i];
>> @@ -455,9 +469,6 @@ static int send_single(struct rpmh_ctrlr *ctrlr, enum rpmh_state state,
>>    * Return: -EBUSY if the controller is busy, probably waiting on a response
>>    * to a RPMH request sent earlier.
>>    *
>> - * This function is always called from the sleep code from the last CPU
>> - * that is powering down the entire system. Since no other RPMH API would be
>> - * executing at this time, it is safe to run lockless.
> Interesting that you removed this comment but not the copy of the
> comment inside this function.  Was that on purpose?
It's a miss, my bad. I will remove it in the next revision.

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation


* Re: [PATCH v9 3/3] soc: qcom: rpmh: Invoke rpmh_flush() for dirty caches
  2020-02-28 21:48   ` Evan Green
@ 2020-03-03  5:48     ` Maulik Shah
  0 siblings, 0 replies; 14+ messages in thread
From: Maulik Shah @ 2020-03-03  5:48 UTC (permalink / raw)
  To: Evan Green
  Cc: Stephen Boyd, Matthias Kaehlcke, Bjorn Andersson, LKML,
	linux-arm-msm, Andy Gross, Doug Anderson, Rajendra Nayak,
	Lina Iyer, lsrao


On 2/29/2020 3:18 AM, Evan Green wrote:
> Hi Maulik,
> Thanks for spinning this so promptly.
>
> On Fri, Feb 28, 2020 at 3:38 AM Maulik Shah <mkshah@codeaurora.org> wrote:
>> Add changes to invoke rpmh flush() from within cache_lock when the data
>> in cache is dirty.
>>
>> This is done only if OSI is not supported in PSCI. If OSI is supported
>> rpmh_flush can get invoked when the last cpu going to power collapse
>> deepest low power mode.
>>
>> Also remove "depends on COMPILE_TEST" for Kconfig option QCOM_RPMH so the
>> driver is only compiled for arm64 which supports psci_has_osi_support()
>> API.
>>
>> Signed-off-by: Maulik Shah <mkshah@codeaurora.org>
>> Reviewed-by: Srinivas Rao L <lsrao@codeaurora.org>
>> ---
>>   drivers/soc/qcom/Kconfig |  2 +-
>>   drivers/soc/qcom/rpmh.c  | 33 ++++++++++++++++++++++-----------
>>   2 files changed, 23 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig
>> index d0a73e7..2e581bc 100644
>> --- a/drivers/soc/qcom/Kconfig
>> +++ b/drivers/soc/qcom/Kconfig
>> @@ -105,7 +105,7 @@ config QCOM_RMTFS_MEM
>>
>>   config QCOM_RPMH
>>          bool "Qualcomm RPM-Hardened (RPMH) Communication"
>> -       depends on ARCH_QCOM && ARM64 || COMPILE_TEST
>> +       depends on ARCH_QCOM && ARM64
>>          help
>>            Support for communication with the hardened-RPM blocks in
>>            Qualcomm Technologies Inc (QTI) SoCs. RPMH communication uses an
>> diff --git a/drivers/soc/qcom/rpmh.c b/drivers/soc/qcom/rpmh.c
>> index f28afe4..6a5a60c 100644
>> --- a/drivers/soc/qcom/rpmh.c
>> +++ b/drivers/soc/qcom/rpmh.c
>> @@ -12,6 +12,7 @@
>>   #include <linux/module.h>
>>   #include <linux/of.h>
>>   #include <linux/platform_device.h>
>> +#include <linux/psci.h>
>>   #include <linux/slab.h>
>>   #include <linux/spinlock.h>
>>   #include <linux/types.h>
>> @@ -158,6 +159,13 @@ static struct cache_req *cache_rpm_request(struct rpmh_ctrlr *ctrlr,
>>          }
>>
>>   unlock:
>> +       if (ctrlr->dirty && !psci_has_osi_support()) {
>> +               if (rpmh_flush(ctrlr)) {
>> +                       spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
>> +                       return ERR_PTR(-EINVAL);
>> +               }
>> +       }
>> +
>>          spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
>>
>>          return req;
>> @@ -285,26 +293,35 @@ int rpmh_write(const struct device *dev, enum rpmh_state state,
>>   }
>>   EXPORT_SYMBOL(rpmh_write);
>>
>> -static void cache_batch(struct rpmh_ctrlr *ctrlr, struct batch_cache_req *req)
>> +static int cache_batch(struct rpmh_ctrlr *ctrlr, struct batch_cache_req *req)
>>   {
>>          unsigned long flags;
>>
>>          spin_lock_irqsave(&ctrlr->cache_lock, flags);
>> +
>>          list_add_tail(&req->list, &ctrlr->batch_cache);
>>          ctrlr->dirty = true;
>> +
>> +       if (!psci_has_osi_support()) {
>> +               if (rpmh_flush(ctrlr)) {
>> +                       spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
>> +                       return -EINVAL;
>> +               }
>> +       }
>> +
>>          spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
>> +
>> +       return 0;
>>   }
>>
>>   static int flush_batch(struct rpmh_ctrlr *ctrlr)
>>   {
>>          struct batch_cache_req *req;
>>          const struct rpmh_request *rpm_msg;
>> -       unsigned long flags;
>>          int ret = 0;
>>          int i;
>>
>>          /* Send Sleep/Wake requests to the controller, expect no response */
>> -       spin_lock_irqsave(&ctrlr->cache_lock, flags);
>>          list_for_each_entry(req, &ctrlr->batch_cache, list) {
>>                  for (i = 0; i < req->count; i++) {
>>                          rpm_msg = req->rpm_msgs + i;
>> @@ -314,7 +331,6 @@ static int flush_batch(struct rpmh_ctrlr *ctrlr)
>>                                  break;
>>                  }
>>          }
>> -       spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
>>
>>          return ret;
>>   }
>> @@ -386,10 +402,8 @@ int rpmh_write_batch(const struct device *dev, enum rpmh_state state,
>>                  cmd += n[i];
>>          }
>>
>> -       if (state != RPMH_ACTIVE_ONLY_STATE) {
>> -               cache_batch(ctrlr, req);
>> -               return 0;
>> -       }
>> +       if (state != RPMH_ACTIVE_ONLY_STATE)
>> +               return cache_batch(ctrlr, req);
>>
>>          for (i = 0; i < count; i++) {
>>                  struct completion *compl = &compls[i];
>> @@ -455,9 +469,6 @@ static int send_single(struct rpmh_ctrlr *ctrlr, enum rpmh_state state,
>>    * Return: -EBUSY if the controller is busy, probably waiting on a response
>>    * to a RPMH request sent earlier.
>>    *
>> - * This function is always called from the sleep code from the last CPU
>> - * that is powering down the entire system. Since no other RPMH API would be
>> - * executing at this time, it is safe to run lockless.
> Oh nice, I didn't even see that comment. We should probably replace
> that with a comment indicating that we assume ctrlr->cache_lock is
> already held.
>
> Please also remove this comment in rpmh_flush():
>          /*
>           * Nobody else should be calling this function other than system PM,
>           * hence we can run without locks.
>           */
>          list_for_each_entry(p, &ctrlr->cache, list) {
>
> -Evan
Done, will remove in next revision.
>
>>    */
>>   int rpmh_flush(struct rpmh_ctrlr *ctrlr)
>>   {
>> --
>> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
>> of Code Aurora Forum, hosted by The Linux Foundation



* Re: [PATCH v9 3/3] soc: qcom: rpmh: Invoke rpmh_flush() for dirty caches
  2020-03-03  5:47     ` Maulik Shah
@ 2020-03-04  0:40       ` Doug Anderson
  2020-03-05  9:41         ` Maulik Shah
  0 siblings, 1 reply; 14+ messages in thread
From: Doug Anderson @ 2020-03-04  0:40 UTC (permalink / raw)
  To: Maulik Shah
  Cc: Stephen Boyd, Matthias Kaehlcke, Evan Green, Bjorn Andersson,
	LKML, linux-arm-msm, Andy Gross, Rajendra Nayak, Lina Iyer,
	lsrao

Hi,

On Mon, Mar 2, 2020 at 9:47 PM Maulik Shah <mkshah@codeaurora.org> wrote:
>
>
> On 2/29/2020 5:15 AM, Doug Anderson wrote:
> > Hi,
> >
> > On Fri, Feb 28, 2020 at 3:38 AM Maulik Shah <mkshah@codeaurora.org> wrote:
> >> Add changes to invoke rpmh flush() from within cache_lock when the data
> >> in cache is dirty.
> >>
> >> This is done only if OSI is not supported in PSCI. If OSI is supported
> >> rpmh_flush can get invoked when the last cpu going to power collapse
> >> deepest low power mode.
> >>
> >> Also remove "depends on COMPILE_TEST" for Kconfig option QCOM_RPMH so the
> >> driver is only compiled for arm64 which supports psci_has_osi_support()
> >> API.
> >>
> >> Signed-off-by: Maulik Shah <mkshah@codeaurora.org>
> >> Reviewed-by: Srinivas Rao L <lsrao@codeaurora.org>
> >> ---
> >>   drivers/soc/qcom/Kconfig |  2 +-
> >>   drivers/soc/qcom/rpmh.c  | 33 ++++++++++++++++++++++-----------
> >>   2 files changed, 23 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig
> >> index d0a73e7..2e581bc 100644
> >> --- a/drivers/soc/qcom/Kconfig
> >> +++ b/drivers/soc/qcom/Kconfig
> >> @@ -105,7 +105,7 @@ config QCOM_RMTFS_MEM
> >>
> >>   config QCOM_RPMH
> >>          bool "Qualcomm RPM-Hardened (RPMH) Communication"
> >> -       depends on ARCH_QCOM && ARM64 || COMPILE_TEST
> >> +       depends on ARCH_QCOM && ARM64
> >>          help
> >>            Support for communication with the hardened-RPM blocks in
> >>            Qualcomm Technologies Inc (QTI) SoCs. RPMH communication uses an
> >> diff --git a/drivers/soc/qcom/rpmh.c b/drivers/soc/qcom/rpmh.c
> >> index f28afe4..6a5a60c 100644
> >> --- a/drivers/soc/qcom/rpmh.c
> >> +++ b/drivers/soc/qcom/rpmh.c
> >> @@ -12,6 +12,7 @@
> >>   #include <linux/module.h>
> >>   #include <linux/of.h>
> >>   #include <linux/platform_device.h>
> >> +#include <linux/psci.h>
> >>   #include <linux/slab.h>
> >>   #include <linux/spinlock.h>
> >>   #include <linux/types.h>
> >> @@ -158,6 +159,13 @@ static struct cache_req *cache_rpm_request(struct rpmh_ctrlr *ctrlr,
> >>          }
> >>
> >>   unlock:
> >> +       if (ctrlr->dirty && !psci_has_osi_support()) {
> >> +               if (rpmh_flush(ctrlr)) {
> >> +                       spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
> >> +                       return ERR_PTR(-EINVAL);
> >> +               }
> >> +       }
> > It's been a long time since I looked in depth at RPMH, but upon a
> > first glance this seems like it's gonna be terrible for performance.
> > You're going to send every entry again and again, aren't you?  In
> > other words in pseudo-code:
> >
> > 1. rpmh_write(addr=0x10, data=0x99);
> > ==> writes on the bus (0x10, 0x99)
> >
> > 2. rpmh_write(addr=0x11, data=0xaa);
> > ==> writes on the bus (0x10, 0x99)
> > ==> writes on the bus (0x11, 0xaa)
> >
> > 3. rpmh_write(addr=0x10, data=0xbb);
> > ==> writes on the bus (0x10, 0xbb)
> > ==> writes on the bus (0x11, 0xaa)
> >
> > 4. rpmh_write(addr=0x12, data=0xcc);
> > ==> writes on the bus (0x10, 0xbb)
> > ==> writes on the bus (0x11, 0xaa)
> > ==> writes on the bus (0x12, 0xcc)
> >
> > That seems bad.
>
> Hi Doug,
>
> No this is NOT how data is sent to RPMh/AOSS.
> The rpmh_flush() fills up DRV-2 (HLOS) TCSes, makes it ready and The HW
> takes care of
> sending data of Sleep TCSes for each of the EL/ DRV(s) when Last cpu is
> going to deepest
> low power mode and of WAKE TCSes while first cpu is waking up.

Ah, I see.  So for sleep / wake commands we never directly wait for
them to go out on the bus while the system is awake.  We just program
them all to the RPMH hardware and they'll sit there and all get sent
automatically when the last CPU goes into deepest low power mode.

...so actually the whole point of OSI mode (from an RPMH perspective)
is not to avoid transactions on the bus.  It's just avoiding
programming RPMH over and over again.  Is that correct?

...and the reason we have all these data structures in the kernel is
to keep track of auxiliary information about the things in the
sleep/wake TCSs and make it easier to update bits of them?


> > Why can't you just send the new request itself and
> > forget adding it to the cache?  In other words don't even call
> > cache_rpm_request() in the non-OSI case and then in __rpmh_write()
> > just send right away...
>
> This won’t work out. Let me explain why…
>
> We have 3 SLEEP and 3 WAKE TCSes from below config..
>                          qcom,tcs-config = <ACTIVE_TCS  2>,
>                                            <SLEEP_TCS   3>,
>                                            <WAKE_TCS    3>,
> Each TCS has total 16 commands so total 48 commands(16*3) for each SLEEP
> and WAKE TCSes,
> that can be filled up.
>
> Now Lets take a example in pseudo-code on what could happen if we don’t
> cache and
> immediately fill up TCSes commands. The triggering part doesn’t happen
> as explained above
> it fills up TCSes and makes them ready..
>
> Time-t0 (from client_x invoking rpmh_write_batch() for SLEEP SET, a
> batch of 3 commands)
>
> rpmh_write_batch(
> addr=0x10, data=0x99,  -> fills up CMD0 in SLEEP TCS_0
> addr=0x11, data=0xaa,  -> fills up CMD1 in SLEEP TCS_0
> addr=0x10, data=0xbb); -> fills up CMD2 in SLEEP TCS_0
>
> Time-t1 (from client_y invoking rpmh_write(), a single command)
>
> rpmh_write(
> addr=0x12, data=0xcc,  -> fills up CMD3 in SLEEP TCS_0
> );
>
> Time-t2 (from client_x invokes rpmh_invalidate() which invalidates all
> previous *batch requests* only)
>
> At this point, it should have CMD3 only in TCS while CMD 0,1,2 needs to
> be freed up, since we expect
> a new batch request now.
>
> Since driver didn’t cache anything in the first place, it doesn’t know
> details about previous batch request
> like how many commands it had, what were the commands of those batches
> when filling up in TCSes, and so on…
> (basically all the data required to free up only CMD 0,1,2, and don’t
> disturb CMD3)
>
> Whats more?
>
> The new batch request could be of let say 5 commands after invalidation,
> instead of 3 commands in previous batch.
> So it will not fit in CMD-0,1,2 and we might want to allocate from
> CMD-4,5,6,7,8 now.
>
> This will leave a hole in TCS CMDs (each TCS has 16 total commands)
> unless we re-arrange everything.
> Also we may want to fill up batch request first and then single
> requests, by not caching anything, driver don’t
> know which one is batch and which one is single request.

OK, I got it now.  I'll try to spend some time tomorrow looking over
everything / testing with my new understanding.


> There are other cases like below which also gets impacted if driver
> don't cache anything...
>
> for example, when we don’t have dedicated ACTIVE TCS ( if we have below
> config with ACTIVE TCS count 0)
>      qcom,tcs-config = <ACTIVE_TCS  0>,
>                            <SLEEP_TCS   3>,
>                            <WAKE_TCS    3>,
>
> Now to send active data, driver may re-use/ re-purpose few of the sleep
> or wake TCS, to be used as ACTIVE TCS and once work is done,
> it will be re-allocated in SLEEP/ WAKE TCS pool accordingly. If driver
> don’t cache, all the SLEEP and WAKE data is lost when one
> of TCS is repurposed to use as ACTIVE TCS.

Ah, interesting.  I'll read the code more, but are you expecting this
type of situation to work today, or is it theoretical for the future?


> Hope above explanation clears why caching is important and gives clear
> view of caching v/s not caching.
>
> Thanks,
> Maulik
>
> > I tried to test this and my printouts didn't show anything actually
> > happening in rpmh_flush().  Maybe I just don't have the write patches
> > to exercise this properly...
>
> it may be due to missing interconnect patches series
> https://patchwork.kernel.org/project/linux-arm-msm/list/?series=247175

I ended up pulling those in but I was still not seeing things work as
I expected.  I'll debug more tomorrow to see if it was my expectations
that were wrong or if there was a real issue.


> >>          spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
> >>
> >>          return req;
> >> @@ -285,26 +293,35 @@ int rpmh_write(const struct device *dev, enum rpmh_state state,
> >>   }
> >>   EXPORT_SYMBOL(rpmh_write);
> >>
> >> -static void cache_batch(struct rpmh_ctrlr *ctrlr, struct batch_cache_req *req)
> >> +static int cache_batch(struct rpmh_ctrlr *ctrlr, struct batch_cache_req *req)
> >>   {
> >>          unsigned long flags;
> >>
> >>          spin_lock_irqsave(&ctrlr->cache_lock, flags);
> >> +
> >>          list_add_tail(&req->list, &ctrlr->batch_cache);
> >>          ctrlr->dirty = true;
> >> +
> >> +       if (!psci_has_osi_support()) {
> >> +               if (rpmh_flush(ctrlr)) {
> >> +                       spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
> >> +                       return -EINVAL;
> >> +               }
> >> +       }
> >> +
> >>          spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
> >> +
> >> +       return 0;
> >>   }
> >>
> >>   static int flush_batch(struct rpmh_ctrlr *ctrlr)
> >>   {
> >>          struct batch_cache_req *req;
> >>          const struct rpmh_request *rpm_msg;
> >> -       unsigned long flags;
> >>          int ret = 0;
> >>          int i;
> >>
> >>          /* Send Sleep/Wake requests to the controller, expect no response */
> >> -       spin_lock_irqsave(&ctrlr->cache_lock, flags);
> >>          list_for_each_entry(req, &ctrlr->batch_cache, list) {
> >>                  for (i = 0; i < req->count; i++) {
> >>                          rpm_msg = req->rpm_msgs + i;
> >> @@ -314,7 +331,6 @@ static int flush_batch(struct rpmh_ctrlr *ctrlr)
> >>                                  break;
> >>                  }
> >>          }
> >> -       spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
> >>
> >>          return ret;
> >>   }
> >> @@ -386,10 +402,8 @@ int rpmh_write_batch(const struct device *dev, enum rpmh_state state,
> >>                  cmd += n[i];
> >>          }
> >>
> >> -       if (state != RPMH_ACTIVE_ONLY_STATE) {
> >> -               cache_batch(ctrlr, req);
> >> -               return 0;
> >> -       }
> >> +       if (state != RPMH_ACTIVE_ONLY_STATE)
> >> +               return cache_batch(ctrlr, req);
> > I'm curious: why not just do:
> >
> > if (state != RPMH_ACTIVE_ONLY_STATE && psci_has_osi_support()) {
> >    cache_batch(ctrlr, req);
> >    return 0;
> > }
> >
> > ...AKA don't even cache it up if we're not in OSI mode.  IIUC this
> > would be a huge deal because with your code you're doing the whole
> > RPMH transfer under "spin_lock_irqsave", right?  And presumably RPMH
> > transfers are somewhat slow, otherwise why did anyone come up with
> > this whole caching / last-man-down scheme to start with?
> >
> > OK, it turned out to be at least slightly more complex because it
> > appears that we're supposed to use rpmh_rsc_write_ctrl_data() for
> > sleep/wake stuff and that they never do completions, but it really
> > wasn't too hard.  I prototyped it at <http://crrev.com/c/2080916>.
> > Feel free to hijack that change if it looks like a starting point and
> > if it looks like I'm not too confused.
> I looked at this change and thought of it earlier but it won’t work out
> for the reasons in above example.
> I have thought of few optimizations in rpmh_flush() to reduce its time,
> if we *really* see any performance impact…
>
> below is high level idea…
> When  rpmh_write_batch() is invoked for SLEEP_SETs, currently
> rpmh_flush() will update both SLEEP and WAKE TCS contents,
> However we may change it to update only SLEEP TCS, and when
> rpmh_write_batch() is invoked for WAKE SETs, update only WAKE TCS contents.
> This way it may reduce time by roughly ~50%.

OK, that's something to keep in mind.  Agree that it doesn't have to
be part of the initial change.

-Doug


* Re: [PATCH v9 3/3] soc: qcom: rpmh: Invoke rpmh_flush() for dirty caches
  2020-03-04  0:40       ` Doug Anderson
@ 2020-03-05  9:41         ` Maulik Shah
  2020-03-05 22:18           ` Doug Anderson
  0 siblings, 1 reply; 14+ messages in thread
From: Maulik Shah @ 2020-03-05  9:41 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Stephen Boyd, Matthias Kaehlcke, Evan Green, Bjorn Andersson,
	LKML, linux-arm-msm, Andy Gross, Rajendra Nayak, Lina Iyer,
	lsrao

Hi,

On 3/4/2020 6:10 AM, Doug Anderson wrote:
> Hi,
>
> On Mon, Mar 2, 2020 at 9:47 PM Maulik Shah <mkshah@codeaurora.org> wrote:
>>
>> On 2/29/2020 5:15 AM, Doug Anderson wrote:
>>> Hi,
>>>
>>> On Fri, Feb 28, 2020 at 3:38 AM Maulik Shah <mkshah@codeaurora.org> wrote:
>>>> Add changes to invoke rpmh flush() from within cache_lock when the data
>>>> in cache is dirty.
>>>>
>>>> This is done only if OSI is not supported in PSCI. If OSI is supported
>>>> rpmh_flush can get invoked when the last cpu going to power collapse
>>>> deepest low power mode.
>>>>
>>>> Also remove "depends on COMPILE_TEST" for Kconfig option QCOM_RPMH so the
>>>> driver is only compiled for arm64 which supports psci_has_osi_support()
>>>> API.
>>>>
>>>> Signed-off-by: Maulik Shah <mkshah@codeaurora.org>
>>>> Reviewed-by: Srinivas Rao L <lsrao@codeaurora.org>
>>>> ---
>>>>   drivers/soc/qcom/Kconfig |  2 +-
>>>>   drivers/soc/qcom/rpmh.c  | 33 ++++++++++++++++++++++-----------
>>>>   2 files changed, 23 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig
>>>> index d0a73e7..2e581bc 100644
>>>> --- a/drivers/soc/qcom/Kconfig
>>>> +++ b/drivers/soc/qcom/Kconfig
>>>> @@ -105,7 +105,7 @@ config QCOM_RMTFS_MEM
>>>>
>>>>   config QCOM_RPMH
>>>>          bool "Qualcomm RPM-Hardened (RPMH) Communication"
>>>> -       depends on ARCH_QCOM && ARM64 || COMPILE_TEST
>>>> +       depends on ARCH_QCOM && ARM64
>>>>          help
>>>>            Support for communication with the hardened-RPM blocks in
>>>>            Qualcomm Technologies Inc (QTI) SoCs. RPMH communication uses an
>>>> diff --git a/drivers/soc/qcom/rpmh.c b/drivers/soc/qcom/rpmh.c
>>>> index f28afe4..6a5a60c 100644
>>>> --- a/drivers/soc/qcom/rpmh.c
>>>> +++ b/drivers/soc/qcom/rpmh.c
>>>> @@ -12,6 +12,7 @@
>>>>   #include <linux/module.h>
>>>>   #include <linux/of.h>
>>>>   #include <linux/platform_device.h>
>>>> +#include <linux/psci.h>
>>>>   #include <linux/slab.h>
>>>>   #include <linux/spinlock.h>
>>>>   #include <linux/types.h>
>>>> @@ -158,6 +159,13 @@ static struct cache_req *cache_rpm_request(struct rpmh_ctrlr *ctrlr,
>>>>          }
>>>>
>>>>   unlock:
>>>> +       if (ctrlr->dirty && !psci_has_osi_support()) {
>>>> +               if (rpmh_flush(ctrlr)) {
>>>> +                       spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
>>>> +                       return ERR_PTR(-EINVAL);
>>>> +               }
>>>> +       }
>>> It's been a long time since I looked in depth at RPMH, but upon a
>>> first glance this seems like it's gonna be terrible for performance.
>>> You're going to send every entry again and again, aren't you?  In
>>> other words in pseudo-code:
>>>
>>> 1. rpmh_write(addr=0x10, data=0x99);
>>> ==> writes on the bus (0x10, 0x99)
>>>
>>> 2. rpmh_write(addr=0x11, data=0xaa);
>>> ==> writes on the bus (0x10, 0x99)
>>> ==> writes on the bus (0x11, 0xaa)
>>>
>>> 3. rpmh_write(addr=0x10, data=0xbb);
>>> ==> writes on the bus (0x10, 0xbb)
>>> ==> writes on the bus (0x11, 0xaa)
>>>
>>> 4. rpmh_write(addr=0x12, data=0xcc);
>>> ==> writes on the bus (0x10, 0xbb)
>>> ==> writes on the bus (0x11, 0xaa)
>>> ==> writes on the bus (0x12, 0xcc)
>>>
>>> That seems bad.
>> Hi Doug,
>>
>> No this is NOT how data is sent to RPMh/AOSS.
>> The rpmh_flush() fills up DRV-2 (HLOS) TCSes, makes it ready and The HW
>> takes care of
>> sending data of Sleep TCSes for each of the EL/ DRV(s) when Last cpu is
>> going to deepest
>> low power mode and of WAKE TCSes while first cpu is waking up.
> Ah, I see.  So for sleep / wake commands we never directly wait for
> them to go out on the bus while the system is awake.  We just program
> them all to the RPMH hardware and they'll set there and all get sent
> automatically when the last CPU goes into deepest low power mode.
>
> ...so actually the whole point of OSI mode (from an RPMH perspective)
> is not to avoid transactions on the bus.  It's just avoiding
> programming RPMH over and over again.  Is that correct?
>
> ...and the reason we have all these data structures in the kernel is
> to keep track of auxiliary information about the things in the
> sleep/wake TCSs and make it easier to update bits of them?
correct.
>
>
>>> Why can't you just send the new request itself and
>>> forget adding it to the cache?  In other words don't even call
>>> cache_rpm_request() in the non-OSI case and then in __rpmh_write()
>>> just send right away...
>> This won’t work out. Let me explain why…
>>
>> We have 3 SLEEP and 3 WAKE TCSes from below config..
>>                          qcom,tcs-config = <ACTIVE_TCS  2>,
>>                                            <SLEEP_TCS   3>,
>>                                            <WAKE_TCS    3>,
>> Each TCS has total 16 commands so total 48 commands(16*3) for each SLEEP
>> and WAKE TCSes,
>> that can be filled up.
>>
>> Now Lets take a example in pseudo-code on what could happen if we don’t
>> cache and
>> immediately fill up TCSes commands. The triggering part doesn’t happen
>> as explained above
>> it fills up TCSes and makes them ready..
>>
>> Time-t0 (from client_x invoking rpmh_write_batch() for SLEEP SET, a
>> batch of 3 commands)
>>
>> rpmh_write_batch(
>> addr=0x10, data=0x99,  -> fills up CMD0 in SLEEP TCS_0
>> addr=0x11, data=0xaa,  -> fills up CMD1 in SLEEP TCS_0
>> addr=0x10, data=0xbb); -> fills up CMD2 in SLEEP TCS_0
>>
>> Time-t1 (from client_y invoking rpmh_write(), a single command)
>>
>> rpmh_write(
>> addr=0x12, data=0xcc,  -> fills up CMD3 in SLEEP TCS_0
>> );
>>
>> Time-t2 (from client_x invokes rpmh_invalidate() which invalidates all
>> previous *batch requests* only)
>>
>> At this point, it should have CMD3 only in TCS while CMD 0,1,2 needs to
>> be freed up, since we expect
>> a new batch request now.
>>
>> Since driver didn’t cache anything in the first place, it doesn’t know
>> details about previous batch request
>> like how many commands it had, what were the commands of those batches
>> when filling up in TCSes, and so on…
>> (basically all the data required to free up only CMD 0,1,2, and don’t
>> disturb CMD3)
>>
>> Whats more?
>>
>> The new batch request could be of let say 5 commands after invalidation,
>> instead of 3 commands in previous batch.
>> So it will not fit in CMD-0,1,2 and we might want to allocate from
>> CMD-4,5,6,7,8 now.
>>
>> This will leave a hole in TCS CMDs (each TCS has 16 total commands)
>> unless we re-arrange everything.
>> Also we may want to fill up batch request first and then single
>> requests, by not caching anything, driver don’t
>> know which one is batch and which one is single request.
> OK, I got it now.  I'll try to spend some time tomorrow looking over
> everything / testing with my new understanding.
>
>
>> There are other cases like below which also gets impacted if driver
>> don't cache anything...
>>
>> for example, when we don’t have dedicated ACTIVE TCS ( if we have below
>> config with ACTIVE TCS count 0)
>>      qcom,tcs-config = <ACTIVE_TCS  0>,
>>                            <SLEEP_TCS   3>,
>>                            <WAKE_TCS    3>,
>>
>> Now to send active data, driver may re-use/ re-purpose few of the sleep
>> or wake TCS, to be used as ACTIVE TCS and once work is done,
>> it will be re-allocated in SLEEP/ WAKE TCS pool accordingly. If driver
>> don’t cache, all the SLEEP and WAKE data is lost when one
>> of TCS is repurposed to use as ACTIVE TCS.
> Ah, interesting.  I'll read the code more, but are you expecting this
> type of situation to work today, or is it theoretical for the future?
Yes, we have targets that need to work in this type of situation.
>
>
>> Hope above explanation clears why caching is important and gives clear
>> view of caching v/s not caching.
>>
>> Thanks,
>> Maulik
>>
>>> I tried to test this and my printouts didn't show anything actually
>>> happening in rpmh_flush().  Maybe I just don't have the right patches
>>> to exercise this properly...
>> it may be due to missing interconnect patches series
>> https://patchwork.kernel.org/project/linux-arm-msm/list/?series=247175
> I ended up pulling those in but I was still not seeing things work as
> I expected.  I'll debug more tomorrow to see if it was my expectations
> that were wrong or if there was a real issue.
>
>
>>>>          spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
>>>>
>>>>          return req;
>>>> @@ -285,26 +293,35 @@ int rpmh_write(const struct device *dev, enum rpmh_state state,
>>>>   }
>>>>   EXPORT_SYMBOL(rpmh_write);
>>>>
>>>> -static void cache_batch(struct rpmh_ctrlr *ctrlr, struct batch_cache_req *req)
>>>> +static int cache_batch(struct rpmh_ctrlr *ctrlr, struct batch_cache_req *req)
>>>>   {
>>>>          unsigned long flags;
>>>>
>>>>          spin_lock_irqsave(&ctrlr->cache_lock, flags);
>>>> +
>>>>          list_add_tail(&req->list, &ctrlr->batch_cache);
>>>>          ctrlr->dirty = true;
>>>> +
>>>> +       if (!psci_has_osi_support()) {
>>>> +               if (rpmh_flush(ctrlr)) {
>>>> +                       spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
>>>> +                       return -EINVAL;
>>>> +               }
>>>> +       }
>>>> +
>>>>          spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
>>>> +
>>>> +       return 0;
>>>>   }
>>>>
>>>>   static int flush_batch(struct rpmh_ctrlr *ctrlr)
>>>>   {
>>>>          struct batch_cache_req *req;
>>>>          const struct rpmh_request *rpm_msg;
>>>> -       unsigned long flags;
>>>>          int ret = 0;
>>>>          int i;
>>>>
>>>>          /* Send Sleep/Wake requests to the controller, expect no response */
>>>> -       spin_lock_irqsave(&ctrlr->cache_lock, flags);
>>>>          list_for_each_entry(req, &ctrlr->batch_cache, list) {
>>>>                  for (i = 0; i < req->count; i++) {
>>>>                          rpm_msg = req->rpm_msgs + i;
>>>> @@ -314,7 +331,6 @@ static int flush_batch(struct rpmh_ctrlr *ctrlr)
>>>>                                  break;
>>>>                  }
>>>>          }
>>>> -       spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
>>>>
>>>>          return ret;
>>>>   }
>>>> @@ -386,10 +402,8 @@ int rpmh_write_batch(const struct device *dev, enum rpmh_state state,
>>>>                  cmd += n[i];
>>>>          }
>>>>
>>>> -       if (state != RPMH_ACTIVE_ONLY_STATE) {
>>>> -               cache_batch(ctrlr, req);
>>>> -               return 0;
>>>> -       }
>>>> +       if (state != RPMH_ACTIVE_ONLY_STATE)
>>>> +               return cache_batch(ctrlr, req);
>>> I'm curious: why not just do:
>>>
>>> if (state != RPMH_ACTIVE_ONLY_STATE && psci_has_osi_support()) {
>>>    cache_batch(ctrlr, req);
>>>    return 0;
>>> }
>>>
>>> ...AKA don't even cache it up if we're not in OSI mode.  IIUC this
>>> would be a huge deal because with your code you're doing the whole
>>> RPMH transfer under "spin_lock_irqsave", right?  And presumably RPMH
>>> transfers are somewhat slow, otherwise why did anyone come up with
>>> this whole caching / last-man-down scheme to start with?
>>>
>>> OK, it turned out to be at least slightly more complex because it
>>> appears that we're supposed to use rpmh_rsc_write_ctrl_data() for
>>> sleep/wake stuff and that they never do completions, but it really
>>> wasn't too hard.  I prototyped it at <http://crrev.com/c/2080916>.
>>> Feel free to hijack that change if it looks like a starting point and
>>> if it looks like I'm not too confused.
>> I looked at this change and thought of it earlier but it won’t work out
>> for the reasons in above example.
>> I have thought of few optimizations in rpmh_flush() to reduce its time,
>> if we *really* see any performance impact…
>>
>> below is high level idea…
>> When  rpmh_write_batch() is invoked for SLEEP_SETs, currently
>> rpmh_flush() will update both SLEEP and WAKE TCS contents,
>> However we may change it to update only SLEEP TCS, and when
>> rpmh_write_batch() is invoked for WAKE SETs, update only WAKE TCS contents.
>> This way it may reduce time by roughly ~50%.
> OK, that's something to keep in mind.  Agree that it doesn't have to
> be part of the initial change.
>
> -Doug
Thanks,
Maulik

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation


* Re: [PATCH v9 3/3] soc: qcom: rpmh: Invoke rpmh_flush() for dirty caches
  2020-03-05  9:41         ` Maulik Shah
@ 2020-03-05 22:18           ` Doug Anderson
  2020-03-10  9:11             ` Maulik Shah
  0 siblings, 1 reply; 14+ messages in thread
From: Doug Anderson @ 2020-03-05 22:18 UTC (permalink / raw)
  To: Maulik Shah
  Cc: Stephen Boyd, Matthias Kaehlcke, Evan Green, Bjorn Andersson,
	LKML, linux-arm-msm, Andy Gross, Rajendra Nayak, Lina Iyer,
	lsrao

Hi,

On Thu, Mar 5, 2020 at 1:41 AM Maulik Shah <mkshah@codeaurora.org> wrote:
> >> There are other cases like below which also gets impacted if driver
> >> don't cache anything...
> >>
> >> for example, when we don’t have dedicated ACTIVE TCS ( if we have below
> >> config with ACTIVE TCS count 0)
> >>      qcom,tcs-config = <ACTIVE_TCS  0>,
> >>                            <SLEEP_TCS   3>,
> >>                            <WAKE_TCS    3>,
> >>
> >> Now to send active data, driver may re-use/ re-purpose few of the sleep
> >> or wake TCS, to be used as ACTIVE TCS and once work is done,
> >> it will be re-allocated in SLEEP/ WAKE TCS pool accordingly. If driver
> >> don’t cache, all the SLEEP and WAKE data is lost when one
> >> of TCS is repurposed to use as ACTIVE TCS.
> > Ah, interesting.  I'll read the code more, but are you expecting this
> > type of situation to work today, or is it theoretical for the future?
> yes, we have targets which needs to work with this type of situation.

My brain is still slowly absorbing all the code, but something tells
me that targets with no ACTIVE TCS will not work properly with non-OSI
mode unless you change your patches more.  Specifically to make the
zero ACTIVE TCS case work I think you need a rpmh_flush() call after
_ALL_ calls to rpmh_write() and rpmh_write_batch() (even those
modifying ACTIVE state).  rpmh_write_async() will be yet more
interesting because you'd have to flush in rpmh_tx_done() I guess?
...and also somehow you need to inhibit entering sleep mode if an
async write was in progress?  Maybe easier to just detect the
"non-OSI-mode + 0 ACTIVE TCS" case at probe time and fail to probe?
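
For what it's worth, the probe-time check could look roughly like the
standalone sketch below. This is plain C for illustration, not a patch
against rpmh-rsc: struct tcs_counts and sketch_check_tcs_config() are
hypothetical names, the has_osi parameter stands in for the result of
psci_has_osi_support(), and -ENODEV is just one plausible error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the TCS counts parsed from qcom,tcs-config. */
struct tcs_counts {
	unsigned int active;
	unsigned int sleep;
	unsigned int wake;
};

/*
 * Reject the one combination discussed above: no OSI support together with
 * zero dedicated ACTIVE TCSes. Returns 0 if the configuration is accepted,
 * -ENODEV otherwise.
 */
static int sketch_check_tcs_config(const struct tcs_counts *cfg, bool has_osi)
{
	assert(cfg != NULL);

	if (!has_osi && cfg->active == 0)
		return -ENODEV;

	return 0;
}
```

A probe function would call this once after parsing the device tree and bail
out on a non-zero return.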


-Doug


* Re: [PATCH v9 3/3] soc: qcom: rpmh: Invoke rpmh_flush() for dirty caches
  2020-03-05 22:18           ` Doug Anderson
@ 2020-03-10  9:11             ` Maulik Shah
  0 siblings, 0 replies; 14+ messages in thread
From: Maulik Shah @ 2020-03-10  9:11 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Stephen Boyd, Matthias Kaehlcke, Evan Green, Bjorn Andersson,
	LKML, linux-arm-msm, Andy Gross, Rajendra Nayak, Lina Iyer,
	lsrao


On 3/6/2020 3:48 AM, Doug Anderson wrote:
> Hi,
>
> On Thu, Mar 5, 2020 at 1:41 AM Maulik Shah <mkshah@codeaurora.org> wrote:
>>>> There are other cases like below which also gets impacted if driver
>>>> don't cache anything...
>>>>
>>>> for example, when we don’t have dedicated ACTIVE TCS ( if we have below
>>>> config with ACTIVE TCS count 0)
>>>>      qcom,tcs-config = <ACTIVE_TCS  0>,
>>>>                            <SLEEP_TCS   3>,
>>>>                            <WAKE_TCS    3>,
>>>>
>>>> Now to send active data, driver may re-use/ re-purpose few of the sleep
>>>> or wake TCS, to be used as ACTIVE TCS and once work is done,
>>>> it will be re-allocated in SLEEP/ WAKE TCS pool accordingly. If driver
>>>> don’t cache, all the SLEEP and WAKE data is lost when one
>>>> of TCS is repurposed to use as ACTIVE TCS.
>>> Ah, interesting.  I'll read the code more, but are you expecting this
>>> type of situation to work today, or is it theoretical for the future?
>> yes, we have targets which needs to work with this type of situation.
> My brain is still slowly absorbing all the code, but something tells
> me that targets with no ACTIVE TCS will not work properly with non-OSI
> mode unless you change your patches more.  Specifically to make the
> zero ACTIVE TCS case work I think you need a rpmh_flush() call after
> _ALL_ calls to rpmh_write() and rpmh_write_batch() (even those
> modifying ACTIVE state).  rpmh_write_async() will be yet more
> interesting because you'd have to flush in rpmh_tx_done() I guess?
> ...and also somehow you need to inhibit entering sleep mode if an
> async write was in progress?  Maybe easier to just detect the
> "non-OSI-mode + 0 ACTIVE TCS" case at probe time and fail to probe?
>
>
> -Doug
No, it shouldn’t break with "non-OSI-mode + 0 ACTIVE TCS".

After taking your suggestion to add rpmh start/end transaction calls in v13,
rpmh_end_transaction() invokes rpmh_flush() only for the last client. By that time all
rpmh_write() and rpmh_write_batch() calls are expected to have “finished”, since a client
first waits for them to finish and only then invokes end.

So the driver handles rpmh_write() and rpmh_write_batch() calls correctly.

rpmh_write_async() is a fire-and-forget request from SW, and the client driver may invoke
rpmh_end_transaction() immediately after it.

This case is also handled. Let's again take an example to understand it.

1.    The client invokes rpmh_write_async() to send ACTIVE cmds on a target which has zero ACTIVE TCS.

    The rpmh driver re-purposes one of the SLEEP/WAKE TCSes as ACTIVE; internally this also sets
    drv->tcs_in_use to true for the respective SLEEP/WAKE TCS.

2.    The client now, without waiting for the above to finish, goes ahead and invokes
    rpmh_end_transaction(), which calls rpmh_flush() (in case the cache has become dirty).

    If the re-purposed TCS is still in use in HW (transaction in progress), drv->tcs_in_use is
    still set. So rpmh_rsc_invalidate() (invoked from rpmh_flush()) keeps returning -EAGAIN
    until that TCS becomes free, and then goes ahead to finish its job.
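
The retry behaviour in step 2 can be illustrated with the small standalone
model below. It is plain C, not the driver code: struct sketch_drv and all
sketch_* functions are hypothetical, the busy_polls countdown merely simulates
the hardware finishing the borrowed transaction, and the bounded retry count
is an assumption added so the model always terminates.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NUM_TCS 6	/* 3 SLEEP + 3 WAKE, as in the example config */

/* Hypothetical model of the per-TCS "in use" tracking. */
struct sketch_drv {
	bool tcs_in_use[NUM_TCS];
	int busy_polls[NUM_TCS];	/* polls left until simulated HW is done */
};

/* Re-purpose a SLEEP/WAKE TCS for an async ACTIVE request. */
static void sketch_borrow_tcs(struct sketch_drv *drv, int idx, int polls)
{
	assert(idx >= 0 && idx < NUM_TCS);
	drv->tcs_in_use[idx] = true;
	drv->busy_polls[idx] = polls;
}

/* Simulated hardware progress: a busy TCS frees up after enough polls. */
static void sketch_hw_tick(struct sketch_drv *drv)
{
	int i;

	for (i = 0; i < NUM_TCS; i++) {
		if (drv->tcs_in_use[i] && --drv->busy_polls[i] <= 0)
			drv->tcs_in_use[i] = false;
	}
}

/* Invalidate is refused with -EAGAIN while any TCS is still in use. */
static int sketch_invalidate(const struct sketch_drv *drv)
{
	int i;

	for (i = 0; i < NUM_TCS; i++) {
		if (drv->tcs_in_use[i])
			return -EAGAIN;
	}
	return 0;
}

/* Flush retries the invalidate until it succeeds or max_tries is reached. */
static int sketch_flush_retry(struct sketch_drv *drv, int max_tries, int *tries)
{
	int ret = -EAGAIN;

	*tries = 0;
	while (*tries < max_tries) {
		(*tries)++;
		ret = sketch_invalidate(drv);
		if (ret != -EAGAIN)
			break;
		sketch_hw_tick(drv);
	}
	return ret;
}
```

In this model a flush issued right after the async write simply takes a few
extra attempts; it does not overwrite the borrowed TCS while it is busy.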
   
Thanks,
Maulik

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation


end of thread, other threads:[~2020-03-10  9:12 UTC | newest]

Thread overview: 14+ messages
2020-02-28 11:38 [PATCH v9 0/3] Invoke rpmh_flush for non OSI targets Maulik Shah
2020-02-28 11:38 ` [PATCH v9 1/3] arm64: dts: qcom: sc7180: Add cpuidle low power states Maulik Shah
2020-02-28 11:38 ` [PATCH v9 2/3] soc: qcom: rpmh: Update dirty flag only when data changes Maulik Shah
2020-02-28 21:50   ` Evan Green
2020-03-02 11:38     ` Maulik Shah
2020-02-28 11:38 ` [PATCH v9 3/3] soc: qcom: rpmh: Invoke rpmh_flush() for dirty caches Maulik Shah
2020-02-28 21:48   ` Evan Green
2020-03-03  5:48     ` Maulik Shah
2020-02-28 23:45   ` Doug Anderson
2020-03-03  5:47     ` Maulik Shah
2020-03-04  0:40       ` Doug Anderson
2020-03-05  9:41         ` Maulik Shah
2020-03-05 22:18           ` Doug Anderson
2020-03-10  9:11             ` Maulik Shah
