Linux-ARM-MSM Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v4 0/3] Qcom smmu-500 wait-for-safe handling for sdm845
@ 2019-08-23  6:32 Vivek Gautam
  2019-08-23  6:32 ` [PATCH v4 1/3] firmware: qcom_scm-64: Add atomic version of qcom_scm_call Vivek Gautam
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Vivek Gautam @ 2019-08-23  6:32 UTC (permalink / raw)
  To: joro, agross, will.deacon, robin.murphy, iommu
  Cc: bjorn.andersson, linux-arm-msm, linux-kernel, Vivek Gautam

Previous version of the patches are at [1]:

Qcom's implementation of smmu-500 on sdm845 adds a hardware logic called
wait-for-safe. This logic helps in meeting the invalidation requirements
from 'real-time clients', such as display and camera. This wait-for-safe
logic ensures that the invalidations happen after getting an ack from these
devices.
In this patch-series we are disabling this wait-for-safe logic from the
arm-smmu driver's probe as with this enabled the hardware tries to
throttle invalidations from 'non-real-time clients', such as USB and UFS.

For detailed information please refer to patch [3/4] in this series.
I have included the device tree patch too in this series for someone who
would like to test out this. Here's a branch [2] that gets display on MTP
SDM845 device.

This patch series is inspired from downstream work to handle under-performance
issues on real-time clients on sdm845. In downstream we add separate page table
ops to handle TLB maintenance and toggle wait-for-safe in tlb_sync call so that
achieve required performance for display and camera [3, 4].

Changes since v3:
 * Based on arm-smmu implementation cleanup series [5] by Robin Murphy which is
   already merged in Will's tree [6].
 * Implemented the sdm845 specific reset hook which does arm_smmu_device_reset()
   followed by making SCM call to disable the wait-for-safe logic.
 * Removed depedency for SCM call on any dt flag. We invariably try to disable
   the wait-for-safe logic on sdm845. The platforms such as mtp845, and db845
   that implement handlers for this particular SCM call should be able disable
   wait-for-safe logic.
   Other platforms such as cheza don't enable the wait-for-safe logic at all
   from their bootloaders. So there's no need to disable the same.
 * No change in SCM call patches 1 & 2.

Changes since v2:
 * Dropped the patch to add atomic io_read/write scm API.
 * Removed support for any separate page table ops to handle wait-for-safe.
   Currently just disabling this wait-for-safe logic from arm_smmu_device_probe()
   to achieve performance on USB/UFS on sdm845.
 * Added a device tree patch to add smmu option for fw-implemented support
   for SCM call to take care of SAFE toggling.

Changes since v1:
 * Addressed Will and Robin's comments:
    - Dropped the patch[4] that forked out __arm_smmu_tlb_inv_range_nosync(),
      and __arm_smmu_tlb_sync().
    - Cleaned up the errata patch further to use downstream polling mechanism
      for tlb sync.
 * No change in SCM call patches - patches 1 to 3.

[1] https://lore.kernel.org/patchwork/cover/1087453/
[2] https://github.com/vivekgautam1/linux/tree/v5.2-rc4/sdm845-display-working
[3] https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/drivers/iommu/arm-smmu.c?h=CogSystems-msm-49/msm-4.9&id=da765c6c75266b38191b38ef086274943f353ea7
[4] https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/drivers/iommu/arm-smmu.c?h=CogSystems-msm-49/msm-4.9&id=8696005aaaf745de68f57793c1a534a34345c30a
[5] https://patchwork.kernel.org/patch/11096265/
[6] https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/

Vivek Gautam (3):
  firmware: qcom_scm-64: Add atomic version of qcom_scm_call
  firmware/qcom_scm: Add scm call to handle smmu errata
  iommu: arm-smmu-impl: Add sdm845 implementation hook

 drivers/firmware/qcom_scm-32.c |   5 ++
 drivers/firmware/qcom_scm-64.c | 149 +++++++++++++++++++++++++++++------------
 drivers/firmware/qcom_scm.c    |   6 ++
 drivers/firmware/qcom_scm.h    |   5 ++
 drivers/iommu/arm-smmu-impl.c  |  27 +++++++-
 include/linux/qcom_scm.h       |   2 +
 6 files changed, 149 insertions(+), 45 deletions(-)

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v4 1/3] firmware: qcom_scm-64: Add atomic version of qcom_scm_call
  2019-08-23  6:32 [PATCH v4 0/3] Qcom smmu-500 wait-for-safe handling for sdm845 Vivek Gautam
@ 2019-08-23  6:32 ` Vivek Gautam
  2019-09-06 15:12   ` Stephen Boyd
  2019-08-23  6:32 ` [PATCH v4 2/3] firmware/qcom_scm: Add scm call to handle smmu errata Vivek Gautam
  2019-08-23  6:32 ` [PATCH v4 3/3] iommu: arm-smmu-impl: Add sdm845 implementation hook Vivek Gautam
  2 siblings, 1 reply; 11+ messages in thread
From: Vivek Gautam @ 2019-08-23  6:32 UTC (permalink / raw)
  To: joro, agross, will.deacon, robin.murphy, iommu
  Cc: bjorn.andersson, linux-arm-msm, linux-kernel, Vivek Gautam

There are scnenarios where drivers are required to make a
scm call in atomic context, such as in one of the qcom's
arm-smmu-500 errata [1].

[1] ("https://source.codeaurora.org/quic/la/kernel/msm-4.9/
      tree/drivers/iommu/arm-smmu.c?h=msm-4.9#n4842")

Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
---
 drivers/firmware/qcom_scm-64.c | 136 ++++++++++++++++++++++++++++-------------
 1 file changed, 92 insertions(+), 44 deletions(-)

diff --git a/drivers/firmware/qcom_scm-64.c b/drivers/firmware/qcom_scm-64.c
index 91d5ad7cf58b..b6dca32c5ac4 100644
--- a/drivers/firmware/qcom_scm-64.c
+++ b/drivers/firmware/qcom_scm-64.c
@@ -62,32 +62,71 @@ static DEFINE_MUTEX(qcom_scm_lock);
 #define FIRST_EXT_ARG_IDX 3
 #define N_REGISTER_ARGS (MAX_QCOM_SCM_ARGS - N_EXT_QCOM_SCM_ARGS + 1)
 
-/**
- * qcom_scm_call() - Invoke a syscall in the secure world
- * @dev:	device
- * @svc_id:	service identifier
- * @cmd_id:	command identifier
- * @desc:	Descriptor structure containing arguments and return values
- *
- * Sends a command to the SCM and waits for the command to finish processing.
- * This should *only* be called in pre-emptible context.
-*/
-static int qcom_scm_call(struct device *dev, u32 svc_id, u32 cmd_id,
-			 const struct qcom_scm_desc *desc,
-			 struct arm_smccc_res *res)
+static void __qcom_scm_call_do(const struct qcom_scm_desc *desc,
+			       struct arm_smccc_res *res, u32 fn_id,
+			       u64 x5, u32 type)
+{
+	u64 cmd;
+	struct arm_smccc_quirk quirk = {.id = ARM_SMCCC_QUIRK_QCOM_A6};
+
+	cmd = ARM_SMCCC_CALL_VAL(type, qcom_smccc_convention,
+				 ARM_SMCCC_OWNER_SIP, fn_id);
+
+	quirk.state.a6 = 0;
+
+	do {
+		arm_smccc_smc_quirk(cmd, desc->arginfo, desc->args[0],
+				    desc->args[1], desc->args[2], x5,
+				    quirk.state.a6, 0, res, &quirk);
+
+		if (res->a0 == QCOM_SCM_INTERRUPTED)
+			cmd = res->a0;
+
+	} while (res->a0 == QCOM_SCM_INTERRUPTED);
+}
+
+static void qcom_scm_call_do(const struct qcom_scm_desc *desc,
+			     struct arm_smccc_res *res, u32 fn_id,
+			     u64 x5, bool atomic)
+{
+	int retry_count = 0;
+
+	if (!atomic) {
+		do {
+			mutex_lock(&qcom_scm_lock);
+
+			__qcom_scm_call_do(desc, res, fn_id, x5,
+					   ARM_SMCCC_STD_CALL);
+
+			mutex_unlock(&qcom_scm_lock);
+
+			if (res->a0 == QCOM_SCM_V2_EBUSY) {
+				if (retry_count++ > QCOM_SCM_EBUSY_MAX_RETRY)
+					break;
+				msleep(QCOM_SCM_EBUSY_WAIT_MS);
+			}
+		}  while (res->a0 == QCOM_SCM_V2_EBUSY);
+	} else {
+		__qcom_scm_call_do(desc, res, fn_id, x5, ARM_SMCCC_FAST_CALL);
+	}
+}
+
+static int ___qcom_scm_call(struct device *dev, u32 svc_id, u32 cmd_id,
+			    const struct qcom_scm_desc *desc,
+			    struct arm_smccc_res *res, bool atomic)
 {
 	int arglen = desc->arginfo & 0xf;
-	int retry_count = 0, i;
+	int i;
 	u32 fn_id = QCOM_SCM_FNID(svc_id, cmd_id);
-	u64 cmd, x5 = desc->args[FIRST_EXT_ARG_IDX];
+	u64 x5 = desc->args[FIRST_EXT_ARG_IDX];
 	dma_addr_t args_phys = 0;
 	void *args_virt = NULL;
 	size_t alloc_len;
-	struct arm_smccc_quirk quirk = {.id = ARM_SMCCC_QUIRK_QCOM_A6};
+	gfp_t flag = atomic ? GFP_ATOMIC : GFP_KERNEL;
 
 	if (unlikely(arglen > N_REGISTER_ARGS)) {
 		alloc_len = N_EXT_QCOM_SCM_ARGS * sizeof(u64);
-		args_virt = kzalloc(PAGE_ALIGN(alloc_len), GFP_KERNEL);
+		args_virt = kzalloc(PAGE_ALIGN(alloc_len), flag);
 
 		if (!args_virt)
 			return -ENOMEM;
@@ -117,33 +156,7 @@ static int qcom_scm_call(struct device *dev, u32 svc_id, u32 cmd_id,
 		x5 = args_phys;
 	}
 
-	do {
-		mutex_lock(&qcom_scm_lock);
-
-		cmd = ARM_SMCCC_CALL_VAL(ARM_SMCCC_STD_CALL,
-					 qcom_smccc_convention,
-					 ARM_SMCCC_OWNER_SIP, fn_id);
-
-		quirk.state.a6 = 0;
-
-		do {
-			arm_smccc_smc_quirk(cmd, desc->arginfo, desc->args[0],
-				      desc->args[1], desc->args[2], x5,
-				      quirk.state.a6, 0, res, &quirk);
-
-			if (res->a0 == QCOM_SCM_INTERRUPTED)
-				cmd = res->a0;
-
-		} while (res->a0 == QCOM_SCM_INTERRUPTED);
-
-		mutex_unlock(&qcom_scm_lock);
-
-		if (res->a0 == QCOM_SCM_V2_EBUSY) {
-			if (retry_count++ > QCOM_SCM_EBUSY_MAX_RETRY)
-				break;
-			msleep(QCOM_SCM_EBUSY_WAIT_MS);
-		}
-	}  while (res->a0 == QCOM_SCM_V2_EBUSY);
+	qcom_scm_call_do(desc, res, fn_id, x5, atomic);
 
 	if (args_virt) {
 		dma_unmap_single(dev, args_phys, alloc_len, DMA_TO_DEVICE);
@@ -156,6 +169,41 @@ static int qcom_scm_call(struct device *dev, u32 svc_id, u32 cmd_id,
 	return 0;
 }
 
+/**
+ * qcom_scm_call() - Invoke a syscall in the secure world
+ * @dev:	device
+ * @svc_id:	service identifier
+ * @cmd_id:	command identifier
+ * @desc:	Descriptor structure containing arguments and return values
+ *
+ * Sends a command to the SCM and waits for the command to finish processing.
+ * This should *only* be called in pre-emptible context.
+ */
+static int qcom_scm_call(struct device *dev, u32 svc_id, u32 cmd_id,
+			 const struct qcom_scm_desc *desc,
+			 struct arm_smccc_res *res)
+{
+	return ___qcom_scm_call(dev, svc_id, cmd_id, desc, res, false);
+}
+
+/**
+ * qcom_scm_call_atomic() - atomic variation of qcom_scm_call()
+ * @dev:	device
+ * @svc_id:	service identifier
+ * @cmd_id:	command identifier
+ * @desc:	Descriptor structure containing arguments and return values
+ * @res:	Structure containing results from SMC/HVC call
+ *
+ * Sends a command to the SCM and waits for the command to finish processing.
+ * This should be called in atomic context only.
+ */
+static int qcom_scm_call_atomic(struct device *dev, u32 svc_id, u32 cmd_id,
+				const struct qcom_scm_desc *desc,
+				struct arm_smccc_res *res)
+{
+	return ___qcom_scm_call(dev, svc_id, cmd_id, desc, res, true);
+}
+
 /**
  * qcom_scm_set_cold_boot_addr() - Set the cold boot address for cpus
  * @entry: Entry point function for the cpus
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v4 2/3] firmware/qcom_scm: Add scm call to handle smmu errata
  2019-08-23  6:32 [PATCH v4 0/3] Qcom smmu-500 wait-for-safe handling for sdm845 Vivek Gautam
  2019-08-23  6:32 ` [PATCH v4 1/3] firmware: qcom_scm-64: Add atomic version of qcom_scm_call Vivek Gautam
@ 2019-08-23  6:32 ` Vivek Gautam
  2019-08-23  6:32 ` [PATCH v4 3/3] iommu: arm-smmu-impl: Add sdm845 implementation hook Vivek Gautam
  2 siblings, 0 replies; 11+ messages in thread
From: Vivek Gautam @ 2019-08-23  6:32 UTC (permalink / raw)
  To: joro, agross, will.deacon, robin.murphy, iommu
  Cc: bjorn.andersson, linux-arm-msm, linux-kernel, Vivek Gautam

Qcom's smmu-500 needs to toggle wait-for-safe sequence to
handle TLB invalidation sync's.
Few firmwares allow doing that through SCM interface.
Add API to toggle wait for safe from firmware through a
SCM call.

Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
---
 drivers/firmware/qcom_scm-32.c |  5 +++++
 drivers/firmware/qcom_scm-64.c | 13 +++++++++++++
 drivers/firmware/qcom_scm.c    |  6 ++++++
 drivers/firmware/qcom_scm.h    |  5 +++++
 include/linux/qcom_scm.h       |  2 ++
 5 files changed, 31 insertions(+)

diff --git a/drivers/firmware/qcom_scm-32.c b/drivers/firmware/qcom_scm-32.c
index 215061c581e1..bee8729525ec 100644
--- a/drivers/firmware/qcom_scm-32.c
+++ b/drivers/firmware/qcom_scm-32.c
@@ -614,3 +614,8 @@ int __qcom_scm_io_writel(struct device *dev, phys_addr_t addr, unsigned int val)
 	return qcom_scm_call_atomic2(QCOM_SCM_SVC_IO, QCOM_SCM_IO_WRITE,
 				     addr, val);
 }
+
+int __qcom_scm_qsmmu500_wait_safe_toggle(struct device *dev, bool enable)
+{
+	return -ENODEV;
+}
diff --git a/drivers/firmware/qcom_scm-64.c b/drivers/firmware/qcom_scm-64.c
index b6dca32c5ac4..41c06dcfa9e1 100644
--- a/drivers/firmware/qcom_scm-64.c
+++ b/drivers/firmware/qcom_scm-64.c
@@ -550,3 +550,16 @@ int __qcom_scm_io_writel(struct device *dev, phys_addr_t addr, unsigned int val)
 	return qcom_scm_call(dev, QCOM_SCM_SVC_IO, QCOM_SCM_IO_WRITE,
 			     &desc, &res);
 }
+
+int __qcom_scm_qsmmu500_wait_safe_toggle(struct device *dev, bool en)
+{
+	struct qcom_scm_desc desc = {0};
+	struct arm_smccc_res res;
+
+	desc.args[0] = QCOM_SCM_CONFIG_ERRATA1_CLIENT_ALL;
+	desc.args[1] = en;
+	desc.arginfo = QCOM_SCM_ARGS(2);
+
+	return qcom_scm_call_atomic(dev, QCOM_SCM_SVC_SMMU_PROGRAM,
+				    QCOM_SCM_CONFIG_ERRATA1, &desc, &res);
+}
diff --git a/drivers/firmware/qcom_scm.c b/drivers/firmware/qcom_scm.c
index 2ddc118dba1b..2b3b7a8c4270 100644
--- a/drivers/firmware/qcom_scm.c
+++ b/drivers/firmware/qcom_scm.c
@@ -344,6 +344,12 @@ int qcom_scm_iommu_secure_ptbl_init(u64 addr, u32 size, u32 spare)
 }
 EXPORT_SYMBOL(qcom_scm_iommu_secure_ptbl_init);
 
+int qcom_scm_qsmmu500_wait_safe_toggle(bool en)
+{
+	return __qcom_scm_qsmmu500_wait_safe_toggle(__scm->dev, en);
+}
+EXPORT_SYMBOL(qcom_scm_qsmmu500_wait_safe_toggle);
+
 int qcom_scm_io_readl(phys_addr_t addr, unsigned int *val)
 {
 	return __qcom_scm_io_readl(__scm->dev, addr, val);
diff --git a/drivers/firmware/qcom_scm.h b/drivers/firmware/qcom_scm.h
index 99506bd873c0..baee744dbcfe 100644
--- a/drivers/firmware/qcom_scm.h
+++ b/drivers/firmware/qcom_scm.h
@@ -91,10 +91,15 @@ extern int __qcom_scm_restore_sec_cfg(struct device *dev, u32 device_id,
 				      u32 spare);
 #define QCOM_SCM_IOMMU_SECURE_PTBL_SIZE	3
 #define QCOM_SCM_IOMMU_SECURE_PTBL_INIT	4
+#define QCOM_SCM_SVC_SMMU_PROGRAM	0x15
+#define QCOM_SCM_CONFIG_ERRATA1		0x3
+#define QCOM_SCM_CONFIG_ERRATA1_CLIENT_ALL	0x2
 extern int __qcom_scm_iommu_secure_ptbl_size(struct device *dev, u32 spare,
 					     size_t *size);
 extern int __qcom_scm_iommu_secure_ptbl_init(struct device *dev, u64 addr,
 					     u32 size, u32 spare);
+extern int __qcom_scm_qsmmu500_wait_safe_toggle(struct device *dev,
+						bool enable);
 #define QCOM_MEM_PROT_ASSIGN_ID	0x16
 extern int  __qcom_scm_assign_mem(struct device *dev,
 				  phys_addr_t mem_region, size_t mem_sz,
diff --git a/include/linux/qcom_scm.h b/include/linux/qcom_scm.h
index 3f12cc77fb58..aee3d8580d89 100644
--- a/include/linux/qcom_scm.h
+++ b/include/linux/qcom_scm.h
@@ -57,6 +57,7 @@ extern int qcom_scm_set_remote_state(u32 state, u32 id);
 extern int qcom_scm_restore_sec_cfg(u32 device_id, u32 spare);
 extern int qcom_scm_iommu_secure_ptbl_size(u32 spare, size_t *size);
 extern int qcom_scm_iommu_secure_ptbl_init(u64 addr, u32 size, u32 spare);
+extern int qcom_scm_qsmmu500_wait_safe_toggle(bool en);
 extern int qcom_scm_io_readl(phys_addr_t addr, unsigned int *val);
 extern int qcom_scm_io_writel(phys_addr_t addr, unsigned int val);
 #else
@@ -96,6 +97,7 @@ qcom_scm_set_remote_state(u32 state,u32 id) { return -ENODEV; }
 static inline int qcom_scm_restore_sec_cfg(u32 device_id, u32 spare) { return -ENODEV; }
 static inline int qcom_scm_iommu_secure_ptbl_size(u32 spare, size_t *size) { return -ENODEV; }
 static inline int qcom_scm_iommu_secure_ptbl_init(u64 addr, u32 size, u32 spare) { return -ENODEV; }
+static inline int qcom_scm_qsmmu500_wait_safe_toggle(bool en) { return -ENODEV; }
 static inline int qcom_scm_io_readl(phys_addr_t addr, unsigned int *val) { return -ENODEV; }
 static inline int qcom_scm_io_writel(phys_addr_t addr, unsigned int val) { return -ENODEV; }
 #endif
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v4 3/3] iommu: arm-smmu-impl: Add sdm845 implementation hook
  2019-08-23  6:32 [PATCH v4 0/3] Qcom smmu-500 wait-for-safe handling for sdm845 Vivek Gautam
  2019-08-23  6:32 ` [PATCH v4 1/3] firmware: qcom_scm-64: Add atomic version of qcom_scm_call Vivek Gautam
  2019-08-23  6:32 ` [PATCH v4 2/3] firmware/qcom_scm: Add scm call to handle smmu errata Vivek Gautam
@ 2019-08-23  6:32 ` Vivek Gautam
  2019-09-06  6:32   ` Vivek Gautam
                     ` (2 more replies)
  2 siblings, 3 replies; 11+ messages in thread
From: Vivek Gautam @ 2019-08-23  6:32 UTC (permalink / raw)
  To: joro, agross, will.deacon, robin.murphy, iommu
  Cc: bjorn.andersson, linux-arm-msm, linux-kernel, Vivek Gautam

Add reset hook for sdm845 based platforms to turn off
the wait-for-safe sequence.

Understanding how wait-for-safe logic affects USB and UFS performance
on MTP845 and DB845 boards:

Qcom's implementation of arm,mmu-500 adds a WAIT-FOR-SAFE logic
to address under-performance issues in real-time clients, such as
Display, and Camera.
On receiving an invalidation requests, the SMMU forwards SAFE request
to these clients and waits for SAFE ack signal from real-time clients.
The SAFE signal from such clients is used to qualify the start of
invalidation.
This logic is controlled by chicken bits, one for each - MDP (display),
IFE0, and IFE1 (camera), that can be accessed only from secure software
on sdm845.

This configuration, however, degrades the performance of non-real time
clients, such as USB, and UFS etc. This happens because, with wait-for-safe
logic enabled the hardware tries to throttle non-real time clients while
waiting for SAFE ack signals from real-time clients.

On mtp845 and db845 devices, with wait-for-safe logic enabled by the
bootloaders we see degraded performance of USB and UFS when kernel
enables the smmu stage-1 translations for these clients.
Turn off this wait-for-safe logic from the kernel gets us back the perf
of USB and UFS devices until we re-visit this when we start seeing perf
issues on display/camera on upstream supported SDM845 platforms.
The bootloaders on these boards implement secure monitor callbacks to
handle a specific command - QCOM_SCM_SVC_SMMU_PROGRAM with which the
logic can be toggled.

There are other boards such as cheza whose bootloaders don't enable this
logic. Such boards don't implement callbacks to handle the specific SCM
call so disabling this logic for such boards will be a no-op.

This change is inspired by the downstream change from Patrick Daly
to address performance issues with display and camera by handling
this wait-for-safe within separte io-pagetable ops to do TLB
maintenance. So a big thanks to him for the change and for all the
offline discussions.

Without this change the UFS reads are pretty slow:
$ time dd if=/dev/sda of=/dev/zero bs=1048576 count=10 conv=sync
10+0 records in
10+0 records out
10485760 bytes (10.0MB) copied, 22.394903 seconds, 457.2KB/s
real    0m 22.39s
user    0m 0.00s
sys     0m 0.01s

With this change they are back to rock!
$ time dd if=/dev/sda of=/dev/zero bs=1048576 count=300 conv=sync
300+0 records in
300+0 records out
314572800 bytes (300.0MB) copied, 1.030541 seconds, 291.1MB/s
real    0m 1.03s
user    0m 0.00s
sys     0m 0.54s

Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
---
 drivers/iommu/arm-smmu-impl.c | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index 3f88cd078dd5..0aef87c41f9c 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -6,6 +6,7 @@
 
 #include <linux/bitfield.h>
 #include <linux/of.h>
+#include <linux/qcom_scm.h>
 
 #include "arm-smmu.h"
 
@@ -102,7 +103,6 @@ static struct arm_smmu_device *cavium_smmu_impl_init(struct arm_smmu_device *smm
 	return &cs->smmu;
 }
 
-
 #define ARM_MMU500_ACTLR_CPRE		(1 << 1)
 
 #define ARM_MMU500_ACR_CACHE_LOCK	(1 << 26)
@@ -147,6 +147,28 @@ static const struct arm_smmu_impl arm_mmu500_impl = {
 	.reset = arm_mmu500_reset,
 };
 
+static int qcom_sdm845_smmu500_reset(struct arm_smmu_device *smmu)
+{
+	int ret;
+
+	arm_mmu500_reset(smmu);
+
+	/*
+	 * To address performance degradation in non-real time clients,
+	 * such as USB and UFS, turn off wait-for-safe on sdm845 based boards,
+	 * such as MTP and db845, whose firmwares implement secure monitor
+	 * call handlers to turn on/off the wait-for-safe logic.
+	 */
+	ret = qcom_scm_qsmmu500_wait_safe_toggle(0);
+	if (ret)
+		dev_warn(smmu->dev, "Failed to turn off SAFE logic\n");
+
+	return 0;
+}
+
+const struct arm_smmu_impl qcom_sdm845_smmu500_impl = {
+	.reset = qcom_sdm845_smmu500_reset,
+};
 
 struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu)
 {
@@ -170,5 +192,8 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu)
 				  "calxeda,smmu-secure-config-access"))
 		smmu->impl = &calxeda_impl;
 
+	if (of_device_is_compatible(smmu->dev->of_node, "qcom,sdm845-smmu-500"))
+		smmu->impl = &qcom_sdm845_smmu500_impl;
+
 	return smmu;
 }
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 3/3] iommu: arm-smmu-impl: Add sdm845 implementation hook
  2019-08-23  6:32 ` [PATCH v4 3/3] iommu: arm-smmu-impl: Add sdm845 implementation hook Vivek Gautam
@ 2019-09-06  6:32   ` Vivek Gautam
  2019-09-06 15:07   ` Stephen Boyd
  2019-09-10 13:26   ` Robin Murphy
  2 siblings, 0 replies; 11+ messages in thread
From: Vivek Gautam @ 2019-09-06  6:32 UTC (permalink / raw)
  To: list@263.net:IOMMU DRIVERS, Joerg Roedel, joro, Andy Gross,
	Will Deacon, Robin Murphy, list@263.net:IOMMU DRIVERS,
	Joerg Roedel, iommu
  Cc: linux-arm-msm, open list, Bjorn Andersson

On Fri, Aug 23, 2019 at 12:03 PM Vivek Gautam
<vivek.gautam@codeaurora.org> wrote:
>
> Add reset hook for sdm845 based platforms to turn off
> the wait-for-safe sequence.
>
> Understanding how wait-for-safe logic affects USB and UFS performance
> on MTP845 and DB845 boards:
>
> Qcom's implementation of arm,mmu-500 adds a WAIT-FOR-SAFE logic
> to address under-performance issues in real-time clients, such as
> Display, and Camera.
> On receiving an invalidation requests, the SMMU forwards SAFE request
> to these clients and waits for SAFE ack signal from real-time clients.
> The SAFE signal from such clients is used to qualify the start of
> invalidation.
> This logic is controlled by chicken bits, one for each - MDP (display),
> IFE0, and IFE1 (camera), that can be accessed only from secure software
> on sdm845.
>
> This configuration, however, degrades the performance of non-real time
> clients, such as USB, and UFS etc. This happens because, with wait-for-safe
> logic enabled the hardware tries to throttle non-real time clients while
> waiting for SAFE ack signals from real-time clients.
>
> On mtp845 and db845 devices, with wait-for-safe logic enabled by the
> bootloaders we see degraded performance of USB and UFS when kernel
> enables the smmu stage-1 translations for these clients.
> Turn off this wait-for-safe logic from the kernel gets us back the perf
> of USB and UFS devices until we re-visit this when we start seeing perf
> issues on display/camera on upstream supported SDM845 platforms.
> The bootloaders on these boards implement secure monitor callbacks to
> handle a specific command - QCOM_SCM_SVC_SMMU_PROGRAM with which the
> logic can be toggled.
>
> There are other boards such as cheza whose bootloaders don't enable this
> logic. Such boards don't implement callbacks to handle the specific SCM
> call so disabling this logic for such boards will be a no-op.
>
> This change is inspired by the downstream change from Patrick Daly
> to address performance issues with display and camera by handling
> this wait-for-safe within separte io-pagetable ops to do TLB
> maintenance. So a big thanks to him for the change and for all the
> offline discussions.
>
> Without this change the UFS reads are pretty slow:
> $ time dd if=/dev/sda of=/dev/zero bs=1048576 count=10 conv=sync
> 10+0 records in
> 10+0 records out
> 10485760 bytes (10.0MB) copied, 22.394903 seconds, 457.2KB/s
> real    0m 22.39s
> user    0m 0.00s
> sys     0m 0.01s
>
> With this change they are back to rock!
> $ time dd if=/dev/sda of=/dev/zero bs=1048576 count=300 conv=sync
> 300+0 records in
> 300+0 records out
> 314572800 bytes (300.0MB) copied, 1.030541 seconds, 291.1MB/s
> real    0m 1.03s
> user    0m 0.00s
> sys     0m 0.54s
>
> Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
> ---
>  drivers/iommu/arm-smmu-impl.c | 27 ++++++++++++++++++++++++++-
>  1 file changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
> index 3f88cd078dd5..0aef87c41f9c 100644
> --- a/drivers/iommu/arm-smmu-impl.c
> +++ b/drivers/iommu/arm-smmu-impl.c
> @@ -6,6 +6,7 @@
>
>  #include <linux/bitfield.h>
>  #include <linux/of.h>
> +#include <linux/qcom_scm.h>
>
>  #include "arm-smmu.h"
>
> @@ -102,7 +103,6 @@ static struct arm_smmu_device *cavium_smmu_impl_init(struct arm_smmu_device *smm
>         return &cs->smmu;
>  }
>
> -
>  #define ARM_MMU500_ACTLR_CPRE          (1 << 1)
>
>  #define ARM_MMU500_ACR_CACHE_LOCK      (1 << 26)
> @@ -147,6 +147,28 @@ static const struct arm_smmu_impl arm_mmu500_impl = {
>         .reset = arm_mmu500_reset,
>  };
>
> +static int qcom_sdm845_smmu500_reset(struct arm_smmu_device *smmu)
> +{
> +       int ret;
> +
> +       arm_mmu500_reset(smmu);
> +
> +       /*
> +        * To address performance degradation in non-real time clients,
> +        * such as USB and UFS, turn off wait-for-safe on sdm845 based boards,
> +        * such as MTP and db845, whose firmwares implement secure monitor
> +        * call handlers to turn on/off the wait-for-safe logic.
> +        */
> +       ret = qcom_scm_qsmmu500_wait_safe_toggle(0);
> +       if (ret)
> +               dev_warn(smmu->dev, "Failed to turn off SAFE logic\n");
> +
> +       return 0;
> +}
> +
> +const struct arm_smmu_impl qcom_sdm845_smmu500_impl = {
> +       .reset = qcom_sdm845_smmu500_reset,
> +};
>
>  struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu)
>  {
> @@ -170,5 +192,8 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu)
>                                   "calxeda,smmu-secure-config-access"))
>                 smmu->impl = &calxeda_impl;
>
> +       if (of_device_is_compatible(smmu->dev->of_node, "qcom,sdm845-smmu-500"))
> +               smmu->impl = &qcom_sdm845_smmu500_impl;
> +
>         return smmu;
>  }

Hi Robin,

How does this change look now? Let me know if there are any concerns.
I will be happy to incorporate them.

Thanks & Regards
Vivek
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 3/3] iommu: arm-smmu-impl: Add sdm845 implementation hook
  2019-08-23  6:32 ` [PATCH v4 3/3] iommu: arm-smmu-impl: Add sdm845 implementation hook Vivek Gautam
  2019-09-06  6:32   ` Vivek Gautam
@ 2019-09-06 15:07   ` Stephen Boyd
  2019-09-16  9:47     ` Sai Prakash Ranjan
  2019-09-10 13:26   ` Robin Murphy
  2 siblings, 1 reply; 11+ messages in thread
From: Stephen Boyd @ 2019-09-06 15:07 UTC (permalink / raw)
  To: Vivek Gautam, agross, iommu, joro, robin.murphy, will.deacon
  Cc: bjorn.andersson, linux-arm-msm, linux-kernel, Vivek Gautam

Quoting Vivek Gautam (2019-08-22 23:32:48)
> diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
> index 3f88cd078dd5..0aef87c41f9c 100644
> --- a/drivers/iommu/arm-smmu-impl.c
> +++ b/drivers/iommu/arm-smmu-impl.c
> @@ -102,7 +103,6 @@ static struct arm_smmu_device *cavium_smmu_impl_init(struct arm_smmu_device *smm
>         return &cs->smmu;
>  }
>  
> -
>  #define ARM_MMU500_ACTLR_CPRE          (1 << 1)
>  
>  #define ARM_MMU500_ACR_CACHE_LOCK      (1 << 26)

Drop this hunk? 

> @@ -147,6 +147,28 @@ static const struct arm_smmu_impl arm_mmu500_impl = {
>         .reset = arm_mmu500_reset,
>  };
>  
> +static int qcom_sdm845_smmu500_reset(struct arm_smmu_device *smmu)
> +{
> +       int ret;
> +
> +       arm_mmu500_reset(smmu);
> +
> +       /*
> +        * To address performance degradation in non-real time clients,
> +        * such as USB and UFS, turn off wait-for-safe on sdm845 based boards,
> +        * such as MTP and db845, whose firmwares implement secure monitor
> +        * call handlers to turn on/off the wait-for-safe logic.
> +        */
> +       ret = qcom_scm_qsmmu500_wait_safe_toggle(0);
> +       if (ret)
> +               dev_warn(smmu->dev, "Failed to turn off SAFE logic\n");
> +
> +       return 0;

return ret? Or intentionally don't return an error for failure?

> +}
> +
> +const struct arm_smmu_impl qcom_sdm845_smmu500_impl = {

static?

> +       .reset = qcom_sdm845_smmu500_reset,
> +};
>  
>  struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu)
>  {

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 1/3] firmware: qcom_scm-64: Add atomic version of qcom_scm_call
  2019-08-23  6:32 ` [PATCH v4 1/3] firmware: qcom_scm-64: Add atomic version of qcom_scm_call Vivek Gautam
@ 2019-09-06 15:12   ` Stephen Boyd
  2019-09-16  9:48     ` Sai Prakash Ranjan
  0 siblings, 1 reply; 11+ messages in thread
From: Stephen Boyd @ 2019-09-06 15:12 UTC (permalink / raw)
  To: Vivek Gautam, agross, iommu, joro, robin.murphy, will.deacon
  Cc: bjorn.andersson, linux-arm-msm, linux-kernel, Vivek Gautam

Quoting Vivek Gautam (2019-08-22 23:32:46)
> There are scnenarios where drivers are required to make a
> scm call in atomic context, such as in one of the qcom's
> arm-smmu-500 errata [1].
> 
> [1] ("https://source.codeaurora.org/quic/la/kernel/msm-4.9/
>       tree/drivers/iommu/arm-smmu.c?h=msm-4.9#n4842")
> 
> Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
> ---
>  drivers/firmware/qcom_scm-64.c | 136 ++++++++++++++++++++++++++++-------------
>  1 file changed, 92 insertions(+), 44 deletions(-)
> 
> diff --git a/drivers/firmware/qcom_scm-64.c b/drivers/firmware/qcom_scm-64.c
> index 91d5ad7cf58b..b6dca32c5ac4 100644
> --- a/drivers/firmware/qcom_scm-64.c
> +++ b/drivers/firmware/qcom_scm-64.c
> @@ -62,32 +62,71 @@ static DEFINE_MUTEX(qcom_scm_lock);
>  #define FIRST_EXT_ARG_IDX 3
>  #define N_REGISTER_ARGS (MAX_QCOM_SCM_ARGS - N_EXT_QCOM_SCM_ARGS + 1)
>  
> -/**
> - * qcom_scm_call() - Invoke a syscall in the secure world
> - * @dev:       device
> - * @svc_id:    service identifier
> - * @cmd_id:    command identifier
> - * @desc:      Descriptor structure containing arguments and return values
> - *
> - * Sends a command to the SCM and waits for the command to finish processing.
> - * This should *only* be called in pre-emptible context.
> -*/
> -static int qcom_scm_call(struct device *dev, u32 svc_id, u32 cmd_id,
> -                        const struct qcom_scm_desc *desc,
> -                        struct arm_smccc_res *res)
> +static void __qcom_scm_call_do(const struct qcom_scm_desc *desc,
> +                              struct arm_smccc_res *res, u32 fn_id,
> +                              u64 x5, u32 type)
> +{
> +       u64 cmd;
> +       struct arm_smccc_quirk quirk = {.id = ARM_SMCCC_QUIRK_QCOM_A6};

Nitpick: Put spaces around braces please.

> +
> +       cmd = ARM_SMCCC_CALL_VAL(type, qcom_smccc_convention,
> +                                ARM_SMCCC_OWNER_SIP, fn_id);
> +
> +       quirk.state.a6 = 0;
> +
> +       do {
> +               arm_smccc_smc_quirk(cmd, desc->arginfo, desc->args[0],
> +                                   desc->args[1], desc->args[2], x5,
> +                                   quirk.state.a6, 0, res, &quirk);
> +
> +               if (res->a0 == QCOM_SCM_INTERRUPTED)
> +                       cmd = res->a0;
> +
> +       } while (res->a0 == QCOM_SCM_INTERRUPTED);
> +}
> +
> +static void qcom_scm_call_do(const struct qcom_scm_desc *desc,
> +                            struct arm_smccc_res *res, u32 fn_id,
> +                            u64 x5, bool atomic)
> +{
> +       int retry_count = 0;
> +
> +       if (!atomic) {
> +               do {
> +                       mutex_lock(&qcom_scm_lock);
> +
> +                       __qcom_scm_call_do(desc, res, fn_id, x5,
> +                                          ARM_SMCCC_STD_CALL);
> +
> +                       mutex_unlock(&qcom_scm_lock);
> +
> +                       if (res->a0 == QCOM_SCM_V2_EBUSY) {
> +                               if (retry_count++ > QCOM_SCM_EBUSY_MAX_RETRY)
> +                                       break;
> +                               msleep(QCOM_SCM_EBUSY_WAIT_MS);
> +                       }
> +               }  while (res->a0 == QCOM_SCM_V2_EBUSY);
> +       } else {
> +               __qcom_scm_call_do(desc, res, fn_id, x5, ARM_SMCCC_FAST_CALL);
> +       }

To save on some indentation maybe you could write it like:

	if (atomic) {
		__qcom_scm_call_do(..)
		return;
	}

	do {
		mutex_lock(..)
		...
	} while (..);

> +}
> +
> +static int ___qcom_scm_call(struct device *dev, u32 svc_id, u32 cmd_id,
> +                           const struct qcom_scm_desc *desc,
> +                           struct arm_smccc_res *res, bool atomic)
>  {
>         int arglen = desc->arginfo & 0xf;
> -       int retry_count = 0, i;
> +       int i;
>         u32 fn_id = QCOM_SCM_FNID(svc_id, cmd_id);
> -       u64 cmd, x5 = desc->args[FIRST_EXT_ARG_IDX];
> +       u64 x5 = desc->args[FIRST_EXT_ARG_IDX];
>         dma_addr_t args_phys = 0;
>         void *args_virt = NULL;
>         size_t alloc_len;
> -       struct arm_smccc_quirk quirk = {.id = ARM_SMCCC_QUIRK_QCOM_A6};
> +       gfp_t flag = atomic ? GFP_ATOMIC : GFP_KERNEL;
>  
>         if (unlikely(arglen > N_REGISTER_ARGS)) {
>                 alloc_len = N_EXT_QCOM_SCM_ARGS * sizeof(u64);
> -               args_virt = kzalloc(PAGE_ALIGN(alloc_len), GFP_KERNEL);
> +               args_virt = kzalloc(PAGE_ALIGN(alloc_len), flag);
>  
>                 if (!args_virt)
>                         return -ENOMEM;
> @@ -156,6 +169,41 @@ static int qcom_scm_call(struct device *dev, u32 svc_id, u32 cmd_id,
>         return 0;
>  }
>  
> +/**
> + * qcom_scm_call() - Invoke a syscall in the secure world
> + * @dev:       device
> + * @svc_id:    service identifier
> + * @cmd_id:    command identifier
> + * @desc:      Descriptor structure containing arguments and return values
> + *
> + * Sends a command to the SCM and waits for the command to finish processing.
> + * This should *only* be called in pre-emptible context.

Add a might_sleep() then?

> + */
> +static int qcom_scm_call(struct device *dev, u32 svc_id, u32 cmd_id,
> +                        const struct qcom_scm_desc *desc,
> +                        struct arm_smccc_res *res)
> +{
> +       return ___qcom_scm_call(dev, svc_id, cmd_id, desc, res, false);
> +}
> +
> +/**
> + * qcom_scm_call_atomic() - atomic variation of qcom_scm_call()
> + * @dev:       device
> + * @svc_id:    service identifier
> + * @cmd_id:    command identifier
> + * @desc:      Descriptor structure containing arguments and return values
> + * @res:       Structure containing results from SMC/HVC call
> + *
> + * Sends a command to the SCM and waits for the command to finish processing.
> + * This should be called in atomic context only.
 
Maybe add a cant_sleep()?

> + */
> +static int qcom_scm_call_atomic(struct device *dev, u32 svc_id, u32 cmd_id,
> +                               const struct qcom_scm_desc *desc,
> +                               struct arm_smccc_res *res)
> +{
> +       return ___qcom_scm_call(dev, svc_id, cmd_id, desc, res, true);
> +}
> +

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 3/3] iommu: arm-smmu-impl: Add sdm845 implementation hook
  2019-08-23  6:32 ` [PATCH v4 3/3] iommu: arm-smmu-impl: Add sdm845 implementation hook Vivek Gautam
  2019-09-06  6:32   ` Vivek Gautam
  2019-09-06 15:07   ` Stephen Boyd
@ 2019-09-10 13:26   ` Robin Murphy
  2019-09-16  9:46     ` Sai Prakash Ranjan
  2 siblings, 1 reply; 11+ messages in thread
From: Robin Murphy @ 2019-09-10 13:26 UTC (permalink / raw)
  To: Vivek Gautam, joro, agross, will.deacon, iommu
  Cc: bjorn.andersson, linux-arm-msm, linux-kernel

On 23/08/2019 07:32, Vivek Gautam wrote:
> Add reset hook for sdm845 based platforms to turn off
> the wait-for-safe sequence.
> 
> Understanding how wait-for-safe logic affects USB and UFS performance
> on MTP845 and DB845 boards:
> 
> Qcom's implementation of arm,mmu-500 adds a WAIT-FOR-SAFE logic
> to address under-performance issues in real-time clients, such as
> Display, and Camera.
> On receiving an invalidation requests, the SMMU forwards SAFE request
> to these clients and waits for SAFE ack signal from real-time clients.
> The SAFE signal from such clients is used to qualify the start of
> invalidation.
> This logic is controlled by chicken bits, one for each - MDP (display),
> IFE0, and IFE1 (camera), that can be accessed only from secure software
> on sdm845.
> 
> This configuration, however, degrades the performance of non-real time
> clients, such as USB, and UFS etc. This happens because, with wait-for-safe
> logic enabled the hardware tries to throttle non-real time clients while
> waiting for SAFE ack signals from real-time clients.
> 
> On mtp845 and db845 devices, with wait-for-safe logic enabled by the
> bootloaders we see degraded performance of USB and UFS when kernel
> enables the smmu stage-1 translations for these clients.
> Turn off this wait-for-safe logic from the kernel gets us back the perf
> of USB and UFS devices until we re-visit this when we start seeing perf
> issues on display/camera on upstream supported SDM845 platforms.
> The bootloaders on these boards implement secure monitor callbacks to
> handle a specific command - QCOM_SCM_SVC_SMMU_PROGRAM with which the
> logic can be toggled.
> 
> There are other boards such as cheza whose bootloaders don't enable this
> logic. Such boards don't implement callbacks to handle the specific SCM
> call so disabling this logic for such boards will be a no-op.
> 
> This change is inspired by the downstream change from Patrick Daly
> to address performance issues with display and camera by handling
> this wait-for-safe within separte io-pagetable ops to do TLB
> maintenance. So a big thanks to him for the change and for all the
> offline discussions.
> 
> Without this change the UFS reads are pretty slow:
> $ time dd if=/dev/sda of=/dev/zero bs=1048576 count=10 conv=sync
> 10+0 records in
> 10+0 records out
> 10485760 bytes (10.0MB) copied, 22.394903 seconds, 457.2KB/s
> real    0m 22.39s
> user    0m 0.00s
> sys     0m 0.01s
> 
> With this change they are back to rock!
> $ time dd if=/dev/sda of=/dev/zero bs=1048576 count=300 conv=sync
> 300+0 records in
> 300+0 records out
> 314572800 bytes (300.0MB) copied, 1.030541 seconds, 291.1MB/s
> real    0m 1.03s
> user    0m 0.00s
> sys     0m 0.54s
> 
> Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
> ---
>   drivers/iommu/arm-smmu-impl.c | 27 ++++++++++++++++++++++++++-

I'd be inclined to introduce the inevitable arm-smmu-qcom.c from the 
start, and save worrying about moving this out later. Other than that, 
though, the general self-contained shape of it all is every bit as 
beautiful as I'd hoped :D

Robin.

>   1 file changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
> index 3f88cd078dd5..0aef87c41f9c 100644
> --- a/drivers/iommu/arm-smmu-impl.c
> +++ b/drivers/iommu/arm-smmu-impl.c
> @@ -6,6 +6,7 @@
>   
>   #include <linux/bitfield.h>
>   #include <linux/of.h>
> +#include <linux/qcom_scm.h>
>   
>   #include "arm-smmu.h"
>   
> @@ -102,7 +103,6 @@ static struct arm_smmu_device *cavium_smmu_impl_init(struct arm_smmu_device *smm
>   	return &cs->smmu;
>   }
>   
> -
>   #define ARM_MMU500_ACTLR_CPRE		(1 << 1)
>   
>   #define ARM_MMU500_ACR_CACHE_LOCK	(1 << 26)
> @@ -147,6 +147,28 @@ static const struct arm_smmu_impl arm_mmu500_impl = {
>   	.reset = arm_mmu500_reset,
>   };
>   
> +static int qcom_sdm845_smmu500_reset(struct arm_smmu_device *smmu)
> +{
> +	int ret;
> +
> +	arm_mmu500_reset(smmu);
> +
> +	/*
> +	 * To address performance degradation in non-real time clients,
> +	 * such as USB and UFS, turn off wait-for-safe on sdm845 based boards,
> +	 * such as MTP and db845, whose firmwares implement secure monitor
> +	 * call handlers to turn on/off the wait-for-safe logic.
> +	 */
> +	ret = qcom_scm_qsmmu500_wait_safe_toggle(0);
> +	if (ret)
> +		dev_warn(smmu->dev, "Failed to turn off SAFE logic\n");
> +
> +	return 0;
> +}
> +
> +const struct arm_smmu_impl qcom_sdm845_smmu500_impl = {
> +	.reset = qcom_sdm845_smmu500_reset,
> +};
>   
>   struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu)
>   {
> @@ -170,5 +192,8 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu)
>   				  "calxeda,smmu-secure-config-access"))
>   		smmu->impl = &calxeda_impl;
>   
> +	if (of_device_is_compatible(smmu->dev->of_node, "qcom,sdm845-smmu-500"))
> +		smmu->impl = &qcom_sdm845_smmu500_impl;
> +
>   	return smmu;
>   }
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 3/3] iommu: arm-smmu-impl: Add sdm845 implementation hook
  2019-09-10 13:26   ` Robin Murphy
@ 2019-09-16  9:46     ` Sai Prakash Ranjan
  0 siblings, 0 replies; 11+ messages in thread
From: Sai Prakash Ranjan @ 2019-09-16  9:46 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Vivek Gautam, joro, agross, will.deacon, iommu, bjorn.andersson,
	linux-arm-msm, linux-kernel, linux-arm-msm-owner

Hi Robin,

On 2019-09-10 18:56, Robin Murphy wrote:
> On 23/08/2019 07:32, Vivek Gautam wrote:
>> Add reset hook for sdm845 based platforms to turn off
>> the wait-for-safe sequence.
>> 
>> Understanding how wait-for-safe logic affects USB and UFS performance
>> on MTP845 and DB845 boards:
>> 
>> Qcom's implementation of arm,mmu-500 adds a WAIT-FOR-SAFE logic
>> to address under-performance issues in real-time clients, such as
>> Display, and Camera.
>> On receiving an invalidation requests, the SMMU forwards SAFE request
>> to these clients and waits for SAFE ack signal from real-time clients.
>> The SAFE signal from such clients is used to qualify the start of
>> invalidation.
>> This logic is controlled by chicken bits, one for each - MDP 
>> (display),
>> IFE0, and IFE1 (camera), that can be accessed only from secure 
>> software
>> on sdm845.
>> 
>> This configuration, however, degrades the performance of non-real time
>> clients, such as USB, and UFS etc. This happens because, with 
>> wait-for-safe
>> logic enabled the hardware tries to throttle non-real time clients 
>> while
>> waiting for SAFE ack signals from real-time clients.
>> 
>> On mtp845 and db845 devices, with wait-for-safe logic enabled by the
>> bootloaders we see degraded performance of USB and UFS when kernel
>> enables the smmu stage-1 translations for these clients.
>> Turn off this wait-for-safe logic from the kernel gets us back the 
>> perf
>> of USB and UFS devices until we re-visit this when we start seeing 
>> perf
>> issues on display/camera on upstream supported SDM845 platforms.
>> The bootloaders on these boards implement secure monitor callbacks to
>> handle a specific command - QCOM_SCM_SVC_SMMU_PROGRAM with which the
>> logic can be toggled.
>> 
>> There are other boards such as cheza whose bootloaders don't enable 
>> this
>> logic. Such boards don't implement callbacks to handle the specific 
>> SCM
>> call so disabling this logic for such boards will be a no-op.
>> 
>> This change is inspired by the downstream change from Patrick Daly
>> to address performance issues with display and camera by handling
>> this wait-for-safe within separte io-pagetable ops to do TLB
>> maintenance. So a big thanks to him for the change and for all the
>> offline discussions.
>> 
>> Without this change the UFS reads are pretty slow:
>> $ time dd if=/dev/sda of=/dev/zero bs=1048576 count=10 conv=sync
>> 10+0 records in
>> 10+0 records out
>> 10485760 bytes (10.0MB) copied, 22.394903 seconds, 457.2KB/s
>> real    0m 22.39s
>> user    0m 0.00s
>> sys     0m 0.01s
>> 
>> With this change they are back to rock!
>> $ time dd if=/dev/sda of=/dev/zero bs=1048576 count=300 conv=sync
>> 300+0 records in
>> 300+0 records out
>> 314572800 bytes (300.0MB) copied, 1.030541 seconds, 291.1MB/s
>> real    0m 1.03s
>> user    0m 0.00s
>> sys     0m 0.54s
>> 
>> Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
>> ---
>>   drivers/iommu/arm-smmu-impl.c | 27 ++++++++++++++++++++++++++-
> 
> I'd be inclined to introduce the inevitable arm-smmu-qcom.c from the
> start, and save worrying about moving this out later. Other than that,
> though, the general self-contained shape of it all is every bit as
> beautiful as I'd hoped :D
> 

Have posted v5 with your suggestion.

Thanks,
Sai

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member
of Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 3/3] iommu: arm-smmu-impl: Add sdm845 implementation hook
  2019-09-06 15:07   ` Stephen Boyd
@ 2019-09-16  9:47     ` Sai Prakash Ranjan
  0 siblings, 0 replies; 11+ messages in thread
From: Sai Prakash Ranjan @ 2019-09-16  9:47 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Vivek Gautam, agross, iommu, joro, robin.murphy, will.deacon,
	bjorn.andersson, linux-arm-msm, linux-kernel,
	linux-arm-msm-owner

Hi Stephen,

On 2019-09-06 20:37, Stephen Boyd wrote:
> Quoting Vivek Gautam (2019-08-22 23:32:48)
>> diff --git a/drivers/iommu/arm-smmu-impl.c 
>> b/drivers/iommu/arm-smmu-impl.c
>> index 3f88cd078dd5..0aef87c41f9c 100644
>> --- a/drivers/iommu/arm-smmu-impl.c
>> +++ b/drivers/iommu/arm-smmu-impl.c
>> @@ -102,7 +103,6 @@ static struct arm_smmu_device 
>> *cavium_smmu_impl_init(struct arm_smmu_device *smm
>>         return &cs->smmu;
>>  }
>> 
>> -
>>  #define ARM_MMU500_ACTLR_CPRE          (1 << 1)
>> 
>>  #define ARM_MMU500_ACR_CACHE_LOCK      (1 << 26)
> 
> Drop this hunk?
> 
>> @@ -147,6 +147,28 @@ static const struct arm_smmu_impl arm_mmu500_impl 
>> = {
>>         .reset = arm_mmu500_reset,
>>  };
>> 
>> +static int qcom_sdm845_smmu500_reset(struct arm_smmu_device *smmu)
>> +{
>> +       int ret;
>> +
>> +       arm_mmu500_reset(smmu);
>> +
>> +       /*
>> +        * To address performance degradation in non-real time 
>> clients,
>> +        * such as USB and UFS, turn off wait-for-safe on sdm845 based 
>> boards,
>> +        * such as MTP and db845, whose firmwares implement secure 
>> monitor
>> +        * call handlers to turn on/off the wait-for-safe logic.
>> +        */
>> +       ret = qcom_scm_qsmmu500_wait_safe_toggle(0);
>> +       if (ret)
>> +               dev_warn(smmu->dev, "Failed to turn off SAFE 
>> logic\n");
>> +
>> +       return 0;
> 
> return ret? Or intentionally don't return an error for failure?
> 
>> +}
>> +
>> +const struct arm_smmu_impl qcom_sdm845_smmu500_impl = {
> 
> static?
> 
>> +       .reset = qcom_sdm845_smmu500_reset,
>> +};
>> 
>>  struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device 
>> *smmu)
>>  {

Have addressed all your comments in v5.

Thanks,
Sai

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member
of Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 1/3] firmware: qcom_scm-64: Add atomic version of qcom_scm_call
  2019-09-06 15:12   ` Stephen Boyd
@ 2019-09-16  9:48     ` Sai Prakash Ranjan
  0 siblings, 0 replies; 11+ messages in thread
From: Sai Prakash Ranjan @ 2019-09-16  9:48 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Vivek Gautam, agross, iommu, joro, robin.murphy, will.deacon,
	bjorn.andersson, linux-arm-msm, linux-kernel,
	linux-arm-msm-owner

Hi Stephen,

On 2019-09-06 20:42, Stephen Boyd wrote:
> Quoting Vivek Gautam (2019-08-22 23:32:46)
>> There are scnenarios where drivers are required to make a
>> scm call in atomic context, such as in one of the qcom's
>> arm-smmu-500 errata [1].
>> 
>> [1] ("https://source.codeaurora.org/quic/la/kernel/msm-4.9/
>>       tree/drivers/iommu/arm-smmu.c?h=msm-4.9#n4842")
>> 
>> Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
>> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
>> ---
>>  drivers/firmware/qcom_scm-64.c | 136 
>> ++++++++++++++++++++++++++++-------------
>>  1 file changed, 92 insertions(+), 44 deletions(-)
>> 
>> diff --git a/drivers/firmware/qcom_scm-64.c 
>> b/drivers/firmware/qcom_scm-64.c
>> index 91d5ad7cf58b..b6dca32c5ac4 100644
>> --- a/drivers/firmware/qcom_scm-64.c
>> +++ b/drivers/firmware/qcom_scm-64.c
>> @@ -62,32 +62,71 @@ static DEFINE_MUTEX(qcom_scm_lock);
>>  #define FIRST_EXT_ARG_IDX 3
>>  #define N_REGISTER_ARGS (MAX_QCOM_SCM_ARGS - N_EXT_QCOM_SCM_ARGS + 1)
>> 
>> -/**
>> - * qcom_scm_call() - Invoke a syscall in the secure world
>> - * @dev:       device
>> - * @svc_id:    service identifier
>> - * @cmd_id:    command identifier
>> - * @desc:      Descriptor structure containing arguments and return 
>> values
>> - *
>> - * Sends a command to the SCM and waits for the command to finish 
>> processing.
>> - * This should *only* be called in pre-emptible context.
>> -*/
>> -static int qcom_scm_call(struct device *dev, u32 svc_id, u32 cmd_id,
>> -                        const struct qcom_scm_desc *desc,
>> -                        struct arm_smccc_res *res)
>> +static void __qcom_scm_call_do(const struct qcom_scm_desc *desc,
>> +                              struct arm_smccc_res *res, u32 fn_id,
>> +                              u64 x5, u32 type)
>> +{
>> +       u64 cmd;
>> +       struct arm_smccc_quirk quirk = {.id = 
>> ARM_SMCCC_QUIRK_QCOM_A6};
> 
> Nitpick: Put spaces around braces please.
> 
>> +
>> +       cmd = ARM_SMCCC_CALL_VAL(type, qcom_smccc_convention,
>> +                                ARM_SMCCC_OWNER_SIP, fn_id);
>> +
>> +       quirk.state.a6 = 0;
>> +
>> +       do {
>> +               arm_smccc_smc_quirk(cmd, desc->arginfo, desc->args[0],
>> +                                   desc->args[1], desc->args[2], x5,
>> +                                   quirk.state.a6, 0, res, &quirk);
>> +
>> +               if (res->a0 == QCOM_SCM_INTERRUPTED)
>> +                       cmd = res->a0;
>> +
>> +       } while (res->a0 == QCOM_SCM_INTERRUPTED);
>> +}
>> +
>> +static void qcom_scm_call_do(const struct qcom_scm_desc *desc,
>> +                            struct arm_smccc_res *res, u32 fn_id,
>> +                            u64 x5, bool atomic)
>> +{
>> +       int retry_count = 0;
>> +
>> +       if (!atomic) {
>> +               do {
>> +                       mutex_lock(&qcom_scm_lock);
>> +
>> +                       __qcom_scm_call_do(desc, res, fn_id, x5,
>> +                                          ARM_SMCCC_STD_CALL);
>> +
>> +                       mutex_unlock(&qcom_scm_lock);
>> +
>> +                       if (res->a0 == QCOM_SCM_V2_EBUSY) {
>> +                               if (retry_count++ > 
>> QCOM_SCM_EBUSY_MAX_RETRY)
>> +                                       break;
>> +                               msleep(QCOM_SCM_EBUSY_WAIT_MS);
>> +                       }
>> +               }  while (res->a0 == QCOM_SCM_V2_EBUSY);
>> +       } else {
>> +               __qcom_scm_call_do(desc, res, fn_id, x5, 
>> ARM_SMCCC_FAST_CALL);
>> +       }
> 
> To save on some indentation maybe you could write it like:
> 
> 	if (atomic) {
> 		__qcom_scm_call_do(..)
> 		return;
> 	}
> 
> 	do {
> 		mutex_lock(..)
> 		...
> 	} while (..);
> 
>> +}
>> +
>> +static int ___qcom_scm_call(struct device *dev, u32 svc_id, u32 
>> cmd_id,
>> +                           const struct qcom_scm_desc *desc,
>> +                           struct arm_smccc_res *res, bool atomic)
>>  {
>>         int arglen = desc->arginfo & 0xf;
>> -       int retry_count = 0, i;
>> +       int i;
>>         u32 fn_id = QCOM_SCM_FNID(svc_id, cmd_id);
>> -       u64 cmd, x5 = desc->args[FIRST_EXT_ARG_IDX];
>> +       u64 x5 = desc->args[FIRST_EXT_ARG_IDX];
>>         dma_addr_t args_phys = 0;
>>         void *args_virt = NULL;
>>         size_t alloc_len;
>> -       struct arm_smccc_quirk quirk = {.id = 
>> ARM_SMCCC_QUIRK_QCOM_A6};
>> +       gfp_t flag = atomic ? GFP_ATOMIC : GFP_KERNEL;
>> 
>>         if (unlikely(arglen > N_REGISTER_ARGS)) {
>>                 alloc_len = N_EXT_QCOM_SCM_ARGS * sizeof(u64);
>> -               args_virt = kzalloc(PAGE_ALIGN(alloc_len), 
>> GFP_KERNEL);
>> +               args_virt = kzalloc(PAGE_ALIGN(alloc_len), flag);
>> 
>>                 if (!args_virt)
>>                         return -ENOMEM;
>> @@ -156,6 +169,41 @@ static int qcom_scm_call(struct device *dev, u32 
>> svc_id, u32 cmd_id,
>>         return 0;
>>  }
>> 
>> +/**
>> + * qcom_scm_call() - Invoke a syscall in the secure world
>> + * @dev:       device
>> + * @svc_id:    service identifier
>> + * @cmd_id:    command identifier
>> + * @desc:      Descriptor structure containing arguments and return 
>> values
>> + *
>> + * Sends a command to the SCM and waits for the command to finish 
>> processing.
>> + * This should *only* be called in pre-emptible context.
> 
> Add a might_sleep() then?
> 
>> + */
>> +static int qcom_scm_call(struct device *dev, u32 svc_id, u32 cmd_id,
>> +                        const struct qcom_scm_desc *desc,
>> +                        struct arm_smccc_res *res)
>> +{
>> +       return ___qcom_scm_call(dev, svc_id, cmd_id, desc, res, 
>> false);
>> +}
>> +
>> +/**
>> + * qcom_scm_call_atomic() - atomic variation of qcom_scm_call()
>> + * @dev:       device
>> + * @svc_id:    service identifier
>> + * @cmd_id:    command identifier
>> + * @desc:      Descriptor structure containing arguments and return 
>> values
>> + * @res:       Structure containing results from SMC/HVC call
>> + *
>> + * Sends a command to the SCM and waits for the command to finish 
>> processing.
>> + * This should be called in atomic context only.
> 
> Maybe add a cant_sleep()?
> 
>> + */
>> +static int qcom_scm_call_atomic(struct device *dev, u32 svc_id, u32 
>> cmd_id,
>> +                               const struct qcom_scm_desc *desc,
>> +                               struct arm_smccc_res *res)
>> +{
>> +       return ___qcom_scm_call(dev, svc_id, cmd_id, desc, res, true);
>> +}
>> +

Have addressed all your comments in v5.

Thanks,
Sai

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member
of Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, back to index

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-23  6:32 [PATCH v4 0/3] Qcom smmu-500 wait-for-safe handling for sdm845 Vivek Gautam
2019-08-23  6:32 ` [PATCH v4 1/3] firmware: qcom_scm-64: Add atomic version of qcom_scm_call Vivek Gautam
2019-09-06 15:12   ` Stephen Boyd
2019-09-16  9:48     ` Sai Prakash Ranjan
2019-08-23  6:32 ` [PATCH v4 2/3] firmware/qcom_scm: Add scm call to handle smmu errata Vivek Gautam
2019-08-23  6:32 ` [PATCH v4 3/3] iommu: arm-smmu-impl: Add sdm845 implementation hook Vivek Gautam
2019-09-06  6:32   ` Vivek Gautam
2019-09-06 15:07   ` Stephen Boyd
2019-09-16  9:47     ` Sai Prakash Ranjan
2019-09-10 13:26   ` Robin Murphy
2019-09-16  9:46     ` Sai Prakash Ranjan

Linux-ARM-MSM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-arm-msm/0 linux-arm-msm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-arm-msm linux-arm-msm/ https://lore.kernel.org/linux-arm-msm \
		linux-arm-msm@vger.kernel.org
	public-inbox-index linux-arm-msm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-arm-msm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git