linux-arm-msm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/3] iommu/arm-smmu: adreno-smmu page fault handling
@ 2020-11-24 19:15 Jordan Crouse
  2020-11-24 19:15 ` [PATCH v2 1/3] iommu/arm-smmu: Add support for driver IOMMU fault handlers Jordan Crouse
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Jordan Crouse @ 2020-11-24 19:15 UTC (permalink / raw)
  To: linux-arm-msm
  Cc: Robin Murphy, iommu, Will Deacon, Akhil P Oommen,
	Bjorn Andersson, Daniel Vetter, David Airlie, Emil Velikov,
	Eric Anholt, Greg Kroah-Hartman, Gustavo A. R. Silva,
	Joerg Roedel, Jonathan Marek, Krishna Reddy, Rob Clark,
	Rob Clark, Sai Prakash Ranjan, Sean Paul, Sharat Masetty,
	Thomas Zimmermann, dri-devel, freedreno, linux-arm-kernel,
	linux-kernel

This is a stack to add an Adreno GPU specific handler for pagefaults. The first
patch starts by wiring up report_iommu_fault for arm-smmu. The next patch adds
a adreno-smmu-priv function hook to capture a handful of important debugging
registers such as TTBR0, CONTEXTIDR, FSYNR0 and others. This is used by the
third patch to print more detailed information on page fault such as the TTBR0
for the pagetable that caused the fault and the source of the fault as
determined by a combination of the FSYNR1 register and an internal GPU
register.

This code provides a solid base that we can expand on later for even more
extensive GPU side page fault debugging capabilities.

v2: Fix comment wording and function pointer check per Rob Clark

Jordan Crouse (3):
  iommu/arm-smmu: Add support for driver IOMMU fault handlers
  drm/msm: Add an adreno-smmu-priv callback to get pagefault info
  drm/msm: Improve the a6xx page fault handler

 drivers/gpu/drm/msm/adreno/a5xx_gpu.c      |  4 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c      | 76 +++++++++++++++++++++-
 drivers/gpu/drm/msm/msm_iommu.c            | 11 +++-
 drivers/gpu/drm/msm/msm_mmu.h              |  4 +-
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 19 ++++++
 drivers/iommu/arm/arm-smmu/arm-smmu.c      | 16 ++++-
 drivers/iommu/arm/arm-smmu/arm-smmu.h      |  2 +
 include/linux/adreno-smmu-priv.h           | 31 ++++++++-
 8 files changed, 151 insertions(+), 12 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 1/3] iommu/arm-smmu: Add support for driver IOMMU fault handlers
  2020-11-24 19:15 [PATCH v2 0/3] iommu/arm-smmu: adreno-smmu page fault handling Jordan Crouse
@ 2020-11-24 19:15 ` Jordan Crouse
  2021-01-22 12:41   ` Will Deacon
  2020-11-24 19:15 ` [PATCH v2 2/3] drm/msm: Add an adreno-smmu-priv callback to get pagefault info Jordan Crouse
  2020-11-24 19:16 ` [PATCH v2 3/3] drm/msm: Improve the a6xx page fault handler Jordan Crouse
  2 siblings, 1 reply; 10+ messages in thread
From: Jordan Crouse @ 2020-11-24 19:15 UTC (permalink / raw)
  To: linux-arm-msm
  Cc: Robin Murphy, iommu, Will Deacon, Greg Kroah-Hartman,
	Joerg Roedel, Krishna Reddy, linux-arm-kernel, linux-kernel

Call report_iommu_fault() to allow upper-level drivers to register their
own fault handlers.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---

 drivers/iommu/arm/arm-smmu/arm-smmu.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 0f28a8614da3..7fd18bbda8f5 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -427,6 +427,7 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	int idx = smmu_domain->cfg.cbndx;
+	int ret;
 
 	fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
 	if (!(fsr & ARM_SMMU_FSR_FAULT))
@@ -436,11 +437,20 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
 	iova = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_FAR);
 	cbfrsynra = arm_smmu_gr1_read(smmu, ARM_SMMU_GR1_CBFRSYNRA(idx));
 
-	dev_err_ratelimited(smmu->dev,
-	"Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
+	ret = report_iommu_fault(domain, dev, iova,
+		fsynr & ARM_SMMU_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
+
+	if (ret == -ENOSYS)
+		dev_err_ratelimited(smmu->dev,
+		"Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
 			    fsr, iova, fsynr, cbfrsynra, idx);
 
-	arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr);
+	/*
+	 * If the iommu fault returns an error (except -ENOSYS) then assume that
+	 * they will handle resuming on their own
+	 */
+	if (!ret || ret == -ENOSYS)
+		arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr);
 	return IRQ_HANDLED;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 2/3] drm/msm: Add an adreno-smmu-priv callback to get pagefault info
  2020-11-24 19:15 [PATCH v2 0/3] iommu/arm-smmu: adreno-smmu page fault handling Jordan Crouse
  2020-11-24 19:15 ` [PATCH v2 1/3] iommu/arm-smmu: Add support for driver IOMMU fault handlers Jordan Crouse
@ 2020-11-24 19:15 ` Jordan Crouse
  2020-11-24 19:16 ` [PATCH v2 3/3] drm/msm: Improve the a6xx page fault handler Jordan Crouse
  2 siblings, 0 replies; 10+ messages in thread
From: Jordan Crouse @ 2020-11-24 19:15 UTC (permalink / raw)
  To: linux-arm-msm
  Cc: Robin Murphy, iommu, Will Deacon, Bjorn Andersson, Joerg Roedel,
	Krishna Reddy, Rob Clark, Sai Prakash Ranjan, linux-arm-kernel,
	linux-kernel

Add a callback in adreno-smmu-priv to read interesting SMMU
registers to provide an opportunity for a richer debug experience
in the GPU driver.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---

 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 19 +++++++++++++
 drivers/iommu/arm/arm-smmu/arm-smmu.h      |  2 ++
 include/linux/adreno-smmu-priv.h           | 31 +++++++++++++++++++++-
 3 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index d0636c803a36..367a267324a2 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -32,6 +32,24 @@ static void qcom_adreno_smmu_write_sctlr(struct arm_smmu_device *smmu, int idx,
 	arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, reg);
 }
 
+static void qcom_adreno_smmu_get_fault_info(const void *cookie,
+		struct adreno_smmu_fault_info *info)
+{
+	struct arm_smmu_domain *smmu_domain = (void *)cookie;
+	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+	info->fsr = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_FSR);
+	/* FIXME: return error here if we aren't really in a fault? */
+
+	info->fsynr0 = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_FSYNR0);
+	info->fsynr1 = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_FSYNR1);
+	info->far = arm_smmu_cb_readq(smmu, cfg->cbndx, ARM_SMMU_CB_FAR);
+	info->cbfrsynra = arm_smmu_gr1_read(smmu, ARM_SMMU_GR1_CBFRSYNRA(cfg->cbndx));
+	info->ttbr0 = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_TTBR0);
+	info->contextidr = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_CONTEXTIDR);
+}
+
 #define QCOM_ADRENO_SMMU_GPU_SID 0
 
 static bool qcom_adreno_smmu_is_gpu_device(struct device *dev)
@@ -156,6 +174,7 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
 	priv->cookie = smmu_domain;
 	priv->get_ttbr1_cfg = qcom_adreno_smmu_get_ttbr1_cfg;
 	priv->set_ttbr0_cfg = qcom_adreno_smmu_set_ttbr0_cfg;
+	priv->get_fault_info = qcom_adreno_smmu_get_fault_info;
 
 	return 0;
 }
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index 04288b6fc619..fe511540a6bf 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -224,6 +224,8 @@ enum arm_smmu_cbar_type {
 #define ARM_SMMU_CB_FSYNR0		0x68
 #define ARM_SMMU_FSYNR0_WNR		BIT(4)
 
+#define ARM_SMMU_CB_FSYNR1		0x6c
+
 #define ARM_SMMU_CB_S1_TLBIVA		0x600
 #define ARM_SMMU_CB_S1_TLBIASID		0x610
 #define ARM_SMMU_CB_S1_TLBIVAL		0x620
diff --git a/include/linux/adreno-smmu-priv.h b/include/linux/adreno-smmu-priv.h
index a889f28afb42..53fe32fb9214 100644
--- a/include/linux/adreno-smmu-priv.h
+++ b/include/linux/adreno-smmu-priv.h
@@ -8,6 +8,32 @@
 
 #include <linux/io-pgtable.h>
 
+/**
+ * struct adreno_smmu_fault_info - container for key fault information
+ *
+ * @far: The faulting IOVA from ARM_SMMU_CB_FAR
+ * @ttbr0: The current TTBR0 pagetable from ARM_SMMU_CB_TTBR0
+ * @contextidr: The value of ARM_SMMU_CB_CONTEXTIDR
+ * @fsr: The fault status from ARM_SMMU_CB_FSR
+ * @fsynr0: The value of FSYNR0 from ARM_SMMU_CB_FSYNR0
+ * @fsynr1: The value of FSYNR1 from ARM_SMMU_CB_FSYNR0
+ * @cbfrsynra: The value of CBFRSYNRA from ARM_SMMU_GR1_CBFRSYNRA(idx)
+ *
+ * This struct passes back key page fault information to the GPU driver
+ * through the get_fault_info function pointer.
+ * The GPU driver can use this information to print informative
+ * log messages and provide deeper GPU specific insight into the fault.
+ */
+struct adreno_smmu_fault_info {
+	u64 far;
+	u64 ttbr0;
+	u32 contextidr;
+	u32 fsr;
+	u32 fsynr0;
+	u32 fsynr1;
+	u32 cbfrsynra;
+};
+
 /**
  * struct adreno_smmu_priv - private interface between adreno-smmu and GPU
  *
@@ -17,6 +43,8 @@
  * @set_ttbr0_cfg: Set the TTBR0 config for the GPUs context bank.  A
  *                 NULL config disables TTBR0 translation, otherwise
  *                 TTBR0 translation is enabled with the specified cfg
+ * @get_fault_info: Called by the GPU fault handler to get information about
+ *                  the fault
  *
  * The GPU driver (drm/msm) and adreno-smmu work together for controlling
  * the GPU's SMMU instance.  This is by necessity, as the GPU is directly
@@ -31,6 +59,7 @@ struct adreno_smmu_priv {
     const void *cookie;
     const struct io_pgtable_cfg *(*get_ttbr1_cfg)(const void *cookie);
     int (*set_ttbr0_cfg)(const void *cookie, const struct io_pgtable_cfg *cfg);
+    void (*get_fault_info)(const void *cookie, struct adreno_smmu_fault_info *info);
 };
 
-#endif /* __ADRENO_SMMU_PRIV_H */
\ No newline at end of file
+#endif /* __ADRENO_SMMU_PRIV_H */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 3/3] drm/msm: Improve the a6xx page fault handler
  2020-11-24 19:15 [PATCH v2 0/3] iommu/arm-smmu: adreno-smmu page fault handling Jordan Crouse
  2020-11-24 19:15 ` [PATCH v2 1/3] iommu/arm-smmu: Add support for driver IOMMU fault handlers Jordan Crouse
  2020-11-24 19:15 ` [PATCH v2 2/3] drm/msm: Add an adreno-smmu-priv callback to get pagefault info Jordan Crouse
@ 2020-11-24 19:16 ` Jordan Crouse
  2 siblings, 0 replies; 10+ messages in thread
From: Jordan Crouse @ 2020-11-24 19:16 UTC (permalink / raw)
  To: linux-arm-msm
  Cc: Robin Murphy, iommu, Will Deacon, Akhil P Oommen, Daniel Vetter,
	David Airlie, Emil Velikov, Eric Anholt, Gustavo A. R. Silva,
	Jonathan Marek, Rob Clark, Sean Paul, Sharat Masetty,
	Thomas Zimmermann, dri-devel, freedreno, linux-kernel

Use the new adreno-smmu-priv fault info function to get more SMMU
debug registers and print the current TTBR0 to debug per-instance
pagetables and figure out which GPU block generated the request.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---

 drivers/gpu/drm/msm/adreno/a5xx_gpu.c |  4 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 76 +++++++++++++++++++++++++--
 drivers/gpu/drm/msm/msm_iommu.c       | 11 +++-
 drivers/gpu/drm/msm/msm_mmu.h         |  4 +-
 4 files changed, 87 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index d6804a802355..ed4cb81af874 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -933,7 +933,7 @@ bool a5xx_idle(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
 	return true;
 }
 
-static int a5xx_fault_handler(void *arg, unsigned long iova, int flags)
+static int a5xx_fault_handler(void *arg, unsigned long iova, int flags, void *data)
 {
 	struct msm_gpu *gpu = arg;
 	pr_warn_ratelimited("*** gpu fault: iova=%08lx, flags=%d (%u,%u,%u,%u)\n",
@@ -943,7 +943,7 @@ static int a5xx_fault_handler(void *arg, unsigned long iova, int flags)
 			gpu_read(gpu, REG_A5XX_CP_SCRATCH_REG(6)),
 			gpu_read(gpu, REG_A5XX_CP_SCRATCH_REG(7)));
 
-	return -EFAULT;
+	return 0;
 }
 
 static void a5xx_cp_err_irq(struct msm_gpu *gpu)
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 948f3656c20c..ac6e8cd5cf1a 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -905,18 +905,88 @@ static void a6xx_recover(struct msm_gpu *gpu)
 	msm_gpu_hw_init(gpu);
 }
 
-static int a6xx_fault_handler(void *arg, unsigned long iova, int flags)
+static const char *a6xx_uche_fault_block(struct msm_gpu *gpu, u32 mid)
+{
+	static const char *uche_clients[7] = {
+		"VFD", "SP", "VSC", "VPC", "HLSQ", "PC", "LRZ",
+	};
+	u32 val;
+
+	if (mid < 1 || mid > 3)
+		return "UNKNOWN";
+
+	/*
+	 * The source of the data depends on the mid ID read from FSYNR1.
+	 * and the client ID read from the UCHE block
+	 */
+	val = gpu_read(gpu, REG_A6XX_UCHE_CLIENT_PF);
+
+	/* mid = 3 is most precise and refers to only one block per client */
+	if (mid == 3)
+		return uche_clients[val & 7];
+
+	/* For mid=2 the source is TP or VFD except when the client id is 0 */
+	if (mid == 2)
+		return ((val & 7) == 0) ? "TP" : "TP|VFD";
+
+	/* For mid=1 just return "UCHE" as a catchall for everything else */
+	return "UCHE";
+}
+
+static const char *a6xx_fault_block(struct msm_gpu *gpu, u32 id)
+{
+	if (id == 0)
+		return "CP";
+	else if (id == 4)
+		return "CCU";
+	else if (id == 6)
+		return "CDP Prefetch";
+
+	return a6xx_uche_fault_block(gpu, id);
+}
+
+#define ARM_SMMU_FSR_TF                 BIT(1)
+#define ARM_SMMU_FSR_PF			BIT(3)
+#define ARM_SMMU_FSR_EF			BIT(4)
+
+static int a6xx_fault_handler(void *arg, unsigned long iova, int flags, void *data)
 {
 	struct msm_gpu *gpu = arg;
+	struct adreno_smmu_fault_info *info = data;
+	const char *type = "UNKNOWN";
 
-	pr_warn_ratelimited("*** gpu fault: iova=%08lx, flags=%d (%u,%u,%u,%u)\n",
+	/*
+	 * Print a default message if we couldn't get the data from the
+	 * adreno-smmu-priv
+	 */
+	if (!info) {
+		pr_warn_ratelimited("*** gpu fault: iova=%.16lx flags=%d (%u,%u,%u,%u)\n",
 			iova, flags,
 			gpu_read(gpu, REG_A6XX_CP_SCRATCH_REG(4)),
 			gpu_read(gpu, REG_A6XX_CP_SCRATCH_REG(5)),
 			gpu_read(gpu, REG_A6XX_CP_SCRATCH_REG(6)),
 			gpu_read(gpu, REG_A6XX_CP_SCRATCH_REG(7)));
 
-	return -EFAULT;
+		return 0;
+	}
+
+	if (info->fsr & ARM_SMMU_FSR_TF)
+		type = "TRANSLATION";
+	else if (info->fsr & ARM_SMMU_FSR_PF)
+		type = "PERMISSION";
+	else if (info->fsr & ARM_SMMU_FSR_EF)
+		type = "EXTERNAL";
+
+	pr_warn_ratelimited("*** gpu fault: ttbr0=%.16llx iova=%.16lx dir=%s type=%s source=%s (%u,%u,%u,%u)\n",
+			info->ttbr0, iova,
+			flags & IOMMU_FAULT_WRITE ? "WRITE" : "READ", type,
+			a6xx_fault_block(gpu, info->fsynr1 & 0xff),
+			gpu_read(gpu, REG_A6XX_CP_SCRATCH_REG(4)),
+			gpu_read(gpu, REG_A6XX_CP_SCRATCH_REG(5)),
+			gpu_read(gpu, REG_A6XX_CP_SCRATCH_REG(6)),
+			gpu_read(gpu, REG_A6XX_CP_SCRATCH_REG(7)));
+
+	return 0;
 }
 
 static void a6xx_cp_hw_err_irq(struct msm_gpu *gpu)
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index 22ac7c692a81..73db30dfda0a 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -212,8 +212,17 @@ static int msm_fault_handler(struct iommu_domain *domain, struct device *dev,
 		unsigned long iova, int flags, void *arg)
 {
 	struct msm_iommu *iommu = arg;
+	struct adreno_smmu_priv *adreno_smmu = dev_get_drvdata(iommu->base.dev);
+	struct adreno_smmu_fault_info info, *ptr = NULL;
+
+	if (adreno_smmu->get_fault_info) {
+		adreno_smmu->get_fault_info(adreno_smmu->cookie, &info);
+		ptr = &info;
+	}
+
 	if (iommu->base.handler)
-		return iommu->base.handler(iommu->base.arg, iova, flags);
+		return iommu->base.handler(iommu->base.arg, iova, flags, ptr);
+
 	pr_warn_ratelimited("*** fault: iova=%16lx, flags=%d\n", iova, flags);
 	return 0;
 }
diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
index 61ade89d9e48..a88f44c3268d 100644
--- a/drivers/gpu/drm/msm/msm_mmu.h
+++ b/drivers/gpu/drm/msm/msm_mmu.h
@@ -26,7 +26,7 @@ enum msm_mmu_type {
 struct msm_mmu {
 	const struct msm_mmu_funcs *funcs;
 	struct device *dev;
-	int (*handler)(void *arg, unsigned long iova, int flags);
+	int (*handler)(void *arg, unsigned long iova, int flags, void *data);
 	void *arg;
 	enum msm_mmu_type type;
 };
@@ -43,7 +43,7 @@ struct msm_mmu *msm_iommu_new(struct device *dev, struct iommu_domain *domain);
 struct msm_mmu *msm_gpummu_new(struct device *dev, struct msm_gpu *gpu);
 
 static inline void msm_mmu_set_fault_handler(struct msm_mmu *mmu, void *arg,
-		int (*handler)(void *arg, unsigned long iova, int flags))
+		int (*handler)(void *arg, unsigned long iova, int flags, void *data))
 {
 	mmu->arg = arg;
 	mmu->handler = handler;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/3] iommu/arm-smmu: Add support for driver IOMMU fault handlers
  2020-11-24 19:15 ` [PATCH v2 1/3] iommu/arm-smmu: Add support for driver IOMMU fault handlers Jordan Crouse
@ 2021-01-22 12:41   ` Will Deacon
  2021-01-22 12:53     ` Robin Murphy
  0 siblings, 1 reply; 10+ messages in thread
From: Will Deacon @ 2021-01-22 12:41 UTC (permalink / raw)
  To: Jordan Crouse
  Cc: linux-arm-msm, Robin Murphy, iommu, Greg Kroah-Hartman,
	Joerg Roedel, Krishna Reddy, linux-arm-kernel, linux-kernel

On Tue, Nov 24, 2020 at 12:15:58PM -0700, Jordan Crouse wrote:
> Call report_iommu_fault() to allow upper-level drivers to register their
> own fault handlers.
> 
> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
> ---
> 
>  drivers/iommu/arm/arm-smmu/arm-smmu.c | 16 +++++++++++++---
>  1 file changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> index 0f28a8614da3..7fd18bbda8f5 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> @@ -427,6 +427,7 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
>  	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>  	struct arm_smmu_device *smmu = smmu_domain->smmu;
>  	int idx = smmu_domain->cfg.cbndx;
> +	int ret;
>  
>  	fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
>  	if (!(fsr & ARM_SMMU_FSR_FAULT))
> @@ -436,11 +437,20 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
>  	iova = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_FAR);
>  	cbfrsynra = arm_smmu_gr1_read(smmu, ARM_SMMU_GR1_CBFRSYNRA(idx));
>  
> -	dev_err_ratelimited(smmu->dev,
> -	"Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
> +	ret = report_iommu_fault(domain, dev, iova,
> +		fsynr & ARM_SMMU_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
> +
> +	if (ret == -ENOSYS)
> +		dev_err_ratelimited(smmu->dev,
> +		"Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
>  			    fsr, iova, fsynr, cbfrsynra, idx);
>  
> -	arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr);
> +	/*
> +	 * If the iommu fault returns an error (except -ENOSYS) then assume that
> +	 * they will handle resuming on their own
> +	 */
> +	if (!ret || ret == -ENOSYS)
> +		arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr);

Hmm, I don't grok this part. If the fault handler returned an error and
we don't clear the FSR, won't we just re-take the irq immediately? I think
it would be better to do this unconditionally, and print the "Unhandled
context fault" message for any non-zero value of ret.

Will

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/3] iommu/arm-smmu: Add support for driver IOMMU fault handlers
  2021-01-22 12:41   ` Will Deacon
@ 2021-01-22 12:53     ` Robin Murphy
  2021-01-25 21:51       ` Jordan Crouse
  0 siblings, 1 reply; 10+ messages in thread
From: Robin Murphy @ 2021-01-22 12:53 UTC (permalink / raw)
  To: Will Deacon, Jordan Crouse
  Cc: linux-arm-msm, iommu, Greg Kroah-Hartman, Joerg Roedel,
	Krishna Reddy, linux-arm-kernel, linux-kernel

On 2021-01-22 12:41, Will Deacon wrote:
> On Tue, Nov 24, 2020 at 12:15:58PM -0700, Jordan Crouse wrote:
>> Call report_iommu_fault() to allow upper-level drivers to register their
>> own fault handlers.
>>
>> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
>> ---
>>
>>   drivers/iommu/arm/arm-smmu/arm-smmu.c | 16 +++++++++++++---
>>   1 file changed, 13 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
>> index 0f28a8614da3..7fd18bbda8f5 100644
>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
>> @@ -427,6 +427,7 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
>>   	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>   	struct arm_smmu_device *smmu = smmu_domain->smmu;
>>   	int idx = smmu_domain->cfg.cbndx;
>> +	int ret;
>>   
>>   	fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
>>   	if (!(fsr & ARM_SMMU_FSR_FAULT))
>> @@ -436,11 +437,20 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
>>   	iova = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_FAR);
>>   	cbfrsynra = arm_smmu_gr1_read(smmu, ARM_SMMU_GR1_CBFRSYNRA(idx));
>>   
>> -	dev_err_ratelimited(smmu->dev,
>> -	"Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
>> +	ret = report_iommu_fault(domain, dev, iova,
>> +		fsynr & ARM_SMMU_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
>> +
>> +	if (ret == -ENOSYS)
>> +		dev_err_ratelimited(smmu->dev,
>> +		"Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
>>   			    fsr, iova, fsynr, cbfrsynra, idx);
>>   
>> -	arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr);
>> +	/*
>> +	 * If the iommu fault returns an error (except -ENOSYS) then assume that
>> +	 * they will handle resuming on their own
>> +	 */
>> +	if (!ret || ret == -ENOSYS)
>> +		arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr);
> 
> Hmm, I don't grok this part. If the fault handler returned an error and
> we don't clear the FSR, won't we just re-take the irq immediately?

If we don't touch the FSR at all, yes. Even if we clear the fault 
indicator bits, the interrupt *might* remain asserted until a stalled 
transaction is actually resolved - that's that lovely IMP-DEF corner.

Robin.

> I think
> it would be better to do this unconditionally, and print the "Unhandled
> context fault" message for any non-zero value of ret.
> 
> Will
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/3] iommu/arm-smmu: Add support for driver IOMMU fault handlers
  2021-01-22 12:53     ` Robin Murphy
@ 2021-01-25 21:51       ` Jordan Crouse
  2021-01-26 11:40         ` Robin Murphy
  0 siblings, 1 reply; 10+ messages in thread
From: Jordan Crouse @ 2021-01-25 21:51 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Will Deacon, linux-arm-msm, iommu, Greg Kroah-Hartman,
	Joerg Roedel, Krishna Reddy, linux-arm-kernel, linux-kernel

On Fri, Jan 22, 2021 at 12:53:17PM +0000, Robin Murphy wrote:
> On 2021-01-22 12:41, Will Deacon wrote:
> >On Tue, Nov 24, 2020 at 12:15:58PM -0700, Jordan Crouse wrote:
> >>Call report_iommu_fault() to allow upper-level drivers to register their
> >>own fault handlers.
> >>
> >>Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
> >>---
> >>
> >>  drivers/iommu/arm/arm-smmu/arm-smmu.c | 16 +++++++++++++---
> >>  1 file changed, 13 insertions(+), 3 deletions(-)
> >>
> >>diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> >>index 0f28a8614da3..7fd18bbda8f5 100644
> >>--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> >>+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> >>@@ -427,6 +427,7 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
> >>  	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> >>  	struct arm_smmu_device *smmu = smmu_domain->smmu;
> >>  	int idx = smmu_domain->cfg.cbndx;
> >>+	int ret;
> >>  	fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
> >>  	if (!(fsr & ARM_SMMU_FSR_FAULT))
> >>@@ -436,11 +437,20 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
> >>  	iova = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_FAR);
> >>  	cbfrsynra = arm_smmu_gr1_read(smmu, ARM_SMMU_GR1_CBFRSYNRA(idx));
> >>-	dev_err_ratelimited(smmu->dev,
> >>-	"Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
> >>+	ret = report_iommu_fault(domain, dev, iova,
> >>+		fsynr & ARM_SMMU_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
> >>+
> >>+	if (ret == -ENOSYS)
> >>+		dev_err_ratelimited(smmu->dev,
> >>+		"Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
> >>  			    fsr, iova, fsynr, cbfrsynra, idx);
> >>-	arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr);
> >>+	/*
> >>+	 * If the iommu fault returns an error (except -ENOSYS) then assume that
> >>+	 * they will handle resuming on their own
> >>+	 */
> >>+	if (!ret || ret == -ENOSYS)
> >>+		arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr);
> >
> >Hmm, I don't grok this part. If the fault handler returned an error and
> >we don't clear the FSR, won't we just re-take the irq immediately?
> 
> If we don't touch the FSR at all, yes. Even if we clear the fault indicator
> bits, the interrupt *might* remain asserted until a stalled transaction is
> actually resolved - that's that lovely IMP-DEF corner.
>
> Robin.
> 

This is for stall-on-fault. The idea is that if the developer chooses to do so
we would stall the GPU after a fault long enough to take a picture of it with
devcoredump and then release the FSR. Since we can't take the devcoredump from
the interrupt handler we schedule it in a worker and then return an error
to let the main handler know that we'll come back around clear the FSR later
when we are done.

It is assumed that we'll have to turn off interrupts in our handler to allow
this to work. Its all very implementation specific, but then again we're
assuming that if you want to do this then you know what you are doing.

In that spirit the error that skips the FSR should probably be something
specific instead of "all errors" - that way a well meaning handler that returns
a -EINVAL doesn't accidentally break itself.

Jordan

> >I think
> >it would be better to do this unconditionally, and print the "Unhandled
> >context fault" message for any non-zero value of ret.

> >
> >Will
> >

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/3] iommu/arm-smmu: Add support for driver IOMMU fault handlers
  2021-01-25 21:51       ` Jordan Crouse
@ 2021-01-26 11:40         ` Robin Murphy
  2021-01-26 16:05           ` Rob Clark
  0 siblings, 1 reply; 10+ messages in thread
From: Robin Murphy @ 2021-01-26 11:40 UTC (permalink / raw)
  To: Will Deacon, linux-arm-msm, iommu, Greg Kroah-Hartman,
	Joerg Roedel, Krishna Reddy, linux-arm-kernel, linux-kernel

On 2021-01-25 21:51, Jordan Crouse wrote:
> On Fri, Jan 22, 2021 at 12:53:17PM +0000, Robin Murphy wrote:
>> On 2021-01-22 12:41, Will Deacon wrote:
>>> On Tue, Nov 24, 2020 at 12:15:58PM -0700, Jordan Crouse wrote:
>>>> Call report_iommu_fault() to allow upper-level drivers to register their
>>>> own fault handlers.
>>>>
>>>> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
>>>> ---
>>>>
>>>>   drivers/iommu/arm/arm-smmu/arm-smmu.c | 16 +++++++++++++---
>>>>   1 file changed, 13 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
>>>> index 0f28a8614da3..7fd18bbda8f5 100644
>>>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
>>>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
>>>> @@ -427,6 +427,7 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
>>>>   	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>>>   	struct arm_smmu_device *smmu = smmu_domain->smmu;
>>>>   	int idx = smmu_domain->cfg.cbndx;
>>>> +	int ret;
>>>>   	fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
>>>>   	if (!(fsr & ARM_SMMU_FSR_FAULT))
>>>> @@ -436,11 +437,20 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
>>>>   	iova = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_FAR);
>>>>   	cbfrsynra = arm_smmu_gr1_read(smmu, ARM_SMMU_GR1_CBFRSYNRA(idx));
>>>> -	dev_err_ratelimited(smmu->dev,
>>>> -	"Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
>>>> +	ret = report_iommu_fault(domain, dev, iova,
>>>> +		fsynr & ARM_SMMU_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
>>>> +
>>>> +	if (ret == -ENOSYS)
>>>> +		dev_err_ratelimited(smmu->dev,
>>>> +		"Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
>>>>   			    fsr, iova, fsynr, cbfrsynra, idx);
>>>> -	arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr);
>>>> +	/*
>>>> +	 * If the iommu fault returns an error (except -ENOSYS) then assume that
>>>> +	 * they will handle resuming on their own
>>>> +	 */
>>>> +	if (!ret || ret == -ENOSYS)
>>>> +		arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr);
>>>
>>> Hmm, I don't grok this part. If the fault handler returned an error and
>>> we don't clear the FSR, won't we just re-take the irq immediately?
>>
>> If we don't touch the FSR at all, yes. Even if we clear the fault indicator
>> bits, the interrupt *might* remain asserted until a stalled transaction is
>> actually resolved - that's that lovely IMP-DEF corner.
>>
>> Robin.
>>
> 
> This is for stall-on-fault. The idea is that if the developer chooses to do so
> we would stall the GPU after a fault long enough to take a picture of it with
> devcoredump and then release the FSR. Since we can't take the devcoredump from
> the interrupt handler we schedule it in a worker and then return an error
> to let the main handler know that we'll come back around clear the FSR later
> when we are done.

Sure, but clearing FSR is not writing to RESUME to resolve the stalled 
transaction(s). You can already snarf the FSR contents from your 
report_iommu_fault() handler if you want to, so either way I don't see 
what's gained by not clearing it as expected at the point where we've 
handled the *interrupt*, even if it will take longer to decide what to 
do with the underlying *fault* that it signalled. I'm particularly not 
keen on having unusual behaviour in the core interrupt handling which 
callers may unwittingly trigger, for the sake of one 
very-very-driver-specific flow having a slightly richer debugging 
experience.

For actually *handling* faults, I thought we were going to need to hook 
up the new IOPF fault queue stuff anyway?

Robin.

> It is assumed that we'll have to turn off interrupts in our handler to allow
> this to work. Its all very implementation specific, but then again we're
> assuming that if you want to do this then you know what you are doing.
> 
> In that spirit the error that skips the FSR should probably be something
> specific instead of "all errors" - that way a well meaning handler that returns
> a -EINVAL doesn't accidentally break itself.
> 
> Jordan
> 
>>> I think
>>> it would be better to do this unconditionally, and print the "Unhandled
>>> context fault" message for any non-zero value of ret.
> 
>>>
>>> Will
>>>
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/3] iommu/arm-smmu: Add support for driver IOMMU fault handlers
  2021-01-26 11:40         ` Robin Murphy
@ 2021-01-26 16:05           ` Rob Clark
  2021-01-26 16:31             ` Robin Murphy
  0 siblings, 1 reply; 10+ messages in thread
From: Rob Clark @ 2021-01-26 16:05 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Will Deacon, linux-arm-msm,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,,
	Greg Kroah-Hartman, Joerg Roedel, Krishna Reddy,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	Linux Kernel Mailing List

On Tue, Jan 26, 2021 at 3:41 AM Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 2021-01-25 21:51, Jordan Crouse wrote:
> > On Fri, Jan 22, 2021 at 12:53:17PM +0000, Robin Murphy wrote:
> >> On 2021-01-22 12:41, Will Deacon wrote:
> >>> On Tue, Nov 24, 2020 at 12:15:58PM -0700, Jordan Crouse wrote:
> >>>> Call report_iommu_fault() to allow upper-level drivers to register their
> >>>> own fault handlers.
> >>>>
> >>>> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
> >>>> ---
> >>>>
> >>>>   drivers/iommu/arm/arm-smmu/arm-smmu.c | 16 +++++++++++++---
> >>>>   1 file changed, 13 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> >>>> index 0f28a8614da3..7fd18bbda8f5 100644
> >>>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> >>>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> >>>> @@ -427,6 +427,7 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
> >>>>    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> >>>>    struct arm_smmu_device *smmu = smmu_domain->smmu;
> >>>>    int idx = smmu_domain->cfg.cbndx;
> >>>> +  int ret;
> >>>>    fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
> >>>>    if (!(fsr & ARM_SMMU_FSR_FAULT))
> >>>> @@ -436,11 +437,20 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
> >>>>    iova = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_FAR);
> >>>>    cbfrsynra = arm_smmu_gr1_read(smmu, ARM_SMMU_GR1_CBFRSYNRA(idx));
> >>>> -  dev_err_ratelimited(smmu->dev,
> >>>> -  "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
> >>>> +  ret = report_iommu_fault(domain, dev, iova,
> >>>> +          fsynr & ARM_SMMU_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
> >>>> +
> >>>> +  if (ret == -ENOSYS)
> >>>> +          dev_err_ratelimited(smmu->dev,
> >>>> +          "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
> >>>>                        fsr, iova, fsynr, cbfrsynra, idx);
> >>>> -  arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr);
> >>>> +  /*
> >>>> +   * If the iommu fault returns an error (except -ENOSYS) then assume that
> >>>> +   * they will handle resuming on their own
> >>>> +   */
> >>>> +  if (!ret || ret == -ENOSYS)
> >>>> +          arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr);
> >>>
> >>> Hmm, I don't grok this part. If the fault handler returned an error and
> >>> we don't clear the FSR, won't we just re-take the irq immediately?
> >>
> >> If we don't touch the FSR at all, yes. Even if we clear the fault indicator
> >> bits, the interrupt *might* remain asserted until a stalled transaction is
> >> actually resolved - that's that lovely IMP-DEF corner.
> >>
> >> Robin.
> >>
> >
> > This is for stall-on-fault. The idea is that if the developer chooses to do so
> > we would stall the GPU after a fault long enough to take a picture of it with
> > devcoredump and then release the FSR. Since we can't take the devcoredump from
> > the interrupt handler we schedule it in a worker and then return an error
> > to let the main handler know that we'll come back around clear the FSR later
> > when we are done.
>
> Sure, but clearing FSR is not writing to RESUME to resolve the stalled
> transaction(s). You can already snarf the FSR contents from your
> report_iommu_fault() handler if you want to, so either way I don't see
> what's gained by not clearing it as expected at the point where we've
> handled the *interrupt*, even if it will take longer to decide what to
> do with the underlying *fault* that it signalled. I'm particularly not
> keen on having unusual behaviour in the core interrupt handling which
> callers may unwittingly trigger, for the sake of one
> very-very-driver-specific flow having a slightly richer debugging
> experience.

Tbf, "slightly" is an understatement.. it is a big enough improvement
that I've hacked up deferred resume several times to debug various
issues. ;-)

(Which is always a bit of a PITA because of things moving around in
arm-smmu as well as the drm side of things.)

But from my recollection, we can clear FSR immediately, all we need to
do is defer writing ARM_SMMU_CB_RESUME

BR,
-R

>
> For actually *handling* faults, I thought we were going to need to hook
> up the new IOPF fault queue stuff anyway?
>
> Robin.
>
> > It is assumed that we'll have to turn off interrupts in our handler to allow
> > this to work. Its all very implementation specific, but then again we're
> > assuming that if you want to do this then you know what you are doing.
> >
> > In that spirit the error that skips the FSR should probably be something
> > specific instead of "all errors" - that way a well meaning handler that returns
> > a -EINVAL doesn't accidentally break itself.
> >
> > Jordan
> >
> >>> I think
> >>> it would be better to do this unconditionally, and print the "Unhandled
> >>> context fault" message for any non-zero value of ret.
> >
> >>>
> >>> Will
> >>>
> >
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/3] iommu/arm-smmu: Add support for driver IOMMU fault handlers
  2021-01-26 16:05           ` Rob Clark
@ 2021-01-26 16:31             ` Robin Murphy
  0 siblings, 0 replies; 10+ messages in thread
From: Robin Murphy @ 2021-01-26 16:31 UTC (permalink / raw)
  To: Rob Clark
  Cc: Will Deacon, linux-arm-msm, list@263.net:IOMMU DRIVERS, iommu,
	Greg Kroah-Hartman, Joerg Roedel, Krishna Reddy,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	Linux Kernel Mailing List

On 2021-01-26 16:05, Rob Clark wrote:
> On Tue, Jan 26, 2021 at 3:41 AM Robin Murphy <robin.murphy@arm.com> wrote:
>>
>> On 2021-01-25 21:51, Jordan Crouse wrote:
>>> On Fri, Jan 22, 2021 at 12:53:17PM +0000, Robin Murphy wrote:
>>>> On 2021-01-22 12:41, Will Deacon wrote:
>>>>> On Tue, Nov 24, 2020 at 12:15:58PM -0700, Jordan Crouse wrote:
>>>>>> Call report_iommu_fault() to allow upper-level drivers to register their
>>>>>> own fault handlers.
>>>>>>
>>>>>> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
>>>>>> ---
>>>>>>
>>>>>>    drivers/iommu/arm/arm-smmu/arm-smmu.c | 16 +++++++++++++---
>>>>>>    1 file changed, 13 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
>>>>>> index 0f28a8614da3..7fd18bbda8f5 100644
>>>>>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
>>>>>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
>>>>>> @@ -427,6 +427,7 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
>>>>>>     struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>>>>>     struct arm_smmu_device *smmu = smmu_domain->smmu;
>>>>>>     int idx = smmu_domain->cfg.cbndx;
>>>>>> +  int ret;
>>>>>>     fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
>>>>>>     if (!(fsr & ARM_SMMU_FSR_FAULT))
>>>>>> @@ -436,11 +437,20 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
>>>>>>     iova = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_FAR);
>>>>>>     cbfrsynra = arm_smmu_gr1_read(smmu, ARM_SMMU_GR1_CBFRSYNRA(idx));
>>>>>> -  dev_err_ratelimited(smmu->dev,
>>>>>> -  "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
>>>>>> +  ret = report_iommu_fault(domain, dev, iova,
>>>>>> +          fsynr & ARM_SMMU_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
>>>>>> +
>>>>>> +  if (ret == -ENOSYS)
>>>>>> +          dev_err_ratelimited(smmu->dev,
>>>>>> +          "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
>>>>>>                         fsr, iova, fsynr, cbfrsynra, idx);
>>>>>> -  arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr);
>>>>>> +  /*
>>>>>> +   * If the iommu fault returns an error (except -ENOSYS) then assume that
>>>>>> +   * they will handle resuming on their own
>>>>>> +   */
>>>>>> +  if (!ret || ret == -ENOSYS)
>>>>>> +          arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr);
>>>>>
>>>>> Hmm, I don't grok this part. If the fault handler returned an error and
>>>>> we don't clear the FSR, won't we just re-take the irq immediately?
>>>>
>>>> If we don't touch the FSR at all, yes. Even if we clear the fault indicator
>>>> bits, the interrupt *might* remain asserted until a stalled transaction is
>>>> actually resolved - that's that lovely IMP-DEF corner.
>>>>
>>>> Robin.
>>>>
>>>
>>> This is for stall-on-fault. The idea is that if the developer chooses to do so
>>> we would stall the GPU after a fault long enough to take a picture of it with
>>> devcoredump and then release the FSR. Since we can't take the devcoredump from
>>> the interrupt handler we schedule it in a worker and then return an error
>>> to let the main handler know that we'll come back around clear the FSR later
>>> when we are done.
>>
>> Sure, but clearing FSR is not writing to RESUME to resolve the stalled
>> transaction(s). You can already snarf the FSR contents from your
>> report_iommu_fault() handler if you want to, so either way I don't see
>> what's gained by not clearing it as expected at the point where we've
>> handled the *interrupt*, even if it will take longer to decide what to
>> do with the underlying *fault* that it signalled. I'm particularly not
>> keen on having unusual behaviour in the core interrupt handling which
>> callers may unwittingly trigger, for the sake of one
>> very-very-driver-specific flow having a slightly richer debugging
>> experience.
> 
> Tbf, "slightly" is an understatement.. it is a big enough improvement
> that I've hacked up deferred resume several times to debug various
> issues. ;-)

Oh, fear not, I fully appreciate that keeping the GPU stalled on a 
faulting transaction is a game-changer in itself ("almost like a real 
MMU!"). That comment was only aimed at whatever the perceived benefits 
are of deliberately not trying to clear the SMMU interrupt (even if it 
*would* stay clear). I have no issue with calling report_iommu_fault(), 
I'm just wary of doing anything weird with the result.

> (Which is always a bit of a PITA because of things moving around in
> arm-smmu as well as the drm side of things.)
> 
> But from my recollection, we can clear FSR immediately, all we need to
> do is defer writing ARM_SMMU_CB_RESUME

Phew! Thanks for the reassurance :)

Robin.

> 
> BR,
> -R
> 
>>
>> For actually *handling* faults, I thought we were going to need to hook
>> up the new IOPF fault queue stuff anyway?
>>
>> Robin.
>>
>>> It is assumed that we'll have to turn off interrupts in our handler to allow
>>> this to work. Its all very implementation specific, but then again we're
>>> assuming that if you want to do this then you know what you are doing.
>>>
>>> In that spirit the error that skips the FSR should probably be something
>>> specific instead of "all errors" - that way a well meaning handler that returns
>>> a -EINVAL doesn't accidentally break itself.
>>>
>>> Jordan
>>>
>>>>> I think
>>>>> it would be better to do this unconditionally, and print the "Unhandled
>>>>> context fault" message for any non-zero value of ret.
>>>
>>>>>
>>>>> Will
>>>>>
>>>
>> _______________________________________________
>> iommu mailing list
>> iommu@lists.linux-foundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-01-26 18:39 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-24 19:15 [PATCH v2 0/3] iommu/arm-smmu: adreno-smmu page fault handling Jordan Crouse
2020-11-24 19:15 ` [PATCH v2 1/3] iommu/arm-smmu: Add support for driver IOMMU fault handlers Jordan Crouse
2021-01-22 12:41   ` Will Deacon
2021-01-22 12:53     ` Robin Murphy
2021-01-25 21:51       ` Jordan Crouse
2021-01-26 11:40         ` Robin Murphy
2021-01-26 16:05           ` Rob Clark
2021-01-26 16:31             ` Robin Murphy
2020-11-24 19:15 ` [PATCH v2 2/3] drm/msm: Add an adreno-smmu-priv callback to get pagefault info Jordan Crouse
2020-11-24 19:16 ` [PATCH v2 3/3] drm/msm: Improve the a6xx page fault handler Jordan Crouse

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).