linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH v5 0/5] iommu/arm-smmu: adreno-smmu page fault handling
@ 2021-06-10 21:44 Rob Clark
  2021-06-10 21:44 ` [PATCH v5 1/5] iommu/arm-smmu: Add support for driver IOMMU fault handlers Rob Clark
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Rob Clark @ 2021-06-10 21:44 UTC (permalink / raw)
  To: dri-devel, iommu
  Cc: Konrad Dybcio, Akhil P Oommen, open list, Eric Anholt,
	AngeloGioacchino Del Regno, Marijn Suijten, Lee Jones, Rob Clark,
	Sai Prakash Ranjan, Jonathan Marek, Will Deacon, Bjorn Andersson,
	Joerg Roedel, linux-arm-msm, Sharat Masetty, John Stultz,
	Jordan Crouse, moderated list:ARM SMMU DRIVERS,
	Isaac J. Manjarres, Robin Murphy, Douglas Anderson,
	Zhenzhong Duan, Kristian H. Kristensen, freedreno

From: Rob Clark <robdclark@chromium.org>

This picks up an earlier series[1] from Jordan, and adds additional
support needed to generate GPU devcore dumps on iova faults.  Original
description:

This is a stack to add an Adreno GPU specific handler for pagefaults. The first
patch starts by wiring up report_iommu_fault for arm-smmu. The next patch adds
an adreno-smmu-priv function hook to capture a handful of important debugging
registers such as TTBR0, CONTEXTIDR, FSYNR0 and others. This is used by the
third patch to print more detailed information on a page fault, such as the TTBR0
for the pagetable that caused the fault and the source of the fault as
determined by a combination of the FSYNR1 register and an internal GPU
register.

This code provides a solid base that we can expand on later for even more
extensive GPU side page fault debugging capabilities.

v5: [Rob] Use RBBM_STATUS3.SMMU_STALLED_ON_FAULT to detect case where
    GPU snapshotting needs to avoid crashdumper, and check the
    RBBM_STATUS3.SMMU_STALLED_ON_FAULT in GPU hang irq paths
v4: [Rob] Add support to stall SMMU on fault, and let the GPU driver
    resume translation after it has had a chance to snapshot the GPU's
    state
v3: Always clear FSR even if the target driver is going to handle resume
v2: Fix comment wording and function pointer check per Rob Clark

[1] https://lore.kernel.org/dri-devel/20210225175135.91922-1-jcrouse@codeaurora.org/

Jordan Crouse (3):
  iommu/arm-smmu: Add support for driver IOMMU fault handlers
  iommu/arm-smmu-qcom: Add an adreno-smmu-priv callback to get pagefault
    info
  drm/msm: Improve the a6xx page fault handler

Rob Clark (2):
  iommu/arm-smmu-qcom: Add stall support
  drm/msm: devcoredump iommu fault support

 drivers/gpu/drm/msm/adreno/a5xx_gpu.c       |  23 +++-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c       | 110 +++++++++++++++++++-
 drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c |  42 ++++++--
 drivers/gpu/drm/msm/adreno/adreno_gpu.c     |  15 +++
 drivers/gpu/drm/msm/msm_gem.h               |   1 +
 drivers/gpu/drm/msm/msm_gem_submit.c        |   1 +
 drivers/gpu/drm/msm/msm_gpu.c               |  48 +++++++++
 drivers/gpu/drm/msm/msm_gpu.h               |  17 +++
 drivers/gpu/drm/msm/msm_gpummu.c            |   5 +
 drivers/gpu/drm/msm/msm_iommu.c             |  22 +++-
 drivers/gpu/drm/msm/msm_mmu.h               |   5 +-
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c  |  50 +++++++++
 drivers/iommu/arm/arm-smmu/arm-smmu.c       |   9 +-
 drivers/iommu/arm/arm-smmu/arm-smmu.h       |   2 +
 include/linux/adreno-smmu-priv.h            |  38 ++++++-
 15 files changed, 367 insertions(+), 21 deletions(-)

-- 
2.31.1
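
The ordering described in the v4 note (stall on fault, snapshot, then resume) can be sketched in user space as below. This is only an illustrative model, not code from the series: `priv_model`, `fake_smmu`, `snapshot_model`, and `model_handle_fault()` are hypothetical names, and the fake callbacks stand in for the adreno-smmu-priv hooks added in patches 2 and 4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct adreno_smmu_fault_info. */
struct fault_info_model {
	uint64_t far;
	uint32_t fsr;
};

/* Hypothetical stand-in for the callback table in struct adreno_smmu_priv. */
struct priv_model {
	const void *cookie;
	void (*get_fault_info)(const void *cookie, struct fault_info_model *info);
	void (*resume_translation)(const void *cookie, bool terminate);
};

/* Fake hardware state so the flow can be exercised without an SMMU. */
struct fake_smmu {
	uint64_t far;
	uint32_t fsr;
	bool stalled;
	int resumes;
};

struct snapshot_model {
	uint64_t far;
	uint32_t fsr;
	bool valid;
};

static void fake_get_fault_info(const void *cookie, struct fault_info_model *info)
{
	const struct fake_smmu *s = cookie;

	info->far = s->far;
	info->fsr = s->fsr;
}

static void fake_resume(const void *cookie, bool terminate)
{
	/* Same const-dropping cast pattern the patches use for the cookie. */
	struct fake_smmu *s = (struct fake_smmu *)cookie;

	(void)terminate;
	s->stalled = false;
	s->resumes++;
}

/* Read fault registers and snapshot first, resume translation last. */
static void model_handle_fault(const struct priv_model *priv,
			       struct snapshot_model *snap)
{
	struct fault_info_model info;

	assert(priv && priv->get_fault_info && priv->resume_translation);

	priv->get_fault_info(priv->cookie, &info);
	snap->far = info.far;
	snap->fsr = info.fsr;
	snap->valid = true;

	/* Terminate the faulting transaction rather than retrying it. */
	priv->resume_translation(priv->cookie, true);
}
```

The point of the ordering is that the snapshot is taken while translation is still stalled; whether a real handler terminates or retries is a policy choice the model does not try to capture.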


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* [PATCH v5 1/5] iommu/arm-smmu: Add support for driver IOMMU fault handlers
  2021-06-10 21:44 [PATCH v5 0/5] iommu/arm-smmu: adreno-smmu page fault handling Rob Clark
@ 2021-06-10 21:44 ` Rob Clark
  2021-06-14 17:26   ` Bjorn Andersson
  2021-06-10 21:44 ` [PATCH v5 2/5] iommu/arm-smmu-qcom: Add an adreno-smmu-priv callback to get pagefault info Rob Clark
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Rob Clark @ 2021-06-10 21:44 UTC (permalink / raw)
  To: dri-devel, iommu
  Cc: Rob Clark, Sai Prakash Ranjan, open list, Will Deacon,
	linux-arm-msm, Joerg Roedel, Robin Murphy, Jordan Crouse,
	Jordan Crouse, freedreno, moderated list:ARM SMMU DRIVERS

From: Jordan Crouse <jcrouse@codeaurora.org>

Call report_iommu_fault() to allow upper-level drivers to register their
own fault handlers.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Will Deacon <will@kernel.org>
---
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 6f72c4d208ca..b4b32d31fc06 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -408,6 +408,7 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	int idx = smmu_domain->cfg.cbndx;
+	int ret;
 
 	fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
 	if (!(fsr & ARM_SMMU_FSR_FAULT))
@@ -417,8 +418,12 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
 	iova = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_FAR);
 	cbfrsynra = arm_smmu_gr1_read(smmu, ARM_SMMU_GR1_CBFRSYNRA(idx));
 
-	dev_err_ratelimited(smmu->dev,
-	"Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
+	ret = report_iommu_fault(domain, NULL, iova,
+		fsynr & ARM_SMMU_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
+
+	if (ret == -ENOSYS)
+		dev_err_ratelimited(smmu->dev,
+		"Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
 			    fsr, iova, fsynr, cbfrsynra, idx);
 
 	arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr);
-- 
2.31.1
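
The fallback logic in this patch can be modelled in user space roughly as follows. This is a sketch under assumptions, not the kernel implementation: the `model_*` and `MODEL_*` names are hypothetical, and the only behaviour taken from the patch is that the "Unhandled context fault" message is printed when the report call returns -ENOSYS, and that the read/write flag is derived from the WNR bit (BIT(4)) of FSYNR0.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_FSYNR0_WNR   (1u << 4)	/* mirrors ARM_SMMU_FSYNR0_WNR, BIT(4) */
#define MODEL_FAULT_READ   0		/* illustrative flag values */
#define MODEL_FAULT_WRITE  1

typedef int (*model_fault_handler_t)(unsigned long iova, int flags, void *token);

/* Hypothetical stand-in for the per-domain handler registration. */
struct model_domain {
	model_fault_handler_t handler;
	void *token;
};

/* Returns -ENOSYS when no upper-level driver registered a handler. */
static int model_report_fault(struct model_domain *d, unsigned long iova, int flags)
{
	assert(d != NULL);
	if (!d->handler)
		return -ENOSYS;
	return d->handler(iova, flags, d->token);
}

/* Returns 1 when the generic "Unhandled context fault" log would fire. */
static int model_context_fault(struct model_domain *d, unsigned long iova,
			       uint32_t fsynr)
{
	int flags = (fsynr & MODEL_FSYNR0_WNR) ? MODEL_FAULT_WRITE
						: MODEL_FAULT_READ;
	int ret = model_report_fault(d, iova, flags);

	return ret == -ENOSYS;
}

/* Example upper-level handler: records the flags it was called with. */
static int model_gpu_handler(unsigned long iova, int flags, void *token)
{
	int *last_flags = token;

	(void)iova;
	*last_flags = flags;
	return 0;
}
```

With no handler installed the model reports the fault as unhandled; once `model_gpu_handler` is registered, the generic message is suppressed and the handler sees the write flag.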



* [PATCH v5 2/5] iommu/arm-smmu-qcom: Add an adreno-smmu-priv callback to get pagefault info
  2021-06-10 21:44 [PATCH v5 0/5] iommu/arm-smmu: adreno-smmu page fault handling Rob Clark
  2021-06-10 21:44 ` [PATCH v5 1/5] iommu/arm-smmu: Add support for driver IOMMU fault handlers Rob Clark
@ 2021-06-10 21:44 ` Rob Clark
  2021-06-14 17:30   ` Bjorn Andersson
  2021-06-10 21:44 ` [PATCH v5 4/5] iommu/arm-smmu-qcom: Add stall support Rob Clark
  2021-07-04 12:53 ` [PATCH v5 0/5] iommu/arm-smmu: adreno-smmu page fault handling Dmitry Baryshkov
  3 siblings, 1 reply; 13+ messages in thread
From: Rob Clark @ 2021-06-10 21:44 UTC (permalink / raw)
  To: dri-devel, iommu
  Cc: Rob Clark, Isaac J. Manjarres, Sai Prakash Ranjan, open list,
	Will Deacon, linux-arm-msm, Joerg Roedel, Robin Murphy,
	Jordan Crouse, Jordan Crouse, John Stultz, Bjorn Andersson,
	freedreno, moderated list:ARM SMMU DRIVERS

From: Jordan Crouse <jcrouse@codeaurora.org>

Add a callback in adreno-smmu-priv to read interesting SMMU
registers to provide an opportunity for a richer debug experience
in the GPU driver.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 17 ++++++++++++
 drivers/iommu/arm/arm-smmu/arm-smmu.h      |  2 ++
 include/linux/adreno-smmu-priv.h           | 31 +++++++++++++++++++++-
 3 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 98b3a1c2a181..b2e31ea84128 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -32,6 +32,22 @@ static void qcom_adreno_smmu_write_sctlr(struct arm_smmu_device *smmu, int idx,
 	arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, reg);
 }
 
+static void qcom_adreno_smmu_get_fault_info(const void *cookie,
+		struct adreno_smmu_fault_info *info)
+{
+	struct arm_smmu_domain *smmu_domain = (void *)cookie;
+	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+	info->fsr = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_FSR);
+	info->fsynr0 = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_FSYNR0);
+	info->fsynr1 = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_FSYNR1);
+	info->far = arm_smmu_cb_readq(smmu, cfg->cbndx, ARM_SMMU_CB_FAR);
+	info->cbfrsynra = arm_smmu_gr1_read(smmu, ARM_SMMU_GR1_CBFRSYNRA(cfg->cbndx));
+	info->ttbr0 = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_TTBR0);
+	info->contextidr = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_CONTEXTIDR);
+}
+
 #define QCOM_ADRENO_SMMU_GPU_SID 0
 
 static bool qcom_adreno_smmu_is_gpu_device(struct device *dev)
@@ -156,6 +172,7 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
 	priv->cookie = smmu_domain;
 	priv->get_ttbr1_cfg = qcom_adreno_smmu_get_ttbr1_cfg;
 	priv->set_ttbr0_cfg = qcom_adreno_smmu_set_ttbr0_cfg;
+	priv->get_fault_info = qcom_adreno_smmu_get_fault_info;
 
 	return 0;
 }
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index c31a59d35c64..84c21c4b0691 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -224,6 +224,8 @@ enum arm_smmu_cbar_type {
 #define ARM_SMMU_CB_FSYNR0		0x68
 #define ARM_SMMU_FSYNR0_WNR		BIT(4)
 
+#define ARM_SMMU_CB_FSYNR1		0x6c
+
 #define ARM_SMMU_CB_S1_TLBIVA		0x600
 #define ARM_SMMU_CB_S1_TLBIASID		0x610
 #define ARM_SMMU_CB_S1_TLBIVAL		0x620
diff --git a/include/linux/adreno-smmu-priv.h b/include/linux/adreno-smmu-priv.h
index a889f28afb42..53fe32fb9214 100644
--- a/include/linux/adreno-smmu-priv.h
+++ b/include/linux/adreno-smmu-priv.h
@@ -8,6 +8,32 @@
 
 #include <linux/io-pgtable.h>
 
+/**
+ * struct adreno_smmu_fault_info - container for key fault information
+ *
+ * @far: The faulting IOVA from ARM_SMMU_CB_FAR
+ * @ttbr0: The current TTBR0 pagetable from ARM_SMMU_CB_TTBR0
+ * @contextidr: The value of ARM_SMMU_CB_CONTEXTIDR
+ * @fsr: The fault status from ARM_SMMU_CB_FSR
+ * @fsynr0: The value of FSYNR0 from ARM_SMMU_CB_FSYNR0
+ * @fsynr1: The value of FSYNR1 from ARM_SMMU_CB_FSYNR1
+ * @cbfrsynra: The value of CBFRSYNRA from ARM_SMMU_GR1_CBFRSYNRA(idx)
+ *
+ * This struct passes back key page fault information to the GPU driver
+ * through the get_fault_info function pointer.
+ * The GPU driver can use this information to print informative
+ * log messages and provide deeper GPU specific insight into the fault.
+ */
+struct adreno_smmu_fault_info {
+	u64 far;
+	u64 ttbr0;
+	u32 contextidr;
+	u32 fsr;
+	u32 fsynr0;
+	u32 fsynr1;
+	u32 cbfrsynra;
+};
+
 /**
  * struct adreno_smmu_priv - private interface between adreno-smmu and GPU
  *
@@ -17,6 +43,8 @@
  * @set_ttbr0_cfg: Set the TTBR0 config for the GPUs context bank.  A
  *                 NULL config disables TTBR0 translation, otherwise
  *                 TTBR0 translation is enabled with the specified cfg
+ * @get_fault_info: Called by the GPU fault handler to get information about
+ *                  the fault
  *
  * The GPU driver (drm/msm) and adreno-smmu work together for controlling
  * the GPU's SMMU instance.  This is by necessity, as the GPU is directly
@@ -31,6 +59,7 @@ struct adreno_smmu_priv {
     const void *cookie;
     const struct io_pgtable_cfg *(*get_ttbr1_cfg)(const void *cookie);
     int (*set_ttbr0_cfg)(const void *cookie, const struct io_pgtable_cfg *cfg);
+    void (*get_fault_info)(const void *cookie, struct adreno_smmu_fault_info *info);
 };
 
-#endif /* __ADRENO_SMMU_PRIV_H */
\ No newline at end of file
+#endif /* __ADRENO_SMMU_PRIV_H */
-- 
2.31.1
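
As a rough illustration of how a consumer might use the values returned through get_fault_info, the sketch below formats a short log line from a user-space copy of the struct. The `fault_info_model` struct mirrors the fields of adreno_smmu_fault_info with stdint types; `format_fault()`, `MODEL_FSYNR0_WNR`, and the message layout are hypothetical and are not what drm/msm actually prints.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MODEL_FSYNR0_WNR (1u << 4)	/* mirrors ARM_SMMU_FSYNR0_WNR, BIT(4) */

/* User-space mirror of struct adreno_smmu_fault_info from the patch. */
struct fault_info_model {
	uint64_t far;
	uint64_t ttbr0;
	uint32_t contextidr;
	uint32_t fsr;
	uint32_t fsynr0;
	uint32_t fsynr1;
	uint32_t cbfrsynra;
};

/*
 * Format a one-line summary into buf. Returns what snprintf() returns:
 * the number of characters the full message needs, excluding the NUL.
 */
static int format_fault(const struct fault_info_model *info, char *buf, size_t len)
{
	const char *type;

	assert(info != NULL && buf != NULL);

	/* The WNR bit in FSYNR0 distinguishes writes from reads. */
	type = (info->fsynr0 & MODEL_FSYNR0_WNR) ? "WRITE" : "READ";

	return snprintf(buf, len, "iova=0x%" PRIx64 " ttbr0=0x%" PRIx64 " %s",
			info->far, info->ttbr0, type);
}
```

Because the callback fills a plain struct, the GPU driver can pair these values with its own state (for example the GPU-internal register mentioned in the cover letter) without touching SMMU registers directly.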



* [PATCH v5 4/5] iommu/arm-smmu-qcom: Add stall support
  2021-06-10 21:44 [PATCH v5 0/5] iommu/arm-smmu: adreno-smmu page fault handling Rob Clark
  2021-06-10 21:44 ` [PATCH v5 1/5] iommu/arm-smmu: Add support for driver IOMMU fault handlers Rob Clark
  2021-06-10 21:44 ` [PATCH v5 2/5] iommu/arm-smmu-qcom: Add an adreno-smmu-priv callback to get pagefault info Rob Clark
@ 2021-06-10 21:44 ` Rob Clark
  2021-06-11 13:49   ` Jordan Crouse
  2021-06-14 17:54   ` Bjorn Andersson
  2021-07-04 12:53 ` [PATCH v5 0/5] iommu/arm-smmu: adreno-smmu page fault handling Dmitry Baryshkov
  3 siblings, 2 replies; 13+ messages in thread
From: Rob Clark @ 2021-06-10 21:44 UTC (permalink / raw)
  To: dri-devel, iommu
  Cc: freedreno, linux-arm-msm, Jordan Crouse, Rob Clark, Will Deacon,
	Robin Murphy, Joerg Roedel, Bjorn Andersson, Isaac J. Manjarres,
	moderated list:ARM SMMU DRIVERS, open list

From: Rob Clark <robdclark@chromium.org>

Add, via the adreno-smmu-priv interface, a way for the GPU to request
the SMMU to stall translation on faults, and then later resume the
translation, either retrying or terminating the current translation.

This will be used on the GPU side to "freeze" the GPU while we snapshot
useful state for devcoredump.

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 33 ++++++++++++++++++++++
 include/linux/adreno-smmu-priv.h           |  7 +++++
 2 files changed, 40 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index b2e31ea84128..61fc645c1325 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -13,6 +13,7 @@ struct qcom_smmu {
 	struct arm_smmu_device smmu;
 	bool bypass_quirk;
 	u8 bypass_cbndx;
+	u32 stall_enabled;
 };
 
 static struct qcom_smmu *to_qcom_smmu(struct arm_smmu_device *smmu)
@@ -23,12 +24,17 @@ static struct qcom_smmu *to_qcom_smmu(struct arm_smmu_device *smmu)
 static void qcom_adreno_smmu_write_sctlr(struct arm_smmu_device *smmu, int idx,
 		u32 reg)
 {
+	struct qcom_smmu *qsmmu = to_qcom_smmu(smmu);
+
 	/*
 	 * On the GPU device we want to process subsequent transactions after a
 	 * fault to keep the GPU from hanging
 	 */
 	reg |= ARM_SMMU_SCTLR_HUPCF;
 
+	if (qsmmu->stall_enabled & BIT(idx))
+		reg |= ARM_SMMU_SCTLR_CFCFG;
+
 	arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, reg);
 }
 
@@ -48,6 +54,31 @@ static void qcom_adreno_smmu_get_fault_info(const void *cookie,
 	info->contextidr = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_CONTEXTIDR);
 }
 
+static void qcom_adreno_smmu_set_stall(const void *cookie, bool enabled)
+{
+	struct arm_smmu_domain *smmu_domain = (void *)cookie;
+	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
+	struct qcom_smmu *qsmmu = to_qcom_smmu(smmu_domain->smmu);
+
+	if (enabled)
+		qsmmu->stall_enabled |= BIT(cfg->cbndx);
+	else
+		qsmmu->stall_enabled &= ~BIT(cfg->cbndx);
+}
+
+static void qcom_adreno_smmu_resume_translation(const void *cookie, bool terminate)
+{
+	struct arm_smmu_domain *smmu_domain = (void *)cookie;
+	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	u32 reg = 0;
+
+	if (terminate)
+		reg |= ARM_SMMU_RESUME_TERMINATE;
+
+	arm_smmu_cb_write(smmu, cfg->cbndx, ARM_SMMU_CB_RESUME, reg);
+}
+
 #define QCOM_ADRENO_SMMU_GPU_SID 0
 
 static bool qcom_adreno_smmu_is_gpu_device(struct device *dev)
@@ -173,6 +204,8 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
 	priv->get_ttbr1_cfg = qcom_adreno_smmu_get_ttbr1_cfg;
 	priv->set_ttbr0_cfg = qcom_adreno_smmu_set_ttbr0_cfg;
 	priv->get_fault_info = qcom_adreno_smmu_get_fault_info;
+	priv->set_stall = qcom_adreno_smmu_set_stall;
+	priv->resume_translation = qcom_adreno_smmu_resume_translation;
 
 	return 0;
 }
diff --git a/include/linux/adreno-smmu-priv.h b/include/linux/adreno-smmu-priv.h
index 53fe32fb9214..c637e0997f6d 100644
--- a/include/linux/adreno-smmu-priv.h
+++ b/include/linux/adreno-smmu-priv.h
@@ -45,6 +45,11 @@ struct adreno_smmu_fault_info {
  *                 TTBR0 translation is enabled with the specified cfg
  * @get_fault_info: Called by the GPU fault handler to get information about
  *                  the fault
+ * @set_stall:     Configure whether stall on fault (CFCFG) is enabled.  Call
+ *                 before set_ttbr0_cfg().  If stalling on fault is enabled,
+ *                 the GPU driver must call resume_translation()
+ * @resume_translation: Resume translation after a fault
+ *
  *
  * The GPU driver (drm/msm) and adreno-smmu work together for controlling
  * the GPU's SMMU instance.  This is by necessity, as the GPU is directly
@@ -60,6 +65,8 @@ struct adreno_smmu_priv {
     const struct io_pgtable_cfg *(*get_ttbr1_cfg)(const void *cookie);
     int (*set_ttbr0_cfg)(const void *cookie, const struct io_pgtable_cfg *cfg);
     void (*get_fault_info)(const void *cookie, struct adreno_smmu_fault_info *info);
+    void (*set_stall)(const void *cookie, bool enabled);
+    void (*resume_translation)(const void *cookie, bool terminate);
 };
 
 #endif /* __ADRENO_SMMU_PRIV_H */
-- 
2.31.1
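
The per-context-bank bookkeeping in this patch can be exercised in isolation with the sketch below. It is a user-space model, not the driver code: the `model_*` names are hypothetical, and the bit positions for CFCFG, HUPCF, and RESUME_TERMINATE are assumed values for illustration that should be checked against arm-smmu.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions; verify against arm-smmu.h before relying on them. */
#define MODEL_SCTLR_CFCFG	(1u << 7)
#define MODEL_SCTLR_HUPCF	(1u << 8)
#define MODEL_RESUME_TERMINATE	(1u << 0)

/* Only the field this patch adds to struct qcom_smmu. */
struct model_qsmmu {
	uint32_t stall_enabled;	/* one bit per context bank index */
};

/* Record whether stall-on-fault is wanted for context bank cbndx. */
static void model_set_stall(struct model_qsmmu *q, unsigned int cbndx, bool enabled)
{
	assert(cbndx < 32);
	if (enabled)
		q->stall_enabled |= 1u << cbndx;
	else
		q->stall_enabled &= ~(1u << cbndx);
}

/* Compute the SCTLR value that would be written for context bank idx. */
static uint32_t model_sctlr(const struct model_qsmmu *q, unsigned int idx, uint32_t reg)
{
	assert(idx < 32);
	reg |= MODEL_SCTLR_HUPCF;	/* always set for the GPU context banks */
	if (q->stall_enabled & (1u << idx))
		reg |= MODEL_SCTLR_CFCFG;
	return reg;
}

/* Value written to the RESUME register: terminate or retry. */
static uint32_t model_resume_value(bool terminate)
{
	return terminate ? MODEL_RESUME_TERMINATE : 0;
}
```

Note that `model_set_stall()` only updates the bitmask; the bit reaches the hardware at the next SCTLR write. That is presumably why the kerneldoc asks callers to invoke set_stall before set_ttbr0_cfg().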



* Re: [PATCH v5 4/5] iommu/arm-smmu-qcom: Add stall support
  2021-06-10 21:44 ` [PATCH v5 4/5] iommu/arm-smmu-qcom: Add stall support Rob Clark
@ 2021-06-11 13:49   ` Jordan Crouse
  2021-06-14 17:54   ` Bjorn Andersson
  1 sibling, 0 replies; 13+ messages in thread
From: Jordan Crouse @ 2021-06-11 13:49 UTC (permalink / raw)
  To: Rob Clark
  Cc: dri-devel, iommu, freedreno, linux-arm-msm, Rob Clark,
	Will Deacon, Robin Murphy, Joerg Roedel, Bjorn Andersson,
	Isaac J. Manjarres, moderated list:ARM SMMU DRIVERS, open list

On Thu, Jun 10, 2021 at 02:44:12PM -0700, Rob Clark wrote:
> From: Rob Clark <robdclark@chromium.org>
> 
> Add, via the adreno-smmu-priv interface, a way for the GPU to request
> the SMMU to stall translation on faults, and then later resume the
> translation, either retrying or terminating the current translation.
> 
> This will be used on the GPU side to "freeze" the GPU while we snapshot
> useful state for devcoredump.
> 

Acked-by: Jordan Crouse <jordan@cosmicpenguin.net>

> Signed-off-by: Rob Clark <robdclark@chromium.org>
> ---
>  drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 33 ++++++++++++++++++++++
>  include/linux/adreno-smmu-priv.h           |  7 +++++
>  2 files changed, 40 insertions(+)
> 
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> index b2e31ea84128..61fc645c1325 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> @@ -13,6 +13,7 @@ struct qcom_smmu {
>  	struct arm_smmu_device smmu;
>  	bool bypass_quirk;
>  	u8 bypass_cbndx;
> +	u32 stall_enabled;
>  };
>  
>  static struct qcom_smmu *to_qcom_smmu(struct arm_smmu_device *smmu)
> @@ -23,12 +24,17 @@ static struct qcom_smmu *to_qcom_smmu(struct arm_smmu_device *smmu)
>  static void qcom_adreno_smmu_write_sctlr(struct arm_smmu_device *smmu, int idx,
>  		u32 reg)
>  {
> +	struct qcom_smmu *qsmmu = to_qcom_smmu(smmu);
> +
>  	/*
>  	 * On the GPU device we want to process subsequent transactions after a
>  	 * fault to keep the GPU from hanging
>  	 */
>  	reg |= ARM_SMMU_SCTLR_HUPCF;
>  
> +	if (qsmmu->stall_enabled & BIT(idx))
> +		reg |= ARM_SMMU_SCTLR_CFCFG;
> +
>  	arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, reg);
>  }
>  
> @@ -48,6 +54,31 @@ static void qcom_adreno_smmu_get_fault_info(const void *cookie,
>  	info->contextidr = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_CONTEXTIDR);
>  }
>  
> +static void qcom_adreno_smmu_set_stall(const void *cookie, bool enabled)
> +{
> +	struct arm_smmu_domain *smmu_domain = (void *)cookie;
> +	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
> +	struct qcom_smmu *qsmmu = to_qcom_smmu(smmu_domain->smmu);
> +
> +	if (enabled)
> +		qsmmu->stall_enabled |= BIT(cfg->cbndx);
> +	else
> +		qsmmu->stall_enabled &= ~BIT(cfg->cbndx);
> +}
> +
> +static void qcom_adreno_smmu_resume_translation(const void *cookie, bool terminate)
> +{
> +	struct arm_smmu_domain *smmu_domain = (void *)cookie;
> +	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
> +	u32 reg = 0;
> +
> +	if (terminate)
> +		reg |= ARM_SMMU_RESUME_TERMINATE;
> +
> +	arm_smmu_cb_write(smmu, cfg->cbndx, ARM_SMMU_CB_RESUME, reg);
> +}
> +
>  #define QCOM_ADRENO_SMMU_GPU_SID 0
>  
>  static bool qcom_adreno_smmu_is_gpu_device(struct device *dev)
> @@ -173,6 +204,8 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
>  	priv->get_ttbr1_cfg = qcom_adreno_smmu_get_ttbr1_cfg;
>  	priv->set_ttbr0_cfg = qcom_adreno_smmu_set_ttbr0_cfg;
>  	priv->get_fault_info = qcom_adreno_smmu_get_fault_info;
> +	priv->set_stall = qcom_adreno_smmu_set_stall;
> +	priv->resume_translation = qcom_adreno_smmu_resume_translation;
>  
>  	return 0;
>  }
> diff --git a/include/linux/adreno-smmu-priv.h b/include/linux/adreno-smmu-priv.h
> index 53fe32fb9214..c637e0997f6d 100644
> --- a/include/linux/adreno-smmu-priv.h
> +++ b/include/linux/adreno-smmu-priv.h
> @@ -45,6 +45,11 @@ struct adreno_smmu_fault_info {
>   *                 TTBR0 translation is enabled with the specified cfg
>   * @get_fault_info: Called by the GPU fault handler to get information about
>   *                  the fault
> + * @set_stall:     Configure whether stall on fault (CFCFG) is enabled.  Call
> + *                 before set_ttbr0_cfg().  If stalling on fault is enabled,
> + *                 the GPU driver must call resume_translation()
> + * @resume_translation: Resume translation after a fault
> + *
>   *
>   * The GPU driver (drm/msm) and adreno-smmu work together for controlling
>   * the GPU's SMMU instance.  This is by necessity, as the GPU is directly
> @@ -60,6 +65,8 @@ struct adreno_smmu_priv {
>      const struct io_pgtable_cfg *(*get_ttbr1_cfg)(const void *cookie);
>      int (*set_ttbr0_cfg)(const void *cookie, const struct io_pgtable_cfg *cfg);
>      void (*get_fault_info)(const void *cookie, struct adreno_smmu_fault_info *info);
> +    void (*set_stall)(const void *cookie, bool enabled);
> +    void (*resume_translation)(const void *cookie, bool terminate);
>  };
>  
>  #endif /* __ADRENO_SMMU_PRIV_H */
> -- 
> 2.31.1
> 


* Re: [PATCH v5 1/5] iommu/arm-smmu: Add support for driver IOMMU fault handlers
  2021-06-10 21:44 ` [PATCH v5 1/5] iommu/arm-smmu: Add support for driver IOMMU fault handlers Rob Clark
@ 2021-06-14 17:26   ` Bjorn Andersson
  0 siblings, 0 replies; 13+ messages in thread
From: Bjorn Andersson @ 2021-06-14 17:26 UTC (permalink / raw)
  To: Rob Clark
  Cc: dri-devel, iommu, Rob Clark, open list, Will Deacon,
	linux-arm-msm, Robin Murphy, Jordan Crouse, freedreno,
	moderated list:ARM SMMU DRIVERS

On Thu 10 Jun 16:44 CDT 2021, Rob Clark wrote:

> From: Jordan Crouse <jcrouse@codeaurora.org>
> 
> Call report_iommu_fault() to allow upper-level drivers to register their
> own fault handlers.
> 
> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
> Signed-off-by: Rob Clark <robdclark@chromium.org>
> Acked-by: Will Deacon <will@kernel.org>

Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>

Regards,
Bjorn

> ---
>  drivers/iommu/arm/arm-smmu/arm-smmu.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> index 6f72c4d208ca..b4b32d31fc06 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> @@ -408,6 +408,7 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
>  	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>  	struct arm_smmu_device *smmu = smmu_domain->smmu;
>  	int idx = smmu_domain->cfg.cbndx;
> +	int ret;
>  
>  	fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
>  	if (!(fsr & ARM_SMMU_FSR_FAULT))
> @@ -417,8 +418,12 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
>  	iova = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_FAR);
>  	cbfrsynra = arm_smmu_gr1_read(smmu, ARM_SMMU_GR1_CBFRSYNRA(idx));
>  
> -	dev_err_ratelimited(smmu->dev,
> -	"Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
> +	ret = report_iommu_fault(domain, NULL, iova,
> +		fsynr & ARM_SMMU_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
> +
> +	if (ret == -ENOSYS)
> +		dev_err_ratelimited(smmu->dev,
> +		"Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
>  			    fsr, iova, fsynr, cbfrsynra, idx);
>  
>  	arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr);
> -- 
> 2.31.1
> 
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu


* Re: [PATCH v5 2/5] iommu/arm-smmu-qcom: Add an adreno-smmu-priv callback to get pagefault info
  2021-06-10 21:44 ` [PATCH v5 2/5] iommu/arm-smmu-qcom: Add an adreno-smmu-priv callback to get pagefault info Rob Clark
@ 2021-06-14 17:30   ` Bjorn Andersson
  0 siblings, 0 replies; 13+ messages in thread
From: Bjorn Andersson @ 2021-06-14 17:30 UTC (permalink / raw)
  To: Rob Clark
  Cc: Rob Clark, Isaac J. Manjarres, Sai Prakash Ranjan, open list,
	Will Deacon, linux-arm-msm, Joerg Roedel, Jordan Crouse,
	dri-devel, Jordan Crouse, iommu, John Stultz, freedreno,
	moderated list:ARM SMMU DRIVERS, Robin Murphy

On Thu 10 Jun 16:44 CDT 2021, Rob Clark wrote:

> From: Jordan Crouse <jcrouse@codeaurora.org>
> 
> Add a callback in adreno-smmu-priv to read interesting SMMU
> registers to provide an opportunity for a richer debug experience
> in the GPU driver.
> 
> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
> Signed-off-by: Rob Clark <robdclark@chromium.org>

I presume this implies that more generic options have been discussed.
Regardless, if further conclusions are made in that regard I expect that
this could serve as a base for such efforts.

Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>

Regards,
Bjorn

> ---
>  drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 17 ++++++++++++
>  drivers/iommu/arm/arm-smmu/arm-smmu.h      |  2 ++
>  include/linux/adreno-smmu-priv.h           | 31 +++++++++++++++++++++-
>  3 files changed, 49 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> index 98b3a1c2a181..b2e31ea84128 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> @@ -32,6 +32,22 @@ static void qcom_adreno_smmu_write_sctlr(struct arm_smmu_device *smmu, int idx,
>  	arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, reg);
>  }
>  
> +static void qcom_adreno_smmu_get_fault_info(const void *cookie,
> +		struct adreno_smmu_fault_info *info)
> +{
> +	struct arm_smmu_domain *smmu_domain = (void *)cookie;
> +	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
> +
> +	info->fsr = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_FSR);
> +	info->fsynr0 = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_FSYNR0);
> +	info->fsynr1 = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_FSYNR1);
> +	info->far = arm_smmu_cb_readq(smmu, cfg->cbndx, ARM_SMMU_CB_FAR);
> +	info->cbfrsynra = arm_smmu_gr1_read(smmu, ARM_SMMU_GR1_CBFRSYNRA(cfg->cbndx));
> +	info->ttbr0 = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_TTBR0);
> +	info->contextidr = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_CONTEXTIDR);
> +}
> +
>  #define QCOM_ADRENO_SMMU_GPU_SID 0
>  
>  static bool qcom_adreno_smmu_is_gpu_device(struct device *dev)
> @@ -156,6 +172,7 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
>  	priv->cookie = smmu_domain;
>  	priv->get_ttbr1_cfg = qcom_adreno_smmu_get_ttbr1_cfg;
>  	priv->set_ttbr0_cfg = qcom_adreno_smmu_set_ttbr0_cfg;
> +	priv->get_fault_info = qcom_adreno_smmu_get_fault_info;
>  
>  	return 0;
>  }
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> index c31a59d35c64..84c21c4b0691 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> @@ -224,6 +224,8 @@ enum arm_smmu_cbar_type {
>  #define ARM_SMMU_CB_FSYNR0		0x68
>  #define ARM_SMMU_FSYNR0_WNR		BIT(4)
>  
> +#define ARM_SMMU_CB_FSYNR1		0x6c
> +
>  #define ARM_SMMU_CB_S1_TLBIVA		0x600
>  #define ARM_SMMU_CB_S1_TLBIASID		0x610
>  #define ARM_SMMU_CB_S1_TLBIVAL		0x620
> diff --git a/include/linux/adreno-smmu-priv.h b/include/linux/adreno-smmu-priv.h
> index a889f28afb42..53fe32fb9214 100644
> --- a/include/linux/adreno-smmu-priv.h
> +++ b/include/linux/adreno-smmu-priv.h
> @@ -8,6 +8,32 @@
>  
>  #include <linux/io-pgtable.h>
>  
> +/**
> + * struct adreno_smmu_fault_info - container for key fault information
> + *
> + * @far: The faulting IOVA from ARM_SMMU_CB_FAR
> + * @ttbr0: The current TTBR0 pagetable from ARM_SMMU_CB_TTBR0
> + * @contextidr: The value of ARM_SMMU_CB_CONTEXTIDR
> + * @fsr: The fault status from ARM_SMMU_CB_FSR
> + * @fsynr0: The value of FSYNR0 from ARM_SMMU_CB_FSYNR0
> + * @fsynr1: The value of FSYNR1 from ARM_SMMU_CB_FSYNR1
> + * @cbfrsynra: The value of CBFRSYNRA from ARM_SMMU_GR1_CBFRSYNRA(idx)
> + *
> + * This struct passes back key page fault information to the GPU driver
> + * through the get_fault_info function pointer.
> + * The GPU driver can use this information to print informative
> + * log messages and provide deeper GPU specific insight into the fault.
> + */
> +struct adreno_smmu_fault_info {
> +	u64 far;
> +	u64 ttbr0;
> +	u32 contextidr;
> +	u32 fsr;
> +	u32 fsynr0;
> +	u32 fsynr1;
> +	u32 cbfrsynra;
> +};
> +
>  /**
>   * struct adreno_smmu_priv - private interface between adreno-smmu and GPU
>   *
> @@ -17,6 +43,8 @@
>   * @set_ttbr0_cfg: Set the TTBR0 config for the GPUs context bank.  A
>   *                 NULL config disables TTBR0 translation, otherwise
>   *                 TTBR0 translation is enabled with the specified cfg
> + * @get_fault_info: Called by the GPU fault handler to get information about
> + *                  the fault
>   *
>   * The GPU driver (drm/msm) and adreno-smmu work together for controlling
>   * the GPU's SMMU instance.  This is by necessity, as the GPU is directly
> @@ -31,6 +59,7 @@ struct adreno_smmu_priv {
>      const void *cookie;
>      const struct io_pgtable_cfg *(*get_ttbr1_cfg)(const void *cookie);
>      int (*set_ttbr0_cfg)(const void *cookie, const struct io_pgtable_cfg *cfg);
> +    void (*get_fault_info)(const void *cookie, struct adreno_smmu_fault_info *info);
>  };
>  
> -#endif /* __ADRENO_SMMU_PRIV_H */
> \ No newline at end of file
> +#endif /* __ADRENO_SMMU_PRIV_H */
> -- 
> 2.31.1
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* Re: [PATCH v5 4/5] iommu/arm-smmu-qcom: Add stall support
  2021-06-10 21:44 ` [PATCH v5 4/5] iommu/arm-smmu-qcom: Add stall support Rob Clark
  2021-06-11 13:49   ` Jordan Crouse
@ 2021-06-14 17:54   ` Bjorn Andersson
  1 sibling, 0 replies; 13+ messages in thread
From: Bjorn Andersson @ 2021-06-14 17:54 UTC (permalink / raw)
  To: Rob Clark
  Cc: dri-devel, iommu, freedreno, linux-arm-msm, Jordan Crouse,
	Rob Clark, Will Deacon, Robin Murphy, Joerg Roedel,
	Isaac J. Manjarres, moderated list:ARM SMMU DRIVERS, open list

On Thu 10 Jun 16:44 CDT 2021, Rob Clark wrote:

> From: Rob Clark <robdclark@chromium.org>
> 
> Add, via the adreno-smmu-priv interface, a way for the GPU to request
> the SMMU to stall translation on faults, and then later resume the
> translation, either retrying or terminating the current translation.
> 
> This will be used on the GPU side to "freeze" the GPU while we snapshot
> useful state for devcoredump.
> 
> Signed-off-by: Rob Clark <robdclark@chromium.org>

Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>

Regards,
Bjorn

> ---
>  drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 33 ++++++++++++++++++++++
>  include/linux/adreno-smmu-priv.h           |  7 +++++
>  2 files changed, 40 insertions(+)
> 
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> index b2e31ea84128..61fc645c1325 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> @@ -13,6 +13,7 @@ struct qcom_smmu {
>  	struct arm_smmu_device smmu;
>  	bool bypass_quirk;
>  	u8 bypass_cbndx;
> +	u32 stall_enabled;
>  };
>  
>  static struct qcom_smmu *to_qcom_smmu(struct arm_smmu_device *smmu)
> @@ -23,12 +24,17 @@ static struct qcom_smmu *to_qcom_smmu(struct arm_smmu_device *smmu)
>  static void qcom_adreno_smmu_write_sctlr(struct arm_smmu_device *smmu, int idx,
>  		u32 reg)
>  {
> +	struct qcom_smmu *qsmmu = to_qcom_smmu(smmu);
> +
>  	/*
>  	 * On the GPU device we want to process subsequent transactions after a
>  	 * fault to keep the GPU from hanging
>  	 */
>  	reg |= ARM_SMMU_SCTLR_HUPCF;
>  
> +	if (qsmmu->stall_enabled & BIT(idx))
> +		reg |= ARM_SMMU_SCTLR_CFCFG;
> +
>  	arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, reg);
>  }
>  
> @@ -48,6 +54,31 @@ static void qcom_adreno_smmu_get_fault_info(const void *cookie,
>  	info->contextidr = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_CONTEXTIDR);
>  }
>  
> +static void qcom_adreno_smmu_set_stall(const void *cookie, bool enabled)
> +{
> +	struct arm_smmu_domain *smmu_domain = (void *)cookie;
> +	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
> +	struct qcom_smmu *qsmmu = to_qcom_smmu(smmu_domain->smmu);
> +
> +	if (enabled)
> +		qsmmu->stall_enabled |= BIT(cfg->cbndx);
> +	else
> +		qsmmu->stall_enabled &= ~BIT(cfg->cbndx);
> +}
> +
> +static void qcom_adreno_smmu_resume_translation(const void *cookie, bool terminate)
> +{
> +	struct arm_smmu_domain *smmu_domain = (void *)cookie;
> +	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
> +	u32 reg = 0;
> +
> +	if (terminate)
> +		reg |= ARM_SMMU_RESUME_TERMINATE;
> +
> +	arm_smmu_cb_write(smmu, cfg->cbndx, ARM_SMMU_CB_RESUME, reg);
> +}
> +
>  #define QCOM_ADRENO_SMMU_GPU_SID 0
>  
>  static bool qcom_adreno_smmu_is_gpu_device(struct device *dev)
> @@ -173,6 +204,8 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
>  	priv->get_ttbr1_cfg = qcom_adreno_smmu_get_ttbr1_cfg;
>  	priv->set_ttbr0_cfg = qcom_adreno_smmu_set_ttbr0_cfg;
>  	priv->get_fault_info = qcom_adreno_smmu_get_fault_info;
> +	priv->set_stall = qcom_adreno_smmu_set_stall;
> +	priv->resume_translation = qcom_adreno_smmu_resume_translation;
>  
>  	return 0;
>  }
> diff --git a/include/linux/adreno-smmu-priv.h b/include/linux/adreno-smmu-priv.h
> index 53fe32fb9214..c637e0997f6d 100644
> --- a/include/linux/adreno-smmu-priv.h
> +++ b/include/linux/adreno-smmu-priv.h
> @@ -45,6 +45,11 @@ struct adreno_smmu_fault_info {
>   *                 TTBR0 translation is enabled with the specified cfg
>   * @get_fault_info: Called by the GPU fault handler to get information about
>   *                  the fault
> + * @set_stall:     Configure whether stall on fault (CFCFG) is enabled.  Call
> + *                 before set_ttbr0_cfg().  If stalling on fault is enabled,
> + *                 the GPU driver must call resume_translation()
> + * @resume_translation: Resume translation after a fault
> + *
>   *
>   * The GPU driver (drm/msm) and adreno-smmu work together for controlling
>   * the GPU's SMMU instance.  This is by necessity, as the GPU is directly
> @@ -60,6 +65,8 @@ struct adreno_smmu_priv {
>      const struct io_pgtable_cfg *(*get_ttbr1_cfg)(const void *cookie);
>      int (*set_ttbr0_cfg)(const void *cookie, const struct io_pgtable_cfg *cfg);
>      void (*get_fault_info)(const void *cookie, struct adreno_smmu_fault_info *info);
> +    void (*set_stall)(const void *cookie, bool enabled);
> +    void (*resume_translation)(const void *cookie, bool terminate);
>  };
>  
>  #endif /* __ADRENO_SMMU_PRIV_H */
> -- 
> 2.31.1
> 
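
For illustration only, not part of the patch: a toy userspace model of the
stall bookkeeping in the quoted diff. The three mock_* functions follow the
logic of qcom_adreno_smmu_set_stall(), qcom_adreno_smmu_write_sctlr() and
qcom_adreno_smmu_resume_translation(). struct mock_smmu, NUM_CBS and the bit
positions below are assumptions made for the sketch; the real definitions
live in arm-smmu.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t u32;

#define BIT(n) (UINT32_C(1) << (n))

/* Illustrative bit positions; check arm-smmu.h for the real definitions. */
#define SCTLR_CFCFG		BIT(7)
#define SCTLR_HUPCF		BIT(8)
#define RESUME_TERMINATE	BIT(0)

#define NUM_CBS 8	/* hypothetical number of context banks */

/* Toy model of the SMMU state the patch touches. */
struct mock_smmu {
	u32 stall_enabled;	/* one bit per context bank, as in the patch */
	u32 sctlr[NUM_CBS];	/* last value written to each bank's SCTLR */
	u32 resume[NUM_CBS];	/* last value written to each bank's RESUME */
};

/* Like set_stall in the patch: only records the request in the bitmask. */
static void mock_set_stall(struct mock_smmu *s, int cbndx, bool enabled)
{
	assert(cbndx >= 0 && cbndx < NUM_CBS);
	if (enabled)
		s->stall_enabled |= BIT(cbndx);
	else
		s->stall_enabled &= ~BIT(cbndx);
}

/* Like write_sctlr in the patch: the recorded request takes effect here. */
static void mock_write_sctlr(struct mock_smmu *s, int cbndx, u32 reg)
{
	assert(cbndx >= 0 && cbndx < NUM_CBS);
	reg |= SCTLR_HUPCF;
	if (s->stall_enabled & BIT(cbndx))
		reg |= SCTLR_CFCFG;
	s->sctlr[cbndx] = reg;
}

/* Like resume_translation in the patch: 0 retries, TERMINATE aborts. */
static void mock_resume_translation(struct mock_smmu *s, int cbndx,
				    bool terminate)
{
	u32 reg = 0;

	assert(cbndx >= 0 && cbndx < NUM_CBS);
	if (terminate)
		reg |= RESUME_TERMINATE;
	s->resume[cbndx] = reg;
}
```

In this model, calling mock_set_stall() alone does not change SCTLR; the
CFCFG bit only appears on the next mock_write_sctlr() for that bank. That is
presumably why the kerneldoc asks callers to invoke set_stall before
set_ttbr0_cfg().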



* Re: [PATCH v5 0/5] iommu/arm-smmu: adreno-smmu page fault handling
  2021-06-10 21:44 [PATCH v5 0/5] iommu/arm-smmu: adreno-smmu page fault handling Rob Clark
                   ` (2 preceding siblings ...)
  2021-06-10 21:44 ` [PATCH v5 4/5] iommu/arm-smmu-qcom: Add stall support Rob Clark
@ 2021-07-04 12:53 ` Dmitry Baryshkov
  2021-07-04 18:20   ` Rob Clark
  3 siblings, 1 reply; 13+ messages in thread
From: Dmitry Baryshkov @ 2021-07-04 12:53 UTC (permalink / raw)
  To: Rob Clark, dri-devel, iommu
  Cc: Konrad Dybcio, Akhil P Oommen, open list, Eric Anholt,
	AngeloGioacchino Del Regno, Marijn Suijten, Lee Jones, Rob Clark,
	Sai Prakash Ranjan, Jonathan Marek, Will Deacon, Bjorn Andersson,
	Joerg Roedel, linux-arm-msm, Sharat Masetty, John Stultz,
	Jordan Crouse, moderated list:ARM SMMU DRIVERS,
	Isaac J. Manjarres, Robin Murphy, Douglas Anderson,
	Zhenzhong Duan, Kristian H. Kristensen, freedreno

[-- Attachment #1: Type: text/plain, Size: 3072 bytes --]

Hi,

I had the splash screen disabled on my RB3. However, once I enabled it,
I got the attached crash during boot on msm/msm-next. It looks like it
is related to this particular set of changes.

On 11/06/2021 00:44, Rob Clark wrote:
> From: Rob Clark <robdclark@chromium.org>
> 
> This picks up an earlier series[1] from Jordan, and adds additional
> support needed to generate GPU devcore dumps on iova faults.  Original
> description:
> 
> This is a stack to add an Adreno GPU specific handler for pagefaults. The first
> patch starts by wiring up report_iommu_fault for arm-smmu. The next patch adds
> an adreno-smmu-priv function hook to capture a handful of important debugging
> registers such as TTBR0, CONTEXTIDR, FSYNR0 and others. This is used by the
> third patch to print more detailed information on page fault such as the TTBR0
> for the pagetable that caused the fault and the source of the fault as
> determined by a combination of the FSYNR1 register and an internal GPU
> register.
> 
> This code provides a solid base that we can expand on later for even more
> extensive GPU side page fault debugging capabilities.
> 
> v5: [Rob] Use RBBM_STATUS3.SMMU_STALLED_ON_FAULT to detect case where
>      GPU snapshotting needs to avoid crashdumper, and check the
>      RBBM_STATUS3.SMMU_STALLED_ON_FAULT in GPU hang irq paths
> v4: [Rob] Add support to stall SMMU on fault, and let the GPU driver
>      resume translation after it has had a chance to snapshot the GPUs
>      state
> v3: Always clear FSR even if the target driver is going to handle resume
> v2: Fix comment wording and function pointer check per Rob Clark
> 
> [1] https://lore.kernel.org/dri-devel/20210225175135.91922-1-jcrouse@codeaurora.org/
> 
> Jordan Crouse (3):
>    iommu/arm-smmu: Add support for driver IOMMU fault handlers
>    iommu/arm-smmu-qcom: Add an adreno-smmu-priv callback to get pagefault
>      info
>    drm/msm: Improve the a6xx page fault handler
> 
> Rob Clark (2):
>    iommu/arm-smmu-qcom: Add stall support
>    drm/msm: devcoredump iommu fault support
> 
>   drivers/gpu/drm/msm/adreno/a5xx_gpu.c       |  23 +++-
>   drivers/gpu/drm/msm/adreno/a6xx_gpu.c       | 110 +++++++++++++++++++-
>   drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c |  42 ++++++--
>   drivers/gpu/drm/msm/adreno/adreno_gpu.c     |  15 +++
>   drivers/gpu/drm/msm/msm_gem.h               |   1 +
>   drivers/gpu/drm/msm/msm_gem_submit.c        |   1 +
>   drivers/gpu/drm/msm/msm_gpu.c               |  48 +++++++++
>   drivers/gpu/drm/msm/msm_gpu.h               |  17 +++
>   drivers/gpu/drm/msm/msm_gpummu.c            |   5 +
>   drivers/gpu/drm/msm/msm_iommu.c             |  22 +++-
>   drivers/gpu/drm/msm/msm_mmu.h               |   5 +-
>   drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c  |  50 +++++++++
>   drivers/iommu/arm/arm-smmu/arm-smmu.c       |   9 +-
>   drivers/iommu/arm/arm-smmu/arm-smmu.h       |   2 +
>   include/linux/adreno-smmu-priv.h            |  38 ++++++-
>   15 files changed, 367 insertions(+), 21 deletions(-)
> 


-- 
With best wishes
Dmitry

[-- Attachment #2: log-rb3-crash.txt --]
[-- Type: text/plain, Size: 29485 bytes --]

Handling Cmd: set_active:a
SetActiveSlot: _a already active slot
Handling Cmd: download:061b8800
Download Finished
Handling Cmd: boot
A/B retry count NOT decremented
Booting Into Mission Mode
No dtbo partition is found, Skip dtbo
Exit key detection timer
GetVmData: making ScmCall to get HypInfo
GetVmData: No Vm data present! Status = (0x3)
No Ffbm cookie found, ignore: Not Found
Memory Base Address: 0x80000000
Decompressing kernel image start: 32313 ms
Decompressing kernel image done: 35477 ms
BootLinux: failed to get dtbo image
DTB offset is incorrect, kernel image does not have appended DTB
Cmdline: ignore_loglevel console=ttyMSM0,115200n8 clk_ignore_unused pd_ignore_unused earlycon pcie_pme=nomsi root=PARTLABEL=userdata rootwait androidboot.bootdevice=1d84000.ufshc androidboot.serialno=8f186bb6 androidboot.baseband=msm msm_drm.dsi_display0=
RAM Partitions
Add Base: 0x0000000080000000 Available Length: 0x00000000FDFA0000 
WARNING: Unsupported EFI_RAMPARTITION_PROTOCOL
ERROR: Could not get splash memory region node
kaslr-Seed is added to chosen node

Shutting Down UEFI Boot Services: 36987 ms
BDS: LogFs sync skipped, Unsupported
App Log Flush : 0 ms
Exit BS        [37141] UEFI End
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x517f803c]
[    0.000000] Linux version 5.13.0-rc3-00115-g9b6193ea776c-dirty (lumag@eriador) (aarch64-linux-gnu-gcc (Debian 11.1.0-1) 11.1.0, GNU ld (GNU Binutils for Debian) 2.35.2) #142 SMP PREEMPT Sun Jul 4 15:25:15 MSK 2021
[    0.000000] Machine model: Thundercomm Dragonboard 845c
[    0.000000] printk: debug: ignoring loglevel setting.
[    0.000000] earlycon: qcom_geni0 at MMIO 0x0000000000a84000 (options '115200n8')
[    0.000000] printk: bootconsole [qcom_geni0] enabled
[    0.000000] efi: UEFI not found.
[    0.000000] NUMA: No NUMA configuration found
[    0.000000] NUMA: Faking a node at [mem 0x0000000080000000-0x000000017df9ffff]
[    0.000000] NUMA: NODE_DATA [mem 0x17d78c200-0x17d78dfff]
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000080000000-0x00000000ffffffff]
[    0.000000]   DMA32    empty
[    0.000000]   Normal   [mem 0x0000000100000000-0x000000017df9ffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000080000000-0x00000000856fffff]
[    0.000000]   node   0: [mem 0x0000000085700000-0x0000000085cfffff]
[    0.000000]   node   0: [mem 0x0000000085d00000-0x0000000085dfffff]
[    0.000000]   node   0: [mem 0x0000000085e00000-0x0000000085efffff]
[    0.000000]   node   0: [mem 0x0000000085f00000-0x0000000085fbffff]
[    0.000000]   node   0: [mem 0x0000000085fc0000-0x00000000890fffff]
[    0.000000]   node   0: [mem 0x0000000089100000-0x000000008aafffff]
[    0.000000]   node   0: [mem 0x000000008ab00000-0x000000008c416fff]
[    0.000000]   node   0: [mem 0x000000008c417000-0x000000008c4fffff]
[    0.000000]   node   0: [mem 0x000000008c500000-0x0000000097bfffff]
[    0.000000]   node   0: [mem 0x0000000097c00000-0x000000017df9ffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000080000000-0x000000017df9ffff]
[    0.000000] On node 0 totalpages: 1040288
[    0.000000]   DMA zone: 8192 pages used for memmap
[    0.000000]   DMA zone: 0 pages reserved
[    0.000000]   DMA zone: 524288 pages, LIFO batch:63
[    0.000000]   Normal zone: 8063 pages used for memmap
[    0.000000]   Normal zone: 516000 pages, LIFO batch:63
[    0.000000]   Normal zone: 96 pages in unavailable ranges
[    0.000000] cma: Reserved 32 MiB at 0x00000000fe000000
[    0.000000] psci: probing for conduit method from DT.
[    0.000000] psci: PSCIv1.1 detected in firmware.
[    0.000000] psci: Using standard PSCI v0.2 function IDs
[    0.000000] psci: MIGRATE_INFO_TYPE not supported.
[    0.000000] psci: SMC Calling Convention v1.2
[    0.000000] psci: OSI mode supported.
[    0.000000] percpu: Embedded 23 pages/cpu s56152 r8192 d29864 u94208
[    0.000000] pcpu-alloc: s56152 r8192 d29864 u94208 alloc=23*4096
[    0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 [0] 4 [0] 5 [0] 6 [0] 7 
[    0.000000] Detected VIPT I-cache on CPU0
[    0.000000] CPU features: detected: GIC system register CPU interface
[    0.000000] CPU features: detected: Hardware dirty bit management
[    0.000000] CPU features: kernel page table isolation forced ON by KASLR
[    0.000000] CPU features: detected: Kernel page table isolation (KPTI)
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 1024033
[    0.000000] Policy zone: Normal
[    0.000000] Kernel command line: ignore_loglevel console=ttyMSM0,115200n8 clk_ignore_unused pd_ignore_unused earlycon pcie_pme=nomsi root=PARTLABEL=userdata rootwait androidboot.bootdevice=1d84000.ufshc androidboot.serialno=8f186bb6 androidboot.baseband=msm msm_drm.dsi_display0=dsi_lt9611_1080_video_display:
[    0.000000] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes, linear)
[    0.000000] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes, linear)
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] software IO TLB: mapped [mem 0x00000000fa000000-0x00000000fe000000] (64MB)
[    0.000000] Memory: 3593368K/4161152K available (14400K kernel code, 2062K rwdata, 7628K rodata, 3328K init, 418K bss, 535016K reserved, 32768K cma-reserved)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1
[    0.000000] rcu: Preemptible hierarchical RCU implementation.
[    0.000000] rcu: 	RCU event tracing is enabled.
[    0.000000] rcu: 	RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=8.
[    0.000000] 	Trampoline variant of Tasks RCU enabled.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[    0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8
[    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[    0.000000] GICv3: 768 SPIs implemented
[    0.000000] GICv3: 0 Extended SPIs implemented
[    0.000000] GICv3: Distributor has no Range Selector support
[    0.000000] Root IRQ handler: gic_handle_irq
[    0.000000] GICv3: 16 PPIs implemented
[    0.000000] GICv3: CPU0: found redistributor 0 region 0:0x0000000017a60000
[    0.000000] ITS: No ITS available, not enabling LPIs
[    0.000000] random: get_random_bytes called from start_kernel+0x348/0x52c with crng_init=0
[    0.000000] arch_timer: cp15 and mmio timer(s) running at 19.20MHz (virt/virt).
[    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x46d987e47, max_idle_ns: 440795202767 ns
[    0.000001] sched_clock: 56 bits at 19MHz, resolution 52ns, wraps every 4398046511078ns
[    0.008313] Console: colour dummy device 80x25
[    0.012882] Calibrating delay loop (skipped), value calculated using timer frequency.. 38.40 BogoMIPS (lpj=76800)
[    0.023232] pid_max: default: 32768 minimum: 301
[    0.027957] LSM: Security Framework initializing
[    0.032677] Mount-cache hash table entries: 8192 (order: 4, 65536 bytes, linear)
[    0.040145] Mountpoint-cache hash table entries: 8192 (order: 4, 65536 bytes, linear)
[    0.049625] rcu: Hierarchical SRCU implementation.
[    0.055932] EFI services will not be available.
[    0.060866] smp: Bringing up secondary CPUs ...
[    0.067019] Detected VIPT I-cache on CPU1
[    0.067073] GICv3: CPU1: found redistributor 100 region 0:0x0000000017a80000
[    0.067166] CPU1: Booted secondary processor 0x0000000100 [0x517f803c]
[    0.068727] Detected VIPT I-cache on CPU2
[    0.068761] GICv3: CPU2: found redistributor 200 region 0:0x0000000017aa0000
[    0.068827] CPU2: Booted secondary processor 0x0000000200 [0x517f803c]
[    0.070471] Detected VIPT I-cache on CPU3
[    0.070496] GICv3: CPU3: found redistributor 300 region 0:0x0000000017ac0000
[    0.070555] CPU3: Booted secondary processor 0x0000000300 [0x517f803c]
[    0.072816] CPU features: detected: Spectre-v2
[    0.072830] Detected VIPT I-cache on CPU4
[    0.072856] GICv3: CPU4: found redistributor 400 region 0:0x0000000017ae0000
[    0.072918] CPU4: Booted secondary processor 0x0000000400 [0x516f802d]
[    0.074954] Detected VIPT I-cache on CPU5
[    0.074984] GICv3: CPU5: found redistributor 500 region 0:0x0000000017b00000
[    0.075047] CPU5: Booted secondary processor 0x0000000500 [0x516f802d]
[    0.077204] Detected VIPT I-cache on CPU6
[    0.077235] GICv3: CPU6: found redistributor 600 region 0:0x0000000017b20000
[    0.077297] CPU6: Booted secondary processor 0x0000000600 [0x516f802d]
[    0.079653] Detected VIPT I-cache on CPU7
[    0.079686] GICv3: CPU7: found redistributor 700 region 0:0x0000000017b40000
[    0.079749] CPU7: Booted secondary processor 0x0000000700 [0x516f802d]
[    0.079831] smp: Brought up 1 node, 8 CPUs
[    0.212583] SMP: Total of 8 processors activated.
[    0.217331] CPU features: detected: 32-bit EL0 Support
[    0.222532] CPU features: detected: 32-bit EL1 Support
[    0.227722] CPU features: detected: Data cache clean to the PoU not required for I/D coherence
[    0.236401] CPU features: detected: Common not Private translations
[    0.242723] CPU features: detected: CRC32 instructions
[    0.247912] CPU features: detected: RCpc load-acquire (LDAPR)
[    0.253711] CPU features: detected: LSE atomic instructions
[    0.259338] CPU features: detected: Privileged Access Never
[    0.264966] CPU features: detected: RAS Extension Support
[    0.294165] CPU: All CPU(s) started at EL1
[    0.298362] alternatives: patching kernel code
[    0.304965] devtmpfs: initialized
[    0.319923] KASLR enabled
[    0.322745] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.332582] futex hash table entries: 2048 (order: 5, 131072 bytes, linear)
[    0.340366] pinctrl core: initialized pinctrl subsystem
[    0.346196] DMI not present or invalid.
[    0.350404] NET: Registered protocol family 16
[    0.356280] DMA: preallocated 512 KiB GFP_KERNEL pool for atomic allocations
[    0.363529] DMA: preallocated 512 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
[    0.371545] DMA: preallocated 512 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
[    0.379581] audit: initializing netlink subsys (disabled)
[    0.385161] audit: type=2000 audit(0.292:1): state=initialized audit_enabled=0 res=1
[    0.385517] thermal_sys: Registered thermal governor 'step_wise'
[    0.392973] thermal_sys: Registered thermal governor 'power_allocator'
[    0.401318] cpuidle: using governor menu
[    0.411961] NET: Registered protocol family 42
[    0.416616] hw-breakpoint: found 6 breakpoint and 4 watchpoint registers.
[    0.423672] ASID allocator initialised with 32768 entries
[    0.430661] Serial: AMBA PL011 UART driver
[    0.447842] platform 1c06000.phy: Fixing up cyclic dependency with 100000.clock-controller
[    0.456596] platform 1c0a000.phy: Fixing up cyclic dependency with 100000.clock-controller
[    0.493694] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
[    0.500473] HugeTLB registered 32.0 MiB page size, pre-allocated 0 pages
[    0.507245] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[    0.514004] HugeTLB registered 64.0 KiB page size, pre-allocated 0 pages
[    0.521872] cryptd: max_cpu_qlen set to 1000
[    0.529643] VBAT: supplied by DC12V
[    0.533365] VBAT_SOM: supplied by DC12V
[    0.537418] VDC_3V3: supplied by DC12V
[    0.541371] VDC_5V: supplied by DC12V
[    0.545342] vph_pwr: supplied by VBAT_SOM
[    0.549974] iommu: Default domain type: Translated 
[    0.555030] vgaarb: loaded
[    0.557977] SCSI subsystem initialized
[    0.561867] libata version 3.00 loaded.
[    0.565887] usbcore: registered new interface driver usbfs
[    0.571455] usbcore: registered new interface driver hub
[    0.576838] usbcore: registered new device driver usb
[    0.582196] mc: Linux media interface: v0.10
[    0.586535] videodev: Linux video capture interface: v2.00
[    0.592128] pps_core: LinuxPPS API ver. 1 registered
[    0.597147] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[    0.606357] PTP clock support registered
[    0.610437] EDAC MC: Ver: 3.0.0
[    0.614065] qcom_scm: convention: smc arm 64
[    0.618941] FPGA manager framework
[    0.622446] Advanced Linux Sound Architecture Driver Initialized.
[    0.628970] Bluetooth: Core ver 2.22
[    0.632605] NET: Registered protocol family 31
[    0.637098] Bluetooth: HCI device and connection manager initialized
[    0.643517] Bluetooth: HCI socket layer initialized
[    0.648444] Bluetooth: L2CAP socket layer initialized
[    0.653551] Bluetooth: SCO socket layer initialized
[    0.658862] clocksource: Switched to clocksource arch_sys_counter
[    0.665127] VFS: Disk quotas dquot_6.6.0
[    0.669140] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    0.681221] NET: Registered protocol family 2
[    0.685747] IP idents hash table entries: 65536 (order: 7, 524288 bytes, linear)
[    0.695536] tcp_listen_portaddr_hash hash table entries: 2048 (order: 3, 32768 bytes, linear)
[    0.704215] TCP established hash table entries: 32768 (order: 6, 262144 bytes, linear)
[    0.712366] TCP bind hash table entries: 32768 (order: 7, 524288 bytes, linear)
[    0.720117] TCP: Hash tables configured (established 32768 bind 32768)
[    0.726796] UDP hash table entries: 2048 (order: 4, 65536 bytes, linear)
[    0.733633] UDP-Lite hash table entries: 2048 (order: 4, 65536 bytes, linear)
[    0.740999] NET: Registered protocol family 1
[    0.745779] RPC: Registered named UNIX socket transport module.
[    0.751771] RPC: Registered udp transport module.
[    0.756530] RPC: Registered tcp transport module.
[    0.761287] RPC: Registered tcp NFSv4.1 backchannel transport module.
[    0.768282] PCI: CLS 0 bytes, default 64
[    0.772442] Unpacking initramfs...
[    0.776152] hw perfevents: enabled with armv8_pmuv3 PMU driver, 7 counters available
[    0.784475] kvm [1]: HYP mode not available
[    0.792629] Initialise system trusted keyrings
[    0.797251] workingset: timestamp_bits=44 max_order=20 bucket_order=0
[    0.808337] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[    0.814681] NFS: Registering the id_resolver key type
[    0.819805] Key type id_resolver registered
[    0.824038] Key type id_legacy registered
[    0.828149] nfs4filelayout_init: NFSv4 File Layout Driver Registering...
[    0.834912] nfs4flexfilelayout_init: NFSv4 Flexfile Layout Driver Registering...
[    0.842521] 9p: Installing v9fs 9p2000 file system support
[    0.881103] Key type asymmetric registered
[    0.885248] Asymmetric key parser 'x509' registered
[    0.890203] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 244)
[    0.897664] io scheduler mq-deadline registered
[    0.902244] io scheduler kyber registered
[    0.918213] bob: Setting 3312000-3600000uV
[    0.922549] bob: supplied by vph_pwr
[    0.929266] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[    0.936983] msm_serial: driver initialized
[    0.942224] arm-smmu 15000000.iommu: probing hardware configuration...
[    0.948831] arm-smmu 15000000.iommu: SMMUv2 with:
[    0.953607] arm-smmu 15000000.iommu: 	stage 1 translation
[    0.959060] arm-smmu 15000000.iommu: 	non-coherent table walk
[    0.964865] arm-smmu 15000000.iommu: 	(IDR0.CTTW overridden by FW configuration)
[    0.972331] arm-smmu 15000000.iommu: 	stream matching with 76 register groups
[    0.979550] arm-smmu 15000000.iommu: 	45 context banks (0 stage-2 only)
[    0.986694] arm-smmu 15000000.iommu: 	Supported page sizes: 0x61311000
[    0.993287] arm-smmu 15000000.iommu: 	Stage-1: 48-bit VA -> 48-bit IPA
[    1.012286] loop: module loaded
[    1.016058] megasas: 07.714.04.00-rc1
[    1.022118] spmi spmi-0: PMIC arbiter version v5 (0x50000000)
[    1.036242] libphy: Fixed MDIO Bus: probed
[    1.041077] tun: Universal TUN/TAP device driver, 1.6
[    1.046346] CAN device driver interface
[    1.050486] usbcore: registered new interface driver asix
[    1.055970] usbcore: registered new interface driver ax88179_178a
[    1.062236] VFIO - User Level meta-driver version: 0.3
[    1.068820] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    1.075420] ehci-pci: EHCI PCI platform driver
[    1.079933] ehci-platform: EHCI generic platform driver
[    1.085345] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[    1.091604] ohci-pci: OHCI PCI platform driver
[    1.096121] ohci-platform: OHCI generic platform driver
[    1.101857] usbcore: registered new interface driver usb-storage
[    1.109186] udc-core: couldn't find an available UDC - added [zero] to list of pending drivers
[    1.119295] rtc-pm8xxx c440000.spmi:pmic@0:rtc@6000: registered as rtc0
[    1.126041] rtc-pm8xxx c440000.spmi:pmic@0:rtc@6000: setting system clock to 1970-01-01T00:00:37 UTC (37)
[    1.135823] i2c /dev entries driver
[    1.146180] Bluetooth: HCI UART driver ver 2.3
[    1.150692] Bluetooth: HCI UART protocol H4 registered
[    1.155881] Bluetooth: HCI UART protocol BCSP registered
[    1.161264] Bluetooth: HCI UART protocol LL registered
[    1.166467] Bluetooth: HCI UART protocol Three-wire (H5) registered
[    1.172875] Bluetooth: HCI UART protocol Broadcom registered
[    1.178601] Bluetooth: HCI UART protocol QCA registered
[    1.185508] sdhci: Secure Digital Host Controller Interface driver
[    1.191750] sdhci: Copyright(c) Pierre Ossman
[    1.196249] Synopsys Designware Multimedia Card Interface Driver
[    1.202775] sdhci-pltfm: SDHCI platform and OF driver helper
[    1.210549] ledtrig-cpu: registered to indicate activity on CPUs
[    1.216870] SMCCC: SOC_ID: ARCH_FEATURES(ARCH_SOC_ID) returned error: fffffffffffffffe
[    1.225556] usbcore: registered new interface driver usbhid
[    1.231194] usbhid: USB HID core driver
[    1.236471] qcom_q6v5_pas remoteproc-adsp: supply cx not found, using dummy regulator
[    1.244499] qcom_q6v5_pas remoteproc-adsp: supply px not found, using dummy regulator
[    1.253485] remoteproc remoteproc0: remoteproc-adsp is available
[    1.260008] qcom_q6v5_pas remoteproc-cdsp: supply cx not found, using dummy regulator
[    1.268045] qcom_q6v5_pas remoteproc-cdsp: supply px not found, using dummy regulator
[    1.277141] remoteproc remoteproc1: remoteproc-cdsp is available
[    1.313294] NET: Registered protocol family 17
[    1.317807] can: controller area network core
[    1.322237] NET: Registered protocol family 29
[    1.326722] Bluetooth: HIDP (Human Interface Emulation) ver 1.2
[    1.332699] Bluetooth: HIDP socket layer initialized
[    1.337824] 9pnet: Installing 9P2000 support
[    1.342172] Key type dns_resolver registered
[    1.346923] registered taskstats version 1
[    1.351069] Loading compiled-in X.509 certificates
[    1.358036] amba 6041000.funnel: Fixing up cyclic dependency with 6002000.stm
[    1.366636] amba 6045000.funnel: Fixing up cyclic dependency with 6043000.funnel
[    1.374140] amba 6045000.funnel: Fixing up cyclic dependency with 6041000.funnel
[    1.382975] amba 6047000.etf: Fixing up cyclic dependency with 6045000.funnel
[    1.390208] amba 6047000.etf: Fixing up cyclic dependency with 6046000.replicator
[    1.398500] amba 6048000.etr: Fixing up cyclic dependency with 6046000.replicator
[    1.413307] amba 7800000.funnel: Fixing up cyclic dependency with 7740000.etm
[    1.420572] amba 7800000.funnel: Fixing up cyclic dependency with 7640000.etm
[    1.427803] amba 7800000.funnel: Fixing up cyclic dependency with 7540000.etm
[    1.435031] amba 7800000.funnel: Fixing up cyclic dependency with 7440000.etm
[    1.442253] amba 7800000.funnel: Fixing up cyclic dependency with 7340000.etm
[    1.449473] amba 7800000.funnel: Fixing up cyclic dependency with 7240000.etm
[    1.456704] amba 7800000.funnel: Fixing up cyclic dependency with 7140000.etm
[    1.463925] amba 7800000.funnel: Fixing up cyclic dependency with 7040000.etm
[    1.471914] amba 7810000.funnel: Fixing up cyclic dependency with 7800000.funnel
[    1.479405] amba 7810000.funnel: Fixing up cyclic dependency with 6043000.funnel
[    1.508341] LT9611_1V8: supplied by VDC_5V
[    1.512917] LT9611_3V3: supplied by VDC_3V3
[    1.517507] PCIE0_1.05V: supplied by VBAT
[    1.522532] CAM0_DVDD_1V2: supplied by VBAT
[    1.527434] CAM0_AVDD_2V8: supplied by VBAT
[    1.531941] CAM3_AVDD_2V8: supplied by VBAT
[    1.536597] VLDO_3V3: supplied by VBAT
[    1.540681] V5P0_HDMIOUT: supplied by VDC_5V
[    1.547168] bam-dma-engine 1dc4000.dma: Adding to iommu group 0
[    1.553801] bam-dma-engine 1dc4000.dma: num-channels unspecified in dt
[    1.560399] bam-dma-engine 1dc4000.dma: num-ees unspecified in dt
[    1.568653] bam-dma-engine 17184000.dma-controller: Adding to iommu group 1
[    1.577559] geni_se_qup 8c0000.geniqup: Adding to iommu group 2
[    1.592398] 898000.serial: ttyHS0 at MMIO 0x898000 (irq = 166, base_baud = 0) is a MSM
[    1.602930] serial serial0: tty port ttyHS0 registered
[    1.603587] Bluetooth: Failed to init regulators:-517
[    1.609497] geni_se_qup ac0000.geniqup: Adding to iommu group 3
[    1.622549] a84000.serial: ttyMSM0 at MMIO 0xa84000 (irq = 167, base_baud = 0) is a MSM
[    1.638796] printk: console [ttyMSM0] enabled
[    1.638796] printk: console [ttyMSM0] enabled
[    1.647593] printk: bootconsole [qcom_geni0] disabled
[    1.647593] printk: bootconsole [qcom_geni0] disabled
[    1.666234] random: fast init done
[    1.670684] i2c 10-003b: Fixing up cyclic dependency with hdmi-out
[    1.679756] geni_i2c a8c000.i2c: Bus frequency not specified, default to 100kHz.
[    1.693843] geni_i2c a98000.i2c: Bus frequency not specified, default to 100kHz.
[    1.705196] smps3: Setting 1352000-1352000uV
[    1.710256] smps3: supplied by vph_pwr
[    1.714220] smps5: Setting 1904000-2040000uV
[    1.718740] smps5: supplied by vph_pwr
[    1.722627] smps7: Setting 900000-1028000uV
[    1.726971] smps7: supplied by vph_pwr
[    1.730836] ldo1: Setting 880000-880000uV
[    1.735343] ldo1: supplied by smps7
[    1.738960] ldo5: Setting 800000-800000uV
[    1.743455] ldo5: supplied by smps7
[    1.747068] ldo12: Setting 1800000-1800000uV
[    1.752255] ldo12: supplied by smps5
[    1.756033] ldo7: Setting 1800000-1800000uV
[    1.761124] ldo7: supplied by smps5
[    1.764804] ldo13: Setting 1800000-2960000uV
[    1.770425] ldo13: supplied by bob
[    1.774003] ldo17: Setting 1304000-1304000uV
[    1.778728] ldo17: supplied by smps3
[    1.782437] ldo20: Setting 2960000-2968000uV
[    1.787539] ldo20: supplied by bob
[    1.791093] ldo21: Setting 2960000-2968000uV
[    1.796245] ldo21: supplied by bob
[    1.799766] ldo24: Setting 3088000-3088000uV
[    1.804833] ldo24: supplied by bob
[    1.808347] ldo25: Setting 3300000-3312000uV
[    1.813408] ldo25: supplied by bob
[    1.816918] ldo26: Setting 1200000-1200000uV
[    1.822026] ldo26: supplied by smps3
[    1.826922] lvs1: supplied by vreg_s4a_1p8
[    1.832370] lvs2: supplied by vreg_s4a_1p8
[    1.838021] arm-smmu 5040000.iommu: probing hardware configuration...
[    1.844532] arm-smmu 5040000.iommu: SMMUv2 with:
[    1.849238] arm-smmu 5040000.iommu: 	stage 1 translation
[    1.854602] arm-smmu 5040000.iommu: 	address translation ops
[    1.860317] arm-smmu 5040000.iommu: 	non-coherent table walk
[    1.866029] arm-smmu 5040000.iommu: 	(IDR0.CTTW overridden by FW configuration)
[    1.873444] arm-smmu 5040000.iommu: 	stream matching with 5 register groups
[    1.880485] arm-smmu 5040000.iommu: 	5 context banks (0 stage-2 only)
[    1.887007] arm-smmu 5040000.iommu: 	Supported page sizes: 0x63315000
[    1.893511] arm-smmu 5040000.iommu: 	Stage-1: 48-bit VA -> 36-bit IPA
[    1.906923] adreno 5000000.gpu: Adding to iommu group 4
[    1.914037] msm ae00000.mdss: Adding to iommu group 5
[    1.920258] platform ae94000.dsi: Fixing up cyclic dependency with 10-003b
[    1.927259] platform ae94000.dsi: Fixing up cyclic dependency with ae01000.mdp
[    1.958684] qcrypto 1dfa000.crypto: Adding to iommu group 0
[    1.959006] sdhci_msm 8804000.sdhci: Adding to iommu group 6
[    1.970817] sdhci_msm 8804000.sdhci: Got CD GPIO
[    1.973397] remoteproc remoteproc2: 4080000.remoteproc is available
[    1.976097] sdhci_msm 8804000.sdhci: TCXO clk not present (-2)
[    1.991236] cpu cpu0: EM: created perf domain
[    1.999243] cpu cpu4: EM: created perf domain
[    2.006161] qcom-qmp-phy 1c06000.phy: Registered Qcom-QMP phy
[    2.012325] qcom-qmp-phy 1c0a000.phy: Registered Qcom-QMP phy
[    2.018368] qcom-qmp-phy 1d87000.phy: Registered Qcom-QMP phy
[    2.021541] mmc0: SDHCI controller on 8804000.sdhci [8804000.sdhci] using ADMA 64-bit
[    2.024451] qcom-qmp-phy 88e9000.phy: Registered Qcom-QMP phy
[    2.038244] qcom-qmp-phy 88eb000.phy: Registered Qcom-QMP phy
[    2.044284] qcom-qusb2-phy 88e2000.phy: Registered Qcom-QUSB2 phy
[    2.050633] qcom-qusb2-phy 88e3000.phy: Registered Qcom-QUSB2 phy
[    2.057089] dw-pcie 1c00000.pci: Adding to iommu group 7
[    2.062619] dw-pcie 1c00000.pci: IRQ index 1 not found
[    2.068312] dw-pcie 1c08000.pci: Adding to iommu group 8
[    2.073856] dw-pcie 1c08000.pci: IRQ index 1 not found
[    2.198106] Bluetooth: hci0: setting up wcn399x
[    2.261308] lt9611 10-003b: LT9611 revision: 0xe1
[    2.279914] msm ae00000.mdss: bound ae01000.mdp (ops dpu_ops)
[    2.285798] msm_dsi ae94000.dsi: supply gdsc not found, using dummy regulator
[    2.293518] msm ae00000.mdss: bound ae94000.dsi (ops dsi_ops)
[    2.300219] adreno 5000000.gpu: supply vdd not found, using dummy regulator
[    2.307354] adreno 5000000.gpu: supply vddcx not found, using dummy regulator
[    2.314635] adreno 5000000.gpu: [drm:msm_gpu_init] *ERROR* Couldn't register GPU cooling device
[    2.324199] platform 506a000.gmu: Adding to iommu group 9
[    2.330186] msm ae00000.mdss: bound 5000000.gpu (ops a3xx_ops)
[    2.336864] [drm:dpu_kms_hw_init:1005] dpu hardware revision:0x40000000
[    2.343740] Unable to handle kernel execute from non-executable memory at virtual address ffff3f8a45696c00
[    2.353512] Mem abort info:
[    2.356371]   ESR = 0x8600000f
[    2.359545]   EC = 0x21: IABT (current EL), IL = 32 bits
[    2.364935]   SET = 0, FnV = 0
[    2.368035]   EA = 0, S1PTW = 0
[    2.371297] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000a1598000
[    2.378134] [ffff3f8a45696c00] pgd=180000017df97003, p4d=180000017df97003, pud=180000017dc1e003, pmd=180000017dbf2003, pte=0068000105696f07
[    2.390835] Internal error: Oops: 8600000f [#1] PREEMPT SMP
[    2.396549] Modules linked in:
[    2.399711] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.13.0-rc3-00115-g9b6193ea776c-dirty #142
[    2.408576] Hardware name: Thundercomm Dragonboard 845c (DT)
[    2.414368] pstate: 404000c5 (nZcv daIF +PAN -UAO -TCO BTYPE=--)
[    2.420486] pc : 0xffff3f8a45696c00
[    2.424209] lr : msm_fault_handler+0x50/0xd0
[    2.428797] sp : ffff800010003e20
[    2.432225] x29: ffff800010003e20 x28: ffffc0d80c0f2c40 x27: 0000000099f8f000
[    2.439505] x26: ffffc0d80bbb0e78 x25: ffffc0d800000880 x24: 000000009d969800
[    2.446760] x23: 00000000001a0020 x22: ffff3f8a4ba04e58 x21: 0000000000000000
[    2.454037] x20: 000000009d969800 x19: ffff3f8a4ba42b80 x18: ffffffffffffffff
[    2.461304] x17: 000c0400bb44ffff x16: 004000b2b5503510 x15: 0000000000000000
[    2.468575] x14: ffffc0d80c0f2c40 x13: ffff7eb2b1897000 x12: 0000000000000040
[    2.475868] x11: ffff3f8a405426c8 x10: ffff3f8a405426ca x9 : ffffc0d80c1720e0
[    2.483128] x8 : ffff3f8a40542938 x7 : 0000000000000000 x6 : ffff3f8a405429b0
[    2.490387] x5 : ffffc0d80af089f0 x4 : ffff3f8a4ba42b80 x3 : 0000000000000000
[    2.497669] x2 : ffff3f8a45696c00 x1 : ffff800010003e50 x0 : 0000000100000000
[    2.504948] Call trace:
[    2.507507]  0xffff3f8a45696c00
[    2.510744]  report_iommu_fault+0x20/0x3c
[    2.514942]  arm_smmu_context_fault+0x120/0x250
[    2.519584]  __handle_irq_event_percpu+0x54/0x170
[    2.524435]  handle_irq_event+0x64/0x140
[    2.528483]  handle_fasteoi_irq+0xa4/0x1a0
[    2.532665]  __handle_domain_irq+0x7c/0xe0
[    2.536878]  gic_handle_irq+0xbc/0x140
[    2.540848]  el1_irq+0xbc/0x154
[    2.544099]  cpuidle_enter_state+0x12c/0x2f0
[    2.548599]  cpuidle_enter+0x38/0x50
[    2.552240]  do_idle+0x21c/0x2ac
[    2.555567]  cpu_startup_entry+0x24/0x70
[    2.559541]  rest_init+0xd8/0xe8
[    2.562930]  arch_call_rest_init+0x10/0x1c
[    2.567214]  start_kernel+0x4f4/0x52c
[    2.571065] Code: ffffffff ffffffff ffffffff ffffffff (4699c010) 
[    2.577373] ---[ end trace df19a8ce39b5e100 ]---
[    2.582175] Kernel panic - not syncing: Oops: Fatal exception in interrupt
[    2.589200] SMP: stopping secondary CPUs
[    2.594616] Kernel Offset: 0x40d7fa800000 from 0xffff800010000000
[    2.600755] PHYS_OFFSET: 0xffffc076c0000000
[    2.604968] CPU features: 0x00000211,23300e46
[    2.609369] Memory Limit: none
[    2.612471] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---
Waiting for ssh to finish
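
Note on reading the oops above: the "Mem abort info" lines can be cross-checked by decoding ESR = 0x8600000f and the reported pte by hand. The Python sketch below is illustrative only and is not part of the original report. The helper names (decode_esr, decode_iabt_iss, pte_exec_never) are hypothetical, and the bit positions are assumed from the Armv8-A layout of ESR_ELx and of stage 1 page descriptors (PXN at bit 53, UXN at bit 54).

```python
# Values copied from the oops in the log above.
ESR = 0x8600000F
PTE = 0x0068000105696F07


def decode_esr(esr: int) -> dict:
    """Split an ESR_ELx value into its top-level fields."""
    return {
        "ec": (esr >> 26) & 0x3F,   # exception class, bits [31:26]
        "il": (esr >> 25) & 0x1,    # instruction length, bit 25 (1 = 32-bit)
        "iss": esr & 0x1FFFFFF,     # instruction-specific syndrome, bits [24:0]
    }


def decode_iabt_iss(iss: int) -> dict:
    """Decode the ISS fields that the kernel prints for an instruction abort."""
    return {
        "set": (iss >> 11) & 0x3,   # synchronous error type, bits [12:11]
        "fnv": (iss >> 10) & 0x1,   # FAR not valid, bit 10
        "ea": (iss >> 9) & 0x1,     # external abort, bit 9
        "s1ptw": (iss >> 7) & 0x1,  # fault on stage 1 table walk, bit 7
        "ifsc": iss & 0x3F,         # instruction fault status code, bits [5:0]
    }


def pte_exec_never(pte: int) -> dict:
    """Extract the execute-never bits from a stage 1 page descriptor."""
    return {
        "pxn": (pte >> 53) & 0x1,   # privileged execute-never
        "uxn": (pte >> 54) & 0x1,   # unprivileged execute-never
    }


fields = decode_esr(ESR)
iss = decode_iabt_iss(fields["iss"])
xn = pte_exec_never(PTE)

print(f"EC = {fields['ec']:#x}, IL = {fields['il']}, IFSC = {iss['ifsc']:#x}")
print(f"SET = {iss['set']}, FnV = {iss['fnv']}, EA = {iss['ea']}, S1PTW = {iss['s1ptw']}")
print(f"PXN = {xn['pxn']}, UXN = {xn['uxn']}")
```

Under those assumptions, EC 0x21 is an instruction abort taken at the current exception level, IFSC 0x0f is a level 3 permission fault, and both execute-never bits are set in the pte. That agrees with the kernel's "execute from non-executable memory" message, and with pc being equal to x2 in the register dump: the CPU branched to an address that holds data rather than code.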

[-- Attachment #3: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

* Re: [PATCH v5 0/5] iommu/arm-smmu: adreno-smmu page fault handling
  2021-07-04 12:53 ` [PATCH v5 0/5] iommu/arm-smmu: adreno-smmu page fault handling Dmitry Baryshkov
@ 2021-07-04 18:20   ` Rob Clark
  2021-07-06 21:36     ` Bjorn Andersson
  2021-07-07  5:12     ` John Stultz
  0 siblings, 2 replies; 13+ messages in thread
From: Rob Clark @ 2021-07-04 18:20 UTC (permalink / raw)
  To: Dmitry Baryshkov
  Cc: Konrad Dybcio, Akhil P Oommen, dri-devel, open list, Eric Anholt,
	AngeloGioacchino Del Regno, Marijn Suijten, Lee Jones, Rob Clark,
	Sai Prakash Ranjan, Jonathan Marek, Will Deacon, Zhenzhong Duan,
	Bjorn Andersson, Joerg Roedel, linux-arm-msm, Sharat Masetty,
	John Stultz, Jordan Crouse, moderated list:ARM SMMU DRIVERS,
	Isaac J. Manjarres, Robin Murphy, Douglas Anderson,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,
	Kristian H. Kristensen, freedreno

I suspect you are getting a dpu fault, and need:

https://lore.kernel.org/linux-arm-msm/CAF6AEGvTjTUQXqom-xhdh456tdLscbVFPQ+iud1H1gHc8A2=hA@mail.gmail.com/

I suppose Bjorn was expecting me to send that patch

BR,
-R

On Sun, Jul 4, 2021 at 5:53 AM Dmitry Baryshkov
<dmitry.baryshkov@linaro.org> wrote:
>
> Hi,
>
> I've had splash screen disabled on my RB3. However once I've enabled it,
> I've got the attached crash during the boot on the msm/msm-next. It
> looks like it is related to this particular set of changes.
>
> On 11/06/2021 00:44, Rob Clark wrote:
> > From: Rob Clark <robdclark@chromium.org>
> >
> > This picks up an earlier series[1] from Jordan, and adds additional
> > support needed to generate GPU devcore dumps on iova faults.  Original
> > description:
> >
> > This is a stack to add an Adreno GPU specific handler for pagefaults. The first
> > patch starts by wiring up report_iommu_fault for arm-smmu. The next patch adds
> > a adreno-smmu-priv function hook to capture a handful of important debugging
> > registers such as TTBR0, CONTEXTIDR, FSYNR0 and others. This is used by the
> > third patch to print more detailed information on page fault such as the TTBR0
> > for the pagetable that caused the fault and the source of the fault as
> > determined by a combination of the FSYNR1 register and an internal GPU
> > register.
> >
> > This code provides a solid base that we can expand on later for even more
> > extensive GPU side page fault debugging capabilities.
> >
> > v5: [Rob] Use RBBM_STATUS3.SMMU_STALLED_ON_FAULT to detect case where
> >      GPU snapshotting needs to avoid crashdumper, and check the
> >      RBBM_STATUS3.SMMU_STALLED_ON_FAULT in GPU hang irq paths
> > v4: [Rob] Add support to stall SMMU on fault, and let the GPU driver
> >      resume translation after it has had a chance to snapshot the GPUs
> >      state
> > v3: Always clear FSR even if the target driver is going to handle resume
> > v2: Fix comment wording and function pointer check per Rob Clark
> >
> > [1] https://lore.kernel.org/dri-devel/20210225175135.91922-1-jcrouse@codeaurora.org/
> >
> > Jordan Crouse (3):
> >    iommu/arm-smmu: Add support for driver IOMMU fault handlers
> >    iommu/arm-smmu-qcom: Add an adreno-smmu-priv callback to get pagefault
> >      info
> >    drm/msm: Improve the a6xx page fault handler
> >
> > Rob Clark (2):
> >    iommu/arm-smmu-qcom: Add stall support
> >    drm/msm: devcoredump iommu fault support
> >
> >   drivers/gpu/drm/msm/adreno/a5xx_gpu.c       |  23 +++-
> >   drivers/gpu/drm/msm/adreno/a6xx_gpu.c       | 110 +++++++++++++++++++-
> >   drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c |  42 ++++++--
> >   drivers/gpu/drm/msm/adreno/adreno_gpu.c     |  15 +++
> >   drivers/gpu/drm/msm/msm_gem.h               |   1 +
> >   drivers/gpu/drm/msm/msm_gem_submit.c        |   1 +
> >   drivers/gpu/drm/msm/msm_gpu.c               |  48 +++++++++
> >   drivers/gpu/drm/msm/msm_gpu.h               |  17 +++
> >   drivers/gpu/drm/msm/msm_gpummu.c            |   5 +
> >   drivers/gpu/drm/msm/msm_iommu.c             |  22 +++-
> >   drivers/gpu/drm/msm/msm_mmu.h               |   5 +-
> >   drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c  |  50 +++++++++
> >   drivers/iommu/arm/arm-smmu/arm-smmu.c       |   9 +-
> >   drivers/iommu/arm/arm-smmu/arm-smmu.h       |   2 +
> >   include/linux/adreno-smmu-priv.h            |  38 ++++++-
> >   15 files changed, 367 insertions(+), 21 deletions(-)
> >
>
>
> --
> With best wishes
> Dmitry

* Re: [PATCH v5 0/5] iommu/arm-smmu: adreno-smmu page fault handling
  2021-07-04 18:20   ` Rob Clark
@ 2021-07-06 21:36     ` Bjorn Andersson
  2021-07-07  5:12     ` John Stultz
  1 sibling, 0 replies; 13+ messages in thread
From: Bjorn Andersson @ 2021-07-06 21:36 UTC (permalink / raw)
  To: Rob Clark
  Cc: Konrad Dybcio, Akhil P Oommen, dri-devel, open list, Eric Anholt,
	AngeloGioacchino Del Regno, Marijn Suijten, Lee Jones, Rob Clark,
	Sai Prakash Ranjan, Jonathan Marek, Will Deacon, Zhenzhong Duan,
	Joerg Roedel, linux-arm-msm, Sharat Masetty, John Stultz,
	Jordan Crouse, moderated list:ARM SMMU DRIVERS,
	Isaac J. Manjarres, Robin Murphy, Douglas Anderson,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,
	Kristian H. Kristensen, Dmitry Baryshkov, freedreno

On Sun 04 Jul 13:20 CDT 2021, Rob Clark wrote:

> I suspect you are getting a dpu fault, and need:
> 
> https://lore.kernel.org/linux-arm-msm/CAF6AEGvTjTUQXqom-xhdh456tdLscbVFPQ+iud1H1gHc8A2=hA@mail.gmail.com/
> 
> I suppose Bjorn was expecting me to send that patch
> 

No, I left that discussion with the same understanding as you... But I
ended up sidetracked by some other craziness.

Did you post this somewhere or would you still like me to test it and
spin a patch?

Regards,
Bjorn

> BR,
> -R
> 
> On Sun, Jul 4, 2021 at 5:53 AM Dmitry Baryshkov
> <dmitry.baryshkov@linaro.org> wrote:
> >
> > Hi,
> >
> > I've had splash screen disabled on my RB3. However once I've enabled it,
> > I've got the attached crash during the boot on the msm/msm-next. It
> > looks like it is related to this particular set of changes.
> >
> > On 11/06/2021 00:44, Rob Clark wrote:
> > > From: Rob Clark <robdclark@chromium.org>
> > >
> > > This picks up an earlier series[1] from Jordan, and adds additional
> > > support needed to generate GPU devcore dumps on iova faults.  Original
> > > description:
> > >
> > > This is a stack to add an Adreno GPU specific handler for pagefaults. The first
> > > patch starts by wiring up report_iommu_fault for arm-smmu. The next patch adds
> > > a adreno-smmu-priv function hook to capture a handful of important debugging
> > > registers such as TTBR0, CONTEXTIDR, FSYNR0 and others. This is used by the
> > > third patch to print more detailed information on page fault such as the TTBR0
> > > for the pagetable that caused the fault and the source of the fault as
> > > determined by a combination of the FSYNR1 register and an internal GPU
> > > register.
> > >
> > > This code provides a solid base that we can expand on later for even more
> > > extensive GPU side page fault debugging capabilities.
> > >
> > > v5: [Rob] Use RBBM_STATUS3.SMMU_STALLED_ON_FAULT to detect case where
> > >      GPU snapshotting needs to avoid crashdumper, and check the
> > >      RBBM_STATUS3.SMMU_STALLED_ON_FAULT in GPU hang irq paths
> > > v4: [Rob] Add support to stall SMMU on fault, and let the GPU driver
> > >      resume translation after it has had a chance to snapshot the GPUs
> > >      state
> > > v3: Always clear FSR even if the target driver is going to handle resume
> > > v2: Fix comment wording and function pointer check per Rob Clark
> > >
> > > [1] https://lore.kernel.org/dri-devel/20210225175135.91922-1-jcrouse@codeaurora.org/
> > >
> > > Jordan Crouse (3):
> > >    iommu/arm-smmu: Add support for driver IOMMU fault handlers
> > >    iommu/arm-smmu-qcom: Add an adreno-smmu-priv callback to get pagefault
> > >      info
> > >    drm/msm: Improve the a6xx page fault handler
> > >
> > > Rob Clark (2):
> > >    iommu/arm-smmu-qcom: Add stall support
> > >    drm/msm: devcoredump iommu fault support
> > >
> > >   drivers/gpu/drm/msm/adreno/a5xx_gpu.c       |  23 +++-
> > >   drivers/gpu/drm/msm/adreno/a6xx_gpu.c       | 110 +++++++++++++++++++-
> > >   drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c |  42 ++++++--
> > >   drivers/gpu/drm/msm/adreno/adreno_gpu.c     |  15 +++
> > >   drivers/gpu/drm/msm/msm_gem.h               |   1 +
> > >   drivers/gpu/drm/msm/msm_gem_submit.c        |   1 +
> > >   drivers/gpu/drm/msm/msm_gpu.c               |  48 +++++++++
> > >   drivers/gpu/drm/msm/msm_gpu.h               |  17 +++
> > >   drivers/gpu/drm/msm/msm_gpummu.c            |   5 +
> > >   drivers/gpu/drm/msm/msm_iommu.c             |  22 +++-
> > >   drivers/gpu/drm/msm/msm_mmu.h               |   5 +-
> > >   drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c  |  50 +++++++++
> > >   drivers/iommu/arm/arm-smmu/arm-smmu.c       |   9 +-
> > >   drivers/iommu/arm/arm-smmu/arm-smmu.h       |   2 +
> > >   include/linux/adreno-smmu-priv.h            |  38 ++++++-
> > >   15 files changed, 367 insertions(+), 21 deletions(-)
> > >
> >
> >
> > --
> > With best wishes
> > Dmitry

* Re: [PATCH v5 0/5] iommu/arm-smmu: adreno-smmu page fault handling
  2021-07-04 18:20   ` Rob Clark
  2021-07-06 21:36     ` Bjorn Andersson
@ 2021-07-07  5:12     ` John Stultz
  2021-07-07 17:38       ` Rob Clark
  1 sibling, 1 reply; 13+ messages in thread
From: John Stultz @ 2021-07-07  5:12 UTC (permalink / raw)
  To: Rob Clark
  Cc: Konrad Dybcio, Akhil P Oommen, dri-devel, open list, Eric Anholt,
	AngeloGioacchino Del Regno, Marijn Suijten, Lee Jones, Rob Clark,
	Sai Prakash Ranjan, Jonathan Marek, Will Deacon, Zhenzhong Duan,
	Bjorn Andersson, Joerg Roedel, linux-arm-msm, Sharat Masetty,
	Jordan Crouse, moderated list:ARM SMMU DRIVERS,
	Isaac J. Manjarres, Robin Murphy, Douglas Anderson,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,
	Kristian H. Kristensen, Dmitry Baryshkov, freedreno

On Sun, Jul 4, 2021 at 11:16 AM Rob Clark <robdclark@gmail.com> wrote:
>
> I suspect you are getting a dpu fault, and need:
>
> https://lore.kernel.org/linux-arm-msm/CAF6AEGvTjTUQXqom-xhdh456tdLscbVFPQ+iud1H1gHc8A2=hA@mail.gmail.com/
>
> I suppose Bjorn was expecting me to send that patch

If it's helpful, I applied that and it got the db845c booting mainline
again for me (along with some reverts for a separate ext4 shrinker
crash).
Tested-by: John Stultz <john.stultz@linaro.org>

thanks
-john

* Re: [PATCH v5 0/5] iommu/arm-smmu: adreno-smmu page fault handling
  2021-07-07  5:12     ` John Stultz
@ 2021-07-07 17:38       ` Rob Clark
  0 siblings, 0 replies; 13+ messages in thread
From: Rob Clark @ 2021-07-07 17:38 UTC (permalink / raw)
  To: John Stultz
  Cc: Konrad Dybcio, Akhil P Oommen, dri-devel, open list, Eric Anholt,
	AngeloGioacchino Del Regno, Marijn Suijten, Lee Jones,
	Sai Prakash Ranjan, Jonathan Marek, Will Deacon, Rob Clark,
	Zhenzhong Duan, Bjorn Andersson, Joerg Roedel, linux-arm-msm,
	Sharat Masetty, Jordan Crouse, moderated list:ARM SMMU DRIVERS,
	Isaac J. Manjarres, Robin Murphy, Douglas Anderson,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,
	Kristian H. Kristensen, Dmitry Baryshkov, freedreno

On Tue, Jul 6, 2021 at 10:12 PM John Stultz <john.stultz@linaro.org> wrote:
>
> On Sun, Jul 4, 2021 at 11:16 AM Rob Clark <robdclark@gmail.com> wrote:
> >
> > I suspect you are getting a dpu fault, and need:
> >
> > https://lore.kernel.org/linux-arm-msm/CAF6AEGvTjTUQXqom-xhdh456tdLscbVFPQ+iud1H1gHc8A2=hA@mail.gmail.com/
> >
> > I suppose Bjorn was expecting me to send that patch
>
> If it's helpful, I applied that and it got the db845c booting mainline
> again for me (along with some reverts for a separate ext4 shrinker
> crash).
> Tested-by: John Stultz <john.stultz@linaro.org>
>

Thanks, I'll send a patch shortly

BR,
-R

end of thread, other threads:[~2021-07-07 17:36 UTC | newest]

Thread overview: 13+ messages
2021-06-10 21:44 [PATCH v5 0/5] iommu/arm-smmu: adreno-smmu page fault handling Rob Clark
2021-06-10 21:44 ` [PATCH v5 1/5] iommu/arm-smmu: Add support for driver IOMMU fault handlers Rob Clark
2021-06-14 17:26   ` Bjorn Andersson
2021-06-10 21:44 ` [PATCH v5 2/5] iommu/arm-smmu-qcom: Add an adreno-smmu-priv callback to get pagefault info Rob Clark
2021-06-14 17:30   ` Bjorn Andersson
2021-06-10 21:44 ` [PATCH v5 4/5] iommu/arm-smmu-qcom: Add stall support Rob Clark
2021-06-11 13:49   ` Jordan Crouse
2021-06-14 17:54   ` Bjorn Andersson
2021-07-04 12:53 ` [PATCH v5 0/5] iommu/arm-smmu: adreno-smmu page fault handling Dmitry Baryshkov
2021-07-04 18:20   ` Rob Clark
2021-07-06 21:36     ` Bjorn Andersson
2021-07-07  5:12     ` John Stultz
2021-07-07 17:38       ` Rob Clark
