dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] drm/msm+iommu/arm-smmu-qcom: tlbinv optimizations
@ 2022-08-21 18:19 Rob Clark
  2022-08-21 18:19 ` [PATCH 1/5] iommu/arm-smmu-qcom: Fix indentation Rob Clark
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Rob Clark @ 2022-08-21 18:19 UTC (permalink / raw)
  To: dri-devel
  Cc: Rob Clark, Vinod Koul, Loic Poulain, open list, Will Deacon,
	Akhil P Oommen, linux-arm-msm, Konrad Dybcio, Douglas Anderson,
	open list:IOMMU DRIVERS, Bjorn Andersson, Sean Paul,
	open list:IOMMU DRIVERS, Sibi Sankar, Yang Yingliang,
	Dmitry Baryshkov, Robin Murphy, freedreno,
	moderated list:ARM SMMU DRIVERS, Lu Baolu

From: Rob Clark <robdclark@chromium.org>

Two additions to adreno_smmu_priv to allow for a couple of
optimizations:

 + Use a separate ASID for each set of pgtables to avoid
   over-invalidation.
 + Detect the case of unmapping from non-current pgtables
   where we can skip the redundant tlbinv

Rob Clark (5):
  iommu/arm-smmu-qcom: Fix indentation
  iommu/arm-smmu-qcom: Provide way to access current TTBR0
  iommu/arm-smmu-qcom: Add private interface to tlbinv by ASID
  drm/msm: Use separate ASID for each set of pgtables
  drm/msm: Skip tlbinv on unmap from non-current pgtables

 drivers/gpu/drm/msm/adreno/a6xx_gpu.c      |  6 +++
 drivers/gpu/drm/msm/msm_iommu.c            | 44 +++++++++++++++++++---
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 10 +++++
 drivers/iommu/arm/arm-smmu/arm-smmu.c      | 43 +++++++++++++++++++--
 drivers/iommu/arm/arm-smmu/arm-smmu.h      |  1 +
 include/linux/adreno-smmu-priv.h           | 18 +++++----
 6 files changed, 106 insertions(+), 16 deletions(-)

-- 
2.37.2


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/5] iommu/arm-smmu-qcom: Fix indentation
  2022-08-21 18:19 [PATCH 0/5] drm/msm+iommu/arm-smmu-qcom: tlbinv optimizations Rob Clark
@ 2022-08-21 18:19 ` Rob Clark
  2022-08-21 18:19 ` [PATCH 2/5] iommu/arm-smmu-qcom: Provide way to access current TTBR0 Rob Clark
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Rob Clark @ 2022-08-21 18:19 UTC (permalink / raw)
  To: dri-devel; +Cc: Rob Clark, linux-arm-msm, freedreno, open list

From: Rob Clark <robdclark@chromium.org>

Plus typo.

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 include/linux/adreno-smmu-priv.h | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/linux/adreno-smmu-priv.h b/include/linux/adreno-smmu-priv.h
index c637e0997f6d..ac4c2c0ab724 100644
--- a/include/linux/adreno-smmu-priv.h
+++ b/include/linux/adreno-smmu-priv.h
@@ -37,7 +37,7 @@ struct adreno_smmu_fault_info {
 /**
  * struct adreno_smmu_priv - private interface between adreno-smmu and GPU
  *
- * @cookie:        An opque token provided by adreno-smmu and passed
+ * @cookie:        An opaque token provided by adreno-smmu and passed
  *                 back into the callbacks
  * @get_ttbr1_cfg: Get the TTBR1 config for the GPUs context-bank
  * @set_ttbr0_cfg: Set the TTBR0 config for the GPUs context bank.  A
@@ -61,12 +61,12 @@ struct adreno_smmu_fault_info {
  * it's domain.
  */
 struct adreno_smmu_priv {
-    const void *cookie;
-    const struct io_pgtable_cfg *(*get_ttbr1_cfg)(const void *cookie);
-    int (*set_ttbr0_cfg)(const void *cookie, const struct io_pgtable_cfg *cfg);
-    void (*get_fault_info)(const void *cookie, struct adreno_smmu_fault_info *info);
-    void (*set_stall)(const void *cookie, bool enabled);
-    void (*resume_translation)(const void *cookie, bool terminate);
+	const void *cookie;
+	const struct io_pgtable_cfg *(*get_ttbr1_cfg)(const void *cookie);
+	int (*set_ttbr0_cfg)(const void *cookie, const struct io_pgtable_cfg *cfg);
+	void (*get_fault_info)(const void *cookie, struct adreno_smmu_fault_info *info);
+	void (*set_stall)(const void *cookie, bool enabled);
+	void (*resume_translation)(const void *cookie, bool terminate);
 };
 
 #endif /* __ADRENO_SMMU_PRIV_H */
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/5] iommu/arm-smmu-qcom: Provide way to access current TTBR0
  2022-08-21 18:19 [PATCH 0/5] drm/msm+iommu/arm-smmu-qcom: tlbinv optimizations Rob Clark
  2022-08-21 18:19 ` [PATCH 1/5] iommu/arm-smmu-qcom: Fix indentation Rob Clark
@ 2022-08-21 18:19 ` Rob Clark
  2022-08-21 18:19 ` [PATCH 3/5] iommu/arm-smmu-qcom: Add private interface to tlbinv by ASID Rob Clark
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Rob Clark @ 2022-08-21 18:19 UTC (permalink / raw)
  To: dri-devel
  Cc: Rob Clark, Loic Poulain, open list, Will Deacon,
	open list:IOMMU DRIVERS, linux-arm-msm, Joerg Roedel,
	Konrad Dybcio, Robin Murphy, open list:IOMMU DRIVERS,
	Bjorn Andersson, Vinod Koul, Sibi Sankar, freedreno,
	moderated list:ARM SMMU DRIVERS

From: Rob Clark <robdclark@chromium.org>

The drm driver can skip tlbinv when unmapping from something that isn't
the current pgtables, as there is already a tlbinv on context switch.

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 9 +++++++++
 include/linux/adreno-smmu-priv.h           | 2 ++
 2 files changed, 11 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 7820711c4560..59b460c1c9a5 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -157,6 +157,14 @@ static int qcom_adreno_smmu_set_ttbr0_cfg(const void *cookie,
 	return 0;
 }
 
+static u64 qcom_adreno_smmu_get_ttbr0(const void *cookie)
+{
+	struct arm_smmu_domain *smmu_domain = (void *)cookie;
+	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
+
+	return arm_smmu_cb_readq(smmu_domain->smmu, cfg->cbndx, ARM_SMMU_CB_TTBR0);
+}
+
 static int qcom_adreno_smmu_alloc_context_bank(struct arm_smmu_domain *smmu_domain,
 					       struct arm_smmu_device *smmu,
 					       struct device *dev, int start)
@@ -217,6 +225,7 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
 	priv->cookie = smmu_domain;
 	priv->get_ttbr1_cfg = qcom_adreno_smmu_get_ttbr1_cfg;
 	priv->set_ttbr0_cfg = qcom_adreno_smmu_set_ttbr0_cfg;
+	priv->get_ttbr0 = qcom_adreno_smmu_get_ttbr0;
 	priv->get_fault_info = qcom_adreno_smmu_get_fault_info;
 	priv->set_stall = qcom_adreno_smmu_set_stall;
 	priv->resume_translation = qcom_adreno_smmu_resume_translation;
diff --git a/include/linux/adreno-smmu-priv.h b/include/linux/adreno-smmu-priv.h
index ac4c2c0ab724..4ad90541a095 100644
--- a/include/linux/adreno-smmu-priv.h
+++ b/include/linux/adreno-smmu-priv.h
@@ -43,6 +43,7 @@ struct adreno_smmu_fault_info {
  * @set_ttbr0_cfg: Set the TTBR0 config for the GPUs context bank.  A
  *                 NULL config disables TTBR0 translation, otherwise
  *                 TTBR0 translation is enabled with the specified cfg
+ * @get_ttbr0:     Get current TTBR0 value
  * @get_fault_info: Called by the GPU fault handler to get information about
  *                  the fault
  * @set_stall:     Configure whether stall on fault (CFCFG) is enabled.  Call
@@ -64,6 +65,7 @@ struct adreno_smmu_priv {
 	const void *cookie;
 	const struct io_pgtable_cfg *(*get_ttbr1_cfg)(const void *cookie);
 	int (*set_ttbr0_cfg)(const void *cookie, const struct io_pgtable_cfg *cfg);
+	u64 (*get_ttbr0)(const void *cookie);
 	void (*get_fault_info)(const void *cookie, struct adreno_smmu_fault_info *info);
 	void (*set_stall)(const void *cookie, bool enabled);
 	void (*resume_translation)(const void *cookie, bool terminate);
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/5] iommu/arm-smmu-qcom: Add private interface to tlbinv by ASID
  2022-08-21 18:19 [PATCH 0/5] drm/msm+iommu/arm-smmu-qcom: tlbinv optimizations Rob Clark
  2022-08-21 18:19 ` [PATCH 1/5] iommu/arm-smmu-qcom: Fix indentation Rob Clark
  2022-08-21 18:19 ` [PATCH 2/5] iommu/arm-smmu-qcom: Provide way to access current TTBR0 Rob Clark
@ 2022-08-21 18:19 ` Rob Clark
  2022-08-21 18:19 ` [PATCH 4/5] drm/msm: Use separate ASID for each set of pgtables Rob Clark
  2022-08-21 18:19 ` [PATCH 5/5] drm/msm: Skip tlbinv on unmap from non-current pgtables Rob Clark
  4 siblings, 0 replies; 11+ messages in thread
From: Rob Clark @ 2022-08-21 18:19 UTC (permalink / raw)
  To: dri-devel
  Cc: Rob Clark, Loic Poulain, open list, Will Deacon,
	open list:IOMMU DRIVERS, linux-arm-msm, Joerg Roedel,
	Konrad Dybcio, Robin Murphy, open list:IOMMU DRIVERS,
	Bjorn Andersson, Vinod Koul, Sibi Sankar, Yang Yingliang,
	freedreno, moderated list:ARM SMMU DRIVERS, Lu Baolu

From: Rob Clark <robdclark@chromium.org>

This will let the drm driver use different ASID values for each set of
pgtables to avoid over-invalidation on unmap.

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c |  1 +
 drivers/iommu/arm/arm-smmu/arm-smmu.c      | 43 ++++++++++++++++++++--
 drivers/iommu/arm/arm-smmu/arm-smmu.h      |  1 +
 include/linux/adreno-smmu-priv.h           |  2 +
 4 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 59b460c1c9a5..3230348729ab 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -229,6 +229,7 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
 	priv->get_fault_info = qcom_adreno_smmu_get_fault_info;
 	priv->set_stall = qcom_adreno_smmu_set_stall;
 	priv->resume_translation = qcom_adreno_smmu_resume_translation;
+	priv->tlb_inv_by_id = arm_smmu_tlb_inv_by_id;
 
 	return 0;
 }
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 2ed3594f384e..624359bb2092 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -252,7 +252,7 @@ static void arm_smmu_tlb_sync_context(struct arm_smmu_domain *smmu_domain)
 	spin_unlock_irqrestore(&smmu_domain->cb_lock, flags);
 }
 
-static void arm_smmu_tlb_inv_context_s1(void *cookie)
+static void arm_smmu_tlb_inv_context_s1_asid(void *cookie, u16 asid)
 {
 	struct arm_smmu_domain *smmu_domain = cookie;
 	/*
@@ -261,21 +261,56 @@ static void arm_smmu_tlb_inv_context_s1(void *cookie)
 	 */
 	wmb();
 	arm_smmu_cb_write(smmu_domain->smmu, smmu_domain->cfg.cbndx,
-			  ARM_SMMU_CB_S1_TLBIASID, smmu_domain->cfg.asid);
+			  ARM_SMMU_CB_S1_TLBIASID, asid);
 	arm_smmu_tlb_sync_context(smmu_domain);
 }
 
-static void arm_smmu_tlb_inv_context_s2(void *cookie)
+static void arm_smmu_tlb_inv_context_s1(void *cookie)
+{
+	struct arm_smmu_domain *smmu_domain = cookie;
+
+	arm_smmu_tlb_inv_context_s1_asid(cookie, smmu_domain->cfg.asid);
+}
+
+static void arm_smmu_tlb_inv_context_s2_vmid(void *cookie, u16 vmid)
 {
 	struct arm_smmu_domain *smmu_domain = cookie;
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 
 	/* See above */
 	wmb();
-	arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_TLBIVMID, smmu_domain->cfg.vmid);
+	arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_TLBIVMID, vmid);
 	arm_smmu_tlb_sync_global(smmu);
 }
 
+static void arm_smmu_tlb_inv_context_s2(void *cookie)
+{
+	struct arm_smmu_domain *smmu_domain = cookie;
+
+	arm_smmu_tlb_inv_context_s2_vmid(cookie, smmu_domain->cfg.vmid);
+}
+
+void arm_smmu_tlb_inv_by_id(const void *cookie, u16 id)
+{
+	struct arm_smmu_domain *smmu_domain = (void *)cookie;
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+	arm_smmu_rpm_get(smmu);
+	switch (smmu_domain->stage) {
+	case ARM_SMMU_DOMAIN_S1:
+		arm_smmu_tlb_inv_context_s1_asid(smmu_domain, id);
+		break;
+	case ARM_SMMU_DOMAIN_S2:
+	case ARM_SMMU_DOMAIN_NESTED:
+		arm_smmu_tlb_inv_context_s2_vmid(smmu_domain, id);
+		break;
+	case ARM_SMMU_DOMAIN_BYPASS:
+		break;
+	}
+
+	arm_smmu_rpm_put(smmu);
+}
+
 static void arm_smmu_tlb_inv_range_s1(unsigned long iova, size_t size,
 				      size_t granule, void *cookie, int reg)
 {
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index 2b9b42fb6f30..f6fb52d6f841 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -527,6 +527,7 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu);
 struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu);
 struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu);
 
+void arm_smmu_tlb_inv_by_id(const void *cookie, u16 id);
 void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx);
 int arm_mmu500_reset(struct arm_smmu_device *smmu);
 
diff --git a/include/linux/adreno-smmu-priv.h b/include/linux/adreno-smmu-priv.h
index 4ad90541a095..c44fc68d4de8 100644
--- a/include/linux/adreno-smmu-priv.h
+++ b/include/linux/adreno-smmu-priv.h
@@ -50,6 +50,7 @@ struct adreno_smmu_fault_info {
  *                 before set_ttbr0_cfg().  If stalling on fault is enabled,
  *                 the GPU driver must call resume_translation()
  * @resume_translation: Resume translation after a fault
+ * @tlb_inv_by_id: Flush TLB by ASID/VMID
  *
  *
  * The GPU driver (drm/msm) and adreno-smmu work together for controlling
@@ -69,6 +70,7 @@ struct adreno_smmu_priv {
 	void (*get_fault_info)(const void *cookie, struct adreno_smmu_fault_info *info);
 	void (*set_stall)(const void *cookie, bool enabled);
 	void (*resume_translation)(const void *cookie, bool terminate);
+	void (*tlb_inv_by_id)(const void *cookie, u16 id);
 };
 
 #endif /* __ADRENO_SMMU_PRIV_H */
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 4/5] drm/msm: Use separate ASID for each set of pgtables
  2022-08-21 18:19 [PATCH 0/5] drm/msm+iommu/arm-smmu-qcom: tlbinv optimizations Rob Clark
                   ` (2 preceding siblings ...)
  2022-08-21 18:19 ` [PATCH 3/5] iommu/arm-smmu-qcom: Add private interface to tlbinv by ASID Rob Clark
@ 2022-08-21 18:19 ` Rob Clark
  2022-08-22 13:52   ` Robin Murphy
  2022-08-21 18:19 ` [PATCH 5/5] drm/msm: Skip tlbinv on unmap from non-current pgtables Rob Clark
  4 siblings, 1 reply; 11+ messages in thread
From: Rob Clark @ 2022-08-21 18:19 UTC (permalink / raw)
  To: dri-devel
  Cc: Rob Clark, David Airlie, linux-arm-msm, Abhinav Kumar, open list,
	Sean Paul, Dmitry Baryshkov, freedreno

From: Rob Clark <robdclark@chromium.org>

Optimize TLB invalidation by using different ASID for each set of
pgtables.  There can be scenarios where multiple processes end up
with the same ASID (such as >256 processes using the GPU), but this
is harmless, it will only result in some over-invalidation (but
less over-invalidation compared to using ASID=0 for all processes)

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/gpu/drm/msm/msm_iommu.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index a54ed354578b..94c8c09980d1 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -33,6 +33,8 @@ static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova,
 		size_t size)
 {
 	struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
+	struct adreno_smmu_priv *adreno_smmu =
+		dev_get_drvdata(pagetable->parent->dev);
 	struct io_pgtable_ops *ops = pagetable->pgtbl_ops;
 	size_t unmapped = 0;
 
@@ -43,7 +45,7 @@ static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova,
 		size -= 4096;
 	}
 
-	iommu_flush_iotlb_all(to_msm_iommu(pagetable->parent)->domain);
+	adreno_smmu->tlb_inv_by_id(adreno_smmu->cookie, pagetable->asid);
 
 	return (unmapped == size) ? 0 : -EINVAL;
 }
@@ -147,6 +149,7 @@ static int msm_fault_handler(struct iommu_domain *domain, struct device *dev,
 
 struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu *parent)
 {
+	static atomic_t asid = ATOMIC_INIT(1);
 	struct adreno_smmu_priv *adreno_smmu = dev_get_drvdata(parent->dev);
 	struct msm_iommu *iommu = to_msm_iommu(parent);
 	struct msm_iommu_pagetable *pagetable;
@@ -210,12 +213,14 @@ struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu *parent)
 	pagetable->ttbr = ttbr0_cfg.arm_lpae_s1_cfg.ttbr;
 
 	/*
-	 * TODO we would like each set of page tables to have a unique ASID
-	 * to optimize TLB invalidation.  But iommu_flush_iotlb_all() will
-	 * end up flushing the ASID used for TTBR1 pagetables, which is not
-	 * what we want.  So for now just use the same ASID as TTBR1.
+	 * ASID 0 is used for kernel mapped buffers in TTBR1, which we
+	 * do not need to invalidate when unmapping from TTBR0 pgtables.
+	 * The hw ASID is at *least* 8b, but can be 16b.  We just assume
+	 * the worst:
 	 */
 	pagetable->asid = 0;
+	while (!pagetable->asid)
+		pagetable->asid = atomic_inc_return(&asid) & 0xff;
 
 	return &pagetable->base;
 }
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 5/5] drm/msm: Skip tlbinv on unmap from non-current pgtables
  2022-08-21 18:19 [PATCH 0/5] drm/msm+iommu/arm-smmu-qcom: tlbinv optimizations Rob Clark
                   ` (3 preceding siblings ...)
  2022-08-21 18:19 ` [PATCH 4/5] drm/msm: Use separate ASID for each set of pgtables Rob Clark
@ 2022-08-21 18:19 ` Rob Clark
  2022-08-24 17:46   ` Akhil P Oommen
  4 siblings, 1 reply; 11+ messages in thread
From: Rob Clark @ 2022-08-21 18:19 UTC (permalink / raw)
  To: dri-devel
  Cc: Rob Clark, Akhil P Oommen, David Airlie, linux-arm-msm,
	Konrad Dybcio, Abhinav Kumar, Douglas Anderson, Sean Paul,
	Dmitry Baryshkov, freedreno, open list

From: Rob Clark <robdclark@chromium.org>

We can rely on the tlbinv done by CP_SMMU_TABLE_UPDATE in this case.

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c |  6 ++++++
 drivers/gpu/drm/msm/msm_iommu.c       | 29 +++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index c8ad8aeca777..1ba0ed629549 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1180,6 +1180,12 @@ static int hw_init(struct msm_gpu *gpu)
 	/* Always come up on rb 0 */
 	a6xx_gpu->cur_ring = gpu->rb[0];
 
+	/*
+	 * Note, we cannot assume anything about the state of the SMMU when
+	 * coming back from power collapse, so force a CP_SMMU_TABLE_UPDATE
+	 * on the first submit.  Also, msm_iommu_pagetable_unmap() relies on
+	 * this behavior.
+	 */
 	gpu->cur_ctx_seqno = 0;
 
 	/* Enable the SQE_to start the CP engine */
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index 94c8c09980d1..218074a58081 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -45,8 +45,37 @@ static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova,
 		size -= 4096;
 	}
 
+	/*
+	 * A CP_SMMU_TABLE_UPDATE is always sent for the first
+	 * submit after resume, and that does a TLB invalidate.
+	 * So we can skip that if the device is not currently
+	 * powered.
+	 */
+	if (!pm_runtime_get_if_in_use(pagetable->parent->dev))
+		goto out;
+
+	/*
+	 * If we are not the current pgtables, we can rely on the
+	 * TLB invalidate done by CP_SMMU_TABLE_UPDATE.
+	 *
+	 * We'll always be racing with the GPU updating ttbr0,
+	 * but there are only two cases:
+	 *
+	 *  + either we are not the the current pgtables and there
+	 *    will be a tlbinv done by the GPU before we are again
+	 *
+	 *  + or we are.. there might have already been a tblinv
+	 *    if we raced with the GPU, but we have to assume the
+	 *    worse and do the tlbinv
+	 */
+	if (adreno_smmu->get_ttbr0(adreno_smmu->cookie) != pagetable->ttbr)
+		goto out_put;
+
 	adreno_smmu->tlb_inv_by_id(adreno_smmu->cookie, pagetable->asid);
 
+out_put:
+	pm_runtime_put(pagetable->parent->dev);
+out:
 	return (unmapped == size) ? 0 : -EINVAL;
 }
 
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 4/5] drm/msm: Use separate ASID for each set of pgtables
  2022-08-21 18:19 ` [PATCH 4/5] drm/msm: Use separate ASID for each set of pgtables Rob Clark
@ 2022-08-22 13:52   ` Robin Murphy
  2022-08-22 14:38     ` Robin Murphy
  0 siblings, 1 reply; 11+ messages in thread
From: Robin Murphy @ 2022-08-22 13:52 UTC (permalink / raw)
  To: Rob Clark, dri-devel
  Cc: Rob Clark, freedreno, Will Deacon, David Airlie, linux-arm-msm,
	Abhinav Kumar, open list, iommu, Dmitry Baryshkov, Sean Paul

On 2022-08-21 19:19, Rob Clark wrote:
> From: Rob Clark <robdclark@chromium.org>
> 
> Optimize TLB invalidation by using different ASID for each set of
> pgtables.  There can be scenarios where multiple processes end up
> with the same ASID (such as >256 processes using the GPU), but this
> is harmless, it will only result in some over-invalidation (but
> less over-invalidation compared to using ASID=0 for all processes)

Um, if you're still relying on the GPU doing an invalidate-all-by-ASID 
whenever it switches a TTBR, then there's only ever one ASID live in the 
TLBs at once, so it really doesn't matter whether its value stays the 
same or changes. This seems like a whole chunk of complexity to achieve 
nothing :/

If you could actually use multiple ASIDs in a meaningful way to avoid 
any invalidations, you'd need to do considerably more work to keep track 
of reuse, and those races would probably be a lot less benign.

Robin.

.> Signed-off-by: Rob Clark <robdclark@chromium.org>
> ---
>   drivers/gpu/drm/msm/msm_iommu.c | 15 ++++++++++-----
>   1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
> index a54ed354578b..94c8c09980d1 100644
> --- a/drivers/gpu/drm/msm/msm_iommu.c
> +++ b/drivers/gpu/drm/msm/msm_iommu.c
> @@ -33,6 +33,8 @@ static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova,
>   		size_t size)
>   {
>   	struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
> +	struct adreno_smmu_priv *adreno_smmu =
> +		dev_get_drvdata(pagetable->parent->dev);
>   	struct io_pgtable_ops *ops = pagetable->pgtbl_ops;
>   	size_t unmapped = 0;
>   
> @@ -43,7 +45,7 @@ static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova,
>   		size -= 4096;
>   	}
>   
> -	iommu_flush_iotlb_all(to_msm_iommu(pagetable->parent)->domain);
> +	adreno_smmu->tlb_inv_by_id(adreno_smmu->cookie, pagetable->asid);
>   
>   	return (unmapped == size) ? 0 : -EINVAL;
>   }
> @@ -147,6 +149,7 @@ static int msm_fault_handler(struct iommu_domain *domain, struct device *dev,
>   
>   struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu *parent)
>   {
> +	static atomic_t asid = ATOMIC_INIT(1);
>   	struct adreno_smmu_priv *adreno_smmu = dev_get_drvdata(parent->dev);
>   	struct msm_iommu *iommu = to_msm_iommu(parent);
>   	struct msm_iommu_pagetable *pagetable;
> @@ -210,12 +213,14 @@ struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu *parent)
>   	pagetable->ttbr = ttbr0_cfg.arm_lpae_s1_cfg.ttbr;
>   
>   	/*
> -	 * TODO we would like each set of page tables to have a unique ASID
> -	 * to optimize TLB invalidation.  But iommu_flush_iotlb_all() will
> -	 * end up flushing the ASID used for TTBR1 pagetables, which is not
> -	 * what we want.  So for now just use the same ASID as TTBR1.
> +	 * ASID 0 is used for kernel mapped buffers in TTBR1, which we
> +	 * do not need to invalidate when unmapping from TTBR0 pgtables.
> +	 * The hw ASID is at *least* 8b, but can be 16b.  We just assume
> +	 * the worst:
>   	 */
>   	pagetable->asid = 0;
> +	while (!pagetable->asid)
> +		pagetable->asid = atomic_inc_return(&asid) & 0xff;
>   
>   	return &pagetable->base;
>   }

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 4/5] drm/msm: Use separate ASID for each set of pgtables
  2022-08-22 13:52   ` Robin Murphy
@ 2022-08-22 14:38     ` Robin Murphy
  0 siblings, 0 replies; 11+ messages in thread
From: Robin Murphy @ 2022-08-22 14:38 UTC (permalink / raw)
  To: Rob Clark, dri-devel
  Cc: Rob Clark, David Airlie, freedreno, Abhinav Kumar, open list,
	Sean Paul, iommu, linux-arm-msm, Dmitry Baryshkov, Will Deacon

On 2022-08-22 14:52, Robin Murphy wrote:
> On 2022-08-21 19:19, Rob Clark wrote:
>> From: Rob Clark <robdclark@chromium.org>
>>
>> Optimize TLB invalidation by using different ASID for each set of
>> pgtables.  There can be scenarios where multiple processes end up
>> with the same ASID (such as >256 processes using the GPU), but this
>> is harmless, it will only result in some over-invalidation (but
>> less over-invalidation compared to using ASID=0 for all processes)
> 
> Um, if you're still relying on the GPU doing an invalidate-all-by-ASID 
> whenever it switches a TTBR, then there's only ever one ASID live in the 
> TLBs at once, so it really doesn't matter whether its value stays the 
> same or changes. This seems like a whole chunk of complexity to achieve 
> nothing :/
> 
> If you could actually use multiple ASIDs in a meaningful way to avoid 
> any invalidations, you'd need to do considerably more work to keep track 
> of reuse, and those races would probably be a lot less benign.

Oh, and on top of that, if you did want to go down that route then 
chances are you'd then also want to start looking at using global 
mappings in TTBR1 to avoid increased TLB pressure from kernel buffers, 
and then we'd run up against some pretty horrid MMU-500 errata which so 
far I've been happy to ignore on the basis that Linux doesn't use global 
mappings. Spoiler alert: unless you can additionally convert everything 
to invalidate by VA, the workaround for #752357 most likely makes the 
whole idea moot.

Robin.

>> Signed-off-by: Rob Clark <robdclark@chromium.org>
>> ---
>>   drivers/gpu/drm/msm/msm_iommu.c | 15 ++++++++++-----
>>   1 file changed, 10 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/msm/msm_iommu.c 
>> b/drivers/gpu/drm/msm/msm_iommu.c
>> index a54ed354578b..94c8c09980d1 100644
>> --- a/drivers/gpu/drm/msm/msm_iommu.c
>> +++ b/drivers/gpu/drm/msm/msm_iommu.c
>> @@ -33,6 +33,8 @@ static int msm_iommu_pagetable_unmap(struct msm_mmu 
>> *mmu, u64 iova,
>>           size_t size)
>>   {
>>       struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
>> +    struct adreno_smmu_priv *adreno_smmu =
>> +        dev_get_drvdata(pagetable->parent->dev);
>>       struct io_pgtable_ops *ops = pagetable->pgtbl_ops;
>>       size_t unmapped = 0;
>> @@ -43,7 +45,7 @@ static int msm_iommu_pagetable_unmap(struct msm_mmu 
>> *mmu, u64 iova,
>>           size -= 4096;
>>       }
>> -    iommu_flush_iotlb_all(to_msm_iommu(pagetable->parent)->domain);
>> +    adreno_smmu->tlb_inv_by_id(adreno_smmu->cookie, pagetable->asid);
>>       return (unmapped == size) ? 0 : -EINVAL;
>>   }
>> @@ -147,6 +149,7 @@ static int msm_fault_handler(struct iommu_domain 
>> *domain, struct device *dev,
>>   struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu *parent)
>>   {
>> +    static atomic_t asid = ATOMIC_INIT(1);
>>       struct adreno_smmu_priv *adreno_smmu = 
>> dev_get_drvdata(parent->dev);
>>       struct msm_iommu *iommu = to_msm_iommu(parent);
>>       struct msm_iommu_pagetable *pagetable;
>> @@ -210,12 +213,14 @@ struct msm_mmu 
>> *msm_iommu_pagetable_create(struct msm_mmu *parent)
>>       pagetable->ttbr = ttbr0_cfg.arm_lpae_s1_cfg.ttbr;
>>       /*
>> -     * TODO we would like each set of page tables to have a unique ASID
>> -     * to optimize TLB invalidation.  But iommu_flush_iotlb_all() will
>> -     * end up flushing the ASID used for TTBR1 pagetables, which is not
>> -     * what we want.  So for now just use the same ASID as TTBR1.
>> +     * ASID 0 is used for kernel mapped buffers in TTBR1, which we
>> +     * do not need to invalidate when unmapping from TTBR0 pgtables.
>> +     * The hw ASID is at *least* 8b, but can be 16b.  We just assume
>> +     * the worst:
>>        */
>>       pagetable->asid = 0;
>> +    while (!pagetable->asid)
>> +        pagetable->asid = atomic_inc_return(&asid) & 0xff;
>>       return &pagetable->base;
>>   }

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 5/5] drm/msm: Skip tlbinv on unmap from non-current pgtables
  2022-08-21 18:19 ` [PATCH 5/5] drm/msm: Skip tlbinv on unmap from non-current pgtables Rob Clark
@ 2022-08-24 17:46   ` Akhil P Oommen
  2022-08-24 19:02     ` Rob Clark
  0 siblings, 1 reply; 11+ messages in thread
From: Akhil P Oommen @ 2022-08-24 17:46 UTC (permalink / raw)
  To: Rob Clark, dri-devel
  Cc: Rob Clark, freedreno, David Airlie, linux-arm-msm, Konrad Dybcio,
	Abhinav Kumar, Douglas Anderson, Dmitry Baryshkov, Sean Paul,
	open list

On 8/21/2022 11:49 PM, Rob Clark wrote:
> From: Rob Clark <robdclark@chromium.org>
>
> We can rely on the tlbinv done by CP_SMMU_TABLE_UPDATE in this case.
>
> Signed-off-by: Rob Clark <robdclark@chromium.org>
> ---
>   drivers/gpu/drm/msm/adreno/a6xx_gpu.c |  6 ++++++
>   drivers/gpu/drm/msm/msm_iommu.c       | 29 +++++++++++++++++++++++++++
>   2 files changed, 35 insertions(+)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index c8ad8aeca777..1ba0ed629549 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -1180,6 +1180,12 @@ static int hw_init(struct msm_gpu *gpu)
>   	/* Always come up on rb 0 */
>   	a6xx_gpu->cur_ring = gpu->rb[0];
>   
> +	/*
> +	 * Note, we cannot assume anything about the state of the SMMU when
> +	 * coming back from power collapse, so force a CP_SMMU_TABLE_UPDATE
> +	 * on the first submit.  Also, msm_iommu_pagetable_unmap() relies on
> +	 * this behavior.
> +	 */
>   	gpu->cur_ctx_seqno = 0;
>   
>   	/* Enable the SQE_to start the CP engine */
> diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
> index 94c8c09980d1..218074a58081 100644
> --- a/drivers/gpu/drm/msm/msm_iommu.c
> +++ b/drivers/gpu/drm/msm/msm_iommu.c
> @@ -45,8 +45,37 @@ static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova,
>   		size -= 4096;
>   	}
>   
> +	/*
> +	 * A CP_SMMU_TABLE_UPDATE is always sent for the first
> +	 * submit after resume, and that does a TLB invalidate.
> +	 * So we can skip that if the device is not currently
> +	 * powered.
> +	 */
> +	if (!pm_runtime_get_if_in_use(pagetable->parent->dev))
> +		goto out;
> +
> +	/*
> +	 * If we are not the current pgtables, we can rely on the
> +	 * TLB invalidate done by CP_SMMU_TABLE_UPDATE.
> +	 *
> +	 * We'll always be racing with the GPU updating ttbr0,
> +	 * but there are only two cases:
> +	 *
> +	 *  + either we are not the the current pgtables and there
> +	 *    will be a tlbinv done by the GPU before we are again
> +	 *
> +	 *  + or we are.. there might have already been a tblinv
> +	 *    if we raced with the GPU, but we have to assume the
> +	 *    worse and do the tlbinv
> +	 */
> +	if (adreno_smmu->get_ttbr0(adreno_smmu->cookie) != pagetable->ttbr)
> +		goto out_put;
> +
>   	adreno_smmu->tlb_inv_by_id(adreno_smmu->cookie, pagetable->asid);
>   
> +out_put:
> +	pm_runtime_put(pagetable->parent->dev);
> +out:
>   	return (unmapped == size) ? 0 : -EINVAL;
>   }
>   
Asking because it is a *security issue* if we get this wrong:
1. Is there any measure benefit with this patch? I believe tlb 
invalidation doesn't contribute much to the unmap latency.
2. We at least should insert a full memory barrier before reading the 
ttbr0 register to ensure that everything we did prior to that is visible 
to smmu. But then I guess the cost of the full barrier would be similar 
to the tlb invalidation.

Because it could lead to security issues or other very hard to debug 
issues, I would prefer this optimization only if there is a significant 
measurable gain.

-Akhil.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 5/5] drm/msm: Skip tlbinv on unmap from non-current pgtables
  2022-08-24 17:46   ` Akhil P Oommen
@ 2022-08-24 19:02     ` Rob Clark
  2022-08-25 18:12       ` Akhil P Oommen
  0 siblings, 1 reply; 11+ messages in thread
From: Rob Clark @ 2022-08-24 19:02 UTC (permalink / raw)
  To: Akhil P Oommen
  Cc: Rob Clark, David Airlie, linux-arm-msm, Konrad Dybcio,
	Abhinav Kumar, dri-devel, Douglas Anderson, Sean Paul,
	Dmitry Baryshkov, freedreno, open list

On Wed, Aug 24, 2022 at 10:46 AM Akhil P Oommen
<quic_akhilpo@quicinc.com> wrote:
>
> On 8/21/2022 11:49 PM, Rob Clark wrote:
> > From: Rob Clark <robdclark@chromium.org>
> >
> > We can rely on the tlbinv done by CP_SMMU_TABLE_UPDATE in this case.
> >
> > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > ---
> >   drivers/gpu/drm/msm/adreno/a6xx_gpu.c |  6 ++++++
> >   drivers/gpu/drm/msm/msm_iommu.c       | 29 +++++++++++++++++++++++++++
> >   2 files changed, 35 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > index c8ad8aeca777..1ba0ed629549 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > @@ -1180,6 +1180,12 @@ static int hw_init(struct msm_gpu *gpu)
> >       /* Always come up on rb 0 */
> >       a6xx_gpu->cur_ring = gpu->rb[0];
> >
> > +     /*
> > +      * Note, we cannot assume anything about the state of the SMMU when
> > +      * coming back from power collapse, so force a CP_SMMU_TABLE_UPDATE
> > +      * on the first submit.  Also, msm_iommu_pagetable_unmap() relies on
> > +      * this behavior.
> > +      */
> >       gpu->cur_ctx_seqno = 0;
> >
> >       /* Enable the SQE_to start the CP engine */
> > diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
> > index 94c8c09980d1..218074a58081 100644
> > --- a/drivers/gpu/drm/msm/msm_iommu.c
> > +++ b/drivers/gpu/drm/msm/msm_iommu.c
> > @@ -45,8 +45,37 @@ static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova,
> >               size -= 4096;
> >       }
> >
> > +     /*
> > +      * A CP_SMMU_TABLE_UPDATE is always sent for the first
> > +      * submit after resume, and that does a TLB invalidate.
> > +      * So we can skip that if the device is not currently
> > +      * powered.
> > +      */
> > +     if (!pm_runtime_get_if_in_use(pagetable->parent->dev))
> > +             goto out;
> > +
> > +     /*
> > +      * If we are not the current pgtables, we can rely on the
> > +      * TLB invalidate done by CP_SMMU_TABLE_UPDATE.
> > +      *
> > +      * We'll always be racing with the GPU updating ttbr0,
> > +      * but there are only two cases:
> > +      *
> > +      *  + either we are not the the current pgtables and there
> > +      *    will be a tlbinv done by the GPU before we are again
> > +      *
> > +      *  + or we are.. there might have already been a tblinv
> > +      *    if we raced with the GPU, but we have to assume the
> > +      *    worse and do the tlbinv
> > +      */
> > +     if (adreno_smmu->get_ttbr0(adreno_smmu->cookie) != pagetable->ttbr)
> > +             goto out_put;
> > +
> >       adreno_smmu->tlb_inv_by_id(adreno_smmu->cookie, pagetable->asid);
> >
> > +out_put:
> > +     pm_runtime_put(pagetable->parent->dev);
> > +out:
> >       return (unmapped == size) ? 0 : -EINVAL;
> >   }
> >
> Asking because it is a *security issue* if we get this wrong:
> 1. Is there any measure benefit with this patch? I believe tlb
> invalidation doesn't contribute much to the unmap latency.

It turned out to not make a huge difference.. although I expect the
part about skipping the inv when runtime suspended is still useful
from a power standpoint (but don't have a great setup to measure that)

BR,
-R

> 2. We at least should insert a full memory barrier before reading the
> ttbr0 register to ensure that everything we did prior to that is visible
> to smmu. But then I guess the cost of the full barrier would be similar
> to the tlb invalidation.
>
> Because it could lead to security issues or other very hard to debug
> issues, I would prefer this optimization only if there is a significant
> measurable gain.
>
> -Akhil.
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 5/5] drm/msm: Skip tlbinv on unmap from non-current pgtables
  2022-08-24 19:02     ` Rob Clark
@ 2022-08-25 18:12       ` Akhil P Oommen
  0 siblings, 0 replies; 11+ messages in thread
From: Akhil P Oommen @ 2022-08-25 18:12 UTC (permalink / raw)
  To: Rob Clark
  Cc: Rob Clark, David Airlie, linux-arm-msm, Konrad Dybcio,
	Abhinav Kumar, dri-devel, Douglas Anderson, Sean Paul,
	Dmitry Baryshkov, freedreno, open list

On 8/25/2022 12:32 AM, Rob Clark wrote:
> On Wed, Aug 24, 2022 at 10:46 AM Akhil P Oommen
> <quic_akhilpo@quicinc.com> wrote:
>> On 8/21/2022 11:49 PM, Rob Clark wrote:
>>> From: Rob Clark <robdclark@chromium.org>
>>>
>>> We can rely on the tlbinv done by CP_SMMU_TABLE_UPDATE in this case.
>>>
>>> Signed-off-by: Rob Clark <robdclark@chromium.org>
>>> ---
>>>    drivers/gpu/drm/msm/adreno/a6xx_gpu.c |  6 ++++++
>>>    drivers/gpu/drm/msm/msm_iommu.c       | 29 +++++++++++++++++++++++++++
>>>    2 files changed, 35 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>>> index c8ad8aeca777..1ba0ed629549 100644
>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>>> @@ -1180,6 +1180,12 @@ static int hw_init(struct msm_gpu *gpu)
>>>        /* Always come up on rb 0 */
>>>        a6xx_gpu->cur_ring = gpu->rb[0];
>>>
>>> +     /*
>>> +      * Note, we cannot assume anything about the state of the SMMU when
>>> +      * coming back from power collapse, so force a CP_SMMU_TABLE_UPDATE
>>> +      * on the first submit.  Also, msm_iommu_pagetable_unmap() relies on
>>> +      * this behavior.
>>> +      */
>>>        gpu->cur_ctx_seqno = 0;
>>>
>>>        /* Enable the SQE_to start the CP engine */
>>> diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
>>> index 94c8c09980d1..218074a58081 100644
>>> --- a/drivers/gpu/drm/msm/msm_iommu.c
>>> +++ b/drivers/gpu/drm/msm/msm_iommu.c
>>> @@ -45,8 +45,37 @@ static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova,
>>>                size -= 4096;
>>>        }
>>>
>>> +     /*
>>> +      * A CP_SMMU_TABLE_UPDATE is always sent for the first
>>> +      * submit after resume, and that does a TLB invalidate.
>>> +      * So we can skip that if the device is not currently
>>> +      * powered.
>>> +      */
>>> +     if (!pm_runtime_get_if_in_use(pagetable->parent->dev))
>>> +             goto out;
>>> +
>>> +     /*
>>> +      * If we are not the current pgtables, we can rely on the
>>> +      * TLB invalidate done by CP_SMMU_TABLE_UPDATE.
>>> +      *
>>> +      * We'll always be racing with the GPU updating ttbr0,
>>> +      * but there are only two cases:
>>> +      *
>>> +      *  + either we are not the the current pgtables and there
>>> +      *    will be a tlbinv done by the GPU before we are again
>>> +      *
>>> +      *  + or we are.. there might have already been a tblinv
>>> +      *    if we raced with the GPU, but we have to assume the
>>> +      *    worse and do the tlbinv
>>> +      */
>>> +     if (adreno_smmu->get_ttbr0(adreno_smmu->cookie) != pagetable->ttbr)
>>> +             goto out_put;
>>> +
>>>        adreno_smmu->tlb_inv_by_id(adreno_smmu->cookie, pagetable->asid);
>>>
>>> +out_put:
>>> +     pm_runtime_put(pagetable->parent->dev);
>>> +out:
>>>        return (unmapped == size) ? 0 : -EINVAL;
>>>    }
>>>
>> Asking because it is a *security issue* if we get this wrong:
>> 1. Is there any measure benefit with this patch? I believe tlb
>> invalidation doesn't contribute much to the unmap latency.
> It turned out to not make a huge difference.. although I expect the
> part about skipping the inv when runtime suspended is still useful
> from a power standpoint (but don't have a great setup to measure that)
Agree. Perhaps use the recently added 'suspended' flag instead of 
pm_runtime_get_if_in_use().

-Akhil.
>
> BR,
> -R
>
>> 2. We at least should insert a full memory barrier before reading the
>> ttbr0 register to ensure that everything we did prior to that is visible
>> to smmu. But then I guess the cost of the full barrier would be similar
>> to the tlb invalidation.
>>
>> Because it could lead to security issues or other very hard to debug
>> issues, I would prefer this optimization only if there is a significant
>> measurable gain.
>>
>> -Akhil.
>>


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-08-25 18:12 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-21 18:19 [PATCH 0/5] drm/msm+iommu/arm-smmu-qcom: tlbinv optimizations Rob Clark
2022-08-21 18:19 ` [PATCH 1/5] iommu/arm-smmu-qcom: Fix indentation Rob Clark
2022-08-21 18:19 ` [PATCH 2/5] iommu/arm-smmu-qcom: Provide way to access current TTBR0 Rob Clark
2022-08-21 18:19 ` [PATCH 3/5] iommu/arm-smmu-qcom: Add private interface to tlbinv by ASID Rob Clark
2022-08-21 18:19 ` [PATCH 4/5] drm/msm: Use separate ASID for each set of pgtables Rob Clark
2022-08-22 13:52   ` Robin Murphy
2022-08-22 14:38     ` Robin Murphy
2022-08-21 18:19 ` [PATCH 5/5] drm/msm: Skip tlbinv on unmap from non-current pgtables Rob Clark
2022-08-24 17:46   ` Akhil P Oommen
2022-08-24 19:02     ` Rob Clark
2022-08-25 18:12       ` Akhil P Oommen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).