All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] drm/msm/a6xx: System Cache Support
@ 2019-12-19 13:14 ` Sharat Masetty
  0 siblings, 0 replies; 36+ messages in thread
From: Sharat Masetty @ 2019-12-19 13:14 UTC (permalink / raw)
  To: freedreno
  Cc: dri-devel, linux-arm-msm, linux-kernel, will, robin.murphy, joro,
	iommu, jcrouse, saiprakash.ranjan, Sharat Masetty

Some hardware variants contain a system level cache or the last level
cache(llc). This cache is typically a large block which is shared by multiple
clients on the SOC. GPU uses the system cache to cache both the GPU data
buffers(like textures) as well the SMMU pagetables. This helps with
improved render performance as well as lower power consumption by reducing
the bus traffic to the system memory.

The system cache architecture allows the cache to be split into slices which
then be used by multiple SOC clients. This patch series is an effort to enable
and use two of those slices perallocated for the GPU, one for the GPU data
buffers and another for the GPU SMMU hardware pagetables.

To enable the system cache driver, add [1] to your stack if not already
present. Please review.

[1] https://lore.kernel.org/patchwork/patch/1165298/

Jordan Crouse (1):
  iommu/arm-smmu: Pass io_pgtable_cfg to impl specific init_context

Sharat Masetty (3):
  drm/msm: rearrange the gpu_rmw() function
  drm/msm: Pass mmu features to generic layers
  drm/msm/a6xx: Add support for using system cache(LLC)

Vivek Gautam (1):
  iommu/arm-smmu: Add domain attribute for QCOM system cache

 drivers/gpu/drm/msm/adreno/a2xx_gpu.c   |   2 +-
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c   |   2 +-
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c   |   2 +-
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c   |   2 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 122 +++++++++++++++++++++++++++++++-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h   |   9 +++
 drivers/gpu/drm/msm/adreno/adreno_gpu.c |   4 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.h |   2 +-
 drivers/gpu/drm/msm/msm_drv.c           |   8 +++
 drivers/gpu/drm/msm/msm_drv.h           |   1 +
 drivers/gpu/drm/msm/msm_gpu.c           |   6 +-
 drivers/gpu/drm/msm/msm_gpu.h           |   6 +-
 drivers/gpu/drm/msm/msm_iommu.c         |  13 ++++
 drivers/gpu/drm/msm/msm_mmu.h           |  14 ++++
 drivers/iommu/arm-smmu-impl.c           |   3 +-
 drivers/iommu/arm-smmu-qcom.c           |  10 +++
 drivers/iommu/arm-smmu.c                |  25 +++++--
 drivers/iommu/arm-smmu.h                |   4 +-
 include/linux/iommu.h                   |   1 +
 19 files changed, 216 insertions(+), 20 deletions(-)

--
1.9.1

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 0/5] drm/msm/a6xx: System Cache Support
@ 2019-12-19 13:14 ` Sharat Masetty
  0 siblings, 0 replies; 36+ messages in thread
From: Sharat Masetty @ 2019-12-19 13:14 UTC (permalink / raw)
  To: freedreno
  Cc: will, linux-arm-msm, linux-kernel, iommu, dri-devel,
	robin.murphy, Sharat Masetty

Some hardware variants contain a system level cache or the last level
cache(llc). This cache is typically a large block which is shared by multiple
clients on the SOC. GPU uses the system cache to cache both the GPU data
buffers(like textures) as well the SMMU pagetables. This helps with
improved render performance as well as lower power consumption by reducing
the bus traffic to the system memory.

The system cache architecture allows the cache to be split into slices which
then be used by multiple SOC clients. This patch series is an effort to enable
and use two of those slices perallocated for the GPU, one for the GPU data
buffers and another for the GPU SMMU hardware pagetables.

To enable the system cache driver, add [1] to your stack if not already
present. Please review.

[1] https://lore.kernel.org/patchwork/patch/1165298/

Jordan Crouse (1):
  iommu/arm-smmu: Pass io_pgtable_cfg to impl specific init_context

Sharat Masetty (3):
  drm/msm: rearrange the gpu_rmw() function
  drm/msm: Pass mmu features to generic layers
  drm/msm/a6xx: Add support for using system cache(LLC)

Vivek Gautam (1):
  iommu/arm-smmu: Add domain attribute for QCOM system cache

 drivers/gpu/drm/msm/adreno/a2xx_gpu.c   |   2 +-
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c   |   2 +-
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c   |   2 +-
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c   |   2 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 122 +++++++++++++++++++++++++++++++-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h   |   9 +++
 drivers/gpu/drm/msm/adreno/adreno_gpu.c |   4 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.h |   2 +-
 drivers/gpu/drm/msm/msm_drv.c           |   8 +++
 drivers/gpu/drm/msm/msm_drv.h           |   1 +
 drivers/gpu/drm/msm/msm_gpu.c           |   6 +-
 drivers/gpu/drm/msm/msm_gpu.h           |   6 +-
 drivers/gpu/drm/msm/msm_iommu.c         |  13 ++++
 drivers/gpu/drm/msm/msm_mmu.h           |  14 ++++
 drivers/iommu/arm-smmu-impl.c           |   3 +-
 drivers/iommu/arm-smmu-qcom.c           |  10 +++
 drivers/iommu/arm-smmu.c                |  25 +++++--
 drivers/iommu/arm-smmu.h                |   4 +-
 include/linux/iommu.h                   |   1 +
 19 files changed, 216 insertions(+), 20 deletions(-)

--
1.9.1
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 0/5] drm/msm/a6xx: System Cache Support
@ 2019-12-19 13:14 ` Sharat Masetty
  0 siblings, 0 replies; 36+ messages in thread
From: Sharat Masetty @ 2019-12-19 13:14 UTC (permalink / raw)
  To: freedreno
  Cc: saiprakash.ranjan, will, linux-arm-msm, linux-kernel, iommu,
	dri-devel, robin.murphy, Sharat Masetty

Some hardware variants contain a system level cache or the last level
cache(llc). This cache is typically a large block which is shared by multiple
clients on the SOC. GPU uses the system cache to cache both the GPU data
buffers(like textures) as well the SMMU pagetables. This helps with
improved render performance as well as lower power consumption by reducing
the bus traffic to the system memory.

The system cache architecture allows the cache to be split into slices which
then be used by multiple SOC clients. This patch series is an effort to enable
and use two of those slices perallocated for the GPU, one for the GPU data
buffers and another for the GPU SMMU hardware pagetables.

To enable the system cache driver, add [1] to your stack if not already
present. Please review.

[1] https://lore.kernel.org/patchwork/patch/1165298/

Jordan Crouse (1):
  iommu/arm-smmu: Pass io_pgtable_cfg to impl specific init_context

Sharat Masetty (3):
  drm/msm: rearrange the gpu_rmw() function
  drm/msm: Pass mmu features to generic layers
  drm/msm/a6xx: Add support for using system cache(LLC)

Vivek Gautam (1):
  iommu/arm-smmu: Add domain attribute for QCOM system cache

 drivers/gpu/drm/msm/adreno/a2xx_gpu.c   |   2 +-
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c   |   2 +-
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c   |   2 +-
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c   |   2 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 122 +++++++++++++++++++++++++++++++-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h   |   9 +++
 drivers/gpu/drm/msm/adreno/adreno_gpu.c |   4 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.h |   2 +-
 drivers/gpu/drm/msm/msm_drv.c           |   8 +++
 drivers/gpu/drm/msm/msm_drv.h           |   1 +
 drivers/gpu/drm/msm/msm_gpu.c           |   6 +-
 drivers/gpu/drm/msm/msm_gpu.h           |   6 +-
 drivers/gpu/drm/msm/msm_iommu.c         |  13 ++++
 drivers/gpu/drm/msm/msm_mmu.h           |  14 ++++
 drivers/iommu/arm-smmu-impl.c           |   3 +-
 drivers/iommu/arm-smmu-qcom.c           |  10 +++
 drivers/iommu/arm-smmu.c                |  25 +++++--
 drivers/iommu/arm-smmu.h                |   4 +-
 include/linux/iommu.h                   |   1 +
 19 files changed, 216 insertions(+), 20 deletions(-)

--
1.9.1
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 1/5] iommu/arm-smmu: Pass io_pgtable_cfg to impl specific init_context
  2019-12-19 13:14 ` Sharat Masetty
  (?)
@ 2019-12-19 13:14   ` Sharat Masetty
  -1 siblings, 0 replies; 36+ messages in thread
From: Sharat Masetty @ 2019-12-19 13:14 UTC (permalink / raw)
  To: freedreno
  Cc: dri-devel, linux-arm-msm, linux-kernel, will, robin.murphy, joro,
	iommu, jcrouse, saiprakash.ranjan

From: Jordan Crouse <jcrouse@codeaurora.org>

Pass the propposed io_pgtable_cfg to the implementation specific
init_context() function to give the implementation an opportunity to
to modify it before it gets passed to io-pgtable.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
---
 drivers/iommu/arm-smmu-impl.c |  3 ++-
 drivers/iommu/arm-smmu.c      | 11 ++++++-----
 drivers/iommu/arm-smmu.h      |  3 ++-
 3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index b2fe72a..33ed682 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -68,7 +68,8 @@ static int cavium_cfg_probe(struct arm_smmu_device *smmu)
 	return 0;
 }

-static int cavium_init_context(struct arm_smmu_domain *smmu_domain)
+static int cavium_init_context(struct arm_smmu_domain *smmu_domain,
+		struct io_pgtable_cfg *pgtbl_cfg)
 {
 	struct cavium_smmu *cs = container_of(smmu_domain->smmu,
 					      struct cavium_smmu, smmu);
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index eee48f9..4f7e0c0 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -758,11 +758,6 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 		cfg->asid = cfg->cbndx;

 	smmu_domain->smmu = smmu;
-	if (smmu->impl && smmu->impl->init_context) {
-		ret = smmu->impl->init_context(smmu_domain);
-		if (ret)
-			goto out_unlock;
-	}

 	smmu_domain->pgtbl_cfg = (struct io_pgtable_cfg) {
 		.pgsize_bitmap	= smmu->pgsize_bitmap,
@@ -773,6 +768,12 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 		.iommu_dev	= smmu->dev,
 	};

+	if (smmu->impl && smmu->impl->init_context) {
+		ret = smmu->impl->init_context(smmu_domain, &smmu_domain->pgtbl_cfg);
+		if (ret)
+			goto out_unlock;
+	}
+
 	if (smmu_domain->non_strict)
 		smmu_domain->pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;

diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index b2df38c..f57cdbe 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -335,7 +335,8 @@ struct arm_smmu_impl {
 			    u64 val);
 	int (*cfg_probe)(struct arm_smmu_device *smmu);
 	int (*reset)(struct arm_smmu_device *smmu);
-	int (*init_context)(struct arm_smmu_domain *smmu_domain);
+	int (*init_context)(struct arm_smmu_domain *smmu_domain,
+			    struct io_pgtable_cfg *pgtbl_cfg);
 };

 static inline void __iomem *arm_smmu_page(struct arm_smmu_device *smmu, int n)
--
1.9.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 1/5] iommu/arm-smmu: Pass io_pgtable_cfg to impl specific init_context
@ 2019-12-19 13:14   ` Sharat Masetty
  0 siblings, 0 replies; 36+ messages in thread
From: Sharat Masetty @ 2019-12-19 13:14 UTC (permalink / raw)
  To: freedreno
  Cc: will, linux-arm-msm, linux-kernel, iommu, dri-devel, robin.murphy

From: Jordan Crouse <jcrouse@codeaurora.org>

Pass the propposed io_pgtable_cfg to the implementation specific
init_context() function to give the implementation an opportunity to
to modify it before it gets passed to io-pgtable.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
---
 drivers/iommu/arm-smmu-impl.c |  3 ++-
 drivers/iommu/arm-smmu.c      | 11 ++++++-----
 drivers/iommu/arm-smmu.h      |  3 ++-
 3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index b2fe72a..33ed682 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -68,7 +68,8 @@ static int cavium_cfg_probe(struct arm_smmu_device *smmu)
 	return 0;
 }

-static int cavium_init_context(struct arm_smmu_domain *smmu_domain)
+static int cavium_init_context(struct arm_smmu_domain *smmu_domain,
+		struct io_pgtable_cfg *pgtbl_cfg)
 {
 	struct cavium_smmu *cs = container_of(smmu_domain->smmu,
 					      struct cavium_smmu, smmu);
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index eee48f9..4f7e0c0 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -758,11 +758,6 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 		cfg->asid = cfg->cbndx;

 	smmu_domain->smmu = smmu;
-	if (smmu->impl && smmu->impl->init_context) {
-		ret = smmu->impl->init_context(smmu_domain);
-		if (ret)
-			goto out_unlock;
-	}

 	smmu_domain->pgtbl_cfg = (struct io_pgtable_cfg) {
 		.pgsize_bitmap	= smmu->pgsize_bitmap,
@@ -773,6 +768,12 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 		.iommu_dev	= smmu->dev,
 	};

+	if (smmu->impl && smmu->impl->init_context) {
+		ret = smmu->impl->init_context(smmu_domain, &smmu_domain->pgtbl_cfg);
+		if (ret)
+			goto out_unlock;
+	}
+
 	if (smmu_domain->non_strict)
 		smmu_domain->pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;

diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index b2df38c..f57cdbe 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -335,7 +335,8 @@ struct arm_smmu_impl {
 			    u64 val);
 	int (*cfg_probe)(struct arm_smmu_device *smmu);
 	int (*reset)(struct arm_smmu_device *smmu);
-	int (*init_context)(struct arm_smmu_domain *smmu_domain);
+	int (*init_context)(struct arm_smmu_domain *smmu_domain,
+			    struct io_pgtable_cfg *pgtbl_cfg);
 };

 static inline void __iomem *arm_smmu_page(struct arm_smmu_device *smmu, int n)
--
1.9.1
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 1/5] iommu/arm-smmu: Pass io_pgtable_cfg to impl specific init_context
@ 2019-12-19 13:14   ` Sharat Masetty
  0 siblings, 0 replies; 36+ messages in thread
From: Sharat Masetty @ 2019-12-19 13:14 UTC (permalink / raw)
  To: freedreno
  Cc: saiprakash.ranjan, will, linux-arm-msm, linux-kernel, iommu,
	dri-devel, robin.murphy

From: Jordan Crouse <jcrouse@codeaurora.org>

Pass the propposed io_pgtable_cfg to the implementation specific
init_context() function to give the implementation an opportunity to
to modify it before it gets passed to io-pgtable.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
---
 drivers/iommu/arm-smmu-impl.c |  3 ++-
 drivers/iommu/arm-smmu.c      | 11 ++++++-----
 drivers/iommu/arm-smmu.h      |  3 ++-
 3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index b2fe72a..33ed682 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -68,7 +68,8 @@ static int cavium_cfg_probe(struct arm_smmu_device *smmu)
 	return 0;
 }

-static int cavium_init_context(struct arm_smmu_domain *smmu_domain)
+static int cavium_init_context(struct arm_smmu_domain *smmu_domain,
+		struct io_pgtable_cfg *pgtbl_cfg)
 {
 	struct cavium_smmu *cs = container_of(smmu_domain->smmu,
 					      struct cavium_smmu, smmu);
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index eee48f9..4f7e0c0 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -758,11 +758,6 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 		cfg->asid = cfg->cbndx;

 	smmu_domain->smmu = smmu;
-	if (smmu->impl && smmu->impl->init_context) {
-		ret = smmu->impl->init_context(smmu_domain);
-		if (ret)
-			goto out_unlock;
-	}

 	smmu_domain->pgtbl_cfg = (struct io_pgtable_cfg) {
 		.pgsize_bitmap	= smmu->pgsize_bitmap,
@@ -773,6 +768,12 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 		.iommu_dev	= smmu->dev,
 	};

+	if (smmu->impl && smmu->impl->init_context) {
+		ret = smmu->impl->init_context(smmu_domain, &smmu_domain->pgtbl_cfg);
+		if (ret)
+			goto out_unlock;
+	}
+
 	if (smmu_domain->non_strict)
 		smmu_domain->pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;

diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index b2df38c..f57cdbe 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -335,7 +335,8 @@ struct arm_smmu_impl {
 			    u64 val);
 	int (*cfg_probe)(struct arm_smmu_device *smmu);
 	int (*reset)(struct arm_smmu_device *smmu);
-	int (*init_context)(struct arm_smmu_domain *smmu_domain);
+	int (*init_context)(struct arm_smmu_domain *smmu_domain,
+			    struct io_pgtable_cfg *pgtbl_cfg);
 };

 static inline void __iomem *arm_smmu_page(struct arm_smmu_device *smmu, int n)
--
1.9.1
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 2/5] iommu/arm-smmu: Add domain attribute for QCOM system cache
  2019-12-19 13:14 ` Sharat Masetty
  (?)
@ 2019-12-19 13:14   ` Sharat Masetty
  -1 siblings, 0 replies; 36+ messages in thread
From: Sharat Masetty @ 2019-12-19 13:14 UTC (permalink / raw)
  To: freedreno
  Cc: dri-devel, linux-arm-msm, linux-kernel, will, robin.murphy, joro,
	iommu, jcrouse, saiprakash.ranjan, Vivek Gautam

From: Vivek Gautam <vivek.gautam@codeaurora.org>

Add iommu domain attribute for using system cache aka last level
cache on QCOM SoCs by client drivers like GPU to set right
attributes for caching the hardware pagetables into the system cache.

Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
Co-developed-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
---
 drivers/iommu/arm-smmu-qcom.c | 10 ++++++++++
 drivers/iommu/arm-smmu.c      | 14 ++++++++++++++
 drivers/iommu/arm-smmu.h      |  1 +
 include/linux/iommu.h         |  1 +
 4 files changed, 26 insertions(+)

diff --git a/drivers/iommu/arm-smmu-qcom.c b/drivers/iommu/arm-smmu-qcom.c
index 24c071c..d1d22df 100644
--- a/drivers/iommu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm-smmu-qcom.c
@@ -30,7 +30,17 @@ static int qcom_sdm845_smmu500_reset(struct arm_smmu_device *smmu)
 	return ret;
 }
 
+static int qcom_smmu_init_context(struct arm_smmu_domain *smmu_domain,
+				  struct io_pgtable_cfg *pgtbl_cfg)
+{
+	if (smmu_domain->sys_cache)
+		pgtbl_cfg->coherent_walk = false;
+
+	return 0;
+}
+
 static const struct arm_smmu_impl qcom_smmu_impl = {
+	.init_context = qcom_smmu_init_context,
 	.reset = qcom_sdm845_smmu500_reset,
 };
 
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 4f7e0c0..055b548 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1466,6 +1466,9 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
 		case DOMAIN_ATTR_NESTING:
 			*(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
 			return 0;
+		case DOMAIN_ATTR_QCOM_SYS_CACHE:
+			*((int *)data) = smmu_domain->sys_cache;
+			return 0;
 		default:
 			return -ENODEV;
 		}
@@ -1506,6 +1509,17 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
 			else
 				smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
 			break;
+		case DOMAIN_ATTR_QCOM_SYS_CACHE:
+			if (smmu_domain->smmu) {
+				ret = -EPERM;
+				goto out_unlock;
+			}
+
+			if (*((int *)data))
+				smmu_domain->sys_cache = true;
+			else
+				smmu_domain->sys_cache = false;
+			break;
 		default:
 			ret = -ENODEV;
 		}
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index f57cdbe..8aeaaf0 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -322,6 +322,7 @@ struct arm_smmu_domain {
 	struct mutex			init_mutex; /* Protects smmu pointer */
 	spinlock_t			cb_lock; /* Serialises ATS1* ops and TLB syncs */
 	struct iommu_domain		domain;
+	bool				sys_cache;
 };
 
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 0c60e75..bd61c60 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -127,6 +127,7 @@ enum iommu_attr {
 	DOMAIN_ATTR_FSL_PAMUV1,
 	DOMAIN_ATTR_NESTING,	/* two stages of translation */
 	DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
+	DOMAIN_ATTR_QCOM_SYS_CACHE,
 	DOMAIN_ATTR_MAX,
 };
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 2/5] iommu/arm-smmu: Add domain attribute for QCOM system cache
@ 2019-12-19 13:14   ` Sharat Masetty
  0 siblings, 0 replies; 36+ messages in thread
From: Sharat Masetty @ 2019-12-19 13:14 UTC (permalink / raw)
  To: freedreno
  Cc: will, linux-arm-msm, linux-kernel, iommu, dri-devel,
	Vivek Gautam, robin.murphy

From: Vivek Gautam <vivek.gautam@codeaurora.org>

Add iommu domain attribute for using system cache aka last level
cache on QCOM SoCs by client drivers like GPU to set right
attributes for caching the hardware pagetables into the system cache.

Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
Co-developed-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
---
 drivers/iommu/arm-smmu-qcom.c | 10 ++++++++++
 drivers/iommu/arm-smmu.c      | 14 ++++++++++++++
 drivers/iommu/arm-smmu.h      |  1 +
 include/linux/iommu.h         |  1 +
 4 files changed, 26 insertions(+)

diff --git a/drivers/iommu/arm-smmu-qcom.c b/drivers/iommu/arm-smmu-qcom.c
index 24c071c..d1d22df 100644
--- a/drivers/iommu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm-smmu-qcom.c
@@ -30,7 +30,17 @@ static int qcom_sdm845_smmu500_reset(struct arm_smmu_device *smmu)
 	return ret;
 }
 
+static int qcom_smmu_init_context(struct arm_smmu_domain *smmu_domain,
+				  struct io_pgtable_cfg *pgtbl_cfg)
+{
+	if (smmu_domain->sys_cache)
+		pgtbl_cfg->coherent_walk = false;
+
+	return 0;
+}
+
 static const struct arm_smmu_impl qcom_smmu_impl = {
+	.init_context = qcom_smmu_init_context,
 	.reset = qcom_sdm845_smmu500_reset,
 };
 
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 4f7e0c0..055b548 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1466,6 +1466,9 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
 		case DOMAIN_ATTR_NESTING:
 			*(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
 			return 0;
+		case DOMAIN_ATTR_QCOM_SYS_CACHE:
+			*((int *)data) = smmu_domain->sys_cache;
+			return 0;
 		default:
 			return -ENODEV;
 		}
@@ -1506,6 +1509,17 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
 			else
 				smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
 			break;
+		case DOMAIN_ATTR_QCOM_SYS_CACHE:
+			if (smmu_domain->smmu) {
+				ret = -EPERM;
+				goto out_unlock;
+			}
+
+			if (*((int *)data))
+				smmu_domain->sys_cache = true;
+			else
+				smmu_domain->sys_cache = false;
+			break;
 		default:
 			ret = -ENODEV;
 		}
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index f57cdbe..8aeaaf0 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -322,6 +322,7 @@ struct arm_smmu_domain {
 	struct mutex			init_mutex; /* Protects smmu pointer */
 	spinlock_t			cb_lock; /* Serialises ATS1* ops and TLB syncs */
 	struct iommu_domain		domain;
+	bool				sys_cache;
 };
 
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 0c60e75..bd61c60 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -127,6 +127,7 @@ enum iommu_attr {
 	DOMAIN_ATTR_FSL_PAMUV1,
 	DOMAIN_ATTR_NESTING,	/* two stages of translation */
 	DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
+	DOMAIN_ATTR_QCOM_SYS_CACHE,
 	DOMAIN_ATTR_MAX,
 };
 
-- 
1.9.1
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 2/5] iommu/arm-smmu: Add domain attribute for QCOM system cache
@ 2019-12-19 13:14   ` Sharat Masetty
  0 siblings, 0 replies; 36+ messages in thread
From: Sharat Masetty @ 2019-12-19 13:14 UTC (permalink / raw)
  To: freedreno
  Cc: saiprakash.ranjan, will, linux-arm-msm, linux-kernel, iommu,
	dri-devel, Vivek Gautam, robin.murphy

From: Vivek Gautam <vivek.gautam@codeaurora.org>

Add iommu domain attribute for using system cache aka last level
cache on QCOM SoCs by client drivers like GPU to set right
attributes for caching the hardware pagetables into the system cache.

Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
Co-developed-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
---
 drivers/iommu/arm-smmu-qcom.c | 10 ++++++++++
 drivers/iommu/arm-smmu.c      | 14 ++++++++++++++
 drivers/iommu/arm-smmu.h      |  1 +
 include/linux/iommu.h         |  1 +
 4 files changed, 26 insertions(+)

diff --git a/drivers/iommu/arm-smmu-qcom.c b/drivers/iommu/arm-smmu-qcom.c
index 24c071c..d1d22df 100644
--- a/drivers/iommu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm-smmu-qcom.c
@@ -30,7 +30,17 @@ static int qcom_sdm845_smmu500_reset(struct arm_smmu_device *smmu)
 	return ret;
 }
 
+static int qcom_smmu_init_context(struct arm_smmu_domain *smmu_domain,
+				  struct io_pgtable_cfg *pgtbl_cfg)
+{
+	if (smmu_domain->sys_cache)
+		pgtbl_cfg->coherent_walk = false;
+
+	return 0;
+}
+
 static const struct arm_smmu_impl qcom_smmu_impl = {
+	.init_context = qcom_smmu_init_context,
 	.reset = qcom_sdm845_smmu500_reset,
 };
 
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 4f7e0c0..055b548 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1466,6 +1466,9 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
 		case DOMAIN_ATTR_NESTING:
 			*(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
 			return 0;
+		case DOMAIN_ATTR_QCOM_SYS_CACHE:
+			*((int *)data) = smmu_domain->sys_cache;
+			return 0;
 		default:
 			return -ENODEV;
 		}
@@ -1506,6 +1509,17 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
 			else
 				smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
 			break;
+		case DOMAIN_ATTR_QCOM_SYS_CACHE:
+			if (smmu_domain->smmu) {
+				ret = -EPERM;
+				goto out_unlock;
+			}
+
+			if (*((int *)data))
+				smmu_domain->sys_cache = true;
+			else
+				smmu_domain->sys_cache = false;
+			break;
 		default:
 			ret = -ENODEV;
 		}
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index f57cdbe..8aeaaf0 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -322,6 +322,7 @@ struct arm_smmu_domain {
 	struct mutex			init_mutex; /* Protects smmu pointer */
 	spinlock_t			cb_lock; /* Serialises ATS1* ops and TLB syncs */
 	struct iommu_domain		domain;
+	bool				sys_cache;
 };
 
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 0c60e75..bd61c60 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -127,6 +127,7 @@ enum iommu_attr {
 	DOMAIN_ATTR_FSL_PAMUV1,
 	DOMAIN_ATTR_NESTING,	/* two stages of translation */
 	DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
+	DOMAIN_ATTR_QCOM_SYS_CACHE,
 	DOMAIN_ATTR_MAX,
 };
 
-- 
1.9.1
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 3/5] drm/msm: rearrange the gpu_rmw() function
  2019-12-19 13:14 ` Sharat Masetty
  (?)
@ 2019-12-19 13:14   ` Sharat Masetty
  -1 siblings, 0 replies; 36+ messages in thread
From: Sharat Masetty @ 2019-12-19 13:14 UTC (permalink / raw)
  To: freedreno
  Cc: dri-devel, linux-arm-msm, linux-kernel, will, robin.murphy, joro,
	iommu, jcrouse, saiprakash.ranjan, Sharat Masetty

The register read-modify-write construct is generic enough
that it can be used by other subsystems as needed, create
a more generic rmw() function and have the gpu_rmw() use
this new function.

Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/msm_drv.c | 8 ++++++++
 drivers/gpu/drm/msm/msm_drv.h | 1 +
 drivers/gpu/drm/msm/msm_gpu.h | 5 +----
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index f50fefb..4c4559f 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -165,6 +165,14 @@ u32 msm_readl(const void __iomem *addr)
 	return val;
 }
 
+void msm_rmw(void __iomem *addr, u32 mask, u32 or)
+{
+	u32 val = msm_readl(addr);
+
+	val &= ~mask;
+	msm_writel(val | or, addr);
+}
+
 struct msm_vblank_work {
 	struct work_struct work;
 	int crtc_id;
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index 71547e7..997729e 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -409,6 +409,7 @@ void __iomem *msm_ioremap(struct platform_device *pdev, const char *name,
 		const char *dbgname);
 void msm_writel(u32 data, void __iomem *addr);
 u32 msm_readl(const void __iomem *addr);
+void msm_rmw(void __iomem *addr, u32 mask, u32 or);
 
 struct msm_gpu_submitqueue;
 int msm_submitqueue_init(struct drm_device *drm, struct msm_file_private *ctx);
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index ab8f0f9c..a58ef16 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -223,10 +223,7 @@ static inline u32 gpu_read(struct msm_gpu *gpu, u32 reg)
 
 static inline void gpu_rmw(struct msm_gpu *gpu, u32 reg, u32 mask, u32 or)
 {
-	uint32_t val = gpu_read(gpu, reg);
-
-	val &= ~mask;
-	gpu_write(gpu, reg, val | or);
+	msm_rmw(gpu->mmio + (reg << 2), mask, or);
 }
 
 static inline u64 gpu_read64(struct msm_gpu *gpu, u32 lo, u32 hi)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 3/5] drm/msm: rearrange the gpu_rmw() function
@ 2019-12-19 13:14   ` Sharat Masetty
  0 siblings, 0 replies; 36+ messages in thread
From: Sharat Masetty @ 2019-12-19 13:14 UTC (permalink / raw)
  To: freedreno
  Cc: will, linux-arm-msm, linux-kernel, iommu, dri-devel,
	robin.murphy, Sharat Masetty

The register read-modify-write construct is generic enough
that it can be used by other subsystems as needed, create
a more generic rmw() function and have the gpu_rmw() use
this new function.

Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/msm_drv.c | 8 ++++++++
 drivers/gpu/drm/msm/msm_drv.h | 1 +
 drivers/gpu/drm/msm/msm_gpu.h | 5 +----
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index f50fefb..4c4559f 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -165,6 +165,14 @@ u32 msm_readl(const void __iomem *addr)
 	return val;
 }
 
+void msm_rmw(void __iomem *addr, u32 mask, u32 or)
+{
+	u32 val = msm_readl(addr);
+
+	val &= ~mask;
+	msm_writel(val | or, addr);
+}
+
 struct msm_vblank_work {
 	struct work_struct work;
 	int crtc_id;
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index 71547e7..997729e 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -409,6 +409,7 @@ void __iomem *msm_ioremap(struct platform_device *pdev, const char *name,
 		const char *dbgname);
 void msm_writel(u32 data, void __iomem *addr);
 u32 msm_readl(const void __iomem *addr);
+void msm_rmw(void __iomem *addr, u32 mask, u32 or);
 
 struct msm_gpu_submitqueue;
 int msm_submitqueue_init(struct drm_device *drm, struct msm_file_private *ctx);
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index ab8f0f9c..a58ef16 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -223,10 +223,7 @@ static inline u32 gpu_read(struct msm_gpu *gpu, u32 reg)
 
 static inline void gpu_rmw(struct msm_gpu *gpu, u32 reg, u32 mask, u32 or)
 {
-	uint32_t val = gpu_read(gpu, reg);
-
-	val &= ~mask;
-	gpu_write(gpu, reg, val | or);
+	msm_rmw(gpu->mmio + (reg << 2), mask, or);
 }
 
 static inline u64 gpu_read64(struct msm_gpu *gpu, u32 lo, u32 hi)
-- 
1.9.1
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 3/5] drm/msm: rearrange the gpu_rmw() function
@ 2019-12-19 13:14   ` Sharat Masetty
  0 siblings, 0 replies; 36+ messages in thread
From: Sharat Masetty @ 2019-12-19 13:14 UTC (permalink / raw)
  To: freedreno
  Cc: saiprakash.ranjan, will, linux-arm-msm, linux-kernel, iommu,
	dri-devel, robin.murphy, Sharat Masetty

The register read-modify-write construct is generic enough
that it can be used by other subsystems as needed, create
a more generic rmw() function and have the gpu_rmw() use
this new function.

Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/msm_drv.c | 8 ++++++++
 drivers/gpu/drm/msm/msm_drv.h | 1 +
 drivers/gpu/drm/msm/msm_gpu.h | 5 +----
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index f50fefb..4c4559f 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -165,6 +165,14 @@ u32 msm_readl(const void __iomem *addr)
 	return val;
 }
 
+void msm_rmw(void __iomem *addr, u32 mask, u32 or)
+{
+	u32 val = msm_readl(addr);
+
+	val &= ~mask;
+	msm_writel(val | or, addr);
+}
+
 struct msm_vblank_work {
 	struct work_struct work;
 	int crtc_id;
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index 71547e7..997729e 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -409,6 +409,7 @@ void __iomem *msm_ioremap(struct platform_device *pdev, const char *name,
 		const char *dbgname);
 void msm_writel(u32 data, void __iomem *addr);
 u32 msm_readl(const void __iomem *addr);
+void msm_rmw(void __iomem *addr, u32 mask, u32 or);
 
 struct msm_gpu_submitqueue;
 int msm_submitqueue_init(struct drm_device *drm, struct msm_file_private *ctx);
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index ab8f0f9c..a58ef16 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -223,10 +223,7 @@ static inline u32 gpu_read(struct msm_gpu *gpu, u32 reg)
 
 static inline void gpu_rmw(struct msm_gpu *gpu, u32 reg, u32 mask, u32 or)
 {
-	uint32_t val = gpu_read(gpu, reg);
-
-	val &= ~mask;
-	gpu_write(gpu, reg, val | or);
+	msm_rmw(gpu->mmio + (reg << 2), mask, or);
 }
 
 static inline u64 gpu_read64(struct msm_gpu *gpu, u32 lo, u32 hi)
-- 
1.9.1
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 4/5] drm/msm: Pass mmu features to generic layers
  2019-12-19 13:14 ` Sharat Masetty
  (?)
@ 2019-12-19 13:14   ` Sharat Masetty
  -1 siblings, 0 replies; 36+ messages in thread
From: Sharat Masetty @ 2019-12-19 13:14 UTC (permalink / raw)
  To: freedreno
  Cc: dri-devel, linux-arm-msm, linux-kernel, will, robin.murphy, joro,
	iommu, jcrouse, saiprakash.ranjan, Sharat Masetty

Allow different Adreno targets the ability to pass
specific mmu features to the generic layers. This will
help conditionally configure certain iommu features for
certain Adreno targets.

Also Add a few simple support functions to support a bitmask of
features that a specific MMU implementation supports.

Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/a2xx_gpu.c   |  2 +-
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c   |  2 +-
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c   |  2 +-
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c   |  2 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c   |  2 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.c |  4 +++-
 drivers/gpu/drm/msm/adreno/adreno_gpu.h |  2 +-
 drivers/gpu/drm/msm/msm_gpu.c           |  6 ++++--
 drivers/gpu/drm/msm/msm_gpu.h           |  1 +
 drivers/gpu/drm/msm/msm_mmu.h           | 11 +++++++++++
 10 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
index 1f83bc1..bbac43c 100644
--- a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
@@ -472,7 +472,7 @@ struct msm_gpu *a2xx_gpu_init(struct drm_device *dev)
 
 	adreno_gpu->reg_offsets = a2xx_register_offsets;
 
-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
+	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
 	if (ret)
 		goto fail;
 
diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
index 5f7e980..63448fb 100644
--- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
@@ -488,7 +488,7 @@ struct msm_gpu *a3xx_gpu_init(struct drm_device *dev)
 	adreno_gpu->registers = a3xx_registers;
 	adreno_gpu->reg_offsets = a3xx_register_offsets;
 
-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
+	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
 	if (ret)
 		goto fail;
 
diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
index ab2b752..90ae26d 100644
--- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
@@ -572,7 +572,7 @@ struct msm_gpu *a4xx_gpu_init(struct drm_device *dev)
 	adreno_gpu->registers = a4xx_registers;
 	adreno_gpu->reg_offsets = a4xx_register_offsets;
 
-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
+	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
 	if (ret)
 		goto fail;
 
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index 99cd6e6..a51ed2e 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -1445,7 +1445,7 @@ struct msm_gpu *a5xx_gpu_init(struct drm_device *dev)
 
 	check_speed_bin(&pdev->dev);
 
-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 4);
+	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 4, 0);
 	if (ret) {
 		a5xx_destroy(&(a5xx_gpu->base.base));
 		return ERR_PTR(ret);
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index daf0780..faff6ff 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -924,7 +924,7 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
 	adreno_gpu->registers = NULL;
 	adreno_gpu->reg_offsets = a6xx_register_offsets;
 
-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
+	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
 	if (ret) {
 		a6xx_destroy(&(a6xx_gpu->base.base));
 		return ERR_PTR(ret);
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 048c8be..7dade16 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -895,7 +895,8 @@ static int adreno_get_pwrlevels(struct device *dev,
 
 int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 		struct adreno_gpu *adreno_gpu,
-		const struct adreno_gpu_funcs *funcs, int nr_rings)
+		const struct adreno_gpu_funcs *funcs, int nr_rings,
+		u32 mmu_features)
 {
 	struct adreno_platform_config *config = pdev->dev.platform_data;
 	struct msm_gpu_config adreno_gpu_config  = { 0 };
@@ -916,6 +917,7 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 		adreno_gpu_config.va_end = SZ_16M + 0xfff * SZ_64K;
 
 	adreno_gpu_config.nr_rings = nr_rings;
+	adreno_gpu_config.mmu_features = mmu_features;
 
 	adreno_get_pwrlevels(&pdev->dev, gpu);
 
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
index e12d5a9..27716f6 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
@@ -248,7 +248,7 @@ void adreno_show(struct msm_gpu *gpu, struct msm_gpu_state *state,
 
 int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 		struct adreno_gpu *gpu, const struct adreno_gpu_funcs *funcs,
-		int nr_rings);
+		int nr_rings, u32 mmu_features);
 void adreno_gpu_cleanup(struct adreno_gpu *gpu);
 int adreno_load_fw(struct adreno_gpu *adreno_gpu);
 
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index a052364..8bba01e 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -804,7 +804,7 @@ static int get_clocks(struct platform_device *pdev, struct msm_gpu *gpu)
 
 static struct msm_gem_address_space *
 msm_gpu_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev,
-		uint64_t va_start, uint64_t va_end)
+		uint64_t va_start, uint64_t va_end, u32 mmu_features)
 {
 	struct msm_gem_address_space *aspace;
 	int ret;
@@ -838,6 +838,8 @@ static int get_clocks(struct platform_device *pdev, struct msm_gpu *gpu)
 		return ERR_CAST(aspace);
 	}
 
+	msm_mmu_set_feature(aspace->mmu, mmu_features);
+
 	ret = aspace->mmu->funcs->attach(aspace->mmu, NULL, 0);
 	if (ret) {
 		msm_gem_address_space_put(aspace);
@@ -920,7 +922,7 @@ int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 	msm_devfreq_init(gpu);
 
 	gpu->aspace = msm_gpu_create_address_space(gpu, pdev,
-		config->va_start, config->va_end);
+		config->va_start, config->va_end, config->mmu_features);
 
 	if (gpu->aspace == NULL)
 		DRM_DEV_INFO(drm->dev, "%s: no IOMMU, fallback to VRAM carveout!\n", name);
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index a58ef16..fcdbab6 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -24,6 +24,7 @@ struct msm_gpu_config {
 	uint64_t va_start;
 	uint64_t va_end;
 	unsigned int nr_rings;
+	u32 mmu_features;
 };
 
 /* So far, with hardware that I've seen to date, we can have:
diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
index 871d563..1e4ac36d 100644
--- a/drivers/gpu/drm/msm/msm_mmu.h
+++ b/drivers/gpu/drm/msm/msm_mmu.h
@@ -23,6 +23,7 @@ struct msm_mmu {
 	struct device *dev;
 	int (*handler)(void *arg, unsigned long iova, int flags);
 	void *arg;
+	u32 features;
 };
 
 static inline void msm_mmu_init(struct msm_mmu *mmu, struct device *dev,
@@ -45,4 +46,14 @@ static inline void msm_mmu_set_fault_handler(struct msm_mmu *mmu, void *arg,
 void msm_gpummu_params(struct msm_mmu *mmu, dma_addr_t *pt_base,
 		dma_addr_t *tran_error);
 
+static inline void msm_mmu_set_feature(struct msm_mmu *mmu, u32 feature)
+{
+	mmu->features |= feature;
+}
+
+static inline bool msm_mmu_has_feature(struct msm_mmu *mmu, u32 feature)
+{
+	return (mmu->features & feature) ? true : false;
+}
+
 #endif /* __MSM_MMU_H__ */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 4/5] drm/msm: Pass mmu features to generic layers
@ 2019-12-19 13:14   ` Sharat Masetty
  0 siblings, 0 replies; 36+ messages in thread
From: Sharat Masetty @ 2019-12-19 13:14 UTC (permalink / raw)
  To: freedreno
  Cc: will, linux-arm-msm, linux-kernel, iommu, dri-devel,
	robin.murphy, Sharat Masetty

Allow different Adreno targets the ability to pass
specific mmu features to the generic layers. This will
help conditionally configure certain iommu features for
certain Adreno targets.

Also Add a few simple support functions to support a bitmask of
features that a specific MMU implementation supports.

Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/a2xx_gpu.c   |  2 +-
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c   |  2 +-
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c   |  2 +-
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c   |  2 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c   |  2 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.c |  4 +++-
 drivers/gpu/drm/msm/adreno/adreno_gpu.h |  2 +-
 drivers/gpu/drm/msm/msm_gpu.c           |  6 ++++--
 drivers/gpu/drm/msm/msm_gpu.h           |  1 +
 drivers/gpu/drm/msm/msm_mmu.h           | 11 +++++++++++
 10 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
index 1f83bc1..bbac43c 100644
--- a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
@@ -472,7 +472,7 @@ struct msm_gpu *a2xx_gpu_init(struct drm_device *dev)
 
 	adreno_gpu->reg_offsets = a2xx_register_offsets;
 
-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
+	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
 	if (ret)
 		goto fail;
 
diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
index 5f7e980..63448fb 100644
--- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
@@ -488,7 +488,7 @@ struct msm_gpu *a3xx_gpu_init(struct drm_device *dev)
 	adreno_gpu->registers = a3xx_registers;
 	adreno_gpu->reg_offsets = a3xx_register_offsets;
 
-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
+	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
 	if (ret)
 		goto fail;
 
diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
index ab2b752..90ae26d 100644
--- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
@@ -572,7 +572,7 @@ struct msm_gpu *a4xx_gpu_init(struct drm_device *dev)
 	adreno_gpu->registers = a4xx_registers;
 	adreno_gpu->reg_offsets = a4xx_register_offsets;
 
-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
+	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
 	if (ret)
 		goto fail;
 
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index 99cd6e6..a51ed2e 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -1445,7 +1445,7 @@ struct msm_gpu *a5xx_gpu_init(struct drm_device *dev)
 
 	check_speed_bin(&pdev->dev);
 
-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 4);
+	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 4, 0);
 	if (ret) {
 		a5xx_destroy(&(a5xx_gpu->base.base));
 		return ERR_PTR(ret);
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index daf0780..faff6ff 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -924,7 +924,7 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
 	adreno_gpu->registers = NULL;
 	adreno_gpu->reg_offsets = a6xx_register_offsets;
 
-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
+	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
 	if (ret) {
 		a6xx_destroy(&(a6xx_gpu->base.base));
 		return ERR_PTR(ret);
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 048c8be..7dade16 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -895,7 +895,8 @@ static int adreno_get_pwrlevels(struct device *dev,
 
 int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 		struct adreno_gpu *adreno_gpu,
-		const struct adreno_gpu_funcs *funcs, int nr_rings)
+		const struct adreno_gpu_funcs *funcs, int nr_rings,
+		u32 mmu_features)
 {
 	struct adreno_platform_config *config = pdev->dev.platform_data;
 	struct msm_gpu_config adreno_gpu_config  = { 0 };
@@ -916,6 +917,7 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 		adreno_gpu_config.va_end = SZ_16M + 0xfff * SZ_64K;
 
 	adreno_gpu_config.nr_rings = nr_rings;
+	adreno_gpu_config.mmu_features = mmu_features;
 
 	adreno_get_pwrlevels(&pdev->dev, gpu);
 
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
index e12d5a9..27716f6 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
@@ -248,7 +248,7 @@ void adreno_show(struct msm_gpu *gpu, struct msm_gpu_state *state,
 
 int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 		struct adreno_gpu *gpu, const struct adreno_gpu_funcs *funcs,
-		int nr_rings);
+		int nr_rings, u32 mmu_features);
 void adreno_gpu_cleanup(struct adreno_gpu *gpu);
 int adreno_load_fw(struct adreno_gpu *adreno_gpu);
 
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index a052364..8bba01e 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -804,7 +804,7 @@ static int get_clocks(struct platform_device *pdev, struct msm_gpu *gpu)
 
 static struct msm_gem_address_space *
 msm_gpu_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev,
-		uint64_t va_start, uint64_t va_end)
+		uint64_t va_start, uint64_t va_end, u32 mmu_features)
 {
 	struct msm_gem_address_space *aspace;
 	int ret;
@@ -838,6 +838,8 @@ static int get_clocks(struct platform_device *pdev, struct msm_gpu *gpu)
 		return ERR_CAST(aspace);
 	}
 
+	msm_mmu_set_feature(aspace->mmu, mmu_features);
+
 	ret = aspace->mmu->funcs->attach(aspace->mmu, NULL, 0);
 	if (ret) {
 		msm_gem_address_space_put(aspace);
@@ -920,7 +922,7 @@ int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 	msm_devfreq_init(gpu);
 
 	gpu->aspace = msm_gpu_create_address_space(gpu, pdev,
-		config->va_start, config->va_end);
+		config->va_start, config->va_end, config->mmu_features);
 
 	if (gpu->aspace == NULL)
 		DRM_DEV_INFO(drm->dev, "%s: no IOMMU, fallback to VRAM carveout!\n", name);
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index a58ef16..fcdbab6 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -24,6 +24,7 @@ struct msm_gpu_config {
 	uint64_t va_start;
 	uint64_t va_end;
 	unsigned int nr_rings;
+	u32 mmu_features;
 };
 
 /* So far, with hardware that I've seen to date, we can have:
diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
index 871d563..1e4ac36d 100644
--- a/drivers/gpu/drm/msm/msm_mmu.h
+++ b/drivers/gpu/drm/msm/msm_mmu.h
@@ -23,6 +23,7 @@ struct msm_mmu {
 	struct device *dev;
 	int (*handler)(void *arg, unsigned long iova, int flags);
 	void *arg;
+	u32 features;
 };
 
 static inline void msm_mmu_init(struct msm_mmu *mmu, struct device *dev,
@@ -45,4 +46,14 @@ static inline void msm_mmu_set_fault_handler(struct msm_mmu *mmu, void *arg,
 void msm_gpummu_params(struct msm_mmu *mmu, dma_addr_t *pt_base,
 		dma_addr_t *tran_error);
 
+static inline void msm_mmu_set_feature(struct msm_mmu *mmu, u32 feature)
+{
+	mmu->features |= feature;
+}
+
+static inline bool msm_mmu_has_feature(struct msm_mmu *mmu, u32 feature)
+{
+	return (mmu->features & feature) ? true : false;
+}
+
 #endif /* __MSM_MMU_H__ */
-- 
1.9.1
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 4/5] drm/msm: Pass mmu features to generic layers
@ 2019-12-19 13:14   ` Sharat Masetty
  0 siblings, 0 replies; 36+ messages in thread
From: Sharat Masetty @ 2019-12-19 13:14 UTC (permalink / raw)
  To: freedreno
  Cc: saiprakash.ranjan, will, linux-arm-msm, linux-kernel, iommu,
	dri-devel, robin.murphy, Sharat Masetty

Allow different Adreno targets the ability to pass
specific mmu features to the generic layers. This will
help conditionally configure certain iommu features for
certain Adreno targets.

Also Add a few simple support functions to support a bitmask of
features that a specific MMU implementation supports.

Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/a2xx_gpu.c   |  2 +-
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c   |  2 +-
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c   |  2 +-
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c   |  2 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c   |  2 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.c |  4 +++-
 drivers/gpu/drm/msm/adreno/adreno_gpu.h |  2 +-
 drivers/gpu/drm/msm/msm_gpu.c           |  6 ++++--
 drivers/gpu/drm/msm/msm_gpu.h           |  1 +
 drivers/gpu/drm/msm/msm_mmu.h           | 11 +++++++++++
 10 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
index 1f83bc1..bbac43c 100644
--- a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
@@ -472,7 +472,7 @@ struct msm_gpu *a2xx_gpu_init(struct drm_device *dev)
 
 	adreno_gpu->reg_offsets = a2xx_register_offsets;
 
-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
+	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
 	if (ret)
 		goto fail;
 
diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
index 5f7e980..63448fb 100644
--- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
@@ -488,7 +488,7 @@ struct msm_gpu *a3xx_gpu_init(struct drm_device *dev)
 	adreno_gpu->registers = a3xx_registers;
 	adreno_gpu->reg_offsets = a3xx_register_offsets;
 
-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
+	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
 	if (ret)
 		goto fail;
 
diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
index ab2b752..90ae26d 100644
--- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
@@ -572,7 +572,7 @@ struct msm_gpu *a4xx_gpu_init(struct drm_device *dev)
 	adreno_gpu->registers = a4xx_registers;
 	adreno_gpu->reg_offsets = a4xx_register_offsets;
 
-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
+	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
 	if (ret)
 		goto fail;
 
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index 99cd6e6..a51ed2e 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -1445,7 +1445,7 @@ struct msm_gpu *a5xx_gpu_init(struct drm_device *dev)
 
 	check_speed_bin(&pdev->dev);
 
-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 4);
+	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 4, 0);
 	if (ret) {
 		a5xx_destroy(&(a5xx_gpu->base.base));
 		return ERR_PTR(ret);
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index daf0780..faff6ff 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -924,7 +924,7 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
 	adreno_gpu->registers = NULL;
 	adreno_gpu->reg_offsets = a6xx_register_offsets;
 
-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
+	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
 	if (ret) {
 		a6xx_destroy(&(a6xx_gpu->base.base));
 		return ERR_PTR(ret);
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 048c8be..7dade16 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -895,7 +895,8 @@ static int adreno_get_pwrlevels(struct device *dev,
 
 int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 		struct adreno_gpu *adreno_gpu,
-		const struct adreno_gpu_funcs *funcs, int nr_rings)
+		const struct adreno_gpu_funcs *funcs, int nr_rings,
+		u32 mmu_features)
 {
 	struct adreno_platform_config *config = pdev->dev.platform_data;
 	struct msm_gpu_config adreno_gpu_config  = { 0 };
@@ -916,6 +917,7 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 		adreno_gpu_config.va_end = SZ_16M + 0xfff * SZ_64K;
 
 	adreno_gpu_config.nr_rings = nr_rings;
+	adreno_gpu_config.mmu_features = mmu_features;
 
 	adreno_get_pwrlevels(&pdev->dev, gpu);
 
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
index e12d5a9..27716f6 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
@@ -248,7 +248,7 @@ void adreno_show(struct msm_gpu *gpu, struct msm_gpu_state *state,
 
 int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 		struct adreno_gpu *gpu, const struct adreno_gpu_funcs *funcs,
-		int nr_rings);
+		int nr_rings, u32 mmu_features);
 void adreno_gpu_cleanup(struct adreno_gpu *gpu);
 int adreno_load_fw(struct adreno_gpu *adreno_gpu);
 
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index a052364..8bba01e 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -804,7 +804,7 @@ static int get_clocks(struct platform_device *pdev, struct msm_gpu *gpu)
 
 static struct msm_gem_address_space *
 msm_gpu_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev,
-		uint64_t va_start, uint64_t va_end)
+		uint64_t va_start, uint64_t va_end, u32 mmu_features)
 {
 	struct msm_gem_address_space *aspace;
 	int ret;
@@ -838,6 +838,8 @@ static int get_clocks(struct platform_device *pdev, struct msm_gpu *gpu)
 		return ERR_CAST(aspace);
 	}
 
+	msm_mmu_set_feature(aspace->mmu, mmu_features);
+
 	ret = aspace->mmu->funcs->attach(aspace->mmu, NULL, 0);
 	if (ret) {
 		msm_gem_address_space_put(aspace);
@@ -920,7 +922,7 @@ int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 	msm_devfreq_init(gpu);
 
 	gpu->aspace = msm_gpu_create_address_space(gpu, pdev,
-		config->va_start, config->va_end);
+		config->va_start, config->va_end, config->mmu_features);
 
 	if (gpu->aspace == NULL)
 		DRM_DEV_INFO(drm->dev, "%s: no IOMMU, fallback to VRAM carveout!\n", name);
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index a58ef16..fcdbab6 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -24,6 +24,7 @@ struct msm_gpu_config {
 	uint64_t va_start;
 	uint64_t va_end;
 	unsigned int nr_rings;
+	u32 mmu_features;
 };
 
 /* So far, with hardware that I've seen to date, we can have:
diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
index 871d563..1e4ac36d 100644
--- a/drivers/gpu/drm/msm/msm_mmu.h
+++ b/drivers/gpu/drm/msm/msm_mmu.h
@@ -23,6 +23,7 @@ struct msm_mmu {
 	struct device *dev;
 	int (*handler)(void *arg, unsigned long iova, int flags);
 	void *arg;
+	u32 features;
 };
 
 static inline void msm_mmu_init(struct msm_mmu *mmu, struct device *dev,
@@ -45,4 +46,14 @@ static inline void msm_mmu_set_fault_handler(struct msm_mmu *mmu, void *arg,
 void msm_gpummu_params(struct msm_mmu *mmu, dma_addr_t *pt_base,
 		dma_addr_t *tran_error);
 
+static inline void msm_mmu_set_feature(struct msm_mmu *mmu, u32 feature)
+{
+	mmu->features |= feature;
+}
+
+static inline bool msm_mmu_has_feature(struct msm_mmu *mmu, u32 feature)
+{
+	return (mmu->features & feature) ? true : false;
+}
+
 #endif /* __MSM_MMU_H__ */
-- 
1.9.1
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 5/5] drm/msm/a6xx: Add support for using system cache(LLC)
  2019-12-19 13:14 ` Sharat Masetty
  (?)
@ 2019-12-19 13:14   ` Sharat Masetty
  -1 siblings, 0 replies; 36+ messages in thread
From: Sharat Masetty @ 2019-12-19 13:14 UTC (permalink / raw)
  To: freedreno
  Cc: dri-devel, linux-arm-msm, linux-kernel, will, robin.murphy, joro,
	iommu, jcrouse, saiprakash.ranjan, Sharat Masetty

The last level system cache can be partitioned to 32 different slices
of which GPU has two slices preallocated. One slice is used for caching GPU
buffers and the other slice is used for caching the GPU SMMU pagetables.
This patch talks to the core system cache driver to acquire the slice handles,
configure the SCID's to those slices and activates and deactivates the slices
upon GPU power collapse and restore.

Some support from the IOMMU driver is also needed to make use of the
system cache. IOMMU_QCOM_SYS_CACHE is a buffer protection flag which enables
caching GPU data buffers in the system cache with memory attributes such
as outer cacheable, read-allocate, write-allocate for buffers. The GPU
then has the ability to override a few cacheability parameters which it
does to override write-allocate to write-no-allocate as the GPU hardware
does not benefit much from it.

Similarly DOMAIN_ATTR_QCOM_SYS_CACHE is another domain level attribute
used by the IOMMU driver to set the right attributes to cache the hardware
pagetables into the system cache.

Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 122 +++++++++++++++++++++++++++++++++-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   9 +++
 drivers/gpu/drm/msm/msm_iommu.c       |  13 ++++
 drivers/gpu/drm/msm/msm_mmu.h         |   3 +
 4 files changed, 146 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index faff6ff..0c7fdee 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -9,6 +9,7 @@
 #include "a6xx_gmu.xml.h"

 #include <linux/devfreq.h>
+#include <linux/soc/qcom/llcc-qcom.h>

 #define GPU_PAS_ID 13

@@ -781,6 +782,117 @@ static void a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu)
 	gpu_write(gpu, REG_A6XX_GBIF_HALT, 0x0);
 }

+#define A6XX_LLC_NUM_GPU_SCIDS		5
+#define A6XX_GPU_LLC_SCID_NUM_BITS	5
+
+#define A6XX_GPU_LLC_SCID_MASK \
+	((1 << (A6XX_LLC_NUM_GPU_SCIDS * A6XX_GPU_LLC_SCID_NUM_BITS)) - 1)
+
+#define A6XX_GPUHTW_LLC_SCID_SHIFT	25
+#define A6XX_GPUHTW_LLC_SCID_MASK \
+	(((1 << A6XX_GPU_LLC_SCID_NUM_BITS) - 1) << A6XX_GPUHTW_LLC_SCID_SHIFT)
+
+static inline void a6xx_gpu_cx_rmw(struct a6xx_llc *llc,
+	u32 reg, u32 mask, u32 or)
+{
+	msm_rmw(llc->mmio + (reg << 2), mask, or);
+}
+
+static void a6xx_llc_deactivate(struct a6xx_llc *llc)
+{
+	llcc_slice_deactivate(llc->gpu_llc_slice);
+	llcc_slice_deactivate(llc->gpuhtw_llc_slice);
+}
+
+static void a6xx_llc_activate(struct a6xx_llc *llc)
+{
+	if (!llc->mmio)
+		return;
+
+	/* Program the sub-cache ID for all GPU blocks */
+	if (!llcc_slice_activate(llc->gpu_llc_slice))
+		a6xx_gpu_cx_rmw(llc,
+				REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1,
+				A6XX_GPU_LLC_SCID_MASK,
+				(llc->cntl1_regval &
+				 A6XX_GPU_LLC_SCID_MASK));
+
+	/* Program the sub-cache ID for the GPU pagetables */
+	if (!llcc_slice_activate(llc->gpuhtw_llc_slice))
+		a6xx_gpu_cx_rmw(llc,
+				REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1,
+				A6XX_GPUHTW_LLC_SCID_MASK,
+				(llc->cntl1_regval &
+				 A6XX_GPUHTW_LLC_SCID_MASK));
+
+	/* Program cacheability overrides */
+	a6xx_gpu_cx_rmw(llc, REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_0, 0xF,
+		llc->cntl0_regval);
+}
+
+static void a6xx_llc_slices_destroy(struct a6xx_llc *llc)
+{
+	if (llc->mmio)
+		iounmap(llc->mmio);
+
+	llcc_slice_putd(llc->gpu_llc_slice);
+	llcc_slice_putd(llc->gpuhtw_llc_slice);
+}
+
+static int a6xx_llc_slices_init(struct platform_device *pdev,
+		struct a6xx_llc *llc)
+{
+	llc->mmio = msm_ioremap(pdev, "cx_mem", "gpu_cx");
+	if (IS_ERR_OR_NULL(llc->mmio))
+		return -ENODEV;
+
+	llc->gpu_llc_slice = llcc_slice_getd(LLCC_GPU);
+	llc->gpuhtw_llc_slice = llcc_slice_getd(LLCC_GPUHTW);
+	if (IS_ERR(llc->gpu_llc_slice) && IS_ERR(llc->gpuhtw_llc_slice))
+		return -ENODEV;
+
+	/*
+	 * CNTL0 provides options to override the settings for the
+	 * read and write allocation policies for the LLC. These
+	 * overrides are global for all memory transactions from
+	 * the GPU.
+	 *
+	 * 0x3: read-no-alloc-overridden = 0
+	 *      read-no-alloc = 0 - Allocate lines on read miss
+	 *      write-no-alloc-overridden = 1
+	 *      write-no-alloc = 1 - Do not allocates lines on write miss
+	 */
+	llc->cntl0_regval = 0x03;
+
+	/*
+	 * CNTL1 is used to specify SCID for (CP, TP, VFD, CCU and UBWC
+	 * FLAG cache) GPU blocks. This value will be passed along with
+	 * the address for any memory transaction from GPU to identify
+	 * the sub-cache for that transaction.
+	 */
+	if (!IS_ERR(llc->gpu_llc_slice)) {
+		u32 gpu_scid = llcc_get_slice_id(llc->gpu_llc_slice);
+		int i;
+
+		for (i = 0; i < A6XX_LLC_NUM_GPU_SCIDS; i++)
+			llc->cntl1_regval |=
+				gpu_scid << (A6XX_GPU_LLC_SCID_NUM_BITS * i);
+	}
+
+	/*
+	 * Set SCID for GPU IOMMU. This will be used to access
+	 * page tables that are cached in LLC.
+	 */
+	if (!IS_ERR(llc->gpuhtw_llc_slice)) {
+		u32 gpuhtw_scid = llcc_get_slice_id(llc->gpuhtw_llc_slice);
+
+		llc->cntl1_regval |=
+			gpuhtw_scid << A6XX_GPUHTW_LLC_SCID_SHIFT;
+	}
+
+	return 0;
+}
+
 static int a6xx_pm_resume(struct msm_gpu *gpu)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
@@ -795,6 +907,8 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)

 	msm_gpu_resume_devfreq(gpu);

+	a6xx_llc_activate(&a6xx_gpu->llc);
+
 	return 0;
 }

@@ -803,6 +917,8 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);

+	a6xx_llc_deactivate(&a6xx_gpu->llc);
+
 	devfreq_suspend_device(gpu->devfreq.devfreq);

 	/*
@@ -851,6 +967,7 @@ static void a6xx_destroy(struct msm_gpu *gpu)
 		drm_gem_object_put_unlocked(a6xx_gpu->sqe_bo);
 	}

+	a6xx_llc_slices_destroy(&a6xx_gpu->llc);
 	a6xx_gmu_remove(a6xx_gpu);

 	adreno_gpu_cleanup(adreno_gpu);
@@ -924,7 +1041,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
 	adreno_gpu->registers = NULL;
 	adreno_gpu->reg_offsets = a6xx_register_offsets;

-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
+	ret = a6xx_llc_slices_init(pdev, &a6xx_gpu->llc);
+
+	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1,
+			ret ? 0 : MMU_FEATURE_USE_SYSTEM_CACHE);
 	if (ret) {
 		a6xx_destroy(&(a6xx_gpu->base.base));
 		return ERR_PTR(ret);
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
index 7239b8b..09b9ad0 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
@@ -12,6 +12,14 @@

 extern bool hang_debug;

+struct a6xx_llc {
+	void __iomem *mmio;
+	void *gpu_llc_slice;
+	void *gpuhtw_llc_slice;
+	u32 cntl0_regval;
+	u32 cntl1_regval;
+};
+
 struct a6xx_gpu {
 	struct adreno_gpu base;

@@ -21,6 +29,7 @@ struct a6xx_gpu {
 	struct msm_ringbuffer *cur_ring;

 	struct a6xx_gmu gmu;
+	struct a6xx_llc llc;
 };

 #define to_a6xx_gpu(x) container_of(x, struct a6xx_gpu, base)
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index 8c95c31..4699367 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -27,6 +27,16 @@ static int msm_iommu_attach(struct msm_mmu *mmu, const char * const *names,
 			    int cnt)
 {
 	struct msm_iommu *iommu = to_msm_iommu(mmu);
+	int gpu_htw_llc = 1;
+
+	/*
+	 * This allows GPU to set the bus attributes required
+	 * to use system cache on behalf of the iommu page table
+	 * walker.
+	 */
+	if (msm_mmu_has_feature(mmu, MMU_FEATURE_USE_SYSTEM_CACHE))
+		iommu_domain_set_attr(iommu->domain,
+				DOMAIN_ATTR_QCOM_SYS_CACHE, &gpu_htw_llc);

 	return iommu_attach_device(iommu->domain, mmu->dev);
 }
@@ -45,6 +55,9 @@ static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova,
 	struct msm_iommu *iommu = to_msm_iommu(mmu);
 	size_t ret;

+	if (msm_mmu_has_feature(mmu, MMU_FEATURE_USE_SYSTEM_CACHE))
+		prot |= IOMMU_QCOM_SYS_CACHE;
+
 	ret = iommu_map_sg(iommu->domain, iova, sgt->sgl, sgt->nents, prot);
 	WARN_ON(!ret);

diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
index 1e4ac36d..3e6bdad 100644
--- a/drivers/gpu/drm/msm/msm_mmu.h
+++ b/drivers/gpu/drm/msm/msm_mmu.h
@@ -18,6 +18,9 @@ struct msm_mmu_funcs {
 	void (*destroy)(struct msm_mmu *mmu);
 };

+/* MMU features */
+#define MMU_FEATURE_USE_SYSTEM_CACHE (1 << 0)
+
 struct msm_mmu {
 	const struct msm_mmu_funcs *funcs;
 	struct device *dev;
--
1.9.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 5/5] drm/msm/a6xx: Add support for using system cache(LLC)
@ 2019-12-19 13:14   ` Sharat Masetty
  0 siblings, 0 replies; 36+ messages in thread
From: Sharat Masetty @ 2019-12-19 13:14 UTC (permalink / raw)
  To: freedreno
  Cc: will, linux-arm-msm, linux-kernel, iommu, dri-devel,
	robin.murphy, Sharat Masetty

The last level system cache can be partitioned to 32 different slices
of which GPU has two slices preallocated. One slice is used for caching GPU
buffers and the other slice is used for caching the GPU SMMU pagetables.
This patch talks to the core system cache driver to acquire the slice handles,
configure the SCID's to those slices and activates and deactivates the slices
upon GPU power collapse and restore.

Some support from the IOMMU driver is also needed to make use of the
system cache. IOMMU_QCOM_SYS_CACHE is a buffer protection flag which enables
caching GPU data buffers in the system cache with memory attributes such
as outer cacheable, read-allocate, write-allocate for buffers. The GPU
then has the ability to override a few cacheability parameters which it
does to override write-allocate to write-no-allocate as the GPU hardware
does not benefit much from it.

Similarly DOMAIN_ATTR_QCOM_SYS_CACHE is another domain level attribute
used by the IOMMU driver to set the right attributes to cache the hardware
pagetables into the system cache.

Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 122 +++++++++++++++++++++++++++++++++-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   9 +++
 drivers/gpu/drm/msm/msm_iommu.c       |  13 ++++
 drivers/gpu/drm/msm/msm_mmu.h         |   3 +
 4 files changed, 146 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index faff6ff..0c7fdee 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -9,6 +9,7 @@
 #include "a6xx_gmu.xml.h"

 #include <linux/devfreq.h>
+#include <linux/soc/qcom/llcc-qcom.h>

 #define GPU_PAS_ID 13

@@ -781,6 +782,117 @@ static void a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu)
 	gpu_write(gpu, REG_A6XX_GBIF_HALT, 0x0);
 }

+#define A6XX_LLC_NUM_GPU_SCIDS		5
+#define A6XX_GPU_LLC_SCID_NUM_BITS	5
+
+#define A6XX_GPU_LLC_SCID_MASK \
+	((1 << (A6XX_LLC_NUM_GPU_SCIDS * A6XX_GPU_LLC_SCID_NUM_BITS)) - 1)
+
+#define A6XX_GPUHTW_LLC_SCID_SHIFT	25
+#define A6XX_GPUHTW_LLC_SCID_MASK \
+	(((1 << A6XX_GPU_LLC_SCID_NUM_BITS) - 1) << A6XX_GPUHTW_LLC_SCID_SHIFT)
+
+static inline void a6xx_gpu_cx_rmw(struct a6xx_llc *llc,
+	u32 reg, u32 mask, u32 or)
+{
+	msm_rmw(llc->mmio + (reg << 2), mask, or);
+}
+
+static void a6xx_llc_deactivate(struct a6xx_llc *llc)
+{
+	llcc_slice_deactivate(llc->gpu_llc_slice);
+	llcc_slice_deactivate(llc->gpuhtw_llc_slice);
+}
+
+static void a6xx_llc_activate(struct a6xx_llc *llc)
+{
+	if (!llc->mmio)
+		return;
+
+	/* Program the sub-cache ID for all GPU blocks */
+	if (!llcc_slice_activate(llc->gpu_llc_slice))
+		a6xx_gpu_cx_rmw(llc,
+				REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1,
+				A6XX_GPU_LLC_SCID_MASK,
+				(llc->cntl1_regval &
+				 A6XX_GPU_LLC_SCID_MASK));
+
+	/* Program the sub-cache ID for the GPU pagetables */
+	if (!llcc_slice_activate(llc->gpuhtw_llc_slice))
+		a6xx_gpu_cx_rmw(llc,
+				REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1,
+				A6XX_GPUHTW_LLC_SCID_MASK,
+				(llc->cntl1_regval &
+				 A6XX_GPUHTW_LLC_SCID_MASK));
+
+	/* Program cacheability overrides */
+	a6xx_gpu_cx_rmw(llc, REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_0, 0xF,
+		llc->cntl0_regval);
+}
+
+static void a6xx_llc_slices_destroy(struct a6xx_llc *llc)
+{
+	if (llc->mmio)
+		iounmap(llc->mmio);
+
+	llcc_slice_putd(llc->gpu_llc_slice);
+	llcc_slice_putd(llc->gpuhtw_llc_slice);
+}
+
+static int a6xx_llc_slices_init(struct platform_device *pdev,
+		struct a6xx_llc *llc)
+{
+	llc->mmio = msm_ioremap(pdev, "cx_mem", "gpu_cx");
+	if (IS_ERR_OR_NULL(llc->mmio))
+		return -ENODEV;
+
+	llc->gpu_llc_slice = llcc_slice_getd(LLCC_GPU);
+	llc->gpuhtw_llc_slice = llcc_slice_getd(LLCC_GPUHTW);
+	if (IS_ERR(llc->gpu_llc_slice) && IS_ERR(llc->gpuhtw_llc_slice))
+		return -ENODEV;
+
+	/*
+	 * CNTL0 provides options to override the settings for the
+	 * read and write allocation policies for the LLC. These
+	 * overrides are global for all memory transactions from
+	 * the GPU.
+	 *
+	 * 0x3: read-no-alloc-overridden = 0
+	 *      read-no-alloc = 0 - Allocate lines on read miss
+	 *      write-no-alloc-overridden = 1
+	 *      write-no-alloc = 1 - Do not allocates lines on write miss
+	 */
+	llc->cntl0_regval = 0x03;
+
+	/*
+	 * CNTL1 is used to specify SCID for (CP, TP, VFD, CCU and UBWC
+	 * FLAG cache) GPU blocks. This value will be passed along with
+	 * the address for any memory transaction from GPU to identify
+	 * the sub-cache for that transaction.
+	 */
+	if (!IS_ERR(llc->gpu_llc_slice)) {
+		u32 gpu_scid = llcc_get_slice_id(llc->gpu_llc_slice);
+		int i;
+
+		for (i = 0; i < A6XX_LLC_NUM_GPU_SCIDS; i++)
+			llc->cntl1_regval |=
+				gpu_scid << (A6XX_GPU_LLC_SCID_NUM_BITS * i);
+	}
+
+	/*
+	 * Set SCID for GPU IOMMU. This will be used to access
+	 * page tables that are cached in LLC.
+	 */
+	if (!IS_ERR(llc->gpuhtw_llc_slice)) {
+		u32 gpuhtw_scid = llcc_get_slice_id(llc->gpuhtw_llc_slice);
+
+		llc->cntl1_regval |=
+			gpuhtw_scid << A6XX_GPUHTW_LLC_SCID_SHIFT;
+	}
+
+	return 0;
+}
+
 static int a6xx_pm_resume(struct msm_gpu *gpu)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
@@ -795,6 +907,8 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)

 	msm_gpu_resume_devfreq(gpu);

+	a6xx_llc_activate(&a6xx_gpu->llc);
+
 	return 0;
 }

@@ -803,6 +917,8 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);

+	a6xx_llc_deactivate(&a6xx_gpu->llc);
+
 	devfreq_suspend_device(gpu->devfreq.devfreq);

 	/*
@@ -851,6 +967,7 @@ static void a6xx_destroy(struct msm_gpu *gpu)
 		drm_gem_object_put_unlocked(a6xx_gpu->sqe_bo);
 	}

+	a6xx_llc_slices_destroy(&a6xx_gpu->llc);
 	a6xx_gmu_remove(a6xx_gpu);

 	adreno_gpu_cleanup(adreno_gpu);
@@ -924,7 +1041,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
 	adreno_gpu->registers = NULL;
 	adreno_gpu->reg_offsets = a6xx_register_offsets;

-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
+	ret = a6xx_llc_slices_init(pdev, &a6xx_gpu->llc);
+
+	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1,
+			ret ? 0 : MMU_FEATURE_USE_SYSTEM_CACHE);
 	if (ret) {
 		a6xx_destroy(&(a6xx_gpu->base.base));
 		return ERR_PTR(ret);
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
index 7239b8b..09b9ad0 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
@@ -12,6 +12,14 @@

 extern bool hang_debug;

+struct a6xx_llc {
+	void __iomem *mmio;
+	void *gpu_llc_slice;
+	void *gpuhtw_llc_slice;
+	u32 cntl0_regval;
+	u32 cntl1_regval;
+};
+
 struct a6xx_gpu {
 	struct adreno_gpu base;

@@ -21,6 +29,7 @@ struct a6xx_gpu {
 	struct msm_ringbuffer *cur_ring;

 	struct a6xx_gmu gmu;
+	struct a6xx_llc llc;
 };

 #define to_a6xx_gpu(x) container_of(x, struct a6xx_gpu, base)
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index 8c95c31..4699367 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -27,6 +27,16 @@ static int msm_iommu_attach(struct msm_mmu *mmu, const char * const *names,
 			    int cnt)
 {
 	struct msm_iommu *iommu = to_msm_iommu(mmu);
+	int gpu_htw_llc = 1;
+
+	/*
+	 * This allows GPU to set the bus attributes required
+	 * to use system cache on behalf of the iommu page table
+	 * walker.
+	 */
+	if (msm_mmu_has_feature(mmu, MMU_FEATURE_USE_SYSTEM_CACHE))
+		iommu_domain_set_attr(iommu->domain,
+				DOMAIN_ATTR_QCOM_SYS_CACHE, &gpu_htw_llc);

 	return iommu_attach_device(iommu->domain, mmu->dev);
 }
@@ -45,6 +55,9 @@ static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova,
 	struct msm_iommu *iommu = to_msm_iommu(mmu);
 	size_t ret;

+	if (msm_mmu_has_feature(mmu, MMU_FEATURE_USE_SYSTEM_CACHE))
+		prot |= IOMMU_QCOM_SYS_CACHE;
+
 	ret = iommu_map_sg(iommu->domain, iova, sgt->sgl, sgt->nents, prot);
 	WARN_ON(!ret);

diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
index 1e4ac36d..3e6bdad 100644
--- a/drivers/gpu/drm/msm/msm_mmu.h
+++ b/drivers/gpu/drm/msm/msm_mmu.h
@@ -18,6 +18,9 @@ struct msm_mmu_funcs {
 	void (*destroy)(struct msm_mmu *mmu);
 };

+/* MMU features */
+#define MMU_FEATURE_USE_SYSTEM_CACHE (1 << 0)
+
 struct msm_mmu {
 	const struct msm_mmu_funcs *funcs;
 	struct device *dev;
--
1.9.1
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 5/5] drm/msm/a6xx: Add support for using system cache(LLC)
@ 2019-12-19 13:14   ` Sharat Masetty
  0 siblings, 0 replies; 36+ messages in thread
From: Sharat Masetty @ 2019-12-19 13:14 UTC (permalink / raw)
  To: freedreno
  Cc: saiprakash.ranjan, will, linux-arm-msm, linux-kernel, iommu,
	dri-devel, robin.murphy, Sharat Masetty

The last level system cache can be partitioned to 32 different slices
of which GPU has two slices preallocated. One slice is used for caching GPU
buffers and the other slice is used for caching the GPU SMMU pagetables.
This patch talks to the core system cache driver to acquire the slice handles,
configure the SCID's to those slices and activates and deactivates the slices
upon GPU power collapse and restore.

Some support from the IOMMU driver is also needed to make use of the
system cache. IOMMU_QCOM_SYS_CACHE is a buffer protection flag which enables
caching GPU data buffers in the system cache with memory attributes such
as outer cacheable, read-allocate, write-allocate for buffers. The GPU
then has the ability to override a few cacheability parameters which it
does to override write-allocate to write-no-allocate as the GPU hardware
does not benefit much from it.

Similarly DOMAIN_ATTR_QCOM_SYS_CACHE is another domain level attribute
used by the IOMMU driver to set the right attributes to cache the hardware
pagetables into the system cache.

Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 122 +++++++++++++++++++++++++++++++++-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   9 +++
 drivers/gpu/drm/msm/msm_iommu.c       |  13 ++++
 drivers/gpu/drm/msm/msm_mmu.h         |   3 +
 4 files changed, 146 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index faff6ff..0c7fdee 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -9,6 +9,7 @@
 #include "a6xx_gmu.xml.h"

 #include <linux/devfreq.h>
+#include <linux/soc/qcom/llcc-qcom.h>

 #define GPU_PAS_ID 13

@@ -781,6 +782,117 @@ static void a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu)
 	gpu_write(gpu, REG_A6XX_GBIF_HALT, 0x0);
 }

+#define A6XX_LLC_NUM_GPU_SCIDS		5
+#define A6XX_GPU_LLC_SCID_NUM_BITS	5
+
+#define A6XX_GPU_LLC_SCID_MASK \
+	((1 << (A6XX_LLC_NUM_GPU_SCIDS * A6XX_GPU_LLC_SCID_NUM_BITS)) - 1)
+
+#define A6XX_GPUHTW_LLC_SCID_SHIFT	25
+#define A6XX_GPUHTW_LLC_SCID_MASK \
+	(((1 << A6XX_GPU_LLC_SCID_NUM_BITS) - 1) << A6XX_GPUHTW_LLC_SCID_SHIFT)
+
+static inline void a6xx_gpu_cx_rmw(struct a6xx_llc *llc,
+	u32 reg, u32 mask, u32 or)
+{
+	msm_rmw(llc->mmio + (reg << 2), mask, or);
+}
+
+static void a6xx_llc_deactivate(struct a6xx_llc *llc)
+{
+	llcc_slice_deactivate(llc->gpu_llc_slice);
+	llcc_slice_deactivate(llc->gpuhtw_llc_slice);
+}
+
+static void a6xx_llc_activate(struct a6xx_llc *llc)
+{
+	if (!llc->mmio)
+		return;
+
+	/* Program the sub-cache ID for all GPU blocks */
+	if (!llcc_slice_activate(llc->gpu_llc_slice))
+		a6xx_gpu_cx_rmw(llc,
+				REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1,
+				A6XX_GPU_LLC_SCID_MASK,
+				(llc->cntl1_regval &
+				 A6XX_GPU_LLC_SCID_MASK));
+
+	/* Program the sub-cache ID for the GPU pagetables */
+	if (!llcc_slice_activate(llc->gpuhtw_llc_slice))
+		a6xx_gpu_cx_rmw(llc,
+				REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1,
+				A6XX_GPUHTW_LLC_SCID_MASK,
+				(llc->cntl1_regval &
+				 A6XX_GPUHTW_LLC_SCID_MASK));
+
+	/* Program cacheability overrides */
+	a6xx_gpu_cx_rmw(llc, REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_0, 0xF,
+		llc->cntl0_regval);
+}
+
+static void a6xx_llc_slices_destroy(struct a6xx_llc *llc)
+{
+	if (llc->mmio)
+		iounmap(llc->mmio);
+
+	llcc_slice_putd(llc->gpu_llc_slice);
+	llcc_slice_putd(llc->gpuhtw_llc_slice);
+}
+
+static int a6xx_llc_slices_init(struct platform_device *pdev,
+		struct a6xx_llc *llc)
+{
+	llc->mmio = msm_ioremap(pdev, "cx_mem", "gpu_cx");
+	if (IS_ERR_OR_NULL(llc->mmio))
+		return -ENODEV;
+
+	llc->gpu_llc_slice = llcc_slice_getd(LLCC_GPU);
+	llc->gpuhtw_llc_slice = llcc_slice_getd(LLCC_GPUHTW);
+	if (IS_ERR(llc->gpu_llc_slice) && IS_ERR(llc->gpuhtw_llc_slice))
+		return -ENODEV;
+
+	/*
+	 * CNTL0 provides options to override the settings for the
+	 * read and write allocation policies for the LLC. These
+	 * overrides are global for all memory transactions from
+	 * the GPU.
+	 *
+	 * 0x3: read-no-alloc-overridden = 0
+	 *      read-no-alloc = 0 - Allocate lines on read miss
+	 *      write-no-alloc-overridden = 1
+	 *      write-no-alloc = 1 - Do not allocates lines on write miss
+	 */
+	llc->cntl0_regval = 0x03;
+
+	/*
+	 * CNTL1 is used to specify SCID for (CP, TP, VFD, CCU and UBWC
+	 * FLAG cache) GPU blocks. This value will be passed along with
+	 * the address for any memory transaction from GPU to identify
+	 * the sub-cache for that transaction.
+	 */
+	if (!IS_ERR(llc->gpu_llc_slice)) {
+		u32 gpu_scid = llcc_get_slice_id(llc->gpu_llc_slice);
+		int i;
+
+		for (i = 0; i < A6XX_LLC_NUM_GPU_SCIDS; i++)
+			llc->cntl1_regval |=
+				gpu_scid << (A6XX_GPU_LLC_SCID_NUM_BITS * i);
+	}
+
+	/*
+	 * Set SCID for GPU IOMMU. This will be used to access
+	 * page tables that are cached in LLC.
+	 */
+	if (!IS_ERR(llc->gpuhtw_llc_slice)) {
+		u32 gpuhtw_scid = llcc_get_slice_id(llc->gpuhtw_llc_slice);
+
+		llc->cntl1_regval |=
+			gpuhtw_scid << A6XX_GPUHTW_LLC_SCID_SHIFT;
+	}
+
+	return 0;
+}
+
 static int a6xx_pm_resume(struct msm_gpu *gpu)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
@@ -795,6 +907,8 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)

 	msm_gpu_resume_devfreq(gpu);

+	a6xx_llc_activate(&a6xx_gpu->llc);
+
 	return 0;
 }

@@ -803,6 +917,8 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);

+	a6xx_llc_deactivate(&a6xx_gpu->llc);
+
 	devfreq_suspend_device(gpu->devfreq.devfreq);

 	/*
@@ -851,6 +967,7 @@ static void a6xx_destroy(struct msm_gpu *gpu)
 		drm_gem_object_put_unlocked(a6xx_gpu->sqe_bo);
 	}

+	a6xx_llc_slices_destroy(&a6xx_gpu->llc);
 	a6xx_gmu_remove(a6xx_gpu);

 	adreno_gpu_cleanup(adreno_gpu);
@@ -924,7 +1041,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
 	adreno_gpu->registers = NULL;
 	adreno_gpu->reg_offsets = a6xx_register_offsets;

-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
+	ret = a6xx_llc_slices_init(pdev, &a6xx_gpu->llc);
+
+	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1,
+			ret ? 0 : MMU_FEATURE_USE_SYSTEM_CACHE);
 	if (ret) {
 		a6xx_destroy(&(a6xx_gpu->base.base));
 		return ERR_PTR(ret);
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
index 7239b8b..09b9ad0 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
@@ -12,6 +12,14 @@

 extern bool hang_debug;

+struct a6xx_llc {
+	void __iomem *mmio;
+	void *gpu_llc_slice;
+	void *gpuhtw_llc_slice;
+	u32 cntl0_regval;
+	u32 cntl1_regval;
+};
+
 struct a6xx_gpu {
 	struct adreno_gpu base;

@@ -21,6 +29,7 @@ struct a6xx_gpu {
 	struct msm_ringbuffer *cur_ring;

 	struct a6xx_gmu gmu;
+	struct a6xx_llc llc;
 };

 #define to_a6xx_gpu(x) container_of(x, struct a6xx_gpu, base)
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index 8c95c31..4699367 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -27,6 +27,16 @@ static int msm_iommu_attach(struct msm_mmu *mmu, const char * const *names,
 			    int cnt)
 {
 	struct msm_iommu *iommu = to_msm_iommu(mmu);
+	int gpu_htw_llc = 1;
+
+	/*
+	 * This allows GPU to set the bus attributes required
+	 * to use system cache on behalf of the iommu page table
+	 * walker.
+	 */
+	if (msm_mmu_has_feature(mmu, MMU_FEATURE_USE_SYSTEM_CACHE))
+		iommu_domain_set_attr(iommu->domain,
+				DOMAIN_ATTR_QCOM_SYS_CACHE, &gpu_htw_llc);

 	return iommu_attach_device(iommu->domain, mmu->dev);
 }
@@ -45,6 +55,9 @@ static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova,
 	struct msm_iommu *iommu = to_msm_iommu(mmu);
 	size_t ret;

+	if (msm_mmu_has_feature(mmu, MMU_FEATURE_USE_SYSTEM_CACHE))
+		prot |= IOMMU_QCOM_SYS_CACHE;
+
 	ret = iommu_map_sg(iommu->domain, iova, sgt->sgl, sgt->nents, prot);
 	WARN_ON(!ret);

diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
index 1e4ac36d..3e6bdad 100644
--- a/drivers/gpu/drm/msm/msm_mmu.h
+++ b/drivers/gpu/drm/msm/msm_mmu.h
@@ -18,6 +18,9 @@ struct msm_mmu_funcs {
 	void (*destroy)(struct msm_mmu *mmu);
 };

+/* MMU features */
+#define MMU_FEATURE_USE_SYSTEM_CACHE (1 << 0)
+
 struct msm_mmu {
 	const struct msm_mmu_funcs *funcs;
 	struct device *dev;
--
1.9.1
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 4/5] drm/msm: Pass mmu features to generic layers
  2019-12-19 13:14   ` Sharat Masetty
  (?)
@ 2019-12-19 19:23     ` Jordan Crouse
  -1 siblings, 0 replies; 36+ messages in thread
From: Jordan Crouse @ 2019-12-19 19:23 UTC (permalink / raw)
  To: Sharat Masetty
  Cc: freedreno, dri-devel, linux-arm-msm, linux-kernel, will,
	robin.murphy, joro, iommu, saiprakash.ranjan

On Thu, Dec 19, 2019 at 06:44:45PM +0530, Sharat Masetty wrote:
> Allow different Adreno targets the ability to pass
> specific mmu features to the generic layers. This will
> help conditionally configure certain iommu features for
> certain Adreno targets.
> 
> Also Add a few simple support functions to support a bitmask of
> features that a specific MMU implementation supports.

This whole change could benefit from [1] which makes the address space
creation target specific.

That would get rid of most of the blobs. Further more, if you took part of [2]
that set up the mmu inside of the target specific code (skipping over the
SPLIT_PAGETABLE stuff for now) you could set mmu->features directly and not need
a helper function to do it.

[1] https://patchwork.freedesktop.org/patch/342170/
[2] https://patchwork.freedesktop.org/patch/342173/

Jordan

> Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
> ---
>  drivers/gpu/drm/msm/adreno/a2xx_gpu.c   |  2 +-
>  drivers/gpu/drm/msm/adreno/a3xx_gpu.c   |  2 +-
>  drivers/gpu/drm/msm/adreno/a4xx_gpu.c   |  2 +-
>  drivers/gpu/drm/msm/adreno/a5xx_gpu.c   |  2 +-
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c   |  2 +-
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c |  4 +++-
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h |  2 +-
>  drivers/gpu/drm/msm/msm_gpu.c           |  6 ++++--
>  drivers/gpu/drm/msm/msm_gpu.h           |  1 +
>  drivers/gpu/drm/msm/msm_mmu.h           | 11 +++++++++++
>  10 files changed, 25 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
> index 1f83bc1..bbac43c 100644
> --- a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
> @@ -472,7 +472,7 @@ struct msm_gpu *a2xx_gpu_init(struct drm_device *dev)
>  
>  	adreno_gpu->reg_offsets = a2xx_register_offsets;
>  
> -	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
> +	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
>  	if (ret)
>  		goto fail;
>  
> diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
> index 5f7e980..63448fb 100644
> --- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
> @@ -488,7 +488,7 @@ struct msm_gpu *a3xx_gpu_init(struct drm_device *dev)
>  	adreno_gpu->registers = a3xx_registers;
>  	adreno_gpu->reg_offsets = a3xx_register_offsets;
>  
> -	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
> +	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
>  	if (ret)
>  		goto fail;
>  
> diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
> index ab2b752..90ae26d 100644
> --- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
> @@ -572,7 +572,7 @@ struct msm_gpu *a4xx_gpu_init(struct drm_device *dev)
>  	adreno_gpu->registers = a4xx_registers;
>  	adreno_gpu->reg_offsets = a4xx_register_offsets;
>  
> -	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
> +	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
>  	if (ret)
>  		goto fail;
>  
> diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> index 99cd6e6..a51ed2e 100644
> --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> @@ -1445,7 +1445,7 @@ struct msm_gpu *a5xx_gpu_init(struct drm_device *dev)
>  
>  	check_speed_bin(&pdev->dev);
>  
> -	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 4);
> +	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 4, 0);
>  	if (ret) {
>  		a5xx_destroy(&(a5xx_gpu->base.base));
>  		return ERR_PTR(ret);
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index daf0780..faff6ff 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -924,7 +924,7 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>  	adreno_gpu->registers = NULL;
>  	adreno_gpu->reg_offsets = a6xx_register_offsets;
>  
> -	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
> +	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
>  	if (ret) {
>  		a6xx_destroy(&(a6xx_gpu->base.base));
>  		return ERR_PTR(ret);
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> index 048c8be..7dade16 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> @@ -895,7 +895,8 @@ static int adreno_get_pwrlevels(struct device *dev,
>  
>  int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
>  		struct adreno_gpu *adreno_gpu,
> -		const struct adreno_gpu_funcs *funcs, int nr_rings)
> +		const struct adreno_gpu_funcs *funcs, int nr_rings,
> +		u32 mmu_features)
>  {
>  	struct adreno_platform_config *config = pdev->dev.platform_data;
>  	struct msm_gpu_config adreno_gpu_config  = { 0 };
> @@ -916,6 +917,7 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
>  		adreno_gpu_config.va_end = SZ_16M + 0xfff * SZ_64K;
>  
>  	adreno_gpu_config.nr_rings = nr_rings;
> +	adreno_gpu_config.mmu_features = mmu_features;
>  
>  	adreno_get_pwrlevels(&pdev->dev, gpu);
>  
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> index e12d5a9..27716f6 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> @@ -248,7 +248,7 @@ void adreno_show(struct msm_gpu *gpu, struct msm_gpu_state *state,
>  
>  int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
>  		struct adreno_gpu *gpu, const struct adreno_gpu_funcs *funcs,
> -		int nr_rings);
> +		int nr_rings, u32 mmu_features);
>  void adreno_gpu_cleanup(struct adreno_gpu *gpu);
>  int adreno_load_fw(struct adreno_gpu *adreno_gpu);
>  
> diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
> index a052364..8bba01e 100644
> --- a/drivers/gpu/drm/msm/msm_gpu.c
> +++ b/drivers/gpu/drm/msm/msm_gpu.c
> @@ -804,7 +804,7 @@ static int get_clocks(struct platform_device *pdev, struct msm_gpu *gpu)
>  
>  static struct msm_gem_address_space *
>  msm_gpu_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev,
> -		uint64_t va_start, uint64_t va_end)
> +		uint64_t va_start, uint64_t va_end, u32 mmu_features)
>  {
>  	struct msm_gem_address_space *aspace;
>  	int ret;
> @@ -838,6 +838,8 @@ static int get_clocks(struct platform_device *pdev, struct msm_gpu *gpu)
>  		return ERR_CAST(aspace);
>  	}
>  
> +	msm_mmu_set_feature(aspace->mmu, mmu_features);
> +
>  	ret = aspace->mmu->funcs->attach(aspace->mmu, NULL, 0);
>  	if (ret) {
>  		msm_gem_address_space_put(aspace);
> @@ -920,7 +922,7 @@ int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
>  	msm_devfreq_init(gpu);
>  
>  	gpu->aspace = msm_gpu_create_address_space(gpu, pdev,
> -		config->va_start, config->va_end);
> +		config->va_start, config->va_end, config->mmu_features);


>  
>  	if (gpu->aspace == NULL)
>  		DRM_DEV_INFO(drm->dev, "%s: no IOMMU, fallback to VRAM carveout!\n", name);
> diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
> index a58ef16..fcdbab6 100644
> --- a/drivers/gpu/drm/msm/msm_gpu.h
> +++ b/drivers/gpu/drm/msm/msm_gpu.h
> @@ -24,6 +24,7 @@ struct msm_gpu_config {
>  	uint64_t va_start;
>  	uint64_t va_end;
>  	unsigned int nr_rings;
> +	u32 mmu_features;
>  };
>  
>  /* So far, with hardware that I've seen to date, we can have:
> diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
> index 871d563..1e4ac36d 100644
> --- a/drivers/gpu/drm/msm/msm_mmu.h
> +++ b/drivers/gpu/drm/msm/msm_mmu.h
> @@ -23,6 +23,7 @@ struct msm_mmu {
>  	struct device *dev;
>  	int (*handler)(void *arg, unsigned long iova, int flags);
>  	void *arg;
> +	u32 features;
>  };
>  
>  static inline void msm_mmu_init(struct msm_mmu *mmu, struct device *dev,
> @@ -45,4 +46,14 @@ static inline void msm_mmu_set_fault_handler(struct msm_mmu *mmu, void *arg,
>  void msm_gpummu_params(struct msm_mmu *mmu, dma_addr_t *pt_base,
>  		dma_addr_t *tran_error);
>  
> +static inline void msm_mmu_set_feature(struct msm_mmu *mmu, u32 feature)
> +{
> +	mmu->features |= feature;
> +}
> +
> +static inline bool msm_mmu_has_feature(struct msm_mmu *mmu, u32 feature)
> +{
> +	return (mmu->features & feature) ? true : false;
> +}
> +
>  #endif /* __MSM_MMU_H__ */
> -- 
> 1.9.1
> 

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 4/5] drm/msm: Pass mmu features to generic layers
@ 2019-12-19 19:23     ` Jordan Crouse
  0 siblings, 0 replies; 36+ messages in thread
From: Jordan Crouse @ 2019-12-19 19:23 UTC (permalink / raw)
  To: Sharat Masetty
  Cc: freedreno, will, linux-arm-msm, linux-kernel, iommu, dri-devel,
	robin.murphy

On Thu, Dec 19, 2019 at 06:44:45PM +0530, Sharat Masetty wrote:
> Allow different Adreno targets the ability to pass
> specific mmu features to the generic layers. This will
> help conditionally configure certain iommu features for
> certain Adreno targets.
> 
> Also Add a few simple support functions to support a bitmask of
> features that a specific MMU implementation supports.

This whole change could benefit from [1] which makes the address space
creation target specific.

That would get rid of most of the blobs. Further more, if you took part of [2]
that set up the mmu inside of the target specific code (skipping over the
SPLIT_PAGETABLE stuff for now) you could set mmu->features directly and not need
a helper function to do it.

[1] https://patchwork.freedesktop.org/patch/342170/
[2] https://patchwork.freedesktop.org/patch/342173/

Jordan

> Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
> ---
>  drivers/gpu/drm/msm/adreno/a2xx_gpu.c   |  2 +-
>  drivers/gpu/drm/msm/adreno/a3xx_gpu.c   |  2 +-
>  drivers/gpu/drm/msm/adreno/a4xx_gpu.c   |  2 +-
>  drivers/gpu/drm/msm/adreno/a5xx_gpu.c   |  2 +-
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c   |  2 +-
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c |  4 +++-
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h |  2 +-
>  drivers/gpu/drm/msm/msm_gpu.c           |  6 ++++--
>  drivers/gpu/drm/msm/msm_gpu.h           |  1 +
>  drivers/gpu/drm/msm/msm_mmu.h           | 11 +++++++++++
>  10 files changed, 25 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
> index 1f83bc1..bbac43c 100644
> --- a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
> @@ -472,7 +472,7 @@ struct msm_gpu *a2xx_gpu_init(struct drm_device *dev)
>  
>  	adreno_gpu->reg_offsets = a2xx_register_offsets;
>  
> -	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
> +	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
>  	if (ret)
>  		goto fail;
>  
> diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
> index 5f7e980..63448fb 100644
> --- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
> @@ -488,7 +488,7 @@ struct msm_gpu *a3xx_gpu_init(struct drm_device *dev)
>  	adreno_gpu->registers = a3xx_registers;
>  	adreno_gpu->reg_offsets = a3xx_register_offsets;
>  
> -	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
> +	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
>  	if (ret)
>  		goto fail;
>  
> diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
> index ab2b752..90ae26d 100644
> --- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
> @@ -572,7 +572,7 @@ struct msm_gpu *a4xx_gpu_init(struct drm_device *dev)
>  	adreno_gpu->registers = a4xx_registers;
>  	adreno_gpu->reg_offsets = a4xx_register_offsets;
>  
> -	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
> +	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
>  	if (ret)
>  		goto fail;
>  
> diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> index 99cd6e6..a51ed2e 100644
> --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> @@ -1445,7 +1445,7 @@ struct msm_gpu *a5xx_gpu_init(struct drm_device *dev)
>  
>  	check_speed_bin(&pdev->dev);
>  
> -	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 4);
> +	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 4, 0);
>  	if (ret) {
>  		a5xx_destroy(&(a5xx_gpu->base.base));
>  		return ERR_PTR(ret);
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index daf0780..faff6ff 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -924,7 +924,7 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>  	adreno_gpu->registers = NULL;
>  	adreno_gpu->reg_offsets = a6xx_register_offsets;
>  
> -	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
> +	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
>  	if (ret) {
>  		a6xx_destroy(&(a6xx_gpu->base.base));
>  		return ERR_PTR(ret);
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> index 048c8be..7dade16 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> @@ -895,7 +895,8 @@ static int adreno_get_pwrlevels(struct device *dev,
>  
>  int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
>  		struct adreno_gpu *adreno_gpu,
> -		const struct adreno_gpu_funcs *funcs, int nr_rings)
> +		const struct adreno_gpu_funcs *funcs, int nr_rings,
> +		u32 mmu_features)
>  {
>  	struct adreno_platform_config *config = pdev->dev.platform_data;
>  	struct msm_gpu_config adreno_gpu_config  = { 0 };
> @@ -916,6 +917,7 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
>  		adreno_gpu_config.va_end = SZ_16M + 0xfff * SZ_64K;
>  
>  	adreno_gpu_config.nr_rings = nr_rings;
> +	adreno_gpu_config.mmu_features = mmu_features;
>  
>  	adreno_get_pwrlevels(&pdev->dev, gpu);
>  
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> index e12d5a9..27716f6 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> @@ -248,7 +248,7 @@ void adreno_show(struct msm_gpu *gpu, struct msm_gpu_state *state,
>  
>  int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
>  		struct adreno_gpu *gpu, const struct adreno_gpu_funcs *funcs,
> -		int nr_rings);
> +		int nr_rings, u32 mmu_features);
>  void adreno_gpu_cleanup(struct adreno_gpu *gpu);
>  int adreno_load_fw(struct adreno_gpu *adreno_gpu);
>  
> diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
> index a052364..8bba01e 100644
> --- a/drivers/gpu/drm/msm/msm_gpu.c
> +++ b/drivers/gpu/drm/msm/msm_gpu.c
> @@ -804,7 +804,7 @@ static int get_clocks(struct platform_device *pdev, struct msm_gpu *gpu)
>  
>  static struct msm_gem_address_space *
>  msm_gpu_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev,
> -		uint64_t va_start, uint64_t va_end)
> +		uint64_t va_start, uint64_t va_end, u32 mmu_features)
>  {
>  	struct msm_gem_address_space *aspace;
>  	int ret;
> @@ -838,6 +838,8 @@ static int get_clocks(struct platform_device *pdev, struct msm_gpu *gpu)
>  		return ERR_CAST(aspace);
>  	}
>  
> +	msm_mmu_set_feature(aspace->mmu, mmu_features);
> +
>  	ret = aspace->mmu->funcs->attach(aspace->mmu, NULL, 0);
>  	if (ret) {
>  		msm_gem_address_space_put(aspace);
> @@ -920,7 +922,7 @@ int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
>  	msm_devfreq_init(gpu);
>  
>  	gpu->aspace = msm_gpu_create_address_space(gpu, pdev,
> -		config->va_start, config->va_end);
> +		config->va_start, config->va_end, config->mmu_features);


>  
>  	if (gpu->aspace == NULL)
>  		DRM_DEV_INFO(drm->dev, "%s: no IOMMU, fallback to VRAM carveout!\n", name);
> diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
> index a58ef16..fcdbab6 100644
> --- a/drivers/gpu/drm/msm/msm_gpu.h
> +++ b/drivers/gpu/drm/msm/msm_gpu.h
> @@ -24,6 +24,7 @@ struct msm_gpu_config {
>  	uint64_t va_start;
>  	uint64_t va_end;
>  	unsigned int nr_rings;
> +	u32 mmu_features;
>  };
>  
>  /* So far, with hardware that I've seen to date, we can have:
> diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
> index 871d563..1e4ac36d 100644
> --- a/drivers/gpu/drm/msm/msm_mmu.h
> +++ b/drivers/gpu/drm/msm/msm_mmu.h
> @@ -23,6 +23,7 @@ struct msm_mmu {
>  	struct device *dev;
>  	int (*handler)(void *arg, unsigned long iova, int flags);
>  	void *arg;
> +	u32 features;
>  };
>  
>  static inline void msm_mmu_init(struct msm_mmu *mmu, struct device *dev,
> @@ -45,4 +46,14 @@ static inline void msm_mmu_set_fault_handler(struct msm_mmu *mmu, void *arg,
>  void msm_gpummu_params(struct msm_mmu *mmu, dma_addr_t *pt_base,
>  		dma_addr_t *tran_error);
>  
> +static inline void msm_mmu_set_feature(struct msm_mmu *mmu, u32 feature)
> +{
> +	mmu->features |= feature;
> +}
> +
> +static inline bool msm_mmu_has_feature(struct msm_mmu *mmu, u32 feature)
> +{
> +	return (mmu->features & feature) ? true : false;
> +}
> +
>  #endif /* __MSM_MMU_H__ */
> -- 
> 1.9.1
> 

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 4/5] drm/msm: Pass mmu features to generic layers
@ 2019-12-19 19:23     ` Jordan Crouse
  0 siblings, 0 replies; 36+ messages in thread
From: Jordan Crouse @ 2019-12-19 19:23 UTC (permalink / raw)
  To: Sharat Masetty
  Cc: freedreno, saiprakash.ranjan, will, linux-arm-msm, linux-kernel,
	iommu, dri-devel, robin.murphy

On Thu, Dec 19, 2019 at 06:44:45PM +0530, Sharat Masetty wrote:
> Allow different Adreno targets the ability to pass
> specific mmu features to the generic layers. This will
> help conditionally configure certain iommu features for
> certain Adreno targets.
> 
> Also Add a few simple support functions to support a bitmask of
> features that a specific MMU implementation supports.

This whole change could benefit from [1] which makes the address space
creation target specific.

That would get rid of most of the blobs. Further more, if you took part of [2]
that set up the mmu inside of the target specific code (skipping over the
SPLIT_PAGETABLE stuff for now) you could set mmu->features directly and not need
a helper function to do it.

[1] https://patchwork.freedesktop.org/patch/342170/
[2] https://patchwork.freedesktop.org/patch/342173/

Jordan

> Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
> ---
>  drivers/gpu/drm/msm/adreno/a2xx_gpu.c   |  2 +-
>  drivers/gpu/drm/msm/adreno/a3xx_gpu.c   |  2 +-
>  drivers/gpu/drm/msm/adreno/a4xx_gpu.c   |  2 +-
>  drivers/gpu/drm/msm/adreno/a5xx_gpu.c   |  2 +-
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c   |  2 +-
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c |  4 +++-
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h |  2 +-
>  drivers/gpu/drm/msm/msm_gpu.c           |  6 ++++--
>  drivers/gpu/drm/msm/msm_gpu.h           |  1 +
>  drivers/gpu/drm/msm/msm_mmu.h           | 11 +++++++++++
>  10 files changed, 25 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
> index 1f83bc1..bbac43c 100644
> --- a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
> @@ -472,7 +472,7 @@ struct msm_gpu *a2xx_gpu_init(struct drm_device *dev)
>  
>  	adreno_gpu->reg_offsets = a2xx_register_offsets;
>  
> -	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
> +	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
>  	if (ret)
>  		goto fail;
>  
> diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
> index 5f7e980..63448fb 100644
> --- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
> @@ -488,7 +488,7 @@ struct msm_gpu *a3xx_gpu_init(struct drm_device *dev)
>  	adreno_gpu->registers = a3xx_registers;
>  	adreno_gpu->reg_offsets = a3xx_register_offsets;
>  
> -	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
> +	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
>  	if (ret)
>  		goto fail;
>  
> diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
> index ab2b752..90ae26d 100644
> --- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
> @@ -572,7 +572,7 @@ struct msm_gpu *a4xx_gpu_init(struct drm_device *dev)
>  	adreno_gpu->registers = a4xx_registers;
>  	adreno_gpu->reg_offsets = a4xx_register_offsets;
>  
> -	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
> +	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
>  	if (ret)
>  		goto fail;
>  
> diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> index 99cd6e6..a51ed2e 100644
> --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> @@ -1445,7 +1445,7 @@ struct msm_gpu *a5xx_gpu_init(struct drm_device *dev)
>  
>  	check_speed_bin(&pdev->dev);
>  
> -	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 4);
> +	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 4, 0);
>  	if (ret) {
>  		a5xx_destroy(&(a5xx_gpu->base.base));
>  		return ERR_PTR(ret);
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index daf0780..faff6ff 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -924,7 +924,7 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>  	adreno_gpu->registers = NULL;
>  	adreno_gpu->reg_offsets = a6xx_register_offsets;
>  
> -	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
> +	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
>  	if (ret) {
>  		a6xx_destroy(&(a6xx_gpu->base.base));
>  		return ERR_PTR(ret);
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> index 048c8be..7dade16 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> @@ -895,7 +895,8 @@ static int adreno_get_pwrlevels(struct device *dev,
>  
>  int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
>  		struct adreno_gpu *adreno_gpu,
> -		const struct adreno_gpu_funcs *funcs, int nr_rings)
> +		const struct adreno_gpu_funcs *funcs, int nr_rings,
> +		u32 mmu_features)
>  {
>  	struct adreno_platform_config *config = pdev->dev.platform_data;
>  	struct msm_gpu_config adreno_gpu_config  = { 0 };
> @@ -916,6 +917,7 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
>  		adreno_gpu_config.va_end = SZ_16M + 0xfff * SZ_64K;
>  
>  	adreno_gpu_config.nr_rings = nr_rings;
> +	adreno_gpu_config.mmu_features = mmu_features;
>  
>  	adreno_get_pwrlevels(&pdev->dev, gpu);
>  
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> index e12d5a9..27716f6 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> @@ -248,7 +248,7 @@ void adreno_show(struct msm_gpu *gpu, struct msm_gpu_state *state,
>  
>  int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
>  		struct adreno_gpu *gpu, const struct adreno_gpu_funcs *funcs,
> -		int nr_rings);
> +		int nr_rings, u32 mmu_features);
>  void adreno_gpu_cleanup(struct adreno_gpu *gpu);
>  int adreno_load_fw(struct adreno_gpu *adreno_gpu);
>  
> diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
> index a052364..8bba01e 100644
> --- a/drivers/gpu/drm/msm/msm_gpu.c
> +++ b/drivers/gpu/drm/msm/msm_gpu.c
> @@ -804,7 +804,7 @@ static int get_clocks(struct platform_device *pdev, struct msm_gpu *gpu)
>  
>  static struct msm_gem_address_space *
>  msm_gpu_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev,
> -		uint64_t va_start, uint64_t va_end)
> +		uint64_t va_start, uint64_t va_end, u32 mmu_features)
>  {
>  	struct msm_gem_address_space *aspace;
>  	int ret;
> @@ -838,6 +838,8 @@ static int get_clocks(struct platform_device *pdev, struct msm_gpu *gpu)
>  		return ERR_CAST(aspace);
>  	}
>  
> +	msm_mmu_set_feature(aspace->mmu, mmu_features);
> +
>  	ret = aspace->mmu->funcs->attach(aspace->mmu, NULL, 0);
>  	if (ret) {
>  		msm_gem_address_space_put(aspace);
> @@ -920,7 +922,7 @@ int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
>  	msm_devfreq_init(gpu);
>  
>  	gpu->aspace = msm_gpu_create_address_space(gpu, pdev,
> -		config->va_start, config->va_end);
> +		config->va_start, config->va_end, config->mmu_features);


>  
>  	if (gpu->aspace == NULL)
>  		DRM_DEV_INFO(drm->dev, "%s: no IOMMU, fallback to VRAM carveout!\n", name);
> diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
> index a58ef16..fcdbab6 100644
> --- a/drivers/gpu/drm/msm/msm_gpu.h
> +++ b/drivers/gpu/drm/msm/msm_gpu.h
> @@ -24,6 +24,7 @@ struct msm_gpu_config {
>  	uint64_t va_start;
>  	uint64_t va_end;
>  	unsigned int nr_rings;
> +	u32 mmu_features;
>  };
>  
>  /* So far, with hardware that I've seen to date, we can have:
> diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
> index 871d563..1e4ac36d 100644
> --- a/drivers/gpu/drm/msm/msm_mmu.h
> +++ b/drivers/gpu/drm/msm/msm_mmu.h
> @@ -23,6 +23,7 @@ struct msm_mmu {
>  	struct device *dev;
>  	int (*handler)(void *arg, unsigned long iova, int flags);
>  	void *arg;
> +	u32 features;
>  };
>  
>  static inline void msm_mmu_init(struct msm_mmu *mmu, struct device *dev,
> @@ -45,4 +46,14 @@ static inline void msm_mmu_set_fault_handler(struct msm_mmu *mmu, void *arg,
>  void msm_gpummu_params(struct msm_mmu *mmu, dma_addr_t *pt_base,
>  		dma_addr_t *tran_error);
>  
> +static inline void msm_mmu_set_feature(struct msm_mmu *mmu, u32 feature)
> +{
> +	mmu->features |= feature;
> +}
> +
> +static inline bool msm_mmu_has_feature(struct msm_mmu *mmu, u32 feature)
> +{
> +	return (mmu->features & feature) ? true : false;
> +}
> +
>  #endif /* __MSM_MMU_H__ */
> -- 
> 1.9.1
> 

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 5/5] drm/msm/a6xx: Add support for using system cache(LLC)
  2019-12-19 13:14   ` Sharat Masetty
  (?)
@ 2019-12-19 19:58     ` Jordan Crouse
  -1 siblings, 0 replies; 36+ messages in thread
From: Jordan Crouse @ 2019-12-19 19:58 UTC (permalink / raw)
  To: Sharat Masetty
  Cc: freedreno, dri-devel, linux-arm-msm, linux-kernel, will,
	robin.murphy, joro, iommu, saiprakash.ranjan

On Thu, Dec 19, 2019 at 06:44:46PM +0530, Sharat Masetty wrote:
> The last level system cache can be partitioned to 32 different slices
> of which GPU has two slices preallocated. One slice is used for caching GPU
> buffers and the other slice is used for caching the GPU SMMU pagetables.
> This patch talks to the core system cache driver to acquire the slice handles,
> configure the SCID's to those slices and activates and deactivates the slices
> upon GPU power collapse and restore.
> 
> Some support from the IOMMU driver is also needed to make use of the
> system cache. IOMMU_QCOM_SYS_CACHE is a buffer protection flag which enables
> caching GPU data buffers in the system cache with memory attributes such
> as outer cacheable, read-allocate, write-allocate for buffers. The GPU
> then has the ability to override a few cacheability parameters which it
> does to override write-allocate to write-no-allocate as the GPU hardware
> does not benefit much from it.
> 
> Similarly DOMAIN_ATTR_QCOM_SYS_CACHE is another domain level attribute
> used by the IOMMU driver to set the right attributes to cache the hardware
> pagetables into the system cache.
> 
> Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 122 +++++++++++++++++++++++++++++++++-
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   9 +++
>  drivers/gpu/drm/msm/msm_iommu.c       |  13 ++++
>  drivers/gpu/drm/msm/msm_mmu.h         |   3 +
>  4 files changed, 146 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index faff6ff..0c7fdee 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -9,6 +9,7 @@
>  #include "a6xx_gmu.xml.h"
> 
>  #include <linux/devfreq.h>
> +#include <linux/soc/qcom/llcc-qcom.h>
> 
>  #define GPU_PAS_ID 13
> 
> @@ -781,6 +782,117 @@ static void a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu)
>  	gpu_write(gpu, REG_A6XX_GBIF_HALT, 0x0);
>  }
> 
> +#define A6XX_LLC_NUM_GPU_SCIDS		5
> +#define A6XX_GPU_LLC_SCID_NUM_BITS	5

As I mention below, I'm not sure if we need these 

> +#define A6XX_GPU_LLC_SCID_MASK \
> +	((1 << (A6XX_LLC_NUM_GPU_SCIDS * A6XX_GPU_LLC_SCID_NUM_BITS)) - 1)
> +
> +#define A6XX_GPUHTW_LLC_SCID_SHIFT	25
> +#define A6XX_GPUHTW_LLC_SCID_MASK \
> +	(((1 << A6XX_GPU_LLC_SCID_NUM_BITS) - 1) << A6XX_GPUHTW_LLC_SCID_SHIFT)
> +

Normally these go into the envytools regmap but if we're going to do these guys
lets use the power of <linux/bitfield.h> for good.

#define A6XX_GPU_LLC_SCID GENMASK(24, 0)
#define A6XX_GPUHTW_LLC_SCID GENMASK(29, 25)

> +static inline void a6xx_gpu_cx_rmw(struct a6xx_llc *llc,

Don't mark C functions as inline - let the compiler figure it out for you.

> +	u32 reg, u32 mask, u32 or)
> +{
> +	msm_rmw(llc->mmio + (reg << 2), mask, or);
> +}
> +
> +static void a6xx_llc_deactivate(struct a6xx_llc *llc)
> +{
> +	llcc_slice_deactivate(llc->gpu_llc_slice);
> +	llcc_slice_deactivate(llc->gpuhtw_llc_slice);
> +}
> +
> +static void a6xx_llc_activate(struct a6xx_llc *llc)
> +{
> +	if (!llc->mmio)
> +		return;
> +
> +	/* Program the sub-cache ID for all GPU blocks */
> +	if (!llcc_slice_activate(llc->gpu_llc_slice))
> +		a6xx_gpu_cx_rmw(llc,
> +				REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1,
> +				A6XX_GPU_LLC_SCID_MASK,
> +				(llc->cntl1_regval &
> +				 A6XX_GPU_LLC_SCID_MASK));

This is out of order with the comments below, but if we store the slice id then
you could calculate regval here and not have to store it.

> +
> +	/* Program the sub-cache ID for the GPU pagetables */
> +	if (!llcc_slice_activate(llc->gpuhtw_llc_slice))

val |= FIELD_SET(A6XX_GPUHTW_LLC_SCID, htw_llc_sliceid);

> +		a6xx_gpu_cx_rmw(llc,
> +				REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1,
> +				A6XX_GPUHTW_LLC_SCID_MASK,
> +				(llc->cntl1_regval &
> +				 A6XX_GPUHTW_LLC_SCID_MASK));

And this could be FIELD_SET(A6XX_GPUHTW_LLC_SCID, sliceid);

In theory you could just calculate the u32 and write it directly without a rmw.
In fact, that might be preferable - if the slice activate failed, you don't want
to run the risk that the scid for htw is still populated.

> +
> +	/* Program cacheability overrides */
> +	a6xx_gpu_cx_rmw(llc, REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_0, 0xF,
> +		llc->cntl0_regval);

As below, this could easily be a constant.

> +}
> +
> +static void a6xx_llc_slices_destroy(struct a6xx_llc *llc)
> +{
> +	if (llc->mmio)
> +		iounmap(llc->mmio);

msm_ioremap returns a devm_ managed resource, so do not use iounmap() to free
it. Bets to just leave it and let the gpu device handle it when it goes boom.

> +
> +	llcc_slice_putd(llc->gpu_llc_slice);
> +	llcc_slice_putd(llc->gpuhtw_llc_slice);
> +}
> +
> +static int a6xx_llc_slices_init(struct platform_device *pdev,
T
This can be void, I don't think we care if it passes or fails.

> +		struct a6xx_llc *llc)
> +{
> +	llc->mmio = msm_ioremap(pdev, "cx_mem", "gpu_cx");
> +	if (IS_ERR_OR_NULL(llc->mmio))

msm_ioremap can not return NULL.

> +		return -ENODEV;
> +
> +	llc->gpu_llc_slice = llcc_slice_getd(LLCC_GPU);
> +	llc->gpuhtw_llc_slice = llcc_slice_getd(LLCC_GPUHTW);
> +	if (IS_ERR(llc->gpu_llc_slice) && IS_ERR(llc->gpuhtw_llc_slice))
> +		return -ENODEV;
> +
> +	/*
> +	 * CNTL0 provides options to override the settings for the
> +	 * read and write allocation policies for the LLC. These
> +	 * overrides are global for all memory transactions from
> +	 * the GPU.
> +	 *
> +	 * 0x3: read-no-alloc-overridden = 0
> +	 *      read-no-alloc = 0 - Allocate lines on read miss
> +	 *      write-no-alloc-overridden = 1
> +	 *      write-no-alloc = 1 - Do not allocates lines on write miss
> +	 */
> +	llc->cntl0_regval = 0x03;

This is a fixed value isn't it?  We should be able to get away with writing a
constant.

> +
> +	/*
> +	 * CNTL1 is used to specify SCID for (CP, TP, VFD, CCU and UBWC
> +	 * FLAG cache) GPU blocks. This value will be passed along with
> +	 * the address for any memory transaction from GPU to identify
> +	 * the sub-cache for that transaction.
> +	 */
> +	if (!IS_ERR(llc->gpu_llc_slice)) {
> +		u32 gpu_scid = llcc_get_slice_id(llc->gpu_llc_slice);
> +		int i;
> +
> +		for (i = 0; i < A6XX_LLC_NUM_GPU_SCIDS; i++)
> +			llc->cntl1_regval |=
> +				gpu_scid << (A6XX_GPU_LLC_SCID_NUM_BITS * i);

As above, i'm not sure a loop is better than just:

gpu_scid &= 0x1f;

llc->cntl1_regval = (gpu_scid << 0) || (gpu_scid << 5) | (gpu_scid << 10)
 | (gpu_scid << 15) | (gpu_scid << 20);

And I'm not even sure we need do this math here in the first place.

> +	}
> +
> +	/*
> +	 * Set SCID for GPU IOMMU. This will be used to access
> +	 * page tables that are cached in LLC.
> +	 */
> +	if (!IS_ERR(llc->gpuhtw_llc_slice)) {
> +		u32 gpuhtw_scid = llcc_get_slice_id(llc->gpuhtw_llc_slice);
> +
> +		llc->cntl1_regval |=
> +			gpuhtw_scid << A6XX_GPUHTW_LLC_SCID_SHIFT;
> +	}

As above, I think storing the slice id could be more beneficial than calculating
a value, but if we do calculate a value, use FIELD_SET(A6XX_GPUHTW_LLC_SCID, )

> +
> +	return 0;
> +}
> +
>  static int a6xx_pm_resume(struct msm_gpu *gpu)
>  {
>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> @@ -795,6 +907,8 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
> 
>  	msm_gpu_resume_devfreq(gpu);
> 
> +	a6xx_llc_activate(&a6xx_gpu->llc);
> +
>  	return 0;
>  }
> 
> @@ -803,6 +917,8 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> 
> +	a6xx_llc_deactivate(&a6xx_gpu->llc);
> +
>  	devfreq_suspend_device(gpu->devfreq.devfreq);
> 
>  	/*
> @@ -851,6 +967,7 @@ static void a6xx_destroy(struct msm_gpu *gpu)
>  		drm_gem_object_put_unlocked(a6xx_gpu->sqe_bo);
>  	}
> 
> +	a6xx_llc_slices_destroy(&a6xx_gpu->llc);
>  	a6xx_gmu_remove(a6xx_gpu);
> 
>  	adreno_gpu_cleanup(adreno_gpu);
> @@ -924,7 +1041,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>  	adreno_gpu->registers = NULL;
>  	adreno_gpu->reg_offsets = a6xx_register_offsets;
> 
> -	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
> +	ret = a6xx_llc_slices_init(pdev, &a6xx_gpu->llc);
> +

Confirming we don't care if a6xx_llc_slices_init passes or fails.

> +	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1,
> +			ret ? 0 : MMU_FEATURE_USE_SYSTEM_CACHE);
>  	if (ret) {
>  		a6xx_destroy(&(a6xx_gpu->base.base));
>  		return ERR_PTR(ret);
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> index 7239b8b..09b9ad0 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> @@ -12,6 +12,14 @@
> 
>  extern bool hang_debug;
> 
> +struct a6xx_llc {
> +	void __iomem *mmio;
> +	void *gpu_llc_slice;
> +	void *gpuhtw_llc_slice;
> +	u32 cntl0_regval;

As above, I'm not sure if cntl0 is needed.  Heck, I'm not even sure cntl1 is
needed - since we could store or query the ids at activate time.

> +	u32 cntl1_regval;
> +};
> +
>  struct a6xx_gpu {
>  	struct adreno_gpu base;
> 
> @@ -21,6 +29,7 @@ struct a6xx_gpu {
>  	struct msm_ringbuffer *cur_ring;
> 
>  	struct a6xx_gmu gmu;
> +	struct a6xx_llc llc;
>  };
> 
>  #define to_a6xx_gpu(x) container_of(x, struct a6xx_gpu, base)
> diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
> index 8c95c31..4699367 100644
> --- a/drivers/gpu/drm/msm/msm_iommu.c
> +++ b/drivers/gpu/drm/msm/msm_iommu.c
> @@ -27,6 +27,16 @@ static int msm_iommu_attach(struct msm_mmu *mmu, const char * const *names,
>  			    int cnt)
>  {
>  	struct msm_iommu *iommu = to_msm_iommu(mmu);
> +	int gpu_htw_llc = 1;
> +
> +	/*
> +	 * This allows GPU to set the bus attributes required
> +	 * to use system cache on behalf of the iommu page table
> +	 * walker.
> +	 */
> +	if (msm_mmu_has_feature(mmu, MMU_FEATURE_USE_SYSTEM_CACHE))
> +		iommu_domain_set_attr(iommu->domain,
> +				DOMAIN_ATTR_QCOM_SYS_CACHE, &gpu_htw_llc);

We're all okay if this fails?  No harm no foul?

> 
>  	return iommu_attach_device(iommu->domain, mmu->dev);
>  }
> @@ -45,6 +55,9 @@ static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova,
>  	struct msm_iommu *iommu = to_msm_iommu(mmu);
>  	size_t ret;
> 
> +	if (msm_mmu_has_feature(mmu, MMU_FEATURE_USE_SYSTEM_CACHE))
> +		prot |= IOMMU_QCOM_SYS_CACHE;
> +
>  	ret = iommu_map_sg(iommu->domain, iova, sgt->sgl, sgt->nents, prot);
>  	WARN_ON(!ret);
> 
> diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
> index 1e4ac36d..3e6bdad 100644
> --- a/drivers/gpu/drm/msm/msm_mmu.h
> +++ b/drivers/gpu/drm/msm/msm_mmu.h
> @@ -18,6 +18,9 @@ struct msm_mmu_funcs {
>  	void (*destroy)(struct msm_mmu *mmu);
>  };
> 
> +/* MMU features */
> +#define MMU_FEATURE_USE_SYSTEM_CACHE (1 << 0)
> +
>  struct msm_mmu {
>  	const struct msm_mmu_funcs *funcs;
>  	struct device *dev;
> --
> 1.9.1
> 

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 5/5] drm/msm/a6xx: Add support for using system cache(LLC)
@ 2019-12-19 19:58     ` Jordan Crouse
  0 siblings, 0 replies; 36+ messages in thread
From: Jordan Crouse @ 2019-12-19 19:58 UTC (permalink / raw)
  To: Sharat Masetty
  Cc: freedreno, will, linux-arm-msm, linux-kernel, iommu, dri-devel,
	robin.murphy

On Thu, Dec 19, 2019 at 06:44:46PM +0530, Sharat Masetty wrote:
> The last level system cache can be partitioned to 32 different slices
> of which GPU has two slices preallocated. One slice is used for caching GPU
> buffers and the other slice is used for caching the GPU SMMU pagetables.
> This patch talks to the core system cache driver to acquire the slice handles,
> configure the SCID's to those slices and activates and deactivates the slices
> upon GPU power collapse and restore.
> 
> Some support from the IOMMU driver is also needed to make use of the
> system cache. IOMMU_QCOM_SYS_CACHE is a buffer protection flag which enables
> caching GPU data buffers in the system cache with memory attributes such
> as outer cacheable, read-allocate, write-allocate for buffers. The GPU
> then has the ability to override a few cacheability parameters which it
> does to override write-allocate to write-no-allocate as the GPU hardware
> does not benefit much from it.
> 
> Similarly DOMAIN_ATTR_QCOM_SYS_CACHE is another domain level attribute
> used by the IOMMU driver to set the right attributes to cache the hardware
> pagetables into the system cache.
> 
> Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 122 +++++++++++++++++++++++++++++++++-
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   9 +++
>  drivers/gpu/drm/msm/msm_iommu.c       |  13 ++++
>  drivers/gpu/drm/msm/msm_mmu.h         |   3 +
>  4 files changed, 146 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index faff6ff..0c7fdee 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -9,6 +9,7 @@
>  #include "a6xx_gmu.xml.h"
> 
>  #include <linux/devfreq.h>
> +#include <linux/soc/qcom/llcc-qcom.h>
> 
>  #define GPU_PAS_ID 13
> 
> @@ -781,6 +782,117 @@ static void a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu)
>  	gpu_write(gpu, REG_A6XX_GBIF_HALT, 0x0);
>  }
> 
> +#define A6XX_LLC_NUM_GPU_SCIDS		5
> +#define A6XX_GPU_LLC_SCID_NUM_BITS	5

As I mention below, I'm not sure if we need these 

> +#define A6XX_GPU_LLC_SCID_MASK \
> +	((1 << (A6XX_LLC_NUM_GPU_SCIDS * A6XX_GPU_LLC_SCID_NUM_BITS)) - 1)
> +
> +#define A6XX_GPUHTW_LLC_SCID_SHIFT	25
> +#define A6XX_GPUHTW_LLC_SCID_MASK \
> +	(((1 << A6XX_GPU_LLC_SCID_NUM_BITS) - 1) << A6XX_GPUHTW_LLC_SCID_SHIFT)
> +

Normally these go into the envytools regmap but if we're going to do these guys
lets use the power of <linux/bitfield.h> for good.

#define A6XX_GPU_LLC_SCID GENMASK(24, 0)
#define A6XX_GPUHTW_LLC_SCID GENMASK(29, 25)

> +static inline void a6xx_gpu_cx_rmw(struct a6xx_llc *llc,

Don't mark C functions as inline - let the compiler figure it out for you.

> +	u32 reg, u32 mask, u32 or)
> +{
> +	msm_rmw(llc->mmio + (reg << 2), mask, or);
> +}
> +
> +static void a6xx_llc_deactivate(struct a6xx_llc *llc)
> +{
> +	llcc_slice_deactivate(llc->gpu_llc_slice);
> +	llcc_slice_deactivate(llc->gpuhtw_llc_slice);
> +}
> +
> +static void a6xx_llc_activate(struct a6xx_llc *llc)
> +{
> +	if (!llc->mmio)
> +		return;
> +
> +	/* Program the sub-cache ID for all GPU blocks */
> +	if (!llcc_slice_activate(llc->gpu_llc_slice))
> +		a6xx_gpu_cx_rmw(llc,
> +				REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1,
> +				A6XX_GPU_LLC_SCID_MASK,
> +				(llc->cntl1_regval &
> +				 A6XX_GPU_LLC_SCID_MASK));

This is out of order with the comments below, but if we store the slice id then
you could calculate regval here and not have to store it.

> +
> +	/* Program the sub-cache ID for the GPU pagetables */
> +	if (!llcc_slice_activate(llc->gpuhtw_llc_slice))

val |= FIELD_SET(A6XX_GPUHTW_LLC_SCID, htw_llc_sliceid);

> +		a6xx_gpu_cx_rmw(llc,
> +				REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1,
> +				A6XX_GPUHTW_LLC_SCID_MASK,
> +				(llc->cntl1_regval &
> +				 A6XX_GPUHTW_LLC_SCID_MASK));

And this could be FIELD_SET(A6XX_GPUHTW_LLC_SCID, sliceid);

In theory you could just calculate the u32 and write it directly without a rmw.
In fact, that might be preferable - if the slice activate failed, you don't want
to run the risk that the scid for htw is still populated.

> +
> +	/* Program cacheability overrides */
> +	a6xx_gpu_cx_rmw(llc, REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_0, 0xF,
> +		llc->cntl0_regval);

As below, this could easily be a constant.

> +}
> +
> +static void a6xx_llc_slices_destroy(struct a6xx_llc *llc)
> +{
> +	if (llc->mmio)
> +		iounmap(llc->mmio);

msm_ioremap returns a devm_ managed resource, so do not use iounmap() to free
it. Bets to just leave it and let the gpu device handle it when it goes boom.

> +
> +	llcc_slice_putd(llc->gpu_llc_slice);
> +	llcc_slice_putd(llc->gpuhtw_llc_slice);
> +}
> +
> +static int a6xx_llc_slices_init(struct platform_device *pdev,
T
This can be void, I don't think we care if it passes or fails.

> +		struct a6xx_llc *llc)
> +{
> +	llc->mmio = msm_ioremap(pdev, "cx_mem", "gpu_cx");
> +	if (IS_ERR_OR_NULL(llc->mmio))

msm_ioremap can not return NULL.

> +		return -ENODEV;
> +
> +	llc->gpu_llc_slice = llcc_slice_getd(LLCC_GPU);
> +	llc->gpuhtw_llc_slice = llcc_slice_getd(LLCC_GPUHTW);
> +	if (IS_ERR(llc->gpu_llc_slice) && IS_ERR(llc->gpuhtw_llc_slice))
> +		return -ENODEV;
> +
> +	/*
> +	 * CNTL0 provides options to override the settings for the
> +	 * read and write allocation policies for the LLC. These
> +	 * overrides are global for all memory transactions from
> +	 * the GPU.
> +	 *
> +	 * 0x3: read-no-alloc-overridden = 0
> +	 *      read-no-alloc = 0 - Allocate lines on read miss
> +	 *      write-no-alloc-overridden = 1
> +	 *      write-no-alloc = 1 - Do not allocates lines on write miss
> +	 */
> +	llc->cntl0_regval = 0x03;

This is a fixed value isn't it?  We should be able to get away with writing a
constant.

> +
> +	/*
> +	 * CNTL1 is used to specify SCID for (CP, TP, VFD, CCU and UBWC
> +	 * FLAG cache) GPU blocks. This value will be passed along with
> +	 * the address for any memory transaction from GPU to identify
> +	 * the sub-cache for that transaction.
> +	 */
> +	if (!IS_ERR(llc->gpu_llc_slice)) {
> +		u32 gpu_scid = llcc_get_slice_id(llc->gpu_llc_slice);
> +		int i;
> +
> +		for (i = 0; i < A6XX_LLC_NUM_GPU_SCIDS; i++)
> +			llc->cntl1_regval |=
> +				gpu_scid << (A6XX_GPU_LLC_SCID_NUM_BITS * i);

As above, i'm not sure a loop is better than just:

gpu_scid &= 0x1f;

llc->cntl1_regval = (gpu_scid << 0) || (gpu_scid << 5) | (gpu_scid << 10)
 | (gpu_scid << 15) | (gpu_scid << 20);

And I'm not even sure we need do this math here in the first place.

> +	}
> +
> +	/*
> +	 * Set SCID for GPU IOMMU. This will be used to access
> +	 * page tables that are cached in LLC.
> +	 */
> +	if (!IS_ERR(llc->gpuhtw_llc_slice)) {
> +		u32 gpuhtw_scid = llcc_get_slice_id(llc->gpuhtw_llc_slice);
> +
> +		llc->cntl1_regval |=
> +			gpuhtw_scid << A6XX_GPUHTW_LLC_SCID_SHIFT;
> +	}

As above, I think storing the slice id could be more beneficial than calculating
a value, but if we do calculate a value, use FIELD_SET(A6XX_GPUHTW_LLC_SCID, )

> +
> +	return 0;
> +}
> +
>  static int a6xx_pm_resume(struct msm_gpu *gpu)
>  {
>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> @@ -795,6 +907,8 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
> 
>  	msm_gpu_resume_devfreq(gpu);
> 
> +	a6xx_llc_activate(&a6xx_gpu->llc);
> +
>  	return 0;
>  }
> 
> @@ -803,6 +917,8 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> 
> +	a6xx_llc_deactivate(&a6xx_gpu->llc);
> +
>  	devfreq_suspend_device(gpu->devfreq.devfreq);
> 
>  	/*
> @@ -851,6 +967,7 @@ static void a6xx_destroy(struct msm_gpu *gpu)
>  		drm_gem_object_put_unlocked(a6xx_gpu->sqe_bo);
>  	}
> 
> +	a6xx_llc_slices_destroy(&a6xx_gpu->llc);
>  	a6xx_gmu_remove(a6xx_gpu);
> 
>  	adreno_gpu_cleanup(adreno_gpu);
> @@ -924,7 +1041,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>  	adreno_gpu->registers = NULL;
>  	adreno_gpu->reg_offsets = a6xx_register_offsets;
> 
> -	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
> +	ret = a6xx_llc_slices_init(pdev, &a6xx_gpu->llc);
> +

Confirming we don't care if a6xx_llc_slices_init passes or fails.

> +	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1,
> +			ret ? 0 : MMU_FEATURE_USE_SYSTEM_CACHE);
>  	if (ret) {
>  		a6xx_destroy(&(a6xx_gpu->base.base));
>  		return ERR_PTR(ret);
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> index 7239b8b..09b9ad0 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> @@ -12,6 +12,14 @@
> 
>  extern bool hang_debug;
> 
> +struct a6xx_llc {
> +	void __iomem *mmio;
> +	void *gpu_llc_slice;
> +	void *gpuhtw_llc_slice;
> +	u32 cntl0_regval;

As above, I'm not sure if cntl0 is needed.  Heck, I'm not even sure cntl1 is
needed - since we could store or query the ids at activate time.

> +	u32 cntl1_regval;
> +};
> +
>  struct a6xx_gpu {
>  	struct adreno_gpu base;
> 
> @@ -21,6 +29,7 @@ struct a6xx_gpu {
>  	struct msm_ringbuffer *cur_ring;
> 
>  	struct a6xx_gmu gmu;
> +	struct a6xx_llc llc;
>  };
> 
>  #define to_a6xx_gpu(x) container_of(x, struct a6xx_gpu, base)
> diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
> index 8c95c31..4699367 100644
> --- a/drivers/gpu/drm/msm/msm_iommu.c
> +++ b/drivers/gpu/drm/msm/msm_iommu.c
> @@ -27,6 +27,16 @@ static int msm_iommu_attach(struct msm_mmu *mmu, const char * const *names,
>  			    int cnt)
>  {
>  	struct msm_iommu *iommu = to_msm_iommu(mmu);
> +	int gpu_htw_llc = 1;
> +
> +	/*
> +	 * This allows GPU to set the bus attributes required
> +	 * to use system cache on behalf of the iommu page table
> +	 * walker.
> +	 */
> +	if (msm_mmu_has_feature(mmu, MMU_FEATURE_USE_SYSTEM_CACHE))
> +		iommu_domain_set_attr(iommu->domain,
> +				DOMAIN_ATTR_QCOM_SYS_CACHE, &gpu_htw_llc);

We're all okay if this fails?  No harm no foul?

> 
>  	return iommu_attach_device(iommu->domain, mmu->dev);
>  }
> @@ -45,6 +55,9 @@ static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova,
>  	struct msm_iommu *iommu = to_msm_iommu(mmu);
>  	size_t ret;
> 
> +	if (msm_mmu_has_feature(mmu, MMU_FEATURE_USE_SYSTEM_CACHE))
> +		prot |= IOMMU_QCOM_SYS_CACHE;
> +
>  	ret = iommu_map_sg(iommu->domain, iova, sgt->sgl, sgt->nents, prot);
>  	WARN_ON(!ret);
> 
> diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
> index 1e4ac36d..3e6bdad 100644
> --- a/drivers/gpu/drm/msm/msm_mmu.h
> +++ b/drivers/gpu/drm/msm/msm_mmu.h
> @@ -18,6 +18,9 @@ struct msm_mmu_funcs {
>  	void (*destroy)(struct msm_mmu *mmu);
>  };
> 
> +/* MMU features */
> +#define MMU_FEATURE_USE_SYSTEM_CACHE (1 << 0)
> +
>  struct msm_mmu {
>  	const struct msm_mmu_funcs *funcs;
>  	struct device *dev;
> --
> 1.9.1
> 

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 5/5] drm/msm/a6xx: Add support for using system cache(LLC)
@ 2019-12-19 19:58     ` Jordan Crouse
  0 siblings, 0 replies; 36+ messages in thread
From: Jordan Crouse @ 2019-12-19 19:58 UTC (permalink / raw)
  To: Sharat Masetty
  Cc: freedreno, saiprakash.ranjan, will, linux-arm-msm, linux-kernel,
	iommu, dri-devel, robin.murphy

On Thu, Dec 19, 2019 at 06:44:46PM +0530, Sharat Masetty wrote:
> The last level system cache can be partitioned to 32 different slices
> of which GPU has two slices preallocated. One slice is used for caching GPU
> buffers and the other slice is used for caching the GPU SMMU pagetables.
> This patch talks to the core system cache driver to acquire the slice handles,
> configure the SCID's to those slices and activates and deactivates the slices
> upon GPU power collapse and restore.
> 
> Some support from the IOMMU driver is also needed to make use of the
> system cache. IOMMU_QCOM_SYS_CACHE is a buffer protection flag which enables
> caching GPU data buffers in the system cache with memory attributes such
> as outer cacheable, read-allocate, write-allocate for buffers. The GPU
> then has the ability to override a few cacheability parameters which it
> does to override write-allocate to write-no-allocate as the GPU hardware
> does not benefit much from it.
> 
> Similarly DOMAIN_ATTR_QCOM_SYS_CACHE is another domain level attribute
> used by the IOMMU driver to set the right attributes to cache the hardware
> pagetables into the system cache.
> 
> Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 122 +++++++++++++++++++++++++++++++++-
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   9 +++
>  drivers/gpu/drm/msm/msm_iommu.c       |  13 ++++
>  drivers/gpu/drm/msm/msm_mmu.h         |   3 +
>  4 files changed, 146 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index faff6ff..0c7fdee 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -9,6 +9,7 @@
>  #include "a6xx_gmu.xml.h"
> 
>  #include <linux/devfreq.h>
> +#include <linux/soc/qcom/llcc-qcom.h>
> 
>  #define GPU_PAS_ID 13
> 
> @@ -781,6 +782,117 @@ static void a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu)
>  	gpu_write(gpu, REG_A6XX_GBIF_HALT, 0x0);
>  }
> 
> +#define A6XX_LLC_NUM_GPU_SCIDS		5
> +#define A6XX_GPU_LLC_SCID_NUM_BITS	5

As I mention below, I'm not sure if we need these 

> +#define A6XX_GPU_LLC_SCID_MASK \
> +	((1 << (A6XX_LLC_NUM_GPU_SCIDS * A6XX_GPU_LLC_SCID_NUM_BITS)) - 1)
> +
> +#define A6XX_GPUHTW_LLC_SCID_SHIFT	25
> +#define A6XX_GPUHTW_LLC_SCID_MASK \
> +	(((1 << A6XX_GPU_LLC_SCID_NUM_BITS) - 1) << A6XX_GPUHTW_LLC_SCID_SHIFT)
> +

Normally these go into the envytools regmap but if we're going to do these guys
lets use the power of <linux/bitfield.h> for good.

#define A6XX_GPU_LLC_SCID GENMASK(24, 0)
#define A6XX_GPUHTW_LLC_SCID GENMASK(29, 25)

> +static inline void a6xx_gpu_cx_rmw(struct a6xx_llc *llc,

Don't mark C functions as inline - let the compiler figure it out for you.

> +	u32 reg, u32 mask, u32 or)
> +{
> +	msm_rmw(llc->mmio + (reg << 2), mask, or);
> +}
> +
> +static void a6xx_llc_deactivate(struct a6xx_llc *llc)
> +{
> +	llcc_slice_deactivate(llc->gpu_llc_slice);
> +	llcc_slice_deactivate(llc->gpuhtw_llc_slice);
> +}
> +
> +static void a6xx_llc_activate(struct a6xx_llc *llc)
> +{
> +	if (!llc->mmio)
> +		return;
> +
> +	/* Program the sub-cache ID for all GPU blocks */
> +	if (!llcc_slice_activate(llc->gpu_llc_slice))
> +		a6xx_gpu_cx_rmw(llc,
> +				REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1,
> +				A6XX_GPU_LLC_SCID_MASK,
> +				(llc->cntl1_regval &
> +				 A6XX_GPU_LLC_SCID_MASK));

This is out of order with the comments below, but if we store the slice id then
you could calculate regval here and not have to store it.

> +
> +	/* Program the sub-cache ID for the GPU pagetables */
> +	if (!llcc_slice_activate(llc->gpuhtw_llc_slice))

val |= FIELD_SET(A6XX_GPUHTW_LLC_SCID, htw_llc_sliceid);

> +		a6xx_gpu_cx_rmw(llc,
> +				REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1,
> +				A6XX_GPUHTW_LLC_SCID_MASK,
> +				(llc->cntl1_regval &
> +				 A6XX_GPUHTW_LLC_SCID_MASK));

And this could be FIELD_SET(A6XX_GPUHTW_LLC_SCID, sliceid);

In theory you could just calculate the u32 and write it directly without a rmw.
In fact, that might be preferable - if the slice activate failed, you don't want
to run the risk that the scid for htw is still populated.

> +
> +	/* Program cacheability overrides */
> +	a6xx_gpu_cx_rmw(llc, REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_0, 0xF,
> +		llc->cntl0_regval);

As below, this could easily be a constant.

> +}
> +
> +static void a6xx_llc_slices_destroy(struct a6xx_llc *llc)
> +{
> +	if (llc->mmio)
> +		iounmap(llc->mmio);

msm_ioremap returns a devm_ managed resource, so do not use iounmap() to free
it. Bets to just leave it and let the gpu device handle it when it goes boom.

> +
> +	llcc_slice_putd(llc->gpu_llc_slice);
> +	llcc_slice_putd(llc->gpuhtw_llc_slice);
> +}
> +
> +static int a6xx_llc_slices_init(struct platform_device *pdev,
T
This can be void, I don't think we care if it passes or fails.

> +		struct a6xx_llc *llc)
> +{
> +	llc->mmio = msm_ioremap(pdev, "cx_mem", "gpu_cx");
> +	if (IS_ERR_OR_NULL(llc->mmio))

msm_ioremap can not return NULL.

> +		return -ENODEV;
> +
> +	llc->gpu_llc_slice = llcc_slice_getd(LLCC_GPU);
> +	llc->gpuhtw_llc_slice = llcc_slice_getd(LLCC_GPUHTW);
> +	if (IS_ERR(llc->gpu_llc_slice) && IS_ERR(llc->gpuhtw_llc_slice))
> +		return -ENODEV;
> +
> +	/*
> +	 * CNTL0 provides options to override the settings for the
> +	 * read and write allocation policies for the LLC. These
> +	 * overrides are global for all memory transactions from
> +	 * the GPU.
> +	 *
> +	 * 0x3: read-no-alloc-overridden = 0
> +	 *      read-no-alloc = 0 - Allocate lines on read miss
> +	 *      write-no-alloc-overridden = 1
> +	 *      write-no-alloc = 1 - Do not allocates lines on write miss
> +	 */
> +	llc->cntl0_regval = 0x03;

This is a fixed value isn't it?  We should be able to get away with writing a
constant.

> +
> +	/*
> +	 * CNTL1 is used to specify SCID for (CP, TP, VFD, CCU and UBWC
> +	 * FLAG cache) GPU blocks. This value will be passed along with
> +	 * the address for any memory transaction from GPU to identify
> +	 * the sub-cache for that transaction.
> +	 */
> +	if (!IS_ERR(llc->gpu_llc_slice)) {
> +		u32 gpu_scid = llcc_get_slice_id(llc->gpu_llc_slice);
> +		int i;
> +
> +		for (i = 0; i < A6XX_LLC_NUM_GPU_SCIDS; i++)
> +			llc->cntl1_regval |=
> +				gpu_scid << (A6XX_GPU_LLC_SCID_NUM_BITS * i);

As above, i'm not sure a loop is better than just:

gpu_scid &= 0x1f;

llc->cntl1_regval = (gpu_scid << 0) || (gpu_scid << 5) | (gpu_scid << 10)
 | (gpu_scid << 15) | (gpu_scid << 20);

And I'm not even sure we need do this math here in the first place.

> +	}
> +
> +	/*
> +	 * Set SCID for GPU IOMMU. This will be used to access
> +	 * page tables that are cached in LLC.
> +	 */
> +	if (!IS_ERR(llc->gpuhtw_llc_slice)) {
> +		u32 gpuhtw_scid = llcc_get_slice_id(llc->gpuhtw_llc_slice);
> +
> +		llc->cntl1_regval |=
> +			gpuhtw_scid << A6XX_GPUHTW_LLC_SCID_SHIFT;
> +	}

As above, I think storing the slice id could be more beneficial than calculating
a value, but if we do calculate a value, use FIELD_SET(A6XX_GPUHTW_LLC_SCID, )

> +
> +	return 0;
> +}
> +
>  static int a6xx_pm_resume(struct msm_gpu *gpu)
>  {
>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> @@ -795,6 +907,8 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
> 
>  	msm_gpu_resume_devfreq(gpu);
> 
> +	a6xx_llc_activate(&a6xx_gpu->llc);
> +
>  	return 0;
>  }
> 
> @@ -803,6 +917,8 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> 
> +	a6xx_llc_deactivate(&a6xx_gpu->llc);
> +
>  	devfreq_suspend_device(gpu->devfreq.devfreq);
> 
>  	/*
> @@ -851,6 +967,7 @@ static void a6xx_destroy(struct msm_gpu *gpu)
>  		drm_gem_object_put_unlocked(a6xx_gpu->sqe_bo);
>  	}
> 
> +	a6xx_llc_slices_destroy(&a6xx_gpu->llc);
>  	a6xx_gmu_remove(a6xx_gpu);
> 
>  	adreno_gpu_cleanup(adreno_gpu);
> @@ -924,7 +1041,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>  	adreno_gpu->registers = NULL;
>  	adreno_gpu->reg_offsets = a6xx_register_offsets;
> 
> -	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
> +	ret = a6xx_llc_slices_init(pdev, &a6xx_gpu->llc);
> +

Confirming we don't care if a6xx_llc_slices_init passes or fails.

> +	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1,
> +			ret ? 0 : MMU_FEATURE_USE_SYSTEM_CACHE);
>  	if (ret) {
>  		a6xx_destroy(&(a6xx_gpu->base.base));
>  		return ERR_PTR(ret);
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> index 7239b8b..09b9ad0 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> @@ -12,6 +12,14 @@
> 
>  extern bool hang_debug;
> 
> +struct a6xx_llc {
> +	void __iomem *mmio;
> +	void *gpu_llc_slice;
> +	void *gpuhtw_llc_slice;
> +	u32 cntl0_regval;

As above, I'm not sure if cntl0 is needed.  Heck, I'm not even sure cntl1 is
needed - since we could store or query the ids at activate time.

> +	u32 cntl1_regval;
> +};
> +
>  struct a6xx_gpu {
>  	struct adreno_gpu base;
> 
> @@ -21,6 +29,7 @@ struct a6xx_gpu {
>  	struct msm_ringbuffer *cur_ring;
> 
>  	struct a6xx_gmu gmu;
> +	struct a6xx_llc llc;
>  };
> 
>  #define to_a6xx_gpu(x) container_of(x, struct a6xx_gpu, base)
> diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
> index 8c95c31..4699367 100644
> --- a/drivers/gpu/drm/msm/msm_iommu.c
> +++ b/drivers/gpu/drm/msm/msm_iommu.c
> @@ -27,6 +27,16 @@ static int msm_iommu_attach(struct msm_mmu *mmu, const char * const *names,
>  			    int cnt)
>  {
>  	struct msm_iommu *iommu = to_msm_iommu(mmu);
> +	int gpu_htw_llc = 1;
> +
> +	/*
> +	 * This allows GPU to set the bus attributes required
> +	 * to use system cache on behalf of the iommu page table
> +	 * walker.
> +	 */
> +	if (msm_mmu_has_feature(mmu, MMU_FEATURE_USE_SYSTEM_CACHE))
> +		iommu_domain_set_attr(iommu->domain,
> +				DOMAIN_ATTR_QCOM_SYS_CACHE, &gpu_htw_llc);

We're all okay if this fails?  No harm no foul?

> 
>  	return iommu_attach_device(iommu->domain, mmu->dev);
>  }
> @@ -45,6 +55,9 @@ static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova,
>  	struct msm_iommu *iommu = to_msm_iommu(mmu);
>  	size_t ret;
> 
> +	if (msm_mmu_has_feature(mmu, MMU_FEATURE_USE_SYSTEM_CACHE))
> +		prot |= IOMMU_QCOM_SYS_CACHE;
> +
>  	ret = iommu_map_sg(iommu->domain, iova, sgt->sgl, sgt->nents, prot);
>  	WARN_ON(!ret);
> 
> diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
> index 1e4ac36d..3e6bdad 100644
> --- a/drivers/gpu/drm/msm/msm_mmu.h
> +++ b/drivers/gpu/drm/msm/msm_mmu.h
> @@ -18,6 +18,9 @@ struct msm_mmu_funcs {
>  	void (*destroy)(struct msm_mmu *mmu);
>  };
> 
> +/* MMU features */
> +#define MMU_FEATURE_USE_SYSTEM_CACHE (1 << 0)
> +
>  struct msm_mmu {
>  	const struct msm_mmu_funcs *funcs;
>  	struct device *dev;
> --
> 1.9.1
> 

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 5/5] drm/msm/a6xx: Add support for using system cache(LLC)
  2019-12-19 19:58     ` Jordan Crouse
  (?)
@ 2019-12-19 20:10       ` Jordan Crouse
  -1 siblings, 0 replies; 36+ messages in thread
From: Jordan Crouse @ 2019-12-19 20:10 UTC (permalink / raw)
  To: Sharat Masetty, freedreno, dri-devel, linux-arm-msm,
	linux-kernel, will, robin.murphy, joro, iommu, saiprakash.ranjan

On Thu, Dec 19, 2019 at 12:58:15PM -0700, Jordan Crouse wrote:
> On Thu, Dec 19, 2019 at 06:44:46PM +0530, Sharat Masetty wrote:


<snip>

> > +
> > +	/*
> > +	 * CNTL1 is used to specify SCID for (CP, TP, VFD, CCU and UBWC
> > +	 * FLAG cache) GPU blocks. This value will be passed along with
> > +	 * the address for any memory transaction from GPU to identify
> > +	 * the sub-cache for that transaction.
> > +	 */
> > +	if (!IS_ERR(llc->gpu_llc_slice)) {
> > +		u32 gpu_scid = llcc_get_slice_id(llc->gpu_llc_slice);
> > +		int i;
> > +
> > +		for (i = 0; i < A6XX_LLC_NUM_GPU_SCIDS; i++)
> > +			llc->cntl1_regval |=
> > +				gpu_scid << (A6XX_GPU_LLC_SCID_NUM_BITS * i);
> 
> As above, i'm not sure a loop is better than just:
> 
> gpu_scid &= 0x1f;
> 
> llc->cntl1_regval = (gpu_scid << 0) || (gpu_scid << 5) | (gpu_scid << 10)
>  | (gpu_scid << 15) | (gpu_scid << 20);
> 
> And I'm not even sure we need do this math here in the first place.

One more question - can you get a valid slice id before activation?

<snip>

Jordan

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 5/5] drm/msm/a6xx: Add support for using system cache(LLC)
@ 2019-12-19 20:10       ` Jordan Crouse
  0 siblings, 0 replies; 36+ messages in thread
From: Jordan Crouse @ 2019-12-19 20:10 UTC (permalink / raw)
  To: Sharat Masetty, freedreno, dri-devel, linux-arm-msm,
	linux-kernel, will, robin.murphy, joro, iommu, saiprakash.ranjan

On Thu, Dec 19, 2019 at 12:58:15PM -0700, Jordan Crouse wrote:
> On Thu, Dec 19, 2019 at 06:44:46PM +0530, Sharat Masetty wrote:


<snip>

> > +
> > +	/*
> > +	 * CNTL1 is used to specify SCID for (CP, TP, VFD, CCU and UBWC
> > +	 * FLAG cache) GPU blocks. This value will be passed along with
> > +	 * the address for any memory transaction from GPU to identify
> > +	 * the sub-cache for that transaction.
> > +	 */
> > +	if (!IS_ERR(llc->gpu_llc_slice)) {
> > +		u32 gpu_scid = llcc_get_slice_id(llc->gpu_llc_slice);
> > +		int i;
> > +
> > +		for (i = 0; i < A6XX_LLC_NUM_GPU_SCIDS; i++)
> > +			llc->cntl1_regval |=
> > +				gpu_scid << (A6XX_GPU_LLC_SCID_NUM_BITS * i);
> 
> As above, i'm not sure a loop is better than just:
> 
> gpu_scid &= 0x1f;
> 
> llc->cntl1_regval = (gpu_scid << 0) || (gpu_scid << 5) | (gpu_scid << 10)
>  | (gpu_scid << 15) | (gpu_scid << 20);
> 
> And I'm not even sure we need do this math here in the first place.

One more question - can you get a valid slice id before activation?

<snip>

Jordan

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 5/5] drm/msm/a6xx: Add support for using system cache(LLC)
@ 2019-12-19 20:10       ` Jordan Crouse
  0 siblings, 0 replies; 36+ messages in thread
From: Jordan Crouse @ 2019-12-19 20:10 UTC (permalink / raw)
  To: Sharat Masetty, freedreno, dri-devel, linux-arm-msm,
	linux-kernel, will, robin.murphy, joro, iommu, saiprakash.ranjan

On Thu, Dec 19, 2019 at 12:58:15PM -0700, Jordan Crouse wrote:
> On Thu, Dec 19, 2019 at 06:44:46PM +0530, Sharat Masetty wrote:


<snip>

> > +
> > +	/*
> > +	 * CNTL1 is used to specify SCID for (CP, TP, VFD, CCU and UBWC
> > +	 * FLAG cache) GPU blocks. This value will be passed along with
> > +	 * the address for any memory transaction from GPU to identify
> > +	 * the sub-cache for that transaction.
> > +	 */
> > +	if (!IS_ERR(llc->gpu_llc_slice)) {
> > +		u32 gpu_scid = llcc_get_slice_id(llc->gpu_llc_slice);
> > +		int i;
> > +
> > +		for (i = 0; i < A6XX_LLC_NUM_GPU_SCIDS; i++)
> > +			llc->cntl1_regval |=
> > +				gpu_scid << (A6XX_GPU_LLC_SCID_NUM_BITS * i);
> 
> As above, i'm not sure a loop is better than just:
> 
> gpu_scid &= 0x1f;
> 
> llc->cntl1_regval = (gpu_scid << 0) || (gpu_scid << 5) | (gpu_scid << 10)
>  | (gpu_scid << 15) | (gpu_scid << 20);
> 
> And I'm not even sure we need do this math here in the first place.

One more question - can you get a valid slice id before activation?

<snip>

Jordan

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 5/5] drm/msm/a6xx: Add support for using system cache(LLC)
  2019-12-19 19:58     ` Jordan Crouse
  (?)
@ 2019-12-20 10:10       ` smasetty
  -1 siblings, 0 replies; 36+ messages in thread
From: smasetty @ 2019-12-20 10:10 UTC (permalink / raw)
  To: Sharat Masetty
  Cc: freedreno, dri-devel, linux-arm-msm, linux-kernel, will,
	robin.murphy, joro, iommu, saiprakash.ranjan,
	linux-arm-msm-owner

On 2019-12-20 01:28, Jordan Crouse wrote:
> On Thu, Dec 19, 2019 at 06:44:46PM +0530, Sharat Masetty wrote:
>> The last level system cache can be partitioned to 32 different slices
>> of which GPU has two slices preallocated. One slice is used for 
>> caching GPU
>> buffers and the other slice is used for caching the GPU SMMU 
>> pagetables.
>> This patch talks to the core system cache driver to acquire the slice 
>> handles,
>> configure the SCID's to those slices and activates and deactivates the 
>> slices
>> upon GPU power collapse and restore.
>> 
>> Some support from the IOMMU driver is also needed to make use of the
>> system cache. IOMMU_QCOM_SYS_CACHE is a buffer protection flag which 
>> enables
>> caching GPU data buffers in the system cache with memory attributes 
>> such
>> as outer cacheable, read-allocate, write-allocate for buffers. The GPU
>> then has the ability to override a few cacheability parameters which 
>> it
>> does to override write-allocate to write-no-allocate as the GPU 
>> hardware
>> does not benefit much from it.
>> 
>> Similarly DOMAIN_ATTR_QCOM_SYS_CACHE is another domain level attribute
>> used by the IOMMU driver to set the right attributes to cache the 
>> hardware
>> pagetables into the system cache.
>> 
>> Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
>> ---
>>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 122 
>> +++++++++++++++++++++++++++++++++-
>>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   9 +++
>>  drivers/gpu/drm/msm/msm_iommu.c       |  13 ++++
>>  drivers/gpu/drm/msm/msm_mmu.h         |   3 +
>>  4 files changed, 146 insertions(+), 1 deletion(-)
>> 
>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
>> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>> index faff6ff..0c7fdee 100644
>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>> @@ -9,6 +9,7 @@
>>  #include "a6xx_gmu.xml.h"
>> 
>>  #include <linux/devfreq.h>
>> +#include <linux/soc/qcom/llcc-qcom.h>
>> 
>>  #define GPU_PAS_ID 13
>> 
>> @@ -781,6 +782,117 @@ static void 
>> a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu)
>>  	gpu_write(gpu, REG_A6XX_GBIF_HALT, 0x0);
>>  }
>> 
>> +#define A6XX_LLC_NUM_GPU_SCIDS		5
>> +#define A6XX_GPU_LLC_SCID_NUM_BITS	5
> 
> As I mention below, I'm not sure if we need these
> 
>> +#define A6XX_GPU_LLC_SCID_MASK \
>> +	((1 << (A6XX_LLC_NUM_GPU_SCIDS * A6XX_GPU_LLC_SCID_NUM_BITS)) - 1)
>> +
>> +#define A6XX_GPUHTW_LLC_SCID_SHIFT	25
>> +#define A6XX_GPUHTW_LLC_SCID_MASK \
>> +	(((1 << A6XX_GPU_LLC_SCID_NUM_BITS) - 1) << 
>> A6XX_GPUHTW_LLC_SCID_SHIFT)
>> +
> 
> Normally these go into the envytools regmap but if we're going to do 
> these guys
> lets use the power of <linux/bitfield.h> for good.
> 
> #define A6XX_GPU_LLC_SCID GENMASK(24, 0)
> #define A6XX_GPUHTW_LLC_SCID GENMASK(29, 25)
> 
>> +static inline void a6xx_gpu_cx_rmw(struct a6xx_llc *llc,
> 
> Don't mark C functions as inline - let the compiler figure it out for 
> you.
> 
>> +	u32 reg, u32 mask, u32 or)
>> +{
>> +	msm_rmw(llc->mmio + (reg << 2), mask, or);
>> +}
>> +
>> +static void a6xx_llc_deactivate(struct a6xx_llc *llc)
>> +{
>> +	llcc_slice_deactivate(llc->gpu_llc_slice);
>> +	llcc_slice_deactivate(llc->gpuhtw_llc_slice);
>> +}
>> +
>> +static void a6xx_llc_activate(struct a6xx_llc *llc)
>> +{
>> +	if (!llc->mmio)
>> +		return;
>> +
>> +	/* Program the sub-cache ID for all GPU blocks */
>> +	if (!llcc_slice_activate(llc->gpu_llc_slice))
>> +		a6xx_gpu_cx_rmw(llc,
>> +				REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1,
>> +				A6XX_GPU_LLC_SCID_MASK,
>> +				(llc->cntl1_regval &
>> +				 A6XX_GPU_LLC_SCID_MASK));
> 
> This is out of order with the comments below, but if we store the slice 
> id then
> you could calculate regval here and not have to store it.
> 
>> +
>> +	/* Program the sub-cache ID for the GPU pagetables */
>> +	if (!llcc_slice_activate(llc->gpuhtw_llc_slice))
> 
> val |= FIELD_SET(A6XX_GPUHTW_LLC_SCID, htw_llc_sliceid);
> 
>> +		a6xx_gpu_cx_rmw(llc,
>> +				REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1,
>> +				A6XX_GPUHTW_LLC_SCID_MASK,
>> +				(llc->cntl1_regval &
>> +				 A6XX_GPUHTW_LLC_SCID_MASK));
> 
> And this could be FIELD_SET(A6XX_GPUHTW_LLC_SCID, sliceid);
> 
> In theory you could just calculate the u32 and write it directly 
> without a rmw.
> In fact, that might be preferable - if the slice activate failed, you 
> don't want
> to run the risk that the scid for htw is still populated.
> 
>> +
>> +	/* Program cacheability overrides */
>> +	a6xx_gpu_cx_rmw(llc, REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_0, 0xF,
>> +		llc->cntl0_regval);
> 
> As below, this could easily be a constant.
> 
>> +}
>> +
>> +static void a6xx_llc_slices_destroy(struct a6xx_llc *llc)
>> +{
>> +	if (llc->mmio)
>> +		iounmap(llc->mmio);
> 
> msm_ioremap returns a devm_ managed resource, so do not use iounmap() 
> to free
> it. Bets to just leave it and let the gpu device handle it when it goes 
> boom.
> 
>> +
>> +	llcc_slice_putd(llc->gpu_llc_slice);
>> +	llcc_slice_putd(llc->gpuhtw_llc_slice);
>> +}
>> +
>> +static int a6xx_llc_slices_init(struct platform_device *pdev,
> T
> This can be void, I don't think we care if it passes or fails.
> 
>> +		struct a6xx_llc *llc)
>> +{
>> +	llc->mmio = msm_ioremap(pdev, "cx_mem", "gpu_cx");
>> +	if (IS_ERR_OR_NULL(llc->mmio))
> 
> msm_ioremap can not return NULL.
> 
>> +		return -ENODEV;
>> +
>> +	llc->gpu_llc_slice = llcc_slice_getd(LLCC_GPU);
>> +	llc->gpuhtw_llc_slice = llcc_slice_getd(LLCC_GPUHTW);
>> +	if (IS_ERR(llc->gpu_llc_slice) && IS_ERR(llc->gpuhtw_llc_slice))
>> +		return -ENODEV;
>> +
>> +	/*
>> +	 * CNTL0 provides options to override the settings for the
>> +	 * read and write allocation policies for the LLC. These
>> +	 * overrides are global for all memory transactions from
>> +	 * the GPU.
>> +	 *
>> +	 * 0x3: read-no-alloc-overridden = 0
>> +	 *      read-no-alloc = 0 - Allocate lines on read miss
>> +	 *      write-no-alloc-overridden = 1
>> +	 *      write-no-alloc = 1 - Do not allocates lines on write miss
>> +	 */
>> +	llc->cntl0_regval = 0x03;
> 
> This is a fixed value isn't it?  We should be able to get away with 
> writing a
> constant.
> 
>> +
>> +	/*
>> +	 * CNTL1 is used to specify SCID for (CP, TP, VFD, CCU and UBWC
>> +	 * FLAG cache) GPU blocks. This value will be passed along with
>> +	 * the address for any memory transaction from GPU to identify
>> +	 * the sub-cache for that transaction.
>> +	 */
>> +	if (!IS_ERR(llc->gpu_llc_slice)) {
>> +		u32 gpu_scid = llcc_get_slice_id(llc->gpu_llc_slice);
>> +		int i;
>> +
>> +		for (i = 0; i < A6XX_LLC_NUM_GPU_SCIDS; i++)
>> +			llc->cntl1_regval |=
>> +				gpu_scid << (A6XX_GPU_LLC_SCID_NUM_BITS * i);
> 
> As above, i'm not sure a loop is better than just:
> 
> gpu_scid &= 0x1f;
> 
> llc->cntl1_regval = (gpu_scid << 0) || (gpu_scid << 5) | (gpu_scid << 
> 10)
>  | (gpu_scid << 15) | (gpu_scid << 20);
> 
> And I'm not even sure we need do this math here in the first place.
> 
>> +	}
>> +
>> +	/*
>> +	 * Set SCID for GPU IOMMU. This will be used to access
>> +	 * page tables that are cached in LLC.
>> +	 */
>> +	if (!IS_ERR(llc->gpuhtw_llc_slice)) {
>> +		u32 gpuhtw_scid = llcc_get_slice_id(llc->gpuhtw_llc_slice);
>> +
>> +		llc->cntl1_regval |=
>> +			gpuhtw_scid << A6XX_GPUHTW_LLC_SCID_SHIFT;
>> +	}
> 
> As above, I think storing the slice id could be more beneficial than 
> calculating
> a value, but if we do calculate a value, use 
> FIELD_SET(A6XX_GPUHTW_LLC_SCID, )
> 
>> +
>> +	return 0;
>> +}
>> +
>>  static int a6xx_pm_resume(struct msm_gpu *gpu)
>>  {
>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>> @@ -795,6 +907,8 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
>> 
>>  	msm_gpu_resume_devfreq(gpu);
>> 
>> +	a6xx_llc_activate(&a6xx_gpu->llc);
>> +
>>  	return 0;
>>  }
>> 
>> @@ -803,6 +917,8 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>> 
>> +	a6xx_llc_deactivate(&a6xx_gpu->llc);
>> +
>>  	devfreq_suspend_device(gpu->devfreq.devfreq);
>> 
>>  	/*
>> @@ -851,6 +967,7 @@ static void a6xx_destroy(struct msm_gpu *gpu)
>>  		drm_gem_object_put_unlocked(a6xx_gpu->sqe_bo);
>>  	}
>> 
>> +	a6xx_llc_slices_destroy(&a6xx_gpu->llc);
>>  	a6xx_gmu_remove(a6xx_gpu);
>> 
>>  	adreno_gpu_cleanup(adreno_gpu);
>> @@ -924,7 +1041,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device 
>> *dev)
>>  	adreno_gpu->registers = NULL;
>>  	adreno_gpu->reg_offsets = a6xx_register_offsets;
>> 
>> -	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
>> +	ret = a6xx_llc_slices_init(pdev, &a6xx_gpu->llc);
>> +
> 
> Confirming we don't care if a6xx_llc_slices_init passes or fails.

Are you suggesting to unconditionally set the memory attributes in 
iommu(see the code below in msm_iommu.c).
We probably wouldn't need this patch too in that case: 
https://patchwork.freedesktop.org/patch/346097/

The return code  is used in the line below to pass 
MMU_FEATURE_USE_SYSTEM_CACHE. Am I missing something here?

> 
>> +	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1,
>> +			ret ? 0 : MMU_FEATURE_USE_SYSTEM_CACHE);
>>  	if (ret) {
>>  		a6xx_destroy(&(a6xx_gpu->base.base));
>>  		return ERR_PTR(ret);
>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h 
>> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>> index 7239b8b..09b9ad0 100644
>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>> @@ -12,6 +12,14 @@
>> 
>>  extern bool hang_debug;
>> 
>> +struct a6xx_llc {
>> +	void __iomem *mmio;
>> +	void *gpu_llc_slice;
>> +	void *gpuhtw_llc_slice;
>> +	u32 cntl0_regval;
> 
> As above, I'm not sure if cntl0 is needed.  Heck, I'm not even sure 
> cntl1 is
> needed - since we could store or query the ids at activate time.
> 
>> +	u32 cntl1_regval;
>> +};
>> +
>>  struct a6xx_gpu {
>>  	struct adreno_gpu base;
>> 
>> @@ -21,6 +29,7 @@ struct a6xx_gpu {
>>  	struct msm_ringbuffer *cur_ring;
>> 
>>  	struct a6xx_gmu gmu;
>> +	struct a6xx_llc llc;
>>  };
>> 
>>  #define to_a6xx_gpu(x) container_of(x, struct a6xx_gpu, base)
>> diff --git a/drivers/gpu/drm/msm/msm_iommu.c 
>> b/drivers/gpu/drm/msm/msm_iommu.c
>> index 8c95c31..4699367 100644
>> --- a/drivers/gpu/drm/msm/msm_iommu.c
>> +++ b/drivers/gpu/drm/msm/msm_iommu.c
>> @@ -27,6 +27,16 @@ static int msm_iommu_attach(struct msm_mmu *mmu, 
>> const char * const *names,
>>  			    int cnt)
>>  {
>>  	struct msm_iommu *iommu = to_msm_iommu(mmu);
>> +	int gpu_htw_llc = 1;
>> +
>> +	/*
>> +	 * This allows GPU to set the bus attributes required
>> +	 * to use system cache on behalf of the iommu page table
>> +	 * walker.
>> +	 */
>> +	if (msm_mmu_has_feature(mmu, MMU_FEATURE_USE_SYSTEM_CACHE))
>> +		iommu_domain_set_attr(iommu->domain,
>> +				DOMAIN_ATTR_QCOM_SYS_CACHE, &gpu_htw_llc);
> 
> We're all okay if this fails?  No harm no foul?
> 
>> 
>>  	return iommu_attach_device(iommu->domain, mmu->dev);
>>  }
>> @@ -45,6 +55,9 @@ static int msm_iommu_map(struct msm_mmu *mmu, 
>> uint64_t iova,
>>  	struct msm_iommu *iommu = to_msm_iommu(mmu);
>>  	size_t ret;
>> 
>> +	if (msm_mmu_has_feature(mmu, MMU_FEATURE_USE_SYSTEM_CACHE))
>> +		prot |= IOMMU_QCOM_SYS_CACHE;
>> +
>>  	ret = iommu_map_sg(iommu->domain, iova, sgt->sgl, sgt->nents, prot);
>>  	WARN_ON(!ret);
>> 
>> diff --git a/drivers/gpu/drm/msm/msm_mmu.h 
>> b/drivers/gpu/drm/msm/msm_mmu.h
>> index 1e4ac36d..3e6bdad 100644
>> --- a/drivers/gpu/drm/msm/msm_mmu.h
>> +++ b/drivers/gpu/drm/msm/msm_mmu.h
>> @@ -18,6 +18,9 @@ struct msm_mmu_funcs {
>>  	void (*destroy)(struct msm_mmu *mmu);
>>  };
>> 
>> +/* MMU features */
>> +#define MMU_FEATURE_USE_SYSTEM_CACHE (1 << 0)
>> +
>>  struct msm_mmu {
>>  	const struct msm_mmu_funcs *funcs;
>>  	struct device *dev;
>> --
>> 1.9.1
>> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 5/5] drm/msm/a6xx: Add support for using system cache(LLC)
@ 2019-12-20 10:10       ` smasetty
  0 siblings, 0 replies; 36+ messages in thread
From: smasetty @ 2019-12-20 10:10 UTC (permalink / raw)
  To: Sharat Masetty
  Cc: freedreno, will, linux-arm-msm, linux-kernel, iommu, dri-devel,
	robin.murphy, linux-arm-msm-owner

On 2019-12-20 01:28, Jordan Crouse wrote:
> On Thu, Dec 19, 2019 at 06:44:46PM +0530, Sharat Masetty wrote:
>> The last level system cache can be partitioned to 32 different slices
>> of which GPU has two slices preallocated. One slice is used for 
>> caching GPU
>> buffers and the other slice is used for caching the GPU SMMU 
>> pagetables.
>> This patch talks to the core system cache driver to acquire the slice 
>> handles,
>> configure the SCID's to those slices and activates and deactivates the 
>> slices
>> upon GPU power collapse and restore.
>> 
>> Some support from the IOMMU driver is also needed to make use of the
>> system cache. IOMMU_QCOM_SYS_CACHE is a buffer protection flag which 
>> enables
>> caching GPU data buffers in the system cache with memory attributes 
>> such
>> as outer cacheable, read-allocate, write-allocate for buffers. The GPU
>> then has the ability to override a few cacheability parameters which 
>> it
>> does to override write-allocate to write-no-allocate as the GPU 
>> hardware
>> does not benefit much from it.
>> 
>> Similarly DOMAIN_ATTR_QCOM_SYS_CACHE is another domain level attribute
>> used by the IOMMU driver to set the right attributes to cache the 
>> hardware
>> pagetables into the system cache.
>> 
>> Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
>> ---
>>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 122 
>> +++++++++++++++++++++++++++++++++-
>>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   9 +++
>>  drivers/gpu/drm/msm/msm_iommu.c       |  13 ++++
>>  drivers/gpu/drm/msm/msm_mmu.h         |   3 +
>>  4 files changed, 146 insertions(+), 1 deletion(-)
>> 
>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
>> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>> index faff6ff..0c7fdee 100644
>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>> @@ -9,6 +9,7 @@
>>  #include "a6xx_gmu.xml.h"
>> 
>>  #include <linux/devfreq.h>
>> +#include <linux/soc/qcom/llcc-qcom.h>
>> 
>>  #define GPU_PAS_ID 13
>> 
>> @@ -781,6 +782,117 @@ static void 
>> a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu)
>>  	gpu_write(gpu, REG_A6XX_GBIF_HALT, 0x0);
>>  }
>> 
>> +#define A6XX_LLC_NUM_GPU_SCIDS		5
>> +#define A6XX_GPU_LLC_SCID_NUM_BITS	5
> 
> As I mention below, I'm not sure if we need these
> 
>> +#define A6XX_GPU_LLC_SCID_MASK \
>> +	((1 << (A6XX_LLC_NUM_GPU_SCIDS * A6XX_GPU_LLC_SCID_NUM_BITS)) - 1)
>> +
>> +#define A6XX_GPUHTW_LLC_SCID_SHIFT	25
>> +#define A6XX_GPUHTW_LLC_SCID_MASK \
>> +	(((1 << A6XX_GPU_LLC_SCID_NUM_BITS) - 1) << 
>> A6XX_GPUHTW_LLC_SCID_SHIFT)
>> +
> 
> Normally these go into the envytools regmap but if we're going to do 
> these guys
> lets use the power of <linux/bitfield.h> for good.
> 
> #define A6XX_GPU_LLC_SCID GENMASK(24, 0)
> #define A6XX_GPUHTW_LLC_SCID GENMASK(29, 25)
> 
>> +static inline void a6xx_gpu_cx_rmw(struct a6xx_llc *llc,
> 
> Don't mark C functions as inline - let the compiler figure it out for 
> you.
> 
>> +	u32 reg, u32 mask, u32 or)
>> +{
>> +	msm_rmw(llc->mmio + (reg << 2), mask, or);
>> +}
>> +
>> +static void a6xx_llc_deactivate(struct a6xx_llc *llc)
>> +{
>> +	llcc_slice_deactivate(llc->gpu_llc_slice);
>> +	llcc_slice_deactivate(llc->gpuhtw_llc_slice);
>> +}
>> +
>> +static void a6xx_llc_activate(struct a6xx_llc *llc)
>> +{
>> +	if (!llc->mmio)
>> +		return;
>> +
>> +	/* Program the sub-cache ID for all GPU blocks */
>> +	if (!llcc_slice_activate(llc->gpu_llc_slice))
>> +		a6xx_gpu_cx_rmw(llc,
>> +				REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1,
>> +				A6XX_GPU_LLC_SCID_MASK,
>> +				(llc->cntl1_regval &
>> +				 A6XX_GPU_LLC_SCID_MASK));
> 
> This is out of order with the comments below, but if we store the slice 
> id then
> you could calculate regval here and not have to store it.
> 
>> +
>> +	/* Program the sub-cache ID for the GPU pagetables */
>> +	if (!llcc_slice_activate(llc->gpuhtw_llc_slice))
> 
> val |= FIELD_SET(A6XX_GPUHTW_LLC_SCID, htw_llc_sliceid);
> 
>> +		a6xx_gpu_cx_rmw(llc,
>> +				REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1,
>> +				A6XX_GPUHTW_LLC_SCID_MASK,
>> +				(llc->cntl1_regval &
>> +				 A6XX_GPUHTW_LLC_SCID_MASK));
> 
> And this could be FIELD_SET(A6XX_GPUHTW_LLC_SCID, sliceid);
> 
> In theory you could just calculate the u32 and write it directly 
> without a rmw.
> In fact, that might be preferable - if the slice activate failed, you 
> don't want
> to run the risk that the scid for htw is still populated.
> 
>> +
>> +	/* Program cacheability overrides */
>> +	a6xx_gpu_cx_rmw(llc, REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_0, 0xF,
>> +		llc->cntl0_regval);
> 
> As below, this could easily be a constant.
> 
>> +}
>> +
>> +static void a6xx_llc_slices_destroy(struct a6xx_llc *llc)
>> +{
>> +	if (llc->mmio)
>> +		iounmap(llc->mmio);
> 
> msm_ioremap returns a devm_ managed resource, so do not use iounmap() 
> to free
> it. Bets to just leave it and let the gpu device handle it when it goes 
> boom.
> 
>> +
>> +	llcc_slice_putd(llc->gpu_llc_slice);
>> +	llcc_slice_putd(llc->gpuhtw_llc_slice);
>> +}
>> +
>> +static int a6xx_llc_slices_init(struct platform_device *pdev,
> T
> This can be void, I don't think we care if it passes or fails.
> 
>> +		struct a6xx_llc *llc)
>> +{
>> +	llc->mmio = msm_ioremap(pdev, "cx_mem", "gpu_cx");
>> +	if (IS_ERR_OR_NULL(llc->mmio))
> 
> msm_ioremap can not return NULL.
> 
>> +		return -ENODEV;
>> +
>> +	llc->gpu_llc_slice = llcc_slice_getd(LLCC_GPU);
>> +	llc->gpuhtw_llc_slice = llcc_slice_getd(LLCC_GPUHTW);
>> +	if (IS_ERR(llc->gpu_llc_slice) && IS_ERR(llc->gpuhtw_llc_slice))
>> +		return -ENODEV;
>> +
>> +	/*
>> +	 * CNTL0 provides options to override the settings for the
>> +	 * read and write allocation policies for the LLC. These
>> +	 * overrides are global for all memory transactions from
>> +	 * the GPU.
>> +	 *
>> +	 * 0x3: read-no-alloc-overridden = 0
>> +	 *      read-no-alloc = 0 - Allocate lines on read miss
>> +	 *      write-no-alloc-overridden = 1
>> +	 *      write-no-alloc = 1 - Do not allocates lines on write miss
>> +	 */
>> +	llc->cntl0_regval = 0x03;
> 
> This is a fixed value isn't it?  We should be able to get away with 
> writing a
> constant.
> 
>> +
>> +	/*
>> +	 * CNTL1 is used to specify SCID for (CP, TP, VFD, CCU and UBWC
>> +	 * FLAG cache) GPU blocks. This value will be passed along with
>> +	 * the address for any memory transaction from GPU to identify
>> +	 * the sub-cache for that transaction.
>> +	 */
>> +	if (!IS_ERR(llc->gpu_llc_slice)) {
>> +		u32 gpu_scid = llcc_get_slice_id(llc->gpu_llc_slice);
>> +		int i;
>> +
>> +		for (i = 0; i < A6XX_LLC_NUM_GPU_SCIDS; i++)
>> +			llc->cntl1_regval |=
>> +				gpu_scid << (A6XX_GPU_LLC_SCID_NUM_BITS * i);
> 
> As above, i'm not sure a loop is better than just:
> 
> gpu_scid &= 0x1f;
> 
> llc->cntl1_regval = (gpu_scid << 0) || (gpu_scid << 5) | (gpu_scid << 
> 10)
>  | (gpu_scid << 15) | (gpu_scid << 20);
> 
> And I'm not even sure we need do this math here in the first place.
> 
>> +	}
>> +
>> +	/*
>> +	 * Set SCID for GPU IOMMU. This will be used to access
>> +	 * page tables that are cached in LLC.
>> +	 */
>> +	if (!IS_ERR(llc->gpuhtw_llc_slice)) {
>> +		u32 gpuhtw_scid = llcc_get_slice_id(llc->gpuhtw_llc_slice);
>> +
>> +		llc->cntl1_regval |=
>> +			gpuhtw_scid << A6XX_GPUHTW_LLC_SCID_SHIFT;
>> +	}
> 
> As above, I think storing the slice id could be more beneficial than 
> calculating
> a value, but if we do calculate a value, use 
> FIELD_SET(A6XX_GPUHTW_LLC_SCID, )
> 
>> +
>> +	return 0;
>> +}
>> +
>>  static int a6xx_pm_resume(struct msm_gpu *gpu)
>>  {
>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>> @@ -795,6 +907,8 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
>> 
>>  	msm_gpu_resume_devfreq(gpu);
>> 
>> +	a6xx_llc_activate(&a6xx_gpu->llc);
>> +
>>  	return 0;
>>  }
>> 
>> @@ -803,6 +917,8 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>> 
>> +	a6xx_llc_deactivate(&a6xx_gpu->llc);
>> +
>>  	devfreq_suspend_device(gpu->devfreq.devfreq);
>> 
>>  	/*
>> @@ -851,6 +967,7 @@ static void a6xx_destroy(struct msm_gpu *gpu)
>>  		drm_gem_object_put_unlocked(a6xx_gpu->sqe_bo);
>>  	}
>> 
>> +	a6xx_llc_slices_destroy(&a6xx_gpu->llc);
>>  	a6xx_gmu_remove(a6xx_gpu);
>> 
>>  	adreno_gpu_cleanup(adreno_gpu);
>> @@ -924,7 +1041,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device 
>> *dev)
>>  	adreno_gpu->registers = NULL;
>>  	adreno_gpu->reg_offsets = a6xx_register_offsets;
>> 
>> -	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
>> +	ret = a6xx_llc_slices_init(pdev, &a6xx_gpu->llc);
>> +
> 
> Confirming we don't care if a6xx_llc_slices_init passes or fails.

Are you suggesting to unconditionally set the memory attributes in 
iommu(see the code below in msm_iommu.c).
We probably wouldn't need this patch too in that case: 
https://patchwork.freedesktop.org/patch/346097/

The return code  is used in the line below to pass 
MMU_FEATURE_USE_SYSTEM_CACHE. Am I missing something here?

> 
>> +	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1,
>> +			ret ? 0 : MMU_FEATURE_USE_SYSTEM_CACHE);
>>  	if (ret) {
>>  		a6xx_destroy(&(a6xx_gpu->base.base));
>>  		return ERR_PTR(ret);
>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h 
>> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>> index 7239b8b..09b9ad0 100644
>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>> @@ -12,6 +12,14 @@
>> 
>>  extern bool hang_debug;
>> 
>> +struct a6xx_llc {
>> +	void __iomem *mmio;
>> +	void *gpu_llc_slice;
>> +	void *gpuhtw_llc_slice;
>> +	u32 cntl0_regval;
> 
> As above, I'm not sure if cntl0 is needed.  Heck, I'm not even sure 
> cntl1 is
> needed - since we could store or query the ids at activate time.
> 
>> +	u32 cntl1_regval;
>> +};
>> +
>>  struct a6xx_gpu {
>>  	struct adreno_gpu base;
>> 
>> @@ -21,6 +29,7 @@ struct a6xx_gpu {
>>  	struct msm_ringbuffer *cur_ring;
>> 
>>  	struct a6xx_gmu gmu;
>> +	struct a6xx_llc llc;
>>  };
>> 
>>  #define to_a6xx_gpu(x) container_of(x, struct a6xx_gpu, base)
>> diff --git a/drivers/gpu/drm/msm/msm_iommu.c 
>> b/drivers/gpu/drm/msm/msm_iommu.c
>> index 8c95c31..4699367 100644
>> --- a/drivers/gpu/drm/msm/msm_iommu.c
>> +++ b/drivers/gpu/drm/msm/msm_iommu.c
>> @@ -27,6 +27,16 @@ static int msm_iommu_attach(struct msm_mmu *mmu, 
>> const char * const *names,
>>  			    int cnt)
>>  {
>>  	struct msm_iommu *iommu = to_msm_iommu(mmu);
>> +	int gpu_htw_llc = 1;
>> +
>> +	/*
>> +	 * This allows GPU to set the bus attributes required
>> +	 * to use system cache on behalf of the iommu page table
>> +	 * walker.
>> +	 */
>> +	if (msm_mmu_has_feature(mmu, MMU_FEATURE_USE_SYSTEM_CACHE))
>> +		iommu_domain_set_attr(iommu->domain,
>> +				DOMAIN_ATTR_QCOM_SYS_CACHE, &gpu_htw_llc);
> 
> We're all okay if this fails?  No harm no foul?
> 
>> 
>>  	return iommu_attach_device(iommu->domain, mmu->dev);
>>  }
>> @@ -45,6 +55,9 @@ static int msm_iommu_map(struct msm_mmu *mmu, 
>> uint64_t iova,
>>  	struct msm_iommu *iommu = to_msm_iommu(mmu);
>>  	size_t ret;
>> 
>> +	if (msm_mmu_has_feature(mmu, MMU_FEATURE_USE_SYSTEM_CACHE))
>> +		prot |= IOMMU_QCOM_SYS_CACHE;
>> +
>>  	ret = iommu_map_sg(iommu->domain, iova, sgt->sgl, sgt->nents, prot);
>>  	WARN_ON(!ret);
>> 
>> diff --git a/drivers/gpu/drm/msm/msm_mmu.h 
>> b/drivers/gpu/drm/msm/msm_mmu.h
>> index 1e4ac36d..3e6bdad 100644
>> --- a/drivers/gpu/drm/msm/msm_mmu.h
>> +++ b/drivers/gpu/drm/msm/msm_mmu.h
>> @@ -18,6 +18,9 @@ struct msm_mmu_funcs {
>>  	void (*destroy)(struct msm_mmu *mmu);
>>  };
>> 
>> +/* MMU features */
>> +#define MMU_FEATURE_USE_SYSTEM_CACHE (1 << 0)
>> +
>>  struct msm_mmu {
>>  	const struct msm_mmu_funcs *funcs;
>>  	struct device *dev;
>> --
>> 1.9.1
>> 
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 5/5] drm/msm/a6xx: Add support for using system cache(LLC)
@ 2019-12-20 10:10       ` smasetty
  0 siblings, 0 replies; 36+ messages in thread
From: smasetty @ 2019-12-20 10:10 UTC (permalink / raw)
  To: Sharat Masetty
  Cc: freedreno, saiprakash.ranjan, will, linux-arm-msm, linux-kernel,
	iommu, dri-devel, robin.murphy, linux-arm-msm-owner

On 2019-12-20 01:28, Jordan Crouse wrote:
> On Thu, Dec 19, 2019 at 06:44:46PM +0530, Sharat Masetty wrote:
>> The last level system cache can be partitioned to 32 different slices
>> of which GPU has two slices preallocated. One slice is used for 
>> caching GPU
>> buffers and the other slice is used for caching the GPU SMMU 
>> pagetables.
>> This patch talks to the core system cache driver to acquire the slice 
>> handles,
>> configure the SCID's to those slices and activates and deactivates the 
>> slices
>> upon GPU power collapse and restore.
>> 
>> Some support from the IOMMU driver is also needed to make use of the
>> system cache. IOMMU_QCOM_SYS_CACHE is a buffer protection flag which 
>> enables
>> caching GPU data buffers in the system cache with memory attributes 
>> such
>> as outer cacheable, read-allocate, write-allocate for buffers. The GPU
>> then has the ability to override a few cacheability parameters which 
>> it
>> does to override write-allocate to write-no-allocate as the GPU 
>> hardware
>> does not benefit much from it.
>> 
>> Similarly DOMAIN_ATTR_QCOM_SYS_CACHE is another domain level attribute
>> used by the IOMMU driver to set the right attributes to cache the 
>> hardware
>> pagetables into the system cache.
>> 
>> Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
>> ---
>>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 122 
>> +++++++++++++++++++++++++++++++++-
>>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   9 +++
>>  drivers/gpu/drm/msm/msm_iommu.c       |  13 ++++
>>  drivers/gpu/drm/msm/msm_mmu.h         |   3 +
>>  4 files changed, 146 insertions(+), 1 deletion(-)
>> 
>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
>> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>> index faff6ff..0c7fdee 100644
>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>> @@ -9,6 +9,7 @@
>>  #include "a6xx_gmu.xml.h"
>> 
>>  #include <linux/devfreq.h>
>> +#include <linux/soc/qcom/llcc-qcom.h>
>> 
>>  #define GPU_PAS_ID 13
>> 
>> @@ -781,6 +782,117 @@ static void 
>> a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu)
>>  	gpu_write(gpu, REG_A6XX_GBIF_HALT, 0x0);
>>  }
>> 
>> +#define A6XX_LLC_NUM_GPU_SCIDS		5
>> +#define A6XX_GPU_LLC_SCID_NUM_BITS	5
> 
> As I mention below, I'm not sure if we need these
> 
>> +#define A6XX_GPU_LLC_SCID_MASK \
>> +	((1 << (A6XX_LLC_NUM_GPU_SCIDS * A6XX_GPU_LLC_SCID_NUM_BITS)) - 1)
>> +
>> +#define A6XX_GPUHTW_LLC_SCID_SHIFT	25
>> +#define A6XX_GPUHTW_LLC_SCID_MASK \
>> +	(((1 << A6XX_GPU_LLC_SCID_NUM_BITS) - 1) << 
>> A6XX_GPUHTW_LLC_SCID_SHIFT)
>> +
> 
> Normally these go into the envytools regmap but if we're going to do 
> these guys
> lets use the power of <linux/bitfield.h> for good.
> 
> #define A6XX_GPU_LLC_SCID GENMASK(24, 0)
> #define A6XX_GPUHTW_LLC_SCID GENMASK(29, 25)
> 
>> +static inline void a6xx_gpu_cx_rmw(struct a6xx_llc *llc,
> 
> Don't mark C functions as inline - let the compiler figure it out for 
> you.
> 
>> +	u32 reg, u32 mask, u32 or)
>> +{
>> +	msm_rmw(llc->mmio + (reg << 2), mask, or);
>> +}
>> +
>> +static void a6xx_llc_deactivate(struct a6xx_llc *llc)
>> +{
>> +	llcc_slice_deactivate(llc->gpu_llc_slice);
>> +	llcc_slice_deactivate(llc->gpuhtw_llc_slice);
>> +}
>> +
>> +static void a6xx_llc_activate(struct a6xx_llc *llc)
>> +{
>> +	if (!llc->mmio)
>> +		return;
>> +
>> +	/* Program the sub-cache ID for all GPU blocks */
>> +	if (!llcc_slice_activate(llc->gpu_llc_slice))
>> +		a6xx_gpu_cx_rmw(llc,
>> +				REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1,
>> +				A6XX_GPU_LLC_SCID_MASK,
>> +				(llc->cntl1_regval &
>> +				 A6XX_GPU_LLC_SCID_MASK));
> 
> This is out of order with the comments below, but if we store the slice 
> id then
> you could calculate regval here and not have to store it.
> 
>> +
>> +	/* Program the sub-cache ID for the GPU pagetables */
>> +	if (!llcc_slice_activate(llc->gpuhtw_llc_slice))
> 
> val |= FIELD_SET(A6XX_GPUHTW_LLC_SCID, htw_llc_sliceid);
> 
>> +		a6xx_gpu_cx_rmw(llc,
>> +				REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1,
>> +				A6XX_GPUHTW_LLC_SCID_MASK,
>> +				(llc->cntl1_regval &
>> +				 A6XX_GPUHTW_LLC_SCID_MASK));
> 
> And this could be FIELD_SET(A6XX_GPUHTW_LLC_SCID, sliceid);
> 
> In theory you could just calculate the u32 and write it directly 
> without a rmw.
> In fact, that might be preferable - if the slice activate failed, you 
> don't want
> to run the risk that the scid for htw is still populated.
> 
>> +
>> +	/* Program cacheability overrides */
>> +	a6xx_gpu_cx_rmw(llc, REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_0, 0xF,
>> +		llc->cntl0_regval);
> 
> As below, this could easily be a constant.
> 
>> +}
>> +
>> +static void a6xx_llc_slices_destroy(struct a6xx_llc *llc)
>> +{
>> +	if (llc->mmio)
>> +		iounmap(llc->mmio);
> 
> msm_ioremap returns a devm_ managed resource, so do not use iounmap() 
> to free
> it. Bets to just leave it and let the gpu device handle it when it goes 
> boom.
> 
>> +
>> +	llcc_slice_putd(llc->gpu_llc_slice);
>> +	llcc_slice_putd(llc->gpuhtw_llc_slice);
>> +}
>> +
>> +static int a6xx_llc_slices_init(struct platform_device *pdev,
> T
> This can be void, I don't think we care if it passes or fails.
> 
>> +		struct a6xx_llc *llc)
>> +{
>> +	llc->mmio = msm_ioremap(pdev, "cx_mem", "gpu_cx");
>> +	if (IS_ERR_OR_NULL(llc->mmio))
> 
> msm_ioremap can not return NULL.
> 
>> +		return -ENODEV;
>> +
>> +	llc->gpu_llc_slice = llcc_slice_getd(LLCC_GPU);
>> +	llc->gpuhtw_llc_slice = llcc_slice_getd(LLCC_GPUHTW);
>> +	if (IS_ERR(llc->gpu_llc_slice) && IS_ERR(llc->gpuhtw_llc_slice))
>> +		return -ENODEV;
>> +
>> +	/*
>> +	 * CNTL0 provides options to override the settings for the
>> +	 * read and write allocation policies for the LLC. These
>> +	 * overrides are global for all memory transactions from
>> +	 * the GPU.
>> +	 *
>> +	 * 0x3: read-no-alloc-overridden = 0
>> +	 *      read-no-alloc = 0 - Allocate lines on read miss
>> +	 *      write-no-alloc-overridden = 1
>> +	 *      write-no-alloc = 1 - Do not allocates lines on write miss
>> +	 */
>> +	llc->cntl0_regval = 0x03;
> 
> This is a fixed value isn't it?  We should be able to get away with 
> writing a
> constant.
> 
>> +
>> +	/*
>> +	 * CNTL1 is used to specify SCID for (CP, TP, VFD, CCU and UBWC
>> +	 * FLAG cache) GPU blocks. This value will be passed along with
>> +	 * the address for any memory transaction from GPU to identify
>> +	 * the sub-cache for that transaction.
>> +	 */
>> +	if (!IS_ERR(llc->gpu_llc_slice)) {
>> +		u32 gpu_scid = llcc_get_slice_id(llc->gpu_llc_slice);
>> +		int i;
>> +
>> +		for (i = 0; i < A6XX_LLC_NUM_GPU_SCIDS; i++)
>> +			llc->cntl1_regval |=
>> +				gpu_scid << (A6XX_GPU_LLC_SCID_NUM_BITS * i);
> 
> As above, i'm not sure a loop is better than just:
> 
> gpu_scid &= 0x1f;
> 
> llc->cntl1_regval = (gpu_scid << 0) || (gpu_scid << 5) | (gpu_scid << 
> 10)
>  | (gpu_scid << 15) | (gpu_scid << 20);
> 
> And I'm not even sure we need do this math here in the first place.
> 
>> +	}
>> +
>> +	/*
>> +	 * Set SCID for GPU IOMMU. This will be used to access
>> +	 * page tables that are cached in LLC.
>> +	 */
>> +	if (!IS_ERR(llc->gpuhtw_llc_slice)) {
>> +		u32 gpuhtw_scid = llcc_get_slice_id(llc->gpuhtw_llc_slice);
>> +
>> +		llc->cntl1_regval |=
>> +			gpuhtw_scid << A6XX_GPUHTW_LLC_SCID_SHIFT;
>> +	}
> 
> As above, I think storing the slice id could be more beneficial than 
> calculating
> a value, but if we do calculate a value, use 
> FIELD_SET(A6XX_GPUHTW_LLC_SCID, )
> 
>> +
>> +	return 0;
>> +}
>> +
>>  static int a6xx_pm_resume(struct msm_gpu *gpu)
>>  {
>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>> @@ -795,6 +907,8 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
>> 
>>  	msm_gpu_resume_devfreq(gpu);
>> 
>> +	a6xx_llc_activate(&a6xx_gpu->llc);
>> +
>>  	return 0;
>>  }
>> 
>> @@ -803,6 +917,8 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>> 
>> +	a6xx_llc_deactivate(&a6xx_gpu->llc);
>> +
>>  	devfreq_suspend_device(gpu->devfreq.devfreq);
>> 
>>  	/*
>> @@ -851,6 +967,7 @@ static void a6xx_destroy(struct msm_gpu *gpu)
>>  		drm_gem_object_put_unlocked(a6xx_gpu->sqe_bo);
>>  	}
>> 
>> +	a6xx_llc_slices_destroy(&a6xx_gpu->llc);
>>  	a6xx_gmu_remove(a6xx_gpu);
>> 
>>  	adreno_gpu_cleanup(adreno_gpu);
>> @@ -924,7 +1041,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device 
>> *dev)
>>  	adreno_gpu->registers = NULL;
>>  	adreno_gpu->reg_offsets = a6xx_register_offsets;
>> 
>> -	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
>> +	ret = a6xx_llc_slices_init(pdev, &a6xx_gpu->llc);
>> +
> 
> Confirming we don't care if a6xx_llc_slices_init passes or fails.

Are you suggesting to unconditionally set the memory attributes in 
iommu(see the code below in msm_iommu.c).
We probably wouldn't need this patch too in that case: 
https://patchwork.freedesktop.org/patch/346097/

The return code  is used in the line below to pass 
MMU_FEATURE_USE_SYSTEM_CACHE. Am I missing something here?

> 
>> +	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1,
>> +			ret ? 0 : MMU_FEATURE_USE_SYSTEM_CACHE);
>>  	if (ret) {
>>  		a6xx_destroy(&(a6xx_gpu->base.base));
>>  		return ERR_PTR(ret);
>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h 
>> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>> index 7239b8b..09b9ad0 100644
>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>> @@ -12,6 +12,14 @@
>> 
>>  extern bool hang_debug;
>> 
>> +struct a6xx_llc {
>> +	void __iomem *mmio;
>> +	void *gpu_llc_slice;
>> +	void *gpuhtw_llc_slice;
>> +	u32 cntl0_regval;
> 
> As above, I'm not sure if cntl0 is needed.  Heck, I'm not even sure 
> cntl1 is
> needed - since we could store or query the ids at activate time.
> 
>> +	u32 cntl1_regval;
>> +};
>> +
>>  struct a6xx_gpu {
>>  	struct adreno_gpu base;
>> 
>> @@ -21,6 +29,7 @@ struct a6xx_gpu {
>>  	struct msm_ringbuffer *cur_ring;
>> 
>>  	struct a6xx_gmu gmu;
>> +	struct a6xx_llc llc;
>>  };
>> 
>>  #define to_a6xx_gpu(x) container_of(x, struct a6xx_gpu, base)
>> diff --git a/drivers/gpu/drm/msm/msm_iommu.c 
>> b/drivers/gpu/drm/msm/msm_iommu.c
>> index 8c95c31..4699367 100644
>> --- a/drivers/gpu/drm/msm/msm_iommu.c
>> +++ b/drivers/gpu/drm/msm/msm_iommu.c
>> @@ -27,6 +27,16 @@ static int msm_iommu_attach(struct msm_mmu *mmu, 
>> const char * const *names,
>>  			    int cnt)
>>  {
>>  	struct msm_iommu *iommu = to_msm_iommu(mmu);
>> +	int gpu_htw_llc = 1;
>> +
>> +	/*
>> +	 * This allows GPU to set the bus attributes required
>> +	 * to use system cache on behalf of the iommu page table
>> +	 * walker.
>> +	 */
>> +	if (msm_mmu_has_feature(mmu, MMU_FEATURE_USE_SYSTEM_CACHE))
>> +		iommu_domain_set_attr(iommu->domain,
>> +				DOMAIN_ATTR_QCOM_SYS_CACHE, &gpu_htw_llc);
> 
> We're all okay if this fails?  No harm no foul?
> 
>> 
>>  	return iommu_attach_device(iommu->domain, mmu->dev);
>>  }
>> @@ -45,6 +55,9 @@ static int msm_iommu_map(struct msm_mmu *mmu, 
>> uint64_t iova,
>>  	struct msm_iommu *iommu = to_msm_iommu(mmu);
>>  	size_t ret;
>> 
>> +	if (msm_mmu_has_feature(mmu, MMU_FEATURE_USE_SYSTEM_CACHE))
>> +		prot |= IOMMU_QCOM_SYS_CACHE;
>> +
>>  	ret = iommu_map_sg(iommu->domain, iova, sgt->sgl, sgt->nents, prot);
>>  	WARN_ON(!ret);
>> 
>> diff --git a/drivers/gpu/drm/msm/msm_mmu.h 
>> b/drivers/gpu/drm/msm/msm_mmu.h
>> index 1e4ac36d..3e6bdad 100644
>> --- a/drivers/gpu/drm/msm/msm_mmu.h
>> +++ b/drivers/gpu/drm/msm/msm_mmu.h
>> @@ -18,6 +18,9 @@ struct msm_mmu_funcs {
>>  	void (*destroy)(struct msm_mmu *mmu);
>>  };
>> 
>> +/* MMU features */
>> +#define MMU_FEATURE_USE_SYSTEM_CACHE (1 << 0)
>> +
>>  struct msm_mmu {
>>  	const struct msm_mmu_funcs *funcs;
>>  	struct device *dev;
>> --
>> 1.9.1
>> 
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Freedreno] [PATCH 5/5] drm/msm/a6xx: Add support for using system cache(LLC)
  2019-12-20 10:10       ` smasetty
  (?)
@ 2019-12-20 19:57         ` Jordan Crouse
  -1 siblings, 0 replies; 36+ messages in thread
From: Jordan Crouse @ 2019-12-20 19:57 UTC (permalink / raw)
  To: smasetty
  Cc: freedreno, saiprakash.ranjan, will, linux-arm-msm, joro,
	linux-kernel, iommu, dri-devel, robin.murphy,
	linux-arm-msm-owner

On Fri, Dec 20, 2019 at 03:40:59PM +0530, smasetty@codeaurora.org wrote:
> On 2019-12-20 01:28, Jordan Crouse wrote:
> >On Thu, Dec 19, 2019 at 06:44:46PM +0530, Sharat Masetty wrote:
> >>The last level system cache can be partitioned to 32 different slices
> >>of which GPU has two slices preallocated. One slice is used for caching
> >>GPU
> >>buffers and the other slice is used for caching the GPU SMMU pagetables.
> >>This patch talks to the core system cache driver to acquire the slice
> >>handles,
> >>configure the SCID's to those slices and activates and deactivates the
> >>slices
> >>upon GPU power collapse and restore.
> >>
> >>Some support from the IOMMU driver is also needed to make use of the
> >>system cache. IOMMU_QCOM_SYS_CACHE is a buffer protection flag which
> >>enables
> >>caching GPU data buffers in the system cache with memory attributes such
> >>as outer cacheable, read-allocate, write-allocate for buffers. The GPU
> >>then has the ability to override a few cacheability parameters which it
> >>does to override write-allocate to write-no-allocate as the GPU hardware
> >>does not benefit much from it.
> >>
> >>Similarly DOMAIN_ATTR_QCOM_SYS_CACHE is another domain level attribute
> >>used by the IOMMU driver to set the right attributes to cache the
> >>hardware
> >>pagetables into the system cache.
> >>
> >>Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
> >>---
> >> drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 122
> >>+++++++++++++++++++++++++++++++++-
> >> drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   9 +++
> >> drivers/gpu/drm/msm/msm_iommu.c       |  13 ++++
> >> drivers/gpu/drm/msm/msm_mmu.h         |   3 +
> >> 4 files changed, 146 insertions(+), 1 deletion(-)
> >>
> >>diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>index faff6ff..0c7fdee 100644
> >>--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>@@ -9,6 +9,7 @@
> >> #include "a6xx_gmu.xml.h"
> >>
> >> #include <linux/devfreq.h>
> >>+#include <linux/soc/qcom/llcc-qcom.h>
> >>
> >> #define GPU_PAS_ID 13
> >>
> >>@@ -781,6 +782,117 @@ static void
> >>a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu)
> >> 	gpu_write(gpu, REG_A6XX_GBIF_HALT, 0x0);
> >> }
> >>
> >>+#define A6XX_LLC_NUM_GPU_SCIDS		5
> >>+#define A6XX_GPU_LLC_SCID_NUM_BITS	5
> >
> >As I mention below, I'm not sure if we need these
> >
> >>+#define A6XX_GPU_LLC_SCID_MASK \
> >>+	((1 << (A6XX_LLC_NUM_GPU_SCIDS * A6XX_GPU_LLC_SCID_NUM_BITS)) - 1)
> >>+
> >>+#define A6XX_GPUHTW_LLC_SCID_SHIFT	25
> >>+#define A6XX_GPUHTW_LLC_SCID_MASK \
> >>+	(((1 << A6XX_GPU_LLC_SCID_NUM_BITS) - 1) <<
> >>A6XX_GPUHTW_LLC_SCID_SHIFT)
> >>+
> >
> >Normally these go into the envytools regmap but if we're going to do these
> >guys
> >lets use the power of <linux/bitfield.h> for good.
> >
> >#define A6XX_GPU_LLC_SCID GENMASK(24, 0)
> >#define A6XX_GPUHTW_LLC_SCID GENMASK(29, 25)
> >
> >>+static inline void a6xx_gpu_cx_rmw(struct a6xx_llc *llc,
> >
> >Don't mark C functions as inline - let the compiler figure it out for you.
> >
> >>+	u32 reg, u32 mask, u32 or)
> >>+{
> >>+	msm_rmw(llc->mmio + (reg << 2), mask, or);
> >>+}
> >>+
> >>+static void a6xx_llc_deactivate(struct a6xx_llc *llc)
> >>+{
> >>+	llcc_slice_deactivate(llc->gpu_llc_slice);
> >>+	llcc_slice_deactivate(llc->gpuhtw_llc_slice);
> >>+}
> >>+
> >>+static void a6xx_llc_activate(struct a6xx_llc *llc)
> >>+{
> >>+	if (!llc->mmio)
> >>+		return;
> >>+
> >>+	/* Program the sub-cache ID for all GPU blocks */
> >>+	if (!llcc_slice_activate(llc->gpu_llc_slice))
> >>+		a6xx_gpu_cx_rmw(llc,
> >>+				REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1,
> >>+				A6XX_GPU_LLC_SCID_MASK,
> >>+				(llc->cntl1_regval &
> >>+				 A6XX_GPU_LLC_SCID_MASK));
> >
> >This is out of order with the comments below, but if we store the slice id
> >then
> >you could calculate regval here and not have to store it.
> >
> >>+
> >>+	/* Program the sub-cache ID for the GPU pagetables */
> >>+	if (!llcc_slice_activate(llc->gpuhtw_llc_slice))
> >
> >val |= FIELD_SET(A6XX_GPUHTW_LLC_SCID, htw_llc_sliceid);
> >
> >>+		a6xx_gpu_cx_rmw(llc,
> >>+				REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1,
> >>+				A6XX_GPUHTW_LLC_SCID_MASK,
> >>+				(llc->cntl1_regval &
> >>+				 A6XX_GPUHTW_LLC_SCID_MASK));
> >
> >And this could be FIELD_SET(A6XX_GPUHTW_LLC_SCID, sliceid);
> >
> >In theory you could just calculate the u32 and write it directly without a
> >rmw.
> >In fact, that might be preferable - if the slice activate failed, you
> >don't want
> >to run the risk that the scid for htw is still populated.
> >
> >>+
> >>+	/* Program cacheability overrides */
> >>+	a6xx_gpu_cx_rmw(llc, REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_0, 0xF,
> >>+		llc->cntl0_regval);
> >
> >As below, this could easily be a constant.
> >
> >>+}
> >>+
> >>+static void a6xx_llc_slices_destroy(struct a6xx_llc *llc)
> >>+{
> >>+	if (llc->mmio)
> >>+		iounmap(llc->mmio);
> >
> >msm_ioremap returns a devm_ managed resource, so do not use iounmap() to
> >free
> >it. Bets to just leave it and let the gpu device handle it when it goes
> >boom.
> >
> >>+
> >>+	llcc_slice_putd(llc->gpu_llc_slice);
> >>+	llcc_slice_putd(llc->gpuhtw_llc_slice);
> >>+}
> >>+
> >>+static int a6xx_llc_slices_init(struct platform_device *pdev,
> >T
> >This can be void, I don't think we care if it passes or fails.
> >
> >>+		struct a6xx_llc *llc)
> >>+{
> >>+	llc->mmio = msm_ioremap(pdev, "cx_mem", "gpu_cx");
> >>+	if (IS_ERR_OR_NULL(llc->mmio))
> >
> >msm_ioremap can not return NULL.
> >
> >>+		return -ENODEV;
> >>+
> >>+	llc->gpu_llc_slice = llcc_slice_getd(LLCC_GPU);
> >>+	llc->gpuhtw_llc_slice = llcc_slice_getd(LLCC_GPUHTW);
> >>+	if (IS_ERR(llc->gpu_llc_slice) && IS_ERR(llc->gpuhtw_llc_slice))
> >>+		return -ENODEV;
> >>+
> >>+	/*
> >>+	 * CNTL0 provides options to override the settings for the
> >>+	 * read and write allocation policies for the LLC. These
> >>+	 * overrides are global for all memory transactions from
> >>+	 * the GPU.
> >>+	 *
> >>+	 * 0x3: read-no-alloc-overridden = 0
> >>+	 *      read-no-alloc = 0 - Allocate lines on read miss
> >>+	 *      write-no-alloc-overridden = 1
> >>+	 *      write-no-alloc = 1 - Do not allocates lines on write miss
> >>+	 */
> >>+	llc->cntl0_regval = 0x03;
> >
> >This is a fixed value isn't it?  We should be able to get away with
> >writing a
> >constant.
> >
> >>+
> >>+	/*
> >>+	 * CNTL1 is used to specify SCID for (CP, TP, VFD, CCU and UBWC
> >>+	 * FLAG cache) GPU blocks. This value will be passed along with
> >>+	 * the address for any memory transaction from GPU to identify
> >>+	 * the sub-cache for that transaction.
> >>+	 */
> >>+	if (!IS_ERR(llc->gpu_llc_slice)) {
> >>+		u32 gpu_scid = llcc_get_slice_id(llc->gpu_llc_slice);
> >>+		int i;
> >>+
> >>+		for (i = 0; i < A6XX_LLC_NUM_GPU_SCIDS; i++)
> >>+			llc->cntl1_regval |=
> >>+				gpu_scid << (A6XX_GPU_LLC_SCID_NUM_BITS * i);
> >
> >As above, i'm not sure a loop is better than just:
> >
> >gpu_scid &= 0x1f;
> >
> >llc->cntl1_regval = (gpu_scid << 0) || (gpu_scid << 5) | (gpu_scid << 10)
> > | (gpu_scid << 15) | (gpu_scid << 20);
> >
> >And I'm not even sure we need do this math here in the first place.
> >
> >>+	}
> >>+
> >>+	/*
> >>+	 * Set SCID for GPU IOMMU. This will be used to access
> >>+	 * page tables that are cached in LLC.
> >>+	 */
> >>+	if (!IS_ERR(llc->gpuhtw_llc_slice)) {
> >>+		u32 gpuhtw_scid = llcc_get_slice_id(llc->gpuhtw_llc_slice);
> >>+
> >>+		llc->cntl1_regval |=
> >>+			gpuhtw_scid << A6XX_GPUHTW_LLC_SCID_SHIFT;
> >>+	}
> >
> >As above, I think storing the slice id could be more beneficial than
> >calculating
> >a value, but if we do calculate a value, use
> >FIELD_SET(A6XX_GPUHTW_LLC_SCID, )
> >
> >>+
> >>+	return 0;
> >>+}
> >>+
> >> static int a6xx_pm_resume(struct msm_gpu *gpu)
> >> {
> >> 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >>@@ -795,6 +907,8 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
> >>
> >> 	msm_gpu_resume_devfreq(gpu);
> >>
> >>+	a6xx_llc_activate(&a6xx_gpu->llc);
> >>+
> >> 	return 0;
> >> }
> >>
> >>@@ -803,6 +917,8 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
> >> 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >> 	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >>
> >>+	a6xx_llc_deactivate(&a6xx_gpu->llc);
> >>+
> >> 	devfreq_suspend_device(gpu->devfreq.devfreq);
> >>
> >> 	/*
> >>@@ -851,6 +967,7 @@ static void a6xx_destroy(struct msm_gpu *gpu)
> >> 		drm_gem_object_put_unlocked(a6xx_gpu->sqe_bo);
> >> 	}
> >>
> >>+	a6xx_llc_slices_destroy(&a6xx_gpu->llc);
> >> 	a6xx_gmu_remove(a6xx_gpu);
> >>
> >> 	adreno_gpu_cleanup(adreno_gpu);
> >>@@ -924,7 +1041,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device
> >>*dev)
> >> 	adreno_gpu->registers = NULL;
> >> 	adreno_gpu->reg_offsets = a6xx_register_offsets;
> >>
> >>-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
> >>+	ret = a6xx_llc_slices_init(pdev, &a6xx_gpu->llc);
> >>+
> >
> >Confirming we don't care if a6xx_llc_slices_init passes or fails.
> 
> Are you suggesting to unconditionally set the memory attributes in iommu(see
> the code below in msm_iommu.c).
> We probably wouldn't need this patch too in that case:
> https://patchwork.freedesktop.org/patch/346097/
> 
> The return code  is used in the line below to pass
> MMU_FEATURE_USE_SYSTEM_CACHE. Am I missing something here?

Oh, I see. Please don't do that. Set a separate flag if you need to.

features = 0;

 if (a6xx_llc_slices_init(pdev, &a6xx_gpu->llc))
    features = MMU_FEATURE_USE_SYSTEM_CACHE;

Hiding ret in a function that also sets ret has a tendency to confuse people
like me.

Jordan

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Freedreno] [PATCH 5/5] drm/msm/a6xx: Add support for using system cache(LLC)
@ 2019-12-20 19:57         ` Jordan Crouse
  0 siblings, 0 replies; 36+ messages in thread
From: Jordan Crouse @ 2019-12-20 19:57 UTC (permalink / raw)
  To: smasetty
  Cc: freedreno, linux-kernel, iommu, dri-devel, linux-arm-msm, will,
	linux-arm-msm-owner, robin.murphy

On Fri, Dec 20, 2019 at 03:40:59PM +0530, smasetty@codeaurora.org wrote:
> On 2019-12-20 01:28, Jordan Crouse wrote:
> >On Thu, Dec 19, 2019 at 06:44:46PM +0530, Sharat Masetty wrote:
> >>The last level system cache can be partitioned to 32 different slices
> >>of which GPU has two slices preallocated. One slice is used for caching
> >>GPU
> >>buffers and the other slice is used for caching the GPU SMMU pagetables.
> >>This patch talks to the core system cache driver to acquire the slice
> >>handles,
> >>configure the SCID's to those slices and activates and deactivates the
> >>slices
> >>upon GPU power collapse and restore.
> >>
> >>Some support from the IOMMU driver is also needed to make use of the
> >>system cache. IOMMU_QCOM_SYS_CACHE is a buffer protection flag which
> >>enables
> >>caching GPU data buffers in the system cache with memory attributes such
> >>as outer cacheable, read-allocate, write-allocate for buffers. The GPU
> >>then has the ability to override a few cacheability parameters which it
> >>does to override write-allocate to write-no-allocate as the GPU hardware
> >>does not benefit much from it.
> >>
> >>Similarly DOMAIN_ATTR_QCOM_SYS_CACHE is another domain level attribute
> >>used by the IOMMU driver to set the right attributes to cache the
> >>hardware
> >>pagetables into the system cache.
> >>
> >>Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
> >>---
> >> drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 122
> >>+++++++++++++++++++++++++++++++++-
> >> drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   9 +++
> >> drivers/gpu/drm/msm/msm_iommu.c       |  13 ++++
> >> drivers/gpu/drm/msm/msm_mmu.h         |   3 +
> >> 4 files changed, 146 insertions(+), 1 deletion(-)
> >>
> >>diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>index faff6ff..0c7fdee 100644
> >>--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>@@ -9,6 +9,7 @@
> >> #include "a6xx_gmu.xml.h"
> >>
> >> #include <linux/devfreq.h>
> >>+#include <linux/soc/qcom/llcc-qcom.h>
> >>
> >> #define GPU_PAS_ID 13
> >>
> >>@@ -781,6 +782,117 @@ static void
> >>a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu)
> >> 	gpu_write(gpu, REG_A6XX_GBIF_HALT, 0x0);
> >> }
> >>
> >>+#define A6XX_LLC_NUM_GPU_SCIDS		5
> >>+#define A6XX_GPU_LLC_SCID_NUM_BITS	5
> >
> >As I mention below, I'm not sure if we need these
> >
> >>+#define A6XX_GPU_LLC_SCID_MASK \
> >>+	((1 << (A6XX_LLC_NUM_GPU_SCIDS * A6XX_GPU_LLC_SCID_NUM_BITS)) - 1)
> >>+
> >>+#define A6XX_GPUHTW_LLC_SCID_SHIFT	25
> >>+#define A6XX_GPUHTW_LLC_SCID_MASK \
> >>+	(((1 << A6XX_GPU_LLC_SCID_NUM_BITS) - 1) <<
> >>A6XX_GPUHTW_LLC_SCID_SHIFT)
> >>+
> >
> >Normally these go into the envytools regmap but if we're going to do these
> >guys
> >lets use the power of <linux/bitfield.h> for good.
> >
> >#define A6XX_GPU_LLC_SCID GENMASK(24, 0)
> >#define A6XX_GPUHTW_LLC_SCID GENMASK(29, 25)
> >
> >>+static inline void a6xx_gpu_cx_rmw(struct a6xx_llc *llc,
> >
> >Don't mark C functions as inline - let the compiler figure it out for you.
> >
> >>+	u32 reg, u32 mask, u32 or)
> >>+{
> >>+	msm_rmw(llc->mmio + (reg << 2), mask, or);
> >>+}
> >>+
> >>+static void a6xx_llc_deactivate(struct a6xx_llc *llc)
> >>+{
> >>+	llcc_slice_deactivate(llc->gpu_llc_slice);
> >>+	llcc_slice_deactivate(llc->gpuhtw_llc_slice);
> >>+}
> >>+
> >>+static void a6xx_llc_activate(struct a6xx_llc *llc)
> >>+{
> >>+	if (!llc->mmio)
> >>+		return;
> >>+
> >>+	/* Program the sub-cache ID for all GPU blocks */
> >>+	if (!llcc_slice_activate(llc->gpu_llc_slice))
> >>+		a6xx_gpu_cx_rmw(llc,
> >>+				REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1,
> >>+				A6XX_GPU_LLC_SCID_MASK,
> >>+				(llc->cntl1_regval &
> >>+				 A6XX_GPU_LLC_SCID_MASK));
> >
> >This is out of order with the comments below, but if we store the slice id
> >then
> >you could calculate regval here and not have to store it.
> >
> >>+
> >>+	/* Program the sub-cache ID for the GPU pagetables */
> >>+	if (!llcc_slice_activate(llc->gpuhtw_llc_slice))
> >
> >val |= FIELD_SET(A6XX_GPUHTW_LLC_SCID, htw_llc_sliceid);
> >
> >>+		a6xx_gpu_cx_rmw(llc,
> >>+				REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1,
> >>+				A6XX_GPUHTW_LLC_SCID_MASK,
> >>+				(llc->cntl1_regval &
> >>+				 A6XX_GPUHTW_LLC_SCID_MASK));
> >
> >And this could be FIELD_SET(A6XX_GPUHTW_LLC_SCID, sliceid);
> >
> >In theory you could just calculate the u32 and write it directly without a
> >rmw.
> >In fact, that might be preferable - if the slice activate failed, you
> >don't want
> >to run the risk that the scid for htw is still populated.
> >
> >>+
> >>+	/* Program cacheability overrides */
> >>+	a6xx_gpu_cx_rmw(llc, REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_0, 0xF,
> >>+		llc->cntl0_regval);
> >
> >As below, this could easily be a constant.
> >
> >>+}
> >>+
> >>+static void a6xx_llc_slices_destroy(struct a6xx_llc *llc)
> >>+{
> >>+	if (llc->mmio)
> >>+		iounmap(llc->mmio);
> >
> >msm_ioremap returns a devm_ managed resource, so do not use iounmap() to
> >free
> >it. Bets to just leave it and let the gpu device handle it when it goes
> >boom.
> >
> >>+
> >>+	llcc_slice_putd(llc->gpu_llc_slice);
> >>+	llcc_slice_putd(llc->gpuhtw_llc_slice);
> >>+}
> >>+
> >>+static int a6xx_llc_slices_init(struct platform_device *pdev,
> >T
> >This can be void, I don't think we care if it passes or fails.
> >
> >>+		struct a6xx_llc *llc)
> >>+{
> >>+	llc->mmio = msm_ioremap(pdev, "cx_mem", "gpu_cx");
> >>+	if (IS_ERR_OR_NULL(llc->mmio))
> >
> >msm_ioremap can not return NULL.
> >
> >>+		return -ENODEV;
> >>+
> >>+	llc->gpu_llc_slice = llcc_slice_getd(LLCC_GPU);
> >>+	llc->gpuhtw_llc_slice = llcc_slice_getd(LLCC_GPUHTW);
> >>+	if (IS_ERR(llc->gpu_llc_slice) && IS_ERR(llc->gpuhtw_llc_slice))
> >>+		return -ENODEV;
> >>+
> >>+	/*
> >>+	 * CNTL0 provides options to override the settings for the
> >>+	 * read and write allocation policies for the LLC. These
> >>+	 * overrides are global for all memory transactions from
> >>+	 * the GPU.
> >>+	 *
> >>+	 * 0x3: read-no-alloc-overridden = 0
> >>+	 *      read-no-alloc = 0 - Allocate lines on read miss
> >>+	 *      write-no-alloc-overridden = 1
> >>+	 *      write-no-alloc = 1 - Do not allocates lines on write miss
> >>+	 */
> >>+	llc->cntl0_regval = 0x03;
> >
> >This is a fixed value isn't it?  We should be able to get away with
> >writing a
> >constant.
> >
> >>+
> >>+	/*
> >>+	 * CNTL1 is used to specify SCID for (CP, TP, VFD, CCU and UBWC
> >>+	 * FLAG cache) GPU blocks. This value will be passed along with
> >>+	 * the address for any memory transaction from GPU to identify
> >>+	 * the sub-cache for that transaction.
> >>+	 */
> >>+	if (!IS_ERR(llc->gpu_llc_slice)) {
> >>+		u32 gpu_scid = llcc_get_slice_id(llc->gpu_llc_slice);
> >>+		int i;
> >>+
> >>+		for (i = 0; i < A6XX_LLC_NUM_GPU_SCIDS; i++)
> >>+			llc->cntl1_regval |=
> >>+				gpu_scid << (A6XX_GPU_LLC_SCID_NUM_BITS * i);
> >
> >As above, i'm not sure a loop is better than just:
> >
> >gpu_scid &= 0x1f;
> >
> >llc->cntl1_regval = (gpu_scid << 0) || (gpu_scid << 5) | (gpu_scid << 10)
> > | (gpu_scid << 15) | (gpu_scid << 20);
> >
> >And I'm not even sure we need do this math here in the first place.
> >
> >>+	}
> >>+
> >>+	/*
> >>+	 * Set SCID for GPU IOMMU. This will be used to access
> >>+	 * page tables that are cached in LLC.
> >>+	 */
> >>+	if (!IS_ERR(llc->gpuhtw_llc_slice)) {
> >>+		u32 gpuhtw_scid = llcc_get_slice_id(llc->gpuhtw_llc_slice);
> >>+
> >>+		llc->cntl1_regval |=
> >>+			gpuhtw_scid << A6XX_GPUHTW_LLC_SCID_SHIFT;
> >>+	}
> >
> >As above, I think storing the slice id could be more beneficial than
> >calculating
> >a value, but if we do calculate a value, use
> >FIELD_SET(A6XX_GPUHTW_LLC_SCID, )
> >
> >>+
> >>+	return 0;
> >>+}
> >>+
> >> static int a6xx_pm_resume(struct msm_gpu *gpu)
> >> {
> >> 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >>@@ -795,6 +907,8 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
> >>
> >> 	msm_gpu_resume_devfreq(gpu);
> >>
> >>+	a6xx_llc_activate(&a6xx_gpu->llc);
> >>+
> >> 	return 0;
> >> }
> >>
> >>@@ -803,6 +917,8 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
> >> 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >> 	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >>
> >>+	a6xx_llc_deactivate(&a6xx_gpu->llc);
> >>+
> >> 	devfreq_suspend_device(gpu->devfreq.devfreq);
> >>
> >> 	/*
> >>@@ -851,6 +967,7 @@ static void a6xx_destroy(struct msm_gpu *gpu)
> >> 		drm_gem_object_put_unlocked(a6xx_gpu->sqe_bo);
> >> 	}
> >>
> >>+	a6xx_llc_slices_destroy(&a6xx_gpu->llc);
> >> 	a6xx_gmu_remove(a6xx_gpu);
> >>
> >> 	adreno_gpu_cleanup(adreno_gpu);
> >>@@ -924,7 +1041,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device
> >>*dev)
> >> 	adreno_gpu->registers = NULL;
> >> 	adreno_gpu->reg_offsets = a6xx_register_offsets;
> >>
> >>-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
> >>+	ret = a6xx_llc_slices_init(pdev, &a6xx_gpu->llc);
> >>+
> >
> >Confirming we don't care if a6xx_llc_slices_init passes or fails.
> 
> Are you suggesting to unconditionally set the memory attributes in iommu(see
> the code below in msm_iommu.c).
> We probably wouldn't need this patch too in that case:
> https://patchwork.freedesktop.org/patch/346097/
> 
> The return code  is used in the line below to pass
> MMU_FEATURE_USE_SYSTEM_CACHE. Am I missing something here?

Oh, I see. Please don't do that. Set a separate flag if you need to.

features = 0;

 if (a6xx_llc_slices_init(pdev, &a6xx_gpu->llc))
    features = MMU_FEATURE_USE_SYSTEM_CACHE;

Hiding ret in a function that also sets ret has a tendency to confuse people
like me.

Jordan

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Freedreno] [PATCH 5/5] drm/msm/a6xx: Add support for using system cache(LLC)
@ 2019-12-20 19:57         ` Jordan Crouse
  0 siblings, 0 replies; 36+ messages in thread
From: Jordan Crouse @ 2019-12-20 19:57 UTC (permalink / raw)
  To: smasetty
  Cc: saiprakash.ranjan, freedreno, linux-kernel, iommu, dri-devel,
	linux-arm-msm, will, linux-arm-msm-owner, robin.murphy

On Fri, Dec 20, 2019 at 03:40:59PM +0530, smasetty@codeaurora.org wrote:
> On 2019-12-20 01:28, Jordan Crouse wrote:
> >On Thu, Dec 19, 2019 at 06:44:46PM +0530, Sharat Masetty wrote:
> >>The last level system cache can be partitioned to 32 different slices
> >>of which GPU has two slices preallocated. One slice is used for caching
> >>GPU
> >>buffers and the other slice is used for caching the GPU SMMU pagetables.
> >>This patch talks to the core system cache driver to acquire the slice
> >>handles,
> >>configure the SCID's to those slices and activates and deactivates the
> >>slices
> >>upon GPU power collapse and restore.
> >>
> >>Some support from the IOMMU driver is also needed to make use of the
> >>system cache. IOMMU_QCOM_SYS_CACHE is a buffer protection flag which
> >>enables
> >>caching GPU data buffers in the system cache with memory attributes such
> >>as outer cacheable, read-allocate, write-allocate for buffers. The GPU
> >>then has the ability to override a few cacheability parameters which it
> >>does to override write-allocate to write-no-allocate as the GPU hardware
> >>does not benefit much from it.
> >>
> >>Similarly DOMAIN_ATTR_QCOM_SYS_CACHE is another domain level attribute
> >>used by the IOMMU driver to set the right attributes to cache the
> >>hardware
> >>pagetables into the system cache.
> >>
> >>Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
> >>---
> >> drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 122
> >>+++++++++++++++++++++++++++++++++-
> >> drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   9 +++
> >> drivers/gpu/drm/msm/msm_iommu.c       |  13 ++++
> >> drivers/gpu/drm/msm/msm_mmu.h         |   3 +
> >> 4 files changed, 146 insertions(+), 1 deletion(-)
> >>
> >>diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>index faff6ff..0c7fdee 100644
> >>--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>@@ -9,6 +9,7 @@
> >> #include "a6xx_gmu.xml.h"
> >>
> >> #include <linux/devfreq.h>
> >>+#include <linux/soc/qcom/llcc-qcom.h>
> >>
> >> #define GPU_PAS_ID 13
> >>
> >>@@ -781,6 +782,117 @@ static void
> >>a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu)
> >> 	gpu_write(gpu, REG_A6XX_GBIF_HALT, 0x0);
> >> }
> >>
> >>+#define A6XX_LLC_NUM_GPU_SCIDS		5
> >>+#define A6XX_GPU_LLC_SCID_NUM_BITS	5
> >
> >As I mention below, I'm not sure if we need these
> >
> >>+#define A6XX_GPU_LLC_SCID_MASK \
> >>+	((1 << (A6XX_LLC_NUM_GPU_SCIDS * A6XX_GPU_LLC_SCID_NUM_BITS)) - 1)
> >>+
> >>+#define A6XX_GPUHTW_LLC_SCID_SHIFT	25
> >>+#define A6XX_GPUHTW_LLC_SCID_MASK \
> >>+	(((1 << A6XX_GPU_LLC_SCID_NUM_BITS) - 1) <<
> >>A6XX_GPUHTW_LLC_SCID_SHIFT)
> >>+
> >
> >Normally these go into the envytools regmap but if we're going to do these
> >guys
> >lets use the power of <linux/bitfield.h> for good.
> >
> >#define A6XX_GPU_LLC_SCID GENMASK(24, 0)
> >#define A6XX_GPUHTW_LLC_SCID GENMASK(29, 25)
> >
> >>+static inline void a6xx_gpu_cx_rmw(struct a6xx_llc *llc,
> >
> >Don't mark C functions as inline - let the compiler figure it out for you.
> >
> >>+	u32 reg, u32 mask, u32 or)
> >>+{
> >>+	msm_rmw(llc->mmio + (reg << 2), mask, or);
> >>+}
> >>+
> >>+static void a6xx_llc_deactivate(struct a6xx_llc *llc)
> >>+{
> >>+	llcc_slice_deactivate(llc->gpu_llc_slice);
> >>+	llcc_slice_deactivate(llc->gpuhtw_llc_slice);
> >>+}
> >>+
> >>+static void a6xx_llc_activate(struct a6xx_llc *llc)
> >>+{
> >>+	if (!llc->mmio)
> >>+		return;
> >>+
> >>+	/* Program the sub-cache ID for all GPU blocks */
> >>+	if (!llcc_slice_activate(llc->gpu_llc_slice))
> >>+		a6xx_gpu_cx_rmw(llc,
> >>+				REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1,
> >>+				A6XX_GPU_LLC_SCID_MASK,
> >>+				(llc->cntl1_regval &
> >>+				 A6XX_GPU_LLC_SCID_MASK));
> >
> >This is out of order with the comments below, but if we store the slice id
> >then
> >you could calculate regval here and not have to store it.
> >
> >>+
> >>+	/* Program the sub-cache ID for the GPU pagetables */
> >>+	if (!llcc_slice_activate(llc->gpuhtw_llc_slice))
> >
> >val |= FIELD_SET(A6XX_GPUHTW_LLC_SCID, htw_llc_sliceid);
> >
> >>+		a6xx_gpu_cx_rmw(llc,
> >>+				REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1,
> >>+				A6XX_GPUHTW_LLC_SCID_MASK,
> >>+				(llc->cntl1_regval &
> >>+				 A6XX_GPUHTW_LLC_SCID_MASK));
> >
> >And this could be FIELD_SET(A6XX_GPUHTW_LLC_SCID, sliceid);
> >
> >In theory you could just calculate the u32 and write it directly without a
> >rmw.
> >In fact, that might be preferable - if the slice activate failed, you
> >don't want
> >to run the risk that the scid for htw is still populated.
> >
> >>+
> >>+	/* Program cacheability overrides */
> >>+	a6xx_gpu_cx_rmw(llc, REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_0, 0xF,
> >>+		llc->cntl0_regval);
> >
> >As below, this could easily be a constant.
> >
> >>+}
> >>+
> >>+static void a6xx_llc_slices_destroy(struct a6xx_llc *llc)
> >>+{
> >>+	if (llc->mmio)
> >>+		iounmap(llc->mmio);
> >
> >msm_ioremap returns a devm_ managed resource, so do not use iounmap() to
> >free
> >it. Bets to just leave it and let the gpu device handle it when it goes
> >boom.
> >
> >>+
> >>+	llcc_slice_putd(llc->gpu_llc_slice);
> >>+	llcc_slice_putd(llc->gpuhtw_llc_slice);
> >>+}
> >>+
> >>+static int a6xx_llc_slices_init(struct platform_device *pdev,
> >T
> >This can be void, I don't think we care if it passes or fails.
> >
> >>+		struct a6xx_llc *llc)
> >>+{
> >>+	llc->mmio = msm_ioremap(pdev, "cx_mem", "gpu_cx");
> >>+	if (IS_ERR_OR_NULL(llc->mmio))
> >
> >msm_ioremap can not return NULL.
> >
> >>+		return -ENODEV;
> >>+
> >>+	llc->gpu_llc_slice = llcc_slice_getd(LLCC_GPU);
> >>+	llc->gpuhtw_llc_slice = llcc_slice_getd(LLCC_GPUHTW);
> >>+	if (IS_ERR(llc->gpu_llc_slice) && IS_ERR(llc->gpuhtw_llc_slice))
> >>+		return -ENODEV;
> >>+
> >>+	/*
> >>+	 * CNTL0 provides options to override the settings for the
> >>+	 * read and write allocation policies for the LLC. These
> >>+	 * overrides are global for all memory transactions from
> >>+	 * the GPU.
> >>+	 *
> >>+	 * 0x3: read-no-alloc-overridden = 0
> >>+	 *      read-no-alloc = 0 - Allocate lines on read miss
> >>+	 *      write-no-alloc-overridden = 1
> >>+	 *      write-no-alloc = 1 - Do not allocates lines on write miss
> >>+	 */
> >>+	llc->cntl0_regval = 0x03;
> >
> >This is a fixed value isn't it?  We should be able to get away with
> >writing a
> >constant.
> >
> >>+
> >>+	/*
> >>+	 * CNTL1 is used to specify SCID for (CP, TP, VFD, CCU and UBWC
> >>+	 * FLAG cache) GPU blocks. This value will be passed along with
> >>+	 * the address for any memory transaction from GPU to identify
> >>+	 * the sub-cache for that transaction.
> >>+	 */
> >>+	if (!IS_ERR(llc->gpu_llc_slice)) {
> >>+		u32 gpu_scid = llcc_get_slice_id(llc->gpu_llc_slice);
> >>+		int i;
> >>+
> >>+		for (i = 0; i < A6XX_LLC_NUM_GPU_SCIDS; i++)
> >>+			llc->cntl1_regval |=
> >>+				gpu_scid << (A6XX_GPU_LLC_SCID_NUM_BITS * i);
> >
> >As above, i'm not sure a loop is better than just:
> >
> >gpu_scid &= 0x1f;
> >
> >llc->cntl1_regval = (gpu_scid << 0) || (gpu_scid << 5) | (gpu_scid << 10)
> > | (gpu_scid << 15) | (gpu_scid << 20);
> >
> >And I'm not even sure we need do this math here in the first place.
> >
> >>+	}
> >>+
> >>+	/*
> >>+	 * Set SCID for GPU IOMMU. This will be used to access
> >>+	 * page tables that are cached in LLC.
> >>+	 */
> >>+	if (!IS_ERR(llc->gpuhtw_llc_slice)) {
> >>+		u32 gpuhtw_scid = llcc_get_slice_id(llc->gpuhtw_llc_slice);
> >>+
> >>+		llc->cntl1_regval |=
> >>+			gpuhtw_scid << A6XX_GPUHTW_LLC_SCID_SHIFT;
> >>+	}
> >
> >As above, I think storing the slice id could be more beneficial than
> >calculating
> >a value, but if we do calculate a value, use
> >FIELD_SET(A6XX_GPUHTW_LLC_SCID, )
> >
> >>+
> >>+	return 0;
> >>+}
> >>+
> >> static int a6xx_pm_resume(struct msm_gpu *gpu)
> >> {
> >> 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >>@@ -795,6 +907,8 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
> >>
> >> 	msm_gpu_resume_devfreq(gpu);
> >>
> >>+	a6xx_llc_activate(&a6xx_gpu->llc);
> >>+
> >> 	return 0;
> >> }
> >>
> >>@@ -803,6 +917,8 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
> >> 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >> 	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >>
> >>+	a6xx_llc_deactivate(&a6xx_gpu->llc);
> >>+
> >> 	devfreq_suspend_device(gpu->devfreq.devfreq);
> >>
> >> 	/*
> >>@@ -851,6 +967,7 @@ static void a6xx_destroy(struct msm_gpu *gpu)
> >> 		drm_gem_object_put_unlocked(a6xx_gpu->sqe_bo);
> >> 	}
> >>
> >>+	a6xx_llc_slices_destroy(&a6xx_gpu->llc);
> >> 	a6xx_gmu_remove(a6xx_gpu);
> >>
> >> 	adreno_gpu_cleanup(adreno_gpu);
> >>@@ -924,7 +1041,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device
> >>*dev)
> >> 	adreno_gpu->registers = NULL;
> >> 	adreno_gpu->reg_offsets = a6xx_register_offsets;
> >>
> >>-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
> >>+	ret = a6xx_llc_slices_init(pdev, &a6xx_gpu->llc);
> >>+
> >
> >Confirming we don't care if a6xx_llc_slices_init passes or fails.
> 
> Are you suggesting to unconditionally set the memory attributes in iommu(see
> the code below in msm_iommu.c).
> We probably wouldn't need this patch too in that case:
> https://patchwork.freedesktop.org/patch/346097/
> 
> The return code  is used in the line below to pass
> MMU_FEATURE_USE_SYSTEM_CACHE. Am I missing something here?

Oh, I see. Please don't do that. Set a separate flag if you need to.

features = 0;

 if (a6xx_llc_slices_init(pdev, &a6xx_gpu->llc))
    features = MMU_FEATURE_USE_SYSTEM_CACHE;

Hiding ret in a function that also sets ret has a tendency to confuse people
like me.

Jordan

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 5/5] drm/msm/A6xx: Add support for using system cache(llc)
       [not found]     ` <1521789591-28628-6-git-send-email-smasetty-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
  2018-04-03 21:24       ` Jordan Crouse
@ 2018-04-05  7:04       ` Vivek Gautam
  1 sibling, 0 replies; 36+ messages in thread
From: Vivek Gautam @ 2018-04-05  7:04 UTC (permalink / raw)
  To: Sharat Masetty, freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Hi Sharat,


On 3/23/2018 12:49 PM, Sharat Masetty wrote:
> The last level system cache can be partitioned to 32
> different slices of which GPU has two slices preallocated.
> The "gpu" slice is used for caching GPU buffers and
> the "gpuhtw" slice is used for caching the GPU SMMU
> pagetables.  This patch talks to the core system cache
> driver to acquire the slice handles, configure the SCID's
> to those slices and activates and deactivates the slices
> upon GPU power collapse and restore.
>
> Some support from the IOMMU driver is also needed to
> make use of the system cache. IOMMU_UPSTREAM_HINT is
> a buffer protection flag which enables caching GPU data
> buffers in the system cache with memory attributes such
> as outer cacheable, read-allocate, write-allocate for buffers.
> The GPU then has the ability to override a few cacheability
> parameters which it does to override write-allocate to
> write-no-allocate as the GPU hardware does not benefit much
> from it.
> Similarly DOMAIN_ATTR_USE_UPSTREAM_HINT is another domain level
> attribute used by the IOMMU driver to set the right attributes
> to cache the hardware pagetables into the system cache.
>
> Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
> ---

Couple of minor nits. Please see comments inline below.

>   drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 162 +++++++++++++++++++++++++++++++++-
>   drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   9 ++
>   drivers/gpu/drm/msm/msm_iommu.c       |  13 +++
>   drivers/gpu/drm/msm/msm_mmu.h         |   3 +
>   4 files changed, 186 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index bd50674..e4554eb 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -13,6 +13,7 @@
>   
>   #include <linux/qcom_scm.h>
>   #include <linux/soc/qcom/mdt_loader.h>
> +#include <linux/soc/qcom/llcc-qcom.h>
>   
>   #include "msm_gem.h"
>   #include "msm_mmu.h"
> @@ -913,6 +914,154 @@ static irqreturn_t a6xx_irq(struct msm_gpu *gpu)
>   	~0
>   };
>   
> +#define A6XX_LLC_NUM_GPU_SCIDS		5
> +#define A6XX_GPU_LLC_SCID_NUM_BITS	5
> +
> +#define A6XX_GPU_LLC_SCID_MASK \
> +	((1 << (A6XX_LLC_NUM_GPU_SCIDS * A6XX_GPU_LLC_SCID_NUM_BITS)) - 1)
> +
> +#define A6XX_GPUHTW_LLC_SCID_SHIFT	25
> +#define A6XX_GPUHTW_LLC_SCID_MASK \
> +	(((1 << A6XX_GPU_LLC_SCID_NUM_BITS) - 1) << A6XX_GPUHTW_LLC_SCID_SHIFT)
> +
> +static inline void a6xx_gpu_cx_rmw(struct a6xx_llc *llc,
> +	u32 reg, u32 mask, u32 or)
> +{
> +	msm_rmw(llc->mmio + (reg << 2), mask, or);
> +}
> +
> +static void a6xx_llc_deactivate(struct msm_gpu *gpu)
> +{
> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> +	struct a6xx_llc *llc = &a6xx_gpu->llc;
> +
> +	llcc_slice_deactivate(llc->gpu_llc_slice);
> +	llcc_slice_deactivate(llc->gpuhtw_llc_slice);
> +}
> +
> +static void a6xx_llc_activate(struct msm_gpu *gpu)
> +{
> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> +	struct a6xx_llc *llc = &a6xx_gpu->llc;
> +
> +	if (!llc->mmio)
> +		return;
> +
> +	if (llc->gpu_llc_slice)
> +		if (!llcc_slice_activate(llc->gpu_llc_slice))
> +			/* Program the sub-cache ID for all GPU blocks */
> +			a6xx_gpu_cx_rmw(llc,
> +				REG_A6XX_GPU_CX_MISC_SYSTEM_CACHE_CNTL_1,
> +				A6XX_GPU_LLC_SCID_MASK,
> +				(llc->cntl1_regval &
> +					A6XX_GPU_LLC_SCID_MASK));
> +
> +	if (llc->gpuhtw_llc_slice)
> +		if (!llcc_slice_activate(llc->gpuhtw_llc_slice))
> +			/* Program the sub-cache ID for GPU pagetables */
> +			a6xx_gpu_cx_rmw(llc,
> +				REG_A6XX_GPU_CX_MISC_SYSTEM_CACHE_CNTL_1,
> +				A6XX_GPUHTW_LLC_SCID_MASK,
> +				(llc->cntl1_regval &
> +					A6XX_GPUHTW_LLC_SCID_MASK));
> +
> +	/* Program cacheability overrides */
> +	a6xx_gpu_cx_rmw(llc, REG_A6XX_GPU_CX_MISC_SYSTEM_CACHE_CNTL_0, 0xF,
> +		llc->cntl0_regval);
> +}
> +
> +void a6xx_llc_slices_destroy(struct a6xx_llc *llc)

static?

> +{
> +	if (llc->mmio) {
> +		iounmap(llc->mmio);
> +		llc->mmio = NULL;
> +	}
> +
> +	llcc_slice_putd(llc->gpu_llc_slice);
> +	llc->gpu_llc_slice = NULL;
> +
> +	llcc_slice_putd(llc->gpuhtw_llc_slice);
> +	llc->gpuhtw_llc_slice = NULL;
> +}
> +
> +static int a6xx_llc_slices_init(struct platform_device *pdev,
> +		struct a6xx_llc *llc)
> +{
> +	int i;
> +
> +	/* Get the system cache slice descriptor for GPU and GPUHTWs */
> +	llc->gpu_llc_slice = llcc_slice_getd(&pdev->dev, "gpu");
> +	if (IS_ERR(llc->gpu_llc_slice))
> +		llc->gpu_llc_slice = NULL;
> +
> +	llc->gpuhtw_llc_slice = llcc_slice_getd(&pdev->dev, "gpuhtw");
> +	if (IS_ERR(llc->gpuhtw_llc_slice))
> +		llc->gpuhtw_llc_slice = NULL;
> +
> +	if (llc->gpu_llc_slice == NULL && llc->gpuhtw_llc_slice == NULL)
> +		return -1;
> +
> +	/* Map registers */
> +	llc->mmio = msm_ioremap(pdev, "cx_mem", "gpu_cx");
> +	if (IS_ERR(llc->mmio)) {
> +		llc->mmio = NULL;
> +		a6xx_llc_slices_destroy(llc);
> +		return -1;
> +	}

If you move this ioremap to start of the function, then you wouldn't need to
call a6xx_llcc_slices_destroy().

regards
Vivek

> +
> +	/*
> +	 * Setup GPU system cache CNTL0 and CNTL1 register values.
> +	 * These values will be programmed everytime GPU comes out
> +	 * of power collapse as these are non-retention registers.
> +	 */
> +
> +	/*
> +	 * CNTL0 provides options to override the settings for the
> +	 * read and write allocation policies for the LLC. These
> +	 * overrides are global for all memory transactions from
> +	 * the GPU.
> +	 *
> +	 * 0x3: read-no-alloc-overridden = 0
> +	 *      read-no-alloc = 0 - Allocate lines on read miss
> +	 *      write-no-alloc-overridden = 1
> +	 *      write-no-alloc = 1 - Do not allocates lines on write miss
> +	 */
> +	llc->cntl0_regval = 0x03;
> +
> +	/*
> +	 * CNTL1 is used to specify SCID for (CP, TP, VFD, CCU and UBWC
> +	 * FLAG cache) GPU blocks. This value will be passed along with
> +	 * the address for any memory transaction from GPU to identify
> +	 * the sub-cache for that transaction.
> +	 *
> +	 * Currently there is only one SCID allocated for all GPU blocks
> +	 * Hence set same SCID for all the blocks.
> +	 */
> +
> +	if (llc->gpu_llc_slice) {
> +		u32 gpu_scid = llcc_get_slice_id(llc->gpu_llc_slice);
> +
> +		for (i = 0; i < A6XX_LLC_NUM_GPU_SCIDS; i++)
> +			llc->cntl1_regval |=
> +				gpu_scid << (A6XX_GPU_LLC_SCID_NUM_BITS * i);
> +	}
> +
> +	/*
> +	 * Set SCID for GPU IOMMU. This will be used to access
> +	 * page tables that are cached in LLC.
> +	 */
> +	if (llc->gpuhtw_llc_slice) {
> +		u32 gpuhtw_scid = llcc_get_slice_id(llc->gpuhtw_llc_slice);
> +
> +		llc->cntl1_regval |=
> +			gpuhtw_scid << A6XX_GPUHTW_LLC_SCID_SHIFT;
> +	}
> +
> +	return 0;
> +}
> +
>   static int a6xx_pm_resume(struct msm_gpu *gpu)
>   {
>   	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> @@ -923,6 +1072,9 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
>   
>   	gpu->needs_hw_init = true;
>   
> +	/* Activate LLC slices */
> +	a6xx_llc_activate(gpu);
> +
>   	return ret;
>   }
>   
> @@ -931,6 +1083,9 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
>   	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>   	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>   
> +	/* Deactivate LLC slices */
> +	a6xx_llc_deactivate(gpu);
> +
>   	/*
>   	 * Make sure the GMU is idle before continuing (because some transitions
>   	 * may use VBIF
> @@ -993,6 +1148,8 @@ static void a6xx_destroy(struct msm_gpu *gpu)
>   		drm_gem_object_unreference_unlocked(a6xx_gpu->sqe_bo);
>   	}
>   
> +	a6xx_llc_slices_destroy(&a6xx_gpu->llc);
> +
>   	a6xx_gmu_remove(a6xx_gpu);
>   
>   	adreno_gpu_cleanup(adreno_gpu);
> @@ -1040,7 +1197,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>   	adreno_gpu->registers = a6xx_registers;
>   	adreno_gpu->reg_offsets = a6xx_register_offsets;
>   
> -	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 4, 0);
> +	ret = a6xx_llc_slices_init(pdev, &a6xx_gpu->llc);
> +
> +	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 4,
> +			ret ? 0 : MMU_FEATURE_USE_SYSTEM_CACHE);
>   	if (ret) {
>   		a6xx_destroy(&(a6xx_gpu->base.base));
>   		return ERR_PTR(ret);
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> index 21ab701..392c426 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> @@ -21,6 +21,14 @@
>   
>   extern bool hang_debug;
>   
> +struct a6xx_llc {
> +	void __iomem *mmio;
> +	void *gpu_llc_slice;
> +	void *gpuhtw_llc_slice;
> +	u32 cntl0_regval;
> +	u32 cntl1_regval;
> +};
> +
>   struct a6xx_gpu {
>   	struct adreno_gpu base;
>   
> @@ -46,6 +54,7 @@ struct a6xx_gpu {
>   	uint64_t scratch_iova;
>   
>   	struct a6xx_gmu gmu;
> +	struct a6xx_llc llc;
>   };
>   
>   #define to_a6xx_gpu(x) container_of(x, struct a6xx_gpu, base)
> diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
> index 1ab629b..6c03eda 100644
> --- a/drivers/gpu/drm/msm/msm_iommu.c
> +++ b/drivers/gpu/drm/msm/msm_iommu.c
> @@ -39,6 +39,16 @@ static int msm_iommu_attach(struct msm_mmu *mmu, const char * const *names,
>   {
>   	struct msm_iommu *iommu = to_msm_iommu(mmu);
>   	int ret;
> +	int gpu_htw_llc = 1;
> +
> +	/*
> +	 * This allows GPU to set the bus attributes required
> +	 * to use system cache on behalf of the iommu page table
> +	 * walker.
> +	 */
> +	if (msm_mmu_has_feature(mmu, MMU_FEATURE_USE_SYSTEM_CACHE))
> +		iommu_domain_set_attr(iommu->domain,
> +				DOMAIN_ATTR_USE_UPSTREAM_HINT, &gpu_htw_llc);
>   
>   	pm_runtime_get_suppliers(mmu->dev);
>   	ret = iommu_attach_device(iommu->domain, mmu->dev);
> @@ -63,6 +73,9 @@ static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova,
>   	struct msm_iommu *iommu = to_msm_iommu(mmu);
>   	size_t ret;
>   
> +	if (msm_mmu_has_feature(mmu, MMU_FEATURE_USE_SYSTEM_CACHE))
> +		prot |= IOMMU_USE_UPSTREAM_HINT;
> +
>   	pm_runtime_get_suppliers(mmu->dev);
>   	ret = iommu_map_sg(iommu->domain, iova, sgt->sgl, sgt->nents, prot);
>   	pm_runtime_put_suppliers(mmu->dev);
> diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
> index 85df78d..257bdea 100644
> --- a/drivers/gpu/drm/msm/msm_mmu.h
> +++ b/drivers/gpu/drm/msm/msm_mmu.h
> @@ -30,6 +30,9 @@ struct msm_mmu_funcs {
>   	void (*destroy)(struct msm_mmu *mmu);
>   };
>   
> +/* MMU features */
> +#define MMU_FEATURE_USE_SYSTEM_CACHE (1 << 0)
> +
>   struct msm_mmu {
>   	const struct msm_mmu_funcs *funcs;
>   	struct device *dev;

_______________________________________________
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 5/5] drm/msm/A6xx: Add support for using system cache(llc)
       [not found]     ` <1521789591-28628-6-git-send-email-smasetty-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
@ 2018-04-03 21:24       ` Jordan Crouse
  2018-04-05  7:04       ` Vivek Gautam
  1 sibling, 0 replies; 36+ messages in thread
From: Jordan Crouse @ 2018-04-03 21:24 UTC (permalink / raw)
  To: Sharat Masetty
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On Fri, Mar 23, 2018 at 12:49:51PM +0530, Sharat Masetty wrote:
> The last level system cache can be partitioned to 32
> different slices of which GPU has two slices preallocated.
> The "gpu" slice is used for caching GPU buffers and
> the "gpuhtw" slice is used for caching the GPU SMMU
> pagetables.  This patch talks to the core system cache
> driver to acquire the slice handles, configure the SCID's
> to those slices and activates and deactivates the slices
> upon GPU power collapse and restore.
> 
> Some support from the IOMMU driver is also needed to
> make use of the system cache. IOMMU_UPSTREAM_HINT is
> a buffer protection flag which enables caching GPU data
> buffers in the system cache with memory attributes such
> as outer cacheable, read-allocate, write-allocate for buffers.
> The GPU then has the ability to override a few cacheability
> parameters which it does to override write-allocate to
> write-no-allocate as the GPU hardware does not benefit much
> from it.
> Similarly DOMAIN_ATTR_USE_UPSTREAM_HINT is another domain level
> attribute used by the IOMMU driver to set the right attributes
> to cache the hardware pagetables into the system cache.

This has a dependency on the LLCC driver and the API to that may change (it is
under review now).  When it does, this will have to naturally change as well but
that'll be a minor tweek and won't affect the functionality of this driver so
pending those changes..

Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>

> Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 162 +++++++++++++++++++++++++++++++++-
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   9 ++
>  drivers/gpu/drm/msm/msm_iommu.c       |  13 +++
>  drivers/gpu/drm/msm/msm_mmu.h         |   3 +
>  4 files changed, 186 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index bd50674..e4554eb 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -13,6 +13,7 @@
>  
>  #include <linux/qcom_scm.h>
>  #include <linux/soc/qcom/mdt_loader.h>
> +#include <linux/soc/qcom/llcc-qcom.h>
>  
>  #include "msm_gem.h"
>  #include "msm_mmu.h"
> @@ -913,6 +914,154 @@ static irqreturn_t a6xx_irq(struct msm_gpu *gpu)
>  	~0
>  };
>  
> +#define A6XX_LLC_NUM_GPU_SCIDS		5
> +#define A6XX_GPU_LLC_SCID_NUM_BITS	5
> +
> +#define A6XX_GPU_LLC_SCID_MASK \
> +	((1 << (A6XX_LLC_NUM_GPU_SCIDS * A6XX_GPU_LLC_SCID_NUM_BITS)) - 1)
> +
> +#define A6XX_GPUHTW_LLC_SCID_SHIFT	25
> +#define A6XX_GPUHTW_LLC_SCID_MASK \
> +	(((1 << A6XX_GPU_LLC_SCID_NUM_BITS) - 1) << A6XX_GPUHTW_LLC_SCID_SHIFT)
> +
> +static inline void a6xx_gpu_cx_rmw(struct a6xx_llc *llc,
> +	u32 reg, u32 mask, u32 or)
> +{
> +	msm_rmw(llc->mmio + (reg << 2), mask, or);
> +}
> +
> +static void a6xx_llc_deactivate(struct msm_gpu *gpu)
> +{
> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> +	struct a6xx_llc *llc = &a6xx_gpu->llc;
> +
> +	llcc_slice_deactivate(llc->gpu_llc_slice);
> +	llcc_slice_deactivate(llc->gpuhtw_llc_slice);
> +}
> +
> +static void a6xx_llc_activate(struct msm_gpu *gpu)
> +{
> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> +	struct a6xx_llc *llc = &a6xx_gpu->llc;
> +
> +	if (!llc->mmio)
> +		return;
> +
> +	if (llc->gpu_llc_slice)
> +		if (!llcc_slice_activate(llc->gpu_llc_slice))
> +			/* Program the sub-cache ID for all GPU blocks */
> +			a6xx_gpu_cx_rmw(llc,
> +				REG_A6XX_GPU_CX_MISC_SYSTEM_CACHE_CNTL_1,
> +				A6XX_GPU_LLC_SCID_MASK,
> +				(llc->cntl1_regval &
> +					A6XX_GPU_LLC_SCID_MASK));
> +
> +	if (llc->gpuhtw_llc_slice)
> +		if (!llcc_slice_activate(llc->gpuhtw_llc_slice))
> +			/* Program the sub-cache ID for GPU pagetables */
> +			a6xx_gpu_cx_rmw(llc,
> +				REG_A6XX_GPU_CX_MISC_SYSTEM_CACHE_CNTL_1,
> +				A6XX_GPUHTW_LLC_SCID_MASK,
> +				(llc->cntl1_regval &
> +					A6XX_GPUHTW_LLC_SCID_MASK));
> +
> +	/* Program cacheability overrides */
> +	a6xx_gpu_cx_rmw(llc, REG_A6XX_GPU_CX_MISC_SYSTEM_CACHE_CNTL_0, 0xF,
> +		llc->cntl0_regval);
> +}
> +
> +void a6xx_llc_slices_destroy(struct a6xx_llc *llc)
> +{
> +	if (llc->mmio) {
> +		iounmap(llc->mmio);
> +		llc->mmio = NULL;
> +	}
> +
> +	llcc_slice_putd(llc->gpu_llc_slice);
> +	llc->gpu_llc_slice = NULL;
> +
> +	llcc_slice_putd(llc->gpuhtw_llc_slice);
> +	llc->gpuhtw_llc_slice = NULL;
> +}
> +
> +static int a6xx_llc_slices_init(struct platform_device *pdev,
> +		struct a6xx_llc *llc)
> +{
> +	int i;
> +
> +	/* Get the system cache slice descriptor for GPU and GPUHTWs */
> +	llc->gpu_llc_slice = llcc_slice_getd(&pdev->dev, "gpu");
> +	if (IS_ERR(llc->gpu_llc_slice))
> +		llc->gpu_llc_slice = NULL;
> +
> +	llc->gpuhtw_llc_slice = llcc_slice_getd(&pdev->dev, "gpuhtw");
> +	if (IS_ERR(llc->gpuhtw_llc_slice))
> +		llc->gpuhtw_llc_slice = NULL;
> +
> +	if (llc->gpu_llc_slice == NULL && llc->gpuhtw_llc_slice == NULL)
> +		return -1;
> +
> +	/* Map registers */
> +	llc->mmio = msm_ioremap(pdev, "cx_mem", "gpu_cx");
> +	if (IS_ERR(llc->mmio)) {
> +		llc->mmio = NULL;
> +		a6xx_llc_slices_destroy(llc);
> +		return -1;
> +	}
> +
> +	/*
> +	 * Setup GPU system cache CNTL0 and CNTL1 register values.
> +	 * These values will be programmed everytime GPU comes out
> +	 * of power collapse as these are non-retention registers.
> +	 */
> +
> +	/*
> +	 * CNTL0 provides options to override the settings for the
> +	 * read and write allocation policies for the LLC. These
> +	 * overrides are global for all memory transactions from
> +	 * the GPU.
> +	 *
> +	 * 0x3: read-no-alloc-overridden = 0
> +	 *      read-no-alloc = 0 - Allocate lines on read miss
> +	 *      write-no-alloc-overridden = 1
> +	 *      write-no-alloc = 1 - Do not allocates lines on write miss
> +	 */
> +	llc->cntl0_regval = 0x03;
> +
> +	/*
> +	 * CNTL1 is used to specify SCID for (CP, TP, VFD, CCU and UBWC
> +	 * FLAG cache) GPU blocks. This value will be passed along with
> +	 * the address for any memory transaction from GPU to identify
> +	 * the sub-cache for that transaction.
> +	 *
> +	 * Currently there is only one SCID allocated for all GPU blocks
> +	 * Hence set same SCID for all the blocks.
> +	 */
> +
> +	if (llc->gpu_llc_slice) {
> +		u32 gpu_scid = llcc_get_slice_id(llc->gpu_llc_slice);
> +
> +		for (i = 0; i < A6XX_LLC_NUM_GPU_SCIDS; i++)
> +			llc->cntl1_regval |=
> +				gpu_scid << (A6XX_GPU_LLC_SCID_NUM_BITS * i);
> +	}
> +
> +	/*
> +	 * Set SCID for GPU IOMMU. This will be used to access
> +	 * page tables that are cached in LLC.
> +	 */
> +	if (llc->gpuhtw_llc_slice) {
> +		u32 gpuhtw_scid = llcc_get_slice_id(llc->gpuhtw_llc_slice);
> +
> +		llc->cntl1_regval |=
> +			gpuhtw_scid << A6XX_GPUHTW_LLC_SCID_SHIFT;
> +	}
> +
> +	return 0;
> +}
> +
>  static int a6xx_pm_resume(struct msm_gpu *gpu)
>  {
>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> @@ -923,6 +1072,9 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
>  
>  	gpu->needs_hw_init = true;
>  
> +	/* Activate LLC slices */
> +	a6xx_llc_activate(gpu);
> +
>  	return ret;
>  }
>  
> @@ -931,6 +1083,9 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>  
> +	/* Deactivate LLC slices */
> +	a6xx_llc_deactivate(gpu);
> +
>  	/*
>  	 * Make sure the GMU is idle before continuing (because some transitions
>  	 * may use VBIF
> @@ -993,6 +1148,8 @@ static void a6xx_destroy(struct msm_gpu *gpu)
>  		drm_gem_object_unreference_unlocked(a6xx_gpu->sqe_bo);
>  	}
>  
> +	a6xx_llc_slices_destroy(&a6xx_gpu->llc);
> +
>  	a6xx_gmu_remove(a6xx_gpu);
>  
>  	adreno_gpu_cleanup(adreno_gpu);
> @@ -1040,7 +1197,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>  	adreno_gpu->registers = a6xx_registers;
>  	adreno_gpu->reg_offsets = a6xx_register_offsets;
>  
> -	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 4, 0);
> +	ret = a6xx_llc_slices_init(pdev, &a6xx_gpu->llc);
> +
> +	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 4,
> +			ret ? 0 : MMU_FEATURE_USE_SYSTEM_CACHE);
>  	if (ret) {
>  		a6xx_destroy(&(a6xx_gpu->base.base));
>  		return ERR_PTR(ret);
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> index 21ab701..392c426 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> @@ -21,6 +21,14 @@
>  
>  extern bool hang_debug;
>  
> +struct a6xx_llc {
> +	void __iomem *mmio;
> +	void *gpu_llc_slice;
> +	void *gpuhtw_llc_slice;
> +	u32 cntl0_regval;
> +	u32 cntl1_regval;
> +};
> +
>  struct a6xx_gpu {
>  	struct adreno_gpu base;
>  
> @@ -46,6 +54,7 @@ struct a6xx_gpu {
>  	uint64_t scratch_iova;
>  
>  	struct a6xx_gmu gmu;
> +	struct a6xx_llc llc;
>  };
>  
>  #define to_a6xx_gpu(x) container_of(x, struct a6xx_gpu, base)
> diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
> index 1ab629b..6c03eda 100644
> --- a/drivers/gpu/drm/msm/msm_iommu.c
> +++ b/drivers/gpu/drm/msm/msm_iommu.c
> @@ -39,6 +39,16 @@ static int msm_iommu_attach(struct msm_mmu *mmu, const char * const *names,
>  {
>  	struct msm_iommu *iommu = to_msm_iommu(mmu);
>  	int ret;
> +	int gpu_htw_llc = 1;
> +
> +	/*
> +	 * This allows GPU to set the bus attributes required
> +	 * to use system cache on behalf of the iommu page table
> +	 * walker.
> +	 */
> +	if (msm_mmu_has_feature(mmu, MMU_FEATURE_USE_SYSTEM_CACHE))
> +		iommu_domain_set_attr(iommu->domain,
> +				DOMAIN_ATTR_USE_UPSTREAM_HINT, &gpu_htw_llc);
>  
>  	pm_runtime_get_suppliers(mmu->dev);
>  	ret = iommu_attach_device(iommu->domain, mmu->dev);
> @@ -63,6 +73,9 @@ static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova,
>  	struct msm_iommu *iommu = to_msm_iommu(mmu);
>  	size_t ret;
>  
> +	if (msm_mmu_has_feature(mmu, MMU_FEATURE_USE_SYSTEM_CACHE))
> +		prot |= IOMMU_USE_UPSTREAM_HINT;
> +
>  	pm_runtime_get_suppliers(mmu->dev);
>  	ret = iommu_map_sg(iommu->domain, iova, sgt->sgl, sgt->nents, prot);
>  	pm_runtime_put_suppliers(mmu->dev);
> diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
> index 85df78d..257bdea 100644
> --- a/drivers/gpu/drm/msm/msm_mmu.h
> +++ b/drivers/gpu/drm/msm/msm_mmu.h
> @@ -30,6 +30,9 @@ struct msm_mmu_funcs {
>  	void (*destroy)(struct msm_mmu *mmu);
>  };
>  
> +/* MMU features */
> +#define MMU_FEATURE_USE_SYSTEM_CACHE (1 << 0)
> +
>  struct msm_mmu {
>  	const struct msm_mmu_funcs *funcs;
>  	struct device *dev;
> -- 
> 1.9.1
> 
> _______________________________________________
> Freedreno mailing list
> Freedreno@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/freedreno

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
_______________________________________________
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 5/5] drm/msm/A6xx: Add support for using system cache(llc)
       [not found] ` <1521789591-28628-1-git-send-email-smasetty-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
@ 2018-03-23  7:19   ` Sharat Masetty
       [not found]     ` <1521789591-28628-6-git-send-email-smasetty-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
  0 siblings, 1 reply; 36+ messages in thread
From: Sharat Masetty @ 2018-03-23  7:19 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA, Sharat Masetty,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

The last level system cache can be partitioned to 32
different slices of which GPU has two slices preallocated.
The "gpu" slice is used for caching GPU buffers and
the "gpuhtw" slice is used for caching the GPU SMMU
pagetables.  This patch talks to the core system cache
driver to acquire the slice handles, configure the SCID's
to those slices and activates and deactivates the slices
upon GPU power collapse and restore.

Some support from the IOMMU driver is also needed to
make use of the system cache. IOMMU_UPSTREAM_HINT is
a buffer protection flag which enables caching GPU data
buffers in the system cache with memory attributes such
as outer cacheable, read-allocate, write-allocate for buffers.
The GPU then has the ability to override a few cacheability
parameters which it does to override write-allocate to
write-no-allocate as the GPU hardware does not benefit much
from it.
Similarly DOMAIN_ATTR_USE_UPSTREAM_HINT is another domain level
attribute used by the IOMMU driver to set the right attributes
to cache the hardware pagetables into the system cache.

Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 162 +++++++++++++++++++++++++++++++++-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   9 ++
 drivers/gpu/drm/msm/msm_iommu.c       |  13 +++
 drivers/gpu/drm/msm/msm_mmu.h         |   3 +
 4 files changed, 186 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index bd50674..e4554eb 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -13,6 +13,7 @@
 
 #include <linux/qcom_scm.h>
 #include <linux/soc/qcom/mdt_loader.h>
+#include <linux/soc/qcom/llcc-qcom.h>
 
 #include "msm_gem.h"
 #include "msm_mmu.h"
@@ -913,6 +914,154 @@ static irqreturn_t a6xx_irq(struct msm_gpu *gpu)
 	~0
 };
 
+#define A6XX_LLC_NUM_GPU_SCIDS		5
+#define A6XX_GPU_LLC_SCID_NUM_BITS	5
+
+#define A6XX_GPU_LLC_SCID_MASK \
+	((1 << (A6XX_LLC_NUM_GPU_SCIDS * A6XX_GPU_LLC_SCID_NUM_BITS)) - 1)
+
+#define A6XX_GPUHTW_LLC_SCID_SHIFT	25
+#define A6XX_GPUHTW_LLC_SCID_MASK \
+	(((1 << A6XX_GPU_LLC_SCID_NUM_BITS) - 1) << A6XX_GPUHTW_LLC_SCID_SHIFT)
+
+static inline void a6xx_gpu_cx_rmw(struct a6xx_llc *llc,
+	u32 reg, u32 mask, u32 or)
+{
+	msm_rmw(llc->mmio + (reg << 2), mask, or);
+}
+
+static void a6xx_llc_deactivate(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
+	struct a6xx_llc *llc = &a6xx_gpu->llc;
+
+	llcc_slice_deactivate(llc->gpu_llc_slice);
+	llcc_slice_deactivate(llc->gpuhtw_llc_slice);
+}
+
+static void a6xx_llc_activate(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
+	struct a6xx_llc *llc = &a6xx_gpu->llc;
+
+	if (!llc->mmio)
+		return;
+
+	if (llc->gpu_llc_slice)
+		if (!llcc_slice_activate(llc->gpu_llc_slice))
+			/* Program the sub-cache ID for all GPU blocks */
+			a6xx_gpu_cx_rmw(llc,
+				REG_A6XX_GPU_CX_MISC_SYSTEM_CACHE_CNTL_1,
+				A6XX_GPU_LLC_SCID_MASK,
+				(llc->cntl1_regval &
+					A6XX_GPU_LLC_SCID_MASK));
+
+	if (llc->gpuhtw_llc_slice)
+		if (!llcc_slice_activate(llc->gpuhtw_llc_slice))
+			/* Program the sub-cache ID for GPU pagetables */
+			a6xx_gpu_cx_rmw(llc,
+				REG_A6XX_GPU_CX_MISC_SYSTEM_CACHE_CNTL_1,
+				A6XX_GPUHTW_LLC_SCID_MASK,
+				(llc->cntl1_regval &
+					A6XX_GPUHTW_LLC_SCID_MASK));
+
+	/* Program cacheability overrides */
+	a6xx_gpu_cx_rmw(llc, REG_A6XX_GPU_CX_MISC_SYSTEM_CACHE_CNTL_0, 0xF,
+		llc->cntl0_regval);
+}
+
+void a6xx_llc_slices_destroy(struct a6xx_llc *llc)
+{
+	if (llc->mmio) {
+		iounmap(llc->mmio);
+		llc->mmio = NULL;
+	}
+
+	llcc_slice_putd(llc->gpu_llc_slice);
+	llc->gpu_llc_slice = NULL;
+
+	llcc_slice_putd(llc->gpuhtw_llc_slice);
+	llc->gpuhtw_llc_slice = NULL;
+}
+
+static int a6xx_llc_slices_init(struct platform_device *pdev,
+		struct a6xx_llc *llc)
+{
+	int i;
+
+	/* Get the system cache slice descriptor for GPU and GPUHTWs */
+	llc->gpu_llc_slice = llcc_slice_getd(&pdev->dev, "gpu");
+	if (IS_ERR(llc->gpu_llc_slice))
+		llc->gpu_llc_slice = NULL;
+
+	llc->gpuhtw_llc_slice = llcc_slice_getd(&pdev->dev, "gpuhtw");
+	if (IS_ERR(llc->gpuhtw_llc_slice))
+		llc->gpuhtw_llc_slice = NULL;
+
+	if (llc->gpu_llc_slice == NULL && llc->gpuhtw_llc_slice == NULL)
+		return -1;
+
+	/* Map registers */
+	llc->mmio = msm_ioremap(pdev, "cx_mem", "gpu_cx");
+	if (IS_ERR(llc->mmio)) {
+		llc->mmio = NULL;
+		a6xx_llc_slices_destroy(llc);
+		return -1;
+	}
+
+	/*
+	 * Setup GPU system cache CNTL0 and CNTL1 register values.
+	 * These values will be programmed everytime GPU comes out
+	 * of power collapse as these are non-retention registers.
+	 */
+
+	/*
+	 * CNTL0 provides options to override the settings for the
+	 * read and write allocation policies for the LLC. These
+	 * overrides are global for all memory transactions from
+	 * the GPU.
+	 *
+	 * 0x3: read-no-alloc-overridden = 0
+	 *      read-no-alloc = 0 - Allocate lines on read miss
+	 *      write-no-alloc-overridden = 1
+	 *      write-no-alloc = 1 - Do not allocates lines on write miss
+	 */
+	llc->cntl0_regval = 0x03;
+
+	/*
+	 * CNTL1 is used to specify SCID for (CP, TP, VFD, CCU and UBWC
+	 * FLAG cache) GPU blocks. This value will be passed along with
+	 * the address for any memory transaction from GPU to identify
+	 * the sub-cache for that transaction.
+	 *
+	 * Currently there is only one SCID allocated for all GPU blocks
+	 * Hence set same SCID for all the blocks.
+	 */
+
+	if (llc->gpu_llc_slice) {
+		u32 gpu_scid = llcc_get_slice_id(llc->gpu_llc_slice);
+
+		for (i = 0; i < A6XX_LLC_NUM_GPU_SCIDS; i++)
+			llc->cntl1_regval |=
+				gpu_scid << (A6XX_GPU_LLC_SCID_NUM_BITS * i);
+	}
+
+	/*
+	 * Set SCID for GPU IOMMU. This will be used to access
+	 * page tables that are cached in LLC.
+	 */
+	if (llc->gpuhtw_llc_slice) {
+		u32 gpuhtw_scid = llcc_get_slice_id(llc->gpuhtw_llc_slice);
+
+		llc->cntl1_regval |=
+			gpuhtw_scid << A6XX_GPUHTW_LLC_SCID_SHIFT;
+	}
+
+	return 0;
+}
+
 static int a6xx_pm_resume(struct msm_gpu *gpu)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
@@ -923,6 +1072,9 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
 
 	gpu->needs_hw_init = true;
 
+	/* Activate LLC slices */
+	a6xx_llc_activate(gpu);
+
 	return ret;
 }
 
@@ -931,6 +1083,9 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
 
+	/* Deactivate LLC slices */
+	a6xx_llc_deactivate(gpu);
+
 	/*
 	 * Make sure the GMU is idle before continuing (because some transitions
 	 * may use VBIF
@@ -993,6 +1148,8 @@ static void a6xx_destroy(struct msm_gpu *gpu)
 		drm_gem_object_unreference_unlocked(a6xx_gpu->sqe_bo);
 	}
 
+	a6xx_llc_slices_destroy(&a6xx_gpu->llc);
+
 	a6xx_gmu_remove(a6xx_gpu);
 
 	adreno_gpu_cleanup(adreno_gpu);
@@ -1040,7 +1197,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
 	adreno_gpu->registers = a6xx_registers;
 	adreno_gpu->reg_offsets = a6xx_register_offsets;
 
-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 4, 0);
+	ret = a6xx_llc_slices_init(pdev, &a6xx_gpu->llc);
+
+	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 4,
+			ret ? 0 : MMU_FEATURE_USE_SYSTEM_CACHE);
 	if (ret) {
 		a6xx_destroy(&(a6xx_gpu->base.base));
 		return ERR_PTR(ret);
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
index 21ab701..392c426 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
@@ -21,6 +21,14 @@
 
 extern bool hang_debug;
 
+struct a6xx_llc {
+	void __iomem *mmio;
+	void *gpu_llc_slice;
+	void *gpuhtw_llc_slice;
+	u32 cntl0_regval;
+	u32 cntl1_regval;
+};
+
 struct a6xx_gpu {
 	struct adreno_gpu base;
 
@@ -46,6 +54,7 @@ struct a6xx_gpu {
 	uint64_t scratch_iova;
 
 	struct a6xx_gmu gmu;
+	struct a6xx_llc llc;
 };
 
 #define to_a6xx_gpu(x) container_of(x, struct a6xx_gpu, base)
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index 1ab629b..6c03eda 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -39,6 +39,16 @@ static int msm_iommu_attach(struct msm_mmu *mmu, const char * const *names,
 {
 	struct msm_iommu *iommu = to_msm_iommu(mmu);
 	int ret;
+	int gpu_htw_llc = 1;
+
+	/*
+	 * This allows GPU to set the bus attributes required
+	 * to use system cache on behalf of the iommu page table
+	 * walker.
+	 */
+	if (msm_mmu_has_feature(mmu, MMU_FEATURE_USE_SYSTEM_CACHE))
+		iommu_domain_set_attr(iommu->domain,
+				DOMAIN_ATTR_USE_UPSTREAM_HINT, &gpu_htw_llc);
 
 	pm_runtime_get_suppliers(mmu->dev);
 	ret = iommu_attach_device(iommu->domain, mmu->dev);
@@ -63,6 +73,9 @@ static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova,
 	struct msm_iommu *iommu = to_msm_iommu(mmu);
 	size_t ret;
 
+	if (msm_mmu_has_feature(mmu, MMU_FEATURE_USE_SYSTEM_CACHE))
+		prot |= IOMMU_USE_UPSTREAM_HINT;
+
 	pm_runtime_get_suppliers(mmu->dev);
 	ret = iommu_map_sg(iommu->domain, iova, sgt->sgl, sgt->nents, prot);
 	pm_runtime_put_suppliers(mmu->dev);
diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
index 85df78d..257bdea 100644
--- a/drivers/gpu/drm/msm/msm_mmu.h
+++ b/drivers/gpu/drm/msm/msm_mmu.h
@@ -30,6 +30,9 @@ struct msm_mmu_funcs {
 	void (*destroy)(struct msm_mmu *mmu);
 };
 
+/* MMU features */
+#define MMU_FEATURE_USE_SYSTEM_CACHE (1 << 0)
+
 struct msm_mmu {
 	const struct msm_mmu_funcs *funcs;
 	struct device *dev;
-- 
1.9.1

_______________________________________________
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno

^ permalink raw reply related	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2019-12-20 19:58 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-19 13:14 [PATCH 0/5] drm/msm/a6xx: System Cache Support Sharat Masetty
2019-12-19 13:14 ` Sharat Masetty
2019-12-19 13:14 ` Sharat Masetty
2019-12-19 13:14 ` [PATCH 1/5] iommu/arm-smmu: Pass io_pgtable_cfg to impl specific init_context Sharat Masetty
2019-12-19 13:14   ` Sharat Masetty
2019-12-19 13:14   ` Sharat Masetty
2019-12-19 13:14 ` [PATCH 2/5] iommu/arm-smmu: Add domain attribute for QCOM system cache Sharat Masetty
2019-12-19 13:14   ` Sharat Masetty
2019-12-19 13:14   ` Sharat Masetty
2019-12-19 13:14 ` [PATCH 3/5] drm/msm: rearrange the gpu_rmw() function Sharat Masetty
2019-12-19 13:14   ` Sharat Masetty
2019-12-19 13:14   ` Sharat Masetty
2019-12-19 13:14 ` [PATCH 4/5] drm/msm: Pass mmu features to generic layers Sharat Masetty
2019-12-19 13:14   ` Sharat Masetty
2019-12-19 13:14   ` Sharat Masetty
2019-12-19 19:23   ` Jordan Crouse
2019-12-19 19:23     ` Jordan Crouse
2019-12-19 19:23     ` Jordan Crouse
2019-12-19 13:14 ` [PATCH 5/5] drm/msm/a6xx: Add support for using system cache(LLC) Sharat Masetty
2019-12-19 13:14   ` Sharat Masetty
2019-12-19 13:14   ` Sharat Masetty
2019-12-19 19:58   ` Jordan Crouse
2019-12-19 19:58     ` Jordan Crouse
2019-12-19 19:58     ` Jordan Crouse
2019-12-19 20:10     ` Jordan Crouse
2019-12-19 20:10       ` Jordan Crouse
2019-12-19 20:10       ` Jordan Crouse
2019-12-20 10:10     ` smasetty
2019-12-20 10:10       ` smasetty
2019-12-20 10:10       ` smasetty
2019-12-20 19:57       ` [Freedreno] " Jordan Crouse
2019-12-20 19:57         ` Jordan Crouse
2019-12-20 19:57         ` Jordan Crouse
  -- strict thread matches above, loose matches on Subject: below --
2018-03-23  7:19 [PATCH 0/5] Adreno A6xx system cache support Sharat Masetty
     [not found] ` <1521789591-28628-1-git-send-email-smasetty-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2018-03-23  7:19   ` [PATCH 5/5] drm/msm/A6xx: Add support for using system cache(llc) Sharat Masetty
     [not found]     ` <1521789591-28628-6-git-send-email-smasetty-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2018-04-03 21:24       ` Jordan Crouse
2018-04-05  7:04       ` Vivek Gautam

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.