* [v2 0/7] drm/msm/a6xx: System Cache Support
@ 2018-10-05 13:08 Sharat Masetty
  2018-10-05 13:08 ` [v2 2/7] iommu/arm-smmu: Add support to use Last level cache Sharat Masetty
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Sharat Masetty @ 2018-10-05 13:08 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	jcrouse-sgV2jX0FEOL9JmXXK+q4OQ, Sharat Masetty,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Some hardware variants contain a system-level cache, also known as the last
level cache (LLC). This cache is typically a large block which is shared by
multiple clients on the SoC. The GPU uses the system cache to cache both the
GPU data buffers (like textures) as well as the SMMU pagetables. This improves
render performance and lowers power consumption by reducing bus traffic to
the system memory.

The system cache architecture allows the cache to be split into slices which
can then be used by multiple SoC clients. This patch series is an effort to
enable and use the two slices preallocated for the GPU: one for the GPU data
buffers and another for the GPU SMMU hardware pagetables.
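
At a high level, the flow for each slice looks roughly like this (a
minimal sketch using the core llcc driver's slice API; error handling
and the actual SCID register programming are elided):

	struct llcc_slice_desc *gpu_slice;

	/* at probe: get a handle to the preallocated GPU slice */
	gpu_slice = llcc_slice_getd(LLCC_GPU);

	/* on GPU resume: activate the slice and program its SCID,
	 * llcc_get_slice_id(gpu_slice), into the GPU CX registers */
	llcc_slice_activate(gpu_slice);

	/* on GPU power collapse: deactivate the slice */
	llcc_slice_deactivate(gpu_slice);

	/* at teardown: release the slice handle */
	llcc_slice_putd(gpu_slice);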

v2: Addressed code review comments from the previous round. The first version
was posted a few months ago, so this is a refresh of the previous series.
Updated the code to conform to the newer version of the core llcc driver, plus
minor tweaks and adjustments here and there.
Testing: Nothing breaks, but DDR traffic still needs to be profiled to measure
the impact the cache slices are actually making.

Please review...

Jordan Crouse (1):
  soc: qcom: llcc-slice: Add error checks for API functions

Sharat Masetty (5):
  drm/msm: rearrange the gpu_rmw() function
  drm/msm/adreno: Add registers in the GPU CX domain
  arm64:dts:sdm845: Add register range for gpu CX
  drm/msm: Pass mmu features to generic layers
  drm/msm/a6xx: Add support for using system cache(LLC)

Vivek Gautam (1):
  iommu/arm-smmu: Add support to use Last level cache

 arch/arm64/boot/dts/qcom/sdm845.dtsi    |   4 +-
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c   |   2 +-
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c   |   2 +-
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c   |   2 +-
 drivers/gpu/drm/msm/adreno/a6xx.xml.h   |   3 +
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 159 +++++++++++++++++++++++++++++++-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h   |   9 ++
 drivers/gpu/drm/msm/adreno/adreno_gpu.c |   4 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.h |   2 +-
 drivers/gpu/drm/msm/msm_drv.c           |   8 ++
 drivers/gpu/drm/msm/msm_drv.h           |   1 +
 drivers/gpu/drm/msm/msm_gpu.c           |   6 +-
 drivers/gpu/drm/msm/msm_gpu.h           |   6 +-
 drivers/gpu/drm/msm/msm_iommu.c         |  13 +++
 drivers/gpu/drm/msm/msm_mmu.h           |  14 +++
 drivers/iommu/arm-smmu.c                |  14 +++
 drivers/iommu/io-pgtable-arm.c          |  24 ++++-
 drivers/iommu/io-pgtable.h              |   4 +
 drivers/soc/qcom/llcc-slice.c           |  15 ++-
 include/linux/iommu.h                   |   4 +
 20 files changed, 276 insertions(+), 20 deletions(-)

--
1.9.1


* [v2 1/7] soc: qcom: llcc-slice: Add error checks for API functions
       [not found] ` <1538744915-25490-1-git-send-email-smasetty-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
@ 2018-10-05 13:08   ` Sharat Masetty
       [not found]     ` <1538744915-25490-2-git-send-email-smasetty-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
  2018-10-05 13:08   ` [v2 5/7] arm64:dts:sdm845: Add register range for gpu CX Sharat Masetty
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Sharat Masetty @ 2018-10-05 13:08 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	jcrouse-sgV2jX0FEOL9JmXXK+q4OQ,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

From: Jordan Crouse <jcrouse@codeaurora.org>

llcc_slice_getd() can return an ERR_PTR() encoded pointer on failure. Add
an IS_ERR_OR_NULL() check to the subsequent API calls that take a struct
llcc_slice_desc, to guard against faults and to let the leaf drivers safely
pass around an ERR_PTR() encoded "pointer" in the aftermath of an
llcc_slice_getd() error.
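
With these checks in place, a leaf driver needs no separate error path
for a failed llcc_slice_getd() (a hypothetical usage sketch):

	struct llcc_slice_desc *desc = llcc_slice_getd(LLCC_GPU);

	/* llcc_slice_activate() returns -EINVAL for an ERR_PTR()/NULL desc */
	if (llcc_slice_activate(desc))
		dev_warn(dev, "running without the system cache\n");

	...

	llcc_slice_putd(desc);	/* safe no-op for an ERR_PTR()/NULL desc */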

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Reviewed-by: Vivek Gautam <vivek.gautam@codeaurora.org>
---
 drivers/soc/qcom/llcc-slice.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/soc/qcom/llcc-slice.c b/drivers/soc/qcom/llcc-slice.c
index d789267..6a03e4e 100644
--- a/drivers/soc/qcom/llcc-slice.c
+++ b/drivers/soc/qcom/llcc-slice.c
@@ -94,7 +94,8 @@ struct llcc_slice_desc *llcc_slice_getd(u32 uid)
  */
 void llcc_slice_putd(struct llcc_slice_desc *desc)
 {
-	kfree(desc);
+	if (!IS_ERR_OR_NULL(desc))
+		kfree(desc);
 }
 EXPORT_SYMBOL_GPL(llcc_slice_putd);
 
@@ -141,6 +142,9 @@ int llcc_slice_activate(struct llcc_slice_desc *desc)
 	int ret;
 	u32 act_ctrl_val;
 
+	if (IS_ERR_OR_NULL(desc))
+		return -EINVAL;
+
 	mutex_lock(&drv_data->lock);
 	if (test_bit(desc->slice_id, drv_data->bitmap)) {
 		mutex_unlock(&drv_data->lock);
@@ -175,6 +179,9 @@ int llcc_slice_deactivate(struct llcc_slice_desc *desc)
 	u32 act_ctrl_val;
 	int ret;
 
+	if (IS_ERR_OR_NULL(desc))
+		return -EINVAL;
+
 	mutex_lock(&drv_data->lock);
 	if (!test_bit(desc->slice_id, drv_data->bitmap)) {
 		mutex_unlock(&drv_data->lock);
@@ -202,6 +209,9 @@ int llcc_slice_deactivate(struct llcc_slice_desc *desc)
  */
 int llcc_get_slice_id(struct llcc_slice_desc *desc)
 {
+	if (IS_ERR_OR_NULL(desc))
+		return -EINVAL;
+
 	return desc->slice_id;
 }
 EXPORT_SYMBOL_GPL(llcc_get_slice_id);
@@ -212,6 +222,9 @@ int llcc_get_slice_id(struct llcc_slice_desc *desc)
  */
 size_t llcc_get_slice_size(struct llcc_slice_desc *desc)
 {
+	if (IS_ERR_OR_NULL(desc))
+		return 0;
+
 	return desc->slice_size;
 }
 EXPORT_SYMBOL_GPL(llcc_get_slice_size);
-- 
1.9.1


* [v2 2/7] iommu/arm-smmu: Add support to use Last level cache
  2018-10-05 13:08 [v2 0/7] drm/msm/a6xx: System Cache Support Sharat Masetty
@ 2018-10-05 13:08 ` Sharat Masetty
  2018-10-05 13:08 ` [v2 3/7] drm/msm: rearrange the gpu_rmw() function Sharat Masetty
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Sharat Masetty @ 2018-10-05 13:08 UTC (permalink / raw)
  To: freedreno; +Cc: linux-arm-msm, Vivek Gautam, dri-devel

From: Vivek Gautam <vivek.gautam@codeaurora.org>

Qualcomm SoCs have an additional level of cache called the
system cache or last level cache (LLC) [1]. This cache sits
right before the DDR, and is tightly coupled with the memory
controller.
The cache is available to all the clients present in the
SoC. The clients request their slices from this system
cache, activate them, and can then start using them. For
clients behind an SMMU to start using the system cache for
DMA buffers and the related page tables [2], a few of the
memory attributes need to be set accordingly.
This change makes the related memory Outer-Shareable, and
updates the MAIR with the necessary attributes.

The MAIR attribute requirements are:
    Inner Cacheability = 0
    Outer Cacheability = 1, Write-Back Write-Allocate
    Outer Shareability = 1
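
Assuming the standard ARMv8 MAIR encoding, these requirements collapse
into the single attribute byte added below:

    ARM_LPAE_MAIR_ATTR_SYS_CACHE = 0xf4
        bits [7:4] = 0xf : Outer Write-Back, Read/Write-Allocate
        bits [3:0] = 0x4 : Inner Non-cacheable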

This change is a realisation of the following changes
from downstream msm-4.9:
iommu: io-pgtable-arm: Support DOMAIN_ATTRIBUTE_USE_UPSTREAM_HINT
iommu: io-pgtable-arm: Implement IOMMU_USE_UPSTREAM_HINT

[1] https://patchwork.kernel.org/patch/10422531/
[2] https://patchwork.kernel.org/patch/10302791/

Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
---
 drivers/iommu/arm-smmu.c       | 14 ++++++++++++++
 drivers/iommu/io-pgtable-arm.c | 24 +++++++++++++++++++-----
 drivers/iommu/io-pgtable.h     |  4 ++++
 include/linux/iommu.h          |  4 ++++
 4 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index c057396..6f13744 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -253,6 +253,7 @@ struct arm_smmu_domain {
 	struct mutex			init_mutex; /* Protects smmu pointer */
 	spinlock_t			cb_lock; /* Serialises ATS1* ops and TLB syncs */
 	struct iommu_domain		domain;
+	bool				has_sys_cache;
 };
 
 struct arm_smmu_option_prop {
@@ -880,6 +881,8 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 
 	if (smmu->features & ARM_SMMU_FEAT_COHERENT_WALK)
 		pgtbl_cfg.quirks = IO_PGTABLE_QUIRK_NO_DMA;
+	if (smmu_domain->has_sys_cache)
+		pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_SYS_CACHE;
 
 	smmu_domain->smmu = smmu;
 	pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
@@ -1539,6 +1542,9 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
 	case DOMAIN_ATTR_NESTING:
 		*(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
 		return 0;
+	case DOMAIN_ATTR_USE_SYS_CACHE:
+		*((int *)data) = smmu_domain->has_sys_cache;
+		return 0;
 	default:
 		return -ENODEV;
 	}
@@ -1568,6 +1574,14 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
 			smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
 
 		break;
+	case DOMAIN_ATTR_USE_SYS_CACHE:
+		if (smmu_domain->smmu) {
+			ret = -EPERM;
+			goto out_unlock;
+		}
+		if (*((int *)data))
+			smmu_domain->has_sys_cache = true;
+		break;
 	default:
 		ret = -ENODEV;
 	}
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 010a254..b2aee18 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -169,9 +169,11 @@
 #define ARM_LPAE_MAIR_ATTR_DEVICE	0x04
 #define ARM_LPAE_MAIR_ATTR_NC		0x44
 #define ARM_LPAE_MAIR_ATTR_WBRWA	0xff
+#define ARM_LPAE_MAIR_ATTR_SYS_CACHE	0xf4
 #define ARM_LPAE_MAIR_ATTR_IDX_NC	0
 #define ARM_LPAE_MAIR_ATTR_IDX_CACHE	1
 #define ARM_LPAE_MAIR_ATTR_IDX_DEV	2
+#define ARM_LPAE_MAIR_ATTR_IDX_SYS_CACHE	3
 
 /* IOPTE accessors */
 #define iopte_deref(pte,d) __va(iopte_to_paddr(pte, d))
@@ -442,6 +444,10 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
 		else if (prot & IOMMU_CACHE)
 			pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
 				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
+		else if (prot & IOMMU_SYS_CACHE)
+			pte |= (ARM_LPAE_MAIR_ATTR_IDX_SYS_CACHE
+				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
+
 	} else {
 		pte = ARM_LPAE_PTE_HAP_FAULT;
 		if (prot & IOMMU_READ)
@@ -771,7 +777,8 @@ static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)
 	u64 reg;
 	struct arm_lpae_io_pgtable *data;
 
-	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | IO_PGTABLE_QUIRK_NO_DMA))
+	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | IO_PGTABLE_QUIRK_NO_DMA |
+			    IO_PGTABLE_QUIRK_SYS_CACHE))
 		return NULL;
 
 	data = arm_lpae_alloc_pgtable(cfg);
@@ -779,9 +786,14 @@ static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)
 		return NULL;
 
 	/* TCR */
-	reg = (ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT) |
-	      (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT) |
-	      (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT);
+	if (cfg->quirks & IO_PGTABLE_QUIRK_SYS_CACHE) {
+		reg = (ARM_LPAE_TCR_SH_OS << ARM_LPAE_TCR_SH0_SHIFT) |
+		      (ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_IRGN0_SHIFT);
+	} else {
+		reg = (ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT) |
+		      (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT);
+	}
+	reg |= (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT);
 
 	switch (ARM_LPAE_GRANULE(data)) {
 	case SZ_4K:
@@ -833,7 +845,9 @@ static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)
 	      (ARM_LPAE_MAIR_ATTR_WBRWA
 	       << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_CACHE)) |
 	      (ARM_LPAE_MAIR_ATTR_DEVICE
-	       << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_DEV));
+	       << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_DEV)) |
+	      (ARM_LPAE_MAIR_ATTR_SYS_CACHE
+	       << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_SYS_CACHE));
 
 	cfg->arm_lpae_s1_cfg.mair[0] = reg;
 	cfg->arm_lpae_s1_cfg.mair[1] = 0;
diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h
index 2df7909..b5a3983 100644
--- a/drivers/iommu/io-pgtable.h
+++ b/drivers/iommu/io-pgtable.h
@@ -71,12 +71,16 @@ struct io_pgtable_cfg {
 	 *	be accessed by a fully cache-coherent IOMMU or CPU (e.g. for a
 	 *	software-emulated IOMMU), such that pagetable updates need not
 	 *	be treated as explicit DMA data.
+	 *
+	 * IO_PGTABLE_QUIRK_SYS_CACHE: Override the attributes set in TCR for
+	 *	the page table walker when using system cache.
 	 */
 	#define IO_PGTABLE_QUIRK_ARM_NS		BIT(0)
 	#define IO_PGTABLE_QUIRK_NO_PERMS	BIT(1)
 	#define IO_PGTABLE_QUIRK_TLBI_ON_MAP	BIT(2)
 	#define IO_PGTABLE_QUIRK_ARM_MTK_4GB	BIT(3)
 	#define IO_PGTABLE_QUIRK_NO_DMA		BIT(4)
+	#define IO_PGTABLE_QUIRK_SYS_CACHE	BIT(5)
 	unsigned long			quirks;
 	unsigned long			pgsize_bitmap;
 	unsigned int			ias;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 19938ee..dacb9648 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -41,6 +41,9 @@
  * if the IOMMU page table format is equivalent.
  */
 #define IOMMU_PRIV	(1 << 5)
+/* Use last level cache available with few architectures */
+#define IOMMU_SYS_CACHE	(1 << 6)
+
 
 struct iommu_ops;
 struct iommu_group;
@@ -124,6 +127,7 @@ enum iommu_attr {
 	DOMAIN_ATTR_FSL_PAMU_ENABLE,
 	DOMAIN_ATTR_FSL_PAMUV1,
 	DOMAIN_ATTR_NESTING,	/* two stages of translation */
+	DOMAIN_ATTR_USE_SYS_CACHE,
 	DOMAIN_ATTR_MAX,
 };
 
-- 
1.9.1


* [v2 3/7] drm/msm: rearrange the gpu_rmw() function
  2018-10-05 13:08 [v2 0/7] drm/msm/a6xx: System Cache Support Sharat Masetty
  2018-10-05 13:08 ` [v2 2/7] iommu/arm-smmu: Add support to use Last level cache Sharat Masetty
@ 2018-10-05 13:08 ` Sharat Masetty
  2018-10-05 13:08 ` [v2 4/7] drm/msm/adreno: Add registers in the GPU CX domain Sharat Masetty
       [not found] ` <1538744915-25490-1-git-send-email-smasetty-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
  3 siblings, 0 replies; 14+ messages in thread
From: Sharat Masetty @ 2018-10-05 13:08 UTC (permalink / raw)
  To: freedreno; +Cc: linux-arm-msm, Sharat Masetty, dri-devel

The register read-modify-write construct is generic enough
that it can be used by other subsystems as needed. Create
a more generic msm_rmw() function and have gpu_rmw() use
this new function.
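
This lets callers outside of msm_gpu.h update a single register field
in place, for example (FIELD_MASK is a placeholder for illustration):

	/* clear the bits in FIELD_MASK, then OR in the new value */
	msm_rmw(mmio + (reg << 2), FIELD_MASK, newval & FIELD_MASK);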

Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/msm_drv.c | 8 ++++++++
 drivers/gpu/drm/msm/msm_drv.h | 1 +
 drivers/gpu/drm/msm/msm_gpu.h | 5 +----
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 85e46a3..d85e169 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -210,6 +210,14 @@ u32 msm_readl(const void __iomem *addr)
 	return val;
 }
 
+void msm_rmw(void __iomem *addr, u32 mask, u32 or)
+{
+	u32 val = msm_readl(addr);
+
+	val &= ~mask;
+	msm_writel(val | or, addr);
+}
+
 struct vblank_event {
 	struct list_head node;
 	int crtc_id;
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index 6f5c343..32c843c 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -444,6 +444,7 @@ void __iomem *msm_ioremap(struct platform_device *pdev, const char *name,
 		const char *dbgname);
 void msm_writel(u32 data, void __iomem *addr);
 u32 msm_readl(const void __iomem *addr);
+void msm_rmw(void __iomem *addr, u32 mask, u32 or);
 
 struct msm_gpu_submitqueue;
 int msm_submitqueue_init(struct drm_device *drm, struct msm_file_private *ctx);
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index 9df48e3..63ca28b 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -230,10 +230,7 @@ static inline u32 gpu_read(struct msm_gpu *gpu, u32 reg)
 
 static inline void gpu_rmw(struct msm_gpu *gpu, u32 reg, u32 mask, u32 or)
 {
-	uint32_t val = gpu_read(gpu, reg);
-
-	val &= ~mask;
-	gpu_write(gpu, reg, val | or);
+	msm_rmw(gpu->mmio + (reg << 2), mask, or);
 }
 
 static inline u64 gpu_read64(struct msm_gpu *gpu, u32 lo, u32 hi)
-- 
1.9.1


* [v2 4/7] drm/msm/adreno: Add registers in the GPU CX domain
  2018-10-05 13:08 [v2 0/7] drm/msm/a6xx: System Cache Support Sharat Masetty
  2018-10-05 13:08 ` [v2 2/7] iommu/arm-smmu: Add support to use Last level cache Sharat Masetty
  2018-10-05 13:08 ` [v2 3/7] drm/msm: rearrange the gpu_rmw() function Sharat Masetty
@ 2018-10-05 13:08 ` Sharat Masetty
       [not found]   ` <1538744915-25490-5-git-send-email-smasetty-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
       [not found] ` <1538744915-25490-1-git-send-email-smasetty-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
  3 siblings, 1 reply; 14+ messages in thread
From: Sharat Masetty @ 2018-10-05 13:08 UTC (permalink / raw)
  To: freedreno; +Cc: linux-arm-msm, Sharat Masetty, dri-devel

Add the registers needed for configuring the system cache slice info and
other parameters in the GPU.
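
Note that these are dword offsets into the separate GPU CX_MISC region
(the "cx_mem" range added later in this series); for example,
SYSTEM_CACHE_CNTL_1 at dword 0x2 sits at byte offset 0x8 from the
region base, since the accessors shift the offset left by 2.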

Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/a6xx.xml.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx.xml.h b/drivers/gpu/drm/msm/adreno/a6xx.xml.h
index 2206765..2645b8f 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx.xml.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx.xml.h
@@ -1780,5 +1780,8 @@ static inline uint32_t A6XX_CX_DBGC_CFG_DBGBUS_BYTEL_1_BYTEL15(uint32_t val)
 
 #define REG_A6XX_PDC_GPU_SEQ_MEM_0				0x00000000
 
+#define REG_A6XX_GPU_CX_MISC_SYSTEM_CACHE_CNTL_0		0x00000001
+
+#define REG_A6XX_GPU_CX_MISC_SYSTEM_CACHE_CNTL_1		0x00000002
 
 #endif /* A6XX_XML */
-- 
1.9.1


* [v2 5/7] arm64:dts:sdm845: Add register range for gpu CX
       [not found] ` <1538744915-25490-1-git-send-email-smasetty-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
  2018-10-05 13:08   ` [v2 1/7] soc: qcom: llcc-slice: Add error checks for API functions Sharat Masetty
@ 2018-10-05 13:08   ` Sharat Masetty
  2018-10-05 13:08   ` [v2 6/7] drm/msm: Pass mmu features to generic layers Sharat Masetty
  2018-10-05 13:08   ` [v2 7/7] drm/msm/a6xx: Add support for using system cache(LLC) Sharat Masetty
  3 siblings, 0 replies; 14+ messages in thread
From: Sharat Masetty @ 2018-10-05 13:08 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	jcrouse-sgV2jX0FEOL9JmXXK+q4OQ, Sharat Masetty,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

This patch adds a register range in the GPU CX domain. This is needed to
support the last level system cache (LLC).
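
The new 0x10-byte range at 0x509e000 is the CX_MISC window which the
LLC code later in this series maps by name:

	llc->mmio = msm_ioremap(pdev, "cx_mem", "gpu_cx");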

Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
---
 arch/arm64/boot/dts/qcom/sdm845.dtsi | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
index 720b734..e106f26 100644
--- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
@@ -2780,8 +2780,8 @@
 			compatible = "qcom,adreno-630.2", "qcom,adreno";
 			#stream-id-cells = <16>;
 
-			reg = <0x5000000 0x40000>;
-			reg-names = "kgsl_3d0_reg_memory";
+			reg = <0x5000000 0x40000>, <0x509e000 0x10>;
+			reg-names = "kgsl_3d0_reg_memory", "cx_mem";
 
 			/*
 			 * Look ma, no clocks! The GPU clocks and power are
-- 
1.9.1


* [v2 6/7] drm/msm: Pass mmu features to generic layers
       [not found] ` <1538744915-25490-1-git-send-email-smasetty-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
  2018-10-05 13:08   ` [v2 1/7] soc: qcom: llcc-slice: Add error checks for API functions Sharat Masetty
  2018-10-05 13:08   ` [v2 5/7] arm64:dts:sdm845: Add register range for gpu CX Sharat Masetty
@ 2018-10-05 13:08   ` Sharat Masetty
  2018-10-05 13:08   ` [v2 7/7] drm/msm/a6xx: Add support for using system cache(LLC) Sharat Masetty
  3 siblings, 0 replies; 14+ messages in thread
From: Sharat Masetty @ 2018-10-05 13:08 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	jcrouse-sgV2jX0FEOL9JmXXK+q4OQ, Sharat Masetty,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Allow different Adreno targets to pass specific MMU
features to the generic layers. This helps conditionally
configure certain IOMMU features for certain Adreno
targets.

Also add a few simple support functions to manage a bitmask of
features that a specific MMU implementation supports.
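
A target advertises a feature at init time and the generic layers test
for it, e.g. with the feature bit added in the last patch of this
series:

	/* in the target's gpu_init() */
	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1,
			MMU_FEATURE_USE_SYSTEM_CACHE);

	/* later, in a generic layer */
	if (msm_mmu_has_feature(mmu, MMU_FEATURE_USE_SYSTEM_CACHE))
		prot |= IOMMU_SYS_CACHE;	/* as done in msm_iommu_map() */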

Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c   |  2 +-
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c   |  2 +-
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c   |  2 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c   |  2 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.c |  4 +++-
 drivers/gpu/drm/msm/adreno/adreno_gpu.h |  2 +-
 drivers/gpu/drm/msm/msm_gpu.c           |  6 ++++--
 drivers/gpu/drm/msm/msm_gpu.h           |  1 +
 drivers/gpu/drm/msm/msm_mmu.h           | 11 +++++++++++
 9 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
index 669c2d4..c8bb879 100644
--- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
@@ -501,7 +501,7 @@ struct msm_gpu *a3xx_gpu_init(struct drm_device *dev)
 	adreno_gpu->registers = a3xx_registers;
 	adreno_gpu->reg_offsets = a3xx_register_offsets;
 
-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
+	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
 	if (ret)
 		goto fail;
 
diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
index 7c4e6dc..a4240e9 100644
--- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
@@ -581,7 +581,7 @@ struct msm_gpu *a4xx_gpu_init(struct drm_device *dev)
 	adreno_gpu->registers = a4xx_registers;
 	adreno_gpu->reg_offsets = a4xx_register_offsets;
 
-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
+	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
 	if (ret)
 		goto fail;
 
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index b540680..0c7ccc0 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -1521,7 +1521,7 @@ struct msm_gpu *a5xx_gpu_init(struct drm_device *dev)
 
 	check_speed_bin(&pdev->dev);
 
-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 4);
+	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 4, 0);
 	if (ret) {
 		a5xx_destroy(&(a5xx_gpu->base.base));
 		return ERR_PTR(ret);
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 5004626..177dbfc 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -819,7 +819,7 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
 	adreno_gpu->registers = a6xx_registers;
 	adreno_gpu->reg_offsets = a6xx_register_offsets;
 
-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
+	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
 	if (ret) {
 		a6xx_destroy(&(a6xx_gpu->base.base));
 		return ERR_PTR(ret);
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 47e093f..9b58583 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -693,7 +693,8 @@ static int adreno_get_pwrlevels(struct device *dev,
 
 int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 		struct adreno_gpu *adreno_gpu,
-		const struct adreno_gpu_funcs *funcs, int nr_rings)
+		const struct adreno_gpu_funcs *funcs, int nr_rings,
+		u32 mmu_features)
 {
 	struct adreno_platform_config *config = pdev->dev.platform_data;
 	struct msm_gpu_config adreno_gpu_config  = { 0 };
@@ -712,6 +713,7 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 	adreno_gpu_config.va_end = 0xffffffff;
 
 	adreno_gpu_config.nr_rings = nr_rings;
+	adreno_gpu_config.mmu_features = mmu_features;
 
 	adreno_get_pwrlevels(&pdev->dev, gpu);
 
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
index de6e6ee..871b951 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
@@ -228,7 +228,7 @@ void adreno_show(struct msm_gpu *gpu, struct msm_gpu_state *state,
 
 int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 		struct adreno_gpu *gpu, const struct adreno_gpu_funcs *funcs,
-		int nr_rings);
+		int nr_rings, u32 mmu_features);
 void adreno_gpu_cleanup(struct adreno_gpu *gpu);
 int adreno_load_fw(struct adreno_gpu *adreno_gpu);
 
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 19b4afe..d435988 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -798,7 +798,7 @@ static int get_clocks(struct platform_device *pdev, struct msm_gpu *gpu)
 
 static struct msm_gem_address_space *
 msm_gpu_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev,
-		uint64_t va_start, uint64_t va_end)
+		uint64_t va_start, uint64_t va_end, u32 mmu_features)
 {
 	struct iommu_domain *iommu;
 	struct msm_gem_address_space *aspace;
@@ -826,6 +826,8 @@ static int get_clocks(struct platform_device *pdev, struct msm_gpu *gpu)
 		return ERR_CAST(aspace);
 	}
 
+	msm_mmu_set_feature(aspace->mmu, mmu_features);
+
 	ret = aspace->mmu->funcs->attach(aspace->mmu, NULL, 0);
 	if (ret) {
 		msm_gem_address_space_put(aspace);
@@ -909,7 +911,7 @@ int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 	msm_devfreq_init(gpu);
 
 	gpu->aspace = msm_gpu_create_address_space(gpu, pdev,
-		config->va_start, config->va_end);
+		config->va_start, config->va_end, config->mmu_features);
 
 	if (gpu->aspace == NULL)
 		dev_info(drm->dev, "%s: no IOMMU, fallback to VRAM carveout!\n", name);
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index 63ca28b..3345ca3 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -36,6 +36,7 @@ struct msm_gpu_config {
 	uint64_t va_start;
 	uint64_t va_end;
 	unsigned int nr_rings;
+	u32 mmu_features;
 };
 
 /* So far, with hardware that I've seen to date, we can have:
diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
index 9c1b5aa..9b9f43f 100644
--- a/drivers/gpu/drm/msm/msm_mmu.h
+++ b/drivers/gpu/drm/msm/msm_mmu.h
@@ -54,6 +54,7 @@ struct msm_mmu {
 	struct device *dev;
 	int (*handler)(void *arg, unsigned long iova, int flags);
 	void *arg;
+	u32 features;
 };
 
 static inline void msm_mmu_init(struct msm_mmu *mmu, struct device *dev,
@@ -74,6 +75,16 @@ static inline void msm_mmu_set_fault_handler(struct msm_mmu *mmu, void *arg,
 	mmu->handler = handler;
 }
 
+static inline void msm_mmu_set_feature(struct msm_mmu *mmu, u32 feature)
+{
+	mmu->features |= feature;
+}
+
+static inline bool msm_mmu_has_feature(struct msm_mmu *mmu, u32 feature)
+{
+	return (mmu->features & feature) ? true : false;
+}
+
 /* DPU smmu driver initialize and cleanup functions */
 int __init msm_smmu_driver_init(void);
 void __exit msm_smmu_driver_cleanup(void);
-- 
1.9.1


* [v2 7/7] drm/msm/a6xx: Add support for using system cache(LLC)
       [not found] ` <1538744915-25490-1-git-send-email-smasetty-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
                     ` (2 preceding siblings ...)
  2018-10-05 13:08   ` [v2 6/7] drm/msm: Pass mmu features to generic layers Sharat Masetty
@ 2018-10-05 13:08   ` Sharat Masetty
  2018-10-05 15:07     ` Jordan Crouse
  3 siblings, 1 reply; 14+ messages in thread
From: Sharat Masetty @ 2018-10-05 13:08 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	jcrouse-sgV2jX0FEOL9JmXXK+q4OQ, Sharat Masetty,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

The last level system cache can be partitioned into 32 different slices,
of which the GPU has two preallocated. One slice is used for caching GPU
buffers and the other is used for caching the GPU SMMU pagetables.
This patch talks to the core system cache driver to acquire the slice
handles, configure the SCIDs for those slices, and activate and deactivate
the slices upon GPU power collapse and restore.

Some support from the IOMMU driver is also needed to make use of the
system cache. IOMMU_SYS_CACHE is a buffer protection flag which enables
caching GPU data buffers in the system cache with memory attributes such
as outer cacheable, read-allocate and write-allocate. The GPU can then
override a few of these cacheability parameters; it changes write-allocate
to write-no-allocate, as the GPU hardware does not benefit much from it.

Similarly, DOMAIN_ATTR_USE_SYS_CACHE is a domain-level attribute used by
the IOMMU driver to set the attributes needed to cache the hardware
pagetables in the system cache.
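
For illustration, with hypothetical slice IDs gpu_scid = 2 and
gpuhtw_scid = 3, the CNTL1 register value is composed as:

	cntl1 = (2 << 0) | (2 << 5) | (2 << 10) | (2 << 15) | (2 << 20)
	      | (3 << 25)
	      = 0x00210842 | 0x06000000
	      = 0x06210842

i.e. the GPU SCID replicated once per GPU block (5 blocks, 5 bits each)
and the pagetable walker SCID in bits [29:25].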

Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 159 +++++++++++++++++++++++++++++++++-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   9 ++
 drivers/gpu/drm/msm/msm_iommu.c       |  13 +++
 drivers/gpu/drm/msm/msm_mmu.h         |   3 +
 4 files changed, 183 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 177dbfc..1790dde 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -8,6 +8,7 @@
 #include "a6xx_gmu.xml.h"
 
 #include <linux/devfreq.h>
+#include <linux/soc/qcom/llcc-qcom.h>
 
 static inline bool _a6xx_check_idle(struct msm_gpu *gpu)
 {
@@ -674,6 +675,151 @@ static irqreturn_t a6xx_irq(struct msm_gpu *gpu)
 	~0
 };
 
+#define A6XX_LLC_NUM_GPU_SCIDS		5
+#define A6XX_GPU_LLC_SCID_NUM_BITS	5
+
+#define A6XX_GPU_LLC_SCID_MASK \
+	((1 << (A6XX_LLC_NUM_GPU_SCIDS * A6XX_GPU_LLC_SCID_NUM_BITS)) - 1)
+
+#define A6XX_GPUHTW_LLC_SCID_SHIFT	25
+#define A6XX_GPUHTW_LLC_SCID_MASK \
+	(((1 << A6XX_GPU_LLC_SCID_NUM_BITS) - 1) << A6XX_GPUHTW_LLC_SCID_SHIFT)
+
+static inline void a6xx_gpu_cx_rmw(struct a6xx_llc *llc,
+	u32 reg, u32 mask, u32 or)
+{
+	msm_rmw(llc->mmio + (reg << 2), mask, or);
+}
+
+static void a6xx_llc_deactivate(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
+	struct a6xx_llc *llc = &a6xx_gpu->llc;
+
+	llcc_slice_deactivate(llc->gpu_llc_slice);
+	llcc_slice_deactivate(llc->gpuhtw_llc_slice);
+}
+
+static void a6xx_llc_activate(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
+	struct a6xx_llc *llc = &a6xx_gpu->llc;
+
+	if (!llc->mmio)
+		return;
+
+	/*
+	 * If the LLCC_GPU slice activated, program the sub-cache ID for all
+	 * GPU blocks
+	 */
+	if (!llcc_slice_activate(llc->gpu_llc_slice))
+		a6xx_gpu_cx_rmw(llc,
+				REG_A6XX_GPU_CX_MISC_SYSTEM_CACHE_CNTL_1,
+				A6XX_GPU_LLC_SCID_MASK,
+				(llc->cntl1_regval &
+				 A6XX_GPU_LLC_SCID_MASK));
+
+	/*
+	 * If the LLCC_GPUHTW slice activated, program the sub-cache ID for the
+	 * GPU pagetables
+	 */
+	if (!llcc_slice_activate(llc->gpuhtw_llc_slice))
+		a6xx_gpu_cx_rmw(llc,
+				REG_A6XX_GPU_CX_MISC_SYSTEM_CACHE_CNTL_1,
+				A6XX_GPUHTW_LLC_SCID_MASK,
+				(llc->cntl1_regval &
+				 A6XX_GPUHTW_LLC_SCID_MASK));
+
+	/* Program cacheability overrides */
+	a6xx_gpu_cx_rmw(llc, REG_A6XX_GPU_CX_MISC_SYSTEM_CACHE_CNTL_0, 0xF,
+		llc->cntl0_regval);
+}
+
+void a6xx_llc_slices_destroy(struct a6xx_llc *llc)
+{
+	if (llc->mmio) {
+		iounmap(llc->mmio);
+		llc->mmio = NULL;
+	}
+
+	llcc_slice_putd(llc->gpu_llc_slice);
+	llc->gpu_llc_slice = NULL;
+
+	llcc_slice_putd(llc->gpuhtw_llc_slice);
+	llc->gpuhtw_llc_slice = NULL;
+}
+
+static int a6xx_llc_slices_init(struct platform_device *pdev,
+		struct a6xx_llc *llc)
+{
+	int i;
+
+	/* Map registers */
+	llc->mmio = msm_ioremap(pdev, "cx_mem", "gpu_cx");
+	if (IS_ERR(llc->mmio)) {
+		llc->mmio = NULL;
+		return -1;
+	}
+
+	/* Get the system cache slice descriptor for GPU and GPUHTWs */
+	llc->gpu_llc_slice = llcc_slice_getd(LLCC_GPU);
+	llc->gpuhtw_llc_slice = llcc_slice_getd(LLCC_GPUHTW);
+	if (IS_ERR(llc->gpu_llc_slice) && IS_ERR(llc->gpuhtw_llc_slice))
+		return -1;
+
+	/*
+	 * Setup GPU system cache CNTL0 and CNTL1 register values.
+	 * These values will be programmed everytime GPU comes out
+	 * of power collapse as these are non-retention registers.
+	 */
+
+	/*
+	 * CNTL0 provides options to override the settings for the
+	 * read and write allocation policies for the LLC. These
+	 * overrides are global for all memory transactions from
+	 * the GPU.
+	 *
+	 * 0x3: read-no-alloc-overridden = 0
+	 *      read-no-alloc = 0 - Allocate lines on read miss
+	 *      write-no-alloc-overridden = 1
+	 *      write-no-alloc = 1 - Do not allocates lines on write miss
+	 */
+	llc->cntl0_regval = 0x03;
+
+	/*
+	 * CNTL1 is used to specify SCID for (CP, TP, VFD, CCU and UBWC
+	 * FLAG cache) GPU blocks. This value will be passed along with
+	 * the address for any memory transaction from GPU to identify
+	 * the sub-cache for that transaction.
+	 *
+	 * Currently there is only one SCID allocated for all GPU blocks
+	 * Hence set same SCID for all the blocks.
+	 */
+
+	if (!IS_ERR(llc->gpu_llc_slice)) {
+		u32 gpu_scid = llcc_get_slice_id(llc->gpu_llc_slice);
+
+		for (i = 0; i < A6XX_LLC_NUM_GPU_SCIDS; i++)
+			llc->cntl1_regval |=
+				gpu_scid << (A6XX_GPU_LLC_SCID_NUM_BITS * i);
+	}
+
+	/*
+	 * Set SCID for GPU IOMMU. This will be used to access
+	 * page tables that are cached in LLC.
+	 */
+	if (!IS_ERR(llc->gpuhtw_llc_slice)) {
+		u32 gpuhtw_scid = llcc_get_slice_id(llc->gpuhtw_llc_slice);
+
+		llc->cntl1_regval |=
+			gpuhtw_scid << A6XX_GPUHTW_LLC_SCID_SHIFT;
+	}
+
+	return 0;
+}
+
 static int a6xx_pm_resume(struct msm_gpu *gpu)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
@@ -686,6 +832,9 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
 
 	msm_gpu_resume_devfreq(gpu);
 
+	/* Activate LLC slices */
+	a6xx_llc_activate(gpu);
+
 	return ret;
 }
 
@@ -694,6 +843,9 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
 
+	/* Deactivate LLC slices */
+	a6xx_llc_deactivate(gpu);
+
 	devfreq_suspend_device(gpu->devfreq.devfreq);
 
 	/*
@@ -753,6 +905,8 @@ static void a6xx_destroy(struct msm_gpu *gpu)
 		drm_gem_object_unreference_unlocked(a6xx_gpu->sqe_bo);
 	}
 
+	a6xx_llc_slices_destroy(&a6xx_gpu->llc);
+
 	a6xx_gmu_remove(a6xx_gpu);
 
 	adreno_gpu_cleanup(adreno_gpu);
@@ -819,7 +973,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
 	adreno_gpu->registers = a6xx_registers;
 	adreno_gpu->reg_offsets = a6xx_register_offsets;
 
-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
+	ret = a6xx_llc_slices_init(pdev, &a6xx_gpu->llc);
+
+	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1,
+			ret ? 0 : MMU_FEATURE_USE_SYSTEM_CACHE);
 	if (ret) {
 		a6xx_destroy(&(a6xx_gpu->base.base));
 		return ERR_PTR(ret);
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
index 4127dce..86353e8 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
@@ -12,6 +12,14 @@
 
 extern bool hang_debug;
 
+struct a6xx_llc {
+	void __iomem *mmio;
+	void *gpu_llc_slice;
+	void *gpuhtw_llc_slice;
+	u32 cntl0_regval;
+	u32 cntl1_regval;
+};
+
 struct a6xx_gpu {
 	struct adreno_gpu base;
 
@@ -21,6 +29,7 @@ struct a6xx_gpu {
 	struct msm_ringbuffer *cur_ring;
 
 	struct a6xx_gmu gmu;
+	struct a6xx_llc llc;
 };
 
 #define to_a6xx_gpu(x) container_of(x, struct a6xx_gpu, base)
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index e80c79b..66612c4 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -38,6 +38,16 @@ static int msm_iommu_attach(struct msm_mmu *mmu, const char * const *names,
 			    int cnt)
 {
 	struct msm_iommu *iommu = to_msm_iommu(mmu);
+	int gpu_htw_llc = 1;
+
+	/*
+	 * This allows GPU to set the bus attributes required
+	 * to use system cache on behalf of the iommu page table
+	 * walker.
+	 */
+	if (msm_mmu_has_feature(mmu, MMU_FEATURE_USE_SYSTEM_CACHE))
+		iommu_domain_set_attr(iommu->domain,
+				DOMAIN_ATTR_USE_SYS_CACHE, &gpu_htw_llc);
 
 	return iommu_attach_device(iommu->domain, mmu->dev);
 }
@@ -56,6 +66,9 @@ static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova,
 	struct msm_iommu *iommu = to_msm_iommu(mmu);
 	size_t ret;
 
+	if (msm_mmu_has_feature(mmu, MMU_FEATURE_USE_SYSTEM_CACHE))
+		prot |= IOMMU_SYS_CACHE;
+
 	ret = iommu_map_sg(iommu->domain, iova, sgt->sgl, sgt->nents, prot);
 	WARN_ON(ret < 0);
 
diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
index 9b9f43f..524790b 100644
--- a/drivers/gpu/drm/msm/msm_mmu.h
+++ b/drivers/gpu/drm/msm/msm_mmu.h
@@ -49,6 +49,9 @@ struct msm_mmu_funcs {
 	bool (*is_domain_secure)(struct msm_mmu *mmu);
 };
 
+/* MMU features */
+#define MMU_FEATURE_USE_SYSTEM_CACHE (1 << 0)
+
 struct msm_mmu {
 	const struct msm_mmu_funcs *funcs;
 	struct device *dev;
-- 
1.9.1


* Re: [v2 4/7] drm/msm/adreno: Add registers in the GPU CX domain
       [not found]   ` <1538744915-25490-5-git-send-email-smasetty-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
@ 2018-10-05 15:01     ` Jordan Crouse
       [not found]       ` <20181005150157.GI31641-9PYrDHPZ2Orvke4nUoYGnHL1okKdlPRT@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Jordan Crouse @ 2018-10-05 15:01 UTC (permalink / raw)
  To: Sharat Masetty
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On Fri, Oct 05, 2018 at 06:38:32PM +0530, Sharat Masetty wrote:
> Add the registers needed for configuring the system cache slice info and
> other parameters in the GPU.

This would conflict with msm-next, or at least with the latest update from the
rnndb. It is good to have this out here for people to prototype, but we need to
do a better job of keeping rnndb up to date, so please send out an update for
that as soon as you can - it is a pretty easy thing for Rob to generate and
push new headers if we know that the database is good.

Jordan

> Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
> ---
>  drivers/gpu/drm/msm/adreno/a6xx.xml.h | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx.xml.h b/drivers/gpu/drm/msm/adreno/a6xx.xml.h
> index 2206765..2645b8f 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx.xml.h
> +++ b/drivers/gpu/drm/msm/adreno/a6xx.xml.h
> @@ -1780,5 +1780,8 @@ static inline uint32_t A6XX_CX_DBGC_CFG_DBGBUS_BYTEL_1_BYTEL15(uint32_t val)
>  
>  #define REG_A6XX_PDC_GPU_SEQ_MEM_0				0x00000000
>  
> +#define REG_A6XX_GPU_CX_MISC_SYSTEM_CACHE_CNTL_0		0x00000001
> +
> +#define REG_A6XX_GPU_CX_MISC_SYSTEM_CACHE_CNTL_1		0x00000002
>  
>  #endif /* A6XX_XML */
> -- 
> 1.9.1
> 

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

* Re: [v2 7/7] drm/msm/a6xx: Add support for using system cache(LLC)
  2018-10-05 13:08   ` [v2 7/7] drm/msm/a6xx: Add support for using system cache(LLC) Sharat Masetty
@ 2018-10-05 15:07     ` Jordan Crouse
       [not found]       ` <20181005150745.GJ31641-9PYrDHPZ2Orvke4nUoYGnHL1okKdlPRT@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Jordan Crouse @ 2018-10-05 15:07 UTC (permalink / raw)
  To: Sharat Masetty; +Cc: linux-arm-msm, freedreno, dri-devel

On Fri, Oct 05, 2018 at 06:38:35PM +0530, Sharat Masetty wrote:
> The last level system cache can be partitioned into 32 different slices,
> of which the GPU has two preallocated. One slice is used for caching GPU
> buffers and the other is used for caching the GPU SMMU pagetables.
> This patch talks to the core system cache driver to acquire the slice
> handles, configure the SCIDs for those slices, and activate and deactivate
> the slices upon GPU power collapse and restore.
>
> Some support from the IOMMU driver is also needed to make use of the
> system cache. IOMMU_SYS_CACHE is a buffer protection flag which enables
> caching GPU data buffers in the system cache with memory attributes such
> as outer cacheable, read-allocate and write-allocate. The GPU can then
> override a few of these cacheability parameters; it changes write-allocate
> to write-no-allocate, as the GPU hardware does not benefit much from it.
>
> Similarly, DOMAIN_ATTR_USE_SYS_CACHE is a domain-level attribute used by
> the IOMMU driver to set the attributes needed to cache the hardware
> pagetables in the system cache.
> 
> Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 159 +++++++++++++++++++++++++++++++++-
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   9 ++
>  drivers/gpu/drm/msm/msm_iommu.c       |  13 +++
>  drivers/gpu/drm/msm/msm_mmu.h         |   3 +
>  4 files changed, 183 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 177dbfc..1790dde 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -8,6 +8,7 @@
>  #include "a6xx_gmu.xml.h"
>  
>  #include <linux/devfreq.h>
> +#include <linux/soc/qcom/llcc-qcom.h>
>  
>  static inline bool _a6xx_check_idle(struct msm_gpu *gpu)
>  {
> @@ -674,6 +675,151 @@ static irqreturn_t a6xx_irq(struct msm_gpu *gpu)
>  	~0
>  };
>  
> +#define A6XX_LLC_NUM_GPU_SCIDS		5
> +#define A6XX_GPU_LLC_SCID_NUM_BITS	5
> +
> +#define A6XX_GPU_LLC_SCID_MASK \
> +	((1 << (A6XX_LLC_NUM_GPU_SCIDS * A6XX_GPU_LLC_SCID_NUM_BITS)) - 1)
> +
> +#define A6XX_GPUHTW_LLC_SCID_SHIFT	25
> +#define A6XX_GPUHTW_LLC_SCID_MASK \
> +	(((1 << A6XX_GPU_LLC_SCID_NUM_BITS) - 1) << A6XX_GPUHTW_LLC_SCID_SHIFT)
> +
> +static inline void a6xx_gpu_cx_rmw(struct a6xx_llc *llc,
> +	u32 reg, u32 mask, u32 or)
> +{
> +	msm_rmw(llc->mmio + (reg << 2), mask, or);
> +}
> +
> +static void a6xx_llc_deactivate(struct msm_gpu *gpu)
> +{
> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> +	struct a6xx_llc *llc = &a6xx_gpu->llc;
> +
> +	llcc_slice_deactivate(llc->gpu_llc_slice);
> +	llcc_slice_deactivate(llc->gpuhtw_llc_slice);
> +}
> +
> +static void a6xx_llc_activate(struct msm_gpu *gpu)
> +{
> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> +	struct a6xx_llc *llc = &a6xx_gpu->llc;
> +
> +	if (!llc->mmio)
> +		return;
> +
> +	/*
> +	 * If the LLCC_GPU slice activated, program the sub-cache ID for all
> +	 * GPU blocks
> +	 */
> +	if (!llcc_slice_activate(llc->gpu_llc_slice))
> +		a6xx_gpu_cx_rmw(llc,
> +				REG_A6XX_GPU_CX_MISC_SYSTEM_CACHE_CNTL_1,
> +				A6XX_GPU_LLC_SCID_MASK,
> +				(llc->cntl1_regval &
> +				 A6XX_GPU_LLC_SCID_MASK));
> +
> +	/*
> +	 * If the LLCC_GPUHTW slice activated, program the sub-cache ID for the
> +	 * GPU pagetables
> +	 */
> +	if (!llcc_slice_activate(llc->gpuhtw_llc_slice))
> +		a6xx_gpu_cx_rmw(llc,
> +				REG_A6XX_GPU_CX_MISC_SYSTEM_CACHE_CNTL_1,
> +				A6XX_GPUHTW_LLC_SCID_MASK,
> +				(llc->cntl1_regval &
> +				 A6XX_GPUHTW_LLC_SCID_MASK));
> +
> +	/* Program cacheability overrides */
> +	a6xx_gpu_cx_rmw(llc, REG_A6XX_GPU_CX_MISC_SYSTEM_CACHE_CNTL_0, 0xF,
> +		llc->cntl0_regval);
> +}
> +
> +void a6xx_llc_slices_destroy(struct a6xx_llc *llc)
> +{
> +	if (llc->mmio) {
> +		iounmap(llc->mmio);
> +		llc->mmio = NULL;
> +	}
> +
> +	llcc_slice_putd(llc->gpu_llc_slice);
> +	llc->gpu_llc_slice = NULL;

I don't think these need to be put back to NULL - we shouldn't touch them again
after this point.

> +
> +	llcc_slice_putd(llc->gpuhtw_llc_slice);
> +	llc->gpuhtw_llc_slice = NULL;
> +}
> +
> +static int a6xx_llc_slices_init(struct platform_device *pdev,
> +		struct a6xx_llc *llc)
> +{
> +	int i;
> +
> +	/* Map registers */
> +	llc->mmio = msm_ioremap(pdev, "cx_mem", "gpu_cx");
> +	if (IS_ERR(llc->mmio)) {
> +		llc->mmio = NULL;
> +		return -1;

Return a valid error code here even if we don't care what it is.  -ENODEV maybe.
And in fact, if we don't care what it is (LLCC is very optional) then just don't
return anything at all.

> +	}
> +
> +	/* Get the system cache slice descriptor for GPU and GPUHTWs */
> +	llc->gpu_llc_slice = llcc_slice_getd(LLCC_GPU);
> +	llc->gpuhtw_llc_slice = llcc_slice_getd(LLCC_GPUHTW);
> +	if (IS_ERR(llc->gpu_llc_slice) && IS_ERR(llc->gpuhtw_llc_slice))
> +		return -1;
> +
> +	/*
> +	 * Setup GPU system cache CNTL0 and CNTL1 register values.
> +	 * These values will be programmed everytime GPU comes out
> +	 * of power collapse as these are non-retention registers.
> +	 */
> +
> +	/*
> +	 * CNTL0 provides options to override the settings for the
> +	 * read and write allocation policies for the LLC. These
> +	 * overrides are global for all memory transactions from
> +	 * the GPU.
> +	 *
> +	 * 0x3: read-no-alloc-overridden = 0
> +	 *      read-no-alloc = 0 - Allocate lines on read miss
> +	 *      write-no-alloc-overridden = 1
> +	 *      write-no-alloc = 1 - Do not allocates lines on write miss
> +	 */
> +	llc->cntl0_regval = 0x03;
> +
> +	/*
> +	 * CNTL1 is used to specify SCID for (CP, TP, VFD, CCU and UBWC
> +	 * FLAG cache) GPU blocks. This value will be passed along with
> +	 * the address for any memory transaction from GPU to identify
> +	 * the sub-cache for that transaction.
> +	 *
> +	 * Currently there is only one SCID allocated for all GPU blocks
> +	 * Hence set same SCID for all the blocks.

This last sentence is not needed

> +	 */
> +
> +	if (!IS_ERR(llc->gpu_llc_slice)) {
> +		u32 gpu_scid = llcc_get_slice_id(llc->gpu_llc_slice);
> +
> +		for (i = 0; i < A6XX_LLC_NUM_GPU_SCIDS; i++)
> +			llc->cntl1_regval |=
> +				gpu_scid << (A6XX_GPU_LLC_SCID_NUM_BITS * i);
> +	}
> +
> +	/*
> +	 * Set SCID for GPU IOMMU. This will be used to access
> +	 * page tables that are cached in LLC.
> +	 */
> +	if (!IS_ERR(llc->gpuhtw_llc_slice)) {
> +		u32 gpuhtw_scid = llcc_get_slice_id(llc->gpuhtw_llc_slice);
> +
> +		llc->cntl1_regval |=
> +			gpuhtw_scid << A6XX_GPUHTW_LLC_SCID_SHIFT;
> +	}
> +
> +	return 0;
> +}
> +
>  static int a6xx_pm_resume(struct msm_gpu *gpu)
>  {
>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> @@ -686,6 +832,9 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
>  
>  	msm_gpu_resume_devfreq(gpu);
>  
> +	/* Activate LLC slices */
> +	a6xx_llc_activate(gpu);
> +
>  	return ret;
>  }
>  
> @@ -694,6 +843,9 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>  
> +	/* Deactivate LLC slices */
> +	a6xx_llc_deactivate(gpu);
> +
>  	devfreq_suspend_device(gpu->devfreq.devfreq);
>  
>  	/*
> @@ -753,6 +905,8 @@ static void a6xx_destroy(struct msm_gpu *gpu)
>  		drm_gem_object_unreference_unlocked(a6xx_gpu->sqe_bo);
>  	}
>  
> +	a6xx_llc_slices_destroy(&a6xx_gpu->llc);
> +
>  	a6xx_gmu_remove(a6xx_gpu);
>  
>  	adreno_gpu_cleanup(adreno_gpu);
> @@ -819,7 +973,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>  	adreno_gpu->registers = a6xx_registers;
>  	adreno_gpu->reg_offsets = a6xx_register_offsets;
>  
> -	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
> +	ret = a6xx_llc_slices_init(pdev, &a6xx_gpu->llc);

Yep - there is no reason to take a ret and not deal with it.

> +
> +	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1,
> +			ret ? 0 : MMU_FEATURE_USE_SYSTEM_CACHE);
>  	if (ret) {
>  		a6xx_destroy(&(a6xx_gpu->base.base));
>  		return ERR_PTR(ret);
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> index 4127dce..86353e8 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> @@ -12,6 +12,14 @@
>  
>  extern bool hang_debug;
>  
> +struct a6xx_llc {
> +	void __iomem *mmio;
> +	void *gpu_llc_slice;
> +	void *gpuhtw_llc_slice;
> +	u32 cntl0_regval;
> +	u32 cntl1_regval;
> +};
> +
>  struct a6xx_gpu {
>  	struct adreno_gpu base;
>  
> @@ -21,6 +29,7 @@ struct a6xx_gpu {
>  	struct msm_ringbuffer *cur_ring;
>  
>  	struct a6xx_gmu gmu;
> +	struct a6xx_llc llc;
>  };
>  
>  #define to_a6xx_gpu(x) container_of(x, struct a6xx_gpu, base)
> diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
> index e80c79b..66612c4 100644
> --- a/drivers/gpu/drm/msm/msm_iommu.c
> +++ b/drivers/gpu/drm/msm/msm_iommu.c
> @@ -38,6 +38,16 @@ static int msm_iommu_attach(struct msm_mmu *mmu, const char * const *names,
>  			    int cnt)
>  {
>  	struct msm_iommu *iommu = to_msm_iommu(mmu);
> +	int gpu_htw_llc = 1;
> +
> +	/*
> +	 * This allows GPU to set the bus attributes required
> +	 * to use system cache on behalf of the iommu page table
> +	 * walker.
> +	 */
> +	if (msm_mmu_has_feature(mmu, MMU_FEATURE_USE_SYSTEM_CACHE))
> +		iommu_domain_set_attr(iommu->domain,
> +				DOMAIN_ATTR_USE_SYS_CACHE, &gpu_htw_llc);
>  
>  	return iommu_attach_device(iommu->domain, mmu->dev);
>  }
> @@ -56,6 +66,9 @@ static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova,
>  	struct msm_iommu *iommu = to_msm_iommu(mmu);
>  	size_t ret;
>  
> +	if (msm_mmu_has_feature(mmu, MMU_FEATURE_USE_SYSTEM_CACHE))
> +		prot |= IOMMU_SYS_CACHE;
> +
>  	ret = iommu_map_sg(iommu->domain, iova, sgt->sgl, sgt->nents, prot);
>  	WARN_ON(ret < 0);
>  
> diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
> index 9b9f43f..524790b 100644
> --- a/drivers/gpu/drm/msm/msm_mmu.h
> +++ b/drivers/gpu/drm/msm/msm_mmu.h
> @@ -49,6 +49,9 @@ struct msm_mmu_funcs {
>  	bool (*is_domain_secure)(struct msm_mmu *mmu);
>  };
>  
> +/* MMU features */
> +#define MMU_FEATURE_USE_SYSTEM_CACHE (1 << 0)
> +
>  struct msm_mmu {
>  	const struct msm_mmu_funcs *funcs;
>  	struct device *dev;
> -- 
> 1.9.1
> 

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

* Re: [v2 4/7] drm/msm/adreno: Add registers in the GPU CX domain
       [not found]       ` <20181005150157.GI31641-9PYrDHPZ2Orvke4nUoYGnHL1okKdlPRT@public.gmane.org>
@ 2018-10-08 13:46         ` Sharat Masetty
  0 siblings, 0 replies; 14+ messages in thread
From: Sharat Masetty @ 2018-10-08 13:46 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-arm-msm-u79uwXL29TY76Z2rM5mHXA
  Cc: robdclark-Re5JQEeQqe8AvxtiuMwx3w



On 10/5/2018 8:31 PM, Jordan Crouse wrote:
> On Fri, Oct 05, 2018 at 06:38:32PM +0530, Sharat Masetty wrote:
>> Add the registers needed for configuring the system cache slice info and
>> other parameters in the GPU.
> 
> This would conflict with msm-next, or at least with the latest update from the
> rnndb. It is good to have this out here for people to prototype, but we need to
> do a better job of keeping rnndb up to date, so please send out an update for
> that as soon as you can - it is a pretty easy thing for Rob to generate and
> push new headers if we know that the database is good.
> 
> Jordan
Okay, got it. The rnndb patch is at
https://patchwork.freedesktop.org/patch/255356/. Added Rob to this
thread too.
> 
>> Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
>> ---
>>   drivers/gpu/drm/msm/adreno/a6xx.xml.h | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx.xml.h b/drivers/gpu/drm/msm/adreno/a6xx.xml.h
>> index 2206765..2645b8f 100644
>> --- a/drivers/gpu/drm/msm/adreno/a6xx.xml.h
>> +++ b/drivers/gpu/drm/msm/adreno/a6xx.xml.h
>> @@ -1780,5 +1780,8 @@ static inline uint32_t A6XX_CX_DBGC_CFG_DBGBUS_BYTEL_1_BYTEL15(uint32_t val)
>>   
>>   #define REG_A6XX_PDC_GPU_SEQ_MEM_0				0x00000000
>>   
>> +#define REG_A6XX_GPU_CX_MISC_SYSTEM_CACHE_CNTL_0		0x00000001
>> +
>> +#define REG_A6XX_GPU_CX_MISC_SYSTEM_CACHE_CNTL_1		0x00000002
>>   
>>   #endif /* A6XX_XML */
>> -- 
>> 1.9.1
>>
> 

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
Linux Foundation Collaborative Project

* Re: [v2 7/7] drm/msm/a6xx: Add support for using system cache(LLC)
       [not found]       ` <20181005150745.GJ31641-9PYrDHPZ2Orvke4nUoYGnHL1okKdlPRT@public.gmane.org>
@ 2018-10-08 13:59         ` Sharat Masetty
       [not found]           ` <4dd1439a-990e-6a34-0290-7adc4837ca7f-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Sharat Masetty @ 2018-10-08 13:59 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-arm-msm-u79uwXL29TY76Z2rM5mHXA



On 10/5/2018 8:37 PM, Jordan Crouse wrote:
> On Fri, Oct 05, 2018 at 06:38:35PM +0530, Sharat Masetty wrote:
>> The last level system cache can be partitioned into 32 different slices,
>> of which the GPU has two preallocated. One slice is used for caching GPU
>> buffers and the other is used for caching the GPU SMMU pagetables.
>> This patch talks to the core system cache driver to acquire the slice
>> handles, configure the SCIDs for those slices, and activate and deactivate
>> the slices upon GPU power collapse and restore.
>>
>> Some support from the IOMMU driver is also needed to make use of the
>> system cache. IOMMU_SYS_CACHE is a buffer protection flag which enables
>> caching GPU data buffers in the system cache with memory attributes such
>> as outer cacheable, read-allocate and write-allocate. The GPU can then
>> override a few of these cacheability parameters; it changes write-allocate
>> to write-no-allocate, as the GPU hardware does not benefit much from it.
>>
>> Similarly, DOMAIN_ATTR_USE_SYS_CACHE is a domain-level attribute used by
>> the IOMMU driver to set the attributes needed to cache the hardware
>> pagetables in the system cache.
>>
>> Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
>> ---
>>   drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 159 +++++++++++++++++++++++++++++++++-
>>   drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   9 ++
>>   drivers/gpu/drm/msm/msm_iommu.c       |  13 +++
>>   drivers/gpu/drm/msm/msm_mmu.h         |   3 +
>>   4 files changed, 183 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>> index 177dbfc..1790dde 100644
>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>> @@ -8,6 +8,7 @@
>>   #include "a6xx_gmu.xml.h"
>>   
>>   #include <linux/devfreq.h>
>> +#include <linux/soc/qcom/llcc-qcom.h>
>>   
>>   static inline bool _a6xx_check_idle(struct msm_gpu *gpu)
>>   {
>> @@ -674,6 +675,151 @@ static irqreturn_t a6xx_irq(struct msm_gpu *gpu)
>>   	~0
>>   };
>>   
>> +#define A6XX_LLC_NUM_GPU_SCIDS		5
>> +#define A6XX_GPU_LLC_SCID_NUM_BITS	5
>> +
>> +#define A6XX_GPU_LLC_SCID_MASK \
>> +	((1 << (A6XX_LLC_NUM_GPU_SCIDS * A6XX_GPU_LLC_SCID_NUM_BITS)) - 1)
>> +
>> +#define A6XX_GPUHTW_LLC_SCID_SHIFT	25
>> +#define A6XX_GPUHTW_LLC_SCID_MASK \
>> +	(((1 << A6XX_GPU_LLC_SCID_NUM_BITS) - 1) << A6XX_GPUHTW_LLC_SCID_SHIFT)
>> +
>> +static inline void a6xx_gpu_cx_rmw(struct a6xx_llc *llc,
>> +	u32 reg, u32 mask, u32 or)
>> +{
>> +	msm_rmw(llc->mmio + (reg << 2), mask, or);
>> +}
>> +
>> +static void a6xx_llc_deactivate(struct msm_gpu *gpu)
>> +{
>> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>> +	struct a6xx_llc *llc = &a6xx_gpu->llc;
>> +
>> +	llcc_slice_deactivate(llc->gpu_llc_slice);
>> +	llcc_slice_deactivate(llc->gpuhtw_llc_slice);
>> +}
>> +
>> +static void a6xx_llc_activate(struct msm_gpu *gpu)
>> +{
>> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>> +	struct a6xx_llc *llc = &a6xx_gpu->llc;
>> +
>> +	if (!llc->mmio)
>> +		return;
>> +
>> +	/*
>> +	 * If the LLCC_GPU slice activated successfully, program the
>> +	 * sub-cache ID for all GPU blocks
>> +	 */
>> +	if (!llcc_slice_activate(llc->gpu_llc_slice))
>> +		a6xx_gpu_cx_rmw(llc,
>> +				REG_A6XX_GPU_CX_MISC_SYSTEM_CACHE_CNTL_1,
>> +				A6XX_GPU_LLC_SCID_MASK,
>> +				(llc->cntl1_regval &
>> +				 A6XX_GPU_LLC_SCID_MASK));
>> +
>> +	/*
>> +	 * If the LLCC_GPUHTW slice activated successfully, program the
>> +	 * sub-cache ID for the GPU pagetables
>> +	 */
>> +	if (!llcc_slice_activate(llc->gpuhtw_llc_slice))
>> +		a6xx_gpu_cx_rmw(llc,
>> +				REG_A6XX_GPU_CX_MISC_SYSTEM_CACHE_CNTL_1,
>> +				A6XX_GPUHTW_LLC_SCID_MASK,
>> +				(llc->cntl1_regval &
>> +				 A6XX_GPUHTW_LLC_SCID_MASK));
>> +
>> +	/* Program cacheability overrides */
>> +	a6xx_gpu_cx_rmw(llc, REG_A6XX_GPU_CX_MISC_SYSTEM_CACHE_CNTL_0, 0xF,
>> +		llc->cntl0_regval);
>> +}
>> +
>> +void a6xx_llc_slices_destroy(struct a6xx_llc *llc)
>> +{
>> +	if (llc->mmio) {
>> +		iounmap(llc->mmio);
>> +		llc->mmio = NULL;
>> +	}
>> +
>> +	llcc_slice_putd(llc->gpu_llc_slice);
>> +	llc->gpu_llc_slice = NULL;
> 
> I don't think these need to be put back to NULL - we shouldn't touch them again
> after this point.
> 
>> +
>> +	llcc_slice_putd(llc->gpuhtw_llc_slice);
>> +	llc->gpuhtw_llc_slice = NULL;
>> +}
>> +
>> +static int a6xx_llc_slices_init(struct platform_device *pdev,
>> +		struct a6xx_llc *llc)
>> +{
>> +	int i;
>> +
>> +	/* Map registers */
>> +	llc->mmio = msm_ioremap(pdev, "cx_mem", "gpu_cx");
>> +	if (IS_ERR(llc->mmio)) {
>> +		llc->mmio = NULL;
>> +		return -1;
> 
> Return a valid error code here even if we don't care what it is.  -ENODEV maybe.
> And in fact, if we don't care what it is (LLCC is very optional) then just don't
> return anything at all.
Hi Jordan,

We do need the error code, as we have to let the iommu layer know, so that
it can set additional properties for the buffers and the page tables.
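
To make that flow concrete, this is roughly what the series does with the
return value (a sketch stitched together from the hunks below):

	/* a6xx_gpu_init(): only advertise the MMU feature on success ... */
	ret = a6xx_llc_slices_init(pdev, &a6xx_gpu->llc);
	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1,
			ret ? 0 : MMU_FEATURE_USE_SYSTEM_CACHE);

	/* ... and msm_iommu_map() keys off that feature bit: */
	if (msm_mmu_has_feature(mmu, MMU_FEATURE_USE_SYSTEM_CACHE))
		prot |= IOMMU_SYS_CACHE;
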
> 
>> +	}
>> +
>> +	/* Get the system cache slice descriptor for GPU and GPUHTWs */
>> +	llc->gpu_llc_slice = llcc_slice_getd(LLCC_GPU);
>> +	llc->gpuhtw_llc_slice = llcc_slice_getd(LLCC_GPUHTW);
>> +	if (IS_ERR(llc->gpu_llc_slice) && IS_ERR(llc->gpuhtw_llc_slice))
>> +		return -1;
>> +
>> +	/*
>> +	 * Setup GPU system cache CNTL0 and CNTL1 register values.
>> +	 * These values will be programmed every time the GPU comes out
>> +	 * of power collapse as these are non-retention registers.
>> +	 */
>> +
>> +	/*
>> +	 * CNTL0 provides options to override the settings for the
>> +	 * read and write allocation policies for the LLC. These
>> +	 * overrides are global for all memory transactions from
>> +	 * the GPU.
>> +	 *
>> +	 * 0x3: read-no-alloc-overridden = 0
>> +	 *      read-no-alloc = 0 - Allocate lines on read miss
>> +	 *      write-no-alloc-overridden = 1
>> +	 *      write-no-alloc = 1 - Do not allocate lines on write miss
>> +	 */
>> +	llc->cntl0_regval = 0x03;
>> +
>> +	/*
>> +	 * CNTL1 is used to specify SCID for (CP, TP, VFD, CCU and UBWC
>> +	 * FLAG cache) GPU blocks. This value will be passed along with
>> +	 * the address for any memory transaction from GPU to identify
>> +	 * the sub-cache for that transaction.
>> +	 *
>> +	 * Currently there is only one SCID allocated for all GPU blocks.
>> +	 * Hence set the same SCID for all the blocks.
> 
> This last sentence is not needed
> 
>> +	 */
>> +
>> +	if (!IS_ERR(llc->gpu_llc_slice)) {
>> +		u32 gpu_scid = llcc_get_slice_id(llc->gpu_llc_slice);
>> +
>> +		for (i = 0; i < A6XX_LLC_NUM_GPU_SCIDS; i++)
>> +			llc->cntl1_regval |=
>> +				gpu_scid << (A6XX_GPU_LLC_SCID_NUM_BITS * i);
>> +	}
>> +
>> +	/*
>> +	 * Set SCID for GPU IOMMU. This will be used to access
>> +	 * page tables that are cached in LLC.
>> +	 */
>> +	if (!IS_ERR(llc->gpuhtw_llc_slice)) {
>> +		u32 gpuhtw_scid = llcc_get_slice_id(llc->gpuhtw_llc_slice);
>> +
>> +		llc->cntl1_regval |=
>> +			gpuhtw_scid << A6XX_GPUHTW_LLC_SCID_SHIFT;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>>   static int a6xx_pm_resume(struct msm_gpu *gpu)
>>   {
>>   	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>> @@ -686,6 +832,9 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
>>   
>>   	msm_gpu_resume_devfreq(gpu);
>>   
>> +	/* Activate LLC slices */
>> +	a6xx_llc_activate(gpu);
>> +
>>   	return ret;
>>   }
>>   
>> @@ -694,6 +843,9 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
>>   	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>   	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>>   
>> +	/* Deactivate LLC slices */
>> +	a6xx_llc_deactivate(gpu);
>> +
>>   	devfreq_suspend_device(gpu->devfreq.devfreq);
>>   
>>   	/*
>> @@ -753,6 +905,8 @@ static void a6xx_destroy(struct msm_gpu *gpu)
>>   		drm_gem_object_unreference_unlocked(a6xx_gpu->sqe_bo);
>>   	}
>>   
>> +	a6xx_llc_slices_destroy(&a6xx_gpu->llc);
>> +
>>   	a6xx_gmu_remove(a6xx_gpu);
>>   
>>   	adreno_gpu_cleanup(adreno_gpu);
>> @@ -819,7 +973,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>>   	adreno_gpu->registers = a6xx_registers;
>>   	adreno_gpu->reg_offsets = a6xx_register_offsets;
>>   
>> -	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
>> +	ret = a6xx_llc_slices_init(pdev, &a6xx_gpu->llc);
> 
> Yep - there is no reason to take a ret and not deal with it.
> 
>> +
>> +	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1,
>> +			ret ? 0 : MMU_FEATURE_USE_SYSTEM_CACHE);
>>   	if (ret) {
>>   		a6xx_destroy(&(a6xx_gpu->base.base));
>>   		return ERR_PTR(ret);
>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>> index 4127dce..86353e8 100644
>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>> @@ -12,6 +12,14 @@
>>   
>>   extern bool hang_debug;
>>   
>> +struct a6xx_llc {
>> +	void __iomem *mmio;
>> +	void *gpu_llc_slice;
>> +	void *gpuhtw_llc_slice;
>> +	u32 cntl0_regval;
>> +	u32 cntl1_regval;
>> +};
>> +
>>   struct a6xx_gpu {
>>   	struct adreno_gpu base;
>>   
>> @@ -21,6 +29,7 @@ struct a6xx_gpu {
>>   	struct msm_ringbuffer *cur_ring;
>>   
>>   	struct a6xx_gmu gmu;
>> +	struct a6xx_llc llc;
>>   };
>>   
>>   #define to_a6xx_gpu(x) container_of(x, struct a6xx_gpu, base)
>> diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
>> index e80c79b..66612c4 100644
>> --- a/drivers/gpu/drm/msm/msm_iommu.c
>> +++ b/drivers/gpu/drm/msm/msm_iommu.c
>> @@ -38,6 +38,16 @@ static int msm_iommu_attach(struct msm_mmu *mmu, const char * const *names,
>>   			    int cnt)
>>   {
>>   	struct msm_iommu *iommu = to_msm_iommu(mmu);
>> +	int gpu_htw_llc = 1;
>> +
>> +	/*
>> +	 * This allows the GPU to set the bus attributes required
>> +	 * to use system cache on behalf of the iommu page table
>> +	 * walker.
>> +	 */
>> +	if (msm_mmu_has_feature(mmu, MMU_FEATURE_USE_SYSTEM_CACHE))
>> +		iommu_domain_set_attr(iommu->domain,
>> +				DOMAIN_ATTR_USE_SYS_CACHE, &gpu_htw_llc);
>>   
>>   	return iommu_attach_device(iommu->domain, mmu->dev);
>>   }
>> @@ -56,6 +66,9 @@ static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova,
>>   	struct msm_iommu *iommu = to_msm_iommu(mmu);
>>   	size_t ret;
>>   
>> +	if (msm_mmu_has_feature(mmu, MMU_FEATURE_USE_SYSTEM_CACHE))
>> +		prot |= IOMMU_SYS_CACHE;
>> +
>>   	ret = iommu_map_sg(iommu->domain, iova, sgt->sgl, sgt->nents, prot);
>>   	WARN_ON(ret < 0);
>>   
>> diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
>> index 9b9f43f..524790b 100644
>> --- a/drivers/gpu/drm/msm/msm_mmu.h
>> +++ b/drivers/gpu/drm/msm/msm_mmu.h
>> @@ -49,6 +49,9 @@ struct msm_mmu_funcs {
>>   	bool (*is_domain_secure)(struct msm_mmu *mmu);
>>   };
>>   
>> +/* MMU features */
>> +#define MMU_FEATURE_USE_SYSTEM_CACHE (1 << 0)
>> +
>>   struct msm_mmu {
>>   	const struct msm_mmu_funcs *funcs;
>>   	struct device *dev;
>> -- 
>> 1.9.1
>>
> 

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
Linux Foundation Collaborative Project

* Re: [v2 7/7] drm/msm/a6xx: Add support for using system cache(LLC)
       [not found]           ` <4dd1439a-990e-6a34-0290-7adc4837ca7f-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
@ 2018-10-08 14:18             ` Jordan Crouse
  0 siblings, 0 replies; 14+ messages in thread
From: Jordan Crouse @ 2018-10-08 14:18 UTC (permalink / raw)
  To: Sharat Masetty
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On Mon, Oct 08, 2018 at 07:29:03PM +0530, Sharat Masetty wrote:
> 
> 
> On 10/5/2018 8:37 PM, Jordan Crouse wrote:
> >On Fri, Oct 05, 2018 at 06:38:35PM +0530, Sharat Masetty wrote:
> >>The last level system cache can be partitioned into 32 different slices,
> >>of which the GPU has two slices preallocated. One slice is used for caching GPU
> >>buffers and the other slice is used for caching the GPU SMMU pagetables.
> >>This patch talks to the core system cache driver to acquire the slice handles,
> >>configure the SCIDs for those slices, and activate and deactivate the slices
> >>upon GPU power collapse and restore.
> >>
> >>Some support from the IOMMU driver is also needed to make use of the
> >>system cache. IOMMU_SYS_CACHE is a buffer protection flag which enables
> >>caching GPU data buffers in the system cache with memory attributes such
> >>as outer cacheable, read-allocate, write-allocate for buffers. The GPU
> >>then has the ability to override a few cacheability parameters, which it
> >>uses to override write-allocate to write-no-allocate, as the GPU hardware
> >>does not benefit much from it.
> >>
> >>Similarly DOMAIN_ATTR_USE_SYS_CACHE is another domain level attribute
> >>used by the IOMMU driver to set the right attributes to cache the hardware
> >>pagetables into the system cache.
> >>
> >>Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
> >>---
> >>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 159 +++++++++++++++++++++++++++++++++-
> >>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   9 ++
> >>  drivers/gpu/drm/msm/msm_iommu.c       |  13 +++
> >>  drivers/gpu/drm/msm/msm_mmu.h         |   3 +
> >>  4 files changed, 183 insertions(+), 1 deletion(-)
> >>
> >>diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>index 177dbfc..1790dde 100644
> >>--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>@@ -8,6 +8,7 @@
> >>  #include "a6xx_gmu.xml.h"
> >>  #include <linux/devfreq.h>
> >>+#include <linux/soc/qcom/llcc-qcom.h>
> >>  static inline bool _a6xx_check_idle(struct msm_gpu *gpu)
> >>  {
> >>@@ -674,6 +675,151 @@ static irqreturn_t a6xx_irq(struct msm_gpu *gpu)
> >>  	~0
> >>  };
> >>+#define A6XX_LLC_NUM_GPU_SCIDS		5
> >>+#define A6XX_GPU_LLC_SCID_NUM_BITS	5
> >>+
> >>+#define A6XX_GPU_LLC_SCID_MASK \
> >>+	((1 << (A6XX_LLC_NUM_GPU_SCIDS * A6XX_GPU_LLC_SCID_NUM_BITS)) - 1)
> >>+
> >>+#define A6XX_GPUHTW_LLC_SCID_SHIFT	25
> >>+#define A6XX_GPUHTW_LLC_SCID_MASK \
> >>+	(((1 << A6XX_GPU_LLC_SCID_NUM_BITS) - 1) << A6XX_GPUHTW_LLC_SCID_SHIFT)
> >>+
> >>+static inline void a6xx_gpu_cx_rmw(struct a6xx_llc *llc,
> >>+	u32 reg, u32 mask, u32 or)
> >>+{
> >>+	msm_rmw(llc->mmio + (reg << 2), mask, or);
> >>+}
> >>+
> >>+static void a6xx_llc_deactivate(struct msm_gpu *gpu)
> >>+{
> >>+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >>+	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >>+	struct a6xx_llc *llc = &a6xx_gpu->llc;
> >>+
> >>+	llcc_slice_deactivate(llc->gpu_llc_slice);
> >>+	llcc_slice_deactivate(llc->gpuhtw_llc_slice);
> >>+}
> >>+
> >>+static void a6xx_llc_activate(struct msm_gpu *gpu)
> >>+{
> >>+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >>+	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >>+	struct a6xx_llc *llc = &a6xx_gpu->llc;
> >>+
> >>+	if (!llc->mmio)
> >>+		return;
> >>+
> >>+	/*
> >>+	 * If the LLCC_GPU slice activated successfully, program the
> >>+	 * sub-cache ID for all GPU blocks
> >>+	 */
> >>+	if (!llcc_slice_activate(llc->gpu_llc_slice))
> >>+		a6xx_gpu_cx_rmw(llc,
> >>+				REG_A6XX_GPU_CX_MISC_SYSTEM_CACHE_CNTL_1,
> >>+				A6XX_GPU_LLC_SCID_MASK,
> >>+				(llc->cntl1_regval &
> >>+				 A6XX_GPU_LLC_SCID_MASK));
> >>+
> >>+	/*
> >>+	 * If the LLCC_GPUHTW slice activated successfully, program the
> >>+	 * sub-cache ID for the GPU pagetables
> >>+	 */
> >>+	if (!llcc_slice_activate(llc->gpuhtw_llc_slice))
> >>+		a6xx_gpu_cx_rmw(llc,
> >>+				REG_A6XX_GPU_CX_MISC_SYSTEM_CACHE_CNTL_1,
> >>+				A6XX_GPUHTW_LLC_SCID_MASK,
> >>+				(llc->cntl1_regval &
> >>+				 A6XX_GPUHTW_LLC_SCID_MASK));
> >>+
> >>+	/* Program cacheability overrides */
> >>+	a6xx_gpu_cx_rmw(llc, REG_A6XX_GPU_CX_MISC_SYSTEM_CACHE_CNTL_0, 0xF,
> >>+		llc->cntl0_regval);
> >>+}
> >>+
> >>+void a6xx_llc_slices_destroy(struct a6xx_llc *llc)
> >>+{
> >>+	if (llc->mmio) {
> >>+		iounmap(llc->mmio);
> >>+		llc->mmio = NULL;
> >>+	}
> >>+
> >>+	llcc_slice_putd(llc->gpu_llc_slice);
> >>+	llc->gpu_llc_slice = NULL;
> >
> >I don't think these need to be put back to NULL - we shouldn't touch them again
> >after this point.
> >
> >>+
> >>+	llcc_slice_putd(llc->gpuhtw_llc_slice);
> >>+	llc->gpuhtw_llc_slice = NULL;
> >>+}
> >>+
> >>+static int a6xx_llc_slices_init(struct platform_device *pdev,
> >>+		struct a6xx_llc *llc)
> >>+{
> >>+	int i;
> >>+
> >>+	/* Map registers */
> >>+	llc->mmio = msm_ioremap(pdev, "cx_mem", "gpu_cx");
> >>+	if (IS_ERR(llc->mmio)) {
> >>+		llc->mmio = NULL;
> >>+		return -1;
> >
> >Return a valid error code here even if we don't care what it is.  -ENODEV maybe.
> >And in fact, if we don't care what it is (LLCC is very optional) then just don't
> >return anything at all.
> Hi Jordan,
> 
> We do need the error code, as we have to let the iommu layer know, so
> that it can set additional properties for the buffers and the page
> tables.

Oh, I see what you did (you should set the flag outside of the function call
so it is clearer what is going on).
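
Something like this, for instance (an untested sketch using the names from
your patch):

	u32 mmu_features = 0;

	if (!a6xx_llc_slices_init(pdev, &a6xx_gpu->llc))
		mmu_features = MMU_FEATURE_USE_SYSTEM_CACHE;

	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, mmu_features);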

But why is the MMU flag conditional? If we don't end up activating our slice,
does it hurt to set the UPSTREAM_HINT in the pagetable?
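
That is, could msm_iommu_attach() just do the below unconditionally (a sketch,
reusing the attribute from this series)?

	int gpu_htw_llc = 1;

	iommu_domain_set_attr(iommu->domain, DOMAIN_ATTR_USE_SYS_CACHE,
			&gpu_htw_llc);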

Jordan

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

* Re: [v2 1/7] soc: qcom: llcc-slice: Add error checks for API functions
       [not found]     ` <1538744915-25490-2-git-send-email-smasetty-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
@ 2018-11-14 18:00       ` Andy Gross
  0 siblings, 0 replies; 14+ messages in thread
From: Andy Gross @ 2018-11-14 18:00 UTC (permalink / raw)
  To: Sharat Masetty
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	jcrouse-sgV2jX0FEOL9JmXXK+q4OQ,
	freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On Fri, Oct 05, 2018 at 06:38:29PM +0530, Sharat Masetty wrote:
> From: Jordan Crouse <jcrouse@codeaurora.org>
> 
> llcc_slice_getd can return an ERR_PTR code on failure. Add an IS_ERR_OR_NULL
> check to subsequent API calls that use struct llcc_slice_desc to guard
> against faults and to let the leaf drivers get away with safely using an
> ERR_PTR()-encoded "pointer" in the aftermath of a llcc_slice_getd error.
> 
> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
> Reviewed-by: Vivek Gautam <vivek.gautam@codeaurora.org>
> ---

Thanks for sending this.  I'll queue this up.

Andy

end of thread, other threads:[~2018-11-14 18:00 UTC | newest]

Thread overview: 14+ messages
2018-10-05 13:08 [v2 0/7] drm/msm/a6xx: System Cache Support Sharat Masetty
2018-10-05 13:08 ` [v2 2/7] iommu/arm-smmu: Add support to use Last level cache Sharat Masetty
2018-10-05 13:08 ` [v2 3/7] drm/msm: rearrange the gpu_rmw() function Sharat Masetty
2018-10-05 13:08 ` [v2 4/7] drm/msm/adreno: Add registers in the GPU CX domain Sharat Masetty
     [not found]   ` <1538744915-25490-5-git-send-email-smasetty-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2018-10-05 15:01     ` Jordan Crouse
     [not found]       ` <20181005150157.GI31641-9PYrDHPZ2Orvke4nUoYGnHL1okKdlPRT@public.gmane.org>
2018-10-08 13:46         ` Sharat Masetty
     [not found] ` <1538744915-25490-1-git-send-email-smasetty-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2018-10-05 13:08   ` [v2 1/7] soc: qcom: llcc-slice: Add error checks for API functions Sharat Masetty
     [not found]     ` <1538744915-25490-2-git-send-email-smasetty-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2018-11-14 18:00       ` Andy Gross
2018-10-05 13:08   ` [v2 5/7] arm64:dts:sdm845: Add register range for gpu CX Sharat Masetty
2018-10-05 13:08   ` [v2 6/7] drm/msm: Pass mmu features to generic layers Sharat Masetty
2018-10-05 13:08   ` [v2 7/7] drm/msm/a6xx: Add support for using system cache(LLC) Sharat Masetty
2018-10-05 15:07     ` Jordan Crouse
     [not found]       ` <20181005150745.GJ31641-9PYrDHPZ2Orvke4nUoYGnHL1okKdlPRT@public.gmane.org>
2018-10-08 13:59         ` Sharat Masetty
     [not found]           ` <4dd1439a-990e-6a34-0290-7adc4837ca7f-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2018-10-08 14:18             ` Jordan Crouse
