linux-arm-msm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] Initial support for the Adreno A5XX
@ 2016-11-04 22:44 Jordan Crouse
       [not found] ` <1478299497-9729-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
                   ` (4 more replies)
  0 siblings, 5 replies; 29+ messages in thread
From: Jordan Crouse @ 2016-11-04 22:44 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA

Here is the first blast of patches for support for the fine family of Adreno
A5XX GPUs. These are designed to work against Rob's version of the Linaro
target tree with some additional 8996 clock stuff added in:

https://github.com/freedreno/kernel-msm/tree/a5xx

In addition, you will also need these SCM changes for the last three patches to
work:

https://patchwork.kernel.org/patch/9291593/ arm64: kernel: Add SMC Session ID to
results
https://patchwork.kernel.org/patch/9291595/ firmware: qcom: scm: Fix interrupted
SCM calls
https://patchwork.kernel.org/patch/9353735/ firmware: qcom: scm: Fix interrupted
SCM calls fully
https://patchwork.codeaurora.org/patch/109673/ dt-bindings: firmware: scm: Add
MSM8996 DT bindings
https://patchwork.codeaurora.org/patch/109675/ firmware: qcom: scm: Remove core,
iface and bus clocks dependency
https://patchwork.codeaurora.org/patch/109671/ firmware: qcom: scm: Return
PTR_ERR when devm_clk_get fails

And you also need a updated copy of the register header files by way of the
rnndb database with this change on top of it:

https://patchwork.codeaurora.org/patch/110423/ rnndb: a5xx: Update/enhance
registers

if you do all these things, apply these patches and copy in the right firmware
files on the db820c you should be able to start doing useful things. And there
was much rejoicing.

This is a RFC so please tear into all of these with glee. I'll try to explain
myself as best as I can. There are a few that might generate a bit of discussion
but I'm hoping that together we can find the right solutions.

Very best regards,
Jordan
_______________________________________________
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 01/16] drm/msm: Remove dependency on COMMON_CLK
       [not found] ` <1478299497-9729-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
@ 2016-11-04 22:44   ` Jordan Crouse
  2016-11-04 22:44   ` [PATCH 02/16] drm/msm: Rename the MSM driver so it doesn't conflict with other drivers Jordan Crouse
                     ` (11 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Jordan Crouse @ 2016-11-04 22:44 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA

Remove an uneeded dependency on COMMON_CLK. Anything that uses
COMMON_CLK is already appropriately ifdefed.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/Kconfig b/drivers/gpu/drm/msm/Kconfig
index 7c7a031..8526c50 100644
--- a/drivers/gpu/drm/msm/Kconfig
+++ b/drivers/gpu/drm/msm/Kconfig
@@ -3,7 +3,7 @@ config DRM_MSM
 	tristate "MSM DRM"
 	depends on DRM
 	depends on ARCH_QCOM || (ARM && COMPILE_TEST)
-	depends on OF && COMMON_CLK
+	depends on OF
 	select REGULATOR
 	select DRM_KMS_HELPER
 	select DRM_PANEL
-- 
1.9.1

_______________________________________________
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 02/16] drm/msm: Rename the MSM driver so it doesn't conflict with other drivers
       [not found] ` <1478299497-9729-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
  2016-11-04 22:44   ` [PATCH 01/16] drm/msm: Remove dependency on COMMON_CLK Jordan Crouse
@ 2016-11-04 22:44   ` Jordan Crouse
  2016-11-04 22:44   ` [PATCH 03/16] drm/msm: gpu: Cut down the list of "generic" registers to the ones we use Jordan Crouse
                     ` (10 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Jordan Crouse @ 2016-11-04 22:44 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA

MSM is a very loaded name. Rename it to msm-drm to make it a little
bit more descriptive and to (kinda) avoid namespace contamination.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/msm_drv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 8a02370..1c749a2 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -1057,7 +1057,7 @@ static struct platform_driver msm_platform_driver = {
 	.probe      = msm_pdev_probe,
 	.remove     = msm_pdev_remove,
 	.driver     = {
-		.name   = "msm",
+		.name   = "msm-drm",
 		.of_match_table = dt_match,
 		.pm     = &msm_pm_ops,
 	},
-- 
1.9.1

_______________________________________________
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 03/16] drm/msm: gpu: Cut down the list of "generic" registers to the ones we use
       [not found] ` <1478299497-9729-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
  2016-11-04 22:44   ` [PATCH 01/16] drm/msm: Remove dependency on COMMON_CLK Jordan Crouse
  2016-11-04 22:44   ` [PATCH 02/16] drm/msm: Rename the MSM driver so it doesn't conflict with other drivers Jordan Crouse
@ 2016-11-04 22:44   ` Jordan Crouse
  2016-11-04 22:44   ` [PATCH 04/16] drm: msm: Flush the cache immediately after allocating pages Jordan Crouse
                     ` (9 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Jordan Crouse @ 2016-11-04 22:44 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA

There are very few register accesses in the common code. Cut down
the list of common registers to just those that are used.  This
saves const space and saves us the effort of maintaining registers
for A3XX and A4XX that don't exist or are unused.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c   | 80 ---------------------------------
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c   | 76 -------------------------------
 drivers/gpu/drm/msm/adreno/adreno_gpu.h | 59 ------------------------
 3 files changed, 215 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
index fd266ed..5eda847 100644
--- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
@@ -419,91 +419,11 @@ static void a3xx_dump(struct msm_gpu *gpu)
 }
 /* Register offset defines for A3XX */
 static const unsigned int a3xx_register_offsets[REG_ADRENO_REGISTER_MAX] = {
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_DEBUG, REG_AXXX_CP_DEBUG),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_ME_RAM_WADDR, REG_AXXX_CP_ME_RAM_WADDR),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_ME_RAM_DATA, REG_AXXX_CP_ME_RAM_DATA),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_PFP_UCODE_DATA,
-			REG_A3XX_CP_PFP_UCODE_DATA),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_PFP_UCODE_ADDR,
-			REG_A3XX_CP_PFP_UCODE_ADDR),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_WFI_PEND_CTR, REG_A3XX_CP_WFI_PEND_CTR),
 	REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_BASE, REG_AXXX_CP_RB_BASE),
 	REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_RPTR_ADDR, REG_AXXX_CP_RB_RPTR_ADDR),
 	REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_RPTR, REG_AXXX_CP_RB_RPTR),
 	REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_WPTR, REG_AXXX_CP_RB_WPTR),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_PROTECT_CTRL, REG_A3XX_CP_PROTECT_CTRL),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_ME_CNTL, REG_AXXX_CP_ME_CNTL),
 	REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_CNTL, REG_AXXX_CP_RB_CNTL),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_IB1_BASE, REG_AXXX_CP_IB1_BASE),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_IB1_BUFSZ, REG_AXXX_CP_IB1_BUFSZ),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_IB2_BASE, REG_AXXX_CP_IB2_BASE),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_IB2_BUFSZ, REG_AXXX_CP_IB2_BUFSZ),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_TIMESTAMP, REG_AXXX_CP_SCRATCH_REG0),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_ME_RAM_RADDR, REG_AXXX_CP_ME_RAM_RADDR),
-	REG_ADRENO_DEFINE(REG_ADRENO_SCRATCH_ADDR, REG_AXXX_SCRATCH_ADDR),
-	REG_ADRENO_DEFINE(REG_ADRENO_SCRATCH_UMSK, REG_AXXX_SCRATCH_UMSK),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_ROQ_ADDR, REG_A3XX_CP_ROQ_ADDR),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_ROQ_DATA, REG_A3XX_CP_ROQ_DATA),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_MERCIU_ADDR, REG_A3XX_CP_MERCIU_ADDR),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_MERCIU_DATA, REG_A3XX_CP_MERCIU_DATA),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_MERCIU_DATA2, REG_A3XX_CP_MERCIU_DATA2),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_MEQ_ADDR, REG_A3XX_CP_MEQ_ADDR),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_MEQ_DATA, REG_A3XX_CP_MEQ_DATA),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_HW_FAULT, REG_A3XX_CP_HW_FAULT),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_PROTECT_STATUS,
-			REG_A3XX_CP_PROTECT_STATUS),
-	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_STATUS, REG_A3XX_RBBM_STATUS),
-	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_PERFCTR_CTL,
-			REG_A3XX_RBBM_PERFCTR_CTL),
-	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_PERFCTR_LOAD_CMD0,
-			REG_A3XX_RBBM_PERFCTR_LOAD_CMD0),
-	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_PERFCTR_LOAD_CMD1,
-			REG_A3XX_RBBM_PERFCTR_LOAD_CMD1),
-	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_PERFCTR_PWR_1_LO,
-			REG_A3XX_RBBM_PERFCTR_PWR_1_LO),
-	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_INT_0_MASK, REG_A3XX_RBBM_INT_0_MASK),
-	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_INT_0_STATUS,
-			REG_A3XX_RBBM_INT_0_STATUS),
-	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_AHB_ERROR_STATUS,
-			REG_A3XX_RBBM_AHB_ERROR_STATUS),
-	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_AHB_CMD, REG_A3XX_RBBM_AHB_CMD),
-	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_INT_CLEAR_CMD,
-			REG_A3XX_RBBM_INT_CLEAR_CMD),
-	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_CLOCK_CTL, REG_A3XX_RBBM_CLOCK_CTL),
-	REG_ADRENO_DEFINE(REG_ADRENO_VPC_DEBUG_RAM_SEL,
-			REG_A3XX_VPC_VPC_DEBUG_RAM_SEL),
-	REG_ADRENO_DEFINE(REG_ADRENO_VPC_DEBUG_RAM_READ,
-			REG_A3XX_VPC_VPC_DEBUG_RAM_READ),
-	REG_ADRENO_DEFINE(REG_ADRENO_VSC_SIZE_ADDRESS,
-			REG_A3XX_VSC_SIZE_ADDRESS),
-	REG_ADRENO_DEFINE(REG_ADRENO_VFD_CONTROL_0, REG_A3XX_VFD_CONTROL_0),
-	REG_ADRENO_DEFINE(REG_ADRENO_VFD_INDEX_MAX, REG_A3XX_VFD_INDEX_MAX),
-	REG_ADRENO_DEFINE(REG_ADRENO_SP_VS_PVT_MEM_ADDR_REG,
-			REG_A3XX_SP_VS_PVT_MEM_ADDR_REG),
-	REG_ADRENO_DEFINE(REG_ADRENO_SP_FS_PVT_MEM_ADDR_REG,
-			REG_A3XX_SP_FS_PVT_MEM_ADDR_REG),
-	REG_ADRENO_DEFINE(REG_ADRENO_SP_VS_OBJ_START_REG,
-			REG_A3XX_SP_VS_OBJ_START_REG),
-	REG_ADRENO_DEFINE(REG_ADRENO_SP_FS_OBJ_START_REG,
-			REG_A3XX_SP_FS_OBJ_START_REG),
-	REG_ADRENO_DEFINE(REG_ADRENO_PA_SC_AA_CONFIG, REG_A3XX_PA_SC_AA_CONFIG),
-	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_PM_OVERRIDE2,
-			REG_A3XX_RBBM_PM_OVERRIDE2),
-	REG_ADRENO_DEFINE(REG_ADRENO_SCRATCH_REG2, REG_AXXX_CP_SCRATCH_REG2),
-	REG_ADRENO_DEFINE(REG_ADRENO_SQ_GPR_MANAGEMENT,
-			REG_A3XX_SQ_GPR_MANAGEMENT),
-	REG_ADRENO_DEFINE(REG_ADRENO_SQ_INST_STORE_MANAGMENT,
-			REG_A3XX_SQ_INST_STORE_MANAGMENT),
-	REG_ADRENO_DEFINE(REG_ADRENO_TP0_CHICKEN, REG_A3XX_TP0_CHICKEN),
-	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_RBBM_CTL, REG_A3XX_RBBM_RBBM_CTL),
-	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_SW_RESET_CMD,
-			REG_A3XX_RBBM_SW_RESET_CMD),
-	REG_ADRENO_DEFINE(REG_ADRENO_UCHE_INVALIDATE0,
-			REG_A3XX_UCHE_CACHE_INVALIDATE0_REG),
-	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_PERFCTR_LOAD_VALUE_LO,
-			REG_A3XX_RBBM_PERFCTR_LOAD_VALUE_LO),
-	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_PERFCTR_LOAD_VALUE_HI,
-			REG_A3XX_RBBM_PERFCTR_LOAD_VALUE_HI),
 };
 
 static const struct adreno_gpu_funcs funcs = {
diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
index d0d3c7b..0354be2 100644
--- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
@@ -460,87 +460,11 @@ static void a4xx_show(struct msm_gpu *gpu, struct seq_file *m)
 
 /* Register offset defines for A4XX, in order of enum adreno_regs */
 static const unsigned int a4xx_register_offsets[REG_ADRENO_REGISTER_MAX] = {
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_DEBUG, REG_A4XX_CP_DEBUG),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_ME_RAM_WADDR, REG_A4XX_CP_ME_RAM_WADDR),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_ME_RAM_DATA, REG_A4XX_CP_ME_RAM_DATA),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_PFP_UCODE_DATA,
-			REG_A4XX_CP_PFP_UCODE_DATA),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_PFP_UCODE_ADDR,
-			REG_A4XX_CP_PFP_UCODE_ADDR),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_WFI_PEND_CTR, REG_A4XX_CP_WFI_PEND_CTR),
 	REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_BASE, REG_A4XX_CP_RB_BASE),
 	REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_RPTR_ADDR, REG_A4XX_CP_RB_RPTR_ADDR),
 	REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_RPTR, REG_A4XX_CP_RB_RPTR),
 	REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_WPTR, REG_A4XX_CP_RB_WPTR),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_PROTECT_CTRL, REG_A4XX_CP_PROTECT_CTRL),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_ME_CNTL, REG_A4XX_CP_ME_CNTL),
 	REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_CNTL, REG_A4XX_CP_RB_CNTL),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_IB1_BASE, REG_A4XX_CP_IB1_BASE),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_IB1_BUFSZ, REG_A4XX_CP_IB1_BUFSZ),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_IB2_BASE, REG_A4XX_CP_IB2_BASE),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_IB2_BUFSZ, REG_A4XX_CP_IB2_BUFSZ),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_TIMESTAMP, REG_AXXX_CP_SCRATCH_REG0),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_ME_RAM_RADDR, REG_A4XX_CP_ME_RAM_RADDR),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_ROQ_ADDR, REG_A4XX_CP_ROQ_ADDR),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_ROQ_DATA, REG_A4XX_CP_ROQ_DATA),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_MERCIU_ADDR, REG_A4XX_CP_MERCIU_ADDR),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_MERCIU_DATA, REG_A4XX_CP_MERCIU_DATA),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_MERCIU_DATA2, REG_A4XX_CP_MERCIU_DATA2),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_MEQ_ADDR, REG_A4XX_CP_MEQ_ADDR),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_MEQ_DATA, REG_A4XX_CP_MEQ_DATA),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_HW_FAULT, REG_A4XX_CP_HW_FAULT),
-	REG_ADRENO_DEFINE(REG_ADRENO_CP_PROTECT_STATUS,
-			REG_A4XX_CP_PROTECT_STATUS),
-	REG_ADRENO_DEFINE(REG_ADRENO_SCRATCH_ADDR, REG_A4XX_CP_SCRATCH_ADDR),
-	REG_ADRENO_DEFINE(REG_ADRENO_SCRATCH_UMSK, REG_A4XX_CP_SCRATCH_UMASK),
-	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_STATUS, REG_A4XX_RBBM_STATUS),
-	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_PERFCTR_CTL,
-			REG_A4XX_RBBM_PERFCTR_CTL),
-	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_PERFCTR_LOAD_CMD0,
-			REG_A4XX_RBBM_PERFCTR_LOAD_CMD0),
-	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_PERFCTR_LOAD_CMD1,
-			REG_A4XX_RBBM_PERFCTR_LOAD_CMD1),
-	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_PERFCTR_LOAD_CMD2,
-			REG_A4XX_RBBM_PERFCTR_LOAD_CMD2),
-	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_PERFCTR_PWR_1_LO,
-			REG_A4XX_RBBM_PERFCTR_PWR_1_LO),
-	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_INT_0_MASK, REG_A4XX_RBBM_INT_0_MASK),
-	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_INT_0_STATUS,
-			REG_A4XX_RBBM_INT_0_STATUS),
-	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_AHB_ERROR_STATUS,
-			REG_A4XX_RBBM_AHB_ERROR_STATUS),
-	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_AHB_CMD, REG_A4XX_RBBM_AHB_CMD),
-	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_CLOCK_CTL, REG_A4XX_RBBM_CLOCK_CTL),
-	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_AHB_ME_SPLIT_STATUS,
-			REG_A4XX_RBBM_AHB_ME_SPLIT_STATUS),
-	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_AHB_PFP_SPLIT_STATUS,
-			REG_A4XX_RBBM_AHB_PFP_SPLIT_STATUS),
-	REG_ADRENO_DEFINE(REG_ADRENO_VPC_DEBUG_RAM_SEL,
-			REG_A4XX_VPC_DEBUG_RAM_SEL),
-	REG_ADRENO_DEFINE(REG_ADRENO_VPC_DEBUG_RAM_READ,
-			REG_A4XX_VPC_DEBUG_RAM_READ),
-	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_INT_CLEAR_CMD,
-			REG_A4XX_RBBM_INT_CLEAR_CMD),
-	REG_ADRENO_DEFINE(REG_ADRENO_VSC_SIZE_ADDRESS,
-			REG_A4XX_VSC_SIZE_ADDRESS),
-	REG_ADRENO_DEFINE(REG_ADRENO_VFD_CONTROL_0, REG_A4XX_VFD_CONTROL_0),
-	REG_ADRENO_DEFINE(REG_ADRENO_SP_VS_PVT_MEM_ADDR_REG,
-			REG_A4XX_SP_VS_PVT_MEM_ADDR),
-	REG_ADRENO_DEFINE(REG_ADRENO_SP_FS_PVT_MEM_ADDR_REG,
-			REG_A4XX_SP_FS_PVT_MEM_ADDR),
-	REG_ADRENO_DEFINE(REG_ADRENO_SP_VS_OBJ_START_REG,
-			REG_A4XX_SP_VS_OBJ_START),
-	REG_ADRENO_DEFINE(REG_ADRENO_SP_FS_OBJ_START_REG,
-			REG_A4XX_SP_FS_OBJ_START),
-	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_RBBM_CTL, REG_A4XX_RBBM_RBBM_CTL),
-	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_SW_RESET_CMD,
-			REG_A4XX_RBBM_SW_RESET_CMD),
-	REG_ADRENO_DEFINE(REG_ADRENO_UCHE_INVALIDATE0,
-			REG_A4XX_UCHE_INVALIDATE0),
-	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_PERFCTR_LOAD_VALUE_LO,
-			REG_A4XX_RBBM_PERFCTR_LOAD_VALUE_LO),
-	REG_ADRENO_DEFINE(REG_ADRENO_RBBM_PERFCTR_LOAD_VALUE_HI,
-			REG_A4XX_RBBM_PERFCTR_LOAD_VALUE_HI),
 };
 
 static void a4xx_dump(struct msm_gpu *gpu)
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
index a54f6e0..fd976e8 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
@@ -35,70 +35,11 @@
  * and are indexed by the enumeration values defined in this enum
  */
 enum adreno_regs {
-	REG_ADRENO_CP_DEBUG,
-	REG_ADRENO_CP_ME_RAM_WADDR,
-	REG_ADRENO_CP_ME_RAM_DATA,
-	REG_ADRENO_CP_PFP_UCODE_DATA,
-	REG_ADRENO_CP_PFP_UCODE_ADDR,
-	REG_ADRENO_CP_WFI_PEND_CTR,
 	REG_ADRENO_CP_RB_BASE,
 	REG_ADRENO_CP_RB_RPTR_ADDR,
 	REG_ADRENO_CP_RB_RPTR,
 	REG_ADRENO_CP_RB_WPTR,
-	REG_ADRENO_CP_PROTECT_CTRL,
-	REG_ADRENO_CP_ME_CNTL,
 	REG_ADRENO_CP_RB_CNTL,
-	REG_ADRENO_CP_IB1_BASE,
-	REG_ADRENO_CP_IB1_BUFSZ,
-	REG_ADRENO_CP_IB2_BASE,
-	REG_ADRENO_CP_IB2_BUFSZ,
-	REG_ADRENO_CP_TIMESTAMP,
-	REG_ADRENO_CP_ME_RAM_RADDR,
-	REG_ADRENO_CP_ROQ_ADDR,
-	REG_ADRENO_CP_ROQ_DATA,
-	REG_ADRENO_CP_MERCIU_ADDR,
-	REG_ADRENO_CP_MERCIU_DATA,
-	REG_ADRENO_CP_MERCIU_DATA2,
-	REG_ADRENO_CP_MEQ_ADDR,
-	REG_ADRENO_CP_MEQ_DATA,
-	REG_ADRENO_CP_HW_FAULT,
-	REG_ADRENO_CP_PROTECT_STATUS,
-	REG_ADRENO_SCRATCH_ADDR,
-	REG_ADRENO_SCRATCH_UMSK,
-	REG_ADRENO_SCRATCH_REG2,
-	REG_ADRENO_RBBM_STATUS,
-	REG_ADRENO_RBBM_PERFCTR_CTL,
-	REG_ADRENO_RBBM_PERFCTR_LOAD_CMD0,
-	REG_ADRENO_RBBM_PERFCTR_LOAD_CMD1,
-	REG_ADRENO_RBBM_PERFCTR_LOAD_CMD2,
-	REG_ADRENO_RBBM_PERFCTR_PWR_1_LO,
-	REG_ADRENO_RBBM_INT_0_MASK,
-	REG_ADRENO_RBBM_INT_0_STATUS,
-	REG_ADRENO_RBBM_AHB_ERROR_STATUS,
-	REG_ADRENO_RBBM_PM_OVERRIDE2,
-	REG_ADRENO_RBBM_AHB_CMD,
-	REG_ADRENO_RBBM_INT_CLEAR_CMD,
-	REG_ADRENO_RBBM_SW_RESET_CMD,
-	REG_ADRENO_RBBM_CLOCK_CTL,
-	REG_ADRENO_RBBM_AHB_ME_SPLIT_STATUS,
-	REG_ADRENO_RBBM_AHB_PFP_SPLIT_STATUS,
-	REG_ADRENO_VPC_DEBUG_RAM_SEL,
-	REG_ADRENO_VPC_DEBUG_RAM_READ,
-	REG_ADRENO_VSC_SIZE_ADDRESS,
-	REG_ADRENO_VFD_CONTROL_0,
-	REG_ADRENO_VFD_INDEX_MAX,
-	REG_ADRENO_SP_VS_PVT_MEM_ADDR_REG,
-	REG_ADRENO_SP_FS_PVT_MEM_ADDR_REG,
-	REG_ADRENO_SP_VS_OBJ_START_REG,
-	REG_ADRENO_SP_FS_OBJ_START_REG,
-	REG_ADRENO_PA_SC_AA_CONFIG,
-	REG_ADRENO_SQ_GPR_MANAGEMENT,
-	REG_ADRENO_SQ_INST_STORE_MANAGMENT,
-	REG_ADRENO_TP0_CHICKEN,
-	REG_ADRENO_RBBM_RBBM_CTL,
-	REG_ADRENO_UCHE_INVALIDATE0,
-	REG_ADRENO_RBBM_PERFCTR_LOAD_VALUE_LO,
-	REG_ADRENO_RBBM_PERFCTR_LOAD_VALUE_HI,
 	REG_ADRENO_REGISTER_MAX,
 };
 
-- 
1.9.1

_______________________________________________
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 04/16] drm: msm: Flush the cache immediately after allocating pages
       [not found] ` <1478299497-9729-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
                     ` (2 preceding siblings ...)
  2016-11-04 22:44   ` [PATCH 03/16] drm/msm: gpu: Cut down the list of "generic" registers to the ones we use Jordan Crouse
@ 2016-11-04 22:44   ` Jordan Crouse
  2016-11-06 14:15     ` [Freedreno] " Rob Clark
  2016-11-04 22:44   ` [PATCH 05/16] drm/msm: gpu: Return error on hw_init failure Jordan Crouse
                     ` (8 subsequent siblings)
  12 siblings, 1 reply; 29+ messages in thread
From: Jordan Crouse @ 2016-11-04 22:44 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA

For reasons that are not entirely understood using dma_map_sg()
for nocache/write combine buffers doesn't always successfully flush
the cache after the memory is zeroed somewhere deep in the bowels
of the shmem code.  My working theory is that the cache flush on
the swiotlb bounce buffer address work isn't always flushing what
we need.

Instead of using dma_map_sg() directly kmap and flush each page
at allocate time.  We could use invalidate + clean or just invalidate
if we wanted to but on ARM64 using a flush is safer and not much
slower for what we are trying to do.

Hopefully someday I'll more clearly understand the relationship between
shmem  kmap, vmap and the swiotlb bounce buffer and we can be smarter
about when and how we invalidate the caches.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/msm_gem.c | 21 ++++++++-------------
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 85f3047..29f5a30 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -79,6 +79,7 @@ static struct page **get_pages(struct drm_gem_object *obj)
 		struct drm_device *dev = obj->dev;
 		struct page **p;
 		int npages = obj->size >> PAGE_SHIFT;
+		int i;
 
 		if (use_pages(obj))
 			p = drm_gem_get_pages(obj);
@@ -91,6 +92,13 @@ static struct page **get_pages(struct drm_gem_object *obj)
 			return p;
 		}
 
+		for (i = 0; i < npages; i++) {
+			void *addr = kmap_atomic(p[i]);
+
+			__dma_flush_range(addr, addr + PAGE_SIZE);
+			kunmap_atomic(addr);
+		}
+
 		msm_obj->sgt = drm_prime_pages_to_sg(p, npages);
 		if (IS_ERR(msm_obj->sgt)) {
 			dev_err(dev->dev, "failed to allocate sgt\n");
@@ -98,13 +106,6 @@ static struct page **get_pages(struct drm_gem_object *obj)
 		}
 
 		msm_obj->pages = p;
-
-		/* For non-cached buffers, ensure the new pages are clean
-		 * because display controller, GPU, etc. are not coherent:
-		 */
-		if (msm_obj->flags & (MSM_BO_WC|MSM_BO_UNCACHED))
-			dma_map_sg(dev->dev, msm_obj->sgt->sgl,
-					msm_obj->sgt->nents, DMA_BIDIRECTIONAL);
 	}
 
 	return msm_obj->pages;
@@ -115,12 +116,6 @@ static void put_pages(struct drm_gem_object *obj)
 	struct msm_gem_object *msm_obj = to_msm_bo(obj);
 
 	if (msm_obj->pages) {
-		/* For non-cached buffers, ensure the new pages are clean
-		 * because display controller, GPU, etc. are not coherent:
-		 */
-		if (msm_obj->flags & (MSM_BO_WC|MSM_BO_UNCACHED))
-			dma_unmap_sg(obj->dev->dev, msm_obj->sgt->sgl,
-					msm_obj->sgt->nents, DMA_BIDIRECTIONAL);
 		sg_free_table(msm_obj->sgt);
 		kfree(msm_obj->sgt);
 
-- 
1.9.1

_______________________________________________
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 05/16] drm/msm: gpu: Return error on hw_init failure
       [not found] ` <1478299497-9729-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
                     ` (3 preceding siblings ...)
  2016-11-04 22:44   ` [PATCH 04/16] drm: msm: Flush the cache immediately after allocating pages Jordan Crouse
@ 2016-11-04 22:44   ` Jordan Crouse
       [not found]     ` <1478299497-9729-6-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
  2016-11-04 22:44   ` [PATCH 06/16] drm/msm: gpu: Add OUT_TYPE4 and OUT_TYPE7 Jordan Crouse
                     ` (7 subsequent siblings)
  12 siblings, 1 reply; 29+ messages in thread
From: Jordan Crouse @ 2016-11-04 22:44 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA

When the GPU hardware init function fails (like say, ME_INIT timed
out) return error instead of blindly continuing on. This gives us
a small chance of saving the system before it goes boom.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c   | 21 ++++++++++++---------
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c   | 20 +++++++++++---------
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 11 +++++------
 drivers/gpu/drm/msm/adreno/adreno_gpu.h |  2 +-
 drivers/gpu/drm/msm/msm_gpu.h           |  2 +-
 5 files changed, 30 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
index 5eda847..13989d9 100644
--- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
@@ -41,7 +41,7 @@ extern bool hang_debug;
 
 static void a3xx_dump(struct msm_gpu *gpu);
 
-static void a3xx_me_init(struct msm_gpu *gpu)
+static bool a3xx_me_init(struct msm_gpu *gpu)
 {
 	struct msm_ringbuffer *ring = gpu->rb;
 
@@ -65,7 +65,7 @@ static void a3xx_me_init(struct msm_gpu *gpu)
 	OUT_RING(ring, 0x00000000);
 
 	gpu->funcs->flush(gpu);
-	gpu->funcs->idle(gpu);
+	return gpu->funcs->idle(gpu);
 }
 
 static int a3xx_hw_init(struct msm_gpu *gpu)
@@ -294,9 +294,7 @@ static int a3xx_hw_init(struct msm_gpu *gpu)
 	/* clear ME_HALT to start micro engine */
 	gpu_write(gpu, REG_AXXX_CP_ME_CNTL, 0);
 
-	a3xx_me_init(gpu);
-
-	return 0;
+	return a3xx_me_init(gpu) ? 0 : -EINVAL;
 }
 
 static void a3xx_recover(struct msm_gpu *gpu)
@@ -330,17 +328,22 @@ static void a3xx_destroy(struct msm_gpu *gpu)
 	kfree(a3xx_gpu);
 }
 
-static void a3xx_idle(struct msm_gpu *gpu)
+static bool a3xx_idle(struct msm_gpu *gpu)
 {
 	/* wait for ringbuffer to drain: */
-	adreno_idle(gpu);
+	if (!adreno_idle(gpu))
+		return false;
 
 	/* then wait for GPU to finish: */
 	if (spin_until(!(gpu_read(gpu, REG_A3XX_RBBM_STATUS) &
-			A3XX_RBBM_STATUS_GPU_BUSY)))
+			A3XX_RBBM_STATUS_GPU_BUSY))) {
 		DRM_ERROR("%s: timeout waiting for GPU to idle!\n", gpu->name);
 
-	/* TODO maybe we need to reset GPU here to recover from hang? */
+		/* TODO maybe we need to reset GPU here to recover from hang? */
+		return false;
+	}
+
+	return true;
 }
 
 static irqreturn_t a3xx_irq(struct msm_gpu *gpu)
diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
index 0354be2..ba16507 100644
--- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
@@ -113,7 +113,7 @@ static void a4xx_enable_hwcg(struct msm_gpu *gpu)
 }
 
 
-static void a4xx_me_init(struct msm_gpu *gpu)
+static bool a4xx_me_init(struct msm_gpu *gpu)
 {
 	struct msm_ringbuffer *ring = gpu->rb;
 
@@ -137,7 +137,7 @@ static void a4xx_me_init(struct msm_gpu *gpu)
 	OUT_RING(ring, 0x00000000);
 
 	gpu->funcs->flush(gpu);
-	gpu->funcs->idle(gpu);
+	return gpu->funcs->idle(gpu);
 }
 
 static int a4xx_hw_init(struct msm_gpu *gpu)
@@ -292,9 +292,7 @@ static int a4xx_hw_init(struct msm_gpu *gpu)
 	/* clear ME_HALT to start micro engine */
 	gpu_write(gpu, REG_A4XX_CP_ME_CNTL, 0);
 
-	a4xx_me_init(gpu);
-
-	return 0;
+	return a4xx_me_init(gpu) ? 0 : -EINVAL;
 }
 
 static void a4xx_recover(struct msm_gpu *gpu)
@@ -328,17 +326,21 @@ static void a4xx_destroy(struct msm_gpu *gpu)
 	kfree(a4xx_gpu);
 }
 
-static void a4xx_idle(struct msm_gpu *gpu)
+static bool a4xx_idle(struct msm_gpu *gpu)
 {
 	/* wait for ringbuffer to drain: */
-	adreno_idle(gpu);
+	if (!adreno_idle(gpu))
+		return false;
 
 	/* then wait for GPU to finish: */
 	if (spin_until(!(gpu_read(gpu, REG_A4XX_RBBM_STATUS) &
-					A4XX_RBBM_STATUS_GPU_BUSY)))
+					A4XX_RBBM_STATUS_GPU_BUSY))) {
 		DRM_ERROR("%s: timeout waiting for GPU to idle!\n", gpu->name);
+		/* TODO maybe we need to reset GPU here to recover from hang? */
+		return false;
+	}
 
-	/* TODO maybe we need to reset GPU here to recover from hang? */
+	return true;
 }
 
 static irqreturn_t a4xx_irq(struct msm_gpu *gpu)
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index f386f46..8b2201c 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -218,19 +218,18 @@ void adreno_flush(struct msm_gpu *gpu)
 	adreno_gpu_write(adreno_gpu, REG_ADRENO_CP_RB_WPTR, wptr);
 }
 
-void adreno_idle(struct msm_gpu *gpu)
+bool adreno_idle(struct msm_gpu *gpu)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	uint32_t wptr = get_wptr(gpu->rb);
-	int ret;
 
 	/* wait for CP to drain ringbuffer: */
-	ret = spin_until(get_rptr(adreno_gpu) == wptr);
-
-	if (ret)
-		DRM_ERROR("%s: timeout waiting to drain ringbuffer!\n", gpu->name);
+	if (!spin_until(get_rptr(adreno_gpu) == wptr))
+		return true;
 
 	/* TODO maybe we need to reset GPU here to recover from hang? */
+	DRM_ERROR("%s: timeout waiting to drain ringbuffer!\n", gpu->name);
+	return false;
 }
 
 #ifdef CONFIG_DEBUG_FS
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
index fd976e8..30c2956 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
@@ -182,7 +182,7 @@ void adreno_recover(struct msm_gpu *gpu);
 void adreno_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 		struct msm_file_private *ctx);
 void adreno_flush(struct msm_gpu *gpu);
-void adreno_idle(struct msm_gpu *gpu);
+bool adreno_idle(struct msm_gpu *gpu);
 #ifdef CONFIG_DEBUG_FS
 void adreno_show(struct msm_gpu *gpu, struct seq_file *m);
 #endif
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index c902283..4ee95ca 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -50,7 +50,7 @@ struct msm_gpu_funcs {
 	void (*submit)(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 			struct msm_file_private *ctx);
 	void (*flush)(struct msm_gpu *gpu);
-	void (*idle)(struct msm_gpu *gpu);
+	bool (*idle)(struct msm_gpu *gpu);
 	irqreturn_t (*irq)(struct msm_gpu *irq);
 	uint32_t (*last_fence)(struct msm_gpu *gpu);
 	void (*recover)(struct msm_gpu *gpu);
-- 
1.9.1

_______________________________________________
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 06/16] drm/msm: gpu: Add OUT_TYPE4 and OUT_TYPE7
       [not found] ` <1478299497-9729-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
                     ` (4 preceding siblings ...)
  2016-11-04 22:44   ` [PATCH 05/16] drm/msm: gpu: Return error on hw_init failure Jordan Crouse
@ 2016-11-04 22:44   ` Jordan Crouse
  2016-11-04 22:44   ` [PATCH 07/16] drm/msm: Add adreno_gpu_write64() Jordan Crouse
                     ` (6 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Jordan Crouse @ 2016-11-04 22:44 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA

Add helper functions for TYPE4 and TYPE7 ME opcodes that replace
TYPE0 and TYPE3 starting with the A5XX targets.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/adreno_gpu.h | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
index 30c2956..707487f 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
@@ -219,6 +219,36 @@ OUT_PKT3(struct msm_ringbuffer *ring, uint8_t opcode, uint16_t cnt)
 	OUT_RING(ring, CP_TYPE3_PKT | ((cnt-1) << 16) | ((opcode & 0xFF) << 8));
 }
 
+static inline u32 PM4_PARITY(u32 val)
+{
+	return (0x9669 >> (0xF & (val ^
+		(val >> 4) ^ (val >> 8) ^ (val >> 12) ^
+		(val >> 16) ^ ((val) >> 20) ^ (val >> 24) ^
+		(val >> 28)))) & 1;
+}
+
+/* Maximum number of values that can be executed for one opcode */
+#define TYPE4_MAX_PAYLOAD 127
+
+#define PKT4(_reg, _cnt) \
+	(CP_TYPE4_PKT | ((_cnt) << 0) | (PM4_PARITY((_cnt)) << 7) | \
+	 (((_reg) & 0x3FFFF) << 8) | (PM4_PARITY((_reg)) << 27))
+
+static inline void
+OUT_PKT4(struct msm_ringbuffer *ring, uint16_t regindx, uint16_t cnt)
+{
+	adreno_wait_ring(ring->gpu, cnt + 1);
+	OUT_RING(ring, PKT4(regindx, cnt));
+}
+
+static inline void
+OUT_PKT7(struct msm_ringbuffer *ring, uint8_t opcode, uint16_t cnt)
+{
+	adreno_wait_ring(ring->gpu, cnt + 1);
+	OUT_RING(ring, CP_TYPE7_PKT | (cnt << 0) | (PM4_PARITY(cnt) << 15) |
+		((opcode & 0x7F) << 16) | (PM4_PARITY(opcode) << 23));
+}
+
 /*
  * adreno_checkreg_off() - Checks the validity of a register enum
  * @gpu:		Pointer to struct adreno_gpu
-- 
1.9.1

_______________________________________________
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 07/16] drm/msm: Add adreno_gpu_write64()
       [not found] ` <1478299497-9729-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
                     ` (5 preceding siblings ...)
  2016-11-04 22:44   ` [PATCH 06/16] drm/msm: gpu: Add OUT_TYPE4 and OUT_TYPE7 Jordan Crouse
@ 2016-11-04 22:44   ` Jordan Crouse
  2016-11-07 19:19     ` [Freedreno] " Rob Clark
  2016-11-04 22:44   ` [PATCH 08/16] drm/msm: Remove 'src_clk' from adreno configuration Jordan Crouse
                     ` (5 subsequent siblings)
  12 siblings, 1 reply; 29+ messages in thread
From: Jordan Crouse @ 2016-11-04 22:44 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA

Add a new generic function to write a "64" bit value. This isn't
actually a 64 bit operation, it just writes the upper and lower
32 bit of a 64 bit value to a specified LO and HI register.  If
a particular target doesn't support one of the registers it can
mark that register as SKIP and writes/reads from that register
will be quietly dropped.

This can be immediately put in place for the ringbuffer base and
the RPTR address.  Both writes are converted to use
adreno_gpu_write64() with their respective high and low registers
and the high register appropriately marked as SKIP for both 32 bit
targets (a3xx and a4xx). When a5xx comes it will define valid target
registers for the 'hi' option and everything else will just work.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c   |  2 ++
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c   |  2 ++
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 11 +++++++----
 drivers/gpu/drm/msm/adreno/adreno_gpu.h | 24 +++++++++++++++++++++++-
 4 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
index 13989d9..806aab4 100644
--- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
@@ -423,7 +423,9 @@ static void a3xx_dump(struct msm_gpu *gpu)
 /* Register offset defines for A3XX */
 static const unsigned int a3xx_register_offsets[REG_ADRENO_REGISTER_MAX] = {
 	REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_BASE, REG_AXXX_CP_RB_BASE),
+	REG_ADRENO_SKIP(REG_ADRENO_CP_RB_BASE_HI),
 	REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_RPTR_ADDR, REG_AXXX_CP_RB_RPTR_ADDR),
+	REG_ADRENO_SKIP(REG_ADRENO_CP_RB_RPTR_ADDR_HI),
 	REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_RPTR, REG_AXXX_CP_RB_RPTR),
 	REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_WPTR, REG_AXXX_CP_RB_WPTR),
 	REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_CNTL, REG_AXXX_CP_RB_CNTL),
diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
index ba16507..f39e082 100644
--- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
@@ -463,7 +463,9 @@ static void a4xx_show(struct msm_gpu *gpu, struct seq_file *m)
 /* Register offset defines for A4XX, in order of enum adreno_regs */
 static const unsigned int a4xx_register_offsets[REG_ADRENO_REGISTER_MAX] = {
 	REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_BASE, REG_A4XX_CP_RB_BASE),
+	REG_ADRENO_SKIP(REG_ADRENO_CP_RB_BASE_HI),
 	REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_RPTR_ADDR, REG_A4XX_CP_RB_RPTR_ADDR),
+	REG_ADRENO_SKIP(REG_ADRENO_CP_RB_RPTR_ADDR_HI),
 	REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_RPTR, REG_A4XX_CP_RB_RPTR),
 	REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_WPTR, REG_A4XX_CP_RB_WPTR),
 	REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_CNTL, REG_A4XX_CP_RB_CNTL),
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 8b2201c..d7b62b8 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -79,11 +79,14 @@ int adreno_hw_init(struct msm_gpu *gpu)
 			(adreno_is_a430(adreno_gpu) ? AXXX_CP_RB_CNTL_NO_UPDATE : 0));
 
 	/* Setup ringbuffer address: */
-	adreno_gpu_write(adreno_gpu, REG_ADRENO_CP_RB_BASE, gpu->rb_iova);
+	adreno_gpu_write64(adreno_gpu, REG_ADRENO_CP_RB_BASE,
+		REG_ADRENO_CP_RB_BASE_HI, gpu->rb_iova);
 
-	if (!adreno_is_a430(adreno_gpu))
-		adreno_gpu_write(adreno_gpu, REG_ADRENO_CP_RB_RPTR_ADDR,
-						rbmemptr(adreno_gpu, rptr));
+	if (!adreno_is_a430(adreno_gpu)) {
+		adreno_gpu_write64(adreno_gpu, REG_ADRENO_CP_RB_RPTR_ADDR,
+			REG_ADRENO_CP_RB_RPTR_ADDR_HI,
+			rbmemptr(adreno_gpu, rptr));
+	}
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
index 707487f..7b9895f 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
@@ -28,6 +28,9 @@
 #include "adreno_pm4.xml.h"
 
 #define REG_ADRENO_DEFINE(_offset, _reg) [_offset] = (_reg) + 1
+#define REG_SKIP ~0
+#define REG_ADRENO_SKIP(_offset) [_offset] = REG_SKIP
+
 /**
  * adreno_regs: List of registers that are used in across all
  * 3D devices. Each device type has different offset value for the same
@@ -36,7 +39,9 @@
  */
 enum adreno_regs {
 	REG_ADRENO_CP_RB_BASE,
+	REG_ADRENO_CP_RB_BASE_HI,
 	REG_ADRENO_CP_RB_RPTR_ADDR,
+	REG_ADRENO_CP_RB_RPTR_ADDR_HI,
 	REG_ADRENO_CP_RB_RPTR,
 	REG_ADRENO_CP_RB_WPTR,
 	REG_ADRENO_CP_RB_CNTL,
@@ -250,7 +255,7 @@ OUT_PKT7(struct msm_ringbuffer *ring, uint8_t opcode, uint16_t cnt)
 }
 
 /*
- * adreno_checkreg_off() - Checks the validity of a register enum
+ * adreno_reg_check() - Checks the validity of a register enum
  * @gpu:		Pointer to struct adreno_gpu
  * @offset_name:	The register enum that is checked
  */
@@ -261,6 +266,16 @@ static inline bool adreno_reg_check(struct adreno_gpu *gpu,
 			!gpu->reg_offsets[offset_name]) {
 		BUG();
 	}
+
+	/*
+	 * REG_SKIP is a special value that tell us that the register in
+	 * question isn't implemented on target but don't trigger a BUG(). This
+	 * is used to cleanly implement adreno_gpu_write64() and
+	 * adreno_gpu_read64() in a generic fashion
+	 */
+	if (gpu->reg_offsets[offset_name] == REG_SKIP)
+		return false;
+
 	return true;
 }
 
@@ -282,4 +297,11 @@ static inline void adreno_gpu_write(struct adreno_gpu *gpu,
 		gpu_write(&gpu->base, reg - 1, data);
 }
 
+static inline void adreno_gpu_write64(struct adreno_gpu *gpu,
+		enum adreno_regs lo, enum adreno_regs hi, u64 data)
+{
+	adreno_gpu_write(gpu, lo, lower_32_bits(data));
+	adreno_gpu_write(gpu, hi, upper_32_bits(data));
+}
+
 #endif /* __ADRENO_GPU_H__ */
-- 
1.9.1

_______________________________________________
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 08/16] drm/msm: Remove 'src_clk' from adreno configuration
       [not found] ` <1478299497-9729-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
                     ` (6 preceding siblings ...)
  2016-11-04 22:44   ` [PATCH 07/16] drm/msm: Add adreno_gpu_write64() Jordan Crouse
@ 2016-11-04 22:44   ` Jordan Crouse
  2016-11-04 22:44   ` [PATCH 09/16] drm/msm: gpu Add new gpu register read/write functions Jordan Crouse
                     ` (4 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Jordan Crouse @ 2016-11-04 22:44 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA

The adreno code inherited a silly workaround from downstream
from the bad old days before decent clock control. grp_clk[0]
(named 'src_clk') doesn't actually exist - it was used as a proxy
for whatever the core clock actually was (usually 'core_clk').

All targets should be able to correctly request 'core_clk' and
get the right thing back so zap the anachronism and directly
use grp_clk[0] to control the clock rate.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/msm_gpu.c | 36 +++++++++++++-----------------------
 drivers/gpu/drm/msm/msm_gpu.h |  2 +-
 2 files changed, 14 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 36ed53e..207c7d2e 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -91,21 +91,16 @@ static int disable_pwrrail(struct msm_gpu *gpu)
 
 static int enable_clk(struct msm_gpu *gpu)
 {
-	struct clk *rate_clk = NULL;
 	int i;
 
-	/* NOTE: kgsl_pwrctrl_clk() ignores grp_clks[0].. */
-	for (i = ARRAY_SIZE(gpu->grp_clks) - 1; i > 0; i--) {
-		if (gpu->grp_clks[i]) {
-			clk_prepare(gpu->grp_clks[i]);
-			rate_clk = gpu->grp_clks[i];
-		}
-	}
+	if (gpu->grp_clks[0] && gpu->fast_rate)
+		clk_set_rate(gpu->grp_clks[0], gpu->fast_rate);
 
-	if (rate_clk && gpu->fast_rate)
-		clk_set_rate(rate_clk, gpu->fast_rate);
+	for (i = ARRAY_SIZE(gpu->grp_clks) - 1; i >= 0; i--)
+		if (gpu->grp_clks[i])
+			clk_prepare(gpu->grp_clks[i]);
 
-	for (i = ARRAY_SIZE(gpu->grp_clks) - 1; i > 0; i--)
+	for (i = ARRAY_SIZE(gpu->grp_clks) - 1; i >= 0; i--)
 		if (gpu->grp_clks[i])
 			clk_enable(gpu->grp_clks[i]);
 
@@ -114,24 +109,19 @@ static int enable_clk(struct msm_gpu *gpu)
 
 static int disable_clk(struct msm_gpu *gpu)
 {
-	struct clk *rate_clk = NULL;
 	int i;
 
-	/* NOTE: kgsl_pwrctrl_clk() ignores grp_clks[0].. */
-	for (i = ARRAY_SIZE(gpu->grp_clks) - 1; i > 0; i--) {
-		if (gpu->grp_clks[i]) {
+	for (i = ARRAY_SIZE(gpu->grp_clks) - 1; i >= 0; i--)
+		if (gpu->grp_clks[i])
 			clk_disable(gpu->grp_clks[i]);
-			rate_clk = gpu->grp_clks[i];
-		}
-	}
 
-	if (rate_clk && gpu->slow_rate)
-		clk_set_rate(rate_clk, gpu->slow_rate);
-
-	for (i = ARRAY_SIZE(gpu->grp_clks) - 1; i > 0; i--)
+	for (i = ARRAY_SIZE(gpu->grp_clks) - 1; i >= 0; i--)
 		if (gpu->grp_clks[i])
 			clk_unprepare(gpu->grp_clks[i]);
 
+	if (gpu->grp_clks[0] && gpu->slow_rate)
+		clk_set_rate(gpu->grp_clks[0], gpu->slow_rate);
+
 	return 0;
 }
 
@@ -572,7 +562,7 @@ static irqreturn_t irq_handler(int irq, void *data)
 }
 
 static const char *clk_names[] = {
-		"src_clk", "core_clk", "iface_clk", "mem_clk", "mem_iface_clk",
+		"core_clk", "iface_clk", "mem_clk", "mem_iface_clk",
 		"alt_mem_iface_clk",
 };
 
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index 4ee95ca..161cd2f 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -103,7 +103,7 @@ struct msm_gpu {
 
 	/* Power Control: */
 	struct regulator *gpu_reg, *gpu_cx;
-	struct clk *ebi1_clk, *grp_clks[6];
+	struct clk *ebi1_clk, *grp_clks[5];
 	uint32_t fast_rate, slow_rate, bus_freq;
 
 #ifdef DOWNSTREAM_CONFIG_MSM_BUS_SCALING
-- 
1.9.1

_______________________________________________
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 09/16] drm/msm: gpu Add new gpu register read/write functions
       [not found] ` <1478299497-9729-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
                     ` (7 preceding siblings ...)
  2016-11-04 22:44   ` [PATCH 08/16] drm/msm: Remove 'src_clk' from adreno configuration Jordan Crouse
@ 2016-11-04 22:44   ` Jordan Crouse
       [not found]     ` <1478299497-9729-10-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
  2016-11-04 22:44   ` [PATCH 10/16] drm/msm: Disable interrupts during init Jordan Crouse
                     ` (3 subsequent siblings)
  12 siblings, 1 reply; 29+ messages in thread
From: Jordan Crouse @ 2016-11-04 22:44 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA

Add some new functions to manipulate GPU registers.  gpu_read64 and
gpu_write64 can read/write a 64 bit value to two 32 bit registers.
For 4XX and older these are normally perfcounter registers, but
future targets will use 64 bit addressing so there will be many
more spots where a 64 bit read and write are needed.

gpu_rmw() does a read/modify/write on a 32 bit register given a mask
and bits to OR in.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c | 12 ++---------
 drivers/gpu/drm/msm/msm_gpu.h         | 39 +++++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
index f39e082..f2f9c91 100644
--- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
@@ -515,16 +515,8 @@ static int a4xx_pm_suspend(struct msm_gpu *gpu) {
 
 static int a4xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
 {
-	uint32_t hi, lo, tmp;
-
-	tmp = gpu_read(gpu, REG_A4XX_RBBM_PERFCTR_CP_0_HI);
-	do {
-		hi = tmp;
-		lo = gpu_read(gpu, REG_A4XX_RBBM_PERFCTR_CP_0_LO);
-		tmp = gpu_read(gpu, REG_A4XX_RBBM_PERFCTR_CP_0_HI);
-	} while (tmp != hi);
-
-	*value = (((uint64_t)hi) << 32) | lo;
+	*value = gpu_read64(gpu, REG_A4XX_RBBM_PERFCTR_CP_0_LO,
+		REG_A4XX_RBBM_PERFCTR_CP_0_HI);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index 161cd2f..bec3735 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -154,6 +154,45 @@ static inline u32 gpu_read(struct msm_gpu *gpu, u32 reg)
 	return msm_readl(gpu->mmio + (reg << 2));
 }
 
+static inline void gpu_rmw(struct msm_gpu *gpu, u32 reg, u32 mask, u32 or)
+{
+	uint32_t val = gpu_read(gpu, reg);
+
+	val &= ~mask;
+	gpu_write(gpu, reg, val | or);
+}
+
+static inline u64 gpu_read64(struct msm_gpu *gpu, u32 lo, u32 hi)
+{
+	u64 val;
+
+	/*
+	 * Why not a readq here? Two reasons: 1) many of the LO registers are
+	 * not quad word aligned and 2) the GPU hardware designers have a bit
+	 * of a history of putting registers where they fit, especially in
+	 * spins. The longer a GPU family goes the higher the chance that
+	 * we'll get burned.  We could do a series of validity checks if we
+	 * wanted to, but really is a readq() that much better? Nah.
+	 */
+
+	/*
+	 * For some lo/hi registers (like perfcounters), the hi value is latched
+	 * when the lo is read, so make sure to read the lo first to trigger
+	 * that
+	 */
+	val = (u64) msm_readl(gpu->mmio + (lo << 2));
+	val |= ((u64) msm_readl(gpu->mmio + (hi << 2)) << 32);
+
+	return val;
+}
+
+static inline void gpu_write64(struct msm_gpu *gpu, u32 lo, u32 hi, u64 val)
+{
+	/* Why not a writeq here? Read the screed above */
+	msm_writel(lower_32_bits(val), gpu->mmio + (lo << 2));
+	msm_writel(upper_32_bits(val), gpu->mmio + (hi << 2));
+}
+
 int msm_gpu_pm_suspend(struct msm_gpu *gpu);
 int msm_gpu_pm_resume(struct msm_gpu *gpu);
 
-- 
1.9.1

_______________________________________________
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 10/16] drm/msm: Disable interrupts during init
       [not found] ` <1478299497-9729-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
                     ` (8 preceding siblings ...)
  2016-11-04 22:44   ` [PATCH 09/16] drm/msm: gpu Add new gpu register read/write functions Jordan Crouse
@ 2016-11-04 22:44   ` Jordan Crouse
  2016-11-04 22:44   ` [PATCH 13/16] drm/msm: gpu: Add support for the GPMU Jordan Crouse
                     ` (2 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Jordan Crouse @ 2016-11-04 22:44 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA

Disable the interrupt during the init sequence to avoid having
interrupts fired for errors and other things that we are not
ready to handle while initializing.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/adreno_device.c | 4 ++++
 drivers/gpu/drm/msm/adreno/adreno_gpu.c    | 3 +++
 2 files changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c
index 5127b75..3321d33 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_device.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
@@ -148,12 +148,16 @@ struct msm_gpu *adreno_load_gpu(struct drm_device *dev)
 		mutex_lock(&dev->struct_mutex);
 		gpu->funcs->pm_resume(gpu);
 		mutex_unlock(&dev->struct_mutex);
+
+		disable_irq(gpu->irq);
+
 		ret = gpu->funcs->hw_init(gpu);
 		if (ret) {
 			dev_err(dev->dev, "gpu hw init failed: %d\n", ret);
 			gpu->funcs->destroy(gpu);
 			gpu = NULL;
 		} else {
+			enable_irq(gpu->irq);
 			/* give inactive pm a chance to kick in: */
 			msm_gpu_retire(gpu);
 		}
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index d7b62b8..6cc1d7f 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -129,11 +129,14 @@ void adreno_recover(struct msm_gpu *gpu)
 	adreno_gpu->memptrs->wptr  = 0;
 
 	gpu->funcs->pm_resume(gpu);
+
+	disable_irq(gpu->irq);
 	ret = gpu->funcs->hw_init(gpu);
 	if (ret) {
 		dev_err(dev->dev, "gpu hw init failed: %d\n", ret);
 		/* hmm, oh well? */
 	}
+	enable_irq(gpu->irq);
 }
 
 void adreno_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
-- 
1.9.1

_______________________________________________
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 11/16] arm64: dts: Add Adreno GPU and GPU smmu definitions
  2016-11-04 22:44 [RFC] Initial support for the Adreno A5XX Jordan Crouse
       [not found] ` <1478299497-9729-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
@ 2016-11-04 22:44 ` Jordan Crouse
  2016-11-04 22:44 ` [PATCH 12/16] drm/msm: gpu: Add A5XX target support Jordan Crouse
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 29+ messages in thread
From: Jordan Crouse @ 2016-11-04 22:44 UTC (permalink / raw)
  To: freedreno; +Cc: linux-arm-msm

Add an initial node for the Adreno GPU and it's companion
SMMU. The GPU node is mostly complete except for a bare
bones power table that will be filled out more completely
later.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 arch/arm64/boot/dts/qcom/msm8996.dtsi | 78 +++++++++++++++++++++++++++++++++++
 1 file changed, 78 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/msm8996.dtsi b/arch/arm64/boot/dts/qcom/msm8996.dtsi
index 1f7f8a9..f71b468 100644
--- a/arch/arm64/boot/dts/qcom/msm8996.dtsi
+++ b/arch/arm64/boot/dts/qcom/msm8996.dtsi
@@ -467,6 +467,84 @@
 			};
 		};
 
+		adreno_smmu: arm,smmu@b40000 {
+			compatible = "arm,smmu-v2";
+			reg = <0xb40000 0x10000>;
+
+			#global-interrupts = <1>;
+			interrupts = <0 334 0>,
+				     <0 329 0>,
+				     <0 330 0>;
+			#iommu-cells = <1>;
+
+			clocks = <&mmcc MMSS_MMAGIC_AHB_CLK>,
+				<&mmcc MMSS_MMAGIC_CFG_AHB_CLK>,
+				<&mmcc GPU_AHB_CLK>,
+				<&gcc GCC_MMSS_BIMC_GFX_CLK>,
+				<&gcc GCC_BIMC_GFX_CLK>,
+				<&mmcc MMSS_MISC_AHB_CLK>;
+			clock-names = "mmagic_ahb_clk",
+				"mmagic_cfg_ahb_clk",
+				"gpu_ahb_clk",
+				"gcc_mmss_bimc_gfx_clk",
+				"gcc_bimc_gfx_clk",
+				"mmss_misc_bus_clk";
+
+			power-domains = <&mmcc GPU_GDSC>;
+
+			qcom,skip-init;
+			qcom,register-save;
+
+			status = "okay";
+		};
+
+		adreno-3xx@b00000 {
+			compatible = "qcom,adreno-3xx";
+			#stream-id-cells = <16>;
+
+			reg = <0xb00000 0x3f000>;
+			reg-names = "kgsl_3d0_reg_memory";
+
+			interrupts = <0 300 0>;
+			interrupt-names = "kgsl_3d0_irq";
+
+			clocks = <&mmcc GPU_GX_GFX3D_CLK>,
+				<&mmcc GPU_AHB_CLK>,
+				<&mmcc GPU_GX_RBBMTIMER_CLK>,
+				<&gcc GCC_BIMC_GFX_CLK>,
+				<&gcc GCC_MMSS_BIMC_GFX_CLK>,
+				<&mmcc MMSS_MMAGIC_AHB_CLK>;
+
+			clock-names = "core_clk",
+				"iface_clk",
+				"rbbmtimer_clk",
+				"mem_clk",
+				"mem_iface_clk",
+				"alt_mem_iface_clk";
+
+			power-domains = <&mmcc GPU_GDSC>;
+			iommus = <&adreno_smmu 0>;
+
+			/* There are patchlevel 3 chips in the world (Snapdragon
+			 * (820) but they are functionally similar to the 821 in
+			 * the code so we can safely set the chipset as
+			 * patchlevel 4. */
+			qcom,chipid = <0x05030004>;
+
+			/* This is a safe speed for bring up in all bin levels.
+			 * This isn't the fastest the chip can go, but we can
+			 * get there eventually */
+			qcom,gpu-pwrlevels {
+				compatible = "qcom,gpu-pwrlevels";
+				qcom,gpu-pwrlevel@0 {
+					qcom,gpu-freq = <205000000>;
+				};
+				qcom,gpu-pwrlevel@1 {
+					qcom,gpu-freq = <27000000>;
+				};
+			};
+		};
+
 		mdp_smmu: arm,smmu@d00000 {
 			compatible = "arm,smmu-v2";
 			reg = <0xd00000 0x10000>;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 12/16] drm/msm: gpu: Add A5XX target support
  2016-11-04 22:44 [RFC] Initial support for the Adreno A5XX Jordan Crouse
       [not found] ` <1478299497-9729-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
  2016-11-04 22:44 ` [PATCH 11/16] arm64: dts: Add Adreno GPU and GPU smmu definitions Jordan Crouse
@ 2016-11-04 22:44 ` Jordan Crouse
  2016-11-04 22:44 ` [PATCH 14/16] firmware: qcom_scm: Add qcom_scm_gpu_zap_resume() Jordan Crouse
  2016-11-08 17:12 ` [Freedreno] [RFC] Initial support for the Adreno A5XX Jordan Crouse
  4 siblings, 0 replies; 29+ messages in thread
From: Jordan Crouse @ 2016-11-04 22:44 UTC (permalink / raw)
  To: freedreno; +Cc: linux-arm-msm

Add support for the A5XX family of Adreno GPUs.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/Makefile               |   1 +
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c      | 820 +++++++++++++++++++++++++++++
 drivers/gpu/drm/msm/adreno/a5xx_gpu.h      |  37 ++
 drivers/gpu/drm/msm/adreno/adreno_device.c |  25 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.c    |   6 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.h    |  40 ++
 drivers/gpu/drm/msm/msm_gpu.c              |  11 +-
 drivers/gpu/drm/msm/msm_gpu.h              |   2 +-
 8 files changed, 934 insertions(+), 8 deletions(-)
 create mode 100644 drivers/gpu/drm/msm/adreno/a5xx_gpu.c
 create mode 100644 drivers/gpu/drm/msm/adreno/a5xx_gpu.h

diff --git a/drivers/gpu/drm/msm/Makefile b/drivers/gpu/drm/msm/Makefile
index 4e2806c..7c4ddb3 100644
--- a/drivers/gpu/drm/msm/Makefile
+++ b/drivers/gpu/drm/msm/Makefile
@@ -6,6 +6,7 @@ msm-y := \
 	adreno/adreno_gpu.o \
 	adreno/a3xx_gpu.o \
 	adreno/a4xx_gpu.o \
+	adreno/a5xx_gpu.o \
 	hdmi/hdmi.o \
 	hdmi/hdmi_audio.o \
 	hdmi/hdmi_bridge.o \
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
new file mode 100644
index 0000000..89fea7e
--- /dev/null
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -0,0 +1,820 @@
+/* Copyright (c) 2016 The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include "msm_gem.h"
+#include "a5xx_gpu.h"
+
+extern bool hang_debug;
+static void a5xx_dump(struct msm_gpu *gpu);
+
+static void a5xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
+	struct msm_file_private *ctx)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct msm_drm_private *priv = gpu->dev->dev_private;
+	struct msm_ringbuffer *ring = gpu->rb;
+	unsigned int i, ibs = 0;
+
+	for (i = 0; i < submit->nr_cmds; i++) {
+		switch (submit->cmd[i].type) {
+		case MSM_SUBMIT_CMD_IB_TARGET_BUF:
+			break;
+		case MSM_SUBMIT_CMD_CTX_RESTORE_BUF:
+			if (priv->lastctx == ctx)
+				break;
+		case MSM_SUBMIT_CMD_BUF:
+			OUT_PKT7(ring, CP_INDIRECT_BUFFER_PFE, 3);
+			OUT_RING(ring, lower_32_bits(submit->cmd[i].iova));
+			OUT_RING(ring, upper_32_bits(submit->cmd[i].iova));
+			OUT_RING(ring, submit->cmd[i].size);
+			ibs++;
+			break;
+		}
+	}
+
+	OUT_PKT4(ring, REG_A5XX_CP_SCRATCH_REG(2), 1);
+	OUT_RING(ring, submit->fence->seqno);
+
+	OUT_PKT7(ring, CP_EVENT_WRITE, 4);
+	OUT_RING(ring, CACHE_FLUSH_TS | (1 << 31));
+	OUT_RING(ring, lower_32_bits(rbmemptr(adreno_gpu, fence)));
+	OUT_RING(ring, upper_32_bits(rbmemptr(adreno_gpu, fence)));
+	OUT_RING(ring, submit->fence->seqno);
+
+	gpu->funcs->flush(gpu);
+}
+
+struct a5xx_hwcg {
+	u32 offset;
+	u32 value;
+};
+
+static const struct a5xx_hwcg a530_hwcg[] = {
+	{REG_A5XX_RBBM_CLOCK_CNTL_SP0, 0x02222222},
+	{REG_A5XX_RBBM_CLOCK_CNTL_SP1, 0x02222222},
+	{REG_A5XX_RBBM_CLOCK_CNTL_SP2, 0x02222222},
+	{REG_A5XX_RBBM_CLOCK_CNTL_SP3, 0x02222222},
+	{REG_A5XX_RBBM_CLOCK_CNTL2_SP0, 0x02222220},
+	{REG_A5XX_RBBM_CLOCK_CNTL2_SP1, 0x02222220},
+	{REG_A5XX_RBBM_CLOCK_CNTL2_SP2, 0x02222220},
+	{REG_A5XX_RBBM_CLOCK_CNTL2_SP3, 0x02222220},
+	{REG_A5XX_RBBM_CLOCK_HYST_SP0, 0x0000F3CF},
+	{REG_A5XX_RBBM_CLOCK_HYST_SP1, 0x0000F3CF},
+	{REG_A5XX_RBBM_CLOCK_HYST_SP2, 0x0000F3CF},
+	{REG_A5XX_RBBM_CLOCK_HYST_SP3, 0x0000F3CF},
+	{REG_A5XX_RBBM_CLOCK_DELAY_SP0, 0x00000080},
+	{REG_A5XX_RBBM_CLOCK_DELAY_SP1, 0x00000080},
+	{REG_A5XX_RBBM_CLOCK_DELAY_SP2, 0x00000080},
+	{REG_A5XX_RBBM_CLOCK_DELAY_SP3, 0x00000080},
+	{REG_A5XX_RBBM_CLOCK_CNTL_TP0, 0x22222222},
+	{REG_A5XX_RBBM_CLOCK_CNTL_TP1, 0x22222222},
+	{REG_A5XX_RBBM_CLOCK_CNTL_TP2, 0x22222222},
+	{REG_A5XX_RBBM_CLOCK_CNTL_TP3, 0x22222222},
+	{REG_A5XX_RBBM_CLOCK_CNTL2_TP0, 0x22222222},
+	{REG_A5XX_RBBM_CLOCK_CNTL2_TP1, 0x22222222},
+	{REG_A5XX_RBBM_CLOCK_CNTL2_TP2, 0x22222222},
+	{REG_A5XX_RBBM_CLOCK_CNTL2_TP3, 0x22222222},
+	{REG_A5XX_RBBM_CLOCK_CNTL3_TP0, 0x00002222},
+	{REG_A5XX_RBBM_CLOCK_CNTL3_TP1, 0x00002222},
+	{REG_A5XX_RBBM_CLOCK_CNTL3_TP2, 0x00002222},
+	{REG_A5XX_RBBM_CLOCK_CNTL3_TP3, 0x00002222},
+	{REG_A5XX_RBBM_CLOCK_HYST_TP0, 0x77777777},
+	{REG_A5XX_RBBM_CLOCK_HYST_TP1, 0x77777777},
+	{REG_A5XX_RBBM_CLOCK_HYST_TP2, 0x77777777},
+	{REG_A5XX_RBBM_CLOCK_HYST_TP3, 0x77777777},
+	{REG_A5XX_RBBM_CLOCK_HYST2_TP0, 0x77777777},
+	{REG_A5XX_RBBM_CLOCK_HYST2_TP1, 0x77777777},
+	{REG_A5XX_RBBM_CLOCK_HYST2_TP2, 0x77777777},
+	{REG_A5XX_RBBM_CLOCK_HYST2_TP3, 0x77777777},
+	{REG_A5XX_RBBM_CLOCK_HYST3_TP0, 0x00007777},
+	{REG_A5XX_RBBM_CLOCK_HYST3_TP1, 0x00007777},
+	{REG_A5XX_RBBM_CLOCK_HYST3_TP2, 0x00007777},
+	{REG_A5XX_RBBM_CLOCK_HYST3_TP3, 0x00007777},
+	{REG_A5XX_RBBM_CLOCK_DELAY_TP0, 0x11111111},
+	{REG_A5XX_RBBM_CLOCK_DELAY_TP1, 0x11111111},
+	{REG_A5XX_RBBM_CLOCK_DELAY_TP2, 0x11111111},
+	{REG_A5XX_RBBM_CLOCK_DELAY_TP3, 0x11111111},
+	{REG_A5XX_RBBM_CLOCK_DELAY2_TP0, 0x11111111},
+	{REG_A5XX_RBBM_CLOCK_DELAY2_TP1, 0x11111111},
+	{REG_A5XX_RBBM_CLOCK_DELAY2_TP2, 0x11111111},
+	{REG_A5XX_RBBM_CLOCK_DELAY2_TP3, 0x11111111},
+	{REG_A5XX_RBBM_CLOCK_DELAY3_TP0, 0x00001111},
+	{REG_A5XX_RBBM_CLOCK_DELAY3_TP1, 0x00001111},
+	{REG_A5XX_RBBM_CLOCK_DELAY3_TP2, 0x00001111},
+	{REG_A5XX_RBBM_CLOCK_DELAY3_TP3, 0x00001111},
+	{REG_A5XX_RBBM_CLOCK_CNTL_UCHE, 0x22222222},
+	{REG_A5XX_RBBM_CLOCK_CNTL2_UCHE, 0x22222222},
+	{REG_A5XX_RBBM_CLOCK_CNTL3_UCHE, 0x22222222},
+	{REG_A5XX_RBBM_CLOCK_CNTL4_UCHE, 0x00222222},
+	{REG_A5XX_RBBM_CLOCK_HYST_UCHE, 0x00444444},
+	{REG_A5XX_RBBM_CLOCK_DELAY_UCHE, 0x00000002},
+	{REG_A5XX_RBBM_CLOCK_CNTL_RB0, 0x22222222},
+	{REG_A5XX_RBBM_CLOCK_CNTL_RB1, 0x22222222},
+	{REG_A5XX_RBBM_CLOCK_CNTL_RB2, 0x22222222},
+	{REG_A5XX_RBBM_CLOCK_CNTL_RB3, 0x22222222},
+	{REG_A5XX_RBBM_CLOCK_CNTL2_RB0, 0x00222222},
+	{REG_A5XX_RBBM_CLOCK_CNTL2_RB1, 0x00222222},
+	{REG_A5XX_RBBM_CLOCK_CNTL2_RB2, 0x00222222},
+	{REG_A5XX_RBBM_CLOCK_CNTL2_RB3, 0x00222222},
+	{REG_A5XX_RBBM_CLOCK_CNTL_CCU0, 0x00022220},
+	{REG_A5XX_RBBM_CLOCK_CNTL_CCU1, 0x00022220},
+	{REG_A5XX_RBBM_CLOCK_CNTL_CCU2, 0x00022220},
+	{REG_A5XX_RBBM_CLOCK_CNTL_CCU3, 0x00022220},
+	{REG_A5XX_RBBM_CLOCK_CNTL_RAC, 0x05522222},
+	{REG_A5XX_RBBM_CLOCK_CNTL2_RAC, 0x00505555},
+	{REG_A5XX_RBBM_CLOCK_HYST_RB_CCU0, 0x04040404},
+	{REG_A5XX_RBBM_CLOCK_HYST_RB_CCU1, 0x04040404},
+	{REG_A5XX_RBBM_CLOCK_HYST_RB_CCU2, 0x04040404},
+	{REG_A5XX_RBBM_CLOCK_HYST_RB_CCU3, 0x04040404},
+	{REG_A5XX_RBBM_CLOCK_HYST_RAC, 0x07444044},
+	{REG_A5XX_RBBM_CLOCK_DELAY_RB_CCU_L1_0, 0x00000002},
+	{REG_A5XX_RBBM_CLOCK_DELAY_RB_CCU_L1_1, 0x00000002},
+	{REG_A5XX_RBBM_CLOCK_DELAY_RB_CCU_L1_2, 0x00000002},
+	{REG_A5XX_RBBM_CLOCK_DELAY_RB_CCU_L1_3, 0x00000002},
+	{REG_A5XX_RBBM_CLOCK_DELAY_RAC, 0x00010011},
+	{REG_A5XX_RBBM_CLOCK_CNTL_TSE_RAS_RBBM, 0x04222222},
+	{REG_A5XX_RBBM_CLOCK_MODE_GPC, 0x02222222},
+	{REG_A5XX_RBBM_CLOCK_MODE_VFD, 0x00002222},
+	{REG_A5XX_RBBM_CLOCK_HYST_TSE_RAS_RBBM, 0x00000000},
+	{REG_A5XX_RBBM_CLOCK_HYST_GPC, 0x04104004},
+	{REG_A5XX_RBBM_CLOCK_HYST_VFD, 0x00000000},
+	{REG_A5XX_RBBM_CLOCK_DELAY_HLSQ, 0x00000000},
+	{REG_A5XX_RBBM_CLOCK_DELAY_TSE_RAS_RBBM, 0x00004000},
+	{REG_A5XX_RBBM_CLOCK_DELAY_GPC, 0x00000200},
+	{REG_A5XX_RBBM_CLOCK_DELAY_VFD, 0x00002222}
+};
+
+static const struct {
+	int (*test)(struct adreno_gpu *gpu);
+	const struct a5xx_hwcg *regs;
+	unsigned int count;
+} a5xx_hwcg_regs[] = {
+	{ adreno_is_a530, a530_hwcg, ARRAY_SIZE(a530_hwcg), },
+};
+
+static void _a5xx_enable_hwcg(struct msm_gpu *gpu,
+		const struct a5xx_hwcg *regs, unsigned int count)
+{
+	unsigned int i;
+
+	for (i = 0; i < count; i++)
+		gpu_write(gpu, regs[i].offset, regs[i].value);
+
+	gpu_write(gpu, REG_A5XX_RBBM_CLOCK_CNTL, 0xAAA8AA00);
+	gpu_write(gpu, REG_A5XX_RBBM_ISDB_CNT, 0x182);
+}
+
+static void a5xx_enable_hwcg(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	unsigned int i;
+
+	for (i = 0; i < ARRAY_SIZE(a5xx_hwcg_regs); i++) {
+		if (a5xx_hwcg_regs[i].test(adreno_gpu)) {
+			_a5xx_enable_hwcg(gpu, a5xx_hwcg_regs[i].regs,
+				a5xx_hwcg_regs[i].count);
+			return;
+		}
+	}
+}
+
+static int a5xx_me_init(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct msm_ringbuffer *ring = gpu->rb;
+
+	OUT_PKT7(ring, CP_ME_INIT, 8);
+
+	OUT_RING(ring, 0x0000002F);
+
+	/* Enable multiple hardware contexts */
+	OUT_RING(ring, 0x00000003);
+
+	/* Enable error detection */
+	OUT_RING(ring, 0x20000000);
+
+	/* Don't enable header dump */
+	OUT_RING(ring, 0x00000000);
+	OUT_RING(ring, 0x00000000);
+
+	/* Specify workarounds for various microcode issues */
+	if (adreno_is_a530(adreno_gpu)) {
+		/* Workaround for token end syncs
+		 * Force a WFI after every direct-render 3D mode draw and every
+		 * 2D mode 3 draw
+		 */
+		OUT_RING(ring, 0x0000000B);
+	} else {
+		/* No workarounds enabled */
+		OUT_RING(ring, 0x00000000);
+	}
+
+	OUT_RING(ring, 0x00000000);
+	OUT_RING(ring, 0x00000000);
+
+	gpu->funcs->flush(gpu);
+
+	return gpu->funcs->idle(gpu) ? 0 : -EINVAL;
+}
+
+static struct drm_gem_object *a5xx_ucode_load_bo(struct msm_gpu *gpu,
+		const struct firmware *fw, u32 *iova)
+{
+	struct drm_device *drm = gpu->dev;
+	struct drm_gem_object *bo;
+	void *ptr;
+
+	mutex_lock(&drm->struct_mutex);
+	bo = msm_gem_new(drm, fw->size - 4, MSM_BO_UNCACHED);
+	mutex_unlock(&drm->struct_mutex);
+
+	if (IS_ERR(bo))
+		return bo;
+
+	ptr = msm_gem_get_vaddr(bo);
+	if (!ptr) {
+		drm_gem_object_unreference_unlocked(bo);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	if (iova) {
+		int ret = msm_gem_get_iova(bo, gpu->id, iova);
+
+		if (ret) {
+			drm_gem_object_unreference_unlocked(bo);
+			return ERR_PTR(ret);
+		}
+	}
+
+	memcpy(ptr, &fw->data[4], fw->size - 4);
+
+	msm_gem_put_vaddr(bo);
+	return bo;
+}
+
+static int a5xx_ucode_init(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a5xx_gpu *a5xx_gpu = to_a5xx_gpu(adreno_gpu);
+	int ret;
+
+	if (!a5xx_gpu->pm4_bo) {
+		a5xx_gpu->pm4_bo = a5xx_ucode_load_bo(gpu, adreno_gpu->pm4,
+			&a5xx_gpu->pm4_iova);
+
+		if (IS_ERR(a5xx_gpu->pm4_bo)) {
+			ret = PTR_ERR(a5xx_gpu->pm4_bo);
+			a5xx_gpu->pm4_bo = NULL;
+			dev_err(gpu->dev->dev, "could not allocate PM4: %d\n",
+				ret);
+			return ret;
+		}
+	}
+
+	if (!a5xx_gpu->pfp_bo) {
+		a5xx_gpu->pfp_bo = a5xx_ucode_load_bo(gpu, adreno_gpu->pfp,
+			&a5xx_gpu->pfp_iova);
+
+		if (IS_ERR(a5xx_gpu->pfp_bo)) {
+			ret = PTR_ERR(a5xx_gpu->pfp_bo);
+			a5xx_gpu->pfp_bo = NULL;
+			dev_err(gpu->dev->dev, "could not allocate PFP: %d\n",
+				ret);
+			return ret;
+		}
+	}
+
+	gpu_write64(gpu, REG_A5XX_CP_ME_INSTR_BASE_LO,
+		REG_A5XX_CP_ME_INSTR_BASE_HI, a5xx_gpu->pm4_iova);
+
+	gpu_write64(gpu, REG_A5XX_CP_PFP_INSTR_BASE_LO,
+		REG_A5XX_CP_PFP_INSTR_BASE_HI, a5xx_gpu->pfp_iova);
+
+	return 0;
+}
+
+#define A5XX_INT_MASK (A5XX_RBBM_INT_0_MASK_RBBM_AHB_ERROR | \
+	  A5XX_RBBM_INT_0_MASK_RBBM_TRANSFER_TIMEOUT | \
+	  A5XX_RBBM_INT_0_MASK_RBBM_ME_MS_TIMEOUT | \
+	  A5XX_RBBM_INT_0_MASK_RBBM_PFP_MS_TIMEOUT | \
+	  A5XX_RBBM_INT_0_MASK_RBBM_ETS_MS_TIMEOUT | \
+	  A5XX_RBBM_INT_0_MASK_RBBM_ATB_ASYNC_OVERFLOW | \
+	  A5XX_RBBM_INT_0_MASK_CP_HW_ERROR | \
+	  A5XX_RBBM_INT_0_MASK_CP_CACHE_FLUSH_TS | \
+	  A5XX_RBBM_INT_0_MASK_UCHE_OOB_ACCESS | \
+	  A5XX_RBBM_INT_0_MASK_GPMU_VOLTAGE_DROOP)
+
+static int a5xx_hw_init(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	int ret;
+
+	gpu_write(gpu, REG_A5XX_VBIF_ROUND_ROBIN_QOS_ARB, 0x00000003);
+
+	/* Make all blocks contribute to the GPU BUSY perf counter */
+	gpu_write(gpu, REG_A5XX_RBBM_PERFCTR_GPU_BUSY_MASKED, 0xFFFFFFFF);
+
+	/* Enable RBBM error reporting bits */
+	gpu_write(gpu, REG_A5XX_RBBM_AHB_CNTL0, 0x00000001);
+
+	if (adreno_gpu->quirks & ADRENO_QUIRK_FAULT_DETECT_MASK) {
+		/*
+		 * Mask out the activity signals from RB1-3 to avoid false
+		 * positives
+		 */
+
+		gpu_write(gpu, REG_A5XX_RBBM_INTERFACE_HANG_MASK_CNTL11,
+			0xF0000000);
+		gpu_write(gpu, REG_A5XX_RBBM_INTERFACE_HANG_MASK_CNTL12,
+			0xFFFFFFFF);
+		gpu_write(gpu, REG_A5XX_RBBM_INTERFACE_HANG_MASK_CNTL13,
+			0xFFFFFFFF);
+		gpu_write(gpu, REG_A5XX_RBBM_INTERFACE_HANG_MASK_CNTL14,
+			0xFFFFFFFF);
+		gpu_write(gpu, REG_A5XX_RBBM_INTERFACE_HANG_MASK_CNTL15,
+			0xFFFFFFFF);
+		gpu_write(gpu, REG_A5XX_RBBM_INTERFACE_HANG_MASK_CNTL16,
+			0xFFFFFFFF);
+		gpu_write(gpu, REG_A5XX_RBBM_INTERFACE_HANG_MASK_CNTL17,
+			0xFFFFFFFF);
+		gpu_write(gpu, REG_A5XX_RBBM_INTERFACE_HANG_MASK_CNTL18,
+			0xFFFFFFFF);
+	}
+
+	/* Enable fault detection */
+	gpu_write(gpu, REG_A5XX_RBBM_INTERFACE_HANG_INT_CNTL,
+		(1 << 30) | 0xFFFF);
+
+	/* Turn on performance counters */
+	gpu_write(gpu, REG_A5XX_RBBM_PERFCTR_CNTL, 0x01);
+
+	/* Increase VFD cache access so LRZ and other data gets evicted less */
+	gpu_write(gpu, REG_A5XX_UCHE_CACHE_WAYS, 0x02);
+
+	/* Disable L2 bypass in the UCHE */
+	gpu_write(gpu, REG_A5XX_UCHE_TRAP_BASE_LO, 0xFFFF0000);
+	gpu_write(gpu, REG_A5XX_UCHE_TRAP_BASE_HI, 0x0001FFFF);
+	gpu_write(gpu, REG_A5XX_UCHE_WRITE_THRU_BASE_LO, 0xFFFF0000);
+	gpu_write(gpu, REG_A5XX_UCHE_WRITE_THRU_BASE_HI, 0x0001FFFF);
+
+	/* Set the GMEM VA range (0 to gpu->gmem) */
+	gpu_write(gpu, REG_A5XX_UCHE_GMEM_RANGE_MIN_LO, 0x80000000);
+	gpu_write(gpu, REG_A5XX_UCHE_GMEM_RANGE_MIN_HI, 0x00000000);
+	gpu_write(gpu, REG_A5XX_UCHE_GMEM_RANGE_MAX_LO,
+		0x80000000 + adreno_gpu->gmem - 1);
+	gpu_write(gpu, REG_A5XX_UCHE_GMEM_RANGE_MAX_HI, 0x00000000);
+
+	gpu_write(gpu, REG_A5XX_CP_MEQ_THRESHOLDS, 0x40);
+	gpu_write(gpu, REG_A5XX_CP_MERCIU_SIZE, 0x40);
+	gpu_write(gpu, REG_A5XX_CP_ROQ_THRESHOLDS_2, 0x80000060);
+	gpu_write(gpu, REG_A5XX_CP_ROQ_THRESHOLDS_1, 0x40201B16);
+
+	gpu_write(gpu, REG_A5XX_PC_DBG_ECO_CNTL, (0x400 << 11 | 0x300 << 22));
+
+	if (adreno_gpu->quirks & ADRENO_QUIRK_TWO_PASS_USE_WFI)
+		gpu_rmw(gpu, REG_A5XX_PC_DBG_ECO_CNTL, 0, (1 << 8));
+
+	gpu_write(gpu, REG_A5XX_PC_DBG_ECO_CNTL, 0xc0200100);
+
+	/* Enable USE_RETENTION_FLOPS */
+	gpu_write(gpu, REG_A5XX_CP_CHICKEN_DBG, 0x02000000);
+
+	/* Enable ME/PFP split notification */
+	gpu_write(gpu, REG_A5XX_RBBM_AHB_CNTL1, 0xA6FFFFFF);
+
+	/* Enable HWCG */
+	a5xx_enable_hwcg(gpu);
+
+	gpu_write(gpu, REG_A5XX_RBBM_AHB_CNTL2, 0x0000003F);
+
+	/* Set the highest bank bit */
+	gpu_write(gpu, REG_A5XX_TPL1_MODE_CNTL, 2 << 7);
+	gpu_write(gpu, REG_A5XX_RB_MODE_CNTL, 2 << 1);
+
+	/* Protect registers from the CP */
+	gpu_write(gpu, REG_A5XX_CP_PROTECT_CNTL, 0x00000007);
+
+	/* RBBM */
+	gpu_write(gpu, REG_A5XX_CP_PROTECT(0), ADRENO_PROTECT_RW(0x04, 4));
+	gpu_write(gpu, REG_A5XX_CP_PROTECT(1), ADRENO_PROTECT_RW(0x08, 8));
+	gpu_write(gpu, REG_A5XX_CP_PROTECT(2), ADRENO_PROTECT_RW(0x10, 16));
+	gpu_write(gpu, REG_A5XX_CP_PROTECT(3), ADRENO_PROTECT_RW(0x20, 32));
+	gpu_write(gpu, REG_A5XX_CP_PROTECT(4), ADRENO_PROTECT_RW(0x40, 64));
+	gpu_write(gpu, REG_A5XX_CP_PROTECT(5), ADRENO_PROTECT_RW(0x80, 64));
+
+	/* Content protect */
+	gpu_write(gpu, REG_A5XX_CP_PROTECT(6),
+		ADRENO_PROTECT_RW(REG_A5XX_RBBM_SECVID_TSB_TRUSTED_BASE_LO,
+			16));
+	gpu_write(gpu, REG_A5XX_CP_PROTECT(7),
+		ADRENO_PROTECT_RW(REG_A5XX_RBBM_SECVID_TRUST_CNTL, 2));
+
+	/* CP */
+	gpu_write(gpu, REG_A5XX_CP_PROTECT(8), ADRENO_PROTECT_RW(0x800, 64));
+	gpu_write(gpu, REG_A5XX_CP_PROTECT(9), ADRENO_PROTECT_RW(0x840, 8));
+	gpu_write(gpu, REG_A5XX_CP_PROTECT(10), ADRENO_PROTECT_RW(0x880, 32));
+	gpu_write(gpu, REG_A5XX_CP_PROTECT(11), ADRENO_PROTECT_RW(0xAA0, 1));
+
+	/* RB */
+	gpu_write(gpu, REG_A5XX_CP_PROTECT(12), ADRENO_PROTECT_RW(0xCC0, 1));
+	gpu_write(gpu, REG_A5XX_CP_PROTECT(13), ADRENO_PROTECT_RW(0xCF0, 2));
+
+	/* VPC */
+	gpu_write(gpu, REG_A5XX_CP_PROTECT(14), ADRENO_PROTECT_RW(0xE68, 8));
+	gpu_write(gpu, REG_A5XX_CP_PROTECT(15), ADRENO_PROTECT_RW(0xE70, 4));
+
+	/* UCHE */
+	gpu_write(gpu, REG_A5XX_CP_PROTECT(16), ADRENO_PROTECT_RW(0xE80, 16));
+
+	if (adreno_is_a530(adreno_gpu))
+		gpu_write(gpu, REG_A5XX_CP_PROTECT(17),
+			ADRENO_PROTECT_RW(0x10000, 0x8000));
+
+	gpu_write(gpu, REG_A5XX_RBBM_SECVID_TSB_CNTL, 0);
+	/*
+	 * Disable the trusted memory range - we don't actually supported secure
+	 * memory rendering at this point in time and we don't want to block off
+	 * part of the virtual memory space.
+	 */
+	gpu_write64(gpu, REG_A5XX_RBBM_SECVID_TSB_TRUSTED_BASE_LO,
+		REG_A5XX_RBBM_SECVID_TSB_TRUSTED_BASE_HI, 0x00000000);
+	gpu_write(gpu, REG_A5XX_RBBM_SECVID_TSB_TRUSTED_SIZE, 0x00000000);
+
+	ret = adreno_hw_init(gpu);
+	if (ret)
+		return ret;
+
+	ret = a5xx_ucode_init(gpu);
+	if (ret)
+		return ret;
+
+	/* Disable the interrupts through the initial bringup stage */
+	gpu_write(gpu, REG_A5XX_RBBM_INT_0_MASK, A5XX_INT_MASK);
+
+	/* Clear ME_HALT to start the micro engine */
+	gpu_write(gpu, REG_A5XX_CP_PFP_ME_CNTL, 0);
+	ret = a5xx_me_init(gpu);
+	if (ret)
+		return ret;
+
+	/* Put the GPU into insecure mode */
+	gpu_write(gpu, REG_A5XX_RBBM_SECVID_TRUST_CNTL, 0x0);
+
+	/*
+	 * Send a pipeline event stat to get misbehaving counters to start
+	 * ticking correctly
+	 */
+	if (adreno_is_a530(adreno_gpu)) {
+		OUT_PKT7(gpu->rb, CP_EVENT_WRITE, 1);
+		OUT_RING(gpu->rb, 0x0F);
+
+		gpu->funcs->flush(gpu);
+		if (!gpu->funcs->idle(gpu))
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
+static void a5xx_recover(struct msm_gpu *gpu)
+{
+	adreno_dump_info(gpu);
+
+	if (hang_debug)
+		a5xx_dump(gpu);
+
+	adreno_recover(gpu);
+}
+
+static void a5xx_destroy(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a5xx_gpu *a5xx_gpu = to_a5xx_gpu(adreno_gpu);
+
+	DBG("%s", gpu->name);
+
+	if (a5xx_gpu->pm4_bo) {
+		if (a5xx_gpu->pm4_iova)
+			msm_gem_put_iova(a5xx_gpu->pm4_bo, gpu->id);
+		drm_gem_object_unreference_unlocked(a5xx_gpu->pm4_bo);
+	}
+
+	if (a5xx_gpu->pfp_bo) {
+		if (a5xx_gpu->pfp_iova)
+			msm_gem_put_iova(a5xx_gpu->pfp_bo, gpu->id);
+		drm_gem_object_unreference_unlocked(a5xx_gpu->pfp_bo);
+	}
+
+	adreno_gpu_cleanup(adreno_gpu);
+	kfree(a5xx_gpu);
+}
+
+static inline bool _a5xx_check_idle(struct msm_gpu *gpu)
+{
+	if (gpu_read(gpu, REG_A5XX_RBBM_STATUS) & ~A5XX_RBBM_STATUS_HI_BUSY)
+		return false;
+
+	/*
+	 * Nearly every abnormality ends up pausing the GPU and triggering a
+	 * fault so we can safely just watch for this one interrupt to fire
+	 */
+	return !(gpu_read(gpu, REG_A5XX_RBBM_INT_0_STATUS) &
+		A5XX_RBBM_INT_0_MASK_MISC_HANG_DETECT);
+}
+
+static bool a5xx_idle(struct msm_gpu *gpu)
+{
+	/* wait for CP to drain ringbuffer: */
+	if (!adreno_idle(gpu))
+		return false;
+
+	if (spin_until(_a5xx_check_idle(gpu))) {
+		DRM_ERROR("%s: %ps: timeout waiting for GPU to idle: status %8.8X irq %8.8X\n",
+			gpu->name, __builtin_return_address(0),
+			gpu_read(gpu, REG_A5XX_RBBM_STATUS),
+			gpu_read(gpu, REG_A5XX_RBBM_INT_0_STATUS));
+
+		return false;
+	}
+
+	return true;
+}
+
+static void a5xx_cp_err_irq(struct msm_gpu *gpu)
+{
+	u32 status = gpu_read(gpu, REG_A5XX_CP_INTERRUPT_STATUS);
+
+	if (status & A5XX_CP_INT_CP_OPCODE_ERROR) {
+		u32 val;
+
+		gpu_write(gpu, REG_A5XX_CP_PFP_STAT_ADDR, 0);
+
+		/*
+		 * REG_A5XX_CP_PFP_STAT_DATA is indexed, and we want index 1 so
+		 * read it twice
+		 */
+
+		gpu_read(gpu, REG_A5XX_CP_PFP_STAT_DATA);
+		val = gpu_read(gpu, REG_A5XX_CP_PFP_STAT_DATA);
+
+		dev_err_ratelimited(gpu->dev->dev, "CP | opcode error | possible opcode=0x%8.8X\n",
+			val);
+	}
+
+	if (status & A5XX_CP_INT_CP_HW_FAULT_ERROR)
+		dev_err_ratelimited(gpu->dev->dev, "CP | HW fault | status=0x%8.8X\n",
+			gpu_read(gpu, REG_A5XX_CP_HW_FAULT));
+
+	if (status & A5XX_CP_INT_CP_DMA_ERROR)
+		dev_err_ratelimited(gpu->dev->dev, "CP | DMA error\n");
+
+	if (status & A5XX_CP_INT_CP_REGISTER_PROTECTION_ERROR) {
+		u32 val = gpu_read(gpu, REG_A5XX_CP_PROTECT_STATUS);
+
+		dev_err_ratelimited(gpu->dev->dev,
+			"CP | protected mode error | %s | addr=0x%8.8X | status=0x%8.8X\n",
+			val & (1 << 24) ? "WRITE" : "READ",
+			(val & 0xFFFFF) >> 2, val);
+	}
+
+	if (status & A5XX_CP_INT_CP_AHB_ERROR) {
+		u32 status = gpu_read(gpu, REG_A5XX_CP_AHB_FAULT);
+		const char *access[16] = { "reserved", "reserved",
+			"timestamp lo", "timestamp hi", "pfp read", "pfp write",
+			"", "", "me read", "me write", "", "", "crashdump read",
+			"crashdump write" };
+
+		dev_err_ratelimited(gpu->dev->dev,
+			"CP | AHB error | addr=%X access=%s error=%d | status=0x%8.8X\n",
+			status & 0xFFFFF, access[(status >> 24) & 0xF],
+			(status & (1 << 31)), status);
+	}
+}
+
+static void a5xx_rbbm_err_irq(struct msm_gpu *gpu)
+{
+	u32 status = gpu_read(gpu, REG_A5XX_RBBM_INT_0_STATUS);
+
+	if (status & A5XX_RBBM_INT_0_MASK_RBBM_AHB_ERROR) {
+		u32 val = gpu_read(gpu, REG_A5XX_RBBM_AHB_ERROR_STATUS);
+
+		dev_err_ratelimited(gpu->dev->dev,
+			"RBBM | AHB bus error | %s | addr=0x%X | ports=0x%X:0x%X\n",
+			val & (1 << 28) ? "WRITE" : "READ",
+			(val & 0xFFFFF) >> 2, (val >> 20) & 0x3,
+			(val >> 24) & 0xF);
+
+		/* Clear the error */
+		gpu_write(gpu, REG_A5XX_RBBM_AHB_CMD, (1 << 4));
+	}
+
+	if (status & A5XX_RBBM_INT_0_MASK_RBBM_TRANSFER_TIMEOUT)
+		dev_err_ratelimited(gpu->dev->dev, "RBBM | AHB transfer timeout\n");
+
+	if (status & A5XX_RBBM_INT_0_MASK_RBBM_ME_MS_TIMEOUT)
+		dev_err_ratelimited(gpu->dev->dev, "RBBM | ME master split | status=0x%X\n",
+			gpu_read(gpu, REG_A5XX_RBBM_AHB_ME_SPLIT_STATUS));
+
+	if (status & A5XX_RBBM_INT_0_MASK_RBBM_PFP_MS_TIMEOUT)
+		dev_err_ratelimited(gpu->dev->dev, "RBBM | PFP master split | status=0x%X\n",
+			gpu_read(gpu, REG_A5XX_RBBM_AHB_PFP_SPLIT_STATUS));
+
+	if (status & A5XX_RBBM_INT_0_MASK_RBBM_ETS_MS_TIMEOUT)
+		dev_err_ratelimited(gpu->dev->dev, "RBBM | ETS master split | status=0x%X\n",
+			gpu_read(gpu, REG_A5XX_RBBM_AHB_ETS_SPLIT_STATUS));
+
+	if (status & A5XX_RBBM_INT_0_MASK_RBBM_ATB_ASYNC_OVERFLOW)
+		dev_err_ratelimited(gpu->dev->dev, "RBBM | ATB ASYNC overflow\n");
+
+	if (status & A5XX_RBBM_INT_0_MASK_RBBM_ATB_BUS_OVERFLOW)
+		dev_err_ratelimited(gpu->dev->dev, "RBBM | ATB bus overflow\n");
+}
+
+static void a5xx_uche_err_irq(struct msm_gpu *gpu)
+{
+	uint64_t addr = (uint64_t) gpu_read(gpu, REG_A5XX_UCHE_TRAP_LOG_HI);
+
+	addr |= gpu_read(gpu, REG_A5XX_UCHE_TRAP_LOG_LO);
+
+	dev_err_ratelimited(gpu->dev->dev, "UCHE | Out of bounds access | addr=0x%llX\n",
+		addr);
+}
+
+static void a5xx_gpmu_err_irq(struct msm_gpu *gpu)
+{
+	dev_err_ratelimited(gpu->dev->dev, "GPMU | voltage droop\n");
+}
+
+#define RBBM_ERROR_MASK \
+	(A5XX_RBBM_INT_0_MASK_RBBM_AHB_ERROR | \
+	A5XX_RBBM_INT_0_MASK_RBBM_TRANSFER_TIMEOUT | \
+	A5XX_RBBM_INT_0_MASK_RBBM_ME_MS_TIMEOUT | \
+	A5XX_RBBM_INT_0_MASK_RBBM_PFP_MS_TIMEOUT | \
+	A5XX_RBBM_INT_0_MASK_RBBM_ETS_MS_TIMEOUT | \
+	A5XX_RBBM_INT_0_MASK_RBBM_ATB_ASYNC_OVERFLOW)
+
+static irqreturn_t a5xx_irq(struct msm_gpu *gpu)
+{
+	u32 status = gpu_read(gpu, REG_A5XX_RBBM_INT_0_STATUS);
+
+	gpu_write(gpu, REG_A5XX_RBBM_INT_CLEAR_CMD, status);
+
+	if (status & RBBM_ERROR_MASK)
+		a5xx_rbbm_err_irq(gpu);
+
+	if (status & A5XX_RBBM_INT_0_MASK_CP_HW_ERROR)
+		a5xx_cp_err_irq(gpu);
+
+	if (status & A5XX_RBBM_INT_0_MASK_UCHE_OOB_ACCESS)
+		a5xx_uche_err_irq(gpu);
+
+	if (status & A5XX_RBBM_INT_0_MASK_GPMU_VOLTAGE_DROOP)
+		a5xx_gpmu_err_irq(gpu);
+
+	if (status & A5XX_RBBM_INT_0_MASK_CP_CACHE_FLUSH_TS)
+		msm_gpu_retire(gpu);
+
+	return IRQ_HANDLED;
+}
+
+static const u32 a5xx_register_offsets[REG_ADRENO_REGISTER_MAX] = {
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_BASE, REG_A5XX_CP_RB_BASE),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_BASE_HI, REG_A5XX_CP_RB_BASE_HI),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_RPTR_ADDR, REG_A5XX_CP_RB_RPTR_ADDR),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_RPTR_ADDR_HI,
+		REG_A5XX_CP_RB_RPTR_ADDR_HI),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_RPTR, REG_A5XX_CP_RB_RPTR),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_WPTR, REG_A5XX_CP_RB_WPTR),
+	REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_CNTL, REG_A5XX_CP_RB_CNTL),
+};
+
+static const u32 a5xx_registers[] = {
+	0x0000, 0x0002, 0x0004, 0x0020, 0x0022, 0x0026, 0x0029, 0x002B,
+	0x002E, 0x0035, 0x0038, 0x0042, 0x0044, 0x0044, 0x0047, 0x0095,
+	0x0097, 0x00BB, 0x03A0, 0x0464, 0x0469, 0x046F, 0x04D2, 0x04D3,
+	0x04E0, 0x0533, 0x0540, 0x0555, 0xF400, 0xF400, 0xF800, 0xF807,
+	0x0800, 0x081A, 0x081F, 0x0841, 0x0860, 0x0860, 0x0880, 0x08A0,
+	0x0B00, 0x0B12, 0x0B15, 0x0B28, 0x0B78, 0x0B7F, 0x0BB0, 0x0BBD,
+	0x0BC0, 0x0BC6, 0x0BD0, 0x0C53, 0x0C60, 0x0C61, 0x0C80, 0x0C82,
+	0x0C84, 0x0C85, 0x0C90, 0x0C98, 0x0CA0, 0x0CA0, 0x0CB0, 0x0CB2,
+	0x2180, 0x2185, 0x2580, 0x2585, 0x0CC1, 0x0CC1, 0x0CC4, 0x0CC7,
+	0x0CCC, 0x0CCC, 0x0CD0, 0x0CD8, 0x0CE0, 0x0CE5, 0x0CE8, 0x0CE8,
+	0x0CEC, 0x0CF1, 0x0CFB, 0x0D0E, 0x2100, 0x211E, 0x2140, 0x2145,
+	0x2500, 0x251E, 0x2540, 0x2545, 0x0D10, 0x0D17, 0x0D20, 0x0D23,
+	0x0D30, 0x0D30, 0x20C0, 0x20C0, 0x24C0, 0x24C0, 0x0E40, 0x0E43,
+	0x0E4A, 0x0E4A, 0x0E50, 0x0E57, 0x0E60, 0x0E7C, 0x0E80, 0x0E8E,
+	0x0E90, 0x0E96, 0x0EA0, 0x0EA8, 0x0EB0, 0x0EB2, 0xE140, 0xE147,
+	0xE150, 0xE187, 0xE1A0, 0xE1A9, 0xE1B0, 0xE1B6, 0xE1C0, 0xE1C7,
+	0xE1D0, 0xE1D1, 0xE200, 0xE201, 0xE210, 0xE21C, 0xE240, 0xE268,
+	0xE000, 0xE006, 0xE010, 0xE09A, 0xE0A0, 0xE0A4, 0xE0AA, 0xE0EB,
+	0xE100, 0xE105, 0xE380, 0xE38F, 0xE3B0, 0xE3B0, 0xE400, 0xE405,
+	0xE408, 0xE4E9, 0xE4F0, 0xE4F0, 0xE280, 0xE280, 0xE282, 0xE2A3,
+	0xE2A5, 0xE2C2, 0xE940, 0xE947, 0xE950, 0xE987, 0xE9A0, 0xE9A9,
+	0xE9B0, 0xE9B6, 0xE9C0, 0xE9C7, 0xE9D0, 0xE9D1, 0xEA00, 0xEA01,
+	0xEA10, 0xEA1C, 0xEA40, 0xEA68, 0xE800, 0xE806, 0xE810, 0xE89A,
+	0xE8A0, 0xE8A4, 0xE8AA, 0xE8EB, 0xE900, 0xE905, 0xEB80, 0xEB8F,
+	0xEBB0, 0xEBB0, 0xEC00, 0xEC05, 0xEC08, 0xECE9, 0xECF0, 0xECF0,
+	0xEA80, 0xEA80, 0xEA82, 0xEAA3, 0xEAA5, 0xEAC2, 0xA800, 0xA8FF,
+	0xAC60, 0xAC60, 0xB000, 0xB97F, 0xB9A0, 0xB9BF,
+	~0
+};
+
+static void a5xx_dump(struct msm_gpu *gpu)
+{
+	dev_info(gpu->dev->dev, "status:   %08x\n",
+		gpu_read(gpu, REG_A5XX_RBBM_STATUS));
+	adreno_dump(gpu);
+}
+
+static int a5xx_pm_resume(struct msm_gpu *gpu)
+{
+	return msm_gpu_pm_resume(gpu);
+}
+
+static int a5xx_pm_suspend(struct msm_gpu *gpu)
+{
+	return msm_gpu_pm_suspend(gpu);
+}
+
+static int a5xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
+{
+	*value = gpu_read64(gpu, REG_A5XX_RBBM_PERFCTR_CP_0_LO,
+		REG_A5XX_RBBM_PERFCTR_CP_0_HI);
+
+	return 0;
+}
+
+#ifdef CONFIG_DEBUG_FS
+static void a5xx_show(struct msm_gpu *gpu, struct seq_file *m)
+{
+	gpu->funcs->pm_resume(gpu);
+
+	seq_printf(m, "status:   %08x\n",
+			gpu_read(gpu, REG_A5XX_RBBM_STATUS));
+	gpu->funcs->pm_suspend(gpu);
+
+	adreno_show(gpu, m);
+}
+#endif
+
+static const struct adreno_gpu_funcs funcs = {
+	.base = {
+		.get_param = adreno_get_param,
+		.hw_init = a5xx_hw_init,
+		.pm_suspend = a5xx_pm_suspend,
+		.pm_resume = a5xx_pm_resume,
+		.recover = a5xx_recover,
+		.last_fence = adreno_last_fence,
+		.submit = a5xx_submit,
+		.flush = adreno_flush,
+		.idle = a5xx_idle,
+		.irq = a5xx_irq,
+		.destroy = a5xx_destroy,
+		.show = a5xx_show,
+	},
+	.get_timestamp = a5xx_get_timestamp,
+};
+
+struct msm_gpu *a5xx_gpu_init(struct drm_device *dev)
+{
+	struct msm_drm_private *priv = dev->dev_private;
+	struct platform_device *pdev = priv->gpu_pdev;
+	struct a5xx_gpu *a5xx_gpu = NULL;
+	struct adreno_gpu *adreno_gpu;
+	struct msm_gpu *gpu;
+	int ret;
+
+	if (!pdev) {
+		dev_err(dev->dev, "No A5XX device is defined\n");
+		return ERR_PTR(-ENXIO);
+	}
+
+	a5xx_gpu = kzalloc(sizeof(*a5xx_gpu), GFP_KERNEL);
+	if (!a5xx_gpu)
+		return ERR_PTR(-ENOMEM);
+
+	adreno_gpu = &a5xx_gpu->base;
+	gpu = &adreno_gpu->base;
+
+	a5xx_gpu->pdev = pdev;
+	adreno_gpu->registers = a5xx_registers;
+	adreno_gpu->reg_offsets = a5xx_register_offsets;
+
+	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs);
+	if (ret) {
+		a5xx_destroy(&(a5xx_gpu->base.base));
+		return ERR_PTR(ret);
+	}
+
+	return gpu;
+}
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.h b/drivers/gpu/drm/msm/adreno/a5xx_gpu.h
new file mode 100644
index 0000000..d58d6a3
--- /dev/null
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.h
@@ -0,0 +1,37 @@
+/* Copyright (c) 2016 The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+#ifndef __A5XX_GPU_H__
+#define __A5XX_GPU_H__
+
+#include "adreno_gpu.h"
+
+/* Bringing over the hack from the previous targets */
+#undef ROP_COPY
+#undef ROP_XOR
+
+#include "a5xx.xml.h"
+
+struct a5xx_gpu {
+	struct adreno_gpu base;
+	struct platform_device *pdev;
+
+	struct drm_gem_object *pm4_bo;
+	uint32_t pm4_iova;
+
+	struct drm_gem_object *pfp_bo;
+	uint32_t pfp_iova;
+};
+
+#define to_a5xx_gpu(x) container_of(x, struct a5xx_gpu, base)
+
+#endif /* __A5XX_GPU_H__ */
diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c
index 3321d33..7749b60 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_device.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
@@ -27,6 +27,7 @@ module_param_named(hang_debug, hang_debug, bool, 0600);
 
 struct msm_gpu *a3xx_gpu_init(struct drm_device *dev);
 struct msm_gpu *a4xx_gpu_init(struct drm_device *dev);
+struct msm_gpu *a5xx_gpu_init(struct drm_device *dev);
 
 static const struct adreno_info gpulist[] = {
 	{
@@ -77,6 +78,14 @@ static const struct adreno_info gpulist[] = {
 		.pfpfw = "a420_pfp.fw",
 		.gmem  = (SZ_1M + SZ_512K),
 		.init  = a4xx_gpu_init,
+	}, {
+		.rev = ADRENO_REV(5, 3, 0, ANY_ID),
+		.revn = 530,
+		.name = "A530",
+		.pm4fw = "a530_pm4.fw",
+		.pfpfw = "a530_pfp.fw",
+		.gmem = SZ_1M,
+		.init = a5xx_gpu_init,
 	},
 };
 
@@ -86,6 +95,8 @@ MODULE_FIRMWARE("a330_pm4.fw");
 MODULE_FIRMWARE("a330_pfp.fw");
 MODULE_FIRMWARE("a420_pm4.fw");
 MODULE_FIRMWARE("a420_pfp.fw");
+MODULE_FIRMWARE("a530_fm4.fw");
+MODULE_FIRMWARE("a530_pfp.fw");
 
 static inline bool _rev_match(uint8_t entry, uint8_t id)
 {
@@ -173,12 +184,20 @@ static void set_gpu_pdev(struct drm_device *dev,
 	priv->gpu_pdev = pdev;
 }
 
+static const struct {
+	const char *str;
+	uint32_t flag;
+} quirks[] = {
+	{ "qcom,gpu-quirk-two-pass-use-wfi", ADRENO_QUIRK_TWO_PASS_USE_WFI },
+	{ "qcom,gpu-quirk-fault-detect-mask", ADRENO_QUIRK_FAULT_DETECT_MASK },
+};
+
 static int adreno_bind(struct device *dev, struct device *master, void *data)
 {
 	static struct adreno_platform_config config = {};
 	struct device_node *child, *node = dev->of_node;
 	u32 val;
-	int ret;
+	int ret, i;
 
 	ret = of_property_read_u32(node, "qcom,chipid", &val);
 	if (ret) {
@@ -212,6 +231,10 @@ static int adreno_bind(struct device *dev, struct device *master, void *data)
 		return -ENXIO;
 	}
 
+	for (i = 0; i < ARRAY_SIZE(quirks); i++)
+		if (of_property_read_bool(node, quirks[i].str))
+			config.quirks |= quirks[i].flag;
+
 	dev->platform_data = &config;
 	set_gpu_pdev(dev_get_drvdata(master), to_platform_device(dev));
 	return 0;
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 6cc1d7f..b3227c3 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -22,7 +22,7 @@
 #include "msm_mmu.h"
 
 #define RB_SIZE    SZ_32K
-#define RB_BLKSIZE 16
+#define RB_BLKSIZE 32
 
 int adreno_get_param(struct msm_gpu *gpu, uint32_t param, uint64_t *value)
 {
@@ -54,9 +54,6 @@ int adreno_get_param(struct msm_gpu *gpu, uint32_t param, uint64_t *value)
 	}
 }
 
-#define rbmemptr(adreno_gpu, member)  \
-	((adreno_gpu)->memptrs_iova + offsetof(struct adreno_rbmemptrs, member))
-
 int adreno_hw_init(struct msm_gpu *gpu)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
@@ -355,6 +352,7 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 	adreno_gpu->gmem = adreno_gpu->info->gmem;
 	adreno_gpu->revn = adreno_gpu->info->revn;
 	adreno_gpu->rev = config->rev;
+	adreno_gpu->quirks = config->quirks;
 
 	gpu->fast_rate = config->fast_rate;
 	gpu->slow_rate = config->slow_rate;
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
index 7b9895f..a946cf2 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
@@ -48,6 +48,11 @@ enum adreno_regs {
 	REG_ADRENO_REGISTER_MAX,
 };
 
+enum adreno_quirks {
+	ADRENO_QUIRK_TWO_PASS_USE_WFI = 1,
+	ADRENO_QUIRK_FAULT_DETECT_MASK = 2,
+};
+
 struct adreno_rev {
 	uint8_t  core;
 	uint8_t  major;
@@ -74,6 +79,9 @@ struct adreno_info {
 
 const struct adreno_info *adreno_info(struct adreno_rev rev);
 
+#define rbmemptr(adreno_gpu, member)  \
+	((adreno_gpu)->memptrs_iova + offsetof(struct adreno_rbmemptrs, member))
+
 struct adreno_rbmemptrs {
 	volatile uint32_t rptr;
 	volatile uint32_t wptr;
@@ -107,6 +115,8 @@ struct adreno_gpu {
 	 * code (a3xx_gpu.c) and stored in this common location.
 	 */
 	const unsigned int *reg_offsets;
+
+	uint32_t quirks;
 };
 #define to_adreno_gpu(x) container_of(x, struct adreno_gpu, base)
 
@@ -117,6 +127,7 @@ struct adreno_platform_config {
 #ifdef DOWNSTREAM_CONFIG_MSM_BUS_SCALING
 	struct msm_bus_scale_pdata *bus_scale_table;
 #endif
+	uint32_t quirks;
 };
 
 #define ADRENO_IDLE_TIMEOUT msecs_to_jiffies(1000)
@@ -180,6 +191,11 @@ static inline int adreno_is_a430(struct adreno_gpu *gpu)
        return gpu->revn == 430;
 }
 
+static inline int adreno_is_a530(struct adreno_gpu *gpu)
+{
+	return gpu->revn == 530;
+}
+
 int adreno_get_param(struct msm_gpu *gpu, uint32_t param, uint64_t *value);
 int adreno_hw_init(struct msm_gpu *gpu);
 uint32_t adreno_last_fence(struct msm_gpu *gpu);
@@ -304,4 +320,28 @@ static inline void adreno_gpu_write64(struct adreno_gpu *gpu,
 	adreno_gpu_write(gpu, hi, upper_32_bits(data));
 }
 
+/*
+ * Given a register and a count, return a value to program into
+ * REG_CP_PROTECT_REG(n) - this will block both reads and writes for _len
+ * registers starting at _reg.
+ *
+ * The register base needs to be a multiple of the length. If it is not, the
+ * hardware will quietly mask off the bits for you and shift the size. For
+ * example, if you intend the protection to start at 0x07 for a length of 4
+ * (0x07-0x0A) the hardware will actually protect (0x04-0x07) which might
+ * expose registers you intended to protect!
+ */
+#define ADRENO_PROTECT_RW(_reg, _len) \
+	((1 << 30) | (1 << 29) | \
+	((ilog2((_len)) & 0x1F) << 24) | (((_reg) << 2) & 0xFFFFF))
+
+/*
+ * Same as above, but allow reads over the range. For areas of mixed use (such
+ * as performance counters) this allows us to protect a much larger range with a
+ * single register
+ */
+#define ADRENO_PROTECT_RDONLY(_reg, _len) \
+	((1 << 29) \
+	((ilog2((_len)) & 0x1F) << 24) | (((_reg) << 2) & 0xFFFFF))
+
 #endif /* __ADRENO_GPU_H__ */
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 207c7d2e..23df06c 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -96,6 +96,10 @@ static int enable_clk(struct msm_gpu *gpu)
 	if (gpu->grp_clks[0] && gpu->fast_rate)
 		clk_set_rate(gpu->grp_clks[0], gpu->fast_rate);
 
+	/* Set the RBBM timer rate to 19.2Mhz */
+	if (gpu->grp_clks[2])
+		clk_set_rate(gpu->grp_clks[2], 19200000);
+
 	for (i = ARRAY_SIZE(gpu->grp_clks) - 1; i >= 0; i--)
 		if (gpu->grp_clks[i])
 			clk_prepare(gpu->grp_clks[i]);
@@ -122,6 +126,9 @@ static int disable_clk(struct msm_gpu *gpu)
 	if (gpu->grp_clks[0] && gpu->slow_rate)
 		clk_set_rate(gpu->grp_clks[0], gpu->slow_rate);
 
+	if (gpu->grp_clks[2])
+		clk_set_rate(gpu->grp_clks[2], 0);
+
 	return 0;
 }
 
@@ -562,8 +569,8 @@ static irqreturn_t irq_handler(int irq, void *data)
 }
 
 static const char *clk_names[] = {
-		"core_clk", "iface_clk", "mem_clk", "mem_iface_clk",
-		"alt_mem_iface_clk",
+		"core_clk", "iface_clk", "rbbmtimer_clk", "mem_clk",
+		"mem_iface_clk", "alt_mem_iface_clk",
 };
 
 int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index bec3735..f8e6657 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -103,7 +103,7 @@ struct msm_gpu {
 
 	/* Power Control: */
 	struct regulator *gpu_reg, *gpu_cx;
-	struct clk *ebi1_clk, *grp_clks[5];
+	struct clk *ebi1_clk, *grp_clks[6];
 	uint32_t fast_rate, slow_rate, bus_freq;
 
 #ifdef DOWNSTREAM_CONFIG_MSM_BUS_SCALING
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 13/16] drm/msm: gpu: Add support for the GPMU
       [not found] ` <1478299497-9729-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
                     ` (9 preceding siblings ...)
  2016-11-04 22:44   ` [PATCH 10/16] drm/msm: Disable interrupts during init Jordan Crouse
@ 2016-11-04 22:44   ` Jordan Crouse
       [not found]     ` <1478299497-9729-14-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
  2016-11-04 22:44   ` [PATCH 15/16] drm/msm: Add a quick and dirty PIL loader Jordan Crouse
  2016-11-04 22:44   ` [PATCH 16/16] drm/msm: gpu: Use the zap shader on 5XX if we can Jordan Crouse
  12 siblings, 1 reply; 29+ messages in thread
From: Jordan Crouse @ 2016-11-04 22:44 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA

Most 5XX targets have GPMU (Graphics Power Management Unit) that
handles a lot of the heavy lifting for power management including
thermal and limits management and dynamic power collapse. While
the GPMU itself is optional, it is usually nessesary to hit
aggressive power targets.

The GPMU firmware needs to be loaded into the GPMU at init time via a
shared hardware block of registers. Using the GPU to write the microcode
is more efficient than using the CPU so at first load create an indirect
buffer that can be executed during subsequent initalization sequences.

After loading the GPMU gets initalized through a shared register
interface and then we mostly get out of its way and let it do
its thing.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/Makefile               |   1 +
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c      |  64 +++++-
 drivers/gpu/drm/msm/adreno/a5xx_gpu.h      |  23 ++
 drivers/gpu/drm/msm/adreno/a5xx_power.c    | 341 +++++++++++++++++++++++++++++
 drivers/gpu/drm/msm/adreno/adreno_device.c |   1 +
 drivers/gpu/drm/msm/adreno/adreno_gpu.h    |   1 +
 6 files changed, 428 insertions(+), 3 deletions(-)
 create mode 100644 drivers/gpu/drm/msm/adreno/a5xx_power.c

diff --git a/drivers/gpu/drm/msm/Makefile b/drivers/gpu/drm/msm/Makefile
index 7c4ddb3..394c52e 100644
--- a/drivers/gpu/drm/msm/Makefile
+++ b/drivers/gpu/drm/msm/Makefile
@@ -7,6 +7,7 @@ msm-y := \
 	adreno/a3xx_gpu.o \
 	adreno/a4xx_gpu.o \
 	adreno/a5xx_gpu.o \
+	adreno/a5xx_power.o \
 	hdmi/hdmi.o \
 	hdmi/hdmi_audio.o \
 	hdmi/hdmi_bridge.o \
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index 89fea7e..997ae31 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -450,6 +450,9 @@ static int a5xx_hw_init(struct msm_gpu *gpu)
 		REG_A5XX_RBBM_SECVID_TSB_TRUSTED_BASE_HI, 0x00000000);
 	gpu_write(gpu, REG_A5XX_RBBM_SECVID_TSB_TRUSTED_SIZE, 0x00000000);
 
+	/* Load the GPMU firmware before starting the HW init */
+	a5xx_gpmu_ucode_init(gpu);
+
 	ret = adreno_hw_init(gpu);
 	if (ret)
 		return ret;
@@ -467,8 +470,9 @@ static int a5xx_hw_init(struct msm_gpu *gpu)
 	if (ret)
 		return ret;
 
-	/* Put the GPU into insecure mode */
-	gpu_write(gpu, REG_A5XX_RBBM_SECVID_TRUST_CNTL, 0x0);
+	ret = a5xx_power_init(gpu);
+	if (ret)
+		return ret;
 
 	/*
 	 * Send a pipeline event stat to get misbehaving counters to start
@@ -483,6 +487,9 @@ static int a5xx_hw_init(struct msm_gpu *gpu)
 			return -EINVAL;
 	}
 
+	/* Put the GPU into unsecure mode */
+	gpu_write(gpu, REG_A5XX_RBBM_SECVID_TRUST_CNTL, 0x0);
+
 	return 0;
 }
 
@@ -515,6 +522,12 @@ static void a5xx_destroy(struct msm_gpu *gpu)
 		drm_gem_object_unreference_unlocked(a5xx_gpu->pfp_bo);
 	}
 
+	if (a5xx_gpu->gpmu_bo) {
+		if (a5xx_gpu->gpmu_bo)
+			msm_gem_put_iova(a5xx_gpu->gpmu_bo, gpu->id);
+		drm_gem_object_unreference_unlocked(a5xx_gpu->gpmu_bo);
+	}
+
 	adreno_gpu_cleanup(adreno_gpu);
 	kfree(a5xx_gpu);
 }
@@ -738,11 +751,54 @@ static void a5xx_dump(struct msm_gpu *gpu)
 
 static int a5xx_pm_resume(struct msm_gpu *gpu)
 {
-	return msm_gpu_pm_resume(gpu);
+	int ret;
+
+	/* Turn on the core power */
+	ret = msm_gpu_pm_resume(gpu);
+	if (ret)
+		return ret;
+
+	/* Turn the RBCCU domain first to limit the chances of voltage droop */
+	gpu_write(gpu, REG_A5XX_GPMU_RBCCU_POWER_CNTL, 0x778000);
+
+	/* Wait 3 usecs before polling */
+	udelay(3);
+
+	ret = spin_usecs(gpu, 20, REG_A5XX_GPMU_RBCCU_PWR_CLK_STATUS,
+		(1 << 20), (1 << 20));
+	if (ret) {
+		DRM_ERROR("%s: timeout waiting for RBCCU GDSC enable: %X\n",
+			gpu->name,
+			gpu_read(gpu, REG_A5XX_GPMU_RBCCU_PWR_CLK_STATUS));
+		return ret;
+	}
+
+	/* Turn on the SP domain */
+	gpu_write(gpu, REG_A5XX_GPMU_SP_POWER_CNTL, 0x778000);
+	ret = spin_usecs(gpu, 20, REG_A5XX_GPMU_SP_PWR_CLK_STATUS,
+		(1 << 20), (1 << 20));
+	if (ret)
+		DRM_ERROR("%s: timeout waiting for SP GDSC enable\n",
+			gpu->name);
+
+	return ret;
 }
 
 static int a5xx_pm_suspend(struct msm_gpu *gpu)
 {
+	/* Clear the VBIF pipe before shutting down */
+	gpu_write(gpu, REG_A5XX_VBIF_XIN_HALT_CTRL0, 0xF);
+	spin_until((gpu_read(gpu, REG_A5XX_VBIF_XIN_HALT_CTRL1) & 0xF) == 0xF);
+
+	gpu_write(gpu, REG_A5XX_VBIF_XIN_HALT_CTRL0, 0);
+
+	/*
+	 * Reset the VBIF before power collapse to avoid issue with FIFO
+	 * entries
+	 */
+	gpu_write(gpu, REG_A5XX_RBBM_BLOCK_SW_RESET_CMD, 0x003C0000);
+	gpu_write(gpu, REG_A5XX_RBBM_BLOCK_SW_RESET_CMD, 0x00000000);
+
 	return msm_gpu_pm_suspend(gpu);
 }
 
@@ -810,6 +866,8 @@ struct msm_gpu *a5xx_gpu_init(struct drm_device *dev)
 	adreno_gpu->registers = a5xx_registers;
 	adreno_gpu->reg_offsets = a5xx_register_offsets;
 
+	a5xx_gpu->lm_leakage = 0x4E001A;
+
 	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs);
 	if (ret) {
 		a5xx_destroy(&(a5xx_gpu->base.base));
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.h b/drivers/gpu/drm/msm/adreno/a5xx_gpu.h
index d58d6a3..7fa2be8 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.h
@@ -30,8 +30,31 @@ struct a5xx_gpu {
 
 	struct drm_gem_object *pfp_bo;
 	uint32_t pfp_iova;
+
+	struct drm_gem_object *gpmu_bo;
+	uint32_t gpmu_iova;
+	uint32_t gpmu_dwords;
+
+	uint32_t lm_leakage;
 };
 
 #define to_a5xx_gpu(x) container_of(x, struct a5xx_gpu, base)
 
+int a5xx_power_init(struct msm_gpu *gpu);
+void a5xx_gpmu_ucode_init(struct msm_gpu *gpu);
+
+static inline int spin_usecs(struct msm_gpu *gpu, uint32_t usecs,
+		uint32_t reg, uint32_t mask, uint32_t value)
+{
+	while (usecs--) {
+		udelay(1);
+		if ((gpu_read(gpu, reg) & mask) == value)
+			return 0;
+		cpu_relax();
+	}
+
+	return -ETIMEDOUT;
+}
+
+
 #endif /* __A5XX_GPU_H__ */
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_power.c b/drivers/gpu/drm/msm/adreno/a5xx_power.c
new file mode 100644
index 0000000..7c76bde
--- /dev/null
+++ b/drivers/gpu/drm/msm/adreno/a5xx_power.c
@@ -0,0 +1,341 @@
+/* Copyright (c) 2016 The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/pm_opp.h>
+#include "a5xx_gpu.h"
+
+/*
+ * The GPMU data block is a block of shared registers that can be used to
+ * communicate back and forth. These "registers" are by convention with the GPMU
+ * firwmare and not bound to any specific hardware design
+ */
+
+#define AGC_INIT_BASE REG_A5XX_GPMU_DATA_RAM_BASE
+#define AGC_INIT_MSG_MAGIC (AGC_INIT_BASE + 5)
+#define AGC_MSG_BASE (AGC_INIT_BASE + 7)
+
+#define AGC_MSG_STATE (AGC_MSG_BASE + 0)
+#define AGC_MSG_COMMAND (AGC_MSG_BASE + 1)
+#define AGC_MSG_PAYLOAD_SIZE (AGC_MSG_BASE + 3)
+#define AGC_MSG_PAYLOAD(_o) ((AGC_MSG_BASE + 5) + (_o))
+
+#define AGC_POWER_CONFIG_PRODUCTION_ID 1
+#define AGC_INIT_MSG_VALUE 0xBABEFACE
+
+static struct {
+	uint32_t reg;
+	uint32_t value;
+} a5xx_sequence_regs[] = {
+	{ 0xB9A1, 0x00010303 },
+	{ 0xB9A2, 0x13000000 },
+	{ 0xB9A3, 0x00460020 },
+	{ 0xB9A4, 0x10000000 },
+	{ 0xB9A5, 0x040A1707 },
+	{ 0xB9A6, 0x00010000 },
+	{ 0xB9A7, 0x0E000904 },
+	{ 0xB9A8, 0x10000000 },
+	{ 0xB9A9, 0x01165000 },
+	{ 0xB9AA, 0x000E0002 },
+	{ 0xB9AB, 0x03884141 },
+	{ 0xB9AC, 0x10000840 },
+	{ 0xB9AD, 0x572A5000 },
+	{ 0xB9AE, 0x00000003 },
+	{ 0xB9AF, 0x00000000 },
+	{ 0xB9B0, 0x10000000 },
+	{ 0xB828, 0x6C204010 },
+	{ 0xB829, 0x6C204011 },
+	{ 0xB82A, 0x6C204012 },
+	{ 0xB82B, 0x6C204013 },
+	{ 0xB82C, 0x6C204014 },
+	{ 0xB90F, 0x00000004 },
+	{ 0xB910, 0x00000002 },
+	{ 0xB911, 0x00000002 },
+	{ 0xB912, 0x00000002 },
+	{ 0xB913, 0x00000002 },
+	{ 0xB92F, 0x00000004 },
+	{ 0xB930, 0x00000005 },
+	{ 0xB931, 0x00000005 },
+	{ 0xB932, 0x00000005 },
+	{ 0xB933, 0x00000005 },
+	{ 0xB96F, 0x00000001 },
+	{ 0xB970, 0x00000003 },
+	{ 0xB94F, 0x00000004 },
+	{ 0xB950, 0x0000000B },
+	{ 0xB951, 0x0000000B },
+	{ 0xB952, 0x0000000B },
+	{ 0xB953, 0x0000000B },
+	{ 0xB907, 0x00000019 },
+	{ 0xB927, 0x00000019 },
+	{ 0xB947, 0x00000019 },
+	{ 0xB967, 0x00000019 },
+	{ 0xB987, 0x00000019 },
+	{ 0xB906, 0x00220001 },
+	{ 0xB926, 0x00220001 },
+	{ 0xB946, 0x00220001 },
+	{ 0xB966, 0x00220001 },
+	{ 0xB986, 0x00300000 },
+	{ 0xAC40, 0x0340FF41 },
+	{ 0xAC41, 0x03BEFED0 },
+	{ 0xAC42, 0x00331FED },
+	{ 0xAC43, 0x021FFDD3 },
+	{ 0xAC44, 0x5555AAAA },
+	{ 0xAC45, 0x5555AAAA },
+	{ 0xB9BA, 0x00000008 },
+};
+
+/*
+ * Get the actual voltage value for the operating point at the specified
+ * frequency
+ */
+static inline uint32_t _get_mvolts(struct msm_gpu *gpu, uint32_t freq)
+{
+	struct drm_device *dev = gpu->dev;
+	struct msm_drm_private *priv = dev->dev_private;
+	struct platform_device *pdev = priv->gpu_pdev;
+	struct dev_pm_opp *opp;
+
+	opp = dev_pm_opp_find_freq_exact(&pdev->dev, freq, true);
+
+	return (!IS_ERR(opp)) ? dev_pm_opp_get_voltage(opp) / 1000 : 0;
+}
+
+/* Setup thermal limit management */
+static void a5xx_lm_setup(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a5xx_gpu *a5xx_gpu = to_a5xx_gpu(adreno_gpu);
+	unsigned int i;
+
+	/* Write the block of sequence registers */
+	for (i = 0; i < ARRAY_SIZE(a5xx_sequence_regs); i++)
+		gpu_write(gpu, a5xx_sequence_regs[i].reg,
+			a5xx_sequence_regs[i].value);
+
+	/* Hard code the A530 GPU thermal sensor ID for the GPMU */
+	gpu_write(gpu, REG_A5XX_GPMU_TEMP_SENSOR_ID, 0x60007);
+	gpu_write(gpu, REG_A5XX_GPMU_DELTA_TEMP_THRESHOLD, 0x01);
+	gpu_write(gpu, REG_A5XX_GPMU_TEMP_SENSOR_CONFIG, 0x01);
+
+	/* Until we get clock scaling 0 is always the active power level */
+	gpu_write(gpu, REG_A5XX_GPMU_GPMU_VOLTAGE, 0x80000000 | 0);
+
+	gpu_write(gpu, REG_A5XX_GPMU_BASE_LEAKAGE, a5xx_gpu->lm_leakage);
+
+	/* The threshold is fixed at 6000 for A530 */
+	gpu_write(gpu, REG_A5XX_GPMU_GPMU_PWR_THRESHOLD, 0x80000000 | 6000);
+
+	gpu_write(gpu, REG_A5XX_GPMU_BEC_ENABLE, 0x10001FFF);
+	gpu_write(gpu, REG_A5XX_GDPM_CONFIG1, 0x00201FF1);
+
+	/* Write the voltage table */
+	gpu_write(gpu, REG_A5XX_GPMU_BEC_ENABLE, 0x10001FFF);
+	gpu_write(gpu, REG_A5XX_GDPM_CONFIG1, 0x201FF1);
+
+	gpu_write(gpu, AGC_MSG_STATE, 1);
+	gpu_write(gpu, AGC_MSG_COMMAND, AGC_POWER_CONFIG_PRODUCTION_ID);
+
+	/* Write the max power - hard coded to 5448 for A530 */
+	gpu_write(gpu, AGC_MSG_PAYLOAD(0), 5448);
+	gpu_write(gpu, AGC_MSG_PAYLOAD(1), 1);
+
+	/*
+	 * For now just write the one voltage level - we will do more when we
+	 * can do scaling
+	 */
+	gpu_write(gpu, AGC_MSG_PAYLOAD(2), _get_mvolts(gpu, gpu->fast_rate));
+	gpu_write(gpu, AGC_MSG_PAYLOAD(3), gpu->fast_rate / 1000000);
+
+	gpu_write(gpu, AGC_MSG_PAYLOAD_SIZE, 4 * sizeof(uint32_t));
+	gpu_write(gpu, AGC_INIT_MSG_MAGIC, AGC_INIT_MSG_VALUE);
+}
+
+/* Enable SP/TP cpower collapse */
+static void a5xx_pc_init(struct msm_gpu *gpu)
+{
+	gpu_write(gpu, REG_A5XX_GPMU_PWR_COL_INTER_FRAME_CTRL, 0x7F);
+	gpu_write(gpu, REG_A5XX_GPMU_PWR_COL_BINNING_CTRL, 0);
+	gpu_write(gpu, REG_A5XX_GPMU_PWR_COL_INTER_FRAME_HYST, 0xA0080);
+	gpu_write(gpu, REG_A5XX_GPMU_PWR_COL_STAGGER_DELAY, 0x600040);
+}
+
+/* Enable the GPMU microcontroller */
+static int a5xx_gpmu_init(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a5xx_gpu *a5xx_gpu = to_a5xx_gpu(adreno_gpu);
+	struct msm_ringbuffer *ring = gpu->rb;
+
+	if (!a5xx_gpu->gpmu_dwords)
+		return 0;
+
+	/* Turn off protected mode for this operation */
+	OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1);
+	OUT_RING(ring, 0);
+
+	/* Kick off the IB to load the GPMU microcode */
+	OUT_PKT7(ring, CP_INDIRECT_BUFFER_PFE, 3);
+	OUT_RING(ring, lower_32_bits(a5xx_gpu->gpmu_iova));
+	OUT_RING(ring, upper_32_bits(a5xx_gpu->gpmu_iova));
+	OUT_RING(ring, a5xx_gpu->gpmu_dwords);
+
+	/* Turn back on protected mode */
+	OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1);
+	OUT_RING(ring, 1);
+
+	gpu->funcs->flush(gpu);
+
+	if (!gpu->funcs->idle(gpu)) {
+		DRM_ERROR("%s: Unable to load GPMU firmware. GPMU will not be active\n",
+			gpu->name);
+		return -EINVAL;
+	}
+
+	gpu_write(gpu, REG_A5XX_GPMU_WFI_CONFIG, 0x4014);
+
+	/* Kick off the GPMU */
+	gpu_write(gpu, REG_A5XX_GPMU_CM3_SYSRESET, 0x0);
+
+	/*
+	 * Wait for the GPMU to respond. It isn't fatal if it doesn't, we just
+	 * won't have advanced power collapse.
+	 */
+	if (spin_usecs(gpu, 25, REG_A5XX_GPMU_GENERAL_0, 0xFFFFFFFF,
+		0xBABEFACE))
+		DRM_ERROR("%s: GPMU firmware initialization timed out\n",
+			gpu->name);
+
+	return 0;
+}
+
+/* Enable limits management */
+static void a5xx_lm_enable(struct msm_gpu *gpu)
+{
+	gpu_write(gpu, REG_A5XX_GDPM_INT_MASK, 0x0);
+	gpu_write(gpu, REG_A5XX_GDPM_INT_EN, 0x0A);
+	gpu_write(gpu, REG_A5XX_GPMU_GPMU_VOLTAGE_INTR_EN_MASK, 0x01);
+	gpu_write(gpu, REG_A5XX_GPMU_TEMP_THRESHOLD_INTR_EN_MASK, 0x50000);
+	gpu_write(gpu, REG_A5XX_GPMU_THROTTLE_UNMASK_FORCE_CTRL, 0x30000);
+
+	gpu_write(gpu, REG_A5XX_GPMU_CLOCK_THROTTLE_CTRL, 0x011);
+}
+
+int a5xx_power_init(struct msm_gpu *gpu)
+{
+	int ret;
+
+	/* Set up the limits management */
+	a5xx_lm_setup(gpu);
+
+	/* Set up SP/TP power collpase */
+	a5xx_pc_init(gpu);
+
+	/* Start the GPMU */
+	ret = a5xx_gpmu_init(gpu);
+	if (ret)
+		return ret;
+
+	/* Start the limits management */
+	a5xx_lm_enable(gpu);
+
+	return 0;
+}
+
+void a5xx_gpmu_ucode_init(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a5xx_gpu *a5xx_gpu = to_a5xx_gpu(adreno_gpu);
+	struct drm_device *drm = gpu->dev;
+	const struct firmware *fw;
+	uint32_t dwords = 0, offset = 0, bosize;
+	unsigned int *data, *ptr, *cmds;
+	unsigned int cmds_size;
+
+	/* Get the firmware */
+	if (request_firmware(&fw, adreno_gpu->info->gpmufw, drm->dev)) {
+		DRM_ERROR("%s: Could not get GPMU firmware. GPMU will not be active\n",
+			gpu->name);
+		return;
+	}
+
+	data = (unsigned int *) fw->data;
+
+	/*
+	 * The first dword is the size of the remaining data in dwords. Use it
+	 * as a checksum of sorts and make sure it matches the actual size of
+	 * the firmware that we read
+	 */
+
+	if (fw->size < 8 || (data[0] < 2) || (data[0] >= (fw->size >> 2)))
+		goto out;
+
+	/* The second dword is an ID - look for 2 (GPMU_FIRMWARE_ID) */
+	if (data[1] != 2)
+		goto out;
+
+	cmds = data + data[2] + 3;
+	cmds_size = data[0] - data[2] - 2;
+
+	/*
+	 * A single type4 opcode can only have so many values attached so
+	 * add enough opcodes to load the all the commands
+	 */
+	bosize = (cmds_size + (cmds_size / TYPE4_MAX_PAYLOAD) + 1) << 2;
+
+	mutex_lock(&drm->struct_mutex);
+	a5xx_gpu->gpmu_bo = msm_gem_new(drm, bosize, MSM_BO_UNCACHED);
+	mutex_unlock(&drm->struct_mutex);
+
+	if (IS_ERR(a5xx_gpu->gpmu_bo))
+		goto err;
+
+	if (msm_gem_get_iova(a5xx_gpu->gpmu_bo, gpu->id, &a5xx_gpu->gpmu_iova))
+		goto err;
+
+	ptr = msm_gem_get_vaddr(a5xx_gpu->gpmu_bo);
+	if (!ptr)
+		goto err;
+
+	while (cmds_size > 0) {
+		int i;
+		uint32_t _size = cmds_size > TYPE4_MAX_PAYLOAD ?
+			TYPE4_MAX_PAYLOAD : cmds_size;
+
+		ptr[dwords++] = PKT4(REG_A5XX_GPMU_INST_RAM_BASE + offset,
+			_size);
+
+		for (i = 0; i < _size; i++)
+			ptr[dwords++] = *cmds++;
+
+		offset += _size;
+		cmds_size -= _size;
+	}
+
+	msm_gem_put_vaddr(a5xx_gpu->gpmu_bo);
+	a5xx_gpu->gpmu_dwords = dwords;
+
+	goto out;
+
+err:
+	if (a5xx_gpu->gpmu_iova)
+		msm_gem_put_iova(a5xx_gpu->gpmu_bo, gpu->id);
+	if (a5xx_gpu->gpmu_bo)
+		drm_gem_object_unreference_unlocked(a5xx_gpu->gpmu_bo);
+
+	a5xx_gpu->gpmu_bo = NULL;
+	a5xx_gpu->gpmu_iova = 0;
+	a5xx_gpu->gpmu_dwords = 0;
+
+out:
+	/* No need to keep that firmware laying around anymore */
+	release_firmware(fw);
+}
diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c
index 7749b60..0d9ceb2 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_device.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
@@ -86,6 +86,7 @@ static const struct adreno_info gpulist[] = {
 		.pfpfw = "a530_pfp.fw",
 		.gmem = SZ_1M,
 		.init = a5xx_gpu_init,
+		.gpmufw = "a530v3_gpmu.fw2",
 	},
 };
 
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
index a946cf2..937546a 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
@@ -73,6 +73,7 @@ struct adreno_info {
 	uint32_t revn;
 	const char *name;
 	const char *pm4fw, *pfpfw;
+	const char *gpmufw;
 	uint32_t gmem;
 	struct msm_gpu *(*init)(struct drm_device *dev);
 };
-- 
1.9.1

_______________________________________________
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 14/16] firmware: qcom_scm: Add qcom_scm_gpu_zap_resume()
  2016-11-04 22:44 [RFC] Initial support for the Adreno A5XX Jordan Crouse
                   ` (2 preceding siblings ...)
  2016-11-04 22:44 ` [PATCH 12/16] drm/msm: gpu: Add A5XX target support Jordan Crouse
@ 2016-11-04 22:44 ` Jordan Crouse
  2016-11-08 17:12 ` [Freedreno] [RFC] Initial support for the Adreno A5XX Jordan Crouse
  4 siblings, 0 replies; 29+ messages in thread
From: Jordan Crouse @ 2016-11-04 22:44 UTC (permalink / raw)
  To: freedreno; +Cc: linux-arm-msm

Add an interface to trigger the remote processor to reinitialize the GPU
zap shader on power-up.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/firmware/qcom_scm-32.c |  5 +++++
 drivers/firmware/qcom_scm-64.c | 15 +++++++++++++++
 drivers/firmware/qcom_scm.c    |  6 ++++++
 drivers/firmware/qcom_scm.h    |  2 ++
 include/linux/qcom_scm.h       |  2 ++
 5 files changed, 30 insertions(+)

diff --git a/drivers/firmware/qcom_scm-32.c b/drivers/firmware/qcom_scm-32.c
index c6aeedb..1a0876c 100644
--- a/drivers/firmware/qcom_scm-32.c
+++ b/drivers/firmware/qcom_scm-32.c
@@ -560,3 +560,8 @@ int __qcom_scm_pas_mss_reset(struct device *dev, bool reset)
 
 	return ret ? : le32_to_cpu(out);
 }
+
+int __qcom_scm_gpu_zap_resume(struct device *dev)
+{
+	return -ENOTSUPP;
+}
diff --git a/drivers/firmware/qcom_scm-64.c b/drivers/firmware/qcom_scm-64.c
index 4a0f5ea..31089c1a 100644
--- a/drivers/firmware/qcom_scm-64.c
+++ b/drivers/firmware/qcom_scm-64.c
@@ -358,3 +358,18 @@ int __qcom_scm_pas_mss_reset(struct device *dev, bool reset)
 
 	return ret ? : res.a1;
 }
+
+int __qcom_scm_gpu_zap_resume(struct device *dev)
+{
+	struct qcom_scm_desc desc = {0};
+	struct arm_smccc_res res;
+	int ret;
+
+	desc.args[0] = 0;
+	desc.args[1] = 13;
+	desc.arginfo = QCOM_SCM_ARGS(2);
+
+	ret = qcom_scm_call(dev, QCOM_SCM_SVC_BOOT, 0x0A, &desc, &res);
+
+	return ret ? : res.a1;
+}
diff --git a/drivers/firmware/qcom_scm.c b/drivers/firmware/qcom_scm.c
index e64a501..4912a92 100644
--- a/drivers/firmware/qcom_scm.c
+++ b/drivers/firmware/qcom_scm.c
@@ -308,6 +308,12 @@ static const struct reset_control_ops qcom_scm_pas_reset_ops = {
 	.deassert = qcom_scm_pas_reset_deassert,
 };
 
+int qcom_scm_gpu_zap_resume(void)
+{
+	return __qcom_scm_gpu_zap_resume(__scm->dev);
+}
+EXPORT_SYMBOL(qcom_scm_gpu_zap_resume);
+
 /**
  * qcom_scm_is_available() - Checks if SCM is available
  */
diff --git a/drivers/firmware/qcom_scm.h b/drivers/firmware/qcom_scm.h
index 3584b00..581f934 100644
--- a/drivers/firmware/qcom_scm.h
+++ b/drivers/firmware/qcom_scm.h
@@ -56,6 +56,8 @@ extern int  __qcom_scm_pas_auth_and_reset(struct device *dev, u32 peripheral);
 extern int  __qcom_scm_pas_shutdown(struct device *dev, u32 peripheral);
 extern int  __qcom_scm_pas_mss_reset(struct device *dev, bool reset);
 
+extern int __qcom_scm_gpu_zap_resume(struct device *dev);
+
 /* common error codes */
 #define QCOM_SCM_V2_EBUSY	-12
 #define QCOM_SCM_ENOMEM		-5
diff --git a/include/linux/qcom_scm.h b/include/linux/qcom_scm.h
index cc32ab8..e1729fb 100644
--- a/include/linux/qcom_scm.h
+++ b/include/linux/qcom_scm.h
@@ -37,6 +37,8 @@ extern int qcom_scm_pas_mem_setup(u32 peripheral, phys_addr_t addr,
 extern int qcom_scm_pas_auth_and_reset(u32 peripheral);
 extern int qcom_scm_pas_shutdown(u32 peripheral);
 
+extern int qcom_scm_gpu_zap_resume(void);
+
 #define QCOM_SCM_CPU_PWR_DOWN_L2_ON	0x0
 #define QCOM_SCM_CPU_PWR_DOWN_L2_OFF	0x1
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 15/16] drm/msm: Add a quick and dirty PIL loader
       [not found] ` <1478299497-9729-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
                     ` (10 preceding siblings ...)
  2016-11-04 22:44   ` [PATCH 13/16] drm/msm: gpu: Add support for the GPMU Jordan Crouse
@ 2016-11-04 22:44   ` Jordan Crouse
  2016-11-04 22:44   ` [PATCH 16/16] drm/msm: gpu: Use the zap shader on 5XX if we can Jordan Crouse
  12 siblings, 0 replies; 29+ messages in thread
From: Jordan Crouse @ 2016-11-04 22:44 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA

In order to switch the GPU out of secure mode on most systems we
need to load a zap shader into memory and get it authenticated
and into the secure world.  All the bits and pieces to do
the load are scattered throughout the kernel, but we need to
bring everything together.

Add a semi-custom loader that will read a MDT file and get
it loaded and authenticated through SCM.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 166 ++++++++++++++++++++++++++++++++++
 1 file changed, 166 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index 997ae31..c4a9a12 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -11,12 +11,178 @@
  *
  */
 
+#include <linux/elf.h>
+#include <linux/types.h>
+#include <linux/cpumask.h>
+#include <linux/qcom_scm.h>
+#include <linux/dma-mapping.h>
+#include <linux/of_reserved_mem.h>
 #include "msm_gem.h"
 #include "a5xx_gpu.h"
 
 extern bool hang_debug;
 static void a5xx_dump(struct msm_gpu *gpu);
 
+static inline bool _check_segment(const struct elf32_phdr *phdr)
+{
+	return ((phdr->p_type == PT_LOAD) &&
+		((phdr->p_flags & (7 << 24)) != (2 << 24)) &&
+		phdr->p_memsz);
+}
+
+static int __pil_tz_load_image(struct platform_device *pdev,
+		const struct firmware *mdt, const char *fwname,
+		void *fwptr, size_t fw_size, unsigned long fw_min_addr)
+{
+	char str[64] = { 0 };
+	const struct elf32_hdr *ehdr = (struct elf32_hdr *) mdt->data;
+	const struct elf32_phdr *phdrs = (struct elf32_phdr *) (ehdr + 1);
+	const struct firmware *fw;
+	int i, ret = 0;
+
+	for (i = 0; i < ehdr->e_phnum; i++) {
+		const struct elf32_phdr *phdr = &phdrs[i];
+		size_t offset;
+
+		/* Make sure the segment is loadable */
+		if (!_check_segment(phdr))
+			continue;
+
+		/* Get the offset of the segment within the region */
+		offset = (phdr->p_paddr - fw_min_addr);
+
+		/* Request the file containing the segment */
+		snprintf(str, sizeof(str) - 1, "%s.b%02d", fwname, i);
+
+		ret = request_firmware(&fw, str, &pdev->dev);
+		if (ret) {
+			dev_err(&pdev->dev, "Failed to load segment %s\n", str);
+			break;
+		}
+
+		if (offset + fw->size > fw_size) {
+			dev_err(&pdev->dev, "Segment %s is too big\n", str);
+			ret = -EINVAL;
+			release_firmware(fw);
+			break;
+		}
+
+		/* Copy the segment into place */
+		memcpy(fwptr + offset, fw->data, fw->size);
+		release_firmware(fw);
+	}
+
+	return ret;
+}
+
+static int _pil_tz_load_image(struct platform_device *pdev)
+{
+	char str[64] = { 0 };
+	const char *fwname;
+	const struct elf32_hdr *ehdr;
+	const struct elf32_phdr *phdrs;
+	const struct firmware *mdt;
+	phys_addr_t fw_min_addr, fw_max_addr;
+	dma_addr_t fw_phys;
+	size_t fw_size;
+	u32 pas_id;
+	void *ptr;
+	int i, ret;
+
+	if (pdev == NULL)
+		return -ENODEV;
+
+	if (!qcom_scm_is_available()) {
+		dev_err(&pdev->dev, "SCM is not available\n");
+		return -EINVAL;
+	}
+
+	ret = of_reserved_mem_device_init(&pdev->dev);
+
+	if (ret) {
+		dev_err(&pdev->dev, "Unable to set up the reserved memory\n");
+		return ret;
+	}
+
+	/* Get the firmware and PAS id from the device node */
+	if (of_property_read_string(pdev->dev.of_node, "qcom,firmware",
+		&fwname)) {
+		dev_err(&pdev->dev, "Could not read a firmware name\n");
+		return -EINVAL;
+	}
+
+	if (of_property_read_u32(pdev->dev.of_node, "qcom,pas-id", &pas_id)) {
+		dev_err(&pdev->dev, "Could not read the pas ID\n");
+		return -EINVAL;
+	}
+
+	snprintf(str, sizeof(str) - 1, "%s.mdt", fwname);
+
+	/* Request the MDT file for the firmware */
+	ret = request_firmware(&mdt, str, &pdev->dev);
+	if (ret) {
+		dev_err(&pdev->dev, "Unable to load %s\n", str);
+		return ret;
+	}
+
+	ehdr = (struct elf32_hdr *) mdt->data;
+	phdrs = (struct elf32_phdr *) (ehdr + 1);
+
+	/* Get the extents of the firmware image */
+
+	fw_min_addr = (phys_addr_t) ULLONG_MAX;
+	fw_max_addr = 0;
+
+	for (i = 0; i < ehdr->e_phnum; i++) {
+		const struct elf32_phdr *phdr = &phdrs[i];
+
+		if (!_check_segment(phdr))
+			continue;
+
+		fw_min_addr = min_t(phys_addr_t, fw_min_addr, phdr->p_paddr);
+		fw_max_addr = max_t(phys_addr_t, fw_max_addr,
+			PAGE_ALIGN(phdr->p_paddr + phdr->p_memsz));
+	}
+
+	if (fw_min_addr == (phys_addr_t) ULLONG_MAX && fw_max_addr == 0)
+		goto out;
+
+	fw_size = (size_t) (fw_max_addr - fw_min_addr);
+
+	/* Verify the MDT header */
+	ret = qcom_scm_pas_init_image(pas_id, mdt->data, mdt->size);
+	if (ret) {
+		dev_err(&pdev->dev, "Invalid firmware metadata\n");
+		goto out;
+	}
+
+	/* allocate some memory */
+	ptr = dma_alloc_coherent(&pdev->dev, fw_size, &fw_phys, GFP_KERNEL);
+	if (ptr == NULL)
+		goto out;
+
+	/* Set up the newly allocated memory region */
+	ret = qcom_scm_pas_mem_setup(pas_id, fw_phys, fw_size);
+	if (ret) {
+		dev_err(&pdev->dev, "Unable to set up firmware memory\n");
+		goto out;
+	}
+
+	ret = __pil_tz_load_image(pdev, mdt, fwname, ptr, fw_size, fw_min_addr);
+	if (!ret) {
+		ret = qcom_scm_pas_auth_and_reset(pas_id);
+		if (ret)
+			dev_err(&pdev->dev, "Unable to authorize the image\n");
+	}
+
+out:
+	if (ret && ptr)
+		dma_free_coherent(&pdev->dev, fw_size, ptr, fw_phys);
+
+	release_firmware(mdt);
+	return ret;
+}
+
 static void a5xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 	struct msm_file_private *ctx)
 {
-- 
1.9.1

_______________________________________________
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 16/16] drm/msm: gpu: Use the zap shader on 5XX if we can
       [not found] ` <1478299497-9729-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
                     ` (11 preceding siblings ...)
  2016-11-04 22:44   ` [PATCH 15/16] drm/msm: Add a quick and dirty PIL loader Jordan Crouse
@ 2016-11-04 22:44   ` Jordan Crouse
  12 siblings, 0 replies; 29+ messages in thread
From: Jordan Crouse @ 2016-11-04 22:44 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA

The A5XX GPU powers on in "secure" mode. In secure mode the GPU can
only render to buffers that are marked as secure and inaccessible
to the kernel and user through a series of hardware protections. In
practice secure mode is used to draw things like a UI on a secure
video frame.

In order to switch out of secure mode the GPU executes a special
shader that clears out the GMEM and other sensitve registers and
then writes a register. Because the kernel can't be trusted the
shader binary is signed and verified and programmed by the
secure world. To do this we need to read the MDT header and the
segments from the firmware location and put them in memory and
present them for approval.

For targets without secure support there is an out: if the
secure world doesn't support secure then there are no hardware
protections and we can freely write the SECVID_TRUST register from
the CPU. We don't have 100% confidence that we can query the
secure capabilities at run time but we have enough calls that
need to go right to give us some confidence that we're at least doing
something useful.

Of course if we guess wrong you trigger a permissions violation
which usually ends up in a system crash but thats a problem
that shows up immediately.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 arch/arm64/boot/dts/qcom/msm8996.dtsi | 20 ++++++++++
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 72 ++++++++++++++++++++++++++++++++++-
 2 files changed, 90 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/boot/dts/qcom/msm8996.dtsi b/arch/arm64/boot/dts/qcom/msm8996.dtsi
index f71b468..bb9f418 100644
--- a/arch/arm64/boot/dts/qcom/msm8996.dtsi
+++ b/arch/arm64/boot/dts/qcom/msm8996.dtsi
@@ -133,6 +133,18 @@
 		method = "smc";
 	};
 
+	reserved-memory {
+		#address-cells = <2>;
+		#size-cells = <2>;
+		ranges;
+
+		peripheral_reserved: peripheral_region@8ea00000 {
+			compatible = "shared-dma-pool";
+			reg = <0 0x8ea00000 0 0x2b00000>;
+			no-map;
+		};
+	};
+
 	soc: soc {
 		#address-cells = <1>;
 		#size-cells = <1>;
@@ -543,6 +555,14 @@
 					qcom,gpu-freq = <27000000>;
 				};
 			};
+
+			qcom,zap-shader {
+				compatible = "qcom,zap-shader";
+				memory-region = <&peripheral_reserved>;
+
+				qcom,firmware = "a530_zap";
+				qcom,pas-id = <13>;
+			};
 		};
 
 		mdp_smmu: arm,smmu@d00000 {
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index c4a9a12..6cf9429 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -469,6 +469,55 @@ static int a5xx_ucode_init(struct msm_gpu *gpu)
 	return 0;
 }
 
+static int a5xx_zap_shader_resume(struct msm_gpu *gpu)
+{
+	int ret;
+
+	ret = qcom_scm_gpu_zap_resume();
+	if (ret)
+		DRM_ERROR("%s: zap-shader resume failed: %d\n",
+			gpu->name, ret);
+
+	return ret;
+}
+
+static int a5xx_zap_shader_init(struct msm_gpu *gpu)
+{
+	static bool loaded;
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a5xx_gpu *a5xx_gpu = to_a5xx_gpu(adreno_gpu);
+	struct platform_device *pdev = a5xx_gpu->pdev;
+	struct device_node *node;
+	int ret;
+
+	/*
+	 * If the zap shader is already loaded into memory we just need to kick
+	 * the remote processor to reinitialize it
+	 */
+	if (loaded)
+		return a5xx_zap_shader_resume(gpu);
+
+	/* Populate the sub-nodes if they haven't already been done */
+	of_platform_populate(pdev->dev.of_node, NULL, NULL, &pdev->dev);
+
+	/* Find the sub-node for the zap shader */
+	node = of_find_node_by_name(pdev->dev.of_node, "qcom,zap-shader");
+	if (!node) {
+		DRM_ERROR("%s: qcom,zap-shader not found in device tree\n",
+			gpu->name);
+		return -ENODEV;
+	}
+
+	ret = _pil_tz_load_image(of_find_device_by_node(node));
+	if (ret)
+		DRM_ERROR("%s: Unable to load the zap shader\n",
+			gpu->name);
+
+	loaded = !ret;
+
+	return ret;
+}
+
 #define A5XX_INT_MASK (A5XX_RBBM_INT_0_MASK_RBBM_AHB_ERROR | \
 	  A5XX_RBBM_INT_0_MASK_RBBM_TRANSFER_TIMEOUT | \
 	  A5XX_RBBM_INT_0_MASK_RBBM_ME_MS_TIMEOUT | \
@@ -653,8 +702,27 @@ static int a5xx_hw_init(struct msm_gpu *gpu)
 			return -EINVAL;
 	}
 
-	/* Put the GPU into unsecure mode */
-	gpu_write(gpu, REG_A5XX_RBBM_SECVID_TRUST_CNTL, 0x0);
+	/*
+	 * Try to load a zap shader into the secure world. If successful
+	 * we can use the CP to switch out of secure mode. If not then we
+	 * have no resource but to try to switch ourselves out manually. If we
+	 * guessed wrong then access to the RBBM_SECVID_TRUST_CNTL register will
+	 * be blocked and a permissions violation will soon follow.
+	 */
+	ret = a5xx_zap_shader_init(gpu);
+	if (!ret) {
+		OUT_PKT7(gpu->rb, CP_SET_SECURE_MODE, 1);
+		OUT_RING(gpu->rb, 0x00000000);
+
+		gpu->funcs->flush(gpu);
+		if (!gpu->funcs->idle(gpu))
+			return -EINVAL;
+	} else {
+		/* Print a warning so if we die, we know why */
+		dev_warn_once(gpu->dev->dev,
+			"Zap shader not enabled - using SECVID_TRUST_CNTL instead\n");
+		gpu_write(gpu, REG_A5XX_RBBM_SECVID_TRUST_CNTL, 0x0);
+	}
 
 	return 0;
 }
-- 
1.9.1

_______________________________________________
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [Freedreno] [PATCH 04/16] drm: msm: Flush the cache immediately after allocating pages
  2016-11-04 22:44   ` [PATCH 04/16] drm: msm: Flush the cache immediately after allocating pages Jordan Crouse
@ 2016-11-06 14:15     ` Rob Clark
       [not found]       ` <CAF6AEGtDv8tRZi82Eno5RF6a58qSRpjYcUo-J8dDDioDLLJqmg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Rob Clark @ 2016-11-06 14:15 UTC (permalink / raw)
  To: Jordan Crouse; +Cc: freedreno, linux-arm-msm, Archit Taneja, Sricharan R

On Fri, Nov 4, 2016 at 6:44 PM, Jordan Crouse <jcrouse@codeaurora.org> wrote:
> For reasons that are not entirely understood using dma_map_sg()
> for nocache/write combine buffers doesn't always successfully flush
> the cache after the memory is zeroed somewhere deep in the bowels
> of the shmem code.  My working theory is that the cache flush on
> the swiotlb bounce buffer address work isn't always flushing what
> we need.
>
> Instead of using dma_map_sg() directly kmap and flush each page
> at allocate time.  We could use invalidate + clean or just invalidate
> if we wanted to but on ARM64 using a flush is safer and not much
> slower for what we are trying to do.
>
> Hopefully someday I'll more clearly understand the relationship between
> shmem  kmap, vmap and the swiotlb bounce buffer and we can be smarter
> about when and how we invalidate the caches.

Like I mentioned on irc, we defn don't want to ever hit bounce
buffers.  I think the problem here is dma-mapping assumes we only
support 32b physical addresses, which is not true (at least as long as
we have iommu)..

Archit hit a similar problem on the display side of things.

I think the proper solution is to do something like this somewhere:

   dma_set_mask_and_coherent(drm->dev, DMA_BIT_MASK(size))

where size is 32 or 64 (48?) depending on device..

Note that this value should be perhaps something that the iommu driver
knows, since really it is about the iommu page tables.

Maybe it would even make sense for the iommu driver to set this when
we attach?  Although since the GEM code is allocating/mapping for
multiple subdev's that might not work out (ie. if we only attached
subdev's and not the parent drm dev which is used by the GEM code for
dma_map/unmap()).

(ofc it would be better if dma-mapping was structured more like
helpers which could be bypassed in the few special cases where the
abstraction doesn't work, rather than forcing us to do pretend
dma_map/unmap() for cache operations.. but that is a topic for a
different rant)

BR,
-R

> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
> ---
>  drivers/gpu/drm/msm/msm_gem.c | 21 ++++++++-------------
>  1 file changed, 8 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
> index 85f3047..29f5a30 100644
> --- a/drivers/gpu/drm/msm/msm_gem.c
> +++ b/drivers/gpu/drm/msm/msm_gem.c
> @@ -79,6 +79,7 @@ static struct page **get_pages(struct drm_gem_object *obj)
>                 struct drm_device *dev = obj->dev;
>                 struct page **p;
>                 int npages = obj->size >> PAGE_SHIFT;
> +               int i;
>
>                 if (use_pages(obj))
>                         p = drm_gem_get_pages(obj);
> @@ -91,6 +92,13 @@ static struct page **get_pages(struct drm_gem_object *obj)
>                         return p;
>                 }
>
> +               for (i = 0; i < npages; i++) {
> +                       void *addr = kmap_atomic(p[i]);
> +
> +                       __dma_flush_range(addr, addr + PAGE_SIZE);
> +                       kunmap_atomic(addr);
> +               }
> +
>                 msm_obj->sgt = drm_prime_pages_to_sg(p, npages);
>                 if (IS_ERR(msm_obj->sgt)) {
>                         dev_err(dev->dev, "failed to allocate sgt\n");
> @@ -98,13 +106,6 @@ static struct page **get_pages(struct drm_gem_object *obj)
>                 }
>
>                 msm_obj->pages = p;
> -
> -               /* For non-cached buffers, ensure the new pages are clean
> -                * because display controller, GPU, etc. are not coherent:
> -                */
> -               if (msm_obj->flags & (MSM_BO_WC|MSM_BO_UNCACHED))
> -                       dma_map_sg(dev->dev, msm_obj->sgt->sgl,
> -                                       msm_obj->sgt->nents, DMA_BIDIRECTIONAL);
>         }
>
>         return msm_obj->pages;
> @@ -115,12 +116,6 @@ static void put_pages(struct drm_gem_object *obj)
>         struct msm_gem_object *msm_obj = to_msm_bo(obj);
>
>         if (msm_obj->pages) {
> -               /* For non-cached buffers, ensure the new pages are clean
> -                * because display controller, GPU, etc. are not coherent:
> -                */
> -               if (msm_obj->flags & (MSM_BO_WC|MSM_BO_UNCACHED))
> -                       dma_unmap_sg(obj->dev->dev, msm_obj->sgt->sgl,
> -                                       msm_obj->sgt->nents, DMA_BIDIRECTIONAL);
>                 sg_free_table(msm_obj->sgt);
>                 kfree(msm_obj->sgt);
>
> --
> 1.9.1
>
> _______________________________________________
> Freedreno mailing list
> Freedreno@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/freedreno

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 04/16] drm: msm: Flush the cache immediately after allocating pages
       [not found]       ` <CAF6AEGtDv8tRZi82Eno5RF6a58qSRpjYcUo-J8dDDioDLLJqmg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-11-07  8:35         ` Archit Taneja
       [not found]           ` <99a66f0f-ec84-a26e-0108-60367362c29e-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Archit Taneja @ 2016-11-07  8:35 UTC (permalink / raw)
  To: Rob Clark, Jordan Crouse
  Cc: linux-arm-msm, freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Sricharan R



On 11/06/2016 07:45 PM, Rob Clark wrote:
> On Fri, Nov 4, 2016 at 6:44 PM, Jordan Crouse <jcrouse@codeaurora.org> wrote:
>> For reasons that are not entirely understood using dma_map_sg()
>> for nocache/write combine buffers doesn't always successfully flush
>> the cache after the memory is zeroed somewhere deep in the bowels
>> of the shmem code.  My working theory is that the cache flush on
>> the swiotlb bounce buffer address work isn't always flushing what
>> we need.
>>
>> Instead of using dma_map_sg() directly kmap and flush each page
>> at allocate time.  We could use invalidate + clean or just invalidate
>> if we wanted to but on ARM64 using a flush is safer and not much
>> slower for what we are trying to do.
>>
>> Hopefully someday I'll more clearly understand the relationship between
>> shmem  kmap, vmap and the swiotlb bounce buffer and we can be smarter
>> about when and how we invalidate the caches.
>
> Like I mentioned on irc, we defn don't want to ever hit bounce
> buffers.  I think the problem here is dma-mapping assumes we only
> support 32b physical addresses, which is not true (at least as long as
> we have iommu)..
>
> Archit hit a similar problem on the display side of things.

Yeah, the shmem allocated pages ended up being 33 bit addresses some times
on db820c. The msm driver sets the dma mask to a default of 32 bits.
The dma mapping api gets unhappy whenever we get sg chunks with 33 bit
addresses, and tries to use switolb for them. We eventually end up
overflowing the swiotlb.

Setting the mask to 33 bits worked as a temporary hack.

>
> I think the proper solution is to do something like this somewhere:
>
>    dma_set_mask_and_coherent(drm->dev, DMA_BIT_MASK(size))
>
> where size is 32 or 64 (48?) depending on device..
>
> Note that this value should be perhaps something that the iommu driver
> knows, since really it is about the iommu page tables.
>
> Maybe it would even make sense for the iommu driver to set this when
> we attach?  Although since the GEM code is allocating/mapping for
> multiple subdev's that might not work out (ie. if we only attached
> subdev's and not the parent drm dev which is used by the GEM code for
> dma_map/unmap()).

By multiple subdevs, you mean MDP and GPU devs? As far as their dma
masks are concerned, it should be the same for both, right? It's only
the iommu driver's returned iova that we need to care about. Weren't we
thinking of managing that by having separate drm_mm's for MDP and GPU?


>
> (ofc it would be better if dma-mapping was structured more like
> helpers which could be bypassed in the few special cases where the
> abstraction doesn't work, rather than forcing us to do pretend
> dma_map/unmap() for cache operations.. but that is a topic for a
> different rant)

Yeah, that would be nice.

Archit

>
> BR,
> -R
>
>> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
>> ---
>>  drivers/gpu/drm/msm/msm_gem.c | 21 ++++++++-------------
>>  1 file changed, 8 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
>> index 85f3047..29f5a30 100644
>> --- a/drivers/gpu/drm/msm/msm_gem.c
>> +++ b/drivers/gpu/drm/msm/msm_gem.c
>> @@ -79,6 +79,7 @@ static struct page **get_pages(struct drm_gem_object *obj)
>>                 struct drm_device *dev = obj->dev;
>>                 struct page **p;
>>                 int npages = obj->size >> PAGE_SHIFT;
>> +               int i;
>>
>>                 if (use_pages(obj))
>>                         p = drm_gem_get_pages(obj);
>> @@ -91,6 +92,13 @@ static struct page **get_pages(struct drm_gem_object *obj)
>>                         return p;
>>                 }
>>
>> +               for (i = 0; i < npages; i++) {
>> +                       void *addr = kmap_atomic(p[i]);
>> +
>> +                       __dma_flush_range(addr, addr + PAGE_SIZE);
>> +                       kunmap_atomic(addr);
>> +               }
>> +
>>                 msm_obj->sgt = drm_prime_pages_to_sg(p, npages);
>>                 if (IS_ERR(msm_obj->sgt)) {
>>                         dev_err(dev->dev, "failed to allocate sgt\n");
>> @@ -98,13 +106,6 @@ static struct page **get_pages(struct drm_gem_object *obj)
>>                 }
>>
>>                 msm_obj->pages = p;
>> -
>> -               /* For non-cached buffers, ensure the new pages are clean
>> -                * because display controller, GPU, etc. are not coherent:
>> -                */
>> -               if (msm_obj->flags & (MSM_BO_WC|MSM_BO_UNCACHED))
>> -                       dma_map_sg(dev->dev, msm_obj->sgt->sgl,
>> -                                       msm_obj->sgt->nents, DMA_BIDIRECTIONAL);
>>         }
>>
>>         return msm_obj->pages;
>> @@ -115,12 +116,6 @@ static void put_pages(struct drm_gem_object *obj)
>>         struct msm_gem_object *msm_obj = to_msm_bo(obj);
>>
>>         if (msm_obj->pages) {
>> -               /* For non-cached buffers, ensure the new pages are clean
>> -                * because display controller, GPU, etc. are not coherent:
>> -                */
>> -               if (msm_obj->flags & (MSM_BO_WC|MSM_BO_UNCACHED))
>> -                       dma_unmap_sg(obj->dev->dev, msm_obj->sgt->sgl,
>> -                                       msm_obj->sgt->nents, DMA_BIDIRECTIONAL);
>>                 sg_free_table(msm_obj->sgt);
>>                 kfree(msm_obj->sgt);
>>
>> --
>> 1.9.1
>>
>> _______________________________________________
>> Freedreno mailing list
>> Freedreno@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/freedreno

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
_______________________________________________
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 04/16] drm: msm: Flush the cache immediately after allocating pages
       [not found]           ` <99a66f0f-ec84-a26e-0108-60367362c29e-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
@ 2016-11-07 12:19             ` Rob Clark
  2016-11-07 18:01               ` [Freedreno] " Jordan Crouse
  0 siblings, 1 reply; 29+ messages in thread
From: Rob Clark @ 2016-11-07 12:19 UTC (permalink / raw)
  To: Archit Taneja
  Cc: linux-arm-msm, Jordan Crouse,
	freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Sricharan R

On Mon, Nov 7, 2016 at 3:35 AM, Archit Taneja <architt@codeaurora.org> wrote:
>
>
> On 11/06/2016 07:45 PM, Rob Clark wrote:
>>
>> On Fri, Nov 4, 2016 at 6:44 PM, Jordan Crouse <jcrouse@codeaurora.org>
>> wrote:
>>>
>>> For reasons that are not entirely understood using dma_map_sg()
>>> for nocache/write combine buffers doesn't always successfully flush
>>> the cache after the memory is zeroed somewhere deep in the bowels
>>> of the shmem code.  My working theory is that the cache flush on
>>> the swiotlb bounce buffer address work isn't always flushing what
>>> we need.
>>>
>>> Instead of using dma_map_sg() directly kmap and flush each page
>>> at allocate time.  We could use invalidate + clean or just invalidate
>>> if we wanted to but on ARM64 using a flush is safer and not much
>>> slower for what we are trying to do.
>>>
>>> Hopefully someday I'll more clearly understand the relationship between
>>> shmem  kmap, vmap and the swiotlb bounce buffer and we can be smarter
>>> about when and how we invalidate the caches.
>>
>>
>> Like I mentioned on irc, we defn don't want to ever hit bounce
>> buffers.  I think the problem here is dma-mapping assumes we only
>> support 32b physical addresses, which is not true (at least as long as
>> we have iommu)..
>>
>> Archit hit a similar problem on the display side of things.
>
>
> Yeah, the shmem allocated pages ended up being 33 bit addresses some times
> on db820c. The msm driver sets the dma mask to a default of 32 bits.
> The dma mapping api gets unhappy whenever we get sg chunks with 33 bit
> addresses, and tries to use switolb for them. We eventually end up
> overflowing the swiotlb.
>
> Setting the mask to 33 bits worked as a temporary hack.

actually I think setting the mask is the correct thing to do, but
probably should use dma_set_ fxn..  and I suspect that the PA for the
iommu is larger than 33 bits.  It is probably equal to the number of
address lines that are wired up.  Not quite sure what that is, but I
don't think there are any devices where the iommu cannot map some
physical pages.

>>
>> I think the proper solution is to do something like this somewhere:
>>
>>    dma_set_mask_and_coherent(drm->dev, DMA_BIT_MASK(size))
>>
>> where size is 32 or 64 (48?) depending on device..
>>
>> Note that this value should be perhaps something that the iommu driver
>> knows, since really it is about the iommu page tables.
>>
>> Maybe it would even make sense for the iommu driver to set this when
>> we attach?  Although since the GEM code is allocating/mapping for
>> multiple subdev's that might not work out (ie. if we only attached
>> subdev's and not the parent drm dev which is used by the GEM code for
>> dma_map/unmap()).
>
>
> By multiple subdevs, you mean MDP and GPU devs? As far as their dma
> masks are concerned, it should be the same for both, right? It's only
> the iommu driver's returned iova that we need to care about. Weren't we
> thinking of managing that by having separate drm_mm's for MDP and GPU?

correct, it isn't about the mdp vs gpu iova, but about the pa that is
mapped in the iommu.

And yeah, multiple address spaces is one of the things I have for 4.10:

https://github.com/freedreno/kernel-msm/commit/40d73b331e45fa4bb255ae26012378d396ee46a5

BR,
-R

>
>>
>> (ofc it would be better if dma-mapping was structured more like
>> helpers which could be bypassed in the few special cases where the
>> abstraction doesn't work, rather than forcing us to do pretend
>> dma_map/unmap() for cache operations.. but that is a topic for a
>> different rant)
>
>
> Yeah, that would be nice.
>
> Archit
>
>
>>
>> BR,
>> -R
>>
>>> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
>>> ---
>>>  drivers/gpu/drm/msm/msm_gem.c | 21 ++++++++-------------
>>>  1 file changed, 8 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/msm/msm_gem.c
>>> b/drivers/gpu/drm/msm/msm_gem.c
>>> index 85f3047..29f5a30 100644
>>> --- a/drivers/gpu/drm/msm/msm_gem.c
>>> +++ b/drivers/gpu/drm/msm/msm_gem.c
>>> @@ -79,6 +79,7 @@ static struct page **get_pages(struct drm_gem_object
>>> *obj)
>>>                 struct drm_device *dev = obj->dev;
>>>                 struct page **p;
>>>                 int npages = obj->size >> PAGE_SHIFT;
>>> +               int i;
>>>
>>>                 if (use_pages(obj))
>>>                         p = drm_gem_get_pages(obj);
>>> @@ -91,6 +92,13 @@ static struct page **get_pages(struct drm_gem_object
>>> *obj)
>>>                         return p;
>>>                 }
>>>
>>> +               for (i = 0; i < npages; i++) {
>>> +                       void *addr = kmap_atomic(p[i]);
>>> +
>>> +                       __dma_flush_range(addr, addr + PAGE_SIZE);
>>> +                       kunmap_atomic(addr);
>>> +               }
>>> +
>>>                 msm_obj->sgt = drm_prime_pages_to_sg(p, npages);
>>>                 if (IS_ERR(msm_obj->sgt)) {
>>>                         dev_err(dev->dev, "failed to allocate sgt\n");
>>> @@ -98,13 +106,6 @@ static struct page **get_pages(struct drm_gem_object
>>> *obj)
>>>                 }
>>>
>>>                 msm_obj->pages = p;
>>> -
>>> -               /* For non-cached buffers, ensure the new pages are clean
>>> -                * because display controller, GPU, etc. are not
>>> coherent:
>>> -                */
>>> -               if (msm_obj->flags & (MSM_BO_WC|MSM_BO_UNCACHED))
>>> -                       dma_map_sg(dev->dev, msm_obj->sgt->sgl,
>>> -                                       msm_obj->sgt->nents,
>>> DMA_BIDIRECTIONAL);
>>>         }
>>>
>>>         return msm_obj->pages;
>>> @@ -115,12 +116,6 @@ static void put_pages(struct drm_gem_object *obj)
>>>         struct msm_gem_object *msm_obj = to_msm_bo(obj);
>>>
>>>         if (msm_obj->pages) {
>>> -               /* For non-cached buffers, ensure the new pages are clean
>>> -                * because display controller, GPU, etc. are not
>>> coherent:
>>> -                */
>>> -               if (msm_obj->flags & (MSM_BO_WC|MSM_BO_UNCACHED))
>>> -                       dma_unmap_sg(obj->dev->dev, msm_obj->sgt->sgl,
>>> -                                       msm_obj->sgt->nents,
>>> DMA_BIDIRECTIONAL);
>>>                 sg_free_table(msm_obj->sgt);
>>>                 kfree(msm_obj->sgt);
>>>
>>> --
>>> 1.9.1
>>>
>>> _______________________________________________
>>> Freedreno mailing list
>>> Freedreno@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/freedreno
>
>
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> a Linux Foundation Collaborative Project
_______________________________________________
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 13/16] drm/msm: gpu: Add support for the GPMU
       [not found]     ` <1478299497-9729-14-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
@ 2016-11-07 12:58       ` Stanimir Varbanov
  2016-11-07 13:02         ` [Freedreno] " Rob Clark
       [not found]         ` <740c4fda-dfd6-7a70-9cb7-3eec6a5781ca-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
  0 siblings, 2 replies; 29+ messages in thread
From: Stanimir Varbanov @ 2016-11-07 12:58 UTC (permalink / raw)
  To: Jordan Crouse, freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA

Hi Jordan,

On 11/05/2016 12:44 AM, Jordan Crouse wrote:
> Most 5XX targets have GPMU (Graphics Power Management Unit) that
> handles a lot of the heavy lifting for power management including
> thermal and limits management and dynamic power collapse. While
> the GPMU itself is optional, it is usually nessesary to hit
> aggressive power targets.
> 
> The GPMU firmware needs to be loaded into the GPMU at init time via a
> shared hardware block of registers. Using the GPU to write the microcode
> is more efficient than using the CPU so at first load create an indirect
> buffer that can be executed during subsequent initalization sequences.
> 
> After loading the GPMU gets initalized through a shared register
> interface and then we mostly get out of its way and let it do
> its thing.
> 
> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
> ---
>  drivers/gpu/drm/msm/Makefile               |   1 +
>  drivers/gpu/drm/msm/adreno/a5xx_gpu.c      |  64 +++++-
>  drivers/gpu/drm/msm/adreno/a5xx_gpu.h      |  23 ++
>  drivers/gpu/drm/msm/adreno/a5xx_power.c    | 341 +++++++++++++++++++++++++++++
>  drivers/gpu/drm/msm/adreno/adreno_device.c |   1 +
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h    |   1 +
>  6 files changed, 428 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/gpu/drm/msm/adreno/a5xx_power.c
> 

<cut>

> +/* Enable the GPMU microcontroller */
> +static int a5xx_gpmu_init(struct msm_gpu *gpu)
> +{
> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> +	struct a5xx_gpu *a5xx_gpu = to_a5xx_gpu(adreno_gpu);
> +	struct msm_ringbuffer *ring = gpu->rb;
> +
> +	if (!a5xx_gpu->gpmu_dwords)
> +		return 0;
> +
> +	/* Turn off protected mode for this operation */
> +	OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1);

This looks wrong, shouldn't it be?

OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 0)

> +	OUT_RING(ring, 0);
> +
> +	/* Kick off the IB to load the GPMU microcode */
> +	OUT_PKT7(ring, CP_INDIRECT_BUFFER_PFE, 3);
> +	OUT_RING(ring, lower_32_bits(a5xx_gpu->gpmu_iova));
> +	OUT_RING(ring, upper_32_bits(a5xx_gpu->gpmu_iova));
> +	OUT_RING(ring, a5xx_gpu->gpmu_dwords);
> +
> +	/* Turn back on protected mode */
> +	OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1);
> +	OUT_RING(ring, 1);
> +
> +	gpu->funcs->flush(gpu);
> +
> +	if (!gpu->funcs->idle(gpu)) {
> +		DRM_ERROR("%s: Unable to load GPMU firmware. GPMU will not be active\n",
> +			gpu->name);
> +		return -EINVAL;
> +	}
> +
> +	gpu_write(gpu, REG_A5XX_GPMU_WFI_CONFIG, 0x4014);
> +
> +	/* Kick off the GPMU */
> +	gpu_write(gpu, REG_A5XX_GPMU_CM3_SYSRESET, 0x0);
> +
> +	/*
> +	 * Wait for the GPMU to respond. It isn't fatal if it doesn't, we just
> +	 * won't have advanced power collapse.
> +	 */
> +	if (spin_usecs(gpu, 25, REG_A5XX_GPMU_GENERAL_0, 0xFFFFFFFF,
> +		0xBABEFACE))
> +		DRM_ERROR("%s: GPMU firmware initialization timed out\n",
> +			gpu->name);
> +
> +	return 0;
> +}
> +

<cut>


-- 
regards,
Stan
_______________________________________________
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Freedreno] [PATCH 13/16] drm/msm: gpu: Add support for the GPMU
  2016-11-07 12:58       ` Stanimir Varbanov
@ 2016-11-07 13:02         ` Rob Clark
       [not found]           ` <CAF6AEGuW6ThJM-+X-=XGtqTCY_hcq8DghJHRf38OWjy4Z3R=DQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
       [not found]         ` <740c4fda-dfd6-7a70-9cb7-3eec6a5781ca-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
  1 sibling, 1 reply; 29+ messages in thread
From: Rob Clark @ 2016-11-07 13:02 UTC (permalink / raw)
  To: Stanimir Varbanov; +Cc: Jordan Crouse, freedreno, linux-arm-msm

On Mon, Nov 7, 2016 at 7:58 AM, Stanimir Varbanov
<stanimir.varbanov@linaro.org> wrote:
>> +     /* Turn off protected mode for this operation */
>> +     OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1);
>
> This looks wrong, shouldn't it be?

no, it is actually correct, the 3rd arg is # of dwords of payload in
the msg.  So the payload of the CP_SET_PROTECTED_MODE packet is the
following OUT_RING(ring, 0)

BR,
-R

> OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 0)
>
>> +     OUT_RING(ring, 0);

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 13/16] drm/msm: gpu: Add support for the GPMU
       [not found]           ` <CAF6AEGuW6ThJM-+X-=XGtqTCY_hcq8DghJHRf38OWjy4Z3R=DQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-11-07 14:47             ` Stanimir Varbanov
  0 siblings, 0 replies; 29+ messages in thread
From: Stanimir Varbanov @ 2016-11-07 14:47 UTC (permalink / raw)
  To: Rob Clark, Stanimir Varbanov
  Cc: linux-arm-msm, Jordan Crouse, freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Hi Rob,

On 11/07/2016 03:02 PM, Rob Clark wrote:
> On Mon, Nov 7, 2016 at 7:58 AM, Stanimir Varbanov
> <stanimir.varbanov@linaro.org> wrote:
>>> +     /* Turn off protected mode for this operation */
>>> +     OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1);
>>
>> This looks wrong, shouldn't it be?
> 
> no, it is actually correct, the 3rd arg is # of dwords of payload in
> the msg.  So the payload of the CP_SET_PROTECTED_MODE packet is the
> following OUT_RING(ring, 0)

Ahh, understood now. Sorry for the noise :)

-- 
regards,
Stan
_______________________________________________
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Freedreno] [PATCH 04/16] drm: msm: Flush the cache immediately after allocating pages
  2016-11-07 12:19             ` Rob Clark
@ 2016-11-07 18:01               ` Jordan Crouse
  0 siblings, 0 replies; 29+ messages in thread
From: Jordan Crouse @ 2016-11-07 18:01 UTC (permalink / raw)
  To: Rob Clark; +Cc: Archit Taneja, freedreno, linux-arm-msm, Sricharan R

On Mon, Nov 07, 2016 at 07:19:24AM -0500, Rob Clark wrote:
> On Mon, Nov 7, 2016 at 3:35 AM, Archit Taneja <architt@codeaurora.org> wrote:
> >
> >
> > On 11/06/2016 07:45 PM, Rob Clark wrote:
> >>
> >> On Fri, Nov 4, 2016 at 6:44 PM, Jordan Crouse <jcrouse@codeaurora.org>
> >> wrote:
> >>>
> >>> For reasons that are not entirely understood using dma_map_sg()
> >>> for nocache/write combine buffers doesn't always successfully flush
> >>> the cache after the memory is zeroed somewhere deep in the bowels
> >>> of the shmem code.  My working theory is that the cache flush on
> >>> the swiotlb bounce buffer address work isn't always flushing what
> >>> we need.
> >>>
> >>> Instead of using dma_map_sg() directly kmap and flush each page
> >>> at allocate time.  We could use invalidate + clean or just invalidate
> >>> if we wanted to but on ARM64 using a flush is safer and not much
> >>> slower for what we are trying to do.
> >>>
> >>> Hopefully someday I'll more clearly understand the relationship between
> >>> shmem  kmap, vmap and the swiotlb bounce buffer and we can be smarter
> >>> about when and how we invalidate the caches.
> >>
> >>
> >> Like I mentioned on irc, we defn don't want to ever hit bounce
> >> buffers.  I think the problem here is dma-mapping assumes we only
> >> support 32b physical addresses, which is not true (at least as long as
> >> we have iommu)..
> >>
> >> Archit hit a similar problem on the display side of things.
> >
> >
> > Yeah, the shmem allocated pages ended up being 33 bit addresses some times
> > on db820c. The msm driver sets the dma mask to a default of 32 bits.
> > The dma mapping api gets unhappy whenever we get sg chunks with 33 bit
> > addresses, and tries to use switolb for them. We eventually end up
> > overflowing the swiotlb.
> >
> > Setting the mask to 33 bits worked as a temporary hack.
> 
> actually I think setting the mask is the correct thing to do, but
> probably should use dma_set_ fxn..  and I suspect that the PA for the
> iommu is larger than 33 bits.  It is probably equal to the number of
> address lines that are wired up.  Not quite sure what that is, but I
> don't think there are any devices where the iommu cannot map some
> physical pages.

The GPU/MMU combo supporting 48 bit PAs since at least the 4XX era. I'm not sure
if a 48 bit mask would break the older targets or not - you won't be getting any
addressable  memory > 32 bits on those devices anyway.

I'll try the dma_set_mask trick with 48 bits and see how it pans out.

Jordan

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 13/16] drm/msm: gpu: Add support for the GPMU
       [not found]         ` <740c4fda-dfd6-7a70-9cb7-3eec6a5781ca-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
@ 2016-11-07 18:09           ` Jordan Crouse
  0 siblings, 0 replies; 29+ messages in thread
From: Jordan Crouse @ 2016-11-07 18:09 UTC (permalink / raw)
  To: Stanimir Varbanov
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On Mon, Nov 07, 2016 at 02:58:01PM +0200, Stanimir Varbanov wrote:
> Hi Jordan,
> 
> On 11/05/2016 12:44 AM, Jordan Crouse wrote:
> > Most 5XX targets have GPMU (Graphics Power Management Unit) that
> > handles a lot of the heavy lifting for power management including
> > thermal and limits management and dynamic power collapse. While
> > the GPMU itself is optional, it is usually nessesary to hit
> > aggressive power targets.
> > 
> > The GPMU firmware needs to be loaded into the GPMU at init time via a
> > shared hardware block of registers. Using the GPU to write the microcode
> > is more efficient than using the CPU so at first load create an indirect
> > buffer that can be executed during subsequent initalization sequences.
> > 
> > After loading the GPMU gets initalized through a shared register
> > interface and then we mostly get out of its way and let it do
> > its thing.
> > 
> > Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
> > ---
> >  drivers/gpu/drm/msm/Makefile               |   1 +
> >  drivers/gpu/drm/msm/adreno/a5xx_gpu.c      |  64 +++++-
> >  drivers/gpu/drm/msm/adreno/a5xx_gpu.h      |  23 ++
> >  drivers/gpu/drm/msm/adreno/a5xx_power.c    | 341 +++++++++++++++++++++++++++++
> >  drivers/gpu/drm/msm/adreno/adreno_device.c |   1 +
> >  drivers/gpu/drm/msm/adreno/adreno_gpu.h    |   1 +
> >  6 files changed, 428 insertions(+), 3 deletions(-)
> >  create mode 100644 drivers/gpu/drm/msm/adreno/a5xx_power.c
> > 
> 
> <cut>
> 
> > +/* Enable the GPMU microcontroller */
> > +static int a5xx_gpmu_init(struct msm_gpu *gpu)
> > +{
> > +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> > +	struct a5xx_gpu *a5xx_gpu = to_a5xx_gpu(adreno_gpu);
> > +	struct msm_ringbuffer *ring = gpu->rb;
> > +
> > +	if (!a5xx_gpu->gpmu_dwords)
> > +		return 0;
> > +
> > +	/* Turn off protected mode for this operation */
> > +	OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1);
> 
> This looks wrong, shouldn't it be?
> 
> OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 0)

I admit this syntax can be really confusing.  Opcode headers consist of a
opcode and a payload size (and some other stuff you don't need to worry
about). So this line is declaring opcode CP_SET_PROTETED_MODE with a payload
length of 1 dword.  The payload is either 0 (disable protected mode) or 1
(enable protected mode):

> > +	OUT_RING(ring, 0);

Here we disable.

> > +
> > +	/* Kick off the IB to load the GPMU microcode */
> > +	OUT_PKT7(ring, CP_INDIRECT_BUFFER_PFE, 3);
> > +	OUT_RING(ring, lower_32_bits(a5xx_gpu->gpmu_iova));
> > +	OUT_RING(ring, upper_32_bits(a5xx_gpu->gpmu_iova));
> > +	OUT_RING(ring, a5xx_gpu->gpmu_dwords);
> > +
> > +	/* Turn back on protected mode */
> > +	OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1);
> > +	OUT_RING(ring, 1);

And here we re-enable.

Downstream uses inline functions for a lot of the common CP functions to avoid
confusion like this - the problem with those is that you need to pass in the GPU
and do a bunch of logic to get the right opcode for pre-A5XX and post-A5XX and I
don't generally like those when you can just directly go to the ring with a
minimum of fuss.

Jordan
-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
_______________________________________________
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 05/16] drm/msm: gpu: Return error on hw_init failure
       [not found]     ` <1478299497-9729-6-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
@ 2016-11-07 18:54       ` Rob Clark
  0 siblings, 0 replies; 29+ messages in thread
From: Rob Clark @ 2016-11-07 18:54 UTC (permalink / raw)
  To: Jordan Crouse; +Cc: linux-arm-msm, freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On Fri, Nov 4, 2016 at 6:44 PM, Jordan Crouse <jcrouse@codeaurora.org> wrote:
> When the GPU hardware init function fails (like say, ME_INIT timed
> out) return error instead of blindly continuing on. This gives us
> a small chance of saving the system before it goes boom.

seems like a good idea.. although I guess we should just make it an
int return (ie. 0 or -ETIMEDOUT)..

BR,
-R

> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
> ---
>  drivers/gpu/drm/msm/adreno/a3xx_gpu.c   | 21 ++++++++++++---------
>  drivers/gpu/drm/msm/adreno/a4xx_gpu.c   | 20 +++++++++++---------
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c | 11 +++++------
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h |  2 +-
>  drivers/gpu/drm/msm/msm_gpu.h           |  2 +-
>  5 files changed, 30 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
> index 5eda847..13989d9 100644
> --- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
> @@ -41,7 +41,7 @@ extern bool hang_debug;
>
>  static void a3xx_dump(struct msm_gpu *gpu);
>
> -static void a3xx_me_init(struct msm_gpu *gpu)
> +static bool a3xx_me_init(struct msm_gpu *gpu)
>  {
>         struct msm_ringbuffer *ring = gpu->rb;
>
> @@ -65,7 +65,7 @@ static void a3xx_me_init(struct msm_gpu *gpu)
>         OUT_RING(ring, 0x00000000);
>
>         gpu->funcs->flush(gpu);
> -       gpu->funcs->idle(gpu);
> +       return gpu->funcs->idle(gpu);
>  }
>
>  static int a3xx_hw_init(struct msm_gpu *gpu)
> @@ -294,9 +294,7 @@ static int a3xx_hw_init(struct msm_gpu *gpu)
>         /* clear ME_HALT to start micro engine */
>         gpu_write(gpu, REG_AXXX_CP_ME_CNTL, 0);
>
> -       a3xx_me_init(gpu);
> -
> -       return 0;
> +       return a3xx_me_init(gpu) ? 0 : -EINVAL;
>  }
>
>  static void a3xx_recover(struct msm_gpu *gpu)
> @@ -330,17 +328,22 @@ static void a3xx_destroy(struct msm_gpu *gpu)
>         kfree(a3xx_gpu);
>  }
>
> -static void a3xx_idle(struct msm_gpu *gpu)
> +static bool a3xx_idle(struct msm_gpu *gpu)
>  {
>         /* wait for ringbuffer to drain: */
> -       adreno_idle(gpu);
> +       if (!adreno_idle(gpu))
> +               return false;
>
>         /* then wait for GPU to finish: */
>         if (spin_until(!(gpu_read(gpu, REG_A3XX_RBBM_STATUS) &
> -                       A3XX_RBBM_STATUS_GPU_BUSY)))
> +                       A3XX_RBBM_STATUS_GPU_BUSY))) {
>                 DRM_ERROR("%s: timeout waiting for GPU to idle!\n", gpu->name);
>
> -       /* TODO maybe we need to reset GPU here to recover from hang? */
> +               /* TODO maybe we need to reset GPU here to recover from hang? */
> +               return false;
> +       }
> +
> +       return true;
>  }
>
>  static irqreturn_t a3xx_irq(struct msm_gpu *gpu)
> diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
> index 0354be2..ba16507 100644
> --- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
> @@ -113,7 +113,7 @@ static void a4xx_enable_hwcg(struct msm_gpu *gpu)
>  }
>
>
> -static void a4xx_me_init(struct msm_gpu *gpu)
> +static bool a4xx_me_init(struct msm_gpu *gpu)
>  {
>         struct msm_ringbuffer *ring = gpu->rb;
>
> @@ -137,7 +137,7 @@ static void a4xx_me_init(struct msm_gpu *gpu)
>         OUT_RING(ring, 0x00000000);
>
>         gpu->funcs->flush(gpu);
> -       gpu->funcs->idle(gpu);
> +       return gpu->funcs->idle(gpu);
>  }
>
>  static int a4xx_hw_init(struct msm_gpu *gpu)
> @@ -292,9 +292,7 @@ static int a4xx_hw_init(struct msm_gpu *gpu)
>         /* clear ME_HALT to start micro engine */
>         gpu_write(gpu, REG_A4XX_CP_ME_CNTL, 0);
>
> -       a4xx_me_init(gpu);
> -
> -       return 0;
> +       return a4xx_me_init(gpu) ? 0 : -EINVAL;
>  }
>
>  static void a4xx_recover(struct msm_gpu *gpu)
> @@ -328,17 +326,21 @@ static void a4xx_destroy(struct msm_gpu *gpu)
>         kfree(a4xx_gpu);
>  }
>
> -static void a4xx_idle(struct msm_gpu *gpu)
> +static bool a4xx_idle(struct msm_gpu *gpu)
>  {
>         /* wait for ringbuffer to drain: */
> -       adreno_idle(gpu);
> +       if (!adreno_idle(gpu))
> +               return false;
>
>         /* then wait for GPU to finish: */
>         if (spin_until(!(gpu_read(gpu, REG_A4XX_RBBM_STATUS) &
> -                                       A4XX_RBBM_STATUS_GPU_BUSY)))
> +                                       A4XX_RBBM_STATUS_GPU_BUSY))) {
>                 DRM_ERROR("%s: timeout waiting for GPU to idle!\n", gpu->name);
> +               /* TODO maybe we need to reset GPU here to recover from hang? */
> +               return false;
> +       }
>
> -       /* TODO maybe we need to reset GPU here to recover from hang? */
> +       return true;
>  }
>
>  static irqreturn_t a4xx_irq(struct msm_gpu *gpu)
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> index f386f46..8b2201c 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> @@ -218,19 +218,18 @@ void adreno_flush(struct msm_gpu *gpu)
>         adreno_gpu_write(adreno_gpu, REG_ADRENO_CP_RB_WPTR, wptr);
>  }
>
> -void adreno_idle(struct msm_gpu *gpu)
> +bool adreno_idle(struct msm_gpu *gpu)
>  {
>         struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>         uint32_t wptr = get_wptr(gpu->rb);
> -       int ret;
>
>         /* wait for CP to drain ringbuffer: */
> -       ret = spin_until(get_rptr(adreno_gpu) == wptr);
> -
> -       if (ret)
> -               DRM_ERROR("%s: timeout waiting to drain ringbuffer!\n", gpu->name);
> +       if (!spin_until(get_rptr(adreno_gpu) == wptr))
> +               return true;
>
>         /* TODO maybe we need to reset GPU here to recover from hang? */
> +       DRM_ERROR("%s: timeout waiting to drain ringbuffer!\n", gpu->name);
> +       return false;
>  }
>
>  #ifdef CONFIG_DEBUG_FS
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> index fd976e8..30c2956 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> @@ -182,7 +182,7 @@ void adreno_recover(struct msm_gpu *gpu);
>  void adreno_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
>                 struct msm_file_private *ctx);
>  void adreno_flush(struct msm_gpu *gpu);
> -void adreno_idle(struct msm_gpu *gpu);
> +bool adreno_idle(struct msm_gpu *gpu);
>  #ifdef CONFIG_DEBUG_FS
>  void adreno_show(struct msm_gpu *gpu, struct seq_file *m);
>  #endif
> diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
> index c902283..4ee95ca 100644
> --- a/drivers/gpu/drm/msm/msm_gpu.h
> +++ b/drivers/gpu/drm/msm/msm_gpu.h
> @@ -50,7 +50,7 @@ struct msm_gpu_funcs {
>         void (*submit)(struct msm_gpu *gpu, struct msm_gem_submit *submit,
>                         struct msm_file_private *ctx);
>         void (*flush)(struct msm_gpu *gpu);
> -       void (*idle)(struct msm_gpu *gpu);
> +       bool (*idle)(struct msm_gpu *gpu);
>         irqreturn_t (*irq)(struct msm_gpu *irq);
>         uint32_t (*last_fence)(struct msm_gpu *gpu);
>         void (*recover)(struct msm_gpu *gpu);
> --
> 1.9.1
>
> _______________________________________________
> Freedreno mailing list
> Freedreno@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/freedreno
_______________________________________________
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 09/16] drm/msm: gpu Add new gpu register read/write functions
       [not found]     ` <1478299497-9729-10-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
@ 2016-11-07 19:17       ` Rob Clark
  0 siblings, 0 replies; 29+ messages in thread
From: Rob Clark @ 2016-11-07 19:17 UTC (permalink / raw)
  To: Jordan Crouse; +Cc: linux-arm-msm, freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On Fri, Nov 4, 2016 at 6:44 PM, Jordan Crouse <jcrouse@codeaurora.org> wrote:
> Add some new functions to manipulate GPU registers.  gpu_read64 and
> gpu_write64 can read/write a 64 bit value to two 32 bit registers.
> For 4XX and older these are normally perfcounter registers, but
> future targets will use 64 bit addressing so there will be many
> more spots where a 64 bit read and write are needed.
>
> gpu_rmw() does a read/modify/write on a 32 bit register given a mask
> and bits to OR in.
>
> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
> ---
>  drivers/gpu/drm/msm/adreno/a4xx_gpu.c | 12 ++---------
>  drivers/gpu/drm/msm/msm_gpu.h         | 39 +++++++++++++++++++++++++++++++++++
>  2 files changed, 41 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
> index f39e082..f2f9c91 100644
> --- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
> @@ -515,16 +515,8 @@ static int a4xx_pm_suspend(struct msm_gpu *gpu) {
>
>  static int a4xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
>  {
> -       uint32_t hi, lo, tmp;
> -
> -       tmp = gpu_read(gpu, REG_A4XX_RBBM_PERFCTR_CP_0_HI);
> -       do {
> -               hi = tmp;
> -               lo = gpu_read(gpu, REG_A4XX_RBBM_PERFCTR_CP_0_LO);
> -               tmp = gpu_read(gpu, REG_A4XX_RBBM_PERFCTR_CP_0_HI);
> -       } while (tmp != hi);
> -
> -       *value = (((uint64_t)hi) << 32) | lo;
> +       *value = gpu_read64(gpu, REG_A4XX_RBBM_PERFCTR_CP_0_LO,
> +               REG_A4XX_RBBM_PERFCTR_CP_0_HI);

probably other places are ok to use gpu_read64, but I kinda think the
timestamp counters should stay with the loop while(tmp!=hi)..

I guess if we were reading perfctrs in enough places, a special
read_perfctr type fxn would be in order.

BR,
-R

>         return 0;
>  }
> diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
> index 161cd2f..bec3735 100644
> --- a/drivers/gpu/drm/msm/msm_gpu.h
> +++ b/drivers/gpu/drm/msm/msm_gpu.h
> @@ -154,6 +154,45 @@ static inline u32 gpu_read(struct msm_gpu *gpu, u32 reg)
>         return msm_readl(gpu->mmio + (reg << 2));
>  }
>
> +static inline void gpu_rmw(struct msm_gpu *gpu, u32 reg, u32 mask, u32 or)
> +{
> +       uint32_t val = gpu_read(gpu, reg);
> +
> +       val &= ~mask;
> +       gpu_write(gpu, reg, val | or);
> +}
> +
> +static inline u64 gpu_read64(struct msm_gpu *gpu, u32 lo, u32 hi)
> +{
> +       u64 val;
> +
> +       /*
> +        * Why not a readq here? Two reasons: 1) many of the LO registers are
> +        * not quad word aligned and 2) the GPU hardware designers have a bit
> +        * of a history of putting registers where they fit, especially in
> +        * spins. The longer a GPU family goes the higher the chance that
> +        * we'll get burned.  We could do a series of validity checks if we
> +        * wanted to, but really is a readq() that much better? Nah.
> +        */
> +
> +       /*
> +        * For some lo/hi registers (like perfcounters), the hi value is latched
> +        * when the lo is read, so make sure to read the lo first to trigger
> +        * that
> +        */
> +       val = (u64) msm_readl(gpu->mmio + (lo << 2));
> +       val |= ((u64) msm_readl(gpu->mmio + (hi << 2)) << 32);
> +
> +       return val;
> +}
> +
> +static inline void gpu_write64(struct msm_gpu *gpu, u32 lo, u32 hi, u64 val)
> +{
> +       /* Why not a writeq here? Read the screed above */
> +       msm_writel(lower_32_bits(val), gpu->mmio + (lo << 2));
> +       msm_writel(upper_32_bits(val), gpu->mmio + (hi << 2));
> +}
> +
>  int msm_gpu_pm_suspend(struct msm_gpu *gpu);
>  int msm_gpu_pm_resume(struct msm_gpu *gpu);
>
> --
> 1.9.1
>
> _______________________________________________
> Freedreno mailing list
> Freedreno@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/freedreno
_______________________________________________
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Freedreno] [PATCH 07/16] drm/msm: Add adreno_gpu_write64()
  2016-11-04 22:44   ` [PATCH 07/16] drm/msm: Add adreno_gpu_write64() Jordan Crouse
@ 2016-11-07 19:19     ` Rob Clark
  0 siblings, 0 replies; 29+ messages in thread
From: Rob Clark @ 2016-11-07 19:19 UTC (permalink / raw)
  To: Jordan Crouse; +Cc: freedreno, linux-arm-msm

On Fri, Nov 4, 2016 at 6:44 PM, Jordan Crouse <jcrouse@codeaurora.org> wrote:
> Add a new generic function to write a "64" bit value. This isn't
> actually a 64 bit operation, it just writes the upper and lower
> 32 bit of a 64 bit value to a specified LO and HI register.  If
> a particular target doesn't support one of the registers it can
> mark that register as SKIP and writes/reads from that register
> will be quietly dropped.
>
> This can be immediately put in place for the ringbuffer base and
> the RPTR address.  Both writes are converted to use
> adreno_gpu_write64() with their respective high and low registers
> and the high register appropriately marked as SKIP for both 32 bit
> targets (a3xx and a4xx). When a5xx comes it will define valid target
> registers for the 'hi' option and everything else will just work.
>
> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
> ---
>  drivers/gpu/drm/msm/adreno/a3xx_gpu.c   |  2 ++
>  drivers/gpu/drm/msm/adreno/a4xx_gpu.c   |  2 ++
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c | 11 +++++++----
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h | 24 +++++++++++++++++++++++-
>  4 files changed, 34 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
> index 13989d9..806aab4 100644
> --- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
> @@ -423,7 +423,9 @@ static void a3xx_dump(struct msm_gpu *gpu)
>  /* Register offset defines for A3XX */
>  static const unsigned int a3xx_register_offsets[REG_ADRENO_REGISTER_MAX] = {
>         REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_BASE, REG_AXXX_CP_RB_BASE),
> +       REG_ADRENO_SKIP(REG_ADRENO_CP_RB_BASE_HI),
>         REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_RPTR_ADDR, REG_AXXX_CP_RB_RPTR_ADDR),
> +       REG_ADRENO_SKIP(REG_ADRENO_CP_RB_RPTR_ADDR_HI),
>         REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_RPTR, REG_AXXX_CP_RB_RPTR),
>         REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_WPTR, REG_AXXX_CP_RB_WPTR),
>         REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_CNTL, REG_AXXX_CP_RB_CNTL),
> diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
> index ba16507..f39e082 100644
> --- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
> @@ -463,7 +463,9 @@ static void a4xx_show(struct msm_gpu *gpu, struct seq_file *m)
>  /* Register offset defines for A4XX, in order of enum adreno_regs */
>  static const unsigned int a4xx_register_offsets[REG_ADRENO_REGISTER_MAX] = {
>         REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_BASE, REG_A4XX_CP_RB_BASE),
> +       REG_ADRENO_SKIP(REG_ADRENO_CP_RB_BASE_HI),
>         REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_RPTR_ADDR, REG_A4XX_CP_RB_RPTR_ADDR),
> +       REG_ADRENO_SKIP(REG_ADRENO_CP_RB_RPTR_ADDR_HI),
>         REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_RPTR, REG_A4XX_CP_RB_RPTR),
>         REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_WPTR, REG_A4XX_CP_RB_WPTR),
>         REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_CNTL, REG_A4XX_CP_RB_CNTL),
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> index 8b2201c..d7b62b8 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> @@ -79,11 +79,14 @@ int adreno_hw_init(struct msm_gpu *gpu)
>                         (adreno_is_a430(adreno_gpu) ? AXXX_CP_RB_CNTL_NO_UPDATE : 0));
>
>         /* Setup ringbuffer address: */
> -       adreno_gpu_write(adreno_gpu, REG_ADRENO_CP_RB_BASE, gpu->rb_iova);
> +       adreno_gpu_write64(adreno_gpu, REG_ADRENO_CP_RB_BASE,
> +               REG_ADRENO_CP_RB_BASE_HI, gpu->rb_iova);
>
> -       if (!adreno_is_a430(adreno_gpu))
> -               adreno_gpu_write(adreno_gpu, REG_ADRENO_CP_RB_RPTR_ADDR,
> -                                               rbmemptr(adreno_gpu, rptr));
> +       if (!adreno_is_a430(adreno_gpu)) {
> +               adreno_gpu_write64(adreno_gpu, REG_ADRENO_CP_RB_RPTR_ADDR,
> +                       REG_ADRENO_CP_RB_RPTR_ADDR_HI,
> +                       rbmemptr(adreno_gpu, rptr));
> +       }
>
>         return 0;
>  }
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> index 707487f..7b9895f 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> @@ -28,6 +28,9 @@
>  #include "adreno_pm4.xml.h"
>
>  #define REG_ADRENO_DEFINE(_offset, _reg) [_offset] = (_reg) + 1
> +#define REG_SKIP ~0
> +#define REG_ADRENO_SKIP(_offset) [_offset] = REG_SKIP
> +

I think you can drop this.. we use (_reg)+1 so that a zero value in
the table can be interpreted as a skip.  That was one small diff from
how the downstream driver does it.

BR,
-R

>  /**
>   * adreno_regs: List of registers that are used in across all
>   * 3D devices. Each device type has different offset value for the same
> @@ -36,7 +39,9 @@
>   */
>  enum adreno_regs {
>         REG_ADRENO_CP_RB_BASE,
> +       REG_ADRENO_CP_RB_BASE_HI,
>         REG_ADRENO_CP_RB_RPTR_ADDR,
> +       REG_ADRENO_CP_RB_RPTR_ADDR_HI,
>         REG_ADRENO_CP_RB_RPTR,
>         REG_ADRENO_CP_RB_WPTR,
>         REG_ADRENO_CP_RB_CNTL,
> @@ -250,7 +255,7 @@ OUT_PKT7(struct msm_ringbuffer *ring, uint8_t opcode, uint16_t cnt)
>  }
>
>  /*
> - * adreno_checkreg_off() - Checks the validity of a register enum
> + * adreno_reg_check() - Checks the validity of a register enum
>   * @gpu:               Pointer to struct adreno_gpu
>   * @offset_name:       The register enum that is checked
>   */
> @@ -261,6 +266,16 @@ static inline bool adreno_reg_check(struct adreno_gpu *gpu,
>                         !gpu->reg_offsets[offset_name]) {
>                 BUG();
>         }
> +
> +       /*
> +        * REG_SKIP is a special value that tell us that the register in
> +        * question isn't implemented on target but don't trigger a BUG(). This
> +        * is used to cleanly implement adreno_gpu_write64() and
> +        * adreno_gpu_read64() in a generic fashion
> +        */
> +       if (gpu->reg_offsets[offset_name] == REG_SKIP)
> +               return false;
> +
>         return true;
>  }
>
> @@ -282,4 +297,11 @@ static inline void adreno_gpu_write(struct adreno_gpu *gpu,
>                 gpu_write(&gpu->base, reg - 1, data);
>  }
>
> +static inline void adreno_gpu_write64(struct adreno_gpu *gpu,
> +               enum adreno_regs lo, enum adreno_regs hi, u64 data)
> +{
> +       adreno_gpu_write(gpu, lo, lower_32_bits(data));
> +       adreno_gpu_write(gpu, hi, upper_32_bits(data));
> +}
> +
>  #endif /* __ADRENO_GPU_H__ */
> --
> 1.9.1
>
> _______________________________________________
> Freedreno mailing list
> Freedreno@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/freedreno

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Freedreno] [RFC] Initial support for the Adreno A5XX
  2016-11-04 22:44 [RFC] Initial support for the Adreno A5XX Jordan Crouse
                   ` (3 preceding siblings ...)
  2016-11-04 22:44 ` [PATCH 14/16] firmware: qcom_scm: Add qcom_scm_gpu_zap_resume() Jordan Crouse
@ 2016-11-08 17:12 ` Jordan Crouse
  4 siblings, 0 replies; 29+ messages in thread
From: Jordan Crouse @ 2016-11-08 17:12 UTC (permalink / raw)
  To: freedreno; +Cc: linux-arm-msm

On Fri, Nov 04, 2016 at 04:44:41PM -0600, Jordan Crouse wrote:
> Here is the first blast of patches for support for the fine family of Adreno
> A5XX GPUs. These are designed to work against Rob's version of the Linaro
> target tree with some additional 8996 clock stuff added in:
> 
> https://github.com/freedreno/kernel-msm/tree/a5xx
> 
> In addition, you will also need these SCM changes for the last three patches to
> work:
> 
> https://patchwork.kernel.org/patch/9291593/ arm64: kernel: Add SMC Session ID to
> results
> https://patchwork.kernel.org/patch/9291595/ firmware: qcom: scm: Fix interrupted
> SCM calls
> https://patchwork.kernel.org/patch/9353735/ firmware: qcom: scm: Fix interrupted
> SCM calls fully
> https://patchwork.codeaurora.org/patch/109673/ dt-bindings: firmware: scm: Add
> MSM8996 DT bindings
> https://patchwork.codeaurora.org/patch/109675/ firmware: qcom: scm: Remove core,
> iface and bus clocks dependency
> https://patchwork.codeaurora.org/patch/109671/ firmware: qcom: scm: Return
> PTR_ERR when devm_clk_get fails
 
Note, you'll also want this to add the SCM DT bindings:

https://patchwork.codeaurora.org/patch/106901/

Jordan

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2016-11-08 17:13 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-04 22:44 [RFC] Initial support for the Adreno A5XX Jordan Crouse
     [not found] ` <1478299497-9729-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2016-11-04 22:44   ` [PATCH 01/16] drm/msm: Remove dependency on COMMON_CLK Jordan Crouse
2016-11-04 22:44   ` [PATCH 02/16] drm/msm: Rename the MSM driver so it doesn't conflict with other drivers Jordan Crouse
2016-11-04 22:44   ` [PATCH 03/16] drm/msm: gpu: Cut down the list of "generic" registers to the ones we use Jordan Crouse
2016-11-04 22:44   ` [PATCH 04/16] drm: msm: Flush the cache immediately after allocating pages Jordan Crouse
2016-11-06 14:15     ` [Freedreno] " Rob Clark
     [not found]       ` <CAF6AEGtDv8tRZi82Eno5RF6a58qSRpjYcUo-J8dDDioDLLJqmg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-11-07  8:35         ` Archit Taneja
     [not found]           ` <99a66f0f-ec84-a26e-0108-60367362c29e-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2016-11-07 12:19             ` Rob Clark
2016-11-07 18:01               ` [Freedreno] " Jordan Crouse
2016-11-04 22:44   ` [PATCH 05/16] drm/msm: gpu: Return error on hw_init failure Jordan Crouse
     [not found]     ` <1478299497-9729-6-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2016-11-07 18:54       ` Rob Clark
2016-11-04 22:44   ` [PATCH 06/16] drm/msm: gpu: Add OUT_TYPE4 and OUT_TYPE7 Jordan Crouse
2016-11-04 22:44   ` [PATCH 07/16] drm/msm: Add adreno_gpu_write64() Jordan Crouse
2016-11-07 19:19     ` [Freedreno] " Rob Clark
2016-11-04 22:44   ` [PATCH 08/16] drm/msm: Remove 'src_clk' from adreno configuration Jordan Crouse
2016-11-04 22:44   ` [PATCH 09/16] drm/msm: gpu Add new gpu register read/write functions Jordan Crouse
     [not found]     ` <1478299497-9729-10-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2016-11-07 19:17       ` Rob Clark
2016-11-04 22:44   ` [PATCH 10/16] drm/msm: Disable interrupts during init Jordan Crouse
2016-11-04 22:44   ` [PATCH 13/16] drm/msm: gpu: Add support for the GPMU Jordan Crouse
     [not found]     ` <1478299497-9729-14-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2016-11-07 12:58       ` Stanimir Varbanov
2016-11-07 13:02         ` [Freedreno] " Rob Clark
     [not found]           ` <CAF6AEGuW6ThJM-+X-=XGtqTCY_hcq8DghJHRf38OWjy4Z3R=DQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-11-07 14:47             ` Stanimir Varbanov
     [not found]         ` <740c4fda-dfd6-7a70-9cb7-3eec6a5781ca-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
2016-11-07 18:09           ` Jordan Crouse
2016-11-04 22:44   ` [PATCH 15/16] drm/msm: Add a quick and dirty PIL loader Jordan Crouse
2016-11-04 22:44   ` [PATCH 16/16] drm/msm: gpu: Use the zap shader on 5XX if we can Jordan Crouse
2016-11-04 22:44 ` [PATCH 11/16] arm64: dts: Add Adreno GPU and GPU smmu definitions Jordan Crouse
2016-11-04 22:44 ` [PATCH 12/16] drm/msm: gpu: Add A5XX target support Jordan Crouse
2016-11-04 22:44 ` [PATCH 14/16] firmware: qcom_scm: Add qcom_scm_gpu_zap_resume() Jordan Crouse
2016-11-08 17:12 ` [Freedreno] [RFC] Initial support for the Adreno A5XX Jordan Crouse

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).