linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/7] Improve GPU Recovery
@ 2022-08-17 15:14 Akhil P Oommen
  2022-08-17 15:14 ` [PATCH v4 1/7] drm/msm: Remove unnecessary pm_runtime_get/put Akhil P Oommen
                   ` (6 more replies)
  0 siblings, 7 replies; 9+ messages in thread
From: Akhil P Oommen @ 2022-08-17 15:14 UTC (permalink / raw)
  To: freedreno, dri-devel, linux-arm-msm, Rob Clark, Bjorn Andersson,
	Dmitry Baryshkov
  Cc: Matthias Kaehlcke, Jonathan Marek, Jordan Crouse,
	Douglas Anderson, Akhil P Oommen, Abhinav Kumar, Chia-I Wu,
	Dan Carpenter, Daniel Vetter, David Airlie, Philipp Zabel,
	Sean Paul, Wang Qing, linux-kernel


Recently, I debugged a few device crashes which occured during recovery
after a hangcheck timeout. It looks like there are a few things we can
do to improve our chance at a successful gpu recovery.

First one is to ensure that CX GDSC collapses which clears the internal
states in gpu's CX domain. First 5 patches tries to handle this.

Rest of the patches are to ensure that few internal blocks like CP, GMU
and GBIF are halted properly before proceeding for a snapshot followed by
recovery. Also, handle 'prepare slumber' hfi failure correctly. These
are A6x specific improvements.

This series is rebased on top of v2 version of [1] which is based on
linus's master branch.

[1] https://patchwork.freedesktop.org/series/106860/

Changes in v4:
- Keep active_submit lock across the suspend & resume (Rob)
- Clear gpu->active_submits to silence a WARN() during runpm suspend (Rob)

Changes in v3:
- Use reset interface from gpucc driver to poll for cx gdsc collapse
  https://patchwork.freedesktop.org/series/106860/
- Use single pm refcount for all active submits

Changes in v2:
- Rebased on msm-next tip

Akhil P Oommen (7):
  drm/msm: Remove unnecessary pm_runtime_get/put
  drm/msm: Take single rpm refcount on behalf of all submits
  drm/msm: Correct pm_runtime votes in recover worker
  drm/msm: Fix cx collapse issue during recovery
  drm/msm/a6xx: Ensure CX collapse during gpu recovery
  drm/msm/a6xx: Improve gpu recovery sequence
  drm/msm/a6xx: Handle GMU prepare-slumber hfi failure

 drivers/gpu/drm/msm/adreno/a6xx.xml.h |  4 ++
 drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 83 ++++++++++++++++++++++-------------
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 43 ++++++++++++++++--
 drivers/gpu/drm/msm/msm_gpu.c         | 21 ++++++---
 drivers/gpu/drm/msm/msm_gpu.h         |  4 ++
 drivers/gpu/drm/msm/msm_ringbuffer.c  |  4 --
 6 files changed, 114 insertions(+), 45 deletions(-)

-- 
2.7.4


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v4 1/7] drm/msm: Remove unnecessary pm_runtime_get/put
  2022-08-17 15:14 [PATCH v4 0/7] Improve GPU Recovery Akhil P Oommen
@ 2022-08-17 15:14 ` Akhil P Oommen
  2022-08-17 15:14 ` [PATCH v4 2/7] drm/msm: Take single rpm refcount on behalf of all submits Akhil P Oommen
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Akhil P Oommen @ 2022-08-17 15:14 UTC (permalink / raw)
  To: freedreno, dri-devel, linux-arm-msm, Rob Clark, Bjorn Andersson,
	Dmitry Baryshkov
  Cc: Matthias Kaehlcke, Jonathan Marek, Jordan Crouse,
	Douglas Anderson, Akhil P Oommen, Abhinav Kumar, Daniel Vetter,
	David Airlie, Sean Paul, linux-kernel

We already enable gpu power from msm_gpu_submit(), so avoid a duplicate
pm_runtime_get/put from msm_job_run().

Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
---

(no changes since v1)

 drivers/gpu/drm/msm/msm_ringbuffer.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c b/drivers/gpu/drm/msm/msm_ringbuffer.c
index 56eecb4..cad4c35 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.c
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
@@ -29,8 +29,6 @@ static struct dma_fence *msm_job_run(struct drm_sched_job *job)
 		msm_gem_unlock(obj);
 	}
 
-	pm_runtime_get_sync(&gpu->pdev->dev);
-
 	/* TODO move submit path over to using a per-ring lock.. */
 	mutex_lock(&gpu->lock);
 
@@ -38,8 +36,6 @@ static struct dma_fence *msm_job_run(struct drm_sched_job *job)
 
 	mutex_unlock(&gpu->lock);
 
-	pm_runtime_put(&gpu->pdev->dev);
-
 	return dma_fence_get(submit->hw_fence);
 }
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v4 2/7] drm/msm: Take single rpm refcount on behalf of all submits
  2022-08-17 15:14 [PATCH v4 0/7] Improve GPU Recovery Akhil P Oommen
  2022-08-17 15:14 ` [PATCH v4 1/7] drm/msm: Remove unnecessary pm_runtime_get/put Akhil P Oommen
@ 2022-08-17 15:14 ` Akhil P Oommen
  2022-08-17 15:14 ` [PATCH v4 3/7] drm/msm: Correct pm_runtime votes in recover worker Akhil P Oommen
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Akhil P Oommen @ 2022-08-17 15:14 UTC (permalink / raw)
  To: freedreno, dri-devel, linux-arm-msm, Rob Clark, Bjorn Andersson,
	Dmitry Baryshkov
  Cc: Matthias Kaehlcke, Jonathan Marek, Jordan Crouse,
	Douglas Anderson, Akhil P Oommen, Abhinav Kumar, Daniel Vetter,
	David Airlie, Sean Paul, linux-kernel

Instead of separate refcount for each submit, take single rpm refcount
on behalf of all the submits. This makes it easier to drop the rpm
refcount during recovery in an upcoming patch.

Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
---

(no changes since v3)

Changes in v3:
- New patch

 drivers/gpu/drm/msm/msm_gpu.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index c8cd9bf..e1dd3cc 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -663,11 +663,12 @@ static void retire_submit(struct msm_gpu *gpu, struct msm_ringbuffer *ring,
 	mutex_lock(&gpu->active_lock);
 	gpu->active_submits--;
 	WARN_ON(gpu->active_submits < 0);
-	if (!gpu->active_submits)
+	if (!gpu->active_submits) {
 		msm_devfreq_idle(gpu);
-	mutex_unlock(&gpu->active_lock);
+		pm_runtime_put_autosuspend(&gpu->pdev->dev);
+	}
 
-	pm_runtime_put_autosuspend(&gpu->pdev->dev);
+	mutex_unlock(&gpu->active_lock);
 
 	msm_gem_submit_put(submit);
 }
@@ -756,14 +757,17 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 
 	/* Update devfreq on transition from idle->active: */
 	mutex_lock(&gpu->active_lock);
-	if (!gpu->active_submits)
+	if (!gpu->active_submits) {
+		pm_runtime_get(&gpu->pdev->dev);
 		msm_devfreq_active(gpu);
+	}
 	gpu->active_submits++;
 	mutex_unlock(&gpu->active_lock);
 
 	gpu->funcs->submit(gpu, submit);
 	gpu->cur_ctx_seqno = submit->queue->ctx->seqno;
 
+	pm_runtime_put(&gpu->pdev->dev);
 	hangcheck_timer_reset(gpu);
 }
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v4 3/7] drm/msm: Correct pm_runtime votes in recover worker
  2022-08-17 15:14 [PATCH v4 0/7] Improve GPU Recovery Akhil P Oommen
  2022-08-17 15:14 ` [PATCH v4 1/7] drm/msm: Remove unnecessary pm_runtime_get/put Akhil P Oommen
  2022-08-17 15:14 ` [PATCH v4 2/7] drm/msm: Take single rpm refcount on behalf of all submits Akhil P Oommen
@ 2022-08-17 15:14 ` Akhil P Oommen
  2022-08-17 15:14 ` [PATCH v4 4/7] drm/msm: Fix cx collapse issue during recovery Akhil P Oommen
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Akhil P Oommen @ 2022-08-17 15:14 UTC (permalink / raw)
  To: freedreno, dri-devel, linux-arm-msm, Rob Clark, Bjorn Andersson,
	Dmitry Baryshkov
  Cc: Matthias Kaehlcke, Jonathan Marek, Jordan Crouse,
	Douglas Anderson, Akhil P Oommen, Abhinav Kumar, Daniel Vetter,
	David Airlie, Sean Paul, linux-kernel

In the scenario where there is one a single submit which is hung, gpu is
power collapsed when it is retired. Because of this, by the time we call
reover(), gpu state would be already clear. Fix this by correctly
managing the pm runtime votes.

Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
---

(no changes since v1)

 drivers/gpu/drm/msm/msm_gpu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index e1dd3cc..1945efb 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -398,7 +398,6 @@ static void recover_worker(struct kthread_work *work)
 	/* Record the crash state */
 	pm_runtime_get_sync(&gpu->pdev->dev);
 	msm_gpu_crashstate_capture(gpu, submit, comm, cmd);
-	pm_runtime_put_sync(&gpu->pdev->dev);
 
 	kfree(cmd);
 	kfree(comm);
@@ -446,6 +445,8 @@ static void recover_worker(struct kthread_work *work)
 		}
 	}
 
+	pm_runtime_put_sync(&gpu->pdev->dev);
+
 	mutex_unlock(&gpu->lock);
 
 	msm_gpu_retire(gpu);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v4 4/7] drm/msm: Fix cx collapse issue during recovery
  2022-08-17 15:14 [PATCH v4 0/7] Improve GPU Recovery Akhil P Oommen
                   ` (2 preceding siblings ...)
  2022-08-17 15:14 ` [PATCH v4 3/7] drm/msm: Correct pm_runtime votes in recover worker Akhil P Oommen
@ 2022-08-17 15:14 ` Akhil P Oommen
  2022-08-17 15:14 ` [PATCH v4 5/7] drm/msm/a6xx: Ensure CX collapse during gpu recovery Akhil P Oommen
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Akhil P Oommen @ 2022-08-17 15:14 UTC (permalink / raw)
  To: freedreno, dri-devel, linux-arm-msm, Rob Clark, Bjorn Andersson,
	Dmitry Baryshkov
  Cc: Matthias Kaehlcke, Jonathan Marek, Jordan Crouse,
	Douglas Anderson, Akhil P Oommen, Abhinav Kumar, Chia-I Wu,
	Daniel Vetter, David Airlie, Sean Paul, linux-kernel

There are some hardware logic under CX domain. For a successful
recovery, we should ensure cx headswitch collapses to ensure all the
stale states are cleard out. This is especially true to for a6xx family
where we can GMU co-processor.

Currently, cx doesn't collapse due to a devlink between gpu and its
smmu. So the *struct gpu device* needs to be runtime suspended to ensure
that the iommu driver removes its vote on cx gdsc.

Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
---

Changes in v4:
- Keep active_submit lock across the suspend & resume (Rob)
- Clear gpu->active_submits to silence a WARN() during runpm suspend (Rob)

Changes in v3:
- Simplied the pm refcount drop since we have just a single refcount now
for all active submits

 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 32 +++++++++++++++++++++++++++++---
 drivers/gpu/drm/msm/msm_gpu.c         |  4 +---
 2 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 42ed9a3..0c8f19e 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1193,7 +1193,7 @@ static void a6xx_recover(struct msm_gpu *gpu)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
-	int i;
+	int i, active_submits;
 
 	adreno_dump_info(gpu);
 
@@ -1210,8 +1210,34 @@ static void a6xx_recover(struct msm_gpu *gpu)
 	 */
 	gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 0);
 
-	gpu->funcs->pm_suspend(gpu);
-	gpu->funcs->pm_resume(gpu);
+	pm_runtime_dont_use_autosuspend(&gpu->pdev->dev);
+
+	/* active_submit won't change until we make a submission */
+	mutex_lock(&gpu->active_lock);
+	active_submits = gpu->active_submits;
+
+	/*
+	 * Temporarily clear active_submits count to silence a WARN() in the
+	 * runtime suspend cb
+	 */
+	gpu->active_submits = 0;
+
+	/* Drop the rpm refcount from active submits */
+	if (active_submits)
+		pm_runtime_put(&gpu->pdev->dev);
+
+	/* And the final one from recover worker */
+	pm_runtime_put_sync(&gpu->pdev->dev);
+
+	pm_runtime_use_autosuspend(&gpu->pdev->dev);
+
+	if (active_submits)
+		pm_runtime_get(&gpu->pdev->dev);
+
+	pm_runtime_get_sync(&gpu->pdev->dev);
+
+	gpu->active_submits = active_submits;
+	mutex_unlock(&gpu->active_lock);
 
 	msm_gpu_hw_init(gpu);
 }
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 1945efb..07e55a6 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -426,9 +426,7 @@ static void recover_worker(struct kthread_work *work)
 		/* retire completed submits, plus the one that hung: */
 		retire_submits(gpu);
 
-		pm_runtime_get_sync(&gpu->pdev->dev);
 		gpu->funcs->recover(gpu);
-		pm_runtime_put_sync(&gpu->pdev->dev);
 
 		/*
 		 * Replay all remaining submits starting with highest priority
@@ -445,7 +443,7 @@ static void recover_worker(struct kthread_work *work)
 		}
 	}
 
-	pm_runtime_put_sync(&gpu->pdev->dev);
+	pm_runtime_put(&gpu->pdev->dev);
 
 	mutex_unlock(&gpu->lock);
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v4 5/7] drm/msm/a6xx: Ensure CX collapse during gpu recovery
  2022-08-17 15:14 [PATCH v4 0/7] Improve GPU Recovery Akhil P Oommen
                   ` (3 preceding siblings ...)
  2022-08-17 15:14 ` [PATCH v4 4/7] drm/msm: Fix cx collapse issue during recovery Akhil P Oommen
@ 2022-08-17 15:14 ` Akhil P Oommen
  2022-08-18  8:54   ` Philipp Zabel
  2022-08-17 15:14 ` [PATCH v4 6/7] drm/msm/a6xx: Improve gpu recovery sequence Akhil P Oommen
  2022-08-17 15:14 ` [PATCH v4 7/7] drm/msm/a6xx: Handle GMU prepare-slumber hfi failure Akhil P Oommen
  6 siblings, 1 reply; 9+ messages in thread
From: Akhil P Oommen @ 2022-08-17 15:14 UTC (permalink / raw)
  To: freedreno, dri-devel, linux-arm-msm, Rob Clark, Bjorn Andersson,
	Dmitry Baryshkov
  Cc: Matthias Kaehlcke, Jonathan Marek, Jordan Crouse,
	Douglas Anderson, Akhil P Oommen, Abhinav Kumar, Chia-I Wu,
	Daniel Vetter, David Airlie, Philipp Zabel, Sean Paul,
	linux-kernel

Because there could be transient votes from other drivers/tz/hyp which
may keep the cx gdsc enabled, we should poll until cx gdsc collapses.
We can use the reset framework to poll for cx gdsc collapse from gpucc
clk driver.

This feature requires support from the platform's gpucc driver.

Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
---

(no changes since v3)

Changes in v3:
- Use reset interface from gpucc driver to poll for cx gdsc collapse
  https://patchwork.freedesktop.org/series/106860/

 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 4 ++++
 drivers/gpu/drm/msm/msm_gpu.c         | 4 ++++
 drivers/gpu/drm/msm/msm_gpu.h         | 4 ++++
 3 files changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 0c8f19e..0ec4fcd 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -10,6 +10,7 @@
 
 #include <linux/bitfield.h>
 #include <linux/devfreq.h>
+#include <linux/reset.h>
 #include <linux/soc/qcom/llcc-qcom.h>
 
 #define GPU_PAS_ID 13
@@ -1229,6 +1230,9 @@ static void a6xx_recover(struct msm_gpu *gpu)
 	/* And the final one from recover worker */
 	pm_runtime_put_sync(&gpu->pdev->dev);
 
+	/* Call into gpucc driver to poll for cx gdsc collapse */
+	reset_control_reset(gpu->cx_collapse);
+
 	pm_runtime_use_autosuspend(&gpu->pdev->dev);
 
 	if (active_submits)
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 07e55a6..4a57627 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -14,6 +14,7 @@
 #include <generated/utsrelease.h>
 #include <linux/string_helpers.h>
 #include <linux/devcoredump.h>
+#include <linux/reset.h>
 #include <linux/sched/task.h>
 
 /*
@@ -903,6 +904,9 @@ int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 	if (IS_ERR(gpu->gpu_cx))
 		gpu->gpu_cx = NULL;
 
+	gpu->cx_collapse = devm_reset_control_get_optional(&pdev->dev,
+			"cx_collapse");
+
 	gpu->pdev = pdev;
 	platform_set_drvdata(pdev, &gpu->adreno_smmu);
 
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index 6def008..ab59fd2 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -13,6 +13,7 @@
 #include <linux/interconnect.h>
 #include <linux/pm_opp.h>
 #include <linux/regulator/consumer.h>
+#include <linux/reset.h>
 
 #include "msm_drv.h"
 #include "msm_fence.h"
@@ -268,6 +269,9 @@ struct msm_gpu {
 	bool hw_apriv;
 
 	struct thermal_cooling_device *cooling;
+
+	/* To poll for cx gdsc collapse during gpu recovery */
+	struct reset_control *cx_collapse;
 };
 
 static inline struct msm_gpu *dev_to_gpu(struct device *dev)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v4 6/7] drm/msm/a6xx: Improve gpu recovery sequence
  2022-08-17 15:14 [PATCH v4 0/7] Improve GPU Recovery Akhil P Oommen
                   ` (4 preceding siblings ...)
  2022-08-17 15:14 ` [PATCH v4 5/7] drm/msm/a6xx: Ensure CX collapse during gpu recovery Akhil P Oommen
@ 2022-08-17 15:14 ` Akhil P Oommen
  2022-08-17 15:14 ` [PATCH v4 7/7] drm/msm/a6xx: Handle GMU prepare-slumber hfi failure Akhil P Oommen
  6 siblings, 0 replies; 9+ messages in thread
From: Akhil P Oommen @ 2022-08-17 15:14 UTC (permalink / raw)
  To: freedreno, dri-devel, linux-arm-msm, Rob Clark, Bjorn Andersson,
	Dmitry Baryshkov
  Cc: Matthias Kaehlcke, Jonathan Marek, Jordan Crouse,
	Douglas Anderson, Akhil P Oommen, Abhinav Kumar, Chia-I Wu,
	Dan Carpenter, Daniel Vetter, David Airlie, Sean Paul, Wang Qing,
	linux-kernel

We can do a few more things to improve our chance at a successful gpu
recovery, especially during a hangcheck timeout:
1. Halt CP and GMU core
2. Do RBBM GBIF HALT sequence
3. Do a soft reset of GPU core

Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
---

(no changes since v1)

 drivers/gpu/drm/msm/adreno/a6xx.xml.h |  4 ++
 drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 77 +++++++++++++++++++++--------------
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c |  7 ++++
 3 files changed, 58 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx.xml.h b/drivers/gpu/drm/msm/adreno/a6xx.xml.h
index b03e2c4..beea4a7 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx.xml.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx.xml.h
@@ -1413,6 +1413,10 @@ static inline uint32_t REG_A6XX_RBBM_PERFCTR_RBBM_SEL(uint32_t i0) { return 0x00
 
 #define REG_A6XX_RBBM_GBIF_CLIENT_QOS_CNTL			0x00000011
 
+#define REG_A6XX_RBBM_GBIF_HALT					0x00000016
+
+#define REG_A6XX_RBBM_GBIF_HALT_ACK				0x00000017
+
 #define REG_A6XX_RBBM_WAIT_FOR_GPU_IDLE_CMD			0x0000001c
 #define A6XX_RBBM_WAIT_FOR_GPU_IDLE_CMD_WAIT_GPU_IDLE		0x00000001
 
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
index 9f76f5b..db05942 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -869,9 +869,47 @@ static void a6xx_gmu_rpmh_off(struct a6xx_gmu *gmu)
 		(val & 1), 100, 1000);
 }
 
+#define GBIF_CLIENT_HALT_MASK             BIT(0)
+#define GBIF_ARB_HALT_MASK                BIT(1)
+
+static void a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu)
+{
+	struct msm_gpu *gpu = &adreno_gpu->base;
+
+	if (!a6xx_has_gbif(adreno_gpu)) {
+		gpu_write(gpu, REG_A6XX_VBIF_XIN_HALT_CTRL0, 0xf);
+		spin_until((gpu_read(gpu, REG_A6XX_VBIF_XIN_HALT_CTRL1) &
+								0xf) == 0xf);
+		gpu_write(gpu, REG_A6XX_VBIF_XIN_HALT_CTRL0, 0);
+
+		return;
+	}
+
+	/* Halt the gx side of GBIF */
+	gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, 1);
+	spin_until(gpu_read(gpu, REG_A6XX_RBBM_GBIF_HALT_ACK) & 1);
+
+	/* Halt new client requests on GBIF */
+	gpu_write(gpu, REG_A6XX_GBIF_HALT, GBIF_CLIENT_HALT_MASK);
+	spin_until((gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK) &
+			(GBIF_CLIENT_HALT_MASK)) == GBIF_CLIENT_HALT_MASK);
+
+	/* Halt all AXI requests on GBIF */
+	gpu_write(gpu, REG_A6XX_GBIF_HALT, GBIF_ARB_HALT_MASK);
+	spin_until((gpu_read(gpu,  REG_A6XX_GBIF_HALT_ACK) &
+			(GBIF_ARB_HALT_MASK)) == GBIF_ARB_HALT_MASK);
+
+	/* The GBIF halt needs to be explicitly cleared */
+	gpu_write(gpu, REG_A6XX_GBIF_HALT, 0x0);
+}
+
 /* Force the GMU off in case it isn't responsive */
 static void a6xx_gmu_force_off(struct a6xx_gmu *gmu)
 {
+	struct a6xx_gpu *a6xx_gpu = container_of(gmu, struct a6xx_gpu, gmu);
+	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
+	struct msm_gpu *gpu = &adreno_gpu->base;
+
 	/* Flush all the queues */
 	a6xx_hfi_stop(gmu);
 
@@ -883,6 +921,15 @@ static void a6xx_gmu_force_off(struct a6xx_gmu *gmu)
 
 	/* Make sure there are no outstanding RPMh votes */
 	a6xx_gmu_rpmh_off(gmu);
+
+	/* Halt the gmu cm3 core */
+	gmu_write(gmu, REG_A6XX_GMU_CM3_SYSRESET, 1);
+
+	a6xx_bus_clear_pending_transactions(adreno_gpu);
+
+	/* Reset GPU core blocks */
+	gpu_write(gpu, REG_A6XX_RBBM_SW_RESET_CMD, 1);
+	udelay(100);
 }
 
 static void a6xx_gmu_set_initial_freq(struct msm_gpu *gpu, struct a6xx_gmu *gmu)
@@ -1010,36 +1057,6 @@ bool a6xx_gmu_isidle(struct a6xx_gmu *gmu)
 	return true;
 }
 
-#define GBIF_CLIENT_HALT_MASK             BIT(0)
-#define GBIF_ARB_HALT_MASK                BIT(1)
-
-static void a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu)
-{
-	struct msm_gpu *gpu = &adreno_gpu->base;
-
-	if (!a6xx_has_gbif(adreno_gpu)) {
-		gpu_write(gpu, REG_A6XX_VBIF_XIN_HALT_CTRL0, 0xf);
-		spin_until((gpu_read(gpu, REG_A6XX_VBIF_XIN_HALT_CTRL1) &
-								0xf) == 0xf);
-		gpu_write(gpu, REG_A6XX_VBIF_XIN_HALT_CTRL0, 0);
-
-		return;
-	}
-
-	/* Halt new client requests on GBIF */
-	gpu_write(gpu, REG_A6XX_GBIF_HALT, GBIF_CLIENT_HALT_MASK);
-	spin_until((gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK) &
-			(GBIF_CLIENT_HALT_MASK)) == GBIF_CLIENT_HALT_MASK);
-
-	/* Halt all AXI requests on GBIF */
-	gpu_write(gpu, REG_A6XX_GBIF_HALT, GBIF_ARB_HALT_MASK);
-	spin_until((gpu_read(gpu,  REG_A6XX_GBIF_HALT_ACK) &
-			(GBIF_ARB_HALT_MASK)) == GBIF_ARB_HALT_MASK);
-
-	/* The GBIF halt needs to be explicitly cleared */
-	gpu_write(gpu, REG_A6XX_GBIF_HALT, 0x0);
-}
-
 /* Gracefully try to shut down the GMU and by extension the GPU */
 static void a6xx_gmu_shutdown(struct a6xx_gmu *gmu)
 {
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 0ec4fcd..a9335bc 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -920,6 +920,10 @@ static int hw_init(struct msm_gpu *gpu)
 	/* Make sure the GMU keeps the GPU on while we set it up */
 	a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
 
+	/* Clear GBIF halt in case GX domain was not collapsed */
+	if (a6xx_has_gbif(adreno_gpu))
+		gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, 0);
+
 	gpu_write(gpu, REG_A6XX_RBBM_SECVID_TSB_CNTL, 0);
 
 	/*
@@ -1205,6 +1209,9 @@ static void a6xx_recover(struct msm_gpu *gpu)
 	if (hang_debug)
 		a6xx_dump(gpu);
 
+	/* Halt SQE first */
+	gpu_write(gpu, REG_A6XX_CP_SQE_CNTL, 3);
+
 	/*
 	 * Turn off keep alive that might have been enabled by the hang
 	 * interrupt
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v4 7/7] drm/msm/a6xx: Handle GMU prepare-slumber hfi failure
  2022-08-17 15:14 [PATCH v4 0/7] Improve GPU Recovery Akhil P Oommen
                   ` (5 preceding siblings ...)
  2022-08-17 15:14 ` [PATCH v4 6/7] drm/msm/a6xx: Improve gpu recovery sequence Akhil P Oommen
@ 2022-08-17 15:14 ` Akhil P Oommen
  6 siblings, 0 replies; 9+ messages in thread
From: Akhil P Oommen @ 2022-08-17 15:14 UTC (permalink / raw)
  To: freedreno, dri-devel, linux-arm-msm, Rob Clark, Bjorn Andersson,
	Dmitry Baryshkov
  Cc: Matthias Kaehlcke, Jonathan Marek, Jordan Crouse,
	Douglas Anderson, Akhil P Oommen, Abhinav Kumar, Dan Carpenter,
	Daniel Vetter, David Airlie, Sean Paul, Wang Qing, linux-kernel

When prepare-slumber hfi fails, we should follow a6xx_gmu_force_off()
sequence.

Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
---

(no changes since v1)

 drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
index db05942..3d00ef9 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -1082,7 +1082,11 @@ static void a6xx_gmu_shutdown(struct a6xx_gmu *gmu)
 		a6xx_bus_clear_pending_transactions(adreno_gpu);
 
 		/* tell the GMU we want to slumber */
-		a6xx_gmu_notify_slumber(gmu);
+		ret = a6xx_gmu_notify_slumber(gmu);
+		if (ret) {
+			a6xx_gmu_force_off(gmu);
+			return;
+		}
 
 		ret = gmu_poll_timeout(gmu,
 			REG_A6XX_GPU_GMU_AO_GPU_CX_BUSY_STATUS, val,
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v4 5/7] drm/msm/a6xx: Ensure CX collapse during gpu recovery
  2022-08-17 15:14 ` [PATCH v4 5/7] drm/msm/a6xx: Ensure CX collapse during gpu recovery Akhil P Oommen
@ 2022-08-18  8:54   ` Philipp Zabel
  0 siblings, 0 replies; 9+ messages in thread
From: Philipp Zabel @ 2022-08-18  8:54 UTC (permalink / raw)
  To: Akhil P Oommen
  Cc: freedreno, dri-devel, linux-arm-msm, Rob Clark, Bjorn Andersson,
	Dmitry Baryshkov, Matthias Kaehlcke, Jonathan Marek,
	Jordan Crouse, Douglas Anderson, Abhinav Kumar, Chia-I Wu,
	Daniel Vetter, David Airlie, Sean Paul, linux-kernel

Hi Akhil,

On Wed, Aug 17, 2022 at 08:44:18PM +0530, Akhil P Oommen wrote:
> Because there could be transient votes from other drivers/tz/hyp which
> may keep the cx gdsc enabled, we should poll until cx gdsc collapses.
> We can use the reset framework to poll for cx gdsc collapse from gpucc
> clk driver.
> 
> This feature requires support from the platform's gpucc driver.
> 
> Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> ---
> 
> (no changes since v3)
> 
> Changes in v3:
> - Use reset interface from gpucc driver to poll for cx gdsc collapse
>   https://patchwork.freedesktop.org/series/106860/
> 
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 4 ++++
>  drivers/gpu/drm/msm/msm_gpu.c         | 4 ++++
>  drivers/gpu/drm/msm/msm_gpu.h         | 4 ++++
>  3 files changed, 12 insertions(+)
> 
[...]
> diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
> index 07e55a6..4a57627 100644
> --- a/drivers/gpu/drm/msm/msm_gpu.c
> +++ b/drivers/gpu/drm/msm/msm_gpu.c
[...]
> @@ -903,6 +904,9 @@ int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
>  	if (IS_ERR(gpu->gpu_cx))
>  		gpu->gpu_cx = NULL;
>  
> +	gpu->cx_collapse = devm_reset_control_get_optional(&pdev->dev,
> +			"cx_collapse");

Please use devm_reset_control_get_optional_exclusive() instead.

With that,
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>

regards
Philipp

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-08-18  8:55 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-17 15:14 [PATCH v4 0/7] Improve GPU Recovery Akhil P Oommen
2022-08-17 15:14 ` [PATCH v4 1/7] drm/msm: Remove unnecessary pm_runtime_get/put Akhil P Oommen
2022-08-17 15:14 ` [PATCH v4 2/7] drm/msm: Take single rpm refcount on behalf of all submits Akhil P Oommen
2022-08-17 15:14 ` [PATCH v4 3/7] drm/msm: Correct pm_runtime votes in recover worker Akhil P Oommen
2022-08-17 15:14 ` [PATCH v4 4/7] drm/msm: Fix cx collapse issue during recovery Akhil P Oommen
2022-08-17 15:14 ` [PATCH v4 5/7] drm/msm/a6xx: Ensure CX collapse during gpu recovery Akhil P Oommen
2022-08-18  8:54   ` Philipp Zabel
2022-08-17 15:14 ` [PATCH v4 6/7] drm/msm/a6xx: Improve gpu recovery sequence Akhil P Oommen
2022-08-17 15:14 ` [PATCH v4 7/7] drm/msm/a6xx: Handle GMU prepare-slumber hfi failure Akhil P Oommen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).