[PATCH v3 0/8] Improve GPU Recovery

From: Akhil P Oommen @ 2022-07-30  9:40 UTC
  To: freedreno, dri-devel, linux-arm-msm, Rob Clark, Bjorn Andersson
  Cc: Jordan Crouse, Jonathan Marek, Douglas Anderson,
	Matthias Kaehlcke, Akhil P Oommen, Abhinav Kumar,
	AngeloGioacchino Del Regno, Chia-I Wu, Dan Carpenter,
	Daniel Vetter, David Airlie, Dmitry Baryshkov, Nathan Chancellor,
	Philipp Zabel, Sean Paul, Stephen Boyd, Vladimir Lypak,
	Wang Qing, linux-kernel

Recently, I debugged a few device crashes that occurred during recovery
after a hangcheck timeout. It looks like there are a few things we can
do to improve our chances of a successful GPU recovery.

The first is to ensure that the CX GDSC collapses, which clears the
internal state in the GPU's CX domain. The first 5 patches try to handle
this.
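
As a rough illustration of the "single rpm refcount on behalf of all
submits" idea, here is a small userspace model. It is only a sketch and
not the driver code: struct model_gpu, model_submit(), model_retire()
and the pm_refs counter are hypothetical names, with pm_refs standing in
for the runtime-PM usage count. The point it shows is that only the
first active submit takes the power reference and only the last one to
retire drops it, so the submit path never holds more than one reference.

```c
#include <assert.h>

/* Hypothetical userspace model; none of these names come from the series. */
struct model_gpu {
	int active_submits;	/* submits queued but not yet retired */
	int pm_refs;		/* stand-in for the runtime-PM usage count */
};

/* Queue a submit: only the 0 -> 1 transition takes the power reference. */
static void model_submit(struct model_gpu *gpu)
{
	if (gpu->active_submits++ == 0)
		gpu->pm_refs++;
}

/* Retire a submit: only the 1 -> 0 transition drops the power reference. */
static void model_retire(struct model_gpu *gpu)
{
	assert(gpu->active_submits > 0);
	if (--gpu->active_submits == 0)
		gpu->pm_refs--;
}
```

With this scheme, any path that has to account for the power votes held
by in-flight submits has a single reference to reason about, however
many submits are active.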

The rest of the patches ensure that a few internal blocks such as CP,
GMU and GBIF are halted properly before proceeding to a snapshot
followed by recovery. They also handle the 'prepare slumber' HFI failure
correctly. These are A6xx-specific improvements.

This series is rebased on top of [1], which is based on Linus's master
branch.

[1] https://patchwork.freedesktop.org/series/106860/

Changes in v3:
- Use reset interface from gpucc driver to poll for cx gdsc collapse
  https://patchwork.freedesktop.org/series/106860/
- Use single pm refcount for all active submits

Changes in v2:
- Rebased on msm-next tip

Akhil P Oommen (8):
  drm/msm: Remove unnecessary pm_runtime_get/put
  drm/msm: Take single rpm refcount on behalf of all submits
  drm/msm: Correct pm_runtime votes in recover worker
  drm/msm: Fix cx collapse issue during recovery
  drm/msm/a6xx: Ensure CX collapse during gpu recovery
  drm/msm/adreno: Remove a WARN() during runtime_suspend
  drm/msm/a6xx: Improve gpu recovery sequence
  drm/msm/a6xx: Handle GMU prepare-slumber hfi failure

 drivers/gpu/drm/msm/adreno/a6xx.xml.h      |  4 ++
 drivers/gpu/drm/msm/adreno/a6xx_gmu.c      | 83 +++++++++++++++++++-----------
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c      | 35 +++++++++++--
 drivers/gpu/drm/msm/adreno/adreno_device.c |  7 ---
 drivers/gpu/drm/msm/msm_gpu.c              | 21 +++++---
 drivers/gpu/drm/msm/msm_gpu.h              |  4 ++
 drivers/gpu/drm/msm/msm_ringbuffer.c       |  4 --
 7 files changed, 106 insertions(+), 52 deletions(-)

-- 
2.7.4
