linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 1/6] drm/msm: Increase gpu boost interval
@ 2021-11-23 21:17 Akhil P Oommen
  2021-11-23 21:17 ` [PATCH v2 2/6] drm/msm: Fix null ptr dereference in msm_ioctl_gem_submit() Akhil P Oommen
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Akhil P Oommen @ 2021-11-23 21:17 UTC (permalink / raw)
  To: freedreno, dri-devel, linux-arm-msm, Rob Clark
  Cc: Douglas Anderson, Jordan Crouse, Matthias Kaehlcke,
	Jonathan Marek, Daniel Vetter, David Airlie, Sean Paul,
	linux-kernel

Currently, we boost gpu freq after 25ms of inactivity. This regresses
some of the 30 fps usecases where the workload on gpu (at 33ms internval)
is very small which it can finish at the lowest OPP before the deadline.
Lets increase this inactivity threshold to 50ms (same as the current
devfreq interval) to fix this.

Signed-off-by: Akhil P Oommen <akhilpo@codeaurora.org>
---

Changes in v2:
- Added patch (5) & (6) to this stack

 drivers/gpu/drm/msm/msm_gpu_devfreq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/msm_gpu_devfreq.c b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
index 84e98c0..5028f92 100644
--- a/drivers/gpu/drm/msm/msm_gpu_devfreq.c
+++ b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
@@ -167,7 +167,7 @@ void msm_devfreq_active(struct msm_gpu *gpu)
 	 * interval, then we won't meet the threshold of busyness for
 	 * the governor to ramp up the freq.. so give some boost
 	 */
-	if (idle_time > msm_devfreq_profile.polling_ms/2) {
+	if (idle_time > msm_devfreq_profile.polling_ms) {
 		target_freq *= 2;
 	}
 
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation.


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 2/6] drm/msm: Fix null ptr dereference in msm_ioctl_gem_submit()
  2021-11-23 21:17 [PATCH v2 1/6] drm/msm: Increase gpu boost interval Akhil P Oommen
@ 2021-11-23 21:17 ` Akhil P Oommen
  2021-11-23 21:17 ` [PATCH v2 3/6] drm/msm/a6xx: Fix smatch warning for gpu_scid Akhil P Oommen
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Akhil P Oommen @ 2021-11-23 21:17 UTC (permalink / raw)
  To: freedreno, dri-devel, linux-arm-msm, Rob Clark
  Cc: Douglas Anderson, Jordan Crouse, Matthias Kaehlcke,
	Jonathan Marek, Daniel Vetter, David Airlie, Sean Paul,
	linux-kernel

Fix the below null pointer dereference in msm_ioctl_gem_submit():

 26545.260705:   Call trace:
 26545.263223:    kref_put+0x1c/0x60
 26545.266452:    msm_ioctl_gem_submit+0x254/0x744
 26545.270937:    drm_ioctl_kernel+0xa8/0x124
 26545.274976:    drm_ioctl+0x21c/0x33c
 26545.278478:    drm_compat_ioctl+0xdc/0xf0
 26545.282428:    __arm64_compat_sys_ioctl+0xc8/0x100
 26545.287169:    el0_svc_common+0xf8/0x250
 26545.291025:    do_el0_svc_compat+0x28/0x54
 26545.295066:    el0_svc_compat+0x10/0x1c
 26545.298838:    el0_sync_compat_handler+0xa8/0xcc
 26545.303403:    el0_sync_compat+0x188/0x1c0
 26545.307445:   Code: d503201f d503201f 52800028 4b0803e8 (b8680008)
 26545.313703:   ---[ end trace 5c93eb55e485b259 ]---
 26545.318799:   Kernel panic - not syncing: Oops: Fatal exception

Signed-off-by: Akhil P Oommen <akhilpo@codeaurora.org>
---

(no changes since v1)

 drivers/gpu/drm/msm/msm_gem_submit.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
index ac23bbd..88a6cd5 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -780,6 +780,7 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
 		args->nr_cmds);
 	if (IS_ERR(submit)) {
 		ret = PTR_ERR(submit);
+		submit = NULL;
 		goto out_unlock;
 	}
 
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation.


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 3/6] drm/msm/a6xx: Fix smatch warning for gpu_scid
  2021-11-23 21:17 [PATCH v2 1/6] drm/msm: Increase gpu boost interval Akhil P Oommen
  2021-11-23 21:17 ` [PATCH v2 2/6] drm/msm: Fix null ptr dereference in msm_ioctl_gem_submit() Akhil P Oommen
@ 2021-11-23 21:17 ` Akhil P Oommen
  2021-11-23 21:17 ` [PATCH v2 4/6] drm/msm/a6xx: Capture gmu log in devcoredump Akhil P Oommen
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Akhil P Oommen @ 2021-11-23 21:17 UTC (permalink / raw)
  To: freedreno, dri-devel, linux-arm-msm, Rob Clark
  Cc: Douglas Anderson, Jordan Crouse, Matthias Kaehlcke,
	Jonathan Marek, Daniel Vetter, David Airlie, Jordan Crouse,
	Sean Paul, Sharat Masetty, Stephen Boyd, Stéphane Marchesin,
	linux-kernel

Fix the below smatch warning:
	drivers/gpu/drm/msm/adreno/a6xx_gpu.c:1480 a6xx_llc_activate()
	error: uninitialized symbol 'gpu_scid'.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Akhil P Oommen <akhilpo@codeaurora.org>
---

(no changes since v1)

 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index b7f11cd..6c2edce 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1411,17 +1411,24 @@ static void a6xx_llc_activate(struct a6xx_gpu *a6xx_gpu)
 {
 	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
 	struct msm_gpu *gpu = &adreno_gpu->base;
-	u32 gpu_scid, cntl1_regval = 0;
+	u32 cntl1_regval = 0;
 
 	if (IS_ERR(a6xx_gpu->llc_mmio))
 		return;
 
 	if (!llcc_slice_activate(a6xx_gpu->llc_slice)) {
-		gpu_scid = llcc_get_slice_id(a6xx_gpu->llc_slice);
+		u32 gpu_scid = llcc_get_slice_id(a6xx_gpu->llc_slice);
 
 		gpu_scid &= 0x1f;
 		cntl1_regval = (gpu_scid << 0) | (gpu_scid << 5) | (gpu_scid << 10) |
 			       (gpu_scid << 15) | (gpu_scid << 20);
+
+		/* On A660, the SCID programming for UCHE traffic is done in
+		 * A6XX_GBIF_SCACHE_CNTL0[14:10]
+		 */
+		if (adreno_is_a660_family(adreno_gpu))
+			gpu_rmw(gpu, REG_A6XX_GBIF_SCACHE_CNTL0, (0x1f << 10) |
+				(1 << 8), (gpu_scid << 10) | (1 << 8));
 	}
 
 	/*
@@ -1458,13 +1465,6 @@ static void a6xx_llc_activate(struct a6xx_gpu *a6xx_gpu)
 	}
 
 	gpu_rmw(gpu, REG_A6XX_GBIF_SCACHE_CNTL1, GENMASK(24, 0), cntl1_regval);
-
-	/* On A660, the SCID programming for UCHE traffic is done in
-	 * A6XX_GBIF_SCACHE_CNTL0[14:10]
-	 */
-	if (adreno_is_a660_family(adreno_gpu))
-		gpu_rmw(gpu, REG_A6XX_GBIF_SCACHE_CNTL0, (0x1f << 10) |
-			(1 << 8), (gpu_scid << 10) | (1 << 8));
 }
 
 static void a6xx_llc_slices_destroy(struct a6xx_gpu *a6xx_gpu)
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation.


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 4/6] drm/msm/a6xx: Capture gmu log in devcoredump
  2021-11-23 21:17 [PATCH v2 1/6] drm/msm: Increase gpu boost interval Akhil P Oommen
  2021-11-23 21:17 ` [PATCH v2 2/6] drm/msm: Fix null ptr dereference in msm_ioctl_gem_submit() Akhil P Oommen
  2021-11-23 21:17 ` [PATCH v2 3/6] drm/msm/a6xx: Fix smatch warning for gpu_scid Akhil P Oommen
@ 2021-11-23 21:17 ` Akhil P Oommen
  2021-11-23 23:03   ` Bjorn Andersson
  2021-11-23 21:17 ` [PATCH v2 5/6] drm/msm: Add a module param to force coredump Akhil P Oommen
  2021-11-23 21:17 ` [PATCH v2 6/6] drm/msm/a6xx: Add a few gmu buffers to coredump Akhil P Oommen
  4 siblings, 1 reply; 9+ messages in thread
From: Akhil P Oommen @ 2021-11-23 21:17 UTC (permalink / raw)
  To: freedreno, dri-devel, linux-arm-msm, Rob Clark
  Cc: Douglas Anderson, Jordan Crouse, Matthias Kaehlcke,
	Jonathan Marek, AngeloGioacchino Del Regno, Bjorn Andersson,
	Daniel Vetter, David Airlie, Konrad Dybcio, Lee Jones,
	Linux Patches Robot, Sai Prakash Ranjan, Sean Paul,
	Sharat Masetty, Stephen Boyd, Stéphane Marchesin,
	linux-kernel

Capture gmu log in coredump to enhance debugging.

Signed-off-by: Akhil P Oommen <akhilpo@codeaurora.org>
---

Changes in v2:
- Fix kernel test robot's warning about size_t's format specifier

 drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 41 +++++++++++++++++++++++++++++
 drivers/gpu/drm/msm/adreno/adreno_gpu.c     |  2 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.h     |  2 ++
 3 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
index e8f65cd..e6f5571 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
@@ -42,6 +42,8 @@ struct a6xx_gpu_state {
 	struct a6xx_gpu_state_obj *cx_debugbus;
 	int nr_cx_debugbus;
 
+	struct msm_gpu_state_bo *gmu_log;
+
 	struct list_head objs;
 };
 
@@ -800,6 +802,30 @@ static void a6xx_get_gmu_registers(struct msm_gpu *gpu,
 		&a6xx_state->gmu_registers[2], false);
 }
 
+static void a6xx_get_gmu_log(struct msm_gpu *gpu,
+		struct a6xx_gpu_state *a6xx_state)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
+	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
+	struct msm_gpu_state_bo *gmu_log;
+
+	gmu_log = state_kcalloc(a6xx_state,
+		1, sizeof(*a6xx_state->gmu_log));
+	if (!gmu_log)
+		return;
+
+	gmu_log->iova = gmu->log.iova;
+	gmu_log->size = gmu->log.size;
+	gmu_log->data = kvzalloc(gmu_log->size, GFP_KERNEL);
+	if (!gmu_log->data)
+		return;
+
+	memcpy(gmu_log->data, gmu->log.virt, gmu->log.size);
+
+	a6xx_state->gmu_log = gmu_log;
+}
+
 #define A6XX_GBIF_REGLIST_SIZE   1
 static void a6xx_get_registers(struct msm_gpu *gpu,
 		struct a6xx_gpu_state *a6xx_state,
@@ -937,6 +963,8 @@ struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu)
 
 	a6xx_get_gmu_registers(gpu, a6xx_state);
 
+	a6xx_get_gmu_log(gpu, a6xx_state);
+
 	/* If GX isn't on the rest of the data isn't going to be accessible */
 	if (!a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
 		return &a6xx_state->base;
@@ -978,6 +1006,9 @@ static void a6xx_gpu_state_destroy(struct kref *kref)
 	struct a6xx_gpu_state *a6xx_state = container_of(state,
 			struct a6xx_gpu_state, base);
 
+	if (a6xx_state->gmu_log && a6xx_state->gmu_log->data)
+		kvfree(a6xx_state->gmu_log->data);
+
 	list_for_each_entry_safe(obj, tmp, &a6xx_state->objs, node)
 		kfree(obj);
 
@@ -1191,6 +1222,16 @@ void a6xx_show(struct msm_gpu *gpu, struct msm_gpu_state *state,
 
 	adreno_show(gpu, state, p);
 
+	drm_puts(p, "gmu-log:\n");
+	if (a6xx_state->gmu_log) {
+		struct msm_gpu_state_bo *gmu_log = a6xx_state->gmu_log;
+
+		drm_printf(p, "    iova: 0x%016llx\n", gmu_log->iova);
+		drm_printf(p, "    size: %zu\n", gmu_log->size);
+		adreno_show_object(p, &gmu_log->data, gmu_log->size,
+				&gmu_log->encoded);
+	}
+
 	drm_puts(p, "registers:\n");
 	for (i = 0; i < a6xx_state->nr_registers; i++) {
 		struct a6xx_gpu_state_obj *obj = &a6xx_state->registers[i];
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 1539b8e..b43346e 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -638,7 +638,7 @@ static char *adreno_gpu_ascii85_encode(u32 *src, size_t len)
 }
 
 /* len is expected to be in bytes */
-static void adreno_show_object(struct drm_printer *p, void **ptr, int len,
+void adreno_show_object(struct drm_printer *p, void **ptr, int len,
 		bool *encoded)
 {
 	if (!*ptr || !len)
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
index 225c277..6762308 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
@@ -306,6 +306,8 @@ void adreno_gpu_state_destroy(struct msm_gpu_state *state);
 
 int adreno_gpu_state_get(struct msm_gpu *gpu, struct msm_gpu_state *state);
 int adreno_gpu_state_put(struct msm_gpu_state *state);
+void adreno_show_object(struct drm_printer *p, void **ptr, int len,
+		bool *encoded);
 
 /*
  * Common helper function to initialize the default address space for arm-smmu
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation.


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 5/6] drm/msm: Add a module param to force coredump
  2021-11-23 21:17 [PATCH v2 1/6] drm/msm: Increase gpu boost interval Akhil P Oommen
                   ` (2 preceding siblings ...)
  2021-11-23 21:17 ` [PATCH v2 4/6] drm/msm/a6xx: Capture gmu log in devcoredump Akhil P Oommen
@ 2021-11-23 21:17 ` Akhil P Oommen
  2021-11-23 21:17 ` [PATCH v2 6/6] drm/msm/a6xx: Add a few gmu buffers to coredump Akhil P Oommen
  4 siblings, 0 replies; 9+ messages in thread
From: Akhil P Oommen @ 2021-11-23 21:17 UTC (permalink / raw)
  To: freedreno, dri-devel, linux-arm-msm, Rob Clark
  Cc: Douglas Anderson, Jordan Crouse, Matthias Kaehlcke,
	Jonathan Marek, AngeloGioacchino Del Regno, Bjorn Andersson,
	Daniel Vetter, David Airlie, Iskren Chernev, Jordan Crouse,
	Konrad Dybcio, Marijn Suijten, Sean Paul, Sharat Masetty,
	Stephen Boyd, Stéphane Marchesin, linux-kernel

Add a module param "force_gpu_coredump" to force coredump on relatively
harmless gpu hw errors.

Signed-off-by: Akhil P Oommen <akhilpo@codeaurora.org>
---

(no changes since v1)

 drivers/gpu/drm/msm/adreno/a5xx_gpu.c      | 33 ++++++++++++++++++--------
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c      | 38 +++++++++++++++++++++---------
 drivers/gpu/drm/msm/adreno/adreno_device.c |  4 ++++
 3 files changed, 54 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index 5e2750e..1861e9a 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -14,6 +14,7 @@
 #include "a5xx_gpu.h"
 
 extern bool hang_debug;
+extern bool force_gpu_coredump;
 static void a5xx_dump(struct msm_gpu *gpu);
 
 #define GPU_PAS_ID 13
@@ -1237,11 +1238,6 @@ static void a5xx_fault_detect_irq(struct msm_gpu *gpu)
 		gpu_read(gpu, REG_A5XX_CP_IB1_BUFSZ),
 		gpu_read64(gpu, REG_A5XX_CP_IB2_BASE, REG_A5XX_CP_IB2_BASE_HI),
 		gpu_read(gpu, REG_A5XX_CP_IB2_BUFSZ));
-
-	/* Turn off the hangcheck timer to keep it from bothering us */
-	del_timer(&gpu->hangcheck_timer);
-
-	kthread_queue_work(gpu->worker, &gpu->recover_work);
 }
 
 #define RBBM_ERROR_MASK \
@@ -1255,6 +1251,7 @@ static void a5xx_fault_detect_irq(struct msm_gpu *gpu)
 static irqreturn_t a5xx_irq(struct msm_gpu *gpu)
 {
 	u32 status = gpu_read(gpu, REG_A5XX_RBBM_INT_0_STATUS);
+	bool coredump = false;
 
 	/*
 	 * Clear all the interrupts except RBBM_AHB_ERROR - if we clear it
@@ -1264,20 +1261,30 @@ static irqreturn_t a5xx_irq(struct msm_gpu *gpu)
 		status & ~A5XX_RBBM_INT_0_MASK_RBBM_AHB_ERROR);
 
 	/* Pass status to a5xx_rbbm_err_irq because we've already cleared it */
-	if (status & RBBM_ERROR_MASK)
+	if (status & RBBM_ERROR_MASK) {
 		a5xx_rbbm_err_irq(gpu, status);
+		coredump |= force_gpu_coredump;
+	}
 
-	if (status & A5XX_RBBM_INT_0_MASK_CP_HW_ERROR)
+	if (status & A5XX_RBBM_INT_0_MASK_CP_HW_ERROR) {
 		a5xx_cp_err_irq(gpu);
+		coredump |= force_gpu_coredump;
+	}
 
-	if (status & A5XX_RBBM_INT_0_MASK_MISC_HANG_DETECT)
+	if (status & A5XX_RBBM_INT_0_MASK_MISC_HANG_DETECT) {
 		a5xx_fault_detect_irq(gpu);
+		coredump = true;
+	}
 
-	if (status & A5XX_RBBM_INT_0_MASK_UCHE_OOB_ACCESS)
+	if (status & A5XX_RBBM_INT_0_MASK_UCHE_OOB_ACCESS) {
 		a5xx_uche_err_irq(gpu);
+		coredump |= force_gpu_coredump;
+	}
 
-	if (status & A5XX_RBBM_INT_0_MASK_GPMU_VOLTAGE_DROOP)
+	if (status & A5XX_RBBM_INT_0_MASK_GPMU_VOLTAGE_DROOP) {
 		a5xx_gpmu_err_irq(gpu);
+		coredump |= force_gpu_coredump;
+	}
 
 	if (status & A5XX_RBBM_INT_0_MASK_CP_CACHE_FLUSH_TS) {
 		a5xx_preempt_trigger(gpu);
@@ -1287,6 +1294,12 @@ static irqreturn_t a5xx_irq(struct msm_gpu *gpu)
 	if (status & A5XX_RBBM_INT_0_MASK_CP_SW)
 		a5xx_preempt_irq(gpu);
 
+	if (coredump) {
+		/* Turn off the hangcheck timer to keep it from bothering us */
+		del_timer(&gpu->hangcheck_timer);
+		kthread_queue_work(gpu->worker, &gpu->recover_work);
+	}
+
 	return IRQ_HANDLED;
 }
 
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 6c2edce..f96587f 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -15,6 +15,8 @@
 
 #define GPU_PAS_ID 13
 
+extern bool force_gpu_coredump;
+
 static inline bool _a6xx_check_idle(struct msm_gpu *gpu)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
@@ -1354,40 +1356,54 @@ static void a6xx_fault_detect_irq(struct msm_gpu *gpu)
 		gpu_read(gpu, REG_A6XX_CP_IB1_REM_SIZE),
 		gpu_read64(gpu, REG_A6XX_CP_IB2_BASE, REG_A6XX_CP_IB2_BASE_HI),
 		gpu_read(gpu, REG_A6XX_CP_IB2_REM_SIZE));
-
-	/* Turn off the hangcheck timer to keep it from bothering us */
-	del_timer(&gpu->hangcheck_timer);
-
-	kthread_queue_work(gpu->worker, &gpu->recover_work);
 }
 
 static irqreturn_t a6xx_irq(struct msm_gpu *gpu)
 {
 	u32 status = gpu_read(gpu, REG_A6XX_RBBM_INT_0_STATUS);
+	bool coredump = false;
 
 	gpu_write(gpu, REG_A6XX_RBBM_INT_CLEAR_CMD, status);
 
-	if (status & A6XX_RBBM_INT_0_MASK_RBBM_HANG_DETECT)
+	if (status & A6XX_RBBM_INT_0_MASK_RBBM_HANG_DETECT) {
 		a6xx_fault_detect_irq(gpu);
+		coredump = true;
+	}
 
-	if (status & A6XX_RBBM_INT_0_MASK_CP_AHB_ERROR)
+	if (status & A6XX_RBBM_INT_0_MASK_CP_AHB_ERROR) {
 		dev_err_ratelimited(&gpu->pdev->dev, "CP | AHB bus error\n");
+		coredump |= force_gpu_coredump;
+	}
 
-	if (status & A6XX_RBBM_INT_0_MASK_CP_HW_ERROR)
+	if (status & A6XX_RBBM_INT_0_MASK_CP_HW_ERROR) {
 		a6xx_cp_hw_err_irq(gpu);
+		coredump |= force_gpu_coredump;
+	}
 
-	if (status & A6XX_RBBM_INT_0_MASK_RBBM_ATB_ASYNCFIFO_OVERFLOW)
+	if (status & A6XX_RBBM_INT_0_MASK_RBBM_ATB_ASYNCFIFO_OVERFLOW) {
 		dev_err_ratelimited(&gpu->pdev->dev, "RBBM | ATB ASYNC overflow\n");
+		coredump |= force_gpu_coredump;
+	}
 
-	if (status & A6XX_RBBM_INT_0_MASK_RBBM_ATB_BUS_OVERFLOW)
+	if (status & A6XX_RBBM_INT_0_MASK_RBBM_ATB_BUS_OVERFLOW) {
 		dev_err_ratelimited(&gpu->pdev->dev, "RBBM | ATB bus overflow\n");
+		coredump |= force_gpu_coredump;
+	}
 
-	if (status & A6XX_RBBM_INT_0_MASK_UCHE_OOB_ACCESS)
+	if (status & A6XX_RBBM_INT_0_MASK_UCHE_OOB_ACCESS) {
 		dev_err_ratelimited(&gpu->pdev->dev, "UCHE | Out of bounds access\n");
+		coredump |= force_gpu_coredump;
+	}
 
 	if (status & A6XX_RBBM_INT_0_MASK_CP_CACHE_FLUSH_TS)
 		msm_gpu_retire(gpu);
 
+	if (coredump) {
+		/* Turn off the hangcheck timer to keep it from bothering us */
+		del_timer(&gpu->hangcheck_timer);
+		kthread_queue_work(gpu->worker, &gpu->recover_work);
+	}
+
 	return IRQ_HANDLED;
 }
 
diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c
index 2a6ce76..a159cb9 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_device.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
@@ -20,6 +20,10 @@ bool allow_vram_carveout = false;
 MODULE_PARM_DESC(allow_vram_carveout, "Allow using VRAM Carveout, in place of IOMMU");
 module_param_named(allow_vram_carveout, allow_vram_carveout, bool, 0600);
 
+bool force_gpu_coredump = false;
+MODULE_PARM_DESC(snapshot_debugbus, "Force gpu coredump on hw errors which are usually harmless");
+module_param_named(force_gpu_coredump, force_gpu_coredump, bool, 0600);
+
 static const struct adreno_info gpulist[] = {
 	{
 		.rev   = ADRENO_REV(2, 0, 0, 0),
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation.


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 6/6] drm/msm/a6xx: Add a few gmu buffers to coredump
  2021-11-23 21:17 [PATCH v2 1/6] drm/msm: Increase gpu boost interval Akhil P Oommen
                   ` (3 preceding siblings ...)
  2021-11-23 21:17 ` [PATCH v2 5/6] drm/msm: Add a module param to force coredump Akhil P Oommen
@ 2021-11-23 21:17 ` Akhil P Oommen
  2021-11-23 21:36   ` Akhil P Oommen
  4 siblings, 1 reply; 9+ messages in thread
From: Akhil P Oommen @ 2021-11-23 21:17 UTC (permalink / raw)
  To: freedreno, dri-devel, linux-arm-msm, Rob Clark
  Cc: Douglas Anderson, Jordan Crouse, Matthias Kaehlcke,
	Jonathan Marek, Daniel Vetter, David Airlie, Lee Jones,
	Sean Paul, Stephen Boyd, Stéphane Marchesin, linux-kernel

Add a few more gmu buffers to coredump to help debug gmu
issues.

Signed-off-by: Akhil P Oommen <akhilpo@codeaurora.org>
---

(no changes since v1)

 drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 157 +++++++++++++++++++---------
 1 file changed, 108 insertions(+), 49 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
index e6f5571..0cb6551 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
@@ -13,12 +13,22 @@ struct a6xx_gpu_state_obj {
 	u32 *data;
 };
 
+struct a6xx_gmu_state {
+	struct a6xx_gpu_state_obj *registers;
+	int nr_registers;
+
+	struct msm_gpu_state_bo *log_bo;
+
+	struct msm_gpu_state_bo *hfi_bo;
+
+	struct msm_gpu_state_bo *debug_bo;
+
+	struct msm_gpu_state_bo *mem_bin_bo[2];
+};
+
 struct a6xx_gpu_state {
 	struct msm_gpu_state base;
 
-	struct a6xx_gpu_state_obj *gmu_registers;
-	int nr_gmu_registers;
-
 	struct a6xx_gpu_state_obj *registers;
 	int nr_registers;
 
@@ -42,7 +52,7 @@ struct a6xx_gpu_state {
 	struct a6xx_gpu_state_obj *cx_debugbus;
 	int nr_cx_debugbus;
 
-	struct msm_gpu_state_bo *gmu_log;
+	struct a6xx_gmu_state gmu_state;
 
 	struct list_head objs;
 };
@@ -777,20 +787,21 @@ static void a6xx_get_gmu_registers(struct msm_gpu *gpu,
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
+	struct a6xx_gmu_state *gmu_state = &a6xx_state->gmu_state;
 
-	a6xx_state->gmu_registers = state_kcalloc(a6xx_state,
-		2, sizeof(*a6xx_state->gmu_registers));
+	gmu_state->registers = state_kcalloc(a6xx_state,
+		2, sizeof(*gmu_state->registers));
 
-	if (!a6xx_state->gmu_registers)
+	if (!gmu_state->registers)
 		return;
 
-	a6xx_state->nr_gmu_registers = 2;
+	gmu_state->nr_registers = 2;
 
 	/* Get the CX GMU registers from AHB */
 	_a6xx_get_gmu_registers(gpu, a6xx_state, &a6xx_gmu_reglist[0],
-		&a6xx_state->gmu_registers[0], false);
+		&gmu_state->registers[0], false);
 	_a6xx_get_gmu_registers(gpu, a6xx_state, &a6xx_gmu_reglist[1],
-		&a6xx_state->gmu_registers[1], true);
+		&gmu_state->registers[1], true);
 
 	if (!a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
 		return;
@@ -799,31 +810,46 @@ static void a6xx_get_gmu_registers(struct msm_gpu *gpu,
 	gpu_write(gpu, REG_A6XX_GMU_AO_AHB_FENCE_CTRL, 0);
 
 	_a6xx_get_gmu_registers(gpu, a6xx_state, &a6xx_gmu_reglist[2],
-		&a6xx_state->gmu_registers[2], false);
+		&gmu_state->registers[2], false);
+
+	gmu_state->nr_registers = 3;
 }
 
-static void a6xx_get_gmu_log(struct msm_gpu *gpu,
+static void a6xx_get_gmu_bo(struct a6xx_gpu_state *a6xx_state,
+		struct a6xx_gmu_bo *gmu_bo, struct msm_gpu_state_bo **dest_bo)
+{
+	struct msm_gpu_state_bo *bo;
+
+	bo = state_kcalloc(a6xx_state, 1, sizeof(**dest_bo));
+	if (!bo)
+		return;
+
+	bo->iova = gmu_bo->iova;
+	bo->size = gmu_bo->size;
+	bo->data = kvzalloc(bo->size, GFP_KERNEL);
+	if (!bo->data)
+		return;
+
+	memcpy(bo->data, gmu_bo->virt, gmu_bo->size);
+
+	*dest_bo = bo;
+}
+
+static void a6xx_get_gmu_state(struct msm_gpu *gpu,
 		struct a6xx_gpu_state *a6xx_state)
 {
+	struct a6xx_gmu_state *gmu_state = &a6xx_state->gmu_state;
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
 	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
-	struct msm_gpu_state_bo *gmu_log;
 
-	gmu_log = state_kcalloc(a6xx_state,
-		1, sizeof(*a6xx_state->gmu_log));
-	if (!gmu_log)
-		return;
+	a6xx_get_gmu_registers(gpu, a6xx_state);
 
-	gmu_log->iova = gmu->log.iova;
-	gmu_log->size = gmu->log.size;
-	gmu_log->data = kvzalloc(gmu_log->size, GFP_KERNEL);
-	if (!gmu_log->data)
-		return;
+	a6xx_get_gmu_bo(a6xx_state, &gmu->log, &gmu_state->log_bo);
 
-	memcpy(gmu_log->data, gmu->log.virt, gmu->log.size);
+	a6xx_get_gmu_bo(a6xx_state, &gmu->hfi, &gmu_state->hfi_bo);
 
-	a6xx_state->gmu_log = gmu_log;
+	a6xx_get_gmu_bo(a6xx_state, &gmu->debug, &gmu_state->debug_bo);
 }
 
 #define A6XX_GBIF_REGLIST_SIZE   1
@@ -961,9 +987,7 @@ struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu)
 	/* Get the generic state from the adreno core */
 	adreno_gpu_state_get(gpu, &a6xx_state->base);
 
-	a6xx_get_gmu_registers(gpu, a6xx_state);
-
-	a6xx_get_gmu_log(gpu, a6xx_state);
+	a6xx_get_gmu_state(gpu, a6xx_state);
 
 	/* If GX isn't on the rest of the data isn't going to be accessible */
 	if (!a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
@@ -1005,9 +1029,16 @@ static void a6xx_gpu_state_destroy(struct kref *kref)
 			struct msm_gpu_state, ref);
 	struct a6xx_gpu_state *a6xx_state = container_of(state,
 			struct a6xx_gpu_state, base);
+	struct a6xx_gmu_state *gmu_state = &a6xx_state->gmu_state;
 
-	if (a6xx_state->gmu_log && a6xx_state->gmu_log->data)
-		kvfree(a6xx_state->gmu_log->data);
+	if (gmu_state->log_bo && gmu_state->log_bo->data)
+		kvfree(gmu_state->log_bo->data);
+
+	if (gmu_state->hfi_bo && gmu_state->hfi_bo->data)
+		kvfree(gmu_state->hfi_bo->data);
+
+	if (gmu_state->debug_bo && gmu_state->debug_bo->data)
+		kvfree(gmu_state->debug_bo->data);
 
 	list_for_each_entry_safe(obj, tmp, &a6xx_state->objs, node)
 		kfree(obj);
@@ -1210,31 +1241,43 @@ static void a6xx_show_debugbus(struct a6xx_gpu_state *a6xx_state,
 	}
 }
 
-void a6xx_show(struct msm_gpu *gpu, struct msm_gpu_state *state,
-		struct drm_printer *p)
+void a6xx_gmu_show(struct a6xx_gmu_state *gmu_state, struct drm_printer *p)
 {
-	struct a6xx_gpu_state *a6xx_state = container_of(state,
-			struct a6xx_gpu_state, base);
 	int i;
 
-	if (IS_ERR_OR_NULL(state))
-		return;
+	drm_puts(p, "gmu-log:\n");
+	if (gmu_state->log_bo) {
+		struct msm_gpu_state_bo *log_bo = gmu_state->log_bo;
 
-	adreno_show(gpu, state, p);
+		drm_printf(p, "    iova: 0x%016llx\n", log_bo->iova);
+		drm_printf(p, "    size: %zu\n", log_bo->size);
+		adreno_show_object(p, &log_bo->data, log_bo->size,
+				&log_bo->encoded);
+	}
 
-	drm_puts(p, "gmu-log:\n");
-	if (a6xx_state->gmu_log) {
-		struct msm_gpu_state_bo *gmu_log = a6xx_state->gmu_log;
+	drm_puts(p, "gmu-hfi:\n");
+	if (gmu_state->hfi_bo) {
+		struct msm_gpu_state_bo *hfi_bo = gmu_state->hfi_bo;
 
-		drm_printf(p, "    iova: 0x%016llx\n", gmu_log->iova);
-		drm_printf(p, "    size: %zu\n", gmu_log->size);
-		adreno_show_object(p, &gmu_log->data, gmu_log->size,
-				&gmu_log->encoded);
+		drm_printf(p, "    iova: 0x%016llx\n", hfi_bo->iova);
+		drm_printf(p, "    size: %zu\n", hfi_bo->size);
+		adreno_show_object(p, &hfi_bo->data, hfi_bo->size,
+				&hfi_bo->encoded);
 	}
 
-	drm_puts(p, "registers:\n");
-	for (i = 0; i < a6xx_state->nr_registers; i++) {
-		struct a6xx_gpu_state_obj *obj = &a6xx_state->registers[i];
+	drm_puts(p, "gmu-debug:\n");
+	if (gmu_state->debug_bo) {
+		struct msm_gpu_state_bo *debug_bo = gmu_state->debug_bo;
+
+		drm_printf(p, "    iova: 0x%016llx\n", debug_bo->iova);
+		drm_printf(p, "    size: %zu\n", debug_bo->size);
+		adreno_show_object(p, &debug_bo->data, debug_bo->size,
+				&debug_bo->encoded);
+	}
+
+	drm_puts(p, "registers-gmu:\n");
+	for (i = 0; i < gmu_state->nr_registers; i++) {
+		struct a6xx_gpu_state_obj *obj = &gmu_state->registers[i];
 		const struct a6xx_registers *regs = obj->handle;
 
 		if (!obj->handle)
@@ -1242,10 +1285,26 @@ void a6xx_show(struct msm_gpu *gpu, struct msm_gpu_state *state,
 
 		a6xx_show_registers(regs->registers, obj->data, regs->count, p);
 	}
+}
 
-	drm_puts(p, "registers-gmu:\n");
-	for (i = 0; i < a6xx_state->nr_gmu_registers; i++) {
-		struct a6xx_gpu_state_obj *obj = &a6xx_state->gmu_registers[i];
+
+void a6xx_show(struct msm_gpu *gpu, struct msm_gpu_state *state,
+		struct drm_printer *p)
+{
+	struct a6xx_gpu_state *a6xx_state = container_of(state,
+			struct a6xx_gpu_state, base);
+	int i;
+
+	if (IS_ERR_OR_NULL(state))
+		return;
+
+	adreno_show(gpu, state, p);
+
+	a6xx_gmu_show(&a6xx_state->gmu_state, p);
+
+	drm_puts(p, "registers:\n");
+	for (i = 0; i < a6xx_state->nr_registers; i++) {
+		struct a6xx_gpu_state_obj *obj = &a6xx_state->registers[i];
 		const struct a6xx_registers *regs = obj->handle;
 
 		if (!obj->handle)
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation.


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 6/6] drm/msm/a6xx: Add a few gmu buffers to coredump
  2021-11-23 21:17 ` [PATCH v2 6/6] drm/msm/a6xx: Add a few gmu buffers to coredump Akhil P Oommen
@ 2021-11-23 21:36   ` Akhil P Oommen
  0 siblings, 0 replies; 9+ messages in thread
From: Akhil P Oommen @ 2021-11-23 21:36 UTC (permalink / raw)
  To: freedreno, dri-devel, linux-arm-msm, Rob Clark
  Cc: Jonathan Marek, David Airlie, Sean Paul, Douglas Anderson,
	Jordan Crouse, Matthias Kaehlcke, Stéphane Marchesin,
	Stephen Boyd, Lee Jones, linux-kernel

On 11/24/2021 2:47 AM, Akhil P Oommen wrote:
> Add a few more gmu buffers to coredump to help debug gmu
> issues.
> 
> Signed-off-by: Akhil P Oommen <akhilpo@codeaurora.org>
> ---
> 
> (no changes since v1)
> 
>   drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 157 +++++++++++++++++++---------
>   1 file changed, 108 insertions(+), 49 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> index e6f5571..0cb6551 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> @@ -13,12 +13,22 @@ struct a6xx_gpu_state_obj {
>   	u32 *data;
>   };
>   
> +struct a6xx_gmu_state {
> +	struct a6xx_gpu_state_obj *registers;
> +	int nr_registers;
> +
> +	struct msm_gpu_state_bo *log_bo;
> +
> +	struct msm_gpu_state_bo *hfi_bo;
> +
> +	struct msm_gpu_state_bo *debug_bo;
> +
> +	struct msm_gpu_state_bo *mem_bin_bo[2];
> +};
> +
>   struct a6xx_gpu_state {
>   	struct msm_gpu_state base;
>   
> -	struct a6xx_gpu_state_obj *gmu_registers;
> -	int nr_gmu_registers;
> -
>   	struct a6xx_gpu_state_obj *registers;
>   	int nr_registers;
>   
> @@ -42,7 +52,7 @@ struct a6xx_gpu_state {
>   	struct a6xx_gpu_state_obj *cx_debugbus;
>   	int nr_cx_debugbus;
>   
> -	struct msm_gpu_state_bo *gmu_log;
> +	struct a6xx_gmu_state gmu_state;
>   
>   	struct list_head objs;
>   };
> @@ -777,20 +787,21 @@ static void a6xx_get_gmu_registers(struct msm_gpu *gpu,
>   {
>   	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>   	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> +	struct a6xx_gmu_state *gmu_state = &a6xx_state->gmu_state;
>   
> -	a6xx_state->gmu_registers = state_kcalloc(a6xx_state,
> -		2, sizeof(*a6xx_state->gmu_registers));
> +	gmu_state->registers = state_kcalloc(a6xx_state,
> +		2, sizeof(*gmu_state->registers));
>   
> -	if (!a6xx_state->gmu_registers)
> +	if (!gmu_state->registers)
>   		return;
>   
> -	a6xx_state->nr_gmu_registers = 2;
> +	gmu_state->nr_registers = 2;
>   
>   	/* Get the CX GMU registers from AHB */
>   	_a6xx_get_gmu_registers(gpu, a6xx_state, &a6xx_gmu_reglist[0],
> -		&a6xx_state->gmu_registers[0], false);
> +		&gmu_state->registers[0], false);
>   	_a6xx_get_gmu_registers(gpu, a6xx_state, &a6xx_gmu_reglist[1],
> -		&a6xx_state->gmu_registers[1], true);
> +		&gmu_state->registers[1], true);
>   
>   	if (!a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
>   		return;
> @@ -799,31 +810,46 @@ static void a6xx_get_gmu_registers(struct msm_gpu *gpu,
>   	gpu_write(gpu, REG_A6XX_GMU_AO_AHB_FENCE_CTRL, 0);
>   
>   	_a6xx_get_gmu_registers(gpu, a6xx_state, &a6xx_gmu_reglist[2],
> -		&a6xx_state->gmu_registers[2], false);
> +		&gmu_state->registers[2], false);
> +
> +	gmu_state->nr_registers = 3;

This is not required after rebasing on top of:
https://lore.kernel.org/all/20211103153049.1.Idfa574ccb529d17b69db3a1852e49b580132035c@changeid/

-Akhil.

>   }
>   
> -static void a6xx_get_gmu_log(struct msm_gpu *gpu,
> +static void a6xx_get_gmu_bo(struct a6xx_gpu_state *a6xx_state,
> +		struct a6xx_gmu_bo *gmu_bo, struct msm_gpu_state_bo **dest_bo)
> +{
> +	struct msm_gpu_state_bo *bo;
> +
> +	bo = state_kcalloc(a6xx_state, 1, sizeof(**dest_bo));
> +	if (!bo)
> +		return;
> +
> +	bo->iova = gmu_bo->iova;
> +	bo->size = gmu_bo->size;
> +	bo->data = kvzalloc(bo->size, GFP_KERNEL);
> +	if (!bo->data)
> +		return;
> +
> +	memcpy(bo->data, gmu_bo->virt, gmu_bo->size);
> +
> +	*dest_bo = bo;
> +}
> +
> +static void a6xx_get_gmu_state(struct msm_gpu *gpu,
>   		struct a6xx_gpu_state *a6xx_state)
>   {
> +	struct a6xx_gmu_state *gmu_state = &a6xx_state->gmu_state;
>   	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>   	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>   	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> -	struct msm_gpu_state_bo *gmu_log;
>   
> -	gmu_log = state_kcalloc(a6xx_state,
> -		1, sizeof(*a6xx_state->gmu_log));
> -	if (!gmu_log)
> -		return;
> +	a6xx_get_gmu_registers(gpu, a6xx_state);
>   
> -	gmu_log->iova = gmu->log.iova;
> -	gmu_log->size = gmu->log.size;
> -	gmu_log->data = kvzalloc(gmu_log->size, GFP_KERNEL);
> -	if (!gmu_log->data)
> -		return;
> +	a6xx_get_gmu_bo(a6xx_state, &gmu->log, &gmu_state->log_bo);
>   
> -	memcpy(gmu_log->data, gmu->log.virt, gmu->log.size);
> +	a6xx_get_gmu_bo(a6xx_state, &gmu->hfi, &gmu_state->hfi_bo);
>   
> -	a6xx_state->gmu_log = gmu_log;
> +	a6xx_get_gmu_bo(a6xx_state, &gmu->debug, &gmu_state->debug_bo);
>   }
>   
>   #define A6XX_GBIF_REGLIST_SIZE   1
> @@ -961,9 +987,7 @@ struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu)
>   	/* Get the generic state from the adreno core */
>   	adreno_gpu_state_get(gpu, &a6xx_state->base);
>   
> -	a6xx_get_gmu_registers(gpu, a6xx_state);
> -
> -	a6xx_get_gmu_log(gpu, a6xx_state);
> +	a6xx_get_gmu_state(gpu, a6xx_state);
>   
>   	/* If GX isn't on the rest of the data isn't going to be accessible */
>   	if (!a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
> @@ -1005,9 +1029,16 @@ static void a6xx_gpu_state_destroy(struct kref *kref)
>   			struct msm_gpu_state, ref);
>   	struct a6xx_gpu_state *a6xx_state = container_of(state,
>   			struct a6xx_gpu_state, base);
> +	struct a6xx_gmu_state *gmu_state = &a6xx_state->gmu_state;
>   
> -	if (a6xx_state->gmu_log && a6xx_state->gmu_log->data)
> -		kvfree(a6xx_state->gmu_log->data);
> +	if (gmu_state->log_bo && gmu_state->log_bo->data)
> +		kvfree(gmu_state->log_bo->data);
> +
> +	if (gmu_state->hfi_bo && gmu_state->hfi_bo->data)
> +		kvfree(gmu_state->hfi_bo->data);
> +
> +	if (gmu_state->debug_bo && gmu_state->debug_bo->data)
> +		kvfree(gmu_state->debug_bo->data);
>   
>   	list_for_each_entry_safe(obj, tmp, &a6xx_state->objs, node)
>   		kfree(obj);
> @@ -1210,31 +1241,43 @@ static void a6xx_show_debugbus(struct a6xx_gpu_state *a6xx_state,
>   	}
>   }
>   
> -void a6xx_show(struct msm_gpu *gpu, struct msm_gpu_state *state,
> -		struct drm_printer *p)
> +void a6xx_gmu_show(struct a6xx_gmu_state *gmu_state, struct drm_printer *p)
>   {
> -	struct a6xx_gpu_state *a6xx_state = container_of(state,
> -			struct a6xx_gpu_state, base);
>   	int i;
>   
> -	if (IS_ERR_OR_NULL(state))
> -		return;
> +	drm_puts(p, "gmu-log:\n");
> +	if (gmu_state->log_bo) {
> +		struct msm_gpu_state_bo *log_bo = gmu_state->log_bo;
>   
> -	adreno_show(gpu, state, p);
> +		drm_printf(p, "    iova: 0x%016llx\n", log_bo->iova);
> +		drm_printf(p, "    size: %zu\n", log_bo->size);
> +		adreno_show_object(p, &log_bo->data, log_bo->size,
> +				&log_bo->encoded);
> +	}
>   
> -	drm_puts(p, "gmu-log:\n");
> -	if (a6xx_state->gmu_log) {
> -		struct msm_gpu_state_bo *gmu_log = a6xx_state->gmu_log;
> +	drm_puts(p, "gmu-hfi:\n");
> +	if (gmu_state->hfi_bo) {
> +		struct msm_gpu_state_bo *hfi_bo = gmu_state->hfi_bo;
>   
> -		drm_printf(p, "    iova: 0x%016llx\n", gmu_log->iova);
> -		drm_printf(p, "    size: %zu\n", gmu_log->size);
> -		adreno_show_object(p, &gmu_log->data, gmu_log->size,
> -				&gmu_log->encoded);
> +		drm_printf(p, "    iova: 0x%016llx\n", hfi_bo->iova);
> +		drm_printf(p, "    size: %zu\n", hfi_bo->size);
> +		adreno_show_object(p, &hfi_bo->data, hfi_bo->size,
> +				&hfi_bo->encoded);
>   	}
>   
> -	drm_puts(p, "registers:\n");
> -	for (i = 0; i < a6xx_state->nr_registers; i++) {
> -		struct a6xx_gpu_state_obj *obj = &a6xx_state->registers[i];
> +	drm_puts(p, "gmu-debug:\n");
> +	if (gmu_state->debug_bo) {
> +		struct msm_gpu_state_bo *debug_bo = gmu_state->debug_bo;
> +
> +		drm_printf(p, "    iova: 0x%016llx\n", debug_bo->iova);
> +		drm_printf(p, "    size: %zu\n", debug_bo->size);
> +		adreno_show_object(p, &debug_bo->data, debug_bo->size,
> +				&debug_bo->encoded);
> +	}
> +
> +	drm_puts(p, "registers-gmu:\n");
> +	for (i = 0; i < gmu_state->nr_registers; i++) {
> +		struct a6xx_gpu_state_obj *obj = &gmu_state->registers[i];
>   		const struct a6xx_registers *regs = obj->handle;
>   
>   		if (!obj->handle)
> @@ -1242,10 +1285,26 @@ void a6xx_show(struct msm_gpu *gpu, struct msm_gpu_state *state,
>   
>   		a6xx_show_registers(regs->registers, obj->data, regs->count, p);
>   	}
> +}
>   
> -	drm_puts(p, "registers-gmu:\n");
> -	for (i = 0; i < a6xx_state->nr_gmu_registers; i++) {
> -		struct a6xx_gpu_state_obj *obj = &a6xx_state->gmu_registers[i];
> +
> +void a6xx_show(struct msm_gpu *gpu, struct msm_gpu_state *state,
> +		struct drm_printer *p)
> +{
> +	struct a6xx_gpu_state *a6xx_state = container_of(state,
> +			struct a6xx_gpu_state, base);
> +	int i;
> +
> +	if (IS_ERR_OR_NULL(state))
> +		return;
> +
> +	adreno_show(gpu, state, p);
> +
> +	a6xx_gmu_show(&a6xx_state->gmu_state, p);
> +
> +	drm_puts(p, "registers:\n");
> +	for (i = 0; i < a6xx_state->nr_registers; i++) {
> +		struct a6xx_gpu_state_obj *obj = &a6xx_state->registers[i];
>   		const struct a6xx_registers *regs = obj->handle;
>   
>   		if (!obj->handle)
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 4/6] drm/msm/a6xx: Capture gmu log in devcoredump
  2021-11-23 21:17 ` [PATCH v2 4/6] drm/msm/a6xx: Capture gmu log in devcoredump Akhil P Oommen
@ 2021-11-23 23:03   ` Bjorn Andersson
  2021-11-24 10:20     ` Akhil P Oommen
  0 siblings, 1 reply; 9+ messages in thread
From: Bjorn Andersson @ 2021-11-23 23:03 UTC (permalink / raw)
  To: Akhil P Oommen
  Cc: freedreno, dri-devel, linux-arm-msm, Rob Clark, Douglas Anderson,
	Jordan Crouse, Matthias Kaehlcke, Jonathan Marek,
	AngeloGioacchino Del Regno, Daniel Vetter, David Airlie,
	Konrad Dybcio, Lee Jones, Linux Patches Robot,
	Sai Prakash Ranjan, Sean Paul, Sharat Masetty, Stephen Boyd,
	St?phane Marchesin, linux-kernel

On Tue 23 Nov 13:17 PST 2021, Akhil P Oommen wrote:

> Capture gmu log in coredump to enhance debugging.
> 
> Signed-off-by: Akhil P Oommen <akhilpo@codeaurora.org>
> ---
> 
> Changes in v2:
> - Fix kernel test robot's warning about size_t's format specifier
> 
>  drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 41 +++++++++++++++++++++++++++++
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c     |  2 +-
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h     |  2 ++
>  3 files changed, 44 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> index e8f65cd..e6f5571 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> @@ -42,6 +42,8 @@ struct a6xx_gpu_state {
>  	struct a6xx_gpu_state_obj *cx_debugbus;
>  	int nr_cx_debugbus;
>  
> +	struct msm_gpu_state_bo *gmu_log;
> +
>  	struct list_head objs;
>  };
>  
> @@ -800,6 +802,30 @@ static void a6xx_get_gmu_registers(struct msm_gpu *gpu,
>  		&a6xx_state->gmu_registers[2], false);
>  }
>  
> +static void a6xx_get_gmu_log(struct msm_gpu *gpu,
> +		struct a6xx_gpu_state *a6xx_state)
> +{
> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> +	struct msm_gpu_state_bo *gmu_log;
> +
> +	gmu_log = state_kcalloc(a6xx_state,
> +		1, sizeof(*a6xx_state->gmu_log));

This line isn't even 80 chars long, so I see no reason to wrap it and if
you ran checkpatch --strict on this patch it would complain about how
you indent that second line as well.

It would also look better with sizeof(*gmu_log), even though they should
have the same size today...

> +	if (!gmu_log)
> +		return;
> +
> +	gmu_log->iova = gmu->log.iova;
> +	gmu_log->size = gmu->log.size;
> +	gmu_log->data = kvzalloc(gmu_log->size, GFP_KERNEL);
> +	if (!gmu_log->data)
> +		return;
> +
> +	memcpy(gmu_log->data, gmu->log.virt, gmu->log.size);
> +
> +	a6xx_state->gmu_log = gmu_log;
> +}
> +
>  #define A6XX_GBIF_REGLIST_SIZE   1
>  static void a6xx_get_registers(struct msm_gpu *gpu,
>  		struct a6xx_gpu_state *a6xx_state,
> @@ -937,6 +963,8 @@ struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu)
>  
>  	a6xx_get_gmu_registers(gpu, a6xx_state);
>  
> +	a6xx_get_gmu_log(gpu, a6xx_state);
> +
>  	/* If GX isn't on the rest of the data isn't going to be accessible */
>  	if (!a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
>  		return &a6xx_state->base;
> @@ -978,6 +1006,9 @@ static void a6xx_gpu_state_destroy(struct kref *kref)
>  	struct a6xx_gpu_state *a6xx_state = container_of(state,
>  			struct a6xx_gpu_state, base);
>  
> +	if (a6xx_state->gmu_log && a6xx_state->gmu_log->data)
> +		kvfree(a6xx_state->gmu_log->data);
> +
>  	list_for_each_entry_safe(obj, tmp, &a6xx_state->objs, node)
>  		kfree(obj);
>  
> @@ -1191,6 +1222,16 @@ void a6xx_show(struct msm_gpu *gpu, struct msm_gpu_state *state,
>  
>  	adreno_show(gpu, state, p);
>  
> +	drm_puts(p, "gmu-log:\n");
> +	if (a6xx_state->gmu_log) {
> +		struct msm_gpu_state_bo *gmu_log = a6xx_state->gmu_log;
> +
> +		drm_printf(p, "    iova: 0x%016llx\n", gmu_log->iova);
> +		drm_printf(p, "    size: %zu\n", gmu_log->size);
> +		adreno_show_object(p, &gmu_log->data, gmu_log->size,
> +				&gmu_log->encoded);
> +	}
> +
>  	drm_puts(p, "registers:\n");
>  	for (i = 0; i < a6xx_state->nr_registers; i++) {
>  		struct a6xx_gpu_state_obj *obj = &a6xx_state->registers[i];
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> index 1539b8e..b43346e 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> @@ -638,7 +638,7 @@ static char *adreno_gpu_ascii85_encode(u32 *src, size_t len)
>  }
>  
>  /* len is expected to be in bytes */
> -static void adreno_show_object(struct drm_printer *p, void **ptr, int len,
> +void adreno_show_object(struct drm_printer *p, void **ptr, int len,
>  		bool *encoded)

Please indent your broken lines by the ( on the line before.

Regards,
Bjorn

>  {
>  	if (!*ptr || !len)
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> index 225c277..6762308 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> @@ -306,6 +306,8 @@ void adreno_gpu_state_destroy(struct msm_gpu_state *state);
>  
>  int adreno_gpu_state_get(struct msm_gpu *gpu, struct msm_gpu_state *state);
>  int adreno_gpu_state_put(struct msm_gpu_state *state);
> +void adreno_show_object(struct drm_printer *p, void **ptr, int len,
> +		bool *encoded);
>  
>  /*
>   * Common helper function to initialize the default address space for arm-smmu
> -- 
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
> of Code Aurora Forum, hosted by The Linux Foundation.
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 4/6] drm/msm/a6xx: Capture gmu log in devcoredump
  2021-11-23 23:03   ` Bjorn Andersson
@ 2021-11-24 10:20     ` Akhil P Oommen
  0 siblings, 0 replies; 9+ messages in thread
From: Akhil P Oommen @ 2021-11-24 10:20 UTC (permalink / raw)
  To: Bjorn Andersson
  Cc: freedreno, dri-devel, linux-arm-msm, Rob Clark, Douglas Anderson,
	Jordan Crouse, Matthias Kaehlcke, Jonathan Marek,
	AngeloGioacchino Del Regno, Daniel Vetter, David Airlie,
	Konrad Dybcio, Lee Jones, Linux Patches Robot,
	Sai Prakash Ranjan, Sean Paul, Sharat Masetty, Stephen Boyd,
	St?phane Marchesin, linux-kernel

On 11/24/2021 4:33 AM, Bjorn Andersson wrote:
> On Tue 23 Nov 13:17 PST 2021, Akhil P Oommen wrote:
> 
>> Capture gmu log in coredump to enhance debugging.
>>
>> Signed-off-by: Akhil P Oommen <akhilpo@codeaurora.org>
>> ---
>>
>> Changes in v2:
>> - Fix kernel test robot's warning about size_t's format specifier
>>
>>   drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 41 +++++++++++++++++++++++++++++
>>   drivers/gpu/drm/msm/adreno/adreno_gpu.c     |  2 +-
>>   drivers/gpu/drm/msm/adreno/adreno_gpu.h     |  2 ++
>>   3 files changed, 44 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
>> index e8f65cd..e6f5571 100644
>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
>> @@ -42,6 +42,8 @@ struct a6xx_gpu_state {
>>   	struct a6xx_gpu_state_obj *cx_debugbus;
>>   	int nr_cx_debugbus;
>>   
>> +	struct msm_gpu_state_bo *gmu_log;
>> +
>>   	struct list_head objs;
>>   };
>>   
>> @@ -800,6 +802,30 @@ static void a6xx_get_gmu_registers(struct msm_gpu *gpu,
>>   		&a6xx_state->gmu_registers[2], false);
>>   }
>>   
>> +static void a6xx_get_gmu_log(struct msm_gpu *gpu,
>> +		struct a6xx_gpu_state *a6xx_state)
>> +{
>> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>> +	struct msm_gpu_state_bo *gmu_log;
>> +
>> +	gmu_log = state_kcalloc(a6xx_state,
>> +		1, sizeof(*a6xx_state->gmu_log));
> 
> This line isn't even 80 chars long, so I see no reason to wrap it and if
> you ran checkpatch --strict on this patch it would complain about how
> you indent that second line as well.
> 
> It would also look better with sizeof(*gmu_log), even though they should
> have the same size today...
> 
>> +	if (!gmu_log)
>> +		return;
>> +
>> +	gmu_log->iova = gmu->log.iova;
>> +	gmu_log->size = gmu->log.size;
>> +	gmu_log->data = kvzalloc(gmu_log->size, GFP_KERNEL);
>> +	if (!gmu_log->data)
>> +		return;
>> +
>> +	memcpy(gmu_log->data, gmu->log.virt, gmu->log.size);
>> +
>> +	a6xx_state->gmu_log = gmu_log;
>> +}
>> +
>>   #define A6XX_GBIF_REGLIST_SIZE   1
>>   static void a6xx_get_registers(struct msm_gpu *gpu,
>>   		struct a6xx_gpu_state *a6xx_state,
>> @@ -937,6 +963,8 @@ struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu)
>>   
>>   	a6xx_get_gmu_registers(gpu, a6xx_state);
>>   
>> +	a6xx_get_gmu_log(gpu, a6xx_state);
>> +
>>   	/* If GX isn't on the rest of the data isn't going to be accessible */
>>   	if (!a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
>>   		return &a6xx_state->base;
>> @@ -978,6 +1006,9 @@ static void a6xx_gpu_state_destroy(struct kref *kref)
>>   	struct a6xx_gpu_state *a6xx_state = container_of(state,
>>   			struct a6xx_gpu_state, base);
>>   
>> +	if (a6xx_state->gmu_log && a6xx_state->gmu_log->data)
>> +		kvfree(a6xx_state->gmu_log->data);
>> +
>>   	list_for_each_entry_safe(obj, tmp, &a6xx_state->objs, node)
>>   		kfree(obj);
>>   
>> @@ -1191,6 +1222,16 @@ void a6xx_show(struct msm_gpu *gpu, struct msm_gpu_state *state,
>>   
>>   	adreno_show(gpu, state, p);
>>   
>> +	drm_puts(p, "gmu-log:\n");
>> +	if (a6xx_state->gmu_log) {
>> +		struct msm_gpu_state_bo *gmu_log = a6xx_state->gmu_log;
>> +
>> +		drm_printf(p, "    iova: 0x%016llx\n", gmu_log->iova);
>> +		drm_printf(p, "    size: %zu\n", gmu_log->size);
>> +		adreno_show_object(p, &gmu_log->data, gmu_log->size,
>> +				&gmu_log->encoded);
>> +	}
>> +
>>   	drm_puts(p, "registers:\n");
>>   	for (i = 0; i < a6xx_state->nr_registers; i++) {
>>   		struct a6xx_gpu_state_obj *obj = &a6xx_state->registers[i];
>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>> index 1539b8e..b43346e 100644
>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>> @@ -638,7 +638,7 @@ static char *adreno_gpu_ascii85_encode(u32 *src, size_t len)
>>   }
>>   
>>   /* len is expected to be in bytes */
>> -static void adreno_show_object(struct drm_printer *p, void **ptr, int len,
>> +void adreno_show_object(struct drm_printer *p, void **ptr, int len,
>>   		bool *encoded)
> 
> Please indent your broken lines by the ( on the line before.

Just curious, is this a common coding style in kernel?

-Akhil.

> 
> Regards,
> Bjorn
> 
>>   {
>>   	if (!*ptr || !len)
>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
>> index 225c277..6762308 100644
>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
>> @@ -306,6 +306,8 @@ void adreno_gpu_state_destroy(struct msm_gpu_state *state);
>>   
>>   int adreno_gpu_state_get(struct msm_gpu *gpu, struct msm_gpu_state *state);
>>   int adreno_gpu_state_put(struct msm_gpu_state *state);
>> +void adreno_show_object(struct drm_printer *p, void **ptr, int len,
>> +		bool *encoded);
>>   
>>   /*
>>    * Common helper function to initialize the default address space for arm-smmu
>> -- 
>> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
>> of Code Aurora Forum, hosted by The Linux Foundation.
>>


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-11-24 10:20 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-23 21:17 [PATCH v2 1/6] drm/msm: Increase gpu boost interval Akhil P Oommen
2021-11-23 21:17 ` [PATCH v2 2/6] drm/msm: Fix null ptr dereference in msm_ioctl_gem_submit() Akhil P Oommen
2021-11-23 21:17 ` [PATCH v2 3/6] drm/msm/a6xx: Fix smatch warning for gpu_scid Akhil P Oommen
2021-11-23 21:17 ` [PATCH v2 4/6] drm/msm/a6xx: Capture gmu log in devcoredump Akhil P Oommen
2021-11-23 23:03   ` Bjorn Andersson
2021-11-24 10:20     ` Akhil P Oommen
2021-11-23 21:17 ` [PATCH v2 5/6] drm/msm: Add a module param to force coredump Akhil P Oommen
2021-11-23 21:17 ` [PATCH v2 6/6] drm/msm/a6xx: Add a few gmu buffers to coredump Akhil P Oommen
2021-11-23 21:36   ` Akhil P Oommen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).