linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Erico Nunes <nunes.erico@gmail.com>
To: Qiang Yu <yuq825@gmail.com>,
	anarsoul@gmail.com, christian.koenig@amd.com,
	dri-devel@lists.freedesktop.org, lima@lists.freedesktop.org
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
	Maxime Ripard <mripard@kernel.org>,
	Thomas Zimmermann <tzimmermann@suse.de>,
	David Airlie <airlied@gmail.com>, Daniel Vetter <daniel@ffwll.ch>,
	Sumit Semwal <sumit.semwal@linaro.org>,
	linux-kernel@vger.kernel.org, Erico Nunes <nunes.erico@gmail.com>
Subject: [PATCH v2 5/8] drm/lima: handle spurious timeouts due to high irq latency
Date: Wed, 24 Jan 2024 03:59:44 +0100	[thread overview]
Message-ID: <20240124025947.2110659-6-nunes.erico@gmail.com> (raw)
In-Reply-To: <20240124025947.2110659-1-nunes.erico@gmail.com>

There are several unexplained and unreproduced cases of rendering
timeouts with lima, for which one theory is high IRQ latency coming from
somewhere else in the system.
This kind of occurrence may cause applications to trigger unnecessary
resets of the GPU or even applications to hang if it hits an issue in
the recovery path.
Panfrost already does some special handling to account for such
"spurious timeouts", it makes sense to have this in lima too to reduce
the chance that it hit users.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
---
 drivers/gpu/drm/lima/lima_sched.c | 31 ++++++++++++++++++++++++++++---
 1 file changed, 28 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
index c3bf8cda8498..814428564637 100644
--- a/drivers/gpu/drm/lima/lima_sched.c
+++ b/drivers/gpu/drm/lima/lima_sched.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0 OR MIT
 /* Copyright 2017-2019 Qiang Yu <yuq825@gmail.com> */
 
+#include <linux/hardirq.h>
 #include <linux/iosys-map.h>
 #include <linux/kthread.h>
 #include <linux/slab.h>
@@ -401,9 +402,35 @@ static enum drm_gpu_sched_stat lima_sched_timedout_job(struct drm_sched_job *job
 	struct lima_sched_pipe *pipe = to_lima_pipe(job->sched);
 	struct lima_sched_task *task = to_lima_task(job);
 	struct lima_device *ldev = pipe->ldev;
+	struct lima_ip *ip = pipe->processor[0];
+	int i;
+
+	/*
+	 * If the GPU managed to complete this jobs fence, the timeout is
+	 * spurious. Bail out.
+	 */
+	if (dma_fence_is_signaled(task->fence)) {
+		DRM_WARN("%s spurious timeout\n", lima_ip_name(ip));
+		return DRM_GPU_SCHED_STAT_NOMINAL;
+	}
+
+	/*
+	 * Lima IRQ handler may take a long time to process an interrupt
+	 * if there is another IRQ handler hogging the processing.
+	 * In order to catch such cases and not report spurious Lima job
+	 * timeouts, synchronize the IRQ handler and re-check the fence
+	 * status.
+	 */
+	for (i = 0; i < pipe->num_processor; i++)
+		synchronize_irq(pipe->processor[i]->irq);
+
+	if (dma_fence_is_signaled(task->fence)) {
+		DRM_WARN("%s unexpectedly high interrupt latency\n", lima_ip_name(ip));
+		return DRM_GPU_SCHED_STAT_NOMINAL;
+	}
 
 	if (!pipe->error)
-		DRM_ERROR("lima job timeout\n");
+		DRM_ERROR("%s job timeout\n", lima_ip_name(ip));
 
 	drm_sched_stop(&pipe->base, &task->base);
 
@@ -417,8 +444,6 @@ static enum drm_gpu_sched_stat lima_sched_timedout_job(struct drm_sched_job *job
 	if (pipe->bcast_mmu)
 		lima_mmu_page_fault_resume(pipe->bcast_mmu);
 	else {
-		int i;
-
 		for (i = 0; i < pipe->num_mmu; i++)
 			lima_mmu_page_fault_resume(pipe->mmu[i]);
 	}
-- 
2.43.0


  parent reply	other threads:[~2024-01-24  3:00 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-24  2:59 [PATCH v2 0/8] drm/lima: fixes and improvements to error recovery Erico Nunes
2024-01-24  2:59 ` [PATCH v2 1/8] drm/lima: reset async_reset on pp hard reset Erico Nunes
2024-01-24  2:59 ` [PATCH v2 2/8] drm/lima: reset async_reset on gp " Erico Nunes
2024-01-24  2:59 ` [PATCH v2 3/8] drm/lima: set pp bus_stop bit before " Erico Nunes
2024-01-24  2:59 ` [PATCH v2 4/8] drm/lima: set gp " Erico Nunes
2024-01-24  2:59 ` Erico Nunes [this message]
2024-01-24 12:38   ` [PATCH v2 5/8] drm/lima: handle spurious timeouts due to high irq latency Qiang Yu
2024-01-29 22:55     ` Erico Nunes
2024-01-30  1:05       ` Qiang Yu
2024-01-24  2:59 ` [PATCH v2 6/8] drm/lima: remove guilty drm_sched context handling Erico Nunes
2024-01-24  2:59 ` [PATCH v2 7/8] drm/lima: increase default job timeout to 10s Erico Nunes
2024-01-24  2:59 ` [PATCH v2 8/8] drm/lima: standardize debug messages by ip name Erico Nunes
2024-01-30  1:07 ` [PATCH v2 0/8] drm/lima: fixes and improvements to error recovery Qiang Yu
2024-02-12  8:40   ` Qiang Yu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240124025947.2110659-6-nunes.erico@gmail.com \
    --to=nunes.erico@gmail.com \
    --cc=airlied@gmail.com \
    --cc=anarsoul@gmail.com \
    --cc=christian.koenig@amd.com \
    --cc=daniel@ffwll.ch \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=lima@lists.freedesktop.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maarten.lankhorst@linux.intel.com \
    --cc=mripard@kernel.org \
    --cc=sumit.semwal@linaro.org \
    --cc=tzimmermann@suse.de \
    --cc=yuq825@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).